Reinforcement learning-based design of sampling policies under cost constraints in Markov random fields: application to weed map reconstruction
Abstract
Weeds cause yield losses in arable fields, yet their role in agro-ecosystem food webs and in providing ecological services is well established. Innovative weed management policies therefore have to be designed to handle this trade-off between production and regulation services. As a consequence, there has been growing interest in the spatial distribution of weeds in crops as a prerequisite to management, and such studies are usually based on maps of weed species.

Building probabilistic models of spatial processes, and reconstructing plausible maps of the process from a model and observed data, are frequently encountered and important problems. Equally important is the design of optimal sampling policies that make it possible to build high-probability maps when the model is known. This optimization problem is harder than the pure reconstruction problem and cannot, in general, be solved exactly.

A generic approach to spatial sampling for optimizing map construction, based on Markov random fields (MRFs), is provided and applied to the problem of weed sampling for mapping. MRFs offer a powerful representation for reasoning about large sets of interacting random variables. In spatial statistics, the design of sampling policies has been studied extensively for continuous variables, using tools from geostatistics. In the MRF case, with finite state space variables, some heuristics have been proposed for the design problem, but no universally accepted solution exists, particularly for adaptive policies as opposed to static ones. The problem of designing an adaptive sampling policy in an MRF can be formalized as an optimization problem. By combining tools from Artificial Intelligence (AI) and Computational Statistics, an original algorithm is proposed for its approximate resolution. This generic procedure, referred to as Least-Squares Dynamic Programming (LSDP), combines a linear-regression approximation of the value of a sampling policy, the construction of a batch of MRF realizations, and a backwards induction algorithm.

An empirical comparison of LSDP with existing one-step-look-ahead sampling heuristics and with solutions provided by classical AI algorithms leads to the following conclusions: (i) a naïve heuristic that samples the sites with the most uncertain marginals is already an efficient sampling approach; (ii) LSDP outperforms all the classical approaches we have tested; and (iii) LSDP outperforms the naïve heuristic when sampling costs are not uniform over the set of variables or when sampling actions are constrained.
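The abstract only names the ingredients of LSDP (a batch of MRF realizations, a linear-regression approximation of the policy value, and backwards induction). The Python sketch below is not the authors' implementation; it is a minimal toy illustration of how these ingredients can fit together, under many simplifying assumptions: a small Ising-like binary MRF stands in for the weed map, random exploration trajectories generate training states, the feature map and the reconstruction reward are ad hoc choices introduced here for illustration, and sampling costs are ignored.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy binary MRF on a 3x3 grid (a stand-in for a weed presence/absence map).
H, W = 3, 3
N = H * W
BETA = 0.8          # strength of spatial correlation in the Ising-like prior
T = 3               # sampling budget: number of sites the policy may observe
M = 200             # size of the batch of simulated MRF realizations

def neighbours(i):
    r, c = divmod(i, W)
    nbs = []
    if r > 0:     nbs.append(i - W)
    if r < H - 1: nbs.append(i + W)
    if c > 0:     nbs.append(i - 1)
    if c < W - 1: nbs.append(i + 1)
    return nbs

def gibbs_realization(sweeps=50):
    """Draw one realization of the toy MRF prior by Gibbs sampling."""
    x = rng.integers(0, 2, N)
    for _ in range(sweeps):
        for i in range(N):
            p1 = np.exp(BETA * sum(x[j] == 1 for j in neighbours(i)))
            p0 = np.exp(BETA * sum(x[j] == 0 for j in neighbours(i)))
            x[i] = int(rng.random() < p1 / (p0 + p1))
    return x

batch = np.array([gibbs_realization() for _ in range(M)])

def features(observed, x, action):
    """Linear-regression features: observation pattern plus chosen site.
    Only the observed entries of x are read."""
    phi = np.zeros(3 * N + N)
    for i in range(N):
        if i in observed:
            phi[3 * i + 1 + x[i]] = 1.0   # observed as 0 or as 1
        else:
            phi[3 * i] = 1.0              # still unobserved
    phi[3 * N + action] = 1.0
    return phi

def reconstruction_reward(observed, x):
    """Fraction of sites correctly reconstructed from the observations,
    using batch realizations consistent with them as a crude posterior."""
    obs = list(observed)
    mask = np.all(batch[:, obs] == x[obs], axis=1) if obs else np.ones(M, bool)
    sub = batch[mask] if mask.any() else batch
    recon = (sub.mean(axis=0) >= 0.5).astype(int)
    recon[obs] = x[obs]
    return float((recon == x).mean())

# Random exploration trajectories: one sequence of distinct sites per realization.
trajectories = [rng.choice(N, size=T, replace=False) for _ in range(M)]

# Backward induction with least-squares value regression (LSDP-style).
theta = [None] * T
for t in reversed(range(T)):
    X, y = [], []
    for x, traj in zip(batch, trajectories):
        observed = set(traj[:t])
        action = traj[t]
        nxt = observed | {action}
        if t == T - 1:
            target = reconstruction_reward(nxt, x)        # terminal map quality
        else:
            target = max(features(nxt, x, a) @ theta[t + 1]
                         for a in range(N) if a not in nxt)
        X.append(features(observed, x, action))
        y.append(target)
    theta[t], *_ = np.linalg.lstsq(np.array(X), np.array(y), rcond=None)

def lsdp_policy(observed, x_obs, t):
    """Greedy action with respect to the fitted step-t value function."""
    cands = [a for a in range(N) if a not in observed]
    return max(cands, key=lambda a: features(observed, x_obs, a) @ theta[t])

# Evaluate the learned adaptive policy on a fresh realization.
truth = gibbs_realization()
observed = set()
for t in range(T):
    observed.add(lsdp_policy(observed, truth, t))
print("reconstruction quality:", reconstruction_reward(observed, truth))
```

The backward pass mirrors the structure named in the abstract: terminal targets are reconstruction rewards, earlier targets are greedy values under the already-fitted later-step regressions, and the value at each step is fitted by ordinary least squares over the batch.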