Conference paper, Year: 2009

Reinforcement learning for spatial processes

Abstract

Markov decision processes (MDPs) have become a popular model for real-world problems of planning under uncertainty. A wide range of applications has been published within the fields of natural resource management, forestry, agricultural economics, and robotics. An MDP can be used to represent and optimize the management of an environment. A state variable represents the current state of the environment, and the interaction with the environment is expressed with an action variable, allowing for stochastic transitions from one state to another. A transition function expresses the transitions between the different states of the environment, and a reward function expresses the reward received by applying an action when the environment is in a specific state. MDPs can thereby be used to optimize management strategies for problems with uncertain effects of actions, multiple and conflicting objectives, stochastic events, and stochastic environments.

A common problem when optimizing management strategies for natural resources is that large areas have to be considered, rendering the state and action spaces of the MDP large: for example, an agricultural area consisting of thousands of crop fields, or forests consisting of thousands of stands. Spatial aspects may also have to be considered; examples include the management of epidemics, invasive or endangered species, and management to prevent wind damage and wildfire. However, the structure of the problem can be exploited to model it in a compact form. For example, the probability of a tree being damaged by wind depends only on the state of the neighbouring trees. This type of local dependency and internal structure can be used to express the problem in a compact, factored form: instead of modelling the state of the forest with a single state variable, a set of state variables is used, each expressing the state of a part of the forest.

A number of representational models have been proposed for this type of problem. Factored MDPs and collaborative multiagent MDPs are two frameworks that have recently received a lot of attention. The graph-based Markov decision process (GMDP) framework has recently been developed specifically for modelling a range of large-scale spatiotemporal problems. As the name suggests, the GMDP uses a graph structure to represent the transition and reward functions in a compact form. Two algorithms for computing approximate solutions of GMDPs have been proposed; they are capable of solving large-scale GMDPs, as their running time increases only linearly and polynomially with the size of the problem for a fixed induced width of the graph. However, these algorithms are model-dependent and require the transition and reward functions to be known, so simulation-based models of the transition and reward functions cannot be used. Since there is a large number of simulation-based models of real-world management problems, for example (Blennow & Sallnäs, 2004; Finney, 1994), the development of model-free solution algorithms is an important extension of the GMDP framework and will increase the applicability of GMDPs. In this paper, we describe a number of model-free reinforcement learning (RL) algorithms for GMDPs, derived from existing model-free RL algorithms for collaborative multiagent MDPs. The collaborative multiagent MDP framework is similar to the GMDP framework in a number of aspects, and we use these similarities to propose RL algorithms for GMDPs.
The performance of the algorithms is compared to that of model-dependent algorithms on a medium-scale and a large-scale forest management problem. Our experimental results show that the proposed RL algorithms are able to efficiently compute policies with near-optimal performance. The policies computed by the RL algorithms are of similar quality to the policies computed by the model-dependent algorithms.
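The paper's own model-free algorithms are not detailed on this page, so the Python sketch below is only an illustration of the general idea the abstract describes: a spatial problem whose nodes change state depending only on their own state, their own action, and the states of their graph neighbours, learned with a simple per-node Q-learning rule that uses simulated transitions instead of an explicit transition model. The ring graph, the two-state node dynamics, the reward numbers, and all function names are invented toy assumptions, not the authors' model or notation.

# Minimal, illustrative sketch (not the authors' algorithm): independent
# per-node Q-learning on a toy graph-based spatial problem.
import random
from collections import defaultdict

# Toy spatial problem: each node (e.g. a forest stand) is 0 = healthy or
# 1 = damaged; action 0 = do nothing, 1 = treat.
NODES = range(20)
NEIGHBOURS = {i: [(i - 1) % 20, (i + 1) % 20] for i in NODES}  # ring graph

def local_transition(s_i, a_i, neigh_states, rng):
    """Next state of node i depends only on its own state and action and on
    its neighbours' states (the local dependency exploited by GMDPs)."""
    if a_i == 1:                       # treating restores the node
        return 0
    pressure = sum(neigh_states) / len(neigh_states)
    return 1 if rng.random() < 0.1 + 0.4 * pressure else s_i

def local_reward(s_i, a_i):
    return (1.0 if s_i == 0 else 0.0) - 0.3 * a_i  # healthy node pays, action costs

def q_learning(episodes=2000, horizon=30, gamma=0.95, alpha=0.1, eps=0.1, seed=0):
    rng = random.Random(seed)
    Q = {i: defaultdict(float) for i in NODES}     # one small Q-table per node
    for _ in range(episodes):
        state = {i: rng.randint(0, 1) for i in NODES}
        for _ in range(horizon):
            # epsilon-greedy local action choice at every node
            action = {i: (rng.randint(0, 1) if rng.random() < eps
                          else max((0, 1), key=lambda a: Q[i][(state[i], a)]))
                      for i in NODES}
            nxt = {i: local_transition(state[i], action[i],
                                       [state[j] for j in NEIGHBOURS[i]], rng)
                   for i in NODES}
            for i in NODES:                        # independent local Q-updates
                r = local_reward(state[i], action[i])
                best_next = max(Q[i][(nxt[i], a)] for a in (0, 1))
                Q[i][(state[i], action[i])] += alpha * (
                    r + gamma * best_next - Q[i][(state[i], action[i])])
            state = nxt
    return Q

if __name__ == "__main__":
    Q = q_learning()
    print({a: round(Q[0][(1, a)], 2) for a in (0, 1)})  # learned values for a damaged node

The point of the toy setting is the locality that the GMDP framework formalizes: each node keeps a small Q-table over its own state and action, so memory grows linearly with the number of nodes rather than exponentially with the joint state and action spaces.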
Main file
Reinforcement learning for spatial processes_RS_1.pdf (399.87 KB)
Origin: Publisher files allowed on an open archive

Dates and versions

hal-02752972, version 1 (03-06-2020)

Identifiers

  • HAL Id: hal-02752972, version 1
  • PRODINRA: 261258

Cite

Nicklas Paul Forsell, Frederick F. Garcia, Régis Sabbadin. Reinforcement learning for spatial processes. 18th World IMACS Congress and MODSIM09 International Congress on Modelling and Simulation, Jul 2009, Cairns, Australia. 7 p. ⟨hal-02752972⟩
