**Abstract**: Markov decision processes (MDPs) have become a popular model for real-world problems of planning under uncertainty. A wide range of applications has been published within the fields of natural resource management, forestry, agricultural economics, and robotics. An MDP can be used to represent and optimize the management of an environment. A state variable represents the current state of the environment, and interaction with the environment is expressed with an action variable, allowing for stochastic transitions from one state to another. A transition function expresses the transitions between the different states of the environment, and a reward function expresses the reward received by applying an action when the environment is in a specific state. MDPs can thereby be used to optimize management strategies for problems with uncertain effects of actions, multiple and conflicting objectives, stochastic events, and stochastic environments. A common problem when optimizing management strategies for natural resources is that large areas have to be considered, rendering the state and action spaces of the MDP large; for example, an agricultural area consisting of thousands of crop fields, or a forest consisting of thousands of stands. In addition, spatial aspects may have to be considered, for example in the management of epidemics, invasive or endangered species, or in management to prevent wind damage and wildfire. However, the structure of the problem can be exploited to model it in a compact form. For example, the probability of a tree being damaged by wind may depend only on the state of the neighbouring trees. This type of local dependency and internal structure can be used to express the problem in a compact factored form: instead of modelling the state of the forest with a single state variable, a set of state variables is used, each expressing the state of a part of the forest.
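To make the factored idea concrete, a minimal sketch of a factored forest state with neighbour-only transition dependencies might look as follows. The adjacency graph, state encoding, and damage probabilities are illustrative assumptions for this sketch, not values taken from the paper:

```python
import random

# Hypothetical sketch of a factored state with local dependencies.
# Each stand has a binary state: 0 = standing forest, 1 = cleared
# (harvested or wind-damaged). All numbers here are illustrative.

NEIGHBOURS = {  # adjacency graph over four stands in a row
    0: [1], 1: [0, 2], 2: [1, 3], 3: [2],
}

def wind_damage_prob(state, i):
    """P(stand i is wind-damaged) depends only on its neighbours:
    each cleared neighbour removes shelter and raises the risk."""
    exposed = sum(state[j] for j in NEIGHBOURS[i])
    return min(0.1 + 0.2 * exposed, 1.0)

def step(state, action, rng):
    """One transition of the factored model. Each stand's next state
    depends only on its own action and its local neighbourhood,
    never on the full global state. action[i] == 1 means harvest."""
    next_state = list(state)
    for i in range(len(state)):
        if action[i] == 1:
            next_state[i] = 1  # harvested -> cleared
        elif rng.random() < wind_damage_prob(state, i):
            next_state[i] = 1  # wind-damaged -> cleared
    return tuple(next_state)
```

The point of the factored form is that `wind_damage_prob` consults only `NEIGHBOURS[i]`, so the transition model is specified by small local functions rather than one table over the exponential joint state space.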
A number of different representational models have been proposed for this type of problem. Factored MDPs and collaborative multiagent MDPs are two frameworks that have recently received a lot of attention. The graph-based Markov decision process (GMDP) framework has recently been developed specifically for modelling a range of large-scale spatiotemporal problems. As the name suggests, the GMDP uses a graph structure to represent the transition and reward functions in a compact form. Two algorithms for computing approximate solutions for GMDPs have been proposed. These algorithms are capable of solving large-scale GMDPs, as their running times increase only linearly and polynomially, respectively, with the size of the problem for a fixed induced width of the graph. However, the algorithms are model-dependent and require the transition and reward functions to be known explicitly; simulation-based models of the transition and reward functions can therefore not be used. Since there are a large number of simulation-based models of real-world management problems, for example (Blennow & Sallnäs, 2004; Finney, 1994), the development of model-free solution algorithms is an important extension of the GMDP framework and will increase the applicability of GMDPs. In this paper, we describe a number of model-free reinforcement learning (RL) algorithms for GMDPs, derived from existing model-free RL algorithms for collaborative multiagent MDPs. The collaborative multiagent MDP framework is similar to the GMDP framework in a number of respects, and we use these similarities to propose RL algorithms for GMDPs. The performance of the algorithms is compared to that of the model-dependent algorithms on a medium- and a large-scale forest management problem. Our experimental results show that the proposed RL algorithms efficiently compute policies with near-optimal performance, of similar quality to the policies computed by the model-dependent algorithms.
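The model-free setting can be illustrated with plain tabular Q-learning, which learns from sampled transitions alone and never queries the transition or reward functions directly. This is a generic sketch of the model-free principle, not the specific GMDP algorithms proposed in the paper; all parameter values are illustrative:

```python
import random

def q_learning(env_step, states, actions, episodes=500, alpha=0.1,
               gamma=0.95, epsilon=0.1, rng=None):
    """Generic tabular Q-learning. `env_step(s, a, rng) -> (s', r)` is a
    black-box simulator: only sampled transitions are used, so no explicit
    transition or reward model is required."""
    rng = rng or random.Random(0)
    Q = {(s, a): 0.0 for s in states for a in actions}
    for _ in range(episodes):
        s = states[0]            # each episode starts in the first state
        for _ in range(20):      # fixed-length episodes
            if rng.random() < epsilon:           # epsilon-greedy exploration
                a = rng.choice(actions)
            else:
                a = max(actions, key=lambda b: Q[(s, b)])
            s2, r = env_step(s, a, rng)          # sample from the simulator
            best_next = max(Q[(s2, b)] for b in actions)
            Q[(s, a)] += alpha * (r + gamma * best_next - Q[(s, a)])
            s = s2
    return Q
```

Because `env_step` is treated as a black box, the same learner can be driven by a simulation-based model, which is exactly the applicability gain that motivates model-free algorithms for GMDPs.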