Comparison of regression methods for spatially-autocorrelated count data on regularly- and irregularly-spaced locations
Simulation-based comparison of spatial regression methods for count data on regular or irregular space
Abstract
It has long been known that failing to account for spatial autocorrelation leads to unreliable hypothesis tests and inaccurate parameter estimates. Yet ecologists are confronted with a confusing array of methods for doing so. Although Beale et al. (2010) provided guidance for continuous data on regular grids, researchers still need advice for other types of data in more flexible spatial contexts. In this paper, we extend the work of Beale et al. (2010) to count data on both regularly and irregularly spaced plots, the latter being commonly encountered in ecological studies. Using a simulation-based approach, we assessed the accuracy and the type I error rates of two frequentist and two Bayesian ready-to-use methods in the family of generalized mixed models, with distance-based or neighbourhood-based correlated random effects. In addition, we tested whether the methods are robust to spatial non-stationarity and to over- and under-dispersion, features that are typical of species distribution count data and that violate standard regression assumptions. In the simplest of our simulated datasets, the two frequentist methods gave inflated type I error rates, while the two Bayesian methods gave satisfactory results. When facing real-world complexities, the distance-based Bayesian method (MCMC with Langevin-Hastings updates) performed best. We hope that, in light of our results, ecological researchers will feel more comfortable accounting for spatial autocorrelation in their analyses of count data.
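To make the opening claim concrete, the following Python sketch estimates the type I error rate of a Poisson regression that ignores spatial structure. It is a minimal illustration and not the simulation design used in this paper. Everything in it is an assumption chosen for the example: the function names, the 60 irregularly spaced sites, the exponential covariance with range 0.3, the intercept of 1.0, and the nominal 5% Wald test. The covariate and the random effect are simulated independently, so the true slope is zero and every rejection is a false positive. The inflation seen here reflects both the extra-Poisson variation and its spatial structure, since the naive model ignores the random effect entirely.

```python
import math
import random


def cholesky(a):
    """Lower-triangular Cholesky factor of a symmetric positive-definite matrix."""
    n = len(a)
    low = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = a[i][j] - sum(low[i][k] * low[j][k] for k in range(j))
            low[i][j] = math.sqrt(s) if i == j else s / low[j][j]
    return low


def gaussian_field(low, rng):
    """Draw one zero-mean Gaussian field whose covariance is low times its transpose."""
    e = [rng.gauss(0.0, 1.0) for _ in low]
    return [sum(row[k] * e[k] for k in range(i + 1)) for i, row in enumerate(low)]


def poisson_draw(lam, rng):
    """Knuth's multiplication sampler; adequate for the moderate means used here."""
    limit, k, p = math.exp(-lam), 0, rng.random()
    while p > limit:
        k += 1
        p *= rng.random()
    return k


def naive_poisson_wald_z(x, y):
    """Fit log(mu) = b0 + b1 * x by Newton-Raphson, ignoring spatial structure.

    Returns the Wald z statistic for b1.
    """
    b0, b1 = math.log(max(sum(y) / len(y), 1e-8)), 0.0
    for _ in range(50):
        # The linear predictor is clamped only to guard against overflow.
        mu = [math.exp(max(-30.0, min(30.0, b0 + b1 * xi))) for xi in x]
        # Score vector and Fisher information of the Poisson log-likelihood.
        u0 = sum(yi - mi for yi, mi in zip(y, mu))
        u1 = sum(xi * (yi - mi) for xi, yi, mi in zip(x, y, mu))
        i00 = sum(mu)
        i01 = sum(xi * mi for xi, mi in zip(x, mu))
        i11 = sum(xi * xi * mi for xi, mi in zip(x, mu))
        det = i00 * i11 - i01 * i01
        d0 = (i11 * u0 - i01 * u1) / det
        d1 = (i00 * u1 - i01 * u0) / det
        b0, b1 = b0 + d0, b1 + d1
        if abs(d0) + abs(d1) < 1e-10:
            break
    # Var(b1) is the (1, 1) entry of the inverse information, evaluated at the
    # last iterate before the final (negligible) update.
    return b1 / math.sqrt(i00 / det)


def type_one_error_rate(n_sites=60, n_sims=200, corr_range=0.3, seed=1):
    """Share of simulations in which the naive test rejects a true null slope."""
    rng = random.Random(seed)
    # Irregularly spaced sites: uniform random coordinates in the unit square.
    sites = [(rng.random(), rng.random()) for _ in range(n_sites)]
    # Exponential covariance with unit variance; tiny jitter on the diagonal.
    cov = [[math.exp(-math.dist(sites[i], sites[j]) / corr_range)
            + (1e-8 if i == j else 0.0)
            for j in range(n_sites)] for i in range(n_sites)]
    low = cholesky(cov)
    rejections = 0
    for _ in range(n_sims):
        x = gaussian_field(low, rng)       # spatially autocorrelated covariate
        effect = gaussian_field(low, rng)  # autocorrelated effect, independent of x
        y = [poisson_draw(math.exp(1.0 + s), rng) for s in effect]
        if abs(naive_poisson_wald_z(x, y)) > 1.96:
            rejections += 1
    return rejections / n_sims


if __name__ == "__main__":
    print("Estimated type I error rate:", type_one_error_rate())
```

A well-calibrated test would reject in about 5% of simulations. Under these assumed settings the naive fit should reject far more often, which is the kind of inflation that motivates models with correlated random effects.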