Analysing the impact of soil spatial sampling on the performances of Digital Soil Mapping models and their evaluation: A numerical experiment on Quantile Random Forest using clay contents obtained from Vis-NIR-SWIR hyperspectral imagery

Abstract : It has long been acknowledged that the soil spatial samplings used as inputs to Digital Soil Mapping (DSM) models are strong drivers, and often limiting factors, of the performances of such models. However, few studies have focused on evaluating this impact and identifying the related spatial sampling characteristics. In this study, a numerical experiment was conducted on this topic using pseudo-values of topsoil clay content, obtained from an airborne Visible Near InfraRed-Short Wave InfraRed (Vis-NIR-SWIR) hyperspectral image of the Cap Bon region (Tunisia), as the source of the spatial samplings. Twelve thousand DSM models were built by running a Quantile Random Forest algorithm on soil spatial samplings of different sizes and average spacings (from 200 m to 2000 m) and different spatial distributions (from clustered to regularly distributed), in order to mimic the various situations encountered when handling legacy data. These DSM models were evaluated with regard to both their prediction performances and their ability to estimate their overall and local uncertainties. Three evaluation methods were applied: a model-based one; a classical model-free one using 25% of the sites, held out from the initial soil data; and a reference one using a set of 100,000 independent sites selected by stratified random sampling over the entire region.
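The abstract names its evaluation ingredients but gives no code. The sketch below, in Python with only the standard library, illustrates two of them under stated assumptions: a stratified random sample over a rectangular region (here the strata are assumed to be equal square grid cells, with one uniformly random point per cell) and a random 25% hold-out split, scored with RMSE and the coefficient of determination. The function names and the 10 km x 10 km region are hypothetical; this is not the authors' code, and the exact strata and metrics used in the study are not specified here.

```python
import math
import random


def stratified_random_sample(xmin, ymin, xmax, ymax, n_per_side, seed=0):
    """Stratified random sampling (assumed square strata): split the rectangle
    into an n_per_side x n_per_side grid of equal cells and draw one
    uniformly random point inside each cell."""
    rng = random.Random(seed)
    dx = (xmax - xmin) / n_per_side
    dy = (ymax - ymin) / n_per_side
    points = []
    for i in range(n_per_side):
        for j in range(n_per_side):
            points.append((xmin + (i + rng.random()) * dx,
                           ymin + (j + rng.random()) * dy))
    return points


def holdout_split(sites, frac=0.25, seed=0):
    """Randomly set aside a fraction of the sites for model-free validation.
    Returns (calibration_sites, validation_sites)."""
    rng = random.Random(seed)
    shuffled = list(sites)
    rng.shuffle(shuffled)
    n_valid = round(frac * len(shuffled))
    return shuffled[n_valid:], shuffled[:n_valid]


def rmse(obs, pred):
    """Root mean squared error between observed and predicted values."""
    return math.sqrt(sum((o - p) ** 2 for o, p in zip(obs, pred)) / len(obs))


def r2(obs, pred):
    """Coefficient of determination as 1 - SSE/SST (negative when the
    predictions do worse than the mean of the observations)."""
    mean_obs = sum(obs) / len(obs)
    sse = sum((o - p) ** 2 for o, p in zip(obs, pred))
    sst = sum((o - mean_obs) ** 2 for o in obs)
    return 1.0 - sse / sst


# Hypothetical 10 km x 10 km region divided into 100 strata of 1 km x 1 km.
reference_sites = stratified_random_sample(0, 0, 10_000, 10_000, n_per_side=10)
calib, valid = holdout_split(range(100), frac=0.25)
print(len(reference_sites), len(calib), len(valid))  # 100 75 25
```

Drawing one point per stratum guarantees that the reference set covers the whole region evenly, while keeping each location random within its cell; a plain hold-out split from the original sites, by contrast, inherits whatever clustering the calibration sampling already has.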
The results showed that: 1) while, as expected, the performances of the DSM models increased as the sampling spacing decreased, this gain diminished at the smallest spacings once 50% of the spatially structured variance was captured by the sampling; 2) samplings that provided complete and even coverage of the geographical space and captured as wide a range of the target soil property as possible increased DSM performances, whereas complete and even coverage of the covariate space had less impact; 3) the overall uncertainty of the DSM models was systematically underestimated, all the more so when sparse samplings poorly covered the real distribution of the target soil property and when dense samplings were unevenly distributed in the geographical space; 4) local uncertainties were underestimated for sparse samplings and overestimated for dense samplings, while being sensitive to the same sampling characteristics as the overall uncertainty. These findings have practical implications for sampling strategies and DSM model evaluation, which are discussed.
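The abstract reports that uncertainties were under- or overestimated but does not say here how this was measured. One common check for quantile-based prediction intervals, such as those a Quantile Random Forest can provide, is the prediction interval coverage probability (PICP): the fraction of validation sites whose observed value falls inside its predicted interval, compared with the nominal level. The Python sketch below (standard library only) assumes each site's predictive distribution is available as a sample of values; the function names and toy numbers are hypothetical, and it is not presented as the study's own evaluation procedure.

```python
import math


def empirical_quantile(values, q):
    """Empirical q-quantile with linear interpolation between order
    statistics (Hyndman-Fan type 7, numpy's default 'linear' method)."""
    s = sorted(values)
    pos = q * (len(s) - 1)
    lo = math.floor(pos)
    hi = min(lo + 1, len(s) - 1)
    return s[lo] + (pos - lo) * (s[hi] - s[lo])


def interval_from_samples(samples, alpha=0.10):
    """Central (1 - alpha) prediction interval from a sample of predictive
    values at one site, e.g. the 5% and 95% quantiles for alpha = 0.10."""
    return (empirical_quantile(samples, alpha / 2),
            empirical_quantile(samples, 1 - alpha / 2))


def picp(obs, lower, upper):
    """Prediction interval coverage probability: share of validation
    observations falling inside their predicted [lower, upper] interval."""
    inside = sum(1 for o, lo, up in zip(obs, lower, upper) if lo <= o <= up)
    return inside / len(obs)


# Toy validation set: 4 sites with hypothetical clay contents and intervals.
obs = [10.0, 20.0, 30.0, 40.0]
lower = [8.0, 21.0, 25.0, 35.0]
upper = [12.0, 25.0, 35.0, 39.0]
coverage = picp(obs, lower, upper)  # 0.5: only the 1st and 3rd sites are covered
nominal = 0.90
# Coverage below the nominal level means the intervals are too narrow,
# i.e. the uncertainty is underestimated; above it, overestimated.
print(coverage, "underestimated" if coverage < nominal else "not underestimated")
```

Computing the PICP for several nominal levels, rather than a single one, shows whether the intervals are consistently too narrow or too wide across the whole predictive distribution.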
Document type :
Journal articles

https://hal.inrae.fr/hal-02891658
Contributor : Hélène Lesur
Submitted on : Tuesday, July 7, 2020 - 9:22:15 AM
Last modification on : Monday, May 31, 2021 - 5:04:19 PM

Citation

Philippe Lagacherie, D. Arrouays, H. Bourennane, Cecile Gomez, L. Nkuba-Kasanda. Analysing the impact of soil spatial sampling on the performances of Digital Soil Mapping models and their evaluation: A numerical experiment on Quantile Random Forest using clay contents obtained from Vis-NIR-SWIR hyperspectral imagery. Geoderma, Elsevier, 2020, 375 (1), ⟨10.1016/j.geoderma.2020.114503⟩. ⟨hal-02891658⟩
