Estimating forest soil bulk density using boosted regression modelling
Abstract
Soil bulk density (ρ) is an important physical property, but it is often missing from soil surveys because it is time-consuming to measure. As a result, pedotransfer functions (PTFs) have been developed to predict ρ from other, more readily available soil properties. These functions are generally derived using regression methods that fit a single model. In this study, we use a technique called Generalized Boosted Regression Modelling (GBM; Ridgeway, 2006), which combines two algorithms: regression trees and boosting. We built two models and compared their predictive performance with that of published PTFs. All the functions were fitted to the French forest soil dataset from the European demonstration Biosoil project. The two GBM models were Model G3, which used the three quantitative predictors most frequently used to estimate soil bulk density (organic carbon, clay and silt), and Model G10, which included ten qualitative and quantitative input variables, such as parent material and tree species. Based on the full dataset, Models G3 and G10 gave R² values of 0.45 and 0.86, respectively. Model G3 did not significantly outperform the best published model. Even when fitted from an external dataset, it explained only 29% of the variation in ρ, with a root mean square error of 0.244 g/cm³. In contrast, the more complex Model G10 outperformed the other models during external validation, with an R² of 0.67 and a predictive deviation of ±0.168 g/cm³. The variation in forest soil bulk density was mainly explained by five input variables: organic carbon content, tree species, coarse fragment content, parent material and sampling depth.
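The idea of combining regression trees with boosting, and of scoring a model by R² and root mean square error on held-out data, can be illustrated with a minimal sketch. This is not the GBM implementation used in the study: it is a pure-Python toy that boosts one-split regression trees (stumps) under squared-error loss, with a single predictor. The data are synthetic and invented purely for illustration (a hypothetical logarithmic decline of bulk density with organic carbon plus random noise), and every function name, the number of trees and the learning rate are assumptions.

```python
import math
import random


def fit_stump(x, residuals):
    """Return the one-split tree on x that minimises squared error of the residuals."""
    best = None
    values = sorted(set(x))
    for lo, hi in zip(values, values[1:]):
        threshold = (lo + hi) / 2.0
        left = [r for xi, r in zip(x, residuals) if xi <= threshold]
        right = [r for xi, r in zip(x, residuals) if xi > threshold]
        left_mean = sum(left) / len(left)
        right_mean = sum(right) / len(right)
        sse = (sum((r - left_mean) ** 2 for r in left)
               + sum((r - right_mean) ** 2 for r in right))
        if best is None or sse < best[0]:
            best = (sse, threshold, left_mean, right_mean)
    return best[1:]  # (threshold, left_mean, right_mean)


def predict_stump(stump, xi):
    threshold, left_mean, right_mean = stump
    return left_mean if xi <= threshold else right_mean


def fit_boosted(x, y, n_trees=100, learning_rate=0.1):
    """Boosting: each stump is fitted to the residuals left by the previous ones."""
    base = sum(y) / len(y)  # start from the mean response
    preds = [base] * len(y)
    stumps = []
    for _ in range(n_trees):
        residuals = [yi - pi for yi, pi in zip(y, preds)]
        stump = fit_stump(x, residuals)
        stumps.append(stump)
        preds = [pi + learning_rate * predict_stump(stump, xi)
                 for xi, pi in zip(x, preds)]
    return base, stumps, learning_rate


def predict_boosted(model, xi):
    base, stumps, learning_rate = model
    return base + learning_rate * sum(predict_stump(s, xi) for s in stumps)


def rmse(y, preds):
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(y, preds)) / len(y))


def r_squared(y, preds):
    mean_y = sum(y) / len(y)
    ss_res = sum((a - b) ** 2 for a, b in zip(y, preds))
    ss_tot = sum((a - mean_y) ** 2 for a in y)
    return 1.0 - ss_res / ss_tot


# Synthetic data (NOT the Biosoil dataset): organic carbon in g/kg, bulk density in g/cm3.
random.seed(0)
oc = [random.uniform(1.0, 150.0) for _ in range(120)]
rho = [1.7 - 0.2 * math.log(c) + random.gauss(0.0, 0.08) for c in oc]

# Hold out every second sample as a stand-in for external validation.
x_train, y_train = oc[0::2], rho[0::2]
x_test, y_test = oc[1::2], rho[1::2]

model = fit_boosted(x_train, y_train)
train_preds = [predict_boosted(model, xi) for xi in x_train]
test_preds = [predict_boosted(model, xi) for xi in x_test]

baseline_rmse = rmse(y_train, [sum(y_train) / len(y_train)] * len(y_train))
train_rmse = rmse(y_train, train_preds)
test_rmse = rmse(y_test, test_preds)
test_r2 = r_squared(y_test, test_preds)

print(f"training RMSE: {train_rmse:.3f} (mean-only baseline: {baseline_rmse:.3f})")
print(f"held-out R2: {test_r2:.2f}, held-out RMSE: {test_rmse:.3f} g/cm3")
```

Reporting the held-out scores separately from the training scores mirrors the distinction drawn above between results on the full dataset and results under external validation, where the fit is typically weaker.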