Speeding up estimation of spatially varying coefficients models
Abstract
Spatially varying coefficient models, such as GWR (Brunsdon et al. in Geogr Anal 28:281-298, 1996 and McMillen in J Urban Econ 40:100-124, 1996), are used extensively across many fields, including housing markets, land use, population ecology, seismology, and mining research. These models are valuable for capturing spatial heterogeneity in coefficient values. In many application areas, spatial data sets keep growing, both in the number of observations and in the richness of explanatory variables, and this growth raises new methodological challenges. The main issues are the time required to compute the local coefficients at every location and the memory needed to store the large hat matrix (of size n × n) used for parameter variance estimation. Researchers have explored various approaches to address these challenges (Harris et al. in Trans GIS 14:43-61, 2010; Pozdnoukhov and Kaiser in: Proceedings of the 19th ACM SIGSPATIAL International Conference on Advances in Geographic Information Systems, 2011; Tran et al. in: 2016 Eighth International Conference on Knowledge and Systems Engineering (KSE), IEEE, 2016; Geniaux and Martinetti in Reg Sci Urban Econ 72:74-85, 2018; Li et al. in Int J Geogr Inf Sci 33:155-175, 2019; Murakami et al. in Ann Am Assoc Geogr 111:459-480, 2020). While using a subset of target points for local regressions has been studied extensively in nonparametric econometrics, its application to GWR remains relatively unexplored. In this paper, we propose an original two-stage method designed to accelerate GWR computations. We select a subset of target points based on the spatial smoothing of residuals from a first-stage regression and run GWR on this subsample only. We also propose an original approach for extrapolating coefficients to non-target points.
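To make the core idea concrete, here is a minimal Python sketch of standard GWR estimation restricted to a subset of target points. At each target location u, the local coefficients solve the kernel-weighted normal equations beta(u) = (X'W(u)X)^-1 X'W(u)y, with Gaussian weights w_j = exp(-0.5 (d_j / h)^2). The sketch assumes a single explanatory variable plus an intercept, so the 2 × 2 system can be solved in closed form without external libraries. The function names (`local_wls`, `gwr_at_targets`), the toy grid data, the fixed bandwidth `h`, and the evenly spaced targets are hypothetical. The sketch does not implement the paper's residual-based target selection or its extrapolation to non-target points, and it is not the mgwrsar API.

```python
import math


def gaussian_weight(d, h):
    """Gaussian kernel weight for distance d and bandwidth h."""
    return math.exp(-0.5 * (d / h) ** 2)


def local_wls(coords, x, y, target, h):
    """Local weighted least squares for y = b0 + b1 * x at one location.

    Accumulates X'WX = [[sum w, sum wx], [sum wx, sum wxx]] and
    X'Wy = [sum wy, sum wxy], then solves the 2x2 system in closed form.
    """
    tx, ty = target
    s_w = s_wx = s_wxx = s_wy = s_wxy = 0.0
    for (cx, cy), xi, yi in zip(coords, x, y):
        w = gaussian_weight(math.hypot(cx - tx, cy - ty), h)
        s_w += w
        s_wx += w * xi
        s_wxx += w * xi * xi
        s_wy += w * yi
        s_wxy += w * xi * yi
    det = s_w * s_wxx - s_wx * s_wx
    b0 = (s_wxx * s_wy - s_wx * s_wxy) / det
    b1 = (s_w * s_wxy - s_wx * s_wy) / det
    return b0, b1


def gwr_at_targets(coords, x, y, target_idx, h):
    """Run local regressions only at the chosen target points (m << n).

    Hypothetical helper: target_idx is supplied by the caller; the paper's
    residual-smoothing selection rule is not reproduced here.
    """
    return {i: local_wls(coords, x, y, coords[i], h) for i in target_idx}


# Toy example (hypothetical data): 100 points on a 10 x 10 grid with a
# spatially constant relationship y = 2 + 3x, so every local fit should
# recover (2, 3) up to floating-point error.
coords = [(i, j) for i in range(10) for j in range(10)]
x = [((7 * k) % 11) / 10.0 for k in range(100)]
y = [2.0 + 3.0 * xi for xi in x]
targets = list(range(0, 100, 10))  # 10 target points out of 100
estimates = gwr_at_targets(coords, x, y, targets, h=2.0)

for i, (b0, b1) in estimates.items():
    print(i, round(b0, 6), round(b1, 6))
```

Because only the m target points are visited, the cost of the local fits grows with m rather than n. The memory issue is also easy to quantify: a dense n × n hat matrix stored in double precision needs 8n² bytes, i.e. 8 × 10^10 bytes (about 80 GB) for n = 100,000.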
In addition to using an effective sample of target points, we explore the computational gain obtained by using a truncated Gaussian kernel to produce sparser matrices during computation. Our Monte Carlo experiments show that this method of target point selection outperforms selection based on point density or random selection. The results also reveal that using target points can reduce the bias and root mean square error (RMSE) of the estimated beta coefficients compared with traditional GWR, because it enables the selection of a more accurate bandwidth. We show that our estimator is scalable and scales better than the estimator of Murakami et al. (Ann Am Assoc Geogr 111:459-480, 2020) under two conditions: the ratio of target points must be large enough to approximate the coefficients satisfactorily (10-20% of locations), and the optimal bandwidth must remain within a reasonable neighborhood (<5000 neighbors). All GWR estimators with target points are now available in the R package mgwrsar, for GWR and Mixed GWR with and without spatial autocorrelation, on the CRAN repository at https://CRAN.R-project.org/package=mgwrsar.
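The sparsity gain from a truncated Gaussian kernel can be illustrated with a minimal Python sketch. Weights beyond a cutoff distance are set to exactly zero, so only the neighbors inside the cutoff need to be stored for each target point. The cutoff rule used here (a fixed multiple of the bandwidth, 3h), the function name `truncated_gaussian_weights`, and the grid data are assumptions for illustration; the truncation rule implemented in mgwrsar may differ.

```python
import math


def truncated_gaussian_weights(coords, target, h, cutoff):
    """Sparse Gaussian kernel weights around one target point.

    Points farther than `cutoff` get weight exactly zero and are not stored,
    so the result is a dict {index: weight} holding only the neighbors
    inside the cutoff (a sparse row of the weight matrix).
    Assumption: the cutoff is a fixed distance chosen by the caller.
    """
    tx, ty = target
    weights = {}
    for j, (cx, cy) in enumerate(coords):
        d = math.hypot(cx - tx, cy - ty)
        if d <= cutoff:
            weights[j] = math.exp(-0.5 * (d / h) ** 2)
    return weights


# Hypothetical example: 400 points on a 20 x 20 grid, bandwidth h = 1,
# truncation at 3h; the target is the interior point (10, 10).
coords = [(i, j) for i in range(20) for j in range(20)]
target_index = 10 * 20 + 10
w = truncated_gaussian_weights(coords, coords[target_index], h=1.0, cutoff=3.0)

print(f"stored {len(w)} of {len(coords)} weights")
```

Here an interior target point keeps 29 of 400 weights. Every discarded weight is below exp(-4.5) ≈ 0.011 times the weight at the target itself, which is the approximation accepted in exchange for sparsity.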