Detecting correlation between allele frequencies and environmental variables as a signature of selection. A fast computational approach for genome-wide studies
Abstract
Genomic regions (or loci) that display unusually strong correlation with environmental variables are likely to be under selection. This is the rationale behind recent methods for identifying selected loci and retrieving functional information about them. To be effective, such methods must disentangle the potential effect of environmental variables from the confounding effect of population history. Routine analysis of genome-wide datasets also requires fast inference and model selection algorithms. We propose a method based on an explicit spatial model that is an instance of a spatial generalized linear mixed model (SGLMM). For inference, we use the INLA-SPDE theoretical and computational framework developed by Rue et al. (2009) and Lindgren et al. (2011). The proposed method quantifies the correlation between genotypes and environmental variables. It works for the most common types of genetic markers, obtained at either the individual or the population level. By analyzing data simulated first under a geostatistical model and then under an explicit model of selection, we show that the method is efficient. We also re-analyze a dataset of nineteen pine weevil (Hylobius abietis) populations across Europe. The proposed method also offers a statistically sound alternative to Mantel tests for testing the association between genetic and environmental variables.
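To make the model class concrete, the following Python sketch simulates allele counts at a single locus under a binomial SGLMM: the logit of the local allele frequency is the sum of an intercept, an environmental effect and a spatially correlated Gaussian random effect. It is only a forward simulation under stated assumptions, not the INLA-SPDE inference procedure. The exponential covariance, the logit link, all function and parameter names (`beta0`, `beta1`, `sigma2`, `rho`) and all numerical values are illustrative choices that do not come from the text above.

```python
import math
import random

def exp_cov(coords, sigma2, rho):
    """Exponential covariance between sites (an assumed, illustrative choice)."""
    n = len(coords)
    return [[sigma2 * math.exp(-math.dist(coords[i], coords[j]) / rho)
             for j in range(n)] for i in range(n)]

def cholesky(a):
    """Lower-triangular Cholesky factor of a symmetric positive-definite matrix."""
    n = len(a)
    low = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = sum(low[i][k] * low[j][k] for k in range(j))
            if i == j:
                low[i][j] = math.sqrt(a[i][i] - s)
            else:
                low[i][j] = (a[i][j] - s) / low[j][j]
    return low

def simulate_counts(coords, env, n_sampled, beta0, beta1, sigma2, rho, rng):
    """Draw allele counts at one locus from a binomial SGLMM with logit link."""
    low = cholesky(exp_cov(coords, sigma2, rho))
    z = [rng.gauss(0.0, 1.0) for _ in coords]
    # Spatial random effect x = L z, whose covariance is L L^T.
    x = [sum(low[i][k] * z[k] for k in range(i + 1)) for i in range(len(coords))]
    counts = []
    for xi, ei, ni in zip(x, env, n_sampled):
        # Local allele frequency: inverse logit of the linear predictor.
        freq = 1.0 / (1.0 + math.exp(-(beta0 + beta1 * ei + xi)))
        # Binomial draw as a sum of ni Bernoulli trials.
        counts.append(sum(rng.random() < freq for _ in range(ni)))
    return counts

rng = random.Random(1)
coords = [(rng.random(), rng.random()) for _ in range(19)]  # 19 hypothetical sites
env = [c[1] for c in coords]      # hypothetical environmental gradient along y
n_sampled = [40] * 19             # hypothetical number of gene copies per site
counts = simulate_counts(coords, env, n_sampled,
                         beta0=0.0, beta1=2.0, sigma2=0.5, rho=0.3, rng=rng)
```

In this sketch, `beta1` plays the role of the genotype-environment association of interest, while the random effect `x` stands in for spatial structure attributable to population history. Setting `beta1=0.0` produces data with spatial structure but no environmental effect, which is the kind of null scenario a method of this type must not mistake for selection.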