A fast method to fit the mean of unselected base animals in single-step SNP-BLUP
Abstract
Single-step GBLUP (SSGBLUP) is the reference method for genomic evaluation. Equivalent formulations of SSGBLUP predict marker effects rather than breeding values, which avoids inverting the genomic relationship matrix when many animals are genotyped. In such models, missing genotypes are imputed linearly. This requires centring the observed genotypes with base allele frequencies, which are generally unknown. Hsu et al. proposed to solve this by fitting a covariable that models the mean of unselected base animals. This requires a covariate vector $\mathbf{J}$ whose entries are $-1$ for genotyped animals (ga) and $\mathbf{J}_n = (\mathbf{A}^{nn})^{-1}\mathbf{A}^{ng}\mathbf{1}$ for ungenotyped animals (na). Here $\mathbf{A}^{nn}$ and $\mathbf{A}^{ng}$ are the na × na and na × ga submatrices of the inverse of the pedigree-based relationship matrix. Because $\mathbf{A}^{nn}$ is sparse, computations involving $(\mathbf{A}^{nn})^{-1}$ can rely on its sparse Cholesky factor $\mathbf{L}$. In dairy cattle populations, the factorization of $\mathbf{A}^{nn}$ is fast and $\mathbf{L}$ is sparse. In beef cattle populations, where the number of ungenotyped bulls is large, the factorization is more expensive and $\mathbf{L}$ is much denser.

We propose a simple method to compute $\mathbf{J}_n$ at low cost. It requires the following steps:
(1) divide the ungenotyped population into ancestors of ga (ANC) and other animals (OTH);
(2) compute the Cholesky factor $\mathbf{L}$ of $\mathbf{A}^{nn}_{ANC}$, the ANC × ANC submatrix of the inverse relationship matrix built from the ga and their ancestors only;
(3) compute $\mathbf{J}_{ANC} = (\mathbf{A}^{nn}_{ANC})^{-1}\mathbf{A}^{ng}_{ANC}\mathbf{1}$ for ANC animals;
(4) for each animal $i$ of OTH, from oldest to youngest, compute $J_{OTH}(i) = 0.5\,(\gamma_{sire} + \gamma_{dam})$. Here $\gamma_{parent}$ is $0$, $-1$, $J_{ANC}(parent)$ or $J_{OTH}(parent)$ when the parent (sire or dam) of animal $i$ is unknown, genotyped, in ANC or in OTH, respectively.

The method was tested on a French Charolais dataset containing 156,447 ANC, 10,333,930 OTH and 22,449 ga animals. Using MKL PARDISO on 8 CPUs, the Cholesky factorization took 5 h 07 min for $\mathbf{A}^{nn}$ and 33 s for $\mathbf{A}^{nn}_{ANC}$.
The subsequent computation of $\mathbf{J}_n$ took 27 s as $(\mathbf{A}^{nn})^{-1}\mathbf{A}^{ng}\mathbf{1}$ and 1 s with the proposed method.
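The four steps can be checked on a toy pedigree. The Python sketch below is illustrative only. The pedigree encoding (a list of `(sire, dam)` indices with `None` for unknown parents), the function names `tabular_A`, `jn_direct` and `jn_fast`, and the toy data are all hypothetical. Dense Gaussian elimination replaces the sparse Cholesky factorization with MKL PARDISO, so the sketch shows the logic of the shortcut but not its speed. It computes $\mathbf{J}_n$ twice, once directly from the full inverse relationship matrix and once with the ANC/OTH split, so that the two results can be compared.

```python
"""Toy check of the ANC/OTH shortcut for J_n = (A^nn)^-1 A^ng 1.

Hypothetical sketch: pedigree as a Python list of (sire, dam) indices,
A from the tabular method, and dense Gaussian elimination in place of
the sparse Cholesky factorization (MKL PARDISO) used on real data.
"""


def tabular_A(ped):
    """Pedigree relationship matrix A (tabular method, handles inbreeding).

    ped[i] = (sire, dam); parents precede progeny; None = unknown parent.
    """
    n = len(ped)
    A = [[0.0] * n for _ in range(n)]
    for i, (s, d) in enumerate(ped):
        for j in range(i):
            # Covariance with an older animal j: mean of the parents' covariances.
            a = 0.5 * ((A[j][s] if s is not None else 0.0)
                       + (A[j][d] if d is not None else 0.0))
            A[i][j] = A[j][i] = a
        # Diagonal: 1 + inbreeding coefficient F_i = 0.5 * A[sire][dam].
        A[i][i] = 1.0 + (0.5 * A[s][d] if s is not None and d is not None else 0.0)
    return A


def solve(M, b):
    """Solve M x = b by Gauss-Jordan elimination with partial pivoting."""
    n = len(M)
    aug = [[float(v) for v in M[r]] + [float(b[r])] for r in range(n)]
    for c in range(n):
        p = max(range(c, n), key=lambda r: abs(aug[r][c]))
        aug[c], aug[p] = aug[p], aug[c]
        for r in range(n):
            if r != c:
                f = aug[r][c] / aug[c][c]
                for k in range(c, n + 1):
                    aug[r][k] -= f * aug[c][k]
    return [aug[r][n] / aug[r][r] for r in range(n)]


def inverse(M):
    """Dense inverse, one column at a time (fine for toy sizes only)."""
    n = len(M)
    cols = [solve(M, [1.0 if r == c else 0.0 for r in range(n)]) for c in range(n)]
    return [[cols[c][r] for c in range(n)] for r in range(n)]


def jn_direct(ped, genotyped):
    """Reference route: J_n = (A^nn)^-1 A^ng 1 on the full pedigree."""
    Ainv = inverse(tabular_A(ped))
    ng = [i for i in range(len(ped)) if i not in genotyped]
    Ann = [[Ainv[i][j] for j in ng] for i in ng]
    rhs = [sum(Ainv[i][g] for g in genotyped) for i in ng]  # A^ng 1
    return dict(zip(ng, solve(Ann, rhs)))


def jn_fast(ped, genotyped):
    """Shortcut: solve only for ANC, then one parent-average pass for OTH."""
    # (1) seen = genotyped animals plus all their ancestors; ANC = ungenotyped part.
    seen, stack = set(genotyped), list(genotyped)
    while stack:
        for p in ped[stack.pop()]:
            if p is not None and p not in seen:
                seen.add(p)
                stack.append(p)
    anc = seen - set(genotyped)
    # (2)-(3) Restricted pedigree of ga + ANC (ancestor-closed, so no parent is lost).
    sub = sorted(seen)  # original order keeps parents before progeny
    pos = {a: k for k, a in enumerate(sub)}
    sub_ped = [tuple(None if p is None else pos[p] for p in ped[a]) for a in sub]
    j_sub = jn_direct(sub_ped, {pos[g] for g in genotyped})
    J = {a: j_sub[pos[a]] for a in anc}
    # (4) OTH animals, oldest to youngest: J(i) = 0.5 * (gamma_sire + gamma_dam).
    for i, (s, d) in enumerate(ped):
        if i in seen:
            continue  # genotyped or ANC, already handled
        gam = [0.0 if p is None else -1.0 if p in genotyped else J[p] for p in (s, d)]
        J[i] = 0.5 * (gam[0] + gam[1])
    return J


# Hypothetical toy pedigree (index = animal id); animals 2 and 5 are genotyped.
PED = [(None, None), (None, None), (None, None), (0, 1), (0, None),
       (3, 2), (3, 1), (5, 4), (6, None)]
GENOTYPED = {2, 5}

if __name__ == "__main__":
    direct, fast = jn_direct(PED, GENOTYPED), jn_fast(PED, GENOTYPED)
    for i in sorted(direct):
        print(i, round(direct[i], 6), round(fast[i], 6))
```

On this toy pedigree, ANC is {0, 1, 3} and OTH is {4, 6, 7, 8}. Animal 6 is inbred, and animal 1 belongs to ANC while also being a parent of an OTH animal. Both routes give the same values. For example, J(3) = -1/3 for the ungenotyped sire of genotyped animal 5, and J(7) = -13/24 for the OTH progeny of animal 5 and OTH animal 4. The shortcut relies on the fact that an OTH animal has no genotyped descendants. Its expected value given the genotyped animals is therefore the average of its parents' values, so only the ANC block needs a linear solve.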