Confidence intervals for validation statistics with data truncation in genomic prediction

Matias Bermann; Andres Legarra; Alejandra Alvarez Munera; Ignacy Misztal; Daniela Lourenco

doi:10.1186/s12711-024-00883-w

Journal article, Genetics Selection Evolution, Year: 2024

Matias Bermann

Role: Corresponding author
PersonId: 1161538
ORCID: 0000-0002-5374-0710

Andres Legarra

Role: Author

Alejandra Alvarez Munera

Role: Author

Ignacy Misztal

Role: Author

Daniela Lourenco

Role: Author

Abstract

Background: Validation by data truncation is a common practice in genetic evaluations because of the interest in predicting the genetic merit of a set of young selection candidates. Two of the most used validation methods in genetic evaluations use a single data partition: predictivity or predictive ability (correlation between pre-adjusted phenotypes and estimated breeding values (EBV) divided by the square root of the heritability) and the linear regression (LR) method (comparison of “early” and “late” EBV). Both methods compare predictions with the whole dataset and a partial dataset that is obtained by removing the information related to a set of validation individuals. EBV obtained with the partial dataset are compared against adjusted phenotypes for the predictivity or EBV obtained with the whole dataset in the LR method. Confidence intervals for predictivity and the LR method can be obtained by replicating the validation for different samples (or folds), or bootstrapping. Analytical confidence intervals would be beneficial to avoid running several validations and to test the quality of the bootstrap intervals. However, analytical confidence intervals are unavailable for predictivity and the LR method.

Results: We derived standard errors and Wald confidence intervals for the predictivity and statistics included in the LR method (bias, dispersion, ratio of accuracies, and reliability). The confidence intervals for the bias, dispersion, and reliability depend on the relationships and prediction error variances and covariances across the individuals in the validation set. We developed approximations for large datasets that only need the reliabilities of the individuals in the validation set. The confidence intervals for the ratio of accuracies and predictivity were obtained through the Fisher transformation. We show the adequacy of both the analytical and approximated analytical confidence intervals and compare them versus bootstrap confidence intervals using two simulated examples. The analytical confidence intervals were closer to the simulated ones for both examples. Bootstrap confidence intervals tend to be narrower than the simulated ones. The approximated analytical confidence intervals were similar to those obtained by bootstrapping.

Conclusions: Estimating the sampling variation of predictivity and the statistics in the LR method without replication or bootstrap is possible for any dataset with the formulas presented in this study.
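As a rough illustration of the quantities named in the abstract, the Python sketch below computes predictivity, three LR statistics (bias, dispersion, ratio of accuracies), a Fisher-transformation interval for a correlation, and a percentile bootstrap interval. It is not the authors' derivation. The bias and dispersion formulas follow the usual LR-method definitions, which this page does not spell out, and the Fisher interval is the textbook version that assumes independent pairs, whereas the paper's analytical intervals account for relationships and prediction error (co)variances among validation individuals. All function names and the toy data are hypothetical.

```python
import math
import random
from statistics import NormalDist, mean


def pearson(x, y):
    """Pearson correlation of two equal-length sequences."""
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def predictivity(adj_phenotypes, ebv_partial, h2):
    """Correlation of adjusted phenotypes with partial EBV, divided by sqrt(h2)."""
    return pearson(adj_phenotypes, ebv_partial) / math.sqrt(h2)


def lr_statistics(ebv_partial, ebv_whole):
    """Point estimates of three LR statistics (assumed standard definitions)."""
    mp, mw = mean(ebv_partial), mean(ebv_whole)
    cov_wp = sum((w - mw) * (p - mp) for w, p in zip(ebv_whole, ebv_partial))
    var_p = sum((p - mp) ** 2 for p in ebv_partial)
    return {
        "bias": mp - mw,               # difference of means, expected near 0
        "dispersion": cov_wp / var_p,  # slope of whole on partial, expected near 1
        "ratio_of_accuracies": pearson(ebv_whole, ebv_partial),
    }


def fisher_ci(r, n, level=0.95):
    """Textbook Fisher-z interval for a correlation; assumes independent pairs."""
    z = math.atanh(r)
    half = NormalDist().inv_cdf(0.5 + level / 2) / math.sqrt(n - 3)
    return math.tanh(z - half), math.tanh(z + half)


def bootstrap_ci(x, y, stat, n_boot=1000, level=0.95, seed=1):
    """Percentile bootstrap interval, resampling individuals with replacement."""
    rng = random.Random(seed)
    n = len(x)
    draws = []
    for _ in range(n_boot):
        idx = [rng.randrange(n) for _ in range(n)]
        draws.append(stat([x[i] for i in idx], [y[i] for i in idx]))
    draws.sort()
    lo = draws[int((1 - level) / 2 * n_boot)]
    hi = draws[int((1 + level) / 2 * n_boot) - 1]
    return lo, hi


# Toy data (not from the paper): whole EBV = partial EBV + independent noise.
rng = random.Random(0)
n = 200
ebv_partial = [rng.gauss(0, 0.6) for _ in range(n)]
ebv_whole = [p + rng.gauss(0, 0.4) for p in ebv_partial]

stats = lr_statistics(ebv_partial, ebv_whole)
print(stats)
print("Fisher CI:", fisher_ci(stats["ratio_of_accuracies"], n))
print("Bootstrap CI:", bootstrap_ci(ebv_whole, ebv_partial, pearson))
```

With real evaluations, `ebv_partial` and `ebv_whole` would hold the EBV of the validation individuals from the truncated and complete datasets. The heritability passed to `predictivity` is treated here as known, which is a simplification.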

Domains

Life Sciences [q-bio]

Main file

12711_2024_Article_883.pdf (1.64 MB)

Origin: Publication funded by an institution

Contributor: BMC

https://hal.science/hal-04497822

Submitted on: Monday, 11 March 2024, 05:12:38

Last modified on: Thursday, 14 March 2024, 03:29:18

Dates and versions

hal-04497822, version 1 (11-03-2024)

Identifiers

HAL Id: hal-04497822, version 1
DOI: 10.1186/s12711-024-00883-w

Cite

Matias Bermann, Andres Legarra, Alejandra Alvarez Munera, Ignacy Misztal, Daniela Lourenco. Confidence intervals for validation statistics with data truncation in genomic prediction. Genetics Selection Evolution, 2024, 56 (1), pp.18. ⟨10.1186/s12711-024-00883-w⟩. ⟨hal-04497822⟩

Collections

ARINRAE-GSE ARINRAE
