Learning the optimal scale for GWAS through hierarchical SNP aggregation

Background: Genome-Wide Association Studies (GWAS) seek to identify causal genomic variants associated with rare human diseases. The classical statistical approach for detecting these variants is based on univariate hypothesis testing, with healthy individuals being tested against affected individuals at each locus. Given that an individual's genotype is characterized by up to one million SNPs, this approach lacks precision, since it may yield a large number of false positives that can lead to erroneous conclusions about genetic associations with the disease. One way to improve the detection of true genetic associations is to reduce the number of hypotheses to be tested by grouping SNPs.

Results: We propose a dimension-reduction approach which can be applied in the context of GWAS by making use of the haplotype structure of the human genome. We compare our method with standard univariate and group-based approaches on both synthetic and real GWAS data.

Conclusion: We show that reducing the dimension of the predictor matrix by aggregating SNPs gives a greater precision in the detection of associations between the phenotype and genomic regions.
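As a rough illustration of the aggregation idea summarized in the abstract, the sketch below groups correlated SNP columns by hierarchical clustering and replaces each group with a single summed predictor. It is not the authors' implementation: the function name `aggregate_snps`, the use of squared correlation as a stand-in for linkage disequilibrium, the unconstrained average-linkage clustering, the allele-count sum as aggregation function, and the simulated genotype matrix are all assumptions made for this example.

```python
import numpy as np
from scipy.cluster.hierarchy import fcluster, linkage
from scipy.spatial.distance import squareform


def aggregate_snps(genotypes, n_groups):
    """Cluster SNP columns and sum allele counts within each cluster.

    Hypothetical helper: squared correlation is used as a simple proxy
    for linkage disequilibrium, which is an assumption of this sketch.
    """
    # Squared correlation between SNP columns (values in [0, 1]).
    r2 = np.corrcoef(genotypes, rowvar=False) ** 2
    dist = 1.0 - r2
    np.fill_diagonal(dist, 0.0)
    # linkage expects a condensed (upper-triangular) distance vector.
    condensed = squareform(dist, checks=False)
    tree = linkage(condensed, method="average")
    # Cut the tree into at most n_groups clusters.
    labels = fcluster(tree, t=n_groups, criterion="maxclust")
    # One aggregated predictor per cluster: sum of allele counts.
    groups = np.unique(labels)
    aggregated = np.column_stack(
        [genotypes[:, labels == g].sum(axis=1) for g in groups]
    )
    return aggregated, labels


# Simulated data (assumption): 8 blocks of 5 near-identical SNPs each.
rng = np.random.default_rng(0)
n, p = 200, 40
base = rng.binomial(2, 0.3, size=(n, 8))
genotypes = np.repeat(base, 5, axis=1)
# Resample about 10% of entries so SNPs in a block are not exact copies.
mask = rng.random((n, p)) < 0.1
genotypes = np.where(mask, rng.binomial(2, 0.3, size=(n, p)), genotypes)

aggregated, labels = aggregate_snps(genotypes, n_groups=8)
print(aggregated.shape)  # number of tests drops from p to at most 8
```

Testing the aggregated columns instead of the individual SNPs reduces the number of hypotheses from `p` to the number of clusters. Choosing where to cut the tree is the scale-selection question the title refers to, and this sketch simply fixes it by hand.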

Keywords

Genome-wide association study; Hierarchical clustering; Statistical genetics; Variable selection

Domains

Mathematics [math]; Computer Science [cs]

Main file

2018_Guinot_BMC Bioinformatics_1.pdf (2.04 MB)

Origin: Publisher files allowed on an open archive


https://hal.inrae.fr/hal-02623460

Submitted on: Tuesday, May 26, 2020, 08:16:37

Last modified on: Tuesday, June 4, 2024, 21:30:12

Dates and versions

hal-02623460, version 1 (26-05-2020)

License

Attribution

Identifiers

HAL Id: hal-02623460, version 1
DOI: 10.1186/s12859-018-2475-9
PRODINRA: 463597
PUBMED: 30497371
WOS: 000451684500004

Cite

Florent Guinot, Marie Szafranski, Christophe Ambroise, Franck Samson. Learning the optimal scale for GWAS through hierarchical SNP aggregation. BMC Bioinformatics, 2018, 19, ⟨10.1186/s12859-018-2475-9⟩. ⟨hal-02623460⟩


Collections

AGROPARISTECH CNRS UNIV-EVRY INRA INSMI LAMME UNIV-PARIS-SACLAY INRAE GS-ENGINEERING MATHNUM ENSIIE
