A model selection criterion for model-based clustering of annotated gene expression data

Mélina Gallopin; Gilles Celeux; Florence Jaffrezic; Andrea Rau

doi:10.1515/sagmb-2014-0095

Journal article, Statistical Applications in Genetics and Molecular Biology, Year: 2015

Mélina Gallopin

Role: Author

Génétique Animale et Biologie Intégrative

Gilles Celeux

Role: Author
PersonId: 833415
ORCID: 0000-0002-7221-6594
IdRef: 02951598X

Model selection in statistical learning

Laboratoire de Mathématiques d'Orsay

Florence Jaffrezic

Role: Author
PersonId: 744399
IdHAL: florence-jaffrezic
ORCID: 0000-0001-7579-6419
IdRef: 14446554X

Génétique Animale et Biologie Intégrative

Andrea Rau

Role: Author
PersonId: 744212
IdHAL: andrea-rau
ORCID: 0000-0001-6469-488X
IdRef: 196132118

Génétique Animale et Biologie Intégrative

Abstract

In co-expression analyses of gene expression data, it is often of interest to interpret clusters of co-expressed genes with respect to a set of external information, such as a potentially incomplete list of functional properties for which a subset of genes may be annotated. Based on the framework of finite mixture models, we propose a model selection criterion that takes into account such external gene annotations, providing an efficient tool for selecting a relevant number of clusters and clustering model. This criterion, called the integrated completed annotated likelihood (ICAL), is defined by adding an entropy term to a penalized likelihood to measure the concordance between a clustering partition and the external annotation information. The ICAL leads to the choice of a model that is more easily interpretable with respect to the known functional gene annotations. We illustrate the usefulness of this model selection criterion in conjunction with Gaussian mixture models on simulated gene expression data and on real RNA-seq data.
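
The general shape of such a criterion can be sketched in a few lines of Python. This is an illustration only, not the authors' implementation: the abstract does not give the exact ICAL formula, so the function names and, in particular, the form of the concordance term (a log-likelihood of binary annotations under cluster-specific Bernoulli frequencies) are assumptions made here. The sketch also assumes complete binary annotations, whereas the abstract allows for incomplete ones.

```python
import math

def penalized_loglik(loglik, n_params, n_obs):
    """BIC-style penalized log-likelihood (larger is better)."""
    return loglik - 0.5 * n_params * math.log(n_obs)

def classification_entropy(tau):
    """Entropy of the soft memberships tau[i][k]; zero when every gene
    is assigned to one cluster with certainty."""
    return -sum(t * math.log(t) for row in tau for t in row if t > 0)

def annotation_concordance(labels, annotations):
    """Hypothetical concordance term (an assumption, not the paper's formula):
    log-likelihood of binary annotations under cluster-specific Bernoulli
    frequencies. It is never positive and equals zero when each annotation
    is constant within every cluster."""
    n_terms = len(annotations[0])
    total = 0.0
    for k in sorted(set(labels)):
        members = [a for a, z in zip(annotations, labels) if z == k]
        n_k = len(members)
        for g in range(n_terms):
            n_1 = sum(row[g] for row in members)  # annotated genes in cluster k
            for count in (n_1, n_k - n_1):
                if count > 0:
                    total += count * math.log(count / n_k)
    return total

def annotated_criterion(loglik, n_params, tau, annotations):
    """Penalized likelihood, minus classification entropy, plus the
    (assumed) annotation concordance term, using MAP cluster labels."""
    labels = [max(range(len(row)), key=row.__getitem__) for row in tau]
    return (penalized_loglik(loglik, n_params, len(tau))
            - classification_entropy(tau)
            + annotation_concordance(labels, annotations))
```

In use, one would compute this value for each candidate number of clusters and clustering model and keep the model with the largest value. Under this sketch, a partition whose clusters line up with the annotations is not penalized by the concordance term, while a partition that splits annotated genes evenly across clusters is.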

Keywords

functional gene annotation; gene expression data; model-based clustering; model selection

Domains

Applications [stat.AP]

Main file

2015_Gallopin_SAGMB_1.pdf (2.28 MB)

Origin: Publisher files allowed on an open archive

https://inria.hal.science/hal-01255908

Submitted on: Thursday, May 28, 2020, 14:38:41

Last modified on: Monday, January 29, 2024, 12:32:23

Dates and versions

hal-01255908, version 1 (28-05-2020)

License

Attribution - NonCommercial

Identifiers

HAL Id: hal-01255908, version 1
DOI: 10.1515/sagmb-2014-0095
PRODINRA: 336851
PUBMED: 26461845
WOS: 000364311000001

Cite

Mélina Gallopin, Gilles Celeux, Florence Jaffrezic, Andrea Rau. A model selection criterion for model-based clustering of annotated gene expression data. Statistical Applications in Genetics and Molecular Biology, 2015, 14 (5), pp.413-428. ⟨10.1515/sagmb-2014-0095⟩. ⟨hal-01255908⟩

Collections

AGROPARISTECH CNRS INRIA INRA LM-ORSAY INRIA2 UNIV-PARIS-SACLAY INRAE GENETIQUE_ANIMALE GS-MATHEMATIQUES GS-COMPUTER-SCIENCE GS-BIOSPHERA GABI

256 Views

290 Downloads
