Comparative analysis of missing value imputation methods to improve clustering and interpretation of microarray experiments.

Magalie Celton; Alain Malpertuy; Gaëlle Lelandais; Alexandre de Brevern

doi:10.1186/1471-2164-11-15

Journal article, BMC Genomics, 2010



Magalie Celton

Role: Author
PersonId : 919260

Sciences Pour l'Oenologie

Bioinformatique génomique et moléculaire

Alain Malpertuy

Role: Author
PersonId : 833206

Atragene Informatics

Gaëlle Lelandais

Role: Author
PersonId : 879919

Bioinformatique génomique et moléculaire

Dynamique des Structures et Interactions des Macromolécules Biologiques

Alexandre de Brevern

Role: Corresponding author
PersonId : 9903
IdHAL : alexandre-de-brevern
ORCID : 0000-0001-7112-5626
IdRef : 135431697


Bioinformatique génomique et moléculaire

Dynamique des Structures et Interactions des Macromolécules Biologiques

Abstract

BACKGROUND: Microarray technologies produce large amounts of data. In a previous study, we showed the value of the k-Nearest Neighbour (kNN) approach for restoring missing gene expression values, and its positive impact on gene clustering by hierarchical algorithms. Since then, numerous replacement methods have been proposed to impute missing values (MVs) in microarray data. In this study, we evaluated twelve different available methods and their influence on the quality of gene clustering. We used several datasets, covering both kinetic and non-kinetic experiments in yeast and human.

RESULTS: We underline the excellent efficiency of the approaches proposed and implemented by Bo and co-workers, especially the one based on expectation maximization (EM_array). These improvements were also observed for the imputation of extreme values, which are the most difficult to predict. We showed that imputed MVs still have important effects on the stability of gene clusters. The improvement in the clusters obtained by hierarchical clustering remains limited and is not sufficient to fully restore the correct gene associations. However, a common trend links the quality of the imputation method to gene cluster stability. Although comparing clustering algorithms is a complex task, we observed that the k-means approach is more effective at conserving gene associations.

CONCLUSIONS: More than 6,000,000 independent simulations assessed the quality of 12 imputation methods on five very different biological datasets. Important improvements have thus been made since our last study. The EM_array approach is an efficient method for restoring missing gene expression values, with a lower estimation error level. Nonetheless, the presence of MVs, even at a low rate, is a major factor in gene cluster instability. Our study highlights the need for a systematic assessment of imputation methods and hence for dedicated benchmarks. A notable point is the specific influence of some biological datasets.
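The evaluation idea behind such simulations (hide known expression values, restore them with an imputation method, and measure the estimation error) can be sketched in Python. This is a minimal illustration under stated assumptions, not the authors' code and not EM_array: it uses a simple unweighted kNN imputation over genes (rows), with root-mean-square distance over co-observed conditions, and scores the root-mean-square error (RMSE) on the hidden cells only. The helper names `knn_impute`, `mask_values` and `rmse`, the 5% masking rate, k = 3 and the toy 20 x 8 matrix are hypothetical choices made for illustration.

```python
import math
import random


def knn_impute(matrix, k=3):
    """Hypothetical helper: fill None cells with the mean of the k nearest genes (rows)."""
    imputed = [row[:] for row in matrix]
    for i, row in enumerate(matrix):
        missing = [j for j, v in enumerate(row) if v is None]
        if not missing:
            continue
        observed = [j for j, v in enumerate(row) if v is not None]
        candidates = []
        for r, other in enumerate(matrix):
            # a neighbour must be a different gene observed at every missing condition
            if r == i or any(other[j] is None for j in missing):
                continue
            shared = [j for j in observed if other[j] is not None]
            if not shared:
                continue
            # root-mean-square distance over the conditions both genes share
            d = math.sqrt(sum((row[j] - other[j]) ** 2 for j in shared) / len(shared))
            candidates.append((d, r))
        candidates.sort()
        neighbours = [matrix[r] for _, r in candidates[:k]]
        for j in missing:
            if neighbours:
                # unweighted mean of the neighbours' values at this condition
                imputed[i][j] = sum(n[j] for n in neighbours) / len(neighbours)
            else:
                # fallback: column mean of the observed values (0.0 if none)
                col = [m[j] for m in matrix if m[j] is not None]
                imputed[i][j] = sum(col) / len(col) if col else 0.0
    return imputed


def mask_values(matrix, rate, rng):
    """Hide a random fraction of known cells; return the masked copy and the hidden cells."""
    masked = [row[:] for row in matrix]
    cells = [(i, j) for i in range(len(matrix)) for j in range(len(matrix[0]))]
    hidden = rng.sample(cells, max(1, int(rate * len(cells))))
    for i, j in hidden:
        masked[i][j] = None
    return masked, hidden


def rmse(truth, estimate, cells):
    """Root-mean-square error restricted to the given (row, column) cells."""
    return math.sqrt(sum((truth[i][j] - estimate[i][j]) ** 2 for i, j in cells) / len(cells))


rng = random.Random(0)
# toy matrix (illustrative only): 20 genes x 8 conditions in two anti-correlated groups
complete = [
    [(1 if g < 10 else -1) * math.sin(t / 2) + rng.gauss(0, 0.1) for t in range(8)]
    for g in range(20)
]
masked, hidden = mask_values(complete, 0.05, rng)
restored = knn_impute(masked, k=3)
# naive baseline for comparison: replace every missing value by 0.0
zero_filled = [[0.0 if v is None else v for v in row] for row in masked]
print("kNN RMSE:", round(rmse(complete, restored, hidden), 3))
print("zero-fill RMSE:", round(rmse(complete, zero_filled, hidden), 3))
```

Repeating the masking step with different seeds, rates and imputation methods would give the kind of error distribution such a comparison relies on; clustering the restored matrix and comparing it with the clustering of the complete matrix would address cluster stability, which this sketch does not cover.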

Subjects

Genomics, Transcriptomics and Proteomics [q-bio.GN]

Main file

1471-2164-11-15.pdf (661.64 KB)

1471-2164-11-15-S1.DOC (38.5 KB)

1471-2164-11-15-S2.DOC (68 KB)

1471-2164-11-15-S3.DOC (89 KB)

1471-2164-11-15-S4.DOC (32.5 KB)

1471-2164-11-15-S5.DOC (44.5 KB)

1471-2164-11-15.xml (226.4 KB)

Origin: Publisher files authorized for deposit in an open archive

Format: Other


https://inserm.hal.science/inserm-00663912

Submitted on: Friday 27 January 2012, 17:18:40

Last modified on: Wednesday 6 March 2024, 03:23:28

Long-term archiving on: Monday 19 November 2012, 15:16:34

Dates and versions

inserm-00663912 , version 1 (27-01-2012)

Identifiers

HAL Id : inserm-00663912 , version 1
DOI : 10.1186/1471-2164-11-15
PRODINRA : 207490
PUBMED : 20056002
WOS : 000275279300001

Cite

Magalie Celton, Alain Malpertuy, Gaëlle Lelandais, Alexandre de Brevern. Comparative analysis of missing value imputation methods to improve clustering and interpretation of microarray experiments. BMC Genomics, 2010, 11 (1), pp.15. ⟨10.1186/1471-2164-11-15⟩. ⟨inserm-00663912⟩


Collections

INSERM UNIV-PARIS7 INRA BA UNIV-MONTPELLIER SPO INSTITUT-AGRO-MONTPELLIER INRAE INRAEOCCITANIEMONTPELLIER

