Accéder directement au contenu Accéder directement à la navigation
Article dans une revue

FastMG: a simple, fast, and accurate maximum likelihood procedure to estimate amino acid replacement rate matrices from large data sets

Abstract : Background
Amino acid replacement rate matrices are a crucial component of many protein analysis systems such as sequence similarity search, sequence alignment, and phylogenetic inference. Ideally, the rate matrix reflects the mutational behavior of the actual data under study; however, estimating amino acid replacement rate matrices requires large protein alignments and is computationally expensive and complex. As a compromise, sub-optimal pre-calculated generic matrices are typically used for protein-based phylogeny. Sequence availability has now grown to a point where problem-specific rate matrices can often be calculated if the computational cost can be controlled.
Results
The most time consuming step in estimating rate matrices by maximum likelihood is building maximum likelihood phylogenetic trees from protein alignments. We propose a new procedure, called FastMG, to overcome this obstacle. The key innovation is the alignment-splitting algorithm that splits alignments with many sequences into non-overlapping sub-alignments prior to estimating amino acid replacement rates. Experiments with different large data sets showed that the FastMG procedure was an order of magnitude faster than without splitting. Importantly, there was no apparent loss in matrix quality if an appropriate splitting procedure is used.
Conclusions
FastMG is a simple, fast and accurate procedure to estimate amino acid replacement rate matrices from large data sets. It enables researchers to study the evolutionary relationships for specific groups of proteins or taxa with optimized, data-specific amino acid replacement rate matrices. The programs, data sets, and the new mammalian mitochondrial protein rate matrix are available at http://fastmg.codeplex.com webcite.
Type de document :
Article dans une revue
Liste complète des métadonnées

Littérature citée [28 références]  Voir  Masquer  Télécharger

https://hal-lirmm.ccsd.cnrs.fr/lirmm-01141183
Déposant : Isabelle Gouat <>
Soumis le : lundi 4 mars 2019 - 14:57:33
Dernière modification le : mercredi 24 février 2021 - 16:16:06
Archivage à long terme le : : mercredi 5 juin 2019 - 15:13:16

Fichier

1471-2105-15-341.pdf
Fichiers produits par l'(les) auteur(s)

Licence


Distributed under a Creative Commons Paternité 4.0 International License

Identifiants

Citation

Cuong Cao Dang, Vinh Sy Le, Olivier Gascuel, Bart Hazes, Quang Si Le. FastMG: a simple, fast, and accurate maximum likelihood procedure to estimate amino acid replacement rate matrices from large data sets. BMC Bioinformatics, BioMed Central, 2014, 15 (1), pp.341-343. ⟨10.1186/1471-2105-15-341⟩. ⟨lirmm-01141183⟩

Partager

Métriques

Consultations de la notice

411

Téléchargements de fichiers

336