Improved sensitivity and reliability of anchor based genome alignment

Whole genome alignment is a challenging problem in computational comparative genomics. It is essential for the functional annotation of genomes, for understanding their evolution, and for phylogenomics. Many global alignment programs are heuristic variants of the anchor-based strategy, which first detects similarities between the genomes and then selects a subset of them forming an ordered chain. Since existing alignment tools fail to align some pairs of bacterial strains, we investigate whether this failure is intrinsic to the strategy or due to a lack of sensitivity of the similarity detection method. To this end, we implement six programs based on three different detection methods (ranging from exact matches to local alignments) and compare them on a large benchmark set. Our results suggest that the sensitivity of well-known methods such as MGA or Mauve can be greatly improved for divergent genomes by using spaced seeds in the detection phase. In the other cases, such methods already yield alignments that cover nearly the whole genome. We then focus on the global reliability of alignments: should an aligned pair of segments be included in the global genome alignment? We assess this reliability according to both segment "alignability" and the inclusion of orthologs. Again, we provide evidence that, for both close and divergent genomes, one of our programs, YH, produces alignments that sometimes have lower coverage but include more orthologs. This opens the way to the first reliable alignments for some highly divergent species such as Buchnera aphidicola or Prochlorococcus marinus.
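The two phases of the anchor-based strategy (detect similarities, then select an ordered chain of them) can be illustrated with a toy sketch. It is not the code of MGA, Mauve or YH: the function names `spaced_seed_hits` and `best_chain`, the `Anchor` tuple, the use of anchor length as chain weight, and the restriction to a single colinear chain are all hypothetical simplifications. A spaced seed such as `11011` requires identical letters only at its `1` positions, so it can still report a similarity that contains a substitution at a `0` position, which is why spaced seeds can be more sensitive than exact-match seeds on divergent genomes.

```python
from typing import Dict, List, Tuple

# Hypothetical anchor type: (start in genome A, start in genome B, length).
Anchor = Tuple[int, int, int]


def spaced_seed_hits(a: str, b: str, seed: str) -> List[Anchor]:
    """Detection phase (toy version): report every pair of windows of a and b
    that agree on all '1' (care) positions of the spaced seed; '0' positions
    are "don't care" and may hold a mismatch."""
    span = len(seed)
    care = [k for k, c in enumerate(seed) if c == "1"]
    # Index every window of genome A by the letters at its care positions.
    index: Dict[str, List[int]] = {}
    for i in range(len(a) - span + 1):
        key = "".join(a[i + k] for k in care)
        index.setdefault(key, []).append(i)
    # Scan genome B and look up each window's key in the index.
    hits: List[Anchor] = []
    for j in range(len(b) - span + 1):
        key = "".join(b[j + k] for k in care)
        for i in index.get(key, []):
            hits.append((i, j, span))
    return hits


def best_chain(anchors: List[Anchor]) -> List[Anchor]:
    """Selection phase (toy version): keep a maximum-weight ordered chain of
    anchors that are colinear and non-overlapping in both genomes. The weight
    is simply the anchor length; O(n^2) dynamic programming."""
    if not anchors:
        return []
    anchors = sorted(anchors)
    n = len(anchors)
    score = [length for _, _, length in anchors]
    prev = [-1] * n
    for k in range(n):
        ik, jk, lk = anchors[k]
        for p in range(k):
            ip, jp, lp = anchors[p]
            # Anchor p may precede k only if it ends before k starts in A and B.
            if ip + lp <= ik and jp + lp <= jk and score[p] + lk > score[k]:
                score[k] = score[p] + lk
                prev[k] = p
    # Backtrack from the end of the best-scoring chain.
    k = max(range(n), key=score.__getitem__)
    chain: List[Anchor] = []
    while k != -1:
        chain.append(anchors[k])
        k = prev[k]
    return chain[::-1]
```

For example, on the toy pair `ACGTA`/`ACCTA`, the spaced seed `11011` reports the hit `(0, 0, 5)` despite the substitution at the third base, whereas the contiguous seed `1111`, which has the same weight, reports nothing. Real tools additionally handle the reverse strand, rearrangements and seed extension into local alignments, all of which this sketch ignores.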

Keywords

anchor-based strategy; spaced seeds; global genome alignment

Domains

Mathematics [math]; Computer Science [cs]; Life Sciences [q-bio]

Main file

jobim2009-actes_1.pdf (24.37 MB)

Origin: Files produced by the author(s)

Contributor: Migration ProdInra

https://hal.inrae.fr/hal-02751358

Submitted on: Wednesday, 3 June 2020, 17:47:35

Last modified on: Thursday, 14 March 2024, 03:13:40

Long-term archiving on: Friday, 4 December 2020, 17:43:19

Dates and versions

hal-02751358, version 1 (03-06-2020)

Identifiers

HAL Id: hal-02751358, version 1
PRODINRA: 197233

Cite

Raluca Uricaru, Célia Michotey, Laurent Noé, Helene H. Chiapello, Eric Rivals. Improved sensitivity and reliability of anchor based genome alignment. JOBIM 2009 - Journées Ouvertes en Biologie Informatique Mathématiques, Jun 2009, Nantes, France. 259 p. ⟨hal-02751358⟩

Collections

CNRS, INRA, UNIV-MONTPELLIER, UNIV-LILLE, INRAE, MATHNUM, DPT_ECODIV