Comparing the Statistical Fate of Paralogous and Orthologous Sequences

Florian Massip; Michael Sheinman; Sophie Schbath; Peter Arndt

doi:10.1534/genetics.116.193912

Journal article, Genetics, Year: 2016


Florian Massip

Role: Corresponding author
PersonId : 1275950

Mathématiques et Informatique Appliquées du Génome à l'Environnement [Jouy-En-Josas]

Max Planck Institute for Molecular Genetics

Laboratoire de Biométrie et Biologie Evolutive - UMR 5558

Michael Sheinman

Role: Author

Universiteit Utrecht / Utrecht University [Utrecht]

Sophie Schbath

Role: Author
PersonId : 183444
IdHAL : sophie-schbath
ORCID : 0000-0003-3574-8222
IdRef : 07553424X

Mathématiques et Informatique Appliquées du Génome à l'Environnement [Jouy-En-Josas]

Peter Arndt

Role: Author

Max Planck Institute for Molecular Genetics

Abstract

For several decades, sequence alignment has been a widely used tool in bioinformatics. For instance, finding homologous sequences with a known function in large databases is used to gain insight into the function of nonannotated genomic regions. Very efficient tools like BLAST have been developed to identify and rank possible homologous sequences. To estimate the significance of the homology, the ranking of alignment scores takes a background model for random sequences into account. Using this model we can estimate the probability of finding two exactly matching subsequences by chance in two unrelated sequences. For two homologous sequences, the corresponding probability is much higher, which allows us to identify them. Here we focus on the distribution of lengths of exact sequence matches between protein-coding regions of pairs of evolutionarily distant genomes. We show that this distribution exhibits a power-law tail with an exponent α = −5. Developing a simple model of sequence evolution by substitutions and segmental duplications, we show analytically and computationally that paralogous and orthologous gene pairs contribute differently to this distribution. Our model explains the differences observed in the comparison of coding and noncoding parts of genomes, thus providing a better understanding of statistical properties of genomic sequences and their evolution.
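
The quantity the abstract focuses on, the distribution of lengths of exact sequence matches between two sequences, can be illustrated with a small sketch. The code below is not the authors' pipeline. It is a minimal quadratic-time dynamic program, and it assumes that "exact match" means a maximal exact match (one that cannot be extended at either end). The function name `maximal_exact_match_lengths` is hypothetical. A power-law tail with exponent α = −5 would mean that the number of matches of length r falls off roughly as r^(−5) for large r.

```python
from collections import Counter


def maximal_exact_match_lengths(a: str, b: str) -> Counter:
    """Count maximal exact matches between a and b, keyed by match length.

    Assumption: a match is counted only if it cannot be extended to the
    left or to the right. Runs in O(len(a) * len(b)) time, so it is only
    suitable for short illustrative sequences, not whole genomes.
    """
    counts: Counter = Counter()
    # prev[j] holds the length of the longest common suffix of
    # a[:i-1] and b[:j]; curr[j] holds the same for a[:i] and b[:j].
    prev = [0] * (len(b) + 1)
    for i in range(1, len(a) + 1):
        curr = [0] * (len(b) + 1)
        for j in range(1, len(b) + 1):
            if a[i - 1] == b[j - 1]:
                curr[j] = prev[j - 1] + 1
                # The match ending here is right-maximal if either
                # sequence ends or the next characters differ. It is
                # left-maximal by construction of the suffix length.
                at_end = i == len(a) or j == len(b)
                if at_end or a[i] != b[j]:
                    counts[curr[j]] += 1
        prev = curr
    return counts


# Usage: two identical sequences with no repeated letters share a
# single maximal match spanning the whole sequence.
print(maximal_exact_match_lengths("ACGT", "ACGT"))  # Counter({4: 1})
```

Plotting the resulting counts against match length on log-log axes is one way to inspect the tail. Genome-scale comparisons would need a more efficient matching method than this quadratic sketch.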

Keywords

Comparative genomics; Statistical genomics; DNA duplications; Genome evolution

Domains

Mathematics [math]; Life Sciences [q-bio]

Contributor: Sophie SCHBATH

https://hal.inrae.fr/hal-04181589

Submitted on: Wednesday, August 16, 2023, 12:10:30

Last modified on: Wednesday, March 27, 2024, 03:24:20

Dates and versions

hal-04181589 , version 1 (16-08-2023)

Identifiers

HAL Id : hal-04181589 , version 1
DOI : 10.1534/genetics.116.193912

Cite

Florian Massip, Michael Sheinman, Sophie Schbath, Peter Arndt. Comparing the Statistical Fate of Paralogous and Orthologous Sequences. Genetics, 2016, 204 (2), pp.475-482. ⟨10.1534/genetics.116.193912⟩. ⟨hal-04181589⟩

Collections

CNRS INRIA UNIV-LYON1 BIOENVIS UNIV-PARIS-SACLAY LBBE UDL INRAE GS-MATHEMATIQUES GS-COMPUTER-SCIENCE GS-BIOSPHERA MAIAGE MATHNUM
