Numerical comparison of several approximations of the word count distribution in random sequences

Stephane S. Robin; Sophie S. Schbath

Journal article, Journal of Computational Biology, Year: 2001


Stephane S. Robin

Role: Author
PersonId : 15469
IdHAL : scjrobin
ORCID : 0000-0003-1045-069X
IdRef : 052503720

Mathématiques et Informatique Appliquées

Sophie S. Schbath

Role: Author
PersonId : 183444
IdHAL : sophie-schbath
ORCID : 0000-0003-3574-8222
IdRef : 07553424X

Unité Mathématique Informatique et Génome

Abstract

The exact distribution of word counts in random sequences and several approximations of it have been proposed in the past few years. The exact distribution has no theoretical limitation but may require prohibitive computation time. Approximate distributions, on the other hand, can be calculated rapidly but, in practice, are accurate only under specific conditions. After surveying these distributions, we compare them according to both their accuracy and their computational cost. Rules are suggested for choosing between the Gaussian approximations, the compound Poisson approximation, and the exact distribution. This work is illustrated with the detection of exceptional words in the phage Lambda genome.
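As a rough illustration of what a word count approximation involves, the following Python sketch counts the overlapping occurrences of a word, computes its expected count, and evaluates a Poisson upper-tail probability. It is not the method of the article: it assumes independent, identically distributed letters (the keywords indicate the article works with Markov chains), and it uses a plain Poisson law, which is only plausible for rare words that cannot overlap themselves. The function names `count_occurrences`, `expected_count`, and `poisson_upper_tail`, the toy sequence, and the uniform letter probabilities are all hypothetical choices made for this example.

```python
import math


def count_occurrences(seq, word):
    """Count occurrences of `word` in `seq`, overlapping ones included."""
    last_start = len(seq) - len(word)
    return sum(1 for i in range(last_start + 1) if seq.startswith(word, i))


def expected_count(seq_len, word, letter_probs):
    """Expected count of `word` under an i.i.d. letter model (an assumption
    of this sketch, simpler than a Markov chain model)."""
    p_word = math.prod(letter_probs[c] for c in word)
    return (seq_len - len(word) + 1) * p_word


def poisson_upper_tail(k, mean):
    """Return P(X >= k) for X ~ Poisson(mean)."""
    if k <= 0:
        return 1.0
    term = math.exp(-mean)  # P(X = 0)
    cdf = term
    for j in range(1, k):
        term *= mean / j    # P(X = j) from P(X = j - 1)
        cdf += term
    return max(0.0, 1.0 - cdf)


# Toy usage: a made-up sequence and uniform letter probabilities.
sequence = "ACGTACGGTACGATTACGCA"
word = "ACG"
uniform = {"A": 0.25, "C": 0.25, "G": 0.25, "T": 0.25}

observed = count_occurrences(sequence, word)
mean = expected_count(len(sequence), word, uniform)
print(observed, mean, poisson_upper_tail(observed, mean))
```

A small tail probability would flag the word as unexpectedly frequent under this simplified model. For words that can overlap themselves, occurrences arrive in clumps, which is the situation the compound Poisson approximation named in the abstract is meant to handle.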

Keywords

EXCEPTIONAL WORDS; WORD COUNT; MARKOV-CHAINS; COMPOUND POISSON; APPROXIMATE DISTRIBUTIONS; EXACT DISTRIBUTION; SIMULATIONS; OCCURRENCES

Domains

Bioinformatics, Systems Biology [q-bio.QM]


https://hal.inrae.fr/hal-02675878

Submitted on: Sunday, May 31, 2020, 18:04:50

Last modified on: Tuesday, March 12, 2024, 10:45:25

Dates and versions

hal-02675878 , version 1 (31-05-2020)

Identifiers

HAL Id : hal-02675878 , version 1
PRODINRA : 39661
WOS : 000171024100001

Cite

Stephane S. Robin, Sophie S. Schbath. Numerical comparison of several approximations of the word count distribution in random sequences. Journal of Computational Biology, 2001, 8 (4), pp.349-359. ⟨hal-02675878⟩


Collections

AGROPARISTECH INRA MIA-PARIS INRAE MATHNUM
