Compound Poisson approximation of word counts in DNA sequences

Sophie Schbath

Journal article, ESAIM: Probability and Statistics, Year: 1995

Role: Author
PersonId: 183444
IdHAL: sophie-schbath
ORCID: 0000-0003-3574-8222
IdRef: 07553424X

Unité de biométrie et intelligence artificielle de Jouy

Abstract

Identifying words with unexpected frequencies is an important problem in the analysis of long DNA sequences. Solving it requires an approximation of the distribution of the number of occurrences N(W) of a word W. Modeling DNA sequences as Markov chains of order m, we use the Chen-Stein method to obtain approximations for two different counts: the "declumped" count of W is approximated by a Poisson variable, and the number of occurrences N(W) by a compound Poisson variable. Combinatorial results are used to handle the general case of overlapping words and to calculate the parameters of these distributions.
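The two counts named in the abstract can be illustrated on a toy sequence. The sketch below is not taken from the paper: the function names `occurrence_positions` and `clump_sizes` are hypothetical, and it assumes the usual reading of "clump" as a maximal run of mutually overlapping occurrences of W, so that the declumped count is the number of clumps and N(W) is the sum of the clump sizes.

```python
def occurrence_positions(sequence, word):
    """Return the start indices of all occurrences of word, overlaps included."""
    positions = []
    start = sequence.find(word)
    while start != -1:
        positions.append(start)
        # Advance by one character only, so overlapping occurrences are found.
        start = sequence.find(word, start + 1)
    return positions


def clump_sizes(sequence, word):
    """Group occurrences into clumps and return the size of each clump.

    Assumption: two consecutive occurrences belong to the same clump when
    their start positions differ by less than len(word), i.e. they overlap.
    """
    sizes = []
    previous = None
    for pos in occurrence_positions(sequence, word):
        if previous is not None and pos - previous < len(word):
            sizes[-1] += 1   # overlaps the previous occurrence: same clump
        else:
            sizes.append(1)  # no overlap: a new clump starts here
        previous = pos
    return sizes


sequence = "TTATATATTACGATAT"
word = "ATA"
sizes = clump_sizes(sequence, word)
n_occurrences = sum(sizes)   # N(W): every occurrence counted
n_clumps = len(sizes)        # declumped count: one per clump
print(n_occurrences, n_clumps)
```

In this toy example the word ATA overlaps itself, so occurrences arrive in clumps. This is the situation in which, per the abstract, the clump count is approximated by a Poisson variable while the total count calls for a compound Poisson variable.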

Keywords

COMBINATORICS

Subject areas

Life Sciences [q-bio]

https://hal.inrae.fr/hal-02699788

Submitted on: Monday, 1 June 2020, 12:54:12

Last modified on: Tuesday, 12 March 2024, 10:47:05

Dates and versions

hal-02699788, version 1 (01-06-2020)

Identifiers

HAL Id: hal-02699788, version 1
PRODINRA: 135890

Cite

Sophie Schbath. Compound Poisson approximation of word counts in DNA sequences. ESAIM: Probability and Statistics, 1995, 1 (1), pp.1-16. ⟨hal-02699788⟩

Collections

INRA INRAE MATHNUM
