Compound Poisson approximation of word counts in DNA sequences - INRAE - Institut national de recherche pour l’agriculture, l’alimentation et l’environnement
Journal article, ESAIM: Probability and Statistics, Year: 1995


Abstract

Identifying words with unexpected frequencies is an important problem in the analysis of long DNA sequences. To solve it, we need an approximation of the distribution of the number of occurrences N(W) of a word W. Modeling DNA sequences with m-order Markov chains, we use the Chen-Stein method to obtain Poisson approximations for two different counts. We approximate the "declumped" count of W by a Poisson variable and the number of occurrences N(W) by a compound Poisson variable. Combinatorial results are used to handle the general case of overlapping words and to calculate the parameters of these distributions.
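The two counts contrasted in the abstract can be illustrated with a minimal Python sketch. This is not the paper's implementation: the function names are hypothetical, and the compound Poisson part assumes clump sizes that are geometric on {1, 2, ...} with an illustrative parameter `a`. The paper derives the Poisson mean and clump-size law from the Markov model and the overlap structure of W, which this sketch does not reproduce; `lam` and `a` are simply taken as inputs.

```python
import math


def occurrence_positions(seq: str, word: str) -> list[int]:
    """Start positions of every (possibly overlapping) occurrence of word in seq."""
    h = len(word)
    return [i for i in range(len(seq) - h + 1) if seq[i:i + h] == word]


def count_occurrences(seq: str, word: str) -> int:
    """N(W): total number of occurrences, overlaps included."""
    return len(occurrence_positions(seq, word))


def count_clumps(seq: str, word: str) -> int:
    """Declumped count: number of clumps (maximal runs of overlapping occurrences).

    Occurrences at positions j < i overlap when i - j < len(word). Because
    positions are increasing, an occurrence overlaps some earlier one exactly
    when it overlaps the most recent one, so only that one is checked.
    """
    h = len(word)
    clumps = 0
    last = None  # start position of the most recent occurrence
    for i in occurrence_positions(seq, word):
        if last is None or i - last >= h:
            clumps += 1  # no overlap with the previous occurrence: new clump
        last = i
    return clumps


def compound_poisson_geometric_pmf(lam: float, a: float, n_max: int) -> list[float]:
    """P(N = n) for n = 0..n_max, where N is a Poisson(lam) number of clumps and
    each clump size K is geometric: P(K = k) = (1 - a) * a**(k - 1), k >= 1.

    Assumption for illustration only. Computed with the Panjer recursion for
    compound Poisson laws: g(0) = exp(-lam), g(n) = lam/n * sum_k k f(k) g(n-k).
    """
    f = [0.0] + [(1 - a) * a ** (k - 1) for k in range(1, n_max + 1)]
    g = [math.exp(-lam)]  # no clump at all
    for n in range(1, n_max + 1):
        s = sum(k * f[k] * g[n - k] for k in range(1, n + 1))
        g.append(lam / n * s)
    return g
```

For example, in `"AAAATAAA"` the word `"AAA"` occurs at positions 0, 1 and 5, so N(W) = 3, but the first two occurrences overlap and form a single clump, giving a declumped count of 2. Overlaps are why N(W) is naturally compared with a compound Poisson law, while the clump count is closer to a plain Poisson variable. With `a = 0` every clump has size 1 and the pmf above reduces to an ordinary Poisson distribution.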


File not deposited

Dates and versions

hal-02699788, version 1 (01-06-2020)

Identifiers

  • HAL Id: hal-02699788, version 1
  • PRODINRA: 135890

Cite

Sophie Schbath. Compound Poisson approximation of word counts in DNA sequences. ESAIM: Probability and Statistics, 1995, 1 (1), pp.1-16. ⟨hal-02699788⟩

Collections

INRA INRAE MATHNUM
