R'MES: Finding exceptional motifs in sequences - INRAE - Institut national de recherche pour l’agriculture, l’alimentation et l’environnement
Conference paper, Year: 2009

R'MES: Finding exceptional motifs in sequences

Mark M. Hoebeke
  • Role: Collaborator
  • PersonId: 1203219

Abstract

The R’MES project started in 1995 and is now in its third version. The main question R’MES addresses is “does this motif occur in that biological sequence with an expected frequency?” In other words, can we observe it so many times, or so few times, just by chance? Usually, when the answer is no, the motif is a candidate for having a particular biological meaning. To answer this question, we calculate an exceptionality score for each word of a given length (or for each given set of words); this score is a one-to-one transformation of the corresponding p-value. The p-value is the probability that a random sequence having the same 1- up to (m + 1)-letter word composition as the biological sequence contains as many occurrences of the given word. This probability is computed using rigorous statistical approximations of the word count distribution, namely either a Gaussian distribution (for frequent words) or a compound Poisson distribution (for rare words). Details about the statistical results on word counts in random sequences can be found in [1]. R’MES keeps being enriched by new questions from biologists. For instance, R’MES can now compute an exceptionality score related to the skew of an oligonucleotide; the typical question is “does this motif occur significantly more often on the leading strand than on the lagging strand?” We are currently implementing the statistical tests proposed by [2] to compare motif exceptionalities between two different sequences. In the talk, we will illustrate how we used R’MES to identify the Chi site of Staphylococcus aureus [3] and the matS site of Escherichia coli [4].
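The scoring idea described in the abstract can be illustrated with a small Python sketch. It is not the R’MES implementation, and all function names are hypothetical. It assumes the maximal-order Markov model (order equal to the word length minus 2) with the usual plug-in estimate of the expected count, it reads "as many occurrences" as "at least as many", it uses a plain Poisson tail as a simplified stand-in for the Gaussian and compound Poisson approximations mentioned above, and it assumes the one-to-one transformation is a probit transform, which the abstract does not specify.

```python
import math
from statistics import NormalDist


def count_overlapping(seq: str, word: str) -> int:
    """Count occurrences of word in seq, allowing overlapping matches."""
    count, start = 0, seq.find(word)
    while start != -1:
        count += 1
        start = seq.find(word, start + 1)
    return count


def expected_count(seq: str, word: str) -> float:
    """Plug-in expected count under a Markov model of order len(word) - 2.

    Estimator: N(prefix) * N(suffix) / N(middle), where prefix and suffix
    drop the last and first letter, and middle drops both.
    """
    if len(word) < 3:
        raise ValueError("this sketch needs words of length >= 3")
    n_middle = count_overlapping(seq, word[1:-1])
    if n_middle == 0:
        return 0.0
    n_prefix = count_overlapping(seq, word[:-1])
    n_suffix = count_overlapping(seq, word[1:])
    return n_prefix * n_suffix / n_middle


def poisson_upper_tail(observed: int, mean: float) -> float:
    """P(X >= observed) for X ~ Poisson(mean); simplified stand-in only."""
    if observed <= 0:
        return 1.0
    if mean <= 0.0:
        return 0.0
    term = math.exp(-mean)  # P(X = 0)
    cdf = term
    for k in range(1, observed):
        term *= mean / k  # P(X = k) from P(X = k - 1)
        cdf += term
    return max(0.0, 1.0 - cdf)


def exceptionality_score(p_value: float) -> float:
    """Assumed probit transform: small p-values give large positive scores."""
    # Clamp away from 0 and 1, where the inverse normal CDF is undefined.
    p = min(max(p_value, 1e-300), 1.0 - 1e-16)
    return -NormalDist().inv_cdf(p)


if __name__ == "__main__":
    toy_seq = "ACGTACGTGCTGGTGGACGTGCTGGTGGTT"  # made-up sequence, not real data
    word = "GCTGGTGG"
    observed = count_overlapping(toy_seq, word)
    expected = expected_count(toy_seq, word)
    p_value = poisson_upper_tail(observed, expected)
    print(word, observed, expected, p_value, exceptionality_score(p_value))
```

With this convention, a word seen about as often as expected scores near zero, an over-represented word gets a large positive score, and an under-represented word gets a negative one. A real analysis would also have to handle self-overlapping words and the estimation error of the model parameters, which is where the approximations cited in [1] come in.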
Main file
bosc2009-rmes_1.pdf (33.15 KB)
Origin: Files produced by the author(s)

Dates and versions

hal-02818582, version 1 (06-06-2020)

Identifiers

  • HAL Id: hal-02818582, version 1
  • PRODINRA: 183418

Cite

Sophie S. Schbath, Mark M. Hoebeke. R'MES: Finding exceptional motifs in sequences. BOSC, Jun 2009, Stockholm, Sweden. 1p. ⟨hal-02818582⟩

Collections

INRA INRAE MATHNUM