R'MES: Finding exceptional motifs in sequences

Sophie S. Schbath; Mark M. Hoebeke

Conference paper. Year: 2009


Sophie S. Schbath

Role: Author
PersonId : 183444
IdHAL : sophie-schbath
ORCID : 0000-0003-3574-8222
IdRef : 07553424X

Unité Mathématique Informatique et Génome

Mark M. Hoebeke

Role: Collaborator
PersonId : 1148859
IdHAL : mark-hoebeke
ORCID : 0000-0001-6311-9752

Unité Mathématique Informatique et Génome

Abstract

The R’MES project started in 1995 and is now in its third version. The main question R’MES addresses is: “Does this motif occur in that biological sequence with the expected frequency?” In other words, could we observe it so many times, or so few times, just by chance? When the answer is no, the motif is a candidate for having a particular biological meaning. To answer this question, we compute an exceptionality score for each word of a given length (or for each given set of words); this score is a one-to-one transformation of the corresponding p-value. The p-value is the probability that a random sequence with the same composition in words of 1 up to (m + 1) letters as the biological sequence contains as many occurrences of the given word as observed. This probability is computed using rigorous approximations of the word count distribution, namely either a Gaussian distribution (for frequent words) or a compound Poisson distribution (for rare words). Details about the statistical results on word counts in random sequences can be found in [1]. R’MES continues to be extended in response to new questions from biologists. For instance, R’MES can now compute an exceptionality score related to the skew of an oligonucleotide; the typical question is “Does this motif occur significantly more often on the leading strand than on the lagging strand?” We are currently implementing the statistical tests proposed in [2] to compare motif exceptionalities between two different sequences. In the talk, we will illustrate how R’MES helped us identify the Chi site of Staphylococcus aureus [3] and the matS site of Escherichia coli [4].
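As a rough illustration of the idea of an exceptionality score, the Python sketch below counts a word, estimates its expected count from the counts of its shorter sub-words (a plug-in estimate under a Markov model of order m = word length - 2), and turns an upper-tail p-value into a score. It is not the R’MES implementation: the function names are hypothetical, a plain Poisson tail stands in for the Gaussian and compound Poisson approximations mentioned in the abstract (so overlapping occurrences are not handled properly), and the probit-style transformation is only one possible one-to-one mapping, assumed here for the example.

```python
import math
from statistics import NormalDist


def count_overlapping(seq: str, word: str) -> int:
    """Count occurrences of `word` in `seq`, overlapping ones included."""
    return sum(1 for i in range(len(seq) - len(word) + 1)
               if seq.startswith(word, i))


def expected_count(seq: str, word: str) -> float:
    """Plug-in estimate of the expected count of `word` under a Markov
    model of order len(word) - 2 fitted on `seq` itself:
    N(prefix) * N(suffix) / N(core)."""
    if len(word) < 3:
        raise ValueError("word must have at least 3 letters")
    core = count_overlapping(seq, word[1:-1])
    if core == 0:
        return 0.0
    prefix = count_overlapping(seq, word[:-1])
    suffix = count_overlapping(seq, word[1:])
    return prefix * suffix / core


def poisson_upper_tail(n_obs: int, mean: float) -> float:
    """P(X >= n_obs) for X ~ Poisson(mean).
    Assumption: a simple Poisson law replaces the compound Poisson one."""
    if n_obs <= 0:
        return 1.0
    if mean <= 0.0:
        return 0.0
    term = math.exp(-mean)  # P(X = 0)
    cdf = 0.0
    for k in range(n_obs):
        cdf += term
        term *= mean / (k + 1)  # P(X = k + 1) from P(X = k)
    return max(0.0, 1.0 - cdf)


def score_from_pvalue(p: float) -> float:
    """Map a p-value to a score with a probit-style transform (assumed):
    small p-values (over-represented words) give large positive scores."""
    p = min(max(p, 1e-300), 1.0 - 1e-16)  # keep p strictly inside (0, 1)
    return -NormalDist().inv_cdf(p)


if __name__ == "__main__":
    seq, word = "ACGTACGTACGT", "CGT"
    n_obs = count_overlapping(seq, word)
    mean = expected_count(seq, word)
    p = poisson_upper_tail(n_obs, mean)
    print(word, n_obs, mean, p, score_from_pvalue(p))
```

With this convention a score near zero means the word occurs about as often as expected, while a large positive score flags an over-represented word; an under-representation score would use the lower tail instead.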

Keywords

motif statistics, DNA motif, RMES

Domains

Applications [stat.AP]

Main file

bosc2009-rmes_1.pdf (33.15 KB)

Origin: Files produced by the author(s)

Contributor: Migration ProdInra

https://hal.inrae.fr/hal-02818582

Submitted on: Saturday, June 6, 2020, 15:53:47

Last modified on: Friday, January 3, 2025, 09:37:26

Dates and versions

hal-02818582 , version 1 (06-06-2020)

Identifiers

HAL Id : hal-02818582 , version 1
PRODINRA : 183418

Cite

Sophie S. Schbath, Mark M. Hoebeke. R'MES: Finding exceptional motifs in sequences. BOSC, Jun 2009, Stockholm, Sweden. 1p. ⟨hal-02818582⟩

Collections

INRA INRAE MATHNUM

25 views

22 downloads