Statistical learning for OTUs identification - INRAE - Institut national de recherche pour l’agriculture, l’alimentation et l’environnement
Conference paper, Year: 2020

Statistical learning for OTUs identification

Abstract

Molecular-based inventories are now made routinely with metabarcoding. However, comparisons with optical-based inventories are scarce for micro-organisms. Here, we study whether a morphology-based taxonomy and unsupervised clustering of amplicons on the same dataset provide the same picture of diversity. For OTU building, we implement both hierarchical agglomerative clustering (HAC) and a novel approach based on Stochastic Block Models (SBM). Plants are among the best-known organisms, both botanically and through molecular phylogenies. We therefore use a dataset of amplicons (trnH-psbA) from 1502 trees in an experimental plot in French Guiana, covering a large spectrum of botanical diversity and identified by field botanists. We study whether the convergence or divergence of the three classifications depends on the taxonomic level addressed (order, family, genus). We apply HAC and test several aggregation methods. We apply SBM with a Poisson probability distribution to model the pattern of distances between sequences. Finally, we compare the three classifications by building contingency tables. Preliminary results show that the convergence of the three methods depends on the distribution of intra- and inter-class distances. For instance, in Magnoliales these distances are well differentiated and convergence is very good, whereas in Gentianales the distances are not well differentiated and convergence is poor. Moreover, the SBM provides a matrix of parameters that quantify the connections between classes. It is an excellent candidate for a multivariate index of diversity, richer than a scalar one. Finally, we will discuss the issue of scaling this approach to metabarcoding.
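
The HAC and contingency-table steps mentioned in the abstract can be illustrated with a minimal sketch. This is not the authors' code: the function names `hac` and `contingency`, the toy distance matrix, and the labels "A" and "B" are all hypothetical, and the naive clustering loop is only suitable for very small inputs. It clusters a pairwise distance matrix under three common aggregation (linkage) rules and cross-tabulates each result against a reference taxonomy.

```python
from collections import Counter
from itertools import combinations


def hac(dist, n_clusters, linkage="average"):
    """Naive agglomerative clustering on a symmetric distance matrix.

    dist is a list of lists; linkage is "single", "complete" or "average".
    Returns one integer cluster label per item.
    """
    agg = {
        "single": min,
        "complete": max,
        "average": lambda values: sum(values) / len(values),
    }[linkage]
    clusters = [[i] for i in range(len(dist))]
    while len(clusters) > n_clusters:
        # Pick the pair of clusters with the smallest linkage distance.
        a, b = min(
            combinations(range(len(clusters)), 2),
            key=lambda p: agg(
                [dist[i][j] for i in clusters[p[0]] for j in clusters[p[1]]]
            ),
        )
        # a < b always holds, so deleting index b leaves index a valid.
        clusters[a] = clusters[a] + clusters[b]
        del clusters[b]
    labels = [0] * len(dist)
    for k, members in enumerate(clusters):
        for i in members:
            labels[i] = k
    return labels


def contingency(reference, predicted):
    """Count how often each (reference label, cluster label) pair occurs."""
    return Counter(zip(reference, predicted))


# Hypothetical toy data: six "sequences" placed on a line, so that the
# two groups are well separated (small intra-class, large inter-class
# distances). This is NOT the trnH-psbA dataset from the abstract.
positions = [0, 1, 2, 10, 11, 12]
dist = [[abs(p - q) for q in positions] for p in positions]
taxonomy = ["A", "A", "A", "B", "B", "B"]

for method in ("single", "complete", "average"):
    labels = hac(dist, n_clusters=2, linkage=method)
    table = contingency(taxonomy, labels)
    print(method, dict(table))
```

When intra-class and inter-class distances are well separated, as in this toy case, every linkage rule should recover the reference groups and the contingency table is diagonal; overlapping distance distributions would spread counts across off-diagonal cells.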
File not deposited

Dates and versions

hal-02941708, version 1 (17-09-2020)

Identifiers

  • HAL Id: hal-02941708, version 1

Cite

Mohamed Anwar Abouabdallah, Olivier Coulaud, Alain Franc, Nathalie Peyrard. Statistical learning for OTUs identification. ISEC 2020 - International Statistical Ecology Conference, Jun 2020, Sydney / Virtual, Australia. ⟨hal-02941708⟩