Intra-species diversity in metagenomic datasets - INRAE - Institut national de recherche pour l’agriculture, l’alimentation et l’environnement
Conference poster. Year: 2024

Intra-species diversity in metagenomic datasets

Abstract

Microbial ecosystems are composed of tens to thousands of species of bacteria, archaea, microbial eukaryotes, and viruses. Shotgun metagenomic sequencing has revealed a high level of intra-species diversity in several ecosystems. Identifying polymorphisms and reconstructing strains is challenging because of sequencing errors (which must be distinguished from true polymorphisms) and short read lengths, particularly for low-abundance species. Some approaches aim to resolve strains, based either on selected marker genes or on entire genomes (reviewed by Ventolero et al. [9]). These approaches have the advantage of providing precise information on strain content. However, they are usually limited to high-abundance species, requiring approximately 5X coverage. Other methods use reads mapped to references to quantify within-sample and between-sample genomic variation by computing metrics to compare samples, such as similarity indexes inspired by population genetics (π and FST) [2, 7], the distribution of major allele frequencies [3], or pairwise distances between samples [8]. To our knowledge, none of these methods can handle species in very low abundance. Here, we present INTERSTICE (INTra-species divERSity in meTagenomIC rEads), a new method for studying intra-species diversity that is designed to handle low-abundance species. For each species, the method estimates within-sample diversity and between-sample distance by adapting to metagenomic samples two indexes used in population genetics: nucleotide diversity π and Nei’s standard genetic distance [5, 6]. It first maps metagenomic reads to a complete, ecosystem-adapted reference genome catalog (UHGG for the human gut microbiota [1]) and applies stringent quality filters. Diversity indexes are computed only on reads mapped to genomic regions that are conserved at the species level.
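The two population-genetics indexes named above can be illustrated with their textbook definitions. The sketch below is not the INTERSTICE implementation: the abstract does not say how the indexes are adapted to metagenomic reads, so the per-site allele-count input, the function names, and the handling of low-coverage sites are all assumptions made for illustration.

```python
import math
from collections import Counter


def site_diversity(counts):
    """Probability that two distinct reads covering a site carry different bases.

    `counts` maps each base to the number of reads supporting it (assumed input).
    Returns None when fewer than two reads cover the site.
    """
    n = sum(counts.values())
    if n < 2:
        return None
    same = sum(c * (c - 1) for c in counts.values())
    return 1.0 - same / (n * (n - 1))


def nucleotide_diversity(pileup):
    """Average per-site diversity (pi, per bp) over sites covered by >= 2 reads."""
    values = [d for d in (site_diversity(c) for c in pileup) if d is not None]
    return sum(values) / len(values) if values else float("nan")


def nei_standard_distance(freqs_x, freqs_y):
    """Nei's standard genetic distance D = -ln(J_XY / sqrt(J_X * J_Y)).

    `freqs_x` and `freqs_y` are lists of per-site allele-frequency dicts
    for two samples, in the same site order (assumed input).
    """
    jx = jy = jxy = 0.0
    for fx, fy in zip(freqs_x, freqs_y):
        alleles = set(fx) | set(fy)
        jx += sum(fx.get(a, 0.0) ** 2 for a in alleles)
        jy += sum(fy.get(a, 0.0) ** 2 for a in alleles)
        jxy += sum(fx.get(a, 0.0) * fy.get(a, 0.0) for a in alleles)
    if jx == 0.0 or jy == 0.0:
        return float("nan")  # no usable sites
    if jxy == 0.0:
        return float("inf")  # the samples share no allele at any site
    return -math.log(jxy / math.sqrt(jx * jy))


# Toy example: one monomorphic site and one site split 2/2 between A and C.
pileup = [Counter(A=4), Counter(A=2, C=2)]
pi = nucleotide_diversity(pileup)

sample_x = [{"A": 1.0}]
sample_y = [{"A": 0.5, "C": 0.5}]
distance = nei_standard_distance(sample_x, sample_y)
```

Counting pairs of reads without replacement keeps the per-site estimate unbiased at low depth, which is one plausible reason to prefer it when coverage is sparse; whether INTERSTICE uses this exact estimator is not stated in the abstract.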
These regions are determined by analyzing coverage variation across samples (removing regions with atypical profiles) and are designated the Typ-genome. We applied this method to data from two cohorts: HMP [4] (adults) and DIABIMMUNE [10] (longitudinal data on children between 0 and 3 years of age). Using sub-sampled datasets, we assessed the robustness of our metrics to decreasing coverage and confirmed that values above 0.001 bp⁻¹ can be reliably estimated from the pairwise comparison of reads over only 10 kbp of the Typ-genome. This makes it possible to retrieve information on low-abundance species with genome coverage below 0.1X. By analyzing the 747 bacterial species satisfying this minimal criterion, we identify species with high or low within-sample diversity, species with rapid lineage turnover, and species with an atypical amount of lineages shared between samples.
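The idea of keeping only regions whose coverage behaves typically across samples can be sketched as follows. This is a hypothetical illustration, not the published Typ-genome procedure: the abstract gives no criteria, so the median normalization, the `low`, `high`, and `min_fraction` thresholds, and the function name `typical_regions` are all assumptions.

```python
from statistics import median


def typical_regions(coverage, low=0.5, high=2.0, min_fraction=0.9):
    """Return regions whose coverage tracks the species-wide level across samples.

    `coverage` maps a region name to a list of per-sample coverage values,
    with samples in the same order for every region (assumed input).
    A region is kept when, in at least `min_fraction` of the usable samples,
    its coverage divided by that sample's median coverage lies in [low, high].
    All thresholds are illustrative, not values from the poster.
    """
    if not coverage:
        return []
    regions = list(coverage)
    n_samples = len(coverage[regions[0]])
    # Per-sample reference level: median coverage over all regions.
    sample_medians = [
        median(coverage[r][s] for r in regions) for s in range(n_samples)
    ]
    kept = []
    for r in regions:
        usable = 0
        typical = 0
        for s, m in enumerate(sample_medians):
            if m <= 0:
                continue  # species effectively absent from this sample
            usable += 1
            if low <= coverage[r][s] / m <= high:
                typical += 1
        if usable and typical / usable >= min_fraction:
            kept.append(r)
    return kept


# Toy example: three regions that follow the sample-wide coverage and one
# region that is absent from two samples and over-covered in the third.
toy_coverage = {
    "core1": [10, 20, 5],
    "core2": [12, 18, 6],
    "core3": [9, 22, 5],
    "mobile": [0, 80, 0],
}
kept_regions = typical_regions(toy_coverage)
```

In this toy input the region named `mobile` mimics an accessory element: it is missing where the species is present and inflated elsewhere, so it falls outside the assumed tolerance in every sample and is excluded.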
Main file
Poster_JOBIM2024.pdf (136.43 KB)
POSTER_ALA_Jobim2024.pptx.pdf (2.09 MB)
Origin: Files produced by the author(s)

Dates and versions

hal-04631038, version 1 (01-07-2024)

Identifiers

  • HAL Id: hal-04631038, version 1

Cite

Anne-Laure Abraham, Guillaume Kon Kam King, Solène Pety, Anne-Carmen Sanchez, Hélène Chiapello, et al. Intra-species diversity in metagenomic datasets. JOBIM 2024, Jun 2024, Toulouse, France. 2024. ⟨hal-04631038⟩