A bioinformatic pipeline to elucidate the links between viruses and their hosts in microbial communities, applied to viruses in anaerobic digestion processes - INRAE - Institut national de recherche pour l’agriculture, l’alimentation et l’environnement
Conference poster, Year: 2021

A bioinformatic pipeline to elucidate the links between viruses and their hosts in microbial communities, applied to viruses in anaerobic digestion processes

Abstract

Viruses are key players in microbial ecosystems. However, predicting the hosts of viruses remains a major challenge in microbial ecology. A few in silico methods for metagenomic data have proven useful for this purpose (e.g. in [1]), and they are invaluable when studying environmental samples. We developed a bioinformatic pipeline that includes the detection of CRISPR protospacers in viral contigs, a method previously used to predict the hosts of marine viruses [1]. We applied our pipeline to anaerobic digestion (AD) ecosystems, in the context of organic waste treatment and valorisation. We focused on the diversity of viruses infecting methanogens, the key actors of methane production during AD. Viral diversity is only starting to be explored in AD processes [2], hence the great potential for new virus discovery in our study.

After enrichment of methanogenic archaea in AD microcosms by growth on formate as the sole carbon source, 2 DNA metaviromes and 5 cellular metagenomes were sequenced using Illumina NextSeq. Our pipeline was applied to all the resulting data and executed on the cluster of the INRAE MIGALE bioinformatics platform. The most generic steps of our pipeline were scripted as a Snakemake workflow, to favour reproducible and scalable data analysis (https://forgemia.inra.fr/cedric.midoux/workflow_metagenomics). After a preprocessing step, reads were assembled with metaSPAdes. Coding regions were predicted with Prodigal. Taxonomic assignment of the contigs and of their predicted genes was obtained with Kaiju against the NCBI nr database. Functional annotation of the predicted genes was performed with DIAMOND against PHROGs (https://phrogs.lmge.uca.fr/), a database dedicated to prokaryotic viruses, and with GhostKOALA against the KEGG database (https://www.kegg.jp/ghostkoala/). For each dataset, reads were mapped to the assembled contigs.
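To make the ordering of the generic steps concrete, the following is a minimal Python sketch of the dependency structure that a workflow manager such as Snakemake would resolve. It is not the authors' workflow: the step names and the exact dependencies between them are assumptions inferred from the description above, and no tool is actually invoked.

```python
from graphlib import TopologicalSorter  # standard library, Python 3.9+

# Hypothetical step names; each step maps to the set of steps it depends on.
# The dependencies are an assumption read off the abstract, not the real workflow.
DEPENDENCIES = {
    "preprocess_reads": set(),
    "assemble_contigs": {"preprocess_reads"},          # assembly of cleaned reads
    "predict_genes": {"assemble_contigs"},             # coding regions on contigs
    "assign_taxonomy": {"assemble_contigs", "predict_genes"},
    "annotate_functions": {"predict_genes"},
    "map_reads_to_contigs": {"preprocess_reads", "assemble_contigs"},
}


def execution_order(dependencies):
    """Return one valid order in which every step runs after its prerequisites."""
    return list(TopologicalSorter(dependencies).static_order())


if __name__ == "__main__":
    for step in execution_order(DEPENDENCIES):
        print(step)
```

Declaring only the dependencies, and letting the sorter derive the order, mirrors the reason a workflow manager helps reproducibility: steps that do not depend on each other (for example functional annotation and read mapping) can be scheduled independently.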
Several steps specifically dedicated to predicting hosts from viral contigs were performed using bash and Python scripts. For the cellular metagenomes, spacers were detected in contigs with CRISPRDetect and CRISPRCasFinder. A non-redundant spacer database was built from the resulting spacer sequences. The viral contigs were subsequently aligned with blastn against this database, enabling host prediction. In addition, metagenome-assembled genomes (MAGs) were constructed from the cellular metagenomic data with MetaBAT 2. Their quality was improved with RefineM and checked with CheckM. With this spacer-based approach, we identified 77 viral contigs possibly originating from methanogenic archaea. We are currently analysing them further to confirm their nature and to study their gene content. The MAG reconstruction yielded 15 methanogenic archaeal genomes. Using these genomes, we will search for archaeal proviruses with PHASTER and VirSorter, and we will also use a k-mer-based method, implemented in the tool WIsH, to identify additional putative archaeal viruses.

References

1. Felipe H. Coutinho, Cynthia B. Silveira, Gustavo B. Gregoracci, Cristiane C. Thompson, Robert A. Edwards, Corina P. D. Brussaard, Bas E. Dutilh, and Fabiano L. Thompson. Marine viruses discovered via metagenomics shed light on viral strategies throughout the oceans. Nature Communications 8, no. 1: 1-12, 2017.
2. Magdalena Calusinska, Martyna Marynowska, Xavier Goux, Esther Lentzen, and Philippe Delfosse. Analysis of dsDNA and RNA viromes in methanogenic digesters reveals novel viral genetic diversity. Environmental Microbiology 18, no. 4: 1162-1175, 2016.
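The spacer-based host prediction described in the abstract can be illustrated with a small Python sketch. It is a simplification under stated assumptions, not the authors' scripts: exact substring matching stands in for the blastn alignment (which tolerates mismatches), redundancy is removed by treating a spacer and its reverse complement as the same sequence, and all function names and identifiers are hypothetical.

```python
# Translation table for complementing DNA bases.
COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")


def reverse_complement(seq):
    """Return the reverse complement of a DNA sequence."""
    return seq.translate(COMPLEMENT)[::-1]


def canonical(seq):
    """Pick one representative for a sequence and its reverse complement."""
    seq = seq.upper()
    return min(seq, reverse_complement(seq))


def build_spacer_db(spacers):
    """Build a non-redundant spacer database.

    spacers: iterable of (host_id, spacer_sequence) pairs.
    Returns a dict mapping each canonical spacer to the set of hosts carrying it.
    """
    db = {}
    for host_id, seq in spacers:
        db.setdefault(canonical(seq), set()).add(host_id)
    return db


def predict_hosts(viral_contigs, spacer_db):
    """Assign putative hosts to viral contigs that contain a spacer match.

    viral_contigs: dict mapping contig_id to sequence.
    Exact matching on both strands is an assumption standing in for blastn.
    Returns a dict mapping contig_id to the set of putative host ids.
    """
    predictions = {}
    for contig_id, contig in viral_contigs.items():
        forward = contig.upper()
        reverse = reverse_complement(forward)
        for spacer, hosts in spacer_db.items():
            if spacer in forward or spacer in reverse:
                predictions.setdefault(contig_id, set()).update(hosts)
    return predictions


if __name__ == "__main__":
    # Made-up sequences and identifiers, for illustration only.
    spacers = [
        ("MAG_01", "ACGTTGCAAGGCTA"),
        ("MAG_02", "TAGCCTTGCAACGT"),  # reverse complement of the MAG_01 spacer
        ("MAG_03", "GGGGCCCCAAAATT"),
    ]
    contigs = {
        "contig_A": "TTTT" + "TAGCCTTGCAACGT" + "CCCC",
        "contig_B": "ATATATATATATATAT",
    }
    db = build_spacer_db(spacers)
    print(predict_hosts(contigs, db))
```

Keeping the set of hosts attached to each non-redundant spacer means that a single protospacer hit can point to several candidate hosts, which is worth preserving when the spacers come from several metagenomes.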
Main file
70_Vuong-Quoc-Hoang-NGO_JOBIM.pdf (837.43 KB)
Origin: Files produced by the author(s)

Dates and versions

hal-04359920, version 1 (21-12-2023)

Identifiers

  • HAL Id: hal-04359920, version 1

Cite

Vuong Quoc Hoang Ngo, Cédric Midoux, Mahendra Mariadassou, Valentin Loux, François Enault, et al. A bioinformatic pipeline to elucidate the links between viruses and their hosts in microbial communities, applied to viruses in anaerobic digestion processes. JOBIM 2021 (Journées Ouvertes en Biologie, Informatique et Mathématiques), Jul 2021, Paris, France. ⟨hal-04359920⟩