Feedback on a comparative metatranscriptomic analysis
Abstract
Advances in next-generation sequencing favor the development of more comprehensive ecosystem studies through metatranscriptomic approaches, which provide access to functional information at a good depth of analysis. Through a study of anaerobic digesters treating wastewater contaminated with an anionic surfactant [1] (namely linear alkylbenzene sulfonate, LAS), we developed a bioinformatics pipeline to analyze shotgun metatranscriptomic RNA-seq data.
In this pipeline, the raw data are cleaned and pre-processed. Reads corresponding to rRNA are detected and discarded from the datasets. After a normalization step based on k-mer counts, the mRNA reads from all datasets are co-assembled de novo using the Trinity software. Coding regions of the metatranscriptomic assembly are then predicted and annotated. For functional annotation, sequences with matches in the eggNOG and KEGG GENES databases are retrieved to establish functional categories and reconstruct metabolic pathways. For taxonomic classification, the sequences are assigned by comparing them to the NCBI nr database. The reads of each dataset are then mapped back, dataset by dataset, to the co-assembled contigs. Finally, a count table is constructed; it contains, for each predicted gene, the counts obtained per sample, together with the associated taxonomic and functional annotations.
After aggregation and statistical analysis, this study made it possible to detect active genes likely involved in each step of LAS biodegradation and to explore the active microbial core related to LAS degradation.
Through a study of anaerobic digesters treating wastewater contaminated with an anionic surfactant, we developed a bioinformatics pipeline to analyze shotgun metatranscriptomic RNA-seq data.
In this pipeline, the raw data are cleaned and pre-processed. Reads corresponding to rRNA are detected and discarded from the datasets. After a normalization step based on k-mer counts, the mRNA reads from all datasets are co-assembled de novo. Coding regions of the metatranscriptomic assembly are then predicted and annotated. Taxonomic and functional annotations are obtained by comparison to public reference databases. These annotations are used to define functional categories and reconstruct metabolic pathways.
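The normalization step based on k-mer counts can be illustrated with a minimal sketch. It assumes a median-abundance rule in the style of digital normalization: a read is kept only while the median count of its k-mers, among reads already kept, is below a cutoff. The function names, `k`, `cutoff`, and the toy reads are hypothetical; the abstract does not specify which tool or parameters were used, and real tools rely on memory-efficient counting structures instead of a plain dictionary.

```python
from collections import Counter
from statistics import median


def kmers(seq, k):
    """Return all overlapping k-mers of a sequence, in order."""
    return [seq[i:i + k] for i in range(len(seq) - k + 1)]


def normalize_reads(reads, k=5, cutoff=2):
    """Keep a read only if the median abundance of its k-mers, counted
    over the reads kept so far, is below `cutoff` (illustrative rule)."""
    counts = Counter()
    kept = []
    for read in reads:
        read_kmers = kmers(read, k)
        if not read_kmers:
            continue  # read shorter than k: skipped in this sketch
        if median(counts[km] for km in read_kmers) < cutoff:
            kept.append(read)
            counts.update(read_kmers)  # only kept reads add to the counts
    return kept


# Toy input: one highly redundant read and one rare read.
reads = ["ACGTTGCAAG"] * 5 + ["TTTTTGGGGG"]
kept = normalize_reads(reads, k=5, cutoff=2)
```

The effect is to cap the redundancy of highly expressed transcripts before assembly while leaving low-coverage reads untouched, which reduces the memory and time required by the assembler.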
The reads of each dataset are then mapped back, dataset by dataset, to the co-assembled contigs. Finally, a count table is constructed; it contains, for each predicted gene, the counts obtained per sample, together with the associated taxonomic and functional annotations.
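The structure of such a count table can be sketched as follows. This is only an illustration under stated assumptions: the mapping result of each sample is reduced to a list of gene identifiers (one per mapped read), and the sample labels, gene identifiers, and annotation values are placeholders, not data from the study.

```python
from collections import Counter


def build_count_table(mapped_genes_by_sample, annotations):
    """Build one row per predicted gene: per-sample read counts plus
    taxonomic and functional annotations ("unassigned" when missing)."""
    samples = sorted(mapped_genes_by_sample)
    per_sample = {s: Counter(mapped_genes_by_sample[s]) for s in samples}
    # Include genes seen in any sample as well as annotated genes.
    genes = sorted(set().union(*per_sample.values()) | set(annotations))
    table = []
    for gene in genes:
        row = {"gene": gene}
        for sample in samples:
            row[sample] = per_sample[sample][gene]  # 0 if never mapped
        annotation = annotations.get(gene, {})
        row["taxon"] = annotation.get("taxon", "unassigned")
        row["function"] = annotation.get("function", "unassigned")
        table.append(row)
    return table


# Hypothetical inputs: two samples and one annotated gene.
mapped = {
    "control": ["gene_1", "gene_1", "gene_2"],
    "treated": ["gene_2"],
}
annotations = {"gene_1": {"taxon": "taxon_A", "function": "function_X"}}
table = build_count_table(mapped, annotations)
```

Keeping the annotations in the same rows as the counts makes it straightforward to aggregate counts by taxon or by functional category before the statistical analysis.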
After aggregation and statistical analysis, this study made it possible to detect active genes likely involved in each step of anionic surfactant degradation and to explore the associated active microbial core.
Origin | Files produced by the author(s) |
---|