GnS-PIPE: an optimized bioinformatic pipeline to efficiently assess microbial taxonomic diversity of complex environments using high-throughput sequencing technologies
Abstract
The rRNA gene regions (16S, 18S, ITS) are widely used to study microbial communities in soils, as they can be easily amplified from metagenomic DNA. Moreover, the recent development of high-throughput sequencing technologies allows millions of sequences to be assessed from a single metagenomic DNA sample. Some pipelines (e.g. QIIME or Mothur) are already available to process such data efficiently. However, bioinformatic tools must now be validated by biological tests. This is particularly true for the key steps used to appraise microbial diversity and richness. Here, we present a new pipeline named GnS-PIPE, a software application that performs bacterial, archaeal and fungal taxonomic diversity analyses. A key design principle in the development of GnS-PIPE was to validate defined bioinformatic steps biologically. These biological tests were performed using the expertise of the GenoSol platform, a biological resource centre unique in France, devoted to the conservation and analysis of the genetic resources of soil microbial communities. GnS-PIPE includes several optimized steps: a step to correct homopolymer errors introduced by pyrosequencing, a biological approach to detect chimeric sequences using a database of known sequences from various metagenomic studies, and a taxonomic assignment method that merges results from several databases and methods. Moreover, a user-friendly graphical interface running on Windows, Mac OS and Linux was developed to interact easily with GnS-PIPE and to manage samples and analyses. To date, GnS-PIPE provides an optimized approach to easily analyse microbial biodiversity from wide-scale soil samplings.
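As an illustration of what appraising diversity and richness involves once reads have been taxonomically assigned, the following minimal Python sketch computes observed richness and the Shannon index from a list of assignments. It is not the GnS-PIPE implementation: the function names `richness` and `shannon_index` and the example taxa are hypothetical, and the indices actually reported by GnS-PIPE are not specified here.

```python
import math
from collections import Counter


def richness(counts):
    """Observed richness: number of taxa with at least one read."""
    return sum(1 for c in counts.values() if c > 0)


def shannon_index(counts):
    """Shannon index H = -sum(p_i * ln(p_i)) over taxa with non-zero counts."""
    total = sum(counts.values())
    if total == 0:
        return 0.0
    h = 0.0
    for c in counts.values():
        if c > 0:
            p = c / total
            h -= p * math.log(p)
    return h


# Hypothetical per-read taxonomic assignments for one sample (illustrative only).
assignments = [
    "Bacillus", "Bacillus", "Pseudomonas",
    "Streptomyces", "Pseudomonas", "Bacillus",
]
counts = Counter(assignments)

print("richness:", richness(counts))
print("Shannon index:", round(shannon_index(counts), 3))
```

Both values depend directly on the upstream steps: uncorrected homopolymer errors or undetected chimeras create spurious taxa and inflate richness, which is why those steps benefit from biological validation.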