Biocluster: an NGS-dedicated HPC cluster in IMBBC, HCMR
Abstract
Modern biology is largely shaped by the development of next-generation sequencing (NGS) technology. Researchers sequence both model and non-model organisms on a massive scale, from single-cell transcriptomes to whole genomes, with applications ranging from medicine to ecology and conservation. However, the great challenge that follows the scaling up of sequencing throughput is data analysis. The volumes of data produced by NGS experiments require high computational power and cannot be analyzed on desktop computers. This underlines the need for HPC platforms that can analyze high-throughput data within reasonable timeframes. At IMBBC (HCMR), an HPC cluster named Biocluster has been built, dedicated to bioinformatics applications and NGS data analysis. Since the cluster was first launched in 2010, we have configured it to host more than 200 state-of-the-art software packages. In addition, we have been developing parallelized pipelines for NGS data analysis, with a special focus on the challenges arising from sequencing non-model species. These pipelines cover raw-read pre-processing, gene annotation, variant discovery, population genetic analyses, metabarcoding analyses and other tasks. Biocluster can accommodate all omics data schemes efficiently, supporting analyses for various experimental designs at a speed comparable to that of modern sequencing data production. Its unprecedented collection of sequence analysis software and the availability of parallelized pipelines make this platform a unique bioinformatics tool.
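The abstract mentions parallelized pipelines but does not describe how Biocluster implements them. As a minimal sketch of the general idea only, assuming that samples are independent and can be processed separately, the Python below fans a per-sample pre-processing step (toy 3' quality trimming) out over a worker pool. The functions `trim_read`, `preprocess_sample` and `run_pipeline`, the Phred threshold of 20 and the Phred+33 quality encoding are all hypothetical choices for illustration, not Biocluster's actual pipeline code.

```python
from concurrent.futures import ThreadPoolExecutor

# Hypothetical illustration: not Biocluster's real pipeline code.

def trim_read(seq: str, qual: str, min_q: int = 20, offset: int = 33) -> str:
    """Trim bases from the 3' end while their Phred quality is below min_q.

    Assumes Phred+33 encoded quality strings (an assumption for this sketch).
    """
    end = len(seq)
    while end > 0 and ord(qual[end - 1]) - offset < min_q:
        end -= 1
    return seq[:end]


def preprocess_sample(reads):
    """Trim every (sequence, quality) pair of one sample; drop reads left empty."""
    trimmed = (trim_read(seq, qual) for seq, qual in reads)
    return [t for t in trimmed if t]


def run_pipeline(samples, workers=4):
    """Process independent samples concurrently; results keep the input order."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(preprocess_sample, samples))


if __name__ == "__main__":
    samples = [
        [("ACGTACGT", "IIIIII##"), ("GGGG", "####")],  # sample 1
        [("TTTTAAAA", "IIIIIIII")],                    # sample 2
    ]
    print(run_pipeline(samples, workers=2))  # [['ACGTAC'], ['TTTTAAAA']]
```

Threads are used here on the assumption that, in a real pipeline, each per-sample step would launch an external tool whose work runs outside the Python interpreter; for a pure-Python, CPU-bound step like this toy trimmer, swapping in `ProcessPoolExecutor` (or submitting one cluster job per sample) would be the usual way to obtain true parallelism.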
Domains
Plant biology
Origin: Files produced by the author(s)