Comparison of mapping softwares for next generation sequencing data
Abstract
Recent DNA sequencers, usually called "next generation", produce reads that are shorter and far more numerous than those of previous sequencers. New alignment tools have been developed for this new type of read. Our study evaluates the efficiency, strengths and weaknesses of these tools. We have identified about 40 software tools that are currently used to map reads produced by next generation sequencers (NGS) onto known genomes. Our study focuses on reads produced by Illumina sequencers, but also considers the specificities associated with SOLiD reads (color code).

Methodology: We simulate two sets of 40 bp reads, drawn uniformly from a dataset. To reflect the diversity of genomic data, we use two kinds of datasets: the human genome and a concatenation of 1000 bacterial genomes. The sets contain 10M reads, close to the actual amount produced by NGS tools. In the first set the reads are error-free; in the second, three mismatches are added at random positions. We use 11 of the most widely used tools (BWA, Novoalign, Bowtie, MOSAIK, MOM, Probematch, SOAP2, Bfast, SHRiMP, maq, and ZOOM) to align the simulated reads to the genome. We monitor several performance indicators for each tool: CPU time, memory usage, whether the read matches at its "initial" position, the number of match positions found for a given read, and the number of match positions that are in the original genome. We also take into account the usability, flexibility, output format and documentation of the tools.
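The read simulation described in the methodology can be illustrated with a minimal Python sketch. It is not the authors' simulator: the function names (`draw_read`, `add_mismatches`, `simulate`, `hamming`) are hypothetical, and it assumes that the three mismatches are applied per read, at distinct positions, each as a substitution by a different base. It also samples from the forward strand only and uses a small random toy genome instead of the human or bacterial datasets.

```python
import random

BASES = "ACGT"

def draw_read(genome, read_len, rng):
    """Draw one read uniformly over all valid start positions."""
    start = rng.randrange(len(genome) - read_len + 1)
    return start, genome[start:start + read_len]

def add_mismatches(read, n_mismatches, rng):
    """Substitute n_mismatches distinct positions with a different base.

    Assumption: mismatches are substitutions at distinct positions.
    """
    seq = list(read)
    for pos in rng.sample(range(len(seq)), n_mismatches):
        seq[pos] = rng.choice([b for b in BASES if b != seq[pos]])
    return "".join(seq)

def simulate(genome, n_reads, read_len=40, n_mismatches=0, seed=0):
    """Return a list of (original_start, read_sequence) pairs."""
    rng = random.Random(seed)
    reads = []
    for _ in range(n_reads):
        start, read = draw_read(genome, read_len, rng)
        if n_mismatches:
            read = add_mismatches(read, n_mismatches, rng)
        reads.append((start, read))
    return reads

def hamming(a, b):
    """Number of positions at which two equal-length strings differ."""
    return sum(x != y for x, y in zip(a, b))

if __name__ == "__main__":
    # Toy genome; the study uses the human genome and 1000 bacterial genomes.
    toy_rng = random.Random(42)
    genome = "".join(toy_rng.choice(BASES) for _ in range(10_000))

    exact_reads = simulate(genome, n_reads=5, n_mismatches=0, seed=1)
    noisy_reads = simulate(genome, n_reads=5, n_mismatches=3, seed=2)

    for start, read in noisy_reads:
        # Each noisy read differs from its source window at 3 positions.
        print(start, read, hamming(read, genome[start:start + 40]))
```

Keeping the original start position alongside each read is what makes it possible to check afterwards whether a mapper reports the read at its "initial" position.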
Main file
PosterJOBIM_FAYOLLE_1.pdf (209.54 KB)
actes_jobim_2010-1_2.pdf (18.2 MB)
Origin | Files produced by the author(s) |
---|