Murdoch University Research Repository

Benchmarking tools for aligning millions of short reads and implementation of a web-based data processing interface

Barrero, R.A., Albertyn, Z., Hercus, C., Hunter, A., Dennis, L., Shannon, M.F. and Bellgard, M. (2008) Benchmarking tools for aligning millions of short reads and implementation of a web-based data processing interface. In: 19th International Conference on Genome Informatics (GIW2008), 1 - 3 December 2008, Gold Coast, QLD.

Image (JPEG) (Poster)


New DNA sequencing technologies can produce millions of short reads in a single run, generating an unprecedented volume of raw sequence data that needs to be processed. Numerous tools have been developed to align short sequence reads to reference genomes. However, their relative performance in specific applications remains poorly characterized. Here we present the first benchmarking study of existing freely available short-read alignment tools. We tested and compared six tools (MAQ, SOAP, ELAND, SHRiMP, RMAP and NOVOALIGN) for aligning single-end short sequence reads onto a reference genome. We generated simulated short-read data sets from human chromosome 22 and evaluated the sensitivity and specificity of the alignments produced by each tool. Our findings indicate that NOVOALIGN is the best-performing tool overall, aligning 88% of the short reads onto the reference genome with a false mapping rate of less than 1%. We also tested the performance of these tools on real data sets collected from the human HapMap project. These results confirmed that NOVOALIGN is the most robust tool currently available. We next compared the performance of two tools (MAQ and NOVOALIGN) in mapping mate-pair reads using quality scores. Our findings suggest that NOVOALIGN can map up to 23% more short reads with a high quality score (Q >= 50) than MAQ. In this study we also report benchmarking results for calling SNPs and insertions/deletions (indels) based on millions of short sequence reads. Furthermore, we have implemented a web-based data processing workflow environment for mapping short sequence reads onto reference genomes.
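
As an illustration of how an evaluation on simulated reads might be scored, the sketch below compares each read's reported position with the position it was simulated from. It is not the authors' pipeline. The names `evaluate`, `Metrics` and `tolerance`, the toy data, and the two definitions used (mapped fraction = mapped reads / all reads; false mapping rate = wrongly placed reads / mapped reads) are assumptions made for this example, since the abstract does not spell them out.

```python
from dataclasses import dataclass
from typing import Dict, Optional, Tuple

# (reference name, 1-based start position); a hypothetical minimal representation
Position = Tuple[str, int]


@dataclass
class Metrics:
    total: int    # number of simulated reads
    mapped: int   # reads for which the aligner reported a position
    correct: int  # mapped reads placed at their simulated origin

    @property
    def mapped_fraction(self) -> float:
        # Assumed definition: share of all reads that received an alignment.
        return self.mapped / self.total if self.total else 0.0

    @property
    def false_mapping_rate(self) -> float:
        # Assumed definition: share of mapped reads placed at the wrong locus.
        return (self.mapped - self.correct) / self.mapped if self.mapped else 0.0


def evaluate(
    truth: Dict[str, Position],
    reported: Dict[str, Optional[Position]],
    tolerance: int = 0,
) -> Metrics:
    """Score reported alignments against the known origin of each simulated read."""
    mapped = 0
    correct = 0
    for read_id, (true_ref, true_pos) in truth.items():
        hit = reported.get(read_id)
        if hit is None:
            continue  # missing or None means the read was left unaligned
        mapped += 1
        ref, pos = hit
        if ref == true_ref and abs(pos - true_pos) <= tolerance:
            correct += 1
    return Metrics(total=len(truth), mapped=mapped, correct=correct)


if __name__ == "__main__":
    # Toy data invented for this example, not taken from the study.
    truth = {
        "r1": ("chr22", 100),
        "r2": ("chr22", 200),
        "r3": ("chr22", 300),
        "r4": ("chr22", 400),
    }
    reported = {"r1": ("chr22", 100), "r2": ("chr22", 950), "r3": None}
    m = evaluate(truth, reported)
    print(f"mapped fraction: {m.mapped_fraction:.2f}")
    print(f"false mapping rate: {m.false_mapping_rate:.2f}")
```

A nonzero `tolerance` could absorb small offsets caused by clipping or indels near the read start. Whether that is appropriate depends on how each aligner reports positions.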

Item Type: Conference Item
Murdoch Affiliation(s): Centre for Comparative Genomics

