Murdoch University Research Repository


The Murdoch University Research Repository is an open access digital collection of research
created by Murdoch University staff, researchers and postgraduate students.


G4010.05: Development of Integrated SNP Mining and Utilization (ISMU) pipeline based on next generation sequencing (NGS) and high-throughput (HTP) genotyping technologies for facilitating molecular breeding

Varshney, R. (ORCID: 0000-0002-4562-9131), Shah, T., Marshall, D., Milne, I., Rathore, A., Azam, S., Bhanuprakash, A., Pradeep, R., Edwards, D., May, G. and Farmer, A. (2011) G4010.05: Development of Integrated SNP Mining and Utilization (ISMU) pipeline based on next generation sequencing (NGS) and high-throughput (HTP) genotyping technologies for facilitating molecular breeding. Project Updates. CGIAR Generation Challenge Programme.


Next generation sequencing (NGS) technologies are revolutionizing crop genomics, including our understanding of genome diversity, the development of mapping resources, and studies of ecological and evolutionary biology. By lowering costs and increasing the rate of sequence acquisition, these technologies are causing researchers to rethink how crop genomes and transcriptomes are analyzed. One of the most convenient approaches is to discover single nucleotide polymorphisms (SNPs) in thousands of genes by sequencing and re-sequencing genomes or transcriptomes on NGS platforms. However, analyzing large-scale NGS data is a serious challenge for the crop genomics community, especially in under-resourced crops such as chickpea and pigeonpea, for which no reference genome sequence is available. This project identifies and optimizes appropriate tools and approaches for analyzing NGS data (Illumina GA reads) to identify variants, or polymorphisms, between genotypes of an under-resourced crop species such as chickpea.

To discuss issues related to tools and criteria, an international workshop was organized; its presentations and discussions are available at htm as a resource for the research community (Activity 1). In consultation with several international experts at this workshop, four commonly used short-read alignment tools were selected: Maq, Novoalign, SOAP2 and Bowtie.

In contrast to probability-based statistical approaches to consensus calling, a Coverage-based Consensus Calling (CbCC) approach was applied with each of the four tools: 15.7 million and 22.1 million Illumina reads from chickpea genotypes ICC 4958 and ICC 1882, respectively, were aligned to the chickpea transcriptome assembly (CaTA), which served as the reference sequence. Using the CbCC results for the two genotypes, a non-redundant set of 4,466 SNPs was identified. Experimental validation of 224 randomly selected SNPs showed that Maq performed best among the individual tools, with 50.0% of its predicted SNPs confirmed as true SNPs. Among two-tool combinations, Maq and Bowtie gave the highest accuracy (55.7%), and the three-tool combination of Bowtie, Maq and Novoalign reached 61.5% true SNPs. SNP prediction accuracy generally increased with read depth; for Maq, however, SNPs predicted at lower read depths (<10) showed the greatest accuracy. Besides identifying a large number of SNPs in chickpea, this study provides a benchmark comparison of four commonly used tools, and of read depths, for NGS-based SNP discovery in a crop species without a reference genome sequence. (Activity 5)
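The coverage-based idea behind CbCC, and the tool combinations compared above, can be sketched in a few lines of Python. This is a minimal illustration, not the project's implementation: it assumes per-position read bases (a pileup) have already been extracted from one tool's alignments to CaTA, and the thresholds `min_depth` and `min_fraction`, as well as all function names, are hypothetical. A consensus base is called only where coverage and majority support are sufficient; a SNP is reported where both genotypes receive a consensus call and the calls differ; and SNPs supported by a combination of tools are taken as the intersection of the per-tool SNP sets.

```python
from collections import Counter

# Hypothetical thresholds for illustration; not the project's CbCC parameters.
MIN_DEPTH = 3        # minimum number of reads covering a position
MIN_FRACTION = 0.8   # minimum share of reads supporting the majority base


def consensus_base(bases, min_depth=MIN_DEPTH, min_fraction=MIN_FRACTION):
    """Return the majority base at one position, or None if coverage is insufficient."""
    depth = len(bases)
    if depth < min_depth:
        return None
    base, count = Counter(bases).most_common(1)[0]
    if count / depth < min_fraction:
        return None
    return base


def call_snps(pileup_a, pileup_b, **thresholds):
    """Compare consensus calls of two genotypes aligned to the same reference.

    Each pileup maps (contig, position) -> list of read bases at that position.
    """
    snps = set()
    for site in pileup_a.keys() & pileup_b.keys():
        base_a = consensus_base(pileup_a[site], **thresholds)
        base_b = consensus_base(pileup_b[site], **thresholds)
        if base_a is not None and base_b is not None and base_a != base_b:
            snps.add(site)
    return snps


def shared_snps(*snp_sets):
    """SNPs predicted by every tool in a combination (e.g. Maq and Bowtie)."""
    return set.intersection(*snp_sets)


if __name__ == "__main__":
    # Toy pileups for two genotypes; not real chickpea data.
    genotype_a = {("contig1", 10): list("AAAA"), ("contig1", 20): list("CCCG"),
                  ("contig1", 30): list("TT")}
    genotype_b = {("contig1", 10): list("GGGG"), ("contig1", 20): list("CCCC"),
                  ("contig1", 30): list("CC")}
    print(call_snps(genotype_a, genotype_b))  # {('contig1', 10)}
```

In the toy run, position 20 is skipped because only 75% of genotype A's reads support the majority base, and position 30 is skipped because its depth is below the assumed minimum; calling `shared_snps` on the sets from several aligners gives the SNPs that a tool combination would report.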

To convert in silico identified SNPs into genotyping assays, a Perl script that calculates the Assay Design Tool (ADT) score has been included in the pipeline (the first version was developed under the earlier version of the proposal). The ADT score predicts how likely a custom genotyping assay is to be created successfully, specifically for Illumina's GoldenGate technology. (Activity 2)
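The step before ADT scoring, turning an in silico SNP into an assay candidate, might look like the Python sketch below. It writes the SNP with its flanking sequence in the bracketed `LEFT[REF/ALT]RIGHT` notation used for Illumina assay design submissions, flags other SNPs inside the flanking window, and keeps only SNPs whose ADT score meets a cutoff. This is an illustration under stated assumptions, not the pipeline's Perl script: it does not compute the ADT score itself (scores are assumed to be precomputed), and the 60 bp flank, the 0.6 cutoff and all function names are hypothetical.

```python
# Hypothetical design parameters, chosen for illustration only.
FLANK = 60           # bases of flanking sequence kept on each side of the SNP
MIN_ADT_SCORE = 0.6  # assumed cutoff on a precomputed ADT score


def format_snp(contig_seq, pos, ref, alt, flank=FLANK):
    """Return the sequence around a 0-based SNP position as LEFT[REF/ALT]RIGHT,
    or None when there is not enough flanking sequence on either side."""
    if contig_seq[pos] != ref:
        raise ValueError("reference allele does not match the sequence")
    if pos < flank or pos + flank >= len(contig_seq):
        return None
    left = contig_seq[pos - flank:pos]
    right = contig_seq[pos + 1:pos + 1 + flank]
    return f"{left}[{ref}/{alt}]{right}"


def has_neighbour_snp(pos, other_positions, flank=FLANK):
    """True if another SNP on the same contig falls inside the flanking window."""
    return any(p != pos and abs(p - pos) <= flank for p in other_positions)


def select_for_assay(snp_ids, adt_scores, min_score=MIN_ADT_SCORE):
    """Keep SNP ids whose precomputed ADT score meets the (assumed) cutoff."""
    return [s for s in snp_ids if adt_scores.get(s, 0.0) >= min_score]


if __name__ == "__main__":
    seq = "AAAAAGTTTTT"  # toy contig, not real chickpea sequence
    print(format_snp(seq, 5, "G", "C", flank=3))  # AAA[G/C]TTT
```

Flagging neighbouring SNPs is included on the assumption that nearby polymorphisms can interfere with probe design; SNPs without a score are treated as failing the cutoff.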

To extend the functionality of the existing pipeline, SCRI's Flapjack programme was modified to provide web-based functions directly at SCRI. In collaboration with SCRI, Flapjack, a graphical genotyping tool, has also been included in the pipeline for visualizing and analyzing genotyping data, in order to select suitable genotypes for diversity analysis and the best parental genotypes for marker-assisted backcrossing (MABC) and marker-assisted recurrent selection (MARS). (Activity 3)
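As a rough illustration of the comparison that graphical genotyping supports when choosing parents for MABC, the Python sketch below counts the markers that are polymorphic between a recurrent parent and each candidate donor, then ranks donors by that count. The genotype coding, the treatment of missing calls (`None`) and the ranking criterion are simplifying assumptions, and the function names are hypothetical; this is not Flapjack's implementation or API.

```python
MISSING = None  # assumed code for a missing genotype call


def polymorphic_markers(line_a, line_b):
    """Count markers where two lines have different, non-missing calls.

    Both lists are assumed to follow the same marker order."""
    return sum(
        1 for a, b in zip(line_a, line_b)
        if a is not MISSING and b is not MISSING and a != b
    )


def rank_donors(recurrent, candidates):
    """Rank candidate donors by the number of markers polymorphic with the
    recurrent parent; more polymorphic markers leave more markers usable
    for selection in the backcross."""
    scores = {name: polymorphic_markers(recurrent, calls)
              for name, calls in candidates.items()}
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)


if __name__ == "__main__":
    # Toy genotypes coded as parental alleles "A"/"B"; not real data.
    recurrent = ["A", "A", "B", "B", None]
    donors = {
        "donor1": ["B", "A", "A", "B", "A"],
        "donor2": ["B", "B", "A", "A", "B"],
        "donor3": ["A", "A", "B", "B", "B"],
    }
    print(rank_donors(recurrent, donors))
```

Markers with a missing call in either line are simply skipped; a real selection would also weigh marker positions and data quality.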

The pipeline has been documented in detail, and all documentation will soon be made publicly available. The documentation outlines the steps to be followed in the pipeline, so that end users can run it easily even if they are not familiar with Linux/Unix. (Activity 4)

A standalone version of the pipeline is also being prepared and packaged on CD/DVD for distribution to the plant research community. The standalone version will help researchers who have computational facilities and sequencing data but lack expertise in NGS data analysis. The pipeline is open source and can be developed further by the community. (Activity 6)

A demonstration of the pipeline is being scheduled for the GCP General Research Meeting in Hyderabad. (Activity 7)

Yan J, Yang X, Shah T, Sánchez H, Li J, et al. (2009) High-throughput SNP genotyping with the Golden Gate assay in maize. Mol Breed. DOI: 10.1007/s11032-009-9343-2

Thakur V, Varshney R (2010) Challenges and strategies for next generation sequencing (NGS) data analysis. J Comput Sci Syst Biol 3:040-042

Varshney RK, Nayak SN, May GD, Jackson SA (2009) Next-generation sequencing technologies and their implications for crop genetics and breeding. Trends Biotechnol 27:522-30

Item Type: Others
Publisher: CIMMYT