Murdoch University Research Repository


The Murdoch University Research Repository is an open access digital collection of research
created by Murdoch University staff, researchers and postgraduate students.


Haplotype mapping uncovers unexplored variation in wild and domesticated soybean at the major protein locus cqProt-003

Marsh, J.I., Hu, H. (ORCID: 0000-0003-1070-213X), Petereit, J., Bayer, P.E., Valliyodan, B., Batley, J., Nguyen, H.T. and Edwards, D. (2021) Haplotype mapping uncovers unexplored variation in wild and domesticated soybean at the major protein locus cqProt-003. bioRxiv, pp. 1-28.

PDF (Preprint), 754 kB
Free to read: no subscription required.


Here, we present association and linkage analysis of 985 wild, landrace and cultivar soybean accessions in a pan-genomic dataset to characterise the major high-protein/low-oil associated locus cqProt-003, located on chromosome 20. A significant trait-associated region within a 173 kb linkage block was identified, and variants in the region were characterised, identifying 34 high-confidence SNPs, 4 insertions, 1 deletion and a larger 304 bp structural variant in the high-protein haplotype. Trinucleotide tandem repeats of variable length present in the third exon of gene 20G085100 are strongly correlated with the high-protein phenotype and likely represent causal variation. Structural variation has previously been found in the same gene. We report the global distribution of the 304 bp deletion and identify additional nested variation present in high-protein individuals. Mapping variation at the cqProt-003 locus across demographic groups indicates that the high-protein haplotype is common in wild accessions (94.7%), rare in landraces (10.6%) and nearly absent in cultivated breeding pools (4.1%). This suggests that its decline in frequency is primarily associated with domestication and continued during subsequent improvement. However, the variation that has persisted in under-utilised wild and landrace populations holds high breeding potential for breeders willing to forgo seed oil to maximise protein content. The results of this study include the identification of distinct haplotype structures within the high-protein population and a broad characterisation of the genomic context and linkage patterns of cqProt-003 across global populations, supporting future functional characterisation and modification.

Item Type: Non-refereed Article
Publisher: Cold Spring Harbor Laboratory

