Murdoch University Research Repository

Identifying disease-causing short tandem repeat expansions in massively parallel sequencing data, with a focus on ataxias

Tankard, Rick Michael (ORCID: 0000-0002-8847-9401) (2017) Identifying disease-causing short tandem repeat expansions in massively parallel sequencing data, with a focus on ataxias. PhD thesis, The University of Melbourne.

Short tandem repeat (STR) expansions are responsible for over twenty neurological diseases in humans. This thesis explores the ability to identify disease-causing STR expansions in massively parallel sequencing (MPS) data (also known as next-generation sequencing (NGS) data). The focus of this thesis is on repeat expansions of spinocerebellar ataxias (SCAs) as these can be difficult to subtype clinically. Detection of repeat expansion disorder alleles is important for disease management and carrier screening. Well-developed methods for analysing STRs in MPS data were limited to the fragment size of reads and did not attempt to give calls for oversized repeats. At the start of my PhD there were no published methods for detecting repeat expansions in MPS data. During the course of my PhD I developed one of these methods, exSTRa (expanded STR algorithm). exSTRa is based on the hypothesis that although MPS reads are too short to cover most repeat expansion alleles, they are still detectable because such alleles lead to an increased number of reads mapping to the STR in question. Firstly, exSTRa identifies as many reads as possible that could map into the highly repetitive STR region, which it achieves by recovering read pairs that are usually discarded. Secondly, exSTRa counts the repeat motif content to form a "repeat score" for each read that approximates the length of the repeat within the read. This is used to form a test statistic for each sample at each locus. We derived approximate distributions of the test statistic under the null hypothesis. We tested exSTRa on the largest and currently most diverse cohort of repeat expansion individuals available. We showed that repeat expansions can be detected with polymerase chain reaction-free (PCR-free) whole-genome sequencing (WGS) protocol data (as used by the other two published methods) and with PCR-based WGS protocol data, as well as whole-exome sequencing (WES) data (for targeted loci).
We also present a family with a heritable, undiagnosed spinocerebellar ataxia with apparent anticipation that we mapped to four genomic locations, including one within the SCA25 locus (OMIM %608703). Anticipation suggested the causal mutation was a repeat expansion, but several heuristic and visual methods failed to find a causal expansion. We found two rare non-repeat variants in the genes STON1 (or read-through gene STON1-GTF2A1L) and PNPT1, but these were of unknown significance and did not explain the apparent anticipation. This work leads the way to retrospective and prospective repeat expansion detection of known STR loci and, in the future, could be expanded to detecting novel STR loci. This will benefit patients worldwide who may have their genetic disorder pinpointed to a repeat expansion disorder.
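
The "repeat score" mentioned in the abstract, a per-read count of repeat motif content, can be illustrated with a minimal sketch. This is not exSTRa's actual implementation: the function name `repeat_score`, the exact-match rule and the example reads are all assumptions made for illustration. The sketch counts how many bases of a read are covered by at least one exact copy of the motif, which roughly approximates the length of the repeat within the read.

```python
def repeat_score(read: str, motif: str) -> int:
    """Hypothetical repeat score: number of bases in `read` that are
    covered by at least one exact copy of `motif` (illustration only)."""
    read = read.upper()
    motif = motif.upper()
    k = len(motif)
    if k == 0:
        raise ValueError("motif must be non-empty")

    # Mark every base that lies inside some exact occurrence of the motif.
    covered = [False] * len(read)
    start = read.find(motif)
    while start != -1:
        for i in range(start, start + k):
            covered[i] = True
        # Step by one base so overlapping occurrences are also found.
        start = read.find(motif, start + 1)

    return sum(covered)


# Made-up reads for a CAG repeat: one mostly repeat, one with no repeat.
print(repeat_score("CAGCAGCAGTT", "CAG"))  # 9
print(repeat_score("ACGT", "CAG"))         # 0
```

Under this simplification, reads drawn from an expanded allele would tend to score close to the full read length, whereas reads spanning a short allele score lower. How such per-read scores are combined into the per-sample test statistic is described in the thesis itself and is not reproduced here.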

Item Type: Others