Murdoch University Research Repository

Welcome to the Murdoch University Research Repository

The Murdoch University Research Repository is an open access digital collection of research
created by Murdoch University staff, researchers and postgraduate students.

Learn more

Large-scale detection of repetitions

Smyth, W. F. (2014) Large-scale detection of repetitions. Philosophical Transactions of the Royal Society A: Mathematical, Physical and Engineering Sciences, 372 (2016).

Free to read: http://dx.doi.org/10.1098/rsta.2013.0138
*No subscription required

Abstract

Combinatorics on words began more than a century ago with a demonstration that an infinitely long string with no repetitions could be constructed on an alphabet of only three letters. Computing all the repetitions (such as ⋯TTT⋯ or ⋯CGACGA⋯ ) in a given string x of length n is one of the oldest and most important problems of computational stringology, requiring Embedded Image time in the worst case. About a dozen years ago, it was discovered that repetitions can be computed as a by-product of the Θ(n)-time computation of all the maximal periodicities or runs in x. However, even though the computation is linear, it is also brute force: global data structures, such as the suffix array, the longest common prefix array and the Lempel–Ziv factorization, need to be computed in a preprocessing phase. Furthermore, all of this effort is required despite the fact that the expected number of runs in a string is generally a small fraction of the string length. In this paper, I explore the possibility that repetitions (perhaps also other regularities in strings) can be computed in a manner commensurate with the size of the output.

Item Type: Journal Article
Publisher: Royal Society Publishing
Copyright: 2014 The Author(s)
URI: http://researchrepository.murdoch.edu.au/id/eprint/28248
Item Control Page Item Control Page