Murdoch University Research Repository

Welcome to the Murdoch University Research Repository

The Murdoch University Research Repository is an open access digital collection of research
created by Murdoch University staff, researchers and postgraduate students.

Learn more

Fast and practical algorithms for computing all the runs in a string

Chen, G., Puglisi, S.J. and Smyth, W.F. (2007) Fast and practical algorithms for computing all the runs in a string. Lecture Notes in Computer Science, 4580 . pp. 307-315.

Link to Published Version: http://dx.doi.org/10.1007/978-3-540-73437-6_31
*Subscription may be required

Abstract

A repetition in a string x is a substring w=ue of x, maximum e ≥ 2, where u is not itself a repetition in w. A run in x is a substring w=ueu∗ of “maximal periodicity”, where ue is a repetition and u * a maximum-length possibly empty proper prefix of u. A run may encode as many as |u| repetitions. The maximum number of repetitions in any string x=x[1..n] is well known to be Θ(nlogn). In 2000 Kolpakov & Kucherov showed that the maximum number of runs in x is O(n); they also described a Θ(n)-time algorithm, based on Farach’s Θ(n)-time suffix tree construction algorithm (STCA), Θ(n)-time Lempel-Ziv factorization, and Main’s Θ(n)-time leftmost runs algorithm, to compute all the runs in x. Recently Abouelhoda et al. proposed a Θ(n)-time Lempel-Ziv factorization algorithm based on an “enhanced” suffix array — a suffix array together with other supporting data structures. In this paper we introduce a collection of fast space-efficient algorithms for computing all the runs in a string that appear in many circumstances to be superior to those previously proposed.

Item Type: Journal Article
Publisher: Springer Verlag
Copyright: 2007 Springer-Verlag Berlin Heidelberg
URI: http://researchrepository.murdoch.edu.au/id/eprint/27936
Item Control Page Item Control Page