Catalog Home Page

The role of the prefix array in sequence analysis: A survey

Franěk, F., Smyth, W.F. and Wang, X. (2017) The role of the prefix array in sequence analysis: A survey. AIMS Medical Science, 4 (3). pp. 261-273.

[img]
Preview
PDF - Published Version
Download (422kB) | Preview
Free to read: https://doi.org/10.3934/medsci.2017.3.261
*No subscription required

Abstract

The prefix array was apparently first computed and used algorithmically in 1984, playing a pivotal role in an optimal algorithm to determine all the tandem repeats in a given (DNA or protein) sequence. However, it is especially since the turn of the 21st century that applications of the prefix array to fundamental sequencing problems have been recognized. An important aspect of this expanding role has been the recognition that the prefix table and the border array are “equivalent” data structures 一 that is, one can be computed from the other in linear time. Since the border array in turn specifies all the periods of every prefix of the sequence, the prefix array thus turns out to be a structure of central importance. In this paper we survey important applications of the prefix array 一 in particular to approximate string matching under Hamming distance, as well as to the computation of covers and enhanced covers 一 and show how, unlike border array algorithms, these are extendible to sequences containing “don’t-care” or indeterminate letters such as {a, c} or {g, t}. This extension leads to a surprising correspondence between prefix arrays and undirected graphs that seems likely to be a fertile source of new insights in future. We conclude with an overview of sequencing problems that the authors believe can be handled using prefix array technology.

Publication Type: Journal Article
Murdoch Affiliation: School of Engineering and Information Technology
Publisher: AIMS Press
Copyright: © 2017, W. F. Smyth, et al.
URI: http://researchrepository.murdoch.edu.au/id/eprint/39899
Item Control Page Item Control Page

Downloads

Downloads per month over past year