Counting distinct strings
Moore, D., Smyth, W.F. and Miller, D. (1999) Counting distinct strings. Algorithmica, 23 (1). pp. 1-13.
*Subscription may be required
This paper discusses how to count and generate strings that are ``distinct'' in two senses: p -distinct and b -distinct. Two strings x on alphabet A and x' on alphabet A' are said to be p -distinct iff they represent distinct ``patterns''; that is, iff there exists no one—one mapping from A to A' that transforms x into x' . Thus aab and baa are p -distinct while aab and ddc are p -equivalent. On the other hand, x and x' are said to be b -distinct iff they give rise to distinct border (failure function) arrays: thus aab with border array 010 is b -distinct from aba with border array 001 . The number of p -distinct (resp. b -distinct) strings of length n formed using exactly k different letters is the [k,n] entry in an infinite p' (resp. b' ) array. Column sums p[n] and b[n] in these arrays give the number of distinct strings of length n . We present algorithms to compute, in constant time per string, all p -distinct (resp. b -distinct) strings of length n formed using exactly k letters, and we also show how to compute all elements p'[k,n] and b'[k,n] . These ideas and results have application to the efficient generation of appropriate test data sets for many string algorithms.
|Publication Type:||Journal Article|
|Item Control Page|
Downloads per month over past year