Murdoch University Research Repository


Exploring the use of fuzzy signature for text mining

Wong, K.W., Chumwatana, T. and Tikk, D. (2010) Exploring the use of fuzzy signature for text mining. In: 6th IEEE World Congress on Computational Intelligence, 18-23 July, Barcelona, pp. 1-5.



Classical approaches to the traditional problems of text mining, such as document indexing, document clustering and text classification, represent text as a bag of words. Words, the units of the representation, are determined by tokenization, using, for example, whitespace and punctuation characters as separators. Bag-of-words methods run into problems with non-segmented text, which is typical of some Asian languages, because tokenization can no longer be used to determine the representation units. Several solutions have been proposed so far; among them, frequent max substring mining is adopted here because of its language independence and its favourable speed and storage requirements. In this paper we present a fuzzy signature based solution that uses frequent max substrings to represent non-segmented documents, and we propose how it could be applied to some typical text mining tasks. We show how the flexibility of fuzzy signatures can be exploited for these tasks. With the proposed concept, complex decision models in text mining may be constructed more effectively in future.
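The contrast between tokenization and substring-based units can be illustrated with a small sketch. This is not the frequent max substring algorithm used in the paper, which the abstract does not describe; it is a naive enumeration written under assumptions. The function names, the `min_freq` and `min_len` thresholds, and the definition of "maximal" (a frequent substring that cannot be extended by one character without lowering its count) are all hypothetical choices made for illustration.

```python
import re
from collections import Counter


def whitespace_tokens(text):
    """Bag-of-words units: split on whitespace and punctuation."""
    return [t for t in re.split(r"\W+", text) if t]


def frequent_maximal_substrings(text, min_freq=2, min_len=2):
    """Naive illustration only (assumed definition, not the paper's method).

    Returns substrings that occur at least `min_freq` times (overlaps
    counted) and that cannot be extended by one character on either
    side while keeping the same count.
    """
    n = len(text)
    # Count every substring of length >= min_len; quadratic in len(text).
    counts = Counter(
        text[i:j] for i in range(n) for j in range(i + min_len, n + 1)
    )
    frequent = {s: c for s, c in counts.items() if c >= min_freq}
    result = {}
    for s, c in frequent.items():
        # A one-character extension with the same count makes s redundant.
        extended = any(
            len(t) == len(s) + 1 and s in t and frequent[t] == c
            for t in frequent
        )
        if not extended:
            result[s] = c
    return result


# Segmented text: tokenization yields usable units.
print(whitespace_tokens("text mining, text clustering"))
# ['text', 'mining', 'text', 'clustering']

# Non-segmented text: tokenization returns the whole string as one unit,
# while substring mining still recovers repeated units.
print(whitespace_tokens("abcabcab"))            # ['abcabcab']
print(frequent_maximal_substrings("abcabcab"))  # {'ab': 3, 'abcab': 2}
```

The enumeration is quadratic in the text length and is meant only to show the kind of unit such a method produces; a practical implementation would need an index structure suited to long documents.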

Item Type: Conference Paper
Murdoch Affiliation(s): School of Information Technology
Publisher: IEEE
Copyright: © 2010 IEEE.

