Text dimensionality reduction for document clustering using hybrid memetic feature selection

Al-Jadir, I., Wong, K.W., Fung, C.C. and Xie, H. (2017) Text dimensionality reduction for document clustering using hybrid memetic feature selection. Lecture Notes in Computer Science, 10607 . pp. 281-289.

Link to Published Version: https://doi.org/10.1007/978-3-319-69456-6_23
*Subscription may be required

Abstract

This paper proposes a document clustering method that uses hybrid feature selection. The hybrid feature selection method, named Memetic Algorithm-Feature Selection (MA-FS), integrates a genetic-based wrapper method with a ranking filter. MA-FS is combined with the K-means and Spherical K-means (SK-means) clustering methods to perform document clustering. For comparison, another unsupervised feature selection method, Feature Selection Genetic Text Clustering (FSGATC), is used. Two real-world criminal report document sets were used in the comparisons, along with two popular benchmark datasets, Reuters and 20newsgroup. F-Micro, F-Macro and Average Distance of Document to Cluster (ADDC) measures were used for evaluation. The test results showed that MA-FS outperformed both the FSGATC method and clustering on the entire feature space (ALL).
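
As an illustration of the ADDC measure named in the abstract, the following is a minimal Python sketch, not the authors' implementation. It assumes a common formulation (the mean, over clusters, of the average Euclidean distance from each document to its cluster centroid); the paper may define or normalise the measure differently. The function name `addc`, the optional 0/1 feature `mask`, and the toy vectors are hypothetical and introduced only for this example.

```python
import math


def addc(docs, labels, mask=None):
    """Average Distance of Documents to their Cluster centroid (assumed form).

    docs:   list of equal-length term-weight vectors
    labels: cluster index assigned to each document
    mask:   optional list of 0/1 flags; only features flagged 1 are kept
    """
    if mask is not None:
        # Project every document onto the selected feature subset.
        docs = [[w for w, keep in zip(d, mask) if keep] for d in docs]

    # Group documents by their assigned cluster.
    clusters = {}
    for d, c in zip(docs, labels):
        clusters.setdefault(c, []).append(d)

    total = 0.0
    for members in clusters.values():
        # Centroid is the per-feature mean of the cluster's documents.
        centroid = [sum(col) / len(members) for col in zip(*members)]
        # Average Euclidean distance of the members to that centroid.
        total += sum(math.dist(d, centroid) for d in members) / len(members)

    # Mean of the per-cluster averages.
    return total / len(clusters)


# Toy data (hypothetical): four documents, three term features, two clusters.
docs = [[1.0, 0.0, 5.0], [3.0, 0.0, 1.0], [0.0, 2.0, 4.0], [0.0, 4.0, 0.0]]
labels = [0, 0, 1, 1]

score_all = addc(docs, labels)                       # full feature space
score_subset = addc(docs, labels, mask=[1, 1, 0])    # third feature dropped
```

Under this formulation, lower values indicate more compact clusters. In a wrapper-style feature selection, a score of this kind could serve as the fitness of a candidate feature mask, although the fitness function actually used by MA-FS is not stated in the abstract.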

Publication Type: Journal Article
Murdoch Affiliation: School of Engineering and Information Technology
Publisher: Springer Verlag
Copyright: © 2017 Springer International Publishing AG
URI: http://researchrepository.murdoch.edu.au/id/eprint/39790