Murdoch University Research Repository

Unsupervised text Feature Selection using memetic Dichotomous Differential Evolution

Al-Jadir, I., Wong, K.W., Fung, C.C. (ORCID: 0000-0001-5182-3558) and Xie, H. (2020) Unsupervised text Feature Selection using memetic Dichotomous Differential Evolution. Algorithms, 13 (6). Article 131.

Free to read: https://doi.org/10.3390/a13060131
*No subscription required

Abstract

Feature Selection (FS) methods have been studied extensively in the literature, and they are a crucial component of machine learning techniques. However, unsupervised text feature selection has not been well studied for document clustering problems. Feature selection can be modelled as an optimization problem because of the large number of candidate feature subsets that might be valid. In this paper, a memetic method that combines Differential Evolution (DE) with Simulated Annealing (SA) for unsupervised FS is proposed. Because each feature is encoded with only two values, indicating its presence or absence, a binary version of DE is used. A dichotomous DE serves as this binary version, and the proposed method is named Dichotomous Differential Evolution Simulated Annealing (DDESA). The method replaces the standard DE mutation with dichotomous mutation, which is intended to be more effective for binary representations. The Mean Absolute Distance (MAD) filter is used as the internal evaluation measure for feature subsets. The proposed method was compared with other state-of-the-art methods, including the standard DE combined with SA (named DESA in this paper), on five benchmark datasets. The F-micro and F-macro scores, the Average Distance of Document to Cluster (ADDC) and the Reduction Rate (RR) were used as evaluation measures. Test results showed that the proposed DDESA outperformed the other tested methods in unsupervised text feature selection.
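
To make the binary encoding and the filter-based evaluation concrete, the following is a minimal Python sketch, not the authors' implementation. It assumes that MAD scores each term by the mean absolute deviation of its weight from its mean across documents, that a subset's fitness is the mean MAD of its selected terms, and that RR is the fraction of features removed. The function names (mad_scores, subset_score, reduction_rate, anneal) and the cooling parameters are hypothetical. The local search shown is a plain bit-flip simulated annealing step over a 0/1 mask; the dichotomous mutation itself is not specified in the abstract and is therefore not reproduced here.

```python
import math
import random


def mad_scores(matrix):
    """Per-feature mean absolute deviation from the feature's mean (assumed MAD form)."""
    n_docs = len(matrix)
    n_feats = len(matrix[0])
    scores = []
    for j in range(n_feats):
        column = [row[j] for row in matrix]
        mean = sum(column) / n_docs
        scores.append(sum(abs(v - mean) for v in column) / n_docs)
    return scores


def subset_score(mask, scores):
    """Assumed fitness: mean MAD over the selected features (0.0 if none selected)."""
    chosen = [s for bit, s in zip(mask, scores) if bit]
    return sum(chosen) / len(chosen) if chosen else 0.0


def reduction_rate(mask):
    """Fraction of features removed by the mask (assumed RR definition)."""
    return 1.0 - sum(mask) / len(mask)


def anneal(mask, scores, temp=1.0, cooling=0.9, steps=200, rng=None):
    """Bit-flip simulated annealing over a binary feature mask (maximizes subset_score)."""
    rng = rng or random.Random(0)
    current = list(mask)
    current_fit = subset_score(current, scores)
    best, best_fit = list(current), current_fit
    for _ in range(steps):
        candidate = list(current)
        i = rng.randrange(len(candidate))
        candidate[i] = 1 - candidate[i]  # flip one presence/absence bit
        fit = subset_score(candidate, scores)
        delta = fit - current_fit
        # Always accept improvements; accept worse moves with Boltzmann probability.
        if delta >= 0 or rng.random() < math.exp(delta / temp):
            current, current_fit = candidate, fit
            if fit > best_fit:
                best, best_fit = list(candidate), fit
        temp = max(temp * cooling, 1e-9)  # geometric cooling with a small floor
    return best, best_fit


# Toy term-weight matrix: 4 documents, 3 terms; only the third term varies.
docs = [[0, 1, 5], [0, 1, 0], [0, 1, 5], [0, 1, 0]]
scores = mad_scores(docs)
best_mask, best_fit = anneal([1, 1, 1], scores)
print(scores, best_mask, best_fit, reduction_rate(best_mask))
```

In a memetic scheme of the kind the abstract describes, a local search like anneal would typically refine candidates produced by the evolutionary step; how DDESA couples the two is detailed in the paper rather than here.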

Item Type: Journal Article
Murdoch Affiliation(s): Information Technology, Mathematics and Statistics
Publisher: MDPI
Copyright: © 2020 by the authors. Licensee MDPI, Basel, Switzerland.
URI: http://researchrepository.murdoch.edu.au/id/eprint/56147