
Data cleaning for classification using misclassification analysis

Jeatrakul, P., Wong, K.W. and Fung, C.C. (2010) Data cleaning for classification using misclassification analysis. Journal of Advanced Computational Intelligence and Intelligent Informatics, 14 (3). pp. 297-302.

    Link to Published Version: http://www.fujipress.jp/finder/xslt.php?mode=prese...
    *Subscription may be required

    Abstract

    Data cleaning is often used as a preprocessing step in classification problems to achieve better results. Its purpose is to remove noise, inconsistent data and errors from the training data, so that a better, more representative data set can be used to develop a reliable classification model. In most classification models, unclean data can sometimes affect the classification accuracy of the model. In this paper, we investigate the use of misclassification analysis for data cleaning. To demonstrate the concept, we use an Artificial Neural Network (ANN) as the core computational intelligence technique. We evaluate the proposed data cleaning technique on four benchmark data sets obtained from the University of California, Irvine (UCI) machine learning repository. All four are binary classification problems: German credit data, BUPA liver disorders, Johns Hopkins Ionosphere and Pima Indians Diabetes. The results show that the proposed cleaning technique could be a good alternative for providing some confidence when constructing a classification model.
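The abstract does not spell out the cleaning procedure, so the following is only a minimal sketch of one common reading of "misclassification analysis": train a classifier on the training data, then discard the training records it misclassifies as likely noise. As an assumption, a single sigmoid neuron (logistic regression trained by batch gradient descent) stands in for the paper's ANN, and all function names, hyperparameters and the toy data are hypothetical illustrations, not the authors' code. The toy set includes an inconsistent record: the same features as a class-0 record, but labelled 1.

```python
import math
from typing import List, Tuple

# A labelled training record: (feature vector, binary class label 0 or 1).
Sample = Tuple[List[float], int]


def sigmoid(z: float) -> float:
    """Logistic function, written to avoid overflow for large |z|."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def train_neuron(data: List[Sample], epochs: int = 500,
                 lr: float = 1.0) -> Tuple[List[float], float]:
    """Hypothetical stand-in for the ANN: one sigmoid neuron fitted by
    batch gradient descent on the cross-entropy loss."""
    n_features = len(data[0][0])
    weights = [0.0] * n_features
    bias = 0.0
    for _ in range(epochs):
        grad_w = [0.0] * n_features
        grad_b = 0.0
        for x, y in data:
            z = sum(w * xi for w, xi in zip(weights, x)) + bias
            err = sigmoid(z) - y  # d(loss)/dz for a sigmoid output with cross-entropy
            for i, xi in enumerate(x):
                grad_w[i] += err * xi
            grad_b += err
        weights = [w - lr * g / len(data) for w, g in zip(weights, grad_w)]
        bias -= lr * grad_b / len(data)
    return weights, bias


def predict(weights: List[float], bias: float, x: List[float]) -> int:
    """Threshold the neuron's output at 0.5 (equivalently, its logit at 0)."""
    z = sum(w * xi for w, xi in zip(weights, x)) + bias
    return 1 if z >= 0.0 else 0


def clean_by_misclassification(
        data: List[Sample]) -> Tuple[List[Sample], List[Sample]]:
    """Train on all records, then split them into kept (correctly
    classified) and removed (misclassified, treated as noise)."""
    weights, bias = train_neuron(data)
    kept: List[Sample] = []
    removed: List[Sample] = []
    for x, y in data:
        (kept if predict(weights, bias, x) == y else removed).append((x, y))
    return kept, removed


if __name__ == "__main__":
    # Toy 1-D data: class 0 near -0.9, class 1 near +0.9, plus one record
    # that duplicates a class-0 feature vector but carries label 1.
    data = [([-1.0], 0), ([-0.9], 0), ([-0.8], 0),
            ([0.8], 1), ([0.9], 1), ([1.0], 1),
            ([-0.9], 1)]
    kept, removed = clean_by_misclassification(data)
    print("removed:", removed)  # prints: removed: [([-0.9], 1)]
```

Filtering on the training set itself is the simplest variant; a cross-validated variant, where each record is judged by a model that did not see it during training, is a plausible alternative when the classifier can memorise individual records.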

    Publication Type: Journal Article
    Murdoch Affiliation: School of Information Technology
    Publisher: Fuji Technology Press Co. Ltd.
    Copyright: (c) Fuji Technology Press
    URI: http://researchrepository.murdoch.edu.au/id/eprint/1310