Data cleaning for classification using misclassification analysis
Jeatrakul, P., Wong, K.W. and Fung, C.C. (2010) Data cleaning for classification using misclassification analysis. Journal of Advanced Computational Intelligence and Intelligent Informatics, 14 (3). pp. 297-302.
In many classification problems, data cleaning is used as a preprocessing technique to achieve better results. The purpose of data cleaning is to remove noise, inconsistent data and errors from the training data, so that a cleaner and more representative data set can be used to develop a reliable classification model. In most classification models, unclean data can sometimes reduce the classification accuracy of the model. In this paper, we investigate the use of misclassification analysis for data cleaning. To demonstrate the concept, we use an Artificial Neural Network (ANN) as the core computational intelligence technique. We evaluate the proposed data cleaning technique on four benchmark data sets obtained from the University of California Irvine (UCI) machine learning repository. All four are binary classification problems: German credit data, BUPA liver disorders, Johns Hopkins Ionosphere and Pima Indians Diabetes. The results show that the proposed cleaning technique could be a good alternative to provide some confidence when constructing a classification model.
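The abstract does not spell out the cleaning procedure, so the following is only a minimal sketch of one plausible reading: train a model on the training set, treat the training instances it misclassifies as candidate noise, and remove them before building the final model. A single sigmoid unit trained by gradient descent stands in for the ANN, and the function names, hyperparameters and toy data are all hypothetical rather than taken from the paper.

```python
import math

def train_sigmoid_unit(X, y, epochs=500, lr=0.1):
    """Fit one sigmoid unit by batch gradient descent (assumed stand-in for the ANN)."""
    n, d = len(X), len(X[0])
    w, b = [0.0] * d, 0.0
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * d, 0.0
        for xi, yi in zip(X, y):
            z = sum(wj * xj for wj, xj in zip(w, xi)) + b
            err = 1.0 / (1.0 + math.exp(-z)) - yi  # predicted probability minus label
            for j in range(d):
                grad_w[j] += err * xi[j]
            grad_b += err
        w = [wj - lr * g / n for wj, g in zip(w, grad_w)]
        b -= lr * grad_b / n
    return w, b

def predict(model, X):
    """Return hard 0/1 predictions from the fitted unit."""
    w, b = model
    return [1 if sum(wj * xj for wj, xj in zip(w, xi)) + b > 0 else 0 for xi in X]

def clean_by_misclassification(X, y):
    """Drop training instances that the fitted model misclassifies (assumed rule)."""
    model = train_sigmoid_unit(X, y)
    preds = predict(model, X)
    flagged = [i for i, (p, t) in enumerate(zip(preds, y)) if p != t]
    flagged_set = set(flagged)
    X_clean = [x for i, x in enumerate(X) if i not in flagged_set]
    y_clean = [t for i, t in enumerate(y) if i not in flagged_set]
    return X_clean, y_clean, flagged

# Toy binary problem: two clusters plus one instance whose label looks wrong.
X = [(0.0, 0.0), (0.2, 0.1), (0.1, 0.3), (0.3, 0.2),   # class 0 cluster
     (1.8, 1.8), (2.2, 2.2), (1.8, 2.2), (2.2, 1.8),   # class 1 cluster
     (2.0, 2.0)]                                        # labelled 0 inside class 1
y = [0, 0, 0, 0, 1, 1, 1, 1, 0]

X_clean, y_clean, flagged = clean_by_misclassification(X, y)
print("flagged indices:", flagged)
print("instances kept:", len(y_clean))
```

In this sketch the final classifier would then be retrained on `X_clean` and `y_clean`. Removing every misclassified instance is a blunt rule that can also discard hard but valid examples, so any real use would need the criteria described in the paper itself.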
|Publication Type:||Journal Article|
|Murdoch Affiliation:||School of Information Technology|
|Publisher:||Fuji Technology Press Co. Ltd.|
|Copyright:||(c) Fuji Technology Press|