Using misclassification analysis for data cleaning
Jeatrakul, P., Wong, K.W. and Fung, C.C. (2009) Using misclassification analysis for data cleaning. In: International Workshop on Advanced Computational Intelligence and Intelligent Informatics, IWACIII 2009, 7 November, Tokyo, Japan
Data cleaning is a pre-processing technique used in most data mining problems. Its purpose is to remove noise, inconsistent data and errors in order to obtain a better and more representative data set from which to develop a reliable prediction model. In most prediction models, unclean data can sometimes reduce prediction accuracy. In this paper, we investigate the use of misclassification analysis for data cleaning in classification problems. To demonstrate our concept, we use an artificial neural network (ANN) as the core computational intelligence technique. We use three benchmark data sets obtained from the University of California Irvine (UCI) machine learning repository to investigate the results of our proposed data cleaning technique. The data sets used in our experiments are binary classification problems: German credit data, BUPA liver disorders, and Johns Hopkins Ionosphere. The results from our experiments show that the proposed cleaning technique could be a good alternative to provide some confidence when constructing a classification model.
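The abstract does not spell out the cleaning procedure, so the following is only a minimal sketch of one common reading of misclassification analysis: train a classifier on the full training set, treat the training instances it misclassifies as suspected noise, remove them, and retrain. As assumptions for illustration, a single sigmoid neuron stands in for the ANN, the one-dimensional toy data set is invented, and the names `train`, `predict` and `clean` are hypothetical rather than taken from the paper.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def train(X, y, epochs=500, lr=0.5):
    """Fit a single sigmoid neuron (a stand-in for an ANN) by full-batch gradient descent."""
    n, d = len(X), len(X[0])
    w, b = [0.0] * d, 0.0
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * d, 0.0
        for xi, yi in zip(X, y):
            # Prediction error of the neuron on one instance.
            err = sigmoid(sum(wj * xj for wj, xj in zip(w, xi)) + b) - yi
            grad_w = [g + err * xj for g, xj in zip(grad_w, xi)]
            grad_b += err
        w = [wj - lr * g / n for wj, g in zip(w, grad_w)]
        b -= lr * grad_b / n
    return w, b


def predict(model, x):
    """Return the predicted class (0 or 1) for one instance."""
    w, b = model
    return 1 if sum(wj * xj for wj, xj in zip(w, x)) + b >= 0 else 0


def clean(X, y):
    """Drop the training instances that a model trained on (X, y) misclassifies."""
    model = train(X, y)
    removed = [i for i, (xi, yi) in enumerate(zip(X, y)) if predict(model, xi) != yi]
    dropped = set(removed)
    X_clean = [xi for i, xi in enumerate(X) if i not in dropped]
    y_clean = [yi for i, yi in enumerate(y) if i not in dropped]
    return X_clean, y_clean, removed


# Invented toy data: class 0 lies at negative x, class 1 at positive x.
# The last two instances carry deliberately flipped labels (simulated noise).
X = [[-3.0], [-2.5], [-2.0], [-1.5], [-1.0],
     [1.0], [1.5], [2.0], [2.5], [3.0],
     [2.0], [-2.0]]
y = [0, 0, 0, 0, 0,
     1, 1, 1, 1, 1,
     0, 1]

X_clean, y_clean, removed = clean(X, y)
final_model = train(X_clean, y_clean)  # retrain on the cleaned set
print("removed indices:", removed)
```

Removing every misclassified instance is the simplest possible policy; it can also discard correctly labelled but hard examples, so a real application would likely need a more conservative criterion than this sketch uses.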
Publication Type: Conference Paper
Murdoch Affiliation: School of Information Technology