Murdoch University Research Repository

Classification of overlapped data with improved regularisation techniques using Fuzzy Deep Neural Networks

Dabare, Rukshima (2020) Classification of overlapped data with improved regularisation techniques using Fuzzy Deep Neural Networks. PhD thesis, Murdoch University.

Abstract

This thesis investigates methods to enhance the performance of a Deep Neural Network (DNN) classifier on numerical data by introducing improved regularisation techniques. The three significant factors considered for enhancement are overlapped data in balanced and imbalanced data environments, and the issue of invariance and overfitting.

Many classification algorithms, such as DNNs, classify a data item as belonging to either the true or the false class in binary classification. However, in some real-world applications, a data item may belong to both classes to a certain degree. Data with similar characteristics that appear in the feature space with different degrees of belonging are known as overlapped data. The overlapping class issue is one of the significant factors that lead to poor classification performance. In practice, there are two ways of handling overlapped data. The first is to remove the overlapped instances, and the second is to separate the overlapped regions and classify them separately. However, both practices have drawbacks. Removing overlapped instances is not the best option, as it may remove essential data items that describe the dataset, especially in an imbalanced dataset.

On the other hand, classifying the overlapped and non-overlapped regions separately is a time-consuming task. Hence, there is a need to consider other techniques to handle overlapped data. Furthermore, a traditional classifier does not consider the underlying overlapping behaviour of the data attributes. This behaviour can, however, be addressed with the use of fuzzy concepts. When a data item belongs to different classes to different degrees, those degrees of belonging can be modelled using fuzzy concepts to classify the classes. Therefore, this research proposes an overlapped data handling technique, named FuzzyDNN, that uses Fuzzy C-Means, fuzzy membership grades, and cluster centre values. The results indicate that the proposed FuzzyDNN is capable of addressing the underlying behaviour of the overlapped data when performing classification. FuzzyDNN improves the classification accuracy by 8.89%, 0.88% and 1.24% compared with the next highest performing technique on the three datasets tested in this thesis.
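The fuzzy membership grades and cluster centres mentioned above can be illustrated with a minimal Python sketch of standard Fuzzy C-Means. This is not the thesis's FuzzyDNN implementation: the function names, the fuzzifier value m = 2, the synthetic two-cluster data, and the idea of appending membership grades to the input features are all assumptions made for illustration, since the abstract does not say how FuzzyDNN combines these quantities.

```python
import numpy as np

def fcm_memberships(X, centres, m=2.0, eps=1e-12):
    """Standard FCM membership grade u[i, j] of sample i in cluster j."""
    # Euclidean distance from every sample to every centre: (n_samples, n_clusters).
    d = np.linalg.norm(X[:, None, :] - centres[None, :, :], axis=2)
    d = np.maximum(d, eps)  # guard against a sample lying exactly on a centre
    # u_ij = 1 / sum_k (d_ij / d_ik)^(2 / (m - 1))
    ratio = (d[:, :, None] / d[:, None, :]) ** (2.0 / (m - 1.0))
    return 1.0 / ratio.sum(axis=2)

def fuzzy_c_means(X, n_clusters, m=2.0, n_iter=100, tol=1e-6, seed=0):
    """Alternate membership and centre updates until the centres stop moving."""
    rng = np.random.default_rng(seed)
    # Initialise the centres with distinct randomly chosen samples.
    centres = X[rng.choice(len(X), size=n_clusters, replace=False)]
    for _ in range(n_iter):
        w = fcm_memberships(X, centres, m) ** m
        # Each centre is the membership-weighted mean of all samples.
        new_centres = (w.T @ X) / w.sum(axis=0)[:, None]
        shift = np.linalg.norm(new_centres - centres)
        centres = new_centres
        if shift < tol:
            break
    return centres, fcm_memberships(X, centres, m)

# Hypothetical data: two Gaussian clusters that overlap in feature space.
rng = np.random.default_rng(42)
X = np.vstack([rng.normal(0.0, 1.0, size=(100, 2)),
               rng.normal(2.0, 1.0, size=(100, 2))])

centres, u = fuzzy_c_means(X, n_clusters=2)

# Assumed usage: append the membership grades to the original features
# so a downstream classifier can see how strongly each item overlaps.
augmented = np.hstack([X, u])
print(augmented.shape)  # (200, 4)
```

Samples in the overlapped region receive membership grades close to 0.5 for both clusters, whereas samples far from the overlap receive grades close to 0 or 1. This is the kind of graded belonging that a hard class label cannot express.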

DNNs also tend to overfit due to their ability to extract more features from a given set of data. One of the main problems for the generalisation capability of a DNN classifier arises when a small amount of training data with limited variation is used. It is, therefore, vital to present training data with different variations of the domain to a classifier to ensure that the classifier can generalise well. If pattern variations in the training dataset are small, one cannot expect good generalisation from the classifier. Hence, this research proposes a technique to improve the generalisation capability of DNNs. Generally, techniques used to improve generalisation ability are known as regularisation techniques. There are various regularisation techniques in practice to handle different issues that can affect the generalisation capability of a DNN. The proposed technique, however, is capable of augmenting a numerical dataset to enhance the training dataset by introducing variations in the training of a classifier. In this thesis, the proposed data augmentation technique, FDA, uses fuzzy concepts. The experimental results indicate that the FDA can enhance the training dataset to help the DNN classifier generalise well to unseen data, and that it acts as a proper regularisation technique when compared with some commonly used regularisation techniques.

Finally, in this research, the classification of overlapped data in an imbalanced dataset and the generalisation capability of the classifier are considered concurrently. An imbalanced binary dataset is a dataset in which the instances of one class predominantly outnumber those of the other class. In such scenarios, traditional classifiers are biased towards the majority class. Moreover, the performance of a classifier degrades heavily when overlapped data also appear in the imbalanced dataset. Because the available data could be limited, the issue of invariance in the training data for the DNN can occur at the same time, so there is a need for a suite of techniques working together to address the three issues concurrently. Further, only a limited amount of work concentrates on numerical data classification with DNNs for imbalanced overlapped data. Therefore, in this research, a model is proposed to handle the overlapped data in an imbalanced dataset using the proposed data augmentation technique to improve the generalisation ability of the DNN classification model. All the algorithms proposed in this thesis were implemented using MATLAB and Python (in an Anaconda environment).
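The majority-class bias described above can be made concrete with a small Python sketch. The 95:5 class split and the helper names are hypothetical and are not taken from the thesis. The sketch only shows why plain accuracy can hide poor minority-class performance on an imbalanced binary dataset.

```python
import numpy as np

def accuracy(y_true, y_pred):
    """Fraction of all instances that are labelled correctly."""
    return float(np.mean(y_true == y_pred))

def recall(y_true, y_pred, positive=1):
    """Fraction of the instances of the `positive` class that are recovered."""
    mask = y_true == positive
    return float(np.mean(y_pred[mask] == positive))

# Hypothetical imbalanced dataset: 95 majority (0) and 5 minority (1) instances.
y_true = np.array([0] * 95 + [1] * 5)

# A degenerate classifier that always predicts the majority class.
y_pred = np.zeros_like(y_true)

print(accuracy(y_true, y_pred))            # 0.95
print(recall(y_true, y_pred, positive=1))  # 0.0
```

Although this classifier never identifies a minority instance, it still reaches 95% accuracy, which is why per-class measures are usually reported alongside accuracy for imbalanced data.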

Item Type: Thesis (PhD)
Murdoch Affiliation(s): Information Technology, Mathematics and Statistics
Supervisor(s): Wong, Kevin, Shiratuddin, Fairuz and Koutsakis, Polychronis
URI: http://researchrepository.murdoch.edu.au/id/eprint/59203