A comparison of procedures for robust multivariate outlier detection

Grose, Andrew (2019) A comparison of procedures for robust multivariate outlier detection. Honours thesis, Murdoch University.

PDF - Whole Thesis
Embargoed until January 2022.

Abstract

This thesis compares the efficacy of three prominent multivariate outlier detection algorithms and, in addition, compares the efficiency of estimates obtained from these methods once detected outliers are removed with that of common robust estimation methods. Relative efficiency is estimated from estimates of the multivariate mean in Monte Carlo simulation. The number of samples generated for these simulations is chosen to ensure reliable estimates while avoiding excessive computing times. Efficacy of outlier detection is judged by power in finding outliers in simulated data from contaminated distributions, by size for uncontaminated distributions, and by power in real-world data sets with known or planted outliers. The time elapsed for various routines is also compared briefly, on an ad hoc basis.
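As a rough illustration of how relative efficiency can be estimated by Monte Carlo simulation, the sketch below compares the coordinate-wise sample mean with the coordinate-wise median as estimates of a multivariate mean. It is not the thesis code: the contamination model, the choice of the median as the robust estimator, and all function names and parameter values are assumptions made for this example.

```python
import random
import statistics


def contaminated_sample(n, p, eps, shift, rng):
    """Draw n points in p dimensions (hypothetical contamination model).

    Each point is N(0, I) with probability 1 - eps; otherwise every
    coordinate is drawn from N(shift, 1).
    """
    rows = []
    for _ in range(n):
        centre = shift if rng.random() < eps else 0.0
        rows.append([rng.gauss(centre, 1.0) for _ in range(p)])
    return rows


def columnwise(estimator, rows):
    """Apply a univariate location estimator to each coordinate."""
    return [estimator(col) for col in zip(*rows)]


def relative_efficiency(n=50, p=2, eps=0.0, shift=5.0, reps=1000, seed=1):
    """Monte Carlo estimate of MSE(mean) / MSE(median).

    The true location is the origin, so the squared error of an estimate
    is its squared Euclidean norm. Both estimators see the same samples.
    A value below 1 means the mean is more efficient; above 1, the median.
    """
    rng = random.Random(seed)
    mse_mean = 0.0
    mse_median = 0.0
    for _ in range(reps):
        rows = contaminated_sample(n, p, eps, shift, rng)
        mse_mean += sum(v * v for v in columnwise(statistics.fmean, rows))
        mse_median += sum(v * v for v in columnwise(statistics.median, rows))
    return mse_mean / mse_median


if __name__ == "__main__":
    print("clean data:       ", relative_efficiency(eps=0.0))
    print("10% contamination:", relative_efficiency(eps=0.1))
```

Increasing `reps` reduces the Monte Carlo error of the ratio at the cost of run time, which is the trade-off the abstract refers to when choosing the number of simulated samples.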

A particular motivation for this study is the focus on a specific adaptive method known as the adaptive trimmed likelihood algorithm (ATLA). ATLA is the multivariate version of a method that developed out of adaptive univariate location estimation, first explored in Clarke (1994) and later related in terms of asymptotic theory in Bednarski and Clarke (2002). The asymptotic theory for the trimmed likelihood estimator was addressed in Bednarski and Clarke (1993). A numerical routine using what is termed forward search, together with various comparisons made using ATLA, is described in Schubert (2005). The routine was later slightly modified by Robert Hammarstrand for completeness and is available in the supplementary materials of Clarke (2018) at the website:

https://www.wiley.com/en-au/Robustness+Theory+and+Application-p-9781118669303

Also available at that website is an algorithm called Onesample, written by Brenton R. Clarke and Betty Mouchel, which implements the initial algorithm used to evaluate the estimator described in Clarke (1994) in the case of univariate estimation. For multivariate estimation, ATLA serves as an outlier detection method based on adaptive use of the minimum covariance determinant (MCD) (Rousseeuw, 1983).
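To make the MCD idea concrete, the sketch below finds the exact MCD for a tiny bivariate data set by exhaustive search and flags points whose robust squared distance exceeds a chi-square cutoff. It is not ATLA and is not taken from the thesis: the fixed subset size `h`, the 0.975 cutoff level, the omission of any consistency or small-sample correction factor, the function names, and the toy data are all assumptions for illustration. Exhaustive search is only feasible for very small samples.

```python
import math
from itertools import combinations


def mean_and_cov(points):
    """Sample mean and 2x2 sample covariance (sxx, sxy, syy) of bivariate points."""
    n = len(points)
    mx = sum(x for x, _ in points) / n
    my = sum(y for _, y in points) / n
    sxx = sum((x - mx) ** 2 for x, _ in points) / (n - 1)
    syy = sum((y - my) ** 2 for _, y in points) / (n - 1)
    sxy = sum((x - mx) * (y - my) for x, y in points) / (n - 1)
    return (mx, my), (sxx, sxy, syy)


def exact_mcd(points, h):
    """Search every h-subset for the smallest covariance determinant."""
    best = None
    for subset in combinations(points, h):
        centre, (sxx, sxy, syy) = mean_and_cov(subset)
        det = sxx * syy - sxy ** 2
        if best is None or det < best[0]:
            best = (det, centre, (sxx, sxy, syy))
    return best[1], best[2]


def squared_distance(point, centre, cov):
    """Squared Mahalanobis distance using the closed-form 2x2 inverse."""
    sxx, sxy, syy = cov
    det = sxx * syy - sxy ** 2
    dx, dy = point[0] - centre[0], point[1] - centre[1]
    return (syy * dx * dx - 2.0 * sxy * dx * dy + sxx * dy * dy) / det


def flag_outliers(points, h, level=0.975):
    """Indices of points whose robust squared distance exceeds the cutoff.

    For 2 degrees of freedom the chi-square quantile has the closed form
    -2 * log(1 - level). No consistency factor is applied to the raw MCD
    covariance (an assumption), so the rule tends to over-flag.
    """
    centre, cov = exact_mcd(points, h)
    cutoff = -2.0 * math.log(1.0 - level)
    return [i for i, pt in enumerate(points)
            if squared_distance(pt, centre, cov) > cutoff]


if __name__ == "__main__":
    # Made-up toy data: nine points near the origin and one planted outlier.
    toy = [(0.1, 0.2), (-0.5, 0.3), (0.8, -0.4), (-0.2, -0.9), (1.1, 0.6),
           (-1.0, -0.2), (0.4, 1.0), (-0.7, 0.7), (0.3, -0.6), (8.0, 8.0)]
    print(flag_outliers(toy, h=7))
```

An adaptive method would choose how many points to trim from the data rather than fixing `h` in advance; that step is deliberately left out here.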

The simulations and resulting output given in this thesis were produced with software in R and MATLAB, calling on new and pre-established functions available in downloadable packages. The code is presented in the appendix, together with supporting information that details the task of each function and the functions and/or packages that it requires.

Item Type: Thesis (Honours)
Murdoch Affiliation: Information Technology, Mathematics and Statistics
Supervisor(s): Clarke, Brenton
URI: http://researchrepository.murdoch.edu.au/id/eprint/53953