Mahalanobis

Date Submitted: 04/03/2013 12:20 AM

A MULTIVARIATE OUTLIER DETECTION METHOD

P. Filzmoser Department of Statistics and Probability Theory Vienna, AUSTRIA e-mail: P.Filzmoser@tuwien.ac.at

Abstract A method for the detection of multivariate outliers is proposed which accounts for the data structure and sample size. The cut-off value for identifying outliers is defined by a measure of deviation of the empirical distribution function of the robust Mahalanobis distance from the theoretical distribution function. The method is easy to implement and fast to compute.

1 Introduction

Outlier detection is among the most important tasks in data analysis. Outliers describe abnormal data behavior, i.e. data that deviate from the natural data variability. Often outliers are of primary interest; in geochemical exploration, for example, they indicate mineral deposits. The cut-off value or threshold that numerically separates anomalous from non-anomalous data is often the basis for important decisions.

Many methods have been proposed for univariate outlier detection. They are based on (robust) estimation of location and scatter, or on quantiles of the data. A major disadvantage is that these rules do not depend on the sample size. Moreover, by the definition of most rules (e.g. mean ± 2 · scatter), outliers are identified even for “clean” data, or at least no distinction is made between outliers and extremes of a distribution.

The basis for multivariate outlier detection is the Mahalanobis distance. The standard method is to estimate the parameters of the Mahalanobis distance robustly and to compare the resulting distances with a critical value of the χ² distribution (Rousseeuw and Van Zomeren, 1990). However, values larger than this critical value are not necessarily outliers; they could still belong to the data distribution. In order to distinguish between extremes of a distribution and outliers, Garrett (1989) introduced the χ² plot, which draws the empirical...
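The fixed cut-off rule mentioned in the introduction (squared Mahalanobis distances compared with a quantile of the χ² distribution) can be illustrated with a minimal Python sketch. It is not the method proposed in this paper, and it makes several simplifying assumptions that are not in the source: the data are two-dimensional, the 0.975 quantile is used as the cut-off, and the classical sample mean and covariance stand in for the robust estimators used in practice. The function names and the toy data are hypothetical. For a point x with center t and covariance C, the squared distance is (x − t)ᵀ C⁻¹ (x − t); with 2 degrees of freedom the χ² quantile has the closed form −2 ln(1 − p).

```python
import math

def mean_and_cov_2d(points):
    """Classical sample mean and covariance of 2-D points.
    Assumption: a non-robust stand-in for a robust estimator."""
    n = len(points)
    mx = sum(p[0] for p in points) / n
    my = sum(p[1] for p in points) / n
    sxx = sum((p[0] - mx) ** 2 for p in points) / (n - 1)
    syy = sum((p[1] - my) ** 2 for p in points) / (n - 1)
    sxy = sum((p[0] - mx) * (p[1] - my) for p in points) / (n - 1)
    return (mx, my), ((sxx, sxy), (sxy, syy))

def mahalanobis_sq_2d(points, center, cov):
    """Squared Mahalanobis distance of each point from the center."""
    (a, b), (c, d) = cov
    det = a * d - b * c
    # Explicit inverse of the 2x2 covariance matrix.
    ia, ib, ic, id_ = d / det, -b / det, -c / det, a / det
    result = []
    for x, y in points:
        dx, dy = x - center[0], y - center[1]
        result.append(dx * (ia * dx + ib * dy) + dy * (ic * dx + id_ * dy))
    return result

def chi2_quantile_2df(p):
    """Quantile of the chi-square distribution with 2 degrees of freedom.
    With 2 df the distribution is exponential with mean 2, so the
    quantile has a closed form."""
    return -2.0 * math.log(1.0 - p)

def flag_outliers(points, p=0.975):
    """Indices of points whose squared distance exceeds the cut-off."""
    center, cov = mean_and_cov_2d(points)
    d2 = mahalanobis_sq_2d(points, center, cov)
    cutoff = chi2_quantile_2df(p)
    return [i for i, v in enumerate(d2) if v > cutoff], cutoff

# Hypothetical toy data: a 3x3 grid repeated twice, plus one gross outlier.
grid = [(float(x), float(y)) for x in (-1, 0, 1) for y in (-1, 0, 1)]
data = grid + grid + [(6.0, 6.0)]

flagged, cutoff = flag_outliers(data)
print("cut-off:", round(cutoff, 3))
print("flagged indices:", flagged)
```

On this toy data the cut-off is about 7.38 and only the last point (index 18) exceeds it. Because the classical estimates are themselves inflated by the outlier, the sketch also hints at why robust estimation of the center and covariance is preferred for real data.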