APPROXIMATE ENVELOPES FOR FINDING
AN UNKNOWN NUMBER OF MULTIVARIATE
OUTLIERS IN LARGE DATA SETS

 

Anthony C. Atkinson
The London School of Economics, London WC2A 2AE, UK
a.c.atkinson@lse.ac.uk

Marco Riani
Dipartimento di Economia, Università di Parma, Italy
mriani@unipr.it

Fabrizio Laurini
Dipartimento di Economia, Università di Parma, Italy
fabrizio.laurini@unipr.it

Abstract

 

We provide thresholds for the test statistic for multiple outliers in multivariate normal samples. Except in small problems, direct simulation of the required combinations of sample size, number of outliers, dimension of the observations and significance level is excessively time-consuming. We instead find the thresholds by rescaling a paradigmatic curve found by simulation. We illustrate the method on an example with 1,827 observations.

 

Additional material (figures using unscaled distances)

Figure 3 ps pdf
Figure 4 ps pdf
Figure 5 ps pdf
Figure 7 ps pdf
Figure 8 ps pdf
Figure 9 ps pdf

 

Additional material (scatter plot matrices of supermarket data)

First 3 variables before transformation ps pdf
First 3 variables after transformation ps pdf
First 3 variables after transformation with outliers highlighted ps pdf
Variables 4, 5 and 9 after transformation ps pdf
Variables 4, 5 and 9 after transformation with outliers highlighted ps pdf

 

Supermarket data (transformed data)

Excel format xls
Splus format sdd

Last modified 10/04/2017 17.25.14