APPROXIMATE ENVELOPES FOR FINDING
AN UNKNOWN NUMBER OF MULTIVARIATE
OUTLIERS IN LARGE DATA SETS
Anthony C. Atkinson | Marco Riani | Fabrizio Laurini
The London School of Economics, | Dipartimento di Economia, | Dipartimento di Economia,
London WC2A 2AE, | Università di Parma, | Università di Parma,
UK | Italy | Italy
a.c.atkinson@lse.ac.uk | mriani@unipr.it | fabrizio.laurini@unipr.it
Abstract
We provide thresholds for the test statistic for multiple outliers in multivariate normal
samples. Except in small problems, direct simulation of the required
combinations of sample size, number of outliers, dimension of the observations
and significance level is excessively time consuming. We find the thresholds by
rescaling a paradigmatic curve found by simulation. Our method is illustrated on
an example with 1,827 observations.
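
To give a feel for why direct simulation becomes expensive, the sketch below estimates a threshold by brute force for one combination of sample size, dimension and significance level. It is only an illustration under stated assumptions: the statistic used here (the largest squared Mahalanobis distance in an outlier-free normal sample) is a simple stand-in and not necessarily the test statistic of the paper, and the function name `simulate_max_distance_threshold` and its parameters `n`, `v`, `alpha`, `n_sim` and `seed` are hypothetical.

```python
import numpy as np


def simulate_max_distance_threshold(n, v, alpha=0.01, n_sim=2000, seed=0):
    """Estimate the (1 - alpha) quantile of the largest squared Mahalanobis
    distance in outlier-free N(0, I_v) samples of size n.

    Assumption: this stand-in statistic is chosen for illustration only;
    it is not claimed to be the statistic used by the authors.
    """
    rng = np.random.default_rng(seed)
    max_d2 = np.empty(n_sim)
    for s in range(n_sim):
        x = rng.standard_normal((n, v))
        centred = x - x.mean(axis=0)
        # Unbiased covariance estimate (divisor n - 1); atleast_2d covers v = 1.
        cov = np.atleast_2d(np.cov(x, rowvar=False))
        # Squared Mahalanobis distance of every observation from the sample mean.
        d2 = np.einsum("ij,ij->i", centred, np.linalg.solve(cov, centred.T).T)
        max_d2[s] = d2.max()
    return float(np.quantile(max_d2, 1.0 - alpha))


if __name__ == "__main__":
    # One (n, v, alpha) combination; every further combination repeats the loop.
    print(simulate_max_distance_threshold(n=100, v=3, alpha=0.05, n_sim=200, seed=1))
```

Each call costs `n_sim` covariance estimates on `n` observations, and a full set of thresholds needs one call per combination of sample size, number of outliers, dimension and level, which is the cost that rescaling a single simulated curve is meant to avoid.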
Additional material (figures using unscaled distances)
Figure 3 | ps
Figure 4 | ps
Figure 5 | ps
Figure 7 | ps
Figure 8 | ps
Figure 9 | ps
Additional material (scatter plot matrices of supermarket data)
First 3 variables before transformation | ps
First 3 variables after transformation | ps
First 3 variables after transformation with outliers highlighted | ps
Variables 4, 5 and 9 after transformation | ps
Variables 4, 5 and 9 after transformation with outliers highlighted | ps
Supermarket data (transformed data)
Excel format | xls
Splus format | sdd
Last modified 10/04/2017 17.25.14