Title:
|
Outlier detection with an application in seed testing
|
The main problem addressed in this work is the detection of outliers in multivariate data. The practical motivation is the wish of the Scottish Agricultural Science Agency (SASA) to automate, using machine vision, the test of analytical purity in seed testing. This test involves identifying any contaminant seeds in a sample of normal cereal seed. At present the test is done manually, but it is hoped that contaminants can be identified from certain shape and size measurements recorded using image analysis. Hence the statistical problem is one of identifying outliers (contaminants) in multivariate data. A Bayesian diagnostic for outlier detection is used, and an extension to this diagnostic, involving kernel density estimation, is proposed. Both of these diagnostics, together with two other methods of outlier detection, are applied to the seed data supplied by SASA, and the results are compared. Problems encountered with high dimensionality are reported, and a solution based on principal component analysis is proposed. The use of robust estimators is also explored. An alternative approach to the problem, using discriminant analysis to classify each seed in the sample as either normal or contaminant, is reported, and its results are compared with those of the outlier detection methods.
|