Multivariate outlier detection in laboratory safety data
Clinical laboratory safety data consist of a wide range of biochemical and haematological variables which are collected to monitor the safety of a new treatment during a clinical trial. Although the data are multivariate, testing for abnormal measurements is usually done one variable at a time. A Monte Carlo simulation study is described which compares 16 methods, some of them new, for detecting multivariate outliers, with a view to finding patients with an unusual set of laboratory measurements at a follow-up assessment. Multivariate normal and bootstrap simulations are used to create data sets of various dimensions, and both symmetrical and asymmetrical contamination are considered. The results indicate that, in addition to the routine univariate methods, it is desirable to run a battery of multivariate methods on laboratory safety data in an attempt to highlight possible outliers. Mahalanobis distance is a well-known criterion which is included in the study. Appropriate critical values for testing for a single multivariate outlier using Mahalanobis distance are derived in this thesis, and the jack-knifed Mahalanobis distance is also discussed. Finally, the presence of missing data in laboratory safety data sets motivates a study which compares eight multiple imputation methods. The multiple imputation study is described, and the performance of two outlier detection methods in the presence of three different proportions of missing data is discussed. Measures are introduced for assessing the accuracy of the missing data results, depending on which method of analysis is used.
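As an illustration of the Mahalanobis distance criterion mentioned above, the following is a minimal Python sketch, not the implementation used in the thesis. It computes the squared distance of each patient's measurement vector from the sample mean, and a jack-knifed version in which each patient is left out when the mean and covariance matrix are estimated. The function names, the simulated three-variable data set and the planted outlier are all assumptions made for illustration. The critical values derived in the thesis are not reproduced here, so the sketch only ranks patients and does not test them.

```python
import random


def column_means(rows):
    """Mean of each variable (column) across patients (rows)."""
    return [sum(col) / len(col) for col in zip(*rows)]


def covariance(rows, centre):
    """Sample covariance matrix (divisor n - 1) of rows about centre."""
    n, p = len(rows), len(centre)
    return [[sum((r[i] - centre[i]) * (r[j] - centre[j]) for r in rows) / (n - 1)
             for j in range(p)] for i in range(p)]


def solve(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting.

    Assumes a is nonsingular, i.e. more patients than variables and no
    exactly collinear variables.
    """
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):  # back substitution
        x[i] = (m[i][n] - sum(m[i][j] * x[j] for j in range(i + 1, n))) / m[i][i]
    return x


def mahalanobis_sq(x, reference):
    """Squared Mahalanobis distance of x from the mean of reference rows."""
    centre = column_means(reference)
    diff = [xi - ci for xi, ci in zip(x, centre)]
    s = covariance(reference, centre)
    # diff' S^{-1} diff, computed without forming the inverse explicitly
    return sum(d * w for d, w in zip(diff, solve(s, diff)))


def distances(rows, jackknife=False):
    """Squared distance for every patient.

    With jackknife=True, patient i is excluded from the mean and
    covariance estimates used to measure patient i's own distance.
    """
    result = []
    for i, x in enumerate(rows):
        reference = rows[:i] + rows[i + 1:] if jackknife else rows
        result.append(mahalanobis_sq(x, reference))
    return result


# Simulated data (hypothetical, not real laboratory values):
# 30 patients, 3 standardised variables, plus one planted outlier.
random.seed(1)
data = [[random.gauss(0, 1) for _ in range(3)] for _ in range(30)]
data.append([6.0, -6.0, 6.0])

ordinary = distances(data)
jackknifed = distances(data, jackknife=True)

# Rank patients by ordinary distance and show the three most extreme.
ranked = sorted(range(len(data)), key=lambda i: ordinary[i], reverse=True)
for i in ranked[:3]:
    print(f"patient {i}: D^2 = {ordinary[i]:.2f}, jack-knifed D^2 = {jackknifed[i]:.2f}")
```

The ordinary squared distance cannot exceed (n - 1)^2 / n because an extreme patient inflates the very covariance estimate used to measure it. The jack-knifed distance removes that influence, which is one reason it is of interest when looking for a single outlier.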