Title:

Computational methods for transformations to multivariate normality

Classical multivariate theory has been based largely on the multivariate normal (MVN) distribution, and the scarcity of alternative models for the meaningful and consistent analysis of multiresponse data is a well-recognised problem. Further, the complexity of generalising many non-normal univariate distributions makes it undesirable or impossible to use their multivariate versions. Hence, it seems reasonable to look for ways of transforming the data so as to enable the use of more familiar statistical techniques that are based, implicitly or explicitly, on the normal distribution. Techniques for developing data-based transformations of univariate observations have been proposed by several authors. However, there is only one major technique for the multivariate (p-variable) case, that of Andrews et al. [1971]. Their approach extended the power transformations proposed by Box & Cox [1964] to the problem of estimating power transformations of multiresponse data so as to enhance joint normality. It estimates the vector of transformation parameters by numerically maximising the log-likelihood function. However, since there are p(p+5)/2 parameters to be estimated for multivariate data without regression, the resulting maximisation is of high dimension even for modest values of p and sample size n.

The purpose of the thesis is to develop computationally simpler and more informative statistical procedures and to incorporate them in a package. The thesis is in three main parts:

1. A procedure, complementary to the log-likelihood approach, which attempts to reduce the computational requirements for obtaining the estimates. Though computational simplicity is the main motivation, the statistical qualities of the estimates are not compromised; indeed, the estimated values are numerically identical to those obtained from the log-likelihood. Further, the procedure implicitly produces diagnostic statistics and other useful statistical quantities describing the structure of the data. The technique is a generalisation of the constructed-variable method of obtaining quick estimates of transformation parameters [Atkinson 1985]. To take into account the multiresponse nature of the data, and hence obtain joint estimates, a seemingly unrelated regression is carried out. The algorithm is iterative, but it needs considerably fewer iterations to converge to the maximum likelihood estimates (MLEs) than maximisation of the log-likelihood function does. The technique is referred to as the Seemingly Unrelated Regressions/Constructed Variable (SURCON) analysis, and the estimates obtained are the SURCON estimates.

2. The influence of individual observations on the need for a transformation can be crucial, so it is necessary to examine the data for spurious or suspicious observations (outliers). The thesis proposes an iterative technique for detecting and identifying outliers based on Mahalanobis distances computed from subsamples of the observations. The results of the analysis are displayed in a graphical summary called the Stalactite Chart; hence the analysis is referred to as the Stalactite Analysis.

3. The development of a user-friendly, microcomputer-based statistical package which incorporates the above techniques. The package is written in the C programming language.
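The count of p(p+5)/2 parameters quoted for the Andrews et al. approach is the total for the normal model on the transformed scale. It consists of p transformation parameters, p means and the p(p+1)/2 distinct entries of the symmetric covariance matrix:

```latex
\underbrace{p}_{\lambda_1,\dots,\lambda_p}
+ \underbrace{p}_{\mu_1,\dots,\mu_p}
+ \underbrace{\frac{p(p+1)}{2}}_{\Sigma}
= \frac{2p + 2p + p^2 + p}{2}
= \frac{p(p+5)}{2}
```

So p = 3 already gives 12 parameters, and p = 5 gives 25.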
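The abstract does not spell out how the subsamples are chosen or what cutoff marks an outlier, so the following sketch of the distance step behind the Stalactite Analysis is labelled hypothetical throughout. It assumes:

- p = 2 variables.
- The subsample covariance matrix uses divisor m - 1.
- A caller-chosen cutoff (for instance a chi-squared quantile with 2 degrees of freedom) marks an observation.
- The next, larger subsample is formed from the observations with the smallest distances, in the spirit of a forward search.

The names `mahalanobis_from_subset`, `flag_exceeding` and `order_by_distance` are illustrative. Repeating these steps for increasing subsample sizes, and recording which observations are marked at each size, gives roughly the kind of information a stalactite chart displays.

```c
#include <stddef.h>

/* Hypothetical helper: squared Mahalanobis distances of all n observations
   (x[i], y[i]) from the mean and covariance (divisor m - 1) of the subsample
   whose m indices are in sub[]. Writes n values to d2. Returns 0 on success,
   or -1 if m < 3 or the subsample covariance matrix is singular. */
int mahalanobis_from_subset(const double *x, const double *y, size_t n,
                            const size_t *sub, size_t m, double *d2)
{
    double mx = 0.0, my = 0.0, sxx = 0.0, syy = 0.0, sxy = 0.0, det;
    size_t k, i;

    if (m < 3)
        return -1;
    for (k = 0; k < m; k++) {
        mx += x[sub[k]];
        my += y[sub[k]];
    }
    mx /= (double)m;
    my /= (double)m;
    for (k = 0; k < m; k++) {
        double dx = x[sub[k]] - mx, dy = y[sub[k]] - my;
        sxx += dx * dx;
        syy += dy * dy;
        sxy += dx * dy;
    }
    sxx /= (double)(m - 1);
    syy /= (double)(m - 1);
    sxy /= (double)(m - 1);
    det = sxx * syy - sxy * sxy;
    if (det <= 0.0)
        return -1;

    /* d2 = v' S^{-1} v via the closed-form inverse of a 2x2 matrix */
    for (i = 0; i < n; i++) {
        double dx = x[i] - mx, dy = y[i] - my;
        d2[i] = (syy * dx * dx - 2.0 * sxy * dx * dy + sxx * dy * dy) / det;
    }
    return 0;
}

/* Marks flag[i] = 1 when d2[i] exceeds the cutoff; returns the count. */
size_t flag_exceeding(const double *d2, size_t n, double cutoff, int *flag)
{
    size_t i, count = 0;
    for (i = 0; i < n; i++) {
        flag[i] = d2[i] > cutoff;
        count += (size_t)flag[i];
    }
    return count;
}

/* Orders indices by increasing d2 (stable insertion sort), so that
   order[0..m-1] can serve as the next subsample of size m. */
void order_by_distance(const double *d2, size_t n, size_t *order)
{
    size_t i, j;
    for (i = 0; i < n; i++)
        order[i] = i;
    for (i = 1; i < n; i++) {
        size_t key = order[i];
        j = i;
        while (j > 0 && d2[order[j - 1]] > d2[key]) {
            order[j] = order[j - 1];
            j--;
        }
        order[j] = key;
    }
}
```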
