Title:

Bayesian regression and discrimination with many variables

This thesis attempts to provide general procedures for Bayesian regression and discriminant analysis with many variables and to explore potential problems in such analyses. For regression, a normal random regression model is assumed, i.e. the joint distribution of the response variables and the regressors is multivariate normal given their means and covariance matrix. For discriminant analysis, we consider the case in which each observation comes from one of several multivariate normal populations.

In classical statistics, the problem with fitting a multivariate model when there are more variables than observations is that the estimate of the covariance matrix of the multivariate normal distribution is singular, so the fitted distribution is degenerate. In Bayesian statistics, this problem can be avoided by placing a proper prior on the covariance matrix. We assign the covariance matrix an inverse-Wishart distribution (a conjugate prior in the nonhierarchical case) and suppose that the prior expected covariance matrix has a simple structure, so that the model requires only a small number of hyperparameters. These hyperparameters are modelled hierarchically.

Although our strong assumptions keep the model relatively simple, the posterior is still complicated. We found ARMS within Gibbs sampling with multiple chains to be an appropriate MCMC strategy for fitting our models; running multiple chains also makes convergence checking simple. Because the sample covariance matrix is ill-conditioned and the number of variables is large, the computational problems are significant, and appropriate matrix manipulation and rescaling techniques are required. Two practical cases are considered as examples, one for regression and the other for discrimination. Both involve NIR spectral data with many variables, and the high correlation between the variables makes the examples more challenging.
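As a minimal sketch of why a proper conjugate inverse-Wishart prior avoids the singularity problem, the snippet below contrasts the rank-deficient sample covariance with a full-rank posterior mean. It assumes, purely for illustration, a known zero mean, an identity prior scale matrix, and simulated Gaussian data (not the NIR spectra used in the thesis):

```python
import numpy as np

rng = np.random.default_rng(0)
p, n = 50, 10                          # many variables, few observations
X = rng.standard_normal((n, p))        # n observations of a p-variate normal, mean 0

# Classical estimate: with n < p the sample covariance is singular.
S_hat = X.T @ X / n
print(np.linalg.matrix_rank(S_hat))    # rank is at most n = 10 < p = 50

# Bayesian: an inverse-Wishart prior IW(nu0, Psi) is conjugate when the
# mean is known; the posterior is IW(nu0 + n, Psi + S) with S = X'X.
nu0 = p + 2                            # smallest df giving a finite prior mean
Psi = np.eye(p)                        # identity prior scale: a "simple structure"
S = X.T @ X
Sigma_post_mean = (Psi + S) / (nu0 + n - p - 1)   # posterior mean of Sigma
print(np.linalg.matrix_rank(Sigma_post_mean))     # full rank: p = 50
```

The prior scale matrix `Psi` adds a positive-definite component to the data term, so the posterior estimate of the covariance matrix is nonsingular even though the sample covariance is not.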
We consider three correlation structures: an oversimplified identity structure and two autoregressive correlation functions, which are believed to be much closer to the real situation. However, we found that the autoregressive correlation functions do not guarantee better predictions in our examples.
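The exact autoregressive correlation functions are not specified in this abstract; the sketch below uses a standard AR(1)-type form, rho^|i-j| in the channel index, purely as an illustrative assumption, alongside the identity structure it is compared with:

```python
import numpy as np

def ar1_corr(p, rho):
    """AR(1)-type correlation matrix: entries decay as rho^|i-j|."""
    idx = np.arange(p)
    return rho ** np.abs(idx[:, None] - idx[None, :])

p = 6
R_identity = np.eye(p)          # the oversimplified identity structure
R_ar = ar1_corr(p, rho=0.9)     # neighbouring spectral channels highly correlated
print(np.round(R_ar[0], 2))     # first row: correlations decaying with distance
```

Under the identity structure, adjacent spectral variables are treated as independent; the autoregressive form instead lets correlation decay smoothly with the distance between channels, which is closer to how NIR spectra behave.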
