Title: Multivariate analysis of tumour gene expression profiles applying regularisation and Bayesian variable selection techniques
Author: Zucknick, Manuela
ISNI: 0000 0004 2677 7360
Awarding Body: Imperial College London
Current Institution: Imperial College London
Date of Award: 2009
High-throughput microarray technology is now well established, for example in oncology, where it is used for tumour classification and for gene expression profiling to predict cancer pathology and clinical outcome. The global objective of this thesis is to investigate multivariate methods that are suitable for this task. After an introduction to the problem and the biological background, Chapter 3 gives an overview of multivariate regularisation methods and Chapter 4 outlines the binary classification problem. The applications presented in Chapters 5 to 7 focus on sparse binary classifiers that are both parsimonious and interpretable. Particular emphasis is placed on sparse penalised likelihood and Bayesian variable selection models, all in the context of logistic regression.
The variable selection problem is particularly challenging here because the number of variables is much larger than the sample size, which results in an ill-conditioned problem with many equally good solutions. One open problem is therefore the stability of gene expression profiles. In a resampling study, various characteristics, including stability, are compared across a variety of classifiers applied to five gene expression data sets and validated on two independent data sets.
Bayesian variable selection provides an alternative to resampling for estimating the uncertainty in the selection of genes. MCMC methods are used for model space exploration, but because of the high dimensionality, standard algorithms are computationally expensive and/or result in poor Markov chain mixing. A novel MCMC algorithm is presented that uses the dependence structure between input variables to find blocks of variables to be updated together. This drastically improves mixing while keeping the computational burden acceptable. Several algorithms are compared in a simulation study. In an ovarian cancer application in Chapter 7, the best-performing MCMC algorithms are combined with parallel tempering and compared with an alternative method. The thesis concludes with a final discussion chapter.
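As a rough illustration of the stability question raised in the abstract, the following Python sketch fits an L1-penalised logistic regression to bootstrap resamples of a data set with many more variables than samples and records how often each variable is selected. It is not the method used in the thesis: the solver (proximal gradient descent without an intercept), the penalty value `lam`, the synthetic data and the function names `fit_l1_logistic` and `selection_frequencies` are all assumptions made for illustration.

```python
import numpy as np


def fit_l1_logistic(X, y, lam=0.1, n_iter=500):
    """L1-penalised logistic regression (no intercept) via proximal gradient.

    Illustrative solver only; `lam` and `n_iter` are arbitrary choices.
    """
    n, p = X.shape
    beta = np.zeros(p)
    # Step size 1/L, where L = ||X||_2^2 / (4 n) bounds the curvature
    # of the averaged logistic loss.
    step = 4.0 * n / (np.linalg.norm(X, 2) ** 2)
    for _ in range(n_iter):
        eta = np.clip(X @ beta, -30.0, 30.0)      # guard against overflow
        prob = 1.0 / (1.0 + np.exp(-eta))
        grad = X.T @ (prob - y) / n               # gradient of the mean loss
        z = beta - step * grad
        # Soft-thresholding sets small coefficients exactly to zero.
        beta = np.sign(z) * np.maximum(np.abs(z) - step * lam, 0.0)
    return beta


def selection_frequencies(X, y, n_boot=30, lam=0.1, seed=0):
    """Fraction of bootstrap resamples in which each variable is selected."""
    rng = np.random.default_rng(seed)
    n, p = X.shape
    counts = np.zeros(p)
    for _ in range(n_boot):
        idx = rng.integers(0, n, size=n)          # resample with replacement
        beta = fit_l1_logistic(X[idx], y[idx], lam=lam)
        counts += beta != 0
    return counts / n_boot


# Synthetic stand-in for expression data: 40 samples, 200 "genes",
# of which only the first three influence the class label (assumption).
rng = np.random.default_rng(1)
X = rng.standard_normal((40, 200))
logits = 2.0 * X[:, :3].sum(axis=1)
y = (rng.random(40) < 1.0 / (1.0 + np.exp(-logits))).astype(float)

freq = selection_frequencies(X, y)
top = np.argsort(freq)[::-1][:10]
print("most frequently selected variables:", top)
print("their selection frequencies:", freq[top])
```

Variables with a selection frequency close to 1 are chosen in nearly every resample, while frequencies well below 1 indicate that the selected profile changes from one resample to the next, which is the kind of instability the abstract refers to.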
Supervisor: Gabra, Hani ; Richardson, Sylvia
Sponsor: Wellcome Trust
Qualification Name: Thesis (Ph.D.)
Qualification Level: Doctoral