Title: Complex modelling of multi-outcome data with applications to cancer biology
Author: Oftadeh, Elaheh
ISNI: 0000 0004 6497 3768
Awarding Body: University of Kent
Current Institution: University of Kent
Date of Award: 2017
In applied scientific areas such as economics, finance, biology, and medicine, it is often necessary to find the relationship between a set of independent variables (predictors) and a set of response variables (i.e., outcomes of an experiment). If we model individual outcomes separately, we potentially miss information about the correlation among outcomes. It is therefore desirable to model these outcomes simultaneously by multivariate linear regression. With the advent of high-throughput technology, an enormous amount of high-dimensional multivariate regression data is being generated at an extraordinary speed. However, only a small proportion of these data is informative, and this high dimensionality poses a challenge for modern statistics. In this work, we propose methods and algorithms for modelling high-dimensional multivariate regression data. The contributions of this thesis are as follows. Firstly, we propose two variable screening techniques to reduce the high dimension of the predictors. The first is a beamforming-based screening method built on a statistic called SNR. The second is a mixture-based screening method in which the screening is conducted through so-called likelihood fusion. Secondly, we propose a variable selection method called principal variable analysis (PVA), which takes into account the correlation between response variables in the process of variable selection. We compare PVA with some well-known variable selection methods in simulation studies, showing that PVA can substantially enhance the selection accuracy. Thirdly, we develop a method for simultaneous clustering and variable selection using likelihood fusion. We illustrate the features of the proposed method through simulation studies. Fourthly, we study a Bayesian clustering problem through a mixture of normal distributions, for which we propose mixing-proportion dependent priors for the component parameters.
Finally, we apply the proposed methods to cancer drug data. These data contain expression levels of 13321 genes across 42 cell lines and the responses of these cell lines to 131 drugs, recorded as fifty percent inhibitory concentration (IC50) values. We identify 37 genes that are important for predicting IC50 values. We find that although the expressions of these genes are weakly correlated, they are highly correlated in terms of their regression coefficients. We also identify a regression coefficient-based network between genes and show that 34 of the 37 selected genes have played a role in at least one type of cancer. Moreover, by applying the likelihood fusion model to real data, we classify the drugs into five groups.
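As a rough illustration of what variable screening means in a multi-response regression setting, the following sketch ranks predictors by their summed squared Pearson correlation with all responses and keeps the top few. This is a generic marginal-correlation screen, not the beamforming-based (SNR) or likelihood-fusion screening proposed in the thesis. The function names (`pearson`, `screen`, `simulate`), the synthetic data, and all sizes other than the 42 samples are assumptions made purely for illustration.

```python
import math
import random


def pearson(u, v):
    """Sample Pearson correlation of two equal-length sequences (0.0 if either is constant)."""
    n = len(u)
    mu, mv = sum(u) / n, sum(v) / n
    cov = sum((a - mu) * (b - mv) for a, b in zip(u, v))
    su = math.sqrt(sum((a - mu) ** 2 for a in u))
    sv = math.sqrt(sum((b - mv) ** 2 for b in v))
    if su == 0.0 or sv == 0.0:
        return 0.0
    return cov / (su * sv)


def screen(X_cols, Y_cols, keep):
    """Score each predictor by its summed squared correlation with every response.

    Returns the indices of the `keep` highest-scoring predictors (best first)
    together with the full list of scores.
    """
    scores = [sum(pearson(x, y) ** 2 for y in Y_cols) for x in X_cols]
    ranked = sorted(range(len(X_cols)), key=lambda j: scores[j], reverse=True)
    return ranked[:keep], scores


def simulate(n=42, p=200, q=5, active=(3, 17, 58), seed=0):
    """Hypothetical data: n samples, p predictors, q responses.

    Only the predictors listed in `active` influence the responses;
    every other predictor is pure noise.
    """
    rng = random.Random(seed)
    X_cols = [[rng.gauss(0, 1) for _ in range(n)] for _ in range(p)]
    Y_cols = []
    for _ in range(q):
        coefs = {j: rng.choice([-1, 1]) * rng.uniform(1.0, 2.0) for j in active}
        y = [
            sum(c * X_cols[j][i] for j, c in coefs.items()) + rng.gauss(0, 0.5)
            for i in range(n)
        ]
        Y_cols.append(y)
    return X_cols, Y_cols


X_cols, Y_cols = simulate()
selected, scores = screen(X_cols, Y_cols, keep=10)
print("retained predictor indices:", selected)
```

Summing over responses is one simple way to let every outcome contribute to a predictor's score. It does not model the correlation among outcomes, which is the aspect the abstract says the proposed methods are designed to exploit.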
Supervisor: Zhang, Jian ; Villa, Cristiano
Sponsor: Not available
Qualification Name: Thesis (Ph.D.)
Qualification Level: Doctoral
EThOS ID:
DOI: Not available