Title: Probabilistic latent variable models in statistical genomics
Author: Fusi, Nicolo
ISNI:       0000 0004 5363 6096
Awarding Body: University of Sheffield
Current Institution: University of Sheffield
Date of Award: 2015
In this thesis, we propose different probabilistic latent variable models to identify and capture the hidden structure present in commonly studied genomics datasets. We start by investigating how to correct for unwanted correlations due to hidden confounding factors in gene expression data. This is particularly important in expression quantitative trait loci (eQTL) studies, where the goal is to identify associations between genetic variants and gene expression levels. We first consider a naïve approach, which estimates the latent factors from the gene expression data alone, ignoring the genetics, and we show that it leads to a loss of signal in the data. We then highlight how, thanks to the formulation of our model as a probabilistic model, it is straightforward to modify it in order to take into account the specific properties of the data. In particular, we show that in the naïve approach the latent variables "explain away" the genetic signal, and that this problem can be avoided by jointly inferring these latent variables while taking into account the genetic information. We then extend this, so far additive, model to additionally detect interactions between the latent variables and the genetic markers. We show that this leads to a better reconstruction of the latent space and that it helps to dissect latent variables capturing general confounding factors (such as batch effects) from those capturing environmental factors involved in genotype-by-environment interactions. Finally, we investigate the effects of misspecifications of the noise model in genetic studies, showing how the probabilistic framework presented so far can be easily extended to automatically infer non-linear monotonic transformations of the data such that the common assumption of Gaussian-distributed residuals is respected.
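The "explaining away" effect described in the abstract can be illustrated with a small simulation. The sketch below is not the thesis's probabilistic model: it assumes simulated data, uses plain PCA (via SVD) as a stand-in for latent factor estimation, and replaces joint inference with a simple two-stage procedure that removes the genetic fit before estimating factors. All function names, sample sizes and effect sizes are hypothetical choices made for illustration.

```python
import numpy as np

def top_factors(Y, k):
    """Return the k leading left singular vectors of column-centered Y."""
    Yc = Y - Y.mean(axis=0)
    U, _, _ = np.linalg.svd(Yc, full_matrices=False)
    return U[:, :k]

def snp_effects(x, Y):
    """Per-gene least-squares slope of expression on a centered SNP."""
    xc = x - x.mean()
    return xc @ (Y - Y.mean(axis=0)) / (xc @ xc)

def simulate(n=200, genes=50, seed=0):
    """Toy data: one SNP with a broad effect plus one hidden confounder."""
    rng = np.random.default_rng(seed)
    x = rng.binomial(2, 0.3, size=n).astype(float)  # genotype coded 0/1/2
    h = rng.normal(size=n)                          # hidden confounder
    beta = np.ones(genes)                           # assumed true SNP effect on every gene
    w = rng.normal(size=genes)                      # confounder loadings
    noise = 0.5 * rng.normal(size=(n, genes))
    return x, np.outer(x, beta) + np.outer(h, w) + noise

x, Y = simulate()
xc = x - x.mean()
Yc = Y - Y.mean(axis=0)

# Naive approach: estimate factors from expression alone, remove them,
# then estimate the SNP effect on what is left.
F = top_factors(Y, k=2)
naive = snp_effects(x, Yc - F @ (F.T @ Yc))

# Genetics-aware stand-in: remove the SNP fit first, estimate one factor
# from the residuals, then fit the SNP and the factor together.
resid = Yc - np.outer(xc, snp_effects(x, Y))
F_adj = top_factors(resid, k=1)
design = np.column_stack([xc, F_adj])
coef, *_ = np.linalg.lstsq(design, Yc, rcond=None)
aware = coef[0]

print("mean |effect|, naive:", np.abs(naive).mean())
print("mean |effect|, genetics-aware:", np.abs(aware).mean())
```

Because the simulated SNP affects every gene, it forms a high-variance direction in the expression matrix, so the naive factors absorb it and the estimated effects shrink towards zero. In the second variant the factor is orthogonal to the genotype by construction, so the effect estimates stay close to the simulated value of 1.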
Supervisor: Lawrence, Neil
Sponsor: Not available
Qualification Name: Thesis (Ph.D.)
Qualification Level: Doctoral
EThOS ID:
DOI: Not available