Title: Statistical methods for the analysis of contextual gene expression data
Author: Arnol, Damien
ISNI: 0000 0004 7651 4948
Awarding Body: University of Cambridge
Current Institution: University of Cambridge
Date of Award: 2019
Technological advances have enabled the profiling of gene expression variability, at both the RNA and the protein level, with ever-increasing throughput. In addition, miniaturisation has enabled quantifying gene expression from small volumes of input material and, most recently, at the level of single cells. Increasingly, these technologies also preserve context information, for example by assaying tissues with high spatial resolution. A second example of contextual information comes from multi-omics protocols, which assay, for example, gene expression and DNA methylation in the same cells or samples. Although such contextual gene expression datasets are increasingly available for both population and single-cell variation studies, methods for their analysis are not established. In this thesis, we propose two modelling approaches for the analysis of gene expression variation in specific biological contexts. The first contribution of this thesis is a statistical method for analysing single-cell expression data in a spatial context. Our method identifies the sources of gene expression variability by decomposing it into components, each attributable to a different source. These sources include aspects of spatial variation such as cell-cell interactions. In applications to data from different technologies, we show that cell-cell interactions are indeed a major determinant of the expression level of specific genes, with a relevant link to their function. The second contribution is a latent variable model for the unsupervised analysis of gene expression data that accounts for structured prior knowledge about the experimental context. The proposed method enables the joint analysis of gene expression data and other omics data profiled in the same samples, and the model can account for the grouping structure of samples, e.g. samples from individuals with different clinical covariates or from distinct experimental batches. Our model constitutes a principled framework for comparing the molecular identities of these distinct groups.
Supervisor: Stegle, Oliver ; Saez-Rodriguez, Julio
Sponsor: EMBL
Qualification Name: Thesis (Ph.D.)
Qualification Level: Doctoral
Keywords: Gaussian Processes ; Factor Analysis ; Gene Expression ; Machine Learning ; Bayesian Modelling