Use this URL to cite or link to this record in EThOS: http://ethos.bl.uk/OrderDetails.do?uin=uk.bl.ethos.714054
Title: Latent variable models for analysing multidimensional gene expression data
Author: Hore, Victoria
Awarding Body: University of Oxford
Current Institution: University of Oxford
Date of Award: 2015
Availability of Full Text:
Access from EThOS:
Full text unavailable from EThOS. Restricted access.
Abstract:
Multi-tissue gene expression studies give rise to three-dimensional arrays of data (individuals × genes × tissues). These experiments make it possible to study the tissue-specific nature of gene regulation, as well as the relationship between genotypes and higher-level traits such as disease status. Analysing these multidimensional data sets is a statistical challenge because they contain high levels of noise and missing data. In this thesis I introduce a new approach for analysing multidimensional gene expression data sets called SPIDER (SParse Integrated DEcomposition for RNA-sequencing). SPIDER is a sparse Bayesian tensor decomposition that models the data as a sum of components (or factors). Each component consists of three vectors of scores or loadings that describe modes of variation across individuals, genes and tissues. Sparsity is induced in the components using a spike-and-slab prior, allowing sparse structure in the data to be recovered. The decomposition extends readily to jointly decomposing several data types, handling missing data, and allowing for relatedness between individuals, another common problem in genetics. Inference for the model is performed using variational Bayes. SPIDER is compared with existing approaches for decomposing multidimensional data via simulations. The results suggest that SPIDER performs comparably to, or better than, existing approaches, and particularly well when the underlying signals are very sparse. Additional simulations designed to contain realistic levels of signal and noise suggest that SPIDER has the power to recover gene networks from gene expression data. I have applied SPIDER to RNA-sequencing gene expression data from 845 individuals in three tissues from the TwinsUK cohort. The estimated components were tested genome-wide for association with genetic variation. Five signals describing gene regulatory networks driven by genetic variants were uncovered, extending the current understanding of these pathways. Components capturing the effects of experimental artefacts and covariates were also recovered from the data.
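The data model described in the abstract (a three-dimensional array written as a sum of components, each the outer product of an individual vector, a gene vector and a tissue vector, with spike-and-slab sparsity) can be illustrated with a small simulation. This is a minimal sketch under stated assumptions, not SPIDER itself: the function name `simulate_sparse_cp`, the inclusion probability `pi`, the Gaussian slab and noise, and the choice to keep the tissue vectors dense are all hypothetical, and the variational Bayes inference step is not shown.

```python
import numpy as np


def simulate_sparse_cp(n_individuals, n_genes, n_tissues, n_components,
                       pi=0.1, noise_sd=1.0, seed=0):
    """Hypothetical simulator: X[i, j, t] = sum_k A[i, k] * B[j, k] * C[t, k] + noise.

    Assumption: spike-and-slab draws are used for the individual (A) and
    gene (B) vectors only; tissue vectors (C) are left dense for illustration.
    """
    rng = np.random.default_rng(seed)

    def spike_and_slab(size):
        # Each entry is exactly zero (the "spike") with probability 1 - pi,
        # otherwise drawn from a standard normal (the "slab").
        included = rng.random(size) < pi
        return included * rng.normal(0.0, 1.0, size)

    A = spike_and_slab((n_individuals, n_components))     # individual scores
    B = spike_and_slab((n_genes, n_components))           # gene loadings
    C = rng.normal(0.0, 1.0, (n_tissues, n_components))   # tissue loadings

    # Sum over components of the three-way outer products a_k ∘ b_k ∘ c_k.
    signal = np.einsum("ik,jk,tk->ijt", A, B, C)
    X = signal + rng.normal(0.0, noise_sd, signal.shape)
    return X, signal, (A, B, C)


# Usage: 50 individuals, 200 genes, 3 tissues, 4 components.
X, signal, (A, B, C) = simulate_sparse_cp(50, 200, 3, 4, pi=0.1, noise_sd=0.5, seed=1)

# Mark roughly 5% of entries as missing (NaN) to mimic incomplete measurements.
mask_rng = np.random.default_rng(2)
X_missing = np.where(mask_rng.random(X.shape) < 0.05, np.nan, X)
```

Because each component is a rank-one tensor, a zero entry in an individual or gene vector removes that individual or gene from the component entirely, which is how sparsity in the vectors translates into interpretable, localised structure (for example, a small gene network active in a subset of individuals).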
Supervisor: Marchini, Jonathan
Sponsor: Engineering and Physical Sciences Research Council
Qualification Name: Thesis (Ph.D.)
Qualification Level: Doctoral
EThOS ID: uk.bl.ethos.714054
DOI: Not available