Title: Bayesian modelling and sampling strategies for ordering and clustering problems with a focus on next-generation sequencing data
Author: Strauss, Magdalena Elisabeth
ISNI: 0000 0004 7968 4045
Awarding Body: University of Cambridge
Current Institution: University of Cambridge
Date of Award: 2019
Availability of Full Text: Full text unavailable from EThOS.
This thesis presents novel methods for ordering and clustering problems. The first two parts focus on the development of models and sampling strategies specifically tailored to next-generation sequencing data.

Most high-throughput measurements for single-cell data are destructive, so longitudinal information is lost. I developed a new Bayesian approach to reconstructing this information computationally, sampling orders efficiently using MCMC on a space of permutations. This Bayesian approach provides novel insights into biological phenomena and experimental artefacts.

The second part presents a new clustering method for single-cell data that explicitly models the uncertainty of the clustering structure, an uncertainty that results partly from the uncertainty of the orders discussed above. The proposed method combines nonparametric Bayesian models, consensus clustering and efficient MCMC sampling to identify differences in dynamic patterns between branches of gene expression data. In an application to stimulated dendritic cells, it categorises genes in a way consistent with biological function, and it integrates data from different cell lines in a principled way.

The third part of the thesis adapts some of the methods developed in the first two parts to very sparsely and irregularly sampled data, and uses simulations to explore how applicable such models are under different circumstances.

The fourth part discusses methods for clustering samples across a variety of contexts, such as RNA expression, methylation or protein expression, and develops and critically discusses a novel hierarchical Bayesian method that integrates both different contexts and different groups of samples, for example different cancer types.
The unifying theme of the thesis is the development of methods, and of efficient sampling and approximation strategies, capable of capturing the uncertainty inherent in any statistical analysis of high-dimensional and noisy data.
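The abstract describes reconstructing cell orders by sampling with MCMC on a space of permutations, and summarising the resulting uncertainty. The following is a minimal, hypothetical sketch of that general idea, not the thesis's model. Its assumptions: the likelihood is a stand-in that scores Gaussian increments between consecutive cells with noise scale `sigma`, the prior on orders is flat, and the proposals (swap two cells, or reverse a contiguous segment) are generic symmetric permutation moves chosen for illustration. The function names `path_log_lik`, `sample_orders` and `position_summary` are invented for this example.

```python
import math
import random

import numpy as np


def path_log_lik(order, X, sigma):
    """Hypothetical stand-in log-likelihood (not the thesis's model).

    X has shape (n_cells, n_genes). Consecutive cells in `order` are assumed
    to differ by Gaussian noise with scale `sigma`, so orders that place
    similar cells next to each other score higher.
    """
    diffs = np.diff(X[order], axis=0)
    return -0.5 * float(np.sum(diffs ** 2)) / sigma ** 2


def propose(order, rng):
    """Symmetric proposal: swap two cells or reverse the segment between them."""
    new = list(order)
    i, j = sorted(rng.sample(range(len(order)), 2))
    if rng.random() < 0.5:
        new[i], new[j] = new[j], new[i]
    else:
        new[i:j + 1] = reversed(new[i:j + 1])
    return new


def sample_orders(X, n_iter=5000, sigma=0.1, seed=0):
    """Metropolis sampler over permutations of the rows (cells) of X."""
    rng = random.Random(seed)
    order = list(range(X.shape[0]))
    rng.shuffle(order)
    ll = path_log_lik(order, X, sigma)
    samples = []
    for _ in range(n_iter):
        cand = propose(order, rng)
        cand_ll = path_log_lik(cand, X, sigma)
        # Both moves are symmetric and the prior is flat, so the acceptance
        # probability is just the likelihood ratio, capped at 1.
        if rng.random() < math.exp(min(0.0, cand_ll - ll)):
            order, ll = cand, cand_ll
        samples.append(list(order))
    return samples


def position_summary(samples, burn_in=0):
    """Posterior mean and standard deviation of each cell's position."""
    kept = samples[burn_in:]
    n = len(kept[0])
    ref = np.empty(n)
    ref[kept[0]] = np.arange(n)
    positions = []
    for order in kept:
        pos = np.empty(n)
        pos[order] = np.arange(n)  # pos[cell] = position of that cell
        # The stand-in likelihood cannot distinguish an order from its
        # reverse, so flip samples that run opposite to the reference.
        if np.corrcoef(pos, ref)[0, 1] < 0:
            pos = (n - 1) - pos
        positions.append(pos)
    positions = np.array(positions)
    return positions.mean(axis=0), positions.std(axis=0)
```

Because this stand-in likelihood scores an order and its reverse identically, `position_summary` aligns the direction of each sample before averaging; the per-cell standard deviations then give a simple measure of how uncertain each cell's position is. In a real analysis, the direction would need to be fixed by external information, and the likelihood would be replaced by a proper model of expression along the ordering.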
Supervisor: Wernisch, Lorenz
Sponsor: Medical Research Council
Qualification Name: Thesis (Ph.D.)
Qualification Level: Doctoral
Keywords: next-generation sequencing data ; pseudotime ordering ; nonparametric Bayes ; efficient sampling ; multi-omics methods