Title: Gaussian process in computational biology : covariance functions for transcriptomics
Author: Rahman, Muhammad Arifur
ISNI: 0000 0004 6500 4475
Awarding Body: University of Sheffield
Current Institution: University of Sheffield
Date of Award: 2018
In the field of machine learning, Gaussian process models are a widely used family of stochastic processes for modelling data observed over time, space or both. Gaussian process models are nonparametric, meaning that the models are developed on an infinite-dimensional parameter space. The parameter space is then typically learnt as the set of all possible solutions for a given learning problem. Gaussian process distributions are distributions over functions. The covariance function determines the properties of the function samples drawn from the process. Once the decision to model with a Gaussian process has been made, the choice of covariance function is a central step in modelling. In molecular biology and genetics, a transcription factor is a protein that binds to specific DNA sequences and controls the flow of genetic information from DNA to mRNA. To develop models of cellular processes, quantitative estimation of the regulatory relationship between transcription factors and genes is a basic requirement. Quantitative estimation is complex for several reasons. Many transcription factors' activities and their own transcription levels are post-transcriptionally modified, and very often the expression levels of the transcription factors are low and noisy. It is therefore useful to infer the activity of the transcription factors from the expression levels of their target genes. Here we developed a Gaussian process based nonparametric regression model to infer the exact transcription factor activities from a combination of mRNA expression levels and DNA-protein binding measurements. Clustering of gene expression time series gives insight into which genes may be coregulated, allowing us to discern the activity of pathways in a given microarray experiment. Of particular interest is how a given group of genes varies with different conditions or genetic backgrounds.
In this thesis, we developed a new clustering method that allows each cluster to be parametrized according to the behaviour of the genes across conditions, whether they are correlated or anti-correlated. By specifying the correlation between such genes, we gain more information within the cluster about how the genes interrelate. Our study shows the effectiveness of sharing information between replicates and different model conditions while modelling gene expression time series.
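The abstract's point that the covariance function determines the properties of functions drawn from a Gaussian process can be illustrated with a small sketch. This is not the covariance function developed in the thesis. It assumes a standard squared-exponential (RBF) covariance, and the function names (rbf_kernel, sample_gp_prior, mean_abs_step), the lengthscale values, and the time grid are hypothetical choices made only for illustration.

```python
import numpy as np


def rbf_kernel(x1, x2, lengthscale=1.0, variance=1.0):
    """Squared-exponential (RBF) covariance between two sets of 1-D inputs."""
    sqdist = (x1[:, None] - x2[None, :]) ** 2
    return variance * np.exp(-0.5 * sqdist / lengthscale**2)


def sample_gp_prior(x, lengthscale, n_samples=3, seed=0):
    """Draw function samples from a zero-mean GP prior evaluated at inputs x."""
    rng = np.random.default_rng(seed)
    K = rbf_kernel(x, x, lengthscale=lengthscale)
    K += 1e-6 * np.eye(len(x))  # small jitter for numerical stability
    L = np.linalg.cholesky(K)
    # If z ~ N(0, I), then L @ z ~ N(0, K); one sample per row after transpose.
    return (L @ rng.standard_normal((len(x), n_samples))).T


def mean_abs_step(samples):
    """Average absolute change between neighbouring inputs (a roughness proxy)."""
    return float(np.mean(np.abs(np.diff(samples, axis=1))))


# Hypothetical time grid; the lengthscales are illustrative assumptions.
x = np.linspace(0.0, 10.0, 100)
smooth = sample_gp_prior(x, lengthscale=2.0)
wiggly = sample_gp_prior(x, lengthscale=0.2)

print("long lengthscale roughness: ", mean_abs_step(smooth))
print("short lengthscale roughness:", mean_abs_step(wiggly))
```

Under these assumptions, the samples drawn with the shorter lengthscale change much more between neighbouring time points, which is one concrete sense in which the choice of covariance function shapes the functions a Gaussian process can represent.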
Supervisor: Lawrence, Neil
Sponsor: Not available
Qualification Name: Thesis (Ph.D.)
Qualification Level: Doctoral
EThOS ID:
DOI: Not available