Title: Unsupervised learning with graph theoretical algorithms and its applications to transcriptomic data analysis
Author: Liu, Zijing
ISNI: 0000 0004 7963 7767
Awarding Body: Imperial College London
Current Institution: Imperial College London
Date of Award: 2019
High-throughput sequencing technologies generate large amounts of data in genomic research, with complex structure and high dimensionality. With the aim of extracting meaningful knowledge from a simplified representation, we develop graph-based methods for analysing high-dimensional data, focusing on clustering analysis and dimensionality reduction. We first study the problem of graph partitioning, which is closely connected with clustering analysis. Using spectral methods, we reformulate a dynamics-based multiscale graph partitioning framework as a max-sum vector partitioning problem. The graph nodes are embedded as vectors that vary with the time of a Markov process running on the graph, which leads to multi-resolution graph partitions. Our derivation also clarifies the quantity optimised by k-means in graph partitioning and establishes its connection to spectral clustering. Clustering analysis with multiscale graph partitioning is then investigated. Different methods for estimating a graph from vector data are compared empirically on real datasets. The advantage of using multiscale graph partitioning for clustering is illustrated with both synthetic and real data. We further propose a similarity measure for time-dependent data based on a Gaussian process model. An RNA sequencing time-course dataset is analysed as an example application. Finally, we integrate the graph-theoretical clustering and a graph-based dimensionality reduction method with Gaussian processes. We exemplify our approach through the analysis of a transcriptomic dataset of cellular reprogramming from B-cells to iPSCs. We extract a landscape that describes the reprogramming process and identify associated genes for clustering analysis. We also reconstruct another landscape from an integrated transcriptomic dataset characterising the hematopoietic differentiation process from stem cells to somatic cells. The differences between the forward and backward processes are then studied by integrating the two landscapes.
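As a rough illustration of the idea of embedding graph nodes as vectors that vary with the time of a Markov process, the following sketch uses a diffusion-map style embedding built from the normalised graph Laplacian, followed by a small k-means routine. It is not the formulation derived in the thesis: the function names `markov_embedding` and `kmeans`, the toy graph of two cliques, and the choice of Markov time `t` are all assumptions made for this example.

```python
import numpy as np

def markov_embedding(A, t):
    """Embed nodes as vectors that depend on the Markov time t (illustrative)."""
    d = A.sum(axis=1)
    d_inv_sqrt = 1.0 / np.sqrt(d)
    # Symmetric normalised Laplacian: L = I - D^{-1/2} A D^{-1/2}
    L = np.eye(len(A)) - d_inv_sqrt[:, None] * A * d_inv_sqrt[None, :]
    lam, U = np.linalg.eigh(L)  # eigenvalues in ascending order
    # Drop the trivial mode (eigenvalue ~ 0) and damp each remaining
    # mode by exp(-t * eigenvalue), so fast modes vanish as t grows.
    return (d_inv_sqrt[:, None] * U[:, 1:]) * np.exp(-t * lam[1:])

def kmeans(X, k, n_iter=50):
    """Plain Lloyd iterations with deterministic farthest-point seeding."""
    centres = [X[0]]
    for _ in range(k - 1):
        dist = np.min([np.sum((X - c) ** 2, axis=1) for c in centres], axis=0)
        centres.append(X[np.argmax(dist)])
    centres = np.array(centres)
    for _ in range(n_iter):
        sq = ((X[:, None, :] - centres[None, :, :]) ** 2).sum(axis=2)
        labels = np.argmin(sq, axis=1)
        for j in range(k):
            if np.any(labels == j):
                centres[j] = X[labels == j].mean(axis=0)
    return labels

# Toy graph (assumed for illustration): two 5-node cliques joined by one edge.
n = 5
A = np.zeros((2 * n, 2 * n))
A[:n, :n] = 1
A[n:, n:] = 1
np.fill_diagonal(A, 0)
A[n - 1, n] = A[n, n - 1] = 1

X = markov_embedding(A, t=3.0)
labels = kmeans(X, k=2)
```

In this sketch a larger `t` suppresses the fast-decaying modes, so only coarse structure remains in the embedded vectors, while a smaller `t` retains finer detail. This is one simple way to picture how scanning the Markov time can yield partitions at several resolutions.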
Supervisor: Barahona, Mauricio
Sponsor: European Commission
Qualification Name: Thesis (Ph.D.)
Qualification Level: Doctoral