Title: Machine learning approaches for extracting protein complexes from protein-protein interaction networks
Author: Cai, Bingjing
Awarding Body: University of Ulster
Current Institution: Ulster University
Date of Award: 2013
Recent advances in molecular biology have led to the accumulation of large amounts of data on Protein-Protein Interaction (PPI) networks in different species, such as yeast and humans. Because of the inherent complexity of these networks, analysing such volumes of data to extract knowledge, such as protein complexes or regulatory pathways, represents not only an enormous challenge but also a great opportunity. This thesis explores the application of machine learning approaches to detecting protein complexes from PPI networks obtained by Tandem Affinity Purification/Mass Spectrometry (TAP-MS) experiments. TAP-MS PPI networks are usually constructed as binary networks, and the co-complex relations are largely ignored. To take into account the non-binary information of co-complex relations in TAP-MS PPI networks, a new framework for detecting protein complexes has been proposed. Under this framework, two types of graph clustering algorithms and an integrative evaluation platform combining data-driven and knowledge-based quality measures have been proposed and studied. The first type of graph clustering algorithm is based on random walks, resulting in Enhanced Random Walk with Restart (ERWR) and Random Walk with Restarting Baits (RWRB). The second type is based on modelling TAP-MS PPI networks as bipartite graphs, resulting in the Bipartite Graph based Clustering Algorithm (BGCA). The ERWR algorithm has been developed from Random Walk with Restart (RWR). The key contribution of ERWR is the introduction of a tuning factor into the random walk process. The tuning factor strengthens connections between nodes that are closer and weakens those that are distant, so that the random walker prefers moving to nodes that are potentially in the same cluster as the starting node.
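For readers unfamiliar with the baseline method named in the abstract, the following is a minimal sketch of a standard Random Walk with Restart on a small undirected graph. It is not the thesis's ERWR: the abstract does not specify the form of the tuning factor, so none is included here. The function name `random_walk_with_restart`, the parameter `restart_prob`, its value of 0.3, and the toy six-node network are all assumptions made for illustration.

```python
import numpy as np


def random_walk_with_restart(adjacency, seed, restart_prob=0.3,
                             tol=1e-10, max_iter=1000):
    """Standard RWR (illustrative sketch, not the thesis's ERWR).

    Iterates p <- (1 - c) * W @ p + c * e until the L1 change drops
    below `tol`, where W is the column-normalised adjacency matrix,
    c is the restart probability and e is the indicator vector of
    the seed node.
    """
    A = np.asarray(adjacency, dtype=float)
    col_sums = A.sum(axis=0)
    col_sums[col_sums == 0] = 1.0  # avoid division by zero for isolated nodes
    W = A / col_sums               # each column now sums to 1

    e = np.zeros(A.shape[0])
    e[seed] = 1.0
    p = e.copy()
    for _ in range(max_iter):
        p_next = (1.0 - restart_prob) * (W @ p) + restart_prob * e
        if np.abs(p_next - p).sum() < tol:
            return p_next
        p = p_next
    return p


# Hypothetical toy network: two triangles (0-1-2 and 3-4-5) joined by edge 2-3.
adjacency = np.array([
    [0, 1, 1, 0, 0, 0],
    [1, 0, 1, 0, 0, 0],
    [1, 1, 0, 1, 0, 0],
    [0, 0, 1, 0, 1, 1],
    [0, 0, 0, 1, 0, 1],
    [0, 0, 0, 1, 1, 0],
])

# Steady-state visiting probabilities for a walker that restarts at node 0.
scores = random_walk_with_restart(adjacency, seed=0)
print(np.round(scores, 4))
```

In this toy network the nodes that share a triangle with the seed receive higher scores than the nodes in the other triangle, which is the proximity signal that random walk based clustering relies on. Per the abstract, ERWR adds a tuning factor that sharpens this preference for nearby nodes.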
Supervisor: Not available
Sponsor: Not available
Qualification Name: Thesis (Ph.D.)
Qualification Level: Doctoral
EThOS ID:
DOI: Not available