Title: Shape analysis in protein structure alignment
Author: Gkolias, Theodoros
ISNI: 0000 0004 7228 249X
Awarding Body: University of Kent
Current Institution: University of Kent
Date of Award: 2018
In this thesis we explore the problem of structural alignment of protein molecules using statistical shape analysis techniques. The structural alignment problem can be divided into three smaller ones: the representation of protein structures, the sampling of possible alignments between the molecules, and the evaluation of a given alignment. Previous work in this field can be divided into two approaches: an ad hoc algorithmic approach from the bioinformatics literature, and an approach using statistical methods in either a likelihood or a Bayesian framework. The two approaches address the problem from different perspectives. For example, the algorithmic approach is easy to implement but lacks an overall modelling framework, whereas the Bayesian approach addresses this issue but is sometimes not straightforward to implement. We develop a method which is easy to implement and is based on statistical assumptions. In order to assess the quality of a given alignment we use a size-and-shape likelihood density which is based on the structural information of the molecules. This likelihood density is also extended to include sequence information and gap penalty parameters so that biologically meaningful solutions can be produced. Furthermore, we develop a search algorithm to explore possible alignments from a given starting point. The results suggest that our approach produces alignments that are as good as or better than those of the most recent structural alignment methods. In most cases we achieved a higher number of matched atoms combined with a high TM-score. Moreover, we extend our method using Bayesian techniques to perform alignments based on posterior modes. In our approach, we directly estimate the mode of the posterior distribution, which provides the final alignment between two molecules. We also choose a different approach for treating the mean parameter. In previous methods the mean was either integrated out of the likelihood density or considered fixed. We choose to assign a prior to it and obtain its posterior mode. Finally, we consider an extension of the likelihood model assuming a Normal density for both the matched and unmatched parts of a molecule and a diagonal covariance structure. We explore two different variants. In the first we consider a fixed zero mean for the unmatched parts of the molecules, and in the second we consider a common mean for both the matched and unmatched parts. Based on results for simulated and real data, both models seem to perform well in obtaining a high number of matched atoms and a high TM-score.
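The abstract judges alignments by the number of matched atoms and by the TM-score. As a rough illustration of the second quantity only, the sketch below computes the commonly used TM-score formula for a given set of matched atom pairs. It is not the thesis's likelihood-based method. It assumes the two structures have already been superposed by a rigid-body transformation. The function name `tm_score`, its arguments, and the 0.5 floor on the distance scale are assumptions made for this example.

```python
import math


def tm_score(coords_a, coords_b, matches, target_length):
    """TM-score of an alignment between two already superposed structures.

    coords_a, coords_b: lists of (x, y, z) tuples, in angstroms.
    matches: list of (i, j) pairs, atom i of A matched to atom j of B.
    target_length: number of residues in the reference structure.
    """
    # Length-dependent distance scale d0 = 1.24 * (L - 15)^(1/3) - 1.8.
    # Assumption of this sketch: d0 is floored at 0.5 so that short
    # chains (including L <= 15) do not give a zero or negative scale.
    if target_length > 15:
        d0 = 1.24 * (target_length - 15) ** (1.0 / 3.0) - 1.8
    else:
        d0 = 0.5
    d0 = max(d0, 0.5)

    total = 0.0
    for i, j in matches:
        d = math.dist(coords_a[i], coords_b[j])  # Euclidean distance
        total += 1.0 / (1.0 + (d / d0) ** 2)

    # Normalising by the reference length penalises unmatched atoms.
    return total / target_length


# Toy usage: two identical 20-atom chains, every atom matched to itself.
chain = [(3.8 * k, 0.0, 0.0) for k in range(20)]
pairs = [(k, k) for k in range(20)]
score = tm_score(chain, chain, pairs, target_length=20)
```

Because the sum is divided by the reference length rather than by the number of matched pairs, an alignment that matches few atoms scores low even when the matched atoms coincide exactly. This is why the abstract reports the number of matched atoms and the TM-score together.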
Supervisor: Kume, Alfred
Sponsor: Not available
Qualification Name: Thesis (Ph.D.)
Qualification Level: Doctoral
EThOS ID:
DOI: Not available
Keywords: QA Mathematics (inc Computing science) ; QA276 Mathematical statistics ; QP Physiology (Living systems)