Title: Role of mutual information for predicting contact residues in proteins
Author: Gomes, Mireille
Awarding Body: University of Oxford
Current Institution: University of Oxford
Date of Award: 2012
Availability of Full Text:
Access from EThOS:
Full text unavailable from EThOS. Restricted access.
Mutual Information (MI) based methods are used to predict contact residues within proteins and between interacting proteins. Many high-impact papers have reported the successful use of MI for determining contact residues in a particular protein of interest, or in certain types of proteins, such as homotrimers. In this dissertation we have carried out a systematic study to assess whether this widely used contact prediction tool is useful on a global scale. After testing original MI and leading MI-based methods on large, cross-species datasets, we found that in general the performance of these methods for predicting contact residues both within proteins (intra-protein) and between proteins (inter-protein) is weak. We observe that all MI variants have a bias towards surface residues, and therefore predict surface residues instead of contact residues. This finding is in contrast to the relatively good performance of i-Patch (Hamer et al. [2010]), a statistical scoring tool for inter-protein contact prediction. i-Patch takes only surface residues as input, groups amino acids by physicochemical properties, and assumes the existence of patches of contact residues on interacting proteins. We examine whether using these ideas would improve the performance of MI. Since inter-protein contact residues lie only on the surface of each protein, we filtered out the confounding buried residues to disentangle surface prediction from contact prediction. We observed that considering surface residues only does indeed improve the inter-protein contact prediction ability of all tested MI methods. We also examined a specific "successful" case study from the literature and demonstrated that here, even when considering surface residues only, the most accurate MI-based inter-protein contact predictor, MIc, performs no better than random.
We have developed two novel MI variants: the first groups amino acids by their physicochemical properties, and the second considers patches of residues on the interacting proteins. In our analyses these new variants highlight the delicate trade-off between signal and noise that must be achieved when using MI for inter-protein contact prediction. The input for all tested MI methods is a multiple sequence alignment of homologous proteins. In a further attempt to understand why the MI methods perform poorly, we have investigated the influence of gaps in the alignment on intra-protein contact prediction. Our results suggest that, depending on the evaluation criteria and the alignment construction algorithm employed, a gap cutoff of around 10% would maximise the performance of MI methods, whereas the commonly used 0% gap cutoff may lead to predictions that are no better than random guesses. Based on the insight we have gained through our analyses, we end this dissertation by identifying a number of ways in which the contact residue prediction ability of MI variants may be improved, including direct coupling analysis.
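As a rough illustration of the kind of calculation the abstract refers to, the sketch below computes plain (uncorrected) MI between pairs of alignment columns after discarding columns whose gap fraction exceeds a cutoff. It is not the thesis's implementation and does not reproduce MIc or the two novel variants. The function names, the `-` gap symbol, and the choice to skip sequences that are gapped in either column are all assumptions made for this example; only the 10% cutoff value is taken from the abstract.

```python
from collections import Counter
from math import log2

GAP = "-"  # assumed gap symbol in the alignment


def gap_fraction(column):
    """Fraction of sequences that have a gap in this alignment column."""
    return column.count(GAP) / len(column)


def mutual_information(col_i, col_j):
    """Plain MI (in bits) between two alignment columns.

    Assumption: sequences gapped in either column are skipped.
    """
    pairs = [(a, b) for a, b in zip(col_i, col_j) if a != GAP and b != GAP]
    n = len(pairs)
    if n == 0:
        return 0.0
    joint = Counter(pairs)
    count_i = Counter(a for a, _ in pairs)
    count_j = Counter(b for _, b in pairs)
    # sum over observed pairs of p(a,b) * log2(p(a,b) / (p(a) * p(b)))
    return sum(
        (c / n) * log2(c * n / (count_i[a] * count_j[b]))
        for (a, b), c in joint.items()
    )


def mi_scores(alignment, gap_cutoff=0.10):
    """MI for every pair of columns whose gap fraction is <= gap_cutoff.

    `alignment` is a list of equal-length aligned sequences (strings).
    """
    length = len(alignment[0])
    columns = ["".join(seq[k] for seq in alignment) for k in range(length)]
    keep = [k for k, col in enumerate(columns) if gap_fraction(col) <= gap_cutoff]
    return {
        (i, j): mutual_information(columns[i], columns[j])
        for pos, i in enumerate(keep)
        for j in keep[pos + 1:]
    }


# Toy alignment: columns 0 and 1 co-vary perfectly; column 2 is 50% gaps.
toy = ["AC-", "AC-", "GTA", "GTA"]
scores = mi_scores(toy, gap_cutoff=0.10)
```

With the toy alignment, the third column is dropped by the gap filter and only the pair (0, 1) is scored. Setting `gap_cutoff=0.0` corresponds to the strict 0% cutoff mentioned in the abstract, which keeps only columns with no gaps at all.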
Supervisor: Deane, Charlotte M. ; Reinert, Gesine
Sponsor: Clarendon Scholarship
Qualification Name: Thesis (Ph.D.)
Qualification Level: Doctoral
EThOS ID:
DOI: Not available
Keywords: Bioinformatics (biochemistry) ; Computational biochemistry ; Biochemistry ; Bioinformatics (life sciences) ; Biology ; Cell Biology ; mutual information ; contact residues ; proteins ; protein complexes ; multiple sequence alignment ; correlated mutations