Title: Categorical data analysis of protein structure
Author: Hommola, Susan Kerstin
Awarding Body: University of Leeds
Current Institution: University of Leeds
Date of Award: 2011
It has long been known that the amino-acid sequence of a protein determines its 3-dimensional structure, but accurate ab initio prediction of structure from sequence remains elusive. In this thesis, we aim to gain insight into generic principles of protein folding through statistical modelling of protein structure.
The first part is concerned with local protein structure. We study the relationships between dihedral angles in short protein segments of up to three residues. We adopt a contingency-table approach, exploring a targeted set of hypotheses through log-linear modelling to detect patterns of association between the dihedral angles in the segments considered. For segments of length two (dipeptides), our models indicate a substantial association between the side-chain conformation of the first residue and the backbone conformation of the second residue (side-to-back interaction), as well as a weaker but still significant association between the backbone conformation of the first residue and the side-chain conformation of the second residue (back-to-side interaction). Comparing these interactions across different dipeptides through cluster analysis reveals a striking pattern: for the side-to-back term, all dipeptides with the same first residue cluster together, whereas for the back-to-side term the pattern is much weaker. This suggests that the conformation of the first residue dictates that of the second. Our categorical approach proves difficult to apply to longer segments, because model complexity grows with segment length while the amount of data available shrinks.
In the second part, we study non-local interactions represented by contact maps. Our approach focuses entirely on the positions of contacting residues and is completely independent of the protein's amino-acid sequence.
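To make the idea of a sequence-independent contact representation concrete, here is a minimal sketch that builds a binary contact map from residue coordinates alone and then aggregates contacts over several proteins aligned at either chain terminus. The abstract does not state how a contact is defined, so the C-alpha distance cutoff of 8 Å, the minimum sequence separation of 3, and the helper names `contact_map` and `aligned_counts` are hypothetical choices for illustration, not the thesis's definitions.

```python
import math
from typing import Dict, List, Sequence, Tuple

Coord = Tuple[float, float, float]


def contact_map(ca: Sequence[Coord], cutoff: float = 8.0, min_sep: int = 3) -> List[List[bool]]:
    """Symmetric boolean contact map built from coordinates only (no sequence).

    Hypothetical definition (an assumption, not from the thesis): residues i and j
    are in contact when their C-alpha atoms are closer than `cutoff` angstroms
    and |i - j| >= min_sep.
    """
    n = len(ca)
    cmap = [[False] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + min_sep, n):
            if math.dist(ca[i], ca[j]) < cutoff:
                cmap[i][j] = cmap[j][i] = True
    return cmap


def aligned_counts(
    maps: Sequence[List[List[bool]]], k: int, m: int, from_n: bool = True
) -> Tuple[Dict[Tuple[int, int], int], Dict[Tuple[int, int], int]]:
    """Aggregate contacts over proteins aligned at one chain terminus (hypothetical helper).

    Key (a, d): the residue a positions from the chosen terminus (0 = terminal
    residue) paired with the residue d positions further into the chain (later
    when from_n is True, earlier when it is False). Returns (contacts, trials),
    where trials counts the proteins long enough for that pair to exist.
    """
    contacts: Dict[Tuple[int, int], int] = {}
    trials: Dict[Tuple[int, int], int] = {}
    for cmap in maps:
        n = len(cmap)
        for a in range(k):
            for d in range(1, m + 1):
                if a + d >= n:  # partner residue would fall off the chain
                    break
                i = a if from_n else n - 1 - a
                j = i + d if from_n else i - d
                trials[(a, d)] = trials.get((a, d), 0) + 1
                contacts[(a, d)] = contacts.get((a, d), 0) + int(cmap[i][j])
    return contacts, trials


# Toy 5-residue chain folded back on itself (coordinates made up for illustration).
coords = [(0.0, 0.0, 0.0), (3.8, 0.0, 0.0), (7.6, 0.0, 0.0), (7.6, 3.8, 0.0), (3.8, 3.8, 0.0)]
cmap = contact_map(coords)
n_side, n_trials = aligned_counts([cmap], k=2, m=4, from_n=True)
c_side, _ = aligned_counts([cmap], k=2, m=4, from_n=False)
```

Aligning at each terminus separately is what allows N-aligned and C-aligned patterns to be compared directly. Each `(contacts, trials)` pair is binomial data of the kind a logistic regression with position factors could be fitted to; the parallel and anti-parallel β-strand factors used in the thesis are not reproduced here.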
We investigate and quantify patterns in three specific regions of aggregated contact maps of single-domain proteins belonging to the four major SCOP classes (all-α, all-β, α/β, α+β) using logistic regression models. The first two regions represent contacts of residues aligned to the N-terminus with subsequent residues, and contacts of residues aligned to the C-terminus with preceding residues, defined symmetrically with respect to the chain termini. The third region contains contacts between terminal residues. The models for each region contain factors for the positions of the contacting residues as well as factors describing parallel and anti-parallel β-strand contact patterns. The α/β SCOP class shows an interesting asymmetry between N-aligned and C-aligned contacts. The region around the N-terminus shows a strong propensity towards parallel contacts between the first few residues and residues further along the sequence, whereas the last few residues do not show any strong patterns. This N-terminal dominance could indicate cotranslational folding. The other classes do not exhibit this asymmetry, but reveal predominantly anti-parallel β-strand patterns (all-β class), mixed patterns (α+β class) or no distinct patterns (all-α class). Contact patterns in the terminal regions are generally weak, showing no strong preference for either parallel or anti-parallel β-strand contacts.
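The log-linear tests in the first part compare nested models fitted to a contingency table of categorised dihedral angles. A minimal sketch of the simplest such comparison follows, assuming a two-way table that cross-classifies a side-chain state of the first residue against a backbone state of the second (a marginal version of the side-to-back association). It computes the likelihood-ratio statistic G² for the independence model against the saturated model. The function name `independence_g2`, the reduction to two variables, and the toy counts are illustrative assumptions; the thesis's models involve more angles and a targeted set of hypotheses not reproduced here.

```python
import math
from typing import List, Sequence, Tuple


def independence_g2(table: Sequence[Sequence[int]]) -> Tuple[float, int, List[List[float]]]:
    """Likelihood-ratio test of independence for a two-way contingency table.

    Independence log-linear model: log(mu_ij) = lambda + lambda_i^A + lambda_j^B.
    The saturated model adds the interaction lambda_ij^AB. The statistic
    G^2 = 2 * sum O * log(O / E), with E the fitted counts under independence,
    tests whether that interaction term is needed.
    Returns (G^2, degrees of freedom, expected counts).
    Assumes every row and column total is positive.
    """
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    # Maximum-likelihood fitted counts under independence: row total * column total / n.
    expected = [[r * c / n for c in col_totals] for r in row_totals]
    g2 = 0.0
    for obs_row, exp_row in zip(table, expected):
        for o, e in zip(obs_row, exp_row):
            if o > 0:  # empty cells contribute 0 (limit of o * log(o / e) as o -> 0)
                g2 += 2.0 * o * math.log(o / e)
    df = (len(row_totals) - 1) * (len(col_totals) - 1)
    return g2, df, expected


# Toy counts (made up for illustration): rows are hypothetical chi1 classes of
# residue 1, columns are hypothetical backbone regions (alpha, beta) of residue 2.
toy = [[30, 10], [12, 28], [20, 20]]
g2, df, _ = independence_g2(toy)
print(f"G^2 = {g2:.2f} on {df} df")
```

As a check, for the 2 × 2 table `[[20, 0], [0, 20]]` every expected count is 10, so G² = 80 ln 2 ≈ 55.45 on 1 degree of freedom. A p-value can be taken from the upper tail of the chi-squared distribution, for example with `scipy.stats.chi2.sf(g2, df)` if SciPy is available. In a multi-way table, the same kind of statistic is the difference in deviance between log-linear models fitted with and without the interaction term of interest.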
Supervisor: Not available
Sponsor: Not available
Qualification Name: Thesis (Ph.D.)
Qualification Level: Doctoral
EThOS ID:
DOI: Not available