Use this URL to cite or link to this record in EThOS:
Title: Network modularity and local environment similarity as descriptors of protein structure
Author: Grant, William
ISNI:       0000 0004 9348 2044
Awarding Body: University of Cambridge
Current Institution: University of Cambridge
Date of Award: 2020
Availability of Full Text:
Access from EThOS:
Full text unavailable from EThOS. Please try the link below.
Access from Institution:
As the number of solved protein structures increases, the opportunities for meta-analysis of this dataset increase too. Here we explore two approaches for analysing protein structure, both starting from the three-dimensional co-ordinates of each atom within the structure, which are then abstracted into a more useful form. The first method transforms the protein into a network in which its amino acids are the nodes, and where the edges are generated using a simple proximity test. By applying the Infomap community detection algorithm, we can fragment the protein into highly intra-connected subregions - these subregions are compact and globular, and can be compared with known structural and functional subunits of the protein (also known as domains). By performing this fragmentation process systematically across a large set of proteins, and checking for structurally conserved fragments, we can search for novel candidate domains. This method for automatically decomposing a protein into compact substructures may also be useful in coarse-graining molecular dynamics, analysing the protein’s topology, in de novo protein design, or in fitting electron density maps derived from single particle electron microscopy. The second method calculates a descriptor for each atom of the protein based on its local environment, known as a Smooth Overlap of Atomic Positions (SOAP) descriptor. Using these descriptors we can perform overall comparisons of the subregions identified above. In addition, by comparing the descriptors of a set of proteins known to share common structural or functional features (such as binding of a particular ligand), we can automatically identify the most highly conserved atoms of the set. These atoms may line ligand binding pockets or correspond to allosteric sites, which could inform drug design.
Supervisor: Ahnert, Sebastian Sponsor: Not available
Qualification Name: Thesis (Ph.D.) Qualification Level: Doctoral
Keywords: computational biology ; protein structure ; network science