Title: Predicting the structure and function of genomic sequences using the CATH structural database
Author: Bray, James Edward
ISNI: 0000 0001 3477 9743
Awarding Body: University of London
Current Institution: University College London (University of London)
Date of Award: 2001
Availability of Full Text: Full text unavailable from EThOS.
The field of bioinformatics faces the challenge of reliably annotating genomic sequences with structural and functional information. Structure classification databases are now sufficiently populated to provide a framework for meeting this challenge. This thesis focuses on the superfamily level of structural classification, which groups together distantly related proteins that have evolved from a common ancestor. To cope with the functional diversity that occurs at the structural superfamily level, sequences have been classified into functionally related protein families that can serve as the basis for genome annotation. Knowledge of the key structural and functional features of structural superfamilies provides valuable insights for accurately transferring biological information. This thesis describes the development of two new structure-based resources that enhance the ability of the CATH structural database to annotate genomic sequences. Firstly, the CATH Dictionary of Homologous Superfamilies (DHS) presents functionally annotated structural alignments for distantly related domains. Key residues can be identified and used diagnostically for validating the results of sequence search algorithms. Secondly, the CATH Protein Family Database (CATH-PFDB) integrates sequence and structure by assigning genomic sequences to structural superfamilies. The sequences within each superfamily are further clustered into families sharing close functional similarity. Extensive benchmarking of this sequence library using pairwise and profile search algorithms showed that both approaches can be used to reliably identify distantly related genomic sequences. A protocol for analysing the quality of three-dimensional protein models derived from distantly related proteins has also been developed. Residue environment scores from the SSAP structure comparison algorithm have been used to identify well-modelled structural fragments through histogram and coverage plots. This facilitates the assessment of structure prediction and modelling algorithms that are vital for accurately transferring structural data to genomic sequences. This work was generously supported by the Biotechnology and Biological Sciences Research Council.
Supervisor: Not available
Sponsor: Not available
Qualification Name: Thesis (Ph.D.)
Qualification Level: Doctoral
EThOS ID:
DOI: Not available