Title: Sequence database searching using structural models of protein evolution
Author: Davies, L.
Awarding Body: University of Cambridge
Current Institution: University of Cambridge
Date of Award: 2002
Availability of Full Text: Full text unavailable from EThOS. Please contact the current institution’s library for further details.
Commonly used sequence database search programs, such as BLAST, FASTA and SSEARCH, identify sequence homology by pairwise alignment. These programs are good at detecting closely related sequences but have difficulty accurately detecting homologous sequences with low sequence identity. This thesis describes a new approach that attempts to improve the detection of distantly related sequences by rejecting the assumption that all sites in a protein behave identically. This is done without profile techniques, which require a set of homologs to be collected beforehand.

Existing programs use general properties of proteins to generate alignment scores; this simplifies calculations but may also reduce accuracy. In reality, amino acid replacement probabilities and rates, amino acid frequencies and gap probabilities all vary according to where a residue lies in the protein structure. Typical patterns of these structure-specific variations in evolutionary dynamics can be incorporated into a database search program through hidden Markov models (HMMs), potentially improving the detection of more distantly related sequences.

In this thesis, I assess the utility of including structure-specific evolutionary information in a database search program. I have developed a general methodology that allows structure-based evolutionary models to be used for database searching, together with specific algorithms that incorporate either solvent accessibility or secondary structure distinctions for globular proteins. In addition, I have developed a database search algorithm for transmembrane proteins. The improvement afforded by the extra information has been evaluated using both simulated sequences, which fit the models exactly, and real sequences from the SCOP database. The success rate of each of these programs has been compared with that of a simplified model that contains the general properties of proteins but no structural distinctions.
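The abstract does not give the model equations, so the following is only an illustrative sketch of the general idea, not the program developed in the thesis. It assumes a gapless pairwise alignment, a hypothetical two-letter alphabet ('H' hydrophobic, 'P' polar) and two hypothetical hidden states, "buried" and "exposed", each with its own invented pair-emission (replacement) probabilities. The standard HMM forward algorithm sums over all structural state paths, and the result is compared with a single-state model whose emissions average the two states, i.e. one with no structural distinctions. All names and parameter values are assumptions; gaps, the 20-letter amino acid alphabet and database-scale search are omitted.

```python
import math

# Hypothetical toy alphabet: 'H' = hydrophobic, 'P' = polar (not from the thesis).
# Hypothetical structural states with invented parameters, for illustration only.
STATES = ("buried", "exposed")
START = {"buried": 0.5, "exposed": 0.5}
TRANS = {
    "buried": {"buried": 0.8, "exposed": 0.2},
    "exposed": {"buried": 0.2, "exposed": 0.8},
}
# P(aligned residue pair | state): each state has its own replacement pattern.
PAIR_EMIT = {
    "buried": {("H", "H"): 0.70, ("H", "P"): 0.08, ("P", "H"): 0.08, ("P", "P"): 0.14},
    "exposed": {("H", "H"): 0.10, ("H", "P"): 0.10, ("P", "H"): 0.10, ("P", "P"): 0.70},
}
# Structure-agnostic comparison: one state whose emissions average the two above.
FLAT_STATES = ("any",)
FLAT_START = {"any": 1.0}
FLAT_TRANS = {"any": {"any": 1.0}}
FLAT_EMIT = {
    "any": {pair: 0.5 * (PAIR_EMIT["buried"][pair] + PAIR_EMIT["exposed"][pair])
            for pair in PAIR_EMIT["buried"]}
}
# Null model: the two sequences are unrelated, residues drawn independently.
BACKGROUND = {"H": 0.5, "P": 0.5}


def logsumexp(values):
    """Numerically stable log(sum(exp(v) for v in values))."""
    m = max(values)
    return m + math.log(sum(math.exp(v - m) for v in values))


def forward_log_likelihood(seq_a, seq_b, states, start, trans, emit):
    """Log P(gapless alignment | HMM), summed over all hidden state paths."""
    if len(seq_a) != len(seq_b) or not seq_a:
        raise ValueError("expected two non-empty sequences of equal length")
    columns = list(zip(seq_a, seq_b))
    # f[s] = log P(columns up to the current one, current column in state s)
    f = {s: math.log(start[s]) + math.log(emit[s][columns[0]]) for s in states}
    for col in columns[1:]:
        f = {
            s: math.log(emit[s][col])
            + logsumexp([f[r] + math.log(trans[r][s]) for r in states])
            for s in states
        }
    return logsumexp(list(f.values()))


def null_log_likelihood(seq_a, seq_b, background):
    """Log P(both sequences | unrelated, independent residues)."""
    return sum(math.log(background[x]) for x in seq_a + seq_b)


def log_odds(seq_a, seq_b, structural=True):
    """Log-odds score of the alignment against the null model."""
    if structural:
        model = (STATES, START, TRANS, PAIR_EMIT)
    else:
        model = (FLAT_STATES, FLAT_START, FLAT_TRANS, FLAT_EMIT)
    related = forward_log_likelihood(seq_a, seq_b, *model)
    return related - null_log_likelihood(seq_a, seq_b, BACKGROUND)


if __name__ == "__main__":
    a, b = "HHHHHPPPPP", "HHHHHPPPPP"
    print("structural:", log_odds(a, b, structural=True))
    print("flat:      ", log_odds(a, b, structural=False))
```

In this toy, an alignment whose conserved hydrophobic block is followed by a conserved polar block scores higher under the two-state model than under the averaged one, because the hidden states let replacement probabilities change along the sequence. Summing over state paths means no structure has to be known for either toy sequence; the abstract does not say whether the thesis models condition on a known structure.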
Supervisor: Not available
Sponsor: Not available
Qualification Name: Thesis (Ph.D.)
Qualification Level: Doctoral
EThOS ID:
DOI: Not available