Title: Molecular propinquity : evolutionary and structural relationships of proteins
Author: Brenner, S. E.
Awarding Body: University of Cambridge
Current Institution: University of Cambridge
Date of Award: 1996
Availability of Full Text:
Full text unavailable from EThOS.
Please contact the current institution’s library for further details.
The SCOP (Structural Classification of Proteins) database hierarchically organizes all proteins of known structure according to their structural and evolutionary relationships. Close evolutionary links are represented at the most detailed levels of the database, while broad structural similarities are represented at the highest, most general levels. A crucial intermediate level, the superfamily, records evolutionary relationships too distant to be recognized by sequence comparison alone. The database pioneered the use of the World Wide Web for scientific communication, and it has consequently become a valuable resource for interpreting the structure, function, and evolution of new proteins.

SCOP reveals that the population statistics of current structural knowledge are extremely skewed at all levels. For example, half of all proteins are represented by a single set of coordinates, but one protein has more than 200. More intriguingly, although there are .4 evolutionarily related superfamilies for each structural fold, the vast majority of folds are used by only a single superfamily. The remaining folds, an eighth of the total, are used by a disproportionate number of superfamilies, up to 19 each.

Because it records very distant homology detected through structural similarity, the SCOP database provides a unique set of evolutionary relationships for testing and calibrating pairwise sequence comparison algorithms. The Smith-Waterman algorithm was the most successful method, but it was capable of identifying only a small fraction of distant evolutionary relationships. FASTA (ktup=1) performed nearly as well, while BLAST was notably worse. The statistical scores reported by Smith-Waterman and FASTA were highly reliable, and these scores produced the best results. Raw scores could identify nearly as many homologs, but measures based on percent identity performed extremely poorly.
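The hierarchy described above can be pictured as a lineage of labels per domain, where sharing a superfamily signals probable common ancestry and sharing only a fold signals structural similarity alone. The following is a minimal sketch under that reading. It covers only the four major levels, and the `Domain` type, the helper functions and every label are hypothetical placeholders, not SCOP's actual data format.

```python
from dataclasses import dataclass
from typing import Optional, Tuple

# Major SCOP levels, most general first. The domain names and labels
# used below are hypothetical placeholders, not real SCOP entries.
LEVELS = ("class", "fold", "superfamily", "family")


@dataclass(frozen=True)
class Domain:
    name: str
    lineage: Tuple[str, ...]  # one label per entry in LEVELS


def deepest_shared_level(a: Domain, b: Domain) -> Optional[str]:
    """Most detailed level at which both domains carry the same label."""
    shared = None
    for level, x, y in zip(LEVELS, a.lineage, b.lineage):
        if x != y:
            break  # lineages diverge here; deeper levels cannot match
        shared = level
    return shared


def probable_homologs(a: Domain, b: Domain) -> bool:
    """Sharing a superfamily (or a family within it) implies common ancestry;
    sharing only a fold or class is treated as structural similarity alone."""
    return deepest_shared_level(a, b) in ("superfamily", "family")


d1 = Domain("d1", ("c1", "f1", "sf1", "fam1"))
d2 = Domain("d2", ("c1", "f1", "sf1", "fam2"))
d3 = Domain("d3", ("c1", "f1", "sf2", "fam3"))
print(deepest_shared_level(d1, d2), probable_homologs(d1, d2))  # superfamily True
print(deepest_shared_level(d1, d3), probable_homologs(d1, d3))  # fold False
```

Pairs like `d1` and `d3` share a fold but not a superfamily, so they are not counted as homologs. These are the cases where structure alone cannot establish an evolutionary link.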
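A mean number of superfamilies per fold above one can coexist with most folds hosting a single superfamily, provided that a few folds are heavily shared. This small sketch computes both quantities from (fold, superfamily) label pairs. The data are an invented toy example chosen only to show the arithmetic, not SCOP figures.

```python
from collections import Counter


def fold_usage(pairs):
    """pairs: (fold, superfamily) labels. Returns per-fold superfamily counts,
    the mean number of superfamilies per fold, and the fraction of folds
    used by exactly one superfamily."""
    counts = Counter(fold for fold, _sf in set(pairs))  # dedupe, then count per fold
    n_folds = len(counts)
    mean = sum(counts.values()) / n_folds
    single = sum(1 for c in counts.values() if c == 1) / n_folds
    return counts, mean, single


# Hypothetical toy data: seven folds with one superfamily each,
# plus one fold shared by five superfamilies.
pairs = [(f"fold{i}", f"sf{i}") for i in range(7)]
pairs += [("fold7", f"sf7_{j}") for j in range(5)]
counts, mean, single = fold_usage(pairs)
print(mean, single, max(counts.values()))  # 1.5 0.875 5
```

Here seven of eight folds (87.5%) have one superfamily, yet the mean is 1.5. The single shared fold accounts for all of the excess.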
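The Smith-Waterman algorithm finds the best-scoring local alignment by dynamic programming. Each cell holds the best score of an alignment ending there, floored at zero so that an alignment can restart anywhere. The following is a minimal sketch that returns the optimal score only. It assumes a toy match/mismatch scheme and a linear gap penalty. Protein comparisons normally use a substitution matrix and affine gap penalties, and the parameter values here are illustrative, not those used in the thesis.

```python
def smith_waterman(a, b, match=2, mismatch=-1, gap=-2):
    """Best local alignment score of sequences a and b (toy scoring)."""
    rows, cols = len(a) + 1, len(b) + 1
    H = [[0] * cols for _ in range(rows)]  # row/column 0 stay 0
    best = 0
    for i in range(1, rows):
        for j in range(1, cols):
            s = match if a[i - 1] == b[j - 1] else mismatch
            H[i][j] = max(
                0,                     # start a new local alignment
                H[i - 1][j - 1] + s,   # align a[i-1] with b[j-1]
                H[i - 1][j] + gap,     # gap in b
                H[i][j - 1] + gap,     # gap in a
            )
            best = max(best, H[i][j])
    return best


print(smith_waterman("xxACGTyy", "ACGT"))  # 8: four matches, flanks ignored
```

The zero floor makes the method local. Unrelated flanking residues, such as the `x` and `y` characters above, do not penalize a well-conserved core.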
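Statistical scores differ from raw scores in that they account for how likely a score is by chance given the sizes of the sequences compared. One common form is the Karlin-Altschul expectation E = K·m·n·e^(−λS), shown below alongside the equivalent bit score. This is offered as background, not as the exact statistics used by each program in the thesis. FASTA, for instance, estimates its statistics differently. The values of `lam` and `K` are hypothetical, because the real values depend on the scoring matrix and gap penalties.

```python
import math


def evalue(raw_score, m, n, lam, K):
    """Karlin-Altschul expectation: chance hits scoring >= raw_score when
    comparing sequences of lengths m and n."""
    return K * m * n * math.exp(-lam * raw_score)


def bit_score(raw_score, lam, K):
    """Raw score rescaled so that E = m * n * 2 ** (-bits)."""
    return (lam * raw_score - math.log(K)) / math.log(2)


# Hypothetical parameters; real lambda and K depend on the scoring scheme.
lam, K = 0.3, 0.1
print(evalue(50, 100, 100, lam, K))
print(evalue(50, 1000, 1000, lam, K))  # same raw score, longer sequences: larger E
```

The same raw score is less significant when the search space (m·n) is larger. This length correction is one reason statistical scores can rank hits more reliably than raw scores.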
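The evaluation idea treats SCOP superfamily membership as the reference truth for every pair a method reports. One way to score a method is to walk down its hits from most to least significant and ask what fraction of true homolog pairs is found before errors reach a fixed rate. The following is a minimal sketch of that idea. The errors-per-query cutoff, the function name and the hit list are assumptions for illustration, not necessarily the thesis's exact protocol or data.

```python
def coverage_at_epq(hits, total_homolog_pairs, n_queries, max_epq):
    """hits: (score, is_homolog) pairs pooled over all queries, where a lower
    score is more significant (e.g. an E-value). Returns the fraction of known
    homolog pairs found before false positives per query exceed max_epq."""
    tp = fp = 0
    found = 0
    for _score, related in sorted(hits, key=lambda h: h[0]):
        if related:
            tp += 1
        else:
            fp += 1
        if fp / n_queries > max_epq:
            break  # error rate exceeded; stop counting
        found = tp
    return found / total_homolog_pairs


# Hypothetical hits from 2 queries, with 5 true homolog pairs in total.
hits = [(1e-10, True), (1e-8, True), (1e-5, False),
        (1e-3, True), (0.1, False), (1.0, True)]
print(coverage_at_epq(hits, 5, 2, 0.5))  # 0.6
print(coverage_at_epq(hits, 5, 2, 0.0))  # 0.4
```

To compare raw scores or percent identity on the same footing, where a higher value means more significant, one could pass negated values as the score.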
Supervisor: Not available
Sponsor: Not available
Qualification Name: Thesis (Ph.D.)
Qualification Level: Doctoral
EThOS ID:
DOI: Not available