Title: Kernel-based classification of protein structure and function from amino acid sequences
Author: Ward, Jonathan James
Awarding Body: UCL (University College London)
Current Institution: University College London (University of London)
Date of Award: 2005
Availability of Full Text: Full text unavailable from EThOS.
The thesis describes the application of kernel methods and, in particular, the support vector machine (SVM) to the classification of protein structure and function. The thesis is divided into two related halves, with Chapters 2 and 3 containing descriptions of methods for predicting different aspects of protein structure. Chapter 4 investigates the functions of disorder in the proteome of a model eukaryote and Chapter 5 describes algorithms and data sources for inferring protein function. The data sources include structure predictions and other properties that can be derived directly from amino acid sequences. Chapter 2 describes a new method for the prediction of secondary structure using an SVM learning algorithm. This is presented as a guide to the application of SVMs to problems in bioinformatics, and includes a discussion of the positive and negative aspects of the technique. The final prediction method is shown to have comparable performance to several of the most accurate modern methods. The third chapter discusses the development of a method to recognize native disorder from amino acid sequences. This predictor (DISOPRED2) is shown to be the most accurate contemporary method on targets from the fifth CASP experiment. The false positive rate of DISOPRED2 is determined using ordered structures from the Protein Data Bank, and the classifier is then used to estimate the frequency of disorder in complete genomes. The final part of this chapter presents the design and implementation of a publicly available web service for disorder prediction. The fourth chapter describes the use of DISOPRED2 to investigate the functional annotations that are associated with long predictions of disorder in the proteome of the model organism Saccharomyces cerevisiae. This chapter also provides several biochemical and evolutionary explanations for the disparity in the predicted frequencies of disorder between eukaryote and prokaryote proteomes. The chapter also demonstrates that the boundaries between structural domains tend to be predicted as disordered by DISOPRED2. The final research chapter discusses the development of machine learning methods for determining the function of unannotated proteins. Individual classifiers are trained using phylogenetic profiles, structure predictions and simple features derived from the amino acid sequence to predict the function of yeast proteins in the absence of significant sequence similarity.
Supervisor: Not available
Sponsor: Not available
Qualification Name: Thesis (Ph.D.)
Qualification Level: Doctoral
EThOS ID:
DOI: Not available