Title: The development of bioinformatics tools for the rapid identification of novel cellulase sequences
Author: Roche, Daniel Barry
ISNI: 0000 0004 2726 6503
Awarding Body: University of Reading
Current Institution: University of Reading
Date of Award: 2012
The main aim of this project was to develop bioinformatics tools to rapidly identify novel cellulase sequences for use in next-generation biofuel production. Firstly, a detailed analysis of the sequences and folds of structurally elucidated cellulases was undertaken. This analysis showed that cellulases are structurally diverse and are classified into 19 different CATH superfamilies. The study of cellulase fold space was subsequently used to develop a cellulase-specific fold recognition tool, CellulaseFOLD. CellulaseFOLD was found to be over 30% faster than the fastest leading fold recognition tool (HHsearch) for the detection of cellulases. In addition, in an evaluation of three cellulase-containing proteomes, the CellulaseFOLD method achieved a higher percentage coverage of cellulase sequences than HHsearch. Secondly, FunFOLD, a ligand binding site residue prediction tool, and a novel metric for its evaluation (the Binding-site Distance Test, or BDT score) were developed. The FunFOLD method showed a significant improvement over the best available servers and was shown to be competitive with the top methods. In addition, the BDT score was determined to be a more robust score than the previous metric for ligand binding site evaluation (the MCC score) and was subsequently adopted by the official assessors at CASP9. Thirdly, a comprehensive analysis of binding site residues for all structurally elucidated cellulases was undertaken. From this study it was concluded that aromatic residues such as tryptophan are important in saccharide binding. Furthermore, cellulase binding sites contained a higher percentage of charged residues than the cellulase structure as a whole. Fourthly, a ligand binding site quality assessment tool, FunFOLDQA, was developed, which assesses predictions prior to the availability of experimental data. The FunFOLDQA score was shown to be highly correlated with both the MCC and BDT metrics.
Thus, FunFOLDQA can be used to assess binding site prediction quality in the absence of experimental data. Finally, both the general algorithms (FunFOLD and FunFOLDQA) and the cellulase-specific resources (CellulaseFOLD and the cellulase binding site data) were used to assess case study sequences, identified as potential cellulases, from three proteomes under intense study in the biofuels industry.
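The abstract mentions the MCC score as the earlier metric for evaluating ligand binding site predictions. As an illustration only, the sketch below computes the standard Matthews correlation coefficient over residue positions, treating each residue as either binding or non-binding. The function name `binding_site_mcc` and its parameters are hypothetical and are not taken from the thesis or from the FunFOLD software. The BDT score is not sketched because this record does not define it.

```python
import math


def binding_site_mcc(predicted, observed, sequence_length):
    """Standard Matthews correlation coefficient for a binding-site prediction.

    predicted, observed: iterables of residue positions labelled as binding.
    sequence_length: total number of residues considered.
    All names here are illustrative assumptions, not the thesis's own code.
    """
    predicted, observed = set(predicted), set(observed)
    tp = len(predicted & observed)          # predicted and observed binding
    fp = len(predicted - observed)          # predicted only
    fn = len(observed - predicted)          # observed only
    tn = sequence_length - tp - fp - fn     # neither predicted nor observed
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    if denom == 0:
        # Common convention: return 0 when the coefficient is undefined.
        return 0.0
    return (tp * tn - fp * fn) / denom
```

A prediction that matches the observed residues exactly scores 1, while a prediction that labels no residues as binding falls back to 0 under the convention used here. Because the score only counts exact residue matches, a prediction that lands near the true site but not on it gains no credit.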
Supervisor: Not available
Sponsor: Not available
Qualification Name: Thesis (Ph.D.)
Qualification Level: Doctoral
EThOS ID:
DOI: Not available