Title: Exploiting whole-PDB analysis in novel bioinformatics applications
Author: Ramraj, Varun
ISNI:       0000 0004 5354 4327
Awarding Body: University of Oxford
Current Institution: University of Oxford
Date of Award: 2014
Availability of Full Text: Full text unavailable from EThOS.
Abstract: The Protein Data Bank (PDB) is the definitive electronic repository for experimentally derived protein structures, composed mainly of those determined by X-ray crystallography. Approximately 200 new structures are added to the PDB weekly and, at the time of writing, it contains approximately 97,000 structures. This represents an expanding wealth of high-quality information, but there seem to be few bioinformatics tools that consider and analyse these data as an ensemble. This thesis explores the development of three efficient, fast algorithms and software implementations to study protein structure using the entire PDB. The first project is a crystal-form matching tool that takes a unit cell and quickly (< 1 second) retrieves the most closely related matches from the PDB. The unit cell matches are combined with sequence alignments using a novel Family Clustering Algorithm to display the results in a user-friendly way. The software tool, Nearest-cell, has been incorporated into the X-ray data collection pipeline at the Diamond Light Source, and is also available as a public web service. The bulk of the thesis is devoted to the study and prediction of protein disorder. An initial attempt to update and extend an existing predictor, RONN, exposed the limitations of the method, and a novel predictor, MoreRONN, was developed that applies a novel sequence-based clustering approach to disorder data inferred from the PDB and DisProt. MoreRONN is now clearly the best-in-class disorder predictor and will soon be offered as a public web service. The third project explores the development of a clustering algorithm for protein structural fragments that can work on the scale of the whole PDB. While protein structures have long been clustered into loose families, there has to date been no comprehensive analytical clustering of short (~6 residue) fragments. A novel fragment clustering tool was built that is now leading to a public database of fragment families and representative structural fragments, which should prove extremely helpful for both basic understanding and experimentation. Together, these three projects exemplify how cutting-edge computational approaches applied to extensive protein structure libraries can provide user-friendly tools that address critical everyday issues for structural biologists.
Supervisor: Esnouf, Robert Mark
Sponsor: Not available
Qualification Name: Thesis (Ph.D.)
Qualification Level: Doctoral
EThOS ID:
DOI: Not available
Keywords: Medical Sciences ; Computer science (mathematics) ; Bioinformatics (biochemistry) ; Life Sciences ; Genetics (life sciences) ; Computational biochemistry ; Bioinformatics (life sciences) ; Biology (medical sciences) ; Genetics (medical sciences) ; Structural genomics ; Crystallography ; Enzymes ; Membrane proteins ; Protein chemistry ; Protein folding ; Polymers Amino acid and peptide chemistry ; NMR spectroscopy ; Mass spectrometry ; Chemistry & allied sciences ; Physical Sciences ; Mathematical genetics and bioinformatics (statistics) ; Computationally-intensive statistics ; Bioinformatics (technology) ; Computing ; Applications and algorithms ; Program development and tools ; Scalable systems ; Software engineering ; Theory and automated verification ; Biomedical engineering ; protein data bank ; structural biology ; bioinformatics ; clustering ; disorder prediction ; unit cell ; space group ; OpenMP ; parallelization ; proteins ; peptides ; fragments ; training ; neural network ; MoreRONN ; nearest cell ; nearest-cell