Title: Computational proteomics for genome annotation
Author: Blakeley, Paul
ISNI: 0000 0004 2740 876X
Awarding Body: University of Manchester
Current Institution: University of Manchester
Date of Award: 2013
The field of proteogenomics operates at the interface between proteomics and genomics, and has emerged during the past decade to exploit the vast quantities of high-throughput sequence data. A range of proteogenomics approaches has been developed that integrate mass spectrometry data with genome sequence data to provide empirical evidence for protein-coding genes. However, current methods may not be optimal, as they do not fully consider the splicing complexity of eukaryotes, and there is currently no best-practice method. To address this, we investigate the level of proteomics support for Ensembl gene models in human and in a selection of model organisms. We find a disparity between the number of splice variants confirmed by extant data and the number that could theoretically be confirmed using current proteomics technologies. We then investigate EST-based proteogenomics methods, which enabled the discovery of novel peptide sequences in the chicken genome; these peptides represent hitherto unannotated genes, amended gene models, polymorphisms, and genes missing from the genome assembly. Finally, we explore different approaches for searching mass spectrometry data against transcript sequences, and show that searching mass spectra against protein sequences predicted by the EORF and ESTScan2 translation tools gives the best sensitivity.
Supervisor: Hubbard, Simon; Griffiths-Jones, Samuel
Sponsor: BBSRC
Qualification Name: Thesis (Ph.D.)
Qualification Level: Doctoral
EThOS ID:
DOI: Not available
Keywords: Proteomics ; Genome Annotation ; Alternative Splicing ; ESTs ; Mass spectrometry