Title: Computational localization of promoters and transcription start sites in mammalian genomes
Author: Down, T.
Awarding Body: University of Cambridge
Current Institution: University of Cambridge
Date of Award: 2004
Availability of Full Text:
Full text unavailable from EThOS.
Please contact the current institution’s library for further details.
Here, I investigate the question of identifying and annotating promoters, one of the most important regulatory signals in the genome, which mark the points where transcription is initiated, and regulate the transcription of genes. I present a new computational method, EponineTSS, which can predict transcription start sites in bulk genomic sequence data with excellent sensitivity and specificity. Unlike existing methods, it gives an indication of the actual location of the transcription start site. Comparisons with available experimental data suggest that the positional accuracy of these predictions is very good. Results from this method are included as part of the Ensembl human genome annotation. Having located transcription start sites for genes, I also discuss the use of results from comparative genomics to estimate the extent of the fundamental promoter region upstream of the start site. I show that the extent of promoters is very variable, and that promoter size is correlated with the function of the gene for whose regulation it is responsible. Genes associated with developmental processes tend to have particularly large, and thus presumably complex, promoters, with the homeobox transcription factors among the most extreme examples. I also introduce sparse Bayesian learning, a recently developed approach to supervised machine learning which can be applied to the training of a wide range of model types, and embodies the principle of selecting the simplest possible model to explain the observed data. I demonstrate a new technique which makes sparse Bayesian learning much more scalable, allowing it to be applied to very large and complex problems, and present a convenient, freely available Java library which provides a general-purpose implementation of this technique. This library was used here in the training of the transcription start site predictor, but has a wide range of applications in computational biology and beyond.
Supervisor: Not available
Sponsor: Not available
Qualification Name: Thesis (Ph.D.)
Qualification Level: Doctoral
EThOS ID:  DOI: Not available