Use this URL to cite or link to this record in EThOS: http://ethos.bl.uk/OrderDetails.do?uin=uk.bl.ethos.604280
Title: Gene prediction using a configurable system for the integration of data by dynamic programming
Author: Howe, K.
Awarding Body: University of Cambridge
Current Institution: University of Cambridge
Date of Award: 2003
Availability of Full Text:
Full text unavailable from EThOS. Please contact the current institution’s library for further details.
Abstract:
A new approach to the computational identification of protein-coding gene structures in genomic DNA sequence is described. It overcomes rigidities inherent in most existing gene prediction methods, for example those based on Hidden Markov Models (HMMs), by supporting a flexible computational model of how sequence signals fit together into complete gene structures. The primary result of the work is a gene prediction tool for the assembly of evidence for individual gene components (features) into predictions of complete gene structures. The system is completely configurable in that both the features themselves, and the model of gene structure against which candidate assemblies are validated and scored, are external to the system and supplied by the user. The gene prediction process is therefore tied neither to any specific techniques for the recognition of gene prediction signals, nor to any specific underlying model of gene structure. The methodology is implemented in a piece of software called “GAZE” which uses a dynamic programming algorithm to obtain the highest scoring gene structure consistent with the user-supplied features and gene-structure model, and also posterior probabilities that each feature is part of a gene. The algorithm employs a novel pruning strategy, ensuring that it has a runtime effectively linear in the length of the sequence without compromising accuracy. The effectiveness of the strategy is explored by applying it to the prediction of gene structures in sequences of the nematode worm C. elegans. GAZE allows the integration of gene prediction data from multiple, arbitrary sources. It is important for the accuracy of the system that the various pieces of evidence are weighted appropriately with respect to each other. A novel strategy for the automatic determination of optimal values for these weights is described.
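Illustrative note (not part of the thesis record): the general idea of assembling externally supplied features into a highest-scoring structure by dynamic programming might be sketched as below. This is a minimal sketch under stated assumptions, not GAZE's implementation. The names `Feature`, `MODEL` and `best_structure`, the feature kinds, and all scores are hypothetical. The sketch compares every pair of features, so it runs in quadratic time; it includes neither the pruning strategy nor the posterior probabilities described in the abstract.

```python
from typing import NamedTuple


class Feature(NamedTuple):
    kind: str     # hypothetical feature type, e.g. "start", "donor"
    pos: int      # position of the feature in the sequence
    score: float  # evidence score supplied by some external source


# Hypothetical gene-structure model: which feature kind may follow which,
# with a fixed score for the region between them (an assumption of this sketch).
MODEL = {
    ("start", "donor"): 0.0,
    ("start", "stop"): 0.0,
    ("donor", "acceptor"): -0.5,
    ("acceptor", "donor"): 0.0,
    ("acceptor", "stop"): 0.0,
}

NEG_INF = float("-inf")


def best_structure(features, model=MODEL, first="start", last="stop"):
    """Return (score, path) for the highest-scoring chain of features
    that begins with `first`, ends with `last`, and uses only the
    transitions allowed by `model`."""
    feats = sorted(features, key=lambda f: f.pos)
    best = [NEG_INF] * len(feats)   # best score of a chain ending at feature j
    back = [None] * len(feats)      # index of the previous feature in that chain

    for j, fj in enumerate(feats):
        if fj.kind == first:
            best[j] = fj.score
        for i in range(j):
            fi = feats[i]
            link = model.get((fi.kind, fj.kind))
            # Skip disallowed transitions, ties in position, and dead ends.
            if link is None or fi.pos >= fj.pos or best[i] == NEG_INF:
                continue
            candidate = best[i] + link + fj.score
            if candidate > best[j]:
                best[j] = candidate
                back[j] = i

    ends = [j for j, f in enumerate(feats)
            if f.kind == last and best[j] > NEG_INF]
    if not ends:
        return NEG_INF, []

    j = max(ends, key=lambda k: best[k])
    total = best[j]
    path = []
    while j is not None:            # follow back-pointers to recover the chain
        path.append(feats[j])
        j = back[j]
    return total, path[::-1]
```

Because the features and the model are both passed in as data, changing either requires no change to the search itself, which is the kind of configurability the abstract describes.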
Supervisor: Not available
Sponsor: Not available
Qualification Name: Thesis (Ph.D.)
Qualification Level: Doctoral
EThOS ID: uk.bl.ethos.604280
DOI: Not available