Title: Genome annotation errors and how to fix them
Author: Dunne, Michael Peter
ISNI: 0000 0004 7960 018X
Awarding Body: University of Oxford
Current Institution: University of Oxford
Date of Award: 2018
Availability of Full Text: Full text unavailable from EThOS.
Many inferences about the biological properties of an organism depend on the completeness and accuracy of its genome annotation. Advances in sequencing technologies and the associated decrease in costs have brought whole-genome sequencing projects within the reach of individual laboratories, precipitating a huge acceleration in the publication of draft genome assemblies and annotations. While genome assembly quality metrics have received substantial attention, adequate frameworks for quantifying and controlling errors in genome annotations are lacking, and thus the completeness and accuracy of published genome annotations are unknown. Moreover, genome annotations are frequently taken at face value by those using them, and any errors present are propagated into downstream inferences and analyses. Although genome annotations underpin much of comparative genomic research, little attention has been paid to quantifying the extent of their inaccuracies, and the majority of attempts to systematically rectify such errors have relied either on manual input, which is impractical on a large scale, or on universally conserved gene sets, which account for only a small percentage of genes. The aim of the research described in this thesis is to provide methods to assess and rectify two main classes of genome annotation errors (missing genes and incorrect gene models) at a phylogenetically local level, by mutually improving genome annotations for sets of related species, in the absence of extrinsic experimental data. I introduce several non-extrinsic metrics for assessing the completeness of a genome annotation and the accuracy of the gene models it contains, and provide two self-contained methods that improve genome annotation accuracy. In summary, this thesis reveals that genome annotation errors are widespread, even in widely studied, community-annotated genomes, and that many of these errors can be identified and corrected using automated, phylogenetically local approaches.
Supervisor: Kelly, Steve
Sponsor: Engineering and Physical Sciences Research Council
Qualification Name: Thesis (Ph.D.)
Qualification Level: Doctoral
EThOS ID:
DOI: Not available
Keywords: Genomics ; Genetics ; Bioinformatics