Title: Reference-free identification of genetic variation in metagenomic sequence data using a probabilistic model
Author: Ahiska, Bartu
ISNI: 0000 0004 2723 6056
Awarding Body: University of Oxford
Current Institution: University of Oxford
Date of Award: 2012
Availability of Full Text:
Full text unavailable from EThOS.
Please contact the current institution’s library for further details.
Microorganisms are an indispensable part of our ecosystem, yet the natural metabolic and ecological diversity of these organisms is poorly understood because microbiology has historically relied on laboratory-grown cultures. The recognition that this diversity cannot be studied through laboratory isolation, together with recent advances in low-cost, scalable sequencing technology, has enabled the foundation of culture-independent microbiology, or metagenomics. The study of environmental microbial samples with metagenomics has led to many advances, but a number of technological and methodological challenges remain. A potentially diverse set of taxa may be represented in any one environmental sample, and existing tools for representing the genetic composition of such samples sequenced with short-read data, and for identifying variation amongst them, are still in their infancy. This thesis makes the case that a new framework based on a joint-genome graph can constitute a powerful tool for representing and manipulating the joint genomes of population samples. I present the development of a collection of methods, called SCRAPS, that construct these efficient graphs for small communities without requiring, or being biased by, a reference genome. A key novelty is that genetic variation is identified from the data structure using a probabilistic algorithm that can provide a measure of confidence in each call. SCRAPS is first tested for accuracy and efficiency on simulated short-read data. At least 95% of non-repetitive small-scale genetic variation with a minor-allele read depth greater than 10x is correctly identified, and the number of false positives per conserved nucleotide is consistently better than 1 part in 333 × 10^3 (roughly 3.0 × 10^-6). SCRAPS is then applied to artificially pooled experimental datasets. As part of this study, SCRAPS is used to identify genetic variation in an 11-sample epidemiological Neisseria meningitidis dataset collected from the African meningitis belt.
In total, 14,000 sites of genetic variation are identified from 48 million Illumina/Solexa reads. The results clearly show the genetic differences between two waves of infection that have plagued northern Ghana and Burkina Faso.
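The abstract does not spell out how SCRAPS builds its joint-genome graph or scores a call, so what follows is only a minimal sketch under stated assumptions, not the thesis's method. It assumes a de Bruijn graph of k-mers pooled from all samples (a common reference-free structure, not necessarily the one SCRAPS uses). It treats a simple "bubble" (two branches that leave one node and reconverge) as a candidate small variant, and attaches a confidence using a toy Bayes factor between a sequencing-error model and a variant model. The function names (`kmer_graph`, `walk`, `find_bubbles`, `variant_posterior`), the parameters `error_rate` and `prior`, and the minimum-k-mer-count support rule are all hypothetical.

```python
import math
from collections import Counter, defaultdict


def kmer_graph(reads, k):
    """Pool k-mers from all reads into one de Bruijn graph: each k-mer is an
    edge from its (k-1)-mer prefix node to its (k-1)-mer suffix node."""
    counts = Counter()
    for read in reads:
        for i in range(len(read) - k + 1):
            counts[read[i:i + k]] += 1
    succ = defaultdict(set)
    for kmer in counts:
        succ[kmer[:-1]].add(kmer[1:])
    return counts, succ


def walk(start, succ, max_steps):
    """Follow successors from `start` while each node has exactly one."""
    path, node = [start], start
    for _ in range(max_steps):
        nxt = succ.get(node, set())
        if len(nxt) != 1:
            break
        node = next(iter(nxt))
        path.append(node)
    return path


def find_bubbles(counts, succ, k):
    """Return simple two-branch bubbles as [(sequence, support), ...] pairs.
    Support is the minimum k-mer count along a branch (hypothetical rule)."""
    bubbles = []
    for node, nexts in succ.items():
        if len(nexts) != 2:
            continue
        path_a, path_b = (walk(n, succ, 2 * k) for n in sorted(nexts))
        in_b = set(path_b)
        common = [n for n in path_a if n in in_b]
        if not common:
            continue  # branches never reconverge: not a simple bubble
        merge = common[0]
        bubble = []
        for path in (path_a, path_b):
            nodes = [node] + path[:path.index(merge) + 1]
            support = min(counts[x + y[-1]] for x, y in zip(nodes, nodes[1:]))
            seq = nodes[0] + "".join(n[-1] for n in nodes[1:])
            bubble.append((seq, support))
        bubbles.append(bubble)
    return bubbles


def variant_posterior(minor, total, error_rate=0.01, prior=0.001):
    """Toy Bayesian score: P(minor branch is a real allele | counts).
    Error model:   minor ~ Binomial(total, error_rate).
    Variant model: minor ~ Binomial(total, f) with f ~ Uniform(0, 1).
    The binomial coefficient is shared by both models and cancels."""
    log_var = (math.lgamma(minor + 1) + math.lgamma(total - minor + 1)
               - math.lgamma(total + 2))
    log_err = minor * math.log(error_rate) + (total - minor) * math.log(1 - error_rate)
    log_odds = math.log(prior / (1 - prior)) + log_var - log_err
    if log_odds >= 0:  # numerically stable logistic function
        return 1 / (1 + math.exp(-log_odds))
    z = math.exp(log_odds)
    return z / (1 + z)


ref = "ATGCGTACCTAGGATC"
alt = "ATGCGTACATAGGATC"  # single C->A substitution at position 8
counts, succ = kmer_graph([ref] * 20 + [alt] * 8, k=5)
for (seq_a, n_a), (seq_b, n_b) in find_bubbles(counts, succ, k=5):
    minor = min(n_a, n_b)
    print(seq_a, n_a, seq_b, n_b, round(variant_posterior(minor, n_a + n_b), 4))
```

In this toy example every read spans the whole bubble, so k-mer counts stand in for read depth. Real metagenomic data would also have to deal with partial read overlap, repeats, nested or complex bubbles, and many samples at once.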
Supervisor: McVean, Gilean
Sponsor: Not available
Qualification Name: Thesis (Ph.D.)
Qualification Level: Doctoral
EThOS ID:
DOI: Not available
Keywords: Population genetics--Statistical methods ; Human genetics--Variation--Statistics ; Metagenomics