Title: Pan-genomic analysis of clonal bacterial samples using nanopore reads and genome graphs
Author: Colquhoun, Rachel
ISNI: 0000 0004 8506 9348
Awarding Body: University of Oxford
Current Institution: University of Oxford
Date of Award: 2019
Availability of Full Text: Full text unavailable from EThOS.
Bacterial genetic variation originates through multiple mechanisms, including mutations during replication, movement of mobile elements, and various forms of recombination. As a result, genomes can be highly divergent, with only a small fraction of genes core to all and a large pangenome of genes which have been identified in one or more sequenced samples. In this context, the ability to accurately detect genetic variation throughout the pangenome and compare many genomes remains a difficult problem. Here we present a novel pangenome reference graph structure, which represents the known genetic variation within a species as a collection of 'floating' graphs. Each of these represents some homologous region such as a cluster of genes. By approximating a sequenced genome as a mosaic of genomes from the reference panel, this design forms the basis for a systematic framework in which to analyse diverse sets of samples where a single reference would be inappropriate. Applying this method to E. coli, we demonstrate how it enables us to describe genetic variation at both a coarse (gene presence) and a fine (SNP/indel) level. We demonstrate how this enables us to successfully compare divergent genomes within a species, gaining dramatically higher sensitivity to SNP variation than single reference-genome approaches. We go on to demonstrate how this method enables us to investigate global genetic variation in K. pneumoniae, and to describe the spectrum of allele frequencies in accessory genes. The method works for either long Nanopore or short Illumina reads, and we hope it will provide the basis for addressing many questions in diverse datasets.
Supervisor: Iqbal, Zamin ; Crook, Derrick
Sponsor: Wellcome Trust
Qualification Name: Thesis (Ph.D.)
Qualification Level: Doctoral
EThOS ID:
DOI: Not available
Keywords: Genetics ; Bioinformatics