Title: Efficient analysis of microbial whole-genome sequence data using de Bruijn graphs
Author: Bradley, Phelim
ISNI: 0000 0004 7652 5292
Awarding Body: University of Oxford
Current Institution: University of Oxford
Date of Award: 2017
Availability of Full Text: Full text unavailable from EThOS.
Antimicrobial resistance (AMR) is a persistent and growing threat to global health. Whole-genome sequencing (WGS) has the potential to dramatically improve our ability to detect, understand, and monitor AMR. However, microbial diversity and complexity mean that analysing and interpreting microbial genomes is challenging. In this thesis, I explore applications of de Bruijn graphs (DBGs) to the analysis of these data. First, I present a tool, Mykrobe predictor, that uses DBGs to rapidly identify species and AMR from WGS data. I show that it is accurate, flexible, and efficient. Next, I explore an extension of Mykrobe predictor to long-read sequencing of direct clinical samples of M. tuberculosis. In doing so, I show that one could reduce the turn-around time for susceptibility testing of an M. tuberculosis isolate from 2 weeks to 12 hours. Finally, I explore the challenges of DNA search in very large collections (millions) of microbial data sets. In particular, I address the super-linear scaling of existing k-mer indexing tools and present a novel representation and implementation of a probabilistic coloured de Bruijn graph, the "Coloured Bloom Graph" (CBG). I demonstrate its scalability by building a CBG of all publicly accessible microbial WGS data (almost half a million samples) and use it to run millisecond searches in these data.
Supervisor: Iqbal, Zamin; McVean, Gil
Sponsor: Wellcome Trust
Qualification Name: Thesis (Ph.D.)
Qualification Level: Doctoral
EThOS ID:
DOI: Not available