Use this URL to cite or link to this record in EThOS:
Title: Computational approaches for metagenomic analysis of high-throughput sequencing data
Author: Ainsworth, David
ISNI:       0000 0004 6059 279X
Awarding Body: Imperial College London
Current Institution: Imperial College London
Date of Award: 2017
Availability of Full Text:
Access from EThOS:
Access from Institution:
High-throughput DNA sequencing has revolutionised microbiology and is the foundation on which the nascent field of metagenomics has been built. This ability to cheaply sample billions of DNA reads directly from environments has democratised sequencing and allowed researchers to gain unprecedented insights into diverse microbial communities. These technologies however are not without their limitations: the short length of the reads requires the production of vast amounts of data to ensure all information is captured. This 'data deluge' has been a major bottleneck and has necessitated the development of new algorithms for analysis. Sequence alignment methods provide the most information about the composition of a sample as they allow both taxonomic and functional classification but algorithms are prohibitively slow. This inefficiency has led to the reliance on faster algorithms which only produce simple taxonomic classification or abundance estimation, losing the valuable information given by full alignments against annotated genomes. This thesis will describe k-SLAM, a novel ultra-fast method for the alignment and taxonomic classification of metagenomic data. Using a k -mer based method k-SLAM achieves speeds three orders of magnitude faster than current alignment based approaches, allowing a full taxonomic classification and gene identification to be tractable on modern large datasets. The alignments found by k-SLAM can also be used to find variants and identify genes, along with their nearest taxonomic origins. A novel pseudo-assembly method produces more specific taxonomic classifications on species which have high sequence identity within their genus. This provides a significant (up to 40%) increase in accuracy on these species. Also described is a re-analysis of a Shiga-toxin producing E. coli O104:H4 isolate via alignment against bacterial and viral species to find antibiotic resistance and toxin producing genes. k-SLAM has been used by a range of research projects including FLORINASH and is currently being used by a number of groups.
Supervisor: Sternberg, Michael ; Butcher, Sarah ; Knottenbelt, William Sponsor: Biotechnology and Biological Sciences Research Council ; Illumina Inc.
Qualification Name: Thesis (Ph.D.) Qualification Level: Doctoral