Title: Computational methods for the analysis of next generation viral sequences
Author: Lamzin, Sergey
ISNI: 0000 0004 5916 6064
Awarding Body: University of East Anglia
Current Institution: University of East Anglia
Date of Award: 2016
Recent advances in sequencing technologies have brought renewed impetus to the development of the bioinformatics tools needed for sequence processing and analysis. Alongside the constant requirement to assemble more complex genomes from ever-evolving sequencing experiments and technologies, there is also a lack of visually accessible representations of the information generated by analysis tools. Most novel algorithms, specifically those for de novo genome assembly of next generation sequencing (NGS) data, cannot efficiently handle data generated from large populations. We have assessed the common methods for genome assembly in use today, both from a theoretical point of view and in their practical implementations. In this dissertation we present StarK (which stands for k*), a novel assembly algorithm with a new data structure designed to overcome some of the limitations that we observed in established methods, enabling higher-quality NGS data processing. The StarK approach structurally combines de Bruijn graphs for all possible dimensions in one supergraph. Although the technique for joining reads remains the same in concept, the dimension k is no longer fixed. StarK is designed so that the assembler can dynamically adjust the de Bruijn graph dimension k on the fly, at any given nucleotide position, without losing connections between graph vertices or performing complicated calculations. The new graph uses localised coverage difference evaluation to create connected subgraphs, which allows higher resolution of genomic differences and helps differentiate errors from potential variants within the sequencing sample. In addition, we present a bioinformatics analysis pipeline for high-variation viral population analysis (including transmission studies), which, using both new and established methods, creates easily interpretable visual representations of the underlying data analysis.
Together, these contributions provide biologists with a solid framework for extracting more information from sequencing data, faster and with less effort than before.
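As a rough illustration of the idea of holding de Bruijn graphs for several dimensions k side by side, the following minimal Python sketch builds one ordinary fixed-k de Bruijn graph per requested k and records edge coverage. It is not the StarK data structure or algorithm, whose details are not given in this record; the function names (de_bruijn_edges, multi_k_graphs) and the toy reads are hypothetical and chosen only for illustration.

```python
from collections import defaultdict


def de_bruijn_edges(reads, k):
    """Build a fixed-k de Bruijn graph from reads.

    Nodes are (k-1)-mers; each k-mer contributes one edge from its
    prefix to its suffix. The stored integer is the edge coverage,
    i.e. how many times that k-mer was seen across all reads.
    """
    graph = defaultdict(lambda: defaultdict(int))
    for read in reads:
        for i in range(len(read) - k + 1):
            kmer = read[i:i + k]
            graph[kmer[:-1]][kmer[1:]] += 1
    # Convert to plain dicts so the result is easy to inspect and compare.
    return {node: dict(successors) for node, successors in graph.items()}


def multi_k_graphs(reads, k_values):
    """Hypothetical helper: one independent de Bruijn graph per k.

    Assumption: this simply keeps the graphs next to each other in a
    dict keyed by k. It does not link vertices across dimensions the
    way a combined supergraph would.
    """
    return {k: de_bruijn_edges(reads, k) for k in k_values}


# Toy, made-up reads that overlap by five bases.
reads = ["ACGTAC", "CGTACG"]
graphs = multi_k_graphs(reads, [3, 4])

for k, graph in sorted(graphs.items()):
    print(f"k={k}")
    for node, successors in sorted(graph.items()):
        print(f"  {node} -> {successors}")
```

Comparing the coverage counts between the k=3 and k=4 graphs for the same reads gives a feel for why a smaller k keeps more connections while a larger k separates sequences more sharply, which is the trade-off a variable-k approach aims to avoid fixing in advance.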
Supervisor: Not available
Sponsor: Not available
Qualification Name: Thesis (Ph.D.)
Qualification Level: Doctoral
EThOS ID:
DOI: Not available