Title: Analysis of complex genetic variation using population reference graphs
Author: Maciuca, Sorina
ISNI: 0000 0004 8507 7970
Awarding Body: University of Oxford
Current Institution: University of Oxford
Date of Award: 2018
Availability of Full Text: Full text unavailable from EThOS; access is through the awarding institution.
Abstract: In this thesis, I study the problem of genome inference from short-read DNA sequencing data, with the goal of accurately characterising genomic regions with high sequence diversity. I describe a set of novel algorithms based on a generalised reference genome that captures sequence variation within a species. In Chapter 3, I propose a novel data structure that extends the traditional reference genome with known variants, providing a compressed representation of genetic diversity. I present algorithms to match sequencing reads to this extended reference structure and infer a personalised reference genome at close genetic distance to the sample under analysis. Coupled with existing variant calling tools, this personalised reference confers increased power to detect complex variants in diverse regions, compared to the traditional reference genome. In Chapter 4, I evaluate the performance of the method on simulated data and show that it is viable for megabase-sized genomes such as that of the malaria parasite Plasmodium falciparum: a typical sample can be analysed in 5.7 hours on a single CPU, using a small amount of memory. I suggest a number of future optimisations to improve computational efficiency. In Chapter 5, I apply my method to 1300 Plasmodium falciparum samples from across the world and study two hyper-diverse genes that encode surface antigens. I show that the personalised references recover variants of these genes that are missed by the standard technique of mapping reads to the traditional reference genome. Finally, I build the first global variation catalogue incorporating dimorphic alleles of a region of functional interest and study their frequency patterns.
Supervisor: Iqbal, Zamin; McVean, Gil
Sponsor: Wellcome Trust
Qualification Name: Thesis (Ph.D.)
Qualification Level: Doctoral
EThOS ID:
DOI: Not available