Title: Estimating telomere length from whole genome sequencing data
Author: Farmery, James Henry Royston
ISNI: 0000 0004 7229 8133
Awarding Body: University of Cambridge
Current Institution: University of Cambridge
Date of Award: 2018
Availability of Full Text: Full text unavailable from EThOS.
This thesis details the development of two computational tools, Telomerecat and Parabam, and their application to whole genome sequencing (WGS) data. Telomerecat is a tool for estimating telomere length from WGS data. Its strength lies in its broad applicability, which stems from a number of advantages over previous attempts to estimate telomere length from WGS. Chief amongst these is that it makes no assumption about the underlying chromosome count or genome size of input samples. Telomerecat therefore lends itself well to analysing cancer samples, where such assumptions are unfounded. It is also applicable to non-human samples, a first for tools of its kind. Furthermore, a novel method for filtering reads derived from interstitial telomere sequences means that it does not rely on previously applied analyses, a source of bias. The other tool described in this thesis is Parabam. Parabam is the first tool of its kind to allow users to apply a function to all of the reads in sequence alignment files, in parallel. Parabam also includes a novel method for iterating over index-sorted sequence files as if they were name-sorted. We provide evidence that Parabam is a quicker way to create complex subsets and statistics from sequence alignment files. In the latter half of the thesis we detail two applications of Telomerecat to large-scale WGS projects. The first application, to the Prostate ICGC UK cohort, reveals hitherto undiscovered associations between telomere length and previously identified molecular subtypes, as well as cancer stage. In the second application, to the NIHR BioResource - Rare Disease cohort, we discover a previously unidentified variant in DKC1 that we propose is directly linked to short telomeres and an immunodeficient phenotype.
Supervisor: Lynch, Andy Graeme
Sponsor: Cancer Research UK
Qualification Name: Thesis (Ph.D.)
Qualification Level: Doctoral
Keywords: telomere; telomerecat; parabam; prostate; cancer; rare blood disease; dyskeratosis congenita