Title: Sequencing in isolation : next-generation sequencing studies in founder populations
Author: Gilly, Arthur Leonard
ISNI: 0000 0004 7968 386X
Awarding Body: University of Cambridge
Current Institution: University of Cambridge
Date of Award: 2019
Availability of Full Text: Full text unavailable from EThOS.
Although common variants are routinely assayed in populations, rare mutations and copy-number variants are understudied contributors to the aetiology of complex traits. Isolated populations promise increased power to detect associations with rare and low-frequency variants that have drifted up in frequency as a result of founder events and geographical isolation. Population-specific imputation reference panels and very low-depth whole-genome sequencing have been proposed as ways to boost power in next-generation association studies while keeping sequencing costs low. The aim of this work is to leverage the wealth of sequencing data generated as part of the HELIC project to study the allelic architecture of complex phenotypes and to identify sequence variants associated with traits of medical relevance. We develop METACARPA, a method for meta-analysing summary statistics from genome-wide association studies. We establish a robust pipeline for the imputation and refinement of 1x whole-genome sequencing data, as well as a quality control and association pipeline for cohort-wide high-depth sequencing. We examine variant selection and weighting methods for genome-wide burden testing of rare variants, and develop several tools for visualising single-point and aggregated association results. Finally, we develop UN-CNVc, a fast copy-number variant caller optimised for population-wide sequencing data. Applying METACARPA to a 4-way multi-array, multi-cohort analysis of the HELIC array data identified, among other signals, two lipid-associated loci, one of which is marked by the cardioprotective low-frequency variant rs145556679. In our cohorts, 1x sequencing data gave access to more than 100,000 low-frequency variants not captured by an imputed genotyping-array design, and allowed us to replicate a burden of low-frequency and rare cardioprotective variants in the APOC3 gene.
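As a generic illustration of what meta-analysing summary statistics involves, the sketch below implements the standard fixed-effect, inverse-variance-weighted combination of per-study effect sizes: each study's estimate is weighted by 1/SE², the pooled standard error is the square root of the inverse of the summed weights, and a two-sided p-value follows from the normal distribution. This is not METACARPA's algorithm; the function name `ivw_meta` is hypothetical, and the sketch assumes the studies share no samples, an assumption that a multi-array analysis of overlapping cohorts may not satisfy.

```python
import math
from typing import Sequence, Tuple


def ivw_meta(betas: Sequence[float], ses: Sequence[float]) -> Tuple[float, float, float]:
    """Fixed-effect inverse-variance-weighted meta-analysis (hypothetical helper).

    Assumes independent studies, i.e. no sample overlap or relatedness
    between them. Returns (pooled_beta, pooled_se, two_sided_p).
    """
    if not betas or len(betas) != len(ses):
        raise ValueError("need non-empty, equal-length betas and ses")
    # Each study is weighted by the inverse of its sampling variance.
    weights = [1.0 / se ** 2 for se in ses]
    w_sum = sum(weights)
    beta = sum(w * b for w, b in zip(weights, betas)) / w_sum
    se = math.sqrt(1.0 / w_sum)
    # Two-sided p-value: 2 * (1 - Phi(|z|)) == erfc(|z| / sqrt(2)).
    p = math.erfc(abs(beta / se) / math.sqrt(2.0))
    return beta, se, p
```

For two studies reporting effects of 0.10 and 0.20 with equal standard errors of 0.05, the pooled estimate is 0.15 with a standard error of about 0.035.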
We discover burdens of rare regulatory and coding variants that are independent of known common-variant associations at established loci, such as in the ADIPOQ gene for adiponectin and the GGT1 gene for gamma-glutamyltransferase, as well as novel associations driven entirely by rare variants, such as that between the FAM189B gene and triglycerides. Using UN-CNVc, we also describe two complex gene deletions that influence serum levels of the proteins encoded by the affected genes.
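Variant selection and weighting for a gene-based burden test can be illustrated with one common scheme, sketched here under stated assumptions: variants are kept if their minor-allele frequency (MAF) is below a threshold, each is weighted by the Beta(1, 25) density of its MAF (the default weighting in SKAT, a swapped-in choice rather than necessarily one examined in this work), and each individual's weighted sum of minor alleles is regressed on a quantitative trait by ordinary least squares without covariates. The function names `burden_test` and `beta_1_25_weight` and the 0.05 default threshold are hypothetical; a real analysis would add covariates and, in a founder population, account for relatedness, for example with a mixed model.

```python
import math
from typing import List, Sequence, Tuple


def beta_1_25_weight(maf: float) -> float:
    """Beta(1, 25) density at the MAF (SKAT's default); rarer variants weigh more."""
    return 25.0 * (1.0 - maf) ** 24


def burden_test(
    genotypes: Sequence[Sequence[int]],
    trait: Sequence[float],
    maf_threshold: float = 0.05,  # hypothetical default
) -> Tuple[float, float, float, int]:
    """Weighted burden test of a quantitative trait over one gene or region.

    genotypes[i][j] is the alternate-allele count (0, 1 or 2) of individual i
    at variant j. Returns (beta, se, p, number_of_variants_used).
    """
    n = len(genotypes)
    if n < 3 or n != len(trait):
        raise ValueError("need at least 3 individuals with matching trait values")
    m = len(genotypes[0])

    # Variant selection: keep polymorphic variants with MAF below the threshold,
    # noting when the alternate allele is the major one so minor alleles are counted.
    kept: List[Tuple[int, float, bool]] = []
    for j in range(m):
        af = sum(g[j] for g in genotypes) / (2.0 * n)
        maf = min(af, 1.0 - af)
        if 0.0 < maf < maf_threshold:
            kept.append((j, beta_1_25_weight(maf), af > 0.5))
    if not kept:
        raise ValueError("no variants pass the selection filter")

    # Per-individual burden score: weighted sum of minor-allele counts.
    scores = [
        sum(w * ((2 - g[j]) if flip else g[j]) for j, w, flip in kept)
        for g in genotypes
    ]

    # Ordinary least-squares regression of the trait on the score (no covariates).
    mx = sum(scores) / n
    my = sum(trait) / n
    sxx = sum((x - mx) ** 2 for x in scores)
    if sxx == 0.0:
        raise ValueError("burden score is constant across individuals")
    sxy = sum((x - mx) * (y - my) for x, y in zip(scores, trait))
    beta = sxy / sxx
    rss = sum((y - my - beta * (x - mx)) ** 2 for x, y in zip(scores, trait))
    se = math.sqrt(rss / (n - 2) / sxx)
    if se == 0.0:
        return beta, se, 0.0, len(kept)  # perfect fit
    # Two-sided p-value from a normal approximation to the t statistic.
    p = math.erfc(abs(beta / se) / math.sqrt(2.0))
    return beta, se, p, len(kept)
```

Annotation-based selection, such as restricting the test to regulatory or coding variants, would replace or supplement the MAF filter in the selection loop.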
Supervisor: Wood, Angela
Sponsor: Not available
Qualification Name: Thesis (Ph.D.)
Qualification Level: Doctoral
Keywords: whole-genome sequencing ; genetic ; association study ; population isolate ; structural variant