Title: Multivariate linear mixed models for statistical genetics
Author: Casale, Francesco Paolo
ISNI: 0000 0004 6425 5256
Awarding Body: University of Cambridge
Current Institution: University of Cambridge
Date of Award: 2016
In the last decade, genome-wide association studies have helped to advance our understanding of the genetic architecture of many important traits, including diseases. However, the statistical analysis of genotype-phenotype associations remains challenging for several reasons. First, many traits have polygenic architectures, which means that they are controlled by a large number of variants with small individual effects. Second, as increasingly deep phenotype data are being generated, there is a need for multivariate analysis approaches that leverage multiple related phenotypes while retaining computational efficiency. Third, genetic analyses are confronted by strong confounding factors that can create spurious associations when not properly accounted for in the statistical model. Here, we derive more flexible methods that integrate genetic effects across variants and across multiple quantitative traits. To do so, we build on the classical linear mixed model (LMM), a widely adopted framework for genetic studies. The first contribution of this thesis is mtSet, an efficient mixed-model approach that enables genome-wide association testing between sets of genetic variants and multiple traits while accounting for confounding factors. In both simulations and real-data applications, we demonstrate that mtSet effectively combines the advantages of variant-set and multi-trait analyses. Next, we present a new model for gene-context interactions that builds on mtSet. The proposed interaction set test (iSet) yields increased statistical power for detecting polygenic interactions. Additionally, iSet enables the identification of genetic loci that are associated with different configurations of causal variants across contexts. After benchmarking the proposed method using simulated data, we consider two applications to real datasets, where we investigate genetic effects on gene expression across different cellular contexts and sex-specific genetic effects on lipid levels.
Finally, we describe LIMIX, a software framework for the flexible implementation of different LMMs. Most of the models considered in this thesis, including mtSet and iSet, are implemented and available in LIMIX. A unique aspect of the software is an inference framework that allows a large class of genetic models to be defined and, in many cases, to be efficiently fitted by exploiting specific algebraic properties. We demonstrate the utility of this software suite in two applied collaboration projects. Taken together, this thesis demonstrates the value of flexible and integrative modelling in genetics and contributes new statistical methods for genetic analysis. These approaches generalise previous models, yet retain the computational efficiency that is needed to tackle large genetic datasets.
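As an illustration of the classical LMM that the abstract builds on, and of the kind of algebraic shortcut that makes such models cheap to fit, the following is a minimal sketch for a single trait. It assumes the model y ~ N(X b, s2 (K + delta I)), where K is a genetic relatedness matrix. One eigendecomposition of K is reused so that each evaluation of the likelihood over delta costs only diagonal operations. The function name `lmm_profile_nll`, the toy data, and the grid search over delta are assumptions made for this example; this is not the LIMIX API, and it does not implement the multi-trait mtSet or iSet models.

```python
import numpy as np


def lmm_profile_nll(delta, y, X, S, U):
    """Negative log-likelihood of y ~ N(X b, s2 * (K + delta * I)).

    Hypothetical helper for illustration. K = U diag(S) U^T is the
    eigendecomposition of the relatedness matrix; b and s2 are replaced
    by their maximum-likelihood estimates for the given delta.
    """
    n = y.shape[0]
    Uy = U.T @ y                      # rotate the phenotype
    UX = U.T @ X                      # rotate the covariates
    d = S + delta                     # eigenvalues of K + delta * I
    XtDX = UX.T @ (UX / d[:, None])
    XtDy = UX.T @ (Uy / d)
    beta = np.linalg.solve(XtDX, XtDy)        # generalised least squares
    resid = Uy - UX @ beta
    s2 = np.sum(resid ** 2 / d) / n           # ML estimate of the scale
    nll = 0.5 * (n * np.log(2.0 * np.pi * s2) + np.sum(np.log(d)) + n)
    return nll, beta, s2


# Toy data (simulated, for illustration only).
rng = np.random.default_rng(0)
n, m = 50, 200
G = rng.standard_normal((n, m))               # stand-in genotype matrix
K = G @ G.T / m                               # relatedness matrix
X = np.column_stack([np.ones(n), rng.standard_normal(n)])
cov = 0.5 * K + 0.5 * np.eye(n)
y = X @ np.array([1.0, 0.5]) + rng.multivariate_normal(np.zeros(n), cov)

# Decompose K once, then scan delta on a grid.
S, U = np.linalg.eigh(K)
deltas = np.logspace(-3, 3, 61)
nlls = [lmm_profile_nll(dl, y, X, S, U)[0] for dl in deltas]
best_delta = deltas[int(np.argmin(nlls))]
```

The grid search keeps the sketch short; a one-dimensional numerical optimiser over delta would be the more usual choice in practice.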
Supervisor: Stegle, Oliver
Sponsor: EMBL-European Bioinformatics Institute
Qualification Name: Thesis (Ph.D.)
Qualification Level: Doctoral
Keywords: linear mixed model; statistical genetics; GWAS; multivariate