Title: Statistical HLA type imputation from large and heterogeneous datasets
Author: Dilthey, Alexander Tilo
ISNI: 0000 0004 2738 4657
Awarding Body: University of Oxford
Current Institution: University of Oxford
Date of Award: 2012
Availability of Full Text: Full text unavailable from EThOS.
An individual's Human Leukocyte Antigen (HLA) type is an essential immunogenetic parameter. It influences susceptibility to a variety of autoimmune and infectious diseases and to certain types of cancer, as well as the likelihood of adverse drug reactions. I present and evaluate two models that determine HLA types statistically from SNP genotypes, for single-population and multi-population studies. Importantly, SNP genotypes are already available for many studies, so applying the statistical methods presented here incurs no extra cost beyond computing time.
HLA*IMP:01 is based on a parallelized and modified version of LDMhc (Leslie et al., 2008), which enables the processing of large reference panels and improves call rates. In a homogeneous single-population imputation scenario on a mainly British dataset, it achieves accuracies (posterior predictive values) and call rates of ≥88% at all classical HLA loci (HLA-A, HLA-B, HLA-C, HLA-DQA1, HLA-DQB1 and HLA-DRB1) at 4-digit HLA type resolution.
HLA*IMP:02 is specifically designed for multi-population, heterogeneous reference panels. It is based on a new algorithm for constructing haplotype graph models that takes into account uncertainty in haplotype estimates, allows for missing data and enables the inclusion of prior knowledge on linkage disequilibrium. It performs as well as HLA*IMP:01 on homogeneous panels and substantially outperforms it in more heterogeneous scenarios. In a cross-European validation experiment, even without a call threshold, HLA*IMP:02 achieves an average accuracy of 96% at 4-digit resolution (≥91% at every locus; the lowest value is observed at HLA-DRB1). HLA*IMP:02 can accurately predict structural variation (DRB paralogs), can to some extent detect errors in the reference panel, and is highly tolerant of missing data.
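As a hypothetical illustration of how accuracy (posterior predictive value) and call rate relate to a call threshold, the Python sketch below makes several assumptions. Each imputed allele carries a posterior probability. A call is made only when that probability reaches the threshold. Called alleles are scored against reference typing after both are truncated to 4-digit (two-field) resolution. The function names, the per-allele scoring and the colon-delimited allele nomenclature are assumptions, not the thesis's evaluation code. A threshold of 0 corresponds to calling every allele.

```python
from typing import Optional, Sequence, Tuple


def to_four_digit(allele: str) -> str:
    """Truncate an HLA allele name to 4-digit (two-field) resolution,
    e.g. 'A*02:01:01:01' -> 'A*02:01'. Assumes colon-delimited fields."""
    gene, _, fields = allele.partition("*")
    return gene + "*" + ":".join(fields.split(":")[:2])


def accuracy_and_call_rate(
    calls: Sequence[Tuple[str, float]],
    truth: Sequence[str],
    threshold: float = 0.0,
) -> Tuple[Optional[float], float]:
    """calls: (imputed allele, posterior probability) for each allele;
    truth: reference-typed alleles in the same order.
    Returns (accuracy among called alleles, fraction of alleles called)."""
    # Only alleles whose posterior reaches the threshold are called.
    called = [(a, t) for (a, p), t in zip(calls, truth) if p >= threshold]
    call_rate = len(called) / len(truth)
    if not called:
        return None, call_rate  # accuracy is undefined when nothing is called
    # Accuracy = fraction of made calls that agree at 4-digit resolution.
    correct = sum(to_four_digit(a) == to_four_digit(t) for a, t in called)
    return correct / len(called), call_rate


# Toy data (hypothetical): three confident calls and one uncertain call.
calls = [("A*02:01", 0.98), ("A*01:01", 0.95), ("A*24:02", 0.60), ("A*03:01", 0.91)]
truth = ["A*02:01:01:01", "A*01:01:01", "A*24:03", "A*03:01:01"]
print(accuracy_and_call_rate(calls, truth, 0.0))  # (0.75, 1.0)
print(accuracy_and_call_rate(calls, truth, 0.9))  # (1.0, 0.75)
```

In this toy example, raising the threshold from 0 to 0.9 withholds the single low-confidence (and wrong) call. That trades call rate (1.0 down to 0.75) for accuracy (0.75 up to 1.0).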
I demonstrate that two factors are essential determinants of high imputation accuracy under HLA*IMP:02. The first is a close match between the imputation and reference panels in terms of principal components. The second is the size of the reference panel.
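The abstract does not say how panel similarity was quantified. The Python sketch below is a hypothetical illustration of a match "in terms of principal components". It pools the genotypes (coded as 0/1/2 allele counts per SNP) of a reference panel and an imputation panel, and it finds the first principal component of the pooled data by power iteration. It then reports the gap between the two panels' mean PC1 scores. The names `first_pc` and `pc1_mismatch`, the use of a single component and the toy genotypes are all assumptions. A practical analysis would typically use several components and a linear-algebra library.

```python
import math
import random
from typing import List, Tuple

Genotypes = List[List[float]]  # rows = individuals, columns = SNPs (0/1/2)


def first_pc(rows: Genotypes, iters: int = 200, seed: int = 0) -> Tuple[List[float], List[float]]:
    """Return the column means and the leading principal axis of `rows`,
    found by power iteration on the (unscaled) covariance matrix X^T X."""
    n, m = len(rows), len(rows[0])
    means = [sum(r[j] for r in rows) / n for j in range(m)]
    x = [[r[j] - means[j] for j in range(m)] for r in rows]  # centred data
    rng = random.Random(seed)
    v = [rng.gauss(0.0, 1.0) for _ in range(m)]
    for _ in range(iters):
        xv = [sum(a * b for a, b in zip(row, v)) for row in x]          # X v
        w = [sum(x[i][j] * xv[i] for i in range(n)) for j in range(m)]  # X^T X v
        norm = math.sqrt(sum(c * c for c in w))
        if norm == 0.0:  # no variation at all: every score will be zero
            break
        v = [c / norm for c in w]
    return means, v


def pc1_mismatch(reference: Genotypes, target: Genotypes) -> float:
    """Absolute gap between the panels' mean PC1 scores, with the PCA
    fitted on both panels pooled (hypothetical mismatch measure)."""
    means, v = first_pc(reference + target)

    def mean_score(panel: Genotypes) -> float:
        return sum(
            sum((r[j] - means[j]) * v[j] for j in range(len(v))) for r in panel
        ) / len(panel)

    return abs(mean_score(reference) - mean_score(target))


# Toy panels (hypothetical): `close` resembles `ref`, `far` does not.
ref = [[0, 0, 1, 2], [0, 1, 1, 2], [1, 0, 1, 2], [0, 0, 2, 2]]
close = [[0, 1, 1, 2], [1, 0, 1, 2], [0, 0, 1, 1]]
far = [[2, 2, 0, 0], [2, 1, 0, 0], [2, 2, 1, 0]]
print(pc1_mismatch(ref, close) < pc1_mismatch(ref, far))  # True
```

The PCA is fitted on the pooled panels so that both are projected onto the same axis. Under the finding above, a larger gap would flag an imputation panel for which lower accuracy might be expected.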
Supervisor: McVean, Gil
Sponsor: Not available
Qualification Name: Thesis (Ph.D.)
Qualification Level: Doctoral
EThOS ID:
DOI: Not available
Keywords: Genetics (life sciences) ; Bioinformatics (life sciences) ; Immunodiagnostics ; Immunology ; Mathematical genetics and bioinformatics (statistics) ; Statistics (see also social sciences) ; Human Leukocyte Antigen ; major histocompatibility complex ; imputation ; prediction ; autoimmune ; immunology ; graph ; population genetics