Title: A general-purpose model for emulating expression-based high-throughput assays
Author: Abdalla, Moustafa
ISNI:       0000 0004 8507 0541
Awarding Body: University of Oxford
Current Institution: University of Oxford
Date of Award: 2019
High-throughput assays (HTAs) are the gold standard for scientific experimentation, providing starting points for drug design, for understanding the interactions between genes in biochemical cascades, and for identifying the role of genetic variants in the broader context of cellular processes. Here, I introduce a widely applicable and practically usable deep neural architecture that can recapitulate expression-based HTAs. Trained solely on promoter elements, the model can learn to emulate the transcriptional machinery of a given cell type or tissue, allowing us to construct in silico DNA reporter assays, predict the transcriptomic impact of small molecules in cell lines, and infer regulatory networks. By leveraging the tissue-specific properties of the model, I further show how model-derived impact scores can pinpoint functional tissues underlying complex traits, outperforming methods that depend on colocalization of eQTL and GWAS signals. I subsequently demonstrate the utility of this model in investigating the role of DNA and histone modifications in difficult-to-study processes, such as neural induction. Using an extension of the same architecture, I show how the framework outperforms linear models in predicting the consequences of genotype variation. I use the framework to identify putatively functional eQTLs that are missed by experimental high-throughput approaches that characterise variant function. By emulating the transcriptional machinery of human tissues and cells, this work represents one of the first attempts to recapitulate expression-based HTAs using a single in silico framework. I also introduce a new analytic strategy that leverages coexpression differential dependence networks (i.e. correlation perturbation between gene clusters) to model the effects of genetic variation in a tissue-dependent manner and, subsequently, to discover new loci for type 2 diabetes-related traits.
I demonstrate that the newly discovered risk loci are driven by tissue-specific differential dependence network annotations and, as validation, highlight how they have been confirmed in (larger) genetic studies of the same or closely related traits and/or partially verified experimentally. My results are a first step towards my long-term goal of developing a general-purpose model that can learn to predict expression under any set of arbitrary constraints (e.g. for a tissue of interest, given a particular genotype, or at a fixed time after drug exposure), and they demonstrate the utility of such strategies in decoding GWAS signals.
Supervisor: McCarthy, Mark I.; Holmes, Chris C.
Sponsor: Not available
Qualification Name: Thesis (Ph.D.)
Qualification Level: Doctoral
EThOS ID:
DOI: Not available