Title: Deep learning uncovers genomic features of cell-type and state
Author: Phuycharoen, Mike
ISNI: 0000 0004 8504 9558
Awarding Body: University of Manchester
Current Institution: University of Manchester
Date of Award: 2020
Genomic and epigenomic data are being obtained experimentally at an ever-increasing rate. As datasets become easier and cheaper to collect, computational methods for their interpretation and integration gain in importance. This thesis addresses the problem of using omic data to identify functional elements in DNA sequences with machine learning. In particular, convolutional neural networks are used to identify binding sites of transcription factor proteins (TFs), as well as features of chromatin accessibility, in a set of mouse and human cell types. Deep learning attribution methods can provide explanations for model predictions, and the performance of different approaches is evaluated. Two main systems are analysed. First, the problem of differential and cooperative TF binding is illustrated in mouse branchial arch tissues, where MEIS TFs are known to co-bind with HOX to regulate tissue-specific developmental programmes. It is shown that deep neural networks outperform other commonly used computational methods in predicting binding of HOXA2 from differential MEIS data. Novel applications for indirect regularisation with data are introduced, allowing classification of small datasets. Second, a short time series of chromatin accessibility is modelled after immune stimulation in human CD4+ T cells. Sequence features characteristic of different dynamic trajectories are identified. An unsupervised approach is introduced for obtaining differential features without a priori class specification, along with a semi-supervised method for removing replicate bias from the differential metric. The methods are also applied to two further mouse systems: MEF2D binding across three tissues, and OCT4 binding in embryonic stem cells. The deep learning models presented in this work show substantial improvements over k-mer counting and SVMs, and provide important motivation for further development of machine learning methods for genomic analysis.
Supervisor: Chen, Ke ; Rattray, Magnus
Sponsor: Not available
Qualification Name: Thesis (Ph.D.)
Qualification Level: Doctoral
EThOS ID:
DOI: Not available
Keywords: ATAC-seq ; ChIP-seq ; OCT4 ; MEF2D ; CNN ; MEIS ; HOX ; Deep learning