Use this URL to cite or link to this record in EThOS: https://ethos.bl.uk/OrderDetails.do?uin=uk.bl.ethos.773697
Title: Machine learning for the exploitation of high throughput omics data : a case study on identifying circadian disruption from human blood transcriptomic data
Author: Alganmi, Nofe
ISNI: 0000 0004 7960 9424
Awarding Body: University of Surrey
Current Institution: University of Surrey
Date of Award: 2019
Availability of Full Text:
Access from EThOS:
Full text unavailable from EThOS. Restricted access.
Access from Institution:
Abstract:
The DNA microarray is a high-throughput technology that can measure the expression levels of thousands of genes simultaneously. However, the resulting data pose many challenges. One of the main challenges is the curse of dimensionality, which makes it difficult to learn without overfitting. We therefore proposed an unsupervised nonlinear machine learning framework to explore circadian rhythmic features as a case study. An auto-encoder can automatically learn features of the microarray data and reveal knowledge that may help in modelling the complex relations between the features of a circadian disorder in the future. Features derived from unsupervised algorithms can serve as input features to supervised learning, be used to build discriminative markers, and be used directly as functional modules. The constructed features are typically a compressed representation of the input data in a lower dimension. They retain the essential information in the input but are better organized than the input, with less noise and fewer artifacts. It is therefore easier to build classifiers on the summarized features than on the raw input data, and the success of a classifier depends heavily on the choice of data representation. We demonstrated this finding using a machine learning classification framework. With our representation, we improved the accuracy of a simple linear SVM from 63% to 75%. We also proposed a novel machine learning approach to evaluating circadian disruption using robust regression as a contextual anomaly detection method. The main novelty of this work comes from applying a point anomaly detection technique with respect to a circadian rhythmicity context. To the best of our knowledge, this work is the first to introduce the use of the NR1D1/NR1D2 clock genes as prior knowledge to detect gene pathways involved in the response to sleep disruption.
In the Circadian Disruption Detection (CDD) model, we implemented and validated a model that successfully models the normal samples. For anomalous samples, i.e. samples with a significant transcriptional effect under circadian disruption, the model performed poorly. Results of the analysis of variance (ANOVA) and t-test show the benefit of using the errors of our robust multi-regression as a biomarker to detect sleep deprivation from gene microarray data. We found a significant difference between the error distributions of the normal-sleep and anomalous samples at the p < 0.05 level. The model can be used to obtain a quantitative measure of sleep disruption in humans regardless of the time of day.
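The idea of using regression error as an anomaly score can be illustrated with a minimal sketch. It is not the thesis's implementation: the abstract does not name the robust estimator, so the Theil-Sen estimator is swapped in here as one simple robust regression, a single reference gene stands in for the NR1D1/NR1D2 prior, and all expression values are synthetic. The function names `theil_sen_fit` and `residual_scores` are hypothetical.

```python
from statistics import median


def theil_sen_fit(x, y):
    """Fit y ~ slope * x + intercept with the Theil-Sen estimator.

    The slope is the median of all pairwise slopes, so a few outlying
    samples in the training data do not pull the fitted line towards them.
    """
    slopes = [
        (y[j] - y[i]) / (x[j] - x[i])
        for i in range(len(x))
        for j in range(i + 1, len(x))
        if x[j] != x[i]
    ]
    slope = median(slopes)
    intercept = median([yi - slope * xi for xi, yi in zip(x, y)])
    return slope, intercept


def residual_scores(model, x, y):
    """Absolute prediction error per sample; larger means more anomalous."""
    slope, intercept = model
    return [abs(yi - (slope * xi + intercept)) for xi, yi in zip(x, y)]


# Synthetic "normal sleep" training data (assumption, not thesis data):
# expression of a reference clock gene and of a target gene.
reference_normal = [0.0, 1.0, 2.0, 3.0, 4.0, 5.0]
target_normal = [1.0, 3.0, 5.0, 7.0, 9.0, 30.0]  # last sample is an outlier

model = theil_sen_fit(reference_normal, target_normal)

# Score two new synthetic samples: the first follows the normal relation,
# the second deviates from it and therefore gets a larger error.
scores = residual_scores(model, [1.0, 2.0], [3.0, 9.0])
print(model, scores)
```

Because the fit is robust, the outlying training sample barely affects the learned relation, so samples that break the relation stand out through their error. Comparing the error distributions of two groups of samples, as the abstract does with ANOVA and a t-test, would be a separate step on top of these scores.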
Supervisor: Tang, H. Lilian ; Laing, Emma
Sponsor: King Abdulaziz University Jeddah ; Saudi Arabia
Qualification Name: Thesis (Ph.D.)
Qualification Level: Doctoral
EThOS ID: uk.bl.ethos.773697
DOI: