Use this URL to cite or link to this record in EThOS:
Title: Machine learning with limited information : risk stratification and predictive modelling for clinical applications
Author: Shaikhina, Torgyn
ISNI:       0000 0004 7224 2631
Awarding Body: University of Warwick
Current Institution: University of Warwick
Date of Award: 2017
Availability of Full Text:
Access from EThOS:
Access from Institution:
The high cost, complexity and multimodality of clinical data collection restrain the datasets available for predictive modelling using machine learning (ML), thus necessitating new data-efficient approaches specifically for limited datasets. This interdisciplinary thesis focuses on clinical outcome modelling using a range of ML techniques, including artificial neural networks (NNs) and their ensembles, decision trees (DTs) and random forests (RFs), as well as classical logistic regression (LR) and Cox proportional hazards (Cox PH) models. The utility of ML for data-efficient regression, classification and survival analyses was investigated in three clinical applications, whereby exposing the common limitations inherent in patient data, such as class imbalance, incomplete samples, and, in particular, limited dataset size. The latter problem was addressed by developing a methodological framework for learning from datasets with less than 10 observations per predictor variable. A novel method of multiple runs overcame the volatility of NN and DT models due to limited training samples, while a surrogate data test allowed for regression model evaluation in the presence of noise due to limited dataset size. When applied to hard tissue engineering for predicting femoral fracture risk, the framework resulted in 98.3% accurate regression NN. The framework was used to detect early rejection in antibody- incompatible kidney transplantation, achieving 85% accurate classification DT. The third clinical task – that of predicting 10-year incidence of type 2 diabetes in the UK population – resulted in 70-85% accurate classification and survival models, whilst highlighting the challenges of learning with the limited information characteristic of routinely collected data. By discovering unintuitive patterns, supporting existing hypotheses and generating novel insight, the ML models developed in this research contributed meaningfully to clinical research and paved the way for data-efficient applications of ML in engineering and clinical practice.
Supervisor: Not available Sponsor: Not available
Qualification Name: Thesis (Ph.D.) Qualification Level: Doctoral
EThOS ID:  DOI: Not available
Keywords: Q Science (General) ; R Medicine (General)