Title: Data mining and machine learning to predict acute coronary syndrome mortality
Author: Jaafar, Juliana
ISNI:       0000 0004 7230 1582
Awarding Body: University of Leeds
Current Institution: University of Leeds
Date of Award: 2017
This thesis has investigated and demonstrated the potential for developing prediction models using machine learning (ML) algorithms on registry datasets. Many current Acute Coronary Syndrome (ACS) prediction models were developed using traditional statistical methods. In an era of big-data evolution, ML offers a spectrum of algorithms that aid in generating prediction models for ACS. This study has explored 29 algorithms with which to build ACS prediction models for Asian (Malaysia) and Western (Leeds, UK) registries, covering patients with all types of ACS and those receiving the new standard ACS treatments. Internal and external validation of the models shows satisfactory calibration measures, indicating that ML algorithms can produce models that are competitive with those built using traditional statistical methods. To achieve simpler yet competitive predictive performance, a comprehensive set of ML feature selection methods has been evaluated, and Correlation-based Feature Selection (CFS) emerged as the best method. This thesis has also evaluated the potential of predictors from existing ACS models to be adapted to other registries' data. Despite different regions and different population characteristics, most of the existing predictors remain consistent in their relationship with the outcome. Thus, the findings suggest that, with some adjustments customized to the registry, the existing predictors can be adopted to develop a simple model and expedite the model development process. Furthermore, the strength of the predictors in each clinical category has also been evaluated. The results suggest that, to construct a satisfactory ACS model, a combination of predictors from various clinical events is essential. At the very least, a satisfactory model requires a combination of the demographic, medical history, and clinical presentation information categories. However, predictors from the medication history category were found not to contribute to a better prediction model.
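As an illustration of the kind of score CFS optimizes, the sketch below computes the subset merit commonly given for CFS, merit = k * r_cf / sqrt(k + k(k-1) * r_ff), where k is the number of features, r_cf the mean feature-outcome correlation, and r_ff the mean feature-feature correlation. The abstract does not state which implementation or correlation measure the thesis used, so the function name `cfs_merit` and its inputs are assumptions made purely for illustration.

```python
import math


def cfs_merit(feature_class_corrs, feature_feature_corrs):
    """Merit of a feature subset under the usual CFS heuristic.

    feature_class_corrs: one correlation per feature with the outcome.
    feature_feature_corrs: one correlation per pair of features in the
    subset (empty when the subset holds a single feature).
    Both inputs are hypothetical; how correlations are measured is left
    to the caller.
    """
    k = len(feature_class_corrs)
    if k == 0:
        raise ValueError("the subset must contain at least one feature")
    mean_cf = sum(abs(r) for r in feature_class_corrs) / k
    if feature_feature_corrs:
        mean_ff = sum(abs(r) for r in feature_feature_corrs) / len(feature_feature_corrs)
    else:
        mean_ff = 0.0
    # Relevance in the numerator, redundancy penalty in the denominator.
    return (k * mean_cf) / math.sqrt(k + k * (k - 1) * mean_ff)
```

Under this score, two equally relevant features that are uncorrelated with each other rate higher than two that are fully redundant, which is why such a criterion tends to favour small subsets of complementary predictors.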
Next, this study has investigated classifier degradation in ML model development. The findings suggest that overlapping instances in the minority class of imbalanced datasets, together with missing values, are the main causes of classifier degradation. Two new methods have been introduced: the overlapped-undersampling method to handle imbalanced datasets and the mean-clustering-imputation method to handle missing values. The overlapped-undersampling method did not improve model performance on the datasets. Nevertheless, the results suggest that more training samples on imbalanced datasets are sufficient to produce satisfactory models. The mean-clustering-imputation method produced better models compared with both the simple imputation method and the imputation method embedded in an algorithm. However, removing instances with missing data resulted in superior models.
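The abstract does not describe how the mean-clustering-imputation method works. One plausible reading, shown here only as a minimal sketch and not as the thesis's actual procedure, is to fill each missing value with the mean of that feature among records in the same cluster. The function name `impute_by_cluster_mean`, the use of `None` for missing values, the precomputed cluster labels, and the fallback to the overall mean are all assumptions.

```python
from statistics import mean


def impute_by_cluster_mean(rows, labels):
    """Fill missing values (None) with the per-cluster feature mean.

    rows: list of equal-length lists of numbers, None marking a gap.
    labels: one cluster id per row, assumed to be computed beforehand
    (the clustering step itself is outside this sketch).
    Falls back to the overall feature mean when a cluster has no
    observed value for that feature.
    """
    n_cols = len(rows[0])

    # Overall mean per feature, used only as a fallback.
    global_means = []
    for j in range(n_cols):
        observed = [r[j] for r in rows if r[j] is not None]
        global_means.append(mean(observed) if observed else None)

    # Mean per feature within each cluster.
    cluster_means = {}
    for label in set(labels):
        members = [r for r, l in zip(rows, labels) if l == label]
        means = []
        for j in range(n_cols):
            observed = [r[j] for r in members if r[j] is not None]
            means.append(mean(observed) if observed else global_means[j])
        cluster_means[label] = means

    # Return new rows; the input is left unmodified.
    return [
        [cluster_means[l][j] if v is None else v for j, v in enumerate(r)]
        for r, l in zip(rows, labels)
    ]
```

Simple mean imputation corresponds to the special case in which every record carries the same cluster label.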
Supervisor: Atwell, Eric ; Wyatt, Jeremy C. ; Clamp, Susan
Sponsor: Not available
Qualification Name: Thesis (Ph.D.)
Qualification Level: Doctoral
EThOS ID:
DOI: Not available