Use this URL to cite or link to this record in EThOS: https://ethos.bl.uk/OrderDetails.do?uin=uk.bl.ethos.783206
Title: Secondary use of electronic medical records for early identification of raised condition likelihoods in individuals : a machine learning approach
Author: Turner, Jonathan
ISNI: 0000 0004 7968 8038
Awarding Body: City, University of London
Current Institution: City, University of London
Date of Award: 2019
Abstract:
With many symptoms being common to multiple diseases, it is a challenge to produce an initial diagnosis, or a recommendation for diagnostic tests, from a set of symptoms that could have been produced by a number of diseases. Often the initial choice of diagnosis or testing is based on a clinician's impression of the likelihood of that condition in a general population; however, the opportunity may exist to modify these likelihoods based on individuals' recorded medical histories. This data-driven approach utilises existing data and is thus cheap and non-invasive. A method is proposed by which an individual's likelihoods of having specified medical conditions are modified according to the similarity of that individual's medical history to the medical histories of other individuals, comparing the prevalence of conditions in the records of individuals who are similar to the individual of interest with the prevalence of those conditions in the records of individuals who are dissimilar. In order to maximise the number of records available for analysis, a process was developed for merging data from disparate sources that used different clinical coding systems, including extensive development of a technique for semi-automatically mapping clinical events coded in ICD-9-CM to Clinical Terms Version 3 (CTV3), for which no existing mapping table was found. Semantically similar fields in the source code sets were identified and retained in the combined data set. 'Codelists', each comprising multiple CTV3 codes, were built for a variety of conditions to define the presence of those conditions within individual records. The hierarchical structure of the CTV3 code table was utilised as a method of identifying codes that differed in structure but had clinically similar or related meaning. The optimum degree of granularity of the coded data to use in identifying similar records was investigated and used in subsequent analysis.
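The similarity-based comparison described above can be illustrated with a minimal sketch. It is not the thesis's implementation: the use of Jaccard similarity over sets of codes, the function names, the value of k, and the toy codes (which are placeholders, not real CTV3 codes) are all assumptions made for illustration.

```python
def jaccard(a, b):
    """Jaccard similarity between two sets of clinical codes (assumed measure)."""
    if not a and not b:
        return 0.0
    return len(a & b) / len(a | b)


def prevalence_ratio(target, records, has_condition, k):
    """Compare condition prevalence among the k records most similar to
    `target` with prevalence among the remaining (dissimilar) records.

    Returns (prevalence_similar, prevalence_dissimilar, ratio).
    The ratio is None when the dissimilar group has zero prevalence.
    """
    if not 0 < k < len(records):
        raise ValueError("k must leave at least one record in each group")
    # Rank record indices from most to least similar to the target history.
    ranked = sorted(
        range(len(records)),
        key=lambda i: jaccard(target, records[i]),
        reverse=True,
    )
    similar, dissimilar = ranked[:k], ranked[k:]
    prev_sim = sum(has_condition[i] for i in similar) / len(similar)
    prev_dis = sum(has_condition[i] for i in dissimilar) / len(dissimilar)
    ratio = prev_sim / prev_dis if prev_dis > 0 else None
    return prev_sim, prev_dis, ratio


# Toy example with placeholder codes (hypothetical, not real CTV3 codes).
records = [{"A", "B"}, {"A", "B", "C"}, {"X", "Y"},
           {"Y", "Z"}, {"A", "C"}, {"X", "Z"}]
has_condition = [True, True, False, False, False, True]
result = prevalence_ratio({"A", "B", "C"}, records, has_condition, k=3)
```

A ratio above 1 in this sketch would suggest that the condition is more prevalent among individuals with similar histories than among dissimilar ones, which is the kind of raised likelihood the abstract refers to.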
Two methods were used for discovering groups of similar and dissimilar individuals: the 'nearest neighbours' method and the grouping of records using a clustering process. Altered likelihoods for a range of conditions were investigated, and results for the nearest-neighbours approach were compared with those for the clustering approach. Results for adjusted condition likelihoods for 18 conditions are reported, together with a discussion of possible reasons for a change, or otherwise, in the condition likelihood, and a discussion of the clinical significance and potential use of information about such a change. Logistic regressions were also performed on a selection of conditions. K-nearest neighbours (KNN) performed better than logistic regression when judged by F-score (or by sensitivity and specificity separately); however, the situation is more nuanced when looking at likelihood ratios: logistic regression produced higher (better) positive likelihood ratios, but KNN produced lower (better) negative likelihood ratios. Logistic regression also produced higher odds ratios.
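For readers unfamiliar with the evaluation measures named above, the following sketch shows how they are conventionally derived from a 2x2 confusion matrix. It assumes that "F-score" means the F1 score and that "odds ratio" means the diagnostic odds ratio; the abstract does not state either, and the counts used are invented for illustration rather than taken from the thesis.

```python
def diagnostic_metrics(tp, fp, fn, tn):
    """Standard diagnostic measures from confusion-matrix counts.

    Assumes every denominator is non-zero (no empty row or column).
    """
    sensitivity = tp / (tp + fn)          # true positive rate
    specificity = tn / (tn + fp)          # true negative rate
    f1 = 2 * tp / (2 * tp + fp + fn)      # harmonic mean of precision and recall
    lr_pos = sensitivity / (1 - specificity)   # higher is better
    lr_neg = (1 - sensitivity) / specificity   # lower is better
    odds_ratio = (tp * tn) / (fp * fn)         # equals lr_pos / lr_neg
    return {
        "sensitivity": sensitivity,
        "specificity": specificity,
        "f1": f1,
        "lr_pos": lr_pos,
        "lr_neg": lr_neg,
        "odds_ratio": odds_ratio,
    }


# Invented counts, for illustration only.
metrics = diagnostic_metrics(tp=80, fp=10, fn=20, tn=90)
```

Because the positive likelihood ratio depends on both sensitivity and specificity, a classifier can score higher on F1 yet lower on the positive likelihood ratio than another, which is consistent with the mixed comparison reported in the abstract.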
Supervisor: Not available
Sponsor: Not available
Qualification Name: Thesis (Ph.D.)
Qualification Level: Doctoral
EThOS ID: uk.bl.ethos.783206
DOI: Not available
Keywords: Q Science (General) ; T Technology (General)