Title: Using novel data types for Big Data research in epilepsy : patient records, clinic letters and genetic mutation
Author: Lacey, Arron S.
ISNI:       0000 0004 7657 7519
Awarding Body: Swansea University
Current Institution: Swansea University
Date of Award: 2019
Availability of Full Text:
Access from EThOS:
Full text unavailable from EThOS. Please try the link below.
Access from Institution:
Introduction: The aims of this thesis were to explore novel data types in healthcare that could enhance epidemiological studies of epilepsy and to develop novel methods of analysing routinely collected linked healthcare data, unstructured free text in hospital clinic letters, and genetic variation.
Method: The SAIL Databank was used to source linked healthcare data for people with epilepsy across Wales to study the relationship between epilepsy and social deprivation, the coding of epilepsy in GP records, and the educational attainment of children born to mothers with epilepsy. Hospital clinic letters from Morriston Hospital in Swansea were analysed using Natural Language Processing techniques to extract rich clinical data not typically recorded as part of routinely collected data. An automated pipeline was developed to predict the pathogenicity of Single Nucleotide Polymorphisms (SNPs) in order to prioritize potential disease-causing genetic variation in epilepsy for further in vitro analysis.
Results: The incidence and prevalence of epilepsy were found to be strongly correlated with increased social deprivation; however, a 10-year retrospective follow-up study found no increase in deprivation following a diagnosis of epilepsy, which points to social causation (deprivation contributing to epilepsy) rather than social drift (epilepsy leading to greater deprivation). An algorithm was developed to accurately identify people with epilepsy from GP records. Sodium valproate was found to reduce educational attainment in 7-year-olds by 12%. A Natural Language Processing pipeline was developed to extract rich epilepsy information from clinic letters. A pipeline was created to predict the pathogenicity of epilepsy SNPs that performed better than commonly used software.
Conclusion: This thesis presents novel studies in epilepsy using population-level healthcare data, unstructured clinic letters and genetic variation. New methods were developed that have the potential to be applied to other disease areas and used to link different data types into routinely collected healthcare records to enhance further research.
Supervisor: Rees, Mark I.; Chung, Seokyung
Sponsor: Not available
Qualification Name: Thesis (Ph.D.)
Qualification Level: Doctoral