Use this URL to cite or link to this record in EThOS: https://ethos.bl.uk/OrderDetails.do?uin=uk.bl.ethos.749513
Title: Automatically explaining literature based discoveries
Author: McClure, Maryhilda Heidi
ISNI: 0000 0004 7233 8959
Awarding Body: University of Sheffield
Current Institution: University of Sheffield
Date of Award: 2018
Abstract:
Literature based discovery (LBD) identifies potentially related pairs of concepts that are not mentioned together in the same documents. The concept pairs may be identified via linking concepts that are mentioned in both sets of documents, or via other statistical relatedness measures such as latent semantic indexing. Unfortunately, the nature of the relationships is not identified, so the importance and relevance of the LBD pairs are not known. The primary objectives of this thesis are to identify candidate LBD related concepts and to determine whether the nature of their relationships can be automatically explained using supervised machine learning classification. For example, in the benchmark LBD example of Raynaud’s phenomenon (A) being related to fish oil (C), candidate linking concepts are blood viscosity, platelet function and vascular reactivity. The linking concepts are referred to as Bs and thus create A-B-C LBD triples. Specifically, this work aims to identify a training set of data that includes linking B terms, to identify the relationships between the A and B pairs and the B and C pairs, and to apply supervised machine learning classification techniques to suggest the relationship between the A and C concepts. In the Raynaud’s example, the suggestion would be that fish oil may treat Raynaud’s phenomenon. This work explores data representations suitable for applying classification techniques to explain the relationships, and applies traditional classification evaluation methods to both classifier outcomes and data designs. Classifiers applied to the training data ultimately predicted the A to C relationships correctly over 70% of the time, while the chosen baselines achieved only approximately 30%. The classifiers were then used on real LBD candidate pairs, found using statistical LBD, from an older set of MEDLINE abstracts.
The predicted LBD explanations were validated against more recent literature, a time-slice validation approach. To the best of my knowledge, relationship prediction techniques have not previously been applied to statistically related LBD candidate pairs to explain how the A and C concepts are related. Applying time-slicing to validate explained LBD candidates is also novel.
Supervisor: Stevenson, Mark; Gaizauskas, Rob
Sponsor: Not available
Qualification Name: Thesis (Ph.D.)
Qualification Level: Doctoral
EThOS ID: uk.bl.ethos.749513
DOI: Not available