Use this URL to cite or link to this record in EThOS: http://ethos.bl.uk/OrderDetails.do?uin=uk.bl.ethos.728126
Title: Data-driven temporal information extraction with applications in general and clinical domains
Author: Filannino, Michele
ISNI:       0000 0004 6497 914X
Awarding Body: University of Manchester
Current Institution: University of Manchester
Date of Award: 2016
Availability of Full Text:
Access from EThOS:
Access from Institution:
Abstract:
The automatic extraction of temporal information from written texts is pivotal for many Natural Language Processing applications such as question answering, text summarisation and information retrieval. However, Temporal Information Extraction (TIE) is a challenging task because of the amount of types of expressions (durations, frequencies, times, dates) and their high morphological variability and ambiguity. As far as the approaches are concerned, the most common among the existing ones is rule-based, while data-driven ones are under-explored. This thesis introduces a novel domain-independent data-driven TIE strategy. The identification strategy is based on machine learning sequence labelling classifiers on features selected through an extensive exploration. Results are further optimised using an a posteriori label-adjustment pipeline. The normalisation strategy is rule-based and builds on a pre-existing system. The methodology has been applied to both specific (clinical) and generic domain, and has been officially benchmarked at the i2b2/2012 and TempEval-3 challenges, ranking respectively 3rd and 1st. The results prove the TIE task to be more challenging in the clinical domain (overall accuracy 63%) rather than in the general domain (overall accuracy 69%).Finally, this thesis also presents two applications of TIE. One of them introduces the concept of temporal footprint of a Wikipedia article, and uses it to mine the life span of persons. In the other case, TIE techniques are used to improve pre-existing information retrieval systems by filtering out temporally irrelevant results.
Supervisor: Not available Sponsor: Not available
Qualification Name: Thesis (Ph.D.) Qualification Level: Doctoral
EThOS ID: uk.bl.ethos.728126  DOI: Not available
Keywords: machine learning ; text mining ; temporal information extraction
Share: