Use this URL to cite or link to this record in EThOS: https://ethos.bl.uk/OrderDetails.do?uin=uk.bl.ethos.782959
Title: Where are you talking about? : advances and challenges of geographic analysis of text with application to disease monitoring
Author: Gritta, Milan
ISNI:       0000 0004 7968 5590
Awarding Body: University of Cambridge
Current Institution: University of Cambridge
Date of Award: 2019
Availability of Full Text:
Access from EThOS:
Full text unavailable from EThOS. Please try the link below.
Access from Institution:
Abstract:
The Natural Language Processing task we focus on in this thesis is Geoparsing. Geoparsing is the process of extraction and grounding of toponyms (place names). Consider this sentence: "The victims of the Spanish earthquake off the coast of Malaga were of American and Mexican origin." Four toponyms will be extracted (called Geotagging) and grounded to their geographic coordinates (called Toponym Resolution). However, our research goes further than any previous work by showing how to distinguish the literal place(s) of the event (Spain, Malaga) from other linguistic types/uses such as nationalities (Mexican, American), improving downstream task accuracy. We consolidate and extend the Standard Evaluation Framework, discuss key research problems, then present concrete solutions in order to advance each stage of geoparsing. For geotagging, as well as training a SOTA neural Location-NER tagger, we simplify Metonymy Resolution with a novel minimalist feature extraction combined with an LSTM-based classifier, matching SOTA results. For toponym resolution, we deploy the latest deep learning methods to achieve SOTA performance by augmenting neural models with hitherto unused geographic features called Map Vectors. With each research project, we provide high-quality datasets and system prototypes, further building resources in this field. We then show how these geoparsing advances coupled with our proposed Intra-Document Analysis can be used to associate news articles with locations in order to monitor the spread of public health threats. To this end, we evaluate our research contributions with production data from a real-time downstream application to improve geolocation of news events for disease monitoring. The data was made available to us by the Joint Research Centre (JRC), which operates one such system called MediSys that processes incoming news articles in order to monitor threats to public health and make these available to a variety of governmental, business and non-profit organisations. We also discuss steps towards an end-to-end, automated news monitoring system and make actionable recommendations for future work. In summary, the thesis aims are twofold: (1) Generate original geoparsing research aimed at advancing each stage of the pipeline by addressing pertinent challenges with concrete solutions and actionable proposals. (2) Demonstrate how this research can be applied to news event monitoring to increase the efficacy of existing biosurveillance systems, e.g. European Commission's MediSys.
Supervisor: Collier, Nigel Sponsor: NERC
Qualification Name: Thesis (Ph.D.) Qualification Level: Doctoral
EThOS ID: uk.bl.ethos.782959  DOI:
Keywords: Geoparsing ; Geocoding ; Toponym Resolution ; Geolocation ; Biosurveillance ; Geotagging ; Named Entity Recognition ; Location Resolution ; Geographical Parsing
Share: