Title: Inferring the geolocation of tweets at a fine-grained level
Author: Gonzalez Paule, Jorge David
ISNI: 0000 0004 7655 267X
Awarding Body: University of Glasgow
Current Institution: University of Glasgow
Date of Award: 2019
Recently, the use of Twitter data has become important for a wide range of real-time applications, such as real-time event detection, topic detection, and disaster and emergency management. These applications need to know the precise location of tweets for their analysis. However, only approximately 1% of tweets are geotagged at a fine-grained level, which remains insufficient for such applications. To overcome this limitation, predicting the location of non-geotagged tweets, while challenging, can increase the sample of geotagged data available to support the applications mentioned above. Nevertheless, existing approaches to tweet geolocalisation mostly focus on the geolocation of tweets at a coarse-grained level of granularity (i.e., city or country level). Thus, geolocalising tweets at a fine-grained level (i.e., street or building level) has emerged as a new open research problem. In this thesis, we investigate the problem of inferring the geolocation of non-geotagged tweets at a fine-grained level of granularity (i.e., at most 1 km error distance). In particular, we aim to predict the geolocation where a given tweet was generated using its text as a source of evidence. This thesis states that the geolocalisation of non-geotagged tweets at a fine-grained level can be achieved by exploiting the characteristics of the 1% of already available individual finely-grained geotagged tweets provided by the Twitter stream. We evaluate state-of-the-art approaches, derive insights into their issues and propose an evolution of techniques to achieve the geolocalisation of tweets at a fine-grained level. First, we explore the existing approaches in the literature for tweet geolocalisation and derive insights into the problems they exhibit when adapted to work at a fine-grained level. To overcome these problems, we propose a new approach that ranks individual geotagged tweets based on their content similarity to a given non-geotagged tweet.
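As an illustration only, the idea of ranking individual geotagged tweets by their content similarity to a non-geotagged tweet might be sketched as follows. This is a minimal sketch under stated assumptions, not the method used in the thesis: the abstract does not name a similarity function, so plain term-frequency cosine similarity is used here, and the function names (`tokenize`, `cosine`, `rank_geotagged`), the whitespace tokenisation and the dictionary layout of a tweet are all hypothetical.

```python
import math
from collections import Counter


def tokenize(text):
    """Lowercase whitespace tokenisation (an assumption for illustration)."""
    return text.lower().split()


def cosine(a, b):
    """Cosine similarity between two term-frequency Counters."""
    dot = sum(a[term] * b[term] for term in a if term in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def rank_geotagged(query_text, geotagged, top_n=3):
    """Return the top_n (score, tweet) pairs, most similar first.

    `geotagged` is a list of dicts with hypothetical keys
    "text", "lat" and "lon".
    """
    query = Counter(tokenize(query_text))
    scored = [
        (cosine(query, Counter(tokenize(tweet["text"]))), tweet)
        for tweet in geotagged
    ]
    # Sort on the score only, so tweet dicts are never compared.
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return scored[:top_n]
```

The coordinates attached to the highest-ranked tweets can then serve as candidate locations for the non-geotagged tweet.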
Our experimental results show significant improvements over previous approaches. Next, we explore the predictability of the location of a tweet at a fine-grained level in order to reduce the average error distance of the predictions. We postulate that, to obtain a fine-grained prediction, a correlation between similarity and geographical distance should exist, and we define the boundaries where fine-grained predictions can be achieved. To do that, we incorporate a majority voting algorithm into the ranking approach that assesses whether such a correlation exists by exploiting the geographical evidence encoded within the Top-N most similar geotagged tweets in the ranking. We report experimental results and demonstrate that, by considering this geographical evidence, we can reduce the average error distance, but at a cost in coverage (the number of tweets for which our approach can find a fine-grained geolocation). Furthermore, we investigate whether the quality of the ranking of the Top-N geotagged tweets affects the effectiveness of fine-grained geolocalisation, and propose a new approach to improve the ranking. To this end, we adopt a learning-to-rank approach that re-ranks geotagged tweets based on their geographical proximity to a given non-geotagged tweet. We test different learning-to-rank algorithms and propose multiple features to model fine-grained geolocalisation. Moreover, we investigate the best-performing combination of features for fine-grained geolocalisation. This thesis also demonstrates the applicability and generalisation of our fine-grained geolocalisation approaches in a practical scenario related to a traffic incident detection task.
We show the effectiveness of using new geolocalised incident-related tweets in detecting the geolocation of real incident reports, and demonstrate that we can improve the overall performance of the traffic incident detection task by enhancing the already available geotagged tweets with new tweets that were geolocalised using our approach. The key contribution of this thesis is the development of effective approaches for geolocalising tweets at a fine-grained level. The thesis provides insights into the main challenges of achieving fine-grained geolocalisation, derived from exhaustive experiments over a ground truth of geotagged tweets gathered from two different cities. Additionally, we demonstrate the effectiveness of our fine-grained geolocalisation approaches in a traffic incident detection task by geolocalising new incident-related tweets.
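The majority voting step and the 1 km error-distance criterion mentioned in the abstract can be illustrated with a short sketch. It is an assumption-laden illustration rather than the algorithm evaluated in the thesis: the grid obtained by rounding coordinates (`precision`), the strict-majority threshold (`min_share`) and the function names are hypothetical choices. When the Top-N tweets do not agree on a location, the function returns no prediction, which is how coverage is traded for a lower error distance in this sketch.

```python
import math
from collections import Counter


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in kilometres (mean Earth radius 6371 km)."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlam = math.radians(lon2 - lon1)
    a = (math.sin(dphi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(dlam / 2) ** 2)
    return 2 * 6371.0 * math.asin(math.sqrt(a))


def majority_vote(top_n_coords, precision=2, min_share=0.5):
    """Predict a location from the (lat, lon) pairs of the Top-N tweets.

    Coordinates are bucketed into grid cells by rounding (an assumption).
    A prediction is returned only if one cell holds more than `min_share`
    of the votes; otherwise None is returned and the tweet is left
    without a fine-grained location.
    """
    if not top_n_coords:
        return None
    cells = [(round(lat, precision), round(lon, precision))
             for lat, lon in top_n_coords]
    best_cell, votes = Counter(cells).most_common(1)[0]
    if votes / len(cells) <= min_share:
        return None
    # Average the original coordinates of the tweets in the winning cell.
    members = [c for c, cell in zip(top_n_coords, cells) if cell == best_cell]
    lat = sum(m[0] for m in members) / len(members)
    lon = sum(m[1] for m in members) / len(members)
    return (lat, lon)


def is_fine_grained(predicted, actual, max_km=1.0):
    """True if the prediction lies within max_km of the actual location."""
    return haversine_km(predicted[0], predicted[1],
                        actual[0], actual[1]) <= max_km
```

Raising `min_share` or `precision` in this sketch makes the vote stricter, so fewer tweets receive a prediction but the predictions that remain rest on stronger geographical agreement.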
Supervisor: Not available
Sponsor: Not available
Qualification Name: Thesis (Ph.D.)
Qualification Level: Doctoral
Keywords: QA75 Electronic computers. Computer science ; QA76 Computer software