Use this URL to cite or link to this record in EThOS: https://ethos.bl.uk/OrderDetails.do?uin=uk.bl.ethos.790849
Title: Discovering and understanding community opinions of neighbourhoods expressed in question answering platforms
Author: Saeidi, M.
Awarding Body: UCL (University College London)
Current Institution: University College London (University of London)
Date of Award: 2017
Availability of Full Text:
Access from EThOS:
Full text unavailable from EThOS.
Abstract:
Humans value the opinions of others. In recent years, people have been using social media platforms both to voice and to gather opinions. Searching for relevant information among the huge number of opinions expressed across several platforms is an overwhelming task. This is why automatically extracting information from such sources has received a great deal of attention in both academia and industry. However, little work in this field has been dedicated to the domain of city neighbourhoods. One reason is that, unlike for many products and services, there are no dedicated review platforms for collecting opinions about neighbourhoods. In the absence of dedicated review sites, a large number of opinions on neighbourhoods and other domains can be found on community question answering (QA) platforms. So far, this data has not been used. This raises the question of what the strengths and limitations of QA data are, and what challenges it poses for extracting opinion information expressed about neighbourhoods. In this thesis, we comprehensively investigate these questions, using data from Yahoo! Answers for neighbourhoods of London. First, we investigate how well QA discussions reflect the demographic attributes of neighbourhoods recorded in the census (e.g. age, religion). Our results show that significant, strong and meaningful correlations exist between text features from QA data and many demographic attributes. For instance, the terms poverty, drug, and rundown are amongst the terms most strongly correlated with the attribute deprivation. We further demonstrate that text features based on Yahoo! Answers discussions can achieve very good accuracy in predicting a wide range of demographic attributes for neighbourhoods. These predictions outperform predictions made using Twitter data, a platform that has been used widely in the past for predicting many real-world attributes.
Demographic data provides objective statistics related to the population of neighbourhoods. Many attributes of interest are not reflected in those statistics. For instance, census data does not record whether a neighbourhood is posh, quiet or good for nightlife. Knowing these aspects complements the demographic attributes in forming an understanding of neighbourhoods. We investigate whether text features from QA data can predict such aspects. To do this, we create a dataset of neighbourhoods labelled with these aspects. Our prediction results show that, given these labels, QA data can predict such aspects with higher performance than Twitter data. Predicting a single value for a characteristic of a neighbourhood cannot provide a complete picture of people's opinions. To provide a fine-grained summary, a popular approach is to extract the sentiments towards different aspects of a given entity from each expressed opinion. Aspect-based sentiment analysis has been studied extensively, but research has always utilised text from dedicated review platforms, where a user usually writes opinions on a single specified entity. In the absence of a review platform for neighbourhoods, we extend the task to process text from QA platforms, where fewer assumptions can be made and the data is noisy. We construct a human-annotated dataset based on text from Yahoo! Answers discussions with high inter-annotator agreement of over 70%, a suitable level for this task. To address this task, we propose methods based on representations of text that are learned sequentially using recurrent neural models, or representations that are defined using traditional bag of n-grams features. Our proposed methods can achieve prediction accuracies at levels similar to those of less challenging sentiment analysis tasks.
In summary, this thesis demonstrates the strengths of QA data for predicting attributes of real-world entities and for extracting information from opinions, specifically in the domain of city neighbourhoods.
Supervisor: Riedel, S.; Capra, L.
Sponsor: Not available
Qualification Name: Thesis (Ph.D.)
Qualification Level: Doctoral
EThOS ID: uk.bl.ethos.790849
DOI: Not available