Title: Aspect discovery and sentiment classification for online reviews
Author: Burns, Nicola
Awarding Body: University of Ulster
Current Institution: Ulster University
Date of Award: 2013
Buying products and services online is becoming increasingly popular, and as a result there is a vast number of online reviews. Automatic classification of this increasingly large volume of data has become a popular area of recent research, as the information contained in these reviews is valuable both to potential customers and as marketing intelligence. The work in this thesis focuses on discovering the aspects and sentiment of online reviews using a topic-modelling-based approach. Sentiment analysis aims to automatically discover opinions, whereas topic modelling discovers latent topics; here, topic modelling is combined with sentiment analysis techniques to create an effective approach to sentiment analysis. Three problems are addressed in this work.

Firstly, the classes of real-world product reviews tend to be highly imbalanced. When dealing with imbalanced data, data miners usually pre-process it so that the classes are balanced. This work therefore compares balanced and imbalanced datasets and aims to answer the question: should imbalanced datasets be artificially balanced, or modelled as they are? A series of experiments is performed to investigate the datasets in different scenarios. The experimental results provide evidence that, within the product review domain, there is no need to artificially balance a dataset, as sentiment analysis on an imbalanced dataset performs better than on a balanced one.

Secondly, the LDA (Latent Dirichlet Allocation) model is a popular choice for topic modelling; however, it has some shortcomings, including identifying topics that could be considered too broad and requiring manual work to label all the topics produced.
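The abstract does not say how the balanced datasets were constructed. As an illustration of what "artificially balancing" a review dataset can mean, the following is a minimal sketch assuming random undersampling, one common technique: majority-class reviews are dropped at random until every class is as small as the smallest one. The function name `undersample`, the seed, and the toy data are hypothetical and are not taken from the thesis.

```python
import random
from collections import Counter, defaultdict


def undersample(reviews, labels, seed=0):
    """Random undersampling (assumed technique, not the thesis's stated method):
    keep a random subset of each class equal in size to the smallest class."""
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for review, label in zip(reviews, labels):
        by_class[label].append(review)

    minority_size = min(len(items) for items in by_class.values())

    balanced = []
    for label, items in by_class.items():
        # Sample without replacement so no review is duplicated.
        for review in rng.sample(items, minority_size):
            balanced.append((review, label))
    rng.shuffle(balanced)  # avoid a dataset ordered by class
    return [r for r, _ in balanced], [l for _, l in balanced]


# Hypothetical toy data: 8 positive reviews, 2 negative ones.
reviews = [f"review {i}" for i in range(10)]
labels = ["pos"] * 8 + ["neg"] * 2
bal_reviews, bal_labels = undersample(reviews, labels)

print(Counter(labels))                      # Counter({'pos': 8, 'neg': 2})
print(sorted(Counter(bal_labels).items()))  # [('neg', 2), ('pos', 2)]
```

Undersampling discards most of the majority class, which is one reason keeping an imbalanced dataset as-is can be the better option, as the experiments described above suggest for product reviews.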
This work proposes a novel method, the Twofold-LDA model, to identify aspects and quantify sentiment. It incorporates domain knowledge, removes the one-aspect-per-sentence assumption, and extracts information that allows the sentiment analysis results to be presented in a user-friendly way.

Finally, there has been no known work focusing on ways to improve topic modelling for sentiment classification. As past studies show that sentiment analysis techniques perform well at identifying sentiment, this work investigates how to incorporate them into the topic modelling process. The Enhanced Twofold-LDA model is proposed, which incorporates part-of-speech tagging into the topic modelling process by altering the Gibbs sampling procedure. A case study demonstrates the ability of the Enhanced Twofold-LDA model to solve practical problems, in particular through the creation of an end-user application aimed at hotel customers.
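The abstract states that the Enhanced Twofold-LDA model alters Gibbs sampling to incorporate part-of-speech tagging, but it does not describe the alteration. For reference, the following is a minimal sketch of the standard collapsed Gibbs sampler for plain LDA, i.e. the baseline step such a model would modify. The per-token full conditional computed in `weights` is where a change of that kind would plug in. The function name `lda_gibbs`, the hyperparameter values, and the toy hotel-review vocabulary are assumptions for illustration only; this is not the thesis's model.

```python
import random


def lda_gibbs(docs, num_topics, vocab_size, alpha=0.1, beta=0.01,
              iterations=200, seed=0):
    """Standard collapsed Gibbs sampling for LDA (baseline, not Twofold-LDA).

    docs: list of documents, each a list of integer word ids in [0, vocab_size).
    Returns per-token topic assignments and the topic-word estimate phi.
    """
    rng = random.Random(seed)
    K, V = num_topics, vocab_size
    doc_topic = [[0] * K for _ in docs]       # n_{d,k}: tokens in doc d with topic k
    topic_word = [[0] * V for _ in range(K)]  # n_{k,w}: times word w has topic k
    topic_total = [0] * K                     # n_k: tokens assigned to topic k

    # Random initial topic assignment for every token.
    assignments = []
    for d, doc in enumerate(docs):
        z_doc = []
        for w in doc:
            k = rng.randrange(K)
            z_doc.append(k)
            doc_topic[d][k] += 1
            topic_word[k][w] += 1
            topic_total[k] += 1
        assignments.append(z_doc)

    for _ in range(iterations):
        for d, doc in enumerate(docs):
            for i, w in enumerate(doc):
                # Remove the current token from the counts.
                k = assignments[d][i]
                doc_topic[d][k] -= 1
                topic_word[k][w] -= 1
                topic_total[k] -= 1
                # Full conditional: p(z=k | rest) is proportional to
                # (n_dk + alpha) * (n_kw + beta) / (n_k + V * beta).
                weights = [
                    (doc_topic[d][t] + alpha) * (topic_word[t][w] + beta)
                    / (topic_total[t] + V * beta)
                    for t in range(K)
                ]
                # Resample the token's topic and add it back to the counts.
                k = rng.choices(range(K), weights=weights)[0]
                assignments[d][i] = k
                doc_topic[d][k] += 1
                topic_word[k][w] += 1
                topic_total[k] += 1

    # Point estimate of each topic's word distribution.
    phi = [[(topic_word[k][w] + beta) / (topic_total[k] + V * beta)
            for w in range(V)] for k in range(K)]
    return assignments, phi


# Hypothetical toy corpus of hotel-review snippets encoded as word ids.
vocab = ["room", "bed", "clean", "staff", "friendly", "helpful"]
docs = [[0, 1, 2, 0, 1], [3, 4, 5, 3, 4], [0, 2, 1, 2], [5, 3, 4, 5]]
z, phi = lda_gibbs(docs, num_topics=2, vocab_size=len(vocab))
for k, row in enumerate(phi):
    top = sorted(range(len(vocab)), key=lambda w: row[w], reverse=True)[:3]
    print(k, [vocab[w] for w in top])
```

The printed top words depend on the random seed, so no fixed output is shown. Note that plain LDA returns unlabelled topics, which is the manual labelling burden mentioned above.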
Supervisor: Not available Sponsor: Not available
Qualification Name: Thesis (Ph.D.) Qualification Level: Doctoral
EThOS ID:  DOI: Not available