Use this URL to cite or link to this record in EThOS: http://ethos.bl.uk/OrderDetails.do?uin=uk.bl.ethos.605525
Title: Temporal models of streaming social media data
Author: Preotiuc-Pietro, Daniel
ISNI: 0000 0004 5358 7060
Awarding Body: University of Sheffield
Current Institution: University of Sheffield
Date of Award: 2014
Abstract:
There are significant temporal dependencies between online behaviour and real-world activities as they occur. In text modelling in particular, these dependencies are usually ignored or, at best, handled in overly simplistic ways, such as assuming smooth variation over time. Social media is a new data source that reflects collective behaviour much more richly than traditional sources such as newswire, offering finer time granularity, timely reflection of activities, multiple modalities and large volume. Analysing temporal patterns in this data is important for discovering newly emerging topics and periodic occurrences, and for finding correlations with, or causal links to, real-world indicators and human behaviour patterns. These opportunities bring many challenges, both engineering (i.e. data volume and processing) and algorithmic: the messages are short and inconsistent, and many of them are irrelevant to our goal. Equipped with a better understanding of these complex temporal dependencies, tasks such as classification can be augmented to provide temporally aware responses. In this thesis we model the temporal dynamics of social media data. We first show that temporality is an important characteristic of this type of data. Further comparisons and correlations with real-world indicators show that this data gives a timely reflection of real-world events. Our goal is to use these variations to discover emerging or recurring user behaviours. We consider both word usage and user behaviour in social media. With these goals in mind, we adapt existing machine learning techniques and build novel ones. These span a wide range of models: from Markov models to regularised regression models, and from evolutionary spectral clustering, which models smooth temporal variation, to Gaussian Process regression, which can identify more complex temporal patterns.
We introduce approaches that discover and predict words, topics or behaviours that change over time or occur with some regularity. These are modelled using Gaussian Processes, for the first time in the NLP literature. We demonstrate that we can effectively pick out patterns, including periodicities, and achieve state-of-the-art forecasting results. We show that this performance gain transfers to tasks that do not otherwise take temporal information into account. We further analyse how temporal variation in the text can be used to discover and track new content, and develop a model that exploits the variation in word co-occurrences for clustering over time. Several data collection and processing tools have been developed and published as open-source software, and several social media datasets have also been released. The thesis posits that temporal analysis of data, from social media in particular, provides insights into real-world dynamics, and that incorporating this temporal information into other applications can benefit standard tasks in natural language processing and beyond.
Supervisor: Cohn, Trevor
Sponsor: Not available
Qualification Name: Thesis (Ph.D.)
Qualification Level: Doctoral
EThOS ID: uk.bl.ethos.605525
DOI: Not available