Use this URL to cite or link to this record in EThOS: https://ethos.bl.uk/OrderDetails.do?uin=uk.bl.ethos.738972
Title: Scaling real-time event detection to massive streams
Author: Wurzer, Dominik Stefan
ISNI:       0000 0004 7225 0412
Awarding Body: University of Edinburgh
Current Institution: University of Edinburgh
Date of Award: 2017
Availability of Full Text:
Access from EThOS:
Full text unavailable from EThOS. Please try the link below.
Access from Institution:
Abstract:
In today’s world the internet and social media are omnipresent and information is accessible to everyone. This shifted the advantage from those who have access to information to those who do so first. Identifying new events as they emerge is of substantial value to financial institutions who consider realtime information in their decision making processes, as well as for journalists that report about breaking news and governmental agencies that collect information and respond to emergencies. First Story Detection is the task of identifying those documents in a stream of documents that talk about new events first. This seemingly simple task is non-trivial as the computational effort increases with every processed document. Standard approaches to solve First Story Detection determine a document’s novelty by comparing it to previously seen documents. This results in the highest reported accuracy but even the currently fastest system only scales to 10% of the Twitter stream. In this thesis, we propose a new algorithm family, called memory-based methods, able to scale to the full Twitter stream on a single core. Our memory-based method computes a document’s novelty up to two orders of magnitude faster than state-of-the-art systems without sacrificing accuracy. This thesis additional provides original work on the impact of processing unbounded data streams on detection accuracy. Our experiments reveal for the first time that the novelty scores of state-of-the-art comparison based and memory-based methods decay over time. We show how to counteract the discovered novelty decay and increase detection accuracy. Additionally, we show that memory-based methods are applicable beyond First Story Detection by building the first real time rumour detection system on social media streams.
Supervisor: Thompson, Henry ; Lavrenko, Victor Sponsor: Not available
Qualification Name: Thesis (Ph.D.) Qualification Level: Doctoral
EThOS ID: uk.bl.ethos.738972  DOI: Not available
Keywords: realtime information ; First Story Detection ; memory-based methods ; algorithms ; detection accuracy ; rumour detection ; social media
Share: