Title: Quality-aware overload management for stream processing
Author: Fiscato, Marco
Awarding Body: Imperial College London
Current Institution: Imperial College London
Date of Award: 2017
Data stream processing systems (DSPSs) compute real-time queries over continuously changing streams of data. A stream is a potentially infinite sequence of tuples, i.e. timestamped data items. The constant increase in data volume makes provisioning a DSPS challenging, requiring computing resources that may be unavailable or too costly to purchase. Even if a user opts for a cloud deployment model, renting all resources on demand, acquiring sufficient resources may still be infeasible due to the financial cost. In the future, we can expect the development of processing infrastructures in which different parties cooperate to create federated resource pools. Such deployments, in which many parties pool their resources, are subject to a phenomenon similar to the tragedy of the commons: since every party tends to consume more resources than it contributes, available resources are always scarce. For these reasons, overload should be considered a common operating condition for such DSPSs and not an exception. When overloaded, the system must discard some of its input data, an operation called load-shedding. The choice of how much and what to discard is crucial for the correct functioning of the system. Many streaming applications can produce useful results even after some data has been discarded during processing; examples include meso-scale weather prediction, improved tornado and hurricane forecasting, and real-time social media monitoring. An approximate result may still be useful to the user, as long as it is delivered with low latency and contains some information about its quality. We propose a new model for federated stream processing under overload, in which the system constantly estimates the impact of overload on the computation and reports the achieved processing quality to the user.
We introduce a quality metric called the Source Coverage Ratio (SCR). It serves the user as an indicator of the achieved processing quality, and it allows the system to implement intelligent shedding policies and to better allocate resources among users. In particular, the SCR metric enables a fair shedding policy that gives equal processing quality to all users without penalising individual queries. We show experimentally that augmenting streams with the SCR metric allows the system to make better load-shedding decisions, leading to more accurate results for many types of queries. It also allows the user to reason about the amount of processing resources needed to run a given query, striking a balance between the quality of the delivered results and the resource cost.
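The fair shedding idea above can be illustrated with a minimal sketch. This is a hypothetical illustration, not the thesis implementation: it assumes SCR is the fraction of a query's source tuples that the system actually processes, and the function names and data layout are invented for the example.

```python
# Hypothetical sketch of an SCR-guided fair shedding policy.
# Assumption: SCR = tuples processed from a query's sources / tuples
# produced by those sources; the thesis definition may differ in detail.

def scr(processed: int, produced: int) -> float:
    """Source Coverage Ratio of one query (0.0 when nothing was produced)."""
    return processed / produced if produced else 0.0

def shed_fairly(queries: dict[str, tuple[int, int]],
                tuples_to_drop: int) -> dict[str, int]:
    """Distribute the shedding burden so all queries end up with similar SCR.

    `queries` maps a query id to (processed, produced) tuple counts.
    Returns how many queued tuples to drop per query: queries that
    currently enjoy the highest SCR give up tuples first.
    """
    drops = {q: 0 for q in queries}
    # Work on a mutable copy of the processed counts.
    processed = {q: p for q, (p, _) in queries.items()}
    for _ in range(tuples_to_drop):
        # Pick the query with the highest current SCR and drop one tuple.
        victim = max(queries, key=lambda q: scr(processed[q], queries[q][1]))
        if processed[victim] == 0:
            break  # nothing left to shed anywhere
        processed[victim] -= 1
        drops[victim] += 1
    return drops

# Example: a query at 90% coverage absorbs the shedding before one at 50%.
drops = shed_fairly({"q1": (90, 100), "q2": (50, 100)}, 20)
# q1 still sits at 70% coverage afterwards, so q2 loses nothing here.
```

The greedy one-tuple-at-a-time loop is deliberately naive; a real DSPS would shed in batches and recompute SCR over sliding windows, but the equalising effect on per-query quality is the same.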
Supervisor: Pietzuch, Peter Sponsor: Not available
Qualification Name: Thesis (Ph.D.) Qualification Level: Doctoral