Use this URL to cite or link to this record in EThOS: http://ethos.bl.uk/OrderDetails.do?uin=uk.bl.ethos.684352
Title: Stateful data-parallel processing
Author: Castro Fernandez, Raul
ISNI:       0000 0004 5920 9503
Awarding Body: Imperial College London
Current Institution: Imperial College London
Date of Award: 2016
Availability of Full Text:
Access through EThOS:
Full text unavailable from EThOS. Please try the link below.
Access through Institution:
Abstract:
Democratisation of data means that more people than ever are involved in the data analysis process. This is beneficial - it brings domain-specific knowledge from broad fields - but data scientists do not have adequate tools to write algorithms and execute them at scale. Processing models of current data-parallel processing systems, designed for scalability and fault tolerance, are stateless. Stateless processing facilitates capturing parallelisation opportunities and hides fault tolerance. However, data scientists want to write stateful programs - with explicit state that they can update, such as matrices in machine learning algorithms - and are used to imperative-style languages. These programs struggle to execute with high-performance in stateless data-parallel systems. Representing state explicitly makes data-parallel processing at scale challenging. To achieve scalability, state must be distributed and coordinated across machines. In the event of failures, state must be recovered to provide correct results. We introduce stateful data-parallel processing that addresses the previous challenges by: (i) representing state as a first-class citizen so that a system can manipulate it; (ii) introducing two distributed mutable state abstractions for scalability; and (iii) an integrated approach to scale out and fault tolerance that recovers large state - spanning the memory of multiple machines. To support imperative-style programs a static analysis tool analyses Java programs that manipulate state and translates them to a representation that can execute on SEEP, an implementation of a stateful data-parallel processing model. SEEP is evaluated with stateful Big Data applications and shows comparable or better performance than state-of-the-art stateless systems.
Supervisor: Pietzuch, Peter Sponsor: Engineering and Physical Sciences Research Council ; BAE Systems
Qualification Name: Thesis (Ph.D.) Qualification Level: Doctoral
EThOS ID: uk.bl.ethos.684352  DOI: Not available
Share: