Title: Risk assessment models for resource failure in grid computing
Author: Alsoghayer, Raid Abdullah
Awarding Body: University of Leeds
Current Institution: University of Leeds
Date of Award: 2011
Service Level Agreements (SLAs) were introduced to overcome the limitations of the best-effort approach in Grid computing and thereby make Grid computing more attractive for commercial use. However, commercial Grid providers are reluctant to adopt SLAs because resource failure can cause an SLA violation, which incurs a penalty fee; modelling a resource's risk of failure is therefore critical to Grid resource providers. Moving from a best-effort approach to a risk-aware approach when accepting SLAs helps the Grid resource provider to deliver a high level of Quality of Service (QoS). Moreover, risk is an important factor in establishing the resource price and the penalty fee charged in the case of resource failure. In light of this, we propose a mathematical model that predicts the risk of failure of a Grid resource using a discrete-time analytical model driven by reliability functions fitted to observed data. The model relies on the resource's historical information to predict the probability of resource failure (the risk of failure) for a given time interval. The model was evaluated by comparing the predicted risk of failure with the observed risk of failure, using availability data gathered from Grid resources.
The risk of failure is an important property of a Grid resource, especially when jobs must be scheduled onto resources so as to achieve a business objective. However, user-centric scheduling algorithms in Grid computing ignore the risk factor and mostly address the minimisation of either the cost of the resource allocation or the overall deadline by which the job must be fully executed. Therefore, we propose a novel user-centric scheduling algorithm for Bag of Tasks (BoT) applications. The algorithm, which aims to meet user requirements, takes into account the risk of failure, the cost of resources and the job deadline. Through simulation, we demonstrate that the algorithm provides a near-optimal solution for minimising the cost of executing BoT jobs. We also show that the execution time of the proposed algorithm is very low, making it suitable for solving scheduling problems in real time.
Risk assessment benefits the resource provider by providing methods to support the decision to accept or reject an SLA. Moreover, it enables the resource provider to understand the capacity of the infrastructure and thereby plan future investment. Scheduling algorithms benefit the resource provider by providing methods to meet user requirements and make better use of resources. The ability to adopt a risk assessment method and user-centric algorithms makes the exploitation of Grid systems more realistic.
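The abstract defines the risk of failure as the probability that a resource fails within a given time interval, predicted from a reliability function fitted to historical data. The following is a minimal sketch of that idea, not the thesis's model: it assumes a continuous-time Weibull reliability function, and the function names and the `shape` and `scale` parameters are hypothetical (the abstract does not name the fitted distribution or give parameter values). It uses the standard identity that the probability of failing in the next `duration` time units, given survival up to `age`, is 1 - R(age + duration) / R(age).

```python
import math


def weibull_cumulative_hazard(t, shape, scale):
    """Cumulative hazard H(t) = (t / scale) ** shape, so that R(t) = exp(-H(t)).

    The Weibull form is an assumption made for illustration only.
    """
    if t < 0 or shape <= 0 or scale <= 0:
        raise ValueError("t must be >= 0 and shape, scale must be > 0")
    return (t / scale) ** shape


def risk_of_failure(age, duration, shape, scale):
    """Probability of failure within `duration`, given survival up to `age`.

    Computes 1 - R(age + duration) / R(age) as -expm1(-(H_later - H_now)),
    which avoids dividing two very small reliabilities.
    """
    if duration < 0:
        raise ValueError("duration must be >= 0")
    h_now = weibull_cumulative_hazard(age, shape, scale)
    h_later = weibull_cumulative_hazard(age + duration, shape, scale)
    return -math.expm1(-(h_later - h_now))


if __name__ == "__main__":
    # Hypothetical parameters: a resource that has been up for 100 hours,
    # asked to run a job lasting 24 hours.
    risk = risk_of_failure(age=100.0, duration=24.0, shape=0.8, scale=500.0)
    print(f"Estimated risk of failure over the job interval: {risk:.3f}")
```

A provider could compare such an estimate against a threshold when deciding whether to accept an SLA, with the parameters first fitted to the resource's observed availability data.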
Supervisor: Djemam, K.
Sponsor: Not available
Qualification Name: Thesis (Ph.D.)
Qualification Level: Doctoral
EThOS ID:
DOI: Not available