Title: Towards achieving autonomic performance resilience in distributed service-oriented systems
Author: Zhang, Rui
ISNI:       0000 0004 2722 9198
Awarding Body: University of Oxford
Current Institution: University of Oxford
Date of Award: 2009
Availability of Full Text:
Full text unavailable from EThOS.
Please contact the current institution’s library for further details.
As businesses and institutions move towards an increasingly distributed, collaborative and outsourcing-oriented IT model, the last decade has seen the rapid rise of a new distributed computing framework, the service-oriented architecture (SOA). Within this framework, distributed applications are essentially dynamic run-time ensembles of software and infrastructure services implemented, published and maintained by dedicated service providers. Given the large-scale, complex and dynamic nature of these distributed service-oriented systems, performance problems are becoming more likely to occur and are often extremely difficult for human administrators to analyze and rectify. To deal with this growing problem, a high degree of performance resilience must be injected into the system, such that it can autonomously overcome problems and resume normal service with minimal human intervention. This thesis takes a novel step towards addressing this issue using a monitor-analyze-action loop centered on customized statistical learning. First, the SOA stack is instrumented to trace service-oriented workload across system components in an end-to-end manner so as to collect system-wide performance statistics attributable to requests. These groups of data are then fed into a Bayesian network framework where they are used to (1) automatically generate a performance model that links per-service performance status to end-to-end service level agreement (SLA) goals, and (2) isolate the service(s) most responsible for (i.e., guilty of) end-to-end performance degradation, taking into consideration the absolute performance, abnormality and application-level impact of these services. Finally, the performance of the identified guilty service is restored via one (or more) of three recuperation techniques, relying respectively on alternative service providers, local service restart and local service reconfiguration facilities.
We implemented Spring, a prototype of the approach proposed in this dissertation, against the open source SOA stack, with a specific emphasis on platform independence and low overhead. Experiments showed that Spring was highly effective in autonomously tackling real performance problems in two distinct real-world testbeds for distributed service-oriented systems. Extensive evaluation was also conducted through simulations on the most critical Spring component, problem localization. The results highlighted its strengths in terms of both problem localization accuracy and efficiency.
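The problem localization idea summarized in the abstract can be illustrated with a toy sketch. This is not Spring's implementation and not a Bayesian network: it is plain conditional-probability counting over traced requests, swapped in only to show how per-service status might be related to end-to-end SLA violations. The `Trace` record, the function names, the service names and the sample data are all hypothetical.

```python
from collections import namedtuple

# Hypothetical trace record (an assumption, not from the thesis): the set of
# services observed as slow on one request, and whether that request missed
# its end-to-end SLA goal.
Trace = namedtuple("Trace", ["slow_services", "sla_violated"])


def violation_lift(traces, service):
    """Estimate P(violation | service slow) - P(violation | service not slow)."""
    slow = [t for t in traces if service in t.slow_services]
    fine = [t for t in traces if service not in t.slow_services]
    # Laplace smoothing: add one violated and one non-violated pseudo-trace
    # to each group so that empty groups do not divide by zero.
    p_slow = (sum(t.sla_violated for t in slow) + 1) / (len(slow) + 2)
    p_fine = (sum(t.sla_violated for t in fine) + 1) / (len(fine) + 2)
    return p_slow - p_fine


def rank_suspects(traces, services):
    """Order services from most to least associated with SLA violations."""
    return sorted(services, key=lambda s: violation_lift(traces, s), reverse=True)


# Made-up sample data for illustration only.
traces = [
    Trace({"db"}, True),
    Trace({"db", "auth"}, True),
    Trace({"auth"}, False),
    Trace(set(), False),
    Trace({"db"}, True),
    Trace(set(), False),
]
services = ["auth", "db", "cache"]
ranking = rank_suspects(traces, services)
print(ranking[0])
```

In this made-up data the hypothetical `db` service ranks first, since every request on which it was slow also missed the SLA. The approach described in the abstract additionally weighs absolute performance, abnormality and application-level impact, which this sketch does not model.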
Supervisor: McKeever, Steve
Sponsor: Not available
Qualification Name: Thesis (Ph.D.)
Qualification Level: Doctoral
EThOS ID:
DOI: Not available