Agent-based resource management for grid computing
A computational grid is a hardware and software infrastructure that provides dependable, consistent, pervasive, and inexpensive access to high-end computational capability. An ideal grid environment should provide access to the available resources in a seamless manner. Resource management is an important infrastructural component of a grid computing environment. The overall aim of resource management is to efficiently schedule applications that need to utilise the available resources in the grid environment. Such goals within the high performance community will rely on accurate performance prediction capabilities. An existing toolkit, known as PACE (Performance Analysis and Characterisation Environment), is used to provide quantitative data concerning the performance of sophisticated applications running on high performance resources. In this thesis an ASCI (Accelerated Strategic Computing Initiative) kernel application, Sweep3D, is used to illustrate the PACE performance prediction capabilities. The validation results show that a reasonable accuracy can be obtained, cross-platform comparisons can be easily undertaken, and the process benefits from a rapid evaluation time. While extremely well-suited for managing a locally distributed multi-computer, the PACE functions do not map well onto a wide-area environment, where heterogeneity, multiple administrative domains, and communication irregularities dramatically complicate the job of resource management. Scalability and adaptability are two key challenges that must be addressed. In this thesis, an A4 (Agile Architecture and Autonomous Agents) methodology is introduced for the development of large-scale distributed software systems with highly dynamic behaviours. An agent is considered to be both a service provider and a service requestor. Agents are organised into a hierarchy with service advertisement and discovery capabilities. There are four main performance metrics for an A4 system: service discovery speed, agent system efficiency, workload balancing, and discovery success rate. Coupling the A4 methodology with PACE functions, results in an Agent-based Resource Management System (ARMS), which is implemented for grid computing. The PACE functions supply accurate performance information (e. g. execution time) as input to a local resource scheduler on the fly. At a meta-level, agents advertise their service information and cooperate with each other to discover available resources for grid-enabled applications. A Performance Monitor and Advisor (PMA) is also developed in ARMS to optimise the performance of the agent behaviours. The PMA is capable of performance modelling and simulation about the agents in ARMS and can be used to improve overall system performance. The PMA can monitor agent behaviours in ARMS and reconfigure them with optimised strategies, which include the use of ACTs (Agent Capability Tables), limited service lifetime, limited scope for service advertisement and discovery, agent mobility and service distribution, etc. The main contribution of this work is that it provides a methodology and prototype implementation of a grid Resource Management System (RMS). The system includes a number of original features that cannot be found in existing research solutions.