Measurement and simulation of the performance of high energy physics data grids
This thesis describes a study of resource brokering in a computational Grid for high energy physics. Such systems are being devised to manage the unprecedented workload of next-generation particle physics experiments, such as those at the Large Hadron Collider. A simulation of the European Data Grid has been constructed and calibrated using logging data from a real Grid testbed. This model is then used to explore the Grid's middleware configuration and to suggest improvements to its scheduling policy. The expansion of the simulation to include data analysis of the type conducted by particle physicists is then described. A variety of job and data management policies are explored to determine how well they meet the needs of physicists, and how efficiently they use CPU and network resources. Appropriate performance indicators are introduced to measure how well jobs and resources are managed from different perspectives. The effects of inefficiencies in Grid middleware are explored, as are methods of compensating for them. It is demonstrated that a scheduling algorithm should adjust the relative weighting it gives to load balancing and to data distribution, depending on whether data transfer or CPU requirements dominate, and also on the level of job loading. It is also shown that an economic model for data management and replication can improve the efficiency of network use and job processing.
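The scheduling result can be illustrated with a minimal sketch. It is not the algorithm developed in the thesis: the `Site` fields, the linear cost function, and the rule that sets the weights from the ratio of transfer time to CPU time are all assumptions chosen for illustration, and the dependence on overall job loading is left out for brevity.

```python
from dataclasses import dataclass


@dataclass
class Site:
    # All fields are hypothetical, chosen only for this illustration.
    name: str
    queued_jobs: int       # jobs already waiting at the site
    cpus: int              # worker CPUs available at the site
    local_fraction: float  # fraction of the job's input data held locally (0..1)


def data_weight(transfer_time: float, cpu_time: float) -> float:
    """Weight given to data locality, in [0, 1].

    Assumed rule: the weight grows as data transfer dominates the job.
    """
    total = transfer_time + cpu_time
    return transfer_time / total if total > 0 else 0.5


def site_cost(site: Site, transfer_time: float, cpu_time: float) -> float:
    """Weighted cost of running a job at a site (lower is better)."""
    w_data = data_weight(transfer_time, cpu_time)
    w_load = 1.0 - w_data
    # Rough queue-wait estimate: jobs per CPU times the job's CPU time.
    load_term = cpu_time * site.queued_jobs / site.cpus
    # Time needed to fetch the input data that is not already local.
    data_term = transfer_time * (1.0 - site.local_fraction)
    return w_load * load_term + w_data * data_term


def choose_site(sites: list[Site], transfer_time: float, cpu_time: float) -> Site:
    """Pick the site with the lowest weighted cost for this job."""
    return min(sites, key=lambda s: site_cost(s, transfer_time, cpu_time))


if __name__ == "__main__":
    sites = [
        Site("idle_no_data", queued_jobs=10, cpus=10, local_fraction=0.0),
        Site("busy_has_data", queued_jobs=40, cpus=10, local_fraction=1.0),
    ]
    # Transfer-dominated job: the site that already holds the data is preferred.
    print(choose_site(sites, transfer_time=900.0, cpu_time=100.0).name)
    # CPU-dominated job: the lightly loaded site is preferred.
    print(choose_site(sites, transfer_time=100.0, cpu_time=900.0).name)
```

With these assumed numbers, the same pair of sites is ranked differently depending on which requirement dominates the job, which is the qualitative behaviour the abstract describes.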