Use this URL to cite or link to this record in EThOS: http://ethos.bl.uk/OrderDetails.do?uin=uk.bl.ethos.635591
Title: Monitoring, analysis and optimisation of I/O in parallel applications
Author: Wright, Steven A.
ISNI: 0000 0004 5357 5510
Awarding Body: University of Warwick
Current Institution: University of Warwick
Date of Award: 2014
Abstract:
High performance computing (HPC) is changing the way science is performed in the 21st Century; experiments that once took enormous amounts of time, were dangerous and often produced inaccurate results can now be performed and refined in a fraction of the time in a simulation environment. Current generation supercomputers are running in excess of 10^16 floating point operations per second, and the push towards exascale (10^18 operations per second) will see this increase by two orders of magnitude. To achieve this level of performance it is thought that applications may have to scale to potentially billions of simultaneous threads, pushing hardware to its limits and severely impacting failure rates. To reduce the cost of these failures, many applications use checkpointing to periodically save their state to persistent storage, such that, in the event of a failure, computation can be restarted without significant data loss. While computational power has grown by approximately 2× every 18 to 24 months, persistent storage has lagged behind, and checkpointing is fast becoming a bottleneck to performance. Several software and hardware solutions have been proposed to address the I/O problem currently facing the HPC community, and this thesis examines some of them. Specifically, this thesis presents a tool designed for analysing and optimising the I/O behaviour of scientific applications, as well as a tool designed to allow the rapid analysis of one software solution to the problem of parallel I/O, namely the parallel log-structured file system (PLFS). This thesis ends with an analysis of a modern Lustre file system under contention from multiple applications and multiple compute nodes running the same problem through PLFS. The results and analysis presented outline a framework through which decisions about application settings and procurement can be made.
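The checkpoint/restart pattern described in the abstract can be illustrated with a minimal sketch. It assumes a single process and a JSON file on local storage. The function names (`save_checkpoint`, `load_checkpoint`, `run`) and the toy `advance` timestep are hypothetical and are not taken from the thesis, which targets parallel applications writing to parallel file systems such as Lustre. Each checkpoint is written to a temporary file, forced to disk with `fsync`, and then renamed over the previous one. A failure during the write therefore leaves the last good checkpoint intact.

```python
import json
import os
import tempfile
from typing import List, Optional, Tuple


def save_checkpoint(path: str, step: int, state: List[float]) -> None:
    """Persist (step, state) atomically: write a temp file, then rename it over the old checkpoint."""
    directory = os.path.dirname(os.path.abspath(path))
    fd, tmp_path = tempfile.mkstemp(dir=directory, suffix=".tmp")
    with os.fdopen(fd, "w") as f:
        json.dump({"step": step, "state": state}, f)
        f.flush()
        os.fsync(f.fileno())  # push the data to persistent storage before renaming
    os.replace(tmp_path, path)  # atomic rename on POSIX: readers see the old or the new file, never half of one


def load_checkpoint(path: str) -> Optional[Tuple[int, List[float]]]:
    """Return the last saved (step, state), or None if no checkpoint exists yet."""
    if not os.path.exists(path):
        return None
    with open(path) as f:
        data = json.load(f)
    return data["step"], data["state"]


def advance(state: List[float]) -> List[float]:
    """Hypothetical stand-in for one simulation timestep."""
    return [x + 1.0 for x in state]


def run(path: str, total_steps: int, interval: int, fail_at: Optional[int] = None) -> List[float]:
    """Run to total_steps, checkpointing every `interval` steps and resuming from any existing checkpoint.

    `fail_at` injects an artificial failure at the given step, to mimic a node crash.
    """
    restored = load_checkpoint(path)
    step, state = restored if restored is not None else (0, [0.0, 0.0, 0.0])
    while step < total_steps:
        if fail_at is not None and step == fail_at:
            raise RuntimeError(f"simulated failure at step {step}")
        state = advance(state)
        step += 1
        if step % interval == 0:
            save_checkpoint(path, step, state)
    return state
```

With `interval=4` and a failure injected at step 6, the on-disk checkpoint holds step 4, so only two steps of work are lost. A second call to `run` resumes from step 4 rather than step 0. The interval trades I/O cost against lost work: checkpointing more often loses less computation on failure but spends more time writing to storage. This is why slow persistent storage turns checkpointing into a performance bottleneck.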
Supervisor: Not available
Sponsor: Department of Computer Science, University of Warwick
Qualification Name: Thesis (Ph.D.)
Qualification Level: Doctoral
EThOS ID: uk.bl.ethos.635591
DOI: Not available
Keywords: QA76 Electronic computers. Computer science. Computer software