Use this URL to cite or link to this record in EThOS: https://ethos.bl.uk/OrderDetails.do?uin=uk.bl.ethos.683389
Title: New software-based fault tolerance methods for high performance computing
Author: Hunt, Robert D.
ISNI: 0000 0004 5916 2610
Awarding Body: University of Bristol
Current Institution: University of Bristol
Date of Award: 2015
Abstract:
As computer systems become ever more powerful and parallel, processing ever larger data sets, there is an increasing need to ensure that scientific software applications tolerate faults in both hardware and software. New algorithms that exploit knowledge of the structure and calculation of important mathematical problems could enable more efficient, fault-tolerant computation with minimal overhead. This thesis demonstrates how improvements to two important application areas in High Performance Computing (HPC), Monte Carlo methods and Sparse Linear Algebra, can yield software with greater fault tolerance at low overhead. It proposes models that employ variations on existing techniques, namely layout topologies in grids and a form of Error-Correcting Code (ECC), to provide an increased degree of fault tolerance in calculations. The models use these variations efficiently to produce schemes that are robust and based on straightforward approaches that are simple to implement. The theory behind the models is developed and evaluated, and basic implementations are created to gauge the performance and viability of the schemes. Both models perform well in the majority of cases, with overheads in the range of 0-10%, and both are highly scalable. Furthermore, the Sparse Linear Algebra schemes with the highest overhead are found to perform better on larger, sparser data sets, which are precisely the ones that most need the extra protection afforded by software fault tolerance.
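To make the idea of software-level fault detection in sparse linear algebra concrete, here is a minimal sketch of a related, well-known technique: checksum-based algorithm-based fault tolerance (ABFT) for a sparse matrix-vector product in CSR format. This is an illustrative assumption, not the ECC scheme developed in the thesis. The function names (`csr_matvec`, `column_checksum`, `checked_matvec`) and the tolerance values are hypothetical. The approach precomputes the column sums c = 1ᵀA once. After computing y = A x, it checks that sum(y) equals c · x, so a corrupted matrix value or column index during the product usually breaks the equality.

```python
import math

def csr_matvec(values, col_idx, row_ptr, x):
    """Plain CSR sparse matrix-vector product y = A x."""
    n_rows = len(row_ptr) - 1
    y = [0.0] * n_rows
    for i in range(n_rows):
        for k in range(row_ptr[i], row_ptr[i + 1]):
            y[i] += values[k] * x[col_idx[k]]
    return y

def column_checksum(values, col_idx, n_cols):
    """Precompute c = 1^T A (the sum of each column), ideally while A is known-good."""
    c = [0.0] * n_cols
    for v, j in zip(values, col_idx):
        c[j] += v
    return c

def checked_matvec(values, col_idx, row_ptr, x, c, rel_tol=1e-9, abs_tol=1e-12):
    """Compute y = A x and report whether sum(y) matches c . x within tolerance.

    Hypothetical helper: it only *detects* a likely fault, it does not correct it,
    and errors that happen to preserve the sum (or errors in x itself) go unnoticed.
    """
    y = csr_matvec(values, col_idx, row_ptr, x)
    expected = sum(ci * xi for ci, xi in zip(c, x))
    ok = math.isclose(sum(y), expected, rel_tol=rel_tol, abs_tol=abs_tol)
    return y, ok

if __name__ == "__main__":
    # A = [[4, 0, 1], [0, 3, 0], [2, 0, 5]] stored in CSR form
    values = [4.0, 1.0, 3.0, 2.0, 5.0]
    col_idx = [0, 2, 1, 0, 2]
    row_ptr = [0, 2, 3, 5]
    x = [1.0, 2.0, 3.0]
    c = column_checksum(values, col_idx, 3)

    y, ok = checked_matvec(values, col_idx, row_ptr, x, c)
    print(y, ok)  # [7.0, 6.0, 17.0] True

    corrupted = list(values)
    corrupted[0] = 40.0  # simulate a bit flip in a stored matrix value
    y_bad, ok_bad = checked_matvec(corrupted, col_idx, row_ptr, x, c)
    print(y_bad, ok_bad)  # [43.0, 6.0, 17.0] False
```

The check costs one extra dot product per multiply, which is the kind of low, near-constant overhead the abstract describes. Unlike an ECC, however, a plain checksum can flag an error but cannot locate or repair it.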
Supervisor: Not available
Sponsor: Not available
Qualification Name: Thesis (Ph.D.)
Qualification Level: Doctoral
EThOS ID: uk.bl.ethos.683389
DOI: Not available