Title: Task-based parallelism for general purpose graphics processing units and hybrid shared-distributed memory systems
Author: Chalk, Aidan Bernard Gerard
ISNI:       0000 0004 6420 9364
Awarding Body: Durham University
Current Institution: Durham University
Date of Award: 2017
Modern computers can no longer rely on increasing CPU clock speed to improve their performance: further increasing the clock speed of single-CPU machines would make them too difficult to cool, or the cooling would require too much power. Hardware manufacturers must now use parallelism to drive performance to the levels expected by Moore's Law. More recently, High Performance Computers (HPCs) have adopted heterogeneous architectures, i.e. having multiple types of computing hardware (such as CPU and GPU) on a single node. These architectures offer the opportunity to extract performance from non-CPU hardware, while still providing a general-purpose platform for older codes.

In this thesis we investigate Task-Based Parallelism, a shared-memory paradigm for parallel computing. Task-Based Parallelism requires the programmer to divide the work into chunks (known as tasks) and describe the data dependencies between tasks. The tasks are then scheduled amongst the threads automatically by the task-based scheduler. In this thesis we examine how Task-Based Parallelism can be used with GPUs and hybrid shared-distributed memory; in particular, we examine how data transfer can be incorporated into a task-based framework, either from the host to the GPU or between separate nodes. We also examine how the task graph can be used to load-balance the computation between multiple nodes or GPUs.

We test our task-based methods with Molecular Dynamics, a tiled QR decomposition, and a new task-based Barnes-Hut algorithm. These problems have different dependency structures, which tests the ability of the scheduler to handle a variety of types of computation. The results with these test cases show improved performance when we use asynchronous data transfer to and from the GPU, and show reasonable parallel efficiency over a small number of MPI ranks.
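The task-based model described in the abstract can be illustrated with a minimal sketch. This is not the scheduler developed in the thesis, only a hypothetical example of the paradigm: each task names its dependencies, and a scheduler launches a task on a thread pool as soon as all of its dependencies have completed, so independent tasks run concurrently.

```python
import threading
from concurrent.futures import ThreadPoolExecutor

def run_graph(tasks, workers=4):
    """Run a task graph in parallel.

    tasks: {name: (fn, [dependency names])}; fn receives the results
    of its dependencies, in the order they are listed.
    """
    lock = threading.Lock()
    results = {}
    remaining = {n: len(deps) for n, (fn, deps) in tasks.items()}
    dependents = {n: [] for n in tasks}
    for n, (fn, deps) in tasks.items():
        for d in deps:
            dependents[d].append(n)
    pending = [len(tasks)]
    done = threading.Event()

    with ThreadPoolExecutor(max_workers=workers) as pool:
        def launch(name):
            fn, deps = tasks[name]
            fut = pool.submit(fn, *[results[d] for d in deps])
            fut.add_done_callback(lambda f, n=name: finish(n, f.result()))

        def finish(name, value):
            # Record the result and release any tasks whose last
            # dependency has just completed.
            ready = []
            with lock:
                results[name] = value
                pending[0] -= 1
                for child in dependents[name]:
                    remaining[child] -= 1
                    if remaining[child] == 0:
                        ready.append(child)
                if pending[0] == 0:
                    done.set()
            for child in ready:
                launch(child)

        # Tasks with no dependencies can start immediately.
        for n, count in remaining.items():
            if count == 0:
                launch(n)
        done.wait()
    return results
```

For example, a diamond-shaped graph lets the two middle tasks run concurrently: `run_graph({"a": (lambda: 1, []), "b": (lambda x: x + 1, ["a"]), "c": (lambda x: x * 2, ["a"]), "d": (lambda x, y: x + y, ["b", "c"])})` yields `{"a": 1, "b": 2, "c": 2, "d": 4}`. The same idea extends to GPUs and MPI by adding tasks that perform (asynchronous) data transfers, as the thesis explores.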
Supervisor: Not available
Sponsor: Not available
Qualification Name: Thesis (Ph.D.)
Qualification Level: Doctoral
EThOS ID:
DOI: Not available