Use this URL to cite or link to this record in EThOS:
Title: Analysis and parameter prediction of compiler transformation for graphics processors
Author: Magni, Alberto
ISNI:       0000 0004 5916 4317
Awarding Body: University of Edinburgh
Current Institution: University of Edinburgh
Date of Award: 2016
Availability of Full Text:
Access from EThOS:
Full text unavailable from EThOS. Please try the link below.
Access from Institution:
In the last decade graphics processors (GPUs) have been extensively used to solve computationally intensive problems. A variety of GPU architectures by different hardware manufacturers have been shipped in a few years. OpenCL has been introduced as the standard cross-vendor programming framework for GPU computing. Writing and optimising OpenCL applications is a challenging task, the programmer has to take care of several low level details. This is even harder when the goal is to improve performance on a wide range of devices: OpenCL does not guarantee performance portability. In this thesis we focus on the analysis and the portability of compiler optimisations. We describe the implementation of a portable compiler transformation: thread-coarsening. The transformation increases the amount of work carried out by a single thread running on the GPU. The goal is to reduce the amount of redundant instructions executed by the parallel application. The first contribution is a technique to analyse performance improvements and degradations given by the compiler transformation, we study the changes of hardware performance counters when applying coarsening. In this way we identify the root causes of execution time variations due to coarsening. As second contribution, we study the relative performance of coarsening over multiple input sizes. We show that the speedups given by coarsening are stable for problem sizes larger than a threshold that we call saturation point. We exploit the existence of the saturation point to speedup iterative compilation. The last contribution of the work is the development of a machine learning technique that automatically selects a coarsening configuration that improves performance. The technique is based on an iterative model built using a neural network. The network is trained once for a GPU model and used for several programs. To prove the flexibility of our techniques, all our experiments have been deployed on multiple GPU models by different vendors.
Supervisor: O'Boyle, Michael ; Dubach, Christopher Sponsor: Engineering and Physical Sciences Research Council (EPSRC)
Qualification Name: Thesis (Ph.D.) Qualification Level: Doctoral
EThOS ID:  DOI: Not available
Keywords: compliers ; graphics processors ; performance optimization