Use this URL to cite or link to this record in EThOS: http://ethos.bl.uk/OrderDetails.do?uin=uk.bl.ethos.630034
Title: Fully automated transformation of hardware-agnostic, data-parallel programs for host-driven executions on GPUs
Author: Guo, Jing
Awarding Body: University of Hertfordshire
Current Institution: University of Hertfordshire
Date of Award: 2012
Abstract:
This thesis explores the feasibility and performance gains of a fully integrated and automatic approach to generating GPU programs from a high-level and completely hardware-agnostic abstraction. Over the past decade, Graphics Processing Units (GPUs) have become increasingly popular because of their massive computing power and attractive performance/price ratios. Various high-level programming models have further driven the widespread use of GPUs for computation- and data-intensive general-purpose applications. In the literature, speedups of orders of magnitude over single- or multi-core CPUs have been reported. Despite such advancements, developers still shoulder the burden of exploiting complex low-level hardware details to achieve optimal performance. An even higher level of programming abstraction is therefore of great interest and benefit. To this end, we base our research on a functional array programming language, which supports both implicit memory management and high-level data-parallel operations. Within this context, we identify several key challenges that must be overcome to achieve competitive performance: mapping the data-parallel operations efficiently onto the GPU’s massive parallelism, managing and minimising CPU-GPU data communication, optimising GPU memory access efficiency, and overcoming the data copying problem inherent to the functional setting. Compilation techniques addressing these challenges have been proposed and implemented in the Single Assignment C (SAC) compiler framework, which allows the automatic generation of GPU programs from very high-level abstractions. Experimental results have shown that, for a set of representative parallel applications, our compiler-generated code can achieve a level of performance that is (on average) one order of magnitude higher than that of the hand-written sequential counterparts. For several dense linear algebra kernels, the performance is comparable to or
Supervisor: Not available
Sponsor: Not available
Qualification Name: Thesis (Ph.D.)
Qualification Level: Doctoral
EThOS ID: uk.bl.ethos.630034
DOI: Not available