Title: Improving high performance computing using code generation and compilation techniques
Author: Bercea, Gheorghe-Teodor
ISNI: 0000 0004 6347 2902
Awarding Body: Imperial College London
Current Institution: Imperial College London
Date of Award: 2017
In an ideal world, scientific applications would be expressed as high-level compositions of abstractions that encapsulate parallelism and deliver near-optimal performance with low maintenance costs. The alternative, where such abstractions are unavailable, is for application programmers to control execution using an appropriate explicitly parallel programming model. In this thesis we explore both approaches, represented by the Firedrake framework and the OpenMP programming model respectively. We also explore how OpenMP can support high-level abstractions such as Firedrake. Firedrake is designed as a composition of domain-specific abstractions for solving partial differential equations via the finite element method. We extend Firedrake with support for extruded meshes, which are frequently used in geophysical simulations. We introduce algorithms for numbering and iterating over any discretization supported by an extruded mesh. Starting with version 4.0, OpenMP computations, previously intended exclusively for the CPU, can be offloaded to accelerators and coprocessors. We introduce code generation schemes for offloading single and nested OpenMP parallel constructs in the Clang/LLVM toolchain. The schemes map OpenMP directives to the hardware model of the accelerator, enabling the programmer to use OpenMP in a prescriptive way. Performance is evaluated on the extruded mesh extensions to Firedrake as well as on LULESH, a widely ported proxy application intended to be representative of an important portion of the Department of Energy’s scientific codes. In the case of Firedrake, performance is shown to reach significant percentages of theoretical hardware limits. For OpenMP, the runtime is compared against hand-optimized implementations employing the accelerator-specific CUDA C/C++ language extensions.
The additions to the Firedrake framework combine both approaches into a single toolchain containing a newly introduced OpenMP 4.0 Firedrake backend with functionality equivalent to all existing Firedrake backends. OpenMP 4.0 is used as a single representation for both CPU and GPU platforms, thus simplifying the application of target-specific optimisations. The OpenMP 4.0 backend improves maintainability through code reuse and will deliver gains in portability as offloading support in Clang advances.
Supervisor: Kelly, Paul; Ham, David
Sponsor: Engineering and Physical Sciences Research Council
Qualification Name: Thesis (Ph.D.)
Qualification Level: Doctoral