Title:
|
Portable, predictable and partitionable : a domain specific approach to heterogeneous computing
|
Computing is increasingly heterogeneous. Beyond Central Processing Units (CPUs), different architectures such as massively parallel Graphics Processing Units (GPUs) and reconfigurable Field Programmable Gate Arrays (FPGAs) are seeing widespread adoption. However, the failure of conventional programming approaches to support portable execution, predict runtime characteristics and partition workloads optimally is hindering the realisation of heterogeneous computing. By narrowing the scope of expression in a natural manner, a domain-specific approach can address these three challenges. A domain-specific heterogeneous computing methodology enables three features: Portability, Prediction and Partitioning. Portable, efficient execution is enabled because only a subset of domain functions needs to be supported across the heterogeneous computing platforms. Predictive models of runtime characteristics are enabled because the structure of the domain functions may be analysed a priori. Lastly, optimal partitioning is possible because the metric models can be used to form an optimisation program that can be solved by heuristic, machine learning or Mixed Integer Linear Programming (MILP) approaches. Using the application domain of financial derivatives pricing as an example, a domain-specific application framework, the Forward Financial Framework (F^3), can execute a single pricing task upon a diverse range of CPU, GPU and FPGA platforms from many different vendors. Not only do these portable implementations exhibit strong parallel scaling, but they are also competitive with state-of-the-art, expert-created implementations of the same option pricing problems. Furthermore, using a small benchmarking procedure, F^3 can model the crucial runtime metrics of latency and accuracy for these heterogeneous platforms to within 10% of the values observed at run time.
Finally, the framework can optimally partition work across heterogeneous platforms using a MILP approach, achieving partitions that are up to 270 times more efficient than those produced by a heuristic approach.
|