The simulation of parallel heterogeneous architectures such as multi-cores and GPUs sets new challenges in the programming language/framework domain. Applications for simulators need to be expressed in a way that can be easily adapted for the specific architectures, effectively tuned for on each of them while preventing from introducing biases due to non-uniform hand-made optimizations. The most common heterogeneous programming frameworks are too low-level, so we propose PHAST, a high-level heterogeneous C++ library targetable on multi-cores and Nvidia GPUs. It permits to write code at a high level of abstraction, to reach good performance while allowing for fine parameter tuning and not shielding code from low-level optimizations. We evaluate PHAST in the case of DCT8x8 on both supported architectures. On multi-cores, we found that PHAST implementation is around ten times faster than OpenCL (AMD vendor) implementation, but up to about 4x slower than OpenCL (Intel vendor) one, which effectively leverages auto-vectorization. On Nvidia GPUs, PHAST code performs up to 55.14% better than CUDA SDK reference version.
Peccerillo, B., Bartolini, S. (2017). PHAST Library - Enabling single-source and high performance code for GPUs and multi-cores. In The 2017 International Conference on High Performance Computing & Simulation (HPCS 2017) (pp.715-718). NEW YORK, NY 10017 : IEEE [10.1109/HPCS.2017.109].
PHAST Library - Enabling single-source and high performance code for GPUs and multi-cores
PECCERILLO, BIAGIO;BARTOLINI, SANDRO
2017-01-01
Abstract
The simulation of parallel heterogeneous architectures such as multi-cores and GPUs sets new challenges in the programming language/framework domain. Applications for simulators need to be expressed in a way that can be easily adapted for the specific architectures, effectively tuned for on each of them while preventing from introducing biases due to non-uniform hand-made optimizations. The most common heterogeneous programming frameworks are too low-level, so we propose PHAST, a high-level heterogeneous C++ library targetable on multi-cores and Nvidia GPUs. It permits to write code at a high level of abstraction, to reach good performance while allowing for fine parameter tuning and not shielding code from low-level optimizations. We evaluate PHAST in the case of DCT8x8 on both supported architectures. On multi-cores, we found that PHAST implementation is around ten times faster than OpenCL (AMD vendor) implementation, but up to about 4x slower than OpenCL (Intel vendor) one, which effectively leverages auto-vectorization. On Nvidia GPUs, PHAST code performs up to 55.14% better than CUDA SDK reference version.File | Dimensione | Formato | |
---|---|---|---|
bartoliniPeccerillo-Phast-MSPS17-cameraReady.pdf
non disponibili
Descrizione: Articolo principale
Tipologia:
PDF editoriale
Licenza:
PUBBLICO - Pubblico con Copyright
Dimensione
210.08 kB
Formato
Adobe PDF
|
210.08 kB | Adobe PDF | Visualizza/Apri Richiedi una copia |
I documenti in IRIS sono protetti da copyright e tutti i diritti sono riservati, salvo diversa indicazione.
https://hdl.handle.net/11365/1027058