Peccerillo, B., & Bartolini, S. (2017). PHAST Library - Enabling single-source and high performance code for GPUs and multi-cores. In The 2017 International Conference on High Performance Computing & Simulation (HPCS 2017) (pp. 715-718). New York, NY: IEEE. doi:10.1109/HPCS.2017.109

PHAST Library - Enabling single-source and high performance code for GPUs and multi-cores

Peccerillo, Biagio; Bartolini, Sandro
2017

Abstract

The simulation of parallel heterogeneous architectures such as multi-cores and GPUs poses new challenges for programming languages and frameworks. Applications for simulators need to be expressed in a way that can be easily adapted to each specific architecture and effectively tuned on each of them, while avoiding the biases introduced by non-uniform, hand-made optimizations. The most common heterogeneous programming frameworks are too low-level, so we propose PHAST, a high-level heterogeneous C++ library that targets multi-cores and Nvidia GPUs. It allows code to be written at a high level of abstraction and to reach good performance, while still permitting fine parameter tuning and not shielding the code from low-level optimizations. We evaluate PHAST with DCT8x8 (an 8x8 discrete cosine transform) on both supported architectures. On multi-cores, the PHAST implementation is around ten times faster than the AMD OpenCL implementation, but up to about 4x slower than the Intel OpenCL implementation, which effectively leverages auto-vectorization. On Nvidia GPUs, the PHAST code performs up to 55.14% better than the CUDA SDK reference version.
ISBN: 978-153863250-5
Files in this record:
bartoliniPeccerillo-Phast-MSPS17-cameraReady.pdf (not available)

Description: Main article
Type: Publisher's PDF
License: Public, with copyright
Size: 210.08 kB
Format: Adobe PDF

Documents in IRIS are protected by copyright and all rights are reserved, unless otherwise indicated.

Use this identifier to cite or link to this document: http://hdl.handle.net/11365/1027058