Data-Flow Threads (DF-Threads) is a new execution model that allows the workload to be distributed seamlessly across several cores (in a multi-core) and several nodes (in a multi-node/multi-board configuration). This paper reports progress in deploying this execution model, which is being developed with a combination of a simulator (the COTSon framework) and a reconfigurable hardware platform (the AXIOM board). The AXIOM platform consists of a custom board based on the Xilinx Zynq Ultrascale+ ZU9EG, which incorporates the largest FPGA currently available on that System-on-Chip, four 64-bit ARM cores and two 32-bit ARM cores, up to 32GiB of main memory, and several 16Gbit/s transceivers. Although a complete DF-Threads system is still under development, it is already capable of running a full Linux OS and simple applications, so some initial results are presented here. In particular, DF-Thread execution is compared with well-known programming models used to exploit Thread-Level Parallelism: Cilk, OpenMPI and Jump. Cilk works well on multi-cores but is not suitable for multi-node systems. In multi-node systems, the distribution of the workload can be managed partly by the programmer through programming models such as message passing (OpenMPI was chosen as the reference) or distributed shared memory (Jump in our case). The results show that DF-Thread execution on a cluster of eight 4-core boards can provide a speed-up of more than 14x over OpenMPI on the same configuration and more than 80x over a single-core, single-node OpenMPI execution.

Giorgi, R. (2018). Scalable Embedded Computing through Reconfigurable Hardware: comparing DF-Threads, Cilk, OpenMPI and Jump. MICROPROCESSORS AND MICROSYSTEMS, 63, 66-74 [10.1016/j.micpro.2018.08.005].

Scalable Embedded Computing through Reconfigurable Hardware: comparing DF-Threads, Cilk, OpenMPI and Jump

Giorgi, Roberto
Writing – Original Draft Preparation
2018-01-01


Use this identifier to cite or link to this document: https://hdl.handle.net/11365/1064538