HashGrid: An optimized architecture for accelerating graph computing on FPGAs


Large-scale graph processing is challenging because of graph size and irregular memory access patterns, which degrade performance on common architectures such as CPUs and GPUs. Recent research has explored accelerating graph processing with Field Programmable Gate Arrays (FPGAs). FPGAs can provide very efficient acceleration thanks to their reconfigurable on-chip resources. Although limited, these resources offer a larger design space than CPUs and GPUs. We propose an approach in which data are preprocessed into small chunks with an optimized graph partitioning technique for execution on FPGA accelerators. The chunks, located on the host, are streamed directly into a customized memory layer implemented in the FPGA, which is tightly coupled with the processing elements that execute the graph algorithm. This reduces the application's memory access latency, which is crucial to the performance of large-scale graph computing. This work presents a hardware design that, combined with graph partitioning, achieves high-performance and potentially scalable handling of large graphs (i.e., graphs with millions of vertices and billions of edges in current scenarios) while running popular graph algorithms. With the PageRank algorithm, the proposed framework is 56 times faster than a CPU (a multicore with 16 logical cores in our reference experiments), 2.5 times faster than a state-of-the-art FPGA solution (15 compute units), and 4 times faster than a state-of-the-art GPU solution (128 streaming multiprocessors). For the Single-Source Shortest Path (SSSP) algorithm, we achieve speedups of up to 65x, 26x, and 18x compared to CPU, GPU, and FPGA works, respectively. Lastly, for the Weakly Connected Components (WCC) algorithm, our framework achieves speedups of up to 403x over the CPU, 7.4x over the GPU, and 10.3x over the FPGA alternatives.
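The chunked preprocessing described in the abstract can be illustrated in software with a minimal sketch. It is not the authors' partitioning scheme or hardware design: the modulo hash, the square grid of chunks, and the names `partition_edges` and `pagerank_chunked` are assumptions made only for this example, and the FPGA memory layer is not modelled. The sketch only shows the general idea of splitting an edge list into small chunks and running PageRank by visiting one chunk at a time.

```python
from collections import defaultdict


def partition_edges(edges, parts):
    """Bucket edges into a parts x parts grid keyed by (src block, dst block)."""
    grid = defaultdict(list)
    for src, dst in edges:
        # Hypothetical placement rule: a plain modulo hash on the vertex ID.
        grid[(src % parts, dst % parts)].append((src, dst))
    return grid


def pagerank_chunked(edges, num_vertices, parts=2, damping=0.85, iterations=20):
    """PageRank that visits the graph one edge chunk at a time."""
    out_degree = [0] * num_vertices
    for src, _ in edges:
        out_degree[src] += 1
    grid = partition_edges(edges, parts)
    rank = [1.0 / num_vertices] * num_vertices
    for _ in range(iterations):
        incoming = [0.0] * num_vertices
        # Each chunk touches one source block and one destination block,
        # so its working set stays small whatever the full graph size is.
        for chunk in grid.values():
            for src, dst in chunk:
                incoming[dst] += rank[src] / out_degree[src]
        # Rank held by vertices without out-edges is spread uniformly.
        dangling = sum(r for r, d in zip(rank, out_degree) if d == 0)
        base = (1.0 - damping) / num_vertices + damping * dangling / num_vertices
        rank = [base + damping * x for x in incoming]
    return rank


# Tiny example graph: a 3-cycle (0 -> 1 -> 2 -> 0) plus an extra edge 3 -> 0.
edges = [(0, 1), (1, 2), (2, 0), (3, 0)]
ranks = pagerank_chunked(edges, num_vertices=4)
```

In this toy version each chunk is just a Python list. In a design like the one the abstract describes, the corresponding unit would be a block of edges streamed from the host to the accelerator.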

Sahebi, A., Procaccini, M., Giorgi, R. (2025). HashGrid: An optimized architecture for accelerating graph computing on FPGAs. FUTURE GENERATION COMPUTER SYSTEMS, 162, 1-15 [10.1016/j.future.2024.107497].


Amin Sahebi;Marco Procaccini;Roberto Giorgi

2025-01-01



Year: 2025
Journal in which the work is published: FUTURE GENERATION COMPUTER SYSTEMS
Citation: Sahebi, A., Procaccini, M., Giorgi, R. (2025). HashGrid: An optimized architecture for accelerating graph computing on FPGAs. FUTURE GENERATION COMPUTER SYSTEMS, 162, 1-15 [10.1016/j.future.2024.107497].
Appears in types: 1.1 Journal article

Files in this item:

File	Size	Format
1-s2.0-S0167739X24004618-main.pdf (open access; description: Article; type: Publisher's PDF; license: Creative Commons)	2 MB	Adobe PDF

Documents in IRIS are protected by copyright and all rights are reserved, unless otherwise indicated.

Use this identifier to cite or link to this document: https://hdl.handle.net/11365/1277975