In this chapter we describe the Janus supercomputer, a massively parallel FPGA-based system optimized for the simulation of spin glasses, theoretical models that describe the behavior of glassy materials. The custom architecture of Janus was developed to meet the computational requirements of these models. Spin-glass simulations are performed using Monte Carlo methods, which lead to algorithms characterized by (1) intrinsic parallelism, which allows many Monte Carlo update engines to be implemented within a single FPGA; (2) a rather small data set (2 MByte) that can be stored on-chip, significantly boosting bandwidth and reducing latency; (3) the need to generate a large number of good-quality, long (≥ 32 bit) random numbers; (4) mostly integer arithmetic and bitwise logic operations. Careful tailoring of the architecture to the specific features of these algorithms has allowed us to embed up to 1024 special-purpose cores within just one FPGA, so that simulations of systems that would take centuries on conventional architectures can be performed in just a few months.
M., B.J., R. A., B.n., A., C., L. A., F., J. M., G.N., A., G.G., et al. (2013). A FPGA-based supercomputer for statistical physics: the weird case of Janus. In High-Performance Computing using FPGAs (pp. 481-506). New York: Springer [10.1007/978-1-4614-1791-0_16].
A FPGA-based supercomputer for statistical physics: the weird case of Janus
MAIORANO, Andrea
2013-01-01
https://hdl.handle.net/11365/1126652