Mannino, M., Peccerillo, B., Mondelli, A., Bartolini, S. (2023). Energy and Performance Improvements for Convolutional Accelerators Using Lightweight Address Translation Support. In CF '23: Proceedings of the 20th ACM International Conference on Computing Frontiers (pp. 84-90). New York: Association for Computing Machinery [10.1145/3587135.3592208].
Energy and Performance Improvements for Convolutional Accelerators Using Lightweight Address Translation Support
Mannino, Mirco; Peccerillo, Biagio; Bartolini, Sandro
2023-01-01
Abstract
The growing demand for deep learning applications has led to the design and development of several hardware accelerators to increase performance and energy efficiency. In particular, convolutional accelerators are among those receiving the most attention due to their applicability in many fields. Another aspect that is gaining increasing attention is the use of a shared virtual address space between processor and accelerators, which can provide several advantages such as programmability and security. However, a shared address space relies on a time-consuming IOMMU to satisfy address translation requests. In this work, we analyze convolutional workloads in convolutional accelerators, identifying the sensitivity of performance to IOMMU activity. Based on this analysis, we propose the use of dedicated accelerator registers (Translation Registers) to reduce costly IOMMU accesses. Translation Registers reduce execution time by about 20% and the energy consumption related to address translation by up to about 55%.

File | Type | License | Size | Format |
---|---|---|---|---|
3587135.3592208.pdf (not available) | Publisher's PDF | NOT PUBLIC - Private/restricted access | 2.55 MB | Adobe PDF |
Documents in IRIS are protected by copyright and all rights are reserved, unless otherwise indicated.
https://hdl.handle.net/11365/1240514