The growing demand for deep learning applications has driven the design and development of hardware accelerators that improve performance and energy efficiency. Convolutional accelerators in particular have received much of this attention because they apply to many fields. A second trend is the use of a virtual address space shared between the processor and its accelerators, which offers advantages in programmability and security. A shared address space, however, relies on an IOMMU to serve address translation requests, and those translations are time-consuming. In this work, we analyze convolutional workloads running on convolutional accelerators and identify how sensitive their performance is to IOMMU activity. Based on this analysis, we propose dedicated accelerator registers (Translation Registers) that reduce costly IOMMU accesses. Translation Registers reduce execution time by about 20% and the energy consumption related to address translation by up to about 55%.

Mannino, M., Peccerillo, B., Mondelli, A., Bartolini, S. (2023). Energy and Performance Improvements for Convolutional Accelerators Using Lightweight Address Translation Support. In CF '23: Proceedings of the 20th ACM International Conference on Computing Frontiers (pp.84-90). New York : Association for Computing Machinery [10.1145/3587135.3592208].

Energy and Performance Improvements for Convolutional Accelerators Using Lightweight Address Translation Support

Mannino, Mirco; Peccerillo, Biagio; Bartolini, Sandro
2023-01-01

2023
979-8-4007-0140-5

Documents in IRIS are protected by copyright and all rights are reserved, unless otherwise indicated.

Use this identifier to cite or link to this document: https://hdl.handle.net/11365/1240514