This paper addresses feedback-directed restructuring techniques tuned to Non-Uniform Cache Architectures (NUCA) in chip multiprocessors (CMPs) running multi-threaded applications. Access time to a NUCA cache depends on the location of the referenced block, so the locality and cache mapping of the application influence overall performance. We show techniques for altering the distribution of applications in the cache space so as to improve average memory access time. In CMPs running multi-threaded applications, the aggregated accesses (and locality) of the processors form the actual cache load and pose specific issues. We consider a number of Splash-2 and Parsec benchmarks on an 8-processor system and show that a relatively simple remapping algorithm improves the average Static-NUCA (SNUCA) cache access time by 5.5% and allows an SNUCA cache to surpass the performance of a more complex Dynamic-NUCA (DNUCA) for most benchmarks. We then present a more sophisticated remapping algorithm, relying on cache geometry information and on the access distribution statistics from individual processors, that reduces the average cache access time by 10.2% and is very stable across all benchmarks. © 2010 IEEE.
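The dependence of access time on block placement can be illustrated with a toy model. The sketch below is not the paper's remapping algorithm: it assumes a hypothetical 4-bank cache with one fixed latency per bank, a static mapping of block index modulo bank count, and a greedy policy that moves the most-accessed bank groups to the fastest banks. It also ignores that, in a CMP, the latency of a bank differs from processor to processor. All names and numbers are illustrative assumptions.

```python
from collections import Counter


def average_access_time(trace, bank_of, latency):
    """Mean latency in cycles over a trace of accessed block indices."""
    return sum(latency[bank_of(block)] for block in trace) / len(trace)


def greedy_remap(trace, n_banks, latency):
    """Map each statically assigned bank to a new bank.

    The bank groups with the most accesses are given the lowest-latency
    banks. This is a simple stand-in policy, assumed for illustration.
    """
    load = Counter(block % n_banks for block in trace)
    hot_first = sorted(range(n_banks), key=lambda b: -load[b])
    fast_first = sorted(range(n_banks), key=lambda b: latency[b])
    return dict(zip(hot_first, fast_first))


# Hypothetical per-bank latencies (cycles) and access trace.
latency = [2, 4, 6, 8]
trace = [3] * 6 + [2] * 3 + [0] * 1  # block 3 is hottest, in the slowest bank

remap = greedy_remap(trace, 4, latency)
baseline = average_access_time(trace, lambda b: b % 4, latency)
improved = average_access_time(trace, lambda b: remap[b % 4], latency)
print(baseline, improved)
```

In this toy trace the hottest block initially sits in the slowest bank, so moving it to the fastest bank lowers the average latency. The access counts play the role of the feedback (profiling) information mentioned in the abstract.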

Bartolini, S., Foglia, P., Solinas, M., Prete, A. (2010). Feedback-Driven Restructuring of Multi-threaded Applications for NUCA Cache Performance in CMPs. In IEEE International Symposium on Computer Architecture and High Performance Computing (pp. 87-94). IEEE Computer Society [10.1109/SBAC-PAD.2010.20].

Feedback-Driven Restructuring of Multi-threaded Applications for NUCA Cache Performance in CMPs

Bartolini, S.
2010-01-01

Year: 2010
ISBN: 9780769542164
Files in this item:
File: 05644962.pdf (not available)
Type: Post-print
License: NOT PUBLIC - private/restricted access
Size: 631.45 kB
Format: Adobe PDF

Documents in IRIS are protected by copyright and all rights are reserved, unless otherwise indicated.

Use this identifier to cite or link to this document: https://hdl.handle.net/11365/19112
Warning: the data displayed have not been validated by the university.