Bartolini, S., Castagnini, G., Martinelli, E. (2007). Inclusion of a Montgomery Multiplier Unit into an Embedded Processor's Datapath to Speed-up Elliptic Curve Cryptography. In Third International Symposium on Information Assurance and Security, 2007. IAS 2007 (pp. 95-100). New York: IEEE Computer Society [10.1109/IAS.2007.81].
Inclusion of a Montgomery Multiplier Unit into an Embedded Processor's Datapath to Speed-up Elliptic Curve Cryptography
BARTOLINI S.;MARTINELLI E.
2007-01-01
Abstract
This paper analyzes the effects of including a full-width GF(2^m) Montgomery multiplier within the datapath of an existing embedded processor, with the aim of speeding up Elliptic Curve Cryptography (ECC). The approach seeks to exploit the tight coupling between the new unit and the other processor modules while maintaining both software compatibility and the flexibility to adapt to different ECC parameters and algorithms. The work also examines how the interaction between the new unit and the other parts of the processor affects performance. We show that the modified ARM processor runs the critical ECC operation (kP) 9 times faster than pure software, and up to 14 times faster when using 3 units and optimized instruction scheduling. Moreover, the improved processor achieves the same performance with caches one quarter the size, thanks to a reduction of more than 93% in memory traffic.

File | Size | Format |
---|---|---|
04299757.pdf (not available; Type: Post-print; License: NOT PUBLIC - Private/restricted access) | 594.53 kB | Adobe PDF |
Documents in IRIS are protected by copyright and all rights are reserved, unless otherwise indicated.
https://hdl.handle.net/11365/3786