How Far Can We Go on the x64 Processors?
2006; Springer Science+Business Media; Linguagem: Inglês
10.1007/11799313_22
ISSN1611-3349
Autores Tópico(s)Coding theory and cryptography
ResumoThis paper studies the state-of-the-art software optimization methodology for symmetric cryptographic primitives on the new 64-bit x64 processors, AMD Athlon64 (AMD64) and Intel Pentium 4 (EM64T). We fully utilize newly introduced 64-bit registers and instructions for extracting maximal performance of target primitives. Our program of AES with 128-bit key runs in 170 cycles/block on Athlon 64, which is, as far as we know, the fastest implementation of AES on a PC processor.Also we implemented a “bitsliced” AES and Camellia for the first time, both of which achieved very good performance. A bitslice implementation is important from the viewpoint of a countermeasure against cache timing attacks because it does not require lookup tables with a key-dependent address. We also analyze performance of SHA256/512 and Whirlpool hash functions and show that SHA512 can run faster than SHA256 on Athlon 64. This paper exhibits an undocumented fact that 64-bit right shifts and 64-bit rotations are extremely slow on Pentium 4, which often leads to serious and unavoidable performance penalties in programming encryption primitives on this processor.
Referência(s)