For Gnuk, speeding up the RSA routine is worthwhile.
Last week, I improved it a bit. Digital signing with Gnuk took 1.78 seconds in version 0.12. With the change, it takes 1.72 seconds. (Measured with the time command on gpg --clearsign, so the figure includes calculation time on the host and communication time.)
Then, I improved it more. With that change, it takes 1.63 seconds.
Further, I improved it again. With today's change, it takes 1.54 seconds.
And one more improvement: with a Gnuk-specific version, it takes just 1.48 seconds.
To summarize the changes:
- Use UMULL (32-bit x 32-bit => 64-bit) instead of UMLAL (multiply and accumulate)
- Load and store more registers at a time using LDM and STM
- Use GCC inline assembly constraints for registers, condition codes, and memory
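
To make the first and third points concrete, here is a minimal sketch of one row of a multi-word multiply-accumulate built on a 32-bit x 32-bit => 64-bit product. It is not the actual Gnuk or PolarSSL code: the names mul32x32 and mul_add_row are hypothetical, and the inline assembly branch only illustrates how GCC constraints can describe a UMULL. On non-ARM hosts the plain C branch computes the same product.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: 32x32 => 64-bit product, split into two words. */
static inline void mul32x32(uint32_t a, uint32_t b,
                            uint32_t *lo, uint32_t *hi)
{
#if defined(__GNUC__) && defined(__arm__) && \
    (defined(__thumb2__) || !defined(__thumb__))
    uint32_t l, h;
    /* "=&r": early-clobber outputs, so they never share a register
       with the inputs. UMULL (without S) leaves the flags alone and
       touches no memory, so no "cc" or "memory" clobber is needed. */
    __asm__("umull %0, %1, %2, %3"
            : "=&r"(l), "=&r"(h)
            : "r"(a), "r"(b));
    *lo = l;
    *hi = h;
#else
    uint64_t p = (uint64_t)a * b;   /* portable equivalent of UMULL */
    *lo = (uint32_t)p;
    *hi = (uint32_t)(p >> 32);
#endif
}

/* Hypothetical inner loop: r[0..n-1] += a[0..n-1] * b.
   Returns the carry word that belongs in r[n]. */
static uint32_t mul_add_row(uint32_t *r, const uint32_t *a,
                            size_t n, uint32_t b)
{
    uint32_t carry = 0;
    for (size_t i = 0; i < n; i++) {
        uint32_t lo, hi;
        mul32x32(a[i], b, &lo, &hi);
        lo += carry;
        hi += (lo < carry);   /* carry out of the first addition */
        lo += r[i];
        hi += (lo < r[i]);    /* carry out of the second addition */
        r[i] = lo;
        carry = hi;
    }
    return carry;
}
```

The high word cannot overflow here, because a[i] * b + carry + r[i] always fits in 64 bits. A hand-tuned version would additionally fetch several words of a and r at once (the LDM/STM point), which this sketch leaves to the compiler.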
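Note that this is a 2048-bit RSA computation. Because of the CRT (Chinese Remainder Theorem), the exponentiation is done modulo the two 1024-bit primes, so the multiplications are 1024-bit by 1024-bit, that is, only 32 words of 32 bits each. For such a short size, Karatsuba (or any divide-and-conquer strategy) doesn't make sense, but tuning in assembly language is important.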
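
Why the CRT halves the operand size can be seen in a toy example. The sketch below uses a textbook-sized key (p = 61, q = 53, e = 17, d = 2753) and hypothetical names (modpow, toy_crt_key, rsa_crt); it only shows the structure of the computation and has nothing to do with the real bignum code.

```c
#include <stdint.h>

/* Square-and-multiply modular exponentiation (toy sizes only:
   mod must stay below 2^32 so the products fit in 64 bits). */
static uint64_t modpow(uint64_t base, uint64_t exp, uint64_t mod)
{
    uint64_t result = 1 % mod;
    base %= mod;
    while (exp) {
        if (exp & 1)
            result = result * base % mod;
        base = base * base % mod;
        exp >>= 1;
    }
    return result;
}

/* Hypothetical toy key: dp = d mod (p-1), dq = d mod (q-1),
   qinv = q^-1 mod p. */
typedef struct {
    uint64_t p, q, dp, dq, qinv;
} toy_crt_key;

/* Private-key operation with the CRT: two exponentiations modulo
   the half-size primes, then Garner's recombination. */
static uint64_t rsa_crt(const toy_crt_key *k, uint64_t m)
{
    uint64_t sp = modpow(m, k->dp, k->p);   /* m^dp mod p */
    uint64_t sq = modpow(m, k->dq, k->q);   /* m^dq mod q */
    uint64_t diff = (sp + k->p - sq % k->p) % k->p;
    uint64_t h = k->qinv * diff % k->p;
    return sq + h * k->q;                   /* equals m^d mod (p*q) */
}
```

With the key { 61, 53, 53, 49, 38 }, rsa_crt gives the same result as modpow(m, 2753, 3233). In the 2048-bit case p and q are 1024 bits each, so every multiplication inside the two exponentiations is 1024-bit by 1024-bit.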
Here is my ticket: http://polarssl.org/trac/ticket/26