23b7db74d0
I hope that neither this nor the i386 version of it will be needed. Currently, though, this is about 16 cycles (36%) faster than the C version, and the i386 version is about 8 cycles (19%) faster, due to poor optimization of the C version.