rijndael_blockDecrypt() as both input and output.
This property is important because inside rijndael we can get away
with allocating just a 16 byte "work" buffer on the stack (which
is very cheap), whereas the calling code would need to allocate the
full sized buffer, and in all likelyhood would have to do so with
an expensive malloc(9).