f03be66539
In my eagerness to eliminate a branch which is taken once per 2^38 bytes of keystream, I forgot that the state words are in host order. Thus, the counter increment code worked fine on little-endian machines, but not on big-endian ones. Switch to a simpler (branchful) solution.