7f75b60ff7
manipulation away from the length comparison. Measurements on beast.cdrom.com show a more than 3x improvement over the original code for large block sizes, putting performance on par with the optimized assembly code in libc.