c96e16b3a6
multiples of 8. Then the misaligned pixels at the end were not copied.

Clean up variable misuse related to this bug. The width in bytes was first calculated correctly and used to do the complicated reblocking correctly, but it was stored in an unrelated scratch variable and later recalculated with an off-by-1 error, so the last byte (times 4 planes) in the intermediate copy was not copied.

This doubly-misaligned case is especially slow. Misalignment complicates the reblocking, and each misalignment requires a read before write, and this read is still not done from the shadow buffer.
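
For illustration only, here is a minimal sketch of how a width in bytes can come out one short when both ends of a pixel span are misaligned. It assumes 8 pixels per byte in each of 4 planes. The names span_bytes, naive_bytes, total_bytes and NPLANES are hypothetical and are not taken from the driver, and the formulas are one plausible way to get this class of error, not the actual code that was fixed.

```c
#include <assert.h>

#define	NPLANES	4	/* assumption: 4 bit planes, 8 pixels per byte */

/*
 * Hypothetical helper: number of bytes per plane touched by the pixel
 * span [x, x + w).  Both the start and the end may be misaligned.
 */
static int
span_bytes(int x, int w)
{
	assert(x >= 0 && w >= 0);
	/* Index one past the last byte touched, minus the first byte index. */
	return ((x + w + 7) / 8 - x / 8);
}

/*
 * Hypothetical recalculation that ignores the starting misalignment.
 * It is right for aligned starts, but can be one byte short otherwise.
 */
static int
naive_bytes(int w)
{
	return ((w + 7) / 8);
}

/* Bytes to copy in an intermediate buffer holding all planes. */
static int
total_bytes(int x, int w)
{
	return (NPLANES * span_bytes(x, w));
}
```

With x = 7 and w = 10 the span covers pixels 7 through 16, which touch bytes 0, 1 and 2, so span_bytes(7, 10) is 3 while naive_bytes(10) is 2. The missing byte is the last one in each plane, or 4 bytes in total. Calculating the width once and keeping it in a dedicated variable avoids having two formulas that can disagree.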