Bruce Evans ddac85e5a0 Further unobfuscate the method of drawing the mouse cursor in vga planar
mode.

Don't manually unroll the 2 inner loops.  On Haswell, doing so gave a
speedup of about 0.5% (about 4 cycles per iteration out of 1400), but
hard-coded a limit of width 9 and made better optimizations
harder to see.  gcc-4.2.1 -O does the unrolling anyway, unless tricked
with a volatile hack.  gcc's unrolling is not very good and gives a
speedup of about half as much (about 2 cycles per iteration).  (All
timing on i386.)

Manual unrolling was only feasible because the inner loop only iterates
once or twice.  Usually twice, but a dynamic check is needed to decide,
and was not moved from the second-innermost loop manually or by gcc.
This commit basically adds another dynamic check in the inner loop.
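
The trade-off can be sketched as below.  This is a simplified illustration,
not the syscons code: the function names, the byte-wise OR and the `nbytes`
parameter are all assumptions made for the example.  The unrolled form
hard-codes a limit of 2 iterations; the loop form pays for one extra dynamic
check per iteration but works for any count.

```c
#include <stddef.h>
#include <stdint.h>

/*
 * Hypothetical manually unrolled form: handles at most 2 bytes per row.
 * The caller decides dynamically whether the second byte is needed.
 */
static void
draw_row_unrolled(uint8_t *dst, const uint8_t *mask, size_t nbytes)
{
	dst[0] |= mask[0];
	if (nbytes > 1)		/* dynamic check: the row spills into byte 2 */
		dst[1] |= mask[1];
}

/*
 * Hypothetical plain loop: the bound is checked on every iteration,
 * so nothing limits the row to 2 bytes.
 */
static void
draw_row_loop(uint8_t *dst, const uint8_t *mask, size_t nbytes)
{
	size_t i;

	for (i = 0; i < nbytes; i++)
		dst[i] |= mask[i];
}
```

Both forms give the same result for 1 or 2 bytes; only the loop form extends
to wider rows without further changes.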

Cursor widths of 10-17 require 3 iterations in the inner loop and this
is not so easy to unroll -- even gcc stops at 2.
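
The iteration counts above are consistent with simple arithmetic, assuming
the inner loop steps once per byte of a pixel row that the cursor touches
(planar mode packs 8 pixels into each byte of a plane).  The helper below is
hypothetical and only illustrates that count; `xoff` is the assumed pixel
offset (0..7) of the cursor's left edge within its first byte.

```c
/*
 * Hypothetical helper: number of bytes in one pixel row touched by a
 * cursor `width` pixels wide whose left edge is `xoff` pixels (0..7)
 * into a byte.  This is ceil((xoff + width) / 8).
 */
static unsigned
cursor_row_bytes(unsigned xoff, unsigned width)
{
	return ((xoff + width + 7) / 8);
}
```

Under that assumption a width of 9 never needs more than 2 bytes (7 + 9 = 16
bits), while widths of 10-17 can need 3 (7 + 10 = 17 bits up to 7 + 17 = 24
bits).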
2017-04-14 12:03:34 +00:00