1f04771fa4
gives a tiny but hopefully always free optimization in the 2 quadrants to which it applies. On Athlons, it reduces maximum latency by 4 cycles in these quadrants but has usually has a smaller effect on total time (typically ~2 cycles (~5%), but sometimes 8 cycles when the compiler generates poor code).