a preference for memory load instructions over large code footprints
with embedded immediate values.
On amd64 CPUs from 2007-2008 there is no significant change, but
amd64 CPUs from 2009-2010 get roughly 10% more throughput with this
code; amd64 CPUs from 2011-2012 get roughly 15% more throughput; and
amd64 CPUs from 2013-2015 get 20-25% more throughput. The Raspberry
Pi 2 increases its throughput by 6-8%.
Sponsored by: Tarsnap Backup Inc.
Performance tested by: allanjude
MFC after: 3 weeks