Jake Burkholder 6412c65cf0 Add optimized block copy and zero functions using VIS instructions, which
can do 64 bytes at a time and don't allocate lines in the L2 cache.  These
assume that everything is 64 byte aligned, and that there's more than 128
bytes of data (best for whole pages).  The block load and store instructions
don't follow normal memory ordering rules and require either a memory barrier
or move between registers before the data can actually be used.  This
implementation correctly shuffles around 3 out of the 4 sets of registers
in order to avoid memory barriers except for the last 2 blocks.
2003-04-03 18:43:40 +00:00