6b1de9ceec
The new bzero() and bcopy() routines use the FPU to do 64-bit stores. They also use i586-optimized instruction ordering, i586-optimized cache management and a couple of other tricks. They should work on any i*86 with a hardware FPU, but are slower on at least i386s and i486s. They come close to saturating the memory bus on i586s. bzero() can maintain a 3-3-3-3 burst cycle to 66 MHz non-EDO main memory on a P133 (but is too slow to keep up with a 2-2-2-2 burst cycle for EDO; someone with EDO should fix this). bcopy() is several cycles short of keeping up with a 3-3-3-3 cycle for writing. For a P133 writing to 66 MHz main memory, it just manages an N-3-3-3, 3-3-3-3 pair of burst cycles, where N is typically 6.

The new routines are not used by default. They are always configured and can be enabled at runtime by using a debugger or an LKM to change their function pointer, or at compile time using new options (see another log message).

Removed old, dead i586_bzero() and i686_bzero(). Read-before-write is usually bad for i586s. It doubles the memory traffic unless the data is already cached. Data is (or should be) very rarely cached for large bzero()s (the system should prefer uncached pages for cleaning), and the amount of data handled by small bzero()s is relatively small in the kernel.

Improved comments about overlapping copies.

Removed unused #include.
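
For reference, the runtime switch mentioned above (changing the routines' function pointer) can be pictured roughly as follows. This is only a user-space sketch under stated assumptions: the names bzero_fn, bzero_vector, generic_bzero, wide_bzero and sketch_bzero are hypothetical and are not the kernel's symbols, and wide_bzero uses plain 64-bit integer stores as a stand-in for the FPU stores, without any of the i586 instruction ordering or cache management.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical names throughout; not the kernel's actual symbols. */
typedef void bzero_fn(void *buf, size_t len);

/* Baseline variant: clear one byte at a time. */
static void generic_bzero(void *buf, size_t len)
{
    unsigned char *p = buf;

    while (len-- > 0)
        *p++ = 0;
}

/*
 * Stand-in for the optimized variant: clear the bulk with 64-bit
 * stores, then finish the tail byte by byte.  The real routines use
 * the FPU for the 64-bit stores; this sketch does not.
 */
static void wide_bzero(void *buf, size_t len)
{
    unsigned char *p = buf;
    const uint64_t zero = 0;

    while (len >= sizeof zero) {
        memcpy(p, &zero, sizeof zero);  /* alignment-safe 64-bit store */
        p += sizeof zero;
        len -= sizeof zero;
    }
    while (len-- > 0)
        *p++ = 0;
}

/* The default stays the generic routine until the pointer is changed. */
static bzero_fn *bzero_vector = generic_bzero;

/* Callers always go through the pointer. */
static void sketch_bzero(void *buf, size_t len)
{
    (*bzero_vector)(buf, len);
}
```

Assigning bzero_vector = wide_bzero at runtime plays the role of the debugger or LKM poke; callers of sketch_bzero() need no change either way.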