The block cache implementation in loader has proven to be almost useless, and in worst case even slowing down the disk reads due to insufficient cache size and extra memory copy.
Also the current cache implementation does not cache reads from CDs, or work with zfs built on top of multiple disks.
Instead of an LRU, this code uses a simple hash (O(1) read from cache), and instead of a single global cache, a separate cache per block device.
The cache also implements limited read-ahead to increase performance.
To simplify read ahead management, the read ahead will not wrap over bcache end, so in worst case, single block physical read will be performed to fill the last block in bcache.
Booting from a virtual CD over IPMI:
0ms latency, before: 27 second, after: 7 seconds
60ms latency, before: over 12 minutes, after: under 5 minutes.
Submitted by: Toomas Soome <tsoome@me.com>
Reviewed by: delphij (previous version), emaste (previous version)
Relnotes: yes
Differential Revision: https://reviews.freebsd.org/D4713
only happen on every Nth call. Update the existing twiddle() calls done in
various IO loops to roughly reflect the relative IO sizes. That is, tftp
and nfs call twiddle() on every 1K block, ufs on every filesystem block,
so the network calls now use a much larger divisor than disk IO calls.
Also add a new twiddle_divisor() function that allows an application to set
a global divisor that is applied on top of the per-call divisors. Nothing
calls this yet, but loader(8) will be using it to further throttle the
cursor for slow serial consoles.