Parallelize the buffer cache and rewrite getnewbuf().  This results in an
8x performance improvement in a micro benchmark on a 4 socket machine.

 - Get buffer headers from a per-cpu uma cache that sits in front of the
   free queue.
 - Use a per-cpu quantum cache in vmem to eliminate contention for kva.
 - Use multiple clean queues according to buffer cache size to eliminate
   clean queue lock contention.
 - Introduce a bufspace daemon that attempts to prevent getnewbuf() callers
   from blocking or doing direct recycling.
 - Close some bufspace allocation races that could lead to endless
   recycling.
 - Further the transition to a more modern style of small functions grouped
   by prefix in order to manage growing complexity.
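
The quantum cache item above comes down to one sizing decision made when
the buffer arena is created.  A minimal standalone sketch of that rule
follows.  It mirrors the expression passed to vmem_init() in this change,
but sketch_qcache_max() and SKETCH_BKVASIZE are hypothetical names used
only for illustration, and the 16384 value is an assumed stand-in for the
kernel's BKVASIZE.

```c
#include <assert.h>
#include <stddef.h>

/* Assumed placeholder; in the kernel BKVASIZE comes from <sys/param.h>. */
#define SKETCH_BKVASIZE ((size_t)16384)

/*
 * Return the largest allocation size the arena's quantum cache should
 * serve, or 0 to leave the quantum cache disabled.  The cache is only
 * enabled with more than 4 cpus, trading some fragmentation for less
 * lock contention.
 */
static size_t
sketch_qcache_max(int ncpus)
{
	assert(ncpus > 0);
	return (ncpus > 4 ? SKETCH_BKVASIZE * 8 : 0);
}
```

With these assumed values a 4 cpu machine leaves the quantum cache off,
while an 8 cpu machine caches allocations of up to eight buffer-sized
chunks.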

Sponsored by:	EMC / Isilon
Reviewed by:	kib
Tested by:	pho
jeff 2015-10-14 02:10:07 +00:00
parent c54a8cc39c
commit 4402204d47
2 changed files with 719 additions and 579 deletions

@@ -74,6 +74,7 @@ __FBSDID("$FreeBSD$");
 #include <sys/sysctl.h>
 #include <sys/systm.h>
 #include <sys/selinfo.h>
+#include <sys/smp.h>
 #include <sys/pipe.h>
 #include <sys/bio.h>
 #include <sys/buf.h>
@@ -229,12 +230,15 @@ vm_ksubmap_init(struct kva_md_info *kmi)
 /*
 * Allocate the buffer arena.
+*
+* Enable the quantum cache if we have more than 4 cpus. This
+* avoids lock contention at the expense of some fragmentation.
 */
 size = (long)nbuf * BKVASIZE;
 kmi->buffer_sva = firstaddr;
 kmi->buffer_eva = kmi->buffer_sva + size;
 vmem_init(buffer_arena, "buffer arena", kmi->buffer_sva, size,
-PAGE_SIZE, 0, 0);
+PAGE_SIZE, (mp_ncpus > 4) ? BKVASIZE * 8 : 0, 0);
 firstaddr += size;
 /*