Parallelize the buffer cache and rewrite getnewbuf().  This results in an
8x performance improvement in a micro benchmark on a 4 socket machine.

 - Get buffer headers from a per-cpu uma cache that sits in front of the
   free queue.
 - Use a per-cpu quantum cache in vmem to eliminate contention for kva.
 - Use multiple clean queues according to buffer cache size to eliminate
   clean queue lock contention.
 - Introduce a bufspace daemon that attempts to prevent getnewbuf()
   callers from blocking or doing direct recycling.
 - Close some bufspace allocation races that could lead to endless
   recycling.
 - Further the transition to a more modern style of small functions
   grouped by prefix in order to manage growing complexity.

Sponsored by:	EMC / Isilon
Reviewed by:	kib
Tested by:	pho
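
The quantum cache item in the message corresponds to the new argument passed to vmem_init() in the vm_init.c hunk. As a rough, standalone sketch of that sizing decision (not kernel code): the function names and the EX_BKVASIZE value below are hypothetical and chosen only for illustration, and the eligibility check is a simplified model that ignores any additional limits the real vmem implementation may apply.

```c
#include <assert.h>
#include <stddef.h>

/* Illustrative value only; the real BKVASIZE is defined by the kernel. */
#define EX_BKVASIZE	(16UL * 1024UL)

/*
 * Hypothetical helper mirroring the expression used in the diff:
 * enable a quantum cache covering up to 8 * BKVASIZE when there are
 * more than 4 cpus, otherwise disable it by returning 0.
 */
size_t
ex_buffer_qcache_max(int ncpus, size_t bkvasize)
{
	assert(ncpus > 0);
	return ((ncpus > 4) ? bkvasize * 8 : 0);
}

/*
 * Simplified model (an assumption, not the vmem code): a request can be
 * served from the per-cpu caches only if the cache is enabled and the
 * request is no larger than the configured maximum.
 */
int
ex_served_by_qcache(size_t size, size_t qcache_max)
{
	return (qcache_max != 0 && size <= qcache_max);
}
```

Under this model, a 4-cpu machine keeps the old behaviour (every kva allocation takes the arena lock), while a larger machine trades some fragmentation for cached, lock-free allocations of common buffer sizes.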
2015-10-14 02:10:07 +00:00
commit 4402204d47
parent c54a8cc39c
2 changed files with 719 additions and 579 deletions
--- a/sys/kern/vfs_bio.c
+++ b/sys/kern/vfs_bio.c
--- a/sys/vm/vm_init.c
+++ b/sys/vm/vm_init.c
@@ -74,6 +74,7 @@ __FBSDID("$FreeBSD$");
 #include <sys/sysctl.h>
 #include <sys/systm.h>
 #include <sys/selinfo.h>
+#include <sys/smp.h>
 #include <sys/pipe.h>
 #include <sys/bio.h>
 #include <sys/buf.h>
@@ -229,12 +230,15 @@ vm_ksubmap_init(struct kva_md_info *kmi)

 	/*
 	 * Allocate the buffer arena.
+	 *
+	 * Enable the quantum cache if we have more than 4 cpus.  This
+	 * avoids lock contention at the expense of some fragmentation.
 	 */
 	size = (long)nbuf * BKVASIZE;
 	kmi->buffer_sva = firstaddr;
 	kmi->buffer_eva = kmi->buffer_sva + size;
 	vmem_init(buffer_arena, "buffer arena", kmi->buffer_sva, size,
-	    PAGE_SIZE, 0, 0);
+	    PAGE_SIZE, (mp_ncpus > 4) ? BKVASIZE * 8 : 0, 0);
 	firstaddr += size;

 	/*