pfsync: Performance improvement

pfsync code is called for every new state, state update and state deletion in pf. While pf itself can operate on multiple states at the same time (on different cores, assuming the states hash to a different hashrow), pfsync only had a single lock. This greatly reduced throughput on multicore systems.

Address this by splitting the pfsync queues into buckets, based on the state id. This ensures that updates for a given connection always end up in the same bucket, which allows pfsync to still collapse multiple updates into one, while allowing multiple cores to proceed at the same time.

The number of buckets is tunable, but defaults to 2 x number of cpus.

Benchmarking has shown improvement, depending on hardware and setup, from ~30% to ~100%.

MFC after: 1 week
Sponsored by: Orange Business Services
Differential Revision: https://reviews.freebsd.org/D18373
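
The bucketing idea in the commit message can be illustrated with a minimal sketch. This is not the code from if_pfsync.c: the function names and the plain modulo mapping are assumptions made for illustration. It only shows why a mapping that depends solely on the state id sends every update for one state to the same bucket, which is what keeps update collapsing possible while different states proceed on different cores.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Illustrative sketch only. All names here are hypothetical and do not
 * come from if_pfsync.c.
 */

/* Default bucket count as described in the commit message: 2 x CPUs. */
static unsigned
default_bucket_count(unsigned ncpus)
{
	return (2 * ncpus);
}

/*
 * Map a state id to a bucket index. The plain modulo is an assumption;
 * any function of the state id alone would have the same property:
 * the same state always lands in the same bucket.
 */
static unsigned
bucket_for_state(uint64_t state_id, unsigned nbuckets)
{
	assert(nbuckets > 0);
	return ((unsigned)(state_id % nbuckets));
}
```

In a design along these lines each bucket would carry its own lock and queues, so two cores contend only when their states map to the same bucket.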
2018-12-06 19:27:15 +00:00
commit 4fc65bcbe3
parent e206dc6479
2 changed files with 337 additions and 236 deletions
--- a/share/man/man4/pfsync.4
+++ b/share/man/man4/pfsync.4
@@ -26,7 +26,7 @@
 .\"
 .\" $FreeBSD$
 .\"
-.Dd August 18, 2017
+.Dd December 6, 2018
 .Dt PFSYNC 4
 .Os
 .Sh NAME
@@ -130,6 +130,13 @@ See
 .Xr carp 4
 for more information.
 Default value is 240.
+.It Va net.pfsync.pfsync_buckets
+The number of
+.Nm
+buckets.
+This affects the performance and memory tradeoff.
+Defaults to twice the number of CPUs.
+Change only if benchmarks show this helps on your workload.
 .El
 .Sh EXAMPLES
 .Nm
--- a/sys/netpfil/pf/if_pfsync.c
+++ b/sys/netpfil/pf/if_pfsync.c