pfsync: Performance improvement

pfsync code is called for every new state, state update and state
deletion in pf. While pf itself can operate on multiple states at the
same time (on different cores, assuming the states hash to different
hash rows), pfsync only had a single lock. This greatly reduced
throughput on multicore systems.

Address this by splitting the pfsync queues into buckets, based on the
state id. This ensures that updates for a given connection always end up
in the same bucket, which allows pfsync to still collapse multiple
updates into one, while allowing multiple cores to proceed at the same
time.
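
As a rough illustration of the idea above (not the code from this
commit), the sketch below picks a bucket from the state id alone and
collapses repeated updates for the same state. All names here
(bucket_for, queue_update, struct bucket, MAX_PENDING) are hypothetical,
and the modulo mapping is an assumption; the real implementation may
hash differently and would protect each bucket with its own lock.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define MAX_PENDING 16	/* hypothetical per-bucket queue capacity */

/* Hypothetical bucket; a real design would give each one its own lock. */
struct bucket {
	uint64_t pending[MAX_PENDING];	/* state ids with a queued update */
	size_t   npending;		/* number of valid entries in pending[] */
};

/* Pick a bucket from the state id alone, so a state never changes bucket. */
static size_t
bucket_for(uint64_t state_id, size_t nbuckets)
{
	assert(nbuckets > 0);
	return (size_t)(state_id % nbuckets);
}

/*
 * Queue an update for a state. Returns true if a new entry was added,
 * false if the update was collapsed into one that is already pending
 * (or if the bucket is full, where a real queue would flush instead).
 */
static bool
queue_update(struct bucket *buckets, size_t nbuckets, uint64_t state_id)
{
	struct bucket *b = &buckets[bucket_for(state_id, nbuckets)];

	for (size_t i = 0; i < b->npending; i++)
		if (b->pending[i] == state_id)
			return false;	/* collapsed into the pending update */
	if (b->npending == MAX_PENDING)
		return false;	/* full */
	b->pending[b->npending++] = state_id;
	return true;
}
```

Because the bucket depends only on the state id, two updates for the
same connection always meet in the same queue and can be merged, while
updates for states in different buckets never touch shared data.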

The number of buckets is tunable (net.pfsync.pfsync_buckets), but
defaults to twice the number of CPUs. Benchmarking has shown
improvements of ~30% to ~100%, depending on hardware and setup.

MFC after:	1 week
Sponsored by:	Orange Business Services
Differential Revision:	https://reviews.freebsd.org/D18373
Kristof Provost 2018-12-06 19:27:15 +00:00
parent e206dc6479
commit 4fc65bcbe3
2 changed files with 337 additions and 236 deletions

@@ -26,7 +26,7 @@
 .\"
 .\" $FreeBSD$
 .\"
-.Dd August 18, 2017
+.Dd December 6, 2018
 .Dt PFSYNC 4
 .Os
 .Sh NAME
@@ -130,6 +130,13 @@ See
 .Xr carp 4
 for more information.
 Default value is 240.
+.It Va net.pfsync.pfsync_buckets
+The number of
+.Nm
+buckets.
+This affects the performance and memory tradeoff.
+Defaults to twice the number of CPUs.
+Change only if benchmarks show this helps on your workload.
 .El
 .Sh EXAMPLES
 .Nm

File diff suppressed because it is too large.