freebsd-skq/sys
Andrew Gallatin a034518ac8 Filter TCP connections to SO_REUSEPORT_LB listen sockets by NUMA domain
In order to efficiently serve web traffic on a NUMA
machine, one must avoid as many NUMA domain crossings as
possible. With SO_REUSEPORT_LB, a number of workers can share a
listen socket. However, even if a worker sets affinity to a core
or set of cores on a NUMA domain, it will receive connections
associated with all NUMA domains in the system. This will lead to
cross-domain traffic when the server writes to the socket or
calls sendfile(), and memory is allocated on the server's local
NUMA node, but transmitted on the NUMA node associated with the
TCP connection. Similarly, when the server reads from the socket,
it will likely be reading memory allocated on the NUMA domain
associated with the TCP connection.

This change provides a new socket option, TCP_REUSPORT_LB_NUMA. A
server can now tell the kernel to filter traffic so that only
incoming connections associated with the desired NUMA domain are
given to the server. (Of course, in the case where there are no
servers sharing the listen socket on some domain, then as a
fallback, traffic will be hashed as normal to all servers sharing
the listen socket regardless of domain.) This allows a server to
deal only with traffic that is local to its NUMA domain, and
avoids cross-domain traffic in most cases.
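
As a rough illustration of how a worker might request this filtering, the
sketch below enables SO_REUSEPORT_LB and then sets TCP_REUSPORT_LB_NUMA on a
socket. The helper name enable_numa_filter() is hypothetical and is not part of
this change. The sketch assumes the option is set at the IPPROTO_TCP level with
an integer NUMA domain as its argument, and that in a real server
SO_REUSEPORT_LB is enabled before bind() and the NUMA option is applied to the
listen socket afterwards; check tcp(4) for the authoritative interface. On
systems that do not define these options, the helper simply fails.

```c
#include <sys/types.h>
#include <sys/socket.h>
#include <netinet/in.h>
#include <netinet/tcp.h>
#include <errno.h>

/*
 * Hypothetical helper (not from the commit): ask the kernel to deliver only
 * connections associated with NUMA domain `domain` to socket `s`.
 * Returns 0 on success, -1 with errno set on failure.
 */
int
enable_numa_filter(int s, int domain)
{
#if defined(SO_REUSEPORT_LB) && defined(TCP_REUSPORT_LB_NUMA)
	int on = 1;

	/* Let several workers share the same listening address and port. */
	if (setsockopt(s, SOL_SOCKET, SO_REUSEPORT_LB, &on, sizeof(on)) == -1)
		return (-1);

	/* Assumption: the argument is an int naming the NUMA domain. */
	return (setsockopt(s, IPPROTO_TCP, TCP_REUSPORT_LB_NUMA,
	    &domain, sizeof(domain)));
#else
	/* Built on a system without these options: report it via errno. */
	(void)s;
	(void)domain;
	errno = ENOPROTOOPT;
	return (-1);
#endif
}
```

A worker pinned to cores on domain 0 would call enable_numa_filter(fd, 0) on
its listen socket and treat a -1 return as "no filtering available", leaving
the normal hashing across all workers in place.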

This patch, and a corresponding small patch to nginx to use
TCP_REUSPORT_LB_NUMA, allow us to serve 190Gb/s of kTLS-encrypted
HTTPS media content from dual-socket Xeons with only 13% cross-domain
traffic on the memory controller (as measured by pcm.x).

Reviewed by:	jhb, bz (earlier version), bcr (man page)
Tested by: gonzo
Sponsored by:	Netflix
Differential Revision:	https://reviews.freebsd.org/D21636
2020-12-19 22:04:46 +00:00
..
amd64 Skip the vm.pmap.kernel_maps sysctl by default. 2020-12-18 20:41:23 +00:00
arm arm: Remove Samsung Exynos port 2020-12-17 17:09:43 +00:00
arm64 Skip the vm.pmap.kernel_maps sysctl by default. 2020-12-18 20:41:23 +00:00
bsm
cam mmccam: Convert some printf to CAM_DEBUG 2020-11-30 14:49:13 +00:00
cddl Check that the frame pointer is within the current stack. 2020-12-08 18:00:58 +00:00
compat Add ELF flag to disable ASLR stack gap. 2020-12-18 23:14:39 +00:00
conf Use a template assembly file for firmware object files. 2020-12-17 20:31:17 +00:00
contrib Make MAXPHYS tunable. Bump MAXPHYS to 1M. 2020-11-28 12:12:51 +00:00
crypto Revert r366943. It did not work as expected. 2020-12-11 00:42:53 +00:00
ddb Add a kstack_contains() helper function. 2020-12-01 17:04:46 +00:00
dev Ensure a minimum packet length before creating a mbuf in if_ure. 2020-12-19 11:03:54 +00:00
dts Brand our DTS with the Linux version it was imported from 2020-10-10 07:18:51 +00:00
fs VFS_QUOTACTL: Remove needless casts of arg 2020-12-17 21:58:10 +00:00
gdb gdb(4): Don't escape GDB special characters at application layer 2020-09-30 14:55:54 +00:00
geom Make MAXPHYS tunable. Bump MAXPHYS to 1M. 2020-11-28 12:12:51 +00:00
gnu Brand our DTS with the Linux version it was imported from 2020-10-10 07:18:51 +00:00
i386 Skip the vm.pmap.kernel_maps sysctl by default. 2020-12-18 20:41:23 +00:00
isa
kern Optionally bind ktls threads to NUMA domains 2020-12-19 21:46:09 +00:00
kgssapi State kgssapi dependency on xdr. 2020-09-17 22:29:38 +00:00
libkern arc4random(9): Integrate with RANDOM_FENESTRASX push-reseed 2020-10-10 21:48:06 +00:00
mips mips: Fix sub-word atomics implementation 2020-12-14 00:47:59 +00:00
modules Make non-debug kernels installable. 2020-12-17 14:20:36 +00:00
net Switch direct rt fields access in rtsock.c to newly-created field accessors. 2020-12-18 22:00:57 +00:00
net80211 net80211: fix a typo 2020-11-04 12:07:33 +00:00
netgraph [ng_socket] Don't take the SOCKBUF_LOCK() twice in the RX data path. 2020-12-17 18:15:07 +00:00
netinet Filter TCP connections to SO_REUSEPORT_LB listen sockets by NUMA domain 2020-12-19 22:04:46 +00:00
netinet6 Filter TCP connections to SO_REUSEPORT_LB listen sockets by NUMA domain 2020-12-19 22:04:46 +00:00
netipsec Trigger soft lifetime expiration on sequence number 2020-10-16 11:27:01 +00:00
netpfil Fix LINT-NOINET6 build after r368571. 2020-12-14 22:54:32 +00:00
netsmb net: clean up empty lines in .c and .h files 2020-09-01 21:19:14 +00:00
nfs nfs: clean up empty lines in .c and .h files 2020-09-01 21:25:39 +00:00
nfsclient nfs: clean up empty lines in .c and .h files 2020-09-01 21:25:39 +00:00
nfsserver nfs: Mark unused statistics variable as reserved 2020-11-18 04:35:49 +00:00
nlm nlm: clean up empty lines in .c and .h files 2020-09-01 22:14:52 +00:00
ofed Fix for referencing file via its vnode in ibcore. 2020-11-02 10:44:29 +00:00
opencrypto Remove the cloned file descriptors for /dev/crypto. 2020-11-25 00:10:54 +00:00
powerpc Enable ROUTE_MPATH support in GENERIC kernels. 2020-12-14 22:23:08 +00:00
riscv Skip the vm.pmap.kernel_maps sysctl by default. 2020-12-18 20:41:23 +00:00
rpc Fix a potential memory leak in the NFS over TLS handling code. 2020-09-05 00:50:52 +00:00
security audit: rework AUDIT_SYSCLOSE 2020-12-17 18:52:04 +00:00
sys Add ELF flag to disable ASLR stack gap. 2020-12-18 23:14:39 +00:00
teken Do a sweep and remove most WARNS=6 settings 2020-10-01 01:10:51 +00:00
tests Add small tool to invoke kernel test framework tests. 2020-09-02 09:20:40 +00:00
tools Use a template assembly file for firmware object files. 2020-12-17 20:31:17 +00:00
ufs ffs: quiet -Wstrict-prototypes 2020-12-11 22:51:57 +00:00
vm Revert r368523 which fixed contig allocs waiting forever. 2020-12-15 19:38:16 +00:00
x86 dmar: reserve memory windows of PCIe root port 2020-12-09 18:43:58 +00:00
xdr xdr: clean up empty lines in .c and .h files 2020-09-01 22:13:28 +00:00
xen xen: clean up empty lines in .c and .h files 2020-09-01 21:21:55 +00:00
Makefile