Commit Graph

273666 Commits

Author SHA1 Message Date
gallatin
30bdda0a92 Filter TCP connections to SO_REUSEPORT_LB listen sockets by NUMA domain
In order to efficiently serve web traffic on a NUMA
machine, one must avoid as many NUMA domain crossings as
possible. With SO_REUSEPORT_LB, a number of workers can share a
listen socket. However, even if a worker sets affinity to a core
or set of cores on a NUMA domain, it will receive connections
associated with all NUMA domains in the system. This will lead to
cross-domain traffic when the server writes to the socket or
calls sendfile(), and memory is allocated on the server's local
NUMA node, but transmitted on the NUMA node associated with the
TCP connection. Similarly, when the server reads from the socket,
he will likely be reading memory allocated on the NUMA domain
associated with the TCP connection.

This change provides a new socket ioctl, TCP_REUSPORT_LB_NUMA. A
server can now tell the kernel to filter traffic so that only
incoming connections associated with the desired NUMA domain are
given to the server. (Of course, in the case where there are no
servers sharing the listen socket on some domain, then as a
fallback, traffic will be hashed as normal to all servers sharing
the listen socket regardless of domain). This allows a server to
deal only with traffic that is local to its NUMA domain, and
avoids cross-domain traffic in most cases.

This patch, and a corresponding small patch to nginx to use
TCP_REUSPORT_LB_NUMA allows us to serve 190Gb/s of kTLS encrypted
https media content from dual-socket Xeons with only 13% (as
measured by pcm.x) cross domain traffic on the memory controller.

Reviewed by:	jhb, bz (earlier version), bcr (man page)
Tested by: gonzo
Sponsored by:	Netfix
Differential Revision:	https://reviews.freebsd.org/D21636
2020-12-19 22:04:46 +00:00
gallatin
e41ad4add5 Optionally bind ktls threads to NUMA domains
When ktls_bind_thread is 2, we pick a ktls worker thread that is
bound to the same domain as the TCP connection associated with
the socket. We use roughly the same code as netinet/tcp_hpts.c to
do this. This allows crypto to run on the same domain as the TCP
connection is associated with. Assuming TCP_REUSPORT_LB_NUMA
(D21636) is in place & in use, this ensures that the crypto source
and destination buffers are local to the same NUMA domain as we're
running crypto on.

This change (when TCP_REUSPORT_LB_NUMA, D21636, is used) reduces
cross-domain traffic from over 37% down to about 13% as measured
by pcm.x on a dual-socket Xeon using nginx and a Netflix workload.

Reviewed by:	jhb
Sponsored by:	Netflix
Differential Revision:	https://reviews.freebsd.org/D21648
2020-12-19 21:46:09 +00:00
gbe
5a4b111b5d libc: Fix most issues reported by mandoc
- varios "new sentence, new line" warnings
- varios "sections out of conventional order" warnings
- varios "unusual Xr order" warnings
- varios "missing section argument" warnings
- varios "no blank before trailing delimiter" warnings
- varios "normalizing date format" warnings

MFC after:	1 month
2020-12-19 14:54:28 +00:00
gbe
e0bc4f1ac7 trim(8): Fix a few issues reported by mandoc
- new sentence, new line
- unusual Xr order: ioctl(2) after da(4)
- unusual Xr order: sysexits(3) after nda(4)

MFC after:	1 week
2020-12-19 13:56:19 +00:00
gbe
7eb780db88 zonectl(8): Fix a few issues reported by mandoc
- Add missing quotation mark for a comment above the .Dd
- inserting missing end of block: Sh breaks Bd
- skipping paragraph macro: Pp before Bl
- skipping paragraph macro: Pp before Bd
- empty block: Bd

MFC after:	1 week
2020-12-19 13:51:46 +00:00
gbe
317a176f55 nfsv4(4): Fix a few issues reported by mandoc
- new sentence, new line
- function name without markup: rtalloc()
- function name without markup: VOP_RECLAIM()

MFC after:	1 week
2020-12-19 13:45:39 +00:00
gbe
31192b8da6 bluetooth: Fix a mandoc related issues
- new sentence, new line
- sections out of conventional order: Sh FILES
- unusual Xr order: bthost(1) after bthidd(8)
- no blank before trailing delimiter
- whitespace at end of input line
- sections out of conventional order: Sh EXIT STATUS

MFC after:	1 week
2020-12-19 13:36:59 +00:00
gbe
5ffa3da018 mpsutil(8): Remove trailing whitespace
MFC after:	1 week
2020-12-19 13:23:26 +00:00
gbe
95f60314a2 bhyvectl(8): Normalize the man page date
MFC after:	1 week
2020-12-19 13:21:40 +00:00
gbe
17073b98c6 camdd(8): Fix the man page date
The comment before the .Dd macro was missing a quotation mark, so that
the date of the man page was always today.

MFC after:	3 days
2020-12-19 13:17:25 +00:00
gbe
ab6cb19c32 config: Fix a few mandoc related errors
- new sentence, new line
- no blank before trailing delimiter

MFC after:	1 week
2020-12-19 13:11:44 +00:00
gbe
c799d34dc5 devctl(8): Correct "sections out of conventional order" error
MFC after:	1 week
2020-12-19 13:05:54 +00:00
gbe
6c0d3d13f3 patch(1): Fix a few mandoc related issues
- no blank before trailing delimiter

MFC after:	1 week
2020-12-19 13:00:17 +00:00
gbe
020d3a24c5 uname(1): Fix a typo in the man page date
MFC after:	3 days
2020-12-19 12:55:27 +00:00
gbe
a7616b680b ident(1): Normalizing date format
MFC after:	3 days
2020-12-19 12:54:00 +00:00
gbe
ba1ad11a4d ipfw(8): Fix a few mandoc related issues
- no blank before trailing delimiter
- missing section argument: Xr inet_pton
- skipping paragraph macro: Pp before Ss
- unusual Xr order: syslogd after sysrc
- tab in filled text

There were a few multiline NAT examples which used the .Dl macro with
tabs. I converted them to .Bd, which is a more suitable macro for that case.

MFC after:	1 week
2020-12-19 12:47:40 +00:00
gbe
831e89d846 ping(8): Fix a mandoc related issue
- unusual Xr punctuation: none before traceroute6(8)
2020-12-19 11:57:47 +00:00
gbe
514ca7d347 nvmecontrol(8): Fix a few mandoc related issues and add a SEE ALSO section
- inserting missing end of block: Ss breaks Bl
- skipping paragraph macro: Pp before Ss
- referenced manual not found: Xr nvme 4 (2 times)
- unknown standard specifier: St The

The macro .St can only be used for standards known by mdoc(7). So add a
SEE ALSO section and add a reference to the NVM Express Base Specification.

MFC after:	2 weeks
2020-12-19 11:47:38 +00:00
hselasky
cb8b8da7d8 Ensure a minimum packet length before creating a mbuf in if_ure.
Sponsored by:	Mellanox Technologies // NVIDIA Networking
2020-12-19 11:03:54 +00:00
gbe
80e7d9e58b devd.conf(5): Fix a mandoc related issue
- sections out of conventional order: Sh SEE ALSO

MFC after:	1 week
2020-12-19 11:03:04 +00:00
hselasky
5018a889e1 Move SYSCTL_ADD_PROC() to unlocked context in if_ure to avoid lock order reversal.
MFC after:	1 week
Reported by:	Mark Millard <marklmi@yahoo.com>
Sponsored by:	Mellanox Technologies // NVIDIA Networking
2020-12-19 11:00:11 +00:00
gbe
5214378dab sysctl(9): Fix a few mandoc related issues
- missing comma before name: Nm SYSCTL_UQUAD
- bad NAME section content: text

MFC after:	1 week
2020-12-19 10:31:25 +00:00
gbe
383a696e0e ofw_bus_status_okay(9): Fix a few mandoc related issues
- missing comma before name: Nm ofw_bus_status_okay
- missing comma before name: Nm ofw_bus_node_status_okay
- skipping paragraph macro: Pp after Sh

MFC after:	1 week
2020-12-19 10:26:40 +00:00
gbe
f7ae08f89f ofw_bus_is_compatible(9): Fix a few mandoc related issues
- missing comma before name: Nm ofw_bus_is_compatible_strict
- missing comma before name: Nm ofw_bus_node_is_compatible
- missing comma before name: Nm ofw_bus_search_compatible
- skipping paragraph macro: Pp after Sh

MFC after:	1 week
2020-12-19 10:24:36 +00:00
gbe
1abe59e1dd fail(9): Fix a few mandoc related issues
- function name without markup: return()
- function name without markup: print()

MFC after:	1 week
2020-12-19 10:20:22 +00:00
gbe
fcac099f80 driver(9): Fix a mandoc related issue
- sections out of conventional order: Sh SEE ALSO

MFC after:	1 week
2020-12-19 10:18:21 +00:00
gbe
d4c70d520c bhnd_erom(9): Fix a few mandoc related issues
- skipping paragraph macro: Pp before Bl
- skipping paragraph macro: Pp after Ss
- skipping paragraph macro: Pp at the end of Ss
- unusual Xr punctuation: none before bhnd_driver_get_erom_class(9)
- unusual Xr punctuation: none before bus_space(9)

MFC after:	1 week
2020-12-19 10:15:58 +00:00
gbe
a0074d7196 bhnd(9): Fix a few mandoc related issues
- skipping paragraph macro: Pp before Bl
- skipping paragraph macro: Pp at the end of Ss
- missing section argument: Xr device_set_desc
- unusual Xr punctuation: none before bhnd_erom(9)

MFC after:	1 week
2020-12-19 10:11:37 +00:00
gbe
e0ef5c32f4 disk(9): Fix a few mandoc related errors
- function name without markup: g_io_deliver()
- function name without markup: disk_gone()
- sections out of conventional order: Sh SEE ALSO
- referenced manual not found: Xr MAKE_DEV 9

Actually the man page of MAKE_DEV has never existed.

MFC after:	3 days
2020-12-19 09:55:02 +00:00
gbe
ac2fdc04d2 accept_filter(9): Fix a mandoc related error
- no blank before trailing delimiter
2020-12-19 09:40:05 +00:00
rlibby
47dcf9e234 rtld-elf: link udivmoddi4 from compiler_rt
This fixes the gcc9 build of rtld-elf32 on amd64, which needed an
implementation of udivmoddi4.

rtld-elf uses certain functions normally found in libc, and so it
includes certain files from libc in its own build.  It has two
mechanisms to include files from libc: one that rebuilds source files in
the rtld-elf environment, and one that extracts object files from a
purpose-built no-SSP PIC archive.

In addition to libc functions, rtld-elf may need to link functions
normally found in libcompiler_rt (formerly libgcc).  Now, add an ability
to rebuild libcompiler_rt source files in the rtld-elf environment.  We
don't yet have a need for an object file extraction mechanism.

libcompiler_rt could also supply udivdi3 and umoddi3, but leave them
alone for now.

Reviewed by:	arichardson, kib
Sponsored by:	Dell EMC Isilon
Differential Revision:	https://reviews.freebsd.org/D27665
2020-12-19 08:38:31 +00:00
rlibby
caae510faf rtld-libc: fix incremental build
ar cr is an update of an archive, not a creation of a new one.  During
incremental builds (e.g. with meta mode) the archive was not getting
cleaned, and so could retain now-deleted objects from previous builds.
Now, delete the archive before creating/updating it.

Reviewed by:	arichardson, bdrewery, kib
Sponsored by:	Dell EMC Isilon
Differential Revision:	https://reviews.freebsd.org/D27663
2020-12-19 08:38:27 +00:00
kevans
42670ecd41 kern: cpuset: allow jails to modify child jails' roots
This partially lifts a restriction imposed by r191639 ("Prevent a superuser
inside a jail from modifying the dedicated root cpuset of that jail") that's
perhaps beneficial after r192895 ("Add hierarchical jails."). Jails still
cannot modify their own cpuset, but they can modify child jails' roots to
further restrict them or widen them back to the modifying jails' own mask.

As a side effect of this, the system root may once again widen the mask of
jails as long as they're still using a subset of the parent jails' mask.
This was previously prevented by the fact that cpuset_getroot of a root set
will return that root, rather than the root's parent -- cpuset_modify uses
cpuset_getroot since it was introduced in r327895, previously it was just
validating against set->cs_parent which allowed the system root to widen
jail masks.

Reviewed by:	jamie
MFC after:	1 week
Differential Revision:	https://reviews.freebsd.org/D27352
2020-12-19 03:30:06 +00:00
pfg
556c444f77 login(1): when exporting variables check the result of setenv(3)
When exporting a variable we correctly check all the preconditions that
could make setenv(3) fail. Checking the setenv(3) return value seems
redundant, but given that login(1) is critical, it doesn't hurt to have
a post-check.

This change is based on the "Principles of Secure Coding" course by
Matthew Bishop, PhD., which specifically discusses this code in FreeBSD.

(This change redoes r368776 due to a silly mistake)
2020-12-19 03:07:38 +00:00
pfg
7ff15fb1e9 Revert r368776:
login(1): when exporting variables check the result of setenv(3)

mismatch: the return value upon error is -1, so the code was not
doing nothing.
2020-12-19 02:42:14 +00:00
pfg
60c9d20436 login(1): when exporting variables check the result of setenv(3)
When exporting a variable we correctly check all the preconditions that
could make setenv(3) fail. Checking the setenv(3) return value seems
redundant, but given that login(1) is critical, it doesn't hurt to have
a post-check.

This change is based on the "Principles of Secure Coding" course by
Matthew Bishop, PhD., which specifically discusses this code in FreeBSD.

Differential Revision:	https://reviews.freebsd.org/D26966
2020-12-19 02:23:53 +00:00
kib
03496eb9d8 Remove redundand redefinion, fixing build.
Sponsored by:	The FreeBSD Foundation
MFC after:	1 week
2020-12-19 01:46:47 +00:00
jrtc27
ae337aedf0 usb: Replace ITUNERNET vendor with MICROCHIP and improve product names
These Mini-Box LCDs are using Microchip components and sub-licensed product
IDs. Whilst here, update the constant names and descriptions for the products
to use the names listed on the manufacturer's website rather than vague ones.
The picoLCD 4x20 is named that on the manufacturer's website so prefer that
name, even though linux-usb.org lists it with the numbers reversed as one might
expect.

Reviewed by:	hselasky
Differential Revision:	https://reviews.freebsd.org/D27670
2020-12-18 23:31:36 +00:00
mckusick
bf8b102af7 Rename pass4check() to freeblock() and move from pass4.c to inode.c.
The new name more accurately describes what it does and the file move
puts it with other similar functions. Done in preparation for future
cleanups. No functional differences intended.

Sponsored by: Netflix
Historic Footnote: my last FreeBSD svn commit
2020-12-18 23:28:27 +00:00
kib
4a242df81a Add ELF flag to disable ASLR stack gap.
Also centralize and unify checks to enable ASLR stack gap in a new
helper exec_stackgap().

PR:	239873
Sponsored by:	The FreeBSD Foundation
MFC after:	1 week
2020-12-18 23:14:39 +00:00
kib
f0f7b63d5b proc.h: Reformat P_ and P2_ definitions.
Use traditional explicit leading zero format for hex numbers.
Align P2_ hex values.
Wrap long lines by splitting comments.

Sponsored by:	The FreeBSD Foundation
MFC after:	1 week
2020-12-18 23:11:27 +00:00
jrtc27
e2c235ea60 strerror.3: Fix whitespace issue introduced in r368714
MFC with:	368714
2020-12-18 22:10:17 +00:00
melifaro
86ac20efb9 Switch direct rt fields access in rtsock.c to newly-create field acessors.
rtsock code was build around the assumption that each rtentry record
 in the system radix tree is a ready-to-use sockaddr. This assumptions
 turned out to be not quite true:
* masks have their length tweaked, so we have rtsock_fix_netmask() hack
* IPv6 addresses have their scope embedded, so we have another explicit
 deembedding hack.

Change the code to decouple rtentry internals from rtsock code using
 newly-created rtentry accessors. This will allow to eventually eliminate
 both of the hacks and change rtentry dst/mask format.

Differential Revision:	https://reviews.freebsd.org/D27451
2020-12-18 22:00:57 +00:00
jhb
9ed9a2424d Skip the vm.pmap.kernel_maps sysctl by default.
This sysctl node can generate very verbose output, so don't trigger it
for sysctl -a or sysctl vm.pmap.

Reviewed by:	markj, kib
Differential Revision:	https://reviews.freebsd.org/D27504
2020-12-18 20:41:23 +00:00
mhorne
8a087987ee riscv: report additional known SBI implementations
These implementation IDs are defined in the SBI spec, so we should print
their name if detected.

Submitted by:	Danjel Qyteza <danq1222@gmail.com>
Reviewed by:	jhb, kp
Differential Revision:	https://reviews.freebsd.org/D27660
2020-12-18 20:10:30 +00:00
manu
a08f037c93 arm64: rk3399: Export the watchdog clock
This clock is used by the watchdog IP and can be controlled only
in the secure world.
2020-12-18 16:55:54 +00:00
mhorne
48f753a7b0 amd64: use register macros for gdb_cpu_getreg()
Prefer these newly-added definitions to bare values.

MFC after:	2 weeks
Sponsored by:	NetApp, Inc.
Sponsored by:	Klara, Inc.
2020-12-18 16:16:03 +00:00
mhorne
d28d1ea97b amd64: allow gdb(4) to write to most registers
Similar to the recent patch to arm's gdb stub in r368414, allow GDB to
update the contents of most general purpose registers.

Reviewed by:	cem, jhb, markj
MFC after:	2 weeks
Sponsored by:	NetApp, Inc.
Sponsored by:	Klara, Inc.
NetApp PR:	44
Differential Revision:	https://reviews.freebsd.org/D27642
2020-12-18 16:09:24 +00:00
markj
f86a436763 acpi: Ensure that adjacent memory affinity table entries are coalesced
The SRAT may contain multiple distinct entries that together describe a
contiguous region of physical memory.  In this case we were not
coalescing the corresponding entries in the memory affinity table, which
led to fragmented phys_avail[] entries.  Since r338431 the vm_phys_segs[]
entries derived from phys_avail[] will be coalesced, resulting in a
situation where vm_phys_segs[] entries do not have a covering
phys_avail[] entry.  vm_page_startup() will not add such segments to the
physical memory allocator, leaving them unused.

Reported by:	Don Morris <dgmorris@earthlink.net>
Reviewed by:	kib, vangyzen
MFC after:	2 weeks
Sponsored by:	The FreeBSD Foundation
Differential Revision:	https://reviews.freebsd.org/D27620
2020-12-18 16:04:48 +00:00
markj
b37967e573 Fix the ipfw service status output when ipfw.ko isn't loaded
Reported by:	lme
Reviewed by:	lme
MFC after:	1 week
Differential Revision:	https://reviews.freebsd.org/D27657
2020-12-18 16:02:28 +00:00