Commit Graph

271384 Commits

Author SHA1 Message Date
Gleb Smirnoff
afad340a14 inpcb: garbage collect INP_LOCK_INIT(), used only once in sctp
Reviewed by:		tuexen
Differential revision:	https://reviews.freebsd.org/D33543
2022-01-03 10:20:30 -08:00
Gleb Smirnoff
fec8a8c7cb inpcb: use global UMA zones for protocols
Provide structure inpcbstorage, that holds zones and lock names for
a protocol.  Initialize it with global protocol init using macro
INPCBSTORAGE_DEFINE().  Then, at VNET protocol init supply it as
the main argument to the in_pcbinfo_init().  Each VNET pcbinfo uses
its private hash, but they all use the same zone to allocate and the
same SMR section to synchronize.

Note: there is the kern.ipc.maxsockets sysctl, which controls the UMA
limit on the socket zone, which was always global.  Historically the same
maxsockets value has also been applied to every PCB zone.  Important fact:
you can't create a pcb without a socket!  A pcb may outlive its socket,
however.  Given that there are multiple protocols, and only one socket
zone, the per pcb zone limits seem to have little value.  Under very
special conditions a pcb zone limit may trigger a little bit earlier than
the socket zone limit, but in most setups the socket zone limit will be
triggered earlier.  When VIMAGE was added to the kernel, PCB zones became
per-VNET.  This magnified the existing imbalance further: now we have
multiple pcb zones in multiple vnets limited to maxsockets, but every pcb
requires a socket allocated from the global zone also limited by maxsockets.
IMHO, this per pcb zone limit doesn't bring any value, so this patch
drops it.  If anybody explains the value of this limit, it can be restored
very easily - just a 2 line change to in_pcbstorage_init().

Differential revision:	https://reviews.freebsd.org/D33542
2022-01-03 10:17:46 -08:00
Gleb Smirnoff
644ca0846d domains: make domain_init() initialize only global state
Now that each module handles its global and VNET initialization
itself, there is no VNET related stuff left to do in domain_init().

Differential revision:	https://reviews.freebsd.org/D33541
2022-01-03 10:15:22 -08:00
Gleb Smirnoff
24e1c6ae7d domains: init with standard SYSINIT(9) or VNET_SYSINIT()
Only three modules were left that used dom_init(), and netipsec
was the last one to use dom_destroy().

Differential revision:	https://reviews.freebsd.org/D33540
2022-01-03 10:15:22 -08:00
Gleb Smirnoff
9880323a99 netipsec: use SYSINIT(9) instead of dom_init/dom_destroy
While here, just use a static initializer for key_cb.

Differential revision:	https://reviews.freebsd.org/D33539
2022-01-03 10:15:21 -08:00
Gleb Smirnoff
340c7343f4 protocols: don't execute protosw_init() for every VNET
The function now modifies pr_usrreqs only, which are always
global.  Rename it to pr_usrreqs_init().

Differential revision:	https://reviews.freebsd.org/D33538
2022-01-03 10:15:21 -08:00
Gleb Smirnoff
89128ff3e4 protocols: init with standard SYSINIT(9) or VNET_SYSINIT
The historical BSD network stack loop that rolls over domains and
over protocols has no advantages over more modern SYSINIT(9).
While doing the sweep, split global and per-VNET initializers.

Getting rid of pr_init achieves several things:
o Get rid of ifdef's that protect against double foo_init() when
  both INET and INET6 are compiled in.
o Isolate initializers statically to the module they init.
o Make the code easier to understand and maintain.

Reviewed by:		melifaro
Differential revision:	https://reviews.freebsd.org/D33537
2022-01-03 10:15:21 -08:00
Mark Johnston
321e586e46 posixshm tests: Fix occasional largepage_mprotect failures
largepage_mprotect maps a superpage and later extends the mapping.  This
occasionally fails with ASLR disabled.  To fix this, first try to
reserve a sufficiently large virtual address region.

Reported by:	Jenkins
MFC after:	1 week
Sponsored by:	The FreeBSD Foundation
2022-01-03 13:00:50 -05:00
Mark Johnston
0e494a9e3f x86: Skip late calibration if our reference timer has low quality
Some AMD Geode-based systems end up using the 8254 PIT to calibrate the
TSC during late calibration, which doesn't work because that
timecounter's mask (65535) is much smaller than its frequency (1193182).
Moreover, early calibration is done against the 8254 timer anyway.

Work around the problem by simply using early calibration results if no
high-quality timecounters exist.
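
To put the numbers above in perspective, here is a back-of-the-envelope sketch; it is not code from the commit, and the helper name wrap_period_us is made up for illustration.  It computes how long a free-running counter with a given mask takes to wrap at a given frequency.

```c
#include <stdint.h>

/* Hypothetical helper: microseconds until a free-running counter wraps. */
static uint64_t
wrap_period_us(uint64_t mask, uint64_t freq_hz)
{
	/* The counter takes mask + 1 ticks to wrap around once. */
	return ((mask + 1) * 1000000ULL / freq_hz);
}
```

With mask 65535 and frequency 1193182 this gives 54925, i.e. the 8254 count wraps roughly every 55 ms, so a reference interval longer than that cannot be measured unambiguously from the raw count alone.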

PR:		260868
Fixes:		22875f8879 ("x86: Implement deferred TSC calibration")
Reported and tested by:	mike@sentex.net, Stefan Hegnauer <stefan.hegnauer@gmx.ch>
Reviewed by:	imp, kib
MFC after:	3 days
Sponsored by:	The FreeBSD Foundation
Differential Revision:	https://reviews.freebsd.org/D33730
2022-01-03 13:00:50 -05:00
Mark Johnston
5ba4192565 Remove an obsolete warning from NOTES
The PREEMPTION option is enabled in all GENERIC kernel configurations.

MFC after:	1 week
2022-01-03 13:00:50 -05:00
Kristof Provost
80871aeb0f udp_var.h: other headers already include types.h
Pointed out by:	imp
Sponsored by:	Rubicon Communications, LLC ("Netgate")
2022-01-03 18:35:02 +01:00
Jessica Clarke
3582b9e372 arm64: Check for intrng-reported errors in gicv3_its
Currently, any errors when adding a PIC child handler are ignored,
instead just continuing on to registering that PIC as an MSI, and
ignoring any errors that occur for that too.

Reviewed by:	andrew
MFC after:	1 week
Differential Revision:	https://reviews.freebsd.org/D33342
2022-01-03 17:09:42 +00:00
Jessica Clarke
a3e828c91d intrng: Use less confusing return value for intr_pic_add_handler
Currently intr_pic_add_handler either returns the PIC you gave it (which
is useless and risks causing confusion about whether it's creating
another PIC) or, on error, NULL. Instead, convert it to return an int
error code as one would expect.

Note that the only consumer of this API, arm64's gicv3_its, does not use
the return value, so no uses need updating to work with the revised API.
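
The difference between the two calling conventions can be sketched as follows.  This is only an illustration: struct pic, add_handler_old() and add_handler_new() are hypothetical stand-ins, not the real intrng types or signatures, and the EINVAL check is an assumed failure case.

```c
#include <errno.h>
#include <stddef.h>

struct pic { int id; };	/* hypothetical stand-in for the kernel's PIC type */

/* Old style: returns the PIC that was passed in, or NULL on error. */
static struct pic *
add_handler_old(struct pic *parent, struct pic *child)
{
	if (parent == NULL || child == NULL)
		return (NULL);
	return (child);
}

/* New style: 0 on success, an errno value on failure. */
static int
add_handler_new(struct pic *parent, struct pic *child)
{
	if (parent == NULL || child == NULL)
		return (EINVAL);
	return (0);
}
```

With the int form a caller can propagate the error directly instead of comparing a returned pointer against NULL.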

Reviewed by:	markj, mmel
MFC after:	1 week
Differential Revision:	https://reviews.freebsd.org/D33341
2022-01-03 17:08:44 +00:00
Ed Maste
1a0a41b105 ar: accept but ignore 'T' option
In previous versions of BSD ar -T was an alias for -f -- use only the
first 15 characters of archive member names.  In GNU ar and LLVM ar -T
creates a thin archive.

The -f / old BSD ar -T functionality is not particularly useful, and
ignoring -T still results in a usable and compatible (but not thin)
archive.

An exp-run found a few ports invoking ar -T but they all expect thin
archives.  In addition, -T will be used to specify thin archives after
a migration to LLVM-ar.

PR:             260523 [exp-run]
Reviewed by:	markj
Sponsored by:   The FreeBSD Foundation
Differential Revision:  https://reviews.freebsd.org/D33553
2022-01-03 11:42:59 -05:00
Corvin Köhne
9fe79f2f2b bhyve: dynamically register FwCtl ports
Qemu's FwCfg uses the same ports as Bhyve's FwCtl. Statically allocated
ports wouldn't allow switching between Qemu's FwCfg and Bhyve's
FwCtl.

Reviewed by:    markj
MFC after:      2 weeks
Sponsored by:   Beckhoff Automation GmbH & Co. KG
Differential Revision:  https://reviews.freebsd.org/D33496
2022-01-03 16:32:55 +01:00
Corvin Köhne
7d55d29508 bhyve: add more slop to 64 bit BARs
Bhyve allocates small 64 bit BARs below 4 GB and generates ACPI tables
based on this allocation. If the guest decides to relocate those BARs
above 4 GB, it could lead to mismatching ACPI tables. Especially
when using OVMF with enabled bus enumeration it could cause
issues. OVMF relocates all 64 bit BARs above 4 GB. The guest OS
may be unable to recover from this situation and disables some PCI
devices because their BARs are located outside of the MMIO space
reported by ACPI. Avoid this situation by giving the guest more
space for relocating BARs.

Let's be paranoid. The available space for BARs below 4 GB is 512 MB
large. Use a slop of 512 MB. It'll allow the guest to relocate all
BARs below 4 GB to an address above 4 GB. We could run into issues
when exceeding the memlimit above 4 GB. However, this space has
a size of 32 GB. Even when using many PCI devices with large BARs,
like framebuffers, or when using multiple PCI busses, it's very
unlikely that we run out of space due to the large slop.
Additionally, this situation will occur on startup and not at runtime,
which is much better.

Reviewed by:    markj
MFC after:      2 weeks
Sponsored by:   Beckhoff Automation GmbH & Co. KG
Differential Revision:  https://reviews.freebsd.org/D33118
2022-01-03 16:32:55 +01:00
Corvin Köhne
8ec366ec6c bhyve: allow reading of fwctl signature multiple times
At the moment, you only have one single chance to read the fwctl
signature. At boot bhyve is in the state IDENT_WAIT. It's then
possible to switch to IDENT_SEND. After bhyve sends the signature,
it switches to REQ. From now on it's impossible to switch back to
IDENT_SEND to read the signature. For that reason, only a single
driver can read the signature. A guest can't use two drivers to
identify that fwctl is present. It gets even worse when using
OVMF. OVMF uses a library to access fwctl. Therefore, every single
OVMF driver would try to read the signature. Currently, only a
single OVMF driver accesses the fwctl. So, there's no issue with
it yet. However, no OS driver would have a chance to detect fwctl when
using OVMF because its signature was already consumed by OVMF.

Reviewed by:    markj
MFC after:      2 weeks
Sponsored by:   Beckhoff Automation GmbH & Co. KG
Differential Revision:  https://reviews.freebsd.org/D31981
2022-01-03 16:32:55 +01:00
Corvin Köhne
01f9362ef4 bhyve: enumerate BARs by size
For example, framebuffers can require a lot of space, and BARs need to
be aligned to their size. If BARs aren't allocated by size, it'll cause
much fragmentation of the MMIO space. Reduce fragmentation by ordering
the BAR allocation by size to reduce the risk of
OUT_OF_MMIO_SPACE issues.
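
The effect of allocation order on fragmentation can be illustrated with a small standalone sketch.  It is not bhyve's allocator: place_bars(), roundup_pow2() and cmp_size_desc() are hypothetical names, and the sketch assumes power-of-two BAR sizes that are placed one after another at naturally aligned addresses.

```c
#include <stdint.h>
#include <stdlib.h>

/* Round addr up to a multiple of size, which must be a power of two. */
static uint64_t
roundup_pow2(uint64_t addr, uint64_t size)
{
	return ((addr + size - 1) & ~(size - 1));
}

/* qsort comparator: largest size first. */
static int
cmp_size_desc(const void *a, const void *b)
{
	uint64_t x = *(const uint64_t *)a, y = *(const uint64_t *)b;

	return ((x < y) - (x > y));
}

/* Place naturally aligned BARs in array order; return the end address. */
static uint64_t
place_bars(const uint64_t *sizes, size_t n, uint64_t base)
{
	uint64_t addr = base;

	for (size_t i = 0; i < n; i++)
		addr = roundup_pow2(addr, sizes[i]) + sizes[i];
	return (addr);
}
```

For sizes of 4 KB, 16 MB, 8 KB and 8 MB placed in that order from address 0, the end address is 0x3000000; after qsort() with cmp_size_desc the same BARs end at 0x1803000, because no alignment padding is needed between them.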

Reviewed by:	markj
MFC after:	2 weeks
Sponsored by:	Beckhoff Automation GmbH & Co. KG
Differential Revision:	https://reviews.freebsd.org/D28278
2022-01-03 16:32:55 +01:00
Corvin Köhne
338a1be836 bhyve: only init MSI-X table if passthru device supports it
Some passthru devices only support MSI instead of MSI-X. For those
devices the initialization of MSI-X table will fail. Re-add the
check erroneously removed in f1442847c9.

MFC after:	3 days
X-MFC with:	f1442847c9
PR:		260148
Reviewed by:	manu, bz
Differential Revision:	https://reviews.freebsd.org/D33728
2022-01-03 14:55:10 +00:00
Warner Losh
b94ed3bc5a sys.mk: Stop rewriting mips* to get MACHINE_CPUARCH
With mips no longer supported, we can GC the substitution from here.

Sponsored by:		Netflix
2022-01-03 08:00:09 -07:00
Warner Losh
ad0a65469a bsd.endian.mk: Remove arm big endian
Remove vestiges of arm big endian support. Also use the more proper
MACHINE_CPUARCH instead of MACHINE to test for that here.

This leaves powerpc as the only big endian arch.

Sponsored by:		Netflix
2022-01-03 08:00:09 -07:00
Warner Losh
577075538c bsd.endian.mk: Remove mips
Remove the enumeration of the big vs little endian platform names.

Sponsored by:		Netflix
2022-01-03 08:00:09 -07:00
Warner Losh
69ee64c1c2 src.opts.mk: Remove most of the mips support
Mips had a number of special cases that disabled features that didn't
work. Remove them all. However, retain the llvm mips bits because that
requires a lot more effort to unwind and will be done separately.

Sponsored by:		Netflix
2022-01-03 08:00:09 -07:00
Warner Losh
8d6197929d meta: Remove mips support
Mips is no longer a supported target, remove it.

Sponsored by:		Netflix
2022-01-03 08:00:08 -07:00
Warner Losh
9b93d7589a bsd.cpu.mk: Remove mips support
Remove the tweaks to the compiler, as well as additional command line
args to get the proper endian, word size and floating style.

Sponsored by:		Netflix
2022-01-03 08:00:08 -07:00
Warner Losh
539d322082 bsd.compat.mk: Remove support for mips64
No longer need to care about mips32 binaries on mips64 for lib32
support.

Sponsored by:		Netflix
2022-01-03 08:00:08 -07:00
Warner Losh
98e58025a5 bsd.lib/prog.mk: Remove special case for mips
We no longer need to set the TLS model for mips64*.

Sponsored by:		Netflix
2022-01-03 08:00:08 -07:00
Warner Losh
d889875b78 bsd.opts.mk: Remove mips support
We don't need to list all the 32-bit mips variants here anymore.

Sponsored by:		Netflix
2022-01-03 08:00:08 -07:00
Kristof Provost
aa70361d86 headers: make a few more headers self-contained
Sponsored by:	Rubicon Communications, LLC ("Netgate")
2022-01-03 10:12:30 +01:00
Kristof Provost
9d406e088e dnctl: Support reading config from file like ipfw(8)
Extend the dnctl (dummynet config) tool to be able to read commands from
a file, just like ipfw already does.

Sponsored by:	Rubicon Communications, LLC ("Netgate")
Differential Revision:	https://reviews.freebsd.org/D33627
2022-01-03 09:50:18 +01:00
Jason A. Harmening
9e891d43f5 unionfs: implement VOP_SET_TEXT/VOP_UNSET_TEXT
The implementation simply passes the text ref to the appropriate
underlying vnode.  Without this, the default [un]set_text
implementation will only manage the text ref on the unionfs vnode,
causing it to be out of sync with the underlying filesystems and
potentially allowing corruption of executable file contents.
On INVARIANTS kernels, it also readily produces a panic on process
termination because the VM object representing the executable mapping
is backed by the underlying vnode, not the unionfs vnode.

PR:	251342
Reviewed by:	kib
Differential Revision: https://reviews.freebsd.org/D33611
2022-01-02 19:52:58 -08:00
Jason A. Harmening
d877dd5767 unionfs: simplify writecount management
Use atomics to track the writecount granted to the underlying FS,
and avoid holding the vnode interlock while calling the underlying FS'
VOP_ADD_WRITECOUNT().  This also fixes a WITNESS warning about nesting
the same lock type.  Also add comments explaining why we need to track
the writecount on the unionfs vnode in the first place.  Finally,
simplify writecount management to only use the upper vnode and assert
that we shouldn't have an active writecount on the lower vnode through
unionfs.

Reviewed by:	kib
Differential Revision:	https://reviews.freebsd.org/D33611
2022-01-02 19:52:58 -08:00
Gleb Smirnoff
ca573c9a17 sshd: update the libwrap patch to drop connections early
OpenSSH dropped libwrap support in OpenSSH 6.7p in 2014
(f2719b7c in github.com/openssh/openssh-portable) and we
have maintained the patch ourselves since 2016 (a0ee8cc636).

Over the years, the libwrap support has deteriorated, and that was
probably the reason for its removal upstream.  The original idea of
libwrap was to drop illegitimate connections as soon as possible, but
over the years the code was pushed further and further down and ended
up in the forked client connection handler.

The negative effect of late dropping is an increased attack surface
for hosts that are to be dropped anyway.  Apart from hypothetical
future vulnerabilities in connection handling, today a malicious
host listed in /etc/hosts.allow still can trigger sshd to enter
connection throttling mode, which is enabled by default (see
MaxStartups in sshd_config(5)), effectively causing a DoS attack.
Note that on OpenBSD this attack isn't possible, since they enable
MaxStartups together with UseBlacklist.

The only negative effect of early drop that I can imagine is that
the main listener now parses a file in /etc, and if our root filesystem
goes bad, it would get stuck.  But it's unlikely you'd be able to log in
in that case anyway.

Implementation details:

- For brevity we reuse the same struct request_info.  This isn't
  a documented feature of libwrap, but code review, viewing data
  in a debugger and real life testing show that if we clear
  RQ_CLIENT_NAME and RQ_CLIENT_ADDR every time, it works as intended.
- We set SO_LINGER on the socket to force immediate connection reset.
- We log the message exactly as libwrap's refuse() would do.
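
The SO_LINGER step can be sketched as below.  This is a minimal illustration rather than the patch itself: the helper name set_reset_on_close() is made up, and the zero linger timeout is an assumption based on the usual way of turning close(2) into an immediate reset.

```c
#include <sys/socket.h>

/*
 * Enable SO_LINGER with a zero timeout, so that close(2) resets the
 * connection instead of going through the normal shutdown sequence.
 * Returns the setsockopt(2) result: 0 on success, -1 on error.
 */
static int
set_reset_on_close(int fd)
{
	struct linger l = { .l_onoff = 1, .l_linger = 0 };

	return (setsockopt(fd, SOL_SOCKET, SO_LINGER, &l, sizeof(l)));
}
```

A listener would call this on the accepted descriptor right before closing a connection it has decided to refuse.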

Differential revision:	https://reviews.freebsd.org/D33044
2022-01-02 18:32:30 -08:00
Konstantin Belousov
d9cacbf4b0 sched_get/setaffinity(): try to be more compatible with Linux
in handling the cpuset sizes different from sizeof(cpuset_t).

For both cases, cpuset size shorter than sizeof(cpuset_t) results
in EINVAL on Linux.

For sched_getaffinity(), be more permissive and accept cpuset size
larger than our cpuset_t, by clipping the syscall argument and zeroing
the rest of the output buffer.  For sched_setaffinity(), we should allow
shorter cpusets than current ABI size, again zeroing the rest of the bits.
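
The sched_getaffinity() side of this can be sketched in userland C.  Everything here is hypothetical: KERN_SETSIZE stands in for sizeof(cpuset_t), copy_mask_out() is not a kernel function, and rejecting short buffers with EINVAL is assumed from the Linux behavior described above.

```c
#include <errno.h>
#include <stddef.h>
#include <string.h>

#define	KERN_SETSIZE	8	/* hypothetical sizeof(cpuset_t), in bytes */

/*
 * Copy the kernel mask into a caller buffer of usersize bytes.
 * Too-small buffers fail with EINVAL; larger ones are zero padded.
 */
static int
copy_mask_out(const unsigned char *kmask, unsigned char *ubuf,
    size_t usersize)
{
	if (usersize < KERN_SETSIZE)
		return (EINVAL);
	/* Clip to the kernel's size, then zero the rest of the buffer. */
	memcpy(ubuf, kmask, KERN_SETSIZE);
	memset(ubuf + KERN_SETSIZE, 0, usersize - KERN_SETSIZE);
	return (0);
}
```

A caller passing a 16 byte buffer gets the 8 mask bytes followed by 8 zero bytes, which is what a consumer built against a larger cpu set expects to see.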

With this change, Python's os.sched_get/setaffinity functions work.

Reported by:	se
Sponsored by:	The FreeBSD Foundation
MFC after:	1 week
2022-01-03 04:31:40 +02:00
Xin LI
9026652101 libmagic: Remove support for older FreeBSD where xlocale was not available.
The MINIMUM_SUPPORTED_OSREL is 1002501 (FreeBSD 10.3), and xlocale is
supported there.

While here, explicitly use config.h generated with --disable-bzlib
--disable-xzlib instead of deleting them manually.

MFC after:	2 weeks
2022-01-02 18:05:08 -08:00
Alan Somers
f284bed200 geom_gate: ensure readprov is null-terminated
With crafted input to the G_GATE_CMD_CREATE ioctl, geom_gate can be made
to print kernel memory to the system console, potentially revealing
sensitive data from whatever was previously in that memory page.
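
The general class of bug can be sketched as follows.  This is one common way to guarantee termination of a fixed-size string field and is not necessarily what the commit does; NAMELEN and copy_name() are hypothetical.

```c
#include <string.h>

#define	NAMELEN	16	/* hypothetical size of a fixed name field */

/* Copy a fixed-size, possibly unterminated field and force termination. */
static void
copy_name(char dst[NAMELEN], const char src[NAMELEN])
{
	memcpy(dst, src, NAMELEN);
	/* Without this, a later "%s" print could run past the buffer. */
	dst[NAMELEN - 1] = '\0';
}
```

An alternative design is to reject input that lacks a terminator instead of silently truncating it.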

But but but: this is a case of the sys admin misconfiguring, and you'd
need root privileges to do this.

Submitted By:	Johannes Totz <jo@bruelltuete.com>
MFC after:	2 weeks
Reviewed By:	asomers
Differential Revision: https://reviews.freebsd.org/D31727
2022-01-02 18:01:23 -07:00
Alan Somers
6226477a46 Various fixes for ggatec and ggated
Dynamically size buffers in ggatec instead of using a static size on the stack.
Add flush support.

Submitted by:	Johannes Totz <jo@bruelltuete.com>
MFC after:	2 weeks
Reviewed by:	asomers
Differential Revision: https://reviews.freebsd.org/D31722
2022-01-02 17:53:55 -07:00
Robert Watson
7776d3ccd1 Add a -q flag to ministat to suppress headers in output, for use with -n.
Reviewed by:	jrtc27
Differential Revision: https://reviews.freebsd.org/D33724
MFC after:	2 weeks
2021-12-18 22:53:03 +00:00
Kirk McKusick
1fbcaa13b0 When doing a read-only mount of a UFS filesystem using gjournal(8),
suppress error message about a missing gjournal provider.

Submitted by: Andreas Longwitz
MFC after:    2 weeks
Sponsored by: Netflix
2022-01-02 14:04:39 -08:00
Robert Wing
7c9948c2e9 skip test case nvlist_send_recv__send_many_fds__dgram
If I'm not mistaken, the underlying sendmsg() for nvlist_send() is
failing with ENOBUFS. In turn, nvlist_recv() returns NULL because it
didn't receive the expected number of file descriptors.

Adjusting net.local.dgram.recvspace worked on my local machine, but on
CI the test still fails consistently.

PR:     260891
2022-01-02 12:26:07 -09:00
Colin Percival
698727d637 Fix variable name: freq_khz -> freq
An earlier version of this code computed the TSC frequency in kHz.
When the code was changed to compute the frequency more accurately,
the variable name was not updated.

Reviewed by:	markj
Fixes:		22875f8879 x86: Implement deferred TSC calibration
MFC after:	2 weeks
Differential Revision:	https://reviews.freebsd.org/D33696
2022-01-02 13:07:53 -08:00
Colin Percival
9cb3288287 Skip TSC calibration if exact value known
It's possible that the "early" TSC calibration gave us a value which
is known to be exact; in that case, skip the later re-calibration.

MFC after:	2 weeks
Differential Revision:	https://reviews.freebsd.org/D33695
2022-01-02 13:07:53 -08:00
Jessica Clarke
324150d6da ufs: Avoid subobject overflow in snapshot expunge code
The code here tries to be smart and zeroes out both di_db and di_ib with
a single bzero call, thereby overrunning the di_db subobject. This is
fine on most architectures, if a little dodgy. However, on CHERI, the
compiler can optionally restrict the bounds on pointers to subobjects to
just that subobject, in order to mitigate intra-object buffer overflows,
and this is enabled in CheriBSD's pure-capability kernels.

Instead, use separate bzero calls for each array, and let the compiler
optimise it as it sees fit; even if it's not generating inline zeroing
code, Clang will happily optimise two consecutive bzero's to a single
larger call.
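
The pattern described above can be sketched like this.  struct dinode_sketch is a hypothetical fragment and its array lengths are illustrative only; it is not the real UFS dinode definition.

```c
#include <stdint.h>
#include <strings.h>

/* Hypothetical inode fragment; the array lengths are illustrative only. */
struct dinode_sketch {
	int64_t	di_db[12];	/* direct block pointers */
	int64_t	di_ib[3];	/* indirect block pointers */
};

static void
clear_blocks(struct dinode_sketch *dp)
{
	/* One call per array keeps each access within its own subobject. */
	bzero(dp->di_db, sizeof(dp->di_db));
	bzero(dp->di_ib, sizeof(dp->di_ib));
}
```

The single-call form would instead pass di_db with a length of sizeof(di_db) + sizeof(di_ib), which is exactly the access that subobject bounds reject.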

Reviewed by:	mckusick
Differential Revision:	https://reviews.freebsd.org/D33651
2022-01-02 20:55:49 +00:00
Jessica Clarke
5b13fa7987 ufs: Rework shortlink handling to avoid subobject overflows
Shortlinks occupy the space of both di_db and di_ib when used. However,
everywhere that wants to read or write a shortlink takes a pointer to
di_db and promptly runs off the end of it into di_ib. This is fine on
most architectures, if a little dodgy. However, on CHERI, the compiler
can optionally restrict the bounds on pointers to subobjects to just
that subobject, in order to mitigate intra-object buffer overflows, and
this is enabled in CheriBSD's pure-capability kernels.

Instead, clean this up by inserting a union such that a new di_shortlink
can be added with the right size and element type, avoiding the need to
cast and allowing the use of the DIP macro to access the field. This
also mirrors how the ext2fs code implements extents support, with the
exact same structure other than having a uint32_t i_data[] instead of a
char di_shortlink[].
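
A minimal sketch of the union layout follows, assuming a C11 compiler for the anonymous struct and union.  The struct name, the NDADDR and NIADDR counts and the set_shortlink() helper are illustrative and not taken from the commit.

```c
#include <stdint.h>
#include <string.h>

#define	NDADDR	12	/* illustrative counts, not taken from the commit */
#define	NIADDR	3

struct dinode_sketch {
	union {
		struct {
			int64_t	di_db[NDADDR];
			int64_t	di_ib[NIADDR];
		};
		/* Same storage, typed and sized for a short symlink target. */
		char	di_shortlink[(NDADDR + NIADDR) * sizeof(int64_t)];
	};
};

static void
set_shortlink(struct dinode_sketch *dp, const char *target, size_t len)
{
	/* The copy is bounded by di_shortlink itself; no cast of di_db. */
	if (len <= sizeof(dp->di_shortlink))
		memcpy(dp->di_shortlink, target, len);
}
```

Because di_shortlink is its own subobject covering both arrays, a pointer derived from it legitimately spans the whole area.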

Reviewed by:	mckusick, jhb
Differential Revision:	https://reviews.freebsd.org/D33650
2022-01-02 20:55:36 +00:00
Konstantin Belousov
04fd468da0 mountmsdosfs(): some style
Reviewed by:	markj
Sponsored by:	The FreeBSD Foundation
MFC after:	1 week
Differential revision:	https://reviews.freebsd.org/D33721
2022-01-02 22:25:07 +02:00
Konstantin Belousov
642f77be1d amd64 sigtramp: comment-out annotations for registers with DWARF number >= 32
Sponsored by:	The FreeBSD Foundation
2022-01-02 21:00:52 +02:00
Doug Moore
f1e7a532d1 busdma: _bus_dmamap_addseg repaired
A recent change introduced an off-by-one error into a test allowing
coalescing chunks into segments, which broke a check in
_bus_dmamap_addseg on many architectures.  This change fixes that error
and makes it clear that it is not a particular range that is being
boundary-checked, but the proposed union of the two adjacent ranges.

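The idea of boundary-checking the union of two adjacent ranges can be sketched as below.  This is not the busdma code: within_boundary() and can_coalesce() are hypothetical helpers, and the sketch assumes a power-of-two boundary and non-empty ranges.

```c
#include <stdbool.h>
#include <stdint.h>

/*
 * Return true if the byte range [start, start + len) stays within one
 * boundary-aligned window.  boundary must be a power of two and len > 0.
 */
static bool
within_boundary(uint64_t start, uint64_t len, uint64_t boundary)
{
	uint64_t last = start + len - 1;	/* last byte, inclusive */

	return ((start & ~(boundary - 1)) == (last & ~(boundary - 1)));
}

/* A chunk may join the previous segment only if the merged range fits. */
static bool
can_coalesce(uint64_t seg_start, uint64_t seg_len, uint64_t chunk_len,
    uint64_t boundary)
{
	return (within_boundary(seg_start, seg_len + chunk_len, boundary));
}
```

Using the inclusive last byte is where off-by-one mistakes tend to creep in: a merged range ending exactly at the boundary is fine, one byte more is not.
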
Reported by:	se
Reviewed by:	se
Fixes:	c606ab59e7 vm_extern: use standard address checkers everywhere
Differential Revision:	https://reviews.freebsd.org/D33715
2022-01-02 12:37:05 -06:00
Gordon Bergling
1b90dfa5d2 tcp_bbr(4): Fix a few typos in sysctl descriptions
- s/measurment/measurement/

MFC after:	3 days
2022-01-02 18:03:10 +01:00
Poul-Henning Kamp
79f38143bd sesutil: Widen "Desc" field to fit "Drive Slot 23"
2022-01-02 11:44:02 +00:00
sebastien.bini
bffefaf3e1 pmcstudy: fix error handling
Close file descriptor in the correct way if no counters
are built into the application.

Obtained from:		Stormshield
2022-01-02 10:51:07 +01:00