Commit Graph

147132 Commits

Author SHA1 Message Date
Jason A. Harmening
d711884e60 Remove unionfs_islocked()
The implementation is racy; if the unionfs vnode is not in fact
locked, vnode private data may be concurrently altered or freed.
Instead, simply rely upon the standard implementation to query the
v_vnlock field, which is type-stable and will reflect the correct
lower/upper vnode configuration for the unionfs node.

Tested by:	pho
Reviewed by:	kib, markj
Differential Revision:	https://reviews.freebsd.org/D39272
2023-04-17 20:31:40 -05:00
Jason A. Harmening
a5d82b55fe Remove an impossible condition from unionfs_lock()
We hold the vnode interlock, so vnode private data cannot suddenly
become NULL.

Tested by:	pho
Reviewed by:	kib, markj
Differential Revision:	https://reviews.freebsd.org/D39272
2023-04-17 20:31:40 -05:00
Jason A. Harmening
a18c403fbd unionfs: remove LK_UPGRADE if falling back to the standard lock
The LK_UPGRADE operation may have temporarily dropped the upper or
lower vnode's lock.  If the unionfs vnode was reclaimed during that
window, its lock field will be reset to no longer point at the
upper/lower vnode lock, so the lock operation will use the standard
lock stored in v_lock.  Remove LK_UPGRADE from the flags in this case
to avoid a lockmgr assertion, as this lock has not been previously
owned by the calling thread.

Reported by:	pho
Tested by:	pho
Reviewed by:	kib, markj
Differential Revision:	https://reviews.freebsd.org/D39272
2023-04-17 20:31:40 -05:00
Ed Maste
00172f3416 geom: use bool for one-bit wide bit-field
A one-bit wide bit-field can take only the values 0 and -1.  Clang 16
introduced a warning that "implicit truncation from 'int' to a one-bit
wide bit-field changes value from 1 to -1".  Fix by using c99 bool.
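
A minimal sketch of the behaviour described above. The struct and function
names are hypothetical and are not taken from the geom code. Storing 1 into a
signed one-bit field is implementation-defined in C; GCC and Clang wrap it to
-1, which is the case the warning is about.

```c
#include <stdbool.h>

/* Hypothetical structs for illustration only. */
struct flags_signed {
	signed int enabled : 1;	/* representable values: 0 and -1 */
};

struct flags_bool {
	bool enabled : 1;	/* representable values: false and true */
};

/* Store v into the signed one-bit field and read it back. */
static int
read_back_signed(int v)
{
	struct flags_signed f;

	f.enabled = v;		/* 1 does not fit; GCC/Clang yield -1 */
	return (f.enabled);
}

/* Store v into the bool one-bit field and read it back. */
static int
read_back_bool(int v)
{
	struct flags_bool f;

	f.enabled = (v != 0);	/* stays 0 or 1 */
	return (f.enabled);
}
```

With the bool field, a comparison such as f.enabled == 1 behaves as a reader
would expect, which it does not with the signed field.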

Reported by:	Clang, via dim
Reviewed by:	dim
Sponsored by:	The FreeBSD Foundation
2023-04-17 15:43:00 -04:00
Gleb Smirnoff
3232b1f4a9 tcp: fix build
The recent 25685b7537 came in conflict with a540cdca31.  Remove the
code that cleans up the old style input queue.  Note that two lines
below we assert that the new style input queue is empty.  The TCP
stacks that use the queue are supposed to flush it in their
tfb_tcp_fb_fini method.
2023-04-17 10:24:20 -07:00
Gleb Smirnoff
a6b55ee6be net: replace IFF_KNOWSEPOCH with IFF_NEEDSEPOCH
Expect that drivers call into the network stack with the net epoch
entered. This has already been the fact since early 2020. The net
interrupts, that are marked with INTR_TYPE_NET, were entering epoch
since 511d1afb6b. For the taskqueues there is NET_TASK_INIT() and
all drivers that were known back in 2020 we marked with it in
6c3e93cb5a. However in e87c494015 we took a conservative approach
and preferred to opt-in rather than opt-out for the epoch.

This change not only reverts e87c494015 but adds a safety belt to
avoid panicking with INVARIANTS if there is a missed driver. With
INVARIANTS we will run in_epoch() check, print a warning and enter
the net epoch.  A driver that prints can be quickly fixed with the
IFF_NEEDSEPOCH flag, but better be augmented to properly enter the
epoch itself.

Note on TCP LRO: it is a backdoor to enter the TCP stack bypassing
some layers of net stack, ignoring either old IFF_KNOWSEPOCH or the
new IFF_NEEDSEPOCH.  But the tcp_lro_flush_all() asserts the presence
of network epoch.  Indeed, all NIC drivers that support LRO already
provide the epoch, either with help of INTR_TYPE_NET or just running
NET_EPOCH_ENTER() in their code.

Reviewed by:		zlei, gallatin, erj
Differential Revision:	https://reviews.freebsd.org/D39510
2023-04-17 09:08:35 -07:00
Gleb Smirnoff
a540cdca31 tcp_hpts: use queue(9) STAILQ for the input queue
Reviewed by:		rrs
Differential Revision:	https://reviews.freebsd.org/D39574
2023-04-17 09:07:23 -07:00
Steve Kiernan
48ffacbc84 veriexec: Add function to get label associated with a file
Add mac_veriexec_metadata_get_file_label to avoid the need to
expose internals to other MAC modules.

Obtained from:	Juniper Networks, Inc.
2023-04-17 11:47:33 -04:00
Steve Kiernan
bd4742c970 veriexec: Rename old VERIEXEC_SIGNED_LOAD as VERIEXEC_SIGNED_LOAD32
We need to handle old ioctl from old binary.

Add some missing ioctls.

Obtained from:	Juniper Networks, Inc.
2023-04-17 11:47:32 -04:00
Steve Kiernan
d195f39d1d veriexec: Add option MAC_VERIEXEC_DEBUG
Obtained from:	Juniper Networks, Inc.
2023-04-17 11:47:32 -04:00
Simon J. Gerraty
8c3e263dc1 veriexec: mac_veriexec_syscall compat32 support
Some 32bit apps may need to be able to use
MAC_VERIEXEC_GET_PARAMS_PID_SYSCALL
MAC_VERIEXEC_GET_PARAMS_PATH_SYSCALL

Therefore compat32 support is required.

Obtained from:	Juniper Networks, Inc.
2023-04-17 11:47:32 -04:00
Steve Kiernan
8512d82ea0 veriexec: Additional functionality for MAC/veriexec
Ensure veriexec opens the file before doing any read operations.

When the MAC_VERIEXEC_CHECK_PATH_SYSCALL syscall is requested, veriexec
needs to open the file before calling mac_veriexec_check_vp. This is to
ensure any set up is done by the file system. Most file systems do not
explicitly need an open, but some (e.g. virtfs) require initialization
of access tokens (file identifiers, etc.) before doing any read or write
operations.

The evaluate_fingerprint() function needs to ensure it has an open file
for reading in order to evaluate the fingerprint. The ideal solution is
to have a hook after the VOP_OPEN call in vn_open. For now, we open the
file for reading, evaluate the fingerprint, and close the file. While
this leaves a potential hole that could possibly be taken advantage of
by a dedicated adversary, this code path is not typically visited often
in our use cases, as we primarily encounter verified mounts and not
individual files. This should be considered a temporary workaround until
discussions about the post-open hook have concluded and the hook becomes
available.

Add MAC_VERIEXEC_GET_PARAMS_PATH_SYSCALL and
MAC_VERIEXEC_GET_PARAMS_PID_SYSCALL to mac_veriexec_syscall so we can
fetch and check label contents in an unconstrained manner.

Add a check for PRIV_VERIEXEC_CONTROL to do ioctl on /dev/veriexec

Make it clear that trusted process cannot be debugged. Attempts to debug
a trusted process already fail, but the failure path is very obscure.
Add an explicit check for VERIEXEC_TRUSTED in
mac_veriexec_proc_check_debug.

We need mac_veriexec_priv_check to not block PRIV_KMEM_WRITE if
mac_priv_grant() says it is ok.

Reviewed by:	sjg
Obtained from:	Juniper Networks, Inc.
2023-04-17 11:47:32 -04:00
Mark Johnston
d95fbf4e1a riscv: save the thread pointer in both modes
The contents of frame->tf_tp are uninitialized if accessed by DTrace (in
probe context), resulting in a panic when trying to access the memory
pointed to by tp. This saves the thread pointer to the trap frame when
handling both userland and kernel exceptions.

Reviewed by:	markj, mhorne
MFC after:	2 weeks
Sponsored by:	The FreeBSD Foundation
Differential Revision:	https://reviews.freebsd.org/D39582
2023-04-17 09:49:52 -04:00
Alexander V. Chernikov
f656a96020 tests: make ktest build on ppc.
MFC after:	2 weeks
2023-04-17 13:47:07 +00:00
Alexander V. Chernikov
9742519b22 netlink: fix operations with link-local routes/gateways.
MFC after:	3 days
2023-04-17 12:04:43 +00:00
Alexander V. Chernikov
b8da3b62a5 tests: add ktest modules to build
MFC after:	2 weeks
2023-04-17 10:46:05 +00:00
Pawel Jakub Dawidek
068913e4ba zfs: Add vfs.zfs.bclone_enabled sysctl.
Keep block cloning disabled by default for now, but allow it to be enabled
and used after setting vfs.zfs.bclone_enabled to 1, so people can easily
try it.

Approved by:	oshogbo
Reviewed by:	mm, oshogbo
Differential Revision:	https://reviews.freebsd.org/D39613
2023-04-17 03:38:30 -07:00
Zhenlei Huang
401f03445e lagg(4): Correctly define some sysctl variables
939a050ad9 virtualized lagg(4), but the corresponding sysctls of some
virtualized global variables are not marked with CTLFLAG_VNET. An attempt to
operate on those variables via sysctl will effectively go to the 'master'
copies and the virtualized ones are not read or set accordingly. As a
side effect, on updating the 'master' copy, the virtualized global
variables of newly created vnets will have correct values.

PR:		270705
Reviewed by:	kp
Fixes:		939a050ad9 Virtualize lagg(4) cloner
MFC after:	3 days
Differential Revision:	https://reviews.freebsd.org/D39467
2023-04-17 18:24:35 +08:00
Zhenlei Huang
a7acce3491 vnet: Fix a typo in a source code comment
- s/form/from/

MFC after:	3 days
2023-04-17 18:24:35 +08:00
Pawel Jakub Dawidek
1959e122d9 zfs: Merge https://github.com/openzfs/zfs/pull/14739
The zfs_log_clone_range() function is never called from the
zfs_clone_range_replay() function, so I assumed it is safe to assert
that zil_replaying() is never TRUE here. It turns out zil_replaying()
also returns TRUE when the sync property is set to disabled.

Fix the problem by just returning if zil_replaying() returns TRUE.

Reported by: Florian Smeets
Signed-off-by: Pawel Jakub Dawidek pawel@dawidek.net

Approved by: oshogbo, mm
2023-04-17 02:22:56 -07:00
Pawel Jakub Dawidek
e0bb199925 zfs: cherry-pick openzfs/zfs@c71fe7164
Fix data corruption when cloning embedded blocks

Don't overwrite blk_phys_birth, as for embedded blocks it is part of
the payload.

Reviewed-by: Richard Yao <richard.yao@alumni.stonybrook.edu>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Pawel Jakub Dawidek <pawel@dawidek.net>
Issue #13392
Closes #14739

Approved by: oshogbo, mm
2023-04-17 02:19:49 -07:00
Stephen J. Kiernan
88a3358ea4 veriexec: Add SPDX-License-Identifier
2023-04-16 21:23:00 -04:00
Stephen J. Kiernan
894bcc876d sys/modules/Makefile: conditionally add MAC/veriexec modules
Only build MAC/veriexec modules when MK_VERIEXEC is yes or we
are building all modules.

Add VERIEXEC knob to kernel __DEFAULT_NO_OPTIONS

Reviewed by:	sjg
Obtained from:	Juniper Networks, Inc.
2023-04-16 20:24:54 -04:00
Stephen J. Kiernan
8050e0a429 sys/modules/Makefile: add MAC/veriexec modules into the build
Build the MAC/veriexec module and the SHA2, SHA256, SHA384, and
SHA512 fingerprint modules.

Obtained from:	Juniper Networks, Inc.
2023-04-16 19:18:55 -04:00
Simon J. Gerraty
6ae8d57652 mac_veriexec: add mac_priv_grant check for NODEV
Allow other MAC modules to override some veriexec checks.

We need two new privileges:
PRIV_VERIEXEC_DIRECT	process wants to override 'indirect' flag
			on interpreter
PRIV_VERIEXEC_NOVERIFY	typically associated with PRIV_VERIEXEC_DIRECT
			allow override of O_VERIFY

We also need to check for PRIV_VERIEXEC_NOVERIFY override
for FINGERPRINT_NODEV and FINGERPRINT_NOENTRY.
This will only happen if parent had PRIV_VERIEXEC_DIRECT override.

This allows for MAC modules to selectively allow some applications to
run without verification.

Needless to say, this is extremely dangerous and should only be used
sparingly and carefully.

Obtained from:	Juniper Networks, Inc.

Reviewers: sjg
Subscribers: imp, dab

Differential Revision: https://reviews.freebsd.org/D39537
2023-04-16 19:14:40 -04:00
Stephen J. Kiernan
4819e5aeda Add new privilege PRIV_KDB_SET_BACKEND
Summary:
Check for PRIV_KDB_SET_BACKEND before allowing a thread to change
the KDB backend.

Obtained from:	Juniper Networks, Inc.
Reviewers: sjg, emaste
Subscribers: imp

Differential Revision: https://reviews.freebsd.org/D39538
2023-04-16 14:37:58 -04:00
Val Packett
77f0e198d9 procctl: add state flags to PROC_REAP_GETPIDS reports
For a process supervisor using the reaper API to track process subtrees,
it is very useful to know the state of the processes on the list.

Sponsored by:   https://www.patreon.com/valpackett
Reviewed by:    kib
MFC after:	1 week
Differential Revision: https://reviews.freebsd.org/D39585
2023-04-16 13:48:20 +03:00
Stephen J. Kiernan
b1a00c2b13 Quiet compiler warnings for fget_noref and fdget_noref
Summary:
Typecasting both parts of the comparison to u_int quiets compiler
warnings about signed/unsigned comparison and takes care of positive
and negative numbers for the file descriptor in a single comparison.
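
The single-comparison trick can be sketched as follows. The function names
are hypothetical, and the sketch assumes the table size is never negative.
A negative descriptor converts to a very large unsigned value, so one
unsigned comparison rejects both negative and too-large descriptors.

```c
#include <stdbool.h>

/* Two comparisons: reject negative values, then values past the table. */
static bool
fd_in_range_two_checks(int fd, int nfiles)
{
	return (fd >= 0 && fd < nfiles);
}

/*
 * One comparison: casting both sides to unsigned makes a negative fd
 * compare as a huge value, so it fails the same test as an fd that is
 * too large.  Assumes nfiles >= 0.
 */
static bool
fd_in_range_one_check(int fd, int nfiles)
{
	return ((unsigned int)fd < (unsigned int)nfiles);
}
```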

Obtained from:	Juniper Networks, Inc.

Reviewers: mjg

Subscribers: imp

Differential Revision: https://reviews.freebsd.org/D39593
2023-04-15 23:50:54 -04:00
Warner Losh
214909d669 Revert "cam: fix up world compilation after previous"
This reverts commit 1d35493e46. It was the wrong fix. 757fc6666b has
the proper fix to include stdbool for userland.

Sponsored by:		Netflix
2023-04-15 18:25:55 -06:00
Warner Losh
757fc6666b cam: Include stdbool.h for userland
Sponsored by:		Netflix
2023-04-15 18:25:22 -06:00
Mateusz Guzik
1d35493e46 cam: fix up world compilation after previous
Sponsored by:	Rubicon Communications, LLC ("Netgate")
2023-04-15 23:11:27 +00:00
Warner Losh
fd02926a68 cam: Properly mask out the status bits to get completion code
ccb_h.status has two parts: the actual status and some additional bits to
indicate additional information. It must be masked before comparing
against completion codes. Add new inline function cam_ccb_success to
simplify this to test whether or not the request succeeded. Most of the
code already does this, but a few places don't (the rest likely should
be converted to use cam_ccb_status and/or cam_ccb_success, but that's
for another day). This caused at least one bug in recognizing devices
behind a SATA port multiplexer, though some of these checks were
fine with the special knowledge of the code paths involved.
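
A rough sketch of why the mask matters. The DEMO_* constants and demo_*
helpers are illustrative stand-ins, not the definitions from the CAM
headers, and the real cam_ccb_success operates on a CCB rather than a bare
integer.

```c
#include <stdbool.h>
#include <stdint.h>

/* Illustrative stand-ins; the real values live in the CAM headers. */
#define DEMO_STATUS_MASK	0x3fu	/* low bits: completion code */
#define DEMO_REQ_CMP		0x01u	/* "request completed" code */
#define DEMO_DEV_QFRZN		0x40u	/* an extra informational flag bit */

/* Strip the informational bits, leaving only the completion code. */
static uint32_t
demo_status_code(uint32_t status)
{
	return (status & DEMO_STATUS_MASK);
}

/* Correct test: compare the masked code. */
static bool
demo_success(uint32_t status)
{
	return (demo_status_code(status) == DEMO_REQ_CMP);
}

/* Buggy test: any extra flag bit makes a successful request look failed. */
static bool
demo_success_unmasked(uint32_t status)
{
	return (status == DEMO_REQ_CMP);
}
```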

PR:			270459
Sponsored by:		Netflix
MFC After:		1 week (and maybe an EN request)
Reviewed by:		ken, mav
Differential Revision:	https://reviews.freebsd.org/D39572
2023-04-15 16:32:41 -06:00
Mateusz Guzik
63ee747feb zfs: Revert "ZFS_IOC_COUNT_FILLED does unnecessary txg_wait_synced()"
This reverts commit 519851122b.

It results in data corruption, see:
https://github.com/openzfs/zfs/issues/14753

Sponsored by:	Rubicon Communications, LLC ("Netgate")
2023-04-15 21:34:54 +00:00
Mateusz Guzik
46ac8f2e7d zfs: don't use zfs_freebsd_copy_file_range
There is one data corruption problem reported and fixed upstream, not
cherry-picked here yet.

On top of it the following fires under load:
        VERIFY(zil_replaying(zfsvfs->z_log, tx));

The patch which introduced the entire machinery is a revert candidate,
but as the machinery came with a dedicated feature flag, doing so would
render affected pools read-only at best. To be figured out.

As a temporary bandaid at least stop the active usage.
Note this patch does not make the feature disappear from zpool upgrade.

Sponsored by:	Rubicon Communications, LLC ("Netgate")
2023-04-15 21:34:54 +00:00
Bjoern A. Zeeb
42742fe725 KASAN: add bus_space*read*_8 for aarch64
Add the remaining bus_space*read*_8 functions conditionally for
only arm64 in order to not break KASAN builds with new code using
one of them.

Suggested by:	markj
Reviewed by:	markj
MFC after:	3 days
Differential Revision: https://reviews.freebsd.org/D39581
2023-04-15 16:13:56 +00:00
Eugene Grosbein
5ee1c90e50 tmpfs: unbreak module build outside of kernel build environment
MFC after:	3 days
2023-04-15 11:00:03 +07:00
Konstantin Belousov
1e0e335b0f amd64: fix PKRU and swapout interaction
When vm_map_remove() is called from vm_swapout_map_deactivate_pages()
due to swapout, PKRU attributes for the removed range must be kept
intact.  Provide a variant of pmap_remove(), pmap_map_delete(), to
allow pmap to distinguish between real removes of the UVA mappings
and any other internal removes, e.g. swapout.

For non-amd64, pmap_map_delete() is stubbed by define to pmap_remove().

Reported by:	andrew
Reviewed by:	markj
Sponsored by:	The FreeBSD Foundation
MFC after:	1 week
Differential revision:	https://reviews.freebsd.org/D39556
2023-04-15 02:53:59 +03:00
Randall Stewart
3cc7b66732 tcp: stack unloading crash in rack and bbr
It's possible to induce a crash in either rack or bbr. This would be done
if the rack stack were say the default and bbr was being used by a connection.
If the bbr stack is then unloaded and it was active, we will trigger a MPASS assert
in tcp_hpts since the new stack (default rack) would start a timer, and the old stack
(bbr) would have the inp already in hpts.

Reviewed by: tuexen
Sponsored by: Netflix Inc
Differential Revision:https://reviews.freebsd.org/D39576
2023-04-14 15:42:23 -04:00
Alexander V. Chernikov
9f324d8ac2 netlink: make netlink work correctly on CHERI.
Current Netlink message writer code relies on executing callbacks
 with arbitrary data (pointer or integer) to flush the completed
 messages.
This arbitrary data is stored as a union of { void *, uint64_t }.
At some stage, the message flushing code copied this data, using
 direct uint64_t assignment instead of copying the union. It led
 to failure on CHERI, as sizeof(pointer) == 16 there.

Fix the code by making union non-anonymous and copying it entirely.
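
A minimal sketch of the two copying styles, using hypothetical names that
are not the Netlink ones. On common 64-bit platforms both copies happen to
work. Where a pointer is wider than 64 bits, as described above for CHERI,
only the whole-union copy carries the full pointer.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for the "pointer or integer" callback argument. */
union demo_arg {
	void		*ptr;
	uint64_t	 val;
};

struct demo_writer {
	union demo_arg	arg;
};

/* Fragile: copies only the 64-bit view of the union. */
static void
copy_arg_as_u64(struct demo_writer *dst, const struct demo_writer *src)
{
	dst->arg.val = src->arg.val;
}

/* Robust: copies the whole union, whatever sizeof(void *) is. */
static void
copy_arg_whole(struct demo_writer *dst, const struct demo_writer *src)
{
	dst->arg = src->arg;
}
```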

Reviewed by:	br, jhb, jrtc27
Differential Revision: https://reviews.freebsd.org/D39557
MFC after:	2 weeks
2023-04-14 16:33:43 +00:00
Alexander V. Chernikov
3e5d0784b9 Testing: add framework for the kernel unit tests.
This change intends to lower the bar for kernel unit testing by
 introducing a new kernel-testing framework ("ktest") based on Netlink,
 loadable test modules and python test suite integration.

This framework provides the following features:
* Integration to the FreeBSD test suite
* Automatic test discovery
* Automatic test module loading
* Minimal boiler-plate code in both kernel and userland
* Passing any metadata to the test
* Convenient environment pre-setup using python testing framework
* Streaming messages from the kernel to the userland
* Running tests in the dedicated taskqueues
* Skipping or parametrizing tests

Differential Revision: https://reviews.freebsd.org/D39385
MFC after:	2 weeks
2023-04-14 15:47:55 +00:00
Mikhail Pchelin
2f53b5991c net80211: fix a typo in Rx MCS set for unequal modulation case
RX MCS set defines which MCSs are supported for RX, bits 0-31 are for equal
modulation of the streams, bits 33-76 are for unequal case. Current code checks
txstreams variable instead of rxstreams to set bits from 53 to 76 for 4 spatial
streams case.

The modulations are defined in tables 19-38 and 19-41 of the IEEE Std
802.11-2020.

Spotted by bz in https://reviews.freebsd.org/D39476

Reviewed by:		bz
Approved by:		bz
Sponsored by:		Serenity Cybersecurity, LLC
Differential Revision:	https://reviews.freebsd.org/D39568
2023-04-14 18:20:09 +03:00
Mikhail Pchelin
ea26545cc5 net80211: wrong transmit MCS set in HT cap IE
Current code checks whether txstreams equals rxstreams and, if not, sets the
needed bits in "Transmit MCS Set". But if they are equal it sets the whole set
to zero, which contradicts the standard: if tx and rx streams are equal,
'Tx MCS Set Defined' (table 9-186, IEEE Std 802.11-2020) must be set to
one.

Reviewed by:		bz
Approved by:		bz
Sponsored by:		Serenity Cybersecurity, LLC
Differential Revision:	https://reviews.freebsd.org/D39476
2023-04-14 18:16:29 +03:00
Kyle Evans
d1b6271118 uart(4): add Sunrise Point UART controllers
Sponsored by:	Zenith Electronics LLC
Sponsored by:	Klara, Inc.
2023-04-14 09:58:00 -05:00
Elliott Mitchell
6d765bff6f xen: move common variables off of sys/x86/xen/hvm.c
The xen_domain_type and HYPERVISOR_shared_info variables are shared by
all Xen architectures, so they should be in common rather than
reimplemented by each architecture.

hvm_start_flags is used by xen_initial_domain() and so needs to be in
common.

Reviewed by: royger
Differential Revision: https://reviews.freebsd.org/D28982
2023-04-14 15:59:11 +02:00
Julien Grall
5e2183dab8 xen/intr: move sys/x86/xen/xen_intr.c to sys/dev/xen/bus/
The event channel source code or equivalent is needed on all
architectures.  Since much of this is viable to share, get this moved out
of x86-land.  Each interrupt interface then needs a distinct back-end
implementation.

Reviewed by: royger
Submitted by: Elliott Mitchell <ehem+freebsd@m5p.com>
Original implementation: Julien Grall <julien@xen.org>, 2014-01-13 17:41:04
Differential Revision: https://reviews.freebsd.org/D30236
2023-04-14 15:58:57 +02:00
Elliott Mitchell
6699c22c1c xen/intr: move interrupt allocation/release to architecture
Simply moving the interrupt allocation and release functions into files
which belong to the architecture.  Since x86 interrupt handling is quite
distinct from other architectures, this is a crucial necessary step.

Identifying the border between x86 and architecture-independent is
actually quite tricky.  Similarly, getting the prototypes for the
border right is also quite tricky.

Inspired by the work of Julien Grall <julien@xen.org>,
2015-10-20 09:14:56, but heavily adjusted.

Reviewed by: royger
Differential Revision: https://reviews.freebsd.org/D30936
2023-04-14 15:58:56 +02:00
Julien Grall
2d795ab1ea xen/intr: move x86 PIC interface to xen_arch_intr.c, introduce wrappers
The x86 PIC interface is very much x86-specific and not used by other
architectures.  Since most of xen_intr.c can be shared with other
architectures, the PIC interface needs to be broken off.

Introduce wrappers for calls into the architecture-dependent interrupt
layer.  All architectures need roughly the same functionality, but the
interface is slightly different between architectures.  Due to the
wrappers being so thin, all of them are implemented as inline in
arch-intr.h.

The original implementation was done by Julien Grall in 2015, but this
has required major updating.

Removal of PVHv1 meant substantial portions disappeared.  The original
implementation took care of moving interrupt allocation to
xen_arch_intr.c, but this has required massive rework and was broken
off.

In the original implementation the wrappers were normal functions.  Some
had empty stubs in xen_intr.c and were removed.

Reviewed by: royger
Submitted by: Elliott Mitchell <ehem+freebsd@m5p.com>
Original implementation: Julien Grall <julien@xen.org>, 2015-10-20 09:14:56
Differential Revision: https://reviews.freebsd.org/D30909
2023-04-14 15:58:56 +02:00
Elliott Mitchell
373301019f xen/intr: remove type argument from xen_intr_alloc_isrc()
This value doesn't need to be set in xen_intr_alloc_isrc().  What is
needed is simply to ensure the allocated xenisrc won't appear as free,
even if xi_type is written non-atomically.  Since the type is no longer
used to indicate free or not, the calling function should take care of
all non-architecture initialization.

Reviewed by: royger
Differential Revision: https://reviews.freebsd.org/D31188
2023-04-14 15:58:55 +02:00
Elliott Mitchell
d0a69069bb xen/x86: rework isrc allocation to use list instead of table scanning
Scanning the list of interrupts to find an unused entry is rather
inefficient.  Instead overlay a free list structure and use a list
instead.
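
The overlay idea can be sketched roughly as below. All names are
hypothetical, the table is tiny, and locking is omitted, so this shows the
technique only and is not the xen_intr.c code. A free slot reuses its own
storage for the free-list link, so allocation pops the list head instead of
scanning the table.

```c
#include <stddef.h>

#define DEMO_NSLOTS	4

/* Hypothetical interrupt-source slot. */
struct demo_isrc {
	union {
		struct demo_isrc *next_free;	/* valid while the slot is free */
		unsigned int	  port;		/* valid while the slot is in use */
	} u;
};

static struct demo_isrc demo_slots[DEMO_NSLOTS];
static struct demo_isrc *demo_free_head;

/* Thread every slot onto the free list. */
static void
demo_init(void)
{
	demo_free_head = NULL;
	for (size_t i = 0; i < DEMO_NSLOTS; i++) {
		demo_slots[i].u.next_free = demo_free_head;
		demo_free_head = &demo_slots[i];
	}
}

/* O(1) allocation: pop the head of the free list, or NULL if empty. */
static struct demo_isrc *
demo_alloc(void)
{
	struct demo_isrc *isrc = demo_free_head;

	if (isrc != NULL)
		demo_free_head = isrc->u.next_free;
	return (isrc);
}

/* O(1) release: push the slot back onto the free list. */
static void
demo_release(struct demo_isrc *isrc)
{
	isrc->u.next_free = demo_free_head;
	demo_free_head = isrc;
}
```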

This also has the useful effect of removing the last use of evtchn_type
values outside of xen_intr.c.

Reviewed by: royger
[royger]
 - Make avail_list static.
2023-04-14 15:58:54 +02:00
Elliott Mitchell
d32d65276b xen/intr: move evtchn_type to intr-internal.h
The evtchn_type enum is only touched by the Xen interrupt code.  Other
event channel uses no longer need the value, so that has been moved to
restrict its use.

Copyright note.  The current evtchn_type was introduced at 76acc41fb7
by Justin T. Gibbs.  This in turn appears to have been heavily inspired
by 30d1eefe39 done by Kip Macy.

Reviewed by: royger
2023-04-14 15:58:53 +02:00
Julien Grall
ab7ce14b1d xen/intr: introduce dev/xen/bus/intr-internal.h
Move the xenisrc structure which needs to be shared between the core Xen
interrupt code and architecture-dependent code into a separate header.  A
similar situation exists for the NR_EVENT_CHANNELS constant.

Turn xi_intsrc into a type definition named xi_arch to reflect the new
purpose of being an architectural variable for the interrupt source.

This was originally implemented by Julien Grall, but has been heavily
modified.  The core side was renamed "intr-internal.h" and is #include'd
by "arch-intr.h" instead of the other way around.  This allows the
architecture to add function definitions which use struct xenisrc.

The original version only moved xi_intsrc into xen_arch_isrc_t.  Moving
xi_vector was done by the submitter.

The submitter had also moved xi_activehi and xi_edgetrigger into
xen_arch_isrc_t.  Those disappeared with the removal of PVHv1 support.

Copyright note.  The current xenisrc structure was introduced at
76acc41fb7 by Justin T. Gibbs.  Traces remain, but the strength of
Copyright claims from before 2013 seem pretty weak.

Reviewed by: royger
Submitted by: Elliott Mitchell <ehem+freebsd@m5p.com>, 2021-03-17 19:09:01
Original implementation: Julien Grall <julien@xen.org>, 2015-10-20 09:14:56
Differential Revision: https://reviews.freebsd.org/D30648
[royger]
 - Adjust some line lengths
 - Fix comment about NR_EVENT_CHANNELS after movement.
 - Use #include instead of symlinks.
2023-04-14 15:58:53 +02:00
Elliott Mitchell
af610cabf1 xen/intr: adjust xen_intr_handle_upcall() to match driver filter
xen_intr_handle_upcall() has two interfaces.  It needs to be called by
the x86 assembly code invoked by the APIC.  Second, it needs to be called
as a driver_filter_t for the XenPCI code and for architectures besides
x86.

Unfortunately the driver_filter_t interface was implemented as a wrapper
around the x86-APIC interface.  Now create a simple wrapper for the
x86-APIC code, which calls an architecture-independent
xen_intr_handle_upcall().

When called via intr_event_handle(), driver_filter_t functions expect
preemption to be disabled.  This removes the need for
critical_enter()/critical_exit() when called this way.

The lapic_eoi() call is only needed on x86 in some cases when invoked
directly as an APIC vector handler.

Additionally driver_filter_t functions have no need to handle interrupt
counters.  The intrcnt_add() calling function was reworked to match the
current situation.  intrcnt_add() is now only called via one path.

The increment/decrement of curthread->td_intr_nesting_level had
previously been left out.  Appears this was mostly harmless, but this
was noticed during implementation and has been added.

CONFIG_X86 is a leftover from use with Linux.  While the barrier isn't
needed for FreeBSD on x86, it will be needed for FreeBSD on other
architectures.

Copyright note.  xen_intr_intrcnt_add() was introduced at 76acc41fb7
by Justin T. Gibbs.  xen_intrcnt_init() was introduced at fd036deac1
by John Baldwin.

sys/x86/xen/xen_arch_intr.c was originally created by Julien Grall in
2015 for the purpose of holding the x86 interrupt interface.  Later it
was found xen_intr_handle_upcall() was better earlier, and the x86
interrupt interface better later.  As such the filename and header list
belong to Julien Grall, but what those were created for is later.

Reviewed by: royger
Differential Revision: https://reviews.freebsd.org/D30006
2023-04-14 15:58:52 +02:00
Elliott Mitchell
2794893ebf xen/intr: do full xenisrc initialization during binding
Keeping released xenisrcs in a known state simplifies allocation, but
forces the allocation function to maintain that state.  This turns into
a problem when trying to allow for interchangeable allocation functions.
Fix this issue by ensuring xenisrcs are always *fully* initialized
during binding.

Reviewed by: royger
2023-04-14 15:58:51 +02:00
Elliott Mitchell
ff73b1d69b xen/intr: split xen_intr_isrc_lock uses
There are actually several distinct locking domains in xen_intr.c, but
all were sharing the same lock.  Both xen_intr_port_to_isrc[] and the
x86 interrupt structures needed protection.  Split these two apart as a
precursor to splitting the architecture portions off the file.

Reviewed by: royger
Differential Revision: https://reviews.freebsd.org/D30726
2023-04-14 15:58:51 +02:00
Elliott Mitchell
834013dea2 xen/intr: rework xen_intr_alloc_isrc() locking
Locking for allocation was being done in xen_intr_bind_isrc(), but the
unlock was inside xen_intr_alloc_isrc().  While the lock acquisition at
the end of xen_intr_alloc_isrc() was to modify xen_intr_port_to_isrc[],
NOT allocation.  Fix this garbled (though working) locking scheme.

Now locking for allocation is strictly in xen_intr_alloc_isrc(), while
locking to modify xen_intr_port_to_isrc[] is in xen_intr_bind_isrc().

Reviewed by: royger
Differential Revision: https://reviews.freebsd.org/D30726
2023-04-14 15:58:50 +02:00
Elliott Mitchell
09bd542d17 xen/intr: rework xen_intr_alloc_isrc() call structure
The call structure around xen_intr_alloc_isrc() was rather awful.
Notably finding a structure for reuse is part of allocation, but this
was done outside xen_intr_alloc_isrc().  Move this into
xen_intr_alloc_isrc() so the function handles all allocation steps.

Reviewed by: royger
Differential Revision: https://reviews.freebsd.org/D30726
2023-04-14 15:58:49 +02:00
Elliott Mitchell
149c581018 xen/intr: adjust xenisrc types, adjust format strings to match
As "CPUs", IRQs (vector) and virtual IRQs are always positive integers,
adjust the Xen code to use unsigned integers.  Several format strings
need adjustment to match.  Additionally single-bit bitfields are
boolean.

No functional change expected.

Reviewed by: royger
2023-04-14 15:58:49 +02:00
Elliott Mitchell
ecdcad6516 xen: remove CONFIG_XEN_COMPAT, purge Xen 3.0 compatibility
This overlaps the purpose of __XEN_INTERFACE_VERSION__.  Remove Xen 3.0.2
compatibility.  __XEN_INTERFACE_VERSION__ has compatibility to Xen 3.2.8
enabled.  As Xen 3.3 was released almost 15 years ago, it seems unlikely
anyone hasn't updated.

Reviewed by: royger
2023-04-14 15:58:48 +02:00
Elliott Mitchell
61ccede8cf xen: purge no longer used hypervisor functions
HYPERVISOR_poll(), HYPERVISOR_block(), and HYPERVISOR_crash() appear no
longer used.  Further get_system_time() appears to have disappeared at
some point in the past, so HYPERVISOR_poll() was broken anyway.

No functional change intended.

Reviewed by: royger
2023-04-14 15:58:47 +02:00
Elliott Mitchell
b2c50bb934 xen/efi: make Xen PV EFI clock optional
The present implementation is only for x86.  Other architectures need
adjustments for querying presence of EFI.

Xen's EFI support is also quite troublesome on non-x86.  This is being
slowly remedied, but until it is in better shape the EFI clock functionality
should be disabled.

Reviewed by: royger
Differential Revision: https://reviews.freebsd.org/D31065
2023-04-14 15:58:47 +02:00
Julien Grall
28a78d860e xen: introduce XEN_CPUID_TO_VCPUID()/XEN_VCPUID()
Part of the series for allowing FreeBSD/ARM to run on Xen.  On ARM the
function is a trivial pass-through, other architectures need distinct
implementations.

While implementing XEN_VCPUID() as a call to XEN_CPUID_TO_VCPUID()
works, that involves multiple accesses to the PCPU region.  As such make
this a distinct macro.  Only callers in machine independent code have
been switched.

Add a wrapper for the x86 PIC interface to use matching the old
prototype.

Partially inspired by the work of Julien Grall <julien@xen.org>,
2015-08-01 09:45:06, but XEN_VCPUID() was redone by Elliott Mitchell on
2022-06-13 12:51:57.

Reviewed by: royger
Submitted by: Elliott Mitchell <ehem+freebsd@m5p.com>
Original implementation: Julien Grall <julien@xen.org>, 2014-04-19 08:57:40
Original implementation: Julien Grall <julien@xen.org>, 2014-04-19 14:32:01
Differential Revision: https://reviews.freebsd.org/D29404
2023-04-14 15:58:46 +02:00
Elliott Mitchell
054073c283 xen/intr: xen_intr_bind_isrc() always set handle
Previously the upper layer handle was being set before the last
potential error condition.  The reasoning appears to have been it was
assumed invalid in case of an error being returned.  Now ensure it is
invalid until just before a successful return.

Fixes: 76acc41fb7 ("Implement vector callback for PVHVM and unify event channel implementations")
Fixes: 6d54cab1fe ("xen: allow to register event channels without handlers")
Reviewed by: royger
2023-04-14 15:58:45 +02:00
Randall Stewart
9903bf34f0 tcp: rack pacing has some caveats that need to be obeyed when LRO is missing
In further non-LRO testing I found a case where rack is supposed to be waking up but
it is not now. In this special case it sets the flag rc_ack_can_sendout_data. When that is
set we should not prohibit output.

Reviewed by: tuexen
Sponsored by: Netflix Inc
Differential Revision: https://reviews.freebsd.org/D39565
2023-04-14 09:33:36 -04:00
Kristof Provost
b0e38a1373 bridge: distinguish no vlan and vlan 1
The bridge treated no vlan tag as being equivalent to vlan ID 1, which
causes confusion if the bridge sees both untagged and vlan 1 tagged
traffic.

Use DOT1Q_VID_NULL when there's no tag, and fix up the lookup code by
using 'DOT1Q_VID_RSVD_IMPL' to mean 'any vlan', rather than vlan 0. Note
that we have to account for userspace expecting to use 0 as meaning 'any
vlan'.
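
A minimal sketch of the distinction, assuming DOT1Q_VID_NULL is 0 and DOT1Q_VID_RSVD_IMPL is 0xfff (the values are defined locally here and are believed to match net/ethernet.h). The helpers vlan_from_user() and vlan_matches() are hypothetical and are not the functions in the bridge code.

```c
#include <stdbool.h>
#include <stdint.h>

/* Local stand-ins; values assumed to match those in <net/ethernet.h>. */
#define DOT1Q_VID_NULL		0x0	/* frame carries no VLAN tag */
#define DOT1Q_VID_RSVD_IMPL	0xfff	/* reserved ID, used as "any vlan" */

/* Hypothetical helper: map a userspace VLAN argument to the internal value. */
uint16_t
vlan_from_user(uint16_t user_vlan)
{
	/* Userspace still passes 0 to mean "any vlan". */
	return (user_vlan == 0 ? DOT1Q_VID_RSVD_IMPL : user_vlan);
}

/* Hypothetical helper: does a lookup key match an entry's VLAN? */
bool
vlan_matches(uint16_t lookup_vlan, uint16_t entry_vlan)
{
	if (lookup_vlan == DOT1Q_VID_RSVD_IMPL)
		return (true);		/* wildcard matches every entry */
	return (lookup_vlan == entry_vlan);
}
```

With this split, an untagged frame (DOT1Q_VID_NULL) no longer matches an entry learned on vlan 1, while a userspace request for vlan 0 still matches everything.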

PR:		270559
Suggested by:	Zhenlei Huang <zlei@FreeBSD.org>
Reviewed by:	philip, zlei
Differential Revision:  https://reviews.freebsd.org/D39478
2023-04-14 13:17:02 +02:00
Zhenlei Huang
9af6f4268a bridge: Use the %D identifier to format MAC address
It is shorter and more readable.

No functional change intended.

Reviewed by:	kp
Fixes:		2d3614fb13 bridge: Log MAC address port flapping
MFC after:	1 week
Differential Revision:	https://reviews.freebsd.org/D39542
2023-04-14 18:08:56 +08:00
Kajetan Staszkiewicz
39282ef356 pf: backport OpenBSD syntax of "scrub" option for "match" and "pass" rules
Introduce the OpenBSD syntax of "scrub" option for "match" and "pass"
rules and the "set reassemble" flag. The patch is backward-compatible:
pf.conf can still be written in FreeBSD-style.

Obtained from:	OpenBSD
MFC after:	never
Sponsored by:	InnoGames GmbH
Differential Revision:	https://reviews.freebsd.org/D38025
2023-04-14 09:04:06 +02:00
Gordon Bergling
26713ad9cf arm: Remove a double word in a comment in setjmp
- s/number number/number/

MFC after:	5 days
2023-04-13 20:37:25 +02:00
Gordon Bergling
c159f76713 kern: remove a double word in a KASSERT in subr_trap
- s/with with/with/

MFC after:	5 days
2023-04-13 20:03:37 +02:00
Henri Hennebert
71883128e5 rtsx: Add plug-and-play info
Add MODULE_PNP_INFO() to the driver to make it autoload if not linked
statically into the kernel. Remove the device from amd64/i386 GENERIC.

Reviewed by:	imp
Differential Revision:	https://reviews.freebsd.org/D35074
2023-04-13 11:12:50 -03:00
Randall Stewart
25685b7537 TCP: Misc cleanups of tcp_subr.c
In going through all the TCP stacks I have found we have a few little bugs and niggles that need to
be cleaned up in tcp_subr.c including the following:

a) Set tcp_restoral_thresh to 450 (45%) not 550. This is a better proven value in my testing.
b) Let's track when we try to do pacing but fail via a counter for connections that do pace.
c) If a switch away from the default stack occurs and it fails we need to make sure the time
   scale is in the right mode (just in case the other stack changed it but then failed).
d) Use the TP_RXTCUR() macro when starting the TT_REXMT timer.
e) When we end a default flow let's log that in BBlogs as well as cleanup any t_acktime (disable).
f) When we respond with a RST let's make sure to update the log_end_status properly.
g) When starting a new pcb let's ensure that all LRO features are off.
h) When discarding a connection let's make sure that any t_in_pkt's that might be there are freed properly.

Reviewed by: tuexen
Sponsored by: Netflix Inc
Differential Revision: https://reviews.freebsd.org/D39501
2023-04-13 09:29:05 -04:00
Ed Maste
2ef2c26f3f link_elf: fix SysV hash function overflow
Quoting from https://maskray.me/blog/2023-04-12-elf-hash-function:

The System V Application Binary Interface (generic ABI) specifies the
ELF object file format. When producing an output executable or shared
object needing a dynamic symbol table (.dynsym), a linker generates a
.hash section with type SHT_HASH to hold a symbol hash table. A DT_HASH
tag is produced to hold the address of .hash.

The function is supposed to return a value no larger than 0x0fffffff.
Unfortunately, there is a bug. When unsigned long consists of more than
32 bits, the return value may be larger than UINT32_MAX. For instance,
elf_hash((const unsigned char *)"\xff\x0f\x0f\x0f\x0f\x0f\x12") returns
0x100000002, which is clearly unintended, as the function should behave
the same way regardless of whether long represents a 32-bit integer or
a 64-bit integer.
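
The overflow quoted above can be reproduced in a small userspace sketch. The names elf_hash_wide() and elf_hash_fixed() are illustrative and not taken from the commit; uint64_t stands in for a 64-bit unsigned long, and the uint32_t accumulator is one way to keep the result bounded, not necessarily the exact change that was committed.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Historical SysV hash with a 64-bit accumulator, standing in for a
 * platform where unsigned long is 64 bits wide (hypothetical name).
 */
uint64_t
elf_hash_wide(const unsigned char *name)
{
	uint64_t h = 0, g;

	while (*name != '\0') {
		h = (h << 4) + *name++;
		if ((g = h & 0xf0000000) != 0)
			h ^= g >> 24;
		h &= ~g;	/* clears bits 28-31 only; bits 32+ survive */
	}
	return (h);
}

/* Same algorithm with a 32-bit accumulator: bits above 31 are discarded. */
uint32_t
elf_hash_fixed(const unsigned char *name)
{
	uint32_t h = 0, g;

	while (*name != '\0') {
		h = (h << 4) + *name++;
		if ((g = h & 0xf0000000) != 0)
			h ^= g >> 24;
		h &= ~g;
	}
	return (h);
}

/* Self-check using the input quoted in the commit message. */
void
elf_hash_selfcheck(void)
{
	const unsigned char *s =
	    (const unsigned char *)"\xff\x0f\x0f\x0f\x0f\x0f\x12";

	assert(elf_hash_wide(s) == 0x100000002ULL);
	assert(elf_hash_fixed(s) <= 0x0fffffff);
}
```

The two variants agree for short names and diverge only when the accumulator carries past bit 31 in the wide version.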

Reviewed by:	kib, Fangrui Song
Sponsored by:	The FreeBSD Foundation
Differential Revision: https://reviews.freebsd.org/D39517
2023-04-12 15:33:55 -04:00
John Baldwin
1ca12bd927 Remove the riscv64sf architecture.
Reviewed by:	jrtc27, arichardson, br, kp, imp, emaste
Differential Revision:	https://reviews.freebsd.org/D39496
2023-04-12 11:09:27 -07:00
Michael Tuexen
2ba2849c82 tcp: fix typo in comment
Reported by:	cc
MFC after:	1 week
Sponsored by:	Netflix, Inc.
2023-04-12 18:08:21 +02:00
Michael Tuexen
c687f21add tcp: make net.inet.tcp.functions_default vnet specific
Reviewed by:		cc, rrs
MFC after:		1 week
Sponsored by:		Netflix, Inc.
Differential Revision:	https://reviews.freebsd.org/D39516
2023-04-12 18:04:27 +02:00
Randall Stewart
1073f41657 tcp_lro: When processing compressed acks let's support the new early wake feature for rack.
During compressed ack and mbuf queuing we determine if we need to wake up. A
new function was added that is optional to the tfb so that the stack itself can also
be asked if a wakeup should happen. This helps compensate for late hpts calls.

Reviewed by: tuexen
Sponsored by: Netflix Inc
Differential Revision: https://reviews.freebsd.org/D39502
2023-04-12 11:35:14 -04:00
Andrew Turner
421516f25e Create pmap_mask_set_locked on arm64
Create a locked version of pmap_mask_set. We will need this for BTI
support.

Sponsored by:	Arm Ltd
2023-04-12 13:10:13 +01:00
Michael Tuexen
73c48d9d8f tcp: fix deregistering stacks when vnets are used
This fixes a bug where stacks could not be deregistered when
end points in the non-default vnet are using it.

Reviewed by:		glebius, zlei
MFC after:		1 week
Sponsored by:		Netflix, Inc.
Differential Revision:	https://reviews.freebsd.org/D39514
2023-04-12 10:52:53 +02:00
Zhenlei Huang
c3c5e6c3e6 tarfs: Use the existing CTLFLAG_RWTUN flag definition
Use it when possible, instead of separated flags.

No functional change intended.

Reviewed by:	hselasky, erj
MFC after:	3 days
Differential Revision:	https://reviews.freebsd.org/D39466
2023-04-12 12:20:38 +08:00
Zhenlei Huang
deac4c7f07 iicbus(4): Use the existing CTLFLAG_RWTUN flag definition
Use it when possible, instead of separated flags.

No functional change intended.

Reviewed by:	hselasky, erj
MFC after:	3 days
Differential Revision:	https://reviews.freebsd.org/D39466
2023-04-12 12:20:38 +08:00
Zhenlei Huang
8bd9afe9e1 bxe(4): Use CTLFLAG_RDTUN flag definition
sysctl variables rx_budget and max_aggregation_size are read-only loader
tunables. Mark them with CTLFLAG_RD flag.

No functional change intended.

Reviewed by:	hselasky, erj
MFC after:	3 days
Differential Revision:	https://reviews.freebsd.org/D39466
2023-04-12 12:20:38 +08:00
Zhenlei Huang
5ff8018108 ice(4): Use the existing CTLFLAG_RWTUN flag definition
Use it when possible, instead of separated flags.

No functional change intended.

Reviewed by:	hselasky, erj
MFC after:	3 days
Differential Revision:	https://reviews.freebsd.org/D39466
2023-04-12 12:20:38 +08:00
Zhenlei Huang
69cb72b872 cam iosched: Use the existing CTLFLAG_RDTUN and CTLFLAG_RWTUN flag definitions
Use them when possible, instead of separated flags.
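
As a rough illustration of why these changes are not functional, the combined definitions are simply the OR of the separate flags. The values below are local stand-ins assumed to mirror sys/sysctl.h; flags_separate and flags_combined are hypothetical names used only for this sketch.

```c
#include <stdint.h>

/* Local stand-ins; values assumed to mirror <sys/sysctl.h>. */
#define CTLFLAG_RD	0x80000000u
#define CTLFLAG_WR	0x40000000u
#define CTLFLAG_RW	(CTLFLAG_RD | CTLFLAG_WR)
#define CTLFLAG_TUN	0x00080000u
#define CTLFLAG_RDTUN	(CTLFLAG_RD | CTLFLAG_TUN)
#define CTLFLAG_RWTUN	(CTLFLAG_RW | CTLFLAG_TUN)

/* Before: flags spelled out separately. */
const uint32_t flags_separate = CTLFLAG_RW | CTLFLAG_TUN;

/* After: the combined definition, which expands to the same bits. */
const uint32_t flags_combined = CTLFLAG_RWTUN;
```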

No functional change intended.

Reviewed by:	hselasky, erj
MFC after:	3 days
Differential Revision:	https://reviews.freebsd.org/D39466
2023-04-12 12:20:38 +08:00
Zhenlei Huang
dc1c5138c3 powerpc: Use the existing CTLFLAG_RDTUN and CTLFLAG_RWTUN flag definitions
Use them when possible, instead of separated flags.

No functional change intended.

Reviewed by:	hselasky, erj
MFC after:	3 days
Differential Revision:	https://reviews.freebsd.org/D39466
2023-04-12 12:20:38 +08:00
John Baldwin
cd800d3c96 Enable -Warray-parameter for clang.
I fixed many of these previously for GCC 12 and make tinderbox passes
with this enabled.

Differential Revision:	https://reviews.freebsd.org/D39378
2023-04-11 13:47:59 -07:00
Richard Scheffenegger
2169f71277 tcp: use IPV6_FLOWLABEL_LEN
Avoid magic numbers when handling the IPv6 flow ID for
DSCP and ECN fields and use the named variable instead.

Reviewed By:		tuexen, #transport
Sponsored by:		NetApp, Inc.
Differential Revision:	https://reviews.freebsd.org/D39503
2023-04-11 18:53:51 +02:00
Konstantin Belousov
c53e990b8d DEBUG_VFS_LOCKS: restore diagnostic for the witness use case
Reviewed by:	jah, markj
Tested by:	pho
Sponsored by:	The FreeBSD Foundation
MFC after:	1 week
Differential revision:	https://reviews.freebsd.org/D39477
2023-04-11 15:59:55 +03:00
Konstantin Belousov
75fc6f86c3 Add witness_is_owned(9)
which returns an indicator if the current thread owns the specified
lock.

Reviewed by:	jah, markj
Tested by:	pho
Sponsored by:	The FreeBSD Foundation
MFC after:	1 week
Differential revision:	https://reviews.freebsd.org/D39477
2023-04-11 15:59:49 +03:00
Konstantin Belousov
afa8f8971b vn_start_write(): consistently set *mpp to NULL on error or after failed sleep
This ensures that *mpp != NULL iff vn_finished_write() should be
called, regardless of the returned error, except for V_NOWAIT.
The only exception that must be maintained is the case where
vn_start_write(V_NOWAIT) is called with the intent of later dropping
other locks and then doing vn_start_write(V_XSLEEP), which needs the mp
value calculated from the non-waitable call above it.

Also note that V_XSLEEP is not supported by vn_start_secondary_write().

Reviewed by:	markj, mjg (previous version), rmacklem (previous version)
Tested by:	pho
Sponsored by:	The FreeBSD Foundation
MFC after:	1 week
Differential revision:	https://reviews.freebsd.org/D39441
2023-04-11 15:59:46 +03:00
Konstantin Belousov
b2f3288747 vn_start_write(): minor style
Reviewed by:	markj
Tested by:	pho
Sponsored by:	The FreeBSD Foundation
MFC after:	1 week
Differential revision:	https://reviews.freebsd.org/D39441
2023-04-11 15:59:39 +03:00
Eugene Grosbein
37f4cb29bd imgact_binmisc: unbreak module build outside of kernel build environment
MFC after:	3 days
2023-04-11 17:32:29 +07:00
domienschepers
61605e0ae5 net80211: fail for unicast traffic without unicast key
Falling back to the multicast key may cause unicast traffic to leak.
Instead fail when no key is found.

For more information see the 'Framing Frames: Bypassing Wi-Fi Encryption
by Manipulating Transmit Queues' paper.

[ I updated the commit message to reference the paper and the code
comment to record historic behaviour as discussed in private email. ]

Security:	CVE-2022-47522
2023-04-10 23:38:57 +00:00
Randall Stewart
a2b33c9a7a tcp: Rack - in the absence of LRO fixed rate pacing (loopback or interfaces with no LRO) does not work correctly.
Rack is capable of fixed rate or dynamic rate pacing. Both of these can get mixed up when
LRO is not available. This is because LRO will hold off waking up the tcp connection to
process the inbound packets until the pacing timer is up. Without LRO the pacing only
sort-of works. Sometimes we pace correctly, other times not so much.

This set of changes will make it so pacing works properly in the absence of LRO.

Reviewed by: tuexen
Sponsored by: Netflix Inc
Differential Revision: https://reviews.freebsd.org/D39494
2023-04-10 16:33:56 -04:00
John Baldwin
e222461790 rack: mask and tclass are only used for INET6.
This fixes the LINT-NOINET6 build.
2023-04-10 12:21:03 -07:00
Joseph Koshy
0e9e9048ae procfs: Sync a documentation comment with the code.
Approved by:	gnn (mentor)
Differential Revision: https://reviews.freebsd.org/D39488
2023-04-10 17:58:46 +00:00
John Baldwin
3b3762c34e sys: Enable -Wunused-but-set-variable for GCC.
It has been enabled for clang for a while now.

Reviewed by:	emaste
Differential Revision:	https://reviews.freebsd.org/D39358
2023-04-10 10:36:33 -07:00
John Baldwin
8e9db62e74 zfs: Appease set but unused warnings for spl_fstrans_*mark stubs.
Use a void cast to mark the cookie value as used in spl_fstrans_unmark.

Reported by:	GCC
Differential Revision:	https://reviews.freebsd.org/D39357
2023-04-10 10:36:14 -07:00
John Baldwin
5328efb3d0 if_mos: Remove set but unused variable.
Reviewed by:	hselasky
Reported by:	GCC
Differential Revision:	https://reviews.freebsd.org/D39356
2023-04-10 10:35:48 -07:00
John Baldwin
4b6228906f libalias: Mark set but unused variables as unused.
This function is clearly a stub, but it seems better to leave the stub
bits in place than to remove the function entirely.

Differential Revision:	https://reviews.freebsd.org/D39355
2023-04-10 10:35:29 -07:00
John Baldwin
16df72a9a2 udf: Remove set but unused variable from udf_getattr.
Reviewed by:	emaste
Reported by:	GCC
Differential Revision:	https://reviews.freebsd.org/D39354
2023-04-10 10:31:45 -07:00
John Baldwin
3a9e6624eb rtw88: Silence unused but set warnings from GCC for debug.c.
Reviewed by:	bz
Differential Revision:	https://reviews.freebsd.org/D39353
2023-04-10 10:31:26 -07:00