Expect that drivers call into the network stack with the net epoch
entered. This has already been the fact since early 2020. The net
interrupts, that are marked with INTR_TYPE_NET, were entering epoch
since 511d1afb6b. For the taskqueues there is NET_TASK_INIT() and
all drivers that were known back in 2020 we marked with it in
6c3e93cb5a. However in e87c494015 we took conservative approach
and preferred to opt-in rather than opt-out for the epoch.
This change not only reverts e87c494015 but adds a safety belt to
avoid panicing with INVARIANTS if there is a missed driver. With
INVARIANTS we will run in_epoch() check, print a warning and enter
the net epoch. A driver that prints can be quickly fixed with the
IFF_NEEDSEPOCH flag, but better be augmented to properly enter the
epoch itself.
Note on TCP LRO: it is a backdoor to enter the TCP stack bypassing
some layers of net stack, ignoring either old IFF_KNOWSEPOCH or the
new IFF_NEEDSEPOCH. But the tcp_lro_flush_all() asserts the presence
of network epoch. Indeed, all NIC drivers that support LRO already
provide the epoch, either with help of INTR_TYPE_NET or just running
NET_EPOCH_ENTER() in their code.
Reviewed by: zlei, gallatin, erj
Differential Revision: https://reviews.freebsd.org/D39510
Ensure veriexec opens the file before doing any read operations.
When the MAC_VERIEXEC_CHECK_PATH_SYSCALL syscall is requested, veriexec
needs to open the file before calling mac_veriexec_check_vp. This is to
ensure any set up is done by the file system. Most file systems do not
explicitly need an open, but some (e.g. virtfs) require initialization
of access tokens (file identifiers, etc.) before doing any read or write
operations.
The evaluate_fingerprint() function needs to ensure it has an open file
for reading in order to evaluate the fingerprint. The ideal solution is
to have a hook after the VOP_OPEN call in vn_open. For now, we open the
file for reading, envaluate the fingerprint, and close the file. While
this leaves a potential hole that could possibly be taken advantage of
by a dedicated aversary, this code path is not typically visited often
in our use cases, as we primarily encounter verified mounts and not
individual files. This should be considered a temporary workaround until
discussions about the post-open hook have concluded and the hook becomes
available.
Add MAC_VERIEXEC_GET_PARAMS_PATH_SYSCALL and
MAC_VERIEXEC_GET_PARAMS_PID_SYSCALL to mac_veriexec_syscall so we can
fetch and check label contents in an unconstrained manner.
Add a check for PRIV_VERIEXEC_CONTROL to do ioctl on /dev/veriexec
Make it clear that trusted process cannot be debugged. Attempts to debug
a trusted process already fail, but the failure path is very obscure.
Add an explicit check for VERIEXEC_TRUSTED in
mac_veriexec_proc_check_debug.
We need mac_veriexec_priv_check to not block PRIV_KMEM_WRITE if
mac_priv_gant() says it is ok.
Reviewed by: sjg
Obtained from: Juniper Networks, Inc.
The event channel source code or equivalent is needed on all
architectures. Since much of this is viable to share, get this moved out
of x86-land. Each interrupt interface then needs a distinct back-end
implementation.
Reviewed by: royger
Submitted by: Elliott Mitchell <ehem+freebsd@m5p.com>
Original implementation: Julien Grall <julien@xen.org>, 2014-01-13 17:41:04
Differential Revision: https://reviews.freebsd.org/D30236
Simply moving the interrupt allocation and release functions into files
which belong to the architecture. Since x86 interrupt handling is quite
distinct from other architectures, this is a crucial necessary step.
Identifying the border between x86 and architecture-independent is
actually quite tricky. Similarly, getting the prototypes for the
border right is also quite tricky.
Inspired by the work of Julien Grall <julien@xen.org>,
2015-10-20 09:14:56, but heavily adjusted.
Reviewed by: royger
Differential Revision: https://reviews.freebsd.org/D30936
The x86 PIC interface is very much x86-specific and not used by other
architectures. Since most of xen_intr.c can be shared with other
architectures, the PIC interface needs to be broken off.
Introduce wrappers for calls into the architecture-dependent interrupt
layer. All architectures need roughly the same functionality, but the
interface is slightly different between architectures. Due to the
wrappers being so thin, all of them are implemented as inline in
arch-intr.h.
The original implementation was done by Julien Grall in 2015, but this
has required major updating.
Removal of PVHv1 meant substantial portions disappeared. The original
implementation took care of moving interrupt allocation to
xen_arch_intr.c, but this has required massive rework and was broken
off.
In the original implementation the wrappers were normal functions. Some
had empty stubs in xen_intr.c and were removed.
Reviewed by: royger
Submitted by: Elliott Mitchell <ehem+freebsd@m5p.com>
Original implementation: Julien Grall <julien@xen.org>, 2015-10-20 09:14:56
Differential Revision: https://reviews.freebsd.org/D30909
The evtchn_type enum is only touched by the Xen interrupt code. Other
event channel uses no longer need the value, so that has been moved to
restrict its use.
Copyright note. The current evtchn_type was introduced at 76acc41fb7
by Justin T. Gibbs. This in turn appears to have been heavily inspired
by 30d1eefe39 done by Kip Macy.
Reviewed by: royger
Move the xenisrc structure which needs to be shared between the core Xen
interrupt code and architecture-dependent code into a separate header. A
similar situation exists for the NR_EVENT_CHANNELS constant.
Turn xi_intsrc into a type definition named xi_arch to reflect the new
purpose of being an architectural variable for the interrupt source.
This was originally implemented by Julien Grall, but has been heavily
modified. The core side was renamed "intr-internal.h" and is #include'd
by "arch-intr.h" instead of the other way around. This allows the
architecture to add function definitions which use struct xenisrc.
The original version only moved xi_intsrc into xen_arch_isrc_t. Moving
xi_vector was done by the submitter.
The submitter had also moved xi_activehi and xi_edgetrigger into
xen_arch_isrc_t. Those disappeared with the removal of PVHv1 support.
Copyright note. The current xenisrc structure was introduced at
76acc41fb7 by Justin T. Gibbs. Traces remain, but the strength of
Copyright claims from before 2013 seem pretty weak.
Reviewed by: royger
Submitted by: Elliott Mitchell <ehem+freebsd@m5p.com>, 2021-03-17 19:09:01
Original implementation: Julien Grall <julien@xen.org>, 2015-10-20 09:14:56
Differential Revision: https://reviews.freebsd.org/D30648
[royger]
- Adjust some line lengths
- Fix comment about NR_EVENT_CHANNELS after movement.
- Use #include instead of symlinks.
xen_intr_handle_upcall() has two interfaces. It needs to be called by
the x86 assembly code invoked by the APIC. Second, it needs to be called
as a driver_filter_t for the XenPCI code and for architectures besides
x86.
Unfortunately the driver_filter_t interface was implemented as a wrapper
around the x86-APIC interface. Now create a simple wrapper for the
x86-APIC code, which calls an architecture-independent
xen_intr_handle_upcall().
When called via intr_event_handle(), driver_filter_t functions expect
preemption to be disabled. This removes the need for
critical_enter()/critical_exit() when called this way.
The lapic_eoi() call is only needed on x86 in some cases when invoked
directly as an APIC vector handler.
Additionally driver_filter_t functions have no need to handle interrupt
counters. The intrcnt_add() calling function was reworked to match the
current situation. intrcnt_add() is now only called via one path.
The increment/decrement of curthread->td_intr_nesting_level had
previously been left out. Appears this was mostly harmless, but this
was noticed during implementation and has been added.
CONFIG_X86 is a leftover from use with Linux. While the barrier isn't
needed for FreeBSD on x86, it will be needed for FreeBSD on other
architectures.
Copyright note. xen_intr_intrcnt_add() was introduced at 76acc41fb7
by Justin T. Gibbs. xen_intrcnt_init() was introduced at fd036deac1
by John Baldwin.
sys/x86/xen/xen_arch_intr.c was originally created by Julien Grall in
2015 for the purpose of holding the x86 interrupt interface. Later it
was found xen_intr_handle_upcall() was better earlier, and the x86
interrupt interface better later. As such the filename and header list
belong to Julien Grall, but what those were created for is later.
Reviewed by: royger
Differential Revision: https://reviews.freebsd.org/D30006
Part of the series for allowing FreeBSD/ARM to run on Xen. On ARM the
function is a trivial pass-through, other architectures need distinct
implementations.
While implementing XEN_VCPUID() as a call to XEN_CPUID_TO_VCPUID()
works, that involves multiple accesses to the PCPU region. As such make
this a distinct macro. Only callers in machine independent code have
been switched.
Add a wrapper for the x86 PIC interface to use matching the old
prototype.
Partially inspired by the work of Julien Grall <julien@xen.org>,
2015-08-01 09:45:06, but XEN_VCPUID() was redone by Elliott Mitchell on
2022-06-13 12:51:57.
Reviewed by: royger
Submitted by: Elliott Mitchell <ehem+freebsd@m5p.com>
Original implementation: Julien Grall <julien@xen.org>, 2014-04-19 08:57:40
Original implementation: Julien Grall <julien@xen.org>, 2014-04-19 14:32:01
Differential Revision: https://reviews.freebsd.org/D29404
Add MODULE_PNP_INFO() to the driver to make it autoload if not linked
statically into the kernel. Remove the device from amd64/i386 GENERIC.
Reviewed by: imp
Differential Revision: https://reviews.freebsd.org/D35074
Use it when possible, instead of separated flags.
No functional change intended.
Reviewed by: hselasky, erj
MFC after: 3 days
Differential Revision: https://reviews.freebsd.org/D39466
sysctl variables rx_budget and max_aggregation_size are read-only loader
tunable. Mark them with CTLFLAG_RD flag.
No functional change intended.
Reviewed by: hselasky, erj
MFC after: 3 days
Differential Revision: https://reviews.freebsd.org/D39466
Use it when possible, instead of separated flags.
No functional change intended.
Reviewed by: hselasky, erj
MFC after: 3 days
Differential Revision: https://reviews.freebsd.org/D39466
Bad behaving user-space USB applicatoins may crash the kernel by issuing
USB FS related ioctl(2)'s out of their expected order. By default
the USB FS ioctl(2) interface is only available to the
administrator, root, and driver applications like webcamd(8) needs
to be hijacked in order for this to happen.
The issue is the fast-path code does not always see updates made
by the slow-path code, and may then work on freed memory.
This is easily fixed by using an EPOCH(9) type of synchronization
mechanism. A SX(9) lock will be used as a substitute for EPOCH(9),
due to the need for sleepability. In addition most calls going into
the fast-path originate from a single user-space process and the
need for multi-thread performance is not present.
Differential Revision: https://reviews.freebsd.org/D39373
Reviewed by: markj@
Reported by: C Turt <ecturt@gmail.com>
admbugs: 994
MFC after: 1 week
Sponsored by: NVIDIA Networking
Move code in switch cases into own functions to make later changes easier to track.
No functional change, except for removing a superfluous break statement when
range checking USB_FS_MAX_FRAMES, in the USB_FS_OPEN case.
It should not have been there at all.
Suggested by: emaste@
MFC after: 1 week
Sponsored by: NVIDIA Networking
ifnets are allowed to pass batches of multiple packets to if_input,
linked by the m_nextpkt pointer. iflib_rxeof() sometimes does this, for
example. Netmap's generic mode did not handle this and would only
deliver the first packet in the batch, leaking the rest.
PR: 270636
Reviewed by: vmaffione
MFC after: 1 week
Differential Revision: https://reviews.freebsd.org/D39426
In emulated mode, the FreeBSD netmap port attempts to perform zero-copy
transmission. This works as follows: the kernel ring is populated with
mbuf headers to which netmap buffers are attached. When transmitting,
the mbuf refcount is initialized to 2, and when the counter value has
been decremented to 1 netmap infers that the driver has freed the mbuf
and thus transmission is complete.
This scheme does not generalize to the situation where netmap is
attaching to a software interface which may transmit packets among
multiple "queues", as is the case with bridge or lagg interfaces. In
that case, we would be relying on backing hardware drivers to free
transmitted mbufs promptly, but this isn't guaranteed; a driver may
reasonably defer freeing a small number of transmitted buffers
indefinitely. If such a buffer ends up at the tail of a netmap transmit
ring, further transmits can end up blocked indefinitely.
Fix the problem by removing the zero-copy scheme (which is also not
implemented in the Linux port of netmap). Instead, the kernel ring is
populated with regular mbuf clusters into which netmap buffers are
copied by nm_os_generic_xmit_frame(). The refcounting scheme is
preserved, and this lets us avoid allocating a fresh cluster per
transmitted packet in the common case. If the transmit ring is full, a
callout is used to free the "stuck" mbuf, avoiding the queue deadlock
described above.
Furthermore, when recycling mbuf clusters, be sure to fully reinitialize
the mbuf header instead of simply re-setting M_PKTHDR. Some software
interfaces, like if_vlan, may set fields in the header which should be
reset before the mbuf is reused.
Reviewed by: vmaffione
MFC after: 1 month
Sponsored by: Zenarmor
Sponsored by: OPNsense
Sponsored by: Klara, Inc.
Differential Revision: https://reviews.freebsd.org/D38065
Each physical port has an associated loopback tx channel and anything
transmitted over that channel by the driver is looped back internally by
the hardware as if received on that physical port. This change allows
tracing filters to be installed in this loopback path.
MFC after: 1 week
Sponsored by: Chelsio Communications
Now that the atomic macros are always genuinely atomic on x86, they can
be used for synchronization with Xen. A single core VM isn't too
unusual, but actual single core hardware is uncommon.
Replace an open-coding of evtchn_clear_port() with the inline.
Substantially inspired by work done by Julien Grall <julien@xen.org>,
2014-01-13 17:40:58.
Reviewed by: royger
MFC after: 1 week
Summary of changes:
- postpone mtu size assignment during load to avoid race condition
- refactor some of the debug prints
- add request reset handler
- refactor flush scheduler to increase efficiency and avoid racing
- put correct vlan_tag for UD traffic with PFC
- suspend QP before going to ERROR state to avoid CQP timout
- fix arithmetic error on irdma_debug_bugf
- allow debug flag to be settable during driver load
- introduce meaningful default values for DCQCN algorithm
- interrupt naming convention improvements
- skip unsignaled completions in poll_cmpl
Signed-off-by: Bartosz Sobczak bartosz.sobczak@intel.com
Signed-off-by: Eric Joyner <erj@FreeBSD.org>
Reviewed by: hselasky@
MFC after: 1 week
Sponsored by: Intel Corporation
Differential Revision: https://reviews.freebsd.org/D39173
If the tunable is set to 0, the tuning of the MPS (maximum payload size)
is disabled and the default MPS values set by the BIOS are used. In this
case the system may use a lower speed or operate in a less optimized
state, but it can resolve issues with stability and compatibility. With
specific devices the tuning of the mps, can lead to a complete freeze of
the system.
Reviewed by: manu
MFC after: 1 week
Sponsored by: Beckhoff Automation GmbH & Co. KG
Differential Revision: https://reviews.freebsd.org/D38397
Commit 8923de5905 ("ice(4): Update to 1.37.7-k", 2023-02-13)
unintentionally overwrote the change made in commit 52f45d8ace ("net:
iflib: let the drivers use isc_capenable", 2021-12-28).
Signed-off-by: Eric Joyner <erj@FreeBSD.org>
Reported by: jhibbits@
MFC after: 3 days
Sponsored by: Intel Corporation
The previous code unsuccesfully attempted to report a precise error for
each option in the user list. Moreover, commit 253b2ec199 broke some
ctrl-api-test (see https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=260547).
With this patch we bail out as soon as an unrecoverable error is detected and
we properly check for copy boundaries. EOPNOTSUPP no longer immediately
returns an error, so that any other option in the list may be examined
by the caller code and a precise report of the (un)supported options can
be returned to the user.
With this patch, all ctrl-api-test unit tests pass again.
PR: 260547
Submitted by: giuseppe.lettieri@unipi.it
Reviewed by: vmaffione
MFC after: 14 days
Azure setup does not like it when FreeBSD overrides the settings of the
UART device. When Hyper-V is detected, don't do this and also don't
throttle putc() output. This is a workaround for the early boot hang
of FreeBSD on Azure.
Tested on Azure, ESXi (VM with serial port), and SG-8200
PR: 264267
Reviewed by: kevans, whu
Tested by: whu
Obtained from: Rubicon Communications, LLC (Netgate)
MFC after: 2 weeks
Sponsored by: Rubicon Communications, LLC (Netgate)
And minor style fixes.
Reviewed by: hselasky
Tested by: Weitao Wang <WeitaoWang-oc@zhaoxin.com>
Fixes: 0d7064d58f xhci(4): Add new USB IDs
Differential Revision: https://reviews.freebsd.org/D38921
There is nothing hdmi related in this interface, it's just a generic interface
for crt controller so rename it.
This also remove the 'hdmi' device used in arm kernel config. 'vt' now controls
if we build this interface (sc(4) isn't supported on arm).
Sponsored by: Beckhoff Automation GmbH & Co. KG
Reviewed by: andrew
Differential Revision: https://reviews.freebsd.org/D39120
cpuid detection may have picked up a more specific guest type already,
and a follow-up check of smbios vendor/product may erroneously blow
away the previously detected type.
This reportedly fixes the boot under Hyper-V, which advertises an
smbios.system.product of "Virtual Machine."
PR: 270239
Reviewed by: imp, kib (both earlier version, same concept)
Fixes: 2fee875629 ("abstract out the vm detection via smbios..")
Differential Revision: https://reviews.freebsd.org/D39140
These can be returned from the PSCI AFFINITY_INFO call. This is not
marked as optional so bhyve will need to implement it & can use these
macros.
Sponsored by: Arm Ltd
Wrap parts of psci.h that aren't usable by userspace in _KERNEL checks.
This allows it to be used to implement PSCI and SMCCC by bhyve in
userspace.
Sponsored by: Arm Ltd
Sponsored by: Innovate UK
Sponsored by: The FreeBSD Foundation
The save_if_input function pointer was meant to save the previous
value of ifp->if_input before replacing it with the emulated
adapter hook.
However, the same pointer value is already stored in the if_input
field of the netmap_adapter struct, to be used for host TX ring processing.
Reuse the netmap_adapter if_input field to simplify the code
and save some space.
MFC after: 14 days