To avoid annoyng messages from glibc-2.35 test suite add the simple
implementation of rseq syscall which is do nothing for now.
I plan to implement it if and when the API stabilizes.
MFC after: 2 weeks
In vm_phys_alloc_contig, for an allocation bigger than the size of any
buddy queue free block, avoid examining any maximum-size free block
more than twice, by only starting to consider a sequence of adjacent
max-blocks starting at a max-block that does not follow another
max-block. If that first max-block follows adjacent blocks of smaller
size, and if together they provide enough memory to reduce by one the
number of max-blocks required for this allocation, use them as part of
this allocation.
Reviewed by: markj
Tested by: pho
Discussed with: alc
Differential Revision: https://reviews.freebsd.org/D34815
While here, use driver->name instead of hardcoding the xenpv and
xen_et strings both for devclass_find() and BUS_ADD_CHILD().
Reviewed by: Elliott Mitchell <ehem+freebsd@m5p.com>, imp, royger
Differential Revision: https://reviews.freebsd.org/D35001
We never use the cgd that we get from the XPT_GDEV_TYPE call. Prior to
9a6844d55f we used it to determine if READ AHEAD or WRITE CACHING was
supported. However, all that information was moved into adasetflags so
we no longer need to this since it's cached in the softc and updated
with the IDENTIFY data changes automatically.
Sponsored by: Netflix
Reviewed by: mav
Differential Revision: https://reviews.freebsd.org/D35039
Use a switch rather than a nested if to simplify the async event
processing code. No functional changes intended.
Sponsored by: Netflix
Reviewed by: mav
Differential Revision: https://reviews.freebsd.org/D35038
Disable preemption in seqlock write sections when using the _irqsave
variant. This ensures that a writer can't be preempted and subsequently
starved by a reader running in a callout handler on the same CPU.
This fixes occasional display hangs seen when using the i915 driver.
Tested by: emaste, wulf
Reviewed by: wulf, hselasky
MFC after: 2 weeks
Sponsored by: The FreeBSD Foundation
Differential Revision: https://reviews.freebsd.org/D35021
In ieee80211_sn_sub() we need to shift the mask before applying it.
This fixes the logic from 978f25e840.
Reported by: J.R. Oldroyd (fbsd opal.com)
MFC after: 3 days
Sponsored by: The FreeBSD Foundation
Our kern_sigtimedwait() calculates absolute sleep timo value as 'uptime+timeout'.
So, when the user specifies a big timeout value (LONG_MAX), the calculated
timo can be less the the current uptime value.
In that case kern_sigtimedwait() returns EAGAIN instead of EINTR, if
unblocked signal was caught.
While here switch to a high-precision sleep method.
Reviewed by: mav, kib
In collaboration with: mav
Differential revision: https://reviews.freebsd.org/D34981
MFC after: 2 weeks
Created a couple of helpers to send signals to the specific thread or to
the whole process. Use helpers in the corresponding syscalls.
This fixes the confusion where a signal destined for a whole process
was sent to a specific thread and vice versa.
There is an exclusion for the linux_kill() syscall that takes a pid
argument and should send a signal to the whole process, but I know
at least one example where kill() takes tid.
MFC after: 2 weeks
wasn't default
Since vte_reset changes register value to MDCSC_DEFAULT value, which may not
be the original value, thus causing some phy registers read failures.
Restoring VTE_MDCSC value to original after reset solves the link state
flapping issue.
Thanks to jhb ("the code looks ok") for his review.
Reviewed by: jhb
Obtained from: NetBSD via Andrius V
Differential Revision: https://reviews.freebsd.org/D34956
The contract with the lower layers is that once ENXIO is reported, all
further I/O to the device is not possible. This is reported when the
device departs for good or changes in some material manner out from
underneath the system. Since the lower layers terminate all pending I/O
when this is detected with ENXIO, reporting more than one provides no
extra value. ENXIO suppression done with atomics due to race described
in e8827f4094. It's on the error path and a rare event, so this won't affect
performance.
Sponsored by: Netflix
Reviewed by: mckusick, kib
Differential Revision: https://reviews.freebsd.org/D35034
On the 0 -> 1 transition of sc_enxio_active, report that we're doing
this. This is a rare, but interesting, event. Convert to using atomics
to set this field to prevent a rare race:
In CAM, when we invalidate a device, one thread (T1) will start the
process in error processing called from *dadone
(cam_periph_error). This routine will queue work to xpt_async_td
(T2) and indicate to *dadone to call biodone(ENXIO) for the bio. T2
wakes up and basically waits to acquire the periph lock. T2 will do
so when T1 drops the periph lock just before T1's call to
biodone. T2 acquires the lock and calls biodone(ENXIO) on all
pending bios. These two threads will race and we could lose the
printf or get two in rare cases. Since we only touch sc_enxio_active
in an error path that's infrequent, the extra atomic traffic will be
rare but will ensure robustness.
Sponsored by: Netflix
Reviewed by: kib
Differential Revision: https://reviews.freebsd.org/D35037
The BIO_SPEEDUP_WRITE and BIO_SPEEDUP_TRIM bits are part of the flags
word, so print them as such.
Sponsored by: Netflix
Differential Revision: https://reviews.freebsd.org/D35035
Fix a comment that was left over from the orignial
implementation. Explain how pending transactions in hardware are
completed/aborted in the SIM prior to ndacleanup being called.
Sponsored by: Netflix
When using NIC TLS RX, packets that are dropped and retransmitted are
not decrypted by the NIC but are passed along as-is. As a result, a
received TLS record might contain a mix of encrypted and decrypted
data. If this occurs, the already-decrypted data needs to be
re-encrypted so that the resulting record can then be decrypted
normally.
Add support for this for sessions using AES-GCM with TLS 1.2 or TLS
1.3. For the recrypt operation, allocate a temporary buffer and
encrypt the the payload portion of the TLS record with AES-CTR with an
initial IV constructed from the AES-GCM nonce. Then fixup the
original mbuf chain by copying the results from the temporary buffer
back into the original mbufs for any mbufs containing decrypted data.
Once it has been recrypted, the mbuf chain can then be decrypted via
the normal software decryption path.
Co-authored by: Hans Petter Selasky <hselasky@FreeBSD.org>
Reviewed by: hselasky
Sponsored by: Netflix
Differential Revision: https://reviews.freebsd.org/D35012
Previously this used a temporary nonce[] buffer. The decrypt hook for
TLS 1.3 as well as the hooks for TLS 1.2 already constructed the IV
directly in crp.crp_iv.
Reviewed by: hselasky
Sponsored by: Netflix
Differential Revision: https://reviews.freebsd.org/D35027
Instead, create a switch structure private to ktls_ocf.c and store a
pointer to the switch in the ocf_session. This will permit adding an
additional function pointer needed for NIC TLS RX without further
bloating ktls_session.
Reviewed by: hselasky
Sponsored by: Netflix
Differential Revision: https://reviews.freebsd.org/D35011
742e7210d0 changed the prototype of udp_tun_func_t(). Bump
__FreeBSD_version so that external modules can #ifdef for it as
required.
PR: 263297
Sponsored by: Rubicon Communications, LLC ("Netgate")
Similar to ipfw rule timestamps, these timestamps internally are
uint32_t snaps of the system time in seconds. The timestamp is CPU local
and updated each time a rule or a state associated with a rule or state
is matched.
Reviewed by: kp
Sponsored by: Rubicon Communications, LLC ("Netgate")
Differential Revision: https://reviews.freebsd.org/D34970
Implement the same filter feature we implemented for UDP over IPv6 in
742e7210d. This was missed in that commit.
Pointed out by: markj
Sponsored by: Rubicon Communications, LLC ("Netgate")
Some AMD EPYC VCPUs generated boot message of the type:
pci4: <unknown> at device 0.0 (no driver attached)
These are displayed for device class 0x13 devices, e.g.:
none8@pci0:130:0:0: class=0x130000 rev=0x00 hdr=0x00 vendor=0x1022 \
device=0x148a subvendor=0x1022 subdevice=0x148a
vendor = 'Advanced Micro Devices, Inc. [AMD]'
device = 'Starship/Matisse PCIe Dummy Function'
class = non-essential instrumentation
Since these devices serve no purpose (no driver attaches) I have
enabled the reporting of suich devices only for verbose boots (a
diversion from the patch provided in the PR).
A verbose boot will now display such devices as:
pci4: <non-essential instrumentation> at device 0.0 (no driver attached)
PR: 263469
Reported by: jfc@mit.edu (John F. Carr)
MFC after: 1 week
This took an embarrasingly long time to find.
The state changes for a radio with a STA /and/ AP VAP gets a bit messy.
The AP maps are marked as waiting, waiting for the STA AP to find a
channel to use before the AP VAPs become active.
However, the code path that clears the OACTIVE flag on a VAP only runs
during a successful run of ieee80211_newstate_cb().
So here is how it goes:
* the STA VAP goes down and needs to scan;
* the AP vap goes RUN->INIT; but it doesn't YET call ieee80211_newstate_cb();
* meanwhile - a send on the AP VAP causes the VAP to set the OACTIVE flag here;
* then the STA VAP finishes scan and goes to RUN;
* which will call wakeupwaiting() as part of the STA VAP transition to RUN;
* .. then the AP VAP goes INIT->RUN directly via a call to hostap_newstate
in wakeupwaiting rather than it being through the deferred path;
* /then/ the ieee80211_newstate_cb() is called, but it sees the state go
RUN->RUN;
* .. which results in the OACTIVE flag never being cleared.
This clears the OACTIVE flag when a VAP transitions RUN->RUN; the
driver layer or net80211 layer can set it if required in a subsequent
transmit.
Differential Revision: https://reviews.freebsd.org/D34920
Reviewed by: bz
The other QL_DPRINT*() invocations in qls_init_hw_if() all used the
expanded form instead of the local variable. The module build always
defines QL_DBG in CFLAGS so doesn't trip over this, but adding qlxge
to a kernel config builds without QL_DBG.
Reported by: olivier
This entails various changes to make this driver more "modern"
(new-bus vs pre-new-bus) using device_log() and device_printf() rather
than psm%d. It also fixes the device_busy/unbusy calls to use sc->dev
directly rather than looking the device_t up via the devclass and
unit.
Reviewed by: imp
Differential Revision: https://reviews.freebsd.org/D35006
Deduplicate code to iterate over the bpages list in a bus_dmamap_t
freeing bounce pages during bus_dmamap_unload.
Reviewed by: imp
Sponsored by: Netflix
Differential Revision: https://reviews.freebsd.org/D34967
When pages are freed to a bounce zone, only maps waiting for pages for
that zone can make forward progress. If a map for a different bounce
zone is at the head of the global list, then requests that could
otherwise make forward progress will be stalled waiting on the other
bounce zone. If bounce zones shared bounce pages then a global list
would still make sense to prevent "later" requests from starving an
earlier request but that is not a concern with per-zone bounce page
pools.
Reviewed by: imp
Sponsored by: Netflix
Differential Revision: https://reviews.freebsd.org/D34966
Rather than using a software interrupt with a single handler, just
create a dedicated kernel process woken up with a simple wakeup().
Reviewed by: imp
Sponsored by: Netflix
Differential Revision: https://reviews.freebsd.org/D34965
Add a new PI_SOFTCLOCK for use by softclock threads. Currently this
maps to PI_AV which is the second-highest ithread priority.
Reviewed by: mav, kib
Sponsored by: Netflix
Differential Revision: https://reviews.freebsd.org/D33693
Rather than a detour via the devclass and hardcoding unit 0.
While here, remove a check for sc being NULL. It will never be NULL
when attach is called.
Reviewed by: imp
Differential Revision: https://reviews.freebsd.org/D35010
This option was never enabled in GENERIC and does not appear to work
(the cdevsw is stored in a global array but never passed to make_dev
to be associated with a character device).
Reviewed by: emaste
Differential Revision: https://reviews.freebsd.org/D35008
Rather than fetching the softc using the controller's unit number as
an index into the devclass.
Reviewed by: imp
Differential Revision: https://reviews.freebsd.org/D35004
While here, use a modern function declaration for smbios_modevent and
vpd_modevent.
Reviewed by: imp
Differential Revision: https://reviews.freebsd.org/D34996
Rather than fetching the softc using the controller's unit number as
an index into the devclass.
Reviewed by: imp
Differential Revision: https://reviews.freebsd.org/D34993
mld_domifattach() does a memory allocation under the global MLD mutex
and so can fail, but no error handling prevents a null pointer
dereference in this case. The mutex is only needed when updating the
global softc list; the allocation and static initialization of the softc
does not require this mutex. So, reduce the scope of the mutex and use
M_WAITOK for the allocation.
PR: 261457
MFC after: 1 week
Sponsored by: The FreeBSD Foundation
Differential Revision: https://reviews.freebsd.org/D34943
Coverity points out that if counter was NULL when passed to
pfr_pool_get() we could potentially end up dereferencing it.
Happily all users of the function pass a non-NULL pointer. Enforce this
by assertion and remove the pointless NULL check.
Reported by: Coverity (CID 273309)
MFC after: 1 week
Sponsored by: Rubicon Communications, LLC ("Netgate")
Move the use of 'sc' to after the NULL check.
It's very unlikely that we'd actually hit this, but Coverity is correct
that it's not a good idea to dereference the pointer and only then NULL
check it.
Reported by: Coverity (CID 1398362)
MFC after: 1 week
Sponsored by: Rubicon Communications, LLC ("Netgate")
Use the drop and enable endpoint context commands to force a reset of
the data toggle for USB 2.0 and USB 3.0 after:
- clear endpoint halt command (when the driver wishes).
- set config command (when the kernel or user-space wants).
- set alternate setting command (only affected endpoints).
Some XHCI HW implementations may not allow the endpoint reset command when
the endpoint context is not in the halted state.
Reported by: Juniper and Gary Jennejohn
MFC after: 1 week
Sponsored by: NVIDIA Networking
When VLAN HW filter is disabled, the NIC does not pass any vlan tagged
traffic. Setting these flags on the device allows vlan tagged traffic to
pass.
PR: 236983
Tested by: pi
Sponsored by: Klara, Inc.
Differential Revision: https://reviews.freebsd.org/D34824
If iommu_gas_match_one has to adjust for a boundary crossing, its
check against maxaddr includes 'offset' in its calculation, to ensure
that the allocated memory does not exceed the max address. However, if
there's no boundary crossing adjustment, then the maxaddr check
disregards 'offset'. Fix that.
Reviewed by: kib
Differential Revision: https://reviews.freebsd.org/D34978
15b1eb142c changed the callout code to store the CALLOUT_SHAREDLOCK flag
in c_iflags (where it used to be c_flags), but failed to update the
check in softclock_call_cc(). This resulted in the callout code always
taking the write lock, even if a read lock had been requested (with
the CALLOUT_SHAREDLOCK flag in callout_init_rm()).
Reviewed by: markj
MFC after: 1 week
Sponsored by: Rubicon Communications, LLC ("Netgate")
Differential Revision: https://reviews.freebsd.org/D34959
Allow tables to be used for the l3 source/destination matching.
This requires taking the PF_RULES read lock.
Sponsored by: Rubicon Communications, LLC ("Netgate")
Differential Revision: https://reviews.freebsd.org/D34917
Previously it was disabled right before translation was enabled.
This way the disable logic is still executed even when translation
is not be activated, e.g. with hw.iommu.dma=0 tunable set.
On some platforms we need to disable PMR in order for core dump to work.
At the same time it was observed that enabling translation has
a significant impact on network performance.
With this patch PMR can be disabled, with IOMMU translation not being
turned on by appending the following to the loader.conf:
hw.dmar.enable=1
hw.dmar.pmr.disable=1
hw.dmar.dma=0
Sponsored by: Stormshield
Obtained from: Semihalf
Reviewed by: kib
Differential Revision: https://reviews.freebsd.org/D34907
This argument is useless for the vast majority of drivers. For now,
use __VA_ARGS__ wrapper macros so that that the *DRIVER_MODULE()
macros accept both the old version (with a devclass) and the new
version (which omits the argument and stores NULL in the
driver_module_data structure). This provides an API compatiblity
shim that can be merged to older stable branches.
Once all drivers relevant to 14.0 (both in and out of tree) have been
updated, the API compat shims can be dropped.
Reviewed by: imp
MFC after: 2 weeks
Differential Revision: https://reviews.freebsd.org/D34963
This permits a driver module structure that doesn't want to store a
pointer to the new driver's devclass.
Reviewed by: imp
MFC after: 2 weeks
Differential Revision: https://reviews.freebsd.org/D34962
When exporting sysctls to Prometheus, the exporter replaces "." with
"_". This caused several metrics to alias, confusing the Prometheus
server. Fix it by:
* Renaming the "tcp_log_bucket" UMA zone to "tcp_log_id_bucket". Also,
rename "tcp_log_node" to "tcp_log_id_node" for consistency.
* Not exporting sysctls with "(LEGACY)" in the description. That is
used by ZFS sysctls that have been replaced by others, many of which
alias to the same Prometheus metric name (like "vfs.zfs.arc_max" and
"vfs.zfs.arc.max").
PR: 259607
Reported by: delphij
MFC after: 2 weeks
Sponsored by: Axcient
Reviewed by: delphij,rew,thj
Differential Revision: https://reviews.freebsd.org/D34952