Notable upstream pull request merges:
#11153 Scalable teardown lock for FreeBSD
#11651 Don't bomb out when using keylocation=file://
#11667 zvol: call zil_replaying() during replay
#11683 abd_get_offset_struct() may allocate new abd
#11693 Intentionally allow ZFS_READONLY in zfs_write
#11716 zpool import cachefile improvements
#11720 FreeBSD: Clean up zfsdev_close to match Linux
#11730 FreeBSD: bring back possibility to rewind the
checkpoint from bootloader
Obtained from: OpenZFS
MFC after: 2 weeks
Add parsing of the rewind options.
When I was upstreaming the change [1], I omitted the part where we
detect that the pool should be rewind. When the FreeBSD repo has
synced with the OpenZFS, this part of the code was removed.
[1] FreeBSD repo: 277f38abff
[2] OpenZFS repo: f2c027bd6a
Originally reviewed by: tsoome, allanjude
Originally reviewed by: kevans (ok from high-level overview)
Signed-off-by: Mariusz Zaborski <oshogbo@vexillium.org>
PR: 254152
Reported by: Zhenlei Huang <zlei.huang at gmail.com>
Obtained from: https://github.com/openzfs/zfs/pull/11730
TSFOOR happens if a beacon with a given TSF isn't received within the
programmed/expected TSF value, plus/minus a fudge range. (OOR == out of range.)
If this happens then it could be because the baseband/mac is stuck, or
the baseband is deaf. So, do a cold reset and resync the beacon to
try and unstick the hardware.
It also happens when a bad AP decides to err, slew its TSF because they
themselves are resetting and they don't preserve the TSF "well."
This has fixed a bunch of weird corner cases on my 2GHz AP radio upstairs
here where it occasionally goes deaf due to how much 2GHz noise is up
here (and ANI gets a little sideways) and this unsticks the station
VAP.
For AP modes a hung baseband/mac usually ends up as a stuck beacon
and those have been addressed for a long time by just resetting the
hardware. But similar hangs in station mode didn't have a similar
recovery mechanism.
Tested:
* AR9380, STA mode, 2GHz/5GHz
* AR9580, STA mode, 5GHz
* QCA9344 SoC w/ on-board wifi (TL-WDR4300/3600 devices); 2GHz
STA mode
Right now ts_antenna is either 0 or 1 in each supported HAL so
this is purely a sanity check.
Later on if I ever get magical free time I may add some extensions
for the NICs that can have slightly more complicated antenna switches
for transmit and I'd like this to not bust memory.
Implement a driver for the RTC embedded in the RK805/RK808 power
management system used for RK3328 and RK3399 SoCs.
Based on experiments on my RK808, setting the time doesn't alter the
internal/inaccessible sub-second counter, therefore there's no point
in calling clock_schedule().
Based on an earlier revision by andrew.
Reviewed by: manu
Differential Revision: https://reviews.freebsd.org/D22692
Sponsored by: Google
MFC after: 1 week
Queue all XPT_ASYNC ccb's and run those in a new cam async thread. This thread
is allowed to sleep for things like memory. This should allow us to make all the
registration routines for cam periph drivers simpler since they can assume they
can always allocate memory. This is a separate thread so that any I/O that's
completed in xpt_done_td isn't held up.
This should fix the panics for WAITOK alloations that are elsewhere in the
storage stack that aren't so easy to convert to NOWAIT. Additional future work
will convert other allocations in the registration path to WAITOK should
detailed analysis show it to be safe.
Reviewed by: chs@, rpokala@
Differential Revision: https://reviews.freebsd.org/D29210
Completions for crypto requests on port 1 can sometimes return a stale
cookie value due to a firmware bug. Disable requests on port 1 by
default on affected firmware.
Reviewed by: np
Sponsored by: Chelsio Communications
Differential Revision: https://reviews.freebsd.org/D26581
These fixes are only relevant for requests on the second port. In
some cases, the crypto completion data, completion message, and
receive descriptor could be written in the wrong order.
- Add a separate rx_channel_id that is a copy of the port's rx_c_chan
and use it when an RX channel ID is required in crypto requests
instead of using the tx_channel_id.
- Set the correct rx_channel_id in the CPL_RX_PHYS_ADDR used to write
the crypto result.
- Set the FID to the first rx queue ID on the adapter rather than the
queue ID of the first rx queue for the port.
- While here, use tx_chan to set the tx_channel_id though this is
identical to the previous value.
Reviewed by: np
Reported by: Chelsio QA
Sponsored by: Chelsio Communications
Differential Revision: https://reviews.freebsd.org/D29175
We have seen several cases of processes which have become "stuck" in
kern_sigsuspend(). When this occurs, the kernel's td_sigblock_val
is set to 0x10 (one block outstanding) and the userspace copy of the
word is set to 0 (unblocked). Because the kernel's cached value
shows that signals are blocked, kern_sigsuspend() blocks almost all
signals, which means the process hangs indefinitely in sigsuspend().
It is not entirely clear what is causing this condition to occur.
However, it seems to make sense to add some protection against this
case by fetching the latest sigfastblock value from userspace for
syscalls which will sleep waiting for signals. Here, the change is
applied to kern_sigsuspend() and kern_sigtimedwait().
Reviewed by: kib
Sponsored by: Netflix
Differential Revision: https://reviews.freebsd.org/D29225
This permits these routines to use special logic for initializing MD
kthread state.
For the kproc case, this required moving the logic to set these flags
from kproc_create() into do_fork().
Reviewed by: kib
MFC after: 1 week
Sponsored by: Netflix
Differential Revision: https://reviews.freebsd.org/D29207
POSIX states that new threads created via pthread_create() should
inherit the "floating point environment" from the creating thread.
Discussed with: kib
MFC after: 1 week
Sponsored by: Netflix
Differential Revision: https://reviews.freebsd.org/D29204
- Use update_pcb_bases() when updating FS or GS base addresses to
permit use of FSBASE and GSBASE in Linux processes. This also sets
PCB_FULL_IRET. linux32 was setting PCB_32BIT which should be a
no-op (exec sets it).
- Remove write-only variables to construct unused segment descriptors
for linux32.
Reviewed by: kib
MFC after: 1 week
Sponsored by: Netflix
Differential Revision: https://reviews.freebsd.org/D29026
Before the pcb is copied to the new thread during cpu_fork() and
cpu_copy_thread(), the kernel re-reads the current register values in
case they are stale. This is done by setting PCB_FULL_IRET in
pcb_flags.
This works fine for user threads, but the creation of kernel processes
and kernel threads do not follow the normal synchronization rules for
pcb_flags. Specifically, new kernel processes are always forked from
thread0, not from curthread, so adjusting pcb_flags via a simple
instruction without the LOCK prefix can race with thread0 running on
another CPU. Similarly, kthread_add() clones from the first thread in
the relevant kernel process, not from curthread. In practice, Netflix
encountered a panic where the pcb_flags in the first kthread of the
KTLS process were trashed due to update_pcb_bases() in
cpu_copy_thread() running from thread0 to create one of the other KTLS
threads racing with the first KTLS kthread calling fpu_kern_thread()
on another CPU. In the panicking case, the write to update pcb_flags
in fpu_kern_thread() was lost triggering an "Unregistered use of FPU
in kernel" panic when the first KTLS kthread later tried to use the
FPU.
Reported by: gallatin
Discussed with: kib
MFC after: 1 week
Sponsored by: Netflix
Differential Revision: https://reviews.freebsd.org/D29023
For native FreeBSD binaries, the return value from __getcwd(2)
doesn't really matter, as the libc wrapper takes over and returns
the proper errno.
PR: kern/254120
Reported By: Alex S <iwtcex@gmail.com>
Reviewed By: kib
Sponsored By: The FreeBSD Foundation
Differential Revision: https://reviews.freebsd.org/D29217
swi_remove() removes the software interrupt handler but does not remove
the associated interrupt event.
This is visible when creating and remove a vnet jail in `procstat -t
12`.
We can remove it manually with intr_event_destroy().
PR: 254171
MFC after: 1 week
Differential Revision: https://reviews.freebsd.org/D29211
We can now counter_u64_free(NULL), so remove the checks.
MFC after: 1 week
Sponsored by: Rubicon Communications, LLC ("Netgate")
Differential Revision: https://reviews.freebsd.org/D29190
We already allow free(NULL) and uma_zfree(..., NULL). Make
uma_zfree_pcpu(..., NULL) work as well.
This also means that counter_u64_free(NULL) will work.
These make cleanup code simpler.
MFC after: 1 week
Sponsored by: Rubicon Communications, LLC ("Netgate")
Differential Revision: https://reviews.freebsd.org/D29189
We might own the last use reference, and then vrele() at the end would
need to take the dvp vnode lock to inactivate, which causes deadlock
with vp. We cannot vrele() dvp from start since this might unlock ldvp.
Handle it by holding the vnode and dropping use ref after lowerfs
VOP_VPUT_PAIR() ended. This effectivaly requires unlock of the vp vnode
after VOP_VPUT_PAIR(), so the call is changed to set unlock_vp to true
unconditionally. This opens more opportunities for vp to be reclaimed,
if lvp is still alive we reinstantiate vp with null_nodeget().
Reported and tested by: pho
Reviewed by: mckusick
Sponsored by: The FreeBSD Foundation
MFC after: 2 weeks
Differential revision: https://reviews.freebsd.org/D29178
Nullfs vnode which shares vm_object and pages with the lower vnode should
not be exempt from the reclaim just because lower vnode cached a lot.
Their reclamation is actually very cheap and should be preferred over
real fs vnodes, but this change is already useful.
Reported and tested by: pho
Reviewed by: mckusick
Sponsored by: The FreeBSD Foundation
MFC after: 2 weeks
Differential revision: https://reviews.freebsd.org/D29178
Other threads observing the non-NULL um_softdep can assume that it is
safe to use it. This is important for ro->rw remounts where change from
read-only to read-write status cannot be made atomic.
Reviewed by: mckusick
Tested by: pho
Sponsored by: The FreeBSD Foundation
MFC after: 2 weeks
Differential revision: https://reviews.freebsd.org/D29178
otherwise we might follow a pointer in the freed memory.
Reviewed by: mckusick
Tested by: pho
Sponsored by: The FreeBSD Foundation
MFC after: 2 weeks
Differential revision: https://reviews.freebsd.org/D29178
Suppose that we remount rw->ro and in parallel some reader tries to
instantiate a vnode, e.g. during lookup. Suppose that softdep_unmount()
already started, but we did not cleared the MNT_SOFTDEP flag yet.
Then ffs_vgetf() calls into softdep_load_inodeblock() which accessed
destroyed hashes and freed memory.
Set/clear fs_ronly simultaneously (WRT to files flush) with MNT_SOFTDEP.
It might be reasonable to move the change of fs_ronly to under MNT_ILOCK,
but no readers take it.
Reported and tested by: pho
Reviewed by: mckusick
Sponsored by: The FreeBSD Foundation
MFC after: 2 weeks
Differential revision: https://reviews.freebsd.org/D29178
Now MNT_SOFTDEP indicates that SU are active in any variant +-J, and
SU+J is indicated by MNT_SOFTDEP | MNT_SUJ combination. The reason is
that unmount will be able to easily hide SU from other operations by
clearing MNT_SOFTDEP while keeping the record of the active journal.
Reviewed by: mckusick
Tested by: pho
Sponsored by: The FreeBSD Foundation
MFC after: 2 weeks
Differential revision: https://reviews.freebsd.org/D29178
It will be used to allow SU flush code to sync the volume while external
consumers see that SU is already disabled on the filesystem. Use it where
ffs_vgetf() called by SU code to process dependencies.
Reviewed by: mckusick
Tested by: pho
Sponsored by: The FreeBSD Foundation
MFC after: 2 weeks
Differential revision: https://reviews.freebsd.org/D29178
This removes the need to check for error == 0.
Reviewed by: mckusick
Tested by: pho
Sponsored by: The FreeBSD Foundation
MFC after: 2 weeks
Differential revision: https://reviews.freebsd.org/D29178
Receive Segment Coalescing (RSC) in the vSwitch is a feature available in
Windows Server 2019 hosts and later. It reduces the per packet processing
overhead by coalescing multiple TCP segments when possible. This happens
mostly when TCP traffics are among different guests on same host.
This patch adds netvsc driver support for this feature.
The patch also updates NVS version to 6.1 as needed for RSC
enablement.
MFC after: 2 weeks
Sponsored by: Microsoft
Differential Revision: https://reviews.freebsd.org/D29075
nvme drives are configured early in boot. However, a number of the configuration
steps takes which take a while, so we defer those to a config intrhook that runs
before the root filesystem is mounted. At the same time, the PCI hot plug wakes
up and tests the status of the card. It may decide that the card has gone away
and deletes the child. As part of that process nvme_detach is called. If this
call happens after the config_intrhook starts to run, but before it is finished,
there's a race where we can tear down the device's soft state while the
config_intrhook is still using it. Use the new config_intrhook_drain to
disestablish the hook. Either it will be removed w/o running, or the routine
will wait for it to finish. This closes the race and allows safe hotplug at any
time, even very early in boot.
Sponsored by: Netflix, Inc
Reviewed by: jhb, mav
Differential Revision: https://reviews.freebsd.org/D29006
config_intrhook_drain will remove the hook from the list as
config_intrhook_disestablish does if the hook hasn't been called. If it has,
config_intrhook_drain will wait for the hook to be disestablished in the normal
course (or expedited, it's up to the driver to decide how and when
to call config_intrhook_disestablish).
This is intended for removable devices that use config_intrhook and might be
attached early in boot, but that may be removed before the kernel can call the
config_intrhook or before it ends. To prevent all races, the detach routine will
need to call config_intrhook_train.
Sponsored by: Netflix, Inc
Reviewed by: jhb, mav, gde (in D29006 for man page)
Differential Revision: https://reviews.freebsd.org/D29005
This looks like a no-op, but it prevents udevadm(8) with failing
loudly, which in turn unbreaks installation of libfprint-2-2, which
in Focal is a dependency for make-4.2.1-1.2.
One might wonder why installing a build utility involves messing
with device handling...
Sponsored By: The FreeBSD Foundation
Differential Revision: https://reviews.freebsd.org/D29133
The per-domain partpop queue is locked by the combination of the
per-domain lock and individual reservation mutexes.
vm_reserv_reclaim_contig() scans the queue looking for partially
populated reservations that can be reclaimed in order to satisfy the
caller's allocation.
During the scan, we drop the per-domain lock. At this point, the rvn
pointer may be invalidated. Take care to load rvn after re-acquiring
the per-domain lock.
While here, simplify the condition used to check whether a reservation
was dequeued while the per-domain lock was dropped.
Reviewed by: alc, kib
Reported by: gallatin
MFC after: 3 days
Sponsored by: The FreeBSD Foundation
Differential Revision: https://reviews.freebsd.org/D29203
pf_kkif_free() already checks for NULL, so we don't have to check before
we call it.
Reviewed by: melifaro@
MFC after: 1 week
Sponsored by: Rubicon Communications, LLC ("Netgate")
Differential Revision: https://reviews.freebsd.org/D29195
The pwm utility cant set the only flag defined (PWM_POLARITY_INVERTED) so this
patch add the option -I (capital letter i) to send it to the drivers.
None of existing PWM driver have implemented support for flags.
But soon:ish I will put up an review of a pwm driver using TI OMAP DMTimer.
Differential Revision: https://reviews.freebsd.org/D29137
MFC after: 2 weeks
Fix the types of period and duty in share/man/man9/pwmbus.9 to match the one in sys/dev/pmw/pwmbus.c.
Reviewed By: rpokala
Differential Revision: https://reviews.freebsd.org/D29139
MFC after: 3 days
Clang 12 no longer supports -Wno-error-..., only the -Wno-error=...
style (which is already used everywhere else in the tree).
Differential Revision: https://reviews.freebsd.org/D29157
Summary:
This fixes rtentry leak for the cloned interfaces created inside the
VNET.
PR: 253998
Reported by: rashey at superbox.pl
MFC after: 3 days
Loopback teardown order is `SI_SUB_INIT_IF`, which happens after `SI_SUB_PROTO_DOMAIN` (route table teardown).
Thus, any route table operations are too late to schedule.
As the intent of the vnet teardown procedures to minimise the amount of effort by doing global cleanups instead of per-interface ones, address this by adding a relatively light-weight routing table cleanup function, `rib_flush_routes()`.
It removes all remaining routes from the routing table and schedules the deletion, which will happen later, when `rtables_destroy()` waits for the current epoch to finish.
Test Plan:
```
set_skip:set_skip_group_lo -> passed [0.053s]
tail -n 200 /var/log/messages | grep rtentry
```
Reviewers: #network, kp, bz
Reviewed By: kp
Subscribers: imp, ae
Differential Revision: https://reviews.freebsd.org/D29116
Copy the iovec for the trailer from the proper place. This is the same
fix for CBC encryption from ff6a7e4ba6.
Reported by: gallatin
Reviewed by: gallatin, markj
Fixes: 49f6925ca
Sponsored by: Netflix
Differential Revision: https://reviews.freebsd.org/D29177
We don't need the result before next sleep time, so no reason to
additionally increase interrupt latency.
While there, remove extra PM ticks to microseconds conversion, making
C2/C3 sleep times look 4 times smaller than really. The conversion
is already done by AcpiGetTimerDuration(). Now I see reported sleep
times up to 0.5s, just as expected for planned 2 wakeups per second.
MFC after: 1 month
The ENTRY() macro was modified by commit
28d945204e to add an optional NOP instruction
at the beginning of the function. It is of course an arm64 instruction, so
unsuitable for the 32bits sigcode. So just use EENTRY() instead for
aarch32_sigcode. This should fix receiving signals when running 32bits
binaries on FreeBSD/arm64.
MFC After: 1 week
It has been observed that some systems are often unable to resume from
ddb after entering with debug.kdb.enter=1. Checking the status further
shows the terminal is blocked waiting in tty_drain(), but it never makes
progress in clearing the output queue, because sc->sc_txbusy is high.
I noticed that when entering polling mode for the debugger, IER_TXRDY is
set in the failure case. Since this bit is never tracked by the softc,
it will not be restored by ns8250_bus_ungrab(). This creates a race in
which a TX interrupt can be lost, creating the hang described above.
Ensuring that this bit is restored is enough to prevent this, and resume
from ddb as expected.
The solution is to track this bit in the sc->ier field, for the same
lifetime that TX interrupts are enabled.
PR: 223917, 240122
Reviewed by: imp, manu
Tested by: bz
MFC after: 5 days
Sponsored by: The FreeBSD Foundation
Differential Revision: https://reviews.freebsd.org/D29130
I noticed that many of the math-related tests were failing on AArch64.
After a lot of debugging, I noticed that the floating point exception flags
were not being reset when starting a new process. This change resets the
VFP inside exec_setregs() to ensure no VFP register state is leaked from
parent processes to children.
This commit also moves the clearing of fpcr that was added in 65618fdda0
from fork() to execve() since that makes more sense: fork() can retain
current register values, but execve() should result in a well-defined
clean state.
Reviewed By: andrew
MFC after: 1 week
Differential Revision: https://reviews.freebsd.org/D29060
The names are self-explanatory; these are currently only used by the
wg(8) tool, but they are handy data points to have.
Reviewed by: grehan
MFC after: 3 days
Discussed with: decke
Differential Revision: https://reviews.freebsd.org/D29143
If we hit an error during init, then we'll unwind our state and attempt
to detach the device -- don't block it.
This was discovered by creating a wg0 with missing parameters; said
failure ended up leaving this orphaned device in place and ended up
panicking the system upon enumeration of the dev.* sysctl space.
Reviewed by: gallatin, markj
MFC after: 3 days
Differential Revision: https://reviews.freebsd.org/D29145
We have no use for the udphdr or this hlen local, just spell out the
addition inline.
MFC after: 3 days
Reviewed by: grehan, markj
Differential Revision: https://reviews.freebsd.org/D29142
This change converts most of the counters in the amd64 pmap from
global atomics to scalable counter(9) counters. Per discussion
with kib@, it also removes the handrolled per-CPU PCID save count
as it isn't considered generally useful.
The bulk of these counters remain guarded by PV_STATS, as it seems
unlikely that they will be useful outside of very specific debugging
scenarios. However, this change does add two new counters that
are available without PV_STATS. pt_page_count and pv_page_count
track the number of active physical-to-virtual list pages and page
table pages, respectively. These will be useful in evaluating
the memory footprint of pmap structures under various workloads,
which will help to guide future changes in this area.
Reviewed by: kib
Differential Revision: https://reviews.freebsd.org/D28923
Some framebuffer properties obtained from the device tree were not being
properly converted to host endian.
Replace OF_getprop calls by OF_getencprop where needed to fix this.
This fixes boot on PowerPC64 LE, when using ofwfb as the system console.
Reviewed by: bdragon
Sponsored by: Eldorado Research Institute (eldorado.org.br)
MFC after: 1 week
Differential Revision: https://reviews.freebsd.org/D27475
The kernel-side already accepted a persistent-keepalive-interval, so
just add a verb to ifconfig(8) for it and start exporting it so that
ifconfig(8) can view it.
PR: 253790
MFC after: 3 days
Discussed with: decke
Simple condition flip; we wanted to panic here after epoch_trace_list().
Reviewed by: glebius, markj
MFC after: 3 days
Differential Revision: https://reviews.freebsd.org/D29125
No sleeping allowed here, so avoid it. Collect the subset of data we
want inside of the epoch, as we'll need extra allocations when we add
items to the nvlist.
Reviewed by: grehan (earlier version), markj
MFC after: 3 days
Differential Revision: https://reviews.freebsd.org/D29124
This partially reverts df55485085 but still fixes the leak. It was
overlooked (sigh) that some packets will exceed MHLEN and cannot be
physically contiguous without clustering, but we don't actually need
it to be. m_defrag() should pull up enough for any of the headers that
we do need to be accessible.
Fixes: df55485085
Pointy hat; kevans
When we enter C2+ state via memory read, it may take chipset some
time to stop CPU. Extra register read covers that time. But MWAIT
makes CPU stop immediately, so we don't need to waste time after
wakeup with interrupts still disabled, increasing latency.
On my system it reduces ping localhost latency, waking up all CPUs
once a second, from 277us to 242us.
MFC after: 1 month
Even though the information is very limited, it seems the intent of
this flag is to control ACPI_BITREG_BUS_MASTER_STATUS use for C3,
not force ACPI_BITREG_ARB_DISABLE manipulations for C2, where it was
never needed, and which register not really doing anything for years.
It wasted lots of CPU time on congested global ACPI hardware lock
when many CPU cores were trying to enter/exit deep C-states same time.
On idle 80-core system it pushed ping localhost latency up to 20ms,
since badport_bandlim() via counter_ratecheck() wakes up all CPUs
same time once a second just to synchronously reset the counters.
Now enabling C-states increases the latency from 0.1 to just 0.25ms.
Discussed with: kib
MFC after: 1 month
config_intrhook doesn't need to be a two-pointer TAILQ. We rarely add/delete
from this and so those need not be optimized. Instaed, use the one-pointer
STAILQ plus a uintptr_t to be used as a flags word. This will allow these
changes to be MFC'd to 12 and 13 to fix a race in removable devices.
Feedback from: jhb
Reviewed by: mav
Differential Revision: https://reviews.freebsd.org/D29004
P2P ifa may require 2 routes: one is the loopback route, another is
the "prefix" route towards its destination.
Current code marks loopback routes existence with IFA_RTSELF and
"prefix" p2p routes with IFA_ROUTE.
For historic reasons, we fill in ifa_dstaddr for loopback interfaces.
To avoid installing the same route twice, we preemptively set
IFA_RTSELF when adding "prefix" route for loopback.
However, the teardown part doesn't have this hack, so we try to
remove the same route twice.
Fix this by checking if ifa_dstaddr is different from the ifa_addr
and moving this logic into a separate function.
Reviewed By: kp
Differential Revision: https://reviews.freebsd.org/D29121
MFC after: 3 days
Misbehavior has been observed with TSC under VirtualBox, where threads
doing small sleeps (~1 second) may miss their wake up and hang around
in a sleep state indefinitely. Switching back to ACPI-fast decidedly
fixes it, so stop using TSC on VirtualBox at least for the time being.
This partially reverts 84eaf2ccc6, applying it only to VirtualBox and
increasing the quality to 0. Negative qualities can never be chosen and
cannot be chosen with the tunable recently added. If we do not have a
timecounter with a higher quality than 0, then TSC does at least leave
the system mostly usable.
PR: 253087
Reviewed by: emaste, kib
MFC after: 3 days
Differential Revision: https://reviews.freebsd.org/D29132
This structure is shared among multiple instances of a driver, so we
should ensure that it doesn't somehow get treated as if there's a
separate instance per interface. This is especially important for
software-only drivers like wg.
DEVICE_REGISTER() still returns a void * and so the per-driver sctx
structures are not yet defined with the const qualifier.
Reviewed by: gallatin, erj
MFC after: 2 weeks
Sponsored by: The FreeBSD Foundation
Differential Revision: https://reviews.freebsd.org/D29102
Other kernel sanitizers (KMSAN, KASAN) require interceptors as well, so
put these in a more generic place as a step towards importing the other
sanitizers.
No functional change intended.
MFC after: 1 week
Sponsored by: The FreeBSD Foundation
Differential Revision: https://reviews.freebsd.org/D29103
ath_hal is compiled into the kernel by default and so always prints a
message to dmesg even when the system has no ath hardware.
MFC after: 1 week
Sponsored by: The FreeBSD Foundation
timer_settime(2) may be used to configure a timeout in the past. If
the timer is also periodic, we also try to compute the number of timer
overruns that occurred between the initial timeout and the time at which
the timer fired. This is done in a loop which iterates once per period
between the initial timeout and now. If the period is small and the
initial timeout was a long time ago, this loop can take forever to run,
so the system is effectively DOSed.
Replace the loop with a more direct calculation of
(now - initial timeout) / period to compute the number of overruns.
Reported by: syzkaller
Reviewed by: kib
MFC after: 1 week
Sponsored by: The FreeBSD Foundation
Differential Revision: https://reviews.freebsd.org/D29093
When I synchronized kern.mk with bsd.sys.mk, I accidentally changed
CCLDFLAGS to LDFLAGS which is not used by the kernel builds. This commit
should unbreak the GitHub actions cross-build CI. I didn't notice it
locally because cheribuild already passes -fuse-ld in the linker flags as
it predates this being done in the makefiles.
Reported By: Jose Luis Duran
Fixes: 172a624f0 ("Silence annoying and incorrect non-default linker warning with GCC")
The current preference number were copied from IPv4 code,
assuming 500k routes to be the full-view. Adjust with the current
reality (100k full-view).
Reported by: Marek Zarychta <zarychtam at plan-b.pwste.edu.pl>
MFC after: 3 days
Initialization of the XTS key schedule was accidentally dropped
when adding AES-GCM support so all-zero schedule was used instead.
This rendered previously created GELI partitions unusable.
This change restores proper XTS key schedule initialization.
Reported by: Peter Jeremy <peter@rulingia.com>
MFC after: immediately
Add missing #ifndef/#define/#endif guards against multiple inclusions
to ieee80211_ratectl.h as they are missing.
MFC after: 3 days
Sponsored-by: Rubicon Communications, LLC ("Netgate")
In r361544, the pmap drivers were converted to ifuncs. When doing so,
this changed the call type of pmap functions to be called via the
secure-plt stubs.
These stubs depend on the TOC base being loaded to r30 to run properly.
On SMP AIM (i.e. a dual processor G4 or running 32-bit on G5), since the
APs were being started up from the reset vector instead of going
through __start, they had never had r30 initialized properly, so when the
cpu_reset code in trap_subr32.S attempted to branch to
pmap_cpu_bootstrap(), it was loading the target from the wrong location.
Ensure r30 is set up directly in the cpu_reset trap code, so we can make
PLT calls as normal.
Fixes boot on my SMP G4.
Reviewed by: jhibbits
MFC after: 3 days
Sponsored by: Tag1 Consulting, Inc.
Drain the callbacks upon if_deregister_com_alloc() such that the
if_com_free[type] won't be nullified before if_destroy().
Taking fwip(4) as an example, before this fix, kldunload if_fwip will
go through the following:
1. fwip_detach()
2. if_free() -> schedule if_destroy() through NET_EPOCH_CALL
3. fwip_detach() returns
4. firewire_modevent(MOD_UNLOAD) -> if_deregister_com_alloc()
5. kernel complains about:
Warning: memory type fw_com leaked memory on destroy (1 allocations, 64 bytes leaked).
6. EPOCH runs if_destroy() -> if_free_internal()i
By this time, if_com_free[if_alloctype] is NULL since it's already
nullified by if_deregister_com_alloc(); hence, firewire_free() won't
have a chance to release the allocated fw_com.
Reviewed by: hselasky, glebius
MFC after: 2 weeks
The hardware random number generator of the RPi4 differs slightly
from the version found on the RPi3.
This commit extends the existing bcm2835_rng driver to function on the RPi4.
Submitted by: James Mintram <me at jamesrm dot com>
Reviewed by: markm, cem, delphij
Approved by: csprng(cem, markm)
MFC after: 2 weeks
Differential Revision: https://reviews.freebsd.org/D22493
Return while there are any I/Os in a queue may result in them stuck
indefinitely, since there is only one taskqueue task for all of them.
I think I've reproduced this by switching ha_role to secondary under
heavy load.
MFC after: 3 days
This updates the driver to align with the version included in
the "Intel Ethernet Adapter Complete Driver Pack", version 25.6.
There are no major functional changes; this mostly contains
bug fixes and changes to prepare for new features. This version
of the driver uses the previously committed ice_ddp package
1.3.19.0.
Signed-off-by: Eric Joyner <erj@FreeBSD.org>
Tested by: jeffrey.e.pieper@intel.com
Sponsored by: Intel Corporation
Differential Revision: https://reviews.freebsd.org/D28640
netmap is compiled into the kernel by default so initialization was
always reported, and netmap uses a formatting convention not used in the
rest of the kernel.
Reviewed by: vmaffione
MFC after: 1 week
Sponsored by: The FreeBSD Foundation
Differential Revision: https://reviews.freebsd.org/D29099
Firmware access from t4_attach takes place without any synchronization.
The driver should not panic (debug kernels) if something goes wrong in
early communication with the firmware. It should still load so that
it's possible to poke around with cxgbetool.
MFC after: 1 week
Sponsored by: Chelsio Communications
invltlb_invpcid_pti_handler() was requesting delayed TLB invalidation
even for processes that aren't using PTI. With an out-of-tree
change to avoid PTI for non-jailed root processes, this caused an
assertion failure in pmap_activate_sw_pcid_pti() when context-switching
between PTI and non-PTI processes.
Reviewed by: bdrewery kib tychon
Sponsored by: Dell EMC Isilon
Differential Revision: https://reviews.freebsd.org/D29094
cryptosoft is always present and doesn't print any useful information
when it attaches.
Reviewed by: jhb
MFC after: 1 week
Sponsored by: The FreeBSD Foundation
Differential Revision: https://reviews.freebsd.org/D29098
We don't typically print anything when a subsystem initializes itself,
and KTLS is currently disabled by default anyway.
Reviewed by: jhb
MFC after: 1 week
Sponsored by: The FreeBSD Foundation
Differential Revision: https://reviews.freebsd.org/D29097
There currently isn't a need to provide a public interface to a
software Poly1305 implementation beyond what is already available via
libsodium's APIs and these symbols conflict with symbols shared within
the ossl.ko module between ossl_poly1305.c and ossl_chacha20.c.
Reported by: se, kp
Fixes: 78991a93eb
Sponsored by: Netflix
Otherwise during attach newbus prints "nexus0", which is not very
useful.
The generic nexus device is already quiet, as is nexus_acpi on arm64.
MFC after: 1 week
Sponsored by: The FreeBSD Foundation
Allow sending user data on the SYN segment.
Reviewed by: rrs
MFC after: 3 days
Differential Revision: https://reviews.freebsd.org/D29082
Sponsored by: Netflix, Inc.
This structure is only used by the kernel module internally. It's not
shared with user space, so hide it behind #ifdef _KERNEL.
Sponsored by: Rubicon Communications, LLC ("Netgate")
When rx packet contains hash value sent from host, store it in
the mbuf's flowid field so when the same mbuf is on the tx path,
the hash value can be used by the host to determine the outgoing
network queue.
MFC after: 2 weeks
Sponsored by: Microsoft
This appears to be a copy-and-paste error that has simply been
overlooked. The tree contains only two calls to any of the affected
variants, but recent additions to the test suite started exercising the
call to atomic_clear_rel_int() in ng_leave_write(), reliably causing
panics.
Apparently, the issue was inherited from the arm64 atomic header. That
instance was addressed in c90baf6817, but the fix did not make its way
to RISC-V.
Note that the particular test case ng_macfilter_test:main still appears
to fail on this platform, but this change reduces the panic to a
timeout.
PR: 253237
Reported by: Jenkins, arichardson
Reviewed by: kp, arichardson
MFC after: 3 days
Differential Revision: https://reviews.freebsd.org/D29064
In some configurations we need more classes than ALTQ supports by
default. Increase the maximum number of classes we allow.
This will only cost us a comparatively trivial amount of memory, so
there's little reason not to do so.
If ever we find we want even more we may want to consider turning these
defines into a tunable, but for now do the easy thing.
Reviewed by: donner@
MFC after: 2 weeks
Sponsored by: Rubicon Communications, LLC ("Netgate")
Differential Revision: https://reviews.freebsd.org/D29034
Introduce convenience macros to retrieve the DSCP, ECN or traffic class
bits from an IPv6 header.
Use them where appropriate.
Reviewed by: ae (previous version), rscheff, tuexen, rgrimes
MFC after: 2 weeks
Sponsored by: Rubicon Communications, LLC ("Netgate")
Differential Revision: https://reviews.freebsd.org/D29056
Teach pf to read the DSCP value from the IPv6 header so that we can
match on them.
Reviewed by: donner
MFC after: 2 weeks
Sponsored by: Rubicon Communications, LLC ("Netgate")
Differential Revision: https://reviews.freebsd.org/D29048
The CROSS_TOOLCHAIN GCC .mk files include -B${CROSS_BINUTILS_PREFIX}, so
GCC will select the right linker and we don't need to warn.
While here also apply 17b8b8fb5f to kern.mk.
Test Plan: no more warning printed with CROSS_TOOLCHAIN=mips-gcc6
Reviewed By: jhb
Differential Revision: https://reviews.freebsd.org/D29015
Reuse existing handling for .ctors, print a warning if multiple
constructor sections are present. Destructors are not handled as of
yet.
This is required for KASAN.
Reviewed by: kib
MFC after: 2 weeks
Sponsored by: The FreeBSD Foundation
Differential Revision: https://reviews.freebsd.org/D29049
In 48ba9b2669 we switched from creating level 1 blocks to smaller
level 2 blocks when creating the early arm64 page tables. On issue
was that they had a different meaning for register x7. The former used
it to hold page table attributes, while the latter held just the memory
type. This caused these attributes to be incorrectly shifted.
Fix this by changing the meaning of x7 to hold the block attributes
and fix the only caller that used the old meaning.
Most hardware seems to have handled the bits being off however qemu
failed to boot as reserved bits that should be zero were being set and
qemu fails to clear these when translating from a virtual address to a
physical address.
Sponsored by: Innovate UK
Maintain a cache of physically contiguous runs of pages for use as
output buffers when software encryption is configured and in-place
encryption is not possible. This makes allocation and free cheaper
since in the common case we avoid touching the vm_page structures for
the buffer, and fewer calls into UMA are needed. gallatin@ reports a
~10% absolute decrease in CPU usage with sendfile/KTLS on a Xeon after
this change.
It is possible that we will not be able to allocate these buffers if
physical memory is fragmented. To avoid frequently calling into the
physical memory allocator in this scenario, rate-limit allocation
attempts after a failure. In the failure case we fall back to the old
behaviour of allocating a page at a time.
N.B.: this scheme could be simplified, either by simply using malloc()
and looking up the PAs of the pages backing the buffer, or by falling
back to page by page allocation and creating a mapping in the cache
zone. This requires some way to save a mapping of an M_EXTPG page array
in the mbuf, though. m_data is not really appropriate. The second
approach may be possible by saving the mapping in the plinks union of
the first vm_page structure of the array, but this would force a vm_page
access when freeing an mbuf.
Reviewed by: gallatin, jhb
Tested by: gallatin
Sponsored by: Ampere Computing
Submitted by: Klara, Inc.
Differential Revision: https://reviews.freebsd.org/D28556
It closes tiny race when the flag could be set between being cleared
and the space is checked, that would create us some more work. The
flag setting is protected by both locks, so we can clear it in either
place, but in between both locks are dropped.
MFC after: 1 week
I think it allowed to avoid some TX thread wakeups while the socket
buffer is full. But add there another options if ic_check_send_space
is set, which means socket just reported that new space appeared, so
it may have sense to pull more data from ic_to_send for better TX
coalescing.
MFC after: 1 week
Previously the spi_ranges_cnt stored the table size in bytes
instead of the number of elements. Fix that.
Reviewed by: mmel
Submitted by: Zyta Szpak <zr@semihalf.com>
Obtained from: Semihalf
Sponsored by: Marvell
To trace leaf asm functions we can insert a single nop instruction as
the first instruction in a function and trigger off this.
Reviewed by: gnn
Sponsored by: Innovate UK
Differential Revision: https://reviews.freebsd.org/D28132
This reduces the memory mapped to be closer to the minimal memory
needed to enable the MMU.
Reviewed by: mmel
Sponsored by: Innovate UK
Differential Revision:://reviews.freebsd.org/D27765
Add sysctl link_active_on_if_down, which allows user to control
if interface is kept in active state when it is brought
down with ifconfig. Set it to enabled by default to preserve
backwards compatibility.
Reviewed by: erj
Tested by: gowtham.kumar.ks@intel.com
Sponsored by: Intel Corporation
Differential Revision: https://reviews.freebsd.org/D28028
HW keeps track of RX errors using several counters, each for
specific type of errors. Report RX errors to OS as sum
of all those counters: CRC errors, illegal bytes, checksum,
length, undersize, fragment, oversize and jabber errors.
There is no HW counter for frames with invalid L3/L4 checksums
so add a SW one.
Also add a "rx_errors" sysctl with a copy of netstat IERRORS
counter value to make it easier accessible from scripts.
Reviewed By: erj
Tested By: gowtham.kumar.ks@intel.com
Sponsored By: Intel Corporation
Differential Revision: https://reviews.freebsd.org/D27639
There were two definitions for the SCSI VPD Block Device Characteristics (page
0xb1): struct scsi_vpd_block_characteristics and struct
scsi_vpd_block_device_characteristics. The latter is more complete and more
widely used. Convert uses of the former to the latter by tweaking the da driver
and removing sturct scsi_vpd_block_characteristics.
The old code had a O(n) loop, where n is the size of /dev/devstat.
Multiply that by another O(n) loop in devstat_mmap for a total of
O(n^2).
This change adds DIOCGMEDIASIZE support to /dev/devstat so userland can
quickly determine the right amount of memory to map, eliminating the
O(n) loop in userland.
This change decreases the time to run "gstat -bI0.001" with 16,384 md
devices from 29.7s to 4.2s.
Also, fix a memory leak first reported as PR 203097.
Sponsored by: Axcient
Reviewed by: mav, imp
MFC after: 2 weeks
Differential Revision: https://reviews.freebsd.org/D28968
HW keeps track of RX errors using several counters, each for
specific type of errors. Report RX errors to OS as sum
of all those counters: CRC errors, illegal bytes, checksum,
length, undersize, fragment, oversize and jabber errors.
Also, add new "rx_errs" sysctl in the dev.ix.N.mac_stats tree. This is
to provide an another way to display the sum of RX errors.
Signed-off-by: Piotr Pietruszewski <piotr.pietruszewski@intel.com>
Reviewed By: erj
Tested By: gowtham.kumar.ks@intel.com
Sponsored By: Intel Corporation
Differential Revision: https://reviews.freebsd.org/D27191
For interfaces with admin completion queues, introduce a new devmethod
IFDI_ADMIN_COMPLETION_HANDLE and a corresponding flag IFLIB_HAS_ADMINCQ.
This provides an option for handling any admin cq logic, which cannot be
run from an interrupt context.
Said method is called from within iflib's admin task, making it safe to
sleep.
Reviewed by: mmacy
Submitted by: Artur Rojek <ar@semihalf.com>
Obtained from: Semihalf
Sponsored by: Amazon, Inc.
Differential Revision: https://reviews.freebsd.org/D28708
According to Armada 8k documentation, the interrupt cause register
(at offset 0x14) is RW0C. Update the configuration in attach and
the mvebu_gpio_isrc_eoi() to follow the description.
Reviewed by: mmel
Obtained from: Semihalf
Sponsored by: Marvell
Differential Revision: https://reviews.freebsd.org/D29013
Respect filter-specific flags for the EVFILT_FS filter.
When a kevent is registered with the EVFILT_FS filter, it is always
triggered when an EVFILT_FS event occurs, regardless of the
filter-specific flags used. Fix that.
Reviewed by: kib
Differential Revision: https://reviews.freebsd.org/D28974
This fixes mpr driver on big-endian devices.
Tested on powerpc64 and powerpc64le targets using a SAS9300-8i card
(LSISAS3008 pci vendor=0x1000 device=0x0097)
Submitted by: Andre Fernando da Silva <andre.silva@eldorado.org.br>
Reviewed by: luporl, alfredo, Sreekanth Reddy <sreekanth.reddy@broadcom.com> (by email)
Sponsored by: Eldorado Research Institute (eldorado.org.br)
MFC after: 1 week
Differential Revision: https://reviews.freebsd.org/D25785
During a recent virtual NFSv4 testing event, a bug in the FreeBSD client
was detected when doing I/O DS operations on a Flexible File Layout pNFS
server. For an NFSv3 DS, the Read/Write/Commit nfsstats were incremented
instead of the ReadDS/WriteDS/CommitDS counts.
This patch fixes this.
Only the RPC counts reported by nfsstat(1) were affected by this bug,
the I/O operations were performed correctly.
MFC after: 2 weeks
Tracker should contain exactly the path from the starting directory to
the current lookup point. Otherwise we might not detect some cases of
dotdot escape. Consequently, if we are walking up the tree by dotdot
lookup, we must remove an entries below the walked directory.
Reviewed by: markj
Tested by: arichardson, pho
Sponsored by: The FreeBSD Foundation
MFC after: 1 week
Differential revision: https://reviews.freebsd.org/D28907
with the reasoning that the flags did not worked properly, and were not
shipped in a release.
O_RESOLVE_BENEATH is kept as useful.
Reviewed by: markj
Tested by: arichardson, pho
Sponsored by: The FreeBSD Foundation
MFC after: 1 week
Differential revision: https://reviews.freebsd.org/D28907
When searching for runs to reclaim, we need to ensure that the entire
run will be added to the buddy allocator as a single unit. Otherwise,
it will not be visible to vm_phys_alloc_contig() as it is currently
implemented. This is a problem for allocation requests that are not a
power of 2 in size, as with 9KB jumbo mbuf clusters.
Reported by: alc
Reviewed by: alc
MFC after: 2 weeks
Sponsored by: The FreeBSD Foundation
Differential Revision: https://reviews.freebsd.org/D28924
Merge the following fixes from https://github.com/pfsense/FreeBSD-src
1940e7d3 Save address of ingress packets to allow wg to work on HA
8f5531f1 Fix connection to IPv6 endpoint
825ed9ee Fix tcpdump for wg IPv6 rx tunnel traffic
2ec232d3 Fix issue with replying to INITIATION messages in server mode
ec77593a Return immediately in wg_init if in DETACH'd state
0f0dde6f Remove unnecessary wg debug printf on transmit
2766dc94 Detect and fix case in wg_init() where sockets weren't cleaned up
b62cc7ac Close the UDP tunnel sockets when the interface has been stopped
Reviewed by: kevans
Obtained from: pfSense 2.5
MFC after: 3 days
Relnotes: yes
Sponsored by: Rubicon Communications, LLC ("Netgate")
Differential Revision: https://reviews.freebsd.org/D28962
There are three issues with change that stopped truncating ea area before
write, and resulted in possible zero tail in the ea area:
- Truncate to zero checked i_ea_len after the reference was dropped,
making the last drop effectively truncate to zero length always.
- Loop to fill uio for zeroing specified too large length, that triggered
assert in normal situation.
- Integrity check could trip over the tail, instead we must allow
partial header or header with zero length, and clamp ea image in
memory at it.
Reported by: arichardson
Tested by: arichardson, pho
Sponsored by: The FreeBSD Foundation
MFC after: 3 days
Fixup: 5e198e7646
Differential Revision: https://reviews.freebsd.org/D28999
During a recent virtual NFSv4 testing event, a bug in the FreeBSD client
was detected when doing a File Layout pNFS DS I/O operation.
The size of the I/O operation was smaller than expected.
The I/O size is specified as a stripe unit size in bits 6->31 of nflh_util
in the layout. I had misinterpreted RFC5661 and had shifted the value
right by 6 bits. The correct interpretation is to use the value as
presented (it is always an exact multiple of 64), clearing bits 0->5.
This patch fixes this.
Without the patch, I/O through the DSs work, but the I/O size is 1/64th
of what is optimal.
MFC after: 2 weeks
The default behavior for attaching processes to jails is that the jail's
cpuset augments the attaching processes, so that it cannot be used to
escalate a user's ability to take advantage of more CPUs than the
administrator wanted them to.
This is problematic when root needs to manage jails that have disjoint
sets with whatever process is attaching, as this would otherwise result
in a deadlock. Therefore, if we did not have an appropriate common
subset of cpus/domains for our new policy, we now allow the process to
simply take on the jail set *if* it has the privilege to widen its mask
anyways.
With the new logic, root can still usefully cpuset a process that
attaches to a jail with the desire of maintaining the set it was given
pre-attachment while still retaining the ability to manage child jails
without jumping through hoops.
A test has been added to demonstrate the issue; cpuset of a process
down to just the first CPU and attempting to attach to a jail without
access to any of the same CPUs previously resulted in EDEADLK and now
results in taking on the jail's mask for privileged users.
PR: 253724
Reviewed by: jamie (also discussed with)
MFC after: 3 days
Differential Revision: https://reviews.freebsd.org/D28952
r366302 broke copy_file_range(2) for small values of
input file offset and len.
It was possible for rem to be greater than len and then
"len - rem" was a large value, since both variables are
unsigned.
Reported by: koobs, Pablo <pablogsal gmail com> (Python)
Reviewed by: asomers, koobs
MFC after: 3 days
Differential Revision: https://reviews.freebsd.org/D28981
This flag has been set on startup since 65618fdda0.
However, This causes some of the math-related tests to fail as they report
zero instead of a tiny number. This fixes at least
/usr/tests/lib/msun/ldexp_test and possibly others.
Additionally, setting this flag prevents printf() from printing subnormal
numbers in decimal form.
See also https://www.openwall.com/lists/musl/2021/02/26/1
PR: 253847
Reviewed By: mmel
Differential Revision: https://reviews.freebsd.org/D28938
arm64 has a distinct exception code for single-step, so we can use this
to detect when an unexpected SS trap is encountered, or when an expected
one is not. See db_stop_at_pc().
Reviewed by: markj, jhb
MFC after: 5 days
Sponsored by: The FreeBSD Foundation
Differential Revision: https://reviews.freebsd.org/D28942
This value should be kept in sync with updates to kdb_frame->tf_elr,
since it is queried by PC_REGS() in several places.
Reviewed by: markj, jhb
MFC after: 5 days
Sponsored by: The FreeBSD Foundation
Differential Revision: https://reviews.freebsd.org/D28943
The main issue is that debug exceptions must to be disabled for the
entire duration that SS bit in MDSCR_EL1 is set. Otherwise, a
single-step exception will be generated immediately. This can occur
before returning from the debugger (when MDSCR is written to) or before
re-entering it after the single-step (when debug exceptions are unmasked
in the exception handler).
Solve this by delaying the unmask to C code for EL1, and avoid unmasking
at all while handling debug exceptions, thus avoiding any recursive
debug traps.
Reviewed by: markj, jhb
MFC after: 5 days
Sponsored by: The FreeBSD Foundation
Differential Revision: https://reviews.freebsd.org/D28944
Mainly link errors interrupts should only be activated on fully linked port,
otherwise noise on lanes can cause livelock. But we don't have error
counters yet, so leave these interrupts disabled.
PAT_WRITE_BACK is x86-only, whereas sys/dev/xen could be shared
between multiple architectures.
Reviewed by: royger
Differential Revision: https://reviews.freebsd.org/D28831
During code inspection I noticed that the n_direofoffset field
of the NFS node was being manipulated without any lock being
held to make it SMP safe.
This patch adds locking of the NFS node's mutex around
handling of n_direofoffset to make it SMP safe.
I have not seen any failure that could be attributed to n_direofoffset
being manipulated concurrently by multiple processors, but I think this
is possible, since directories are read with shared vnode
locking, plus locks only on individual buffer cache blocks.
However, there have been as yet unexplained issues w.r.t reading
large directories over NFS that could have conceivably been caused
by concurrent manipulation of n_direofoffset.
MFC after: 2 weeks
Commit 3fe2c68ba2 dealt with a panic in cache_enter_time() where
the vnode referred to the directory argument.
It would also be possible to get these panics if a broken
NFS server were to return the directory as an new object being
created within the directory or in a Lookup reply.
This patch adds checks to avoid the panics and logs
messages to indicate that the server is broken for the
file object creation cases.
Reviewd by: kib
MFC after: 2 weeks
Differential Revision: https://reviews.freebsd.org/D28987
The missing newline mildly garbles boot-time messages and this can be
troublesome if you need those.
Fixes: a520f5ca58 ("armv8crypto: print a message on probe failure")
Reported by: Mike Karels (mike@karels.net)
Reviewed By: gonzo
Differential Revision: https://reviews.freebsd.org/D28988
Add the missing receive stat(u)s flag for 160Mhz channel width.
While here correct the comment for c_phytype to reference the correct
flags.
MFC-after: 3 days
Sponsored-by: Rubicon Communications, LLC ("Netgate")
Juraj Lutter (otis@) reported a panic "dvp != vp not true" in
cache_enter_time() called from the NFS client's nfsrpc_readdirplus()
function.
This is specific to an NFSv3 mount with the "rdirplus" mount
option. Unlike NFSv4, NFSv3 replies to ReaddirPlus
includes entries for the current directory.
This trivial patch avoids doing a cache_enter_time()
call for the current directory to avoid the panic.
Reported by: otis
Tested by: otis
Reviewed by: mjg
MFC after: 3 days
Differential Revision: https://reviews.freebsd.org/D28969
for the which which definitely use membar to sync with interrupt handlers.
libc and rtld uses of __compiler_membar() seems to want compiler barriers
proper.
The barrier in sched_unpin_lite() after td_pinned decrement seems to be not
needed and removed, instead of convertion.
Reviewed by: markj
MFC after: 1 week
Sponsored by: The FreeBSD Foundation
Differential Revision: https://reviews.freebsd.org/D28956
Historically it was allowed for any names, but arguably should never be
even attempted. Allow it again since there is a release pending and
allowing it is bug-compatible with previous behavior.
Reported by: otis
place -- in the VOP rather than vn_setexttr() -- and that it is for historic
reasons. We might wish to relocate it in due course, but this way at least
we document the asymmetry.
- Move ctl_get_cmd_entry() calls from every OOA traversal to when
the requests first inserted, storing seridx in struct ctl_scsiio.
- Move some checks out of the loop in ctl_check_ooa().
- Replace checks for errors that can not happen with asserts.
- Transpose ctl_serialize_table, so that any OOA traversal accessed
only one row (cache line). Compact it from enum to uint8_t.
- Optimize static branch predictions in hottest places.
Due to O(n) nature on deep LUN queues this can be the hottest code
path in CTL, and additional 20% of IOPS I see in some 4KB I/O tests
are good to have in reserve. About 50% of CPU time here according
to the profiles is now spent in two memory accesses per traversed
request in OOA.
Sponsored by: iXsystems, Inc.
MFC after: 2 weeks
After the merge of OpenZFS master-9312e0fd1 it has become possible to
import ZFS pools witn an active org.illumos:edonr feature on FreeBSD,
leading to a panic.
In addition, "zpool status" reported all pools without edonr as upgradable
and "zpool upgrade -v" lists edonr in the list of upgradable features.
This is an accepted but not yet included bugfix by upstream.
Obtained from: https://github.com/openzfs/zfs/pull/11653
Differential Revision: https://reviews.freebsd.org/D28935
Reported by: garga (on freebsd-current@)
Reviewed by: freqlabs
X-MFC-with: ba27dd8be8
Add 04/25 Depopulation restoration in progress, 31/04 Depopulation failed, and
31/05 Depopulation restoration failed.
These are defined in SPC-6r2 (though 31/4 was added in an earlier draft). They
relate to different aspects of in-progress or failed depopulation removal and
restoration commands.
do_jail_attach() now only uses the PD_XXX flags that refer to lock
status, so make sure that something else like PD_KILL doesn't slip
through.
Add a KASSERT() in prison_deref() to catch any further PD_KILL misuse.
The purpose of these checks is to ensure that the address of the
next-level page table page is valid, since nothing is synchronizing with
a concurrent update of the large map and large map PTPs are freed to the
system. However, if PG_PS is set, there is no next level.
Reported by: rpokala
Reviewed by: kib
Tested by: rpokala
MFC after: 3 days
Sponsored by: The FreeBSD Foundation
Differential Revision: https://reviews.freebsd.org/D28922
I missed updating this counter when rebasing the changes in
9c64fc4029 after the switch to
COUNTER_U64_DEFINE_EARLY in 1755b2b989.
Fixes: 9c64fc4029 Add Chacha20-Poly1305 as a KTLS cipher suite.
Sponsored by: Netflix
A sequence of overlapping IPv4 fragments could crash the kernel in
pf due to an assertion.
Reported by: Alexander Bluhm
Obtained from: OpenBSD
MFC after: 3 days
Sponsored by: Rubicon Communications, LLC ("Netgate")
Some changes back in ye olde times somewhere has changed the default
block size the flash device exposes. So, the default geom redboot
FIS probing (to find the partition table structure in flash!)
is no longer finding it.
So, force it to probe at the last 64k of flash regardless of the
underlying flash block size.
Tested:
* Ubiquiti Routerstation pro, boots -HEAD MIPS
In uart_phyp_get(), when the internal buffer is empty, we make a
hypercall to retrieve up to 16 bytes of input data from the
hypervisor. As this is specified to be returned in BE format, we need
to do a 64-bit byte swap on the first and second half of the data.
If the buffer being passed in was insufficient to return the fetched
data, we store the remainder in the internal buffer and use it to
satisfy the following calls to uart_phyp_get() until it is drained.
However, in this case, we were accidentally byteswapping the internal
buffer again.
Move the byteswapping code to just after the hypercall so it only gets
swapped when we're filling the buffer.
Fixes arrow keys in qemu on pseries, among other console oddities.
Sponsored by: Tag1 Consulting, Inc.
MFC after: 3 days
We were unlocking the vm object before reading the backing_object field.
In the meantime, the object could be freed and reused. This could cause
us to go off the rails in the object chain traversal, failing to unlock
the rest of the objects in the original chain and corrupting the lock
state of the victim chain.
Reviewed by: bdrewery, kib, markj, vangyzen
MFC after: 3 days
Sponsored by: Dell EMC Isilon
Differential Revision: https://reviews.freebsd.org/D28926
Modify lock_delay() to increase the delay time after spinning,
not before. Previously we would spin at least twice instead of once.
In NetApp's benchmarks this fixes a performance regression compared
to FreeBSD 10, which called cpu_spinwait() directly.
Reviewed By: mjg
Sponsored by: NetApp, Inc.
Sponsored by: Klara, Inc.
Differential Revision: https://reviews.freebsd.org/D27331
the negotiation of TCP features. This affects most TCP options but
adherance to RFC7323 with the timestamp option will prevent a session
from getting established.
PR: 253576
Reviewed By: tuexen, #transport
MFC after: 3 days
Sponsored by: NetApp, Inc.
Differential Revision: https://reviews.freebsd.org/D28652
- address second instance of cwnd potentially becoming zero
- fix sublte bug due to implicit int to uint typecase in max()
- fix bug due to typo in hand-coded CEILING() function by using howmany() macro
- use int instead of long, and add a missing long typecast
- replace if conditionals with easier to read imax/imin (as in pseudocode)
Reviewed By: #transport, kbowling
MFC after: 3 days
Sponsored by: NetApp, Inc.
Differential Revision: https://reviews.freebsd.org/D28813
dirtybufthresh is a watermark, slightly below the high watermark for
dirty buffers. When a delayed write is issued, the dirtying thread will
start flushing buffers if the dirtybufthresh watermark is reached. This
helps ensure that the high watermark is not reached, otherwise
performance will degrade as clustering and other optimizations are
disabled (see buf_dirty_count_severe()).
When the buffer cache was partitioned into "domains", the dirtybufthresh
threshold checks were not updated. Fix this.
Reported by: Shrikanth R Kamath <kshrikanth@juniper.net>
Reviewed by: rlibby, mckusick, kib, bdrewery
Sponsored by: Juniper Networks, Inc., Klara, Inc.
Fixes: 3cec5c77d6
MFC after: 1 week
Differential Revision: https://reviews.freebsd.org/D28901
Previously sendfile would issue a VOP_GETATTR and use the returned size,
i.e., the file size. When paging in file data, sendfile_swapin() will
use the pager to determine whether it needs to zero-fill, most often
because of a hole in a sparse file. An attempt to page in beyond the
end of a file is treated this way, and occurs when the requested page is
past the end of the pager. In other words, both the file size and pager
size were used interchangeably.
With ZFS, updates to the pager and file sizes are not synchronized by
the exclusive vnode lock, at least partially due to its use of
MNTK_SHARED_WRITES. In particular, the pager size is updated after the
file size, so in the presence of a writer concurrently extending the
file, sendfile could incorrectly instantiate "holes" in the page cache
pages backing the file, which manifests as data corruption when reading
the file back from the page cache. The on-disk copy is unaffected.
Fix this by consistently using the pager size when available.
Reported by: dumbbell
Reviewed by: chs, kib
Tested by: dumbbell, pho
MFC after: 1 week
Sponsored by: The FreeBSD Foundation
Differential Revision: https://reviews.freebsd.org/D28811
In some cases the DELAY implementation on amd64 can recurse on a spin
mutex in the i8254 early delay code. Detect when this is going to
happen and don't call delay in this case. It is safe to not delay here
with the only issue being KCSAN may not detect data races.
Reviewed by: kib
Tested by: arichardson
Sponsored by: Innovate UK
Differential Revision: https://reviews.freebsd.org/D28895
In the Neoverse N1 SDP PCIe driver we need to map a page shared
between the firmware and the kernel. Previously we would use
pmap_kenter for this, however as this is not standardised between
architectures switch to the common pmap_qenter.
While here fix the error handling code to clean up on failure.
Reviewed by: br
Sponsored by: Innovate UK
Differential Revision: https://reviews.freebsd.org/D28890
Despite the comment to the contrary neither pf nor carp use
in_addmulti(). Nothing does, so get rid of it.
Carp stopped using it in 08b68b0e4c
(2011). It's unclear when pf stopped using it, but before
d6d3f01e0a (2012).
Reviewed by: bz@, melifaro@
Sponsored by: Rubicon Communications, LLC ("Netgate")
Differential Revision: https://reviews.freebsd.org/D28918
Make sure PD_KILL isn't passed to do_jail_attach, where it might end
up trying to kill the caller's prison (even prison0).
Fix the child jail loop in prison_deref_kill, which was doing the
post-order part during the pre-order part. That's not a system-
killer, but make jails not always die correctly.
Commit 6dd69f0064 ("iflib: introduce isc_dma_width")
failed to build on powerpc due to implicit type conversion
error. Fix that.
Submitted by: Artur Rojek <ar@semihalf.com>
Obtained from: Semihalf
Sponsored by: Amazon, Inc.
The int in the argument to the ternary triggered -Wint-in-bool-context
from gcc. Upstream linux has a larger and more entangled patch,
12f727721eee61b3d19dedb95cb893b2baa9fe41, which doesn't apply cleanly.
When we eventually sync that, we can just drop this change.
Reviewed by: hselasky, imp, kib
MFC after: 3 days
Sponsored by: Dell EMC Isilon
Differential Revision: https://reviews.freebsd.org/D28762
Get rid of db_look_char because it's not compatible with db_get_line().
This fixes the following issue:
db> script lockinfo=show alllocks
db> run lockinfo
db:0:lockinfo> how alllocks
No such command; use "help" to list available commands
Reported by: markj
Reviewed by: markj
Sponsored by: Dell EMC Isilon
Differential Revision: https://reviews.freebsd.org/D28725
db_cmd_match had an even/odd bug, where if a third command was partially
matched (or any odd number greater than one) the search result would be
set back from CMD_AMBIGUOUS to CMD_FOUND, causing the last command in
the list to be executed instead of failing the match.
Reported by: mlaier
Reviewed by: markj, mlaier, vangyzen
Sponsored by: Dell EMC Isilon
Differential Revision: https://reviews.freebsd.org/D28659
This KASSERT is overzealous because of the following race condition:
1) A managed page which is currently in PQ_LAUNDRY is freed.
vm_page_free_prep calls vm_page_dequeue_deferred()
The page state is:
PQ_LAUNDRY, PGA_DEQUEUE|PGA_ENQUEUED
2) The laundry worker comes around and pick up the page and calls
vm_pageout_defer(m, PQ_LAUNDRY, true) to check if page is still in the
queue. We do a vm_page_astate_load and get
PQ_LAUNDRY, PGA_DEQUEUE|PGA_ENQUEUED
as per above.
3) The laundry worker is pre-empted and another thread allocates our page
from the free pool. For example vm_page_alloc_domain_after calls
vm_page_dequeue() and sets VPO_UNMANAGED because we are allocating for
an OBJT_UNMANAGED object.
The page state is:
PQ_NONE, 0 - VPO_UNMANAGED
4) The laundry worker resumes, and processes vm_pageout_defer based on the
stale astate which leads to a call to vm_page_pqbatch_submit, which will
trip on the KASSERT.
Submitted by: mlaier
Reviewed by: markj, rlibby
Sponsored by: Dell EMC Isilon
Differential Revision: https://reviews.freebsd.org/D28563
Some DMA controllers are unable to address the full host memory space
and are instead limited to a subset of address range (e.g. 48-bit).
Allow the driver to specify the maximum allowed DMA addressing width
(in bits) for the NIC hardware, by introducing a new field in
if_softc_ctx.
If said field is omitted (set to 0), the lowaddr of DMA window bounds
defaults to BUS_SPACE_MAXADDR.
Submitted by: Artur Rojek <ar@semihalf.com>
Obtained from: Semihalf
Sponsored by: Amazon, Inc.
Differential Revision: https://reviews.freebsd.org/D28706
with the semantic following C11 signal_fence, that is, it establishes
ordering between its place and any interrupt handler executing on the
same CPU.
Reviewed by: markj, mjg, rlibby
Sponsored by: The FreeBSD Foundation
MFC after: 1 week
Differential Revision: https://reviews.freebsd.org/D28909
Make the pwm_backlight module depend on backlight, so it
has access to the backlight interface symbols. Otherwise you'll
get an error like:
link_elf: symbol backlight_get_info_desc undefined
Signed-off-by: Brett Mastbergen <brett.mastbergen@gmail.com>
MFC after: 3 days
PR: 253765
iflib_rxeof() was counting everything twice. This was introduced when
pfil hooks were added to the iflib receive path. We want to count rx
packets/bytes before the pfil hooks are executed, so remove the counter
adjustments that are executed after.
PR: 253583
Reviewed by: gallatin, erj
MFC after: 3 days
Sponsored by: The FreeBSD Foundation
Differential Revision: https://reviews.freebsd.org/D28900
softdep_prealloc() must be called to ensure enough journal space is
available, before ffs_extwrite(). Also it must be done before taking
ffs_lock_ea(), because it calls ffs_syncvnode(), potentially dropping
the vnode lock.
Reviewed by: mckusick
Tested by: pho
MFC after: 1 week
Sponsored by: The FreeBSD Foundation
ffs_lock_ea is after the vnode lock, so vnode must not be relocked under
lock_ea. Move ffs_truncate() call in ffs_close_ea() after the lock_ea is
dropped, and only truncate to length zero, since this is the only mode
supported by ffs_truncate() for EAs. Previously code did truncation and
then write.
Zero the part of the ext area that is unused, if truncation is due but not
done because ea area is not zero-length.
Reviewed by: mckusick
Tested by: pho
MFC after: 1 week
Sponsored by: The FreeBSD Foundation
Do it in ffs_write(), where we can gracefuly handle relock and its
consequences. In particular, recheck the v_data to see if the vnode
reclamation ended, and return EBADF when we cannot proceed with the
write.
Reviewed by: mckusick
Reported by: pho
MFC after: 1 week
Sponsored by: The FreeBSD Foundation
instead of DOINGSOFTDEP(). The softdep_prealloc() function does nothing
in SU case.
Note that the call should be safe with regard to the vnode relock,
because it is called with MNT_NOWAIT, which does not descend into fsync.
Reviewed by: mckusick
Tested by: pho
MFC after: 1 week
Sponsored by: The FreeBSD Foundation
The tracker flags need to be loaded only after the tracker is removed
from its per-CPU queue. Otherwise, readers may fail to synchronize with
pending writers attempting to propagate priority to active readers, and
readers and writers deadlock on each other. This was observed in a
stable/12-based armv7 kernel where the compiler had reordered the load
of rmp_flags to before the stores updating the queue.
Reviewed by: rlibby, scottl
Discussed with: kib
Sponsored by: Rubicon Communications, LLC ("Netgate")
MFC after: 1 week
Differential Revision: https://reviews.freebsd.org/D28821
Add it to the x86 GENERIC and MINIMAL kernels
Sponsored by: Ampere Computing LLC
Submitted by: Klara Inc.
Reviewed by: rpokala
Differential Revision: https://reviews.freebsd.org/D28738
ipmi_ssif will `smbus_request_bus()` to do multiple smbus requests
(which requests the iicbus), and then here in `bread()` we also need to
request the bus because `bread()` takes multiple transactions.
This causes deadlock as it's waiting for the bus it already has without
`IIC_RECURSIVE`.
Sponsored by: Ampere Computing LLC
Submitted by: Klara Inc.
Reviewed by: imp
Differential Revision: https://reviews.freebsd.org/D28742
All supported platforms support thread-local vars and __thread.
Reviewed by: emaste
Sponsored by: The FreeBSD Foundation
MFC after: 1 week
Differential Revision: https://reviews.freebsd.org/D28796
This makes random read benchmarks look better on a wide ZFS pools.
I am not sure where the original value goes from, but it is there
for too long now.
MFC after: 1 week