This is in the same family of algorithms as Epoch/QSBR/RCU/PARSEC but is
a unique algorithm. This has 3x the performance of epoch in a write heavy
workload with less than half of the read side cost. The memory overhead
is significantly lessened by limiting the free-to-use latency. A synthetic
test uses 1/20th of the memory vs Epoch. There is significant further
discussion in the comments and code review.
This code should be considered experimental. I will write a man page after
it has settled. After further validation the VM will begin using this
feature to permit lockless page lookups.
Both markj and cperciva tested on arm64 at large core counts to verify
fences on weaker ordering architectures. I will commit a stress testing
tool in a follow-up.
Reviewed by: mmacy, markj, rlibby, hselasky
Discussed with: sbahara
Differential Revision: https://reviews.freebsd.org/D22586
The stack pointer is swapped with the sscratch CSR just before the
jump to cpu_exception_handler_user where the first instruction swaps
it again. The two swaps together are a no-op, but the csr swap
instructions can be expensive (e.g. on Bluespec RISC-V cores csr swap
instructions force a full pipeline stall).
Reported by: jrtc27
Reviewed by: br
MFC after: 2 weeks
Sponsored by: DARPA
Differential Revision: https://reviews.freebsd.org/D23394
Otherwise we risk running into use-after-free.
In particular this codepath ends up dropping all protection before
suspending writes:
ufs_quotactl -> quotaoff_inchange -> vfs_write_suspend_umnt
Reported by: pho
Around a generic call to null_nodeget(), there is nothing that would
prevent the unmount of the nullfs mp until we process to the
insmntque1() point. Calculate the VV_ROOT flag after insmntque1() to
not access mp->mnt_data before we have an exclusively locked vnode
from this mount point on the mp vnode list.
Reported and tested by: pho
Sponsored by: The FreeBSD Foundation
MFC after: 1 week
cpu_switch/throw() and savectx() do not save or restore any values in
these fields which mostly held non-callee-save registers.
makectx() copied these fields from kdb_frame, but they weren't used
except for PC_REGS using pcb_sepc. Change PC_REGS to use
kdb_frame->tf_sepc directly instead.
Reviewed by: br
MFC after: 2 weeks
Sponsored by: DARPA
Differential Revision: https://reviews.freebsd.org/D23395
When using the processor ID value we mask off the low and high bits that
should be zero. Unfortunatly we don't shift the ID value so it won't be
affected. Add the shift when reading the ID as this will need to align
with the address based target value.
MFC after: 2 weeks
Sponsored by: DARPA, AFRL
ctx (and thus ctx.flags) is stack garbage at the start of this
function, so initialize ctx.flags to an explicit value instead of
using binary operations on the garbage.
Reported by: gcc9
Reviewed by: imp
Differential Revision: https://reviews.freebsd.org/D23368
ahd_inb() returns type uint8_t. The shift left by untyped 24 implicitly
promotes the result to type (signed) int. Then the binary OR with uint64_t
values sign-extends the integer. If bit 31 of the read value happened to be
set, the 64-bit result would have all upper 32 bits set to 1 due to OR. This
is clearly not intended.
Reported by: Coverity
CID: 980473 (old one!)
Make completion event path mostly lockless using EPOCH(9).
Implement a mechanism using EPOCH(9) which allows us to make
the callback path for completion events mostly lockless.
Simplify draining callback events using epoch_wait().
While at it make sure all receive completion callbacks are
covered by the network EPOCH(9), because this is required
when calling if_input() and ether_input() after r357012.
Sponsored by: Mellanox Technologies
Software interrupt handlers are allowed to sleep. In swi_net() there
is a read lock behind NETISR_RLOCK() which in turn ends up calling
msleep() which means the whole of swi_net() cannot be protected by an
EPOCH(9) section. By default the NETISR_LOCKING feature is disabled.
This issue was introduced by r357004. This is a preparation step for
replacing the functionality provided by r357004.
Found by: kib@
Sponsored by: Mellanox Technologies
ThinkPad PrivacyGuard is a built-in toggleable privacy filter that
restricts viewing angles when on. It is an available on some new
ThinkPad models such as the X1 Carbon 7th gen (as an optional HW
upgrade).
The privacy filter can be enabled/disabled via an ACPI call. This commit
adds a sysctl under dev.acpi_ibm that allows for getting and setting the
PrivacyGuard state.
Submitted by: Kamila Součková <kamila@ksp.sk>
Reviewed By: cem, philip
MFC after: 3 days
Differential Revision: https://reviews.freebsd.org/D23370
Make sure all occurrences of ieee80211_input_xxx() in sys/dev are
covered by a network epoch section. Do not depend on the interrupt
handler nor any taskqueues being in a network epoch section.
This patch should unbreak the PCI WLAN drivers after r357004.
Pointy hat: glebius@
Sponsored by: Mellanox Technologies
ZFS tracks if anything denies VEXEC to allow for a quick check for the
common case of path traversal. Use it.
Differential Revision: https://reviews.freebsd.org/D22224
With this change having the listmtx lock held postpones dooming the vnode.
Use this fact to simplify iteration over the lazy list. It also allows
filters to safely access ->v_data.
Reviewed by: kib (early version)
Differential Revision: https://reviews.freebsd.org/D23397
if_vlan grew a dependency on opt_inet6.h in r356993
if_lagg and if_vlan both grew a dependency on opt_kern_tls.h in r351522
This is needed for standalone module builds of these guys.
MAC is also almost universally a default; every GENERIC includes it, and
it's std.armv[67]. mips is again the oddball here with it only being
included in ERL/OCTEON1.
The only module currently working around this one is mac_veriexec, but it
looks like nothing it builds actually uses the MAC definition. Downstream
consumers enabling MAC in mips using mac_veriexec may be advised to do
something differently here in config.mk.
For untied module builds, we'll generate opt_foo headers if they're included
in SRCS. However, options that would normally be represented in opt_global.h
aren't properly represented.
Start generating opt_global.h with #define VIMAGE for !mips since it's
almost universally a project default and right now kmods must hack it in
themselves in order to be properly compiled for the default kernel. For
example, ^/sys/modules/pf/Makefile
Reviewed by: imp
Differential Revision: https://reviews.freebsd.org/D23345
On some POWER8 machines, 'ibm,associativity' property may have 6
cells, which would overflow the 5 cells buffer being used.
There was also an issue with the "check if node is root" part,
that have been fixed too.
Reviewed by: jhibbits
Differential Revision: https://reviews.freebsd.org/D23414
The SD bit is defined as the MSB of the sstatus register, meaning its
position will vary depending on the CSR's length. Previously, there were
two (unused) defines for this, for the 32 and 64-bit cases, but their
definitions were swapped.
Consolidate these into one define: SSTATUS_SD, and make the definition
dependent on the value of __riscv_xlen.
Reviewed by: br
MFC after: 3 days
Differential Revision: https://reviews.freebsd.org/D23402
It doesn't exist in mainline dts due to the issues related
with detaching and reattaching USB3 devices as mentioned in
https://patchwork.kernel.org/patch/10853381/
In case of FreeBSD, as a temporary workaround "usbconfig reset"
command can fix the problem.
Reviewed by: manu
Right now OOM is initiated unconditionally on the page allocation
failure, after the wait.
Reported by: Mark Millard <marklmi@yahoo.com>
Reviewed by: cy, markj
Sponsored by: The FreeBSD Foundation
Differential revision: https://reviews.freebsd.org/D23409
While here, clean up magic numbers.
The memset(,0,) (and M_ZERO!) can just be removed; the bit_alloc() API already
zeros the allocation.
No functional change.
Reported by: Coverity
CID: 1378286
All of these uses of sizeof() were on the wrong type in relation to the pointer
passed to SYSCTL_ADD_PROC as arg1. Fortunately, none of the handlers actually
use arg2. So just don't pass a (non-zero) arg2.
Reported by: Coverity
CID: 1007701
These were all introduced in the initial import of hwpstate_intel(4).
Reported by: Coverity
CIDs: 1413161, 1413164, 1413165, 1413167
X-MFC-With: r357002
Sample out put now instead of mere VNASSERT failed:
VNASSERT failed: vp->v_holdcnt == 1234 not true at /usr/src/sys/kern/vfs_subr.c:3148 (vputx)
Reviewed by: kib
Differential Revision: https://reviews.freebsd.org/D23396
if_epair used the 'params' argument to pass a pointer to the b interface
through if_clone_create().
This pointer can be controlled by userspace, which means it could be abused to
trigger a panic. While this requires PRIV_NET_IFCREATE
privileges those are assigned to vnet jails, which means that vnet jails
could panic the system.
Reported by: Ilja Van Sprundel <ivansprundel@ioactive.com>
MFC after: 3 days
Borrow the trick from memset and memmove and use the scale/index/base addressing
to avoid branches.
If a mismatch is found, the routine has to calculate the difference. Make sure
there is always up to 8 bytes to inspect. This replaces the previous loop which
would operate over up to 16 bytes with an unrolled list of 8 tests.
Speed varies a lot, but this is a net win over the previous routine with probably
a lot more to gain.
Validated with glibc test suite.