Currently EPT TLB invalidation is done by incrementing a generation
counter and issuing an IPI to all CPUs currently running vCPU threads.
The VMM inner loop caches the most recently observed generation on each
host CPU and invalidates TLB entries before executing the VM if the
cached generation number is not the most recent value.
pmap_invalidate_ept() issues IPIs to force each vCPU to stop executing
guest instructions and reload the generation number. However, it does
not actually wait for vCPUs to exit, potentially creating a window where
guests may continue to reference stale TLB entries.
Fix the problem by bracketing guest execution with an SMR read section
which is entered before loading the invalidation generation. Then,
pmap_invalidate_ept() increments the current write sequence before
loading pm_active and sending IPIs, and polls readers to ensure that all
vCPUs potentially operating with stale TLB entries have exited before
pmap_invalidate_ept() returns.
Also ensure that unsynchronized loads of the generation counter are
wrapped with atomic(9), and stop (inconsistently) updating the
invalidation counter and pm_active bitmask with acquire semantics.
Reviewed by: grehan, kib
Sponsored by: The FreeBSD Foundation
Differential Revision: https://reviews.freebsd.org/D26910
On 32-bit platforms, the computed size of the BIO_SPEEDUP requested by
softdep_request_cleanup() may be negative when assigned to bp->b_bcount,
which has type "long".
Clamp the size to LONG_MAX. Also convert the unused g_io_speedup() to
use an off_t for the magnitude of the shortage for consistency with
softdep_send_speedup().
Reviewed by: chs, kib
Reported by: pho
Tested by: pho
Sponsored by: The FreeBSD Foundation
Differential Revision: https://reviews.freebsd.org/D27081
First, funsetownlst() list looks at the first element of the list to see
whether it's processing a process or a process group list. Then it
acquires the global sigio lock and processes the list. However, nothing
prevents the first sigio tracker from being freed by a concurrent
funsetown() before the sigio lock is acquired.
Fix this by acquiring the global sigio lock immediately after checking
whether the list is empty. Callers of funsetownlst() ensure that new
sigio trackers cannot be added concurrently.
Second, fsetown() uses funsetown() to remove an existing sigio structure
from a file object. However, funsetown() uses a racy check to avoid the
sigio lock, so two threads may call fsetown() on the same file object,
both observe that no sigio tracker is present, and enqueue two sigio
trackers for the same file object. However, if the file object is
destroyed, funsetown() will only remove one sigio tracker, and
funsetownlst() may later trigger a use-after-free when it clears the
file object reference for each entry in the list.
Fix this by introducing funsetown_locked(), which avoids the racy check.
Reviewed by: kib
Reported by: pho
Tested by: pho
MFC after: 1 week
Sponsored by: The FreeBSD Foundation
Differential Revision: https://reviews.freebsd.org/D27157
Note this still does not scale but is enough to move it out of the way
for the foreseable future.
In particular a trivial benchmark spawning/killing threads stops contesting
on tidhash.
These options need to be in the kern.opts.mk file to be alive for kernel
and module builds. This also reverts r367579 since that's not needed with
this fix: the host's bsd.opts.mk is irrelevant.
Reviewed by: brooks@
Differential Revision: https://reviews.freebsd.org/D27170
When building out-of-tree modules, it appears that the system share/mk
is used, but sys/conf/kern.mk is used. That results in MK_INIT_ALL_ZERO
being undefined. In the interest of maximum compatability, check
that MK_INIT_ALL_* and COMPILER_FEATURES are defined before comparing
their values.
Reported by: mmacy
Sponsored by: DARPA
Otherwise, a socket can have a non-NULL tp->tod while TF_TOE is clear.
In particular, if a newly accepted socket falls back to non-TOE due to
an active open failure, the non-TOE socket will still have tp->tod set
even though TF_TOE is clear.
Reviewed by: np
MFC after: 2 weeks
Sponsored by: Chelsio Communications
Differential Revision: https://reviews.freebsd.org/D27028
There are two options:
- WITH_INIT_ALL_ZERO: Zero all variables on the stack.
- WITH_INIT_ALL_PATTERN: Initialize variables with well-defined patterns.
The exact pattern are a compiler implementation detail and vary by type.
They are somewhat documented in the LLVM commit message:
https://reviews.llvm.org/rL349442
I've used WITH_INIT_ALL_* to match Microsoft's InitAll feature rather
than naming them after the LLVM specific compiler flags.
In a range of consumer products, options like these are used in
both debug and production builds with debugs builds using patterns
(intended to provoke crashes on use of uninitialized values) and
production using zeros (deemed more likely to lead to harmless
misbehavior or NULL-pointer dereferences).
Reviewed by: emaste
Obtained from: CheriBSD
Sponsored by: DARPA
Differential Revision: https://reviews.freebsd.org/D27131
- Force dynamic to be a non-PIE binary.
- Add a dynamicpie test which uses a PIE binary.
Reviewed by: andrew
Obtained from: CheriBSD
MFC after: 2 weeks
Sponsored by: DARPA
Differential Revision: https://reviews.freebsd.org/D27127
PIE executables use crtbeginS.o and have a non-NULL dso_handle as a
result.
Reviewed by: andrew, emaste
MFC after: 2 weeks
Sponsored by: DARPA
Differential Revision: https://reviews.freebsd.org/D27126
This is more consistent with the names used for .ctor and .dtor
symbols and better reflects __JCR_END__'s role.
Reviewed by: andrew
Obtained from: CheriBSD
MFC after: 2 weeks
Sponsored by: DARPA
Differential Revision: https://reviews.freebsd.org/D27125
uma_zone_reserve()), messages like the following appear on the console:
"Freed UMA keg (Test zone) was not empty (0 items). Lost 528 pages of
memory."
When keg_drain_domain() is draining the zone, it tries to keep the number
of items specified in the reservation. However, when we are destroying the
UMA zone, we do not need to keep those items. Therefore, when destroying a
non-secondary and non-cache zone, we should reset the keg reservation to 0
prior to draining the zone.
Reviewed by: markj
Sponsored by: Netflix
Differential Revision: https://reviews.freebsd.org/D27129
This in particular unbreaks rtkit.
The limitation was a leftover of previous state, to quote a
comment:
/*
* Though lwpid is unique, only current process is supported
* since there is no efficient way to look up a LWP yet.
*/
Long since then a global tid hash was introduced to remedy
the problem.
Permission checks still apply.
Submitted by: greg_unrelenting.technology (Greg V)
Differential Revision: https://reviews.freebsd.org/D27158
This deduplicates 2 sets of caches using the same sizes.
Memory savings fluctuate a lot, one sample result is buildworld on zfs
saving ~180MB RAM in reduced page count associated with zio caches.
Refer to the Linux commit mentioned below for a more detailed description.
Linux commit:
a18177925c252da7801149abe217c05b80884798
Requested by: Isilon
MFC after: 1 week
Sponsored by: Mellanox Technologies // NVIDIA Networking
Perhaps it made sense in 1998 (r32836), but now it feels a bit out of
place. We tend to avoid documenting non-essential ports variables in
the manual page (we try to document them in the Porter's Handbook instead).
MFC after: 1 week
The revision r342168 broke ABI of ng_nat needlessly and
the change was merged to stable branches breaking ABI there, too.
Unbreak it.
PR: 250722
MFC after: 1 week
In r367327 generic_bs_sr_<n> were derived from mips. Given we are calling
generic_bs_w_<n> and no write directly, we do not have to do the address
calculations ourselves as eneric_bs_w_<n> will do a str val [bsh, offset].
All we actually have to do is increment offset.
MFC after: 3 days
There are workloads with very bursty tid allocation and since unr tries very
hard to have small-sized bitmaps it keeps reallocating memory. Just doing
buildkernel gives almost 150k calls to free coming from unr.
This also gets rid of the hack which tried to postpone TID reuse.
Reviewed by: kib, markj
Tested by: pho
Differential Revision: https://reviews.freebsd.org/D27101
The intent is to replace the current id allocation method and a known upper
bound will be useful.
Reviewed by: kib (previous version), markj (previous version)
Tested by: pho
Differential Revision: https://reviews.freebsd.org/D27100
* TCP segments without timestamps should be dropped when support for
the timestamp option has been negotiated.
* TCP segments with timestamps should be processed normally if support
for the timestamp option has not been negotiated.
This patch enforces the above.
PR: 250499
Reviewed by: gnn, rrs
MFC after: 1 week
Sponsored by: Netflix, Inc
Differential Revision: https://reviews.freebsd.org/D27148
It includes:
ACPI_HANDLE() implementation.
AC and VIDEO ACPI events notification support.
Replacement of hand-rolled GPLed _DSM method evaluation helpers
with in-base ones.
Submitted by: wulf
Differential Revision: https://reviews.freebsd.org/D26603
from a Linux binary. Should come handy for AppImages.
Reviewed by: asomers
MFC after: 2 weeks
Sponsored by: The FreeBSD Foundation
Differential Revision: https://reviews.freebsd.org/D26959
The MAC address can be set with the optional mac-addr property in the VF
section of the iovctl.conf(5) used to instantiate the VFs.
MFC after: 2 weeks
Sponsored by: Chelsio Communications