The Software Package Data Exchange (SPDX) group provides a specification
to make it easier for automated tools to detect and summarize well known
opensource licenses. We are gradually adopting the specification, noting
that the tags are considered only advisory and do not, in any way,
superceed or replace the license texts.
Special thanks to Wind River for providing access to "The Duke of
Highlander" tool: an older (2014) run over FreeBSD tree was useful as a
starting point.
Initially, only tag files that use BSD 4-Clause "Original" license.
RelNotes: yes
Differential Revision: https://reviews.freebsd.org/D13133
similar to the kernel memory allocator.
This simplifies NUMA allocation because the domain will be known at wait
time and races between failure and sleeping are eliminated. This also
reduces boilerplate code and simplifies callers.
A wait primitive is supplied for uma zones for similar reasons. This
eliminates some non-specific VM_WAIT calls in favor of more explicit
sleeps that may be satisfied without new pages.
Reviewed by: alc, kib, markj
Tested by: pho
Sponsored by: Netflix, Dell/EMC Isilon
In reclaim_pv_chunk(), rotate the pv chunks list so that next
invocations of the reclaim do not scan the same pv chunks that could
not be freed. Only do the rotation when there is no parallel scan,
tracked by active_reclaims counter.
To rotate, move all chunks that are before current iteration marker,
after another marker that is inserted at the list tail on start of the
reclaim.
Reviewed by: alc
Tested by: pho (previous version)
Sponsored by: The FreeBSD Foundation
MFC after: 1 week
Some callers of fpusetregs()/npxsetregs(), most importantly
set_fpcontext(), clear reserved bits. But some did not. Do the
clearing in fpusetregs() and remove now redundand operation from
set_fpcontext().
Reported by: Maxime Villard <max@m00nbsd.net>
Sponsored by: The FreeBSD Foundation
MFC after: 1 week
This is needed for the HDA emulation with FreeBSD guests.
Reviewed by: marcelo
MFC after: 2 weeks
Differential Revision: https://reviews.freebsd.org/D12832
timer frequency a power of two. This changes the frequency from 10 to
16.7 MHz (2 ^ 24 HZ). Using a power of two avoids roundoff errors when
doing arithmetic in sbintime_t units.
Testing shows this can fix erratic ntpd behavior in guests using the
hpet timer (which is the default for multicore guests).
Reported by: bsam@
Specifically, devices that do not support PCI-e FLR and were not
gracefully shutdown by the guest OS could continue to issue DMA
requests after the VM was terminated. The changes in r305485 meant
that those DMA requests were completed against the host's memory which
could result in random memory corruption. Instead, leave ppt devices
that are not attached to a VM disabled in the IOMMU and only restore
the devices to the host domain if the ppt(4) driver is detached from a
device.
As an added safety belt, disable busmastering for a pass-through device
when before adding it to the host domain during ppt(4) detach.
PR: 222937
Tested by: Harry Schmalzbauer <freebsd@omnilan.de>
Reviewed by: grehan
MFC after: 1 week
Differential Revision: https://reviews.freebsd.org/D12661
While here cache-align chains.
This shortens longest found chain during poudriere -j 80 from 32 to 16.
Pushing this higher up will probably require allocation on boot.
HEAD. Enable VIMAGE in GENERIC kernels and some others (where GENERIC does
not exist) on HEAD.
Disable building LINT-VIMAGE with VIMAGE being default.
This should give it a lot more exposure in the run-up to 12 to help
us evaluate whether to keep it on by default or not.
We are also hoping to get better performance testing.
The feature can be disabled using nooptions.
Requested by: many
Reviewed by: kristof, emaste, hiren
X-MFC after: never
Relnotes: yes
Differential Revision: https://reviews.freebsd.org/D12639
Note that pv_list_locks is an array and currently it fits 2 locks per line.
Resizing it and/or putting more locks in different lines requires several tests.
MFC after: 1 week
All of the kernel dump implementations keep track of the current offset
("dumplo") within the dump device. However, except for textdumps, they
all write the dump sequentially, so we can reduce code duplication by
having the MI code keep track of the current offset. The new
dump_append() API can be used to write at the current offset.
This is needed to implement support for kernel dump compression in the
MI kernel dump code.
Also simplify dump_encrypted_write() somewhat: use dump_write() instead
of duplicating its bounds checks, and get rid of the redundant offset
tracking.
Reviewed by: cem
Sponsored by: Dell EMC Isilon
Differential Revision: https://reviews.freebsd.org/D11722
For processing, reclaim_pv_chunk() removes the pv_chunk from the lru
list, which makes pc_lru linkage invalid. Then the pmap lock is
released, which allows for other thread to free the last pv entry
allocated from the chunk and call free_pv_chunk(), which tries to
modify the invalid linkage.
Similarly, the chunk is inserted into the private tailq new_tail
temporary. Again, free_pv_chunk() might be run and corrupt the
linkage for the new_tail after the pmap lock is dropped.
This is a consequence of r299788 elimination of pvh_global_lock, which
allowed for reclaim to run in parallel with other pmap calls which
free pv chunks.
As a fix, do not remove the chunk from pc_lru queue, use a marker to
remember the position in the queue iteration. We can safely operate
on the chunks after the chunk's pmap is locked, we fetched the chunk
after the marker, and we checked that chunk pmap is same as we have
locked, because chunk removal from pc_lru requires both pv_chunk_mutex
and the pmap mutex owned.
Note that the fix lost an optimization which was present in the
previous algorithm. Namely, new_tail requeueing rotated the pv chunks
list so that reclaim didn't scan the same pv chunks that couldn't be
freed (because they contained a wired and/or superpage mapping) on
every invocation. An additional change is planned which would improve
this.
Reported and tested by: pho
Reviewed by: alc
Sponsored by: The FreeBSD Foundation
MFC after: 1 week
allocated, when requested range of descriptors does not fit into
currently allocated LDT, or trim the return if the range fits
partially. Before, the function returned EINVAL.
Reviewed by: bde
Sponsored by: The FreeBSD Foundation
MFC after: 1 week
use LDT segments immediately.
If the i386_set_ldt() call created a first LDT descriptor (and
consequently created the LDT) for our address space, LDTR is currently
loaded only on the CPU executing the syscall. Other CPUs executing
threads sharing the address space, would only load LDTR after context
switch.
Uncomment set_user_ldt_rv() and call it on all CPUs. Remove critical
section inside set_user_ldt(), it is not needed in the context of call
from smp_rendezvous().
Set md_ldt after md_ldt_sd is initialized using the same code sequence
as in user_ldt_free(). Do the whole initialization in a critical
section, to not race with the context switching while we set LDT.
Discussed with: bde
Sponsored by: The FreeBSD Foundation
MFC after: 1 week
cpu_switch.S uses curproc->p_md.md_ldt value as the flag indicating
presence of the process LDT. The flag is checked and then ldt segment
descriptor is copied into the CPU' GDT slot.
Disallow context switches around clearing of the curproc LDT state by
performing the cleanup in critical section. Ensure that the md_ldt
flag is cleared before md_ldt_sd descriptor content is destroyed by
inserting fence between the operations.
We depend on the x86 memory model strong ordering guarantees, in
particular, that cpu_switch.S observes the writes to md_ldt and
md_ldt_sd in the expected order.
Discussed with: bde
Sponsored by: The FreeBSD Foundation
MFC after: 1 week
Provide consistent snapshot of the requested descriptors by preventing
other threads from modifying LDT while we fetch the data, lock dt_lock
around the read. Copy the data into intermediate buffer, which is
copied out after the lock is dropped.
Use guaranteed atomic (aligned volatile) reads of the descriptors to
use same-size atomic as CPU update to set A bit in the descriptor type
field.
Improve overflow checking for the descriptors range calculations and
remove unneeded casts.
Reviewed by: bde
Sponsored by: The FreeBSD Foundation
MFC after: 1 week
Compilers are allowed to combine plain reads into group operations,
e.g. 64bit element copies of one array into another can be
legitimately optimized back to a memcpy() call, which r323772 tried to
prevent.
Qualify accesses to LDT descriptors with volatile dereference to
ensure that each write indeed occurs. After that, our usual claim of
native-size aligned writes being atomic applies.
This is equivalent to atomic_store(memory_order_relaxed) C11 accesses,
but our machine/atomic.h does not provide corresponding primitive.
Noted and reviewed by: bde
Sponsored by: The FreeBSD Foundation
MFC after: 1 week
On i386, the function is used from the context switch code and needs
to be accessible externally. Amd64 MD context switch does not lock an
LDT spinlock and inlines switching in assembly.
Discussed with: bde
Sponsored by: The FreeBSD Foundation
MFC after: 1 week
This makes the LDT to use only one page with default settings,
avoiding the need to find contigous 2 pages in KVA. It seems that
most users are fine even with 512 segments.
Discussed with: bde
Sponsored by: The FreeBSD Foundation
MFC after: 1 week
machine independent parts of the existing code to a new file that can be
shared between amd64 and arm64.
Reviewed by: kib (previous version), imp
Sponsored by: DARPA, AFRL
Differential Revision: https://reviews.freebsd.org/D12434
Care must be taken when updating the active LDT, since parallel
threads might try to load a segment descriptor which is currently
updated. Since the results are undefined, this cannot be ignored by
claiming to be an application race.
Reviewed by: jhb
Sponsored by: The FreeBSD Foundation
MFC after: 2 weeks
Differential revision: https://reviews.freebsd.org/D12413
CAM_DEBUG_TRACE results in way too much debug output than needed now.
When debugging, it's always possible to turn on trace level using camcontrol.
Approved by: imp (mentor)
Differential Revision: https://reviews.freebsd.org/D12110
AMD Family 17h CPUs have an internal network used to communicate between
the host CPU and the PSP and SMU coprocessors. It exposes a simple
32-bit register space.
Reviewed by: avg (no +1), mjoras, truckman
Sponsored by: Dell EMC Isilon
Differential Revision: https://reviews.freebsd.org/D12217
This driver supports both NTB-to-NTB and NTB-to-Root Port modes (though
the second with predictable complications on hot-plug and reboot events).
I tested it with PEX 8717 and PEX 8733 chips, but expect it should work
with many other compatible ones too. It supports up to two NT bridges
per chip, each of which can have up to 2 64-bit or 4 32-bit memory windows,
6 or 12 scratchpad registers and 16 doorbells. There are also 4 DMA engines
in those chips, but they are not yet supported.
While there, rename Intel NTB driver from generic ntb_hw(4) to more specific
ntb_hw_intel(4), so now it is on par with this new ntb_hw_plx(4) driver and
alike to Linux naming.
MFC after: 2 weeks
Sponsored by: iXsystems, Inc.
The actual cache line size has always been 64 bytes.
The 128 number arose as an optimization for Core 2 era Intel processors. By
default (configurable in BIOS), these CPUs would prefetch adjacent cache
lines unintelligently. Newer CPUs prefetch more intelligently.
The latest Core 2 era CPU was introduced in September 2008 (Xeon 7400
series, "Dunnington"). If you are still using one of these CPUs, especially
in a multi-socket configuration, consider locating the "adjacent cache line
prefetch" option in BIOS and disabling it.
Reported by: mjg
Reviewed by: np
Discussed with: jhb
Sponsored by: Dell EMC Isilon
Right now, we enable the CR4.FSGSBASE bit on CPUs which support the
facility (Ivy and later), to allow usermode to read fs and gs bases
without syscalls. This bit also controls the write access to bases
from userspace, but WRFSBASE and WRGSBASE instructions currently
cannot be used, because return path from both exceptions or interrupts
overrides bases with the values from pcb.
Supporting the instructions is useful because this means that usermode
can implement green-threads completely in userspace without issuing
syscalls to change all of the machine context.
Support is implemented by saving the fs base and user gs base when
PCB_FULL_IRET flag is set. The flag is set on the context switch,
which potentially causes clobber of the bases due to activation of
another context, and when explicit modification of the user context by
a syscall or exception handler is performed. In particular, the patch
moves setting of the flag before syscalls change context.
The changes to doreti_exit and PUSH_FRAME to clear PCB_FULL_IRET on
entry from userspace can be considered a bug fixes on its own.
Reviewed by: jhb (previous version)
Tested by: pho (previous version)
Sponsored by: The FreeBSD Foundation
MFC after: 3 weeks
Differential revision: https://reviews.freebsd.org/D12023