The symbol is just an offset in the hardware TSS structure, it is not
limited to the common_tss instance.
Sponsored by: The FreeBSD Foundation
MFC after: 3 days
Add cpuctl(4) ioctl CPUCTL_EVAL_CPU_FEATURES which forces re-read of
cpu_features, cpu_features2, cpu_stdext_features, and
std_stdext_features2.
The intent is to allow the kernel to see the changes in the CPU
features after micocode update. Of course, the update is not atomic
across variables and not synchronized with readers. See the man page
warning as well.
Reviewed by: imp (previous version), jilles
Sponsored by: The FreeBSD Foundation
MFC after: 1 week
Differential revision: https://reviews.freebsd.org/D13770
It does not change anything in the behavior of trap_pfault(), while
eliminating obfuscation of jumping to the code which checks for the
condition reversed of the goto cause. Also avoid force initialize the
rv variable, since it is now only accessed after storing vm_fault()
return value.
Reviewed by: alc
Sponsored by: The FreeBSD Foundation
MFC after: 1 week
Differential revision: https://reviews.freebsd.org/D13725
The entry must be logged "manually" using TSRAW rather than TSENTER
since PCPU data structures have not yet been initialized and thus
curthread cannot be accessed; &thread0 is what will become curthread
later in hammer_time.
Other MD initialization code should be similarly instrumented in order
to gain visibility into the time spent before entering mi_startup; this
will require some care and testing from people with access to such
hardware.
The DTrace fasttrap entry points expect a struct reg containing the
register values of the calling thread. Perform the conversion in
fasttrap rather than in the trap handler: this reduces the number of
ifdefs and avoids wasting stack space for traps that don't involve
DTrace.
MFC after: 2 weeks
This variable should be pure MI except possibly for reading it in MD
dump routines. Its initialization was pure MD in 4.4BSD, but FreeBSD
changed this in r36441 in 1998. There were many imperfections in
r36441. This commit fixes only a small one, to simplify fixing the
others 1 arch at a time. (r47678 added support for
special/early/multiple message buffer initialization which I want in
a more general form, but this was too fragile to use because hacking
on the msgbufp global corrupted it, and was only used for 5 hours in
-current...)
Stop issuing pre-assigned number to enumerate all page table pages,
the assignment is incorrect. Instead automatically calculate the next
unused index. This index in fact does not serve any purpose except to
be unique to satisfy vm_page_grab() interface, we do not look up the
page by the index later.
Reported and tested by: emaste
Reviewed by: andrew
Sponsored by: The FreeBSD Foundation
MFC after: 2 weeks
PR: 223906
Differential revision: https://reviews.freebsd.org/D13273
Mainly focus on files that use BSD 2-Clause license, however the tool I
was using misidentified many licenses so this was mostly a manual - error
prone - task.
The Software Package Data Exchange (SPDX) group provides a specification
to make it easier for automated tools to detect and summarize well known
opensource licenses. We are gradually adopting the specification, noting
that the tags are considered only advisory and do not, in any way,
superceed or replace the license texts.
Upon successful completion, the execve() system call invokes
exec_setregs() to initialize the registers of the initial thread of the
newly executed process. What is weird is that when execve() returns, it
still goes through the normal system call return path, clobbering the
registers with the system call's return value (td->td_retval).
Though this doesn't seem to be problematic for x86 most of the times (as
the value of eax/rax doesn't matter upon startup), this can be pretty
frustrating for architectures where function argument and return
registers overlap (e.g., ARM). On these systems, exec_setregs() also
needs to initialize td_retval.
Even worse are architectures where cpu_set_syscall_retval() sets
registers to values not derived from td_retval. On these architectures,
there is no way cpu_set_syscall_retval() can set registers to the way it
wants them to be upon the start of execution.
To get rid of this madness, let sys_execve() return EJUSTRETURN. This
will cause cpu_set_syscall_retval() to leave registers intact. This
makes process execution easier to understand. It also eliminates the
difference between execution of the initial process and successive ones.
The initial call to sys_execve() is not performed through a system call
context.
Reviewed by: kib, jhibbits
Differential Revision: https://reviews.freebsd.org/D13180
Mainly focus on files that use BSD 3-Clause license.
The Software Package Data Exchange (SPDX) group provides a specification
to make it easier for automated tools to detect and summarize well known
opensource licenses. We are gradually adopting the specification, noting
that the tags are considered only advisory and do not, in any way,
superceed or replace the license texts.
Special thanks to Wind River for providing access to "The Duke of
Highlander" tool: an older (2014) run over FreeBSD tree was useful as a
starting point.
The Software Package Data Exchange (SPDX) group provides a specification
to make it easier for automated tools to detect and summarize well known
opensource licenses. We are gradually adopting the specification, noting
that the tags are considered only advisory and do not, in any way,
superceed or replace the license texts.
Special thanks to Wind River for providing access to "The Duke of
Highlander" tool: an older (2014) run over FreeBSD tree was useful as a
starting point.
Initially, only tag files that use BSD 4-Clause "Original" license.
RelNotes: yes
Differential Revision: https://reviews.freebsd.org/D13133
similar to the kernel memory allocator.
This simplifies NUMA allocation because the domain will be known at wait
time and races between failure and sleeping are eliminated. This also
reduces boilerplate code and simplifies callers.
A wait primitive is supplied for uma zones for similar reasons. This
eliminates some non-specific VM_WAIT calls in favor of more explicit
sleeps that may be satisfied without new pages.
Reviewed by: alc, kib, markj
Tested by: pho
Sponsored by: Netflix, Dell/EMC Isilon
In reclaim_pv_chunk(), rotate the pv chunks list so that next
invocations of the reclaim do not scan the same pv chunks that could
not be freed. Only do the rotation when there is no parallel scan,
tracked by active_reclaims counter.
To rotate, move all chunks that are before current iteration marker,
after another marker that is inserted at the list tail on start of the
reclaim.
Reviewed by: alc
Tested by: pho (previous version)
Sponsored by: The FreeBSD Foundation
MFC after: 1 week
Some callers of fpusetregs()/npxsetregs(), most importantly
set_fpcontext(), clear reserved bits. But some did not. Do the
clearing in fpusetregs() and remove now redundand operation from
set_fpcontext().
Reported by: Maxime Villard <max@m00nbsd.net>
Sponsored by: The FreeBSD Foundation
MFC after: 1 week
Note that pv_list_locks is an array and currently it fits 2 locks per line.
Resizing it and/or putting more locks in different lines requires several tests.
MFC after: 1 week
All of the kernel dump implementations keep track of the current offset
("dumplo") within the dump device. However, except for textdumps, they
all write the dump sequentially, so we can reduce code duplication by
having the MI code keep track of the current offset. The new
dump_append() API can be used to write at the current offset.
This is needed to implement support for kernel dump compression in the
MI kernel dump code.
Also simplify dump_encrypted_write() somewhat: use dump_write() instead
of duplicating its bounds checks, and get rid of the redundant offset
tracking.
Reviewed by: cem
Sponsored by: Dell EMC Isilon
Differential Revision: https://reviews.freebsd.org/D11722
For processing, reclaim_pv_chunk() removes the pv_chunk from the lru
list, which makes pc_lru linkage invalid. Then the pmap lock is
released, which allows for other thread to free the last pv entry
allocated from the chunk and call free_pv_chunk(), which tries to
modify the invalid linkage.
Similarly, the chunk is inserted into the private tailq new_tail
temporary. Again, free_pv_chunk() might be run and corrupt the
linkage for the new_tail after the pmap lock is dropped.
This is a consequence of r299788 elimination of pvh_global_lock, which
allowed for reclaim to run in parallel with other pmap calls which
free pv chunks.
As a fix, do not remove the chunk from pc_lru queue, use a marker to
remember the position in the queue iteration. We can safely operate
on the chunks after the chunk's pmap is locked, we fetched the chunk
after the marker, and we checked that chunk pmap is same as we have
locked, because chunk removal from pc_lru requires both pv_chunk_mutex
and the pmap mutex owned.
Note that the fix lost an optimization which was present in the
previous algorithm. Namely, new_tail requeueing rotated the pv chunks
list so that reclaim didn't scan the same pv chunks that couldn't be
freed (because they contained a wired and/or superpage mapping) on
every invocation. An additional change is planned which would improve
this.
Reported and tested by: pho
Reviewed by: alc
Sponsored by: The FreeBSD Foundation
MFC after: 1 week
allocated, when requested range of descriptors does not fit into
currently allocated LDT, or trim the return if the range fits
partially. Before, the function returned EINVAL.
Reviewed by: bde
Sponsored by: The FreeBSD Foundation
MFC after: 1 week
use LDT segments immediately.
If the i386_set_ldt() call created a first LDT descriptor (and
consequently created the LDT) for our address space, LDTR is currently
loaded only on the CPU executing the syscall. Other CPUs executing
threads sharing the address space, would only load LDTR after context
switch.
Uncomment set_user_ldt_rv() and call it on all CPUs. Remove critical
section inside set_user_ldt(), it is not needed in the context of call
from smp_rendezvous().
Set md_ldt after md_ldt_sd is initialized using the same code sequence
as in user_ldt_free(). Do the whole initialization in a critical
section, to not race with the context switching while we set LDT.
Discussed with: bde
Sponsored by: The FreeBSD Foundation
MFC after: 1 week
cpu_switch.S uses curproc->p_md.md_ldt value as the flag indicating
presence of the process LDT. The flag is checked and then ldt segment
descriptor is copied into the CPU' GDT slot.
Disallow context switches around clearing of the curproc LDT state by
performing the cleanup in critical section. Ensure that the md_ldt
flag is cleared before md_ldt_sd descriptor content is destroyed by
inserting fence between the operations.
We depend on the x86 memory model strong ordering guarantees, in
particular, that cpu_switch.S observes the writes to md_ldt and
md_ldt_sd in the expected order.
Discussed with: bde
Sponsored by: The FreeBSD Foundation
MFC after: 1 week
Provide consistent snapshot of the requested descriptors by preventing
other threads from modifying LDT while we fetch the data, lock dt_lock
around the read. Copy the data into intermediate buffer, which is
copied out after the lock is dropped.
Use guaranteed atomic (aligned volatile) reads of the descriptors to
use same-size atomic as CPU update to set A bit in the descriptor type
field.
Improve overflow checking for the descriptors range calculations and
remove unneeded casts.
Reviewed by: bde
Sponsored by: The FreeBSD Foundation
MFC after: 1 week
Compilers are allowed to combine plain reads into group operations,
e.g. 64bit element copies of one array into another can be
legitimately optimized back to a memcpy() call, which r323772 tried to
prevent.
Qualify accesses to LDT descriptors with volatile dereference to
ensure that each write indeed occurs. After that, our usual claim of
native-size aligned writes being atomic applies.
This is equivalent to atomic_store(memory_order_relaxed) C11 accesses,
but our machine/atomic.h does not provide corresponding primitive.
Noted and reviewed by: bde
Sponsored by: The FreeBSD Foundation
MFC after: 1 week
On i386, the function is used from the context switch code and needs
to be accessible externally. Amd64 MD context switch does not lock an
LDT spinlock and inlines switching in assembly.
Discussed with: bde
Sponsored by: The FreeBSD Foundation
MFC after: 1 week
This makes the LDT to use only one page with default settings,
avoiding the need to find contigous 2 pages in KVA. It seems that
most users are fine even with 512 segments.
Discussed with: bde
Sponsored by: The FreeBSD Foundation
MFC after: 1 week
machine independent parts of the existing code to a new file that can be
shared between amd64 and arm64.
Reviewed by: kib (previous version), imp
Sponsored by: DARPA, AFRL
Differential Revision: https://reviews.freebsd.org/D12434
Care must be taken when updating the active LDT, since parallel
threads might try to load a segment descriptor which is currently
updated. Since the results are undefined, this cannot be ignored by
claiming to be an application race.
Reviewed by: jhb
Sponsored by: The FreeBSD Foundation
MFC after: 2 weeks
Differential revision: https://reviews.freebsd.org/D12413
Right now, we enable the CR4.FSGSBASE bit on CPUs which support the
facility (Ivy and later), to allow usermode to read fs and gs bases
without syscalls. This bit also controls the write access to bases
from userspace, but WRFSBASE and WRGSBASE instructions currently
cannot be used, because return path from both exceptions or interrupts
overrides bases with the values from pcb.
Supporting the instructions is useful because this means that usermode
can implement green-threads completely in userspace without issuing
syscalls to change all of the machine context.
Support is implemented by saving the fs base and user gs base when
PCB_FULL_IRET flag is set. The flag is set on the context switch,
which potentially causes clobber of the bases due to activation of
another context, and when explicit modification of the user context by
a syscall or exception handler is performed. In particular, the patch
moves setting of the flag before syscalls change context.
The changes to doreti_exit and PUSH_FRAME to clear PCB_FULL_IRET on
entry from userspace can be considered a bug fixes on its own.
Reviewed by: jhb (previous version)
Tested by: pho (previous version)
Sponsored by: The FreeBSD Foundation
MFC after: 3 weeks
Differential revision: https://reviews.freebsd.org/D12023
- Use more relevant name 'signo' instead of 'i' for the local variable
which contains a signal number to send for the current exception.
- Eliminate two labels 'userout' and 'out' which point to the very end
of the trap() function. Instead use return directly.
- Re-indent the prot_fault_translation block by reducing if() nesting.
- Some more monor style changes.
Requested and reviewed by: bde
Tested by: pho
Sponsored by: The FreeBSD Foundation
MFC after: 1 week