This fixes panic when trying to run strace(8) from Focal.
Reviewed By: kib
Sponsored By: EPSRC
Differential Revision: https://reviews.freebsd.org/D32355
The virtual LAPIC driver uses callouts to implement the LAPIC timer.
Callouts are armed using callout_reset_sbt(), which currently puts
everything on CPU 0. On systems running many bhyve VMs this results in
a large amount of contention for CPU 0's callout lock.
Modify vlapic to schedule callouts on the local CPU instead. This
allows timer interrupts to be scheduled more evenly among CPUs where
bhyve is running.
Reviewed by: grehan, jhb
MFC after: 2 weeks
Sponsored by: The FreeBSD Foundation
Differential Revision: https://reviews.freebsd.org/D32559
Remove page zeroing code from consumers and stop specifying
VM_ALLOC_NOOBJ. In a few places, also convert an allocation loop to
simply use VM_ALLOC_WAITOK.
Similarly, convert vm_page_alloc_domain() callers.
Note that callers are now responsible for assigning the pindex.
Reviewed by: alc, hselasky, kib
MFC after: 1 week
Sponsored by: The FreeBSD Foundation
Differential Revision: https://reviews.freebsd.org/D31986
This implementation is faster and doesn't modify the cpuset, so it lets
us avoid some unnecessary copying as well. No functional change
intended.
This is a re-application of commit
9068f6ea697b1b28ad1326a4c7a9ba86f08b985e.
Reviewed by: cem, kib, jhb
MFC after: 2 weeks
Sponsored by: The FreeBSD Foundation
Differential Revision: https://reviews.freebsd.org/D32029
The root page is not zeroed at allocation time since with 4-level tables
each entry is copied from a template. However, with 5-level tables only
a single entry is filled, so the rest need to be cleared.
Reported by: alc
Reviewed by: alc, kib
MFC after: 1 week
Sponsored by: The FreeBSD Foundation
Differential Revision: https://reviews.freebsd.org/D32525
Remove the option from NOTES/LINT, and add to NOTES for powerpc and
riscv.
PR: 259036
Requested by: John Hay <john@sanren.ac.za>
Discussed with: ian, imp
Sponsored by: The FreeBSD Foundation
MFC after: 1 week
We actually do not know is it safe or not to flush cache for random
BAR/register page existing in the system. It is well-known that for
instance LAPICs cannot tolerate cache flush. As report indicates,
there are more such devices.
This issue typically affects AMD machines which do not report self-snoop,
causing real CLFLUSH invocation on the mapped pages. Intels do self-snoop,
so this change should be nop for them, and unsafe devices, if any, are
already ignored.
Reported and tested by: manu
Reviewed by: alc, markj
Sponsored by: The FreeBSD Foundation
MFC after: 1 week
Differential revision: https://reviews.freebsd.org/D32318
Similar to pmap_page_set_memattr() by setting MD page cache attribute
to the argument. Unlike pmap_page_set_memattr(), does not flush cache
for the direct mapping of the page.
Reviewed by: alc, markj
Sponsored by: The FreeBSD Foundation
MFC after: 1 week
Differential revision: https://reviews.freebsd.org/D32318
The implementation of the progress bar is simple, but duplicated for
most minidump implementations. Extract the common bits to kern_dump.c.
Ensure that the bar is reset with each subsequent dump; this was only
done on some platforms previously.
Reviewed by: markj
MFC after: 2 weeks
Sponsored by: Juniper Networks, Inc.
Sponsored by: Klara, Inc.
Differential Revision: https://reviews.freebsd.org/D31885
The function is identical in each minidump implementation, so move it to
vm_phys.c. The only slight exception is powerpc where the function was
public, for use in moea64_scan_pmap().
Reviewed by: kib, markj, imp (earlier version)
MFC after: 2 weeks
Sponsored by: Juniper Networks, Inc.
Sponsored by: Klara, Inc.
Differential Revision: https://reviews.freebsd.org/D31884
Drop fpstate only after copying out xfpustate from the thread usermode
save area. Otherwise a context switch between get_fpcontext(), which now
returns the pointer directly into user save area, and copyout, would
cause reinit of the save area, loosing user registers.
Reported, reviewed, and tested by: markj
Sponsored by: The FreeBSD Foundation
MFC after: 3 days
Differential revision: https://reviews.freebsd.org/D32159
It no longer serves any purpose as thread0's td_frame field is now
initialized during fpuinitstate(). No functional change intended.
Reviewed by: kib
MFC after: 2 weeks
Sponsored by: The FreeBSD Foundation
Differential Revision: https://reviews.freebsd.org/D32057
When creating a new thread, we unconditionally copy td_frame from the
creating thread. For threads which never return to user mode, this is
unnecessary since td_frame just points to the base of the stack or a
random interrupt frame.
If KASAN is configured this copying may also trigger false positives
since the td_frame region may contain poisoned stack regions. It was
not noticed before since thread0 used a dummy proc0_tf trapframe, and
kernel procs are generally created by thread0. Since commit
df8dd6025af88a99d34f549fa9591a9b8f9b75b1, though, we call
cpu_thread_alloc(&thread0) when initializing FPU state, which
reinitializes thread0.td_frame.
Work around the problem by not copying the frame unless the copying
thread came from user mode. While here, de-duplicate the copying and
remove redundant re(initialization) of td_frame.
Reported by: syzbot+2ec89312bffbf38d9aec@syzkaller.appspotmail.com
Reviewed by: kib
Fixes: df8dd6025af8
MFC after: 2 weeks
Sponsored by: The FreeBSD Foundation
Differential Revision: https://reviews.freebsd.org/D32057
This reverts commit 0f6829488ef32142b9ea1c0806fb5ecfe0872c02.
Also it changes the type of md_usr_fpu_save struct mdthread member
to void *, which is what uncovered this trouble. Now the save area
is untyped, but since it is hidden behind accessors, it is not too
significant. Since apparently there are consumers affected outside
the tree, this hack is better than one from the reverted revision.
PR: 258678
Reported by: cy
Reviewed by: cy, kevans, markj
Sponsored by: The FreeBSD Foundation
MFC after: 1 week
Differential revision: https://reviews.freebsd.org/D32060
According to https://github.com/NuxiNL/cloudlibc:
CloudABI is no longer being maintained. It was an awesome experiment,
but it never got enough traction to be sustainable.
There is no reason to keep it in FreeBSD.
Approved by: ed (private mail)
Reviewed by: emaste
Sponsored by: The FreeBSD Foundation
Differential revision: https://reviews.freebsd.org/D31923
This reverts commit 9068f6ea697b1b28ad1326a4c7a9ba86f08b985e.
The underlying macro needs to be reworked to avoid problems with control
flow statements.
Reported by: rlibby
For signal send, copyout from the user FPU save area directly.
For sigreturn, we are in sleepable context and can do temporal
allocation of the transient save area. We cannot copying from userspace
directly to user save area because XSAVE state needs to be validated,
also partial copyins can corrupt it.
Requested by: jhb
Reviewed by: jhb, markj
Tested by: pho
Sponsored by: The FreeBSD Foundation
MFC after: 1 week
Differential revision: https://reviews.freebsd.org/D31954
Instead do one more allocation at the thread creation time. This frees
a lot of space on the stack.
Also do not use alloca() for temporal storage in signal delivery sendsig()
function and signal return syscall sys_sigreturn(). This saves equal
amount of space, again by the cost of one more allocation at the thread
creation time.
A useful experiment now would be to reduce KSTACK_PAGES.
Reviewed by: jhb, markj
Tested by: pho
Sponsored by: The FreeBSD Foundation
MFC after: 1 week
Differential revision: https://reviews.freebsd.org/D31954
from machdep.c which is too large pile of unrelated things.
Some ptrace functions are moved from machdep.c to ptrace_machdep.c.
Now machdep.c contains code mostly related to the low level initialization
and regular low level operation of the architecture, while signal MD code
and registers handling is placed in exec_machdep.c.
Reviewed by: jhb, markj
Discussed with: jrtc27
Tested by: pho
Sponsored by: The FreeBSD Foundation
MFC after: 1 week
Differential revision: https://reviews.freebsd.org/D31954
This implementation is faster and doesn't modify the cpuset, so it lets
us avoid some unnecessary copying as well. No functional change
intended.
Reviewed by: cem, kib, jhb
MFC after: 2 weeks
Sponsored by: The FreeBSD Foundation
Differential Revision: https://reviews.freebsd.org/D32029
Reimplement bdf0f24bb16d556a5b by checking for the caller' ABI in
the implementation of PT_GET_SC_ARGS, and copying out everything if
it is Linuxolator.
Also fix a minor information leak: if PT_GET_SC_ARGS_ALL is done on the
thread reused after other process, it allows to read some number of that
thread last syscall arguments. Clear td_sa.args in thread_alloc().
Reviewed by: jhb
Sponsored by: The FreeBSD Foundation
Differential revision: https://reviews.freebsd.org/D31968
This is one of the pieces required to make modern (ie Focal)
strace(1) work.
Reviewed By: jhb (earlier version)
Sponsored by: EPSRC
Differential Revision: https://reviews.freebsd.org/D28212
There is no need to restrict trampoline page table to low 1M, it
should work with any pages below 4G. Only wakeup code itself should
be below 1M.
Do not waste level 5 page when LA48 mode is used.
Reviewed by: markj
Sponsored by: The FreeBSD Foundation
MFC after: 1 week
Differential revision: https://reviews.freebsd.org/D31931
The file as is is the maze of #ifdef passages, all slightly different.
Divorcing i386 and amd64 version actually makes changing the code
easier, also no changes for i386 are planned.
Reviewed by: markj
Sponsored by: The FreeBSD Foundation
MFC after: 1 week
Differential revision: https://reviews.freebsd.org/D31931
- Re-implement pcib interface to use standard pci bus driver on top of
vmd(4) instead of custom one.
- Re-implement memory/bus resource allocation to properly handle even
complicated configurations.
- Re-implement interrupt handling to evenly distribute children's MSI/
MSI-X interrupts between available vmd(4) MSI-X vectors and setup them
to be handled by standard OS mechanisms with minimal overhead, except
sharing when unavoidable.
Successfully tested on Dell XPS 13 laptop with Core i7-1185G7 CPU (VMD
device ID 0x9a0b) and single NVMe SSD, dual-booting with Windows 10.
Successfully tested on Supermicro X11DPI-NT motherboard with Xeon(R)
Gold 6242R CPUs (VMD device ID 0x201d), simultaneously handling NVMe
SSD on one PCIe port and PLX bridge with 3 NVMe and 1 AHCI SSDs on
another. Handles SSD hot-plug (except Optane 905p for some reason,
which are not detected until manual bus rescan) and enabled IOMMU
(directly connected SSDs work, but ones connected to the PLX fail
without errors from IOMMU).
MFC after: 2 weeks
Sponsored by: iXsystems, Inc.
Differential revision: https://reviews.freebsd.org/D31762
Move the common kernel function signatures from machine/reg.h to a new
sys/reg.h. This is in preperation for adding PT_GETREGSET to ptrace(2).
Reviewed by: imp, markj
Sponsored by: DARPA, AFRL (original work)
Sponsored by: The FreeBSD Foundation
Differential Revision: https://reviews.freebsd.org/D19830
Add a credential to the cdev object in sysctl_vmm_create(), then check
that we have the correct credentials in sysctl_vmm_destroy(). This
prevents a process in one jail from opening or destroying the /dev/vmm
file corresponding to a VM in a sibling jail.
Add regression tests.
Reviewed by: jhb, markj
MFC after: 2 weeks
Sponsored by: The FreeBSD Foundation
Differential Revision: https://reviews.freebsd.org/D31156
Add support for the KVM paravirtual clock device.
Sponsored by: Juniper Networks, Inc.
Sponsored by: Klara, Inc.
Reviewed by: kib
Differential Revision: https://reviews.freebsd.org/D29733
Add a variant of 'rdtsc()' that performs the ordered version of 'rdtsc'
appropriate for the invoking x86 variant.
Also, expose the 'lfence'-ed and 'mfence'-ed 'rdtsc()' variants needed
by 'rdtsc_ordered()' for general use.
Sponsored by: Juniper Networks, Inc.
Sponsored by: Klara, Inc.
Reviewed by: kib
Differential Revision: https://reviews.freebsd.org/D31416
Add a variant of 'rdtscp()' that retains and returns the 'IA32_TSC_AUX'
value read by 'rdtscp'.
Sponsored By: Juniper Networks, Inc.
Sponsored By: Klara, Inc.
Reviewed by: markj, kib
Differential Revision: https://reviews.freebsd.org/D31415