Commit Graph

8665 Commits

Author SHA1 Message Date
Edward Tomasz Napierala
f9246e1484 linux: Implement some bits of PTRACE_PEEKUSER
This makes Linux gdb from Bionic a little less broken.

Sponsored By:	EPSRC
Differential Revision:	https://reviews.freebsd.org/D32455
2021-10-17 12:20:21 +01:00
Edward Tomasz Napierala
75a9d95b4d linux: Adjust PTRACE_GET_SYSCALL_INFO buffer size semantics
The tests/ptrace_syscall_info test from strace(1) complained
about this.

Sponsored By:	EPSRC
Differential Revision:	https://reviews.freebsd.org/D32368
2021-10-17 11:49:46 +01:00
Konstantin Belousov
e81e77c5a0 Enable PPS_SYNC on amd64, arm64 and armv7
Remove the option from NOTES/LINT, and add to NOTES for powerpc and
riscv.

PR:	259036
Requested by:	John Hay <john@sanren.ac.za>
Discussed with:	ian, imp
Sponsored by:	The FreeBSD Foundation
MFC after:	1 week
2021-10-10 22:34:40 +03:00
Konstantin Belousov
ce21d4bff1 amd64 efirt: do not flush cache for runtime pages
We actually do not know is it safe or not to flush cache for random
BAR/register page existing in the system.  It is well-known that for
instance LAPICs cannot tolerate cache flush.  As report indicates,
there are more such devices.

This issue typically affects AMD machines which do not report self-snoop,
causing real CLFLUSH invocation on the mapped pages.  Intels do self-snoop,
so this change should be nop for them, and unsafe devices, if any, are
already ignored.

Reported and tested by:	manu
Reviewed by:	alc, markj
Sponsored by:	The FreeBSD Foundation
MFC after:	1 week
Differential revision:	https://reviews.freebsd.org/D32318
2021-10-06 05:53:20 +03:00
Konstantin Belousov
33c17670af amd64: add pmap_page_set_memattr_noflush()
Similar to pmap_page_set_memattr() by setting MD page cache attribute
to the argument.  Unlike pmap_page_set_memattr(), does not flush cache
for the direct mapping of the page.

Reviewed by:	alc, markj
Sponsored by:	The FreeBSD Foundation
MFC after:	1 week
Differential revision:	https://reviews.freebsd.org/D32318
2021-10-06 05:53:12 +03:00
Mitchell Horne
ab4ed843a3 minidump: De-duplicate the progress bar
The implementation of the progress bar is simple, but duplicated for
most minidump implementations. Extract the common bits to kern_dump.c.
Ensure that the bar is reset with each subsequent dump; this was only
done on some platforms previously.

Reviewed by:	markj
MFC after:	2 weeks
Sponsored by:	Juniper Networks, Inc.
Sponsored by:	Klara, Inc.
Differential Revision:	https://reviews.freebsd.org/D31885
2021-09-29 16:42:21 -03:00
Mitchell Horne
31991a5a45 minidump: De-duplicate is_dumpable()
The function is identical in each minidump implementation, so move it to
vm_phys.c. The only slight exception is powerpc where the function was
public, for use in moea64_scan_pmap().

Reviewed by:	kib, markj, imp (earlier version)
MFC after:	2 weeks
Sponsored by:	Juniper Networks, Inc.
Sponsored by:	Klara, Inc.
Differential Revision:	https://reviews.freebsd.org/D31884
2021-09-29 16:41:52 -03:00
Kirk McKusick
1c8d670cb6 Bring the tags and links entries for amd64 up to date.
MFC after:    1 week
Sponsored by: Netflix
2021-09-27 20:04:51 -07:00
Konstantin Belousov
b1e2f063ae amd64 sendsig: fix context corruption
Drop fpstate only after copying out xfpustate from the thread usermode
save area. Otherwise a context switch between get_fpcontext(), which now
returns the pointer directly into user save area, and copyout, would
cause reinit of the save area, loosing user registers.

Reported, reviewed, and tested by:	markj
Sponsored by:	The FreeBSD Foundation
MFC after:	3 days
Differential revision:	https://reviews.freebsd.org/D32159
2021-09-27 20:12:46 +03:00
Mark Johnston
f766826fe3 amd64: Remove proc0_tf, the bootstrap trapframe
It no longer serves any purpose as thread0's td_frame field is now
initialized during fpuinitstate().  No functional change intended.

Reviewed by:	kib
MFC after:	2 weeks
Sponsored by:	The FreeBSD Foundation
Differential Revision:	https://reviews.freebsd.org/D32057
2021-09-25 10:18:52 -04:00
Mark Johnston
ca1e447b10 amd64: Avoid copying td_frame from kernel procs
When creating a new thread, we unconditionally copy td_frame from the
creating thread.  For threads which never return to user mode, this is
unnecessary since td_frame just points to the base of the stack or a
random interrupt frame.

If KASAN is configured this copying may also trigger false positives
since the td_frame region may contain poisoned stack regions.  It was
not noticed before since thread0 used a dummy proc0_tf trapframe, and
kernel procs are generally created by thread0.  Since commit
df8dd6025a, though, we call
cpu_thread_alloc(&thread0) when initializing FPU state, which
reinitializes thread0.td_frame.

Work around the problem by not copying the frame unless the copying
thread came from user mode.  While here, de-duplicate the copying and
remove redundant re(initialization) of td_frame.

Reported by:	syzbot+2ec89312bffbf38d9aec@syzkaller.appspotmail.com
Reviewed by:	kib
Fixes:		df8dd6025a
MFC after:	2 weeks
Sponsored by:	The FreeBSD Foundation
Differential Revision:	https://reviews.freebsd.org/D32057
2021-09-25 10:18:30 -04:00
Konstantin Belousov
e36d0e86e3 Revert "linux32: add a hack to avoid redefining the type of the savefpu tag"
This reverts commit 0f6829488e.
Also it changes the type of md_usr_fpu_save struct mdthread member
to void *, which is what uncovered this trouble.  Now the save area
is untyped, but since it is hidden behind accessors, it is not too
significant.  Since apparently there are consumers affected outside
the tree, this hack is better than one from the reverted revision.

PR:	258678
Reported by:	cy
Reviewed by:	cy, kevans, markj
Sponsored by:	The FreeBSD Foundation
MFC after:	1 week
Differential revision:	https://reviews.freebsd.org/D32060
2021-09-22 23:17:47 +03:00
Konstantin Belousov
cf0ee8738e Drop cloudabi
According to https://github.com/NuxiNL/cloudlibc:
CloudABI is no longer being maintained. It was an awesome experiment,
but it never got enough traction to be sustainable.

There is no reason to keep it in FreeBSD.

Approved by:	ed (private mail)
Reviewed by:	emaste
Sponsored by:	The FreeBSD Foundation
Differential revision:	https://reviews.freebsd.org/D31923
2021-09-22 00:18:44 +03:00
Konstantin Belousov
c2ee4dfd04 ia32_get_fpcontext(): xfpusave can be legitimately NULL
Reported by:	cy
Sponsored by:	The FreeBSD Foundation
MFC after:	1 week
Fixes:	bd9e0f5df6
2021-09-22 00:17:06 +03:00
Mark Johnston
bcdc599dc2 Revert "cpuset(9): Add CPU_FOREACH_IS(SET|CLR) and modify consumers to use it"
This reverts commit 9068f6ea69.

The underlying macro needs to be reworked to avoid problems with control
flow statements.

Reported by:	rlibby
2021-09-21 13:51:42 -04:00
Konstantin Belousov
2e79a21632 amd64: consistently use uprintf() to report weird situations in sigreturn
Reviewed by:	jhb
Tested by:	pho
Sponsored by:	The FreeBSD Foundation
MFC after:	1 week
Differential revision:	https://reviews.freebsd.org/D31954
2021-09-21 20:20:15 +03:00
Konstantin Belousov
bd9e0f5df6 amd64: eliminate td_md.md_fpu_scratch
For signal send, copyout from the user FPU save area directly.

For sigreturn, we are in sleepable context and can do temporal
allocation of the transient save area.  We cannot copying from userspace
directly to user save area because XSAVE state needs to be validated,
also partial copyins can corrupt it.

Requested by:	jhb
Reviewed by:	jhb, markj
Tested by:	pho
Sponsored by:	The FreeBSD Foundation
MFC after:	1 week
Differential revision:	https://reviews.freebsd.org/D31954
2021-09-21 20:20:15 +03:00
Konstantin Belousov
df8dd6025a amd64: stop using top of the thread' kernel stack for FPU user save area
Instead do one more allocation at the thread creation time.  This frees
a lot of space on the stack.

Also do not use alloca() for temporal storage in signal delivery sendsig()
function and signal return syscall sys_sigreturn().  This saves equal
amount of space, again by the cost of one more allocation at the thread
creation time.

A useful experiment now would be to reduce KSTACK_PAGES.

Reviewed by:	jhb, markj
Tested by:	pho
Sponsored by:	The FreeBSD Foundation
MFC after:	1 week
Differential revision:	https://reviews.freebsd.org/D31954
2021-09-21 20:20:15 +03:00
Konstantin Belousov
9151abe323 exec_machdep.c: some style, use ANSI C definition for sys_sigreturn()
Reviewed by:	jhb, markj
Tested by:	pho
Sponsored by:	The FreeBSD Foundation
MFC after:	1 week
Differential revision:	https://reviews.freebsd.org/D31954
2021-09-21 20:20:15 +03:00
Konstantin Belousov
12ca33f44f amd64: move signal handling and register structures manipulations into exec_machdep.c
from machdep.c which is too large pile of unrelated things.
Some ptrace functions are moved from machdep.c to ptrace_machdep.c.

Now machdep.c contains code mostly related to the low level initialization
and regular low level operation of the architecture, while signal MD code
and registers handling is placed in exec_machdep.c.

Reviewed by:	jhb, markj
Discussed with:	jrtc27
Tested by:	pho
Sponsored by:	The FreeBSD Foundation
MFC after:	1 week
Differential revision:	https://reviews.freebsd.org/D31954
2021-09-21 20:20:15 +03:00
Konstantin Belousov
a42d362bb5 amd64: centralize definitions of CS_SECURE and EFL_SECURE
Requested by	markj
Reviewed by:	jhb, markj
Tested by:	pho
Sponsored by:	The FreeBSD Foundation
MFC after:	1 week
Differential revision:	https://reviews.freebsd.org/D31954
2021-09-21 20:20:14 +03:00
Mark Johnston
9068f6ea69 cpuset(9): Add CPU_FOREACH_IS(SET|CLR) and modify consumers to use it
This implementation is faster and doesn't modify the cpuset, so it lets
us avoid some unnecessary copying as well.  No functional change
intended.

Reviewed by:	cem, kib, jhb
MFC after:	2 weeks
Sponsored by:	The FreeBSD Foundation
Differential Revision:	https://reviews.freebsd.org/D32029
2021-09-21 12:07:47 -04:00
Konstantin Belousov
f575573ca5 Remove PT_GET_SC_ARGS_ALL
Reimplement bdf0f24bb1 by checking for the caller' ABI in
the implementation of PT_GET_SC_ARGS, and copying out everything if
it is Linuxolator.

Also fix a minor information leak: if PT_GET_SC_ARGS_ALL is done on the
thread reused after other process, it allows to read some number of that
thread last syscall arguments. Clear td_sa.args in thread_alloc().

Reviewed by:	jhb
Sponsored by:	The FreeBSD Foundation
Differential revision:	https://reviews.freebsd.org/D31968
2021-09-16 20:11:27 +03:00
Edward Tomasz Napierala
bdf0f24bb1 linux: implement PTRACE_GET_SYSCALL_INFO
This is one of the pieces required to make modern (ie Focal)
strace(1) work.

Reviewed By:	jhb (earlier version)
Sponsored by:	EPSRC
Differential Revision:	https://reviews.freebsd.org/D28212
2021-09-14 20:19:55 +00:00
Konstantin Belousov
1c56781cc9 amd64 wakeup: rework trampoline page allocation
There is no need to restrict trampoline page table to low 1M, it
should work with any pages below 4G.  Only wakeup code itself should
be below 1M.

Do not waste level 5 page when LA48 mode is used.

Reviewed by:	markj
Sponsored by:	The FreeBSD Foundation
MFC after:	1 week
Differential revision:	https://reviews.freebsd.org/D31931
2021-09-14 00:23:15 +03:00
Konstantin Belousov
2b6eec531a x86: duplicate acpi_wakeup.c per i386 and amd64
The file as is is the maze of #ifdef passages, all slightly different.
Divorcing i386 and amd64 version actually makes changing the code
easier, also no changes for i386 are planned.

Reviewed by:	markj
Sponsored by:	The FreeBSD Foundation
MFC after:	1 week
Differential revision:	https://reviews.freebsd.org/D31931
2021-09-14 00:23:14 +03:00
Alexander Motin
7af4475a6e vmd(4): Major driver refactoring
- Re-implement pcib interface to use standard pci bus driver on top of
vmd(4) instead of custom one.
 - Re-implement memory/bus resource allocation to properly handle even
complicated configurations.
 - Re-implement interrupt handling to evenly distribute children's MSI/
MSI-X interrupts between available vmd(4) MSI-X vectors and setup them
to be handled by standard OS mechanisms with minimal overhead, except
sharing when unavoidable.

Successfully tested on Dell XPS 13 laptop with Core i7-1185G7 CPU (VMD
device ID 0x9a0b) and single NVMe SSD, dual-booting with Windows 10.

Successfully tested on Supermicro X11DPI-NT motherboard with Xeon(R)
Gold 6242R CPUs (VMD device ID 0x201d), simultaneously handling NVMe
SSD on one PCIe port and PLX bridge with 3 NVMe and 1 AHCI SSDs on
another.  Handles SSD hot-plug (except Optane 905p for some reason,
which are not detected until manual bus rescan) and enabled IOMMU
(directly connected SSDs work, but ones connected to the PLX fail
without errors from IOMMU).

MFC after:	2 weeks
Sponsored by:	iXsystems, Inc.
Differential revision:	https://reviews.freebsd.org/D31762
2021-09-02 20:58:02 -04:00
Konstantin Belousov
9939af1a16 amd64: correctly calculate KVA of the preloaded ucode blob
when kernphys != 2M

Reported and tested by:	kbowling
Sponsored by:	The FreeBSD Foundation
MFC after:	3 days
2021-08-31 04:46:12 +03:00
Andrew Turner
b792434150 Create sys/reg.h for the common code previously in machine/reg.h
Move the common kernel function signatures from machine/reg.h to a new
sys/reg.h. This is in preperation for adding PT_GETREGSET to ptrace(2).

Reviewed by:	imp, markj
Sponsored by:	DARPA, AFRL (original work)
Sponsored by:	The FreeBSD Foundation
Differential Revision:	https://reviews.freebsd.org/D19830
2021-08-30 12:50:53 +01:00
Konstantin Belousov
7aa47cace1 amd64: remove lfence after swapgs on syscall entry
According to the description of SBSS issue at
https://software.intel.com/content/www/us/en/develop/articles/software-security-guidance/technical-documentation/speculative-behavior-swapgs-and-segment-registers.html
lfence after swapgs is needed only for the case when swapgs could be
speculatively executed.  Since syscall entry, unlike exception and
interrupt entries, executes swapgs unconditionally, there is no
opportunity for speculation.

Reviewed by:	markj
Sponsored by:	The FreeBSD Foundation
MFC after:	1 week
Differential revision:	https://reviews.freebsd.org/D31682
2021-08-26 19:09:21 +03:00
Cyril Zhang
a85404906b vmm: Add credential to cdev object
Add a credential to the cdev object in sysctl_vmm_create(), then check
that we have the correct credentials in sysctl_vmm_destroy(). This
prevents a process in one jail from opening or destroying the /dev/vmm
file corresponding to a VM in a sibling jail.

Add regression tests.

Reviewed by:	jhb, markj
MFC after:	2 weeks
Sponsored by:	The FreeBSD Foundation
Differential Revision:	https://reviews.freebsd.org/D31156
2021-08-18 13:41:33 -04:00
Adam Fenn
6c69c6bb4c kvm_clock: KVM paravirtual clock support
Add support for the KVM paravirtual clock device.

Sponsored by:	Juniper Networks, Inc.
Sponsored by:	Klara, Inc.
Reviewed by:	kib
Differential Revision:	https://reviews.freebsd.org/D29733
2021-08-14 15:57:54 +03:00
Adam Fenn
652ae7b114 x86: cpufunc: Add rdtsc_ordered()
Add a variant of 'rdtsc()' that performs the ordered version of 'rdtsc'
appropriate for the invoking x86 variant.

Also, expose the 'lfence'-ed and 'mfence'-ed 'rdtsc()' variants needed
by 'rdtsc_ordered()' for general use.

Sponsored by:	Juniper Networks, Inc.
Sponsored by:	Klara, Inc.
Reviewed by:	kib
Differential Revision:	https://reviews.freebsd.org/D31416
2021-08-14 15:57:53 +03:00
Adam Fenn
908e277230 x86: cpufunc: Add rdtscp_aux()
Add a variant of 'rdtscp()' that retains and returns the 'IA32_TSC_AUX'
value read by 'rdtscp'.

Sponsored By:	Juniper Networks, Inc.
Sponsored By:	Klara, Inc.
Reviewed by:	markj, kib
Differential Revision:	https://reviews.freebsd.org/D31415
2021-08-14 15:57:53 +03:00
NagaChaitanya Vellanki
2a9b4076dc
Merge common parts of i386 and amd64's ieeefp.h into x86/x86_ieeefp.h
MFC after:	1 week
Reviewed by:	imp
Differential Revision:	https://reviews.freebsd.org/D26292
2021-08-12 18:45:22 +08:00
Dmitry Chagin
bed2ac27a1 linux(4): Remove the unnecessary spaces.
MFC after:		2 weeks
2021-08-12 11:58:33 +03:00
Dmitry Chagin
b356030e67 linux(4): Regen for clone3 system call.
MFC after:		2 weeks
2021-08-12 11:50:22 +03:00
Dmitry Chagin
17913b0b6b linux(4): Implement clone3 system call.
clone3 system call is used by glibc-2.34.

Differential revision:	https://reviews.freebsd.org/D31475
MFC after:		2 weeks
2021-08-12 11:49:36 +03:00
Dmitry Chagin
0a4b664ae8 linux(4): Add struct clone_args for future clone3 system call.
In preparation for clone3 system call add struct clone_args and use it in
clone implementation.
Move all of clone related bits to the newly created linux_fork.h header.

Differential revision:	https://reviews.freebsd.org/D31474
MFC after:		2 weeks
2021-08-12 11:49:01 +03:00
Dmitry Chagin
0c08f34f4d linux(4): Regen for clone syscall.
MFC after:		2 weeks
2021-08-12 11:47:31 +03:00
Dmitry Chagin
f1c450492f linux(4): Change clone syscall definition to match Linux actual one.
Differential revision:	https://reviews.freebsd.org/D31473
MFC after:		2 weeks
2021-08-12 11:46:36 +03:00
Dmitry Chagin
de8374df28 fork: Allow ABI to specify fork return values for child.
At least Linux x86 ABI's does not use carry bit and expects that the dx register
is preserved. For this add a new sv_set_fork_retval hook and call it from cpu_fork().

Add a short comment about touching dx in x86_set_fork_retval(), for more details
see phab comments from kib@ and imp@.

Reviewed by:		kib
Differential revision:	https://reviews.freebsd.org/D31472
MFC after:		2 weeks
2021-08-12 11:45:25 +03:00
Dmitry Chagin
bee191e46f linux(4): Regen for faccessat2 system call.
MFC after:		2 weeks
2021-08-12 11:41:35 +03:00
Dmitry Chagin
13d79be995 linux(4): Implement faccessat2 system call.
It's used by bash on arm64 with glibc-2.32.

Reviewed by:		trasz
Differential Revision:	https://reviews.freebsd.org/D31345
MFC after:		2 weeks
2021-08-12 11:40:42 +03:00
Ka Ho Ng
179bc5729d vmm: Fix wrong assert in ivhd_dev_add_entry
The correct condition is to check the number of ivhd entries fit into
the array.

Reported by:	bz
Sponsored by:	The FreeBSD Foundation
MFC after:	3 days
Differential Revision:	https://reviews.freebsd.org/D31514
2021-08-12 15:56:16 +08:00
Mark Johnston
1263022e39 Add a GENERIC-KMSAN kernel configuration for amd64
Sponsored by:	The FreeBSD Foundation
2021-08-10 21:27:53 -04:00
Mark Johnston
b0f71f1bc5 amd64: Add MD bits for KMSAN
Interrupt and exception handlers must call kmsan_intr_enter() prior to
calling any C code.  This is because the KMSAN runtime maintains some
TLS in order to track initialization state of function parameters and
return values across function calls.  Then, to ensure that this state is
kept consistent in the face of asynchronous kernel-mode excpeptions, the
runtime uses a stack of TLS blocks, and kmsan_intr_enter() and
kmsan_intr_leave() push and pop that stack, respectively.

Use these functions in amd64 interrupt and exception handlers.  Note
that handlers for user->kernel transitions need not be annotated.

Also ensure that trap frames pushed by the CPU and by handlers are
marked as initialized before they are used.

Reviewed by:	kib
Sponsored by:	The FreeBSD Foundation
Differential Revision:	https://reviews.freebsd.org/D31467
2021-08-10 21:27:53 -04:00
Mark Johnston
8978608832 amd64: Populate the KMSAN shadow maps and integrate with the VM
- During boot, allocate PDP pages for the shadow maps.  The region above
  KERNBASE is currently not shadowed.
- Create a dummy shadow for the vm page array.  For now, this array is
  not protected by the shadow map to help reduce kernel memory usage.
- Grow shadows when growing the kernel map.
- Increase the default kernel stack size when KMSAN is enabled.  As with
  KASAN, sanitizer instrumentation appears to create stack frames large
  enough that the default value is not sufficient.
- Disable UMA's use of the direct map when KMSAN is configured.  KMSAN
  cannot validate the direct map.
- Disable unmapped I/O when KMSAN configured.
- Lower the limit on paging buffers when KMSAN is configured.  Each
  buffer has a static MAXPHYS-sized allocation of KVA, which in turn
  eats 2*MAXPHYS of space in the shadow map.

Reviewed by:	alc, kib
Sponsored by:	The FreeBSD Foundation
Differential Revision:	https://reviews.freebsd.org/D31295
2021-08-10 21:27:53 -04:00
Mark Johnston
a422084abb Add the KMSAN runtime
KMSAN enables the use of LLVM's MemorySanitizer in the kernel.  This
enables precise detection of uses of uninitialized memory.  As with
KASAN, this feature has substantial runtime overhead and is intended to
be used as part of some automated testing regime.

The runtime maintains a pair of shadow maps.  One is used to track the
state of memory in the kernel map at bit-granularity: a bit in the
kernel map is initialized when the corresponding shadow bit is clear,
and is uninitialized otherwise.  The second shadow map stores
information about the origin of uninitialized regions of the kernel map,
simplifying debugging.

KMSAN relies on being able to intercept certain functions which cannot
be instrumented by the compiler.  KMSAN thus implements interceptors
which manually update shadow state and in some cases explicitly check
for uninitialized bytes.  For instance, all calls to copyout() are
subject to such checks.

The runtime exports several functions which can be used to verify the
shadow map for a given buffer.  Helpers provide the same functionality
for a few structures commonly used for I/O, such as CAM CCBs, BIOs and
mbufs.  These are handy when debugging a KMSAN report whose
proximate and root causes are far away from each other.

Obtained from:	NetBSD
Sponsored by:	The FreeBSD Foundation
2021-08-10 21:27:53 -04:00
Mark Johnston
f95f780ea4 amd64: Define KVA regions for KMSAN shadow maps
KMSAN requires two shadow maps, each one-to-one with the kernel map.
Allocate regions of the kernels PML4 page for them.  Add functions to
create mappings in the shadow map regions, these will be used by the
KMSAN runtime.

Reviewed by:	alc, kib
Sponsored by:	The FreeBSD Foundation
Differential Revision:	https://reviews.freebsd.org/D31295
2021-08-10 21:27:52 -04:00
Mark Johnston
4fd450a87d amd64 pmap: Pre-set PG_M on 2MB KASAN shadow map entries
Also remove a redundant assertion in pmap_kasan_enter().

Reviewed by:	alc, kib
Sponsored by:	The FreeBSD Foundation
Differential Revision:	https://reviews.freebsd.org/D31295
2021-08-10 21:22:07 -04:00
Mark Johnston
41335c6b7f vmm: Make iommu ops tables const
While here, use designated initializers and rename some AMD iommu method
implementations to match the corresponding op names.  No functional
change intended.

Reviewed by:	grehan
MFC after:	1 week
Sponsored by:	The FreeBSD Foundation
Differential Revision:	https://reviews.freebsd.org/D31462
2021-08-09 13:28:27 -04:00
Mark Johnston
e54ae8258d amd64: Fix output operand specs for the stmxcsr and vmread intrinsics
This does not appear to affect code generation, at least with the
default toolchain.

Noticed because incorrect output specifications lead to false positives
from KMSAN, as the instrumentation uses them to update shadow state for
output operands.

Reviewed by:	kib
MFC after:	1 week
Sponsored by:	The FreeBSD Foundation
Differential Revision:	https://reviews.freebsd.org/D31466
2021-08-09 13:28:08 -04:00
Ed Maste
9feff969a0 Remove "All Rights Reserved" from FreeBSD Foundation sys/ copyrights
These ones were unambiguous cases where the Foundation was the only
listed copyright holder (in the associated license block).

Sponsored by:	The FreeBSD Foundation
2021-08-08 10:42:24 -04:00
Ka Ho Ng
df95cc76af vmm: Bump vmname buffer in struct vm to VM_MAX_NAMELEN + 1
In hw.vmm.create sysctl handler the maximum length of vm name is
VM_MAX_NAMELEN. However in vm_create() the maximum length allowed is
only VM_MAX_NAMELEN - 1 chars. Bump the length of the internal buffer to
allow the length of VM_MAX_NAMELEN for vm name.

MFC after:	3 days
Reviewed by:	grehan
Sponsored by:	The FreeBSD Foundation
Differential Revision:	https://reviews.freebsd.org/D31372
2021-08-02 17:55:08 +08:00
Roger Pau Monné
82bf6a2566 xen/timer: fix amd64 LINT kernel build
On amd64 XENHVM depends on the xentimer device for PVH early startup,
so both should be added or removed together (like the current
dependency with xenpci). Fix this by adding xentimer to NOTES and
updating the comments on the config files. Note that on i386 there's
no such dependency between xentimer and XENHVM, since there's no PVH
support.

While there also fix the MINIMAL i386 build to include the xentimer,
so it keeps the same functionality as before xentimer was split from
XENHVM.

Reported by: lwhsu
PR: 257549
Fixes: ae59812748 ('xen/timer: make xen timer optional')
2021-08-02 10:33:35 +02:00
Konstantin Belousov
665895db26 amd64 pmap_vm_page_alloc_check(): loose the assert
Current expression checks that vm_page_alloc(9) never returns a page
belonging to the preload area.  This is not true if something was freed
from there, for instance a preloaded module was unloaded, or ucode update
freed.

Only check that we never allow to allocate a page belonging to the kernel
proper, check against _end.

Reported and tested by:	dhw
Sponsored by:	The FreeBSD Foundation
MFC after:	1 week
2021-08-02 03:28:33 +03:00
Konstantin Belousov
1a55a3a729 amd64 pmap_vm_page_alloc_check(): print more data for failed assert
Sponsored by:	The FreeBSD Foundation
MFC after:	1 week
2021-08-01 16:42:02 +03:00
Konstantin Belousov
041b7317f7 Add pmap_vm_page_alloc_check()
which is the place to put MD asserts about allocated pages.

On amd64, verify that allocated page does not belong to the kernel
(text, data) or early allocated pages.

Reviewed by:	markj
Sponsored by:	The FreeBSD Foundation
MFC after:	1 week
Differential revision:	https://reviews.freebsd.org/D31121
2021-07-31 16:53:42 +03:00
Konstantin Belousov
e18380e341 amd64: do not assume that kernel is loaded at 2M physical
Allow any 2M aligned contiguous location below 4G for the staging
area location.  It should still be mapped by loader at KERNBASE.

The assumption kernel makes about loader->kernel handoff with regard to
the MMU programming are explicitly listed at the beginning of hammer_time(),
where kernphys is calculated.  Now kernphys is the variable instead of
symbol designating the physical address.

Reviewed by:	markj
Sponsored by:	The FreeBSD Foundation
MFC after:	1 week
Differential revision:	https://reviews.freebsd.org/D31121
2021-07-31 16:53:42 +03:00
Mark Johnston
a90d053b84 Simplify kernel sanitizer interceptors
KASAN and KCSAN implement interceptors for various primitive operations
that are not instrumented by the compiler.  KMSAN requires them as well.
Rather than adding new cases for each sanitizer which requires
interceptors, implement the following protocol:
- When interceptor definitions are required, define
  SAN_NEEDS_INTERCEPTORS and SANITIZER_INTERCEPTOR_PREFIX.
- In headers that declare functions which need to be intercepted by a
  sanitizer runtime, use SANITIZER_INTERCEPTOR_PREFIX to provide
  declarations.
- When SAN_RUNTIME is defined, do not redefine the names of intercepted
  functions.  This is typically the case in files which implement
  sanitizer runtimes but is also needed in, for example, files which
  define ifunc selectors for intercepted operations.

MFC after:	2 weeks
Sponsored by:	The FreeBSD Foundation
2021-07-29 21:13:32 -04:00
Konstantin Belousov
2572376f7f amd64: do not touch low memory in AP startup unless we used legacy boot
This fixes several ommisions in 48216088b1

Reviewed by:	markj
Sponsored by:	The FreeBSD Foundation
MFC after:	2 weeks
Differential revision:	https://reviews.freebsd.org/D31343
2021-07-30 01:20:45 +03:00
Konstantin Belousov
b27fe1c3ba amd64: stop doing special allocation for the AP startup trampoline
There is no reason now why do we need to allocate trampoline page very
early in the boot process.  The only requirement for the page is that
it is below 1M to be usable by the real mode during init.  This can be
handled by vm_alloc_contig() when we do the startup.

Also assert that startup trampoline fits into single page.  In principle
we can do multi-page allocation if needed, but it is not.

Move the alloc_ap_trampoline() function and the boot_address variable to
i386/mp_machdep.c.  Keep existing mechanism of early alloc on i386.

Reviewed by:	markj
Sponsored by:	The FreeBSD Foundation
MFC after:	2 weeks
Differential revision:	https://reviews.freebsd.org/D31343
2021-07-30 01:20:45 +03:00
Mark Johnston
4b136ef259 amd64: Set GS.base before calling init_secondary() on APs
KMSAN instrumentation requires thread-local storage to track
initialization state for function parameters and return values.  This
buffer is accessed as part of each function prologue.  It is provided by
the KMSAN runtime, which looks up a pointer in the current thread's
structure.

When KMSAN is configured, init_secondary() is instrumented, but this
means that GS.base must be initialized first, otherwise the runtime
cannot safely access curthread.  Work around this by loading GS.base
before calling init_secondary(), so that the runtime can at least check
curthread == NULL and return a pointer to some dummy storage.  Note that
init_secondary() still must reload GS.base after calling lgdt(), which
loads a selector into %gs, which in turn clears the base register.

Reviewed by:	kib
MFC after:	2 weeks
Sponsored by:	The FreeBSD Foundation
Differential Revision:	https://reviews.freebsd.org/D31336
2021-07-29 10:22:37 -04:00
Mark Johnston
e153745083 amd64: Set MSR_KGSBASE to 0 during AP startup
There is no reason to initialize it to anything else, and this matches
initialization of the BSP.  No functional change intended.

Reviewed by:	kib
MFC after:	2 weeks
Sponsored by:	The FreeBSD Foundation
Differential Revision:	https://reviews.freebsd.org/D31336
2021-07-29 10:14:05 -04:00
Dmitry Chagin
f337940144 linux(4): Fix gcc buld.
gcc failed as it didn't inlined the builtins and generates calls to
the libgcc, ld can't find libgcc as cross-toolchain libgcc is not installed.
To avoid this add internal vDSO ffs functions without optimized builtins.

Reported by:		jhb
MFC after:		2 weeks
2021-07-29 09:52:33 +03:00
Julien Grall
ae59812748 xen/timer: make xen timer optional
The timer is not used on ARM.

Submitted by: Elliott Mitchell <ehem+freebsd@m5p.com>
Reviewed by: royger
Differential Revision: https://reviews.freebsd.org/D29041
2021-07-28 17:27:03 +02:00
Konstantin Belousov
d6717f8778 amd64: rework AP startup
Stop using temporal page table with 1:1 mapping of low 1G populated over
the whole VA.  Use 1:1 mapping of low 4G temporarily installed in the
normal kernel page table.

The features are:
- now there is one less step for startup asm to perform
- the startup code still needs to be at lower 1G because CPU starts in
  real mode. But everything else can be located anywhere in low 4G
  because it is accessed by non-paged 32bit protected mode.  Note that
  kernel page table root page is at low 4G, as well as the kernel itself.
- the page table pages can be allocated by normal allocator, there is
  no need to carve them from the phys_avail segments at very early time.
  The allocation of the page for startup code still requires some magic.
  Pages are freed after APs are ignited.
- la57 startup for APs is less tricky, we directly load the final page
  table and do not need to tweak the paging mode.

Reviewed by:	markj
Tested by:	pho
Sponsored by:	The FreeBSD Foundation
MFC after:	1 week
Differential revision:	https://reviews.freebsd.org/D31121
2021-07-27 20:11:15 +03:00
Edward Tomasz Napierala
30c6d98219 linux: implement sigaltstack(2) on arm64
... by making it machine-independent.

Reviewed By:	dchagin
Sponsored By:	EPSRC
Differential Revision:	https://reviews.freebsd.org/D31286
2021-07-27 13:34:49 +00:00
Edward Tomasz Napierala
b54838003c linux: fix sigaltstack on amd64
To determine whether to use alternate signal stack or not,
we need to use the native signal number, not the one translated
with bsd_to_linux_signal().

In practical terms, this fixes golang.

Reviewed By:	dchagin
Fixes:		135dd0cab5
Sponsored By:	EPSRC
Differential Revision:	https://reviews.freebsd.org/D31298
2021-07-26 11:57:52 +01:00
Alan Cox
3687797618 amd64: Don't repeat unnecessary tests when cmpset fails
When a cmpset for removing the PG_RW bit in pmap_promote_pde() fails,
there is no need to repeat the alignment, PG_A, and PG_V tests just to
reload the PTE's value.  The only bit that we need be concerned with at
this point is PG_M.  Use fcmpset instead.

MFC after:	1 week
2021-07-24 13:06:47 -05:00
Konstantin Belousov
48216088b1 amd64: do not touch low memory in AP startup unless we used legacy boot
Reviewed by:	markj
Sponsored by:	The FreeBSD Foundation
MFC after:	1 week
Differential revision:	https://reviews.freebsd.org/D31121
2021-07-24 18:52:45 +03:00
Konstantin Belousov
6a3821369f amd64: make efi_boot global
Reviewed by:	markj
Sponsored by:	The FreeBSD Foundation
MFC after:	1 week
Differential revision:	https://reviews.freebsd.org/D31121
2021-07-24 18:52:44 +03:00
Konstantin Belousov
c8bae074d9 amd64: add pmap_alloc_page_below_4g()
Suggested and reviewed by:	markj
Sponsored by:	The FreeBSD Foundation
MFC after:	1 week
Differential revision:	https://reviews.freebsd.org/D31121
2021-07-24 18:52:44 +03:00
Konstantin Belousov
34516d4ad1 amd64 pti init: fix calculation of the kernel text start
Old expression happens to provide the correct answer, but assumes that
kernel is loaded at physical address zero, with 2M gap.  Do not use
kernphys to calculate KVA of kernel text start, just explicitly write
out KERNBASE and the hole size.

Reviewed by:	markj
Sponsored by:	The FreeBSD Foundation
MFC after:	1 week
Differential revision:	https://reviews.freebsd.org/D31121
2021-07-24 18:52:44 +03:00
Edward Tomasz Napierala
72f7ddb587 linux: implement rt_sigsuspend(2) on arm64
... by making it architecture-independent.

Reviewed By:	dchagin
Sponsored By:	EPSRC
Differential Revision:	https://reviews.freebsd.org/D31259
2021-07-23 20:13:00 +00:00
Alan Cox
b7de535288 amd64: Eliminate a redundant test from pmap_enter_object()
The call to pmap_allow_2m_x_page() in pmap_enter_object() is redundant.
Specifically, even without the call to pmap_allow_2m_x_page() in
pmap_enter_object(), pmap_allow_2m_x_page() is eventually called by
pmap_enter_pde(), so the outcome will be the same.  Essentially,
calling pmap_allow_2m_x_page() in pmap_enter_object() amounts to
"optimizing" for the unexpected case.

Reviewed by:	kib
MFC after:	1 week
2021-07-23 23:15:42 -05:00
Mark Johnston
f95e683fa2 Annotate amd64 stack unwinders with __nomemorysanitize
Sponsored by:	The FreeBSD Foundation
2021-07-23 10:47:13 -04:00
Dmitry Chagin
cf8d74e3fe linux(4): Allow musl brand to use FUTEX_REQUEUE op.
Initial patch from submitter was adapted by me to prevent unconditional
FUTEX_REQUEUE use.

PR:			255947
Submitted by:		Philippe Michaud-Boudreault
Differential Revision:	https://reviews.freebsd.org/D30332
2021-07-20 14:39:20 +03:00
Dmitry Chagin
1ca6b15bbd Drop "All rights reserved" from my copyright statements.
Add email and fixup years while here.

Reviewed by:		imp
Differential Revision:	https://reviews.freebsd.org/D30912
MFC after:		2 weeks
2021-07-20 10:05:50 +03:00
Dmitry Chagin
ae8330b448 linux(4): Add arch name to the some printfs.
Reviewed by:		emaste
Differential revision:	https://reviews.freebsd.org/D30904
MFC after:		2 weeks
2021-07-20 10:05:08 +03:00
Dmitry Chagin
09cffde975 linux(4): Fixup the vDSO initialization order.
The vDSO initialisation order should be as follows:
- native abi init via exec_sysvec_init();
- vDSO symbols queued to the linux_vdso_syms list;
- linux_vdso_install();
- linux_exec_sysvec_init();

As the exec_sysvec_init() called with SI_ORDER_ANY (last) at SI_SUB_EXEC
order, move linux_vdso_install() and linux_exec_sysvec_init() to the
SI_SUB_EXEC+1 order.

Reviewed by:		trasz
Differential Revision:	https://reviews.freebsd.org/D30902
MFC after		2 weeks
2021-07-20 10:02:34 +03:00
Dmitry Chagin
a543556c81 linux(4): Constify vdso install/deinstall.
In order to reduce diff between arches constify vdso install/deinstall
functions like arm64.

Reviewed by:		emaste
Differential revision:	https://reviews.freebsd.org/D30901
MFC after:		2 weeks
2021-07-20 10:01:47 +03:00
Dmitry Chagin
9931033bbf linux(4); Almost complete the vDSO.
The vDSO (virtual dynamic shared object) is a small shared library that the
kernel maps R/O into the address space of all Linux processes on image
activation. The vDSO is a fully formed ELF image, shared by all processes
with the same ABI, has no process private data.

The primary purpose of the vDSO:
- non-executable stack, signal trampolines not copied to the stack;
- signal trampolines unwind, mandatory for the NPTL;
- to avoid contex-switch overhead frequently used system calls can be
  implemented in the vDSO: for now gettimeofday, clock_gettime.

The first two have been implemented, so add the implementation of system
calls.

System calls implemenation based on a native timekeeping code with some
limitations:
- ifunc can't be used, as vDSO r/o mapped to the process VA and rtld
  can't relocate symbols;
- reading HPET memory is not implemented for now (TODO).

In case on any error vDSO system calls fallback to the kernel system
calls. For unimplemented vDSO system calls added prototypes which call
corresponding kernel system call.

Tested by:		trasz (arm64)
Differential revision:  https://reviews.freebsd.org/D30900
MFC after:              2 weeks
2021-07-20 10:01:18 +03:00
Dmitry Chagin
5fd9cd53d2 linux(4): Modify sv_onexec hook to return an error.
Temporary add stubs to the Linux emulation layer which calls the existing hook.

Reviewed by:		kib
Differential Revision:	https://reviews.freebsd.org/D30911
MFC after:		2 weeks
2021-07-20 09:56:25 +03:00
Dmitry Chagin
815165be20 linux(4): Remove function prototypes from the vDSO.
In preparation for vDSO code revision get rid of incomplete vDSO methods
from locore, but leave .note.Linux section commented out.
.note.Linux section is used by glibc rtld to get the kernel version, that
saves one system call call. I'll try to implement it later, if figure out
how to use it with jails.

MFC after:	2 weeks
2021-07-20 09:52:08 +03:00
Jessica Clarke
439097486b acpi: Fix a repeated comment typo 2021-07-19 17:19:23 +01:00
Jessica Clarke
52fba9a943 acpi: Fix a repeated vm_offset_t that should be a vm_size_t
The underlying types for both are the same so arguably this doesn't
really matter, but using the wrong type is still confusing and
technically incorrect.
2021-07-19 17:19:23 +01:00
David Chisnall
cf98bc28d3 Pass the syscall number to capsicum permission-denied signals
The syscall number is stored in the same register as the syscall return
on amd64 (and possibly other architectures) and so it is impossible to
recover in the signal handler after the call has returned.  This small
tweak delivers it in the `si_value` field of the signal, which is
sufficient to catch capability violations and emulate them with a call
to a more-privileged process in the signal handler.

This reapplies 3a522ba1bc with a fix for
the static assertion failure on i386.

Approved by:	markj (mentor)

Reviewed by:	kib, bcr (manpages)

Differential Revision: https://reviews.freebsd.org/D29185
2021-07-16 18:06:44 +01:00
Alan Cox
325ff93274 Clear the accessed bit when copying a managed superpage mapping
pmap_copy() is used to speculatively create mappings, so those mappings
should not have their access bit preset.

Reviewed by:	kib, markj
MFC after:	1 week
Differential Revision:	https://reviews.freebsd.org/D31162
2021-07-14 13:06:10 -05:00
Warner Losh
c0c703342d pccard: remove pccard device from all kernels
All the PC Card drivers have been removed from the tree. Remove the
pccard drivers from all the kernels.

Sponsored by:		Netflix
2021-07-13 20:39:31 -06:00
Alan Cox
d411b285bc pmap: Micro-optimize pmap_remove_pages() on amd64 and arm64
Reduce the live ranges for three variables so that they do not span the
call to PHYS_TO_VM_PAGE().  This enables the compiler to generate
slightly smaller machine code.

Reviewed by:	kib, markj
MFC after:	1 week
Differential Revision:	https://reviews.freebsd.org/D31161
2021-07-13 17:33:23 -05:00
Ka Ho Ng
b5c74dfd64 vmm: Fix AMD-vi using wrong rid range
The ACPI parsing code around rid range was wrong on assuming there is
only one pair of start/end device id range. Besides, ivhd_dev_parse()
never work as supposed. The start/end rid info was always zero.

Restructure the code to build dynamic-sized tables for each IOMMU softc
holding device entries. The device entries are enumerated to find a
suitable IOMMU unit. Operations on devices not governed (e.g. the IOMMU
unit itself) are no-op from now on. There are also a minor fix on wrong
%b formatting string usage.

Tested on my EPYC 7282.

Sponsored by:	The FreeBSD Foundation
Reviewed by:	grehan
Differential Revision:	https://reviews.freebsd.org/D30827
2021-07-14 01:53:10 +08:00
Peter Grehan
517904de5c igc(4): Introduce new driver for the Intel I225 Ethernet controller.
This controller supports 2.5G/1G/100MB/10MB speeds, and allows
tx/rx checksum offload, TSO, LRO, and multi-queue operation.

The driver was derived from code contributed by Intel, and modified
by Netgate to fit into the iflib framework.

Thanks to Mike Karels for testing and feedback on the driver.

Reviewed by:	bcr (manpages), kbowling, scottl, erj
MFC after:	1 month
Relnotes:	yes
Sponsored by:	Rubicon Communications, LLC ("Netgate")
Differential Revision:	https://reviews.freebsd.org/D30668
2021-07-12 14:57:18 +10:00
Helge Oldach
b21f19c9e0 MINIMAL: remove debugging and some loadable network modules
Remove deugging stuff, since it's arguably not needed in a minimal
setup. Also vlan, tuntap and gif since they can be loaded.

imp didn't include the part of the patch that removed xen guest support.
Xen guest is relatively small and has no way of being loaded.

Reviewed by:	imp
PR:		229564
MFC After:	3 days
2021-07-11 10:35:42 -06:00
David Chisnall
d2b558281a Revert "Pass the syscall number to capsicum permission-denied signals"
This broke the i386 build.

This reverts commit 3a522ba1bc.
2021-07-10 20:26:01 +01:00
David Chisnall
3a522ba1bc Pass the syscall number to capsicum permission-denied signals
The syscall number is stored in the same register as the syscall return
on amd64 (and possibly other architectures) and so it is impossible to
recover in the signal handler after the call has returned.  This small
tweak delivers it in the `si_value` field of the signal, which is
sufficient to catch capability violations and emulate them with a call
to a more-privileged process in the signal handler.

Approved by:	markj (mentor)

Reviewed by:	kib, bcr (manpages)

Differential Revision: https://reviews.freebsd.org/D29185
2021-07-10 17:19:52 +01:00
Konstantin Belousov
fdc71fa112 amd64 pmap: unexpand the NBPDR macro definition
Sponsored by:	The FreeBSD Foundation
MFC after:	1 week
2021-07-10 14:46:54 +03:00
Konstantin Belousov
71463a34ab amd64 mpboot.S: fix typo in comment
Sponsored by:	The FreeBSD Foundation
MFC after:	1 week
2021-07-10 14:46:54 +03:00
Konstantin Belousov
63664df720 amd64 locore.S: add FF copyright for LA57 work
Sponsored by:	The FreeBSD Foundation
MFC after:	1 week
2021-07-10 14:46:53 +03:00