Commit Graph

8728 Commits

Author SHA1 Message Date
Jessica Clarke
52fba9a943 acpi: Fix a repeated vm_offset_t that should be a vm_size_t
The underlying types for both are the same so arguably this doesn't
really matter, but using the wrong type is still confusing and
technically incorrect.
2021-07-19 17:19:23 +01:00
David Chisnall
cf98bc28d3 Pass the syscall number to capsicum permission-denied signals
The syscall number is stored in the same register as the syscall return
on amd64 (and possibly other architectures) and so it is impossible to
recover in the signal handler after the call has returned.  This small
tweak delivers it in the `si_value` field of the signal, which is
sufficient to catch capability violations and emulate them with a call
to a more-privileged process in the signal handler.

This reapplies 3a522ba1bc with a fix for
the static assertion failure on i386.

Approved by:	markj (mentor)

Reviewed by:	kib, bcr (manpages)

Differential Revision: https://reviews.freebsd.org/D29185
2021-07-16 18:06:44 +01:00
Alan Cox
325ff93274 Clear the accessed bit when copying a managed superpage mapping
pmap_copy() is used to speculatively create mappings, so those mappings
should not have their access bit preset.

Reviewed by:	kib, markj
MFC after:	1 week
Differential Revision:	https://reviews.freebsd.org/D31162
2021-07-14 13:06:10 -05:00
Warner Losh
c0c703342d pccard: remove pccard device from all kernels
All the PC Card drivers have been removed from the tree. Remove the
pccard drivers from all the kernels.

Sponsored by:		Netflix
2021-07-13 20:39:31 -06:00
Alan Cox
d411b285bc pmap: Micro-optimize pmap_remove_pages() on amd64 and arm64
Reduce the live ranges for three variables so that they do not span the
call to PHYS_TO_VM_PAGE().  This enables the compiler to generate
slightly smaller machine code.

Reviewed by:	kib, markj
MFC after:	1 week
Differential Revision:	https://reviews.freebsd.org/D31161
2021-07-13 17:33:23 -05:00
Ka Ho Ng
b5c74dfd64 vmm: Fix AMD-vi using wrong rid range
The ACPI parsing code around rid range was wrong on assuming there is
only one pair of start/end device id range. Besides, ivhd_dev_parse()
never work as supposed. The start/end rid info was always zero.

Restructure the code to build dynamic-sized tables for each IOMMU softc
holding device entries. The device entries are enumerated to find a
suitable IOMMU unit. Operations on devices not governed (e.g. the IOMMU
unit itself) are no-op from now on. There are also a minor fix on wrong
%b formatting string usage.

Tested on my EPYC 7282.

Sponsored by:	The FreeBSD Foundation
Reviewed by:	grehan
Differential Revision:	https://reviews.freebsd.org/D30827
2021-07-14 01:53:10 +08:00
Peter Grehan
517904de5c igc(4): Introduce new driver for the Intel I225 Ethernet controller.
This controller supports 2.5G/1G/100MB/10MB speeds, and allows
tx/rx checksum offload, TSO, LRO, and multi-queue operation.

The driver was derived from code contributed by Intel, and modified
by Netgate to fit into the iflib framework.

Thanks to Mike Karels for testing and feedback on the driver.

Reviewed by:	bcr (manpages), kbowling, scottl, erj
MFC after:	1 month
Relnotes:	yes
Sponsored by:	Rubicon Communications, LLC ("Netgate")
Differential Revision:	https://reviews.freebsd.org/D30668
2021-07-12 14:57:18 +10:00
Helge Oldach
b21f19c9e0 MINIMAL: remove debugging and some loadable network modules
Remove deugging stuff, since it's arguably not needed in a minimal
setup. Also vlan, tuntap and gif since they can be loaded.

imp didn't include the part of the patch that removed xen guest support.
Xen guest is relatively small and has no way of being loaded.

Reviewed by:	imp
PR:		229564
MFC After:	3 days
2021-07-11 10:35:42 -06:00
David Chisnall
d2b558281a Revert "Pass the syscall number to capsicum permission-denied signals"
This broke the i386 build.

This reverts commit 3a522ba1bc.
2021-07-10 20:26:01 +01:00
David Chisnall
3a522ba1bc Pass the syscall number to capsicum permission-denied signals
The syscall number is stored in the same register as the syscall return
on amd64 (and possibly other architectures) and so it is impossible to
recover in the signal handler after the call has returned.  This small
tweak delivers it in the `si_value` field of the signal, which is
sufficient to catch capability violations and emulate them with a call
to a more-privileged process in the signal handler.

Approved by:	markj (mentor)

Reviewed by:	kib, bcr (manpages)

Differential Revision: https://reviews.freebsd.org/D29185
2021-07-10 17:19:52 +01:00
Konstantin Belousov
fdc71fa112 amd64 pmap: unexpand the NBPDR macro definition
Sponsored by:	The FreeBSD Foundation
MFC after:	1 week
2021-07-10 14:46:54 +03:00
Konstantin Belousov
71463a34ab amd64 mpboot.S: fix typo in comment
Sponsored by:	The FreeBSD Foundation
MFC after:	1 week
2021-07-10 14:46:54 +03:00
Konstantin Belousov
63664df720 amd64 locore.S: add FF copyright for LA57 work
Sponsored by:	The FreeBSD Foundation
MFC after:	1 week
2021-07-10 14:46:53 +03:00
Konstantin Belousov
9dc715230c amd64 locore.S: trim .globl list from symbols gone for long time
Sponsored by:	The FreeBSD Foundation
MFC after:	1 week
2021-07-10 14:46:53 +03:00
Konstantin Belousov
55e63ed307 x86: use ANSI C definition style for trap_fatal
PR:	257062
Submitted by:	Vijay Sharma <vijaysh312@gmail.com>
MFC after:	1 week
2021-07-10 14:46:53 +03:00
Mark Johnston
f08f0ae524 amd64: Mark the trapframe as initialized in trap()
Otherwise KASAN may generate false positives if the trapframe was
written into a poisoned region of the stack.

Reported by:	pho
Sponsored by:	The FreeBSD Foundation
2021-07-09 20:38:50 -04:00
Konstantin Belousov
28a66fc3da Do not call FreeBSD-ABI specific code for all ABIs
Use sysentvec hooks to only call umtx_thread_exit/umtx_exec, which handle
robust mutexes, for native FreeBSD ABI.  Similarly, there is no sense
in calling sigfastblock_clear() for non-native ABIs.

Requested by:	dchagin
Reviewed by:	dchagin, markj (previous version)
Sponsored by:	The FreeBSD Foundation
MFC after:	2 weeks
Differential revision:	https://reviews.freebsd.org/D30987
2021-07-07 14:12:07 +03:00
Alan Cox
e41fde3ed7 On a failed fcmpset don't pointlessly repeat tests
In a few places, on a failed compare-and-set, both the amd64 pmap and
the arm64 pmap repeat tests on bits that won't change state while the
pmap is locked.  Eliminate some of these unnecessary tests.

Reviewed by:	andrew, kib, markj
MFC after:	1 week
Differential Revision:	https://reviews.freebsd.org/D31014
2021-07-05 21:07:40 -05:00
Edward Tomasz Napierala
447636e43c linux(4): implement coredump support
Implement dumping core for Linux binaries on amd64, for both
32- and 64-bit executables.  Some bits are still missing.

This is based on a prototype by chuck@.

Reviewed By:	kib
Sponsored By:	EPSRC
Differential Revision:	https://reviews.freebsd.org/D30019
2021-06-30 22:45:06 +01:00
Alan Cox
1a8bcf30f9 amd64: a simplication to pmap_remove_{all,write}
Eliminate some unnecessary unlocking and relocking when we have to retry
the operation to avoid deadlock.  (All of the other pmap functions that
iterate over a PV list already implemented retries without these same
unlocking and relocking operations.)

Reviewed by:	kib, markj
MFC after:	1 week
Differential Revision:	https://reviews.freebsd.org/D30951
2021-06-30 13:12:25 -05:00
Edward Tomasz Napierala
435754a59e Add infrastructure required for Linux coredump support
This adds `sv_elf_core_osabi`, `sv_elf_core_abi_vendor`,
and `sv_elf_core_prepare_notes` fields to `struct sysentvec`,
and modifies imgact_elf.c to make use of them instead
of hardcoding FreeBSD-specific values.  It also updates all
of the ABI definitions to preserve current behaviour.

This makes it possible to implement non-native ELF coredump
support without unnecessary code duplication.  It will be used
for Linux coredumps.

Reviewed By:	kib
Sponsored By:	EPSRC
Differential Revision:	https://reviews.freebsd.org/D30921
2021-06-29 08:49:12 +01:00
Mateusz Guzik
9a8e4527f0 amd64: typo fix: memcmpy -> memcmp in a comment
Sponsored by:	Rubicon Communications, LLC ("Netgate")
2021-06-26 16:24:46 +00:00
Dmitry Chagin
4aae133469 linux(4): Make vDSO defines private.
Hide the vDSO defines to the linux32_sysvec as they are not intended to
be used outside of it. Fix LINUX32_PS_STRINGS, use the size of
struct linux32_ps_strings instead of a numeric constant.

MFC after:	2 weeks
2021-06-25 18:41:04 +03:00
Konstantin Belousov
33e1287b6a amd64: do not touch BIOS reset flag halfword, unless we boot through BIOS
Reviewed by:	markj
Sponsored by:	The FreeBSD Foundation
MFC after:	1 week
Differential revision:	https://reviews.freebsd.org/D30872
2021-06-24 00:38:00 +03:00
Dmitry Chagin
c1da89fec2 linux(4): Retire linux_kplatform.
Assuming we can't run on i486, i586 class cpu, retire linux_kplatform var
and use hardcoded 'machine' value in linux_newuname().

I have added linux_kplatform for consistency with linux_platform which is
placed in to vdso to avoid excess copyout it on stack for AT_PLATFORM at
exec time.

This is the first stage of Linuxulator's vdso revision.

Reviewed by:		trasz, imp
Differential Revision:	https://reviews.freebsd.org/D30774
MFC after:		2 weeks
2021-06-22 08:36:21 +03:00
Dmitry Chagin
e013e36939 linux(4): Get rid of Linuxulator kernel build options.
Stop confusing people, retire COMPAT_LINUX and COMPAT_LINUX32 kernel
build options. Since we have 32 and 64 bit Linux emulators, we can't build both
emulators together into the kernel. I don't think it matters, Linux emulation
depends on loadable modules (via rc).

Cut LINPROCFS and LINSYSFS for consistency.

PR:			215061
Reviewed by:		bcr (manpages), trasz
Differential Revision:	https://reviews.freebsd.org/D30751
MFC after:		2 weeks
2021-06-22 08:32:39 +03:00
Edward Tomasz Napierala
135dd0cab5 linux: reduce differences between rt_sendsig() and sendsig()
This makes it easier to compare the two.  This involves moving
the mutex slightly lower down, but there should be no functional
changes.

Sponsored By:	EPSRC
Differential Revision:	https://reviews.freebsd.org/D30541
2021-06-21 17:51:56 +01:00
Dmitry Chagin
8fe8bb7cb5 linux(4): Regen for linux_poll system call.
MFC after:	2 weeks
2021-06-22 08:09:55 +03:00
Dmitry Chagin
2eff670fde linux(4): Implement poll system call via linux_common_ppol()
for the sake of converting events to/from native.

MFC after:	2 weeks
2021-06-22 08:07:46 +03:00
Dmitry Chagin
26795a0378 linux(4): Rework Linux ppoll system call.
For now the Linux emulation layer uses in kernel ppoll(2) without
conversion of user supplied fd 'events', and does not convert the
kernel supplied fd 'revents'.

At least POLLRDHUP is handled by FreeBSD differently than by
Linux. Seems that Linux silencly ignores POLLRDHUP on non socket fd's
unlike FreeBSD, which does more strictly check and fails.

Rework the Linux ppoll, using kern_poll and converting 'events'
and 'revents' values.
While here, move poll events defines to the MI part of code as they
mostly identical on all arches except arm.

Differential Revision:	https://reviews.freebsd.org/D30716
MFC after:		2 weeks
2021-06-22 08:06:05 +03:00
Ka Ho Ng
210e6aec4f vmm: Fix ivrs_drv device_printf usage
The original %b description string is wrong.

Sponsored by:	The FreeBSD Foundation
Reviewed by:	imp, jhb
Differential Revision:	https://reviews.freebsd.org/D30805
2021-06-19 14:07:26 +08:00
Konstantin Belousov
0247c33e89 amd64 efirt: initialize vm_pages backing EFI runtime memory
so that PHYS_TO_VM_PAGE() and consequently physcopyin() work for them

Reviewed by:	markj
Sponsored by:	The FreeBSD Foundation
MFC after:	1 week
Differential revision:	https://reviews.freebsd.org/D30785
2021-06-17 16:58:51 +03:00
Konstantin Belousov
870e197d52 Add quirks for Linux ABI signals handling
Require queueing of the signals with default action, and disable
dequeueing SIGCHLD on wait for live process.

Reported and tested by:	dchagin
Reviewed by:	dchagin, markj
Sponsored by:	The FreeBSD Foundation
MFC after:	1 week
Differential revision:	https://reviews.freebsd.org/D30675
2021-06-16 02:01:35 +03:00
Mark Johnston
70dd5eebc0 amd64: Fix propagation of LDT updates
When a process has used sysarch(2) to specify descriptors for its
private LDT, upon rfork(RFMEM) descriptors are copied into the new child
process.  Any updates to the descriptors are thus reflected to all other
processes sharing the vmspace.  However, this is incorrect in the rather
obscure case where the child process was created before the LDT was
modified.  Fix this by only modifying other processes which already
share the LDT.

Reported by:	syzkaller
Reviewed by:	kib
MFC after:	1 week
Sponsored by:	The FreeBSD Foundation
2021-06-14 17:32:18 -04:00
Olivier Houchard
30b915d7b2 an: Remove driver
Now that an(4) is gone, remove it from GENERIC kernel config files.

Reported by:	flo
2021-06-12 01:08:54 +02:00
Dmitry Chagin
89f15b79b1 linux(4): Regen for ppoll_time64 system call.
MFC after:	2 weeks
2021-06-10 15:19:12 +03:00
Dmitry Chagin
ed61e0ce1d linux(4): Implement ppoll_time64 system call.
MFC after:	2 weeks
2021-06-10 15:18:46 +03:00
Dmitry Chagin
981a60f112 linux(4): Regen for pselect6_time64 system call.
MFC after:	2 weeks
2021-06-10 15:04:37 +03:00
Dmitry Chagin
f6d075ecd7 linux(4): Implement pselect6_time64 system call.
MFC after:	2 weeks
2021-06-10 15:03:30 +03:00
Dmitry Chagin
c002529000 linux(4): Regen for rt_sigtimedwait_time64 system call.
MFC after:	2 weeks
2021-06-10 14:52:43 +03:00
Dmitry Chagin
db4a1f331b linux(4): Implement rt_sigtimedwait_time64 system call.
It still does not work as intended, awaits D30675.

MFC after:	2 weeks
2021-06-10 14:51:30 +03:00
Dmitry Chagin
985978806e linux(4): Regen for futex_time64 system call.
MFC after:	2 weeks
2021-06-10 14:28:25 +03:00
Dmitry Chagin
2e46d0c3d9 linux(4): Implement futex_time64 system call.
MFC after:	2 weeks
2021-06-10 14:27:06 +03:00
Dmitry Chagin
ee64d98204 linux(4): Regen for futex system call.
MFC after:	2 weeks
2021-06-10 14:16:40 +03:00
Dmitry Chagin
3c1de151e3 linux(4): Change Linux futex syscall definition to match Linux actual one.
MFC after:	2 weeks
2021-06-10 14:00:00 +03:00
Edward Tomasz Napierala
f102b61d0e linux: make sure to zero the l_siginfo structure for ptrace(2)
Reported By:	dchagin
Sponsored By:	EPSRC
2021-06-08 10:18:29 +01:00
Konstantin Belousov
598f6fb49c linuxolator: Add compat.linux.setid_allowed knob
PR:	21463
Reported by:	kris
Reviewed by:	dchagin
Tested by:	trasz
Sponsored by:	The FreeBSD Foundation
MFC after:	1 week
Differential Revision:	https://reviews.freebsd.org/D28154
2021-06-06 21:43:00 +03:00
Dmitry Chagin
f4e801085b linux(4): optimize ksiginfo to siginfo conversion.
Retire ksiginfo_to_lsiginfo function, use siginfo_to_lsiginfo instead.
Convert rt_sigtimedwait siginfo variables to well known names.

MFC after:	2 weeks
2021-06-07 06:06:17 +03:00
Dmitry Chagin
e29ea22f70 Regen for ('0f8dab45404f347752470579feccc6d2739b9570') Linux
rt_sigtimedwait system call.

MFC after:	2 weeks
2021-06-07 05:39:29 +03:00
Dmitry Chagin
0f8dab4540 linux(4): Fix timeout parameter of rt_sigtimedwait syscall, which is
timespec not a timeval.

MFC after:	2 weeks
2021-06-07 05:35:35 +03:00
Dmitry Chagin
56b187005c Regen for ('6501370a7dfb358daf07555136742bc064e68cb7') Linux
clock_nanosleep_time64 system call.

MFC after:	2 weeks
2021-06-07 05:29:27 +03:00
Dmitry Chagin
6501370a7d linux(4): Implement clock_nanosleep_time64 system call.
MFC after:	2 weeks
2021-06-07 05:26:48 +03:00
Dmitry Chagin
773d9153c3 Regen for ('187715a420237e1ed94dd5aef158eada7dcdc559') Linux
clock_getres_time64 system call.

MFC after:	2 weeks
2021-06-07 05:21:48 +03:00
Dmitry Chagin
187715a420 linux(4): Implement clock_getres_time64 system call.
MFC after:	2 weeks
2021-06-07 05:21:32 +03:00
Dmitry Chagin
82e3848654 Regen for ('19f9a0e4df54f8d1e99234146024422bdcfa09ce') Linux
clock_settime64 system call.

MFC after:	2 weeks
2021-06-07 05:14:04 +03:00
Dmitry Chagin
19f9a0e4df linux(4): Implement clock_settime64 system call.
MFC after:	2 weeks
2021-06-07 05:11:25 +03:00
Dmitry Chagin
9e07ae7a09 Regen for ('99b6f430698fa00a33184dd61591d8b6518ed9d3') Linux
clock_gettime64 system call.

MFC after:	2 weeks
2021-06-07 05:08:11 +03:00
Dmitry Chagin
99b6f43069 linux(4): Implement clock_gettime64 system call.
MFC after:	2 weeks
2021-06-07 05:04:42 +03:00
Dmitry Chagin
ea7fa5583c Regen for ('e4bffb80bbc6a2e4b3be89aefcbd5bb2c2fc0ba0') Linux
utimensat_time64 syscall.

MFC after:	2 weeks
2021-06-07 04:56:58 +03:00
Dmitry Chagin
e4bffb80bb linux(4): Implement utimensat_time64 system call.
MFC after:	2 weeks
2021-06-07 04:54:30 +03:00
Dmitry Chagin
bfcce1a9f6 linux(4): add struct timespec64 definition and conversion routine for
future use.

MFC after:		2 weeks
2021-06-07 04:47:12 +03:00
Mark Johnston
8cd05b8833 amd64: Clear the local TSS when creating a new thread
Otherwise it is copied from the creating thread.  Then, if either thread
exits, the other is left with a dangling pointer, typically resulting in
a page fault upon the next context switch.

Reported by:	syzkaller
Reviewed by:	kib
MFC after:	1 week
Sponsored by:	The FreeBSD Foundation
Differential Revision:	https://reviews.freebsd.org/D30607
2021-06-01 19:38:22 -04:00
Mark Johnston
6cda627556 amd64: Relax the assertion added in commit 4a59cbc12
We only need to ensure that interrupts are disabled when handling a
fault from iret.  Otherwise it's possible to trigger the assertion
legitimately, e.g., by copying in from an invalid address.

Fixes:		4a59cbc12
Reported by:	pho
Reviewed by:	kib
MFC after:	1 week
Sponsored by:	The FreeBSD Foundation
Differential Revision:	https://reviews.freebsd.org/D30594
2021-06-01 19:38:09 -04:00
Mark Johnston
4a59cbc125 amd64: Avoid enabling interrupts when handling kernel mode prot faults
When PTI is enabled, we may have been on the trampoline stack when iret
faults.  So, we have to switch back to the regular stack before
re-entering trap().

trap() has the somewhat strange behaviour of re-enabling interrupts when
handling certain kernel-mode execeptions.  In particular, it was doing
this for exceptions raised during execution of iret.  When switching
away from the trampoline stack, however, the thread must not be migrated
to a different CPU.  Fix the problem by simply leaving interrupts
disabled during the window.

Reported by:	syzbot+6cfa544fd86ad4647ffc@syzkaller.appspotmail.com
Reported by:	syzbot+cfdfc9e5a8f28f11a7f5@syzkaller.appspotmail.com
Reviewed by:	kib
MFC after:	1 week
Sponsored by:	The FreeBSD Foundation
Differential Revision:	https://reviews.freebsd.org/D30578
2021-05-31 18:49:33 -04:00
Dmitry Chagin
19593f775c linux(4); Retire unnecessary __packed attribute from some struct's
definition.

Differential Revision:	https://reviews.freebsd.org/D30482
MFC after:		2 weeks
2021-05-31 21:56:34 +03:00
Konstantin Belousov
c56de177d2 x86: initialize initial FPU state earlier
Make it under SI_SUB_CPU sysinit, instead of much later SI_SUB_DRIVERS.
The SI_SUB_DRIVERS survived from times when FPU used real ISA attachment,
now it is only pnp stub claiming id.

PR:	255997
Reviewed by:	jhb
Sponsored by:	The FreeBSD Foundation
MFC after:	1 week
Differential revision:	https://reviews.freebsd.org/D30512
2021-05-28 21:38:32 +03:00
Edward Tomasz Napierala
c0f171736a Regen after 6d926e850d.
Sponsored By:	EPSRC
2021-05-28 09:04:17 +01:00
Edward Tomasz Napierala
6d926e850d linux: add new syscall numbers
Sponsored By:	EPSRC
Differential Revision:	https://reviews.freebsd.org/D30193
2021-05-28 09:02:16 +01:00
Mark Johnston
4c599db71a vmm: Let guests enable SMEP/SMAP if the host supports it
Reviewed by:	kib, grehan, jhb
Tested by:	grehan (AMD)
MFC after:	3 weeks
Sponsored by:	The FreeBSD Foundation
Differential Revision:	https://reviews.freebsd.org/D30462
2021-05-26 09:34:52 -04:00
Konstantin Belousov
a59f028537 amd64/linux*: add required header to get the constant value
Otherwise asm silently interpret it as the external global symbol.

Reported by:	bz
Sponsored by:	The FreeBSD Foundation
Fixes:	91aae953cb
2021-05-26 01:24:09 +03:00
Konstantin Belousov
91aae953cb amd64: clear PSL.AC in the right frame
If copyin family of routines fault, kernel does clear PSL.AC on the
fault entry, but the AC flag of the faulted frame is kept intact.  Since
onfault handler is effectively jump, AC survives until syscall exit.

Reported by:	m00nbsd, via Sony
Reviewed by:	markj
Sponsored by:	The FreeBSD Foundation
admbugs:	975
2021-05-25 18:20:46 +03:00
Edward Tomasz Napierala
95c19e1d65 linux: refactor bsd_to_linux_regset() out of linux_ptrace.c
This will be used for Linux coredump support.

Sponsored By:	EPSRC
Differential Revision:	https://reviews.freebsd.org/D30365
2021-05-21 07:26:07 +01:00
Ceri Davies
c1a148873d sys/*/conf/*, docs: fix links to handbook
While here, fix all links to older en_US.ISO8859-1 documentation
in the src/ tree.

PR:             255026
Reported by:    Michael Büker <freebsd@michael-bueker.de>
Reviewed by:    dbaio
Approved by:    blackend (mentor), re (gjb)
MFC after:      10 days
Differential Revision: https://reviews.freebsd.org/D30265
2021-05-20 09:27:10 +01:00
Roger Pau Monné
9e14ac116e x86/xen: further PVHv1 removal cleanup
The AP startup extern variable declarations are not longer needed,
since PVHv2 uses the native AP startup path using the lapic. Remove
the declaration and make the variables static to mp_machdep.c

Sponsored by: Citrix Systems R&D
2021-05-18 10:43:31 +02:00
Roger Pau Monné
ac3ede5371 x86/xen: remove PVHv1 code
PVHv1 was officially removed from Xen in 4.9, so just axe the related
code from FreeBSD.

Note FreeBSD supports PVHv2, which is the replacement for PVHv1.

Sponsored by: Citrix Systems R&D
Reviewed by: kib, Elliott Mitchell
Differential Revision: https://reviews.freebsd.org/D30228
2021-05-17 11:41:21 +02:00
Mitchell Horne
c93e6ea344 xen: remove support for PVHv1 bootpath
PVHv1 is a legacy interface supported only by Xen versions 4.4 through
4.9.

Reviewed by: royger
2021-05-17 10:56:52 +02:00
Mark Johnston
60cb98a1bd linux: Fix a mistake in commit fb58045145
The change to futex_andl_smap() should have ordered stac before the
load from a user address, otherwise it does not fix anything.

Fixes:	fb58045145 ("linux: Fix SMAP-enabled futex routines")
MFC after:	1 week
Sponsored by:	The FreeBSD Foundation
2021-05-16 22:23:14 -04:00
Mark Johnston
fb58045145 linux: Fix SMAP-enabled futex routines
Some of them were dereferencing the user pointer before disabling SMAP.

PR:		255591
Reviewed by:	kib
Tested by:	pitwuu@gmail.com
Sponsored by:	The FreeBSD Foundation
MFC after:	1 week
Differential Revision:	https://reviews.freebsd.org/D30276
2021-05-16 13:42:08 -04:00
Ed Maste
2c9764f36b regen syscall files after d51198d63b63 2021-05-13 14:09:58 -04:00
Edward Tomasz Napierala
916f3dba45 linux(4): make arch_prctl(2) support GET_CET_STATUS, report unknown codes
This is largely a no-op, to make future debugging slightly easier.

Sponsored By:	EPSRC
Differential Revision:	https://reviews.freebsd.org/D30035
2021-05-06 09:33:42 +01:00
Edward Tomasz Napierala
023bff7990 linux(4): fix ptrace(2) to properly handle orig_rax
This fixes strace(1) erroneously reporting return values
as "Function not implemented", combined with reporting the binary
ABI as X32.

Very similar code in linux_ptrace_getregs() is left as it is - it's
probably wrong too, but I don't have a way to test it.

Sponsored By:	EPSRC
Differential Revision:	https://reviews.freebsd.org/D29927
2021-05-04 15:21:06 +01:00
Andrew Turner
c78ad207ba Switch the EFI virtual address to a uint64_t
It is defined as a uint64_t in the UEFI spec. As it's not used as a
pointer by the kernel follow this and define it as the same in the
kernel.

Reviewed by:	kib, manu, imp
Sponsored by:	Innovate UK
Differential Revision:	https://reviews.freebsd.org/D29759
2021-05-01 06:01:20 +00:00
Konstantin Belousov
72a42ec63b amd64: disable LA57 by default for now
A testing on the real hardware uncovered an issue, and since I do not have
access to the machine, disable until the bug can be fixed.

Reported by:	"Pieper, Jeffrey E" <jeffrey.e.pieper@intel.com>
Sponsored by:	The FreeBSD Foundation
MFC after:	3 days
2021-04-30 17:43:45 +03:00
Konstantin Belousov
21fc6a2a10 amd64: invalidate TLB between page table update and access
When setting up trampoline mapping for LA57 switcher, it is possible
that TLB still has some random mapping at that address.

Sponsored by:	The FreeBSD Foundation
MFC after:	1 week
2021-04-30 17:43:45 +03:00
Mark Johnston
20e3b9d8bd kasan: Use vm_offset_t for the first parameter to kasan_shadow_map()
No functional change intended.

Sponsored by:	The FreeBSD Foundation
2021-04-29 11:39:02 -04:00
Alexander V. Chernikov
6993187a8c Add FIB_ALGO to GENERIC on amd64/arm64.
Option `FIB_ALGO` gates new modular fib lookup functionality,
 enabling more performant routing table lookups and improving
 control plane convergence under the load.

Detailed feature description is available in D27401.

Reviewed By: olivier, gnn
Differential Revision: https://reviews.freebsd.org/D28434
2021-04-24 23:22:58 +00:00
Edward Tomasz Napierala
77651151f3 linux: make ptrace(2) return EIO when trying to peek invalid address
Previously we've returned the error from native ptrace(2), ENOMEM.
This confused Linux strace(2).

Reviewed By:	emaste
Sponsored By:	EPSRC
Differential Revision:	https://reviews.freebsd.org/D29925
2021-04-24 11:37:50 +01:00
Ka Ho Ng
6fe60f1d5c AMD-vi: Fortify IVHD device_identify process
- Use malloc(9) to allocate ivhd_hdrs list. The previous assumption
  that there are at most 10 IVHDs in a system is not true. A counter
  example would be a system with 4 IOMMUs, and each IOMMU is related
  to IVHDs type 10h, 11h and 40h in the ACPI IVRS table.
- Always scan through the whole ivhd_hdrs list to find IVHDs that has
  the same DeviceId but less prioritized IVHD type.

Sponsored by:	The FreeBSD Foundation
MFC with:	74ada297e8
Reviewed by:	grehan
Approved by:	lwhsu (mentor)
Differential Revision:	https://reviews.freebsd.org/D29525
2021-04-19 16:08:13 +08:00
Mark Johnston
f115c06121 amd64: Add MD bits for KASAN
- Initialize KASAN before executing SYSINITs.
- Add a GENERIC-KASAN kernel config, akin to GENERIC-KCSAN.
- Increase the kernel stack size if KASAN is enabled.  Some of the
  ASAN instrumentation increases stack usage and it's enough to
  trigger stack overflows in ZFS.
- Mark the trapframe as valid in interrupt handlers if it is
  assigned to td_intr_frame.  Otherwise, an interrupt in a function
  which creates a poisoned alloca region can trigger false positives.

MFC after:	2 weeks
Sponsored by:	The FreeBSD Foundation
Differential Revision:	https://reviews.freebsd.org/D29455
2021-04-13 17:42:20 -04:00
Mark Johnston
6faf45b34b amd64: Implement a KASAN shadow map
The idea behind KASAN is to use a region of memory to track the validity
of buffers in the kernel map.  This region is the shadow map.  The
compiler inserts calls to the KASAN runtime for every emitted load
and store, and the runtime uses the shadow map to decide whether the
access is valid.  Various kernel allocators call kasan_mark() to update
the shadow map.

Since the shadow map tracks only accesses to the kernel map, accesses to
other kernel maps are not validated by KASAN.  UMA_MD_SMALL_ALLOC is
disabled when KASAN is configured to reduce usage of the direct map.
Currently we have no mechanism to completely eliminate uses of the
direct map, so KASAN's coverage is not comprehensive.

The shadow map uses one byte per eight bytes in the kernel map.  In
pmap_bootstrap() we create an initial set of page tables for the kernel
and preloaded data.

When pmap_growkernel() is called, we call kasan_shadow_map() to extend
the shadow map.  kasan_shadow_map() uses pmap_kasan_enter() to allocate
memory for the shadow region and map it.

Reviewed by:	kib
MFC after:	2 weeks
Sponsored by:	The FreeBSD Foundation
Differential Revision:	https://reviews.freebsd.org/D29417
2021-04-13 17:42:20 -04:00
Edward Tomasz Napierala
ca6e1fa3ce linux: adjust ordering of Linux auxv and add dummy AT_HWCAP2
This should be a no-op; the purpose of this is to reduce
a spurious difference between Linuxulator and Linux, to make
debugging core dumps slightly easier.

Note that AT_HWCAP2 we pass to Linux binaries is always 0,
instead of being equal to 'cpu_feature2'.  This matches what
I've observed under Ubuntu Focal VM.

Reviewed By:	chuck, dchagin
Sponsored By:	EPSRC
Differential Revision:	https://reviews.freebsd.org/D29609
2021-04-13 13:14:30 +01:00
Andrew Turner
5d2d599d3f Create VM_MEMATTR_DEVICE on all architectures
This is intended to be used with memory mapped IO, e.g. from
bus_space_map with no flags, or pmap_mapdev.

Use this new memory type in the map request configured by
resource_init_map_request, and in pciconf.

Reviewed by:	kib
Differential Revision:	https://reviews.freebsd.org/D29692
2021-04-12 06:15:31 +00:00
Piotr Pawel Stefaniak
a212f56d10 Balance parentheses in sysctl descriptions 2021-04-11 10:30:55 +02:00
Konstantin Belousov
94172affa4 amd64: clear debug registers on execing 32bit Linux binary
Reviewed by:	jhb
Sponsored by:	The FreeBSD Foundation
MFC after:	1 week
Differential revision:	https://reviews.freebsd.org/D29687
2021-04-10 04:25:02 +03:00
Konstantin Belousov
d50adfec9e amd64: clear debug registers on execing 32bit native binary
Reviewed by:	jhb
Sponsored by:	The FreeBSD Foundation
MFC after:	1 week
Differential revision:	https://reviews.freebsd.org/D29687
2021-04-10 04:25:02 +03:00
Konstantin Belousov
2f15884747 amd64 linux64: use x86_clear_dbregs()
instead of manually inlining it

Reviewed by:	jhb
Sponsored by:	The FreeBSD Foundation
MFC after:	1 week
Differential revision:	https://reviews.freebsd.org/D29687
2021-04-10 04:25:02 +03:00
Konstantin Belousov
290b0d123a x86: use x86_clear_dbregs() on fork
instead of manual zeroing of the debug registers file in pcb.
This centralizes the cleaning code, but the practical difference is
that PCB_DBREGS flag is cleared, saving some operations on context
switching.

Reviewed by:	jhb
Sponsored by:	The FreeBSD Foundation
MFC after:	1 week
Differential revision:	https://reviews.freebsd.org/D29687
2021-04-10 04:25:02 +03:00
Konstantin Belousov
a8b75a57c9 x86: add x86_clear_dbregs() helper
Move the code from exec_setregs() to reset debug registers state on exec,
to the x86_clear_dbregs() helper

Reviewed by:	jhb
Sponsored by:	The FreeBSD Foundation
MFC after:	1 week
Differential revision:	https://reviews.freebsd.org/D29687
2021-04-10 04:25:01 +03:00
Greg V
a29bff7a52 smbios: support getting address from EFI
On some systems (e.g. Lenovo ThinkPad X240, Apple MacBookPro12,1)
the SMBIOS entry point is not found in the <0xFFFFF space.

Follow the SMBIOS spec and use the EFI Configuration Table for
locating the entry point on EFI systems.

Reviewed by:	rpokala, dab
MFC after:	1 week
Sponsored by:	Dell EMC Isilon
Differential Revision:	https://reviews.freebsd.org/D29276
2021-04-07 14:46:29 -05:00
Mark Johnston
0f07c234ca Remove more remnants of sio(4)
Reviewed by:	imp
MFC after:	1 week
Sponsored by:	The FreeBSD Foundation
Differential Revision:	https://reviews.freebsd.org/D29626
2021-04-07 14:33:02 -04:00
Konstantin Belousov
aa3ea612be x86: remove gcov kernel support
Reviewed by:	jhb
Sponsored by:	The FreeBSD Foundation
Differential revision:	https://reviews.freebsd.org/D29529
2021-04-02 15:41:51 +03:00
Ka Ho Ng
03efa462b2 AMD-vi: Mixed format IVHD block should replace fixed format IVHD block
This fixes double IVHD_SETUP_INTR calls on the same IOMMU device.

Sponsored by:	The FreeBSD Foundation
MFC with:	74ada297e8
Reported by:	Oleg Ginzburg <olevole@olevole.ru>
Reviewed by:	grehan
Approved by:	philip (mentor)
Differential Revision:	https://reviews.freebsd.org/D29521
2021-04-01 15:31:24 +08:00
Ka Ho Ng
cf76495e0a AMD-vi: Fix mismatched NULL checking in amdiommu teardown path
Sponsored by:	The FreeBSD Foundation
Approved by:	lwhsu (mentor)
MFC with:	74ada297e8
2021-04-01 03:34:34 +08:00
Jason A. Harmening
8dc8feb53d Clean up a couple of MD warts in vm_fault_populate():
--Eliminate a big ifdef that encompassed all currently-supported
architectures except mips and powerpc32.  This applied to the case
in which we've allocated a superpage but the pager-populated range
is insufficient for a superpage mapping.  For platforms that don't
support superpages the check should be inexpensive as we shouldn't
get a superpage in the first place.  Make the normal-page fallback
logic identical for all platforms and provide a simple implementation
of pmap_ps_enabled() for MIPS and Book-E/AIM32 powerpc.

--Apply the logic for handling pmap_enter() failure if a superpage
mapping can't be supported due to additional protection policy.
Use KERN_PROTECTION_FAILURE instead of KERN_FAILURE for this case,
and note Intel PKU on amd64 as the first example of such protection
policy.

Reviewed by:	kib, markj, bdragon
Differential Revision:	https://reviews.freebsd.org/D29439
2021-03-30 18:15:55 -07:00
Konstantin Belousov
8223717ce6 x86: clear %db registers in new process
Reported by:	 Michał Górny <mgorny@gentoo.org>
PR:	254661
Reviewed by:	emaste, jhb
MFC after:	1 week
Sponsored by:	The FreeBSD Foundation
Differential revision:	https://reviews.freebsd.org/D29496
2021-03-31 02:07:35 +03:00
Mitchell Horne
7446b0888d gdb: report specific stop reason for watchpoints
The remote protocol allows for implementations to report more specific
reasons for the break in execution back to the client [1]. This is
entirely optional, so it is only implemented for amd64, arm64, and i386
at the moment.

[1] https://sourceware.org/gdb/current/onlinedocs/gdb/Stop-Reply-Packets.html

Reviewed by:	jhb
MFC after:	3 weeks
Sponsored by:	NetApp, Inc.
Sponsored by:	Klara, Inc.
NetApp PR:	51
Differential Revision:	https://reviews.freebsd.org/D29174
2021-03-30 11:36:41 -03:00
Mitchell Horne
9d81dd5404 ddb: replace watchpoint set/clear functions
Use the new kdb variants. Print more specific error messages.

Reviewed by:	jhb, markj
MFC after:	3 weeks
Sponsored by:	NetApp, Inc.
Sponsored by:	Klara, Inc.
Differential Revision:	https://reviews.freebsd.org/D29156
2021-03-29 12:05:44 -03:00
Mitchell Horne
15dc1d4452 x86: implement kdb watchpoint functions
Add wrappers around the dbreg interface that can be consumed by MI
kernel debugger code. The dbreg functions themselves are updated to
return error codes, not just -1. dbreg_set_watchpoint() is extended to
accept access bits as an argument.

Reviewed by:	jhb, kib, markj
MFC after:	3 weeks
Sponsored by:	NetApp, Inc.
Sponsored by:	Klara, Inc.
Differential Revision:	https://reviews.freebsd.org/D29155
2021-03-29 12:05:43 -03:00
Mark Johnston
7ae2e70336 amd64: Make KPDPphys local to pmap.c
MFC after:	1 week
Sponsored by:	The FreeBSD Foundation
2021-03-24 09:57:31 -04:00
Ka Ho Ng
be97fc8dce bhyve amd: Small cleanups in amdvi_dump_cmds
Bump offset with MOD_INC instead in amdvi_dump_cmds.

Reviewed by:	jhb
Approved by:	philip (mentor)
MFC after:	3 days
Differential Revision:	https://reviews.freebsd.org/D28862
2021-03-23 16:12:41 +08:00
Mark Johnston
3ead60236f Generalize bus_space(9) and atomic(9) sanitizer interceptors
Make it easy to define interceptors for new sanitizer runtimes, rather
than assuming KCSAN.  Lay a bit of groundwork for KASAN and KMSAN.

When a sanitizer is compiled in, atomic(9) and bus_space(9) definitions
in atomic_san.h are used by default instead of the inline
implementations in the platform's atomic.h.  These definitions are
implemented in the sanitizer runtime, which includes
machine/{atomic,bus}.h with SAN_RUNTIME defined to pull in the actual
implementations.

No functional change intended.

MFC after:	1 month
Sponsored by:	The FreeBSD Foundation
2021-03-22 22:21:53 -04:00
Ed Maste
54399caa2f Correct "Fondation" typo (missing "u") 2021-03-22 13:06:31 -04:00
Ka Ho Ng
74ada297e8 AMD-vi: Fix IOMMU device interrupts being overridden
Currently, AMD-vi PCI-e passthrough will lead to the following lines in
dmesg:
"kernel: CPU0: local APIC error 0x40
ivhd0: Error: completion failed tail:0x720, head:0x0."

After some tracing, the problem is due to the interaction with
amdvi_alloc_intr_resources() and pci_driver_added(). In ivrs_drv, the
identification of AMD-vi IVHD is done by walking over the ACPI IVRS
table and ivhdX device_ts are added under the acpi bus, while there are
no driver handling the corresponding IOMMU PCI function. In
amdvi_alloc_intr_resources(), the MSI intr are allocated with the ivhdX
device_t instead of the IOMMU PCI function device_t. bus_setup_intr() is
called on ivhdX. the IOMMU pci function device_t is only used for
pci_enable_msi(). Since bus_setup_intr() is not called on IOMMU pci
function, the IOMMU PCI function device_t's dinfo->cfg.msi is never
updated to reflect the supposed msi_data and msi_addr. So the msi_data
and msi_addr stay in the value 0. When pci_driver_added() tried to loop
over the children of a pci bus, and do pci_cfg_restore() on each of
them, msi_addr and msi_data with value 0 will be written to the MSI
capability of the IOMMU pci function, thus explaining the errors in
dmesg.

This change includes an amdiommu driver which currently does attaching,
detaching and providing DEVMETHODs for setting up and tearing down
interrupt. The purpose of the driver is to prevent pci_driver_added()
from calling pci_cfg_restore() on the IOMMU PCI function device_t.
The introduction of the amdiommu driver handles allocation of an IRQ
resource within the IOMMU PCI function, so that the dinfo->cfg.msi is
populated.

This has been tested on EPYC Rome 7282 with Radeon 5700XT GPU.

Sponsored by:	The FreeBSD Foundation
Reviewed by:	jhb
Approved by:	philip (mentor)
MFC after:	2 weeks
Differential Revision:	https://reviews.freebsd.org/D28984
2021-03-22 17:33:43 +08:00
Ka Ho Ng
ede14736fd ivrs_drv: Fix IVHDs with duplicated BaseAddress
Reviewed by:	jhb
Approved by:	philip (mentor)
MFC after:	2 weeks
Differential Revision:	https://reviews.freebsd.org/D28945
2021-03-22 17:33:43 +08:00
Jason A. Harmening
d22883d715 Remove PCPU_INC
e4b8deb222 removed the last in-tree uses of PCPU_INC().  Its
potential benefit is also practically nonexistent.  Non-x86
platforms already implement it as PCPU_ADD(..., 1), and according
to [0] there are no recent x86 processors for which the 'inc'
instruction provides a performance benefit over the equivalent
memory-operand form of the 'add' instruction.  The only remaining
benefit of 'inc' is smaller instruction size, which in this case
is inconsequential given the limited number of per-CPU data consumers.

[0]: https://www.agner.org/optimize/instruction_tables.pdf

Reviewed by:	kib
Differential Revision:	https://reviews.freebsd.org/D29308
2021-03-20 19:23:59 -07:00
Mitchell Horne
c02c04f113 x86: consolidate hw watchpoint logic into new file
This is a prerequisite to using these functions outside of ddb, but also
provides some cleanup and minor refactoring. This code is almost
entirely duplicated between the two implementations, the only
significant difference being the lack of dbreg synchronization on i386.

Cleanups are:
 - demote some internal functions to static
 - use the constant NDBREGS instead of a '4' literal
 - remove K&R definitions
 - some added comments

Reviewed by:	kib, jhb
Sponsored by:	NetApp, Inc.
Sponsored by:	Klara, Inc.
MFC after:	1 week
Differential Revision:	https://reviews.freebsd.org/D29153
2021-03-19 16:51:52 -03:00
D Scott Phillips
f8a6ec2d57 bhyve: support relocating fbuf and passthru data BARs
We want to allow the UEFI firmware to enumerate and assign
addresses to PCI devices so we can boot from NVMe[1]. Address
assignment of PCI BARs is properly handled by the PCI emulation
code in general, but a few specific cases need additional support.
fbuf and passthru map additional objects into the guest physical
address space and so need to handle address updates. Here we add a
callback to emulated PCI devices to inform them of a BAR
configuration change. fbuf and passthru then watch for these BAR
changes and relocate the frame buffer memory segment and passthru
device mmio area respectively.

We also add new VM_MUNMAP_MEMSEG and VM_UNMAP_PPTDEV_MMIO ioctls
to vmm(4) to facilitate the unmapping needed for addres updates.

[1]: https://github.com/freebsd/uefi-edk2/pull/9/

Originally by:	scottph
MFC After:	1 week
Sponsored by:	Intel Corporation
Reviewed by:	grehan
Approved by:	philip (mentor)
Differential Revision:	https://reviews.freebsd.org/D24066
2021-03-19 11:04:36 +08:00
John Baldwin
3b57ddb029 Rename linux_set_upcall_kse() to linux_set_upcall().
This matches the rename of cpu_set_upcall_kse() in
5c2cf81845.

Reviewed by:	kib, emaste
MFC after:	1 week
Sponsored by:	DARPA
Differential Revision:	https://reviews.freebsd.org/D29295
2021-03-18 12:14:34 -07:00
John Baldwin
a7883464fc x86: Reduce code duplication in cpu_fork() and cpu_copy_thread().
Add copy_thread() to hold shared code.

Reviewed by:	kib
MFC after:	1 week
Sponsored by:	Netflix
Differential Revision:	https://reviews.freebsd.org/D29228
2021-03-18 12:13:17 -07:00
Jason A. Harmening
c2460d7cfe factor out PT page allocation/freeing
As follow-on work to e4b8deb222, move page table page
allocation and freeing into their own functions.  Use these
functions to provide separate kernel vs. user page table page
accounting, and to wrap common tasks such as management of
zero-filled page state.

Requested by:	markj, kib
Reviewed by:	kib
Differential Revision:	 https://reviews.freebsd.org/D29151
2021-03-15 17:14:43 -07:00
John Baldwin
40d593d17e x86: Update some stale comments in cpu_fork() and cpu_copy_thread().
Neither of these routines allocate stacks.

Reviewed by:	kib
MFC after:	1 week
Sponsored by:	Netflix
Differential Revision:	https://reviews.freebsd.org/D29227
2021-03-12 09:48:49 -08:00
John Baldwin
c7b0213523 x86: Always use clean FPU and segment base state for new kthreads.
Reviewed by:	kib
MFC after:	1 week
Sponsored by:	Netflix
Differential Revision:	https://reviews.freebsd.org/D29208
2021-03-12 09:48:36 -08:00
John Baldwin
755efb8d8f x86: Copy the FPU/XSAVE state from the creating thread to new threads.
POSIX states that new threads created via pthread_create() should
inherit the "floating point environment" from the creating thread.

Discussed with:	kib
MFC after:	1 week
Sponsored by:	Netflix
Differential Revision:	https://reviews.freebsd.org/D29204
2021-03-12 09:47:41 -08:00
John Baldwin
704547ce1c amd64: Cleanups to setting TLS registers for Linux binaries.
- Use update_pcb_bases() when updating FS or GS base addresses to
  permit use of FSBASE and GSBASE in Linux processes.  This also sets
  PCB_FULL_IRET.  linux32 was setting PCB_32BIT which should be a
  no-op (exec sets it).

- Remove write-only variables to construct unused segment descriptors
  for linux32.

Reviewed by:	kib
MFC after:	1 week
Sponsored by:	Netflix
Differential Revision:	https://reviews.freebsd.org/D29026
2021-03-12 09:47:31 -08:00
John Baldwin
9221145868 amd64: Only update fsbase/gsbase in pcb for curthread.
Before the pcb is copied to the new thread during cpu_fork() and
cpu_copy_thread(), the kernel re-reads the current register values in
case they are stale.  This is done by setting PCB_FULL_IRET in
pcb_flags.

This works fine for user threads, but the creation of kernel processes
and kernel threads do not follow the normal synchronization rules for
pcb_flags.  Specifically, new kernel processes are always forked from
thread0, not from curthread, so adjusting pcb_flags via a simple
instruction without the LOCK prefix can race with thread0 running on
another CPU.  Similarly, kthread_add() clones from the first thread in
the relevant kernel process, not from curthread.  In practice, Netflix
encountered a panic where the pcb_flags in the first kthread of the
KTLS process were trashed due to update_pcb_bases() in
cpu_copy_thread() running from thread0 to create one of the other KTLS
threads racing with the first KTLS kthread calling fpu_kern_thread()
on another CPU.  In the panicking case, the write to update pcb_flags
in fpu_kern_thread() was lost triggering an "Unregistered use of FPU
in kernel" panic when the first KTLS kthread later tried to use the
FPU.

Reported by:	gallatin
Discussed with:	kib
MFC after:	1 week
Sponsored by:	Netflix
Differential Revision:	https://reviews.freebsd.org/D29023
2021-03-12 09:45:18 -08:00
Jason A. Harmening
e4b8deb222 amd64 pmap: convert to counter(9), add PV and pagetable page counts
This change converts most of the counters in the amd64 pmap from
global atomics to scalable counter(9) counters.  Per discussion
with kib@, it also removes the handrolled per-CPU PCID save count
as it isn't considered generally useful.

The bulk of these counters remain guarded by PV_STATS, as it seems
unlikely that they will be useful outside of very specific debugging
scenarios.  However, this change does add two new counters that
are available without PV_STATS.  pt_page_count and pv_page_count
track the number of active physical-to-virtual list pages and page
table pages, respectively.  These will be useful in evaluating
the memory footprint of pmap structures under various workloads,
which will help to guide future changes in this area.

Reviewed by:	kib
Differential Revision:	https://reviews.freebsd.org/D28923
2021-03-09 09:27:10 -08:00
Mark Johnston
435c7cfb24 Rename _cscan_atomic.h and _cscan_bus.h to atomic_san.h and bus_san.h
Other kernel sanitizers (KMSAN, KASAN) require interceptors as well, so
put these in a more generic place as a step towards importing the other
sanitizers.

No functional change intended.

MFC after:	1 week
Sponsored by:	The FreeBSD Foundation
Differential Revision:	https://reviews.freebsd.org/D29103
2021-03-08 12:39:06 -05:00
Eric van Gyzen
8b434feedf Only set delayed inval for procs using PTI
invltlb_invpcid_pti_handler() was requesting delayed TLB invalidation
even for processes that aren't using PTI.  With an out-of-tree
change to avoid PTI for non-jailed root processes, this caused an
assertion failure in pmap_activate_sw_pcid_pti() when context-switching
between PTI and non-PTI processes.

Reviewed by:	bdrewery kib tychon
Sponsored by:	Dell EMC Isilon
Differential Revision:	https://reviews.freebsd.org/D29094
2021-03-05 13:20:08 -06:00
Mark Johnston
732b69c9f9 acpi: Make nexus_acpi quiet on amd64 and i386
Otherwise during attach newbus prints "nexus0", which is not very
useful.

The generic nexus device is already quiet, as is nexus_acpi on arm64.

MFC after:	1 week
Sponsored by:	The FreeBSD Foundation
2021-03-05 12:54:00 -05:00
Mark Johnston
aac25e2225 pmap: Fix largemap restart checks in the kernel_maps sysctl handler
The purpose of these checks is to ensure that the address of the
next-level page table page is valid, since nothing is synchronizing with
a concurrent update of the large map and large map PTPs are freed to the
system.  However, if PG_PS is set, there is no next level.

Reported by:	rpokala
Reviewed by:	kib
Tested by:	rpokala
MFC after:	3 days
Sponsored by:	The FreeBSD Foundation
Differential Revision:	https://reviews.freebsd.org/D28922
2021-02-25 18:49:47 -05:00
Andrew Turner
3fd63ddfdf Limit when we call DELAY from KCSAN on amd64
In some cases the DELAY implementation on amd64 can recurse on a spin
mutex in the i8254 early delay code. Detect when this is going to
happen and don't call delay in this case. It is safe to not delay here
with the only issue being KCSAN may not detect data races.

Reviewed by:	kib
Tested by:	arichardson
Sponsored by:	Innovate UK
Differential Revision:	https://reviews.freebsd.org/D28895
2021-02-25 12:38:05 +00:00
Allan Jude
d0673fe160 smbios: Move smbios driver out from x86 machdep code
Add it to the x86 GENERIC and MINIMAL kernels

Sponsored by:	Ampere Computing LLC
Submitted by:	Klara Inc.
Reviewed by:	rpokala
Differential Revision:	https://reviews.freebsd.org/D28738
2021-02-23 21:17:09 +00:00
Mateusz Guzik
5fa12fe0cd amd64: implement strlen in assembly, take 2
Tested with glibc test suite.

The C variant in libkern performs excessive branching to find the zero
byte instead of using the bsfq instruction. The same code patched to use
it is still slower than the routine implemented here as the compiler
keeps neglecting to perform certain optimizations (like using leaq).

On top of that the routine can be used as a starting point for copyinstr
which operates on words intead of bytes.

The previous attempt had an instance of swapped operands to andq when
dealing with fully aligned case, which had a side effect of breaking the
code for certain corner cases. Noted by jrtc27.

Sample results:

$(perl -e "print 'A' x 3"):
stock:  211198039
patched:338626619
asm:    465609618

$(perl -e "print 'A' x 100"):
stock:   83151997
patched: 98285919
asm:    120719888

Reviewed by:	jhb, kib
Differential Revision:	https://reviews.freebsd.org/D28779
2021-02-21 00:43:05 +00:00
John Baldwin
67932460c7 Add a VA_IS_CLEANMAP() macro.
This macro returns true if a provided virtual address is contained
in the kernel's clean submap.

In CHERI kernels, the buffer cache and transient I/O map are allocated
as separate regions.  Abstracting this check reduces the diff relative
to FreeBSD.  It is perhaps slightly more readable as well.

Reviewed by:	kib
Obtained from:	CheriBSD
Sponsored by:	DARPA
Differential Revision:	https://reviews.freebsd.org/D28710
2021-02-17 16:32:11 -08:00
Mark Johnston
0fc8a79672 linux: Unmap the VDSO page when unloading
linux_shared_page_init() creates an object and grabs and maps a single
page to back the VDSO.  When destroying the VDSO object, we failed to
destroy the mapping and free KVA.  Fix this.

Reviewed by:	kib
MFC after:	1 week
Sponsored by:	The FreeBSD Foundation
Differential Revision:	https://reviews.freebsd.org/D28696
2021-02-16 09:40:02 -05:00
Roger Pau Monné
a2495c3667 xen/boot: allow specifying boot method when booted from Xen
Allow setting the bootmethod variable from the Xen PVH entry point, in
order to be able to correctly set the underlying firmware mode when
booted as a dom0.

Move the bootmethod variable to be defined in x86/cpu_machdep.c
instead so it can be shared by both i386 and amd64.

Sponsored by:		Citrix Systems R&D
Reviewed by:		kib
Differential revision:	https://reviews.freebsd.org/D28619
2021-02-16 15:26:11 +01:00
Edward Tomasz Napierala
cc743b050a linux: drop unneeded casts
No functional changes.

Sponsored By:	The FreeBSD Foundation
Differential Revision: https://reviews.freebsd.org/D28533
2021-02-15 13:14:15 +00:00
Mateusz Guzik
b49a0db662 Revert "amd64: implement strlen in assembly"
This reverts commit af366d353b.

Trips over '\xa4' byte and terminates early, as found in
lib/libc/gen/setdomainname_test:setdomainname_basic testcase

However, keep moving libkern/strlen.c out of conf/files.

Reported by:	lwhsu
2021-02-09 16:23:18 +01:00
Mateusz Guzik
7da3bfc20c amd64: fix up a braino in strlen comment 2021-02-08 19:24:26 +00:00
Mateusz Guzik
af366d353b amd64: implement strlen in assembly
The C variant in libkern performs excessive branching to find the
non-zero byte instead of using the bsfq instruction. The same code
patched to use it is still slower than the routine implemented here
as the compiler keeps neglecting to perform certain optimizations
(like using leaq).

On top of that the routine can is a starting point for copyinstr
which operates on words instead of bytes.

Tested with glibc test suite.

Sample results (calls/s):

Haswell:
$(perl -e "print 'A' x 3"):
stock:	211198039
patched:338626619
asm:	465609618

$(perl -e "print 'A' x 100"):
stock:	 83151997
patched: 98285919
asm:	120719888

AMD EPYC 7R32:
$(perl -e "print 'A' x 3"):
stock:	282523617
asm:	491498172

$(perl -e "print 'A' x 100"):
stock:	114857172
asm:	112082057
2021-02-08 19:15:21 +00:00
Konstantin Belousov
5832a3e398 amd64 GENERIC: compile in mlx5en(4)
Reviewed by:	hselasky, manu
Sponsored by:	NVidia Networking/Mellanox Technologies
MFC after:	3 days
Differential Revision:	https://reviews.freebsd.org/D28469
2021-02-05 03:22:26 +02:00
Muhammad Moinur Rahman
aa77662373 Add a comment notifying that "device axp" requires miibus for build.
Although if RJ-45 interface is not being used the miibus is not required
but miibus is a build time dependency.

Reviewed by:    imp, manu, rajesh1.kumar@amd.com
Approved by:    imp, manu, rajesh1.kumar@amd.com
Differential Revision:  https://reviews.freebsd.org/D28465
2021-02-04 21:05:47 +00:00
Roger Pau Monné
5ea878684f bhyve/ioapic: improve the tracking of IRR bit
One common method of EOI'ing an interrupt at the IO-APIC level is to
switch the pin to edge triggering mode and then back into level mode.
That would cause the IRR bit to be cleared and thus further interrupts
to be injected. FreeBSD does indeed use that method if the IO-APIC EOI
register is not supported.

The bhyve IO-APIC emulation code didn't clear the IRR bit when doing
that switch, and was also missing acknowledging the IRR state when
trying to inject an interrupt in vioapic_send_intr.

Reviewed by:		grehan
Differential revision:	https://reviews.freebsd.org/D28238
2021-02-02 09:47:00 +01:00
Roger Pau Monné
d7d067698a bhyve/ioapic: only account for asserted line in level mode
After modifying a redirection entry only try to inject an interrupt if
the pin is in level mode, pins in edge mode shouldn't take into
account the line assert status as they are triggered by edge changes,
not the line status itself.

Reviewed by:		grehan
Differential revision:	https://reviews.freebsd.org/D28237
2021-02-02 09:45:45 +01:00
Roger Pau Monné
49429cf9be bhyve/vioapic: remove an extra pin masked check
vioapic_send_intr does already check whether the pin is masked before
injecting the interrupt, there's no need to do it in vioapic_write
also.

No functional change intended.

Reviewed by:		grehan
Differential revision:	https://reviews.freebsd.org/D28236
2021-02-02 09:44:20 +01:00
Mateusz Guzik
aae89f6f09 amd64: use compiler intrinsics for bsf* and bsr* 2021-02-01 04:53:23 +00:00
Mateusz Guzik
f1be262ec1 amd64: move memcmp checks upfront
This is a tradeoff which saves jumps for smaller sizes while making
the 8-16 range slower (roughly in line with the other cases).

Tested with glibc test suite.

For example size 3 (most common with vfs namecache) (ops/s):
before:	407086026
after:	461391995

The regressed range of 8-16 (with 8 as example):
before:	540850489
after:	461671032
2021-01-31 16:07:20 +00:00
Mateusz Guzik
d1de5698df amd64: retire sse2_pagezero
All page zeroing is using temporal stores with rep movs*, the routine is
unused for several years.

Should a need arise for zeroing using non-temporal stores, a more
optimized variant can be implemented with a more descriptive name.
2021-01-30 00:17:15 +00:00
Mateusz Guzik
164c3b8184 amd64: add missing ALIGN_TEXT to loops in memset and memmove 2021-01-30 00:01:44 +00:00
Mateusz Guzik
b42a2ea558 Remove ndis(4) remnants from kernel configs
Unbreaks LINT kernels.
2021-01-26 00:04:13 +00:00
Edward Tomasz Napierala
c987d6a677 linux: map EBUSY returned by ptrace into ESRCH
The ptrace(2) Linux man page claims the syscall returns ESRCH,
if the tracee is not stopped; the native ptrace(2) returns EBUSY.

Sponsored by:	The FreeBSD Foundation
2021-01-19 11:21:55 +00:00
Edward Tomasz Napierala
47f7345bab linux: fix PTRACE_POKEDATA and PTRACE_POKETEXT.
Sponsored by:	The FreeBSD Foundation
2021-01-19 10:30:55 +00:00
Andrew Gallatin
efa9c21bca KTLS: Enable KERN_TLS in GENERIC on amd64
Based on discussions on freebsd-arch@, enable KERN_TLS in
GENERIC on amd64, but leave it disabled via the
sysctl kern.ipc.tls.enable.  Users wishing to enable
ktls must set kern.ipc.tls.enable=1

While here, fix wording in NOTES to mention that KERN_TLS
also does receive now.

Sponsored by:	Netflix

Reviewed by:	allanjude
Differential Revision:	https://reviews.freebsd.org/D28163
2021-01-18 13:29:10 -05:00
Vladimir Kondratyev
b62f6dfaed hid: Replace USBHID_ENABLED kernel config option with loader tunable
usbhid(4) is disabled by default to avoid conflicts with existing USB HID
drivers. To enable it place following lines to /boot/loader.conf:

hw.usb.usbhid.enable=1
usbhid_load="YES"

Suggested by:	jhb
Reviewed by:	hselasky
Differential revision:	https://reviews.freebsd.org/D28124
2021-01-14 23:04:47 +03:00
Andrew Turner
6eebda3bba Split out the NODEBUG options to a common file
This is the superset of the nooptions found in the -DEBUG kernels.

Reviewed by:	emaste, manu
Sponsored by:	Innovate UK
Differential Revision:	https://reviews.freebsd.org/D28152
2021-01-14 16:57:53 +00:00
Mateusz Guzik
37bd3aa6fa amd64: use builtins for all ffs* variants
While here even up whitespace.
2021-01-14 14:37:22 +00:00
John Baldwin
074a91f746 Enable accelerated AES-XTS software crypto in GENERIC.
In particular, using GELI on a root filesystem will only use
accelerated software crypto drivers if they are available before the
root filesystem is mounted.  While these modules can be loaded from
the loader, including them in GENERIC provides a better out-of-the-box
experience for users.

Both aesni(4) and armv8crypto(4) provide accelerated implementations
of the default cipher used by GELI (AES-XTS) in addition to other
ciphers.

Reviewed by:	mhorne, allanjude, markj
Differential Revision:	https://reviews.freebsd.org/D28100
2021-01-13 13:13:01 -08:00
Mateusz Guzik
6b3a9a0f3d Convert remaining cap_rights_init users to cap_rights_init_one
semantic patch:

@@

expression rights, r;

@@

- cap_rights_init(&rights, r)
+ cap_rights_init_one(&rights, r)
2021-01-12 13:16:10 +00:00
Mateusz Guzik
44121a0fbe amd64: fix tlb shootdown when all cpus are passed in the bitmap
Right now the routine leaves the current CPU in the map, later tripping
on an assert when filling in the scoreboard: panic: IPI scoreboard is
zero, initiator 1 target 1

Instead pre-check if all CPUs are present in the map and remember that
outcome for later.

Fixes:	7eaea04a5b ("amd64: compare TLB shootdown target to all_cpus")
Reviewed by:	kib
Differential Revision:	https://reviews.freebsd.org/D28111
2021-01-12 08:47:32 +00:00
Andrew Gallatin
7eaea04a5b amd64: compare TLB shootdown target to all_cpus
On amd64, the pmap code passes all_cpus to
smp_targeted_tlb_shootdown() when unmapping from the
kernel pmap.  This function has an optimized path to send IPIs
to all but itself, which it intends to do when the target
is all cpus.   However, we need to compare the target cpu mask
with all_cpus, rather than using CPU_ISFULLSET().  Comparing with
CPU_ISFULLSET() will only work when we have MAXCPU cpus active in
the system, otherwise, we'll be sending repeated IPIs, rather than
a single IPI to all CPUs but ourself.

Fixing this should reduce the time spent in native_lapic_ipi_wait()
as we will be sending ipis in parallel, rather than one-by-one.
This is confirmed by dtrace.

Reviewed by: alc, jhb, kib, markj
Sponsored by:	Netflix
Differential Revision:	https://reviews.freebsd.org/D28102
2021-01-11 20:09:32 -05:00
Konstantin Belousov
4174e45fb4 amd64 pmap: do not sleep in pmap_allocpte_alloc() with zero referenced page table page.
Otherwise parallel pmap_allocpte_alloc() for nearby va might also fail
allocating page table page and free the page under us.  The end result is
that we could dereference unmapped pte when doing cleanup after sleep.

Instead, on allocation failure, first free everything, only then we can
drop pmap mutex and sleep safely, right before returning to caller.
Split inner non-sleepable part of the pmap_allocpte_alloc() into a new
helper pmap_allocpte_nosleep().

Reviewed by:	markj
Reported and tested by:	pho
Sponsored by:	The FreeBSD Foundation
Differential Revision:	https://reviews.freebsd.org/D27956
2021-01-11 22:57:58 +02:00
Konstantin Belousov
9a8f5f5cf5 amd64 pmap: rename _pmap_allocpte() to pmap_allocpte_alloc().
The function performs actual allocation of pte, as opposed to
pmap_allocpte() that uses existing free pte if pt page is already
there. This also moves function out of namespace similar to a language
reserved.

Reviewed by:	markj
Tested by:	pho
Sponsored by:	The FreeBSD Foundation
Differential Revision:	https://reviews.freebsd.org/D27956
2021-01-11 22:57:58 +02:00
Konstantin Belousov
f67064e592 amd64 pmap: Remove wrong __unused annotation from the va argument.
Noted by:	alc
Reviewed by:	markj
Tested by:	pho
Sponsored by:	The FreeBSD Foundation
Differential Revision:	https://reviews.freebsd.org/D27956
2021-01-11 22:57:58 +02:00
Konstantin Belousov
9658d9c71a amd64 pmap: fix NULL deref in pmap_mincore().
pmap_pdpe() might return NULL, check for it.

Reviewed by:	markj
Reported and tested by:	pho
Sponsored by:	The FreeBSD Foundation
Differential Revision:	https://reviews.freebsd.org/D27956
2021-01-11 22:57:52 +02:00
Roger Pau Monne
ed78016d00 xen/privcmd: implement the dm op ioctl
Use an interface compatible with the Linux one so that the user-space
libraries already using the Linux interface can be used without much
modifications.

This allows user-space to make use of the dm_op family of hypercalls,
which are used by device models.

Sponsored by:	Citrix Systems R&D
2021-01-11 16:33:27 +01:00
Alan Cox
5a181b8bce Prefer the use of vm_page_domain() to vm_phys_domain().
When we already have the vm page in hand, use vm_page_domain() instead
of vm_phys_domain().  The former has a trivial constant-time
implementation whereas the latter iterates over the mem_affinity array.

Reviewed by:	kib, markj
Differential Revision:	https://reviews.freebsd.org/D28005
2021-01-10 13:25:33 -06:00
Vladimir Kondratyev
0f0379fa55 hid: Add recently imported drivers to NOTES
Reviewed by:	hselasky
Differential revision:	https://reviews.freebsd.org/D28060
2021-01-10 22:17:20 +03:00
Konstantin Belousov
45974de8fb x86: Add rdtscp32() into cpufunc.h.
Suggested by:	markj
MFC after:	1 week
Sponsored by:	The FreeBSD Foundation
Differential Revision:	https://reviews.freebsd.org/D27986
2021-01-10 04:42:34 +02:00
Konstantin Belousov
811f08c1cf amd64 pmap: add comment explaining TLB invalidation modes.
Requested and reviewed by:	alc
Discussed with:	markj
Sponsored by:	The FreeBSD Foundation
Differential Revision:	https://reviews.freebsd.org/D25815
2021-01-10 02:44:42 +02:00
Warner Losh
a21def4d56 pccard: Remove wi(4) driver
Remove wi(4). pccard is going away, and wi only supports PC Card
devices, though it has a minor amount of glue to also support
PCI cards. However, removing the one without removing the other
is hard, so the whole driver is being removed.

Relnotes: Yes
2021-01-07 20:41:06 -07:00
Vladimir Kondratyev
01f2e864f7 hid: Import usbhid - USB transport backend for HID subsystem.
This change implements hid_if.m methods for HID-over-USB protocol [1].

Also, this change adds USBHID_ENABLED kernel option which changes
device_probe() priority and adds/removes PnP records to prefer usbhid
over ums, ukbd, wmt and other USB HID device drivers and vice-versa.

The module is based on uhid(4) driver.  It is disabled by default for
now due to conflicts with existing USB HID drivers.

[1] https://www.usb.org/sites/default/files/hid1_11.pdf

Reviewed by:	hselasky
Differential revision:	https://reviews.freebsd.org/D27893
2021-01-08 02:18:43 +03:00
Vladimir Kondratyev
b1f1b07f6d hid: Import iichid - I2C transport backend for HID subsystem
This implements hid_if.m methods for HID-over-I2C protocol [1].

Following kernel options are added:

IICHID_SAMPLING - Enable support for a sampling mode as interrupt
                  resource acquisition is not always possible in a case
                  of GPIO interrupts.
IICHID_DEBUG    - Enable debug output.

The module is based on prior Marc Priggemeyer work (D16698).

[1] http://download.microsoft.com/download/7/d/d/7dd44bb7-2a7a-4505-ac1c-7227d3d96d5b/hid-over-i2c-protocol-spec-v1-0.docx

Differential revision:	https://reviews.freebsd.org/D27892
2021-01-08 02:18:43 +03:00
Vladimir Kondratyev
1975878673 hid: Import functions and constants required by new subsystem
This does an import of quirk stubs, debugging macros from USB code and
numerous usage constants used by dependent drivers.

Besides, this change renames some functions to get a better matching
with userland library and NetBSD/OpenBSD HID code. Namely:

- Old hid_report_size() renamed to hid_report_size_max()
- New hid_report_size() calculates size of given report rather than
  maximum size of all reports.
- hid_get_data_unsigned() renamed to hid_get_udata()
- hid_put_data_unsigned() renamed to hid_put_udata()

Compat shim functions are provided in usbhid.h to make possible compile
of legacy code unmodified after this change.

Reviewed by:	manu, hselasky
Differential revision:	https://reviews.freebsd.org/D27887
2021-01-08 02:18:42 +03:00
Vladimir Kondratyev
67de2db262 Factor-out hardware-independent part of USB HID support to new module
It will be used by the upcoming HID-over-i2C implementation.  Should be
no-op, except hid.ko module dependency is to be added to affected drivers.

Reviewed by:	hselasky, manu
Differential revision:	https://reviews.freebsd.org/D27867
2021-01-08 02:18:42 +03:00
Konstantin Belousov
098dbd7ff7 amd64 nmi handler: fix comment about %ebx
Reported by:	markj
MFC after:	3 days
2020-12-27 12:59:33 +02:00
Konstantin Belousov
d39f7430a6 amd64: preserve %cr2 in NMI/MCE/DBG handlers.
These handlers could interrupt code which has interrupts disabled,
and if a spurious page fault occurs during exception handler run,
we get clobbered %cr2 in higher level stack.

This is mostly a speculation, but it is based on hints from good sources.

MFC after:	1 week
Reviewed by:	markj
Sponsored by:	The FreeBSD Foundation
Differential Revision:	https://reviews.freebsd.org/D27772
2020-12-27 12:59:33 +02:00
Kristof Provost
3b5008b065 Fix amd64 GENERIC-MMCCAM kernel build
c4df8cbfde removed bvmconsole and
bvmdebug, but missed the bvmconsole entry in GENERIC-MMCCAM.
2020-12-24 22:30:52 +01:00
Mitchell Horne
962c06c5a3 gdb(4) fix x86 signal reporting
The existing values correspond to x86 exception vector numbers, but the
trap numbers used in the kernel do not match these 1-to-1. Prefer the
definitions from x86/trap.h, as they are what actually get passed to
kdb_trap(). This is of little consequence, as gdb_cpu_signal() only
reports the trap reason (signal number) to the gdb client.

This is limited to the subset of trap values for which kdb_trap() is
reachable.

Reviewed by:	kib
Discussed with:	jhb
MFC after:	1 week
Sponsored by:	NetApp, Inc.
Sponsored by:	Klara, Inc.
Differential Revision:	https://reviews.freebsd.org/D27645
2020-12-23 15:40:14 -04:00
John Baldwin
1dce7d9e7e Skip the vm.pmap.kernel_maps sysctl by default.
This sysctl node can generate very verbose output, so don't trigger it
for sysctl -a or sysctl vm.pmap.

Reviewed by:	markj, kib
Differential Revision:	https://reviews.freebsd.org/D27504
2020-12-18 20:41:23 +00:00
Mitchell Horne
72939459bd amd64: use register macros for gdb_cpu_getreg()
Prefer these newly-added definitions to bare values.

MFC after:	2 weeks
Sponsored by:	NetApp, Inc.
Sponsored by:	Klara, Inc.
2020-12-18 16:16:03 +00:00
Mitchell Horne
0ef474de88 amd64: allow gdb(4) to write to most registers
Similar to the recent patch to arm's gdb stub in r368414, allow GDB to
update the contents of most general purpose registers.

Reviewed by:	cem, jhb, markj
MFC after:	2 weeks
Sponsored by:	NetApp, Inc.
Sponsored by:	Klara, Inc.
NetApp PR:	44
Differential Revision:	https://reviews.freebsd.org/D27642
2020-12-18 16:09:24 +00:00
Konstantin Belousov
3fd989da91 amd64 pmap: fix PCID mode invalidations
When r362031 moved local TLB invalidation after shootdown IPI send, it
moved too much.  In particular, PCID-mode clearing of the pm_gen
generation counters must occur before IPIs are send, which is in fact
described by the comment before seq_cst fence in the invalidation
functions.

Fix it by extracting pm_gen clearing into new helper
pmap_invalidate_preipi(), which is executed before a call to
smp_masked_tlb_shootdown().

Rest of the local invalidation callbacks is simplified as result, and
become very similar to the remote shootdown handlers (to be merged in
some future).

Move pin of the thread to pmap_invalidate_preipi(), and do unpin in
smp_masked_tlb_shootdown().

Reported and tested by:	mjg (previous version)
Reviewed by:	alc, cem (previous version), markj
Sponsored by:	The FreeBSD Foundation
Differential revision:	https://reviews.freebsd.org/D227588
2020-12-14 22:52:29 +00:00
Alexander V. Chernikov
d5fe384b4d Enable ROUTE_MPATH support in GENERIC kernels.
Ability to load-balance traffic over multiple path is a must-have thing for routers.
It may be used by the servers to balance outgoing traffic over multiple default gateways.

The previous implementation, RADIX_MPATH stayed in the shadow for too long.
It was not well maintained, which lead us to a vicious circle - people were using
 non-contiguous mask or firewalls to achieve similar goals. As a result, some routing
 daemons implementation still don't have multipath support enabled for FreeBSD.

Turning on ROUTE_MPATH by default would fix it. It will allow to reduce networking
 feature gap to other operating systems. Linux and OpenBSD enabled similar support
 at least 5 years ago.

ROUTE_MPATH does not consume memory unless actually used. It enables around ~1k LOC.

It does not bring any behaviour changes for userland.
Additionally, feature is (temporarily) turned off by the net.route.multipath sysctl
 defaulting to 0.

Differential Revision:	https://reviews.freebsd.org/D27428
2020-12-14 22:23:08 +00:00
Brooks Davis
9ee99cec1f hme(4): Remove as previous announced
The hme (Happy Meal Ethernet) driver was the onboard NIC in most
supported sparc64 platforms. A few PCI NICs do exist, but we have seen
no evidence of use on non-sparc systems.

Reviewed by:	imp, emaste, bcr
Sponsored by:	DARPA
2020-12-11 21:40:38 +00:00
Tijl Coosemans
df4ca45cf9 Fix i386 linux module after r367395.
In r367395 parts of machine dependent linux_dummy.c were moved to a new
machine independent file sys/compat/linux/linux_dummy.c and the existing
linux_dummy.c was renamed to linux_dummy_machdep.c.

Add linux_dummy_machdep.c to the linux module for i386.
Rename sys/amd64/linux32/linux_dummy.c for consistency.
Add the new linux_dummy.c to the linux module for i386.
2020-12-05 14:53:24 +00:00
Toomas Soome
a4a10b37d4 Add VT driver for VBE framebuffer device
Implement vt_vbefb to support Vesa Bios Extensions (VBE) framebuffer with VT.
vt_vbefb is built based on vt_efifb and is assuming similar data for
initialization, use MODINFOMD_VBE_FB to identify the structure vbe_fb
in kernel metadata.

struct vbe_fb, is populated by boot loader, and is passed to kernel via
metadata payload.

Differential Revision:	https://reviews.freebsd.org/D27373
2020-11-30 08:22:40 +00:00
Konstantin Belousov
3c48106aaa bhyve: limit max GPA to VM_MAXUSER_ADDRESS_LA48.
We use 4-level EPT pages, correct the upper bound.

Reviewed by:	grehan
Sponsored by:	The FreeBSD Foundation
Differential revision:	https://reviews.freebsd.org/D27402
2020-11-29 10:32:38 +00:00
Peter Grehan
15add60d37 Convert vmm_ops calls to IFUNC
There is no need for these to be function pointers since they are
never modified post-module load.

Rename AMD/Intel ops to be more consistent.

Submitted by:	adam_fenn.io
Reviewed by:	markj, grehan
Approved by:	grehan (bhyve)
MFC after:	3 weeks
Differential Revision:	https://reviews.freebsd.org/D27375
2020-11-28 01:16:59 +00:00
Ruslan Bukin
f120ad6ba0 o Move options IOMMU from Debugging section back to the Bus section
where it originally was. The bug introduced in r366267.
o Remove options IOMMU from i386/MINIMAL as we don't have it in
  i386/GENERIC.

Reported by:	Harry Schmalzbauer <freebsd@omnilan.de>
Reviewed by:	kib
Sponsored by:	Innovate DSbD
Differential Revision:	https://reviews.freebsd.org/D27399
2020-11-27 21:37:48 +00:00
Peter Grehan
ddfc488c36 Remove manual instruction encodings for VMLOAD, VMRUN, and VMSAVE.
This is	a relic	from when these	instructions weren't supported by the toolchain.
No functional change.

Submitted by:	adam_fenn.io
Reviewed by:	grehan
Approved by:	grehan (bhyve)
MFC after:	3 days
Differential Revision:	https://reviews.freebsd.org/D27130
2020-11-26 05:58:55 +00:00
Maxim Sobolev
fd2ef8ef5a Unobfuscate "KERNLOAD" parameter on amd64. This change lines-up amd64 with the
i386 and the rest of supported architectures by defining KERNLOAD in the
vmparam.h and getting rid of magic constant in the linker script, which albeit
documented via comment but isn't programmatically accessible at a compile time.

Use KERNLOAD to eliminate another (matching) magic constant 100 lines down
inside unremarkable TU "copy.c" 3 levels deep in the EFI loader tree.

Reviewed by:	markj
Approved by:	markj
MFC after:	1 month
Differential Revision:	https://reviews.freebsd.org/D27355
2020-11-25 23:19:01 +00:00
John Baldwin
908dca3ef4 Pull the check for VM ownership into ppt_find().
This reduces some code duplication.  One behavior change is that
ppt_assign_device() will now only succeed if the device is unowned.
Previously, a device could be assigned to the same VM multiple times,
but each time it was assigned, the device's state was reset.

Reviewed by:	markj, grehan
MFC after:	2 weeks
Sponsored by:	Chelsio Communications
Differential Revision:	https://reviews.freebsd.org/D27301
2020-11-24 23:56:33 +00:00
John Baldwin
1925586e03 Honor the disabled setting for MSI-X interrupts for passthrough devices.
Add a new ioctl to disable all MSI-X interrupts for a PCI passthrough
device and invoke it if a write to the MSI-X capability registers
disables MSI-X.  This avoids leaving MSI-X interrupts enabled on the
host if a guest device driver has disabled them (e.g. as part of
detaching a guest device driver).

This was found by Chelsio QA when testing that a Linux guest could
switch from MSI-X to MSI interrupts when using the cxgb4vf driver.

While here, explicitly fail requests to enable MSI on a passthrough
device if MSI-X is enabled and vice versa.

Reported by:	Sony Arpita Das @ Chelsio
Reviewed by:	grehan, markj
MFC after:	2 weeks
Sponsored by:	Chelsio Communications
Differential Revision:	https://reviews.freebsd.org/D27212
2020-11-24 23:18:52 +00:00
Jung-uk Kim
926ce35a7e Port rtsx(4) driver for Realtek SD card reader from OpenBSD.
This driver provides support for Realtek PCI SD card readers.  It attaches
mmc(4) bus on card insertion and detaches it on card removal.  It has been
tested with RTS5209, RTS5227, RTS5229, RTS522A, RTS525A and RTL8411B.  It
should also work with RTS5249, RTL8402 and RTL8411.

PR:			204521
Submitted by:		Henri Hennebert (hlh at restart dot be)
Reviewed by:		imp, jkim
Differential Revision:	https://reviews.freebsd.org/D26435
2020-11-24 21:28:44 +00:00
Konstantin Belousov
4815f175d0 Linuxolator: Replace use of eventhandlers by sysent hooks.
Reviewed by:	markj
Sponsored by:	The FreeBSD Foundation
Differential revision:	https://reviews.freebsd.org/D27309
2020-11-23 18:18:16 +00:00
Mark Johnston
431fb8abd7 vm_phys: Try to clean up NUMA KPIs
It can useful for code outside the VM system to look up the NUMA domain
of a page backing a virtual or physical address, specifically when
creating NUMA-aware data structures.  We have _vm_phys_domain() for
this, but the leading underscore implies that it's an internal function,
and vm_phys.h has dependencies on a number of other headers.

Rename vm_phys_domain() to vm_page_domain(), and _vm_phys_domain() to
vm_phys_domain().  Make the latter an inline function.

Add _vm_phys.h and define struct vm_phys_seg there so that it's easier
to use in other headers.  Include it from vm_page.h so that
vm_page_domain() can be defined there.

Include machine/vmparam.h from _vm_phys.h since it depends directly on
some constants defined there.

Reviewed by:	alc
Reviewed by:	dougm, kib (earlier versions)
Differential Revision:	https://reviews.freebsd.org/D27207
2020-11-19 03:59:21 +00:00
Conrad Meyer
77eb984147 'make sysent' for r367773
X-MFC-With:	r367773
2020-11-17 19:53:59 +00:00
Conrad Meyer
de774e422e linux(4): Implement name_to_handle_at(), open_by_handle_at()
They are similar to our getfhat(2) and fhopen(2) syscalls.

Differential Revision:	https://reviews.freebsd.org/D27111
2020-11-17 19:51:47 +00:00
Mark Johnston
6f5a960678 vmm: Make pmap_invalidate_ept() wait synchronously for guest exits
Currently EPT TLB invalidation is done by incrementing a generation
counter and issuing an IPI to all CPUs currently running vCPU threads.
The VMM inner loop caches the most recently observed generation on each
host CPU and invalidates TLB entries before executing the VM if the
cached generation number is not the most recent value.
pmap_invalidate_ept() issues IPIs to force each vCPU to stop executing
guest instructions and reload the generation number.  However, it does
not actually wait for vCPUs to exit, potentially creating a window where
guests may continue to reference stale TLB entries.

Fix the problem by bracketing guest execution with an SMR read section
which is entered before loading the invalidation generation.  Then,
pmap_invalidate_ept() increments the current write sequence before
loading pm_active and sending IPIs, and polls readers to ensure that all
vCPUs potentially operating with stale TLB entries have exited before
pmap_invalidate_ept() returns.

Also ensure that unsynchronized loads of the generation counter are
wrapped with atomic(9), and stop (inconsistently) updating the
invalidation counter and pm_active bitmask with acquire semantics.

Reviewed by:	grehan, kib
Sponsored by:	The FreeBSD Foundation
Differential Revision:	https://reviews.freebsd.org/D26910
2020-11-11 15:01:17 +00:00
Conrad Meyer
e9b13c6612 linux(4): Deduplicate unimpl/dummy syscall handlers
No functional change.

Reviewed by:	emaste, trasz
Differential Revision:	https://reviews.freebsd.org/D27099
2020-11-05 19:30:31 +00:00
Mark Johnston
72143e89bb Add qat(4)
This provides an OpenCrypto driver for Intel QuickAssist devices.  The
driver was initially ported from NetBSD and comes with a few
improvements:
- support for GMAC/AES-GCM, AES-CTR and AES-XTS, and support for
  SHA/HMAC-authenticated encryption
- support for detaching the driver
- various bug fixes
- DH895X support

Discussed with:	jhb
MFC after:	3 days
Sponsored by:	Rubicon Communications, LLC (Netgate)
Differential Revision:	https://reviews.freebsd.org/D26963
2020-11-05 15:55:23 +00:00
Mark Johnston
cff169880e amd64: Make it easier to configure exception stack sizes
The amd64 kernel handles certain types of exceptions on a dedicated
stack.  Currently the sizes of these stacks are all hard-coded to
PAGE_SIZE, but for at least NMI handling it can be useful to use larger
stacks.  Add constants to intr_machdep.h to make this easier to tweak.

No functional change intended.

Reviewed by:	kib
MFC after:	1 week
Sponsored by:	NetApp, Inc.
Sponsored by:	Klara, Inc.
Differential Revision:	https://reviews.freebsd.org/D27076
2020-11-04 16:42:20 +00:00
Mateusz Piotrowski
a858a39b31 Fix a typo 2020-11-04 10:38:25 +00:00
Alan Cox
9b4e77cb97 Tidy up the #includes. Recent changes, such as the introduction of
VM_ALLOC_WAITOK and vm_page_unwire_noq(), have eliminated the need for
many of the #includes.

Reviewed by:	kib, markj
Differential Revision:	https://reviews.freebsd.org/D27052
2020-11-02 19:20:06 +00:00
Mateusz Guzik
82c174a3b4 malloc: delegate M_EXEC handling to dedicacted routines
It is almost never needed and adds an avoidable branch.

While here do minior clean ups in preparation for larger changes.

Reviewed by:	markj
Differential Revision:	https://reviews.freebsd.org/D27019
2020-10-30 20:02:32 +00:00
Edward Tomasz Napierala
bce7ee9d41 Drop "All rights reserved" from all my stuff. This includes
Foundation copyrights, approved by emaste@.  It does not include
files which carry other people's copyrights; if you're one
of those people, feel free to make similar change.

Reviewed by:	emaste, imp, gbe (manpages)
Differential Revision:	https://reviews.freebsd.org/D26980
2020-10-28 13:46:11 +00:00
Edward Tomasz Napierala
866b1f5147 Fix misnomer - linux_to_bsd_errno() does the exact opposite.
Reported by:	arichardson
MFC after:	2 weeks
Sponsored by:	The FreeBSD Foundation
Differential Revision:	https://reviews.freebsd.org/D26965
2020-10-27 12:49:40 +00:00
Kyle Evans
d42a83b1a9 audit: also correctly audit linux_execve()
Linux execve() gets audited as AUE_EXECVE as well, we should also interpret
the return from this correctly for the same reasoning as in r367002.

MFC with:	r367002
2020-10-26 17:30:17 +00:00
Konstantin Belousov
c0b5fcf692 Improve FPU Tag Word reconstruction on i386 to indicate register states.
Improve the code reconstructing en_tw in struct fpreg32 from FXSAVE
results so that all register states are indicated correctly.  The
previous code unconditionally mapped non-empty register state to
'normalized value' constant.  The new code explicitly distinguishes
the 'zero value' and 'special value' constants as well.  This improves
consistency between real FSAVE and translation from FXSAVE, and
ensures that tests using PT_GETFPREGS can rely on a single correct
value independently of the underlying implementation.

PR:	250454
Sponsored by:	The FreeBSD Foundation
Obtained from:	Moritz Systems
Submitted by:	Michał Górny <mgorny@moritz.systems>
Discussed with:	emaste
MFC after:	1 week
Differential revision:	https://reviews.freebsd.org/D26856
2020-10-21 00:15:12 +00:00
John Baldwin
ba610be90a Add a kernel crypto driver using assembly routines from OpenSSL.
Currently, this supports SHA1 and SHA2-{224,256,384,512} both as plain
hashes and in HMAC mode on both amd64 and i386.  It uses the SHA
intrinsics when present similar to aesni(4), but uses SSE/AVX
instructions when they are not.

Note that some files from OpenSSL that normally wrap the assembly
routines have been adapted to export methods usable by 'struct
auth_xform' as is used by existing software crypto routines.

Reviewed by:	gallatin, jkim, delphij, gnn
Sponsored by:	Netflix
Differential Revision:	https://reviews.freebsd.org/D26821
2020-10-20 17:50:18 +00:00
Mark Johnston
8e2cbc5660 vmx: Implement pmap (de)activation in C
Rewrite the code that maintains pm_active and invalidates EPTP-tagged
TLB entries in C.  Previously this work was done in vmx_enter_guest(),
in assembly, but there is no good reason for that and it makes the TLB
invalidation algorithm for nested page tables harder to review.

No functional change intended.  Now, an error from the invept
instruction results in a kernel panic rather than a vmexit.  Such errors
should occur only as a result of VMM bugs.

Reviewed by:	grehan, kib
MFC after:	2 weeks
Sponsored by:	The FreeBSD Foundation
Differential Revision:	https://reviews.freebsd.org/D26830
2020-10-19 15:24:35 +00:00
Edward Tomasz Napierala
6221ec6064 Stop calling set_syscall_retval() from linux_set_syscall_retval().
The former clobbers some registers that shouldn't be touched.

Reviewed by:	kib (earlier version)
MFC after:	2 weeks
Sponsored by:	The FreeBSD Foundation
Differential Revision:	https://reviews.freebsd.org/D26406
2020-10-18 16:16:22 +00:00
Edward Tomasz Napierala
c0d07d326f Slightly tweak linux ptrace(2) debug message; no functional changes.
MFC after:	2 weeks
Sponsored by:	The FreeBSD Foundation
Differential Revision:	https://reviews.freebsd.org/D26815
2020-10-18 15:56:47 +00:00
Konstantin Belousov
546df7a45d amd64 pmap.h: explicitly provide constants values instead of relying
on some more advanced C features.

This fixes gcc-toolchain build of exception.S.

Reported and tested by:	kevans
Sponsored by:	The FreeBSD Foundation
MFC after:	1 week
2020-10-16 16:22:32 +00:00
Mitchell Horne
ce4900bc8a Simplify preload_dump() condition
Hiding this feature behind RB_VERBOSE is gratuitous. The tunable is enough
to limit its use to only those who explicitly request it.

Suggested by:	kevans
2020-10-15 20:21:15 +00:00
Konstantin Belousov
e406235000 Fix for mis-interpretation of PCB_KERNFPU.
RIght now PCB_KERNFPU is used both as indication that kernel prepared
hardware FPU context to use and that the thread is fpu-kern
thread.  This also breaks fpu_kern_enter(FPU_KERN_NOCTX), since
fpu_kern_leave() then clears PCB_KERNFPU.

Introduce new flag PCB_KERNFPU_THR which indicates that the thread is
fpu-kern.  Do not clear PCB_KERNFPU if fpu-kern thread leaves noctx
fpu region.

Reported and tested by:	jhb (amd64)
Sponsored by:	The FreeBSD Foundation
MFC after:	1 week
Differential revision:	https://reviews.freebsd.org/D25511
2020-10-14 23:01:41 +00:00
Konstantin Belousov
d3ba71b2b1 Limit workaround for errata E400 to appropriate AMD cpus.
From Linux sources and several datasheets I looked at, it seems that
the workaround is only needed on families 0xf and 0x10.  For instance,
Ryzens do not implement the accessed MSR at all, it is documented as
reserved.  Also, hypervisors should not allow guest to put CPU into
idle state, so activate workaround only when on bare hardware.

While there, style the code:
    move MSR defines to specialreg.h
    move identification to initcpu.c

Reported by:	whu
Reviewed by:	avg
Sponsored by:	The FreeBSD Foundation
MFC after:	1 week
Differential revision:	https://reviews.freebsd.org/D26470
2020-10-14 22:57:50 +00:00
Konstantin Belousov
6f3b523c9a Avoid dump_avail[] redefinition.
Move dump_avail[] extern declaration and inlines into a new header
vm/vm_dumpset.h.  This fixes default gcc build for mips.

Reviewed by:	alc, scottph
Tested by:	kevans (previous version)
Sponsored by:	The FreeBSD Foundation
Differential revision:	https://reviews.freebsd.org/D26741
2020-10-14 22:51:40 +00:00
Tycho Nightingale
42360f5c5b eliminate possible race in parallel TLB shootdown IPI
On the target side TLB shootdown IPI handler, prevent the compiler
from performing a forward store optimization which may mask a
subsequent update to the scoreboard by the initiator.

Reported by:	Max Laier, Anton Rang
Discussed with:	kib
Sponsored by:	Dell EMC Isilon
2020-10-13 18:28:48 +00:00
Emmanuel Vadot
7113afc84c 10Gigabit Ethernet driver for AMD SoC
This patch has the driver for 10Gigabit Ethernet controller in AMD
SoC. This driver is written compatible to the Iflib framework. The
existing driver is for the old version of hardware. The submitted
driver here is for the recent versions of the hardware where the Ethernet
controller is PCI-E based.

Submitted by:	Rajesh Kumar <rajesh1.kumar@amd.com>
MFC after:	1 month
Relnotes:	yes
Differential Revision:	https://reviews.freebsd.org/D25793
2020-10-11 16:01:16 +00:00
Conrad Meyer
f8e8a06d23 random(4) FenestrasX: Push root seed version to arc4random(3)
Push the root seed version to userspace through the VDSO page, if
the RANDOM_FENESTRASX algorithm is enabled.  Otherwise, there is no
functional change.  The mechanism can be disabled with
debug.fxrng_vdso_enable=0.

arc4random(3) obtains a pointer to the root seed version published by
the kernel in the shared page at allocation time.  Like arc4random(9),
it maintains its own per-process copy of the seed version corresponding
to the root seed version at the time it last rekeyed.  On read requests,
the process seed version is compared with the version published in the
shared page; if they do not match, arc4random(3) reseeds from the
kernel before providing generated output.

This change does not implement the FenestrasX concept of PCPU userspace
generators seeded from a per-process base generator.  That change is
left for future discussion/work.

Reviewed by:	kib (previous version)
Approved by:	csprng (me -- only touching FXRNG here)
Differential Revision:	https://reviews.freebsd.org/D22839
2020-10-10 21:52:00 +00:00
Warner Losh
7e46dafa58 Create in-tree LINT files
Now that config(8) has supported include for 19 years, transition to
including the NOTES files. include support didn't exist at the time,
nor did the envvar stuff recently added. Now that it does, eliminate
the building of LINT files by just including everything you need.

Note: This may cause conflicts with updating in some cases.
	find sys -name LINT\* -rm
is suggested across this commit to remove the generated LINT
files.

Reviewed by: kevans
Differential Revision: https://reviews.freebsd.org/D26540
2020-10-09 01:48:14 +00:00
Mitchell Horne
22e6a67086 Add a routine to dump boot metadata
The boot metadata (also referred to as modinfo, or preload metadata)
provides information about the size and location of the kernel,
pre-loaded modules, and other metadata (e.g. the EFI framebuffer) to be
consumed during by the kernel during early boot. It is encoded as a
series of type-length-value entries and is usually constructed by
loader(8) and passed to the kernel. It is also faked on some
architectures when booted by other means.

Although much of the module information is available via kldstat(8),
there is no easy way to debug the metadata in its entirety. Add some
routines to parse this data and allow it to be printed to the console
during early boot or output via a sysctl.

Since the output can be lengthly, printing to the console is gated
behind the debug.dump_modinfo_at_boot kenv variable as well as the
BOOTVERBOSE flag. The sysctl to print the metadata is named
debug.dump_modinfo.

Reviewed by:	tsoome
Sponsored by:	NetApp, Inc.
Sponsored by:	Klara, Inc.
Differential Revision:	https://reviews.freebsd.org/D26687
2020-10-08 18:02:05 +00:00
Mitchell Horne
8481aab1ac Print symbol index for unsupported relocation types
It is unlikely, but possible, that an unrecognized or unsupported
relocation type is encountered while trying to load a kernel module. If
this occurs we should offer the symbol index as a hint to the user.

While here, fix some small style issues.

Reviewed by:	markj, kib (amd64 part, in D26701)
Sponsored by:	NetApp, Inc.
Sponsored by:	Klara, Inc.
2020-10-07 18:48:10 +00:00
Konstantin Belousov
df01340989 amd64: Store full 64bit of FIP/FDP for 64bit processes when using XSAVE.
If current process is 64bit, use rex-prefixed version of XSAVE
(XSAVE64).  If current process is 32bit and CPU supports saving
segment registers cs/ds in the FPU save area, use non-prefixed variant
of XSAVE.

Reported and tested by:	Michał Górny <mgorny@mgorny@moritz.systems>
PR:	250043
Reviewed by:	emaste, markj
Sponsored by:	The FreeBSD Foundation
MFC after:	1 week
Differential revision:	https://reviews.freebsd.org/D26643
2020-10-03 23:17:29 +00:00
Konstantin Belousov
9f2a3e3b0a Fix pmap_pti_add_kva() call for doublefault stack page.
After r354889 stack got struct nmi_pcpu at top, which makes IST top
not page-aligned.  Since pmap_pti_add_kva() truncates/rounds up
addresses, it erronously entered a page mapped before double fault
stack into the pti page table.

Sponsored by:	The FreeBSD Foundation
MFC after:	3 days
2020-10-03 23:11:20 +00:00
Konstantin Belousov
5e8ea68fd8 Move ctx_switch_xsave declaration to amd64 md_var.h.
Sponsored by:	The FreeBSD Foundation
MFC after:	3 days
2020-10-03 23:07:09 +00:00
Emmanuel Vadot
90b8c0ea10 Fix LINT: Add backlight to NOTES 2020-10-02 20:52:09 +00:00
Mark Johnston
494955366a Remove svn:executable from a couple of vmm(4) source files.
MFC after:	3 days
2020-10-01 22:20:29 +00:00
John Baldwin
a3f2a9c57e Clear the upper 32-bits of registers in x86_emulate_cpuid().
Per the Intel manuals, CPUID is supposed to unconditionally zero the
upper 32 bits of the involved (rax/rbx/rcx/rdx) registers.
Previously, the emulation would cast pointers to the 64-bit register
values down to `uint32_t`, which while properly manipulating the lower
bits, would leave any garbage in the upper bits uncleared.  While no
existing guest OSes seem to stumble over this in practice, the bhyve
emulation should match x86 expectations.

This was discovered through alignment warnings emitted by gcc9, while
testing it against SmartOS/bhyve.

SmartOS bug:	https://smartos.org/bugview/OS-8168
Submitted by:	Patrick Mooney
Reviewed by:	rgrimes
Differential Revision:	https://reviews.freebsd.org/D24727
2020-10-01 16:45:11 +00:00
Ruslan Bukin
6186bfbd18 Rename kernel option ACPI_DMAR to IOMMU.
This is mostly needed for a common arm64/amd64 iommu code.

Reviewed by:	kib
Sponsored by:	DARPA, AFRL
Differential Revision:	https://reviews.freebsd.org/D26587
2020-09-29 20:29:07 +00:00
Edward Tomasz Napierala
1e2521ffae Get rid of sa->narg. It serves no purpose; use sa->callp->sy_narg instead.
Reviewed by:	kib
Sponsored by:	DARPA
Differential Revision:	https://reviews.freebsd.org/D26458
2020-09-27 18:47:06 +00:00
Edward Tomasz Napierala
0c5bd5f993 Regen after r366145.
Sponsored by:	DARPA
2020-09-25 10:05:38 +00:00
Mark Johnston
78257765f2 Add a vmparam.h constant indicating pmap support for large pages.
Enable SHM_LARGEPAGE support on arm64.

Reviewed by:	alc, kib
Sponsored by:	Juniper Networks, Inc., Klara, Inc.
Differential Revision:	https://reviews.freebsd.org/D26467
2020-09-23 19:34:21 +00:00
Warner Losh
f9ba2bbe3a Use envvar rather than nonstandard hint. lines
The NOTES files have a bunch of hint lines that are removed when
generating LINT. However, we can achieve the same effect by prepending
each of the lines with 'envvar' so the NOTES files become standard
config(8) files. No functional changes as the sed script to generate
the LINT files filters these either way.

Suggested by: kevans
2020-09-23 19:18:53 +00:00
Konstantin Belousov
b82149116a amd64 pmap: More unification for psind = 1 vs 2 in pmap_enter_largepage().
Move
  pkru check
  wait for page alloc
  wire accounting update
  asserting allowed updates for valid mappings
out of psind conditions.

Also add assert that psind references supported page size.
Remove not true comment.
Avoid uneccessary page table walks from top level.

Reviewed by:	alc, markj (previous version)
Sponsored by:	The FreeBSD Foundation
MFC after:	1 week
Differential revision:	https://reviews.freebsd.org/D26513
2020-09-22 23:28:06 +00:00
D Scott Phillips
00e6614750 Sparsify the vm_page_dump bitmap
On Ampere Altra systems, the sparse population of RAM within the
physical address space causes the vm_page_dump bitmap to be much
larger than necessary, increasing the size from ~8 Mib to > 2 Gib
(and overflowing `int` for the size).

Changing the page dump bitmap also changes the minidump file
format, so changes are also necessary in libkvm.

Reviewed by:	jhb
Approved by:	scottl (implicit)
MFC after:	1 week
Sponsored by:	Ampere Computing, Inc.
Differential Revision:	https://reviews.freebsd.org/D26131
2020-09-21 22:21:59 +00:00
D Scott Phillips
ab041f713a Move vm_page_dump bitset array definition to MI code
These definitions were repeated by all architectures, with small
variations. Consolidate the common definitons in machine
independent code and use bitset(9) macros for manipulation. Many
opportunities for deduplication remain in the machine dependent
minidump logic. The only intended functional change is increasing
the bit index type to vm_pindex_t, allowing the indexing of pages
with address of 8 TiB and greater.

Reviewed by:	kib, markj
Approved by:	scottl (implicit)
MFC after:	1 week
Sponsored by:	Ampere Computing, Inc.
Differential Revision:	https://reviews.freebsd.org/D26129
2020-09-21 22:20:37 +00:00
Konstantin Belousov
6d4b6bd3ce amd64 pmap: only calculate page table page when needed.
Noted by:	alc
Reviewed by:	alc, markj
Sponsored by:	The FreeBSD Foundation
MFC after:	1 week
Differential revision:	https://reviews.freebsd.org/D26499
2020-09-21 15:53:41 +00:00
Konstantin Belousov
7149d7209e amd64 pmap: handle cases where pml4 page table page is not allocated.
Possible in LA57 pmap config.

Noted by:	alc
Reviewed by:	alc, markj
Sponsored by:	The FreeBSD Foundation
Differential revision:	https://reviews.freebsd.org/D26492
2020-09-20 22:16:24 +00:00
Mark Johnston
d26ab2bec0 Fix some nits in 1G page support in the amd64 pmap.
- Move assertions out of the main loop to avoid duplicate conditional
  expressions, and improve assertion messages.
- Fix va_next updates.  In some cases we were not doing the wraparound
  check before continuing the loop.
- Use the right va_next.  In pmap_advise() and pmap_copy() we would step
  through 1G pages 2M at a time.
- Copy 1G mappings in pmap_copy().

Reviewed by:	alc, kib
MFC with:	r365518
Sponsored by:	Juniper Networks, Inc., Klara, Inc.
Differential Revision:	https://reviews.freebsd.org/D26463
2020-09-19 15:22:04 +00:00
Eric van Gyzen
d8d2dda141 amd64 pmap_pkru_same: prev_ppr was always NULL
Fix the logic so it works as it appears.

Reported by:	Coverity
Reviewed by:	kib
MFC after:	2 weeks
Sponsored by:	Dell EMC Isilon
Differential Revision:	D26211 (in progress, so omitting full URL)
2020-09-18 20:53:40 +00:00
Mark Johnston
04636a71c6 Ensure that a protection key is selected in pmap_enter_largepage().
Reviewed by:	alc, kib
Reported by:	Coverity
MFC with:	r365518
Differential Revision:	https://reviews.freebsd.org/D26464
2020-09-18 12:30:39 +00:00
Edward Tomasz Napierala
70890254b3 Get rid of sv_errtbl and SV_ABI_ERRNO().
Reviewed by:	kib
Sponsored by:	DARPA
Differential Revision:	https://reviews.freebsd.org/D26388
2020-09-17 11:39:33 +00:00
Ed Maste
09860d44e4 bhyve: do not permit write access to VMCB / VMCS
Reported by:	Patrick Mooney
Submitted by:	jhb
Security:	CVE-2020-24718
2020-09-15 21:04:27 +00:00
Konstantin Belousov
101d5b527a bhyve: intercept AMD SVM instructions.
Intercept and report #UD to VM on SVM/AMD in case VM tried to execute an
SVM instruction.  Otherwise, SVM allows execution of them, and instructions
operate on host physical addresses despite being executed in guest mode.

Reported by:	Maxime Villard <max@m00nbsd.net>
admbug:	972
CVE:	CVE-2020-7467
Reviewed by:	grehan, markj
Differential revision:	https://reviews.freebsd.org/D26313
2020-09-15 20:22:50 +00:00
Edward Tomasz Napierala
c26391f4dd Move SV_ABI_ERRNO translation into linux-specific code, to simplify
the syscall path and declutter it a bit.  No functional changes intended.

Reviewed by:	kib (earlier version)
MFC after:	2 weeks
Sponsored by:	DARPA
Differential Revision:	https://reviews.freebsd.org/D26378
2020-09-15 16:41:21 +00:00
John Baldwin
d8dc46f6e9 Add constant for the DE_CFG MSR on AMD CPUs.
Reported by:	Patrick Mooney <pmooney@pfmooney.com>
MFC after:	2 weeks
Differential Revision:	https://reviews.freebsd.org/D25885
2020-09-11 20:32:40 +00:00
John Baldwin
385f4a5ac8 Use vmcb_read/write for the vmcb snapshot functions.
This avoids some unnecessary layers of indirection.
2020-09-10 22:22:23 +00:00
Konstantin Belousov
6cadbcd203 Add pmap_enter(9) PMAP_ENTER_LARGEPAGE flag and implement it on amd64.
The flag requests entry of non-managed superpage mapping of size
pagesizes[psind] into the page table.

Pmap supports fake wiring of the largepage mappings.  Only attributes
of the largepage mapping can be changed by calling pmap_enter(9) over
existing mapping, physical address of the page must be unchanged.

Reviewed by:	markj
Tested by:	pho
Sponsored by:	The FreeBSD Foundation
MFC after:	1 week
Differential revision:	https://reviews.freebsd.org/D24652
2020-09-09 21:50:24 +00:00