Commit Graph

8301 Commits

Author SHA1 Message Date
Emmanuel Vadot
90b8c0ea10 Fix LINT: Add backlight to NOTES 2020-10-02 20:52:09 +00:00
Mark Johnston
494955366a Remove svn:executable from a couple of vmm(4) source files.
MFC after:	3 days
2020-10-01 22:20:29 +00:00
John Baldwin
a3f2a9c57e Clear the upper 32-bits of registers in x86_emulate_cpuid().
Per the Intel manuals, CPUID is supposed to unconditionally zero the
upper 32 bits of the involved (rax/rbx/rcx/rdx) registers.
Previously, the emulation would cast pointers to the 64-bit register
values down to `uint32_t`, which while properly manipulating the lower
bits, would leave any garbage in the upper bits uncleared.  While no
existing guest OSes seem to stumble over this in practice, the bhyve
emulation should match x86 expectations.

This was discovered through alignment warnings emitted by gcc9, while
testing it against SmartOS/bhyve.

SmartOS bug:	https://smartos.org/bugview/OS-8168
Submitted by:	Patrick Mooney
Reviewed by:	rgrimes
Differential Revision:	https://reviews.freebsd.org/D24727
2020-10-01 16:45:11 +00:00
Ruslan Bukin
6186bfbd18 Rename kernel option ACPI_DMAR to IOMMU.
This is mostly needed for a common arm64/amd64 iommu code.

Reviewed by:	kib
Sponsored by:	DARPA, AFRL
Differential Revision:	https://reviews.freebsd.org/D26587
2020-09-29 20:29:07 +00:00
Edward Tomasz Napierala
1e2521ffae Get rid of sa->narg. It serves no purpose; use sa->callp->sy_narg instead.
Reviewed by:	kib
Sponsored by:	DARPA
Differential Revision:	https://reviews.freebsd.org/D26458
2020-09-27 18:47:06 +00:00
Edward Tomasz Napierala
0c5bd5f993 Regen after r366145.
Sponsored by:	DARPA
2020-09-25 10:05:38 +00:00
Mark Johnston
78257765f2 Add a vmparam.h constant indicating pmap support for large pages.
Enable SHM_LARGEPAGE support on arm64.

Reviewed by:	alc, kib
Sponsored by:	Juniper Networks, Inc., Klara, Inc.
Differential Revision:	https://reviews.freebsd.org/D26467
2020-09-23 19:34:21 +00:00
Warner Losh
f9ba2bbe3a Use envvar rather than nonstandard hint. lines
The NOTES files have a bunch of hint lines that are removed when
generating LINT. However, we can achieve the same effect by prepending
each of the lines with 'envvar' so the NOTES files become standard
config(8) files. No functional changes as the sed script to generate
the LINT files filters these either way.

Suggested by: kevans
2020-09-23 19:18:53 +00:00
Konstantin Belousov
b82149116a amd64 pmap: More unification for psind = 1 vs 2 in pmap_enter_largepage().
Move
  pkru check
  wait for page alloc
  wire accounting update
  asserting allowed updates for valid mappings
out of psind conditions.

Also add assert that psind references supported page size.
Remove not true comment.
Avoid uneccessary page table walks from top level.

Reviewed by:	alc, markj (previous version)
Sponsored by:	The FreeBSD Foundation
MFC after:	1 week
Differential revision:	https://reviews.freebsd.org/D26513
2020-09-22 23:28:06 +00:00
D Scott Phillips
00e6614750 Sparsify the vm_page_dump bitmap
On Ampere Altra systems, the sparse population of RAM within the
physical address space causes the vm_page_dump bitmap to be much
larger than necessary, increasing the size from ~8 Mib to > 2 Gib
(and overflowing `int` for the size).

Changing the page dump bitmap also changes the minidump file
format, so changes are also necessary in libkvm.

Reviewed by:	jhb
Approved by:	scottl (implicit)
MFC after:	1 week
Sponsored by:	Ampere Computing, Inc.
Differential Revision:	https://reviews.freebsd.org/D26131
2020-09-21 22:21:59 +00:00
D Scott Phillips
ab041f713a Move vm_page_dump bitset array definition to MI code
These definitions were repeated by all architectures, with small
variations. Consolidate the common definitons in machine
independent code and use bitset(9) macros for manipulation. Many
opportunities for deduplication remain in the machine dependent
minidump logic. The only intended functional change is increasing
the bit index type to vm_pindex_t, allowing the indexing of pages
with address of 8 TiB and greater.

Reviewed by:	kib, markj
Approved by:	scottl (implicit)
MFC after:	1 week
Sponsored by:	Ampere Computing, Inc.
Differential Revision:	https://reviews.freebsd.org/D26129
2020-09-21 22:20:37 +00:00
Konstantin Belousov
6d4b6bd3ce amd64 pmap: only calculate page table page when needed.
Noted by:	alc
Reviewed by:	alc, markj
Sponsored by:	The FreeBSD Foundation
MFC after:	1 week
Differential revision:	https://reviews.freebsd.org/D26499
2020-09-21 15:53:41 +00:00
Konstantin Belousov
7149d7209e amd64 pmap: handle cases where pml4 page table page is not allocated.
Possible in LA57 pmap config.

Noted by:	alc
Reviewed by:	alc, markj
Sponsored by:	The FreeBSD Foundation
Differential revision:	https://reviews.freebsd.org/D26492
2020-09-20 22:16:24 +00:00
Mark Johnston
d26ab2bec0 Fix some nits in 1G page support in the amd64 pmap.
- Move assertions out of the main loop to avoid duplicate conditional
  expressions, and improve assertion messages.
- Fix va_next updates.  In some cases we were not doing the wraparound
  check before continuing the loop.
- Use the right va_next.  In pmap_advise() and pmap_copy() we would step
  through 1G pages 2M at a time.
- Copy 1G mappings in pmap_copy().

Reviewed by:	alc, kib
MFC with:	r365518
Sponsored by:	Juniper Networks, Inc., Klara, Inc.
Differential Revision:	https://reviews.freebsd.org/D26463
2020-09-19 15:22:04 +00:00
Eric van Gyzen
d8d2dda141 amd64 pmap_pkru_same: prev_ppr was always NULL
Fix the logic so it works as it appears.

Reported by:	Coverity
Reviewed by:	kib
MFC after:	2 weeks
Sponsored by:	Dell EMC Isilon
Differential Revision:	D26211 (in progress, so omitting full URL)
2020-09-18 20:53:40 +00:00
Mark Johnston
04636a71c6 Ensure that a protection key is selected in pmap_enter_largepage().
Reviewed by:	alc, kib
Reported by:	Coverity
MFC with:	r365518
Differential Revision:	https://reviews.freebsd.org/D26464
2020-09-18 12:30:39 +00:00
Edward Tomasz Napierala
70890254b3 Get rid of sv_errtbl and SV_ABI_ERRNO().
Reviewed by:	kib
Sponsored by:	DARPA
Differential Revision:	https://reviews.freebsd.org/D26388
2020-09-17 11:39:33 +00:00
Ed Maste
09860d44e4 bhyve: do not permit write access to VMCB / VMCS
Reported by:	Patrick Mooney
Submitted by:	jhb
Security:	CVE-2020-24718
2020-09-15 21:04:27 +00:00
Konstantin Belousov
101d5b527a bhyve: intercept AMD SVM instructions.
Intercept and report #UD to VM on SVM/AMD in case VM tried to execute an
SVM instruction.  Otherwise, SVM allows execution of them, and instructions
operate on host physical addresses despite being executed in guest mode.

Reported by:	Maxime Villard <max@m00nbsd.net>
admbug:	972
CVE:	CVE-2020-7467
Reviewed by:	grehan, markj
Differential revision:	https://reviews.freebsd.org/D26313
2020-09-15 20:22:50 +00:00
Edward Tomasz Napierala
c26391f4dd Move SV_ABI_ERRNO translation into linux-specific code, to simplify
the syscall path and declutter it a bit.  No functional changes intended.

Reviewed by:	kib (earlier version)
MFC after:	2 weeks
Sponsored by:	DARPA
Differential Revision:	https://reviews.freebsd.org/D26378
2020-09-15 16:41:21 +00:00
John Baldwin
d8dc46f6e9 Add constant for the DE_CFG MSR on AMD CPUs.
Reported by:	Patrick Mooney <pmooney@pfmooney.com>
MFC after:	2 weeks
Differential Revision:	https://reviews.freebsd.org/D25885
2020-09-11 20:32:40 +00:00
John Baldwin
385f4a5ac8 Use vmcb_read/write for the vmcb snapshot functions.
This avoids some unnecessary layers of indirection.
2020-09-10 22:22:23 +00:00
Konstantin Belousov
6cadbcd203 Add pmap_enter(9) PMAP_ENTER_LARGEPAGE flag and implement it on amd64.
The flag requests entry of non-managed superpage mapping of size
pagesizes[psind] into the page table.

Pmap supports fake wiring of the largepage mappings.  Only attributes
of the largepage mapping can be changed by calling pmap_enter(9) over
existing mapping, physical address of the page must be unchanged.

Reviewed by:	markj
Tested by:	pho
Sponsored by:	The FreeBSD Foundation
MFC after:	1 week
Differential revision:	https://reviews.freebsd.org/D24652
2020-09-09 21:50:24 +00:00
Konstantin Belousov
fceae0fd3c Fix assert.
Noted by:	alc
MFC after:	1 week
2020-09-09 21:35:44 +00:00
Konstantin Belousov
4ebfc4edaf amd64 pmap: teach functions walking user page tables about PG_PS bit in PDPE.
Only unmanaged 1G superpages are handled.

Reviewed by:	markj
Tested by:	pho
Sponsored by:	The FreeBSD Foundation
MFC after:	1 week
Differential revision:	https://reviews.freebsd.org/D24652
2020-09-09 21:08:45 +00:00
Konstantin Belousov
6e64bebb6f amd64: report support for 1G superpages in getpagesizes(2).
Reviewed by:	markj
Tested by:	pho
Sponsored by:	The FreeBSD Foundation
MFC after:	1 week
Differential revision:	https://reviews.freebsd.org/D24652
2020-09-09 21:01:36 +00:00
Mark Johnston
847ab36bf2 Include the psind in data returned by mincore(2).
Currently we use a single bit to indicate whether the virtual page is
part of a superpage.  To support a forthcoming implementation of
non-transparent 1GB superpages, it is useful to provide more detailed
information about large page sizes.

The change converts MINCORE_SUPER into a mask for MINCORE_PSIND(psind)
values, indicating a mapping of size psind, where psind is an index into
the pagesizes array returned by getpagesizes(3), which in turn comes
from the hw.pagesizes sysctl.  MINCORE_PSIND(1) is equal to the old
value of MINCORE_SUPER.

For now, two bits are used to record the page size, permitting values
of MAXPAGESIZES up to 4.

Reviewed by:	alc, kib
Sponsored by:	Juniper Networks, Inc.
Sponsored by:	Klara, Inc.
Differential Revision:	https://reviews.freebsd.org/D26238
2020-09-02 18:16:43 +00:00
Mark Johnston
2d838cd867 Add the MEM_EXTRACT_PADDR ioctl to /dev/mem.
This allows privileged userspace processes to find information about the
physical page backing a given mapping.  It is useful in applications
such as DPDK which perform some of their own memory management.

Reviewed by:	kib, jhb (previous version)
MFC after:	2 weeks
Sponsored by:	Juniper Networks, Inc.
Sponsored by:	Klara Inc.
Differential Revision:	https://reviews.freebsd.org/D26237
2020-09-02 18:12:47 +00:00
Konstantin Belousov
8f8838c059 Fix a page table pages leak after LA57.
If the call to _pmap_allocpte() is not sleepable, it is possible that
allocation of PML4 or PDP page is successful but either PDP or PD page
is not.  Restructured code in _pmap_allocpte() leaves zero-referenced
page in the paging structure.

Handle it by checking refcount of the page one level above failed
alloc and free that page if its reference count is zero.

Reported and tested by:	pho
Reviewed by:	markj
Sponsored by:	The FreeBSD Foundation
Differential revision:	https://reviews.freebsd.org/D26293
2020-09-02 15:55:16 +00:00
Mateusz Guzik
543769bf83 amd64: clean up empty lines in .c and .h files 2020-09-01 21:16:54 +00:00
Matt Macy
fb702b4446 ZFS: clarify dependencies for static linking 2020-08-28 17:06:35 +00:00
Konstantin Belousov
f1e1c4ad73 Restore workaround for sysret fault on non-canonical address after LA57.
Sponsored by:	The FreeBSD Foundation
2020-08-24 22:12:45 +00:00
Peter Grehan
a333a508a2 cpu_auxmsr: assert caller is preventing CPU migration.
Submitted by:	Adam Fenn (adam at fenn dot io)
Requested by:	kib
Reviewed by:	kib, grehan
Approved by:	kib
MFC after:	3 weeks
Differential Revision:	https://reviews.freebsd.org/D26166
2020-08-24 11:49:49 +00:00
Konstantin Belousov
f446480b5f amd64: Handle 5-level paging on wakeup.
We can switch into long mode directly with LA57 enabled.

Sponsored by:	The FreeBSD Foundation
Differential revision:	https://reviews.freebsd.org/D25273
2020-08-23 20:43:23 +00:00
Konstantin Belousov
177622f1fd amd64: Handle 5-level paging for efirt calls.
Sponsored by:	The FreeBSD Foundation
Differential revision:	https://reviews.freebsd.org/D25273
2020-08-23 20:40:35 +00:00
Konstantin Belousov
f3eb12e4a6 Add bhyve support for LA57 guest mode.
Noted and reviewed by:	grehan
Sponsored by:	The FreeBSD Foundation
Differential revision:	https://reviews.freebsd.org/D25273
2020-08-23 20:37:21 +00:00
Konstantin Belousov
25f2da2e64 Add amd64 procctl(2) ops to manage forced LA48/LA57 VA after exec.
Tested by:	pho (LA48 hardware)
Sponsored by:	The FreeBSD Foundation
Differential revision:	https://reviews.freebsd.org/D25273
2020-08-23 20:32:13 +00:00
Konstantin Belousov
9ce875d9b5 amd64 pmap: LA57 AKA 5-level paging
Since LA57 was moved to the main SDM document with revision 072, it
seems that we should have a support for it, and silicons are coming.

This patch makes pmap support both LA48 and LA57 hardware.  The
selection of page table level is done at startup, kernel always
receives control from loader with 4-level paging.  It is not clear how
UEFI spec would adapt LA57, for instance it could hand out control in
LA57 mode sometimes.

To switch from LA48 to LA57 requires turning off long mode, requesting
LA57 in CR4, then re-entering long mode.  This is somewhat delicate
and done in pmap_bootstrap_la57().  AP startup in LA57 mode is much
easier, we only need to toggle a bit in CR4 and load right value in CR3.

I decided to not change kernel map for now.  Single PML5 entry is
created that points to the existing kernel_pml4 (KML4Phys) page, and a
pml5 entry to create our recursive mapping for vtopte()/vtopde().
This decision is motivated by the fact that we cannot overcommit for
KVA, so large space there is unusable until machines start providing
wider physical memory addressing.  Another reason is that I do not
want to break our fragile autotuning, so the KVA expansion is not
included into this first step.  Nice side effect is that minidumps are
compatible.

On the other hand, (very) large address space is definitely
immediately useful for some userspace applications.

For userspace, numbering of pte entries (or page table pages) is
always done for 5-level structures even if we operate in 4-level mode.
The pmap_is_la57() function is added to report the mode of the
specified pmap, this is done not to allow simultaneous 4-/5-levels
(which is not allowed by hw), but to accomodate for EPT which has
separate level control and in principle might not allow 5-leve EPT
despite x86 paging supports it. Anyway, it does not seems critical to
have 5-level EPT support now.

Tested by:	pho (LA48 hardware)
Reviewed by:	alc
Sponsored by:	The FreeBSD Foundation
Differential revision:	https://reviews.freebsd.org/D25273
2020-08-23 20:19:04 +00:00
Eric van Gyzen
d17136fc3d amd64 pmap: potential integer overflowing expression
Coverity has identified the line in this change as "Potential integer
overflowing expression" due to the variable i declared as an int
and used in an expression with vm_paddr_t, a 64bit variable.

This change has very little effect as when this line is execute
nkpt is small and phys_addr is a the beginning of physical memory.
But there is no explicit protection that the above is true.

Submitted by:	bret_ketchum@dell.com
Reported by:	Coverity
Reviewed by:	markj
MFC after:	2 weeks
Sponsored by:	Dell EMC Isilon
Differential Revision:	https://reviews.freebsd.org/D26141
2020-08-21 14:22:32 +00:00
Mark Johnston
72c7f24c8d Use pmap_mapbios() to map ACPI tables on amd64 and i386.
The ACPI table-mapping code used pmap_kenter_temporary() to create
mappings, which in turn uses the fixed-size crashdump map.  Moreover,
the code was not verifying that the table fits in this map, so when
mapping large tables we could clobber adjacent mappings.  This use of
pmap_kenter_temporary() appears to predate support in pmap_mapbios() for
creating early mappings, but that restriction no longer applies.

PR:		248746
Reviewed by:	kib, mav
Tested by:	gallatin, Curtis Villamizar <curtis@ipv6.occnc.com>
MFC after:	3 days
Sponsored by:	The FreeBSD Foundation
Differential Revision:	https://reviews.freebsd.org/D26125
2020-08-20 00:52:53 +00:00
Alexander Motin
8f6355b51d Remove some noisy ACPI tables messages from verbose dmesg.
Those messages were printed hundreds of times during boot, often multiple
times for each table.  We already print information about the tables in
more organized form once to not duplicate it when random ACPI drivers are
attaching.

MFC after:	1 week
2020-08-19 16:09:36 +00:00
Mateusz Guzik
a125ed50a6 linux: add sysctl compat.linux.use_emul_path
This is a step towards facilitating jails with only Linux binaries.
Supporting emul_path adds path lookups which are completely spurious
if the binary at hand runs in a Linux-based root directory.

It defaults to on (== current behavior).

make -C /root/linux-5.3-rc8 -s -j 1 bzImage:

use_emul_path=1: 101.65s user 68.68s system 100% cpu 2:49.62 total
use_emul_path=0: 101.41s user 64.32s system 100% cpu 2:45.02 total
2020-08-18 22:04:22 +00:00
Mateusz Guzik
d5e3895ea4 linux: consistently use LFREEPATH instead of open-coding it 2020-08-18 22:03:55 +00:00
Peter Grehan
3a3f1e9dfa Export a routine to provide the TSC_AUX MSR value and use this in vmm.
Also, drop an unnecessary set of braces.

Requested by:	kib
Reviewed by:	kib
MFC after:	3 weeks
2020-08-18 11:36:38 +00:00
Peter Grehan
f5f5f1e7d6 Support guest rdtscp and rdpid instructions on Intel VT-x
Enable any of rdtscp and/or rdpid for bhyve guests on Intel-based hosts
that support the "enable RDTSCP" VM-execution control.

Submitted by:	adam_fenn.io
Reported by:	chuck
Reviewed by:	chuck, grehan, jhb
Approved by:	jhb (bhyve), grehan
MFC after:	3 weeks
Relnotes:	Yes
Differential Revision:	https://reviews.freebsd.org/D26003
2020-08-18 07:23:47 +00:00
Peter Grehan
46567b4f5e Allow guest device MMIO access from bootmem memory segments.
Recent versions of UEFI have moved local APIC timer initialization into
the early SEC phase which runs out of ROM, prior to self-relocating
into RAM. This results in a hypervisor exit.

Currently bhyve prevents instruction emulation from segments that aren't
marked as "sysmem" aka guest RAM, with the vm_gpa_hold() routine failing.
However, there is no reason for this restriction: the hypervisor already
controls whether EPT mappings are marked as executable.

Fix by dropping the redundant check of sysmem.

MFC after:	3 weeks
Differential Revision:	https://reviews.freebsd.org/D25955
2020-08-18 07:08:17 +00:00
Ruslan Bukin
c4cd699010 o Add machine/iommu.h and include MD iommu headers from it,
so we don't ifdef for every arch in busdma_iommu.c;
o No need to include specialreg.h for x86, remove it.

Requested by:	andrew
Reviewed by:	kib
Sponsored by:	DARPA/AFRL
Differential Revision:	https://reviews.freebsd.org/D25957
2020-08-05 19:11:31 +00:00
Alexander Motin
aba10e131f Allow swi_sched() to be called from NMI context.
For purposes of handling hardware error reported via NMIs I need a way to
escape NMI context, being too restrictive to do something significant.

To do it this change introduces new swi_sched() flag SWI_FROMNMI, making
it careful about used KPIs.  On platforms allowing IPI sending from NMI
context (x86 for now) it immediately wakes clk_intr_event via new IPI_SWI,
otherwise it works just like SWI_DELAY.  To handle the delayed SWIs this
patch calls clk_intr_event on every hardclock() tick.

MFC after:	2 weeks
Sponsored by:	iXsystems, Inc.
Differential Revision:	https://reviews.freebsd.org/D25754
2020-07-25 15:19:38 +00:00
Alex Richardson
b798ef6490 Include TMPFS in all the GENERIC kernel configs
Being able to use tmpfs without kernel modules is very useful when building
small MFS_ROOT kernels without a real file system.
Including TMPFS also matches arm/GENERIC and the MIPS std.MALTA configs.

Compiling TMPFS only adds 4 .c files so this should not make much of a
difference to NO_MODULES build times (as we do for our minimal RISC-V
images).

Reviewed By: br (earlier version for riscv), brooks, emaste
Differential Revision: https://reviews.freebsd.org/D25317
2020-07-24 08:40:04 +00:00
Alexander Motin
ce53f590ca Untie nmi_handle_intr() from DEV_ISA.
The only part of nmi_handle_intr() depending on ISA is isa_nmi(), which is
already wrapped.  Entering debugger on NMI does not really depend on ISA.

MFC after:	2 weeks
2020-07-22 20:15:21 +00:00