Commit Graph

2082 Commits

Author SHA1 Message Date
Konstantin Belousov
aa3ea612be x86: remove gcov kernel support
Reviewed by:	jhb
Sponsored by:	The FreeBSD Foundation
Differential revision:	https://reviews.freebsd.org/D29529
2021-04-02 15:41:51 +03:00
Mitchell Horne
7446b0888d gdb: report specific stop reason for watchpoints
The remote protocol allows for implementations to report more specific
reasons for the break in execution back to the client [1]. This is
entirely optional, so it is only implemented for amd64, arm64, and i386
at the moment.

[1] https://sourceware.org/gdb/current/onlinedocs/gdb/Stop-Reply-Packets.html

Reviewed by:	jhb
MFC after:	3 weeks
Sponsored by:	NetApp, Inc.
Sponsored by:	Klara, Inc.
NetApp PR:	51
Differential Revision:	https://reviews.freebsd.org/D29174
2021-03-30 11:36:41 -03:00
Mitchell Horne
15dc1d4452 x86: implement kdb watchpoint functions
Add wrappers around the dbreg interface that can be consumed by MI
kernel debugger code. The dbreg functions themselves are updated to
return error codes, not just -1. dbreg_set_watchpoint() is extended to
accept access bits as an argument.

Reviewed by:	jhb, kib, markj
MFC after:	3 weeks
Sponsored by:	NetApp, Inc.
Sponsored by:	Klara, Inc.
Differential Revision:	https://reviews.freebsd.org/D29155
2021-03-29 12:05:43 -03:00
Mark Johnston
7ae2e70336 amd64: Make KPDPphys local to pmap.c
MFC after:	1 week
Sponsored by:	The FreeBSD Foundation
2021-03-24 09:57:31 -04:00
Mark Johnston
3ead60236f Generalize bus_space(9) and atomic(9) sanitizer interceptors
Make it easy to define interceptors for new sanitizer runtimes, rather
than assuming KCSAN.  Lay a bit of groundwork for KASAN and KMSAN.

When a sanitizer is compiled in, atomic(9) and bus_space(9) definitions
in atomic_san.h are used by default instead of the inline
implementations in the platform's atomic.h.  These definitions are
implemented in the sanitizer runtime, which includes
machine/{atomic,bus}.h with SAN_RUNTIME defined to pull in the actual
implementations.

No functional change intended.

MFC after:	1 month
Sponsored by:	The FreeBSD Foundation
2021-03-22 22:21:53 -04:00
Jason A. Harmening
d22883d715 Remove PCPU_INC
e4b8deb222 removed the last in-tree uses of PCPU_INC().  Its
potential benefit is also practically nonexistent.  Non-x86
platforms already implement it as PCPU_ADD(..., 1), and according
to [0] there are no recent x86 processors for which the 'inc'
instruction provides a performance benefit over the equivalent
memory-operand form of the 'add' instruction.  The only remaining
benefit of 'inc' is smaller instruction size, which in this case
is inconsequential given the limited number of per-CPU data consumers.

[0]: https://www.agner.org/optimize/instruction_tables.pdf

Reviewed by:	kib
Differential Revision:	https://reviews.freebsd.org/D29308
2021-03-20 19:23:59 -07:00
D Scott Phillips
f8a6ec2d57 bhyve: support relocating fbuf and passthru data BARs
We want to allow the UEFI firmware to enumerate and assign
addresses to PCI devices so we can boot from NVMe[1]. Address
assignment of PCI BARs is properly handled by the PCI emulation
code in general, but a few specific cases need additional support.
fbuf and passthru map additional objects into the guest physical
address space and so need to handle address updates. Here we add a
callback to emulated PCI devices to inform them of a BAR
configuration change. fbuf and passthru then watch for these BAR
changes and relocate the frame buffer memory segment and passthru
device mmio area respectively.

We also add new VM_MUNMAP_MEMSEG and VM_UNMAP_PPTDEV_MMIO ioctls
to vmm(4) to facilitate the unmapping needed for address updates.

[1]: https://github.com/freebsd/uefi-edk2/pull/9/

Originally by:	scottph
MFC After:	1 week
Sponsored by:	Intel Corporation
Reviewed by:	grehan
Approved by:	philip (mentor)
Differential Revision:	https://reviews.freebsd.org/D24066
2021-03-19 11:04:36 +08:00
Jason A. Harmening
e4b8deb222 amd64 pmap: convert to counter(9), add PV and pagetable page counts
This change converts most of the counters in the amd64 pmap from
global atomics to scalable counter(9) counters.  Per discussion
with kib@, it also removes the handrolled per-CPU PCID save count
as it isn't considered generally useful.

The bulk of these counters remain guarded by PV_STATS, as it seems
unlikely that they will be useful outside of very specific debugging
scenarios.  However, this change does add two new counters that
are available without PV_STATS.  pt_page_count and pv_page_count
track the number of active page table pages and physical-to-virtual
list pages, respectively.  These will be useful in evaluating
the memory footprint of pmap structures under various workloads,
which will help to guide future changes in this area.

Reviewed by:	kib
Differential Revision:	https://reviews.freebsd.org/D28923
2021-03-09 09:27:10 -08:00
Mark Johnston
435c7cfb24 Rename _cscan_atomic.h and _cscan_bus.h to atomic_san.h and bus_san.h
Other kernel sanitizers (KMSAN, KASAN) require interceptors as well, so
put these in a more generic place as a step towards importing the other
sanitizers.

No functional change intended.

MFC after:	1 week
Sponsored by:	The FreeBSD Foundation
Differential Revision:	https://reviews.freebsd.org/D29103
2021-03-08 12:39:06 -05:00
Andrew Turner
3fd63ddfdf Limit when we call DELAY from KCSAN on amd64
In some cases the DELAY implementation on amd64 can recurse on a spin
mutex in the i8254 early delay code. Detect when this is going to
happen and don't call delay in this case. It is safe to not delay here
with the only issue being KCSAN may not detect data races.

Reviewed by:	kib
Tested by:	arichardson
Sponsored by:	Innovate UK
Differential Revision:	https://reviews.freebsd.org/D28895
2021-02-25 12:38:05 +00:00
Allan Jude
d0673fe160 smbios: Move smbios driver out from x86 machdep code
Add it to the x86 GENERIC and MINIMAL kernels

Sponsored by:	Ampere Computing LLC
Submitted by:	Klara Inc.
Reviewed by:	rpokala
Differential Revision:	https://reviews.freebsd.org/D28738
2021-02-23 21:17:09 +00:00
Mateusz Guzik
aae89f6f09 amd64: use compiler intrinsics for bsf* and bsr*
2021-02-01 04:53:23 +00:00
Mateusz Guzik
d1de5698df amd64: retire sse2_pagezero
All page zeroing uses temporal stores with rep movs*; the routine has
been unused for several years.

Should a need arise for zeroing using non-temporal stores, a more
optimized variant can be implemented with a more descriptive name.
2021-01-30 00:17:15 +00:00
Mateusz Guzik
37bd3aa6fa amd64: use builtins for all ffs* variants
While here even up whitespace.
2021-01-14 14:37:22 +00:00
Roger Pau Monne
ed78016d00 xen/privcmd: implement the dm op ioctl
Use an interface compatible with the Linux one so that user-space
libraries already using the Linux interface can be used without much
modification.

This allows user-space to make use of the dm_op family of hypercalls,
which are used by device models.

Sponsored by:	Citrix Systems R&D
2021-01-11 16:33:27 +01:00
Konstantin Belousov
45974de8fb x86: Add rdtscp32() into cpufunc.h.
Suggested by:	markj
MFC after:	1 week
Sponsored by:	The FreeBSD Foundation
Differential Revision:	https://reviews.freebsd.org/D27986
2021-01-10 04:42:34 +02:00
Mitchell Horne
72939459bd amd64: use register macros for gdb_cpu_getreg()
Prefer these newly-added definitions to bare values.

MFC after:	2 weeks
Sponsored by:	NetApp, Inc.
Sponsored by:	Klara, Inc.
2020-12-18 16:16:03 +00:00
Mitchell Horne
0ef474de88 amd64: allow gdb(4) to write to most registers
Similar to the recent patch to arm's gdb stub in r368414, allow GDB to
update the contents of most general purpose registers.

Reviewed by:	cem, jhb, markj
MFC after:	2 weeks
Sponsored by:	NetApp, Inc.
Sponsored by:	Klara, Inc.
NetApp PR:	44
Differential Revision:	https://reviews.freebsd.org/D27642
2020-12-18 16:09:24 +00:00
Peter Grehan
15add60d37 Convert vmm_ops calls to IFUNC
There is no need for these to be function pointers since they are
never modified post-module load.

Rename AMD/Intel ops to be more consistent.

Submitted by:	adam_fenn.io
Reviewed by:	markj, grehan
Approved by:	grehan (bhyve)
MFC after:	3 weeks
Differential Revision:	https://reviews.freebsd.org/D27375
2020-11-28 01:16:59 +00:00
Maxim Sobolev
fd2ef8ef5a Unobfuscate "KERNLOAD" parameter on amd64. This change lines up amd64 with
i386 and the rest of the supported architectures by defining KERNLOAD in
vmparam.h and getting rid of the magic constant in the linker script, which,
although documented in a comment, isn't programmatically accessible at compile time.

Use KERNLOAD to eliminate another (matching) magic constant 100 lines down
inside unremarkable TU "copy.c" 3 levels deep in the EFI loader tree.

Reviewed by:	markj
Approved by:	markj
MFC after:	1 month
Differential Revision:	https://reviews.freebsd.org/D27355
2020-11-25 23:19:01 +00:00
John Baldwin
1925586e03 Honor the disabled setting for MSI-X interrupts for passthrough devices.
Add a new ioctl to disable all MSI-X interrupts for a PCI passthrough
device and invoke it if a write to the MSI-X capability registers
disables MSI-X.  This avoids leaving MSI-X interrupts enabled on the
host if a guest device driver has disabled them (e.g. as part of
detaching a guest device driver).

This was found by Chelsio QA when testing that a Linux guest could
switch from MSI-X to MSI interrupts when using the cxgb4vf driver.

While here, explicitly fail requests to enable MSI on a passthrough
device if MSI-X is enabled and vice versa.

Reported by:	Sony Arpita Das @ Chelsio
Reviewed by:	grehan, markj
MFC after:	2 weeks
Sponsored by:	Chelsio Communications
Differential Revision:	https://reviews.freebsd.org/D27212
2020-11-24 23:18:52 +00:00
Mark Johnston
6f5a960678 vmm: Make pmap_invalidate_ept() wait synchronously for guest exits
Currently EPT TLB invalidation is done by incrementing a generation
counter and issuing an IPI to all CPUs currently running vCPU threads.
The VMM inner loop caches the most recently observed generation on each
host CPU and invalidates TLB entries before executing the VM if the
cached generation number is not the most recent value.
pmap_invalidate_ept() issues IPIs to force each vCPU to stop executing
guest instructions and reload the generation number.  However, it does
not actually wait for vCPUs to exit, potentially creating a window where
guests may continue to reference stale TLB entries.

Fix the problem by bracketing guest execution with an SMR read section
which is entered before loading the invalidation generation.  Then,
pmap_invalidate_ept() increments the current write sequence before
loading pm_active and sending IPIs, and polls readers to ensure that all
vCPUs potentially operating with stale TLB entries have exited before
pmap_invalidate_ept() returns.

Also ensure that unsynchronized loads of the generation counter are
wrapped with atomic(9), and stop (inconsistently) updating the
invalidation counter and pm_active bitmask with acquire semantics.

Reviewed by:	grehan, kib
Sponsored by:	The FreeBSD Foundation
Differential Revision:	https://reviews.freebsd.org/D26910
2020-11-11 15:01:17 +00:00
Mark Johnston
cff169880e amd64: Make it easier to configure exception stack sizes
The amd64 kernel handles certain types of exceptions on a dedicated
stack.  Currently the sizes of these stacks are all hard-coded to
PAGE_SIZE, but for at least NMI handling it can be useful to use larger
stacks.  Add constants to intr_machdep.h to make this easier to tweak.

No functional change intended.

Reviewed by:	kib
MFC after:	1 week
Sponsored by:	NetApp, Inc.
Sponsored by:	Klara, Inc.
Differential Revision:	https://reviews.freebsd.org/D27076
2020-11-04 16:42:20 +00:00
Konstantin Belousov
546df7a45d amd64 pmap.h: explicitly provide constants values instead of relying
on some more advanced C features.

This fixes gcc-toolchain build of exception.S.

Reported and tested by:	kevans
Sponsored by:	The FreeBSD Foundation
MFC after:	1 week
2020-10-16 16:22:32 +00:00
Konstantin Belousov
e406235000 Fix for mis-interpretation of PCB_KERNFPU.
Right now PCB_KERNFPU is used both as indication that kernel prepared
hardware FPU context to use and that the thread is fpu-kern
thread.  This also breaks fpu_kern_enter(FPU_KERN_NOCTX), since
fpu_kern_leave() then clears PCB_KERNFPU.

Introduce new flag PCB_KERNFPU_THR which indicates that the thread is
fpu-kern.  Do not clear PCB_KERNFPU if fpu-kern thread leaves noctx
fpu region.

Reported and tested by:	jhb (amd64)
Sponsored by:	The FreeBSD Foundation
MFC after:	1 week
Differential revision:	https://reviews.freebsd.org/D25511
2020-10-14 23:01:41 +00:00
Konstantin Belousov
df01340989 amd64: Store full 64bit of FIP/FDP for 64bit processes when using XSAVE.
If current process is 64bit, use rex-prefixed version of XSAVE
(XSAVE64).  If current process is 32bit and CPU supports saving
segment registers cs/ds in the FPU save area, use non-prefixed variant
of XSAVE.

Reported and tested by:	Michał Górny <mgorny@moritz.systems>
PR:	250043
Reviewed by:	emaste, markj
Sponsored by:	The FreeBSD Foundation
MFC after:	1 week
Differential revision:	https://reviews.freebsd.org/D26643
2020-10-03 23:17:29 +00:00
Konstantin Belousov
5e8ea68fd8 Move ctx_switch_xsave declaration to amd64 md_var.h.
Sponsored by:	The FreeBSD Foundation
MFC after:	3 days
2020-10-03 23:07:09 +00:00
Edward Tomasz Napierala
1e2521ffae Get rid of sa->narg. It serves no purpose; use sa->callp->sy_narg instead.
Reviewed by:	kib
Sponsored by:	DARPA
Differential Revision:	https://reviews.freebsd.org/D26458
2020-09-27 18:47:06 +00:00
Mark Johnston
78257765f2 Add a vmparam.h constant indicating pmap support for large pages.
Enable SHM_LARGEPAGE support on arm64.

Reviewed by:	alc, kib
Sponsored by:	Juniper Networks, Inc., Klara, Inc.
Differential Revision:	https://reviews.freebsd.org/D26467
2020-09-23 19:34:21 +00:00
D Scott Phillips
00e6614750 Sparsify the vm_page_dump bitmap
On Ampere Altra systems, the sparse population of RAM within the
physical address space causes the vm_page_dump bitmap to be much
larger than necessary, increasing the size from ~8 Mib to > 2 Gib
(and overflowing `int` for the size).

Changing the page dump bitmap also changes the minidump file
format, so changes are also necessary in libkvm.

Reviewed by:	jhb
Approved by:	scottl (implicit)
MFC after:	1 week
Sponsored by:	Ampere Computing, Inc.
Differential Revision:	https://reviews.freebsd.org/D26131
2020-09-21 22:21:59 +00:00
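To make the motivation for 00e6614750 concrete, here is a minimal sketch of how a dump bitmap grows when RAM is sparsely populated. It is not the kernel's implementation: the `ram_segment` type, the function names, and the 4 KiB page size are assumptions made for illustration only.

```c
#include <stddef.h>
#include <stdint.h>

#define SKETCH_PAGE_SIZE	UINT64_C(4096)	/* assumption: 4 KiB pages */

/* Hypothetical description of one populated range [start, end). */
struct ram_segment {
	uint64_t	start;
	uint64_t	end;
};

/* Dense layout: one bit per page from address 0 up to the highest RAM address. */
static uint64_t
dense_bitmap_bits(const struct ram_segment *seg, size_t n)
{
	uint64_t top = 0;

	for (size_t i = 0; i < n; i++)
		if (seg[i].end > top)
			top = seg[i].end;
	return (top / SKETCH_PAGE_SIZE);
}

/* Sparse layout: one bit per page that actually exists. */
static uint64_t
sparse_bitmap_bits(const struct ram_segment *seg, size_t n)
{
	uint64_t bits = 0;

	for (size_t i = 0; i < n; i++)
		bits += (seg[i].end - seg[i].start) / SKETCH_PAGE_SIZE;
	return (bits);
}
```

With two 2 GiB segments, one at address 0 and one at 16 TiB, the dense layout needs more than 2^32 bits while the sparse layout needs 2^20.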
D Scott Phillips
ab041f713a Move vm_page_dump bitset array definition to MI code
These definitions were repeated by all architectures, with small
variations. Consolidate the common definitions in machine
independent code and use bitset(9) macros for manipulation. Many
opportunities for deduplication remain in the machine dependent
minidump logic. The only intended functional change is increasing
the bit index type to vm_pindex_t, allowing the indexing of pages
with address of 8 TiB and greater.

Reviewed by:	kib, markj
Approved by:	scottl (implicit)
MFC after:	1 week
Sponsored by:	Ampere Computing, Inc.
Differential Revision:	https://reviews.freebsd.org/D26129
2020-09-21 22:20:37 +00:00
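The 8 TiB figure in ab041f713a follows from simple arithmetic, sketched below under the assumption of 4 KiB pages and a 32-bit `int`: the page at 8 TiB has index 2^31, one past the largest value a signed 32-bit index can hold. The helper names are hypothetical and not part of the kernel.

```c
#include <assert.h>
#include <limits.h>
#include <stdbool.h>
#include <stdint.h>

#define SKETCH_PAGE_SHIFT	12	/* assumption: 4 KiB pages */

static_assert(sizeof(int) == 4, "sketch assumes a 32-bit int");

/* Bit index of the page containing physical address pa. */
static uint64_t
pa_to_page_index(uint64_t pa)
{
	return (pa >> SKETCH_PAGE_SHIFT);
}

/* Would an int-typed bit index still be able to address this page? */
static bool
index_fits_in_int(uint64_t pa)
{
	return (pa_to_page_index(pa) <= (uint64_t)INT_MAX);
}
```

The last page below 8 TiB still fits in an `int` index; the page at exactly 8 TiB does not, which is why a wider index type is needed.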
Mark Johnston
2d838cd867 Add the MEM_EXTRACT_PADDR ioctl to /dev/mem.
This allows privileged userspace processes to find information about the
physical page backing a given mapping.  It is useful in applications
such as DPDK which perform some of their own memory management.

Reviewed by:	kib, jhb (previous version)
MFC after:	2 weeks
Sponsored by:	Juniper Networks, Inc.
Sponsored by:	Klara Inc.
Differential Revision:	https://reviews.freebsd.org/D26237
2020-09-02 18:12:47 +00:00
Mateusz Guzik
543769bf83 amd64: clean up empty lines in .c and .h files
2020-09-01 21:16:54 +00:00
Konstantin Belousov
f3eb12e4a6 Add bhyve support for LA57 guest mode.
Noted and reviewed by:	grehan
Sponsored by:	The FreeBSD Foundation
Differential revision:	https://reviews.freebsd.org/D25273
2020-08-23 20:37:21 +00:00
Konstantin Belousov
9ce875d9b5 amd64 pmap: LA57 AKA 5-level paging
Since LA57 was moved to the main SDM document with revision 072, it
seems that we should support it, and silicon is coming.

This patch makes pmap support both LA48 and LA57 hardware.  The
selection of page table level is done at startup, kernel always
receives control from loader with 4-level paging.  It is not clear how
UEFI spec would adapt LA57, for instance it could hand out control in
LA57 mode sometimes.

To switch from LA48 to LA57 requires turning off long mode, requesting
LA57 in CR4, then re-entering long mode.  This is somewhat delicate
and done in pmap_bootstrap_la57().  AP startup in LA57 mode is much
easier, we only need to toggle a bit in CR4 and load right value in CR3.

I decided to not change kernel map for now.  Single PML5 entry is
created that points to the existing kernel_pml4 (KML4Phys) page, and a
pml5 entry to create our recursive mapping for vtopte()/vtopde().
This decision is motivated by the fact that we cannot overcommit for
KVA, so large space there is unusable until machines start providing
wider physical memory addressing.  Another reason is that I do not
want to break our fragile autotuning, so the KVA expansion is not
included into this first step.  Nice side effect is that minidumps are
compatible.

On the other hand, (very) large address space is definitely
immediately useful for some userspace applications.

For userspace, numbering of pte entries (or page table pages) is
always done for 5-level structures even if we operate in 4-level mode.
The pmap_is_la57() function is added to report the mode of the
specified pmap, this is done not to allow simultaneous 4-/5-levels
(which is not allowed by hw), but to accommodate EPT, which has
separate level control and in principle might not allow 5-level EPT
even though x86 paging supports it. Anyway, it does not seem critical to
have 5-level EPT support now.

Tested by:	pho (LA48 hardware)
Reviewed by:	alc
Sponsored by:	The FreeBSD Foundation
Differential revision:	https://reviews.freebsd.org/D25273
2020-08-23 20:19:04 +00:00
Peter Grehan
f5f5f1e7d6 Support guest rdtscp and rdpid instructions on Intel VT-x
Enable any of rdtscp and/or rdpid for bhyve guests on Intel-based hosts
that support the "enable RDTSCP" VM-execution control.

Submitted by:	adam_fenn.io
Reported by:	chuck
Reviewed by:	chuck, grehan, jhb
Approved by:	jhb (bhyve), grehan
MFC after:	3 weeks
Relnotes:	Yes
Differential Revision:	https://reviews.freebsd.org/D26003
2020-08-18 07:23:47 +00:00
Ruslan Bukin
c4cd699010 o Add machine/iommu.h and include MD iommu headers from it,
so we don't ifdef for every arch in busdma_iommu.c;
o No need to include specialreg.h for x86, remove it.

Requested by:	andrew
Reviewed by:	kib
Sponsored by:	DARPA/AFRL
Differential Revision:	https://reviews.freebsd.org/D25957
2020-08-05 19:11:31 +00:00
Alexander Motin
aba10e131f Allow swi_sched() to be called from NMI context.
To handle hardware errors reported via NMIs, I need a way to
escape NMI context, which is too restrictive to do anything significant.

To do it this change introduces new swi_sched() flag SWI_FROMNMI, making
it careful about used KPIs.  On platforms allowing IPI sending from NMI
context (x86 for now) it immediately wakes clk_intr_event via new IPI_SWI,
otherwise it works just like SWI_DELAY.  To handle the delayed SWIs this
patch calls clk_intr_event on every hardclock() tick.

MFC after:	2 weeks
Sponsored by:	iXsystems, Inc.
Differential Revision:	https://reviews.freebsd.org/D25754
2020-07-25 15:19:38 +00:00
Konstantin Belousov
3ec7e1695c amd64 pmap: microoptimize local shootdowns for PCID PTI configurations
When pmap operates in PTI mode, we must reload %cr3 on return to
userspace.  In non-PCID mode the reload always flushes all non-global
TLB entries and we take advantage of it by only invalidating the KPT
TLB entries (there is no cached UPT entries at all).

In PCID mode, we flush both KPT and UPT TLB explicitly, but we can
take advantage of the fact that PCID mode command to reload %cr3
includes a flag to flush/not flush target TLB.  In particular, we can
avoid the flush for UPT, instead record that load of pc_ucr3 into %cr3
on return to usermode should be flushing.  This is done by providing
either all-1s or ~CR3_PCID_MASK in pc_ucr3_load_mask.  The mask is
automatically reset to all-1s on return to usermode.

Similarly, we can avoid flushing UPT TLB on context switch, replacing
it by setting pc_ucr3_load_mask.  This unifies INVPCID and non-INVPCID
PTI ifunc, leaving only 4 cases instead of 6.  This trick is also
applicable both to the TLB shootdown IPI handlers, since handlers
interrupt the target thread.

But then we need to check pc_curpmap in handlers, and this would
reopen the same race for INVPCID machines as was fixed in r306350 for
non-INVPCID.  To not introduce the same bug, unconditionally do
spinlock_enter() in pmap_activate().

Reviewed by:	alc, markj
Tested by:	pho
Sponsored by:	The FreeBSD Foundation
MFC after:	3 weeks
Differential revision:	https://reviews.freebsd.org/D25483
2020-07-18 18:19:57 +00:00
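The deferred-flush idea in 3ec7e1695c can be sketched as follows. This is an illustration, not the pmap code: the structure, function names, and `SKETCH_` constants are hypothetical. It assumes that, in PCID mode, a set bit 63 in the value written to %cr3 tells the CPU to keep the TLB entries for the target PCID.

```c
#include <stdint.h>

#define SKETCH_CR3_NOFLUSH	(UINT64_C(1) << 63)	/* hypothetical name */
#define SKETCH_MASK_KEEP	(~UINT64_C(0))		/* all-1s: keep no-flush bit */
#define SKETCH_MASK_FLUSH	(~SKETCH_CR3_NOFLUSH)	/* clear it: flush on load */

/* Hypothetical stand-in for the relevant per-CPU fields. */
struct sketch_pcpu {
	uint64_t	ucr3;		/* user page table root | PCID */
	uint64_t	ucr3_load_mask;
};

/* Invalidation path: record that the next user %cr3 load must flush. */
static void
defer_user_flush(struct sketch_pcpu *pc)
{
	pc->ucr3_load_mask = SKETCH_MASK_FLUSH;
}

/* Return-to-user path: compute the %cr3 value, then reset the mask. */
static uint64_t
ucr3_for_return(struct sketch_pcpu *pc)
{
	uint64_t v;

	v = (pc->ucr3 | SKETCH_CR3_NOFLUSH) & pc->ucr3_load_mask;
	pc->ucr3_load_mask = SKETCH_MASK_KEEP;
	return (v);
}
```

A single AND on the return path thus replaces an explicit user-TLB flush in the invalidation path, and the flush happens at most once per return to usermode.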
Mateusz Guzik
c4e64133d8 amd64: patch ffsl to use the compiler builtin
This shortens fdalloc by over 60 bytes. Correctness verified by running both
variants at the same time and comparing the result of each call.

Note someone(tm) should make a pass at converting everything else feasible.
2020-07-16 11:28:24 +00:00
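The verification approach described in c4e64133d8, running both variants and comparing each result, might look roughly like the sketch below. The function names are hypothetical, and the reference loop is a plain portable version rather than the inline assembly that was replaced; `__builtin_ffsl` is the GCC/Clang builtin.

```c
#include <assert.h>

/* Portable reference: 1-based index of the lowest set bit, 0 if none. */
static int
ffsl_reference(long mask)
{
	unsigned long m = (unsigned long)mask;
	int bit;

	if (m == 0)
		return (0);
	for (bit = 1; (m & 1UL) == 0; bit++)
		m >>= 1;
	return (bit);
}

/* Builtin-based variant (GCC and Clang). */
static int
ffsl_builtin(long mask)
{
	return (__builtin_ffsl(mask));
}

/* Run both variants on the same input and insist that they agree. */
static int
ffsl_checked(long mask)
{
	int ref = ffsl_reference(mask);
	int fast = ffsl_builtin(mask);

	assert(ref == fast);
	return (fast);
}
```

Temporarily routing callers through a checked wrapper like `ffsl_checked()` exercises both paths on real inputs before the old implementation is dropped.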
Konstantin Belousov
dc43978aa5 amd64: allow parallel shootdown IPIs
Stop using smp_ipi_mtx to protect global shootdown state, and
move/multiply the global state into pcpu.  Now each CPU can initiate
shootdown IPI independently from other CPUs.  Initiator enters
critical section, then fills its local PCPU shootdown info
(pc_smp_tlb_XXX), then clears scoreboard generation at location (cpu,
my_cpuid) for each target cpu.  After that IPI is sent to all targets
which scan for zeroed scoreboard generation words.  Upon finding such
word the shootdown data is read from corresponding cpu's pcpu, and
generation is set.  Meantime initiator loops waiting for all zeroed
generations in scoreboard to update.

Initiator does not disable interrupts, which should keep
non-invalidation IPIs from deadlocking; it only needs to disable
preemption to pin itself to the instance of the pcpu smp_tlb data.

The generation is set before the actual invalidation is performed in
handler. It is safe because target CPU cannot return to userspace
before handler finishes. In principle only NMI can preempt the
handler, but NMI would see the kernel handler frame and not touch
not-invalidated user page table.

Handlers loop until they do not see zeroed scoreboard generations.
This, together with hardware keeping one pending IPI in LAPIC IRR
should prevent lost shootdowns.

Notes.
1. The code does not protect writes to LAPIC ICR with exclusion. I believe
   this is fine because we in fact do not send IPIs from interrupt
   handlers. Moreover, for !x2APIC mode, where ICR access for write requires
   writing two registers, we disable interrupts around it. If considered
   incorrect, I can add per-cpu spinlock around ipi_send().
2. Scoreboard lines owned by given target CPU can be padded to the
   cache line, to reduce ping-pong.

Reviewed by:	markj (previous version)
Discussed with:	alc
Tested by:	pho
Sponsored by:	The FreeBSD Foundation
MFC after:	3 weeks
Differential revision:	https://reviews.freebsd.org/D25510
2020-07-14 20:37:50 +00:00
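The scoreboard protocol described in dc43978aa5 can be approximated with C11 atomics as in the sketch below. It is a simplified model, not the kernel code: all names are hypothetical, the generation is reduced to a zero/non-zero flag, IPIs are replaced by direct calls to the handler, and the TLB invalidation is a plain store to an array.

```c
#include <stdatomic.h>
#include <stdint.h>

#define SKETCH_NCPU	4

static uintptr_t req_addr[SKETCH_NCPU];		/* per-initiator request data */
static _Atomic uint32_t scoreboard[SKETCH_NCPU][SKETCH_NCPU]; /* [target][initiator] */
static uintptr_t invalidated[SKETCH_NCPU];	/* last address each CPU handled */

/* Mark every slot as "no request pending" (non-zero generation). */
static void
scoreboard_init(void)
{
	for (int t = 0; t < SKETCH_NCPU; t++)
		for (int i = 0; i < SKETCH_NCPU; i++)
			atomic_store(&scoreboard[t][i], 1);
}

/* Initiator: publish the request, then zero the slot of each target. */
static void
shootdown_start(int me, uintptr_t addr, const int *targets, int ntargets)
{
	req_addr[me] = addr;
	for (int i = 0; i < ntargets; i++)
		atomic_store_explicit(&scoreboard[targets[i]][me], 0,
		    memory_order_release);
	/* A real kernel would now send the IPI to every target. */
}

/* Initiator: has every target set its generation again? */
static int
shootdown_done(int me, const int *targets, int ntargets)
{
	for (int i = 0; i < ntargets; i++)
		if (atomic_load_explicit(&scoreboard[targets[i]][me],
		    memory_order_acquire) == 0)
			return (0);
	return (1);
}

/* Target: loop until no zeroed slot remains in this CPU's row. */
static void
shootdown_handler(int me)
{
	int found;

	do {
		found = 0;
		for (int from = 0; from < SKETCH_NCPU; from++) {
			if (atomic_load_explicit(&scoreboard[me][from],
			    memory_order_acquire) != 0)
				continue;
			uintptr_t addr = req_addr[from];
			/* Generation is set before the invalidation itself. */
			atomic_store_explicit(&scoreboard[me][from], 1,
			    memory_order_release);
			invalidated[me] = addr;
			found = 1;
		}
	} while (found);
}
```

Because each initiator writes only its own column and its own request data, no global lock is needed and several CPUs can have shootdowns in flight at once.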
Conrad Meyer
c74a3041f0 Add domain policy allocation for amd64 fpu_kern_ctx
Like other types of allocation, fpu_kern_ctx are frequently allocated per-cpu.
Provide the API and sketch some example consumers.

fpu_kern_alloc_ctx_domain() preferentially allocates memory from the
provided domain, and falls back to other domains if that one is empty
(DOMAINSET_PREF(domain) policy).

Maybe it makes more sense to just shove one of these in the DPCPU area
sooner or later -- left for future work.

Reviewed by:	markj
Differential Revision:	https://reviews.freebsd.org/D22053
2020-07-03 14:54:46 +00:00
Conrad Meyer
4daa95f85d bhyve(8): For prototyping, reattempt decode in userspace
If userspace has a newer bhyve than the kernel, it may be able to decode
and emulate some instructions vmm.ko is unaware of.  In this scenario,
reset decoder state and try again.

Reviewed by:	grehan
Differential Revision:	https://reviews.freebsd.org/D24464
2020-06-25 00:18:42 +00:00
Conrad Meyer
f4ce062964 vmm(4): Add 12 user ABI compat after r349948
Reported by:	kp
Reviewed by:	jhb, kp
Tested by:	kp
Differential Revision:	https://reviews.freebsd.org/D24929
2020-05-20 17:27:54 +00:00
Conrad Meyer
8a68ae80f6 vmm(4), bhyve(8): Expose kernel-emulated special devices to userspace
Expose the special kernel LAPIC, IOAPIC, and HPET devices to userspace
for use in, e.g., fallback instruction emulation (when userspace has a
newer instruction decode/emulation layer than the kernel vmm(4)).

Plumb the ioctl through libvmmapi and register the memory ranges in
bhyve(8).

Reviewed by:	grehan
Differential Revision:	https://reviews.freebsd.org/D24525
2020-05-15 15:54:22 +00:00
John Baldwin
483d953a86 Initial support for bhyve save and restore.
Save and restore (also known as suspend and resume) permits a snapshot
to be taken of a guest's state that can later be resumed.  In the
current implementation, bhyve(8) creates a UNIX domain socket that is
used by bhyvectl(8) to send a request to save a snapshot (and
optionally exit after the snapshot has been taken).  A snapshot
currently consists of two files: the first holds a copy of guest RAM,
and the second file holds other guest state such as vCPU register
values and device model state.

To resume a guest, bhyve(8) must be started with a matching pair of
command line arguments to instantiate the same set of device models as
well as a pointer to the saved snapshot.

While the current implementation is useful for several use cases, it
has a few limitations.  The file format for saving the guest state is
tied to the ABI of internal bhyve structures and is not
self-describing (in that it does not communicate the set of device
models present in the system).  In addition, the state saved for some
device models closely matches the internal data structures which might
prove a challenge for compatibility of snapshot files across a range
of bhyve versions.  The file format also does not currently support
versioning of individual chunks of state.  As a result, the current
file format is not a fixed binary format and future revisions to save
and restore will break binary compatibility of snapshot files.  The
goal is to move to a more flexible format that adds versioning,
etc. and at that point to commit to providing a reasonable level of
compatibility.  As a result, the current implementation is not enabled
by default.  It can be enabled via the WITH_BHYVE_SNAPSHOT=yes option
for userland builds, and the kernel option BHYVE_SNAPSHOT.

Submitted by:	Mihai Tiganus, Flavius Anton, Darius Mihai
Submitted by:	Elena Mihailescu, Mihai Carabas, Sergiu Weisz
Relnotes:	yes
Sponsored by:	University Politehnica of Bucharest
Sponsored by:	Matthew Grooms (student scholarships)
Sponsored by:	iXsystems
Differential Revision:	https://reviews.freebsd.org/D19495
2020-05-05 00:02:04 +00:00
Conrad Meyer
cfdea69d24 vmm(4): Decode 3-byte VEX-prefixed instructions
Reviewed by:	grehan
Differential Revision:	https://reviews.freebsd.org/D24462
2020-04-21 21:33:06 +00:00
Conrad Meyer
497cb9259b vmm.h: Add ABI assertions and mark implicit holes
The static assertions were added (with size and offsets from gdb) and verified
with a build prior to marking the holes explicitly.

This is in preparation for a subsequent revision, pending in phabricator, that
makes use of some of these unused bits without impacting the ABI.

Reviewed by:	grehan
Differential Revision:	https://reviews.freebsd.org/D24461
2020-04-17 15:19:42 +00:00
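The technique in 497cb9259b, pinning a structure's layout with static assertions and then naming its padding, can be illustrated with a made-up structure. `struct demo_abi` below is hypothetical and is not one of the vmm.h structures.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical ioctl argument; not a real vmm(4) structure. */
struct demo_abi {
	uint32_t	cpuid;
	uint8_t		flags;
	uint8_t		_sparebytes[3];	/* formerly an implicit hole */
	uint64_t	gpa;
};

/* Freeze the layout: any change to size or offsets breaks the build. */
static_assert(sizeof(struct demo_abi) == 16, "demo_abi size is part of the ABI");
static_assert(offsetof(struct demo_abi, flags) == 4, "demo_abi.flags moved");
static_assert(offsetof(struct demo_abi, gpa) == 8, "demo_abi.gpa moved");
```

Once the hole has a name, a later change can carve a new field out of `_sparebytes` and the assertions confirm that existing fields did not move.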
Conrad Meyer
b645fd4531 vmm(4): Expose instruction decode to userspace build
Permit instruction decoding logic to be compiled outside of the kernel for
rapid iteration and validation.

Reviewed by:	grehan
Differential Revision:	https://reviews.freebsd.org/D24439
2020-04-16 16:50:33 +00:00
Conrad Meyer
ca0ec73c11 Expand generic subword atomic primitives
The goal of this change is to make the atomic_load_acq_{8,16},
atomic_testandset{,_acq}_long, and atomic_testandclear_long primitives
available in MI-namespace.

The second goal is to get this draft out of my local tree, as anything that
requires a full tinderbox is a big burden out of tree.  MD specifics can be
refined individually afterwards.

The generic implementations may not be ideal for your architecture; feel
free to implement better versions.  If no subword_atomic definitions are
needed, the include can be removed from your arch's machine/atomic.h.
Generic definitions are guarded by defined macros of the same name.  To
avoid picking up conflicting generic definitions, some macro defines are
added to various MD machine/atomic.h to register an existing implementation.

Include _atomic_subword.h in arm and arm64 machine/atomic.h.

For some odd reason, KCSAN only generates some versions of primitives.
Generate the _acq variants of atomic_load.*_8, atomic_load.*_16, and
atomic_testandset.*_long.  There are other questionably disabled primitives,
but I didn't run into them, so I left them alone.  KCSAN is only built for
amd64 in tinderbox for now.

Add atomic_subword implementations of atomic_load_acq_{8,16} implemented
using masking and atomic_load_acq_32.

Add generic atomic_subword implementations of atomic_testandset_long(),
atomic_testandclear_long(), and atomic_testandset_acq_long(), using
atomic_fcmpset_long() and atomic_fcmpset_acq_long().

On x86, add atomic_testandset_acq_long as an alias for
atomic_testandset_long.

Reviewed by:	kevans, rlibby (previous versions both)
Differential Revision:	https://reviews.freebsd.org/D22963
2020-03-25 23:12:43 +00:00
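The two generic techniques named in ca0ec73c11, a subword load built from a 32-bit load plus masking and a test-and-set built from a compare-and-swap loop, are sketched below with C11 atomics. This is not the contents of _atomic_subword.h: the function names are hypothetical, the caller passes the containing aligned word and a byte lane instead of a byte pointer, lanes are numbered little-endian, and relaxed ordering is used for the test-and-set.

```c
#include <assert.h>
#include <limits.h>
#include <stdatomic.h>
#include <stdint.h>

/*
 * Acquire-load one byte out of an aligned 32-bit word.
 * Lane 0 is the least significant byte (little-endian numbering).
 */
static uint8_t
subword_load_acq_8(_Atomic uint32_t *word, unsigned int lane)
{
	uint32_t v;

	assert(lane < 4);
	v = atomic_load_explicit(word, memory_order_acquire);
	return ((uint8_t)((v >> (lane * 8)) & 0xff));
}

/*
 * Set one bit and report whether it was already set, using a
 * compare-and-swap loop in place of a native bit-test instruction.
 */
static int
subword_testandset_long(_Atomic unsigned long *p, unsigned int bit)
{
	unsigned long mask, old;

	mask = 1UL << (bit % (sizeof(unsigned long) * CHAR_BIT));
	old = atomic_load_explicit(p, memory_order_relaxed);
	/* On failure the CAS refreshes 'old', so the bit is re-checked. */
	while ((old & mask) == 0 &&
	    !atomic_compare_exchange_weak_explicit(p, &old, old | mask,
	    memory_order_relaxed, memory_order_relaxed))
		continue;
	return ((old & mask) != 0);
}
```

An architecture with a native instruction for either operation would supply its own definition and skip the generic fallback, which is what the guard macros described in the commit message arrange.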