This is a userland-only pointer that isn't relevant to the kernel and
doesn't belong in the ioctl structure shared between userland and the
kernel. For the kernel, the old structure for the ioctl is still
supported under COMPAT_FREEBSD13.
This changes vm_snapshot_req() in libvmmapi to accept an explicit
vmctx argument.
It also changes vm_snapshot_guest2host_addr to take an explicit vmctx
argument. As part of this change, move the declaration for this
function and its wrapper macro from vmm_snapshot.h to snapshot.h as it
is a userland-only API.
Reviewed by: corvink, markj
Differential Revision: https://reviews.freebsd.org/D38125
This replaces the 'struct vm, int vcpuid' tuple passed to most API
calls and is similar to the changes recently made in vmm(4) in the
kernel.
struct vcpu is an opaque type managed by libvmmapi. For now it stores
a pointer to the VM context and an integer id.
As an immediate effect this removes the divergence between the kernel
and userland for the instruction emulation code introduced by the
recent vmm(4) changes.
Since this is a major change to the vmmapi API, bump VMMAPI_VERSION to
0x200 (2.0) and the shared library major version.
While here (and since the major version is bumped), remove unused
vcpu argument from vm_setup_pptdev_msi*().
Add new functions vm_suspend_all_cpus() and vm_resume_all_cpus() for
use by the debug server. The underyling ioctl (which uses a vcpuid of
-1) remains unchanged, but the userlevel API now uses separate
functions for global CPU suspend/resume.
Reviewed by: corvink, markj
Differential Revision: https://reviews.freebsd.org/D38124
This passes struct vcpu down in place of struct vm and and integer
vcpu index through the in-kernel instruction emulation code. To
minimize userland disruption, helper macros are used for the vCPU
arguments passed into and through the shared instruction emulation
code.
A few other APIs used by the instruction emulation code have also been
updated to accept struct vcpu in the kernel including
vm_get/set_register and vm_inject_fault.
Reviewed by: corvink, markj
Differential Revision: https://reviews.freebsd.org/D37161
The arguments identifying the VM and vCPU are only needed for
vm_copy_setup.
Reviewed by: corvink, markj
Differential Revision: https://reviews.freebsd.org/D37158
Currently libvmmapi provides a way to get a list of the allowed ioctls
on the vmm device file, so that bhyve can limit rights on the device
file fd. The interface is rather strange: it allocates a copy of the
list but returns a const pointer, so the caller has to cast away the
const in order to free it without aggravating the compiler.
As far as I can see, there's no reason to make a copy of the array, but
changing vm_get_ioctls() to not do that would break compatibility. So
this change just introduces a better interface: move all rights-limiting
logic into libvmmapi.
Any new operations on the fd should be wrapped by libvmmapi, so also
discourage use of vm_get_device_fd(). Currently bhyve uses it only when
limiting rights on the device fd.
No functional change intended.
Reviewed by: jhb
MFC after: 1 week
Differential Revision: https://reviews.freebsd.org/D37098
- Clear CR2, EFER, and R8-15 to zero.
- Reset DR6 and DR7 to their documented reset values.
- Reset interrupt shadow state.
- Document the reason CR0 is reset to a value that doesn't match its
documented value.
Reviewed by: jhb
Differential Revision: https://reviews.freebsd.org/D35622
Sponsored by: Beckhoff Automation GmbH & Co. KG
Currently there is no way to safely free a vm structure without
leaking the fd. vm_destroy() closes the fd but also destroys the VM
whereas in some cases a VM needs to be opened (vm_open) and then
closed (vm_close).
Reviewed by: jhb
Sponsored by: vStack
Differential Revision: https://reviews.freebsd.org/D35073
Allows callers of vm_get_name() to retrieve the vm name without having
to allocate a buffer.
While in the vicinity, do minor cleanup in vm_snapshot_basic_metadata().
Reviewed by: jhb
Differential Revision: https://reviews.freebsd.org/D34290
Some PCI devices especially GPUs require a ROM to work properly.
The ROM is executed by boot firmware to initialize the device.
To add a ROM to a device use the new ROM option for passthru device
(e.g. -s passthru,0/2/0,rom=<path>/<to>/<rom>).
It's necessary that the ROM is executed by the boot firmware.
It won't be executed by any OS.
Additionally, the boot firmware should be configured to execute the
ROM file.
For that reason, it's only possible to use a ROM when using
OVMF with enabled bus enumeration.
Differential Revision: https://reviews.freebsd.org/D33129
Sponsored by: Beckhoff Automation GmbH & Co. KG
MFC after: 1 month
- Add a starting index to 'struct vmstats' and change the
VM_STATS ioctl to fetch the 64 stats starting at that index.
A compat shim for <= 13 continues to fetch only the first 64
stats.
- Extend vm_get_stats() in libvmmapi to use a loop and a static
thread local buffer which grows to hold the stats needed.
Reviewed by: markj
Differential Revision: https://reviews.freebsd.org/D27463
In commit 6bb140e3ca895a14, vm_destroy() was replaced with free() to
preserve errno. However, it's possible that free() may change the errno
as well. Keep the free() call, but explicitly save and restore errno.
Noted by: jhb
Fixes: 6bb140e3ca895a14
We want to allow the UEFI firmware to enumerate and assign
addresses to PCI devices so we can boot from NVMe[1]. Address
assignment of PCI BARs is properly handled by the PCI emulation
code in general, but a few specific cases need additional support.
fbuf and passthru map additional objects into the guest physical
address space and so need to handle address updates. Here we add a
callback to emulated PCI devices to inform them of a BAR
configuration change. fbuf and passthru then watch for these BAR
changes and relocate the frame buffer memory segment and passthru
device mmio area respectively.
We also add new VM_MUNMAP_MEMSEG and VM_UNMAP_PPTDEV_MMIO ioctls
to vmm(4) to facilitate the unmapping needed for addres updates.
[1]: https://github.com/freebsd/uefi-edk2/pull/9/
Originally by: scottph
MFC After: 1 week
Sponsored by: Intel Corporation
Reviewed by: grehan
Approved by: philip (mentor)
Differential Revision: https://reviews.freebsd.org/D24066
Use errno to print a more descriptive error message when vm_open() fails
libvmm: preserve errno when vm_device_open() fails
vm_destroy() squashes errno by making a dive into sysctlbyname() - we
can safely skip vm_destroy() here since it's not doing any critical
clean up at this point. Replace vm_destroy() with a free() call.
PR: 250671
MFC after: 3 days
Submitted by: marko@apache.org
Reviewed by: grehan
Differential Revision: https://reviews.freebsd.org/D29109
struct checkpoint_op, enum checkpoint_opcodes, and
MAX_SNAPSHOT_VMNAME are not vmm specific, move them out of the vmmapi
header.
They are used for the save/restore functionality that bhyve(8)
provides and are better suited in usr.sbin/bhyve/snapshot.h
Since bhyvectl(8) requires these, the Makefile for bhyvectl has been
modified to include usr.sbin/bhyve/snapshot.h
Reviewed by: kevans, grehan
Differential Revision: https://reviews.freebsd.org/D28410
Add a new ioctl to disable all MSI-X interrupts for a PCI passthrough
device and invoke it if a write to the MSI-X capability registers
disables MSI-X. This avoids leaving MSI-X interrupts enabled on the
host if a guest device driver has disabled them (e.g. as part of
detaching a guest device driver).
This was found by Chelsio QA when testing that a Linux guest could
switch from MSI-X to MSI interrupts when using the cxgb4vf driver.
While here, explicitly fail requests to enable MSI on a passthrough
device if MSI-X is enabled and vice versa.
Reported by: Sony Arpita Das @ Chelsio
Reviewed by: grehan, markj
MFC after: 2 weeks
Sponsored by: Chelsio Communications
Differential Revision: https://reviews.freebsd.org/D27212
Expose the special kernel LAPIC, IOAPIC, and HPET devices to userspace
for use in, e.g., fallback instruction emulation (when userspace has a
newer instruction decode/emulation layer than the kernel vmm(4)).
Plumb the ioctl through libvmmapi and register the memory ranges in
bhyve(8).
Reviewed by: grehan
Differential Revision: https://reviews.freebsd.org/D24525
Save and restore (also known as suspend and resume) permits a snapshot
to be taken of a guest's state that can later be resumed. In the
current implementation, bhyve(8) creates a UNIX domain socket that is
used by bhyvectl(8) to send a request to save a snapshot (and
optionally exit after the snapshot has been taken). A snapshot
currently consists of two files: the first holds a copy of guest RAM,
and the second file holds other guest state such as vCPU register
values and device model state.
To resume a guest, bhyve(8) must be started with a matching pair of
command line arguments to instantiate the same set of device models as
well as a pointer to the saved snapshot.
While the current implementation is useful for several uses cases, it
has a few limitations. The file format for saving the guest state is
tied to the ABI of internal bhyve structures and is not
self-describing (in that it does not communicate the set of device
models present in the system). In addition, the state saved for some
device models closely matches the internal data structures which might
prove a challenge for compatibility of snapshot files across a range
of bhyve versions. The file format also does not currently support
versioning of individual chunks of state. As a result, the current
file format is not a fixed binary format and future revisions to save
and restore will break binary compatiblity of snapshot files. The
goal is to move to a more flexible format that adds versioning,
etc. and at that point to commit to providing a reasonable level of
compatibility. As a result, the current implementation is not enabled
by default. It can be enabled via the WITH_BHYVE_SNAPSHOT=yes option
for userland builds, and the kernel option BHYVE_SHAPSHOT.
Submitted by: Mihai Tiganus, Flavius Anton, Darius Mihai
Submitted by: Elena Mihailescu, Mihai Carabas, Sergiu Weisz
Relnotes: yes
Sponsored by: University Politehnica of Bucharest
Sponsored by: Matthew Grooms (student scholarships)
Sponsored by: iXsystems
Differential Revision: https://reviews.freebsd.org/D19495
While here, replace the array of mapping structures with an array of
string pointers where the index is the capability value.
Submitted by: Rob Fairbanks <rob.fx907@gmail.com>
Reviewed by: rgrimes
MFC after: 1 week
Differential Revision: https://reviews.freebsd.org/D24289
Update a bunch of Makefile.depend files as
a result of adding Makefile.depend.options files
Reviewed by: bdrewery
MFC after: 1 week
Sponsored by: Juniper Networks
Differential Revision: https://reviews.freebsd.org/D22494
Instead of relying on PROT_NONE mappings with MAP_ANON, use MAP_GUARD
to reserve address space around guest memory ranges including the
guard ranges of address space around mappings.
Submitted by: Shawn Webb
Reviewed by: araujo
Approved by: re (rgrimes)
MFC after: 1 month
Sponsored by: HardendBSD and G2, Inc
Differential Revision: https://reviews.freebsd.org/D16822
from userland without the need to use sysctls, it allows the old
sysctls to continue to function, but deprecates them at
FreeBSD_version 1200060 (Relnotes for deprecate).
The command line of bhyve is maintained in a backwards compatible way.
The API of libvmmapi is maintained in a backwards compatible way.
The sysctl's are maintained in a backwards compatible way.
Added command option looks like:
bhyve -c [[cpus=]n][,sockets=n][,cores=n][,threads=n][,maxcpus=n]
The optional parts can be specified in any order, but only a single
integer invokes the backwards compatible parse. [,maxcpus=n] is
hidden by #ifdef until kernel support is added, though the api
is put in place.
bhyvectl --get-cpu-topology option added.
Reviewed by: grehan (maintainer, earlier version),
Reviewed by: bcr (manpages)
Approved by: bde (mentor), phk (mentor)
Tested by: Oleg Ginzburg <olevole@olevole.ru> (cbsd)
MFC after: 1 week
Relnotes: Y
Differential Revision: https://reviews.freebsd.org/D9930
This is used as part of implementing run control in bhyve's debug
server. The hypervisor now maintains a set of "debugged" CPUs.
Attempting to run a debugged CPU will fail to execute any guest
instructions and will instead report a VM_EXITCODE_DEBUG exit to
the userland hypervisor. Virtual CPUs are placed into the debugged
state via vm_suspend_cpu() (implemented via a new VM_SUSPEND_CPU ioctl).
Virtual CPUs can be resumed via vm_resume_cpu() (VM_RESUME_CPU ioctl).
The debug server suspends virtual CPUs when it wishes them to stop
executing in the guest (for example, when a debugger attaches to the
server). The debug server can choose to resume only a subset of CPUs
(for example, when single stepping) or it can choose to resume all
CPUs. The debug server must explicitly mark a CPU as resumed via
vm_resume_cpu() before the virtual CPU will successfully execute any
guest instructions.
Reviewed by: avg, grehan
Tested on: Intel (jhb), AMD (avg)
Differential Revision: https://reviews.freebsd.org/D14466
Unlike the existing GLA2GPA ioctl, GLA2GPA_NOFAULT does not modify
the guest. In particular, it does not inject any faults or modify
PTEs in the guest when performing an address space translation.
This is used by bhyve's debug server to read and write memory for
the remote debugger.
Reviewed by: grehan
MFC after: 1 month
Differential Revision: https://reviews.freebsd.org/D14075
These are a convenience for bhyve's debug server to use a single
ioctl for 'g' and 'G' rather than a loop of individual get/set
ioctl requests.
Reviewed by: grehan
MFC after: 2 months
Differential Revision: https://reviews.freebsd.org/D14074
Mainly focus on files that use BSD 2-Clause license, however the tool I
was using mis-identified many licenses so this was mostly a manual - error
prone - task.
The Software Package Data Exchange (SPDX) group provides a specification
to make it easier for automated tools to detect and summarize well known
opensource licenses. We are gradually adopting the specification, noting
that the tags are considered only advisory and do not, in any way,
superceed or replace the license texts.
lead to access from the virtual machine to the heap of the bhyve(8) process.
Submitted by: Felix Wilhelm <fwilhelm ernw.de>
Patch by: grehan
Security: FreeBSD-SA-16:38.bhyve
This both avoids some dependencies on xinstall.host and allows
bootstrapping on older releases to work due to lack of at least 'install -l'
support.
Sponsored by: EMC / Isilon Storage Division
Some external tools just do a 'ls /dev/vmm' to figure out the bhyve virtual
machines on the host. These tools break if the devmem device nodes also
appear in /dev/vmm.
Requested by: grehan
due to a change in behavior of the 'vm_map_gpa()'.
Prior to r284539 if 'vm_map_gpa()' was called to map an address range in the
guest MMIO region then it would return NULL. This was used by the "movs"
emulation to detect if the 'src' or 'dst' operand was in MMIO space.
Post r284539 'vm_map_gpa()' started returning a non-NULL pointer even when
mapping the guest MMIO region.
Fix this by returning non-NULL only if [gaddr, gaddr+len) is entirely
within the 'lowmem' or 'highmem' regions and NULL otherwise.
Pointy hat to: neel
Reviewed by: grehan
Reported by: tychon, Ben Perrault (ben.perrault@gmail.com)
MFC after: 1 week
devmem is used to represent MMIO devices like the boot ROM or a VESA framebuffer
where doing a trap-and-emulate for every access is impractical. devmem is a
hybrid of system memory (sysmem) and emulated device models.
devmem is mapped in the guest address space via nested page tables similar
to sysmem. However the address range where devmem is mapped may be changed
by the guest at runtime (e.g. by reprogramming a PCI BAR). Also devmem is
usually mapped RO or RW as compared to RWX mappings for sysmem.
Each devmem segment is named (e.g. "bootrom") and this name is used to
create a device node for the devmem segment (e.g. /dev/vmm/testvm.bootrom).
The device node supports mmap(2) and this decouples the host mapping of
devmem from its mapping in the guest address space (which can change).
Reviewed by: tychon
Discussed with: grehan
Differential Revision: https://reviews.freebsd.org/D2762
MFC after: 4 weeks
Off by default, build behaves normally.
WITH_META_MODE we get auto objdir creation, the ability to
start build from anywhere in the tree.
Still need to add real targets under targets/ to build packages.
Differential Revision: D2796
Reviewed by: brooks imp
Prior to this change both functions returned 0 for success, -1 for failure
and +1 to indicate that an exception was injected into the guest.
The numerical value of ERESTART also happens to be -1 so when these functions
returned -1 it had to be translated to a positive errno value to prevent the
VM_RUN ioctl from being inadvertently restarted. This made it easy to introduce
bugs when writing emulation code.
Fix this by adding an 'int *guest_fault' parameter and setting it to '1' if
an exception was delivered to the guest. The return value is 0 or EFAULT so
no additional translation is needed.
Reviewed by: tychon
MFC after: 2 weeks
Differential Revision: https://reviews.freebsd.org/D2428
%rdi, %rsi, etc are inadvertently bypassed along with the check to
see if the instruction needs to be repeated per the 'rep' prefix.
Add "MOVS" instruction support for the 'MMIO to MMIO' case.
Reviewed by: neel