Replace the existing ad-hoc configuration via various global variables
with a small database of key-value pairs. The database supports
heirarchical keys using a MIB-like syntax to name the path to a given
key. Values are always stored as strings. The API used to manage
configuation values does include wrappers to handling boolean values.
Other values use non-string types require parsing by consumers.
The configuration values are stored in a tree using nvlists. Leaf
nodes hold string values. Configuration values are permitted to
reference other configuration values using '%(name)'. This permits
constructing template configurations.
All existing command line arguments now set configuration values. For
devices, the "-s" option parses its option argument to generate a list
of key-value pairs for the given device.
A new '-o' command line option permits setting an individual
configuration variable. The key name is always given as a full path
of dot-separated components.
A new '-k' command line option parses a simple configuration file.
This configuration file holds a flat list of 'key=value' lines where
the 'key' is the full path of a configuration variable. Lines
starting with a '#' are comments.
In general, bhyve starts by parsing command line options in sequence
and applying those settings to configuration values. Once this is
complete, bhyve then begins initializing its state based on the
configuration values. This means that subsequent configuration
options or files may override or supplement previously given settings.
A special 'config.dump' configuration value can be set to true to help
debug configuration issues. When this value is set, bhyve will print
out the configuration variables as a flat list of 'key=value' lines.
Most command line argments map to a single configuration variable,
e.g. '-w' sets the 'x86.strictmsr' value to false. A few command
line arguments have less obvious effects:
- Multiple '-p' options append their values (as a comma-seperated
list) to "vcpu.N.cpuset" values (where N is a decimal vcpu number).
- For '-s' options, a pci.<bus>.<slot>.<function> node is created.
The first argument to '-s' (the device type) is used as the value of
a "device" variable. Additional comma-separated arguments are then
parsed into 'key=value' pairs and used to set additional variables
under the device node. A PCI device emulation driver can provide
its own hook to override the parsing of the additonal '-s' arguments
after the device type.
After the configuration phase as completed, the init_pci hook
then walks the "pci.<bus>.<slot>.<func>" nodes. It uses the
"device" value to find the device model to use. The device
model's init routine is passed a reference to its nvlist node
in the configuration tree which it can query for specific
variables.
The result is that a lot of the string parsing is removed from
the device models and centralized. In addition, adding a new
variable just requires teaching the model to look for the new
variable.
- For '-l' options, a similar model is used where the string is
parsed into values that are later read during initialization.
One key note here is that the serial ports use the commonly
used lowercase names from existing documentation and examples
(e.g. "lpc.com1") instead of the uppercase names previously
used internally in bhyve.
Reviewed by: grehan
MFC after: 3 months
Differential Revision: https://reviews.freebsd.org/D26035
Now that bhyve(8) supports UART, bvmconsole and bvmdebug are no longer needed.
This also removes the '-b' and '-g' flag from bhyve(8). These two flags were
marked deprecated in r368519.
Reviewed by: grehan, kevans
Approved by: kevans (mentor)
Differential Revision: https://reviews.freebsd.org/D27490
Now that bhyve(8) supports UART, bvmconsole and bvmdebug are no longer needed.
Mark the '-b' and '-g' flag as deprecated for bhyve(8).
These will be removed in 13.
Reviewed by: jhb, grehan
Approved by: kevans (mentor)
MFC after: 1 week
Differential Revision: https://reviews.freebsd.org/D27519
Add update to RIP after a userspace instruction decode (as is done for
the in-kernel counterpart of this case).
Submitted by: adam_fenn.io
Reviewed by: cem, markj
Approved by: grehan (bhyve)
MFC after: 3 weeks
Differential Revision: https://reviews.freebsd.org/D27243
Introduce -D flag that allows for the VM to be destroyed on guest initiated
power-off by the bhyve(8) process itself.
This is quality of life change that allows for simpler deployments without
the need for bhyvectl --destroy.
Requested by: swills
Reviewed by: 0mp (manpages), grehan, kib, swills
Approved by: kib (mentor)
MFC after: 2 weeks
Sponsored by: Mysterious Code Ltd.
Differential Revision: https://reviews.freebsd.org/D25414
If userspace has a newer bhyve than the kernel, it may be able to decode
and emulate some instructions vmm.ko is unaware of. In this scenario,
reset decoder state and try again.
Reviewed by: grehan
Differential Revision: https://reviews.freebsd.org/D24464
Expose the special kernel LAPIC, IOAPIC, and HPET devices to userspace
for use in, e.g., fallback instruction emulation (when userspace has a
newer instruction decode/emulation layer than the kernel vmm(4)).
Plumb the ioctl through libvmmapi and register the memory ranges in
bhyve(8).
Reviewed by: grehan
Differential Revision: https://reviews.freebsd.org/D24525
Save and restore (also known as suspend and resume) permits a snapshot
to be taken of a guest's state that can later be resumed. In the
current implementation, bhyve(8) creates a UNIX domain socket that is
used by bhyvectl(8) to send a request to save a snapshot (and
optionally exit after the snapshot has been taken). A snapshot
currently consists of two files: the first holds a copy of guest RAM,
and the second file holds other guest state such as vCPU register
values and device model state.
To resume a guest, bhyve(8) must be started with a matching pair of
command line arguments to instantiate the same set of device models as
well as a pointer to the saved snapshot.
While the current implementation is useful for several uses cases, it
has a few limitations. The file format for saving the guest state is
tied to the ABI of internal bhyve structures and is not
self-describing (in that it does not communicate the set of device
models present in the system). In addition, the state saved for some
device models closely matches the internal data structures which might
prove a challenge for compatibility of snapshot files across a range
of bhyve versions. The file format also does not currently support
versioning of individual chunks of state. As a result, the current
file format is not a fixed binary format and future revisions to save
and restore will break binary compatiblity of snapshot files. The
goal is to move to a more flexible format that adds versioning,
etc. and at that point to commit to providing a reasonable level of
compatibility. As a result, the current implementation is not enabled
by default. It can be enabled via the WITH_BHYVE_SNAPSHOT=yes option
for userland builds, and the kernel option BHYVE_SHAPSHOT.
Submitted by: Mihai Tiganus, Flavius Anton, Darius Mihai
Submitted by: Elena Mihailescu, Mihai Carabas, Sergiu Weisz
Relnotes: yes
Sponsored by: University Politehnica of Bucharest
Sponsored by: Matthew Grooms (student scholarships)
Sponsored by: iXsystems
Differential Revision: https://reviews.freebsd.org/D19495
Print the failed instruction stream as a contiguous stream of hex. This
is closer to something you could throw at a disassembler than 0xHH 0xHH
0xHH.
Also, use the debug.h 'raw' stdio-aware printf helper to avoid the
cascading
line
effect.
Add an implementatation of the 'Virtual Machine Generation ID' spec to
Bhyve. The spec provides a randomly generated GUID (at bhyve start) in
device memory, along with an ACPI device with _CID VM_Gen_Counter and ADDR
evaluating to a Package pointing at that GUID.
A GPE is defined which Notifies the ACPI Device when the generation changes
(such as when a snapshot is rolled back). At this time, Bhyve does not
support snapshotting, so the GPE is never actually raised.
Suggested by: rpokala
Discussed with: grehan
Differential Revision: https://reviews.freebsd.org/D23165
To allow more general use of the bootrom region, separate initialization from
allocation, and allocation from loading a file.
The bootrom segment is the high 16MB of the low 4GB region.
Each allocation in the segment creates a new mapping with specified protection.
By default, allocation begins at the low end of the range. However, the
BOOTROM_ALLOC_TOP flag is provided to locate a provided bootrom in the high
region it is expected to be in.
The existing ROM-file loading code is refactored to use the new interface.
Reviewed by: grehan (earlier version)
Differential Revision: https://reviews.freebsd.org/D24422
Add printf() wrapper to use CR/CRLF terminators depending on whether
stdio is mapped to a tty open in raw mode.
Try to use the wrapper everywhere.
For now we leave the custom DPRINTF/WPRINTF defined by device
models, but we may remove them in the future.
Reviewed by: grehan, jhb
MFC after: 2 weeks
Differential Revision: https://reviews.freebsd.org/D22657
- Allow the userland hypervisor to intercept breakpoint exceptions
(BP#) in the guest. A new capability (VM_CAP_BPT_EXIT) is used to
enable this feature. These exceptions are reported to userland via
a new VM_EXITCODE_BPT that includes the length of the original
breakpoint instruction. If userland wishes to pass the exception
through to the guest, it must be explicitly re-injected via
vm_inject_exception().
- Export VMCS_ENTRY_INST_LENGTH as a VM_REG_GUEST_ENTRY_INST_LENGTH
pseudo-register. Injecting a BP# on Intel requires setting this to
the length of the breakpoint instruction. AMD SVM currently ignores
writes to this register (but reports success) and fails to read it.
- Rework the per-vCPU state tracked by the debug server. Rather than
a single 'stepping_vcpu' global, add a structure for each vCPU that
tracks state about that vCPU ('stepping', 'stepped', and
'hit_swbreak'). A global 'stopped_vcpu' tracks which vCPU is
currently reporting an event. Event handlers for MTRAP and
breakpoint exits loop until the associated event is reported to the
debugger.
Breakpoint events are discarded if the breakpoint is not present
when a vCPU resumes in the breakpoint handler to retry submitting
the breakpoint event.
- Maintain a linked-list of active breakpoints in response to the GDB
'Z0' and 'z0' packets.
Reviewed by: markj (earlier version)
MFC after: 2 months
Differential Revision: https://reviews.freebsd.org/D20309
will return success when the kernel is built without support of
the capability mode.
It is important to note, that I'm taking a more conservative approach
with these changes and it will be done in small steps.
Reviewed by: jhb
MFC after: 6 weeks
Differential Revision: https://reviews.freebsd.org/D18744
Architectures Software Developer’s Manual Volume 3"). Add the document
to SEE ALSO in bhyve.8 (and pet manlint here a bit).
Reviewed by: jhb, rgrimes, 0mp
Approved by: kib (mentor)
Differential Revision: https://reviews.freebsd.org/D17531
For tools that uses bhyve such like libvirt, it is important to be able to
probe what features are supported by the given bhyve binary.
To give more context, libvirt probes bhyve's capabilities in a not very
effective way:
- Running 'bhyve -h' and parsing output.
- To detect devices, it runs 'bhyve -s 0,dev' for every each device and
parses error output to identify if the device is supported or not.
PR: 2101111
Submitted by: novel
MFC after: 2 weeks
Relnotes: yes
Sponsored by: iXsystems Inc.
The bhyve(8) exit status indicates how the VM was terminated:
0 rebooted
1 powered off
2 halted
3 triple fault
The problem is when we have wrappers around bhyve that parses the exit
error code and gets an exit(1) for an error but interprets it as "powered off".
So to mitigate this issue and makes it less error prone for third part
applications, I have added a new exit code 4 that is "exited due to an error".
For now the bhyve(8) exit status are:
0 rebooted
1 powered off
2 halted
3 triple fault
4 exited due to an error
Reviewed by: @jhb
MFC after: 2 weeks.
Sponsored by: iXsystems Inc.
Differential Revision: https://reviews.freebsd.org/D16161
strdup(3) allocates memory for a copy of the string, does the copy and
returns a pointer to it. If there is no sufficient memory NULL is returned
and the global errno is set to ENOMEM.
We do a sanity check to see if it was possible to allocate enough memory.
Also as we allocate memory, we need to free this memory used. Or it will
going out of scope leaks the storage it points to.
Reviewed by: rgrimes
MFC after: 3 weeks.
X-MFC: r332298
Sponsored by: iXsystems Inc.
Differential Revision: https://reviews.freebsd.org/D15550
This commit adds a new debug server to bhyve. Unlike the existing -g
option which provides an efficient connection to a debug server
running in the guest OS, this debug server permits inspection and
control of the guest from within the hypervisor itself without
requiring any cooperation from the guest. It is similar to the debug
server provided by qemu.
To avoid conflicting with the existing -g option, a new -G option has
been added that accepts a TCP port. An IPv4 socket is bound to this
port and listens for connections from debuggers. In addition, if the
port begins with the character 'w', the hypervisor will pause the
guest at the first instruction until a debugger attaches and
explicitly continues the guest. Note that only a single debugger can
attach to a guest at a time.
Virtual CPUs are exposed to the remote debugger as threads. General
purpose register values can be read for each virtual CPU. Other
registers cannot currently be read, and no register values can be
changed by the debugger.
The remote debugger can read guest memory but not write to guest
memory. To facilitate source-level debugging of the guest, memory
addresses from the debugger are treated as virtual addresses (rather
than physical addresses) and are resolved to a physical address using
the active virtual address translation of the current virtual CPU.
Memory reads should honor memory mapped I/O regions, though the debug
server does not attempt to honor any alignment or size constraints
when accessing MMIO.
The debug server provides limited support for controlling the guest.
The guest is suspended when a debugger is attached and resumes when a
debugger detaches. A debugger can suspend a guest by sending a Ctrl-C
request (e.g. via Ctrl-C in GDB). A debugger can also continue a
suspended guest while remaining attached. Breakpoints are not yet
supported. Single stepping is supported on Intel CPUs that support
MTRAP VM exits, but is not available on other systems.
While the current debug server has limited functionality, it should
at least be usable for basic debugging now. It is also a useful
checkpoint to serve as a base for adding additional features.
Reviewed by: grehan
Differential Revision: https://reviews.freebsd.org/D15022
from userland without the need to use sysctls, it allows the old
sysctls to continue to function, but deprecates them at
FreeBSD_version 1200060 (Relnotes for deprecate).
The command line of bhyve is maintained in a backwards compatible way.
The API of libvmmapi is maintained in a backwards compatible way.
The sysctl's are maintained in a backwards compatible way.
Added command option looks like:
bhyve -c [[cpus=]n][,sockets=n][,cores=n][,threads=n][,maxcpus=n]
The optional parts can be specified in any order, but only a single
integer invokes the backwards compatible parse. [,maxcpus=n] is
hidden by #ifdef until kernel support is added, though the api
is put in place.
bhyvectl --get-cpu-topology option added.
Reviewed by: grehan (maintainer, earlier version),
Reviewed by: bcr (manpages)
Approved by: bde (mentor), phk (mentor)
Tested by: Oleg Ginzburg <olevole@olevole.ru> (cbsd)
MFC after: 1 week
Relnotes: Y
Differential Revision: https://reviews.freebsd.org/D9930
Mainly focus on files that use BSD 2-Clause license, however the tool I
was using misidentified many licenses so this was mostly a manual - error
prone - task.
The Software Package Data Exchange (SPDX) group provides a specification
to make it easier for automated tools to detect and summarize well known
opensource licenses. We are gradually adopting the specification, noting
that the tags are considered only advisory and do not, in any way,
superceed or replace the license texts.
No functional change intended.
this on the branch.
Original commit message:
Initial bhyve native graphics support.
This adds emulations for a raw framebuffer device, PS2 keyboard/mouse,
XHCI USB controller and a USB tablet.
A simple VNC server is provided for keyboard/mouse input, and graphics
output.
A VGA emulation is included, but is currently disconnected until an
additional bhyve change to block out VGA memory is committed.
Credits:
- raw framebuffer, VNC server, XHCI controller, USB bus/device emulation
and UEFI f/w support by Leon Dang
- VGA, console/g, initial VNC server by tychon@
- PS2 keyboard/mouse jointly done by tychon@ and Leon Dang
- hypervisor framebuffer mem support by neel@
Tested by: Michael Dexter, in a number of revisions of this code.
With the appropriate UEFI image, FreeBSD, Windows and Linux guests can
installed and run in graphics mode using the UEFI/GOP framebuffer.
Approved by: re (gjb)
A couple of minor memory size option related nits:
- use common name 'memsize' (instead of 'max-size' or just 'size')
- bhyve: update usage with memsize unit suffix, drop legacy "MB"
unit
- bhyveload: update usage with memsize unit suffix
- bhyve(8): document default size
- bhyveload(8): use memsize formatting like it's done
in bhyve(8)
Reviewed by: wblock, grehan
Approved by: re (kib), wblock, grehan
Differential Revision: https://reviews.freebsd.org/D6952
if vm_activate_cpu(..) fails when called from fbsdrun_addcpu(..)
MFC after: 1 week
PR: 203884
Reviewed by: grehan
Submitted by: William Orr <will@worrbase.com>
to the qemu one, and uses the same i/o ports but with different
messaging. Requires the 'bootrom' option to be enabled.
This is used by UEFI (and potentially other BIOSs/firmware) to
request information from bhyve. Currently, only the number of
vCPUs is made available, with more to follow.
A very large thankyou to Ben Perrault who helped out testing
an earlier version of this, and bhyve/Windows in general.
Reviewed by: tychon
Discussed with: neel
Sponsored by: Nahanni Systems
devmem is used to represent MMIO devices like the boot ROM or a VESA framebuffer
where doing a trap-and-emulate for every access is impractical. devmem is a
hybrid of system memory (sysmem) and emulated device models.
devmem is mapped in the guest address space via nested page tables similar
to sysmem. However the address range where devmem is mapped may be changed
by the guest at runtime (e.g. by reprogramming a PCI BAR). Also devmem is
usually mapped RO or RW as compared to RWX mappings for sysmem.
Each devmem segment is named (e.g. "bootrom") and this name is used to
create a device node for the devmem segment (e.g. /dev/vmm/testvm.bootrom).
The device node supports mmap(2) and this decouples the host mapping of
devmem from its mapping in the guest address space (which can change).
Reviewed by: tychon
Discussed with: grehan
Differential Revision: https://reviews.freebsd.org/D2762
MFC after: 4 weeks
"sleeping" state. This is done by forcing the vcpu to transition to "idle"
by returning to userspace with an exit code of VM_EXITCODE_REQIDLE.
MFC after: 2 weeks
The default remains localtime for compatibility with the original device model
in bhyve(8). This is required for OpenBSD guests which assume that the RTC
keeps UTC time.
Reviewed by: grehan
Pointed out by: Jason Tubnor (jason@tubnor.net)
MFC after: 2 weeks
Keep track of the next instruction to be executed by the vcpu as 'nextrip'.
As a result the VM_RUN ioctl no longer takes the %rip where a vcpu should
start execution.
Also, instruction restart happens implicitly via 'vm_inject_exception()' or
explicitly via 'vm_restart_instruction()'. The APIs behave identically in
both kernel and userspace contexts. The main beneficiary is the instruction
emulation code that executes in both contexts.
bhyve(8) VM exit handlers now treat 'vmexit->rip' and 'vmexit->inst_length'
as readonly:
- Restarting an instruction is now done by calling 'vm_restart_instruction()'
as opposed to setting 'vmexit->inst_length' to 0 (e.g. emulate_inout())
- Resuming vcpu at an arbitrary %rip is now done by setting VM_REG_GUEST_RIP
as opposed to changing 'vmexit->rip' (e.g. vmexit_task_switch())
Differential Revision: https://reviews.freebsd.org/D1526
Reviewed by: grehan
MFC after: 2 weeks
code. There are only a handful of MSRs common between the two so there isn't
too much duplicate functionality.
The VT-x code has the following types of MSRs:
- MSRs that are unconditionally saved/restored on every guest/host context
switch (e.g., MSR_GSBASE).
- MSRs that are restored to guest values on entry to vmx_run() and saved
before returning. This is an optimization for MSRs that are not used in
host kernel context (e.g., MSR_KGSBASE).
- MSRs that are emulated and every access by the guest causes a trap into
the hypervisor (e.g., MSR_IA32_MISC_ENABLE).
Reviewed by: grehan
VM-exit and ultimately on whether nRIP is valid. This allows us to update
the %rip after the emulation is finished so any exceptions triggered during
the emulation will point to the right instruction.
Don't attempt to handle INS/OUTS VM-exits unless the DecodeAssist capability
is available. The effective segment field in EXITINFO1 is not valid without
this capability.
Add VM_EXITCODE_SVM to flag SVM VM-exits that cannot be handled. Provide the
VMCB fields exitinfo1 and exitinfo2 as collateral to help with debugging.
Provide a SVM VM-exit handler to dump the exitcode, exitinfo1 and exitinfo2
fields in bhyve(8).
Reviewed by: Anish Gupta (akgupt3@gmail.com)
Reviewed by: grehan