a diagnostic message. So we do a sanity checking on the return value
of vq_getchain().
Spotted by: gcc49
Reviewed by: avg
MFC after: 4 weeks
Sponsored by: iXsystems Inc.
Differential Revision: https://reviews.freebsd.org/D15388
The prior code only allowed multiples of 32 for the
numbers of columns. Remove this restriction to allow
a forthcoming UEFI firmware update to allow arbitrary
x,y resolutions.
(the code for handling rows already supported non mult-32 values)
Reviewed by: Leon Dang (original author)
MFC after: 3 weeks
Differential Revision: https://reviews.freebsd.org/D15274
This commit adds a new debug server to bhyve. Unlike the existing -g
option which provides an efficient connection to a debug server
running in the guest OS, this debug server permits inspection and
control of the guest from within the hypervisor itself without
requiring any cooperation from the guest. It is similar to the debug
server provided by qemu.
To avoid conflicting with the existing -g option, a new -G option has
been added that accepts a TCP port. An IPv4 socket is bound to this
port and listens for connections from debuggers. In addition, if the
port begins with the character 'w', the hypervisor will pause the
guest at the first instruction until a debugger attaches and
explicitly continues the guest. Note that only a single debugger can
attach to a guest at a time.
Virtual CPUs are exposed to the remote debugger as threads. General
purpose register values can be read for each virtual CPU. Other
registers cannot currently be read, and no register values can be
changed by the debugger.
The remote debugger can read guest memory but not write to guest
memory. To facilitate source-level debugging of the guest, memory
addresses from the debugger are treated as virtual addresses (rather
than physical addresses) and are resolved to a physical address using
the active virtual address translation of the current virtual CPU.
Memory reads should honor memory mapped I/O regions, though the debug
server does not attempt to honor any alignment or size constraints
when accessing MMIO.
The debug server provides limited support for controlling the guest.
The guest is suspended when a debugger is attached and resumes when a
debugger detaches. A debugger can suspend a guest by sending a Ctrl-C
request (e.g. via Ctrl-C in GDB). A debugger can also continue a
suspended guest while remaining attached. Breakpoints are not yet
supported. Single stepping is supported on Intel CPUs that support
MTRAP VM exits, but is not available on other systems.
While the current debug server has limited functionality, it should
at least be usable for basic debugging now. It is also a useful
checkpoint to serve as a base for adding additional features.
Reviewed by: grehan
Differential Revision: https://reviews.freebsd.org/D15022
from userland without the need to use sysctls, it allows the old
sysctls to continue to function, but deprecates them at
FreeBSD_version 1200060 (Relnotes for deprecate).
The command line of bhyve is maintained in a backwards compatible way.
The API of libvmmapi is maintained in a backwards compatible way.
The sysctl's are maintained in a backwards compatible way.
Added command option looks like:
bhyve -c [[cpus=]n][,sockets=n][,cores=n][,threads=n][,maxcpus=n]
The optional parts can be specified in any order, but only a single
integer invokes the backwards compatible parse. [,maxcpus=n] is
hidden by #ifdef until kernel support is added, though the api
is put in place.
bhyvectl --get-cpu-topology option added.
Reviewed by: grehan (maintainer, earlier version),
Reviewed by: bcr (manpages)
Approved by: bde (mentor), phk (mentor)
Tested by: Oleg Ginzburg <olevole@olevole.ru> (cbsd)
MFC after: 1 week
Relnotes: Y
Differential Revision: https://reviews.freebsd.org/D9930
Mainly focus on files that use BSD 2-Clause license, however the tool I
was using misidentified many licenses so this was mostly a manual - error
prone - task.
The Software Package Data Exchange (SPDX) group provides a specification
to make it easier for automated tools to detect and summarize well known
opensource licenses. We are gradually adopting the specification, noting
that the tags are considered only advisory and do not, in any way,
superceed or replace the license texts.
No functional change intended.
When using -l option targeting file that can't be opened (ie. nmdm module
is not loaded and /dev/nmdm* is specified) bhyve tries to apply capsicum
capabilities to a file that was not opened.
Enclose that code in an if statement and only run it on correctly opened
descriptor also providing meaningful message in case of an error.
Submitted by: Pawel Biernacki <pawel.biernacki@gmail.com>
Reviewed by: grehan, emaste
Sponsoied by: Mysterious Code Ltd.
Differential Revision: D12985
Gcc noticed that the result of the bit shift is always zero. Shift so
that the ATC_CS_C67 bits end up in bits 6 & 7.
Reviewed by: grehan, tychon
Approved by: markj (mentor)
Sponsored by: Dell EMC Isilon
Differential Revision: https://reviews.freebsd.org/D11775
Gcc complained that e82545_tx_thread has a return type declared but
doesn't return anything. Annotate the procedure with _Noreturn.
Reviewed by: grehan
Approved by: markj (mentor)
Sponsored by: Dell EMC Isilon
Differential Revision: https://reviews.freebsd.org/D11774
pthread_join(3). The variable tid is not yet initialized in case
the authentication fails at early stage, that would lead pthread_join be
called with an uninitialized variable.
CID: 1375950
Reported by: Coverity, cem
Reviewed by: cem
MFC after: 3 weeks.
Sponsored by: iXsystems, Inc.
Differential Revision: https://reviews.freebsd.org/D11150
Also add __FBSDID.
Reviewed by: grehan
This file lacks a license(!) so for this change the following declaration
applies:
To the greatest extent permitted by, but not in contravention of,
applicable law, Affirmer hereby overtly, fully, permanently, irrevocably
and unconditionally waives, abandons, and surrenders all of Affirmer's
Copyright and Related Rights and associated claims and causes of action,
whether now known or unknown (including existing as well as future claims
and causes of action).
not on wildcard. [1]
- Move the default port assignment from pci_fbuf.c to rfb.c,
to avoid polluting pci_fbuf.c with network things.
Suggested by: grehan
There is no behavioral difference, as it's just swapping
out the name of two identically-valued constants.
Submitted by: Vicki Pfau (vi AT endrift.com)
MFC after: 3 days
Differential Revision: https://reviews.freebsd.org/D9597
This prevented the device from attaching with a
Windows guest (most other guests use the device type
for matching)
PR: 212711
Submitted by: jbeich
MFC after: 3 days
in the first byte of the 3-byte mouse data report.
Plan9/9front requires this.
Switch over to using #defines for the data report bits.
Verified no regression on Win10/Fedora-live.
Reported and tested by: Trent Thompson (trentnthompson at gmail com)
MFC after: 1 week
The TCP server implementation in dbgport does not track clients, so it
may try to write to a disconected socket resulting in SIGPIPE.
Avoid that by setting SO_NOSIGPIPE socket option.
Because dbgport emulates an I/O port to guest, the communication is done
byte by byte. Reduce latency of the TCP/IP transfers by using
TCP_NODELAY option. In my tests that change improves performance of
kgdb commands with lots of output (e.g. info threads) by two orders of
magnitude.
A general note. Since we have a uart emulation in bhyve, that can be
used for the console and gdb access to guests. So, bvmconsole and bvmdebug
could be de-orbited now. But there are many existing deployments that
still dependend on those.
Discussed with: julian, jhb
MFC after: 2 weeks
Sponsored by: Panzura
writev() can do a short write. Retrying it results in a very convoluted
and complex code, so we iterate over iovec and do regular stream_write()
instead.
Approved by: trasz
Sponsored by: iXsystems, Inc.
These functions are allowed to overwrite their input. Pull a copy of the
input parameter and call dirname() and basename() on that instead. Do
ensure that we reload the pathname value between calls.
Adds virtio-console device support to bhyve, allowing to create
bidirectional character streams between host and guest.
Syntax:
-s <slotnum>,virtio-console,port1=/path/to/port1.sock,anotherport=...
Maximum of 16 ports per device can be created. Every port is named
and corresponds to an Unix domain socket created by bhyve. bhyve
accepts at most one connection per port at a time.
Limitations:
- due to lack of destructors of in bhyve, sockets on the filesystem
must be cleaned up manually after bhyve exits
- there's no way to use "console port" feature, nor the console port
resize as of now
- emergency write is advertised, but no-op as of now
Approved by: trasz
MFC after: 1 month
Relnotes: yes
Sponsored by: iXsystems, Inc.
Differential Revision: D7185
"io" is the default, and allows VGA i/o registers to be
accessed. This is required by Win7/2k8 graphics guests that
use a combination of BIOS int10 and UEFI.
"off" disables all VGA i/o and mem accesses.
"on" is not yet hooked up, but will enable full VGA rendering.
OpenBSD/UEFI >= 5.9 graphics guests can be booted using "vga=off"
- Allow "rfb" to be used instead of "tcp" for the fbuf VNC
description. "tcp" will be removed at a future point and is
kept as an alias.
Discussed with: Leon Dang
MFC after: 3 days
injected without state being set up.
This fixes a core dump when dropping to the UEFI prompt
with graphics enabled and moving the mouse around.
Discussed with: Leon Dang
MFC after: 3 days
There seems no hard limit on number of segments per packet in the chip,
and 20 appeared insufficient. Hope 64 will be enough, but if not -- add
check to report that and drop the packet instead of corrupting stack.
This makes factual interrupt routing match one shipped with UEFI firmware.
With old firmware this make legacy interrupts work reliable for functions 0
of PCI slots 3-6. Updated UEFI image fixes problem completely.
The code was successfully tested with FreeBSD, Linux, Solaris and Windows
guests. This interface is predictably slower (about 2x) then virtio-net,
but it is very helpful for guests not supporting virtio-net by default.
Thanks to Jeremiah Lott and Peter Grehan for doing original heavy lifting.
It was useless before, but may improve performance now if multiple devices
are configured and guest supports this feature.
Sponsored by: iXsystems, Inc.
While old syntax is still supported, new syntax looks like this:
-s 3,ahci,hd:/dev/zvol/XXX,hd:/dev/zvol/YYY,cd:/storage/ZZZ.iso
Sponsored by: iXsystems, Inc.
Add `WRAPPED_CTASSERT` macro by annotating CTASSERTs with __unused
to deal with -Wunused-local-typedefs warnings from gcc 4.8+.
All other compilers (clang, etc) use CTASSERT as-is. A more generic
solution for this issue will be proposed after ^/stable/11 is forked.
Consolidate all CTASSERTs under one block instead of inlining them in
functions.
Approved by: re (gjb)
Differential Revision: https://reviews.freebsd.org/D7119
MFC after: 1 week
Reported by: Jenkins
Reviewed by: grehan (maintainer)
Sponsored by: EMC / Isilon Storage Division
Put cfl/prdt under AHCI_DEBUG #defines as they are only used in
those cases.
Approved by: re (gjb)
Differential Revision: https://reviews.freebsd.org/D7119
MFC after: 1 week
Reported by: Jenkins
Reviewed by: grehan (maintainer)
Sponsored by: EMC / Isilon Storage Division
this on the branch.
Original commit message:
Initial bhyve native graphics support.
This adds emulations for a raw framebuffer device, PS2 keyboard/mouse,
XHCI USB controller and a USB tablet.
A simple VNC server is provided for keyboard/mouse input, and graphics
output.
A VGA emulation is included, but is currently disconnected until an
additional bhyve change to block out VGA memory is committed.
Credits:
- raw framebuffer, VNC server, XHCI controller, USB bus/device emulation
and UEFI f/w support by Leon Dang
- VGA, console/g, initial VNC server by tychon@
- PS2 keyboard/mouse jointly done by tychon@ and Leon Dang
- hypervisor framebuffer mem support by neel@
Tested by: Michael Dexter, in a number of revisions of this code.
With the appropriate UEFI image, FreeBSD, Windows and Linux guests can
installed and run in graphics mode using the UEFI/GOP framebuffer.
Approved by: re (gjb)
A couple of minor memory size option related nits:
- use common name 'memsize' (instead of 'max-size' or just 'size')
- bhyve: update usage with memsize unit suffix, drop legacy "MB"
unit
- bhyveload: update usage with memsize unit suffix
- bhyve(8): document default size
- bhyveload(8): use memsize formatting like it's done
in bhyve(8)
Reviewed by: wblock, grehan
Approved by: re (kib), wblock, grehan
Differential Revision: https://reviews.freebsd.org/D6952
When bhyve cannot open a backing file, it now says explicitly which file
could not be opened
Note that the change has only be maed in block_if.c and not in
pci_virtio_block.c as the error will always be catched by the first
PR: 202321 (different patch)
Reviewed by: grehan
MFC after: 3 day
Sponsored by: Gandi.net
Differential Revision: https://reviews.freebsd.org/D6576
Previously, many errors (such as the PCI device not being attached
to the ppt(4) driver) resulted in bhyve silently exiting without
starting the virtual machine. Now any errors encountered when
configuring a virtual slot for a PCI passthru device should be noted
on stderr.
Reviewed by: neel
Differential Revision: https://reviews.freebsd.org/D5990
If the PBA shares a page with the MSI-X table, map the shared page via
/dev/mem and emulate accesses to the portion of the PBA in the shared
page by accessing the mapped page.
Reviewed by: grehan
MFC after: 1 week
Differential Revision: https://reviews.freebsd.org/D5919
ncq was not being inititialized properly but it was not actually
necessary either, so make the code smaller by removing it.
CID: 1248842
Reviewed by: grehan
The buffer length should be checked to avoid overflow, but there
is no API to get the slot length, so the hardcoded value is used.
Return the currently-first request chain back to the available
queue if there are no more packets.
Report the link as up if we managed to open vale port.
Use consistent coding style.
Submitted by: btw
MFC after: 1 week
Differential Revision: https://reviews.freebsd.org/D5595
if vm_activate_cpu(..) fails when called from fbsdrun_addcpu(..)
MFC after: 1 week
PR: 203884
Reviewed by: grehan
Submitted by: William Orr <will@worrbase.com>
- Don't advertize trusted-computing capability in the Identify page.
This prevents Windows from issuing a TRUSTED_RECEIVE_DMA command.
- Windows will send down SMART and SECURITY_FREEZE_LOCK
even though smart and security capabilities were not advertized.
Send back a silent abort.
Reviewed by: mav
to the qemu one, and uses the same i/o ports but with different
messaging. Requires the 'bootrom' option to be enabled.
This is used by UEFI (and potentially other BIOSs/firmware) to
request information from bhyve. Currently, only the number of
vCPUs is made available, with more to follow.
A very large thankyou to Ben Perrault who helped out testing
an earlier version of this, and bhyve/Windows in general.
Reviewed by: tychon
Discussed with: neel
Sponsored by: Nahanni Systems
the largest that the Windows virtio driver can send down
- Always advertize indirect descriptors. The Illumos virtio
driver won't attach unless this capability is seen.
Reviewed by: neel
The /etc/ttys entry for a serial console in FreeBSD/x86 is as follows:
ttyu0 "/usr/libexec/getty 3wire" vt100 onifconsole secure
The initial terminal type passed to getty(8) is "3wire" which sets the
CLOCAL flag. However reset(1) clears this flag and any programs that try
to open the terminal will hang waiting for DCD to be asserted.
Fix this by always asserting DCD and DSR in the emulated uart.
The following discussion on virtualization@ has more details:
https://lists.freebsd.org/pipermail/freebsd-virtualization/2015-June/003666.html
Reported by: jmg
Discussed with: grehan
devmem is used to represent MMIO devices like the boot ROM or a VESA framebuffer
where doing a trap-and-emulate for every access is impractical. devmem is a
hybrid of system memory (sysmem) and emulated device models.
devmem is mapped in the guest address space via nested page tables similar
to sysmem. However the address range where devmem is mapped may be changed
by the guest at runtime (e.g. by reprogramming a PCI BAR). Also devmem is
usually mapped RO or RW as compared to RWX mappings for sysmem.
Each devmem segment is named (e.g. "bootrom") and this name is used to
create a device node for the devmem segment (e.g. /dev/vmm/testvm.bootrom).
The device node supports mmap(2) and this decouples the host mapping of
devmem from its mapping in the guest address space (which can change).
Reviewed by: tychon
Discussed with: grehan
Differential Revision: https://reviews.freebsd.org/D2762
MFC after: 4 weeks
"sleeping" state. This is done by forcing the vcpu to transition to "idle"
by returning to userspace with an exit code of VM_EXITCODE_REQIDLE.
MFC after: 2 weeks
capablity by advertising pcie capability.
Since the 'hostbridge' device isn't a true pci-to-pci bridge, and
doesn't actaully use the bridge configuration space layout, change
the header-type from type 1 to type 0 to avoid confusion.
Reviewed by: neel
While there is no issued with the number of descriptors in
a virtio indirect descriptor, it's a guest's choice as to
whether indirect descriptors are used. For the case where
they aren't, the virtio block ring size is still 64 which
is less than the now reported max_segs of 67. This results
in an assertion in recent Linux guests even though it was
benign since they were using indirect descs.
The intertwined relationship between virtio ring size,
max seg size and blockif queue size will be addressed
in an upcoming commit, at which point the max descriptors
will again be bumped up to 67.
The Windows virtio driver ignores the advertized seg_max
field and assumes the host can accept up to 67 segments
in indirect descriptors, triggering an assert in the bhyve
process.
No objection from: mav
Reviewed by: neel
Reported and tested by: Leon Dang (ldang@nahannisys.com)
MFC after: 2 weeks
The default behavior is to infer the logical and physical sector sizes from
the block device backend. However older versions of Windows only work with
specific logical/physical combinations:
- Vista and Windows 7: 512/512
- Windows 7 SP1: 512/512 or 512/4096
For this reason allow the sector size to be specified using the following
block device option: sectorsize=logical[/physical]
Reported by: Leon Dang (ldang@nahannisys.com)
Reviewed by: grehan
MFC after: 2 weeks
not one that needs to be negotiated. Use the host capabilities
field and not the negotiated field when verifying that indirect
descriptors are supported.
Found with the Redhat Windows viostor driver, which clears
the indirect capability in the negotiated caps and then starts
using them.
Reported and tested by: Leon Dang (ldang@nahannisys.com)
MFC after: 2 weeks
This is needed to support Windows guests that use byte reads to access certain
AHCI registers (e.g. PxTFD.Status and PxTFD.Error).
Reviewed by: grehan, mav
Reported by: Leon Dang (ldang@nahannisys.com)
Differential Revision: https://reviews.freebsd.org/D2469
MFC after: 2 weeks
Prior to this change both functions returned 0 for success, -1 for failure
and +1 to indicate that an exception was injected into the guest.
The numerical value of ERESTART also happens to be -1 so when these functions
returned -1 it had to be translated to a positive errno value to prevent the
VM_RUN ioctl from being inadvertently restarted. This made it easy to introduce
bugs when writing emulation code.
Fix this by adding an 'int *guest_fault' parameter and setting it to '1' if
an exception was delivered to the guest. The return value is 0 or EFAULT so
no additional translation is needed.
Reviewed by: tychon
MFC after: 2 weeks
Differential Revision: https://reviews.freebsd.org/D2428
It is not required to use CLO to recover from task file error, it should
be enough to do only stop/start, that does not clear the PxTFD.STS.ERR.
MFC after: 13 days
Using status updates in r282364, block queue on BSY, DRQ or ERR bits set.
This can be a performance penalization for non-NCQ commands, but it is
required for proper error recovery and standard compliance.
MFC after: 2 weeks
GEOM does not support scatter/gather lists in its I/Os. Such requests
are cut in pieces by physio(), that may be problematic, if those pieces
are not multiple of provider's sector size. If such case is detected,
move the data through temporary sequential buffer.
MFC after: 2 weeks
There are a number of assumptions about legacy interrupts always
being available in virtio so don't allow back-ends to make the
decision to support them.
This fixes the issue seen with virtio-rnd on OpenBSD. MSI-x vectors
were not being used, and the virtio-rnd backend wasn't allocating a
legacy interrupt resulting in a bhyve assert and guest exit.
Reported by: Julian Hsiao, madoka at nyanisore dot net
Reviewed by: neel
MFC after: 1 week
I've missed that network driver sometimes returns taken request back to
available queue without processing. Add new helper function for that case.
Reported by: flo
MFC after: 2 weeks
I/O interface.
Asynchronous operation, based on r280026 change, allows to not block virtual
CPU during I/O processing, that on slow/busy storage can take seconds.
Use of recently improved block I/O interface allows to process multiple
requests same time, that improves random I/O performance on wide storages.
Benchmarks of virtual disk, backed by ZVOL on RAID10 pool of 4 HDDs, show
~3.5 times random read performance improvements, while no degradation on
linear I/O. Guest CPU usage during test dropped from 100% to almost zero.
MFC after: 2 weeks
Original virtqueue design allows queued and out-of-order processing, but
helpers added in r253440 suppose only direct blocking in-order one.
It could be fine for network, etc., but it is a huge limitation for storage
devices.
On parallel random I/O this allows better utilize wide storage pools.
To not confuse prefetcher on linear I/O, consecutive requests are executed
sequentially, following the same logic as was earlier implemented in CTL.
Benchmarks of virtual AHCI disk, backed by ZVOL on RAID10 pool of 4 HDDs,
show ~3.5 times random read performance improvements, while no degradation
on linear I/O.
MFC after: 2 weeks
It works only for virtual disks backed by ZVOLs and raw devices supporting
BIO_DELETE. Virtual disks backed by files won't report this capability.
MFC after: 2 weeks
Relnotes: yes
Passing through physical block size/offset from underlying storage allows
guest to manage proper data and I/O alignment to improve performance.
MFC after: 2 weeks
OpenBSD guests test bit 0 of this MSR to detect whether the workaround for
erratum 721 has been applied.
Reported by: Jason Tubnor (jason@tubnor.net)
MFC after: 1 week
The default remains localtime for compatibility with the original device model
in bhyve(8). This is required for OpenBSD guests which assume that the RTC
keeps UTC time.
Reviewed by: grehan
Pointed out by: Jason Tubnor (jason@tubnor.net)
MFC after: 2 weeks
Keep track of the next instruction to be executed by the vcpu as 'nextrip'.
As a result the VM_RUN ioctl no longer takes the %rip where a vcpu should
start execution.
Also, instruction restart happens implicitly via 'vm_inject_exception()' or
explicitly via 'vm_restart_instruction()'. The APIs behave identically in
both kernel and userspace contexts. The main beneficiary is the instruction
emulation code that executes in both contexts.
bhyve(8) VM exit handlers now treat 'vmexit->rip' and 'vmexit->inst_length'
as readonly:
- Restarting an instruction is now done by calling 'vm_restart_instruction()'
as opposed to setting 'vmexit->inst_length' to 0 (e.g. emulate_inout())
- Resuming vcpu at an arbitrary %rip is now done by setting VM_REG_GUEST_RIP
as opposed to changing 'vmexit->rip' (e.g. vmexit_task_switch())
Differential Revision: https://reviews.freebsd.org/D1526
Reviewed by: grehan
MFC after: 2 weeks
The new RTC emulation supports all interrupt modes: periodic, update ended
and alarm. It is also capable of maintaining the date/time and NVRAM contents
across virtual machine reset. Also, the date/time fields can now be modified
by the guest.
Since bhyve now emulates both the PIT and the RTC there is no need for
"Legacy Replacement Routing" in the HPET so get rid of it.
The RTC device state can be inspected via bhyvectl as follows:
bhyvectl --vm=vm --get-rtc-time
bhyvectl --vm=vm --set-rtc-time=<unix_time_secs>
bhyvectl --vm=vm --rtc-nvram-offset=<offset> --get-rtc-nvram
bhyvectl --vm=vm --rtc-nvram-offset=<offset> --set-rtc-nvram=<value>
Reviewed by: tychon
Discussed with: grehan
Differential Revision: https://reviews.freebsd.org/D1385
MFC after: 2 weeks
This reduces variability during timer calibration by keeping the emulation
"close" to the guest. Additionally having all timer emulations in the kernel
will ease the transition to a per-VM clock source (as opposed to using the
host's uptime keep track of time).
Discussed with: grehan
bhyve doesn't emulate the MSRs needed to support this feature at this time.
Don't expose any model-specific RAS and performance monitoring features in
cpuid leaf 80000007H.
Emulate a few more MSRs for AMD: TSEG base address, TSEG address mask and
BIOS signature and P-state related MSRs.
This eliminates all the unimplemented MSRs accessed by Linux/x86_64 kernels
2.6.32, 3.10.0 and 3.17.0.
PxCMD.ST from '1' to '0' and back. This allows the driver a chance to
recover if for instance a timeout occurred due to activity on the
host.
Reviewed by: grehan
emulating a large number of MSRs.
Ignore writes to a couple more AMD-specific MSRs and return 0 on read.
This further reduces the unimplemented MSRs accessed by a Linux guest on boot.
CPUID.80000001H:ECX.
Handle accesses to PerfCtrX and PerfEvtSelX MSRs by ignoring writes and
returning 0 on reads.
This further reduces the number of unimplemented MSRs hit by a Linux guest
during boot.
This gets rid of the "TSC doesn't count with P0 frequency!" message when
booting a Linux guest.
Tested on an "AMD Opteron 6320" courtesy of Ben Perrault.
values. Therefore the bit width of the "PM Timer Block" was actually being
interpreted as 50-bits instead of the expected 32-bit.
This eliminates an error message emitted by a Linux 3.17 guest during boot:
"Invalid length for FADT/PmTimerBlock: 50, using default 32"
Reviewed by: grehan
MFC after: 1 week
This gets rid of the following error message during FreeBSD guest bootup:
"vtbd0: hard error cmd=flush fsbn 0"
Reported by: rodrigc
Reviewed by: grehan
The new IASL from the recent acpi-ca import will error out
if it doesn't see these new fields, which were previously
reserved.
Reported by: lme
Reviewed by: neel
The mixed little/big-endianness of SMBIOS UUIDs was clarified in v2.6
of the SMBIOS spec. dmidecode uses the reported version of SMBIOS to
determine the layout and what to byte-swap.
bhyve's SMBIOS reported as 2.4 though it implemented the 2.6-style of
memory layout. This resulted in dmidecode reporting a different
UUID than one passed in via the -U option.
Fix by exporting a version of 2.6.
Reviewed by: tychon
Reported by: julian
MFC after: 1 day
code. There are only a handful of MSRs common between the two so there isn't
too much duplicate functionality.
The VT-x code has the following types of MSRs:
- MSRs that are unconditionally saved/restored on every guest/host context
switch (e.g., MSR_GSBASE).
- MSRs that are restored to guest values on entry to vmx_run() and saved
before returning. This is an optimization for MSRs that are not used in
host kernel context (e.g., MSR_KGSBASE).
- MSRs that are emulated and every access by the guest causes a trap into
the hypervisor (e.g., MSR_IA32_MISC_ENABLE).
Reviewed by: grehan
VM-exit and ultimately on whether nRIP is valid. This allows us to update
the %rip after the emulation is finished so any exceptions triggered during
the emulation will point to the right instruction.
Don't attempt to handle INS/OUTS VM-exits unless the DecodeAssist capability
is available. The effective segment field in EXITINFO1 is not valid without
this capability.
Add VM_EXITCODE_SVM to flag SVM VM-exits that cannot be handled. Provide the
VMCB fields exitinfo1 and exitinfo2 as collateral to help with debugging.
Provide a SVM VM-exit handler to dump the exitcode, exitinfo1 and exitinfo2
fields in bhyve(8).
Reviewed by: Anish Gupta (akgupt3@gmail.com)
Reviewed by: grehan
NetBSD's virtio-net implementation doesn't negotiate
the merged rx-buffers feature. To support this, check
to see if the feature was negotiated, and then adjust
the operation of the receive path accordingly by using
a larger iovec, and a smaller rx header.
In addition, ignore writes to the (read-only) status byte.
Tested with NetBSD/amd64 5.2.2, 6.1.4 and 7-beta.
Reviewed by: neel, tychon
Phabric: D745
MFC after: 3 days
in the emulation of the task switch. If any exceptions are triggered then the
guest %rip should point to instruction that caused the task switch as opposed
to the one after it.
Add the ACPI MCFG table to advertise the extended config memory window.
Introduce a new flag MEM_F_IMMUTABLE for memory ranges that cannot be deleted
or moved in the guest's address space. The PCI extended config space is an
example of an immutable memory range.
Add emulation for the "movzw" instruction. This instruction is used by FreeBSD
to read a 16-bit extended config space register.
CR: https://phabric.freebsd.org/D505
Reviewed by: jhb, grehan
Requested by: tychon
change 0 means success and non-zero means failure.
This also helps to eliminate VMEXIT_POWEROFF and VMEXIT_RESET as return values
from VM-exit handlers.
CR: D480
Reviewed by: grehan, jhb
The faulting instruction needs to be restarted when the exception handler
is done handling the fault. bhyve now does this correctly by setting
'vmexit[vcpu].inst_length' to zero so the %rip is not advanced.
A minor complication is that the fault injection APIs are used by instruction
emulation code that is shared by vmm.ko and bhyve. Thus the argument that
refers to 'struct vm *' in kernel or 'struct vmctx *' in userspace needs to
be loosely typed as a 'void *'.
A nested exception condition arises when a second exception is triggered while
delivering the first exception. Most nested exceptions can be handled serially
but some are converted into a double fault. If an exception is generated during
delivery of a double fault then the virtual machine shuts down as a result of
a triple fault.
vm_exit_intinfo() is used to record that a VM-exit happened while an event was
being delivered through the IDT. If an exception is triggered while handling
the VM-exit it will be treated like a nested exception.
vm_entry_intinfo() is used by processor-specific code to get the event to be
injected into the guest on the next VM-entry. This function is responsible for
deciding the disposition of nested exceptions.
similar to -g.)
- Document -U to set the SMBIOS UUID.
- Add missing options to the usage output and to the manpage Synopsis.
- Don't claim that bvmdebug is amd64-only (it is also a device, not an
option).
Previously the sizes were inferred indirectly based on the size of the mappings
at 0 and 4GB respectively. This works fine as long as size of the allocation is
identical to the size of the mapping in the guest's address space. However, if
the mapping is disjoint then this assumption falls apart (e.g., due to the
legacy BIOS hole between 640KB and 1MB).
register pairs where two 32-bit registers make up a larger logical
size. Support those access by splitting the quad-word into two
double-words.
Reviewed by: grehan
to 16. This is arbitrary and is used to ensure that a vcpu goes back into
the vm_run() loop to process interrupts or rendezvous events in a timely
fashion.
Found with: Coverity Scan
CID: 1216436
it implicitly in vmm.ko.
Add ioctl VM_GET_CPUS to get the current set of 'active' and 'suspended' cpus
and display them via /usr/sbin/bhyvectl using the "--get-active-cpus" and
"--get-suspended-cpus" options.
This is in preparation for being able to reset virtual machine state without
having to destroy and recreate it.