to the Intel SDM vectors 16 through 255 are allowed to be delivered via the
local APIC.
Reported by: Leon Dang (ldang@nahannisys.com)
MFC after: 2 weeks
%rdi, %rsi, etc are inadvertently bypassed along with the check to
see if the instruction needs to be repeated per the 'rep' prefix.
Add "MOVS" instruction support for the 'MMIO to MMIO' case.
Reviewed by: neel
code segment base address.
Also if an instruction doesn't support a mod R/M (modRM) byte, don't
be concerned if the CPU is in real mode.
Reviewed by: neel
This makes FreeBSD guest to not avoid using LAPIC timer, preferring HPET
due to worries about non-existing for virtual CPUs deep sleep states.
Benchmarks of usleep(1) on guest and host show such extra latencies:
- 51us for virtual HPET,
- 22us for virtual LAPIC timer,
- 22us for host HPET and
- 3us for host LAPIC timer.
MFC after: 2 weeks
- fix warning about comparison of 'uint8_t v_tpr >= 0' always being true.
- fix error triggered by an empty clobber list in the inline assembly for
"clgi" and "stgi"
- fix error when compiling "vmload %rax", "vmrun %rax" and "vmsave %rax". The
gcc assembler does not like the explicit operand "%rax" while the clang
assembler requires specifying the operand "%rax". Fix this by encoding the
instructions using the ".byte" directive.
Reported by: julian
MFC after: 1 week
Allow the ppt driver to attach to devices that were hinted to be
passthrough devices by the PCI code creating them with a driver
name of "ppt".
Add a tunable that allows the IOMMU to be forced to be used. With
SR-IOV passthrough devices the VFs may be created after vmm.ko is
loaded. The current code will not initialize the IOMMU in that
case, meaning that the passthrough devices can't actually be used.
Differential Revision: https://reviews.freebsd.org/D73
Reviewed by: neel
MFC after: 1 month
Sponsored by: Sandvine Inc.
capability of VT-x. This lets bhyve run nested in older VMware versions that
don't support the PAT save/restore capability.
Note that the actual value programmed by the guest in MSR_PAT is irrelevant
because bhyve sets the 'Ignore PAT' bit in the nested PTE.
Reported by: marcel
Tested by: Leon Dang (ldang@nahannisys.com)
Sponsored by: Nahanni Systems
MFC after: 2 weeks
hw.x2apic_enable tunable allows disabling it from the loader prompt.
To closely repeat effects of the uncached memory ops when accessing
registers in the xAPIC mode, the x2APIC writes to MSRs are preceeded
by mfence, except for the EOI notifications. This is probably too
strict, only ICR writes to send IPI require serialization to ensure
that other CPUs see the previous actions when IPI is delivered. This
may be changed later.
In vmm justreturn IPI handler, call doreti_iret instead of doing iretd
inline, to handle corner conditions.
Note that the patch only switches LAPICs into x2APIC mode. It does not
enables FreeBSD to support > 255 CPUs, which requires parsing x2APIC
MADT entries and doing interrupts remapping, but is the required step
on the way.
Reviewed by: neel
Tested by: pho (real hardware), neel (on bhyve)
Discussed with: jhb, grehan
Sponsored by: The FreeBSD Foundation
MFC after: 2 months
These instructions are emitted by 'bus_space_read_region()' when accessing
MMIO regions.
Since MOVS can be used with a repeat prefix start decoding the REPZ and
REPNZ prefixes. Also start decoding the segment override prefix since MOVS
allows overriding the source operand segment register.
Tested by: tychon
MFC after: 1 week
Keep track of the next instruction to be executed by the vcpu as 'nextrip'.
As a result the VM_RUN ioctl no longer takes the %rip where a vcpu should
start execution.
Also, instruction restart happens implicitly via 'vm_inject_exception()' or
explicitly via 'vm_restart_instruction()'. The APIs behave identically in
both kernel and userspace contexts. The main beneficiary is the instruction
emulation code that executes in both contexts.
bhyve(8) VM exit handlers now treat 'vmexit->rip' and 'vmexit->inst_length'
as readonly:
- Restarting an instruction is now done by calling 'vm_restart_instruction()'
as opposed to setting 'vmexit->inst_length' to 0 (e.g. emulate_inout())
- Resuming vcpu at an arbitrary %rip is now done by setting VM_REG_GUEST_RIP
as opposed to changing 'vmexit->rip' (e.g. vmexit_task_switch())
Differential Revision: https://reviews.freebsd.org/D1526
Reviewed by: grehan
MFC after: 2 weeks
VM_INJECT_EXCEPTION ioctl. However it morphed into other uses like keeping
track pending exceptions for a vcpu. This in turn causes confusion because
some fields in 'struct vm_exception' like 'vcpuid' make sense only in the
ioctl context. It also makes it harder to add or remove structure fields.
Fix this by using 'struct vm_exception' only to communicate information
from userspace to vmm.ko when injecting an exception.
Also, add a field 'restart_instruction' to 'struct vm_exception'. This
field is set to '1' for exceptions where the faulting instruction is
restarted after the exception is handled.
MFC after: 1 week
emulated or when the vcpu incurs an exception. This matches the CPU behavior.
Remove special case code in HLT processing that was clearing the interrupt
shadow. This is now redundant because the interrupt shadow is always cleared
when the vcpu is resumed after an instruction is emulated.
Reported by: David Reed (david.reed@tidalscale.com)
MFC after: 2 weeks
vm_inject_exception(). This fixes the issue that 'exception.cpuid' is
uninitialized when calling 'vm_inject_exception()'.
However, in practice this change is a no-op because vm_inject_exception()
does not use 'exception.cpuid' for anything.
Reported by: Coverity Scan
CID: 1261297
MFC after: 3 days
The new RTC emulation supports all interrupt modes: periodic, update ended
and alarm. It is also capable of maintaining the date/time and NVRAM contents
across virtual machine reset. Also, the date/time fields can now be modified
by the guest.
Since bhyve now emulates both the PIT and the RTC there is no need for
"Legacy Replacement Routing" in the HPET so get rid of it.
The RTC device state can be inspected via bhyvectl as follows:
bhyvectl --vm=vm --get-rtc-time
bhyvectl --vm=vm --set-rtc-time=<unix_time_secs>
bhyvectl --vm=vm --rtc-nvram-offset=<offset> --get-rtc-nvram
bhyvectl --vm=vm --rtc-nvram-offset=<offset> --set-rtc-nvram=<value>
Reviewed by: tychon
Discussed with: grehan
Differential Revision: https://reviews.freebsd.org/D1385
MFC after: 2 weeks
OpenBSD guests always enable "special mask mode" during boot. As a result of
r275952 this is flagged as an error and the guest cannot boot.
Reviewed by: grehan
Differential Revision: https://reviews.freebsd.org/D1384
MFC after: 1 week
"hw.vmm.trace_guest_exceptions". To enable this feature set the tunable
to "1" before loading vmm.ko.
Tracing the guest exceptions can be useful when debugging guest triple faults.
Note that there is a performance impact when exception tracing is enabled
since every exception will now trigger a VM-exit.
Also, handle machine check exceptions that happen during guest execution
by vectoring to the host's machine check handler via "int $18".
Discussed with: grehan
MFC after: 2 weeks
- implement 8259 "polled" mode.
- set 'atpic->sfn' if bit 4 in ICW4 is set during master initialization.
- report error if guest tries to enable the "special mask" mode.
Differential Revision: https://reviews.freebsd.org/D1328
Reviewed by: tychon
Reported by: grehan
Tested by: grehan
MFC after: 1 week
Initialize the 8259 such that IRQ7 is the lowest priority.
Reviewed by: tychon
Differential Revision: https://reviews.freebsd.org/D1322
MFC after: 1 week
is deasserted. Prior to this change each assertion on a level triggered irq
pin resulted in two interrupts being delivered to the CPU.
Differential Revision: https://reviews.freebsd.org/D1310
Reviewed by: tychon
MFC after: 1 week
using the VM_MIN_ADDRESS constant.
HardenedBSD redefines VM_MIN_ADDRESS to be 64K, which results in
bhyve VM startup failing. Guest memory is always assumed to start
at 0 so use the absolute value instead.
Reported by: Shawn Webb, lattera at gmail com
Reviewed by: neel, grehan
Obtained from: Oliver Pinter via HardenedBSD
23bd719ce1
MFC after: 1 week
'struct vm *'. Previously it used to be a 'void *' but there is no reason
to hide the actual type from the handler.
Discussed with: tychon
MFC after: 1 week
This reduces variability during timer calibration by keeping the emulation
"close" to the guest. Additionally having all timer emulations in the kernel
will ease the transition to a per-VM clock source (as opposed to using the
host's uptime keep track of time).
Discussed with: grehan
Most I/O port handlers return -1 to signal an error. If this value is returned
without modification to vm_run() then it leads to incorrect behavior because
'-1' is interpreted as ERESTART at the system call level.
Fix this by always returning EIO to signal an error from an I/O port handler.
MFC after: 1 week
bhyve doesn't emulate the MSRs needed to support this feature at this time.
Don't expose any model-specific RAS and performance monitoring features in
cpuid leaf 80000007H.
Emulate a few more MSRs for AMD: TSEG base address, TSEG address mask and
BIOS signature and P-state related MSRs.
This eliminates all the unimplemented MSRs accessed by Linux/x86_64 kernels
2.6.32, 3.10.0 and 3.17.0.
Rename vmx_assym.s to vmx_assym.h to reflect that file's actual use
and update vmx_support.S's include to match. Add vmx_assym.h to the
SRCS to that it gets properly added to the dependency list. Add
vmx_support.S to SRCS as well, so it gets built and needs fewer
special-case goo. Remove now-redundant special-case goo. Finally,
vmx_genassym.o doesn't need to depend on a hand expanded ${_ILINKS}
explicitly, that's all taken care of by beforedepend.
With these items fixed, we no longer build vmm.ko every single time
through the modules on a KERNFAST build.
Sponsored by: Netflix
emulating a large number of MSRs.
Ignore writes to a couple more AMD-specific MSRs and return 0 on read.
This further reduces the unimplemented MSRs accessed by a Linux guest on boot.
CPUID.80000001H:ECX.
Handle accesses to PerfCtrX and PerfEvtSelX MSRs by ignoring writes and
returning 0 on reads.
This further reduces the number of unimplemented MSRs hit by a Linux guest
during boot.
Initialize CPUID.80000008H:ECX[7:0] with the number of logical processors in
the package. This fixes a panic during early boot in NetBSD 7.0 BETA.
Clear the Topology Extension feature bit from CPUID.80000001H:ECX since we
don't emulate leaves 0x8000001D and 0x8000001E. This fixes a divide by zero
panic in early boot in Centos 6.4.
Tested on an "AMD Opteron 6320" courtesy of Ben Perrault.
Reviewed by: grehan
in userland rename in-kernel getenv()/setenv() to kern_setenv()/kern_getenv().
This fixes a namespace collision with libc symbols.
Submitted by: kmacy
Tested by: make universe