Commit Graph

347 Commits

Author SHA1 Message Date
Neel Natu
e09ff17129 Add macro to identify AVIC capability (advanced virtual interrupt controller)
in AMD processors.

Submitted by:	Dmitry Luhtionov (dmitryluhtionov@gmail.com)
2015-01-24 00:35:49 +00:00
Neel Natu
7534635359 MOVS instruction emulation.
These instructions are emitted by 'bus_space_read_region()' when accessing
MMIO regions.

Since MOVS can be used with a repeat prefix start decoding the REPZ and
REPNZ prefixes. Also start decoding the segment override prefix since MOVS
allows overriding the source operand segment register.

Tested by:	tychon
MFC after:	1 week
2015-01-19 06:53:31 +00:00
Neel Natu
d087a39935 Simplify instruction restart logic in bhyve.
Keep track of the next instruction to be executed by the vcpu as 'nextrip'.
As a result the VM_RUN ioctl no longer takes the %rip where a vcpu should
start execution.

Also, instruction restart happens implicitly via 'vm_inject_exception()' or
explicitly via 'vm_restart_instruction()'. The APIs behave identically in
both kernel and userspace contexts. The main beneficiary is the instruction
emulation code that executes in both contexts.

bhyve(8) VM exit handlers now treat 'vmexit->rip' and 'vmexit->inst_length'
as readonly:
- Restarting an instruction is now done by calling 'vm_restart_instruction()'
  as opposed to setting 'vmexit->inst_length' to 0 (e.g. emulate_inout())
- Resuming vcpu at an arbitrary %rip is now done by setting VM_REG_GUEST_RIP
  as opposed to changing 'vmexit->rip' (e.g. vmexit_task_switch())

Differential Revision:	https://reviews.freebsd.org/D1526
Reviewed by:		grehan
MFC after:		2 weeks
2015-01-18 03:08:30 +00:00
Neel Natu
07820b4b4c Fix typo (missing comma).
MFC after:	3 days
2015-01-14 07:18:51 +00:00
Neel Natu
c9c75df48c 'struct vm_exception' was intended to be used only as the collateral for the
VM_INJECT_EXCEPTION ioctl. However it morphed into other uses like keeping
track pending exceptions for a vcpu. This in turn causes confusion because
some fields in 'struct vm_exception' like 'vcpuid' make sense only in the
ioctl context. It also makes it harder to add or remove structure fields.

Fix this by using 'struct vm_exception' only to communicate information
from userspace to vmm.ko when injecting an exception.

Also, add a field 'restart_instruction' to 'struct vm_exception'. This
field is set to '1' for exceptions where the faulting instruction is
restarted after the exception is handled.

MFC after:      1 week
2015-01-13 22:00:47 +00:00
Neel Natu
2ce1242309 Clear blocking due to STI or MOV SS in the hypervisor when an instruction is
emulated or when the vcpu incurs an exception. This matches the CPU behavior.

Remove special case code in HLT processing that was clearing the interrupt
shadow. This is now redundant because the interrupt shadow is always cleared
when the vcpu is resumed after an instruction is emulated.

Reported by:	David Reed (david.reed@tidalscale.com)
MFC after:	2 weeks
2015-01-06 19:04:02 +00:00
Neel Natu
cd86d3634b Initialize all fields of 'struct vm_exception exception' before passing it to
vm_inject_exception(). This fixes the issue that 'exception.cpuid' is
uninitialized when calling 'vm_inject_exception()'.

However, in practice this change is a no-op because vm_inject_exception()
does not use 'exception.cpuid' for anything.

Reported by:    Coverity Scan
CID:            1261297
MFC after:      3 days
2014-12-30 23:38:31 +00:00
Neel Natu
0dafa5cd4b Replace bhyve's minimal RTC emulation with a fully featured one in vmm.ko.
The new RTC emulation supports all interrupt modes: periodic, update ended
and alarm. It is also capable of maintaining the date/time and NVRAM contents
across virtual machine reset. Also, the date/time fields can now be modified
by the guest.

Since bhyve now emulates both the PIT and the RTC there is no need for
"Legacy Replacement Routing" in the HPET so get rid of it.

The RTC device state can be inspected via bhyvectl as follows:
bhyvectl --vm=vm --get-rtc-time
bhyvectl --vm=vm --set-rtc-time=<unix_time_secs>
bhyvectl --vm=vm --rtc-nvram-offset=<offset> --get-rtc-nvram
bhyvectl --vm=vm --rtc-nvram-offset=<offset> --set-rtc-nvram=<value>

Reviewed by:	tychon
Discussed with:	grehan
Differential Revision:	https://reviews.freebsd.org/D1385
MFC after:	2 weeks
2014-12-30 22:19:34 +00:00
Neel Natu
95474bc26a Inject #UD into the guest when it executes either 'MONITOR' or 'MWAIT' on
an AMD/SVM host.

MFC after:	1 week
2014-12-30 02:44:33 +00:00
Neel Natu
1a5934ef8e Implement "special mask mode" in vatpic.
OpenBSD guests always enable "special mask mode" during boot. As a result of
r275952 this is flagged as an error and the guest cannot boot.

Reviewed by:	grehan
Differential Revision:	https://reviews.freebsd.org/D1384
MFC after:	1 week
2014-12-28 00:53:52 +00:00
Neel Natu
b053814333 Allow ktr(4) tracing of all guest exceptions via the tunable
"hw.vmm.trace_guest_exceptions".  To enable this feature set the tunable
to "1" before loading vmm.ko.

Tracing the guest exceptions can be useful when debugging guest triple faults.

Note that there is a performance impact when exception tracing is enabled
since every exception will now trigger a VM-exit.

Also, handle machine check exceptions that happen during guest execution
by vectoring to the host's machine check handler via "int $18".

Discussed with:	grehan
MFC after:	2 weeks
2014-12-23 02:14:49 +00:00
Neel Natu
d66bcddce4 Emulate writes to the IA32_MISC_ENABLE MSR.
PR:		196093
Reported by:	db
Tested by:	db
Discussed with:	grehan
MFC after:	1 week
2014-12-20 19:47:51 +00:00
Neel Natu
ac721e53ec Various 8259 device model improvements:
- implement 8259 "polled" mode.
- set 'atpic->sfn' if bit 4 in ICW4 is set during master initialization.
- report error if guest tries to enable the "special mask" mode.

Differential Revision:	https://reviews.freebsd.org/D1328
Reviewed by:		tychon
Reported by:		grehan
Tested by:		grehan
MFC after:		1 week
2014-12-20 04:57:45 +00:00
Neel Natu
e64c5af3f8 Fix 8259 IRQ priority resolver.
Initialize the 8259 such that IRQ7 is the lowest priority.

Reviewed by:		tychon
Differential Revision:	https://reviews.freebsd.org/D1322
MFC after:		1 week
2014-12-17 03:04:43 +00:00
Neel Natu
09eced2549 For level triggered interrupts clear the PIC IRR bit when the interrupt pin
is deasserted. Prior to this change each assertion on a level triggered irq
pin resulted in two interrupts being delivered to the CPU.

Differential Revision:	https://reviews.freebsd.org/D1310
Reviewed by:	tychon
MFC after:	1 week
2014-12-16 06:33:57 +00:00
Peter Grehan
526c8885fd Change the lower bound for guest vmspace allocation to 0 instead of
using the VM_MIN_ADDRESS constant.

HardenedBSD redefines VM_MIN_ADDRESS to be 64K, which results in
bhyve VM startup failing. Guest memory is always assumed to start
at 0 so use the absolute value instead.

Reported by:	Shawn Webb, lattera at gmail com
Reviewed by:	neel, grehan
Obtained from:	Oliver Pinter via HardenedBSD
23bd719ce1
MFC after:	1 week
2014-11-23 23:07:21 +00:00
Marcelo Araujo
d018cd6f5e Reported by: Coverity
CID:		1249760
Reviewed by:	neel
Approved by:	neel
Sponsored by:	QNAP Systems Inc.
2014-10-28 07:19:02 +00:00
Peter Grehan
f1be09bd95 Remove bhyve SVM feature printf's now that they are available in the
general CPU feature detection code.

Reviewed by:	neel
2014-10-27 22:20:51 +00:00
Neel Natu
f0c8263e55 Change the type of the first argument to the I/O emulation handlers to
'struct vm *'. Previously it used to be a 'void *' but there is no reason
to hide the actual type from the handler.

Discussed with:	tychon
MFC after:	1 week
2014-10-26 19:03:06 +00:00
Neel Natu
160ef77abf Move the ACPI PM timer emulation into vmm.ko.
This reduces variability during timer calibration by keeping the emulation
"close" to the guest. Additionally having all timer emulations in the kernel
will ease the transition to a per-VM clock source (as opposed to using the
host's uptime keep track of time).

Discussed with:	grehan
2014-10-26 04:44:28 +00:00
Neel Natu
31b117bec9 Don't pass the 'error' return from an I/O port handler directly to vm_run().
Most I/O port handlers return -1 to signal an error. If this value is returned
without modification to vm_run() then it leads to incorrect behavior because
'-1' is interpreted as ERESTART at the system call level.

Fix this by always returning EIO to signal an error from an I/O port handler.

MFC after:	1 week
2014-10-26 03:03:41 +00:00
Neel Natu
e1a172e1c2 IFC @r273214 2014-10-20 02:57:30 +00:00
Neel Natu
867b59607c IFC @r273206 2014-10-19 23:05:18 +00:00
Neel Natu
592cd7d3be Don't advertise the "OS visible workarounds" feature in cpuid.80000001H:ECX.
bhyve doesn't emulate the MSRs needed to support this feature at this time.

Don't expose any model-specific RAS and performance monitoring features in
cpuid leaf 80000007H.

Emulate a few more MSRs for AMD: TSEG base address, TSEG address mask and
BIOS signature and P-state related MSRs.

This eliminates all the unimplemented MSRs accessed by Linux/x86_64 kernels
2.6.32, 3.10.0 and 3.17.0.
2014-10-19 21:38:58 +00:00
Neel Natu
65d5111ac1 Don't advertise support for the NodeID MSR since bhyve doesn't emulate it. 2014-10-18 05:39:32 +00:00
Warner Losh
b82e2e94e2 Fix build to not bogusly always rebuild vmm.ko.
Rename vmx_assym.s to vmx_assym.h to reflect that file's actual use
and update vmx_support.S's include to match. Add vmx_assym.h to the
SRCS to that it gets properly added to the dependency list. Add
vmx_support.S to SRCS as well, so it gets built and needs fewer
special-case goo. Remove now-redundant special-case goo. Finally,
vmx_genassym.o doesn't need to depend on a hand expanded ${_ILINKS}
explicitly, that's all taken care of by beforedepend.

With these items fixed, we no longer build vmm.ko every single time
through the modules on a KERNFAST build.

Sponsored by: Netflix
2014-10-17 13:20:49 +00:00
Neel Natu
2688a818a3 Don't advertise the Instruction Based Sampling feature because it requires
emulating a large number of MSRs.

Ignore writes to a couple more AMD-specific MSRs and return 0 on read.

This further reduces the unimplemented MSRs accessed by a Linux guest on boot.
2014-10-17 06:23:04 +00:00
Neel Natu
02904c45ab Hide extended PerfCtr MSRs on AMD processors by clearing bits 23, 24 and 28 in
CPUID.80000001H:ECX.

Handle accesses to PerfCtrX and PerfEvtSelX MSRs by ignoring writes and
returning 0 on reads.

This further reduces the number of unimplemented MSRs hit by a Linux guest
during boot.
2014-10-17 03:04:38 +00:00
Neel Natu
b1cf7bb5e4 Use the correct fault type (VM_PROT_EXECUTE) for an instruction fetch. 2014-10-16 18:16:31 +00:00
Neel Natu
5a1f0b36b1 Fix topology enumeration issues exposed by AMD Bulldozer Family 15h processor.
Initialize CPUID.80000008H:ECX[7:0] with the number of logical processors in
the package. This fixes a panic during early boot in NetBSD 7.0 BETA.

Clear the Topology Extension feature bit from CPUID.80000001H:ECX since we
don't emulate leaves 0x8000001D and 0x8000001E. This fixes a divide by zero
panic in early boot in Centos 6.4.

Tested on an "AMD Opteron 6320" courtesy of Ben Perrault.

Reviewed by:	grehan
2014-10-16 18:13:10 +00:00
Davide Italiano
2be111bf7d Follow up to r225617. In order to maximize the re-usability of kernel code
in userland rename in-kernel getenv()/setenv() to kern_setenv()/kern_getenv().
This fixes a namespace collision with libc symbols.

Submitted by:   kmacy
Tested by:      make universe
2014-10-16 18:04:43 +00:00
Neel Natu
06053618cb Actually hide the SVM capability by clearing CPUID.80000001H:ECX[bit 3]
after it has been initialized by cpuid_count().

Submitted by:	Anish Gupta (akgupt3@gmail.com)
2014-10-15 04:29:03 +00:00
Neel Natu
d63e02ea96 Emulate "POP r/m".
This is needed to boot OpenBSD/i386 MP kernel in bhyve.

Reported by:	grehan
MFC after:	1 week
2014-10-14 21:02:33 +00:00
Neel Natu
f37dbf579d Remove extraneous comments. 2014-10-11 04:57:17 +00:00
Neel Natu
8fe9436d4c Get rid of unused headers.
Restrict scope of malloc types M_SVM and M_SVM_VLAPIC by making them static.
Replace ERR() with KASSERT().
style(9) cleanup.
2014-10-11 04:41:21 +00:00
Neel Natu
3d492b65bc Get rid of unused forward declaration of 'struct svm_softc'. 2014-10-11 03:21:33 +00:00
Neel Natu
92337d968c style(9) fixes.
Get rid of unused headers.
2014-10-11 03:19:26 +00:00
Neel Natu
882a1f1942 Use a consistent style for messages emitted when the module is loaded. 2014-10-11 03:09:34 +00:00
Neel Natu
ed6aacb51f IFC @r272887 2014-10-10 23:52:56 +00:00
Neel Natu
faba66190e Fix bhyvectl so it works correctly on AMD/SVM hosts. Also, add command line
options to display some key VMCB fields.

The set of valid options that can be passed to bhyvectl now depends on the
processor type. AMD-specific options are identified by a "--vmcb" or "--avic"
in the option name. Intel-specific options are identified by a "--vmcs" in
the option name.

Submitted by:	Anish Gupta (akgupt3@gmail.com)
2014-10-10 21:48:59 +00:00
Neel Natu
5295c3e61d Support Intel-specific MSRs that are accessed when booting up a linux in bhyve:
- MSR_PLATFORM_INFO
- MSR_TURBO_RATIO_LIMITx
- MSR_RAPL_POWER_UNIT

Reviewed by:	grehan
MFC after:	1 week
2014-10-09 19:13:33 +00:00
Neel Natu
65145c7f50 Inject #UD into the guest when it executes either 'MONITOR' or 'MWAIT'.
The hypervisor hides the MONITOR/MWAIT capability by unconditionally setting
CPUID.01H:ECX[3] to 0 so the guest should not expect these instructions to
be present anyways.

Discussed with:	grehan
2014-10-06 20:48:01 +00:00
Neel Natu
107af8f2ed IFC @r272481 2014-10-05 01:28:21 +00:00
Neel Natu
d72978ecd7 Get rid of code that dealt with the hardware not being able to save/restore
the PAT MSR on guest exit/entry. This workaround was done for a beta release
of VMware Fusion 5 but is no longer needed in later versions.

All Intel CPUs since Nehalem have supported saving and restoring MSR_PAT
in the VM exit and entry controls.

Discussed with:	grehan
2014-10-02 05:32:29 +00:00
Neel Natu
970388bf8d IFC @r272185 2014-09-27 22:15:50 +00:00
Neel Natu
30571674ce Simplify register state save and restore across a VMRUN:
- Host registers are now stored on the stack instead of a per-cpu host context.

- Host %FS and %GS selectors are not saved and restored across VMRUN.
  - Restoring the %FS/%GS selectors was futile anyways since that only updates
    the low 32 bits of base address in the hidden descriptor state.
  - GS.base is properly updated via the MSR_GSBASE on return from svm_launch().
  - FS.base is not used while inside the kernel so it can be safely ignored.

- Add function prologue/epilogue so svm_launch() can be traced with Dtrace's
  FBT entry/exit probes. They also serve to save/restore the host %rbp across
  VMRUN.

Reviewed by:	grehan
Discussed with:	Anish Gupta (akgupt3@gmail.com)
2014-09-27 02:04:58 +00:00
Peter Grehan
a48c333805 Allow the PIC's IMR register to be read before ICW initialisation.
As of git submit e179f6914152eca9, the Linux kernel does a simple
probe of the PIC by writing a pattern to the IMR and then reading it
back, prior to the init sequence of ICW words.

The bhyve PIC emulation wasn't allowing the IMR to be read until
the ICW sequence was complete. This limitation isn't required so
relax the test.

With this change, Linux kernels 3.15-rc2 and later won't hang
on boot when calibrating the local APIC.

Reviewed by:	tychon
MFC after:	3 days
2014-09-27 01:15:24 +00:00
Neel Natu
af198d882a Allow more VMCB fields to be cached:
- CR2
- CR0, CR3, CR4 and EFER
- GDT/IDT base/limit fields
- CS/DS/ES/SS selector/base/limit/attrib fields

The caching can be further restricted via the tunable 'hw.vmm.svm.vmcb_clean'.

Restructure the code such that the fields above are only modified in a single
place. This makes it easy to invalidate the VMCB cache when any of these fields
is modified.
2014-09-21 23:42:54 +00:00
Neel Natu
4eea1566cb Get rid of unused stat VMM_HLT_IGNORED. 2014-09-21 18:52:56 +00:00
Neel Natu
8f02c5e456 IFC r271888.
Restructure MSR emulation so it is all done in processor-specific code.
2014-09-20 21:46:31 +00:00