Commit Graph

48 Commits

Author SHA1 Message Date
Corvin Köhne
6171e026be bhyve: add support for MTRR
Some guests or driver might depend on MTRR to work properly. E.g. the
nvidia gpu driver won't work without MTRR.

Reviewed by:	markj
MFC after:	2 weeks
Sponsored by:	Beckhoff Automation GmbH & Co. KG
Differential Revision:	https://reviews.freebsd.org/D33333
2022-01-14 12:41:44 +01:00
Mark Johnston
4c599db71a vmm: Let guests enable SMEP/SMAP if the host supports it
Reviewed by:	kib, grehan, jhb
Tested by:	grehan (AMD)
MFC after:	3 weeks
Sponsored by:	The FreeBSD Foundation
Differential Revision:	https://reviews.freebsd.org/D30462
2021-05-26 09:34:52 -04:00
John Baldwin
a3f2a9c57e Clear the upper 32-bits of registers in x86_emulate_cpuid().
Per the Intel manuals, CPUID is supposed to unconditionally zero the
upper 32 bits of the involved (rax/rbx/rcx/rdx) registers.
Previously, the emulation would cast pointers to the 64-bit register
values down to `uint32_t`, which while properly manipulating the lower
bits, would leave any garbage in the upper bits uncleared.  While no
existing guest OSes seem to stumble over this in practice, the bhyve
emulation should match x86 expectations.

This was discovered through alignment warnings emitted by gcc9, while
testing it against SmartOS/bhyve.

SmartOS bug:	https://smartos.org/bugview/OS-8168
Submitted by:	Patrick Mooney
Reviewed by:	rgrimes
Differential Revision:	https://reviews.freebsd.org/D24727
2020-10-01 16:45:11 +00:00
Peter Grehan
f5f5f1e7d6 Support guest rdtscp and rdpid instructions on Intel VT-x
Enable any of rdtscp and/or rdpid for bhyve guests on Intel-based hosts
that support the "enable RDTSCP" VM-execution control.

Submitted by:	adam_fenn.io
Reported by:	chuck
Reviewed by:	chuck, grehan, jhb
Approved by:	jhb (bhyve), grehan
MFC after:	3 weeks
Relnotes:	Yes
Differential Revision:	https://reviews.freebsd.org/D26003
2020-08-18 07:23:47 +00:00
Peter Grehan
ec048c7550 Hide host CPUID 0x15 TSC/Crystal ratio/freq info from guest
In recent Linux (5.3+) and OpenBSD (6.6+) kernels, and with hosts that
support CPUID 0x15, the local APIC frequency is determined directly
from the reported crystal clock to avoid calibration against the 8254
timer.

However, the local APIC frequency implemented by bhyve is 128MHz, where
most h/w systems report frequencies around 25MHz. This shows up on
OpenBSD guests as repeated keystrokes on the emulated PS2 keyboard
when using VNC, since the kernel's timers are now much shorter.

Fix by reporting all-zeroes for CPUID 0x15. This allows guests to fall
back to using the 8254 to calibrate the local APIC frequency.

Future work could be to compute values returned for 0x15 that would
match the host TSC and bhyve local APIC frequency, though all dependencies
on this would need to be examined (for example, Linux will start using
0x16 for some hosts).

PR:	246321
Reported by:	Jason Tubnor (and tested)
Reviewed by:	jhb
Approved by:	jhb, bz (mentor)
MFC after:	3 days
Differential Revision:	https://reviews.freebsd.org/D24837
2020-05-14 22:18:12 +00:00
Pawel Biernacki
b40598c539 Mark more nodes as CTLFLAG_MPSAFE or CTLFLAG_NEEDGIANT (4 of many)
r357614 added CTLFLAG_NEEDGIANT to make it easier to find nodes that are
still not MPSAFE (or already are but aren’t properly marked). Use it in
preparation for a general review of all nodes.
This is non-functional change that adds annotations to SYSCTL_NODE and
SYSCTL_PROC nodes using one of the soon-to-be-required flags.

Reviewed by:	kib
Approved by:	kib (mentor)
Differential Revision:	https://reviews.freebsd.org/D23625
X-Generally looks fine:	jhb
2020-02-15 18:57:49 +00:00
Konstantin Belousov
caab504277 vmm: Add Hygon Dhyana support.
Submitted by:	Pu Wen <puwen@hygon.cn>
Discussed with:	grehan
Reviewed by:	jhb (previous version)
MFC after:	1 week
Differential revision:	https://reviews.freebsd.org/D23553
2020-02-13 19:03:12 +00:00
John Baldwin
e519cee307 Expose the MD_CLEAR capability used by Intel MDS mitigations to guests.
Submitted by:	Patrick Mooney <pmooney@pfmooney.com>
Reviewed by:	kib
Tested by:	Patrick on SmartOS with Linux and Windows guests
Obtained from:	Joyent
MFC after:	3 days
Differential Revision:	https://reviews.freebsd.org/D20296
2019-05-18 21:20:38 +00:00
Conrad Meyer
fce2d624ea vmm(4): Pass through RDSEED feature bit to guests
Reviewed by:	jhb
Approved by:	#bhyve (jhb)
MFC after:	2 leapseconds
Sponsored by:	Dell EMC Isilon
Differential Revision:	https://reviews.freebsd.org/D20194
2019-05-08 00:40:08 +00:00
Conrad Meyer
d0c7cde53e vmm(4): Mask Spectre feature bits on AMD hosts
For parity with Intel hosts, which already mask out the CPUID feature
bits that indicate the presence of the SPEC_CTRL MSR, do the same on
AMD.

Eventually we may want to have a better support story for guests, but
for now, limit the damage of incorrectly indicating an MSR we do not yet
support.

Eventually, we may want a generic CPUID override system for
administrators, or for minimum supported feature set in heterogenous
environments with failover.  That is a much larger scope effort than
this bug fix.

PR:		235010
Reported by:	Rys Sommefeldt <rys AT sommefeldt.com>
Sponsored by:	Dell EMC Isilon
2019-01-18 23:54:51 +00:00
Conrad Meyer
15b7da10ac vmm(4): Take steps towards multicore bhyve AMD support
vmm's CPUID emulation presented Intel topology information to the guest, but
disabled AMD topology information and in some cases passed through garbage.
I.e., CPUID leaves 0x8000_001[de] were passed through to the guest, but
guest CPUs can migrate between host threads, so the information presented
was not consistent.  This could easily be observed with 'cpucontrol -i 0xfoo
/dev/cpuctl0'.

Slightly improve this situation by enabling the AMD topology feature flag
and presenting at least the CPUID fields used by FreeBSD itself to probe
topology on more modern AMD64 hardware (Family 15h+).  Older stuff is
probably less interesting.  I have not been able to empirically confirm it
is sufficient, but it should not regress anything either.

Reviewed by:	araujo (previous version)
Relnotes:	sure
2019-01-16 02:19:04 +00:00
Rodney W. Grimes
01d822d33b Add the ability to control the CPU topology of created VMs
from userland without the need to use sysctls, it allows the old
sysctls to continue to function, but deprecates them at
FreeBSD_version 1200060 (Relnotes for deprecate).

The command line of bhyve is maintained in a backwards compatible way.
The API of libvmmapi is maintained in a backwards compatible way.
The sysctl's are maintained in a backwards compatible way.

Added command option looks like:
bhyve -c [[cpus=]n][,sockets=n][,cores=n][,threads=n][,maxcpus=n]
The optional parts can be specified in any order, but only a single
integer invokes the backwards compatible parse.  [,maxcpus=n] is
hidden by #ifdef until kernel support is added, though the api
is put in place.

bhyvectl --get-cpu-topology option added.

Reviewed by:	grehan (maintainer, earlier version),
Reviewed by:	bcr (manpages)
Approved by:	bde (mentor), phk (mentor)
Tested by:	Oleg Ginzburg <olevole@olevole.ru> (cbsd)
MFC after:	1 week
Relnotes:	Y
Differential Revision:	https://reviews.freebsd.org/D9930
2018-04-08 19:24:49 +00:00
Pedro F. Giffuni
c49761dd57 sys/amd64: further adoption of SPDX licensing ID tags.
Mainly focus on files that use BSD 2-Clause license, however the tool I
was using misidentified many licenses so this was mostly a manual - error
prone - task.

The Software Package Data Exchange (SPDX) group provides a specification
to make it easier for automated tools to detect and summarize well known
opensource licenses. We are gradually adopting the specification, noting
that the tags are considered only advisory and do not, in any way,
superceed or replace the license texts.
2017-11-27 15:03:07 +00:00
Peter Grehan
3da443021f Hide the AMD MONITORX/MWAITX capability.
Otherwise, recent Linux guests will use these instructions, resulting
in #UD exceptions since bhyve doesn't implement MONITOR/MWAIT exits.

This fixes boot-time hangs in recent Linux guests on Ryzen CPUs
(and probably Bulldozer aka AMD FX as well).

Reviewed by:	kib
MFC after:	1 week
2017-03-16 03:21:42 +00:00
Neel Natu
ea91ca92ba Do a proper emulation of guest writes to MSR_EFER.
- Must-Be-Zero bits cannot be set.
- EFER_LME and EFER_LMA should respect the long mode consistency checks.
- EFER_NXE, EFER_FFXSR, EFER_TCE can be set if allowed by CPUID capabilities.
- Flag an error if guest tries to set EFER_LMSLE since bhyve doesn't enforce
  segment limits in 64-bit mode.

MFC after:	2 weeks
2015-05-06 05:40:20 +00:00
Neel Natu
317080849e Don't advertise the Intel SMX capability to the guest.
Reported by:	Leon Dang (ldang@nahannisys.com)
MFC after:	1 week
2015-05-02 19:07:49 +00:00
Neel Natu
1d29bfc149 Emulate machine check related MSRs to allow guest OSes like Windows to boot.
Reported by:	Leon Dang (ldang@nahannisys.com)
MFC after:	2 weeks
2015-05-02 04:19:11 +00:00
Neel Natu
8325ce5c7e Don't require <sys/cpuset.h> to be always included before <machine/vmm.h>.
Only a subset of source files that include <machine/vmm.h> need to use the
APIs that require the inclusion of <sys/cpuset.h>.

MFC after:	1 week
2015-04-30 22:23:22 +00:00
Neel Natu
7d786ee2a9 Advertise the MTRR feature via CPUID and emulate the minimal set of MTRR MSRs.
This is required for booting Windows guests.

Reported by:	Leon Dang (ldang@nahannisys.com)
MFC after:	2 weeks
2015-04-30 19:23:50 +00:00
Alexander Motin
c077e6287f Report ARAT (APIC-Timer-always-running) feature for virtual CPU.
This makes FreeBSD guest to not avoid using LAPIC timer, preferring HPET
due to worries about non-existing for virtual CPUs deep sleep states.

Benchmarks of usleep(1) on guest and host show such extra latencies:
 - 51us for virtual HPET,
 - 22us for virtual LAPIC timer,
 - 22us for host HPET and
 - 3us for host LAPIC timer.

MFC after:	2 weeks
2015-03-16 11:57:03 +00:00
Neel Natu
592cd7d3be Don't advertise the "OS visible workarounds" feature in cpuid.80000001H:ECX.
bhyve doesn't emulate the MSRs needed to support this feature at this time.

Don't expose any model-specific RAS and performance monitoring features in
cpuid leaf 80000007H.

Emulate a few more MSRs for AMD: TSEG base address, TSEG address mask and
BIOS signature and P-state related MSRs.

This eliminates all the unimplemented MSRs accessed by Linux/x86_64 kernels
2.6.32, 3.10.0 and 3.17.0.
2014-10-19 21:38:58 +00:00
Neel Natu
65d5111ac1 Don't advertise support for the NodeID MSR since bhyve doesn't emulate it. 2014-10-18 05:39:32 +00:00
Neel Natu
2688a818a3 Don't advertise the Instruction Based Sampling feature because it requires
emulating a large number of MSRs.

Ignore writes to a couple more AMD-specific MSRs and return 0 on read.

This further reduces the unimplemented MSRs accessed by a Linux guest on boot.
2014-10-17 06:23:04 +00:00
Neel Natu
02904c45ab Hide extended PerfCtr MSRs on AMD processors by clearing bits 23, 24 and 28 in
CPUID.80000001H:ECX.

Handle accesses to PerfCtrX and PerfEvtSelX MSRs by ignoring writes and
returning 0 on reads.

This further reduces the number of unimplemented MSRs hit by a Linux guest
during boot.
2014-10-17 03:04:38 +00:00
Neel Natu
5a1f0b36b1 Fix topology enumeration issues exposed by AMD Bulldozer Family 15h processor.
Initialize CPUID.80000008H:ECX[7:0] with the number of logical processors in
the package. This fixes a panic during early boot in NetBSD 7.0 BETA.

Clear the Topology Extension feature bit from CPUID.80000001H:ECX since we
don't emulate leaves 0x8000001D and 0x8000001E. This fixes a divide by zero
panic in early boot in Centos 6.4.

Tested on an "AMD Opteron 6320" courtesy of Ben Perrault.

Reviewed by:	grehan
2014-10-16 18:13:10 +00:00
Neel Natu
06053618cb Actually hide the SVM capability by clearing CPUID.80000001H:ECX[bit 3]
after it has been initialized by cpuid_count().

Submitted by:	Anish Gupta (akgupt3@gmail.com)
2014-10-15 04:29:03 +00:00
Neel Natu
4e27d36d38 IFC @r271694 2014-09-17 18:46:51 +00:00
Neel Natu
246e7a2b64 IFC @r269962
Submitted by:	Anish Gupta (akgupt3@gmail.com)
2014-09-02 04:22:42 +00:00
Neel Natu
8bd3845d3c Add "hw.vmm.topology.threads_per_core" and "hw.vmm.topology.cores_per_package"
tunables to modify the default cpu topology advertised by bhyve.

Also add a tunable "hw.vmm.topology.cpuid_leaf_b" to disable the CPUID
leaf 0xb. This is intended for testing guest behavior when it falls back
on using CPUID leaf 0x4 to deduce CPU topology.

The default behavior is to advertise each vcpu as a core in a separate soket.
2014-08-24 01:10:06 +00:00
Neel Natu
534dc967d7 Fix a bug in the emulation of CPUID leaf 0x4 where bhyve was claiming that
the vcpu had no caches at all. This causes problems when executing applications
in the guest compiled with the Intel compiler.

Submitted by:	Mark Hill (mark.hill@tidalscale.com)
2014-08-23 22:44:31 +00:00
Peter Grehan
eee8190aab Bring (almost) up-to-date with HEAD.
- use the new virtual APIC page
- update to current bhyve APIs

Tested by Anish with multiple FreeBSD SMP VMs on a Phenom,
and verified by myself with light FreeBSD VM testing
on a Sempron 3850 APU.

The issues reported with Linux guests are very likely to still
be here, but this sync eliminates the skew between the
project branch and CURRENT, and should help to determine
the causes.

Some follow-on commits will fix minor cosmetic issues.

Submitted by:	Anish Gupta (akgupt3@gmail.com)
2014-06-03 06:56:54 +00:00
John Baldwin
44a68c4e40 - Rework the XSAVE/XRSTOR emulation to only expose XCR0 features to the
guest for which the rules regarding xsetbv emulation are known.  In
  particular future extensions like AVX-512 have interdependencies among
  feature bits that could allow a guest to trigger a GP# in the host with
  the current approach of allowing anything the host supports.
- Add proper checking of Intel MPX and AVX-512 XSAVE features in the
  xsetbv emulation and allow these features to be exposed to the guest if
  they are enabled in the host.
- Expose a subset of known-safe features from leaf 0 of the structured
  extended features to guests if they are supported on the host including
  RDFSBASE/RDGSBASE, BMI1/2, AVX2, AVX-512, HLE, ERMS, and RTM.  Aside
  from AVX-512, these features are all new instructions available for use
  in ring 3 with no additional hypervisor changes needed.

Reviewed by:	neel
2014-05-27 19:04:38 +00:00
Tycho Nightingale
e0f210e6ef Account for the "plus 1" encoding of the CPUID Function 4 reported
core per package and cache sharing values.

Approved by:	grehan (co-mentor)
2014-04-11 18:19:21 +00:00
Neel Natu
52e5c8a2ec Simplify APIC mode switching from MMIO to x2APIC. In part this is done to
simplify the implementation of the x2APIC virtualization assist in VT-x.

Prior to this change the vlapic allowed the guest to change its mode from
xAPIC to x2APIC. We don't allow that any more and the vlapic mode is locked
when the virtual machine is created. This is not very constraining because
operating systems already have to deal with BIOS setting up the APIC in
x2APIC mode at boot.

Fix a bug in the CPUID emulation where the x2APIC capability was leaking
from the host to the guest.

Ignore MMIO reads and writes to the vlapic in x2APIC mode. Similarly, ignore
MSR accesses to the vlapic when it is in xAPIC mode.

The default configuration of the vlapic is xAPIC. The "-x" option to bhyve(8)
can be used to change the mode to x2APIC instead.

Discussed with:	grehan@
2014-02-20 01:48:25 +00:00
John Baldwin
abb023fb95 Add virtualized XSAVE support to bhyve which permits guests to use XSAVE and
XSAVE-enabled features like AVX.
- Store a per-cpu guest xcr0 register.  When switching to the guest FPU
  state, switch to the guest xcr0 value.  Note that the guest FPU state is
  saved and restored using the host's xcr0 value and xcr0 is saved/restored
  "inside" of saving/restoring the guest FPU state.
- Handle VM exits for the xsetbv instruction by updating the guest xcr0.
- Expose the XSAVE feature to the guest only if the host has enabled XSAVE,
  and only advertise XSAVE features enabled by the host to the guest.
  This ensures that the guest will only adjust FPU state that is a subset
  of the guest FPU state saved and restored by the host.

Reviewed by:	grehan
2014-02-08 16:37:54 +00:00
Neel Natu
49cc03da31 Add a new capability, VM_CAP_ENABLE_INVPCID, that can be enabled to expose
'invpcid' instruction to the guest. Currently bhyve will try to enable this
capability unconditionally if it is available.

Consolidate code in bhyve to set the capabilities so it is no longer
duplicated in BSP and AP bringup.

Add a sysctl 'vm.pmap.invpcid_works' to display whether the 'invpcid'
instruction is available.

Reviewed by:	grehan
MFC after:	3 days
2013-10-16 18:20:27 +00:00
Peter Grehan
517e21d3e7 Hide TSC-deadline APIC timer support from guests. This mode
isn't yet implemented in bhyve's APIC emulation.

Reviewed by:	neel
Approved by:	re@ (blanket)
2013-09-17 17:56:53 +00:00
Peter Grehan
8b7e3e3022 Allow CPUID leaf 0xD to be read as zeroes.
Linux reads this even though extended features
aren't exposed.

Support for 0xD will be expanded once AVX[2]
is exposed to the guest in upcoming work.
2013-09-06 05:16:10 +00:00
Peter Grehan
560d5eda2c Make sure all CPUID values are handled, instead of exiting the
bhyve process when an unhandled one is encountered.

Hide some additional capabilities from the guest (e.g. debug store).

This fixes the issue with FreeBSD 9.1 MP guests exiting the VM on
AP spinup (where CPUID is used when sync'ing the TSCs) and the
issue with the Java build where CPUIDs are issued from a guest
userspace.

Submitted by:	tycho nightingale at pluribusnetworks com
Reviewed by:	neel
Reported by:	many
2013-06-28 06:05:33 +00:00
Neel Natu
1472b87f2f Unsynchronized TSCs on the host require special handling in bhyve:
- use clock_gettime(2) as the time base for the emulated ACPI timer instead
  of directly using rdtsc().

- don't advertise the invariant TSC capability to the guest to discourage it
  from using the TSC as its time base.

Discussed with:	jhb@ (about making 'smp_tsc' a global)
Reported by:	Dan Mack on freebsd-virtualization@
Obtained from:	NetApp
2013-04-10 05:59:07 +00:00
Neel Natu
25448de222 Requests for invalid CPUID leaves should map to the highest known leaf instead.
Reviewed by:	grehan
Obtained from:	NetApp
2013-02-13 23:22:17 +00:00
Peter Grehan
a0cad47092 Handle CPUID leaf 0x7 now that FreeBSD is using it.
Return 0's for now.

Reviewed by:	neel
Obtained from:	NetApp
2012-11-20 06:01:03 +00:00
Neel Natu
ff6ec151e0 Hide the monitor/mwait instruction capability from the guest until we know how
to properly intercept it.

Obtained from:	NetApp
2012-10-25 04:08:26 +00:00
Neel Natu
a2da7af6bc Add support for trapping MMIO writes to local apic registers and emulating them.
The default behavior is still to present the local apic to the guest in the
x2apic mode.
2012-09-25 22:31:35 +00:00
Peter Grehan
298379f7fb Until the issue of how to handle guest XCR0 state is resolved,
prevent CURRENT guests from hitting unhandled xsetbv exits
by hiding the xsave/osxsave/avx cpuid2 bits.
2012-05-03 05:04:37 +00:00
John Baldwin
8b28761278 Some tweaks to the CPUID support:
- Don't always pass the cpuid request to the current CPU as some nodes
  we will emulate purely in software.
- Pass in the APIC ID of the virtual CPU so we can return the proper APIC
  ID.
- Always report a completely flat topology with no SMT or multicore.
- Report the CPUID2_HV feature and implement support for the 0x40000000
  CPUID level.
- Use existing constants from <machine/specialreg.h> when possible and
  use cpu_feature2 when checking for VMX support.
2011-06-02 14:04:07 +00:00
Peter Grehan
1f3025e133 Changes to allow the GENERIC+bhye kernel built from this branch to
run as a 1/2 CPU guest on an 8.1 bhyve host.

bhyve/inout.c
      inout.h
      fbsdrun.c
 - Rather than exiting on accesses to unhandled i/o ports, emulate
   hardware by returning -1 on reads and ignoring writes to unhandled
   ports. Support the previous mode by allowing a 'strict' parameter
   to be set from the command line.
   The 8.1 guest kernel was vastly cut down from GENERIC and had no
   ISA devices. Booting GENERIC exposes a massive amount of random
   touching of i/o ports (hello syscons/vga/atkbdc).

bhyve/consport.c
dev/bvm/bvm_console.c
 - implement a simplistic signature for the bvm console by returning
   'bv' for an inw on the port. Also, set the priority of the console
   to CN_REMOTE if the signature was returned. This works better in
   an environment where multiple consoles are in the kernel (hello syscons)

bhyve/rtc.c
 - return 0 for the access to RTC_EQUIPMENT (yes, you syscons)

amd64/vmm/x86.c
          x86.h
 - hide a bunch more CPUID leaf 1 bits from the guest to prevent
   cpufreq drivers from probing.
   The next step will be to move CPUID handling completely into
   user-space. This will allow the full spectrum of changes from
   presenting a lowest-common-denominator CPU type/feature set, to
   exposing (almost) everything that the host can support.

Reviewed by:	neel
Obtained from:	NetApp
2011-05-19 21:53:25 +00:00
Peter Grehan
366f60834f Import of bhyve hypervisor and utilities, part 1.
vmm.ko - kernel module for VT-x, VT-d and hypervisor control
  bhyve  - user-space sequencer and i/o emulation
  vmmctl - dump of hypervisor register state
  libvmm - front-end to vmm.ko chardev interface

bhyve was designed and implemented by Neel Natu.

Thanks to the following folk from NetApp who helped to make this available:
	Joe CaraDonna
	Peter Snyder
	Jeff Heller
	Sandeep Mann
	Steve Miller
	Brian Pawlowski
2011-05-13 04:54:01 +00:00