Provide a tunable 'machdep.x2apic_desired' to let the administrator override
the default behavior.
Provide a read-only sysctl 'machdep.x2apic' to let the administrator know
whether the kernel is using x2apic or legacy mmio to access local apic.
Tested with Parallels Desktop 8 and bhyve hypervisors.
Also tested running on bare metal Intel Xeon E5-2658.
Obtained from: NetApp
Discussed with: jhb, attilio, avg, grehan
by clang in the local APIC code.
0x81 is a read-modify-write instruction - the EPT check
that only allowed read or write and not both has been
relaxed to allow read and write.
Reviewed by: neel
Obtained from: NetApp
On a nested page table fault the hypervisor will:
- fetch the instruction using the guest %rip and %cr3
- decode the instruction in 'struct vie'
- emulate the instruction in host kernel context for local apic accesses
- any other type of mmio access is punted up to user-space (e.g. ioapic)
The decoded instruction is passed as collateral to the user-space process
that is handling the PAGING exit.
The emulation code is fleshed out to include more addressing modes (e.g. SIB)
and more types of operands (e.g. imm8). The source code is unified into a
single file (vmm_instruction_emul.c) that is compiled into vmm.ko as well
as /usr/sbin/bhyve.
Reviewed by: grehan
Obtained from: NetApp
In the case where the underlying host had disabled MSI-X via the
"hw.pci.enable_msix" tunable, the ppt_setup_msix() function would fail
and return an error without properly cleaning up. This in turn would
cause a page fault on the next boot of the guest.
Fix this by calling ppt_teardown_msix() in all the error return paths.
Obtained from: NetApp
sleep, and perform the page allocations with VM_ALLOC_SYSTEM
class. Previously, the allocation was also allowed to completely drain
the reserve of the free pages, being translated to VM_ALLOC_INTERRUPT
request class for vm_page_alloc() and similar functions.
Allow the caller of malloc* to request the 'deep drain' semantic by
providing M_USE_RESERVE flag, now translated to VM_ALLOC_INTERRUPT
class. Previously, it resulted in less aggressive VM_ALLOC_SYSTEM
allocation class.
Centralize the translation of the M_* malloc(9) flags in the single
inline function malloc2vm_flags().
Discussion started by: "Sears, Steven" <Steven.Sears@netapp.com>
Reviewed by: alc, mdf (previous version)
Tested by: pho (previous version)
MFC after: 2 weeks
hypervisor. Apparently, hypervisors failed to filter out 'Standard
Extended Features' report from CPUID, but deliver #gp when
corresponding bit in %cr4 is toggled.
This shall be reconsidered later, after hypervisors correct the bug.
Reported and tested by: joel
Reviewed by: avg
MFC after: 2 weeks
between inline asm statements that would in turn modify the flags
value set by the first asm, and used by the second.
Solve by making the common error block a string that can be pulled
into the first inline asm, and using symbolic labels for asm variables.
bhyve can now build/run fine when compiled with clang.
Reviewed by: neel
Obtained from: NetApp
%gs, when supported. Note that WRFSBASE and WRGSBASE are not very
useful on FreeBSD right now, because a return from the kernel mode to
userspace reloads the bases specified by the sysarch(2) syscall, most
likely.
Enable the Supervisor Mode Execution Prevention (SMEP) when
supported. Since the loader(8) performs hand-off to the kernel with
the page tables which contradict the SMEP, postpone enabling the SMEP
on BSP until pmap switched for the proper kernel tables.
Debugged with the help from: avg
Tested by: avg, Michael Moll <kvedulv@kvedulv.de>
MFC after: 1 month
introduced with the IvyBridge CPUs. Provide the definitions for new
bits in CR3 and CR4 registers.
Tested by: avg, Michael Moll <kvedulv@kvedulv.de>
MFC after: 2 weeks
to vmcs_getreg(). Without this conversion vmcs_getreg() will return EINVAL.
In particular this prevented injection of the breakpoint exception into the
guest via the "-B" option to /usr/sbin/bhyve which is hugely useful when
debugging guest hangs.
This was broken in r241921.
Pointy hat: me
Obtained from: NetApp
vm page allocators do. This fixes a panic when a virtio block
device is mounted as root, with the host system dying in
vm_page_dirty with invalid bits.
Reviewed by: neel
Obtained from: NetApp
guest does a vm exit.
This allows us to trap any fpu access in the host context while the fpu still
has "dirty" state belonging to the guest.
Reported by: "s vas" on freebsd-virtualization@
Obtained from: NetApp
host cpu to the scheduler until the guest is ready to run again.
This implies that the host cpu utilization will now closely mirror the actual
load imposed by the guest vcpu.
Also, the vcpu mutex now needs to be of type MTX_SPIN since we need to acquire
it inside a critical section.
Obtained from: NetApp
If an IPI was delivered to this cpu before interrupts were disabled
then return right away via vmx_setjmp() with a return value of VMX_RETURN_AST.
Obtained from: NetApp
AMD BKDG for CPU families 10h and later requires that the memory
mapped config is always read into or written from al/ax/eax register.
Discussed with: kib, alc
Reviewed by: kib (earlier version)
MFC after: 25 days
instruction loads/stores at its will.
The macro __compiler_membar() is currently supported for both gcc and
clang, but kernel compilation will fail otherwise.
Reviewed by: bde, kib
Discussed with: dim, theraven
MFC after: 2 weeks
r234247.
Use, instead, the static intializer introduced in r239923 for x86 and
sparc64 intr_cpus, unwinding the code to the initial version.
Reviewed by: marius
chunks. This breaks the assumption that the entire memory segment is
contiguously allocated in the host physical address space.
This also paves the way to satisfy the 4KB page allocations by requesting
free pages from the VM subsystem as opposed to hard-partitioning host memory
at boot time.
associated with guest physical memory is contiguous.
Add check to vm_gpa2hpa() that the range indicated by [gpa,gpa+len) is all
contained within a single 4KB page.