6494 Commits

Author SHA1 Message Date
neel
7d7f92fbad Prefer x2apic mode when running inside a virtual machine.
Provide a tunable 'machdep.x2apic_desired' to let the administrator override
the default behavior.

Provide a read-only sysctl 'machdep.x2apic' to let the administrator know
whether the kernel is using x2apic or legacy mmio to access local apic.

Tested with Parallels Desktop 8 and bhyve hypervisors.
Also tested running on bare metal Intel Xeon E5-2658.

Obtained from:	NetApp
Discussed with:	jhb, attilio, avg, grehan
2012-12-16 00:57:14 +00:00
neel
d8091074f2 IFC @r243836 2012-12-04 04:37:42 +00:00
kib
6b76c5a1b8 Print the frame addresses for the backtraces on i386 and amd64. It
allows both to inspect the frame sizes and to manually peek into the
frames from ddb, if needed.

Reviewed by:	dim
MFC after:	2 weeks
2012-12-03 22:16:51 +00:00
jkim
caa6f5a7e6 Remove duplicate code. Reduce diff between amd64 and i386. 2012-12-01 00:56:19 +00:00
jkim
9c50b706fb Use volatile keywords properly. 2012-11-30 20:15:01 +00:00
grehan
7f24aaf567 Properly screen for the AND 0x81 instruction from the set
of group1 0x81 instructions that use the reg bits as an
extended opcode.

Still todo: properly update rflags.

Pointed out by:	jilles@
2012-11-30 05:40:24 +00:00
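The screening described in 7f24aaf567 can be sketched as follows. For opcode 0x81 the reg field of the ModRM byte (bits 5:3) acts as an extended opcode, and the AND form is encoded as "81 /4" in the Intel instruction set reference. This is only an illustrative sketch, not the bhyve code: the macro and function names are hypothetical.

```c
#include <stdbool.h>
#include <stdint.h>

/* ModRM layout: mod = bits 7:6, reg = bits 5:3, rm = bits 2:0. */
#define MODRM_REG(modrm)	(((modrm) >> 3) & 0x7)

/* Group 1 extended opcode that selects AND ("81 /4"). */
#define GROUP1_EXT_AND		4

/*
 * Hypothetical helper: returns true only for the AND member of the
 * 0x81 group, rejecting ADD, OR, ADC, SBB, SUB, XOR and CMP.
 */
static bool
is_group1_and(uint8_t opcode, uint8_t modrm)
{
	return (opcode == 0x81 && MODRM_REG(modrm) == GROUP1_EXT_AND);
}
```

A decoder would call such a check after fetching the ModRM byte and refuse to emulate any other group 1 operation.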
jkim
5bf7bb816c Tidy up inline assembly. No functional change. 2012-11-30 00:59:37 +00:00
grehan
f596548906 Remove debug printf.
Pointed out by:	emaste
2012-11-29 15:08:13 +00:00
grehan
ffd1f089c3 Add support for the 0x81 AND instruction, now generated
by clang in the local APIC code.

0x81 is a read-modify-write instruction - the EPT check
that only allowed read or write and not both has been
relaxed to allow read and write.

Reviewed by:	neel
Obtained from:	NetApp
2012-11-29 06:26:42 +00:00
neel
da4e87dfd6 Clean up the user-space paging exit handler now that the unified instruction
emulation is in place.

Obtained from:	NetApp
2012-11-28 13:34:44 +00:00
neel
308122a0f1 Change emulate_rdmsr() and emulate_wrmsr() to return 0 on success and errno on
failure. The conversion from the return value to HANDLED or UNHANDLED can be
done locally in vmx_exit_process().

Obtained from: NetApp
2012-11-28 13:10:18 +00:00
neel
36ab9a2e1a Revamp the x86 instruction emulation in bhyve.
On a nested page table fault the hypervisor will:
- fetch the instruction using the guest %rip and %cr3
- decode the instruction in 'struct vie'
- emulate the instruction in host kernel context for local apic accesses
- any other type of mmio access is punted up to user-space (e.g. ioapic)

The decoded instruction is passed as collateral to the user-space process
that is handling the PAGING exit.

The emulation code is fleshed out to include more addressing modes (e.g. SIB)
and more types of operands (e.g. imm8). The source code is unified into a
single file (vmm_instruction_emul.c) that is compiled into vmm.ko as well
as /usr/sbin/bhyve.

Reviewed by:	grehan
Obtained from:	NetApp
2012-11-28 00:02:17 +00:00
neel
d8bfa0f575 Fix a bug in the MSI-X resource allocation for PCI passthrough devices.
In the case where the underlying host had disabled MSI-X via the
"hw.pci.enable_msix" tunable, the ppt_setup_msix() function would fail
and return an error without properly cleaning up. This in turn would
cause a page fault on the next boot of the guest.

Fix this by calling ppt_teardown_msix() in all the error return paths.

Obtained from:	NetApp
2012-11-22 04:07:18 +00:00
neel
575baa2d8a Get rid of redundant comparison which is guaranteed to be "true" for unsigned
integers.

Obtained from:	NetApp
2012-11-22 00:08:20 +00:00
grehan
5a600cdfe4 Handle CPUID leaf 0x7 now that FreeBSD is using it.
Return 0's for now.

Reviewed by:	neel
Obtained from:	NetApp
2012-11-20 06:01:03 +00:00
neel
0e18e1b9de IFC @ r243164 2012-11-17 02:55:47 +00:00
kib
bc5bfde14d Move the declaration of vm_phys_paddr_to_vm_page() from vm/vm_page.h
to vm/vm_phys.h, where it belongs.

Requested and reviewed by:	alc
MFC after:	2 weeks
2012-11-16 05:55:56 +00:00
kib
e8ae50d444 Flip the semantic of M_NOWAIT to only require the allocation to not
sleep, and perform the page allocations with VM_ALLOC_SYSTEM
class. Previously, the allocation was also allowed to completely drain
the reserve of the free pages, being translated to VM_ALLOC_INTERRUPT
request class for vm_page_alloc() and similar functions.

Allow the caller of malloc* to request the 'deep drain' semantic by
providing M_USE_RESERVE flag, now translated to VM_ALLOC_INTERRUPT
class. Previously, it resulted in less aggressive VM_ALLOC_SYSTEM
allocation class.

Centralize the translation of the M_* malloc(9) flags in the single
inline function malloc2vm_flags().

Discussion started by:	"Sears, Steven" <Steven.Sears@netapp.com>
Reviewed by:	alc, mdf (previous version)
Tested by:	pho (previous version)
MFC after:	2 weeks
2012-11-14 20:01:40 +00:00
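The flag translation described in e8ae50d444 can be pictured roughly as below. This is a sketch under stated assumptions, not the committed malloc2vm_flags(): the SK_-prefixed constants are placeholders with made-up values, and only the two mappings named in the commit message are modeled.

```c
/* Placeholder flag bits; the real values live in the kernel headers. */
enum { SK_M_NOWAIT = 0x1, SK_M_USE_RESERVE = 0x2 };

/* Placeholder stand-ins for the vm_page_alloc() request classes. */
enum sk_vm_class { SK_VM_ALLOC_SYSTEM, SK_VM_ALLOC_INTERRUPT };

/*
 * Hypothetical translation: only M_USE_RESERVE asks for the "deep
 * drain" class; everything else, including plain M_NOWAIT, maps to
 * the less aggressive system class.
 */
static enum sk_vm_class
sk_malloc2vm_class(int malloc_flags)
{
	if ((malloc_flags & SK_M_USE_RESERVE) != 0)
		return (SK_VM_ALLOC_INTERRUPT);
	return (SK_VM_ALLOC_SYSTEM);
}
```

Keeping the mapping in one function means every allocator path derives the request class from the malloc(9) flags in the same way.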
neel
1a164db277 IFC @ r242940 2012-11-13 07:39:05 +00:00
neel
bc4be3dff1 IFC @ r242684 2012-11-11 03:26:14 +00:00
kib
3c8fa044cf Do not try to enable new features in %cr4 when running under a
hypervisor.  Apparently, hypervisors fail to filter the 'Standard
Extended Features' report out of CPUID, yet deliver #GP when the
corresponding bit in %cr4 is toggled.

This shall be reconsidered later, after hypervisors correct the bug.

Reported and tested by:	joel
Reviewed by:	avg
MFC after:	2 weeks
2012-11-09 16:00:30 +00:00
grehan
091578815a Fix issue found with clang build. Avoid code insertion by the compiler
between inline asm statements that would in turn modify the flags
value set by the first asm, and used by the second.

Solve by making the common error block a string that can be pulled
into the first inline asm, and using symbolic labels for asm variables.

bhyve can now build/run fine when compiled with clang.

Reviewed by:	neel
Obtained from:	NetApp
2012-11-06 02:43:41 +00:00
attilio
f3501b109e Rework the known rwlocks so that each stays on its own
cache line by using struct rwlock_padalign instead of manual
frobbing.

Reviewed by:	alc, jimharris
2012-11-03 23:03:14 +00:00
kib
888a8bb770 Enable the new instructions for reading and writing bases for %fs,
%gs, when supported.  Note that WRFSBASE and WRGSBASE are not very
useful on FreeBSD right now, because a return from the kernel mode to
userspace reloads the bases specified by the sysarch(2) syscall, most
likely.

Enable Supervisor Mode Execution Prevention (SMEP) when
supported. Since the loader(8) hands off to the kernel with
page tables that contradict SMEP, postpone enabling SMEP
on the BSP until pmap has switched to the proper kernel tables.

Debugged with the help from:	avg
Tested by:	avg, Michael Moll <kvedulv@kvedulv.de>
MFC after:	1 month
2012-11-01 15:17:43 +00:00
kib
872b317d89 Provide the reading and display of the Standard Extended Features,
introduced with the IvyBridge CPUs.  Provide the definitions for new
bits in CR3 and CR4 registers.

Tested by:	avg, Michael Moll <kvedulv@kvedulv.de>
MFC after:	2 weeks
2012-11-01 15:14:37 +00:00
neel
aee862ac3f Convert VMCS_ENTRY_INTR_INFO field into a vmcs identifier before passing it
to vmcs_getreg(). Without this conversion vmcs_getreg() will return EINVAL.

In particular this prevented injection of the breakpoint exception into the
guest via the "-B" option to /usr/sbin/bhyve which is hugely useful when
debugging guest hangs.

This was broken in r241921.

Pointy hat: me
Obtained from:	NetApp
2012-10-29 23:58:15 +00:00
neel
9631d598cc Corral all the host state associated with the virtual machine into its own file.
This state is independent of the type of hardware assist used so there is
really no need for it to be in Intel-specific code.

Obtained from:	NetApp
2012-10-29 01:51:24 +00:00
grehan
dc37578ed2 Set the valid field of the newly allocated page as all other
vm page allocators do. This fixes a panic when a virtio block
device is mounted as root, with the host system dying in
vm_page_dirty with invalid bits.

Reviewed by:	neel
Obtained from:	NetApp
2012-10-26 22:32:26 +00:00
neel
cbd59fc940 Unconditionally enable fpu emulation by setting CR0.TS in the host after the
guest does a vm exit.

This allows us to trap any fpu access in the host context while the fpu still
has "dirty" state belonging to the guest.

Reported by: "s vas" on freebsd-virtualization@
Obtained from:	NetApp
2012-10-26 03:12:40 +00:00
neel
bcb3589583 If the guest vcpu wants to idle then use that opportunity to relinquish the
host cpu to the scheduler until the guest is ready to run again.

This implies that the host cpu utilization will now closely mirror the actual
load imposed by the guest vcpu.

Also, the vcpu mutex now needs to be of type MTX_SPIN since we need to acquire
it inside a critical section.

Obtained from:	NetApp
2012-10-25 04:29:21 +00:00
neel
80aee5fb8a Hide the monitor/mwait instruction capability from the guest until we know how
to properly intercept it.

Obtained from:	NetApp
2012-10-25 04:08:26 +00:00
neel
583a9ef76d Maintain state regarding NMI delivery to guest vcpu in a VT-x independent manner.
Also add a stats counter to count the number of NMIs delivered per vcpu.

Obtained from:	NetApp
2012-10-24 02:54:21 +00:00
neel
a74007510a Test for AST pending with interrupts disabled right before entering the guest.
If an IPI was delivered to this cpu before interrupts were disabled
then return right away via vmx_setjmp() with a return value of VMX_RETURN_AST.

Obtained from:	NetApp
2012-10-23 02:20:42 +00:00
eadler
f3db91bee2 The 'testing memory' patch gets printed too many times
Approved by: cperciva (implicit)
2012-10-22 11:57:26 +00:00
eadler
0b45640988 Explain the upcoming delay by printing a message when the kernel
is about to begin testing memory.

Reviewed by:	dteske, adri
Approved by:	cperciva
MFC after:	1 week
2012-10-22 03:16:39 +00:00
neel
26dd051c2c Calculate the number of host ticks until the next guest timer interrupt.
This information will be used in conjunction with guest "HLT exiting" to
yield the thread hosting the virtual cpu.

Obtained from:	NetApp
2012-10-20 08:23:05 +00:00
kib
36119706ad Print the %rip value for uprintf_signal.
MFC after:	1 week
2012-10-14 17:08:46 +00:00
avg
5da136c22f pciereg_cfg*: use assembly to access the mem-mapped cfg space
AMD BKDG for CPU families 10h and later requires that the memory
mapped config is always read into or written from al/ax/eax register.

Discussed with:	kib, alc
Reviewed by:	kib (earlier version)
MFC after:	25 days
2012-10-14 10:13:50 +00:00
grehan
8fb5b5f8de Add the guest physical address and r/w/x bits to
the paging exit in preparation for a rework of
bhyve MMIO handling.

Reviewed by:	neel
Obtained from:	NetApp
2012-10-12 23:12:19 +00:00
neel
e3e8a520e2 Provide per-vcpu locks instead of relying on a single big lock.
This also gets rid of all the witness.watch warnings related to calling
malloc(M_WAITOK) while holding a mutex.

Reviewed by:	grehan
2012-10-12 18:32:44 +00:00
neel
97c20149fa Fix warnings generated by 'debug.witness.watch' during VM creation and
destruction for calling malloc() with M_WAITOK while holding a mutex.

Do not allow vmm.ko to be unloaded until all virtual machines are destroyed.
2012-10-11 19:39:54 +00:00
neel
d09cf38e25 Deliver the MSI to the correct guest virtual cpu.
Prior to this change the MSI was being delivered unconditionally to vcpu 0
regardless of how the guest programmed the MSI delivery.
2012-10-11 19:28:07 +00:00
kevlo
ceb08698f2 Revert previous commit...
Pointyhat to:	kevlo (myself)
2012-10-10 08:36:38 +00:00
attilio
6997194551 Add a unified macro that prevents the compiler from reordering
loads and stores at will.
The macro __compiler_membar() is currently supported for both gcc and
clang; kernel compilation will fail with any other compiler.

Reviewed by:	bde, kib
Discussed with:	dim, theraven
MFC after:	2 weeks
2012-10-09 14:32:30 +00:00
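For readers unfamiliar with compiler barriers, the idea behind 6997194551 can be sketched with the usual gcc/clang idiom of an empty asm statement carrying a "memory" clobber. Whether the committed __compiler_membar() is spelled exactly this way is an assumption; the sk_ names below are hypothetical.

```c
#if defined(__GNUC__) || defined(__clang__)
/* Empty asm with a "memory" clobber: emits no instruction, but the
 * compiler may not move loads or stores across it. */
#define sk_compiler_membar()	__asm__ __volatile__("" : : : "memory")
#else
/* Mirror the commit's behavior: refuse to build with other compilers. */
#error "sk_compiler_membar() is only sketched for gcc and clang"
#endif

static int sk_data;
static int sk_flag;

/* Store the payload, then the flag, without compiler reordering. */
static void
sk_publish(int value)
{
	sk_data = value;
	sk_compiler_membar();
	sk_flag = 1;
}
```

A barrier of this kind constrains only the compiler; it does not order accesses as seen by other CPUs.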
attilio
3212891c92 Reverts r234074,234105,234564,234723,234989,235231-235232 and part of
r234247.
Use, instead, the static initializer introduced in r239923 for x86 and
sparc64 intr_cpus, unwinding the code to the initial version.

Reviewed by:	marius
2012-10-09 12:22:43 +00:00
kevlo
8747a46991 Prefer NULL over 0 for pointers 2012-10-09 08:27:40 +00:00
neel
ca6e3cf930 Allocate memory pages for the guest from the host's free page queue.
It is no longer necessary to hard-partition the memory between the host
and guests at boot time.
2012-10-08 23:41:26 +00:00
neel
18dd2c0d51 Change vm_malloc() to map pages in the guest physical address space in 4KB
chunks. This breaks the assumption that the entire memory segment is
contiguously allocated in the host physical address space.

This also paves the way to satisfy the 4KB page allocations by requesting
free pages from the VM subsystem as opposed to hard-partitioning host memory
at boot time.
2012-10-04 02:27:14 +00:00
neel
77ab4804ac Get rid of assumptions in the hypervisor that the host physical memory
associated with guest physical memory is contiguous.

Add check to vm_gpa2hpa() that the range indicated by [gpa,gpa+len) is all
contained within a single 4KB page.
2012-10-03 01:18:51 +00:00
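The range check added to vm_gpa2hpa() in 77ab4804ac amounts to confirming that [gpa,gpa+len) does not cross a 4KB boundary. A minimal sketch of such a check follows; the function and macro names are hypothetical, and the actual test in the hypervisor may be written differently.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define SK_PAGE_SIZE	4096u
#define SK_PAGE_MASK	(SK_PAGE_SIZE - 1)

/*
 * Hypothetical helper: true when [gpa, gpa + len) lies within one
 * 4KB page. Comparing len against the bytes remaining in the page
 * avoids computing gpa + len, which could overflow.
 */
static bool
sk_range_in_single_page(uint64_t gpa, size_t len)
{
	uint64_t offset = gpa & SK_PAGE_MASK;

	return (len <= SK_PAGE_SIZE - offset);
}
```

Once guest memory is no longer contiguous in host physical space, a translation is valid only up to the end of the page it was looked up for, so a caller needing a longer range has to translate page by page.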
neel
3e50e0220b Get rid of assumptions in the hypervisor that the host physical memory
associated with guest physical memory is contiguous.

Rewrite vm_gpa2hpa() to get the GPA to HPA mapping by querying the nested
page tables.
2012-10-03 00:46:30 +00:00