freebsd-dev

Author	SHA1	Message	Date
Peter Grehan	0a5e9bfb72	Fix issue found with clang build. Avoid code insertion by the compiler between inline asm statements that would in turn modify the flags value set by the first asm, and used by the second. Solve by making the common error block a string that can be pulled into the first inline asm, and using symbolic labels for asm variables. bhyve can now build/run fine when compiled with clang. Reviewed by: neel Obtained from: NetApp	2012-11-06 02:43:41 +00:00
Neel Natu	514393f565	Convert VMCS_ENTRY_INTR_INFO field into a vmcs identifier before passing it to vmcs_getreg(). Without this conversion vmcs_getreg() will return EINVAL. In particular this prevented injection of the breakpoint exception into the guest via the "-B" option to /usr/sbin/bhyve which is hugely useful when debugging guest hangs. This was broken in r241921. Pointy hat: me Obtained from: NetApp	2012-10-29 23:58:15 +00:00
Neel Natu	b01c203325	Corral all the host state associated with the virtual machine into its own file. This state is independent of the type of hardware assist used so there is really no need for it to be in Intel-specific code. Obtained from: NetApp	2012-10-29 01:51:24 +00:00
Peter Grehan	cda5bd7f19	Set the valid field of the newly allocated field as all other vm page allocators do. This fixes a panic when a virtio block device is mounted as root, with the host system dying in vm_page_dirty with invalid bits. Reviewed by: neel Obtained from: NetApp	2012-10-26 22:32:26 +00:00
Neel Natu	bd8572e0be	Unconditionally enable fpu emulation by setting CR0.TS in the host after the guest does a vm exit. This allows us to trap any fpu access in the host context while the fpu still has "dirty" state belonging to the guest. Reported by: "s vas" on freebsd-virtualization@ Obtained from: NetApp	2012-10-26 03:12:40 +00:00
Neel Natu	f76fc5d414	If the guest vcpu wants to idle then use that opportunity to relinquish the host cpu to the scheduler until the guest is ready to run again. This implies that the host cpu utilization will now closely mirror the actual load imposed by the guest vcpu. Also, the vcpu mutex now needs to be of type MTX_SPIN since we need to acquire it inside a critical section. Obtained from: NetApp	2012-10-25 04:29:21 +00:00
Neel Natu	ff6ec151e0	Hide the monitor/mwait instruction capability from the guest until we know how to properly intercept it. Obtained from: NetApp	2012-10-25 04:08:26 +00:00
Neel Natu	f352ff0ca8	Maintain state regarding NMI delivery to guest vcpu in VT-x independent manner. Also add a stats counter to count the number of NMIs delivered per vcpu. Obtained from: NetApp	2012-10-24 02:54:21 +00:00
Neel Natu	eeefa4e4be	Test for AST pending with interrupts disabled right before entering the guest. If an IPI was delivered to this cpu before interrupts were disabled then return right away via vmx_setjmp() with a return value of VMX_RETURN_AST. Obtained from: NetApp	2012-10-23 02:20:42 +00:00
Neel Natu	2e25737a49	Calculate the number of host ticks until the next guest timer interrupt. This information will be used in conjunction with guest "HLT exiting" to yield the thread hosting the virtual cpu. Obtained from: NetApp	2012-10-20 08:23:05 +00:00
Peter Grehan	13ec93719a	Add the guest physical address and r/w/x bits to the paging exit in preparation for a rework of bhyve MMIO handling. Reviewed by: neel Obtained from: NetApp	2012-10-12 23:12:19 +00:00
Neel Natu	75dd336603	Provide per-vcpu locks instead of relying on a single big lock. This also gets rid of all the witness.watch warnings related to calling malloc(M_WAITOK) while holding a mutex. Reviewed by: grehan	2012-10-12 18:32:44 +00:00
Neel Natu	cdc5b9e7b1	Fix warnings generated by 'debug.witness.watch' during VM creation and destruction for calling malloc() with M_WAITOK while holding a mutex. Do not allow vmm.ko to be unloaded until all virtual machines are destroyed.	2012-10-11 19:39:54 +00:00
Neel Natu	f9d4f89e4d	Deliver the MSI to the correct guest virtual cpu. Prior to this change the MSI was being delivered unconditionally to vcpu 0 regardless of how the guest programmed the MSI delivery.	2012-10-11 19:28:07 +00:00
Neel Natu	7ce04d0ad9	Allocate memory pages for the guest from the host's free page queue. It is no longer necessary to hard-partition the memory between the host and guests at boot time.	2012-10-08 23:41:26 +00:00
Neel Natu	f7d51510f1	Change vm_malloc() to map pages in the guest physical address space in 4KB chunks. This breaks the assumption that the entire memory segment is contiguously allocated in the host physical address space. This also paves the way to satisfy the 4KB page allocations by requesting free pages from the VM subsystem as opposed to hard-partitioning host memory at boot time.	2012-10-04 02:27:14 +00:00
Neel Natu	4db4fb2c25	Get rid of assumptions in the hypervisor that the host physical memory associated with guest physical memory is contiguous. Add check to vm_gpa2hpa() that the range indicated by [gpa,gpa+len) is all contained within a single 4KB page.	2012-10-03 01:18:51 +00:00
Neel Natu	bda273f21e	Get rid of assumptions in the hypervisor that the host physical memory associated with guest physical memory is contiguous. Rewrite vm_gpa2hpa() to get the GPA to HPA mapping by querying the nested page tables.	2012-10-03 00:46:30 +00:00
Neel Natu	341f19c949	Get rid of assumptions in the hypervisor that the host physical memory associated with guest physical memory is contiguous. In this case vm_malloc() was using vm_gpa2hpa() to indirectly infer whether or not the address range had already been allocated. Replace this instead with an explicit API 'vm_gpa_available()' that returns TRUE if a page is available for allocation in guest physical address space.	2012-09-29 01:15:45 +00:00
Neel Natu	70593114cd	Intel VT-x provides the length of the instruction at the time of the nested page table fault. Use this when fetching the instruction bytes from the guest memory. Also modify the lapic_mmio() API so that a decoded instruction is fed into it instead of having it fetch the instruction bytes from the guest. This is useful for hardware assists like SVM that provide the faulting instruction as part of the vmexit.	2012-09-27 00:27:58 +00:00
Neel Natu	73820fb0a4	Add an option "-a" to present the local apic in the XAPIC mode instead of the default X2APIC mode to the guest.	2012-09-26 00:06:17 +00:00
Neel Natu	a2da7af6bc	Add support for trapping MMIO writes to local apic registers and emulating them. The default behavior is still to present the local apic to the guest in the x2apic mode.	2012-09-25 22:31:35 +00:00
Neel Natu	e90273829b	Add ioctls to control the X2APIC capability exposed by the virtual machine to the guest. At the moment this simply sets the state in the 'vcpu' instance but there is no code that acts upon these settings.	2012-09-25 19:08:51 +00:00
Neel Natu	edf89256dd	Add an explicit exit code 'SPINUP_AP' to tell the controlling process that an AP needs to be activated by spinning up an execution context for it. The local apic emulation is now completely done in the hypervisor and it will detect writes to the ICR_LO register that try to bring up the AP. In response to such writes it will return to userspace with an exit code of SPINUP_AP. Reviewed by: grehan	2012-09-25 02:33:25 +00:00
Neel Natu	98ed632c63	Stash the 'vm_exit' information in each 'struct vcpu'. There is no functional change at this time but this paves the way for vm exit handler functions to easily modify the exit reason going forward.	2012-09-24 19:32:24 +00:00
Neel Natu	2d3a73ed6d	Restructure the x2apic access code in preparation for supporting memory mapped access to the local apic. The vlapic code is now aware of the mode that the guest is using to access the local apic. Reviewed by: grehan@	2012-09-21 03:09:23 +00:00
Peter Grehan	177fd53318	Add sysctls to display the total and free amount of hard-wired mem for VMs # sysctl hw.vmm hw.vmm.mem_free: 2145386496 hw.vmm.mem_total: 2145386496 Submitted by: Takeshi HASEGAWA hasegaw at gmail com	2012-08-26 01:41:41 +00:00
Neel Natu	39c21c2db2	Force certain bits in %cr4 to be hard-wired to '1' or '0' from a guest's perspective. If we don't do this some guest OSes (e.g. Linux) will reset the CR4_VMXE bit in %cr4 with disastrous consequences. Reported by: grehan	2012-08-04 02:06:55 +00:00
Neel Natu	4bff7fad95	Verify that VMX operation has been enabled by BIOS before executing the VMXON instruction. Reported by "s vas" on freebsd-virtualization@	2012-07-25 00:21:16 +00:00
Peter Grehan	298379f7fb	Until the issue of how to handle guest XCR0 state is resolved, prevent CURRENT guests from hitting unhandled xsetbv exits by hiding the xsave/osxsave/avx cpuid2 bits.	2012-05-03 05:04:37 +00:00
Peter Grehan	cd942e0f25	MSI-x interrupt support for PCI pass-thru devices. Includes instruction emulation for memory r/w access. This opens the door for io-apic, local apic, hpet timer, and legacy device emulation. Submitted by: ryan dot berryhill at sandvine dot com Reviewed by: grehan Obtained from: Sandvine	2012-04-28 16:28:00 +00:00
Peter Grehan	38f1b189cd	IFC @ r234692 sys/amd64/include/cpufunc.h sys/amd64/include/fpu.h sys/amd64/amd64/fpu.c sys/amd64/vmm/vmm.c - Add API to allow vmm FPU state init/save/restore. FP stuff discussed with: kib	2012-04-26 07:52:28 +00:00
Ed Maste	2a5bbbe380	Remove duplicated license text.	2012-03-06 21:13:12 +00:00
Peter Grehan	608f97c359	Add support for running as a nested hypervisor under VMWare Fusion, on systems with VT-x/EPT (e.g. Sandybridge Macbooks). This will most likely work on VMWare Workstation8/Player4 as well. See the VMWare app note at: http://communities.vmware.com/docs/DOC-8970 Fusion doesn't propagate the PAT MSR auto save-restore entry/exit control bits. Deal with this by noting that fact and setting up the PAT MSR to essentially be a no-op - it is init'd to power-on default, and a software shadow copy maintained. Since it is treated as a no-op, o/s settings are essentially ignored. This may not give correct results, but since the hypervisor is running nested, a number of bets are already off. On a quad-core/HT-enabled 'MacBook8,2', nested VMs with 1/2/4 vCPUs were fired up. The more nested vCPUs the worse the performance, unless the VMs were started up in multiplexed mode where things worked perfectly up to the limit of 8 vCPUs. Reviewed by: neel	2011-12-24 19:39:02 +00:00
Neel Natu	14ddf164ba	Get rid of redundant initialization of 'dmask'. It was being re-initialized shortly afterwards.	2011-07-06 21:40:48 +00:00
Peter Grehan	a5615c9044	IFC @ r222830	2011-06-28 06:26:03 +00:00
John Baldwin	8b28761278	Some tweaks to the CPUID support: - Don't always pass the cpuid request to the current CPU as some nodes we will emulate purely in software. - Pass in the APIC ID of the virtual CPU so we can return the proper APIC ID. - Always report a completely flat topology with no SMT or multicore. - Report the CPUID2_HV feature and implement support for the 0x40000000 CPUID level. - Use existing constants from <machine/specialreg.h> when possible and use cpu_feature2 when checking for VMX support.	2011-06-02 14:04:07 +00:00
John Baldwin	b3996dd47c	Add a 'show vmcs' DDB command to dump state about the current CPU's current VMCS.	2011-06-02 13:49:19 +00:00
Neel Natu	ad54f37429	Fix a long standing bug in VMXCTX_GUEST_RESTORE(). There was an assumption by the "callers" of this macro that on "return" the %rsp will be pointing to the 'vmxctx'. The macro was not doing this and thus when trying to restore host state on an error from "vmlaunch" or "vmresume" we were treating the memory locations on the host stack as 'struct vmxctx'. This led to all sorts of weird bugs like double faults or invalid instruction faults. This bug is exposed by the -O2 option used to compile the kernel module. With the -O2 flag the compiler will optimize the following piece of code: int loopstart = 1; ... if (loopstart) { loopstart = 0; vmx_launch(); } else vmx_resume(); into this: vmx_launch(); Since vmx_launch() and vmx_resume() are declared to be __dead2 functions the compiler is free to do this. The compiler has no way to know that the functions return indirectly through vmx_setjmp(). This optimization in turn leads us to trigger the bug in VMXCTX_GUEST_RESTORE(). With this change we can boot a 8.1 guest on a 9.0 host. Reported by: jhb@	2011-05-20 03:23:09 +00:00
Neel Natu	3caf3beb5c	Avoid unnecessary sign extension when promoted to a 64-bit integer. This was benign because the interruption info field is a 32-bit quantity and the hardware guarantees that the upper 32-bits are all zeros. But it did make reading the objdump output very confusing.	2011-05-20 02:08:05 +00:00
Peter Grehan	1f3025e133	Changes to allow the GENERIC+bhyve kernel built from this branch to run as a 1/2 CPU guest on an 8.1 bhyve host. bhyve/inout.c inout.h fbsdrun.c - Rather than exiting on accesses to unhandled i/o ports, emulate hardware by returning -1 on reads and ignoring writes to unhandled ports. Support the previous mode by allowing a 'strict' parameter to be set from the command line. The 8.1 guest kernel was vastly cut down from GENERIC and had no ISA devices. Booting GENERIC exposes a massive amount of random touching of i/o ports (hello syscons/vga/atkbdc). bhyve/consport.c dev/bvm/bvm_console.c - implement a simplistic signature for the bvm console by returning 'bv' for an inw on the port. Also, set the priority of the console to CN_REMOTE if the signature was returned. This works better in an environment where multiple consoles are in the kernel (hello syscons) bhyve/rtc.c - return 0 for the access to RTC_EQUIPMENT (yes, you syscons) amd64/vmm/x86.c x86.h - hide a bunch more CPUID leaf 1 bits from the guest to prevent cpufreq drivers from probing. The next step will be to move CPUID handling completely into user-space. This will allow the full spectrum of changes from presenting a lowest-common-denominator CPU type/feature set, to exposing (almost) everything that the host can support. Reviewed by: neel Obtained from: NetApp	2011-05-19 21:53:25 +00:00
John Baldwin	e22b232b0e	Enable handling of 1GB pages in the direct map since HEAD supports those. Submitted by: neel	2011-05-15 02:09:12 +00:00
John Baldwin	34a6b2d627	First cut at porting the kernel portions of 221828 and 221905 from the BHyVe reference branch to HEAD.	2011-05-14 20:35:01 +00:00
Peter Grehan	366f60834f	Import of bhyve hypervisor and utilities, part 1. vmm.ko - kernel module for VT-x, VT-d and hypervisor control bhyve - user-space sequencer and i/o emulation vmmctl - dump of hypervisor register state libvmm - front-end to vmm.ko chardev interface bhyve was designed and implemented by Neel Natu. Thanks to the following folk from NetApp who helped to make this available: Joe CaraDonna Peter Snyder Jeff Heller Sandeep Mann Steve Miller Brian Pawlowski	2011-05-13 04:54:01 +00:00

44 Commits
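
Commit 4db4fb2c25 (2012-10-03) mentions a check in vm_gpa2hpa() that the range [gpa,gpa+len) is contained within a single 4KB page. The log does not show the code itself, so the following is only a minimal sketch of what such a check could look like. The helper name `gpa_range_in_one_page` and the `SKETCH_*` macros are hypothetical and are not taken from the vmm sources; rejecting a zero-length range is likewise a choice made for this sketch.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Assumed 4KB page size, as stated in the commit message. */
#define SKETCH_PAGE_SIZE	UINT64_C(4096)
#define SKETCH_PAGE_MASK	(SKETCH_PAGE_SIZE - 1)

/*
 * Hypothetical helper, not the actual vmm code: returns true when
 * [gpa, gpa + len) lies entirely inside one 4KB page.
 */
static bool
gpa_range_in_one_page(uint64_t gpa, size_t len)
{
	/* An empty range, or one longer than a page, never qualifies. */
	if (len == 0 || len > SKETCH_PAGE_SIZE)
		return (false);

	/*
	 * Compare the offset within the page plus the length against the
	 * page size. Working with the offset instead of gpa + len avoids
	 * 64-bit wraparound for addresses near the top of the range.
	 */
	return ((gpa & SKETCH_PAGE_MASK) + len <= SKETCH_PAGE_SIZE);
}
```

Under these assumptions a caller translating a multi-page buffer would have to split it at page boundaries and translate each piece separately, which fits the commit's stated goal of no longer assuming that host physical memory backing the guest is contiguous.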