183456 Commits

Author SHA1 Message Date
neel
263c4acf84 Use the new userboot 'getenv' callback to set a couple of environment variables
in the guest.

The variables are: smbios.bios.vendor=BHYVE and boot_serial=1

The FreeBSD guest uses the "smbios.bios.vendor" environment variable to
detect whether or not it is running as a guest inside a hypervisor.

The "boot_serial=1" is temporary and will be dropped when bhyve can do VGA
emulation.

Obtained from:	NetApp
2012-11-06 21:48:45 +00:00
neel
9aedd7f40e Add a callback function to userboot.so to fetch a list of environment
variables and pass them to the kernel.

Bump up the userboot version to USERBOOT_VERSION_3. This takes into account
the bump to USERBOOT_VERSION_2 that has already happened in head (but not
propagated to this branch yet).

Reviewed by:	dfr@
Obtained from:	NetApp
2012-11-06 21:36:37 +00:00
grehan
091578815a Fix issue found with clang build. Avoid code insertion by the compiler
between inline asm statements that would in turn modify the flags
value set by the first asm, and used by the second.

Solve by making the common error block a string that can be pulled
into the first inline asm, and using symbolic labels for asm variables.

bhyve can now build/run fine when compiled with clang.

Reviewed by:	neel
Obtained from:	NetApp
2012-11-06 02:43:41 +00:00
grehan
f31f52c187 Change the thread name of the vCPU threads to contain the
name of the VM and the vCPU number. This helps hugely
when using top -H to identify what a VM is doing.

Reviewed by:	neel
Obtained from:	NetApp
2012-10-31 19:17:55 +00:00
grehan
36e2485283 Exit if the requested num vCPUs exceeds the maximum rather
than waiting until AP bringup detects an out-of-range vCPU.

While here, fix all error output to use fprintf(stderr, ...).

Reviewed by:	neel
Reported by:	@allanjude
2012-10-31 03:29:52 +00:00
neel
55bf9997b9 Teach FreeBSD to detect that it is a guest running inside BHyVe.
Reviewed by:	grehan
Obtained from:	NetApp
2012-10-30 03:03:37 +00:00
neel
aee862ac3f Convert VMCS_ENTRY_INTR_INFO field into a vmcs identifier before passing it
to vmcs_getreg(). Without this conversion vmcs_getreg() will return EINVAL.

In particular this prevented injection of the breakpoint exception into the
guest via the "-B" option to /usr/sbin/bhyve which is hugely useful when
debugging guest hangs.

This was broken in r241921.

Pointy hat: me
Obtained from:	NetApp
2012-10-29 23:58:15 +00:00
neel
9631d598cc Corral all the host state associated with the virtual machine into its own file.
This state is independent of the type of hardware assist used so there is
really no need for it to be in Intel-specific code.

Obtained from:	NetApp
2012-10-29 01:51:24 +00:00
neel
6a8b1eb583 Present the bvm dbgport to the guest only when explicitly requested via
the "-g" command line option.

Suggested by:	grehan
Obtained from:	NetApp
2012-10-27 22:58:02 +00:00
neel
1e4c5ce626 Probe for existence of the bvm debug port instead of just assuming that it is
always present.

Suggested by:	grehan
Obtained from:	NetApp
2012-10-27 22:54:23 +00:00
neel
86d868af7f Present the bvm console device to the guest only when explicitly requested via
the "-b" command line option.

Reviewed by:	grehan
Obtained from:	NetApp
2012-10-27 22:33:23 +00:00
neel
bd0ca87b04 Ignore PCI configuration accesses to all bus numbers other than PCI bus 0.
Obtained from:	NetApp
2012-10-27 02:39:08 +00:00
grehan
ed9d132c4e Rename vmmctl to bhyvectl. 'vmmctl' came from a pre-bhyve
internal codebase at NetApp. No need for it to have an
unrelated name to the other userspace utils.

Reviewed by:	neel
Obtained from:	NetApp
2012-10-27 02:10:45 +00:00
grehan
dc37578ed2 Set the valid field of the newly allocated page as all other
vm page allocators do. This fixes a panic when a virtio block
device is mounted as root, with the host system dying in
vm_page_dirty with invalid bits.

Reviewed by:	neel
Obtained from:	NetApp
2012-10-26 22:32:26 +00:00
grehan
1372a368e0 Remove mptable generation code from libvmmapi and move it to bhyve.
Firmware tables require too much knowledge of system configuration,
and it's difficult to pass that information in general terms to a library.
The upcoming ACPI work exposed this - it will also live in bhyve.

Also, remove code specific to NetApp from the mptable name, and remove
the -n option from bhyve.

Reviewed by:	neel
Obtained from:	NetApp
2012-10-26 13:40:12 +00:00
neel
cbd59fc940 Unconditionally enable fpu emulation by setting CR0.TS in the host after the
guest does a vm exit.

This allows us to trap any fpu access in the host context while the fpu still
has "dirty" state belonging to the guest.

Reported by: "s vas" on freebsd-virtualization@
Obtained from:	NetApp
2012-10-26 03:12:40 +00:00
neel
bcb3589583 If the guest vcpu wants to idle then use that opportunity to relinquish the
host cpu to the scheduler until the guest is ready to run again.

This implies that the host cpu utilization will now closely mirror the actual
load imposed by the guest vcpu.

Also, the vcpu mutex now needs to be of type MTX_SPIN since we need to acquire
it inside a critical section.

Obtained from:	NetApp
2012-10-25 04:29:21 +00:00
neel
80aee5fb8a Hide the monitor/mwait instruction capability from the guest until we know how
to properly intercept it.

Obtained from:	NetApp
2012-10-25 04:08:26 +00:00
neel
f5d9223df5 Fix typo: host_rip -> host_rsp
Obtained from:	NetApp
2012-10-25 03:39:36 +00:00
neel
583a9ef76d Maintain state regarding NMI delivery to guest vcpu in VT-x independent manner.
Also add a stats counter to count the number of NMIs delivered per vcpu.

Obtained from:	NetApp
2012-10-24 02:54:21 +00:00
neel
a74007510a Test for AST pending with interrupts disabled right before entering the guest.
If an IPI was delivered to this cpu before interrupts were disabled
then return right away via vmx_setjmp() with a return value of VMX_RETURN_AST.

Obtained from:	NetApp
2012-10-23 02:20:42 +00:00
neel
26dd051c2c Calculate the number of host ticks until the next guest timer interrupt.
This information will be used in conjunction with guest "HLT exiting" to
yield the thread hosting the virtual cpu.

Obtained from:	NetApp
2012-10-20 08:23:05 +00:00
grehan
beaad57fa0 Rework how guest MMIO regions are dealt with.
- New memory region interface. An RB tree holds the regions,
with a last-found per-vCPU cache to deal with the common case
of repeated guest accesses to MMIO registers in the same page.

- Support memory-mapped BARs in PCI emulation.

 mem.c/h - memory region interface

 instruction_emul.c/h - remove old region interface.
 Use gpa from EPT exit to avoid a tablewalk to
 determine operand address. Determine operand size
 and use when calling through to region handler.

 fbsdrun.c - call into region interface on paging
 exit. Distinguish between instruction emul error
 and region not found

 pci_emul.c/h - implement new BAR callback api.
 Split BAR alloc routine into routines that
 require/don't require the BAR phys address.

 ioapic.c
 pci_passthru.c
 pci_virtio_block.c
 pci_virtio_net.c
 pci_uart.c  - update to new BAR callback i/f

Reviewed by:	neel
Obtained from:	NetApp
2012-10-19 18:11:17 +00:00
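The lookup path described in beaad57fa0 (ordered region store plus a last-found per-vCPU cache) can be sketched roughly as follows. This is an illustration only, not the code from mem.c: a sorted array with binary search stands in for the RB tree, and every name here (struct mem_region, struct region_table, region_lookup, MAX_VCPUS) is hypothetical.

```c
#include <stddef.h>
#include <stdint.h>

#define MAX_VCPUS 8	/* hypothetical limit, for illustration only */

/* One guest-physical MMIO range (hypothetical layout). */
struct mem_region {
	uint64_t	base;	/* first guest physical address */
	uint64_t	size;	/* length in bytes */
	const char	*name;
};

struct region_table {
	const struct mem_region	*regions;	/* sorted by base, non-overlapping */
	size_t			count;
	const struct mem_region	*last_hit[MAX_VCPUS];	/* per-vCPU cache */
};

static int
in_region(const struct mem_region *r, uint64_t gpa)
{
	return (r != NULL && gpa >= r->base && gpa - r->base < r->size);
}

/*
 * Find the region containing 'gpa'. The per-vCPU cache is checked first,
 * so repeated accesses to the same device registers skip the search.
 * A binary search over a sorted array stands in for the RB tree.
 */
static const struct mem_region *
region_lookup(struct region_table *t, int vcpu, uint64_t gpa)
{
	size_t lo = 0, hi = t->count;

	if (vcpu < 0 || vcpu >= MAX_VCPUS)
		return (NULL);
	if (in_region(t->last_hit[vcpu], gpa))
		return (t->last_hit[vcpu]);

	while (lo < hi) {
		size_t mid = lo + (hi - lo) / 2;
		const struct mem_region *r = &t->regions[mid];

		if (gpa < r->base) {
			hi = mid;
		} else if (gpa - r->base >= r->size) {
			lo = mid + 1;
		} else {
			t->last_hit[vcpu] = r;	/* remember for the next access */
			return (r);
		}
	}
	return (NULL);	/* region not found; the cache is left untouched */
}
```

A miss returns NULL without touching the cache, which lets a caller tell "region not found" apart from an emulation error, as the fbsdrun.c note above describes.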
grehan
8fb5b5f8de Add the guest physical address and r/w/x bits to
the paging exit in preparation for a rework of
bhyve MMIO handling.

Reviewed by:	neel
Obtained from:	NetApp
2012-10-12 23:12:19 +00:00
neel
4650e5d776 Deal with transient EBUSY error return from vm_run() by retrying the operation.
2012-10-12 18:49:07 +00:00
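The retry pattern named in 4650e5d776 might look like the sketch below. It does not call the real vm_run(): a callback that returns an errno-style code stands in for it, and the retry bound is an assumption added here so the loop always terminates (the commit message does not say whether the real code bounds its retries). run_with_retry and fake_run are hypothetical names.

```c
#include <errno.h>

/* Stand-in for the operation being retried; returns 0 or an errno value. */
typedef int (*run_fn)(void *arg);

/*
 * Call 'fn' until it returns something other than EBUSY, or until
 * 'max_tries' attempts have been made (the bound is an assumption).
 */
static int
run_with_retry(run_fn fn, void *arg, int max_tries)
{
	int error;

	do {
		error = fn(arg);
	} while (error == EBUSY && --max_tries > 0);

	return (error);
}

/*
 * Hypothetical stub for demonstration: reports EBUSY until the counter
 * pointed to by 'arg' reaches zero, then succeeds.
 */
static int
fake_run(void *arg)
{
	int *remaining = arg;

	if (*remaining > 0) {
		(*remaining)--;
		return (EBUSY);
	}
	return (0);
}
```

Any error other than EBUSY is returned to the caller immediately, so only the transient condition is retried.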
neel
e3e8a520e2 Provide per-vcpu locks instead of relying on a single big lock.
This also gets rid of all the witness.watch warnings related to calling
malloc(M_WAITOK) while holding a mutex.

Reviewed by:	grehan
2012-10-12 18:32:44 +00:00
neel
4829fce72f Output the value of all capabilities when the "--getcap" option is used without
a "--capname=<capname>". Do the same for the "--get-all" option.
2012-10-12 18:14:54 +00:00
neel
a62f9562ca Add an api to map a vm capability type into a string to be used for display
purposes.
2012-10-12 17:39:28 +00:00
neel
97c20149fa Fix warnings generated by 'debug.witness.watch' during VM creation and
destruction for calling malloc() with M_WAITOK while holding a mutex.

Do not allow vmm.ko to be unloaded until all virtual machines are destroyed.
2012-10-11 19:39:54 +00:00
neel
d09cf38e25 Deliver the MSI to the correct guest virtual cpu.
Prior to this change the MSI was being delivered unconditionally to vcpu 0
regardless of how the guest programmed the MSI delivery.
2012-10-11 19:28:07 +00:00
neel
364c9ec6f9 Grab the softc from the ACPI host-pci bridge device instead of from the pci
endpoint device.

Reviewed by:	jhb
2012-10-10 00:11:06 +00:00
neel
ca6e3cf930 Allocate memory pages for the guest from the host's free page queue.
It is no longer necessary to hard-partition the memory between the host
and guests at boot time.
2012-10-08 23:41:26 +00:00
grehan
89c25d5adf Clarify comment about default number of FICL dictionary cells.
Suggested by:	peterj
2012-10-04 03:59:45 +00:00
neel
09939583a7 The ioctl VM_GET_MEMORY_SEG is no longer able to return the host physical
address associated with the guest memory segment. This is because there is
no longer a 1:1 mapping between GPA and HPA.

As a result 'vmmctl' can only display the guest physical address and the
length of the lowmem and highmem segments.
2012-10-04 03:07:05 +00:00
neel
18dd2c0d51 Change vm_malloc() to map pages in the guest physical address space in 4KB
chunks. This breaks the assumption that the entire memory segment is
contiguously allocated in the host physical address space.

This also paves the way to satisfy the 4KB page allocations by requesting
free pages from the VM subsystem as opposed to hard-partitioning host memory
at boot time.
2012-10-04 02:27:14 +00:00
grehan
cdb0dba22b Allow the number of FICL dictionary cells to be overridden.
Loading a 7.3 ISO with userboot/amd64 takes up 10035 cells,
overflowing the long-standing default of 10000.

Bump userboot's value up to 15000 cells.
2012-10-03 04:22:39 +00:00
grehan
edf1984a8f Rework the GPT/MBR/raw policy so that it actually works, and navigates
around disk_open's current handling of falling back from GPT to MBR.

As in the previous commit, this should all be fixed in CURRENT.
2012-10-03 03:00:37 +00:00
grehan
8ad01bb5de Restore the ability to boot partitioned disks. The previous submit
broke that by forcing raw disks, due to the use of error returns
by userboot's initial disk opens.
2012-10-03 02:58:55 +00:00
neel
77ab4804ac Get rid of assumptions in the hypervisor that the host physical memory
associated with guest physical memory is contiguous.

Add check to vm_gpa2hpa() that the range indicated by [gpa,gpa+len) is all
contained within a single 4KB page.
2012-10-03 01:18:51 +00:00
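The single-page containment check described in 77ab4804ac can be illustrated with a small helper. This is a sketch under stated assumptions, not the actual vm_gpa2hpa() code: the function name, the PAGE_SIZE_4K macro, and the treatment of a zero-length range are all hypothetical.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define PAGE_SIZE_4K	4096ULL	/* hypothetical name for the 4KB page size */

/*
 * Return true when the range [gpa, gpa + len) lies entirely within one
 * 4KB page. Comparing 'len' against the bytes left in the page avoids
 * computing gpa + len, which could overflow.
 */
static bool
gpa_range_in_single_page(uint64_t gpa, size_t len)
{
	uint64_t offset = gpa & (PAGE_SIZE_4K - 1);

	if (len == 0)
		return (true);	/* assumption: an empty range is trivially contained */
	return (len <= PAGE_SIZE_4K - offset);
}
```

With non-contiguous host memory, a range that crosses a page boundary may map to two unrelated host pages, which is why such a check is needed before a single host address is handed back.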
neel
3e50e0220b Get rid of assumptions in the hypervisor that the host physical memory
associated with guest physical memory is contiguous.

Rewrite vm_gpa2hpa() to get the GPA to HPA mapping by querying the nested
page tables.
2012-10-03 00:46:30 +00:00
grehan
c17623c804 Fix the error return in disk_readslicetab() when an MBR/GPT partition
wasn't found, and use that in userdisk_open() to allow raw disks
and ISO images to be read.

This is a temporary fix - disk.c has changed a lot in CURRENT so this
code may be reworked or made redundant on the next IFC. It is useful
to be able to boot from CD in the meantime.
2012-10-02 04:41:43 +00:00
grehan
dd517bc793 Add cd9660 support to userboot to allow CD boot.
2012-10-02 04:36:37 +00:00
neel
bc87f08e98 Get rid of assumptions in the hypervisor that the host physical memory
associated with guest physical memory is contiguous.

In this case vm_malloc() was using vm_gpa2hpa() to indirectly infer whether
or not the address range had already been allocated.

Replace this instead with an explicit API 'vm_gpa_available()' that returns
TRUE if a page is available for allocation in guest physical address space.
2012-09-29 01:15:45 +00:00
neel
b65259b285 Intel VT-x provides the length of the instruction at the time of the nested
page table fault. Use this when fetching the instruction bytes from the guest
memory.

Also modify the lapic_mmio() API so that a decoded instruction is fed into it
instead of having it fetch the instruction bytes from the guest. This is
useful for hardware assists like SVM that provide the faulting instruction
as part of the vmexit.
2012-09-27 00:27:58 +00:00
neel
5dbc1ca26a Add an option "-a" to present the local apic in the XAPIC mode instead of the
default X2APIC mode to the guest.
2012-09-26 00:06:17 +00:00
neel
bc269b51af Add support for trapping MMIO writes to local apic registers and emulating them.
The default behavior is still to present the local apic to the guest in the
x2apic mode.
2012-09-25 22:31:35 +00:00
neel
ebdd69568d Add ioctls to control the X2APIC capability exposed by the virtual machine to
the guest.

At the moment this simply sets the state in the 'vcpu' instance but there is
no code that acts upon these settings.
2012-09-25 19:08:51 +00:00
neel
c34be7b811 Add an explicit exit code 'SPINUP_AP' to tell the controlling process that an
AP needs to be activated by spinning up an execution context for it.

The local apic emulation is now completely done in the hypervisor and it will
detect writes to the ICR_LO register that try to bring up the AP. In response
to such writes it will return to userspace with an exit code of SPINUP_AP.

Reviewed by: grehan
2012-09-25 02:33:25 +00:00
neel
34b672cc8a Stash the 'vm_exit' information in each 'struct vcpu'.
There is no functional change at this time but this paves the way for vm exit
handler functions to easily modify the exit reason going forward.
2012-09-24 19:32:24 +00:00
neel
c0caea8c2f Restructure the x2apic access code in preparation for supporting memory mapped
access to the local apic.

The vlapic code is now aware of the mode that the guest is using to access the
local apic.

Reviewed by: grehan@
2012-09-21 03:09:23 +00:00