226 Commits

Author SHA1 Message Date
tychon
f560130808 Fix a recursive lock acquisition in vi_reset_dev().
Reviewed by:	grehan
2014-08-22 13:01:22 +00:00
neel
253faed0cb Minor cleanup:
- Set 'pirq_cold' to '0' on the first PIRQ allocation.
- Make assertions stronger.

Reviewed by:	jhb
CR:		https://phabric.freebsd.org/D592
2014-08-13 00:14:26 +00:00
neel
6e79e6c83b Support PCI extended config space in bhyve.
Add the ACPI MCFG table to advertise the extended config memory window.

Introduce a new flag MEM_F_IMMUTABLE for memory ranges that cannot be deleted
or moved in the guest's address space. The PCI extended config space is an
example of an immutable memory range.

Add emulation for the "movzw" instruction. This instruction is used by FreeBSD
to read a 16-bit extended config space register.

CR:		https://phabric.freebsd.org/D505
Reviewed by:	jhb, grehan
Requested by:	tychon
2014-08-08 03:49:01 +00:00
tychon
5b3314e950 Commands which encounter a fatal error shouldn't be marked as completed.
Furthermore, provide an indication of the current command so it can be
determined which one actually failed.

Reviewed by:	grehan
2014-07-30 18:47:31 +00:00
neel
c1611aec98 Simplify the meaning of return values from the inout handlers. After this
change 0 means success and non-zero means failure.

This also helps to eliminate VMEXIT_POWEROFF and VMEXIT_RESET as return values
from VM-exit handlers.

CR:		D480
Reviewed by:	grehan, jhb
2014-07-25 20:18:35 +00:00
neel
570f3bede5 Reduce the proliferation of VMEXIT_RESTART in task_switch.c.
This is in preparation for further simplification of the return values from
VM exit handlers in bhyve(8).
2014-07-24 05:31:57 +00:00
neel
4535fa67c4 Fix fault injection in bhyve.
The faulting instruction needs to be restarted when the exception handler
is done handling the fault. bhyve now does this correctly by setting
'vmexit[vcpu].inst_length' to zero so the %rip is not advanced.

A minor complication is that the fault injection APIs are used by instruction
emulation code that is shared by vmm.ko and bhyve. Thus the argument that
refers to 'struct vm *' in kernel or 'struct vmctx *' in userspace needs to
be loosely typed as a 'void *'.
2014-07-24 01:38:11 +00:00
neel
e972917c13 Emulate instructions emitted by OpenBSD/i386 version 5.5:
- CMP REG, r/m
- MOV AX/EAX/RAX, moffset
- MOV moffset, AX/EAX/RAX
- PUSH r/m
2014-07-23 04:28:51 +00:00
neel
1f15eea2e0 Handle nested exceptions in bhyve.
A nested exception condition arises when a second exception is triggered while
delivering the first exception. Most nested exceptions can be handled serially
but some are converted into a double fault. If an exception is generated during
delivery of a double fault then the virtual machine shuts down as a result of
a triple fault.

vm_exit_intinfo() is used to record that a VM-exit happened while an event was
being delivered through the IDT. If an exception is triggered while handling
the VM-exit it will be treated like a nested exception.

vm_entry_intinfo() is used by processor-specific code to get the event to be
injected into the guest on the next VM-entry. This function is responsible for
deciding the disposition of nested exceptions.
2014-07-19 20:59:08 +00:00
neel
5046d9cb8a Add emulation for legacy x86 task switching mechanism.
FreeBSD/i386 uses task switching to handle double fault exceptions and this
change enables that to work.

Reported by:	glebius
2014-07-16 21:26:26 +00:00
grehan
bffc595f8b Use the blockif CHS routine to create fake CHS values,
and then populate them in the identity page.

This fixes a divide-by-zero error at probe time with NetBSD.

MFC after:	1 week.
2014-07-15 00:27:08 +00:00
grehan
4632b82c93 Add a call to synthesize a C/H/S value for block emulations
that require it (ahci). The algorithm used is from the VHD
specification.
2014-07-15 00:25:54 +00:00
grehan
08d74d1482 Extend capabilities to 64-bits in preparation for some API changes.
The v1.0 virtio spec supports an extended size for guest/host
caps, but in practice 64-bits should last for a long time.
2014-07-05 02:38:53 +00:00
grehan
60ac0781e8 Use correct flag for event index.
Submitted by:	luigi
Obtained from:	Vincenzo Maffione, Universita` di Pisa
MFC after:	1 week
2014-07-03 00:23:14 +00:00
neel
410cd82729 Add post-mortem debugging for "EPT Misconfiguration" VM-exit. This error
is hard to reproduce so try to collect all the breadcrumbs when it happens.

Reviewed by:	grehan
2014-06-27 18:00:38 +00:00
jhb
63fdb1686c Sort command flags in usage output and the manpages. 2014-06-27 15:20:34 +00:00
grehan
80a89f9553 Set the version and date to fixed fields rather than using
preprocessor macros that don't allow reproducible builds.
As a side-effect, the date string is now spec-compliant.

root@bhyve:~ # dmidecode
# dmidecode 2.12
SMBIOS 2.4 present.
12 structures occupying 514 bytes.
Table at 0x000F101F.

Handle 0x0001, DMI type 0, 24 bytes
BIOS Information
        Vendor: BHYVE
        Version: 1.0
        Release Date: 03/14/2014

Submitted by:	des (original version)
Reviewed by:	tychon
MFC after:	1 week
2014-06-27 05:27:37 +00:00
jhb
4cd3ac2dee - Document -b to enable the bvmcons console (but mark it as deprecated
similar to -g.)
- Document -U to set the SMBIOS UUID.
- Add missing options to the usage output and to the manpage Synopsis.
- Don't claim that bvmdebug is amd64-only (it is also a device, not an
  option).
2014-06-26 20:12:38 +00:00
neel
921bfc7679 Provide APIs to directly get 'lowmem' and 'highmem' size directly.
Previously the sizes were inferred indirectly based on the size of the mappings
at 0 and 4GB respectively. This works fine as long as size of the allocation is
identical to the size of the mapping in the guest's address space. However, if
the mapping is disjoint then this assumption falls apart (e.g., due to the
legacy BIOS hole between 640KB and 1MB).
2014-06-24 02:02:51 +00:00
bapt
c0cd28f928 use .Mt to mark up email addresses consistently (part2)
PR:		191174
Submitted by:	Franco Fichtner  <franco@lastsummer.de>
2014-06-20 09:57:27 +00:00
neel
f48544ce4a Fix typo and rename macro KDB_SYS_FLAG to KBD_SYS_FLAG.
Reviewed by:	tychon
2014-06-18 17:20:02 +00:00
tychon
ed75890094 r267169 should apply to 64-bit BARs as well.
Reviewed by:	neel
2014-06-09 19:55:50 +00:00
joel
b6e42ddcb7 Remove blank lines. 2014-06-09 19:29:10 +00:00
tychon
ad10fea609 Some devices (e.g. Intel AHCI and NICs) support quad-word access to
register pairs where two 32-bit registers make up a larger logical
size.  Support those access by splitting the quad-word into two
double-words.

Reviewed by:	grehan
2014-06-06 16:18:37 +00:00
neel
a09b56e60a Use MIN(a,b) from <sys/param.h> instead of rolling our own version.
Pointed out by:	grehan
2014-06-01 02:47:09 +00:00
neel
a6503bd2d6 Limit the maximum number of back-to-back iterations of a "rep; ins/outs"
to 16. This is arbitrary and is used to ensure that a vcpu goes back into
the vm_run() loop to process interrupts or rendezvous events in a timely
fashion.

Found with:	Coverity Scan
CID:		1216436
2014-06-01 02:13:07 +00:00
neel
9c2a942387 Activate vcpus from bhyve(8) using the ioctl VM_ACTIVATE_CPU instead of doing
it implicitly in vmm.ko.

Add ioctl VM_GET_CPUS to get the current set of 'active' and 'suspended' cpus
and display them via /usr/sbin/bhyvectl using the "--get-active-cpus" and
"--get-suspended-cpus" options.

This is in preparation for being able to reset virtual machine state without
having to destroy and recreate it.
2014-05-31 23:37:34 +00:00
neel
4b40e47cf8 Add segment protection and limits violation checks in vie_calculate_gla()
for 32-bit x86 guests.

Tested using ins/outs executed in a FreeBSD/i386 guest.
2014-05-27 04:26:22 +00:00
neel
31a1b31ef2 Fix issue with restarting an "insb/insw/insl" instruction because of a page
fault on the destination buffer.

Prior to this change a page fault would be detected in vm_copyout(). This
was done after the I/O port access was done. If the I/O port access had
side-effects (e.g. reading the uart FIFO) then restarting the instruction
would result in incorrect behavior.

Fix this by validating the guest linear address before doing the I/O port
emulation. If the validation results in a page fault exception being injected
into the guest then the instruction can now be restarted without any
side-effects.
2014-05-26 18:21:08 +00:00
neel
ffc6a38259 Do the linear address calculation for the ins/outs emulation using a new
API function 'vie_calculate_gla()'.

While the current implementation is simplistic it forms the basis of doing
segmentation checks if the guest is in 32-bit protected mode.
2014-05-25 00:57:24 +00:00
neel
51a05acc08 Add libvmmapi functions vm_copyin() and vm_copyout() to copy into and out
of the guest linear address space. These APIs in turn use a new ioctl
'VM_GLA2GPA' to convert the guest linear address to guest physical.

Use the new copyin/copyout APIs when emulating ins/outs instruction in
bhyve(8).
2014-05-24 23:12:30 +00:00
neel
6a6e13c407 Consolidate all the information needed by the guest page table walker into
'struct vm_guest_paging'.

Check for canonical addressing in vmm_gla2gpa() and inject a protection
fault into the guest if a violation is detected.

If the page table walk is restarted in vmm_gla2gpa() then reset 'ptpphys' to
point to the root of the page tables.
2014-05-24 20:26:57 +00:00
neel
2ccda87aca Check for alignment check violation when processing in/out string instructions. 2014-05-23 19:59:14 +00:00
neel
8f99933d82 Add emulation of the "outsb" instruction. NetBSD guests use this to write to
the UART FIFO.

The emulation is constrained in a number of ways: 64-bit only, doesn't check
for all exception conditions, limited to i/o ports emulated in userspace.

Some of these constraints will be relaxed in followup commits.

Requested by:	grehan
Reviewed by:	tychon (partially and a much earlier version)
2014-05-23 05:15:17 +00:00
jhb
f558af85b7 Implement a PCI interrupt router to route PCI legacy INTx interrupts to
the legacy 8259A PICs.
- Implement an ICH-comptabile PCI interrupt router on the lpc device with
  8 steerable pins configured via config space access to byte-wide
  registers at 0x60-63 and 0x68-6b.
- For each configured PCI INTx interrupt, route it to both an I/O APIC
  pin and a PCI interrupt router pin.  When a PCI INTx interrupt is
  asserted, ensure that both pins are asserted.
- Provide an initial routing of PCI interrupt router (PIRQ) pins to
  8259A pins (ISA IRQs) and initialize the interrupt line config register
  for the corresponding PCI function with the ISA IRQ as this matches
  existing hardware.
- Add a global _PIC method for OSPM to select the desired interrupt routing
  configuration.
- Update the _PRT methods for PCI bridges to provide both APIC and legacy
  PRT tables and return the appropriate table based on the configured
  routing configuration.  Note that if the lpc device is not configured, no
  routing information is provided.
- When the lpc device is enabled, provide ACPI PCI link devices corresponding
  to each PIRQ pin.
- Add a VMM ioctl to adjust the trigger mode (edge vs level) for 8259A
  pins via the ELCR.
- Mark the power management SCI as level triggered.
- Don't hardcode the number of elements in Packages in the source for
  the DSDT.  iasl(8) will fill in the actual number of elements, and
  this makes it simpler to generate a Package with a variable number of
  elements.

Reviewed by:	tycho
2014-05-15 14:16:55 +00:00
neel
f14c076ec7 Don't include the guest memory segments in the bhyve(8) process core dump.
This has not added a lot of value when debugging bhyve issues while greatly
increasing the time and space required to store the core file.

Passing the "-C" option to bhyve(8) will change the default and dump guest
memory in the core dump.

Requested by:	grehan
Reviewed by:	grehan
2014-05-13 16:40:27 +00:00
neel
c0a09660e6 abort(3) the process in response to a VMEXIT_ABORT. This usually happens in
response to an unhandled VM exit or an unexpected error so a core is useful.

Remove unused macro VMEXIT_SWITCH.

Reviewed by:	grehan
2014-05-12 23:35:10 +00:00
neel
aa70881c34 Disable the 'uart_drain()' callback when the emulated receive FIFO is full.
Failing to do this will cause the kevent(2) notification to trigger
continuously and the bhyve(8) mevent thread will hog the cpu until the
characters on the backend tty device are drained.

Also, make the uart backend file descriptor non-blocking to avoid a
select(2) before every byte read from that backend.

Reviewed by:	grehan
2014-05-05 23:54:13 +00:00
neel
193c3dd0ca Modify the "-p" option to be more flexible when associating a 'vcpu' with
a 'hostcpu'. The new format of the argument string is "vcpu:hostcpu".

This allows pinning a subset of the vcpus if desired.

It also allows pinning a vcpu to more than a single 'hostcpu'.

Submitted by:	novel (initial version)
2014-05-05 18:06:35 +00:00
neel
92337e8cda Remove misleading "addcpu" in an error message emitted by fbsdrun_deletecpu().
Pointed out by:	novel
2014-05-05 16:35:37 +00:00
neel
08f11e3e21 Re-adding an event to a kqueue modifies the parameters of the original event.
However, if the original knote had been disabled then it is not automatically
re-enabled.

Fix this by using EV_ADD to create an mevent and EV_ENABLE to enable it.

Adding a kevent for the first time implicitly enables it so existing callers
of mevent_add() don't need to change.

Reviewed by:	grehan
2014-05-05 16:30:03 +00:00
neel
d89ccd0492 Don't allow MPtable generation if there are multiple PCI hierarchies. This is
because there isn't a standard way to relay this information to the guest OS.

Add a command line option "-Y" to bhyve(8) to inhibit MPtable generation.

If the virtual machine is using PCI devices on buses other than 0 then it can
still use ACPI tables to convey this information to the guest.

Discussed with:	grehan@
2014-05-02 04:51:31 +00:00
neel
b735ae5b9a Add logic in the HLT exit handler to detect if the guest has put all vcpus
to sleep permanently by executing a HLT with interrupts disabled.

When this condition is detected the guest with be suspended with a reason of
VM_SUSPEND_HALT and the bhyve(8) process will exit.

Tested by executing "halt" inside a RHEL7-beta guest.

Discussed with:	grehan@
Reviewed by:	jhb@, tychon@
2014-05-02 00:33:56 +00:00
neel
0601994645 Ignore writes to microcode update MSR. This MSR is accessed by RHEL7 guest.
Add KTR tracepoints to annotate wrmsr and rdmsr VM exits.
2014-04-30 02:08:27 +00:00
neel
9c85092013 Some Linux guests will implement a 'halt' by disabling the APIC and executing
the 'HLT' instruction. This condition was detected by 'vm_handle_hlt()' and
converted into the SPINDOWN_CPU exitcode . The bhyve(8) process would exit
the vcpu thread in response to a SPINDOWN_CPU and when the last vcpu was
spun down it would reset the virtual machine via vm_suspend(VM_SUSPEND_RESET).

This functionality was broken in r263780 in a way that made it impossible
to kill the bhyve(8) process because it would loop forever in
vm_handle_suspend().

Unbreak this by removing the code to spindown vcpus. Thus a 'halt' from
a Linux guest will appear to be hung but this is consistent with the
behavior on bare metal. The guest can be rebooted by using the bhyvectl
options '--force-reset' or '--force-poweroff'.

Reviewed by:	grehan@
2014-04-29 18:42:56 +00:00
neel
b616a9a2e4 Allow a virtual machine to be forcibly reset or powered off. This is done
by adding an argument to the VM_SUSPEND ioctl that specifies how the virtual
machine should be suspended, viz. VM_SUSPEND_RESET or VM_SUSPEND_POWEROFF.

The disposition of VM_SUSPEND is also made available to the exit handler
via the 'u.suspended' member of 'struct vm_exit'.

This capability is exposed via the '--force-reset' and '--force-poweroff'
arguments to /usr/sbin/bhyvectl.

Discussed with:	grehan@
2014-04-28 22:06:40 +00:00
grehan
165205a041 Implement legacy interrupts for the AHCI device emulation
according to the method outlined in the AHCI spec.

Tested with FreeBSD 9/10/11 with MSI disabled,
and also NetBSD/amd64 (lightly).

Reviewed by:	neel, tychon
MFC after:	3 weeks
2014-04-28 18:41:25 +00:00
grehan
2405704197 Respect and track the enable bit in the PCI configuration address word.
Ignore writes, and return 0xff's, on config accesses when not set.
Behaviour now matches that seen on h/w.

Found with a NetBSD/amd64 guest.

Reviewed by:	tychon
MFC after:	3 weeks
2014-04-25 17:35:34 +00:00
tychon
6c68fcc495 Provide a very basic stub for the 8042 PS/2 keyboard controller.
Reviewed by:	jhb
Approved by:	neel (co-mentor)
2014-04-25 13:38:18 +00:00
delphij
4436d2d38c Use calloc() in favor of malloc + memset.
Reviewed by:	neel
2014-04-22 18:55:21 +00:00