Commit Graph

293 Commits

Author SHA1 Message Date
Neel Natu
5295c3e61d Support Intel-specific MSRs that are accessed when booting up a linux in bhyve:
- MSR_PLATFORM_INFO
- MSR_TURBO_RATIO_LIMITx
- MSR_RAPL_POWER_UNIT

Reviewed by:	grehan
MFC after:	1 week
2014-10-09 19:13:33 +00:00
Neel Natu
02c282e862 iasl(8) expects integer fields in data tables to be specified as hexadecimal
values. Therefore the bit width of the "PM Timer Block" was actually being
interpreted as 50-bits instead of the expected 32-bit.

This eliminates an error message emitted by a Linux 3.17 guest during boot:
"Invalid length for FADT/PmTimerBlock: 50, using default 32"

Reviewed by:	grehan
MFC after:	1 week
2014-10-09 19:02:32 +00:00
Neel Natu
8ccb28efcd Implement the FLUSH operation in the virtio-block emulation.
This gets rid of the following error message during FreeBSD guest bootup:
"vtbd0: hard error cmd=flush fsbn 0"

Reported by:	rodrigc
Reviewed by:	grehan
2014-10-07 17:08:53 +00:00
Neel Natu
107af8f2ed IFC @r272481 2014-10-05 01:28:21 +00:00
Peter Grehan
8b58e6af3c Add new fields in the FADT, required by IASL 20140926-64.
The new IASL from the recent acpi-ca import will error out
if it doesn't see these new fields, which were previously
reserved.

Reported by:	lme
Reviewed by:	neel
2014-10-03 17:27:30 +00:00
Neel Natu
970388bf8d IFC @r272185 2014-09-27 22:15:50 +00:00
Peter Grehan
5ed6ab5baa Correct display of bhyve SMBIOS UUIDs with dmidecode by bumping the version.
The mixed little/big-endianness of SMBIOS UUIDs was clarified in v2.6
of the SMBIOS spec. dmidecode uses the reported version of SMBIOS to
determine the layout and what to byte-swap.

bhyve's SMBIOS reported as 2.4 though it implemented the 2.6-style of
memory layout. This resulted in dmidecode reporting a different
UUID than one passed in via the -U option.

Fix by exporting a version of 2.6.

Reviewed by:	tychon
Reported by:	julian
MFC after:	1 day
2014-09-23 01:17:22 +00:00
Neel Natu
8f02c5e456 IFC r271888.
Restructure MSR emulation so it is all done in processor-specific code.
2014-09-20 21:46:31 +00:00
Neel Natu
b6cf6c8ca6 IFC @r271887 2014-09-20 06:27:37 +00:00
Neel Natu
c3498942a5 Restructure the MSR handling so it is entirely handled by processor-specific
code. There are only a handful of MSRs common between the two so there isn't
too much duplicate functionality.

The VT-x code has the following types of MSRs:

- MSRs that are unconditionally saved/restored on every guest/host context
  switch (e.g., MSR_GSBASE).

- MSRs that are restored to guest values on entry to vmx_run() and saved
  before returning. This is an optimization for MSRs that are not used in
  host kernel context (e.g., MSR_KGSBASE).

- MSRs that are emulated and every access by the guest causes a trap into
  the hypervisor (e.g., MSR_IA32_MISC_ENABLE).

Reviewed by:	grehan
2014-09-20 02:35:21 +00:00
Neel Natu
4e27d36d38 IFC @r271694 2014-09-17 18:46:51 +00:00
Glen Barber
7fca1ad503 Update the bhyve(8) manual to reflect that it is no
longer considered 'experimental.'

Reviewed by:	grehan
MFC after:	3 days
Sponsored by:	The FreeBSD Foundation
2014-09-17 16:45:20 +00:00
Neel Natu
bbadcde418 Set the 'vmexit->inst_length' field properly depending on the type of the
VM-exit and ultimately on whether nRIP is valid. This allows us to update
the %rip after the emulation is finished so any exceptions triggered during
the emulation will point to the right instruction.

Don't attempt to handle INS/OUTS VM-exits unless the DecodeAssist capability
is available. The effective segment field in EXITINFO1 is not valid without
this capability.

Add VM_EXITCODE_SVM to flag SVM VM-exits that cannot be handled. Provide the
VMCB fields exitinfo1 and exitinfo2 as collateral to help with debugging.

Provide a SVM VM-exit handler to dump the exitcode, exitinfo1 and exitinfo2
fields in bhyve(8).

Reviewed by:	Anish Gupta (akgupt3@gmail.com)
Reviewed by:	grehan
2014-09-14 04:39:04 +00:00
Neel Natu
1aba8e7ff8 Initialize 'bc_rdonly' to the right value.
Note that independent of this change a readonly disk file would still be
opened O_RDONLY and protected from writes by the guest.

Reviewed by:	grehan
2014-09-11 21:15:20 +00:00
Peter Grehan
82560f19d0 Allow vtnet operation without merged rx buffers.
NetBSD's virtio-net implementation doesn't negotiate
the merged rx-buffers feature. To support this, check
to see if the feature was negotiated, and then adjust
the operation of the receive path accordingly by using
a larger iovec, and a smaller rx header.
In addition, ignore writes to the (read-only) status byte.

Tested with NetBSD/amd64 5.2.2, 6.1.4 and 7-beta.

Reviewed by:	neel, tychon
Phabric:	D745
MFC after:	3 days
2014-09-09 22:35:02 +00:00
Peter Grehan
e18f344b9b Add a callback to be notified about negotiated features.
Submitted by:	luigi
Obtained from:	Vincenzo Maffione, Universita` di Pisa
MFC after:	3 days
2014-09-09 04:11:54 +00:00
Neel Natu
04da7226c4 Set the 'inst_length' to '0' early on before any error conditions are detected
in the emulation of the task switch. If any exceptions are triggered then the
guest %rip should point to instruction that caused the task switch as opposed
to the one after it.
2014-08-30 18:35:16 +00:00
Tycho Nightingale
b297e71ede Fix a recursive lock acquisition in vi_reset_dev().
Reviewed by:	grehan
2014-08-22 13:01:22 +00:00
Neel Natu
33424543f2 Minor cleanup:
- Set 'pirq_cold' to '0' on the first PIRQ allocation.
- Make assertions stronger.

Reviewed by:	jhb
CR:		https://phabric.freebsd.org/D592
2014-08-13 00:14:26 +00:00
Neel Natu
12a6eb99a1 Support PCI extended config space in bhyve.
Add the ACPI MCFG table to advertise the extended config memory window.

Introduce a new flag MEM_F_IMMUTABLE for memory ranges that cannot be deleted
or moved in the guest's address space. The PCI extended config space is an
example of an immutable memory range.

Add emulation for the "movzw" instruction. This instruction is used by FreeBSD
to read a 16-bit extended config space register.

CR:		https://phabric.freebsd.org/D505
Reviewed by:	jhb, grehan
Requested by:	tychon
2014-08-08 03:49:01 +00:00
Tycho Nightingale
42404fae46 Commands which encounter a fatal error shouldn't be marked as completed.
Furthermore, provide an indication of the current command so it can be
determined which one actually failed.

Reviewed by:	grehan
2014-07-30 18:47:31 +00:00
Neel Natu
afd5e8ba88 Simplify the meaning of return values from the inout handlers. After this
change 0 means success and non-zero means failure.

This also helps to eliminate VMEXIT_POWEROFF and VMEXIT_RESET as return values
from VM-exit handlers.

CR:		D480
Reviewed by:	grehan, jhb
2014-07-25 20:18:35 +00:00
Neel Natu
e84d8ebfcc Reduce the proliferation of VMEXIT_RESTART in task_switch.c.
This is in preparation for further simplification of the return values from
VM exit handlers in bhyve(8).
2014-07-24 05:31:57 +00:00
Neel Natu
d37f2adb38 Fix fault injection in bhyve.
The faulting instruction needs to be restarted when the exception handler
is done handling the fault. bhyve now does this correctly by setting
'vmexit[vcpu].inst_length' to zero so the %rip is not advanced.

A minor complication is that the fault injection APIs are used by instruction
emulation code that is shared by vmm.ko and bhyve. Thus the argument that
refers to 'struct vm *' in kernel or 'struct vmctx *' in userspace needs to
be loosely typed as a 'void *'.
2014-07-24 01:38:11 +00:00
Neel Natu
d665d229ce Emulate instructions emitted by OpenBSD/i386 version 5.5:
- CMP REG, r/m
- MOV AX/EAX/RAX, moffset
- MOV moffset, AX/EAX/RAX
- PUSH r/m
2014-07-23 04:28:51 +00:00
Neel Natu
091d453222 Handle nested exceptions in bhyve.
A nested exception condition arises when a second exception is triggered while
delivering the first exception. Most nested exceptions can be handled serially
but some are converted into a double fault. If an exception is generated during
delivery of a double fault then the virtual machine shuts down as a result of
a triple fault.

vm_exit_intinfo() is used to record that a VM-exit happened while an event was
being delivered through the IDT. If an exception is triggered while handling
the VM-exit it will be treated like a nested exception.

vm_entry_intinfo() is used by processor-specific code to get the event to be
injected into the guest on the next VM-entry. This function is responsible for
deciding the disposition of nested exceptions.
2014-07-19 20:59:08 +00:00
Neel Natu
3d5444c864 Add emulation for legacy x86 task switching mechanism.
FreeBSD/i386 uses task switching to handle double fault exceptions and this
change enables that to work.

Reported by:	glebius
2014-07-16 21:26:26 +00:00
Peter Grehan
ad15140ee7 Use the blockif CHS routine to create fake CHS values,
and then populate them in the identity page.

This fixes a divide-by-zero error at probe time with NetBSD.

MFC after:	1 week.
2014-07-15 00:27:08 +00:00
Peter Grehan
c4813fadf1 Add a call to synthesize a C/H/S value for block emulations
that require it (ahci). The algorithm used is from the VHD
specification.
2014-07-15 00:25:54 +00:00
Peter Grehan
18e32ebc89 Extend capabilities to 64-bits in preparation for some API changes.
The v1.0 virtio spec supports an extended size for guest/host
caps, but in practice 64-bits should last for a long time.
2014-07-05 02:38:53 +00:00
Peter Grehan
f23a8ac1b9 Use correct flag for event index.
Submitted by:	luigi
Obtained from:	Vincenzo Maffione, Universita` di Pisa
MFC after:	1 week
2014-07-03 00:23:14 +00:00
Neel Natu
64fe72354c Add post-mortem debugging for "EPT Misconfiguration" VM-exit. This error
is hard to reproduce so try to collect all the breadcrumbs when it happens.

Reviewed by:	grehan
2014-06-27 18:00:38 +00:00
John Baldwin
cde1f5b8a0 Sort command flags in usage output and the manpages. 2014-06-27 15:20:34 +00:00
Peter Grehan
62f17e92fe Set the version and date to fixed fields rather than using
preprocessor macros that don't allow reproducible builds.
As a side-effect, the date string is now spec-compliant.

root@bhyve:~ # dmidecode
# dmidecode 2.12
SMBIOS 2.4 present.
12 structures occupying 514 bytes.
Table at 0x000F101F.

Handle 0x0001, DMI type 0, 24 bytes
BIOS Information
        Vendor: BHYVE
        Version: 1.0
        Release Date: 03/14/2014

Submitted by:	des (original version)
Reviewed by:	tychon
MFC after:	1 week
2014-06-27 05:27:37 +00:00
John Baldwin
5749449d9b - Document -b to enable the bvmcons console (but mark it as deprecated
similar to -g.)
- Document -U to set the SMBIOS UUID.
- Add missing options to the usage output and to the manpage Synopsis.
- Don't claim that bvmdebug is amd64-only (it is also a device, not an
  option).
2014-06-26 20:12:38 +00:00
Neel Natu
be679db4cd Provide APIs to directly get 'lowmem' and 'highmem' size directly.
Previously the sizes were inferred indirectly based on the size of the mappings
at 0 and 4GB respectively. This works fine as long as size of the allocation is
identical to the size of the mapping in the guest's address space. However, if
the mapping is disjoint then this assumption falls apart (e.g., due to the
legacy BIOS hole between 640KB and 1MB).
2014-06-24 02:02:51 +00:00
Baptiste Daroussin
01c2b8ac0d use .Mt to mark up email addresses consistently (part2)
PR:		191174
Submitted by:	Franco Fichtner  <franco@lastsummer.de>
2014-06-20 09:57:27 +00:00
Neel Natu
79aad80d3c Fix typo and rename macro KDB_SYS_FLAG to KBD_SYS_FLAG.
Reviewed by:	tychon
2014-06-18 17:20:02 +00:00
Tycho Nightingale
67b6ffaad6 r267169 should apply to 64-bit BARs as well.
Reviewed by:	neel
2014-06-09 19:55:50 +00:00
Joel Dahl
087129d22c Remove blank lines. 2014-06-09 19:29:10 +00:00
Tycho Nightingale
b6ae8b050b Some devices (e.g. Intel AHCI and NICs) support quad-word access to
register pairs where two 32-bit registers make up a larger logical
size.  Support those access by splitting the quad-word into two
double-words.

Reviewed by:	grehan
2014-06-06 16:18:37 +00:00
Neel Natu
26cdcdbebb Use MIN(a,b) from <sys/param.h> instead of rolling our own version.
Pointed out by:	grehan
2014-06-01 02:47:09 +00:00
Neel Natu
0be3798af5 Limit the maximum number of back-to-back iterations of a "rep; ins/outs"
to 16. This is arbitrary and is used to ensure that a vcpu goes back into
the vm_run() loop to process interrupts or rendezvous events in a timely
fashion.

Found with:	Coverity Scan
CID:		1216436
2014-06-01 02:13:07 +00:00
Neel Natu
95ebc360ef Activate vcpus from bhyve(8) using the ioctl VM_ACTIVATE_CPU instead of doing
it implicitly in vmm.ko.

Add ioctl VM_GET_CPUS to get the current set of 'active' and 'suspended' cpus
and display them via /usr/sbin/bhyvectl using the "--get-active-cpus" and
"--get-suspended-cpus" options.

This is in preparation for being able to reset virtual machine state without
having to destroy and recreate it.
2014-05-31 23:37:34 +00:00
Neel Natu
65ffa035a7 Add segment protection and limits violation checks in vie_calculate_gla()
for 32-bit x86 guests.

Tested using ins/outs executed in a FreeBSD/i386 guest.
2014-05-27 04:26:22 +00:00
Neel Natu
6303b65d35 Fix issue with restarting an "insb/insw/insl" instruction because of a page
fault on the destination buffer.

Prior to this change a page fault would be detected in vm_copyout(). This
was done after the I/O port access was done. If the I/O port access had
side-effects (e.g. reading the uart FIFO) then restarting the instruction
would result in incorrect behavior.

Fix this by validating the guest linear address before doing the I/O port
emulation. If the validation results in a page fault exception being injected
into the guest then the instruction can now be restarted without any
side-effects.
2014-05-26 18:21:08 +00:00
Neel Natu
5382c19d81 Do the linear address calculation for the ins/outs emulation using a new
API function 'vie_calculate_gla()'.

While the current implementation is simplistic it forms the basis of doing
segmentation checks if the guest is in 32-bit protected mode.
2014-05-25 00:57:24 +00:00
Neel Natu
da11f4aa1d Add libvmmapi functions vm_copyin() and vm_copyout() to copy into and out
of the guest linear address space. These APIs in turn use a new ioctl
'VM_GLA2GPA' to convert the guest linear address to guest physical.

Use the new copyin/copyout APIs when emulating ins/outs instruction in
bhyve(8).
2014-05-24 23:12:30 +00:00
Neel Natu
e813a87350 Consolidate all the information needed by the guest page table walker into
'struct vm_guest_paging'.

Check for canonical addressing in vmm_gla2gpa() and inject a protection
fault into the guest if a violation is detected.

If the page table walk is restarted in vmm_gla2gpa() then reset 'ptpphys' to
point to the root of the page tables.
2014-05-24 20:26:57 +00:00
Neel Natu
a7424861fb Check for alignment check violation when processing in/out string instructions. 2014-05-23 19:59:14 +00:00
Neel Natu
d17b5104a9 Add emulation of the "outsb" instruction. NetBSD guests use this to write to
the UART FIFO.

The emulation is constrained in a number of ways: 64-bit only, doesn't check
for all exception conditions, limited to i/o ports emulated in userspace.

Some of these constraints will be relaxed in followup commits.

Requested by:	grehan
Reviewed by:	tychon (partially and a much earlier version)
2014-05-23 05:15:17 +00:00
John Baldwin
b3e9732a76 Implement a PCI interrupt router to route PCI legacy INTx interrupts to
the legacy 8259A PICs.
- Implement an ICH-comptabile PCI interrupt router on the lpc device with
  8 steerable pins configured via config space access to byte-wide
  registers at 0x60-63 and 0x68-6b.
- For each configured PCI INTx interrupt, route it to both an I/O APIC
  pin and a PCI interrupt router pin.  When a PCI INTx interrupt is
  asserted, ensure that both pins are asserted.
- Provide an initial routing of PCI interrupt router (PIRQ) pins to
  8259A pins (ISA IRQs) and initialize the interrupt line config register
  for the corresponding PCI function with the ISA IRQ as this matches
  existing hardware.
- Add a global _PIC method for OSPM to select the desired interrupt routing
  configuration.
- Update the _PRT methods for PCI bridges to provide both APIC and legacy
  PRT tables and return the appropriate table based on the configured
  routing configuration.  Note that if the lpc device is not configured, no
  routing information is provided.
- When the lpc device is enabled, provide ACPI PCI link devices corresponding
  to each PIRQ pin.
- Add a VMM ioctl to adjust the trigger mode (edge vs level) for 8259A
  pins via the ELCR.
- Mark the power management SCI as level triggered.
- Don't hardcode the number of elements in Packages in the source for
  the DSDT.  iasl(8) will fill in the actual number of elements, and
  this makes it simpler to generate a Package with a variable number of
  elements.

Reviewed by:	tycho
2014-05-15 14:16:55 +00:00
Neel Natu
0dd10c0047 Don't include the guest memory segments in the bhyve(8) process core dump.
This has not added a lot of value when debugging bhyve issues while greatly
increasing the time and space required to store the core file.

Passing the "-C" option to bhyve(8) will change the default and dump guest
memory in the core dump.

Requested by:	grehan
Reviewed by:	grehan
2014-05-13 16:40:27 +00:00
Neel Natu
ee2dbd023c abort(3) the process in response to a VMEXIT_ABORT. This usually happens in
response to an unhandled VM exit or an unexpected error so a core is useful.

Remove unused macro VMEXIT_SWITCH.

Reviewed by:	grehan
2014-05-12 23:35:10 +00:00
Neel Natu
2bd073e13b Disable the 'uart_drain()' callback when the emulated receive FIFO is full.
Failing to do this will cause the kevent(2) notification to trigger
continuously and the bhyve(8) mevent thread will hog the cpu until the
characters on the backend tty device are drained.

Also, make the uart backend file descriptor non-blocking to avoid a
select(2) before every byte read from that backend.

Reviewed by:	grehan
2014-05-05 23:54:13 +00:00
Neel Natu
9b6155a20c Modify the "-p" option to be more flexible when associating a 'vcpu' with
a 'hostcpu'. The new format of the argument string is "vcpu:hostcpu".

This allows pinning a subset of the vcpus if desired.

It also allows pinning a vcpu to more than a single 'hostcpu'.

Submitted by:	novel (initial version)
2014-05-05 18:06:35 +00:00
Neel Natu
067824256f Remove misleading "addcpu" in an error message emitted by fbsdrun_deletecpu().
Pointed out by:	novel
2014-05-05 16:35:37 +00:00
Neel Natu
09fd42cb88 Re-adding an event to a kqueue modifies the parameters of the original event.
However, if the original knote had been disabled then it is not automatically
re-enabled.

Fix this by using EV_ADD to create an mevent and EV_ENABLE to enable it.

Adding a kevent for the first time implicitly enables it so existing callers
of mevent_add() don't need to change.

Reviewed by:	grehan
2014-05-05 16:30:03 +00:00
Neel Natu
b100acf254 Don't allow MPtable generation if there are multiple PCI hierarchies. This is
because there isn't a standard way to relay this information to the guest OS.

Add a command line option "-Y" to bhyve(8) to inhibit MPtable generation.

If the virtual machine is using PCI devices on buses other than 0 then it can
still use ACPI tables to convey this information to the guest.

Discussed with:	grehan@
2014-05-02 04:51:31 +00:00
Neel Natu
e50ce2aa06 Add logic in the HLT exit handler to detect if the guest has put all vcpus
to sleep permanently by executing a HLT with interrupts disabled.

When this condition is detected the guest with be suspended with a reason of
VM_SUSPEND_HALT and the bhyve(8) process will exit.

Tested by executing "halt" inside a RHEL7-beta guest.

Discussed with:	grehan@
Reviewed by:	jhb@, tychon@
2014-05-02 00:33:56 +00:00
Neel Natu
2cb97c9dd6 Ignore writes to microcode update MSR. This MSR is accessed by RHEL7 guest.
Add KTR tracepoints to annotate wrmsr and rdmsr VM exits.
2014-04-30 02:08:27 +00:00
Neel Natu
c6a0cc2e21 Some Linux guests will implement a 'halt' by disabling the APIC and executing
the 'HLT' instruction. This condition was detected by 'vm_handle_hlt()' and
converted into the SPINDOWN_CPU exitcode . The bhyve(8) process would exit
the vcpu thread in response to a SPINDOWN_CPU and when the last vcpu was
spun down it would reset the virtual machine via vm_suspend(VM_SUSPEND_RESET).

This functionality was broken in r263780 in a way that made it impossible
to kill the bhyve(8) process because it would loop forever in
vm_handle_suspend().

Unbreak this by removing the code to spindown vcpus. Thus a 'halt' from
a Linux guest will appear to be hung but this is consistent with the
behavior on bare metal. The guest can be rebooted by using the bhyvectl
options '--force-reset' or '--force-poweroff'.

Reviewed by:	grehan@
2014-04-29 18:42:56 +00:00
Neel Natu
f0fdcfe247 Allow a virtual machine to be forcibly reset or powered off. This is done
by adding an argument to the VM_SUSPEND ioctl that specifies how the virtual
machine should be suspended, viz. VM_SUSPEND_RESET or VM_SUSPEND_POWEROFF.

The disposition of VM_SUSPEND is also made available to the exit handler
via the 'u.suspended' member of 'struct vm_exit'.

This capability is exposed via the '--force-reset' and '--force-poweroff'
arguments to /usr/sbin/bhyvectl.

Discussed with:	grehan@
2014-04-28 22:06:40 +00:00
Peter Grehan
67e1705297 Implement legacy interrupts for the AHCI device emulation
according to the method outlined in the AHCI spec.

Tested with FreeBSD 9/10/11 with MSI disabled,
and also NetBSD/amd64 (lightly).

Reviewed by:	neel, tychon
MFC after:	3 weeks
2014-04-28 18:41:25 +00:00
Peter Grehan
fcbec69157 Respect and track the enable bit in the PCI configuration address word.
Ignore writes, and return 0xff's, on config accesses when not set.
Behaviour now matches that seen on h/w.

Found with a NetBSD/amd64 guest.

Reviewed by:	tychon
MFC after:	3 weeks
2014-04-25 17:35:34 +00:00
Tycho Nightingale
d42ea5731e Provide a very basic stub for the 8042 PS/2 keyboard controller.
Reviewed by:	jhb
Approved by:	neel (co-mentor)
2014-04-25 13:38:18 +00:00
Xin LI
994f858a8b Use calloc() in favor of malloc + memset.
Reviewed by:	neel
2014-04-22 18:55:21 +00:00
Tycho Nightingale
82c2c89084 Factor out common ioport handler code for better hygiene -- pointed
out by neel@.

Approved by:	neel (co-mentor)
2014-04-22 16:13:56 +00:00
Tycho Nightingale
1d6be92ac6 Fix ACPI DSDT indentation cosmetic breakage introduced in r264631 --
pointed out by jhb@.

Approved by:	grehan (co-mentor)
2014-04-18 16:01:19 +00:00
Tycho Nightingale
d6aa08c3ef Respect the destination operand size of the 'Input from Port' instruction.
Approved by:	grehan (co-mentor)
2014-04-18 15:22:56 +00:00
Tycho Nightingale
79d6ca331e Add support for reading the PIT Counter 2 output signal via the NMI
Status and Control register at port 0x61.

Be more conservative about "catching up" callouts that were supposed
to fire in the past by skipping an interrupt if it was
scheduled too far in the past.

Restore the PIT ACPI DSDT entries and add an entry for NMISC too.

Approved by:	neel (co-mentor)
2014-04-18 00:02:06 +00:00
Tycho Nightingale
b96be57a2d Add support for emulating the slave PIC.
Reviewed by:	grehan, jhb
Approved by:	grehan (co-mentor)
2014-04-14 19:00:20 +00:00
Tycho Nightingale
8b4a7f857b Constrain the amount of data returned to what is actually available
not the size of the buffer.

Approved by:	grehan (co-mentor)
2014-04-09 14:50:55 +00:00
John Baldwin
2cf4f7ef79 Handle single-byte reads from the bvmcons port (0x220) by returning
0xff.  Some guests may attempt to read from this port to identify
psuedo-PNP ISA devices.  (The ie(4) driver in FreeBSD/i386 is one
example.)

Reviewed by:	grehan
2014-04-08 21:02:03 +00:00
Peter Grehan
9d0c4e17d9 Add support for the virtio RNG entropy-source device.
Call through to /dev/random synchronously to fill
virtio buffers with RNG data.

Tested with FreeBSD-CURRENT and Ubuntu guests.

Submitted by:	Leon Dang
Discussed with:	markm
MFC after:	3 weeks
Sponsored by:	Nahanni Systems
2014-04-02 20:18:17 +00:00
Neel Natu
b15a09c05e Add an ioctl to suspend a virtual machine (VM_SUSPEND). The ioctl can be called
from any context i.e., it is not required to be called from a vcpu thread. The
ioctl simply sets a state variable 'vm->suspend' to '1' and returns.

The vcpus inspect 'vm->suspend' in the run loop and if it is set to '1' the
vcpu breaks out of the loop with a reason of 'VM_EXITCODE_SUSPENDED'. The
suspend handler waits until all 'vm->active_cpus' have transitioned to
'vm->suspended_cpus' before returning to userspace.

Discussed with:	grehan
2014-03-26 23:34:27 +00:00
Tycho Nightingale
e883c9bb40 Move the atpit device model from userspace into vmm.ko for better
precision and lower latency.

Approved by:	grehan (co-mentor)
2014-03-25 19:20:34 +00:00
Neel Natu
0826d045cc Use 'cpuset_t' to represent the vcpus active in a virtual machine. 2014-03-20 18:15:37 +00:00
Tycho Nightingale
4feac03f2c Don't reissue in-flight commands.
Approved by:	neel (co-mentor)
2014-03-18 23:25:35 +00:00
Tycho Nightingale
7292923b49 Though there currently isn't a way to insert new media into an ATAPI
drive, at least pretend to support Asynchronous Notification (AN) to
avoid a guest needlessly polling for it.

Approved by:	grehan (co-mentor)
2014-03-16 12:33:40 +00:00
Tycho Nightingale
113d84c11d Support the bootloader's single 16-bit 'outw' access to the Divisor
Latch MSB and LSB registers.

Approved by:	neel (co-mentor)
2014-03-16 12:31:28 +00:00
Tycho Nightingale
762fd20804 Replace the userspace atpic stub with a more functional vmm.ko model.
New ioctls VM_ISA_ASSERT_IRQ, VM_ISA_DEASSERT_IRQ and VM_ISA_PULSE_IRQ
can be used to manipulate the pic, and optionally the ioapic, pin state.

Reviewed by:	jhb, neel
Approved by:	neel (co-mentor)
2014-03-11 16:56:00 +00:00
Peter Grehan
f4959d3537 Open the uart emulation's backing tty in non-blocking mode.
This fixes the issue of bhyve appearing to halt when using
nmdm ports for the console, until a connection is made to
the other end.

bhyveload already does this.

Reported by:	Many.
MFC after:	3 weeks.
2014-03-07 06:23:37 +00:00
Tycho Nightingale
af5bfc53b8 Add SMBIOS support.
A new option, -U, can be used to set the UUID in the System
Information (Type 1) structure.  Manpage fix to follow.

Approved by:	grehan (co-mentor)
2014-03-04 17:12:06 +00:00
Neel Natu
9777ca203c Document the "-a" and "-x" options to match the changes in r262236.
Reviewed by:	grehan
2014-02-26 19:14:54 +00:00
Neel Natu
dc50650607 Queue pending exceptions in the 'struct vcpu' instead of directly updating the
processor-specific VMCS or VMCB. The pending exception will be delivered right
before entering the guest.

The order of event injection into the guest is:
- hardware exception
- NMI
- maskable interrupt

In the Intel VT-x case, a pending NMI or interrupt will enable the interrupt
window-exiting and inject it as soon as possible after the hardware exception
is injected. Also since interrupts are inherently asynchronous, injecting
them after the hardware exception should not affect correctness from the
guest perspective.

Rename the unused ioctl VM_INJECT_EVENT to VM_INJECT_EXCEPTION and restrict
it to only deliver x86 hardware exceptions. This new ioctl is now used to
inject a protection fault when the guest accesses an unimplemented MSR.

Discussed with:	grehan, jhb
Reviewed by:	jhb
2014-02-26 00:52:05 +00:00
Peter Grehan
4258c52e29 Fix virtio spec URL.
Submitted by:	lwhsu
MFC after:	1 week
2014-02-21 22:45:35 +00:00
Tycho Nightingale
182d7debb9 Avoid clobbering the counter mode when issuing a latch command.
Approved by:	grehan (co-mentor)
2014-02-21 01:15:26 +00:00
Neel Natu
52e5c8a2ec Simplify APIC mode switching from MMIO to x2APIC. In part this is done to
simplify the implementation of the x2APIC virtualization assist in VT-x.

Prior to this change the vlapic allowed the guest to change its mode from
xAPIC to x2APIC. We don't allow that any more and the vlapic mode is locked
when the virtual machine is created. This is not very constraining because
operating systems already have to deal with BIOS setting up the APIC in
x2APIC mode at boot.

Fix a bug in the CPUID emulation where the x2APIC capability was leaking
from the host to the guest.

Ignore MMIO reads and writes to the vlapic in x2APIC mode. Similarly, ignore
MSR accesses to the vlapic when it is in xAPIC mode.

The default configuration of the vlapic is xAPIC. The "-x" option to bhyve(8)
can be used to change the mode to x2APIC instead.

Discussed with:	grehan@
2014-02-20 01:48:25 +00:00
Neel Natu
7a902ec0ec Add a check to validate that memory BARs of passthru devices are 4KB aligned.
Also, the MSI-x table offset is not required to be 4KB aligned so take this
into account when computing the pages occupied by the MSI-x tables.
2014-02-18 19:00:15 +00:00
John Baldwin
a96b8b801a Tweak the handling of PCI capabilities in emulated devices to remove
the non-standard zero capability list terminator.   Instead, track
the start and end of the most recently added capability and use that
to adjust the previous capability's next pointer when a capability is
added and to determine the range of config registers belonging to
PCI capability registers.

Reviewed by:	neel
2014-02-18 03:00:20 +00:00
Neel Natu
06db1b4a59 Update bhyve(8) man page to describe the usage of the "-s" option to assign
bus numbers to emulated devices. Also add the restriction that the LPC bridge
emulation can only be configured on bus 0.

Reviewed by:	grehan@
2014-02-14 21:46:04 +00:00
Neel Natu
d84882ca8f Allow PCI devices to be configured on all valid bus numbers from 0 to 255.
This is done by representing each bus as root PCI device in ACPI. The device
implements the _BBN method to return the PCI bus number to the guest OS.

Each PCI bus keeps track of the resources that is decodes for devices
configured on the bus: i/o, mmio (32-bit) and mmio (64-bit). These windows
are advertised to the guest via the _CRS object of the root device.

Bus 0 is treated specially since it consumes the I/O ports to access the
PCI config space [0xcf8-0xcff]. It also decodes the legacy I/O ports that
are consumed by devices on the LPC bus. For this reason the LPC bridge can
be configured only on bus 0.

The bus number can be specified using the following command line option
to bhyve(8): "-s <bus>:<slot>:<func>,<emul>[,<config>]"

Discussed with:	grehan@
Reviewed by:	jhb@
2014-02-14 21:34:08 +00:00
Tycho Nightingale
2a261121af Provide an indication a "PIO Setup Device to Host FIS" occurred while executing
the IDENTIFY DEVICE and IDENTIFY PACKET DEVICE commands.

Also, provide an indication a "D2H Register FIS" occurred during a SET FEATURES
command.

Approved by:	grehan (co-mentor)
2014-02-12 00:32:14 +00:00
John Baldwin
1f82944f35 Mark the I/O ports used by the bhyve console and debug devices as system
resources.

MFC after:	1 week
2014-02-07 20:53:41 +00:00
John Baldwin
3cbf3585cb Enhance the support for PCI legacy INTx interrupts and enable them in
the virtio backends.
- Add a new ioctl to export the count of pins on the I/O APIC from vmm
  to the hypervisor.
- Use pins on the I/O APIC >= 16 for PCI interrupts leaving 0-15 for
  ISA interrupts.
- Populate the MP Table with I/O interrupt entries for any PCI INTx
  interrupts.
- Create a _PRT table under the PCI root bridge in ACPI to route any
  PCI INTx interrupts appropriately.
- Track which INTx interrupts are in use per-slot so that functions
  that share a slot attempt to distribute their INTx interrupts across
  the four available pins.
- Implicitly mask INTx interrupts if either MSI or MSI-X is enabled
  and when the INTx DIS bit is set in a function's PCI command register.
  Either assert or deassert the associated I/O APIC pin when the
  state of one of those conditions changes.
- Add INTx support to the virtio backends.
- Always advertise the MSI capability in the virtio backends.

Submitted by:	neel (7)
Reviewed by:	neel
MFC after:	2 weeks
2014-01-29 14:56:48 +00:00
John Baldwin
d2bc4816c5 Remove support for legacy PCI devices. These haven't been needed since
support for LPC uart devices was added and it conflicts with upcoming
patches to add PCI INTx support.

Reviewed by:	neel
2014-01-27 22:26:15 +00:00
Tycho Nightingale
4e5f86e009 Fix issue with stale fields from a recycled request pulled off the freelist.
Approved by:	grehan (co-mentor)
2014-01-22 01:57:52 +00:00
Tycho Nightingale
40eb53f232 Increase the block-layer backend maximum number of requests to match
the AHCI command queue depth.  This allows a slew of commands issued
by a Linux guest to be absorbed without error.

Approved by:	grehan (co-mentor)
2014-01-22 01:56:49 +00:00
Peter Grehan
d68f0bd618 Fix issue with the virtio descriptor region being truncated
if it was above 4GB. This was seen with CentOS 6.5 guests with
large RAM, since the block drivers are loaded late in the
boot sequence and end up allocating descriptor memory from
high addresses.

Reported by:	Michael Dexter
MFC after:	3 days
2014-01-09 07:17:21 +00:00