178 Commits

Author SHA1 Message Date
neel
f333e1d3ba MFC r272481.
Add new fields in the FADT, required by IASL 20140926-64.
2015-04-06 03:16:20 +00:00
mav
9d1b41e02f MFC r280154:
Report that we may have write cache, and that we do support FLUSH.
2015-03-27 08:59:21 +00:00
mav
1240a4ece1 MFC r280133: Increase S/G list size of 32 to 33 entries.
32 entries are not enough for the worst case of a misaligned 128KB request,
which made FreeBSD chunk large requests into odd pieces.
2015-03-27 08:58:30 +00:00
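The arithmetic behind the 32 to 33 change in r280133 can be sketched as follows. This is only an illustration, not the bhyve code: it assumes 4KB pages and one S/G entry per page touched, and `pages_spanned` and `PAGE_SIZE_BYTES` are hypothetical names.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define PAGE_SIZE_BYTES 4096u /* assumption: 4KB pages */

/*
 * Number of pages touched by the byte range [addr, addr + len).
 * Under the one-entry-per-page assumption this is also the number
 * of S/G entries the request needs.
 */
static size_t
pages_spanned(uint64_t addr, size_t len)
{
    assert(len > 0);
    uint64_t first = addr / PAGE_SIZE_BYTES;
    uint64_t last = (addr + len - 1) / PAGE_SIZE_BYTES;
    return (size_t)(last - first + 1);
}
```

With these assumptions, a page-aligned 128KB request touches 32 pages, while the same request starting 512 bytes into a page touches 33.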
mav
ea196f5fc6 MFC r280126: Pre-allocate one extra request per processing thread.
Processing threads call callbacks before freeing requests.  As a result,
new requests may arrive before old ones are freed.
2015-03-27 08:57:38 +00:00
mav
4b55b30394 MFC r280044:
According to Linux and QEMU, an s/n that fills the whole buffer is not
zero-terminated.

This makes the same s/n be reported for both virtio and AHCI drivers.
2015-03-27 08:56:44 +00:00
mav
1265da624b MFC r280042: Close potential race on blockif_close().
Reported by:	vangyzen
2015-03-27 08:55:54 +00:00
mav
77dd65195c MFC r280040:
Give AHCI disk serial based on backing file path same as for virtio block.

It is still not good that they may intersect on different hosts, but that
is better than intersecting on the same host.
2015-03-27 08:54:55 +00:00
mav
b0012948cc MFC r280037:
Rewrite virtio block device driver to work asynchronously and use the block
I/O interface.

Asynchronous operation, based on the r280026 change, avoids blocking the virtual
CPU during I/O processing, which on slow/busy storage can take seconds.
Use of the recently improved block I/O interface allows processing multiple
requests at the same time, which improves random I/O performance on wide storage.

Benchmarks of virtual disk, backed by ZVOL on RAID10 pool of 4 HDDs, show
~3.5 times random read performance improvements, while no degradation on
linear I/O.  Guest CPU usage during test dropped from 100% to almost zero.
2015-03-27 08:53:59 +00:00
mav
aaa4bfa294 MFC r280026, r280041:
Modify virtqueue helpers added in r253440 to allow queuing.

The original virtqueue design allows queued and out-of-order processing, but
the helpers added in r253440 assume only direct, blocking, in-order processing.
That may be fine for network devices, etc., but it is a huge limitation for
storage devices.
2015-03-27 08:52:57 +00:00
mav
3cab0e3831 MFC r280004: Give block I/O interface multiple (8) execution threads.
On parallel random I/O this allows better utilization of wide storage pools.
To not confuse the prefetcher on linear I/O, consecutive requests are executed
sequentially, following the same logic as was earlier implemented in CTL.

Benchmarks of virtual AHCI disk, backed by ZVOL on RAID10 pool of 4 HDDs,
show ~3.5 times random read performance improvements, while no degradation
on linear I/O.
2015-03-27 08:51:20 +00:00
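One way to picture the "consecutive requests are executed sequentially" rule from r280004 is the check below. It is a minimal sketch under stated assumptions, not the block I/O interface itself: `struct io_req` and `must_wait_for` are hypothetical, and "consecutive" is assumed to mean that a new request starts exactly where an in-flight one ends.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical request descriptor: byte offset and length on the backing store. */
struct io_req {
    uint64_t offset;
    uint64_t length;
};

/*
 * Return true if 'next' begins exactly where some in-flight request ends.
 * Such a request would be held back until its predecessor completes, so
 * that a linear stream reaches the storage in order.
 */
static bool
must_wait_for(const struct io_req *inflight, size_t n, const struct io_req *next)
{
    assert(next != NULL);
    for (size_t i = 0; i < n; i++) {
        if (inflight[i].offset + inflight[i].length == next->offset)
            return true;
    }
    return false;
}
```

Requests that do not continue an in-flight one can be handed to any idle thread, which is where the parallel random I/O gain would come from.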
mav
3505e1d0e9 MFC r279987: Add checksums to identify data and NCQ command error log. 2015-03-27 08:50:26 +00:00
mav
22ec3a9f70 MFC r279979: Slightly polish virtual AHCI CD reporting. 2015-03-27 08:49:33 +00:00
mav
c9d7a5b073 MFC r279977: Fix NOP and IDLE commands for virtual AHCI disks. 2015-03-27 08:48:44 +00:00
mav
b30732f21d MFC r279976: Add support for NCQ variant of DSM TRIM for virtual AHCI disks.
The code is not really tested yet due to lack of initiator support.
2015-03-27 08:47:54 +00:00
mav
adad7a7d84 MFC r279975: Improve NCQ errors reporting for virtual AHCI disks.
While this implementation is still not perfect, the previous one was just broken.
2015-03-27 08:47:02 +00:00
mav
e0d0bd28e8 MFC r279968: Remove incorrect SERR register setting.
At this point we have nothing to report through that register.
2015-03-27 08:46:12 +00:00
mav
0bcdb239fd MFC r279967: Change prdbc value reporting. 2015-03-27 08:44:58 +00:00
mav
f96653efd8 MFC r279965: Polish AHCI disk identify data and fix speed negotiation. 2015-03-27 08:43:45 +00:00
mav
f4f616ce8b MFC r279960:
Add support for PIO variants of READ/WRITE commands for AHCI disks.

AHCI API hides all PIO specifics, so this functionality is almost free.
2015-03-27 08:42:55 +00:00
mav
989c57f51c MFC r279975: Use ahci_write_fis_d2h() for commands completion. 2015-03-27 08:41:49 +00:00
mav
3ce68975f2 MFC r279957, r280017: Add DSM TRIM command support for virtual AHCI disks.
It works only for virtual disks backed by ZVOLs and raw devices supporting
BIO_DELETE.  Virtual disks backed by files won't report this capability.

Relnotes:	yes
2015-03-23 14:36:53 +00:00
mav
07b059e689 MFC r280293: Add missing variable initialization.
Reported by:	Coverity
CID:		1288938
2015-03-23 11:48:25 +00:00
mav
6a65b273d8 MFC r279658, r279673, r279675:
Implement cache flush for ahci-hd and for virtio-blk over device.
2015-03-19 09:56:38 +00:00
mav
207f5e4d98 MFC r279654: Report logical/physical sector sizes for virtual SATA disk. 2015-03-19 09:54:48 +00:00
mav
a4ce980fef MFC r279651, r279652, r279657:
Add support for TOPOLOGY feature of virtio block device.

Passing through physical block size/offset from underlying storage allows
guest to manage proper data and I/O alignment to improve performance.
2015-03-19 09:53:00 +00:00
neel
10c6be06b4 MFC r273683
Move the ACPI PM timer emulation into vmm.ko.

MFC r273706
Change the type of the first argument to the I/O emulation handlers to
'struct vm *'.

MFC r273710
Add a comment explaining the intent behind the I/O reservation [0x72-0x77].

MFC r273744
Add foo_genassym.c files to DPSRCS so dependencies for them are generated.
This ensures these objects are rebuilt to generate an updated header of
assembly constants if needed.

MFC r274045
If the start bit, PxCMD.ST, is cleared and nothing is in-flight then
PxCI, PxSACT, PxCMD.CCS and PxCMD.CR should be 0.

MFC r274076
Improve the ability to cancel an in-flight request by using an interrupt,
via SIGCONT, to force the read or write system call to return prematurely.

MFC r274330
To allow a request to be submitted from within the callback routine of
a completing one increase the total by 1 but don't advertise it.

MFC r274931
Change the lower bound for guest vmspace allocation to 0 instead of using
the VM_MIN_ADDRESS constant.

MFC r275817
For level triggered interrupts clear the PIC IRR bit when the interrupt pin
is deasserted.

MFC r275850
Fix 8259 IRQ priority resolver.

MFC r275952
Various 8259 device model improvements.

MFC r275965
Emulate writes to the IA32_MISC_ENABLE MSR.
2014-12-30 22:22:46 +00:00
neel
9a7db864f7 MFC r273375
Add support for AMD processors with the SVM/AMD-V hardware extensions.

MFC r273749
Remove bhyve SVM feature printf's now that they are available in the general
CPU feature detection code.

MFC r273766
Add missing 'break' pointed out by Coverity CID 1249760.

MFC r276098
Allow ktr(4) tracing of all guest exceptions via the tunable "hw.vmm.trace_guest_exceptions"

MFC r276392
Inject #UD into the guest when it executes either 'MONITOR' or 'MWAIT' on an
AMD/SVM host.

MFC r276402
Remove "svn:mergeinfo" property that was dragged along when these files were
svn copied in r273375.
2014-12-30 08:24:14 +00:00
neel
88c1adb417 MFC r270326
Fix a recursive lock acquisition in vi_reset_dev().

MFC r270434
Return the spurious interrupt vector (IRQ7 or IRQ15) if the atpic cannot find
any unmasked pin with an interrupt asserted.

MFC r270436
Fix a bug in the emulation of CPUID leaf 0x4.

MFC r270437
Add "hw.vmm.topology.threads_per_core" and "hw.vmm.topology.cores_per_package"
tunables to modify the default cpu topology advertised by bhyve.

MFC r270855
Set the 'inst_length' to '0' early on before any error conditions are detected
in the emulation of the task switch. If any exceptions are triggered then the
guest %rip should point to instruction that caused the task switch as opposed
to the one after it.

MFC r270857
The "SUB" instruction used in getcc() actually does 'x -= y' so use the
proper constraint for 'x'. The "+r" constraint indicates that 'x' is an
input and output register operand.

While here generate code for different variants of getcc() using a macro
GETCC(sz) where 'sz' indicates the operand size.

Update the status bits in %rflags when emulating AND and OR opcodes.

MFC r271439
Initialize 'bc_rdonly' to the right value.

MFC r271451
Optimize the common case of injecting an interrupt into a vcpu after a HLT
by explicitly moving it out of the interrupt shadow.

MFC r271888
Restructure the MSR handling so it is entirely handled by processor-specific
code.

MFC r271890
MSR_KGSBASE is no longer saved and restored from the guest MSR save area. This
behavior was changed in r271888 so update the comment block to reflect this.

MFC r271891
Add some more KTR events to help debugging.

MFC r272197
mmap(2) requires either MAP_PRIVATE or MAP_SHARED for non-anonymous mappings.

MFC r272395
Get rid of code that dealt with the hardware not being able to save/restore
the PAT MSR on guest exit/entry. This workaround was done for a beta release
of VMware Fusion 5 but is no longer needed in later versions.

All Intel CPUs since Nehalem have supported saving and restoring MSR_PAT
in the VM exit and entry controls.

MFC r272670
Inject #UD into the guest when it executes either 'MONITOR' or 'MWAIT'.

MFC r272710
Implement the FLUSH operation in the virtio-block emulation.

MFC r272838
iasl(8) expects integer fields in data tables to be specified as hexadecimal
values. Therefore the bit width of the "PM Timer Block" was actually being
interpreted as 50-bits instead of the expected 32-bit.

This eliminates an error message emitted by a Linux 3.17 guest during boot:
"Invalid length for FADT/PmTimerBlock: 50, using default 32"

MFC r272839
Support Intel-specific MSRs that are accessed when booting up Linux in bhyve:
 - MSR_PLATFORM_INFO
 - MSR_TURBO_RATIO_LIMITx
 - MSR_RAPL_POWER_UNIT

MFC r273108
Emulate "POP r/m". This is needed to boot OpenBSD/i386 MP kernel in bhyve.

MFC r273212
Support stopping and restarting the AHCI command list via toggling PxCMD.ST
from '1' to '0' and back.  This allows the driver a chance to recover if
for instance a timeout occurred due to activity on the host.
2014-12-28 21:27:13 +00:00
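The r272838 entry above comes down to a base mismatch: the digits "32" read as hexadecimal are 0x32, which is 50. The snippet below only illustrates that arithmetic with the standard `strtoul`; it is not iasl code, and `parse_field_as_hex` is a hypothetical helper.

```c
#include <assert.h>
#include <stdlib.h>

/* Parse an integer field the way a hex-only reader would (base 16). */
static unsigned long
parse_field_as_hex(const char *text)
{
    assert(text != NULL);
    return strtoul(text, NULL, 16);
}
```

Under that reading, a field written as "32" yields 50, and a 32-bit width has to be written as "20" (or "0x20").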
grehan
2a5b8e77b1 MFC r272007
Correct display of bhyve SMBIOS UUIDs with dmidecode by bumping the version.

The mixed little/big-endianness of SMBIOS UUIDs was clarified in v2.6
of the SMBIOS spec. dmidecode uses the reported version of SMBIOS to
determine the layout and what to byte-swap.

bhyve's SMBIOS reported as 2.4 though it implemented the 2.6-style of
memory layout. This resulted in dmidecode reporting a different
UUID than one passed in via the -U option.

Fix by exporting a version of 2.6.

Approved by:	re (gjb)
2014-09-25 23:09:35 +00:00
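The mixed-endian layout mentioned in r272007 can be sketched as below. This assumes the SMBIOS 2.6 convention in which the first three UUID fields (time_low, time_mid, time_hi_and_version) are stored little-endian and the remaining eight bytes are kept as-is; `uuid_to_smbios26` is a hypothetical helper, not the function bhyve uses.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/*
 * Convert a UUID in network (big-endian) byte order into the assumed
 * SMBIOS 2.6 in-memory layout. 'in' and 'out' must not overlap.
 */
static void
uuid_to_smbios26(const uint8_t in[16], uint8_t out[16])
{
    assert(in != NULL && out != NULL);
    /* time_low: 4 bytes, byte-swapped */
    out[0] = in[3]; out[1] = in[2]; out[2] = in[1]; out[3] = in[0];
    /* time_mid: 2 bytes, byte-swapped */
    out[4] = in[5]; out[5] = in[4];
    /* time_hi_and_version: 2 bytes, byte-swapped */
    out[6] = in[7]; out[7] = in[6];
    /* clock_seq and node: copied unchanged */
    memcpy(out + 8, in + 8, 8);
}
```

A reader that assumes the pre-2.6 interpretation would skip the swaps, which is consistent with dmidecode showing a different UUID when the table advertised version 2.4.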
gjb
d6ca07d629 MFC r271711:
Update the bhyve(8) manual to reflect that it is no
  longer considered 'experimental.'

Approved by:	re (delphij)
Sponsored by:	The FreeBSD Foundation
2014-09-22 14:54:12 +00:00
grehan
71412b55f1 MFC virtio-net changes.
Re-tested with NetBSD/amd64 5.2.2, 6.1.4 and 7-beta.

r271299:
Add a callback to be notified about negotiated features.

r271338:
Allow vtnet operation without merged rx buffers.

NetBSD's virtio-net implementation doesn't negotiate
the merged rx-buffers feature. To support this, check
to see if the feature was negotiated, and then adjust
the operation of the receive path accordingly by using
a larger iovec, and a smaller rx header.
In addition, ignore writes to the (read-only) status byte.

Approved by:	re (glebius)
Obtained from:	Vincenzo Maffione, Universita` di Pisa (r271299)
2014-09-16 19:08:54 +00:00
grehan
5d455a50f5 MFC r267921, r267934, r267949, r267959, r267966, r268202, r268276,
    r268427, r268428, r268521, r268638, r268639, r268701, r268777,
    r268889, r268922, r269008, r269042, r269043, r269080, r269094,
    r269108, r269109, r269281, r269317, r269700, r269896, r269962,
    r269989.

Catch bhyve up to CURRENT.

Lightly tested with FreeBSD i386/amd64, Linux i386/amd64, and
OpenBSD/amd64. Still resolving an issue with OpenBSD/i386.

Many thanks to jhb@ for all the hard work on the prior MFCs!

r267921 - support the "mov r/m8, imm8" instruction
r267934 - document options
r267949 - set DMI vers/date to fixed values
r267959 - doc: sort cmd flags
r267966 - EPT misconf post-mortem info
r268202 - use correct flag for event index
r268276 - 64-bit virtio capability api
r268427 - invalidate guest TLB when cr3 is updated, needed for TSS
r268428 - identify vcpu's operating mode
r268521 - use correct offset in guest logical-to-linear translation
r268638 - chs value
r268639 - chs fake values
r268701 - instr emul operand/address size override prefix support
r268777 - emulation for legacy x86 task switching
r268889 - nested exception support
r268922 - fix INVARIANTS build
r269008 - emulate instructions found in the OpenBSD/i386 5.5 kernel
r269042 - fix fault injection
r269043 - Reduce VMEXIT_RESTARTs in task_switch.c
r269080 - fix issues in PUSH emulation
r269094 - simplify return values from the inout handlers
r269108 - don't return -1 from the push emulation handler
r269109 - avoid permanent sleep in vm_handle_hlt()
r269281 - list VT-x features in base kernel dmesg
r269317 - Mark AHCI fatal errors as not completed
r269700 - Support PCI extended config space in bhyve
r269896 - Minor cleanup
r269962 - use max guest memory when creating IOMMU domain
r269989 - fix interrupt mode names
2014-08-19 01:20:24 +00:00
grehan
46d28d66fb MFC r267311, r267330, r267811, r267884
Turn on interrupt window exiting unconditionally when an ExtINT is being
injected into the guest.

Add helper functions to populate VM exit information for rendezvous and
astpending exits.

Provide APIs to directly get 'lowmem' and 'highmem' size.

Expose the amount of resident and wired memory from the guest's vmspace
2014-08-17 01:23:52 +00:00
grehan
e83027edbb MFC r266933
Activate vcpus from bhyve(8) using the ioctl VM_ACTIVATE_CPU instead of doing
it implicitly in vmm.ko.
2014-08-17 00:52:07 +00:00
jhb
ce450da430 MFC 266424,266476,266524,266573,266595,266626,266627,266633,266641,266642,
266708,266724,266934,266935,268521:
Emulation of the "ins" and "outs" instructions.

Various fixes for translating guest linear addresses to guest physical
addresses.
2014-07-22 04:39:16 +00:00
jhb
c1fe945ebd MFC 266125:
Implement a PCI interrupt router to route PCI legacy INTx interrupts to
the legacy 8259A PICs.
2014-07-22 03:14:37 +00:00
jhb
e6b48465b7 MFC 264353,264509,264768,264770,264825,264846,264988,265114,265165,265365,
265941,265951,266390,266550,266910:
Various bhyve fixes:
- Don't save host's return address in 'struct vmxctx'.
- Permit non-32-bit accesses to local APIC registers.
- Factor out common ioport handler code.
- Use calloc() in favor of malloc + memset.
- Change the vlapic timer frequency to be in the ballpark of contemporary
  hardware.
- Allow the guest to read the TSC via MSR 0x10.
- A VMCS is always inactive when it exits the vmx_run() loop.  Remove
  redundant code and the misleading comment that suggests otherwise.
- Ignore writes to microcode update MSR.  This MSR is accessed by RHEL7
  guest.
  Add KTR tracepoints to annotate wrmsr and rdmsr VM exits.
- Provide an alias for the userboot console and name it 'comconsole'.
- Use EV_ADD to create an mevent and EV_ENABLE to enable it.
- abort(3) the process in response to a VMEXIT_ABORT.
- Don't include the guest memory segments in the bhyve(8) process core dump.
- Make the vmx asm code dtrace-fbt-friendly.
- Allow vmx_getdesc() and vmx_setdesc() to be called for a vcpu that is in
  the VCPU_RUNNING state.
- Enable VMX in the IA32_FEATURE_CONTROL MSR if it is not enabled and the MSR
  isn't locked.
2014-07-21 19:08:02 +00:00
jhb
888f6511e3 MFC 263780,264516,265062,265101,265203,265364:
Add an ioctl to suspend a virtual machine (VM_SUSPEND).

Add logic in the HLT exit handler to detect if the guest has put all vcpus
to sleep permanently by executing a HLT with interrupts disabled.

When this condition is detected the guest will be suspended with a reason of
VM_SUSPEND_HALT and the bhyve(8) process will exit.

This logic can be disabled via the tunable 'hw.vmm.halt_detection'.
2014-07-21 02:39:17 +00:00
jhb
d034cf40e5 MFC 264916,267611:
Provide a very basic stub for the 8042 PS/2 keyboard controller.
2014-07-21 02:17:28 +00:00
jhb
cf1a222326 MFC 260847,264055,264867:
- Add a very simple virtio_random(4) driver for FreeBSD guests to harvest
  entropy from hypervisors.
- Add support to bhyve for the virtio RNG entropy-source device to provide
  entropy to bhyve guests.
2014-07-21 00:21:56 +00:00
jhb
6095428430 MFC 263432,265366,265376:
Fixes for vcpu management in bhyve:
- Use 'cpuset_t' to represent the vcpus active in a virtual machine.
- Modify the "-p" option to be more flexible when associating a 'vcpu' with
  a 'hostcpu'.
2014-07-19 22:24:29 +00:00
jhb
da07382880 MFC 262884,263236,265407:
Various uart fixes:
- Open the uart emulation's backing tty in non-blocking mode.
- Support 16-bit register access.
- Disable the 'uart_drain()' callback when the emulated receive FIFO
  is full.
2014-07-19 22:13:12 +00:00
jhb
3da9c304ee MFC 259942,262274,263035,263054,263211,263744,264179,264324,264468,264631,
264648,264650,264651,266572,267558:
Flesh out the AT PIC and 8254 PIT emulations and move them into the kernel.
2014-07-19 22:06:46 +00:00
jhb
ba55949ac3 MFC 261904,261905,262143,262184,264921,265211,267169,267292,267294:
Various PCI fixes:
- Allow PCI devices to be configured on all valid bus numbers from 0 to 255.
- Tweak the handling of PCI capabilities in emulated devices to remove
  the non-standard zero capability list terminator.
- Add a check to validate that memory BARs of passthru devices are 4KB
  aligned.
- Respect and track the enable bit in the PCI configuration address word.
- Handle quad-word access to 32-bit register pairs.
2014-07-19 20:13:01 +00:00
jhb
b9c113aadd MFC 264277:
Handle single-byte reads from the bvmcons port (0x220) by returning
0xff.  Some guests may attempt to read from this port to identify
pseudo-PNP ISA devices.  (The ie(4) driver in FreeBSD/i386 is one
example.)
2014-06-26 19:19:06 +00:00
jhb
30000c41d7 MFC 262744:
Add SMBIOS support.

A new option, -U, can be used to set the UUID in the System
Information (Type 1) structure.
2014-06-13 21:30:40 +00:00
jhb
f6a797dc57 MFC 262139,262140,262236,262281,262532:
Various x2APIC fixes and enhancements:
- Use spinlocks for the vioapic.
- Handle the SELF_IPI MSR.
- Simplify the APIC mode switching between MMIO and x2APIC.  The guest is
  no longer allowed to switch modes at runtime.  Instead, the desired mode
  is set when the virtual machine is created.
- Disallow MMIO access in x2APIC mode and MSR access in xAPIC mode.
- Add support for x2APIC virtualization assist in Intel VT-x.
2014-06-13 19:10:40 +00:00
jhb
3e1f2ae835 MFC 261638,262144,262506,266765:
Add virtualized XSAVE support to bhyve which permits guests to use XSAVE and
XSAVE-enabled features like AVX.
- Store a per-cpu guest xcr0 register and handle xsetbv VM exits by emulating
  the instruction.
- Only expose XSAVE to guests if XSAVE is enabled in the host.  Only expose
  a subset of XSAVE features currently supported by the guest and for which
  the proper emulation of xsetbv is known.  Currently this includes X87, SSE,
  AVX, AVX-512, and Intel MPX.
- Add support for injecting hardware exceptions into the guest and use this
  to trigger exceptions in the guest for invalid xsetbv operations instead
  of potentially faulting in the host.
- Queue pending exceptions in the 'struct vcpu' instead of directly updating
  the processor-specific VMCS or VMCB. The pending exception will be delivered
  right before entering the guest.
- Rename the unused ioctl VM_INJECT_EVENT to VM_INJECT_EXCEPTION and restrict
  it to only deliver x86 hardware exceptions. This new ioctl is now used to
  inject a protection fault when the guest accesses an unimplemented MSR.
- Expose a subset of known-safe features from leaf 0 of the structured
  extended features to guests if they are supported on the host including
  RDFSBASE/RDGSBASE, BMI1/2, AVX2, AVX-512, HLE, ERMS, and RTM.  Aside
  from AVX-512, these features are all new instructions available for use
  in ring 3 with no additional hypervisor changes needed.
2014-06-12 19:58:12 +00:00
jhb
cebc7c305f MFC 262311: Fix virtio spec URL. 2014-06-12 15:24:33 +00:00
jhb
835cb387cc MFC 260239,261268,265058:
Expand the support for PCI INTx interrupts including providing interrupt
routing information for INTx interrupts to I/O APIC pins and enabling
INTx interrupts in the virtio and AHCI backends.
2014-06-12 13:13:15 +00:00