Commit Graph

6632 Commits

Author SHA1 Message Date
Mark Johnston
3e0ba3a163 Only invoke fasttrap hooks for traps from user mode, and ensure that they're
called with interrupts enabled. Calling fasttrap_pid_probe() with interrupts
disabled can lead to deadlock if fasttrap writes to the process' address
space.

Reviewed by:	rpaulo
MFC after:	3 weeks
2014-03-19 01:27:56 +00:00
Warner Losh
d4f95c889d In kernel config files, it is supposed to be 'options<space><tab>' not
'options<tab><tab>', per long standing (but recently not so strictly
enforced) convention.
2014-03-18 14:41:18 +00:00
Neel Natu
22d822c6b0 When a vcpu is deactivated it must also unblock any rendezvous that may be
blocked on it.

This is done by issuing a wakeup after clearing the 'vcpuid' from 'active_cpus'.
Also, use CPU_CLR_ATOMIC() to guarantee visibility of the updated 'active_cpus'
across all host cpus.
2014-03-18 02:49:28 +00:00
Neel Natu
970955e479 Notify vcpus participating in the rendezvous of the pending event to ensure
that they execute the rendezvous function as soon as possible.
2014-03-17 23:30:38 +00:00
Warner Losh
6a5b1a3544 Align all comments in config files on same column. This consistency
helps when bits and pieces of GENERIC from i386 or amd64 are cut and
pasted into other architecture's config files (which in the case of
ARM had gotten rather akimbo).
2014-03-16 15:22:52 +00:00
Robert Watson
4a14441044 Update kernel inclusions of capability.h to use capsicum.h instead; some
further refinement is required as some device drivers intended to be
portable over FreeBSD versions rely on __FreeBSD_version to decide whether
to include capability.h.

MFC after:	3 weeks
2014-03-16 10:55:57 +00:00
Tycho Nightingale
0775fbb475 Fix a race wherein the source of an interrupt vector is wrongly
attributed if an ExtINT arrives during interrupt injection.

Also, fix a spurious interrupt if the PIC tries to raise an interrupt
before the outstanding one is accepted.

Finally, improve the PIC interrupt latency when another interrupt is
raised immediately after the outstanding one is accepted by creating a
vmexit rather than waiting for one to occur by happenstance.

Approved by:	neel (co-mentor)
2014-03-15 23:09:34 +00:00
Robert Watson
3dbe595b2d Revert a small portion of r263198 left over from local testing: don't
enable PCB groups and RSS by default [yet].
2014-03-15 00:59:23 +00:00
Robert Watson
7527624efa Several years after initial development, merge prototype support for
linking NIC Receive Side Scaling (RSS) to the network stack's
connection-group implementation.  This prototype (and derived patches)
are in use at Juniper and several other FreeBSD-using companies, so
despite some reservations about its maturity, merge the patch to the
base tree so that it can be iteratively refined in collaboration rather
than maintained as a set of gradually diverging patch sets.

(1) Merge a software implementation of the Toeplitz hash specified in
    RSS implemented by David Malone.  This is used to allow suitable
    pcbgroup placement of connections before the first packet is
    received from the NIC.  Software hashing is generally avoided,
    however, due to high cost of the hash on general-purpose CPUs.

(2) In in_rss.c, maintain authoritative versions of RSS state intended
    to be pushed to each NIC, including keying material, hash
    algorithm/ configuration, and buckets.  Provide software-facing
    interfaces to hash 2- and 4-tuples for IPv4 and IPv6 using both
    the RSS standardised Toeplitz and a 'naive' variation with a hash
    efficient in software but with poor distribution properties.
    Implement rss_m2cpuid()to be used by netisr and other load
    balancing code to look up the CPU on which an mbuf should be
    processed.

(3) In the Ethernet link layer, allow netisr distribution using RSS as
    a source of policy as an alternative to source ordering; continue
    to default to direct dispatch (i.e., don't try and requeue packets
    for processing on the 'right' CPU if they arrive in a directly
    dispatchable context).

(4) Allow RSS to control tuning of connection groups in order to align
    groups with RSS buckets.  If a packet arrives on a protocol using
    connection groups, and contains a suitable hardware-generated
    hash, use that hash value to select the connection group for pcb
    lookup for both IPv4 and IPv6.  If no hardware-generated Toeplitz
    hash is available, we fall back on regular PCB lookup risking
    contention rather than pay the cost of Toeplitz in software --
    this is a less scalable but, at my last measurement, faster
    approach.  As core counts go up, we may want to revise this
    strategy despite CPU overhead.

Where device drivers suitably configure NICs, and connection groups /
RSS are enabled, this should avoid both lock and line contention during
connection lookup for TCP.  This commit does not modify any device
drivers to tune device RSS configuration to the global RSS
configuration; patches are in circulation to do this for at least
Chelsio T3 and Intel 1G/10G drivers.  Currently, the KPI for device
drivers is not particularly robust, nor aware of more advanced features
such as runtime reconfiguration/rebalancing.  This will hopefully prove
a useful starting point for refinement.

No MFC is scheduled as we will first want to nail down a more mature
and maintainable KPI/KBI for device drivers.

Sponsored by:   Juniper Networks (original work)
Sponsored by:   EMC/Isilon (patch update and merge)
2014-03-15 00:57:50 +00:00
Gleb Smirnoff
45c203fce2 Remove AppleTalk support.
AppleTalk was a network transport protocol for Apple Macintosh devices
in 80s and then 90s. Starting with Mac OS X in 2000 the AppleTalk was
a legacy protocol and primary networking protocol is TCP/IP. The last
Mac OS X release to support AppleTalk happened in 2009. The same year
routing equipment vendors (namely Cisco) end their support.

Thus, AppleTalk won't be supported in FreeBSD 11.0-RELEASE.
2014-03-14 06:29:43 +00:00
Gleb Smirnoff
2c284d9395 Remove IPX support.
IPX was a network transport protocol in Novell's NetWare network operating
system from late 80s and then 90s. The NetWare itself switched to TCP/IP
as default transport in 1998. Later, in this century the Novell Open
Enterprise Server became successor of Novell NetWare. The last release
that claimed to still support IPX was OES 2 in 2007. Routing equipment
vendors (e.g. Cisco) discontinued support for IPX in 2011.

Thus, IPX won't be supported in FreeBSD 11.0-RELEASE.
2014-03-14 02:58:48 +00:00
Warner Losh
22b8ff24b5 Delete stray clause 3 (Advertising clause) and renumber while i'm
here.

Approved by: alc@
2014-03-11 23:41:35 +00:00
Tycho Nightingale
1ed19b835a Don't try to return a vector to a caller that only cares if a vector
is pending or not.

Approved by:	neel (co-mentor)
2014-03-11 22:12:12 +00:00
Warner Losh
846351f8b3 Remove clause 3 (the advertising clause), per the regent's letter. 2014-03-11 17:20:50 +00:00
Tycho Nightingale
762fd20804 Replace the userspace atpic stub with a more functional vmm.ko model.
New ioctls VM_ISA_ASSERT_IRQ, VM_ISA_DEASSERT_IRQ and VM_ISA_PULSE_IRQ
can be used to manipulate the pic, and optionally the ioapic, pin state.

Reviewed by:	jhb, neel
Approved by:	neel (co-mentor)
2014-03-11 16:56:00 +00:00
Roger Pau Monné
079f7ef839 xen: add a hook to perform AP startup
AP startup on PVH follows the PV method, so we need to add a hook in
order to diverge from bare metal.

Approved by: gibbs
Sponsored by: Citrix Systems R&D

amd64/amd64/machdep.c:
 - Add hook for start_all_aps on native (using native_start_all_aps
   defined in mp_machdep).

amd64/amd64/mp_machdep.c:
 - Make some variables global because they will also be used by the
   Xen PVH AP startup code.
 - Use the start_all_aps hook to start APs.
 - Rename start_all_aps to native_start_all_aps.

amd64/include/smp.h:
 - Add declaration for native_start_all_aps.

x86/include/init.h:
 - Declare start_all_aps hook in init_ops.

x86/xen/pv.c:
 - Pick external declarations from mp_machdep.
 - Introduce Xen PV code to start APs on PVH.
 - Set start_all_aps init hook to use the Xen PVH implementation.
2014-03-11 10:27:57 +00:00
Roger Pau Monné
5a036d7e02 xen: add hook for AP bootstrap memory reservation
This hook will only be implemented for bare metal, Xen doesn't require
any bootstrap code since APs are started in long mode with paging
enabled.

Approved by: gibbs
Sponsored by: Citrix Systems R&D

amd64/amd64/machdep.c:
 - Set mp_bootaddress hook for bare metal.

x86/include/init.h:
 - Define mp_bootaddress in init_ops.
2014-03-11 10:26:16 +00:00
Roger Pau Monné
4d30a3fb95 xen: use the same hypercall mechanism for XEN and XENHVM
Currently XEN (PV) and XENHVM (PVHVM) ports use different ways to
issue hypercalls, unify this by filling the hypercall_page under HVM
also.

Approved by: gibbs
Sponsored by: Citrix Systems R&D

amd64/include/xen/hypercall.h:
 - Unify Xen hypercall code by always using the PV way.

i386/i386/locore.s:
 - Define hypercall_page on i386 XENHVM.

x86/xen/hvm.c:
 - Fill hypercall_page on XENHVM kernels using the HVM method (only
   when running as an HVM guest).
2014-03-11 10:24:13 +00:00
Roger Pau Monné
1e69553ed1 xen: implement hook to fetch and parse e820 memory map
e820 memory map is fetched using a hypercall under Xen PVH, so add a
hook to init_ops in oder to diverge from bare metal and implement a
Xen variant.

Approved by: gibbs
Sponsored by: Citrix Systems R&D

x86/include/init.h:
 - Add a parse_memmap hook to init_ops, that will be called to fetch
   and parse the memory map.

amd64/amd64/machdep.c:
 - Decouple the fetch and the parse of the memmap, so the parse
   function can be shared with Xen code.
 - Move code around in order to implement the parse_memmap hook.

amd64/include/pc/bios.h:
 - Declare bios_add_smap_entries (implemented in machdep.c).

x86/xen/pv.c:
 - Implement fetching of e820 memmap when running as a PVH guest by
   using the XENMEM_memory_map hypercall.
2014-03-11 10:23:03 +00:00
Roger Pau Monné
5f05c79450 xen: implement an early timer for Xen PVH
When running as a PVH guest, there's no emulated i8254, so we need to
use the Xen PV timer as the early source for DELAY. This change allows
for different implementations of the early DELAY function and
implements a Xen variant for it.

Approved by: gibbs
Sponsored by: Citrix Systems R&D

dev/xen/timer/timer.c:
dev/xen/timer/timer.h:
 - Implement Xen early delay functions using the PV timer and declare
   them.

x86/include/init.h:
 - Add hooks for early clock source initialization and early delay
   functions.

i386/i386/machdep.c:
pc98/pc98/machdep.c:
amd64/amd64/machdep.c:
 - Set early delay hooks to use the i8254 on bare metal.
 - Use clock_init (that will in turn make use of init_ops) to
   initialize the early clock source.

amd64/include/clock.h:
i386/include/clock.h:
 - Declare i8254_delay and clock_init.

i386/xen/clock.c:
 - Rename DELAY to i8254_delay.

x86/isa/clock.c:
 - Introduce clock_init that will take care of initializing the early
   clock by making use of the init_ops hooks.
 - Move non ISA related delay functions to the newly introduced delay
   file.

x86/x86/delay.c:
 - Add moved delay related functions.
 - Implement generic DELAY function that will use the init_ops hooks.

x86/xen/pv.c:
 - Set PVH hooks for the early delay related functions in init_ops.

conf/files.amd64:
conf/files.i386:
conf/files.pc98:
 - Add delay.c to the kernel build.
2014-03-11 10:20:42 +00:00
Roger Pau Monné
97baeefd5b amd64: introduce hook for custom preload metadata parsers
Add hooks to amd64 in order to have diverging implementations, since
on Xen PV the metadata is passed to the kernel in a different form.

Approbed by: gibbs
Sponsored by: Citrix Systems R&D

amd64/amd64/machdep.c:
 - Define init_ops for native.
 - Put native code inside of native_parse_preload_data hook.
 - Call the parse_preload_data in order to fill the metadata info.

x86/include/init.h:
 - Declare the init_ops struct.

x86/xen/pv.c:
 - Declare xen_init_ops that contains the Xen PV implementation of
   init_ops.
 - Implement the parse_preload_data for Xen PVH, the info is fetched
   from HYPERVISOR_start_info->cmd_line as provided by Xen.
2014-03-11 10:15:25 +00:00
Roger Pau Monné
1a9cdd373a xen: add PV/PVH kernel entry point
Add the PV/PVH entry point and the low level functions for PVH
early initialization.

Approved by: gibbs
Sponsored by: Citrix Systems R&D

amd64/amd64/genassym.c:
 - Add __FreeBSD_version define to assym.s so it can be used for the
   Xen notes.

amd64/amd64/locore.S:
 - Make bootstack global so it can be used from Xen kernel entry
   point.

amd64/amd64/xen-locore.S:
 - Add Xen notes to the kernel.
 - Add the Xen PV entry point, that is going to call hammer_time_xen.

amd64/include/asmacros.h:
 - Add ELFNOTE macros.

i386/xen/xen_machdep.c:
 - Define HYPERVISOR_start_info for the XEN i386 PV port, which is
   going to be used in some shared code between PV and PVH.

x86/xen/hvm.c:
 - Define HYPERVISOR_start_info for the PVH port.

x86/xen/pv.c:
 - Introduce hammer_time_xen which is going to perform early setup for
   Xen PVH:
    - Setup shared Xen variables start_info, shared_info and
      xen_store.
    - Set guest type.
    - Create initial page tables as FreeBSD expects to find them.
    - Call into native init function (hammer_time).

xen/xen-os.h:
 - Declare HYPERVISOR_start_info.

conf/files.amd64:
 - Add amd64/amd64/locore.S and x86/xen/pv.c to the list of files.
2014-03-11 10:07:01 +00:00
Roger Pau Monné
e8da1c4877 amd64/i386: switch IPI handlers to C code.
Move asm IPIs handlers to C code, so both Xen and native IPI handlers
share the same code.

Reviewed by: jhb
Approved by: gibbs
Sponsored by: Citrix Systems R&D

amd64/amd64/apic_vector.S:
i386/i386/apic_vector.s:
 - Remove asm coded IPI handlers and instead call the newly introduced
   C variants.

amd64/amd64/mp_machdep.c:
i386/i386/mp_machdep.c:
 - Add C coded clones to the asm IPI handlers (moved from
   x86/xen/hvm.c).

i386/include/smp.h:
amd64/include/smp.h:
 - Add prototypes for the C IPI handlers.

x86/xen/hvm.c:
 - Move the C IPI handlers to mp_machdep and call those in the Xen IPI
   handlers.

i386/xen/mp_machdep.c:
 - Add dummy IPI handlers to the i386 Xen PV port (this port doesn't
   support SMP).
2014-03-11 10:03:29 +00:00
Ed Maste
c75da776c2 Disable amd64 TLB Context ID (pcid) by default for now
There are a number of reports of userspace application crashes that
are "solved" by setting vm.pmap.pcid_enabled=0, including Java and the
x11/mate-terminal port (PR ports/184362).

I originally planned to disable this only in stable/10 (in r262753), but
it has been pointed out that additional crash reports on HEAD are not
likely to provide new insight into the problem.  The feature can easily
be enabled for testing.
2014-03-05 01:34:10 +00:00
Jung-uk Kim
1d22d877b8 Move fpusave() wrapper for suspend hander to sys/amd64/amd64/fpu.c.
Inspired by:	jhb
2014-03-04 21:35:57 +00:00
Jung-uk Kim
be2d4fcf68 Revert accidentally committed changes in 262748. 2014-03-04 20:16:00 +00:00
Jung-uk Kim
603bc162fb Properly save and restore CR0.
MFC after:	3 days
2014-03-04 20:07:36 +00:00
Jung-uk Kim
05acaa9f85 Remove dead code since r230426, fix a comment, and tidy up.
Reported by:	jhb
MFC after:	3 days
2014-03-04 19:41:16 +00:00
Neel Natu
ef39d7e910 Fix a race between VMRUN() and vcpu_notify_event() due to 'vcpu->hostcpu'
being updated outside of the vcpu_lock(). The race is benign and could
potentially result in a missed notification about a pending interrupt to
a vcpu. The interrupt would not be lost but rather delayed until the next
VM exit.

The vcpu's hostcpu is now updated concurrently with the vcpu state change.
When the vcpu transitions to the RUNNING state the hostcpu is set to 'curcpu'.
It is set to 'NOCPU' in all other cases.

Reviewed by:	grehan
2014-03-01 03:17:58 +00:00
John Baldwin
ad3e368726 Correct VMware capitalization.
Submitted by:	joeld
2014-02-28 21:33:40 +00:00
John Baldwin
722b6744a7 Workaround an apparent bug in VMWare Fusion's nested VT support where it
triggers a VM exit with the exit reason of an external interrupt but
without a valid interrupt set in the exit interrupt information.

Tested by:	Michael Dexter
Reviewed by:	neel
MFC after:	1 week
2014-02-28 19:07:55 +00:00
Neel Natu
dc50650607 Queue pending exceptions in the 'struct vcpu' instead of directly updating the
processor-specific VMCS or VMCB. The pending exception will be delivered right
before entering the guest.

The order of event injection into the guest is:
- hardware exception
- NMI
- maskable interrupt

In the Intel VT-x case, a pending NMI or interrupt will enable the interrupt
window-exiting and inject it as soon as possible after the hardware exception
is injected. Also since interrupts are inherently asynchronous, injecting
them after the hardware exception should not affect correctness from the
guest perspective.

Rename the unused ioctl VM_INJECT_EVENT to VM_INJECT_EXCEPTION and restrict
it to only deliver x86 hardware exceptions. This new ioctl is now used to
inject a protection fault when the guest accesses an unimplemented MSR.

Discussed with:	grehan, jhb
Reviewed by:	jhb
2014-02-26 00:52:05 +00:00
Alan Cox
f438547338 When the kernel is running in a virtual machine, it cannot rely upon the
processor family to determine if the workaround for AMD Family 10h Erratum
383 should be enabled.  To enable virtual machine migration among a
heterogeneous collection of physical machines, the hypervisor may have
been configured to report an older processor family with a reduced feature
set.  Effectively, the reported processor family and its features are like
a "least common denominator" for the collection of machines.

Therefore, when the kernel is running in a virtual machine, instead of
relying upon the processor family, we now test for features that prove
that the underlying processor is not affected by the erratum.  (The
features that we test for are unlikely to ever be emulated in software
on an affected physical processor.)

PR:		186061
Tested by:	Simon Matter
Discussed with:	jhb, neel
MFC after:	2 weeks
2014-02-22 18:53:42 +00:00
Neel Natu
159dd56f94 Add support for x2APIC virtualization assist in Intel VT-x.
The vlapic.ops handler 'enable_x2apic_mode' is called when the vlapic mode
is switched to x2APIC. The VT-x implementation of this handler turns off the
APIC-access virtualization and enables the x2APIC virtualization in the VMCS.

The x2APIC virtualization is done by allowing guest read access to a subset
of MSRs in the x2APIC range. In non-root operation the processor will satisfy
an 'rdmsr' access to these MSRs by reading from the virtual APIC page instead.

The guest is also given write access to TPR, EOI and SELF_IPI MSRs which
get special treatment in non-root operation. This is documented in the
Intel SDM section titled "Virtualizing MSR-Based APIC Accesses".

Enforce that APIC-write and APIC-access VM-exits are handled only if
APIC-access virtualization is enabled. The one exception to this is
SELF_IPI virtualization which may result in an APIC-write VM-exit.
2014-02-21 06:03:54 +00:00
Neel Natu
52e5c8a2ec Simplify APIC mode switching from MMIO to x2APIC. In part this is done to
simplify the implementation of the x2APIC virtualization assist in VT-x.

Prior to this change the vlapic allowed the guest to change its mode from
xAPIC to x2APIC. We don't allow that any more and the vlapic mode is locked
when the virtual machine is created. This is not very constraining because
operating systems already have to deal with BIOS setting up the APIC in
x2APIC mode at boot.

Fix a bug in the CPUID emulation where the x2APIC capability was leaking
from the host to the guest.

Ignore MMIO reads and writes to the vlapic in x2APIC mode. Similarly, ignore
MSR accesses to the vlapic when it is in xAPIC mode.

The default configuration of the vlapic is xAPIC. The "-x" option to bhyve(8)
can be used to change the mode to x2APIC instead.

Discussed with:	grehan@
2014-02-20 01:48:25 +00:00
John Baldwin
a0efd3fb34 A first pass at adding support for injecting hardware exceptions for
emulated instructions.
- Add helper routines to inject interrupt information for a hardware
  exception from the VM exit callback routines.
- Use the new routines to inject GP and UD exceptions for invalid
  operations when emulating the xsetbv instruction.
- Don't directly manipulate the entry interrupt info when a user event
  is injected.  Instead, store the event info in the vmx state and
  only apply it during a VM entry if a hardware exception or NMI is
  not already pending.
- While here, use HANDLED/UNHANDLED instead of 1/0 in a couple of
  routines.

Reviewed by:	neel
2014-02-18 03:07:36 +00:00
Neel Natu
294d0d88fc Handle writes to the SELF_IPI MSR by the guest when the vlapic is configured
in x2apic mode. Reads to this MSR are currently ignored but should cause a
general proctection exception to be injected into the vcpu.

All accesses to the corresponding offset in xAPIC mode are ignored.

Also, do not panic the host if there is mismatch between the trigger mode
programmed in the TMR and the actual interrupt being delivered. Instead the
anomaly is logged to aid debugging and to prevent a misbehaving guest from
panicking the host.
2014-02-17 23:07:16 +00:00
Neel Natu
9c43cd07ec Use spinlocks to lock accesses to the vioapic.
This is necessary because if the vlapic is configured in x2apic mode the
vioapic_process_eoi() function is called inside the critical section
established by vm_run().
2014-02-17 22:57:51 +00:00
Dimitry Andric
f785676f2a Upgrade our copy of llvm/clang to 3.4 release. This version supports
all of the features in the current working draft of the upcoming C++
standard, provisionally named C++1y.

The code generator's performance is greatly increased, and the loop
auto-vectorizer is now enabled at -Os and -O2 in addition to -O3.  The
PowerPC backend has made several major improvements to code generation
quality and compile time, and the X86, SPARC, ARM32, Aarch64 and SystemZ
backends have all seen major feature work.

Release notes for llvm and clang can be found here:
<http://llvm.org/releases/3.4/docs/ReleaseNotes.html>
<http://llvm.org/releases/3.4/tools/clang/docs/ReleaseNotes.html>

MFC after:	1 month
2014-02-16 19:44:07 +00:00
Christian Brueffer
7f47cbd3ce Retire the nve(4) driver; nfe(4) has been the default driver for NVIDIA
nForce MCP adapters for a long time.

Yays:	jhb, remko, yongari
Nays:	none on the current and stable lists
2014-02-16 12:22:43 +00:00
Andriy Gapon
f25e50cf0b provide fast versions of ffsl and flsl for i386; ffsll and flsll for amd64
Reviewed by:	jhb
MFC after:	10 days
X-MFC note:	consider thirdparty modules depending on these symbols
Sponsored by:	HybridCluster
2014-02-14 15:18:37 +00:00
John Baldwin
4edef187b8 Add support for managing PCI bus numbers. As with BARs and PCI-PCI bridge
I/O windows, the default is to preserve the firmware-assigned resources.
PCI bus numbers are only managed if NEW_PCIB is enabled and the architecture
defines a PCI_RES_BUS resource type.
- Add a helper API to create top-level PCI bus resource managers for each
  PCI domain/segment.  Host-PCI bridge drivers use this API to allocate
  bus numbers from their associated domain.
- Change the PCI bus and CardBus drivers to allocate a bus resource for
  their bus number from the parent PCI bridge device.
- Change the PCI-PCI and PCI-CardBus bridge drivers to allocate the
  full range of bus numbers from secbus to subbus from their parent bridge.
  The drivers also always program their primary bus register.  The bridge
  drivers also support growing their bus range by extending the bus resource
  and updating subbus to match the larger range.
- Add support for managing PCI bus resources to the Host-PCI bridge drivers
  used for amd64 and i386 (acpi_pcib, mptable_pcib, legacy_pcib, and qpi_pcib).
- Define a PCI_RES_BUS resource type for amd64 and i386.

Reviewed by:	imp
MFC after:	1 month
2014-02-12 04:30:37 +00:00
John Baldwin
4f67a8c5e9 Don't waste a page of KVA for the boot-time memory test on x86. For amd64,
reuse the first page of the crashdumpmap as CMAP1/CADDR1.  For i386,
remove CMAP1/CADDR1 entirely and reuse CMAP3/CADDR3 for the memory test.

Reviewed by:	alc, peter
MFC after:	2 weeks
2014-02-11 22:02:40 +00:00
John Baldwin
abb023fb95 Add virtualized XSAVE support to bhyve which permits guests to use XSAVE and
XSAVE-enabled features like AVX.
- Store a per-cpu guest xcr0 register.  When switching to the guest FPU
  state, switch to the guest xcr0 value.  Note that the guest FPU state is
  saved and restored using the host's xcr0 value and xcr0 is saved/restored
  "inside" of saving/restoring the guest FPU state.
- Handle VM exits for the xsetbv instruction by updating the guest xcr0.
- Expose the XSAVE feature to the guest only if the host has enabled XSAVE,
  and only advertise XSAVE features enabled by the host to the guest.
  This ensures that the guest will only adjust FPU state that is a subset
  of the guest FPU state saved and restored by the host.

Reviewed by:	grehan
2014-02-08 16:37:54 +00:00
Neel Natu
bf73979dd9 Add a counter to differentiate between VM-exits due to nested paging faults
and instruction emulation faults.
2014-02-08 06:22:09 +00:00
Neel Natu
62fbd7c27a Fix a bug in the handling of VM-exits caused by non-maskable interrupts (NMI).
If a VM-exit is caused by an NMI then "blocking by NMI" is in effect on the
CPU when the VM-exit is completed. No more NMIs will be recognized until
the execution of an "iret".

Prior to this change the NMI handler was dispatched via a software interrupt
with interrupts enabled. This meant that an interrupt could be recognized
by the processor before the NMI handler completed its execution. The "iret"
issued by the interrupt handler would then cause the "blocking by NMI" to
be cleared prematurely.

This is now fixed by handling the NMI with interrupts disabled in addition
to "blocking by NMI" already established by the VM-exit.
2014-02-08 05:04:34 +00:00
John Baldwin
00f3efe1bd Add support for FreeBSD/i386 guests under bhyve.
- Similar to the hack for bootinfo32.c in userboot, define
  _MACHINE_ELF_WANT_32BIT in the load_elf32 file handlers in userboot.
  This allows userboot to load 32-bit kernels and modules.
- Copy the SMAP generation code out of bootinfo64.c and into its own
  file so it can be shared with bootinfo32.c to pass an SMAP to the i386
  kernel.
- Use uint32_t instead of u_long when aligning module metadata in
  bootinfo32.c in userboot, as otherwise the metadata used 64-bit
  alignment which corrupted the layout.
- Populate the basemem and extmem members of the bootinfo struct passed
  to 32-bit kernels.
- Fix the 32-bit stack in userboot to start at the top of the stack
  instead of the bottom so that there is room to grow before the
  kernel switches to its own stack.
- Push a fake return address onto the 32-bit stack in addition to the
  arguments normally passed to exec() in the loader.  This return
  address is needed to convince recover_bootinfo() in the 32-bit
  locore code that it is being invoked from a "new" boot block.
- Add a routine to libvmmapi to setup a 32-bit flat mode register state
  including a GDT and TSS that is able to start the i386 kernel and
  update bhyveload to use it when booting an i386 kernel.
- Use the guest register state to determine the CPU's current instruction
  mode (32-bit vs 64-bit) and paging mode (flat, 32-bit, PAE, or long
  mode) in the instruction emulation code.  Update the gla2gpa() routine
  used when fetching instructions to handle flat mode, 32-bit paging, and
  PAE paging in addition to long mode paging.  Don't look for a REX
  prefix when the CPU is in 32-bit mode, and use the detected mode to
  enable the existing 32-bit mode code when decoding the mod r/m byte.

Reviewed by:	grehan, neel
MFC after:	1 month
2014-02-05 04:39:03 +00:00
Tycho Nightingale
54e03e07b3 Add support for emulating the byte move and zero extend instructions:
"mov r/m8, r32" and "mov r/m8, r64".

Approved by:	neel (co-mentor)
2014-02-05 02:01:08 +00:00
Neel Natu
953c2c47eb Avoid doing unnecessary nested TLB invalidations.
Prior to this change the cached value of 'pm_eptgen' was tracked per-vcpu
and per-hostcpu. In the degenerate case where 'N' vcpus were sharing
a single hostcpu this could result in 'N - 1' unnecessary TLB invalidations.
Since an 'invept' invalidates mappings for all VPIDs the first 'invept'
is sufficient.

Fix this by moving the 'eptgen[MAXCPU]' array from 'vmxctx' to 'struct vmx'.

If it is known that an 'invept' is going to be done before entering the
guest then it is safe to skip the 'invvpid'. The stat VPU_INVVPID_SAVED
counts the number of 'invvpid' invalidations that were avoided because
they were subsumed by an 'invept'.

Discussed with:	grehan
2014-02-04 02:45:08 +00:00
John Baldwin
3cbf3585cb Enhance the support for PCI legacy INTx interrupts and enable them in
the virtio backends.
- Add a new ioctl to export the count of pins on the I/O APIC from vmm
  to the hypervisor.
- Use pins on the I/O APIC >= 16 for PCI interrupts leaving 0-15 for
  ISA interrupts.
- Populate the MP Table with I/O interrupt entries for any PCI INTx
  interrupts.
- Create a _PRT table under the PCI root bridge in ACPI to route any
  PCI INTx interrupts appropriately.
- Track which INTx interrupts are in use per-slot so that functions
  that share a slot attempt to distribute their INTx interrupts across
  the four available pins.
- Implicitly mask INTx interrupts if either MSI or MSI-X is enabled
  and when the INTx DIS bit is set in a function's PCI command register.
  Either assert or deassert the associated I/O APIC pin when the
  state of one of those conditions changes.
- Add INTx support to the virtio backends.
- Always advertise the MSI capability in the virtio backends.

Submitted by:	neel (7)
Reviewed by:	neel
MFC after:	2 weeks
2014-01-29 14:56:48 +00:00