Commit Graph

7242 Commits

Author SHA1 Message Date
neel
9cbb7c919e Don't require <sys/cpuset.h> to be always included before <machine/vmm.h>.
Only a subset of source files that include <machine/vmm.h> need to use the
APIs that require the inclusion of <sys/cpuset.h>.

MFC after:	1 week
2015-04-30 22:23:22 +00:00
neel
f57c0156d3 When an instruction cannot be decoded just return to userspace so bhyve(8)
can dump the instruction bytes.

Requested by:	grehan
MFC after:	1 week
2015-04-30 21:00:47 +00:00
neel
6641e0d4f2 Advertise the MTRR feature via CPUID and emulate the minimal set of MTRR MSRs.
This is required for booting Windows guests.

Reported by:	Leon Dang (ldang@nahannisys.com)
MFC after:	2 weeks
2015-04-30 19:23:50 +00:00
jhb
9c4c8b62fb Remove support for Xen PV domU kernels. Support for HVM domU kernels
remains.  Xen is planning to phase out support for PV upstream since it
is harder to maintain and has more overhead.  Modern x86 CPUs include
virtualization extensions that support HVM guests instead of PV guests.
In addition, the PV code was i386 only and not as well maintained recently
as the HVM code.
- Remove the i386-only NATIVE option that was used to disable certain
  components for PV kernels.  These components are now standard as they
  are on amd64.
- Remove !XENHVM bits from PV drivers.
- Remove various shims required for XEN (e.g. PT_UPDATES_FLUSH, LOAD_CR3,
  etc.)
- Remove duplicate copy of <xen/features.h>.
- Remove unused, i386-only xenstored.h.

Differential Revision:	https://reviews.freebsd.org/D2362
Reviewed by:	royger
Tested by:	royger (i386/amd64 HVM domU and amd64 PVH dom0)
Relnotes:	yes
2015-04-30 15:48:48 +00:00
neel
acc3b0bbd7 Re-implement RTC current time calculation to eliminate the possibility of
losing time.

The problem with the earlier implementation was that the uptime value
used by 'vrtc_curtime()' could be different than the uptime value when
'vrtc_time_update()' actually updated 'base_uptime'.

Fix this by calculating and updating the (rtctime, uptime) tuple together.

MFC after:	2 weeks
2015-04-29 23:44:28 +00:00
whu
0b81a49657 Microsoft vmbus, storage and other related driver enhancements for HyperV.
- Vmbus multi channel support.
    - Vector interrupt support.
    - Signal optimization.
    - Storvsc driver performance improvement.
    - Scatter and gather support for storvsc driver.
    - Minor bug fix for KVP driver.
Thanks royger, jhb and delphij from FreeBSD community for the reviews
and comments. Also thanks Hovy Xu from NetApp for the contributions to
the storvsc driver.

PR:     195238
Submitted by:   whu
Reviewed by:    royger, jhb, delphij
Approved by:    royger
MFC after:      2 weeks
Relnotes:       yes
Sponsored by:   Microsoft OSTC
2015-04-29 10:12:34 +00:00
neel
42e24b0f80 Emulate the 'bit test' instruction. Windows 7 uses 'bit test' to check the
'Delivery Status' bit in APIC ICR register.

Reported by:	Leon Dang (ldang@nahannisys.com)
MFC after:	2 weeks
2015-04-29 02:01:46 +00:00
neel
0151349b05 Implement the century byte in the RTC. Some guests require this field to be
properly set.

Reported by:	Leon Dang (ldang@nahannisys.com)
MFC after:	2 weeks
2015-04-28 23:44:47 +00:00
tychon
da94200043 STOS/STOSB/STOSW/STOSD/STOSQ instruction emulation.
Reviewed by:	neel
2015-04-25 19:02:06 +00:00
kib
cbdd03ad96 Move common code from sys/i386/i386/mp_machdep.c and
sys/amd64/amd64/mp_machdep.c, to the new common x86 source
sys/x86/x86/mp_x86.c.

Proposed and reviewed by:	jhb
Review:	https://reviews.freebsd.org/D2347
Sponsored by:	The FreeBSD Foundation
2015-04-24 16:20:56 +00:00
jhb
e4683250d1 Reassign copyright statements on several files from Advanced
Computing Technologies LLC to Hudson River Trading LLC.

Approved by:	Hudson River Trading LLC (who owns ACT LLC)
MFC after:	1 week
2015-04-23 14:22:20 +00:00
araujo
41717d52d1 Missing break in switch case.
Differential Revision:	D2342
Reviewed by:		neel
2015-04-23 02:50:06 +00:00
kib
9fb28191d6 Move some common code from sys/amd64/amd64/machdep.c and
sys/i386/i386/machdep.c to new file sys/x86/x86/cpu_machdep.c.  Most
of the code is related to the idle handling.

Discussed with:	pluknet
Sponsored by:	The FreeBSD Foundation
2015-04-22 12:32:14 +00:00
kib
2bf89b258b Remove duplicate definitions of MWAIT_CX hints. Identical defines in
specialreg.h are enough.

Discussed with:	mav
Sponsored by:	The FreeBSD Foundation
MFC after:	1 week
2015-04-20 08:25:55 +00:00
kib
32cf8052a6 Remove lazy pmap switch code from i386. Naive benchmark with md(4)
shows no difference with the code removed.

On both amd64 and i386, assert that a released pmap is not active.

Proposed and reviewed by:	alc
Discussed with:	Svatopluk Kraus <onwahe@gmail.com>, peter
Sponsored by:	The FreeBSD Foundation
MFC after:	2 weeks
2015-04-18 21:23:16 +00:00
neel
4fe75f1860 Relax the check on which vectors can be delivered through the APIC. According
to the Intel SDM vectors 16 through 255 are allowed to be delivered via the
local APIC.

Reported by:	Leon Dang (ldang@nahannisys.com)
MFC after:	2 weeks
2015-04-16 22:44:51 +00:00
neel
6a373c232e Prefer 'vcpu_should_yield()' over checking 'curthread->td_flags' directly.
MFC after:	1 week
2015-04-16 20:15:47 +00:00
emaste
f81c3bc31e Use explicitly sized types in EFI module metadata
This will allow the same metadata struct to be used on all platforms.

Differential Revision:	https://reviews.freebsd.org/D2275
Reviewed by:	jhb
2015-04-10 19:26:45 +00:00
tychon
9079cdda46 Enhance the support for Group 1 Extended opcodes:
* Implemement the 0x81 and 0x83 CMP instructions.
  * Implemement the 0x83 AND instruction.
  * Implemement the 0x81 OR instruction.

Reviewed by:	neel
2015-04-06 12:22:41 +00:00
eadler
cea14a6cb5 adrian asked me to revert and get more testing 2015-04-05 05:18:14 +00:00
eadler
41fc5228f6 head/sys/amd64/amd64/support.S: unroll loop
unroll the loop in ENTRY(pagezero)
	acc' to the submitter this results in a reproducible 1% perf
	improvement under buildworld like workload

	I validated correctness and run-testing, but not performance impact

Submitted by:	lidl@pix.net
Reviewed by:	adrian
PR:		199151
MFC After:	1 month
2015-04-05 05:07:24 +00:00
rstone
57feb6fb43 Fix integer truncation bug in malloc(9)
A couple of internal functions used by malloc(9) and uma truncated
a size_t down to an int.  This could cause any number of issues
(e.g. indefinite sleeps, memory corruption) if any kernel
subsystem tried to allocate 2GB or more through malloc.  zfs would
attempt such an allocation when run on a system with 2TB or more
of RAM.

Note to self: When this is MFCed, sparc64 needs the same fix.

Differential revision:	https://reviews.freebsd.org/D2106
Reviewed by:	kib
Reported by:	Michael Fuckner <michael@fuckner.net>
Tested by:	Michael Fuckner <michael@fuckner.net>
MFC after:	2 weeks
2015-04-01 12:42:26 +00:00
tychon
b3c521e852 Fix "MOVS" instruction memory to MMIO emulation. Currently updates to
%rdi, %rsi, etc are inadvertently bypassed along with the check to
see if the instruction needs to be repeated per the 'rep' prefix.

Add "MOVS" instruction support for the 'MMIO to MMIO' case.

Reviewed by:	neel
2015-04-01 00:15:31 +00:00
kib
82927bc930 Provide workaround for a performance issue with the popcnt instruction
on Intel processors.  Clear spurious dependency by explicitely xoring
the destination register of popcnt.

Use bitcount64() instead of re-implementing SWAR locally, for
processors without popcnt instruction.

Reviewed by:	jhb
Discussed with:	jilles (previous version)
Sponsored by:	The FreeBSD Foundation
2015-03-31 01:44:07 +00:00
jhb
4e5400d652 Wait 100 microseconds for a local APIC to dispatch each startup-related IPI
rather than 20.  The MP 1.4 specification states in Appendix B.2:

  "A period of 20 microseconds should be sufficient for IPI dispatch to
   complete under normal operating conditions".

(Note that this appears to be separate from the 10 millisecond (INIT) and
200 microsecond (STARTUP) waits after the IPIs are dispatched.)  The
Intel SDM is silent on this issue as far as I can tell.

At least some hardware requires 60 microseconds as noted in the PR, so
bump this to 100 to be on the safe side.

PR:		197756
Reported by:	zaphod@berentweb.com
MFC after:	1 week
2015-03-30 20:13:22 +00:00
kib
1a6b57192c Make it possible for the signal handler to act on #ss. Load the
canonical user data segment' selector into %ss when calling the
handler.

Sponsored by:	The FreeBSD Foundation
MFC after:	1 week
2015-03-28 09:03:54 +00:00
kib
dff23c6926 The #ss fault handler erronously does not check for the fault
originated from the return to usermode. #ss must be handled same as
#np.

Reported by:	Andrew Lutomirski through secteam
Sponsored by:	The FreeBSD Foundation
MFC after:	3 days
2015-03-28 09:02:19 +00:00
neel
4eebcc1fbb Fix the RTC device model to operate correctly in 12-hour mode. The following
table documents the values in the RTC 'hour' field in the two modes:

Hour-of-the-day		12-hour mode	24-hour mode
12	AM		12		0
[1-11]	AM		[1-11]		[1-11]
12	PM		0x80 | 12	12
[1-11]	PM		0x80 | [1-11]	[13-23]

Reported by:	Julian Hsiao (madoka@nyanisore.net)
MFC after:	1 week
2015-03-28 02:55:16 +00:00
tychon
b925086de0 When fetching an instruction in non-64bit mode, consider the value of the
code segment base address.

Also if an instruction doesn't support a mod R/M (modRM) byte, don't
be concerned if the CPU is in real mode.

Reviewed by:	neel
2015-03-24 17:12:36 +00:00
kib
b178649de8 Use VT-d interrupt remapping block (IR) to perform FSB messages
translation.  In particular, despite IO-APICs only take 8bit apic id,
IR translation structures accept 32bit APIC Id, which allows x2APIC
mode to function properly.  Extend msi_cpu of struct msi_intrsrc and
io_cpu of ioapic_intsrc to full int from one byte.

KPI of IR is isolated into the x86/iommu/iommu_intrmap.h, to avoid
bringing all dmar headers into interrupt code. The non-PCI(e) devices
which generate message interrupts on FSB require special handling. The
HPET FSB interrupts are remapped, while DMAR interrupts are not.

For each msi and ioapic interrupt source, the iommu cookie is added,
which is in fact index of the IRE (interrupt remap entry) in the IR
table. Cookie is made at the source allocation time, and then used at
the map time to fill both IRE and device registers. The MSI
address/data registers and IO-APIC redirection registers are
programmed with the special values which are recognized by IR and used
to restore the IRE index, to find proper delivery mode and target.
Map all MSI interrupts in the block when msi_map() is called.

Since an interrupt source setup and dismantle code are done in the
non-sleepable context, flushing interrupt entries cache in the IR
hardware, which is done async and ideally waits for the interrupt,
requires busy-wait for queue to drain.  The dmar_qi_wait_for_seq() is
modified to take a boolean argument requesting busy-wait for the
written sequence number instead of waiting for interrupt.

Some interrupts are configured before IR is initialized, e.g. ACPI
SCI.  Add intr_reprogram() function to reprogram all already
configured interrupts, and call it immediately before an IR unit is
enabled.  There is still a small window after the IO-APIC redirection
entry is reprogrammed with cookie but before the unit is enabled, but
to fix this properly, IR must be started much earlier.

Add workarounds for 5500 and X58 northbridges, some revisions of which
have severe flaws in handling IR.  Use the same identification methods
as employed by Linux.

Review:	https://reviews.freebsd.org/D1892
Reviewed by:	neel
Discussed with:	jhb
Tested by:	glebius, pho (previous versions)
Sponsored by:	The FreeBSD Foundation
MFC after:	3 weeks
2015-03-19 13:57:47 +00:00
jfv
06710b0884 Update to the Intel ixgbe driver:
- Split the driver into independent pf and vf loadables. This is
	  in preparation for SRIOV support which will be following shortly.
	  This also allows us to keep a seperate revision control over the
	  two parts, making for easier sustaining.
	- Make the TX/RX code a shared/seperated file, in the old code base
	  the ixv code would miss fixes that went into ixgbe, this model
	  will eliminate that problem.
	- The driver loadables will now match the device names, something that
	  has been requested for some time.
	- Rather than a modules/ixgbe there is now modules/ix and modules/ixv
	- It will also be possible to make your static kernel with only one
	  or the other for streamlined installs, or both.

Enjoy!

Submitted by: jfv and erj
2015-03-17 18:32:28 +00:00
mav
af4c17529a Report ARAT (APIC-Timer-always-running) feature for virtual CPU.
This makes FreeBSD guest to not avoid using LAPIC timer, preferring HPET
due to worries about non-existing for virtual CPUs deep sleep states.

Benchmarks of usleep(1) on guest and host show such extra latencies:
 - 51us for virtual HPET,
 - 22us for virtual LAPIC timer,
 - 22us for host HPET and
 - 3us for host LAPIC timer.

MFC after:	2 weeks
2015-03-16 11:57:03 +00:00
neel
e16d18eeb8 Use lapic_ipi_alloc() to dynamically allocate IPI slots needed by bhyve when
vmm.ko is loaded.

Also relocate the 'justreturn' IPI handler to be alongside all other handlers.

Requested by:	kib
2015-03-14 02:32:08 +00:00
jhb
c1c1fd8e25 Only schedule interrupts on a single hyperthread of a modern Intel CPU core
by default.  Previously we used a single hyperthread on Pentium4-era
cores but used both hyperthreads on more recent CPUs.

MFC after:	2 weeks
2015-03-06 20:34:28 +00:00
tychon
eedd9cfb16 When ICW1 is issued the edge sense circuit is reset which means that
following an initialization a low-to-high transistion is necesary to
generate an interrupt.

Reviewed by:	neel
2015-03-06 02:05:45 +00:00
neel
e549774cf3 Fix warnings/errors when building vmm.ko with gcc:
- fix warning about comparison of 'uint8_t v_tpr >= 0' always being true.

- fix error triggered by an empty clobber list in the inline assembly for
  "clgi" and "stgi"

- fix error when compiling "vmload %rax", "vmrun %rax" and "vmsave %rax". The
  gcc assembler does not like the explicit operand "%rax" while the clang
  assembler requires specifying the operand "%rax". Fix this by encoding the
  instructions using the ".byte" directive.

Reported by:	julian
MFC after:	1 week
2015-03-02 20:13:49 +00:00
rstone
e40d09375f Implement interface to create SR-IOV Virtual Functions
Implement the interace to create SR-IOV Virtual Functions (VFs).
When a driver registers that they support SR-IOV by calling
pci_setup_iov(), the SR-IOV code creates a new node in /dev/iov
for that device.  An ioctl can be invoked on that device to
create VFs and have the driver initialize them.

At this point, allocating memory I/O windows (BARs) is not
supported.

Differential Revision:	https://reviews.freebsd.org/D76
Reviewed by:		jhb
MFC after: 		1 month
Sponsored by:		Sandvine Inc.
2015-03-01 00:40:09 +00:00
rstone
29fedc4c2c Allow passthrough devices to be hinted.
Allow the ppt driver to attach to devices that were hinted to be
passthrough devices by the PCI code creating them with a driver
name of "ppt".

Add a tunable that allows the IOMMU to be forced to be used.  With
SR-IOV passthrough devices the VFs may be created after vmm.ko is
loaded.  The current code will not initialize the IOMMU in that
case, meaning that the passthrough devices can't actually be used.

Differential Revision:	https://reviews.freebsd.org/D73
Reviewed by:		neel
MFC after: 		1 month
Sponsored by:		Sandvine Inc.
2015-03-01 00:39:48 +00:00
kib
59184505da Supposed fix for some SandyBridge mobile CPUs hang on AP startup when
x2APIC mode is detected and enabled.  Current theory is that switching
the APIC mode while an IPI is in flight might be the issue.

Postpone switching to x2APIC mode until we are guaranteed that all
starting IPIs are already send and aknowledged.  Use aps_ready signal
as an indication that the BSP is done with us.

Tested by:	adrian
Sponsored by:	The FreeBSD Foundation
MFC after:	2 months
2015-02-28 20:37:38 +00:00
neel
f458d8769b Always emulate MSR_PAT on Intel processors and don't rely on PAT save/restore
capability of VT-x. This lets bhyve run nested in older VMware versions that
don't support the PAT save/restore capability.

Note that the actual value programmed by the guest in MSR_PAT is irrelevant
because bhyve sets the 'Ignore PAT' bit in the nested PTE.

Reported by:	marcel
Tested by:	Leon Dang (ldang@nahannisys.com)
Sponsored by:	Nahanni Systems
MFC after:	2 weeks
2015-02-24 05:35:15 +00:00
jhb
4145efe339 Ensure that the supplied data length is large enough to hold the base
FPU state to avoid passing a negative length to fpusetregs() / npxsetregs().

Differential Revision:	https://reviews.freebsd.org/D1861
Reviewed by:	kib, emaste
2015-02-18 23:34:03 +00:00
kib
7a89c3df60 Initialize x2APIC mode on the resume path before accessing LAPIC.
Remove unneeded disable of LAPIC in the native_lapic_xapic_mode().  We
attempt to send wakeup IPI on the resume path right after BSP wakeup,
so disabling is wrong.

Reported and tested by:	glebius, "Ranjan1018 ." <214748mv@gmail.com>
Sponsored by:	The FreeBSD Foundation
MFC after:	2 months
2015-02-16 21:56:19 +00:00
markj
fc1485dfdc Add support for decoding multibyte NOPs.
Differential Revision:	https://reviews.freebsd.org/D1830
Reviewed by:	jhb, kib
MFC after:	2 weeks
Sponsored by:	EMC / Isilon Storage Divison
2015-02-13 01:35:53 +00:00
kib
0754f0eac9 Add x2APIC support. Enable it by default if CPU is capable. The
hw.x2apic_enable tunable allows disabling it from the loader prompt.

To closely repeat effects of the uncached memory ops when accessing
registers in the xAPIC mode, the x2APIC writes to MSRs are preceeded
by mfence, except for the EOI notifications.  This is probably too
strict, only ICR writes to send IPI require serialization to ensure
that other CPUs see the previous actions when IPI is delivered.  This
may be changed later.

In vmm justreturn IPI handler, call doreti_iret instead of doing iretd
inline, to handle corner conditions.

Note that the patch only switches LAPICs into x2APIC mode. It does not
enables FreeBSD to support > 255 CPUs, which requires parsing x2APIC
MADT entries and doing interrupts remapping, but is the required step
on the way.

Reviewed by:	neel
Tested by:	pho (real hardware), neel (on bhyve)
Discussed with:	jhb, grehan
Sponsored by:	The FreeBSD Foundation
MFC after:	2 months
2015-02-09 21:00:56 +00:00
jhb
9dfc36d985 Revert the IPI startup sequence to match what is described in the
Intel Multiprocessor Specification v1.4.  The Intel SDM claims that
the INIT IPIs here are invalid, but other systems follow the MP
spec instead.

While here, fix the IPI wait routine to accept a timeout in microseconds
instead of a raw spin count, and don't spin forever during AP startup.
Instead, panic if a STARTUP IPI is not delivered after 20 us.

PR:		196542
Differential Revision:	https://reviews.freebsd.org/D1719
MFC after:	2 weeks
2015-02-06 18:19:59 +00:00
bryanv
25ce9181cf Generalized parts of the XEN timer code into a generic pvclock
KVM clock shares the same data structures between the guest and the host
as Xen so it makes sense to just have a single copy of this code.

Differential Revision: https://reviews.freebsd.org/D1429
Reviewed by:	royger (eariler version)
MFC after:	1 month
2015-02-04 08:26:43 +00:00
kib
3bbc91d138 Do not qualify the mcontext_t *mcp argument for set_mcontext(9) as
const.  On x86, even after the machine context is supposedly read into
the struct ucontext, lazy FPU state save code might only mark the FPU
data as hardware-owned.  Later, set_fpcontext() needs to fetch the
state from hardware, modifying the *mcp.

The set_mcontext(9) is called from sigreturn(2) and setcontext(2)
implementations and old create_thread(2) interface, which throw the
*mcp out after the set_mcontext() call.

Reported by:	dim
Discussed with:	jhb
Sponsored by:	The FreeBSD Foundation
MFC after:	1 week
2015-01-31 21:43:46 +00:00
royger
217732b47d amd64: allow base memory segment to start at address different than 0
Current code requires that the first physical memory segment starts at 0,
but this is not really needed. We only need to make sure the bootstrap code
and page tables for APs are allocated below 4GB.

This patch removes this requirement and allows booting a Dell R710 from
UEFI, where the first physical memory segment starts at 0x10000.

Sponsored by: Citrix Systems R&D
Reviewed by: jhb
Differential Revision: https://reviews.freebsd.org/D1417
2015-01-26 08:42:47 +00:00
jhb
03984c1e9c If the boot-time memory test is enabled, output a dot ('.') for
each GB of RAM tested so people watching the console can see that
the machine is making progress and not hung.

PR:		196650
Submitted by:	Ravi Pokala <rpokala@panasas.com>
Suggestions from:	Eric van Gyzen <eric@vangyzen.net>
MFC after:	2 weeks
2015-01-25 20:16:45 +00:00
des
ffa2c78ca2 Remove ISA NICs. Anyone still using these on amd64 can build their
own kernel.
2015-01-25 12:02:38 +00:00