7261 Commits

Author SHA1 Message Date
kib
c3a04ab331 On amd64, make proc0 pmap initialization slightly more correct. In
particular, switch to the proc0 pmap to have expected %cr3 and PCID
for the thread0 during initialization, and the up to date pm_active
mask.

pmap_pinit0() should be done after proc0->p_vmspace is assigned so
that the amd64 pmap_activate() find the correct curproc pmap.

Sponsored by:	The FreeBSD Foundation
MFC after:	3 weeks
2015-05-15 08:30:29 +00:00
kib
7531e6e70e Implement the support for PCID in UP kernels.
Requested by:	alc
Tested by:	pho
Sponsored by:	The FreeBSD Foundation
MFC after:	3 weeks
2015-05-15 07:57:47 +00:00
jimharris
32fd259513 Add nvme and nvd drivers to GENERIC for amd64 and i386.
MFC after:	3 days
Sponsored by:	Intel
2015-05-14 20:19:22 +00:00
trasz
82bbee8b66 Build GENERIC with RACCT/RCTL support by default. Note that it still
needs to be enabled by adding "kern.racct.enable=1" to /boot/loader.conf.

Differential Revision:	https://reviews.freebsd.org/D2407
Reviewed by:	emaste@, wblock@
MFC after:	1 month
Relnotes:	yes
Sponsored by:	The FreeBSD Foundation
2015-05-14 14:03:55 +00:00
kib
d7675fcc44 Initialize pcids array for the proc0 pmap.
Sponsored by:	The FreeBSD Foundation
MFC after:	3 weeks
2015-05-10 09:09:07 +00:00
kib
29ab4beeb4 Tweak assert to also print the thread address.
Sponsored by:	The FreeBSD Foundation
MFC after:	3 weeks
2015-05-10 09:05:57 +00:00
kib
f371322983 On exec, single-threading must be enforced before arguments space is
allocated from exec_map.  If many threads try to perform execve(2) in
parallel, the exec map is exhausted and some threads sleep
uninterruptible waiting for the map space.  Then, the thread which won
the race for the space allocation, cannot single-thread the process,
causing deadlock.

Reported and tested by:	pho (previous version)
Sponsored by:	The FreeBSD Foundation
MFC after:	2 weeks
2015-05-10 09:00:40 +00:00
kib
2bf53173c9 Correct the assertion. We should compare the pmap' curcpu pcid value
against 0, not the pmap.

Noted by:	Oliver Pinter <oliver.pinter@hardenedbsd.org>
Sponsored by:	The FreeBSD Foundation
MFC after:	3 weeks
2015-05-09 21:36:44 +00:00
kib
3fb738761e Rewrite amd64 PCID implementation to follow an algorithm described in
the Vahalia' "Unix Internals" section 15.12 "Other TLB Consistency
Algorithms".  The same algorithm is already utilized by the MIPS pmap
to handle ASIDs.

The PCID for the address space is now allocated per-cpu during context
switch to the thread using pmap, when no PCID on the cpu was ever
allocated, or the current PCID is invalidated.  If the PCID is reused,
bit 63 of %cr3 can be set to avoid TLB flush.

Each cpu has PCID' algorithm generation count, which is saved in the
pmap pcpu block when pcpu PCID is allocated.  On invalidation, the
pmap generation count is zeroed, which signals the context switch code
that already allocated PCID is no longer valid.  The implication is
the TLB shootdown for the given cpu/address space, due to the
allocation of new PCID.

The pm_save mask is no longer has to be tracked, which (significantly)
reduces the targets of the TLB shootdown IPIs.  Previously, pm_save
was reset only on pmap_invalidate_all(), which made it accumulate the
cpuids of all processors on which the thread was scheduled between
full TLB shootdowns.

Besides reducing the amount of TLB shootdowns and removing atomics to
update pm_saves in the context switch code, the algorithm is much
simpler than the maintanence of pm_save and selection of the right
address space in the shootdown IPI handler.

Reviewed by:	alc
Tested by:	pho
Sponsored by:	The FreeBSD Foundation
MFC after:	3 weeks
2015-05-09 19:11:01 +00:00
kib
3359a03537 Remove unused define.
Sponsored by:	The FreeBSD Foundation
MFC after:	3 days
2015-05-09 18:38:35 +00:00
kib
6006bf3a7d If x86 CPU implementation of the MWAIT instruction reasonably
interacts with interrupts, query ACPI and use MWAIT for entrance into
Cx sleep states.  Support C1 "I/O then halt" mode.  See Intel'
document 302223-007 "Intelб╝ Processor Vendor-Specific ACPI Interface
Specification" for description.

Move the acpi_cpu_c1() function into x86/cpu_machdep.c and use
it instead of inlining "sti; hlt" sequence in several places.

In the acpi(4) man page, besides documenting the dev.cpu.N.cx_methods
sysctl, correct the names for dev.cpu.N.{cx_usage,cx_lowest,cx_supported}
sysctls.

Both jkim and avg have some other patches implementing the mwait
functionality; this work is unrelated.  Linux does not rely on the
ACPI to provide correct tables describing Cx modes.  Instead, the
driver has pre-defined knowledge of the CPU models, it was supplied by
Intel.

Tested by:    pho (previous versions)
Sponsored by:	The FreeBSD Foundation
2015-05-09 12:28:48 +00:00
neel
390ea95709 Check 'td_owepreempt' and yield the vcpu thread if it is set.
This is done explicitly because a vcpu thread can be in a critical section
for the entire time slice alloted to it. This in turn can delay the handling
of the 'td_owepreempt'.

Reviewed by:	jhb
MFC after:	1 week
Differential Revision:	https://reviews.freebsd.org/D2430
2015-05-06 23:40:24 +00:00
neel
7776059e98 Deprecate the 3-way return values from vm_gla2gpa() and vm_copy_setup().
Prior to this change both functions returned 0 for success, -1 for failure
and +1 to indicate that an exception was injected into the guest.

The numerical value of ERESTART also happens to be -1 so when these functions
returned -1 it had to be translated to a positive errno value to prevent the
VM_RUN ioctl from being inadvertently restarted. This made it easy to introduce
bugs when writing emulation code.

Fix this by adding an 'int *guest_fault' parameter and setting it to '1' if
an exception was delivered to the guest. The return value is 0 or EFAULT so
no additional translation is needed.

Reviewed by:	tychon
MFC after:	2 weeks
Differential Revision:	https://reviews.freebsd.org/D2428
2015-05-06 16:25:20 +00:00
neel
e5110c0076 Do a proper emulation of guest writes to MSR_EFER.
- Must-Be-Zero bits cannot be set.
- EFER_LME and EFER_LMA should respect the long mode consistency checks.
- EFER_NXE, EFER_FFXSR, EFER_TCE can be set if allowed by CPUID capabilities.
- Flag an error if guest tries to set EFER_LMSLE since bhyve doesn't enforce
  segment limits in 64-bit mode.

MFC after:	2 weeks
2015-05-06 05:40:20 +00:00
neel
46cb5cb412 Emulate the 'CMP r/m8, imm8' instruction encountered when booting a Windows
Vista guest.

Reported by:	Leon Dang (ldang@nahannisys.com)
MFC after:	1 week
2015-05-04 04:27:23 +00:00
neel
ed97fa3273 Don't advertise the Intel SMX capability to the guest.
Reported by:	Leon Dang (ldang@nahannisys.com)
MFC after:	1 week
2015-05-02 19:07:49 +00:00
neel
cd20ad9aa3 Emulate machine check related MSRs to allow guest OSes like Windows to boot.
Reported by:	Leon Dang (ldang@nahannisys.com)
MFC after:	2 weeks
2015-05-02 04:19:11 +00:00
neel
755b785420 r281630 relaxed the limits on the vectors that can be asserted in the IRRs.
Do the same when transitioning a vector from the IRR to the ISR and also
when extinguishing it from the ISR in response to an EOI.

Reported by:	Leon Dang (ldang@nahannisys.com)
MFC after:	2 weeks
2015-05-01 16:00:29 +00:00
neel
c782f2195b Emulate MSR_SYSCFG which is accessed by Linux on AMD cpus when MTRRs are
enabled.

MFC after:	2 weeks
2015-05-01 05:11:14 +00:00
neel
9cbb7c919e Don't require <sys/cpuset.h> to be always included before <machine/vmm.h>.
Only a subset of source files that include <machine/vmm.h> need to use the
APIs that require the inclusion of <sys/cpuset.h>.

MFC after:	1 week
2015-04-30 22:23:22 +00:00
neel
f57c0156d3 When an instruction cannot be decoded just return to userspace so bhyve(8)
can dump the instruction bytes.

Requested by:	grehan
MFC after:	1 week
2015-04-30 21:00:47 +00:00
neel
6641e0d4f2 Advertise the MTRR feature via CPUID and emulate the minimal set of MTRR MSRs.
This is required for booting Windows guests.

Reported by:	Leon Dang (ldang@nahannisys.com)
MFC after:	2 weeks
2015-04-30 19:23:50 +00:00
jhb
9c4c8b62fb Remove support for Xen PV domU kernels. Support for HVM domU kernels
remains.  Xen is planning to phase out support for PV upstream since it
is harder to maintain and has more overhead.  Modern x86 CPUs include
virtualization extensions that support HVM guests instead of PV guests.
In addition, the PV code was i386 only and not as well maintained recently
as the HVM code.
- Remove the i386-only NATIVE option that was used to disable certain
  components for PV kernels.  These components are now standard as they
  are on amd64.
- Remove !XENHVM bits from PV drivers.
- Remove various shims required for XEN (e.g. PT_UPDATES_FLUSH, LOAD_CR3,
  etc.)
- Remove duplicate copy of <xen/features.h>.
- Remove unused, i386-only xenstored.h.

Differential Revision:	https://reviews.freebsd.org/D2362
Reviewed by:	royger
Tested by:	royger (i386/amd64 HVM domU and amd64 PVH dom0)
Relnotes:	yes
2015-04-30 15:48:48 +00:00
neel
acc3b0bbd7 Re-implement RTC current time calculation to eliminate the possibility of
losing time.

The problem with the earlier implementation was that the uptime value
used by 'vrtc_curtime()' could be different than the uptime value when
'vrtc_time_update()' actually updated 'base_uptime'.

Fix this by calculating and updating the (rtctime, uptime) tuple together.

MFC after:	2 weeks
2015-04-29 23:44:28 +00:00
whu
0b81a49657 Microsoft vmbus, storage and other related driver enhancements for HyperV.
- Vmbus multi channel support.
    - Vector interrupt support.
    - Signal optimization.
    - Storvsc driver performance improvement.
    - Scatter and gather support for storvsc driver.
    - Minor bug fix for KVP driver.
Thanks royger, jhb and delphij from FreeBSD community for the reviews
and comments. Also thanks Hovy Xu from NetApp for the contributions to
the storvsc driver.

PR:     195238
Submitted by:   whu
Reviewed by:    royger, jhb, delphij
Approved by:    royger
MFC after:      2 weeks
Relnotes:       yes
Sponsored by:   Microsoft OSTC
2015-04-29 10:12:34 +00:00
neel
42e24b0f80 Emulate the 'bit test' instruction. Windows 7 uses 'bit test' to check the
'Delivery Status' bit in APIC ICR register.

Reported by:	Leon Dang (ldang@nahannisys.com)
MFC after:	2 weeks
2015-04-29 02:01:46 +00:00
neel
0151349b05 Implement the century byte in the RTC. Some guests require this field to be
properly set.

Reported by:	Leon Dang (ldang@nahannisys.com)
MFC after:	2 weeks
2015-04-28 23:44:47 +00:00
tychon
da94200043 STOS/STOSB/STOSW/STOSD/STOSQ instruction emulation.
Reviewed by:	neel
2015-04-25 19:02:06 +00:00
kib
cbdd03ad96 Move common code from sys/i386/i386/mp_machdep.c and
sys/amd64/amd64/mp_machdep.c, to the new common x86 source
sys/x86/x86/mp_x86.c.

Proposed and reviewed by:	jhb
Review:	https://reviews.freebsd.org/D2347
Sponsored by:	The FreeBSD Foundation
2015-04-24 16:20:56 +00:00
jhb
e4683250d1 Reassign copyright statements on several files from Advanced
Computing Technologies LLC to Hudson River Trading LLC.

Approved by:	Hudson River Trading LLC (who owns ACT LLC)
MFC after:	1 week
2015-04-23 14:22:20 +00:00
araujo
41717d52d1 Missing break in switch case.
Differential Revision:	D2342
Reviewed by:		neel
2015-04-23 02:50:06 +00:00
kib
9fb28191d6 Move some common code from sys/amd64/amd64/machdep.c and
sys/i386/i386/machdep.c to new file sys/x86/x86/cpu_machdep.c.  Most
of the code is related to the idle handling.

Discussed with:	pluknet
Sponsored by:	The FreeBSD Foundation
2015-04-22 12:32:14 +00:00
kib
2bf89b258b Remove duplicate definitions of MWAIT_CX hints. Identical defines in
specialreg.h are enough.

Discussed with:	mav
Sponsored by:	The FreeBSD Foundation
MFC after:	1 week
2015-04-20 08:25:55 +00:00
kib
32cf8052a6 Remove lazy pmap switch code from i386. Naive benchmark with md(4)
shows no difference with the code removed.

On both amd64 and i386, assert that a released pmap is not active.

Proposed and reviewed by:	alc
Discussed with:	Svatopluk Kraus <onwahe@gmail.com>, peter
Sponsored by:	The FreeBSD Foundation
MFC after:	2 weeks
2015-04-18 21:23:16 +00:00
neel
4fe75f1860 Relax the check on which vectors can be delivered through the APIC. According
to the Intel SDM vectors 16 through 255 are allowed to be delivered via the
local APIC.

Reported by:	Leon Dang (ldang@nahannisys.com)
MFC after:	2 weeks
2015-04-16 22:44:51 +00:00
neel
6a373c232e Prefer 'vcpu_should_yield()' over checking 'curthread->td_flags' directly.
MFC after:	1 week
2015-04-16 20:15:47 +00:00
emaste
f81c3bc31e Use explicitly sized types in EFI module metadata
This will allow the same metadata struct to be used on all platforms.

Differential Revision:	https://reviews.freebsd.org/D2275
Reviewed by:	jhb
2015-04-10 19:26:45 +00:00
tychon
9079cdda46 Enhance the support for Group 1 Extended opcodes:
* Implemement the 0x81 and 0x83 CMP instructions.
  * Implemement the 0x83 AND instruction.
  * Implemement the 0x81 OR instruction.

Reviewed by:	neel
2015-04-06 12:22:41 +00:00
eadler
cea14a6cb5 adrian asked me to revert and get more testing 2015-04-05 05:18:14 +00:00
eadler
41fc5228f6 head/sys/amd64/amd64/support.S: unroll loop
unroll the loop in ENTRY(pagezero)
	acc' to the submitter this results in a reproducible 1% perf
	improvement under buildworld like workload

	I validated correctness and run-testing, but not performance impact

Submitted by:	lidl@pix.net
Reviewed by:	adrian
PR:		199151
MFC After:	1 month
2015-04-05 05:07:24 +00:00
rstone
57feb6fb43 Fix integer truncation bug in malloc(9)
A couple of internal functions used by malloc(9) and uma truncated
a size_t down to an int.  This could cause any number of issues
(e.g. indefinite sleeps, memory corruption) if any kernel
subsystem tried to allocate 2GB or more through malloc.  zfs would
attempt such an allocation when run on a system with 2TB or more
of RAM.

Note to self: When this is MFCed, sparc64 needs the same fix.

Differential revision:	https://reviews.freebsd.org/D2106
Reviewed by:	kib
Reported by:	Michael Fuckner <michael@fuckner.net>
Tested by:	Michael Fuckner <michael@fuckner.net>
MFC after:	2 weeks
2015-04-01 12:42:26 +00:00
tychon
b3c521e852 Fix "MOVS" instruction memory to MMIO emulation. Currently updates to
%rdi, %rsi, etc are inadvertently bypassed along with the check to
see if the instruction needs to be repeated per the 'rep' prefix.

Add "MOVS" instruction support for the 'MMIO to MMIO' case.

Reviewed by:	neel
2015-04-01 00:15:31 +00:00
kib
82927bc930 Provide workaround for a performance issue with the popcnt instruction
on Intel processors.  Clear spurious dependency by explicitely xoring
the destination register of popcnt.

Use bitcount64() instead of re-implementing SWAR locally, for
processors without popcnt instruction.

Reviewed by:	jhb
Discussed with:	jilles (previous version)
Sponsored by:	The FreeBSD Foundation
2015-03-31 01:44:07 +00:00
jhb
4e5400d652 Wait 100 microseconds for a local APIC to dispatch each startup-related IPI
rather than 20.  The MP 1.4 specification states in Appendix B.2:

  "A period of 20 microseconds should be sufficient for IPI dispatch to
   complete under normal operating conditions".

(Note that this appears to be separate from the 10 millisecond (INIT) and
200 microsecond (STARTUP) waits after the IPIs are dispatched.)  The
Intel SDM is silent on this issue as far as I can tell.

At least some hardware requires 60 microseconds as noted in the PR, so
bump this to 100 to be on the safe side.

PR:		197756
Reported by:	zaphod@berentweb.com
MFC after:	1 week
2015-03-30 20:13:22 +00:00
kib
1a6b57192c Make it possible for the signal handler to act on #ss. Load the
canonical user data segment' selector into %ss when calling the
handler.

Sponsored by:	The FreeBSD Foundation
MFC after:	1 week
2015-03-28 09:03:54 +00:00
kib
dff23c6926 The #ss fault handler erronously does not check for the fault
originated from the return to usermode. #ss must be handled same as
#np.

Reported by:	Andrew Lutomirski through secteam
Sponsored by:	The FreeBSD Foundation
MFC after:	3 days
2015-03-28 09:02:19 +00:00
neel
4eebcc1fbb Fix the RTC device model to operate correctly in 12-hour mode. The following
table documents the values in the RTC 'hour' field in the two modes:

Hour-of-the-day		12-hour mode	24-hour mode
12	AM		12		0
[1-11]	AM		[1-11]		[1-11]
12	PM		0x80 | 12	12
[1-11]	PM		0x80 | [1-11]	[13-23]

Reported by:	Julian Hsiao (madoka@nyanisore.net)
MFC after:	1 week
2015-03-28 02:55:16 +00:00
tychon
b925086de0 When fetching an instruction in non-64bit mode, consider the value of the
code segment base address.

Also if an instruction doesn't support a mod R/M (modRM) byte, don't
be concerned if the CPU is in real mode.

Reviewed by:	neel
2015-03-24 17:12:36 +00:00
kib
b178649de8 Use VT-d interrupt remapping block (IR) to perform FSB messages
translation.  In particular, despite IO-APICs only take 8bit apic id,
IR translation structures accept 32bit APIC Id, which allows x2APIC
mode to function properly.  Extend msi_cpu of struct msi_intrsrc and
io_cpu of ioapic_intsrc to full int from one byte.

KPI of IR is isolated into the x86/iommu/iommu_intrmap.h, to avoid
bringing all dmar headers into interrupt code. The non-PCI(e) devices
which generate message interrupts on FSB require special handling. The
HPET FSB interrupts are remapped, while DMAR interrupts are not.

For each msi and ioapic interrupt source, the iommu cookie is added,
which is in fact index of the IRE (interrupt remap entry) in the IR
table. Cookie is made at the source allocation time, and then used at
the map time to fill both IRE and device registers. The MSI
address/data registers and IO-APIC redirection registers are
programmed with the special values which are recognized by IR and used
to restore the IRE index, to find proper delivery mode and target.
Map all MSI interrupts in the block when msi_map() is called.

Since an interrupt source setup and dismantle code are done in the
non-sleepable context, flushing interrupt entries cache in the IR
hardware, which is done async and ideally waits for the interrupt,
requires busy-wait for queue to drain.  The dmar_qi_wait_for_seq() is
modified to take a boolean argument requesting busy-wait for the
written sequence number instead of waiting for interrupt.

Some interrupts are configured before IR is initialized, e.g. ACPI
SCI.  Add intr_reprogram() function to reprogram all already
configured interrupts, and call it immediately before an IR unit is
enabled.  There is still a small window after the IO-APIC redirection
entry is reprogrammed with cookie but before the unit is enabled, but
to fix this properly, IR must be started much earlier.

Add workarounds for 5500 and X58 northbridges, some revisions of which
have severe flaws in handling IR.  Use the same identification methods
as employed by Linux.

Review:	https://reviews.freebsd.org/D1892
Reviewed by:	neel
Discussed with:	jhb
Tested by:	glebius, pho (previous versions)
Sponsored by:	The FreeBSD Foundation
MFC after:	3 weeks
2015-03-19 13:57:47 +00:00
jfv
06710b0884 Update to the Intel ixgbe driver:
- Split the driver into independent pf and vf loadables. This is
	  in preparation for SRIOV support which will be following shortly.
	  This also allows us to keep a seperate revision control over the
	  two parts, making for easier sustaining.
	- Make the TX/RX code a shared/seperated file, in the old code base
	  the ixv code would miss fixes that went into ixgbe, this model
	  will eliminate that problem.
	- The driver loadables will now match the device names, something that
	  has been requested for some time.
	- Rather than a modules/ixgbe there is now modules/ix and modules/ixv
	- It will also be possible to make your static kernel with only one
	  or the other for streamlined installs, or both.

Enjoy!

Submitted by: jfv and erj
2015-03-17 18:32:28 +00:00