153 Commits

Author SHA1 Message Date
jhb
95a3814f21 MFamd64: Add bounds checks on addresses used with /dev/mem.
Reject attempts to read from or memory map offsets in /dev/mem that are
beyond the maximum-supported physical address of the current CPU.

Reviewed by:	kib
MFC after:	1 month
Differential Revision:	https://reviews.freebsd.org/D7408
2016-10-27 21:23:14 +00:00
kib
45100446da Follow-up to r307866:
- Make !KDB config buildable.
- Simplify interface to nmi_handle_intr() by evaluating panic_on_nmi
  in one place, namely nmi_call_kdb().  This allows to remove do_panic
  argument from the functions, and to remove i386/amd64 duplication of
  the variable and sysctl definitions.  Note that now NMI causes
  panic(9) instead of trap_fatal() reporting and then panic(9),
  consistently for NMIs delivered while CPU operated in ring 0 and 3.

Sponsored by:	The FreeBSD Foundation
MFC after:	2 weeks
2016-10-24 20:47:46 +00:00
kib
a04db702cd Handle broadcast NMIs.
On several Intel chipsets, diagnostic NMIs sent from BMC or NMIs
reporting hardware errors are broadcasted to all CPUs.

When kernel is configured to enter kdb on NMI, the outcome is
problematic, because each CPU tries to enter kdb.  All CPUs are
executing NMI handlers, which set the latches disabling the nested NMI
delivery; this means that stop_cpus_hard(), used by kdb_enter() to
stop other cpus by broadcasting IPI_STOP_HARD NMI, cannot work.  One
indication of this is the harmless but annoying diagnostic "timeout
stopping cpus".

Much more harming behaviour is that because all CPUs try to enter kdb,
and if ddb is used as debugger, all CPUs issue prompt on console and
race for the input, not to mention the simultaneous use of the ddb
shared state.

Try to fix this by introducing a pseudo-lock for simultaneous attempts
to handle NMIs.  If one core happens to enter NMI trap handler, other
cores see it and simulate reception of the IPI_STOP_HARD.  More,
generic_stop_cpus() avoids sending IPI_STOP_HARD and avoids waiting
for the acknowledgement, relying on the nmi handler on other cores
suspending and then restarting the CPU.

Since it is impossible to detect at runtime whether some stray NMI is
broadcast or unicast, add a knob for administrator (really developer)
to configure debugging NMI handling mode.

The updated patch was debugged with the help from Andrey Gapon (avg)
and discussed with him.

Sponsored by:	The FreeBSD Foundation
MFC after:	2 weeks
Differential revision:	https://reviews.freebsd.org/D8249
2016-10-24 16:40:27 +00:00
kib
559623d89a Re-apply r306516 (by cem):
Reduce the cost of TLB invalidation on x86 by using per-CPU completion flags

Reduce contention during TLB invalidation operations by using a per-CPU
completion flag, rather than a single atomically-updated variable.

On a Westmere system (2 sockets x 4 cores x 1 threads), dtrace measurements
show that smp_tlb_shootdown is about 50% faster with this patch; observations
with VTune show that the percentage of time spent in invlrng_single_page on an
interrupt (actually doing invalidation, rather than synchronization) increases
from 31% with the old mechanism to 71% with the new one.  (Running a basic file
server workload.)

Submitted by:	Anton Rang <rang at acm.org>
Reviewed by:	cem (earlier version)
Sponsored by:	Dell EMC Isilon
Differential Revision:	https://reviews.freebsd.org/D8041
2016-10-04 17:01:24 +00:00
cem
de42bf751c Revert r306516 for now, it is incomplete on i386
Noted by:	kib
2016-09-30 18:58:50 +00:00
cem
22e3a710d0 Reduce the cost of TLB invalidation on x86 by using per-CPU completion flags
Reduce contention during TLB invalidation operations by using a per-CPU
completion flag, rather than a single atomically-updated variable.

On a Westmere system (2 sockets x 4 cores x 1 threads), dtrace measurements
show that smp_tlb_shootdown is about 50% faster with this patch; observations
with VTune show that the percentage of time spent in invlrng_single_page on an
interrupt (actually doing invalidation, rather than synchronization) increases
from 31% with the old mechanism to 71% with the new one.  (Running a basic file
server workload.)

Submitted by:	Anton Rang <rang at acm.org>
Reviewed by:	cem (earlier version), kib
Sponsored by:	Dell EMC Isilon
Differential Revision:	https://reviews.freebsd.org/D8041
2016-09-30 18:12:16 +00:00
kib
3f4c126b42 Detect x2APIC mode on boot and obey it.
If BIOS performed hand-off to OS with BSP LAPIC in the x2APIC mode,
system usually consumes such configuration without a notice, since
x2APIC is turned on by OS if possible (nop).  But if BIOS
simultaneously requested OS to not use x2APIC, code assumption that
that xAPIC is active breaks.

In my opinion, we cannot safely turn off x2APIC if control is passed
in this mode.  Make madt.c ignore user or BIOS requests to turn x2APIC
off, and do not check the x2APIC black list.  Just trust the config
and try to continue, giving a warning in dmesg.

Reported and tested by:	Slawa Olhovchenkov <slw@zxy.spb.ru> (previous version)
Diagnosed by and discussed with:	avg
Sponsored by:	The FreeBSD Foundation
MFC after:	2 weeks
2016-09-19 15:58:45 +00:00
bde
b8aaa2c367 Fix decoding of tf_rsp on amd64, and move TF_HAS_STACKREGS() to the
i386-only section, and fix a comment about the amd64 kernel trapframe
not having stackregs.

tf_rsp doesn't need decoding on amd64, but had an old clone of i386
code to do this in 1 place, and since the amd64 kernel trapframe does
have stackregs, the result was an off-by-16 error for %rsp in an error
message.
2016-09-16 07:09:35 +00:00
jhb
bc4a384597 Remove 'cpu' and 'cpu_class' on amd64.
The 'cpu' and 'cpu_class' variables were always set to the same value
on amd64 and are legacy holdovers from i386.  Remove them entirely on
amd64.

Reviewed by:	imp, kib (older version)
Differential Revision:	https://reviews.freebsd.org/D7888
2016-09-15 17:05:54 +00:00
bde
d58cd5baa4 Use the MI macro TRAPF_USERMODE() instead of open-coded checks for
SEL_UPL and sometimes PSL_VM.  This is just a style change on amd64,
but on i386 it fixes 1 unimportant place where the PSL_VM check was
missing and starts fixing 1 important place where the PSL_VM check
had a logic error.

Fix logic errors in treating vm86 bioscall mode as kernel mode.  The
main place checked all the necessary flags, but put the necessary
parentheses for the PSL_VM and PCB_VM86CALL checks in the wrong
place.  The broken case is only reached if a vm86 bioscall uses a
%cs which is nonzero mod 4, but that is unusual -- most bios calls
start with %cs = 0xc000 or 0xf000 and rarely change it.  Another
place was missing the check for PCB_VM86CALL, but was only reachable
if there are bugs virtualizing PSL_I.

Add a macro TF_HAS_STACKREGS() and use this instead of converting
open-coded checks of SEL_UPL, etc. to TRAPF_USERMODE() when we only
care about whether the frame has stack registers.  This fixes 3
places in my recent fix for register variables in vm86 mode where I
messed up the PSL_VM check and cleans up other places.
2016-09-14 12:57:40 +00:00
kib
e56264ca17 Implement userspace gettimeofday(2) with HPET timecounter.
Right now, userspace (fast) gettimeofday(2) on x86 only works for
RDTSC.  For older machines, like Core2, where RDTSC is not C2/C3
invariant, and which fall to HPET hardware, this means that the call
has both the penalty of the syscall and of the uncached hw behind the
QPI or PCIe connection to the sought bridge.  Nothing can me done
against the access latency, but the syscall overhead can be removed.
System already provides mappable /dev/hpetX devices, which gives
straight access to the HPET registers page.

Add yet another algorithm to the x86 'vdso' timehands. Libc is updated
to handle both RDTSC and HPET.  For HPET, the index of the hpet device
to mmap is passed from kernel to userspace, index might be changed and
libc invalidates its mapping as needed.

Remove cpu_fill_vdso_timehands() KPI, instead require that
timecounters which can be used from userspace, to provide
tc_fill_vdso_timehands{,32}() methods.  Merge i386 and amd64
libc/<arch>/sys/__vdso_gettc.c into one source file in the new
libc/x86/sys location.  __vdso_gettc() internal interface is changed
to move timecounter algorithm detection into the MD code.

Measurements show that RDTSC even with the syscall overhead is faster
than userspace HPET access.  But still, userspace HPET is three-four
times faster than syscall HPET on several Core2 and SandyBridge
machines.

Tested by:	Howard Su <howard0su@gmail.com>
Sponsored by:	The FreeBSD Foundation
MFC after:	1 month
Differential revision:	https://reviews.freebsd.org/D7473
2016-08-17 09:52:09 +00:00
ed
79cf319bae Implement _ALIGN() using internal integer types.
The existing version depends on register_t and uintptr_t, which are only
available when including headers such as <sys/types.h>. As this macro is
used by <sys/socket.h>, for example, it should be written in such a way
that it doesn't depend on those types.
2016-05-31 13:31:19 +00:00
ed
703fbbe36f Add missing dependency on <machine/_limits.h>.
In r227474, this header file was changed to define SIG_ATOMIC_{MIN,MAX}
in terms of LONG_{MIN,MAX}. Unlike all of the definitions in this header
file, LONG_{MIN,MAX} is provided by <limits.h>. Remove the dependency on
<limits.h> by using __LONG_{MIN,MAX} instead and including
<machine/_limits.h>.

This change is needed to make SIG_ATOMIC_{MIN,MAX} work without
including any other header files.
2016-05-31 08:38:24 +00:00
ed
b881bf575c Add missing dependency on <machine/_limits.h>.
This header uses __INT_MIN and __INT_MAX, which is provided by
<machine/_limits.h>. This is needed to make <stdint.h>'s WCHAR_MIN and
WCHAR_MAX work without including other headers as well.
2016-05-31 08:36:39 +00:00
sephe
1d0f0760f8 hyperv/vmbus: Rename ISR functions
MFC after:	1 week
Sponsored by:	Microsoft OSTC
Differential Revision:	https://reviews.freebsd.org/D6601
2016-05-31 04:47:53 +00:00
kib
cde0a91a26 Add x86 CPU features definitions published in the Intel SDM rev. 58.
Sponsored by:	The FreeBSD Foundation
MFC after:	1 week
2016-04-16 06:07:13 +00:00
avg
f7d20d3734 re-enable AMD Topology extension on certain models if disabled by BIOS
Some BIOSes disable AMD Topology extension on AMD Family 15h notebook
processors.  We re-enable the extension, so that we can properly discover
core and cache topology.  Linux seems to do the same.

Reported by:	Johannes Dieterich <dieterich.joh@gmail.com>
Reviewed by:	jhb, kib
Tested by:	Johannes Dieterich <dieterich.joh@gmail.com>
		(earlier version)
MFC after:	3 weeks
Differential Revision:	https://reviews.freebsd.org/D5883
2016-04-12 13:30:39 +00:00
kib
eb986c64f5 Type of the interrupt handlers on x86 cannot be expressed in C.
Simplify and unify placeholder type definitions.

Reviewed by:	jhb
Sponsored by:	The FreeBSD Foundation
Differential revision:	https://reviews.freebsd.org/D5771
2016-03-29 19:56:48 +00:00
kib
54481719ca Add defines for the LAPIC TSC deadline timer mode. The LVT timer mode
field is two-bit, extend the mask.

Also add comments about all MSRs writes to which are not serializing.

Sponsored by:	The FreeBSD Foundation
2016-03-28 09:43:40 +00:00
kib
a05a278552 POSIX states that #include <signal.h> shall make both mcontext_t and
ucontext_t available.  Our code even has XXX comment about this.

Add a bit of compliance by moving struct __ucontext definition into
sys/_ucontext.h and including it into signal.h and sys/ucontext.h.

Several machine/ucontext.h headers were changed to use namespace-safe
types (like uint64_t->__uint64_t) to not depend on sys/types.h.
struct __stack_t from sys/signal.h is made always visible in private
namespace to satisfy sys/_ucontext.h requirements.

Apparently mips _types.h pollutes global namespace with f_register_t
type definition.  This commit does not try to fix the issue.

PR:	207079
Reported and tested by:	Ting-Wei Lan <lantw44@gmail.com>
Sponsored by:	The FreeBSD Foundation
MFC after:	2 weeks
2016-02-12 07:38:19 +00:00
jhibbits
31bb8ee5bd Convert rman to use rman_res_t instead of u_long
Summary:
Migrate to using the semi-opaque type rman_res_t to specify rman resources.  For
now, this is still compatible with u_long.

This is step one in migrating rman to use uintmax_t for resources instead of
u_long.

Going forward, this could feasibly be used to specify architecture-specific
definitions of resource ranges, rather than baking a specific integer type into
the API.

This change has been broken out to facilitate MFC'ing drivers back to 10 without
breaking ABI.

Reviewed By: jhb
Sponsored by:	Alex Perez/Inertial Computing
Differential Revision: https://reviews.freebsd.org/D5075
2016-01-27 02:23:54 +00:00
emaste
2029b75c0e Move amd64 metadata.h to x86 and share with i386
MFC after:	1 week
2016-01-07 19:47:26 +00:00
kib
65cfa1c59d Add standard extended feature bit 6 from the Intel SDM rev. 57, which
indicates that data-pointer in the saved x87 FPU state is only updated
on FPU exceptions.

Sponsored by:	The FreeBSD Foundation
MFC after:	1 week
2015-12-29 22:14:21 +00:00
jhb
994c23f093 Move shared variables from {amd64,i386}/initcpu.c to x86/identcpu.c.
While here, move the common bits of <machine/cputypes.h> to
<x86/cputypes.h> as well.

Reviewed by:	kib
Differential Revision:	https://reviews.freebsd.org/D4670
2015-12-23 21:41:42 +00:00
cem
ff6912bd6b x86: Add CPUID_STDEXT_* macros for CPU feature bits
A follow-up to r292478 and r292488.

Sponsored by:	EMC / Isilon Storage Division
2015-12-21 04:42:58 +00:00
kib
f124247e27 Merge common parts of i386 and amd64 md_var.h and smp.h into
new headers x86/include x86_var.h and x86_smp.h.

Reviewed by:	emaste, jhb
Sponsored by:	The FreeBSD Foundation
Differential revision:	https://reviews.freebsd.org/D4358
2015-12-07 17:41:20 +00:00
royger
ee786e40e8 xen: Code cleanup and small bug fixes
xen/hypervisor.h:
 - Remove unused helpers: MULTI_update_va_mapping, is_initial_xendomain,
   is_running_on_xen
 - Remove unused define CONFIG_X86_PAE
 - Remove unused variable xen_start_info: note that it's used inpcifront
   which is not built at all
 - Remove forward declaration of HYPERVISOR_crash

xen/xen-os.h:
 - Remove unused define CONFIG_X86_PAE
 - Drop unused helpers: test_and_clear_bit, clear_bit,
   force_evtchn_callback
 - Implement a generic version (based on ofed/include/linux/bitops.h) of
   set_bit and test_bit and prefix them by xen_ to avoid any use by other
   code than Xen. Note that It would be worth to investigate a generic
   implementation in FreeBSD.
 - Replace barrier() by __compiler_membar()
 - Replace cpu_relax() by cpu_spinwait(): it's exactly the same as rep;nop
   = pause

xen/xen_intr.h:
 - Move the prototype of xen_intr_handle_upcall in it: Use by all the
   platform

x86/xen/xen_intr.c:
 - Use BITSET* for the enabledbits: Avoid to use custom helpers
 - test_bit/set_bit has been renamed to xen_test_bit/xen_set_bit
 - Don't export the variable xen_intr_pcpu

dev/xen/blkback/blkback.c:
 - Fix the string format when XBB_DEBUG is enabled: host_addr is typed
   uint64_t

dev/xen/balloon/balloon.c:
 - Remove set but not used variable
 - Use the correct type for frame_list: xen_pfn_t represents the frame
   number on any architecture

dev/xen/control/control.c:
 - Return BUS_PROBE_WILDCARD in xs_probe: Returning 0 in a probe callback
   means the driver can handle this device. If by any chance xenstore is the
   first driver, every new device with the driver is unset will use
   xenstore.

dev/xen/grant-table/grant_table.c:
 - Remove unused cmpxchg
 - Drop unused include opt_pmap.h: Doesn't exist on ARM64 and it doesn't
   contain anything required for the code on x86

dev/xen/netfront/netfront.c:
 - Use the correct type for rx_pfn_array: xen_pfn_t represents the frame
   number on any architecture

dev/xen/netback/netback.c:
 - Use the correct type for gmfn: xen_pfn_t represents the frame number on
   any architecture

dev/xen/xenstore/xenstore.c:
 - Return BUS_PROBE_WILDCARD in xctrl_probe: Returning 0 in a probe callback
   means the driver can handle this device. If by any chance xenstore is the
  first driver, every new device with the driver is unset will use xenstore.

Note that with the changes, x86/include/xen/xen-os.h doesn't contain anymore
arch-specific code. Although, a new series will add some helpers that differ
between x86 and ARM64, so I've kept the headers for now.

Submitted by:		Julien Grall <julien.grall@citrix.com>
Reviewed by:		royger
Differential Revision:	https://reviews.freebsd.org/D3921
Sponsored by:		Citrix Systems R&D
2015-10-21 10:44:07 +00:00
royger
d31f5fb7b9 x86/xen: Consolidate xen-os.h in a single place
amd64 and i386 platform code contain very similar xen/xen-os.h

The only differences are:
 - Functions/variables/types which were unused in i386/xen/xen-os.h:
    * xen_xchg
    * __xchg_dummy
    * __xg
    * __xchg
    * atomic_t
    * atomic_inc
    * rdtscll

The functions/variables/types unused in xen-os.h can be dropped and there
is no more differences betwen amd64 and i386.

The new header is placed in x86/include/xen and each platform will have
dummy headers include x86/xen/*.h. This is to be able to include
machine/xen/*.h in the PV drivers.

Submitted by:		Julien Grall <julien.grall@citrix.com>
Reviewed by:		royger
Differential Revision:	https://reviews.freebsd.org/D3880
Sponsored by:		Citrix Systems R&D
2015-10-21 10:04:35 +00:00
markj
e8967c8bd9 Add stack_save_td_running(), a function to trace the kernel stack of a
running thread.

It is currently implemented only on amd64 and i386; on these
architectures, it is implemented by raising an NMI on the CPU on which
the target thread is currently running. Unlike stack_save_td(), it may
fail, for example if the thread is running in user mode.

This change also modifies the kern.proc.kstack sysctl to use this function,
so that stacks of running threads are shown in the output of "procstat -kk".
This is handy for debugging threads that are stuck in a busy loop.

Reviewed by:	bdrewery, jhb, kib
Sponsored by:	EMC / Isilon Storage Division
Differential Revision:	https://reviews.freebsd.org/D3256
2015-09-11 03:54:37 +00:00
markj
b30bc0e211 Remove the arg0 field from struct amd64_frame. Its existence was a bug,
since on amd64 the first argument to a function is generally not on the
stack.

Revert an old DTrace bug fix to some code that assumed that
sizeof(struct amd64_frame) == 16.

Reviewed by:	jhb, kib
Sponsored by:	EMC / Isilon Storage Division
Differential Revision:	https://reviews.freebsd.org/D3255
2015-09-11 03:31:22 +00:00
markj
dfb0cc5c03 Merge stack(9) implementations for i386 and amd64 under x86/.
Reviewed by:	jhb, kib
Sponsored by:	EMC / Isilon Storage Division
Differential Revision:	https://reviews.freebsd.org/D3255
2015-09-11 03:24:07 +00:00
imp
9b66762ac3 Add missing ofw_machdep.h. Make x86 ofw_machdep.h work pc98 too.
This allows the owc module to compile on pc98 and seems preferable to
adding another special case in the build system.
2015-08-28 15:41:09 +00:00
marcel
bacabe8a7e Better support memory mapped console devices, such as VGA and EFI
frame buffers and memory mapped UARTs.

1.  Delay calling cninit() until after pmap_bootstrap(). This makes
    sure we have PMAP initialized enough to add translations. Keep
    kdb_init() after cninit() so that we have console when we need
    to break into the debugger on boot.
2.  Unfortunately, the ATPIC code had be moved as well so as to
    avoid a spurious trap #30. The reason for which is not known
    at this time.
3.  In pmap_mapdev_attr(), when we need to map a device prior to the
    VM system being initialized, use virtual_avail as the KVA to map
    the device at. In particular, avoid using the direct map on amd64
    because we can't demote by virtue of not being able to allocate
    yet. Keep track of the translation.
    Re-use the translation after the VM has been initialized to not
    waste KVA and to satisfy the assumption in uart(4) that the handle
    returned for the low-level console is the same as later returned
    when the device is probed and attached.
4.  In pmap_unmapdev() remove the mapping from the table when called
    pre-init. Otherwise keep the mapping. During bus probe and attach
    device resources are mapped and unmapped multiple times, which
    would have us destroy the mapping used by the low-level console.
5.  In pmap_init(), set pmap_initialized to signal that we're not
    pre-init anymore. On amd64, bring the direct map in sync with the
    translations created at that time.
6.  Implement bus_space_map() and bus_space_unmap() for real: when
    the tag corresponds to memory space, call the corresponding
    pmap_mapdev() and pmap_unmapdev() functions to construct and
    actual handle.
7.  In efifb.c and vt_vga.c, remove the crutches and hacks and simply
    call pmap_mapdev_attr() or bus_space_map() as desired.

Notes:
1.  uart(4) already used bus_space_map() during low-level console
    setup but since serial ports have traditionally been I/O port
    based, the lack of a proper implementation for said function
    was not a problem. It has always supported memory mapped UARTs
    for low-level consoles by setting hw.uart.console accordingly.
2.  The use of the direct map on amd64 without setting caching
    attributes has been a bigger problem than previously thought.
    This change has the fortunate (and unexpected) side-effect of
    fixing various EFI frame buffer problems (though not all).

PR: 191564, 194952

Special thanks to:
1.  XipLink, Inc -- generously donated an Intel Bay Trail E3800
    based eval board (ADLE3800PC).
2.  The FreeBSD Foundation, in particular emaste@ -- for UEFI
    support in general and testing.
3.  Everyone who tested the proposed for PR 191564.
4.  jhb@ and kib@ for being a soundboard and applying a clue bat
    if so needed.
2015-08-12 15:26:32 +00:00
jkim
1b47007c5a Fix more style issues.
Submitted by:	bde
2015-08-05 17:21:42 +00:00
emaste
50ae188f8f Rationalize BSD license on sys/*/include/float.h
Remove the advertising clause from the Regents of the University of
California's license, per the letter dated July 22, 1999.

Update clause numbering.
2015-08-05 17:05:35 +00:00
jkim
a9c8f64e12 Fix style(9) bugs. 2015-08-04 18:59:54 +00:00
jkim
c6a4c72646 Always define __va_list for amd64 and restore pre-r232261 behavior for i386.
Note it allows exotic compilers, e.g., TCC, to build with our stdio.h, etc.

PR:		201749
MFC after:	1 week
2015-08-04 00:11:39 +00:00
kib
4694db8d4e Add bit names for the IA32_MISC_ENABLE msr.
Sponsored by:	The FreeBSD Foundation
MFC after:	1 week
2015-07-28 06:55:08 +00:00
kib
9b07cc4555 Add x86 PT_GETFSBASE, PT_GETGSBASE machine-depended ptrace requests to
obtain the thread %fs and %gs bases.  Add x86 PT_SETFSBASE and
PT_SETGSBASE requests to set the bases from debuggers.  The set
requests, similarly to the sysarch({I386,AMD64}_SET_FSBASE),
override the corresponding segment registers.

The main purpose of the operations is to retrieve and modify the tcb
address for debuggee.

Sponsored by:	The FreeBSD Foundation
MFC after:	2 weeks
2015-06-29 07:07:24 +00:00
kib
3fb738761e Rewrite amd64 PCID implementation to follow an algorithm described in
the Vahalia' "Unix Internals" section 15.12 "Other TLB Consistency
Algorithms".  The same algorithm is already utilized by the MIPS pmap
to handle ASIDs.

The PCID for the address space is now allocated per-cpu during context
switch to the thread using pmap, when no PCID on the cpu was ever
allocated, or the current PCID is invalidated.  If the PCID is reused,
bit 63 of %cr3 can be set to avoid TLB flush.

Each cpu has PCID' algorithm generation count, which is saved in the
pmap pcpu block when pcpu PCID is allocated.  On invalidation, the
pmap generation count is zeroed, which signals the context switch code
that already allocated PCID is no longer valid.  The implication is
the TLB shootdown for the given cpu/address space, due to the
allocation of new PCID.

The pm_save mask is no longer has to be tracked, which (significantly)
reduces the targets of the TLB shootdown IPIs.  Previously, pm_save
was reset only on pmap_invalidate_all(), which made it accumulate the
cpuids of all processors on which the thread was scheduled between
full TLB shootdowns.

Besides reducing the amount of TLB shootdowns and removing atomics to
update pm_saves in the context switch code, the algorithm is much
simpler than the maintanence of pm_save and selection of the right
address space in the shootdown IPI handler.

Reviewed by:	alc
Tested by:	pho
Sponsored by:	The FreeBSD Foundation
MFC after:	3 weeks
2015-05-09 19:11:01 +00:00
kib
6006bf3a7d If x86 CPU implementation of the MWAIT instruction reasonably
interacts with interrupts, query ACPI and use MWAIT for entrance into
Cx sleep states.  Support C1 "I/O then halt" mode.  See Intel'
document 302223-007 "Intelб╝ Processor Vendor-Specific ACPI Interface
Specification" for description.

Move the acpi_cpu_c1() function into x86/cpu_machdep.c and use
it instead of inlining "sti; hlt" sequence in several places.

In the acpi(4) man page, besides documenting the dev.cpu.N.cx_methods
sysctl, correct the names for dev.cpu.N.{cx_usage,cx_lowest,cx_supported}
sysctls.

Both jkim and avg have some other patches implementing the mwait
functionality; this work is unrelated.  Linux does not rely on the
ACPI to provide correct tables describing Cx modes.  Instead, the
driver has pre-defined knowledge of the CPU models, it was supplied by
Intel.

Tested by:    pho (previous versions)
Sponsored by:	The FreeBSD Foundation
2015-05-09 12:28:48 +00:00
neel
4d099a301f Add macros for AMD-specific bits in MSR_EFER: LMSLE, FFXSR and TCE.
AMDID_FFXSR is at bit 25 so correct its value to 0x02000000.

MFC after:	1 week
2015-05-06 05:12:29 +00:00
jhb
9c4c8b62fb Remove support for Xen PV domU kernels. Support for HVM domU kernels
remains.  Xen is planning to phase out support for PV upstream since it
is harder to maintain and has more overhead.  Modern x86 CPUs include
virtualization extensions that support HVM guests instead of PV guests.
In addition, the PV code was i386 only and not as well maintained recently
as the HVM code.
- Remove the i386-only NATIVE option that was used to disable certain
  components for PV kernels.  These components are now standard as they
  are on amd64.
- Remove !XENHVM bits from PV drivers.
- Remove various shims required for XEN (e.g. PT_UPDATES_FLUSH, LOAD_CR3,
  etc.)
- Remove duplicate copy of <xen/features.h>.
- Remove unused, i386-only xenstored.h.

Differential Revision:	https://reviews.freebsd.org/D2362
Reviewed by:	royger
Tested by:	royger (i386/amd64 HVM domU and amd64 PVH dom0)
Relnotes:	yes
2015-04-30 15:48:48 +00:00
whu
0b81a49657 Microsoft vmbus, storage and other related driver enhancements for HyperV.
- Vmbus multi channel support.
    - Vector interrupt support.
    - Signal optimization.
    - Storvsc driver performance improvement.
    - Scatter and gather support for storvsc driver.
    - Minor bug fix for KVP driver.
Thanks royger, jhb and delphij from FreeBSD community for the reviews
and comments. Also thanks Hovy Xu from NetApp for the contributions to
the storvsc driver.

PR:     195238
Submitted by:   whu
Reviewed by:    royger, jhb, delphij
Approved by:    royger
MFC after:      2 weeks
Relnotes:       yes
Sponsored by:   Microsoft OSTC
2015-04-29 10:12:34 +00:00
jhb
e4683250d1 Reassign copyright statements on several files from Advanced
Computing Technologies LLC to Hudson River Trading LLC.

Approved by:	Hudson River Trading LLC (who owns ACT LLC)
MFC after:	1 week
2015-04-23 14:22:20 +00:00
kib
ed2e42d42c Revert unrelated chunk from the r281707.
MFC after:	2 weeks
2015-04-18 21:27:28 +00:00
kib
32cf8052a6 Remove lazy pmap switch code from i386. Naive benchmark with md(4)
shows no difference with the code removed.

On both amd64 and i386, assert that a released pmap is not active.

Proposed and reviewed by:	alc
Discussed with:	Svatopluk Kraus <onwahe@gmail.com>, peter
Sponsored by:	The FreeBSD Foundation
MFC after:	2 weeks
2015-04-18 21:23:16 +00:00
jhb
148355cbb6 Move the 32-bit compatible procfs types from freebsd32.h to <sys/procfs.h>
and export them to userland.
- Define __HAVE_REG32 on platforms that define a reg32 structure and check
  for this in <sys/procfs.h> to control when to export prstatus32, etc.
- Add prstatus32_t and prpsinfo32_t typedefs for the 32-bit structures.
  libbfd looks for these types, and having them fixes 'gcore' in gdb of a
  32-bit process on a 64-bit platform.
- Use the structure definitions from <sys/procfs.h> in gcore's elf32 core
  dump code instead of duplicating the definitions.

Differential Revision:	https://reviews.freebsd.org/D2142
Reviewed by:	kib, nathanw (powerpc bits)
MFC after:	1 week
2015-04-08 16:30:45 +00:00
kib
b178649de8 Use VT-d interrupt remapping block (IR) to perform FSB messages
translation.  In particular, despite IO-APICs only take 8bit apic id,
IR translation structures accept 32bit APIC Id, which allows x2APIC
mode to function properly.  Extend msi_cpu of struct msi_intrsrc and
io_cpu of ioapic_intsrc to full int from one byte.

KPI of IR is isolated into the x86/iommu/iommu_intrmap.h, to avoid
bringing all dmar headers into interrupt code. The non-PCI(e) devices
which generate message interrupts on FSB require special handling. The
HPET FSB interrupts are remapped, while DMAR interrupts are not.

For each msi and ioapic interrupt source, the iommu cookie is added,
which is in fact index of the IRE (interrupt remap entry) in the IR
table. Cookie is made at the source allocation time, and then used at
the map time to fill both IRE and device registers. The MSI
address/data registers and IO-APIC redirection registers are
programmed with the special values which are recognized by IR and used
to restore the IRE index, to find proper delivery mode and target.
Map all MSI interrupts in the block when msi_map() is called.

Since an interrupt source setup and dismantle code are done in the
non-sleepable context, flushing interrupt entries cache in the IR
hardware, which is done async and ideally waits for the interrupt,
requires busy-wait for queue to drain.  The dmar_qi_wait_for_seq() is
modified to take a boolean argument requesting busy-wait for the
written sequence number instead of waiting for interrupt.

Some interrupts are configured before IR is initialized, e.g. ACPI
SCI.  Add intr_reprogram() function to reprogram all already
configured interrupts, and call it immediately before an IR unit is
enabled.  There is still a small window after the IO-APIC redirection
entry is reprogrammed with cookie but before the unit is enabled, but
to fix this properly, IR must be started much earlier.

Add workarounds for 5500 and X58 northbridges, some revisions of which
have severe flaws in handling IR.  Use the same identification methods
as employed by Linux.

Review:	https://reviews.freebsd.org/D1892
Reviewed by:	neel
Discussed with:	jhb
Tested by:	glebius, pho (previous versions)
Sponsored by:	The FreeBSD Foundation
MFC after:	3 weeks
2015-03-19 13:57:47 +00:00
neel
771dc2cc82 Add x86 specific APIs 'lapic_ipi_alloc()' and 'lapic_ipi_free()' to allow IPI
vectors to be dynamically allocated. This allows kernel modules like vmm.ko
to allocate unique IPI slots when loaded (as opposed to hard allocating one
or more vectors).

Also, reorganize the fixed IPI vectors to create a contiguous space for
dynamic IPI allocation.

Reviewed by:	kib, jhb
Differential Revision:	https://reviews.freebsd.org/D2042
2015-03-14 00:30:41 +00:00