Commit Graph

624 Commits

Author SHA1 Message Date
John Baldwin
cbc4d2db75 Remove taskqueue_enqueue_fast().
taskqueue_enqueue() was changed to support both fast and non-fast
taskqueues 10 years ago in r154167.  It has been a compat shim ever
since.  It's time for the compat shim to go.

Submitted by:	Howard Su <howard0su@gmail.com>
Reviewed by:	sephe
Differential Revision:	https://reviews.freebsd.org/D5131
2016-03-01 17:47:32 +00:00
Justin Hibbits
e665eafb25 Correct the memory rman ranges to be to BUS_SPACE_MAXADDR
Summary:
As part of the migration of rman_res_t to be typed to uintmax_t, memory ranges
must be clamped appropriately for the bus, to prevent completely bogus addresses
from being used.

This is extracted from D4544.

Reviewed By: cem
Sponsored by:	Alex Perez/Inertial Computing
Differential Revision: https://reviews.freebsd.org/D5134
2016-03-01 02:59:06 +00:00
Jung-uk Kim
0eda5b3f23 Silence PVS-Studio warning (V595). It can never be NULL here. 2016-02-23 23:57:24 +00:00
Svatopluk Kraus
a1e1814d76 As <machine/pmap.h> is included from <vm/pmap.h>, there is no need to
include it explicitly when <vm/pmap.h> is already included.

Reviewed by:	alc, kib
Differential Revision:	https://reviews.freebsd.org/D5373
2016-02-22 09:02:20 +00:00
Konstantin Belousov
2fe1339ea2 Some BIOSes ACPI bytecode needs to take (sleepable) acpi mutex for
acpi_GetInteger() execution.  Intel DMAR interrupt remapping code
needs to know UID of the HPET to properly route the FSB interrupts
from the HPET, even when interrupt remapping is disabled, and the code
is executed under some non-sleepable mutexes.

Cache HPET UIDs in the device softc at the attach time and provide
lock-less method to get UID, use the method from the dmar hpet
handling code instead of calling GetInteger().

Reported and tested by:	Larry Rosenman <ler@lerctr.org>
Sponsored by:	The FreeBSD Foundation
MFC after:	1 week
2016-02-20 13:37:04 +00:00
Justin Hibbits
7915adb560 Introduce a RMAN_IS_DEFAULT_RANGE() macro, and use it.
This simplifies checking for default resource range for bus_alloc_resource(),
and improves readability.

This is part of, and related to, the migration of rman_res_t from u_long to
uintmax_t.

Discussed with:	jhb
Suggested by:	marcel
2016-02-20 01:32:58 +00:00
Konstantin Belousov
90edf67ecf POSIX states that #include <signal.h> shall make both mcontext_t and
ucontext_t available.  Our code even has XXX comment about this.

Add a bit of compliance by moving struct __ucontext definition into
sys/_ucontext.h and including it into signal.h and sys/ucontext.h.

Several machine/ucontext.h headers were changed to use namespace-safe
types (like uint64_t->__uint64_t) to not depend on sys/types.h.
struct __stack_t from sys/signal.h is made always visible in private
namespace to satisfy sys/_ucontext.h requirements.

Apparently mips _types.h pollutes global namespace with f_register_t
type definition.  This commit does not try to fix the issue.

PR:	207079
Reported and tested by:	Ting-Wei Lan <lantw44@gmail.com>
Sponsored by:	The FreeBSD Foundation
MFC after:	2 weeks
2016-02-12 07:38:19 +00:00
Justin Hibbits
2dd1bdf183 Convert rman to use rman_res_t instead of u_long
Summary:
Migrate to using the semi-opaque type rman_res_t to specify rman resources.  For
now, this is still compatible with u_long.

This is step one in migrating rman to use uintmax_t for resources instead of
u_long.

Going forward, this could feasibly be used to specify architecture-specific
definitions of resource ranges, rather than baking a specific integer type into
the API.

This change has been broken out to facilitate MFC'ing drivers back to 10 without
breaking ABI.

Reviewed By: jhb
Sponsored by:	Alex Perez/Inertial Computing
Differential Revision: https://reviews.freebsd.org/D5075
2016-01-27 02:23:54 +00:00
Sepherosa Ziehau
69a53a7a3a hyperv: use x86 generic code to do the hypervisor detection
This is first step to move the generic part of HV code into kernel instead
of module, so that it is possible to use hypercall to implement some other
paravirtualization code in the kernel.

Submitted by:		Howard Su <howard0su@gmail.com>
Reviewed by:		royger, delphij, adrian
Approved by:		adrian (mentor)
Sponsored by:		Microsoft OSTC
Differential Revision:	https://reviews.freebsd.org/D3072
2016-01-14 02:50:13 +00:00
Ed Maste
0e42ee5dd8 Move amd64 metadata.h to x86 and share with i386
MFC after:	1 week
2016-01-07 19:47:26 +00:00
Ian Lepore
69dcb7e771 Make the 'env' directive described in config(5) work on all architectures,
providing compiled-in static environment data that is used instead of any
data passed in from a boot loader.

Previously 'env' worked only on i386 and arm xscale systems, because it
required the MD startup code to examine the global envmode variable and
decide whether to use static_env or an environment obtained from the boot
loader, and set the global kern_envp accordingly.  Most startup code wasn't
doing so.  Making things even more complex, some mips startup code uses an
alternate scheme that involves calling init_static_kenv() to pass an empty
buffer and its size, then uses a series of kern_setenv() calls to populate
that buffer.

Now all MD startup code calls init_static_kenv(), and that routine provides
a single point where envmode is checked and the decision is made whether to
use the compiled-in static_kenv or the values provided by the MD code.

The routine also continues to serve its original purpose for mips; if a
non-zero buffer size is passed the routine installs the empty buffer ready
to accept kern_setenv() values.  Now if the size is zero, the provided buffer
full of existing env data is installed.  A NULL pointer can be passed if the
boot loader provides no env data; this allows the static env to be installed
if envmode is set to do so.

Most of the work here is a near-mechanical change to call the init function
instead of directly setting kern_envp.  A notable exception is in xen/pv.c;
that code was originally installing a buffer full of preformatted env data
along with its non-zero size (like mips code does), which would have allowed
kern_setenv() calls to wipe out the preformatted data.  Now it passes a zero
for the size so that the buffer of data it installs is treated as
non-writeable.
2016-01-02 02:53:48 +00:00
Konstantin Belousov
6b247f858e Add standard extended feature bit 6 from the Intel SDM rev. 57, which
indicates that data-pointer in the saved x87 FPU state is only updated
on FPU exceptions.

Sponsored by:	The FreeBSD Foundation
MFC after:	1 week
2015-12-29 22:14:21 +00:00
John Baldwin
9e8d8b4b0c Move shared variables from {amd64,i386}/initcpu.c to x86/identcpu.c.
While here, move the common bits of <machine/cputypes.h> to
<x86/cputypes.h> as well.

Reviewed by:	kib
Differential Revision:	https://reviews.freebsd.org/D4670
2015-12-23 21:41:42 +00:00
Enji Cooper
b59f7a7ad8 Remove redundant declarations in sys/x86/xen which are now handled in other sys/x86
headers

Differential Revision: https://reviews.freebsd.org/D4685
X-MFC with: r291949
Sponsored by: EMC / Isilon Storage Division
2015-12-23 17:43:55 +00:00
Conrad Meyer
986fd63b46 x86: Add CPUID_STDEXT_* macros for CPU feature bits
A follow-up to r292478 and r292488.

Sponsored by:	EMC / Isilon Storage Division
2015-12-21 04:42:58 +00:00
Conrad Meyer
ce43b54ab2 x86: Detect feature flags "AVX512DQ", "AVX512IFMA", "AVX512BW", "AVX512VBMI"
Documented in Intel Architecture Set Extensions Programming Reference
(319433-023).

Sponsored by:	EMC / Isilon Storage Division
2015-12-20 03:34:30 +00:00
Conrad Meyer
f750a7edaa x86: Detect feature flags "CLWB" and "PCOMMIT"
"The availability of CLWB instruction is indicated by the presence of
the CPUID feature flag CLWB (bit 24 of the EBX register)."

CLWB is similar to CLFLUSHOPT, except that it is not required to discard
cacheline contents.

"On processors that supports PCOMMIT, PCOMMIT is enumerated through
CPUID (CPUID.7.0.EBX[22]) only when the feature is enabled by BIOS."

PCOMMIT is used to cause store-to-memory operations to become persistent
(protected from power failure).

Sponsored by:	EMC / Isilon Storage Division
2015-12-19 20:47:15 +00:00
Roger Pau Monné
a7285da666 x86/bounce: try to always completely fill bounce pages
Current code doesn't try to make use of the full page when bouncing because
the size is only expanded to be a multiple of the alignment. Instead try to
always create segments of PAGE_SIZE when using bounce pages.

This allows us to remove the specific casing done for
BUS_DMA_KEEP_PG_OFFSET, since the requirement is to make sure the offsets
into contiguous segments are aligned, and now this is done by default.

Sponsored by:		Citrix Systems R&D
Reviewed by:		hps, kib
Differential revision:	https://reviews.freebsd.org/D4119
2015-12-15 10:07:03 +00:00
Konstantin Belousov
7c958a41fe Merge common parts of i386 and amd64 md_var.h and smp.h into
new headers x86/include x86_var.h and x86_smp.h.

Reviewed by:	emaste, jhb
Sponsored by:	The FreeBSD Foundation
Differential revision:	https://reviews.freebsd.org/D4358
2015-12-07 17:41:20 +00:00
Konstantin Belousov
9a5d210cb4 It seems that at least some KVM versions advertise support for EIO
suppression but the version of the IOAPIC reported is 0x11 and neither
IOAPIC EOIR nor the Linux trick of temporal reprogramming of the pin
to edge-trigger mode to issue EOI work.

Disable eoi suppression if KVM is detected.  The mode can still be
forced with the tunable.

Reported and tested by:	Roman Mamontov <mr.xanto@gmail.com>
Sponsored by:	The FreeBSD Foundation
2015-12-05 08:52:37 +00:00
Konstantin Belousov
27691a24ab For amd64 non-PCID machines, and for i386 machines with support for
the PG_G global pte flag, pmap_invalidate_all() fails to flush global
TLB entries [*].  This is because TLB shootdown handler for such
configs reloads CR3, and on i386 pmap_invalidate_all() does the same
for the initiating CPU.  Note that current code does not issue total
invalidation requests for the kernel_pmap.

Rename amd64 function invltlb_globpcid() to invltlb_glob(), it is not
specific for PCID for quite some time, and implement the same
functionality for i386.  Use the function instead of invltlb() in
shootdown handlers and in i386 pmap_invalidate_all(), but only for the
kernel pmap (which maps pages with the PG_G attribute set), which
takes care of PG_G TLB entries on flush.

To detect the affected pmap in i386 TLB shootdown handler, pmap should
be passed to the smp_masked_invltlb() function, which makes amd64 and
i386 TLB shootdown code almost identical.  Merge the code under x86/.

Noted by:	jhb [*]
Reviewed by:	cem, jhb, pho
Tested by:	pho
Sponsored by:	The FreeBSD Foundation
Differential revision:	https://reviews.freebsd.org/D4346
2015-12-03 11:14:14 +00:00
Konstantin Belousov
906430e4f0 In the SandyBridge x2APIC workaround detection code, only fetch the
environment variable when SandyBridge CPU is detected.  Reduce code
duplication.

Sponsored by:	The FreeBSD Foundation
2015-12-03 10:59:10 +00:00
Konstantin Belousov
2a8a46b161 Correct the number of DTLB entries reported for the CPUID Leaf 2
descriptor 0x6c.

Confirmed by:	Intel
MFC after:	3 days
2015-11-24 19:55:11 +00:00
Svatopluk Kraus
eae22c4430 Revert r291142.
The not quite consistent logic for bounce pages allocation is utilizited
by re(4) interface which can hang now.

Approved by:	kib (mentor)
2015-11-23 11:19:00 +00:00
Svatopluk Kraus
6fa7734d6f Fix BUS_DMA_MIN_ALLOC_COMP flag logic. When bus_dmamap_t map is being
created for bus_dma_tag_t tag, bounce pages should be allocated
only if needed.

Before the fix, they were allocated always if BUS_DMA_COULD_BOUNCE flag
was set but BUS_DMA_MIN_ALLOC_COMP not. As bounce pages are never freed,
it could cause memory exhaustion when a lot of such tags together with
their maps were created.

Note that there could be more maps in one tag by current design.
However BUS_DMA_MIN_ALLOC_COMP flag is tag's flag. It's set after
bounce pages are allocated. Thus, they are allocated only for first
tag's map which needs them.

Approved by:	kib (mentor)
2015-11-21 19:55:01 +00:00
Marius Strobl
ec2fbee752 Avoid a NULL pointer dereference in bounce_bus_dmamap_unload() when
the map has been created via bounce_bus_dmamem_alloc(). In that case
bus_dmamap_unload(9) typically isn't called during normal operation
but still should be during detach, cleanup from failed attach etc.

Submitted by:	yongari
MFC after:	3 days
2015-11-21 02:08:47 +00:00
Marius Strobl
8fd47ac11c Avoid a NULL pointer dereference in bounce_bus_dmamap_sync() when the
map has been created via bounce_bus_dmamem_alloc(). Even for coherent
DMA - which bus_dmamem_alloc(9) typically is used for -, calling of
bus_dmamap_sync(9) isn't optional.

PR:		188899 (non-original problem)
MFC after:	3 days
2015-11-20 02:23:35 +00:00
Roger Pau Monné
1522652230 xen: fix dropping bitmap IPIs during resume
Current Xen resume code clears all pending bitmap IPIs on resume, which is
not correct. Instead re-inject bitmap IPI vectors on resume to all CPUs in
order to acknowledge any pending bitmap IPIs.

Sponsored by:		Citrix Systems R&D
MFC after:		2 weeks
2015-11-18 18:11:19 +00:00
Roger Pau Monné
ea64b86f94 xen/intr: properly dispose event channels on resume
All event channels are torn down when performing a migration on Xen, make
sure all handlers are also removed and the event channel structure is
properly disposed so it can be reused.

Sponsored by:		Citrix Systems R&D
MFC after:		2 weeks
2015-11-18 18:10:28 +00:00
Roger Pau Monné
531cfe55e2 x86/intr: allow mutex recursion in intr_remove_handler
This is needed so interrupt handlers can be removed while the PIC is
resuming, it was previously not possible due to intr_resume holding the
intr_table_lock and intr_remove_handler recursing on it.

Sponsored by:		Citrix Systems R&D
Reviewed by:		kib (previous version)
MFC after:		2 weeks
Differential Revision:	https://reviews.freebsd.org/D4114
2015-11-18 18:09:49 +00:00
Roger Pau Monné
5c4133b1b5 x86/dma_bounce: rework _bus_dmamap_load_ma implementation
The implementation of bus_dmamap_load_ma_triv currently calls
_bus_dmamap_load_phys on each page that is part of the passed in buffer.
Since each page is treated as an individual buffer, the resulting behaviour
is different from the behaviour of _bus_dmamap_load_buffer. This breaks
certain drivers, like Xen blkfront.

If an unmapped buffer of size 4096 that starts at offset 13 into the first
page is passed to the current _bus_dmamap_load_ma implementation (so the ma
array contains two pages), the result is that two segments are created, one
with a size of 4083 and the other with size 13 (because two independant
calls to _bus_dmamap_load_phys are performed, one for each physical page).
If the same is done with a mapped buffer and calling _bus_dmamap_load_buffer
the result is that only one segment is created, with a size of 4096.

This patch relegates the usage of bus_dmamap_load_ma_triv in x86 bounce
buffer code to drivers requesting BUS_DMA_KEEP_PG_OFFSET and implements
_bus_dmamap_load_ma so that it's behaviour is the same as the mapped version
(_bus_dmamap_load_buffer). This patch only modifies the x86 bounce buffer
code, other arches are left untouched.

Sponsored by:		Citrix Systems R&D
Reviewed by:		kib, jah (previous version)
MFC after:		2 weeks
Differential Revision:	https://reviews.freebsd.org/D888
2015-11-09 12:19:58 +00:00
Tijl Coosemans
27f38a8d69 Since r289279 bufinit() uses mp_ncpus, but some architectures set this
variable during mp_start() which is too late.  Move this to mp_setmaxid()
where other architectures set it and move x86 assertions to MI code.

Reviewed by:	kib (x86 part)
2015-11-08 14:26:50 +00:00
Roger Pau Monné
f186ed526a xen/intr: fix the event channel enabled per-cpu mask
Fix two issues with the current event channel code, first ENABLED_SETSIZE is
not correctly defined and then using a BITSET to store the per-cpu masks is
not portable to other arches, since on arm32 the event channel arrays shared
with the hypervisor are of type uint64_t and not long. Partially restore the
previous code but switch the bit operations to use the recently introduced
xen_{set/clear/test}_bit versions.

Reviewed by:		Julien Grall <julien.grall@citrix.com>
Sponsored by:		Citrix Systems R&D
Differential Revision:	https://reviews.freebsd.org/D4080
2015-11-05 14:33:46 +00:00
Ian Lepore
53f93ed3ff Fix an alignment check that is wrong in half the busdma implementations.
This will enable the elimination of a workaround in the USB driver that
artifically allocates buffers twice as big as they need to be (which
actually saves memory for very small buffers on the buggy platforms).

When deciding how to allocate a dma buffer, armv4, armv6, mips, and
x86/iommu all correctly check for the tag alignment <= maxsize as enabling
simple uma/malloc based allocation.  Powerpc, sparc64, x86/bounce, and
arm64/bounce were all checking for alignment < maxsize; on those platforms
when alignment was equal to the max size it would fall back to page-based
allocators even for very small buffers.

This change makes all platforms use the <= check.  It should be noted that
on all platforms other than arm[v6] and mips, this check is relying on
undocumented behavior in malloc(9) that if you allocate a block of a given
size it will be aligned to the next larger power-of-2 boundary.  There is
nothing in the malloc(9) man page that makes that explicit promise (but the
busdma code has been relying on this behavior all along so I guess it works).

Arm and mips code uses the allocator in kern/subr_busdma_buffalloc.c, which
does explicitly implement this promise about size and alignment.  Other
platforms probably should switch to the aligned allocator.
2015-11-02 23:37:19 +00:00
Roger Pau Monné
f4576dd975 x86/dma_bounce: revert r289834 and r289836
The new load_ma implementation can cause dereferences when used with
certain drivers, back it out until the reason is found:

Fatal trap 12: page fault while in kernel mode
cpuid = 11; apic id = 03
fault virtual address   = 0x30
fault code              = supervisor read data, page not present
instruction pointer     = 0x20:0xffffffff808a2d22
stack pointer           = 0x28:0xfffffe07cc737710
frame pointer           = 0x28:0xfffffe07cc737790
code segment            = base 0x0, limit 0xfffff, type 0x1b
                        = DPL 0, pres 1, long 1, def32 0, gran 1
processor eflags        = interrupt enabled, resume, IOPL = 0
current process         = 13 (g_down)
trap number             = 12
panic: page fault
cpuid = 11
KDB: stack backtrace:
#0 0xffffffff80641647 at kdb_backtrace+0x67
#1 0xffffffff80606762 at vpanic+0x182
#2 0xffffffff806067e3 at panic+0x43
#3 0xffffffff8084eef1 at trap_fatal+0x351
#4 0xffffffff8084f0e4 at trap_pfault+0x1e4
#5 0xffffffff8084e82f at trap+0x4bf
#6 0xffffffff80830d57 at calltrap+0x8
#7 0xffffffff8063beab at _bus_dmamap_load_ccb+0x1fb
#8 0xffffffff8063bc51 at bus_dmamap_load_ccb+0x91
#9 0xffffffff8042dcad at ata_dmaload+0x11d
#10 0xffffffff8042df7e at ata_begin_transaction+0x7e
#11 0xffffffff8042c18e at ataaction+0x9ce
#12 0xffffffff802a220f at xpt_run_devq+0x5bf
#13 0xffffffff802a17ad at xpt_action_default+0x94d
#14 0xffffffff802c0024 at adastart+0x8b4
#15 0xffffffff802a2e93 at xpt_run_allocq+0x193
#16 0xffffffff802c0735 at adastrategy+0xf5
#17 0xffffffff80554206 at g_disk_start+0x426
Uptime: 2m29s
2015-10-26 14:50:35 +00:00
Conrad Meyer
ce7543042c xen: Add missing semi-colon for BITSET_DEFINE()
Broken when it was removed from the macro in r289867.

Pointy-hat:	markj
Sponsored by:	EMC / Isilon Storage Division
2015-10-24 19:04:55 +00:00
Roger Pau Monné
59cd0f10b3 x86/dma_bounce: rework _bus_dmamap_load_ma implementation
The implementation of bus_dmamap_load_ma_triv currently calls
_bus_dmamap_load_phys on each page that is part of the passed in buffer.
Since each page is treated as an individual buffer, the resulting behaviour
is different from the behaviour of _bus_dmamap_load_buffer. This breaks
certain drivers, like Xen blkfront.

If an unmapped buffer of size 4096 that starts at offset 13 into the first
page is passed to the current _bus_dmamap_load_ma implementation (so the ma
array contains two pages), the result is that two segments are created, one
with a size of 4083 and the other with size 13 (because two independant
calls to _bus_dmamap_load_phys are performed, one for each physical page).
If the same is done with a mapped buffer and calling _bus_dmamap_load_buffer
the result is that only one segment is created, with a size of 4096.

This patch relegates the usage of bus_dmamap_load_ma_triv in x86 bounce
buffer code to drivers requesting BUS_DMA_KEEP_PG_OFFSET and implements
_bus_dmamap_load_ma so that it's behaviour is the same as the mapped version
(_bus_dmamap_load_buffer). This patch only modifies the x86 bounce buffer
code, other arches are left untouched.

Reviewed by:		kib, jah
Differential Revision:	https://reviews.freebsd.org/D888
Sponsored by:		Citrix Systems R&D
2015-10-23 15:39:59 +00:00
Jason A. Harmening
a50730587b Remove unclear comment about address truncation in busdma. Add (hopefully much clearer) comment at declaration of PHYS_TO_VM_PAGE().
Noted by:	avg
2015-10-23 12:03:25 +00:00
Konstantin Belousov
c0db387d25 Decode new values for CPUID leaf 2 cache and TLB descriptors, from the
Intel SDM revision 56.

Sponsored by:	The FreeBSD Foundation
MFC after:	1 week
2015-10-23 11:43:56 +00:00
Roger Pau Monné
2f9ec994bc xen: Code cleanup and small bug fixes
xen/hypervisor.h:
 - Remove unused helpers: MULTI_update_va_mapping, is_initial_xendomain,
   is_running_on_xen
 - Remove unused define CONFIG_X86_PAE
 - Remove unused variable xen_start_info: note that it's used inpcifront
   which is not built at all
 - Remove forward declaration of HYPERVISOR_crash

xen/xen-os.h:
 - Remove unused define CONFIG_X86_PAE
 - Drop unused helpers: test_and_clear_bit, clear_bit,
   force_evtchn_callback
 - Implement a generic version (based on ofed/include/linux/bitops.h) of
   set_bit and test_bit and prefix them by xen_ to avoid any use by other
   code than Xen. Note that It would be worth to investigate a generic
   implementation in FreeBSD.
 - Replace barrier() by __compiler_membar()
 - Replace cpu_relax() by cpu_spinwait(): it's exactly the same as rep;nop
   = pause

xen/xen_intr.h:
 - Move the prototype of xen_intr_handle_upcall in it: Use by all the
   platform

x86/xen/xen_intr.c:
 - Use BITSET* for the enabledbits: Avoid to use custom helpers
 - test_bit/set_bit has been renamed to xen_test_bit/xen_set_bit
 - Don't export the variable xen_intr_pcpu

dev/xen/blkback/blkback.c:
 - Fix the string format when XBB_DEBUG is enabled: host_addr is typed
   uint64_t

dev/xen/balloon/balloon.c:
 - Remove set but not used variable
 - Use the correct type for frame_list: xen_pfn_t represents the frame
   number on any architecture

dev/xen/control/control.c:
 - Return BUS_PROBE_WILDCARD in xs_probe: Returning 0 in a probe callback
   means the driver can handle this device. If by any chance xenstore is the
   first driver, every new device with the driver is unset will use
   xenstore.

dev/xen/grant-table/grant_table.c:
 - Remove unused cmpxchg
 - Drop unused include opt_pmap.h: Doesn't exist on ARM64 and it doesn't
   contain anything required for the code on x86

dev/xen/netfront/netfront.c:
 - Use the correct type for rx_pfn_array: xen_pfn_t represents the frame
   number on any architecture

dev/xen/netback/netback.c:
 - Use the correct type for gmfn: xen_pfn_t represents the frame number on
   any architecture

dev/xen/xenstore/xenstore.c:
 - Return BUS_PROBE_WILDCARD in xctrl_probe: Returning 0 in a probe callback
   means the driver can handle this device. If by any chance xenstore is the
  first driver, every new device with the driver is unset will use xenstore.

Note that with the changes, x86/include/xen/xen-os.h doesn't contain anymore
arch-specific code. Although, a new series will add some helpers that differ
between x86 and ARM64, so I've kept the headers for now.

Submitted by:		Julien Grall <julien.grall@citrix.com>
Reviewed by:		royger
Differential Revision:	https://reviews.freebsd.org/D3921
Sponsored by:		Citrix Systems R&D
2015-10-21 10:44:07 +00:00
Roger Pau Monné
6a306bff7f x86/xen: Consolidate xen-os.h in a single place
amd64 and i386 platform code contain very similar xen/xen-os.h

The only differences are:
 - Functions/variables/types which were unused in i386/xen/xen-os.h:
    * xen_xchg
    * __xchg_dummy
    * __xg
    * __xchg
    * atomic_t
    * atomic_inc
    * rdtscll

The functions/variables/types unused in xen-os.h can be dropped and there
is no more differences betwen amd64 and i386.

The new header is placed in x86/include/xen and each platform will have
dummy headers include x86/xen/*.h. This is to be able to include
machine/xen/*.h in the PV drivers.

Submitted by:		Julien Grall <julien.grall@citrix.com>
Reviewed by:		royger
Differential Revision:	https://reviews.freebsd.org/D3880
Sponsored by:		Citrix Systems R&D
2015-10-21 10:04:35 +00:00
Jason A. Harmening
012cf46f07 Don't page-align the physical address when calling PHYS_TO_VM_PAGE().
M    busdma_bounce.c
2015-10-17 14:58:55 +00:00
Jason A. Harmening
dcaa560af0 Ensure the client regions for unmapped bounce buffers created through bus_dmamap_load_phys() do not span multiple pages.
This is already done for mapped buffers.
While here, stop casting bus_addr_t to vm_offset_t.
2015-10-13 02:17:56 +00:00
Bjoern A. Zeeb
6b1ad46a3b dmar_ctx_dtr() does not exist since r284869. Remove the static function
declaration to avoid a cmpile time warning.
2015-09-22 16:50:59 +00:00
Zbigniew Bodek
18c72666ce Add domain support to PCI bus allocation
When the system has more than a single PCI domain, the bus numbers
are not unique, thus they cannot be used for "pci" device numbering.
Change bus numbers to -1 (i.e. to-be-determined automatically)
wherever the code did not care about domains.

Reviewed by:   jhb
Obtained from: Semihalf
Sponsored by:  The FreeBSD Foundation
Differential Revision: https://reviews.freebsd.org/D3406
2015-09-16 23:34:51 +00:00
Adrian Chadd
a14bc739d5 Add ASUS Sandybridge laptops to the similar x2apic disable logic
that was recently added for Lenovo laptops.

This is a prime candidate for conversion into a table and also
checking other fields like "product".

Tested:

* ASUS UX31E
2015-09-16 01:44:11 +00:00
Mark Johnston
610141cebb Add stack_save_td_running(), a function to trace the kernel stack of a
running thread.

It is currently implemented only on amd64 and i386; on these
architectures, it is implemented by raising an NMI on the CPU on which
the target thread is currently running. Unlike stack_save_td(), it may
fail, for example if the thread is running in user mode.

This change also modifies the kern.proc.kstack sysctl to use this function,
so that stacks of running threads are shown in the output of "procstat -kk".
This is handy for debugging threads that are stuck in a busy loop.

Reviewed by:	bdrewery, jhb, kib
Sponsored by:	EMC / Isilon Storage Division
Differential Revision:	https://reviews.freebsd.org/D3256
2015-09-11 03:54:37 +00:00
Mark Johnston
1e954a7c63 Remove the arg0 field from struct amd64_frame. Its existence was a bug,
since on amd64 the first argument to a function is generally not on the
stack.

Revert an old DTrace bug fix to some code that assumed that
sizeof(struct amd64_frame) == 16.

Reviewed by:	jhb, kib
Sponsored by:	EMC / Isilon Storage Division
Differential Revision:	https://reviews.freebsd.org/D3255
2015-09-11 03:31:22 +00:00
Mark Johnston
4db79feb8f Merge stack(9) implementations for i386 and amd64 under x86/.
Reviewed by:	jhb, kib
Sponsored by:	EMC / Isilon Storage Division
Differential Revision:	https://reviews.freebsd.org/D3255
2015-09-11 03:24:07 +00:00
Warner Losh
5e4fb9caca Add missing ofw_machdep.h. Make x86 ofw_machdep.h work pc98 too.
This allows the owc module to compile on pc98 and seems preferable to
adding another special case in the build system.
2015-08-28 15:41:09 +00:00
Roger Pau Monné
e8234cfef6 preload_search_info: make sure mod is set
Add a check to preload_search_info to make sure mod is set. Most of the
callers of preload_search_info don't check that the mod parameter is
set, which can cause page faults. While at it, remove some now unnecessary
checks before calling preload_search_info.

Sponsored by:		Citrix Systems R&D
Reviewed by:		kib
Differential Revision:	https://reviews.freebsd.org/D3440
2015-08-21 15:57:57 +00:00
Roger Pau Monné
f8f1bb83f7 xen: allow disabling PV disks and nics
Introduce two new loader tunnables that can be used to disable PV disks and
PV nics at boot time. They default to 0 and should be set to 1 (or any
number different than 0) in order to disable the PV devices:

hw.xen.disable_pv_disks=1
hw.xen.disable_pv_nics=1

In /boot/loader.conf will disable both PV disks and nics.

Sponsored by:	Citrix Systems R&D
Tested by:	Karl Pielorz <kpielorz_lst@tdx.co.uk>
MFC after:	1 week
2015-08-21 15:53:08 +00:00
Konstantin Belousov
8c48615974 Automatically disable x2APIC mode on SandyBridge Lenovo machines. I
believe that the bug only affects mobile CPUs, at least I did not see
other reports, but it is impossible to detect it in madt_setup_local().

While there, reduce duplication in the information strings printed
when x2APIC is auto-disabled, and do not print the line when user
manually override the setting.

Tested and reviewed by:	  royger (previous version)
Sponsored by:	The FreeBSD Foundation
2015-08-21 15:13:25 +00:00
Jason A. Harmening
7b59f7bc5c Use pmap_quick_enter_page() to handle bouncing of unmapped buffers in the x86 busdma_bounce implementation. Also treat user buffers as unmapped.
This allows two things:
1. Sync'ing bounced maps in non-sleepable contexts.  The physcopy* calls previously used could sleep on sf_buf operations in some cases.
2. Sync'ing user buffers outside the context of the owning process

Approved by:	kib (mentor)
2015-08-14 20:08:16 +00:00
Jason A. Harmening
e6e0582bd4 Reformat x86 bounce buffer synchronization code to reduce indentation. No functional change.
Approved by:	kib (mentor)
2015-08-14 18:01:40 +00:00
Konstantin Belousov
0a44024a4e Comment only change, fix grammar and somewhat clarify the action.
Sponsored by:	The FreeBSD Foundation
MFC after:	3 days
2015-08-14 13:51:59 +00:00
Marcel Moolenaar
7ef5e8bc80 Better support memory mapped console devices, such as VGA and EFI
frame buffers and memory mapped UARTs.

1.  Delay calling cninit() until after pmap_bootstrap(). This makes
    sure we have PMAP initialized enough to add translations. Keep
    kdb_init() after cninit() so that we have console when we need
    to break into the debugger on boot.
2.  Unfortunately, the ATPIC code had be moved as well so as to
    avoid a spurious trap #30. The reason for which is not known
    at this time.
3.  In pmap_mapdev_attr(), when we need to map a device prior to the
    VM system being initialized, use virtual_avail as the KVA to map
    the device at. In particular, avoid using the direct map on amd64
    because we can't demote by virtue of not being able to allocate
    yet. Keep track of the translation.
    Re-use the translation after the VM has been initialized to not
    waste KVA and to satisfy the assumption in uart(4) that the handle
    returned for the low-level console is the same as later returned
    when the device is probed and attached.
4.  In pmap_unmapdev() remove the mapping from the table when called
    pre-init. Otherwise keep the mapping. During bus probe and attach
    device resources are mapped and unmapped multiple times, which
    would have us destroy the mapping used by the low-level console.
5.  In pmap_init(), set pmap_initialized to signal that we're not
    pre-init anymore. On amd64, bring the direct map in sync with the
    translations created at that time.
6.  Implement bus_space_map() and bus_space_unmap() for real: when
    the tag corresponds to memory space, call the corresponding
    pmap_mapdev() and pmap_unmapdev() functions to construct and
    actual handle.
7.  In efifb.c and vt_vga.c, remove the crutches and hacks and simply
    call pmap_mapdev_attr() or bus_space_map() as desired.

Notes:
1.  uart(4) already used bus_space_map() during low-level console
    setup but since serial ports have traditionally been I/O port
    based, the lack of a proper implementation for said function
    was not a problem. It has always supported memory mapped UARTs
    for low-level consoles by setting hw.uart.console accordingly.
2.  The use of the direct map on amd64 without setting caching
    attributes has been a bigger problem than previously thought.
    This change has the fortunate (and unexpected) side-effect of
    fixing various EFI frame buffer problems (though not all).

PR: 191564, 194952

Special thanks to:
1.  XipLink, Inc -- generously donated an Intel Bay Trail E3800
    based eval board (ADLE3800PC).
2.  The FreeBSD Foundation, in particular emaste@ -- for UEFI
    support in general and testing.
3.  Everyone who tested the proposed for PR 191564.
4.  jhb@ and kib@ for being a soundboard and applying a clue bat
    if so needed.
2015-08-12 15:26:32 +00:00
Konstantin Belousov
f36f7c0bf8 In x2APIC mode, IPI generation is atomic because it is performed by
single ICR MSR write.  This is in contrast with the xAPIC mode, where
we must read current ICR value, do bit fiddling and perform two 32-bit
register writes.  As a consequence, there is no need to disable
interrupts around ICR value calculation and write.

Note that typical users of ipi_raw() and ipi_vectored() take spinlock,
which already disables interrupts.  For them, the change removes
unneeded CLI and POPFL/Q instructions.

Tested by:	pho
Sponsored by:	The FreeBSD Foundation
MFC after:	2 weeks
2015-08-12 09:55:52 +00:00
Konstantin Belousov
edc8222303 Make kstack_pages a tunable on arm, x86, and powepc. On i386, the
initial thread stack is not adjusted by the tunable, the stack is
allocated too early to get access to the kernel environment. See
TD0_KSTACK_PAGES for the thread0 stack sizing on i386.

The tunable was tested on x86 only.  From the visual inspection, it
seems that it might work on arm and powerpc.  The arm
USPACE_SVC_STACK_TOP and powerpc USPACE macros seems to be already
incorrect for the threads with non-default kstack size.  I only
changed the macros to use variable instead of constant, since I cannot
test.

On arm64, mips and sparc64, some static data structures are sized by
KSTACK_PAGES, so the tunable is disabled.

Sponsored by:	The FreeBSD Foundation
MFC after:	2 week
2015-08-10 17:18:21 +00:00
Konstantin Belousov
a8bf83d618 Formally pair store_rel(&smp_started) with load_acq(&smp_started).
The expected semantic is to have misc. data, e.g. CPU bitmaps, visible
in the BSP after smp_started is written by the last started AP, which
formally requires acquire barrier on the load.  The change is mostly
nop due to the ordered behaviour of the x86 CPUs.

Reviewed by:	alc
Sponsored by:	The FreeBSD Foundation
MFC after:	2 weeks
2015-08-06 18:02:54 +00:00
John Baldwin
3c790178c5 Remove some more vestiges of the Xen PV domu support. Specifically,
use vtophys() directly instead of vtomach() and retire the no-longer-used
headers <machine/xenfunc.h> and <machine/xenvar.h>.

Reported by:	bde (stale bits in <machine/xenfunc.h>)
Reviewed by:	royger (earlier version)
Differential Revision:	https://reviews.freebsd.org/D3266
2015-08-06 17:07:21 +00:00
Jung-uk Kim
fb396e55da Fix more style issues.
Submitted by:	bde
2015-08-05 17:21:42 +00:00
Ed Maste
96226a9aa7 Rationalize BSD license on sys/*/include/float.h
Remove the advertising clause from the Regents of the University of
California's license, per the letter dated July 22, 1999.

Update clause numbering.
2015-08-05 17:05:35 +00:00
Jung-uk Kim
ca23ca33f2 Fix style(9) bugs. 2015-08-04 18:59:54 +00:00
Jung-uk Kim
45c2c9a84a Always define __va_list for amd64 and restore pre-r232261 behavior for i386.
Note it allows exotic compilers, e.g., TCC, to build with our stdio.h, etc.

PR:		201749
MFC after:	1 week
2015-08-04 00:11:39 +00:00
Konstantin Belousov
f94cc23475 Clear the IA32_MISC_ENABLE MSR bit, which limits the max CPUID
reported, on APs.  We already did this on BSP.

Otherwise, the userspace software which depends on the features
reported by the high CPUID levels is misbehaving.  In particular, AVX
detection is non-functional, depending on which CPU thread happens to
execute when doing CPUID.  Another victim is the libthr signal
handlers interposer, which needs to save full FPU extended state.

Reported and tested by:	Andre Meiser <ortadur@web.de>
Sponsored by:	The FreeBSD Foundation
MFC after:	2 weeks
2015-08-03 12:14:42 +00:00
Konstantin Belousov
90a2db45eb Add bit names for the IA32_MISC_ENABLE msr.
Sponsored by:	The FreeBSD Foundation
MFC after:	1 week
2015-07-28 06:55:08 +00:00
Konstantin Belousov
9b3df93bf1 Typo in comment. 2015-07-20 19:51:41 +00:00
Konstantin Belousov
1ef630fb33 Fix warnings about unused functions for UP build.
Sponsored by:	The FreeBSD Foundation
2015-07-16 12:16:42 +00:00
Zbigniew Bodek
721555e7ee Fix KSTACK_PAGES issue when the default value was changed in KERNCONF
If KSTACK_PAGES was changed to anything alse than the default,
the value from param.h was taken instead in some places and
the value from KENRCONF in some others. This resulted in
inconsistency which caused corruption in SMP envorinment.

Ensure all places where KSTACK_PAGES are used the opt_kstack_pages.h
is included.

The file opt_kstack_pages.h could not be included in param.h
because was breaking the toolchain compilation.

Reviewed by:   kib
Obtained from: Semihalf
Sponsored by:  The FreeBSD Foundation
Differential Revision: https://reviews.freebsd.org/D3094
2015-07-16 10:46:52 +00:00
Christian Brueffer
9f026a420b Set the initial system time to a sane (as in: not end of 21st century) value when
booting on a PC with CMOS clock set to a year before 2000.

This uses 1980 (instead of 1970 as in the initial patch) as pivot year as
suggested by imp in the PR followup.

PR:		195703
Submitted by:	cs@soi.spb.ru
Reviewed by:	imp
MFC after:	1 weeks
2015-06-29 17:02:09 +00:00
Konstantin Belousov
1817023775 Add x86 PT_GETFSBASE, PT_GETGSBASE machine-depended ptrace requests to
obtain the thread %fs and %gs bases.  Add x86 PT_SETFSBASE and
PT_SETGSBASE requests to set the bases from debuggers.  The set
requests, similarly to the sysarch({I386,AMD64}_SET_FSBASE),
override the corresponding segment registers.

The main purpose of the operations is to retrieve and modify the tcb
address for debuggee.

Sponsored by:	The FreeBSD Foundation
MFC after:	2 weeks
2015-06-29 07:07:24 +00:00
Konstantin Belousov
1abfd35537 Split the DMAR unit domains and contexts. Domains carry address space
and related data structures.  Contexts attach requests initiators to
domains.  There is still 1:1 correspondence between contexts and
domains on the running system, since only busdma currently allocates
them, using dmar_get_ctx_for_dev().

Large part of the change is formal rename of the ctx to domain, but
patch also reworks the context allocation and free to allow for
independent domain creation.

The helper dmar_move_ctx_to_domain() is introduced for future use, to
reassign request initiator from one domain to another.  The hard issue
which is not yet resolved with the context move is proper handling (or
reserving) RMRR entries in the destination domain as required by ACPI
DMAR table for moved context.

Tested by:	pho
Sponsored by:	The FreeBSD Foundation
2015-06-26 07:01:29 +00:00
Jung-uk Kim
5ef5072350 Merge ACPICA 20150619. 2015-06-18 23:14:45 +00:00
John Baldwin
125954c873 Handle X2APIC entries in the MADT for APICs with an ID < 255. At least one
BIOS has been seen to include such entries even though the relevant specs
require that X2APIC entries only be used for CPUs with an APIC ID >= 255.

This was tested on a system with "plain" local APIC entries in the MADT
to ensure no regressions, but it has not yet been tested on a system with
X2APIC entries in the MADT.  Currently such systems do not boot at all,
and with this change they might now boot correctly.

Differential Revision:	https://reviews.freebsd.org/D2521
Reviewed by:	kib
MFC after:	2 weeks
2015-06-09 10:49:40 +00:00
Konstantin Belousov
32a1e9e4a5 Update print_INTEL_TLB() by the tag values from the Intel SDM
rev. 55.  The modern CPUs cache and TLB descriptions looked quite
questionable without the update, e.g. Haswell i7 4770S reported:
	Data TLB: 4 KB pages, 4-way set associative, 64 entries
	L2 cache: 256 kbytes, 8-way associative, 64 bytes/line
After the update, the report is:
	Data TLB: 1 GByte pages, 4-way set associative, 4 entries
	Data TLB: 4 KB pages, 4-way set associative, 64 entries
	Instruction TLB: 2M/4M pages, fully associative, 8 entries
	Instruction TLB: 4KByte pages, 8-way set associative, 64 entries
	64-Byte prefetching
	Shared 2nd-Level TLB: 4 KByte/2MByte pages, 8-way associative, 1024 entries
	L2 cache: 256 kbytes, 8-way associative, 64 bytes/line
Some tags were apparently removed from the table 3-21, Vol. 2A.  Keep
them around, but add a comment stating the removal.

Update the format line for cpu_stdext_feature according to the bits
from the SDM rev.55.  It appears that Haswells do not store %cs and
%ds values in the FPU save area.

Store content of the %ecx register from the CPUID leaf 0x7
subleaf 0 as cpu_stdext_feature2 and print defined bits from it,
again acording to SDM rev. 55.

Sponsored by:	The FreeBSD Foundation
MFC after:	1 week
2015-06-06 22:03:24 +00:00
Konstantin Belousov
69baeadc31 Remove several write-only variables, all reported by the gcc 4.9
buildkernel run.

Some of them were write-only under some kernel options, e.g. variables
keeping values only used by CTR() macros.  It costs nothing to the
code readability and correctness to eliminate the warnings in those
cases too by removing the local cached values used only for
single-access.

Review:	https://reviews.freebsd.org/D2665
Reviewed by:	rodrigc
Looked at by:	bjk
Sponsored by:	The FreeBSD Foundation
MFC after:	1 week
2015-05-29 13:24:17 +00:00
Konstantin Belousov
35725a97c4 Explicitely enable queued invalidation completion interrupt when the
queue is started, not relying on the interrupt remaping method to
happen.  Also disable interrupts when shooting down the queue.

Sponsored by:	The FreeBSD Foundation
MFC after:	1 week
2015-05-29 09:17:59 +00:00
Roger Pau Monné
02761cf314 xen: make sure xenpv bus is the last to attach
This is needed so other buses have a chance of attaching a real ISA bus, if
none is found xenpv will attach it.

Sponsored by: Citrix Systems R&D
2015-05-25 09:47:16 +00:00
Jung-uk Kim
fd90e2ed54 CALLOUT_MPSAFE has lost its meaning since r141428, i.e., for more than ten
years for head.  However, it is continuously misused as the mpsafe argument
for callout_init(9).  Deprecate the flag and clean up callout_init() calls
to make them more consistent.

Differential Revision:	https://reviews.freebsd.org/D2613
Reviewed by:	jhb
MFC after:	2 weeks
2015-05-22 17:05:21 +00:00
Konstantin Belousov
d9e8bbb64d When sleeping in Sx state using MWAIT instruction, accept fast wakeup
requests from writes to the monitored line.

Submitted by:	avg
2015-05-19 14:21:00 +00:00
Adrian Chadd
4f5e270a93 Update the comments to match what the code ended up becoming.
-1 is now "no locality information available".

Sponsored by:	Norse Corp, Inc.
2015-05-15 21:33:19 +00:00
Konstantin Belousov
a546448b8d Rewrite amd64 PCID implementation to follow an algorithm described in
the Vahalia' "Unix Internals" section 15.12 "Other TLB Consistency
Algorithms".  The same algorithm is already utilized by the MIPS pmap
to handle ASIDs.

The PCID for the address space is now allocated per-cpu during context
switch to the thread using pmap, when no PCID on the cpu was ever
allocated, or the current PCID is invalidated.  If the PCID is reused,
bit 63 of %cr3 can be set to avoid TLB flush.

Each cpu has PCID' algorithm generation count, which is saved in the
pmap pcpu block when pcpu PCID is allocated.  On invalidation, the
pmap generation count is zeroed, which signals the context switch code
that already allocated PCID is no longer valid.  The implication is
the TLB shootdown for the given cpu/address space, due to the
allocation of new PCID.

The pm_save mask is no longer has to be tracked, which (significantly)
reduces the targets of the TLB shootdown IPIs.  Previously, pm_save
was reset only on pmap_invalidate_all(), which made it accumulate the
cpuids of all processors on which the thread was scheduled between
full TLB shootdowns.

Besides reducing the amount of TLB shootdowns and removing atomics to
update pm_saves in the context switch code, the algorithm is much
simpler than the maintanence of pm_save and selection of the right
address space in the shootdown IPI handler.

Reviewed by:	alc
Tested by:	pho
Sponsored by:	The FreeBSD Foundation
MFC after:	3 weeks
2015-05-09 19:11:01 +00:00
Konstantin Belousov
b57a73f8e7 If x86 CPU implementation of the MWAIT instruction reasonably
interacts with interrupts, query ACPI and use MWAIT for entrance into
Cx sleep states.  Support C1 "I/O then halt" mode.  See Intel'
document 302223-007 "Intelб╝ Processor Vendor-Specific ACPI Interface
Specification" for description.

Move the acpi_cpu_c1() function into x86/cpu_machdep.c and use
it instead of inlining "sti; hlt" sequence in several places.

In the acpi(4) man page, besides documenting the dev.cpu.N.cx_methods
sysctl, correct the names for dev.cpu.N.{cx_usage,cx_lowest,cx_supported}
sysctls.

Both jkim and avg have some other patches implementing the mwait
functionality; this work is unrelated.  Linux does not rely on the
ACPI to provide correct tables describing Cx modes.  Instead, the
driver has pre-defined knowledge of the CPU models, it was supplied by
Intel.

Tested by:    pho (previous versions)
Sponsored by:	The FreeBSD Foundation
2015-05-09 12:28:48 +00:00
Roger Pau Monné
0df8b29da3 xen: introduce a newbus function to allocate unused memory
In order to map memory from other domains when running on Xen FreeBSD uses
unused physical memory regions. Until now this memory has been allocated
using bus_alloc_resource, but this is not completely safe as we can end up
using unreclaimed MMIO or ACPI regions.

Fix this by introducing a new newbus method that can be used by Xen drivers
to request for unused memory regions. On amd64 we make sure this memory
comes from regions above 4GB in order to prevent clashes with MMIO/ACPI
regions. On i386 there's nothing we can do, so just fall back to the
previous mechanism.

Sponsored by: Citrix Systems R&D
Tested by: Gustau Pérez <gperez@entel.upc.edu>
2015-05-08 14:48:40 +00:00
Adrian Chadd
415d7ccab2 Add initial memory locality cost awareness to the VM, and include
a basic ACPI SLIT table parser.

For now this just exports the map via sysctl; it'll eventually be useful
to userland when there's more useful NUMA support in -HEAD.

* Add an optional mem_locality map;
* add a mapping function taking from/to domain and returning the
  relative cost, or -1 if it's not available;
* Add a very basic SLIT parser to x86 ACPI.

Differential Revision:	https://reviews.freebsd.org/D2460
Reviewed by:	rpaulo, stas, jhb
Sponsored by:	Norse Corp, Inc (hardware, coding); Dell (hardware)
2015-05-08 00:56:56 +00:00
Neel Natu
712bd51ada Add macros for AMD-specific bits in MSR_EFER: LMSLE, FFXSR and TCE.
AMDID_FFXSR is at bit 25 so correct its value to 0x02000000.

MFC after:	1 week
2015-05-06 05:12:29 +00:00
John Baldwin
ed95805e90 Remove support for Xen PV domU kernels. Support for HVM domU kernels
remains.  Xen is planning to phase out support for PV upstream since it
is harder to maintain and has more overhead.  Modern x86 CPUs include
virtualization extensions that support HVM guests instead of PV guests.
In addition, the PV code was i386 only and not as well maintained recently
as the HVM code.
- Remove the i386-only NATIVE option that was used to disable certain
  components for PV kernels.  These components are now standard as they
  are on amd64.
- Remove !XENHVM bits from PV drivers.
- Remove various shims required for XEN (e.g. PT_UPDATES_FLUSH, LOAD_CR3,
  etc.)
- Remove duplicate copy of <xen/features.h>.
- Remove unused, i386-only xenstored.h.

Differential Revision:	https://reviews.freebsd.org/D2362
Reviewed by:	royger
Tested by:	royger (i386/amd64 HVM domU and amd64 PVH dom0)
Relnotes:	yes
2015-04-30 15:48:48 +00:00
Wei Hu
da2f98a1cf Microsoft vmbus, storage and other related driver enhancements for HyperV.
- Vmbus multi channel support.
    - Vector interrupt support.
    - Signal optimization.
    - Storvsc driver performance improvement.
    - Scatter and gather support for storvsc driver.
    - Minor bug fix for KVP driver.
Thanks royger, jhb and delphij from FreeBSD community for the reviews
and comments. Also thanks Hovy Xu from NetApp for the contributions to
the storvsc driver.

PR:     195238
Submitted by:   whu
Reviewed by:    royger, jhb, delphij
Approved by:    royger
MFC after:      2 weeks
Relnotes:       yes
Sponsored by:   Microsoft OSTC
2015-04-29 10:12:34 +00:00
Hans Petter Selasky
5caa65ca2d The add_bounce_page() function can be called when loading physical
pages which pass a NULL virtual address. If the BUS_DMA_KEEP_PG_OFFSET
flag is set, use the physical address to compute the page offset
instead. The physical address should always be valid when adding
bounce pages and should contain the same page offset like the virtual
address.

Submitted by:	Svatopluk Kraus <onwahe@gmail.com>
MFC after:	1 week
Reviewed by:	jhb@
2015-04-28 06:12:37 +00:00
Konstantin Belousov
02c26f81a7 Move common code from sys/i386/i386/mp_machdep.c and
sys/amd64/amd64/mp_machdep.c, to the new common x86 source
sys/x86/x86/mp_x86.c.

Proposed and reviewed by:	jhb
Review:	https://reviews.freebsd.org/D2347
Sponsored by:	The FreeBSD Foundation
2015-04-24 16:20:56 +00:00
John Baldwin
179fa75e6e Reassign copyright statements on several files from Advanced
Computing Technologies LLC to Hudson River Trading LLC.

Approved by:	Hudson River Trading LLC (who owns ACT LLC)
MFC after:	1 week
2015-04-23 14:22:20 +00:00
Konstantin Belousov
dfe7b3bfbc Move some common code from sys/amd64/amd64/machdep.c and
sys/i386/i386/machdep.c to new file sys/x86/x86/cpu_machdep.c.  Most
of the code is related to the idle handling.

Discussed with:	pluknet
Sponsored by:	The FreeBSD Foundation
2015-04-22 12:32:14 +00:00
Marius Strobl
62def911bd Refine the workaround for Intel HSD131 [1] added in r269052:
- Use the full mask described by the erratum as with a sufficiently high
  number of these false-positives, the overflow bit (bit 62) additionally
  gets set [7].
- HSD131 has been brought into several other Haswell-derived CPUs including
  to the next generation, i. e. Intel Broadwell. Thus, also skip reporting of
  these benign errors by default on CPU models affected by HSM142, HSW131 and
  BDM48 [2 - 5], describing the HSD131 silicon bug for additional models.
  Also, Celeron 2955U with a CPU ID of 0x45 have been reported to be covered
  by this fault [6], with the specification update concerned with HSM142 [2]
  only referring to 0x3c and 0x46.

Submitted by:	David Froehlich [7]
MFC after:	3 days

http://www.intel.de/content/dam/www/public/us/en/documents/specification-updates/4th-gen-core-family-desktop-specification-update.pdf [1]
http://www.intel.com/content/dam/www/public/us/en/documents/specification-updates/4th-gen-core-family-mobile-specification-update.pdf [2]
http://www.intel.com/content/dam/www/public/us/en/documents/specification-updates/5th-gen-core-family-spec-update.pdf [3]
http://www.intel.de/content/dam/www/public/us/en/documents/specification-updates/core-m-processor-family-spec-update.pdf [4]
http://www.intel.com/content/dam/www/public/us/en/documents/specification-updates/xeon-e3-1200v3-spec-update.pdf [5]
https://lists.freebsd.org/pipermail/freebsd-hackers/2015-January/046878.html [6]
2015-04-19 20:15:57 +00:00
Konstantin Belousov
9dda94167d Revert unrelated chunk from the r281707.
MFC after:	2 weeks
2015-04-18 21:27:28 +00:00
Konstantin Belousov
1c8e7232b4 Remove lazy pmap switch code from i386. Naive benchmark with md(4)
shows no difference with the code removed.

On both amd64 and i386, assert that a released pmap is not active.

Proposed and reviewed by:	alc
Discussed with:	Svatopluk Kraus <onwahe@gmail.com>, peter
Sponsored by:	The FreeBSD Foundation
MFC after:	2 weeks
2015-04-18 21:23:16 +00:00
Konstantin Belousov
34c15db9cd Add config option PAE_TABLES for the i386 kernel. It switches pmap to
use PAE format for the page tables, but does not incur other
consequences of the full PAE config.  In particular, vm_paddr_t and
bus_addr_t are left 32bit, and max supported memory is still limited
by 4GB.

The option allows to have nx permissions for memory mappings on i386
kernel, while keeping the usual i386 KBI and avoiding the kernel data
sizing problems typical for the PAE config.

Intel documented that the PAE format for page tables is available
starting with the Pentium Pro, but it is possible that the plain
Pentium CPUs have the required support (Appendix H).  The goal is to
enable the option and non-exec mappings on i386 for the GENERIC
kernel.  Anybody wanting a useful system on 486, have to reconfigure
the modern i386 kernel anyway.

Discussed with:	alc, jhb
Sponsored by:	The FreeBSD Foundation
MFC after:	2 weeks
2015-04-13 15:22:45 +00:00
Jung-uk Kim
9e222cd613 Fix build on i386.
Reported by:	bz
2015-04-12 22:40:27 +00:00
John Baldwin
dbee5c671a Move the 32-bit compatible procfs types from freebsd32.h to <sys/procfs.h>
and export them to userland.
- Define __HAVE_REG32 on platforms that define a reg32 structure and check
  for this in <sys/procfs.h> to control when to export prstatus32, etc.
- Add prstatus32_t and prpsinfo32_t typedefs for the 32-bit structures.
  libbfd looks for these types, and having them fixes 'gcore' in gdb of a
  32-bit process on a 64-bit platform.
- Use the structure definitions from <sys/procfs.h> in gcore's elf32 core
  dump code instead of duplicating the definitions.

Differential Revision:	https://reviews.freebsd.org/D2142
Reviewed by:	kib, nathanw (powerpc bits)
MFC after:	1 week
2015-04-08 16:30:45 +00:00
Konstantin Belousov
5f01ba81ae Account for the offset of the page run when allocating the
dmar_map_entry.  Non-zero offset both increases the required mapping
size, which is handled in dmar_bus_dmamap_load_something1(), and makes
it possible that allocated range crosses boundary, which needs a check
in dmar_gas_match_one().

Reported and tested by:	jimharris
Sponsored by:	The FreeBSD Foundation
MFC after:	1 week
2015-04-08 01:55:22 +00:00
Konstantin Belousov
bdf31595ff When mapping an allocated entry, use the entry size, instead of the
requested size.  If tag restrictions caused split entry, its size is
less then requsted.

Hardware provided by:	Michael Fuckner <michael@fuckner.net>
Sponsored by:	The FreeBSD Foundation
MFC after:	1 week
2015-03-24 12:48:51 +00:00
Konstantin Belousov
a3d78402d2 Assert that the mapping loop makes progress.
Sponsored by:	The FreeBSD Foundation
MFC after:	1 week
2015-03-24 12:46:21 +00:00
Konstantin Belousov
0a110d5b17 Use VT-d interrupt remapping block (IR) to perform FSB messages
translation.  In particular, despite IO-APICs only take 8bit apic id,
IR translation structures accept 32bit APIC Id, which allows x2APIC
mode to function properly.  Extend msi_cpu of struct msi_intrsrc and
io_cpu of ioapic_intsrc to full int from one byte.

KPI of IR is isolated into the x86/iommu/iommu_intrmap.h, to avoid
bringing all dmar headers into interrupt code. The non-PCI(e) devices
which generate message interrupts on FSB require special handling. The
HPET FSB interrupts are remapped, while DMAR interrupts are not.

For each msi and ioapic interrupt source, the iommu cookie is added,
which is in fact index of the IRE (interrupt remap entry) in the IR
table. Cookie is made at the source allocation time, and then used at
the map time to fill both IRE and device registers. The MSI
address/data registers and IO-APIC redirection registers are
programmed with the special values which are recognized by IR and used
to restore the IRE index, to find proper delivery mode and target.
Map all MSI interrupts in the block when msi_map() is called.

Since an interrupt source setup and dismantle code are done in the
non-sleepable context, flushing interrupt entries cache in the IR
hardware, which is done async and ideally waits for the interrupt,
requires busy-wait for queue to drain.  The dmar_qi_wait_for_seq() is
modified to take a boolean argument requesting busy-wait for the
written sequence number instead of waiting for interrupt.

Some interrupts are configured before IR is initialized, e.g. ACPI
SCI.  Add intr_reprogram() function to reprogram all already
configured interrupts, and call it immediately before an IR unit is
enabled.  There is still a small window after the IO-APIC redirection
entry is reprogrammed with cookie but before the unit is enabled, but
to fix this properly, IR must be started much earlier.

Add workarounds for 5500 and X58 northbridges, some revisions of which
have severe flaws in handling IR.  Use the same identification methods
as employed by Linux.

Review:	https://reviews.freebsd.org/D1892
Reviewed by:	neel
Discussed with:	jhb
Tested by:	glebius, pho (previous versions)
Sponsored by:	The FreeBSD Foundation
MFC after:	3 weeks
2015-03-19 13:57:47 +00:00
Konstantin Belousov
dcc33b0a8a Provide definitions for all descriptors types in the DMAR invalidation
queue.  They are for first-level translations and device TLB.

Review:	https://reviews.freebsd.org/D1892
Reviewed by:	neel
Sponsored by:	The FreeBSD Foundation
MFC after:	1 week
2015-03-19 13:05:55 +00:00
Konstantin Belousov
0f5830b045 Fix syntax error.
Review:	https://reviews.freebsd.org/D1892
Found by:	neel
Sponsored by:	The FreeBSD Foundation
MFC after:	3 days
2015-03-19 13:03:58 +00:00
Konstantin Belousov
743ffdaf3c When initial placement of the new entry crosses the boundary,
allocator tries to move the entry up, after the boundary.  The new
location may still fail to satisfy boundary requirement, for instance,
if the boundary is set to page size, and allocation is of multiple
pages.

Recheck that boundary is not crossed after the move.  If it is
crossed, give up on allocating the whole entry and split it.

Reported by:	Michael Fuckner <michael@fuckner.net>, running nvme(4)
Sponsored by:	The FreeBSD Foundation
MFC after:	1 week
2015-03-17 22:00:11 +00:00
Konstantin Belousov
582b656248 When inserting new entry into the address map, ensure that not only
next entry does not intersect with the tail of the new entry, but also
that previous entry is also before new entry start.

Sponsored by:	The FreeBSD Foundation
MFC after:	1 week
2015-03-17 21:55:33 +00:00
Neel Natu
8958d18cb3 Add x86 specific APIs 'lapic_ipi_alloc()' and 'lapic_ipi_free()' to allow IPI
vectors to be dynamically allocated. This allows kernel modules like vmm.ko
to allocate unique IPI slots when loaded (as opposed to hard allocating one
or more vectors).

Also, reorganize the fixed IPI vectors to create a contiguous space for
dynamic IPI allocation.

Reviewed by:	kib, jhb
Differential Revision:	https://reviews.freebsd.org/D2042
2015-03-14 00:30:41 +00:00
Neel Natu
847383d090 Free up the IPI slot used by IPI_STOP_HARD.
Change the numeric value of IPI_STOP_HARD so it doesn't occupy a valid IPI
slot. This can be done because IPI_STOP_HARD is actually delivered via NMI.

Reviewed by:	kib
Differential Revision:	https://reviews.freebsd.org/D1983
2015-03-01 02:31:27 +00:00
Konstantin Belousov
68c2f49379 Since all generations of Intel CPUs have errata which causes hang on
the cache line flush in the LAPIC page, keep direct map page covering
LAPIC mapped uncached.

To have the (incomplete) check for the LAPIC range in
pmap_invalidate_cache_range() working, lapic_paddr must be initialized
in x2APIC mode too.

Sponsored by:	The FreeBSD Foundation
MFC after:	2 months
2015-02-27 11:13:46 +00:00
Roger Pau Monné
544b31a3e3 xen/intr: fix fallout from r278854
r278854 introduced a race in the event channel handling code. We must make
sure that the pending bit is cleared before executing the filter, or else we
might miss other events that would be injected after the filter has ran but
before the pending bit is cleared.

While there also mask event channels while FreeBSD executes the ithread
bound to that event channel. This refrains Xen from injecting more
interrupts while the ithread has not finished it's work.

Sponsored by: Citrix Systems R&D
Reported by: sbruno, robak
Tested by: robak
2015-02-26 16:05:09 +00:00
Konstantin Belousov
2d4c4c8dc7 Implements EOI suppression mode, where LAPIC on EOI command for
level-triggered interrupt does not broadcast the EOI message to all
APICs in the system.  Instead, interrupt handler must follow LAPIC EOI
with IOAPIC EOI.  For modern IOAPICs, the later is done by writing to
EOIR register.  Otherwise, Intel provided Linux with a trick of
temporary switching the pin config to edge and then back to level.

Detect presence of EOIR register by reading IO-APIC version.  The
summary table in the comments was taken from the Linux kernel.  For
Intel, newer IO-APICs are only briefly documented as part of the
ICH/PCH datasheet.  According to the BKDG and chipset documentation,
AMD LAPICs do not provide EOI suppression, althought IO-APICs do
declare version 0x21 and implement EOIR.

The trick to temporary switch pin to edge mode to clear IRR was tested
on modern chipset, by pretending that EOIR is not present, i.e. by
forcing io_haseoi to zero.

Tunable hw.lapic_eoi_suppression disables the optimization.

Reviewed by:	neel
Tested by:	pho
Review:	https://reviews.freebsd.org/D1943
Sponsored by:	The FreeBSD Foundation
MFC after:	2 months
2015-02-26 11:02:40 +00:00
Konstantin Belousov
45bc78eb45 For now, disable x2APIC mode when Xen is detected, even if CPU
declares support for it.  Newer versions of Xen works fine with x2APIC
code, but e.g. Xen 4.2 delivers GPF on the LAPIC MSR write, despite
x2APIC mode being known to hypervisor.

Discussed with:	royger
Sponsored by:	The FreeBSD Foundation
2015-02-25 16:44:07 +00:00
Konstantin Belousov
af02586229 Revert r276949 and redo the fix for PCIe/PCI bridges, which do not
follow specification and do not provide PCIe capability.

Verify if the port above such bridge is downstream PCIe (or root port)
and treat the bridge as PCIe/PCI then.  This allows to avoid
maintaining the table of device ids for bridges without capability,
while still calculate correct request originator for devices behind
the bridge.

Submitted by:	Jason Harmening <jason.harmening@gmail.com>
MFC after:	1 week
2015-02-21 22:38:32 +00:00
Tijl Coosemans
f140f6d14b Fix build on i386 without "device apic"
Reviewed by:	kib
2015-02-20 19:42:26 +00:00
Konstantin Belousov
117c6e7cf2 Fix UP build.
Sponsored by:	The FreeBSD Foundation
MFC after:	2 months
2015-02-18 10:51:48 +00:00
Konstantin Belousov
5f674c4cbd Initialize x2APIC mode on the resume path before accessing LAPIC.
Remove unneeded disable of LAPIC in the native_lapic_xapic_mode().  We
attempt to send wakeup IPI on the resume path right after BSP wakeup,
so disabling is wrong.

Reported and tested by:	glebius, "Ranjan1018 ." <214748mv@gmail.com>
Sponsored by:	The FreeBSD Foundation
MFC after:	2 months
2015-02-16 21:56:19 +00:00
Roger Pau Monné
69138e8788 xen/intr: improve handling of legacy IRQs
Devices that use ISA IRQs expect them to be already configured, and don't
call bus_config_intr, which prevents those IRQs from working on Xen. In
order to solve it pre-register all the legacy IRQs with the default values
(edge triggered, low polarity) if no override is found.

While there add a panic if the registration of an interrupt override fails.

Sponsored by: Citrix Systems R&D
2015-02-16 16:37:59 +00:00
Roger Pau Monné
a2c5251281 xen/intr: improve PIRQ handling
Improve and cleanup the Xen PIRQ event channel code:

 - Remove the xi_shared field as it is unused.
 - Clean the "pending" bit in the EOI handler, this is more similar to how
   native interrupts are handled.
 - Don't mask edge triggered PIRQs, edge trigger interrupts cannot be
   masked.
 - Panic if PHYSDEVOP_eoi fails.
 - Remove the usage of the PHYSDEVOP_alloc_irq_vector hypercall because
   it's just a no-op in the Xen versions that are supported by FreeBSD Dom0.

Sponsored by: Citrix Systems R&D
2015-02-16 16:30:42 +00:00
Konstantin Belousov
738a5a4f41 Detect whether x2APIC on VMWare is usable without interrupt
redirection support.  Older versions of the hypervisor mis-interpret
the cpuid format in ioapic registers when x2APIC is turned on, but IR
is not used by the guest OS.

Based on:	Linux commit 4cca6ea04d31c22a7d0436949c072b27bde41f86
Tested by:	markj
Sponsored by:	The FreeBSD Foundation
MFC after:	2 months
2015-02-14 09:00:12 +00:00
Konstantin Belousov
e17c0a1e56 Registers definitions for the new capabilities from the version 2.4 of
VT-d specification.  Also add definitions for the interrupt remapping
table and IEC.

Print new capabilities on boot. although there is no hardware which
support it.

Sponsored by:	The FreeBSD Foundation
MFC after:	1 week
2015-02-11 23:30:46 +00:00
Konstantin Belousov
5a49ae8ed5 vm_page_lookup() accepts read-locked object.
Sponsored by:	The FreeBSD Foundation
MFC after:	1 week
2015-02-11 23:28:28 +00:00
Konstantin Belousov
4c918926cd Add x2APIC support. Enable it by default if CPU is capable. The
hw.x2apic_enable tunable allows disabling it from the loader prompt.

To closely repeat effects of the uncached memory ops when accessing
registers in the xAPIC mode, the x2APIC writes to MSRs are preceeded
by mfence, except for the EOI notifications.  This is probably too
strict, only ICR writes to send IPI require serialization to ensure
that other CPUs see the previous actions when IPI is delivered.  This
may be changed later.

In vmm justreturn IPI handler, call doreti_iret instead of doing iretd
inline, to handle corner conditions.

Note that the patch only switches LAPICs into x2APIC mode. It does not
enables FreeBSD to support > 255 CPUs, which requires parsing x2APIC
MADT entries and doing interrupts remapping, but is the required step
on the way.

Reviewed by:	neel
Tested by:	pho (real hardware), neel (on bhyve)
Discussed with:	jhb, grehan
Sponsored by:	The FreeBSD Foundation
MFC after:	2 months
2015-02-09 21:00:56 +00:00
John Baldwin
f418f79ce2 Revert the IPI startup sequence to match what is described in the
Intel Multiprocessor Specification v1.4.  The Intel SDM claims that
the INIT IPIs here are invalid, but other systems follow the MP
spec instead.

While here, fix the IPI wait routine to accept a timeout in microseconds
instead of a raw spin count, and don't spin forever during AP startup.
Instead, panic if a STARTUP IPI is not delivered after 20 us.

PR:		196542
Differential Revision:	https://reviews.freebsd.org/D1719
MFC after:	2 weeks
2015-02-06 18:19:59 +00:00
Bryan Venteicher
4025433eb2 Add interface to derive a TSC frequency from the pvclock
This can later use this to determine the TSC frequency like is done with
VMware, instead of using a DELAY loop that is not always accurate in an VM.

MFC after:	1 month
2015-02-04 08:33:04 +00:00
Bryan Venteicher
d3ccddf3ce Generalized parts of the XEN timer code into a generic pvclock
KVM clock shares the same data structures between the guest and the host
as Xen so it makes sense to just have a single copy of this code.

Differential Revision: https://reviews.freebsd.org/D1429
Reviewed by:	royger (eariler version)
MFC after:	1 month
2015-02-04 08:26:43 +00:00
John Baldwin
d141141610 Opt for performance over power-saving on Intel CPUs that have a
P-state but not C-state invariant TSC by changing the default behavior
to leaving the TSC enabled as the timecounter and disabling C2+ instead
of disabling the TSC by default.

Discussed with:		jkim
Tested by:		Jan Kokemuller <jan.kokemueller@gmail.com>
2015-01-29 20:41:42 +00:00
Roger Pau Monné
b829c841ad loader: fix the size of MODINFOMD_MODULEP
The data in MODINFOMD_MODULEP is packed by the loader as a 4 byte type, but
the amd64 kernel expects a vm_paddr_t, which is of size 8 bytes. Fix this by
saving it as 8 bytes in the loader and retrieving it using the proper type
in the kernel.

Sponsored by: Citrix Systems R&D
2015-01-20 12:28:24 +00:00
Neel Natu
d1b1b60065 Update the vdso timehands only via tc_windup().
Prior to this change CLOCK_MONOTONIC could go backwards when the timecounter
hardware was changed via 'sysctl kern.timecounter.hardware'. This happened
because the vdso timehands update was missing the special treatment in
tc_windup() when changing timecounters.

Reviewed by:	kib
2015-01-20 03:54:30 +00:00
Warner Losh
a66412fc4c Include mca_machdep.h. 2015-01-18 03:43:47 +00:00
Warner Losh
8e9b1703f7 Need to include opt_mca.h to test for DEV_MCA. 2015-01-17 02:17:59 +00:00
Roger Pau Monné
ca49b3342d loader: implement multiboot support for Xen Dom0
Implement a subset of the multiboot specification in order to boot Xen
and a FreeBSD Dom0 from the FreeBSD bootloader. This multiboot
implementation is tailored to boot Xen and FreeBSD Dom0, and it will
most surely fail to boot any other multiboot compilant kernel.

In order to detect and boot the Xen microkernel, two new file formats
are added to the bootloader, multiboot and multiboot_obj. Multiboot
support must be tested before regular ELF support, since Xen is a
multiboot kernel that also uses ELF. After a multiboot kernel is
detected, all the other loaded kernels/modules are parsed by the
multiboot_obj format.

The layout of the loaded objects in memory is the following; first the
Xen kernel is loaded as a 32bit ELF into memory (Xen will switch to
long mode by itself), after that the FreeBSD kernel is loaded as a RAW
file (Xen will parse and load it using it's internal ELF loader), and
finally the metadata and the modules are loaded using the native
FreeBSD way. After everything is loaded we jump into Xen's entry point
using a small trampoline. The order of the multiboot modules passed to
Xen is the following, the first module is the RAW FreeBSD kernel, and
the second module is the metadata and the FreeBSD modules.

Since Xen will relocate the memory position of the second
multiboot module (the one that contains the metadata and native
FreeBSD modules), we need to stash the original modulep address inside
of the metadata itself in order to recalculate its position once
booted. This also means the metadata must come before the loaded
modules, so after loading the FreeBSD kernel a portion of memory is
reserved in order to place the metadata before booting.

In order to tell the loader to boot Xen and then the FreeBSD kernel the
following has to be added to the /boot/loader.conf file:

xen_cmdline="dom0_mem=1024M dom0_max_vcpus=2 dom0pvh=1 console=com1,vga"
xen_kernel="/boot/xen"

The first argument contains the command line that will be passed to the Xen
kernel, while the second argument is the path to the Xen kernel itself. This
can also be done manually from the loader command line, by for example
typing the following set of commands:

OK unload
OK load /boot/xen dom0_mem=1024M dom0_max_vcpus=2 dom0pvh=1 console=com1,vga
OK load kernel
OK load zfs
OK load if_tap
OK load ...
OK boot

Sponsored by: Citrix Systems R&D
Reviewed by: jhb
Differential Revision: https://reviews.freebsd.org/D517

For the Forth bits:
Submitted by: Julien Grall <julien.grall AT citrix.com>
2015-01-15 16:27:20 +00:00
Konstantin Belousov
b1752aa0ea For x86, read MAXPHYADDR, defined in SDM vol 3 4.1.4 Enumeration of Paging
Features by CPUID as CPUID.80000008H:EAX[7:0], into variable cpu_maxphyaddr.

Reviewed by:	alc
Tested by:	pho
Sponsored by:	The FreeBSD Foundation
MFC after:	1 week
2015-01-12 07:36:25 +00:00
Konstantin Belousov
6b7c46afec Right now, for non-coherent DMARs, page table update code flushes the
cache for whole page containing modified pte, and more, only last page
in the series of the consequtive pages is flushed (i.e. the affected
mappings should be larger than 2MB).

Avoid excessive flushing and do missed neccessary flushing, by
splitting invalidation and unmapping.  For now, flush exactly the
range of the changed pte.  This is still somewhat bigger than
neccessary, since pte is 8 bytes, while cache flush line is at least
32 bytes.

The originator of the issue reports that after the change,
'dmar_bus_dmamap_unload went from 13,288 cycles down to
3,257. dmar_bus_dmamap_load_buffer went from 9,686 cycles down to
3,517.  and I am now able to get line 1GbE speed with Netperf TCP
(even with 1K message size).'

Diagnosed and tested by:	Nadav Amit <nadav.amit@gmail.com>
Sponsored by:	The FreeBSD Foundation
MFC after:	1 week
2015-01-11 20:27:15 +00:00
Konstantin Belousov
df0b07834f Fix calculation of requester for PCI device behind PCIe/PCI bridge.
In my case on the test machine, I have hierarchy of
pcib2 (PCIe port on host bridge with PCIe capability) -> pci2 ->
    pcib3 (ITE PCIe/PCI bridge) -> pci3 -> em1

The device to check PCIe capability is pcib2 and not pcib3, as it is
currently done in the code.  Also, in case of the bridge, we shall
step to pcib2 for the loop iteration, since pcib3 does not carry PCIe
capability info and would force wrong recalculation of rid.

Also change the returned requester to the PCIe bus which provides port
for the bridge.  This only results in changing
hw.busdma.pciX.X.X.X.bounce tunable to force identity-mapped context
for the device.

Sponsored by:	The FreeBSD Foundation
MFC after:	1 week
2015-01-10 23:12:49 +00:00
Konstantin Belousov
34e8337b8e Print rid when announcing DMAR context creation. Print sid when fault
occurs.  This allows to connect dots in case the requester is
calculated erronously.

Sponsored by:	The FreeBSD Foundation
MFC after:	1 week
2015-01-10 22:57:08 +00:00
Konstantin Belousov
b29d186cb9 Fix DMAR context allocations for the devices behind PCIe->PCI bridges
after dmar driver was converted to use rids.  The bus component to
calculate context page must be taken from the requestor rid, which is
a bridge, and not from the device bus number.

Sponsored by:	The FreeBSD Foundation
MFC after:	1 week
2015-01-09 02:10:44 +00:00
Sean Bruno
e31b1dc894 Update Features2 to display SDBG capability of processor. This is
showing up on Haswell-class CPUs

From the Intel SDM, "Table 3-20. Feature Information Returned in the
ECX Register"

11 | SDBG | A value of 1 indicates the processor supports
IA32_DEBUG_INTERFACE MSR for silicon debug.

Submitted by:	jiashiun@gmail.com
Reviewed by:	jhb neel
MFC after:	2 weeks
2015-01-08 16:50:35 +00:00
John Baldwin
c0ae66888b Create a cpuset mask for each NUMA domain that is available in the
kernel via the global cpuset_domain[] array. To export these to userland,
add a CPU_WHICH_DOMAIN level that can be used to fetch the mask for a
specific domain. Add a -d flag to cpuset(1) that can be used to fetch
the mask for a given domain.

Differential Revision:	https://reviews.freebsd.org/D1232
Submitted by:	jeff (kernel bits)
Reviewed by:	adrian, jeff
2015-01-08 15:53:13 +00:00
Mark Johnston
bdb9ab0dd9 Factor out duplicated code from dumpsys() on each architecture into generic
code in sys/kern/kern_dump.c. Most dumpsys() implementations are nearly
identical and simply redefine a number of constants and helper subroutines;
a generic implementation will make it easier to implement features around
kernel core dumps. This change does not alter any minidump code and should
have no functional impact.

PR:		193873
Differential Revision:	https://reviews.freebsd.org/D904
Submitted by:	Conrad Meyer <conrad.meyer@isilon.com>
Reviewed by:	jhibbits (earlier version)
Sponsored by:	EMC / Isilon Storage Division
2015-01-07 01:01:39 +00:00
John Baldwin
92597e064b On some Intel CPUs with a P-state but not C-state invariant TSC the TSC
may also halt in C2 and not just C3 (it seems that in some cases the BIOS
advertises its C3 state as a C2 state in _CST).  Just play it safe and
disable both C2 and C3 states if a user forces the use of the TSC as the
timecounter on such CPUs.

PR:		192316
Differential Revision:	https://reviews.freebsd.org/D1441
No objection from:	jkim
MFC after:	1 week
2015-01-05 20:44:44 +00:00
Hans Petter Selasky
588327f919 Fix warning about possible use of uninitialized variable. 2015-01-02 08:42:44 +00:00
Roger Pau Monné
f229f35db7 xen/intr: balance dynamic interrupts across available vCPUs
By default Xen binds all event channels to vCPU#0, and FreeBSD only shuffles
the interrupt sources once, at the end of the boot process. Since new event
channels might be created after this point (because new devices or backends
are added), try to automatically shuffle them at creation time.

This does not affect VIRQ or IPI event channels, that are already bound to a
specific vCPU as requested by the caller.

Sponsored by: Citrix Systems R&D
2014-12-10 13:25:21 +00:00
Roger Pau Monné
23ca39cf61 xen: mask event channels while binding them to a vCPU
Mask the event channel source before trying to bind it to a CPU, this
prevents stray interrupts from firing while assigning them and hitting the
KASSERT in xen_intr_handle_upcall.

Sponsored by: Citrix Systems R&D
2014-12-10 11:42:02 +00:00
Roger Pau Monné
1093cd82e0 xen: convert the Grant-table code to a NewBus device
This allows the Grant-table code to attach directly to the xenpv bus,
allowing us to remove the grant-table initialization done in xenpv.

Sponsored by: Citrix Systems R&D
2014-12-10 11:35:41 +00:00
Roger Pau Monné
f35b3592e6 xen: create a new PCI bus override
When running as a Xen PVH Dom0 we need to add custom buses that override
some of the functionality present in the ACPI PCI Bus and the PCI Bus. We
currently override the ACPI PCI Bus, but not the PCI Bus, so add a new
override for the PCI Bus and share the generic functions between them.

Reported by: David P. Discher <dpd@dpdtech.com>
Sponsored by: Citrix Systems R&D

conf/files.amd64:
 - Add the new files.

x86/xen/xen_pci_bus.c:
 - Generic file that contains the PCI overrides so they can be used by the
   several PCI specific buses.

xen/xen_pci.h:
 - Prototypes for the generic overried functions.

dev/xen/pci/xen_pci.c:
 - Xen specific override for the PCI bus.

dev/xen/pci/xen_acpi_pci.c:
 - Xen specific override for the ACPI PCI bus.
2014-12-09 18:03:25 +00:00
Roger Pau Monné
2691a406c7 xen: notify ACPI about SCI override
If the SCI is remapped to a non-ISA global interrupt notify the ACPI
subsystem about the override.

Reported by: David P. Discher <dpd@dpdtech.com>
Sponsored by: Citrix Systems R&D
2014-12-09 11:12:24 +00:00
John Baldwin
180e57e5c7 Improve support for XSAVE with debuggers.
- Dump an NT_X86_XSTATE note if XSAVE is in use. This note is designed
  to match what Linux does in that 1) it dumps the entire XSAVE area
  including the fxsave state, and 2) it stashes a copy of the current
  xsave mask in the unused padding between the fxsave state and the
  xstate header at the same location used by Linux.
- Teach readelf() to recognize NT_X86_XSTATE notes.
- Change PT_GET/SETXSTATE to take the entire XSAVE state instead of
  only the extra portion. This avoids having to always make two
  ptrace() calls to get or set the full XSAVE state.
- Add a PT_GET_XSTATE_INFO which returns the length of the current
  XSTATE save area (so the size of the buffer needed for PT_GETXSTATE)
  and the current XSAVE mask (%xcr0).

Differential Revision:	https://reviews.freebsd.org/D1193
Reviewed by:	kib
MFC after:	2 weeks
2014-11-21 20:53:17 +00:00
John Baldwin
824fc46089 MFamd64: Add support for extended FPU states on i386. This includes
support for AVX on i386.
- Similar to amd64, move the FPU save area out of the PCB and instead
  store saved FPU state in a variable-sized buffer after the PCB on the
  stack.
- To support the variable PCB location, alter the locore code to only use
  the bottom-most page of proc0stack for init386().  init386() returns
  the correct stack pointer to locore which adjusts the stack for thread0
  before calling mi_startup().
- Don't bother setting cr3 in thread0's pcb in locore before calling
  init386().  It wasn't used (init386() overwrote it at the end) and
  it doesn't work with the variable-sized FPU save area.
- Remove the new-bus attachment from npx.  This was only ever useful for
  external co-processors using IRQ13, but those have not been supported
  for several years.  npxinit() is now called much earlier during boot
  (init386()) similar to amd64.
- Implement PT_{GET,SET}XSTATE and I386_GET_XFPUSTATE.
- npxsave() is now only called from context switch contexts so it can
  use XSAVEOPT.

Differential Revision:	https://reviews.freebsd.org/D1058
Reviewed by:	kib
Tested on:	FreeBSD/i386 VM under bhyve on Intel i5-2520
2014-11-02 22:58:30 +00:00
John Baldwin
01e1933dcc Rework virtual machine hypervisor detection.
- Move the existing code to x86/x86/identcpu.c since it is x86-specific.
- If the CPUID2_HV flag is set, assume a hypervisor is present and query
  the 0x40000000 leaf to determine the hypervisor vendor ID.  Export the
  vendor ID and the highest supported hypervisor CPUID leaf via
  hv_vendor[] and hv_high variables, respectively.  The hv_vendor[]
  array is also exported via the hw.hv_vendor sysctl.
- Merge the VMWare detection code from tsc.c into the new probe in
  identcpu.c.  Add a VM_GUEST_VMWARE to identify vmware and use that in
  the TSC code to identify VMWare.

Differential Revision:	https://reviews.freebsd.org/D1010
Reviewed by:	delphij, jkim, neel
2014-10-28 19:17:44 +00:00