Commit Graph

1222 Commits

Author SHA1 Message Date
Julien Grall
b55c0d5f56 xen: move x86-specific xen_vector_callback_enabled to sys/x86
This is x86-only and so should not be in the common area.

Submitted by: Elliott Mitchell <ehem+freebsd@m5p.com>
Reviewed by: royger
Differential revision: https://reviews.freebsd.org/D29040
2021-03-15 14:20:21 +01:00
Kyle Evans
8cc15b0dfc x86: tsc: deprioritize TSC on VirtualBox
Misbehavior has been observed with TSC under VirtualBox, where threads
doing small sleeps (~1 second) may miss their wake up and hang around
in a sleep state indefinitely.  Switching back to ACPI-fast decidedly
fixes it, so stop using TSC on VirtualBox at least for the time being.

This partially reverts 84eaf2ccc6, applying it only to VirtualBox and
increasing the quality to 0. Negative qualities can never be chosen and
cannot be chosen with the tunable recently added. If we do not have a
timecounter with a higher quality than 0, then TSC does at least leave
the system mostly usable.

PR:		253087
Reviewed by:	emaste, kib
MFC after:	3 days
Differential Revision:	https://reviews.freebsd.org/D29132
2021-03-08 14:43:06 -06:00
Mark Johnston
435c7cfb24 Rename _cscan_atomic.h and _cscan_bus.h to atomic_san.h and bus_san.h
Other kernel sanitizers (KMSAN, KASAN) require interceptors as well, so
put these in a more generic place as a step towards importing the other
sanitizers.

No functional change intended.

MFC after:	1 week
Sponsored by:	The FreeBSD Foundation
Differential Revision:	https://reviews.freebsd.org/D29103
2021-03-08 12:39:06 -05:00
Allan Jude
d0673fe160 smbios: Move smbios driver out from x86 machdep code
Add it to the x86 GENERIC and MINIMAL kernels

Sponsored by:	Ampere Computing LLC
Submitted by:	Klara Inc.
Reviewed by:	rpokala
Differential Revision:	https://reviews.freebsd.org/D28738
2021-02-23 21:17:09 +00:00
Roger Pau Monné
a2495c3667 xen/boot: allow specifying boot method when booted from Xen
Allow setting the bootmethod variable from the Xen PVH entry point, in
order to be able to correctly set the underlying firmware mode when
booted as a dom0.

Move the bootmethod variable to be defined in x86/cpu_machdep.c
instead so it can be shared by both i386 and amd64.

Sponsored by:		Citrix Systems R&D
Reviewed by:		kib
Differential revision:	https://reviews.freebsd.org/D28619
2021-02-16 15:26:11 +01:00
Mark Johnston
b577047027 mca: Handle inconsistent CMCI capability reporting
A BIOS bug may apparently cause the BSP to report that it does not
implement CMCI, with some APs reporting that they do.  In this scenario,
avoid a NULL pointer dereference that occurs in cmci_monitor() because
cmc_state was not allocated by the BSP.

PR:		253272
Reported by:	asomers, mmacy
Reviewed by:	kib (previous version)
MFC after:	1 week
2021-02-08 14:42:54 -05:00
Mateusz Guzik
e6ff6154d2 x86: use compiler intrinsics for bswap* 2021-02-01 04:53:23 +00:00
Roger Pau Monné
b6d85a5f51 stand/multiboot: adjust the protocol between loader and kernel
There's a currently ad-hoc protocol to hand off the FreeBSD kernel
payload between the loader and the kernel itself when Xen is in the
middle of the picture. Such protocol wasn't very resilient to changes
to the loader itself, because it relied on moving metadata around to
package it using a certain layout. This has proven to be fragile, so
replace it with a more robust version.

The new protocol requires using a xen_header structure that will be
used to pass data between the FreeBSD loader and the FreeBSD kernel
when booting in dom0 mode. At the moment the only data conveyed is the
offset of the start of the module metadata relative to the start of the
module itself.

This is a slightly disruptive change since it also requires a change
to the kernel which is contained in this patch. In order to update
with this change the kernel must be updated before updating the
loader, as described in the handbook. Note this is only required when
booting a FreeBSD/Xen dom0. This change doesn't affect the normal
FreeBSD boot protocol.

This fixes booting FreeBSD/Xen in dom0 mode after
3630506b9d.

Sponsored by:		Citrix Systems R&D
MFC after:		3 days
Reviewed by:		tsoome
Differential Revision:	https://reviews.freebsd.org/D28411
2021-01-29 15:23:26 +01:00
Konstantin Belousov
9f47eeffa3 x86: switch kernel TSC timecounter to RDTSCP on AMD Zen CPUs
Reported by:	many
Tested by:	gallatin, mikael, rhurlin
MFC after:	1 week
Sponsored by:	The FreeBSD Foundation
2021-01-21 14:55:31 +02:00
Konstantin Belousov
f3ea417f96 x86 busdma_bounce: use malloc_domainset_aligned(9).
This stops busdma bounce making assumptions about alignment of malloc(9)
results, which are no longer true.

Also add assert that the result of malloc_aligned() fits into single
page, which is the assumption of the code.

Reported by:	dim
Reviewed by:	andrew, jah, markj
Tested by:	pho
MFC after:	1 week
Sponsored by:	The FreeBSD Foundation
Differential Revision:	https://reviews.freebsd.org/D28147
2021-01-17 19:29:05 +02:00
Konstantin Belousov
f9d85a0821 Revert "x86 busdma_bounce: do not make assumptions about alignment of malloc(9) results."
This reverts commit 8f54940f01.
The free needs to be called on the address returned by malloc,
not the realigned address.

Noted by:	andrew
Sponsored by:	The FreeBSD Foundation
2021-01-13 17:44:00 +02:00
Konstantin Belousov
8f54940f01 x86 busdma_bounce: do not make assumptions about alignment of malloc(9) results.
Reported by:	dim
Reviewed by:	dim, jah
Tested by:	dim, pho
MFC after:	2 weeks
Sponsored by:	The FreeBSD Foundation
Differential Revision:	https://reviews.freebsd.org/D28108
2021-01-13 17:00:49 +02:00
Konstantin Belousov
895ad33784 x86 budma_bounce: style.
Reviewed by:	dim, jah
MFC after:	3 days
Sponsored by:	The FreeBSD Foundation
Differential Revision:	https://reviews.freebsd.org/D28108
2021-01-13 17:00:46 +02:00
Konstantin Belousov
a013e285df x86 tsc: mark %eax as earlyclobber in tscp_get_timecount_low().
i386 codegen insists on preloading tc_priv into register on i386, and
this register cannot be %eax because RDTSCP instruction clobbers it
before it is used.

Reported and tested by:	dim
MFC after:	6 days
Sponsored by:	The FreeBSD Foundation
2021-01-11 00:05:49 +02:00
Konstantin Belousov
9e680e4005 tsc: add RDTSCP or faster variants of get_timecount()
Use it in preference of Xfenced RDTSC if RDTSCP is supported. It is
recommended by both Intel and AMD. But, on AMD Zens and newer use
LFENCE, as recommended by AMD [*]. In particular, this means that now
AMD CPUs use more appropriate fence instead of too harsh MFENCe.

Add comment explaining the intent of the selection logic.

Reported by:	gallatin [*]
Reviewed by:	gallatin, markj
Tested by:	gallatin, pho
MFC after:	1 week
Sponsored by:	The FreeBSD Foundation
Differential Revision:	https://reviews.freebsd.org/D27986
2021-01-10 04:42:34 +02:00
Konstantin Belousov
826fc3cc3d tsc: use u_int for return type for prototype, same as in definitions.
Reviewed by:	gallatin, markj
MFC after:	3 days
Sponsored by:	The FreeBSD Foundation
Differential Revision:	https://reviews.freebsd.org/D27986
2021-01-10 04:42:34 +02:00
Konstantin Belousov
aa9450e44b x86 identcpu.c: fix formatting of the comment.
Reviewed by:	gallatin, markj
MFC after:	3 days
Sponsored by:	The FreeBSD Foundation
Differential Revision:	https://reviews.freebsd.org/D27986
2021-01-10 04:42:33 +02:00
Konstantin Belousov
84eaf2ccc6 x86: stop punishing VMs with low priority for TSC timecounter
I suspect that virtualization techniques improved from the time when we
have to effectively disable TSC use in VM.  For instance, it was reported
(complained) in https://github.com/JuliaLang/julia/issues/38877 that
FreeBSD is groundlessly slow on AWS with some loads.

Remove the check and start watching for complaints.

Reviewed by:	emaste, grehan
Discussed with:	cperciva
Sponsored by:	The FreeBSD Foundation
Differential Revision:	https://reviews.freebsd.org/D27629
2020-12-23 12:45:15 +02:00
Ryan Libby
ee47a12a49 dmar: reserve memory windows of PCIe root port
PCI memory address space is shared between memory-mapped devices (MMIO)
and host memory (which may be remapped by an IOMMU). Device accesses to
an address within a memory aperture in a PCIe root port will be treated
as peer-to-peer and not forwarded to an IOMMU. To avoid this, reserve
the address space of the root port's memory apertures in the address
space used by the IOMMU for remapping.

Reviewed by:	kib, tychon
Discussed with:	Anton Rang <rang@acm.org>
Tested by:	tychon
Sponsored by:	Dell EMC Isilon
Differential Revision:	https://reviews.freebsd.org/D27503
2020-12-09 18:43:58 +00:00
Warner Losh
730b1b4d1c busdma: Annotate bus_dmamap_sync() with fence
Add an explicit thread fence release before returning from
bus_dmamap_sync. This should be a no-op in practice, but makes
explicit that all ordinary stores will be completed before subsequent
reads/writes to ordinary device memory. On x86, normal memory ordering
is strong enough to generally guarantee this. The fence keeps the
optimizer (likely LTO) from reordering other calls around this.
The other architectures already have calls, as appropriate, that
are equivalent.

Note: On x86, there is one exception to this rule. If you've mapped
memory as write combining, then you will need to add a sfence or
similar. Normally, though, busdma doesn't operate on such memory, and
drivers that do already cope appropriately.

Reviewed by: kib@, gallatin@, chuck@, mav@
Differential Revision: https://reviews.freebsd.org/D27448
2020-12-04 21:34:04 +00:00
John Baldwin
5941edfcdc Add a kstack_contains() helper function.
This is useful for stack unwinders which need to avoid out-of-bounds
reads of a kernel stack which can trigger kernel faults.

Reviewed by:	kib, markj
Obtained from:	CheriBSD
Sponsored by:	DARPA
Differential Revision:	https://reviews.freebsd.org/D27356
2020-12-01 17:04:46 +00:00
Toomas Soome
a4a10b37d4 Add VT driver for VBE framebuffer device
Implement vt_vbefb to support Vesa Bios Extensions (VBE) framebuffer with VT.
vt_vbefb is built based on vt_efifb and is assuming similar data for
initialization, use MODINFOMD_VBE_FB to identify the structure vbe_fb
in kernel metadata.

struct vbe_fb, is populated by boot loader, and is passed to kernel via
metadata payload.

Differential Revision:	https://reviews.freebsd.org/D27373
2020-11-30 08:22:40 +00:00
Yuri Pankov
41062d6878 hwpstate_intel: don't unconditionally print the error message
Actually check the wrmsr_safe() return value when setting autonomous
HWP for package.

PR:		245582
Differential Revision:	https://reviews.freebsd.org/D24744
2020-11-29 01:43:04 +00:00
Ruslan Bukin
f593116991 Add device_t member to struct iommu.
This is needed on arm64 for the interface between iommu framework
and iommu controller drivers.

Reviewed by:	kib
Sponsored by:	Innovate DSbD
Differential Revision:	https://reviews.freebsd.org/D27229
2020-11-16 15:29:52 +00:00
Conrad Meyer
e9b13c6612 linux(4): Deduplicate unimpl/dummy syscall handlers
No functional change.

Reviewed by:	emaste, trasz
Differential Revision:	https://reviews.freebsd.org/D27099
2020-11-05 19:30:31 +00:00
Ruslan Bukin
9729b14985 Move the iommu stubs to a generic place, so they are available on all the
platforms.

This allows to not depend on the IOMMU macro in AHCI driver.

Requested by:	kib
Suggested by:	andrew
Reviewed by:	kib
Sponsored by:	Innovate DSbD
Differential Revision:	https://reviews.freebsd.org/D26887
2020-10-23 21:27:48 +00:00
Ruslan Bukin
94dfb28ee0 Assign the reserved apic region (GAS entry) to the iommu domain msi_entry.
Requested by:	kib
Reviewed by:	kib
Sponsored by:	Innovate DSbD
Differential Revision:	https://reviews.freebsd.org/D26859
2020-10-19 15:50:58 +00:00
Konstantin Belousov
d3ba71b2b1 Limit workaround for errata E400 to appropriate AMD cpus.
From Linux sources and several datasheets I looked at, it seems that
the workaround is only needed on families 0xf and 0x10.  For instance,
Ryzens do not implement the accessed MSR at all, it is documented as
reserved.  Also, hypervisors should not allow guest to put CPU into
idle state, so activate workaround only when on bare hardware.

While there, style the code:
    move MSR defines to specialreg.h
    move identification to initcpu.c

Reported by:	whu
Reviewed by:	avg
Sponsored by:	The FreeBSD Foundation
MFC after:	1 week
Differential revision:	https://reviews.freebsd.org/D26470
2020-10-14 22:57:50 +00:00
Konstantin Belousov
6f3b523c9a Avoid dump_avail[] redefinition.
Move dump_avail[] extern declaration and inlines into a new header
vm/vm_dumpset.h.  This fixes default gcc build for mips.

Reviewed by:	alc, scottph
Tested by:	kevans (previous version)
Sponsored by:	The FreeBSD Foundation
Differential revision:	https://reviews.freebsd.org/D26741
2020-10-14 22:51:40 +00:00
Warner Losh
8e82f10172 timer_restore is now unused, remove it
apm was the only consumer of timer_restore. Now that it's gone, this
can be removed.
2020-10-08 20:56:11 +00:00
Mitchell Horne
6debfd4b13 Remove unused function cpu_boot()
The prototype was added with the creation of kern_shutdown.c in r17658,
but it appears to have never been implemented. Remove it now.

Reviewed by:	cem, kib
Differential Revision:	https://reviews.freebsd.org/D26702
2020-10-06 23:16:56 +00:00
Ryan Moeller
3331a1d173 Explicit CTLFLAG_DYN not needed
Dynamically created OIDs automatically get this flag set.

Reviewed by:	jhb
MFC after:	1 week
Sponsored by:	iXsystems, Inc.
Differential Revision:	https://reviews.freebsd.org/D26561
2020-10-04 19:37:15 +00:00
Konstantin Belousov
5e8ea68fd8 Move ctx_switch_xsave declaration to amd64 md_var.h.
Sponsored by:	The FreeBSD Foundation
MFC after:	3 days
2020-10-03 23:07:09 +00:00
Ruslan Bukin
6186bfbd18 Rename kernel option ACPI_DMAR to IOMMU.
This is mostly needed for a common arm64/amd64 iommu code.

Reviewed by:	kib
Sponsored by:	DARPA, AFRL
Differential Revision:	https://reviews.freebsd.org/D26587
2020-09-29 20:29:07 +00:00
Michal Meloun
88f7c52f31 Add missing declarations of 64-bit variants of bus_peek/bus_poke on amd64.
It fixes GENERIC-KCSAN build.

Reported by:	rpokala
MFC after:	1 month
MFC with:	r365899
2020-09-24 08:40:32 +00:00
Warner Losh
f9ba2bbe3a Use envvar rather than nonstandard hint. lines
The NOTES files have a bunch of hint lines that are removed when
generating LINT. However, we can achieve the same effect by prepending
each of the lines with 'envvar' so the NOTES files become standard
config(8) files. No functional changes as the sed script to generate
the LINT files filters these either way.

Suggested by: kevans
2020-09-23 19:18:53 +00:00
D Scott Phillips
ab041f713a Move vm_page_dump bitset array definition to MI code
These definitions were repeated by all architectures, with small
variations. Consolidate the common definitons in machine
independent code and use bitset(9) macros for manipulation. Many
opportunities for deduplication remain in the machine dependent
minidump logic. The only intended functional change is increasing
the bit index type to vm_pindex_t, allowing the indexing of pages
with address of 8 TiB and greater.

Reviewed by:	kib, markj
Approved by:	scottl (implicit)
MFC after:	1 week
Sponsored by:	Ampere Computing, Inc.
Differential Revision:	https://reviews.freebsd.org/D26129
2020-09-21 22:20:37 +00:00
Michal Meloun
3182062142 Add missing assignment forgotten in r365899
Noticed by:	mav
MFC after:	1 month
MFC with:	r365899
2020-09-20 15:11:52 +00:00
Michal Meloun
95a85c125d Add NetBSD compatible bus_space_peek_N() and bus_space_poke_N() functions.
One problem with the bus_space_read_N() and bus_space_write_N() family of
functions is that they provide no protection against exceptions which can
occur when no physical hardware or device responds to the read or write
cycles. In such a situation, the system typically would panic due to a
kernel-mode bus error. The bus_space_peek_N() and bus_space_poke_N() family
of functions provide a mechanism to handle these exceptions gracefully
without the risk of crashing the system.

Typical example is access to PCI(e) configuration space in bus enumeration
function on badly implemented PCI(e) root complexes (RK3399 or Neoverse
N1 N1SDP and/or access to PCI(e) register when device is in deep sleep state.

This commit adds a real implementation for arm64 only. The remaining
architectures have bus_space_peek()/bus_space_poke() emulated by using
bus_space_read()/bus_space_write() (without exception handling).

MFC after:	1 month
Reviewed by:	kib
Differential Revision:	https://reviews.freebsd.org/D25371
2020-09-19 11:06:41 +00:00
Scott Long
74c781ed91 Refine the busdma template interface. Provide tools for filling in fields
that can be extended, but also ensure compile-time type checking.  Refactor
common code out of arch-specific implementations.  Move the mpr and mps
drivers to this new API.  The template type remains visible to the consumer
so that it can be allocated on the stack, but should be considered opaque.
2020-09-14 05:58:12 +00:00
Jason A. Harmening
3966af52f3 amd64: prevent KCSan false positives on LAPIC mapping
For configurations without x2APIC support (guests, older hardware), the global
LAPIC MMIO mapping will trigger false-positive KCSan reports as it will appear
that multiple CPUs are concurrently reading and writing the same address.
This isn't actually true, as the underlying physical access will be performed
on the local CPU's APIC. Additionally, because LAPIC access can happen during
event timer configuration, the resulting KCSan printf can produce a panic due
to attempted recursion on event timer resources.

Add a __nosanitizethread preprocessor define to prevent the compiler from
inserting TSan hooks, and apply it to the x86 LAPIC accessors.

PR:		249149
Reported by:	gbe
Reviewed by:	andrew, kib
Tested by:	gbe
Differential Revision:	https://reviews.freebsd.org/D26354
2020-09-12 07:04:00 +00:00
John Baldwin
d8dc46f6e9 Add constant for the DE_CFG MSR on AMD CPUs.
Reported by:	Patrick Mooney <pmooney@pfmooney.com>
MFC after:	2 weeks
Differential Revision:	https://reviews.freebsd.org/D25885
2020-09-11 20:32:40 +00:00
Ruslan Bukin
cb9050dd21 Move the rid variable to the generic iommu context.
It could be used in various IOMMU platforms, not only DMAR.

Reviewed by:	kib
Sponsored by:	DARPA, AFRL
Differential Revision:	https://reviews.freebsd.org/D26373
2020-09-10 14:12:25 +00:00
Mateusz Guzik
ab6c81a218 x86: clean up empty lines in .c and .h files 2020-09-01 21:23:59 +00:00
Konstantin Belousov
f446480b5f amd64: Handle 5-level paging on wakeup.
We can switch into long mode directly with LA57 enabled.

Sponsored by:	The FreeBSD Foundation
Differential revision:	https://reviews.freebsd.org/D25273
2020-08-23 20:43:23 +00:00
Konstantin Belousov
25f2da2e64 Add amd64 procctl(2) ops to manage forced LA48/LA57 VA after exec.
Tested by:	pho (LA48 hardware)
Sponsored by:	The FreeBSD Foundation
Differential revision:	https://reviews.freebsd.org/D25273
2020-08-23 20:32:13 +00:00
Konstantin Belousov
4ba405dcdb Add definition for CR4.LA57 bit.
Tested by:	pho
Sponsored by:	The FreeBSD Foundation
Differential revision:	https://reviews.freebsd.org/D25273
2020-08-23 20:08:05 +00:00
Peter Grehan
3a3f1e9dfa Export a routine to provide the TSC_AUX MSR value and use this in vmm.
Also, drop an unnecessary set of braces.

Requested by:	kib
Reviewed by:	kib
MFC after:	3 weeks
2020-08-18 11:36:38 +00:00
Ruslan Bukin
0424f19e9e Move dmar_domain_unload_task to busdma_iommu.c.
Reviewed by:	kib
Sponsored by:	DARPA/AFRL
Differential Revision:	https://reviews.freebsd.org/D25972
2020-08-06 12:49:25 +00:00
Ruslan Bukin
16696f6057 Add iommu_domain constructor and destructor.
Reviewed by:	kib
Sponsored by:	DARPA/AFRL
Differential Revision:	https://reviews.freebsd.org/D25956
2020-08-06 08:48:23 +00:00
Ruslan Bukin
c4cd699010 o Add machine/iommu.h and include MD iommu headers from it,
so we don't ifdef for every arch in busdma_iommu.c;
o No need to include specialreg.h for x86, remove it.

Requested by:	andrew
Reviewed by:	kib
Sponsored by:	DARPA/AFRL
Differential Revision:	https://reviews.freebsd.org/D25957
2020-08-05 19:11:31 +00:00
Ruslan Bukin
78b517543b Add a few macroses for conversion between DMAR unit, domain, ctx
and IOMMU unit, domain, ctx.

Reviewed by:	kib
Sponsored by:	DARPA, AFRL
Differential Revision:	https://reviews.freebsd.org/D25926
2020-08-04 20:51:05 +00:00
Mark Johnston
96ad26eefb Remove free_domain() and uma_zfree_domain().
These functions were introduced before UMA started ensuring that freed
memory gets placed in domain-local caches.  They no longer serve any
purpose since UMA now provides their functionality by default.  Remove
them to simplyify the kernel memory allocator interfaces a bit.

Reviewed by:	cem, kib
Sponsored by:	The FreeBSD Foundation
Differential Revision:	https://reviews.freebsd.org/D25937
2020-08-04 13:58:36 +00:00
Ruslan Bukin
0eed04c802 Add iommu_domain_map_ops virtual table with map/unmap methods
so x86 can support Intel DMAR and AMD IOMMU simultaneously.

Reviewed by:	kib
Sponsored by:	DARPA/AFRL
Differential Revision:	https://reviews.freebsd.org/D25894
2020-07-31 23:02:17 +00:00
Ruslan Bukin
c8597a1f9f o Don't include headers from iommu.h, include them from the header
consumers instead;
o Order includes properly.

Reviewed by:	kib
Sponsored by:	DARPA/AFRL
Differential Revision:	https://reviews.freebsd.org/D25878
2020-07-29 22:08:54 +00:00
Ruslan Bukin
386f81fca7 Fix !ACPI_DMAR build.
Reviewed by:	kib
Sponsored by:	DARPA/AFRL
Differential Revision:	https://reviews.freebsd.org/D25882
2020-07-29 19:22:50 +00:00
Ruslan Bukin
1b0c9c21d9 o Move iommu_set_buswide_ctx, iommu_is_buswide_ctx to
the generic iommu busdma backend;
o Move bus_dma_iommu_set_buswide, bus_dma_iommu_load_ident
  prototypes to iommu.h.

Reviewed by:	kib
Sponsored by:	DARPA, AFRL
Differential Revision:	https://reviews.freebsd.org/D25866
2020-07-29 13:23:27 +00:00
Ruslan Bukin
ea4c01156a o Move the buswide_ctxs bitmap to iommu_unit and rename related functions.
o Rename bus_dma_dmar_load_ident() as well.

Reviewed by:	kib
Sponsored by:	DARPA/AFRL
Differential Revision:	https://reviews.freebsd.org/D25852
2020-07-28 16:08:14 +00:00
Alexander Motin
855e49f3b0 Add initial driver for ACPI Platform Error Interfaces.
APEI allows platform to report different kinds of errors to OS in several
ways.  We've found that Supermicro X10/X11 motherboards report PCIe errors
appearing on hot-unplug via this interface using NMI.  Without respective
driver it ended up in kernel panic without any additional information.

This driver introduces support for the APEI Generic Hardware Error Source
reporting via NMI, SCI or polling.  It decodes the reported errors and
either pass them to pci(4) for processing or just logs otherwise.  Errors
marked as fatal still end up in kernel panic, but some more informative.

When somebody get to native PCIe AER support implementation both of the
reporting mechanisms should get common error recovery code.  Since in our
case errors happen when the device is already gone, there is nothing to
recover, so the code just clears the error statuses, practically ignoring
the otherwise destructive NMIs in nicer way.

MFC after:	2 weeks
Relnotes:	yes
Sponsored by:	iXsystems, Inc.
2020-07-27 21:19:41 +00:00
Ruslan Bukin
15f6baf445 Rename DMAR flags:
o DMAR_DOMAIN_* -> IOMMU_DOMAIN_*
o DMAR_PGF_* -> IOMMU_PGF_*

Reviewed by:	kib
Sponsored by:	DARPA/AFRL
Differential Revision:	https://reviews.freebsd.org/D25812
2020-07-26 12:29:22 +00:00
Ruslan Bukin
357149f037 o Make the _hw_iommu sysctl node non-static;
o Move the dmar sysctl knobs to _hw_iommu_dmar.

Reviewed by:	kib
Sponsored by:	DARPA/AFRL
Differential Revision:	https://reviews.freebsd.org/D25807
2020-07-25 21:37:07 +00:00
Ruslan Bukin
9c843a409b o Move iommu gas prototypes, DMAR flags to iommu.h;
o Move hw.dmar sysctl node to iommu_gas.c.

Reviewed by:	kib
Sponsored by:	DARPA/AFRL
Differential Revision:	https://reviews.freebsd.org/D25802
2020-07-25 19:07:12 +00:00
Alexander Motin
aba10e131f Allow swi_sched() to be called from NMI context.
For purposes of handling hardware error reported via NMIs I need a way to
escape NMI context, being too restrictive to do something significant.

To do it this change introduces new swi_sched() flag SWI_FROMNMI, making
it careful about used KPIs.  On platforms allowing IPI sending from NMI
context (x86 for now) it immediately wakes clk_intr_event via new IPI_SWI,
otherwise it works just like SWI_DELAY.  To handle the delayed SWIs this
patch calls clk_intr_event on every hardclock() tick.

MFC after:	2 weeks
Sponsored by:	iXsystems, Inc.
Differential Revision:	https://reviews.freebsd.org/D25754
2020-07-25 15:19:38 +00:00
Ruslan Bukin
3024e8af1e Move Intel GAS to dev/iommu/ as now a part of generic iommu framework.
Reviewed by:	kib
Sponsored by:	DARPA/AFRL
Differential Revision:	https://reviews.freebsd.org/D25799
2020-07-25 11:34:50 +00:00
Ruslan Bukin
62ad310c93 Split-out the Intel GAS (Guest Address Space) management component
from Intel DMAR support, so it can be used on other IOMMU systems.

Reviewed by:	kib
Sponsored by:	DARPA/AFRL
Differential Revision:	https://reviews.freebsd.org/D25743
2020-07-25 09:28:38 +00:00
Alexander Motin
9977c593a7 Introduce ipi_self_from_nmi().
It allows safe IPI sending to current CPU from NMI context.

Unlike other ipi_*() functions this waits for delivery to leave LAPIC in
a state safe for interrupted code.

MFC after:	2 weeks
Sponsored by:	iXsystems, Inc.
2020-07-24 20:52:09 +00:00
Alexander Motin
279cd05b7e Use APIC_IPI_DEST_OTHERS for bitmapped IPIs too.
It should save bunch of LAPIC register accesses.

MFC after:	2 weeks
2020-07-24 20:44:50 +00:00
Alexander Motin
23ce462092 Make lapic_ipi_vectored(APIC_IPI_DEST_SELF) NMI safe.
Sending IPI to self or all CPUs does not require write into upper part of
the ICR, prone to races.  Previously the code disabled interrupts, but it
was not enough for NMIs.  Instead of that when possible write only lower
part of the register, or use special SELF IPI register in x2APIC mode.

This also removes ICR reads used to preserve reserved bits on write.
It was there from the beginning, but I failed to find explanation why,
neither I see Linux doing it.  Specification even tells that ICR content
may be lost in deep C-states, so if hardware does not bother to preserve
it, why should we?

MFC after:	2 weeks
Sponsored by:	iXsystems, Inc.
2020-07-24 19:54:15 +00:00
Ruslan Bukin
1238a28d15 Move sys/iommu.h to dev/iommu/ as a part of generic IOMMU busdma backend.
Reviewed by:	kib
Sponsored by:	DARPA/AFRL
Differential Revision:	https://reviews.freebsd.org/D25750
2020-07-21 13:50:10 +00:00
Ruslan Bukin
f2b2f31707 Move the Intel DMAR busdma backend to a generic place so
it can be used on other IOMMU systems.

Reviewed by:	kib
Sponsored by:	DARPA/AFRL
Differential Revision:	https://reviews.freebsd.org/D25720
2020-07-21 10:38:51 +00:00
Ruslan Bukin
08e99af305 o Move iommu_test_boundary() to sys/iommu.h
o Rename DMAR -> IOMMU in comments
o Add IOMMU_PAGE_SIZE / IOMMU_PAGE_MASK macroses
o x86 only: dmar_quirks_pre_use() / dmar_instantiate_rmrr_ctxs()

Reviewed by:	kib
Sponsored by:	DARPA/AFRL
Differential Revision:	https://reviews.freebsd.org/D25665
2020-07-18 13:10:31 +00:00
Ryan Moeller
ef013ceecd hwpmc: Always set pmc_cpuid to something
pmc_cpuid was uninitialized for most AMD processor families.  We can still
populate this string for unimplemented families.

Also added a CPUID_TO_STEPPING macro and converted existing code to use it.

Reviewed by:	mav
MFC after:	2 weeks
Sponsored by:	iXsystems, Inc.
Differential Revision:	https://reviews.freebsd.org/D25673
2020-07-14 22:25:06 +00:00
Konstantin Belousov
dc43978aa5 amd64: allow parallel shootdown IPIs
Stop using smp_ipi_mtx to protect global shootdown state, and
move/multiply the global state into pcpu.  Now each CPU can initiate
shootdown IPI independently from other CPUs.  Initiator enters
critical section, then fills its local PCPU shootdown info
(pc_smp_tlb_XXX), then clears scoreboard generation at location (cpu,
my_cpuid) for each target cpu.  After that IPI is sent to all targets
which scan for zeroed scoreboard generation words.  Upon finding such
word the shootdown data is read from corresponding cpu' pcpu, and
generation is set.  Meantime initiator loops waiting for all zeroed
generations in scoreboard to update.

Initiator does not disable interrupts, which should allow
non-invalidation IPIs from deadlocking, it only needs to disable
preemption to pin itself to the instance of the pcpu smp_tlb data.

The generation is set before the actual invalidation is performed in
handler. It is safe because target CPU cannot return to userspace
before handler finishes. In principle only NMI can preempt the
handler, but NMI would see the kernel handler frame and not touch
not-invalidated user page table.

Handlers loop until they do not see zeroed scoreboard generations.
This, together with hardware keeping one pending IPI in LAPIC IRR
should prevent lost shootdowns.

Notes.
1. The code does protect writes to LAPIC ICR with exclusion. I believe
   this is fine because we in fact do not send IPIs from interrupt
   handlers. More for !x2APIC mode where ICR access for write requires
   two registers write, we disable interrupts around it. If considered
   incorrect, I can add per-cpu spinlock around ipi_send().
2. Scoreboard lines owned by given target CPU can be padded to the
   cache line, to reduce ping-pong.

Reviewed by:	markj (previous version)
Discussed with:	alc
Tested by:	pho
Sponsored by:	The FreeBSD Foundation
MFC after:	3 weeks
Differential revision:	https://reviews.freebsd.org/D25510
2020-07-14 20:37:50 +00:00
Ruslan Bukin
59e37c8a54 Start splitting-out the Intel DMAR busdma backend to a generic place,
so it can be used on other IOMMU systems.

Provide MI iommu_unit, iommu_domain and iommu_ctx structs in sys/iommu.h;
use them as a first member of MD dmar_unit, dmar_domain and dmar_ctx.

Change the namespace in DMAR backend: use iommu_ prefix instead of dmar_.

Move some macroses and function prototypes to sys/iommu.h.

Reviewed by:	kib
Sponsored by:	DARPA, AFRL
Differential Revision:	https://reviews.freebsd.org/D25574
2020-07-14 10:55:19 +00:00
Jung-uk Kim
450d86fc7f Assume all TSCs are synchronized for AMD Family 17h processors and later
when it has passed the synchronization test.

"Processor Programming Reference (PPR) for AMD Family 17h" states that
the TSC uses a common reference for all sockets, cores and threads.

MFC after:	1 month
2020-06-22 20:42:58 +00:00
Konstantin Belousov
17edf152e5 Control for Special Register Buffer Data Sampling mitigation.
New microcode update for Intel enables mitigation for SRBDS, which
slows down RDSEED and related instructions.  The update also provides
a control to limit the mitigation to SGX enclaves, which should
restore the speed of random generator by the cost of potential
cross-core bufer sampling.

See https://software.intel.com/security-software-guidance/insights/deep-dive-special-register-buffer-data-sampling

GIve the user control over it.

Reviewed by:	markj
Sponsored by:	The FreeBSD Foundation
MFC after:	1 week
Differential revision:	https://reviews.freebsd.org/D25221
2020-06-12 22:14:45 +00:00
Konstantin Belousov
958d257ed5 x86: add bits definitions for SRBDS mitigation control.
See https://software.intel.com/security-software-guidance/insights/deep-dive-special-register-buffer-data-sampling

Reviewed by:	markj
Sponsored by:	The FreeBSD Foundation
MFC after:	1 week
Differential revision:	https://reviews.freebsd.org/D25221
2020-06-12 22:12:57 +00:00
Andrew Gallatin
6da16e3eb0 x86: Bump default msi/msix vector limit to 2048
Given that 64c/128t CPUs are currently available, and that many
devices (nvme, many NICs) desire to map 1 MSI-X vector per core,
or even 1 per-thread, it is becoming far easier to see MSI-X interrupt
setup fail due to msi vector exhaustion, and devices fail to attach at
boot on large system.

This bump costs 12KB on amd64 (and 6KB on i386), which seems
worth the trade off for a better out of the box experience on
high end hardware.

Reviewed by:	jhb
MFC after:	21 days
Sponsored by:	Netflix
2020-06-12 18:41:12 +00:00
Konstantin Belousov
e09fb42a9a Correct comment (this should have been committed with r362065).
Sponsored by:	The FreeBSD Foundation
MFC after:	13 days
2020-06-11 20:26:39 +00:00
Konstantin Belousov
e7a291f418 Restore TLB invalidations done before smp started.
In particular, invalidation of the preloaded modules text to allow
execution from it was broken after D25188/r362031.

Reviewed by:	markj
Reported by:	delphij, dhw
Sponsored by:	The FreeBSD Foundation
MFC after:	13 days
2020-06-11 17:25:20 +00:00
Konstantin Belousov
3b23ffe271 amd64 pmap: reorder IPI send and local TLB flush in TLB invalidations.
Right now code first flushes all local TLB entries that needs to be
flushed, then signals IPI to remote cores, and then waits for
acknowledgements while spinning idle.  In the VMWare article 'Don’t
shoot down TLB shootdowns!' it was noted that the time spent spinning
is lost, and can be more usefully used doing local TLB invalidation.

We could use the same invalidation handler for local TLB as for
remote, but typically for pmap == curpmap we can use INVLPG for locals
instead of INVPCID on remotes, since we cannot control context
switches on them.  Due to that, keep the local code and provide the
callbacks to be called from smp_targeted_tlb_shootdown() after IPIs
are fired but before spin wait starts.

Reviewed by:	alc, cem, markj, Anton Rang <rang at acm.org>
Tested by:	pho
Sponsored by:	The FreeBSD Foundation
MFC after:	2 weeks
Differential revision:	https://reviews.freebsd.org/D25188
2020-06-10 22:07:57 +00:00
Jason A. Harmening
1dccf71b4b Remove unnecessary WITNESS check in x86 bus_dma
When I did some bus_dma cleanup in r320528, I brought forward some sketchy
WITNESS checks from the prior x86 busdma wrappers, instead of recognizing
them as technical debt and just dropping them.  Two of these were removed in
r346351 and r346851, but one remains in bounce_bus_dmamem_alloc(). This check
could be constrained to only apply in the BUS_DMA_NOWAIT case, but it's cleaner
to simply remove it and rely on the checks already present in the sleepable
allocation paths used by this function.

While here, remove another unnecessary witness check in bus_dma_tag_create
(the tag is always allocated with M_NOWAIT), and fix a couple of typos.

Reported by:	cem
Reviewed by:	kib, cem
MFC after:	1 week
Differential Revision:	https://reviews.freebsd.org/D25107
2020-06-03 00:16:36 +00:00
Roger Pau Monné
a7752a2eda xenpv: do not use low 1MB for Xen mappings on i386
On amd64 we already avoid using memory below 4GB in order to prevent
clashes with MMIO regions, but i386 was allowed to use any hole in
the physical memory map in order to map Xen pages.

Limit this to memory above the 1MB boundary on i386 in order to avoid
clashes with the MMIO holes in that area.

Sponsored by:	Citrix Systems R&D
MFC after:	1 week
2020-05-28 08:18:34 +00:00
Conrad Meyer
9d3b7f622a x86: Detect new feature bits
Fix an off-by-one in AVX512VPOPCNTDQ identification.  That was actually the
TME bit.

Reported by:	debdrup
2020-05-26 23:12:57 +00:00
Ruslan Bukin
43843cc281 Rename dmar_get_dma_tag() to acpi_iommu_get_dma_tag().
This is needed for a new IOMMU controller support.

Reviewed by:	kib
Differential Revision:	https://reviews.freebsd.org/D24943
2020-05-26 16:40:40 +00:00
Konstantin Belousov
ea6020830c amd64: Add a knob to flush RSB on context switches if machine has SMEP.
The flush is needed to prevent cross-process ret2spec, which is not handled
on kernel entry if IBPB is enabled but SMEP is present.
While there, add i386 RSB flush.

Reported by:	Anthony Steinhauser <asteinhauser@google.com>
Reviewed by:	markj, Anthony Steinhauser
Discussed with:	philip
admbugs:	961
Sponsored by:	The FreeBSD Foundation
MFC after:	1 week
2020-05-20 22:00:31 +00:00
Konstantin Belousov
36e1ad61e8 Do not consider CAP_RDCL_NO as an indicator for all MDS vulnerabilities
handled by hardware.

Reported by:	Anthony Steinhauser <asteinhauser@google.com>
admbugs:	962
Sponsored by:	The FreeBSD Foundation
MFC after:	1 week
2020-05-20 21:22:25 +00:00
Mark Johnston
e76aab6ae2 Call acpi_pxm_set_proximity_info() slightly earlier on x86.
This function is responsible for setting pc_domain in each pcpu
structure.  Call it from the main function that starts APs, rather than
a separate SYSINIT.  This makes it easier to close the window where
UMA's per-CPU slab allocator may be called while pc_domain is
uninitialized.  In particular, the allocator uses pc_domain to allocate
domain-local pages, so allocations before this point end up using domain
0 for everything.

Reviewed by:	kib
MFC after:	1 week
Sponsored by:	The FreeBSD Foundation
Differential Revision:	https://reviews.freebsd.org/D24757
2020-05-14 16:07:27 +00:00
Eric van Gyzen
ba0ced82ea Fix handling of NMIs from unknown sources (BMC, hypervisor)
Release kernels have no KDB backends enabled, so they discard an NMI
if it is not due to a hardware failure.  This includes NMIs from
IPMI BMCs and hypervisors.

Furthermore, the interaction of panic_on_nmi, kdb_on_nmi, and
debugger_on_panic is confusing.

Respond to all NMIs according to panic_on_nmi and debugger_on_panic.
Remove kdb_on_nmi.  Expand the meaning of panic_on_nmi by making
it a bitfield.  There are currently two bits: one for NMIs due to
hardware failure, and one for all others.  Leave room for more.

If panic_on_nmi and debugger_on_panic are both true, don't actually panic,
but directly enter the debugger, to allow someone to leave the debugger
and [hopefully] resume normal execution.

Reviewed by:	kib
MFC after:	2 weeks
Relnotes:	yes: machdep.kdb_on_nmi is gone; machdep.panic_on_nmi changed
Sponsored by:	Dell EMC Isilon
Differential Revision:	https://reviews.freebsd.org/D24558
2020-04-26 00:41:29 +00:00
Konstantin Belousov
ab23c2784b Improve TSC calibration logic.
Stop attempting to use FADT legacy hardware flag, it is almost always
a lie.

Instead, unless user explicitly disabled the calibration, calibrate
against 8254 ISA clock.  Then, obtain the rough value of the expected
TSC frequency from CPUID leafs 0x15/0x16 or even from the CPU
marketing name string.  If calibration results look unbelievably bogus
comparing with CPUID leafs report, use CPUID one.

Intel does not recommend to use CPUID leaf 0x16 for the value of the
system time frequency, indeed the error there might be up to 1% which
e.g. makes ntpd give up.  If ISA clock is present, we win, if not, we
get some frequency that allows the machine to boot without enormous
delay.

Next improvement would be to use HPET for re-calibration if we decided
that ISA clock gives bogus results, after HPETs are enumerated. This
is a much bigger change since we probably would need to re-evaluate
some constants depending on TSC frequency.

Reviewed by:	emaste, jhb, scottl
Tested by:	scottl
Sponsored by:	The FreeBSD Foundation
MFC after:	2 weeks
Differential revision:	https://reviews.freebsd.org/D24426
2020-04-15 22:28:51 +00:00
Konstantin Belousov
bd8a359f81 x86 tsc: fall back to CPUID if calibration results looks unbelievable.
Teested and reviewed by:	scottl
Sponsored by:	The FreeBSD Foundation
MFC after:	2 weeks
Differential revision:	https://reviews.freebsd.org/D24247
2020-04-01 16:21:11 +00:00
Konstantin Belousov
46ed9da5fc Do not spuriously re-enable disabled io_apic pin on EOI for some configurations.
If EOI suppression is supported but reported ioapic version is so old
that it does not has EOI register (weird virtualization setup), fix
Intel trick of eoi-ing by flipping pin type (edge/level) to account
for the disabled pin.

Reported by:	Juniper
Reviewed by:	jhb
Sponsored by:	The FreeBSD Foundation
MFC after:	2 weeks
Differential revision:	https://reviews.freebsd.org/D23965
2020-03-18 21:34:52 +00:00
Konstantin Belousov
1314c4920f Stop (trying to) renumber io apics.
It does not serve any purpose now, the io apic id is not seen by
software, and some Intel documents claim that the register is
implemented for FUD reasons.  More, renumbering seems to not work on
new Intel machines which actually have mismatched MADT and hw IDs.

On older machines where separate APIC bus existed, unique numbering of
all APICs was required for bus arbitration to work, but it is no
longer true (that machines were SMP from pre-Pentium IV era).

When matching PCIe IOAPIC device against MADT-enumerated IOAPICs,
compare io_apic_id from BAR against io_apic_id read from the
MADT-pointed register page.

Reviewed by:	jhb
Tested by:	flo (previous version), pho
MFC after:	2 weeks
Differential revision:	https://reviews.freebsd.org/D23965
2020-03-18 21:31:35 +00:00
Konstantin Belousov
c105f5c1fa Widen the stored io_apic_id to 8 bits.
It seems that the newer Intel chipset did that, and Linux reads 8
bits. The only detail is that all seen datasheets, even under NDA,
claim that io apic id is 4 bits.

Submitted by:	jeff
Reviewed by:	jhb
Tested by:	flo, pho
MFC after:	2 weeks
Differential revision:	https://reviews.freebsd.org/D23965
2020-03-18 21:24:34 +00:00
Pawel Biernacki
7029da5c36 Mark more nodes as CTLFLAG_MPSAFE or CTLFLAG_NEEDGIANT (17 of many)
r357614 added CTLFLAG_NEEDGIANT to make it easier to find nodes that are
still not MPSAFE (or already are but aren’t properly marked).
Use it in preparation for a general review of all nodes.

This is non-functional change that adds annotations to SYSCTL_NODE and
SYSCTL_PROC nodes using one of the soon-to-be-required flags.

Mark all obvious cases as MPSAFE.  All entries that haven't been marked
as MPSAFE before are by default marked as NEEDGIANT

Approved by:	kib (mentor, blanket)
Commented by:	kib, gallatin, melifaro
Differential Revision:	https://reviews.freebsd.org/D23718
2020-02-26 14:26:36 +00:00
Konstantin Belousov
a324b7f71d Fix IBRS for machines with IBRS_ALL capability.
When turning IBRS mitigation using sysctl, as opposed to loader tunable,
send IPI to tweak MSR on all cores.  Right now code only performed MSR write
onr the CPU where sysctl was run.

Properly report hw.ibrs_active for IBRS_ALL.  Split hw_ibrs_ibpb_active out
from ibrs_active, to keep the current semantic of guiding kernel entry and
exit handlers.

Reported and tested by:	mav
Sponsored by:	The FreeBSD Foundation
MFC after:	1 week
2020-02-25 17:26:10 +00:00
Konstantin Belousov
43cd55db57 x86/identcpu.c whitespace cleanup.
Sponsored by:	The FreeBSD Foundation
MFC after:	3 days
2020-02-21 16:55:28 +00:00
Konstantin Belousov
3f490f6e57 print_svm_info: decode a CPUID 0x8000000a.edx bit 20.
This is SVM features word, the bit is defined in "PPR for AMD Family
17h Model 31h B0", document 55803 Rev 0.54.

N.B. GuesSpecCtl (no 't') is the spelling from the document.

Submitted by:	Dmitry Luhtionov <dmitryluhtionov@gmail.com>
MFC after:	3 days
2020-02-21 16:41:17 +00:00
Mateusz Guzik
082a6b2a92 Make atomic_load_ptr type-aware
Returned value has type based on the argument, meaning consumers no longer
have to cast in the commmon case.

This commit keeps the kernel compilable without patching the rest.
2020-02-14 23:15:41 +00:00
Doug Moore
5a9447a324 In dmar_gas_lowermatch, skip searching a subtree if all its addresses are greater than lowaddr.
In dmar_gas_uppermatch, skip searching a subtree if all its gaps-between-alloctions are too small.

Reviewed by:	kib
Tested by:	pho
Differential Revision:	https://reviews.freebsd.org/D23391
2020-02-01 21:47:34 +00:00
Conrad Meyer
e656fa70dc hwpstate_intel(4): Save admin-set EPP/EPB and restore after suspend 2020-02-01 20:12:02 +00:00
Conrad Meyer
c93eb470b4 hwpstate_intel(4): Print failure message only on failure
X-MFC-With: r357379
2020-02-01 20:11:25 +00:00
Conrad Meyer
cd4e43b27d hwpstate_intel(4): Detect and support PKG variant
If package-level control is present, we default to using it.  Per-core
software control may be enabled by setting the machdep.hwpstate_pkg_ctrl
tunable to "0" in loader.conf(5).
2020-02-01 19:50:10 +00:00
Conrad Meyer
556a1a0bc6 hwpstate_intel(4): Add fallback EPP using PERF_BIAS MSR
Per Intel SDM (Vol 3b Part 2), if HWP indicates EPP (energy-performance
preference) is not supported, the hardware instead uses the ENERGY_PERF_BIAS
MSR.  In the epp sysctl handler, fall back to that MSR if HWP does not
support EPP and CPUID indicates the ENERGY_PERF_BIAS MSR is supported.
2020-02-01 19:49:13 +00:00
Conrad Meyer
5e3574c8cd x86: Add/amend some power-management comments/macros
No functional change.
2020-02-01 19:46:02 +00:00
Conrad Meyer
b80d476c3c hwpstate_intel(4): Error check epp sysctl & bail if HW does not support feature 2020-02-01 19:45:27 +00:00
Conrad Meyer
f591c3c847 intel_hwpstate(4): Use identcpu-cached cpuid 6 leaf
No functional change.
2020-02-01 17:54:46 +00:00
Conrad Meyer
351896d372 intel_hwpstate(4): Don't leak bound thread in error conditions
I don't know why a Skylake CPU with the HWP feature bit present would trap
on MSR reads of the HWP registers, but if this occurs, do not leave the
attach thread bound.  This could conceivably cause reported hangs, although
I have no evidence that this is the cause.

Reported by:	ae@, Andreas Nilsson <andrnils AT gmail.com>
X-MFC-With:	r357002
2020-02-01 17:30:45 +00:00
Conrad Meyer
43524989c5 hwpstate(4): Ignore CurPstateLimit by default
Add a sysctl knob to allow users to re-enable it, and document the knob and
default in cpufreq.4.  (While here, add a few unrelated updates to
cpufreq.4.)

It seems that the register value in some hardware simply reflects the
configured P-state.  This results in an inadvertent and unintended outcome
where the P-state can only walk down, and then the driver becomes "stuck" in
the slowest possible P-state.

The Linux driver never consults this register, so that's some evidence that
ignoring the contents are relatively harmless.

PR:		234733
Reported by:	sigsys AT gmail.com, Erich Dollanksy <freebsd.ed.lists AT
		sumeritec.com>
2020-01-31 17:40:41 +00:00
Mark Johnston
1c29da0279 Reimplement stack capture of running threads on i386 and amd64.
After r355784 the td_oncpu field is no longer synchronized by the thread
lock, so the stack capture interrupt cannot be delievered precisely.
Fix this using a loop which drops the thread lock and restarts if the
wrong thread was sampled from the stack capture interrupt handler.

Change the implementation to use a regular interrupt instead of an NMI.
Now that we drop the thread lock, there is no advantage to the latter.

Simplify the KPIs.  Remove stack_save_td_running() and add a return
value to stack_save_td().  On platforms that do not support stack
capture of running threads, stack_save_td() returns EOPNOTSUPP.  If the
target thread is running in user mode, stack_save_td() returns EBUSY.

Reviewed by:	kib
Reported by:	mjg, pho
Tested by:	pho
Sponsored by:	The FreeBSD Foundation
Differential Revision:	https://reviews.freebsd.org/D23355
2020-01-31 15:43:33 +00:00
Conrad Meyer
07a65f9d38 hwpstate_intel(4): Silence/fix Coverity reports
These were all introduced in the initial import of hwpstate_intel(4).

Reported by:	Coverity
CIDs:		1413161, 1413164, 1413165, 1413167
X-MFC-With:	r357002
2020-01-29 03:15:34 +00:00
Conrad Meyer
d9591f0c2a x86: identcpu: Decode new Intel Structured Extended feature bits 2020-01-28 01:37:20 +00:00
Conrad Meyer
4799e1997a x86: identcpu: Decode new Zen2 AMD Feature2 bit 2020-01-28 01:36:45 +00:00
Doug Moore
f886c4ba71 Correct the use of RB_AUGMENT in the RB_TREE macros so that is invoked
at the root of every subtree that changes in an insert or delete, and
only once, and ordered from the bottom of the tree to the top.  For
intel_gas.c, the only user of RB_AUGMENT I can find, change the
augmenting routine so that it does not climb from entry to tree root
on every call, and remove a 'tree correcting' function that can be
supplanted by proper tree augmentation.

Reviewed by:	kib
Tested by:	pho
Differential Revision:	https://reviews.freebsd.org/D23189
2020-01-27 15:09:13 +00:00
Conrad Meyer
9ea85092d9 hwpstate(4): Log a debug line when throttled
If we're going to throttle user requested P-states, we should at least produce
a debug log line indicating the surprising behavior.

PR:		inspired by 234733
2020-01-27 06:04:32 +00:00
Conrad Meyer
08a220dd79 cpufreq(4): Fix missing MODULE_DEPEND on hwpstate_intel
DRIVER_MODULE does not actually define a MODULE_VERSION, which is required
to satisfy a MODULE_DEPENDency.  Declare one explicitly in
hwpstate_intel(4).

Reported by:	flo
X-MFC-With:	r357002
2020-01-23 23:52:57 +00:00
Konstantin Belousov
b94c55a9cb Fix r356919.
Instead of waiting for pc_curthread which is overwritten by
init_secondary_tail(), wait for non-NULL pc_curpcb, to be set by the
first context switch.
Assert that pc_curpcb is not set too early.

Reported and tested by:	rlibby
Reviewed by:	markj, rlibby
Sponsored by:	The FreeBSD Foundation
MFC after:	1 week
Differential revision:	https://reviews.freebsd.org/D23330
2020-01-23 17:08:33 +00:00
Cy Schubert
ca9fb12a0b Fix 32-bit build post r357002. 2020-01-23 03:38:41 +00:00
Conrad Meyer
4577cf3744 cpufreq(4): Add support for Intel Speed Shift
Intel Speed Shift is Intel's technology to control frequency in hardware,
with hints from software.

Let's get a working version of this in the tree and we can refine it from
here.

Submitted by:	bwidawsk, scottph
Reviewed by:	bcr (manpages), myself
Discussed with:	jhb, kib (earlier versions)
With feedback from:	Greg V, gallatin, freebsdnewbie AT freenet.de
Relnotes:	yes
Differential Revision:	https://reviews.freebsd.org/D18028
2020-01-22 23:28:42 +00:00
Konstantin Belousov
2ee49fac82 Add support for Hygon Dhyana Family 18h processor.
As a new x86 CPU vendor, Chengdu Haiguang IC Design Co., Ltd (Hygon)
is a joint venture between AMD and Haiguang Information Technology Co.,
Ltd., aims at providing x86 processors for China server market.

The first generation Hygon processor(Dhyana) shares most architecture
with AMD's family 17h, but with different CPU vendor ID("HygonGenuine")
and PCI vendor ID(0x1d94) and family series number 18h(Hygon negotiated
with AMD to confirm that only Hygon use family 18h).

To enable Hygon Dhyana support in FreeBSD, add new definitions
HYGON_VENDOR_ID("HygonGenuine") and X86_VENDOR_HYGON(0x1d94) to identify
Hygon Dhyana CPU.

Initialize the CPU features(topology, local APIC ext, MSI, TSC, hwpstate,
MCA, DEBUG_CTL, etc) for amd64 and i386 mode by sharing the code path of
AMD family 17h.

The changes have been applied on FreeBSD 13.0-CURRENT and tested
successfully on Hygon Dhyana processor.

References:
[1] Linux kernel patches for Hygon Dhyana, merged in 4.20:

https://git.kernel.org/tip/c9661c1e80b609cd038db7c908e061f0535804ef

[2] MSR and CPUID definition:

https://www.amd.com/system/files/TechDocs/54945_PPR_Family_17h_Models_00h-0Fh.pdf

Submitted by:	Pu Wen <puwen@hygon.cn>
MFC after:	1 week
Differential revision:	https://reviews.freebsd.org/D23163
2020-01-21 13:22:35 +00:00
Konstantin Belousov
65e5f2cdd4 x86: Wait for curthread to be set up as an indicator that the boot stack
is no longer used.

pc_curthread is set by cpu_switch after it stopped using the old
thread (or boot) stack.  This makes the smp_after_idle_runnable()
function not dependent on the internals of the scheduler operations.

Reviewed by:	markj
Sponsored by:	The FreeBSD Foundation
MFC after:	1 week
Differential revision:	https://reviews.freebsd.org/D23276
2020-01-20 17:23:03 +00:00
Mateusz Guzik
44c78346f6 x86: fix assertion in ipi_send_cpu to range check the passed id
Prior to the change for sufficiently bad id (and in particular NOCPU which is -1)
it would access memory outside of the cpu_apic_ids array.
2020-01-19 21:35:51 +00:00
Mateusz Guzik
879e0604ee Add KERNEL_PANICKED macro for use in place of direct panicstr tests 2020-01-12 06:07:54 +00:00
Scott Long
757d4fbaa7 Introduce the concept of busdma tag templates. A template can be allocated
off the stack, initialized to default values, and then filled in with
driver-specific values, all without having to worry about the numerous
other fields in the tag. The resulting template is then passed into
busdma and the normal opaque tag object created.  See the man page for
details on how to initialize a template.

Templates do not support tag filters.  Filters have been broken for many
years, and only existed for an ancient make/model of hardware that had a
quirky DMA engine.  Instead of breaking the ABI/API and changing the
arugment signature of bus_dma_tag_create() to remove the filter arguments,
templates allow us to ignore them, and also significantly reduce the
complexity of creating and managing tags.

Reviewed by:	imp, kib
Differential Revision:	https://reviews.freebsd.org/D22906
2019-12-24 14:48:46 +00:00
Ryan Libby
9825eadf2c bitset: rename confusing macro NAND to ANDNOT
s/BIT_NAND/BIT_ANDNOT/, and for CPU and DOMAINSET too.  The actual
implementation is "and not" (or "but not"), i.e. A but not B.
Fortunately this does appear to be what all existing callers want.

Don't supply a NAND (not (A and B)) operation at this time.

Discussed with:	jeff
Reviewed by:	cem
Sponsored by:	Dell EMC Isilon
Differential Revision:	https://reviews.freebsd.org/D22791
2019-12-13 09:32:16 +00:00
Scott Long
0d42317659 Fix the TAA state machine to do the right thing when the TAA
migitation is available in microcode and the operator has set
the sysctl to automatic mode.

Reported by:	Coverity
CID: 1408334

MFC after:	3 days
Sponsored by:	Intel
2019-12-10 18:57:39 +00:00
Konstantin Belousov
ff326a1879 x86: Restore the critical section around whole ipi_bitmap_handler() if
hardclock IPI is delivered.

In the current code after r355311, critical section is taken only
around hardclockintr() call, and sched_preempt() is called after the
section is exited. If we reschedule after exit, as we typically would
due to conditions that caused IPI, in ULE the runq tdq_ipipending is
not cleared, which blocks generation of further preempt IPIs.

Since all relatively modern (10 years) hardware has per-cpu event
timers, restoring the critical section conditionally does not affect
it.

Reported and tested by: cy
Diagnosed and reviewed by: jeff (previous version)
Sponsored by:	The FreeBSD Foundation
Differential revision:	https://reviews.freebsd.org/D22716
2019-12-07 00:28:08 +00:00
Scott Long
961aacb107 Move the mds, irbs, and ssb mitigation knobs into machdep.mitigations.
They're in both the old and new places in HEAD for the moment for
discussion and transition.  The old locations will be garbage collected
in 4 weeks.  MFCs to 12 an 11 will keep the old and new for transition
purposes.

Reviewed by:	kib
MFC after:	4 weeks
Sponsored by:	Intel
Differential Revision:	https://reviews.freebsd.org/D22590
2019-12-06 02:43:05 +00:00
Conrad Meyer
ee02bd9c9c x86: Add missed break to TAA status sysctl
Just a typo that Coverity identified.

Coverity also identified an unused store in the same functional area (x86 TAA
stuff), but this commit does not address that issue (CID 1408334).

Reported by:	Coverity
CID:		1408328, 1408332
2019-12-04 02:42:22 +00:00
Jeff Roberson
0f9e06e18b Fix a few places that free a page from an object without busy held. This is
tightening constraints on busy as a precursor to lockless page lookup and
should largely be a NOP for these cases.

Reviewed by:	alc, kib, markj
Differential Revision:	https://reviews.freebsd.org/D22611
2019-12-02 22:42:05 +00:00
Jeff Roberson
fb6a57ef89 Don't run sched_preempt() inside of an extra critical section. This disables
the sched_preempt() switch optimization and causes the sched lock to be dropped
and immediately reacquired.

Reviewed by:	jhb, kib, mav, markj (with changes)
Differential Revision:	https://reviews.freebsd.org/D22623
2019-12-02 22:34:19 +00:00
Konstantin Belousov
5c3771d272 bus_dma_dmar_load_ident(9): load identity mapping into the map.
Requested, reviewed and tested by:	mav
Sponsored by:	The FreeBSD Foundation
MFC after:	1 week
Differential revision:	https://reviews.freebsd.org/D22559
2019-11-27 19:57:17 +00:00
Scott Long
184b15ff07 Clean up and clarify meta commentary on TAA. Add a state to denote
that TSX doesn't exist on the CPU.

MFC after:	3 days
Sponsored by:	Intel
2019-11-27 19:12:32 +00:00
Konstantin Belousov
4f4f3c8fdc Limit bus_dma_dmar_set_buswide() definition to kernel only.
The header is abused for inclusion into userspace, and on stable
branches neither device_t nor bool types are not defined when used
from userspace.

Sponsored by:	The FreeBSD Foundation
X-MFC after:	now
2019-11-25 14:16:41 +00:00
Andrew Turner
849aef496d Port the NetBSD KCSAN runtime to FreeBSD.
Update the NetBSD Kernel Concurrency Sanitizer (KCSAN) runtime to work in
the FreeBSD kernel. It is a useful tool for finding data races between
threads executing on different CPUs.

This can be enabled by enabling KCSAN in the kernel config, or by using the
GENERIC-KCSAN amd64 kernel. It works on amd64 and arm64, however the later
needs a compiler change to allow -fsanitize=thread that KCSAN uses.

Sponsored by:	DARPA, AFRL
Differential Revision:	https://reviews.freebsd.org/D22315
2019-11-21 11:22:08 +00:00
Konstantin Belousov
685666aaf7 bus_dma_dmar_set_buswide(9): KPI to indicate that the whole dmar
context should share page tables.

Practically it means that dma requests from any device on the bus are
translated according to the entries loaded for the bus:0:0 device.
KPI requires that the slot and function of the device be 0:0, and that
no tags for other devices on the bus were used.

The intended use are NTBs which pass TLPs from the downstream to the
host with slot:func of the downstream originator.

Reviewed and tested by:	mav
Sponsored by:	The FreeBSD Foundation
MFC after:	1 week
Differential revision:	https://reviews.freebsd.org/D22434
2019-11-18 20:56:59 +00:00
Konstantin Belousov
fa83f68917 Add x86 msr tweak KPI.
Use the KPI to tweak MSRs in mitigation code.

Reviewed by:	markj, scottl
Sponsored by:	The FreeBSD Foundation
MFC after:	1 week
Differential revision:	https://reviews.freebsd.org/D22431
2019-11-18 20:53:57 +00:00
Scott Long
e372160177 TSX Asynchronous Abort mitigation for Intel CVE-2019-11135.
This CVE has already been announced in FreeBSD SA-19:26.mcu.

Mitigation for TAA involves either turning off TSX or turning on the
VERW mitigation used for MDS. Some CPUs will also be self-mitigating
for TAA and require no software workaround.

Control knobs are:
machdep.mitigations.taa.enable:
        0 - no software mitigation is enabled
        1 - attempt to disable TSX
        2 - use the VERW mitigation
        3 - automatically select the mitigation based on processor
	    features.

machdep.mitigations.taa.state:
        inactive        - no mitigation is active/enabled
        TSX disable     - TSX is disabled in the bare metal CPU as well as
                        - any virtualized CPUs
        VERW            - VERW instruction clears CPU buffers
	not vulnerable	- The CPU has identified itself as not being
			  vulnerable

Nothing in the base FreeBSD system uses TSX.  However, the instructions
are straight-forward to add to custom applications and require no kernel
support, so the mitigation is provided for users with untrusted
applications and tenants.

Reviewed by:	emaste, imp, kib, scottph
Sponsored by:	Intel
Differential Revision:	22374
2019-11-16 00:26:42 +00:00
Scott Long
22d13bfd34 Revert a patch that accidentally was committed with r354729 2019-11-15 11:54:51 +00:00
Scott Long
99a6085fde Fix a typo in how the AVX512DQ feature bit is checked.
Reviewed by:	kib
Sponsored by:	Intel
2019-11-15 11:53:06 +00:00
Scott Long
837d733265 Add new bit definitions for TSX, related to the TAA issue. The actual
mitigation will follow in a future commit.

Sponsored by:	Intel
2019-11-12 19:15:16 +00:00
Konstantin Belousov
c08973d09c Workaround for Intel SKL002/SKL012S errata.
Disable the use of executable 2M page mappings in EPT-format page
tables on affected CPUs.  For bhyve virtual machines, this effectively
disables all use of superpage mappings on affected CPUs.  The
vm.pmap.allow_2m_x_ept sysctl can be set to override the default and
enable mappings on affected CPUs.

Alternate approaches have been suggested, but at present we do not
believe the complexity is warranted for typical bhyve's use cases.

Reviewed by:	alc, emaste, markj, scottl
Security:	CVE-2018-12207
Sponsored by:	The FreeBSD Foundation
Differential revision:	https://reviews.freebsd.org/D21884
2019-11-12 18:01:33 +00:00
Roger Pau Monné
b2802351c1 xen: fix dispatching of NMIs
Currently NMIs are sent over event channels, but that defeats the
purpose of NMIs since event channels can be masked. Fix this by
issuing NMIs using a hypercall, which injects a NMI (vector #2) to the
desired vCPU.

Note that NMIs could also be triggered using the emulated local APIC,
but using a hypercall is better from a performance point of view
since it doesn't involve instruction decoding when not using x2APIC
mode.

Reported and Tested by:	avg
Sponsored by:		Citrix Systems R&D
2019-11-12 10:31:28 +00:00
Scott Long
c47c10a1f3 Add the text attribute for MDS_NO in the IA32_ARCH_CAP MSR. 2019-11-11 22:18:05 +00:00
Andriy Gapon
e688e78187 revert r354482, checking for XENHVM was a wrong way of checking for Xen 2019-11-07 21:43:31 +00:00
Andriy Gapon
bff7f83d39 IPI_TRACE is not really supported on xen
x86 stack_save_td_running() can work safely only if IPI_TRACE is a
non-maskable interrupt.  But at the moment FreeBSD/Xen does not provide
support for the NMI delivery mode.  So, mark the functionality as
unsupported similarly to other platforms without NMI.
Maybe there is a way to provide a Xen-specific working
stack_save_td_running(), but I couldn't figure it out.

MFC after:	3 weeks
Sponsored by:	Panzura
2019-11-07 21:14:59 +00:00
Andrew Gallatin
bb7aaac379 Add tunable to allow interrupts on hyperthreaded cores
Enabling interrupts on htt cores has benefits to workloads which are primarily
interrupt driven by increasing the logical cores available for interrupt handling.
The tunable is named machdep.hyperthreading_intr_allowed

Reviewed by:	kib, jhb
Sponsored by:	Netflix
Differential Revision:	https://reviews.freebsd.org/D22233
2019-11-04 19:30:19 +00:00
Conrad Meyer
ebcfcba8f8 amd64: Fix typo: RDPRU bit is 0x10, not 0x04
Bit 4 != 4, of course.

X-MFC-With:	r354162
2019-10-30 04:00:44 +00:00
Conrad Meyer
706bc29b7b amd64: Define and decode new AMD64 feature bits
These are documented in revisions 3.32 of the public AMD64 Vol. 2 and
revision 3.28 of Vol. 3, published October and September 2019, respectively.
2019-10-30 01:41:14 +00:00
Conrad Meyer
1a352c3c1b hw.intrbalance: Make sysctl tunable
This allows specifying a boot-time preference in loader.conf.
2019-10-19 16:37:49 +00:00