Commit Graph

1375 Commits

Author SHA1 Message Date
Mark Johnston
deca0138dc x86: Check for APIC presence only if DEV_ATPIC is defined
We only attempt to gracefully handle absence of an APIC if "device
atpic" is defined in the kernel configuration.

Suggested by:	kib
Reviewed by:	jhb, kib
MFC after:	1 week
Sponsored by:	The FreeBSD Foundation
2021-12-28 17:47:49 -05:00
John Baldwin
254e4e5b77 Simplify swi for bus_dma.
When a DMA request using bounce pages completes, a swi is triggered to
schedule pending DMA requests using the just-freed bounce pages.  For
a long time this bus_dma swi has been tied to a "virtual memory" swi
(swi_vm).  However, all of the swi_vm implementations are the same and
consist of checking a flag (busdma_swi_pending) which is always true
and if set calling busdma_swi.  I suspect this dates back to the
pre-SMPng days and that the intention was for swi_vm to serve as a
mux.  However, in the current scheme there's no need for the mux.

Instead, remove swi_vm and vm_ih.  Each bus_dma implementation that
uses bounce pages is responsible for creating its own swi (busdma_ih)
which it now schedules directly.  This swi invokes busdma_swi directly
removing the need for busdma_swi_pending.

One consequence is that the swi now works on RISC-V which had previously
failed to invoke busdma_swi from swi_vm.

Reviewed by:	imp, kib
Sponsored by:	Netflix
Differential Revision:	https://reviews.freebsd.org/D33447
2021-12-28 13:51:25 -08:00
Alexander Motin
1d6fb900ed x86: Remove CTLFLAG_NEEDGIANT from sysctls.
MFC after:	2 weeks
2021-12-25 22:24:20 -05:00
Corvin Köhne
16f02a4cb4 pci: add missing PCI id of Coffee Lake GPU
The PCI id of an UHD Graphics 630 for Coffee Lake GPUs is missing in
the PCI id list of all Intel GPUs.

You can take a look at
https://dgpu-docs.intel.com/devices/hardware-table.html to check that
this device id exists.  Or check the linux code:
d0e062ebb3

MFC after:	1 week
Differential revision:	https://reviews.freebsd.org/D33460
2021-12-17 23:18:31 +02:00
Mateusz Guzik
e7236a7ddf xen: plug some of set-but-not-used vars
Sponsored by:	Rubicon Communications, LLC ("Netgate")
2021-12-15 13:46:17 +00:00
Mateusz Guzik
ceed3949bc x86: plug a set-but-not-unused var in native_lapic_ipi_free
Sponsored by:	Rubicon Communications, LLC ("Netgate")
2021-12-10 11:55:03 +00:00
Alexander Motin
8493918868 busdma: Remove outdated comments about Giant.
MFC after:	2 weeks
2021-12-09 22:18:53 -05:00
Alexander Motin
a69f810466 smist: Remove unneeded Giant from bus_dma_tag_create().
bus_dmamap_load() call uses BUS_DMA_NOWAIT.

MFC after:	2 weeks
2021-12-09 20:54:22 -05:00
John Baldwin
1a62e9bc00 Add <machine/tls.h> header to hold MD constants and helpers for TLS.
The header exports the following:

- Definition of struct tcb.
- Helpers to get/set the tcb for the current thread.
- TLS_TCB_SIZE (size of TCB)
- TLS_TCB_ALIGN (alignment of TCB)
- TLS_VARIANT_I or TLS_VARIANT_II
- TLS_DTV_OFFSET (bias of pointers in dtv[])
- TLS_TP_OFFSET (bias of "thread pointer" relative to TCB)

Note that TLS_TP_OFFSET does not account for if the unbiased thread
pointer points to the start of the TCB (arm and x86) or the end of the
TCB (MIPS, PowerPC, and RISC-V).

Note also that for amd64, the struct tcb does not include the unused
tcb_spare field included in the current structure in libthr.  libthr
does not use this field, and the existing calls in libc and rtld that
allocate a TCB for amd64 assume it is the size of 3 Elf_Addr's (and
thus do not allocate room for tcb_spare).

A <sys/_tls_variant_i.h> header is used by architectures using
Variant I TLS which uses a common struct tcb.

Reviewed by:	kib (older version of x86/tls.h), jrtc27
Sponsored by:	The University of Cambridge, Google Inc.
Differential Revision:	https://reviews.freebsd.org/D33351
2021-12-09 13:17:13 -08:00
Bjoern A. Zeeb
df38ada293 modules: increase MAXMODNAME and provide backward compat
With various firmware files used by graphics and wireless drivers
we are exceeding the current 32 character module name (file path
in kldxref) length.
In order to overcome this issue bump it to the maximum path length
for the next version.
To be able to MFC provide backward compat support for another version
of the struct as the offsets for the second half change due to the
array size increase.

MAXMODNAME being defined to MAXPATHLEN needs param.h to be
included first.  With only 7 modules (or LinuxKPI module.h) not
doing that adjust them rather than including param.h in module.h [1].

Reported by:	Greg V (greg unrelenting.technology)
Sponsored by:	The FreeBSD Foundation
Suggested by:	imp [1]
MFC after:	10 days
Reviewed by:	imp (and others to different level)
Differential Revision:	https://reviews.freebsd.org/D32383
2021-12-09 18:09:53 +00:00
Alexander Motin
63346fef33 mca: Some error handling logic improvements.
- Enable local MCEs on capable Intel CPUs.  It delivers exceptions
only to the affected CPU instead of global broadcast, requiring a lot
of synchronization between CPUs.  AMD always deliver MCEs locally.
 - Make MCE handler process only uncorrected errors, while CMCI and
polling only corrected.  It reduces synchronization problems between
them and is explicitly recommended by the documentation.
 - Add minimal support for uncorrected software recoverable errors
on Intel CPUs.  It allows to avoid kernel panics in case uncorrected
errors do not affect current operation, like ones found during scrub
or write.  Such errors are only logged, postponing the panic until
the corrupted data will actually be needed (that may never happen).
 - Reduce polling period from 1 hour to 5 minutes.

MFC after:	2 weeks
2021-12-08 21:39:24 -05:00
Alexander Motin
9a128e1678 mca: Switch to using taskqueue_enqueue_timeout_sbt().
Previously it was not allowed on fast taskqueues.  It was fixed in
4730a8972b.  This should make no functional change, just a bit
cleaner and efficient code.

MFC after:	1 week
2021-12-08 12:29:15 -05:00
Alexander Motin
3bdba24c74 mca: Decode new Intel status bits.
MFC after:	1 week
2021-12-08 12:03:28 -05:00
Alexander Motin
935dc0de88 mca: Remove excessively verbose debug messages.
Expecially in case of AMD there was more than dozen lines per CPU.

MFC after:	1 week
2021-12-07 22:27:09 -05:00
Alexander Motin
c2003f2684 mca: Make some sysctls also a loader tunables.
MFC after:	1 week
2021-12-07 22:22:01 -05:00
Mark Johnston
553af8f1ec x86: Perform late TSC calibration before LAPIC timer calibration
This ensures that LAPIC calibration is done using the correct tsc_freq
value, i.e., the one associated with the TSC timecounter.  It does mean
though that TSC calibration cannot use sbinuptime() to read the
reference timecounter, as timehands are not yet set up.

Reviewed by:	kib, jhb
MFC after:	2 weeks
Sponsored by:	The FreeBSD Foundation
Differential Revision:	https://reviews.freebsd.org/D33209
2021-12-06 10:42:19 -05:00
Mark Johnston
62d09b46ad x86: Defer LAPIC calibration until after timecounters are available
This ensures that we have a good reference timecounter for performing
calibration.

Change lapic_setup to avoid configuring the timer when booting, and move
calibration and initial configuration to a new lapic routine,
lapic_calibrate_timer.  This calibration will be initiated from
cpu_initclocks(), before an eventtimer is selected.

Reviewed by:	kib, jhb
MFC after:	2 weeks
Sponsored by:	The FreeBSD Foundation
Differential Revision:	https://reviews.freebsd.org/D33206
2021-12-06 10:42:10 -05:00
Mark Johnston
f06f1d1fdb x86: Deduplicate clock.h
The headers were mostly identical on amd64 and i386.

No functional change intended.

Reviewed by:	cperciva, mav, imp, kib, jhb
MFC after:	2 weeks
Sponsored by:	The FreeBSD Foundation
Differential Revision:	https://reviews.freebsd.org/D33205
2021-12-06 10:39:08 -05:00
Scott Long
c0ea0b4989 Fix "set but not used" in the x86 pci driver.
Sponsored by: Rubicon Communications, LLC ("Netgate")
2021-12-05 15:10:16 -07:00
Mitchell Horne
03b3d7bbec x86: remove unused T_USER flag
It stopped being used in 3c256f5395, when trap() was reorganized to
have separate switch statements for user and kernel traps. Remove the
two leftover references and the flag itself.

Reviewed by:	kib
MFC after:	3 days
Differential Revision:	https://reviews.freebsd.org/D33253
2021-12-05 11:12:40 -04:00
Alexander Motin
a07d426509 x86: Make AMD elvtX dump more compact.
MFC after:	2 weeks
2021-12-04 21:47:19 -05:00
Scott Long
d85a58cb0c Fix "set but not used" in busdma_bounce.
Sponsored by: Rubicon Communications, LLC ("Netgate")
2021-12-03 15:20:42 -07:00
Mitchell Horne
1adebe3cd6 minidump: Parameterize minidumpsys()
The minidump code is written assuming that certain global state will not
change, and rightly so, since it executes from a kernel debugger
context. In order to support taking minidumps of a live system, we
should allow copies of relevant global state that is likely to change to
be passed as parameters to the minidumpsys() function.

This patch does the work of parameterizing this function, by adding a
struct minidumpstate argument. For now, this struct allows for copies of
the kernel message buffer, and the bitset that tracks which pages should
be dumped (vm_page_dump). Follow-up changes will actually make use of
these arguments.

Notably, dump_avail[] does not need a snapshot, since it is not expected
to change after system initialization.

The existing minidumpsys() definitions are renamed, and a thin MI
wrapper is added to kern_dump.c, which handles the construction of
the state struct. Thus, calling minidumpsys() remains as simple as
before.

Reviewed by:	kib, markj, jhb
Sponsored by:	Juniper Networks, Inc.
Sponsored by:	Klara, Inc.
MFC after:	2 weeks
Differential Revision:	https://reviews.freebsd.org/D31989
2021-11-19 15:05:52 -04:00
Mark Johnston
22875f8879 x86: Implement deferred TSC calibration
There is no universal way to find the TSC frequency.  Newer Intel CPUs
may report it via CPUID leaves 0x15 and 0x16.  Sometimes it can be
obtained from the PLATFORM_INFO MSR as well, though we never use that.
On older platforms we derive the frequency using a DELAY(1000000) call,
which uses the 8254 PIT.  On some newer platforms the 8254 is apparently
non-functional, leading to bogus calibration results.  On such platforms
the TSC frequency must be available from CPUID.  It is also possible to
disable calibration with a tunable, in which case we try to parse the
brand string if the TSC freq is not available from CPUID.

CPUID 0x15 provides an authoritative TSC frequency value, but even that
is not always available on new Intel platforms.  CPUID 0x16 provides the
specified processor base frequency, which is not the same as the TSC
frequency.  Empirically, it is close enough for early boot, but too far
off for timekeeping: on a Comet Lake NUC, CPUID 0x16 yields 1600MHz but
the TSC frequency is rougly 1608MHz, leading to frequent clock stepping
when NTP is in use.

Thus we have a situation where we cannot calibrate using the PIT and
cannot obtain a precise frequency from CPUID (or MSRs).  This change
seeks to address that by using the CPUID 0x16 value during early boot
and refining the calibration later once ACPI-based timecounters are
available.  TSC frequency detection is thus split into two phases:

Early phase:
- On Intel platforms, query CPUID 0x15 and 0x16 and use that value
  initially if available.
- Otherwise, get an estimate using the PIT, reducing the delay loop to
  100ms from 1s.
- Continue to register the TSC as the CPU ticks provider early, even
  though the frequency may be off.  Otherwise any code executed during
  boot that uses cpu_ticks() (e.g., context switching) gets tripped up
  when the ticks provider changes.

Later phase:
- In SI_SUB_CLOCKS, once the timehands are initialized, load the current
  TSC and timecounter (sbinuptime()) values at the beginning and end of
  a 1s interval and use the timecounter frequency (typically from
  kvmclock, HPET or the ACPI PM timer) to estimate the TSC frequency.
- Update the TSC timecounter, global tsc_freq and CPU ticker with the
  new frequency and finally register the TSC as a timecounter.

Reviewed by:	kib, jhb (previous version)
Discussed with:	imp, cperciva
MFC after:	6 weeks
Sponsored by:	The FreeBSD Foundation
Differential Revision:	https://reviews.freebsd.org/D32512
2021-11-15 16:13:24 -05:00
Mark Johnston
ab12e8db29 amd64: Reduce the amount of cpuset copying done for TLB shootdowns
We use pmap_invalidate_cpu_mask() to get the set of active CPUs.  This
(32-byte) set is copied by value through multiple frames until we get to
smp_targeted_tlb_shootdown(), where it is copied yet again.

Avoid this copying by having smp_targeted_tlb_shootdown() make a local
copy of the active CPUs for the pmap, and drop the cpuset parameter,
simplifying callers.  Also leverage the use of the non-destructive
CPU_FOREACH_ISSET to avoid unneeded copying within
smp_targeted_tlb_shootdown().

Reviewed by:	alc, kib
Tested by:	pho
MFC after:	1 month
Sponsored by:	The FreeBSD Foundation
Differential Revision:	https://reviews.freebsd.org/D32792
2021-11-15 13:01:31 -05:00
Alexander Motin
6badb512a9 Prefer CPUID leaf 1Fh for Intel CPU topology detection.
Leaf 1Fh is a prefered extended version of 0Bh.  It is supported by
new Lader Lake CPUs, though does not report anything new so far.

MFC after:	2 weeks
2021-11-06 00:53:52 -04:00
Kyle Evans
6a8ea6d174 sched: split sched_ap_entry() out of sched_throw()
sched_throw() can no longer take a NULL thread, APs enter through
sched_ap_entry() instead.  This completely removes branching in the
common case and cleans up both paths.  No functional change intended.

Reviewed by:	kib, markj
Differential Revision:	https://reviews.freebsd.org/D32829
2021-11-05 15:45:51 -05:00
Kyle Evans
589aed00e3 sched: separate out schedinit_ap()
schedinit_ap() sets up an AP for a later call to sched_throw(NULL).

Currently, ULE sets up some pcpu bits and fixes the idlethread lock with
a call to sched_throw(NULL); this results in a window where curthread is
setup in platforms' init_secondary(), but it has the wrong td_lock.
Typical platform AP startup procedure looks something like:

- Setup curthread
- ... other stuff, including cpu_initclocks_ap()
- Signal smp_started
- sched_throw(NULL) to enter the scheduler

cpu_initclocks_ap() may have callouts to process (e.g., nvme) and
attempt to sched_add() for this AP, but this attempt fails because
of the noted violated assumption leading to locking heartburn in
sched_setpreempt().

Interrupts are still disabled until cpu_throw() so we're not really at
risk of being preempted -- just let the scheduler in on it a little
earlier as part of setting up curthread.

Reviewed by:	alfredo, kib, markj
Triage help from:	andrew, markj
Smoke-tested by:	alfredo (ppc), kevans (arm64, x86), mhorne (arm)
Differential Revision:	https://reviews.freebsd.org/D32797
2021-11-03 15:54:59 -05:00
Kornel Duleba
06e6ca6dd3 dmar: Disable protected memory regions after initialization
Some BIOSes protect memory region they reside in by using DMAR to
prevent devices from doing any DMA transactions to that part of RAM.
AMI refers to this as "DMA Control Guarantee".
Disable the protection when address translation is enabled.
I stumbled upon this while investigation a failing coredump on a device
which has this feature enabled.

Sponsored by:		Stormshield
Obtained from:		Semihalf
Reviewed by:		kib
Differential revision:	https://reviews.freebsd.org/D32591
2021-10-29 10:08:25 +02:00
Kornel Duleba
3c02da8096 dmar: Don't try to reserve PCI regions for non-existing devices
In some cases we might have to create DMAR context before the
corresponding device has been enumerated by the PCI bus.
In that case we get called with NULL dev, because of that trying
to reserve PCI regions causes a NULL pointer dereference in
pci_find_pcie_root_port.

Sponsored by:		Stormshield
Obtained from:		Semihalf
MFC after:		2 weeks
Reviewed by:		kib, rlibby
Differential revision:	https://reviews.freebsd.org/D32589
2021-10-29 10:08:25 +02:00
Gleb Smirnoff
6aae3517ed Retire synchronous PPP kernel driver sppp(4).
The last two drivers that required sppp are cp(4) and ce(4).

These devices are still produced and can be purchased
at Cronyx <http://cronyx.ru/hardware/wan.html>.

Since Roman Kurakin <rik@FreeBSD.org> has quit them, they no
longer support FreeBSD officially.  Later they have dropped
support for Linux drivers to.  As of mid-2020 they don't even
have a developer to maintain their Windows driver.  However,
their support verbally told me that they could provide aid to
a FreeBSD developer with documentaion in case if there appears
a new customer for their devices.

These drivers have a feature to not use sppp(4) and create an
interface, but instead expose the device as netgraph(4) node.
Then, you can attach ng_ppp(4) with help of ports/net/mpd5 on
top of the node and get your synchronous PPP.  Alternatively
you can attach ng_frame_relay(4) or ng_cisco(4) for HDLC.
Actually, last time I used cp(4) back in 2004, using netgraph(4)
instead of sppp(4) was already the right way to do.

Thus, remove the sppp(4) related part of the drivers and enable
by default the negraph(4) part.  Further maintenance of these
drivers in the tree shouldn't be a big deal.

While doing that, remove some cruft and enable cp(4) compilation
on amd64.  The ce(4) for some unknown reason marks its internal
DDK functions with __attribute__ fastcall, which most likely is
safe to remove, but without hardware I'm not going to do that, so
ce(4) remains i386-only.

Reviewed by:		emaste, imp, donner
Differential Revision:	https://reviews.freebsd.org/D32590
See also:		https://reviews.freebsd.org/D23928
2021-10-22 11:41:36 -07:00
Konstantin Belousov
661bd70bd7 DMAR: clean up warnings about write-only variables
For some of them, used only when KTR or KMSAN are configured, apply
__unused attribute directly.

Sponsored by:	The FreeBSD Foundation
MFC after:	1 week
2021-10-21 21:40:46 +03:00
Mark Johnston
06ebadc5f5 x86: Remove some leftover APM support
This is obsolete since commit 8c576a279e ("Remove APM BIOS support").

Reviewed by:	imp, kib
MFC after:	1 week
Sponsored by:	The FreeBSD Foundation
Differential Revision:	https://reviews.freebsd.org/D32510
2021-10-18 09:56:59 -04:00
Mark Johnston
de8554295b cpuset(9): Add CPU_FOREACH_IS(SET|CLR) and modify consumers to use it
This implementation is faster and doesn't modify the cpuset, so it lets
us avoid some unnecessary copying as well.  No functional change
intended.

This is a re-application of commit
9068f6ea69.

Reviewed by:	cem, kib, jhb
MFC after:	2 weeks
Sponsored by:	The FreeBSD Foundation
Differential Revision:	https://reviews.freebsd.org/D32029
2021-10-18 09:56:58 -04:00
Konstantin Belousov
f8d3368b43 apic: initialize lapic_paddr statically
The default value for LAPIC registers page physical address
is usually right. Having this value available early makes
pmap_force_invalidate_cache_range(), used on non-self-snoop machines,
avoid flushing LAPIC range for early calls.

Reviewed by:	alc, markj
Sponsored by:	The FreeBSD Foundation
MFC after:	1 week
Differential revision:	https://reviews.freebsd.org/D32318
2021-10-06 05:52:56 +03:00
Mitchell Horne
ab4ed843a3 minidump: De-duplicate the progress bar
The implementation of the progress bar is simple, but duplicated for
most minidump implementations. Extract the common bits to kern_dump.c.
Ensure that the bar is reset with each subsequent dump; this was only
done on some platforms previously.

Reviewed by:	markj
MFC after:	2 weeks
Sponsored by:	Juniper Networks, Inc.
Sponsored by:	Klara, Inc.
Differential Revision:	https://reviews.freebsd.org/D31885
2021-09-29 16:42:21 -03:00
Konstantin Belousov
24a3897c2c x86 bounce_bus_dmamem_alloc(): use malloc_aligned() only when possible
malloc_domainset_aligned() requires that alignment is less than
page size. Fall back to other allocation methods, most likely
kmem_alloc_contig(), when malloc_aligned() cannot fullfill the driver
request.

Reported by:	Loic F <loic.f@hardenedbsd.org>
Reviewed by:	markj
Sponsored by:	The FreeBSD Foundation
MFC after:	1 week
Differential revision:	https://reviews.freebsd.org/D32127
2021-09-25 15:58:12 +03:00
Alexander Motin
d3a8f98acb Make CPU children explicitly share parent unit numbers.
Before this device unit number match was coincidental and broke if I
disabled some CPU device(s).  Aside of cosmetics, for some drivers
(may be considered broken) it caused talking to wrong CPUs.
2021-09-24 23:31:51 -04:00
Alexander Motin
ef50d5fbc3 x86: Add NUMA nodes into CPU topology.
Depending on hardware, NUMA nodes may match last level caches, or
they may be above them (AMD Zen 2/3) or below (Intel Xeon w/ SNC).
This information is provided by ACPI instead of CPUID, and it is
provided for each CPU individually instead of mask widths, but
this code should be able to properly handle all the above cases.

This change should immediately allow idle stealing in sched_ule(4)
to prefer load from NUMA-local CPUs to remote ones when the node
does not match LLC.  Later we may think of how to better handle it
on sched_pickcpu() side.

MFC after:	1 month
2021-09-23 14:31:38 -04:00
Konstantin Belousov
e36d0e86e3 Revert "linux32: add a hack to avoid redefining the type of the savefpu tag"
This reverts commit 0f6829488e.
Also it changes the type of md_usr_fpu_save struct mdthread member
to void *, which is what uncovered this trouble.  Now the save area
is untyped, but since it is hidden behind accessors, it is not too
significant.  Since apparently there are consumers affected outside
the tree, this hack is better than one from the reverted revision.

PR:	258678
Reported by:	cy
Reviewed by:	cy, kevans, markj
Sponsored by:	The FreeBSD Foundation
MFC after:	1 week
Differential revision:	https://reviews.freebsd.org/D32060
2021-09-22 23:17:47 +03:00
Mark Johnston
bcdc599dc2 Revert "cpuset(9): Add CPU_FOREACH_IS(SET|CLR) and modify consumers to use it"
This reverts commit 9068f6ea69.

The underlying macro needs to be reworked to avoid problems with control
flow statements.

Reported by:	rlibby
2021-09-21 13:51:42 -04:00
Konstantin Belousov
0f6829488e linux32: add a hack to avoid redefining the type of the savefpu tag
when compiling in amd64 kernel environment with -m32.  This is a temporal
workaround for some future proper (but unclear) fix.

Reviewed by:	jhb, markj
Tested by:	pho
Sponsored by:	The FreeBSD Foundation
MFC after:	1 week
Differential revision:	https://reviews.freebsd.org/D31954
2021-09-21 20:20:15 +03:00
Mark Johnston
9068f6ea69 cpuset(9): Add CPU_FOREACH_IS(SET|CLR) and modify consumers to use it
This implementation is faster and doesn't modify the cpuset, so it lets
us avoid some unnecessary copying as well.  No functional change
intended.

Reviewed by:	cem, kib, jhb
MFC after:	2 weeks
Sponsored by:	The FreeBSD Foundation
Differential Revision:	https://reviews.freebsd.org/D32029
2021-09-21 12:07:47 -04:00
Konstantin Belousov
2b6eec531a x86: duplicate acpi_wakeup.c per i386 and amd64
The file as is is the maze of #ifdef passages, all slightly different.
Divorcing i386 and amd64 version actually makes changing the code
easier, also no changes for i386 are planned.

Reviewed by:	markj
Sponsored by:	The FreeBSD Foundation
MFC after:	1 week
Differential revision:	https://reviews.freebsd.org/D31931
2021-09-14 00:23:14 +03:00
Konstantin Belousov
db2ba218d9 amd64 acpi_wakeup: map 1:1 whole low 4G for the trampoline page table
This is required since kernel text might be physically located
anywhere below 4G.

PR:	258432
Reported by:	Taku YAMAMOTO <taku@tackymt.homeip.net>
Reviewed by:	markj
Sponsored by:	The FreeBSD Foundation
MFC after:	1 week
Differential revision:	https://reviews.freebsd.org/D31916
2021-09-13 19:52:13 +03:00
Konstantin Belousov
ceca8ac1ce x86 acpi_install_wakeup_handler(): style
Do not use tab between type and variable name in local declarations.

Reviewed by:	markj
Sponsored by:	The FreeBSD Foundation
MFC after:	1 week
Differential revision:	https://reviews.freebsd.org/D31916
2021-09-13 19:52:06 +03:00
Konstantin Belousov
e99255c8a6 amd64: do not touch low memory in acpi_wakeup_ap() if booted by UEFI
Reviewed by:	markj
Sponsored by:	The FreeBSD Foundation
MFC after:	1 week
Differential revision:	https://reviews.freebsd.org/D31916
2021-09-13 19:51:52 +03:00
Colin Percival
cd165c8bf0 x86/tsc.c: Add TSLOG to test_tsc
On my benchmark system this takes ~ 14 ms; enough to be worth
recording in the boot time profile.
2021-09-09 17:02:15 -07:00
Andrew Turner
b792434150 Create sys/reg.h for the common code previously in machine/reg.h
Move the common kernel function signatures from machine/reg.h to a new
sys/reg.h. This is in preperation for adding PT_GETREGSET to ptrace(2).

Reviewed by:	imp, markj
Sponsored by:	DARPA, AFRL (original work)
Sponsored by:	The FreeBSD Foundation
Differential Revision:	https://reviews.freebsd.org/D19830
2021-08-30 12:50:53 +01:00
Dimitry Andric
8e3c56d6b6 xen: Fix warning by adding KERNBASE to modlist_paddr before casting
Clang 13 produces the following warning for hammer_time_xen():

sys/x86/xen/pv.c:183:19: error: the pointer incremented by -2147483648 refers past the last possible element for an array in 64-bit address space containing 256-bit (32-byte) elements (max possible 576460752303423488 elements) [-Werror,-Warray-bounds]
                    (vm_paddr_t)start_info->modlist_paddr + KERNBASE;
                                ^                               ~~~~~~~~
sys/xen/interface/arch-x86/hvm/start_info.h:131:5: note: array 'modlist_paddr' declared here
    uint64_t modlist_paddr;         /* Physical address of an array of           */
    ^

This is because the expression first casts start_info->modlist_paddr to
struct hvm_modlist_entry * (via vmpaddr_t), and *then* adds KERNBASE,
which is then interpreted as KERNBASE * sizeof(struct
hvm_modlist_entry).

Instead, parenthesize the addition to get the intended result, and cast
it to struct hvm_modlist_entry * afterwards. Also remove the cast to
vmpaddr_t since it is not necessary.

Reviewed by:	royger
MFC after:	3 days
Differential Revision: https://reviews.freebsd.org/D31711
2021-08-29 19:43:00 +02:00
Adam Fenn
d4b2d3035a pvclock: Add vDSO support
Add vDSO support for timekeeping devices that support the KVM/XEN
paravirtual clock API.

Also, expose, in the userspace-accessible '<machine/pvclock.h>',
definitions that will be needed by 'libc' to support
'VDSO_TH_ALGO_X86_PVCLK'.

Sponsored by:	Juniper Networks, Inc.
Sponsored by:	Klara, Inc.
Reviewed by:	kib
Differential Revision:	https://reviews.freebsd.org/D31418
2021-08-14 15:57:54 +03:00
Adam Fenn
6c69c6bb4c kvm_clock: KVM paravirtual clock support
Add support for the KVM paravirtual clock device.

Sponsored by:	Juniper Networks, Inc.
Sponsored by:	Klara, Inc.
Reviewed by:	kib
Differential Revision:	https://reviews.freebsd.org/D29733
2021-08-14 15:57:54 +03:00
Adam Fenn
0b3382b863 pvclock: Add 'struct pvclock' API
Consolidate more hypervisor-agnostic functionality behind a new 'struct
pvclock' API.

This should also make it easier to subsequently add hypervisor-agnostic
vDSO timekeeping support.

Also, perform some clean-up:
    - Remove 'pvclock_get_last_cycles()'; do not allow external access
      to 'pvclock_last_systime' since this is not necessary.
    - Consolidate/simplify wall and system time reading codepaths.
    - Ensure correct ordering within wall and system time reading
      codepaths via 'atomic(9)' and 'rdtsc_ordered()' rather than via
      'rmb()'.
    - Remove some extra newlines.

Sponsored by:	Juniper Networks, Inc.
Sponsored by:	Klara, Inc.
Reviewed by:	kib
Differential Revision:	https://reviews.freebsd.org/D31418
2021-08-14 15:57:54 +03:00
Adam Fenn
652ae7b114 x86: cpufunc: Add rdtsc_ordered()
Add a variant of 'rdtsc()' that performs the ordered version of 'rdtsc'
appropriate for the invoking x86 variant.

Also, expose the 'lfence'-ed and 'mfence'-ed 'rdtsc()' variants needed
by 'rdtsc_ordered()' for general use.

Sponsored by:	Juniper Networks, Inc.
Sponsored by:	Klara, Inc.
Reviewed by:	kib
Differential Revision:	https://reviews.freebsd.org/D31416
2021-08-14 15:57:53 +03:00
NagaChaitanya Vellanki
2a9b4076dc
Merge common parts of i386 and amd64's ieeefp.h into x86/x86_ieeefp.h
MFC after:	1 week
Reviewed by:	imp
Differential Revision:	https://reviews.freebsd.org/D26292
2021-08-12 18:45:22 +08:00
Dmitry Chagin
de8374df28 fork: Allow ABI to specify fork return values for child.
At least Linux x86 ABI's does not use carry bit and expects that the dx register
is preserved. For this add a new sv_set_fork_retval hook and call it from cpu_fork().

Add a short comment about touching dx in x86_set_fork_retval(), for more details
see phab comments from kib@ and imp@.

Reviewed by:		kib
Differential revision:	https://reviews.freebsd.org/D31472
MFC after:		2 weeks
2021-08-12 11:45:25 +03:00
Mark Johnston
693c9516fa busdma: Add KMSAN integration
Sanitizer instrumentation of course cannot automatically update shadow
state when devices write to host memory.  KMSAN thus hooks into busdma,
both to update shadow state after a device write, and to verify that the
kernel does not publish uninitalized bytes to devices.

To implement this, when KMSAN is configured, each dmamap embeds a memory
descriptor describing the region currently loaded into the map.
bus_dmamap_sync() uses the operation flags to determine whether to
validate the loaded region or to mark it as initialized in the shadow
map.

Note that in cases where the amount of data written is less than the
buffer size, the entire buffer is marked initialized even when it is
not.  For example, if a NIC writes a 128B packet into a 2KB buffer, the
entire buffer will be marked initialized, but subsequent accesses past
the first 128 bytes are likely caused by bugs.

Reviewed by:	kib
Sponsored by:	The FreeBSD Foundation
Differential Revision:	https://reviews.freebsd.org/D31338
2021-08-10 21:27:54 -04:00
Mark Johnston
3a1802fef4 busdma: Add an internal BUS_DMA_FORCE_MAP flag to x86 bounce_busdma
Use this flag to indicate that busdma should allocate a map structure
even no bouncing is required to satisfy the tag's constraints.  This
will be used for KMSAN.

Also fix a memory leak that can occur if the kernel fails to allocate
bounce pages in bounce_bus_dmamap_create().

Reviewed by:	kib
Sponsored by:	The FreeBSD Foundation
Differential Revision:	https://reviews.freebsd.org/D31338
2021-08-10 21:27:54 -04:00
Mark Johnston
b0f71f1bc5 amd64: Add MD bits for KMSAN
Interrupt and exception handlers must call kmsan_intr_enter() prior to
calling any C code.  This is because the KMSAN runtime maintains some
TLS in order to track initialization state of function parameters and
return values across function calls.  Then, to ensure that this state is
kept consistent in the face of asynchronous kernel-mode excpeptions, the
runtime uses a stack of TLS blocks, and kmsan_intr_enter() and
kmsan_intr_leave() push and pop that stack, respectively.

Use these functions in amd64 interrupt and exception handlers.  Note
that handlers for user->kernel transitions need not be annotated.

Also ensure that trap frames pushed by the CPU and by handlers are
marked as initialized before they are used.

Reviewed by:	kib
Sponsored by:	The FreeBSD Foundation
Differential Revision:	https://reviews.freebsd.org/D31467
2021-08-10 21:27:53 -04:00
Ed Maste
9feff969a0 Remove "All Rights Reserved" from FreeBSD Foundation sys/ copyrights
These ones were unambiguous cases where the Foundation was the only
listed copyright holder (in the associated license block).

Sponsored by:	The FreeBSD Foundation
2021-08-08 10:42:24 -04:00
Konstantin Belousov
d0bc4b4666 x86_msr_op: extend the KPI to allow MSR read and single-CPU operations
Reivewed by:	markj
Sponsored by:	The FreeBSD Foundation
MFC after:	1 week
Differential revision:	https://reviews.freebsd.org/D31386
2021-08-05 18:46:37 +03:00
Mark Johnston
b2ed7e988a bus: Convert to the new interceptor scheme
This was missed in commit a90d053b84.

Fixes:		a90d053b84
MFC after:	2 weeks
Sponsored by:	The FreeBSD Foundation
2021-07-30 15:15:27 -04:00
Mark Johnston
a90d053b84 Simplify kernel sanitizer interceptors
KASAN and KCSAN implement interceptors for various primitive operations
that are not instrumented by the compiler.  KMSAN requires them as well.
Rather than adding new cases for each sanitizer which requires
interceptors, implement the following protocol:
- When interceptor definitions are required, define
  SAN_NEEDS_INTERCEPTORS and SANITIZER_INTERCEPTOR_PREFIX.
- In headers that declare functions which need to be intercepted by a
  sanitizer runtime, use SANITIZER_INTERCEPTOR_PREFIX to provide
  declarations.
- When SAN_RUNTIME is defined, do not redefine the names of intercepted
  functions.  This is typically the case in files which implement
  sanitizer runtimes but is also needed in, for example, files which
  define ifunc selectors for intercepted operations.

MFC after:	2 weeks
Sponsored by:	The FreeBSD Foundation
2021-07-29 21:13:32 -04:00
Konstantin Belousov
b27fe1c3ba amd64: stop doing special allocation for the AP startup trampoline
There is no reason now why do we need to allocate trampoline page very
early in the boot process.  The only requirement for the page is that
it is below 1M to be usable by the real mode during init.  This can be
handled by vm_alloc_contig() when we do the startup.

Also assert that startup trampoline fits into single page.  In principle
we can do multi-page allocation if needed, but it is not.

Move the alloc_ap_trampoline() function and the boot_address variable to
i386/mp_machdep.c.  Keep existing mechanism of early alloc on i386.

Reviewed by:	markj
Sponsored by:	The FreeBSD Foundation
MFC after:	2 weeks
Differential revision:	https://reviews.freebsd.org/D31343
2021-07-30 01:20:45 +03:00
Alexander Motin
5a49f19141 Do not expose to scheduler caches of single CPU.
Before this change my dual-Xeon(R) Gold 6242R always reported 3 levels
or topology (root, package/L3 and core/L2).  But with SMT disabled
core/L2 matches thread, so additional topology level only causes more
traversal work.  With this change SMT case is reported same as before,
while non-SMT is reported with only 2 much more simple levels.

MFC after:	2 weeks
2021-07-28 16:38:01 -04:00
Julien Grall
ac959cf544 xen: introduce xen_has_percpu_evtchn()
xen_vector_callback_enabled is x86 specific and availability of
per-cpu event channel delivery differs on other architectures.

Introduce a new helper to check if there's support for per-cpu event
channel injection.

Submitted by: Elliott Mitchell <ehem+freebsd@m5p.com>
Reviewed by: royger
Differential Revision: https://reviews.freebsd.org/D29402
2021-07-28 17:27:05 +02:00
Julien Grall
0b4f30c236 xen/control: introduce xen_pv_shutdown_handler()
While x86 only register PV shutdown handler for PV guests. ARM guests
are always using HVM and requires the PV shutdown handler.

Submitted by: Elliott Mitchell <ehem+freebsd@m5p.com>
Reviewed by: royger
Differential Revision: https://reviews.freebsd.org/D29406
2021-07-28 17:27:04 +02:00
Julien Grall
69c6eee756 xen: introduce xen_pv_disks_disabled()
ARM guest is considered as HVM in Freebsd but they only support PV disk
(no emulation available).

Submitted by: Elliott Mitchell <ehem+freebsd@m5p.com>
Reviewed by: royger
Differential Revision: https://reviews.freebsd.org/D29403
2021-07-28 17:27:04 +02:00
Julien Grall
5f70008327 xen/netfront: introduce xen_pv_nics_disabled()
ARM guest is considered as HVM but it only supports PV nics (no
emulation available).

Submitted by: Elliott Mitchell <ehem+freebsd@m5p.com>
Reviewed by: royger
Differential Revision: https://reviews.freebsd.org/D29405
2021-07-28 17:27:04 +02:00
Elliott Mitchell
c89f1f12b0 xen/xen-os: move inclusion of machine/xen-os.h later
Several of x86 enable/disable functions depend upon the xen*domain()
functions.  As such the xen*domain() functions need to be declared
before machine/xen-os.h.

Officially declare direct inclusion of machine/xen/xen-os.h verboten as
such will break these functions/macros.  Remove one such soon to be
broken inclusion.

Reviewed by: royger
Differential Revision: https://reviews.freebsd.org/D29811
2021-07-28 17:27:04 +02:00
Elliott Mitchell
9976c5a540 xen/intr: use __func__ instead of function names
Functions tend to get renamed and unless the developer is careful
often debugging messages are missed. As such using func is far
superior.  Replace several instances of hard-coded function names.

Reviewed by: royger
Differential revision: https://reviews.freebsd.org/D29499
2021-07-28 17:27:03 +02:00
Elliott Mitchell
5ca00e0c98 xen/intr: use struct xenisrc * as xen_intr_handle_t
Since xen_intr_handle_t is meant to be an opaque handle and the only
use is retrieving the associated struct xenisrc *, directly use it as
the opaque handler.

Also add a wrapper function for converting the other direction.  If some
other value becomes appropriate in the future, these two functions will
be the only spots needing modification.

Reviewed by: mhorne, royger
Differential Revision: https://reviews.freebsd.org/D29500
2021-07-28 17:27:03 +02:00
Elliott Mitchell
b6ff9345a4 xen: create VM_MEMATTR_XEN for Xen memory mappings
The requirements for pages shared with Xen/other VMs may vary from
architecture to architecture.  As such create a macro which various
architectures can use.

Remove a use of PAT_WRITE_BACK in xenstore.c.  This is a x86-ism which
shouldn't have been present in a common area.

Original idea: Julien Grall <julien@xen.org>, 2014-01-14 06:44:08
Approach suggested by: royger
Reviewed by: royger, mhorne
Differential Revision: https://reviews.freebsd.org/D29351
2021-07-28 17:27:02 +02:00
Julien Grall
a48f7ba444 xen: move x86/xen/xenpv.c to dev/xen/bus/xenpv.c
Minor changes are necessary to make this processor-independent, but
moving the file out of x86 and into common is the first step (so
others don't add /more/ x86-isms).

Submitted by: Elliott Mitchell <ehem+freebsd@m5p.com>
Reviewed by: royger
Differential Revision: https://reviews.freebsd.org/D29042
2021-07-28 17:27:02 +02:00
Konstantin Belousov
d6717f8778 amd64: rework AP startup
Stop using temporal page table with 1:1 mapping of low 1G populated over
the whole VA.  Use 1:1 mapping of low 4G temporarily installed in the
normal kernel page table.

The features are:
- now there is one less step for startup asm to perform
- the startup code still needs to be at lower 1G because CPU starts in
  real mode. But everything else can be located anywhere in low 4G
  because it is accessed by non-paged 32bit protected mode.  Note that
  kernel page table root page is at low 4G, as well as the kernel itself.
- the page table pages can be allocated by normal allocator, there is
  no need to carve them from the phys_avail segments at very early time.
  The allocation of the page for startup code still requires some magic.
  Pages are freed after APs are ignited.
- la57 startup for APs is less tricky, we directly load the final page
  table and do not need to tweak the paging mode.

Reviewed by:	markj
Tested by:	pho
Sponsored by:	The FreeBSD Foundation
MFC after:	1 week
Differential revision:	https://reviews.freebsd.org/D31121
2021-07-27 20:11:15 +03:00
Mark Johnston
f95e683fa2 Annotate amd64 stack unwinders with __nomemorysanitize
Sponsored by:	The FreeBSD Foundation
2021-07-23 10:47:13 -04:00
Dmitry Chagin
1ca6b15bbd Drop "All rights reserved" from my copyright statements.
Add email and fixup years while here.

Reviewed by:		imp
Differential Revision:	https://reviews.freebsd.org/D30912
MFC after:		2 weeks
2021-07-20 10:05:50 +03:00
Dmitry Chagin
9931033bbf linux(4); Almost complete the vDSO.
The vDSO (virtual dynamic shared object) is a small shared library that the
kernel maps R/O into the address space of all Linux processes on image
activation. The vDSO is a fully formed ELF image, shared by all processes
with the same ABI, has no process private data.

The primary purpose of the vDSO:
- non-executable stack, signal trampolines not copied to the stack;
- signal trampolines unwind, mandatory for the NPTL;
- to avoid contex-switch overhead frequently used system calls can be
  implemented in the vDSO: for now gettimeofday, clock_gettime.

The first two have been implemented, so add the implementation of system
calls.

System calls implemenation based on a native timekeeping code with some
limitations:
- ifunc can't be used, as vDSO r/o mapped to the process VA and rtld
  can't relocate symbols;
- reading HPET memory is not implemented for now (TODO).

In case on any error vDSO system calls fallback to the kernel system
calls. For unimplemented vDSO system calls added prototypes which call
corresponding kernel system call.

Tested by:		trasz (arm64)
Differential revision:  https://reviews.freebsd.org/D30900
MFC after:              2 weeks
2021-07-20 10:01:18 +03:00
Mark Johnston
36226163fa x86: Mark the trapframe as initialized in ipi_bitmap_handler()
Otherwise KASAN may generate false positives if the trapframe was
written into a poisoned region of the stack.

Reported by:	pho
Reported by:	syzbot+ee60455cd58e6eed20c9@syzkaller.appspotmail.com
Reported by:	syzbot+be5f9df26426ace3a00c@syzkaller.appspotmail.com
Sponsored by:	The FreeBSD Foundation
2021-07-09 20:38:50 -04:00
Alex Richardson
9bb8a4091c Reduce code duplication in machine/_types.h
Many of these typedefs are the same across all architectures or can
be set based on an architecture-independent compiler-provided macro
(e.g. __SIZEOF_SIZE_T__). These macros have been available since GCC 4.6
and Clang sometime before 3.0 (godbolt.org does not have any older clang
versions installed).

I originally considered using the compiler-provided `__FOO_TYPE__` directly.
However, in order to do so we have to check that those match the previous
typedef exactly (not just that they have the same size) since any change
would be an ABI break. For example, changing `long` to `long long` results
in different C++ name mangling. Additionally, Clang and GCC disagree on
the underlying type for some of (u)int*_fast_t types, so this change
only moves the definitions that are identical across all architectures
and does not touch those types.

This de-deduplication will allow us to have a smaller diff downstream in
CheriBSD: we only have to only change the (u)intptr_t definition in
sys/_types.h in CheriBSD instead of having to change machine/_types.h for
all CHERI-enabled architectures (currently RISC-V, AArch64 and MIPS).

Reviewed By: imp, kib
Differential Revision: https://reviews.freebsd.org/D29895
2021-06-14 16:30:16 +01:00
Konstantin Belousov
37f780d3e0 Disable x2APIC for SandyBridge laptops with Samsung BIOS
From the PR:
Almost always, my Samsung RF511 laptop could not boot with
x2APIC enabled in the kernel. It froze during SMP initialization,
shortly after "ACPI APIC Table: <SECCSD LH43STAR>" was printed
to the console. When the kernel is instructed not to use x2APIC,
the system boots correctly.

PR:	256389
Submitted by:	David Sebek <dasebek@gmail.com>
Reviewed by:	markj
MFC after:	1 week
Differential revision:	https://reviews.freebsd.org/D30624
2021-06-03 22:47:31 +03:00
Konstantin Belousov
e9e00cc0c9 madt_setup_local: extract special case checks into a helper
Reviewed by:	markj
Tested by:	David Sebek <dasebek@gmail.com>
Sponsored by:	The FreeBSD Foundation
MFC after:	1 week
Differential revision:	https://reviews.freebsd.org/D30624
2021-06-03 22:47:31 +03:00
Konstantin Belousov
92adf00d05 madt_setup_local: convert series of strcmp to iteration over the array
to prepare for one more addition

Reviewed by:	markj
Tested by:	David Sebek <dasebek@gmail.com>
Sponsored by:	The FreeBSD Foundation
MFC after:	1 week
Differential revision:	https://reviews.freebsd.org/D30624
2021-06-03 22:47:31 +03:00
Konstantin Belousov
a603d41aca madt_setup_local: skip further checks if ACPI DMAR table already disabled x2APIC
Reviewed by:	markj
Tested by:	David Sebek <dasebek@gmail.com>
Sponsored by:	The FreeBSD Foundation
MFC after:	1 week
Differential revision:	https://reviews.freebsd.org/D30624
2021-06-03 22:47:31 +03:00
Mark Johnston
cbe59a6475 i386: Make setidt_disp a size_t instead of uintptr_t
setidt_disp is the offset of the ISR trampoline relative to the address
of the routines in exception.s, so uintptr_t is not quite right.

Also remove a bogus declaration I added in commit 18f55c67f7, it is not
required after all.

Reported by:	jrtc27
Reviewed by:	jrtc27, kib
MFC after:	1 week
Sponsored by:	The FreeBSD Foundation
Differential Revision:	https://reviews.freebsd.org/D30590
2021-06-01 19:37:50 -04:00
Mark Johnston
18f55c67f7 x86: Fix lapic_ipi_alloc() on i386
The loop which checks to see if "dynamic" IDT entries are allocated
needs to compare with the trampoline address of the reserved ISR.
Otherwise it will never succeed.

Reported by:	Harry Schmalzbauer <freebsd@omnilan.de>
Tested by:	Harry Schmalzbauer <freebsd@omnilan.de>
Reviewed by:	kib
MFC after:	1 week
Sponsored by:	The FreeBSD Foundation
Differential Revision:	https://reviews.freebsd.org/D30576
2021-05-31 18:51:14 -04:00
Edward Tomasz Napierala
83043a741d linux: deduplicate DUMMY() entries
No functional changes.

Reviewed By:	emaste
Sponsored By:	EPSRC
Differential Revision:	https://reviews.freebsd.org/D30524
2021-05-29 17:51:36 +00:00
Roger Pau Monné
9e14ac116e x86/xen: further PVHv1 removal cleanup
The AP startup extern variable declarations are not longer needed,
since PVHv2 uses the native AP startup path using the lapic. Remove
the declaration and make the variables static to mp_machdep.c

Sponsored by: Citrix Systems R&D
2021-05-18 10:43:31 +02:00
Mark Johnston
4224dbf4c7 xen: Remove leftover bits missed in commit ac3ede5371
Fixes:		ac3ede5371 ("x86/xen: remove PVHv1 code")
Reviewed by:	royger
Differential Revision:	https://reviews.freebsd.org/D30316
2021-05-17 13:06:44 -04:00
Roger Pau Monné
ac3ede5371 x86/xen: remove PVHv1 code
PVHv1 was officially removed from Xen in 4.9, so just axe the related
code from FreeBSD.

Note FreeBSD supports PVHv2, which is the replacement for PVHv1.

Sponsored by: Citrix Systems R&D
Reviewed by: kib, Elliott Mitchell
Differential Revision: https://reviews.freebsd.org/D30228
2021-05-17 11:41:21 +02:00
Mitchell Horne
2117a66af5 xen: remove hypervisor_info
This was a source of indirection needed to support PVHv1. Now that that
support has been removed, we can eliminate it.

Reviewed by: royger
2021-05-17 10:56:52 +02:00
Mitchell Horne
c93e6ea344 xen: remove support for PVHv1 bootpath
PVHv1 is a legacy interface supported only by Xen versions 4.4 through
4.9.

Reviewed by: royger
2021-05-17 10:56:52 +02:00
Mark Johnston
831850d8b0 stack(9): Disable KASAN in stack_capture()
When unwinding the stack, we may encounter a stack frame in a poisoned
region of the stack, triggering a false positive.

Reviewed by:	andrew, kib
Sponsored by:	The FreeBSD Foundation
Differential Revision:	https://reviews.freebsd.org/D30126
2021-05-07 14:31:08 -04:00
Eric van Gyzen
2f32a971b7 Wait longer for a previous IPI to be sent
When sending an IPI, if a previous IPI is still pending delivery,
native_lapic_ipi_vectored() waits for the previous IPI to be sent.
We've seen a few inexplicable panics with the current timeout of 50 ms.
Increase the timeout to 1 second and make it tunable.

No hardware specification mentions a timeout in this case; I checked
the Intel SDM, Intel MP spec, and Intel x2APIC spec.  Linux and illumos
wait forever.  In Linux, see __default_send_IPI_shortcut() in
arch/x86/kernel/apic/ipi.c.  In illumos, see apic_send_ipi() in
usr/src/uts/i86pc/io/pcplusmp/apic_common.c.  However, misbehaving hardware
could hang the system if we wait forever.

Reviewed by:	mav kib
MFC after:	1 week
Sponsored by:	Dell EMC Isilon
Differential Revision: https://reviews.freebsd.org/D29942
2021-04-30 13:32:29 -05:00
Mark Johnston
f115c06121 amd64: Add MD bits for KASAN
- Initialize KASAN before executing SYSINITs.
- Add a GENERIC-KASAN kernel config, akin to GENERIC-KCSAN.
- Increase the kernel stack size if KASAN is enabled.  Some of the
  ASAN instrumentation increases stack usage and it's enough to
  trigger stack overflows in ZFS.
- Mark the trapframe as valid in interrupt handlers if it is
  assigned to td_intr_frame.  Otherwise, an interrupt in a function
  which creates a poisoned alloca region can trigger false positives.

MFC after:	2 weeks
Sponsored by:	The FreeBSD Foundation
Differential Revision:	https://reviews.freebsd.org/D29455
2021-04-13 17:42:20 -04:00
Piotr Pawel Stefaniak
a212f56d10 Balance parentheses in sysctl descriptions 2021-04-11 10:30:55 +02:00
Konstantin Belousov
a8b75a57c9 x86: add x86_clear_dbregs() helper
Move the code from exec_setregs() to reset debug registers state on exec,
to the x86_clear_dbregs() helper

Reviewed by:	jhb
Sponsored by:	The FreeBSD Foundation
MFC after:	1 week
Differential revision:	https://reviews.freebsd.org/D29687
2021-04-10 04:25:01 +03:00
Mark Johnston
0f07c234ca Remove more remnants of sio(4)
Reviewed by:	imp
MFC after:	1 week
Sponsored by:	The FreeBSD Foundation
Differential Revision:	https://reviews.freebsd.org/D29626
2021-04-07 14:33:02 -04:00
Mitchell Horne
4beb385813 gdb: allow setting/removing hardware watchpoints
Handle the 'z' and 'Z' remote packets for manipulating hardware
watchpoints.

This could be expanded quite easily to support hardware or software
breakpoints as well.

https://sourceware.org/gdb/onlinedocs/gdb/Packets.html

Reviewed by:	cem, markj
MFC after:	3 weeks
Sponsored by:	NetApp, Inc.
Sponsored by:	Klara, Inc.
NetApp PR:	51
Differential Revision:	https://reviews.freebsd.org/D29173
2021-03-30 11:36:41 -03:00
Mitchell Horne
15dc1d4452 x86: implement kdb watchpoint functions
Add wrappers around the dbreg interface that can be consumed by MI
kernel debugger code. The dbreg functions themselves are updated to
return error codes, not just -1. dbreg_set_watchpoint() is extended to
accept access bits as an argument.

Reviewed by:	jhb, kib, markj
MFC after:	3 weeks
Sponsored by:	NetApp, Inc.
Sponsored by:	Klara, Inc.
Differential Revision:	https://reviews.freebsd.org/D29155
2021-03-29 12:05:43 -03:00
Mitchell Horne
720dc6bcb5 Consolidate machine/endian.h definitions
This change serves two purposes.

First, we take advantage of the compiler provided endian definitions to
eliminate some long-standing duplication between the different versions
of this header. __BYTE_ORDER__ has been defined since GCC 4.6, so there
is no need to rely on platform defaults or e.g. __MIPSEB__ to determine
endianness. A new common sub-header is added, but there should be no
changes to the visibility of these definitions.

Second, this eliminates the hand-rolled __bswapNN() routines, again in
favor of the compiler builtins. This was done already for x86 in
e6ff6154d2. The benefit here is that we no longer have to maintain our
own implementations on each arch, and can instead rely on the compiler
to emit appropriate instructions or libcalls, as available. This should
result in equivalent or better code generation. Notably 32-bit arm will
start using the `rev` instruction for these routines, which is available
on armv6+.

PR:		236920
Reviewed by:	arichardson, imp
Tested by:	bdragon (BE powerpc)
MFC after:	3 weeks
Differential Revision:	https://reviews.freebsd.org/D29012
2021-03-26 19:00:22 -03:00
Mark Johnston
3ead60236f Generalize bus_space(9) and atomic(9) sanitizer interceptors
Make it easy to define interceptors for new sanitizer runtimes, rather
than assuming KCSAN.  Lay a bit of groundwork for KASAN and KMSAN.

When a sanitizer is compiled in, atomic(9) and bus_space(9) definitions
in atomic_san.h are used by default instead of the inline
implementations in the platform's atomic.h.  These definitions are
implemented in the sanitizer runtime, which includes
machine/{atomic,bus}.h with SAN_RUNTIME defined to pull in the actual
implementations.

No functional change intended.

MFC after:	1 month
Sponsored by:	The FreeBSD Foundation
2021-03-22 22:21:53 -04:00
Mitchell Horne
c02c04f113 x86: consolidate hw watchpoint logic into new file
This is a prerequisite to using these functions outside of ddb, but also
provides some cleanup and minor refactoring. This code is almost
entirely duplicated between the two implementations, the only
significant difference being the lack of dbreg synchronization on i386.

Cleanups are:
 - demote some internal functions to static
 - use the constant NDBREGS instead of a '4' literal
 - remove K&R definitions
 - some added comments

Reviewed by:	kib, jhb
Sponsored by:	NetApp, Inc.
Sponsored by:	Klara, Inc.
MFC after:	1 week
Differential Revision:	https://reviews.freebsd.org/D29153
2021-03-19 16:51:52 -03:00
Julien Grall
b55c0d5f56 xen: move x86-specific xen_vector_callback_enabled to sys/x86
This is x86-only and so should not be in the common area.

Submitted by: Elliott Mitchell <ehem+freebsd@m5p.com>
Reviewed by: royger
Differential revision: https://reviews.freebsd.org/D29040
2021-03-15 14:20:21 +01:00
Kyle Evans
8cc15b0dfc x86: tsc: deprioritize TSC on VirtualBox
Misbehavior has been observed with TSC under VirtualBox, where threads
doing small sleeps (~1 second) may miss their wake up and hang around
in a sleep state indefinitely.  Switching back to ACPI-fast decidedly
fixes it, so stop using TSC on VirtualBox at least for the time being.

This partially reverts 84eaf2ccc6, applying it only to VirtualBox and
increasing the quality to 0. Negative qualities can never be chosen and
cannot be chosen with the tunable recently added. If we do not have a
timecounter with a higher quality than 0, then TSC does at least leave
the system mostly usable.

PR:		253087
Reviewed by:	emaste, kib
MFC after:	3 days
Differential Revision:	https://reviews.freebsd.org/D29132
2021-03-08 14:43:06 -06:00
Mark Johnston
435c7cfb24 Rename _cscan_atomic.h and _cscan_bus.h to atomic_san.h and bus_san.h
Other kernel sanitizers (KMSAN, KASAN) require interceptors as well, so
put these in a more generic place as a step towards importing the other
sanitizers.

No functional change intended.

MFC after:	1 week
Sponsored by:	The FreeBSD Foundation
Differential Revision:	https://reviews.freebsd.org/D29103
2021-03-08 12:39:06 -05:00
Allan Jude
d0673fe160 smbios: Move smbios driver out from x86 machdep code
Add it to the x86 GENERIC and MINIMAL kernels

Sponsored by:	Ampere Computing LLC
Submitted by:	Klara Inc.
Reviewed by:	rpokala
Differential Revision:	https://reviews.freebsd.org/D28738
2021-02-23 21:17:09 +00:00
Roger Pau Monné
a2495c3667 xen/boot: allow specifying boot method when booted from Xen
Allow setting the bootmethod variable from the Xen PVH entry point, in
order to be able to correctly set the underlying firmware mode when
booted as a dom0.

Move the bootmethod variable to be defined in x86/cpu_machdep.c
instead so it can be shared by both i386 and amd64.

Sponsored by:		Citrix Systems R&D
Reviewed by:		kib
Differential revision:	https://reviews.freebsd.org/D28619
2021-02-16 15:26:11 +01:00
Mark Johnston
b577047027 mca: Handle inconsistent CMCI capability reporting
A BIOS bug may apparently cause the BSP to report that it does not
implement CMCI, with some APs reporting that they do.  In this scenario,
avoid a NULL pointer dereference that occurs in cmci_monitor() because
cmc_state was not allocated by the BSP.

PR:		253272
Reported by:	asomers, mmacy
Reviewed by:	kib (previous version)
MFC after:	1 week
2021-02-08 14:42:54 -05:00
Mateusz Guzik
e6ff6154d2 x86: use compiler intrinsics for bswap* 2021-02-01 04:53:23 +00:00
Roger Pau Monné
b6d85a5f51 stand/multiboot: adjust the protocol between loader and kernel
There's a currently ad-hoc protocol to hand off the FreeBSD kernel
payload between the loader and the kernel itself when Xen is in the
middle of the picture. Such protocol wasn't very resilient to changes
to the loader itself, because it relied on moving metadata around to
package it using a certain layout. This has proven to be fragile, so
replace it with a more robust version.

The new protocol requires using a xen_header structure that will be
used to pass data between the FreeBSD loader and the FreeBSD kernel
when booting in dom0 mode. At the moment the only data conveyed is the
offset of the start of the module metadata relative to the start of the
module itself.

This is a slightly disruptive change since it also requires a change
to the kernel which is contained in this patch. In order to update
with this change the kernel must be updated before updating the
loader, as described in the handbook. Note this is only required when
booting a FreeBSD/Xen dom0. This change doesn't affect the normal
FreeBSD boot protocol.

This fixes booting FreeBSD/Xen in dom0 mode after
3630506b9d.

Sponsored by:		Citrix Systems R&D
MFC after:		3 days
Reviewed by:		tsoome
Differential Revision:	https://reviews.freebsd.org/D28411
2021-01-29 15:23:26 +01:00
Konstantin Belousov
9f47eeffa3 x86: switch kernel TSC timecounter to RDTSCP on AMD Zen CPUs
Reported by:	many
Tested by:	gallatin, mikael, rhurlin
MFC after:	1 week
Sponsored by:	The FreeBSD Foundation
2021-01-21 14:55:31 +02:00
Konstantin Belousov
f3ea417f96 x86 busdma_bounce: use malloc_domainset_aligned(9).
This stops busdma bounce making assumptions about alignment of malloc(9)
results, which are no longer true.

Also add assert that the result of malloc_aligned() fits into single
page, which is the assumption of the code.

Reported by:	dim
Reviewed by:	andrew, jah, markj
Tested by:	pho
MFC after:	1 week
Sponsored by:	The FreeBSD Foundation
Differential Revision:	https://reviews.freebsd.org/D28147
2021-01-17 19:29:05 +02:00
Konstantin Belousov
f9d85a0821 Revert "x86 busdma_bounce: do not make assumptions about alignment of malloc(9) results."
This reverts commit 8f54940f01.
The free needs to be called on the address returned by malloc,
not the realigned address.

Noted by:	andrew
Sponsored by:	The FreeBSD Foundation
2021-01-13 17:44:00 +02:00
Konstantin Belousov
8f54940f01 x86 busdma_bounce: do not make assumptions about alignment of malloc(9) results.
Reported by:	dim
Reviewed by:	dim, jah
Tested by:	dim, pho
MFC after:	2 weeks
Sponsored by:	The FreeBSD Foundation
Differential Revision:	https://reviews.freebsd.org/D28108
2021-01-13 17:00:49 +02:00
Konstantin Belousov
895ad33784 x86 budma_bounce: style.
Reviewed by:	dim, jah
MFC after:	3 days
Sponsored by:	The FreeBSD Foundation
Differential Revision:	https://reviews.freebsd.org/D28108
2021-01-13 17:00:46 +02:00
Konstantin Belousov
a013e285df x86 tsc: mark %eax as earlyclobber in tscp_get_timecount_low().
i386 codegen insists on preloading tc_priv into register on i386, and
this register cannot be %eax because RDTSCP instruction clobbers it
before it is used.

Reported and tested by:	dim
MFC after:	6 days
Sponsored by:	The FreeBSD Foundation
2021-01-11 00:05:49 +02:00
Konstantin Belousov
9e680e4005 tsc: add RDTSCP or faster variants of get_timecount()
Use it in preference of Xfenced RDTSC if RDTSCP is supported. It is
recommended by both Intel and AMD. But, on AMD Zens and newer use
LFENCE, as recommended by AMD [*]. In particular, this means that now
AMD CPUs use more appropriate fence instead of too harsh MFENCe.

Add comment explaining the intent of the selection logic.

Reported by:	gallatin [*]
Reviewed by:	gallatin, markj
Tested by:	gallatin, pho
MFC after:	1 week
Sponsored by:	The FreeBSD Foundation
Differential Revision:	https://reviews.freebsd.org/D27986
2021-01-10 04:42:34 +02:00
Konstantin Belousov
826fc3cc3d tsc: use u_int for return type for prototype, same as in definitions.
Reviewed by:	gallatin, markj
MFC after:	3 days
Sponsored by:	The FreeBSD Foundation
Differential Revision:	https://reviews.freebsd.org/D27986
2021-01-10 04:42:34 +02:00
Konstantin Belousov
aa9450e44b x86 identcpu.c: fix formatting of the comment.
Reviewed by:	gallatin, markj
MFC after:	3 days
Sponsored by:	The FreeBSD Foundation
Differential Revision:	https://reviews.freebsd.org/D27986
2021-01-10 04:42:33 +02:00
Konstantin Belousov
84eaf2ccc6 x86: stop punishing VMs with low priority for TSC timecounter
I suspect that virtualization techniques improved from the time when we
have to effectively disable TSC use in VM.  For instance, it was reported
(complained) in https://github.com/JuliaLang/julia/issues/38877 that
FreeBSD is groundlessly slow on AWS with some loads.

Remove the check and start watching for complaints.

Reviewed by:	emaste, grehan
Discussed with:	cperciva
Sponsored by:	The FreeBSD Foundation
Differential Revision:	https://reviews.freebsd.org/D27629
2020-12-23 12:45:15 +02:00
Ryan Libby
ee47a12a49 dmar: reserve memory windows of PCIe root port
PCI memory address space is shared between memory-mapped devices (MMIO)
and host memory (which may be remapped by an IOMMU). Device accesses to
an address within a memory aperture in a PCIe root port will be treated
as peer-to-peer and not forwarded to an IOMMU. To avoid this, reserve
the address space of the root port's memory apertures in the address
space used by the IOMMU for remapping.

Reviewed by:	kib, tychon
Discussed with:	Anton Rang <rang@acm.org>
Tested by:	tychon
Sponsored by:	Dell EMC Isilon
Differential Revision:	https://reviews.freebsd.org/D27503
2020-12-09 18:43:58 +00:00
Warner Losh
730b1b4d1c busdma: Annotate bus_dmamap_sync() with fence
Add an explicit thread fence release before returning from
bus_dmamap_sync. This should be a no-op in practice, but makes
explicit that all ordinary stores will be completed before subsequent
reads/writes to ordinary device memory. On x86, normal memory ordering
is strong enough to generally guarantee this. The fence keeps the
optimizer (likely LTO) from reordering other calls around this.
The other architectures already have calls, as appropriate, that
are equivalent.

Note: On x86, there is one exception to this rule. If you've mapped
memory as write combining, then you will need to add a sfence or
similar. Normally, though, busdma doesn't operate on such memory, and
drivers that do already cope appropriately.

Reviewed by: kib@, gallatin@, chuck@, mav@
Differential Revision: https://reviews.freebsd.org/D27448
2020-12-04 21:34:04 +00:00
John Baldwin
5941edfcdc Add a kstack_contains() helper function.
This is useful for stack unwinders which need to avoid out-of-bounds
reads of a kernel stack which can trigger kernel faults.

Reviewed by:	kib, markj
Obtained from:	CheriBSD
Sponsored by:	DARPA
Differential Revision:	https://reviews.freebsd.org/D27356
2020-12-01 17:04:46 +00:00
Toomas Soome
a4a10b37d4 Add VT driver for VBE framebuffer device
Implement vt_vbefb to support Vesa Bios Extensions (VBE) framebuffer with VT.
vt_vbefb is built based on vt_efifb and is assuming similar data for
initialization, use MODINFOMD_VBE_FB to identify the structure vbe_fb
in kernel metadata.

struct vbe_fb, is populated by boot loader, and is passed to kernel via
metadata payload.

Differential Revision:	https://reviews.freebsd.org/D27373
2020-11-30 08:22:40 +00:00
Yuri Pankov
41062d6878 hwpstate_intel: don't unconditionally print the error message
Actually check the wrmsr_safe() return value when setting autonomous
HWP for package.

PR:		245582
Differential Revision:	https://reviews.freebsd.org/D24744
2020-11-29 01:43:04 +00:00
Ruslan Bukin
f593116991 Add device_t member to struct iommu.
This is needed on arm64 for the interface between iommu framework
and iommu controller drivers.

Reviewed by:	kib
Sponsored by:	Innovate DSbD
Differential Revision:	https://reviews.freebsd.org/D27229
2020-11-16 15:29:52 +00:00
Conrad Meyer
e9b13c6612 linux(4): Deduplicate unimpl/dummy syscall handlers
No functional change.

Reviewed by:	emaste, trasz
Differential Revision:	https://reviews.freebsd.org/D27099
2020-11-05 19:30:31 +00:00
Ruslan Bukin
9729b14985 Move the iommu stubs to a generic place, so they are available on all the
platforms.

This allows to not depend on the IOMMU macro in AHCI driver.

Requested by:	kib
Suggested by:	andrew
Reviewed by:	kib
Sponsored by:	Innovate DSbD
Differential Revision:	https://reviews.freebsd.org/D26887
2020-10-23 21:27:48 +00:00
Ruslan Bukin
94dfb28ee0 Assign the reserved apic region (GAS entry) to the iommu domain msi_entry.
Requested by:	kib
Reviewed by:	kib
Sponsored by:	Innovate DSbD
Differential Revision:	https://reviews.freebsd.org/D26859
2020-10-19 15:50:58 +00:00
Konstantin Belousov
d3ba71b2b1 Limit workaround for errata E400 to appropriate AMD cpus.
From Linux sources and several datasheets I looked at, it seems that
the workaround is only needed on families 0xf and 0x10.  For instance,
Ryzens do not implement the accessed MSR at all, it is documented as
reserved.  Also, hypervisors should not allow guest to put CPU into
idle state, so activate workaround only when on bare hardware.

While there, style the code:
    move MSR defines to specialreg.h
    move identification to initcpu.c

Reported by:	whu
Reviewed by:	avg
Sponsored by:	The FreeBSD Foundation
MFC after:	1 week
Differential revision:	https://reviews.freebsd.org/D26470
2020-10-14 22:57:50 +00:00
Konstantin Belousov
6f3b523c9a Avoid dump_avail[] redefinition.
Move dump_avail[] extern declaration and inlines into a new header
vm/vm_dumpset.h.  This fixes default gcc build for mips.

Reviewed by:	alc, scottph
Tested by:	kevans (previous version)
Sponsored by:	The FreeBSD Foundation
Differential revision:	https://reviews.freebsd.org/D26741
2020-10-14 22:51:40 +00:00
Warner Losh
8e82f10172 timer_restore is now unused, remove it
apm was the only consumer of timer_restore. Now that it's gone, this
can be removed.
2020-10-08 20:56:11 +00:00
Mitchell Horne
6debfd4b13 Remove unused function cpu_boot()
The prototype was added with the creation of kern_shutdown.c in r17658,
but it appears to have never been implemented. Remove it now.

Reviewed by:	cem, kib
Differential Revision:	https://reviews.freebsd.org/D26702
2020-10-06 23:16:56 +00:00
Ryan Moeller
3331a1d173 Explicit CTLFLAG_DYN not needed
Dynamically created OIDs automatically get this flag set.

Reviewed by:	jhb
MFC after:	1 week
Sponsored by:	iXsystems, Inc.
Differential Revision:	https://reviews.freebsd.org/D26561
2020-10-04 19:37:15 +00:00
Konstantin Belousov
5e8ea68fd8 Move ctx_switch_xsave declaration to amd64 md_var.h.
Sponsored by:	The FreeBSD Foundation
MFC after:	3 days
2020-10-03 23:07:09 +00:00
Ruslan Bukin
6186bfbd18 Rename kernel option ACPI_DMAR to IOMMU.
This is mostly needed for a common arm64/amd64 iommu code.

Reviewed by:	kib
Sponsored by:	DARPA, AFRL
Differential Revision:	https://reviews.freebsd.org/D26587
2020-09-29 20:29:07 +00:00
Michal Meloun
88f7c52f31 Add missing declarations of 64-bit variants of bus_peek/bus_poke on amd64.
It fixes GENERIC-KCSAN build.

Reported by:	rpokala
MFC after:	1 month
MFC with:	r365899
2020-09-24 08:40:32 +00:00
Warner Losh
f9ba2bbe3a Use envvar rather than nonstandard hint. lines
The NOTES files have a bunch of hint lines that are removed when
generating LINT. However, we can achieve the same effect by prepending
each of the lines with 'envvar' so the NOTES files become standard
config(8) files. No functional changes as the sed script to generate
the LINT files filters these either way.

Suggested by: kevans
2020-09-23 19:18:53 +00:00
D Scott Phillips
ab041f713a Move vm_page_dump bitset array definition to MI code
These definitions were repeated by all architectures, with small
variations. Consolidate the common definitons in machine
independent code and use bitset(9) macros for manipulation. Many
opportunities for deduplication remain in the machine dependent
minidump logic. The only intended functional change is increasing
the bit index type to vm_pindex_t, allowing the indexing of pages
with address of 8 TiB and greater.

Reviewed by:	kib, markj
Approved by:	scottl (implicit)
MFC after:	1 week
Sponsored by:	Ampere Computing, Inc.
Differential Revision:	https://reviews.freebsd.org/D26129
2020-09-21 22:20:37 +00:00
Michal Meloun
3182062142 Add missing assignment forgotten in r365899
Noticed by:	mav
MFC after:	1 month
MFC with:	r365899
2020-09-20 15:11:52 +00:00
Michal Meloun
95a85c125d Add NetBSD compatible bus_space_peek_N() and bus_space_poke_N() functions.
One problem with the bus_space_read_N() and bus_space_write_N() family of
functions is that they provide no protection against exceptions which can
occur when no physical hardware or device responds to the read or write
cycles. In such a situation, the system typically would panic due to a
kernel-mode bus error. The bus_space_peek_N() and bus_space_poke_N() family
of functions provide a mechanism to handle these exceptions gracefully
without the risk of crashing the system.

Typical example is access to PCI(e) configuration space in bus enumeration
function on badly implemented PCI(e) root complexes (RK3399 or Neoverse
N1 N1SDP and/or access to PCI(e) register when device is in deep sleep state.

This commit adds a real implementation for arm64 only. The remaining
architectures have bus_space_peek()/bus_space_poke() emulated by using
bus_space_read()/bus_space_write() (without exception handling).

MFC after:	1 month
Reviewed by:	kib
Differential Revision:	https://reviews.freebsd.org/D25371
2020-09-19 11:06:41 +00:00
Scott Long
74c781ed91 Refine the busdma template interface. Provide tools for filling in fields
that can be extended, but also ensure compile-time type checking.  Refactor
common code out of arch-specific implementations.  Move the mpr and mps
drivers to this new API.  The template type remains visible to the consumer
so that it can be allocated on the stack, but should be considered opaque.
2020-09-14 05:58:12 +00:00
Jason A. Harmening
3966af52f3 amd64: prevent KCSan false positives on LAPIC mapping
For configurations without x2APIC support (guests, older hardware), the global
LAPIC MMIO mapping will trigger false-positive KCSan reports as it will appear
that multiple CPUs are concurrently reading and writing the same address.
This isn't actually true, as the underlying physical access will be performed
on the local CPU's APIC. Additionally, because LAPIC access can happen during
event timer configuration, the resulting KCSan printf can produce a panic due
to attempted recursion on event timer resources.

Add a __nosanitizethread preprocessor define to prevent the compiler from
inserting TSan hooks, and apply it to the x86 LAPIC accessors.

PR:		249149
Reported by:	gbe
Reviewed by:	andrew, kib
Tested by:	gbe
Differential Revision:	https://reviews.freebsd.org/D26354
2020-09-12 07:04:00 +00:00
John Baldwin
d8dc46f6e9 Add constant for the DE_CFG MSR on AMD CPUs.
Reported by:	Patrick Mooney <pmooney@pfmooney.com>
MFC after:	2 weeks
Differential Revision:	https://reviews.freebsd.org/D25885
2020-09-11 20:32:40 +00:00
Ruslan Bukin
cb9050dd21 Move the rid variable to the generic iommu context.
It could be used in various IOMMU platforms, not only DMAR.

Reviewed by:	kib
Sponsored by:	DARPA, AFRL
Differential Revision:	https://reviews.freebsd.org/D26373
2020-09-10 14:12:25 +00:00
Mateusz Guzik
ab6c81a218 x86: clean up empty lines in .c and .h files 2020-09-01 21:23:59 +00:00
Konstantin Belousov
f446480b5f amd64: Handle 5-level paging on wakeup.
We can switch into long mode directly with LA57 enabled.

Sponsored by:	The FreeBSD Foundation
Differential revision:	https://reviews.freebsd.org/D25273
2020-08-23 20:43:23 +00:00
Konstantin Belousov
25f2da2e64 Add amd64 procctl(2) ops to manage forced LA48/LA57 VA after exec.
Tested by:	pho (LA48 hardware)
Sponsored by:	The FreeBSD Foundation
Differential revision:	https://reviews.freebsd.org/D25273
2020-08-23 20:32:13 +00:00
Konstantin Belousov
4ba405dcdb Add definition for CR4.LA57 bit.
Tested by:	pho
Sponsored by:	The FreeBSD Foundation
Differential revision:	https://reviews.freebsd.org/D25273
2020-08-23 20:08:05 +00:00
Peter Grehan
3a3f1e9dfa Export a routine to provide the TSC_AUX MSR value and use this in vmm.
Also, drop an unnecessary set of braces.

Requested by:	kib
Reviewed by:	kib
MFC after:	3 weeks
2020-08-18 11:36:38 +00:00
Ruslan Bukin
0424f19e9e Move dmar_domain_unload_task to busdma_iommu.c.
Reviewed by:	kib
Sponsored by:	DARPA/AFRL
Differential Revision:	https://reviews.freebsd.org/D25972
2020-08-06 12:49:25 +00:00
Ruslan Bukin
16696f6057 Add iommu_domain constructor and destructor.
Reviewed by:	kib
Sponsored by:	DARPA/AFRL
Differential Revision:	https://reviews.freebsd.org/D25956
2020-08-06 08:48:23 +00:00
Ruslan Bukin
c4cd699010 o Add machine/iommu.h and include MD iommu headers from it,
so we don't ifdef for every arch in busdma_iommu.c;
o No need to include specialreg.h for x86, remove it.

Requested by:	andrew
Reviewed by:	kib
Sponsored by:	DARPA/AFRL
Differential Revision:	https://reviews.freebsd.org/D25957
2020-08-05 19:11:31 +00:00
Ruslan Bukin
78b517543b Add a few macroses for conversion between DMAR unit, domain, ctx
and IOMMU unit, domain, ctx.

Reviewed by:	kib
Sponsored by:	DARPA, AFRL
Differential Revision:	https://reviews.freebsd.org/D25926
2020-08-04 20:51:05 +00:00
Mark Johnston
96ad26eefb Remove free_domain() and uma_zfree_domain().
These functions were introduced before UMA started ensuring that freed
memory gets placed in domain-local caches.  They no longer serve any
purpose since UMA now provides their functionality by default.  Remove
them to simplyify the kernel memory allocator interfaces a bit.

Reviewed by:	cem, kib
Sponsored by:	The FreeBSD Foundation
Differential Revision:	https://reviews.freebsd.org/D25937
2020-08-04 13:58:36 +00:00
Ruslan Bukin
0eed04c802 Add iommu_domain_map_ops virtual table with map/unmap methods
so x86 can support Intel DMAR and AMD IOMMU simultaneously.

Reviewed by:	kib
Sponsored by:	DARPA/AFRL
Differential Revision:	https://reviews.freebsd.org/D25894
2020-07-31 23:02:17 +00:00
Ruslan Bukin
c8597a1f9f o Don't include headers from iommu.h, include them from the header
consumers instead;
o Order includes properly.

Reviewed by:	kib
Sponsored by:	DARPA/AFRL
Differential Revision:	https://reviews.freebsd.org/D25878
2020-07-29 22:08:54 +00:00
Ruslan Bukin
386f81fca7 Fix !ACPI_DMAR build.
Reviewed by:	kib
Sponsored by:	DARPA/AFRL
Differential Revision:	https://reviews.freebsd.org/D25882
2020-07-29 19:22:50 +00:00
Ruslan Bukin
1b0c9c21d9 o Move iommu_set_buswide_ctx, iommu_is_buswide_ctx to
the generic iommu busdma backend;
o Move bus_dma_iommu_set_buswide, bus_dma_iommu_load_ident
  prototypes to iommu.h.

Reviewed by:	kib
Sponsored by:	DARPA, AFRL
Differential Revision:	https://reviews.freebsd.org/D25866
2020-07-29 13:23:27 +00:00
Ruslan Bukin
ea4c01156a o Move the buswide_ctxs bitmap to iommu_unit and rename related functions.
o Rename bus_dma_dmar_load_ident() as well.

Reviewed by:	kib
Sponsored by:	DARPA/AFRL
Differential Revision:	https://reviews.freebsd.org/D25852
2020-07-28 16:08:14 +00:00
Alexander Motin
855e49f3b0 Add initial driver for ACPI Platform Error Interfaces.
APEI allows platform to report different kinds of errors to OS in several
ways.  We've found that Supermicro X10/X11 motherboards report PCIe errors
appearing on hot-unplug via this interface using NMI.  Without respective
driver it ended up in kernel panic without any additional information.

This driver introduces support for the APEI Generic Hardware Error Source
reporting via NMI, SCI or polling.  It decodes the reported errors and
either pass them to pci(4) for processing or just logs otherwise.  Errors
marked as fatal still end up in kernel panic, but some more informative.

When somebody get to native PCIe AER support implementation both of the
reporting mechanisms should get common error recovery code.  Since in our
case errors happen when the device is already gone, there is nothing to
recover, so the code just clears the error statuses, practically ignoring
the otherwise destructive NMIs in nicer way.

MFC after:	2 weeks
Relnotes:	yes
Sponsored by:	iXsystems, Inc.
2020-07-27 21:19:41 +00:00
Ruslan Bukin
15f6baf445 Rename DMAR flags:
o DMAR_DOMAIN_* -> IOMMU_DOMAIN_*
o DMAR_PGF_* -> IOMMU_PGF_*

Reviewed by:	kib
Sponsored by:	DARPA/AFRL
Differential Revision:	https://reviews.freebsd.org/D25812
2020-07-26 12:29:22 +00:00
Ruslan Bukin
357149f037 o Make the _hw_iommu sysctl node non-static;
o Move the dmar sysctl knobs to _hw_iommu_dmar.

Reviewed by:	kib
Sponsored by:	DARPA/AFRL
Differential Revision:	https://reviews.freebsd.org/D25807
2020-07-25 21:37:07 +00:00
Ruslan Bukin
9c843a409b o Move iommu gas prototypes, DMAR flags to iommu.h;
o Move hw.dmar sysctl node to iommu_gas.c.

Reviewed by:	kib
Sponsored by:	DARPA/AFRL
Differential Revision:	https://reviews.freebsd.org/D25802
2020-07-25 19:07:12 +00:00
Alexander Motin
aba10e131f Allow swi_sched() to be called from NMI context.
For purposes of handling hardware error reported via NMIs I need a way to
escape NMI context, being too restrictive to do something significant.

To do it this change introduces new swi_sched() flag SWI_FROMNMI, making
it careful about used KPIs.  On platforms allowing IPI sending from NMI
context (x86 for now) it immediately wakes clk_intr_event via new IPI_SWI,
otherwise it works just like SWI_DELAY.  To handle the delayed SWIs this
patch calls clk_intr_event on every hardclock() tick.

MFC after:	2 weeks
Sponsored by:	iXsystems, Inc.
Differential Revision:	https://reviews.freebsd.org/D25754
2020-07-25 15:19:38 +00:00
Ruslan Bukin
3024e8af1e Move Intel GAS to dev/iommu/ as now a part of generic iommu framework.
Reviewed by:	kib
Sponsored by:	DARPA/AFRL
Differential Revision:	https://reviews.freebsd.org/D25799
2020-07-25 11:34:50 +00:00
Ruslan Bukin
62ad310c93 Split-out the Intel GAS (Guest Address Space) management component
from Intel DMAR support, so it can be used on other IOMMU systems.

Reviewed by:	kib
Sponsored by:	DARPA/AFRL
Differential Revision:	https://reviews.freebsd.org/D25743
2020-07-25 09:28:38 +00:00
Alexander Motin
9977c593a7 Introduce ipi_self_from_nmi().
It allows safe IPI sending to current CPU from NMI context.

Unlike other ipi_*() functions this waits for delivery to leave LAPIC in
a state safe for interrupted code.

MFC after:	2 weeks
Sponsored by:	iXsystems, Inc.
2020-07-24 20:52:09 +00:00
Alexander Motin
279cd05b7e Use APIC_IPI_DEST_OTHERS for bitmapped IPIs too.
It should save bunch of LAPIC register accesses.

MFC after:	2 weeks
2020-07-24 20:44:50 +00:00
Alexander Motin
23ce462092 Make lapic_ipi_vectored(APIC_IPI_DEST_SELF) NMI safe.
Sending IPI to self or all CPUs does not require write into upper part of
the ICR, prone to races.  Previously the code disabled interrupts, but it
was not enough for NMIs.  Instead of that when possible write only lower
part of the register, or use special SELF IPI register in x2APIC mode.

This also removes ICR reads used to preserve reserved bits on write.
It was there from the beginning, but I failed to find explanation why,
neither I see Linux doing it.  Specification even tells that ICR content
may be lost in deep C-states, so if hardware does not bother to preserve
it, why should we?

MFC after:	2 weeks
Sponsored by:	iXsystems, Inc.
2020-07-24 19:54:15 +00:00
Ruslan Bukin
1238a28d15 Move sys/iommu.h to dev/iommu/ as a part of generic IOMMU busdma backend.
Reviewed by:	kib
Sponsored by:	DARPA/AFRL
Differential Revision:	https://reviews.freebsd.org/D25750
2020-07-21 13:50:10 +00:00
Ruslan Bukin
f2b2f31707 Move the Intel DMAR busdma backend to a generic place so
it can be used on other IOMMU systems.

Reviewed by:	kib
Sponsored by:	DARPA/AFRL
Differential Revision:	https://reviews.freebsd.org/D25720
2020-07-21 10:38:51 +00:00
Ruslan Bukin
08e99af305 o Move iommu_test_boundary() to sys/iommu.h
o Rename DMAR -> IOMMU in comments
o Add IOMMU_PAGE_SIZE / IOMMU_PAGE_MASK macroses
o x86 only: dmar_quirks_pre_use() / dmar_instantiate_rmrr_ctxs()

Reviewed by:	kib
Sponsored by:	DARPA/AFRL
Differential Revision:	https://reviews.freebsd.org/D25665
2020-07-18 13:10:31 +00:00
Ryan Moeller
ef013ceecd hwpmc: Always set pmc_cpuid to something
pmc_cpuid was uninitialized for most AMD processor families.  We can still
populate this string for unimplemented families.

Also added a CPUID_TO_STEPPING macro and converted existing code to use it.

Reviewed by:	mav
MFC after:	2 weeks
Sponsored by:	iXsystems, Inc.
Differential Revision:	https://reviews.freebsd.org/D25673
2020-07-14 22:25:06 +00:00
Konstantin Belousov
dc43978aa5 amd64: allow parallel shootdown IPIs
Stop using smp_ipi_mtx to protect global shootdown state, and
move/multiply the global state into pcpu.  Now each CPU can initiate
shootdown IPI independently from other CPUs.  Initiator enters
critical section, then fills its local PCPU shootdown info
(pc_smp_tlb_XXX), then clears scoreboard generation at location (cpu,
my_cpuid) for each target cpu.  After that IPI is sent to all targets
which scan for zeroed scoreboard generation words.  Upon finding such
word the shootdown data is read from corresponding cpu' pcpu, and
generation is set.  Meantime initiator loops waiting for all zeroed
generations in scoreboard to update.

Initiator does not disable interrupts, which should allow
non-invalidation IPIs from deadlocking, it only needs to disable
preemption to pin itself to the instance of the pcpu smp_tlb data.

The generation is set before the actual invalidation is performed in
handler. It is safe because target CPU cannot return to userspace
before handler finishes. In principle only NMI can preempt the
handler, but NMI would see the kernel handler frame and not touch
not-invalidated user page table.

Handlers loop until they do not see zeroed scoreboard generations.
This, together with hardware keeping one pending IPI in LAPIC IRR
should prevent lost shootdowns.

Notes.
1. The code does protect writes to LAPIC ICR with exclusion. I believe
   this is fine because we in fact do not send IPIs from interrupt
   handlers. More for !x2APIC mode where ICR access for write requires
   two registers write, we disable interrupts around it. If considered
   incorrect, I can add per-cpu spinlock around ipi_send().
2. Scoreboard lines owned by given target CPU can be padded to the
   cache line, to reduce ping-pong.

Reviewed by:	markj (previous version)
Discussed with:	alc
Tested by:	pho
Sponsored by:	The FreeBSD Foundation
MFC after:	3 weeks
Differential revision:	https://reviews.freebsd.org/D25510
2020-07-14 20:37:50 +00:00
Ruslan Bukin
59e37c8a54 Start splitting-out the Intel DMAR busdma backend to a generic place,
so it can be used on other IOMMU systems.

Provide MI iommu_unit, iommu_domain and iommu_ctx structs in sys/iommu.h;
use them as a first member of MD dmar_unit, dmar_domain and dmar_ctx.

Change the namespace in DMAR backend: use iommu_ prefix instead of dmar_.

Move some macroses and function prototypes to sys/iommu.h.

Reviewed by:	kib
Sponsored by:	DARPA, AFRL
Differential Revision:	https://reviews.freebsd.org/D25574
2020-07-14 10:55:19 +00:00
Jung-uk Kim
450d86fc7f Assume all TSCs are synchronized for AMD Family 17h processors and later
when it has passed the synchronization test.

"Processor Programming Reference (PPR) for AMD Family 17h" states that
the TSC uses a common reference for all sockets, cores and threads.

MFC after:	1 month
2020-06-22 20:42:58 +00:00
Konstantin Belousov
17edf152e5 Control for Special Register Buffer Data Sampling mitigation.
New microcode update for Intel enables mitigation for SRBDS, which
slows down RDSEED and related instructions.  The update also provides
a control to limit the mitigation to SGX enclaves, which should
restore the speed of random generator by the cost of potential
cross-core bufer sampling.

See https://software.intel.com/security-software-guidance/insights/deep-dive-special-register-buffer-data-sampling

GIve the user control over it.

Reviewed by:	markj
Sponsored by:	The FreeBSD Foundation
MFC after:	1 week
Differential revision:	https://reviews.freebsd.org/D25221
2020-06-12 22:14:45 +00:00
Konstantin Belousov
958d257ed5 x86: add bits definitions for SRBDS mitigation control.
See https://software.intel.com/security-software-guidance/insights/deep-dive-special-register-buffer-data-sampling

Reviewed by:	markj
Sponsored by:	The FreeBSD Foundation
MFC after:	1 week
Differential revision:	https://reviews.freebsd.org/D25221
2020-06-12 22:12:57 +00:00
Andrew Gallatin
6da16e3eb0 x86: Bump default msi/msix vector limit to 2048
Given that 64c/128t CPUs are currently available, and that many
devices (nvme, many NICs) desire to map 1 MSI-X vector per core,
or even 1 per-thread, it is becoming far easier to see MSI-X interrupt
setup fail due to msi vector exhaustion, and devices fail to attach at
boot on large system.

This bump costs 12KB on amd64 (and 6KB on i386), which seems
worth the trade off for a better out of the box experience on
high end hardware.

Reviewed by:	jhb
MFC after:	21 days
Sponsored by:	Netflix
2020-06-12 18:41:12 +00:00
Konstantin Belousov
e09fb42a9a Correct comment (this should have been committed with r362065).
Sponsored by:	The FreeBSD Foundation
MFC after:	13 days
2020-06-11 20:26:39 +00:00
Konstantin Belousov
e7a291f418 Restore TLB invalidations done before smp started.
In particular, invalidation of the preloaded modules text to allow
execution from it was broken after D25188/r362031.

Reviewed by:	markj
Reported by:	delphij, dhw
Sponsored by:	The FreeBSD Foundation
MFC after:	13 days
2020-06-11 17:25:20 +00:00
Konstantin Belousov
3b23ffe271 amd64 pmap: reorder IPI send and local TLB flush in TLB invalidations.
Right now code first flushes all local TLB entries that needs to be
flushed, then signals IPI to remote cores, and then waits for
acknowledgements while spinning idle.  In the VMWare article 'Don’t
shoot down TLB shootdowns!' it was noted that the time spent spinning
is lost, and can be more usefully used doing local TLB invalidation.

We could use the same invalidation handler for local TLB as for
remote, but typically for pmap == curpmap we can use INVLPG for locals
instead of INVPCID on remotes, since we cannot control context
switches on them.  Due to that, keep the local code and provide the
callbacks to be called from smp_targeted_tlb_shootdown() after IPIs
are fired but before spin wait starts.

Reviewed by:	alc, cem, markj, Anton Rang <rang at acm.org>
Tested by:	pho
Sponsored by:	The FreeBSD Foundation
MFC after:	2 weeks
Differential revision:	https://reviews.freebsd.org/D25188
2020-06-10 22:07:57 +00:00
Jason A. Harmening
1dccf71b4b Remove unnecessary WITNESS check in x86 bus_dma
When I did some bus_dma cleanup in r320528, I brought forward some sketchy
WITNESS checks from the prior x86 busdma wrappers, instead of recognizing
them as technical debt and just dropping them.  Two of these were removed in
r346351 and r346851, but one remains in bounce_bus_dmamem_alloc(). This check
could be constrained to only apply in the BUS_DMA_NOWAIT case, but it's cleaner
to simply remove it and rely on the checks already present in the sleepable
allocation paths used by this function.

While here, remove another unnecessary witness check in bus_dma_tag_create
(the tag is always allocated with M_NOWAIT), and fix a couple of typos.

Reported by:	cem
Reviewed by:	kib, cem
MFC after:	1 week
Differential Revision:	https://reviews.freebsd.org/D25107
2020-06-03 00:16:36 +00:00
Roger Pau Monné
a7752a2eda xenpv: do not use low 1MB for Xen mappings on i386
On amd64 we already avoid using memory below 4GB in order to prevent
clashes with MMIO regions, but i386 was allowed to use any hole in
the physical memory map in order to map Xen pages.

Limit this to memory above the 1MB boundary on i386 in order to avoid
clashes with the MMIO holes in that area.

Sponsored by:	Citrix Systems R&D
MFC after:	1 week
2020-05-28 08:18:34 +00:00
Conrad Meyer
9d3b7f622a x86: Detect new feature bits
Fix an off-by-one in AVX512VPOPCNTDQ identification.  That was actually the
TME bit.

Reported by:	debdrup
2020-05-26 23:12:57 +00:00
Ruslan Bukin
43843cc281 Rename dmar_get_dma_tag() to acpi_iommu_get_dma_tag().
This is needed for a new IOMMU controller support.

Reviewed by:	kib
Differential Revision:	https://reviews.freebsd.org/D24943
2020-05-26 16:40:40 +00:00
Konstantin Belousov
ea6020830c amd64: Add a knob to flush RSB on context switches if machine has SMEP.
The flush is needed to prevent cross-process ret2spec, which is not handled
on kernel entry if IBPB is enabled but SMEP is present.
While there, add i386 RSB flush.

Reported by:	Anthony Steinhauser <asteinhauser@google.com>
Reviewed by:	markj, Anthony Steinhauser
Discussed with:	philip
admbugs:	961
Sponsored by:	The FreeBSD Foundation
MFC after:	1 week
2020-05-20 22:00:31 +00:00
Konstantin Belousov
36e1ad61e8 Do not consider CAP_RDCL_NO as an indicator for all MDS vulnerabilities
handled by hardware.

Reported by:	Anthony Steinhauser <asteinhauser@google.com>
admbugs:	962
Sponsored by:	The FreeBSD Foundation
MFC after:	1 week
2020-05-20 21:22:25 +00:00
Mark Johnston
e76aab6ae2 Call acpi_pxm_set_proximity_info() slightly earlier on x86.
This function is responsible for setting pc_domain in each pcpu
structure.  Call it from the main function that starts APs, rather than
a separate SYSINIT.  This makes it easier to close the window where
UMA's per-CPU slab allocator may be called while pc_domain is
uninitialized.  In particular, the allocator uses pc_domain to allocate
domain-local pages, so allocations before this point end up using domain
0 for everything.

Reviewed by:	kib
MFC after:	1 week
Sponsored by:	The FreeBSD Foundation
Differential Revision:	https://reviews.freebsd.org/D24757
2020-05-14 16:07:27 +00:00
Eric van Gyzen
ba0ced82ea Fix handling of NMIs from unknown sources (BMC, hypervisor)
Release kernels have no KDB backends enabled, so they discard an NMI
if it is not due to a hardware failure.  This includes NMIs from
IPMI BMCs and hypervisors.

Furthermore, the interaction of panic_on_nmi, kdb_on_nmi, and
debugger_on_panic is confusing.

Respond to all NMIs according to panic_on_nmi and debugger_on_panic.
Remove kdb_on_nmi.  Expand the meaning of panic_on_nmi by making
it a bitfield.  There are currently two bits: one for NMIs due to
hardware failure, and one for all others.  Leave room for more.

If panic_on_nmi and debugger_on_panic are both true, don't actually panic,
but directly enter the debugger, to allow someone to leave the debugger
and [hopefully] resume normal execution.

Reviewed by:	kib
MFC after:	2 weeks
Relnotes:	yes: machdep.kdb_on_nmi is gone; machdep.panic_on_nmi changed
Sponsored by:	Dell EMC Isilon
Differential Revision:	https://reviews.freebsd.org/D24558
2020-04-26 00:41:29 +00:00
Konstantin Belousov
ab23c2784b Improve TSC calibration logic.
Stop attempting to use FADT legacy hardware flag, it is almost always
a lie.

Instead, unless user explicitly disabled the calibration, calibrate
against 8254 ISA clock.  Then, obtain the rough value of the expected
TSC frequency from CPUID leafs 0x15/0x16 or even from the CPU
marketing name string.  If calibration results look unbelievably bogus
comparing with CPUID leafs report, use CPUID one.

Intel does not recommend to use CPUID leaf 0x16 for the value of the
system time frequency, indeed the error there might be up to 1% which
e.g. makes ntpd give up.  If ISA clock is present, we win, if not, we
get some frequency that allows the machine to boot without enormous
delay.

Next improvement would be to use HPET for re-calibration if we decided
that ISA clock gives bogus results, after HPETs are enumerated. This
is a much bigger change since we probably would need to re-evaluate
some constants depending on TSC frequency.

Reviewed by:	emaste, jhb, scottl
Tested by:	scottl
Sponsored by:	The FreeBSD Foundation
MFC after:	2 weeks
Differential revision:	https://reviews.freebsd.org/D24426
2020-04-15 22:28:51 +00:00
Konstantin Belousov
bd8a359f81 x86 tsc: fall back to CPUID if calibration results looks unbelievable.
Teested and reviewed by:	scottl
Sponsored by:	The FreeBSD Foundation
MFC after:	2 weeks
Differential revision:	https://reviews.freebsd.org/D24247
2020-04-01 16:21:11 +00:00
Konstantin Belousov
46ed9da5fc Do not spuriously re-enable disabled io_apic pin on EOI for some configurations.
If EOI suppression is supported but reported ioapic version is so old
that it does not has EOI register (weird virtualization setup), fix
Intel trick of eoi-ing by flipping pin type (edge/level) to account
for the disabled pin.

Reported by:	Juniper
Reviewed by:	jhb
Sponsored by:	The FreeBSD Foundation
MFC after:	2 weeks
Differential revision:	https://reviews.freebsd.org/D23965
2020-03-18 21:34:52 +00:00
Konstantin Belousov
1314c4920f Stop (trying to) renumber io apics.
It does not serve any purpose now, the io apic id is not seen by
software, and some Intel documents claim that the register is
implemented for FUD reasons.  More, renumbering seems to not work on
new Intel machines which actually have mismatched MADT and hw IDs.

On older machines where separate APIC bus existed, unique numbering of
all APICs was required for bus arbitration to work, but it is no
longer true (that machines were SMP from pre-Pentium IV era).

When matching PCIe IOAPIC device against MADT-enumerated IOAPICs,
compare io_apic_id from BAR against io_apic_id read from the
MADT-pointed register page.

Reviewed by:	jhb
Tested by:	flo (previous version), pho
MFC after:	2 weeks
Differential revision:	https://reviews.freebsd.org/D23965
2020-03-18 21:31:35 +00:00
Konstantin Belousov
c105f5c1fa Widen the stored io_apic_id to 8 bits.
It seems that the newer Intel chipset did that, and Linux reads 8
bits. The only detail is that all seen datasheets, even under NDA,
claim that io apic id is 4 bits.

Submitted by:	jeff
Reviewed by:	jhb
Tested by:	flo, pho
MFC after:	2 weeks
Differential revision:	https://reviews.freebsd.org/D23965
2020-03-18 21:24:34 +00:00
Pawel Biernacki
7029da5c36 Mark more nodes as CTLFLAG_MPSAFE or CTLFLAG_NEEDGIANT (17 of many)
r357614 added CTLFLAG_NEEDGIANT to make it easier to find nodes that are
still not MPSAFE (or already are but aren’t properly marked).
Use it in preparation for a general review of all nodes.

This is non-functional change that adds annotations to SYSCTL_NODE and
SYSCTL_PROC nodes using one of the soon-to-be-required flags.

Mark all obvious cases as MPSAFE.  All entries that haven't been marked
as MPSAFE before are by default marked as NEEDGIANT

Approved by:	kib (mentor, blanket)
Commented by:	kib, gallatin, melifaro
Differential Revision:	https://reviews.freebsd.org/D23718
2020-02-26 14:26:36 +00:00
Konstantin Belousov
a324b7f71d Fix IBRS for machines with IBRS_ALL capability.
When turning IBRS mitigation using sysctl, as opposed to loader tunable,
send IPI to tweak MSR on all cores.  Right now code only performed MSR write
onr the CPU where sysctl was run.

Properly report hw.ibrs_active for IBRS_ALL.  Split hw_ibrs_ibpb_active out
from ibrs_active, to keep the current semantic of guiding kernel entry and
exit handlers.

Reported and tested by:	mav
Sponsored by:	The FreeBSD Foundation
MFC after:	1 week
2020-02-25 17:26:10 +00:00
Konstantin Belousov
43cd55db57 x86/identcpu.c whitespace cleanup.
Sponsored by:	The FreeBSD Foundation
MFC after:	3 days
2020-02-21 16:55:28 +00:00