Commit Graph

735 Commits

Author SHA1 Message Date
Poul-Henning Kamp
8ba749fbe3 Introduce an architecture-agnostic <sys/_stdarg.h> to reduce
platform divergence.

Only architectures which pass arguments in registers (mips)
and platforms which use really weird compilers (any?) would
need to augment the contents of <sys/_stdarg.h>

Convert x86, arm and arm64 architectures to use <sys/_stdarg.h>
2017-12-25 20:54:00 +00:00
Warner Losh
ed98ce5cad Further investigation shows this shouldn't have been added at all.
Remove it.
2017-12-24 17:59:48 +00:00
Warner Losh
d76103580a Comment this out until I have time to get to the bottom of why it's
failing for some people.
2017-12-24 16:36:50 +00:00
Warner Losh
7dcb3b1295 Warn when nonPNP ISA devices are attached in GENERIC that they are
being removed from GENERIC in 12. Always print PNP info for ISA when
it exists: it doesn't depend on ISAPNP. Add PNP ID to orm and vga to
prevent us from warning about them since those devices aren't being
removed from GENERIC. PNP devices will be removed from GENERIC too,
but they will be automatically loaded, so need no warning. We don't
warn for non-GENERIC kernels because people running them are presumed
to know what they are doing.

MFC After: 2 weeks
2017-12-23 22:57:14 +00:00
Konstantin Belousov
6332b14887 Add missed AVX512VL (128 and 256 bit vector length) extension
identification bit.

Sponsored by:	The FreeBSD Foundation
MFC after:	3 days
2017-12-23 21:32:50 +00:00
Bruce Evans
da9fba5447 Use resume_cpus() instead of restart_cpus() to resume from ACPI suspension.
restart_cpus() worked well enough by accident.  Before this set of fixes,
resume_cpus() used the same cpuset (started_cpus, meaning CPUs directed to
restart) as restart_cpus().  resume_cpus() waited for the wrong cpuset
(stopped_cpus) to become empty, but since mixtures of stopped and suspended
CPUs are not close to working, stopped_cpus must be empty when resuming so
the wait is null -- restart_cpus just allows the other CPUs to restart and
returns without waiting.

Fix resume_cpus() to wait on a non-wrong cpuset for the ACPI case, and
add further kludges to try to keep it working for the XEN case.  It
was only used for XEN.  It waited on suspended_cpus.  This works for
XEN.  However, for ACPI, resuming is a 2-step process.  ACPI has already
woken up the other CPUs and removed them from suspended_cpus.  This
fix records the move by putting them in a new cpuset resuming_cpus.
Waiting on suspended_cpus would give the same null wait as waiting on
stopped_cpus.  Wait on resuming_cpus instead.

Add a cpuset toresume_cpus to map the CPUs being told to resume to keep
this separate from the cpuset started_cpus for mapping the CPUs being told
to restart.  Mixtures of stopped and suspended/resuming CPUs are still far
from working.  Describe new and some old cpusets in comments.

Add further kludges to cpususpend_handler() to try to avoid breaking it
for XEN.  XEN doesn't use resumectx(), so it doesn't use the second
return path for savectx(), and it goes from the suspended state directly
to the restarted state, while ACPI resume goes through the resuming state.
Enter the resuming state early for all cases so that resume_cpus can test
for being in this state and not have to worry about the intermediate
!suspended state for ACPI only.

Reviewed by:	kib
2017-12-21 09:17:48 +00:00
Bruce Evans
2ba6fe0009 Remove the permanent double mapping of low physical memory and replace
it by a transient double mapping for the one instruction in ACPI wakeup
where it is needed (and for many surrounding instructions in ACPI resume).
Invalidate the TLB as soon as convenient after undoing the transient
mapping.  ACPI resume already has the strict ordering needed for this.

This fixes the non-trapping of null pointers and other garbage pointers
below NBPDR (except transiently).  NBPDR is quite large (4MB, or 2MB for
PAE).

This fixes spurious traps at the first instruction in VM86 bioscalls.
The traps are for transiently missing read permission in the first
VM86 page (physical page 0) which was just written to at KERNBASE in
the kernel.  The mechanism is unknown (it is not simply PG_G).

locore uses a similar but larger transient double mapping and needs
it for 2 instructions instead of 1.  Unmap the first PDE in it after
the 2 instructions to detect most garbage pointers while bootstrapping.
pmap_bootstrap() finishes the unmapping.

Remove the avoidance of the double mapping for a recently fixed special
case.  ACPI resume could use this avoidance (made non-special) to avoid
any problems with the transient double mapping, but no such problems
are known.

Update comments in locore.  Many were for old versions of FreeBSD which
tried to map low memory r/o except for special cases, or might have
allowed access to low memory via physical offsets.  Now all kernel
maps are r/w, and removal of of the double map disallows use of physical
offsets again.
2017-12-18 13:53:22 +00:00
Pedro F. Giffuni
64de3fdd58 SPDX: use the Beerware identifier. 2017-11-30 20:33:45 +00:00
Jung-uk Kim
82f0844956 Properly skip the first CPU. It only accidentally worked because the
CPU_FOREACH() loop always starts from BSP (cpu0) and the if condition
is always false for APs.

Reported by:	cem
2017-11-30 20:21:42 +00:00
Pedro F. Giffuni
8820ecc040 SPDX: Fix some cases wrongly attributed to MIT.
In the cases of BSD-style license variants without clauses, use 0BSD for
the time being in lack of a better description.
2017-11-30 15:10:11 +00:00
Jung-uk Kim
e374a321fe Add a tunable "debug.hwpstate_verify" to check P-state after changing it and
turn it off by default.  It is very inefficient to verify current P-state of
each core, especially for CPUs with many cores.  When multiple commands are
requested to the same power domain before completion of pending transitions,
the last command is executed according to the manual.  Because requests are
serialized by the caller, all cores will receive the same command for each
call.  Do not call sched_bind() and sched_unbind().  It is redundant because
the caller does it anyway.
2017-11-30 01:40:07 +00:00
Jung-uk Kim
72b27e9773 Fix style(9). 2017-11-29 23:52:31 +00:00
Pedro F. Giffuni
ebf5747bdb sys/x86: further adoption of SPDX licensing ID tags.
Mainly focus on files that use BSD 2-Clause license, however the tool I
was using misidentified many licenses so this was mostly a manual - error
prone - task.

The Software Package Data Exchange (SPDX) group provides a specification
to make it easier for automated tools to detect and summarize well known
opensource licenses. We are gradually adopting the specification, noting
that the tags are considered only advisory and do not, in any way,
superceed or replace the license texts.
2017-11-27 15:11:47 +00:00
Konstantin Belousov
383f241dce Remove lint support from system headers and MD x86 headers.
Reviewed by:	dim, jhb
Discussed with:	imp
Sponsored by:	The FreeBSD Foundation
Differential revision:	https://reviews.freebsd.org/D13156
2017-11-23 11:40:16 +00:00
Pedro F. Giffuni
51369649b0 sys: further adoption of SPDX licensing ID tags.
Mainly focus on files that use BSD 3-Clause license.

The Software Package Data Exchange (SPDX) group provides a specification
to make it easier for automated tools to detect and summarize well known
opensource licenses. We are gradually adopting the specification, noting
that the tags are considered only advisory and do not, in any way,
superceed or replace the license texts.

Special thanks to Wind River for providing access to "The Duke of
Highlander" tool: an older (2014) run over FreeBSD tree was useful as a
starting point.
2017-11-20 19:43:44 +00:00
Pedro F. Giffuni
df57947f08 spdx: initial adoption of licensing ID tags.
The Software Package Data Exchange (SPDX) group provides a specification
to make it easier for automated tools to detect and summarize well known
opensource licenses. We are gradually adopting the specification, noting
that the tags are considered only advisory and do not, in any way,
superceed or replace the license texts.

Special thanks to Wind River for providing access to "The Duke of
Highlander" tool: an older (2014) run over FreeBSD tree was useful as a
starting point.

Initially, only tag files that use BSD 4-Clause "Original" license.

RelNotes:	yes
Differential Revision:	https://reviews.freebsd.org/D13133
2017-11-18 14:26:50 +00:00
Ruslan Bukin
3b418d1b9a Add Intel Processor Trace registers for:
- CPUID
- Table of Physical Addresses (ToPA).

Sponsored by:	DARPA, AFRL
2017-11-17 17:54:10 +00:00
Konstantin Belousov
4e421792ec Remove i386 XBOX support.
It is for console presented at 2001 and featuring Pentium III
processor.  Even if any of them are still alive and run FreeBSD, we do
not have any sign of life from their users.  While removing another
dozens of #ifdefs from the i386 sources reduces the aversion from
looking at the code and improves the platform vitality.

Reviewed by:	cem, pfg, rink (XBOX support author)
Sponsored by:	The FreeBSD Foundation
Differential revision:	https://reviews.freebsd.org/D13016
2017-11-16 14:27:02 +00:00
Ruslan Bukin
b510dab312 Add Intel Processor Trace (PT) MSRs.
Sponsored by:	DARPA, AFRL
2017-11-12 23:13:04 +00:00
Konstantin Belousov
dc00696a27 Correct operators precedence.
Also keep the calculated vm_page_alloc_contig() flags in the variable
to not re-evaluate it on the loop iteration.

Noted by:	alc
Sponsored by:	The FreeBSD Foundation
2017-11-09 13:09:07 +00:00
Jeff Roberson
8d6fbbb867 Replace manyinstances of VM_WAIT with blocking page allocation flags
similar to the kernel memory allocator.

This simplifies NUMA allocation because the domain will be known at wait
time and races between failure and sleeping are eliminated.  This also
reduces boilerplate code and simplifies callers.

A wait primitive is supplied for uma zones for similar reasons.  This
eliminates some non-specific VM_WAIT calls in favor of more explicit
sleeps that may be satisfied without new pages.

Reviewed by:	alc, kib, markj
Tested by:	pho
Sponsored by:	Netflix, Dell/EMC Isilon
2017-11-08 02:39:37 +00:00
Michal Meloun
904d8c492f Add AT_HWCAP2 ELF auxiliary vector.
- allocate value for new AT_HWCAP2 auxiliary vector on all platforms.
 - expand 'struct sysentvec' by new 'u_long *sv_hwcap2', in exactly
   same way as for AT_HWCAP.

MFC after:	1 month
Reviewed by:	kib
Differential Revision:	https://reviews.freebsd.org/D12699
2017-10-21 12:05:01 +00:00
Conrad Meyer
194446f9b7 x86: Decode AMD "Extended Feature Extensions ID EBX" bits
In particular, this determines CPU support for the CLZERO instruction.

(No, I am not making this name up.)

Sponsored by:	Dell EMC Isilon
2017-09-20 18:30:37 +00:00
Conrad Meyer
c50df68a08 MCA: Expand AMD Thresholding support to cover all banks
When it was added in r314636, AMD Thresholding was hardcoded to only
bank 4 (Northbridge) for some reason.  However, even on family 10h the
MCAx_MISC register Valid/Present bits determine whether thresholding is
supported on that bank.

Expand thresholding support to monitor all monitorable banks.  This
simplifies some of the logic and makes it more consistent with our Intel
CMCI support.

Reviewed by:	markj (earlier version)
Sponsored by:	Dell EMC Isilon
Differential Revision:	https://reviews.freebsd.org/D12321
2017-09-17 22:58:13 +00:00
John Baldwin
8df419f2df Add AT_EHDRFLAGS and AT_HWCAP on amd64.
x86 has two separate (but identical) list of AT_* constants and the
earlier commit to add AT_HWCAP only updated the i386 list.
2017-09-14 15:34:29 +00:00
John Baldwin
c2f37b9245 Add AT_HWCAP and AT_EHDRFLAGS on all platforms.
A new 'u_long *sv_hwcap' field is added to 'struct sysentvec'.  A
process ABI can set this field to point to a value holding a mask of
architecture-specific CPU feature flags.  If an ABI does not wish to
supply AT_HWCAP to processes the field can be left as NULL.

The support code for AT_EHDRFLAGS was already present on all systems,
just the #define was not present.  This is a step towards unifying the
AT_* constants across platforms.

Reviewed by:	kib
MFC after:	1 month
Differential Revision:	https://reviews.freebsd.org/D12290
2017-09-14 14:26:55 +00:00
Conrad Meyer
d63edb4dc6 MCA: Rename AMD MISC bits/masks
They apply to all AMD MCAi_MISC0 registers, not just MCA4 (NB).

No functional change.

Sponsored by:	Dell EMC Isilon
2017-09-11 20:42:07 +00:00
Conrad Meyer
f739be66e6 x86 MCA: Extract CMCI support predicate into function
On AMD, the MCG_CAP feature bit is reserved -- not explicitly zero.  Do not
use it to determine CMCI support.

Reviewed by:	avg, markj
Sponsored by:	Dell EMC Isilon
Differential Revision:	https://reviews.freebsd.org/D12320
2017-09-11 20:41:25 +00:00
Konstantin Belousov
809f2d8b8b Fix ioapic acpi id matching on PCI attach and rid calculation.
Sponsored by:	The FreeBSD Foundation
MFC after:	11 days
2017-09-11 18:29:09 +00:00
Conrad Meyer
e8be4e41c6 Decode new AMD SVM feature bits on family 17h
Sponsored by:	Dell EMC Isilon
2017-09-11 18:11:53 +00:00
Konstantin Belousov
3c700e2e4c Enhance qpi.c to make it usable on all Core-microarchitecture Xeons.
Scan all buses for CSR bus, not stopping on the first failed
match. Scan all slots for function 0 on the found bus, for instance on
IvyBridge the slot 0 is not decoded at all. Since the scan is quite
unsafe, and access to the buses is mostly useful for developers,
enable the csr buses scan with the tunable.

Current qpi.c makes too many assumptions about the uncore
configuration buses location and about slots occupied.  Also it
restricts itself only to Nehalem CPUs.  It is needed on all Core-based
Xeons.  On the 2600 v2 (IvyBridge) machine I have access to, the CSR
buses have numbers 31 (BSP socket) and 63 (second socket), and there
is no functions pci0.31.0.0 or pci0.63.0.0.  According to the CPU
datasheet, all devices on the uncore bus occupy slots >= 8.

Practically, the attach to config buses is required for the intel-pcm
pcm-memory.x tool to work, for instance.

Reviewed by:	jhb (previous version)
Sponsored by:	Mellanox Technologies
MFC after:	1 week
Differential revision:	https://reviews.freebsd.org/D12268
2017-09-08 19:51:03 +00:00
Konstantin Belousov
fd15fee1ed Use IOAPIC PCI rid as the interrupt TLP source id for DMAR interrupt
remapping.

VT-d specification requires use of PCI rid as source id for IOAPICs
enumerated by PCI bus.  The values from the DMAR ACPI table should be
only used when IOAPIC is not on PCI.

Reviewed by:	jhb
Sponsored by:	The FreeBSD Foundation
Hardware provided by:	Intel
MFC after:	2 weeks
Differential revision:	https://reviews.freebsd.org/D12205
2017-09-08 19:45:37 +00:00
Konstantin Belousov
3fd0053a50 Add an ioapic_get_rid() function to obtain PCIe TLP requester-id for
the interrupt messages from given IOAPIC, if the IOAPIC can be
enumerated on PCI bus.

If IOAPIC has PCI binding, match the PCI device against MADT
enumerated IOAPIC.  Match is done first by registers window physical
address, then by IOAPIC ID as read from the APIC ID register.

PCI bsf address of the matched PCI device is the rid.

Reviewed by:	jhb
Sponsored by:	The FreeBSD Foundation
Hardware provided by:	Intel
MFC after:	2 weeks
X-Differential revision:	https://reviews.freebsd.org/D12205
2017-09-08 19:39:20 +00:00
Konstantin Belousov
1a92c8402d Add a constant specifying the min size of the IOAPIC registers window.
Sponsored by:	The FreeBSD Foundation
MFC after:	1 week
2017-09-08 19:25:11 +00:00
Konstantin Belousov
6ff9ce94ce Consistently use tabs for indent.
Sponsored by:	The FreeBSD Foundation
MFC after:	1 week
2017-09-08 10:39:28 +00:00
Conrad Meyer
01a20b9875 mca: Fix printf types from r323289 on i386
Reported by:	Michael Butler <imb AT protected-networks.net>
Sponsored by:	Dell EMC Isilon
2017-09-08 01:06:35 +00:00
Conrad Meyer
092c0e867a x86 MCA: Helpfully, print why ECC thresholding is not enabled on AMD
Sponsored by:	Dell EMC Isilon
2017-09-07 21:33:27 +00:00
Conrad Meyer
d848ecfb7e x86 MCA: Enable AMD thresholding support on 17h
17h supports MCA thresholding in the same way as 16h and earlier.
Supposedly a ScalableMca feature bit in CPUID 8000_0007:EBX must be set, but
that was not true for earlier models, so be careful about relying on it.

While here, document a missing bit in LS MCA MISC0.

Reviewed by:	truckman
Sponsored by:	Dell EMC Isilon
Differential Revision:	https://reviews.freebsd.org/D12237
2017-09-07 21:31:07 +00:00
Conrad Meyer
cd8c258198 Store AMD RAS Capabilities cpuid value and name flags
Reviewed by:	truckman
Sponsored by:	Dell EMC Isilon
Differential Revision:	https://reviews.freebsd.org/D12237
2017-09-07 21:29:51 +00:00
Conrad Meyer
2e81566368 cpufreq(4) hwpstate: Yield CPU awaiting frequency change
It doesn't seem necessary to busy the CPU while waiting to transition
into a different p-state.

PR:		221621 (related, but does not completely address)
Reviewed by:	truckman
Sponsored by:	Dell EMC Isilon
Differential Revision:	https://reviews.freebsd.org/D12260
2017-09-07 20:20:12 +00:00
Konstantin Belousov
fd9bc183bb Fix typos. Stop claiming that two children are created.
Sponsored by:	The FreeBSD Foundation
MFC after:	1 week
2017-09-06 11:47:59 +00:00
Roger Pau Monné
45ff071d6e acpi/srat: zero the SRAT cpu array
Fix from fallout introduced in r322348 that moved the cpus array to a
dynamic allocation without zeroing the area.

Reported by:		mjg
MFC with:		r322348
Reviewed by:		mjg
Differential revision:	https://reviews.freebsd.org/D12220
2017-09-04 10:08:42 +00:00
Konstantin Belousov
2624320fcc Stop masking FSGSBASE and SMEP features under monitors.
Not enabling FSGSBASE in %cr4 does not prevent reporting of the
feature by the CPUID instruction (blame Int*l).  As result, kernels
which were run under monitors pretended that usermode cannot modify
TLS base without the syscall, while libc noted right combination of
capable CPU and the new kernel version, trying to use the WRFSBASE
instruction.

Really old hypervisors that cannot handle enablement of these features
in %cr4 would require the manual configuration, by setting the loader
tunable hw.cpu_stdext_disable=0x81

Reported by:	lwhsu, mjoras
Sponsored by:	The FreeBSD Foundation
MFC after:	18 days
2017-08-24 10:57:34 +00:00
Alexander Motin
ffc7e53a65 Fix off-by-one error when parsing SRAT table.
Reviewed by:	jhb
MFC after:	1 week
2017-08-22 19:56:30 +00:00
Conrad Meyer
bb14d5643b subr_smp: Clean up topology analysis, add additional layers
Rather than repeatedly nesting loops, separate concerns with a single loop
per call stack level.  Use a table to drive the recursive routine.  Handle
missing topology layers more gracefully (infer a single unit).

Analyze some additional optional layers which may be present on e.g. AMD Zen
systems (groups, aka dies, per package; and cachegroups, aka CCXes, per
group).

Display that additional information in the boot-time topology information,
when it is relevent (non-one).

Reviewed by:	markj@, mjoras@ (earlier version)
Sponsored by:	Dell EMC Isilon
Differential Revision:	https://reviews.freebsd.org/D12019
2017-08-22 00:10:15 +00:00
Conrad Meyer
c768afe370 hwpstate: Add support for family 17h pstate info from MSRs
This information is normally available via acpi_perf, but in case it is not,
add support for fetching the information via MSRs on AMD family 17h (Zen)
processors.  Zen uses a slightly different formula than previous generation
AMD CPUs.

This was inspired by, but does not fix, PR 221621.

Reported by:	Sean P. R. <seanpr AT swbell.net>
Reviewed by:	mjoras@
Sponsored by:	Dell EMC Isilon
Differential Revision:	https://reviews.freebsd.org/D12082
2017-08-20 00:41:49 +00:00
Conrad Meyer
0b53ecd1d7 Discover CPU topology on multi-die AMD Zen systems
The Nodes per Processor topology information determines how many bits of the
APIC ID represent the Node (Zeppelin die, on Zen systems) ID.  Documented in
Ryzen and Epyc Processor Programming Reference (PPR).

Correct topology information enables the scheduler to make better decisions
on this hardware.

Reviewed by:	kib@
Tested by:	jeff@ (earlier version)
Sponsored by:	Dell EMC Isilon
Differential Revision:	https://reviews.freebsd.org/D11801
2017-08-17 16:54:37 +00:00
Conrad Meyer
35d87c7e96 Fix unused varable warning in !SMP case
Fallout from r322588.  I'm not sure why !SMP is a knob we have, but, we have
it.

Reported by:	Michael Butler <imb AT protected-networks.net>
Sponsored by:	Dell EMC Isilon
2017-08-17 04:37:27 +00:00
Conrad Meyer
dc6a82801d x86: Add dynamic interrupt rebalancing
Add an option to dynamically rebalance interrupts across cores
(hw.intrbalance); off by default.

The goal is to minimize preemption. By placing interrupt sources on distinct
CPUs, ithreads get preferentially scheduled on distinct CPUs.  Overall
preemption is reduced and latency is reduced. In our workflow it reduced
"fighting" between two high-frequency interrupt sources.  Reduced latency
was proven by, e.g., SPEC2008.

Submitted by:	jeff@ (earlier version)
Reviewed by:	kib@
Sponsored by:	Dell EMC Isilon
Differential Revision:	https://reviews.freebsd.org/D10435
2017-08-16 18:48:53 +00:00
Roger Pau Monné
72446721e4 srat: use pmap_unmapbios
To match the pmap_mapbios.

Reported by:	jhb
MFC with:	r322403
2017-08-13 14:50:38 +00:00