Some AMD systems I have report 8 NMI and 3591 polled error sources.
Previous code could handle only one NMI source and used separate
callout for each polled source. New code can handle multiple NMIs
and groups polled sources by power of 2 of the polling period.
MFC after: 2 weeks
Since revision 3.0 this structure grown another field, breaking access
to the following data structures. This change fixes the PCIe errors
decoding on newer systems.
MFC after: 2 weeks
ACPI implementation of device_get_property would return "-1" when
property was found, but it's type wasn't supported.
This causes device_has_property to return false in that scenario, which
arguably could be considered as incorrect.
Fix that by returning "0" in that case.
Reviewed by: bz, mw
Tested by: mw
MFC after: 2 weeks
Obtained from: Semihalf
Differential Revision: https://reviews.freebsd.org/D33103
In the acpi_cpu_postattach SYSINIT function cpu_softc may be NULL, e.g.
on arm64 when booting from FDT. Check it is not NULL at the start of
the function so we don't try to dereference a NULL pointer.
Sponsored by: The FreeBSD Foundation
Before this device unit number match was coincidental and broke if I
disabled some CPU device(s). Aside of cosmetics, for some drivers
(may be considered broken) it caused talking to wrong CPUs.
There are already APIC ID, ACPI ID and OS ID for each CPU. In perfect
world all of those may match, but at least for SuperMicro server boards
none of them do. Plus none of them match the CPU devices listing order
by ACPI. Previous code used the ACPI device listing order to number
cpuX devices. It looked nice from NewBus perspective, but introduced
4th different set of IDs. Extremely confusing one, since in some places
the device unit numbers were treated as OS CPU IDs (coretemp), but not
in others (sysctl dev.cpu.X.%location).
Generialize bus specific property accessors. Those functions allow driver code
to access device specific information.
Currently there is only support for FDT and ACPI buses.
Reviewed by: manu, mw
Sponsored by: Semihalf
Differential revision: https://reviews.freebsd.org/D31597
This disables testing the ACPI timer by default, forcing the use of
ACPI-fast rather than ACPI-safe. The broken-ACPI-timers workaround
can be re-enabled by setting the hw.acpi.timer_test_enabled=1 tunable.
This speeds up the FreeBSD boot process by 140 ms on an EC2 c5.xlarge
instance.
This change will not be MFCed.
Assuming no problems are reported, acpi_timer_test, the associated
tunable, and the ACPI-safe timecounter should be removed in FreeBSD 15.
Relnotes: The ACPI-safe timer is disabled in favour of ACPI-fast;
if timekeeping issues are observed, please test with
hw.acpi.timer_test_enabled=1 in loader.conf and report
if that fixes the problem.
When hw.acpi.timer_test_enabled is set to 0, this makes acpi_timer_test
return 1 without actually testing the ACPI timer; this results in the
ACPI-fast timecounter always being used rather than potentially using
ACPI-safe.
The ACPI timer testing was introduced in 2002 as a workaround for
errata in Pentium II and Pentium III chipsets, and is unlikely to be
needed in 2021.
While I'm here, add TSENTER/TSEXIT to make it easier to see the time
spent on the test (if it is enabled).
Reviewed by: allanjude, imp
MFC After: 1 week
Add vDSO support for timekeeping devices that support the KVM/XEN
paravirtual clock API.
Also, expose, in the userspace-accessible '<machine/pvclock.h>',
definitions that will be needed by 'libc' to support
'VDSO_TH_ALGO_X86_PVCLK'.
Sponsored by: Juniper Networks, Inc.
Sponsored by: Klara, Inc.
Reviewed by: kib
Differential Revision: https://reviews.freebsd.org/D31418
These ones were unambiguous cases where the Foundation was the only
listed copyright holder (in the associated license block).
Sponsored by: The FreeBSD Foundation
Add the ability to map named components from IORT to their
SMMU or ITS node in order to setup interrupts.
It is now possible to find a node by its name (substring) and
resource ID similar to PCI nodes.
This is needed by work on a driver for NXP's Second Generation
Data Path Acceleration Architecture (DPAA2).
Reviewed by: andrew
MFC after: 2 weeks
Differential Revision:: https://reviews.freebsd.org/D31267
o Arm CoreLink TM CMN-600 Coherent Mesh Network controller,
o Arm CoreLink DMC-620 Dynamic Memory Controller.
Sponsored by: Ampere Computing LLC
Submitted by: Klara Inc.
These have been found in some Arm ACPI tables generated by edk2, e.g.
when describing the pl011 uart on the Arm AEMv8 model.
Reviewed by: imp, jkim
Sponsored by: The FreeBSD Foundation
Differential Revision: https://reviews.freebsd.org/D31110
Now that the upper layers all go through a layer to tie into these
information functions that translates an sbuf into char * and len. The
current interface suffers issues of what to do in cases of truncation,
etc. Instead, migrate all these functions to using struct sbuf and these
issues go away. The caller is also in charge of any memory allocation
and/or expansion that's needed during this process.
Create a bus_generic_child_{pnpinfo,location} and make it default. It
just returns success. This is for those busses that have no information
for these items. Migrate the now-empty routines to using this as
appropriate.
Document these new interfaces with man pages, and oversight from before.
Reviewed by: jhb, bcr
Sponsored by: Netflix
Differential Revision: https://reviews.freebsd.org/D29937
Otherwise the resouce buffer may have been freed when
AcpiSetCurrentResources() is called, leading to a use-after-free.
PR: 255862
Submitted by: Lv Yunlong <lylgood@foxmail.com> (original version)
MFC after: 1 week
We don't need the result before next sleep time, so no reason to
additionally increase interrupt latency.
While there, remove extra PM ticks to microseconds conversion, making
C2/C3 sleep times look 4 times smaller than really. The conversion
is already done by AcpiGetTimerDuration(). Now I see reported sleep
times up to 0.5s, just as expected for planned 2 wakeups per second.
MFC after: 1 month
When we enter C2+ state via memory read, it may take chipset some
time to stop CPU. Extra register read covers that time. But MWAIT
makes CPU stop immediately, so we don't need to waste time after
wakeup with interrupts still disabled, increasing latency.
On my system it reduces ping localhost latency, waking up all CPUs
once a second, from 277us to 242us.
MFC after: 1 month
Even though the information is very limited, it seems the intent of
this flag is to control ACPI_BITREG_BUS_MASTER_STATUS use for C3,
not force ACPI_BITREG_ARB_DISABLE manipulations for C2, where it was
never needed, and which register not really doing anything for years.
It wasted lots of CPU time on congested global ACPI hardware lock
when many CPU cores were trying to enter/exit deep C-states same time.
On idle 80-core system it pushed ping localhost latency up to 20ms,
since badport_bandlim() via counter_ratecheck() wakes up all CPUs
same time once a second just to synchronously reset the counters.
Now enabling C-states increases the latency from 0.1 to just 0.25ms.
Discussed with: kib
MFC after: 1 month
It appears that production versions of EPYC firmware get the _STA method right
for these nodes. In fact, this workaround breaks on production hardware by
including too many uart nodes. This work around was for pre-release hardware
that wound up not having a large deployment. Move this work around to a kernel
option since the machines that needed it have been powered off and are difficult
to resurrect. Should there be a more significant deployment than is understood,
we can restrict it based on smbios strings.
Discussed with: mmacy@, seanc@, jhb@
MFC After: 3 days
The SRAT may contain multiple distinct entries that together describe a
contiguous region of physical memory. In this case we were not
coalescing the corresponding entries in the memory affinity table, which
led to fragmented phys_avail[] entries. Since r338431 the vm_phys_segs[]
entries derived from phys_avail[] will be coalesced, resulting in a
situation where vm_phys_segs[] entries do not have a covering
phys_avail[] entry. vm_page_startup() will not add such segments to the
physical memory allocator, leaving them unused.
Reported by: Don Morris <dgmorris@earthlink.net>
Reviewed by: kib, vangyzen
MFC after: 2 weeks
Sponsored by: The FreeBSD Foundation
Differential Revision: https://reviews.freebsd.org/D27620
This ensures that no writes are pending in memory, either metadata or
user data, but not including dirty pages not yet converted to fs writes.
Only filesystems declared local are suspended.
Note that this does not guarantee absence of the metadata errors or
leaks if resume is not done: for instance, on UFS unlinked but opened
inodes are leaked and require fsck to gc.
Reviewed by: markj
Discussed with: imp
Tested by: imp (previous version), pho
Sponsored by: The FreeBSD Foundation
MFC after: 2 weeks
Differential revision: https://reviews.freebsd.org/D27054
Matching table format is compatible with ACPI_ID_PROBE bus method.
Note that while ACPI_ID_PROBE matches against _HID and all _CIDs, current
acpi_pnpinfo_str() exports only _HID and first _CID. That means second
and further _CIDs should be added to both acpi_pnpinfo_str() and
ACPICOMPAT_PNP_INFO if device matching against them is required.
Reviewed by: imp
MFC after: 2 weeks
Differential Revision: https://reviews.freebsd.org/D26824
- Use ACPI style for _DSM evaluation helper parameter types.
- Constify UUID parameter.
- Increase size of returned DSM function bitmap by acpi_DSMQuery() up to 64
items. Old limit of 8 functions is not sufficient for JEDEC JESD245 NVDIMMs.
- Add new acpi_EvaluateDSMTyped() helper which performs additional return
value type check as compared with acpi_EvaluateDSM().
- Reimplement acpi_EvaluateDSM() on top of the acpi_EvaluateDSMTyped() call.
Reviewed by: scottph, manu
Differential Revision: https://reviews.freebsd.org/D26602
Some DSDT entries have multiple interrupts for one device.
Add support for it.
This fixes ahci on NXP LS2160 and genet on RPi4
Submitted by: Greg V <greg@unrelenting.technology>
Reviewed by: jhb
MFC after: 2 weeks
Differential Revision: https://reviews.freebsd.org/D25145
This is mostly needed for a common arm64/amd64 iommu code.
Reviewed by: kib
Sponsored by: DARPA, AFRL
Differential Revision: https://reviews.freebsd.org/D26587
This function isn't ACPI dependent and we may use it on FDT systems
as well.
o Don't repeat the function declaration, include iommu.h instead.
Reviewed by: andrew, kib
Sponsored by: DARPA, AFRL
Differential Revision: https://reviews.freebsd.org/D26584
APEI allows platform to report different kinds of errors to OS in several
ways. We've found that Supermicro X10/X11 motherboards report PCIe errors
appearing on hot-unplug via this interface using NMI. Without respective
driver it ended up in kernel panic without any additional information.
This driver introduces support for the APEI Generic Hardware Error Source
reporting via NMI, SCI or polling. It decodes the reported errors and
either pass them to pci(4) for processing or just logs otherwise. Errors
marked as fatal still end up in kernel panic, but some more informative.
When somebody get to native PCIe AER support implementation both of the
reporting mechanisms should get common error recovery code. Since in our
case errors happen when the device is already gone, there is nothing to
recover, so the code just clears the error statuses, practically ignoring
the otherwise destructive NMIs in nicer way.
MFC after: 2 weeks
Relnotes: yes
Sponsored by: iXsystems, Inc.
This new function allows us to find the SMMU instance assigned
for a particular PCI RID.
Reviewed by: andrew
Sponsored by: DARPA, AFRL
Differential Revision: https://reviews.freebsd.org/D25687
The only thing this tunable enables now is reporting to ACPI _OSC that
Active State Power Management and Clock Power Management Capability are
"supported" by the OS.
I've found that at least some Supermicro server boards do not allow OS
to support native PCIe hot-plug unless it reports those capabilities.
After spending significant time in PCIe specs I have found very little
motivation for that, and none of it applies to those motherboards, not
enabling ASPM themselves. So unless OS explicitly wants to save power,
I see nothing for it to do there actually.
I guess it may get sense to support ASPM when we get Thunderbolt support.
Otherwise I have no system with PCIe hot-plug where power saving matters.
It would be nice to enable this by default, but I worry that it affect
power saving of some laptops, even though I haven't noticed that myself.
It seems that second call does not add any useful state change for all
implemented timecounters.
Discussed with: bde
Sponsored by: The FreeBSD Foundation
MFC after: 3 weeks
Some laptops don't send ACPI "lid status changed" notifications upon
opening the lid if the system was currently suspended. In r358219
this was partially fixed, updating the "lid_status" variable upon
resume even if there is no "status changed" notification from ACPI.
Unfortunately the fix in r358219 did not include notifying userland
via devd; this causes problems on systems using upowerd (e.g. KDE),
since upowerd remembers the most recent devd notification about the
lid status rather than querying the sysctl to get the current status.
This showed up as two symptoms when KDE's "When laptop lid closed: Sleep"
option is set:
1. 50% of the time, closing the lid would not trigger S3 sleep.
2. 50% of the time, plugging/unplugging AC power would trigger S3 sleep.
PR: 246477
MFC after: 3 days
This function is responsible for setting pc_domain in each pcpu
structure. Call it from the main function that starts APs, rather than
a separate SYSINIT. This makes it easier to close the window where
UMA's per-CPU slab allocator may be called while pc_domain is
uninitialized. In particular, the allocator uses pc_domain to allocate
domain-local pages, so allocations before this point end up using domain
0 for everything.
Reviewed by: kib
MFC after: 1 week
Sponsored by: The FreeBSD Foundation
Differential Revision: https://reviews.freebsd.org/D24757
Only _BCL and _BCM methods seem to be essential to the driver's
operation. If _BQC is missing then we can assume that the current
brightness is whatever we set by the last _BCM invocation. If _DCS or
_DGS is missing the we can make assumptions as well.
The change is based on a patch suggested by Anthony Jenkins
<Scoobi_doo@yahoo.com> in PR 207086.
PR: 207086
Submitted by: Anthony Jenkins <Scoobi_doo@yahoo.com (earlier version)
Reviewed by: manu
MFC after: 3 weeks
Differential Revision: https://reviews.freebsd.org/D24653