that are being done by the OS.
For now this'll match up with the "wakeups"; although I'll dig deeper into
this to see if we can determine which sleep state the CPU managed to get
into. Most things I've seen these days only expose up to C2 or C3 via
ACPI even though the CPU goes all the way down to C6 or C7.
I/O windows, the default is to preserve the firmware-assigned resources.
PCI bus numbers are only managed if NEW_PCIB is enabled and the architecture
defines a PCI_RES_BUS resource type.
- Add a helper API to create top-level PCI bus resource managers for each
PCI domain/segment. Host-PCI bridge drivers use this API to allocate
bus numbers from their associated domain.
- Change the PCI bus and CardBus drivers to allocate a bus resource for
their bus number from the parent PCI bridge device.
- Change the PCI-PCI and PCI-CardBus bridge drivers to allocate the
full range of bus numbers from secbus to subbus from their parent bridge.
The drivers also always program their primary bus register. The bridge
drivers also support growing their bus range by extending the bus resource
and updating subbus to match the larger range.
- Add support for managing PCI bus resources to the Host-PCI bridge drivers
used for amd64 and i386 (acpi_pcib, mptable_pcib, legacy_pcib, and qpi_pcib).
- Define a PCI_RES_BUS resource type for amd64 and i386.
Reviewed by: imp
MFC after: 1 month
the memory ranges that they decode for downstream devices rather than
creating ResourceProducer range resource entries. The result is that
we allocate the full range to the PCI root bridge device causing
allocations in child devices to all fail.
As a workaround, ignore any standard memory resources on a PCI root
bridge device. It is normal for a PCI root bridge to allocate an I/O
resource for the I/O ports used for PCI config access, but I have not
seen any PCI root bridges that legitimately allocate a memory resource.
Reviewed by: jkim
MFC after: 1 week
shifts into the sign bit. Instead use (1U << 31) which gets the
expected result.
This fix is not ideal as it assumes a 32 bit int, but does fix the issue
for most cases.
A similar change was made in OpenBSD.
Discussed with: -arch, rdivacky
Reviewed by: cperciva
resist easy conversion since they implement a great deal of their attach
logic inside probe(). Some of this could be fixed by moving it to attach(),
but some requires something more subtle than BUS_PROBE_NOWILDCARD.
1.3 of Intelб╝ Virtualization Technology for Directed I/O Architecture
Specification. The Extended Context and PASIDs from the rev. 2.2 are
not supported, but I am not aware of any released hardware which
implements them. Code does not use queued invalidation, see comments
for the reason, and does not provide interrupt remapping services.
Code implements the management of the guest address space per domain
and allows to establish and tear down arbitrary mappings, but not
partial unmapping. The superpages are created as needed, but not
promoted. Faults are recorded, fault records could be obtained
programmatically, and printed on the console.
Implement the busdma(9) using DMARs. This busdma backend avoids
bouncing and provides security against misbehaving hardware and driver
bad programming, preventing leaks and corruption of the memory by wild
DMA accesses.
By default, the implementation is compiled into amd64 GENERIC kernel
but disabled; to enable, set hw.dmar.enable=1 loader tunable. Code is
written to work on i386, but testing there was low priority, and
driver is not enabled in GENERIC. Even with the DMAR turned on,
individual devices could be directed to use the bounce busdma with the
hw.busdma.pci<domain>:<bus>:<device>:<function>.bounce=1 tunable. If
DMARs are capable of the pass-through translations, it is used,
otherwise, an identity-mapping page table is constructed.
The driver was tested on Xeon 5400/5500 chipset legacy machine,
Haswell desktop and E5 SandyBridge dual-socket boxes, with ahci(4),
ata(4), bce(4), ehci(4), mfi(4), uhci(4), xhci(4) devices. It also
works with em(4) and igb(4), but there some fixes are needed for
drivers, which are not committed yet. Intel GPUs do not work with
DMAR (yet).
Many thanks to John Baldwin, who explained me the newbus integration;
Peter Holm, who did all testing and helped me to discover and
understand several incredible bugs; and to Jim Harris for the access
to the EDS and BWG and for listening when I have to explain my
findings to somebody.
Sponsored by: The FreeBSD Foundation
MFC after: 1 month
Xen PVHVM guest.
Submitted by: Roger Pau Monné
Sponsored by: Citrix Systems R&D
Reviewed by: gibbs
Approved by: re (blanket Xen)
MFC after: 2 weeks
sys/amd64/amd64/mp_machdep.c:
sys/i386/i386/mp_machdep.c:
- Make sure that are no MMU related IPIs pending on migration.
- Reset pending IPI_BITMAP on resume.
- Init vcpu_info on resume.
sys/amd64/include/intr_machdep.h:
sys/i386/include/intr_machdep.h:
sys/x86/acpica/acpi_wakeup.c:
sys/x86/x86/intr_machdep.c:
sys/x86/isa/atpic.c:
sys/x86/x86/io_apic.c:
sys/x86/x86/local_apic.c:
- Add a "suspend_cancelled" parameter to pic_resume(). For the
Xen PIC, restoration of interrupt services differs between
the aborted suspend and normal resume cases, so we must provide
this information.
sys/dev/acpica/acpi_timer.c:
sys/dev/xen/timer/timer.c:
sys/timetc.h:
- Don't swap out "suspend safe" timers across a suspend/resume
cycle. This includes the Xen PV and ACPI timers.
sys/dev/xen/control/control.c:
- Perform proper suspend/resume process for PVHVM:
- Suspend all APs before going into suspension, this allows us
to reset the vcpu_info on resume for each AP.
- Reset shared info page and callback on resume.
sys/dev/xen/timer/timer.c:
- Implement suspend/resume support for the PV timer. Since FreeBSD
doesn't perform a per-cpu resume of the timer, we need to call
smp_rendezvous in order to correctly resume the timer on each CPU.
sys/dev/xen/xenpci/xenpci.c:
- Don't reset the PCI interrupt on each suspend/resume.
sys/kern/subr_smp.c:
- When suspending a PVHVM domain make sure there are no MMU IPIs
in-flight, or we will get a lockup on resume due to the fact that
pending event channels are not carried over on migration.
- Implement a generic version of restart_cpus that can be used by
suspended and stopped cpus.
sys/x86/xen/hvm.c:
- Implement resume support for the hypercall page and shared info.
- Clear vcpu_info so it can be reset by APs when resuming from
suspension.
sys/dev/xen/xenpci/xenpci.c:
sys/x86/xen/hvm.c:
sys/x86/xen/xen_intr.c:
- Support UP kernel configurations.
sys/x86/xen/xen_intr.c:
- Properly rebind per-cpus VIRQs and IPIs on resume.
A warning is emitted again if the temperature became briefly valid
meanwhile. This avoids spamming the user when the sensor is broken.
Other values (ie. not _TMP) always raise a warning.
settings for ACPI-enumerated serial ports by forcing any IRQs that use
an ISA IRQ value with these settings to active-high instead of active-low.
This is known to occur with the BIOS on an Intel D2500CCE motherboard.
Tested by: Robert Ames <robertames@hotmail.com>, lev
Submitted by: Juergen Weiss weiss at uni-mainz.de (original patch)
we are probing a PCI-PCI bridge it is because we found one by enumerating
the devices on a PCI bus, so the bridge is definitely present. A few
BIOSes report incorrect status (_STA) for some bridges that claimed they
were not present when in fact they were.
While here, move this check earlier for Host-PCI bridges so attach fails
before doing any work that needs to be torn down.
PR: kern/91594
Tested by: Jack Vogel @ Intel
MFC after: 1 week
that uses non-ISA IRQs but use a plain IRQ resource in _CRS. However,
a non-ISA IRQ can't fit into a plain IRQ resource. If we encounter a
link like this, build the resource buffer from _PRS instead of _CRS.
- Set the correct size of the end tag in a resource buffer.
Tested by: Benjamin Lee <ben@b1c1l1.com>
MFC after: 2 weeks
Switch eventtimers(9) from using struct bintime to sbintime_t.
Even before this not a single driver really supported full dynamic range of
struct bintime even in theory, not speaking about practical inexpediency.
This change legitimates the status quo and cleans up the code.
When CPU becomes idle, cpu_idleclock() calculates time to the next timer
event in order to reprogram hw timer. Return that time in sbintime_t to
the caller and pass it to acpi_cpu_idle(), where it can be used as one
more factor (quite precise) to extimate furter sleep time and choose
optimal sleep state. This is a preparatory change for further callout
improvements will be committed in the next days.
The commmit is not targeted for MFC.
This hack is picked up from Linux, which claims that it follows
Windows behavior.
PR: amd64/174409
Tested by: Sergey V. Dyatko <sergey.dyatko@gmail.com>,
KAHO Toshikazu <kaho@elam.kais.kyoto-u.ac.jp>,
Slawa Olhovchenkov <slw@zxy.spb.ru>
MFC after: 13 days
ASUS P8Z77-V board reports _AC2, _AC3 and _AC4 setpoints as 0C. With active
cooling already automatically set to _AC2, that still caused driver to print
two useless lines about temperature above _AC3 and _AC4 every ten seconds.
Three setponts of 0C is probably a board bug, but the same spam could happen
also in correct case if system is runnign not with the lowest cooling level.
... to avoid any races or inconsistencies.
This should fix a regression introduced in r243404.
Also, remove a stale comment that has not been true for quite a while
now.
Pointyhat to: avg
Teested by: trociny, emaste, dumbbell (earlier version)
MFC after: 1 week
... instead of the ever increasing ones.
Also, do free old resources when allocating new ones when cx states
change.
Tested by: Tom Lislegaard <Tom.Lislegaard@proact.no>
Obtained from: jkim
MFC after: 1 week
This alleviates issues on newer Sandy/Ivy Bridge gear that seems to require
boatloads more ACPI resources than before.
Reviewed by: avg@
Obtained from: Yahoo! Inc.
MFC after: 2 weeks
parent adapter's _DOD list, only check the low 16 bits of both _ADR and
_DOD. The language in the ACPI spec seems to indicate that the _ADR values
should exactly match the entries in _DOD. However, I assume that the
masking added to _DOD values was added to work around some known busted
machines (the commit history doesn't indicate either way), and the ACPI
spec does require that the low 16 bits are unique for all video outputs,
so only check the low 16 bits should be fine.
This fixes recognition of video outputs that use the new standardized
device ID scheme in ACPI 3.0 that set bit 31 such as certain Dell laptops.
Tested by: Juergen Lock nox jelal kn-bremen de
MFC after: 3 days
(1022) in HPET. But according to report they still haven't fixed problem
with level-triggered interrupts.
Make workaround used for earlier chipsets apply to this new ID also.
PR: amd64/171355
MFC after: 3 days
For C1 and C2 states use cpu_ticks() to measure sleep time instead of much
slower ACPI timer. We can't do it for C3, as TSC may stop there. But it is
less important there as wake up latency is high any way.
For C1 and C2 states do not check/clear bus mastering activity status, as
it is important only for C3. As side effect it can make CPU enter C2 instead
of C3 if last BM activity was two sleeps back (unlike one before), but
that may be even good because of collecting more statistics. Premature BM
wakeup from C3, entered because of overestimation, can easily be worse then
entering C2 from both performance and power consumption points of view.
Together on dual Xeon E5645 system on sequential 512 bytes read test this
change makes cpu_idle_acpi() as fast as simplest cpu_idle_hlt() and only
few percents slower then cpu_idle_mwait(), while deeper states are still
actively used during idle periods.
To help with diagnostics, add C-state type into dev.cpu.X.cx_supported.
Sponsored by: iXsystems, Inc.
... from a user-set persistent limit on the said level.
Allow to set the user-imposed limit below current deepest available level
as the available levels may be dynamically changed by ACPI platform
in both directions.
Allow "Cmax" as an input value for cx_lowest sysctls to mean that there
is not limit and OS can use all available C-states.
Retire global cpu_cx_count as it no longer serves any meaningful
purpose.
Reviewed by: jhb, gianni, sbruno
Tested by: sbruno, Vitaly Magerya <vmagerya@gmail.com>
MFC after: 2 weeks
although by default only C1 is enabled (cx_lowest=0) and enabling deeper
states goes through acpi_cpu_set_cx_lowest which re-evaluates cpu_non_c3
MFC after: 2 weeks
cpu_non_c3 is already evaluated in acpi_cpu_cx_cst and in
acpi_cpu_set_cx_lowest.
Besides acpi_cpu_cx_list is not protected by any locking.
As a result also move setting of cpu_can_deep_sleep to more appropriate
places.
MFC after: 2 weeks
Adjust power_profile script to handle the new world order as well.
Some vendors are opting out of a C2 state and only defining C1 & C3. This
leads the acpi_cpu display to indicate that the machine supports C1 & C2
which is caused by the (mis)use of the index of the cx_state array as the
ACPI_STATE_CX value.
e.g. the code was pretending that cx_state[i] would
always convert to i by subtracting 1.
cx_state[2] == ACPI_STATE_C3
cx_state[1] == ACPI_STATE_C2
cx_state[0] == ACPI_STATE_C1
however, on certain machines this would lead to
cx_state[1] == ACPI_STATE_C3
cx_state[0] == ACPI_STATE_C1
This didn't break anything but led to a display of:
* dev.cpu.0.cx_supported: C1/1 C2/96
Instead of
* dev.cpu.0.cx_supported: C1/1 C3/96
MFC after: 2 weeks
suspend/resume procedures are minimized among them.
common:
- Add global cpuset suspended_cpus to indicate APs are suspended/resumed.
- Remove acpi_waketag and acpi_wakemap from acpivar.h (no longer used).
- Add some variables in acpi_wakecode.S in order to minimize the difference
among amd64 and i386.
- Disable load_cr3() because now CR3 is restored in resumectx().
amd64:
- Add suspend/resume related members (such as MSR) in PCB.
- Modify savectx() for above new PCB members.
- Merge acpi_switch.S into cpu_switch.S as resumectx().
i386:
- Merge(and remove) suspendctx() into savectx() in order to match with
amd64 code.
Reviewed by: attilio@, acpi@
(described in ACPICA source code).
- Move intr_disable() and intr_restore() from acpi_wakeup.c to acpi.c
and call AcpiLeaveSleepStatePrep() in interrupt disabled context.
- Add acpi_wakeup_machdep() to execute wakeup MD procedures and call
it twice in interrupt disabled/enabled context (ia64 version is
just dummy).
- Rename wakeup_cpus variable in acpi_sleep_machdep() to suspcpus in
order to be shared by acpi_sleep_machdep() and acpi_wakeup_machdep().
- Move identity mapping related code to acpi_install_wakeup_handler()
(i386 version) for preparation of x86/acpica/acpi_wakeup.c
(MFC candidate).
Reviewed by: jkim@
MFC after: 2 days
DEVICE_RESUME() should be done before AcpiLeaveSleepState() because
PCI config space evaluation can be occurred during control method
executions.
This should fix one of the hang up problems on resuming.
MFC after: 3 days
processor objects. Instead of forcing the new-bus CPU objects to use
a unit number equal to pc_cpuid, adjust acpi_pcpu_get_id() to honor the
MADT IDs by default. As with the previous change, setting
debug.acpi.cpu_unordered to 1 in the loader will revert to the old
behavior.
Tested by: jimharris
MFC after: 1 month
disabled or the system is in shutdown procedure.
This should fix the problem which kernel never response to the sleep
button press events after the message `suspend request ignored (not
ready yet)'.
MFC after: 3 days
Use MADT to match ACPI Processor objects to CPUs. MADT and DSDT/SSDTs may
list CPUs in different orders, especially for disabled logical cores. Now
we match ACPI IDs from the MADT with Processor objects, strictly order CPUs
accordingly, and ignore disabled cores. This prevents us from executing
methods for other CPUs, e. g., _PSS for disabled logical core, which may not
exist. Unfortunately, it is known that there are a few systems with buggy
BIOSes that do not have unique ACPI IDs for MADT and Processor objects. To
work around these problems, 'debug.acpi.cpu_unordered' tunable is added.
Set this to a non-zero value to restore the old behavior.
Many thanks to jhb for pointing me to the right direction and the manual
page change.
Reported by: Harris, James R (james dot r dot harris at intel dot com)
Tested by: Harris, James R (james dot r dot harris at intel dot com)
Reviewed by: jhb
MFC after: 1 month
list CPUs in different orders, especially for disabled logical cores. Now
we match ACPI IDs from the MADT with Processor objects, strictly order CPUs
accordingly, and ignore disabled cores. This prevents us from executing
methods for other CPUs, e. g., _PSS for disabled logical core, which may not
exist. Unfortunately, it is known that there are a few systems with buggy
BIOSes that do not have unique ACPI IDs for MADT and Processor objects. To
work around these problems
bridges. Rather than blindly enabling the windows on all of them, only
enable the window when an MSI interrupt is enabled for a device behind
the bridge, similar to what already happens for HT PCI-PCI bridges.
To implement this, each x86 Host-PCI bridge driver has to be able to
locate it's actual backing device on bus 0. For ACPI, use the _ADR
method to find the slot and function of the device. For the non-ACPI
case, the legacy(4) driver already scans bus 0 looking for Host-PCI
bridge devices. Now it saves the slot and function of each bridge that
it finds as ivars that the Host-PCI bridge driver can then use in its
pcib_map_msi() method.
This fixes machines where non-MSI interrupts were broken by the previous
round of HT MSI changes.
Tested by: bapt
MFC after: 1 week
Lower (ISA) IRQs are working, but allowed mask is not set correctly.
Block both by default to allow HP BL465c G6 blade system to boot.
Reported by: Attila Nagy <bra@fsn.hu>
MFC after: 1 week
The tag enforces a single restriction that all DMA transactions must not
cross a 4GB boundary. Note that while this restriction technically only
applies to PCI-express, this change applies it to all PCI devices as it
is simpler to implement that way and errs on the side of caution.
- Add a softc structure for PCI bus devices to hold the bus_dma tag and
a new pci_attach_common() routine that performs actions common to the
attach phase of all PCI bus drivers. Right now this only consists of
a bootverbose printf and the allocate of a bus_dma tag if necessary.
- Adjust all PCI bus drivers to allocate a PCI bus softc and to call
pci_attach_common() from their attach routines.
MFC after: 2 weeks
- Increase probing order for ECDT table to match HID-based probing.
- Decrease probing order for HPET table to match HID-based probing.
- Decrease probing order for CPUs and system resources.
- Fix ACPI_DEV_BASE_ORDER to reflect the reality.
decoded ranges. Pass any request for a specific range that fails because
it is not in a decoded range for an ACPI Host-PCI bridge up to the parent
to see if it can still be allocated. This is based on the assumption that
many BIOSes are inconsistent/broken and that settings programmed into BARs
or resources assigned to other built-in components are more trustworthy than
the list of decoded resource ranges in _CRS. This effectively limits the
decoded ranges to only being used for "wildcard" ranges when allocating
fresh resources for a BAR, etc. At some point I would like to only be
this permissive during an early scan of firmware-assigned resources during
boot and to be strict about all later allocations, but that isn't viable
currently.
MFC after: 2 weeks
one. Interestingly, these are actually the default for quite some time
(bus_generic_driver_added(9) since r52045 and bus_generic_print_child(9)
since r52045) but even recently added device drivers do this unnecessarily.
Discussed with: jhb, marcel
- While at it, use DEVMETHOD_END.
Discussed with: jhb
- Also while at it, use __FBSDID.
The SYSCTL_NODE macro defines a list that stores all child-elements of
that node. If there's no SYSCTL_DECL macro anywhere else, there's no
reason why it shouldn't be static.
a decoded range for an ACPI Host-PCI bridge, try to allocate it from the
ACPI system resource range. If that works, permit the resource allocation
regardless.
MFC after: 1 week
avoid lost timer interrupts. Previous optimization attempt doing it only
for intervals less then 5000 ticks (~300us) reported to be unreliable by
some people. Probably because of some heavy SMI code on their boards.
Introduce additional safety interval of 128 counter ticks (~9us) between
programmed comparator and counter values to cover different cases of
delayed write found on some chipsets.
Approved by: re (kib)
the resource covers the entire range. Some BIOSes appear to mark
endpoints as non-fixed incorrectly (non-fixed endpoints are supposed to
be used in _PRS when OSPM is allowed to allocate a certain chunk of
address space within a larger range, I don't believe it is supposed to be
used for _CRS).
Approved by: re (kib)
resource allocation on x86 platforms:
- Add a new helper API that Host-PCI bridge drivers can use to restrict
resource allocation requests to a set of address ranges for different
resource types.
- For the ACPI Host-PCI bridge driver, use Producer address range resources
in _CRS to enumerate valid address ranges for a given Host-PCI bridge.
This can be disabled by including "hostres" in the debug.acpi.disabled
tunable.
- For the MPTable Host-PCI bridge driver, use entries in the extended
MPTable to determine the valid address ranges for a given Host-PCI
bridge. This required adding code to parse extended table entries.
Similar to the new PCI-PCI bridge driver, these changes are only enabled
if the NEW_PCIB kernel option is enabled (which is enabled by default on
amd64 and i386).
Approved by: re (kib)