Adjust power_profile script to handle the new world order as well.
Some vendors are opting out of a C2 state and only defining C1 & C3. This
leads the acpi_cpu display to indicate that the machine supports C1 & C2
which is caused by the (mis)use of the index of the cx_state array as the
ACPI_STATE_CX value.
e.g. the code was pretending that cx_state[i] would
always convert to i by subtracting 1.
cx_state[2] == ACPI_STATE_C3
cx_state[1] == ACPI_STATE_C2
cx_state[0] == ACPI_STATE_C1
however, on certain machines this would lead to
cx_state[1] == ACPI_STATE_C3
cx_state[0] == ACPI_STATE_C1
This didn't break anything but led to a display of:
* dev.cpu.0.cx_supported: C1/1 C2/96
Instead of
* dev.cpu.0.cx_supported: C1/1 C3/96
MFC after: 2 weeks
suspend/resume procedures are minimized among them.
common:
- Add global cpuset suspended_cpus to indicate APs are suspended/resumed.
- Remove acpi_waketag and acpi_wakemap from acpivar.h (no longer used).
- Add some variables in acpi_wakecode.S in order to minimize the difference
among amd64 and i386.
- Disable load_cr3() because now CR3 is restored in resumectx().
amd64:
- Add suspend/resume related members (such as MSR) in PCB.
- Modify savectx() for above new PCB members.
- Merge acpi_switch.S into cpu_switch.S as resumectx().
i386:
- Merge(and remove) suspendctx() into savectx() in order to match with
amd64 code.
Reviewed by: attilio@, acpi@
(described in ACPICA source code).
- Move intr_disable() and intr_restore() from acpi_wakeup.c to acpi.c
and call AcpiLeaveSleepStatePrep() in interrupt disabled context.
- Add acpi_wakeup_machdep() to execute wakeup MD procedures and call
it twice in interrupt disabled/enabled context (ia64 version is
just dummy).
- Rename wakeup_cpus variable in acpi_sleep_machdep() to suspcpus in
order to be shared by acpi_sleep_machdep() and acpi_wakeup_machdep().
- Move identity mapping related code to acpi_install_wakeup_handler()
(i386 version) for preparation of x86/acpica/acpi_wakeup.c
(MFC candidate).
Reviewed by: jkim@
MFC after: 2 days
DEVICE_RESUME() should be done before AcpiLeaveSleepState() because
PCI config space evaluation can be occurred during control method
executions.
This should fix one of the hang up problems on resuming.
MFC after: 3 days
processor objects. Instead of forcing the new-bus CPU objects to use
a unit number equal to pc_cpuid, adjust acpi_pcpu_get_id() to honor the
MADT IDs by default. As with the previous change, setting
debug.acpi.cpu_unordered to 1 in the loader will revert to the old
behavior.
Tested by: jimharris
MFC after: 1 month
disabled or the system is in shutdown procedure.
This should fix the problem which kernel never response to the sleep
button press events after the message `suspend request ignored (not
ready yet)'.
MFC after: 3 days
Use MADT to match ACPI Processor objects to CPUs. MADT and DSDT/SSDTs may
list CPUs in different orders, especially for disabled logical cores. Now
we match ACPI IDs from the MADT with Processor objects, strictly order CPUs
accordingly, and ignore disabled cores. This prevents us from executing
methods for other CPUs, e. g., _PSS for disabled logical core, which may not
exist. Unfortunately, it is known that there are a few systems with buggy
BIOSes that do not have unique ACPI IDs for MADT and Processor objects. To
work around these problems, 'debug.acpi.cpu_unordered' tunable is added.
Set this to a non-zero value to restore the old behavior.
Many thanks to jhb for pointing me to the right direction and the manual
page change.
Reported by: Harris, James R (james dot r dot harris at intel dot com)
Tested by: Harris, James R (james dot r dot harris at intel dot com)
Reviewed by: jhb
MFC after: 1 month
list CPUs in different orders, especially for disabled logical cores. Now
we match ACPI IDs from the MADT with Processor objects, strictly order CPUs
accordingly, and ignore disabled cores. This prevents us from executing
methods for other CPUs, e. g., _PSS for disabled logical core, which may not
exist. Unfortunately, it is known that there are a few systems with buggy
BIOSes that do not have unique ACPI IDs for MADT and Processor objects. To
work around these problems
bridges. Rather than blindly enabling the windows on all of them, only
enable the window when an MSI interrupt is enabled for a device behind
the bridge, similar to what already happens for HT PCI-PCI bridges.
To implement this, each x86 Host-PCI bridge driver has to be able to
locate it's actual backing device on bus 0. For ACPI, use the _ADR
method to find the slot and function of the device. For the non-ACPI
case, the legacy(4) driver already scans bus 0 looking for Host-PCI
bridge devices. Now it saves the slot and function of each bridge that
it finds as ivars that the Host-PCI bridge driver can then use in its
pcib_map_msi() method.
This fixes machines where non-MSI interrupts were broken by the previous
round of HT MSI changes.
Tested by: bapt
MFC after: 1 week
Lower (ISA) IRQs are working, but allowed mask is not set correctly.
Block both by default to allow HP BL465c G6 blade system to boot.
Reported by: Attila Nagy <bra@fsn.hu>
MFC after: 1 week
The tag enforces a single restriction that all DMA transactions must not
cross a 4GB boundary. Note that while this restriction technically only
applies to PCI-express, this change applies it to all PCI devices as it
is simpler to implement that way and errs on the side of caution.
- Add a softc structure for PCI bus devices to hold the bus_dma tag and
a new pci_attach_common() routine that performs actions common to the
attach phase of all PCI bus drivers. Right now this only consists of
a bootverbose printf and the allocate of a bus_dma tag if necessary.
- Adjust all PCI bus drivers to allocate a PCI bus softc and to call
pci_attach_common() from their attach routines.
MFC after: 2 weeks
- Increase probing order for ECDT table to match HID-based probing.
- Decrease probing order for HPET table to match HID-based probing.
- Decrease probing order for CPUs and system resources.
- Fix ACPI_DEV_BASE_ORDER to reflect the reality.
decoded ranges. Pass any request for a specific range that fails because
it is not in a decoded range for an ACPI Host-PCI bridge up to the parent
to see if it can still be allocated. This is based on the assumption that
many BIOSes are inconsistent/broken and that settings programmed into BARs
or resources assigned to other built-in components are more trustworthy than
the list of decoded resource ranges in _CRS. This effectively limits the
decoded ranges to only being used for "wildcard" ranges when allocating
fresh resources for a BAR, etc. At some point I would like to only be
this permissive during an early scan of firmware-assigned resources during
boot and to be strict about all later allocations, but that isn't viable
currently.
MFC after: 2 weeks
one. Interestingly, these are actually the default for quite some time
(bus_generic_driver_added(9) since r52045 and bus_generic_print_child(9)
since r52045) but even recently added device drivers do this unnecessarily.
Discussed with: jhb, marcel
- While at it, use DEVMETHOD_END.
Discussed with: jhb
- Also while at it, use __FBSDID.
The SYSCTL_NODE macro defines a list that stores all child-elements of
that node. If there's no SYSCTL_DECL macro anywhere else, there's no
reason why it shouldn't be static.
a decoded range for an ACPI Host-PCI bridge, try to allocate it from the
ACPI system resource range. If that works, permit the resource allocation
regardless.
MFC after: 1 week
avoid lost timer interrupts. Previous optimization attempt doing it only
for intervals less then 5000 ticks (~300us) reported to be unreliable by
some people. Probably because of some heavy SMI code on their boards.
Introduce additional safety interval of 128 counter ticks (~9us) between
programmed comparator and counter values to cover different cases of
delayed write found on some chipsets.
Approved by: re (kib)
the resource covers the entire range. Some BIOSes appear to mark
endpoints as non-fixed incorrectly (non-fixed endpoints are supposed to
be used in _PRS when OSPM is allowed to allocate a certain chunk of
address space within a larger range, I don't believe it is supposed to be
used for _CRS).
Approved by: re (kib)
resource allocation on x86 platforms:
- Add a new helper API that Host-PCI bridge drivers can use to restrict
resource allocation requests to a set of address ranges for different
resource types.
- For the ACPI Host-PCI bridge driver, use Producer address range resources
in _CRS to enumerate valid address ranges for a given Host-PCI bridge.
This can be disabled by including "hostres" in the debug.acpi.disabled
tunable.
- For the MPTable Host-PCI bridge driver, use entries in the extended
MPTable to determine the valid address ranges for a given Host-PCI
bridge. This required adding code to parse extended table entries.
Similar to the new PCI-PCI bridge driver, these changes are only enabled
if the NEW_PCIB kernel option is enabled (which is enabled by default on
amd64 and i386).
Approved by: re (kib)
processors unless the invariant TSC bit of CPUID is set. Intel processors
may stop incrementing TSC when DPSLP# pin is asserted, according to Intel
processor manuals, i. e., TSC timecounter is useless if the processor can
enter deep sleep state (C3/C4). This problem was accidentally uncovered by
r222869, which increased timecounter quality of P-state invariant TSC, e.g.,
for Core2 Duo T5870 (Family 6, Model f) and Atom N270 (Family 6, Model 1c).
Reported by: Fabian Keil (freebsd-listen at fabiankeil dot de)
Ian FREISLICH (ianf at clue dot co dot za)
Tested by: Fabian Keil (freebsd-listen at fabiankeil dot de)
- Core2 Duo T5870 (C3 state available/enabled)
jkim - Xeon X5150 (C3 state unavailable)
resource allocation from an x86 Host-PCI bridge driver so that it can be
reused by the ACPI Host-PCI bridge driver (and eventually the MPTable
Host-PCI bridge driver) instead of duplicating the same logic. Note that
this means that hw.acpi.host_mem_start is now replaced with the
hw.pci.host_mem_start tunable that was already used in the non-ACPI case.
This also removes hw.acpi.host_mem_start on ia64 where it was not
applicable (the implementation was very x86-specific).
While here, adjust the logic to apply the new start address on any
"wildcard" allocation even if that allocation comes from a subset of
the allowable address range.
Reviewed by: imp (1)
ACPI Device() objects that do not have any device IDs available via the
_HID or _CID methods. Without a device ID a device driver cannot attach
to the device anyway. Namespace objects that are devices but not of
type ACPI_TYPE_DEVICE are not affected.
A few BIOSes have also attached a _CRS method to a PCI device to
allocate resources that are not managed via a BAR. With the previous
code those resources are allocated from acpi0 directly which can interfere
with the new PCI-PCI bridge driver (since the PCI device in question may
be behind a bridge and its resources should be allocated from that
bridge's windows instead). The resources were also orphaned and
and would end up associated with some other random device whose device_t
reused the pointer of the original ACPI-enumerated device (after it was
free'd by the ACPI PCI bus driver) in devinfo output which was confusing.
If we want to handle _CRS on PCI devices we can adjust the ACPI PCI bus
driver to do that in the future and associate the resources with the
proper device object respecting PCI-PCI bridges, etc.
Note that with this change the ACPI PCI bus driver no longer has to
delete ACPI-enumerated device_t devices that mirror PCI devices since
they should in general not exist. There are rare cases when a BIOS
will give a PCI device a _HID (e.g. I've seen a PCI-ISA bridge given
a _HID for a system resource device). In that case we leave both the
ACPI and PCI-enumerated device_t objects around just as in the previous
code.
quality to 950. HPET on modern platforms usually have better resolution and
lower latency than ACPI timer. Effectively this changes default timecounter
hardware from ACPI-fast to HPET by default when both are available.
Discussed with: avg
driver would verify that requests for child devices were confined to any
existing I/O windows, but the driver relied on the firmware to initialize
the windows and would never grow the windows for new requests. Now the
driver actively manages the I/O windows.
This is implemented by allocating a bus resource for each I/O window from
the parent PCI bus and suballocating that resource to child devices. The
suballocations are managed by creating an rman for each I/O window. The
suballocated resources are mapped by passing the bus_activate_resource()
call up to the parent PCI bus. Windows are grown when needed by using
bus_adjust_resource() to adjust the resource allocated from the parent PCI
bus. If the adjust request succeeds, the window is adjusted and the
suballocation request for the child device is retried.
When growing a window, the rman_first_free_region() and
rman_last_free_region() routines are used to determine if the front or
end of the existing I/O window is free. From using that, the smallest
ranges that need to be added to either the front or back of the window
are computed. The driver will first try to grow the window in whichever
direction requires the smallest growth first followed by the other
direction if that fails.
Subtractive bridges will first attempt to satisfy requests for child
resources from I/O windows (including attempts to grow the windows). If
that fails, the request is passed up to the parent PCI bus directly
however.
The PCI-PCI bridge driver will try to use firmware-assigned ranges for
child BARs first and only allocate a "fresh" range if that specific range
cannot be accommodated in the I/O window. This allows systems where the
firmware assigns resources during boot but later wipes the I/O windows
(some ACPI BIOSen are known to do this) to "rediscover" the original I/O
window ranges.
The ACPI Host-PCI bridge driver has been adjusted to correctly honor
hw.acpi.host_mem_start and the I/O port equivalent when a PCI-PCI bridge
makes a wildcard request for an I/O window range.
The new PCI-PCI bridge driver is only enabled if the NEW_PCIB kernel option
is enabled. This is a transition aide to allow platforms that do not
yet support bus_activate_resource() and bus_adjust_resource() in their
Host-PCI bridge drivers (and possibly other drivers as needed) to use the
old driver for now. Once all platforms support the new driver, the
kernel option and old driver will be removed.
PR: kern/143874 kern/149306
Tested by: mav
safer for i386 because it can be easily over 4 GHz now. More worse, it can
be easily changed by user with 'machdep.tsc_freq' tunable (directly) or
cpufreq(4) (indirectly). Note it is intentionally not used in performance
critical paths to avoid performance regression (but we should, in theory).
Alternatively, we may add "virtual TSC" with lower frequency if maximum
frequency overflows 32 bits (and ignore possible incoherency as we do now).
show that there are perfectly working PM timers with occasional "hiccups",
probably because of an SMI. Now we ignore the maximum if it happens once in
the test loop and the width is small enough. Also, relax normal width a bit
to count in a boundary case.
on the fact that real hardware has almost fixed cost to read the ACPI timer.
It is virtually always false for hardware emulation and it makes no sense to
read it multiple times, which is already quite expensive for full emulation.
doesn't "fail", it may merely return garbage if it is not a valid ivar
for a given device. Our parent device must be a 'pcib' device, so we
can just assume it implements pcib IVARs, and all pcib devices have a
bus number.
Submitted by: clang via rdivacky
install or remove non-SCI interrupt handlers per ACPI Component Architecture
User Guide and Programmer Reference. ACPICA may install such interrupt
handler when a GPE block device is found, for example. Add a wrapper for
ACPI_OSD_HANDLER, convert its return values to ours, and make it a filter.
Prefer KASSERT(9) over panic(9) as we have never seen those in reality.
Clean up some style(9) nits and add my copyright.
table is present, then the acpi_ec(4) driver will allocate its resources
from nexus0 before the acpi0 device reserves resources for child devices.
Reviewed by: jkim
This is based on the patch submitted by Yuri Skripachov.
Overview of the changes:
- clarify double-use of some ACPI_BATT_STAT_* definitions
- clean up undefined/extended status bits returned by _BST
- warn about charging+discharging bits being set at the same time
PR: kern/124744
Submitted by: Yuri Skripachov <y.skripachov@gmail.com>
Tested by: Yuri Skripachov <y.skripachov@gmail.com>
MFC after: 2 weeks
- Avoid side-effect assignments in if statements when possible.
- Don't use ! to check for NULL pointers, explicitly check against NULL.
- Explicitly check error return values against 0.
- Don't use INTR_MPSAFE for interrupt handlers with only filters as it is
meaningless.
- Remove unneeded function casts.
function always returned the nominal frequency instead of current frequency
because we use RDTSC instruction to calculate difference in CPU ticks, which
is supposedly constant for the case. Now we support cpu_get_nominal_mhz()
for the case, instead. Note it should be just enough for most usage cases
because cpu_est_clockrate() is often times abused to find maximum frequency
of the processor.
looking to see if there is an existing IRQ resource for a given IRQ
provided by the BIOS and using that RID if so. Otherwise, allocate a new
RID for the new IRQ.
Reviewed by: mav (a while ago)
copied as a template for _SRS, a string pointer for descriptor name is also
copied and it becomes stale as soon as it gets de-allocated[2]. Now _CRS is
used as a template for _SRS as ACPI specification suggests if it is usable.
The template from _PRS is still utilized but only when _CRS is not available
or broken. To avoid use-after-free the problem in this case, however, only
mandatory fields are copied, optional data is removed, and structure length
is adjusted accordingly.
Reported by: hps[1]
Analyzed by: avg[2]
Tested by: hps
'hw.acpi.remove_interface'. hw.acpi.install_interface lets you install new
interfaces. Conversely, hw.acpi.remove_interface lets you remove OS
interfaces from the pre-defined list in ACPICA. For example,
hw.acpi.install_interface="FreeBSD"
lets _OSI("FreeBSD") method to return 0xffffffff (or success) and
hw.acpi.remove_interface="Windows 2009"
lets _OSI("Windows 2009") method to return zero (or failure). Both are
comma-separated lists and leading white spaces are ignored. For example,
the following examples are valid:
hw.acpi.install_interface="Linux, FreeBSD"
hw.acpi.remove_interface="Windows 2006, Windows 2006.1"
added with hw.pci.do_powerstate but the PCI version was splitted into two
separate tunables later and now this is completely stale. To make it worse,
PCI devices enumerated in ACPI tree ignore this tunable as it is handled by
a function in acpi_pci.c instead.
knowledges from the file. All PCI devices enumerated in ACPI tree must use
correct one from acpi_pci.c any way. Reduce duplicate codes as we did for
pci.c in r213905. Do not return ESRCH from PCIB_POWER_FOR_SLEEP method.
When the method is not found, just return zero without modifying the given
default value as it is completely optional. As a side effect, the return
state must not be NULL. Note there is actually no functional change by
removing ESRCH because acpi_pcib_power_for_sleep() always returns zero.
Adjust debugging messages and add new ones under bootverbose to help
debugging device power state related issues.
Reviewed by: jhb, imp (earlier versions)
Short description of the changes:
- attempt to retry some commands for which it is possible (read, query)
- always make a short sleep before checking EC status in polled mode
- periodically poll EC status in interrupt mode
- change logic for detecting broken interrupt delivery and falling back
to polled mode
- check that EC is ready for input before starting a new command, wait
if necessary
This commit is based on the original patch by David Naylor.
PR: kern/150517
Submitted by: David Naylor <naylor.b.david@gmail.com>
Reviewed by: jkim
MFC after: 3 weeks
number of unexplained interrupt problems. For some reason, using HPET
interrupts there breaks HDA sound. Legacy route mode interrupts reported
to work fine there.
address spaces
There has been no need to do that starting with ACPICA 20040427 as
AcpiEnableSubsystem() installs the handlers automatically.
Additionaly, explicitly calling AcpiInstallAddressSpaceHandler before
AcpiEnableSubsystem is not supported by ACPICA and leads to too early
execution of _REG methods in some DSDTs, which may result in problems.
Big thanks to Robert Moore of ACPICA/Intel for explaining the above.
Reported by: Daniel Bilik <daniel.bilik@neosystem.cz>
Tested by: Daniel Bilik <daniel.bilik@neosystem.cz>
Reviewed by: jkim
Suggested by: "Moore, Robert" <robert.moore@intel.com>
MFC after: 1 week
ACPI specification sates that if P_LVL2_LAT > 100, then a system doesn't
support C2; if P_LVL3_LAT > 1000, then C3 is not supported.
But there are no such rules for Cx state data returned by _CST. If a
state is not supported it should not be included into the return
package. In other words, any latency value returned by _CST is valid,
it's up to the OS and/or user to decide whether to use it.
Submitted by: nork
Suggested by: mav
MFC after: 1 week
The main goal of this is to generate timer interrupts only when there is
some work to do. When CPU is busy interrupts are generating at full rate
of hz + stathz to fullfill scheduler and timekeeping requirements. But
when CPU is idle, only minimum set of interrupts (down to 8 interrupts per
second per CPU now), needed to handle scheduled callouts is executed.
This allows significantly increase idle CPU sleep time, increasing effect
of static power-saving technologies. Also it should reduce host CPU load
on virtualized systems, when guest system is idle.
There is set of tunables, also available as writable sysctls, allowing to
control wanted event timer subsystem behavior:
kern.eventtimer.timer - allows to choose event timer hardware to use.
On x86 there is up to 4 different kinds of timers. Depending on whether
chosen timer is per-CPU, behavior of other options slightly differs.
kern.eventtimer.periodic - allows to choose periodic and one-shot
operation mode. In periodic mode, current timer hardware taken as the only
source of time for time events. This mode is quite alike to previous kernel
behavior. One-shot mode instead uses currently selected time counter
hardware to schedule all needed events one by one and program timer to
generate interrupt exactly in specified time. Default value depends of
chosen timer capabilities, but one-shot mode is preferred, until other is
forced by user or hardware.
kern.eventtimer.singlemul - in periodic mode specifies how much times
higher timer frequency should be, to not strictly alias hardclock() and
statclock() events. Default values are 2 and 4, but could be reduced to 1
if extra interrupts are unwanted.
kern.eventtimer.idletick - makes each CPU to receive every timer interrupt
independently of whether they busy or not. By default this options is
disabled. If chosen timer is per-CPU and runs in periodic mode, this option
has no effect - all interrupts are generating.
As soon as this patch modifies cpu_idle() on some platforms, I have also
refactored one on x86. Now it makes use of MONITOR/MWAIT instrunctions
(if supported) under high sleep/wakeup rate, as fast alternative to other
methods. It allows SMP scheduler to wake up sleeping CPUs much faster
without using IPI, significantly increasing performance on some highly
task-switching loads.
Tested by: many (on i386, amd64, sparc64 and powerc)
H/W donated by: Gheorghe Ardelean
Sponsored by: iXsystems, Inc.
This reflects actual type used to store and compare child device orders.
Change is mostly done via a Coccinelle (soon to be devel/coccinelle)
semantic patch.
Verified by LINT+modules kernel builds.
Followup to: r212213
MFC after: 10 days
are still bound to BSP. It confuses timer management logic in per-CPU mode
and may cause timer not being reloaded. Check such cases on interrupt
arival and reload timer to give system some more time to manage proper
binding.
- Add special check for case when time expires before being programmed.
This fixes interrupt loss and respectively timer death on attempt to
program very short interval. Increase minimal supported period to more
realistic value.
- Add support for hint.hpet.X.allowed_irqs tunable, allowing manually
specify which interrupts driver allowed to use. Unluckily, many BIOSes
program wrong allowed interrupts mask, so driver tries to stay on safe
side by not using unshareable ISA IRQs. This option gives control over
this limitation, allowing more per-CPU timers to be provided, when FSB
interrupts are not supported. Value of this tunable is bitmask.
- Do not use regular interrupts on virtual machines. QEMU and VirtualBox
do not support them properly, that may cause problems. Stay safe by default.
Same time both QEMU and VirtualBox work fine in legacy_route mode.
VirtualBox also works fine if manually specify allowed ISA IRQs with above.
method is used by the PCI bus driver to query the power management system
to determine the proper device state to be used for a device during suspend
and resume. For the ACPI PCI bridge drivers this calls
acpi_device_pwr_for_sleep(). This removes ACPI-specific knowledge from
the PCI and PCI-PCI bridge drivers.
Reviewed by: jkim
bus_generic_resume() since the pci_link(4) driver was added.
- Change the ACPI PCI-PCI bridge driver to inherit most of its methods
from the generic PCI-PCI bridge driver. In particular, this will now
restore PCI config registers for ACPI PCI-PCI bridges.
Tested by: Oleg Sharoyko osharoiko of gmail
lengths. Make MI wrapper code to validate periods in request. Make kernel
clock management code to honor these hardware limitations while choosing hz,
stathz and profhz values.
AcpiOsMapMemory()/AcpiOsUnmapMemory() (-> pmap_mapbios()/pmap_unmapbios())
for AcpiOsReadMemory() and AcpiOsWriteMemory(). Although they do not sound
too obvious, these functions are exclusively used to access memory mapped
IO in ACPICA.
According to ACPICA User Guide and Programmer Reference, the read data must
be zero extended to fill the 64-bit return value even if the bit width of
the location is less than 64.
- Return error when 64-bit access is requested as we do not support 64-bit
PCI register access (yet). XXX We may have to split it up into two 32-bit
accesses if it is really required.
According to ACPICA User Guide and Programmer Reference, the read data must
be zero extended to fill the 32-bit return value even if the bit width of
the port is less than 32.
- Remove 64-bit read/write from AcpiOsReadMemory() and AcpiOsWriteMemory().
These functions do not support 64-bit access (yet). Clean up style nits
and unnecessary bit masking while I am here.
Reported by: Liu, Jinsong (jinsong dot liu at intel dot com) via
Lin Ming (ming dot m dot lin at intel dot com) [1]
HPET to steal IRQ0 from i8254 and IRQ8 from RTC timers. It can be suitable
for HPETs without FSB interrupts support, as it gives them two unshared
IRQs. It allows them to provide one per-CPU event timer on dual-CPU system,
that should be suitable for further tickless kernels.
To enable it, such lines may be added to /boot/loader.conf:
hint.atrtc.0.clock=0
hint.attimer.0.clock=0
hint.hpet.0.legacy_route=1
writing event timer drivers, for choosing best possible drivers by machine
independent code and for operating them to supply kernel with hardclock(),
statclock() and profclock() events in unified fashion on various hardware.
Infrastructure provides support for both per-CPU (independent for every CPU
core) and global timers in periodic and one-shot modes. MI management code
at this moment uses only periodic mode, but one-shot mode use planned for
later, as part of tickless kernel project.
For this moment infrastructure used on i386 and amd64 architectures. Other
archs are welcome to follow, while their current operation should not be
affected.
This patch updates existing drivers (i8254, RTC and LAPIC) for the new
order, and adds event timers support into the HPET driver. These drivers
have different capabilities:
LAPIC - per-CPU timer, supports periodic and one-shot operation, may
freeze in C3 state, calibrated on first use, so may be not exactly precise.
HPET - depending on hardware can work as per-CPU or global, supports
periodic and one-shot operation, usually provides several event timers.
i8254 - global, limited to periodic mode, because same hardware used also
as time counter.
RTC - global, supports only periodic mode, set of frequencies in Hz
limited by powers of 2.
Depending on hardware capabilities, drivers preferred in following orders,
either LAPIC, HPETs, i8254, RTC or HPETs, LAPIC, i8254, RTC.
User may explicitly specify wanted timers via loader tunables or sysctls:
kern.eventtimer.timer1 and kern.eventtimer.timer2.
If requested driver is unavailable or unoperational, system will try to
replace it. If no more timers available or "NONE" specified for second,
system will operate using only one timer, multiplying it's frequency by few
times and uing respective dividers to honor hz, stathz and profhz values,
set during initial setup.
measured interval as upper bound. It should be more precise then just
assuming hz/2. For idle CPU it should be quite precise, for busy - not
worse then before.
state lower than the lowest one supported by the current CPU. This closes
some races with changes to the hw.acpi.cpu_cx_lowest sysctl while Cx
states for individual CPUs were changing (e.g. unplugging the AC adapter
of a laptop) that could result in panics.
Submitted by: Giovanni Trematerra
Tested by: David Demelier demelier dot david of gmail
MFC after: 3 days
via %s
Most of the cases looked harmless, but this is done for the sake of
correctness. In one case it even allowed to drop an intermediate buffer.
Found by: clang
MFC after: 2 week
device, make sure we have no real HPET device entry with same ID.
As side effect, it potentially allows several HPETs to be attached.
Use first of them for timecounting, rest (if ever present) could later
be used as event sources.
Setting the new sysctl MIB "debug.acpi.enable_debug_objects" to a non-zero
value enables us to print Debug object when something is written to it.
- Allow users to disable interpreter slack mode. Setting the new tunable
"debug.acpi.interpreter_slack" to zero disables some workarounds for common
BIOS mistakes and enables strict ACPI implementations by the specification.
It is belived that that pass s not needed anymore.
Specifically it is not required now for the reasons that were given
in the removed comment.
Discussed with: jhb
MFC after: 4 weeks
Some current systems dynamically load SSDT(s) when _PDC/_OSC method
of Processor is evaluated. Other devices in ACPI namespace may access
objects defined in the dynamic SSDT. Drivers for such devices might
have to have a rather high priority, because of other dependencies.
Good example is acpi_ec driver for EC.
Thus we attach to Processors as early as possible to load the SSDTs
before any other drivers may try to evaluate control methods.
It also seems to be a natural order for a processor in a device
hierarchy.
On the other hand, some child devices on acpi cpu bus need to access
other system resources like PCI configuration space of chipset devices,
so they need to be probed and attached rather late.
For this reason we probe and attach the cpu bus at
SI_SUB_CONFIGURE:SI_ORDER_MIDDLE SYSINIT level.
In the future this could be done more elegantly via multipass.
Please note that acpi drivers that might access ACPI namespace from
device_identify will do that before _PDC/_OSC of Processors are evaluated.
Legacy cpu driver is not affected by this change.
PR: kern/142561 (in part)
Reviewed by: jhb
Silence from: acpi@
MFC after: 5 weeks
_PDC was deprecated in favor of _OSC long time ago, but it
seems that they still peacefully coexist and in some case
only _PDC is present.
Still _OSC provides a reacher interface and is capable to
report back its status.
If the status is non-zero, then report it, we may find
it useful to understand what firmware expects from OS.
Also clean up some comments that became less useful over time.
Reviewed by: njl, jhb, rpaulo
MFC after: 3 weeks
Also, account for a quirk of AMD/ATI HPET which reports number of timers
instead of id of the last timer as manadated by the specification.
Currently this has no effect on functionality but in the future we may
make actual use of the HPET timers, not only of its timecounter.
MFC after: 2 weeks
This is not only a prudent thing to do, but also makes sure that probe
method is not confused by non-NULL 'private', if the previous attach
attempt fails for any reason.
PR: kern/142561
Tested by: Alex Goncharov <alex-goncharov@comcast.net>
MFC after: 4 days
o acpi_hpet: auto-added 'wildcard' devices can be identified by
non-NULL handle attribute.
o acpi_ec: auto-add 'wildcard' devices can be identified by
unset (NULL) private attribute.
o acpi_cpu: use private instead of magic to store cpu id.
Reviewed by: jhb
Silence from: acpi@
MFC after: 2 weeks
X-MFC-Note: perhaps the ivar should stay for ABI stability
sysctl lock. The 'video' lock now protects the 'bus' of video output
devices attached to a graphics adapter. It is used when iterating over
the list of outputs, etc. The 'video_output' lock is used to lock the
output-specific data similar to a driver lock for the individual video
outputs.
MFC after: 2 weeks
startup and genericize it so it can be reused to map other tables as well:
- Add a routine to walk a list of ACPI subtables such as those used in the
APIC and SRAT tables in the MI acpi(4) driver.
- Move the routines for mapping and unmapping an ACPI table as well as
mapping the RSDT or XSDT and searching for a table with a given signature
out into acpica_machdep.c for both amd64 and i386.
BIOS-enumerated devices:
- Assume a device is a match if the memory and I/O ports match even if the
IRQ or DRQ is wrong or missing. Some BIOSes don't include an IRQ for
the atrtc device for example.
- Add a hack to better match floppy controller devices. Many BIOSes do not
include the starting port of the floppy controller listed in the hints
(0x3f0) in the resources for the device. So far, however, all the BIOS
variations encountered do include the 'port + 2' resource (0x3f2), so
adjust the matching for "fdc" devices to look for 'port + 2'.
Reviewed by: imp
MFC after: 3 days
The newbus lock is responsible for protecting newbus internIal structures,
device states and devclass flags. It is necessary to hold it when all
such datas are accessed. For the other operations, softc locking should
ensure enough protection to avoid races.
Newbus lock is automatically held when virtual operations on the device
and bus are invoked when loading the driver or when the suspend/resume
take place. For other 'spourious' operations trying to access/modify
the newbus topology, newbus lock needs to be automatically acquired and
dropped.
For the moment Giant is also acquired in some key point (modules subsystem)
in order to avoid problems before the 8.0 release as module handlers could
make assumptions about it. This Giant locking should go just after
the release happens.
Please keep in mind that the public interface can be expanded in order
to provide more support, if there are really necessities at some point
and also some bugs could arise as long as the patch needs a bit of
further testing.
Bump __FreeBSD_version in order to reflect the newbus lock introduction.
Reviewed by: ed, hps, jhb, imp, mav, scottl
No answer by: ariff, thompsa, yongari
Tested by: pho,
G. Trematerra <giovanni dot trematerra at gmail dot com>,
Brandon Gooch <jamesbrandongooch at gmail dot com>
Sponsored by: Yahoo! Incorporated
Approved by: re (ksmith)
- Preallocate some memory for ACPI tasks early enough. We cannot use
malloc(9) any more because spin mutex may be held here. The reserved
memory can be tuned via debug.acpi.max_tasks tunable or ACPI_MAX_TASKS
in kernel configuration. The default is 32 tasks.
- Implement a custom taskqueue_fast to wrap the new memory allocation.
This implementation is not the fastest in the world but we are being
conservative here.
a _BBN value of 0 if it was for the first bridge encountered since some
older systems returned _BBN of 0 for all bridges. However, some newer
systems enumerate bridges with non-zero _BBN before bus 0 which is
perfectly valid. Handle both cases by trusting the first bridge that has
a _BBN of 0 and falling back to reading from non-standard config registers
only for subsequent bridges with a _BBN of 0. We also only perform this
check for segment (domain) 0. We assume that _BBN is always correct
for segments other than 0.
Tested by: Josef Moellers josef.moellers at fujitsu
MFC after: 1 week
leading to a bug, when C-state does not decrease on sleep shorter then
declared transition latency. Fixing this deprecates workaround for broken
C-states on some hardware.
By the way, change state selecting logic a bit. Instead of last sleep
time use short-time average of it. Global interrupts rate in system is a
quite random value, to corellate subsequent sleeps so directly.
- Probe supported sleep states from acpi_attach() just once and do not
call AcpiGetSleepTypeData() again. It is redundant because
AcpiEnterSleepStatePrep() does it any way.
- Treat UNKNOWN sleep state as NONE, i.e., "do nothing", and remove obscure
NONE state (ACPI_S_STATES_MAX + 1) to avoid confusions.
- Do not set unsupported sleep states as default button/switch events.
If the default sleep state is not supported, just set it as UNKNOWN/NONE.
- Do not allow sleep state change if the system is not fully up and running.
This should prevent entering S5 state multiple times, which causes strange
behaviours later.
- Make sleep states case-insensitive when they are used with sysctl(8).
For example,
sysctl hw.acpi.lid_switch_state=s1
sysctl hw.acpi.sleep_button_state=none
are now legal and equivalent to the uppercase ones.
This change adds (possibly redundant) early check for invalid
state input parameter (including S0). Handling of S5 request
is reduced to simply calling shutdown_nice(). As a result
control flow of acpi_EnterSleepState is somewhat simplified
and resume/backout half of the function is not executed
for S5 (soft poweroff) request and invalid state requests.
Note: it seems that shutdown_nice may act as nop when initproc
is already initialized (to grab pid of 1), but init process is in
"pre-natal" state.
Tested by: Fabian Keil <fk@fabiankeil.de>
Reviewed by: njl, jkim
Approved by: rpaulo
into acpi_cpu_startup() which is where all the other code to update this
global variable lives. This fixes a bug where cpu_cx_count was not updated
correctly if acpi_cpu_generic_cx_probe() returned early.
PR: kern/108581
Debugged by: Bruce Cran
Reviewed by: avg, njl, sepotvin
MFC after: 3 days
This code is heavily inspired by Takanori Watanabe's experimental SMP patch
for i386 and large portion was shamelessly cut and pasted from Peter Wemm's
AP boot code.
This is triggered only if BIOS configures ACPI_BITREG_BUS_MASTER_RLD
aka BRLD_EN_BM to 1.
Rationale:
1. we do not support C3 on PIIX4E
2. bus master activity need not break out of C2 state
3. because of CPU_QUIRK_NO_BM_CTRL quirk we may reset bus master
status which would result in immediate break out from C2
So if you have seen
cpu0: too many short sleeps, backing off to C1
with this chipset before you may want to try cx_lowest of C2 again.
Reviewed by: rpaulo (mentor), njl
Approved by: rpaulo (mentor)
if (batt_sleep_ms)
AcpiOsSleep(1);
where the rest are all:
if (batt_sleep_ms)
AcpiOsSleep(batt_sleep_ms);
I can't recall why that one was different, so change it
to match the rest.
Pointed out by: Christoph Mallon
MFC after: 2 weeks
On some laptops with smart batteries, enabling battery monitoring
software causes keystrokes from atkbd to be lost. This has also been
reported on Linux, and is apparently due to the keyboard and I2C line
for the battery being routed through the same chip. Whether that's
accurate or not, adding extra sleeps to the status checking code
causes the problem to go away.
I've been running this for nearly six months now on my laptop,
it works like a charm.
Reviewed by: Nate Lawson (in a previous revision)
MFC after: 2 weeks
- An "at" hint now reserves a device name.
- A new BUS_HINT_DEVICE_UNIT method is added to the bus interface. When
determining the unit number of a device, this method is invoked to
let the bus driver specify the unit of a device given a specific
devclass. This is the only way a device can be given a name reserved
via an "at" hint.
- Implement BUS_HINT_DEVICE_UNIT() for the acpi(4) and isa(4) bus drivers.
Both of these busses implement this by comparing the resources for a
given hint device with the resources enumerated by ACPI/PnPBIOS and
wire a unit if the hint resources are a subset of the "real" resources.
- Use bus_hinted_children() for adding hinted devices on isa(4) busses
now instead of doing it by hand.
- Remove the unit kludging from sio(4) as it is no longer necessary.
Prodding from: peter, imp
OK'd by: marcel
MFC after: 1 month
use process ID as ACPI thread ID. Concurrent requests with equal thread
IDs broke ACPI mutexes operation causing unpredictable errors including
AE_AML_MUTEX_NOT_ACQUIRED that I have seen.
Use kernel thread ID instead of process ID for ACPI thread.
- Rename pciereg_cfgopen() to pcie_cfgregopen() and expose it to the
rest of the kernel. It now also accepts parameters via function
arguments rather than global variables.
- Add a notion of minimum and maximum bus numbers and reject requests for
an out of range bus.
- Add more range checks on slot/func/reg/bytes parameters to the cfg reg
read/write routines. Don't panic on any invalid parameters, just fail
the request (writes do nothing, reads return -1). This matches the
behavior of the other cfg mechanisms.
- Port the memory mapped configuration space access to amd64. On amd64
we simply use the direct map (via pmap_mapdev()) for the memory mapped
window.
- During acpi_attach() just after loading the ACPI tables, check for a
MCFG table. If it exists, call pciereg_cfgopen() on each subtable
(memory mapped window). For now we only support windows for domain 0
that start with bus 0. This removes the need for more chipset-specific
quirks in the MD code.
- Remove the chipset-specific quirks for the Intel 5000P/V/Z chipsets
since these machines should all have MCFG tables via ACPI.
- Updated pci_cfgregopen() to DTRT if ACPI had invoked pcie_cfgregopen()
earlier.
MFC after: 2 weeks
behavior. Specifically, probe Host-PCI bridges in the order they are
encountered in the tree. For CPUs, just use an order of 100000 and assume
that no Host-PCI bridges will be more than 10000 levels deep in the
namespace. This fixes an issue on some boxes where the HPET timer stopped
attaching.
assumptions about the state of the cooling devices. Instead, switch them
off on init and, only after that, we are in TZ_ACTIVE_NONE.
Submited by: Andriy Gapon <avg at icyb.net.ua>
Reviewed by: njl
different "platforms" on x86 machines. The existing code already handles
having two platforms: ACPI and legacy. However, the existing approach was
rather hardcoded and difficult to extend. These changes take the approach
that each x86 hardware platform should provide its own nexus(4) driver (it
can inherit most of its behavior from the default legacy nexus(4) driver)
which is responsible for probing for the platform and performing
appropriate platform-specific setup during attach (such as adding a
platform-specific bus device). This does mean changing the x86 platform
busses to no longer use an identify routine for probing, but to move that
logic into their matching nexus(4) driver instead.
- Make the default nexus(4) driver in nexus.c on i386 and amd64 handle the
legacy platform. It's probe routine now returns BUS_PROBE_GENERIC so it
can be overriden.
- Expose a nexus_init_resources() routine which initializes the various
resource managers so that subclassed nexus(4) drivers can invoke it from
their attach routine.
- The legacy nexus(4) driver explicitly adds a legacy0 device in its
attach routine.
- The ACPI driver no longer contains an new-bus identify method. Instead
it exposes a public function (acpi_identify()) which is a probe routine
that the MD nexus(4) drivers can use to probe for ACPI. All of the
probe logic in acpi_probe() is now moved into acpi_identify() and
acpi_probe() is just a stub.
- On i386 and amd64, an ACPI-specific nexus(4) driver checks for ACPI via
acpi_identify() and claims the nexus0 device if the probe succeeds. It
then explicitly adds an acpi0 device in its attach routine.
- The legacy(4) driver no longer knows anything about the acpi0 device.
- On ia64 if acpi_identify() fails you basically end up with no devices.
This matches the previous behavior where the old acpi_identify() would
fail to add an acpi0 device again leaving you with no devices.
Discussed with: imp
Silence on: arch@
the cpufreq drivers to reliably use properties of PCI devices for quirks,
etc.
- For the legacy drivers, add CPU devices via an identify routine in the
CPU driver itself rather than in the legacy driver's attach routine.
- Add CPU devices after Host-PCI bridges in the acpi bus driver.
- Change the ichss(4) driver to use pci_find_bsf() to locate the ICH and
check its device ID rather than having a bogus PCI attachment that only
checked for the ID in probe and always failed. As a side effect, you
can now kldload ichss after boot.
- Fix the ichss(4) driver to use the correct device_t for the ICH (and not
for ichss0) when doing PCI config space operations to enable SpeedStep.
MFC after: 2 weeks
Reviewed by: njl, Andriy Gapon avg of icyb.net.ua
the appropriate bit in the DEVACTB register.
This change allows the C2 state on those systems to work as expected.
Reviewed by: njl
Submitted by: Andriy Gapon <avg at icyb.net.ua>
MFC after: 1 week
spec:
- Use read/modify/write cycles to enable and disable the HPET instead of
writing 0 to reserved bits.
- Shutdown the HPET during suspend as encouraged by the spec.
- Fail to attach to an HPET with a period of zero.
MFC after: 1 week
PR: kern/119675 [3]
Reported by: Leo Bicknell | bicknell ufp.org
zone code. The GPE handler method (i.e. _L00) generates various Notify
events that need to be run to completion before the GPE is re-enabled.
In ACPI-CA, we queue an asynch callback at the same priority as a Notify
so that it will only run after all Notify handlers have completed. The
callback re-enables the GPE afterwards. We also changed the priority of
Notifies to be the same as GPEs, given the possibility that another GPE
could arrive before the Notifies have completed and we don't want it to
get queued ahead of the rest.
The ACPI-CA change was submitted by Alexey Starikovskiy (SUSE) and will
appear in a later release. Special thanks to him for helping track this
bug down.
MFC after: 1 week
Tested by: jhb, Yousif Hassan <yousif / alumni.jmu.edu>
correct number of acpi_thermalX devices. Having this wrong caused the
acpi_thermal thread to realloc the array of devices on each loop iteration.
MFC after: 1 week
PR: kern/118497
Submitted by: Pasi Parviainen
for that argument. This will allow DDB to detect the broad category of
reason why the debugger has been entered, which it can use for the
purposes of deciding which DDB script to run.
Assign approximate why values to all current consumers of the
kdb_enter() interface.
CPUs to make sure idle threads are evicted from the softc before returning
from acpi_cpu_shutdown(). However, this is unnecessary since stop_cpus()
handles this for itself and at this point it's possible that our IPI will be
blocked (interrupts disabled).
Thanks to: Glen Leeder <glen.leeder / nokia.com>
MFC after: 3 days
reason (not all BIOSen have _DIS methods for all link devices for example).
This matches the behavior of attach() with respect to _DIS as well.
Submitted by: njl
handle to the PCI device_t if the ACPI device_t is already attached to a
driver. This happens on the Tablet TC1000 which for some reason includes
two PCI-ISA bridges and treats the second bridge as an ACPI system resource
device.
Reviewed by: njl (a while ago)
MFC after: 3 days
to kproc_xxx as they actually make whole processes.
Thos makes way for us to add REAL kthread_create() and friends
that actually make theads. it turns out that most of these
calls actually end up being moved back to the thread version
when it's added. but we need to make this cosmetic change first.
I'd LOVE to do this rename in 7.0 so that we can eventually MFC the
new kthread_xxx() calls.