back in a simulated resume instead of entering the requested suspend state.
This helps in testing drivers separately from the acpi suspend code. To
test your drivers, set debug.acpi.suspend_bounce=1 and then run
acpiconf -s3 (or 4).
MFC after: 1 day
- Simplify the amount of work that has be done for each architecture by
pushing more of the truly MI code down into the PCI bus driver.
- Don't bind MSI-X indicies to IRQs so that we can allow a driver to map
multiple MSI-X messages into a single IRQ when handling a message
shortage.
The changes include:
- Add a new pcib_if method: PCIB_MAP_MSI() which is called by the PCI bus
to calculate the address and data values for a given MSI/MSI-X IRQ.
The x86 nexus drivers map this into a call to a new 'msi_map()' function
in msi.c that does the mapping.
- Retire the pcib_if method PCIB_REMAP_MSIX() and remove the 'index'
parameter from PCIB_ALLOC_MSIX(). MD code no longer has any knowledge
of the MSI-X index for a given MSI-X IRQ.
- The PCI bus driver now stores more MSI-X state in a child's ivars.
Specifically, it now stores an array of IRQs (called "message vectors" in
the code) that have associated address and data values, and a small
virtual version of the MSI-X table that specifies the message vector
that a given MSI-X table entry uses. Sparse mappings are permitted in
the virtual table.
- The PCI bus driver now configures the MSI and MSI-X address/data
registers directly via custom bus_setup_intr() and bus_teardown_intr()
methods. pci_setup_intr() invokes PCIB_MAP_MSI() to determine the
address and data values for a given message as needed. The MD code
no longer has to call back down into the PCI bus code to set these
values from the nexus' bus_setup_intr() handler.
- The PCI bus code provides a callout (pci_remap_msi_irq()) that the MD
code can call to force the PCI bus to re-invoke PCIB_MAP_MSI() to get
new values of the address and data fields for a given IRQ. The x86
MSI code uses this when an MSI IRQ is moved to a different CPU, requiring
a new value of the 'address' field.
- The x86 MSI psuedo-driver loses a lot of code, and in fact the separate
MSI/MSI-X pseudo-PICs are collapsed down into a single MSI PIC driver
since the only remaining diff between the two is a substring in a
bootverbose printf.
- The PCI bus driver will now restore MSI-X state (including programming
entries in the MSI-X table) on device resume.
- The interface for pci_remap_msix() has changed. Instead of accepting
indices for the allocated vectors, it accepts a mini-virtual table
(with a new length parameter). This table is an array of u_ints, where
each value specifies which allocated message vector to use for the
corresponding MSI-X message. A vector of 0 forces a message to not
have an associated IRQ. The device may choose to only use some of the
IRQs assigned, in which case the unused IRQs must be at the "end" and
will be released back to the system. This allows a driver to use the
same remap table for different shortage values. For example, if a driver
wants 4 messages, it can use the same remap table (which only uses the
first two messages) for the cases when it only gets 2 or 3 messages and
in the latter case the PCI bus will release the 3rd IRQ back to the
system.
MFC after: 1 month
specific request and thus should first try to be allocated from the
sys_resource pool. This avoids using the sys_resource pool for wildcard
requests that have bounded ranges coming from cbb(4) and Host-PCI pcib(4)
drivers.
Tested by: Andrea Bittau <a.bittau of cs.ucl.ac.uk fame>
Sleuthing by: Andrea Bittau as well
obtaining and releasing shared and exclusive locks. The algorithms for
manipulating the lock cookie are very similar to that rwlocks. This patch
also adds support for exclusive locks using the same algorithm as mutexes.
A new sx_init_flags() function has been added so that optional flags can be
specified to alter a given locks behavior. The flags include SX_DUPOK,
SX_NOWITNESS, SX_NOPROFILE, and SX_QUITE which are all identical in nature
to the similar flags for mutexes.
Adaptive spinning on select locks may be enabled by enabling the
ADAPTIVE_SX kernel option. Only locks initialized with the SX_ADAPTIVESPIN
flag via sx_init_flags() will adaptively spin.
The common cases for sx_slock(), sx_sunlock(), sx_xlock(), and sx_xunlock()
are now performed inline in non-debug kernels. As a result, <sys/sx.h> now
requires <sys/lock.h> to be included prior to <sys/sx.h>.
The new kernel option SX_NOINLINE can be used to disable the aforementioned
inlining in non-debug kernels.
The size of struct sx has changed, so the kernel ABI is probably greatly
disturbed.
MFC after: 1 month
Submitted by: attilio
Tested by: kris, pjd
one (hardware & global lock). This should address witness complaints that
a duplicate mutex is being acquired. Be sure to free the mutex to fix a
potential memory leak.
MFC after: 3 days
simpler. It now can just use rman_is_region_manager() during
acpi_release_resource() to see if the the resource is suballocated from
a system resource. Also, the driver no longer needs MD knowledge about
how to setup bus space tags and handles when doing a suballocation, but
can simply rely on bus_activate_resource() in the parent setting all that
up.
cause the EC to stop handling future events because the GPE stayed masked.
Set a flag when queueing a GPE handler since it will ultimately re-enable
the GPE. In all other cases, re-enable it ourselves. I reworked the
patch from the submitter.
Submitted by: Rong-en Fan <grafan@gmail.com>
most systems, it causes the EC not to respond for some Acer and Compaq/HP
laptops. This is the default value for Linux also. For systems that need
it, burst mode can be enabled via the tunable/sysctl:
debug.acpi.ec.burst="1"
acpi module. Also clean up print of args a little.
This was accidentally committed as 1.9.2.3 in the stable branch. Since it
is harmless, I will let the "insta-MFC" stand unless there is a problem.
EC occasionally times out and provides bogus values (3000C). This change
prevents those systems from prematurely shutting down while we work on the
underlying problem. Also, bump the sanity value to 0...200C from 0...150C.
case where it asynchronously exits burst mode on its own. Handle different
values of hz in sleep loop. Provide more debugging options to tune EC
behavior. These tunables/sysctls may be temporary and are not for user
access if the EC is working properly. Burst mode is now on by default for
testing and the poll interval has been increased from 100 to 500 us and
total timeout from 100 to 500 ms.
Hopefully this should be the first step of addressing reports of timeout
errors during battery or thermal access, especially on HP/Compaq laptops.
It is reasonably stable and should not cause a loss of functionality or
performance on systems that were previously working. Testing shows an
increase of responsiveness by ~75% on one system.
PR: kern/98171
triggers a KASSERT) or local variables. In the case of kern_ndis, the
tsleep() actually used a common sleep address (curproc) making it
susceptible to a premature wakeup.
- First off, device drivers really do need to know if they are allocating
MSI or MSI-X messages. MSI requires allocating powerof2() messages for
example where MSI-X does not. To address this, split out the MSI-X
support from pci_msi_count() and pci_alloc_msi() into new driver-visible
functions pci_msix_count() and pci_alloc_msix(). As a result,
pci_msi_count() now just returns a count of the max supported MSI
messages for the device, and pci_alloc_msi() only tries to allocate MSI
messages. To get a count of the max supported MSI-X messages, use
pci_msix_count(). To allocate MSI-X messages, use pci_alloc_msix().
pci_release_msi() still handles both MSI and MSI-X messages, however.
As a result of this change, drivers using the existing API will only
use MSI messages and will no longer try to use MSI-X messages.
- Because MSI-X allows for each message to have its own data and address
values (and thus does not require all of the messages to have their
MD vectors allocated as a group), some devices allow for "sparse" use
of MSI-X message slots. For example, if a device supports 8 messages
but the OS is only able to allocate 2 messages, the device may make the
best use of 2 IRQs if it enables the messages at slots 1 and 4 rather
than default of using the first N slots (or indicies) at 1 and 2. To
support this, add a new pci_remap_msix() function that a driver may call
after a successful pci_alloc_msix() (but before allocating any of the
SYS_RES_IRQ resources) to allow the allocated IRQ resources to be
assigned to different message indices. For example, from the earlier
example, after pci_alloc_msix() returned a value of 2, the driver would
call pci_remap_msix() passing in array of integers { 1, 4 } as the
new message indices to use. The rid's for the SYS_RES_IRQ resources
will always match the message indices. Thus, after the call to
pci_remap_msix() the driver would be able to access the first message
in slot 1 at SYS_RES_IRQ rid 1, and the second message at slot 4 at
SYS_RES_IRQ rid 4. Note that the message slots/indices are 1-based
rather than 0-based so that they will always correspond to the rid
values (SYS_RES_IRQ rid 0 is reserved for the legacy INTx interrupt).
To support this API, a new PCIB_REMAP_MSIX() method was added to the
pcib interface to change the message index for a single IRQ.
Tested by: scottl
modern dual-core systems as well.
- Parse the _CST packages for each cpu and track all the states individually,
on a per-cpu basis.
- Revert to generic FADT/P_BLK based Cx control if the _CST package
is not present on all cpus. In that case, the new driver will
still support per-cpu Cx state handling. The driver will determine the
highest Cx level that can be supported by all the cpus and configure the
available Cx state based on that.
- Fixed the case where multiple cpus in the system share the same
registers for Cx state handling. To do that, added a new flag
parameter to the acpi_PkgGas and acpi_bus_alloc_gas functions that
enable the caller to add the RF_SHAREABLE flag. This flag could also be
useful to other callers (acpi_throttle?) in the tree but this change is
not yet made.
- For Core Duo cpus, both cores seems to be taken out of C3 state when
any one of the cores need to transition out. This broke the short sleep
detection logic. It is disabled now if there is more than one cpu in
the system for now as it fixed it in my case. This quirk may need to
be re-enabled later differently.
- Added support to control cx_lowest on a per-cpu basis. There is still
a generic cx_lowest to enable changing cx_lowest for all cpus with a single
sysctl and for ease of use. Sample output for the new sysctl:
dev.cpu.0.cx_supported: C1/1 C2/1 C3/57
dev.cpu.0.cx_lowest: C3
dev.cpu.0.cx_usage: 0.00% 43.16% 56.83%
dev.cpu.1.cx_supported: C1/1 C2/1 C3/57
dev.cpu.1.cx_lowest: C3
dev.cpu.1.cx_usage: 0.00% 45.65% 54.34%
hw.acpi.cpu.cx_lowest: C3
This work was done by Stephane E. Potvin with some simple reworking by
myself. Thank you.
Submitted by: Stephane E. Potvin <sepotvin / videotron.ca>
MFC after: 2 weeks
return an error since it returns a count of battery devices in the system.
Set it to 0 explicitly, since it is the only switch branch that doesn't set
it.
# I guess no one uses it.
pcib_alloc_msix() methods instead of using the method from the generic
PCI-PCI bridge driver as the PCI-PCI methods will be gaining some PCI-PCI
specific logic soon.
- Add 3 new functions to the pci_if interface along with suitable wrappers
to provide the device driver visible API:
- pci_alloc_msi(dev, int *count) backed by PCI_ALLOC_MSI(). '*count'
here is an in and out parameter. The driver stores the desired number
of messages in '*count' before calling the function. On success,
'*count' holds the number of messages allocated to the device. Also on
success, the driver can access the messages as SYS_RES_IRQ resources
starting at rid 1. Note that the legacy INTx interrupt resource will
not be available when using MSI. Note that this function will allocate
either MSI or MSI-X messages depending on the devices capabilities and
the 'hw.pci.enable_msix' and 'hw.pci.enable_msi' tunables. Also note
that the driver should activate the memory resource that holds the
MSI-X table and pending bit array (PBA) before calling this function
if the device supports MSI-X.
- pci_release_msi(dev) backed by PCI_RELEASE_MSI(). This function
releases the messages allocated for this device. All of the
SYS_RES_IRQ resources need to be released for this function to succeed.
- pci_msi_count(dev) backed by PCI_MSI_COUNT(). This function returns
the maximum number of MSI or MSI-X messages supported by this device.
MSI-X is preferred if present, but this function will honor the
'hw.pci.enable_msix' and 'hw.pci.enable_msi' tunables. This function
should return the largest value that pci_alloc_msi() can return
(assuming the MD code is able to allocate sufficient backing resources
for all of the messages).
- Add default implementations for these 3 methods to the pci_driver generic
PCI bus driver. (The various other PCI bus drivers such as for ACPI and
OFW will inherit these default implementations.) This default
implementation depends on 4 new pcib_if methods that bubble up through
the PCI bridges to the MD code to allocate IRQ values and perform any
needed MD setup code needed:
- PCIB_ALLOC_MSI() attempts to allocate a group of MSI messages.
- PCIB_RELEASE_MSI() releases a group of MSI messages.
- PCIB_ALLOC_MSIX() attempts to allocate a single MSI-X message.
- PCIB_RELEASE_MSIX() releases a single MSI-X message.
- Add default implementations for these 4 methods that just pass the
request up to the parent bus's parent bridge driver and use the
default implementation in the various MI PCI bridge drivers.
- Add MI functions for use by MD code when managing MSI and MSI-X
interrupts:
- pci_enable_msi(dev, address, data) programs the MSI capability address
and data registers for a group of MSI messages
- pci_enable_msix(dev, index, address, data) initializes a single MSI-X
message in the MSI-X table
- pci_mask_msix(dev, index) masks a single MSI-X message
- pci_unmask_msix(dev, index) unmasks a single MSI-X message
- pci_pending_msix(dev, index) returns true if the specified MSI-X
message is currently pending
- Save the MSI capability address and data registers in the pci_cfgreg
block in a PCI devices ivars and restore the values when a device is
resumed. Note that the MSI-X table is not currently restored during
resume.
- Add constants for MSI-X register offsets and fields.
- Record interesting data about any MSI-X capability blocks we come
across in the pci_cfgreg block in the ivars for PCI devices.
Tested on: em (i386, MSI), bce (amd64/i386, MSI), mpt (amd64, MSI-X)
Reviewed by: scottl, grehan, jfv
MFC after: 2 months
WB (write-back) on x86 via control bits in PTEs and PDEs (including making
use of the PAT MSR). Changes include:
- A new pmap_mapdev_attr() function for amd64 and i386 which takes an
additional parameter (relative to pmap_mapdev()) specifying the cache
mode for this mapping. Note that on amd64 only WB mappings are done with
the direct map, all other modes result in a private mapping.
- pmap_mapdev() on i386 and amd64 now defaults to using UC (uncached)
mappings rather than WB. Previously we relied on the BIOS setting up
MTRR's to enforce memio regions being treated as UC. This might make
hw.cbb_start_memory unnecessary in some cases now for example.
- A new pmap_mapbios()/pmap_unmapbios() API has been added to allow places
that used pmap_mapdev() to map non-device memory (such as ACPI tables)
to do so using WB as before.
- A new pmap_change_attr() function for amd64 and i386 that changes the
caching mode for a range of KVA.
Reviewed by: alc
function independently. This change is not only load-tested since I don't
have hardware that supports acpi_dock. Clean up comments and a name a
few constants.
that aren't listed as valid in the link device's set of possible IRQs.
This allows the hints to be used to work around broken BIOSes that don't
specify the correct ste of possible IRQs. A warning is issued in the
dmesg in this case to be consistent with the $PIR handling code.
MFC after: 1 week
perform the reboot action via the reset register instead of our legacy
method. Default is 0 (use legacy). This is needed because some systems
hang on reboot even though they claim to support the reset register.
MFC after: 2 days
Prevent casual modification by requiring hw.acpi.thermal.user_override to
be set first. Fix printing of negative temperatures in the K->C conversion.
Document the remaining thermal sysctls.
MFC after: 3 days
systems. Introduce a new sysctl "hw.acpi.disable_on_reboot" that allows
users to re-enable the old behavior in case it's needed for some systems.
We never disable in the power-off path.
Original approach submitted by Alexander Logvinov <abuse@akavia.ru> with
reworking by Jung-uk Kim and myself.
Move the code for printing timer statistics into a test function instead of
an ifdef (accessible via the debug.acpi.hpet_test tunable). Also use defines
for register offsets instead of magic values.
Courtesy of: slow flight to HK
If the embedded controller exists before the sysresource devices, for
example, it will be attached first. Instead, let the normal device
order function work as we first desired. [1]
There still remained a problem where we couldn't allocate resources in
acpi0 that were passed up by the sysresource pseudo-devices. These
devices had to probe/attach first to give their resources to acpi, then
acpi would allocate them before probing/attaching other devices. To
work around this, we attach them from acpi_sysres_alloc(). A better
approach would be to implement multi-pass probe/attach in newbus but
that's a much bigger task.
Suggested by: jhb [1]
Hardware from: Centaur Technologies
MFC after: 1 week
some systems were designed so that AML writes to various resources shared
with OS drivers, including the RTC, PIC, PCI, etc. These writes could
collide with writes by the OS and should never be performed. For now, we
print a message if such an access occurs, but do not block it. To block
the access, the tunable "debug.acpi.block_bad_io" can be set to 1. In the
future, we will flip the switch and this will become the default.
Information about this problem was found in Microsoft KB 283649. They
block IO accesses if the BIOS indicates via _OSI that it is Windows 2001
or higher. They always block accesses to the PIC, cascaded PIC, and ELCRs,
no matter how old the BIOS.
systems (blade servers). On most systems, this is implemented as an IO
write to the SMI port and the BIOS generates the actual reset.
PR: kern/94939
Submitted by: dodell@ixsystems.com
Reviewed by: jhb
MFC after: 3 weeks
taskqueue_start_threads(struct taskqueue **, int count, int pri,
const char *name, ...);
This allows the creation of 1 or more threads that will service a single
taskqueue. Also rework the taskqueue_create() API to remove the API change
that was introduced a while back. Creating a taskqueue doesn't rely on
the presence of a process structure, and the proc mechanics are much better
encapsulated in taskqueue_start_threads(). Also clean up the
taskqueue_terminate() and taskqueue_free() functions to safely drain
pending tasks and remove all associated threads.
The TASKQUEUE_DEFINE and TASKQUEUE_DEFINE_THREAD macros have been changed
to use the new API, but drivers compiled against the old definitions will
still work. Thus, recompiling drivers is not a strict requirement.
before. The symptom is that the battery inform us its charge and discharge
at the same time...
* fix bst.rate to correctly output the (dis)charging rate. We'll use
the current average over one minute command and not the at_rate command.
Note that this method is not correct if the capacity_mode is set, but
since we don't set it ourself, it is not a problem.
The at_rate do not give the actual rate but is used to compute the
estimated time for (dis)charging a battery. We should actually
write an estimation of the actual rate using at_rate cmd and then
perform a read to the various estimators.
Approved by: njl
MFC after: 2 days
various pcib drivers to use their own private devclass_t variables for
their modules.
- Use the DEFINE_CLASS_0() macro to declare drivers for the various pcib
drivers while I'm here.
doesn't have any actual interrupts is listed in a _PRT entry, only print
a warning rather than panic'ing when we walk the _PRT's to build up count
of entries that reference a given link (the counts are used as weights so
that we can attempt to balance the load across IRQs used by link devices).
Instead, only panic if we attempt to use the _PRT entry to route an
interrupt for a device.
PR: i386/89545
Tested by: anders
POSIX. This also makes the struct correct we ever implement an i386-time64
architecture. Not that we need too.
Reviewed by: imp, brooks
Approved by: njl (acpica), des (no objects, touches procfs)
Tested with: make universe
to search for a specific extended capability. If the specified capability
is found for the given device, then the function returns success and
optionally returns the offset of that capability. If the capability is
not found, the function returns an error.
immediately from acpi_pci_link_route_interrupt() since we aren't going
to have a valid pci_link device to talk to try to route interrupts. This
fixes a page fault if you disable just pci_link. Note that trying to use
ACPI without pci_link is probably not advised however.
MFC after: 1 week
Tested by: Eugene Grosbein eugen at kuzbass dot ru
polarity. Some machines route PCI IRQs to an ISA IRQ but fail to include
an interrupt override entry to set the polarity and trigger of the given
ISA IRQ in their MADT table.
PR: usb/74989
Reported by: Julien Gabel jpeg at thilelli dot net
MFC after: 1 week
- Improve panic message if we fail to read the PCI bus number from a bridge
device.
- Don't try to lookup a BIOS IRQ for a link unless the link is routed via
an ISA IRQ since BIOSen currently only route PCI link devices via ISA
IRQs.
Tested by: Mathieu Prevot bsdhack at club-internet dot fr
MFC after: 1 week
Instead, re-evaluate _BIF only when we get a notify and use the cached
results. We also still evaluate _BIF once on boot. Also, optimize the
init loop a little by only querying for a particular info if it's not valid.
MFC after: 2 days
the IRQ set by the BIOS in existing devices to actually get the correct
bus number of the child PCI bus. I was not reading the bus number from
the bridge device correctly. The __BUS_ACCESSOR() macros (from which
pcib_get_bus() is built) assume that the passed in argument is a child
device. However, at the time I'm reading the bus there is no child
device yet, so I was passing in the pcib device as the child device.
The parent of the pcib device probably returned an error in the case of
a host bridge, thus resulting in random stack garbage for the bus number.
For PCI-PCI bridges, the bus number being used was actually the subvendor
of the PCI-PCI bridge device itself.
MFC after: 1 week
acpi_resource change was a minor nit offered as an early candidate for
the recent ACPICA import problem and the acpi.c change is one I need to
test still that makes the ordered probing of system devices actually work
as advertised (probe devices in order based on the type of device rather
than in the order we encounter them in the device tree).
entry that is not zero, assume that it is really a hard-wired IRQ (commonly
used for APIC routing) and not a source index. In practice, we've only
ever seen source indices of 0 for legitimate non-hard-wired _PRT entries.
Reviewed by: njl
Tested by: Alex Lyashkov shadow at psoft dot net
MFC after: 2 weeks
- Prefer '_' to ' ', as it results in more easily parsed results in
memory monitoring tools such as vmstat.
- Remove punctuation that is incompatible with using memory type names
as file names, such as '/' characters.
- Disambiguate some collisions by adding subsystem prefixes to some
memory types.
- Generally prefer lower case to upper case.
- If the same type is defined in multiple architecture directories,
attempt to use the same name in additional cases.
Not all instances were caught in this change, so more work is required to
finish this conversion. Similar changes are required for UMA zone names.
immediately, back off to the next higher Cx sleep state. Some machines
with a Via chipset report a valid C3 but a register read doesn't actually
halt the CPU. This would cause the machine to appear unresponsive as it
repeatedly called cpu_idle() which immediately returned. Causing interrupts
(i.e. by pressing the power button) would cause the system to make forward
progress, showing that it wasn't actually hung.
Also, enable interrupts a little earlier. We don't need them disabled
to calculate the delta time for the read.
Reported by: silby
MFC after: 2 weeks
is on AC power (i.e. not a laptop). This allows power_profile to run once
for desktop systems as well, for instance, to set C3 or CPU frequency.
MFC after: 2 weeks
support the CM-battery interface. Smart batteries can eventually be
supported without ACPI via a separate SMBus interface. The ACPI interface
uses the embedded controller for reading/writing to the SMBus, and normal
ASL definitions for locating the battery controller (since SMBus can't be
enumerated.) Also import definitions for the smart battery interface.
This was written by Hans Petter Selasky with minor cleanups from myself.
Submitted by: Hans Petter Selasky <hselasky / c2i.net>