- First off, device drivers really do need to know if they are allocating
MSI or MSI-X messages. MSI requires allocating powerof2() messages for
example where MSI-X does not. To address this, split out the MSI-X
support from pci_msi_count() and pci_alloc_msi() into new driver-visible
functions pci_msix_count() and pci_alloc_msix(). As a result,
pci_msi_count() now just returns a count of the max supported MSI
messages for the device, and pci_alloc_msi() only tries to allocate MSI
messages. To get a count of the max supported MSI-X messages, use
pci_msix_count(). To allocate MSI-X messages, use pci_alloc_msix().
pci_release_msi() still handles both MSI and MSI-X messages, however.
As a result of this change, drivers using the existing API will only
use MSI messages and will no longer try to use MSI-X messages.
- Because MSI-X allows for each message to have its own data and address
values (and thus does not require all of the messages to have their
MD vectors allocated as a group), some devices allow for "sparse" use
of MSI-X message slots. For example, if a device supports 8 messages
but the OS is only able to allocate 2 messages, the device may make the
best use of 2 IRQs if it enables the messages at slots 1 and 4 rather
than default of using the first N slots (or indicies) at 1 and 2. To
support this, add a new pci_remap_msix() function that a driver may call
after a successful pci_alloc_msix() (but before allocating any of the
SYS_RES_IRQ resources) to allow the allocated IRQ resources to be
assigned to different message indices. For example, from the earlier
example, after pci_alloc_msix() returned a value of 2, the driver would
call pci_remap_msix() passing in array of integers { 1, 4 } as the
new message indices to use. The rid's for the SYS_RES_IRQ resources
will always match the message indices. Thus, after the call to
pci_remap_msix() the driver would be able to access the first message
in slot 1 at SYS_RES_IRQ rid 1, and the second message at slot 4 at
SYS_RES_IRQ rid 4. Note that the message slots/indices are 1-based
rather than 0-based so that they will always correspond to the rid
values (SYS_RES_IRQ rid 0 is reserved for the legacy INTx interrupt).
To support this API, a new PCIB_REMAP_MSIX() method was added to the
pcib interface to change the message index for a single IRQ.
Tested by: scottl
pcib_alloc_msix() methods instead of using the method from the generic
PCI-PCI bridge driver as the PCI-PCI methods will be gaining some PCI-PCI
specific logic soon.
- Add 3 new functions to the pci_if interface along with suitable wrappers
to provide the device driver visible API:
- pci_alloc_msi(dev, int *count) backed by PCI_ALLOC_MSI(). '*count'
here is an in and out parameter. The driver stores the desired number
of messages in '*count' before calling the function. On success,
'*count' holds the number of messages allocated to the device. Also on
success, the driver can access the messages as SYS_RES_IRQ resources
starting at rid 1. Note that the legacy INTx interrupt resource will
not be available when using MSI. Note that this function will allocate
either MSI or MSI-X messages depending on the devices capabilities and
the 'hw.pci.enable_msix' and 'hw.pci.enable_msi' tunables. Also note
that the driver should activate the memory resource that holds the
MSI-X table and pending bit array (PBA) before calling this function
if the device supports MSI-X.
- pci_release_msi(dev) backed by PCI_RELEASE_MSI(). This function
releases the messages allocated for this device. All of the
SYS_RES_IRQ resources need to be released for this function to succeed.
- pci_msi_count(dev) backed by PCI_MSI_COUNT(). This function returns
the maximum number of MSI or MSI-X messages supported by this device.
MSI-X is preferred if present, but this function will honor the
'hw.pci.enable_msix' and 'hw.pci.enable_msi' tunables. This function
should return the largest value that pci_alloc_msi() can return
(assuming the MD code is able to allocate sufficient backing resources
for all of the messages).
- Add default implementations for these 3 methods to the pci_driver generic
PCI bus driver. (The various other PCI bus drivers such as for ACPI and
OFW will inherit these default implementations.) This default
implementation depends on 4 new pcib_if methods that bubble up through
the PCI bridges to the MD code to allocate IRQ values and perform any
needed MD setup code needed:
- PCIB_ALLOC_MSI() attempts to allocate a group of MSI messages.
- PCIB_RELEASE_MSI() releases a group of MSI messages.
- PCIB_ALLOC_MSIX() attempts to allocate a single MSI-X message.
- PCIB_RELEASE_MSIX() releases a single MSI-X message.
- Add default implementations for these 4 methods that just pass the
request up to the parent bus's parent bridge driver and use the
default implementation in the various MI PCI bridge drivers.
- Add MI functions for use by MD code when managing MSI and MSI-X
interrupts:
- pci_enable_msi(dev, address, data) programs the MSI capability address
and data registers for a group of MSI messages
- pci_enable_msix(dev, index, address, data) initializes a single MSI-X
message in the MSI-X table
- pci_mask_msix(dev, index) masks a single MSI-X message
- pci_unmask_msix(dev, index) unmasks a single MSI-X message
- pci_pending_msix(dev, index) returns true if the specified MSI-X
message is currently pending
- Save the MSI capability address and data registers in the pci_cfgreg
block in a PCI devices ivars and restore the values when a device is
resumed. Note that the MSI-X table is not currently restored during
resume.
- Add constants for MSI-X register offsets and fields.
- Record interesting data about any MSI-X capability blocks we come
across in the pci_cfgreg block in the ivars for PCI devices.
Tested on: em (i386, MSI), bce (amd64/i386, MSI), mpt (amd64, MSI-X)
Reviewed by: scottl, grehan, jfv
MFC after: 2 months
various pcib drivers to use their own private devclass_t variables for
their modules.
- Use the DEFINE_CLASS_0() macro to declare drivers for the various pcib
drivers while I'm here.
with some Dell servers that booted w/o a problem[*] on 5.4, but failed
with 6.0-BETA.
On the PCI bus, when we do lazy resource allocation, we narrow the
range requested as we pass through bridges to reflect how the bridges
are programmed and what addresses they pass. However, when we're
doing an allocation on a bus that's directly connected to a host
bridge, no such translation can take place. We already had a fallback
range for memory requests, but none for ioports. As such, provide a
fallback for I/O ports so we don't allocate location 0, which will
have undesired side effects when the resources are actually used.
This fixes a problem with booting a Dell server with usb in the
kernel. However, it is an unsatisfying solution. I don't like the
hard coded value, and I think we should start narrowing the resources
returned to not be in the so-called isa alias area (where the ranage &
0x0300 must be 0 iirc). Doing such filtering will have to wait for
another day.
This may be a good 6 candidate, maybe after its had a chance to be
refined.
Tested by: glebius@
- Use a new-bus device driver for the ACPI PCI link devices. The devices
are called pci_linkX. The driver includes suspend/resume support so that
the ACPI bridge drivers no longer have to poke the links to get them
to handle suspend/resume. Also, the code to handle which IRQs a link is
routed to and choosing an IRQ when a link is not already routed is all
contained in the link driver. The PCI bridge drivers now ask the link
driver which IRQ to use once they determine that a _PRT entry does not
use a hardwired interrupt number.
- The new link driver includes support for multiple IRQ resources per
link device as well as preserving any non-IRQ resources when adjusting
the IRQ that a link is routed to.
- The entire approach to routing when using a link device is now
link-centric rather than pci bus/device/pin specific. Thus, when
using a tunable to override the default IRQ settings, one now uses
a single tunable to route an entire link rather than routing a single
device that uses the link (which has great foot-shooting potential if
the user tries to route the same link to two different IRQs using two
different pci bus/device/pin hints). For example, to adjust the IRQ
that \_SB_.LNKA uses, one would set 'hw.pci.link.LNKA.irq=10' from the
loader.
- As a side effect of having the link driver, unused link devices will now
be disabled when they are probed.
- The algorithm for choosing an IRQ for a link that doesn't already have an
IRQ assigned is now much closer to the one used in $PIR routing. When a
link is routed via an ISA IRQ, only known-good IRQs that the BIOS has
already used are used for routing instead of using probabilities to
guess at which IRQs are probably not used by an ISA device. One change
from $PIR is that the SCI is always considered a viable ISA IRQ, so that
if the BIOS does not setup any IRQs the kernel will degenerate to routing
all interrupts over the SCI. For non ISA IRQs, interrupts are picked
from the possible pool using a simplistic weighting algorithm.
Tested by: ru, scottl, others on acpi@
Reviewed by: njl
allocate unallocated memory resources from the top 32MB of the address
space rather than the top 2GB. While the latter works on some
chipsets, it fails badly on others. 32MB is more conservative and
matches what cheap harware from this era is hardwired to pass.
incomplete in that the PRT routing was not aware of link programming.
Fix this by doing all routing through the link devices. The new algorithm
for setting up links is:
1. Read _CRS to get current setting. If invalid (not in _PRS), then set
to 0.
2. Attempt to call _DIS on the link. If successful, mark the link as not
routed. Otherwise, assume it still is.
Then when a routing request occurs:
3. Update weights for all IRQs
4. Attempt to route the initial IRQ if valid
5. If that fails, walk through the sorted list, attempting to route IRQs.
6. Configure the trigger/polarity based on _PRS.
Other changes:
* Add acpi_pci_find_prt() to look up the PRT entry for a given device and
acpi_pci_link_route() to select/route the best IRQ for it.
* Remove duplicated code in acpi_pcib_route_interrupt() that picked the
first IRQ from _PRS.
* Remove unneeded arguments from acpi_pcib_resume() and friends.
* Ignore _STA on link devices but report if it seems strange.
* Add a prt_source handle to the PRT structure since the ACPI struct
ACPI_PCI_ROUTING_TABLE uses a fixed-size entry for it. We'll need to
dynamically size this object if we want to use it the same way ACPI-CA
does. Null-terminate the source.
Tested by: Luo Hong <luohong99_at_mails.tsinghua.edu.cn>,
Jeffrey Katcher <jmkatcher_at_yahoo.com>
Info from: jhb, Len Brown (Intel)
this more accurately reflects what the underlying hardware of most
acpi machines that don't have children pci busses.
We still need a better way to get this information from acpi/hardware.
Unify the code to disable GPEs with the enable code. Shutdown is handled
the same way. ACPI now does all wake/sleep prep for child devices so
now they no longer need to call external functions in the suspend/resume
path. Add the flags to non-ACPI busses (i.e., pci).
allocation was passed up to nexus. Now, we probe sysresource objects and
manage the resources they describe in a local rman pool. This helps
devices which attach/detach varying resources (like the _CST object) and
module loads/unloads. The allocation/release routines now check to see if
the resource is described in a child sysresource object and if so,
allocate from the local rman. Sysresource objects add their resources to
the pool and reserve them upon boot. This means sysresources need to be
probed before other ACPI devices.
Changes include:
* Add ordering to the child device probe. The current order is: system
resource objects, embedded controllers, then everything else.
* Make acpi_MatchHid take a handle instead of a device_t arg.
* Replace acpi_{get,set}_resource with the generic equivalents.
o Save and restore bars for suspend/resume as well as for D3->D0
transitions.
o preallocate resources that the PCI devices use to avoid resource
conflicts
o lazy allocation of resources not allocated by the BIOS.
o set unattached drivers to state D3. Set power state to D0
before probe/attach. Right now there's two special cases
for this (display and memory devices) that need work in other
areas of the tree.
Please report any bugs to me.
to PCI bridge can be read be evaluating the _BBN method of the host to PCI
device. Unfortunately, there appear to be some lazy/ignorant/moronic/
whatever BIOS writers that return 0 for _BBN for all host to PCI bridges in
the system. On a system with a single host to PCI bridge this is not a
problem as the child bus of that single bridge will be bus 0 anyway.
However, on systems with multiple host to PCI bridges and l/i/m/w BIOS
writers this is a major problem resulting in all but the first host to
PCI bridge failing to attach. So, this adds a workaround.
If the _BBN of a host to PCI bridge is zero and pcib0 already exists
and is not us, the we use _ADR to look up our PCI function and slot
(we currently assume we are on bus 0) and use that to call
host_pcib_get_busno() to try and extract our bus number from config
registers on the host to PCI bridge device. If that fails, then we make
an evil assumption that ACPI's _SB_ namespace lays out the host to PCI
bridges in ascending order and use our pcib unit number as our bus
number.
Approved by: re
This allocate the best IRQ to boot-disable devices (have IRQ 0).
Allocated IRQ will be used for PCI interrupt routing when ACPI is
enabled.
Note that verbose messaging enabled for the time being so that
people can easily notice the strange behavior if it happened.
- Add an ACPI PCI-PCI bridge driver (the previous driver just handled
Host-PCI bridges) that is a PCI driver that is a subclass of the generic
PCI-PCI bridge driver. It overrides probe, attach, read_ivar, and
pci_route_interrupt.
- The probe routine only succeeds if our parent is an ACPI PCI bus which
we test for by seeing if we can read our ACPI_HANDLE as an ivar.
- The attach routine saves a copy of our handle and calls the new
acpi_pcib_attach_common() function described below.
- The read_ivar routine handles normal PCI-PCI bridge ivars and adds an
ivar to return the ACPI_HANDLE of the bus this bridge represents.
- The route_interrupt routine fetches the _PRT (PCI Interrupt Routing
Table) from the bridge device's softc and passes it off to
acpi_pcib_route_interrupt() to route the interrupt.
- Split the old ACPI Host-PCI bridge driver into two pieces. Part of
the attach routine and most of the route_interrupt routine remain in
acpi_pcib.c and are shared by both ACPI PCI bridge drivers.
- The attach routine verifies the PCI bridge is present, reads in
the _PRT for the bridge, and attaches the child PCI bus.
- The route_interrupt routine uses the passed in _PRT to route a PCI
interrupt.
The rest of the driver is the ACPI Host-PCI bridge specific bits that
live in acpi_pcib_acpi.c.
- We no longer duplicate pcib_maxslots but use it directly.
- The driver now uses the pcib devclass instead of its own devclass.
This means that PCI busses are now only children of pcib devices.
- Allow the ACPI_HANDLE for the child PCI bus to be read as an ivar
of the child bus.
- Fetch the _PRT for routing PCI interrupts directly from our softc
instead of walking the devclass to find ourself and then fetch our
own softc.
With this change and the new ACPI PCI bus driver, ACPI can now properly
route interrupts for devices behind PCI-PCI bridges. That is, the
Itanium2 with like 10 PCI busses can now boot ok and route all the PCI
interrupts. Hopefully this will also fix problems people are having with
CardBus bridges behind PCI-PCI bridges not properly routing interrupts
when ACPI is used.
Tested on: i386, ia64
LNK device (interrupt source provider sort of) is present before using it,
but the code actually tested the status (_STA) of the PCI bridge device
doing the routing, not the actual LNK device. Fix it to check the status
of the LNK device.
Use ACPI_SUCCESS/ACPI_FAILURE consistently.
The AcpiGetInto* interfaces are obsoleted by ACPI_ALLOCATE_BUFFER.
Use _ADR as well as _BBN to get our bus number.
Fix acpi_DeviceIsPresent to check for valid _STA data and to check
the "present" and "functioning" bits.
Use acpi_DeviceIsPresent in acpi_pcib rather than rolling our own
(also broken) version.
in the case where there are no interrupts routed for it does not
contain enough space to use it to route an interrupt. In the case
where we need to route an interrupt, throw away the returned buffer
and create a new one containing the interrupt we want.
PCI bus object. This should deal both with already-routed interrupts
as well as devices that need an interrupt routed.
Note that it *doesn't* deal with interlocked interrupt dependancies, nor
does it select between interrupt options in a smart way. These are
optimisations that need further work.
- Use __func__ instead of __FUNCTION.
- Support power-off to S3 or S5 (takawata)
- Enable ACPI debugging earlier (with a sysinit)
- Fix a deadlock in the EC code (takawata)
- Improve arithmetic and reduce the risk of spurious wakeup in
AcpiOsSleep.
- Add AcpiOsGetThreadId.
- Simplify mutex code (still disabled).
acpi_EvaluateInteger.
Use acpi_EvaluateInteger instead of doing things the hard way where
possible.
AcpiSetSystemSleepState (unofficial) becomes AcpiEnterSleepState.
Use the AcpiGbl_FADT pointer rather than searching for the FADT.
infrastructure. It's not perfect, but it's a lot better than what
we've been using so far. The following rules apply to this:
o BSD component names should be capitalised
o Layer names should be taken from the non-CA set for now. We
may elect to add some new BSD-specific layers later.
- Make it possible to turn off selective debugging flags or layers
by listing them in debug.acpi.layer or debug.acpi.level prefixed
with !.
- Fully implement support for avoiding nodes in the ACPI namespace.
Nodes may be listed in the debug.acpi.avoid environment variable;
these nodes and all their children will be ignored (although still
scanned over) by ACPI functions which scan the namespace. Multiple
nodes can be specified, separated by whitespace.
- Implement support for selectively disabling ACPI subsystem components
via the debug.acpi.disable environment variable. The following
components can be disabled:
o bus creation/scanning of the ACPI 'bus'
o children attachment of children to the ACPI 'bus'
o button the acpi_button control-method button driver
o ec the acpi_ec embedded-controller driver
o isa acpi replacement of PnP BIOS for ISA device discovery
o lid the control-method lid switch driver
o pci pci root-bus discovery
o processor CPU power/speed management
o thermal system temperature detection and control
o timer ACPI timecounter
Multiple components may be disabled by specifying their name(s)
separated by whitespace.
- Add support for ioctl registration. ACPI subsystem components may
register ioctl handlers with the /dev/acpi generic ioctl handler,
allowing us to avoid the need for a multitude of /dev/acpi* control
devices, etc.