- Simplify the amount of work that has be done for each architecture by
pushing more of the truly MI code down into the PCI bus driver.
- Don't bind MSI-X indicies to IRQs so that we can allow a driver to map
multiple MSI-X messages into a single IRQ when handling a message
shortage.
The changes include:
- Add a new pcib_if method: PCIB_MAP_MSI() which is called by the PCI bus
to calculate the address and data values for a given MSI/MSI-X IRQ.
The x86 nexus drivers map this into a call to a new 'msi_map()' function
in msi.c that does the mapping.
- Retire the pcib_if method PCIB_REMAP_MSIX() and remove the 'index'
parameter from PCIB_ALLOC_MSIX(). MD code no longer has any knowledge
of the MSI-X index for a given MSI-X IRQ.
- The PCI bus driver now stores more MSI-X state in a child's ivars.
Specifically, it now stores an array of IRQs (called "message vectors" in
the code) that have associated address and data values, and a small
virtual version of the MSI-X table that specifies the message vector
that a given MSI-X table entry uses. Sparse mappings are permitted in
the virtual table.
- The PCI bus driver now configures the MSI and MSI-X address/data
registers directly via custom bus_setup_intr() and bus_teardown_intr()
methods. pci_setup_intr() invokes PCIB_MAP_MSI() to determine the
address and data values for a given message as needed. The MD code
no longer has to call back down into the PCI bus code to set these
values from the nexus' bus_setup_intr() handler.
- The PCI bus code provides a callout (pci_remap_msi_irq()) that the MD
code can call to force the PCI bus to re-invoke PCIB_MAP_MSI() to get
new values of the address and data fields for a given IRQ. The x86
MSI code uses this when an MSI IRQ is moved to a different CPU, requiring
a new value of the 'address' field.
- The x86 MSI psuedo-driver loses a lot of code, and in fact the separate
MSI/MSI-X pseudo-PICs are collapsed down into a single MSI PIC driver
since the only remaining diff between the two is a substring in a
bootverbose printf.
- The PCI bus driver will now restore MSI-X state (including programming
entries in the MSI-X table) on device resume.
- The interface for pci_remap_msix() has changed. Instead of accepting
indices for the allocated vectors, it accepts a mini-virtual table
(with a new length parameter). This table is an array of u_ints, where
each value specifies which allocated message vector to use for the
corresponding MSI-X message. A vector of 0 forces a message to not
have an associated IRQ. The device may choose to only use some of the
IRQs assigned, in which case the unused IRQs must be at the "end" and
will be released back to the system. This allows a driver to use the
same remap table for different shortage values. For example, if a driver
wants 4 messages, it can use the same remap table (which only uses the
first two messages) for the cases when it only gets 2 or 3 messages and
in the latter case the PCI bus will release the 3rd IRQ back to the
system.
MFC after: 1 month
- First off, device drivers really do need to know if they are allocating
MSI or MSI-X messages. MSI requires allocating powerof2() messages for
example where MSI-X does not. To address this, split out the MSI-X
support from pci_msi_count() and pci_alloc_msi() into new driver-visible
functions pci_msix_count() and pci_alloc_msix(). As a result,
pci_msi_count() now just returns a count of the max supported MSI
messages for the device, and pci_alloc_msi() only tries to allocate MSI
messages. To get a count of the max supported MSI-X messages, use
pci_msix_count(). To allocate MSI-X messages, use pci_alloc_msix().
pci_release_msi() still handles both MSI and MSI-X messages, however.
As a result of this change, drivers using the existing API will only
use MSI messages and will no longer try to use MSI-X messages.
- Because MSI-X allows for each message to have its own data and address
values (and thus does not require all of the messages to have their
MD vectors allocated as a group), some devices allow for "sparse" use
of MSI-X message slots. For example, if a device supports 8 messages
but the OS is only able to allocate 2 messages, the device may make the
best use of 2 IRQs if it enables the messages at slots 1 and 4 rather
than default of using the first N slots (or indicies) at 1 and 2. To
support this, add a new pci_remap_msix() function that a driver may call
after a successful pci_alloc_msix() (but before allocating any of the
SYS_RES_IRQ resources) to allow the allocated IRQ resources to be
assigned to different message indices. For example, from the earlier
example, after pci_alloc_msix() returned a value of 2, the driver would
call pci_remap_msix() passing in array of integers { 1, 4 } as the
new message indices to use. The rid's for the SYS_RES_IRQ resources
will always match the message indices. Thus, after the call to
pci_remap_msix() the driver would be able to access the first message
in slot 1 at SYS_RES_IRQ rid 1, and the second message at slot 4 at
SYS_RES_IRQ rid 4. Note that the message slots/indices are 1-based
rather than 0-based so that they will always correspond to the rid
values (SYS_RES_IRQ rid 0 is reserved for the legacy INTx interrupt).
To support this API, a new PCIB_REMAP_MSIX() method was added to the
pcib interface to change the message index for a single IRQ.
Tested by: scottl
- Add 3 new functions to the pci_if interface along with suitable wrappers
to provide the device driver visible API:
- pci_alloc_msi(dev, int *count) backed by PCI_ALLOC_MSI(). '*count'
here is an in and out parameter. The driver stores the desired number
of messages in '*count' before calling the function. On success,
'*count' holds the number of messages allocated to the device. Also on
success, the driver can access the messages as SYS_RES_IRQ resources
starting at rid 1. Note that the legacy INTx interrupt resource will
not be available when using MSI. Note that this function will allocate
either MSI or MSI-X messages depending on the devices capabilities and
the 'hw.pci.enable_msix' and 'hw.pci.enable_msi' tunables. Also note
that the driver should activate the memory resource that holds the
MSI-X table and pending bit array (PBA) before calling this function
if the device supports MSI-X.
- pci_release_msi(dev) backed by PCI_RELEASE_MSI(). This function
releases the messages allocated for this device. All of the
SYS_RES_IRQ resources need to be released for this function to succeed.
- pci_msi_count(dev) backed by PCI_MSI_COUNT(). This function returns
the maximum number of MSI or MSI-X messages supported by this device.
MSI-X is preferred if present, but this function will honor the
'hw.pci.enable_msix' and 'hw.pci.enable_msi' tunables. This function
should return the largest value that pci_alloc_msi() can return
(assuming the MD code is able to allocate sufficient backing resources
for all of the messages).
- Add default implementations for these 3 methods to the pci_driver generic
PCI bus driver. (The various other PCI bus drivers such as for ACPI and
OFW will inherit these default implementations.) This default
implementation depends on 4 new pcib_if methods that bubble up through
the PCI bridges to the MD code to allocate IRQ values and perform any
needed MD setup code needed:
- PCIB_ALLOC_MSI() attempts to allocate a group of MSI messages.
- PCIB_RELEASE_MSI() releases a group of MSI messages.
- PCIB_ALLOC_MSIX() attempts to allocate a single MSI-X message.
- PCIB_RELEASE_MSIX() releases a single MSI-X message.
- Add default implementations for these 4 methods that just pass the
request up to the parent bus's parent bridge driver and use the
default implementation in the various MI PCI bridge drivers.
- Add MI functions for use by MD code when managing MSI and MSI-X
interrupts:
- pci_enable_msi(dev, address, data) programs the MSI capability address
and data registers for a group of MSI messages
- pci_enable_msix(dev, index, address, data) initializes a single MSI-X
message in the MSI-X table
- pci_mask_msix(dev, index) masks a single MSI-X message
- pci_unmask_msix(dev, index) unmasks a single MSI-X message
- pci_pending_msix(dev, index) returns true if the specified MSI-X
message is currently pending
- Save the MSI capability address and data registers in the pci_cfgreg
block in a PCI devices ivars and restore the values when a device is
resumed. Note that the MSI-X table is not currently restored during
resume.
- Add constants for MSI-X register offsets and fields.
- Record interesting data about any MSI-X capability blocks we come
across in the pci_cfgreg block in the ivars for PCI devices.
Tested on: em (i386, MSI), bce (amd64/i386, MSI), mpt (amd64, MSI-X)
Reviewed by: scottl, grehan, jfv
MFC after: 2 months
to search for a specific extended capability. If the specified capability
is found for the given device, then the function returns success and
optionally returns the offset of that capability. If the capability is
not found, the function returns an error.
interrupt to be used for a device. This is intended solely for internal
use of PCI bus implementations, and exists so that PCI bus drivers
implementing special interrupt assignment methods which require
additional work at the bus level to work right can be easily derived
from the generic driver (or any other one) without resorting to hacks.
It will be used in the sparc64 ofw_pcibus driver, which will be
committed shortly.
Make use of this method in the generic implementation, and add it to
the method table of bus drivers derived from the PCI one.
Reviewed by: imp, -hackers
non-device code.
* Re-implement the method dispatch to improve efficiency. The new system
takes about 40ns for a method dispatch on a 300Mhz PII which is only
10ns slower than a direct function call on the same hardware.
This changes the new-bus ABI slightly so make sure you re-compile any
driver modules which you use.
i386 platform boots, it is no longer ISA-centric, and is fully dynamic.
Most old drivers compile and run without modification via 'compatability
shims' to enable a smoother transition. eisa, isapnp and pccard* are
not yet using the new resource manager. Once fully converted, all drivers
will be loadable, including PCI and ISA.
(Some other changes appear to have snuck in, including a port of Soren's
ATA driver to the Alpha. Soren, back this out if you need to.)
This is a checkpoint of work-in-progress, but is quite functional.
The bulk of the work was done over the last few years by Doug Rabson and
Garrett Wollman.
Approved by: core