- Implement timing out of VPD register access.[1]
- Fix an off-by-one error of freeing malloc'd space when checksum is invalid.
- Fix style(9) bugs, i.e., sizeof cannot be followed by space.
- Retire now obsolete 'hw.pci.enable_vpd' tunable.
Submitted by: cokane (initial revision)[1]
Reviewed by: marius (intermediate revision)
Silence from: jhb, jmg, rwatson
Tested by: cokane, jkim
MFC after: 3 days
the PCIOCGETCONF, PCIOCREAD and PCIOCWRITE IOCTLs, which was broken
with the introduction of PCI domain support.
As the size of struct pci_conf_io wasn't changed with that commit,
this unfortunately requires the ABI of PCIOCGETCONF to be broken
again in order to be able to provide backwards compatibility to
the old version of that IOCTL.
Requested by: imp
Discussed with: re (kensmith)
Reviewed by: PCI maintainers (imp, jhb)
MFC after: 5 days
support machines having multiple independently numbered PCI domains
and don't support reenumeration without ambiguity amongst the
devices as seen by the OS and represented by PCI location strings.
This includes introducing a function pci_find_dbsf(9) which works
like pci_find_bsf(9) but additionally takes a domain number argument
and limiting pci_find_bsf(9) to only search devices in domain 0 (the
only domain in single-domain systems). Bge(4) and ofw_pcibus(4) are
changed to use pci_find_dbsf(9) instead of pci_find_bsf(9) in order
to no longer report false positives when searching for siblings and
dupe devices in the same domain respectively.
Along with this change the sole host-PCI bridge driver converted to
actually make use of PCI domain support is uninorth(4), the others
continue to use domain 0 only for now and need to be converted as
appropriate later on.
Note that this means that the format of the location strings as used
by pciconf(8) has been changed and that consumers of <sys/pciio.h>
potentially need to be recompiled.
Suggested by: jhb
Reviewed by: grehan, jhb, marcel
Approved by: re (kensmith), jhb (PCI maintainer hat)
the duration of the function. The device we would otherwise
have left in an useless state may just as well be the low-level
console. When booting verbose, we do need it addressable if we
want to avoid a MCA.
Approved by: re (kensmith)
device's, not the bridge's, softc to be used to check the
PCIB_DISABLE_MSI flag. This resulted in randomly allowing
or denying MSI interrupts based on whatever value the driver
happened to store at sizeof(device_t) bytes into its softc.
I noticed this when I stopped getting MSI interrupts
after slighly re-arranging mxge's softc yesterday.
the power_nodriver tunable is off. pci_cfg_save() already checks the
tunable internally, and no other callers of pci_cfg_save() check the
tunable.
Reviewed by: imp
- Simplify the amount of work that has be done for each architecture by
pushing more of the truly MI code down into the PCI bus driver.
- Don't bind MSI-X indicies to IRQs so that we can allow a driver to map
multiple MSI-X messages into a single IRQ when handling a message
shortage.
The changes include:
- Add a new pcib_if method: PCIB_MAP_MSI() which is called by the PCI bus
to calculate the address and data values for a given MSI/MSI-X IRQ.
The x86 nexus drivers map this into a call to a new 'msi_map()' function
in msi.c that does the mapping.
- Retire the pcib_if method PCIB_REMAP_MSIX() and remove the 'index'
parameter from PCIB_ALLOC_MSIX(). MD code no longer has any knowledge
of the MSI-X index for a given MSI-X IRQ.
- The PCI bus driver now stores more MSI-X state in a child's ivars.
Specifically, it now stores an array of IRQs (called "message vectors" in
the code) that have associated address and data values, and a small
virtual version of the MSI-X table that specifies the message vector
that a given MSI-X table entry uses. Sparse mappings are permitted in
the virtual table.
- The PCI bus driver now configures the MSI and MSI-X address/data
registers directly via custom bus_setup_intr() and bus_teardown_intr()
methods. pci_setup_intr() invokes PCIB_MAP_MSI() to determine the
address and data values for a given message as needed. The MD code
no longer has to call back down into the PCI bus code to set these
values from the nexus' bus_setup_intr() handler.
- The PCI bus code provides a callout (pci_remap_msi_irq()) that the MD
code can call to force the PCI bus to re-invoke PCIB_MAP_MSI() to get
new values of the address and data fields for a given IRQ. The x86
MSI code uses this when an MSI IRQ is moved to a different CPU, requiring
a new value of the 'address' field.
- The x86 MSI psuedo-driver loses a lot of code, and in fact the separate
MSI/MSI-X pseudo-PICs are collapsed down into a single MSI PIC driver
since the only remaining diff between the two is a substring in a
bootverbose printf.
- The PCI bus driver will now restore MSI-X state (including programming
entries in the MSI-X table) on device resume.
- The interface for pci_remap_msix() has changed. Instead of accepting
indices for the allocated vectors, it accepts a mini-virtual table
(with a new length parameter). This table is an array of u_ints, where
each value specifies which allocated message vector to use for the
corresponding MSI-X message. A vector of 0 forces a message to not
have an associated IRQ. The device may choose to only use some of the
IRQs assigned, in which case the unused IRQs must be at the "end" and
will be released back to the system. This allows a driver to use the
same remap table for different shortage values. For example, if a driver
wants 4 messages, it can use the same remap table (which only uses the
first two messages) for the cases when it only gets 2 or 3 messages and
in the latter case the PCI bus will release the 3rd IRQ back to the
system.
MFC after: 1 month
that the MSI mapping window is fixed at 0xfee00000 and the capability
does not include two more dwords used to program the address. Supporting
this mostly results in quieting spurious warnings during boot about
non-default MSI mapping windows.
- HT 2.00b also added a new HT capability type, so support that in pciconf.
MFC after: 3 days
Tested by: jmg
it via pci_get_vpd_*() rather than always reading it for each device during
boot. I've left the tunable so that it can still be turned off if a device
driver causes a lockup via a query to a broken device, but devices whose
drivers do not use VPD (the vast majority) should no longer result in
lockups during boot, and most folks should not need to tweak the tunable
now.
Tested on: bge(4)
Silence from: jmg
blacklist a bunch of old chipsets. If a system contains a PCI-PCI bridge
that supports PCI-X, assume the chipset supports PCI-X. If a system
contains a PCI-express root port, assume the chipset supports PCI-express.
If the chipset doesn't support either PCI-X or PCI-express, then blacklist
it by default. We should now only need to explicitly blacklist PCI-X or
PCI-express chipsets that don't properly handle MSI.
broke the method as all the MSI-X table indices were off by one in
the backend MD code.
- Fix a cosmetic nit in the bootverbose printf in pci_alloc_msix_method().
tunable allowing automatic parsing of VPD data to be disabled. The
default is left as-is; if you are having problems with hard hangs at boot
due to VPD, try setting hw.pci.enable_vpd=0. A proper architectural
solution has been under discussion for some time, but this allows me to
boot my test machines in the mean time.
Submitted by: bz
Head nod: jmg
- First off, device drivers really do need to know if they are allocating
MSI or MSI-X messages. MSI requires allocating powerof2() messages for
example where MSI-X does not. To address this, split out the MSI-X
support from pci_msi_count() and pci_alloc_msi() into new driver-visible
functions pci_msix_count() and pci_alloc_msix(). As a result,
pci_msi_count() now just returns a count of the max supported MSI
messages for the device, and pci_alloc_msi() only tries to allocate MSI
messages. To get a count of the max supported MSI-X messages, use
pci_msix_count(). To allocate MSI-X messages, use pci_alloc_msix().
pci_release_msi() still handles both MSI and MSI-X messages, however.
As a result of this change, drivers using the existing API will only
use MSI messages and will no longer try to use MSI-X messages.
- Because MSI-X allows for each message to have its own data and address
values (and thus does not require all of the messages to have their
MD vectors allocated as a group), some devices allow for "sparse" use
of MSI-X message slots. For example, if a device supports 8 messages
but the OS is only able to allocate 2 messages, the device may make the
best use of 2 IRQs if it enables the messages at slots 1 and 4 rather
than default of using the first N slots (or indicies) at 1 and 2. To
support this, add a new pci_remap_msix() function that a driver may call
after a successful pci_alloc_msix() (but before allocating any of the
SYS_RES_IRQ resources) to allow the allocated IRQ resources to be
assigned to different message indices. For example, from the earlier
example, after pci_alloc_msix() returned a value of 2, the driver would
call pci_remap_msix() passing in array of integers { 1, 4 } as the
new message indices to use. The rid's for the SYS_RES_IRQ resources
will always match the message indices. Thus, after the call to
pci_remap_msix() the driver would be able to access the first message
in slot 1 at SYS_RES_IRQ rid 1, and the second message at slot 4 at
SYS_RES_IRQ rid 4. Note that the message slots/indices are 1-based
rather than 0-based so that they will always correspond to the rid
values (SYS_RES_IRQ rid 0 is reserved for the legacy INTx interrupt).
To support this API, a new PCIB_REMAP_MSIX() method was added to the
pcib interface to change the message index for a single IRQ.
Tested by: scottl
capability rather than hardcoded offsets for a particular card. While
I'm here, expand the constants some.
- Change the ahd(4) driver to use pci_find_extcap() to locate the PCI-X
capability to keep up with the first change.
Reviewed by: scottl, gibbs (earlier version)
- Retire the PCI_SUB*_1 constants and don't try to read a subvendor ID out
of them. There isn't a standard subvendor ID field for PCI-PCI bridges.
Instead, the dword at offset 0x34 is actually mostly reserved except for
the LSB which is the capabilities pointer.
- Add support for the PCI-PCI bridge subvendor ID capability (13) and use
it to set the subvendor ID for PCI-PCI bridges.
MFC after: 1 month
bridge if it doesn't pass MSI messages up correctly. We set the flag
in pcib_attach() if the device ID is disabled via a PCI quirk.
- Disable MSI for devices behind the AMD 8131 HT-PCIX bridge. Linux has
the same quirk.
Tested by: no one despite repeated calls for testers
work:
- A new PCI quirk (PCI_QUIRK_DISABLE_MSI) is added to the quirk table.
- A new pci_msi_device_blacklisted() determines if a passed in device
matches an MSI quirk in the quirk table. This can be overridden (all
quirks ignored) by setting the hw.pci.honor_msi_blacklist to 0.
- A global blacklist check is performed in the MI PCI bus code by checking
to see if the device at 0:0:0 is blacklisted.
Tested by: jdp
subtypes of HT capabilities.
- Add constants for the MSI mapping window HT PCI capability.
- On i386 and amd64, enable the MSI mapping window on any HT bridges we
encounter and report any non-standard mapping window addresses.
- Add 3 new functions to the pci_if interface along with suitable wrappers
to provide the device driver visible API:
- pci_alloc_msi(dev, int *count) backed by PCI_ALLOC_MSI(). '*count'
here is an in and out parameter. The driver stores the desired number
of messages in '*count' before calling the function. On success,
'*count' holds the number of messages allocated to the device. Also on
success, the driver can access the messages as SYS_RES_IRQ resources
starting at rid 1. Note that the legacy INTx interrupt resource will
not be available when using MSI. Note that this function will allocate
either MSI or MSI-X messages depending on the devices capabilities and
the 'hw.pci.enable_msix' and 'hw.pci.enable_msi' tunables. Also note
that the driver should activate the memory resource that holds the
MSI-X table and pending bit array (PBA) before calling this function
if the device supports MSI-X.
- pci_release_msi(dev) backed by PCI_RELEASE_MSI(). This function
releases the messages allocated for this device. All of the
SYS_RES_IRQ resources need to be released for this function to succeed.
- pci_msi_count(dev) backed by PCI_MSI_COUNT(). This function returns
the maximum number of MSI or MSI-X messages supported by this device.
MSI-X is preferred if present, but this function will honor the
'hw.pci.enable_msix' and 'hw.pci.enable_msi' tunables. This function
should return the largest value that pci_alloc_msi() can return
(assuming the MD code is able to allocate sufficient backing resources
for all of the messages).
- Add default implementations for these 3 methods to the pci_driver generic
PCI bus driver. (The various other PCI bus drivers such as for ACPI and
OFW will inherit these default implementations.) This default
implementation depends on 4 new pcib_if methods that bubble up through
the PCI bridges to the MD code to allocate IRQ values and perform any
needed MD setup code needed:
- PCIB_ALLOC_MSI() attempts to allocate a group of MSI messages.
- PCIB_RELEASE_MSI() releases a group of MSI messages.
- PCIB_ALLOC_MSIX() attempts to allocate a single MSI-X message.
- PCIB_RELEASE_MSIX() releases a single MSI-X message.
- Add default implementations for these 4 methods that just pass the
request up to the parent bus's parent bridge driver and use the
default implementation in the various MI PCI bridge drivers.
- Add MI functions for use by MD code when managing MSI and MSI-X
interrupts:
- pci_enable_msi(dev, address, data) programs the MSI capability address
and data registers for a group of MSI messages
- pci_enable_msix(dev, index, address, data) initializes a single MSI-X
message in the MSI-X table
- pci_mask_msix(dev, index) masks a single MSI-X message
- pci_unmask_msix(dev, index) unmasks a single MSI-X message
- pci_pending_msix(dev, index) returns true if the specified MSI-X
message is currently pending
- Save the MSI capability address and data registers in the pci_cfgreg
block in a PCI devices ivars and restore the values when a device is
resumed. Note that the MSI-X table is not currently restored during
resume.
- Add constants for MSI-X register offsets and fields.
- Record interesting data about any MSI-X capability blocks we come
across in the pci_cfgreg block in the ivars for PCI devices.
Tested on: em (i386, MSI), bce (amd64/i386, MSI), mpt (amd64, MSI-X)
Reviewed by: scottl, grehan, jfv
MFC after: 2 months
only those bars that had addresses assigned by the BIOS and where the
bridges were properly programmed. Now even unprogrammed ones work.
This was needed for sun4v. We still only implement up to 2GB memory
ranges, even for 64-bit bars. PCI standards at least through 2.2 say
that this is the max (or 1GB is, I only know it is < 32bits).
o Always define pci_addr_t as uint64_t. A pci address is always 64-bits,
but some hosts can't address all of them.
o Preserve the upper half of the 64-bit word during resource probing.
o Test to make sure that 64-bit values can fit in a u_long (true on some
platforms, but not others). Don't use those that can't.
o minor pedantry about data sizes.
o Better bridge resource reporting in bootverbose case.
o Minor formatting changes to cope with different data types on different
platforms.
Submitted by: jmg, with many changes by me to fully support 64-bit
addresses.
If the length is zero, catch this early, instead of making dflen go negative
and letting bad things happen... We also check to see if RV (checksum) is
0, and handle that has a checksum failure...
Properly handle checksum failures by not processing read-write VPD data,
and removing all the found read-only data...
Tested by: oleg (dflen going negative)
install custom pager functions didn't actually happen in practice (they
all just used the simple pager and passed in a local quit pointer). So,
just hardcode the simple pager as the only pager and make it set a global
db_pager_quit flag that db commands can check when the user hits 'q' (or a
suitable variant) at the pager prompt. Also, now that it's easy to do so,
enable paging by default for all ddb commands. Any command that wishes to
honor the quit flag can do so by checking db_pager_quit. Note that the
pager can also be effectively disabled by setting $lines to 0.
Other fixes:
- 'show idt' on i386 and pc98 now actually checks the quit flag and
terminates early.
- 'show intr' now actually checks the quit flag and terminates early.
Lower the minimum for memory mapped I/O from 32 bytes to 16 bytes.
This fixes bus enumeration on ia64 now that the Diva auxiliary
serial port is attached to.
capability is present as not all devices supported by the agp_i810 driver
(such as i915) have the AGP capability. Instead, add an identify routine
to the agp_i810 driver that uses the PCI ID to determine if it should
create an agp child device.
share devclass pointers, a mistake I've encouraged in the past) and
move the declaration of the pci_driver kobj class from cardbus.c to
pci_private.h so that other drivers can inherit from pci_driver.
various pcib drivers to use their own private devclass_t variables for
their modules.
- Use the DEFINE_CLASS_0() macro to declare drivers for the various pcib
drivers while I'm here.
force allocation of unallocated BARs (cardbus uses this to preallocate
everything). Add a prefetchmask to allow for busses that get prefetch
hints to set them. Addjust pci_add_map and pci_ata_maps to take a new
force flag which pci_add_resources will pass in. Implement 'force' in
pci_add_map. Write new value of allocated resource into the bar, if
the allocation succeeded (we should have done this before, but with
the new force the bug was very obvious).
as a bus so that other drivers such as drm(4), acpi_video(4), and agp(4)
can attach to it thus allowing multiple drivers for the same device. It
also removes the need for the drmsub hack for the i8[13]0/i915 drm and agp
drivers.
duplicated anyways) and into a single MI driver. Extend the driver a bit
to implement the bus and PCI kobj interfaces such that other drivers can
attach to it and transparently act as if their parent device is the PCI
bus (for the most part).
to search for a specific extended capability. If the specified capability
is found for the given device, then the function returns success and
optionally returns the offset of that capability. If the capability is
not found, the function returns an error.
actual resource values we received from the system rather than the range
we requested. Since we request a range starting at 0, we would record
that number. Later, since this == 0, we'd allocate again. However,
we wouldn't write the new resource into the BAR. This resulted in
a resource leak as well as a BAR that couldn't access the resource at
all since rman_get_start, et al, were wrong.
MFC After: 1 week (assuming RELENG_6 is open for business)
of the PCIR_HDRTYPE register. It's the value returned from this
read access that determines whether or not we decide a device is
present at the current slot index. For some reason that I can't
adequately explain, this read fails on my machine when probing the
USB controller on my machine (which happens a multifunction device
at slot index 3 hung off the PCI-PCI bridge on the AMD8111 (bus
index 1)). The read will return 0xFF even though it should return
0x80 to indicate the presence of a multifunction device.
As near as I can tell, there's some timing issue involved with reading
the 'dead' slot indexes 0 through 2 that causes the read of the actual
device at slot 3 to fail. I tried a couple of different tricks to
correct the problem (the patch to amd64/pci/pci_cfgreg.c fixes it
for the amd64 arch), but adding this delay is the only thing that
always allows the USB controllers to be correctly probed 100% of the
time. Whatever the problem is, it's likely confined to the AMD8111
chipset. However, a simple 1us delay is fairly harmless and should
have no side effects for other hardware. I consider this to be
voodoo, but it's fairly benign voodoo and it makes my USB keyboard
and mouse work again.
Note that this is the second time that I've had to resort to a
1us delay to fix a PCI-related problem with this AMD8111/Opteron
system (the first being a fix I made a while back to the NDISulator).
It's possible the delay really belongs in the cfgreg code itself,
or that pci_cfgreg needs some custom hackery for an errata in the
8111. (I checked but couldn't find any documented errata on AMD's
site that could account for these problems.)
routing, etc. in a static pci_assign_interrupt() function.
- Add a sledgehammer that allows the user to override the interrupt
assignment of any PCI device via a tunable (e.g. "hw.pci0.7.INTB=5" would
force any functions on the pci device in slot 7 of bus 0 that use B# to
use IRQ 5). This should be used with great caution! Generally, if the
interrupt routing in use provides specific tunables (such as hard-wiring
the IRQ for a given $PIR or ACPI PCI link device), then those should be
used instead. One instance where this tunable might be useful is if a
box has an MPTable with duplicate entries for the same PCI device with
different IRQs.
MFC after: 1 week
the Intel 82371AB PCI-ISA bridge. We now do this all the time for the
!APIC case in the atpic driver. This cuts the raw line count for this
driver by about 40%.
MFC after: 1 week
has been removed. It has been replaced by hw.pci.do_power_nodriver
and hw.pci.do_power_resume. The former defaults to 0 while the latter
defaults to 1.
When do_powerstate was set to 0, it broke suspend/resume for a lot of
people as an unintended consequence. This change will only affect the
areas that were intended to affect. This change will have no effect on
servers, but will help laptops quite a bit.
MFC After: 3 days.
same as today: do no power management. 1 means be conservative about
what you power down (any device class that has caused problems gets
added here). 2 means be agressive about what gets powered down (any
device class that's fundamental to the system is here). 3 means power
them all down, reguardless. The default is 1.
The effect in the default system is to add mass storage devices to the
list that we don't power down. From all the pciconf -l lists that
I've seen for the aac and amr issue, the bad device has been a mass
storage device class.
This is an attempt at a compromise between the very small number of
systems that have extreme issues with powerdown, and the very large
number of systems that gain real benefits from powerdown (I get about
20% more battery life when I attach a minimal set of drivers on my
Sony). Hopefully it will strike the proper balance.
MFC After: 3 days (before next beta)
return the correct bar size if we encountered a 64-bit BAR that had
its resources already assigned. If the resources weren't yet
assigned, we'd bogusly assume it was a 32-bit bar and return 1.
not exsist, do not have ioctl return an error, but instead set -1
in the data returned to the user. This allows the HP bios flash
utilities to work without requiring changes to their code.
Reviewed by: jhb
against 0 in pci_alloc_map, just like we do in pci_add_map. Also,
make sure that we restore the value to the BAR that was there before
if the bar is 0. Chances are that it was 0 before the write too and
that the restoration is a nop, but better safe than sorry.
Notice by: dwhite
we are processing has a base address of zero. Note that this will only
change behavior for devices where all the BARs of a given type have a base
address of 0 since we will enable the appropriate access when we encounter
the first BAR with a base that is not 0. Specifically, this allows certain
Toshiba laptops to no longer require 'hw.pci.enable_io_modes=0' to avoid
hangs during boot.
PR: kern/20040
PR: i386/63776 (possibly)
PR: i386/68900 (possibly)
PR: i386/74532 (possibly)
MFC after: 1 week
theoretically unload pci bridges or pci drivers. It will also allow
detach to work if one needed to detach a subtree.
This is inspired by looking at the p4 commits from bms to his 5.4
tree, but I didn't look at the final results.
for the VGA I/O or memory ranges, when it's not within the default
ranges decoded by the bridge. When allocation for VGA addresses is
attempted, check that the bridge has the VGA Enable bit set before
allowing it.
As such, newbusified VGA drivers can allocate their resources when
the VGA adapter is behind a PCI-to-PCI bridge.
Reviewed by: imp@, jhb@
printf's during a verbose boot is more intuitive (the BAR listings and
interrupt routing info now comes after the config header dump rather than
just before it).
Otherwise, busses that implement the pcib interface that forget to
implement pcib_route_interrupt would return EIO, which the caller
interprets as 'use interrupt 6'. This is likely the cause of much of
the grief that we had when I enabled power modes for the cardbus
bridge, since the card needed to reroute the interrupt to it and it
was getting 6 which was d by the pccbb sanity checks.
last in the list rather than first.
This makes the resouces print in the 4.x order rather than the 5.x order
(eg fdc0 at 0x3f0-0x3f5,0x3f7 is 4.x, but 0x3f7,0x3f0-0x3f5 is 5.x). This
also means that the pci code will once again print the resources in BAR
ascending order.
like a valid range. We already do this in the memory case (although
the code there is somewhat different than the I/o case because we have
to deal with different kinds of memory). Since most laptops don't
have non-subtractive bridges, this wasn't seen in practice.
Evidentally the Compaq R3000 hits this problem with PC Cards.
Some minor style fixes while I'm here.
Submitted by: Jung-uk Kim
suggested by Peter Edwards. This seems to fix my fxp problems and
likely will fix his as well. Use DELAY rather than *sleep because we
can be called from any context.
the PCI bus. We presently have no drivers for these devices, so they
are powered down. This is undesirable behavior since it breaks the
system when the base peripherals go away suddenly in the middle of
boot.
# if we ever get generic drivers for memory and/or base peripherals, then
# we can remove the tests here.
back on again in resume. Override the default of D3 with the value the
BIOS specifies in _SxD, if present. Skip serial devices (PNP05xx) since
they seem to hang when set to D3 and may require special driver support.
Also, skip non-type 0 PCI devices (i.e., bridges) since our we don't yet
save/restore their config space and that seems to be necessary.
If this gives you trouble with suspend/resume, you can disable the new
ACPI and PCI power behavior separately with these tunables & sysctls:
debug.acpi.do_powerstate
hw.pci.do_powerstate
Approved by: imp (pci)
Tested by: acpi@ (numerous)
control the number of lines per page rather than a constant. The variable
can be examined and changed in ddb as '$lines'. Setting the variable to
0 will effectively turn off paging.
- Change db_putchar() to force out pending whitespace before outputting
newlines and carriage returns so that one can rub out content on the
current line via '\r \r' type strings.
- Change the simple pager to rub out the --More-- prompt explicitly when
the routine exits.
- Add some aliases to the simple pager to make it more compatible with
more(1): 'e' and 'j' do a single line. 'd' does half a page, and
'f' does a full page.
MFC after: 1 month
Inspired by: kris
PCI native addressing. That means that if the HW says that using "real"
addresses instead of the hardwired legacy compat ones is allowed, we will
use them.
in the various pci specifications as readonly. vendor, subvendor,
device and subdevice are required to be loaded in hardware by some
means that isn't the system BIOS or other system software (although
some devices do have ways of accomplishing this). class and subclass
are defined to be read-only in section 6.2.1 (v2.2). Apart from the
status register, which we weren't touching, these are the only
read-only registers I could find in the 2.2 spec.
progif is also defined as being read-only in section 6.2.1. However,
the PCI IDE programming document specifically states that some of the
bits are read/write. Since we may have to restore registers before we
have a driver attached, go ahead and restore this one byte when
transitioning between D3 and D0.
The PCI spec also says that writes to reserved and unimplemented
registers must be completed normally. It makes no statements about
writes to read-only registers, so be as conservative as possible,
while covering the exception to the rule that is documented in a
subpart of the standard.
Requested by: socttl
Split the baby. For idepci devices, now both legacy mode bits need
not be set. We can run an idepci in a split mode. However, it only
works better than before, not works. It works better in that when one
device is legacy and the other isn't and disabled, we now operate
correctly.
sos submitted a version of this patch.
subclass, progif and revid. While these are typically read
only fields, they aren't always read-only. progif is writable
for ata devices, for example. It does no harm when they are
read only, and helps when they aren't.
chattiness was left in for debugging, but now that nearly all of the
problems relating to the changes have been fixed, it is only annoying. It
is still available via bootverbose.
Prodded by: jhb
1) In pci.c, we need to check the child device's state, not the parent
device's state.
2) In acpi_pci.c, we have to run the power state change after the acpi
method when the old_state is > new state, not the other way around.
Submitted by: Dmitry Remesov
PR: 65694
resource pre-allocation. The problem is that the BARs of the EBus bridges
contain the ranges for the resources for the EBus devices beyond the bridge.
So when the EBus code tries to allocate the resource for an EBus device
it's already allocated by the PCI code.
To be removed again as soon as we have a proper solution in the EBus Code.
Reviewed by: tmm
Approved by: marcel (mentor)
While I would have prefered to have a solution that didn't move
knowledge of this into the pci layer. However, this is literally the
only exception that's listed in the PCI standard to the usual way of
decoding BARs. atapci devices in legacy mode now ignore the first 4
bars and hard code the values to the legacy ide values (well, for each
of the controllers that are in legacy mode). The 5th bar is handled
normally.
Remove the zero bar handling. zero bars should be ignored at all
other times, and since we handle that specially, we don't need the
older workaround.
the sense that any write to them reads back as a 0. This presents a
problem to our resource allocation scheme. If we encounter such vars,
the code now treats them as special, allowing any allocation against
them to succeed. I've not seen anything in the standard to clearify
what host software should do when it encounters these sorts of BARs.
Also cleaned up some output while I'm here and add commmented out
bootverbose lines until I'm ready to reduce the verbosity of boot
messages.
This gets a number of south bridges and ata controllers made mostly by
VIA, AMD and nVidia working again. Thanks to Soren Schmidt for his
help in coming up with this patch.
we get the resource allocation stuff hammered out.
Fix and off by one error that caused unnecessary filtering of valid
BARs for only 4 bytes than ICH3 and other PCI IDE controllers have.
Andrew Gallatin submitted this, although it doesn't solve the problems
ICH3 controllers have with the new code, it does restore the former
resource list on the probe line.
device in D0 to D0, that's a no-op, however the messages seem to be
confusing some people. Eventually, these messages will be parked
behind a if (bootverbose).
# I don't think this will fix any real bugs...
o Save and restore bars for suspend/resume as well as for D3->D0
transitions.
o preallocate resources that the PCI devices use to avoid resource
conflicts
o lazy allocation of resources not allocated by the BIOS.
o set unattached drivers to state D3. Set power state to D0
before probe/attach. Right now there's two special cases
for this (display and memory devices) that need work in other
areas of the tree.
Please report any bugs to me.
Introduce d_version field in struct cdevsw, this must always be
initialized to D_VERSION.
Flip sense of D_NOGIANT flag to D_NEEDGIANT, this involves removing
four D_NOGIANT flags and adding 145 D_NEEDGIANT flags.
pain and suffering. Attempt to back it out by removing the 'if the
requested range is larger than the window, clip to the window' code.
This is a band-aide until the issues are better understood and the
issues with the lazy allocation patches are resolved.
signals to addresses to the child busses. Typically, ProgIf of 1
means a subtractive bridge. However, Intel has a whole lot of ones
with a ProgIf of 80 that are also subtractive. We cope with these
bridges too. This eliminates hw.pci.allow_unsupported_io_range
because that had almost the same effect as these patches (almost means
'buggy'). Remove the bogus checks for ISA bus locations: these cycles
aren't special and are only passed by transparent bridges.
We allow any range to succeed. If the range is a superset of the
range that's decoded, trim the resource to that range. Otherwise,
pass the range unchanged. This will change the location that PC Card
and CardBus cards are attached. This might bogusly cause some
overlapping allocation that wasn't present before, but the overlapping
fixes need to be in the pci level.
There's also a few formatting changes here.
- This is heavily derived from John Baldwin's apic/pci cleanup on i386.
- I have completely rewritten or drastically cleaned up some other parts.
(in particular, bootstrap)
- This is still a WIP. It seems that there are some highly bogus bioses
on nVidia nForce3-150 boards. I can't stress how broken these boards
are. I have a workaround in mind, but right now the Asus SK8N is broken.
The Gigabyte K8NPro (nVidia based) is also mind-numbingly hosed.
- Most of my testing has been with SCHED_ULE. SCHED_4BSD works.
- the apic and acpi components are 'standard'.
- If you have an nVidia nForce3-150 board, you are stuck with 'device
atpic' in addition, because they somehow managed to forget to connect the
8254 timer to the apic, even though its in the same silicon! ARGH!
This directly violates the ACPI spec.
parameter in the read and write case dereferenced an unitialized
pointer and can't possibly ever have catched an actual invalid
argument.
This was apparently true for the read/write and getconf cases. The
latter does not even receive the paramter that is to be verified.
I'm surprised that this did not cause kernel panics, but it seems
that the uninitialized local variable happens to contain data that
may be used as a pointer to memory that satisfies the test condition.
Make the code work as intended by moving the test inside the switch
case where the pointer has been properly initialized.
Since the read and write case shared just about all code (except
for the single call to PCIB_READ_CONFIG resp. PCIB_WRITE_CONFIG) I
have merged both cases.
Noticed by: trhodes@FreeBSD.org (Tom Rhodes)
and replace it with the more intuitive name PCIR_BARS.
- Add a PCIR_BAR(x) macro that returns the config space register offset of
the 32-bit BAR x.
MFC after: 3 days
- Add a new PCIM_HDRTYPE constant for the field in PCIR_HDRTYPE that holds
the header type.
- Replace several magic numbers with appropriate constants for the header
type register and a couple of PCI_FUNCMAX.
- Merge to amd64 the fix to the i386 bridge code to skip devices with
unknown header types.
Requested by: imp (1, 2)
- Factor out code common to all ISA bridge drivers attach methods into a
isab_attach() function.
- Rename the PCI-ISA bridge driver's attach function to pci_isab_attach()
and have it call isab_attach().
interrupt to be used for a device. This is intended solely for internal
use of PCI bus implementations, and exists so that PCI bus drivers
implementing special interrupt assignment methods which require
additional work at the bus level to work right can be easily derived
from the generic driver (or any other one) without resorting to hacks.
It will be used in the sparc64 ofw_pcibus driver, which will be
committed shortly.
Make use of this method in the generic implementation, and add it to
the method table of bus drivers derived from the PCI one.
Reviewed by: imp, -hackers
are some Sun PCI devices around which bogusly set intpin to 0, although
they use the intline mechanism; this allows the device driver to correct
that.
Reviewed by: imp
read permision only required for listing, read/write required for
read/write to registers
fix a possible memory leak
clean up error handling a bit
Reviewed by: silence
Reorder how the pci probing in handled. before adding devices, check to
see if the slot is a multi-function device to see if we should probe all
the functions.
Original idea by: imp
ia64-specific.
- When trying to re-route interrupts, don't change cfg->intline if the
re-route fails by returning an invalid vector. This fixes machines
without any way of routing interrupts such as older PC's without a
$PIR table.
We do not currently write the new intline value back to the hardware, but
we should. That will likely be added in a later commit.
Always route PCI interrupts on i386 UP machines. I was planning to enable
this for i386 anyways once SMP support is done. Having this enabled fixes
problems on many people's laptops.
Requested by: imp
(with other names) in the USB driver sources, but I felt that pcireg.h
should have a complete list - at least of classes and interfaces that we
know about and use.
command config register. At the present, this represents a nop
because these bits should have been set earlier in the process. In
the future, we'll only set these bits when the driver requests the
resource, not when the bus code detects the resource.
Reviewed by: mdodd
branches:
Initialize struct cdevsw using C99 sparse initializtion and remove
all initializations to default values.
This patch is automatically generated and has been tested by compiling
LINT with all the fields in struct cdevsw in reverse order on alpha,
sparc64 and i386.
Approved by: re(scottl)
pci busses implement this.
Also minor comment smithing in cardbus. Fix copyright to this year
with my name on it since I've been doing a lot to this file.
Reviewed by: jhb
We allow the request to go through if it matches either a prefetchable
or a non-prefetchable part of the bridge. We do not check to make
sure it is the right kind of memory because most drivers to not yet
properly set RF_PREFETCHABLE (only cardbus seems to do so, and I'm not
entirely sure it does it right). RF_PREFETCHABLE was invented for
cardbus, so hasn't been properly documented yet.
This is still overridable by hw.pci.allow_unsupported_io_ranges, but
the need for that is greatly reduced, especially for the nvida driver.
Approved by: re
Reviewed by: jhb and many testers
Submitted by: Matt Emmerton (although this has been reworked somewhat)
buses support querying the MAC address in a standard-for-that-bus way.
The base pci bus returns NULL for this IVAR always.
Submitted by: sam
Approved by: re (blanket for NEWCARD)
- If a PCI device is not present, then a 32-bit read_config() is going to
return 0xffffffff not 0xffff.
- For the 82454NX chipset, the MIOC that we read the bus numbers of the
various host-PCI bridges from is at function (slot) 0x10 not 0x0.
Approved by: re (rwatson)
across system suspends on the Intel 82371AB PCI-ISA bridge. On a
Sony Vaio C1XD that I have, these registers are not set correctly
after an ACPI resume. The result is that after resuming, a shared
IRQ is left in edge-triggered mode so the interrupt can later become
jammed in a state where the line remains asserted, but the handler
is never called.
Reviewed by: jhb
supported option and it disabled a whole 2 lines of bootverbose messages.
I wanted to see 1 of the messages (about the latency timers). This
is a wrong place to decode pci configurations, but the code is already
here and handles more details than pciconf(8).
when the first PCI bus attaches.
- Create /dev/pci during MOD_LOAD as well.
- Destroy /dev/pci during MOD_UNLOAD (not that you can kldunload pci, but
might as well get the code right)
- Add an ACPI PCI-PCI bridge driver (the previous driver just handled
Host-PCI bridges) that is a PCI driver that is a subclass of the generic
PCI-PCI bridge driver. It overrides probe, attach, read_ivar, and
pci_route_interrupt.
- The probe routine only succeeds if our parent is an ACPI PCI bus which
we test for by seeing if we can read our ACPI_HANDLE as an ivar.
- The attach routine saves a copy of our handle and calls the new
acpi_pcib_attach_common() function described below.
- The read_ivar routine handles normal PCI-PCI bridge ivars and adds an
ivar to return the ACPI_HANDLE of the bus this bridge represents.
- The route_interrupt routine fetches the _PRT (PCI Interrupt Routing
Table) from the bridge device's softc and passes it off to
acpi_pcib_route_interrupt() to route the interrupt.
- Split the old ACPI Host-PCI bridge driver into two pieces. Part of
the attach routine and most of the route_interrupt routine remain in
acpi_pcib.c and are shared by both ACPI PCI bridge drivers.
- The attach routine verifies the PCI bridge is present, reads in
the _PRT for the bridge, and attaches the child PCI bus.
- The route_interrupt routine uses the passed in _PRT to route a PCI
interrupt.
The rest of the driver is the ACPI Host-PCI bridge specific bits that
live in acpi_pcib_acpi.c.
- We no longer duplicate pcib_maxslots but use it directly.
- The driver now uses the pcib devclass instead of its own devclass.
This means that PCI busses are now only children of pcib devices.
- Allow the ACPI_HANDLE for the child PCI bus to be read as an ivar
of the child bus.
- Fetch the _PRT for routing PCI interrupts directly from our softc
instead of walking the devclass to find ourself and then fetch our
own softc.
With this change and the new ACPI PCI bus driver, ACPI can now properly
route interrupts for devices behind PCI-PCI bridges. That is, the
Itanium2 with like 10 PCI busses can now boot ok and route all the PCI
interrupts. Hopefully this will also fix problems people are having with
CardBus bridges behind PCI-PCI bridges not properly routing interrupts
when ACPI is used.
Tested on: i386, ia64
OOP speak, you would mark these as 'protected' members. Specifically:
- Make the pcib_softc struct public so it can be used by subclasses.
- Make pcib_{read,write}_ivar(), pcib_alloc_resource(), pcib_maxslots(),
and pcib_{read,write}_config() globals that can be used by subclasses.
- Make the pcib devclass a global variable.
- Move most of the pcib_attach() function into a global
pcib_attach_common() function that can be called by the attach routines
of subclasses.
Tested on: i386, alpha, sparc64, ia64
- Make the pci devclass a global variable.
- Add child devices in pci_attach() instead of pci_probe(). Change
pci_probe() to just check for a valid bus number from the associated
bridge and return -1000 if successful. This allows subclasses of the
PCI bus driver to override the generic driver.
- Move the code to load the vendor data into its own public function.
Really though, doing this at attach is just plain wrong. This should
really be done in the module load routine instead. As a side effect,
the 'busno' variable in pci_attach() is now no longer static (minor
bug that was harmless so far.)
- Change pci_add_children() to take an extra argument that is the size of
the device info structure passed to pci_read_device() and make it public
so subclasses of the PCI bus can call it in their attach routines.
- Move the bits to attach a probed PCI child to a PCI bus into a global
pci_add_child() function. This will allow subclasses that can detect
a PCI device not found in the normal PCI probe to add those devices in
their own attach routine. (I have seen this in the ACPI tree on my
laptop for example.) As a side effect, change the static function
pci_add_resources() to get the busno, slot, and func from the passed
in dinfo structure instead of requiring them as function arguments.
Tested on: i386, alpha, ia64, sparc64