only in low memory situations, so the error fork of these fixes is
lightly tested, but they should do the least-wrong thing...
Submitted by: Hans Petter Selasky
memory area's base and limit are optional. The low 4-bits of the "low"
prefetchable registers indicates whether or not a 32-bit or 64-bit
region is supported. The PCI-PCI driver had been assuming that all bridges
supported a 64-bit region (and thus the two upper 32-bit registers). Fix
the driver to only use those registers if the low 4-bits of the "low"
registers indicate that a 64-bit region is supported. The PCI-PCI bridge
in the XBox happens to be a bridge that only supports a 32-bit region.
Reported by: rink
MFC after: 1 week
domain, pribus (the primary bus, eg the bus that this chip is on),
secbus (the secondary bus, eg the bus immediately behind this chip)
and subbus (the number of the highest bus behind this chip).
Normally, this information is reported via bootverbose parameters, but
that's hard to use for debugging in some cases.
This adds reading of pribus to make this happen. In addition, change
the narrow types to u_int to allow for easier reporting via sysctl for
domain, secbus and subbus. This should have no effect, but if it
does, please let me know.
pci_add_map(). First, this condition is already handled earlier in
the function. Second, as written the check would never fire as the
'start' value was overwritten with a long value (rman_get_start() returns
long) before the comparison was done.
Discussed with: imp
MFC after: 2 weeks
for a PCI device during the boot-time probe of the parent PCI bus, then
zero the BAR and clear the resource list entry for that BAR. This forces
the PCI bus driver to request a valid resource range from the parent bridge
driver when the device driver tries to allocate the BAR. Similarly, if the
initial value of a BAR is a valid range but it is > 4GB and the current OS
only has 32-bit longs, then do a full teardown of the initial value of the
BAR to force a reallocation.
Reviewed by: imp
MFC after: 1 week
used but MSI to HyperTransport IRQ mapping is enabled, and would act as
if MSI is turned on, resulting in interrupt loss.
This commit will,
1. enable MSI mapping on a device only when MSI is enabled for that
device and the MSI address matches the HT mapping window.
2. enable MSI mapping on a bridge only when a downstream device is
allocated an MSI address in the mapping window
PR: kern/118842
Reviewed by: jhb
MFC after: 1 week
PCI-express chipset (and thus has functional MSI) if there are any
PCI-express devices in the system, not requiring a root port device.
With PCI-X the chipset detection has to be very conservative because there
are known systems with PCI-X devices that do not appear to have PCI-X
chipsets. However, with PCI-express I'm not sure it is possible to have
a PCI-express device in a system with a non-PCI-express chipset. If we
assume that is the case then this change is valid. It is also required
for at least some PCI-express systems that don't have any devices with
a root port capability (some ICH9 systems).
MFC after: 1 week
Reported by: jfv
but reread it from the device_t every time the device list is fetched.
Previously the device name in pciconf -l would not be updated when a driver
was unloaded or if a device was detached and attached to a different
driver.
MFC after: 1 week
PR: kern/104777
Submitted by: "Iasen Kostoff" tbyte | otel net
- Use the correct offsets when copying out the results of PCIOCGETCONF_OLD.
This happened to not affect the 64-bit architectures because there the
addition of pc_domain to struct pcisel didn't change the overall size of
struct pci_conf. [1]
- Always copy the name and unit information to conf_old so it's also part
of the output once this information is cached in dinfo.
- Use the correct type for flags in struct pci_match_conf_old. This
change is more or less cosmetic though.
Reported and tested by: bde [1]
Reviewed by: imp
MFC after: 3 days
Committed from: 24C3
- Implement timing out of VPD register access.[1]
- Fix an off-by-one error of freeing malloc'd space when checksum is invalid.
- Fix style(9) bugs, i.e., sizeof cannot be followed by space.
- Retire now obsolete 'hw.pci.enable_vpd' tunable.
Submitted by: cokane (initial revision)[1]
Reviewed by: marius (intermediate revision)
Silence from: jhb, jmg, rwatson
Tested by: cokane, jkim
MFC after: 3 days
the PCIOCGETCONF, PCIOCREAD and PCIOCWRITE IOCTLs, which was broken
with the introduction of PCI domain support.
As the size of struct pci_conf_io wasn't changed with that commit,
this unfortunately requires the ABI of PCIOCGETCONF to be broken
again in order to be able to provide backwards compatibility to
the old version of that IOCTL.
Requested by: imp
Discussed with: re (kensmith)
Reviewed by: PCI maintainers (imp, jhb)
MFC after: 5 days
support machines having multiple independently numbered PCI domains
and don't support reenumeration without ambiguity amongst the
devices as seen by the OS and represented by PCI location strings.
This includes introducing a function pci_find_dbsf(9) which works
like pci_find_bsf(9) but additionally takes a domain number argument
and limiting pci_find_bsf(9) to only search devices in domain 0 (the
only domain in single-domain systems). Bge(4) and ofw_pcibus(4) are
changed to use pci_find_dbsf(9) instead of pci_find_bsf(9) in order
to no longer report false positives when searching for siblings and
dupe devices in the same domain respectively.
Along with this change the sole host-PCI bridge driver converted to
actually make use of PCI domain support is uninorth(4), the others
continue to use domain 0 only for now and need to be converted as
appropriate later on.
Note that this means that the format of the location strings as used
by pciconf(8) has been changed and that consumers of <sys/pciio.h>
potentially need to be recompiled.
Suggested by: jhb
Reviewed by: grehan, jhb, marcel
Approved by: re (kensmith), jhb (PCI maintainer hat)
the duration of the function. The device we would otherwise
have left in an useless state may just as well be the low-level
console. When booting verbose, we do need it addressable if we
want to avoid a MCA.
Approved by: re (kensmith)
device's, not the bridge's, softc to be used to check the
PCIB_DISABLE_MSI flag. This resulted in randomly allowing
or denying MSI interrupts based on whatever value the driver
happened to store at sizeof(device_t) bytes into its softc.
I noticed this when I stopped getting MSI interrupts
after slighly re-arranging mxge's softc yesterday.
the power_nodriver tunable is off. pci_cfg_save() already checks the
tunable internally, and no other callers of pci_cfg_save() check the
tunable.
Reviewed by: imp
- Simplify the amount of work that has be done for each architecture by
pushing more of the truly MI code down into the PCI bus driver.
- Don't bind MSI-X indicies to IRQs so that we can allow a driver to map
multiple MSI-X messages into a single IRQ when handling a message
shortage.
The changes include:
- Add a new pcib_if method: PCIB_MAP_MSI() which is called by the PCI bus
to calculate the address and data values for a given MSI/MSI-X IRQ.
The x86 nexus drivers map this into a call to a new 'msi_map()' function
in msi.c that does the mapping.
- Retire the pcib_if method PCIB_REMAP_MSIX() and remove the 'index'
parameter from PCIB_ALLOC_MSIX(). MD code no longer has any knowledge
of the MSI-X index for a given MSI-X IRQ.
- The PCI bus driver now stores more MSI-X state in a child's ivars.
Specifically, it now stores an array of IRQs (called "message vectors" in
the code) that have associated address and data values, and a small
virtual version of the MSI-X table that specifies the message vector
that a given MSI-X table entry uses. Sparse mappings are permitted in
the virtual table.
- The PCI bus driver now configures the MSI and MSI-X address/data
registers directly via custom bus_setup_intr() and bus_teardown_intr()
methods. pci_setup_intr() invokes PCIB_MAP_MSI() to determine the
address and data values for a given message as needed. The MD code
no longer has to call back down into the PCI bus code to set these
values from the nexus' bus_setup_intr() handler.
- The PCI bus code provides a callout (pci_remap_msi_irq()) that the MD
code can call to force the PCI bus to re-invoke PCIB_MAP_MSI() to get
new values of the address and data fields for a given IRQ. The x86
MSI code uses this when an MSI IRQ is moved to a different CPU, requiring
a new value of the 'address' field.
- The x86 MSI psuedo-driver loses a lot of code, and in fact the separate
MSI/MSI-X pseudo-PICs are collapsed down into a single MSI PIC driver
since the only remaining diff between the two is a substring in a
bootverbose printf.
- The PCI bus driver will now restore MSI-X state (including programming
entries in the MSI-X table) on device resume.
- The interface for pci_remap_msix() has changed. Instead of accepting
indices for the allocated vectors, it accepts a mini-virtual table
(with a new length parameter). This table is an array of u_ints, where
each value specifies which allocated message vector to use for the
corresponding MSI-X message. A vector of 0 forces a message to not
have an associated IRQ. The device may choose to only use some of the
IRQs assigned, in which case the unused IRQs must be at the "end" and
will be released back to the system. This allows a driver to use the
same remap table for different shortage values. For example, if a driver
wants 4 messages, it can use the same remap table (which only uses the
first two messages) for the cases when it only gets 2 or 3 messages and
in the latter case the PCI bus will release the 3rd IRQ back to the
system.
MFC after: 1 month
that the MSI mapping window is fixed at 0xfee00000 and the capability
does not include two more dwords used to program the address. Supporting
this mostly results in quieting spurious warnings during boot about
non-default MSI mapping windows.
- HT 2.00b also added a new HT capability type, so support that in pciconf.
MFC after: 3 days
Tested by: jmg
it via pci_get_vpd_*() rather than always reading it for each device during
boot. I've left the tunable so that it can still be turned off if a device
driver causes a lockup via a query to a broken device, but devices whose
drivers do not use VPD (the vast majority) should no longer result in
lockups during boot, and most folks should not need to tweak the tunable
now.
Tested on: bge(4)
Silence from: jmg
blacklist a bunch of old chipsets. If a system contains a PCI-PCI bridge
that supports PCI-X, assume the chipset supports PCI-X. If a system
contains a PCI-express root port, assume the chipset supports PCI-express.
If the chipset doesn't support either PCI-X or PCI-express, then blacklist
it by default. We should now only need to explicitly blacklist PCI-X or
PCI-express chipsets that don't properly handle MSI.
broke the method as all the MSI-X table indices were off by one in
the backend MD code.
- Fix a cosmetic nit in the bootverbose printf in pci_alloc_msix_method().
tunable allowing automatic parsing of VPD data to be disabled. The
default is left as-is; if you are having problems with hard hangs at boot
due to VPD, try setting hw.pci.enable_vpd=0. A proper architectural
solution has been under discussion for some time, but this allows me to
boot my test machines in the mean time.
Submitted by: bz
Head nod: jmg
- First off, device drivers really do need to know if they are allocating
MSI or MSI-X messages. MSI requires allocating powerof2() messages for
example where MSI-X does not. To address this, split out the MSI-X
support from pci_msi_count() and pci_alloc_msi() into new driver-visible
functions pci_msix_count() and pci_alloc_msix(). As a result,
pci_msi_count() now just returns a count of the max supported MSI
messages for the device, and pci_alloc_msi() only tries to allocate MSI
messages. To get a count of the max supported MSI-X messages, use
pci_msix_count(). To allocate MSI-X messages, use pci_alloc_msix().
pci_release_msi() still handles both MSI and MSI-X messages, however.
As a result of this change, drivers using the existing API will only
use MSI messages and will no longer try to use MSI-X messages.
- Because MSI-X allows for each message to have its own data and address
values (and thus does not require all of the messages to have their
MD vectors allocated as a group), some devices allow for "sparse" use
of MSI-X message slots. For example, if a device supports 8 messages
but the OS is only able to allocate 2 messages, the device may make the
best use of 2 IRQs if it enables the messages at slots 1 and 4 rather
than default of using the first N slots (or indicies) at 1 and 2. To
support this, add a new pci_remap_msix() function that a driver may call
after a successful pci_alloc_msix() (but before allocating any of the
SYS_RES_IRQ resources) to allow the allocated IRQ resources to be
assigned to different message indices. For example, from the earlier
example, after pci_alloc_msix() returned a value of 2, the driver would
call pci_remap_msix() passing in array of integers { 1, 4 } as the
new message indices to use. The rid's for the SYS_RES_IRQ resources
will always match the message indices. Thus, after the call to
pci_remap_msix() the driver would be able to access the first message
in slot 1 at SYS_RES_IRQ rid 1, and the second message at slot 4 at
SYS_RES_IRQ rid 4. Note that the message slots/indices are 1-based
rather than 0-based so that they will always correspond to the rid
values (SYS_RES_IRQ rid 0 is reserved for the legacy INTx interrupt).
To support this API, a new PCIB_REMAP_MSIX() method was added to the
pcib interface to change the message index for a single IRQ.
Tested by: scottl
capability rather than hardcoded offsets for a particular card. While
I'm here, expand the constants some.
- Change the ahd(4) driver to use pci_find_extcap() to locate the PCI-X
capability to keep up with the first change.
Reviewed by: scottl, gibbs (earlier version)
- Retire the PCI_SUB*_1 constants and don't try to read a subvendor ID out
of them. There isn't a standard subvendor ID field for PCI-PCI bridges.
Instead, the dword at offset 0x34 is actually mostly reserved except for
the LSB which is the capabilities pointer.
- Add support for the PCI-PCI bridge subvendor ID capability (13) and use
it to set the subvendor ID for PCI-PCI bridges.
MFC after: 1 month
bridge if it doesn't pass MSI messages up correctly. We set the flag
in pcib_attach() if the device ID is disabled via a PCI quirk.
- Disable MSI for devices behind the AMD 8131 HT-PCIX bridge. Linux has
the same quirk.
Tested by: no one despite repeated calls for testers
work:
- A new PCI quirk (PCI_QUIRK_DISABLE_MSI) is added to the quirk table.
- A new pci_msi_device_blacklisted() determines if a passed in device
matches an MSI quirk in the quirk table. This can be overridden (all
quirks ignored) by setting the hw.pci.honor_msi_blacklist to 0.
- A global blacklist check is performed in the MI PCI bus code by checking
to see if the device at 0:0:0 is blacklisted.
Tested by: jdp
subtypes of HT capabilities.
- Add constants for the MSI mapping window HT PCI capability.
- On i386 and amd64, enable the MSI mapping window on any HT bridges we
encounter and report any non-standard mapping window addresses.
- Add 3 new functions to the pci_if interface along with suitable wrappers
to provide the device driver visible API:
- pci_alloc_msi(dev, int *count) backed by PCI_ALLOC_MSI(). '*count'
here is an in and out parameter. The driver stores the desired number
of messages in '*count' before calling the function. On success,
'*count' holds the number of messages allocated to the device. Also on
success, the driver can access the messages as SYS_RES_IRQ resources
starting at rid 1. Note that the legacy INTx interrupt resource will
not be available when using MSI. Note that this function will allocate
either MSI or MSI-X messages depending on the devices capabilities and
the 'hw.pci.enable_msix' and 'hw.pci.enable_msi' tunables. Also note
that the driver should activate the memory resource that holds the
MSI-X table and pending bit array (PBA) before calling this function
if the device supports MSI-X.
- pci_release_msi(dev) backed by PCI_RELEASE_MSI(). This function
releases the messages allocated for this device. All of the
SYS_RES_IRQ resources need to be released for this function to succeed.
- pci_msi_count(dev) backed by PCI_MSI_COUNT(). This function returns
the maximum number of MSI or MSI-X messages supported by this device.
MSI-X is preferred if present, but this function will honor the
'hw.pci.enable_msix' and 'hw.pci.enable_msi' tunables. This function
should return the largest value that pci_alloc_msi() can return
(assuming the MD code is able to allocate sufficient backing resources
for all of the messages).
- Add default implementations for these 3 methods to the pci_driver generic
PCI bus driver. (The various other PCI bus drivers such as for ACPI and
OFW will inherit these default implementations.) This default
implementation depends on 4 new pcib_if methods that bubble up through
the PCI bridges to the MD code to allocate IRQ values and perform any
needed MD setup code needed:
- PCIB_ALLOC_MSI() attempts to allocate a group of MSI messages.
- PCIB_RELEASE_MSI() releases a group of MSI messages.
- PCIB_ALLOC_MSIX() attempts to allocate a single MSI-X message.
- PCIB_RELEASE_MSIX() releases a single MSI-X message.
- Add default implementations for these 4 methods that just pass the
request up to the parent bus's parent bridge driver and use the
default implementation in the various MI PCI bridge drivers.
- Add MI functions for use by MD code when managing MSI and MSI-X
interrupts:
- pci_enable_msi(dev, address, data) programs the MSI capability address
and data registers for a group of MSI messages
- pci_enable_msix(dev, index, address, data) initializes a single MSI-X
message in the MSI-X table
- pci_mask_msix(dev, index) masks a single MSI-X message
- pci_unmask_msix(dev, index) unmasks a single MSI-X message
- pci_pending_msix(dev, index) returns true if the specified MSI-X
message is currently pending
- Save the MSI capability address and data registers in the pci_cfgreg
block in a PCI devices ivars and restore the values when a device is
resumed. Note that the MSI-X table is not currently restored during
resume.
- Add constants for MSI-X register offsets and fields.
- Record interesting data about any MSI-X capability blocks we come
across in the pci_cfgreg block in the ivars for PCI devices.
Tested on: em (i386, MSI), bce (amd64/i386, MSI), mpt (amd64, MSI-X)
Reviewed by: scottl, grehan, jfv
MFC after: 2 months
only those bars that had addresses assigned by the BIOS and where the
bridges were properly programmed. Now even unprogrammed ones work.
This was needed for sun4v. We still only implement up to 2GB memory
ranges, even for 64-bit bars. PCI standards at least through 2.2 say
that this is the max (or 1GB is, I only know it is < 32bits).
o Always define pci_addr_t as uint64_t. A pci address is always 64-bits,
but some hosts can't address all of them.
o Preserve the upper half of the 64-bit word during resource probing.
o Test to make sure that 64-bit values can fit in a u_long (true on some
platforms, but not others). Don't use those that can't.
o minor pedantry about data sizes.
o Better bridge resource reporting in bootverbose case.
o Minor formatting changes to cope with different data types on different
platforms.
Submitted by: jmg, with many changes by me to fully support 64-bit
addresses.
If the length is zero, catch this early, instead of making dflen go negative
and letting bad things happen... We also check to see if RV (checksum) is
0, and handle that has a checksum failure...
Properly handle checksum failures by not processing read-write VPD data,
and removing all the found read-only data...
Tested by: oleg (dflen going negative)
install custom pager functions didn't actually happen in practice (they
all just used the simple pager and passed in a local quit pointer). So,
just hardcode the simple pager as the only pager and make it set a global
db_pager_quit flag that db commands can check when the user hits 'q' (or a
suitable variant) at the pager prompt. Also, now that it's easy to do so,
enable paging by default for all ddb commands. Any command that wishes to
honor the quit flag can do so by checking db_pager_quit. Note that the
pager can also be effectively disabled by setting $lines to 0.
Other fixes:
- 'show idt' on i386 and pc98 now actually checks the quit flag and
terminates early.
- 'show intr' now actually checks the quit flag and terminates early.
Lower the minimum for memory mapped I/O from 32 bytes to 16 bytes.
This fixes bus enumeration on ia64 now that the Diva auxiliary
serial port is attached to.
capability is present as not all devices supported by the agp_i810 driver
(such as i915) have the AGP capability. Instead, add an identify routine
to the agp_i810 driver that uses the PCI ID to determine if it should
create an agp child device.
share devclass pointers, a mistake I've encouraged in the past) and
move the declaration of the pci_driver kobj class from cardbus.c to
pci_private.h so that other drivers can inherit from pci_driver.
various pcib drivers to use their own private devclass_t variables for
their modules.
- Use the DEFINE_CLASS_0() macro to declare drivers for the various pcib
drivers while I'm here.
force allocation of unallocated BARs (cardbus uses this to preallocate
everything). Add a prefetchmask to allow for busses that get prefetch
hints to set them. Addjust pci_add_map and pci_ata_maps to take a new
force flag which pci_add_resources will pass in. Implement 'force' in
pci_add_map. Write new value of allocated resource into the bar, if
the allocation succeeded (we should have done this before, but with
the new force the bug was very obvious).
as a bus so that other drivers such as drm(4), acpi_video(4), and agp(4)
can attach to it thus allowing multiple drivers for the same device. It
also removes the need for the drmsub hack for the i8[13]0/i915 drm and agp
drivers.
duplicated anyways) and into a single MI driver. Extend the driver a bit
to implement the bus and PCI kobj interfaces such that other drivers can
attach to it and transparently act as if their parent device is the PCI
bus (for the most part).
to search for a specific extended capability. If the specified capability
is found for the given device, then the function returns success and
optionally returns the offset of that capability. If the capability is
not found, the function returns an error.
actual resource values we received from the system rather than the range
we requested. Since we request a range starting at 0, we would record
that number. Later, since this == 0, we'd allocate again. However,
we wouldn't write the new resource into the BAR. This resulted in
a resource leak as well as a BAR that couldn't access the resource at
all since rman_get_start, et al, were wrong.
MFC After: 1 week (assuming RELENG_6 is open for business)
of the PCIR_HDRTYPE register. It's the value returned from this
read access that determines whether or not we decide a device is
present at the current slot index. For some reason that I can't
adequately explain, this read fails on my machine when probing the
USB controller on my machine (which happens a multifunction device
at slot index 3 hung off the PCI-PCI bridge on the AMD8111 (bus
index 1)). The read will return 0xFF even though it should return
0x80 to indicate the presence of a multifunction device.
As near as I can tell, there's some timing issue involved with reading
the 'dead' slot indexes 0 through 2 that causes the read of the actual
device at slot 3 to fail. I tried a couple of different tricks to
correct the problem (the patch to amd64/pci/pci_cfgreg.c fixes it
for the amd64 arch), but adding this delay is the only thing that
always allows the USB controllers to be correctly probed 100% of the
time. Whatever the problem is, it's likely confined to the AMD8111
chipset. However, a simple 1us delay is fairly harmless and should
have no side effects for other hardware. I consider this to be
voodoo, but it's fairly benign voodoo and it makes my USB keyboard
and mouse work again.
Note that this is the second time that I've had to resort to a
1us delay to fix a PCI-related problem with this AMD8111/Opteron
system (the first being a fix I made a while back to the NDISulator).
It's possible the delay really belongs in the cfgreg code itself,
or that pci_cfgreg needs some custom hackery for an errata in the
8111. (I checked but couldn't find any documented errata on AMD's
site that could account for these problems.)
routing, etc. in a static pci_assign_interrupt() function.
- Add a sledgehammer that allows the user to override the interrupt
assignment of any PCI device via a tunable (e.g. "hw.pci0.7.INTB=5" would
force any functions on the pci device in slot 7 of bus 0 that use B# to
use IRQ 5). This should be used with great caution! Generally, if the
interrupt routing in use provides specific tunables (such as hard-wiring
the IRQ for a given $PIR or ACPI PCI link device), then those should be
used instead. One instance where this tunable might be useful is if a
box has an MPTable with duplicate entries for the same PCI device with
different IRQs.
MFC after: 1 week
the Intel 82371AB PCI-ISA bridge. We now do this all the time for the
!APIC case in the atpic driver. This cuts the raw line count for this
driver by about 40%.
MFC after: 1 week
has been removed. It has been replaced by hw.pci.do_power_nodriver
and hw.pci.do_power_resume. The former defaults to 0 while the latter
defaults to 1.
When do_powerstate was set to 0, it broke suspend/resume for a lot of
people as an unintended consequence. This change will only affect the
areas that were intended to affect. This change will have no effect on
servers, but will help laptops quite a bit.
MFC After: 3 days.
same as today: do no power management. 1 means be conservative about
what you power down (any device class that has caused problems gets
added here). 2 means be agressive about what gets powered down (any
device class that's fundamental to the system is here). 3 means power
them all down, reguardless. The default is 1.
The effect in the default system is to add mass storage devices to the
list that we don't power down. From all the pciconf -l lists that
I've seen for the aac and amr issue, the bad device has been a mass
storage device class.
This is an attempt at a compromise between the very small number of
systems that have extreme issues with powerdown, and the very large
number of systems that gain real benefits from powerdown (I get about
20% more battery life when I attach a minimal set of drivers on my
Sony). Hopefully it will strike the proper balance.
MFC After: 3 days (before next beta)
return the correct bar size if we encountered a 64-bit BAR that had
its resources already assigned. If the resources weren't yet
assigned, we'd bogusly assume it was a 32-bit bar and return 1.
not exsist, do not have ioctl return an error, but instead set -1
in the data returned to the user. This allows the HP bios flash
utilities to work without requiring changes to their code.
Reviewed by: jhb
against 0 in pci_alloc_map, just like we do in pci_add_map. Also,
make sure that we restore the value to the BAR that was there before
if the bar is 0. Chances are that it was 0 before the write too and
that the restoration is a nop, but better safe than sorry.
Notice by: dwhite
we are processing has a base address of zero. Note that this will only
change behavior for devices where all the BARs of a given type have a base
address of 0 since we will enable the appropriate access when we encounter
the first BAR with a base that is not 0. Specifically, this allows certain
Toshiba laptops to no longer require 'hw.pci.enable_io_modes=0' to avoid
hangs during boot.
PR: kern/20040
PR: i386/63776 (possibly)
PR: i386/68900 (possibly)
PR: i386/74532 (possibly)
MFC after: 1 week
theoretically unload pci bridges or pci drivers. It will also allow
detach to work if one needed to detach a subtree.
This is inspired by looking at the p4 commits from bms to his 5.4
tree, but I didn't look at the final results.
for the VGA I/O or memory ranges, when it's not within the default
ranges decoded by the bridge. When allocation for VGA addresses is
attempted, check that the bridge has the VGA Enable bit set before
allowing it.
As such, newbusified VGA drivers can allocate their resources when
the VGA adapter is behind a PCI-to-PCI bridge.
Reviewed by: imp@, jhb@
printf's during a verbose boot is more intuitive (the BAR listings and
interrupt routing info now comes after the config header dump rather than
just before it).
Otherwise, busses that implement the pcib interface that forget to
implement pcib_route_interrupt would return EIO, which the caller
interprets as 'use interrupt 6'. This is likely the cause of much of
the grief that we had when I enabled power modes for the cardbus
bridge, since the card needed to reroute the interrupt to it and it
was getting 6 which was d by the pccbb sanity checks.
last in the list rather than first.
This makes the resouces print in the 4.x order rather than the 5.x order
(eg fdc0 at 0x3f0-0x3f5,0x3f7 is 4.x, but 0x3f7,0x3f0-0x3f5 is 5.x). This
also means that the pci code will once again print the resources in BAR
ascending order.
like a valid range. We already do this in the memory case (although
the code there is somewhat different than the I/o case because we have
to deal with different kinds of memory). Since most laptops don't
have non-subtractive bridges, this wasn't seen in practice.
Evidentally the Compaq R3000 hits this problem with PC Cards.
Some minor style fixes while I'm here.
Submitted by: Jung-uk Kim
suggested by Peter Edwards. This seems to fix my fxp problems and
likely will fix his as well. Use DELAY rather than *sleep because we
can be called from any context.
the PCI bus. We presently have no drivers for these devices, so they
are powered down. This is undesirable behavior since it breaks the
system when the base peripherals go away suddenly in the middle of
boot.
# if we ever get generic drivers for memory and/or base peripherals, then
# we can remove the tests here.
back on again in resume. Override the default of D3 with the value the
BIOS specifies in _SxD, if present. Skip serial devices (PNP05xx) since
they seem to hang when set to D3 and may require special driver support.
Also, skip non-type 0 PCI devices (i.e., bridges) since our we don't yet
save/restore their config space and that seems to be necessary.
If this gives you trouble with suspend/resume, you can disable the new
ACPI and PCI power behavior separately with these tunables & sysctls:
debug.acpi.do_powerstate
hw.pci.do_powerstate
Approved by: imp (pci)
Tested by: acpi@ (numerous)
control the number of lines per page rather than a constant. The variable
can be examined and changed in ddb as '$lines'. Setting the variable to
0 will effectively turn off paging.
- Change db_putchar() to force out pending whitespace before outputting
newlines and carriage returns so that one can rub out content on the
current line via '\r \r' type strings.
- Change the simple pager to rub out the --More-- prompt explicitly when
the routine exits.
- Add some aliases to the simple pager to make it more compatible with
more(1): 'e' and 'j' do a single line. 'd' does half a page, and
'f' does a full page.
MFC after: 1 month
Inspired by: kris
PCI native addressing. That means that if the HW says that using "real"
addresses instead of the hardwired legacy compat ones is allowed, we will
use them.
in the various pci specifications as readonly. vendor, subvendor,
device and subdevice are required to be loaded in hardware by some
means that isn't the system BIOS or other system software (although
some devices do have ways of accomplishing this). class and subclass
are defined to be read-only in section 6.2.1 (v2.2). Apart from the
status register, which we weren't touching, these are the only
read-only registers I could find in the 2.2 spec.
progif is also defined as being read-only in section 6.2.1. However,
the PCI IDE programming document specifically states that some of the
bits are read/write. Since we may have to restore registers before we
have a driver attached, go ahead and restore this one byte when
transitioning between D3 and D0.
The PCI spec also says that writes to reserved and unimplemented
registers must be completed normally. It makes no statements about
writes to read-only registers, so be as conservative as possible,
while covering the exception to the rule that is documented in a
subpart of the standard.
Requested by: socttl
Split the baby. For idepci devices, now both legacy mode bits need
not be set. We can run an idepci in a split mode. However, it only
works better than before, not works. It works better in that when one
device is legacy and the other isn't and disabled, we now operate
correctly.
sos submitted a version of this patch.
subclass, progif and revid. While these are typically read
only fields, they aren't always read-only. progif is writable
for ata devices, for example. It does no harm when they are
read only, and helps when they aren't.
chattiness was left in for debugging, but now that nearly all of the
problems relating to the changes have been fixed, it is only annoying. It
is still available via bootverbose.
Prodded by: jhb
1) In pci.c, we need to check the child device's state, not the parent
device's state.
2) In acpi_pci.c, we have to run the power state change after the acpi
method when the old_state is > new state, not the other way around.
Submitted by: Dmitry Remesov
PR: 65694
resource pre-allocation. The problem is that the BARs of the EBus bridges
contain the ranges for the resources for the EBus devices beyond the bridge.
So when the EBus code tries to allocate the resource for an EBus device
it's already allocated by the PCI code.
To be removed again as soon as we have a proper solution in the EBus Code.
Reviewed by: tmm
Approved by: marcel (mentor)
While I would have prefered to have a solution that didn't move
knowledge of this into the pci layer. However, this is literally the
only exception that's listed in the PCI standard to the usual way of
decoding BARs. atapci devices in legacy mode now ignore the first 4
bars and hard code the values to the legacy ide values (well, for each
of the controllers that are in legacy mode). The 5th bar is handled
normally.
Remove the zero bar handling. zero bars should be ignored at all
other times, and since we handle that specially, we don't need the
older workaround.
the sense that any write to them reads back as a 0. This presents a
problem to our resource allocation scheme. If we encounter such vars,
the code now treats them as special, allowing any allocation against
them to succeed. I've not seen anything in the standard to clearify
what host software should do when it encounters these sorts of BARs.
Also cleaned up some output while I'm here and add commmented out
bootverbose lines until I'm ready to reduce the verbosity of boot
messages.
This gets a number of south bridges and ata controllers made mostly by
VIA, AMD and nVidia working again. Thanks to Soren Schmidt for his
help in coming up with this patch.
we get the resource allocation stuff hammered out.
Fix and off by one error that caused unnecessary filtering of valid
BARs for only 4 bytes than ICH3 and other PCI IDE controllers have.
Andrew Gallatin submitted this, although it doesn't solve the problems
ICH3 controllers have with the new code, it does restore the former
resource list on the probe line.
device in D0 to D0, that's a no-op, however the messages seem to be
confusing some people. Eventually, these messages will be parked
behind a if (bootverbose).
# I don't think this will fix any real bugs...
o Save and restore bars for suspend/resume as well as for D3->D0
transitions.
o preallocate resources that the PCI devices use to avoid resource
conflicts
o lazy allocation of resources not allocated by the BIOS.
o set unattached drivers to state D3. Set power state to D0
before probe/attach. Right now there's two special cases
for this (display and memory devices) that need work in other
areas of the tree.
Please report any bugs to me.
Introduce d_version field in struct cdevsw, this must always be
initialized to D_VERSION.
Flip sense of D_NOGIANT flag to D_NEEDGIANT, this involves removing
four D_NOGIANT flags and adding 145 D_NEEDGIANT flags.
pain and suffering. Attempt to back it out by removing the 'if the
requested range is larger than the window, clip to the window' code.
This is a band-aide until the issues are better understood and the
issues with the lazy allocation patches are resolved.
signals to addresses to the child busses. Typically, ProgIf of 1
means a subtractive bridge. However, Intel has a whole lot of ones
with a ProgIf of 80 that are also subtractive. We cope with these
bridges too. This eliminates hw.pci.allow_unsupported_io_range
because that had almost the same effect as these patches (almost means
'buggy'). Remove the bogus checks for ISA bus locations: these cycles
aren't special and are only passed by transparent bridges.
We allow any range to succeed. If the range is a superset of the
range that's decoded, trim the resource to that range. Otherwise,
pass the range unchanged. This will change the location that PC Card
and CardBus cards are attached. This might bogusly cause some
overlapping allocation that wasn't present before, but the overlapping
fixes need to be in the pci level.
There's also a few formatting changes here.
- This is heavily derived from John Baldwin's apic/pci cleanup on i386.
- I have completely rewritten or drastically cleaned up some other parts.
(in particular, bootstrap)
- This is still a WIP. It seems that there are some highly bogus bioses
on nVidia nForce3-150 boards. I can't stress how broken these boards
are. I have a workaround in mind, but right now the Asus SK8N is broken.
The Gigabyte K8NPro (nVidia based) is also mind-numbingly hosed.
- Most of my testing has been with SCHED_ULE. SCHED_4BSD works.
- the apic and acpi components are 'standard'.
- If you have an nVidia nForce3-150 board, you are stuck with 'device
atpic' in addition, because they somehow managed to forget to connect the
8254 timer to the apic, even though its in the same silicon! ARGH!
This directly violates the ACPI spec.
parameter in the read and write case dereferenced an unitialized
pointer and can't possibly ever have catched an actual invalid
argument.
This was apparently true for the read/write and getconf cases. The
latter does not even receive the paramter that is to be verified.
I'm surprised that this did not cause kernel panics, but it seems
that the uninitialized local variable happens to contain data that
may be used as a pointer to memory that satisfies the test condition.
Make the code work as intended by moving the test inside the switch
case where the pointer has been properly initialized.
Since the read and write case shared just about all code (except
for the single call to PCIB_READ_CONFIG resp. PCIB_WRITE_CONFIG) I
have merged both cases.
Noticed by: trhodes@FreeBSD.org (Tom Rhodes)
and replace it with the more intuitive name PCIR_BARS.
- Add a PCIR_BAR(x) macro that returns the config space register offset of
the 32-bit BAR x.
MFC after: 3 days