It seems Killer E2200/E2400 has a BIOS misconfiguration or silicon
bug which triggers DMA write errors when driver uses advertised
maximum payload size. Force the maximum payload size to 128 bytes
in DMA configuration.
This change should fix occasional DMA write errors reported on
Killer E2200.
Tested by: <psy0nic@sys-tek.org>
- Read interrupt properties at bus enumeration time and store
it into global mapping table.
- At bus_activate_resource() time, given mapping entry is resolved and
connected to real interrupt source. A copy of mapping entry is attached
to given resource.
- At bus_setup_intr() time, mapping entry stored in resource is used
for delivery of requested interrupt configuration.
- For MSI/MSIX interrupts, mapping entry is created within
pci_alloc_msi()/pci_alloc_msix() call.
- For legacy PCI interrupts, mapping entry must be created within
pcib_route_interrupt() by pcib driver itself.
Reviewed by: nwhitehorn, andrew
Differential Revision: https://reviews.freebsd.org/D7493
Some devices report that they have an MRL when they actually
do not. Since they always report that the MRL is open, child
devices would be ignored. Try to detect these devices and
ignore their claim of HotPlug support. Specifically,
if there is an open MRL but the Data Link Layer is active,
the MRL is not real.
Revert r303645 to re-enable HotPlug support for slots with
power controllers, since it works correctly in my testing.
Start the DLL state-change timer if Presence /or/ MRL state changes,
along with other conditions. Previously, we started the timer iff
Presence changed. If there is an MRL, it must be closed for power
to be turned on, so Presence is unlikely to change on an MRL-close event.
Add a printf() of interesting registers on HotPlug interrupts and
commands (one from erj@). These were very useful for debugging.
Guard them with bootverbose, since they're spam in normal operation.
In collaboration with: jhb
Reviewed by: jhb
MFC after: 1 day
Relnotes: yes (re-enable HotPlug support for slots with power controllers)
Sponsored by: Dell Inc.
Differential Revision: https://reviews.freebsd.org/D7509
Several files use the internal name of `struct device` instead of
`device_t` which is part of the public API. This patch changes all
`struct device *` to `device_t`.
The remaining occurrences of `struct device` are those referring to the
Linux or OpenBSD version of the structure, or the code is not built on
FreeBSD and it's unclear what to do.
Submitted by: Matthew Macy <mmacy@nextbsd.org> (previous version)
Approved by: emaste, jhibbits, sbruno
MFC after: 3 days
Differential Revision: https://reviews.freebsd.org/D7447
Previously the loop in PCIIOCGETCONF would terminate as soon as it
found enough matches. Now it will continue iterating through the
PCI device list and only terminate if it finds another matching device
for which it has no room to store a conf structure. This means that
PCI_GETCONF_LAST_DEVICE is reliably returned when the number of
matching devices is equal to the number of slots in the matches
buffer. For example, if a program requests the conf structure for a
single PCI function with a specified domain/bus/slot/function it will
now get PCI_GETCONF_LAST_DEVICE instead of PCI_GETCONF_MORE_DEVS.
While here, simplify the loop conditional a bit more by explicitly
breaking out of the loop if copyout() fails and removing a redundant
i < pci_numdevs check.
Reviewed by: vangyzen, imp
MFC after: 1 month
Sponsored by: Chelsio Communications
Differential Revision: https://reviews.freebsd.org/D7445
The interpretation of the Electromechanical Interlock Status was
inverted, so we disengaged the EI if a card was inserted.
Fix it to engage the EI if a card is inserted.
When displaying the slot capabilites/status with pciconf:
- We inverted the sense of the Power Controller Control bit,
saying the power was off when it was really on (according to
this bit). Fix that.
- Display the status of the Electromechanical Interlock:
EI(engaged)
EI(disengaged)
Reviewed by: jhb
MFC after: 3 days
Sponsored by: Dell Inc.
Differential Revision: https://reviews.freebsd.org/D7426
The PCI_IOV option creates character devices in /dev/iov for each PF
device driver that registers support for creating VFs. By default the
character device is named after the PF device (e.g. /dev/iov/foo0).
This change adds a variant of pci_iov_attach() called pci_iov_attach_name()
that allows the name of the /dev/iov entry to be specified by the
driver.
Reviewed by: rstone
MFC after: 1 month
Sponsored by: Chelsio Communications
Differential Revision: https://reviews.freebsd.org/D7400
After further review of the spec, I do not think the current HotPlug
code handles slots with power controllers correctly. In particular,
the power state of the slot is to be inferred from other events, not
from examining the state of the power control bit in SLOT_CTL. For now,
disable PCI hotplug support on such slots.
PR: 211081
Tested by: Jeffrey E Pieper <jeffrey.e.pieper@intel.com>
MFC after: 3 days
Some systems and/or devices (such as riser cards) do not include a
non-compliant implementation of PCI-e HotPlug that can result in devices
not being attached (e.g. the HotPlug code might assume that a card is
being unplugged and will power the slot off and detach it). This
tunable can be set to 0 to disable support for PCI-e HotPlug ignoring
the incorrect HotPlug state on these slots.
PR: 211081
Reported by: Sergey Renkas <serg_ic@mail.ru> (SuperMicro X7 riser card)
Reported by: Jeffrey E Pieper <jeffrey.e.pieper@intel.com>
(Intel X520 adapter)
MFC after: 1 week
Relnotes: yes
this to be the case. This will mean we don't try and handle the cache in
bus_dmamap_sync when it is not needed.
Obtained from: ABT Systems Ltd
Sponsored by: The FreeBSD Foundation
Differential Revision: https://reviews.freebsd.org/D6605
tested on the Pass 1.1 and 2.0 ThunderX machines in the Netperf cluster.
Reviewed by: jhb
Obtained from: ABT Systems Ltd
Sponsored by: The FreeBSD Foundation
Differential Revision: https://reviews.freebsd.org/D6453
- Add a pcib_detach() function for the PCI-PCI bridge driver. It
tears down the NEW_PCIB and hotplug state including destroying
resource managers, deleting child devices, and disabling hotplug
events.
- Add a detach method to the ACPI PCI-PCI bridge driver which calls
pcib_detach() and then frees the copy of the _PRT interrupt routing
table.
- Add a detach method to the PCI-Cardbus bridge driver which frees
the PCI bus resources in addition to calling cbb_detach().
- Explicitly clear any pending hotplug events during attach to ensure
future events will generate an interrupt.
- If a the Command Completed bit is set in the slot status register
when the command completion timeout fires, treat it as if the
command completed and the completion interrupt was just lost rather
than forcing a detach.
- Don't wait for a Command Completed notification if Command Completion
interrupts are disabled. The spec explicitly says no interrupt is
enabled when clearing CCIE, and on my T400 no interrupt is generated
when CCIE is changed from cleared to set, either. In addition, the
T400 doesn't appear to set the Command Completed bit in the cases
where it doesn't generate an interrupt, so don't schedule the timer
either. (If the CC bit were always set, one could always set the timer
and rely on the logic of treating CC set as a missed interrupt.)
Reviewed by: imp (older version)
Differential Revision: https://reviews.freebsd.org/D6424
Previously the command completion interrupt would post any pending
command immediately before pcib_pcie_hotplug_update() had been
run to inspect the current status. Now, the command completion
interrupt merely clears the flag and stops the timer assuming that
the caller is always going to call pcib_pcie_hotplug_update() to
generate the next hotplug command if one is needed.
While here, fix a bug for systems with command completion where the
old (existing) value was written to the slot control register instead
of the new value. This fixes the complaint about a missing hotplug
interrupt on my T400.
Differential Revision: https://reviews.freebsd.org/D6363
translate the pci rid to a controller ID. The translation could be based
on the 'msi-map' OFW property, a similar ACPI option, or hard-coded for
hardware lacking the above options.
Reviewed by: wma
Obtained from: ABT Systems Ltd
Sponsored by: The FreeBSD Foundation
Add a new get_id interface to pci and pcib. This will allow us to both
detect failures, and get different PCI IDs.
For the former the interface returns an int to signal an error. The ID is
returned at a uintptr_t * argument.
For the latter there is a type argument that allows selecting the ID type.
This only specifies a single type, however a MSI type will be added
to handle the need to find the ID the hardware passes to the ARM GICv3
interrupt controller.
A follow up commit will be made to remove pci_get_rid.
Reviewed by: jhb, rstone (previous version)
Obtained from: ABT Systems Ltd
Sponsored by: The FreeBSD Foundation
Differential Revision: https://reviews.freebsd.org/D6239
interface with 5 methods to mirror the 5 MSI/MSI-X methods in the pcib
interface. The pcib driver will need to perform a device specific lookup
to find the MSI controller and pass this to intrng as the xref. Intrng
will finally find the controller and have it handle the requested operation.
Obtained from: ABT Systems Ltd
MFH: yes
Sponsored by: The FreeBSD Foundation
Differential Revision: https://reviews.freebsd.org/D5985
which leads to end being before start and thus a signed extended very large
number of size later on, which kva_alloc() will fail upon and we will panic.
Add the missing call.
Debugged with: andrew
Reviewed by: br, andrew
Sponsored by: DARPA/AFRL
Found: while using virtio with gem5
MFC after: 2 weeks
Differential Revision: https://reviews.freebsd.org/D6337
detect failures, and get different PCI IDs.
For the former the interface returns an int to signal an error. The ID is
returned at a uintptr_t * argument.
For the latter there is a type argument that allows selecting the ID type.
This only specifies a single type, however a MSI type will be added
to handle the need to find the ID the hardware passes to the ARM GICv3
interrupt controller.
A follow up commit will be made to remove pci_get_rid.
Reviewed by: jhb, rstone
Obtained from: ABT Systems Ltd
Sponsored by: The FreeBSD Foundation
Differential Revision: https://reviews.freebsd.org/D6239
When devctl was added, the location string for PCI devices was changed to
use the PCI "selector" that pciconf and devctl accept. However, devd
assumes that location strings are formatted as a list of name=value pairs.
As a result, devd is no longer parsing any of the values out of PCI
device events. Restore the previous format of the PCI location strings
to restore the location and slot keywords in case any devd scripts are
using this. Add the "selector" as a new 'dbsf' location variable.
Reviewed by: imp
MFC after: 3 days
Differential Revision: https://reviews.freebsd.org/D6253
PCI-express HotPlug support is implemented via bits in the slot
registers of the PCI-express capability of the downstream port along
with an interrupt that triggers when bits in the slot status register
change.
This is implemented for FreeBSD by adding HotPlug support to the
PCI-PCI bridge driver which attaches to the virtual PCI-PCI bridges
representing downstream ports on HotPlug slots. The PCI-PCI bridge
driver registers an interrupt handler to receive HotPlug events. It
also uses the slot registers to determine the current HotPlug state
and drive an internal HotPlug state machine. For simplicty of
implementation, the PCI-PCI bridge device detaches and deletes the
child PCI device when a card is removed from a slot and creates and
attaches a PCI child device when a card is inserted into the slot.
The PCI-PCI bridge driver provides a bus_child_present which claims
that child devices are present on HotPlug-capable slots only when a
card is inserted. Rather than requiring a timeout in the RC for
config accesses to not-present children, the pcib_read/write_config
methods fail all requests when a card is not present (or not yet
ready).
These changes include support for various optional HotPlug
capabilities such as a power controller, mechanical latch,
electro-mechanical interlock, indicators, and an attention button.
It also includes support for devices which require waiting for
command completion events before initiating a subsequent HotPlug
command. However, it has only been tested on ExpressCard systems
which support surprise removal and have none of these optional
capabilities.
PCI-express HotPlug support is conditional on the PCI_HP option
which is enabled by default on arm64, x86, and powerpc.
Reviewed by: adrian, imp, vangyzen (older versions)
Relnotes: yes
Differential Revision: https://reviews.freebsd.org/D6136
Save the value of the IOV control and page size registers and restore
them (along with the VF count) in pci_cfg_save/pci_cfg_restore. This
ensures ARI remains enabled if a PF driver resets itself during the
PCI_IOV_INIT callback. This might also properly restore SRIOV state
across suspend/resume.
Reviewed by: rstone, vangyzen
Differential Revision: https://reviews.freebsd.org/D6192
While here, check if ARI was enabled by re-reading the config register
after writing it and return an error if the write fails.
Reviewed by: rstone, vangyzen
pci_remap_msix() can be used to alter the mapping of allocated
MSI-X vectors to the MSI-X table. The code had an off by one error
when adding the IRQ resources after performing a remap. This was
fatal for any vectors in the table that used the "last" valid IRQ as
those vectors were assigned a garbage IRQ value.
MFC after: 3 days
This allows the PCI-PCI bridge driver to save a reference to the child
device in its softc.
Note that this required moving the "pci" device creation out of
acpi_pcib_attach(). Instead, acpi_pcib_attach() is renamed to
acpi_pcib_fetch_prt() as it's sole action now is to fetch the PCI
interrupt routing table.
Differential Revision: https://reviews.freebsd.org/D6021
Rescanning a PCI bus uses the following steps:
- Fetch the current set of child devices and save it in the 'devlist'
array.
- Allocate a parallel array 'unchanged' initalized with NULL pointers.
- Scan the bus checking each slot (and each function on slots with a
multifunction device).
- If a valid function is found, look for a matching device in the 'devlist'
array. If a device is found, save the pointer in the 'unchanged' array.
If a device is not found, add a new device.
- After the scan has finished, walk the 'devlist' array deleting any
devices that do not have a matching pointer in the 'unchanged' array.
- Finally, fetch an updated set of child devices and explicitly attach any
devices that are not present in the 'unchanged' array.
This builds on the previous changes to move subclass data management into
pci_alloc_devinfo(), pci_child_added(), and bus_child_deleted().
Subclasses of the PCI bus use custom rescan logic explicitly override the
rescan method to disable rescans.
Differential Revision: https://reviews.freebsd.org/D6018
This is a trivial follow-up to r296308. Annotate the intentional fallthrough
to make it clear for future readers and linters.
Reported by: Coverity
CID: 1352716
Discussed with: jhb
Sponsored by: EMC / Isilon Storage Division
The ACPI and OFW PCI bus drivers as well as CardBus override this to
allocate the larger ivars to hold additional info beyond the stock PCI ivars.
This removes the need to pass the size to functions like pci_add_iov_child()
and pci_read_device() simplifying IOV and bus rescanning implementations.
As a result of this and earlier changes, the ACPI PCI bus driver no longer
needs its own device_attach and pci_create_iov_child methods but can use
the methods in the stock PCI bus driver instead.
Differential Revision: https://reviews.freebsd.org/D5891
Instead of providing a wrapper around device_delete_child() that the PCI
bus and child bus drivers must call explicitly, move the bulk of the logic
from pci_delete_child() into a bus_child_deleted() method
(pci_child_deleted()). This allows PCI devices to be safely deleted via
device_delete_child().
- Add a bus_child_deleted method to the ACPI PCI bus which clears the
device_t associated with the corresponding ACPI handle in addition to
the normal PCI bus cleanup.
- Change cardbus_detach_card to call device_delete_children() and move
CardBus-specific delete logic into a new cardbus_child_deleted() method.
- Use device_delete_child() instead of pci_delete_child() in the SRIOV code.
- Add a bus_child_deleted method to the OpenFirmware PCI bus drivers which
frees the OpenFirmware device info for each PCI device.
Reviewed by: imp
Tested on: amd64 (CardBus and PCI-e hotplug)
Differential Revision: https://reviews.freebsd.org/D5831
On some architectures, u_long isn't large enough for resource definitions.
Particularly, powerpc and arm allow 36-bit (or larger) physical addresses, but
type `long' is only 32-bit. This extends rman's resources to uintmax_t. With
this change, any resource can feasibly be placed anywhere in physical memory
(within the constraints of the driver).
Why uintmax_t and not something machine dependent, or uint64_t? Though it's
possible for uintmax_t to grow, it's highly unlikely it will become 128-bit on
32-bit architectures. 64-bit architectures should have plenty of RAM to absorb
the increase on resource sizes if and when this occurs, and the number of
resources on memory-constrained systems should be sufficiently small as to not
pose a drastic overhead. That being said, uintmax_t was chosen for source
clarity. If it's specified as uint64_t, all printf()-like calls would either
need casts to uintmax_t, or be littered with PRI*64 macros. Casts to uintmax_t
aren't horrible, but it would also bake into the API for
resource_list_print_type() either a hidden assumption that entries get cast to
uintmax_t for printing, or these calls would need the PRI*64 macros. Since
source code is meant to be read more often than written, I chose the clearest
path of simply using uintmax_t.
Tested on a PowerPC p5020-based board, which places all device resources in
0xfxxxxxxxx, and has 8GB RAM.
Regression tested on qemu-system-i386
Regression tested on qemu-system-mips (malta profile)
Tested PAE and devinfo on virtualbox (live CD)
Special thanks to bz for his testing on ARM.
Reviewed By: bz, jhb (previous)
Relnotes: Yes
Sponsored by: Alex Perez/Inertial Computing
Differential Revision: https://reviews.freebsd.org/D4544
Summary:
The idea behind this is '~0ul' is well-defined, and casting to uintmax_t, on a
32-bit platform, will leave the upper 32 bits as 0. The maximum range of a
resource is 0xFFF.... (all bits of the full type set). By dropping the 'ul'
suffix, C type promotion rules apply, and the sign extension of ~0 on 32 bit
platforms gets it to a type-independent 'unsigned max'.
Reviewed By: cem
Sponsored by: Alex Perez/Inertial Computing
Differential Revision: https://reviews.freebsd.org/D5255
Fix the boundary limit to end at the end of the region and not one beyond (1).
Diagnosed by: andrew (1)
Reviewed by: andrew, br
Sponsored by: DARPA/AFRL
Differential Revision: https://reviews.freebsd.org/D5493
On some platforms, BAR entries are hardcoded and must not be accessed
using standard method. Add functionality to identify this situation
and configure the bus based on Enhanced Allocation structure.
Obtained from: Semihalf
Sponsored by: Cavium
Approved by: cognet (mentor)
Reviewed by: jhb
Differential revision: https://reviews.freebsd.org/D5242
Most calls to bus_alloc_resource() use "anywhere" as the range, with a given
count. Migrate these to use the new bus_alloc_resource_anywhere() API.
Reviewed by: jhb
Differential Revision: https://reviews.freebsd.org/D5370
If Enhanced Allocation is not used, we can't allocate any random
range. All internal devices have hardcoded place where they can
be located within PCI address space. Fortunately, we can read
this value from BAR.
Obtained from: Semihalf
Sponsored by: Cavium
Approved by: cognet (mentor)
Reviewed by: zbb
Differential revision: https://reviews.freebsd.org/D5455
* provided OFW interface for pci_host_generic (for handling devices which are present in DTS under the PCI node)
* removed support for internal PCI from arm64/cavium
* cleaned up and made most of the code common
Obtained from: Semihalf
Sponsored by: Cavium
Approved by: cognet (mentor)
Reviewed by: zbb
Differential revision: https://reviews.freebsd.org/D5261
Summary:
Migrate to using the semi-opaque type rman_res_t to specify rman resources. For
now, this is still compatible with u_long.
This is step one in migrating rman to use uintmax_t for resources instead of
u_long.
Going forward, this could feasibly be used to specify architecture-specific
definitions of resource ranges, rather than baking a specific integer type into
the API.
This change has been broken out to facilitate MFC'ing drivers back to 10 without
breaking ABI.
Reviewed By: jhb
Sponsored by: Alex Perez/Inertial Computing
Differential Revision: https://reviews.freebsd.org/D5075
mv_pci driver omitted slot 0, which can be valid device on Armada38x.
New mechanism detects if device is root link, basing on vendor's
and device's IDs.
It is restricted to Armada38x; on other machines, behaviour remains
the same.
Reviewed by: andrew
Obtained from: Semihalf
Sponsored by: Stormshield
Submitted by: Bartosz Szczepanek <bsz@semihalf.com>
Differential revision: https://reviews.freebsd.org/D4377
While here, explicitly note the requirement that the BAR(s) must be
allocated prior to calling pci_alloc_msix().
Reviewed by: andrew, emaste
MFC after: 1 week
Differential Revision: https://reviews.freebsd.org/D4688
* Use the interrupt-map property to route interrupts
* Remove the IRQ rman, it's now unneeded
* Support MSI/MSI-X interrupts
With this I'm able to use the two NICs I've tested (em and msk), however
while I can boot with an AHCI devie attached it fails when any drives are
connected.
Obtained from: ABT Systems Ltd
Sponsored by: SoftIron Inc
bridges. Currently this includes information about what resources a
bridge decodes on the upstream side for use by downstream devices including
bus numbers, I/O port resources, and memory resources. Windows and bus
ranges are enumerated for both PCI-PCI bridges and PCI-CardBus bridges.
To simplify the implementation, all enumeration is done by reading the
appropriate config space registers directly rather than querying the
bridge driver in the kernel via new ioctls. This does result in a few
limitations.
First, an unimplemented window in a PCI-PCI bridge cannot be accurately
detected as accurate detection requires writing to the window base
register. That is not safe for pciconf(8). Instead, this assumes that
any window where both the base and limit read as all zeroes is
unimplemented.
Second, the PCI-PCI bridge driver in a tree has a few quirks for
PCI-PCI bridges that use subtractive decoding but do not indicate that
via the progif config register. The list of quirks is duplicated in
pciconf's source.
Reviewed by: imp
MFC after: 2 weeks
Differential Revision: https://reviews.freebsd.org/D4171
PCI-Express capability registers (that is, PCI config registers in the
standard PCI config space belonging to the PCI-Express capability
register set).
Note that all of the current PCI-e registers are either 16 or 32-bits,
so only widths of 2 or 4 bytes are supported.
Reviewed by: imp
MFC after: 1 week
Sponsored by: Chelsio
Differential Revision: https://reviews.freebsd.org/D4088
When the system has more than a single PCI domain, the bus numbers
are not unique, thus they cannot be used for "pci" device numbering.
Change bus numbers to -1 (i.e. to-be-determined automatically)
wherever the code did not care about domains.
Reviewed by: jhb
Obtained from: Semihalf
Sponsored by: The FreeBSD Foundation
Differential Revision: https://reviews.freebsd.org/D3406
Internal bridges in Cavium ThunderX SoC behave as subtractive,
but they are unable to be identified. Force setting an appropriate
flag.
Reviewed by: emaste, imp
Obtained from: Semihalf
Sponsored by: The FreeBSD Foundation
Differential Revision: https://reviews.freebsd.org/D3277
It is possible that some HW will use different PCI devids,
hence allow to replace the default domain🚌slot:func schema
by implementing and registering custom function.
Obtained from: Semihalf
Sponsored by: The FreeBSD Foundation
Differential Revision: https://reviews.freebsd.org/D3118
Work based on Cavium Thunder PCIe driver by Semihalf.
Reviewed by: andrew, jhb
Sponsored by: HEIF5
Differential Revision: https://reviews.freebsd.org/D2386
(easily) without having to go to other drivers to change the
magical return values. This wouldn't be so bad if there were
proper defines for these constants.
In particular dev/acpica/acpi_pcib_pci.c returns -1000 as the
probe priority and it's expected that this driver gets to
attach over the common PCI bus drivers.
BUS_PROBE_HOOVER is. Drivers like proto(4), when compiled into the
kernel or preloaded, will render your system useless by virtue of
attaching to your PCI busses.
Return BUS_PROBE_GENERIC instead. It's just the next priority up
from BUS_PROBE_HOOVER. No other meaning has been give to its use.
While BUS_PROBE_DEFAULT seems like a better candidate, it's hard
not to think that there must be some reason why these drivers
return -10000 in the first place.
Differential Revision: D2705
buildkernel run.
Some of them were write-only under some kernel options, e.g. variables
keeping values only used by CTR() macros. It costs nothing to the
code readability and correctness to eliminate the warnings in those
cases too by removing the local cached values used only for
single-access.
Review: https://reviews.freebsd.org/D2665
Reviewed by: rodrigc
Looked at by: bjk
Sponsored by: The FreeBSD Foundation
MFC after: 1 week
Leaf drivers should not import the PCI bus interface to add IOV handling.
Instead, move the IOV client methods to a separate kobj interface.
Differential Revision: https://reviews.freebsd.org/D2584
Reviewed by: rstone
Summary:
The Freescale PCIe Root Complex shows up as a Processor class device, PowerPC
subclass, so the generic PCI code ignores it for a bridge. This adds support
for it.
As part of this, update the Freescale PCI hostbridge driver, to allow probing
beyond the root complex, instead of only allowing "proper" PCI-PCI bridges.
Reviewers: #powerpc, marcel, nwhitehorn
Reviewed By: nwhitehorn
Subscribers: imp
Differential Revision: https://reviews.freebsd.org/D2442
Relnotes: yes
Change the nvlist_recv() function to take additional argument that
specifies flags expected on the received nvlist. Receiving a nvlist with
different set of flags than the ones we expect might lead to undefined
behaviour, which might be potentially dangerous.
Update consumers of this and related functions and update the tests.
Approved by: pjd (mentor)
Update man page for nvlist_unpack, nvlist_recv, nvlist_xfer, cap_recv_nvlist
and cap_xfer_nvlist.
Reviewed by: AllanJude
Approved by: pjd (mentor)
(type 1 and type 2) as well as leaf devices (type 0). In particular,
this allows the existing PCI bus logic to save and restore capability
registers such as MSI and PCI-express work for bridge devices rather than
requiring that code to be duplicated in bridge drivers. It also means
that bridge drivers no longer need to save and restore basic registers
such as the PCI command register or BARs nor manage powerstates for the
bridge device.
While here, pci_setup_secbus() has been changed to initialize the 'sec'
and 'sub' fields in the 'secbus' structure instead of requiring the pcib
and pccbb drivers to do this in the NEW_PCIB + PCI_RES_BUS case.
Differential Revision: https://reviews.freebsd.org/D2240
Reviewed by: imp, jmg
MFC after: 2 weeks
driver's suspend and resume routines. These have been redundant no-ops
since r214065 changed the PCI bus driver to manage power states for
all devices (including type 1/2 bridge devices) during suspend and resume.
for type 0 devices, not type 1 or 2 bridges. Don't read them for bridge
devices during bus scans and return an error when attempting to read them
as ivars for bridge devices.
A late change to the SR-IOV infrastructure broke passthrough of
VFs. device_set_devclass() was being used to try to force the
ppt driver to attach to the device, but this didn't work because
the DF_FIXEDCLASS flag wasn't being set on the device, so the
ppt driver probe routine would not match when it returned
BUS_NOWILDCARD. Fix this by adding a new device function that
both sets the devclass and sets the DF_FIXEDCLASS flag, and use
that to force the ppt driver to attach to VFs.
Differential Revision: https://reviews.freebsd.org/D2041
Reviewed by: jhb
MFC after: 3 weeks
Pass all SR-IOV configuration to the kernel using an nvlist. The
main benefit that this offers is flexibility. It allows a driver
to accept any number of parameters of any type supported by the
SR-IOV configuration infrastructure with having to make any
changes outside of the driver.
It also offers the user very fine-grained control over the
configuration of the VFs -- if they want, they can have different
configuration applied to every VF.
Differential Revision: https://reviews.freebsd.org/D82
Reviewed by: jhb
MFC after: 1 month
Sponsored by: Sandvine Inc.
Add a function that validates that the user-provided SR-IOV
configuration is valid. This includes basic checks that the
structure of the configuration is correct (e.g. all required
configuration nodes are present) as well as validating against
a configuration schema.
The schema validation consists of:
- Ensuring that all required config parameters are present.
- If the schema defines a default value for a parameter,
adding the default value if the parameter is not set.
- Ensuring that no parameters are specified in the config
that are not defined in the schema.
- Ensuring that have the correct type defined in the schema.
- Ensuring that no configuration nodes are present for devices
that do not exist. For example, if 2 VFs are configured,
then we validate that a node called VF-5 does not exist.
Differential Revision: https://reviews.freebsd.org/D81
Reviewed by: jhb
MFC after: 1 month
Sponsored by: Sandvine Inc.
When creating VFs, we must size each SR-IOV BAR on the PF and
allocate a configuous I/O memory window large enough for every VF.
However, the window only needs to be aligned to a boundary equal
to the size of the window for a single VF.
When a VF attempts to allocate an I/O memory resource, we must
intercept the request in the pci driver and pass it off to the
SR-IOV code, which will allocate the correct window from the
pre-allocated memory space for the PF.
Inform the pci driver about the size and address of the BARs on
the VF when the VF is created. This is required by pciconf -b and
bhyve.
Differential Revision: https://reviews.freebsd.org/D78
Reviewed by: jhb
MFC after: 1 month
Sponsored by: Sandvine Inc.
The SR-IOV standard requires VFs to read all-ones when the VID
and DID registers are read. The VMM (hypervisor) is required to
emulate them instead. Make pci_read_config() do this emulation.
Change pci_user.c to use pci_read_config() to read config space
registers instead of going directly to the pcib so that the
emulated VID/DID registers work correctly on VFs. This is
required both for pciconf and bhyve PCI passthrough.
Differential Revision: https://reviews.freebsd.org/D77
Reviewed by: jhb
MFC after: 1 month
Sponsored by: Sandvine Inc.
Implement the interace to create SR-IOV Virtual Functions (VFs).
When a driver registers that they support SR-IOV by calling
pci_setup_iov(), the SR-IOV code creates a new node in /dev/iov
for that device. An ioctl can be invoked on that device to
create VFs and have the driver initialize them.
At this point, allocating memory I/O windows (BARs) is not
supported.
Differential Revision: https://reviews.freebsd.org/D76
Reviewed by: jhb
MFC after: 1 month
Sponsored by: Sandvine Inc.
Refactor PCI resource allocation code to allow a request for a
memory-mapped I/O window that is a multiple of a requested size.
This is needed by the SR-IOV code because the VF BARs are all
allocated contiguously. We can't just allocate a resource that is
a multiple of a single VF BAR because the size of an allocation
implies its alignment requirement.
Differential Revision: https://reviews.freebsd.org/D71
Reviewed by: jhb
MFC after: 1 month
Sponsored by: Sandvine Inc.
Refactor creation of PCI devices into helper methods that can be
used by the VF creation code.
Differential Revision: https://reviews.freebsd.org/D67
Reviewed by: jhb
MFC after: 1 month
Sponsored by: Sandvine Inc.
allows the user to request administrative changes to individual devices
such as attach or detaching drivers or disabling and re-enabling devices.
- Add a new /dev/devctl2 character device which uses ioctls for device
requests. The ioctls use a common 'struct devreq' which is somewhat
similar to 'struct ifreq'.
- The ioctls identify the device to operate on via a string. This
string can either by the device's name, or it can be a bus-specific
address. (For unattached devices, a bus address is the only way to
locate a device.) Bus drivers register an eventhandler to claim
unrecognized device names that the driver recognizes as a valid address.
Two buses currently support addresses: ACPI recognizes any device
in the ACPI namespace via its full path starting with "\" and
the PCI bus driver recognizes an address specification of
'pci[<domain>:]<bus>:<slot>:<func>' (identical to the PCI selector
strings supported by pciconf).
- To make it easier to cut and paste, change the PnP location string
in the PCI bus driver to output a full PCI selector string rather
than 'slot=<slot> function=<func>'.
- Add a devctl(3) interface in libdevctl which provides a wrapper around
the ioctls and is the preferred interface for other userland code.
- Add a devctl(8) program which is a simple wrapper around the requests
supported by devctl(3).
- Add a device_is_suspended() function to check DF_SUSPENDED.
- Add a resource_unset_value() function that can be used to remove a
hint from the kernel environment. This is used to clear a
hint.<driver>.<unit>.disabled hint when re-enabling a boot-time
disabled device.
Reviewed by: imp (parts)
Requested by: imp (changing PCI location string)
Relnotes: yes
for the lookup.
- For devices affected by PCI_QUIRK_MSI_INTX_BUG, ensure PCIM_CMD_INTxDIS
is cleared when using MSI/MSI-X.
- Employ PCI_QUIRK_MSI_INTX_BUG for BCM5714(S)/BCM5715(S)/BCM5780(S) rather
than clearing PCIM_CMD_INTxDIS unconditionally for all devices in bge(4).
MFC after: 3 days
state said device should go into.
This was a snafu introduced in the ACPI/PCI awareness separation.
When putting a device into a power state, the bus (and thus firmware,
eg ACPI) should be asked before hand to check whether the device
can indeed go into that power state.
There's a set of nodes in ACPI under each device - the _SxD nodes - which
state which ACPI power state to put the device into when the system is
going into power save state 'x'. So when going into S3, the existence
of an _S3D node would override whatever the system was trying to do.
By default the PCI code wants to put devices into D3 before suspending.
I have a laptop here (Asus Zenbook - check the PR) whose EHCI controller
really wants to be in D2 during suspend, not D3. So if we put it into
D3 and then try to enter S3, everything hangs. The device itself
can go into D3 - it just can't be there when the call to ACPI to enter
S3 occurs. The PCI patch fixes this.
jkim@ noticed that the same is needed for the ACPI child device
enumeration.
Thankyou to Matt Dillon (the programmer, not the actor) for buying me
this particular laptop so I could debug the issues with the Atheros
AR9485 that is in it. It's his fault that I ended up with this
laptop and was sufficiently annoyed by the lack of USB suspend
to go down this rabbit hole.
Tested:
* Thinkpad T400
* Thinkpad X230
* Thinkpad T42
* Thinkpad T60
* Asus Zenbook (see PR)
* Asus EEEPC 701
* Asus EEEPC 1001PX
TODO:
* Figure out what we should do about devices we unload drivers for
that want to be in a specific state when entering S3 / S4 -
the "put devices into D3 if they're not bound to a driver" option
may also mess with things.
PR: kern/194884
Reviewed by: jhb, jkim
MFC after: 1 week
Relnotes: yes
Sponsored by: Matt Dillon <dillon@apollo.backplane.com> (hardware)
in userland rename in-kernel getenv()/setenv() to kern_setenv()/kern_getenv().
This fixes a namespace collision with libc symbols.
Submitted by: kmacy
Tested by: make universe
* Add a bus_if.m method - get_domain() - returning the VM domain or
ENOENT if the device isn't in a VM domain;
* Add bus methods to print out the domain of the device if appropriate;
* Add code in srat.c to save the PXM -> VM domain mapping that's done and
expose a function to translate VM domain -> PXM;
* Add ACPI and ACPI PCI methods to check if the bus has a _PXM attribute
and if so map it to the VM domain;
* (.. yes, this works recursively.)
* Have the pci bus glue print out the device VM domain if present.
Note: this is just the plumbing to start enumerating information -
it doesn't at all modify behaviour.
Differential Revision: D906
Reviewed by: jhb
Sponsored by: Norse Corp
Summary:
Add the beginnings of multipass suspend/resume, by introducing
BUS_SUSPEND_CHILD/BUS_RESUME_CHILD, and move the PCI driver to this.
Reviewers: jhb
Reviewed By: jhb
Differential Revision: https://reviews.freebsd.org/D590
This is needed so when running under Xen the calls to pci_child_added
can be intercepted and a custom Xen method can be used to register
those devices with Xen. This should not include any functional
change, since the Xen implementation will be added in a following
patch and the native implementation is a noop.
Sponsored by: Citrix Systems R&D
Reviewed by: jhb
dev/pci/pci.c:
dev/pci/pci_if.m:
dev/pci/pci_private.h:
dev/pci/pcivar.h:
- Add the pci_child_added newbus method.
Make the functions pci_disable_msi, pci_enable_msi and pci_enable_msix
methods of the newbus PCI bus. This code should not include any
functional change.
Sponsored by: Citrix Systems R&D
Reviewed by: imp, jhb
Differential Revision: https://reviews.freebsd.org/D354
dev/pci/pci.c:
- Convert the mentioned functions to newbus methods.
- Fix the callers of the converted functions.
sys/dev/pci/pci_private.h:
dev/pci/pci_if.m:
- Declare the new methods.
dev/pci/pcivar.h:
- Add helpers to call the newbus methods.
ofed/include/linux/pci.h:
- Add define to prevent the ofed version of pci_enable_msix from
clashing with the FreeBSD native version.
This includes:
o All directories named *ia64*
o All files named *ia64*
o All ia64-specific code guarded by __ia64__
o All ia64-specific makefile logic
o Mention of ia64 in comments and documentation
This excludes:
o Everything under contrib/
o Everything under crypto/
o sys/xen/interface
o sys/sys/elf_common.h
Discussed at: BSDcan
These changes prevent sysctl(8) from returning proper output,
such as:
1) no output from sysctl(8)
2) erroneously returning ENOMEM with tools like truss(1)
or uname(1)
truss: can not get etype: Cannot allocate memory
there is an environment variable which shall initialize the SYSCTL
during early boot. This works for all SYSCTL types both statically and
dynamically created ones, except for the SYSCTL NODE type and SYSCTLs
which belong to VNETs. A new flag, CTLFLAG_NOFETCH, has been added to
be used in the case a tunable sysctl has a custom initialisation
function allowing the sysctl to still be marked as a tunable. The
kernel SYSCTL API is mostly the same, with a few exceptions for some
special operations like iterating childrens of a static/extern SYSCTL
node. This operation should probably be made into a factored out
common macro, hence some device drivers use this. The reason for
changing the SYSCTL API was the need for a SYSCTL parent OID pointer
and not only the SYSCTL parent OID list pointer in order to quickly
generate the sysctl path. The motivation behind this patch is to avoid
parameter loading cludges inside the OFED driver subsystem. Instead of
adding special code to the OFED driver subsystem to post-load tunables
into dynamically created sysctls, we generalize this in the kernel.
Other changes:
- Corrected a possibly incorrect sysctl name from "hw.cbb.intr_mask"
to "hw.pcic.intr_mask".
- Removed redundant TUNABLE statements throughout the kernel.
- Some minor code rewrites in connection to removing not needed
TUNABLE statements.
- Added a missing SYSCTL_DECL().
- Wrapped two very long lines.
- Avoid malloc()/free() inside sysctl string handling, in case it is
called to initialize a sysctl from a tunable, hence malloc()/free() is
not ready when sysctls from the sysctl dataset are registered.
- Bumped FreeBSD version to indicate SYSCTL API change.
MFC after: 2 weeks
Sponsored by: Mellanox Technologies
Ensure that first_func is set to 0 on every iteration of the PCI slot
enumeration loop after the first. There is a continue statement that would
cause first_func to stay at 1 any PCI device where slot 0 has no functions
until we find a slot that does have a function. This would cause us to
not enumerate the first PCI function on the device.
Credit to markj@ for spotting the bug.
X-MFC-With: r264011
PCIe Alternate RID Interpretation (ARI) is an optional feature that
allows devices to have up to 256 different functions. It is
implemented by always setting the PCI slot number to 0 and
re-purposing the 5 bits used to encode the slot number to instead
contain the function number. Combined with the original 3 bits
allocated for the function number, this allows for 256 functions.
This is enabled by default, but it's expected to be a no-op on currently
supported hardware. It's a prerequisite for supporting PCI SR-IOV, and
I want the ARI support to go in early to help shake out any bugs in it.
ARI can be disabled by setting the tunable hw.pci.enable_ari=0.
Reviewed by: kib
MFC after: 2 months
Sponsored by: Sandvine Inc.
My PCI RID changes somehow got intermixed with my PCI ARI patch when I
committed it. I may have accidentally applied a patch to a non-clean
working tree. Revert everything while I figure out what went wrong.
Pointy hat to: rstone
I/O windows, the default is to preserve the firmware-assigned resources.
PCI bus numbers are only managed if NEW_PCIB is enabled and the architecture
defines a PCI_RES_BUS resource type.
- Add a helper API to create top-level PCI bus resource managers for each
PCI domain/segment. Host-PCI bridge drivers use this API to allocate
bus numbers from their associated domain.
- Change the PCI bus and CardBus drivers to allocate a bus resource for
their bus number from the parent PCI bridge device.
- Change the PCI-PCI and PCI-CardBus bridge drivers to allocate the
full range of bus numbers from secbus to subbus from their parent bridge.
The drivers also always program their primary bus register. The bridge
drivers also support growing their bus range by extending the bus resource
and updating subbus to match the larger range.
- Add support for managing PCI bus resources to the Host-PCI bridge drivers
used for amd64 and i386 (acpi_pcib, mptable_pcib, legacy_pcib, and qpi_pcib).
- Define a PCI_RES_BUS resource type for amd64 and i386.
Reviewed by: imp
MFC after: 1 month
are mostly useful for debugging.
- hw.pci.clear_bars ignores all firmware-assigned ranges for BARs when
set.
- hw.pci.clear_pcib ignores all firmware-assigned ranges for PCI-PCI
bridge I/O windows when set.
MFC after: 1 week
- Store the length of each read-only VPD value since not all values are
guaranteed to be ASCII values (though most are).
- Add a new pciio ioctl to fetch VPD for a single PCI device. The values
are returned as a list of variable length records, one for the device
name and each keyword.
- Add a new -V flag to pciconf's list mode which displays VPD data for
each device.
MFC after: 1 week
The previous code was checking the "VGA Enable" bit on the video card's
parent PCI-to-PCI bridge only. This didn't work for the case where the
video card is attached to the root PCI bus (ie. the card has no parent
PCI-to-PCI bridge).
Now, the new code:
1. checks the "VGA Enable" bit on the parent bridge only if it's a
PCI-to-PCI bridge;
2. always checks the "I/O" and "Memory address space decoding" bits
on the video card itself.
However, vendor-specific bits are not used.
This fixes the use of many integrated Radeon cards: without this patch,
we fail to detect them as the boot display and, when radeonkms looks for
the Video BIOS, it skips the shadow copy made by the System BIOS. It
then fails to fully initialize the card, because the shadow copy is the
only way to read the Video BIOS in these situations. A workaround was to
force the boot display selection using the "hw.pci.default_vgapci_unit"
tunable.
A previous version of this patch added a new function doing the checks.
Now, the vga_pci_is_boot_display() function is used to perform the
checks (only until the boot display is found) and return if the given
device is the boot display or not.
Furthermore, vga_pci_attach() logs "Boot video device" if the card being
attached it the Chosen One:
vgapci0: <VGA-compatible display> [...]
vgapci0: Boot video device
Reviewed by: kib@, jhb@ (both a previous version)
Tested by: lunatic_ (#freebsd-xorg, integrated Radeon card,
xmj (#freebsd-xorg, i915+NVIDIA cards)
referenced by pointer, making it non-static should not have even the
negligible impact on the existing code.
Reviewed by: jhb
Sponsored by: The FreeBSD Foundation
MFC after: 1 week
Here are two new functions to map and unmap the Video BIOS:
void * vga_pci_map_bios(device_t dev, size_t *size);
void vga_pci_unmap_bios(device_t dev, void *bios);
The BIOS is either taken from the shadow copy made by the System BIOS at
boot time if the given device was used for the default display (i386,
amd64 and ia64 only), or from the PCI expansion ROM.
Additionally, one can determine if a given device was the default
display at boot time using the following new function:
void vga_pci_unmap_bios(device_t dev, void *bios);
beasts still exist unfortunately. More details can be found in other
references, but the short version is that bridges with this bit set ignore
I/O port ranges that alias to valid ISA I/O port ranges. In the driver
this requires not allocating these alias regions from the parent device
(so they are free to be acquired by ISA devices), and ensuring no child
devices use resources from these alias regions.
- Change the pcib_window structure to allow for an array of backing
resources rather than a single resource and update the existing code
to cope with this. Some of the coping requires using the saved
base and limit values in pcib_window instead of using rman operations
on the backing resource.
- Add special handling for allocating and adjusting the I/O port window
of an ISA-enabled bridge to only allocate the non-alias ranges and
add those to the associated resource manager.
- Reject I/O port allocations for a fixed request that conflicts with an
ISA alias range.
- Remove the "no prefected decode" verbose printf during boot. The absence
of a "prefetched decode" line is sufficient.
- Replace the "subtractively decoded bridge" verbose printf with a single
printf that lists all the "special" decoding modes of a bridge: ISA,
subtractive, and VGA.
- Add a custom bus_release_resource() method to the PCI bus driver so that
it can properly free resources for I/O windows of PCI-PCI bridges.
(These resources are not stored in the bridge device's resource list.)
PR: misc/179033
MFC after: 2 weeks
VMware up to at least ESXi 5.1. Actually, using INTx in that case instead
may still result in interrupt storms, with MSI being the only working
option in some configurations. So introduce a PCI_QUIRK_DISABLE_MSIX quirk
which only blacklists MSI-X but not also MSI and use it for the VMware
PCI-PCI-bridges. Note that, currently, we still assume that if MSI doesn't
work, MSI-X won't work either - but that's part of the internal logic and
not guaranteed as part of the API contract. While at it, add and employ
a pci_has_quirk() helper.
Reported and tested by: Paul Bucher
- Use NULL instead of 0 for pointers.
Submitted by: jhb (mostly)
Approved by: jhb
MFC after: 3 days
bug where a PCI device would be powered down if it failed to probe, but
not when its driver was detached (e.g. via kldunload).
- Add a new helper method resource_list_release_active() which forcefully
releases any active resources of a specified type from a resource list.
- Add a bus_child_detached method for the PCI bus driver which forces any
active resources to be released (and whines to the console if it finds
any) and then powers the device down.
- Call pci_child_detached() if we fail to probe a device when a driver
is kldloaded. This isn't perfect but can avoid leaking resources
from a probe() routine in the kldload case.
Reviewed by: imp, brooks
MFC after: 1 month
tester of this fix, and realloc_bars breaks some other cases as a small
BAR that is reallocated can end up grabbing space needed by a much larger
BAR in the existing window of a PCI-PCI bridge.
MFC after: 3 days
assigned conflicting ranges to BARs then leaving the BARs alone could
result in one device stealing mmio accesses intended to go to a second
device. Prior to 233677 the PCI bus driver attempted to handle this case
by clearing the BAR to 0 depending on BARs based at 0 not decoding (which
is not guaranteed to be true). Now when a conflicting BAR is detected the
following steps are taken:
1) If hw.pci.realloc_bars (a new tunable) is enabled (default is enabled),
then ignore the current BAR setting from the firmware and attempt to
allocate a fresh resource range for the BAR.
2) If 1) failed (or was disabled), disable decoding for the relevant
BAR type (e.g. disable mem decoding for a memory BAR) and emit a
warning if booting verbose.
Tested by: Alex Keda <admin@lissyara.su>
MFC after: 1 week
device which makes the request for dma tag, instead of some descendant
of the PCI device, by creating a pass-through trampoline for vga_pci
and ata_pci buses.
Sponsored by: The FreeBSD Foundation
Suggested by: jhb
Discussed with: jhb, mav
MFC after: 1 week
devices. While at it, update the comment now that we know that MSI-X
doesn't work with ESXi 5.1 for Intel 82576 either and the underlying issue
is a bug in the MSI-X allocation code of the hypervisor.
Reported by: Harald Schmalzbauer
- Make the nomatch table const.
MFC after: 1 week
them, please let me know if not). Most of these are of the form:
static const struct bzzt_type {
[...list of members...]
} const bzzt_devs[] = {
[...list of initializers...]
};
The second const is unnecessary, as arrays cannot be modified anyway,
and if the elements are const, the whole thing is const automatically
(e.g. it is placed in .rodata).
I have verified this does not change the binary output of a full kernel
build (except for build timestamps embedded in the object files).
Reviewed by: yongari, marius
MFC after: 1 week
#defines. This also has the advantage that it makes the names more
compact, iand also allows us to correct the non-uniform naming of
the PCIM_LINK_* defines, making them all consistent amongst themselves.
This is a mostly mechanical rename:
s/PCIR_EXPRESS_/PCIER_/g
s/PCIM_EXP_/PCIEM_/g
s/PCIM_LINK_/PCIEM_LINK_/g
When this is MFC'd, #defines will be added for the old names to assist
out-of-tree drivers.
Discussed with: jhb
MFC after: 1 week
- Add constants for the rest of the fields in the PCI-express device
capability and control registers.
- Tweak some of the recently added PCI-e capability constants (always
use hex for offsets in config space, and include a shortened
version of the relevant register in the name of field constants).
MFC after: 1 week
have been chosen based on the bit names in the PCI Express Base
Specification 3.0, and to match the predominant style of the existing
bit definitions.
MFC after: 1 week
on the secondary side of a bridge will not be propagated to the primary
bus unless this is enabled. Busmastering is not enabled by default (we
have relied on firmware to set this bit to date). The OS needs to set it
for any bridges not configured by system firmware.
Tested by: Steve Polyack korvus comcast net
MFC after: 2 weeks
the request up the tree in order to be on the safe side. Growing windows
in this case would mean to switch resources to positive decoding and
it's unclear how to correctly handle this. At least with ALi/ULi M5249
PCI-PCI bridges, this also just doesn't work out of the box.
Reviewed by: jhb
MFC after: 3 days
growing "downward" (moving the start address down). First, an off by
one error caused the end address to be moved down an extra alignment
chunk unnecessarily. Second, when aligning the new candidate starting
address, the wrong bits were masked off.
Tested by: Andrey Zonov andrey zonov org
MFC after: 3 days
and deactivating PCI resources. Previously, if a device had more than
48 MSI interrupts, then activating message 48 (which has a rid == PCIR_BIOS)
would incorrectly try to enable the PCI ROM BAR.
Tested by: Olivier Cinquin ocinquin uci edu
MFC after: 3 days
not disable it and it is even harmful as hselasky found out. Historically,
this code was originated from (OLDCARD) CardBus driver and later leaked into
PCI driver when CardBus was newbus'ified and refactored with PCI driver.
However, it is not really necessary even for CardBus.
Reviewed by: hselasky, imp, jhb
bridges. Rather than blindly enabling the windows on all of them, only
enable the window when an MSI interrupt is enabled for a device behind
the bridge, similar to what already happens for HT PCI-PCI bridges.
To implement this, each x86 Host-PCI bridge driver has to be able to
locate it's actual backing device on bus 0. For ACPI, use the _ADR
method to find the slot and function of the device. For the non-ACPI
case, the legacy(4) driver already scans bus 0 looking for Host-PCI
bridge devices. Now it saves the slot and function of each bridge that
it finds as ivars that the Host-PCI bridge driver can then use in its
pcib_map_msi() method.
This fixes machines where non-MSI interrupts were broken by the previous
round of HT MSI changes.
Tested by: bapt
MFC after: 1 week
Don't disable BARs on any PCI display devices, because doing that can
sometimes cause the main memory bus to stop working, causing all
memory reads to return nothing but 0xFFFFFFFF, even though the memory
location was previously written. After a while a privileged
instruction fault will appear and then nothing more can be debugged.
The reason for this behaviour is unknown.
MFC after: 1 week
For example, some BIOS for AMD SB600 south bridge may map HPET MMIO base
address as a memory BAR for SMBus controller depending on a PM register
configuration. Before r231161 (and r232086, subsequent MFC to stable/9),
it was not fatal but hpet(4) just failed to attach. Since we probe and
attach HPET earlier than PCI devices now, it caused unfortunate hard lockup.
With this patch, it does not hang any more and HPET works at the same time.
Clean up some style nits while I am in the neighborhood.
PR: kern/165647
Reviewed by: jhb
MFC after: 3 days
Expand pci_save_state and pci_restore_state to save more of
the config state for PCI Express and PCI-X devices. Various
writable control registers are present in PCI Express that
can potentially be lost over suspend/resume cycle.
This change is modeled after similar functionality in Linux.
Reviewed by: wlosh,jhb
MFC after: 1 month
all for platforms that only have 32-bit bus addresses. Second, remove
the 'tag_valid' flag from the softc. Instead, if we don't create a
tag in pci_attach_common(), just cache the value of our parent's tag
so that we always have a valid tag to return.
- pci_find_extcap() is repurposed to be used for fetching PCI-express
extended capabilities (PCIZ_* constants in <dev/pci/pcireg.h>).
- pci_find_htcap() can be used to locate a specific HyperTransport
capability (PCIM_HTCAP_* constants in <dev/pci/pcireg.h>).
- Cache the starting location of the PCI-express capability for PCI-express
devices in PCI device ivars.
The tag enforces a single restriction that all DMA transactions must not
cross a 4GB boundary. Note that while this restriction technically only
applies to PCI-express, this change applies it to all PCI devices as it
is simpler to implement that way and errs on the side of caution.
- Add a softc structure for PCI bus devices to hold the bus_dma tag and
a new pci_attach_common() routine that performs actions common to the
attach phase of all PCI bus drivers. Right now this only consists of
a bootverbose printf and the allocate of a bus_dma tag if necessary.
- Adjust all PCI bus drivers to allocate a PCI bus softc and to call
pci_attach_common() from their attach routines.
MFC after: 2 weeks
pci_cfg_save() and pci_cfg_restore() for device drivers to use when
saving and restoring state (e.g. to handle device-specific resets).
Reviewed by: imp
MFC after: 2 weeks
through by VMware so blacklist their PCI-PCI bridge for MSI/MSI-X here.
Note that besides currently there not being a quirk type that disables
MSI-X only and there's no evidence that MSI doesn't work with the VMware
pass-through, it's really questionable whether MSI generally works in
that setup as VMware only mention three know working devices [1, p. 4].
Also not that this quirk entry currently doesn't affect the devices
emulated by VMware in any way as these don't claim support MSI/MSI-X to
begin with. [2]
While at it, make the PCI quirk table const and static.
- Remove some duplicated empty lines.
- Use DEVMETHOD_END.
PR: 163812, http://forums.freebsd.org/showthread.php?t=27899 [2]
Reviewed by: jhb
MFC after: 3 days
pci_get_vpd_readonly_method(). Previously the loop was always running
to completion and falling through to failing with ENXIO.
PR: kern/164313
Submitted by: Chuck Tuffli chuck tuffli net
MFC after: 1 week
one. Interestingly, these are actually the default for quite some time
(bus_generic_driver_added(9) since r52045 and bus_generic_print_child(9)
since r52045) but even recently added device drivers do this unnecessarily.
Discussed with: jhb, marcel
- While at it, use DEVMETHOD_END.
Discussed with: jhb
- Also while at it, use __FBSDID.
resource allocation on x86 platforms:
- Add a new helper API that Host-PCI bridge drivers can use to restrict
resource allocation requests to a set of address ranges for different
resource types.
- For the ACPI Host-PCI bridge driver, use Producer address range resources
in _CRS to enumerate valid address ranges for a given Host-PCI bridge.
This can be disabled by including "hostres" in the debug.acpi.disabled
tunable.
- For the MPTable Host-PCI bridge driver, use entries in the extended
MPTable to determine the valid address ranges for a given Host-PCI
bridge. This required adding code to parse extended table entries.
Similar to the new PCI-PCI bridge driver, these changes are only enabled
if the NEW_PCIB kernel option is enabled (which is enabled by default on
amd64 and i386).
Approved by: re (kib)
bridge is blacklisted. In that case just return from pci_alloc_msix_method(),
otherwise we continue without a single MSI-X resource, causing subsequent
attempts to use the seemingly available resource to fail or when booting
verbose a NULL-pointer dereference of rle->start when trying to print the
IRQ in pci_alloc_msix_method().
Reviewed by: jhb
MFC after: 1 week
the recent changes to track BAR state explicitly. The code would now
attempt to add the same BAR twice in this case. Instead, change this so
that it recognizes this case and only adds it once and do not delete the
BAR outright after parsing the CIS.
Tested by: bschmidt
driver would verify that requests for child devices were confined to any
existing I/O windows, but the driver relied on the firmware to initialize
the windows and would never grow the windows for new requests. Now the
driver actively manages the I/O windows.
This is implemented by allocating a bus resource for each I/O window from
the parent PCI bus and suballocating that resource to child devices. The
suballocations are managed by creating an rman for each I/O window. The
suballocated resources are mapped by passing the bus_activate_resource()
call up to the parent PCI bus. Windows are grown when needed by using
bus_adjust_resource() to adjust the resource allocated from the parent PCI
bus. If the adjust request succeeds, the window is adjusted and the
suballocation request for the child device is retried.
When growing a window, the rman_first_free_region() and
rman_last_free_region() routines are used to determine if the front or
end of the existing I/O window is free. From using that, the smallest
ranges that need to be added to either the front or back of the window
are computed. The driver will first try to grow the window in whichever
direction requires the smallest growth first followed by the other
direction if that fails.
Subtractive bridges will first attempt to satisfy requests for child
resources from I/O windows (including attempts to grow the windows). If
that fails, the request is passed up to the parent PCI bus directly
however.
The PCI-PCI bridge driver will try to use firmware-assigned ranges for
child BARs first and only allocate a "fresh" range if that specific range
cannot be accommodated in the I/O window. This allows systems where the
firmware assigns resources during boot but later wipes the I/O windows
(some ACPI BIOSen are known to do this) to "rediscover" the original I/O
window ranges.
The ACPI Host-PCI bridge driver has been adjusted to correctly honor
hw.acpi.host_mem_start and the I/O port equivalent when a PCI-PCI bridge
makes a wildcard request for an I/O window range.
The new PCI-PCI bridge driver is only enabled if the NEW_PCIB kernel option
is enabled. This is a transition aide to allow platforms that do not
yet support bus_activate_resource() and bus_adjust_resource() in their
Host-PCI bridge drivers (and possibly other drivers as needed) to use the
old driver for now. Once all platforms support the new driver, the
kernel option and old driver will be removed.
PR: kern/143874 kern/149306
Tested by: mav
allocated, not the maximum number of messages the device supports. The
spec only requires the former, and I believe I implemented the latter due
to misunderstanding an e-mail. In particular, this fixes an issue where
having several devices that all support 16 messages can run out of
IDT vectors on x86 even though the driver only uses a single message.
Submitted by: Bret Ketchum bcketchum of gmail
MFC after: 1 week
bus driver will now remember the size of a BAR obtained during the initial
bus scan and use that size when doing lazy resource allocation rather than
resizing the BAR. The bus driver will now also report unallocated BARs to
userland for display by 'pciconf -lb'. Psuedo-resources that are not BARs
(such as the implicit I/O port resources for master/slave ATA controllers)
will no longer be listed as BARs in 'pciconf -lb'. During resume, BARs are
restored from their new saved state instead of having the raw registers
saved and restored across resume. This also fixes restoring BARs at
unusual loactions if said BAR has been allocated by a driver.
Add a constant for the offset of the ROM BIOS BAR in PCI-PCI bridges and
properly handle ROM BIOS BARs in PCI-PCI bridges. The PCI bus now also
properly handles the lack of a ROM BIOS BAR in a PCI-Cardbus bridge.
Tested by: jkim
"extended capabilities" to refer to the new set of capability structures
starting at offset 0x100 in config space for PCI-express devices. For now
both function names will still work. I will merge this to older branches
to ease driver portability, but 9.0 will ship with a new pci_find_extcap()
function that locates extended capabilities instead.
Reviewed by: imp
MFC after: 1 week
chipsets that do not have an HT slave at 0:0:0:0. The Linux quirk is
actually specific to Nvidia chipsets and the check I had added was in
the wrong place.
Prodded by: nathanw
- Always enable the HyperTransport MSI mapping window for HyperTransport
to PCI bridges (these show up as HyperTransport slave devices).
The mapping windows in PCI-PCI bridges are enabled by existing code
in the PCI-PCI bridge driver as MSI requests propagate up the device
tree, but Host-PCI bridges don't really show up in that tree.
- If the PCI device at domain 0 bus 0 slot 0 function 0 is not a
HyperTransport device, then blacklist MSI on any other HT devices in
the system. Linux has a similar quirk.
PR: kern/155442
Tested by: Zack Dannar zdannar of gmail
MFC after: 1 week
causing the size calculation to be truncated to the size of an int
(32-bits on all current architectures).
Submitted by: Anish akgupt3 of gmail
MFC after: 1 week
within the first 4 bytes of the EHCI memory space. For controllers that
use big-endian MMIO, reading them with 1- and 2-byte reads would then
return the wrong values. Instead, read the combined register with a 4-byte
read and mask out the interesting quantities.
PCI-express or PCI-X capabilities if we are running in a virtual machine.
- Whitelist the Intel 82440 chipset used by QEMU.
Tested by: jfv
MFC after: 1 week
ignore BARs that are invalid due to having a size of zero, not to ignore
BARs with an existing base of zero. While here, reorganize the code
slightly to make the intent clearer.
Reported by: avg
MFC after: 1 week
Specification Rev. 1.2. Rename pp_pcmcsr field of PM capabilities to pp_bse
to avoid further confusions and adjust some comments accordingly. The real
PMCSR (Power Management Control/Status Register) is PCIR_POWER_STATUS and
it is actually BSE (PCI-to-PCI Bridge Support Extensions) register.
PCI status register to map its current name.
- Use PCIM_* rather than PCIR_* for constants for fields in various AER
registers. I got about half of them right in the previous commit.
MFC after: 1 week
PCI-express. I used PCIZ_* for ID constants (plain capability IDs use
PCIY_*).
- Add register definitions for the Advanced Error Reporting, Virtual
Channels, and Device Serial Number extended capabilities.
- Teach pciconf -c to list extended as well as plain capabilities. Adds
more detailed parsing for AER, VC, and device serial numbers.
MFC after: 2 weeks
method is used by the PCI bus driver to query the power management system
to determine the proper device state to be used for a device during suspend
and resume. For the ACPI PCI bridge drivers this calls
acpi_device_pwr_for_sleep(). This removes ACPI-specific knowledge from
the PCI and PCI-PCI bridge drivers.
Reviewed by: jkim
bus_generic_resume() since the pci_link(4) driver was added.
- Change the ACPI PCI-PCI bridge driver to inherit most of its methods
from the generic PCI-PCI bridge driver. In particular, this will now
restore PCI config registers for ACPI PCI-PCI bridges.
Tested by: Oleg Sharoyko osharoiko of gmail
Powermac G5 systems. MSI and several other things are not presently
supported.
The U3/U4 internal device support portions of this change were contributed
by Andreas Tobler.
MFC after: 1 week
pci_delete_child() function called by the cardbus driver. The new function
uses resource_list_unreserve() to release the BARs decoded by the device
being removed.
Reviewed by: imp
Tested by: brooks
handling for the PCIR_BIOS decoding enable bit from the cardbus driver.
The PCIR_BIOS BAR does include type bits like other BARs. Instead, it is
always a 32-bit non-prefetchable memory BAR where the low bit is used as a
flag to enable decoding.
Reviewed by: imp
are not allocated by the device driver. These resources should still appear
allocated from the system's perspective so that their assigned ranges are
not reused by other resource requests. The PCI bus driver has used a hack
to effect this for a while now where it uses rman_set_device() to assign
devices to the PCI bus when they are first encountered and later assigns
them to the actual device when a driver allocates a BAR. A few downsides of
this approach is that it results in somewhat confusing devinfo -r output as
well as not being very easily portable to other bus drivers.
This commit adds generic support for "reserved" resources to the resource
list API used by many bus drivers to manage the resources of child devices.
A resource may be reserved via resource_list_reserve(). This will allocate
the resource from the bus' parent without activating it.
resource_list_alloc() recognizes an attempt to allocate a reserved resource.
When this happens it activates the resource (if requested) and then returns
the reserved resource. Similarly, when a reserved resource is released via
resource_list_release(), it is deactivated (if it is active) and the
resource is then marked reserved again, but is left allocated from the
bus' parent. To completely remove a reserved resource, a bus driver may
use resource_list_unreserve(). A bus driver may use resource_list_busy()
to determine if a reserved resource is allocated by a child device or if
it can be unreserved.
The PCI bus driver has been changed to use this framework instead of
abusing rman_set_device() to keep track of reserved vs allocated resources.
Submitted by: imp (an older version many moons ago)
MFC after: 1 month
itself to an associated PCI device if it exists. It is little bit hackish
but it should fix build without frame buffer driver since r198964.
Fix some style(9) nits in vga_isa.c while we are here.
Have the early USB takeover enabled for i386 and amd64
by default.
This also avoids a panic on PowerPC where the resource
isn't released properly and we find a busy resource
when the USB host controller wants to allocate it...
- Do not map entire real mode memory (1MB). Instead, we map IVT/BDA and
ROM area separately. Most notably, ROM area is mapped as device memory
(uncacheable) as it should be. User memory is dynamically allocated and
free'ed with contigmalloc(9) and contigfree(9). Remove now redundant and
potentially dangerous x86bios_alloc.c. If this emulator ever grows to
support non-PC hardware, we may implement it with rman(9) later.
- Move all host-specific initializations from x86emu_util.c to x86bios.c and
remove now unnecessary x86emu_util.c. Currently, non-PC hardware is not
supported. We may use bus_space(9) later when the KPI is fixed.
- Replace all bzero() calls for emulated registers with more obviously named
x86bios_init_regs(). This function also initializes DS and SS properly.
- Add x86bios_get_intr(). This function checks if the interrupt vector is
available for the platform. It is not necessary for PC-compatible hardware
but it may be needed later. ;-)
- Do not try turning off monitor if DPMS does not support the state.
- Allocate stable memory for VESA OEM strings instead of just holding
pointers to them. They may or may not be accessible always. Fix a memory
leak of video mode table while I am here.
- Add (experimental) BIOS POST call for vesa(4). This function calls VGA
BIOS POST code from the current VGA option ROM. Some video controllers
cannot save and restore the state properly even if it is claimed to be
supported. Usually the symptom is blank display after resuming from suspend
state. If the video mode does not match the previous mode after restoring,
we try BIOS POST and force the known good initial state. Some magic was
taken from NetBSD (and it was taken from vbetool, I believe.)
- Add a loader tunable for vgapci(4) to give a hint to dpms(4) and vesa(4)
to identify who owns the VESA BIOS. This is very useful for multi-display
adapter setup. By default, the POST video controller is automatically
probed and the tunable "hw.pci.default_vgapci_unit" is set to corresponding
vgapci unit number. You may override it from loader but it is very unlikely
to be necessary. Unfortunately only AGP/PCI/PCI-E controllers can be
matched because ISA controller does not have necessary device IDs.
- Fix a long standing bug in state save/restore function. The state buffer
pointer should be ES:BX, not ES:DI according to VBE 3.0. If it ever worked,
that's because BX was always zero. :-)
- Clean up register initializations more clearer per VBE 3.0.
- Fix a lot of style issues with vesa(4).
all host controllers at the same time, we avoid problems where the BIOS will
actually write to the USB registers of all the USB host controllers every time
we handover one of them, and consequently reset the OS programmed values.
Submitted by: avg
Reviewed by: jhb
o introduce PCIE_REGMAX and use it instead of ad-hoc constant
o where 'reg' parameter/variable is not already unsigned, cast it to
unsigned before comparison with maximum value to cut off negative
values
o use PCI_SLOTMAX in several places where 31 or 32 were explicitly used
o drop redundant check of 'bytes' in i386 pciereg_cfgread() - valid
values are already checked in the subsequent switch
Reviewed by: jhb
MFC after: 1 week
decoding "took". Other OS's that I checked do not do this and it breaks
some amdpm(4) devices. Prior to 7.2 we did not honor the error returned
when this failed anyway, so this in effect restores previous behavior.
PR: kern/137668
Tested by: Aurelien Mere aurelien.mere amc-os.com
MFC after: 3 days