Convert PCIe hot plug support over to asking the firmware, if any, for
permission to use the HotPlug hardware. Implement pci_request_feature
for ACPI. All other host pci connections to allowing all valid feature
requests.
Sponsored by: Netflix
pcib_request_feature allows drivers to request the firmware (ACPI)
release certain features it may be using. ACPI normally manages things
like hot plug, advanced error reporting and other features until the
OS requests ACPI to relenquish control since it is taking over.
Sponsored by: Netflix
As of r313097, the HotPlug code requires the link to support
reporting of the data-link status. Remove tests for this capability
from code that can now assume its presence.
Suggested by: jhb
Reviewed by: jhb
MFC after: 3 days
Sponsored by: Dell EMC
Differential Revision: https://reviews.freebsd.org/D9431
Some PCI-e bridges report that they support HotPlug in the slot
capabilities but do not report support for Data Layer Active events
in the link capabilities register. These bridges do not work correctly
when HotPlug is used. Further, while the description of HotPlug in
the spec does not mention that DL active events are required, the
description of the link capabilities register says that DL active is
required for HotPlug. Thanks to Dave Baukus for finding that language
in the spec.
PR: 211699
Submitted by: Dave Baukus <daveb@spectralogic.com>
Reviewed by: vangyzen
MFC after: 3 days
Replace archaic "busses" with modern form "buses."
Intentionally excluded:
* Old/random drivers I didn't recognize
* Old hardware in general
* Use of "busses" in code as identifiers
No functional change.
http://grammarist.com/spelling/buses-busses/
PR: 216099
Reported by: bltsrc at mail.ru
Sponsored by: Dell EMC Isilon
This patch solves IRQ generation problems using the mlx5en(4) driver
with xenserver v6.5.0 in SRIOV and PCI-passthrough modes.
Until further the hw.pci.msix_rewrite_table quirk must be set manually
in /boot/loader.conf .
Reviewed by: jhb @
Sponsored by: Mellanox Technologies
MFC after: 2 weeks
ARM GIC specification in device trees use 3 cells, so the current
limit of 2 causes the last cell to be dropped. This in turn can
cause the interrupt polarity and trigger settings to be incorrect.
Increase the limit to 4 which should handle all reasonable cases.
This fixes issues seen in QEMU when registering PCI interrupts.
FDT attachment to a new file. A separate ACPI attachment will then be added
to allow arm64 servers with ACPI to use it over FDT.
This should also help with merging this with the ofwpci driver, with
further work needed to remove restrictions this driver places on resource
allocation.
Obtained from: ABT Systems Ltd
Sponsored by: The FreeBSD Foundation
Differential Revision: https://reviews.freebsd.org/D7319
It's unsafe to update the BAR when the related EN bit is set.
Submitted by: Dexuan Cui <decui microsoft com>
Reviewed by: jhb
MFC after: 1 week
Sponsored by: Microsoft
Differential Revision: https://reviews.freebsd.org/D7914
During a bus rescan the check for an invalid vendor ID of a subfunction
used the wrong constant.
Submitted by: Dexuan Cui <decui@microsoft.com>
MFC after: 3 days
Add routines to trigger a function level reset (FLR) of a PCI-express
device via the PCI-express device control register. This also includes
support routines to wait for pending transactions to complete as well
as calculating the maximum completion timeout permitted by a device.
Change the ppt(4) driver to reset pass through devices before attaching
to a VM during startup and before detaching from a VM during shutdown.
Reviewed by: imp, wblock (earlier version)
MFC after: 1 month
Sponsored by: Chelsio Communications
Differential Revision: https://reviews.freebsd.org/D7751
When the I/O MMU is active in bhyve, all PCI devices need valid entries
in the DMAR context tables. The I/O MMU code does a single enumeration
of the available PCI devices during initialization to add all existing
devices to a domain representing the host. The ppt(4) driver then moves
pass through devices in and out of domains for virtual machines as needed.
However, when new PCI devices were added at runtime either via SR-IOV or
HotPlug, the I/O MMU tables were not updated.
This change adds a new set of EVENTHANDLERS that are invoked when PCI
devices are added and deleted. The I/O MMU driver in bhyve installs
handlers for these events which it uses to add and remove devices to
the "host" domain.
Reviewed by: imp
Sponsored by: Chelsio Communications
Differential Revision: https://reviews.freebsd.org/D7667
Other files including pci_host_generic.h failed to compile
due to missing declaration of enum pci_id_type.
Obtained from: Semihalf
Submitted by: Michal Stanek <mst@semihalf.com>
Sponsored by: Annapurna Labs
Reviewed by: wma
Differential Revision: https://reviews.freebsd.org/D7561
It seems Killer E2200/E2400 has a BIOS misconfiguration or silicon
bug which triggers DMA write errors when driver uses advertised
maximum payload size. Force the maximum payload size to 128 bytes
in DMA configuration.
This change should fix occasional DMA write errors reported on
Killer E2200.
Tested by: <psy0nic@sys-tek.org>
- Read interrupt properties at bus enumeration time and store
it into global mapping table.
- At bus_activate_resource() time, given mapping entry is resolved and
connected to real interrupt source. A copy of mapping entry is attached
to given resource.
- At bus_setup_intr() time, mapping entry stored in resource is used
for delivery of requested interrupt configuration.
- For MSI/MSIX interrupts, mapping entry is created within
pci_alloc_msi()/pci_alloc_msix() call.
- For legacy PCI interrupts, mapping entry must be created within
pcib_route_interrupt() by pcib driver itself.
Reviewed by: nwhitehorn, andrew
Differential Revision: https://reviews.freebsd.org/D7493
Some devices report that they have an MRL when they actually
do not. Since they always report that the MRL is open, child
devices would be ignored. Try to detect these devices and
ignore their claim of HotPlug support. Specifically,
if there is an open MRL but the Data Link Layer is active,
the MRL is not real.
Revert r303645 to re-enable HotPlug support for slots with
power controllers, since it works correctly in my testing.
Start the DLL state-change timer if Presence /or/ MRL state changes,
along with other conditions. Previously, we started the timer iff
Presence changed. If there is an MRL, it must be closed for power
to be turned on, so Presence is unlikely to change on an MRL-close event.
Add a printf() of interesting registers on HotPlug interrupts and
commands (one from erj@). These were very useful for debugging.
Guard them with bootverbose, since they're spam in normal operation.
In collaboration with: jhb
Reviewed by: jhb
MFC after: 1 day
Relnotes: yes (re-enable HotPlug support for slots with power controllers)
Sponsored by: Dell Inc.
Differential Revision: https://reviews.freebsd.org/D7509
Several files use the internal name of `struct device` instead of
`device_t` which is part of the public API. This patch changes all
`struct device *` to `device_t`.
The remaining occurrences of `struct device` are those referring to the
Linux or OpenBSD version of the structure, or the code is not built on
FreeBSD and it's unclear what to do.
Submitted by: Matthew Macy <mmacy@nextbsd.org> (previous version)
Approved by: emaste, jhibbits, sbruno
MFC after: 3 days
Differential Revision: https://reviews.freebsd.org/D7447
Previously the loop in PCIIOCGETCONF would terminate as soon as it
found enough matches. Now it will continue iterating through the
PCI device list and only terminate if it finds another matching device
for which it has no room to store a conf structure. This means that
PCI_GETCONF_LAST_DEVICE is reliably returned when the number of
matching devices is equal to the number of slots in the matches
buffer. For example, if a program requests the conf structure for a
single PCI function with a specified domain/bus/slot/function it will
now get PCI_GETCONF_LAST_DEVICE instead of PCI_GETCONF_MORE_DEVS.
While here, simplify the loop conditional a bit more by explicitly
breaking out of the loop if copyout() fails and removing a redundant
i < pci_numdevs check.
Reviewed by: vangyzen, imp
MFC after: 1 month
Sponsored by: Chelsio Communications
Differential Revision: https://reviews.freebsd.org/D7445
The interpretation of the Electromechanical Interlock Status was
inverted, so we disengaged the EI if a card was inserted.
Fix it to engage the EI if a card is inserted.
When displaying the slot capabilites/status with pciconf:
- We inverted the sense of the Power Controller Control bit,
saying the power was off when it was really on (according to
this bit). Fix that.
- Display the status of the Electromechanical Interlock:
EI(engaged)
EI(disengaged)
Reviewed by: jhb
MFC after: 3 days
Sponsored by: Dell Inc.
Differential Revision: https://reviews.freebsd.org/D7426
The PCI_IOV option creates character devices in /dev/iov for each PF
device driver that registers support for creating VFs. By default the
character device is named after the PF device (e.g. /dev/iov/foo0).
This change adds a variant of pci_iov_attach() called pci_iov_attach_name()
that allows the name of the /dev/iov entry to be specified by the
driver.
Reviewed by: rstone
MFC after: 1 month
Sponsored by: Chelsio Communications
Differential Revision: https://reviews.freebsd.org/D7400
After further review of the spec, I do not think the current HotPlug
code handles slots with power controllers correctly. In particular,
the power state of the slot is to be inferred from other events, not
from examining the state of the power control bit in SLOT_CTL. For now,
disable PCI hotplug support on such slots.
PR: 211081
Tested by: Jeffrey E Pieper <jeffrey.e.pieper@intel.com>
MFC after: 3 days
Some systems and/or devices (such as riser cards) do not include a
non-compliant implementation of PCI-e HotPlug that can result in devices
not being attached (e.g. the HotPlug code might assume that a card is
being unplugged and will power the slot off and detach it). This
tunable can be set to 0 to disable support for PCI-e HotPlug ignoring
the incorrect HotPlug state on these slots.
PR: 211081
Reported by: Sergey Renkas <serg_ic@mail.ru> (SuperMicro X7 riser card)
Reported by: Jeffrey E Pieper <jeffrey.e.pieper@intel.com>
(Intel X520 adapter)
MFC after: 1 week
Relnotes: yes
this to be the case. This will mean we don't try and handle the cache in
bus_dmamap_sync when it is not needed.
Obtained from: ABT Systems Ltd
Sponsored by: The FreeBSD Foundation
Differential Revision: https://reviews.freebsd.org/D6605
tested on the Pass 1.1 and 2.0 ThunderX machines in the Netperf cluster.
Reviewed by: jhb
Obtained from: ABT Systems Ltd
Sponsored by: The FreeBSD Foundation
Differential Revision: https://reviews.freebsd.org/D6453
- Add a pcib_detach() function for the PCI-PCI bridge driver. It
tears down the NEW_PCIB and hotplug state including destroying
resource managers, deleting child devices, and disabling hotplug
events.
- Add a detach method to the ACPI PCI-PCI bridge driver which calls
pcib_detach() and then frees the copy of the _PRT interrupt routing
table.
- Add a detach method to the PCI-Cardbus bridge driver which frees
the PCI bus resources in addition to calling cbb_detach().
- Explicitly clear any pending hotplug events during attach to ensure
future events will generate an interrupt.
- If a the Command Completed bit is set in the slot status register
when the command completion timeout fires, treat it as if the
command completed and the completion interrupt was just lost rather
than forcing a detach.
- Don't wait for a Command Completed notification if Command Completion
interrupts are disabled. The spec explicitly says no interrupt is
enabled when clearing CCIE, and on my T400 no interrupt is generated
when CCIE is changed from cleared to set, either. In addition, the
T400 doesn't appear to set the Command Completed bit in the cases
where it doesn't generate an interrupt, so don't schedule the timer
either. (If the CC bit were always set, one could always set the timer
and rely on the logic of treating CC set as a missed interrupt.)
Reviewed by: imp (older version)
Differential Revision: https://reviews.freebsd.org/D6424
Previously the command completion interrupt would post any pending
command immediately before pcib_pcie_hotplug_update() had been
run to inspect the current status. Now, the command completion
interrupt merely clears the flag and stops the timer assuming that
the caller is always going to call pcib_pcie_hotplug_update() to
generate the next hotplug command if one is needed.
While here, fix a bug for systems with command completion where the
old (existing) value was written to the slot control register instead
of the new value. This fixes the complaint about a missing hotplug
interrupt on my T400.
Differential Revision: https://reviews.freebsd.org/D6363
translate the pci rid to a controller ID. The translation could be based
on the 'msi-map' OFW property, a similar ACPI option, or hard-coded for
hardware lacking the above options.
Reviewed by: wma
Obtained from: ABT Systems Ltd
Sponsored by: The FreeBSD Foundation
Add a new get_id interface to pci and pcib. This will allow us to both
detect failures, and get different PCI IDs.
For the former the interface returns an int to signal an error. The ID is
returned at a uintptr_t * argument.
For the latter there is a type argument that allows selecting the ID type.
This only specifies a single type, however a MSI type will be added
to handle the need to find the ID the hardware passes to the ARM GICv3
interrupt controller.
A follow up commit will be made to remove pci_get_rid.
Reviewed by: jhb, rstone (previous version)
Obtained from: ABT Systems Ltd
Sponsored by: The FreeBSD Foundation
Differential Revision: https://reviews.freebsd.org/D6239
interface with 5 methods to mirror the 5 MSI/MSI-X methods in the pcib
interface. The pcib driver will need to perform a device specific lookup
to find the MSI controller and pass this to intrng as the xref. Intrng
will finally find the controller and have it handle the requested operation.
Obtained from: ABT Systems Ltd
MFH: yes
Sponsored by: The FreeBSD Foundation
Differential Revision: https://reviews.freebsd.org/D5985
which leads to end being before start and thus a signed extended very large
number of size later on, which kva_alloc() will fail upon and we will panic.
Add the missing call.
Debugged with: andrew
Reviewed by: br, andrew
Sponsored by: DARPA/AFRL
Found: while using virtio with gem5
MFC after: 2 weeks
Differential Revision: https://reviews.freebsd.org/D6337
detect failures, and get different PCI IDs.
For the former the interface returns an int to signal an error. The ID is
returned at a uintptr_t * argument.
For the latter there is a type argument that allows selecting the ID type.
This only specifies a single type, however a MSI type will be added
to handle the need to find the ID the hardware passes to the ARM GICv3
interrupt controller.
A follow up commit will be made to remove pci_get_rid.
Reviewed by: jhb, rstone
Obtained from: ABT Systems Ltd
Sponsored by: The FreeBSD Foundation
Differential Revision: https://reviews.freebsd.org/D6239
When devctl was added, the location string for PCI devices was changed to
use the PCI "selector" that pciconf and devctl accept. However, devd
assumes that location strings are formatted as a list of name=value pairs.
As a result, devd is no longer parsing any of the values out of PCI
device events. Restore the previous format of the PCI location strings
to restore the location and slot keywords in case any devd scripts are
using this. Add the "selector" as a new 'dbsf' location variable.
Reviewed by: imp
MFC after: 3 days
Differential Revision: https://reviews.freebsd.org/D6253
PCI-express HotPlug support is implemented via bits in the slot
registers of the PCI-express capability of the downstream port along
with an interrupt that triggers when bits in the slot status register
change.
This is implemented for FreeBSD by adding HotPlug support to the
PCI-PCI bridge driver which attaches to the virtual PCI-PCI bridges
representing downstream ports on HotPlug slots. The PCI-PCI bridge
driver registers an interrupt handler to receive HotPlug events. It
also uses the slot registers to determine the current HotPlug state
and drive an internal HotPlug state machine. For simplicty of
implementation, the PCI-PCI bridge device detaches and deletes the
child PCI device when a card is removed from a slot and creates and
attaches a PCI child device when a card is inserted into the slot.
The PCI-PCI bridge driver provides a bus_child_present which claims
that child devices are present on HotPlug-capable slots only when a
card is inserted. Rather than requiring a timeout in the RC for
config accesses to not-present children, the pcib_read/write_config
methods fail all requests when a card is not present (or not yet
ready).
These changes include support for various optional HotPlug
capabilities such as a power controller, mechanical latch,
electro-mechanical interlock, indicators, and an attention button.
It also includes support for devices which require waiting for
command completion events before initiating a subsequent HotPlug
command. However, it has only been tested on ExpressCard systems
which support surprise removal and have none of these optional
capabilities.
PCI-express HotPlug support is conditional on the PCI_HP option
which is enabled by default on arm64, x86, and powerpc.
Reviewed by: adrian, imp, vangyzen (older versions)
Relnotes: yes
Differential Revision: https://reviews.freebsd.org/D6136
Save the value of the IOV control and page size registers and restore
them (along with the VF count) in pci_cfg_save/pci_cfg_restore. This
ensures ARI remains enabled if a PF driver resets itself during the
PCI_IOV_INIT callback. This might also properly restore SRIOV state
across suspend/resume.
Reviewed by: rstone, vangyzen
Differential Revision: https://reviews.freebsd.org/D6192
While here, check if ARI was enabled by re-reading the config register
after writing it and return an error if the write fails.
Reviewed by: rstone, vangyzen
pci_remap_msix() can be used to alter the mapping of allocated
MSI-X vectors to the MSI-X table. The code had an off by one error
when adding the IRQ resources after performing a remap. This was
fatal for any vectors in the table that used the "last" valid IRQ as
those vectors were assigned a garbage IRQ value.
MFC after: 3 days
This allows the PCI-PCI bridge driver to save a reference to the child
device in its softc.
Note that this required moving the "pci" device creation out of
acpi_pcib_attach(). Instead, acpi_pcib_attach() is renamed to
acpi_pcib_fetch_prt() as it's sole action now is to fetch the PCI
interrupt routing table.
Differential Revision: https://reviews.freebsd.org/D6021
Rescanning a PCI bus uses the following steps:
- Fetch the current set of child devices and save it in the 'devlist'
array.
- Allocate a parallel array 'unchanged' initalized with NULL pointers.
- Scan the bus checking each slot (and each function on slots with a
multifunction device).
- If a valid function is found, look for a matching device in the 'devlist'
array. If a device is found, save the pointer in the 'unchanged' array.
If a device is not found, add a new device.
- After the scan has finished, walk the 'devlist' array deleting any
devices that do not have a matching pointer in the 'unchanged' array.
- Finally, fetch an updated set of child devices and explicitly attach any
devices that are not present in the 'unchanged' array.
This builds on the previous changes to move subclass data management into
pci_alloc_devinfo(), pci_child_added(), and bus_child_deleted().
Subclasses of the PCI bus use custom rescan logic explicitly override the
rescan method to disable rescans.
Differential Revision: https://reviews.freebsd.org/D6018
This is a trivial follow-up to r296308. Annotate the intentional fallthrough
to make it clear for future readers and linters.
Reported by: Coverity
CID: 1352716
Discussed with: jhb
Sponsored by: EMC / Isilon Storage Division
The ACPI and OFW PCI bus drivers as well as CardBus override this to
allocate the larger ivars to hold additional info beyond the stock PCI ivars.
This removes the need to pass the size to functions like pci_add_iov_child()
and pci_read_device() simplifying IOV and bus rescanning implementations.
As a result of this and earlier changes, the ACPI PCI bus driver no longer
needs its own device_attach and pci_create_iov_child methods but can use
the methods in the stock PCI bus driver instead.
Differential Revision: https://reviews.freebsd.org/D5891
Instead of providing a wrapper around device_delete_child() that the PCI
bus and child bus drivers must call explicitly, move the bulk of the logic
from pci_delete_child() into a bus_child_deleted() method
(pci_child_deleted()). This allows PCI devices to be safely deleted via
device_delete_child().
- Add a bus_child_deleted method to the ACPI PCI bus which clears the
device_t associated with the corresponding ACPI handle in addition to
the normal PCI bus cleanup.
- Change cardbus_detach_card to call device_delete_children() and move
CardBus-specific delete logic into a new cardbus_child_deleted() method.
- Use device_delete_child() instead of pci_delete_child() in the SRIOV code.
- Add a bus_child_deleted method to the OpenFirmware PCI bus drivers which
frees the OpenFirmware device info for each PCI device.
Reviewed by: imp
Tested on: amd64 (CardBus and PCI-e hotplug)
Differential Revision: https://reviews.freebsd.org/D5831
On some architectures, u_long isn't large enough for resource definitions.
Particularly, powerpc and arm allow 36-bit (or larger) physical addresses, but
type `long' is only 32-bit. This extends rman's resources to uintmax_t. With
this change, any resource can feasibly be placed anywhere in physical memory
(within the constraints of the driver).
Why uintmax_t and not something machine dependent, or uint64_t? Though it's
possible for uintmax_t to grow, it's highly unlikely it will become 128-bit on
32-bit architectures. 64-bit architectures should have plenty of RAM to absorb
the increase on resource sizes if and when this occurs, and the number of
resources on memory-constrained systems should be sufficiently small as to not
pose a drastic overhead. That being said, uintmax_t was chosen for source
clarity. If it's specified as uint64_t, all printf()-like calls would either
need casts to uintmax_t, or be littered with PRI*64 macros. Casts to uintmax_t
aren't horrible, but it would also bake into the API for
resource_list_print_type() either a hidden assumption that entries get cast to
uintmax_t for printing, or these calls would need the PRI*64 macros. Since
source code is meant to be read more often than written, I chose the clearest
path of simply using uintmax_t.
Tested on a PowerPC p5020-based board, which places all device resources in
0xfxxxxxxxx, and has 8GB RAM.
Regression tested on qemu-system-i386
Regression tested on qemu-system-mips (malta profile)
Tested PAE and devinfo on virtualbox (live CD)
Special thanks to bz for his testing on ARM.
Reviewed By: bz, jhb (previous)
Relnotes: Yes
Sponsored by: Alex Perez/Inertial Computing
Differential Revision: https://reviews.freebsd.org/D4544
Summary:
The idea behind this is '~0ul' is well-defined, and casting to uintmax_t, on a
32-bit platform, will leave the upper 32 bits as 0. The maximum range of a
resource is 0xFFF.... (all bits of the full type set). By dropping the 'ul'
suffix, C type promotion rules apply, and the sign extension of ~0 on 32 bit
platforms gets it to a type-independent 'unsigned max'.
Reviewed By: cem
Sponsored by: Alex Perez/Inertial Computing
Differential Revision: https://reviews.freebsd.org/D5255
Fix the boundary limit to end at the end of the region and not one beyond (1).
Diagnosed by: andrew (1)
Reviewed by: andrew, br
Sponsored by: DARPA/AFRL
Differential Revision: https://reviews.freebsd.org/D5493
On some platforms, BAR entries are hardcoded and must not be accessed
using standard method. Add functionality to identify this situation
and configure the bus based on Enhanced Allocation structure.
Obtained from: Semihalf
Sponsored by: Cavium
Approved by: cognet (mentor)
Reviewed by: jhb
Differential revision: https://reviews.freebsd.org/D5242
Most calls to bus_alloc_resource() use "anywhere" as the range, with a given
count. Migrate these to use the new bus_alloc_resource_anywhere() API.
Reviewed by: jhb
Differential Revision: https://reviews.freebsd.org/D5370
If Enhanced Allocation is not used, we can't allocate any random
range. All internal devices have hardcoded place where they can
be located within PCI address space. Fortunately, we can read
this value from BAR.
Obtained from: Semihalf
Sponsored by: Cavium
Approved by: cognet (mentor)
Reviewed by: zbb
Differential revision: https://reviews.freebsd.org/D5455
* provided OFW interface for pci_host_generic (for handling devices which are present in DTS under the PCI node)
* removed support for internal PCI from arm64/cavium
* cleaned up and made most of the code common
Obtained from: Semihalf
Sponsored by: Cavium
Approved by: cognet (mentor)
Reviewed by: zbb
Differential revision: https://reviews.freebsd.org/D5261
Summary:
Migrate to using the semi-opaque type rman_res_t to specify rman resources. For
now, this is still compatible with u_long.
This is step one in migrating rman to use uintmax_t for resources instead of
u_long.
Going forward, this could feasibly be used to specify architecture-specific
definitions of resource ranges, rather than baking a specific integer type into
the API.
This change has been broken out to facilitate MFC'ing drivers back to 10 without
breaking ABI.
Reviewed By: jhb
Sponsored by: Alex Perez/Inertial Computing
Differential Revision: https://reviews.freebsd.org/D5075
mv_pci driver omitted slot 0, which can be valid device on Armada38x.
New mechanism detects if device is root link, basing on vendor's
and device's IDs.
It is restricted to Armada38x; on other machines, behaviour remains
the same.
Reviewed by: andrew
Obtained from: Semihalf
Sponsored by: Stormshield
Submitted by: Bartosz Szczepanek <bsz@semihalf.com>
Differential revision: https://reviews.freebsd.org/D4377
While here, explicitly note the requirement that the BAR(s) must be
allocated prior to calling pci_alloc_msix().
Reviewed by: andrew, emaste
MFC after: 1 week
Differential Revision: https://reviews.freebsd.org/D4688
* Use the interrupt-map property to route interrupts
* Remove the IRQ rman, it's now unneeded
* Support MSI/MSI-X interrupts
With this I'm able to use the two NICs I've tested (em and msk), however
while I can boot with an AHCI devie attached it fails when any drives are
connected.
Obtained from: ABT Systems Ltd
Sponsored by: SoftIron Inc
bridges. Currently this includes information about what resources a
bridge decodes on the upstream side for use by downstream devices including
bus numbers, I/O port resources, and memory resources. Windows and bus
ranges are enumerated for both PCI-PCI bridges and PCI-CardBus bridges.
To simplify the implementation, all enumeration is done by reading the
appropriate config space registers directly rather than querying the
bridge driver in the kernel via new ioctls. This does result in a few
limitations.
First, an unimplemented window in a PCI-PCI bridge cannot be accurately
detected as accurate detection requires writing to the window base
register. That is not safe for pciconf(8). Instead, this assumes that
any window where both the base and limit read as all zeroes is
unimplemented.
Second, the PCI-PCI bridge driver in a tree has a few quirks for
PCI-PCI bridges that use subtractive decoding but do not indicate that
via the progif config register. The list of quirks is duplicated in
pciconf's source.
Reviewed by: imp
MFC after: 2 weeks
Differential Revision: https://reviews.freebsd.org/D4171
PCI-Express capability registers (that is, PCI config registers in the
standard PCI config space belonging to the PCI-Express capability
register set).
Note that all of the current PCI-e registers are either 16 or 32-bits,
so only widths of 2 or 4 bytes are supported.
Reviewed by: imp
MFC after: 1 week
Sponsored by: Chelsio
Differential Revision: https://reviews.freebsd.org/D4088
When the system has more than a single PCI domain, the bus numbers
are not unique, thus they cannot be used for "pci" device numbering.
Change bus numbers to -1 (i.e. to-be-determined automatically)
wherever the code did not care about domains.
Reviewed by: jhb
Obtained from: Semihalf
Sponsored by: The FreeBSD Foundation
Differential Revision: https://reviews.freebsd.org/D3406
Internal bridges in Cavium ThunderX SoC behave as subtractive,
but they are unable to be identified. Force setting an appropriate
flag.
Reviewed by: emaste, imp
Obtained from: Semihalf
Sponsored by: The FreeBSD Foundation
Differential Revision: https://reviews.freebsd.org/D3277
It is possible that some HW will use different PCI devids,
hence allow to replace the default domain🚌slot:func schema
by implementing and registering custom function.
Obtained from: Semihalf
Sponsored by: The FreeBSD Foundation
Differential Revision: https://reviews.freebsd.org/D3118
Work based on Cavium Thunder PCIe driver by Semihalf.
Reviewed by: andrew, jhb
Sponsored by: HEIF5
Differential Revision: https://reviews.freebsd.org/D2386
(easily) without having to go to other drivers to change the
magical return values. This wouldn't be so bad if there were
proper defines for these constants.
In particular dev/acpica/acpi_pcib_pci.c returns -1000 as the
probe priority and it's expected that this driver gets to
attach over the common PCI bus drivers.
BUS_PROBE_HOOVER is. Drivers like proto(4), when compiled into the
kernel or preloaded, will render your system useless by virtue of
attaching to your PCI busses.
Return BUS_PROBE_GENERIC instead. It's just the next priority up
from BUS_PROBE_HOOVER. No other meaning has been give to its use.
While BUS_PROBE_DEFAULT seems like a better candidate, it's hard
not to think that there must be some reason why these drivers
return -10000 in the first place.
Differential Revision: D2705
buildkernel run.
Some of them were write-only under some kernel options, e.g. variables
keeping values only used by CTR() macros. It costs nothing to the
code readability and correctness to eliminate the warnings in those
cases too by removing the local cached values used only for
single-access.
Review: https://reviews.freebsd.org/D2665
Reviewed by: rodrigc
Looked at by: bjk
Sponsored by: The FreeBSD Foundation
MFC after: 1 week
Leaf drivers should not import the PCI bus interface to add IOV handling.
Instead, move the IOV client methods to a separate kobj interface.
Differential Revision: https://reviews.freebsd.org/D2584
Reviewed by: rstone
Summary:
The Freescale PCIe Root Complex shows up as a Processor class device, PowerPC
subclass, so the generic PCI code ignores it for a bridge. This adds support
for it.
As part of this, update the Freescale PCI hostbridge driver, to allow probing
beyond the root complex, instead of only allowing "proper" PCI-PCI bridges.
Reviewers: #powerpc, marcel, nwhitehorn
Reviewed By: nwhitehorn
Subscribers: imp
Differential Revision: https://reviews.freebsd.org/D2442
Relnotes: yes
Change the nvlist_recv() function to take additional argument that
specifies flags expected on the received nvlist. Receiving a nvlist with
different set of flags than the ones we expect might lead to undefined
behaviour, which might be potentially dangerous.
Update consumers of this and related functions and update the tests.
Approved by: pjd (mentor)
Update man page for nvlist_unpack, nvlist_recv, nvlist_xfer, cap_recv_nvlist
and cap_xfer_nvlist.
Reviewed by: AllanJude
Approved by: pjd (mentor)
(type 1 and type 2) as well as leaf devices (type 0). In particular,
this allows the existing PCI bus logic to save and restore capability
registers such as MSI and PCI-express work for bridge devices rather than
requiring that code to be duplicated in bridge drivers. It also means
that bridge drivers no longer need to save and restore basic registers
such as the PCI command register or BARs nor manage powerstates for the
bridge device.
While here, pci_setup_secbus() has been changed to initialize the 'sec'
and 'sub' fields in the 'secbus' structure instead of requiring the pcib
and pccbb drivers to do this in the NEW_PCIB + PCI_RES_BUS case.
Differential Revision: https://reviews.freebsd.org/D2240
Reviewed by: imp, jmg
MFC after: 2 weeks
driver's suspend and resume routines. These have been redundant no-ops
since r214065 changed the PCI bus driver to manage power states for
all devices (including type 1/2 bridge devices) during suspend and resume.
for type 0 devices, not type 1 or 2 bridges. Don't read them for bridge
devices during bus scans and return an error when attempting to read them
as ivars for bridge devices.
A late change to the SR-IOV infrastructure broke passthrough of
VFs. device_set_devclass() was being used to try to force the
ppt driver to attach to the device, but this didn't work because
the DF_FIXEDCLASS flag wasn't being set on the device, so the
ppt driver probe routine would not match when it returned
BUS_NOWILDCARD. Fix this by adding a new device function that
both sets the devclass and sets the DF_FIXEDCLASS flag, and use
that to force the ppt driver to attach to VFs.
Differential Revision: https://reviews.freebsd.org/D2041
Reviewed by: jhb
MFC after: 3 weeks
Pass all SR-IOV configuration to the kernel using an nvlist. The
main benefit that this offers is flexibility. It allows a driver
to accept any number of parameters of any type supported by the
SR-IOV configuration infrastructure with having to make any
changes outside of the driver.
It also offers the user very fine-grained control over the
configuration of the VFs -- if they want, they can have different
configuration applied to every VF.
Differential Revision: https://reviews.freebsd.org/D82
Reviewed by: jhb
MFC after: 1 month
Sponsored by: Sandvine Inc.
Add a function that validates that the user-provided SR-IOV
configuration is valid. This includes basic checks that the
structure of the configuration is correct (e.g. all required
configuration nodes are present) as well as validating against
a configuration schema.
The schema validation consists of:
- Ensuring that all required config parameters are present.
- If the schema defines a default value for a parameter,
adding the default value if the parameter is not set.
- Ensuring that no parameters are specified in the config
that are not defined in the schema.
- Ensuring that have the correct type defined in the schema.
- Ensuring that no configuration nodes are present for devices
that do not exist. For example, if 2 VFs are configured,
then we validate that a node called VF-5 does not exist.
Differential Revision: https://reviews.freebsd.org/D81
Reviewed by: jhb
MFC after: 1 month
Sponsored by: Sandvine Inc.
When creating VFs, we must size each SR-IOV BAR on the PF and
allocate a configuous I/O memory window large enough for every VF.
However, the window only needs to be aligned to a boundary equal
to the size of the window for a single VF.
When a VF attempts to allocate an I/O memory resource, we must
intercept the request in the pci driver and pass it off to the
SR-IOV code, which will allocate the correct window from the
pre-allocated memory space for the PF.
Inform the pci driver about the size and address of the BARs on
the VF when the VF is created. This is required by pciconf -b and
bhyve.
Differential Revision: https://reviews.freebsd.org/D78
Reviewed by: jhb
MFC after: 1 month
Sponsored by: Sandvine Inc.
The SR-IOV standard requires VFs to read all-ones when the VID
and DID registers are read. The VMM (hypervisor) is required to
emulate them instead. Make pci_read_config() do this emulation.
Change pci_user.c to use pci_read_config() to read config space
registers instead of going directly to the pcib so that the
emulated VID/DID registers work correctly on VFs. This is
required both for pciconf and bhyve PCI passthrough.
Differential Revision: https://reviews.freebsd.org/D77
Reviewed by: jhb
MFC after: 1 month
Sponsored by: Sandvine Inc.
Implement the interace to create SR-IOV Virtual Functions (VFs).
When a driver registers that they support SR-IOV by calling
pci_setup_iov(), the SR-IOV code creates a new node in /dev/iov
for that device. An ioctl can be invoked on that device to
create VFs and have the driver initialize them.
At this point, allocating memory I/O windows (BARs) is not
supported.
Differential Revision: https://reviews.freebsd.org/D76
Reviewed by: jhb
MFC after: 1 month
Sponsored by: Sandvine Inc.
Refactor PCI resource allocation code to allow a request for a
memory-mapped I/O window that is a multiple of a requested size.
This is needed by the SR-IOV code because the VF BARs are all
allocated contiguously. We can't just allocate a resource that is
a multiple of a single VF BAR because the size of an allocation
implies its alignment requirement.
Differential Revision: https://reviews.freebsd.org/D71
Reviewed by: jhb
MFC after: 1 month
Sponsored by: Sandvine Inc.
Refactor creation of PCI devices into helper methods that can be
used by the VF creation code.
Differential Revision: https://reviews.freebsd.org/D67
Reviewed by: jhb
MFC after: 1 month
Sponsored by: Sandvine Inc.
allows the user to request administrative changes to individual devices
such as attach or detaching drivers or disabling and re-enabling devices.
- Add a new /dev/devctl2 character device which uses ioctls for device
requests. The ioctls use a common 'struct devreq' which is somewhat
similar to 'struct ifreq'.
- The ioctls identify the device to operate on via a string. This
string can either by the device's name, or it can be a bus-specific
address. (For unattached devices, a bus address is the only way to
locate a device.) Bus drivers register an eventhandler to claim
unrecognized device names that the driver recognizes as a valid address.
Two buses currently support addresses: ACPI recognizes any device
in the ACPI namespace via its full path starting with "\" and
the PCI bus driver recognizes an address specification of
'pci[<domain>:]<bus>:<slot>:<func>' (identical to the PCI selector
strings supported by pciconf).
- To make it easier to cut and paste, change the PnP location string
in the PCI bus driver to output a full PCI selector string rather
than 'slot=<slot> function=<func>'.
- Add a devctl(3) interface in libdevctl which provides a wrapper around
the ioctls and is the preferred interface for other userland code.
- Add a devctl(8) program which is a simple wrapper around the requests
supported by devctl(3).
- Add a device_is_suspended() function to check DF_SUSPENDED.
- Add a resource_unset_value() function that can be used to remove a
hint from the kernel environment. This is used to clear a
hint.<driver>.<unit>.disabled hint when re-enabling a boot-time
disabled device.
Reviewed by: imp (parts)
Requested by: imp (changing PCI location string)
Relnotes: yes
for the lookup.
- For devices affected by PCI_QUIRK_MSI_INTX_BUG, ensure PCIM_CMD_INTxDIS
is cleared when using MSI/MSI-X.
- Employ PCI_QUIRK_MSI_INTX_BUG for BCM5714(S)/BCM5715(S)/BCM5780(S) rather
than clearing PCIM_CMD_INTxDIS unconditionally for all devices in bge(4).
MFC after: 3 days
state said device should go into.
This was a snafu introduced in the ACPI/PCI awareness separation.
When putting a device into a power state, the bus (and thus firmware,
eg ACPI) should be asked before hand to check whether the device
can indeed go into that power state.
There's a set of nodes in ACPI under each device - the _SxD nodes - which
state which ACPI power state to put the device into when the system is
going into power save state 'x'. So when going into S3, the existence
of an _S3D node would override whatever the system was trying to do.
By default the PCI code wants to put devices into D3 before suspending.
I have a laptop here (Asus Zenbook - check the PR) whose EHCI controller
really wants to be in D2 during suspend, not D3. So if we put it into
D3 and then try to enter S3, everything hangs. The device itself
can go into D3 - it just can't be there when the call to ACPI to enter
S3 occurs. The PCI patch fixes this.
jkim@ noticed that the same is needed for the ACPI child device
enumeration.
Thankyou to Matt Dillon (the programmer, not the actor) for buying me
this particular laptop so I could debug the issues with the Atheros
AR9485 that is in it. It's his fault that I ended up with this
laptop and was sufficiently annoyed by the lack of USB suspend
to go down this rabbit hole.
Tested:
* Thinkpad T400
* Thinkpad X230
* Thinkpad T42
* Thinkpad T60
* Asus Zenbook (see PR)
* Asus EEEPC 701
* Asus EEEPC 1001PX
TODO:
* Figure out what we should do about devices we unload drivers for
that want to be in a specific state when entering S3 / S4 -
the "put devices into D3 if they're not bound to a driver" option
may also mess with things.
PR: kern/194884
Reviewed by: jhb, jkim
MFC after: 1 week
Relnotes: yes
Sponsored by: Matt Dillon <dillon@apollo.backplane.com> (hardware)
in userland rename in-kernel getenv()/setenv() to kern_setenv()/kern_getenv().
This fixes a namespace collision with libc symbols.
Submitted by: kmacy
Tested by: make universe
* Add a bus_if.m method - get_domain() - returning the VM domain or
ENOENT if the device isn't in a VM domain;
* Add bus methods to print out the domain of the device if appropriate;
* Add code in srat.c to save the PXM -> VM domain mapping that's done and
expose a function to translate VM domain -> PXM;
* Add ACPI and ACPI PCI methods to check if the bus has a _PXM attribute
and if so map it to the VM domain;
* (.. yes, this works recursively.)
* Have the pci bus glue print out the device VM domain if present.
Note: this is just the plumbing to start enumerating information -
it doesn't at all modify behaviour.
Differential Revision: D906
Reviewed by: jhb
Sponsored by: Norse Corp
Summary:
Add the beginnings of multipass suspend/resume, by introducing
BUS_SUSPEND_CHILD/BUS_RESUME_CHILD, and move the PCI driver to this.
Reviewers: jhb
Reviewed By: jhb
Differential Revision: https://reviews.freebsd.org/D590
This is needed so when running under Xen the calls to pci_child_added
can be intercepted and a custom Xen method can be used to register
those devices with Xen. This should not include any functional
change, since the Xen implementation will be added in a following
patch and the native implementation is a noop.
Sponsored by: Citrix Systems R&D
Reviewed by: jhb
dev/pci/pci.c:
dev/pci/pci_if.m:
dev/pci/pci_private.h:
dev/pci/pcivar.h:
- Add the pci_child_added newbus method.
Make the functions pci_disable_msi, pci_enable_msi and pci_enable_msix
methods of the newbus PCI bus. This code should not include any
functional change.
Sponsored by: Citrix Systems R&D
Reviewed by: imp, jhb
Differential Revision: https://reviews.freebsd.org/D354
dev/pci/pci.c:
- Convert the mentioned functions to newbus methods.
- Fix the callers of the converted functions.
sys/dev/pci/pci_private.h:
dev/pci/pci_if.m:
- Declare the new methods.
dev/pci/pcivar.h:
- Add helpers to call the newbus methods.
ofed/include/linux/pci.h:
- Add define to prevent the ofed version of pci_enable_msix from
clashing with the FreeBSD native version.
This includes:
o All directories named *ia64*
o All files named *ia64*
o All ia64-specific code guarded by __ia64__
o All ia64-specific makefile logic
o Mention of ia64 in comments and documentation
This excludes:
o Everything under contrib/
o Everything under crypto/
o sys/xen/interface
o sys/sys/elf_common.h
Discussed at: BSDcan
These changes prevent sysctl(8) from returning proper output,
such as:
1) no output from sysctl(8)
2) erroneously returning ENOMEM with tools like truss(1)
or uname(1)
truss: can not get etype: Cannot allocate memory
there is an environment variable which shall initialize the SYSCTL
during early boot. This works for all SYSCTL types both statically and
dynamically created ones, except for the SYSCTL NODE type and SYSCTLs
which belong to VNETs. A new flag, CTLFLAG_NOFETCH, has been added to
be used in the case a tunable sysctl has a custom initialisation
function allowing the sysctl to still be marked as a tunable. The
kernel SYSCTL API is mostly the same, with a few exceptions for some
special operations like iterating childrens of a static/extern SYSCTL
node. This operation should probably be made into a factored out
common macro, hence some device drivers use this. The reason for
changing the SYSCTL API was the need for a SYSCTL parent OID pointer
and not only the SYSCTL parent OID list pointer in order to quickly
generate the sysctl path. The motivation behind this patch is to avoid
parameter loading cludges inside the OFED driver subsystem. Instead of
adding special code to the OFED driver subsystem to post-load tunables
into dynamically created sysctls, we generalize this in the kernel.
Other changes:
- Corrected a possibly incorrect sysctl name from "hw.cbb.intr_mask"
to "hw.pcic.intr_mask".
- Removed redundant TUNABLE statements throughout the kernel.
- Some minor code rewrites in connection to removing not needed
TUNABLE statements.
- Added a missing SYSCTL_DECL().
- Wrapped two very long lines.
- Avoid malloc()/free() inside sysctl string handling, in case it is
called to initialize a sysctl from a tunable, hence malloc()/free() is
not ready when sysctls from the sysctl dataset are registered.
- Bumped FreeBSD version to indicate SYSCTL API change.
MFC after: 2 weeks
Sponsored by: Mellanox Technologies
Ensure that first_func is set to 0 on every iteration of the PCI slot
enumeration loop after the first. There is a continue statement that would
cause first_func to stay at 1 any PCI device where slot 0 has no functions
until we find a slot that does have a function. This would cause us to
not enumerate the first PCI function on the device.
Credit to markj@ for spotting the bug.
X-MFC-With: r264011
PCIe Alternate RID Interpretation (ARI) is an optional feature that
allows devices to have up to 256 different functions. It is
implemented by always setting the PCI slot number to 0 and
re-purposing the 5 bits used to encode the slot number to instead
contain the function number. Combined with the original 3 bits
allocated for the function number, this allows for 256 functions.
This is enabled by default, but it's expected to be a no-op on currently
supported hardware. It's a prerequisite for supporting PCI SR-IOV, and
I want the ARI support to go in early to help shake out any bugs in it.
ARI can be disabled by setting the tunable hw.pci.enable_ari=0.
Reviewed by: kib
MFC after: 2 months
Sponsored by: Sandvine Inc.
My PCI RID changes somehow got intermixed with my PCI ARI patch when I
committed it. I may have accidentally applied a patch to a non-clean
working tree. Revert everything while I figure out what went wrong.
Pointy hat to: rstone
I/O windows, the default is to preserve the firmware-assigned resources.
PCI bus numbers are only managed if NEW_PCIB is enabled and the architecture
defines a PCI_RES_BUS resource type.
- Add a helper API to create top-level PCI bus resource managers for each
PCI domain/segment. Host-PCI bridge drivers use this API to allocate
bus numbers from their associated domain.
- Change the PCI bus and CardBus drivers to allocate a bus resource for
their bus number from the parent PCI bridge device.
- Change the PCI-PCI and PCI-CardBus bridge drivers to allocate the
full range of bus numbers from secbus to subbus from their parent bridge.
The drivers also always program their primary bus register. The bridge
drivers also support growing their bus range by extending the bus resource
and updating subbus to match the larger range.
- Add support for managing PCI bus resources to the Host-PCI bridge drivers
used for amd64 and i386 (acpi_pcib, mptable_pcib, legacy_pcib, and qpi_pcib).
- Define a PCI_RES_BUS resource type for amd64 and i386.
Reviewed by: imp
MFC after: 1 month
are mostly useful for debugging.
- hw.pci.clear_bars ignores all firmware-assigned ranges for BARs when
set.
- hw.pci.clear_pcib ignores all firmware-assigned ranges for PCI-PCI
bridge I/O windows when set.
MFC after: 1 week
- Store the length of each read-only VPD value since not all values are
guaranteed to be ASCII values (though most are).
- Add a new pciio ioctl to fetch VPD for a single PCI device. The values
are returned as a list of variable length records, one for the device
name and each keyword.
- Add a new -V flag to pciconf's list mode which displays VPD data for
each device.
MFC after: 1 week