Instead of returning 0xffs some controllers, such as Layerscape generate
an external exception when someone attempts to read any register
of config space of a non-existing device other than PCIR_VENDOR.
This causes a kernel panic.
Fix it by bailing during device enumeration if a device vendor register
returns invalid value. (0xffff)
Use this opportunity to replace some hardcoded values with a macro.
I believe that this change won't have any unintended side-effects since
it is safe to assume that vendor == 0xffff -> hdr_type == 0xffff.
Sponsored by: Alstom
Obtained from: Semihalf
Reviewed by: jhb
MFC after: 2 weeks
Differential revision: https://reviews.freebsd.org/D33059
In a VF's configuration space, "memory space enable" is hard-wired to 0,
so the existing implementation always returns false. We need to read
the SR-IOV control register from the PF device to get the value of the
MSE bit.
Fix pci_bar_enabled() to read this register instead for VFs. I don't
see any way to access the PF's config space without a backpointer in the
pci device ivars, so I added one.
This fixes a regression where bhyve(8) fails to map the MSI-X table
after commit 7fa233534736 ("bhyve: Map the MSI-X table unconditionally
for passthrough") when a VF is passed through, since with that commit we
use PCIOCBARMMAP to map the table and that ioctl always fails for VFs
without this change. As a bonus, pciconf(8) now correctly reports the
enablement of BARs for VFs.
Reported and tested by: Raúl Muñoz <raul.munoz@custos.es>
Reviewed by: rstone, jhb
MFC after: 1 week
Sponsored by: The FreeBSD Foundation
Differential Revision: https://reviews.freebsd.org/D32839
Linux KPIs like pci_resource_start/len assume that BARs have been
allocated, but FreeBSD lazily allocates BARs if it cannot allocate the
firmware-allocated BARs. Thus using the Linux KPIs must force allocation
of the BARs rather than returning 0 for the start and length, which can
crash drm-kmod drivers that assume the BARs are valid. This is needed
for the AMDGPU driver to be able to attach on SiFive's HiFive Unmatched.
Reviewed by: hselasky, jhb, mav
MFC after: 1 week
Differential Revision: https://reviews.freebsd.org/D32447
Now that the upper layers all go through a layer to tie into these
information functions that translates an sbuf into char * and len. The
current interface suffers issues of what to do in cases of truncation,
etc. Instead, migrate all these functions to using struct sbuf and these
issues go away. The caller is also in charge of any memory allocation
and/or expansion that's needed during this process.
Create a bus_generic_child_{pnpinfo,location} and make it default. It
just returns success. This is for those busses that have no information
for these items. Migrate the now-empty routines to using this as
appropriate.
Document these new interfaces with man pages, and oversight from before.
Reviewed by: jhb, bcr
Sponsored by: Netflix
Differential Revision: https://reviews.freebsd.org/D29937
Usually on boot the MPS is already configured by BIOS. But we've
found that on hot-plug it is not true at least for our Supermicro
X11 boards. As result, mismatch between root's configuration of
256 bytes and device's default of 128 bytes cause problems for some
devices, while others seem to work fine.
MFC after: 1 month
Sponsored by: iXsystems, Inc.
When debugging leaked MSI/MSI-X vectors through LinuxKPI I found
the informational printf unhelpful. Rather than just stating we
leaked also tell how many MSI or MSI-X vectors we leak.
Sponsored-by: The FreeBSD Foundation
Reviewed-by: jhb
MFC-after: 2 weeks
Differential Revision: https://reviews.freebsd.org/D29394
pci_find_class_from help finding one or multiple device matching
a class and subclass.
If the from argument is not null we will first loop in the device list
until we find the matching device and only then start to check if the
class/subclass matches.
Reviewed by: jhb
Differential Revision: https://reviews.freebsd.org/D27549
This is mostly needed for a common arm64/amd64 iommu code.
Reviewed by: kib
Sponsored by: DARPA, AFRL
Differential Revision: https://reviews.freebsd.org/D26587
This function isn't ACPI dependent and we may use it on FDT systems
as well.
o Don't repeat the function declaration, include iommu.h instead.
Reviewed by: andrew, kib
Sponsored by: DARPA, AFRL
Differential Revision: https://reviews.freebsd.org/D26584
APEI allows platform to report different kinds of errors to OS in several
ways. We've found that Supermicro X10/X11 motherboards report PCIe errors
appearing on hot-unplug via this interface using NMI. Without respective
driver it ended up in kernel panic without any additional information.
This driver introduces support for the APEI Generic Hardware Error Source
reporting via NMI, SCI or polling. It decodes the reported errors and
either pass them to pci(4) for processing or just logs otherwise. Errors
marked as fatal still end up in kernel panic, but some more informative.
When somebody get to native PCIe AER support implementation both of the
reporting mechanisms should get common error recovery code. Since in our
case errors happen when the device is already gone, there is nothing to
recover, so the code just clears the error statuses, practically ignoring
the otherwise destructive NMIs in nicer way.
MFC after: 2 weeks
Relnotes: yes
Sponsored by: iXsystems, Inc.
The only thing this tunable enables now is reporting to ACPI _OSC that
Active State Power Management and Clock Power Management Capability are
"supported" by the OS.
I've found that at least some Supermicro server boards do not allow OS
to support native PCIe hot-plug unless it reports those capabilities.
After spending significant time in PCIe specs I have found very little
motivation for that, and none of it applies to those motherboards, not
enabling ASPM themselves. So unless OS explicitly wants to save power,
I see nothing for it to do there actually.
I guess it may get sense to support ASPM when we get Thunderbolt support.
Otherwise I have no system with PCIe hot-plug where power saving matters.
It would be nice to enable this by default, but I worry that it affect
power saving of some laptops, even though I haven't noticed that myself.
PCI bus driver restores most but not all of a child PCI-PCI bridge
configuration. The bridge's I/O windows are restored by pcib driver and
that happens later in time. This can be problematic because the Command
register is restored before the windows are restored. If the firmware
programs the windows incorrectly or even does not program them at all,
then the bridge can start claiming I/O cycles that are not intended for
it. This will continue until the correct windows are restored.
I have observed this problem with a buggy BIOS where after resuming from
S3 an I/O port window of a PCI-PCI bridge was configured with zero base
and limit causing the bridge to claim 0x0 - 0xFFF port range. That
interfered with ACPI port access including ACPI PM Timer at port 0x808,
thus wreaking havoc in the time keeping.
The solution is to restore the Command register of PCI-PCI bridges after
the windows are restored in pcib driver. While here, I decided that for
other PCI device types (normal and cardbus) it's better to restore the
Command register after their BARs are restored.
To do: per jhb's suggestion, move the window handling to pci driver.
Reviewed by: imp, jhb, kib
MFC after: 2 weeks
Differential Revision: https://reviews.freebsd.org/D25028
The warning used to be displayed for valid VPDs about 512B or above in
size. Fix the size check and add a break while here so that the routine
stops if if detects any problem.
Tested with "pciconf -lV"
Reviewed by: kib@, jhb@
MFC after: 1 week
Sponsored by: Chelsio Communications
Differential Revision: https://reviews.freebsd.org/D23679
r357614 added CTLFLAG_NEEDGIANT to make it easier to find nodes that are
still not MPSAFE (or already are but aren’t properly marked). Use it in
preparation for a general review of all nodes.
This is non-functional change that adds annotations to SYSCTL_NODE and
SYSCTL_PROC nodes using one of the soon-to-be-required flags.
Reviewed by: imp, kib
Approved by: kib (mentor)
Differential Revision: https://reviews.freebsd.org/D23633
First reported against ESXi 5.0, PCI passthrough was not working due to
MSI-X issues. However, this issue was fixed via patch releases against
ESXi 5.5 and 6.0 in 2016. Given ESXi 5.5 and earlier have been EOL, this
patch removes the VMware MSI-X blacklist entries in the quirk table.
PR: 203874
Reviewed by: imp, jhb
MFC after: 1 month
Sponsored by: VMware
Differential Revision: https://reviews.freebsd.org/D22819
Move the locking back into the ioctl handler. This "fixes" the race where we hve
a hot plug event just after the dropping of Giant in pci_find_dbsf, assuming the
driver doesn't then call anything that drops and picks up Giant again... It's a
little safer since don't think it doesn't, but we lack the tools to know for
sure.
The /dev/pci device doesn't need GIANT, per se. However, one routine
that it calls, pci_find_dbsf implicitly does. It walks a list that can
change when PCI scans a new bus. With hotplug, this means we could
have a race with that scanning. To prevent that, take out Giant around
scanning the list.
However, given that we have places in the tree that drop giant, if
held when we call into them, the whole use of Giant to protect newbus
may be less effective that we desire, so add a comment about why we're
talking it out, and we'll address the issue when we lock newbus with
something other than Giant.
table VCTRL registers.
Unconditionally program the MSI-X vector control Mask field for MSI-X
table entries without regarud for Mask's previous value. Some devices
return all zeros on reads of the VCTRL registers, which would cause us
to skip disabling interrupts. This fixes the Samsung SM961/PM961 SSDs
which are return zero starting from offset 0x3084 within the memory
region specified by BAR0, even when they are active MSI-X vectors.
The Illumos kernel writes these unconditionally to 0 or 1. However,
section 6.8.2.9 of the PCI Local Bus 3.0 spec (dated Feb 3, 2004)
states for bits 31::01:
After reset, the state of these bits must be 0. However, for
potential future use, software must preserve the value of
these reserved bits when modifying the value of other Vector
Control bits. If software modifies the value of these reserved
bits, the result is undefined."
so we always set or clear the Mask bit, but otherwise preserves the
old value.
PR: https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=211713
Reviewed By: imp, jhb
Submitted by: Ka Ho Ng
MFC After: 1 week
Differential Revision: https://reviews.freebsd.org/D20873
This allows replacing "sys/eventfilter.h" includes with "sys/_eventfilter.h"
in other header files (e.g., sys/{bus,conf,cpu}.h) and reduces header
pollution substantially.
EVENTHANDLER_DECLARE and EVENTHANDLER_LIST_DECLAREs were moved out of .c
files into appropriate headers (e.g., sys/proc.h, powernv/opal.h).
As a side effect of reduced header pollution, many .c files and headers no
longer contain needed definitions. The remainder of the patch addresses
adding appropriate includes to fix those files.
LOCK_DEBUG and LOCK_FILE_LINE_ARG are moved to sys/_lock.h, as required by
sys/mutex.h since r326106 (but silently protected by header pollution prior
to this change).
No functional change (intended). Of course, any out of tree modules that
relied on header pollution for sys/eventhandler.h, sys/lock.h, or
sys/mutex.h inclusion need to be fixed. __FreeBSD_version has been bumped.
For PCI device (i.e. child of a PCI bus), reset tries FLR if
implemented and worked, and falls to power reset otherwise.
For PCIe bus (child of a PCIe bridge or root port), reset
disables PCIe link and then re-trains it, performing what is known as
link-level reset.
Reviewed by: imp (previous version), jhb (previous version)
Sponsored by: Mellanox Technologies
MFC after: 2 weeks
Differential revision: https://reviews.freebsd.org/D19646
When pci_realloc_bars was first added, the intention was to eventually
enable it by default, but it was left disabled to preserve existing
behavior. The setting is pretty conservative in that it does not
attempt to allocate resources for BARs that the BIOS/firmware leaves
disabled. It only attempts to reallocate resources for a BAR that the
firmware programmed during boot but that conflicts with another
resource during the kernel's device scan.
PR 221350 is an example of a machine that this knob fixes.
Reviewed by: imp
Differential Revision: https://reviews.freebsd.org/D18965
The goal of this change is to fix a problem with PCI shared interrupts
during suspend and resume.
I have observed a couple of variations of the following scenario.
Devices A and B are on the same PCI bus and share the same interrupt.
Device A's driver is suspended first and the device is powered down.
Device B generates an interrupt. Interrupt handlers of both drivers are
called. Device A's interrupt handler accesses registers of the powered
down device and gets back bogus values (I assume all 0xff). That data is
interpreted as interrupt status bits, etc. So, the interrupt handler
gets confused and may produce some noise or enter an infinite loop, etc.
This change affects only PCI devices. The pci(4) bus driver marks a
child's interrupt handler as suspended after the child's suspend method
is called and before the device is powered down. This is done only for
traditional PCI interrupts, because only they can be shared.
At the moment the change is only for x86.
Notable changes in core subsystems / interfaces:
- BUS_SUSPEND_INTR and BUS_RESUME_INTR methods are added to bus
interface along with convenience functions bus_suspend_intr and
bus_resume_intr;
- rman_set_irq_cookie and rman_get_irq_cookie functions are added to
provide a way to associate an interrupt resource with an interrupt
cookie;
- intr_event_suspend_handler and intr_event_resume_handler functions
are added to the MI interrupt handler interface.
I added two new interrupt handler flags, IH_SUSP and IH_CHANGED, to
implement the new intr_event functions. IH_SUSP marks a suspended
interrupt handler. IH_CHANGED is used to implement a barrier that
ensures that a change to the interrupt handler's state is visible
to future interrupts.
While there, I fixed some whitespace issues in comments and changed a
couple of logically boolean variables to be bool.
MFC after: 1 month (maybe)
Differential Revision: https://reviews.freebsd.org/D15755
Mark some buses as BUS_PASS_BUS, and some resources as BUS_PASS_RESOURCE.
This also decouples some resource attachment orderings from being races by
device tree ordering, instead relying on the bus pass to provide the
ordering.
This was originally intended to support multipass suspend/resume, but it's
also needed on PowerMacs when using fdt, as the device tree seems to get
created in reverse of the OFW tree.
Reviewed by: nwhitehorn (long ago)
Differential Revision: https://reviews.freebsd.org/D918
This is very primitive code to inspect the PCI error state and AER
error state, dump the log and clear errors, from ddb.
pci_print_faulted_dev() is made external to allow calling it from
other places. It was called from NMI handler but this chunk is not
included.
Also there is a tunable-controlled code to clear AER on device attach,
disabled by default.
All this code was useful to me when I debugged ACPI_DMAR failures (not
faults) long time ago.
Reviewed by: cem, imp (previous version)
Sponsored by: The FreeBSD Foundation
MFC after: 2 weeks
Differential revision: https://reviews.freebsd.org/D7813
VirtIO V1 provides configuration in multiple VENDOR capabilities so this
allows all of the configuration to be discovered.
Reviewed by: jhb
MFC after: 2 weeks
Differential Revision: https://reviews.freebsd.org/D14325
This reduces noise when kernel is compiled by newer GCC versions,
such as one used by external toolchain ports.
Reviewed by: kib, andrew(sys/arm and sys/arm64), emaste(partial), erj(partial)
Reviewed by: jhb (sys/dev/pci/* sys/kern/vfs_aio.c and sys/kern/kern_synch.c)
Differential Revision: https://reviews.freebsd.org/D10385
This allows one to specify, for example, that if there's an igb card
in bus 12, slot 0, function 0, it should be assigned igb5. If there
isn't, or there's one in a different slot, normal numbering rules
apply (hinted units are skipped). Adding 'hint.igb.5.at="pci12:0:0"'
or 'hint.igb.5.at="pci0:12:0:0"' to /boot/device.hints will accomplish
this. The double quotes are important.
The kernel only accepts the strings (in shell notation):
pci$d:$b:$s:$f
and pci$b:$s:$f
where $d is the pci domain, $b is the pci bus number, $s is the slot
number and $f is the function number. A string compare is done with
the current device to avoid another string parser in the kernel. All
numbers are unsigned decimal without leading zeros.
Sponsored by: Netflix
Differential Revision: https://reviews.freebsd.org/D13546
Mainly focus on files that use BSD 2-Clause license, however the tool I
was using misidentified many licenses so this was mostly a manual - error
prone - task.
The Software Package Data Exchange (SPDX) group provides a specification
to make it easier for automated tools to detect and summarize well known
opensource licenses. We are gradually adopting the specification, noting
that the tags are considered only advisory and do not, in any way,
superceed or replace the license texts.
According to the PCI Local Specification rev. 3.0 in case of a 64-bit
BAR both the low and the high parts of the register should be set to
~0 before attempting to read back the size.
So far I have found no single device that has problems with the
previous approach, but I think it's better to stay on the safe size.
This commit should not introduce any functional change.
MFC after: 3 weeks
Sponsored by: Citrix Systems R&D
Reviewed by: jhb
Differential revision: https://reviews.freebsd.org/D11750
Replace archaic "busses" with modern form "buses."
Intentionally excluded:
* Old/random drivers I didn't recognize
* Old hardware in general
* Use of "busses" in code as identifiers
No functional change.
http://grammarist.com/spelling/buses-busses/
PR: 216099
Reported by: bltsrc at mail.ru
Sponsored by: Dell EMC Isilon
This patch solves IRQ generation problems using the mlx5en(4) driver
with xenserver v6.5.0 in SRIOV and PCI-passthrough modes.
Until further the hw.pci.msix_rewrite_table quirk must be set manually
in /boot/loader.conf .
Reviewed by: jhb @
Sponsored by: Mellanox Technologies
MFC after: 2 weeks
It's unsafe to update the BAR when the related EN bit is set.
Submitted by: Dexuan Cui <decui microsoft com>
Reviewed by: jhb
MFC after: 1 week
Sponsored by: Microsoft
Differential Revision: https://reviews.freebsd.org/D7914
During a bus rescan the check for an invalid vendor ID of a subfunction
used the wrong constant.
Submitted by: Dexuan Cui <decui@microsoft.com>
MFC after: 3 days
Add routines to trigger a function level reset (FLR) of a PCI-express
device via the PCI-express device control register. This also includes
support routines to wait for pending transactions to complete as well
as calculating the maximum completion timeout permitted by a device.
Change the ppt(4) driver to reset pass through devices before attaching
to a VM during startup and before detaching from a VM during shutdown.
Reviewed by: imp, wblock (earlier version)
MFC after: 1 month
Sponsored by: Chelsio Communications
Differential Revision: https://reviews.freebsd.org/D7751
When the I/O MMU is active in bhyve, all PCI devices need valid entries
in the DMAR context tables. The I/O MMU code does a single enumeration
of the available PCI devices during initialization to add all existing
devices to a domain representing the host. The ppt(4) driver then moves
pass through devices in and out of domains for virtual machines as needed.
However, when new PCI devices were added at runtime either via SR-IOV or
HotPlug, the I/O MMU tables were not updated.
This change adds a new set of EVENTHANDLERS that are invoked when PCI
devices are added and deleted. The I/O MMU driver in bhyve installs
handlers for these events which it uses to add and remove devices to
the "host" domain.
Reviewed by: imp
Sponsored by: Chelsio Communications
Differential Revision: https://reviews.freebsd.org/D7667