98 Commits

Author SHA1 Message Date
Michal Krawczyk
8108393d98 vfio: fix truncated BAR offset for 32-bit
When 32-bit application is built on 64-bit system it is possible that
the offset of the resource is outside of the 32-bit value.

The problem with the unsigned long is, that it is 32-bit and not 64-bit
when using armhf compiler. Although the system is returning u64 value,
we are losing it's value if it's higher than 32-bit in the conversion
process. It can further cause mmap to fail due to offset being 0 or to
map not intended memory region.

To make it more portable, the uint64_t value is now being used for
storing offset instead of unsigned long. The size of being 32-bit seems
to be fine as the 32-bit application won't be able to access bigger
memory and it is further converted to size_t anyway. But for better
readability and to be consistent, it's type was changed to size_t as
well.

Fixes: 0205f873557c ("vfio: fix overflow of BAR region offset and size")
Cc: stable@dpdk.org

Signed-off-by: Michal Krawczyk <mk@semihalf.com>
Acked-by: Anatoly Burakov <anatoly.burakov@intel.com>
2019-10-26 17:30:17 +02:00
David Marchand
e02b661b51 bus/pci: check IO permissions for UIO only
On x86, calling inb/outb special instructions (used in UIO ioport
read/write parts) is only possible if the right IO permissions has been
granted.

The only user of this API (the net/virtio pmd) checks this
unconditionnaly but this should be hidden by the rte_pci_ioport API
itself and only checked when the device is bound to a UIO driver.

Signed-off-by: David Marchand <david.marchand@redhat.com>
Reviewed-by: Maxime Coquelin <maxime.coquelin@redhat.com>
2019-10-25 11:54:10 +02:00
Seth Howell
c345c7d1ac bus/pci: remove useless link dependency on ethdev
The makefile in drivers/bus/pci specified rte_ethdev as a dependency for
the library. However there are no actual symbols from librte_ethdev used
in librte_bus_pci.

Including librte_ethdev as a dependency only becomes a problem in some
niche cases like when attempting to build the rte_bus_pci library as a
shared object without building the rte_ethdev library.

I specifically ran into this when trying to build the DPDK included as
an SPDK submodule on a FreeBSD machine. I figure that since there are no
real dependencies between the two, we should enable building
librte_bus_pci without librte_ethdev.

Fixes: c752998b5e2e ("pci: introduce library and driver")
Cc: stable@dpdk.org

Signed-off-by: Seth Howell <seth.howell@intel.com>
Acked-by: Gaetan Rivet <gaetan.rivet@6wind.com>
2019-10-25 10:51:15 +02:00
Stephen Hemminger
2e8d5cf763 bus/pci: fix Intel IOMMU sysfs access check
Just open the sysfs file and handle failure, rather than using access().
This eliminates Coverity warnings about TOCTOU
"time of check versus time of use"; although for this sysfs file that is
not really an issue anyway.

Coverity issue: 347276
Fixes: 54a328f552ff ("bus/pci: forbid IOVA mode if IOMMU address width too small")
Cc: stable@dpdk.org

Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
Reviewed-by: David Marchand <david.marchand@redhat.com>
2019-10-09 10:22:45 +02:00
David Marchand
8ac3591694 remove useless include of EAL memory config header
Restrict this header inclusion to its real users.

Fixes: 028669bc9f0d ("eal: hide shared memory config")
Cc: stable@dpdk.org

Signed-off-by: David Marchand <david.marchand@redhat.com>
Acked-by: Anatoly Burakov <anatoly.burakov@intel.com>
2019-10-09 10:22:24 +02:00
David Marchand
66d3724b2c bus/pci: always check IOMMU capabilities
IOMMU capabilities won't change and must be checked even if no PCI device
seem to be supported yet when EAL initialised.

This is to accommodate with SPDK that registers its drivers after
rte_eal_init(), especially on PPC platform where the IOMMU does not
support VA.

Fixes: 703458e19c16 ("bus/pci: consider only usable devices for IOVA mode")

Signed-off-by: David Marchand <david.marchand@redhat.com>
Reviewed-by: David Christensen <drc@linux.vnet.ibm.com>
Acked-by: Jerin Jacob <jerinj@marvell.com>
Tested-by: Jerin Jacob <jerinj@marvell.com>
Tested-by: Takeshi Yoshimura <tyos@jp.ibm.com>
2019-08-05 12:08:15 +02:00
David Marchand
b62f3aff91 bus/pci: remove unused x86 Linux constant
This macro is unused after a previous fix.

Fixes: fe822eb8c565 ("bus/pci: use IOVA DMA mask check when setting IOVA mode")
Cc: stable@dpdk.org

Signed-off-by: David Marchand <david.marchand@redhat.com>
2019-08-05 12:02:20 +02:00
Nithin Dabilpuram
33543fb3b6 vfio: revert interrupt eventfd setup at probe
This reverts commit 89aac60e0be9ed95a87b16e3595f102f9faaffb4.
"vfio: fix interrupts race condition"

The above mentioned commit moves the interrupt's eventfd setup
to probe time but only enables one interrupt for all types of
interrupt handles i.e VFIO_MSI, VFIO_LEGACY, VFIO_MSIX, UIO.
It works fine with default case but breaks below cases specifically
for MSIX based interrupt handles.

* Applications like l3fwd-power that request rxq interrupts
  while ethdev setup.
* Drivers that need > 1 MSIx interrupts to be configured for
  functionality to work.

VFIO PCI for MSIx expects all the possible vectors to be setup up
when using VFIO_IRQ_SET_ACTION_TRIGGER so that they can be
allocated from kernel pci subsystem. Only way to increase the number
of vectors later is first free all by using VFIO_IRQ_SET_DATA_NONE
with action trigger and then enable new vector count.

Above commit changes the behavior of rte_intr_[enable|disable] to
only mask and unmask unlike earlier behavior and thereby
breaking above two scenarios.

Fixes: 89aac60e0be9 ("vfio: fix interrupts race condition")
Cc: stable@dpdk.org

Signed-off-by: Nithin Dabilpuram <ndabilpuram@marvell.com>
Signed-off-by: Jerin Jacob <jerinj@marvell.com>
Tested-by: Stephen Hemminger <stephen@networkplumber.org>
Tested-by: Shahed Shaikh <shshaikh@marvell.com>
Tested-by: Lei Yao <lei.a.yao@intel.com>
Acked-by: David Marchand <david.marchand@redhat.com>
2019-07-23 12:00:14 +02:00
Jerin Jacob
d622cad892 bus/pci: change IOVA as VA flag name
In order to align name with other PCI driver flag such as
RTE_PCI_DRV_NEED_MAPPING and to reflect its purpose, change
RTE_PCI_DRV_IOVA_AS_VA flag name as RTE_PCI_DRV_NEED_IOVA_AS_VA.

Signed-off-by: Jerin Jacob <jerinj@marvell.com>
Signed-off-by: David Marchand <david.marchand@redhat.com>
2019-07-22 17:46:32 +02:00
David Marchand
b76fafb174 eal: fix IOVA mode selection as VA for PCI drivers
The incriminated commit broke the use of RTE_PCI_DRV_IOVA_AS_VA which
was intended to mean "driver only supports VA" but had been understood
as "driver supports both PA and VA" by most net drivers and used to let
dpdk processes to run as non root (which do not have access to physical
addresses on recent kernels).

The check on physical addresses actually closed the gap for those
drivers. We don't need to mark them with RTE_PCI_DRV_IOVA_AS_VA and this
flag can retain its intended meaning.
Document explicitly its meaning.

We can check that a driver requirement wrt to IOVA mode is fulfilled
before trying to probe a device.

Finally, document the heuristic used to select the IOVA mode and hope
that we won't break it again.

Fixes: 703458e19c16 ("bus/pci: consider only usable devices for IOVA mode")

Signed-off-by: David Marchand <david.marchand@redhat.com>
Reviewed-by: Jerin Jacob <jerinj@marvell.com>
Tested-by: Jerin Jacob <jerinj@marvell.com>
Acked-by: Anatoly Burakov <anatoly.burakov@intel.com>
2019-07-22 17:45:52 +02:00
David Marchand
62f8f5ace5 bus/pci: remove Mellanox kernel driver type
This reverts commit 0cb86518db57d35e0abc14d6703fad561a0310e2.

The PCI bus now reports DC when faced with a device bound to an unknown
driver and, in such a case, the IOVA mode is selected against physical
address availability.

As a consequence, there is no reason for this special case for Mellanox
drivers.

Fixes: 703458e19c16 ("bus/pci: consider only usable devices for IOVA mode")

Signed-off-by: David Marchand <david.marchand@redhat.com>
Reviewed-by: Jerin Jacob <jerinj@marvell.com>
2019-07-22 17:44:08 +02:00
David Marchand
89aac60e0b vfio: fix interrupts race condition
Populating the eventfd in rte_intr_enable in each request to vfio
triggers a reconfiguration of the interrupt handler on the kernel side.
The problem is that rte_intr_enable is often used to re-enable masked
interrupts from drivers interrupt handlers.

This reconfiguration leaves a window during which a device could send
an interrupt and then the kernel logs this (unsolicited from the kernel
point of view) interrupt:
[158764.159833] do_IRQ: 9.34 No irq handler for vector

VFIO api makes it possible to set the fd at setup time.
Make use of this and then we only need to ask for masking/unmasking
legacy interrupts and we have nothing to do for MSI/MSIX.

"rxtx" interrupts are left untouched but are most likely subject to the
same issue.

Reported-at: https://bugzilla.redhat.com/show_bug.cgi?id=1654824
Fixes: 5c782b3928b8 ("vfio: interrupts")
Cc: stable@dpdk.org

Signed-off-by: David Marchand <david.marchand@redhat.com>
Tested-by: Shahed Shaikh <shshaikh@marvell.com>
2019-07-10 18:53:47 +02:00
Anatoly Burakov
028669bc9f eal: hide shared memory config
Now that everything that has ever accessed the shared memory
config is doing so through the public API's, we can make it
internal. Since we're removing quite a few headers from
rte_eal_memconfig.h, we need to add them back in places
where this header is used.

This bumps the ABI, so also change all build files and make
update documentation.

Signed-off-by: Anatoly Burakov <anatoly.burakov@intel.com>
Acked-by: Stephen Hemminger <stephen@networkplumber.org>
Acked-by: David Marchand <david.marchand@redhat.com>
2019-07-06 10:32:34 +02:00
Ben Walker
703458e19c bus/pci: consider only usable devices for IOVA mode
When selecting the preferred IOVA mode of the pci bus, the current
heuristic ("are devices bound?", "are devices bound to UIO?", "are pmd
drivers supporting IOVA as VA?" etc..) should honor the device
white/blacklist so that an unwanted device does not impact the decision.

There is no reason to consider a device which has no driver available.

This applies to all OS, so implements this in common code then call a
OS specific callback.

On Linux side:
- the VFIO special considerations should be evaluated only if VFIO
  support is built,
- there is no strong requirement on using VA rather than PA if a driver
  supports VA, so defaulting to DC in such a case.

Signed-off-by: Ben Walker <benjamin.walker@intel.com>
Signed-off-by: David Marchand <david.marchand@redhat.com>
Reviewed-by: Anatoly Burakov <anatoly.burakov@intel.com>
2019-07-05 16:56:00 +02:00
Stephen Hemminger
c530aa78e3 bus/pci: fix TOCTOU for sysfs access
Using access followed by open causes a static analysis warning
about Time of check versus Time of use. Also, access() and
open() have different UID permission checks.

This is not a serious problem; but easy to fix by using errno instead.

Coverity issue: 300870
Fixes: 4a928ef9f611 ("bus/pci: enable write combining during mapping")
Cc: stable@dpdk.org

Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
Reviewed-by: David Marchand <david.marchand@redhat.com>
2019-06-14 16:33:56 +09:00
Yongseok Koh
0cb86518db bus/pci: add Mellanox kernel driver type
When checking RTE_PCI_DRV_IOVA_AS_VA flag to determine IOVA mode,
pci_one_device_has_iova_va() returns true only if kernel driver of the
device is vfio. However, Mellanox mlx4/5 PMD doesn't need to be detached
from kernel driver and attached to VFIO/UIO. Control path still goes
through the existing kernel driver, which is mlx4_core/mlx5_core. In order
to make RTE_PCI_DRV_IOVA_AS_VA effective for mlx4/mlx5 PMD, a new kernel
driver type has to be introduced.

Cc: stable@dpdk.org

Signed-off-by: Yongseok Koh <yskoh@mellanox.com>
2019-06-04 00:33:06 +02:00
Bruce Richardson
adf93ca564 build: increase readability via shortcut variables
Define variables for "is_linux", "is_freebsd" and "is_windows"
to make the code shorter for comparisons and more readable.

Signed-off-by: Bruce Richardson <bruce.richardson@intel.com>
Reviewed-by: David Marchand <david.marchand@redhat.com>
Acked-by: Luca Boccassi <bluca@debian.org>
2019-04-17 18:09:52 +02:00
Bruce Richardson
6723c0fc72 replace snprintf with strlcpy
Do a global replace of snprintf(..."%s",...) with strlcpy, adding in the
rte_string_fns.h header if needed.  The function changes in this patch were
auto-generated via command:

  spatch --sp-file devtools/cocci/strlcpy.cocci --dir . --in-place

and then the files edited using awk to add in the missing header:

  gawk -i inplace '/include <rte_/ && ! seen { \
  	print "#include <rte_string_fns.h>"; seen=1} {print}'

Signed-off-by: Bruce Richardson <bruce.richardson@intel.com>
2019-04-04 22:46:05 +02:00
Bruce Richardson
f9acaf84e9 replace snprintf with strlcpy without adding extra include
For files that already have rte_string_fns.h included in them, we can
do a straight replacement of snprintf(..."%s",...) with strlcpy. The
changes in this patch were auto-generated via command:

spatch --sp-file devtools/cocci/strlcpy-with-header.cocci --dir . --in-place

Signed-off-by: Bruce Richardson <bruce.richardson@intel.com>
2019-04-04 22:45:54 +02:00
David Marchand
27893e4eee drivers: remove Linux EAL from include path
None of those drivers require EAL linux specific headers.

Signed-off-by: David Marchand <david.marchand@redhat.com>
2019-04-04 22:06:16 +02:00
Shahaf Shuler
c33a675b62 bus: introduce device level DMA memory mapping
The DPDK APIs expose 3 different modes to work with memory used for DMA:

1. Use the DPDK owned memory (backed by the DPDK provided hugepages).
This memory is allocated by the DPDK libraries, included in the DPDK
memory system (memseg lists) and automatically DMA mapped by the DPDK
layers.

2. Use memory allocated by the user and register to the DPDK memory
systems. Upon registration of memory, the DPDK layers will DMA map it
to all needed devices. After registration, allocation of this memory
will be done with rte_*malloc APIs.

3. Use memory allocated by the user and not registered to the DPDK memory
system. This is for users who wants to have tight control on this
memory (e.g. avoid the rte_malloc header).
The user should create a memory, register it through rte_extmem_register
API, and call DMA map function in order to register such memory to
the different devices.

The scope of the patch focus on #3 above.

Currently the only way to map external memory is through VFIO
(rte_vfio_dma_map). While VFIO is common, there are other vendors
which use different ways to map memory (e.g. Mellanox and NXP).

The work in this patch moves the DMA mapping to vendor agnostic APIs.
Device level DMA map and unmap APIs were added. Implementation of those
APIs was done currently only for PCI devices.

For PCI bus devices, the pci driver can expose its own map and unmap
functions to be used for the mapping. In case the driver doesn't provide
any, the memory will be mapped, if possible, to IOMMU through VFIO APIs.

Application usage with those APIs is quite simple:
* allocate memory
* call rte_extmem_register on the memory chunk.
* take a device, and query its rte_device.
* call the device specific mapping function for this device.

Future work will deprecate the rte_vfio_dma_map and rte_vfio_dma_unmap
APIs, leaving the rte device APIs as the preferred option for the user.

Signed-off-by: Shahaf Shuler <shahafs@mellanox.com>
Acked-by: Anatoly Burakov <anatoly.burakov@intel.com>
Acked-by: Gaetan Rivet <gaetan.rivet@6wind.com>
2019-03-30 16:48:56 +01:00
Bruce Richardson
5fbc1d498f build/freebsd: rename macro BSDPAPP to FREEBSD
Rename the macro and all instances in DPDK code, but keep a copy of
the old macro defined for legacy code linking against DPDK

Signed-off-by: Bruce Richardson <bruce.richardson@intel.com>
2019-03-12 23:01:14 +01:00
Bruce Richardson
742bde12f3 build/linux: rename macro from LINUXAPP to LINUX
Rename the macro to make things shorter and more comprehensible. For
both meson and make builds, keep the old macro around for backward
compatibility.

Signed-off-by: Bruce Richardson <bruce.richardson@intel.com>
2019-03-12 17:31:22 +01:00
Alejandro Lucero
43f2b3d250 vfio: fix error message
The message refers to uio driver.

Fixes: ff0b67d1c868 ("vfio: DMA mapping")
Cc: stable@dpdk.org

Signed-off-by: Alejandro Lucero <alejandro.lucero@netronome.com>
Acked-by: Anatoly Burakov <anatoly.burakov@intel.com>
2019-01-23 22:49:11 +01:00
Tone Zhang
9cea8774cf vfio: support 64KB kernel page size
With a larger PAGE_SIZE it is possible for the MSI table to very
close to the end of the BAR s.t. when we align the start and end
of the MSI table to the PAGE_SIZE, the end offset of the MSI
table is out of the PCI BAR boundary.

This patch addresses the issue by comparing both the start and the
end offset of the MSI table with the BAR size, and skip the mapping
if it is out of Bar scope.

The patch fixes the debug log as below:
EAL: Skipping BAR0

Signed-off-by: Tone Zhang <tone.zhang@arm.com>
Reviewed-by: Gavin Hu <gavin.hu@arm.com>
Reviewed-by: Honnappa Nagarahalli <honnappa.nagarahalli@arm.com>
Reviewed-by: Steve Capper <steve.capper@arm.com>
Acked-by: Anatoly Burakov <anatoly.burakov@intel.com>
2018-12-20 00:12:20 +01:00
Darek Stojaczyk
047e3f9f2a vfio: do not needlessly setup device in secondary process
Setting up a device that wasn't setup in the primary
process will possibly break the primary process. That's
because the IPC message to retrieve the group fd in the
primary will also *open* that group if it wasn't opened
before. Even though the secondary process closes that fd
soon after as a part of its error handling path, the
primary process leaks it.

What's worse, opening that fd on the primary will
increment the process-local counter of opened groups.
If it was 0 before, then the group will never be added
to the vfio container, nor dpdk memory will be ever
mapped.

This patch moves the proper error checks earlier in the
code to fully prevent setting up devices in secondary
processes that weren't setup in the primary process.

Fixes: 2f4adfad0a69 ("vfio: add multiprocess support")

Signed-off-by: Darek Stojaczyk <dariusz.stojaczyk@intel.com>
Acked-by: Anatoly Burakov <anatoly.burakov@intel.com>
Reviewed-by: Maxime Coquelin <maxime.coquelin@redhat.com>
2018-11-25 13:09:05 +01:00
Ferruh Yigit
d3110b124a bus/pci: fix allocation of device path
The pci_resource_by_index called strlen() on uninitialized
memory which would lead to the wrong size of memory allocated
for the path portion of the resource map. This would either cause
excessively large allocation, or worse memory corruption.

Coverity issue: 300868
Fixes: ea9d56226e72 ("pci: introduce function to map uio resource by index")
Cc: stable@dpdk.org

Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
Signed-off-by: Ferruh Yigit <ferruh.yigit@intel.com>
Reviewed-by: Andrew Rybchenko <arybchenko@solarflare.com>
Reviewed-by: Maxime Coquelin <maxime.coquelin@redhat.com>
2018-11-25 11:51:11 +01:00
Thomas Monjalon
b91bc6f357 vfio: fix build with Linux < 4.0
drivers/bus/pci/linux/pci_vfio.c:45:23: error:
‘failure_handle_lock’ defined but not used

Fixes: 8ffe73865124 ("vfio: add lock for hot-unplug failure handler")

Signed-off-by: Thomas Monjalon <thomas@monjalon.net>
2018-11-18 22:31:30 +01:00
Jeff Guo
8ffe738651 vfio: add lock for hot-unplug failure handler
When the sigbus handler be enabled for hot-unplug, whatever hot-unplug
sigbus or origin sigbus occur, the sigbus handler will be invoked and
it will access the bus and device. While in the control path, the vfio
req handler also will process the bus and device, so a protection of
the resources in vfio req handler should be need. This patch add a lock
in vfio req handler when process bus and device resource, to avoid the
synchronization issue when device be hot-unplugged.

Fixes: c115fd000c32 ("vfio: handle hotplug request notifier")

Signed-off-by: Jeff Guo <jia.guo@intel.com>
2018-11-18 17:16:42 +01:00
Fan Zhang
a38eafedda bus/pci: fix config r/w access
The recent change to rte_pci_read/write_config() missed
uio_pci_generic case.

Fixes: 630deed612ca ("bus/pci: compare kernel driver instead of interrupt handler")
Cc: stable@dpdk.org

Signed-off-by: Fan Zhang <roy.fan.zhang@intel.com>
Signed-off-by: Thomas Monjalon <thomas@monjalon.net>
2018-11-06 02:11:25 +01:00
Alejandro Lucero
ec20068713 bus/pci: avoid call to DMA mask check
Calling rte_mem_check_dma_mask when memory has not been initialized
yet is wrong. This patch use rte_mem_set_dma_mask instead.

Once memory initialization is done, the dma mask set will be used
for checking memory mapped is within the specified mask.

Fixes: fe822eb8c565 ("bus/pci: use IOVA DMA mask check when setting IOVA mode")

Signed-off-by: Alejandro Lucero <alejandro.lucero@netronome.com>
Tested-by: Ferruh Yigit <ferruh.yigit@intel.com>
2018-11-05 01:02:08 +01:00
Alejandro Lucero
0de9eb6138 mem: rename DMA mask check with proper prefix
Current name rte_eal_check_dma_mask does not follow the naming
used in the rest of the file.

Signed-off-by: Alejandro Lucero <alejandro.lucero@netronome.com>
Tested-by: Ferruh Yigit <ferruh.yigit@intel.com>
2018-11-05 01:01:54 +01:00
Qi Zhang
55e411b301 bus/pci: fix resource mapping override
When scanning an already plugged device, the virtual address
of mapped PCI resource in rte_pci_device will be overridden
with 0, that may cause driver does not work correctly.
The fix is not to update any rte_pci_device's field if the being
scanned device's driver is already probed.

Bugzilla ID: 85
Fixes: c752998b5e2e ("pci: introduce library and driver")
Cc: stable@dpdk.org

Reported-by: Geoffrey Lv <geoffrey.lv@gmail.com>
Signed-off-by: Qi Zhang <qi.z.zhang@intel.com>
2018-10-31 19:43:34 +01:00
Darek Stojaczyk
3f2ef27972 bus/pci: propagate probing error codes
In a couple of places we check its error code against -EEXIST,
but this function returned either -1, 0, or 1.

This gets critical when hotplugging a device in secondary
process, while the same device is already plugged in the
primary. Failing to "hotplug" it in the primary will cause
the secondary to fail as well.

Fixes: e9d159c3d534 ("eal: allow probing a device again")

Signed-off-by: Darek Stojaczyk <dariusz.stojaczyk@intel.com>
Acked-by: Thomas Monjalon <thomas@monjalon.net>
2018-10-29 01:59:48 +01:00
Darek Stojaczyk
d59ba0296e vfio: fix interrupt unregister for hotplug notifier
This function is documented to return the number of unregistered
callbacks or negative numbers on error, but pci_vfio checks for
ret != 0 to detect failures. Not anymore.

Fixes: c115fd000c32 ("vfio: handle hotplug request notifier")

Signed-off-by: Darek Stojaczyk <dariusz.stojaczyk@intel.com>
Acked-by: Anatoly Burakov <anatoly.burakov@intel.com>
2018-10-29 01:59:48 +01:00
Alejandro Lucero
630deed612 bus/pci: compare kernel driver instead of interrupt handler
Invoking the right pci read/write functions is based on interrupt
handler type. However, this is not configured for secondary processes
precluding to use those functions.

This patch fixes the issue using the driver name the device is bound
to instead.

Fixes: 632b2d1deeed ("eal: provide functions to access PCI config")
Cc: stable@dpdk.org

Signed-off-by: Alejandro Lucero <alejandro.lucero@netronome.com>
Acked-by: Anatoly Burakov <anatoly.burakov@intel.com>
2018-10-29 01:02:32 +01:00
Luca Boccassi
e8d435f1f3 bus/pci: harmonize return value of config read
On Linux, rte_pci_read_config on success returns the number of read
bytes, but on BSD it returns 0.
Document the return values, and have BSD behave as Linux does.

At least one case (bnx2x PMD) treats 0 as an error, so the change
makes sense also for that.

Signed-off-by: Luca Boccassi <bluca@debian.org>
Acked-by: Bruce Richardson <bruce.richardson@intel.com>
2018-10-29 00:32:14 +01:00
Alejandro Lucero
fe822eb8c5 bus/pci: use IOVA DMA mask check when setting IOVA mode
Currently the code precludes IOVA mode if IOMMU hardware reports
less addressing bits than necessary for full virtual memory range.

Although VT-d emulation currently only supports 39 bits, it could
be iovas for allocated memlory being within that supported range.
This patch allows IOVA mode in such a case adding a call to
rte_eal_check_dma_mask using the reported addressing bits by the
IOMMU hardware.

Indeed, memory initialization code has been modified for using lower
virtual addresses than those used by the kernel for 64 bits processes
by default, and therefore memsegs iovas can use 39 bits or less for
most systems. And this is likely 100% true for VMs.

Signed-off-by: Alejandro Lucero <alejandro.lucero@netronome.com>
Acked-by: Anatoly Burakov <anatoly.burakov@intel.com>
2018-10-28 22:06:33 +01:00
Alejandro Lucero
f74d50a7df bus/pci: check IOMMU addressing limitation just once
Current code checks if IOMMU hardware reports enough addressing
bits for using IOVA mode but it repeats the same check for any
PCI device present. This is not necessary because the IOMMU hardware
is the same for all of them.

This patch only checks the IOMMU using first PCI device found.

Signed-off-by: Alejandro Lucero <alejandro.lucero@netronome.com>
Acked-by: Anatoly Burakov <anatoly.burakov@intel.com>
2018-10-28 22:06:15 +01:00
Anatoly Burakov
edd035d241 vfio: improve musl compatibility
Musl already has PAGE_SIZE defined, and our define clashed with it.
Rename our define to SYS_PAGE_SIZE.

Bugzilla ID: 36

Signed-off-by: Anatoly Burakov <anatoly.burakov@intel.com>
Acked-by: Bruce Richardson <bruce.richardson@intel.com>
2018-10-22 11:28:35 +02:00
Anatoly Burakov
5d7b673d5f mk: build with _GNU_SOURCE defined by default
We use _GNU_SOURCE all over the place, but often times we miss
defining it, resulting in broken builds on musl. Rather than
fixing every library's and driver's and application's makefile,
fix it by simply defining _GNU_SOURCE by default for all
builds.

Remove all usages of _GNU_SOURCE in source files and makefiles,
and also fixup a couple of instances of using __USE_GNU instead
of _GNU_SOURCE.

Signed-off-by: Anatoly Burakov <anatoly.burakov@intel.com>
2018-10-22 11:28:27 +02:00
Thomas Monjalon
739e13bcc9 devargs: fix freeing during device removal
After calling unplug function of a bus, the device is expected
to be freed. It is too late for getting devargs to remove.
Anyway, the buses which implement unplug are already freeing
the devargs, except the PCI bus.
So the call to rte_devargs_remove() is removed from EAL and
added in PCI.

Fixes: 2effa126fbd8 ("devargs: simplify parameters of removal function")

Signed-off-by: Thomas Monjalon <thomas@monjalon.net>
2018-10-19 22:37:10 +02:00
Thomas Monjalon
e9d159c3d5 eal: allow probing a device again
In the devargs syntax for device representors, it is possible to add
several devices at once: -w dbdf,representor=[0-3]
It will become a more frequent case when introducing wildcards
and ranges in the new devargs syntax.

If a devargs string is provided for probing, and updated with a bigger
range for a new probing, then we do not want it to fail because
part of this range was already probed previously.
There can be new ports to create from an existing rte_device.

That's why the check for an already probed device
is moved as bus responsibility.
In the case of vdev, a global check is kept in insert_vdev(),
assuming that a vdev will always have only one port.
In the case of ifpga and vmbus, already probed devices are checked.
In the case of NXP buses, the probing is done only once (no hotplug),
though a check is added at bus level for consistency.
In the case of PCI, a driver flag is added to allow PMD probing again.
Only the PMD knows the ports attached to one rte_device.

As another consequence of being able to probe in several steps,
the field rte_device.devargs must not be considered as a full
representation of the rte_device, but only the latest probing args.
Anyway, the field rte_device.devargs is used only for probing.

Signed-off-by: Thomas Monjalon <thomas@monjalon.net>
Reviewed-by: Andrew Rybchenko <arybchenko@solarflare.com>
Tested-by: Andrew Rybchenko <arybchenko@solarflare.com>
Acked-by: Shreyansh Jain <shreyansh.jain@nxp.com>
2018-10-18 01:49:52 +02:00
Thomas Monjalon
52897e7e70 eal: add function to query device status
The function rte_dev_is_probed() is added in order to improve semantic
and enforce proper check of the probing status of a device.

It will answer this rte_device query:
Is it already successfully probed or not?

Signed-off-by: Thomas Monjalon <thomas@monjalon.net>
Reviewed-by: Andrew Rybchenko <arybchenko@solarflare.com>
Tested-by: Andrew Rybchenko <arybchenko@solarflare.com>
2018-10-18 01:49:28 +02:00
Thomas Monjalon
391797f042 drivers/bus: move driver assignment to end of probing
The PCI mapping requires to know the PCI driver to use,
even before the probing is done. That's why the PCI driver is
referenced early inside the PCI device structure. See
commit 1d20a073fa5e ("bus/pci: reference driver structure before mapping")

However the rte_driver does not need to be referenced in rte_device
before the device probing is done.
By moving back this assignment at the end of the device probing,
it becomes possible to make clear the status of a rte_device.

Signed-off-by: Thomas Monjalon <thomas@monjalon.net>
Reviewed-by: Andrew Rybchenko <arybchenko@solarflare.com>
Tested-by: Andrew Rybchenko <arybchenko@solarflare.com>
Reviewed-by: Rosen Xu <rosen.xu@intel.com>
2018-10-17 10:26:59 +02:00
Jerin Jacob
94d7265976 vfio: fix missing header inclusion
The following change set introduces HAVE_VFIO_DEV_REQ_INTERFACE
and used in the below files.

drivers/bus/pci/linux/pci_vfio.c
drivers/bus/pci/pci_common.c
lib/librte_eal/linuxapp/eal/eal_interrupts.c

However, Except the first file, the change missed to include
<rte_vfio.h> where HAVE_VFIO_DEV_REQ_INTERFACE defined.
This creates runtime following error on vfio-pci mode and
kernel >= 4.0.0 combination.

EAL: [rte_intr_enable] Unknown handle type of fd 95
EAL: [pci_vfio_enable_notifier]Fail to enable req notifier.
EAL: Fail to unregister req notifier handler.
EAL: Error setting up notifier!
EAL: Requested device 0000:07:00.1 cannot be used

Fixes: cda94419964f ("vfio: fix build with Linux < 4.0")

Signed-off-by: Jerin Jacob <jerin.jacob@caviumnetworks.com>
Signed-off-by: Pavan Nikhilesh <pbhagavatula@caviumnetworks.com>
2018-10-17 10:16:18 +02:00
Jeff Guo
cda9441996 vfio: fix build with Linux < 4.0
Since the older kernel version do not implement the device request
interface for vfio, so when build on the kernel < v4.0.0, which is
the version begin to add the device request interface, it will
throw the error to show “VFIO_PCI_REQ_IRQ_INDEX” is undeclared.
This patch aim to fix this compile issue by add the macro
“HAVE_VFIO_DEV_REQ_INTERFACE” after checking the kernel version.

Fixes: 0eb8a1c4c786 ("vfio: add request notifier interrupt")
Fixes: c115fd000c32 ("vfio: handle hotplug request notifier")

Signed-off-by: Jeff Guo <jia.guo@intel.com>
Reviewed-by: Ferruh Yigit <ferruh.yigit@intel.com>
2018-10-16 14:54:25 +02:00
Jeff Guo
c115fd000c vfio: handle hotplug request notifier
When device is be hot-unplugged, the vfio kernel module will sent req
notifier to request user space to release the allocated resources at
first. After that, vfio kernel module will detect the device disappear,
and then delete the device in kernel.

This patch aim to add req notifier processing to enable hotplug for vfio.
By enable the req notifier monitoring and register the notifier callback,
when device be hot-unplugged, the hot-unplug handler will be called to
process hotplug for vfio.

Signed-off-by: Jeff Guo <jia.guo@intel.com>
Acked-by: Anatoly Burakov <anatoly.burakov@intel.com>
2018-10-15 23:42:15 +02:00
Jeff Guo
44c976236e bus/pci: add VFIO request interrupt handle to device
There are some extended interrupt types in vfio pci device except from the
existing interrupts, such as err and req notifier, they could be useful for
device error monitoring. And these corresponding interrupt handler is
different from the other interrupt handler that register in PMDs, so a new
interrupt handler should be added. This patch will add specific req handler
in generic pci device.

Signed-off-by: Jeff Guo <jia.guo@intel.com>
Acked-by: Anatoly Burakov <anatoly.burakov@intel.com>
2018-10-15 23:13:31 +02:00
Jeff Guo
5c96a29934 bus/pci: support sigbus handler
This patch implements the ops for the PCI bus sigbus handler. It finds the
PCI device that is being hot-unplugged and calls the relevant ops of the
hot-unplug handler to handle the hot-unplug failure of the device.

Signed-off-by: Jeff Guo <jia.guo@intel.com>
Acked-by: Shaopeng He <shaopeng.he@intel.com>
Acked-by: Konstantin Ananyev <konstantin.ananyev@intel.com>
2018-10-15 22:17:26 +02:00