If the sysfs directory was incorrectly formatted, the vmbus
setup code would leak a directory handle in the error path.
Coverity issue: 302848
Fixes: 831dba47bd ("bus/vmbus: add Hyper-V virtual bus support")
Cc: stable@dpdk.org
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
The devargs of a device can be replaced by a newly allocated one
when probing the same device again (multi-process or
multi-port scenarios). This breaks some pointer references.
It can be avoided by copying the new content, freeing the new devargs,
and returning the already inserted pointer.
Signed-off-by: Thomas Monjalon <thomas@monjalon.net>
Tested-by: Darek Stojaczyk <dariusz.stojaczyk@intel.com>
Tested-by: Qi Zhang <qi.z.zhang@intel.com>
Tested-by: Viacheslav Ovsiienko <viacheslavo@mellanox.com>
Calling rte_mem_check_dma_mask when memory has not been initialized
yet is wrong. This patch uses rte_mem_set_dma_mask instead.
Once memory initialization is done, the DMA mask that was set is used
to check that mapped memory is within the specified mask.
Fixes: fe822eb8c5 ("bus/pci: use IOVA DMA mask check when setting IOVA mode")
Signed-off-by: Alejandro Lucero <alejandro.lucero@netronome.com>
Tested-by: Ferruh Yigit <ferruh.yigit@intel.com>
Current name rte_eal_check_dma_mask does not follow the naming
used in the rest of the file.
Signed-off-by: Alejandro Lucero <alejandro.lucero@netronome.com>
Tested-by: Ferruh Yigit <ferruh.yigit@intel.com>
build error:
In function ‘fman_if_init’,
.../drivers/bus/dpaa/base/fman/fman.c:186:2:
error: ‘strncpy’ output may be truncated copying 4095 bytes from a
string of length 4095 [-Werror=stringop-truncation]
strncpy(__if->node_path, dpa_node->full_name, PATH_MAX - 1);
strncpy may produce a string that is not null-terminated,
so replace it with strlcpy.
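As an illustration only (not the code of this patch), the termination
guarantee can be sketched with a small helper. The name copy_bounded is
hypothetical; it mimics strlcpy semantics for platforms that lack a
native strlcpy.

```c
#include <stddef.h>
#include <string.h>

/*
 * Hypothetical helper with strlcpy-like semantics: copy at most
 * size - 1 bytes and always NUL-terminate when size > 0.
 * Returns strlen(src), so a result >= size signals truncation.
 */
static size_t
copy_bounded(char *dst, const char *src, size_t size)
{
	size_t len = strlen(src);

	if (size > 0) {
		size_t n = len < size - 1 ? len : size - 1;

		memcpy(dst, src, n);
		dst[n] = '\0';
	}
	return len;
}
```

Unlike strncpy with a bound of size - 1, the destination is terminated
even when the source is longer than the buffer.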
Fixes: 5b22cf7446 ("bus/dpaa: introducing FMan configurations")
Cc: stable@dpdk.org
Signed-off-by: Ferruh Yigit <ferruh.yigit@intel.com>
A constructor is usually declared with RTE_INIT* macros.
As it is a static function, no need to declare before its definition.
The macro is used directly in the function definition.
Signed-off-by: Thomas Monjalon <thomas@monjalon.net>
When scanning an already plugged device, the virtual address
of the mapped PCI resource in rte_pci_device is overwritten
with 0, which may prevent the driver from working correctly.
The fix is to leave every rte_pci_device field untouched if the
driver of the device being scanned is already probed.
Bugzilla ID: 85
Fixes: c752998b5e ("pci: introduce library and driver")
Cc: stable@dpdk.org
Reported-by: Geoffrey Lv <geoffrey.lv@gmail.com>
Signed-off-by: Qi Zhang <qi.z.zhang@intel.com>
Some global variables are defined with generic names. Add the component
name as a prefix to these variables to prevent collisions with
application variables.
Signed-off-by: Ferruh Yigit <ferruh.yigit@intel.com>
Reviewed-by: Maxime Coquelin <maxime.coquelin@redhat.com>
Acked-by: Shreyansh Jain <shreyansh.jain@nxp.com>
Acked-by: Tianfei Zhang <tianfei.zhang@intel.com>
Some global variables can indeed be static, add static keyword to them.
Signed-off-by: Ferruh Yigit <ferruh.yigit@intel.com>
Acked-by: Jerin Jacob <jerin.jacob@caviumnetworks.com>
Acked-by: Shreyansh Jain <shreyansh.jain@nxp.com>
In a couple of places we check its error code against -EEXIST,
but this function returned either -1, 0, or 1.
This becomes critical when hotplugging a device in a secondary
process while the same device is already plugged in the
primary. Failing to "hotplug" it in the primary will cause
the secondary to fail as well.
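A minimal sketch of the intended convention, using hypothetical names
(sketch_hotplug_dev, sketch_probe, sketch_hotplug) rather than the real
EAL functions: the probe routine reports -EEXIST for an already probed
device, and the hotplug path treats that value as success.

```c
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, simplified device record used only for this sketch. */
struct sketch_hotplug_dev {
	bool probed;
};

/* Return 0 on success, -EEXIST if already probed, -EINVAL on bad input. */
static int
sketch_probe(struct sketch_hotplug_dev *dev)
{
	if (dev == NULL)
		return -EINVAL;
	if (dev->probed)
		return -EEXIST;
	dev->probed = true;
	return 0;
}

/* Hotplug path: an already probed device is not a failure. */
static int
sketch_hotplug(struct sketch_hotplug_dev *dev)
{
	int ret = sketch_probe(dev);

	if (ret == -EEXIST)
		return 0;
	return ret;
}
```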
Fixes: e9d159c3d5 ("eal: allow probing a device again")
Signed-off-by: Darek Stojaczyk <dariusz.stojaczyk@intel.com>
Acked-by: Thomas Monjalon <thomas@monjalon.net>
This function is documented to return the number of unregistered
callbacks or negative numbers on error, but pci_vfio checks for
ret != 0 to detect failures. Not anymore.
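The corrected check can be sketched as follows. Both function names are
hypothetical stand-ins; only the return convention (count of removed
callbacks, or a negative value on error) comes from the description
above.

```c
#include <errno.h>

/*
 * Hypothetical stand-in for an unregister routine following the
 * documented convention: number of removed callbacks on success,
 * negative errno value on error.
 */
static int
sketch_callback_unregister(int registered)
{
	if (registered < 0)
		return -EINVAL;
	return registered;
}

/* Only a negative value is a failure; a positive count is success. */
static int
sketch_disable_notifier(int registered)
{
	int ret = sketch_callback_unregister(registered);

	if (ret < 0)
		return -1;
	return 0;
}
```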
Fixes: c115fd000c ("vfio: handle hotplug request notifier")
Signed-off-by: Darek Stojaczyk <dariusz.stojaczyk@intel.com>
Acked-by: Anatoly Burakov <anatoly.burakov@intel.com>
Invoking the right PCI read/write functions is based on the interrupt
handler type. However, this is not configured for secondary processes,
which prevents them from using those functions.
This patch fixes the issue by using the name of the driver the device
is bound to instead.
Fixes: 632b2d1dee ("eal: provide functions to access PCI config")
Cc: stable@dpdk.org
Signed-off-by: Alejandro Lucero <alejandro.lucero@netronome.com>
Acked-by: Anatoly Burakov <anatoly.burakov@intel.com>
On Linux, rte_pci_read_config on success returns the number of read
bytes, but on BSD it returns 0.
Document the return values, and have BSD behave as Linux does.
At least one case (bnx2x PMD) treats 0 as an error, so the change
makes sense also for that.
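A sketch of a caller written against the documented convention. The
in-memory config space and both function names are hypothetical; the
real rte_pci_read_config is not called here.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical in-memory stand-in for a PCI configuration space. */
struct sketch_cfg {
	uint8_t data[256];
};

/* Return the number of bytes read, or -1 if the range is out of bounds. */
static int
sketch_read_config(const struct sketch_cfg *cfg, void *buf,
		size_t len, size_t offset)
{
	if (offset > sizeof(cfg->data) || len > sizeof(cfg->data) - offset)
		return -1;
	memcpy(buf, cfg->data + offset, len);
	return (int)len;
}

/* Caller: success means the full requested length was read. */
static int
sketch_read_vendor_id(const struct sketch_cfg *cfg, uint16_t *vendor)
{
	int ret = sketch_read_config(cfg, vendor, sizeof(*vendor), 0);

	return ret == (int)sizeof(*vendor) ? 0 : -1;
}
```

A caller that compares the result with the requested length behaves the
same on both platforms once BSD returns the byte count.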
Signed-off-by: Luca Boccassi <bluca@debian.org>
Acked-by: Bruce Richardson <bruce.richardson@intel.com>
Currently the code precludes IOVA mode if the IOMMU hardware reports
fewer addressing bits than necessary for the full virtual memory range.
Although VT-d emulation currently only supports 39 bits, the IOVAs
of allocated memory may still fall within that supported range.
This patch allows IOVA mode in such a case by adding a call to
rte_eal_check_dma_mask with the addressing bits reported by the
IOMMU hardware.
Indeed, memory initialization code has been modified to use lower
virtual addresses than those used by the kernel for 64-bit processes
by default, and therefore memseg IOVAs can use 39 bits or less on
most systems. This is very likely true for VMs.
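The kind of test implied by a DMA mask can be sketched as below. The
function name sketch_iova_fits_mask is hypothetical and this is not the
implementation of rte_eal_check_dma_mask; it only shows how a range is
checked against a number of addressing bits.

```c
#include <stdbool.h>
#include <stdint.h>

/*
 * Sketch: return true if the range [iova, iova + len) only uses the
 * low `maskbits` address bits. Empty and wrapping ranges are rejected.
 */
static bool
sketch_iova_fits_mask(uint64_t iova, uint64_t len, uint8_t maskbits)
{
	uint64_t last, high;

	if (len == 0)
		return false;
	last = iova + (len - 1);
	if (last < iova)
		return false; /* range wraps around the address space */
	if (maskbits >= 64)
		return true;
	/* Bits at position maskbits and above must all be clear. */
	high = ~((UINT64_C(1) << maskbits) - 1);
	return (last & high) == 0;
}
```

With a 39-bit mask, a page ending exactly at 2^39 fits, while one
starting at 2^39 does not.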
Signed-off-by: Alejandro Lucero <alejandro.lucero@netronome.com>
Acked-by: Anatoly Burakov <anatoly.burakov@intel.com>
The current code checks whether the IOMMU hardware reports enough
addressing bits for using IOVA mode, but it repeats the same check for
every PCI device present. This is not necessary because the IOMMU
hardware is the same for all of them.
This patch only checks the IOMMU using the first PCI device found.
Signed-off-by: Alejandro Lucero <alejandro.lucero@netronome.com>
Acked-by: Anatoly Burakov <anatoly.burakov@intel.com>
It is not necessary to insert the device argument into devargs_list
during bus scan, but this happens when we try to attach a
device in a secondary process. This patch fixes the issue.
Fixes: cdb068f031 ("bus/vdev: scan by multi-process channel")
Cc: stable@dpdk.org
Signed-off-by: Qi Zhang <qi.z.zhang@intel.com>
A virtual device can be matched with following syntax:
bus=vdev,name=X
Signed-off-by: Thomas Monjalon <thomas@monjalon.net>
Reviewed-by: Andrew Rybchenko <arybchenko@solarflare.com>
This patch fixes an issue caught with ASAN where a vdev_scan()
to a secondary bus was failing to free some memory.
The doxygen comment in EAL is fixed at the same time.
Fixes: cdb068f031 ("bus/vdev: scan by multi-process channel")
Fixes: 783b6e5497 ("eal: add synchronous multi-process communication")
Cc: stable@dpdk.org
Signed-off-by: Paul Luse <paul.e.luse@intel.com>
Acked-by: Anatoly Burakov <anatoly.burakov@intel.com>
Acked-by: Ferruh Yigit <ferruh.yigit@intel.com>
The presence of the PA-VA table is transparent to the drivers, so
ignore the return values from the table update call.
Signed-off-by: Shreyansh Jain <shreyansh.jain@nxp.com>
Acked-by: Jerin Jacob <jerin.jacob@caviumnetworks.com>
The device bus should be initialized after bus scan.
This did not happen when scanning vdev from a secondary process,
which caused a segmentation fault in rte_dev_probe when calling
dev->bus->xxx.
Fixes: cdb068f031 ("bus/vdev: scan by multi-process channel")
Cc: stable@dpdk.org
Signed-off-by: Qi Zhang <qi.z.zhang@intel.com>
Acked-by: Thomas Monjalon <thomas@monjalon.net>
Musl already has PAGE_SIZE defined, and our define clashed with it.
Rename our define to SYS_PAGE_SIZE.
Bugzilla ID: 36
Signed-off-by: Anatoly Burakov <anatoly.burakov@intel.com>
Acked-by: Bruce Richardson <bruce.richardson@intel.com>
We use _GNU_SOURCE all over the place, but often times we miss
defining it, resulting in broken builds on musl. Rather than
fixing every library's and driver's and application's makefile,
fix it by simply defining _GNU_SOURCE by default for all
builds.
Remove all usages of _GNU_SOURCE in source files and makefiles,
and also fixup a couple of instances of using __USE_GNU instead
of _GNU_SOURCE.
Signed-off-by: Anatoly Burakov <anatoly.burakov@intel.com>
After calling unplug function of a bus, the device is expected
to be freed. It is too late for getting devargs to remove.
Anyway, the buses which implement unplug are already freeing
the devargs, except the PCI bus.
So the call to rte_devargs_remove() is removed from EAL and
added in PCI.
Fixes: 2effa126fb ("devargs: simplify parameters of removal function")
Signed-off-by: Thomas Monjalon <thomas@monjalon.net>
In the devargs syntax for device representors, it is possible to add
several devices at once: -w dbdf,representor=[0-3]
It will become a more frequent case when introducing wildcards
and ranges in the new devargs syntax.
If a devargs string is provided for probing, and updated with a bigger
range for a new probing, then we do not want it to fail because
part of this range was already probed previously.
There can be new ports to create from an existing rte_device.
That's why the check for an already probed device
is moved as bus responsibility.
In the case of vdev, a global check is kept in insert_vdev(),
assuming that a vdev will always have only one port.
In the case of ifpga and vmbus, already probed devices are checked.
In the case of NXP buses, the probing is done only once (no hotplug),
though a check is added at bus level for consistency.
In the case of PCI, a driver flag is added to allow PMD probing again.
Only the PMD knows the ports attached to one rte_device.
As another consequence of being able to probe in several steps,
the field rte_device.devargs must not be considered as a full
representation of the rte_device, but only the latest probing args.
Anyway, the field rte_device.devargs is used only for probing.
Signed-off-by: Thomas Monjalon <thomas@monjalon.net>
Reviewed-by: Andrew Rybchenko <arybchenko@solarflare.com>
Tested-by: Andrew Rybchenko <arybchenko@solarflare.com>
Acked-by: Shreyansh Jain <shreyansh.jain@nxp.com>
The function rte_dev_is_probed() is added in order to improve semantics
and enforce a proper check of the probing status of a device.
It will answer this rte_device query:
Is it already successfully probed or not?
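A minimal sketch of such a query, assuming the driver pointer is set
only after a successful probe. The structures and the function name
below are hypothetical local stand-ins, not the EAL definitions.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, reduced stand-ins for the driver and device structures. */
struct sketch_drv {
	const char *name;
};

struct sketch_dev {
	const struct sketch_drv *driver; /* assumed NULL until probing succeeds */
};

/* A device counts as probed once a driver is attached to it. */
static bool
sketch_dev_is_probed(const struct sketch_dev *dev)
{
	return dev->driver != NULL;
}
```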
Signed-off-by: Thomas Monjalon <thomas@monjalon.net>
Reviewed-by: Andrew Rybchenko <arybchenko@solarflare.com>
Tested-by: Andrew Rybchenko <arybchenko@solarflare.com>
The PCI mapping requires to know the PCI driver to use,
even before the probing is done. That's why the PCI driver is
referenced early inside the PCI device structure. See
commit 1d20a073fa ("bus/pci: reference driver structure before mapping")
However the rte_driver does not need to be referenced in rte_device
before the device probing is done.
By moving back this assignment at the end of the device probing,
it becomes possible to make clear the status of a rte_device.
Signed-off-by: Thomas Monjalon <thomas@monjalon.net>
Reviewed-by: Andrew Rybchenko <arybchenko@solarflare.com>
Tested-by: Andrew Rybchenko <arybchenko@solarflare.com>
Reviewed-by: Rosen Xu <rosen.xu@intel.com>
The change set referenced below introduced HAVE_VFIO_DEV_REQ_INTERFACE
and used it in the following files:
drivers/bus/pci/linux/pci_vfio.c
drivers/bus/pci/pci_common.c
lib/librte_eal/linuxapp/eal/eal_interrupts.c
However, except for the first file, the change did not include
<rte_vfio.h>, where HAVE_VFIO_DEV_REQ_INTERFACE is defined.
This causes the following runtime error when vfio-pci mode is
combined with kernel >= 4.0.0:
EAL: [rte_intr_enable] Unknown handle type of fd 95
EAL: [pci_vfio_enable_notifier]Fail to enable req notifier.
EAL: Fail to unregister req notifier handler.
EAL: Error setting up notifier!
EAL: Requested device 0000:07:00.1 cannot be used
Fixes: cda9441996 ("vfio: fix build with Linux < 4.0")
Signed-off-by: Jerin Jacob <jerin.jacob@caviumnetworks.com>
Signed-off-by: Pavan Nikhilesh <pbhagavatula@caviumnetworks.com>
Various fields of the FD structure were being reset in a scattered
fashion. This patch consolidates them in a single macro.
Signed-off-by: Hemant Agrawal <hemant.agrawal@nxp.com>
This new mode is available on the LX2160 platform. The code
dynamically detects the underlying qbman version and chooses
the mode at runtime.
Signed-off-by: Youri Querry <youri.querry_1@nxp.com>
Signed-off-by: Roy Pledge <roy.pledge@nxp.com>
Signed-off-by: Nipun Gupta <nipun.gupta@nxp.com>
This patch adds support for the new Management Complex
firmware version 10.1x.x. One of the main changes in
the APIs concerns the ordered queue.
The fslmc bus lib ABI will need to be bumped to reflect
the MC FW API and structure changes.
This will also result in bumping the ABI version of all dependent
libs, as they internally use the MC FW APIs and structures.
Signed-off-by: Hemant Agrawal <hemant.agrawal@nxp.com>
With this patch, fslmc bus and ethernet devices on this bus
would start using the physical-virtual library interfaces.
This patch impacts mempool/dpaa2, event/dpaa2, net/dpaa2,
raw/dpaa2_cmdif and raw/dpaa2_qdma as they are dependent
on the bus/fslmc and thus impact linkage of libraries.
Signed-off-by: Shreyansh Jain <shreyansh.jain@nxp.com>
With this patch, dpaa bus and ethernet devices on this bus
would start using the physical-virtual library interfaces.
This patch impacts mempool/dpaa, event/dpaa and net/dpaa as
they are dependent on the bus/dpaa and thus impact linkage of
libraries.
Signed-off-by: Shreyansh Jain <shreyansh.jain@nxp.com>
In case RTE_LIBRTE_DPAA2_USE_PHYS_IOVA is enabled, only supported
class is RTE_IOVA_PA.
Fixes: f7768afac1 ("bus/fslmc: support dynamic IOVA")
Cc: stable@dpdk.org
Signed-off-by: Shreyansh Jain <shreyansh.jain@nxp.com>
Older kernel versions do not implement the device request
interface for vfio. Building against a kernel older than v4.0.0,
the version that introduced the device request interface, fails
with an error that “VFIO_PCI_REQ_IRQ_INDEX” is undeclared.
This patch fixes the build by defining the macro
“HAVE_VFIO_DEV_REQ_INTERFACE” after checking the kernel version.
Fixes: 0eb8a1c4c7 ("vfio: add request notifier interrupt")
Fixes: c115fd000c ("vfio: handle hotplug request notifier")
Signed-off-by: Jeff Guo <jia.guo@intel.com>
Reviewed-by: Ferruh Yigit <ferruh.yigit@intel.com>
When a device is hot-unplugged, the vfio kernel module first sends a req
notifier asking user space to release the allocated resources.
After that, the vfio kernel module detects that the device has
disappeared and deletes it in the kernel.
This patch adds req notifier processing to enable hotplug for vfio.
With req notifier monitoring enabled and the notifier callback registered,
the hot-unplug handler is called when a device is hot-unplugged to
process hotplug for vfio.
Signed-off-by: Jeff Guo <jia.guo@intel.com>
Acked-by: Anatoly Burakov <anatoly.burakov@intel.com>
A vfio pci device has some extended interrupt types in addition to the
existing interrupts, such as the err and req notifiers, which can be
useful for device error monitoring. The corresponding interrupt handlers
differ from the interrupt handlers registered in PMDs, so a new
interrupt handler is needed. This patch adds a specific req handler
to the generic pci device.
Signed-off-by: Jeff Guo <jia.guo@intel.com>
Acked-by: Anatoly Burakov <anatoly.burakov@intel.com>
This patch implements the ops for the PCI bus sigbus handler. It finds the
PCI device that is being hot-unplugged and calls the relevant ops of the
hot-unplug handler to handle the hot-unplug failure of the device.
Signed-off-by: Jeff Guo <jia.guo@intel.com>
Acked-by: Shaopeng He <shaopeng.he@intel.com>
Acked-by: Konstantin Ananyev <konstantin.ananyev@intel.com>
This patch implements the ops to handle hot-unplug on the PCI bus.
For UIO PCI, it can avoid BAR read/write errors by creating a
new dummy memory region to remap the memory where the failure is.
For VFIO or other kernel drivers, a specific function can be
implemented to handle hot-unplug case by case.
Signed-off-by: Jeff Guo <jia.guo@intel.com>
Acked-by: Shaopeng He <shaopeng.he@intel.com>
Acked-by: Konstantin Ananyev <konstantin.ananyev@intel.com>
When a device is added with a devargs (hotplug or whitelist),
the bus pointer can be retrieved via its devargs.
But there is no such devargs.bus in case of standard scan.
A pointer to the rte_bus handle is added to rte_device.
When a device is allocated (during a scan),
the pointer to its bus is assigned.
It will make possible to remove a rte_device,
using the function pointer from its bus.
The function rte_bus_find_by_device() becomes useless,
and may be removed later.
Signed-off-by: Thomas Monjalon <thomas@monjalon.net>
Acked-by: Gaetan Rivet <gaetan.rivet@6wind.com>
Reviewed-by: Stephen Hemminger <stephen@networkplumber.org>
The function rte_devargs_remove(), which is intended to be internal,
can take a devargs structure as argument.
The matching is still using string comparison of bus name and
device name.
It is simpler and may allow a different devargs matching in future.
Signed-off-by: Thomas Monjalon <thomas@monjalon.net>
Reviewed-by: Andrew Rybchenko <arybchenko@solarflare.com>
Acked-by: Gaetan Rivet <gaetan.rivet@6wind.com>
Reviewed-by: Stephen Hemminger <stephen@networkplumber.org>
The enum names are *_params (plural form).
And the items are also using the plural form: *_PARAMS_*.
It looks more natural to use the singular form *_PARAM_* for items.
Signed-off-by: Thomas Monjalon <thomas@monjalon.net>
Reviewed-by: Andrew Rybchenko <arybchenko@solarflare.com>
We could match devices by their PCI id (vendor id, device id, etc).
But for now, only matching by PCI address is implemented.
The devargs parameter "id" is renamed "addr" to reflect its real meaning.
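For illustration, a PCI address in the "DDDD:BB:DD.F" or "BB:DD.F" text
form can be parsed as sketched below. The structure and function names
are hypothetical local stand-ins and the range limits are assumptions of
this sketch; it is not the bus parser itself.

```c
#include <stdint.h>
#include <stdio.h>

/* Hypothetical local mirror of a PCI address. */
struct sketch_pci_addr {
	uint32_t domain;
	uint8_t bus;
	uint8_t devid;
	uint8_t function;
};

/* Parse "DDDD:BB:DD.F" or "BB:DD.F" (domain defaults to 0). 0 or -1. */
static int
sketch_pci_addr_parse(const char *s, struct sketch_pci_addr *addr)
{
	unsigned int dom = 0, bus = 0, dev = 0, fn = 0;
	int end = 0;

	/* %n records how much was consumed, to reject trailing garbage. */
	if (sscanf(s, "%x:%x:%x.%x%n", &dom, &bus, &dev, &fn, &end) != 4 ||
			s[end] != '\0') {
		dom = 0;
		end = 0;
		if (sscanf(s, "%x:%x.%x%n", &bus, &dev, &fn, &end) != 3 ||
				s[end] != '\0')
			return -1;
	}
	/* Assumed limits: 16-bit domain, 8-bit bus, 5-bit device, 3-bit function. */
	if (dom > 0xffff || bus > 0xff || dev > 0x1f || fn > 0x7)
		return -1;

	addr->domain = dom;
	addr->bus = (uint8_t)bus;
	addr->devid = (uint8_t)dev;
	addr->function = (uint8_t)fn;
	return 0;
}
```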
Signed-off-by: Thomas Monjalon <thomas@monjalon.net>
Reviewed-by: Andrew Rybchenko <arybchenko@solarflare.com>
When adding or removing external memory from the memory map, there
may be actions that need to be taken on account of this memory (e.g.
DMA mapping). Add support for triggering callbacks when adding,
removing, attaching or detaching external memory.
Some memory event callback handlers will need additional logic to
handle external memory regions. For example, virtio callback has to
completely ignore externally allocated memory, because there is no
way to find file descriptors backing the memory address in a
generic fashion. All other callbacks have also been adjusted to
handle RTE_BAD_IOVA as IOVA address, as this is one of the expected
use cases for external memory support.
Signed-off-by: Anatoly Burakov <anatoly.burakov@intel.com>
When we allocate and use DPDK memory, we need to be able to
differentiate between DPDK hugepage segments and segments that
were made part of DPDK but are externally allocated. Add such
a property to memseg lists.
This breaks the ABI, so document the change in release notes.
This also breaks a few internal assumptions about memory
contiguousness, so adjust malloc code in a few places.
All current calls for memseg walk functions were adjusted to
ignore external segments where it made sense.
Mempools is a special case, because we may be asked to allocate
a mempool on a specific socket, and we need to ignore all page
sizes on other heaps or other sockets. Previously, this
assumption of knowing all page sizes was not a problem, but it
will be now, so we have to match socket ID with page size when
calculating minimum page size for a mempool.
Signed-off-by: Anatoly Burakov <anatoly.burakov@intel.com>
Acked-by: Andrew Rybchenko <arybchenko@solarflare.com>
Acked-by: Yongseok Koh <yskoh@mellanox.com>
Previously, to calculate length of memory area covered by a memseg
list, we would've needed to multiply page size by length of fbarray
backing that memseg list. This is not obvious and unnecessarily
low level, so store length in the memseg list itself.
This breaks ABI, so bump the EAL ABI version and document the
change. Also, while we're breaking ABI, pack the members a little
better.
Signed-off-by: Anatoly Burakov <anatoly.burakov@intel.com>
Acked-by: Shreyansh Jain <shreyansh.jain@nxp.com>
Currently, DPDK will skip mapping some areas (or even an entire BAR)
if MSI-X table happens to be in them but is smaller than page size.
Kernels 4.16+ will allow mapping MSI-X BARs [1], and will report this
as a capability flag. Capability flags themselves are also only
supported since kernel 4.6 [2].
This commit will introduce support for checking VFIO capabilities,
and will use it to check if we are allowed to map BARs with MSI-X
tables in them, along with backwards compatibility for older
kernels, including a workaround for a variable rename in VFIO
region info structure [3].
[1] https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/
linux.git/commit/?id=a32295c612c57990d17fb0f41e7134394b2f35f6
[2] https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/
linux.git/commit/?id=c84982adb23bcf3b99b79ca33527cd2625fbe279
[3] https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/
linux.git/commit/?id=ff63eb638d63b95e489f976428f1df01391e15e4
Signed-off-by: Anatoly Burakov <anatoly.burakov@intel.com>
The PCI bus can now parse a matching field "id" as follows:
"bus=pci,id=0000:00:00.0"
or
"bus=pci,id=00:00.0"
Signed-off-by: Gaetan Rivet <gaetan.rivet@6wind.com>
A few fields in compat cause redefinition errors
with new drivers such as caam_jr.
Checks have been added.
Signed-off-by: Hemant Agrawal <hemant.agrawal@nxp.com>
This patch adds support in the bus driver for qbman to
configure portal-based FDs, which can be used for
interrupt-based processing.
Signed-off-by: Hemant Agrawal <hemant.agrawal@nxp.com>
The probing functions of NXP buses did not set
the driver used for successfully probing a device.
The NXP driver and the generic rte_driver are now set
in the device structures.
Signed-off-by: Thomas Monjalon <thomas@monjalon.net>
Acked-by: Shreyansh Jain <shreyansh.jain@nxp.com>
The rte_afu_driver is assigned to rte_afu_device.driver during probing.
There is no need of accessing the rte_afu_driver via rte_device.driver
and type casting to its container.
Signed-off-by: Thomas Monjalon <thomas@monjalon.net>
Acked-by: Rosen Xu <rosen.xu@intel.com>
When a secondary process handles the VDEV_SCAN_ONE mp action, the
device may already have been inserted. This happens when multiple
secondary processes cause multiple broadcasts from the primary during
bus->scan. So we don't need to log any error for -EEXIST.
Bugzilla ID: 84
Fixes: cdb068f031 ("bus/vdev: scan by multi-process channel")
Cc: stable@dpdk.org
Reported-by: Gage Eads <gage.eads@intel.com>
Signed-off-by: Qi Zhang <qi.z.zhang@intel.com>
Acked-by: Gage Eads <gage.eads@intel.com>
This patch removes the forward declaration of the
rte_pci_remove_device() method. In the past, this forward declaration
was needed for rte_pci_detach(), which is now removed from pci_common.c.
Fixes: e690338a7b ("bus/pci: remove unused function to detach by address")
Signed-off-by: Rami Rosen <rami.rosen@intel.com>
This function is not used by netvsc driver yet.
Still the code should handle case where device driver returns
zero (due to rescind).
Coverity issue: 302871
Fixes: 831dba47bd ("bus/vmbus: add Hyper-V virtual bus support")
Signed-off-by: Stephen Hemminger <sthemmin@microsoft.com>
Use strlcpy rather than strncpy to avoid any issues about
null termination.
Coverity issue: 302859
Fixes: 831dba47bd ("bus/vmbus: add Hyper-V virtual bus support")
Signed-off-by: Stephen Hemminger <sthemmin@microsoft.com>
Fix a bug reported by Coverity where the directory being scanned was
not closed in the error path (leaking a file descriptor).
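The pattern can be sketched as below: the error path breaks out of the
loop and falls through to closedir() instead of returning early. All
names are hypothetical, and the two parsers exist only to exercise both
paths; this is not the vmbus scan code.

```c
#include <dirent.h>
#include <stddef.h>

/* Hypothetical per-entry parser: 0 on success, negative on error. */
typedef int (*sketch_parse_t)(const char *name);

static int
sketch_accept_all(const char *name)
{
	(void)name;
	return 0;
}

static int
sketch_reject_all(const char *name)
{
	(void)name;
	return -1;
}

/* Scan a directory; the handle is closed on success and on error. */
static int
sketch_scan_dir(const char *path, sketch_parse_t parse)
{
	struct dirent *e;
	int ret = 0;
	DIR *dir;

	dir = opendir(path);
	if (dir == NULL)
		return -1;

	while ((e = readdir(dir)) != NULL) {
		if (e->d_name[0] == '.')
			continue;
		if (parse(e->d_name) < 0) {
			ret = -1;
			break; /* do not return here: dir must be closed */
		}
	}

	closedir(dir);
	return ret;
}
```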
Coverity issue: 302848
Fixes: 831dba47bd ("bus/vmbus: add Hyper-V virtual bus support")
Signed-off-by: Stephen Hemminger <sthemmin@microsoft.com>
Don't signal host that receive ring has been read until all events
have been processed. This reduces the number of guest exits and
therefore improves performance.
Signed-off-by: Stephen Hemminger <sthemmin@microsoft.com>
This reverts commit d4774a568b.
The patch is incomplete because kernel 4.16+, while being capable
of mapping MSI-X BARs, will also report if such a capability is
available. Without checking this capability, gratuitous errors
are displayed on kernels <4.16 while VFIO is attempting to mmap
MSI-X BAR and fails, which can be confusing to the user.
Fixes: d4774a568b ("vfio: fix workaround of BAR mapping")
Signed-off-by: Anatoly Burakov <anatoly.burakov@intel.com>
Acked-by: Jerin Jacob <jerin.jacob@caviumnetworks.com>
This patch fixes a trivial typo in pci_common.c.
Fixes: 23eaa9059e ("bus/pci: use given name as generic name")
Cc: stable@dpdk.org
Signed-off-by: Rami Rosen <rami.rosen@intel.com>
Acked-by: John McNamara <john.mcnamara@intel.com>
Currently, VFIO will try to map around MSI-X table in the BARs. When
MSI-X table (page-aligned) size is equal to (page-aligned) size of BAR,
VFIO will just skip the BAR.
Recent kernel versions will allow VFIO to map the entire BAR containing
MSI-X tables (*), so instead of trying to map around the MSI-X vector
or skipping the BAR entirely if it's not possible, we can now try
mapping the entire BAR first. If mapping the entire BAR doesn't
succeed, fall back to the old behavior of mapping around MSI-X table or
skipping the BAR.
(*): "vfio-pci: Allow mapping MSIX BAR",
https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/
commit/?id=a32295c612c57990d17fb0f41e7134394b2f35f6
Fixes: 90a1633b23 ("eal/linux: allow to map BARs with MSI-X tables")
Signed-off-by: Takeshi Yoshimura <t.yoshimura8869@gmail.com>
Reviewed-by: Anatoly Burakov <anatoly.burakov@intel.com>
The subroutine to unmap a VFIO resource is shared by secondary and
primary processes, and it does not work in a secondary process.
In a secondary process it is not necessary to close the interrupt
handler, set PCI bus mastering or remove vfio_res from
vfio_res_list. So this patch adds a dedicated function to handle
the case where a device is unmapped in a secondary process.
Signed-off-by: Qi Zhang <qi.z.zhang@intel.com>
Reviewed-by: Anatoly Burakov <anatoly.burakov@intel.com>
When memcmp is used to compare two PCI addresses, sizeof(struct
rte_pci_addr) is 8 because the structure is 4-byte aligned, while only
7 bytes of struct rte_pci_addr are valid. Comparing the 8th byte
therefore gives unexpected results, which happens when repeatedly
attaching and detaching a device.
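A sketch of a comparison that ignores the padding byte. The structure is
a hypothetical local mirror of the layout described above (7 meaningful
bytes padded to 8), and sketch_pci_addr_equal is an illustrative name.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical local mirror: 4 + 1 + 1 + 1 bytes, padded to 8. */
struct sketch_pci_addr {
	uint32_t domain;
	uint8_t bus;
	uint8_t devid;
	uint8_t function;
};

/* Compare field by field so the padding byte never takes part. */
static bool
sketch_pci_addr_equal(const struct sketch_pci_addr *a,
		const struct sketch_pci_addr *b)
{
	return a->domain == b->domain &&
		a->bus == b->bus &&
		a->devid == b->devid &&
		a->function == b->function;
}
```

Two addresses with identical fields compare equal here even if their
padding bytes were filled differently, which memcmp over sizeof() does
not guarantee.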
Fixes: 94c0776b1b ("vfio: support hotplug")
Cc: stable@dpdk.org
Signed-off-by: Qi Zhang <qi.z.zhang@intel.com>
Acked-by: Gaetan Rivet <gaetan.rivet@6wind.com>
The dependency on libuuid is useless because the required code
is embedded in EAL, see commit 6bc67c497a ("eal: add uuid API").
Fixes: 831dba47bd ("bus/vmbus: add Hyper-V virtual bus support")
Signed-off-by: Thomas Monjalon <thomas@monjalon.net>
The driver supports Hyper-V networking directly like
virtio for KVM or vmxnet3 for VMware.
This code is based off of the FreeBSD driver. The file and variable
names are kept the same to help with understanding (with most of the
BSD style warts removed).
This version supports the latest NetVSP 6.1 version and
older versions.
Signed-off-by: Haiyang Zhang <haiyangz@microsoft.com>
Signed-off-by: Stephen Hemminger <sthemmin@microsoft.com>
This patch adds support for an additional bus type Virtual Machine BUS
(VMBUS) on Microsoft Hyper-V in Windows 10, Windows Server 2016
and Azure. Most of this code was extracted from FreeBSD and some of
this is from earlier code donated by Brocade.
Only Linux is supported at present, but the code is split
to allow future FreeBSD and Windows support.
The bus support relies on the uio_hv_generic driver from Linux
kernel 4.16. Multiple queue support requires additional sysfs
interfaces which are in kernel 5.0 (a.k.a. 4.17).
Signed-off-by: Stephen Hemminger <sthemmin@microsoft.com>
This patch adds support for a configurable vdqcr exact flag.
This boosts performance, but it can have the side effect
of fetching some extra packets, which is also taken care of
in this patch.
Signed-off-by: Nipun Gupta <nipun.gupta@nxp.com>
Acked-by: Shreyansh Jain <shreyansh.jain@nxp.com>
Avoid an array of FQs, as packets are dequeued from a single queue only.
Signed-off-by: Sunil Kumar Kori <sunil.kori@nxp.com>
Signed-off-by: Hemant Agrawal <hemant.agrawal@nxp.com>
Acked-by: Shreyansh Jain <shreyansh.jain@nxp.com>
The buffer offset was incorrectly being set at 64,
thus not honoring the packet headroom.
Fixes: 6d6b4f49a1 ("bus/dpaa: add FMAN hardware operations")
Cc: stable@dpdk.org
Signed-off-by: Hemant Agrawal <hemant.agrawal@nxp.com>
Acked-by: Shreyansh Jain <shreyansh.jain@nxp.com>
Otherwise the SVR may not be available for dpaa init.
Fixes: 3b59b73dea ("bus/dpaa: update platform SoC value register routines")
Cc: stable@dpdk.org
Signed-off-by: Hemant Agrawal <hemant.agrawal@nxp.com>
Acked-by: Shreyansh Jain <shreyansh.jain@nxp.com>
A constructor is usually declared with RTE_INIT* macros.
As it is a static function, no need to declare before its definition.
The macro is used directly in the function definition.
Signed-off-by: Thomas Monjalon <thomas@monjalon.net>
Write combining (WC) increases NIC performance by making better
use of the PCI bus, but cannot be used by all PMDs.
It is enabled only if RTE_PCI_DRV_WC_ACTIVATE is set in the
driver flags. To work properly, the igb_uio driver must also be loaded
with wc_activate set to 1.
When mapping PCI resources, first check whether WC is supported
and then try to use it.
In case of failure, fall back to normal mode.
Signed-off-by: Rafal Kozik <rk@semihalf.com>
Acked-by: Bruce Richardson <bruce.richardson@intel.com>
Add a pointer to the driver structure before calling rte_pci_map_device.
This allows driver flags to be used for adjusting the configuration.
Signed-off-by: Rafal Kozik <rk@semihalf.com>
Acked-by: Bruce Richardson <bruce.richardson@intel.com>
Reviewed-by: Ferruh Yigit <ferruh.yigit@intel.com>
The function rte_pci_detach() is private to PCI and is
not used anywhere in current code base. Remove dead code.
Signed-off-by: Rami Rosen <rami.rosen@intel.com>
The function rte_pci_probe_one is private to PCI and is
not used anywhere in current code base. Remove dead code.
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
Prototype for pci_unbind_kernel_driver exists but no code.
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
Acked-by: Gaetan Rivet <gaetan.rivet@6wind.com>
Only used in one file, and therefore can be made static.
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
Acked-by: Gaetan Rivet <gaetan.rivet@6wind.com>
The DPAA bus driver is defining some macros without prefix.
So it can conflict with other libraries like libbsd:
drivers/bus/dpaa/include/compat.h:53:
error: "__packed" redefined
/usr/include/bsd/sys/cdefs.h:120:
note: this is the location of the previous definition
Fixes: 39f373cf01 ("bus/dpaa: add compatibility and helper macros")
Cc: stable@dpdk.org
Signed-off-by: Thomas Monjalon <thomas@monjalon.net>
Acked-by: Shreyansh Jain <shreyansh.jain@nxp.com>
The function dpdmai_set_tx_queue() is not implemented,
so it is removed from the export map file.
Fixes: 23e8fcb018 ("bus/fslmc: support MC DPDMAI object")
Signed-off-by: Thomas Monjalon <thomas@monjalon.net>
There are some resource leaks in ifpga_scan_one.
This patch fixes it.
Coverity issue: 279459
Fixes: 05fa3d4a65 ("bus/ifpga: add Intel FPGA bus library")
Cc: stable@dpdk.org
Signed-off-by: Rosen Xu <rosen.xu@intel.com>
The control variable should be afu_dev not dev.
Coverity issue: 279455
Fixes: 05fa3d4a65 ("bus/ifpga: add Intel FPGA bus library")
Cc: stable@dpdk.org
Signed-off-by: Rosen Xu <rosen.xu@intel.com>
A device like failsafe can manage sub-devices.
When such a device is removed, it removes its sub-devices
and tries to take the same vdev_device_list_lock.
This caused a deadlock because the lock was not recursive.
Fixes: 35f462839b ("bus/vdev: add lock on device list")
Suggested-by: Anatoly Burakov <anatoly.burakov@intel.com>
Signed-off-by: Thomas Monjalon <thomas@monjalon.net>
Tested-by: Matan Azrad <matan@mellanox.com>
Acked-by: Anatoly Burakov <anatoly.burakov@intel.com>
Variable dri_name is a pointer and it is incorrect to use its
size as the buffer size. Caller knows the buffer size and
it is safer to pass it explicitly.
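A minimal sketch of the idea, assuming a hypothetical helper driver_name_from_path() (not the DPDK function) and using snprintf() as the bounded copy:

```c
#include <stdio.h>
#include <string.h>

/*
 * Hypothetical helper: copy the last component of path into dri_name,
 * whose capacity the caller passes as len.
 * Returns 0 on success, -1 if the name is missing or does not fit.
 */
static int driver_name_from_path(const char *path, char *dri_name, size_t len)
{
	const char *name = strrchr(path, '/');
	int n;

	if (name == NULL || name[1] == '\0' || len == 0)
		return -1;
	/* sizeof(dri_name) here would be the pointer size, not the buffer size. */
	n = snprintf(dri_name, len, "%s", name + 1);
	if (n < 0 || (size_t)n >= len)
		return -1; /* the name was truncated */
	return 0;
}
```

The caller owns the array, so sizeof() on the caller's side yields the real capacity.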
Fixes: fe5f777b53 ("bus/pci: replace strncpy by strlcpy")
Cc: stable@dpdk.org
Signed-off-by: Andy Green <andy@warmcat.com>
Reviewed-by: Andrew Rybchenko <arybchenko@solarflare.com>
The actual descriptor for qm_mr_entry is 64-byte aligned.
But the original code plays a trick, and puts a u8 common
to the three descriptor subtypes in the union afterwards
outside their structure definitions.
Unfortunately since they compose a struct qm_fd with
alignment 8, this trick destroys the ability of the compiler
to understand what has happened, resulting in this kind of
problem:
drivers/bus/dpaa/include/fsl_qman.h:354:3: error:
alignment 1 of ‘struct <anonymous>’ is less than 8 [-Werror=packed-not-aligned]
} __packed dcern;
on gcc 8 / Fedora 28 out of the box.
This patch moves the u8 verb into the structure definitions
composed into the union, so the alignment of the parent struct
containing the alignment 8 object can also be seen to be
alignment 8 by the compiler. Uses of .verb are fixed up to use
.ern.verb (the same offset of +0 inside all the structs in
the union).
The final struct layout should be unchanged.
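The resulting shape can be sketched as follows. The types below are simplified, hypothetical stand-ins (fd_like, mr_entry_like) and do not reproduce the real fsl_qman.h layout; they only show a verb placed at offset 0 of every structure in the union.

```c
#include <stddef.h>
#include <stdint.h>

/* Illustrative 8-byte-aligned payload, standing in for struct qm_fd. */
struct fd_like {
	uint64_t addr;
	uint32_t len;
	uint32_t cmd;
} __attribute__((aligned(8)));

/* Simplified ring entry: each subtype starts with its own verb. */
union mr_entry_like {
	struct {
		uint8_t verb;
		uint8_t rc;
		uint8_t reserved[6];
		struct fd_like fd;
	} ern;
	struct {
		uint8_t verb;
		uint8_t colour;
		uint8_t reserved[6];
		struct fd_like fd;
	} dcern;
	struct {
		uint8_t verb;
		uint8_t reserved[23];
	} fqs;
};

/* The verb sits at the same offset (+0) in every subtype. */
_Static_assert(offsetof(union mr_entry_like, ern.verb) == 0, "ern.verb at +0");
_Static_assert(offsetof(union mr_entry_like, dcern.verb) == 0, "dcern.verb at +0");
_Static_assert(offsetof(union mr_entry_like, fqs.verb) == 0, "fqs.verb at +0");

/* Read the common verb through one member, as .ern.verb does in the patch. */
static uint8_t mr_entry_verb(const union mr_entry_like *e)
{
	return e->ern.verb;
}
```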
Fixes: c47ff048b9 ("bus/dpaa: add QMAN driver core routines")
Fixes: f6fadc3e63 ("bus/dpaa: add QMAN interface driver")
Cc: stable@dpdk.org
Signed-off-by: Andy Green <andy@warmcat.com>
Acked-by: Hemant Agrawal <hemant.agrawal@nxp.com>
Tested-by: Hemant Agrawal <hemant.agrawal@nxp.com>
In function ‘pci_get_kernel_driver_by_path’,
inlined from ‘pci_scan_one.isra.1’ at
drivers/bus/pci/linux/pci.c:317:8:
drivers/bus/pci/linux/pci.c:57:3: error:
‘strncpy’ specified bound depends on the length of the source argument
[-Werror=stringop-overflow=]
strncpy(dri_name, name + 1, strlen(name + 1) + 1);
Fixes: d9a8cd9595 ("pci: add kernel driver type")
Cc: stable@dpdk.org
Signed-off-by: Andy Green <andy@warmcat.com>
Acked-by: Pablo de Lara <pablo.de.lara.guarch@intel.com>
Define the FPGA bus for acceleration drivers of AFUs.
1. FPGA PCI scan (1st scan) follows the DPDK UIO/VFIO PCI scan process
and probes the Intel FPGA rawdev driver; it will be covered in following patches.
2. AFU scan (2nd scan) binds a DPDK driver to the FPGA partial bitstream.
This scan is triggered by the hotplug of the IFPGA rawdev probe; in this scan
the AFUs are created and their drivers are probed.
This patch introduces rte_afu_device, which describes an AFU device
listed on the FPGA bus.
Signed-off-by: Rosen Xu <rosen.xu@intel.com>
Signed-off-by: Tianfei Zhang <tianfei.zhang@intel.com>
Reviewed-by: Qi Zhang <qi.z.zhang@intel.com>
fle is already in virtual addressing mode - no need to perform
address conversion for it.
Fixes: 8d1f3a5d75 ("crypto/dpaa2_sec: support crypto operation")
Cc: stable@dpdk.org
Signed-off-by: Hemant Agrawal <hemant.agrawal@nxp.com>
Acked-by: Akhil Goyal <akhil.goyal@nxp.com>
It may be useful to pass arbitrary data to the callback (such
as device pointers), so add this to the mem event callback API.
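A minimal sketch of the pattern, assuming hypothetical names throughout (mem_event_register(), mem_event_notify(), demo_device); it is not the EAL API, only an illustration of handing an opaque pointer back to the callback:

```c
#include <stddef.h>

/* Hypothetical event types and callback signature, for illustration only. */
enum mem_event { MEM_EVENT_ALLOC, MEM_EVENT_FREE };
typedef void (*mem_event_cb_t)(enum mem_event ev, const void *addr,
			       size_t len, void *arg);

struct mem_event_reg {
	mem_event_cb_t cb;
	void *arg; /* opaque user data handed back on every event */
};

static struct mem_event_reg registration;

static void mem_event_register(mem_event_cb_t cb, void *arg)
{
	registration.cb = cb;
	registration.arg = arg;
}

static void mem_event_notify(enum mem_event ev, const void *addr, size_t len)
{
	if (registration.cb != NULL)
		registration.cb(ev, addr, len, registration.arg);
}

/* Example consumer: a device that tracks how many bytes it has mapped. */
struct demo_device {
	size_t mapped_bytes;
};

static void demo_device_on_mem_event(enum mem_event ev, const void *addr,
				     size_t len, void *arg)
{
	struct demo_device *dev = arg; /* recovered from the opaque pointer */

	(void)addr;
	if (ev == MEM_EVENT_ALLOC)
		dev->mapped_bytes += len;
	else
		dev->mapped_bytes -= len;
}
```

Without the arg parameter, the callback would need a global variable to find its device.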
Suggested-by: Maxime Coquelin <maxime.coquelin@redhat.com>
Signed-off-by: Anatoly Burakov <anatoly.burakov@intel.com>
Tested-by: Maxime Coquelin <maxime.coquelin@redhat.com>
Acked-by: Maxime Coquelin <maxime.coquelin@redhat.com>
The DPCI devices have both Tx and Rx queues. Event devices use
DPCI Rx queues only, but CMDIF (AIOP) uses both Tx and Rx queues.
This patch enables Tx queues configuration too.
Signed-off-by: Nipun Gupta <nipun.gupta@nxp.com>
Acked-by: Shreyansh Jain <shreyansh.jain@nxp.com>
'dpdmai' devices detected on fsl-mc bus are represented by DPAA2 QDMA
devices in DPDK.
Signed-off-by: Nipun Gupta <nipun.gupta@nxp.com>
Acked-by: Shreyansh Jain <shreyansh.jain@nxp.com>
With memory hotplug support, the order of memsegs has changed
from physically contiguous to virtually contiguous. The DPAA bus and
drivers depend on PA to VA address conversion for I/O.
This patch creates a list of the blocks requested to be pinned to the
DPAA mempool. A physical address being looked up is expected to
belong to this list (it comes from the hardware pool), so searching
the list is less expensive than a memseg walk. There is still a
marginal drop in performance compared with the legacy mode with
physically contiguous memsegs.
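The lookup can be sketched roughly as below. The structure and function names (pinned_block, pinned_block_add(), paddr_to_vaddr()) are hypothetical and chosen for illustration; the driver's actual list handling may differ.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical record of one memory block pinned to the hardware pool. */
struct pinned_block {
	uint64_t paddr;            /* physical start address */
	void *vaddr;               /* virtual start address */
	size_t len;                /* block length in bytes */
	struct pinned_block *next;
};

static struct pinned_block *pinned_blocks;

/* Push a block onto the front of the list. */
static void pinned_block_add(struct pinned_block *blk)
{
	blk->next = pinned_blocks;
	pinned_blocks = blk;
}

/* Translate a physical address by scanning the short list of pinned blocks. */
static void *paddr_to_vaddr(uint64_t paddr)
{
	const struct pinned_block *blk;

	for (blk = pinned_blocks; blk != NULL; blk = blk->next) {
		if (paddr >= blk->paddr && paddr - blk->paddr < blk->len)
			return (uint8_t *)blk->vaddr + (paddr - blk->paddr);
	}
	return NULL; /* not from the hardware pool; caller may fall back to a memseg walk */
}
```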
Signed-off-by: Shreyansh Jain <shreyansh.jain@nxp.com>
Signed-off-by: Thomas Monjalon <thomas@monjalon.net>
With memory hotplug support, the order of memsegs has changed
from physically contiguous to virtually contiguous. The FSLMC bus and
dpaa2 drivers depend on PA to VA address conversion when in physical
addressing mode.
This patch creates a list of the blocks requested to be pinned to the
DPAA2 mempool. A physical address being looked up is expected to
belong to this list (it comes from the hardware pool), so searching
the list is less expensive than a memseg walk. There is still a
marginal impact on performance compared with the legacy mode with
physically contiguous memsegs.
Signed-off-by: Shreyansh Jain <shreyansh.jain@nxp.com>
Signed-off-by: Thomas Monjalon <thomas@monjalon.net>
If start is set and a device before it matches the data,
this device is returned.
This induces potentially infinite loops.
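A minimal sketch of the intended behaviour on a singly linked list, assuming hypothetical types and names (demo_dev, find_device(), match_any()); the search resumes strictly after start, so a caller that feeds each result back as start always makes progress:

```c
#include <stddef.h>

/* Hypothetical device node and comparison callback (returns 0 on match). */
struct demo_dev {
	int id;
	struct demo_dev *next;
};
typedef int (*dev_cmp_t)(const struct demo_dev *dev, const void *data);

/*
 * Return the first matching device after start, or the first matching
 * device in the list if start is NULL. Devices at or before start are
 * never returned.
 */
static struct demo_dev *
find_device(struct demo_dev *head, const struct demo_dev *start,
	    dev_cmp_t cmp, const void *data)
{
	struct demo_dev *dev = (start != NULL) ? start->next : head;

	for (; dev != NULL; dev = dev->next) {
		if (cmp(dev, data) == 0)
			return dev;
	}
	return NULL;
}

/* Comparison that matches every device; the worst case for the old bug. */
static int match_any(const struct demo_dev *dev, const void *data)
{
	(void)dev;
	(void)data;
	return 0;
}
```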
Fixes: c7fe1eea8a ("bus: simplify finding starting point")
Cc: stable@dpdk.org
Signed-off-by: Gaetan Rivet <gaetan.rivet@6wind.com>
If start is set, and a device before it matches the data
passed for comparison, then this first device is returned.
This induces potentially infinite loops.
Fixes: c7fe1eea8a ("bus: simplify finding starting point")
Cc: stable@dpdk.org
Signed-off-by: Gaetan Rivet <gaetan.rivet@6wind.com>
A typical distribution will compile with default config and all
buses enabled. Therefore every driver should be silent and not
log anything for this normal case.
This patch gets rid of these messages when running on a basic x86
environment such as bare metal or a VM:
fslmc: DPAA2: DPRC not available
fslmc: FSLMC Bus Not Available. Skipping
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
rte_eal_devargs is useless, rte_devargs is sufficient.
Only experimental functions are changed for now.
Signed-off-by: Gaetan Rivet <gaetan.rivet@6wind.com>
Acked-by: Thomas Monjalon <thomas@monjalon.net>
This list should not be used by drivers.
Use the public API instead.
Signed-off-by: Gaetan Rivet <gaetan.rivet@6wind.com>
Acked-by: Thomas Monjalon <thomas@monjalon.net>
This list should not be operated upon by drivers.
Use the public API to achieve the same functionalities.
Signed-off-by: Gaetan Rivet <gaetan.rivet@6wind.com>
Acked-by: Jianfeng Tan <jianfeng.tan@intel.com>
Acked-by: Thomas Monjalon <thomas@monjalon.net>
To scan the vdevs in the primary, we send a request to the primary process
to obtain the names of the vdevs.
Only the names are shared by the primary. In probe(), the device
driver is supposed to locate (or request) the detailed
information from the primary.
Signed-off-by: Jianfeng Tan <jianfeng.tan@intel.com>
Reviewed-by: Qi Zhang <qi.z.zhang@intel.com>
Reviewed-by: Anatoly Burakov <anatoly.burakov@intel.com>
As we could add virtual devices from different threads now, we
add a spin lock to protect the vdev device list.
Suggested-by: Anatoly Burakov <anatoly.burakov@intel.com>
Signed-off-by: Jianfeng Tan <jianfeng.tan@intel.com>
Reviewed-by: Qi Zhang <qi.z.zhang@intel.com>
Acked-by: Anatoly Burakov <anatoly.burakov@intel.com>
The op storage in the fle is only a reference for use after dequeue,
so do not convert it to an IOVA address.
Fixes: 37f96eb01b ("crypto/dpaa2_sec: support scatter gather")
Cc: stable@dpdk.org
Signed-off-by: Hemant Agrawal <hemant.agrawal@nxp.com>
Meson build currently tracks the dependencies between libraries, which
can often make things easier, but has the side-effect of slowing down
the initial meson run if too many duplicated dependencies are provided.
Therefore, we remove dependencies from the dpaa items where other
dependencies already depend on those. This provides a noticeable speed-up
in meson configuration runs when lots of sample apps are included in the
build.
Signed-off-by: Bruce Richardson <bruce.richardson@intel.com>
Tested-by: Harry van Haaren <harry.van.haaren@intel.com>
Acked-by: Hemant Agrawal <hemant.agrawal@nxp.com>
Use the C99 standard "PRIu64" macro in the format specifier instead of llX.
The latter breaks compilation on ppc64le.
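For illustration, a small sketch of the portable form; format_dma_map() is a hypothetical helper, and the message text is made up:

```c
#include <inttypes.h>
#include <stdio.h>

/*
 * Format a 64-bit address and length without hard-coding a length
 * modifier: the PRI macros expand to whatever suits uint64_t on the
 * target platform.
 */
static int format_dma_map(char *buf, size_t len, uint64_t addr, uint64_t size)
{
	return snprintf(buf, len, "addr=0x%" PRIx64 " len=%" PRIu64,
			addr, size);
}
```

PRIx64 is the hexadecimal counterpart of PRIu64, in case the hexadecimal output of the original llX is wanted.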
Fixes: c2c167fdb3 ("bus/fslmc: support memory event callbacks for VFIO")
Signed-off-by: Gowrishankar Muthukrishnan <gowrishankar.m@linux.vnet.ibm.com>
Acked-by: Hemant Agrawal <hemant.agrawal@nxp.com>
This patch moves some of the internal vfio functions from
eal_vfio.h to rte_vfio.h for common uses with "rte_" prefix.
This patch also changes the FSLMC bus to use the external
VFIO functions with the "rte_" prefix instead of the internal ones.
Signed-off-by: Hemant Agrawal <hemant.agrawal@nxp.com>
Acked-by: Anatoly Burakov <anatoly.burakov@intel.com>
When receiving from Ethernet, we issue a new pull request (prefetch)
but do not fetch its results until the next
dequeue operation. This keeps the portal busy.
This patch updates the portal bifurcation so that Ethernet receive
uses separate portals, while all other devices share a
common portal.
Signed-off-by: Nipun Gupta <nipun.gupta@nxp.com>
VFIO needs to map and unmap segments for DMA whenever they
become available or unavailable, so register a callback for
memory events, and provide map/unmap functions.
Remove unneeded check for number of segments, as in non-legacy
mode this now becomes a valid scenario.
Signed-off-by: Shreyansh Jain <shreyansh.jain@nxp.com>
Signed-off-by: Anatoly Burakov <anatoly.burakov@intel.com>
Tested-by: Santosh Shukla <santosh.shukla@caviumnetworks.com>
Tested-by: Hemant Agrawal <hemant.agrawal@nxp.com>
Tested-by: Gowrishankar Muthukrishnan <gowrishankar.m@linux.vnet.ibm.com>
fslmc bus needs to map all allocated memory for VFIO before
device probe. This bus doesn't support hotplug, so at the time
of this call, all devices that could be present are
present. This will also be the place where we install the VFIO
callback, although this change will come in the next patch.
Since rte_fslmc_vfio_dmamap() is now only called at bus probe,
there is no longer any need to check if DMA mappings have been
already done.
Signed-off-by: Shreyansh Jain <shreyansh.jain@nxp.com>
Signed-off-by: Anatoly Burakov <anatoly.burakov@intel.com>
Tested-by: Santosh Shukla <santosh.shukla@caviumnetworks.com>
Tested-by: Hemant Agrawal <hemant.agrawal@nxp.com>
Tested-by: Gowrishankar Muthukrishnan <gowrishankar.m@linux.vnet.ibm.com>
Before, we were aggregating multiple pages into one memseg, so the
number of memsegs was small. Now, each page gets its own memseg,
so the list of memsegs is huge. To accommodate the new memseg list
size and to keep the under-the-hood workings sane, the memseg list
is now not just a single list, but multiple lists. To be precise,
each hugepage size available on the system gets one or more memseg
lists, per socket.
In order to support dynamic memory allocation, we reserve all
memory in advance (unless we're in 32-bit legacy mode, in which
case we do not preallocate memory). As in, we do an anonymous
mmap() of the entire maximum size of memory per hugepage size, per
socket (which is limited to either RTE_MAX_MEMSEG_PER_TYPE pages or
RTE_MAX_MEM_MB_PER_TYPE megabytes worth of memory, whichever is the
smaller one), split over multiple lists (which are limited to
either RTE_MAX_MEMSEG_PER_LIST memsegs or RTE_MAX_MEM_MB_PER_LIST
megabytes per list, whichever is the smaller one). There is also
a global limit of CONFIG_RTE_MAX_MEM_MB megabytes, which is mainly
used for 32-bit targets to limit the amount of preallocated memory,
but can also be used to place an upper limit on the total amount of
VA memory that a DPDK application can allocate.
So, for each hugepage size, we get (by default) up to 128G worth
of memory, per socket, split into chunks of up to 32G in size.
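The "whichever is smaller" rule can be worked through in a short sketch. The constant values below are assumptions chosen to be consistent with the 128G and 32G figures above, and the macro and function names are hypothetical, not the EAL's:

```c
#include <stdint.h>

/* Assumed limits, picked to match the 128G per type / 32G per list figures. */
#define MAX_MEMSEG_PER_LIST 8192ULL
#define MAX_MEM_MB_PER_LIST 32768ULL
#define MAX_MEMSEG_PER_TYPE 32768ULL
#define MAX_MEM_MB_PER_TYPE 131072ULL

static uint64_t min_u64(uint64_t a, uint64_t b)
{
	return a < b ? a : b;
}

/* Largest amount of VA space one memseg list may cover, in bytes. */
static uint64_t max_list_bytes(uint64_t page_sz)
{
	return min_u64(MAX_MEMSEG_PER_LIST * page_sz,
		       MAX_MEM_MB_PER_LIST << 20);
}

/* Largest amount of VA space per hugepage size, per socket, in bytes. */
static uint64_t max_type_bytes(uint64_t page_sz)
{
	return min_u64(MAX_MEMSEG_PER_TYPE * page_sz,
		       MAX_MEM_MB_PER_TYPE << 20);
}
```

Under these assumptions, 1G pages hit the megabyte limits (32G per list, 128G per type), while 2M pages hit the segment-count limits first (16G per list, 64G per type).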
The address space is claimed at the start, in eal_common_memory.c.
The actual page allocation code is in eal_memalloc.c (Linux-only),
and largely consists of copied EAL memory init code.
Pages in the list are also indexed by address. That is, in order
to figure out where the page belongs, one can simply look at base
address for a memseg list. Similarly, figuring out IOVA address
of a memzone is a matter of finding the right memseg list, getting
offset and dividing by page size to get the appropriate memseg.
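The address-to-index step can be sketched as follows, assuming a hypothetical, stripped-down list descriptor (memseg_list_like) rather than the real EAL structure:

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical, minimal view of one memseg list. */
struct memseg_list_like {
	void *base_va;  /* start of the VA area reserved for this list */
	size_t page_sz; /* size of every page (memseg) in the list */
	size_t n_segs;  /* number of memseg slots in the list */
};

/*
 * Return the index of the memseg that contains addr,
 * or -1 if addr lies outside this list.
 */
static long memseg_index(const struct memseg_list_like *msl, const void *addr)
{
	uintptr_t base = (uintptr_t)msl->base_va;
	uintptr_t a = (uintptr_t)addr;

	if (a < base || a - base >= msl->page_sz * msl->n_segs)
		return -1;
	return (long)((a - base) / msl->page_sz);
}
```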
This commit also removes rte_eal_dump_physmem_layout() call,
according to deprecation notice [1], and removes that deprecation
notice as well.
On 32-bit targets, due to limited VA space, DPDK will no longer
spread memory across sockets as before. Instead, it will
(by default) allocate all of the memory on the socket where the master
lcore is. To override this behavior, --socket-mem must be used.
The rest of the changes are really ripple effects from the memseg
change - heap changes, compile fixes, and rewrites to support
fbarray-backed memseg lists. Due to earlier switch to _walk()
functions, most of the changes are simple fixes, however some
of the _walk() calls were switched to memseg list walk, where
it made sense to do so.
Additionally, we are also switching locks from flock() to fcntl().
Down the line, we will be introducing single-file segments option,
and we cannot use flock() locks to lock parts of the file. Therefore,
we will use fcntl() locks for legacy mem as well, in case someone is
unfortunate enough to accidentally start legacy mem primary process
alongside an already working non-legacy mem-based primary process.
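What fcntl() offers over flock() here is byte-range locking. A minimal sketch, with hypothetical helper names (lock_file_range(), unlock_file_range()) that are not the EAL's:

```c
#include <fcntl.h>
#include <unistd.h>

/*
 * Take a non-blocking exclusive lock on len bytes starting at start.
 * Returns 0 on success, -1 on failure (errno set by fcntl()).
 */
static int lock_file_range(int fd, off_t start, off_t len)
{
	struct flock fl = {
		.l_type = F_WRLCK,
		.l_whence = SEEK_SET,
		.l_start = start,
		.l_len = len,
	};

	return fcntl(fd, F_SETLK, &fl);
}

/* Release the lock on the same byte range. */
static int unlock_file_range(int fd, off_t start, off_t len)
{
	struct flock fl = {
		.l_type = F_UNLCK,
		.l_whence = SEEK_SET,
		.l_start = start,
		.l_len = len,
	};

	return fcntl(fd, F_SETLK, &fl);
}
```

flock() always covers the whole file, so it cannot express a lock on one segment inside a shared file.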
[1] http://dpdk.org/dev/patchwork/patch/34002/
Signed-off-by: Anatoly Burakov <anatoly.burakov@intel.com>
Tested-by: Santosh Shukla <santosh.shukla@caviumnetworks.com>
Tested-by: Hemant Agrawal <hemant.agrawal@nxp.com>
Tested-by: Gowrishankar Muthukrishnan <gowrishankar.m@linux.vnet.ibm.com>
We already set IOVA addresses of memsegs and memzones to VA
address during initialization, so we don't need to check
whether we're in RTE_IOVA_VA mode anywhere else.
Cc: stable@dpdk.org
Signed-off-by: Anatoly Burakov <anatoly.burakov@intel.com>
Acked-by: Hemant Agrawal <hemant.agrawal@nxp.com>
Acked-by: Santosh Shukla <santosh.shukla@caviumnetworks.com>
Replace the BSD license header with the SPDX tag for files
with only an RehiveTech copyright on them.
Signed-off-by: Jan Viktorin <viktorin@rehivetech.com>
Acked-by: Hemant Agrawal <hemant.agrawal@nxp.com>
If start is set and a device before it matches the data,
this device is returned.
Fixes: c7fe1eea8a ("bus: simplify finding starting point")
Cc: stable@dpdk.org
Signed-off-by: Gaetan Rivet <gaetan.rivet@6wind.com>
Acked-by: Shreyansh Jain <shreyansh.jain@nxp.com>