If the IOVA mode is not using physical addresses,
no need to log an error about physical address issue.
Signed-off-by: Thomas Monjalon <thomas@monjalon.net>
Acked-by: Santosh Shukla <santosh.shukla@caviumnetworks.com>
The memzone header is often included without good reason.
Signed-off-by: Thomas Monjalon <thomas@monjalon.net>
Acked-by: Bruce Richardson <bruce.richardson@intel.com>
Exposed VFIO functions simply uses a "vfio" prefix.
Use the proper "rte_vfio" prefix for those symbols.
Fixes: 279b581c897d ("vfio: expose functions")
Signed-off-by: Gaetan Rivet <gaetan.rivet@6wind.com>
Revert back to using VFIO_PRESENT as a marker to enable compilation
of VFIO-related segments.
VFIO_PRESENT is the combination of user configuration RTE_EAL_VFIO and
kernel version support check.
eal_vfio.h VFIO_PRESENT related check ordered to be compatible with
rte_vfio.h one, no functional modification.
Fixes: 279b581c897d ("vfio: expose functions")
Signed-off-by: Jerin Jacob <jerin.jacob@caviumnetworks.com>
Signed-off-by: Gaetan Rivet <gaetan.rivet@6wind.com>
Signed-off-by: Ferruh Yigit <ferruh.yigit@intel.com>
Tested-by: Bruce Richardson <bruce.richardson@intel.com>
The PCI lib defines the types and methods allowing to use PCI elements.
The PCI bus implements a bus driver for PCI devices by constructing
rte_bus elements using the PCI lib.
Move the relevant code out of the EAL to its expected place.
Libraries, drivers, unit tests and applications are updated to use the
new rte_bus_pci.h header when necessary.
Signed-off-by: Gaetan Rivet <gaetan.rivet@6wind.com>
These symbols are only relevant to PCI operations.
Move them to a private PCI-related header, allowing to remove the
dependency of the PCI subsystem upon private eal_vfio.h.
Signed-off-by: Gaetan Rivet <gaetan.rivet@6wind.com>
The following symbols are used by vfio implementations within the PCI bus.
They need to be publicly available for the PCI bus to be outside the
EAL.
+ vfio_enable;
+ vfio_is_enabled;
+ vfio_noiommu_is_enabled;
+ vfio_release_device;
+ vfio_setup_device;
Signed-off-by: Gaetan Rivet <gaetan.rivet@6wind.com>
Some internal configuration elements set by the user on the command line
are necessary outside the EAL, when the PCI bus is detached.
Expose:
+ rte_eal_create_uio_dev
+ rte_eal_has_pci
+ rte_eal_vfio_intr_mode
Signed-off-by: Gaetan Rivet <gaetan.rivet@6wind.com>
This function was previously private to the EAL layer.
Other subsystems requires it, such as the PCI bus.
In order not to force other components to include stdbool, which is
incompatible with several NIC drivers, the return type has
been changed from bool to int.
Signed-off-by: Gaetan Rivet <gaetan.rivet@6wind.com>
Remove device reset during application start, the reset for application
exit still there.
Reset in open removed because of following comments:
1- Device reset not completed when VF driver loaded, which cause VF PMD
initialization error.
Adding delay can solve the issue but will increase driver load time.
2- Reset will be issues all devices unconditionally, not very efficient
way.
Fixes: b58eedfc7dd5 ("igb_uio: issue FLR during open and release of device file")
Cc: stable@dpdk.org
Signed-off-by: Ferruh Yigit <ferruh.yigit@intel.com>
Tested-by: Harish Patil <harish.patil@cavium.com>
Tested-by: Shijith Thotton <shijith.thotton@caviumnetworks.com>
Tested-by: Jingjing Wu <jingjing.wu@intel.com>
Since the functions exported by DPDK EAL on all OS's should be
identical, we should not need separate function version files for each
OS. Therefore move existing version files to the top-level EAL
directory.
Signed-off-by: Bruce Richardson <bruce.richardson@intel.com>
The linuxapp and bsdapp interrupt header files are now identical, so
merge them into a common file in common/include.
Signed-off-by: Bruce Richardson <bruce.richardson@intel.com>
Remove rte_set_log_level(), rte_get_log_level(),
rte_set_log_type(), and rte_get_log_type().
Also update librte_eal.so version in docuementation.
The LIBABIVER variable in eal has already been modified in
commit f26ab687a74f ("eal: remove Xen dom0 support").
Signed-off-by: Olivier Matz <olivier.matz@6wind.com>
When getting group fd from primary process, secondary wasn't storing
the fd anywhere, leading to a (harmless) error message in EAL logs,
and (not so harmless) potential problems when hot-unplugging devices
managed by VFIO in a secondary process.
Fix it by actually storing the group fd whenever we get a valid one
from the secondary process.
Fixes: 94c0776b1bad ("vfio: support hotplug")
Cc: stable@dpdk.org
Signed-off-by: Anatoly Burakov <anatoly.burakov@intel.com>
VFIO may be used by buses other than PCI. This patch enables
the VFIO on the basis of vfio root presence.
Since vfio_enable should be called only once, pci_vfio_enable
is also removed.
A debug print is added in case vfio_pci module is not present.
Signed-off-by: Hemant Agrawal <hemant.agrawal@nxp.com>
Acked-by: Anatoly Burakov <anatoly.burakov@intel.com>
Compile fails when kernel version is <= 3.17 with error:
"dereferencing pointer to incomplete type". This is because struct
uio_device definition is not exposed in kernel earlier than 3.17.
This patch fixes it by using pointer of rte_uio_pci_dev as
dev_id instead of uio_device for irq device handler.
Fixes: 5f6ff30dc507 ("igb_uio: fix interrupt enablement after FLR in VM")
Cc: stable@dpdk.org
Signed-off-by: Jingjing Wu <jingjing.wu@intel.com>
Reviewed-by: Jianfeng Tan <jianfeng.tan@intel.com>
build error:
build/lib/librte_eal/linuxapp/kni/kni_net.c:215:5: error:
‘struct net_device’ has no member named ‘trans_start’
dev->trans_start = jiffies;
Signed-off-by: Nirmoy Das <ndas@suse.de>
Signed-off-by: Ferruh Yigit <ferruh.yigit@intel.com>
If pass-through a VF by vfio-pci to a Qemu VM, after FLR
in VM, the interrupt setting is not recoverd correctly
to host as below:
in VM guest:
Capabilities: [70] MSI-X: Enable+ Count=5 Masked-
in Host:
Capabilities: [70] MSI-X: Enable+ Count=5 Masked-
That was because in pci_reset_function, it first reads the
PCI configure and set FLR reset, and then writes PCI configure
as restoration. But not all the writing are successful to Host.
Because vfio-pci driver doesn't allow directly write PCI MSI-X
Cap.
To fix this issue, we need to move the interrupt enablement from
igb_uio probe to open device file. While it is also the similar as
the behaviour in vfio_pci kernel module code.
Fixes: b58eedfc7dd5 ("igb_uio: issue FLR during open and release of device file")
Cc: stable@dpdk.org
Signed-off-by: Jingjing Wu <jingjing.wu@intel.com>
Signed-off-by: Jianfeng Tan <jianfeng.tan@intel.com>
Tested-by: Shijith Thotton <shijith.thotton@caviumnetworks.com>
Acked-by: Ferruh Yigit <ferruh.yigit@intel.com>
MSI masks contain a 1 if interrupt is masked, 0 if unmasked.
I got that wrong with the !!state calculation. For better
readability, the mask is now changed like in igbuio_msi_mask_irq.
Fixes: a8ea1e5fb647 ("igb_uio: fix unknown MSI symbols")
Signed-off-by: Markus Theil <markus.theil@tu-ilmenau.de>
Tested-by: Markus Theil <markus.theil@tu-ilmenau.de>
Acked-by: Ferruh Yigit <ferruh.yigit@intel.com>
This patch partially reverts the commit d196343a258e and adds some
functions from Markus' previous version of the patch [1].
igb_uio uses pci_msi_unmask_irq() and pci_msi_mask_irq() kernel APIs
when kernel version is >= 3.19 because these APIs are implemented in
this Linux kernel version.
But these APIs only exported beginning from Linux kernel 4.5, so before
this Linux kernel version igb_uio kernel module is not usable,
and giving following warnings:
"igb_uio: Unknown symbol pci_msi_unmask_irq"
"igb_uio: Unknown symbol pci_msi_mask_irq"
The support for these APIs increased to Linux kernel >= 4.5
For older version of Linux kernel unmask_msi_irq() and mask_msi_irq()
are used but these functions are not exported at all.
Instead of these functions switched back to previous implementation in
igb_uio for MSI-X, and for MSI used igbuio_msi_mask_irq() from [1].
[1]
http://dpdk.org/dev/patchwork/patch/28144/
Fixes: d196343a258e ("igb_uio: use kernel functions for masking MSI-X")
Signed-off-by: Ferruh Yigit <ferruh.yigit@intel.com>
This patch dynamically selects functions of memcpy at run-time based
on CPU flags that current machine supports. This patch uses function
pointers which are bind to the relative functions at constrctor time.
In addition, AVX512 instructions set would be compiled only if users
config it enabled and the compiler supports it.
Signed-off-by: Xiaoyun Li <xiaoyun.li@intel.com>
When calibrating the TSC frequency, first, probe the architecture specific
function. If not available, use the existing calibrate scheme.
Signed-off-by: Jerin Jacob <jerin.jacob@caviumnetworks.com>
Signed-off-by: Thomas Monjalon <thomas@monjalon.net>
Tested-by: Jerin Jacob <jerin.jacob@caviumnetworks.com>
With the introduction of IOVA mode, the only blocker to run
with 4KB pages for NICs binding to vfio-pci, is that
RTE_BAD_PHYS_ADDR is not a valid IOVA address.
We can refine this by using VA as IOVA if it's IOVA mode.
Signed-off-by: Jianfeng Tan <jianfeng.tan@intel.com>
Acked-by: Anatoly Burakov <anatoly.burakov@intel.com>
Acked-by: Santosh Shukla <santosh.shukla@caviumnetworks.com>
This function can be used to check the role of a specific lcore.
Signed-off-by: Pavan Nikhilesh <pbhagavatula@caviumnetworks.com>
Acked-by: Harry van Haaren <harry.van.haaren@intel.com>
Split pci_vfio_map_resource for primary and secondary processes.
Save all relevant mapping data in primary process to allow
the secondary process to perform mappings.
Signed-off-by: Jonas Pfefferle <jpf@zurich.ibm.com>
Acked-by: Anatoly Burakov <anatoly.burakov@intel.com>
DMA window size needs to be big enough to span all memory segment's
physical addresses. We do not need multiple levels of IOMMU tables
as we already span ~70TB of physical memory with 16MB hugepages.
Signed-off-by: Jonas Pfefferle <jpf@zurich.ibm.com>
Acked-by: Anatoly Burakov <anatoly.burakov@intel.com>
Reviewed-by: Alexey Kardashevskiy <aik@ozlabs.ru>
Normally, command line argument strings are considered immutable, but
SPDK [1] and urdma [2] construct argv arrays to pass to rte_eal_init().
These strings are allocated using malloc() and freed after DPDK
initialization with free(). However, in the case of --file-prefix and
--huge-dir, DPDK takes the pointer to these strings in argv directly. If
a secondary process calls rte_eal_pci_probe() after rte_eal_init()
returns, as is done by SPDK, this causes a use-after-free error because
the strings have been freed by the calling code immediately after
rte_eal_init() returns.
This problem was observed when running SPDK example programs as a
secondary process and causes the secondary processes to fail:
Starting DPDK 16.11.1 initialization...
[ DPDK EAL parameters: identify -c 4 --file-prefix=spdk3260 --base-virtaddr=0x1000000000 --proc-type=auto ]
EAL: Detected 40 lcore(s)
EAL: Auto-detected process type: SECONDARY
EAL: Probing VFIO support...
EAL: VFIO support initialized
EAL: PCI device 0000:81:00.0 on NUMA socket 1
EAL: probe driver: 8086:953 spdk_nvme
EAL: cannot connect to primary process!
EAL: Error - exiting with code: 1
Cause: Requested device 0000:81:00.0 cannot be used
Running strace shows that the file prefix has been zero'd out by the
time that the secondary process attempts to probe the NVMe device.
The use-after-free errors can be easily detected with valgrind:
==8489== Invalid read of size 1
==8489== at 0x4C30D22: strlen (in /usr/lib/valgrind/vgpreload_memcheck-amd64-linux.so)
==8489== by 0x58DB955: vfprintf (vfprintf.c:1637)
==8489== by 0x59A4685: __vsnprintf_chk (vsnprintf_chk.c:63)
==8489== by 0x59A45E7: __snprintf_chk (snprintf_chk.c:34)
==8489== by 0x1246AB: get_socket_path.constprop.0 (in /home/pmacarth/src/spdk/examples/nvme/identify/identify)
==8489== by 0x124B09: vfio_mp_sync_connect_to_primary (in /home/pmacarth/src/spdk/examples/nvme/identify/identify)
==8489== by 0x123BE4: vfio_get_group_fd.part.1 (in /home/pmacarth/src/spdk/examples/nvme/identify/identify)
==8489== by 0x124366: vfio_setup_device (in /home/pmacarth/src/spdk/examples/nvme/identify/identify)
==8489== by 0x126C8A: pci_vfio_map_resource (in /home/pmacarth/src/spdk/examples/nvme/identify/identify)
==8489== by 0x12B115: pci_probe_all_drivers.part.0 (in /home/pmacarth/src/spdk/examples/nvme/identify/identify)
==8489== by 0x12B596: rte_eal_pci_probe (in /home/pmacarth/src/spdk/examples/nvme/identify/identify)
==8489== by 0x11D5B5: spdk_pci_enumerate (pci.c:147)
==8489== Address 0x63f362e is 14 bytes inside a block of size 32 free'd
==8489== at 0x4C2ED5B: free (in /usr/lib/valgrind/vgpreload_memcheck-amd64-linux.so)
==8489== by 0x11E6FB: spdk_free_args (init.c:136)
==8489== by 0x11EBF5: spdk_env_init (init.c:309)
==8489== by 0x10D2AA: main (identify.c:976)
==8489== Block was alloc'd at
==8489== at 0x4C2DB2F: malloc (in /usr/lib/valgrind/vgpreload_memcheck-amd64-linux.so)
==8489== by 0x11E7D7: _sprintf_alloc (init.c:76)
==8489== by 0x11EA78: spdk_build_eal_cmdline (init.c:251)
==8489== by 0x11EA78: spdk_env_init (init.c:282)
==8489== by 0x10D2AA: main (identify.c:976)
==8489==
Fix this by using strdup() to create separate memory buffers for these
strings. Note that this patch will cause valgrind to report memory
leaks of these buffers as there is nowhere to free them. Using static
buffers is an option but would make these strings have a fixed maximum
length whereas there is currently no limit defined by the API.
[1] http://spdk.io
[2] https://github.com/zrlio/urdma
Fixes: af75078fece3 ("first public release")
Cc: stable@dpdk.org
Signed-off-by: Patrick MacArthur <patrick@patrickmacarthur.net>
Acked-by: Sergio Gonzalez Monroy <sergio.gonzalez.monroy@intel.com>
If mmap fails, it will return the value MAP_FAILED. Checking for this
return code allows us to properly identify mmap failures and report
them as such to the calling function.
Signed-off-by: Seth Howell <seth.howell@intel.com>
Acked-by: Sergio Gonzalez Monroy <sergio.gonzalez.monroy@intel.com>
We remove xen-specific code in EAL, including the option --xen-dom0,
memory initialization code, compiling dependency, etc.
Related documents are removed or updated, and bump the eal library
version.
Signed-off-by: Jianfeng Tan <jianfeng.tan@intel.com>
IGB_UIO compilation recently got enabled for ARM64 by default
The igb_uio compilation against ARM64 based stock 4.x (e.g. 4.13)
kernel is giving compilation warnings:
igb_uio.c: In function ‘igbuio_pci_irqcontrol’:
igb_uio.c:115:25: error: implicit declaration of function
‘irq_get_irq_dat ’ [-Werror=implicit-function-declaration]
struct irq_data *irq = irq_get_irq_data(udev->info.irq);
^
igb_uio.c:115:25: error: initialization makes pointer from integer without
a cast [-Werror=int-conversion]
Fixes: d196343a258e ("igb_uio: use kernel functions for masking MSI-X")
Signed-off-by: Hemant Agrawal <hemant.agrawal@nxp.com>
Tested-by: Santosh Shukla <santosh.shukla@caviumnetworks.com>
This is not bugfix, but it's convenient to help developer
to review and maintain the igbuio codes.
Signed-off-by: Tonghao Zhang <xiangxia.m.yue@gmail.com>
This patch adds MSI IRQ mode in a way, that should
also work on older kernel versions. The base for my patch
was an attempt to do this in cf705bc36c which was later
reverted in d8ee82745a. Compilation was tested on Linux 3.2,
4.10 and 4.12.
Signed-off-by: Markus Theil <markus.theil@tu-ilmenau.de>
Acked-by: Ferruh Yigit <ferruh.yigit@intel.com>
For better readability throughout the module, the destruction
order is changed to the exact inverse construction order.
Signed-off-by: Markus Theil <markus.theil@tu-ilmenau.de>
Acked-by: Ferruh Yigit <ferruh.yigit@intel.com>
The patch which introduced the usage of pci_alloc_irq_vectors
came after the patch which switched to non-threaded ISR (f0d1896fa1),
but did not use non-threaded ISR, if pci_alloc_irq_vectors
is used.
Fixes: 99bb58f3adc7 ("igb_uio: switch to new irq function for MSI-X")
Cc: stable@dpdk.org
Signed-off-by: Markus Theil <markus.theil@tu-ilmenau.de>
Acked-by: Ferruh Yigit <ferruh.yigit@intel.com>
igb_uio already allocates irqs using pci_alloc_irq_vectors on
recent kernels >= 4.8. The interrupt disable code was not
using the corresponding pci_free_irq_vectors, but the also
deprecated pci_disable_msix, before this fix.
Fixes: 99bb58f3adc7 ("igb_uio: switch to new irq function for MSI-X")
Cc: stable@dpdk.org
Signed-off-by: Markus Theil <markus.theil@tu-ilmenau.de>
Acked-by: Ferruh Yigit <ferruh.yigit@intel.com>
Interrupt setup code in igb_uio has to deal with multiple
types of interrupts and kernel versions. This patch moves
the setup and teardown code into own functions, to make
it more readable.
Signed-off-by: Markus Theil <markus.theil@tu-ilmenau.de>
Tested-by: Markus Theil <markus.theil@tu-ilmenau.de>
Acked-by: Ferruh Yigit <ferruh.yigit@intel.com>
detect SLE version reverse chronologically as ">=" is being used.
Fixes: 2972254ce163 ("kni: fix build on Suse 12 SP3")
Cc: stable@dpdk.org
Signed-off-by: Nirmoy Das <ndas@suse.de>
Acked-by: Ferruh Yigit <ferruh.yigit@intel.com>
DPDK has support for both sw and hw mempool and
currently user is limited to use ring_mp_mc pool.
In case user want to use other pool handle,
need to update config RTE_MEMPOOL_OPS_DEFAULT, then
build and run with desired pool handle.
Introducing eal option to override default pool handle.
Now user can override the RTE_MEMPOOL_OPS_DEFAULT by passing
pool handle to eal `--mbuf-pool-ops-name=""`.
Signed-off-by: Santosh Shukla <santosh.shukla@caviumnetworks.com>
Acked-by: Hemant Agrawal <hemant.agrawal@nxp.com>
Acked-by: Olivier Matz <olivier.matz@6wind.com>
iova autodetection depends on rte_bus_scan result. Result of bus scan will
have updated device_list and each device in that list has its '.kdev' state
updated. That kdrv state used to detect iova mapping mode for that device.
_device_parse() has dependency on rt_bus_scan so,
Below calls moved up in the eal initialization order:
- eal_option_device_parse
- rte_bus_scan
And based on the result of rte_bus_scan_iommu_class - select iova
mapping mode.
Signed-off-by: Santosh Shukla <santosh.shukla@caviumnetworks.com>
Signed-off-by: Jerin Jacob <jerin.jacob@caviumnetworks.com>
Reviewed-by: Maxime Coquelin <maxime.coquelin@redhat.com>
Reviewed-by: Anatoly Burakov <anatoly.burakov@intel.com>
Tested-by: Hemant Agrawal <hemant.agrawal@nxp.com>
Introducing rte_eal_iova_mode() helper API. This API
used by non-eal library for detecting iova mode.
Signed-off-by: Santosh Shukla <santosh.shukla@caviumnetworks.com>
Signed-off-by: Jerin Jacob <jerin.jacob@caviumnetworks.com>
Reviewed-by: Maxime Coquelin <maxime.coquelin@redhat.com>
Tested-by: Hemant Agrawal <hemant.agrawal@nxp.com>
API(rte_bus_get_iommu_class) helps to automatically detect and select
appropriate iova mapping scheme for iommu capable device on that bus.
Algorithm for iova scheme selection for bus:
0. Iterate through bus_list.
1. Collect each bus iova mode value and update into 'mode' var.
2. Mode selection scheme is:
if mode == 0 then iova mode is _pa,
if mode == 1 then iova mode is _pa,
if mode == 2 then iova mode is _va,
if mode == 3 then iova mode ia _pa.
So mode !=2 will be default iova mode (_pa).
Signed-off-by: Santosh Shukla <santosh.shukla@caviumnetworks.com>
Signed-off-by: Jerin Jacob <jerin.jacob@caviumnetworks.com>
Reviewed-by: Maxime Coquelin <maxime.coquelin@redhat.com>
Reviewed-by: Anatoly Burakov <anatoly.burakov@intel.com>
Acked-by: Hemant Agrawal <hemant.agrawal@nxp.com>
Tested-by: Hemant Agrawal <hemant.agrawal@nxp.com>