1296 Commits

Author SHA1 Message Date
Thomas Monjalon
87607f45bd version: 17.11-rc1
Signed-off-by: Thomas Monjalon <thomas@monjalon.net>
2017-10-14 01:29:59 +02:00
Nirmoy Das
47b1119fd1 kni: fix build on SLE12 SP3
build error:
build/lib/librte_eal/linuxapp/kni/kni_net.c:215:5: error:
‘struct net_device’ has no member named ‘trans_start’
  dev->trans_start = jiffies;

Signed-off-by: Nirmoy Das <ndas@suse.de>
Signed-off-by: Ferruh Yigit <ferruh.yigit@intel.com>
2017-10-13 23:12:18 +01:00
Jingjing Wu
5f6ff30dc5 igb_uio: fix interrupt enablement after FLR in VM
If pass-through a VF by vfio-pci to a Qemu VM, after FLR
in VM, the interrupt setting is not recoverd correctly
to host as below:
 in VM guest:
        Capabilities: [70] MSI-X: Enable+ Count=5 Masked-
 in Host:
        Capabilities: [70] MSI-X: Enable+ Count=5 Masked-

That was because in pci_reset_function, it first reads the
PCI configure and set FLR reset, and then writes PCI configure
as restoration. But not all the writing are successful to Host.
Because vfio-pci driver doesn't allow directly write PCI MSI-X
Cap.

To fix this issue, we need to move the interrupt enablement from
igb_uio probe to open device file. While it is also the similar as
the behaviour in vfio_pci kernel module code.

Fixes: b58eedfc7dd5 ("igb_uio: issue FLR during open and release of device file")
Cc: stable@dpdk.org

Signed-off-by: Jingjing Wu <jingjing.wu@intel.com>
Signed-off-by: Jianfeng Tan <jianfeng.tan@intel.com>
Tested-by: Shijith Thotton <shijith.thotton@caviumnetworks.com>
Acked-by: Ferruh Yigit <ferruh.yigit@intel.com>
2017-10-13 22:35:29 +01:00
Markus Theil
6f4600eba9 igb_uio: fix legacy MSI masking
MSI masks contain a 1 if interrupt is masked, 0 if unmasked.
I got that wrong with the !!state calculation. For better
readability, the mask is now changed like in igbuio_msi_mask_irq.

Fixes: a8ea1e5fb647 ("igb_uio: fix unknown MSI symbols")

Signed-off-by: Markus Theil <markus.theil@tu-ilmenau.de>
Tested-by: Markus Theil <markus.theil@tu-ilmenau.de>
Acked-by: Ferruh Yigit <ferruh.yigit@intel.com>
2017-10-13 21:57:48 +02:00
Ferruh Yigit
a8ea1e5fb6 igb_uio: fix unknown MSI symbols
This patch partially reverts the commit d196343a258e and adds some
functions from Markus' previous version of the patch [1].

igb_uio uses pci_msi_unmask_irq() and pci_msi_mask_irq() kernel APIs
when kernel version is >= 3.19 because these APIs are implemented in
this Linux kernel version.

But these APIs only exported beginning from Linux kernel 4.5, so before
this Linux kernel version igb_uio kernel module is not usable,
and giving following warnings:
"igb_uio: Unknown symbol pci_msi_unmask_irq"
"igb_uio: Unknown symbol pci_msi_mask_irq"

The support for these APIs increased to Linux kernel >= 4.5

For older version of Linux kernel unmask_msi_irq() and mask_msi_irq()
are used but these functions are not exported at all.
Instead of these functions switched back to previous implementation in
igb_uio for MSI-X, and for MSI used igbuio_msi_mask_irq() from [1].

[1]
http://dpdk.org/dev/patchwork/patch/28144/

Fixes: d196343a258e ("igb_uio: use kernel functions for masking MSI-X")

Signed-off-by: Ferruh Yigit <ferruh.yigit@intel.com>
2017-10-13 15:50:13 +02:00
Santosh Shukla
07a6f5c2d3 eal: call plugin init before device parse
Default eal_init code calls
0. eal_plugins_init
1. eal_option_device_parse
2. rte_bus_scan

IOVA commit:cf408c224 missed on calling eal_plugins_init before
eal_option_device_parse, rte_bus_scan and that introduced below
regression for shared mode:

with CONFIG_RTE_BUILD_SHARED_LIB=y:

'net_vhost0,iface=/tmp/vhost-user2' -d ./install/lib/librte_pmd_vhost.so
-- --portmask=1 --disable-hw-vlan -i --rxq=1 --txq=1 --nb-cores=1
--eth-peer=0,52:54:00:11:22:12
EAL: Detected 4 lcore(s)
ERROR: failed to parse device "net_vhost0"
EAL: Unable to parse device 'net_vhost0,iface=/tmp/vhost-user2'
PANIC in main():
Cannot init EAL

Fixes: cf408c224 ("eal: auto detect IOVA mode")

Reported-by: Maxime Coquelin <maxime.coquelin@redhat.com>
Signed-off-by: Santosh Shukla <santosh.shukla@caviumnetworks.com>
Tested-by: Maxime Coquelin <maxime.coquelin@redhat.com>
2017-10-13 15:38:30 +02:00
Xiaoyun Li
84cc318424 eal/x86: select optimized memcpy at run-time
This patch dynamically selects functions of memcpy at run-time based
on CPU flags that current machine supports. This patch uses function
pointers which are bind to the relative functions at constrctor time.
In addition, AVX512 instructions set would be compiled only if users
config it enabled and the compiler supports it.

Signed-off-by: Xiaoyun Li <xiaoyun.li@intel.com>
2017-10-13 15:20:50 +02:00
Pablo de Lara
ea39ca97d2 eal/x86: fix FreeBSD build
lib/librte_eal/common/arch/x86/rte_cycles.c: In function 'rdmsr':
lib/librte_eal/common/arch/x86/rte_cycles.c:57:11:
error: unused parameter 'msr' [-Werror=unused-parameter]
 rdmsr(int msr, uint64_t *val)
           ^
lib/librte_eal/common/arch/x86/rte_cycles.c:57:26:
error: unused parameter 'val' [-Werror=unused-parameter]
 rdmsr(int msr, uint64_t *val)
                          ^
Fixes: ad3516bb4ae1 ("eal/x86: implement arch-specific TSC freq query")

Signed-off-by: Pablo de Lara <pablo.de.lara.guarch@intel.com>
2017-10-13 15:20:50 +02:00
Sergio Gonzalez Monroy
ad3516bb4a eal/x86: implement arch-specific TSC freq query
First, try to use CPUID Time Stamp Counter and Nominal Core Crystal
Clock Information Leaf to determine the tsc hz on platforms that
supports it (does not require privileged user).

If the CPUID leaf is not available, then try to determine the tsc hz by
reading the MSR 0xCE (requires privileged user).

Default to the tsc hz estimation if both methods fail.

Signed-off-by: Sergio Gonzalez Monroy <sergio.gonzalez.monroy@intel.com>
Acked-by: Harry van Haaren <harry.van.haaren@intel.com>
Tested-by: Bruce Richardson <bruce.richardson@intel.com>
2017-10-13 13:07:38 +02:00
Jerin Jacob
15692396fd eal/ppc64: implement arch-specific TSC freq query
In ppc_64, rte_rdtsc() returns timebase register value which increments
at independent timebase frequency and hence not related to lcore cpu
frequency to derive TSC hz. Hence, we stick with master lcore frequency.

Signed-off-by: Jerin Jacob <jerin.jacob@caviumnetworks.com>
Signed-off-by: Gowrishankar Muthukrishnan <gowrishankar.m@linux.vnet.ibm.com>
2017-10-13 13:07:38 +02:00
Jerin Jacob
583152bc81 eal/armv8: implement arch-specific TSC freq query
Use cntvct_el0 system register to get the system counter frequency.

If the system is configured with RTE_ARM_EAL_RDTSC_USE_PMU then
return 0(let the common code calibrate the tsc frequency).

Signed-off-by: Jerin Jacob <jerin.jacob@caviumnetworks.com>
Acked-by: Jianbo Liu <jianbo.liu@linaro.org>
2017-10-13 13:07:23 +02:00
Jerin Jacob
3dbc565e81 timer: honor arch-specific TSC frequency query
When calibrating the TSC frequency, first, probe the architecture specific
function. If not available, use the existing calibrate scheme.

Signed-off-by: Jerin Jacob <jerin.jacob@caviumnetworks.com>
Signed-off-by: Thomas Monjalon <thomas@monjalon.net>
Tested-by: Jerin Jacob <jerin.jacob@caviumnetworks.com>
2017-10-13 13:07:17 +02:00
Pavan Bhagavatula
bc48589e47 eal: move bitmap from sched library
The librte_sched uses rte_bitmap to manage large arrays of bits in an
optimized method so, moving it to eal/common would allow other libraries
and applications to use it.

Signed-off-by: Pavan Nikhilesh <pbhagavatula@caviumnetworks.com>
Acked-by: Cristian Dumitrescu <cristian.dumitrescu@intel.com>
2017-10-12 22:31:33 +02:00
Luca Boccassi
f642036f3a mk: sort headers before wildcard inclusion
In order to achieve fully reproducible builds, always use the same
inclusion order for headers in the Makefiles.

Signed-off-by: Luca Boccassi <luca.boccassi@gmail.com>
2017-10-12 22:31:33 +02:00
Jianfeng Tan
7100dfe381 mem: honor IOVA mode for no-huge case
With the introduction of IOVA mode, the only blocker to run
with 4KB pages for NICs binding to vfio-pci, is that
RTE_BAD_PHYS_ADDR is not a valid IOVA address.

We can refine this by using VA as IOVA if it's IOVA mode.

Signed-off-by: Jianfeng Tan <jianfeng.tan@intel.com>
Acked-by: Anatoly Burakov <anatoly.burakov@intel.com>
Acked-by: Santosh Shukla <santosh.shukla@caviumnetworks.com>
2017-10-12 21:05:21 +01:00
Stephen Hemminger
2cb43002af ethdev: increase device internal name length
Allow sufficient space for UUID in string form (36+1).
Needed to use UUID with Hyper-V.

Signed-off-by: Stephen Hemminger <sthemmin@microsoft.com>
Reviewed-by: Ferruh Yigit <ferruh.yigit@intel.com>
2017-10-12 01:36:58 +01:00
Jiayu Hu
119583797b gso: support TCP/IPv4 GSO
This patch adds GSO support for TCP/IPv4 packets. Supported packets
may include a single VLAN tag. TCP/IPv4 GSO doesn't check if input
packets have correct checksums, and doesn't update checksums for
output packets (the responsibility for this lies with the application).
Additionally, TCP/IPv4 GSO doesn't process IP fragmented packets.

TCP/IPv4 GSO uses two chained MBUFs, one direct MBUF and one indrect
MBUF, to organize an output packet. Note that we refer to these two
chained MBUFs as a two-segment MBUF. The direct MBUF stores the packet
header, while the indirect mbuf simply points to a location within the
original packet's payload. Consequently, use of the GSO library requires
multi-segment MBUF support in the TX functions of the NIC driver.

If a packet is GSO'd, TCP/IPv4 GSO reduces its MBUF refcnt by 1. As a
result, when all of its GSOed segments are freed, the packet is freed
automatically.

Signed-off-by: Jiayu Hu <jiayu.hu@intel.com>
Signed-off-by: Mark Kavanagh <mark.b.kavanagh@intel.com>
Acked-by: Konstantin Ananyev <konstantin.ananyev@intel.com>
Tested-by: Lei Yao <lei.a.yao@intel.com>
2017-10-12 01:36:57 +01:00
Shreyansh Jain
63bdef1827 bus: ignore scan and probe failures
Bus scan is responsible for finding devices over *all* buses.
Some of these buses might not be able to scan but that should
not prevent other buses to be scanned.

Same is the case for probing. It is possible that some devices which
were scanned didn't have a specific driver. That should not prevent
other buses from being probed.

Signed-off-by: Shreyansh Jain <shreyansh.jain@nxp.com>
Tested-by: Hemant Agrawal <hemant.agrawal@nxp.com>
2017-10-12 00:29:06 +02:00
Pavan Nikhilesh
78666372fa eal: add function to check lcore role
This function can be used to check the role of a specific lcore.

Signed-off-by: Pavan Nikhilesh <pbhagavatula@caviumnetworks.com>
Acked-by: Harry van Haaren <harry.van.haaren@intel.com>
2017-10-11 22:30:16 +02:00
Sergio Gonzalez Monroy
5b618b5b29 eal/x86: use cpuid builtin
GCC does have the __get_cpuid_count builtin which checks for maximum
supported leaf, but implementations differ between CLANG and GCC.

This change provides an implementation compatible with both GCC and
CLANG 3.4+.

Signed-off-by: Sergio Gonzalez Monroy <sergio.gonzalez.monroy@intel.com>
Reviewed-by: Ferruh Yigit <ferruh.yigit@intel.com>
2017-10-11 21:59:56 +02:00
Jonas Pfefferle
33604c3135 vfio: refactor PCI BAR mapping
Split pci_vfio_map_resource for primary and secondary processes.
Save all relevant mapping data in primary process to allow
the secondary process to perform mappings.

Signed-off-by: Jonas Pfefferle <jpf@zurich.ibm.com>
Acked-by: Anatoly Burakov <anatoly.burakov@intel.com>
2017-10-10 15:37:58 +02:00
Jonas Pfefferle
ed1e7e576b vfio: fix sPAPR IOMMU DMA window size
DMA window size needs to be big enough to span all memory segment's
physical addresses. We do not need multiple levels of IOMMU tables
as we already span ~70TB of physical memory with 16MB hugepages.

Signed-off-by: Jonas Pfefferle <jpf@zurich.ibm.com>
Acked-by: Anatoly Burakov <anatoly.burakov@intel.com>
Reviewed-by: Alexey Kardashevskiy <aik@ozlabs.ru>
2017-10-10 15:36:04 +02:00
Patrick MacArthur
e3f141879e eal: copy raw strings taken from command line
Normally, command line argument strings are considered immutable, but
SPDK [1] and urdma [2] construct argv arrays to pass to rte_eal_init().
These strings are allocated using malloc() and freed after DPDK
initialization with free(). However, in the case of --file-prefix and
--huge-dir, DPDK takes the pointer to these strings in argv directly. If
a secondary process calls rte_eal_pci_probe() after rte_eal_init()
returns, as is done by SPDK, this causes a use-after-free error because
the strings have been freed by the calling code immediately after
rte_eal_init() returns.

This problem was observed when running SPDK example programs as a
secondary process and causes the secondary processes to fail:

Starting DPDK 16.11.1 initialization...
[ DPDK EAL parameters: identify -c 4 --file-prefix=spdk3260 --base-virtaddr=0x1000000000 --proc-type=auto ]
EAL: Detected 40 lcore(s)
EAL: Auto-detected process type: SECONDARY
EAL: Probing VFIO support...
EAL: VFIO support initialized
EAL: PCI device 0000:81:00.0 on NUMA socket 1
EAL:   probe driver: 8086:953 spdk_nvme
EAL:   cannot connect to primary process!
EAL: Error - exiting with code: 1
Cause: Requested device 0000:81:00.0 cannot be used

Running strace shows that the file prefix has been zero'd out by the
time that the secondary process attempts to probe the NVMe device.

The use-after-free errors can be easily detected with valgrind:

==8489== Invalid read of size 1
==8489==    at 0x4C30D22: strlen (in /usr/lib/valgrind/vgpreload_memcheck-amd64-linux.so)
==8489==    by 0x58DB955: vfprintf (vfprintf.c:1637)
==8489==    by 0x59A4685: __vsnprintf_chk (vsnprintf_chk.c:63)
==8489==    by 0x59A45E7: __snprintf_chk (snprintf_chk.c:34)
==8489==    by 0x1246AB: get_socket_path.constprop.0 (in /home/pmacarth/src/spdk/examples/nvme/identify/identify)
==8489==    by 0x124B09: vfio_mp_sync_connect_to_primary (in /home/pmacarth/src/spdk/examples/nvme/identify/identify)
==8489==    by 0x123BE4: vfio_get_group_fd.part.1 (in /home/pmacarth/src/spdk/examples/nvme/identify/identify)
==8489==    by 0x124366: vfio_setup_device (in /home/pmacarth/src/spdk/examples/nvme/identify/identify)
==8489==    by 0x126C8A: pci_vfio_map_resource (in /home/pmacarth/src/spdk/examples/nvme/identify/identify)
==8489==    by 0x12B115: pci_probe_all_drivers.part.0 (in /home/pmacarth/src/spdk/examples/nvme/identify/identify)
==8489==    by 0x12B596: rte_eal_pci_probe (in /home/pmacarth/src/spdk/examples/nvme/identify/identify)
==8489==    by 0x11D5B5: spdk_pci_enumerate (pci.c:147)
==8489==  Address 0x63f362e is 14 bytes inside a block of size 32 free'd
==8489==    at 0x4C2ED5B: free (in /usr/lib/valgrind/vgpreload_memcheck-amd64-linux.so)
==8489==    by 0x11E6FB: spdk_free_args (init.c:136)
==8489==    by 0x11EBF5: spdk_env_init (init.c:309)
==8489==    by 0x10D2AA: main (identify.c:976)
==8489==  Block was alloc'd at
==8489==    at 0x4C2DB2F: malloc (in /usr/lib/valgrind/vgpreload_memcheck-amd64-linux.so)
==8489==    by 0x11E7D7: _sprintf_alloc (init.c:76)
==8489==    by 0x11EA78: spdk_build_eal_cmdline (init.c:251)
==8489==    by 0x11EA78: spdk_env_init (init.c:282)
==8489==    by 0x10D2AA: main (identify.c:976)
==8489==

Fix this by using strdup() to create separate memory buffers for these
strings. Note that this patch will cause valgrind to report memory
leaks of these buffers as there is nowhere to free them. Using static
buffers is an option but would make these strings have a fixed maximum
length whereas there is currently no limit defined by the API.

[1] http://spdk.io
[2] https://github.com/zrlio/urdma

Fixes: af75078fece3 ("first public release")
Cc: stable@dpdk.org

Signed-off-by: Patrick MacArthur <patrick@patrickmacarthur.net>
Acked-by: Sergio Gonzalez Monroy <sergio.gonzalez.monroy@intel.com>
2017-10-09 23:25:13 +02:00
Seth Howell
7485e06c2a mem: check mmap failure
If mmap fails, it will return the value MAP_FAILED. Checking for this
return code allows us to properly identify mmap failures and report
them as such to the calling function.

Signed-off-by: Seth Howell <seth.howell@intel.com>
Acked-by: Sergio Gonzalez Monroy <sergio.gonzalez.monroy@intel.com>
2017-10-09 23:17:04 +02:00
Xueming Li
41baec55a8 mem: fix malloc element free in debug mode
malloc_elem_free() is clearing(setting to 0) the trailer cookie when
RTE_MALLOC_DEBUG is enabled. In case of joining free neighbor element,
part of joined memory is not getting cleared due to missing the length
of trailer cookie in the middle.

This patch fixes calculation of free memory length to be cleared in
malloc_elem_free() by including trailer cookie.

Fixes: af75078fece3 ("first public release")
Cc: stable@dpdk.org

Signed-off-by: Xueming Li <xuemingl@mellanox.com>
Acked-by: Sergio Gonzalez Monroy <sergio.gonzalez.monroy@intel.com>
2017-10-09 23:15:45 +02:00
Xueming Li
3cd4e0e883 mem: fix malloc debug config
This patch replaces broken macro RTE_LIBRTE_MALLOC_DEBUG with
RTE_MALLOC_DEBUG.

Fixes: af75078fece3 ("first public release")
Cc: stable@dpdk.org

Signed-off-by: Xueming Li <xuemingl@mellanox.com>
Acked-by: Sergio Gonzalez Monroy <sergio.gonzalez.monroy@intel.com>
2017-10-09 23:15:45 +02:00
Xueming Li
f385306357 config: add option to enable asserts
Currently, enabling assertion have to set CONFIG_RTE_LOG_LEVEL to
RTE_LOG_DEBUG. CONFIG_RTE_LOG_LEVEL is the default log level of control
path, RTE_LOG_DP_LEVEL is the log level of data path. It's a little bit
hard to understand literally that assertion is decided by control path
LOG_LEVEL, especially assertion used on data path.

On the other hand, DPDK need an assertion enabling switch w/o impacting
log output level, assuming "--log-level" not specified.

Assertion is an important API to balance DPDK high performance and
robustness. To promote assertion usage, it's valuable to unhide
assertion out of COFNIG_RTE_LOG_LEVEL.

In one word, log is log, assertion is assertion, debug is hot pot :)

Rationale of this patch is to introduce an dedicate switch of
assertion: RTE_ENABLE_ASSERT

Signed-off-by: Xueming Li <xuemingl@mellanox.com>
Acked-by: Gaetan Rivet <gaetan.rivet@6wind.com>
2017-10-09 23:15:45 +02:00
Jianfeng Tan
f26ab687a7 eal: remove Xen dom0 support
We remove xen-specific code in EAL, including the option --xen-dom0,
memory initialization code, compiling dependency, etc.

Related documents are removed or updated, and bump the eal library
version.

Signed-off-by: Jianfeng Tan <jianfeng.tan@intel.com>
2017-10-09 01:54:29 +02:00
Jianfeng Tan
a7cb2e20d2 mem: remove API to get physical address in dom0
Previously, to get MFN address in dom0, this API is a wrapper to
obtain the "physical address".

As we will removed xen dom0 support, this API is not necessary.

Signed-off-by: Jianfeng Tan <jianfeng.tan@intel.com>
Acked-by: Bruce Richardson <bruce.richardson@intel.com>
2017-10-09 01:52:37 +02:00
Hemant Agrawal
e1a45fc494 igb_uio: fix build on arm64 kernel
IGB_UIO compilation recently got enabled for ARM64 by default

The igb_uio compilation against ARM64 based stock 4.x (e.g. 4.13)
kernel is giving compilation warnings:

igb_uio.c: In function ‘igbuio_pci_irqcontrol’:
igb_uio.c:115:25: error: implicit declaration of function
‘irq_get_irq_dat ’ [-Werror=implicit-function-declaration]
  struct irq_data *irq = irq_get_irq_data(udev->info.irq);
                         ^
igb_uio.c:115:25: error: initialization makes pointer from integer without
a cast [-Werror=int-conversion]

Fixes: d196343a258e ("igb_uio: use kernel functions for masking MSI-X")

Signed-off-by: Hemant Agrawal <hemant.agrawal@nxp.com>
Tested-by: Santosh Shukla <santosh.shukla@caviumnetworks.com>
2017-10-08 17:19:08 +02:00
Tonghao Zhang
071925527d igb_uio: use UIO macro instead of hardcoded value
This is not bugfix, but it's convenient to help developer
to review and maintain the igbuio codes.

Signed-off-by: Tonghao Zhang <xiangxia.m.yue@gmail.com>
2017-10-07 00:51:59 +02:00
Markus Theil
74da59da7f igb_uio: add MSI IRQ mode
This patch adds MSI IRQ mode in a way, that should
also work on older kernel versions. The base for my patch
was an attempt to do this in cf705bc36c which was later
reverted in d8ee82745a. Compilation was tested on Linux 3.2,
4.10 and 4.12.

Signed-off-by: Markus Theil <markus.theil@tu-ilmenau.de>
Acked-by: Ferruh Yigit <ferruh.yigit@intel.com>
2017-10-06 23:03:03 +01:00
Markus Theil
d196343a25 igb_uio: use kernel functions for masking MSI-X
This patch removes the custom MSI-X mask/unmask code and
uses already existing kernel functions.

Signed-off-by: Markus Theil <markus.theil@tu-ilmenau.de>
Acked-by: Ferruh Yigit <ferruh.yigit@intel.com>
2017-10-06 23:03:03 +01:00
Markus Theil
a0b6ce5051 igb_uio: release in exact reverse order
For better readability throughout the module, the destruction
order is changed to the exact inverse construction order.

Signed-off-by: Markus Theil <markus.theil@tu-ilmenau.de>
Acked-by: Ferruh Yigit <ferruh.yigit@intel.com>
2017-10-06 23:03:03 +01:00
Markus Theil
d26fc87aa2 igb_uio: fix MSI-X IRQ assignment with new IRQ function
The patch which introduced the usage of pci_alloc_irq_vectors
came after the patch which switched to non-threaded ISR (f0d1896fa1),
but did not use non-threaded ISR, if pci_alloc_irq_vectors
is used.

Fixes: 99bb58f3adc7 ("igb_uio: switch to new irq function for MSI-X")
Cc: stable@dpdk.org

Signed-off-by: Markus Theil <markus.theil@tu-ilmenau.de>
Acked-by: Ferruh Yigit <ferruh.yigit@intel.com>
2017-10-06 23:03:03 +01:00
Markus Theil
9838673e32 igb_uio: fix IRQ disable on recent kernels
igb_uio already allocates irqs using pci_alloc_irq_vectors on
recent kernels >= 4.8. The interrupt disable code was not
using the corresponding pci_free_irq_vectors, but the also
deprecated pci_disable_msix, before this fix.

Fixes: 99bb58f3adc7 ("igb_uio: switch to new irq function for MSI-X")
Cc: stable@dpdk.org

Signed-off-by: Markus Theil <markus.theil@tu-ilmenau.de>
Acked-by: Ferruh Yigit <ferruh.yigit@intel.com>
2017-10-06 23:03:03 +01:00
Markus Theil
0256ec056f igb_uio: refactor IRQ enable/disable into own functions
Interrupt setup code in igb_uio has to deal with multiple
types of interrupts and kernel versions. This patch moves
the setup and teardown code into own functions, to make
it more readable.

Signed-off-by: Markus Theil <markus.theil@tu-ilmenau.de>
Tested-by: Markus Theil <markus.theil@tu-ilmenau.de>
Acked-by: Ferruh Yigit <ferruh.yigit@intel.com>
2017-10-06 23:03:03 +01:00
Nirmoy Das
d8bc68bcd2 kni: fix SLE version detection
detect SLE version reverse chronologically as ">=" is being used.

Fixes: 2972254ce163 ("kni: fix build on Suse 12 SP3")
Cc: stable@dpdk.org

Signed-off-by: Nirmoy Das <ndas@suse.de>
Acked-by: Ferruh Yigit <ferruh.yigit@intel.com>
2017-10-06 22:45:21 +01:00
Santosh Shukla
a103a97e71 eal: allow user to override default mempool driver
DPDK has support for both sw and hw mempool and
currently user is limited to use ring_mp_mc pool.
In case user want to use other pool handle,
need to update config RTE_MEMPOOL_OPS_DEFAULT, then
build and run with desired pool handle.

Introducing eal option to override default pool handle.

Now user can override the RTE_MEMPOOL_OPS_DEFAULT by passing
pool handle to eal `--mbuf-pool-ops-name=""`.

Signed-off-by: Santosh Shukla <santosh.shukla@caviumnetworks.com>
Acked-by: Hemant Agrawal <hemant.agrawal@nxp.com>
Acked-by: Olivier Matz <olivier.matz@6wind.com>
2017-10-06 20:48:22 +02:00
Santosh Shukla
72d013644b mem: honor IOVA mode in malloc virt2phy
Check iova mode and accordingly return phy addr.

Signed-off-by: Santosh Shukla <santosh.shukla@caviumnetworks.com>
Signed-off-by: Jerin Jacob <jerin.jacob@caviumnetworks.com>
Reviewed-by: Maxime Coquelin <maxime.coquelin@redhat.com>
Reviewed-by: Anatoly Burakov <anatoly.burakov@intel.com>
Tested-by: Hemant Agrawal <hemant.agrawal@nxp.com>
2017-10-06 20:39:07 +02:00
Santosh Shukla
680f6c1260 mem: honor IOVA mode in virt2phy
Check iova mode and accordingly return phy addr.

Signed-off-by: Santosh Shukla <santosh.shukla@caviumnetworks.com>
Signed-off-by: Jerin Jacob <jerin.jacob@caviumnetworks.com>
Reviewed-by: Maxime Coquelin <maxime.coquelin@redhat.com>
Reviewed-by: Anatoly Burakov <anatoly.burakov@intel.com>
Tested-by: Hemant Agrawal <hemant.agrawal@nxp.com>
2017-10-06 20:39:07 +02:00
Santosh Shukla
e85a919286 vfio: honor IOVA mode before mapping
Check iova mode and accordingly map iova to pa or va.

Signed-off-by: Santosh Shukla <santosh.shukla@caviumnetworks.com>
Signed-off-by: Jerin Jacob <jerin.jacob@caviumnetworks.com>
Reviewed-by: Maxime Coquelin <maxime.coquelin@redhat.com>
Acked-by: Anatoly Burakov <anatoly.burakov@intel.com>
Tested-by: Hemant Agrawal <hemant.agrawal@nxp.com>
2017-10-06 20:39:07 +02:00
Santosh Shukla
cf408c2247 eal: auto detect IOVA mode
iova autodetection depends on rte_bus_scan result. Result of bus scan will
have updated device_list and each device in that list has its '.kdev' state
updated. That kdrv state used to detect iova mapping mode for that device.

_device_parse() has dependency on rt_bus_scan so,
Below calls moved up in the eal initialization order:
	- eal_option_device_parse
	- rte_bus_scan

And based on the result of rte_bus_scan_iommu_class - select iova
mapping mode.

Signed-off-by: Santosh Shukla <santosh.shukla@caviumnetworks.com>
Signed-off-by: Jerin Jacob <jerin.jacob@caviumnetworks.com>
Reviewed-by: Maxime Coquelin <maxime.coquelin@redhat.com>
Reviewed-by: Anatoly Burakov <anatoly.burakov@intel.com>
Tested-by: Hemant Agrawal <hemant.agrawal@nxp.com>
2017-10-06 20:39:07 +02:00
Santosh Shukla
93878cf025 eal: introduce helper API for IOVA mode
Introducing rte_eal_iova_mode() helper API. This API
used by non-eal library for detecting iova mode.

Signed-off-by: Santosh Shukla <santosh.shukla@caviumnetworks.com>
Signed-off-by: Jerin Jacob <jerin.jacob@caviumnetworks.com>
Reviewed-by: Maxime Coquelin <maxime.coquelin@redhat.com>
Tested-by: Hemant Agrawal <hemant.agrawal@nxp.com>
2017-10-06 20:39:07 +02:00
Santosh Shukla
39bef5cb98 bus: get IOMMU class
API(rte_bus_get_iommu_class) helps to automatically detect and select
appropriate iova mapping scheme for iommu capable device on that bus.

Algorithm for iova scheme selection for bus:
0. Iterate through bus_list.
1. Collect each bus iova mode value and update into 'mode' var.
2. Mode selection scheme is:
if mode == 0 then iova mode is _pa,
if mode == 1 then iova mode is _pa,
if mode == 2 then iova mode is _va,
if mode == 3 then iova mode ia _pa.

So mode !=2  will be default iova mode (_pa).

Signed-off-by: Santosh Shukla <santosh.shukla@caviumnetworks.com>
Signed-off-by: Jerin Jacob <jerin.jacob@caviumnetworks.com>
Reviewed-by: Maxime Coquelin <maxime.coquelin@redhat.com>
Reviewed-by: Anatoly Burakov <anatoly.burakov@intel.com>
Acked-by: Hemant Agrawal <hemant.agrawal@nxp.com>
Tested-by: Hemant Agrawal <hemant.agrawal@nxp.com>
2017-10-06 20:39:07 +02:00
Santosh Shukla
815c7deaed pci: get IOMMU class on Linux
Get iommu class of PCI device on the bus and returns preferred iova
mapping mode for that bus.

Patch also introduces RTE_PCI_DRV_IOVA_AS_VA drv flag.
Flag used when driver needs to operate in iova=va mode.

Algorithm for iova scheme selection for PCI bus:
0. If no device bound then return with RTE_IOVA_DC mapping mode,
else goto 1).
1. Look for device attached to vfio kdrv and has .drv_flag set
to RTE_PCI_DRV_IOVA_AS_VA.
2. Look for any device attached to UIO class of driver.
3. Check for vfio-noiommu mode enabled.

If 2) & 3) is false and 1) is true then select
mapping scheme as RTE_IOVA_VA. Otherwise use default
mapping scheme (RTE_IOVA_PA).

Signed-off-by: Santosh Shukla <santosh.shukla@caviumnetworks.com>
Signed-off-by: Jerin Jacob <jerin.jacob@caviumnetworks.com>
Reviewed-by: Maxime Coquelin <maxime.coquelin@redhat.com>
Reviewed-by: Anatoly Burakov <anatoly.burakov@intel.com>
Acked-by: Hemant Agrawal <hemant.agrawal@nxp.com>
Tested-by: Hemant Agrawal <hemant.agrawal@nxp.com>
2017-10-06 20:39:03 +02:00
Santosh Shukla
a4f0a2dbe5 pci: get IOMMU class
Introducing rte_pci_get_iommu_class API which helps to get iommu class
of PCI device on the bus and returns preferred iova mapping mode for
PCI bus.

Patch also adds rte_pci_get_iommu_class definition for:
- bsdapp: api returns default iova mode.
- linuxapp: Has stub implementation, Followup patch has complete
  implementation.

Signed-off-by: Santosh Shukla <santosh.shukla@caviumnetworks.com>
Signed-off-by: Jerin Jacob <jerin.jacob@caviumnetworks.com>
Reviewed-by: Maxime Coquelin <maxime.coquelin@redhat.com>
Tested-by: Hemant Agrawal <hemant.agrawal@nxp.com>
Reviewed-by: Anatoly Burakov <anatoly.burakov@intel.com>
2017-10-06 20:35:37 +02:00
Santosh Shukla
aadc3eb002 pci: export match function
Export rte_pci_match() function as it needed in the followup patch.

Signed-off-by: Santosh Shukla <santosh.shukla@caviumnetworks.com>
Signed-off-by: Jerin Jacob <jerin.jacob@caviumnetworks.com>
Acked-by: Maxime Coquelin <maxime.coquelin@redhat.com>
Reviewed-by: Anatoly Burakov <anatoly.burakov@intel.com>
Tested-by: Hemant Agrawal <hemant.agrawal@nxp.com>
2017-10-06 20:32:35 +02:00
Olivier Matz
84585f330f uio: fix compilation with -Og
The compilation with gcc-6.3.0 and EXTRA_CFLAGS=-Og gives the following
error:

  CC eal_pci_uio.o
  eal_pci_uio.c: In function ‘pci_get_uio_de ’:
  eal_pci_uio.c:221:9: error: ‘uio_num’ may be used uninitialized in
                       this function [-Werror=maybe-uninitialized]
  return uio_num;
         ^~~~~~~

This is a false positive: gcc is not able to see that when e != NULL,
uio_num is always set. Fix the warning by initializing uio_num to -1,
and by the way, change the type to int.

Signed-off-by: Olivier Matz <olivier.matz@6wind.com>
2017-10-06 02:49:50 +02:00
Lukasz Majczak
1e0a17ac4c eal: fix auxv open check for ARM and PPC
The assertion of return value from the open() function is done against
0, while it is a correct value - open() returns -1 in case of an error.
It causes problems while trying to run as a daemon, in which case, this
call to open() will return 0 as a valid descriptor.

Fixes: b94e5c9406b5 ("eal/arm: add CPU flags for ARMv7")
Fixes: 97523f822ba9 ("eal/arm: add CPU flags for ARMv8")
Fixes: 9ae155385686 ("eal/ppc: cpu flag checks for IBM Power")
Cc: stable@dpdk.org

Signed-off-by: Lukasz Majczak <lma@semihalf.com>
Acked-by: Jan Viktorin <viktorin@rehivetech.com>
Acked-by: Chao Zhu <chaozhu@linux.vnet.ibm.com>
2017-10-06 00:15:43 +02:00