This commit adds a new function to disable the runtime mapped
service-cores check. This allows an application to take responsibility
of running unmapped services.
This feature is useful in cases like unit tests, where the application
code (or unit test in this case) requires accurate control over when
the service function is called to ensure correct behaviour, and when
an application has an advanced use-case and wishes to manage services
manually.
Signed-off-by: Harry van Haaren <harry.van.haaren@intel.com>
Acked-by: Pavan Nikhilesh <pbhagavatula@caviumnetworks.com>
This commit adds a new function which allows an application lcore
(aka; *not* a dedicated service-lcore) to run a service. This
function is required to allow applications to gradually port
functionality to using services, while still being able to run
ordinary application functions.
This requirement became clear when a patch to the existing
eventdev/pipeline sample app was modified to use a service-core
for scheduling - and that same core should be capable of running
the "worker()" function from the application too.
This patch refactors the existing running code into a smaller
"service_run" function, which can be called by the dedicated
service-core loop, and the newly added function.
[1] http://dpdk.org/ml/archives/dev/2017-October/079876.html
Signed-off-by: Harry van Haaren <harry.van.haaren@intel.com>
Acked-by: Pavan Nikhilesh <pbhagavatula@caviumnetworks.com>
The starting point is known. The iterator can be directly set to it.
The function rte_bus_find can easily be used with a comparison function
always returning True. This would make it a regular bus iterator.
Users doing so would however accomplish such iteration in
O(N * N/2) = O(N^2)
Which can be avoided.
Signed-off-by: Gaetan Rivet <gaetan.rivet@6wind.com>
Reviewed-by: Ferruh Yigit <ferruh.yigit@intel.com>
Remove device reset during application start, the reset for application
exit still there.
Reset in open removed because of following comments:
1- Device reset not completed when VF driver loaded, which cause VF PMD
initialization error.
Adding delay can solve the issue but will increase driver load time.
2- Reset will be issues all devices unconditionally, not very efficient
way.
Fixes: b58eedfc7d ("igb_uio: issue FLR during open and release of device file")
Cc: stable@dpdk.org
Signed-off-by: Ferruh Yigit <ferruh.yigit@intel.com>
Tested-by: Harish Patil <harish.patil@cavium.com>
Tested-by: Shijith Thotton <shijith.thotton@caviumnetworks.com>
Tested-by: Jingjing Wu <jingjing.wu@intel.com>
Since the functions exported by DPDK EAL on all OS's should be
identical, we should not need separate function version files for each
OS. Therefore move existing version files to the top-level EAL
directory.
Signed-off-by: Bruce Richardson <bruce.richardson@intel.com>
The linuxapp and bsdapp interrupt header files are now identical, so
merge them into a common file in common/include.
Signed-off-by: Bruce Richardson <bruce.richardson@intel.com>
A number of interrupt functions only existed on Linux. Adding in stubs
for these functions corrects this omission, and allows the map files for
both Linux and FreeBSD to be identical.
Fixes: 9efe9c6cdc ("eal/linux: add epoll wrappers")
Signed-off-by: Bruce Richardson <bruce.richardson@intel.com>
The bsdapp-specific rte_interrupts.h file does not need to be different
from the linuxapp one, as there is nothing Linux specific in the APIs or
data structures. This will then allow us to merge the files in a common
location to avoid duplication.
Signed-off-by: Bruce Richardson <bruce.richardson@intel.com>
If the default location for the PMD .so files does not exist, it should
not be treated as a fatal error condition like an incorrect path on the
command line. Therefore check that the path exists and is a directory
before adding it to the list of paths to check for PMDs.
Signed-off-by: Bruce Richardson <bruce.richardson@intel.com>
Remove rte_set_log_level(), rte_get_log_level(),
rte_set_log_type(), and rte_get_log_type().
Also update librte_eal.so version in docuementation.
The LIBABIVER variable in eal has already been modified in
commit f26ab687a7 ("eal: remove Xen dom0 support").
Signed-off-by: Olivier Matz <olivier.matz@6wind.com>
When getting group fd from primary process, secondary wasn't storing
the fd anywhere, leading to a (harmless) error message in EAL logs,
and (not so harmless) potential problems when hot-unplugging devices
managed by VFIO in a secondary process.
Fix it by actually storing the group fd whenever we get a valid one
from the secondary process.
Fixes: 94c0776b1b ("vfio: support hotplug")
Cc: stable@dpdk.org
Signed-off-by: Anatoly Burakov <anatoly.burakov@intel.com>
VFIO may be used by buses other than PCI. This patch enables
the VFIO on the basis of vfio root presence.
Since vfio_enable should be called only once, pci_vfio_enable
is also removed.
A debug print is added in case vfio_pci module is not present.
Signed-off-by: Hemant Agrawal <hemant.agrawal@nxp.com>
Acked-by: Anatoly Burakov <anatoly.burakov@intel.com>
Compile fails when kernel version is <= 3.17 with error:
"dereferencing pointer to incomplete type". This is because struct
uio_device definition is not exposed in kernel earlier than 3.17.
This patch fixes it by using pointer of rte_uio_pci_dev as
dev_id instead of uio_device for irq device handler.
Fixes: 5f6ff30dc5 ("igb_uio: fix interrupt enablement after FLR in VM")
Cc: stable@dpdk.org
Signed-off-by: Jingjing Wu <jingjing.wu@intel.com>
Reviewed-by: Jianfeng Tan <jianfeng.tan@intel.com>
build error:
build/lib/librte_eal/linuxapp/kni/kni_net.c:215:5: error:
‘struct net_device’ has no member named ‘trans_start’
dev->trans_start = jiffies;
Signed-off-by: Nirmoy Das <ndas@suse.de>
Signed-off-by: Ferruh Yigit <ferruh.yigit@intel.com>
If pass-through a VF by vfio-pci to a Qemu VM, after FLR
in VM, the interrupt setting is not recoverd correctly
to host as below:
in VM guest:
Capabilities: [70] MSI-X: Enable+ Count=5 Masked-
in Host:
Capabilities: [70] MSI-X: Enable+ Count=5 Masked-
That was because in pci_reset_function, it first reads the
PCI configure and set FLR reset, and then writes PCI configure
as restoration. But not all the writing are successful to Host.
Because vfio-pci driver doesn't allow directly write PCI MSI-X
Cap.
To fix this issue, we need to move the interrupt enablement from
igb_uio probe to open device file. While it is also the similar as
the behaviour in vfio_pci kernel module code.
Fixes: b58eedfc7d ("igb_uio: issue FLR during open and release of device file")
Cc: stable@dpdk.org
Signed-off-by: Jingjing Wu <jingjing.wu@intel.com>
Signed-off-by: Jianfeng Tan <jianfeng.tan@intel.com>
Tested-by: Shijith Thotton <shijith.thotton@caviumnetworks.com>
Acked-by: Ferruh Yigit <ferruh.yigit@intel.com>
MSI masks contain a 1 if interrupt is masked, 0 if unmasked.
I got that wrong with the !!state calculation. For better
readability, the mask is now changed like in igbuio_msi_mask_irq.
Fixes: a8ea1e5fb6 ("igb_uio: fix unknown MSI symbols")
Signed-off-by: Markus Theil <markus.theil@tu-ilmenau.de>
Tested-by: Markus Theil <markus.theil@tu-ilmenau.de>
Acked-by: Ferruh Yigit <ferruh.yigit@intel.com>
This patch partially reverts the commit d196343a25 and adds some
functions from Markus' previous version of the patch [1].
igb_uio uses pci_msi_unmask_irq() and pci_msi_mask_irq() kernel APIs
when kernel version is >= 3.19 because these APIs are implemented in
this Linux kernel version.
But these APIs only exported beginning from Linux kernel 4.5, so before
this Linux kernel version igb_uio kernel module is not usable,
and giving following warnings:
"igb_uio: Unknown symbol pci_msi_unmask_irq"
"igb_uio: Unknown symbol pci_msi_mask_irq"
The support for these APIs increased to Linux kernel >= 4.5
For older version of Linux kernel unmask_msi_irq() and mask_msi_irq()
are used but these functions are not exported at all.
Instead of these functions switched back to previous implementation in
igb_uio for MSI-X, and for MSI used igbuio_msi_mask_irq() from [1].
[1]
http://dpdk.org/dev/patchwork/patch/28144/
Fixes: d196343a25 ("igb_uio: use kernel functions for masking MSI-X")
Signed-off-by: Ferruh Yigit <ferruh.yigit@intel.com>
This patch dynamically selects functions of memcpy at run-time based
on CPU flags that current machine supports. This patch uses function
pointers which are bind to the relative functions at constrctor time.
In addition, AVX512 instructions set would be compiled only if users
config it enabled and the compiler supports it.
Signed-off-by: Xiaoyun Li <xiaoyun.li@intel.com>
First, try to use CPUID Time Stamp Counter and Nominal Core Crystal
Clock Information Leaf to determine the tsc hz on platforms that
supports it (does not require privileged user).
If the CPUID leaf is not available, then try to determine the tsc hz by
reading the MSR 0xCE (requires privileged user).
Default to the tsc hz estimation if both methods fail.
Signed-off-by: Sergio Gonzalez Monroy <sergio.gonzalez.monroy@intel.com>
Acked-by: Harry van Haaren <harry.van.haaren@intel.com>
Tested-by: Bruce Richardson <bruce.richardson@intel.com>
In ppc_64, rte_rdtsc() returns timebase register value which increments
at independent timebase frequency and hence not related to lcore cpu
frequency to derive TSC hz. Hence, we stick with master lcore frequency.
Signed-off-by: Jerin Jacob <jerin.jacob@caviumnetworks.com>
Signed-off-by: Gowrishankar Muthukrishnan <gowrishankar.m@linux.vnet.ibm.com>
Use cntvct_el0 system register to get the system counter frequency.
If the system is configured with RTE_ARM_EAL_RDTSC_USE_PMU then
return 0(let the common code calibrate the tsc frequency).
Signed-off-by: Jerin Jacob <jerin.jacob@caviumnetworks.com>
Acked-by: Jianbo Liu <jianbo.liu@linaro.org>
When calibrating the TSC frequency, first, probe the architecture specific
function. If not available, use the existing calibrate scheme.
Signed-off-by: Jerin Jacob <jerin.jacob@caviumnetworks.com>
Signed-off-by: Thomas Monjalon <thomas@monjalon.net>
Tested-by: Jerin Jacob <jerin.jacob@caviumnetworks.com>
The librte_sched uses rte_bitmap to manage large arrays of bits in an
optimized method so, moving it to eal/common would allow other libraries
and applications to use it.
Signed-off-by: Pavan Nikhilesh <pbhagavatula@caviumnetworks.com>
Acked-by: Cristian Dumitrescu <cristian.dumitrescu@intel.com>
In order to achieve fully reproducible builds, always use the same
inclusion order for headers in the Makefiles.
Signed-off-by: Luca Boccassi <luca.boccassi@gmail.com>
With the introduction of IOVA mode, the only blocker to run
with 4KB pages for NICs binding to vfio-pci, is that
RTE_BAD_PHYS_ADDR is not a valid IOVA address.
We can refine this by using VA as IOVA if it's IOVA mode.
Signed-off-by: Jianfeng Tan <jianfeng.tan@intel.com>
Acked-by: Anatoly Burakov <anatoly.burakov@intel.com>
Acked-by: Santosh Shukla <santosh.shukla@caviumnetworks.com>
Allow sufficient space for UUID in string form (36+1).
Needed to use UUID with Hyper-V.
Signed-off-by: Stephen Hemminger <sthemmin@microsoft.com>
Reviewed-by: Ferruh Yigit <ferruh.yigit@intel.com>
This patch adds GSO support for TCP/IPv4 packets. Supported packets
may include a single VLAN tag. TCP/IPv4 GSO doesn't check if input
packets have correct checksums, and doesn't update checksums for
output packets (the responsibility for this lies with the application).
Additionally, TCP/IPv4 GSO doesn't process IP fragmented packets.
TCP/IPv4 GSO uses two chained MBUFs, one direct MBUF and one indrect
MBUF, to organize an output packet. Note that we refer to these two
chained MBUFs as a two-segment MBUF. The direct MBUF stores the packet
header, while the indirect mbuf simply points to a location within the
original packet's payload. Consequently, use of the GSO library requires
multi-segment MBUF support in the TX functions of the NIC driver.
If a packet is GSO'd, TCP/IPv4 GSO reduces its MBUF refcnt by 1. As a
result, when all of its GSOed segments are freed, the packet is freed
automatically.
Signed-off-by: Jiayu Hu <jiayu.hu@intel.com>
Signed-off-by: Mark Kavanagh <mark.b.kavanagh@intel.com>
Acked-by: Konstantin Ananyev <konstantin.ananyev@intel.com>
Tested-by: Lei Yao <lei.a.yao@intel.com>
Bus scan is responsible for finding devices over *all* buses.
Some of these buses might not be able to scan but that should
not prevent other buses to be scanned.
Same is the case for probing. It is possible that some devices which
were scanned didn't have a specific driver. That should not prevent
other buses from being probed.
Signed-off-by: Shreyansh Jain <shreyansh.jain@nxp.com>
Tested-by: Hemant Agrawal <hemant.agrawal@nxp.com>
This function can be used to check the role of a specific lcore.
Signed-off-by: Pavan Nikhilesh <pbhagavatula@caviumnetworks.com>
Acked-by: Harry van Haaren <harry.van.haaren@intel.com>
GCC does have the __get_cpuid_count builtin which checks for maximum
supported leaf, but implementations differ between CLANG and GCC.
This change provides an implementation compatible with both GCC and
CLANG 3.4+.
Signed-off-by: Sergio Gonzalez Monroy <sergio.gonzalez.monroy@intel.com>
Reviewed-by: Ferruh Yigit <ferruh.yigit@intel.com>
Split pci_vfio_map_resource for primary and secondary processes.
Save all relevant mapping data in primary process to allow
the secondary process to perform mappings.
Signed-off-by: Jonas Pfefferle <jpf@zurich.ibm.com>
Acked-by: Anatoly Burakov <anatoly.burakov@intel.com>
DMA window size needs to be big enough to span all memory segment's
physical addresses. We do not need multiple levels of IOMMU tables
as we already span ~70TB of physical memory with 16MB hugepages.
Signed-off-by: Jonas Pfefferle <jpf@zurich.ibm.com>
Acked-by: Anatoly Burakov <anatoly.burakov@intel.com>
Reviewed-by: Alexey Kardashevskiy <aik@ozlabs.ru>
Normally, command line argument strings are considered immutable, but
SPDK [1] and urdma [2] construct argv arrays to pass to rte_eal_init().
These strings are allocated using malloc() and freed after DPDK
initialization with free(). However, in the case of --file-prefix and
--huge-dir, DPDK takes the pointer to these strings in argv directly. If
a secondary process calls rte_eal_pci_probe() after rte_eal_init()
returns, as is done by SPDK, this causes a use-after-free error because
the strings have been freed by the calling code immediately after
rte_eal_init() returns.
This problem was observed when running SPDK example programs as a
secondary process and causes the secondary processes to fail:
Starting DPDK 16.11.1 initialization...
[ DPDK EAL parameters: identify -c 4 --file-prefix=spdk3260 --base-virtaddr=0x1000000000 --proc-type=auto ]
EAL: Detected 40 lcore(s)
EAL: Auto-detected process type: SECONDARY
EAL: Probing VFIO support...
EAL: VFIO support initialized
EAL: PCI device 0000:81:00.0 on NUMA socket 1
EAL: probe driver: 8086:953 spdk_nvme
EAL: cannot connect to primary process!
EAL: Error - exiting with code: 1
Cause: Requested device 0000:81:00.0 cannot be used
Running strace shows that the file prefix has been zero'd out by the
time that the secondary process attempts to probe the NVMe device.
The use-after-free errors can be easily detected with valgrind:
==8489== Invalid read of size 1
==8489== at 0x4C30D22: strlen (in /usr/lib/valgrind/vgpreload_memcheck-amd64-linux.so)
==8489== by 0x58DB955: vfprintf (vfprintf.c:1637)
==8489== by 0x59A4685: __vsnprintf_chk (vsnprintf_chk.c:63)
==8489== by 0x59A45E7: __snprintf_chk (snprintf_chk.c:34)
==8489== by 0x1246AB: get_socket_path.constprop.0 (in /home/pmacarth/src/spdk/examples/nvme/identify/identify)
==8489== by 0x124B09: vfio_mp_sync_connect_to_primary (in /home/pmacarth/src/spdk/examples/nvme/identify/identify)
==8489== by 0x123BE4: vfio_get_group_fd.part.1 (in /home/pmacarth/src/spdk/examples/nvme/identify/identify)
==8489== by 0x124366: vfio_setup_device (in /home/pmacarth/src/spdk/examples/nvme/identify/identify)
==8489== by 0x126C8A: pci_vfio_map_resource (in /home/pmacarth/src/spdk/examples/nvme/identify/identify)
==8489== by 0x12B115: pci_probe_all_drivers.part.0 (in /home/pmacarth/src/spdk/examples/nvme/identify/identify)
==8489== by 0x12B596: rte_eal_pci_probe (in /home/pmacarth/src/spdk/examples/nvme/identify/identify)
==8489== by 0x11D5B5: spdk_pci_enumerate (pci.c:147)
==8489== Address 0x63f362e is 14 bytes inside a block of size 32 free'd
==8489== at 0x4C2ED5B: free (in /usr/lib/valgrind/vgpreload_memcheck-amd64-linux.so)
==8489== by 0x11E6FB: spdk_free_args (init.c:136)
==8489== by 0x11EBF5: spdk_env_init (init.c:309)
==8489== by 0x10D2AA: main (identify.c:976)
==8489== Block was alloc'd at
==8489== at 0x4C2DB2F: malloc (in /usr/lib/valgrind/vgpreload_memcheck-amd64-linux.so)
==8489== by 0x11E7D7: _sprintf_alloc (init.c:76)
==8489== by 0x11EA78: spdk_build_eal_cmdline (init.c:251)
==8489== by 0x11EA78: spdk_env_init (init.c:282)
==8489== by 0x10D2AA: main (identify.c:976)
==8489==
Fix this by using strdup() to create separate memory buffers for these
strings. Note that this patch will cause valgrind to report memory
leaks of these buffers as there is nowhere to free them. Using static
buffers is an option but would make these strings have a fixed maximum
length whereas there is currently no limit defined by the API.
[1] http://spdk.io
[2] https://github.com/zrlio/urdma
Fixes: af75078fec ("first public release")
Cc: stable@dpdk.org
Signed-off-by: Patrick MacArthur <patrick@patrickmacarthur.net>
Acked-by: Sergio Gonzalez Monroy <sergio.gonzalez.monroy@intel.com>
If mmap fails, it will return the value MAP_FAILED. Checking for this
return code allows us to properly identify mmap failures and report
them as such to the calling function.
Signed-off-by: Seth Howell <seth.howell@intel.com>
Acked-by: Sergio Gonzalez Monroy <sergio.gonzalez.monroy@intel.com>
malloc_elem_free() is clearing(setting to 0) the trailer cookie when
RTE_MALLOC_DEBUG is enabled. In case of joining free neighbor element,
part of joined memory is not getting cleared due to missing the length
of trailer cookie in the middle.
This patch fixes calculation of free memory length to be cleared in
malloc_elem_free() by including trailer cookie.
Fixes: af75078fec ("first public release")
Cc: stable@dpdk.org
Signed-off-by: Xueming Li <xuemingl@mellanox.com>
Acked-by: Sergio Gonzalez Monroy <sergio.gonzalez.monroy@intel.com>
Currently, enabling assertion have to set CONFIG_RTE_LOG_LEVEL to
RTE_LOG_DEBUG. CONFIG_RTE_LOG_LEVEL is the default log level of control
path, RTE_LOG_DP_LEVEL is the log level of data path. It's a little bit
hard to understand literally that assertion is decided by control path
LOG_LEVEL, especially assertion used on data path.
On the other hand, DPDK need an assertion enabling switch w/o impacting
log output level, assuming "--log-level" not specified.
Assertion is an important API to balance DPDK high performance and
robustness. To promote assertion usage, it's valuable to unhide
assertion out of COFNIG_RTE_LOG_LEVEL.
In one word, log is log, assertion is assertion, debug is hot pot :)
Rationale of this patch is to introduce an dedicate switch of
assertion: RTE_ENABLE_ASSERT
Signed-off-by: Xueming Li <xuemingl@mellanox.com>
Acked-by: Gaetan Rivet <gaetan.rivet@6wind.com>
We remove xen-specific code in EAL, including the option --xen-dom0,
memory initialization code, compiling dependency, etc.
Related documents are removed or updated, and bump the eal library
version.
Signed-off-by: Jianfeng Tan <jianfeng.tan@intel.com>
Previously, to get MFN address in dom0, this API is a wrapper to
obtain the "physical address".
As we will removed xen dom0 support, this API is not necessary.
Signed-off-by: Jianfeng Tan <jianfeng.tan@intel.com>
Acked-by: Bruce Richardson <bruce.richardson@intel.com>
IGB_UIO compilation recently got enabled for ARM64 by default
The igb_uio compilation against ARM64 based stock 4.x (e.g. 4.13)
kernel is giving compilation warnings:
igb_uio.c: In function ‘igbuio_pci_irqcontrol’:
igb_uio.c:115:25: error: implicit declaration of function
‘irq_get_irq_dat ’ [-Werror=implicit-function-declaration]
struct irq_data *irq = irq_get_irq_data(udev->info.irq);
^
igb_uio.c:115:25: error: initialization makes pointer from integer without
a cast [-Werror=int-conversion]
Fixes: d196343a25 ("igb_uio: use kernel functions for masking MSI-X")
Signed-off-by: Hemant Agrawal <hemant.agrawal@nxp.com>
Tested-by: Santosh Shukla <santosh.shukla@caviumnetworks.com>
This is not bugfix, but it's convenient to help developer
to review and maintain the igbuio codes.
Signed-off-by: Tonghao Zhang <xiangxia.m.yue@gmail.com>
This patch adds MSI IRQ mode in a way, that should
also work on older kernel versions. The base for my patch
was an attempt to do this in cf705bc36c which was later
reverted in d8ee82745a. Compilation was tested on Linux 3.2,
4.10 and 4.12.
Signed-off-by: Markus Theil <markus.theil@tu-ilmenau.de>
Acked-by: Ferruh Yigit <ferruh.yigit@intel.com>
For better readability throughout the module, the destruction
order is changed to the exact inverse construction order.
Signed-off-by: Markus Theil <markus.theil@tu-ilmenau.de>
Acked-by: Ferruh Yigit <ferruh.yigit@intel.com>
The patch which introduced the usage of pci_alloc_irq_vectors
came after the patch which switched to non-threaded ISR (f0d1896fa1),
but did not use non-threaded ISR, if pci_alloc_irq_vectors
is used.
Fixes: 99bb58f3ad ("igb_uio: switch to new irq function for MSI-X")
Cc: stable@dpdk.org
Signed-off-by: Markus Theil <markus.theil@tu-ilmenau.de>
Acked-by: Ferruh Yigit <ferruh.yigit@intel.com>
igb_uio already allocates irqs using pci_alloc_irq_vectors on
recent kernels >= 4.8. The interrupt disable code was not
using the corresponding pci_free_irq_vectors, but the also
deprecated pci_disable_msix, before this fix.
Fixes: 99bb58f3ad ("igb_uio: switch to new irq function for MSI-X")
Cc: stable@dpdk.org
Signed-off-by: Markus Theil <markus.theil@tu-ilmenau.de>
Acked-by: Ferruh Yigit <ferruh.yigit@intel.com>