Currently, IPC API will silently ignore unsupported IPC.
Fix the API call and its callers to explicitly handle
unsupported IPC cases.
Signed-off-by: Anatoly Burakov <anatoly.burakov@intel.com>
move_pages() is only used to get the numa node id, but this function
is not allowed by default in docker (it needs CAP_SYS_NICE and an update of
the seccomp profile).
get_mempolicy() also requires CAP_SYS_NICE but doesn't need any change in
the default seccomp profile.
Note that the returned value of move_pages() was not checked, thus some
errors could be hidden (if the requested id was 0).
Fixes: 582bed1e1d1d ("mem: support mapping hugepages at runtime")
Cc: stable@dpdk.org
Signed-off-by: Nicolas Dichtel <nicolas.dichtel@6wind.com>
Reviewed-by: Olivier Matz <olivier.matz@6wind.com>
Reviewed-by: Didier Pallard <didier.pallard@6wind.com>
Reviewed-by: David Marchand <david.marchand@redhat.com>
Acked-by: Anatoly Burakov <anatoly.burakov@intel.com>
When checking RTE_PCI_DRV_IOVA_AS_VA flag to determine IOVA mode,
pci_one_device_has_iova_va() returns true only if kernel driver of the
device is vfio. However, Mellanox mlx4/5 PMD doesn't need to be detached
from kernel driver and attached to VFIO/UIO. Control path still goes
through the existing kernel driver, which is mlx4_core/mlx5_core. In order
to make RTE_PCI_DRV_IOVA_AS_VA effective for mlx4/mlx5 PMD, a new kernel
driver type has to be introduced.
Cc: stable@dpdk.org
Signed-off-by: Yongseok Koh <yskoh@mellanox.com>
The meson build never checked for the presence of rdrand and rdseed
instructions, while make build never checked for rdseed. Ensure builds
always have the appropriate checks - and therefore defines - for these
instructions. For runtime, we also add in rdseed to the list of known
bits returned from cpuid() instruction, so we can confirm its presence at
application init time.
Signed-off-by: Bruce Richardson <bruce.richardson@intel.com>
Tested-by: Mattias Rönnblom <mattias.ronnblom@ericsson.com>
The get_socket_mem_size() function is only used in 64-bit builds,
causing clang to warn about it for 32-bit builds. Add the __rte_unused
attribute to the function to silence the warning.
Signed-off-by: Bruce Richardson <bruce.richardson@intel.com>
Acked-by: Luca Boccassi <bluca@debian.org>
Since we now always use _FILE_OFFSET_BITS=64 flag when building
DPDK, we can remove the Makefile and C-file #defines setting it
individually for parts of the build.
Signed-off-by: Bruce Richardson <bruce.richardson@intel.com>
Acked-by: Luca Boccassi <bluca@debian.org>
The fields of the internal EAL core configuration are currently
laid bare as part of the API. This is not good practice and limits
fixing issues with layout and sizes.
Make new accessor functions for the fields used by current drivers
and examples.
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
Signed-off-by: David Marchand <david.marchand@redhat.com>
Purely cosmetic change, use unsigned int instead of unsigned alone.
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
Signed-off-by: David Marchand <david.marchand@redhat.com>
snprintf guarantees to always correctly place a null terminator
in the buffer string. So manually placing a null terminator
in a buffer right after a call to snprintf is redundant code.
Additionally, there is no need to use 'sizeof(buffer) - 1' in snprintf as this
means we are not using the last character in the buffer. 'sizeof(buffer)' is
enough.
Cc: stable@dpdk.org
Signed-off-by: Michael Santana <msantana@redhat.com>
Acked-by: Bruce Richardson <bruce.richardson@intel.com>
Acked-by: Anatoly Burakov <anatoly.burakov@intel.com>
When handling synchronous or asynchronous requests, the reply
must be sent explicitly even if the result of the operation is
an error, to avoid the other side timing out. Make note of this
in documentation explicitly.
Cc: stable@dpdk.org
Signed-off-by: Anatoly Burakov <anatoly.burakov@intel.com>
IPC and memory-related API's should not be mixed because memory
relies on IPC internally. Add explicit warnings to IPC API and
to the documentation about this.
Cc: stable@dpdk.org
Signed-off-by: Anatoly Burakov <anatoly.burakov@intel.com>
"VFIO group is not viable" error message is correct
but not very user friendly for something which can
usually be easily rectified.
Add some additional text to give more of a hint.
Signed-off-by: Kevin Traynor <ktraynor@redhat.com>
Acked-by: Anatoly Burakov <anatoly.burakov@intel.com>
Reviewed-by: Rami Rosen <ramirose@gmail.com>
The function check_input() was returning a bool as error code.
It is changed to return an int, semantically more correct.
While at it, make checks of validate_action_name() return
explicit as described in the coding guidelines.
Signed-off-by: Thomas Monjalon <thomas@monjalon.net>
Acked-by: Anatoly Burakov <anatoly.burakov@intel.com>
Length of buffer and number of fd's to send are signed values, so
they can be negative, but the API doesn't check for that. Fix it
by checking for negative values as well.
Fixes: bacaa2754017 ("eal: add channel for multi-process communication")
Cc: stable@dpdk.org
Signed-off-by: Anatoly Burakov <anatoly.burakov@intel.com>
Currently, IPC does not check received messages for invalid data
and passes them to user code unchanged. This may result in buffer
overruns on reading message data. Fix this by checking the message
length and fd number on receive, and discard any messages that
are not valid.
Fixes: bacaa2754017 ("eal: add channel for multi-process communication")
Cc: stable@dpdk.org
Signed-off-by: Anatoly Burakov <anatoly.burakov@intel.com>
According to manpage, ENOBUFS error indicates that either the
input or the output queue is full. This should be considered
an error, but it is treated as an "ignore" condition. Fix the
code to report an error instead.
Fixes: bacaa2754017 ("eal: add channel for multi-process communication")
Cc: stable@dpdk.org
Signed-off-by: Anatoly Burakov <anatoly.burakov@intel.com>
Acked-by: Rami Rosen <ramirose@gmail.com>
When sending multiple requests, rte_mp_request_sync
can succeed sending a few of those requests, but then
fail on a later one and in the end return with rc=-1.
The upper layers - e.g. device hotplug - currently
handles this case as if no messages were sent and no
memory for response buffers was allocated, which is
not true. Fixed by always freeing memory buffers on
failure.
Bugzilla ID: 228
Fixes: 783b6e54971d ("eal: add synchronous multi-process communication")
Cc: stable@dpdk.org
Signed-off-by: Herakliusz Lipiec <herakliusz.lipiec@intel.com>
Reviewed-by: Anatoly Burakov <anatoly.burakov@intel.com>
This message was missing newline, and should capitalize
"Cannot" like all the others in this area.
Fixes: ac9e4a17370f ("eal: support attach/detach shared device from secondary")
Cc: stable@dpdk.org
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
Acked-by: Rami Rosen <ramirose@gmail.com>
The function rte_eal_cleanup() was introduced more than one year ago,
in DPDK 18.02. It is no longer experimental, allowing
pdump, proc-info and hotplug_mp apps to not need any experimental API.
The function rte_ctrl_thread_create() was introduced one year ago
in DPDK 18.05. It is no longer experimental, allowing
KNI PMD and TEP example to not need any experimental API.
The functions rte_socket_count() and rte_socket_id_by_idx() were
introduced one year ago in DPDK 18.05. They are no longer experimental.
The function rte_dev_is_probed() was introduced half a year ago
in DPDK 18.11. It is no longer experimental.
Signed-off-by: Thomas Monjalon <thomas@monjalon.net>
Acked-by: Hemant Agrawal <hemant.agrawal@nxp.com>
Acked-by: Kevin Traynor <ktraynor@redhat.com>
The type for MAC address should be unsigned.
Fixes: 1cfe212ed17a ("kni: support MAC address change")
Cc: stable@dpdk.org
Signed-off-by: Jie Pan <panjie5@jd.com>
Acked-by: Ferruh Yigit <ferruh.yigit@intel.com>
Reviewed-by: Rami Rosen <ramirose@gmail.com>
These APIs are available in DPDK for last 4 releases
and used by multiple drivers.
Signed-off-by: Hemant Agrawal <hemant.agrawal@nxp.com>
Acked-by: Stephen Hemminger <stephen@networkplumber.org>
Acked-by: Gaetan Rivet <gaetan.rivet@6wind.com>
Within EAL we had a series of if statements for selecting the EAL directory
to use. Now that the directory names match those of the OS's they are for
we can instead just use a generated subdirectory name, shortening the code.
To avoid strange errors, we still need to check for unsupported OS's, but
do this check up-front in the config meson.build file.
Signed-off-by: Bruce Richardson <bruce.richardson@intel.com>
Reviewed-by: David Marchand <david.marchand@redhat.com>
Acked-by: Luca Boccassi <bluca@debian.org>
Do a global replace of snprintf(..."%s",...) with strlcpy, adding in the
rte_string_fns.h header if needed. The function changes in this patch were
auto-generated via command:
spatch --sp-file devtools/cocci/strlcpy.cocci --dir . --in-place
and then the files edited using awk to add in the missing header:
gawk -i inplace '/include <rte_/ && ! seen { \
print "#include <rte_string_fns.h>"; seen=1} {print}'
Signed-off-by: Bruce Richardson <bruce.richardson@intel.com>
For files that already have rte_string_fns.h included in them, we can
do a straight replacement of snprintf(..."%s",...) with strlcpy. The
changes in this patch were auto-generated via command:
spatch --sp-file devtools/cocci/strlcpy-with-header.cocci --dir . --in-place
Signed-off-by: Bruce Richardson <bruce.richardson@intel.com>
When creating files on disk, e.g. for EAL configuration or shared memory
locks, etc., there is no need to grant any permissions on those files to
other users. All directories are already created with 0700 permissions, so
we should create all files with 0600 permissions.
Cc: stable@dpdk.org
Signed-off-by: Bruce Richardson <bruce.richardson@intel.com>
When enabling pedantic compilation with CONFIG_RTE_LIBRTE_MLX5_DEBUG,
the compiler complains about non standard 128-bit integer type:
include/rte_atomic_64.h:223:3: error:
ISO C does not support ‘__int128’ types [-Werror=pedantic]
It must be marked as an extension of the standard C language
to be accepted in pedantic compilation.
Fixes: 640c5f09ef2c ("eal/x86: add 128-bit atomic compare exchange")
Cc: gage.eads@intel.com
Reported-by: Ferruh Yigit <ferruh.yigit@intel.com>
Signed-off-by: Thomas Monjalon <thomas@monjalon.net>
Acked-by: Gage Eads <gage.eads@intel.com>
Tested-by: Ferruh Yigit <ferruh.yigit@intel.com>
baremetal execution environments may have a different
method to enable RTE_INIT instead of using compiler
constructor and/or OS specific linker scheme.
Allow an option to override RTE_INIT* macros using
rte_os.h or appropriate header file.
Signed-off-by: Jerin Jacob <jerinj@marvell.com>
Acked-by: Thomas Monjalon <thomas@monjalon.net>
This operation can be used for non-blocking algorithms, such as a
non-blocking stack or ring.
It is available only for x86_64.
Signed-off-by: Gage Eads <gage.eads@intel.com>
Reviewed-by: Honnappa Nagarahalli <honnappa.nagarahalli@arm.com>
The commit below added an address hint as starting address for 64-bit
systems in case an explicit base virtual address was not set by the user.
The justification for such hint was to help devices that work in VA
mode and has a address range limitation to work smoothly with the eal
memory subsystem.
While the base address value selected may work fine for the eal
initialization, it easily breaks when trying to register external memory
using rte_extmem_register API.
Trying to register anonymous memory on RH x86_64 machine took several
minutes, during them the function eal_get_virtual_area repeatedly
scanned for a good VA candidate.
The attempt to guess which VA address will be free for mapping will
always result in not portable, error prone code:
* different application may use different libraries along w/ DPDK. One
can never guess which library was called first and how much virtual
memory it consumed.
* external memory can be registered at any time in the application run
time.
In order not to break the existing secondary process design, this patch
only limits the max number of tries that will be done with the
address hint.
When the number of tries exceeds the threshold the code
will use the suggested address from kernel.
Fixes: 1df21702873d ("mem: use address hint for mapping hugepages")
Cc: stable@dpdk.org
Signed-off-by: Shahaf Shuler <shahafs@mellanox.com>
Tested-by: Anatoly Burakov <anatoly.burakov@intel.com>
Acked-by: Alejandro Lucero <alejandro.lucero@netronome.com>
The hexdump code obviously came from somewhere else originally.
It is not formatted according to DPDK coding style.
Also, drop the comment which is not useful the docbock comment
is already in the rte_hexdump.h
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
The divisor is not modified here. Doesn't really matter for optimizaton
since the function is inline already; but helps with expressing
intent.
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
Reviewed-by: Maxime Coquelin <maxime.coquelin@redhat.com>
Added meson workarounds to build helloworld on Windows.
Windows currently only supports kvargs and eal libraries.
This change restricts the build flow to supported libraries
only.
Signed-off-by: Anand Rawat <anand.rawat@intel.com>
Signed-off-by: Pallavi Kadam <pallavi.kadam@intel.com>
Reviewed-by: Jeff Shaw <jeffrey.b.shaw@intel.com>
Reviewed-by: Ranjit Menon <ranjit.menon@intel.com>
Acked-by: Harini Ramakrishnan <harini.ramakrishnan@microsoft.com>
Added headers to support Windows environment for common source.
These headers will have Windows specific implementions of the
system library APIs provided in Linux and FreeBSD.
Signed-off-by: Anand Rawat <anand.rawat@intel.com>
Signed-off-by: Pallavi Kadam <pallavi.kadam@intel.com>
Reviewed-by: Jeff Shaw <jeffrey.b.shaw@intel.com>
Reviewed-by: Ranjit Menon <ranjit.menon@intel.com>
Acked-by: Harini Ramakrishnan <harini.ramakrishnan@microsoft.com>
Adding sys/queue.h on Windows for supporting common code.
This implementation has BSD-3-Clause licensing.
Signed-off-by: Ranjit Menon <ranjit.menon@intel.com>
Signed-off-by: Anand Rawat <anand.rawat@intel.com>
Reviewed-by: Jeff Shaw <jeffrey.b.shaw@intel.com>
Acked-by: Harini Ramakrishnan <harini.ramakrishnan@microsoft.com>
Updated lib/meson.build to create shared libraries on Windows.
Added DEF files to list the exports for the eal and kvargs libraries.
Signed-off-by: Bruce Richardson <bruce.richardson@intel.com>
Signed-off-by: Anand Rawat <anand.rawat@intel.com>
Reviewed-by: Pallavi Kadam <pallavi.kadam@intel.com>
Reviewed-by: Ranjit Menon <ranjit.menon@intel.com>
Acked-by: Harini Ramakrishnan <harini.ramakrishnan@microsoft.com>
Updated rte_common.h to include rte_os.h to contain
OS specific macros and functions. Updated rte_string_fns.h
to include rte_common.h for rte_os.h
Signed-off-by: Anand Rawat <anand.rawat@intel.com>
Reviewed-by: Pallavi Kadam <pallavi.kadam@intel.com>
Reviewed-by: Ranjit Menon <ranjit.menon@intel.com>
Acked-by: Harini Ramakrishnan <harini.ramakrishnan@microsoft.com>
Added rte_os.h files to support OS specific functionality.
Updated build system to contain OS headers in the include
path.
Signed-off-by: Anand Rawat <anand.rawat@intel.com>
Reviewed-by: Pallavi Kadam <pallavi.kadam@intel.com>
Acked-by: Harini Ramakrishnan <harini.ramakrishnan@microsoft.com>
Acked-by: Thomas Monjalon <thomas@monjalon.net>
Added initial stub source files and required meson changes
for Windows support.
kernel/windows/meson is a stub file added to support
Windows specific source in future releases.
Signed-off-by: Pallavi Kadam <pallavi.kadam@intel.com>
Signed-off-by: Anand Rawat <anand.rawat@intel.com>
Reviewed-by: Jeff Shaw <jeffrey.b.shaw@intel.com>
Reviewed-by: Ranjit Menon <ranjit.menon@intel.com>
Acked-by: Harini Ramakrishnan <harini.ramakrishnan@microsoft.com>
Only one header file (rte_kni_common.h) was in the sub-directory
include/exec-env/
This file was installed in a sub-directory of the same name
in the makefile-based build.
Source and install directories are moved as below:
lib/librte_eal/linux/eal/include/exec-env/
-> lib/librte_eal/linux/eal/include/
build/include/exec-env/
-> build/include/
The consequence is to have a file hierarchy a bit more flat.
Signed-off-by: Thomas Monjalon <thomas@monjalon.net>
Reviewed-by: David Marchand <david.marchand@redhat.com>
Tested-by: David Marchand <david.marchand@redhat.com>
Acked-by: Ferruh Yigit <ferruh.yigit@intel.com>
Due to internal glibc limitations [1], DPDK may exhaust internal
file descriptor limits when using smaller page sizes, which results
in inability to use system calls such as select() by user
applications.
Single file segments option stores lock files per page to ensure
that pages are deleted when there are no more users, however this
is not necessary because the processes will be holding onto the
pages anyway because of mmap(). Thus, removing pages from the
filesystem is safe even though they may be used by some other
secondary process. As a result, single file segments mode no
longer stores inordinate amounts of segment fd's, and the above
issue with fd limits is solved.
However, this will not work for legacy mem mode. For that, simply
document that using bigger page sizes is the only option.
[1] https://mails.dpdk.org/archives/dev/2019-February/124386.html
Signed-off-by: Anatoly Burakov <anatoly.burakov@intel.com>
Currently, segment resizing code sits in one giant function which
handles both in-memory and regular modes. Split them up into
individual functions.
Signed-off-by: Anatoly Burakov <anatoly.burakov@intel.com>
On Linux, we currently initialize rte_alarms after
starting to listen for IPC hotplug requests, which gives
us a data race window. Upon receiving such hotplug
request we always try to set an alarm and this obviously
doesn't work if the alarms weren't initialized yet.
To fix it, we initialize alarms before starting to
listen for IPC hotplug messages. Specifically, we move
rte_eal_alarm_init() right after rte_eal_intr_init() as
it makes some sense to keep those two close to each other.
We update the BSD code as well to keep the initialization
order the same in both EAL implementations.
Fixes: 244d5130719c ("eal: enable hotplug on multi-process")
Cc: stable@dpdk.org
Signed-off-by: Darek Stojaczyk <dariusz.stojaczyk@intel.com>
MSI-X permits a device to allocate up to 2048 interrupts as per PCIe
spec.
Increase the max number of vectors to a reasonable value of 512.
Signed-off-by: Pavan Nikhilesh <pbhagavatula@marvell.com>
There is no guarantee that pthread_self() returns the thread ID or that
pthread_t is an integer. The thread ID is not that useful so simply
remove it.
This fixes the following warning when building with musl libc:
lib/librte_eal/linuxapp/eal/eal_dev.c: In function 'sigbus_handler':
lib/librte_eal/linuxapp/eal/eal_dev.c:70:3: warning:
cast from pointer to integer of different size [-Wpointer-to-int-cast]
(int)pthread_self(), info->si_addr);
^
Fixes: 0fc54536b14a ("eal: add failure handling for hot-unplug")
Cc: stable@dpdk.org
Signed-off-by: Natanael Copa <ncopa@alpinelinux.org>
The DPDK APIs expose 3 different modes to work with memory used for DMA:
1. Use the DPDK owned memory (backed by the DPDK provided hugepages).
This memory is allocated by the DPDK libraries, included in the DPDK
memory system (memseg lists) and automatically DMA mapped by the DPDK
layers.
2. Use memory allocated by the user and register to the DPDK memory
systems. Upon registration of memory, the DPDK layers will DMA map it
to all needed devices. After registration, allocation of this memory
will be done with rte_*malloc APIs.
3. Use memory allocated by the user and not registered to the DPDK memory
system. This is for users who wants to have tight control on this
memory (e.g. avoid the rte_malloc header).
The user should create a memory, register it through rte_extmem_register
API, and call DMA map function in order to register such memory to
the different devices.
The scope of the patch focus on #3 above.
Currently the only way to map external memory is through VFIO
(rte_vfio_dma_map). While VFIO is common, there are other vendors
which use different ways to map memory (e.g. Mellanox and NXP).
The work in this patch moves the DMA mapping to vendor agnostic APIs.
Device level DMA map and unmap APIs were added. Implementation of those
APIs was done currently only for PCI devices.
For PCI bus devices, the pci driver can expose its own map and unmap
functions to be used for the mapping. In case the driver doesn't provide
any, the memory will be mapped, if possible, to IOMMU through VFIO APIs.
Application usage with those APIs is quite simple:
* allocate memory
* call rte_extmem_register on the memory chunk.
* take a device, and query its rte_device.
* call the device specific mapping function for this device.
Future work will deprecate the rte_vfio_dma_map and rte_vfio_dma_unmap
APIs, leaving the rte device APIs as the preferred option for the user.
Signed-off-by: Shahaf Shuler <shahafs@mellanox.com>
Acked-by: Anatoly Burakov <anatoly.burakov@intel.com>
Acked-by: Gaetan Rivet <gaetan.rivet@6wind.com>