use case: if callback is used to receive message form socket,
and the message received is disconnect/error, this callback needs
to be unregistered, but cannot because it is still active.
With this patch it is possible to mark the callback to be
unregistered once the interrupt process is done with this
interrupt source.
Signed-off-by: Jakub Grajciar <jgrajcia@cisco.com>
Commit cdc242f260 says:
For Linux kernel 4.0 and newer, the ability to obtain
physical page frame numbers for unprivileged users from
/proc/self/pagemap was removed. Instead, when an IOMMU
is present, simply choose our own DMA addresses instead.
In this case the user still sees error messages, so adjust
the log levels. Later, other checks will ensure that errors
are logged in the appropriate cases.
Fixes: cdc242f260 ("eal/linux: support running as unprivileged user")
Cc: stable@dpdk.org
Signed-off-by: Kevin Traynor <ktraynor@redhat.com>
Acked-by: Aaron Conole <aconole@redhat.com>
The documentation for rte_realloc claims that the resized area
will always reside on the same NUMA node. This is not actually
the case - while *resized* area will be on the same NUMA node,
if resizing the area is not possible, then the memory will be
reallocated using rte_malloc(), which can allocate memory on
another NUMA node, depending on which lcore rte_realloc() was
called from and which NUMA nodes have memory available.
Fix the API doc to match the actual code of rte_realloc().
Fixes: af75078fec ("first public release")
Cc: stable@dpdk.org
Signed-off-by: Anatoly Burakov <anatoly.burakov@intel.com>
DPDK malloc library allows broken programs to work because
the semantics of zmalloc and malloc are the same.
This patch enables a more secure model which will catch
(and crash) programs that reuse memory already freed if
RTE_MALLOC_DEBUG is enabled.
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
Acked-by: Andrew Rybchenko <arybchenko@solarflare.com>
Reviewed-by: Anatoly Burakov <anatoly.burakov@intel.com>
The version number in the DPDK_VERSION file will never have an offset
that needs to be subtracted, so remove that logic from the version
string generation.
Signed-off-by: Bruce Richardson <bruce.richardson@intel.com>
Acked-by: Luca Boccassi <bluca@debian.org>
Since we have the version number in a separate file at the root level,
we should not need to duplicate this in rte_version.h too. Best
approach here is to move the macros for specifying the year/month/etc.
parts from the version header file to the build config file - leaving
the other utility macros for e.g. printing the version string, where they
are.
For "make", this is done by having a little bit of awk parse the version
file and pass the results through to the preprocessor for the config
generation stage.
For "meson", this is done by parsing the version and adding it to the
standard dpdk_conf object.
In both cases, we need to append a large number - in this case "99",
previously 16 in original code - to the version number when we want to do
version number comparisons. Without this, the release version e.g. 19.05.0
will compare as less than it's RC's e.g. 19.05.0-rc4. With it, the
comparison is correct as "19.05.0.99 > 19.05.0-rc4.99".
Signed-off-by: Bruce Richardson <bruce.richardson@intel.com>
Acked-by: Luca Boccassi <bluca@debian.org>
Acked-by: Thomas Monjalon <thomas@monjalon.net>
Currently, rte_realloc will not respect original allocation's
NUMA node when memory cannot be resized, and there is no
NUMA-aware equivalent of rte_realloc. This patch adds such a function.
The new API will ensure that reallocated memory stays on
requested NUMA node, as well as allow moving allocated memory
to a different NUMA node.
Signed-off-by: Tomasz Jozwiak <tomaszx.jozwiak@intel.com>
Acked-by: Anatoly Burakov <anatoly.burakov@intel.com>
Rather than using linuxapp and bsdapp everywhere, we can change things to
use the, more readable, terms "linux" and "freebsd" in our build configs.
Rather than renaming the configs we can just duplicate the existing ones
with the new names using symlinks, and use the new names exclusively
internally. ["make showconfigs" also only shows the new names to keep the
list short] The result is that backward compatibility is kept fully but any
new builds or development can be done using the newer names, i.e. both
"make config T=x86_64-native-linuxapp-gcc" and "T=x86_64-native-linux-gcc"
work.
Signed-off-by: Bruce Richardson <bruce.richardson@intel.com>
Rename the macro and all instances in DPDK code, but keep a copy of
the old macro defined for legacy code linking against DPDK
Signed-off-by: Bruce Richardson <bruce.richardson@intel.com>
Rename the macro to make things shorter and more comprehensible. For
both meson and make builds, keep the old macro around for backward
compatibility.
Signed-off-by: Bruce Richardson <bruce.richardson@intel.com>
The term "linuxapp" is a legacy one, but just calling the subdirectory
"linux" is just clearer for all concerned.
Signed-off-by: Bruce Richardson <bruce.richardson@intel.com>
The term "bsdapp" is a legacy one, but just calling the subdirectory
"freebsd" is just clearer for all concerned.
Signed-off-by: Bruce Richardson <bruce.richardson@intel.com>
-l and -c options are two ways to select the cores used by DPDK.
Their format differs, but the checks on the selected cores are the same.
Use an intermediate array to separate the specific parsing checks from
the common consistency checks.
The parsing functions now concentrate on validating the passed string
and do nothing more.
We can report all invalid core indexes rather than only the first error.
In the error log message, reporting [0, cfg->lcore_count - 1] as a valid
range is then wrong when the core list is not continuous.
Example on my 8 cpus laptop with core 2 and 6 disabled.
echo 0 > /sys/devices/system/cpu/cpu2/online
echo 0 > /sys/devices/system/cpu/cpu6/online
Before:
./master/app/testpmd -l 0-7 --no-huge -m 512 -- --total-num-mbufs 2048
EAL: Detected 6 lcore(s)
EAL: Detected 1 NUMA nodes
EAL: invalid core list, please check core numbers are in [0, 5] range
...
After:
./master/app/testpmd -l 0-7 --no-huge -m 512 -- --total-num-mbufs 2048
EAL: Detected 6 lcore(s)
EAL: Detected 1 NUMA nodes
EAL: lcore 2 unavailable
EAL: lcore 6 unavailable
EAL: invalid core list, please check specified cores are part of 0-1,3-5,7
...
Fixes: d888cb8b96 ("eal: add core list input format")
Fixes: b38693b612 ("eal: fix core number validation")
Cc: stable@dpdk.org
Signed-off-by: David Marchand <david.marchand@redhat.com>
We don't need to look for trailing spaces.
This is a copy/paste block from eal_parse_coremask().
Remove it and the associated comment.
Fixes: d888cb8b96 ("eal: add core list input format")
Cc: stable@dpdk.org
Signed-off-by: David Marchand <david.marchand@redhat.com>
Spawning the ctrl threads on anything that is not part of the eal
coremask is not that polite to the rest of the system, especially
when you took good care to pin your processes on cpu resources with
tools like taskset (linux) / cpuset (freebsd).
Rather than introduce yet another eal options to control on which cpu
those ctrl threads are created, let's take the startup cpu affinity
as a reference and remove the eal coremask from it.
If no cpu is left, then we default to the master core.
The cpuset is computed once at init before the original cpu affinity
is lost.
Introduced a RTE_CPU_AND macro to abstract the differences between linux
and freebsd respective macros.
Examples in a 4 cores FreeBSD vm:
$ ./build/app/testpmd -l 2,3 --no-huge --no-pci -m 512 \
-- -i --total-num-mbufs=2048
$ procstat -S 1057
PID TID COMM TDNAME CPU CSID CPU MASK
1057 100131 testpmd - 2 1 2
1057 100140 testpmd eal-intr-thread 1 1 0-1
1057 100141 testpmd rte_mp_handle 1 1 0-1
1057 100142 testpmd lcore-slave-3 3 1 3
$ cpuset -l 1,2,3 ./build/app/testpmd -l 2,3 --no-huge --no-pci -m 512 \
-- -i --total-num-mbufs=2048
$ procstat -S 1061
PID TID COMM TDNAME CPU CSID CPU MASK
1061 100131 testpmd - 2 2 2
1061 100144 testpmd eal-intr-thread 1 2 1
1061 100145 testpmd rte_mp_handle 1 2 1
1061 100147 testpmd lcore-slave-3 3 2 3
$ cpuset -l 2,3 ./build/app/testpmd -l 2,3 --no-huge --no-pci -m 512 \
-- -i --total-num-mbufs=2048
$ procstat -S 1065
PID TID COMM TDNAME CPU CSID CPU MASK
1065 100131 testpmd - 2 2 2
1065 100148 testpmd eal-intr-thread 2 2 2
1065 100149 testpmd rte_mp_handle 2 2 2
1065 100150 testpmd lcore-slave-3 3 2 3
Fixes: d651ee4919 ("eal: set affinity for control threads")
Cc: stable@dpdk.org
Signed-off-by: David Marchand <david.marchand@redhat.com>
Reviewed-by: Anatoly Burakov <anatoly.burakov@intel.com>
Reviewed-by: Olivier Matz <olivier.matz@6wind.com>
pthread_setaffinity_np returns a >0 value on error.
We could end up letting the ctrl threads on the current process cpu
affinity.
Fixes: d651ee4919 ("eal: set affinity for control threads")
Cc: stable@dpdk.org
Signed-off-by: David Marchand <david.marchand@redhat.com>
Reviewed-by: Anatoly Burakov <anatoly.burakov@intel.com>
Reviewed-by: Olivier Matz <olivier.matz@6wind.com>
pthread_getaffinity_np returns a >0 value when failing.
This is mainly for the sake of correctness.
The only case where it could fail is when passing an incorrect cpuset
size wrt to the kernel.
Fixes: 2eba8d21f3 ("eal: restrict cores auto detection")
Cc: stable@dpdk.org
Signed-off-by: David Marchand <david.marchand@redhat.com>
Reviewed-by: Anatoly Burakov <anatoly.burakov@intel.com>
Reviewed-by: Rami Rosen <ramirose@gmail.com>
The RTE_PMD_DEBUG_TRACE was only enabled for EVENTDEV_DEBUG and
that configuration is now handled by RTE_EDEV_LOG macros.
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
Reviewed-by: Ferruh Yigit <ferruh.yigit@intel.com>
Use dependency() instead of manual append to ldflags.
Move libbsd inclusion to librte_eal, so that all other libraries and
PMDs will inherit it.
Signed-off-by: Luca Boccassi <bluca@debian.org>
Acked-by: Bruce Richardson <bruce.richardson@intel.com>
Since compat library is only a single header, we can easily move it into
the EAL common headers instead of tracking it separately. The downside of
this is that it becomes a little more difficult to have any libs that are
built before EAL depend on it. Thankfully, this is not a major problem as
the only library which uses rte_compat.h and is built before EAL (kvargs)
already has the path to the compat.h header file explicitly called out as
an include path.
However, to ensure that we don't hit problems later with this, we can add
EAL common headers folder to the global include list in the meson build
which means that all common headers can be safely used by all libraries, no
matter what their build order.
As a side-effect, this patch also fixes an issue with building on BSD using
meson, due to compat lib no longer needing to be listed as a dependency.
Fixes: a8499f65a1 ("log: add missing experimental tag")
Cc: stable@dpdk.org
Signed-off-by: Bruce Richardson <bruce.richardson@intel.com>
Reviewed-by: David Marchand <david.marchand@redhat.com>
Tested-by: David Marchand <david.marchand@redhat.com>
Tested-by: Anatoly Burakov <anatoly.burakov@intel.com>
Acked-by: Thomas Monjalon <thomas@monjalon.net>
Add the strlcat function to DPDK to exist alongside the strlcpy one.
While strncat is generally safe for use for concatenation, the API for the
strlcat function is perhaps a little nicer to use, and supports truncation
detection.
See commit 5364de644a ("eal: support strlcpy function") for more
details on the function selection logic, since we only should be using the
DPDK-provided version when no system-provided version is present.
Signed-off-by: Bruce Richardson <bruce.richardson@intel.com>
Reviewed-by: Ferruh Yigit <ferruh.yigit@intel.com>
Modern memory mode allowes to not reserve any memory by the
'--socket-mem' option. i.e. it could be possible to specify
zero preallocated memory like '--socket-mem 0'.
Also, it should be possible to configure unlimited memory
allocations by '--socket-limit 0'.
Both cases are impossible now and blocks starting the DPDK
application:
./dpdk-app --socket-limit 0 <...>
EAL: invalid parameters for --socket-limit
EAL: Invalid 'command line' arguments.
Unable to initialize DPDK: Invalid argument
Fixes: 6b42f75632 ("eal: enable non-legacy memory mode")
Cc: stable@dpdk.org
Signed-off-by: Ilya Maximets <i.maximets@samsung.com>
Acked-by: Anatoly Burakov <anatoly.burakov@intel.com>
It is only possible to know IOMMU type of a given VFIO container
by attempting to initialize it. Since secondary process never
attempts to set up VFIO container itself (because they're shared
between primary and secondary), it never knows which IOMMU type
the container is using, and never sets up the appropriate config
structures. This results in inability to perform DMA mappings in
secondary process.
Fix this by allowing secondary process to query IOMMU type of
primary's default container at device initialization.
Note that this fix is assuming we're only interested in default
container.
Bugzilla ID: 174
Fixes: 6bcb7c95fe ("vfio: share default container in multi-process")
Cc: stable@dpdk.org
Signed-off-by: Anatoly Burakov <anatoly.burakov@intel.com>
Reviewed-by: Darek Stojaczyk <dariusz.stojaczyk@intel.com>
This fixes x86_64-native-linuxapp-clang build with
CONFIG_RTE_FORCE_INTRINSICS=y:
include/generic/rte_atomic.h:218:9: error:
implicit declaration of function '__atomic_exchange_2'
is invalid in C99 [-Werror,-Wimplicit-function-declaration]
include/generic/rte_atomic.h:501:9: error:
implicit declaration of function '__atomic_exchange_4'
is invalid in C99 [-Werror,-Wimplicit-function-declaration]
include/generic/rte_atomic.h:783:9: error:
implicit declaration of function '__atomic_exchange_8'
is invalid in C99 [-Werror,-Wimplicit-function-declaration]
We didn't caught this issue previously on other platforms because
CONFIG_RTE_FORCE_INTRINSICS enabled by default only for armv8.
Fixes: 7bdccb9307 ("eal: fix ARM build with clang")
Cc: stable@dpdk.org
Signed-off-by: Ilya Maximets <i.maximets@samsung.com>
When specifying parameters such as hugefile prefix from the
command-line, it is possibly to supply an empty string. This may
lead to various problems: for example, if hugefile prefix is
empty, the runtime config path construction may end up
looking like "/var/run/dpdk//_config", which will technically
work, but is wrong and places files in the wrong place.
To fix it, check lengths of such user-specified parameters for
hugefile prefix, as well as hugepage dir and user-specified
mbuf pool ops string.
Cc: stable@dpdk.org
Signed-off-by: Anatoly Burakov <anatoly.burakov@intel.com>
In the unlikely case when the dpdk application is started with no cpu
available in the [0, RTE_MAX_LCORE - 1] range, the master_lcore is
automatically chosen as RTE_MAX_LCORE which triggers an out of bound
access.
Either you have a crash then, or the initialisation fails later when
trying to pin the master thread on it.
In my test, with RTE_MAX_LCORE == 2:
$ taskset -c 2 ./master/app/testpmd --no-huge -m 512 --log-level *:debug
[...]
EAL: pthread_setaffinity_np failed
PANIC in eal_thread_init_master():
cannot set affinity
7: [./master/app/testpmd() [0x47f629]]
Bugzilla ID: 19
Fixes: 2eba8d21f3 ("eal: restrict cores auto detection")
Cc: stable@dpdk.org
Signed-off-by: David Marchand <david.marchand@redhat.com>
When incorrect core value or range provided,
as part of -l command line option, a crash occurs.
Added valid range checks to fix the crash.
Added ut check for negative core values.
Added unit test case for invalid core number range.
Fixes: d888cb8b96 ("eal: add core list input format")
Cc: stable@dpdk.org
Signed-off-by: Hari Kumar Vemula <hari.kumarx.vemula@intel.com>
Reviewed-by: David Marchand <david.marchand@redhat.com>
INFO is not correct when logging an error.
Fixes: 2395332798 ("eal: add option register infrastructure")
Cc: stable@dpdk.org
Signed-off-by: Gaetan Rivet <gaetan.rivet@6wind.com>
Not only check against other registered options, but also common EAL
options. This will mitigate user confusion.
Signed-off-by: Gaetan Rivet <gaetan.rivet@6wind.com>
Add a usage string field in rte_option, allowing to display
help to the user and describe which options are currently available.
Signed-off-by: Gaetan Rivet <gaetan.rivet@6wind.com>
Current options name can be passed with arbitrary format.
Force the use of "--" prefix and thus POSIX long options format.
This restricts the ability to introduce surprising options and will help
future additional checks.
Signed-off-by: Gaetan Rivet <gaetan.rivet@6wind.com>
In case DPDK built using GCC, RTE_TOOLCHAIN_CLANG is not defined.
But 'rte_atomic.h' is a generic header that included to the
external apps like OVS while building with DPDK. As a result,
clang build of OVS fails on armv8 if DPDK built using gcc:
include/generic/rte_atomic.h:215:9: error:
implicit declaration of function '__atomic_exchange_2'
is invalid in C99
include/generic/rte_atomic.h:494:9: error:
implicit declaration of function '__atomic_exchange_4'
is invalid in C99
include/generic/rte_atomic.h:772:9: error:
implicit declaration of function '__atomic_exchange_8'
is invalid in C99
We need to check for current compiler, not the compiler used for
DPDK build.
Fixes: 7bdccb9307 ("eal: fix ARM build with clang")
Cc: stable@dpdk.org
Signed-off-by: Ilya Maximets <i.maximets@samsung.com>
Acked-by: Thomas Monjalon <thomas@monjalon.net>
The original code was supposed to overwrite the value pointed to
by the pointer, but the new one is instead overwriting the
pointer value itself, which has no effect outside that function.
Fix it by adding a pointer dereference.
Fixes: 582bed1e1d ("mem: support mapping hugepages at runtime")
Cc: stable@dpdk.org
Signed-off-by: Anatoly Burakov <anatoly.burakov@intel.com>
A local variable ``flags`` was shadowing another variable from outer
scope. Fix this by renaming the variable and make it const.
Fixes: c127be93f6 ("mem: support using memfd segments for in-memory mode")
Cc: stable@dpdk.org
Signed-off-by: Anatoly Burakov <anatoly.burakov@intel.com>
Callbacks are only registered in the primary, so do not attempt to
unregister callbacks in secondary processes.
Fixes: 43e4631371 ("vfio: support memory event callbacks")
Cc: stable@dpdk.org
Signed-off-by: Anatoly Burakov <anatoly.burakov@intel.com>
On FreeBSD, closing the file descriptor drops the lock even if the
file descriptor was mmap'ed. This leads to the cleanup at the end
of EAL init to remove fbarray files that are still in use by the
process itself.
However, instead of working around this issue, we can take advantage
of the fact that FreeBSD doesn't really create any per-process
files in the first place, so no cleanup is actually needed.
Fixes: 0a529578f1 ("eal: clean up unused files on initialization")
Cc: stable@dpdk.org
Signed-off-by: Anatoly Burakov <anatoly.burakov@intel.com>
Currently, we use strdup in a few places to store command-line
parameter values for certain internal config values. There are
several issues with that.
First of all, they're never freed, so memory ends up leaking
either after EAL exit, or when these command-line options are
supplied multiple times.
Second of all, they're defined as `const char *`, so they
*cannot* be freed even if we wanted to.
Finally, strdup may return NULL, which will be stored in the
config. For most fields, NULL is a valid value, but for the
default prefix, the value is always expected to be valid.
To fix all of this, three things are done. First, we change
the definitions of these values to `char *` as opposed to
`const char *`. This does not break the ABI, and previous
code assumes constness (which is more restrictive), so it's
safe to do so.
Then, fix all usages of strdup to check return value, and add
a cleanup function that will free the memory occupied by
these strings, as well as freeing them before assigning a new
value to prevent leaks when parameter is specified multiple
times.
And finally, add an internal API to query hugefile prefix, so
that, absent of a valid value, a default value will be
returned, and also fix up all usages of hugefile prefix to
use this API instead of accessing hugefile prefix directly.
Bugzilla ID: 108
Signed-off-by: Anatoly Burakov <anatoly.burakov@intel.com>
Currently, malloc statistics and external heap creation code
use memory hotplug lock as a way to synchronize accesses to
heaps (as in, locking the hotplug lock to prevent list of heaps
from changing under our feet). At the same time, malloc
statistics code will also lock the heap because it needs to
access heap data and does not want any other thread to allocate
anything from that heap.
In such scheme, it is possible to enter a deadlock with the
following sequence of events:
thread 1 thread 2
rte_malloc()
rte_malloc_dump_stats()
take heap lock
take hotplug lock
failed to allocate,
attempt to take
hotplug lock
attempt to take heap lock
Neither thread will be able to continue, as both of them are
waiting for the other one to drop the lock. Adding an
additional lock will require an ABI change, so instead of
that, make malloc statistics calls thread-unsafe with
respect to creating/destroying heaps.
Fixes: 72cf92b318 ("malloc: index heaps using heap ID rather than NUMA node")
Cc: stable@dpdk.org
Signed-off-by: Anatoly Burakov <anatoly.burakov@intel.com>
When secondary process quit, the mp_socket* file still exist, that
cause rte_mp_request_sync fail when try to send message on a floating
socket.
The patch fix the issue by introduce a function rte_mp_channel_cleanup.
This function will be called by rte_eal_cleanup and it will close the
mp socket and delete the mp_socket* file.
Fixes: bacaa27540 ("eal: add channel for multi-process communication")
Cc: stable@dpdk.org
Signed-off-by: Qi Zhang <qi.z.zhang@intel.com>
Add missing implementation for 64-bit log2 function, and extend
the unit test to test this new function. Also, remove duplicate
reimplementation of this function from testpmd and memalloc.
Signed-off-by: Anatoly Burakov <anatoly.burakov@intel.com>
Add missing implementation for 64-bit fls function, and extend
unit test to test the new function as well.
Signed-off-by: Anatoly Burakov <anatoly.burakov@intel.com>
Add an rte_bsf64 function that follows the convention of existing
rte_bsf32 function. Also, add missing implementation for safe
version of rte_bsf32, and implement unit tests for all recently
added bsf varieties.
Signed-off-by: Anatoly Burakov <anatoly.burakov@intel.com>
The function rte_bsf64 was deprecated in a previous release, so
remove the function, and the deprecation notice associated with
it.
Signed-off-by: Anatoly Burakov <anatoly.burakov@intel.com>
When using --no-shconf or --in-memory modes, there is no runtime
directory to be created, so there is no point in attempting to
clean it.
Fixes: 0a529578f1 ("eal: clean up unused files on initialization")
Signed-off-by: Anatoly Burakov <anatoly.burakov@intel.com>
When running in no-huge mode, we anonymously allocate our memory.
While this works for regular NICs and vdev's, it's not suitable
for memory sharing scenarios such as virtio with vhost_user
backend.
To fix this, allocate no-huge memory using memfd, and register
it with memalloc just like any other memseg fd. This will enable
using rte_memseg_get_fd() API with --no-huge EAL flag.
Signed-off-by: Anatoly Burakov <anatoly.burakov@intel.com>
Acked-by: Tiwei Bie <tiwei.bie@intel.com>
Reviewed-by: Maxime Coquelin <maxime.coquelin@redhat.com>
Currently, only segment fd's for multi-file segments are supported,
while for memfd-backed no-huge memory we need single-file segments
mode. Add support for single-file segments in the internal API.
Signed-off-by: Anatoly Burakov <anatoly.burakov@intel.com>
Acked-by: Tiwei Bie <tiwei.bie@intel.com>
Reviewed-by: Maxime Coquelin <maxime.coquelin@redhat.com>
If memfd support was not compiled, or hugepage memfd support
is not available at runtime, the API will now return proper
error code, indicating that this API is unsupported. This
changes the API, so document the changes.
Fixes: 41dbdb6872 ("mem: add external API to retrieve page fd")
Fixes: 3a44687139 ("mem: allow querying offset into segment fd")
Cc: stable@dpdk.org
Signed-off-by: Anatoly Burakov <anatoly.burakov@intel.com>
Acked-by: Tiwei Bie <tiwei.bie@intel.com>
Reviewed-by: Maxime Coquelin <maxime.coquelin@redhat.com>
Segment fd API does not support getting segment fd's from
externally allocated memory, so return proper error code
on any attempts to do so. This changes API behavior, so
document the change as well.
Fixes: 5282bb1c36 ("mem: allow memseg lists to be marked as external")
Cc: stable@dpdk.org
Signed-off-by: Anatoly Burakov <anatoly.burakov@intel.com>
Acked-by: Tiwei Bie <tiwei.bie@intel.com>
Reviewed-by: Maxime Coquelin <maxime.coquelin@redhat.com>
Add multiprocess support for externally allocated memory areas that
are not added to DPDK heap (and add relevant doc sections).
Signed-off-by: Anatoly Burakov <anatoly.burakov@intel.com>
Acked-by: Yongseok Koh <yskoh@mellanox.com>
The general use-case of using external memory is well covered by
existing external memory API's. However, certain use cases require
manual management of externally allocated memory areas, so this
memory should not be added to the heap. It should, however, be
added to DPDK's internal structures, so that API's like
``rte_virt2memseg`` would work on such external memory segments.
This commit adds such an API to DPDK. The new functions will allow
to register and unregister externally allocated memory areas, as
well as documentation for them.
Signed-off-by: Anatoly Burakov <anatoly.burakov@intel.com>
Acked-by: Yongseok Koh <yskoh@mellanox.com>
Currently, destroying external heap chunk and its memseg list is
part of one process. When we will gain the ability to unregister
external memory from DPDK that doesn't have any heap structures
associated with it, we need to be able to find and destroy
memseg lists as well as heap data separately.
Signed-off-by: Anatoly Burakov <anatoly.burakov@intel.com>
Acked-by: Yongseok Koh <yskoh@mellanox.com>
Currently, creating external malloc heap involves also creating
a memseg list backing that malloc heap. We need to have them as
separate functions, to allow creating memseg lists without
creating a malloc heap.
Signed-off-by: Anatoly Burakov <anatoly.burakov@intel.com>
Acked-by: Yongseok Koh <yskoh@mellanox.com>
The external heaps API already implicitly expects start address
of the external memory area to be page-aligned, but it is not
enforced or documented. Fix this by implementing additional
parameter checks at memory add call, and document the page
alignment requirement explicitly.
Fixes: 7d75c31014 ("malloc: allow adding memory to named heaps")
Cc: stable@dpdk.org
Suggested-by: Yongseok Koh <yskoh@mellanox.com>
Signed-off-by: Anatoly Burakov <anatoly.burakov@intel.com>
Acked-by: Yongseok Koh <yskoh@mellanox.com>
We already trigger a mem event notification inside the walk function,
no need to do it twice.
Fixes: f32c7c9de9 ("malloc: enable event callbacks for external memory")
Cc: stable@dpdk.org
Signed-off-by: Anatoly Burakov <anatoly.burakov@intel.com>
When secondary process hotplugs memory, it sends a request
to primary, which then performs the real mmap() and sends
sync requests to all secondary processes. Upon receiving
such sync request, each secondary process will notify the
upper layers of hotplugged memory (and will call all
locally registered event callbacks).
In the end we'll end up with memory event callbacks fired
in all the processes except the primary, which is a bug.
This gets critical if memory is hotplugged while a VFIO
device is attached, as the VFIO memory registration -
which is done from a memory event callback present in the
primary process only - is never called.
After this patch, a primary process fires memory event
callbacks before secondary processes start their
synchronizations - both for hotplug and hotremove.
Fixes: 07dcbfe010 ("malloc: support multiprocess memory hotplug")
Cc: stable@dpdk.org
Signed-off-by: Seth Howell <seth.howell@intel.com>
Signed-off-by: Darek Stojaczyk <dariusz.stojaczyk@intel.com>
Reviewed-by: Anatoly Burakov <anatoly.burakov@intel.com>
SPDK uses the rte_mem_event_callback_register API to
create RDMA memory regions (MRs) for newly allocated regions
of memory. This is used in both the SPDK NVMe-oF target
and the NVMe-oF host driver.
DPDK creates internal malloc_elem structures for these
allocated regions. As users malloc and free memory, DPDK
will sometimes merge malloc_elems that originated from
different allocations that were notified through the
registered mem_event callback routine. This results
in subsequent allocations that can span across multiple
RDMA MRs. This requires SPDK to check each DPDK buffer to
see if it crosses an MR boundary, and if so, would have to
add considerable logic and complexity to describe that
buffer before it can be accessed by the RNIC. It is somewhat
analagous to rte_malloc returning a buffer that is not
IOVA-contiguous.
As a malloc_elem gets split and some of these elements
get freed, it can also result in DPDK sending an
RTE_MEM_EVENT_FREE notification for a subset of the
original RTE_MEM_EVENT_ALLOC notification. This is also
problematic for RDMA memory regions, since unregistering
the memory region is all-or-nothing. It is not possible
to unregister part of a memory region.
To support these types of applications, this patch adds
a new --match-allocations EAL init flag. When this
flag is specified, malloc elements from different
hugepage allocations will never be merged. Memory will
also only be freed back to the system (with the requisite
memory event callback) exactly as it was originally
allocated.
Since part of this patch is extending the size of struct
malloc_elem, we also fix up the malloc autotests so they
do not assume its size exactly fits in one cacheline.
Signed-off-by: Jim Harris <james.r.harris@intel.com>
Reviewed-by: Anatoly Burakov <anatoly.burakov@intel.com>
The RTE_PROC_PRIMARY error handler lost the unlock statement in the
current codes. Now unlock and return in one place to fix it.
Fixes: 49df3db848 ("memzone: replace memzone array with fbarray")
Cc: stable@dpdk.org
Signed-off-by: Gao Feng <davidfgao@tencent.com>
Acked-by: Anatoly Burakov <anatoly.burakov@intel.com>
Add the check for null peer pointer like the bundle pointer in the mp request
handler. They should follow same style. And add some logs for nomem cases.
Signed-off-by: Gao Feng <davidfgao@tencent.com>
Acked-by: Anatoly Burakov <anatoly.burakov@intel.com>
When rte_eal_alarm_set failed, need to free the bundle mem in the
error handler of handle_primary_request and handle_secondary_request.
Fixes: 244d513071 ("eal: enable hotplug on multi-process")
Fixes: ac9e4a1737 ("eal: support attach/detach shared device from secondary")
Cc: stable@dpdk.org
Signed-off-by: Gao Feng <davidfgao@tencent.com>
Acked-by: Anatoly Burakov <anatoly.burakov@intel.com>
Missing brackets around the if means that the loop will end at
its first iteration.
Fixes: 2395332798 ("eal: add option register infrastructure")
Cc: stable@dpdk.org
Signed-off-by: Gaetan Rivet <gaetan.rivet@6wind.com>
Reviewed-by: Anatoly Burakov <anatoly.burakov@intel.com>
Reviewed-by: David Marchand <david.marchand@redhat.com>
When creating process data structures, EAL will create many files
in EAL runtime directory. Because we allow multiple secondary
processes to run, each secondary process gets their own unique
file. With many secondary processes running and exiting on the
system, runtime directory will, over time, create enormous amounts
of sockets, fbarray files and other stuff that just sits there
unused because the process that allocated it has died a long time
ago. This may lead to exhaustion of disk (or RAM) space in the
runtime directory.
Fix this by removing every unlocked file at initialization that
matches either socket or fbarray naming convention. We cannot be
sure of any other files, so we'll leave them alone. Also, remove
similar code from mp socket code.
We do it at the end of init, rather than at the beginning, because
secondary process will use primary process' data structures even
if the primary itself has died, and we don't want to remove those
before we lock them.
Bugzilla ID: 106
Cc: stable@dpdk.org
Reported-by: Vipin Varghese <vipin.varghese@intel.com>
Signed-off-by: Anatoly Burakov <anatoly.burakov@intel.com>
When rte_log_register_type_and_pick_level() has been introduced, it has
been correctly added to the EXPERIMENTAL section of the eal map and the
symbol itself has been marked at its definition.
However, the declaration of this symbol in rte_log.h is missing the
__rte_experimental tag.
Because of this, a user can try to call this symbol without being aware
this is an experimental api (neither compilation nor link warning).
Fixes: b22e77c026 ("eal: register log type and pick level from args")
Cc: stable@dpdk.org
Signed-off-by: David Marchand <david.marchand@redhat.com>
Acked-by: Neil Horman <nhorman@tuxdriver.com>
Prior to this patch, the two affected .c files include <dirent.h>
unnecessarily. This commit removes the include lines.
Signed-off-by: Jeff Shaw <jeffrey.b.shaw@intel.com>
Reviewed-by: Rami Rosen <ramirose@gmail.com>
Reviewed-by: Ferruh Yigit <ferruh.yigit@intel.com>
Start version numbering for a new release cycle,
and introduce a template file for release notes.
The release notes comments are updated to mandate
a scope label for API and ABI changes.
Signed-off-by: Thomas Monjalon <thomas@monjalon.net>
Reviewed-by: Ferruh Yigit <ferruh.yigit@intel.com>
Acked-by: John McNamara <john.mcnamara@intel.com>
When RTE_EAL_NUMA_AWARE_HUGEPAGES is set to "n", not all memtypes
will be valid, because we skip some due to not supporting other
NUMA nodes, leading to a division by zero error down the line
because the necessary memtype fields weren't populated.
Fix it by limiting number of memtypes to number of memtypes we
have actually created.
Fixes: 1dd342d0fd ("mem: improve segment list preallocation")
Cc: stable@dpdk.org
Signed-off-by: Anatoly Burakov <anatoly.burakov@intel.com>
Tested-by: David Hunt <david.hunt@intel.com>
Even if a device failed to plug, it's still a device
object that references the devargs. Those devargs will
be freed automatically together with the device, but
freeing them any earlier - like it's done in the hotplug
error handling path right now - will give us a dangling
pointer and a segfault scenario.
Consider the following case:
* secondary process receives the hotplug request IPC message
* devargs are either created or updated
* the bus is scanned
* a new device object is created with the latest devargs
* the device can't be plugged for whatever reason,
bus->plug returns error
* the devargs are freed, even though they're still referenced
by the device object on the bus
For PCI devices, the generic device name comes from
a buffer within the devargs. Freeing those will make
EAL segfault whenever the device name is checked.
This patch just prevents the hotplug error handling
path from removing the devargs when there's a device
that references them. This is done by simply exiting
early from the hotplug function. As mentioned in the
beginning, those devargs will be freed later, together
with the device itself.
Fixes: 7e8b266501 ("eal: fix hotplug add / remove")
Signed-off-by: Darek Stojaczyk <dariusz.stojaczyk@intel.com>
Acked-by: Maxime Coquelin <maxime.coquelin@redhat.com>
Acked-by: Thomas Monjalon <thomas@monjalon.net>
Device detach triggered through IPC leaked some memory.
It allocated a devargs objects just to use it for
parsing the devargs string in order to retrieve the
device name. Those devargs weren't passed anywhere
and were never freed.
First of all, let's put those devargs on the stack,
so they doesn't need to be freed. Then free the
additional arguments string as soon as it's allocated,
because we won't need it.
Fixes: ac9e4a1737 ("eal: support attach/detach shared device from secondary")
Signed-off-by: Darek Stojaczyk <dariusz.stojaczyk@intel.com>
Acked-by: Qi Zhang <qi.z.zhang@intel.com>
Consider the following scenario:
1) primary process (A) starts, probes the bus
2) a secondary process (B) starts, probes the bus
3) yet another secondary process (C) starts
4) (C) registers the pci driver and hotplugs the device
* an IPC attach req is sent to the primary (A)
* (A) ignores the -EEXIST from process-local probe
* (A) propagates the request to all secondary processes
* (B) responds with -EEXIST
* (A) replies to the original request with the -EEXIST
return code
* the -EEXIST is returned back to the user, although the
device was successfully attached both locally and in
all other processes
This patch makes the primary process reply with rc=0 even if
there was another secondary process with the device already
attached. The primary process already didn't reply with -EEXIST
when the device was attached locally, so now this behavior is
even more consistent. Looking by the code, this seems to be the
originally intended behavior.
Fixes: ac9e4a1737 ("eal: support attach/detach shared device from secondary")
Signed-off-by: Darek Stojaczyk <dariusz.stojaczyk@intel.com>
Acked-by: Qi Zhang <qi.z.zhang@intel.com>
When primary process receives an IPC attach request
of a device that's already locally-attached, it
doesn't setup its variables properly and is prone to
segfaulting on a subsequent rollback.
`ret = local_dev_probe(req->devargs, &dev)`
The above function will set `dev` pointer to the
proper device *unless* it returns with error. One of
those errors is -EEXIST, which the hotplug function
explicitly ignores. For -EEXIST, it proceeds with
attaching the device and expects the dev pointer to
be valid.
This patch makes `local_dev_probe` set the dev pointer
even if it returns -EEXIST.
Fixes: ac9e4a1737 ("eal: support attach/detach shared device from secondary")
Signed-off-by: Darek Stojaczyk <dariusz.stojaczyk@intel.com>
If a device fails to attach before it's plugged,
the subsequent rollback will still try to detach it,
causing a segfault. Unplugging a device that wasn't
plugged isn't really supported, so this patch adds
an extra error check to prevent that from happening.
While here, fix this also for normal (non-rollback)
detach, which could also theoretically segfault on
non-plugged device.
Fixes: 244d513071 ("eal: enable hotplug on multi-process")
Signed-off-by: Darek Stojaczyk <dariusz.stojaczyk@intel.com>
If rte_eal_iopl_init() will be called more than once we'll leak
the file descriptor.
Fixes: b46fe31862 ("eal/bsd: fix virtio on FreeBSD")
Cc: stable@dpdk.org
Signed-off-by: Ilya Maximets <i.maximets@samsung.com>
Reviewed-by: Maxime Coquelin <maxime.coquelin@redhat.com>
In case of optimized compilation, RTE_BUILD_BUG_ON use an external
variable which is neither defined, nor used.
It seems not optimized out in case of OPDL compiled with clang -O1:
opdl_ring.c: undefined reference to `RTE_BUILD_BUG_ON_detected_error'
clang-6.0: fatal error: linker command failed with exit code 1
Fixes: af75078fec ("first public release")
Cc: stable@dpdk.org
Signed-off-by: Thomas Monjalon <thomas@monjalon.net>
Rename rte_bsf64 to rte_bsf64_safe (this is a "safe" version in
that it prevents undefined behavior by checking if incoming
parameter is zero) and move it to common header.
Signed-off-by: Anatoly Burakov <anatoly.burakov@intel.com>
Acked-by: Cristian Dumitrescu <cristian.dumitrescu@intel.com>
Acked-by: Jasvinder Singh <jasvinder.singh@intel.com>
Acked-by: Thomas Monjalon <thomas@monjalon.net>
RTE_BITMAP_OPTIMIZATIONS was never set to 0 and makes no sense
anyway, so remove all code related to it. Also, drop the "likely"
for bsf64 code, because it's a generic function and we cannot
make any assumptions about likely values of incoming arguments.
Signed-off-by: Anatoly Burakov <anatoly.burakov@intel.com>
Acked-by: Cristian Dumitrescu <cristian.dumitrescu@intel.com>
Previous fix for rte_panic has moved setting of alarm before
sending the message. This means that whether we send a message,
the alarm would still trigger. The comment noted that cleanup
would happen in the alarm handler, but that's not what actually
happened - instead, in the event of failed send we freed the
memory in-place, before putting the request on the queue.
This works OK when the message is sent, but when sending the
message fails, the alarm would still trigger with a pointer
argument that points to non-existent memory, and cause
memory corruption.
There probably is a "proper" fix for this issue, with correct
handling of sent vs. unsent requests, however it would be
simpler just to sacrifice the sent request in the (extremely
unlikely) event of alarm set failing. The other process would
still send a response, but it will be ignored by the sender.
Fixes: 45e5f49e87 ("ipc: remove panic in async request")
Signed-off-by: Anatoly Burakov <anatoly.burakov@intel.com>
When device be hot-unplugged, the hot-unplug handler will be invoked by uio
remove event and the device will be detached, then kernel will sent another
pci remove event. So if there is any unlock miss, it will cause a dead lock
issue. This patch will add this missing unlock for hot-unplug handler.
Fixes: 0fc54536b1 ("eal: add failure handling for hot-unplug")
Signed-off-by: Jeff Guo <jia.guo@intel.com>
Removed the use of MAP_HUGETLB for anonymous mapping on ppc64. The
MAP_HUGETLB had previously been added to workaround issues on IBM Power8
systems when mapping /dev/zero.
In the current code the MAP_HUGETLB flag will cause the anonymous mapping
to fail on Power9.
Note, Power8 is currently failing to correctly mmap Hugepages, with and
without this change.
Fixes: 284ae3e9ff ("eal/ppc: fix mmap for memory initialization")
Signed-off-by: David Wilder <dwilder@us.ibm.com>
Reviewed-by: Pradeep Satyanarayana <pradeep@us.ibm.com>
It may so happen that two memory locations may be adjacent in
virtual memory, but belong to different segment lists. With
current code, such segments will be concatenated. Fix the
adjacency checking code to also check if the adjacent malloc
elements belong to the same memseg list.
Fixes: 66cc45e293 ("mem: replace memseg with memseg lists")
Cc: stable@dpdk.org
Signed-off-by: Anatoly Burakov <anatoly.burakov@intel.com>
For IOVA as VA mode, we assume that memory is contiguous. However,
for external segments that assumption may not necessarily hold.
Fix the code to not assume that external memory segments are
contiguous even in IOVA as VA mode.
Fixes: 5282bb1c36 ("mem: allow memseg lists to be marked as external")
Signed-off-by: Anatoly Burakov <anatoly.burakov@intel.com>
The rte_eal_get_runtime_dir() function is currently being declared in two
header files.
This API was made public in commit 6911c9fd8f ("eal: export function to
get runtime directory"), adding it to rte_eal.h. To make it public, the
'rte' prefix was added to the function so it needed to be modified in the
original location of the declaration, eal_filesystem.h. By only modifying,
and not removing the decalration, it is now a duplicate.
This patch removes the declaration from eal_filesystem.h.
Fixes: 6911c9fd8f ("eal: export function to get runtime directory")
Reported-by: Anatoly Burakov <anatoly.burakov@intel.com>
Signed-off-by: Kevin Laatz <kevin.laatz@intel.com>
Acked-by: Anatoly Burakov <anatoly.burakov@intel.com>
This updates the license on the rte_rtm.h file to be the standard
BSD-3-Clause license used for the rest of DPDK, thus bringing the file in
compliance with the DPDK licensing policy.
Signed-off-by: Bruce Richardson <bruce.richardson@intel.com>
When TSX transactions abort, it is generally worth retrying a number of
times before falling back to the traditional locking path, as the
parallelism benefits from TSX can be worth it when a transaction does
succeed. For cases with multiple threads and high contention rates, it
can be useful to have increasing delays between retry attempts, so as to
avoid having the same threads repeatedly collided.
Signed-off-by: Bruce Richardson <bruce.richardson@intel.com>