IOVA mode in DPDK is either PA or VA.
The new build option enable_iova_as_pa configures the mode to PA
at compile time.
By default, this option is enabled.
If the option is disabled, only drivers which support it are enabled.
Supported driver can set the flag pmd_supports_disable_iova_as_pa
in its build file.
mbuf structure holds the physical (PA) and virtual address (VA).
If IOVA as PA is disabled at compile time, PA field (buf_iova)
of mbuf is redundant as it is the same as VA
and is replaced by a dummy field.
Signed-off-by: Shijith Thotton <sthotton@marvell.com>
Acked-by: Olivier Matz <olivier.matz@6wind.com>
During EAL init, all buses are probed and the devices found are
initialized. On eal_cleanup(), the inverse does not happen, meaning any
allocated memory and other configuration will not be cleaned up
appropriately on exit.
Currently, in order for device cleanup to take place, applications must
call the driver-relevant functions to ensure proper cleanup is done before
the application exits. Since initialization occurs for all devices on the
bus, not just the devices used by an application, it requires a)
application awareness of all bus devices that could have been probed on the
system, and b) code duplication across applications to ensure cleanup is
performed. An example of this is rte_eth_dev_close() which is commonly used
across the example applications.
This patch proposes adding bus cleanup to the eal_cleanup() to make EAL's
init/exit more symmetrical, ensuring all bus devices are cleaned up
appropriately without the application needing to be aware of all bus types
that may have been probed during initialization.
Contained in this patch are the changes required to perform cleanup for
devices on the PCI bus and VDEV bus during eal_cleanup(). There would be an
ask for bus maintainers to add the relevant cleanup for their buses since
they have the domain expertise.
Signed-off-by: Kevin Laatz <kevin.laatz@intel.com>
Acked-by: Morten Brørup <mb@smartsharesystems.com>
Reviewed-by: Bruce Richardson <bruce.richardson@intel.com>
In case of higher order (greater than 99) logical cores, name was
truncated (length is restricted to 16 characters, including the
terminating null byte ('\0')) and it makes hard to follow threads.
Before this fix, this issue can be reproduced using following arguments:
--lcores=0,10@1,100@2
Then we had:
lcore-worker-10
lcore-worker-10
Signed-off-by: Abdullah Ömer Yamaç <omer.yamac@ceng.metu.edu.tr>
Acked-by: Stephen Hemminger <stephen@networkplumber.org>
Do not include <ctype.h>, <errno.h>, and <stdlib.h> from <rte_common.h>,
because they are not used by this file.
Include the needed headers directly from the files that need them.
Signed-off-by: Dmitry Kozlyuk <dmitry.kozliuk@gmail.com>
Acked-by: Bruce Richardson <bruce.richardson@intel.com>
Add support for using hugepages for worker lcore stack memory. The
intent is to improve performance by reducing stack memory related TLB
misses and also by using memory local to the NUMA node of each lcore.
EAL option '--huge-worker-stack[=stack-size-in-kbytes]' is added to allow
the feature to be enabled at runtime. If the size is not specified,
the system pthread stack size will be used.
Signed-off-by: Don Wallwork <donw@xsightlabs.com>
Acked-by: Morten Brørup <mb@smartsharesystems.com>
Acked-by: Chengwen Feng <fengchengwen@huawei.com>
Bug scenario:
1. start testpmd:
$ dpdk-testpmd -l 4-6 -a 0000:7d:00.0 --trace=.* -- -i
2. quit testpmd and then observed segment fault:
Bye...
Segmentation fault (core dumped)
The root cause is that rte_trace_save() and eal_trace_fini() access
the huge pages which were cleanup by rte_eal_memory_detach().
This patch moves rte_trace_save() and eal_trace_fini() before
rte_eal_memory_detach() to fix the bug.
Fixes: dfbc61a2f9 ("mem: detach memsegs on cleanup")
Cc: stable@dpdk.org
Signed-off-by: Chengwen Feng <fengchengwen@huawei.com>
Tested-by: Jerin Jacob <jerinj@marvell.com>
Reviewed-by: David Marchand <david.marchand@redhat.com>
All OS implementations provide the same main loop.
Introduce helpers (shared for Linux and FreeBSD) to handle synchronisation
between main and threads and factorize the rest as common code.
Thread id are now logged as string in a common format across OS.
Note:
- this change also fixes Windows EAL: worker threads cpu affinity was
incorrectly reported in log.
- libabigail flags this change as breaking ABI in clang builds:
1 function with some indirect sub-type change:
[C] 'function int rte_eal_remote_launch(int (void*)*, void*, unsigned
int)' at eal_common_launch.c:35:1 has some indirect sub-type
changes:
parameter 1 of type 'int (void*)*' changed:
in pointed to type 'function type int (void*)' at rte_launch.h:31:1:
entity changed from 'function type int (void*)' to 'typedef
lcore_function_t' at rte_launch.h:31:1
type size hasn't changed
This is being investigated on libabigail side.
For now, we don't have much choice but to waive reports on this symbol.
Signed-off-by: David Marchand <david.marchand@redhat.com>
Acked-by: Morten Brørup <mb@smartsharesystems.com>
Acked-by: Tyler Retzlaff <roretzla@linux.microsoft.com>
So far, a worker thread has been using its thread_id to discover which
lcore has been assigned to it.
On the other hand, as noted by Tyler, the pthread API does not strictly
guarantee that a new thread won't start running eal_thread_loop before
pthread_create writes to &lcore_config[xx].thread_id.
Though all OS implementations supported in DPDK (recently) ensure this
property, it is more robust to have the main thread directly pass
the worker thread lcore.
Signed-off-by: David Marchand <david.marchand@redhat.com>
Functions like free, rte_free, and rte_mempool_free
already handle NULL pointer so the checks here are not necessary.
Remove redundant NULL pointer checks before free functions
found by nullfree.cocci
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
The mp action resources in malloc should be cleaned up via
rte_eal_cleanup.
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
Acked-by: Anatoly Burakov <anatoly.burakov@intel.com>
When rte_eal_cleanup is called, hotplug should unregister the
resources associated with the multi-process server.
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
When rte_eal_cleanup is called the rte_mp_action for VFIO
should be freed.
Fixes: edf73dd330 ("ipc: handle unsupported IPC in action register")
Cc: stable@dpdk.org
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
Acked-by: Anatoly Burakov <anatoly.burakov@intel.com>
When application calls rte_eal_cleanup on shutdown,
the DPDK log should be closed and cleaned up.
This helps reduce false reports from tools like ASAN
and valgrind that track memory leaks.
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
Both Linux and FreeBSD have same code for creating runtime
directory and reading sysfs files. Put them in the new lib/eal/unix
subdirectory.
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
Systemd.exec supports configuring the runtime directory of a service
via RuntimeDirectory=. This creates the directory with the necessary
permissions which actual service may not have if running in container.
The change to DPDK is to look for the environment RUNTIME_DIRECTORY
first and use that in preference to the fallback alternatives.
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
Acked-by: Bruce Richardson <bruce.richardson@intel.com>
Reviewed-by: Morten Brørup <mb@smartsharesystems.com>
The size argument to eal_set_runtime_dir is useless and was
being used incorrectly in strlcpy. It worked only because
all callers passed PATH_MAX which is same as sizeof the destination
runtime_dir.
Note: this is an internal API so no user exposed change.
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
Reviewed-by: Morten Brørup <mb@smartsharesystems.com>
Acked-by: Bruce Richardson <bruce.richardson@intel.com>
Linux EAL ensured that mapped hugepages are clean
by always mapping from newly created files:
existing hugepage backing files were always removed.
In this case, the kernel clears the page to prevent data leaks,
because the mapped memory may contain leftover data
from the previous process that was using this memory.
Clearing takes the bulk of the time spent in mmap(2),
increasing EAL initialization time.
Introduce a mode to keep existing files and reuse them
in order to speed up initial memory allocation in EAL.
Hugepages mapped from such files may contain data
left by the previous process that used this memory,
so RTE_MEMSEG_FLAG_DIRTY is set for their segments.
If multiple hugepages are mapped from the same file:
1. When fallocate(2) is used, all memory mapped from this file
is considered dirty, because it is unknown
which parts of the file are holes.
2. When ftruncate(3) is used, memory mapped from this file
is considered dirty unless the file is extended
to create a new mapping, which implies clean memory.
Signed-off-by: Dmitry Kozlyuk <dkozlyuk@nvidia.com>
Reviewed-by: Anatoly Burakov <anatoly.burakov@intel.com>
clang-13 rightfully complains that the total_mem variable in
eal_parse_socket_arg is set but not used, since the final
accumulated total_mem result isn't used anywhere.
So just remove the total_mem variable.
Fixes: 0a703f0f36 ("eal/linux: fix parsing zero socket memory and limits")
Signed-off-by: Jim Harris <james.r.harris@intel.com>
Reviewed-by: David Marchand <david.marchand@redhat.com>
Removing direct access to interrupt handle structure fields,
rather use respective get set APIs for the same.
Making changes to all the libraries access the interrupt handle fields.
Implementing alarm cleanup routine, where the memory allocated
for interrupt instance can be freed.
Signed-off-by: Harman Kalra <hkalra@marvell.com>
Signed-off-by: David Marchand <david.marchand@redhat.com>
Tested-by: Raslan Darawsheh <rasland@nvidia.com>
Telemetry interface should be exposed for primary processes only, since
secondary processes will conflict on socket creation, and since all
data in secondary process is generally available to primary. For
example, all device stats for ethdevs, cryptodevs, etc. will all be
common across processes.
Signed-off-by: Bruce Richardson <bruce.richardson@intel.com>
Acked-by: Ciara Power <ciara.power@intel.com>
Tested-by: Conor Walsh <conor.walsh@intel.com>
When multi-process is not wanted and DPDK is run with the "no-shconf"
flag, the telemetry library still needs a runtime directory to place the
unix socket for telemetry connections. Therefore, rather than not
creating the directory when this flag is set, we can change the code to
attempt the creation anyway, but not error out if it fails. If it
succeeds, then telemetry will be available, but if it fails, the rest of
DPDK will run without telemetry. This ensures that the "in-memory" flag
will allow DPDK to run even if the whole filesystem is read-only, for
example.
Signed-off-by: Bruce Richardson <bruce.richardson@intel.com>
Reviewed-by: Morten Brørup <mb@smartsharesystems.com>
Reviewed-by: David Marchand <david.marchand@redhat.com>
There is no reason for the DPDK libraries to all have 'librte_' prefix on
the directory names. This prefix makes the directory names longer and also
makes it awkward to add features referring to individual libraries in the
build - should the lib names be specified with or without the prefix.
Therefore, we can just remove the library prefix and use the library's
unique name as the directory name, i.e. 'eal' rather than 'librte_eal'
Signed-off-by: Bruce Richardson <bruce.richardson@intel.com>