Make rte_device opaque for non internal users.
This will make extending this object possible without breaking the ABI.
Some applications may have been dereferencing rte_device objects, mark
this object's accessors as stable.
Signed-off-by: David Marchand <david.marchand@redhat.com>
Acked-by: Bruce Richardson <bruce.richardson@intel.com>
For diagnostic, it may be useful to provide a description of the device
with bus specific information.
Signed-off-by: David Marchand <david.marchand@redhat.com>
Acked-by: Bruce Richardson <bruce.richardson@intel.com>
Prepare for making the device object opaque by adding accessors.
Update existing "external" users.
Signed-off-by: David Marchand <david.marchand@redhat.com>
Acked-by: Bruce Richardson <bruce.richardson@intel.com>
Make rte_driver opaque for non internal users.
This will make extending this object possible without breaking the ABI.
Introduce a new driver header and move rte_driver definition.
Update drivers and library to use the internal header.
Some applications may have been dereferencing rte_driver objects, mark
this object's accessors as stable.
Signed-off-by: David Marchand <david.marchand@redhat.com>
Acked-by: Bruce Richardson <bruce.richardson@intel.com>
Acked-by: Jay Jayatheerthan <jay.jayatheerthan@intel.com>
Acked-by: Ajit Khaparde <ajit.khaparde@broadcom.com>
Acked-by: Akhil Goyal <gakhil@marvell.com>
Acked-by: Abhinandan Gujjar <abhinandan.gujjar@intel.com>
Prepare for making the driver object opaque by adding accessors.
Update existing "external" users.
Internal users may still dereference a rte_driver object.
Signed-off-by: David Marchand <david.marchand@redhat.com>
Acked-by: Bruce Richardson <bruce.richardson@intel.com>
Make rte_bus opaque for non internal users.
This will make extending this object possible without breaking the ABI.
Introduce a new driver header and move rte_bus definition and helpers.
Update drivers and library to use the internal header.
Some applications may have been dereferencing rte_bus objects, mark
this object's accessors as stable.
Signed-off-by: David Marchand <david.marchand@redhat.com>
Acked-by: Bruce Richardson <bruce.richardson@intel.com>
Add helpers to get a rte_bus object details.
This will be used externally.
Internal users may still dereference a rte_bus object.
Signed-off-by: David Marchand <david.marchand@redhat.com>
Acked-by: Bruce Richardson <bruce.richardson@intel.com>
iova enum definition does not need to be defined as part of the bus API.
Move it to rte_eal.h.
With this step, rte_eal.h does not depend on rte_bus.h and rte_dev.h.
Fix existing code that was relying on these implicit inclusions.
Signed-off-by: David Marchand <david.marchand@redhat.com>
Acked-by: Bruce Richardson <bruce.richardson@intel.com>
For any bus that does not support device iteration, rte_dev_iterator_init
both returned an error code and logged an error message.
An application (like testpmd) that only wants to list devices, would have
no choice but to inspect a bus object to avoid spewing error logs.
Make those log messages debug level, and remove the check in testpmd.
Signed-off-by: David Marchand <david.marchand@redhat.com>
Acked-by: Bruce Richardson <bruce.richardson@intel.com>
DLB2 has a need to parse a user supplied coremask as part
of an optimization that associates optimal core/resource
pairs. Therefore eal_parse_coremask has been renamed
to rte_eal_parse_coremask and exported but kept internal.
Signed-off-by: Abdullah Sevincer <abdullah.sevincer@intel.com>
Signed-off-by: David Marchand <david.marchand@redhat.com>
Do not include <ctype.h>, <errno.h>, and <stdlib.h> from <rte_common.h>,
because they are not used by this file.
Include the needed headers directly from the files that need them.
Signed-off-by: Dmitry Kozlyuk <dmitry.kozliuk@gmail.com>
Acked-by: Bruce Richardson <bruce.richardson@intel.com>
There is no reason for rte_str_to_size() to be inline.
Move the implementation out of <rte_common.h>.
Export it as a stable ABI because it always has been public.
Signed-off-by: Dmitry Kozlyuk <dmitry.kozliuk@gmail.com>
Acked-by: Morten Brørup <mb@smartsharesystems.com>
Acked-by: Bruce Richardson <bruce.richardson@intel.com>
Acked-by: Chengwen Feng <fengchengwen@huawei.com>
This commit fixes an issue where calling rte_service_lcore_stop()
would result in a service's "active on lcore" status becoming stale.
The stale status would result in rte_service_may_be_active() always
returning "1", indicating that the service is not certainly stopped.
This is fixed by ensuring the "active on lcore" status of each service
is set to 0 when an lcore is stopped.
Fixes: e30dd31847d2 ("service: add mechanism for quiescing")
Fixes: 8929de043eb4 ("service: retrieve lcore active state")
Reported-by: Naga Harish K S V <s.v.naga.harish.k@intel.com>
Signed-off-by: Harry van Haaren <harry.van.haaren@intel.com>
Add support for using hugepages for worker lcore stack memory. The
intent is to improve performance by reducing stack memory related TLB
misses and also by using memory local to the NUMA node of each lcore.
EAL option '--huge-worker-stack[=stack-size-in-kbytes]' is added to allow
the feature to be enabled at runtime. If the size is not specified,
the system pthread stack size will be used.
Signed-off-by: Don Wallwork <donw@xsightlabs.com>
Acked-by: Morten Brørup <mb@smartsharesystems.com>
Acked-by: Chengwen Feng <fengchengwen@huawei.com>
If called to allocate memory of size is between multiple of hugepage
size minus malloc_header_len and hugepage size, rte_malloc fails.
This fix replaces malloc_elem_trailer_len with malloc_elem_overhead in
try_expand_heap() to include malloc_elem_header_len when calculating
n_seg.
Bugzilla ID: 800
Fixes: 07dcbfe0101f ("malloc: support multiprocess memory hotplug")
Cc: stable@dpdk.org
Signed-off-by: Fidaullah Noonari <fidaullah.noonari@emumba.com>
Acked-by: Dmitry Kozlyuk <dmitry.kozliuk@gmail.com>
The PIE code and other applications can benefit from having a
fast way to get a random floating point value. This new function
is equivalent to drand() in the standard library.
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
Acked-by: Ray Kinsella <mdr@ashroe.eu>
Bug scenario:
1. start testpmd:
$ dpdk-testpmd -l 4-6 -a 0000:7d:00.0 --trace=.* \
--file-prefix=trace_autotest -- -i
2. then observed:
EAL: eal_trace_init():93 failed to initialize trace [File exists]
EAL: FATAL: Cannot init trace
EAL: Cannot init trace
EAL: Error - exiting with code: 1
The root cause it that the offset set wrong with long file-prefix and
then lead the strftime return failed.
At the same time, trace_session_name_generate() uses errno as the return
value, but the errno was not set if strftime returned zero.
A previously set errno (EEXIST or ENOENT from call to mkdir for creating
the runtime configuration directory) was returned in this case.
This is fragile and may lead to incorrect logic if errno was set
to 0 previously.
This also resulted in inaccurate prompting.
Set errno to ENOSPC if strftime return zero.
Fixes: 321dd5f8fa62 ("trace: add internal init and fini interface")
Cc: stable@dpdk.org
Signed-off-by: Chengwen Feng <fengchengwen@huawei.com>
Reviewed-by: David Marchand <david.marchand@redhat.com>
Acked-by: Jerin Jacob <jerinj@marvell.com>
Caught by ASan, if a secondary process tried to attach a device with an
incorrect driver name, devargs was leaked.
Fixes: 64051bb1f144 ("devargs: unify scratch buffer storage")
Cc: stable@dpdk.org
Signed-off-by: David Marchand <david.marchand@redhat.com>
As described in bugzilla, ASan reports accesses to all memory segment as
invalid, since those parts have not been allocated with rte_malloc.
Move __rte_no_asan to rte_common.h and disable ASan on a part of the test.
Bugzilla ID: 880
Fixes: 6cc51b1293ce ("mem: instrument allocator for ASan")
Cc: stable@dpdk.org
Signed-off-by: David Marchand <david.marchand@redhat.com>
Acked-by: Anatoly Burakov <anatoly.burakov@intel.com>
Currently, when we free previously allocated memory, we mark the area as
"freed" for ASan purposes (flag 0xfd). However, sometimes, freeing a
malloc element will cause pages to be unmapped from memory and re-backed
with anonymous memory again. This may cause ASan's "use-after-free"
error down the line, because the allocator will try to write into
memory areas recently marked as "freed".
To fix this, we need to mark the unmapped memory area as "available",
and fixup surrounding malloc element header/trailers to enable later
malloc routines to safely write into new malloc elements' headers or
trailers.
Bugzilla ID: 994
Fixes: 6cc51b1293ce ("mem: instrument allocator for ASan")
Cc: stable@dpdk.org
Reported-by: David Marchand <david.marchand@redhat.com>
Signed-off-by: Anatoly Burakov <anatoly.burakov@intel.com>
All OS implementations provide the same main loop.
Introduce helpers (shared for Linux and FreeBSD) to handle synchronisation
between main and threads and factorize the rest as common code.
Thread id are now logged as string in a common format across OS.
Note:
- this change also fixes Windows EAL: worker threads cpu affinity was
incorrectly reported in log.
- libabigail flags this change as breaking ABI in clang builds:
1 function with some indirect sub-type change:
[C] 'function int rte_eal_remote_launch(int (void*)*, void*, unsigned
int)' at eal_common_launch.c:35:1 has some indirect sub-type
changes:
parameter 1 of type 'int (void*)*' changed:
in pointed to type 'function type int (void*)' at rte_launch.h:31:1:
entity changed from 'function type int (void*)' to 'typedef
lcore_function_t' at rte_launch.h:31:1
type size hasn't changed
This is being investigated on libabigail side.
For now, we don't have much choice but to waive reports on this symbol.
Signed-off-by: David Marchand <david.marchand@redhat.com>
Acked-by: Morten Brørup <mb@smartsharesystems.com>
Acked-by: Tyler Retzlaff <roretzla@linux.microsoft.com>
So far, a worker thread has been using its thread_id to discover which
lcore has been assigned to it.
On the other hand, as noted by Tyler, the pthread API does not strictly
guarantee that a new thread won't start running eal_thread_loop before
pthread_create writes to &lcore_config[xx].thread_id.
Though all OS implementations supported in DPDK (recently) ensure this
property, it is more robust to have the main thread directly pass
the worker thread lcore.
Signed-off-by: David Marchand <david.marchand@redhat.com>
The function rte_devargs_parse() previously was safe to call with
non-initialized devargs structure as parameter.
When adding the support for the global device syntax,
this assumption was broken.
Restore it by forcing memset as part of the call itself.
Bugzilla ID: 933
Fixes: b344eb5d941a ("devargs: parse global device syntax")
Cc: stable@dpdk.org
Signed-off-by: Madhuker Mythri <madhuker.mythri@oracle.com>
Signed-off-by: Gaetan Rivet <grive@u256.net>
Functions like free, rte_free, and rte_mempool_free
already handle NULL pointer so the checks here are not necessary.
Remove redundant NULL pointer checks before free functions
found by nullfree.cocci
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
The mp action resources in malloc should be cleaned up via
rte_eal_cleanup.
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
Acked-by: Anatoly Burakov <anatoly.burakov@intel.com>
When rte_eal_cleanup is called, hotplug should unregister the
resources associated with the multi-process server.
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
When rte_eal_cleanup is called, all control threads should exit.
For the mp thread, this best handled by closing the mp_socket
and letting the thread see that.
This also fixes potential problems where the mp_socket gets
another hard error, and the thread runs away repeating itself
by reading the same error.
Fixes: 85d6815fa6d0 ("eal: close multi-process socket during cleanup")
Cc: stable@dpdk.org
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
Acked-by: Anatoly Burakov <anatoly.burakov@intel.com>
When application calls rte_eal_cleanup on shutdown,
the DPDK log should be closed and cleaned up.
This helps reduce false reports from tools like ASAN
and valgrind that track memory leaks.
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
The function malloc() could return NULL, the return value
need to be checked.
Fixes: 6f63858e55e6 ("mem: prevent preallocated pages from being freed")
Cc: stable@dpdk.org
Signed-off-by: Yunjian Wang <wangyunjian@huawei.com>
Reviewed-by: David Marchand <david.marchand@redhat.com>
The size argument to eal_set_runtime_dir is useless and was
being used incorrectly in strlcpy. It worked only because
all callers passed PATH_MAX which is same as sizeof the destination
runtime_dir.
Note: this is an internal API so no user exposed change.
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
Reviewed-by: Morten Brørup <mb@smartsharesystems.com>
Acked-by: Bruce Richardson <bruce.richardson@intel.com>
Added an internal helper to get OS-specific EAL mapping base address
This helper can be used by the drivers to program offload / accelerator
devices, where the base address can be used as a reference address by
the accelerator to access the host memory
An address can also be represented as an offset relative to the base
address using smaller data types
Signed-off-by: Srikanth Yalavarthi <syalavarthi@marvell.com>
Acked-by: Dmitry Kozlyuk <dmitry.kozliuk@gmail.com>
Acked-by: Anatoly Burakov <anatoly.burakov@intel.com>
Expose Linux EAL ability to reuse existing hugepage files
via --huge-unlink=never switch.
Default behavior is unchanged, it can also be specified
using --huge-unlink=existing for consistency.
Old --huge-unlink switch is kept,
it is an alias for --huge-unlink=always.
Add a test case for the --huge-unlink=never mode.
Signed-off-by: Dmitry Kozlyuk <dkozlyuk@nvidia.com>
Acked-by: Thomas Monjalon <thomas@monjalon.net>
Acked-by: Anatoly Burakov <anatoly.burakov@intel.com>
Linux EAL ensured that mapped hugepages are clean
by always mapping from newly created files:
existing hugepage backing files were always removed.
In this case, the kernel clears the page to prevent data leaks,
because the mapped memory may contain leftover data
from the previous process that was using this memory.
Clearing takes the bulk of the time spent in mmap(2),
increasing EAL initialization time.
Introduce a mode to keep existing files and reuse them
in order to speed up initial memory allocation in EAL.
Hugepages mapped from such files may contain data
left by the previous process that used this memory,
so RTE_MEMSEG_FLAG_DIRTY is set for their segments.
If multiple hugepages are mapped from the same file:
1. When fallocate(2) is used, all memory mapped from this file
is considered dirty, because it is unknown
which parts of the file are holes.
2. When ftruncate(3) is used, memory mapped from this file
is considered dirty unless the file is extended
to create a new mapping, which implies clean memory.
Signed-off-by: Dmitry Kozlyuk <dkozlyuk@nvidia.com>
Reviewed-by: Anatoly Burakov <anatoly.burakov@intel.com>
In preparation to extend --huge-unlink option semantics
refactor how it is stored in the internal configuration.
It makes future changes more isolated.
Signed-off-by: Dmitry Kozlyuk <dkozlyuk@nvidia.com>
Acked-by: Thomas Monjalon <thomas@monjalon.net>
Acked-by: Anatoly Burakov <anatoly.burakov@intel.com>
EAL malloc layer assumed all free elements content
is filled with zeros ("clean"), as opposed to uninitialized ("dirty").
This assumption was ensured in two ways:
1. EAL memalloc layer always returned clean memory.
2. Freed memory was cleared before returning into the heap.
Clearing the memory can be as slow as around 14 GiB/s.
To save doing so, memalloc layer is allowed to return dirty memory.
Such segments being marked with RTE_MEMSEG_FLAG_DIRTY.
The allocator tracks elements that contain dirty memory
using the new flag in the element header.
When clean memory is requested via rte_zmalloc*()
and the suitable element is dirty, it is cleared on allocation.
When memory is deallocated, the freed element is joined
with adjacent free elements, and the dirty flag is updated:
a) If the joint element contains dirty parts, it is dirty:
dirty + freed + dirty = dirty => no need to clean
freed + dirty = dirty the freed memory
Dirty parts may be large (e.g. initial allocation),
so clearing them could create unpredictable slowdown.
b) If the only dirty part of the joint element
is the freed memory, the joint element can be made clean:
clean + freed + clean = clean => freed memory
clean + freed = clean must be cleared
freed + clean = clean
freed = clean
This logic naturally reproduces the old behavior
and always applies in modes when EAL memalloc layer
returns only clean segments.
As a result, memory is either cleared on free, as before,
or it will be cleared on allocation if need be, but never twice.
Signed-off-by: Dmitry Kozlyuk <dkozlyuk@nvidia.com>
Reviewed-by: Anatoly Burakov <anatoly.burakov@intel.com>
On Windows, strerror returns just "Unknown error" for errnum greater
than MAX_ERRNO, while linux and freebsd returns "Unknown error <num>",
which is the current expectation for errno_autotest. Differentiate
the error string on Windows to remove a "duplicate error code" failure.
Signed-off-by: Jie Zhou <jizh@linux.microsoft.com>
Acked-by: Dmitry Kozlyuk <dmitry.kozliuk@gmail.com>
Any EAL memory allocation often goes through eal_get_virtual_area()
function, which will print a warning whenever the resulting allocation
didn't match the specified address requirements. This is useful for
when we have requested a specific base virtual address, to let the user
know that the mapping has deviated from that address.
However, on Linux, we also have a default base address that's there to
ensure better chances of successful secondary process initialization,
as well as higher likelihood of the virtual areas to fit inside the
IOMMU address width. Because of this default base address, there are
warnings printed even when no base address was explicitly requested,
which can be confusing to the user.
Emit this warning with debug level unless base address was explicitly
requested by the user.
Signed-off-by: Anatoly Burakov <anatoly.burakov@intel.com>
Add support for Address Sanitizer (ASan) for PPC/POWER architecture.
Signed-off-by: David Christensen <drc@linux.vnet.ibm.com>
Reviewed-by: David Marchand <david.marchand@redhat.com>
This patch defines ASAN_SHADOW_OFFSET for arm64 according to the ASan
documentation. This offset should cover all arm64 VMAs supported by
ASan.
Signed-off-by: Volodymyr Fialko <vfialko@marvell.com>
Reviewed-by: David Marchand <david.marchand@redhat.com>
Acked-by: Jerin Jacob <jerinj@marvell.com>
Acked-by: Ruifeng Wang <ruifeng.wang@arm.com>
When using rte_malloc() from a thread which is not bound to a numa
socket (the typical case is a control thread, but it can also happen
on a dataplane thread if its cpu affinity is on cores attached to
several sockets), the used heap is the one from numa socket 0, which
may not have available memory.
Fix this by selecting the first socket which has available memory.
Note: malloc_get_numa_socket() is only used from one .c file, so move
it there, and remove the inline keyword.
Fixes: b94580d6887e ("malloc: avoid unknown socket id")
Cc: stable@dpdk.org
Signed-off-by: Ilyes Ben Hamouda <ilyes.ben_hamouda@6wind.com>
Signed-off-by: Olivier Matz <olivier.matz@6wind.com>
Acked-by: David Marchand <david.marchand@redhat.com>
If the user requests to use an lcore above 128 using -l,
the eal will exit with "EAL: invalid core list syntax" and
very little else useful information.
This patch adds some extra information suggesting to use --lcores
so that physical cores above RTE_MAX_LCORE (default 128) can be
used. This is achieved by using the --lcores option by mapping
the logical cores in the application to physical cores.
For example, if "-l 12-16,130,132" is used, we see the following
additional output on the command line:
EAL: lcore 132 >= RTE_MAX_LCORE (128)
EAL: lcore 133 >= RTE_MAX_LCORE (128)
EAL: To use high physical core ids, please use --lcores to map them
to lcore ids below RTE_MAX_LCORE,
EAL: e.g. --lcores 0@12,1@13,2@14,3@15,4@16,5@132,6@133
The same is added to -c option parsing.
For example, if "-c 0x300000000000000000000000000000000" is
used, we see the following additional output on the command line:
EAL: lcore 128 >= RTE_MAX_LCORE (128)
EAL: lcore 129 >= RTE_MAX_LCORE (128)
EAL: To use high physical core ids, please use --lcores to map them
to lcore ids below RTE_MAX_LCORE,
EAL: e.g. --lcores 0@128,1@129
Signed-off-by: David Hunt <david.hunt@intel.com>
Acked-by: Bruce Richardson <bruce.richardson@intel.com>
Devargs used in device iterator initialization wasn't set to zero, random
data like bus string lead to invalid address access.
This patch initializes devargs.
Bugzilla ID: 862
Fixes: c99a2d4c6b7f ("eal: implement device iteration initialization")
Cc: stable@dpdk.org
Signed-off-by: Xueming Li <xuemingl@nvidia.com>
This patch adds necessary hooks in the memory allocator for ASan.
This feature is currently available in DPDK only on Linux x86_64.
If other OS/architectures want to support it, ASAN_SHADOW_OFFSET must be
defined and RTE_MALLOC_ASAN must be set accordingly in meson.
Signed-off-by: Xueqin Lin <xueqin.lin@intel.com>
Signed-off-by: Zhihong Peng <zhihongx.peng@intel.com>
Acked-by: Anatoly Burakov <anatoly.burakov@intel.com>
Remove the usage of pthread barrier and replace it with
synchronization using atomic variable.
This also removes the use of reference count required to synchronize
freeing the memory.
Signed-off-by: Honnappa Nagarahalli <honnappa.nagarahalli@arm.com>
Reviewed-by: Olivier Matz <olivier.matz@6wind.com>
Dynamically allocating the efds and elist array of intr_handle
structure, based on size provided by user. Eg size can be
MSIX interrupts supported by a PCI device.
Signed-off-by: Harman Kalra <hkalra@marvell.com>
Signed-off-by: David Marchand <david.marchand@redhat.com>
Acked-by: Dmitry Kozlyuk <dmitry.kozliuk@gmail.com>
Tested-by: Raslan Darawsheh <rasland@nvidia.com>
VFIO/UIO are mutually exclusive, storing file descriptor in a single
field is enough.
Signed-off-by: Harman Kalra <hkalra@marvell.com>
Signed-off-by: David Marchand <david.marchand@redhat.com>
Tested-by: Raslan Darawsheh <rasland@nvidia.com>
Moving interrupt handle structure definition inside a EAL private
header to make its fields totally opaque to the outside world.
Signed-off-by: Harman Kalra <hkalra@marvell.com>
Signed-off-by: David Marchand <david.marchand@redhat.com>
Tested-by: Raslan Darawsheh <rasland@nvidia.com>
Removing direct access to interrupt handle structure fields,
rather use respective get set APIs for the same.
Making changes to all the libraries access the interrupt handle fields.
Implementing alarm cleanup routine, where the memory allocated
for interrupt instance can be freed.
Signed-off-by: Harman Kalra <hkalra@marvell.com>
Signed-off-by: David Marchand <david.marchand@redhat.com>
Tested-by: Raslan Darawsheh <rasland@nvidia.com>