206 Commits

Author SHA1 Message Date
Deepak Khandelwal
90bf3f89ed mem: skip attaching external memory in secondary process
Currently, EAL init in secondary processes will attach all fbarrays
in the memconfig to have access to the primary process's page tables.
However, fbarrays corresponding to external memory segments should
not be attached at initialization, because this will happen as part
of `rte_extmem_attach` [1] or `rte_malloc_heap_memory_attach` [2] calls.

1: https://doc.dpdk.org/api/rte__memory_8h.html#a2796da68de6825f8edf53759f8e4d230
2: https://doc.dpdk.org/api/rte__malloc_8h.html#af6360dea35bdf162feeb2b62cf149fd3

Fixes: ff3619d624 ("malloc: allow attaching to external memory chunks")
Cc: stable@dpdk.org

Suggested-by: Anatoly Burakov <anatoly.burakov@intel.com>
Signed-off-by: Deepak Khandelwal <deepak.khandelwal@intel.com>
Acked-by: Anatoly Burakov <anatoly.burakov@intel.com>
2022-04-28 13:44:13 +02:00
Tyler Retzlaff
8001c0ddfe eal/windows: set main lcore affinity
Add missing code to affinitize main_lcore from lcore configuration.

Signed-off-by: Tyler Retzlaff <roretzla@linux.microsoft.com>
Reviewed-by: David Marchand <david.marchand@redhat.com>
2022-04-25 09:38:15 +02:00
Mattias Rönnblom
76076342ec eal: emit warning for unused trylock return value
Mark the trylock family of spinlock functions with
__rte_warn_unused_result.

Signed-off-by: Mattias Rönnblom <mattias.ronnblom@ericsson.com>
Acked-by: Bruce Richardson <bruce.richardson@intel.com>
Acked-by: Morten Brørup <mb@smartsharesystems.com>
Acked-by: Tyler Retzlaff <roretzla@linux.microsoft.com>
2022-04-14 14:38:20 +02:00
Mattias Rönnblom
eb13e558f8 eal: add macro to warn for unused function return values
This patch adds a wrapper macro __rte_warn_unused_result for the
warn_unused_result function attribute.

Marking a function __rte_warn_unused_result will make the compiler
emit a warning in case the caller does not use the function's return
value.

Signed-off-by: Mattias Rönnblom <mattias.ronnblom@ericsson.com>
Acked-by: Bruce Richardson <bruce.richardson@intel.com>
Reviewed-by: Morten Brørup <mb@smartsharesystems.com>
Acked-by: Tyler Retzlaff <roretzla@linux.microsoft.com>
2022-04-14 14:38:20 +02:00
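As an illustration of the attribute that the two commits above rely on, here is a minimal sketch for GCC or clang. The names demo_warn_unused_result, demo_lock_t, demo_trylock and demo_unlock are hypothetical stand-ins, not the DPDK definitions, and the lock is a single-threaded toy.

```c
#include <stdbool.h>

/* Hypothetical wrapper around the compiler attribute (GCC/clang only). */
#if defined(__GNUC__) || defined(__clang__)
#define demo_warn_unused_result __attribute__((warn_unused_result))
#else
#define demo_warn_unused_result
#endif

/* Toy single-threaded lock, for illustration only: not a real spinlock. */
typedef struct {
	bool locked;
} demo_lock_t;

/* Returns 1 if the lock was taken, 0 if it was already held. */
demo_warn_unused_result
int demo_trylock(demo_lock_t *l)
{
	if (l->locked)
		return 0;
	l->locked = true;
	return 1;
}

void demo_unlock(demo_lock_t *l)
{
	l->locked = false;
}
```

With this in place, a bare statement such as `demo_trylock(&l);` should draw an unused-result warning from the compiler, while `if (demo_trylock(&l)) { ... }` should not.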
David Marchand
a95d70547c eal: factorize lcore main loop
All OS implementations provide the same main loop.
Introduce helpers (shared for Linux and FreeBSD) to handle synchronisation
between main and threads and factorize the rest as common code.
Thread IDs are now logged as strings in a common format across OSes.

Note:
- this change also fixes Windows EAL: worker threads cpu affinity was
  incorrectly reported in log.

- libabigail flags this change as breaking ABI in clang builds:
  1 function with some indirect sub-type change:

  [C] 'function int rte_eal_remote_launch(int (void*)*, void*, unsigned
      int)' at eal_common_launch.c:35:1 has some indirect sub-type
      changes:
    parameter 1 of type 'int (void*)*' changed:
      in pointed to type 'function type int (void*)' at rte_launch.h:31:1:
        entity changed from 'function type int (void*)' to 'typedef
          lcore_function_t' at rte_launch.h:31:1
        type size hasn't changed

  This is being investigated on libabigail side.
  For now, we don't have much choice but to waive reports on this symbol.

Signed-off-by: David Marchand <david.marchand@redhat.com>
Acked-by: Morten Brørup <mb@smartsharesystems.com>
Acked-by: Tyler Retzlaff <roretzla@linux.microsoft.com>
2022-04-14 13:59:50 +02:00
David Marchand
449e7dbc7b eal: cleanup lcore ID hand-over
So far, a worker thread has been using its thread_id to discover which
lcore has been assigned to it.

On the other hand, as noted by Tyler, the pthread API does not strictly
guarantee that a new thread won't start running eal_thread_loop before
pthread_create writes to &lcore_config[xx].thread_id.

Though all OS implementations supported in DPDK (recently) ensure this
property, it is more robust to have the main thread pass the lcore ID
directly to the worker thread.

Signed-off-by: David Marchand <david.marchand@redhat.com>
2022-04-14 13:59:50 +02:00
David Marchand
1e230b9be8 eal/windows: add missing C++ include guards
Add missing 'extern "C"' to file.

Fixes: 1db72630da ("eal/windows: do not expose private facilities")
Cc: stable@dpdk.org

Signed-off-by: David Marchand <david.marchand@redhat.com>
Acked-by: Tyler Retzlaff <roretzla@linux.microsoft.com>
2022-04-08 10:49:39 +02:00
Tyler Retzlaff
e4e983b975 eal/windows: fix data race when creating threads
eal_thread_loop() uses lcore_config[i].thread_id,
which is stored upon the return from CreateThread().
Per documentation, eal_thread_loop() can start
before CreateThread() returns and the ID is stored.

Create lcore worker threads suspended and resume them afterwards, so
that lcore_config[i].thread_id is stored before eal_thread_loop()
starts executing.

Fixes: 53ffd9f080 ("eal/windows: add minimum viable code")
Cc: stable@dpdk.org

Signed-off-by: Tyler Retzlaff <roretzla@linux.microsoft.com>
Acked-by: Dmitry Kozlyuk <dmitry.kozliuk@gmail.com>
2022-03-30 19:01:52 +02:00
Haiyue Wang
f6ecec2b91 eal/x86: remove atomic header include loop
Remove the include of the top-level x86 atomic header from the
architecture-specific header files, since the top-level header already
includes them.

Signed-off-by: Haiyue Wang <haiyue.wang@intel.com>
2022-03-30 18:54:14 +02:00
Bruce Richardson
29fd052dcc eal/freebsd: add missing C++ include guards
Add missing 'extern "C"' to file.

Fixes: 428eb983f5 ("eal: add OS specific header file")
Cc: stable@dpdk.org

Signed-off-by: Bruce Richardson <bruce.richardson@intel.com>
2022-03-15 02:06:13 +01:00
Wenxuan Wu
4e3582ab5b eal/linux: fix device monitor stop return
The return value of rte_dev_event_monitor_stop indicates whether the
monitor has been closed successfully, and should not be tied to the
result of rte_intr_callback_unregister. Once execution reaches the normal
exit point of rte_dev_event_monitor, the return value should be set to 0.

Also, the monitor reference count has been checked carefully: its value
changes from 1 to 0, so there is no potential memory leak.

Fixes: 1fef6ced07 ("eal/linux: allow multiple starts of event monitor")
Cc: stable@dpdk.org

Signed-off-by: Wenxuan Wu <wenxuanx.wu@intel.com>
2022-03-07 19:21:18 +01:00
Madhuker Mythri
356a2aa054 devargs: fix crash with uninitialized parsing
The function rte_devargs_parse() previously was safe to call with
non-initialized devargs structure as parameter.

When adding the support for the global device syntax,
this assumption was broken.
Restore it by forcing memset as part of the call itself.

Bugzilla ID: 933
Fixes: b344eb5d94 ("devargs: parse global device syntax")
Cc: stable@dpdk.org

Signed-off-by: Madhuker Mythri <madhuker.mythri@oracle.com>
Signed-off-by: Gaetan Rivet <grive@u256.net>
2022-02-27 19:28:59 +01:00
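The idea of resetting the structure inside the call can be sketched as follows. This is an illustration only: struct demo_devargs and demo_devargs_parse are hypothetical and much simpler than the real devargs code.

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical, heavily simplified stand-in for a devargs structure. */
struct demo_devargs {
	char name[32];
	const char *args;
};

/* Returns 0 on success, -1 on invalid input. */
int demo_devargs_parse(struct demo_devargs *da, const char *dev)
{
	if (da == NULL || dev == NULL)
		return -1;
	/* Reset first, so stale contents of the caller's struct are never used. */
	memset(da, 0, sizeof(*da));
	snprintf(da->name, sizeof(da->name), "%s", dev);
	return 0;
}
```

Because the reset happens inside the function, a caller may pass a structure straight from the stack without initializing it.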
Steve Yang
1a287fc9c9 eal/linux: fix illegal memory access in uevent handler
'recv()' fills 'buf', and 'strlcpy()' is later used to copy from this
buffer. However, as Coverity warns, 'recv()' does not guarantee that
'buf' is null-terminated, which 'strlcpy()' requires.

Enlarge 'buf' to 'EAL_UEV_MSG_LEN + 1' bytes so that the last byte can
be set to 0 when the received size is EAL_UEV_MSG_LEN.

CID 375864:  Memory - illegal accesses  (STRING_NULL)
Passing unterminated string "buf" to "dev_uev_parse", which expects
a null-terminated string.

Coverity issue: 375864
Fixes: 0d0f478d04 ("eal/linux: add uevent parse and process")
Cc: stable@dpdk.org

Signed-off-by: Steve Yang <stevex.yang@intel.com>
Acked-by: Ferruh Yigit <ferruh.yigit@intel.com>
2022-02-27 19:12:34 +01:00
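The fix described above follows a common pattern: reserve one extra byte and terminate the buffer at the received length. A minimal sketch, where DEMO_MSG_LEN is a hypothetical stand-in for EAL_UEV_MSG_LEN and demo_recv only imitates recv() by copying raw bytes:

```c
#include <stddef.h>
#include <string.h>

#define DEMO_MSG_LEN 8 /* hypothetical stand-in for EAL_UEV_MSG_LEN */

/* Stand-in for recv(): copies raw bytes and never adds a terminator. */
long demo_recv(char *buf, size_t len, const char *src, size_t src_len)
{
	size_t n = src_len < len ? src_len : len;

	memcpy(buf, src, n);
	return (long)n;
}

/* buf must have room for DEMO_MSG_LEN + 1 bytes. Returns the message length. */
long demo_read_message(char *buf, const char *src, size_t src_len)
{
	long n = demo_recv(buf, DEMO_MSG_LEN, src, src_len);

	if (n < 0)
		return n;
	/* n is at most DEMO_MSG_LEN, so index n is always inside the buffer. */
	buf[n] = '\0';
	return n;
}
```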
Brian Dooley
d7e9c02cca eal: add missing C++ guards
Some public header files were missing 'extern "C"' C++ guards,
and couldn't be used by C++ applications. Add the missing guards.

Fixes: af75078fec ("first public release")
Fixes: 7f3aa08639 ("eal: introduce bit operations API")
Fixes: 166a743c53 ("compat: add infrastructure to support symbol versioning")
Fixes: 8f40ee0734 ("eal/x86: get hypervisor name")
Fixes: 75583b0d1e ("eal: add keep alive monitoring")
Fixes: 88701645c9 ("eal: move interrupt type out of igb_uio")
Fixes: f04519d809 ("lib: add missing include dependencies")
Fixes: f58880682c ("trace: implement register API")
Fixes: 428eb983f5 ("eal: add OS specific header file")
Cc: stable@dpdk.org

Signed-off-by: Brian Dooley <brian.dooley@intel.com>
Acked-by: Bruce Richardson <bruce.richardson@intel.com>
Acked-by: Tyler Retzlaff <roretzla@linux.microsoft.com>
2022-02-22 14:47:49 +01:00
Sean Morrissey
30a1de105a lib: remove unneeded header includes
These header includes have been flagged by the iwyu_tool
and removed.

Signed-off-by: Sean Morrissey <sean.morrissey@intel.com>
2022-02-22 13:10:39 +01:00
Michael Barker
ed57d08dfd eal: ignore gcc-compat warning in clang-only macro
When compiling with clang using -Wpedantic (or -Wgcc-compat) the use of
diagnose_if kicks up a warning:

.../include/rte_interrupts.h:623:1: error: 'diagnose_if' is a clang
extension [-Werror,-Wgcc-compat]
__rte_internal
^
.../include/rte_compat.h:36:16: note: expanded from macro '__rte_internal'
__attribute__((diagnose_if(1, "Symbol is not public ABI", "error"), \

This change ignores the '-Wgcc-compat' warning in the specific location
where the warning occurs.  It is safe to do in this circumstance as the
specific macro is only defined when using the clang compiler.

Signed-off-by: Michael Barker <mikeb01@gmail.com>
2022-02-12 14:37:47 +01:00
Stephen Hemminger
06c047b680 remove unnecessary null checks
Functions like free, rte_free, and rte_mempool_free
already handle a NULL pointer, so the checks here are not necessary.

Remove redundant NULL pointer checks before free functions,
found by nullfree.cocci.

Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
2022-02-12 12:07:48 +01:00
Stephen Hemminger
a0cc7be20d mem: cleanup multiprocess resources
The mp action resources in malloc should be cleaned up via
rte_eal_cleanup.

Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
Acked-by: Anatoly Burakov <anatoly.burakov@intel.com>
2022-02-11 19:49:22 +01:00
Stephen Hemminger
e8dc971b63 eal: cleanup multiprocess hotplug resources
When rte_eal_cleanup is called, hotplug should unregister the
resources associated with the multi-process server.

Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
2022-02-11 19:49:22 +01:00
Stephen Hemminger
6412941ae8 vfio: cleanup the multiprocess sync handle
When rte_eal_cleanup is called the rte_mp_action for VFIO
should be freed.

Fixes: edf73dd330 ("ipc: handle unsupported IPC in action register")
Cc: stable@dpdk.org

Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
Acked-by: Anatoly Burakov <anatoly.burakov@intel.com>
2022-02-11 19:49:22 +01:00
Stephen Hemminger
6e858b4d92 ipc: end multiprocess thread during cleanup
When rte_eal_cleanup is called, all control threads should exit.
For the mp thread, this is best handled by closing the mp_socket
and letting the thread see that.

This also fixes potential problems where the mp_socket gets
another hard error, and the thread runs away repeating itself
by reading the same error.

Fixes: 85d6815fa6 ("eal: close multi-process socket during cleanup")
Cc: stable@dpdk.org

Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
Acked-by: Anatoly Burakov <anatoly.burakov@intel.com>
2022-02-11 19:49:22 +01:00
Stephen Hemminger
5f4eb82f3c log: close in cleanup stage
When the application calls rte_eal_cleanup on shutdown,
the DPDK log should be closed and cleaned up.

This helps reduce false reports from tools like ASAN
and valgrind that track memory leaks.

Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
2022-02-11 19:49:22 +01:00
Yunjian Wang
5f69ebbd85 mem: check allocation in dynamic hugepage init
The function malloc() can return NULL, so its return value
needs to be checked.

Fixes: 6f63858e55 ("mem: prevent preallocated pages from being freed")
Cc: stable@dpdk.org

Signed-off-by: Yunjian Wang <wangyunjian@huawei.com>
Reviewed-by: David Marchand <david.marchand@redhat.com>
2022-02-11 08:46:21 +01:00
Pavan Nikhilesh
264ff3f250 eal/arm: inline 128-bit atomic compare exchange with GCC
GCC [1] now assigns even register pairs for CASP; the fix has also been
backported to all stable releases of older GCC versions.
Removing the manual register allocation allows GCC to inline the
functions and pick optimal registers for performing CASP.

1: https://gcc.gnu.org/git/?p=gcc.git;a=commit;h=563cc649beaf

Signed-off-by: Pavan Nikhilesh <pbhagavatula@marvell.com>
Acked-by: Ruifeng Wang <ruifeng.wang@arm.com>
2022-02-11 08:44:09 +01:00
Bruce Richardson
59144f6edd eal: fix C++ include
C++ files could not include some headers because:

* "new" is a keyword in C++, so can't be a variable name
* there is no automatic casting to/from void *

Fixes: 184104fc61 ("ticketlock: introduce fair ticket based locking")
Fixes: 032a7e5499 ("trace: implement provider payload")
Cc: stable@dpdk.org

Signed-off-by: Bruce Richardson <bruce.richardson@intel.com>
2022-02-10 23:02:35 +01:00
Stephen Hemminger
6e97b5fc1a eal: move Unix filesystem functions into one file
Both Linux and FreeBSD have the same code for creating the runtime
directory and reading sysfs files. Put them in the new lib/eal/unix
subdirectory.

Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
2022-02-09 19:12:53 +01:00
Stephen Hemminger
1835a22f34 support systemd service convention for runtime directory
Systemd.exec supports configuring the runtime directory of a service
via RuntimeDirectory=. This creates the directory with the necessary
permissions, which the actual service may not have if running in a
container.

The change to DPDK is to look for the environment variable
RUNTIME_DIRECTORY first and use that in preference to the fallback
alternatives.

Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
Acked-by: Bruce Richardson <bruce.richardson@intel.com>
Reviewed-by: Morten Brørup <mb@smartsharesystems.com>
2022-02-09 19:12:40 +01:00
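The lookup order described above can be sketched as below. The function names are hypothetical, the fallback is passed in by the caller rather than computed, and treating an empty variable as unset is an assumption of this sketch, not something the commit message states.

```c
#include <stdlib.h>

/* Pure selection logic: prefer the systemd-provided value when present. */
const char *demo_pick_runtime_dir(const char *env_value, const char *fallback)
{
	/* Assumption of this sketch: an empty value counts as "not set". */
	if (env_value != NULL && env_value[0] != '\0')
		return env_value;
	return fallback;
}

/* Reads RUNTIME_DIRECTORY from the environment, else uses the fallback. */
const char *demo_runtime_dir(const char *fallback)
{
	return demo_pick_runtime_dir(getenv("RUNTIME_DIRECTORY"), fallback);
}
```

Keeping the selection separate from the getenv() call makes the preference order easy to check without touching the process environment.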
Stephen Hemminger
36514d8dfa eal: remove size for setting runtime directory
The size argument to eal_set_runtime_dir is useless and was
being used incorrectly in strlcpy. It worked only because
all callers passed PATH_MAX, which is the same as the size of the
destination runtime_dir.

Note: this is an internal API so no user exposed change.

Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
Reviewed-by: Morten Brørup <mb@smartsharesystems.com>
Acked-by: Bruce Richardson <bruce.richardson@intel.com>
2022-02-09 16:42:31 +01:00
Srikanth Yalavarthi
f3ca33bb20 eal: add internal function to get base address
Add an internal helper to get the OS-specific EAL mapping base address.

This helper can be used by drivers to program offload / accelerator
devices, where the base address can be used as a reference address by
the accelerator to access the host memory.

An address can also be represented as an offset relative to the base
address using smaller data types.

Signed-off-by: Srikanth Yalavarthi <syalavarthi@marvell.com>
Acked-by: Dmitry Kozlyuk <dmitry.kozliuk@gmail.com>
Acked-by: Anatoly Burakov <anatoly.burakov@intel.com>
2022-02-08 23:59:10 +01:00
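To illustrate the last point of the message above, an address can be stored as a 32-bit offset from a base address. This is a sketch with hypothetical helper names; it does not use the internal EAL function the commit adds.

```c
#include <stddef.h>
#include <stdint.h>

/*
 * Encode addr as a 32-bit offset from base.
 * Returns 0 on success, -1 if addr is below base or too far above it.
 */
int demo_addr_to_offset(const void *base, const void *addr, uint32_t *off)
{
	uintptr_t b = (uintptr_t)base;
	uintptr_t a = (uintptr_t)addr;

	if (a < b || a - b > UINT32_MAX)
		return -1;
	*off = (uint32_t)(a - b);
	return 0;
}

/* Decode an offset back into a pointer relative to the same base. */
void *demo_offset_to_addr(void *base, uint32_t off)
{
	return (uint8_t *)base + off;
}
```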
Dmitry Kozlyuk
0dff3f26d6 eal: extend --huge-unlink for hugepage file reuse
Expose Linux EAL ability to reuse existing hugepage files
via --huge-unlink=never switch.
Default behavior is unchanged, it can also be specified
using --huge-unlink=existing for consistency.
Old --huge-unlink switch is kept,
it is an alias for --huge-unlink=always.
Add a test case for the --huge-unlink=never mode.

Signed-off-by: Dmitry Kozlyuk <dkozlyuk@nvidia.com>
Acked-by: Thomas Monjalon <thomas@monjalon.net>
Acked-by: Anatoly Burakov <anatoly.burakov@intel.com>
2022-02-08 21:32:53 +01:00
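The three modes named above, together with the bare switch acting as an alias for "always", could be parsed along these lines. The enum and function names are hypothetical and this is not the EAL option parser.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical names for the three modes listed in the commit message. */
enum demo_huge_unlink {
	DEMO_UNLINK_EXISTING, /* default behavior */
	DEMO_UNLINK_ALWAYS,
	DEMO_UNLINK_NEVER
};

/*
 * arg is the text after '=', or NULL for a bare --huge-unlink.
 * Returns 0 on success, -1 for an unknown value.
 */
int demo_parse_huge_unlink(const char *arg, enum demo_huge_unlink *mode)
{
	if (arg == NULL || strcmp(arg, "always") == 0)
		*mode = DEMO_UNLINK_ALWAYS;
	else if (strcmp(arg, "existing") == 0)
		*mode = DEMO_UNLINK_EXISTING;
	else if (strcmp(arg, "never") == 0)
		*mode = DEMO_UNLINK_NEVER;
	else
		return -1;
	return 0;
}
```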
Dmitry Kozlyuk
32b4771cd8 eal/linux: allow hugepage file reuse
Linux EAL ensured that mapped hugepages are clean
by always mapping from newly created files:
existing hugepage backing files were always removed.
In this case, the kernel clears the page to prevent data leaks,
because the mapped memory may contain leftover data
from the previous process that was using this memory.
Clearing takes the bulk of the time spent in mmap(2),
increasing EAL initialization time.

Introduce a mode to keep existing files and reuse them
in order to speed up initial memory allocation in EAL.
Hugepages mapped from such files may contain data
left by the previous process that used this memory,
so RTE_MEMSEG_FLAG_DIRTY is set for their segments.
If multiple hugepages are mapped from the same file:
1. When fallocate(2) is used, all memory mapped from this file
   is considered dirty, because it is unknown
   which parts of the file are holes.
2. When ftruncate(2) is used, memory mapped from this file
   is considered dirty unless the file is extended
   to create a new mapping, which implies clean memory.

Signed-off-by: Dmitry Kozlyuk <dkozlyuk@nvidia.com>
Reviewed-by: Anatoly Burakov <anatoly.burakov@intel.com>
2022-02-08 21:32:53 +01:00
Dmitry Kozlyuk
52d7d91ed4 eal: refactor --huge-unlink storage
In preparation to extend --huge-unlink option semantics
refactor how it is stored in the internal configuration.
It makes future changes more isolated.

Signed-off-by: Dmitry Kozlyuk <dkozlyuk@nvidia.com>
Acked-by: Thomas Monjalon <thomas@monjalon.net>
Acked-by: Anatoly Burakov <anatoly.burakov@intel.com>
2022-02-08 21:32:53 +01:00
Dmitry Kozlyuk
2edd037c09 mem: add dirty malloc element support
The EAL malloc layer assumed that the content of all free elements
is filled with zeros ("clean"), as opposed to uninitialized ("dirty").
This assumption was ensured in two ways:
1. EAL memalloc layer always returned clean memory.
2. Freed memory was cleared before returning into the heap.

Clearing the memory can be as slow as around 14 GiB/s.
To avoid this cost, the memalloc layer is allowed to return dirty memory.
Such segments are marked with RTE_MEMSEG_FLAG_DIRTY.
The allocator tracks elements that contain dirty memory
using the new flag in the element header.
When clean memory is requested via rte_zmalloc*()
and the suitable element is dirty, it is cleared on allocation.
When memory is deallocated, the freed element is joined
with adjacent free elements, and the dirty flag is updated:

a) If the joint element contains dirty parts, it is dirty:

    dirty + freed + dirty = dirty  =>  no need to clean
            freed + dirty = dirty      the freed memory

   Dirty parts may be large (e.g. initial allocation),
   so clearing them could create unpredictable slowdown.

b) If the only dirty part of the joint element
   is the freed memory, the joint element can be made clean:

    clean + freed + clean = clean  =>  freed memory
    clean + freed         = clean      must be cleared
            freed + clean = clean
            freed         = clean

   This logic naturally reproduces the old behavior
   and always applies in modes when EAL memalloc layer
   returns only clean segments.

As a result, memory is either cleared on free, as before,
or it will be cleared on allocation if need be, but never twice.

Signed-off-by: Dmitry Kozlyuk <dkozlyuk@nvidia.com>
Reviewed-by: Anatoly Burakov <anatoly.burakov@intel.com>
2022-02-08 21:32:53 +01:00
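Rules a) and b) above reduce to a small decision, sketched here with a hypothetical helper. The real allocator tracks the flag in the element header; this sketch takes the neighbors' flags as plain arguments instead.

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/*
 * Decide the dirty flag of the element produced by joining a freed block
 * with its adjacent free elements. Pass false for a missing neighbor.
 * Returns the dirty flag of the joint element.
 */
bool demo_free_and_join(void *freed, size_t len, bool prev_dirty, bool next_dirty)
{
	/* Case a): a dirty neighbor makes the result dirty, so skip clearing. */
	if (prev_dirty || next_dirty)
		return true;
	/* Case b): only the freed block is dirty, so clear it and stay clean. */
	memset(freed, 0, len);
	return false;
}
```

In case a) the cost of clearing is deferred to a later allocation that asks for zeroed memory, which is how the memory ends up cleared at most once.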
Weiguo Li
0ae7844fcd eal/windows: remove useless C++ include guard
Remove the incomplete cplusplus guard in internal header.

Fixes: 6e1ed4cbbe ("eal/windows: add dirent implementation")

Signed-off-by: Weiguo Li <liwg06@foxmail.com>
Acked-by: Dmitry Kozlyuk <dmitry.kozliuk@gmail.com>
Acked-by: Pallavi Kadam <pallavi.kadam@intel.com>
2022-02-08 17:13:50 +01:00
Jie Zhou
e14f1744d6 eal: differentiate strerror message on Windows
On Windows, strerror returns just "Unknown error" for errnum greater
than MAX_ERRNO, while Linux and FreeBSD return "Unknown error <num>",
which is the current expectation for errno_autotest. Differentiate
the error string on Windows to remove a "duplicate error code" failure.

Signed-off-by: Jie Zhou <jizh@linux.microsoft.com>
Acked-by: Dmitry Kozlyuk <dmitry.kozliuk@gmail.com>
2022-02-08 14:19:40 +01:00
Jie Zhou
7e71c4dce3 eal/windows: fix error code for not supported API
The memory_autotest unit test has two failing cases on Windows, for the
EAL APIs eal_memalloc_get_seg_fd and eal_memalloc_get_seg_fd_offset.
These two APIs are not supported on Windows yet. They should return
ENOTSUP so that test_memory.c does not mark these two cases as failures,
the same as other ENOTSUP cases.

Fixes: 2a5d547a4a ("eal/windows: implement basic memory management")
Cc: stable@dpdk.org

Signed-off-by: Jie Zhou <jizh@linux.microsoft.com>
Acked-by: Dmitry Kozlyuk <dmitry.kozliuk@gmail.com>
2022-02-08 14:19:40 +01:00
Pallavi Kadam
416c1bef9d eal/windows: set worker thread affinity at init
Sometimes the OS tries to move a thread to another core. To prevent
this, bind each lcore thread to a fixed core.
Implement the affinity call on Windows, similar to Linux.

Signed-off-by: Qiao Liu <qiao.liu@intel.com>
Signed-off-by: Pallavi Kadam <pallavi.kadam@intel.com>
Acked-by: Narcisa Vasile <navasile@linux.microsoft.com>
Acked-by: Ranjit Menon <ranjit.menon@intel.com>
Acked-by: Tal Shnaiderman <talshn@nvidia.com>
Tested-by: Idan Hackmon <idanhac@nvidia.com>
2022-02-02 23:44:05 +01:00
Martijn Bakker
44f44d8298 pflock: fix header file installation
The generic header file was missing
in the list of files to install.

Fixes: 9667d97c25 ("pflock: add phase-fair reader writer locks")
Cc: stable@dpdk.org

Signed-off-by: Martijn Bakker <gladdyu@gmail.com>
Acked-by: Bruce Richardson <bruce.richardson@intel.com>
2022-02-02 14:34:11 +01:00
Anatoly Burakov
4042dc2037 mem: quiet base address hint warning if not requested
EAL memory allocations often go through the eal_get_virtual_area()
function, which prints a warning whenever the resulting allocation
does not match the specified address requirements. This is useful
when we have requested a specific base virtual address, to let the user
know that the mapping has deviated from that address.

However, on Linux, we also have a default base address that's there to
ensure better chances of successful secondary process initialization,
as well as higher likelihood of the virtual areas to fit inside the
IOMMU address width. Because of this default base address, there are
warnings printed even when no base address was explicitly requested,
which can be confusing to the user.

Emit this warning with debug level unless base address was explicitly
requested by the user.

Signed-off-by: Anatoly Burakov <anatoly.burakov@intel.com>
2022-01-28 12:06:26 +01:00
Stephen Hemminger
8a5a91401d eal/linux: log hugepage create errors with filename
While debugging a DPDK service running in a container, it is
useful to see which file creation failed.  Don't hide this
failure at the DEBUG log level.

Cc: stable@dpdk.org

Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
Acked-by: Bruce Richardson <bruce.richardson@intel.com>
2022-01-21 15:40:58 +01:00
Bruce Richardson
cadb255e25 eal: add OS defines for C conditional checks
Define a set of macros in the build configuration to allow C runtime
code to check the current OS environment. This saves the user having to
use ifdefs for e.g. disabling particular tests on Windows.
See included documentation changes for usage examples.

Signed-off-by: Bruce Richardson <bruce.richardson@intel.com>
Acked-by: Dmitry Kozlyuk <dmitry.kozliuk@gmail.com>
Acked-by: Jerin Jacob <jerinj@marvell.com>
2022-01-17 19:26:42 +01:00
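The benefit of such defines is that the OS check becomes an ordinary C expression, so both branches are compiled and type-checked on every platform. A sketch of the idea, using a hypothetical macro DEMO_ENV_IS_WINDOWS derived from _WIN32 rather than the macros the commit adds to the build configuration:

```c
#include <stdbool.h>

/* Hypothetical macro: always defined, as 0 or 1, so it works in plain C. */
#ifdef _WIN32
#define DEMO_ENV_IS_WINDOWS 1
#else
#define DEMO_ENV_IS_WINDOWS 0
#endif

/* Returns true if a (hypothetical) POSIX-only test should be run. */
bool demo_should_run_posix_test(void)
{
	/* A runtime-style check instead of #ifdef: both paths are compiled. */
	if (DEMO_ENV_IS_WINDOWS)
		return false;
	return true;
}
```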
Josh Soref
7be78d0279 fix spelling in comments and strings
The tool comes from https://github.com/jsoref

Signed-off-by: Josh Soref <jsoref@gmail.com>
Signed-off-by: Thomas Monjalon <thomas@monjalon.net>
2022-01-11 12:16:53 +01:00
Sean Morrissey
f8dbaebbf1 fix PMD wording
Remove the word "driver" following "PMD", as it is unnecessary.

Cc: stable@dpdk.org

Signed-off-by: Sean Morrissey <sean.morrissey@intel.com>
Signed-off-by: Conor Fogarty <conor.fogarty@intel.com>
Acked-by: John McNamara <john.mcnamara@intel.com>
Reviewed-by: Conor Walsh <conor.walsh@intel.com>
Reviewed-by: Ferruh Yigit <ferruh.yigit@intel.com>
2021-11-26 11:28:34 +01:00
Stephen Hemminger
4a6672c2d3 fix spelling in comments and doxygen
Fix spelling errors in comments including doxygen found using codespell.

Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
Acked-by: Konstantin Ananyev <konstantin.ananyev@intel.com>
Acked-by: Vladimir Medvedkin <vladimir.medvedkin@intel.com>
Reviewed-by: Honnappa Nagarahalli <honnappa.nagarahalli@arm.com>
Acked-by: Ray Kinsella <mdr@ashroe.eu>
Acked-by: Cristian Dumitrescu <cristian.dumitrescu@intel.com>
Acked-by: Chenbo Xia <chenbo.xia@intel.com>
2021-11-16 17:57:09 +01:00
David Christensen
f2a66612ee eal/ppc: support ASan
Add support for Address Sanitizer (ASan) for PPC/POWER architecture.

Signed-off-by: David Christensen <drc@linux.vnet.ibm.com>
Reviewed-by: David Marchand <david.marchand@redhat.com>
2021-11-16 11:24:22 +01:00
Volodymyr Fialko
001d402c89 eal/arm64: support ASan
This patch defines ASAN_SHADOW_OFFSET for arm64 according to the ASan
documentation. This offset should cover all arm64 VMAs supported by
ASan.

Signed-off-by: Volodymyr Fialko <vfialko@marvell.com>
Reviewed-by: David Marchand <david.marchand@redhat.com>
Acked-by: Jerin Jacob <jerinj@marvell.com>
Acked-by: Ruifeng Wang <ruifeng.wang@arm.com>
2021-11-12 15:30:00 +01:00
Maciej Szwed
aeed570a21 interrupt: fix request notifier interrupt processing
We should call read() on RTE_INTR_HANDLE_VFIO_REQ event
to confirm that event.

Fixes: 0eb8a1c4c7 ("vfio: add request notifier interrupt")
Cc: stable@dpdk.org

Signed-off-by: Maciej Szwed <maciej.szwed@intel.com>
2021-11-08 18:26:07 +01:00
Harman Kalra
7e2083e462 eal/linux: check interrupt file descriptor validity
This patch fixes coverity issue by adding a check for negative event fd
value.

Coverity issue: 373711, 373694
Fixes: c2bd9367e1 ("lib: remove direct access to interrupt handle")

Signed-off-by: Harman Kalra <hkalra@marvell.com>
Acked-by: David Marchand <david.marchand@redhat.com>
2021-11-08 17:32:42 +01:00
Harman Kalra
3fcca9fac6 interrupts: check file descriptor validity
This patch fixes coverity issues by adding a check for negative event
fd value.

Coverity issue: 373716, 373699, 373693, 373688
Fixes: bbbac4cd6e ("interrupts: remove direct access to interrupt handle")

Signed-off-by: Harman Kalra <hkalra@marvell.com>
Acked-by: David Marchand <david.marchand@redhat.com>
2021-11-08 17:32:42 +01:00
Anatoly Burakov
4fd15c6af0 vfio: set errno on unsupported OS
Currently, when code is running on FreeBSD or Windows, there is no way
to distinguish between a genuine error and a "VFIO is unsupported"
error. Fix the dummy implementations to also set the rte_errno flag.

Fixes: 279b581c89 ("vfio: expose functions")
Fixes: c564a2a200 ("vfio: expose clear group function for internal usages")
Fixes: 964b2f3bfb ("vfio: export some internal functions")
Fixes: ea2dc10668 ("vfio: add multi container support")
Cc: stable@dpdk.org

Signed-off-by: Anatoly Burakov <anatoly.burakov@intel.com>
Acked-by: Chenbo Xia <chenbo.xia@intel.com>
Acked-by: Bruce Richardson <bruce.richardson@intel.com>
2021-11-08 16:45:28 +01:00
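The pattern described above, a stub that reports "unsupported" through an error variable as well as its return value, might look like this. The sketch uses the plain C errno instead of rte_errno, the function names are hypothetical, and the choice of ENOTSUP is an assumption because the commit message does not name the value.

```c
#include <errno.h>

/* Hypothetical stub for a platform where the feature does not exist. */
int demo_unsupported_call(void)
{
	errno = ENOTSUP; /* assumption: the value is not named in the message */
	return -1;
}

/* Returns 1 if a failed call meant "unsupported", 0 for anything else. */
int demo_failure_is_unsupported(int ret)
{
	return ret < 0 && errno == ENOTSUP;
}
```

A caller can then treat "unsupported" as a skip condition and still report every other failure as a real error.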
Anatoly Burakov
da6e4cdca1 vfio: fix FreeBSD documentation
On FreeBSD, `rte_vfio_is_enabled()` and `rte_vfio_noiommu_is_enabled()`
API calls will not return error, and will instead return 0. This is
intentional, because the caller of this API does not care whether VFIO
is supported at all, and will instead be interested in whether VFIO is
enabled or not. However, the doxygen comments for these functions state
that they will return an error on FreeBSD, which is incorrect.

Fix the doxygen comment to call out the fact that these
functions are only relevant on Linux, but remove the reference to
returning errors.

Fixes: 279b581c89 ("vfio: expose functions")
Cc: stable@dpdk.org

Signed-off-by: Anatoly Burakov <anatoly.burakov@intel.com>
Acked-by: Chenbo Xia <chenbo.xia@intel.com>
2021-11-08 16:42:55 +01:00
Anatoly Burakov
bf8b792f3b vfio: fix FreeBSD clear group stub
On FreeBSD, `rte_vfio_clear_group()` was returning 0 even though this
function is not valid for FreeBSD, and the doxygen comments document it
as returning an error.
Fix the return value to match the documentation.

Fixes: c564a2a200 ("vfio: expose clear group function for internal usages")
Cc: stable@dpdk.org

Signed-off-by: Anatoly Burakov <anatoly.burakov@intel.com>
2021-11-08 16:42:44 +01:00
Anatoly Burakov
84e03bde1c vfio: drop fallback Linux implementation
Currently, VFIO support for Linux is compiled unconditionally, and
supported kernel versions start with 4.4, so VFIO is assumed to always
be enabled. There is no way of disabling VFIO support at compile time
anyway, so just drop the "VFIO not available" fallback code altogether.

Signed-off-by: Anatoly Burakov <anatoly.burakov@intel.com>
Acked-by: Chenbo Xia <chenbo.xia@intel.com>
2021-11-08 16:27:15 +01:00
Olivier Matz
9bffc92850 mem: fix dynamic hugepage mapping in container
Since its introduction in 2018, the SIGBUS handler was never registered,
and all related functions were unused.

A SIGBUS can be received by the application when accessing hugepages
even if mmap() was successful. This happens especially when running
inside containers when there are not enough hugepages. In this case, we
need to recover. A similar scheme can be found in eal_memory.c.

Fixes: 582bed1e1d ("mem: support mapping hugepages at runtime")
Cc: stable@dpdk.org

Signed-off-by: Olivier Matz <olivier.matz@6wind.com>
Reviewed-by: Maxime Coquelin <maxime.coquelin@redhat.com>
Acked-by: David Marchand <david.marchand@redhat.com>
2021-11-05 15:28:55 +01:00
Ilyes Ben Hamouda
770d41bf33 malloc: fix allocation with unknown socket ID
When using rte_malloc() from a thread which is not bound to a numa
socket (the typical case is a control thread, but it can also happen
on a dataplane thread if its cpu affinity is on cores attached to
several sockets), the used heap is the one from numa socket 0, which
may not have available memory.

Fix this by selecting the first socket which has available memory.

Note: malloc_get_numa_socket() is only used from one .c file, so move
it there, and remove the inline keyword.

Fixes: b94580d688 ("malloc: avoid unknown socket id")
Cc: stable@dpdk.org

Signed-off-by: Ilyes Ben Hamouda <ilyes.ben_hamouda@6wind.com>
Signed-off-by: Olivier Matz <olivier.matz@6wind.com>
Acked-by: David Marchand <david.marchand@redhat.com>
2021-11-05 15:28:49 +01:00
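The selection rule in the fix above can be sketched as follows. All names are hypothetical, the heaps are reduced to an array of byte counts, and falling back to socket 0 when no socket has memory is an assumption of this sketch.

```c
#include <stddef.h>

#define DEMO_SOCKET_ID_ANY (-1) /* hypothetical "not bound to a socket" marker */

/*
 * heap_bytes[i] is the amount of memory available on socket i.
 * Returns the socket whose heap should be tried first.
 */
int demo_pick_socket(int thread_socket, const size_t *heap_bytes, int n_sockets)
{
	int i;

	/* A thread bound to one socket keeps using that socket. */
	if (thread_socket != DEMO_SOCKET_ID_ANY)
		return thread_socket;
	/* Otherwise take the first socket that has memory available. */
	for (i = 0; i < n_sockets; i++)
		if (heap_bytes[i] > 0)
			return i;
	return 0; /* assumption: fall back to socket 0 if nothing is available */
}
```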
David Hunt
bb0bd346d5 eal: suggest using --lcores option
If the user requests an lcore ID of 128 or higher using -l,
the EAL will exit with "EAL: invalid core list syntax" and
very little other useful information.

This patch adds extra information suggesting the --lcores option,
which maps the logical cores in the application to physical cores
so that physical cores at or above RTE_MAX_LCORE (default 128) can
be used.

For example, if "-l 12-16,132,133" is used, we see the following
additional output on the command line:

EAL: lcore 132 >= RTE_MAX_LCORE (128)
EAL: lcore 133 >= RTE_MAX_LCORE (128)
EAL: To use high physical core ids, please use --lcores to map them
to lcore ids below RTE_MAX_LCORE,
EAL: e.g. --lcores 0@12,1@13,2@14,3@15,4@16,5@132,6@133

The same is added to -c option parsing.

For example, if "-c 0x300000000000000000000000000000000" is
used, we see the following additional output on the command line:

EAL: lcore 128 >= RTE_MAX_LCORE (128)
EAL: lcore 129 >= RTE_MAX_LCORE (128)
EAL: To use high physical core ids, please use --lcores to map them
to lcore ids below RTE_MAX_LCORE,
EAL: e.g. --lcores 0@128,1@129

Signed-off-by: David Hunt <david.hunt@intel.com>
Acked-by: Bruce Richardson <bruce.richardson@intel.com>
2021-11-05 14:39:37 +01:00
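The suggested mapping string in the log output above pairs consecutive lcore IDs, starting at 0, with the requested physical cores. A sketch of how such a string could be built; demo_lcores_hint is a hypothetical helper, not the function added by the patch.

```c
#include <stddef.h>
#include <stdio.h>

/*
 * Writes "0@<cpu0>,1@<cpu1>,..." into out.
 * Returns 0 on success, -1 if out is too small.
 */
int demo_lcores_hint(const unsigned int *cpus, size_t n, char *out, size_t out_len)
{
	size_t used = 0;
	size_t i;

	if (out_len == 0)
		return -1;
	out[0] = '\0';
	for (i = 0; i < n; i++) {
		/* Each entry maps lcore ID i to physical core cpus[i]. */
		int w = snprintf(out + used, out_len - used, "%s%zu@%u",
				 i == 0 ? "" : ",", i, cpus[i]);

		if (w < 0 || (size_t)w >= out_len - used)
			return -1;
		used += (size_t)w;
	}
	return 0;
}
```

Called with the cores 128 and 129, this produces the same "0@128,1@129" text as the second example in the message.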
David Marchand
f5fa0e110f eal: promote non-EAL lcore API as stable
This API has been around for more than a year (and is in LTS 20.11).
It did not receive negative feedback and will be used in a next OVS
release.
Mark it stable.

Signed-off-by: David Marchand <david.marchand@redhat.com>
Acked-by: Kevin Traynor <ktraynor@redhat.com>
Acked-by: Thomas Monjalon <thomas@monjalon.net>
2021-11-04 22:57:58 +01:00
David Marchand
5633173341 eal/linux: fix device hotplug
The device event interrupt handler was always freed.

Bugzilla ID: 845
Fixes: c2bd9367e1 ("lib: remove direct access to interrupt handle")

Signed-off-by: David Marchand <david.marchand@redhat.com>
Tested-by: Yan Xia <yanx.xia@intel.com>
Reviewed-by: Maxime Coquelin <maxime.coquelin@redhat.com>
2021-11-04 15:13:41 +01:00
David Marchand
4847122aab eal/linux: fix uevent message parsing
Caught with ASan:
==9727==ERROR: AddressSanitizer: stack-buffer-overflow on address
  0x7f0daa2fc0d0 at pc 0x7f0daeefacb2 bp 0x7f0daa2fadd0 sp 0x7f0daa2fa578
READ of size 1 at 0x7f0daa2fc0d0 thread T1
    #0 0x7f0daeefacb1  (/lib64/libasan.so.5+0xbacb1)
    #1 0x115eba1 in dev_uev_parse ../lib/eal/linux/eal_dev.c:167
    #2 0x115f281 in dev_uev_handler ../lib/eal/linux/eal_dev.c:248
    #3 0x1169b91 in eal_intr_process_interrupts
  ../lib/eal/linux/eal_interrupts.c:1026
    #4 0x116a3a2 in eal_intr_handle_interrupts
  ../lib/eal/linux/eal_interrupts.c:1100
    #5 0x116a7f0 in eal_intr_thread_main
  ../lib/eal/linux/eal_interrupts.c:1172
    #6 0x112640a in ctrl_thread_init
  ../lib/eal/common/eal_common_thread.c:202
    #7 0x7f0dade27159 in start_thread (/lib64/libpthread.so.0+0x8159)
    #8 0x7f0dadb58f72 in clone (/lib64/libc.so.6+0xfcf72)

Address 0x7f0daa2fc0d0 is located in stack of thread T1 at offset 4192
  in frame
    #0 0x115f0c9 in dev_uev_handler ../lib/eal/linux/eal_dev.c:226

  This frame has 2 object(s):
    [32, 48) 'uevent'
    [96, 4192) 'buf' <== Memory access at offset 4192 overflows this
  variable
HINT: this may be a false positive if your program uses some custom
  stack unwind mechanism or swapcontext
      (longjmp and C++ exceptions *are* supported)
Thread T1 created by T0 here:
    #0 0x7f0daee92ea3 in __interceptor_pthread_create
  (/lib64/libasan.so.5+0x52ea3)
    #1 0x1126542 in rte_ctrl_thread_create
  ../lib/eal/common/eal_common_thread.c:228
    #2 0x116a8b5 in rte_eal_intr_init
  ../lib/eal/linux/eal_interrupts.c:1200
    #3 0x1159dd1 in rte_eal_init ../lib/eal/linux/eal.c:1044
    #4 0x7a22f8 in main ../app/test-pmd/testpmd.c:4105
    #5 0x7f0dada7f802 in __libc_start_main (/lib64/libc.so.6+0x23802)

Bugzilla ID: 792
Fixes: 0d0f478d04 ("eal/linux: add uevent parse and process")
Cc: stable@dpdk.org

Signed-off-by: David Marchand <david.marchand@redhat.com>
Tested-by: Yan Xia <yanx.xia@intel.com>
Reviewed-by: Maxime Coquelin <maxime.coquelin@redhat.com>
2021-11-04 15:13:41 +01:00
Jim Harris
628bac7df1 eal/linux: remove unused variable for socket memory
clang-13 rightfully complains that the total_mem variable in
eal_parse_socket_arg is set but not used, since the final
accumulated total_mem result isn't used anywhere.
So just remove the total_mem variable.

Fixes: 0a703f0f36 ("eal/linux: fix parsing zero socket memory and limits")

Signed-off-by: Jim Harris <james.r.harris@intel.com>
Reviewed-by: David Marchand <david.marchand@redhat.com>
2021-11-04 13:27:18 +01:00
Xueming Li
fc382022c6 eal: fix device iterator when no bus is selected
The devargs used in device iterator initialization were not zeroed, so
random data in fields such as the bus string led to invalid address accesses.

This patch initializes devargs.

Bugzilla ID: 862
Fixes: c99a2d4c6b ("eal: implement device iteration initialization")
Cc: stable@dpdk.org

Signed-off-by: Xueming Li <xuemingl@nvidia.com>
2021-11-04 11:44:49 +01:00
Dmitry Kozlyuk
9790fc2149 eal/freebsd: fix IOVA mode selection
FreeBSD EAL selected IOVA mode PA even in --no-huge mode
where PA are not available. Memory zones were created with IOVA
equal to RTE_BAD_IOVA with no indication this field is not usable.

Change IOVA mode detection:
1. Always allow to force --iova-mode=va.
2. In --no-huge mode, disallow forcing --iova-mode=pa, and select VA.
3. Otherwise select IOVA mode according to bus requests, default to PA.
In case contigmem is inaccessible, memory initialization will fail
with a message indicating the cause.

Fixes: c2361bab70 ("eal: compute IOVA mode based on PA availability")
Cc: stable@dpdk.org

Signed-off-by: Dmitry Kozlyuk <dkozlyuk@nvidia.com>
Acked-by: Bruce Richardson <bruce.richardson@intel.com>
2021-11-03 18:32:19 +01:00
Feifei Wang
4ed4e554ac mcslock: use wait until scheme for unlock
Instead of polling for mcslock to be updated, use wait until scheme
for this case.

Signed-off-by: Feifei Wang <feifei.wang2@arm.com>
Reviewed-by: Ruifeng Wang <ruifeng.wang@arm.com>
Acked-by: Jerin Jacob <jerinj@marvell.com>
2021-11-03 15:50:14 +01:00
Feifei Wang
41902d2468 pflock: use wait until scheme for read lock
Instead of polling for read pflock update, use wait until scheme for
this case.

Signed-off-by: Feifei Wang <feifei.wang2@arm.com>
Reviewed-by: Ruifeng Wang <ruifeng.wang@arm.com>
Acked-by: Jerin Jacob <jerinj@marvell.com>
2021-11-03 15:50:14 +01:00
Feifei Wang
875f350924 eal: add a new helper for wait until scheme
Add a new generic helper which is a macro for wait until scheme.

Furthermore, to prevent compilation warning in arm:
----------------------------------------------
'warning: implicit declaration of function ...'
----------------------------------------------
Delete the 'undef' constructions for '__LOAD_EXC_xx', '__SEVL' and '__WFE',
and add the '__RTE_ARM' prefix to these macros to fix the namespace.
This is because the original macros are undefined at the end of the file.
If the new macro uses them in other files, they are seen as
'not defined'.

Signed-off-by: Feifei Wang <feifei.wang2@arm.com>
Reviewed-by: Ruifeng Wang <ruifeng.wang@arm.com>
Acked-by: Konstantin Ananyev <konstantin.ananyev@intel.com>
Acked-by: Jerin Jacob <jerinj@marvell.com>
2021-11-03 15:50:14 +01:00
Zhihong Peng
6cc51b1293 mem: instrument allocator for ASan
This patch adds necessary hooks in the memory allocator for ASan.

This feature is currently available in DPDK only on Linux x86_64.
If other OS/architectures want to support it, ASAN_SHADOW_OFFSET must be
defined and RTE_MALLOC_ASAN must be set accordingly in meson.

Signed-off-by: Xueqin Lin <xueqin.lin@intel.com>
Signed-off-by: Zhihong Peng <zhihongx.peng@intel.com>
Acked-by: Anatoly Burakov <anatoly.burakov@intel.com>
2021-10-29 16:25:03 +02:00
Anatoly Burakov
ab910a8068 vfio: fix partial unmap
Partial unmap support was introduced in commit c13ca4e81c
("vfio: fix DMA mapping granularity for IOVA as VA"), and with it
was added a check that dereferenced the IOMMU type to determine whether
partial unmapping is supported for the currently configured IOMMU type. In
certain circumstances (such as when VFIO is supported, but no devices
were bound to the VFIO driver), the IOMMU type pointer can be NULL.

However, dereferencing of IOMMU type was guarded by access to the user
maps list - that is, we were always checking the user map list first,
and then, if we found a memory region that encloses the one we're trying
to unmap, we would have performed the IOMMU type check.

This ensured that the IOMMU type check will not cause any NULL pointer
dereferences, because in order for an IOMMU type check to have been
performed, there necessarily must have been at least one memory region
that was previously mapped successfully, and that implies having a
defined IOMMU type.

When commit 56259f7fc0 ("vfio: allow partially unmapping adjacent
memory") was introduced, the IOMMU type check was moved to
before we were traversing the user mem maps list, thereby introducing a
potential NULL dereference, because the IOMMU type access was no longer
guarded by the user mem maps list traversal.

Fix the issue by moving the IOMMU type check to after the user mem maps
traversal, thereby ensuring that by the time the check happens, the
IOMMU type is always valid.

Fixes: 56259f7fc0 ("vfio: allow partially unmapping adjacent memory")

Signed-off-by: Anatoly Burakov <anatoly.burakov@intel.com>
Reviewed-by: David Marchand <david.marchand@redhat.com>
Tested-by: Xuan Ding <xuan.ding@intel.com>
2021-10-28 09:51:55 +02:00
Honnappa Nagarahalli
705356f081 eal: simplify control thread creation
Remove the usage of pthread barrier and replace it with
synchronization using atomic variable.
This also removes the use of reference count required to synchronize
freeing the memory.

Signed-off-by: Honnappa Nagarahalli <honnappa.nagarahalli@arm.com>
Reviewed-by: Olivier Matz <olivier.matz@6wind.com>
2021-10-25 21:43:10 +02:00
Harman Kalra
8cb5d08db9 interrupts: extend event list
Dynamically allocate the efds and elist arrays of the intr_handle
structure, based on the size provided by the user. E.g. the size can be
the number of MSI-X interrupts supported by a PCI device.

Signed-off-by: Harman Kalra <hkalra@marvell.com>
Signed-off-by: David Marchand <david.marchand@redhat.com>
Acked-by: Dmitry Kozlyuk <dmitry.kozliuk@gmail.com>
Tested-by: Raslan Darawsheh <rasland@nvidia.com>
2021-10-25 21:20:12 +02:00
Harman Kalra
99e6c7e316 interrupts: rename device specific file descriptor
VFIO/UIO are mutually exclusive, storing file descriptor in a single
field is enough.

Signed-off-by: Harman Kalra <hkalra@marvell.com>
Signed-off-by: David Marchand <david.marchand@redhat.com>
Tested-by: Raslan Darawsheh <rasland@nvidia.com>
2021-10-25 21:20:12 +02:00
Harman Kalra
73d844fd08 interrupts: make interrupt handle structure opaque
Move the interrupt handle structure definition into an EAL private
header to make its fields totally opaque to the outside world.

Signed-off-by: Harman Kalra <hkalra@marvell.com>
Signed-off-by: David Marchand <david.marchand@redhat.com>
Tested-by: Raslan Darawsheh <rasland@nvidia.com>
2021-10-25 21:20:12 +02:00
Harman Kalra
c2bd9367e1 lib: remove direct access to interrupt handle
Remove direct access to the interrupt handle structure fields and
use the respective get/set APIs instead.
Make this change in all the libraries that access the interrupt handle fields.

Signed-off-by: Harman Kalra <hkalra@marvell.com>
Signed-off-by: David Marchand <david.marchand@redhat.com>
Tested-by: Raslan Darawsheh <rasland@nvidia.com>
2021-10-25 21:20:12 +02:00
Harman Kalra
90b13ab8d4 alarm: remove direct access to interrupt handle
Remove direct access to the interrupt handle structure fields and
use the respective get/set APIs instead.
Make this change in all the libraries that access the interrupt handle fields.

Implementing alarm cleanup routine, where the memory allocated
for interrupt instance can be freed.

Signed-off-by: Harman Kalra <hkalra@marvell.com>
Signed-off-by: David Marchand <david.marchand@redhat.com>
Tested-by: Raslan Darawsheh <rasland@nvidia.com>
2021-10-25 21:20:12 +02:00
Harman Kalra
bbbac4cd6e interrupts: remove direct access to interrupt handle
Making changes to the interrupt framework to use interrupt handle
APIs to get/set any field.

Signed-off-by: Harman Kalra <hkalra@marvell.com>
Signed-off-by: David Marchand <david.marchand@redhat.com>
Tested-by: Raslan Darawsheh <rasland@nvidia.com>
2021-10-25 21:20:12 +02:00
Harman Kalra
b7c9842916 interrupts: add allocator and accessors
Prototype and implement get/set APIs for the interrupt handle fields.
Users won't be able to access any of the interrupt handle fields
directly and should instead use these get/set APIs to access or
manipulate them.

Internal interrupt header i.e. rte_eal_interrupt.h is rearranged,
as APIs defined are moved to rte_interrupts.h and epoll specific
definitions are moved to a new header rte_epoll.h.
Later in the series rte_eal_interrupt.h will be removed.

Signed-off-by: Harman Kalra <hkalra@marvell.com>
Acked-by: Ray Kinsella <mdr@ashroe.eu>
Acked-by: Dmitry Kozlyuk <dmitry.kozliuk@gmail.com>
Signed-off-by: David Marchand <david.marchand@redhat.com>
Tested-by: Raslan Darawsheh <rasland@nvidia.com>
2021-10-25 21:20:12 +02:00
Dmitry Kozlyuk
0c8fc83a71 eal/windows: fix IOVA mode detection and handling
Windows EAL did not detect IOVA mode and worked incorrectly
if physical addresses could not be obtained
(if virt2phys driver was missing or inaccessible).
In this case, rte_mem_virt2iova() reported RTE_BAD_IOVA for any address.
Inability to obtain IOVA, be it PA or VA, should cause a failure
for the DPDK allocator, but it was hidden by the implementation,
so allocations did not fail when they should.
The mode when DPDK cannot obtain PA but can work is IOVA-as-VA mode.
However, rte_eal_iova_mode() always returned RTE_IOVA_DC
(while it should only ever return RTE_IOVA_PA or RTE_IOVA_VA),
because IOVA mode detection was not implemented.

Implement IOVA mode detection:
1. Always allow to force --iova-mode=va.
2. Allow to force --iova-mode=pa only if virt2phys is available.
3. If no mode is forced and virt2phys is available,
   select the mode according to bus requests, default to PA.
4. If no mode is forced but virt2phys is unavailable, default to VA.
Fix rte_mem_virt2iova() by returning VA when using IOVA-as-VA.
Fix rte_eal_iova_mode() by returning the selected mode.
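
The four rules above can be sketched as a single decision function. This
is a hedged illustration, not the Windows EAL code: the enum stands in for
RTE_IOVA_DC, RTE_IOVA_PA and RTE_IOVA_VA, and sketch_select_iova() and its
parameters are hypothetical names.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Illustrative stand-ins for RTE_IOVA_DC / RTE_IOVA_PA / RTE_IOVA_VA. */
enum sketch_iova { SKETCH_IOVA_DC, SKETCH_IOVA_PA, SKETCH_IOVA_VA };

/*
 * Hypothetical decision function following the four rules above.
 * forced: mode requested with --iova-mode, or SKETCH_IOVA_DC if none.
 * bus_mode: mode requested by buses, or SKETCH_IOVA_DC if they do not care.
 * Returns 0 and sets *out, or -1 if the forced mode cannot be honored.
 */
static int
sketch_select_iova(enum sketch_iova forced, bool virt2phys_available,
		enum sketch_iova bus_mode, enum sketch_iova *out)
{
	assert(out != NULL);
	if (forced == SKETCH_IOVA_VA) {
		/* Rule 1: forcing VA is always allowed. */
		*out = SKETCH_IOVA_VA;
		return 0;
	}
	if (forced == SKETCH_IOVA_PA) {
		/* Rule 2: forcing PA requires virt2phys. */
		if (!virt2phys_available)
			return -1;
		*out = SKETCH_IOVA_PA;
		return 0;
	}
	if (virt2phys_available)
		/* Rule 3: follow bus requests, default to PA. */
		*out = (bus_mode == SKETCH_IOVA_DC) ? SKETCH_IOVA_PA : bus_mode;
	else
		/* Rule 4: without virt2phys, default to VA. */
		*out = SKETCH_IOVA_VA;
	return 0;
}
```

Note that a successful call never leaves the "don't care" value in *out,
which matches the requirement that only PA or VA is ever reported.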

Fixes: 2a5d547a4a ("eal/windows: implement basic memory management")
Cc: stable@dpdk.org

Reported-by: Tal Shnaiderman <talshn@nvidia.com>
Signed-off-by: Dmitry Kozlyuk <dkozlyuk@nvidia.com>
Tested-by: Pallavi Kadam <pallavi.kadam@intel.com>
Acked-by: Pallavi Kadam <pallavi.kadam@intel.com>
2021-10-25 20:59:40 +02:00
Harman Kalra
e6732d0d6e mem: add telemetry infos
Registering new telemetry callbacks to list named (memzones)
and unnamed (malloc) memory reserved and return information
based on arguments provided by user.

Example:
Connecting to /var/run/dpdk/rte/dpdk_telemetry.v2
{"version": "DPDK 21.11.0-rc0", "pid": 59754, "max_output_len": 16384}
Connected to application: "dpdk-testpmd"
-->
--> /eal/memzone_list
{"/eal/memzone_list": [0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13]}
-->
-->
--> /eal/memzone_info,0
{"/eal/memzone_info": {"Zone": 0, "Name": "rte_eth_dev_data",    \
"Length": 225408, "Address": "0x13ffc0280", "Socket": 0, "Flags": 0, \
"Hugepage_size": 536870912, "Hugepage_base": "0x120000000",   \
"Hugepage_used": 1}}
-->
-->
--> /eal/memzone_info,6
{"/eal/memzone_info": {"Zone": 6, "Name": "MP_mb_pool_0_0",  \
"Length": 669918336, "Address": "0x15811db80", "Socket": 0,  \
"Flags": 0, "Hugepage_size": 536870912, "Hugepage_base": "0x140000000", \
"Hugepage_used": 2}}
-->
-->
--> /eal/memzone_info,14
{"/eal/memzone_info": null}
-->
-->
--> /eal/heap_list
{"/eal/heap_list": [0]}
-->
-->
--> /eal/heap_info,0
{"/eal/heap_info": {"Head id": 0, "Name": "socket_0",     \
"Heap_size": 1610612736, "Free_size": 927645952,          \
"Alloc_size": 682966784, "Greatest_free_size": 529153152, \
"Alloc_count": 482, "Free_count": 2}}

Signed-off-by: Harman Kalra <hkalra@marvell.com>
Acked-by: Ciara Power <ciara.power@intel.com>
Acked-by: Bruce Richardson <bruce.richardson@intel.com>
2021-10-25 19:39:54 +02:00
Honnappa Nagarahalli
3596537005 eal: fix memory ordering around lcore task accesses
Ensure that the memory operations before the call to
rte_eal_remote_launch are visible to the worker thread.
Use the function pointer to execute in worker thread
as the guard variable.

Ensure that the memory operations in worker thread, that happen
before it returns the status of the assigned function, are
visible to the main thread. Use the variable containing the
lcore's state as the guard variable.

Signed-off-by: Honnappa Nagarahalli <honnappa.nagarahalli@arm.com>
Reviewed-by: Ola Liljedahl <ola.liljedahl@arm.com>
Reviewed-by: Feifei Wang <feifei.wang2@arm.com>
2021-10-25 18:20:59 +02:00
Honnappa Nagarahalli
f6c6c686f1 eal: remove FINISHED lcore state
FINISHED state seems to be used to indicate that the worker's update
of the 'state' is not visible to other threads. There seems to be no
requirement to have such a state.

Since the FINISHED state is removed, the API rte_eal_wait_lcore
is updated to always return the status of the last function that
ran in the worker core.

Signed-off-by: Honnappa Nagarahalli <honnappa.nagarahalli@arm.com>
Reviewed-by: Ola Liljedahl <ola.liljedahl@arm.com>
Reviewed-by: Feifei Wang <feifei.wang2@arm.com>
2021-10-25 18:20:59 +02:00
Honnappa Nagarahalli
33969e9c61 eal: reset lcore task callback and argument
In the rte_eal_remote_launch function, the lcore function
pointer is checked for NULL. However, the pointer is never
reset to NULL. Reset the lcore function pointer and argument
after the worker has completed executing the lcore function.

Fixes: af75078fec ("first public release")
Cc: stable@dpdk.org

Signed-off-by: Honnappa Nagarahalli <honnappa.nagarahalli@arm.com>
Reviewed-by: Ruifeng Wang <ruifeng.wang@arm.com>
Reviewed-by: Feifei Wang <feifei.wang2@arm.com>
2021-10-25 18:20:59 +02:00
Eli Britstein
6de430b707 eal/x86: avoid cast-align warning in memcpy functions
Functions and macros in x86 rte_memcpy.h may cause cast-align warnings,
when using strict cast align flag with supporting gcc:
gcc (Ubuntu 9.3.0-17ubuntu1~20.04) 9.3.0
CFLAGS="-Wcast-align=strict" make V=1 -C examples/l2fwd clean static

For example:
In file included from main.c:24:
/dpdk/build/include/rte_memcpy.h: In function 'rte_mov16':
/dpdk/build/include/rte_memcpy.h:306:25: warning: cast increases
required alignment of target type [-Wcast-align]
  306 |  xmm0 = _mm_loadu_si128((const __m128i *)src);
      |                         ^

As the code assumes correct alignment, first add a (void *) or (const
void *) cast to avoid the warnings.

Fixes: 9484092baa ("eal/x86: optimize memcpy for AVX512 platforms")
Cc: stable@dpdk.org

Signed-off-by: Eli Britstein <elibr@nvidia.com>
2021-10-25 17:28:12 +02:00
Xuan Ding
56259f7fc0 vfio: allow partially unmapping adjacent memory
Currently, if we map a memory area A, then map a separate memory area B
that by coincidence happens to be adjacent to A, current implementation
will merge these two segments into one, and if partial unmapping is not
supported, these segments will then be only allowed to be unmapped in
one go. In other words, given segments A and B that are adjacent, it
is currently not possible to map A, then map B, then unmap A.

Fix this by adding a notion of "chunk size", which will allow
subdividing segments into equally sized segments whenever we are dealing
with an IOMMU that does not support partial unmapping. With this change,
we will still be able to merge adjacent segments, but only if they are
of the same size. If we keep with our above example, adjacent segments A
and B will be stored as separate segments if they are of different
sizes.
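
The merge rule described above can be sketched as a predicate. This is an
illustration under assumptions, not the VFIO code: struct sketch_map and
sketch_can_merge() are hypothetical, and "chunk" is taken to be the size of
the equally sized pieces a mapping is divided into.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical record of one user mapping; not the EAL structure. */
struct sketch_map {
	uint64_t addr;  /* start address */
	uint64_t len;   /* total length in bytes */
	uint64_t chunk; /* size of each equally sized piece */
};

/* Two maps may be merged only if b starts where a ends and chunks match. */
static bool
sketch_can_merge(const struct sketch_map *a, const struct sketch_map *b)
{
	return a->addr + a->len == b->addr && a->chunk == b->chunk;
}
```

Adjacent mappings with different chunk sizes fail the predicate and stay
separate, so each one can later be unmapped on its own.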

Signed-off-by: Anatoly Burakov <anatoly.burakov@intel.com>
Signed-off-by: Xuan Ding <xuan.ding@intel.com>
Tested-by: Yvonne Yang <yvonnex.yang@intel.com>
Reviewed-by: Maxime Coquelin <maxime.coquelin@redhat.com>
2021-10-21 14:24:21 +02:00
Xueming Li
5adef306da devargs: make bus optional
Global devargs syntax is used as device iteration filter like
"class=vdpa", a devargs without bus args is valid from parsing
perspective.

This patch makes bus args optional.

Fixes: d2a66ad794 ("bus: add device arguments name parsing")

Signed-off-by: Xueming Li <xuemingl@nvidia.com>
Reviewed-by: Gaetan Rivet <grive@u256.net>
2021-10-21 11:32:44 +02:00
Xueming Li
9a1a9e4a2d devargs: support path value with global device syntax
Slash is used to split global device arguments.

To support path values that contain slashes, this patch parses devargs by
locating both the slash and the layer name key:
  bus=a,name=/some/path/class=b,k1=v1/driver=c,k2=v2
"/class=" and "/driver=" are valid starts of a layer.

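The layer search can be sketched as follows. This is a minimal
illustration, not the devargs parser: sketch_next_layer() is a hypothetical
helper, and it assumes that only "/class=" and "/driver=" start a new
layer, so that slashes inside a path value are skipped.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/*
 * Hypothetical helper: return a pointer to the '/' that starts the next
 * layer in s, or NULL if there is none. Only "/class=" and "/driver=" are
 * treated as layer starts, so slashes inside values are skipped.
 */
static const char *
sketch_next_layer(const char *s)
{
	const char *p;

	assert(s != NULL);
	for (p = strchr(s, '/'); p != NULL; p = strchr(p + 1, '/')) {
		if (strncmp(p, "/class=", 7) == 0 ||
		    strncmp(p, "/driver=", 8) == 0)
			return p;
	}
	return NULL;
}
```

On the example string, the first call skips "/some" and "/path" and stops
at "/class=b,k1=v1/driver=c,k2=v2".
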
Signed-off-by: Xueming Li <xuemingl@nvidia.com>
Reviewed-by: Gaetan Rivet <grive@u256.net>
2021-10-21 11:32:06 +02:00
Feifei Wang
c4629b02c5 mcslock: use WFE in lock for aarch64
Instead of polling for previous lock holder unlocking, use
wait_until_equal API.

Signed-off-by: Feifei Wang <feifei.wang2@arm.com>
Reviewed-by: Ruifeng Wang <ruifeng.wang@arm.com>
2021-10-20 08:22:41 +02:00
Feifei Wang
a6e24bf417 mem: use WFE for init sync on aarch64
Instead of polling for mcfg->magic to be updated, use wait_until_equal
API.

Signed-off-by: Feifei Wang <feifei.wang2@arm.com>
Reviewed-by: Ruifeng Wang <ruifeng.wang@arm.com>
2021-10-20 08:22:18 +02:00
David Marchand
bc1a35fb3f memzone: enforce valid flags when reserving
If we do not enforce that valid flags are passed by an application, this
application might face issues in the future when we add more flags.
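
A check of this kind can be sketched as a mask test. The flag names below
are hypothetical placeholders, not the memzone flags, and returning -EINVAL
is an assumption made for the illustration.

```c
#include <errno.h>

/* Hypothetical flag bits, for illustration only. */
#define SKETCH_FLAG_A (1u << 0)
#define SKETCH_FLAG_B (1u << 1)
#define SKETCH_KNOWN_FLAGS (SKETCH_FLAG_A | SKETCH_FLAG_B)

/* Reject any request that sets a bit outside the known flags. */
static int
sketch_check_flags(unsigned int flags)
{
	if ((flags & ~SKETCH_KNOWN_FLAGS) != 0)
		return -EINVAL;
	return 0;
}
```

Rejecting unknown bits today means a bit that gains a meaning later cannot
already be set by accident in existing applications.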

Signed-off-by: David Marchand <david.marchand@redhat.com>
Acked-by: Stephen Hemminger <stephen@networkplumber.org>
Acked-by: Andrew Rybchenko <andrew.rybchenko@oktetlabs.ru>
Acked-by: Ray Kinsella <mdr@ashroe.eu>
2021-10-15 10:29:21 +02:00
Bruce Richardson
e89463a366 eal: limit telemetry to primary processes
Telemetry interface should be exposed for primary processes only, since
secondary processes will conflict on socket creation, and since all
data in secondary process is generally available to primary. For
example, all device stats for ethdevs, cryptodevs, etc. will all be
common across processes.

Signed-off-by: Bruce Richardson <bruce.richardson@intel.com>
Acked-by: Ciara Power <ciara.power@intel.com>
Tested-by: Conor Walsh <conor.walsh@intel.com>
2021-10-14 20:31:10 +02:00
David Christensen
b698651b91 eal/ppc: use compiler builtins for atomics
Replace existing PPC assembly code for rte_atomicXX ops with compiler
atomic builtins as previously adopted by DPDK (see [1] and [2]).  This
has the additional benefit of resolving a POWER10 build failure due to an
outstanding gcc issue which fails on the existing PPC assembly code [3].

[1] https://www.dpdk.org/blog/2021/03/26/dpdk-adopts-the-c11-memory-model/
[2] https://doc.dpdk.org/guides/rel_notes/deprecation.html
[3] https://gcc.gnu.org/bugzilla/show_bug.cgi?id=98519

Signed-off-by: David Christensen <drc@linux.vnet.ibm.com>
2021-10-14 16:51:25 +02:00
Bruce Richardson
0faa4cfc50 eal/freebsd: ignore in-memory option
The in-memory option is not supported on FreeBSD so print a warning and
ignore the flag when it is specified for BSD apps. The lack of support
is due to the different way in which memory is managed on FreeBSD using
the contigmem driver rather than via a hugetlbfs filesystem.

Fixes: 14de8734c4 ("eal: add --in-memory option")
Cc: stable@dpdk.org

Signed-off-by: Bruce Richardson <bruce.richardson@intel.com>
2021-10-13 17:11:26 +02:00
David Marchand
2f3758751b eal/x86: sort CPU extended features definitions
Sort the definitions for extended features (leaf 0) to enhance
readability.

Signed-off-by: David Marchand <david.marchand@redhat.com>
Acked-by: Bruce Richardson <bruce.richardson@intel.com>
2021-10-12 21:07:53 +02:00
David Marchand
aae3037ab1 eal/x86: fix some CPU extended features definitions
Caught while checking CPUID related stuff in OVS.

According to [1], for Structured Extended Feature Flags Enumeration Leaf
(EAX = 0x07H, ECX = 0):

- BMI1 is associated to EBX, bit 3 (was incorrectly 2),
- SMEP is associated to EBX, bit 7 (was incorrectly 6),
- BMI2 is associated to EBX, bit 8 (was incorrectly 7),
- ERMS is associated to EBX, bit 9 (was incorrectly 8),
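
As an illustration of the corrected positions, the sketch below tests
feature bits in an EBX value that was saved from this leaf. The enum and
leaf7_ebx_has() are hypothetical names, not the EAL definitions.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* EBX bit positions for CPUID leaf 7, sub-leaf 0, as listed above. */
enum leaf7_ebx_bit {
	LEAF7_EBX_BMI1 = 3,
	LEAF7_EBX_SMEP = 7,
	LEAF7_EBX_BMI2 = 8,
	LEAF7_EBX_ERMS = 9,
};

/* Hypothetical helper: test one feature bit in a saved EBX value. */
static bool
leaf7_ebx_has(uint32_t ebx, enum leaf7_ebx_bit bit)
{
	assert((unsigned int)bit < 32);
	return ((ebx >> bit) & 1u) != 0;
}
```

With the old off-by-one positions, a CPU setting only bit 2 would have been
reported as supporting BMI1; with the values above it is not.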

1: https://software.intel.com/sites/default/files/managed/c5/15/architecture-instruction-set-extensions-programming-reference.pdf

Fixes: af75078fec ("first public release")
Cc: stable@dpdk.org

Signed-off-by: David Marchand <david.marchand@redhat.com>
Acked-by: Bruce Richardson <bruce.richardson@intel.com>
2021-10-12 21:07:50 +02:00
John Levon
24d5a1ce6b eal/linux: allow hugetlbfs sub-directories
get_hugepage_dir() was implemented in such a way that a --huge-dir
option had to exactly match the mountpoint, but there's no reason for
this restriction: DPDK might not be the only user of hugepages, and
shouldn't assume it owns an entire mountpoint. For example, if I have
/dev/hugepages/myapp, and /dev/hugepages/dpdk, I should be able to
specify:

--huge-dir=/dev/hugepages/dpdk/

and have DPDK only use that sub-directory.

Fix the implementation to allow a sub-directory within a suitable
hugetlbfs mountpoint to be specified, preferring the closest match.
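
"Closest match" can be read as a longest-prefix search over the known
mountpoints. The sketch below shows that idea only; it is not the
get_hugepage_dir() implementation, and sketch_closest_mount() is a
hypothetical helper.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/*
 * Hypothetical helper: among n mountpoints, return the index of the longest
 * one that is a path prefix of dir, or -1 if none matches.
 */
static int
sketch_closest_mount(const char *dir, const char *const *mounts, size_t n)
{
	size_t best_len = 0;
	int best = -1;
	size_t i;

	assert(dir != NULL && mounts != NULL);
	for (i = 0; i < n; i++) {
		size_t len = strlen(mounts[i]);

		/* Ignore trailing slashes on the mountpoint. */
		while (len > 1 && mounts[i][len - 1] == '/')
			len--;
		if (len == 0 || strncmp(dir, mounts[i], len) != 0)
			continue;
		/* The prefix must end on a path component boundary. */
		if (dir[len] != '\0' && dir[len] != '/' && dir[len - 1] != '/')
			continue;
		if (best < 0 || len > best_len) {
			best = (int)i;
			best_len = len;
		}
	}
	return best;
}
```

The boundary check keeps a directory such as /dev/hugepages2 from being
matched against a mountpoint at /dev/hugepages.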

Signed-off-by: John Levon <john.levon@nutanix.com>
Acked-by: Dmitry Kozlyuk <dmitry.kozliuk@gmail.com>
Reviewed-by: David Marchand <david.marchand@redhat.com>
2021-10-12 21:07:46 +02:00
Dmitry Kozlyuk
d47dd94162 eal/windows: do not install virt2phys header
The header was not intended to be a public one.
DPDK users should use `rte_mem_virt2iova()` to translate addresses.
Other virt2phys users should use the header from the driver instead.

Fixes: 2a5d547a4a ("eal/windows: implement basic memory management")
Cc: stable@dpdk.org

Signed-off-by: Dmitry Kozlyuk <dmitry.kozliuk@gmail.com>
2021-10-11 21:17:12 +02:00
Narcisa Vasile
694c81721e eal/windows: fix CPU cores counting
On Windows, -l/--lcores EAL option was unable to process CPU sets
containing CPUs other than 0 and 1, because CPU_COUNT() macro
only checked these CPUs in the set. Fix CPU_COUNT() by enumerating
all possible CPU indices.
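
The idea of the fix can be sketched with a simple bit set. The type, the
capacity and the function names below are hypothetical and do not match the
Windows EAL definitions; the point is only that the count visits every
index.

```c
#include <assert.h>
#include <stdint.h>

#define SKETCH_CPU_SETSIZE 256u /* hypothetical capacity, in CPUs */

/* Hypothetical CPU set: one bit per CPU index. */
typedef struct {
	uint64_t bits[SKETCH_CPU_SETSIZE / 64];
} sketch_cpuset;

static void
sketch_cpu_set(unsigned int cpu, sketch_cpuset *set)
{
	assert(cpu < SKETCH_CPU_SETSIZE);
	set->bits[cpu / 64] |= UINT64_C(1) << (cpu % 64);
}

static int
sketch_cpu_isset(unsigned int cpu, const sketch_cpuset *set)
{
	assert(cpu < SKETCH_CPU_SETSIZE);
	return (int)((set->bits[cpu / 64] >> (cpu % 64)) & 1);
}

/* Count set CPUs by visiting every possible index, not just 0 and 1. */
static unsigned int
sketch_cpu_count(const sketch_cpuset *set)
{
	unsigned int cpu, count = 0;

	for (cpu = 0; cpu < SKETCH_CPU_SETSIZE; cpu++)
		count += (unsigned int)sketch_cpu_isset(cpu, set);
	return count;
}
```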

Fixes: e8428a9d89 ("eal/windows: add some basic functions and macros")
Cc: stable@dpdk.org

Signed-off-by: Narcisa Vasile <navasile@microsoft.com>
Acked-by: Dmitry Kozlyuk <dmitry.kozliuk@gmail.com>
Acked-by: Pallavi Kadam <pallavi.kadam@intel.com>
2021-10-11 18:52:56 +02:00
Bruce Richardson
47a4f2650c eal/freebsd: lock memory device to prevent conflicts
Only a single DPDK process on the system can be using the /dev/contigmem
mappings at a time, but this was never explicitly enforced, e.g. when
using --in-memory flag on two processes. To prevent possible conflict
issues, we lock the dev node when it's in use, preventing other DPDK
processes from starting up and causing problems for us.

Fixes: 764bf26873 ("add FreeBSD support")
Cc: stable@dpdk.org

Signed-off-by: Bruce Richardson <bruce.richardson@intel.com>
Reviewed-by: Anatoly Burakov <anatoly.burakov@intel.com>
2021-10-02 16:30:16 +02:00
Ivan Malov
8cfad59e29 log: promote some function to stable
This one is probably mature enough to be marked as stable.

Signed-off-by: Ivan Malov <ivan.malov@oktetlabs.ru>
Reviewed-by: Andrew Rybchenko <andrew.rybchenko@oktetlabs.ru>
Acked-by: Ray Kinsella <mdr@ashroe.eu>
2021-10-02 11:12:32 +02:00
Mattias Rönnblom
15a1e00a65 eal: promote random generator with upper bound to stable
Remove experimental tag from rte_rand_max().

Signed-off-by: Mattias Rönnblom <mattias.ronnblom@ericsson.com>
Acked-by: Ray Kinsella <mdr@ashroe.eu>
2021-10-02 11:12:19 +02:00
William Tu
f1f6ebc0ea eal: remove sys/queue.h from public headers
Currently there are some public headers that include 'sys/queue.h', which
is not POSIX, but usually provided by the Linux/BSD system library.
(Not in POSIX.1, POSIX.1-2001, or POSIX.1-2008. Present on the BSDs.)
The file is missing on Windows. During the Windows build, DPDK uses a
bundled copy, so building a DPDK library works fine.  But when OVS or other
applications use DPDK as a library, because some DPDK public headers
include 'sys/queue.h', on Windows, it triggers an error due to no such
file.

One solution is to install the 'lib/eal/windows/include/sys/queue.h' into
Windows environment, such as [1]. However, this means DPDK exports the
functionalities of 'sys/queue.h' into the environment, which might cause
symbols, macros, headers clashing with other applications.

The patch fixes it by removing the "#include <sys/queue.h>" from
DPDK public headers, so programs including DPDK headers don't depend
on the system to provide 'sys/queue.h'. When these public headers use
macros such as TAILQ_xxx, we replace it by the ones with RTE_ prefix.
For Windows, we copy the definitions from <sys/queue.h> to rte_os.h
in Windows EAL. Note that these RTE_ macros are compatible with
<sys/queue.h>, both at the level of API (to use with <sys/queue.h>
macros in C files) and ABI (to avoid breaking it).

Additionally, the TAILQ_FOREACH_SAFE is not part of <sys/queue.h>,
the patch replaces it with RTE_TAILQ_FOREACH_SAFE.
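
A removal-safe iteration macro can be sketched in a C file on top of
<sys/queue.h> as below. SKETCH_TAILQ_FOREACH_SAFE, struct item and
drop_odd() are hypothetical names for this illustration; this is not a copy
of the DPDK macro.

```c
#include <stdlib.h>
#include <sys/queue.h>

/* Hypothetical macro: save the next element before the loop body runs. */
#define SKETCH_TAILQ_FOREACH_SAFE(var, head, field, tvar) \
	for ((var) = TAILQ_FIRST(head); \
	     (var) != NULL && ((tvar) = TAILQ_NEXT(var, field), 1); \
	     (var) = (tvar))

struct item {
	int value;
	TAILQ_ENTRY(item) next;
};
TAILQ_HEAD(item_list, item);

/* Remove and free every element with an odd value; return how many. */
static int
drop_odd(struct item_list *list)
{
	struct item *it, *tmp;
	int dropped = 0;

	SKETCH_TAILQ_FOREACH_SAFE(it, list, next, tmp) {
		if (it->value % 2 != 0) {
			TAILQ_REMOVE(list, it, next);
			free(it);
			dropped++;
		}
	}
	return dropped;
}
```

Because the successor is read before the body runs, the body may remove and
free the current element without breaking the traversal.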

[1] http://mails.dpdk.org/archives/dev/2021-August/216304.html

Suggested-by: Nick Connolly <nick.connolly@mayadata.io>
Suggested-by: Dmitry Kozlyuk <dmitry.kozliuk@gmail.com>
Signed-off-by: William Tu <u9012063@gmail.com>
Acked-by: Dmitry Kozlyuk <dmitry.kozliuk@gmail.com>
Acked-by: Narcisa Vasile <navasile@linux.microsoft.com>
2021-10-01 13:09:43 +02:00
Dmitry Kozlyuk
6787d0af94 lib: remove sched.h from public headers
Public headers including POSIX-specific <sched.h> were unusable
on Windows. These includes were superfluous, remove them.

Signed-off-by: Dmitry Kozlyuk <dmitry.kozliuk@gmail.com>
Acked-by: William Tu <u9012063@gmail.com>
2021-10-01 08:35:05 +02:00