Use rte_pktmbuf_free_bulk instead of loop when freeing
packets.
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
Acked-by: Morten Brørup <mb@smartsharesystems.com>
When we're attaching fbarrays in secondary processes, we check for
whether the intended memory address for the fbarray is already in use by
some other, local fbarray. However, the check for end-overlap (i.e. to
see if our memory area's end overlaps with some other fbarray) is
incorrectly counting end offset as part of the overlap. Fix the check.
Fixes: 5b61c62cfd ("fbarray: add internal tailq for mapped areas")
Cc: stable@dpdk.org
Signed-off-by: Anatoly Burakov <anatoly.burakov@intel.com>
Tested-by: Zhihong Peng <zhihongx.peng@intel.com>
When no hugepages are found, we log a message about it, but we never
specify on which node. We also implicitly declare the page size based
on the directory name, but that's not very user friendly.
Fix both by changing the text of the message to note the NUMA node (if
applicable) and explicitly mention page size in kilobytes.
Signed-off-by: Anatoly Burakov <anatoly.burakov@intel.com>
Reviewed-by: David Marchand <david.marchand@redhat.com>
Add a simple API to allow getting the monitor conditions for
power-optimized monitoring of the Rx queues from the PMD, as well as
release notes information.
Signed-off-by: Liang Ma <liang.j.ma@intel.com>
Signed-off-by: Anatoly Burakov <anatoly.burakov@intel.com>
Acked-by: Andrew Rybchenko <andrew.rybchenko@oktetlabs.ru>
Now that we have everything in a C file, we can store the information
about our sleep, and have a native mechanism to wake up the sleeping
core. This mechanism would however only wake up a core that's sleeping
while monitoring - waking up from `rte_power_pause` won't work.
Signed-off-by: Anatoly Burakov <anatoly.burakov@intel.com>
Acked-by: Konstantin Ananyev <konstantin.ananyev@intel.com>
Currently, the "sync" version of power monitor intrinsic is supposed to
be used for purposes of waking up a sleeping core. However, there are
better ways to achieve the same result, so remove the unneeded function.
Signed-off-by: Anatoly Burakov <anatoly.burakov@intel.com>
Acked-by: Konstantin Ananyev <konstantin.ananyev@intel.com>
Instead of passing around pointers and integers, collect everything
into struct. This makes API design around these intrinsics much easier.
Signed-off-by: Anatoly Burakov <anatoly.burakov@intel.com>
Acked-by: Konstantin Ananyev <konstantin.ananyev@intel.com>
Currently, the API documentation mandates that if the user wants to use
the power management intrinsics, they need to call the
`rte_cpu_get_intrinsics_support` API and check support for specific
intrinsics.
However, if the user does not do that, it is possible to get illegal
instruction error because we're using raw instruction opcodes, which may
or may not be supported at runtime.
Now that we have everything in a C file, we can check for support at
startup and prevent the user from possibly encountering illegal
instruction errors.
We also add return values to the API's as well, because why not.
Signed-off-by: Anatoly Burakov <anatoly.burakov@intel.com>
Acked-by: Konstantin Ananyev <konstantin.ananyev@intel.com>
Currently, power intrinsics are inline functions. Make them part of the
ABI so that we can have various internal data associated with them
without exposing said data to the outside world.
Signed-off-by: Anatoly Burakov <anatoly.burakov@intel.com>
Acked-by: Konstantin Ananyev <konstantin.ananyev@intel.com>
The librte_cfgfile lib is functional on Windows.
Enable compilation of this lib for Windows.
Signed-off-by: Narcisa Vasile <navasile@linux.microsoft.com>
Currently bitmap line not empty check API assumes cache line
of 64B and only checks 8 slabs. Since in 128B cacheline, we
have 16 slabs per cacheline, rte_bitmap_clear() will mark
complete line as empty as soon as 8 slabs are empty thereby
breaking bitmap scan functionality. Fix it by defining new
__rte_bitmap_line_not_empty() for 128B cacheline platform.
Signed-off-by: Nithin Dabilpuram <ndabilpuram@marvell.com>
Acked-by: Cristian Dumitrescu <cristian.dumitrescu@intel.com>
In some cases, a device or infrastructure may want to enable hotplug
but application may also try and start hotplug as well. Therefore
change the monitor_started from a boolean into a reference count.
Signed-off-by: Long Li <longli@microsoft.com>
Explicitly cast void * to type * so that EAL headers may be compiled
as C or C++.
Fixes: e8428a9d89 ("eal/windows: add some basic functions and macros")
Cc: stable@dpdk.org
Signed-off-by: Tyler Retzlaff <roretzla@linux.microsoft.com>
Currently, when rte_service_init() fails at initialization, the
application always gets a ENOEXEC error code. For example, with testpmd,
this is displayed as:
Cannot init EAL: Exec format error
This error code does not describe the real issue. Instead, use the error
code returned by the function.
Fixes: e398245008 ("service: initialize with EAL")
Cc: stable@dpdk.org
Signed-off-by: Olivier Matz <olivier.matz@6wind.com>
Acked-by: Harry van Haaren <harry.van.haaren@intel.com>
In some situations, we would get several ip fragments, which total
data length is less than min_ip_len(64) and padding with zeros.
We simulated intermediate fragments by modifying the MTU.
To illustrate the problem, we simplify the packet format and
ignore the impact of the packet header.In namespace2,
a packet whose data length is 1520 is sent.
When the packet passes tap2, the packet is divided into two
fragments: fragment A and B, similar to (1520 = 1510 + 10).
When the packet passes tap3, the larger fragment packet A is
divided into two fragments A1 and A2, similar to (1510 = 1500 + 10).
Finally, the bond interface receives three fragments:
A1, A2, and B (1520 = 1500 + 10 + 10).
One fragmented packet A2 is smaller than the minimum Ethernet
frame length, so it needs to be padded.
|---------------------------------------------------|
| HOST |
| |--------------| |----------------------------| |
| | ns2 | | |--------------| | |
| | |--------| | | |--------| |--------| | |
| | | tap1 | | | | tap2 | ns1| tap3 | | |
| | |mtu=1510| | | |mtu=1510| |mtu=1500| | |
| |--|1.1.1.1 |--| |--|1.1.1.2 |----|2.1.1.1 |--| |
| |--------| |--------| |--------| |
| | | | |
| |-----------------| | |
| | |
| |--------| |
| | bond | |
|--------------------------------------|mtu=1500|---|
|--------|
When processing the preceding packets above,
DPDK would aggregate fragmented packets A2 and B.
And error packets are generated, which padding(zero)
is displayed in the middle of the packet.
A2 + B:
0000 fa 16 3e 9f fb 82 fa 47 b2 57 dc 20 08 00 45 00
0010 00 33 b4 66 00 ba 3f 01 c1 a5 01 01 01 01 02 01
0020 01 02 c0 c1 c2 c3 c4 c5 c6 c7 00 00 00 00 00 00
0030 00 00 00 00 00 00 00 00 00 00 00 00 c8 c9 ca cb
0040 cc cd ce cf d0 d1 d2 d3 d4 d5 d6 d7 d8 d9 da db
0050 dc dd de df e0 e1 e2 e3 e4 e5 e6
So, we would calculate the length of padding, and remove
the padding in pkt_len and data_len before aggregation.
And also we have the fix for both ipv4 and ipv6.
Fixes: 7f0983ee33 ("ip_frag: check fragment length of incoming packet")
Cc: stable@dpdk.org
Signed-off-by: Yicai Lu <luyicai@huawei.com>
Acked-by: Konstantin Ananyev <konstantin.ananyev@intel.com>
As most NICs do not support segmentation for VXLAN-encapsulated
UDP/IPv4 packets, this patch adds VXLAN UDP/IPv4 GSO support.
Signed-off-by: Yi Yang <yangyi01@inspur.com>
Acked-by: Jiayu Hu <jiayu.hu@intel.com>
The file rte_random.c is required to build i40e PMD on Windows.
Add rte_rand variable to export file.
Redefine _m_prefetchw for Clang toolchain due to following error
with respect to conflicting types:
FAILED: lib/76b5a35@@rte_eal@sta/librte_eal_common_rte_random.c.obj
clang @lib/76b5a35@@rte_eal@sta/librte_eal_common_rte_random.c.obj.rsp
In file included from ../lib/librte_eal/common/rte_random.c:13:
In file included from ..\lib/librte_eal/include\rte_eal.h:20:
In file included from ..\lib/librte_eal/include\rte_per_lcore.h:25:
In file included from ..\lib/librte_eal/windows/include\pthread.h:21:
In file included from ..\lib/librte_eal/windows/include\rte_windows.h:27:
In file included from C:\Program Files (x86)\Windows Kits\10\include\
10.0.18362.0\um\windows.h:171:
In file included from C:\Program Files (x86)\Windows Kits\10\include\
10.0.18362.0\shared\windef.h:24:
In file included from C:\Program Files (x86)\Windows Kits\10\include\
10.0.18362.0\shared\minwindef.h:182:
C:\Program Files (x86)\Windows Kits\10\include\10.0.18362.0\um\
winnt.h:3324:1: error: conflicting types for '_m_prefetchw'
_m_prefetchw (
^
C:\Program Files\LLVM\lib\clang\10.0.0\include\prfchwintrin.h:50:1:
note: previous definition is here
_m_prefetchw(void *__P)
^
1 error generated.
Signed-off-by: Pallavi Kadam <pallavi.kadam@intel.com>
Reviewed-by: Ranjit Menon <ranjit.menon@intel.com>
Async enqueue offloads large copies to DMA devices, and small copies
are still performed by the CPU. However, it requires users to get
enqueue completed packets by rte_vhost_poll_enqueue_completed(), even
if they are completed by the CPU when rte_vhost_submit_enqueue_burst()
returns. This design incurs extra overheads of tracking completed
pktmbufs and function calls, thus degrading performance on small packets.
This patch enhances async enqueue for small packets by enabling
rte_vhost_submit_enqueue_burst() to return completed packets.
Signed-off-by: Jiayu Hu <jiayu.hu@intel.com>
Tested-by: Yinan Wang <yinan.wang@intel.com>
Reviewed-by: Maxime Coquelin <maxime.coquelin@redhat.com>
This patch removes unnecessary check and function calls, and it changes
appropriate types for internal variables and fixes typos.
Signed-off-by: Jiayu Hu <jiayu.hu@intel.com>
Tested-by: Yinan Wang <yinan.wang@intel.com>
Reviewed-by: Maxime Coquelin <maxime.coquelin@redhat.com>
Added new path to do lpm4 lookup by using scalable vector extension.
The SVE path will be selected if compiler has flag SVE set.
Signed-off-by: Ruifeng Wang <ruifeng.wang@arm.com>
Acked-by: Vladimir Medvedkin <vladimir.medvedkin@intel.com>
rte_lpm_lookupx4 could return wrong next hop when more than 256 tbl8
groups are created. This is caused by incorrect type casting of tbl8
group index that been stored in tbl24 entry. The casting caused group
index truncation and hence wrong tbl8 group been searched.
Issue fixed by applying proper mask to tbl24 entry to get tbl8 group index.
Fixes: dc81ebbaca ("lpm: extend IPv4 next hop field")
Fixes: cbc2f1dccf ("lpm/arm: support NEON")
Fixes: d2cc795934 ("lpm: add AltiVec for ppc64")
Cc: stable@dpdk.org
Signed-off-by: Ruifeng Wang <ruifeng.wang@arm.com>
Tested-by: David Christensen <drc@linux.vnet.ibm.com>
Acked-by: Vladimir Medvedkin <vladimir.medvedkin@intel.com>
Improved performance for AVX512 FIB6 lookup by doubling the number
of flows being processed
Signed-off-by: Vladimir Medvedkin <vladimir.medvedkin@intel.com>
Acked-by: Konstantin Ananyev <konstantin.ananyev@intel.com>
Acked-by: Ray Kinsella <mdr@ashroe.eu>
When scanning a buffer it is possible that the scan will abort
due to some internal resource limit.
This commit adds such response flag, so application can handle such cases.
Signed-off-by: Francis Kelly <fkelly@nvidia.com>
Signed-off-by: Ori Kam <orika@nvidia.com>
Add support for TLS functionality in EAL.
The following functions are added:
rte_thread_tls_key_create - create a TLS data key.
rte_thread_tls_key_delete - delete a TLS data key.
rte_thread_tls_value_set - set value bound to the TLS key
rte_thread_tls_value_get - get value bound to the TLS key
TLS key is defined by the new type rte_tls_key.
The API allocates the thread local storage (TLS) key.
Any thread of the process can subsequently use this key
to store and retrieve values that are local to the thread.
Those functions are added in addition to TLS capability
in rte_per_lcore.h to allow abstraction of the pthread
layer for all operating systems.
Windows implementation is under librte_eal/windows and
implemented using WIN32 API for Windows only.
Unix implementation is under librte_eal/unix and
implemented using pthread for UNIX compilation.
Signed-off-by: Tal Shnaiderman <talshn@nvidia.com>
Acked-by: Dmitry Kozlyuk <dmitry.kozliuk@gmail.com>
Acked-by: Thomas Monjalon <thomas@monjalon.net>
Move the definition of the functions
rte_thread_set_affinity and rte_thread_get_affinity
to new file, rte_thread.h
The file will implement generic threading functionality
and will only host threading functions which do not reference
pthread API.
Signed-off-by: Tal Shnaiderman <talshn@nvidia.com>
Signed-off-by: Thomas Monjalon <thomas@monjalon.net>
This patch moves memory region mmaping and related
preparation in a dedicated function in order to simplify
VHOST_USER_SET_MEM_TABLE request handling function.
Signed-off-by: Maxime Coquelin <maxime.coquelin@redhat.com>
Reviewed-by: Chenbo Xia <chenbo.xia@intel.com>
This patch moves the registration of postcopy to a
dedicated function, with the goal of simplifying
VHOST_USER_SET_MEM_TABLE request handling function.
Signed-off-by: Maxime Coquelin <maxime.coquelin@redhat.com>
Reviewed-by: Chenbo Xia <chenbo.xia@intel.com>
This patch moves the registration of memory regions to
userfaultfd to a dedicated function, with the goal of
simplifying VHOST_USER_SET_MEM_TABLE request handling
function.
Signed-off-by: Maxime Coquelin <maxime.coquelin@redhat.com>
Reviewed-by: Chenbo Xia <chenbo.xia@intel.com>
Simply replace the smp barriers with atomic thread fence for vhost control
path, if there are no synchronization points.
Signed-off-by: Joyce Kong <joyce.kong@arm.com>
Reviewed-by: Ruifeng Wang <ruifeng.wang@arm.com>
Reviewed-by: Maxime Coquelin <maxime.coquelin@redhat.com>
Simply replace smp barriers with atomic thread fence for
virtio packed vring.
Signed-off-by: Joyce Kong <joyce.kong@arm.com>
Reviewed-by: Ruifeng Wang <ruifeng.wang@arm.com>
Reviewed-by: Maxime Coquelin <maxime.coquelin@redhat.com>
Used idx can be synchronized by one-way barrier instead of full
write barrier for split vring.
Signed-off-by: Joyce Kong <joyce.kong@arm.com>
Reviewed-by: Ruifeng Wang <ruifeng.wang@arm.com>
Reviewed-by: Maxime Coquelin <maxime.coquelin@redhat.com>
Relax the full read barrier to one-way barrier for desc flags in
packed vring.
Signed-off-by: Joyce Kong <joyce.kong@arm.com>
Reviewed-by: Ruifeng Wang <ruifeng.wang@arm.com>
Reviewed-by: Maxime Coquelin <maxime.coquelin@redhat.com>
The ordering between avail index and desc reads has been enforced
by load-acquire for split vring, so smp_rmb barrier is not needed
behind it.
Signed-off-by: Joyce Kong <joyce.kong@arm.com>
Reviewed-by: Ruifeng Wang <ruifeng.wang@arm.com>
Reviewed-by: Maxime Coquelin <maxime.coquelin@redhat.com>
As function desc_is_avail performs a load-acquire barrier to
enforce the ordering between desc flags and desc content, it is
unnecessary to add a rte_smp_rmb barrier around the trace which
follows desc_is_avail.
Signed-off-by: Joyce Kong <joyce.kong@arm.com>
Reviewed-by: Ruifeng Wang <ruifeng.wang@arm.com>
Reviewed-by: Maxime Coquelin <maxime.coquelin@redhat.com>
Use rte_atomic_thread_fence wrapper which has been provided for
__atomic_thread_fence builtins to support optimized code for
__ATOMIC_SEQ_CST memory order on x86 platforms.
Signed-off-by: Joyce Kong <joyce.kong@arm.com>
Reviewed-by: Honnappa Nagarahalli <honnappa.nagarahalli@arm.com>
The header was missing the extern "C" directive which causes name
mangling of functions by C++ compilers, leading to linker errors
complaining of undefined references to these functions.
Fixes: 4958ca3a44 ("mbuf: support dynamic fields and flags")
Cc: stable@dpdk.org
Signed-off-by: Ashish Sadanandan <ashish.sadanandan@gmail.com>
Acked-by: Olivier Matz <olivier.matz@6wind.com>
The "rev->epdata.event" assigned to "events.epdata.event" directly, which
was wrong in case of epoll events. It should be set to the "evs.events".
Fixes: 9efe9c6cdc ("eal/linux: add epoll wrappers")
Cc: stable@dpdk.org
Signed-off-by: Yunjian Wang <wangyunjian@huawei.com>
Acked-by: Harman Kalra <hkalra@marvell.com>
According to GCC documentation for __builtin_clz:
Returns the number of leading 0-bits in x,
starting at the most significant bit position.
If x is 0, the result is undefined.
__builtin_clz will be called with 0 if the existing
prefix address matches the one we want to insert.
Fixes: 5a5793a5ff ("rib: add RIB library")
Cc: stable@dpdk.org
Reported-by: David Marchand <david.marchand@redhat.com>
Signed-off-by: Vladimir Medvedkin <vladimir.medvedkin@intel.com>
When building with clang (11.0,--buildtype=debug), eal_lcore.c
produces a -Wformat-nonliteral warning from the vfprintf call
in log_early.
Add __rte_format_printf annotation.
Fixes: b8a36b0866 ("eal/windows: improve CPU and NUMA node detection")
Cc: stable@dpdk.org
Suggested-by: Dmitry Kozlyuk <dmitry.kozliuk@gmail.com>
Signed-off-by: Nick Connolly <nick.connolly@mayadata.io>
Acked-by: Dmitry Kozlyuk <dmitry.kozliuk@gmail.com>
Acked-by: Pallavi Kadam <pallavi.kadam@intel.com>
Compiling with MinGW in --buildtype=debug produces a redefinition
error for strncasecmp.
The root cause is that rte_os.h shouldn't be injecting POSIX definitions
into the environment. It is the applications responsibility to decide
how to handle missing functionality.
Resolving this properly will require further work, but in the meantime
wrap all such definitions with #ifndef/#endif. This resolves the specific
issue with strncasecmp and handles similar issues that applications may
encounter.
Fixes: e8428a9d89 ("eal/windows: add some basic functions and macros")
Cc: stable@dpdk.org
Reported-by: David Marchand <david.marchand@redhat.com>
Signed-off-by: Nick Connolly <nick.connolly@mayadata.io>
MinGW-w64 above 8.0.0 exposes VirtualAlloc2() API in headers, but lacks
it in import libraries. Hence, availability of this API at compile-time
can't be used to choose between locating VirtualAlloc2() manually or
relying on the dynamic linker.
Fix redefinition compile-time errors.
Always link VirtualAlloc2() when using GCC.
Fixes: 2a5d547a4a ("eal/windows: implement basic memory management")
Cc: stable@dpdk.org
Reported-by: Thomas Monjalon <thomas@monjalon.net>
Signed-off-by: Dmitry Kozlyuk <dmitry.kozliuk@gmail.com>
Tested-by: Thomas Monjalon <thomas@monjalon.net>
A new generic shared actions API may be used to create shared
counter. There is no point to keep duplicate COUNT action specific
capability to create shared counters.
Signed-off-by: Andrew Rybchenko <arybchenko@solarflare.com>
Reviewed-by: Ferruh Yigit <ferruh.yigit@intel.com>
Acked-by: Ajit Khaparde <ajit.khaparde@broadcom.com>
Acked-by: Ori Kam <orika@nvidia.com>
Acked-by: Bruce Richardson <bruce.richardson@intel.com>
GCC build with '-O0' on platforms with RTE_ARM_FEATURE_ATOMICS set
failed for:
../lib/librte_efd/rte_efd.c
Assembler messages:
3866: Error: selected processor does not support `crc32cb w0,w0,w1'
3890: Error: selected processor does not support `crc32ch w0,w0,w1'
3914: Error: selected processor does not support `crc32cw w0,w0,w1'
3938: Error: selected processor does not support `crc32cx w0,w0,x1'
This was caused by an architecture specifier added for Clang.
Unlike Clang, GCC considers each inline assembly block to be dependent
and therefore, the architecture specifier impacts assemble of some
blocks require certain extension support.
Removed the architecture for GCC to fix the issue.
Fixes: 8fce34cd0a ("eal/arm: fix clang build of native target")
Cc: stable@dpdk.org
Reported-by: Feifei Wang <feifei.wang2@arm.com>
Signed-off-by: Ruifeng Wang <ruifeng.wang@arm.com>
Acked-by: Jerin Jacob <jerinj@marvell.com>
Commit 49b536fc30 ("eal: load only shared libs from driver plugin directories")
introduced a check that any shared library must ends with .so, but it can't
work, at least, on Fedora/CentOS/RHEL since .so symlinks are not installed
when you install dpdk package, but only when you install dpdk-devel package.
This commit adds also a check for .so.ABI_VERSION to check for shared lib.
See Fedora Packaging Guidelines for more information:
https://docs.fedoraproject.org/en-US/packaging-guidelines/#_devel_packages
Fixes: 49b536fc30 ("eal: load only shared libs from driver plugin directories")
Cc: stable@dpdk.org
Signed-off-by: Timothy Redaelli <tredaelli@redhat.com>
Acked-by: Bruce Richardson <bruce.richardson@intel.com>
Acked-by: David Marchand <david.marchand@redhat.com>
Acked-by: Maxime Coquelin <maxime.coquelin@redhat.com>
Tested-by: Ferruh Yigit <ferruh.yigit@intel.com>
Tested-by: Ali Alnubani <alialnu@nvidia.com>
Commit 06c7871dde ("eal: restrict default plugin path to shared lib mode")
introduced a check that enabled shared lib mode when librte_eal.so can
be loaded, but it can't work, at least, on Fedora/CentOS/RHEL since .so
symlinks are not installed when you install dpdk package, but only when
you install dpdk-devel package.
This commit uses librte_eal.so.ABI_VERSION to check for shared lib,
since it exists on any linux distributions.
See Fedora Packaging Guidelines for more information:
https://docs.fedoraproject.org/en-US/packaging-guidelines/#_devel_packages
Fixes: 06c7871dde ("eal: restrict default plugin path to shared lib mode")
Cc: stable@dpdk.org
Signed-off-by: Timothy Redaelli <tredaelli@redhat.com>
Acked-by: Bruce Richardson <bruce.richardson@intel.com>
Acked-by: David Marchand <david.marchand@redhat.com>
Acked-by: Maxime Coquelin <maxime.coquelin@redhat.com>
Tested-by: Ali Alnubani <alialnu@nvidia.com>
The initialization me->locked=1 in lock() must happen before
next->locked=0 in unlock(), otherwise a thread may hang forever,
waiting me->locked become 0. On weak memory systems (such as ARMv8),
the current implementation allows me->locked=1 to be reordered with
announcing the node (pred->next=me) and, consequently, to be
reordered with next->locked=0 in unlock().
This fix adds a release barrier to pred->next=me, forcing
me->locked=1 to happen before this operation.
Fixes: 2173f3333b ("mcslock: add MCS queued lock implementation")
Cc: stable@dpdk.org
Signed-off-by: Diogo Behrens <diogo.behrens@huawei.com>
Acked-by: Honnappa Nagarahalli <honnappa.nagarahalli@arm.com>