In the virtio blk vDPA live migration use case, before the live
migration process, QEMU will set call fd to vDPA back-end. QEMU
and vDPA back-end stand by until live migration starts.
During live migration process, QEMU sets kick fd and a new call
fd. However, after the kick fd is set to the vDPA back-end, the
vDPA back-end configures device and data path starts. The new
call fd will cause some kind of "re-configuration", this kind
of "re-configuration" cause IO drop.
After this patch, vDPA back-end configures device after kick fd
and call fd are well set and make sure no IO drops.
This patch only impact virtio blk vDPA device and does not impact
net device.
Fixes: 7015b6577178 ("vdpa/ifc: add block device SW live-migration")
Signed-off-by: Andy Pei <andy.pei@intel.com>
Reviewed-by: Maxime Coquelin <maxime.coquelin@redhat.com>
Add spinlock protection in queue delete function.
This protects the data path while the queue delete operation
is in progress.
Fixes: a3bbf2e09756 ("eventdev: add eth Tx adapter implementation")
Cc: stable@dpdk.org
Signed-off-by: Naga Harish K S V <s.v.naga.harish.k@intel.com>
Acked-by: Jay Jayatheerthan <jay.jayatheerthan@intel.com>
These are functions related to interrupts that have been
in since 20.02 release or earlier.
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
Acked-by: Ray Kinsella <mdr@ashroe.eu>
These API's have been around for a long time and by now are fixed.
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
Acked-by: Ray Kinsella <mdr@ashroe.eu>
The RTE_LOG_REGISTER is not experimental, and the experimental
tag was never enforced on these.
Make rte_log_can_log a fully supported function.
It was introduced nearly 2yrs ago.
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
Acked-by: Ray Kinsella <mdr@ashroe.eu>
The comments in rte_rib6 were cut-and-pasted from rte_rib
and because of that some references to rte_rib_node were
not updated.
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
Acked-by: Vladimir Medvedkin <vladimir.medvedkin@intel.com>
Found by nullfree.cocci.
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
[David: for lpm parts:]
Reviewed-by: Ruifeng Wang <ruifeng.wang@arm.com>
Acked-by: Vladimir Medvedkin <vladimir.medvedkin@intel.com>
[David: for vdpa/mlx5 parts:]
Acked-by: Matan Azrad <matan@nvidia.com>
[David: for dma/dpaa2, raw/ifpga, vdpa/mlx5:]
Acked-by: Tyler Retzlaff <roretzla@linux.microsoft.com>
Reviewed-by: Chengwen Feng <fengchengwen@huawei.com>
[David: reran cocci.sh and updated common/mlx5 and cryptodev asym test]
Signed-off-by: David Marchand <david.marchand@redhat.com>
Make sure all functions which use the convention that XXX_free(NULL)
is a nop are all documented.
The wording is chosen to match the documentation of free(3).
"If ptr is NULL, no operation is performed."
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
Acked-by: Chengwen Feng <fengchengwen@huawei.com>
[David: squashed with other series updates, unified wording]
Remove extraneous phrase "This API is used to" and use
active instead of passive voice when describing a function.
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
Acked-by: Tyler Retzlaff <roretzla@linux.microsoft.com>
[David: for raw/ioat and dmadev parts:]
Acked-by: Chengwen Feng <fengchengwen@huawei.com>
Acked-by: Conor Walsh <conor.walsh@intel.com>
Add support for using hugepages for worker lcore stack memory. The
intent is to improve performance by reducing stack memory related TLB
misses and also by using memory local to the NUMA node of each lcore.
EAL option '--huge-worker-stack[=stack-size-in-kbytes]' is added to allow
the feature to be enabled at runtime. If the size is not specified,
the system pthread stack size will be used.
Signed-off-by: Don Wallwork <donw@xsightlabs.com>
Acked-by: Morten Brørup <mb@smartsharesystems.com>
Acked-by: Chengwen Feng <fengchengwen@huawei.com>
GCC 12 raises warnings on usage of rte_memcpy with IPv4 options handling
in fragments for both the ip_frag library and unit tests.
For example in the library:
In function ‘_mm256_storeu_si256’,
inlined from ‘rte_mov32’ at
../lib/eal/x86/include/rte_memcpy.h:347:2,
inlined from ‘rte_mov128’ at
../lib/eal/x86/include/rte_memcpy.h:369:2,
inlined from ‘rte_memcpy_generic’
at ../lib/eal/x86/include/rte_memcpy.h:445:4,
inlined from ‘rte_memcpy’
at ../lib/eal/x86/include/rte_memcpy.h:851:10,
inlined from ‘__create_ipopt_frag_hdr’
at ../lib/ip_frag/rte_ipv4_fragmentation.c:68:4,
inlined from ‘rte_ipv4_fragment_packet’
at ../lib/ip_frag/rte_ipv4_fragmentation.c:242:16:
/usr/lib/gcc/x86_64-redhat-linux/12/include/avxintrin.h:935:8: error:
array subscript ‘__m256i_u[1]’ is partly outside array bounds of
‘uint8_t[60]’ {aka ‘unsigned char[60]’} [-Werror=array-bounds]
935 | *__P = __A;
| ~~~~~^~~~~
../lib/ip_frag/rte_ipv4_fragmentation.c: In function
‘rte_ipv4_fragment_packet’:
../lib/ip_frag/rte_ipv4_fragmentation.c:122:17: note: at offset [52, 60]
into object ‘ipopt_frag_hdr’ of size 60
122 | uint8_t ipopt_frag_hdr[IPV4_HDR_MAX_LEN];
| ^~~~~~~~~~~~~~
To resolve the compilation warning, replace the rte_memcpy with memcpy.
Fixes: b50a14a853aa ("ip_frag: add IPv4 options fragment")
Signed-off-by: Huichao Cai <chcchc88@163.com>
If called to allocate memory of size is between multiple of hugepage
size minus malloc_header_len and hugepage size, rte_malloc fails.
This fix replaces malloc_elem_trailer_len with malloc_elem_overhead in
try_expand_heap() to include malloc_elem_header_len when calculating
n_seg.
Bugzilla ID: 800
Fixes: 07dcbfe0101f ("malloc: support multiprocess memory hotplug")
Cc: stable@dpdk.org
Signed-off-by: Fidaullah Noonari <fidaullah.noonari@emumba.com>
Acked-by: Dmitry Kozlyuk <dmitry.kozliuk@gmail.com>
rte_dump_stack() needs to be usable in situations when a bug is
encountered and from signal handlers (such as SEGV).
Glibc backtrace_symbols() calls malloc which makes it
dangerous in a signal handler that is handling errors that maybe
due to memory corruption. Additionally, rte_log() is unsafe because
syslog() is not signal safe; printf() is also documented as
not being safe.
This version formats message and uses writev for each line in a manner
similar to what glibc version of backtrace_symbols_fd() does. The
FreeBSD version of backtrace_symbols_fd() is not signal safe.
Sample output:
0: ./build/app/dpdk-testpmd (rte_dump_stack+0x2b) [560a6e9c002b]
1: ./build/app/dpdk-testpmd (main+0xad) [560a6decd5ad]
2: /lib/x86_64-linux-gnu/libc.so.6 (__libc_start_main+0xcd) [7fd43d3e27fd]
3: ./build/app/dpdk-testpmd (_start+0x2a) [560a6e83628a]
Bugzilla ID: 929
Acked-by: Morten Brørup <mb@smartsharesystems.com>
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
Reviewed-by: David Marchand <david.marchand@redhat.com>
copy_data was returning a pointer to an increased (off by one) descriptor.
Subsequent calls to copy_data in the library were then failing.
Fix this by incrementing the descriptor only if there is some left data
to copy.
Fixes: 4414bb67010d ("vhost/crypto: fix build with GCC 12")
Cc: stable@dpdk.org
Reported-by: Jakub Poczatek <jakub.poczatek@intel.com>
Signed-off-by: David Marchand <david.marchand@redhat.com>
Reviewed-by: Maxime Coquelin <maxime.coquelin@redhat.com>
Tested-by: Jakub Poczatek <jakub.poczatek@intel.com>
Acked-by: Fan Zhang <roy.fan.zhang@intel.com>
GCC 12 raises the following warning:
In file included from ../lib/mempool/rte_mempool.h:46,
from ../lib/mbuf/rte_mbuf.h:38,
from ../lib/vhost/vhost_crypto.c:7:
../lib/vhost/vhost_crypto.c: In function ‘rte_vhost_crypto_fetch_requests’:
../lib/eal/x86/include/rte_memcpy.h:371:9: warning: array subscript 1 is
outside array bounds of ‘struct virtio_crypto_op_data_req[1]’
[-Warray-bounds]
371 | rte_mov32((uint8_t *)dst + 3 * 32, (const uint8_t *)src + 3 * 32);
| ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
../lib/vhost/vhost_crypto.c:1178:42: note: while referencing ‘req’
1178 | struct virtio_crypto_op_data_req req;
| ^~~
Split this function and separate the per descriptor copy.
This makes the code clearer, and the compiler happier.
Note: logs for errors have been moved to callers to avoid duplicates.
Fixes: 3c79609fda7c ("vhost/crypto: handle virtually non-contiguous buffers")
Cc: stable@dpdk.org
Signed-off-by: David Marchand <david.marchand@redhat.com>
Reviewed-by: Maxime Coquelin <maxime.coquelin@redhat.com>
Since the commit 02798b073520 ("vhost: improve virtio-net layer logs"),
vhost logs contain the socket path as a prefix.
Async dequeue path was copied from the sync dequeue path but a log
was incorrect.
Fixes: 84d5204310d7 ("vhost: support async dequeue for split ring")
Signed-off-by: David Marchand <david.marchand@redhat.com>
Reviewed-by: Maxime Coquelin <maxime.coquelin@redhat.com>
This patchs renames the local variables free_entries to
avail_entries in the dequeue path.
Indeed, this variable represents the number of new packets
available in the Virtio transmit queue, so these entries
are actually used, not free.
Signed-off-by: Maxime Coquelin <maxime.coquelin@redhat.com>
Reviewed-by: David Marchand <david.marchand@redhat.com>
This patch implements packed ring dequeue data path
for asynchronous vhost.
Signed-off-by: Cheng Jiang <cheng1.jiang@intel.com>
Reviewed-by: Maxime Coquelin <maxime.coquelin@redhat.com>
rte_vhost_clear_queue_thread_unsafe() supports to clear
in-flight packets for async enqueue only. But after
supporting async dequeue, this API should support async dequeue too.
This patch also adds the thread-safe version of this API,
the difference between the two API is that thread safety uses lock.
These APIs maybe used to clean up packets in the async channel
to prevent packet loss when the device state changes or
when the device is destroyed.
Signed-off-by: Yuan Wang <yuanx.wang@intel.com>
Reviewed-by: Maxime Coquelin <maxime.coquelin@redhat.com>
Reviewed-by: Jiayu Hu <jiayu.hu@intel.com>
The Virtio specification requires that in case of checksum
offloading, the pseudo-header checksum must be set in the
L4 header.
When received from another Vhost-user port, the packet
checksum might already contain the pseudo-header checksum
but we have no way to know it. So we have no other choice
than doing the pseudo-header checksum systematically.
This patch handles this using the rte_net_intel_cksum_prepare()
helper.
Fixes: 859b480d5afd ("vhost: add guest offload setting")
Cc: stable@dpdk.org
Signed-off-by: Maxime Coquelin <maxime.coquelin@redhat.com>
Reviewed-by: Chenbo Xia <chenbo.xia@intel.com>
During adapter create, memory is allocated for storing event port
configuration which is freed during adapter free. The following
error is seen during free "EAL: Error: Invalid memory"
The service data pointer storage for txa_service_data_array is
allocated during adapter create with incorrect size which is less
than the required size.
Initialization of this memory causes buffer overflow and result in
metadata overwrite of event port config memory allocated above
and results in the above error message during free.
Allocating the correct size of memory for txa_service_data_array
prevents overwriting other memory areas like event port config
memory.
Fixes: a3bbf2e09756 ("eventdev: add eth Tx adapter implementation")
Cc: stable@dpdk.org
Signed-off-by: Naga Harish K S V <s.v.naga.harish.k@intel.com>
Acked-by: Jay Jayatheerthan <jay.jayatheerthan@intel.com>
Fix the UDP header fields, wrong byte order used for src and dst port
and wrong offset used when updating UDP datagram length.
Fixes: 01eef5907fc3 ("ipsec: support NAT-T")
Cc: stable@dpdk.org
Signed-off-by: Radu Nicolau <radu.nicolau@intel.com>
Acked-by: Fan Zhang <roy.fan.zhang@intel.com>
ALIGNMENT_MASK is only used internally.
Besides it lacks a DPDK-related prefix.
Hide it from external eyes.
Fixes: f5472703c0bd ("eal: optimize aligned memcpy on x86")
Cc: stable@dpdk.org
Reported-by: Morten Brørup <mb@smartsharesystems.com>
Signed-off-by: David Marchand <david.marchand@redhat.com>
The function rte_pie_drop was attempting to do a random probability
drop, but because of incorrect usage of fixed point divide
it would always return 1.
Change to use new rte_drand() instead.
Fixes: 44c730b0e379 ("sched: add PIE based congestion management")
Cc: stable@dpdk.org
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
Acked-by: Jasvinder Singh <jasvinder.singh@intel.com>
The qdelay variable is derived from and compared to 64 bit
value so it doesn't have to be floating point.
Fixes: 44c730b0e379 ("sched: add PIE based congestion management")
Cc: stable@dpdk.org
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
Acked-by: Jasvinder Singh <jasvinder.singh@intel.com>
The PIE code and other applications can benefit from having a
fast way to get a random floating point value. This new function
is equivalent to drand() in the standard library.
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
Acked-by: Ray Kinsella <mdr@ashroe.eu>
These header includes have been flagged by the iwyu_tool
and removed.
Signed-off-by: Sean Morrissey <sean.morrissey@intel.com>
Acked-by: Chenbo Xia <chenbo.xia@intel.com>
These header includes have been flagged by the iwyu_tool
and removed.
Signed-off-by: Sean Morrissey <sean.morrissey@intel.com>
Acked-by: Hemant Agrawal <hemant.agrawal@nxp.com>
Bug scenario:
1. start testpmd:
$ dpdk-testpmd -l 4-6 -a 0000:7d:00.0 --trace=.* \
--file-prefix=trace_autotest -- -i
2. then observed:
EAL: eal_trace_init():93 failed to initialize trace [File exists]
EAL: FATAL: Cannot init trace
EAL: Cannot init trace
EAL: Error - exiting with code: 1
The root cause it that the offset set wrong with long file-prefix and
then lead the strftime return failed.
At the same time, trace_session_name_generate() uses errno as the return
value, but the errno was not set if strftime returned zero.
A previously set errno (EEXIST or ENOENT from call to mkdir for creating
the runtime configuration directory) was returned in this case.
This is fragile and may lead to incorrect logic if errno was set
to 0 previously.
This also resulted in inaccurate prompting.
Set errno to ENOSPC if strftime return zero.
Fixes: 321dd5f8fa62 ("trace: add internal init and fini interface")
Cc: stable@dpdk.org
Signed-off-by: Chengwen Feng <fengchengwen@huawei.com>
Reviewed-by: David Marchand <david.marchand@redhat.com>
Acked-by: Jerin Jacob <jerinj@marvell.com>
Bug scenario:
1. start testpmd:
$ dpdk-testpmd -l 4-6 -a 0000:7d:00.0 --trace=.* -- -i
2. quit testpmd and then observed segment fault:
Bye...
Segmentation fault (core dumped)
The root cause is that rte_trace_save() and eal_trace_fini() access
the huge pages which were cleanup by rte_eal_memory_detach().
This patch moves rte_trace_save() and eal_trace_fini() before
rte_eal_memory_detach() to fix the bug.
Fixes: dfbc61a2f9a6 ("mem: detach memsegs on cleanup")
Cc: stable@dpdk.org
Signed-off-by: Chengwen Feng <fengchengwen@huawei.com>
Tested-by: Jerin Jacob <jerinj@marvell.com>
Reviewed-by: David Marchand <david.marchand@redhat.com>
The P4 language requires marking a header as valid before any of the
header fields are written as opposed to after the writes are done.
Hence, the optimization of replacing the sequence of instructions to
generate a header by reading it from the table action data with a
single DMA internal instruction are reworked from "mov all + validate
-> dma" to "validate + mov all -> dma".
Signed-off-by: Cristian Dumitrescu <cristian.dumitrescu@intel.com>
Fix comparison used to check against the maximum number of learner
table timeouts.
Fixes: e2ecc53582fb ("pipeline: improve learner table timers")
Signed-off-by: Harshad Narayane <harshad.suresh.narayane@intel.com>
Signed-off-by: Cristian Dumitrescu <cristian.dumitrescu@intel.com>
Fix segmentation fault due to null pointer dereferencing inside the
"mirror" instruction when number of mirroring slots is set to 0. This
was taking place when the "mirror" instruction was used without the
mirror feature being properly configured, i.e. the API function
rte_swx_pipeline_mirroring_config was not called at initialization.
Fixes: dac0ecd9098 ("pipeline: support packet mirroring")
Signed-off-by: Cristian Dumitrescu <cristian.dumitrescu@intel.com>
rte_xmm_t is a union type which wraps around xmm_t and maps its contents
to scalar structures. Since C++ has stricter type conversion rules than
C, the rte_xmm_t::x has to be used instead of C-casting.
The generated assembly is identical to the code without the fix (checked
both on x86 and RISC-V).
Fixes: 406937f89ffd ("lpm: add scalar version of lookupx4")
Signed-off-by: Stanislaw Kardach <kda@semihalf.com>
Reviewed-by: David Marchand <david.marchand@redhat.com>
rte_xmm_t is a union type which wraps around xmm_t and maps its contents
to scalar structures. Since C++ has stricter type conversion rules than
C, the rte_xmm_t::x has to be used instead of C-casting.
Fixes: f22e705ebf12 ("eal/riscv: support RISC-V architecture")
Signed-off-by: Stanislaw Kardach <kda@semihalf.com>
Reviewed-by: David Marchand <david.marchand@redhat.com>