As stated in the manual, the pthread_attr_init() return value should be
checked.
Besides, a pthread_attr_t should be destroyed once it is no longer used.
In practice, there may be no leak (from what I read in the current glibc
code), but this may change in the future.
Stick to a correct use of the API.
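A minimal sketch of the intended pattern (error handling simplified; the
thread start routine and its argument are placeholders):

    pthread_attr_t attr;
    pthread_t tid;
    int ret;

    ret = pthread_attr_init(&attr);      /* check the return value */
    if (ret != 0)
        return ret;
    ret = pthread_create(&tid, &attr, thread_start, NULL);
    pthread_attr_destroy(&attr);         /* destroy the attr once unused */
    if (ret != 0)
        return ret;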
Fixes: 5cf3fd3af4df ("vdpa/mlx5: add CPU core parameter to bind polling thread")
Cc: stable@dpdk.org
Signed-off-by: David Marchand <david.marchand@redhat.com>
Reviewed-by: Maxime Coquelin <maxime.coquelin@redhat.com>
Acked-by: Matan Azrad <matan@nvidia.com>
To help encourage use of virtio-user in place of KNI, put a reference to
the relevant howto section at the top of the KNI doc.
Signed-off-by: Bruce Richardson <bruce.richardson@intel.com>
Acked-by: Chenbo Xia <chenbo.xia@intel.com>
Reviewed-by: Maxime Coquelin <maxime.coquelin@redhat.com>
The HOWTO guide for using virtio-user as an exception path to the kernel
only provided an example of how testpmd may be used for that purpose.
However, a real application wanting to use virtio-user as an exception path
would likely want to create such devices from code within the app
itself. Therefore, we update the doc with instructions and a code
snippet showing how this may be done.
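As a rough illustration (not necessarily the exact snippet added to the
doc), such a port can be created at runtime with the generic hotplug API;
the device name and devargs below are placeholders:

    #include <rte_dev.h>

    /* create a virtio-user port backed by the vhost-net kernel module */
    int ret = rte_eal_hotplug_add("vdev", "virtio_user0",
            "path=/dev/vhost-net,queues=1,queue_size=1024");
    if (ret < 0)
        return ret;   /* port creation failed */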
Signed-off-by: Bruce Richardson <bruce.richardson@intel.com>
Acked-by: Stephen Hemminger <stephen@networkplumber.org>
Acked-by: Chenbo Xia <chenbo.xia@intel.com>
Reviewed-by: Maxime Coquelin <maxime.coquelin@redhat.com>
This patch extensively reworks the howto guide on using virtio-user for
exception packets. Changes include:
* rename "exceptional path" to "exception path"
* remove references to uio and just reference vfio-pci
* simplify testpmd command-lines, giving a basic usage example first
before adding detail about checksum or TSO parameters
* give a complete working example showing traffic flowing through the
whole system from a testpmd loopback using the created TAP netdev
* replace use of "ifconfig" with Linux standard "ip" command
* other general rewording.
Signed-off-by: Bruce Richardson <bruce.richardson@intel.com>
Acked-by: Chenbo Xia <chenbo.xia@intel.com>
Reviewed-by: Maxime Coquelin <maxime.coquelin@redhat.com>
TIMER_MILLISECOND is defined as the number of CPU cycles per millisecond.
The current definition is correct only for cores with a frequency of 2 GHz.
Use the DPDK API to get the CPU frequency and to define the timer period.
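A sketch of the approach (not necessarily the exact code in the patch):
the period is derived from the measured timer frequency instead of a
hard-coded 2 GHz value.

    #include <rte_cycles.h>

    /* cycles per millisecond, based on the actual timer frequency */
    uint64_t timer_millisecond = rte_get_timer_hz() / 1000;
    /* e.g. a 10 ms timer period */
    uint64_t timer_period = 10 * timer_millisecond;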
Fixes: af75078fece3 ("first public release")
Cc: stable@dpdk.org
Signed-off-by: Raja Zidane <rzidane@nvidia.com>
Signed-off-by: Omar Awaysa <omara@nvidia.com>
Add configs for Phytium server.
Signed-off-by: Zhipeng Lu <luzhipeng@cestc.cn>
Reviewed-by: Thomas Monjalon <thomas@monjalon.net>
Acked-by: Ruifeng Wang <ruifeng.wang@arm.com>
When allocating multi-segmented buffers, if there is a remainder in the
total buffer length, the actual job length might exceed the expected
job_len.
This adds additional space to the mbuf in the multi-segment case,
to allow the remaining memory to be stored in one segment.
Fixes: c1d1b94eec58 ("app/regex: fix number of matches")
Cc: stable@dpdk.org
Signed-off-by: Raslan Darawsheh <rasland@nvidia.com>
Acked-by: Ori Kam <orika@nvidia.com>
Check that nb_jobs is not zero before using it for a division.
Fixes: f5cffb7eb7fb6 ("app/regex: read data file once at startup")
Cc: stable@dpdk.org
Signed-off-by: Thierry Herbelot <thierry.herbelot@6wind.com>
Add PPD (PCIe Port Definition) status check for SPR (Sapphire Rapids).
Note that NTB on SPR has the same device id as that on ICX, while
the field offsets of the PPD Control Register are different. Here, we use
the PCI device revision id to distinguish the HW platform (ICX/SPR)
and check the Port Config Status and Port Definition accordingly.
+---------------------------+--------------------+--------------------+
| Fields | Bit Range (on ICX) | Bit Range (on SPR) |
+---------------------------+--------------------+--------------------+
| Port Configuration Status | 12 | 14 |
| Port Definition | 9:8 | 10:8 |
+---------------------------+--------------------+--------------------+
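An illustrative sketch of the revision-based selection (variable and
macro names here are hypothetical; the bit positions come from the table
above):

    /* hypothetical: pick PPD field offsets by PCI device revision id */
    if (rev_id >= NTB_PCI_DEV_REV_SPR) {
        ppd_cfg_status_bit = 14;  /* Port Configuration Status on SPR */
        ppd_def_shift = 8;        /* Port Definition bits 10:8 on SPR */
        ppd_def_mask = 0x7;
    } else {
        ppd_cfg_status_bit = 12;  /* Port Configuration Status on ICX */
        ppd_def_shift = 8;        /* Port Definition bits 9:8 on ICX */
        ppd_def_mask = 0x3;
    }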
Signed-off-by: Junfeng Guo <junfeng.guo@intel.com>
Acked-by: Jingjing Wu <jingjing.wu@intel.com>
The 'info' struct was being declared as a NULL pointer. If a NULL
pointer is passed to 'rte_dma_info_get', EINVAL is returned and the
struct is not populated. This subsequently causes a segfault when
dereferencing 'info'.
This patch fixes the issue by simply declaring 'info' on the stack and
passing its address to 'rte_dma_info_get'.
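A minimal sketch of the corrected pattern (the device id variable is
illustrative):

    #include <rte_dmadev.h>

    struct rte_dma_info info;

    /* 'info' lives on the stack; pass its address, never a NULL pointer */
    if (rte_dma_info_get(dev_id, &info) != 0)
        return -1;
    /* the populated struct can now be used safely */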
Fixes: 9449330a8458 ("dma/idxd: create dmadev instances on PCI probe")
Cc: stable@dpdk.org
Signed-off-by: Kevin Laatz <kevin.laatz@intel.com>
Acked-by: Bruce Richardson <bruce.richardson@intel.com>
During PCI device close, any allocated memory needs to be freed.
Currently, one of the frees is being called on an incorrect idxd_dmadev
struct member, namely 'batch_idx_ring'.
At device creation, memory is allocated for both 'batch_comp_ring' and
'batch_idx_ring' simultaneously. Calling free only on 'batch_idx_ring'
meant the first half of this memory was not being freed, leading to the
memory leak.
This patch fixes the memory leak by calling free on 'batch_comp_ring',
which will free the memory for both rings.
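The layout can be pictured as follows (a hedged sketch; the allocation
call and sizes are simplified from the description above):

    /* one allocation backs both rings: comp ring first, idx ring after */
    idxd->batch_comp_ring = rte_zmalloc(NULL, comp_sz + idx_sz, 0);
    idxd->batch_idx_ring =
        (void *)((char *)idxd->batch_comp_ring + comp_sz);
    ...
    /* freeing the base pointer releases the memory of both rings */
    rte_free(idxd->batch_comp_ring);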
Fixes: 9449330a8458 ("dma/idxd: create dmadev instances on PCI probe")
Cc: stable@dpdk.org
Signed-off-by: Kevin Laatz <kevin.laatz@intel.com>
Acked-by: Bruce Richardson <bruce.richardson@intel.com>
ASAN reports a memory leak for the 'pci' pointer in the 'idxd_dmadev'
struct.
This is fixed by freeing the struct when the last queue on the PCI
device is being closed.
Fixes: 9449330a8458 ("dma/idxd: create dmadev instances on PCI probe")
Cc: stable@dpdk.org
Reported-by: Xingguang He <xingguang.he@intel.com>
Signed-off-by: Kevin Laatz <kevin.laatz@intel.com>
Acked-by: Bruce Richardson <bruce.richardson@intel.com>
EVP_PKEY functions need to be called twice for RSA sign
and verify operations in the 3.0 EVP API. The original OpenSSL
1.x routines are untouched. OPENSSL_API_COMPAT is
also removed, as the driver now supports the OpenSSL 3.0 library
as well when it is detected on the host.
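For reference, the usual OpenSSL 3.0 two-call pattern for an RSA
signature looks like this (simplified, error handling omitted):

    #include <openssl/evp.h>

    size_t siglen = 0;

    /* first call with a NULL buffer only queries the required length */
    EVP_PKEY_sign(pkey_ctx, NULL, &siglen, digest, digest_len);
    /* second call performs the actual signature */
    EVP_PKEY_sign(pkey_ctx, sig, &siglen, digest, digest_len);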
Fixes: d7bd42f6db19 ("crypto/openssl: update RSA routine with 3.0 EVP API")
Signed-off-by: Kai Ji <kai.ji@intel.com>
Currently, when running dpdk-perf-test with DOCSIS
security sessions, a segmentation fault occurs. This
is due to the check that the session is not
equal to op->sym->sec_session. The check passes the
first time but fails on the second iteration and doesn't
create the build_request.
This commit fixes that error by first getting the ctx
from the private session data and then comparing ctx,
rather than op->sym->sec_session, with the sess.
Fixes: fb3b9f492205 ("crypto/qat: rework burst data path")
Cc: stable@dpdk.org
Signed-off-by: Rebecca Troy <rebecca.troy@intel.com>
Signed-off-by: Kai Ji <kai.ji@intel.com>
A negative integrity item refers to the condition when the item value mask
is set, but the value spec is cleared:
... integrity value mask l4_ok value spec 0 ...
The ethdev library defines integrity bits `l3_ok` and `l4_ok` as accumulators
for all hardware L3 and L4 integrity verifications respectively.
Hardware `l3_ok` and `l4_ok` integrity bits refer to L3 and L4
network headers only.
Integrity bits `l3_ok` and `l4_ok` are not compatible between
ethdev library and hardware.
PMD translations for ethdev `l3_ok` are:
IPv4: `l3_ok` and `l3_csum_ok`
IPv6: `l3_ok`
ethdev `l4_ok` is translated into PMD `l4_ok` and `l4_csum_ok` bits.
A positive IPv4 `l3_ok` flow item configuration is translated into
a single matcher that ANDs the corresponding hardware bits.
A negative IPv4 `l3_ok` is translated into 2 hardware conditions, where
each condition probes a single integrity bit:
ethdev::l3_ok is 0 => MLX5::l3_ok is 0 OR MLX5::l3_csum_ok is 0
MLX5 hardware does not support an OR condition in a flow rule item.
A negative IPv4 `l3_ok` must therefore be translated into 2 flow rules.
Similarly, a negative ethdev `l4_ok` condition is also translated into 2
hardware rules.
Current PMD roadmap does not allow implicit flow rule split.
Bugzilla ID: 948
Cc: stable@dpdk.org
Suggested-by: Raja Zidane <rzidane@nvidia.com>
Signed-off-by: Gregory Etelson <getelson@nvidia.com>
Acked-by: Matan Azrad <matan@nvidia.com>
Acked-by: Viacheslav Ovsiienko <viacheslavo@nvidia.com>
The number of memory regions (MR) that MLX5 PMD can use
was limited by 512 per IB device, the size of the global MR cache
that was fixed at compile time.
The cache allows searching for an MR LKey by address efficiently;
therefore it is the last place searched on the data path
(the global MR database, which would be slow to search, is skipped).
If the application logic caused the PMD to create more than 512 MRs,
which can be the case with external memory,
those MRs would never be found on data path
and later cause a HW failure.
The cache size was fixed because at the time of overflow
the EAL memory hotplug lock may be held,
preventing allocation of a larger cache
(it must reside in DPDK memory for multi-process support).
This patch adds logic to release the necessary locks,
extend the cache, and repeat the attempt to insert new entries.
The `mlx5_mr_btree` structure had an `overflow` field
that was set when a cache (not only the global one)
could not accept new entries.
However, it was only checked for the global cache,
because the caches of the upper layers were dynamically expandable.
With the global cache size limitation removed, this field is not needed.
The cache size was previously limited by 16-bit indices.
Use the space in the structure previously occupied by the `overflow` field
to extend the indices to 32 bits.
With this patch, it is the HW and RAM that limit the number of MRs.
Fixes: 974f1e7ef146 ("net/mlx5: add new memory region support")
Cc: stable@dpdk.org
Signed-off-by: Dmitry Kozlyuk <dkozlyuk@nvidia.com>
Acked-by: Matan Azrad <matan@nvidia.com>
Add an mlx5 internal test for mapping and unmapping external RxQs.
This patch adds a runtime command to the testpmd app to test the mapping
API.
testpmd> mlx5 port (port_id) ext_rxq map (sw_queue_id) (hw_queue_id)
testpmd> mlx5 port (port_id) ext_rxq unmap (sw_queue_id)
Signed-off-by: Michael Baum <michaelba@nvidia.com>
Reviewed-by: Thomas Monjalon <thomas@monjalon.net>
Acked-by: Matan Azrad <matan@nvidia.com>
Add an mlx5 internal option in testpmd, similar to the run-time function
"port attach", which adds another parameter named "socket" for attaching
a port and adds 2 devargs beforehand.
The arguments are "cmd_fd" and "pd_handle", used to import a device
created outside the PMD. The testpmd application imports them using IPC
and updates the devargs list before attaching.
These arguments were added in
commit 9d936f4f1a5e ("common/mlx5: support remote PD and CTX").
The syntax is:
testpmd> mlx5 port attach (identifier) socket=(path)
Where "path" is the IPC socket path agreed upon with the remote process.
Signed-off-by: Michael Baum <michaelba@nvidia.com>
Reviewed-by: Thomas Monjalon <thomas@monjalon.net>
Acked-by: Matan Azrad <matan@nvidia.com>
Since firmware has added support for toggling PTP mode on 10k platforms,
userspace code should allow doing that as well.
Cc: stable@dpdk.org
Signed-off-by: Tomasz Duszynski <tduszynski@marvell.com>
Reviewed-by: Jerin Jacob <jerinj@marvell.com>
A packet with RTE_PTYPE_L4_FRAG(0x300) contains both RTE_PTYPE_L4_TCP
(0x100) & RTE_PTYPE_L4_UDP (0x200). A fragmented packet as defined in
rte_mbuf_ptype.h cannot be recognized as other L4 types and hence the
GRO layer should not use IS_IPV4_TCP_PKT or IS_IPV4_UDP_PKT for
RTE_PTYPE_L4_FRAG. Hence, if the packet type is RTE_PTYPE_L4_FRAG, the
IP header should be parsed to recognize the appropriate IP type and
invoke the respective GRO handler.
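The overlap is visible from the ptype values themselves (an illustrative
check, not the exact macros from the library):

    /* RTE_PTYPE_L4_FRAG (0x300) ==
     * RTE_PTYPE_L4_TCP (0x100) | RTE_PTYPE_L4_UDP (0x200),
     * so a bitwise test like this wrongly matches fragments too: */
    if (pkt->packet_type & RTE_PTYPE_L4_TCP) { /* true for 0x300 */ }

    /* fragments must be singled out first, then the IP header parsed
     * to pick the proper GRO handler */
    if ((pkt->packet_type & RTE_PTYPE_L4_MASK) == RTE_PTYPE_L4_FRAG) {
        /* read next-proto from the IP header to decide TCP vs UDP */
    }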
Fixes: 1ca5e6740852 ("gro: support UDP/IPv4")
Cc: stable@dpdk.org
Signed-off-by: Kumara Parameshwaran <kumaraparamesh92@gmail.com>
Reviewed-by: Jiayu Hu <jiayu.hu@intel.com>
This commit fixes an issue where calling rte_service_lcore_stop()
would result in a service's "active on lcore" status becoming stale.
The stale status would result in rte_service_may_be_active() always
returning "1", indicating that the service is not certainly stopped.
This is fixed by ensuring the "active on lcore" status of each service
is set to 0 when an lcore is stopped.
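A hedged usage sketch of the behaviour this enables (the service and
lcore ids are placeholders):

    rte_service_lcore_stop(lcore_id);

    /* with the fix, a service mapped only to that lcore eventually
     * reports that it can no longer be active */
    while (rte_service_may_be_active(service_id) == 1)
        rte_pause();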
Fixes: e30dd31847d2 ("service: add mechanism for quiescing")
Fixes: 8929de043eb4 ("service: retrieve lcore active state")
Reported-by: Naga Harish K S V <s.v.naga.harish.k@intel.com>
Signed-off-by: Harry van Haaren <harry.van.haaren@intel.com>
The Rx function was not specified in the secondary process, causing the
secondary process to segfault in a multi-process environment.
This patch specifies the Rx/Tx functions in "dev_init" to support
secondary processes.
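Conceptually the change looks like this (a hypothetical sketch; the
actual igc burst function names may differ):

    /* in dev_init(): a secondary process only needs the fast-path
     * function pointers, the shared data is already initialized */
    if (rte_eal_process_type() != RTE_PROC_PRIMARY) {
        dev->rx_pkt_burst = igc_recv_pkts;  /* hypothetical names */
        dev->tx_pkt_burst = igc_xmit_pkts;
        return 0;
    }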
Fixes: 66fde1b943eb ("net/igc: add skeleton")
Cc: stable@dpdk.org
Signed-off-by: Zhichao Zeng <zhichaox.zeng@intel.com>
Acked-by: Qi Zhang <qi.z.zhang@intel.com>
Enable double VLAN by default after firmware v8.3; disabling double
VLAN is not allowed in subsequent operations.
Fixes: 38e9762be16a ("net/i40e: add outer VLAN processing")
Signed-off-by: Kevin Liu <kevinx.liu@intel.com>
Acked-by: Qi Zhang <qi.z.zhang@intel.com>
When the VF is in closed state, the vf_reset flag can not be reverted
if the VF is reset asynchronously. This prevents all virtchnl commands
from executing, causing subsequent calls to iavf_dev_reset() to fail.
So the vf_reset flag needs to be reverted even when VF is in closed state.
Fixes: 676d986b4b86 ("net/iavf: fix crash after VF reset failure")
Cc: stable@dpdk.org
Signed-off-by: Yiding Zhou <yidingx.zhou@intel.com>
Acked-by: Qi Zhang <qi.z.zhang@intel.com>
The current code doesn't allocate memory for the lookup element used to
add the packet flag. This patch adds one lookup item to the list to fix
this memory issue.
Fixes: 8b95092b7f69 ("net/ice/base: fix direction of flow that matches any")
Cc: stable@dpdk.org
Signed-off-by: Yuying Zhang <yuying.zhang@intel.com>
Acked-by: Qi Zhang <qi.z.zhang@intel.com>
This fix replaces the usage of roc_nix_num_xstats_get(), which is a
compile-time RoC API, with the runtime RoC roc_nix_xstats_names_get()
API, resolving the xstat count difference between cn9k and cn10k while
displaying xstats for ethdev ports.
Fixes: 825bd1d9d8e6 ("common/cnxk: update extra stats for inline device")
Cc: stable@dpdk.org
Signed-off-by: Rakesh Kudurumalla <rkudurumalla@marvell.com>
After parsing GRE tunnel, parse subsequent protocols
(for example, TCP or UDP) as tunneled versions.
Fixes: c34ea71b878 ("common/cnxk: add NPC parsing API")
Cc: stable@dpdk.org
Signed-off-by: Satheesh Paul <psatheesh@marvell.com>
Reviewed-by: Kiran Kumar K <kirankumark@marvell.com>
The callfds[] array stores eventfds sequentially for Rx and Tx vq.
Fixes: d61138d4f0e2 ("drivers: remove direct access to interrupt handle")
Cc: stable@dpdk.org
Signed-off-by: Yuan Wang <yuanx.wang@intel.com>
Reviewed-by: David Marchand <david.marchand@redhat.com>
Update the vhost usage message to align with the documentation.
Signed-off-by: Herakliusz Lipiec <herakliusz.lipiec@intel.com>
Reviewed-by: Chenbo Xia <chenbo.xia@intel.com>
The vhost sample app documentation describes parameters that are not in
the code and omits parameters that exist.
Also switch the order of the sections on running vhost and the VM,
since the --client parameter in the sample command line
requires a socket to be created by the VM.
Remove uio references and update them to vfio-pci.
Signed-off-by: Herakliusz Lipiec <herakliusz.lipiec@intel.com>
Reviewed-by: Chenbo Xia <chenbo.xia@intel.com>
The meson build system creates a vhost binary, but the Makefile
and docs reference it as vhost-switch. Update the Makefile
to match meson, and the docs accordingly.
Signed-off-by: Herakliusz Lipiec <herakliusz.lipiec@intel.com>
Acked-by: Bruce Richardson <bruce.richardson@intel.com>
Reviewed-by: Maxime Coquelin <maxime.coquelin@redhat.com>
We recently improved the log messages in the vhost library, adding some
context that helps filtering for a given vhost-user device.
However, some parts of the code were missed, and some later code changes
broke this new convention (fixes were sent previous to this patch).
Change the VHOST_LOG_CONFIG/DATA helpers and always ask for a string
used as context. This should help limit regressions on this topic.
Most of the time, the context is the vhost-user device socket path.
For the rest when a vhost-user device can not be related, generic
names were chosen:
- "dma", for vhost-user async DMA operations,
- "device", for vhost-user device creation and lookup,
- "thread", for threads management,
Signed-off-by: David Marchand <david.marchand@redhat.com>
Reviewed-by: Maxime Coquelin <maxime.coquelin@redhat.com>
Those messages were missed when adding socket context.
Fix this.
Signed-off-by: David Marchand <david.marchand@redhat.com>
Reviewed-by: Maxime Coquelin <maxime.coquelin@redhat.com>
Device information in the log messages was dropped.
Fixes: 52ade97e3641 ("vhost: fix physical address mapping")
Cc: stable@dpdk.org
Signed-off-by: David Marchand <david.marchand@redhat.com>
Reviewed-by: Chenbo Xia <chenbo.xia@intel.com>
This patch checks the return value of rte_dma_info_get()
called in rte_vhost_async_dma_configure().
Coverity issue: 379066
Fixes: 53d3f4778c1d ("vhost: integrate dmadev in asynchronous data-path")
Cc: stable@dpdk.org
Signed-off-by: Jiayu Hu <jiayu.hu@intel.com>
Reviewed-by: Chenbo Xia <chenbo.xia@intel.com>
When the DPDK app is running in the VF, it sometimes rings the doorbell
before dev_config has had a chance to complete and hence it misses
the event. As a workaround, ring the doorbell when vDPA reports the
notify_area to QEMU.
notify_area to QEMU.
Fixes: 630be406dcbf ("vdpa/sfc: get queue notify area info")
Cc: stable@dpdk.org
Signed-off-by: Vijay Kumar Srivastava <vsrivast@xilinx.com>
Signed-off-by: Abhimanyu Saini <absaini@amd.com>
Acked-by: Andrew Rybchenko <andrew.rybchenko@oktetlabs.ru>
Reviewed-by: Maxime Coquelin <maxime.coquelin@redhat.com>
If the vring state changes after the PMD starts working, the locked
vring notifies the PMD and update_queuing_status() is called; the latter
waits for the PMD to finish accessing the vring, while the PMD is also
waiting for the vring to be unlocked, thus causing a deadlock.
Actually, update_queuing_status() only needs to wait while
destroying/stopping the device, but not in other cases.
This patch adds a flag for whether or not to wait, to fix this issue.
Fixes: 1ce3c7fe149f ("net/vhost: emulate device start/stop behavior")
Cc: stable@dpdk.org
Signed-off-by: Yuan Wang <yuanx.wang@intel.com>
Reviewed-by: Chenbo Xia <chenbo.xia@intel.com>
This patch fixes the missing virtio net header copy in the sync
dequeue path caused by refactoring, which affects dequeue
offloading.
Fixes: 6d823bb302c7 ("vhost: prepare sync for descriptor to mbuf refactoring")
Signed-off-by: Xuan Ding <xuan.ding@intel.com>
Tested-by: Wei Ling <weix.ling@intel.com>
Reviewed-by: Chenbo Xia <chenbo.xia@intel.com>
This adds ConnectX-6 LX to the list of supported
Mellanox devices that run the MLX5 vdpa PMD.
Signed-off-by: Wisam Jaddo <wisamm@nvidia.com>
Reviewed-by: Maxime Coquelin <maxime.coquelin@redhat.com>
drain_eth_rx() uses rte_vhost_avail_entries() to calculate
the available entries to determine if a retry is required.
However, this function only works with split rings; calling it on
packed rings returns a wrong value and causes unnecessary retries,
resulting in a significant performance penalty.
This patch fixes that by using the difference between the Tx and Rx
burst counts as the retry condition.
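The retry logic then looks roughly like this (a hedged sketch using the
standard vhost enqueue API; the variable names are illustrative):

    nb_rx = rte_eth_rx_burst(port_id, queue_id, pkts, MAX_PKT_BURST);
    enqueued = rte_vhost_enqueue_burst(vid, VIRTIO_RXQ, pkts, nb_rx);

    /* retry while the enqueue burst returned fewer packets than Rx */
    while (enqueued < nb_rx && retries++ < burst_rx_retry_num) {
        rte_delay_us(burst_rx_delay_time);
        enqueued += rte_vhost_enqueue_burst(vid, VIRTIO_RXQ,
                pkts + enqueued, nb_rx - enqueued);
    }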
Fixes: be800696c26e ("examples/vhost: use burst enqueue and dequeue from lib")
Cc: stable@dpdk.org
Signed-off-by: Yuan Wang <yuanx.wang@intel.com>
Reviewed-by: Chenbo Xia <chenbo.xia@intel.com>
Tested-by: Wei Ling <weix.ling@intel.com>
In the virtio blk vDPA live migration use case, before the live
migration process, QEMU sets the call fd to the vDPA back-end. QEMU
and the vDPA back-end stand by until live migration starts.
During the live migration process, QEMU sets the kick fd and a new call
fd. However, after the kick fd is set to the vDPA back-end, the
vDPA back-end configures the device and the data path starts. The new
call fd then causes a kind of "re-configuration", and this
"re-configuration" causes an IO drop.
After this patch, the vDPA back-end configures the device only after
both the kick fd and the call fd are set, making sure no IO is dropped.
This patch only impacts the virtio blk vDPA device and does not impact
net devices.
Fixes: 7015b6577178 ("vdpa/ifc: add block device SW live-migration")
Signed-off-by: Andy Pei <andy.pei@intel.com>
Reviewed-by: Maxime Coquelin <maxime.coquelin@redhat.com>
This patch moves the 'Recommended IOVA mode in async datapath'
section under 'Vhost asynchronous data path' as a sub-section,
which makes the doc cleaner.
Signed-off-by: Xuan Ding <xuan.ding@intel.com>
Reviewed-by: Jiayu Hu <jiayu.hu@intel.com>
Reviewed-by: Chenbo Xia <chenbo.xia@intel.com>
For the vhost message VHOST_USER_GET_CONFIG, we do not check the
payload size in the vhost lib; we check the payload size in the
driver-specific ops.
For the ifc vdpa driver, we just need to make sure the payload size
is not smaller than sizeof(struct virtio_blk_config).
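The driver-side check then reduces to something like this (a sketch;
the variable holding the payload size is illustrative):

    /* reject a payload that cannot hold a full virtio_blk_config */
    if (payload_size < sizeof(struct virtio_blk_config))
        return -1;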
Fixes: 856d03bcdc54 ("vdpa/ifc: add block operations")
Signed-off-by: Andy Pei <andy.pei@intel.com>
Reviewed-by: Maxime Coquelin <maxime.coquelin@redhat.com>
This patch updates the documentation with the correct usage of the
async enqueue APIs. rte_vhost_poll_enqueue_completed() needs to be
called in time to notify the guest of completed packets and
avoid packet loss.
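In practice that means pairing every submit with regular completion
polling, e.g. (a hedged sketch; the dma_id/vchan arguments follow the
dmadev-based async API):

    /* submit packets for asynchronous (DMA-accelerated) enqueue */
    n = rte_vhost_submit_enqueue_burst(vid, queue_id, pkts, count,
            dma_id, 0);

    /* ...later, and regularly, reclaim completed packets so the guest
     * is notified in time and the mbufs can be freed */
    nr_done = rte_vhost_poll_enqueue_completed(vid, queue_id, comp_pkts,
            MAX_PKT_BURST, dma_id, 0);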
Signed-off-by: Xuan Ding <xuan.ding@intel.com>
Reviewed-by: Chenbo Xia <chenbo.xia@intel.com>
The virtio-user initialization requires a unix socket to receive backend
messages in blocking mode. However, vhost_user_update_link_state() sets
the same socket to non-blocking via fcntl, which affects all threads.
Enabling the rxq interrupt can cause both of these behaviors to occur
concurrently, with the result that the initialization may fail
because no messages are received on the non-blocking socket.
Thread 1:
virtio_init_device()
--> virtio_user_start_device()
--> vhost_user_set_memory_table()
--> vhost_user_check_reply_ack()
Thread 2:
virtio_interrupt_handler()
--> vhost_user_update_link_state()
Fix that by replacing O_NONBLOCK with the recv per-call option
MSG_DONTWAIT.
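The difference, in a minimal sketch (standard POSIX semantics):

    /* per-call non-blocking receive: only this recv() returns
     * immediately; the socket's O_NONBLOCK file status flag, shared
     * by all threads, is left untouched */
    ret = recv(fd, buf, len, MSG_DONTWAIT);
    if (ret < 0 && (errno == EAGAIN || errno == EWOULDBLOCK)) {
        /* no pending message: keep the previous link state */
    }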
Fixes: ef53b6030039 ("net/virtio-user: support LSC")
Cc: stable@dpdk.org
Signed-off-by: Yuan Wang <yuanx.wang@intel.com>
Acked-by: Stephen Hemminger <stephen@networkplumber.org>
Reviewed-by: Chenbo Xia <chenbo.xia@intel.com>