Improve the devargs handling in two aspects:
- Parse the devargs string only once.
- Return error and report for unknown keys.
The common driver parses once the devargs string into a dictionary, then
provides it to all the drivers' probe. Each driver updates within it
which keys it has used, then common driver receives the updated
dictionary and reports about unknown devargs.
Signed-off-by: Michael Baum <michaelba@nvidia.com>
Acked-by: Matan Azrad <matan@nvidia.com>
Functions like free, rte_free, and rte_mempool_free
already handle NULL pointer so the checks here are not necessary.
Remove redundant NULL pointer checks before free functions
found by nullfree.cocci
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
When the event thread polls traffic and a virtq is stopping, the FW loses
synchronization in the virtq indexes.
It causes LM failure on synchronization between the HOST indexes to
the GUEST indexes.
Unset the event thread before the queue stop in the LM process.
Fixes: 31b9c29c86 ("vdpa/mlx5: support close and config operations")
Cc: stable@dpdk.org
Signed-off-by: Matan Azrad <matan@nvidia.com>
Acked-by: Xueming Li <xuemingl@nvidia.com>
Reviewed-by: Maxime Coquelin <maxime.coquelin@redhat.com>
The return value of "mlx5_os_wrapped_mkey_create" is checked in the
caller. A zero means success without any error.
The typo in the if-condition should be fixed in case there is a
misjudgment.
Fixes: 398ea8450c ("vdpa/mlx5: workaround dirty bitmap MR creation")
Cc: stable@dpdk.org
Signed-off-by: Bing Zhao <bingz@nvidia.com>
Acked-by: Matan Azrad <matan@nvidia.com>
Reviewed-by: Chenbo Xia <chenbo.xia@intel.com>
Due to kernel issue in direct MKEY creation using the DevX API, this
patch replaces the virtio MR creation to use Verbs API.
Fixes: cc07a42da2 ("vdpa/mlx5: prepare memory regions")
Cc: stable@dpdk.org
Signed-off-by: Michael Baum <michaelba@nvidia.com>
Signed-off-by: Matan Azrad <matan@nvidia.com>
Due to kernel driver/FW issues in direct MKEY creation using the DevX
API, this patch replaces the dirty bitmap MR creation to use wrapped
mkey instead.
Fixes: 9d39e57f21 ("vdpa/mlx5: support live migration")
Cc: stable@dpdk.org
Signed-off-by: Michael Baum <michaelba@nvidia.com>
Signed-off-by: Matan Azrad <matan@nvidia.com>
The number of WQEBBs was provided to QP create, and QP size was calculated
by multiplying the number of WQEBBs by 64, which is the send WQE size.
When creating RQ in the QP (i.e., vdpa driver), the queue size was bigger
because the receive WQE size is 16.
Provide queue size to QP create instead of the number of WQEBBs.
Fixes: f9213ab12c ("common/mlx5: share DevX queue pair operations")
Signed-off-by: Raja Zidane <rzidane@nvidia.com>
Acked-by: Matan Azrad <matan@nvidia.com>
The DevX interface for QP creation expects the number of WQEBBs.
Wrongly, the number of descriptors was provided to the QP creation.
In addition, the QP size must be a power of 2 what was not guaranteed.
Provide the number of WQEBBs to the QP creation API.
Round up the SQ size to a power of 2.
Rename (sq/rq)_size to num_of_(send/receive)_wqes.
Fixes: 6152534e21 ("crypto/mlx5: support queue pairs operations")
Cc: stable@dpdk.org
Signed-off-by: Raja Zidane <rzidane@nvidia.com>
Acked-by: Matan Azrad <matan@nvidia.com>
Acked-by: Tal Shnaiderman <talshn@nvidia.com>
The rdma-core library can map doorbell register in two ways, depending
on the environment variable "MLX5_SHUT_UP_BF":
- as regular cached memory, the variable is either missing or set to
zero. This type of mapping may cause the significant doorbell
register writing latency and requires an explicit memory write
barrier to mitigate this issue and prevent write combining.
- as non-cached memory, the variable is present and set to not "0"
value. This type of mapping may cause performance impact under
heavy loading conditions but the explicit write memory barrier is
not required and it may improve core performance.
The UAR creation function maps a doorbell in one of the above ways
according to the system. In run time, it always adds an explicit memory
barrier after writing to.
In cases where the doorbell was mapped as non-cached memory, the
explicit memory barrier is unnecessary and may impair performance.
The commit [1] solved this problem for a Tx queue. In run time, it
checks the mapping type and provides the memory barrier after writing to
a Tx doorbell register if it is needed. The mapping type is extracted
directly from the uar_mmap_offset field in the queue properties.
This patch shares this code between the drivers and extends the above
solution for each of them.
[1] commit 8409a28573
("net/mlx5: control transmit doorbell register mapping")
Fixes: f8c97babc9 ("compress/mlx5: add data-path functions")
Fixes: 8e196c08ab ("crypto/mlx5: support enqueue/dequeue operations")
Fixes: 4d4e245ad6 ("regex/mlx5: support enqueue")
Cc: stable@dpdk.org
Signed-off-by: Michael Baum <michaelba@nvidia.com>
Reviewed-by: Viacheslav Ovsiienko <viacheslavo@nvidia.com>
Acked-by: Matan Azrad <matan@nvidia.com>
UAR mapping type can be affected by the devarg tx_db_nc, which can cause
setting the environment variable MLX5_SHUT_UP_BF.
So, the MLX5_SHUT_UP_BF value and the UAR mapping parameter affect the
UAR cache mode.
Wrongly, the devarg was considered for the MLX5_SHUT_UP_BF but not for
the UAR mapping parameter in all the drivers except the net.
Take the tx_db_nc devarg into account for all the drivers.
Fixes: ca1418ce39 ("common/mlx5: share device context object")
Cc: stable@dpdk.org
Signed-off-by: Michael Baum <michaelba@nvidia.com>
Reviewed-by: Viacheslav Ovsiienko <viacheslavo@nvidia.com>
Acked-by: Matan Azrad <matan@nvidia.com>
This patch marks the vDPA driver APIs as internal and
rename the corresponding header file to vdpa_driver.h.
Signed-off-by: Maxime Coquelin <maxime.coquelin@redhat.com>
Acked-by: Thomas Monjalon <thomas@monjalon.net>
Reviewed-by: Chenbo Xia <chenbo.xia@intel.com>
Removing direct access to interrupt handle structure fields,
rather use respective get set APIs for the same.
Making changes to all the drivers access the interrupt handle fields.
Signed-off-by: Harman Kalra <hkalra@marvell.com>
Acked-by: Hyong Youb Kim <hyonkim@cisco.com>
Signed-off-by: David Marchand <david.marchand@redhat.com>
Tested-by: Raslan Darawsheh <rasland@nvidia.com>
VAR is the device memory space for the virtio queues doorbells,
Qemu could mmap it to directly to speed up doorbell push.
On a busy system, Qemu takes time to release VAR resources during driver
shutdown. If vdpa restarted quickly, the VAR allocation failed with
error 28 since the VAR is singleton resource per device.
This patch adds retry mechanism for VAR allocation.
Fixes: 4cae722c1b ("vdpa/mlx5: move virtual doorbell alloc to probe")
Cc: stable@dpdk.org
Signed-off-by: Xueming Li <xuemingl@nvidia.com>
Reviewed-by: Matan Azrad <matan@nvidia.com>
Reviewed-by: Maxime Coquelin <maxime.coquelin@redhat.com>
After a vDPA application restart, Qemu restores VQ with used and
available index, new incoming packet triggers virtio driver to
handle buffers. Under heavy traffic, no available buffer for
firmware to receive new packets, no Rx interrupts generated,
driver is stuck on endless interrupt waiting.
As a firmware workaround, this patch sends a notification after
VQ setup to ask driver handling buffers and filling new buffers.
Fixes: bff7350110 ("vdpa/mlx5: prepare virtio queues")
Cc: stable@dpdk.org
Signed-off-by: Xueming Li <xuemingl@nvidia.com>
Reviewed-by: Matan Azrad <matan@nvidia.com>
Reviewed-by: Maxime Coquelin <maxime.coquelin@redhat.com>
Add HCA attributes structure as a field of device config structure.
It query in common probing, and updates the timestamp format fields.
Each driver use HCA attributes from common device config structure,
instead of query it for itself.
Signed-off-by: Michael Baum <michaelba@nvidia.com>
Acked-by: Matan Azrad <matan@nvidia.com>
Create shared Protection Domain in common area and add it and its PDN as
fields of common device structure.
Use this Protection Domain in all drivers and remove the PD and PDN
fields from their private structure.
Signed-off-by: Michael Baum <michaelba@nvidia.com>
Acked-by: Matan Azrad <matan@nvidia.com>
Add option to get IB device after disabling RoCE. It is relevant if
there is vDPA class in device arguments list.
Use common device context in vDPA driver and remove the ctx field from
its private structure.
Signed-off-by: Michael Baum <michaelba@nvidia.com>
Acked-by: Matan Azrad <matan@nvidia.com>
Create common probing structure that includes, for now, basic probing
information detected by the common driver and share it with all the
internal drivers.
Signed-off-by: Michael Baum <michaelba@nvidia.com>
Acked-by: Matan Azrad <matan@nvidia.com>
Currently drivers using QP (vDPA, crypto and compress, regex soon)
manage their memory, creation, modification and destruction of the QP,
in almost identical code.
Move QP memory management, creation and destruction to common.
Add common function to change QP state to RTS.
Add user_index attribute to QP creation.
It's for better code maintenance and reuse.
Signed-off-by: Raja Zidane <rzidane@nvidia.com>
Acked-by: Matan Azrad <matan@nvidia.com>
When VM size is larger than 4G (u32) and memory region is larger than 4G,
the 32-bit GCD function overflowed and returned wrong value
that resulted in memory registration failure.
This patch calls 64-bit GCD function to avoid overflow.
Fixes: cc07a42da2 ("vdpa/mlx5: prepare memory regions")
Cc: stable@dpdk.org
Signed-off-by: Xueming Li <xuemingl@nvidia.com>
Reviewed-by: Matan Azrad <matan@nvidia.com>
Error occurs when configuring meson with --buildtype=minsize
with GCC 11.1.0:
drivers/vdpa/mlx5/mlx5_vdpa_mem.c: In function ‘mlx5_vdpa_mem_register’:
drivers/vdpa/mlx5/mlx5_vdpa_mem.c:183:24: error:
initialization of ‘uint64_t’ {aka ‘long unsigned int’} from ‘void *’
makes integer from pointer without a cast [-Werror=int-conversion]
| uint64_t gcd = NULL;
| ^~~~
drivers/vdpa/mlx5/mlx5_vdpa_mem.c:244:75: error:
‘mode’ may be used uninitialized in this function [-Werror=maybe-uninitialized]
| klm_size = mode == MLX5_MKC_ACCESS_MODE_KLM ?
| ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
| KLM_SIZE_MAX_ALIGN(empty_region_sz) : gcd;
| ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~^~~~~
Signed-off-by: Thomas Monjalon <thomas@monjalon.net>
Acked-by: Matan Azrad <matan@nvidia.com>
Reviewed-by: Ruifeng Wang <ruifeng.wang@arm.com>
Start a new release cycle with empty release notes.
The ABI version becomes 22.0.
The map files are updated to the new ABI major number (22).
The ABI exceptions are dropped and CI ABI checks are disabled because
compatibility is not preserved.
Signed-off-by: Thomas Monjalon <thomas@monjalon.net>
Acked-by: Ferruh Yigit <ferruh.yigit@intel.com>
Acked-by: David Marchand <david.marchand@redhat.com>
The mlx5_vdpa_event_qp_create function makes shifting to the numeric
constant 1, then multiplies it by another constant and finally assigns
it into a uint64_t variable.
The numeric constant type is an int with a 32-bit sign. if after
shifting , its MSB (bit of sign) will change, the uint64 variable will
get into it a different value than what the function intended it to get.
Set the numeric constant 1 to be uint64_t in the first place.
Fixes: 8395927cdf ("vdpa/mlx5: prepare HW queues")
Cc: stable@dpdk.org
Signed-off-by: Michael Baum <michaelba@nvidia.com>
Acked-by: Matan Azrad <matan@nvidia.com>
RoCE disabling requirement is based on PCI address.
In order to support Sub-Function, a conversion is needed
in the case of an auxiliary device.
SF device can be probed with such devargs string:
auxiliary:mlx5_core.sf.<id>,class=vdpa
Signed-off-by: Thomas Monjalon <thomas@monjalon.net>
Acked-by: Viacheslav Ovsiienko <viacheslavo@nvidia.com>
This adds matching on the reserved field of VXLAN
header (the last 8-bits). The capability from rdma-core
is detected by creating a dummy matcher using misc5
when the device is probed.
For non-zero groups and FDB domain, the capability is
detected from rdma-core, meanwhile for NIC domain group
zero it's relying on the HCA_CAP from FW.
Signed-off-by: Rongwei Liu <rongweil@nvidia.com>
Acked-by: Viacheslav Ovsiienko <viacheslavo@nvidia.com>
Acked-by: Raslan Darawsheh <rasland@nvidia.com>
Packet was corrupted when TSO requested without CSUM update.
Enables CSUM automatically if only TSO requested.
Fixes: 2aa8444b00 ("vdpa/mlx5: support stateless offloads")
Cc: stable@dpdk.org
Signed-off-by: Xueming Li <xuemingl@nvidia.com>
Reviewed-by: Chenbo Xia <chenbo.xia@intel.com>
Add common devargs key definition for "bus", "class" and "driver".
Signed-off-by: Xueming Li <xuemingl@nvidia.com>
Acked-by: Thomas Monjalon <thomas@monjalon.net>
Acked-by: Andrew Rybchenko <andrew.rybchenko@oktetlabs.ru>
The vDPA PCI device unplug process should release all the private
device resources and also to unregister the device.
The device unregistration was missed what remained the device data
invalid in the rte_vhost library.
Unregister the device in unplug process via the remove operation.
Fixes: 95276abaaf ("vdpa/mlx5: introduce Mellanox vDPA driver")
Cc: stable@dpdk.org
Reported-by: Eli Britstein <elibr@nvidia.com>
Signed-off-by: Matan Azrad <matan@nvidia.com>
Tested-by: Eli Britstein <elibr@nvidia.com>
Acked-by: Xueming Li <xuemingl@nvidia.com>
Reviewed-by: Chenbo Xia <chenbo.xia@intel.com>
The crypto driver added new fields to the mkey attributes struct:
crypto_en and set_remote_rw.
The entire mkey struct was not initialized, only specific fields in it,
which caused the new added fields not to be initialized resulting in a
mkey creation error.
This is fixed by initializing the entire mkey attributes struct to 0
which will prevent this issue from reoccurring if any fields are added
to the mkey struct in the future.
Fixes: 0111a74e13 ("common/mlx5: adjust DevX mkey fields for crypto")
Signed-off-by: Shiri Kuzin <shirik@nvidia.com>
Acked-by: Matan Azrad <matan@nvidia.com>
Let's try to enforce the convention where most drivers use a pmd. logtype
with their class reflected in it, and libraries use a lib. logtype.
Introduce two new macros:
- RTE_LOG_REGISTER_DEFAULT can be used when a single logtype is
used in a component. It is associated to the default name provided
by the build system,
- RTE_LOG_REGISTER_SUFFIX can be used when multiple logtypes are used,
and then the passed name is appended to the default name,
RTE_LOG_REGISTER is left untouched for existing external users
and for components that do not comply with the convention.
There is a new Meson variable log_prefix to adapt the default name
for baseband (pmd.bb.), bus (no pmd.) and mempool (no pmd.) classes.
Note: achieved with below commands + reverted change on net/bonding +
edits on crypto/virtio, compress/mlx5, regex/mlx5
$ git grep -l RTE_LOG_REGISTER drivers/ |
while read file; do
pattern=${file##drivers/};
class=${pattern%%/*};
pattern=${pattern#$class/};
drv=${pattern%%/*};
case "$class" in
baseband) pattern=pmd.bb.$drv;;
bus) pattern=bus.$drv;;
mempool) pattern=mempool.$drv;;
*) pattern=pmd.$class.$drv;;
esac
sed -i -e 's/RTE_LOG_REGISTER(\(.*\), '$pattern',/RTE_LOG_REGISTER_DEFAULT(\1,/' $file;
sed -i -e 's/RTE_LOG_REGISTER(\(.*\), '$pattern'\.\(.*\),/RTE_LOG_REGISTER_SUFFIX(\1, \2,/' $file;
done
$ git grep -l RTE_LOG_REGISTER lib/ |
while read file; do
pattern=${file##lib/};
pattern=lib.${pattern%%/*};
sed -i -e 's/RTE_LOG_REGISTER(\(.*\), '$pattern',/RTE_LOG_REGISTER_DEFAULT(\1,/' $file;
sed -i -e 's/RTE_LOG_REGISTER(\(.*\), '$pattern'\.\(.*\),/RTE_LOG_REGISTER_SUFFIX(\1, \2,/' $file;
done
Signed-off-by: David Marchand <david.marchand@redhat.com>
Signed-off-by: Thomas Monjalon <thomas@monjalon.net>
Acked-by: Bruce Richardson <bruce.richardson@intel.com>
The driver should notify the guest for each traffic burst detected by CQ
polling.
The CQ polling trigger is defined by `event_mode` device argument,
either by busy polling on all the CQs or by blocked call to HW
completion event using DevX channel.
Also, the polling event modes can move to blocked call when the
traffic rate is low.
The current blocked call uses the EAL interrupt API suffering a lot
of overhead in the API management and serve all the drivers and
libraries using only single thread.
Use blocking FD of the DevX channel in order to do blocked call
directly by the DevX channel FD mechanism.
Signed-off-by: Matan Azrad <matan@nvidia.com>
Acked-by: Xueming Li <xuemingl@nvidia.com>
Reviewed-by: Maxime Coquelin <maxime.coquelin@redhat.com>
The function pthread_setname_np is non-portable,
so it may be unavailable in old glibc or other systems.
The function rte_thread_setname is workarounding portability issues.
Signed-off-by: Thomas Monjalon <thomas@monjalon.net>
Reviewed-by: Chenbo Xia <chenbo.xia@intel.com>
Acked-by: Matan Azrad <matan@nvidia.com>
Reviewed-by: Chengwen Feng <fengchengwen@huawei.com>
The get_ib_device_match function iterates over the list of ib devices
returned by the get_device_list glue function and returns the ib device
matching the provided address.
Since this function is in use by several drivers, in this patch we
share the function in common part.
Signed-off-by: Shiri Kuzin <shirik@nvidia.com>
Acked-by: Matan Azrad <matan@nvidia.com>
The HW virtq object can be destroyed either when the device is closed or
when the state of the virtq becomes disabled.
Some parameters of the virtq should continue to be managed when the
virtq state is changed but all of them must be initialized when the
device is closed.
Wrongly, the enable parameter stayed on when the device is closed what
might cause creation of invalid virtq in the next time a device is
assigned to the driver.
Clean all the virtqs memory when the device is closed.
Fixes: c47d6e8333 ("vdpa/mlx5: support queue update")
Cc: stable@dpdk.org
Signed-off-by: Matan Azrad <matan@nvidia.com>
Acked-by: Xueming Li <xuemingl@nvidia.com>
Reviewed-by: Maxime Coquelin <maxime.coquelin@redhat.com>
1/ The function pthread_yield() does not exist in musl libc,
and can be replaced with sched_yield() after including sched.h.
2/ The function pthread_attr_setaffinity_np() does not exist in musl libc,
and can be replaced with pthread_setaffinity_np() after pthread_create().
Fixes: b7fa0bf4d5 ("vdpa/mlx5: fix polling threads scheduling")
Fixes: 5cf3fd3af4 ("vdpa/mlx5: add CPU core parameter to bind polling thread")
Cc: stable@dpdk.org
Signed-off-by: Thomas Monjalon <thomas@monjalon.net>
Acked-by: Matan Azrad <matan@nvidia.com>
Acked-by: Andrew Rybchenko <andrew.rybchenko@oktetlabs.ru>
Acked-by: David Marchand <david.marchand@redhat.com>
This patch adds support for the timestamp format settings for
the receive and send queues. If the firmware version x.30.1000
or above is installed and the NIC timestamps are configured
with the real-time format, the default zero values for newly
added fields cause the queue creation to fail.
The patch queries the timestamp formats supported by the hardware
and sets the configuration values in queue context accordingly.
Fixes: 95276abaaf ("vdpa/mlx5: introduce Mellanox vDPA driver")
Cc: stable@dpdk.org
Signed-off-by: Viacheslav Ovsiienko <viacheslavo@nvidia.com>
Acked-by: Matan Azrad <matan@nvidia.com>
Acked-by: Ori Kam <orika@nvidia.com>
The macro DRV_LOG already includes a terminating line feed character
defined in PMD_DRV_LOG_.
The extra line feeds added in some messages are removed.
Signed-off-by: Thomas Monjalon <thomas@monjalon.net>
Acked-by: Matan Azrad <matan@nvidia.com>
When the event mode is with 0 fixed delay, the polling-thread will never
give-up CPU.
So, when multi-polling-threads are active, the context-switch between
them will be managed by the system which may affect latency according to
the time-out decided by the system.
In order to fix multi-devices polling thread scheduling, this patch
forces rescheduling for each CQ poll iteration.
Move the polling thread to SCHED_RR mode with maximum priority to
complete the fairness.
Fixes: 6956a48cab ("vdpa/mlx5: set polling mode default delay to zero")
Signed-off-by: Matan Azrad <matan@nvidia.com>
Reviewed-by: Maxime Coquelin <maxime.coquelin@redhat.com>
Acked-by: Xueming Li <xuemingl@nvidia.com>
When the vDPA device is closed, the driver polling thread is canceled.
The polling thread locks the configuration mutex while it polls the CQs.
When the cancellation happens, it may terminate the thread inside the
critical section what remains the configuration mutex locked.
After device close, the driver may be configured again, in this case,
for example, when the first queue state is updated, the driver tries to
lock the mutex again and deadlock appears.
Initialize the mutex after the polling thread cancellation.
Fixes: 99abbd62c2 ("vdpa/mlx5: fix queue update synchronization")
Cc: stable@dpdk.org
Signed-off-by: Matan Azrad <matan@nvidia.com>
Acked-by: Xueming Li <xuemingl@nvidia.com>
Acked-by: Maxime Coquelin <maxime.coquelin@redhat.com>
As announced in the deprecation note, remove all compatibility build
defines from previous make/meson versions and use only the standardized
ones - RTE_LIB_<name> for libraries, and RTE_<CLASS>_<NAME> for drivers.
Signed-off-by: Bruce Richardson <bruce.richardson@intel.com>
Signed-off-by: Thomas Monjalon <thomas@monjalon.net>
The next parameters control the HW queue moderation feature.
This feature helps to control the traffic performance and latency
trade-off.
Each packet completion report from HW to SW requires CQ processing by SW
and triggers interrupt for the guest driver. Interrupt report and
handling cost CPU cycles and time and the amount of this affects
directly on packet performance and latency.
hw_latency_mode parameters [int]
0, HW default.
1, Latency is counted from the first packet completion report.
2, Latency is counted from the last packet completion.
hw_max_latency_us parameters [int]
0 - 4095, The maximum time in microseconds that packet completion
report can be delayed.
hw_max_pending_comp parameter [int]
0 - 65535, The maximum number of pending packets completions in an HW
queue.
Signed-off-by: Xueming Li <xuemingl@nvidia.com>
Reviewed-by: Maxime Coquelin <maxime.coquelin@redhat.com>
For better performance and latency, this patch sets default event
handling mode to polling mode which uses dedicate thread per device to
poll and process event.
Signed-off-by: Xueming Li <xuemingl@nvidia.com>
Acked-by: Matan Azrad <matan@nvidia.com>
Reviewed-by: Maxime Coquelin <maxime.coquelin@redhat.com>
This patch adds new device argument to specify cpu core affinity to
event polling thread for better latency and throughput. The thread
could be also located by name "vDPA-mlx5-<id>".
Signed-off-by: Xueming Li <xuemingl@nvidia.com>
Acked-by: Matan Azrad <matan@nvidia.com>
Reviewed-by: Maxime Coquelin <maxime.coquelin@redhat.com>
To improve performance and latency, this patch sets Rx polling mode
default delay time to zero.
Signed-off-by: Xueming Li <xuemingl@nvidia.com>
Acked-by: Matan Azrad <matan@nvidia.com>
Reviewed-by: Maxime Coquelin <maxime.coquelin@redhat.com>