99f9d799ce
The driver should notify the guest for each traffic burst detected by CQ polling. The CQ polling trigger is defined by `event_mode` device argument, either by busy polling on all the CQs or by blocked call to HW completion event using DevX channel. Also, the polling event modes can move to blocked call when the traffic rate is low. The current blocked call uses the EAL interrupt API suffering a lot of overhead in the API management and serve all the drivers and libraries using only single thread. Use blocking FD of the DevX channel in order to do blocked call directly by the DevX channel FD mechanism. Signed-off-by: Matan Azrad <matan@nvidia.com> Acked-by: Xueming Li <xuemingl@nvidia.com> Reviewed-by: Maxime Coquelin <maxime.coquelin@redhat.com>
173 lines
5.5 KiB
ReStructuredText
173 lines
5.5 KiB
ReStructuredText
.. SPDX-License-Identifier: BSD-3-Clause
|
|
Copyright 2019 Mellanox Technologies, Ltd
|
|
|
|
.. include:: <isonum.txt>
|
|
|
|
MLX5 vDPA driver
|
|
================
|
|
|
|
The MLX5 vDPA (vhost data path acceleration) driver library
|
|
(**librte_vdpa_mlx5**) provides support for **Mellanox ConnectX-6**,
|
|
**Mellanox ConnectX-6 Dx** and **Mellanox BlueField** families of
|
|
10/25/40/50/100/200 Gb/s adapters as well as their virtual functions (VF) in
|
|
SR-IOV context.
|
|
|
|
.. note::
|
|
|
|
This driver is enabled automatically when using "meson" build system which
|
|
will detect dependencies.
|
|
|
|
|
|
Design
|
|
------
|
|
|
|
For security reasons and robustness, this driver only deals with virtual
|
|
memory addresses. The way resources allocations are handled by the kernel,
|
|
combined with hardware specifications that allow to handle virtual memory
|
|
addresses directly, ensure that DPDK applications cannot access random
|
|
physical memory (or memory that does not belong to the current process).
|
|
|
|
The PMD can use libibverbs and libmlx5 to access the device firmware
|
|
or directly the hardware components.
|
|
There are different levels of objects and bypassing abilities
|
|
to get the best performances:
|
|
|
|
- Verbs is a complete high-level generic API
|
|
- Direct Verbs is a device-specific API
|
|
- DevX allows to access firmware objects
|
|
- Direct Rules manages flow steering at low-level hardware layer
|
|
|
|
Enabling librte_vdpa_mlx5 causes DPDK applications to be linked against
|
|
libibverbs.
|
|
|
|
A Mellanox mlx5 PCI device can be probed by either net/mlx5 driver or vdpa/mlx5
|
|
driver but not in parallel. Hence, the user should decide the driver by the
|
|
``class`` parameter in the device argument list.
|
|
By default, the mlx5 device will be probed by the net/mlx5 driver.
|
|
|
|
Supported NICs
|
|
--------------
|
|
|
|
* Mellanox\ |reg| ConnectX\ |reg|-6 200G MCX654106A-HCAT (2x200G)
|
|
* Mellanox\ |reg| ConnectX\ |reg|-6 Dx EN 25G MCX621102AN-ADAT (2x25G)
|
|
* Mellanox\ |reg| ConnectX\ |reg|-6 Dx EN 100G MCX623106AN-CDAT (2x100G)
|
|
* Mellanox\ |reg| ConnectX\ |reg|-6 Dx EN 200G MCX623105AN-VDAT (1x200G)
|
|
* Mellanox\ |reg| BlueField SmartNIC 25G MBF1M332A-ASCAT (2x25G)
|
|
|
|
Prerequisites
|
|
-------------
|
|
|
|
- Mellanox OFED version: **5.0**
|
|
see :doc:`../../nics/mlx5` guide for more Mellanox OFED details.
|
|
|
|
Compilation option
|
|
~~~~~~~~~~~~~~~~~~
|
|
|
|
The meson option ``ibverbs_link`` is **shared** by default,
|
|
but can be configured to have the following values:
|
|
|
|
- ``dlopen``
|
|
|
|
Build PMD with additional code to make it loadable without hard
|
|
dependencies on **libibverbs** nor **libmlx5**, which may not be installed
|
|
on the target system.
|
|
|
|
In this mode, their presence is still required for it to run properly,
|
|
however their absence won't prevent a DPDK application from starting (with
|
|
DPDK shared build disabled) and they won't show up as missing with ``ldd(1)``.
|
|
|
|
It works by moving these dependencies to a purpose-built rdma-core "glue"
|
|
plug-in which must be installed in a directory whose name is based
|
|
on ``RTE_EAL_PMD_PATH`` suffixed with ``-glue``.
|
|
|
|
This option has no performance impact.
|
|
|
|
- ``static``
|
|
|
|
Embed static flavor of the dependencies **libibverbs** and **libmlx5**
|
|
in the PMD shared library or the executable static binary.
|
|
|
|
.. note::
|
|
|
|
Default armv8a configuration of meson build sets ``RTE_CACHE_LINE_SIZE``
|
|
to 128 then brings performance degradation.
|
|
|
|
Run-time configuration
|
|
~~~~~~~~~~~~~~~~~~~~~~
|
|
|
|
- **ethtool** operations on related kernel interfaces also affect the PMD.
|
|
|
|
Driver options
|
|
^^^^^^^^^^^^^^
|
|
|
|
- ``class`` parameter [string]
|
|
|
|
Select the class of the driver that should probe the device.
|
|
`vdpa` for the mlx5 vDPA driver.
|
|
|
|
- ``event_mode`` parameter [int]
|
|
|
|
- 0, Completion queue scheduling will be managed by a timer thread which
|
|
automatically adjusts its delays to the coming traffic rate.
|
|
|
|
- 1, Completion queue scheduling will be managed by a timer thread with fixed
|
|
delay time.
|
|
|
|
- 2, Completion queue scheduling will be managed by interrupts. Each CQ burst
|
|
arms the CQ in order to get an interrupt event in the next traffic burst.
|
|
|
|
- Default mode is 1.
|
|
|
|
- ``event_us`` parameter [int]
|
|
|
|
Per mode micro-seconds parameter - relevant only for event mode 0 and 1:
|
|
|
|
- 0, A nonzero value to set timer step in micro-seconds. The timer thread
|
|
dynamic delay change steps according to this value. Default value is 1us.
|
|
|
|
- 1, A value to set fixed timer delay in micro-seconds. Default value is 0us.
|
|
|
|
- ``no_traffic_time`` parameter [int]
|
|
|
|
A nonzero value defines the traffic off time, in polling cycle time units,
|
|
that moves the driver to no-traffic mode. In this mode the polling is stopped
|
|
and interrupts are configured to the device in order to notify traffic for the
|
|
driver. Default value is 16.
|
|
|
|
- ``event_core`` parameter [int]
|
|
|
|
CPU core number to set polling thread affinity to, default to control plane
|
|
cpu.
|
|
|
|
- ``hw_latency_mode`` parameter [int]
|
|
|
|
The completion queue moderation mode:
|
|
|
|
- 0, HW default.
|
|
|
|
- 1, Latency is counted from the first packet completion report.
|
|
|
|
- 2, Latency is counted from the last packet completion.
|
|
|
|
- ``hw_max_latency_us`` parameter [int]
|
|
|
|
- 1 - 4095, The maximum time in microseconds that packet completion report
|
|
can be delayed.
|
|
|
|
- 0, HW default.
|
|
|
|
- ``hw_max_pending_comp`` parameter [int]
|
|
|
|
- 1 - 65535, The maximum number of pending packets completions in an HW queue.
|
|
|
|
- 0, HW default.
|
|
|
|
|
|
Error handling
|
|
^^^^^^^^^^^^^^
|
|
|
|
Upon potential hardware errors, mlx5 PMD try to recover, give up if failed 3
|
|
times in 3 seconds, virtq will be put in disable state. User should check log
|
|
to get error information, or query vdpa statistics counter to know error type
|
|
and count report.
|