doc: add shared guide for mlx5 drivers
Adds new documentation for the MLX5 common driver that contains:
- Its features list (doesn't exist for now).
- Its devargs description.
- Device configuration information and tutorial.
- Quick Start Guide for Mellanox OFED/EN.

Move all shared information from the other MLX5 PMD docs into this doc,
and add references from them to the new common doc.

Signed-off-by: Michael Baum <michaelba@nvidia.com>
Reviewed-by: Raslan Darawsheh <rasland@nvidia.com>
Acked-by: Viacheslav Ovsiienko <viacheslavo@nvidia.com>
parent
67e1bb42b9
commit
a3ade5e34d
@ -3,10 +3,10 @@
|
||||
|
||||
.. include:: <isonum.txt>
|
||||
|
||||
MLX5 compress driver
|
||||
MLX5 Compress Driver
|
||||
====================
|
||||
|
||||
The MLX5 compress driver library
|
||||
The mlx5 compress driver library
|
||||
(**librte_compress_mlx5**) provides support for **Mellanox BlueField-2**
|
||||
families of 25/50/100/200 Gb/s adapters.
|
||||
|
||||
@ -25,30 +25,7 @@ So, using the BlueField device (starting from BlueField-2), the compress
|
||||
class operations can be supported in parallel to the net, vDPA and
|
||||
RegEx class operations.
|
||||
|
||||
For security reasons and robustness, this driver only deals with virtual
|
||||
memory addresses. The way resources allocations are handled by the kernel,
|
||||
combined with hardware specifications that allow to handle virtual memory
|
||||
addresses directly, ensure that DPDK applications cannot access random
|
||||
physical memory (or memory that does not belong to the current process).
|
||||
|
||||
The PMD uses libibverbs and libmlx5 to access the device firmware
|
||||
or directly the hardware components.
|
||||
There are different levels of objects and bypassing abilities
|
||||
to get the best performances:
|
||||
|
||||
- Verbs is a complete high-level generic API.
|
||||
- Direct Verbs is a device-specific API.
|
||||
- DevX allows to access firmware objects.
|
||||
|
||||
Enabling librte_compress_mlx5 causes DPDK applications to be linked against
|
||||
libibverbs.
|
||||
|
||||
Mellanox mlx5 PCI device can be probed by number of different PCI devices,
|
||||
for example net / vDPA / RegEx. To select the compress PMD ``class=compress``
|
||||
should be specified as device parameter. The compress device can be probed and
|
||||
used with other Mellanox classes, by adding more options in the class.
|
||||
For example: ``class=net:compress`` will probe both the net PMD and the compress
|
||||
PMD.
|
||||
See :doc:`../../platform/mlx5` guide for more design details.
|
||||
|
||||
Features
|
||||
--------
|
||||
@ -85,6 +62,9 @@ Limitations
|
||||
Driver options
|
||||
--------------
|
||||
|
||||
Please refer to :ref:`mlx5 common options <mlx5_common_driver_options>`
|
||||
for an additional list of options shared with other mlx5 drivers.
|
||||
|
||||
- ``log-block-size`` parameter [int]
|
||||
|
||||
Log of the Huffman block size in the Deflate algorithm.
|
||||
@ -101,4 +81,4 @@ Prerequisites
|
||||
-------------
|
||||
|
||||
- Mellanox OFED version: **5.2**
|
||||
see :doc:`../../nics/mlx5` guide for more Mellanox OFED details.
|
||||
See :ref:`mlx5 common prerequisites <mlx5_linux_prerequisites>` for more details.
|
||||
|
@ -28,23 +28,12 @@ when the MKEY is configured to perform crypto operations.
|
||||
|
||||
The encryption does not require text to be aligned to the AES block size (128b).
|
||||
|
||||
For security reasons and to increase robustness, this driver only deals with virtual
|
||||
memory addresses. The way resources allocations are handled by the kernel,
|
||||
combined with hardware specifications that allow handling virtual memory
|
||||
addresses directly, ensure that DPDK applications cannot access random
|
||||
physical memory (or memory that does not belong to the current process).
|
||||
See :doc:`../../platform/mlx5` guide for more design details.
|
||||
|
||||
The PMD uses ``libibverbs`` and ``libmlx5`` to access the device firmware
|
||||
or to access the hardware components directly.
|
||||
There are different levels of objects and bypassing abilities.
|
||||
To get the best performances:
|
||||
Configuration
|
||||
-------------
|
||||
|
||||
- Verbs is a complete high-level generic API (Linux only).
|
||||
- Direct Verbs is a device-specific API (Linux only).
|
||||
- DevX allows to access firmware objects.
|
||||
|
||||
Enabling ``librte_crypto_mlx5`` causes DPDK applications
|
||||
to be linked against libibverbs on Linux OS.
|
||||
See the :ref:`mlx5 common configuration <mlx5_common_env>`.
|
||||
|
||||
In order to move the device to crypto operational mode, credential and KEK
|
||||
(Key Encrypting Key) should be set as the first step.
|
||||
@ -109,10 +98,8 @@ The mlxreg dedicated tool should be used as follows:
|
||||
Driver options
|
||||
--------------
|
||||
|
||||
- ``class`` parameter [string]
|
||||
|
||||
Select the class of the driver that should probe the device.
|
||||
`crypto` for the mlx5 crypto driver.
|
||||
Please refer to :ref:`mlx5 common options <mlx5_common_driver_options>`
|
||||
for an additional list of options shared with other mlx5 drivers.
|
||||
|
||||
- ``wcs_file`` parameter [string] - mandatory
|
||||
|
||||
@ -168,13 +155,12 @@ Linux Prerequisites
|
||||
~~~~~~~~~~~~~~~~~~~
|
||||
|
||||
- Mellanox OFED version: **5.3**.
|
||||
see :doc:`../../nics/mlx5` guide for more Mellanox OFED details.
|
||||
|
||||
- Compilation can be done also with rdma-core v15+.
|
||||
see :doc:`../../nics/mlx5` guide for more rdma-core details.
|
||||
|
||||
See :ref:`mlx5 common prerequisites <mlx5_linux_prerequisites>` for more details.
|
||||
|
||||
Windows Prerequisites
|
||||
~~~~~~~~~~~~~~~~~~~~~
|
||||
|
||||
- Mellanox WINOF-2 version: **2.60** or higher.
|
||||
see :doc:`../../nics/mlx5` guide for more Mellanox WINOF-2 details.
|
||||
See :ref:`mlx5 common prerequisites <mlx5_windows_prerequisites>` for more details.
|
||||
|
@ -4,23 +4,16 @@
|
||||
|
||||
.. include:: <isonum.txt>
|
||||
|
||||
MLX5 poll mode driver
|
||||
=====================
|
||||
MLX5 Ethernet Poll Mode Driver
|
||||
==============================
|
||||
|
||||
The MLX5 poll mode driver library (**librte_net_mlx5**) provides support
|
||||
The mlx5 Ethernet poll mode driver library (**librte_net_mlx5**) provides support
|
||||
for **Mellanox ConnectX-4**, **Mellanox ConnectX-4 Lx**, **Mellanox
|
||||
ConnectX-5**, **Mellanox ConnectX-6**, **Mellanox ConnectX-6 Dx**, **Mellanox
|
||||
ConnectX-6 Lx**, **Mellanox BlueField** and **Mellanox BlueField-2** families
|
||||
of 10/25/40/50/100/200 Gb/s adapters as well as their virtual functions (VF)
|
||||
in SR-IOV context.
|
||||
|
||||
Information and documentation about these adapters can be found on the
|
||||
`Mellanox website <http://www.mellanox.com>`__. Help is also provided by the
|
||||
`Mellanox community <http://community.mellanox.com/welcome>`__.
|
||||
|
||||
There is also a `section dedicated to this poll mode driver
|
||||
<https://developer.nvidia.com/networking/dpdk>`_.
|
||||
|
||||
|
||||
Design
|
||||
------
|
||||
@ -29,12 +22,6 @@ Besides its dependency on libibverbs (that implies libmlx5 and associated
|
||||
kernel support), librte_net_mlx5 relies heavily on system calls for control
|
||||
operations such as querying/updating the MTU and flow control parameters.
|
||||
|
||||
For security reasons and robustness, this driver only deals with virtual
|
||||
memory addresses. The way resources allocations are handled by the kernel,
|
||||
combined with hardware specifications that allow to handle virtual memory
|
||||
addresses directly, ensure that DPDK applications cannot access random
|
||||
physical memory (or memory that does not belong to the current process).
|
||||
|
||||
This capability allows the PMD to coexist with kernel network interfaces
|
||||
which remain functional, although they stop receiving unicast packets as
|
||||
long as they share the same MAC address.
|
||||
@ -42,18 +29,7 @@ This means legacy linux control tools (for example: ethtool, ifconfig and
|
||||
more) can operate on the same network interfaces that are owned by the DPDK
|
||||
application.
|
||||
|
||||
The PMD can use libibverbs and libmlx5 to access the device firmware
|
||||
or directly the hardware components.
|
||||
There are different levels of objects and bypassing abilities
|
||||
to get the best performances:
|
||||
|
||||
- Verbs is a complete high-level generic API
|
||||
- Direct Verbs is a device-specific API
|
||||
- DevX allows to access firmware objects
|
||||
- Direct Rules manages flow steering at low-level hardware layer
|
||||
|
||||
Enabling librte_net_mlx5 causes DPDK applications to be linked against
|
||||
libibverbs.
|
||||
See :doc:`../../platform/mlx5` guide for more design details.
|
||||
|
||||
Features
|
||||
--------
|
||||
@ -522,75 +498,31 @@ Extended statistics can be queried using ``rte_eth_xstats_get()``. The extended
|
||||
|
||||
Finally, per-flow statistics can be queried using ``rte_flow_query`` when attaching a count action to a specific flow. The flow counter counts the number of packets that are received successfully by the port and match the specific flow.
|
||||
|
||||
|
||||
Compilation
|
||||
-----------
|
||||
|
||||
See :ref:`mlx5 common compilation <mlx5_common_compilation>`.
|
||||
|
||||
|
||||
Configuration
|
||||
-------------
|
||||
|
||||
Compilation options
|
||||
~~~~~~~~~~~~~~~~~~~
|
||||
Environment Configuration
|
||||
~~~~~~~~~~~~~~~~~~~~~~~~~
|
||||
|
||||
The ibverbs libraries can be linked with this PMD in a number of ways,
|
||||
configured by the ``ibverbs_link`` build option:
|
||||
See :ref:`mlx5 common configuration <mlx5_common_env>`.
|
||||
|
||||
- ``shared`` (default): the PMD depends on some .so files.
|
||||
|
||||
- ``dlopen``: Split the dependencies glue in a separate library
|
||||
loaded when needed by dlopen.
|
||||
It makes dependencies on libibverbs and libmlx4 optional,
|
||||
and has no performance impact.
|
||||
|
||||
- ``static``: Embed static flavor of the dependencies libibverbs and libmlx4
|
||||
in the PMD shared library or the executable static binary.
|
||||
|
||||
Environment variables
|
||||
~~~~~~~~~~~~~~~~~~~~~
|
||||
|
||||
- ``MLX5_GLUE_PATH``
|
||||
|
||||
A list of directories in which to search for the rdma-core "glue" plug-in,
|
||||
separated by colons or semi-colons.
|
||||
|
||||
- ``MLX5_SHUT_UP_BF``
|
||||
|
||||
Configures HW Tx doorbell register as IO-mapped.
|
||||
|
||||
By default, the HW Tx doorbell is configured as a write-combining register.
|
||||
The register would be flushed to HW usually when the write-combining buffer
|
||||
becomes full, but it depends on CPU design.
|
||||
|
||||
Run-time configuration
|
||||
Firmware configuration
|
||||
~~~~~~~~~~~~~~~~~~~~~~
|
||||
|
||||
- librte_net_mlx5 brings kernel network interfaces up during initialization
|
||||
because it is affected by their state. Forcing them down prevents packets
|
||||
reception.
|
||||
|
||||
- **ethtool** operations on related kernel interfaces also affect the PMD.
|
||||
|
||||
Run as non-root
|
||||
^^^^^^^^^^^^^^^
|
||||
|
||||
In order to run as a non-root user,
|
||||
some capabilities must be granted to the application::
|
||||
|
||||
setcap cap_sys_admin,cap_net_admin,cap_net_raw,cap_ipc_lock+ep <dpdk-app>
|
||||
|
||||
Below are the reasons of the need for each capability:
|
||||
|
||||
``cap_sys_admin``
|
||||
When using physical addresses (PA mode), with Linux >= 4.0,
|
||||
for access to ``/proc/self/pagemap``.
|
||||
|
||||
``cap_net_admin``
|
||||
For device configuration.
|
||||
|
||||
``cap_net_raw``
|
||||
For raw ethernet queue allocation through kernel driver.
|
||||
|
||||
``cap_ipc_lock``
|
||||
For DMA memory pinning.
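
The capability list above can be sketched as a small helper that assembles the ``setcap`` command line. This is only an illustration under stated assumptions: the helper name ``setcap_argv`` and the application path are hypothetical, and the code builds the argument list without running anything. The capability names and their reasons are taken from the text.

```python
# Capabilities needed to run as non-root, with the reason for each
# (as listed in the section above).
CAPABILITIES = {
    "cap_sys_admin": "access to /proc/self/pagemap in PA mode (Linux >= 4.0)",
    "cap_net_admin": "device configuration",
    "cap_net_raw": "raw ethernet queue allocation through kernel driver",
    "cap_ipc_lock": "DMA memory pinning",
}


def setcap_argv(app_path):
    """Build the setcap argument list for a DPDK application (hypothetical helper)."""
    # "+ep" marks the capabilities as effective and permitted.
    caps = ",".join(CAPABILITIES) + "+ep"
    return ["setcap", caps, app_path]


if __name__ == "__main__":
    # "./dpdk-app" is a placeholder for the real application binary.
    print(" ".join(setcap_argv("./dpdk-app")))
```

Keeping the reasons next to the names makes it easier to drop a capability that a given deployment does not need, for example ``cap_sys_admin`` when PA mode is not used.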
|
||||
See :ref:`mlx5_firmware_config` guide.
|
||||
|
||||
Driver options
|
||||
^^^^^^^^^^^^^^
|
||||
~~~~~~~~~~~~~~
|
||||
|
||||
Please refer to :ref:`mlx5 common options <mlx5_common_driver_options>`
|
||||
for an additional list of options shared with other mlx5 drivers.
|
||||
|
||||
- ``rxq_cqe_comp_en`` parameter [int]
|
||||
|
||||
@ -1054,30 +986,6 @@ Driver options
|
||||
|
||||
Disabled by default (set to 0).
|
||||
|
||||
- ``mr_ext_memseg_en`` parameter [int]
|
||||
|
||||
A nonzero value enables extending memseg when registering DMA memory. If
|
||||
enabled, the number of entries in MR (Memory Region) lookup table on datapath
|
||||
is minimized and it benefits performance. On the other hand, it worsens memory
|
||||
utilization because registered memory is pinned by kernel driver. Even if a
|
||||
page in the extended chunk is freed, that doesn't become reusable until the
|
||||
entire memory is freed.
|
||||
|
||||
Enabled by default.
|
||||
|
||||
- ``mr_mempool_reg_en`` parameter [int]
|
||||
|
||||
A nonzero value enables implicit registration of DMA memory of all mempools
|
||||
except those having ``RTE_MEMPOOL_F_NON_IO``. This flag is set automatically
|
||||
for mempools populated with non-contiguous objects or those without IOVA.
|
||||
The effect is that when a packet from a mempool is transmitted,
|
||||
its memory is already registered for DMA in the PMD and no registration
|
||||
will happen on the data path. The tradeoff is extra work on the creation
|
||||
of each mempool and increased HW resource use if some mempools
|
||||
are not used with MLX5 devices.
|
||||
|
||||
Enabled by default.
|
||||
|
||||
- ``representor`` parameter [list]
|
||||
|
||||
This parameter can be used to instantiate DPDK Ethernet devices from
|
||||
@ -1148,13 +1056,6 @@ Driver options
|
||||
|
||||
By default, the PMD will set this value to 0.
|
||||
|
||||
- ``sys_mem_en`` parameter [int]
|
||||
|
||||
A non-zero value enables the PMD memory management allocating memory
|
||||
from system by default, without explicit rte memory flag.
|
||||
|
||||
By default, the PMD will set this value to 0.
|
||||
|
||||
- ``decap_en`` parameter [int]
|
||||
|
||||
Some devices do not support FCS (frame checksum) scattering for
|
||||
@ -1178,253 +1079,6 @@ Driver options
|
||||
|
||||
By default, the PMD will set this value to 1.
|
||||
|
||||
.. _mlx5_firmware_config:
|
||||
|
||||
Firmware configuration
|
||||
~~~~~~~~~~~~~~~~~~~~~~
|
||||
|
||||
Firmware features can be configured as key/value pairs.
|
||||
|
||||
The command to set a value is::
|
||||
|
||||
mlxconfig -d <device> set <key>=<value>
|
||||
|
||||
The command to query a value is::
|
||||
|
||||
mlxconfig -d <device> query | grep <key>
|
||||
|
||||
The device name for the command ``mlxconfig`` can be either the PCI address,
|
||||
or the mst device name found with::
|
||||
|
||||
mst status
|
||||
|
||||
Below are some firmware configurations listed.
|
||||
|
||||
- link type::
|
||||
|
||||
LINK_TYPE_P1
|
||||
LINK_TYPE_P2
|
||||
value: 1=Infiniband 2=Ethernet 3=VPI(auto-sense)
|
||||
|
||||
- enable SR-IOV::
|
||||
|
||||
SRIOV_EN=1
|
||||
|
||||
- maximum number of SR-IOV virtual functions::
|
||||
|
||||
NUM_OF_VFS=<max>
|
||||
|
||||
- enable DevX (required by Direct Rules and other features)::
|
||||
|
||||
UCTX_EN=1
|
||||
|
||||
- aggressive CQE zipping::
|
||||
|
||||
CQE_COMPRESSION=1
|
||||
|
||||
- L3 VXLAN and VXLAN-GPE destination UDP port::
|
||||
|
||||
IP_OVER_VXLAN_EN=1
|
||||
IP_OVER_VXLAN_PORT=<udp dport>
|
||||
|
||||
- enable VXLAN-GPE tunnel flow matching::
|
||||
|
||||
FLEX_PARSER_PROFILE_ENABLE=0
|
||||
or
|
||||
FLEX_PARSER_PROFILE_ENABLE=2
|
||||
|
||||
- enable IP-in-IP tunnel flow matching::
|
||||
|
||||
FLEX_PARSER_PROFILE_ENABLE=0
|
||||
|
||||
- enable MPLS flow matching::
|
||||
|
||||
FLEX_PARSER_PROFILE_ENABLE=1
|
||||
|
||||
- enable ICMP(code/type/identifier/sequence number) / ICMP6(code/type) fields matching::
|
||||
|
||||
FLEX_PARSER_PROFILE_ENABLE=2
|
||||
|
||||
- enable Geneve flow matching::
|
||||
|
||||
FLEX_PARSER_PROFILE_ENABLE=0
|
||||
or
|
||||
FLEX_PARSER_PROFILE_ENABLE=1
|
||||
|
||||
- enable Geneve TLV option flow matching::
|
||||
|
||||
FLEX_PARSER_PROFILE_ENABLE=0
|
||||
|
||||
- enable GTP flow matching::
|
||||
|
||||
FLEX_PARSER_PROFILE_ENABLE=3
|
||||
|
||||
- enable eCPRI flow matching::
|
||||
|
||||
FLEX_PARSER_PROFILE_ENABLE=4
|
||||
PROG_PARSE_GRAPH=1
|
||||
|
||||
- enable dynamic flex parser for flex item::
|
||||
|
||||
FLEX_PARSER_PROFILE_ENABLE=4
|
||||
PROG_PARSE_GRAPH=1
|
||||
|
||||
- enable realtime timestamp format::
|
||||
|
||||
REAL_TIME_CLOCK_ENABLE=1
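
The key/value settings listed above can be combined into a single ``mlxconfig ... set`` invocation. The following sketch only composes the command line and does not execute the tool. The helper name ``mlxconfig_set_argv`` and the PCI address are assumptions made for illustration.

```python
import shlex


def mlxconfig_set_argv(device, settings):
    """Build 'mlxconfig -d <device> set KEY=VALUE ...' as an argument list.

    Hypothetical helper: 'device' is a PCI address or mst device name,
    'settings' maps firmware keys to values.
    """
    if not settings:
        raise ValueError("at least one key/value pair is required")
    argv = ["mlxconfig", "-d", device, "set"]
    argv += [f"{key}={value}" for key, value in settings.items()]
    return argv


if __name__ == "__main__":
    # Placeholder device address; the keys are the eCPRI settings listed above.
    argv = mlxconfig_set_argv(
        "0000:08:00.0",
        {"FLEX_PARSER_PROFILE_ENABLE": 4, "PROG_PARSE_GRAPH": 1},
    )
    print(shlex.join(argv))
```

Returning a list rather than a string lets the caller pass it directly to ``subprocess.run`` without shell quoting issues.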
|
||||
|
||||
Linux Prerequisites
|
||||
-------------------
|
||||
|
||||
This driver relies on external libraries and kernel drivers for resources
|
||||
allocations and initialization. The following dependencies are not part of
|
||||
DPDK and must be installed separately:
|
||||
|
||||
- **libibverbs**
|
||||
|
||||
User space Verbs framework used by librte_net_mlx5. This library provides
|
||||
a generic interface between the kernel and low-level user space drivers
|
||||
such as libmlx5.
|
||||
|
||||
It allows slow and privileged operations (context initialization, hardware
|
||||
resources allocations) to be managed by the kernel and fast operations to
|
||||
never leave user space.
|
||||
|
||||
- **libmlx5**
|
||||
|
||||
Low-level user space driver library for Mellanox
|
||||
ConnectX-4/ConnectX-5/ConnectX-6/BlueField devices, it is automatically loaded
|
||||
by libibverbs.
|
||||
|
||||
This library basically implements send/receive calls to the hardware
|
||||
queues.
|
||||
|
||||
- **Kernel modules**
|
||||
|
||||
They provide the kernel-side Verbs API and low level device drivers that
|
||||
manage actual hardware initialization and resources sharing with user
|
||||
space processes.
|
||||
|
||||
Unlike most other PMDs, these modules must remain loaded and bound to
|
||||
their devices:
|
||||
|
||||
- mlx5_core: hardware driver managing Mellanox
|
||||
ConnectX-4/ConnectX-5/ConnectX-6/BlueField devices and related Ethernet kernel
|
||||
network devices.
|
||||
- mlx5_ib: InfiniBand device driver.
|
||||
- ib_uverbs: user space driver for Verbs (entry point for libibverbs).
|
||||
|
||||
- **Firmware update**
|
||||
|
||||
Mellanox OFED/EN releases include firmware updates for
|
||||
ConnectX-4/ConnectX-5/ConnectX-6/BlueField adapters.
|
||||
|
||||
Because each release provides new features, these updates must be applied to
|
||||
match the kernel modules and libraries they come with.
|
||||
|
||||
.. note::
|
||||
|
||||
Both libraries are BSD and GPL licensed. Linux kernel modules are GPL
|
||||
licensed.
|
||||
|
||||
Installation
|
||||
~~~~~~~~~~~~
|
||||
|
||||
Either RDMA Core library with a recent enough Linux kernel release
|
||||
(recommended) or Mellanox OFED/EN, which provides compatibility with older
|
||||
releases.
|
||||
|
||||
RDMA Core with Linux Kernel
|
||||
^^^^^^^^^^^^^^^^^^^^^^^^^^^
|
||||
|
||||
- Minimal kernel version : v4.14 or the most recent 4.14-rc (see `Linux installation documentation`_)
|
||||
- Minimal rdma-core version: v15+ commit 0c5f5765213a ("Merge pull request #227 from yishaih/tm")
|
||||
(see `RDMA Core installation documentation`_)
|
||||
- When building for i686 use:
|
||||
|
||||
- rdma-core version 18.0 or above built with 32bit support.
|
||||
- Kernel version 4.14.41 or above.
|
||||
|
||||
- Starting with rdma-core v21, static libraries can be built::
|
||||
|
||||
cd build
|
||||
CFLAGS=-fPIC cmake -DIN_PLACE=1 -DENABLE_STATIC=1 -GNinja ..
|
||||
ninja
|
||||
|
||||
.. _`Linux installation documentation`: https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux-stable.git/plain/Documentation/admin-guide/README.rst
|
||||
.. _`RDMA Core installation documentation`: https://raw.githubusercontent.com/linux-rdma/rdma-core/master/README.md
|
||||
|
||||
|
||||
Mellanox OFED/EN
|
||||
^^^^^^^^^^^^^^^^
|
||||
|
||||
- Mellanox OFED version: **4.5** and above /
|
||||
Mellanox EN version: **4.5** and above
|
||||
- firmware version:
|
||||
|
||||
- ConnectX-4: **12.21.1000** and above.
|
||||
- ConnectX-4 Lx: **14.21.1000** and above.
|
||||
- ConnectX-5: **16.21.1000** and above.
|
||||
- ConnectX-5 Ex: **16.21.1000** and above.
|
||||
- ConnectX-6: **20.27.0090** and above.
|
||||
- ConnectX-6 Dx: **22.27.0090** and above.
|
||||
- BlueField: **18.25.1010** and above.
|
||||
|
||||
While these libraries and kernel modules are available on OpenFabrics
|
||||
Alliance's `website <https://www.openfabrics.org/>`__ and provided by package
|
||||
managers on most distributions, this PMD requires Ethernet extensions that
|
||||
may not be supported at the moment (this is a work in progress).
|
||||
|
||||
`Mellanox OFED
|
||||
<https://network.nvidia.com/products/infiniband-drivers/linux/mlnx_ofed/>`__ and
|
||||
`Mellanox EN
|
||||
<https://network.nvidia.com/products/ethernet-drivers/linux/mlnx_en/>`__
|
||||
include the necessary support and should be used in the meantime. For DPDK,
|
||||
only libibverbs, libmlx5, mlnx-ofed-kernel packages and firmware updates are
|
||||
required from that distribution.
|
||||
|
||||
.. note::
|
||||
|
||||
Several versions of Mellanox OFED/EN are available. Installing the version
|
||||
this DPDK release was developed and tested against is strongly
|
||||
recommended. Please check the `linux prerequisites`_.
|
||||
|
||||
Windows Prerequisites
|
||||
---------------------
|
||||
|
||||
This driver relies on external libraries and kernel drivers for resources
|
||||
allocations and initialization. The dependencies in the following sub-sections
|
||||
are not part of DPDK, and must be installed separately.
|
||||
|
||||
Compilation Prerequisites
|
||||
~~~~~~~~~~~~~~~~~~~~~~~~~
|
||||
|
||||
DevX SDK installation
|
||||
^^^^^^^^^^^^^^^^^^^^^
|
||||
|
||||
The DevX SDK must be installed on the machine building the Windows PMD.
|
||||
Additional information can be found at
|
||||
`How to Integrate Windows DevX in Your Development Environment
|
||||
<https://docs.mellanox.com/display/winof2v250/RShim+Drivers+and+Usage#RShimDriversandUsage-DevXInterface>`__.
|
||||
|
||||
Runtime Prerequisites
|
||||
~~~~~~~~~~~~~~~~~~~~~
|
||||
|
||||
WinOF2 version 2.60 or higher must be installed on the machine.
|
||||
|
||||
WinOF2 installation
|
||||
^^^^^^^^^^^^^^^^^^^
|
||||
|
||||
The driver can be downloaded from the following site:
|
||||
`WINOF2
|
||||
<https://www.mellanox.com/products/adapter-software/ethernet/windows/winof-2>`__
|
||||
|
||||
DevX Enablement
|
||||
^^^^^^^^^^^^^^^
|
||||
|
||||
DevX for Windows must be enabled in the Windows registry.
|
||||
The keys ``DevxEnabled`` and ``DevxFsRules`` must be set.
|
||||
Additional information can be found in the WinOF2 user manual.
|
||||
|
||||
Supported NICs
|
||||
--------------
|
||||
@ -1470,149 +1124,21 @@ Below are detailed device names:
|
||||
* Mellanox\ |reg| ConnectX\ |reg|-6 Dx EN 200G MCX623105AN-VDAT (1x200G)
|
||||
* Mellanox\ |reg| ConnectX\ |reg|-6 Lx EN 25G MCX631102AN-ADAT (2x25G)
|
||||
|
||||
Quick Start Guide on OFED/EN
|
||||
----------------------------
|
||||
|
||||
1. Download latest Mellanox OFED/EN. For more info check the `linux prerequisites`_.
|
||||
Sub-Function
|
||||
------------
|
||||
|
||||
|
||||
2. Install the required libraries and kernel modules either by installing
|
||||
only the required set, or by installing the entire Mellanox OFED/EN::
|
||||
|
||||
./mlnxofedinstall --upstream-libs --dpdk
|
||||
|
||||
3. Verify the firmware is the correct one::
|
||||
|
||||
ibv_devinfo
|
||||
|
||||
4. Verify all ports links are set to Ethernet::
|
||||
|
||||
mlxconfig -d <mst device> query | grep LINK_TYPE
|
||||
LINK_TYPE_P1 ETH(2)
|
||||
LINK_TYPE_P2 ETH(2)
|
||||
|
||||
Link types may have to be configured to Ethernet::
|
||||
|
||||
mlxconfig -d <mst device> set LINK_TYPE_P1/2=1/2/3
|
||||
|
||||
* LINK_TYPE_P1=<1|2|3> , 1=Infiniband 2=Ethernet 3=VPI(auto-sense)
|
||||
|
||||
For hypervisors, verify SR-IOV is enabled on the NIC::
|
||||
|
||||
mlxconfig -d <mst device> query | grep SRIOV_EN
|
||||
SRIOV_EN True(1)
|
||||
|
||||
If needed, configure SR-IOV::
|
||||
|
||||
mlxconfig -d <mst device> set SRIOV_EN=1 NUM_OF_VFS=16
|
||||
mlxfwreset -d <mst device> reset
|
||||
|
||||
5. Restart the driver::
|
||||
|
||||
/etc/init.d/openibd restart
|
||||
|
||||
or::
|
||||
|
||||
service openibd restart
|
||||
|
||||
If link type was changed, firmware must be reset as well::
|
||||
|
||||
mlxfwreset -d <mst device> reset
|
||||
|
||||
For hypervisors, after reset write the sysfs number of virtual functions
|
||||
needed for the PF.
|
||||
|
||||
To dynamically instantiate a given number of virtual functions (VFs)::
|
||||
|
||||
echo [num_vfs] > /sys/class/infiniband/mlx5_0/device/sriov_numvfs
|
||||
|
||||
6. Install DPDK and you are ready to go.
|
||||
See :doc:`compilation instructions <../linux_gsg/build_dpdk>`.
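
The link type check in step 4 can be automated by parsing the ``mlxconfig query`` output. The sketch below assumes each relevant line has the form ``LINK_TYPE_P1 ETH(2)`` shown above. The ``IB(1)`` label used in the sample is an assumption about how the tool prints InfiniBand ports, and the function names are hypothetical.

```python
import re

# Numeric value that selects Ethernet, per the list above
# (1=Infiniband 2=Ethernet 3=VPI).
ETHERNET = 2

_LINK_TYPE_RE = re.compile(
    r"^[ \t]*(LINK_TYPE_P\d+)[ \t]+\S*\((\d+)\)[ \t]*$", re.MULTILINE
)


def parse_link_types(query_output):
    """Map each LINK_TYPE_P<n> key to its numeric value."""
    return {key: int(value) for key, value in _LINK_TYPE_RE.findall(query_output)}


def ports_needing_ethernet(query_output):
    """Return the LINK_TYPE keys whose value is not Ethernet."""
    link_types = parse_link_types(query_output)
    return sorted(key for key, value in link_types.items() if value != ETHERNET)


if __name__ == "__main__":
    sample = "LINK_TYPE_P1 ETH(2)\nLINK_TYPE_P2 IB(1)\n"
    print(ports_needing_ethernet(sample))
```

Any key returned by ``ports_needing_ethernet`` would then need the ``mlxconfig ... set`` and firmware reset steps described above.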
|
||||
|
||||
Enable switchdev mode
|
||||
---------------------
|
||||
|
||||
Switchdev mode is a mode in E-Switch, that binds between representor and VF or SF.
|
||||
Representor is a port in DPDK that is connected to a VF or SF in such a way
|
||||
that assuming there are no offload flows, each packet that is sent from the VF or SF
|
||||
will be received by the corresponding representor. While each packet that is
|
||||
sent to a representor will be received by the VF or SF.
|
||||
This is very useful in case of SRIOV mode, where the first packet that is sent
|
||||
by the VF or SF will be received by the DPDK application which will decide if this
|
||||
flow should be offloaded to the E-Switch. After the flow is offloaded, packets
sent by the VF or SF that match the flow will no longer be received by
the DPDK application.
|
||||
|
||||
1. Enable SRIOV mode::
|
||||
|
||||
mlxconfig -d <mst device> set SRIOV_EN=true
|
||||
|
||||
2. Configure the max number of VFs::
|
||||
|
||||
mlxconfig -d <mst device> set NUM_OF_VFS=<num of vfs>
|
||||
|
||||
3. Reset the FW::
|
||||
|
||||
mlxfwreset -d <mst device> reset
|
||||
|
||||
3. Configure the actual number of VFs::
|
||||
|
||||
echo <num of vfs> > /sys/class/net/<net device>/device/sriov_numvfs
|
||||
|
||||
4. Unbind the device (it can be rebound after enabling switchdev mode)::
|
||||
|
||||
echo -n "<device pci address>" > /sys/bus/pci/drivers/mlx5_core/unbind
|
||||
|
||||
5. Enable switchdev mode::
|
||||
|
||||
echo switchdev > /sys/class/net/<net device>/compat/devlink/mode
|
||||
|
||||
Sub-Function support
|
||||
--------------------
|
||||
|
||||
A Sub-Function (SF) is a portion of the PCI device; an SF netdev has its own
|
||||
dedicated queues (txq, rxq).
|
||||
A SF shares PCI level resources with other SFs and/or with its parent PCI function.
|
||||
|
||||
0. Requirement::
|
||||
|
||||
OFED version >= 5.4-0.3.3.0
|
||||
|
||||
1. Configure SF feature::
|
||||
|
||||
# Run mlxconfig on both PFs on host and ECPFs on BlueField.
|
||||
mlxconfig -d <mst device> set PER_PF_NUM_SF=1 PF_TOTAL_SF=252 PF_SF_BAR_SIZE=12
|
||||
|
||||
2. Enable switchdev mode::
|
||||
|
||||
mlxdevm dev eswitch set pci/<DBDF> mode switchdev
|
||||
|
||||
3. Add SF port::
|
||||
|
||||
mlxdevm port add pci/<DBDF> flavour pcisf pfnum 0 sfnum <sfnum>
|
||||
|
||||
Get SFID from output: pci/<DBDF>/<SFID>
|
||||
|
||||
4. Modify MAC address::
|
||||
|
||||
mlxdevm port function set pci/<DBDF>/<SFID> hw_addr <MAC>
|
||||
|
||||
5. Activate SF port::
|
||||
|
||||
mlxdevm port function set pci/<DBDF>/<ID> state active
|
||||
|
||||
6. Devargs to probe SF device::
|
||||
|
||||
auxiliary:mlx5_core.sf.<num>,dv_flow_en=1
|
||||
See :ref:`mlx5_sub_function`.
|
||||
|
||||
Sub-Function representor support
|
||||
--------------------------------
|
||||
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
|
||||
|
||||
A SF netdev supports E-Switch representation offload
|
||||
similar to PF and VF representors.
|
||||
Use <sfnum> to probe SF representor::
|
||||
|
||||
testpmd> port attach <PCI_BDF>,representor=sf<sfnum>,dv_flow_en=1
|
||||
testpmd> port attach <PCI_BDF>,representor=sf<sfnum>,dv_flow_en=1
|
||||
|
||||
|
||||
Performance tuning
|
||||
------------------
|
||||
|
@ -14,4 +14,5 @@ The following are platform specific guides and setup information.
|
||||
cnxk
|
||||
dpaa
|
||||
dpaa2
|
||||
mlx5
|
||||
octeontx
|
||||
|
602
doc/guides/platform/mlx5.rst
Normal file
@ -0,0 +1,602 @@
|
||||
.. SPDX-License-Identifier: BSD-3-Clause
|
||||
Copyright 2022 6WIND S.A.
|
||||
Copyright (c) 2022 NVIDIA Corporation & Affiliates
|
||||
|
||||
.. include:: <isonum.txt>
|
||||
|
||||
MLX5 Common Driver
|
||||
==================
|
||||
|
||||
The mlx5 common driver library (**librte_common_mlx5**) provides support for
|
||||
**Mellanox ConnectX-4**, **Mellanox ConnectX-4 Lx**, **Mellanox ConnectX-5**,
|
||||
**Mellanox ConnectX-6**, **Mellanox ConnectX-6 Dx**, **Mellanox ConnectX-6 Lx**,
|
||||
**Mellanox BlueField** and **Mellanox BlueField-2** families of
|
||||
10/25/40/50/100/200 Gb/s adapters.
|
||||
|
||||
Information and documentation for these adapters can be found on the
|
||||
`NVIDIA website <https://www.nvidia.com/en-us/networking/>`_.
|
||||
Help is also provided by the
|
||||
`Mellanox community <http://community.mellanox.com/welcome>`_.
|
||||
In addition, there is a `web section dedicated to the Poll Mode Driver
|
||||
<https://developer.nvidia.com/networking/dpdk>`_.
|
||||
|
||||
|
||||
Design
|
||||
------
|
||||
|
||||
For security reasons and to enhance robustness,
|
||||
this driver only handles virtual memory addresses.
|
||||
The way resource allocations are handled by the kernel,
|
||||
combined with hardware specifications that allow handling virtual memory addresses directly,
|
||||
ensures that DPDK applications cannot access random physical memory
|
||||
(or memory that does not belong to the current process).
|
||||
|
||||
There are different levels of objects and bypassing abilities
|
||||
which are used to get the best performance:
|
||||
|
||||
- **Verbs** is a complete high-level generic API
|
||||
- **Direct Verbs** is a device-specific API
|
||||
- **DevX** allows accessing firmware objects
|
||||
- **Direct Rules** manages flow steering at the low-level hardware layer
|
||||
|
||||
On Linux, the above interfaces are provided by linking with `libibverbs` and `libmlx5`.
|
||||
See :ref:`mlx5_linux_prerequisites` for installation.
|
||||
|
||||
On Windows, DevX is the only requirement from the above list.
|
||||
See :ref:`mlx5_windows_prerequisites` for DevX SDK package installation.
|
||||
|
||||
|
||||
.. _mlx5_classes:
|
||||
|
||||
Classes
|
||||
-------
|
||||
|
||||
One mlx5 device can be probed by a number of different PMDs.
|
||||
To select a specific PMD, its name should be specified as a device parameter
|
||||
(e.g. ``0000:08:00.1,class=eth``).
|
||||
|
||||
In order to allow probing by multiple PMDs,
|
||||
several classes may be listed separated by a colon.
|
||||
For example: ``class=crypto:regex`` will probe both Crypto and RegEx PMDs.
|
||||
|
||||
|
||||
Supported Classes
|
||||
~~~~~~~~~~~~~~~~~
|
||||
|
||||
- ``class=compress`` for :doc:`../../compressdevs/mlx5`.
|
||||
- ``class=crypto`` for :doc:`../../cryptodevs/mlx5`.
|
||||
- ``class=eth`` for :doc:`../../nics/mlx5`.
|
||||
- ``class=regex`` for :doc:`../../regexdevs/mlx5`.
|
||||
- ``class=vdpa`` for :doc:`../../vdpadevs/mlx5`.
|
||||
|
||||
By default, the mlx5 device will be probed by the ``eth`` PMD.
|
||||
|
||||
|
||||
Limitations
|
||||
~~~~~~~~~~~
|
||||
|
||||
- ``eth`` and ``vdpa`` PMDs cannot be probed at the same time.
|
||||
All other combinations are possible.
|
||||
|
||||
- On Windows, only ``eth`` and ``crypto`` are supported.
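
The class selection rules above can be sketched as a helper that assembles the ``class=`` device argument and rejects unsupported combinations. This is an illustrative sketch, not part of DPDK: the function name ``build_class_devarg`` and its ``windows`` flag are hypothetical, while the class names and the two limitations are taken from the text.

```python
SUPPORTED_CLASSES = ("compress", "crypto", "eth", "regex", "vdpa")
WINDOWS_CLASSES = ("eth", "crypto")


def build_class_devarg(pci_addr, classes, windows=False):
    """Return a device string such as '0000:08:00.1,class=crypto:regex'."""
    if not classes:
        raise ValueError("at least one class is required")
    for name in classes:
        if name not in SUPPORTED_CLASSES:
            raise ValueError(f"unknown mlx5 class: {name}")
        if windows and name not in WINDOWS_CLASSES:
            raise ValueError(f"class {name} is not supported on Windows")
    # eth and vdpa are the only pair that cannot be probed together.
    if "eth" in classes and "vdpa" in classes:
        raise ValueError("eth and vdpa cannot be probed at the same time")
    # Several classes are listed separated by a colon.
    return f"{pci_addr},class={':'.join(classes)}"


if __name__ == "__main__":
    print(build_class_devarg("0000:08:00.1", ["crypto", "regex"]))
```

The resulting string is the kind of value that would be passed to an EAL device option; checking the combination up front gives a clearer error than a failed probe.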
|
||||
|
||||
|
||||
.. _mlx5_common_compilation:
|
||||
|
||||
Compilation Prerequisites
|
||||
-------------------------
|
||||
|
||||
.. _mlx5_linux_prerequisites:
|
||||
|
||||
Linux Prerequisites
|
||||
~~~~~~~~~~~~~~~~~~~
|
||||
|
||||
This driver relies on external libraries and kernel drivers for resource
|
||||
allocations and initialization.
|
||||
The following dependencies are not part of DPDK and must be installed separately:
|
||||
|
||||
- **libibverbs**
|
||||
|
||||
User space Verbs framework used by ``librte_common_mlx5``.
|
||||
This library provides a generic interface between the kernel
|
||||
and low-level user space drivers such as ``libmlx5``.
|
||||
|
||||
It allows slow and privileged operations (context initialization,
|
||||
hardware resources allocations) to be managed by the kernel
|
||||
and fast operations to never leave user space.
|
||||
|
||||
- **libmlx5**
|
||||
|
||||
Low-level user space driver library for Mellanox devices,
|
||||
it is automatically loaded by ``libibverbs``.
|
||||
|
||||
This library basically implements send/receive calls to the hardware queues.
|
||||
|
||||
- **Kernel modules**
|
||||
|
||||
They provide the kernel-side Verbs API and low level device drivers
|
||||
that manage actual hardware initialization
|
||||
and resources sharing with user-space processes.
|
||||
|
||||
Unlike most other PMDs, these modules must remain loaded and bound to
|
||||
their devices:
|
||||
|
||||
- ``mlx5_core``: hardware driver managing Mellanox devices
|
||||
and related Ethernet kernel network devices.
|
||||
- ``mlx5_ib``: InfiniBand device driver.
|
||||
- ``ib_uverbs``: user space driver for Verbs (entry point for ``libibverbs``).
|
||||
|
||||
- **Firmware update**
|
||||
|
||||
Mellanox OFED/EN releases include firmware updates.
|
||||
|
||||
Because each release provides new features, these updates must be applied to
|
||||
match the kernel modules and libraries they come with.
|
||||
|
||||
Libraries and kernel modules can be provided either by the Linux distribution,
|
||||
or by installing Mellanox OFED/EN which provides compatibility with older kernels.
|
||||
|
||||
|
||||
Upstream Dependencies
^^^^^^^^^^^^^^^^^^^^^

The mlx5 kernel modules are part of upstream Linux.
The minimal supported kernel version is 4.14.
For 32-bit, version 4.14.41 or above is required.

The libraries ``libibverbs`` and ``libmlx5`` are part of ``rdma-core``.
It is packaged by most Linux distributions.
The minimal supported rdma-core version is 16.
For 32-bit, version 18 or above is required.

The rdma-core sources can be downloaded at
https://github.com/linux-rdma/rdma-core

It is possible to build rdma-core as static libraries starting with version 21::

    cd build
    CFLAGS=-fPIC cmake -DIN_PLACE=1 -DENABLE_STATIC=1 -GNinja ..
    ninja


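The kernel version requirement above can be checked from a shell. This is a minimal sketch, assuming a ``sort`` that supports version ordering with ``-V`` (as GNU coreutils does); ``version_ge`` is a hypothetical helper name.

```shell
# Succeeds when version $1 is greater than or equal to version $2.
# Relies on "sort -V" (version sort), which is an assumption about the host tools.
version_ge() {
    [ "$(printf '%s\n%s\n' "$1" "$2" | sort -V | head -n 1)" = "$2" ]
}

# Strip the distribution suffix, e.g. "5.15.0-91-generic" becomes "5.15.0".
kernel=$(uname -r | cut -d- -f1)

if version_ge "$kernel" 4.14; then
    echo "kernel $kernel: meets the 4.14 minimum"
else
    echo "kernel $kernel: older than the 4.14 minimum"
fi
```

On a 32-bit system the same helper can be called with ``4.14.41`` as the second argument.
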
Mellanox OFED/EN
^^^^^^^^^^^^^^^^

The kernel modules and libraries are packaged with other tools
in Mellanox OFED or Mellanox EN.
The minimal supported versions are:

- Mellanox OFED version: **4.5** and above.
- Mellanox EN version: **4.5** and above.
- Firmware version:

  - ConnectX-4: **12.21.1000** and above.
  - ConnectX-4 Lx: **14.21.1000** and above.
  - ConnectX-5: **16.21.1000** and above.
  - ConnectX-5 Ex: **16.21.1000** and above.
  - ConnectX-6: **20.27.0090** and above.
  - ConnectX-6 Dx: **22.27.0090** and above.
  - BlueField: **18.25.1010** and above.
  - BlueField-2: **24.28.1002** and above.

The firmware, the libraries libibverbs, libmlx5, and mlnx-ofed-kernel modules
are packaged in `Mellanox OFED
<https://network.nvidia.com/products/infiniband-drivers/linux/mlnx_ofed/>`_.
After downloading, it can be installed with this command::

   ./mlnxofedinstall --dpdk

`Mellanox EN
<https://network.nvidia.com/products/ethernet-drivers/linux/mlnx_en/>`_
is a smaller package including what is needed for DPDK.
After downloading, it can be installed with this command::

   ./install --dpdk

After installing, the firmware version can be checked::

   ibv_devinfo

.. note::

   Several versions of Mellanox OFED/EN are available. Installing the version
   this DPDK release was developed and tested against is strongly recommended.
   Please check the "Tested Platforms" section in the :doc:`../../rel_notes/index`.


.. _mlx5_windows_prerequisites:

Windows Prerequisites
~~~~~~~~~~~~~~~~~~~~~

The mlx5 PMDs rely on external libraries and kernel drivers
for resource allocation and initialization.


DevX SDK Installation
^^^^^^^^^^^^^^^^^^^^^

The DevX SDK must be installed on the machine building the Windows PMD.
Additional information can be found at
`How to Integrate Windows DevX in Your Development Environment
<https://docs.nvidia.com/networking/display/winof2v260/RShim+Drivers+and+Usage#RShimDriversandUsage-DevXInterface>`_.
The minimal supported WinOF2 version is 2.60.


Compilation Options
-------------------

Compilation on Linux
~~~~~~~~~~~~~~~~~~~~

The ibverbs libraries can be linked with this PMD in a number of ways,
configured by the ``ibverbs_link`` build option:

``shared`` (default)
   The PMD depends on some .so files.

``dlopen``
   Move the glue code for the dependencies into a separate library
   that is loaded on demand with dlopen (see ``MLX5_GLUE_PATH``).
   It makes dependencies on libibverbs and libmlx5 optional,
   and has no performance impact.

``static``
   Embed the static flavor of the dependencies libibverbs and libmlx5
   in the PMD shared library or the executable static binary.


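For illustration, the option is passed to meson like any other DPDK build option. The sketch below assumes it is run from the top of a DPDK source tree and that ``build`` is the chosen build directory name.

```shell
# Configure a build that reaches libibverbs/libmlx5 through the glue library.
# "build" is an arbitrary directory name chosen for this example.
meson setup build -Dibverbs_link=dlopen
ninja -C build
```

Replacing ``dlopen`` with ``static`` or ``shared`` selects one of the other linking modes described above.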
Compilation on Windows
~~~~~~~~~~~~~~~~~~~~~~

The DevX SDK location must be set through two environment variables:

``DEVX_LIB_PATH``
   path to the DevX lib file.

``DEVX_INC_PATH``
   path to the DevX header files.


.. _mlx5_common_env:

Environment Configuration
-------------------------

Linux Environment
~~~~~~~~~~~~~~~~~

The kernel network interfaces are brought up during initialization.
Forcing them down prevents packet reception.

The ethtool operations on the kernel interfaces may also affect the PMD.

Some runtime behaviours may be configured through environment variables.

``MLX5_GLUE_PATH``
   If built with ``ibverbs_link=dlopen``,
   list of directories in which to search for the rdma-core "glue" plug-in,
   separated by colons or semi-colons.

``MLX5_SHUT_UP_BF``
   If Verbs is used (DevX disabled),
   selects the HW queue doorbell register mapping.
   The value 0 means non-cached IO mapping,
   while 1 is a regular memory mapping.

   With regular memory mapping, the register is flushed to HW
   usually when the write-combining buffer becomes full,
   but it depends on CPU design.


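The ``MLX5_GLUE_PATH`` format can be illustrated with a short shell sketch. The directories below are hypothetical, and ``mlx5_glue_dirs`` is not a DPDK tool: it only mimics the separator rule described above, while the PMD performs its own parsing.

```shell
# Hypothetical directories; replace with where the glue plug-in is installed.
MLX5_GLUE_PATH='/opt/dpdk/glue:/usr/local/lib/dpdk-glue'
export MLX5_GLUE_PATH

# Print one search directory per line, treating ':' and ';' as separators.
mlx5_glue_dirs() {
    printf '%s\n' "$1" | tr ':;' '\n\n'
}

mlx5_glue_dirs "$MLX5_GLUE_PATH"
```
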
Port Link with OFED/EN
^^^^^^^^^^^^^^^^^^^^^^

Port links must be set to Ethernet::

   mlxconfig -d <mst device> query | grep LINK_TYPE
   LINK_TYPE_P1                        ETH(2)
   LINK_TYPE_P2                        ETH(2)

   mlxconfig -d <mst device> set LINK_TYPE_P1/2=1/2/3

Link type values are:

* ``1`` Infiniband
* ``2`` Ethernet
* ``3`` VPI (auto-sense)

If the link type was changed, the firmware must be reset as well::

   mlxfwreset -d <mst device> reset


.. _mlx5_vf:

SR-IOV Virtual Function with OFED/EN
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

SR-IOV must be enabled on the NIC.
It can be checked with the following command::

   mlxconfig -d <mst device> query | grep SRIOV_EN
   SRIOV_EN                            True(1)

If needed, configure SR-IOV::

   mlxconfig -d <mst device> set SRIOV_EN=1 NUM_OF_VFS=16
   mlxfwreset -d <mst device> reset

After making the change, restart the driver::

   /etc/init.d/openibd restart

or::

   service openibd restart

Then the virtual functions can be instantiated::

   echo [num_vfs] > /sys/class/infiniband/mlx5_0/device/sriov_numvfs


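The last step above can be wrapped in a small guard. The sketch below is an assumption-laden illustration: ``mlx5_set_vfs`` is a hypothetical helper, and it relies on the standard Linux PCI sysfs attribute ``sriov_totalvfs`` being present next to ``sriov_numvfs``.

```shell
# Hypothetical helper: instantiate VFs only if the device supports that many.
# $1 is the device directory, e.g. /sys/class/infiniband/mlx5_0/device
# $2 is the requested number of virtual functions.
mlx5_set_vfs() {
    dev_dir=$1
    want=$2
    total=$(cat "$dev_dir/sriov_totalvfs") || return 1
    if [ "$want" -gt "$total" ]; then
        echo "requested $want VFs, but the device supports at most $total" >&2
        return 1
    fi
    echo "$want" > "$dev_dir/sriov_numvfs"
}

# Example (requires root and an SR-IOV capable device):
# mlx5_set_vfs /sys/class/infiniband/mlx5_0/device 4
```
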
.. _mlx5_sub_function:

Sub-Function with OFED/EN
^^^^^^^^^^^^^^^^^^^^^^^^^

A Sub-Function (SF) is a portion of the PCI device
with its own dedicated queues.
An SF shares PCI-level resources with other SFs and/or with its parent PCI function.

0. Requirement::

      OFED version >= 5.4-0.3.3.0

1. Configure SF feature::

      # Run mlxconfig on both PFs on host and ECPFs on BlueField.
      mlxconfig -d <mst device> set PER_PF_NUM_SF=1 PF_TOTAL_SF=252 PF_SF_BAR_SIZE=12

2. Enable switchdev mode::

      mlxdevm dev eswitch set pci/<DBDF> mode switchdev

3. Add SF port::

      mlxdevm port add pci/<DBDF> flavour pcisf pfnum 0 sfnum <sfnum>

      Get SFID from output: pci/<DBDF>/<SFID>

4. Modify MAC address::

      mlxdevm port function set pci/<DBDF>/<SFID> hw_addr <MAC>

5. Activate SF port::

      mlxdevm port function set pci/<DBDF>/<SFID> state active

6. Devargs to probe SF device::

      auxiliary:mlx5_core.sf.<num>,class=eth:regex


Enable Switchdev Mode
^^^^^^^^^^^^^^^^^^^^^

Switchdev mode is an E-Switch mode that binds a representor to each VF or SF.
A representor is a DPDK port connected to a VF or SF in such a way that,
assuming there are no offload flows, each packet sent from the VF or SF
is received by the corresponding representor,
and each packet sent to the representor is received by the VF or SF.

After :ref:`configuring VF <mlx5_vf>`, the device must be unbound::

   printf "<device pci address>" > /sys/bus/pci/drivers/mlx5_core/unbind

Then switchdev mode is enabled::

   echo switchdev > /sys/class/net/<net device>/compat/devlink/mode

The device can be bound again at this point.


Run as Non-Root
^^^^^^^^^^^^^^^

In order to run as a non-root user,
some capabilities must be granted to the application::

   setcap cap_sys_admin,cap_net_admin,cap_net_raw,cap_ipc_lock+ep <dpdk-app>

Each capability is needed for the following reason:

``cap_sys_admin``
   When using physical addresses (PA mode), with Linux >= 4.0,
   for access to ``/proc/self/pagemap``.

``cap_net_admin``
   For device configuration.

``cap_net_raw``
   For raw Ethernet queue allocation through the kernel driver.

``cap_ipc_lock``
   For DMA memory pinning.


Windows Environment
~~~~~~~~~~~~~~~~~~~

WinOF2 version 2.60 or higher must be installed on the machine.


WinOF2 Installation
^^^^^^^^^^^^^^^^^^^

The driver can be downloaded from the following site: `WINOF2
<https://network.nvidia.com/products/adapter-software/ethernet/windows/winof-2/>`_.


DevX Enablement
^^^^^^^^^^^^^^^

DevX for Windows must be enabled in the Windows registry.
The keys ``DevxEnabled`` and ``DevxFsRules`` must be set.
Additional information can be found in the WinOF2 user manual.


.. _mlx5_firmware_config:

Firmware Configuration
~~~~~~~~~~~~~~~~~~~~~~

Firmware features can be configured as key/value pairs.

The command to set a value is::

   mlxconfig -d <device> set <key>=<value>

The command to query a value is::

   mlxconfig -d <device> query <key>

The device name for the command ``mlxconfig`` can be either the PCI address,
or the mst device name found with::

   mst status

Some firmware configurations are listed below.

- link type::

     LINK_TYPE_P1
     LINK_TYPE_P2
     value: 1=Infiniband 2=Ethernet 3=VPI(auto-sense)

- enable SR-IOV::

     SRIOV_EN=1

- the maximum number of SR-IOV virtual functions::

     NUM_OF_VFS=<max>

- enable DevX (required by Direct Rules and other features)::

     UCTX_EN=1

- aggressive CQE zipping::

     CQE_COMPRESSION=1

- L3 VXLAN and VXLAN-GPE destination UDP port::

     IP_OVER_VXLAN_EN=1
     IP_OVER_VXLAN_PORT=<udp dport>

- enable VXLAN-GPE tunnel flow matching::

     FLEX_PARSER_PROFILE_ENABLE=0
     or
     FLEX_PARSER_PROFILE_ENABLE=2

- enable IP-in-IP tunnel flow matching::

     FLEX_PARSER_PROFILE_ENABLE=0

- enable MPLS flow matching::

     FLEX_PARSER_PROFILE_ENABLE=1

- enable ICMP(code/type/identifier/sequence number) / ICMP6(code/type) fields matching::

     FLEX_PARSER_PROFILE_ENABLE=2

- enable Geneve flow matching::

     FLEX_PARSER_PROFILE_ENABLE=0
     or
     FLEX_PARSER_PROFILE_ENABLE=1

- enable Geneve TLV option flow matching::

     FLEX_PARSER_PROFILE_ENABLE=0

- enable GTP flow matching::

     FLEX_PARSER_PROFILE_ENABLE=3

- enable eCPRI flow matching::

     FLEX_PARSER_PROFILE_ENABLE=4
     PROG_PARSE_GRAPH=1

- enable dynamic flex parser for flex item::

     FLEX_PARSER_PROFILE_ENABLE=4
     PROG_PARSE_GRAPH=1

- enable realtime timestamp format::

     REAL_TIME_CLOCK_ENABLE=1


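The ``FLEX_PARSER_PROFILE_ENABLE`` values listed above can be summarized as a lookup. The sketch below only restates that list; the function name and the feature labels are hypothetical and exist for this illustration only.

```shell
# Lookup of the FLEX_PARSER_PROFILE_ENABLE values listed above.
# The function and feature names are hypothetical labels for this sketch.
# Note: eCPRI and flex item additionally need PROG_PARSE_GRAPH=1.
mlx5_flex_profiles() {
    case $1 in
        vxlan-gpe)       echo "0 2" ;;
        ip-in-ip)        echo "0" ;;
        mpls)            echo "1" ;;
        icmp)            echo "2" ;;
        geneve)          echo "0 1" ;;
        geneve-tlv)      echo "0" ;;
        gtp)             echo "3" ;;
        ecpri|flex-item) echo "4" ;;
        *) echo "unknown feature: $1" >&2; return 1 ;;
    esac
}

mlx5_flex_profiles geneve
# prints 0 1
```

Since the key holds a single value, features whose value lists do not intersect (for example MPLS and GTP) presumably cannot be enabled at the same time.
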
.. _mlx5_common_driver_options:

Device Arguments
----------------

The driver can be configured per device.
A single argument list can be used for a device managed by multiple PMDs.
The parameters must be passed through the EAL option ``-a``,
as in the examples below:

- PCI device::

     -a 0000:03:00.2,class=eth:regex,mr_mempool_reg_en=0

- Auxiliary SF::

     -a auxiliary:mlx5_core.sf.2,class=compress,mr_ext_memseg_en=0

Each device class PMD has its own list of specific arguments,
and below are the arguments supported by the common mlx5 layer.

- ``class`` parameter [string]

  Select the classes of the drivers that should probe the device.
  See :ref:`mlx5_classes` for more explanation and details.

  The default value is ``eth``.

- ``mr_ext_memseg_en`` parameter [int]

  A nonzero value enables extending memseg when registering DMA memory. If
  enabled, the number of entries in the MR (Memory Region) lookup table on the
  datapath is minimized, which benefits performance. On the other hand, it worsens
  memory utilization because registered memory is pinned by the kernel driver.
  Even if a page in the extended chunk is freed, it does not become reusable
  until the entire memory is freed.

  Enabled by default.

- ``mr_mempool_reg_en`` parameter [int]

  A nonzero value enables implicit registration of DMA memory of all mempools
  except those having ``RTE_MEMPOOL_F_NON_IO``. This flag is set automatically
  for mempools populated with non-contiguous objects or those without IOVA.
  The effect is that when a packet from a mempool is transmitted,
  its memory is already registered for DMA in the PMD and no registration
  will happen on the data path. The tradeoff is extra work on the creation
  of each mempool and increased HW resource use if some mempools
  are not used with MLX5 devices.

  Enabled by default.

- ``sys_mem_en`` parameter [int]

  A nonzero value makes the PMD memory management allocate memory
  from the system by default, when no explicit rte memory flag is given.

  By default, the PMD will set this value to 0.

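As a usage sketch for the common arguments above, a device could be probed with ``dpdk-testpmd`` as follows. The PCI address and core list are placeholders, and the command needs a supported device to run.

```shell
# Probe one PCI device with the eth class and disable memseg extension.
# The PCI address and core list are placeholders for this sketch.
dpdk-testpmd -l 0-1 -a 0000:03:00.2,class=eth,mr_ext_memseg_en=0 -- -i
```

Everything after the first comma belongs to the device argument list, so common and class-specific arguments can be mixed in the same ``-a`` option.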
@ -3,10 +3,10 @@

.. include:: <isonum.txt>

MLX5 RegEx driver
MLX5 RegEx Driver
=================

The MLX5 RegEx (Regular Expression) driver library
The mlx5 RegEx (Regular Expression) driver library
(**librte_regex_mlx5**) provides support for **Mellanox BlueField-2**
families of 25/50/100/200 Gb/s adapters.

@ -17,29 +17,21 @@ This PMD is configuring the RegEx HW engine.
For the PMD to work, the application must supply
a precompiled rule file in rof2 format.

The PMD uses libibverbs and libmlx5 to access the device firmware
or directly the hardware components.
There are different levels of objects and bypassing abilities
to get the best performances:

- Verbs is a complete high-level generic API
- Direct Verbs is a device-specific API
- DevX allows to access firmware objects

Enabling librte_regex_mlx5 causes DPDK applications to be linked against
libibverbs.

Mellanox mlx5 pci device can be probed by number of different pci devices,
for example net / vDPA / RegEx. To select the RegEx PMD ``class=regex`` should
be specified as device parameter. The RegEx device can be probed and used with
other Mellanox devices, by adding more options in the class.
For example: ``class=net:regex`` will probe both the net PMD and the RegEx PMD.
See :doc:`../../platform/mlx5` guide for more design details.

Features
--------

- Multi segments mbuf support.

Configuration
-------------

See :ref:`mlx5 common compilation <mlx5_common_compilation>`,
:ref:`mlx5 firmware configuration <mlx5_firmware_config>`,
and :ref:`mlx5 common driver options <mlx5_common_driver_options>`.


Supported NICs
--------------

@ -52,12 +44,8 @@ Prerequisites
- Enable the RegEx capabilities using system call from the BlueField-2.
- Official support is not yet released.


Limitations
-----------

- The firmware version must be greater than XX.31.0364

Run-time configuration
~~~~~~~~~~~~~~~~~~~~~~

- **ethtool** operations on related kernel interfaces also affect the PMD.

@ -3,10 +3,10 @@

.. include:: <isonum.txt>

MLX5 vDPA driver
MLX5 vDPA Driver
================

The MLX5 vDPA (vhost data path acceleration) driver library
The mlx5 vDPA (vhost data path acceleration) driver library
(**librte_vdpa_mlx5**) provides support for **Mellanox ConnectX-6**,
**Mellanox ConnectX-6 Dx** and **Mellanox BlueField** families of
10/25/40/50/100/200 Gb/s adapters as well as their virtual functions (VF) in
@ -17,33 +17,8 @@ SR-IOV context.
This driver is enabled automatically when using "meson" build system which
will detect dependencies.


Design
------

For security reasons and robustness, this driver only deals with virtual
memory addresses. The way resources allocations are handled by the kernel,
combined with hardware specifications that allow to handle virtual memory
addresses directly, ensure that DPDK applications cannot access random
physical memory (or memory that does not belong to the current process).

The PMD can use libibverbs and libmlx5 to access the device firmware
or directly the hardware components.
There are different levels of objects and bypassing abilities
to get the best performances:

- Verbs is a complete high-level generic API
- Direct Verbs is a device-specific API
- DevX allows to access firmware objects
- Direct Rules manages flow steering at low-level hardware layer

Enabling librte_vdpa_mlx5 causes DPDK applications to be linked against
libibverbs.

A Mellanox mlx5 PCI device can be probed by either net/mlx5 driver or vdpa/mlx5
driver but not in parallel. Hence, the user should decide the driver by the
``class`` parameter in the device argument list.
By default, the mlx5 device will be probed by the net/mlx5 driver.
See :doc:`../../platform/mlx5` guide for design details,
and which PMDs can be combined with vDPA PMD.

Supported NICs
--------------
@ -58,52 +33,16 @@ Prerequisites
-------------

- Mellanox OFED version: **5.0**
  see :doc:`../../nics/mlx5` guide for more Mellanox OFED details.

Compilation option
~~~~~~~~~~~~~~~~~~

The meson option ``ibverbs_link`` is **shared** by default,
but can be configured to have the following values:

- ``dlopen``

  Build PMD with additional code to make it loadable without hard
  dependencies on **libibverbs** nor **libmlx5**, which may not be installed
  on the target system.

  In this mode, their presence is still required for it to run properly,
  however their absence won't prevent a DPDK application from starting (with
  DPDK shared build disabled) and they won't show up as missing with ``ldd(1)``.

  It works by moving these dependencies to a purpose-built rdma-core "glue"
  plug-in which must be installed in a directory whose name is based
  on ``RTE_EAL_PMD_PATH`` suffixed with ``-glue``.

  This option has no performance impact.

- ``static``

  Embed static flavor of the dependencies **libibverbs** and **libmlx5**
  in the PMD shared library or the executable static binary.

.. note::

   Default armv8a configuration of meson build sets ``RTE_CACHE_LINE_SIZE``
   to 128 then brings performance degradation.
  See :ref:`mlx5 common prerequisites <mlx5_linux_prerequisites>` for more details.

Run-time configuration
~~~~~~~~~~~~~~~~~~~~~~

- **ethtool** operations on related kernel interfaces also affect the PMD.

Driver options
^^^^^^^^^^^^^^

- ``class`` parameter [string]

  Select the class of the driver that should probe the device.
  `vdpa` for the mlx5 vDPA driver.
Please refer to :ref:`mlx5 common options <mlx5_common_driver_options>`
for an additional list of options shared with other mlx5 drivers.

- ``event_mode`` parameter [int]

@ -163,18 +102,6 @@ Driver options
  - 0, HW default.


Devargs example
^^^^^^^^^^^^^^^

- PCI devargs::

     -a 0000:03:00.2,class=vdpa

- Auxiliary devargs::

     -a auxiliary:mlx5_core.sf.2,class=vdpa


Error handling
^^^^^^^^^^^^^^
