9accffad65
Add warning and counter to handle the malicious driver detection (MDD) event. When the hardware determines that a malicious driver on VF, this VF will become unworkable, the PF records and gives a warning message. Signed-off-by: Tao Zhu <taox.zhu@intel.com> Acked-by: Qiming Yang <qiming.yang@intel.com> Acked-by: Xiaolong Ye <xiaolong.ye@intel.com>
732 lines
28 KiB
ReStructuredText
732 lines
28 KiB
ReStructuredText
.. SPDX-License-Identifier: BSD-3-Clause
|
|
Copyright(c) 2016 Intel Corporation.
|
|
|
|
I40E Poll Mode Driver
|
|
======================
|
|
|
|
The i40e PMD (librte_pmd_i40e) provides poll mode driver support for
|
|
10/25/40 Gbps Intel® Ethernet 700 Series Network Adapters based on
|
|
the Intel Ethernet Controller X710/XL710/XXV710 and Intel Ethernet
|
|
Connection X722 (only support part of features).
|
|
|
|
|
|
Features
|
|
--------
|
|
|
|
Features of the i40e PMD are:
|
|
|
|
- Multiple queues for TX and RX
|
|
- Receiver Side Scaling (RSS)
|
|
- MAC/VLAN filtering
|
|
- Packet type information
|
|
- Flow director
|
|
- Cloud filter
|
|
- Checksum offload
|
|
- VLAN/QinQ stripping and inserting
|
|
- TSO offload
|
|
- Promiscuous mode
|
|
- Multicast mode
|
|
- Port hardware statistics
|
|
- Jumbo frames
|
|
- Link state information
|
|
- Link flow control
|
|
- Mirror on port, VLAN and VSI
|
|
- Interrupt mode for RX
|
|
- Scattered and gather for TX and RX
|
|
- Vector Poll mode driver
|
|
- DCB
|
|
- VMDQ
|
|
- SR-IOV VF
|
|
- Hot plug
|
|
- IEEE1588/802.1AS timestamping
|
|
- VF Daemon (VFD) - EXPERIMENTAL
|
|
- Dynamic Device Personalization (DDP)
|
|
- Queue region configuration
|
|
- Virtual Function Port Representors
|
|
- Malicious Device Drive event catch and notify
|
|
|
|
Prerequisites
|
|
-------------
|
|
|
|
- Identifying your adapter using `Intel Support
|
|
<http://www.intel.com/support>`_ and get the latest NVM/FW images.
|
|
|
|
- Follow the DPDK :ref:`Getting Started Guide for Linux <linux_gsg>` to setup the basic DPDK environment.
|
|
|
|
- To get better performance on Intel platforms, please follow the "How to get best performance with NICs on Intel platforms"
|
|
section of the :ref:`Getting Started Guide for Linux <linux_gsg>`.
|
|
|
|
- Upgrade the NVM/FW version following the `Intel® Ethernet NVM Update Tool Quick Usage Guide for Linux
|
|
<https://www-ssl.intel.com/content/www/us/en/embedded/products/networking/nvm-update-tool-quick-linux-usage-guide.html>`_ and `Intel® Ethernet NVM Update Tool: Quick Usage Guide for EFI <https://www.intel.com/content/www/us/en/embedded/products/networking/nvm-update-tool-quick-efi-usage-guide.html>`_ if needed.
|
|
|
|
Recommended Matching List
|
|
-------------------------
|
|
|
|
It is highly recommended to upgrade the i40e kernel driver and firmware to
|
|
avoid the compatibility issues with i40e PMD. Here is the suggested matching
|
|
list which has been tested and verified. The detailed information can refer
|
|
to chapter Tested Platforms/Tested NICs in release notes.
|
|
|
|
+--------------+-----------------------+------------------+
|
|
| DPDK version | Kernel driver version | Firmware version |
|
|
+==============+=======================+==================+
|
|
| 19.11 | 2.9.21 | 7.00 |
|
|
+--------------+-----------------------+------------------+
|
|
| 19.08 | 2.8.43 | 7.00 |
|
|
+--------------+-----------------------+------------------+
|
|
| 19.05 | 2.7.29 | 6.80 |
|
|
+--------------+-----------------------+------------------+
|
|
| 19.02 | 2.7.26 | 6.80 |
|
|
+--------------+-----------------------+------------------+
|
|
| 18.11 | 2.4.6 | 6.01 |
|
|
+--------------+-----------------------+------------------+
|
|
| 18.08 | 2.4.6 | 6.01 |
|
|
+--------------+-----------------------+------------------+
|
|
| 18.05 | 2.4.6 | 6.01 |
|
|
+--------------+-----------------------+------------------+
|
|
| 18.02 | 2.4.3 | 6.01 |
|
|
+--------------+-----------------------+------------------+
|
|
| 17.11 | 2.1.26 | 6.01 |
|
|
+--------------+-----------------------+------------------+
|
|
| 17.08 | 2.0.19 | 6.01 |
|
|
+--------------+-----------------------+------------------+
|
|
| 17.05 | 1.5.23 | 5.05 |
|
|
+--------------+-----------------------+------------------+
|
|
| 17.02 | 1.5.23 | 5.05 |
|
|
+--------------+-----------------------+------------------+
|
|
| 16.11 | 1.5.23 | 5.05 |
|
|
+--------------+-----------------------+------------------+
|
|
| 16.07 | 1.4.25 | 5.04 |
|
|
+--------------+-----------------------+------------------+
|
|
| 16.04 | 1.4.25 | 5.02 |
|
|
+--------------+-----------------------+------------------+
|
|
|
|
Pre-Installation Configuration
|
|
------------------------------
|
|
|
|
Config File Options
|
|
~~~~~~~~~~~~~~~~~~~
|
|
|
|
The following options can be modified in the ``config`` file.
|
|
Please note that enabling debugging options may affect system performance.
|
|
|
|
- ``CONFIG_RTE_LIBRTE_I40E_PMD`` (default ``y``)
|
|
|
|
Toggle compilation of the ``librte_pmd_i40e`` driver.
|
|
|
|
- ``CONFIG_RTE_LIBRTE_I40E_DEBUG_*`` (default ``n``)
|
|
|
|
Toggle display of generic debugging messages.
|
|
|
|
- ``CONFIG_RTE_LIBRTE_I40E_RX_ALLOW_BULK_ALLOC`` (default ``y``)
|
|
|
|
Toggle bulk allocation for RX.
|
|
|
|
- ``CONFIG_RTE_LIBRTE_I40E_INC_VECTOR`` (default ``n``)
|
|
|
|
Toggle the use of Vector PMD instead of normal RX/TX path.
|
|
To enable vPMD for RX, bulk allocation for Rx must be allowed.
|
|
|
|
- ``CONFIG_RTE_LIBRTE_I40E_16BYTE_RX_DESC`` (default ``n``)
|
|
|
|
Toggle to use a 16-byte RX descriptor, by default the RX descriptor is 32 byte.
|
|
|
|
- ``CONFIG_RTE_LIBRTE_I40E_QUEUE_NUM_PER_PF`` (default ``64``)
|
|
|
|
Number of queues reserved for PF.
|
|
|
|
- ``CONFIG_RTE_LIBRTE_I40E_QUEUE_NUM_PER_VM`` (default ``4``)
|
|
|
|
Number of queues reserved for each VMDQ Pool.
|
|
|
|
Runtime Config Options
|
|
~~~~~~~~~~~~~~~~~~~~~~
|
|
|
|
- ``Reserved number of Queues per VF`` (default ``4``)
|
|
|
|
The number of reserved queue per VF is determined by its host PF. If the
|
|
PCI address of an i40e PF is aaaa:bb.cc, the number of reserved queues per
|
|
VF can be configured with EAL parameter like -w aaaa:bb.cc,queue-num-per-vf=n.
|
|
The value n can be 1, 2, 4, 8 or 16. If no such parameter is configured, the
|
|
number of reserved queues per VF is 4 by default. If VF request more than
|
|
reserved queues per VF, PF will able to allocate max to 16 queues after a VF
|
|
reset.
|
|
|
|
|
|
- ``Support multiple driver`` (default ``disable``)
|
|
|
|
There was a multiple driver support issue during use of 700 series Ethernet
|
|
Adapter with both Linux kernel and DPDK PMD. To fix this issue, ``devargs``
|
|
parameter ``support-multi-driver`` is introduced, for example::
|
|
|
|
-w 84:00.0,support-multi-driver=1
|
|
|
|
With the above configuration, DPDK PMD will not change global registers, and
|
|
will switch PF interrupt from IntN to Int0 to avoid interrupt conflict between
|
|
DPDK and Linux Kernel.
|
|
|
|
- ``Support VF Port Representor`` (default ``not enabled``)
|
|
|
|
The i40e PF PMD supports the creation of VF port representors for the control
|
|
and monitoring of i40e virtual function devices. Each port representor
|
|
corresponds to a single virtual function of that device. Using the ``devargs``
|
|
option ``representor`` the user can specify which virtual functions to create
|
|
port representors for on initialization of the PF PMD by passing the VF IDs of
|
|
the VFs which are required.::
|
|
|
|
-w DBDF,representor=[0,1,4]
|
|
|
|
Currently hot-plugging of representor ports is not supported so all required
|
|
representors must be specified on the creation of the PF.
|
|
|
|
- ``Use latest supported vector`` (default ``disable``)
|
|
|
|
Latest supported vector path may not always get the best perf so vector path was
|
|
recommended to use only on later platform. But users may want the latest vector path
|
|
since it can get better perf in some real work loading cases. So ``devargs`` param
|
|
``use-latest-supported-vec`` is introduced, for example::
|
|
|
|
-w 84:00.0,use-latest-supported-vec=1
|
|
|
|
- ``Enable validation for VF message`` (default ``not enabled``)
|
|
|
|
The PF counts messages from each VF. If in any period of seconds the message
|
|
statistic from a VF exceeds maximal limitation, the PF will ignore any new message
|
|
from that VF for some seconds.
|
|
Format -- "maximal-message@period-seconds:ignore-seconds"
|
|
For example::
|
|
|
|
-w 84:00.0,vf_msg_cfg=80@120:180
|
|
|
|
Vector RX Pre-conditions
|
|
~~~~~~~~~~~~~~~~~~~~~~~~
|
|
For Vector RX it is assumed that the number of descriptor rings will be a power
|
|
of 2. With this pre-condition, the ring pointer can easily scroll back to the
|
|
head after hitting the tail without a conditional check. In addition Vector RX
|
|
can use this assumption to do a bit mask using ``ring_size - 1``.
|
|
|
|
Driver compilation and testing
|
|
------------------------------
|
|
|
|
Refer to the document :ref:`compiling and testing a PMD for a NIC <pmd_build_and_test>`
|
|
for details.
|
|
|
|
|
|
SR-IOV: Prerequisites and sample Application Notes
|
|
--------------------------------------------------
|
|
|
|
#. Load the kernel module:
|
|
|
|
.. code-block:: console
|
|
|
|
modprobe i40e
|
|
|
|
Check the output in dmesg:
|
|
|
|
.. code-block:: console
|
|
|
|
i40e 0000:83:00.1 ens802f0: renamed from eth0
|
|
|
|
#. Bring up the PF ports:
|
|
|
|
.. code-block:: console
|
|
|
|
ifconfig ens802f0 up
|
|
|
|
#. Create VF device(s):
|
|
|
|
Echo the number of VFs to be created into the ``sriov_numvfs`` sysfs entry
|
|
of the parent PF.
|
|
|
|
Example:
|
|
|
|
.. code-block:: console
|
|
|
|
echo 2 > /sys/devices/pci0000:00/0000:00:03.0/0000:81:00.0/sriov_numvfs
|
|
|
|
|
|
#. Assign VF MAC address:
|
|
|
|
Assign MAC address to the VF using iproute2 utility. The syntax is:
|
|
|
|
.. code-block:: console
|
|
|
|
ip link set <PF netdev id> vf <VF id> mac <macaddr>
|
|
|
|
Example:
|
|
|
|
.. code-block:: console
|
|
|
|
ip link set ens802f0 vf 0 mac a0:b0:c0:d0:e0:f0
|
|
|
|
#. Assign VF to VM, and bring up the VM.
|
|
Please see the documentation for the *I40E/IXGBE/IGB Virtual Function Driver*.
|
|
|
|
#. Running testpmd:
|
|
|
|
Follow instructions available in the document
|
|
:ref:`compiling and testing a PMD for a NIC <pmd_build_and_test>`
|
|
to run testpmd.
|
|
|
|
Example output:
|
|
|
|
.. code-block:: console
|
|
|
|
...
|
|
EAL: PCI device 0000:83:00.0 on NUMA socket 1
|
|
EAL: probe driver: 8086:1572 rte_i40e_pmd
|
|
EAL: PCI memory mapped at 0x7f7f80000000
|
|
EAL: PCI memory mapped at 0x7f7f80800000
|
|
PMD: eth_i40e_dev_init(): FW 5.0 API 1.5 NVM 05.00.02 eetrack 8000208a
|
|
Interactive-mode selected
|
|
Configuring Port 0 (socket 0)
|
|
...
|
|
|
|
PMD: i40e_dev_rx_queue_setup(): Rx Burst Bulk Alloc Preconditions are
|
|
satisfied.Rx Burst Bulk Alloc function will be used on port=0, queue=0.
|
|
|
|
...
|
|
Port 0: 68:05:CA:26:85:84
|
|
Checking link statuses...
|
|
Port 0 Link Up - speed 10000 Mbps - full-duplex
|
|
Done
|
|
|
|
testpmd>
|
|
|
|
|
|
Sample Application Notes
|
|
------------------------
|
|
|
|
Vlan filter
|
|
~~~~~~~~~~~
|
|
|
|
Vlan filter only works when Promiscuous mode is off.
|
|
|
|
To start ``testpmd``, and add vlan 10 to port 0:
|
|
|
|
.. code-block:: console
|
|
|
|
./app/testpmd -l 0-15 -n 4 -- -i --forward-mode=mac
|
|
...
|
|
|
|
testpmd> set promisc 0 off
|
|
testpmd> rx_vlan add 10 0
|
|
|
|
|
|
Flow Director
|
|
~~~~~~~~~~~~~
|
|
|
|
The Flow Director works in receive mode to identify specific flows or sets of flows and route them to specific queues.
|
|
The Flow Director filters can match the different fields for different type of packet: flow type, specific input set per flow type and the flexible payload.
|
|
|
|
The default input set of each flow type is::
|
|
|
|
ipv4-other : src_ip_address, dst_ip_address
|
|
ipv4-frag : src_ip_address, dst_ip_address
|
|
ipv4-tcp : src_ip_address, dst_ip_address, src_port, dst_port
|
|
ipv4-udp : src_ip_address, dst_ip_address, src_port, dst_port
|
|
ipv4-sctp : src_ip_address, dst_ip_address, src_port, dst_port,
|
|
verification_tag
|
|
ipv6-other : src_ip_address, dst_ip_address
|
|
ipv6-frag : src_ip_address, dst_ip_address
|
|
ipv6-tcp : src_ip_address, dst_ip_address, src_port, dst_port
|
|
ipv6-udp : src_ip_address, dst_ip_address, src_port, dst_port
|
|
ipv6-sctp : src_ip_address, dst_ip_address, src_port, dst_port,
|
|
verification_tag
|
|
l2_payload : ether_type
|
|
|
|
The flex payload is selected from offset 0 to 15 of packet's payload by default, while it is masked out from matching.
|
|
|
|
Start ``testpmd`` with ``--disable-rss`` and ``--pkt-filter-mode=perfect``:
|
|
|
|
.. code-block:: console
|
|
|
|
./app/testpmd -l 0-15 -n 4 -- -i --disable-rss --pkt-filter-mode=perfect \
|
|
--rxq=8 --txq=8 --nb-cores=8 --nb-ports=1
|
|
|
|
Add a rule to direct ``ipv4-udp`` packet whose ``dst_ip=2.2.2.5, src_ip=2.2.2.3, src_port=32, dst_port=32`` to queue 1:
|
|
|
|
.. code-block:: console
|
|
|
|
testpmd> flow_director_filter 0 mode IP add flow ipv4-udp \
|
|
src 2.2.2.3 32 dst 2.2.2.5 32 vlan 0 flexbytes () \
|
|
fwd pf queue 1 fd_id 1
|
|
|
|
Check the flow director status:
|
|
|
|
.. code-block:: console
|
|
|
|
testpmd> show port fdir 0
|
|
|
|
######################## FDIR infos for port 0 ####################
|
|
MODE: PERFECT
|
|
SUPPORTED FLOW TYPE: ipv4-frag ipv4-tcp ipv4-udp ipv4-sctp ipv4-other
|
|
ipv6-frag ipv6-tcp ipv6-udp ipv6-sctp ipv6-other
|
|
l2_payload
|
|
FLEX PAYLOAD INFO:
|
|
max_len: 16 payload_limit: 480
|
|
payload_unit: 2 payload_seg: 3
|
|
bitmask_unit: 2 bitmask_num: 2
|
|
MASK:
|
|
vlan_tci: 0x0000,
|
|
src_ipv4: 0x00000000,
|
|
dst_ipv4: 0x00000000,
|
|
src_port: 0x0000,
|
|
dst_port: 0x0000
|
|
src_ipv6: 0x00000000,0x00000000,0x00000000,0x00000000,
|
|
dst_ipv6: 0x00000000,0x00000000,0x00000000,0x00000000
|
|
FLEX PAYLOAD SRC OFFSET:
|
|
L2_PAYLOAD: 0 1 2 3 4 5 6 ...
|
|
L3_PAYLOAD: 0 1 2 3 4 5 6 ...
|
|
L4_PAYLOAD: 0 1 2 3 4 5 6 ...
|
|
FLEX MASK CFG:
|
|
ipv4-udp: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
|
|
ipv4-tcp: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
|
|
ipv4-sctp: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
|
|
ipv4-other: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
|
|
ipv4-frag: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
|
|
ipv6-udp: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
|
|
ipv6-tcp: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
|
|
ipv6-sctp: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
|
|
ipv6-other: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
|
|
ipv6-frag: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
|
|
l2_payload: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
|
|
guarant_count: 1 best_count: 0
|
|
guarant_space: 512 best_space: 7168
|
|
collision: 0 free: 0
|
|
maxhash: 0 maxlen: 0
|
|
add: 0 remove: 0
|
|
f_add: 0 f_remove: 0
|
|
|
|
|
|
Delete all flow director rules on a port:
|
|
|
|
.. code-block:: console
|
|
|
|
testpmd> flush_flow_director 0
|
|
|
|
Floating VEB
|
|
~~~~~~~~~~~~~
|
|
|
|
The Intel® Ethernet 700 Series support a feature called
|
|
"Floating VEB".
|
|
|
|
A Virtual Ethernet Bridge (VEB) is an IEEE Edge Virtual Bridging (EVB) term
|
|
for functionality that allows local switching between virtual endpoints within
|
|
a physical endpoint and also with an external bridge/network.
|
|
|
|
A "Floating" VEB doesn't have an uplink connection to the outside world so all
|
|
switching is done internally and remains within the host. As such, this
|
|
feature provides security benefits.
|
|
|
|
In addition, a Floating VEB overcomes a limitation of normal VEBs where they
|
|
cannot forward packets when the physical link is down. Floating VEBs don't need
|
|
to connect to the NIC port so they can still forward traffic from VF to VF
|
|
even when the physical link is down.
|
|
|
|
Therefore, with this feature enabled VFs can be limited to communicating with
|
|
each other but not an outside network, and they can do so even when there is
|
|
no physical uplink on the associated NIC port.
|
|
|
|
To enable this feature, the user should pass a ``devargs`` parameter to the
|
|
EAL, for example::
|
|
|
|
-w 84:00.0,enable_floating_veb=1
|
|
|
|
In this configuration the PMD will use the floating VEB feature for all the
|
|
VFs created by this PF device.
|
|
|
|
Alternatively, the user can specify which VFs need to connect to this floating
|
|
VEB using the ``floating_veb_list`` argument::
|
|
|
|
-w 84:00.0,enable_floating_veb=1,floating_veb_list=1;3-4
|
|
|
|
In this example ``VF1``, ``VF3`` and ``VF4`` connect to the floating VEB,
|
|
while other VFs connect to the normal VEB.
|
|
|
|
The current implementation only supports one floating VEB and one regular
|
|
VEB. VFs can connect to a floating VEB or a regular VEB according to the
|
|
configuration passed on the EAL command line.
|
|
|
|
The floating VEB functionality requires a NIC firmware version of 5.0
|
|
or greater.
|
|
|
|
Dynamic Device Personalization (DDP)
|
|
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
|
|
|
|
The Intel® Ethernet 700 Series except for the Intel Ethernet Connection
|
|
X722 support a feature called "Dynamic Device Personalization (DDP)",
|
|
which is used to configure hardware by downloading a profile to support
|
|
protocols/filters which are not supported by default. The DDP
|
|
functionality requires a NIC firmware version of 6.0 or greater.
|
|
|
|
Current implementation supports GTP-C/GTP-U/PPPoE/PPPoL2TP/ESP,
|
|
steering can be used with rte_flow API.
|
|
|
|
GTPv1 package is released, and it can be downloaded from
|
|
https://downloadcenter.intel.com/download/27587.
|
|
|
|
PPPoE package is released, and it can be downloaded from
|
|
https://downloadcenter.intel.com/download/28040.
|
|
|
|
ESP-AH package is not released yet.
|
|
|
|
Load a profile which supports GTP and store backup profile:
|
|
|
|
.. code-block:: console
|
|
|
|
testpmd> ddp add 0 ./gtp.pkgo,./backup.pkgo
|
|
|
|
Delete a GTP profile and restore backup profile:
|
|
|
|
.. code-block:: console
|
|
|
|
testpmd> ddp del 0 ./backup.pkgo
|
|
|
|
Get loaded DDP package info list:
|
|
|
|
.. code-block:: console
|
|
|
|
testpmd> ddp get list 0
|
|
|
|
Display information about a GTP profile:
|
|
|
|
.. code-block:: console
|
|
|
|
testpmd> ddp get info ./gtp.pkgo
|
|
|
|
Input set configuration
|
|
~~~~~~~~~~~~~~~~~~~~~~~
|
|
Input set for any PCTYPE can be configured with user defined configuration,
|
|
For example, to use only 48bit prefix for IPv6 src address for IPv6 TCP RSS:
|
|
|
|
.. code-block:: console
|
|
|
|
testpmd> port config 0 pctype 43 hash_inset clear all
|
|
testpmd> port config 0 pctype 43 hash_inset set field 13
|
|
testpmd> port config 0 pctype 43 hash_inset set field 14
|
|
testpmd> port config 0 pctype 43 hash_inset set field 15
|
|
|
|
Queue region configuration
|
|
~~~~~~~~~~~~~~~~~~~~~~~~~~~
|
|
The Intel® Ethernet 700 Series supports a feature of queue regions
|
|
configuration for RSS in the PF, so that different traffic classes or
|
|
different packet classification types can be separated to different
|
|
queues in different queue regions. There is an API for configuration
|
|
of queue regions in RSS with a command line. It can parse the parameters
|
|
of the region index, queue number, queue start index, user priority, traffic
|
|
classes and so on. Depending on commands from the command line, it will call
|
|
i40e private APIs and start the process of setting or flushing the queue
|
|
region configuration. As this feature is specific for i40e only private
|
|
APIs are used. These new ``test_pmd`` commands are as shown below. For
|
|
details please refer to :doc:`../testpmd_app_ug/index`.
|
|
|
|
.. code-block:: console
|
|
|
|
testpmd> set port (port_id) queue-region region_id (value) \
|
|
queue_start_index (value) queue_num (value)
|
|
testpmd> set port (port_id) queue-region region_id (value) flowtype (value)
|
|
testpmd> set port (port_id) queue-region UP (value) region_id (value)
|
|
testpmd> set port (port_id) queue-region flush (on|off)
|
|
testpmd> show port (port_id) queue-region
|
|
|
|
Limitations or Known issues
|
|
---------------------------
|
|
|
|
MPLS packet classification
|
|
~~~~~~~~~~~~~~~~~~~~~~~~~~
|
|
|
|
For firmware versions prior to 5.0, MPLS packets are not recognized by the NIC.
|
|
The L2 Payload flow type in flow director can be used to classify MPLS packet
|
|
by using a command in testpmd like:
|
|
|
|
testpmd> flow_director_filter 0 mode IP add flow l2_payload ether \
|
|
0x8847 flexbytes () fwd pf queue <N> fd_id <M>
|
|
|
|
With the NIC firmware version 5.0 or greater, some limited MPLS support
|
|
is added: Native MPLS (MPLS in Ethernet) skip is implemented, while no
|
|
new packet type, no classification or offload are possible. With this change,
|
|
L2 Payload flow type in flow director cannot be used to classify MPLS packet
|
|
as with previous firmware versions. Meanwhile, the Ethertype filter can be
|
|
used to classify MPLS packet by using a command in testpmd like:
|
|
|
|
testpmd> ethertype_filter 0 add mac_ignr 00:00:00:00:00:00 ethertype \
|
|
0x8847 fwd queue <M>
|
|
|
|
16 Byte RX Descriptor setting on DPDK VF
|
|
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
|
|
|
|
Currently the VF's RX descriptor mode is decided by PF. There's no PF-VF
|
|
interface for VF to request the RX descriptor mode, also no interface to notify
|
|
VF its own RX descriptor mode.
|
|
For all available versions of the i40e driver, these drivers don't support 16
|
|
byte RX descriptor. If the Linux i40e kernel driver is used as host driver,
|
|
while DPDK i40e PMD is used as the VF driver, DPDK cannot choose 16 byte receive
|
|
descriptor. The reason is that the RX descriptor is already set to 32 byte by
|
|
the i40e kernel driver. That is to say, user should keep
|
|
``CONFIG_RTE_LIBRTE_I40E_16BYTE_RX_DESC=n`` in config file.
|
|
In the future, if the Linux i40e driver supports 16 byte RX descriptor, user
|
|
should make sure the DPDK VF uses the same RX descriptor mode, 16 byte or 32
|
|
byte, as the PF driver.
|
|
|
|
The same rule for DPDK PF + DPDK VF. The PF and VF should use the same RX
|
|
descriptor mode. Or the VF RX will not work.
|
|
|
|
Receive packets with Ethertype 0x88A8
|
|
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
|
|
|
|
Due to the FW limitation, PF can receive packets with Ethertype 0x88A8
|
|
only when floating VEB is disabled.
|
|
|
|
Incorrect Rx statistics when packet is oversize
|
|
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
|
|
|
|
When a packet is over maximum frame size, the packet is dropped.
|
|
However, the Rx statistics, when calling `rte_eth_stats_get` incorrectly
|
|
shows it as received.
|
|
|
|
VF & TC max bandwidth setting
|
|
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
|
|
|
|
The per VF max bandwidth and per TC max bandwidth cannot be enabled in parallel.
|
|
The behavior is different when handling per VF and per TC max bandwidth setting.
|
|
When enabling per VF max bandwidth, SW will check if per TC max bandwidth is
|
|
enabled. If so, return failure.
|
|
When enabling per TC max bandwidth, SW will check if per VF max bandwidth
|
|
is enabled. If so, disable per VF max bandwidth and continue with per TC max
|
|
bandwidth setting.
|
|
|
|
TC TX scheduling mode setting
|
|
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
|
|
|
|
There are 2 TX scheduling modes for TCs, round robin and strict priority mode.
|
|
If a TC is set to strict priority mode, it can consume unlimited bandwidth.
|
|
It means if APP has set the max bandwidth for that TC, it comes to no
|
|
effect.
|
|
It's suggested to set the strict priority mode for a TC that is latency
|
|
sensitive but no consuming much bandwidth.
|
|
|
|
VF performance is impacted by PCI extended tag setting
|
|
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
|
|
|
|
To reach maximum NIC performance in the VF the PCI extended tag must be
|
|
enabled. The DPDK i40e PF driver will set this feature during initialization,
|
|
but the kernel PF driver does not. So when running traffic on a VF which is
|
|
managed by the kernel PF driver, a significant NIC performance downgrade has
|
|
been observed (for 64 byte packets, there is about 25% line-rate downgrade for
|
|
a 25GbE device and about 35% for a 40GbE device).
|
|
|
|
For kernel version >= 4.11, the kernel's PCI driver will enable the extended
|
|
tag if it detects that the device supports it. So by default, this is not an
|
|
issue. For kernels <= 4.11 or when the PCI extended tag is disabled it can be
|
|
enabled using the steps below.
|
|
|
|
#. Get the current value of the PCI configure register::
|
|
|
|
setpci -s <XX:XX.X> a8.w
|
|
|
|
#. Set bit 8::
|
|
|
|
value = value | 0x100
|
|
|
|
#. Set the PCI configure register with new value::
|
|
|
|
setpci -s <XX:XX.X> a8.w=<value>
|
|
|
|
Vlan strip of VF
|
|
~~~~~~~~~~~~~~~~
|
|
|
|
The VF vlan strip function is only supported in the i40e kernel driver >= 2.1.26.
|
|
|
|
DCB function
|
|
~~~~~~~~~~~~
|
|
|
|
DCB works only when RSS is enabled.
|
|
|
|
Global configuration warning
|
|
~~~~~~~~~~~~~~~~~~~~~~~~~~~~
|
|
|
|
I40E PMD will set some global registers to enable some function or set some
|
|
configure. Then when using different ports of the same NIC with Linux kernel
|
|
and DPDK, the port with Linux kernel will be impacted by the port with DPDK.
|
|
For example, register I40E_GL_SWT_L2TAGCTRL is used to control L2 tag, i40e
|
|
PMD uses I40E_GL_SWT_L2TAGCTRL to set vlan TPID. If setting TPID in port A
|
|
with DPDK, then the configuration will also impact port B in the NIC with
|
|
kernel driver, which don't want to use the TPID.
|
|
So PMD reports warning to clarify what is changed by writing global register.
|
|
|
|
High Performance of Small Packets on 40GbE NIC
|
|
----------------------------------------------
|
|
|
|
As there might be firmware fixes for performance enhancement in latest version
|
|
of firmware image, the firmware update might be needed for getting high performance.
|
|
Check the Intel support website for the latest firmware updates.
|
|
Users should consult the release notes specific to a DPDK release to identify
|
|
the validated firmware version for a NIC using the i40e driver.
|
|
|
|
Use 16 Bytes RX Descriptor Size
|
|
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
|
|
|
|
As i40e PMD supports both 16 and 32 bytes RX descriptor sizes, and 16 bytes size can provide helps to high performance of small packets.
|
|
Configuration of ``CONFIG_RTE_LIBRTE_I40E_16BYTE_RX_DESC`` in config files can be changed to use 16 bytes size RX descriptors.
|
|
|
|
Example of getting best performance with l3fwd example
|
|
------------------------------------------------------
|
|
|
|
The following is an example of running the DPDK ``l3fwd`` sample application to get high performance with a
|
|
server with Intel Xeon processors and Intel Ethernet CNA XL710.
|
|
|
|
The example scenario is to get best performance with two Intel Ethernet CNA XL710 40GbE ports.
|
|
See :numref:`figure_intel_perf_test_setup` for the performance test setup.
|
|
|
|
.. _figure_intel_perf_test_setup:
|
|
|
|
.. figure:: img/intel_perf_test_setup.*
|
|
|
|
Performance Test Setup
|
|
|
|
|
|
1. Add two Intel Ethernet CNA XL710 to the platform, and use one port per card to get best performance.
|
|
The reason for using two NICs is to overcome a PCIe v3.0 limitation since it cannot provide 80GbE bandwidth
|
|
for two 40GbE ports, but two different PCIe v3.0 x8 slot can.
|
|
Refer to the sample NICs output above, then we can select ``82:00.0`` and ``85:00.0`` as test ports::
|
|
|
|
82:00.0 Ethernet [0200]: Intel XL710 for 40GbE QSFP+ [8086:1583]
|
|
85:00.0 Ethernet [0200]: Intel XL710 for 40GbE QSFP+ [8086:1583]
|
|
|
|
2. Connect the ports to the traffic generator. For high speed testing, it's best to use a hardware traffic generator.
|
|
|
|
3. Check the PCI devices numa node (socket id) and get the cores number on the exact socket id.
|
|
In this case, ``82:00.0`` and ``85:00.0`` are both in socket 1, and the cores on socket 1 in the referenced platform
|
|
are 18-35 and 54-71.
|
|
Note: Don't use 2 logical cores on the same core (e.g core18 has 2 logical cores, core18 and core54), instead, use 2 logical
|
|
cores from different cores (e.g core18 and core19).
|
|
|
|
4. Bind these two ports to igb_uio.
|
|
|
|
5. As to Intel Ethernet CNA XL710 40GbE port, we need at least two queue pairs to achieve best performance, then two queues per port
|
|
will be required, and each queue pair will need a dedicated CPU core for receiving/transmitting packets.
|
|
|
|
6. The DPDK sample application ``l3fwd`` will be used for performance testing, with using two ports for bi-directional forwarding.
|
|
Compile the ``l3fwd sample`` with the default lpm mode.
|
|
|
|
7. The command line of running l3fwd would be something like the following::
|
|
|
|
./l3fwd -l 18-21 -n 4 -w 82:00.0 -w 85:00.0 \
|
|
-- -p 0x3 --config '(0,0,18),(0,1,19),(1,0,20),(1,1,21)'
|
|
|
|
This means that the application uses core 18 for port 0, queue pair 0 forwarding, core 19 for port 0, queue pair 1 forwarding,
|
|
core 20 for port 1, queue pair 0 forwarding, and core 21 for port 1, queue pair 1 forwarding.
|
|
|
|
8. Configure the traffic at a traffic generator.
|
|
|
|
* Start creating a stream on packet generator.
|
|
|
|
* Set the Ethernet II type to 0x0800.
|
|
|
|
Tx bytes affected by the link status change
|
|
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
|
|
|
|
For firmware versions prior to 6.01 for X710 series and 3.33 for X722 series, the tx_bytes statistics data is affected by
|
|
the link down event. Each time the link status changes to down, the tx_bytes decreases 110 bytes.
|