.. SPDX-License-Identifier: BSD-3-Clause
   Copyright(c) 2015-2016 Intel Corporation.

FM10K Poll Mode Driver
======================

The FM10K poll mode driver library provides support for the Intel FM10000
(FM10K) family of 40GbE/100GbE adapters.

FTAG Based Forwarding of FM10K
------------------------------

FTAG Based Forwarding is a unique feature of FM10K. The FM10K family of NICs
supports the addition of a Fabric Tag (FTAG) to carry special information.
The FTAG is placed at the beginning of the frame and contains information
such as the packet's source and destination and the VLAN tag. In FTAG based
forwarding mode, the switch logic forwards packets according to glort (global
resource tag) information rather than the MAC and VLAN table. Currently this
feature works only on the PF.

To enable this feature, pass a devargs parameter to the EAL, for example
``-a 84:00.0,enable_ftag=1``. The application must also make sure that an
appropriate FTAG is inserted for every frame on the TX side.
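
As a minimal sketch, an application that assembles its own EAL argument list
could format the device argument as shown here. The helper name
``fm10k_ftag_devarg`` and its buffer handling are illustrative assumptions and
not part of DPDK; only the ``-a`` option and the ``enable_ftag=1`` key come
from the text above.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/*
 * Hypothetical helper (not a DPDK API): write "<pci_addr>,enable_ftag=1"
 * into buf, for use as the value that follows the EAL "-a" option.
 * Returns 0 on success, -1 if the address is empty or buf is too small.
 */
int
fm10k_ftag_devarg(char *buf, size_t len, const char *pci_addr)
{
	int n;

	assert(buf != NULL && len > 0);

	if (pci_addr == NULL || strlen(pci_addr) == 0)
		return -1;

	n = snprintf(buf, len, "%s,enable_ftag=1", pci_addr);
	if (n < 0 || (size_t)n >= len)
		return -1; /* output was truncated */

	return 0;
}
```

The resulting string would be placed after ``-a`` in the argument vector
handed to the EAL; the PCI address ``84:00.0`` is only the example address
used above.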

Vector PMD for FM10K
--------------------

Vector PMD (vPMD) uses Intel® SIMD instructions to optimize packet I/O.
It improves the load/store bandwidth efficiency of the L1 data cache by using
wider SSE/AVX registers.
The wider registers give space to hold multiple packet buffers, which saves
on the number of instructions when bulk processing packets.

There is no change to the PMD API. The RX/TX handlers are the only two entry
points for vPMD packet I/O. They are transparently registered at runtime
if all required conditions are met.

Some constraints apply as pre-conditions for specific optimizations on bulk
packet transfers. The following sections explain the RX and TX constraints in
the vPMD.

RX Constraints
~~~~~~~~~~~~~~

Prerequisites and Pre-conditions
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

For Vector RX it is assumed that the number of descriptors in the ring is a
power of 2. With this pre-condition, the ring pointer can easily wrap back to
the head after hitting the tail without a conditional check. In addition,
Vector RX can use this assumption to apply a bit mask of ``ring_size - 1``.
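
The masking trick can be sketched in a few lines of plain C. The function
names below are illustrative and are not taken from the driver; the sketch
only shows why a power-of-two ring size removes the conditional wrap check.

```c
#include <assert.h>
#include <stdint.h>

/* Illustrative helper: true when n is a non-zero power of two. */
int
ring_size_is_pow2(uint16_t n)
{
	return n != 0 && (n & (n - 1)) == 0;
}

/*
 * Illustrative helper: advance a ring index by 'step' descriptors.
 * With a power-of-two ring size, wrapping is a single AND with
 * (ring_size - 1) instead of a compare-and-branch.
 */
uint16_t
ring_advance(uint16_t idx, uint16_t step, uint16_t ring_size)
{
	assert(ring_size_is_pow2(ring_size));
	return (uint16_t)((idx + step) & (ring_size - 1));
}
```

For a 512-entry ring, advancing index 510 by 4 yields index 2, with no branch
on the wrap.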

Features not Supported by Vector RX PMD
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

Some features are not supported when trying to increase the throughput in
vPMD. They are:

* IEEE1588

* Flow director

* RX checksum offload

Other features are supported using optional MACRO configuration. They include:

* HW VLAN strip

* L3/L4 packet type

To enable them via ``RX_OLFLAGS``, set ``RTE_LIBRTE_FM10K_RX_OLFLAGS_ENABLE=y``.

To guarantee the constraint, the following capabilities in
``dev_conf.rxmode.offloads`` will be checked:

* ``RTE_ETH_RX_OFFLOAD_VLAN_EXTEND``

* ``RTE_ETH_RX_OFFLOAD_CHECKSUM``

RX Burst Size
^^^^^^^^^^^^^

As vPMD is focused on high throughput, it processes 4 packets at a time. It
therefore assumes an RX burst of at least 4 packets. The receive handler
returns zero if ``nb_pkt`` < 4. If ``nb_pkt`` is not a multiple of 4, it is
rounded down to the nearest multiple of 4 (floor alignment).
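
The floor alignment can be illustrated with a short sketch. The macro and
function names are assumptions made for this example and do not come from the
driver source.

```c
#include <assert.h>
#include <stdint.h>

/* Illustrative name: packets handled per loop iteration. */
#define EX_DESCS_PER_LOOP 4

/*
 * Illustrative helper: number of packets a 4-at-a-time receive loop
 * would attempt for a requested burst of nb_pkts. Requests below 4
 * yield 0; other values are rounded down to a multiple of 4.
 */
uint16_t
rx_burst_floor_align(uint16_t nb_pkts)
{
	/* The mask below is only valid for a power-of-two step. */
	assert((EX_DESCS_PER_LOOP & (EX_DESCS_PER_LOOP - 1)) == 0);
	return (uint16_t)(nb_pkts & ~(EX_DESCS_PER_LOOP - 1));
}
```

A request for 7 packets is therefore treated as a request for 4, and a request
for 3 packets returns nothing, so applications would normally pass a burst
size that is a multiple of 4.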

TX Constraint
~~~~~~~~~~~~~

Features not Supported by TX Vector PMD
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

TX vPMD only works when ``offloads`` is set to 0.
This means that it does not support any TX offload.

Limitations
-----------

Switch manager
~~~~~~~~~~~~~~

The Intel FM10000 family of NICs integrates a hardware switch and multiple host
interfaces. The FM10000 PMD only manages the host interfaces. For the
switch component, another switch driver has to be loaded prior to the
FM10000 PMD. The switch driver can be acquired from Intel support.
Only Testpoint is validated with DPDK; the latest version that has been
validated with DPDK is 4.1.6.

Support for Switch Restart
~~~~~~~~~~~~~~~~~~~~~~~~~~

For an FM10000 multi-host based design, a DPDK app running in the VM or host
needs to be aware of the switch's state, since the switch may undergo a
quit-restart. When the switch goes down, the DPDK app receives an LSC event
indicating link status down, and the app should stop the worker threads that
are polling on the Rx/Tx queues. When the switch comes up, an LSC event
indicating ``LINK_UP`` is sent to the app, which can then restart the FM10000
port to resume network processing.

CRC stripping
~~~~~~~~~~~~~

The FM10000 family of NICs strips the CRC from every packet coming into the
host interface, so keeping the CRC is not supported.

Maximum packet length
~~~~~~~~~~~~~~~~~~~~~

The FM10000 family of NICs supports a maximum of a 15K jumbo frame. The value
is fixed and cannot be changed. So, even when the ``rxmode.mtu``
member of ``struct rte_eth_conf`` is set to a value lower than 15364, frames
up to 15364 bytes can still reach the host interface.

Statistic Polling Frequency
~~~~~~~~~~~~~~~~~~~~~~~~~~~

The FM10000 NICs expose a set of statistics via the PCI BARs. These statistics
are read from the hardware registers when ``rte_eth_stats_get()`` or
``rte_eth_xstats_get()`` is called. The packet counting registers are 32 bits
wide, while the byte counting registers are 48 bits wide. As a result, the
statistics must be polled regularly to ensure the consistency of the returned
values.

Given PCIe Gen3 x8, about 50Gbps of traffic can occur. With 64 byte packets
this gives almost 100 million packets/second, causing a 32 bit integer overflow
after approximately 40 seconds. To ensure these overflows are detected and
accounted for in the statistics, it is necessary to read the statistics
regularly. It is suggested to read the stats every 20 seconds, which will
ensure the statistics are accurate.
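
The figures above can be reproduced with a short calculation. The helper
names are illustrative, and the sketch assumes back-to-back 64-byte packets
with no per-packet overhead, which is a simplification.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Packets per second at 'bits_per_sec' for back-to-back frames of
 * 'frame_bytes' bytes (no per-packet overhead is modelled).
 */
double
ex_packets_per_sec(double bits_per_sec, unsigned int frame_bytes)
{
	assert(frame_bytes > 0);
	return bits_per_sec / (8.0 * frame_bytes);
}

/* Seconds until a counter of width 'bits' wraps at a given event rate. */
double
ex_seconds_to_wrap(unsigned int bits, double events_per_sec)
{
	assert(bits > 0 && bits < 64 && events_per_sec > 0.0);
	return (double)(UINT64_C(1) << bits) / events_per_sec;
}
```

With 50 Gbps and 64-byte packets this gives about 97.7 million packets per
second and a wrap time of roughly 44 seconds for a 32-bit packet counter,
consistent with the approximate figure quoted above. Polling every 20 seconds
leaves about a factor of two of margin. Under the same assumptions, a 48-bit
byte counter would take roughly 12.5 hours to wrap.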

Interrupt mode
~~~~~~~~~~~~~~

The FM10000 family of NICs needs one separate interrupt for the mailbox, so
only drivers which support multiple interrupt vectors, e.g. vfio-pci, can work
in fm10k interrupt mode.