Add LINUX VFIO support to feature list.
Add GRE tunneling offload to supported feature list.
Update firmware references to use newer FW image 8.33.12.0. Update
management FW version as well to 8.34.x.x.
Signed-off-by: Rasesh Mody <rasesh.mody@cavium.com>
Remove reference to deprecated ethdev offload parameter from octeontx
documentation.
Fixes: a92870896b ("net/octeontx: use the new offload APIs")
Cc: stable@dpdk.org
Signed-off-by: Jerin Jacob <jerin.jacob@caviumnetworks.com>
Remove reference to deprecated ethdev offload parameter from thunderx
documentation.
Fixes: c97da2cbc2 ("net/thunderx: convert to new offload API")
Cc: stable@dpdk.org
Signed-off-by: Jerin Jacob <jerin.jacob@caviumnetworks.com>
Since we move to new offload APIs, IXGBE_SIMPLE_FLAGS is not used.
Fixes: 51215925a3 ("net/ixgbe: convert to new Tx offloads API")
Signed-off-by: Qi Zhang <qi.z.zhang@intel.com>
Add force_link_up devargs to always force link as up for VFs.
This enables VFs on the same NIC to send traffic to each other
even when physical link is down.
Also add RTE_PMD_REGISTER_PARAM_STRING to export all supported
devargs.
Fixes: 011ebc236d ("net/cxgbe: add skeleton VF driver")
Signed-off-by: Shagun Agrawal <shaguna@chelsio.com>
Signed-off-by: Kumar Sanghvi <kumaras@chelsio.com>
Signed-off-by: Rahul Lakkireddy <rahul.lakkireddy@chelsio.com>
This patch corrects some spelling issues in i40e.rst and clarifies
which controllers and connections are part of the 700 Series.
Signed-off-by: Qiming Yang <qiming.yang@intel.com>
Acked-by: Marko Kovacevic <marko.kovacevic@intel.com>
Multi-Packet Rx Queue (MPRQ a.k.a Striding RQ) can further save PCIe
bandwidth by posting a single large buffer for multiple packets. Instead of
posting a buffer per a packet, one large buffer is posted in order to
receive multiple packets on the buffer. A MPRQ buffer consists of multiple
fixed-size strides and each stride receives one packet.
Rx packet is mem-copied to a user-provided mbuf if the size of Rx packet is
comparatively small, or PMD attaches the Rx packet to the mbuf by external
buffer attachment - rte_pktmbuf_attach_extbuf(). A mempool for external
buffers will be allocated and managed by PMD.
Signed-off-by: Yongseok Koh <yskoh@mellanox.com>
Acked-by: Shahaf Shuler <shahafs@mellanox.com>
This is the new design of Memory Region (MR) for mlx PMD, in order to:
- Accommodate the new memory hotplug model.
- Support non-contiguous Mempool.
There are multiple layers for MR search.
L0 is to look up the last-hit entry which is pointed by mr_ctrl->mru (Most
Recently Used). If L0 misses, L1 is to look up the address in a fixed-sized
array by linear search. L0/L1 is in an inline function -
mlx4_mr_lookup_cache().
If L1 misses, the bottom-half function is called to look up the address
from the bigger local cache of the queue. This is L2 - mlx4_mr_addr2mr_bh()
and it is not an inline function. Data structure for L2 is the Binary Tree.
If L2 misses, the search falls into the slowest path which takes locks in
order to access global device cache (priv->mr.cache) which is also a B-tree
and caches the original MR list (priv->mr.mr_list) of the device. Unless
the global cache is overflowed, it is all-inclusive of the MR list. This is
L3 - mlx4_mr_lookup_dev(). The size of the L3 cache table is limited and
can't be expanded on the fly due to deadlock. Refer to the comments in the
code for the details - mr_lookup_dev(). If L3 is overflowed, the list will
have to be searched directly bypassing the cache although it is slower.
If L3 misses, a new MR for the address should be created -
mlx4_mr_create(). When it creates a new MR, it tries to register adjacent
memsegs as much as possible which are virtually contiguous around the
address. This must take two locks - memory_hotplug_lock and
priv->mr.rwlock. Due to memory_hotplug_lock, there can't be any
allocation/free of memory inside.
In the free callback of the memory hotplug event, freed space is searched
from the MR list and corresponding bits are cleared from the bitmap of MRs.
This can fragment a MR and the MR will have multiple search entries in the
caches. Once there's a change by the event, the global cache must be
rebuilt and all the per-queue caches will be flushed as well. If memory is
frequently freed in run-time, that may cause jitter on dataplane processing
in the worst case by incurring MR cache flush and rebuild. But, it would be
the least probable scenario.
To guarantee the most optimal performance, it is highly recommended to use
an EAL option - '--socket-mem'. Then, the reserved memory will be pinned
and won't be freed dynamically. And it is also recommended to configure
per-lcore cache of Mempool. Even though there're many MRs for a device or
MRs are highly fragmented, the cache of Mempool will be much helpful to
reduce misses on per-queue caches anyway.
'--legacy-mem' is also supported.
Signed-off-by: Yongseok Koh <yskoh@mellanox.com>
This patch removes current support of Memory Region (MR) in order to
accommodate the dynamic memory hotplug patch. This patch can be compiled
but traffic can't flow and HW will raise faults. Subsequent patches will
add new MR support.
Signed-off-by: Yongseok Koh <yskoh@mellanox.com>
This is the new design of Memory Region (MR) for mlx PMD, in order to:
- Accommodate the new memory hotplug model.
- Support non-contiguous Mempool.
There are multiple layers for MR search.
L0 is to look up the last-hit entry which is pointed by mr_ctrl->mru (Most
Recently Used). If L0 misses, L1 is to look up the address in a fixed-sized
array by linear search. L0/L1 is in an inline function -
mlx5_mr_lookup_cache().
If L1 misses, the bottom-half function is called to look up the address
from the bigger local cache of the queue. This is L2 - mlx5_mr_addr2mr_bh()
and it is not an inline function. Data structure for L2 is the Binary Tree.
If L2 misses, the search falls into the slowest path which takes locks in
order to access global device cache (priv->mr.cache) which is also a B-tree
and caches the original MR list (priv->mr.mr_list) of the device. Unless
the global cache is overflowed, it is all-inclusive of the MR list. This is
L3 - mlx5_mr_lookup_dev(). The size of the L3 cache table is limited and
can't be expanded on the fly due to deadlock. Refer to the comments in the
code for the details - mr_lookup_dev(). If L3 is overflowed, the list will
have to be searched directly bypassing the cache although it is slower.
If L3 misses, a new MR for the address should be created -
mlx5_mr_create(). When it creates a new MR, it tries to register adjacent
memsegs as much as possible which are virtually contiguous around the
address. This must take two locks - memory_hotplug_lock and
priv->mr.rwlock. Due to memory_hotplug_lock, there can't be any
allocation/free of memory inside.
In the free callback of the memory hotplug event, freed space is searched
from the MR list and corresponding bits are cleared from the bitmap of MRs.
This can fragment a MR and the MR will have multiple search entries in the
caches. Once there's a change by the event, the global cache must be
rebuilt and all the per-queue caches will be flushed as well. If memory is
frequently freed in run-time, that may cause jitter on dataplane processing
in the worst case by incurring MR cache flush and rebuild. But, it would be
the least probable scenario.
To guarantee the most optimal performance, it is highly recommended to use
an EAL option - '--socket-mem'. Then, the reserved memory will be pinned
and won't be freed dynamically. And it is also recommended to configure
per-lcore cache of Mempool. Even though there're many MRs for a device or
MRs are highly fragmented, the cache of Mempool will be much helpful to
reduce misses on per-queue caches anyway.
'--legacy-mem' is also supported.
Signed-off-by: Yongseok Koh <yskoh@mellanox.com>
This patch removes current support of Memory Region (MR) in order to
accommodate the dynamic memory hotplug patch. This patch can be compiled
but traffic can't flow and HW will raise faults. Subsequent patches will
add new MR support.
Signed-off-by: Yongseok Koh <yskoh@mellanox.com>
There are two capabilities related to CRC stripping:
1. mlx4 HW capability to perform CRC stripping on a received packet.
This capability is built in mlx4 HW. It should be returned by the API
call mlx4_get_rx_queue_offloads().
2. mlx4 driver capability to enable/disable HW CRC stripping. This
capability is dependent on the driver version.
Before this commit the second capability was falsely returned by
the mentioned API. This commit fixes it by returning the first
capability.
mlx4 HW performs CRC stripping by default and this capability is
always reported as "true".
The ability to enable/disable CRC stripping is supported since this
commit and requires OFED version 4.3-1.5.0.0 or rdma-core version v18.
CRC stripping will be done by default regardless of its configuration
when working with OFED or rdma-core versions earlier than those
previously specified or before this commit.
Fixes: de1df14e6e ("net/mlx4: support CRC strip toggling")
Cc: stable@dpdk.org
Signed-off-by: Ophir Munk <ophirmu@mellanox.com>
Report on TAP supported RSS functions as part of dev_infos_get
callback: ETH_RSS_IP, ETH_RSS_UDP and ETH_RSS_TCP.
Known limitation: TAP supports all of the above hash functions together
and not in partial combinations.
Previous to this commit RSS support was reported as none. Since the
introduction of [1] it is required that all RSS configurations will be
verified.
[1] commit 8863a1fbfc ("ethdev: add supported hash function check")
Signed-off-by: Ophir Munk <ophirmu@mellanox.com>
Reviewed-by: Ferruh Yigit <ferruh.yigit@intel.com>
Company policy discourages the mention of unreleased hardware in
guides and release notes.
Fixes: 08df773 ("doc: update enic guide and features")
Signed-off-by: Hyong Youb Kim <hyonkim@cisco.com>
Signed-off-by: John Daley <johndale@cisco.com>
Add more descriptions regarding SR-IOV and RSS settings.
Remove 'Multicast MAC filter' and add 'Allmulticast mode' to the
features.
Cc: stable@dpdk.org
Signed-off-by: Hyong Youb Kim <hyonkim@cisco.com>
Reviewed-by: John Daley <johndale@cisco.com>
Reviewed-by: Aaron Conole <aconole@redhat.com>
New version of the packages with dependencies for the szedata2
driver is needed due to the new API of the libsze2 library which
is used in the driver.
The documentation and the release notes are updated to contain
the information about the required versions.
Signed-off-by: Matej Vido <vido@cesnet.cz>
Acked-by: Jan Remes <remes@netcope.com>
Add device argument to customize Rx descriptor wait timeout which
is supported in DPDK firmware variant only in equal stride super-buffer
Rx mode only.
Signed-off-by: Andrew Rybchenko <arybchenko@solarflare.com>
Reviewed-by: Ivan Malov <ivan.malov@oktetlabs.ru>
DPDK firmware variant supports equal stride super-buffer Rx mode which
provides higher packet rate and packet marks but requires dedicated
mempool manager with contiguous object block allocation (e.g. bucket).
Also the firmware supports subvariant without checksumming on Tx which
allows to reach higher packet rates on transmit if checksumming is not
required.
Signed-off-by: Andrew Rybchenko <arybchenko@solarflare.com>
Reviewed-by: Ivan Malov <ivan.malov@oktetlabs.ru>
HW Rx descriptor represents many contiguous packet buffers which
follow each other. Number of buffers, stride and maximum DMA
length are setup-time configurable per Rx queue based on provided
mempool. The mempool must support contiguous block allocation and
get info API to retrieve number of objects in the block.
Signed-off-by: Andrew Rybchenko <arybchenko@solarflare.com>
Reviewed-by: Ivan Malov <ivan.malov@oktetlabs.ru>
Add support for virtual function representor ports to the ixgbe PF
driver. When SR-IOV virtual functions devices are enabled a
corresponding representor port for each VF can be enabled in the
process in which the ixgbe PMD is running within, by specifying the
representor devargs with the list of VF ports that representors
are to be created for.
An example of the devargs which would create VF representor for virtual
functions 0,2,4,5,6 and 7 is:
-w DBDF,representor=[0,2,4-7]
Signed-off-by: Declan Doherty <declan.doherty@intel.com>
Signed-off-by: Mohammad Abdul Awal <mohammad.abdul.awal@intel.com>
Signed-off-by: Remy Horton <remy.horton@intel.com>
Reviewed-by: Ferruh Yigit <ferruh.yigit@intel.com>
Add support for virtual function representor ports to the i40e PF
driver. When SR-IOV virtual functions devices are enabled a
corresponding representor port for each VF can be enabled, in the
process in which the i40e PMD is running, by specifying the
representor devargs with the list of VF ports that representors
are to be created for.
An example of the devargs which would create VF representor for virtual
functions 0,2,4,5,6 and 7 is:
-w DBDF,representor=[0,2,4-7]
and to just specify a single representor on virtual function 3 (switch
port id):
-w DBDF,representor=3
Signed-off-by: Declan Doherty <declan.doherty@intel.com>
Signed-off-by: Mohammad Abdul Awal <mohammad.abdul.awal@intel.com>
Signed-off-by: Remy Horton <remy.horton@intel.com>
Reviewed-by: Ferruh Yigit <ferruh.yigit@intel.com>
This patch support L3 VXLAN, no inner L2 header comparing to standard
VXLAN protocol. L3 VXLAN using specific overlay UDP destination port to
discriminate against standard VXLAN, device parameter and FW has to be
configured to support it:
sudo mlxconfig -d <device> -y s IP_OVER_VXLAN_EN=1
sudo mlxconfig -d <device> -y s IP_OVER_VXLAN_PORT=<port>
Signed-off-by: Xueming Li <xuemingl@mellanox.com>
Acked-by: Nelio Laranjeiro <nelio.laranjeiro@6wind.com>
TPID handling in rte_flow VLAN and E_TAG pattern item definitions is not
consistent with the normal stacking order of pattern items, which is
confusing to applications.
Problem is that when followed by one of these layers, the EtherType field
of the preceding layer keeps its "inner" definition, and the "outer" TPID
is provided by the subsequent layer, the reverse of how a packet looks like
on the wire:
Wire: [ ETH TPID = A | VLAN EtherType = B | B DATA ]
rte_flow: [ ETH EtherType = B | VLAN TPID = A | B DATA ]
Worse, when QinQ is involved, the stacking order of VLAN layers is
unspecified. It is unclear whether it should be reversed (innermost to
outermost) as well given TPID applies to the previous layer:
Wire: [ ETH TPID = A | VLAN TPID = B | VLAN EtherType = C | C DATA ]
rte_flow 1: [ ETH EtherType = C | VLAN TPID = B | VLAN TPID = A | C DATA ]
rte_flow 2: [ ETH EtherType = C | VLAN TPID = A | VLAN TPID = B | C DATA ]
While specifying EtherType/TPID is hopefully rarely necessary, the stacking
order in case of QinQ and the lack of documentation remain an issue.
This patch replaces TPID in the VLAN pattern item with an inner
EtherType/TPID as is usually done everywhere else (e.g. struct vlan_hdr),
clarifies documentation and updates all relevant code.
It breaks ABI compatibility for the following public functions:
- rte_flow_copy()
- rte_flow_create()
- rte_flow_query()
- rte_flow_validate()
Summary of changes for PMDs that implement ETH, VLAN or E_TAG pattern
items:
- bnxt: EtherType matching is supported with and without VLAN, but TPID
matching is not and triggers an error.
- e1000: EtherType matching is only supported with the ETHERTYPE filter,
which does not support VLAN matching, therefore no impact.
- enic: same as bnxt.
- i40e: same as bnxt with existing FDIR limitations on allowed EtherType
values. The remaining filter types (VXLAN, NVGRE, QINQ) do not support
EtherType matching.
- ixgbe: same as e1000, with additional minor change to rely on the new
E-Tag macro definition.
- mlx4: EtherType/TPID matching is not supported, no impact.
- mlx5: same as bnxt.
- mvpp2: same as bnxt.
- sfc: same as bnxt.
- tap: same as bnxt.
Fixes: b1a4b4cbc0 ("ethdev: introduce generic flow API")
Fixes: 99e7003831 ("net/ixgbe: parse L2 tunnel filter")
Signed-off-by: Adrien Mazarguil <adrien.mazarguil@6wind.com>
Acked-by: Andrew Rybchenko <arybchenko@solarflare.com>
Add new callbacks for eth_dev_ops of i40e to get the information
and data of plugin module eeprom.
Signed-off-by: Zijie Pan <zijie.pan@6wind.com>
Reviewed-by: Ferruh Yigit <ferruh.yigit@intel.com>
Acked-by: Remy Horton <remy.horton@intel.com>
Add new callbacks for eth_dev_ops of e1000 to get the information and
data of plugin module EEPROM.
Signed-off-by: Zijie Pan <zijie.pan@6wind.com>
Reviewed-by: Ferruh Yigit <ferruh.yigit@intel.com>
Acked-by: Remy Horton <remy.horton@intel.com>
Add new callbacks for eth_dev_ops of ixgbe to get the information
and data of plugin module eeprom.
Signed-off-by: Zijie Pan <zijie.pan@6wind.com>
Reviewed-by: Ferruh Yigit <ferruh.yigit@intel.com>
Acked-by: Remy Horton <remy.horton@intel.com>
Expose the runtime queue configuration capability and enhance
i40e_dev_[rx|tx]_queue_setup to handle the situation when
device already started.
Signed-off-by: Qi Zhang <qi.z.zhang@intel.com>
Acked-by: Konstantin Ananyev <konstantin.ananyev@intel.com>
Acked-by: Ferruh Yigit <ferruh.yigit@intel.com>
It's not possible to setup a queue when the port is started
because of a check in ethdev layer. New capability flags are
added in order to relax this check for devices which support
queue setup in runtime. The functions rte_eth_[rx|tx]_queue_setup
will raise an error only if the port is started and runtime setup
of queue is not supported.
Signed-off-by: Qi Zhang <qi.z.zhang@intel.com>
Acked-by: Konstantin Ananyev <konstantin.ananyev@intel.com>
Acked-by: Ferruh Yigit <ferruh.yigit@intel.com>
Acked-by: Thomas Monjalon <thomas@monjalon.net>
If the netvsc driver starts in blacklist mode, it does not
automatically probe IP associated netvsc devices. Therefore, the only
way to probe them is to specify them by the EAL command line, using the
"force" parameter to skip the IP check in the driver.
>From now on, the user does not need to add the "force" parameter if he
specifies an IP associated netvsc device by the EAL command line, and the
responsibility of the IP check is now in the user's hands.
However, in the absence of any specification, the driver still skips IP
associated netvsc devices.
Cc: stable@dpdk.org
Signed-off-by: Matan Azrad <matan@mellanox.com>
Update link status related feature document items and minor updates in
some link status related functions.
Signed-off-by: Ferruh Yigit <ferruh.yigit@intel.com>
Acked-by: Adrien Mazarguil <adrien.mazarguil@6wind.com>
Recent NIC models support overlay offload. The overlay offload
feature enables the following on the NIC.
- Rx/Tx checksum offloads for both inner and outer packets.
- Rx inner packet type classification.
- TSO.
- Inner RSS.
TX descriptors do not require any changes, except the header length
for TSO. The NIC parses outer/inner packets and performs offloads on
them as necessary. The header length for tunneled TSO includes both
inner and outer headers.
The NIC actually parses and performs the above for NVGRE as well. DPDK
currently has no offload flags for NVGRE, and the hardware has no
controls to individually enable tunnel types either. So do nothing for
now.
The driver enables overlay offload by default. Add a devargs
'disable-overlay=<0|1>' to allow the app to disable it.
Also update the enic guide doc.
Signed-off-by: Hyong Youb Kim <hyonkim@cisco.com>
Reviewed-by: John Daley <johndale@cisco.com>
If we want a virtio device to work in vDPA (vhost data path acceleration)
mode, we could add a "vdpa=1" devarg for this device to specify the mode.
This patch let virtio pmd skip device probe when detecting this parameter.
Signed-off-by: Xiao Wang <xiao.w.wang@intel.com>
Reviewed-by: Maxime Coquelin <maxime.coquelin@redhat.com>
Reviewed-by: Ferruh Yigit <ferruh.yigit@intel.com>