Multiple PMDs have dummy/noop Rx/Tx packet burst functions.
These dummy functions are very simple, introduce a common function in
the ethdev and update drivers to use it instead of each driver having
its own functions.
Signed-off-by: Ferruh Yigit <ferruh.yigit@intel.com>
Acked-by: Morten Brørup <mb@smartsharesystems.com>
Acked-by: Viacheslav Ovsiienko <viacheslavo@nvidia.com>
Acked-by: Thomas Monjalon <thomas@monjalon.net>
Currently, most ethdev callback API use queue ID as parameter, but Rx
and Tx queue release callback use queue object which is used by Rx and
Tx burst data plane callback.
To align with other eth device queue configuration callbacks:
- queue release callbacks are changed to use queue ID
- all drivers are adapted
Signed-off-by: Xueming Li <xuemingl@nvidia.com>
Reviewed-by: Andrew Rybchenko <andrew.rybchenko@oktetlabs.ru>
Acked-by: Ajit Khaparde <ajit.khaparde@broadcom.com>
Acked-by: Somnath Kotur <somnath.kotur@broadcom.com>
Acked-by: Ferruh Yigit <ferruh.yigit@intel.com>
Currently, the secondary process port UAR register mapping used by Tx
queue is done during port initializing.
Unluckily, in port hot-plug case, the secondary process will be
requested to initialize the port when primary process probe the port.
At that time, the port Tx queue number is still not configured, the
secondary process get Tx queue number as 0. This causes the UAR register
not be mapped as secondary process get Tx queue number 0.
This commit adds the check of Tx queue number in secondary process when
port starts is requested. Once the Tx queue number is not matching, do
UAR mapping with the latest Tx queue number.
Fixes: 0203d33a10 ("net/mlx4: support secondary process")
Cc: stable@dpdk.org
Signed-off-by: Suanming Mou <suanmingm@nvidia.com>
Acked-by: Viacheslav Ovsiienko <viacheslavo@nvidia.com>
The rte_ethdev_driver.h, rte_ethdev_vdev.h and rte_ethdev_pci.h files are
for drivers only and should be a private to DPDK and not installed.
Signed-off-by: Bruce Richardson <bruce.richardson@intel.com>
Reviewed-by: Maxime Coquelin <maxime.coquelin@redhat.com>
Acked-by: Thomas Monjalon <thomas@monjalon.net>
Acked-by: Steven Webster <steven.webster@windriver.com>
The variable storages of the same name are merged together
if compiled with -fcommon. This is the default.
This default behaviour allows to declare a variable in a header file and
share the variable in every .o binaries thanks to merge at link-time.
In the case of dlopen linking of the glue library, the pointer mlx4_glue
is referencing the glue functions struct and is set after calling
dlopen.
If compiling with -fno-common (default in GCC 10), the variables must be
declared as extern to avoid multiple re-definitions.
In case the glue layer is split in glue library, the variable mlx4_glue
needs to have its own storage for the rest of the PMD.
Cc: stable@dpdk.org
Signed-off-by: Thomas Monjalon <thomas@monjalon.net>
Acked-by: Matan Azrad <matan@mellanox.com>
UAR (User Access Region) register does not need to be remapped for
primary process but it should be remapped only for secondary process.
UAR register table is in the process private structure in
rte_eth_devices[],
(struct mlx4_proc_priv *)rte_eth_devices[port_id].process_private
The actual UAR table follows the data structure and the table is used
for both Tx and Rx.
For Tx, BlueFlame in UAR is used to ring the doorbell.
MLX4_TX_BFREG(txq) is defined to get a register for the txq. Processes
access its own private data to acquire the register from the UAR table.
For Rx, the doorbell in UAR is required in arming CQ event. However, it
is a known issue that the register isn't remapped for secondary process.
Signed-off-by: Yongseok Koh <yskoh@mellanox.com>
Acked-by: Shahaf Shuler <shahafs@mellanox.com>
In order to support secondary process, a few features are required.
a) rdma-core library should allocate device resources using DPDK's
memory allocator.
b) UAR should be remapped for secondary processes. Currently, in order
not to use different data structure for secondary processes, PMD
tries to reserve identical virtual address space for both primary
and secondary processes.
c) IPC channel is necessary, which can be easily set with rte_mp APIs.
Through the channel, Verbs command FD is delivered to the secondary
process and the device stop/start event is also broadcast from
primary process.
Signed-off-by: Yongseok Koh <yskoh@mellanox.com>
Acked-by: Shahaf Shuler <shahafs@mellanox.com>
The private structure stored in rte_eth_dev->data->dev_private
was named "struct priv".
In order to ease code browsing, the structure is renamed
"struct mlx[45]_priv".
Cc: stable@dpdk.org
Signed-off-by: Thomas Monjalon <thomas@monjalon.net>
Acked-by: Yongseok Koh <yskoh@mellanox.com>
RTE_MBUF_INDIRECT() is replaced with RTE_MBUF_CLONED() and removed.
This macro was deprecated in release 18.05 when EXT_ATTACHED_MBUF was
introduced.
Signed-off-by: Yongseok Koh <yskoh@mellanox.com>
Acked-by: Andrew Rybchenko <arybchenko@solarflare.com>
Acked-by: Olivier Matz <olivier.matz@6wind.com>
There's some performance drop due to extra condition checks on the
datapath. Checking for external memory registration should be consolidated
to the existing bottom-half.
Fixes: 31912d9924 ("net/mlx4: support externally allocated static memory")
Cc: stable@dpdk.org
Signed-off-by: Yongseok Koh <yskoh@mellanox.com>
Acked-by: Shahaf Shuler <shahafs@mellanox.com>
When MLX PMD registers memory for DMA, it accesses the global memseg list
of DPDK to maximize the range of registration so that LKey search can be
more efficient. Granularity of MR registration is per page.
Externally allocated memory shouldn't be used for DMA because it can't be
searched in the memseg list and free event can't be tracked by DPDK. If it
is used, the following error will occur:
net_mlx5: port 0 unable to find virtually contiguous chunk for
address (0x5600017587c0). rte_memseg_contig_walk() failed.
There's a pending patchset [1] which enables externally allocated memory.
Once it is merged, users can register their own memory out of EAL then that
will resolve this issue.
Meanwhile, if the external memory is static (allocated on startup and never
freed), such memory can also be registered by little tweak in the code.
[1] http://patches.dpdk.org/project/dpdk/list/?series=1415
This patch is not a bug fix but needs to be included in stable versions.
Fixes: 9797bfcce1 ("net/mlx4: add new memory region support")
Cc: stable@dpdk.org
Signed-off-by: Yongseok Koh <yskoh@mellanox.com>
This is the new design of Memory Region (MR) for mlx PMD, in order to:
- Accommodate the new memory hotplug model.
- Support non-contiguous Mempool.
There are multiple layers for MR search.
L0 is to look up the last-hit entry which is pointed by mr_ctrl->mru (Most
Recently Used). If L0 misses, L1 is to look up the address in a fixed-sized
array by linear search. L0/L1 is in an inline function -
mlx4_mr_lookup_cache().
If L1 misses, the bottom-half function is called to look up the address
from the bigger local cache of the queue. This is L2 - mlx4_mr_addr2mr_bh()
and it is not an inline function. Data structure for L2 is the Binary Tree.
If L2 misses, the search falls into the slowest path which takes locks in
order to access global device cache (priv->mr.cache) which is also a B-tree
and caches the original MR list (priv->mr.mr_list) of the device. Unless
the global cache is overflowed, it is all-inclusive of the MR list. This is
L3 - mlx4_mr_lookup_dev(). The size of the L3 cache table is limited and
can't be expanded on the fly due to deadlock. Refer to the comments in the
code for the details - mr_lookup_dev(). If L3 is overflowed, the list will
have to be searched directly bypassing the cache although it is slower.
If L3 misses, a new MR for the address should be created -
mlx4_mr_create(). When it creates a new MR, it tries to register adjacent
memsegs as much as possible which are virtually contiguous around the
address. This must take two locks - memory_hotplug_lock and
priv->mr.rwlock. Due to memory_hotplug_lock, there can't be any
allocation/free of memory inside.
In the free callback of the memory hotplug event, freed space is searched
from the MR list and corresponding bits are cleared from the bitmap of MRs.
This can fragment a MR and the MR will have multiple search entries in the
caches. Once there's a change by the event, the global cache must be
rebuilt and all the per-queue caches will be flushed as well. If memory is
frequently freed in run-time, that may cause jitter on dataplane processing
in the worst case by incurring MR cache flush and rebuild. But, it would be
the least probable scenario.
To guarantee the most optimal performance, it is highly recommended to use
an EAL option - '--socket-mem'. Then, the reserved memory will be pinned
and won't be freed dynamically. And it is also recommended to configure
per-lcore cache of Mempool. Even though there're many MRs for a device or
MRs are highly fragmented, the cache of Mempool will be much helpful to
reduce misses on per-queue caches anyway.
'--legacy-mem' is also supported.
Signed-off-by: Yongseok Koh <yskoh@mellanox.com>
This patch removes current support of Memory Region (MR) in order to
accommodate the dynamic memory hotplug patch. This patch can be compiled
but traffic can't flow and HW will raise faults. Subsequent patches will
add new MR support.
Signed-off-by: Yongseok Koh <yskoh@mellanox.com>
Since its inception, the rte_flow RSS action has been relying in part on
external struct rte_eth_rss_conf for compatibility with the legacy RSS API.
This structure lacks parameters such as the hash algorithm to use, and more
recently, a method to tell which layer RSS should be performed on [1].
Given struct rte_eth_rss_conf will never be flexible enough to represent a
complete RSS configuration (e.g. RETA table), this patch supersedes it by
extending the rte_flow RSS action directly.
A subsequent patch will add a field to use a non-default RSS hash
algorithm. To that end, a field named "types" replaces the field formerly
known as "rss_hf" and standing for "RSS hash functions" as it was
confusing. Actual RSS hash function types are defined by enum
rte_eth_hash_function.
This patch updates all PMDs and example applications accordingly.
It breaks ABI compatibility for the following public functions:
- rte_flow_copy()
- rte_flow_create()
- rte_flow_query()
- rte_flow_validate()
[1] commit 676b605182 ("doc: announce ethdev API change for RSS
configuration")
Signed-off-by: Adrien Mazarguil <adrien.mazarguil@6wind.com>
Acked-by: Andrew Rybchenko <arybchenko@solarflare.com>
Previous to this commit mlx4 CRC stripping was executed by default and
there was no verbs API to disable it.
Signed-off-by: Ophir Munk <ophirmu@mellanox.com>
Acked-by: Adrien Mazarguil <adrien.mazarguil@6wind.com>
Aligning Mellanox SPDX copyrights to a single format.
In addition replace to SPDX licence files which were missed.
Signed-off-by: Shahaf Shuler <shahafs@mellanox.com>
Acked-by: Adrien Mazarguil <adrien.mazarguil@6wind.com>
Create a rte_ethdev_driver.h file and move PMD specific APIs here.
Drivers updated to include this new header file.
There is no update in header content and since ethdev.h included by
ethdev_driver.h, nothing changed from driver point of view, only
logically grouping of APIs. From applications point of view they can't
access to driver specific APIs anymore and they shouldn't.
More PMD specific data structures still remain in ethdev.h because of
inline functions in header use them. Those will be handled separately.
Signed-off-by: Ferruh Yigit <ferruh.yigit@intel.com>
Acked-by: Shreyansh Jain <shreyansh.jain@nxp.com>
Acked-by: Andrew Rybchenko <arybchenko@solarflare.com>
Acked-by: Thomas Monjalon <thomas@monjalon.net>
Ethdev Rx offloads API has changed since:
commit ce17eddefc ("ethdev: introduce Rx queue offloads API")
This commit support the new Rx offloads API.
Signed-off-by: Shahaf Shuler <shahafs@mellanox.com>
Ethdev Tx offloads API has changed since:
commit cba7f53b71 ("ethdev: introduce Tx queue offloads API")
This commit support the new Tx offloads API.
Signed-off-by: Shahaf Shuler <shahafs@mellanox.com>
This counter saved the descriptor elements which are waiting to be
completed and was used to know if completion function should be
called.
This completion check can be done by other elements management
variables and we can prevent this counter management.
Remove this counter and replace the completion check easily by other
elements management variables.
Signed-off-by: Matan Azrad <matan@mellanox.com>
Acked-by: Adrien Mazarguil <adrien.mazarguil@6wind.com>
The previuse code took a send queue entry size for stamping from the
send queue entry pointed by completion queue entry; This 2 reads were
done per packet in completion stage.
The completion burst packets number is managed by fixed size stored in
Tx queue, so we can infer that each valid completion entry actually frees
the next fixed number packets.
The descriptors ring holds the send queue entry, so we just can infer
all the completion burst packet entries size by simple calculation and
prevent calculations per packet.
Adjust completion functions to free full completion bursts packets
by one time and prevent per packet work queue entry reads and
calculations.
Save only start of completion burst or Tx burst send queue entry
pointers in the appropriate descriptor element.
Signed-off-by: Matan Azrad <matan@mellanox.com>
Acked-by: Adrien Mazarguil <adrien.mazarguil@6wind.com>
The Tx queue send ring was managed by Tx block head,tail,count and mask
management variables which were used for managing the send queue remain
space and next places of empty or completed work queue entries.
This method suffered from an actual addresses recalculation per packet,
an unnecessary Tx block based calculations and an expensive dual
management of Tx rings.
Move send queue ring calculation to be based on actual addresses while
managing it by descriptors ring indexes.
Add new work queue entry pointer to the descriptor element to hold the
appropriate entry in the send queue.
Signed-off-by: Matan Azrad <matan@mellanox.com>
Acked-by: Adrien Mazarguil <adrien.mazarguil@6wind.com>
This patch improves Rx packet type offload report in case the device is
a virtual function device.
In these devices we observed that the L2 tunnel flag is set also for
non-tunneled packets, this leads to a complete misinterpretation of the
packet type being received.
This issue occurs since the tunnel_mode is not set to 0x7 by the driver
for virtual devices and therefore the value in the L2 tunnel flag is
meaningless and should be ignored.
Fixes: aee4a03fee ("net/mlx4: enhance Rx packet type offloads")
Signed-off-by: Moti Haimovsky <motih@mellanox.com>
Acked-by: Adrien Mazarguil <adrien.mazarguil@6wind.com>
Memory regions assigned to hardware and used during Tx/Rx are mapped to
mbuf pools. Each Rx queue creates its own MR based on the mempool
provided during queue setup, while each Tx queue looks up and registers
MRs for all existing mbuf pools instead.
Since most applications use few large mbuf pools (usually only a single
one per NUMA node) common to all Tx/Rx queues, the above approach wastes
hardware resources due to redundant MRs. This negatively affects
performance, particularly with large numbers of queues.
This patch therefore makes the entire MR registration common to all
queues using a reference count. A spinlock is added to protect against
asynchronous registration that may occur from the Tx side where new
mempools are discovered based on mbuf data.
Signed-off-by: Adrien Mazarguil <adrien.mazarguil@6wind.com>
Associate memory region to mempool (on data path) in a short function.
Handle the less common case of adding a new memory region to mempool
in a separate function.
Signed-off-by: Ophir Munk <ophirmu@mellanox.com>
Acked-by: Adrien Mazarguil <adrien.mazarguil@6wind.com>
Various hardware limitations apply to RSS indirection tables, one of
them being they must be an exact 1:1 mapping of the configured Rx queue
indices.
While this restriction is enforced when creating RSS flow rules, it is
not the case when Rx queues themselves are created; underlying WQ
numbers are assigned in turn, not according to queue index.
Applications such as l3fwd-power that create Rx queues from highest to
lowest index (or any other non-sequential order) thus fail to get a
working RSS context.
This commit postpones WQ initialization to dev_start(), once all Rx
queues are configured in order to address this issue.
Signed-off-by: Adrien Mazarguil <adrien.mazarguil@6wind.com>
Acked-by: Nelio Laranjeiro <nelio.laranjeiro@6wind.com>
This patch adds loopback functionality used when the chip is a VF in order
to enable packet transmission between VFs and PF.
Signed-off-by: Moti Haimovsky <motih@mellanox.com>
Acked-by: Adrien Mazarguil <adrien.mazarguil@6wind.com>
This patch adds hardware offloading support for IPV4, UDP and TCP checksum
verification, including inner/outer checksums on supported tunnel types.
It also restores packet type recognition support.
Signed-off-by: Vasily Philipov <vasilyf@mellanox.com>
Signed-off-by: Moti Haimovsky <motih@mellanox.com>
Acked-by: Adrien Mazarguil <adrien.mazarguil@6wind.com>
This patch adds hardware offloading support for IPv4, UDP and TCP checksum
calculation, including inner/outer checksums on supported tunnel types.
Signed-off-by: Moti Haimovsky <motih@mellanox.com>
Acked-by: Adrien Mazarguil <adrien.mazarguil@6wind.com>
This patch adds support for accessing the hardware directly when
handling Rx packets eliminating the need to use Verbs in the Rx data
path.
Rx scatter support: calculate the number of scatters on the fly
according to the maximum expected packet size.
Signed-off-by: Vasily Philipov <vasilyf@mellanox.com>
Signed-off-by: Moti Haimovsky <motih@mellanox.com>
Signed-off-by: Ophir Munk <ophirmu@mellanox.com>
Acked-by: Adrien Mazarguil <adrien.mazarguil@6wind.com>
Modify PMD to send single-buffer packets directly to the device
bypassing the Verbs Tx post and poll routines.
Tx gather support: add support for transmitting packets spanning
over multiple buffers.
Take into consideration the amount of entries a packet occupies
in the TxQ when setting the report-completion flag of the chip.
Signed-off-by: Moti Haimovsky <motih@mellanox.com>
Signed-off-by: Ophir Munk <ophirmu@mellanox.com>
Acked-by: Adrien Mazarguil <adrien.mazarguil@6wind.com>
This patch dissociates single-queue indirection tables and hash QP objects
from Rx queue structures to relinquish their control to users through the
RSS flow rule action, while simultaneously allowing multiple queues to be
associated with RSS contexts.
Flow rules share identical RSS contexts (hashed fields, hash key, target
queues) to save on memory and other resources. The trade-off is some added
complexity due to reference counters management on RSS contexts.
The QUEUE action is re-implemented on top of an automatically-generated
single-queue RSS context.
The following hardware limitations apply to RSS contexts:
- The number of queues in a group must be a power of two.
- Queue indices must be consecutive, for instance the [0 1 2 3] set is
allowed, however [3 2 1 0], [0 2 1 3] and [0 0 1 1 2 3 3 3] are not.
- The first queue of a group must be aligned to a multiple of the context
size, e.g. if queues [0 1 2 3 4] are defined globally, allowed group
combinations are [0 1] and [2 3]; groups [1 2] and [3 4] are not
supported.
- RSS hash key, while configurable per context, must be exactly 40 bytes
long.
- The only supported hash algorithm is Toeplitz.
Signed-off-by: Adrien Mazarguil <adrien.mazarguil@6wind.com>
Acked-by: Nelio Laranjeiro <nelio.laranjeiro@6wind.com>
Work queues (WQs) are lower-level than standard queue pairs (QPs). They are
dedicated to one traffic direction and have to be used in conjunction with
indirection tables and special "hash" QPs to get the same level of
functionality.
These extra objects however are the building blocks for RSS support brought
by subsequent commits, as a single "hash" QP can manage several WQs through
an indirection table according to a hash algorithm and other parameters.
Signed-off-by: Adrien Mazarguil <adrien.mazarguil@6wind.com>
Acked-by: Nelio Laranjeiro <nelio.laranjeiro@6wind.com>
Since live Tx and Rx queues cannot be reused anymore without being
destroyed first, mbuf ring sizes are fixed and known from the start.
This allows a single allocation for queue data structures and mbuf ring
together, saving space and bringing them closer in memory.
Signed-off-by: Adrien Mazarguil <adrien.mazarguil@6wind.com>
Acked-by: Nelio Laranjeiro <nelio.laranjeiro@6wind.com>
DPDK ensures that setup functions are never called on configured queues,
or only if they have previously been released.
PMDs therefore do not need to deal with the unexpected reconfiguration of
live queues which may fail with no easy way to recover. Dropping support
for this scenario greatly simplifies the code as allocation and setup steps
and checks can be merged.
Signed-off-by: Adrien Mazarguil <adrien.mazarguil@6wind.com>
Acked-by: Nelio Laranjeiro <nelio.laranjeiro@6wind.com>
When not in isolated mode, a flow rule is automatically configured by the
PMD to receive traffic addressed to the MAC address of the device. This
somewhat duplicates flow API functionality.
Remove legacy support for internal flow rules to instead handle them
through the flow API implementation.
Signed-off-by: Adrien Mazarguil <adrien.mazarguil@6wind.com>
Acked-by: Nelio Laranjeiro <nelio.laranjeiro@6wind.com>
Add missing comments and fix those not Doxygen-friendly.
Since the private structure definition is modified, use this opportunity to
add one remaining missing include required by one of its fields
(sys/queue.h for LIST_HEAD()).
Signed-off-by: Adrien Mazarguil <adrien.mazarguil@6wind.com>
Acked-by: Nelio Laranjeiro <nelio.laranjeiro@6wind.com>
Private functions are now prefixed with "mlx4_" to prevent them from
conflicting with their mlx5 PMD counterparts at link time.
No impact on functionality.
Signed-off-by: Adrien Mazarguil <adrien.mazarguil@6wind.com>
Private functions are now prefixed with "mlx4_" to prevent them from
conflicting with their mlx5 PMD counterparts at link time.
No impact on functionality.
Signed-off-by: Adrien Mazarguil <adrien.mazarguil@6wind.com>
This commit groups all data plane functions (Rx/Tx) into a separate file
and adjusts header files accordingly.
Private functions are now prefixed with "mlx4_" to prevent them from
conflicting with their mlx5 PMD counterparts at link time.
No impact on functionality.
Signed-off-by: Adrien Mazarguil <adrien.mazarguil@6wind.com>
Except for a minor documentation update on internal structure definitions
to make them more Doxygen-friendly, there is no impact on functionality.
Signed-off-by: Adrien Mazarguil <adrien.mazarguil@6wind.com>