Since the queue release API does not allow failures (return value is void)
and the flow API does not allow a queue to be released as long as a flow
rule depends on it, the only rational decision to avoid undefined behavior
is to panic in this situation.
Signed-off-by: Nelio Laranjeiro <nelio.laranjeiro@6wind.com>
Acked-by: Adrien Mazarguil <adrien.mazarguil@6wind.com>
Flow queues array offset is not properly computed causing all sorts of
issues. Address it by making the rte_flow structure less complex.
Fixes: 360663e1df ("net/mlx5: prepare support for RSS action rule")
Signed-off-by: Nelio Laranjeiro <nelio.laranjeiro@6wind.com>
Acked-by: Adrien Mazarguil <adrien.mazarguil@6wind.com>
Removing this check improves performance as VLAN and CRC stripping are
enabled most of the time.
Convert MLX5_CQE_VLAN_STRIPPED to network order to speed up the check
instead of doing it on the completion queue entry field.
Signed-off-by: Nelio Laranjeiro <nelio.laranjeiro@6wind.com>
Acked-by: Adrien Mazarguil <adrien.mazarguil@6wind.com>
The patch change the prototype of callback function
(rte_intr_callback_fn) by removing the unnecessary parameter.
Signed-off-by: Qi Zhang <qi.z.zhang@intel.com>
Since patch "mbuf: structure reorganization" the compiler complains
sometimes (in some conditions):
.../drivers/net/mlx5/mlx5_rxtx.c: In function ‘mlx5_rx_burst’:
.../drivers/net/mlx5/mlx5_rxtx.c:2082:17: error: ‘len’ may be used
uninitialized in this function [-Werror=maybe-uninitialized]
len is not initialised as it will be at the first segment of a received
packet, but it remains hard for the compiler to determine it.
Fixes: 9964b965ad ("net/mlx5: re-add Rx scatter support")
Cc: stable@dpdk.org
Signed-off-by: Nelio Laranjeiro <nelio.laranjeiro@6wind.com>
Acked-by: Olivier Matz <olivier.matz@6wind.com>
Set the value of m->refcnt to 1, m->nb_segs to 1 and m->next
to NULL when the mbuf is stored inside the mempool (unused).
This is done in rte_pktmbuf_prefree_seg(), before freeing or
recycling a mbuf.
Before this patch, the value of m->refcnt was expected to be 0
while in pool.
The objectives are:
- to avoid drivers to set m->next to NULL in the early Rx path, since
this field is in the second 64B of the mbuf and its access could
trigger a cache miss
- rationalize the behavior of raw_alloc/raw_free: one is now the
symmetric of the other, and refcnt is never changed in these functions.
To optimize the freeing of the segments, we try try to only update
m->refcnt, m->next, and m->nb_segs when it's required (idea from
Konstantin Ananyev <konstantin.ananyev@intel.com>).
Signed-off-by: Olivier Matz <olivier.matz@6wind.com>
Rename __rte_mbuf_raw_free() as rte_mbuf_raw_free() and make
it public. The old function is kept for compat but is marked as
deprecated.
The next commit changes the behavior of rte_mbuf_raw_free() to
make it more consistent with rte_mbuf_raw_alloc().
Signed-off-by: Olivier Matz <olivier.matz@6wind.com>
Flow validation already processes the final actions to verify if a rule can
be applied or not and the same is done during creation. As the creation
function relies on validation to generate and apply a rule, this job can be
fully handled by the validation function.
Signed-off-by: Nelio Laranjeiro <nelio.laranjeiro@6wind.com>
Acked-by: Adrien Mazarguil <adrien.mazarguil@6wind.com>
Acked-by: Yongseok Koh <yskoh@mellanox.com>
Remove unnecessary interface queries and the Resource Domain when
creating the Queue Pair. Since mlx5 PMD data path is on top of native
APIs, such Verbs library calls are no longer needed.
Signed-off-by: Shahaf Shuler <shahafs@mellanox.com>
Acked-by: Nelio Laranjeiro <nelio.laranjeiro@6wind.com>
Acked-by: Yongseok Koh <yskoh@mellanox.com>
Currently mlx5_dev_rss_reta_update() just updates tables in the host,
therefore it isn't immediately effective until restarting the device by
calling mlx5_dev_stop()/mlx5_dev_start() to update the changes in the
device side. This patch adds rebuilding the device-specific data
structure and applying it to the device right away.
Signed-off-by: Yongseok Koh <yskoh@mellanox.com>
When querying and updating RSS RETA table, it always uses the max size of
the device instead of configured value. This patch fixes it and removed the
related comments in the code.
Signed-off-by: Yongseok Koh <yskoh@mellanox.com>
Mark ID in the completion queue entry is 24 bits, the remaining 8 bits are
reserved and may be nonzero. Do not take them into account when looking for
marked packets.
Fixes: ea3bc3b1df ("net/mlx5: support mark flow action")
Cc: stable@dpdk.org
Signed-off-by: Nelio Laranjeiro <nelio.laranjeiro@6wind.com>
Acked-by: Adrien Mazarguil <adrien.mazarguil@6wind.com>
First segment size must be at least 18 bytes, packets not respecting this
are silently not sent by the NIC but counted as sent by the PMD. The only
way to figure out is compiling the PMD in debug mode.
Fixes: 6579c27c11 ("net/mlx5: remove gather loop on segments")
Cc: stable@dpdk.org
Signed-off-by: Nelio Laranjeiro <nelio.laranjeiro@6wind.com>
Acked-by: Adrien Mazarguil <adrien.mazarguil@6wind.com>
Acked-by: Yongseok Koh <yskoh@mellanox.com>
Toggle Rx scatter mode based on the scatter_enable flag and the maximum
packet size only instead of deriving this information from the jumbo_frame
setting and the MTU configuration.
Signed-off-by: Vasily Philipov <vasilyf@mellanox.com>
Acked-by: Adrien Mazarguil <adrien.mazarguil@6wind.com>
When configuring Rx/Tx queue, if queue already exists, it is reused. But if
the queue size is changed, it must be resized to not access/overwrite
invalid memory.
Fixes: 2e22920b85 ("mlx5: support non-scattered Tx and Rx")
Cc: stable@dpdk.org
Signed-off-by: Yongseok Koh <yskoh@mellanox.com>
Acked-by: Nelio Laranjeiro <nelio.laranjeiro@6wind.com>
ConnectX-5 supports enhanced version of multi-packet send (MPS). An MPS Tx
descriptor can carry multiple packets either by including pointers of
packets or by inlining packets. Inlining packet data can be helpful to
better utilize PCIe bandwidth. In addition, Enhanced MPS supports hybrid
mode - mixing inlined packets and pointers in a descriptor. This feature is
enabled by default if supported by HW.
Signed-off-by: Yongseok Koh <yskoh@mellanox.com>
Since PKT_TX_TCP_SEG implies PKT_TX_TCP_CKSUM, the PMD must force this
flag.
The fix applied for both tunneled and non-tunneled packets.
Fixes: 3f13f8c23a ("net/mlx5: support hardware TSO")
Fixes: b247f34601 ("net/mlx5: support hardware TSO for VXLAN and GRE")
Signed-off-by: Shahaf Shuler <shahafs@mellanox.com>
Acked-by: Nelio Laranjeiro <nelio.laranjeiro@6wind.com>
Checking whether the counter is IB counter was performed with the
wrong index.
Fixes: 859081d3fb ("net/mlx5: add out of buffer counter to extended statistic")
Cc: stable@dpdk.org
Signed-off-by: Shahaf Shuler <shahafs@mellanox.com>
Acked-by: Adrien Mazarguil <adrien.mazarguil@6wind.com>
This commit adds support for hardware TSO for tunneled packets.
Signed-off-by: Shahaf Shuler <shahafs@mellanox.com>
Acked-by: Nelio Laranjeiro <nelio.laranjeiro@6wind.com>
Prior to this commit Tx checksum offload was supported only for the
inner headers.
This commit adds support for the hardware to compute the checksum for the
outer headers as well.
The support is for tunneling protocols GRE and VXLAN.
Signed-off-by: Shahaf Shuler <shahafs@mellanox.com>
Acked-by: Nelio Laranjeiro <nelio.laranjeiro@6wind.com>
Having a drop queue per drop flow consumes a lot of memory and reduce the
speed capabilities of the NIC to handle such cases.
To avoid this and reduce memory consumption, an RSS drop queue is created
for all drop flows.
Signed-off-by: Nelio Laranjeiro <nelio.laranjeiro@6wind.com>
Implement a basic flow RSS action. This commits don't handle the default
RSS queues already created by the control plane, this last part being huge.
Any new request RSS flow request will be added using an higher priority
than the default one to be sure this rule will be the one used.
Default ones (those created by dev_start()) remains but has they have a
lower priority they will not receive any new packet.
Signed-off-by: Nelio Laranjeiro <nelio.laranjeiro@6wind.com>
In mlx5 PMD handling a single queue of several destination queues ends in
creating the same Verbs attribute, the main difference resides in the
indirection table and the RSS hash key.
This helps to prepare the supports to the RSS queues by first handling the
queue action has being an RSS queue with a single queue. No RSS hash key
will be provided to the Verbs flow.
Signed-off-by: Nelio Laranjeiro <nelio.laranjeiro@6wind.com>
ibv_attr should be freed in the function which allocates the memory.
Fixes: 2097d0d1e2 ("net/mlx5: support basic flow items and actions")
Cc: stable@dpdk.org
Signed-off-by: Nelio Laranjeiro <nelio.laranjeiro@6wind.com>
Creating a drop queue in mlx5 ends by creating a non polled queue, but if
the associated work queue could not be created the error was not handled
ending in a undefined situation.
Fixes: 2097d0d1e2 ("net/mlx5: support basic flow items and actions")
Cc: stable@dpdk.org
Signed-off-by: Nelio Laranjeiro <nelio.laranjeiro@6wind.com>
Mark value is always reported even when not requested or invalid.
Fixes: ea3bc3b1df ("net/mlx5: support mark flow action")
Cc: stable@dpdk.org
Reported-by: Mark Bloch <markb@mellanox.com>
Signed-off-by: Nelio Laranjeiro <nelio.laranjeiro@6wind.com>
Acked-by: Adrien Mazarguil <adrien.mazarguil@6wind.com>
When flows cannot be re-applied due to configuration modifications, the
start function should rollback the configuration done.
Fixes: 2097d0d1e2 ("net/mlx5: support basic flow items and actions")
Cc: stable@dpdk.org
Signed-off-by: Nelio Laranjeiro <nelio.laranjeiro@6wind.com>
The number of extended statistics counters is queried through ETHTOOL.
ETHTOOL provides a different number when the link is up or down.
Since extended statistics query occurs at device start,
segmentation fault might happen when changing the link state before and
after the device start.
this commit address this issue, and query the number of statistics
before every call to ETHTOOL.
Fixes: a4193ae3bc ("net/mlx5: support extended statistics")
Cc: stable@dpdk.org
Signed-off-by: Shahaf Shuler <shahafs@mellanox.com>
Acked-by: Nelio Laranjeiro <nelio.laranjeiro@6wind.com>
Interface name is queried, however never used.
Signed-off-by: Shahaf Shuler <shahafs@mellanox.com>
Acked-by: Nelio Laranjeiro <nelio.laranjeiro@6wind.com>
The indication on vlan stripping was taken from the wrong location in the
completion entry.
Fixes: 9964b965ad ("net/mlx5: re-add Rx scatter support")
Cc: stable@dpdk.org
Signed-off-by: Shahaf Shuler <shahafs@mellanox.com>
Acked-by: Nelio Laranjeiro <nelio.laranjeiro@6wind.com>
This commit adds RX out of buffer counter to xstats report.
The counter counts the number of dropped occurred due to lack of buffers
on device RX queues.
Signed-off-by: Shahaf Shuler <shahafs@mellanox.com>
Acked-by: Nelio Laranjeiro <nelio.laranjeiro@6wind.com>
Since there is no "descriptor done" flag like on Intel drivers, the
approach is different on mlx5 driver.
- for Tx, we call txq_complete() to free descriptors processed by
the hw, then we check if the descriptor is between tail and head
- for Rx, we need to browse the cqes, managing compressed ones,
to get the number of used descriptors.
Signed-off-by: Olivier Matz <olivier.matz@6wind.com>
Before this patch, the management of dependencies between directories
had several issues:
- the generation of .depdirs, done at configuration is slow: it can take
more than one minute on some slow targets (usually ~10s on a standard
PC without -j).
- for instance, it is possible to express a dependency like:
- app/foo depends on lib/librte_foo
- and lib/librte_foo depends on app/bar
But this won't work because the directories are traversed with a
depth-first algorithm, so we have to choose between doing 'app' before
or after 'lib'.
- the script depdirs-rule.sh is too complex.
- we cannot use "make -d" for debug, because the output of make is used for
the generation of .depdirs.
This patch moves the DEPDIRS-* variables in the upper Makefile, making
the dependencies much easier to calculate. A DEPDIRS variable is still
used to process library dependencies in LDLIBS.
After this commit, "make config" is almost immediate.
Signed-off-by: Olivier Matz <olivier.matz@6wind.com>
Tested-by: Robin Jarry <robin.jarry@6wind.com>
Tested-by: Jerin Jacob <jerin.jacob@caviumnetworks.com>
Trying to query the link status through the new ETHTOOL_GLINKSETTINGS
ioctl available since Linux 4.5 was always failing due to a kernel bug
fixed since version 4.9.
This commit also addresses a common issue where the headers version used
at compile time differs from that of the kernel on the target system, by
always defining missing symbols and moving the kernel version check at run
time.
Fixes: 1884087198 ("net/mlx5: fix support for newer link speeds")
CC: stable@dpdk.org
Signed-off-by: Shahaf Shuler <shahafs@mellanox.com>
Acked-by: Adrien Mazarguil <adrien.mazarguil@6wind.com>
The total length field in descriptor of inlined multi-packet send must be
updated before closing a session. There's possibility of updating it
afterward. This bug might cause one packet out of MLX5_MPW_DSEG_MAX gets
silently dropped by HW and impact performance, especially lossless test.
Fixes: 230189d9ff ("net/mlx5: support multi-packet send")
Cc: stable@dpdk.org
Signed-off-by: Yongseok Koh <yskoh@mellanox.com>
Acked-by: Nelio Laranjeiro <nelio.laranjeiro@6wind.com>
For some sizes of packets, the number of bytes copied in the work queue
element could be greater than the available size of the inline. In such
situation it could consume one more work queue element where it should
not.
Fixes: 0e8679fcdd ("net/mlx5: fix inline logic")
Cc: stable@dpdk.org
Reported-by: Elad Persiko <eladpe@mellanox.com>
Signed-off-by: Yongseok Koh <yskoh@mellanox.com>
Signed-off-by: Nelio Laranjeiro <nelio.laranjeiro@6wind.com>
Fixes an issue which may occurs with the inline feature activated and a
packet greater than the max_inline requested.
In such situation, more work request elements can be consumed and in the
worst case override some still handled by the NIC, this can result in
sending garbage on the network or putting the work queue in error.
Fixes: 2a66cf3789 ("net/mlx5: support inline send")
Cc: stable@dpdk.org
Reported-by: Elad Persiko <eladpe@mellanox.com>
Signed-off-by: Nelio Laranjeiro <nelio.laranjeiro@6wind.com>
Acked-by: Yongseok Koh <yskoh@mellanox.com>
First two bytes of the Ethernet header was written twice at the same place.
Fixes: b8fe952ec5 ("net/mlx5: prepare Tx vectorization")
Signed-off-by: Yongseok Koh <yskoh@mellanox.com>
Signed-off-by: Nelio Laranjeiro <nelio.laranjeiro@6wind.com>
Adding a flow when the port is stopped ends in an inconsistent situation
where the queue can receive traffic when it should not.
Record new rules and apply them as soon as the port is started.
Fixes: 2097d0d1e2 ("net/mlx5: support basic flow items and actions")
Signed-off-by: Nelio Laranjeiro <nelio.laranjeiro@6wind.com>
Acked-by: Adrien Mazarguil <adrien.mazarguil@6wind.com>
A configuration structure for the MARK action must always be specified.
Fixes: ea3bc3b1df ("net/mlx5: support mark flow action")
Signed-off-by: Nelio Laranjeiro <nelio.laranjeiro@6wind.com>
Acked-by: Adrien Mazarguil <adrien.mazarguil@6wind.com>
Default masks were introduced in the API after its implementation in this
PMD.
Signed-off-by: Nelio Laranjeiro <nelio.laranjeiro@6wind.com>
Acked-by: Adrien Mazarguil <adrien.mazarguil@6wind.com>
Querying the link status can end up being in an inconsistent state,
like the port is reporting speed although it is down.
For this case another query is scheduled.
A race condition can occur between the scheduled query and link
status interrupt handlers.
When the scheduled query by-pass interrupt handlers, the link status
will be stuck in an inconsistent state.
This patch addresses the race condition by not blocking link status
queries in case delayed query is used.
Fixes: 198a3c339a ("mlx5: handle link status interrupts")
Cc: stable@dpdk.org
Signed-off-by: Shahaf Shuler <shahafs@mellanox.com>
Acked-by: Nelio Laranjeiro <nelio.laranjeiro@6wind.com>
Size of the mask is wrongly computed and make the validation process only
verify the first 4 bytes of the layer.
Fixes: 2097d0d1e2 ("net/mlx5: support basic flow items and actions")
Signed-off-by: Nelio Laranjeiro <nelio.laranjeiro@6wind.com>
TCI field is read from the wrong place due to an invalid cast. Moreover
there is no need to limit matching to VID since PCP and DEI bits can be
matched as well.
Fixes: 12475fb203 ("net/mlx5: support VLAN flow item")
Signed-off-by: Nelio Laranjeiro <nelio.laranjeiro@6wind.com>
in case of an error argument list is not freed.
Fixes: e72dd09b61 ("net/mlx5: add support for configuration through kvargs")
Cc: stable@dpdk.org
Signed-off-by: Shahaf Shuler <shahafs@mellanox.com>
When the WQ is wrapped around, it wrongly checks the condition when
resetting the pointer. It should be compared against the end of the queue,
not the beginning of the queue. And this isn't even needed when the length
of the copying data crosses the boundary.
Fixes: fdcb0f5305 ("net/mlx5: use work queue buffer as a raw buffer")
Cc: stable@dpdk.org
Signed-off-by: Yongseok Koh <yskoh@mellanox.com>
Acked-by: Adrien Mazarguil <adrien.mazarguil@6wind.com>
The size of Rx RSS indirection table was limited by 256, but it is not
required anymore for all Mellanox NICs. However, the librte_ether still
limits the size by 512.
Signed-off-by: Yongseok Koh <yskoh@mellanox.com>
Acked-by: Adrien Mazarguil <adrien.mazarguil@6wind.com>
On receiving a compressed session of Rx completion, prefetch every entries
to be invalidated. Also, invalidate consumed completions per every 8
mini-completions, not to wait until the last entry is consumed. This helps
to reduce jitter in rx_burst.
Signed-off-by: Yongseok Koh <yskoh@mellanox.com>
Acked-by: Adrien Mazarguil <adrien.mazarguil@6wind.com>
Mellanox PMDs do not differentiate IP header with or without options, so
the advertised packet type for an IPv4 should not be RTE_PTYPE_L3_IPV4,
which explicitly means "does not contain any header option".
Change the driver to set
RTE_PTYPE(_INNER)_L3_IPV4_EXT_UNKNOWN or
RTE_PTYPE(_INNER)_L3_IPV6_EXT_UNKNOWN flags for all IPv4/IPv6 packets
received.
Fixes: 429df3803a ("mlx4: replace some offload flags with packet type")
Fixes: 67fa62bc67 ("mlx5: support checksum offload")
Signed-off-by: Samuel Gauthier <samuel.gauthier@6wind.com>
Signed-off-by: Matthieu Ternisien d'Ouville <matthieu.tdo@6wind.com>
Signed-off-by: Olivier Matz <olivier.matz@6wind.com>
Acked-by: Adrien Mazarguil <adrien.mazarguil@6wind.com>
Retrieving link status information through the link update callback should
be quick and non-blocking.
Mellanox PMDs retrieve this information through ioctl() calls on the
related kernel netdevice. This appears to take a long time to
complete and may cause significant slowdowns in applications.
While these system calls cannot be accelerated, removing the lock on the
private structure allows applications to perform other control operations
from separate threads in the meantime. This function remains safe without
locking as it does not write the private structure, it is only used to
retrieve the name of the netdevice.
Signed-off-by: Matthieu Ternisien d'Ouville <matthieu.tdo@6wind.com>
Signed-off-by: Olivier Matz <olivier.matz@6wind.com>
Acked-by: Adrien Mazarguil <adrien.mazarguil@6wind.com>
Add PCI device ID for ConnectX-5 and enable multi-packet send for PF and VF
along with changing documentation and release note.
Signed-off-by: Yongseok Koh <yskoh@mellanox.com>
Acked-by: Adrien Mazarguil <adrien.mazarguil@6wind.com>
Dseg pointer is not initialized when the first segment is inlined
causing a segmentation fault in such situation.
Fixes: 2a66cf3789 ("net/mlx5: support inline send")
Signed-off-by: Shahaf Shuler <shahafs@mellanox.com>
Acked-by: Adrien Mazarguil <adrien.mazarguil@6wind.com>
Too much data is uselessly written to the Tx doorbell, which since v16.11
may also cause Tx queues to behave erratically and crash applications.
This regression was seen on VF devices when the BlueFlame buffer size is
zero (txq->bf_buf_size) due to the following change:
- txq->bf_offset ^= txq->bf_buf_size;
+ txq->bf_offset ^= (1 << txq->bf_buf_size);
Fixes: 1d88ba1719 ("net/mlx5: refactor Tx data path")
Fixes: d5793daefe ("net/mlx5: reduce memory overhead for BF handling")
Signed-off-by: Nelio Laranjeiro <nelio.laranjeiro@6wind.com>
Acked-by: Adrien Mazarguil <adrien.mazarguil@6wind.com>
Prefetching completion queue entries is inefficient because too few CPU
cycles are spent before their use, which results into cache misses anyway.
Signed-off-by: Nelio Laranjeiro <nelio.laranjeiro@6wind.com>
Acked-by: Adrien Mazarguil <adrien.mazarguil@6wind.com>
Use fewer instructions to copy the first two bytes of Ethernet headers to
work queue elements.
Signed-off-by: Nelio Laranjeiro <nelio.laranjeiro@6wind.com>
Acked-by: Adrien Mazarguil <adrien.mazarguil@6wind.com>
Gather function prototypes at the beginning of the file.
Signed-off-by: Nelio Laranjeiro <nelio.laranjeiro@6wind.com>
Acked-by: Adrien Mazarguil <adrien.mazarguil@6wind.com>
Let compiler automatically use the vector capabilities of the target
machine to optimize instructions.
Signed-off-by: Nelio Laranjeiro <nelio.laranjeiro@6wind.com>
Acked-by: Adrien Mazarguil <adrien.mazarguil@6wind.com>
Define a single work queue element type that encompasses them all. It
includes control, Ethernet segment and raw data all grouped in a single
place.
Signed-off-by: Nelio Laranjeiro <nelio.laranjeiro@6wind.com>
Acked-by: Adrien Mazarguil <adrien.mazarguil@6wind.com>
Prepare the code to write the Work Queue Element with vectorized
instructions.
Signed-off-by: Nelio Laranjeiro <nelio.laranjeiro@6wind.com>
Signed-off-by: Elad Persiko <eladpe@mellanox.com>
Acked-by: Adrien Mazarguil <adrien.mazarguil@6wind.com>
This commits adds:
- Type of service
- Next protocol ID
Signed-off-by: Nelio Laranjeiro <nelio.laranjeiro@6wind.com>
Acked-by: Adrien Mazarguil <adrien.mazarguil@6wind.com>
Introduce initial software for rte_flow rules.
VLAN, VXLAN are still not supported.
Signed-off-by: Nelio Laranjeiro <nelio.laranjeiro@6wind.com>
Acked-by: Adrien Mazarguil <adrien.mazarguil@6wind.com>
Flows redirected to a specific queue do not have a valid RSS hash result
and the related mbuf flag must not be set.
Fixes: ecf60761fc ("net/mlx5: return RSS hash result in mbuf")
Signed-off-by: Nelio Laranjeiro <nelio.laranjeiro@6wind.com>
Acked-by: Adrien Mazarguil <adrien.mazarguil@6wind.com>
We can leave the title completion queue entry untouched since its contents
are not modified.
Reported-by: Liming Sun <lsun@mellanox.com>
Signed-off-by: Nelio Laranjeiro <nelio.laranjeiro@6wind.com>
Acked-by: Adrien Mazarguil <adrien.mazarguil@6wind.com>
Completion queue entry data uses network endian, to access them we should
use ntoh*().
Fixes: c305090bba ("net/mlx5: replace countdown with threshold for Tx completions")
Reported-by: Liming Sun <lsun@mellanox.com>
Signed-off-by: Nelio Laranjeiro <nelio.laranjeiro@6wind.com>
Acked-by: Adrien Mazarguil <adrien.mazarguil@6wind.com>
The list of segments to free was wrongly manipulated ending by only freeing
the first segment instead of freeing all of them. The last one still
belongs to the NIC and thus should not be freed.
Fixes: a1bdb71a32 ("net/mlx5: fix crash in Rx")
Reported-by: Liming Sun <lsun@mellanox.com>
Signed-off-by: Nelio Laranjeiro <nelio.laranjeiro@6wind.com>
Acked-by: Adrien Mazarguil <adrien.mazarguil@6wind.com>
There is already a directory buildtools for pmdinfogen used by
the build system. The scripts used in makefiles are moved here.
Signed-off-by: Thomas Monjalon <thomas.monjalon@6wind.com>
Tested-by: Ferruh Yigit <ferruh.yigit@intel.com>
This makes struct rte_eth_dev independent of struct rte_pci_device by
replacing it with a pointer to the generic struct rte_device.
Signed-off-by: Jan Blunck <jblunck@infradead.org>
Acked-by: Shreyansh Jain <shreyansh.jain@nxp.com>
Only the drivers itself can decide if it could fill PCI information fields
of dev_info.
Signed-off-by: Jan Blunck <jblunck@infradead.org>
Acked-by: Shreyansh Jain <shreyansh.jain@nxp.com>
This moves the non-PCI related initialization of the link state interrupt
callback list and the setting of the default MTU to rte_eth_dev_allocate()
so that drivers only need to set non-default values.
Signed-off-by: Jan Blunck <jblunck@infradead.org>
Reviewed-by: Ferruh Yigit <ferruh.yigit@intel.com>
Lets clear the eth_dev->data when allocating a new rte_eth_dev so that
drivers only need to set non-zero values.
Signed-off-by: Jan Blunck <jblunck@infradead.org>
Reviewed-by: Ferruh Yigit <ferruh.yigit@intel.com>
Add a new macro RTE_PMD_REGISTER_KMOD_DEP() that allows a driver to
declare the list of kernel modules required to run properly.
Today, most PCI drivers require uio/vfio.
Signed-off-by: Olivier Matz <olivier.matz@6wind.com>
Acked-by: Fiona Trahe <fiona.trahe@intel.com>
Acked-by: Adrien Mazarguil <adrien.mazarguil@6wind.com>
This commit redefines the completion queue element structure as the
original lacks the required fields.
Signed-off-by: Nelio Laranjeiro <nelio.laranjeiro@6wind.com>
Acked-by: Adrien Mazarguil <adrien.mazarguil@6wind.com>
When mbufs are smaller than MRU, multi-segment support must be enabled to
default set when not in promiscuous or allmulticast modes.
Fixes: 9964b965ad ("net/mlx5: re-add Rx scatter support")
Signed-off-by: Raslan Darawsheh <rasland@mellanox.com>
Acked-by: Adrien Mazarguil <adrien.mazarguil@6wind.com>
Constraint alignment was not respected in Tx because of a
wrong use of vector instruction.
Fixes: 1d88ba1719 ("net/mlx5: refactor Tx data path")
Signed-off-by: Elad Persiko <eladpe@mellanox.com>
Acked-by: Adrien Mazarguil <adrien.mazarguil@6wind.com>
Not all speed capabilities can be reported properly before Linux 4.8 (25G,
50G and 100G speeds are missing), moreover the API to retrieve them only
exists since Linux 4.5, this commit thus implements compatibility code for
all versions.
Fixes: e274f57322 ("ethdev: add speed capabilities")
Signed-off-by: Nelio Laranjeiro <nelio.laranjeiro@6wind.com>
The changes introduced by previous commits (ones in fixes lines) made
secondaries attempt to reinitialize the Tx queue structures of the primary
instead of their own, for which they also do not allocate enough memory,
leading to crashes.
Fixes: 1d88ba1719 ("net/mlx5: refactor Tx data path")
Fixes: 21c8bb4928 ("net/mlx5: split Tx queue structure")
Signed-off-by: Olivier Gournet <ogournet@corp.free.fr>
Acked-by: Adrien Mazarguil <adrien.mazarguil@6wind.com>
This commit fixes link status report on device start up when
lcs callback is configured.
Fixes: 62072098b5 ("mlx5: support setting link up or down")
Signed-off-by: Olga Shern <olgas@mellanox.com>
add cb_arg parameter to the _rte_eth_dev_callback_process function.
Adding a parameter to this function allows passing information
to the application when an eth device event occurs such as
a VF to PF message.
This allows the application to decide if a particular function
is permitted.
Signed-off-by: Bernard Iremonger <bernard.iremonger@intel.com>
Signed-off-by: Alex Zelezniak <alexz@att.com>
All macros related to driver registeration renamed from DRIVER_*
to RTE_PMD_*
This includes:
DRIVER_REGISTER_PCI -> RTE_PMD_REGISTER_PCI
DRIVER_REGISTER_PCI_TABLE -> RTE_PMD_REGISTER_PCI_TABLE
DRIVER_REGISTER_VDEV -> RTE_PMD_REGISTER_VDEV
DRIVER_REGISTER_PARAM_STRING -> RTE_PMD_REGISTER_PARAM_STRING
DRIVER_EXPORT_* -> RTE_PMD_EXPORT_*
Fix PMDINFOGEN tool to look for matches of RTE_PMD_REGISTER_*.
Signed-off-by: Shreyansh Jain <shreyansh.jain@nxp.com>
Acked-by: Thomas Monjalon <thomas.monjalon@6wind.com>
mlx5_rx_queue_setup() was setting the Rx function by itself instead of
using priv_select_rx_function() written for that purpose.
Fixes: cdab90cb5c ("net/mlx5: add Tx/Rx burst function selection wrapper")
Signed-off-by: Nelio Laranjeiro <nelio.laranjeiro@6wind.com>
Acked-by: Adrien Mazarguil <adrien.mazarguil@6wind.com>
Tx function was handling a double loop to send segmented packets, it can be
done in a single one.
Signed-off-by: Nelio Laranjeiro <nelio.laranjeiro@6wind.com>
Signed-off-by: Vasily Philipov <vasilyf@mellanox.com>
PMD uses only power of two number of Work Queue Elements (aka WQE), storing
the number of elements in log2 helps to reduce the size of the container to
store it.
Signed-off-by: Nelio Laranjeiro <nelio.laranjeiro@6wind.com>
Blue Flame (aka BF) is a buffer allocated with a power of two value, its
size is returned by Verbs in log2.
Signed-off-by: Nelio Laranjeiro <nelio.laranjeiro@6wind.com>
PMD uses only power of two number of Completion Queue Elements (aka CQE),
storing the number of elements in log2 helps to reduce the size of the
container to store it.
Signed-off-by: Nelio Laranjeiro <nelio.laranjeiro@6wind.com>
PMD uses only power of two number of descriptors, storing the number of
elements in log2 helps to reduce the size of the container to store it.
Signed-off-by: Nelio Laranjeiro <nelio.laranjeiro@6wind.com>
Signed-off-by: Adrien Mazarguil <adrien.mazarguil@6wind.com>
Rework Work Queue Element (aka WQE) structures to fit PMD needs.
A WQE is an aggregation of 16 bytes elements known as "data segments"
(aka dseg).
The only common part is the first two elements i.e. the control one to
define the job type, and the Ethernet segment which embed offload requests
with other information, after that, it can have:
- a raw data packet,
- a data pointer to the packet itself,
- both.
Signed-off-by: Nelio Laranjeiro <nelio.laranjeiro@6wind.com>
With recent gcc versions, e.g. gcc 6.1, compilation of mlx drivers with
debug enabled produces lots of errors complaining that "pedantic" is
not a warning level that can be ignored.
error: ‘-pedantic’ is not an option that controls warnings [-Werror=pragmas]
#pragma GCC diagnostic ignored "-pedantic"
^~~~~~~~~~~
These errors can be removed by changing the "-pedantic" to "-Wpedantic".
Fixes: 7fae69eeff ("mlx4: new poll mode driver")
Fixes: 771fa900b7 ("mlx5: introduce new driver for Mellanox ConnectX-4 adapters")
Signed-off-by: Bruce Richardson <bruce.richardson@intel.com>
Acked-by: Adrien Mazarguil <adrien.mazarguil@6wind.com>
To improve performance the NIC expects for large packets to have a pointer
to a cache aligned address, old inline code could break this assumption
which hurts performance.
Fixes: 2a66cf3789 ("net/mlx5: support inline send")
Signed-off-by: Nelio Laranjeiro <nelio.laranjeiro@6wind.com>
Signed-off-by: Vasily Philipov <vasilyf@mellanox.com>
Rework logic of wqe_write() and wqe_write_vlan() which are pretty similar
to keep a single one.
Signed-off-by: Nelio Laranjeiro <nelio.laranjeiro@6wind.com>
This function was supposed to be inlined, but was not because several
functions calls it. This function should always be inline avoid
external function calls and to optimize code in data-path.
Signed-off-by: Nelio Laranjeiro <nelio.laranjeiro@6wind.com>
Packet rejection was routed to a polled queue. This patch route them to a
dummy queue which is not polled.
Fixes: 76f5c99e68 ("mlx5: support flow director")
Signed-off-by: Yaacov Hazan <yaacovh@mellanox.com>
Signed-off-by: Adrien Mazarguil <adrien.mazarguil@6wind.com>
Signed-off-by: Nelio Laranjeiro <nelio.laranjeiro@6wind.com>
This is done to prepare support for drop queues, which are not related to
existing Rx queues and need to be managed separately.
Signed-off-by: Yaacov Hazan <yaacovh@mellanox.com>
Signed-off-by: Adrien Mazarguil <adrien.mazarguil@6wind.com>
memmove was moving bytes as the number of elements next to i, while it
should move the number of elements multiplied by the size of each element.
Fixes: e9086978 ("mlx5: support VLAN filtering")
Signed-off-by: Raslan Darawsheh <rdarawsheh@asaltech.com>
This capability is implemented but not reported.
Fixes: f3db948918 ("mlx5: support Rx VLAN stripping")
Signed-off-by: Adrien Mazarguil <adrien.mazarguil@6wind.com>
The return value in DPDK is negative errno on failure.
Since internal functions in mlx driver return positive
values need to negate this value when it returned to
dpdk layer.
Fixes: 76f5c99 ("mlx5: support flow director")
Signed-off-by: Yaacov Hazan <yaacovh@mellanox.com>
The user is allowed to call ->rx_pkt_burst() even without free
mbufs in the pool. In this scenario we'll fail allocating a rep mbuf
on the first iteration (where pkt is still NULL). This would cause us
to deref a NULL pkt (reset refcount and free).
Fix this by checking the pkt before freeing it.
Fixes: a1bdb71a32 ("net/mlx5: fix crash in Rx")
Signed-off-by: Sagi Grimberg <sagi@grimberg.me>
Acked-by: Adrien Mazarguil <adrien.mazarguil@6wind.com>
Now that rte_device is available, drivers can start using its members
(numa, name) as well as link themselves into another rte_device list.
As of now no one is using this list, but can be used for moving over all
devices (pdev/vdev/Xdev) and perform bulk actions (like cleanup).
Signed-off-by: Jan Viktorin <viktorin@rehivetech.com>
[Shreyansh: Reword commit log for extra rte_device list]
Signed-off-by: Shreyansh Jain <shreyansh.jain@nxp.com>
Acked-by: David Marchand <david.marchand@6wind.com>
Remove the 'name' member from rte_pci_driver and move to generic
rte_driver.
Most of the PMD drivers were initially using DRIVER_REGISTER_PCI(<name>..)
as well as assigning a name to eth_driver.pci_drv.name member.
In this patch, only the original DRIVER_REGISTER_PCI(<name>..) name has
been populated into the rte_driver.name member - assignments through
eth_driver has been removed.
Signed-off-by: Jan Viktorin <viktorin@rehivetech.com>
[Shreyansh: Rebase and expand changes to newly added files]
Signed-off-by: Shreyansh Jain <shreyansh.jain@nxp.com>
Acked-by: David Marchand <david.marchand@6wind.com>
Now that hotplug has been moved to eal, there is no reason to keep the
device type in this layer.
Signed-off-by: David Marchand <david.marchand@6wind.com>
Signed-off-by: Shreyansh Jain <shreyansh.jain@nxp.com>
Simplify crypto and ethdev pci drivers init by using newly introduced
init macros and helpers.
Those drivers then don't need to register as "rte_driver"s anymore.
Exceptions:
- virtio and mlx* use RTE_INIT directly as they have custom initialization
steps.
- VDEV devices are not modified - they continue to use PMD_REGISTER_DRIVER.
Update documentation for replacing an example referring to
PMD_REGISTER_DRIVER.
Signed-off-by: David Marchand <david.marchand@6wind.com>
Signed-off-by: Shreyansh Jain <shreyansh.jain@nxp.com>
Probe and Remove are more appropriate names for PCI init and uninint
operations. This is a cosmetic change.
Only MLX* uses the PCI direct registration, bypassing PMD_* macro.
The callbacks for this too have been updated.
VDEV are left out. For them, init/uninit are more appropriate.
Suggested-by: David Marchand <david.marchand@6wind.com>
Signed-off-by: Shreyansh Jain <shreyansh.jain@nxp.com>
Acked-by: David Marchand <david.marchand@6wind.com>
As discussed in the past release, driver names are modified
to be more consistent, and the future driver should follow
this new convention.
Driver names consist of:
"driver category"_"driver folder name"_"optional extra name".
For example:
- Crypto null driver -> "crypto_null"
- Network IXGBE VF driver -> "net_ixgbe_vf"
Signed-off-by: Pablo de Lara <pablo.de.lara.guarch@intel.com>
RHEL 7.1's GCC for POWER8 reports the following error in one rte_memcpy()
macro call by mlx5:
error: array subscript is above array bounds [-Werror=array-bounds]
It appears to be a GCC bug which can be worked around by making parentheses
more explicit.
Signed-off-by: Olga Shern <olgas@mellanox.com>
Signed-off-by: Adrien Mazarguil <adrien.mazarguil@6wind.com>
Fixed issue could occur when Mbuf starvation happens in a middle of
reception of a segmented packet. In such a situation, the PMD has to
release all segments of that packet. The end condition was wrong
causing it to free an Mbuf still handled by the NIC.
Fixes: 9964b965ad ("net/mlx5: re-add Rx scatter support")
Reported-by: Yongseok Koh <yskoh@mellanox.com>
Signed-off-by: Nelio Laranjeiro <nelio.laranjeiro@6wind.com>
Acked-by: Adrien Mazarguil <adrien.mazarguil@6wind.com>
In mlx5 rx function, the packet_type and ol_flags mbuf fields are not
properly initialized when no rx offload feature is enabled (checksum, l2
tun checksum, vlan_strip, crc). Thus, these fields can have a value
different of 0 depending on their value when the mbuf was freed.
This can result in an incorrect application behavior if invalid
ol_flags/ptype are set, or memory corruptions if IND_ATTACHED_MBUF is
set in ol_flags.
Fixes: 081f7eae24 ("mlx5: process offload flags only when requested")
Signed-off-by: Maxime Leroy <maxime.leroy@6wind.com>
Acked-by: Adrien Mazarguil <adrien.mazarguil@6wind.com>
Fixes: 62072098b5 ("mlx5: support setting link up or down")
Signed-off-by: Olivier Matz <olivier.matz@6wind.com>
Acked-by: Adrien Mazarguil <adrien.mazarguil@6wind.com>
According to the documentation, the function
priv_set_flags(priv, keep, flags) should not modify the flags
in "keep" mask.
So 'flags' argument should be masked with '~keep' before ORing
it with the previous flags value.
This avoids messing up the kernel interface flags when calling
priv_set_flags(priv, ~IFF_UP, ~IFF_UP) in priv_set_link():
$ ip link
26: eth0: BROADCAST,MULTICAST,NOARP,ALLMULTI,PROMISC,DEBUG,\
DYNAMIC,AUTOMEDIA,PORTSEL,NOTRAILERS
Fixes: 7fae69eeff ("mlx4: new poll mode driver")
Fixes: 771fa900b7 ("mlx5: introduce new driver for Mellanox ConnectX-4 adapters")
Reported-by: Fengtian Guo <fengtian.guo@6wind.com>
Signed-off-by: Olivier Matz <olivier.matz@6wind.com>
Acked-by: Adrien Mazarguil <adrien.mazarguil@6wind.com>
Since now the PMD_REGISTER_DRIVER macro sets the driver names,
there is no need to have the rte_driver structure setting it
statically, as it will get overridden.
Signed-off-by: Pablo de Lara <pablo.de.lara.guarch@intel.com>
Acked-by: Neil Horman <nhorman@tuxdriver.com>
Compilation fails because of some typos.
Fixes: cb6696d220 ("drivers: update registration macro usage")
Signed-off-by: Thomas Monjalon <thomas.monjalon@6wind.com>
Acked-by: Neil Horman <nhorman@tuxdriver.com>
Tested-by: Adrien Mazarguil <adrien.mazarguil@6wind.com>
Modify the PMD_REGISTER_DRIVER macro, adding a name argument to it. The
addition of a name argument creates a token that can be used for subsequent
macros in the creation of unique symbol names to export additional bits of
information for use by the pmdinfogen tool. For example:
PMD_REGISTER_DRIVER(ena_driver, ena);
registers the ena_driver struct as it always did, and creates a symbol
const char this_pmd_name0[] __attribute__((used)) = "ena";
which pmdinfogen can search for and extract. The subsequent macro
DRIVER_REGISTER_PCI_TABLE(ena, ena_pci_id_map);
creates a symbol const char ena_pci_tbl_export[] __attribute__((used)) =
"ena_pci_id_map";
Which allows pmdinfogen to find the pci table of this driver
Using this pattern, we can export arbitrary bits of information.
pmdinfo uses this information to extract hardware support from an object
file and create a json string to make hardware support info discoverable
later.
Signed-off-by: Neil Horman <nhorman@tuxdriver.com>
Acked-by: Panu Matilainen <pmatilai@redhat.com>
Acked-by: Remy Horton <remy.horton@intel.com>
Compilation errors:
mlx4:
drivers/net/mlx4/mlx4.c(5409): error #188:
enumerated type mixed with another type
priv->intr_handle.type = 0;
^
mlx5:
drivers/net/mlx5/mlx5_rxq.c(282): error #188:
enumerated type mixed with another type
enum hash_rxq_type type = 0;
^
and more same type of error.
Fix these by assigning enum values rather than integer values to the enum
variables
Fixes: c4da6caa42 ("mlx4: handle link status interrupts")
Fixes: 198a3c339a ("mlx5: handle link status interrupts")
Fixes: 0d2186743d ("mlx5: manage all special flow types at once")
Fixes: 612ad38209 ("mlx5: fix hash Rx queue type in RSS mode")
Fixes: 083c2dd317 ("mlx5: refactor special flows handling")
Signed-off-by: Ferruh Yigit <ferruh.yigit@intel.com>
Acked-by: Adrien Mazarguil <adrien.mazarguil@6wind.com>
This commit brings back Rx scatter and related support by the MTU update
function. The maximum number of segments per packet is not a fixed value
anymore (previously MLX5_PMD_SGE_WR_N, set to 4 by default) as it caused
performance issues when fewer segments were actually needed as well as
limitations on the maximum packet size that could be received with the
default mbuf size (supporting at most 8576 bytes).
These limitations are now lifted as the number of SGEs is derived from the
MTU (which implies MRU) at queue initialization and during MTU update.
Signed-off-by: Adrien Mazarguil <adrien.mazarguil@6wind.com>
Signed-off-by: Vasily Philipov <vasilyf@mellanox.com>
Signed-off-by: Nelio Laranjeiro <nelio.laranjeiro@6wind.com>
The primary purpose of rxq_rehash() function is to stop and restart
reception on a queue after re-posting buffers. This may fail if the array
that temporarily stores existing buffers for reuse cannot be allocated.
Update rxq_rehash() to work on the target queue directly (not through a
template copy) and avoid this allocation.
rxq_alloc_elts() is modified accordingly to take buffers from an existing
queue directly and update their refcount.
Unlike rxq_rehash(), rxq_setup() must work on a temporary structure but
should not allocate new mbufs from the pool while reinitializing an
existing queue. This is achieved by using the refcount-aware
rxq_alloc_elts() before overwriting queue data.
Signed-off-by: Adrien Mazarguil <adrien.mazarguil@6wind.com>
Signed-off-by: Vasily Philipov <vasilyf@mellanox.com>
Toggling RX checksum offloads is already done at initialization time. This
code does not belong in rxq_rehash().
Signed-off-by: Adrien Mazarguil <adrien.mazarguil@6wind.com>
Signed-off-by: Nelio Laranjeiro <nelio.laranjeiro@6wind.com>
Compared to its previous incarnation, the software limit on the number of
mbuf segments is no more (previously MLX5_PMD_SGE_WR_N, set to 4 by
default) hence no need for linearization code and related buffers that
permanently consumed a non negligible amount of memory to handle oversized
mbufs.
The resulting code is both lighter and faster.
With the addition of this code, older GCC versions (such
as 4.8.5) may complain about 'wqe' variable being uninitialized, so
initialize it preemptively, even though it is not necessary to do so.
Signed-off-by: Adrien Mazarguil <adrien.mazarguil@6wind.com>
Signed-off-by: Nelio Laranjeiro <nelio.laranjeiro@6wind.com>
The space necessary to store segmented packets cannot be known in advance
and must be verified for each of them.
Signed-off-by: Adrien Mazarguil <adrien.mazarguil@6wind.com>
Signed-off-by: Nelio Laranjeiro <nelio.laranjeiro@6wind.com>
This feature enables the TX burst function to emit up to 5 packets using
only two work queue entries (WQEs) on devices that support it. Saves PCI
bandwidth and improves performance.
Signed-off-by: Nelio Laranjeiro <nelio.laranjeiro@6wind.com>
Signed-off-by: Adrien Mazarguil <adrien.mazarguil@6wind.com>
Signed-off-by: Olga Shern <olgas@mellanox.com>
Implement send inline feature which copies packet data directly into
work queue entries (WQEs) for improved latency. The maximum packet
size and the minimum number of Tx queues to qualify for inline send
are user-configurable.
This feature is effective when HW causes a performance bottleneck.
Signed-off-by: Yaacov Hazan <yaacovh@mellanox.com>
Signed-off-by: Adrien Mazarguil <adrien.mazarguil@6wind.com>
Signed-off-by: Nelio Laranjeiro <nelio.laranjeiro@6wind.com>
Replacing the variable countdown (which depends on the number of
descriptors) with a fixed relative threshold known at compile time improves
performance by reducing the TX queue structure footprint and the amount of
code to manage completions during a burst.
Completions are now requested at most once per burst after threshold is
reached.
Signed-off-by: Adrien Mazarguil <adrien.mazarguil@6wind.com>
Signed-off-by: Nelio Laranjeiro <nelio.laranjeiro@6wind.com>
Signed-off-by: Vasily Philipov <vasilyf@mellanox.com>
Mini (compressed) completion queue entries (CQEs) are returned by the
NIC when PCI back pressure is detected, in which case the first CQE64
contains common packet information followed by a number of CQE8
providing the rest, followed by a matching number of empty CQE64
entries to be used by software for decompression.
Before decompression:
0 1 2 6 7 8
+-------+ +---------+ +-------+ +-------+ +-------+ +-------+
| CQE64 | | CQE64 | | CQE64 | | CQE64 | | CQE64 | | CQE64 |
|-------| |---------| |-------| |-------| |-------| |-------|
| ..... | | cqe8[0] | | | . | | | | | ..... |
| ..... | | cqe8[1] | | | . | | | | | ..... |
| ..... | | ....... | | | . | | | | | ..... |
| ..... | | cqe8[7] | | | | | | | | ..... |
+-------+ +---------+ +-------+ +-------+ +-------+ +-------+
After decompression:
0 1 ... 8
+-------+ +-------+ +-------+
| CQE64 | | CQE64 | | CQE64 |
|-------| |-------| |-------|
| ..... | | ..... | . | ..... |
| ..... | | ..... | . | ..... |
| ..... | | ..... | . | ..... |
| ..... | | ..... | | ..... |
+-------+ +-------+ +-------+
This patch does not perform the entire decompression step as it would be
really expensive, instead the first CQE64 is consumed and an internal
context is maintained to interpret the following CQE8 entries directly.
Intermediate empty CQE64 entries are handed back to HW without further
processing.
Signed-off-by: Nelio Laranjeiro <nelio.laranjeiro@6wind.com>
Signed-off-by: Adrien Mazarguil <adrien.mazarguil@6wind.com>
Signed-off-by: Olga Shern <olgas@mellanox.com>
Signed-off-by: Vasily Philipov <vasilyf@mellanox.com>
The intent is to replace the remaining compile-time options and environment
variables with a common mean of runtime configuration. This commit only
adds the kvargs handling code, subsequent commits will update the rest.
Signed-off-by: Nelio Laranjeiro <nelio.laranjeiro@6wind.com>
Signed-off-by: Adrien Mazarguil <adrien.mazarguil@6wind.com>
These structures and macros extend those exposed by libmlx5 (in mlx5_hw.h)
to let the PMD manage work queue and completion queue elements directly.
Signed-off-by: Nelio Laranjeiro <nelio.laranjeiro@6wind.com>
Signed-off-by: Adrien Mazarguil <adrien.mazarguil@6wind.com>
The latest version of Mellanox OFED exposes hardware definitions necessary
to implement data path operation bypassing Verbs. Update the minimum
version requirement to MLNX_OFED >= 3.3 and clean up compatibility checks
for previous releases.
Signed-off-by: Nelio Laranjeiro <nelio.laranjeiro@6wind.com>
Signed-off-by: Adrien Mazarguil <adrien.mazarguil@6wind.com>
To keep the data path as efficient as possible, move fields only useful to
the control path into new structure rxq_ctrl.
Signed-off-by: Nelio Laranjeiro <nelio.laranjeiro@6wind.com>
Signed-off-by: Adrien Mazarguil <adrien.mazarguil@6wind.com>
To keep the data path as efficient as possible, move fields only useful to
the control path into new structure txq_ctrl.
Signed-off-by: Nelio Laranjeiro <nelio.laranjeiro@6wind.com>
Signed-off-by: Adrien Mazarguil <adrien.mazarguil@6wind.com>
Inline TX will be fully managed by the PMD after Verbs is bypassed in the
data path. Remove the current code until then.
Signed-off-by: Nelio Laranjeiro <nelio.laranjeiro@6wind.com>
Signed-off-by: Adrien Mazarguil <adrien.mazarguil@6wind.com>
There is no scatter/gather support anymore, CONFIG_RTE_LIBRTE_MLX5_SGE_WR_N
has no purpose and can be removed.
Signed-off-by: Nelio Laranjeiro <nelio.laranjeiro@6wind.com>
Signed-off-by: Adrien Mazarguil <adrien.mazarguil@6wind.com>
This is done in preparation of bypassing Verbs entirely for the data path
as a performance improvement. RX scatter cannot be maintained during the
transition and will be reimplemented later.
Signed-off-by: Nelio Laranjeiro <nelio.laranjeiro@6wind.com>
Signed-off-by: Adrien Mazarguil <adrien.mazarguil@6wind.com>
This is done in preparation of bypassing Verbs entirely for the data path
as a performance improvement. TX gather cannot be maintained during the
transition and will be reimplemented later.
Signed-off-by: Nelio Laranjeiro <nelio.laranjeiro@6wind.com>
Signed-off-by: Adrien Mazarguil <adrien.mazarguil@6wind.com>
Except for the first time when memory registration occurs, the lkey is
always cached. Since memory registration is slow and performs system calls,
performance can be improved by moving that code to its own function outside
of the data path so only the lookup code is left in the original inlined
function.
Signed-off-by: Nelio Laranjeiro <nelio.laranjeiro@6wind.com>
Signed-off-by: Adrien Mazarguil <adrien.mazarguil@6wind.com>
Use RTE_PCI_DEVICE macro to set all fields rather than explicitly setting
them individually in the code. This shortens the code while helping to
future-proof against future changes to the rte_pci_id structure.
Fixes: 701c8d80c8 ("pci: support class id probing")
Signed-off-by: Nelio Laranjeiro <nelio.laranjeiro@6wind.com>
SR-IOV mode is currently set when dealing with VF devices. PF devices must
be taken into account as well if they have active VFs.
Signed-off-by: Nelio Laranjeiro <nelio.laranjeiro@6wind.com>
Signed-off-by: Adrien Mazarguil <adrien.mazarguil@6wind.com>
A hardware capability check is missing before enabling RX VLAN stripping
during queue setup.
Also, while dev_conf.rxmode.hw_vlan_strip is currently a single bit that
can be stored in priv->hw_vlan_strip directly, it should be interpreted as
a boolean value for safety.
Fixes: f3db948918 ("mlx5: support Rx VLAN stripping")
Signed-off-by: Adrien Mazarguil <adrien.mazarguil@6wind.com>
There is no guarantee that the new MTU is effective after writing its value
to sysfs. Retrieve it to be sure.
Signed-off-by: Adrien Mazarguil <adrien.mazarguil@6wind.com>
Memory regions are always local with raw Ethernet queues, drop the remote
property as it adds extra processing on the hardware side.
Signed-off-by: Nelio Laranjeiro <nelio.laranjeiro@6wind.com>
Signed-off-by: Adrien Mazarguil <adrien.mazarguil@6wind.com>
Since _BSD_SOURCE was deprecated in favor of _DEFAULT_SOURCE in Glibc 2.19
and entirely removed in 2.20, various BSD ioctl macros are not exposed
anymore when _XOPEN_SOURCE is defined, and linux/if.h now conflicts with
net/if.h.
Add _DEFAULT_SOURCE and keep _BSD_SOURCE for compatibility with older
versions.
Suggested-by: Bruce Richardson <bruce.richardson@intel.com>
Signed-off-by: Adrien Mazarguil <adrien.mazarguil@6wind.com>
Acked-by: Bruce Richardson <bruce.richardson@intel.com>
The behavior of PKT_RX_VLAN_PKT was not very well defined, resulting in
PMDs not advertising the same flags in similar conditions.
Following discussion in [1], introduce 2 new flags PKT_RX_VLAN_STRIPPED
and PKT_RX_QINQ_STRIPPED that are better defined:
PKT_RX_VLAN_STRIPPED: a vlan has been stripped by the hardware and its
tci is saved in mbuf->vlan_tci. This can only happen if vlan stripping
is enabled in the RX configuration of the PMD.
For now, the old flag PKT_RX_VLAN_PKT is kept but marked as deprecated.
It should be removed from applications and PMDs in a future revision.
This patch also updates the drivers. For PKT_RX_VLAN_PKT:
- e1000, enic, i40e, mlx5, nfp, vmxnet3: done, PKT_RX_VLAN_PKT already
had the same meaning than PKT_RX_VLAN_STRIPPED, minor update is
required.
- fm10k: done, PKT_RX_VLAN_PKT already had the same meaning than
PKT_RX_VLAN_STRIPPED, and vlan stripping is always enabled on fm10k.
- ixgbe: modification done (vector and normal), the old flag was set
when a vlan was recognized, even if vlan stripping was disabled.
- the other drivers do not support vlan stripping.
For PKT_RX_QINQ_PKT, it was only supported on i40e, and the behavior was
already correct, so we can reuse the same bit value for
PKT_RX_QINQ_STRIPPED.
[1] http://dpdk.org/ml/archives/dev/2016-April/037837.html,
Signed-off-by: Olivier Matz <olivier.matz@6wind.com>
Acked-by: Konstantin Ananyev <konstantin.ananyev@intel.com>
Some architectures (ex: Power8) have a cache line size of 128 bytes,
so the drivers should not expect that prefetching the second part of
the mbuf with rte_prefetch0(&m->cacheline1) is valid.
This commit add helpers that can be used by drivers to prefetch the
rx or tx part of the mbuf, whatever the cache line size.
Signed-off-by: Olivier Matz <olivier.matz@6wind.com>
Reviewed-by: Jerin Jacob <jerin.jacob@caviumnetworks.com>
Introduce rte_mempool_populate_default() which allocates
mempool objects in several memzones.
The mempool header is now always allocated in a specific memzone
(not with its objects). Thanks to this modification, we can remove
many specific behavior that was required when hugepages are not
enabled in case we are using rte_mempool_xmem_create().
This change requires to update how kni and mellanox drivers lookup for
mbuf memory. For now, this will only work if there is only one memory
chunk (like today), but we could make use of rte_mempool_mem_iter() to
support more memory chunks.
We can also remove RTE_MEMPOOL_OBJ_NAME that is not required anymore for
the lookup, as memory chunks are referenced by the mempool.
Note that rte_mempool_create() is still broken (it was the case before)
when there is no hugepages support (rte_mempool_create_xmem() has to be
used). This is fixed in next commit.
Signed-off-by: Olivier Matz <olivier.matz@6wind.com>
Do not use paddr table to store the mempool memory chunks.
This will allow to have several chunks with different virtual addresses.
Signed-off-by: Olivier Matz <olivier.matz@6wind.com>
Now that the mempool objects are chained into a list, we can use it to
browse them. This implies a rework of rte_mempool_obj_iter() API, that
does not need to take as many arguments as before. The previous function
is kept as a private function, and renamed in this commit. It will be
removed in a next commit of the patch series.
The only internal users of this function are the mellanox drivers. The
code is updated accordingly.
Introducing an API compatibility for this function has been considered,
but it is not easy to do without keeping the old code, as the previous
function could also be used to browse elements that were not added in a
mempool. Moreover, the API is already be broken by other patches in this
version.
The library version was already updated in
commit 213af31e09 ("mempool: reduce structure size if no cache needed")
Signed-off-by: Olivier Matz <olivier.matz@6wind.com>
This commit removes the const qualifier for the mempool in
rte_mempool_walk() callback prototype.
Indeed, most functions that can be done on a mempool require a non-const
mempool pointer, except the dump and the audit. Therefore, the
mempool_walk() is more useful if the mempool pointer is not const.
This is required by next commit where the mellanox drivers use
rte_mempool_walk() to iterate the mempools, then rte_mempool_obj_iter()
to iterate the objects in each mempool.
Signed-off-by: Olivier Matz <olivier.matz@6wind.com>
Many drivers provide their own implementation of rte_mbuf_raw_alloc(),
duplicating the code. Introduce a new public function in rte_mbuf to
allocate a raw mbuf (uninitialized).
Signed-off-by: Olivier Matz <olivier.matz@6wind.com>
The link speed configuration is now done with bitmaps so 100G speed
requires only a new bit flag.
The actual link speed is a number so its size must be increased from
16-bit to 32-bit.
Signed-off-by: Marc Sune <marcdevel@gmail.com>
Signed-off-by: Thomas Monjalon <thomas.monjalon@6wind.com>
Tested-by: Nelio Laranjeiro <nelio.laranjeiro@6wind.com>
Tested-by: Matej Vido <vido@cesnet.cz>
This patch redesigns the API to set the link speed/s configuration
of an ethernet port. Specifically:
- it allows to define a set of advertised speeds for
auto-negociation.
- it allows to disable link auto-negociation (single fixed speed).
- default: auto-negociate all supported speeds.
A flag autoneg in struct rte_eth_link indicates if link speed was a
result of auto-negociation or was fixed by configuration.
Signed-off-by: Marc Sune <marcdevel@gmail.com>
Tested-by: Nelio Laranjeiro <nelio.laranjeiro@6wind.com>
Signed-off-by: Thomas Monjalon <thomas.monjalon@6wind.com>
Tested-by: Beilei Xing <beilei.xing@intel.com>
Tested-by: Adrien Mazarguil <adrien.mazarguil@6wind.com>
The speed capabilities of a device can be retrieved with
rte_eth_dev_info_get().
The new field speed_capa is initialized in the drivers without
taking care of device characteristics in this patch.
When the capabilities of a driver are accurate, the table in
overview.rst must be filled.
Signed-off-by: Marc Sune <marcdevel@gmail.com>
When the number of RX queues is not a power of two,
the RETA table is configured to its maximum size for
better balancing.
Testing showed that limiting its size to 256 improves performance
noticeably with little to no impact on balancing results.
Fixes: ebb30ec64a ("mlx5: increase RETA table size")
Signed-off-by: Yaacov Hazan <yaacovh@mellanox.com>
Acked-by: Adrien Mazarguil <adrien.mazarguil@6wind.com>
Once freed, completed mbufs pointers are not set to NULL in the TX queue.
Clean up function must take this into account.
Fixes: 2e22920b85 ("mlx5: support non-scattered Tx and Rx")
Fixes: 7fae69eeff ("mlx4: new poll mode driver")
Signed-off-by: Adrien Mazarguil <adrien.mazarguil@6wind.com>
Update function can be called with no key to enable or disable a RSS
protocol, or with a key to be applied to the desired protocols.
Fixes: 2f97422e77 ("mlx5: support RSS hash update and get")
Signed-off-by: Nelio Laranjeiro <nelio.laranjeiro@6wind.com>
Acked-by: Adrien Mazarguil <adrien.mazarguil@6wind.com>
RSS configuration provided by the application should not be used as storage
by the PMD.
Fixes: 2f97422e77 ("mlx5: support RSS hash update and get")
Signed-off-by: Nelio Laranjeiro <nelio.laranjeiro@6wind.com>
Acked-by: Adrien Mazarguil <adrien.mazarguil@6wind.com>
VLAN insertion can be done in hardware when supported in Verbs. A software
fallback is provided otherwise. The software implementation is also used
when multi-packet send is enabled on a queue, as both features are mutually
exclusive.
Signed-off-by: Yaacov Hazan <yaacovh@mellanox.com>
Signed-off-by: Adrien Mazarguil <adrien.mazarguil@6wind.com>
Environment variable MLX5_PMD_ENABLE_PADDING enables HW packet padding
in PCI bus transactions.
When packet size is cache aligned and CRC stripping is enabled, 4 fewer
bytes are written to the PCI bus. Enabling padding makes such packets
aligned again.
In cases where PCI bandwidth is the bottleneck, padding can improve
performance by 10%.
This is disabled by default since this can also decrease performance for
unaligned packet sizes.
Signed-off-by: Olga Shern <olgas@mellanox.com>
fix packet padding macro check
Signed-off-by: Adrien Mazarguil <adrien.mazarguil@6wind.com>
Secondary processes are expected to use queues and other resources
allocated by the primary, however Verbs resources can only be shared
between processes when inherited through fork().
This limitation can be worked around for TX by configuring separate queues
from secondary processes.
Signed-off-by: Or Ami <ora@mellanox.com>
Add driver functions to set link state up or down.
Burst functions are updated to make sure applications cannot attempt to
send/receive after link is brought down.
Signed-off-by: Or Ami <ora@mellanox.com>
Add a new API rte_eth_dev_get_supported_ptypes to query what packet types
can be filled by a given device. The device should be already started or
its PMD RX burst function already decided, since the packet types supported
may vary depending on RX function.
Signed-off-by: Jianfeng Tan <jianfeng.tan@intel.com>
Acked-by: Konstantin Ananyev <konstantin.ananyev@intel.com>
Acked-by: Adrien Mazarguil <adrien.mazarguil@6wind.com>
Change rxq_cq_to_ol_flags() to set checksum flags according to packet type,
so for non L3/L4 packets the mbuf chksum_bad flags will not be set.
Fixes: 67fa62bc67 ("mlx5: support checksum offload")
Signed-off-by: Yaacov Hazan <yaacovh@mellanox.com>
RSS configuration should not be freed when priv is NULL.
Fixes: 2f97422e77 ("mlx5: support RSS hash update and get")
Signed-off-by: Or Ami <ora@mellanox.com>
The first and last memory pool elements are usually cache-aligned but not
page-aligned, particularly when using huge pages.
Hardware performance can be improved significantly by registering memory
regions starting and ending on page boundaries.
Signed-off-by: Adrien Mazarguil <adrien.mazarguil@6wind.com>
Avoid dereferencing pointers twice to get to fast Verbs functions by
storing them directly in RX/TX queue structures.
Signed-off-by: Nelio Laranjeiro <nelio.laranjeiro@6wind.com>
Signed-off-by: Yaacov Hazan <yaacovh@mellanox.com>
Allows HW to strip the 802.1Q header from incoming frames and report it
through the mbuf structure.
This feature requires MLNX_OFED >= 3.2.
Signed-off-by: Yaacov Hazan <yaacovh@mellanox.com>
Signed-off-by: Adrien Mazarguil <adrien.mazarguil@6wind.com>
Upcoming flow director support will reuse this function to generate filter
rules.
Signed-off-by: Yaacov Hazan <yaacovh@mellanox.com>
Signed-off-by: Adrien Mazarguil <adrien.mazarguil@6wind.com>
Until now, broadcast frames were handled like unicast. Moving the related
flow to the special flows table frees up the related unicast MAC entry.
The same method is used to handle IPv6 multicast frames.
Signed-off-by: Yaacov Hazan <yaacovh@mellanox.com>
Signed-off-by: Adrien Mazarguil <adrien.mazarguil@6wind.com>
Merge redundant code by adding a static initialization table to manage
promiscuous and allmulticast (special) flows.
New function priv_rehash_flows() implements the logic to enable/disable
relevant flows in one place from any context.
Signed-off-by: Yaacov Hazan <yaacovh@mellanox.com>
Signed-off-by: Adrien Mazarguil <adrien.mazarguil@6wind.com>
ConnectX-4 NICs can handle at most 512 entries in RETA table.
Signed-off-by: Nelio Laranjeiro <nelio.laranjeiro@6wind.com>
Acked-by: Adrien Mazarguil <adrien.mazarguil@6wind.com>
The physically linked-together combined library has been an increasing
source of problems, as was predicted when library and symbol versioning
was introduced. Replace the complex and fragile construction with a
simple linker script which achieves the same without all the problems,
remove the related kludges from eg mlx drivers.
Since creating the linker script is practically zero cost, remove the
config option and just create it always.
Based on a patch by Sergio Gonzales Monroy, linker script approach
initially suggested by Neil Horman.
Suggested-by: Sergio Gonzalez Monroy <sergio.gonzalez.monroy@intel.com>
Suggested-by: Neil Horman <nhorman@tuxdriver.com>
Signed-off-by: Panu Matilainen <pmatilai@redhat.com>
Signed-off-by: Thomas Monjalon <thomas.monjalon@6wind.com>
fix the error reported by checkpatch:
"ERROR: return is not a function, parentheses are not required"
remove parentheses in return like:
"return (logical expressions)"
remove parentheses in return a function like:
"return (rte_mempool_lookup(...))"
Fixes: 6307b909b8 ("lib: remove extra parenthesis after return")
Signed-off-by: Huawei Xie <huawei.xie@intel.com>
The number of available entries in TX rings is taken before performing
completion, effectively making rings smaller than they are and causing
TX performance issues under load.
Fixes: 2e22920b85 ("mlx5: support non-scattered Tx and Rx")
Signed-off-by: Nelio Laranjeiro <nelio.laranjeiro@6wind.com>
Signed-off-by: Adrien Mazarguil <adrien.mazarguil@6wind.com>
When MP to MR cache is full, the last (newest) MR is freed instead of the
first (oldest) one, causing local protection errors during TX.
Fixes: 2e22920b85 ("mlx5: support non-scattered Tx and Rx")
Signed-off-by: Olga Shern <olgas@mellanox.com>
Signed-off-by: Adrien Mazarguil <adrien.mazarguil@6wind.com>
Pre-registering mbuf memory pools when creating TX queues avoids costly
registrations later in the data path.
Fixes: 2e22920b85 ("mlx5: support non-scattered Tx and Rx")
Signed-off-by: Olga Shern <olgas@mellanox.com>
Signed-off-by: Adrien Mazarguil <adrien.mazarguil@6wind.com>
Buffers with too many segments are linearized to overcome
MLX5_PMD_SGE_WR_N, unfortunately the last segment is never sent.
Fixes: 3ee8444608 ("mlx5: support scattered Rx and Tx")
Signed-off-by: Jesper Wramberg <jesper.wramberg@gmail.com>
Signed-off-by: Adrien Mazarguil <adrien.mazarguil@6wind.com>
Indirect mbuf data may come from a different mempool which must be
registered separately as another memory region, otherwise such mbufs cannot
be sent.
Fixes: 2e22920b85 ("mlx5: support non-scattered Tx and Rx")
Signed-off-by: Jesper Wramberg <jesper.wramberg@gmail.com>
Signed-off-by: Adrien Mazarguil <adrien.mazarguil@6wind.com>
A typo causes TX stats indices to be retrieved from RX queues.
Fixes: 87011737b7 ("mlx5: add software counters")
Reported-by: Nicolas Harnois <nicolas.harnois@6wind.com>
Signed-off-by: Adrien Mazarguil <adrien.mazarguil@6wind.com>
Only seen since IPv6 RSS support was added, confusion about the purpose of
the hash_rxq_type_from_n() function has caused it to return invalid hash RX
queue types.
Refactor function for its intended purpose, rename it
hash_rxq_type_from_pos() and update comment with a better description.
Fixes: a76133214d ("mlx5: use separate indirection table for default hash Rx queue")
Reported-by: Olga Shern <olgas@mellanox.com>
Signed-off-by: Adrien Mazarguil <adrien.mazarguil@6wind.com>
Signed-off-by: Nelio Laranjeiro <nelio.laranjeiro@6wind.com>
The following error occurs when CONFIG_RTE_LIBRTE_MLX5_DEBUG=y:
drivers/net/mlx5/mlx5.c:381:4: error: ISO C forbids braced-groups within expressions
RTE_MIN() uses the non-standard ({ ... }) syntax to declare variables within
parentheses, which is rejected by -pedantic.
Since the RSS_INDIRECTION_TABLE_SIZE check is meant to go away as soon as
DPDK supports larger/variable indirection tables, put it in a separate
condition.
Fixes: 634efbc2c8 ("mlx5: support RETA query and update")
Signed-off-by: Adrien Mazarguil <adrien.mazarguil@6wind.com>
Use new function rte_eth_copy_pci_info.
Copy device info for the following pdevs:
bnx2x
cxgbe
e1000
enic
fm10k
i40e
ixgbe
mlx4
mlx5
virtio
vmxnet3
Signed-off-by: Bernard Iremonger <bernard.iremonger@intel.com>
Acked-by: Bruce Richardson <bruce.richardson@intel.com>
ConnectX-4 adapters do not have a constant indirection table size, which is
set at runtime from the number of RX queues. The maximum size is retrieved
using a hardware query and is normally 512.
Since the current RETA API cannot handle a variable size, any query/update
command causes it to be silently updated to RSS_INDIRECTION_TABLE_SIZE
entries regardless of the original size.
Also due to the underlying type of the configuration structure, the maximum
size is limited to RSS_INDIRECTION_TABLE_SIZE (currently 128, at most 256
entries).
A port stop/start must be done to apply the new RETA configuration.
Signed-off-by: Nelio Laranjeiro <nelio.laranjeiro@6wind.com>
Add interrupts handler for port status notification.
Signed-off-by: Nelio Laranjeiro <nelio.laranjeiro@6wind.com>
Signed-off-by: Adrien Mazarguil <adrien.mazarguil@6wind.com>
Seen with GCC < 4.6:
error: unknown field ‘tcp_udp’ specified in initializer
error: extra brace group at end of initializer
Static initialization of anonymous structs/unions is a C11 feature
properly supported only since GCC 4.6.
Work around compilation errors with older versions by expanding
struct ibv_exp_flow_spec into struct hash_rxq_init.
Signed-off-by: Yaacov Hazan <yaacovh@mellanox.com>
Signed-off-by: Adrien Mazarguil <adrien.mazarguil@6wind.com>
Normal flows do not currently provide IPv6 support.
Signed-off-by: Adrien Mazarguil <adrien.mazarguil@6wind.com>
Signed-off-by: Nelio Laranjeiro <nelio.laranjeiro@6wind.com>
Only a single flow per hash RX queue is needed in promiscuous mode.
Disable others to free up hardware resources.
Signed-off-by: Nelio Laranjeiro <nelio.laranjeiro@6wind.com>
Signed-off-by: Adrien Mazarguil <adrien.mazarguil@6wind.com>
Signed-off-by: Yaacov Hazan <yaacovh@mellanox.com>
DPDK expects to have an RSS hash key per flow type (IPv4, IPv6, UDPv4,
etc.), to handle this the PMD must keep a table of hash keys to be able
to reconfigure the queues at each start/stop call.
Signed-off-by: Nelio Laranjeiro <nelio.laranjeiro@6wind.com>
Signed-off-by: Adrien Mazarguil <adrien.mazarguil@6wind.com>
First implementation of rss_hash_update and rss_hash_conf_get, those
functions still lack in functionality but are usable to change the RSS
hash key. For now, the PMD does not handle an indirection table for
each kind of flow (IPv4, IPv6, etc.), the same RSS hash key is used
for all protocols. This situation explains why the rss_hash_conf_get
returns the RSS hash key for all DPDK supported protocols and why the
hash key is set for all of them too.
Signed-off-by: Nelio Laranjeiro <nelio.laranjeiro@6wind.com>
Signed-off-by: Adrien Mazarguil <adrien.mazarguil@6wind.com>
Promiscuous and allmulticast modes were historically enabled by adding
specific flows with types IBV_FLOW_ATTR_ALL_DEFAULT or
IBV_EXP_FLOW_ATTR_MC_DEFAULT to each hash RX queue, but this method is
deprecated.
- Promiscuous mode is now enabled by omitting destination MAC addresses from
basic flow specifications.
- Allmulticast mode is now enabled by using flow specifications that match
the broadcast bit in destination MAC addresses.
Signed-off-by: Olga Shern <olgas@mellanox.com>
Signed-off-by: Adrien Mazarguil <adrien.mazarguil@6wind.com>
Signed-off-by: Nelio Laranjeiro <nelio.laranjeiro@6wind.com>
Signed-off-by: Yaacov Hazan <yaacovh@mellanox.com>
All hash RX QPs currently use the same flow steering rule (L2 MAC filtering)
regardless of their type (TCP, UDP, IPv4, IPv6), which prevents them from
being dispatched properly. This is fixed by adding flow information to the
hash RX queue initialization data and generating specific flow steering
rules for each of them.
Signed-off-by: Olga Shern <olgas@mellanox.com>
Signed-off-by: Adrien Mazarguil <adrien.mazarguil@6wind.com>
Signed-off-by: Nelio Laranjeiro <nelio.laranjeiro@6wind.com>
Use the maximum size of the indirection table when the number of requested
RX queues is not a power of two, this help to improve RSS balancing.
A message informs users that balancing is not optimal in such cases.
Signed-off-by: Nelio Laranjeiro <nelio.laranjeiro@6wind.com>
Signed-off-by: Adrien Mazarguil <adrien.mazarguil@6wind.com>
The default hash RX queue handles packets that are not matched by more
specific types and requires its own indirection table of size 1 to work
properly.
This commit implements support for multiple indirection tables by grouping
their layout and properties in a static initialization table.
Signed-off-by: Olga Shern <olgas@mellanox.com>
Signed-off-by: Adrien Mazarguil <adrien.mazarguil@6wind.com>
Signed-off-by: Nelio Laranjeiro <nelio.laranjeiro@6wind.com>
Signed-off-by: Yaacov Hazan <yaacovh@mellanox.com>
The new Verbs RSS API is lower-level than the previous one and much more
flexible but requires RX queues to use Work Queues (WQs) internally instead
of Queue Pairs (QPs), which are grouped in an indirection table used by a
new kind of hash RX QPs.
Hash RX QPs and the indirection table together replace the parent RSS QP
while WQs are mostly similar to child QPs.
RSS hash key is not configurable yet.
Summary of changes:
- Individual DPDK RX queues do not store flow properties anymore, this info
is now part of the hash RX queues.
- All functions affecting the parent queue when RSS is enabled or the basic
queues otherwise are modified to affect hash RX queues instead.
- Hash RX queues are also used when a single DPDK RX queue is configured (no
RSS) to remove that special case.
- Hash RX queues and indirection table are created/destroyed when device
is started/stopped in addition to create/destroy flows.
- Contrary to QPs, WQs are moved to the "ready" state before posting RX
buffers, otherwise they are ignored.
- Resource domain information is added to WQs for better performance.
- CQs are not resized anymore when switching between non-SG and SG modes as
it does not work correctly with WQs. Use the largest possible size
instead, since CQ size does not have to be the same as the number of
elements in the RX queue. This also applies to the maximum number of
outstanding WRs in a WQ (max_recv_wr).
Signed-off-by: Adrien Mazarguil <adrien.mazarguil@6wind.com>
Signed-off-by: Olga Shern <olgas@mellanox.com>
Signed-off-by: Nelio Laranjeiro <nelio.laranjeiro@6wind.com>
Signed-off-by: Or Ami <ora@mellanox.com>
Signed-off-by: Yaacov Hazan <yaacovh@mellanox.com>
Removing this structure reduces the size of SG and non-SG RX queue elements
significantly to improve performance.
An nice side effect is that the mbuf pointer is now fully stored in
struct rxq_elt instead of relying on the WR ID data offset hack.
Signed-off-by: Adrien Mazarguil <adrien.mazarguil@6wind.com>
Signed-off-by: Olga Shern <olgas@mellanox.com>
Signed-off-by: Or Ami <ora@mellanox.com>
Signed-off-by: Nelio Laranjeiro <nelio.laranjeiro@6wind.com>
This commit updates mlx5_rx_burst_sp() to use the fast verbs interface for
posting RX buffers just like mlx5_rx_burst(). Doing so avoids a loop in
libmlx5 and an indirect function call through libibverbs.
Note: recv_sg_list() is not implemented in the QP burst API, this commit is
only to prepare transition to the WQ-based API.
Signed-off-by: Adrien Mazarguil <adrien.mazarguil@6wind.com>
Signed-off-by: Nelio Laranjeiro <nelio.laranjeiro@6wind.com>
This is the same implementation as mlx4.
Signed-off-by: Adrien Mazarguil <adrien.mazarguil@6wind.com>
Signed-off-by: Nelio Laranjeiro <nelio.laranjeiro@6wind.com>
All MAC RX flows must be updated with VLAN information when configuring a
VLAN filter.
Signed-off-by: Adrien Mazarguil <adrien.mazarguil@6wind.com>
Signed-off-by: Nelio Laranjeiro <nelio.laranjeiro@6wind.com>
Like most other device control operations, those are handled by the related
kernel network device through syscalls.
Signed-off-by: Adrien Mazarguil <adrien.mazarguil@6wind.com>
Signed-off-by: Nelio Laranjeiro <nelio.laranjeiro@6wind.com>
Link information is retrieved using ethtool ioctls.
Signed-off-by: Adrien Mazarguil <adrien.mazarguil@6wind.com>
Signed-off-by: Nelio Laranjeiro <nelio.laranjeiro@6wind.com>
Depending on the MTU and whether jumbo frames are enabled, RX queues may
switch between SG and non-SG modes for better performance.
Signed-off-by: Adrien Mazarguil <adrien.mazarguil@6wind.com>
Signed-off-by: Nelio Laranjeiro <nelio.laranjeiro@6wind.com>
A dedicated RX callback is added to handle scattered buffers. For better
performance, it is only used when jumbo frames are enabled and MTU is larger
than a single mbuf.
On the TX path, scattered buffers are also handled in a separate function.
When there are more than MLX5_PMD_SGE_WR_N segments in a given mbuf, the
remaining segments are linearized in the last SGE entry.
Signed-off-by: Adrien Mazarguil <adrien.mazarguil@6wind.com>
Signed-off-by: Nelio Laranjeiro <nelio.laranjeiro@6wind.com>
This commit adds the remaining missing callbacks to make mlx5 usable.
Like mlx4, device start and stop are implemented on top of MAC RX flows.
Signed-off-by: Adrien Mazarguil <adrien.mazarguil@6wind.com>
Signed-off-by: Nelio Laranjeiro <nelio.laranjeiro@6wind.com>
Signed-off-by: Francesco Santoro <francesco.santoro@6wind.com>
Signed-off-by: Didier Pallard <didier.pallard@6wind.com>
This commit adds support for MAC flow steering rules mandatory for the RX
path as well as the related callbacks to add/remove MAC addresses.
Signed-off-by: Adrien Mazarguil <adrien.mazarguil@6wind.com>
Signed-off-by: Nelio Laranjeiro <nelio.laranjeiro@6wind.com>
Signed-off-by: Didier Pallard <didier.pallard@6wind.com>
RSS implementation with parent/child QPs comes from mlx4 and is temporary.
Signed-off-by: Adrien Mazarguil <adrien.mazarguil@6wind.com>
Signed-off-by: Nelio Laranjeiro <nelio.laranjeiro@6wind.com>
In its current state, this driver implements the bare minimum to initialize
itself and Mellanox ConnectX-4 adapters without doing anything else
(no RX/TX for instance). It is disabled by default since it is based on the
mlx4 driver and also depends on libibverbs.
Signed-off-by: Adrien Mazarguil <adrien.mazarguil@6wind.com>
Signed-off-by: Nelio Laranjeiro <nelio.laranjeiro@6wind.com>
Signed-off-by: Or Ami <ora@mellanox.com>