Tx function was handling a double loop to send segmented packets, it can be
done in a single one.
Signed-off-by: Nelio Laranjeiro <nelio.laranjeiro@6wind.com>
Signed-off-by: Vasily Philipov <vasilyf@mellanox.com>
PMD uses only power of two number of Work Queue Elements (aka WQE), storing
the number of elements in log2 helps to reduce the size of the container to
store it.
Signed-off-by: Nelio Laranjeiro <nelio.laranjeiro@6wind.com>
Blue Flame (aka BF) is a buffer allocated with a power of two value, its
size is returned by Verbs in log2.
Signed-off-by: Nelio Laranjeiro <nelio.laranjeiro@6wind.com>
PMD uses only power of two number of Completion Queue Elements (aka CQE),
storing the number of elements in log2 helps to reduce the size of the
container to store it.
Signed-off-by: Nelio Laranjeiro <nelio.laranjeiro@6wind.com>
PMD uses only power of two number of descriptors, storing the number of
elements in log2 helps to reduce the size of the container to store it.
Signed-off-by: Nelio Laranjeiro <nelio.laranjeiro@6wind.com>
Signed-off-by: Adrien Mazarguil <adrien.mazarguil@6wind.com>
Rework Work Queue Element (aka WQE) structures to fit PMD needs.
A WQE is an aggregation of 16 bytes elements known as "data segments"
(aka dseg).
The only common part is the first two elements i.e. the control one to
define the job type, and the Ethernet segment which embed offload requests
with other information, after that, it can have:
- a raw data packet,
- a data pointer to the packet itself,
- both.
Signed-off-by: Nelio Laranjeiro <nelio.laranjeiro@6wind.com>
With recent gcc versions, e.g. gcc 6.1, compilation of mlx drivers with
debug enabled produces lots of errors complaining that "pedantic" is
not a warning level that can be ignored.
error: ‘-pedantic’ is not an option that controls warnings [-Werror=pragmas]
#pragma GCC diagnostic ignored "-pedantic"
^~~~~~~~~~~
These errors can be removed by changing the "-pedantic" to "-Wpedantic".
Fixes: 7fae69eeff ("mlx4: new poll mode driver")
Fixes: 771fa900b7 ("mlx5: introduce new driver for Mellanox ConnectX-4 adapters")
Signed-off-by: Bruce Richardson <bruce.richardson@intel.com>
Acked-by: Adrien Mazarguil <adrien.mazarguil@6wind.com>
To improve performance the NIC expects for large packets to have a pointer
to a cache aligned address, old inline code could break this assumption
which hurts performance.
Fixes: 2a66cf3789 ("net/mlx5: support inline send")
Signed-off-by: Nelio Laranjeiro <nelio.laranjeiro@6wind.com>
Signed-off-by: Vasily Philipov <vasilyf@mellanox.com>
Rework logic of wqe_write() and wqe_write_vlan() which are pretty similar
to keep a single one.
Signed-off-by: Nelio Laranjeiro <nelio.laranjeiro@6wind.com>
This function was supposed to be inlined, but was not because several
functions calls it. This function should always be inline avoid
external function calls and to optimize code in data-path.
Signed-off-by: Nelio Laranjeiro <nelio.laranjeiro@6wind.com>
Packet rejection was routed to a polled queue. This patch route them to a
dummy queue which is not polled.
Fixes: 76f5c99e68 ("mlx5: support flow director")
Signed-off-by: Yaacov Hazan <yaacovh@mellanox.com>
Signed-off-by: Adrien Mazarguil <adrien.mazarguil@6wind.com>
Signed-off-by: Nelio Laranjeiro <nelio.laranjeiro@6wind.com>
This is done to prepare support for drop queues, which are not related to
existing Rx queues and need to be managed separately.
Signed-off-by: Yaacov Hazan <yaacovh@mellanox.com>
Signed-off-by: Adrien Mazarguil <adrien.mazarguil@6wind.com>
memmove was moving bytes as the number of elements next to i, while it
should move the number of elements multiplied by the size of each element.
Fixes: e9086978 ("mlx5: support VLAN filtering")
Signed-off-by: Raslan Darawsheh <rdarawsheh@asaltech.com>
This capability is implemented but not reported.
Fixes: f3db948918 ("mlx5: support Rx VLAN stripping")
Signed-off-by: Adrien Mazarguil <adrien.mazarguil@6wind.com>
The return value in DPDK is negative errno on failure.
Since internal functions in mlx driver return positive
values need to negate this value when it returned to
dpdk layer.
Fixes: 76f5c99 ("mlx5: support flow director")
Signed-off-by: Yaacov Hazan <yaacovh@mellanox.com>
The user is allowed to call ->rx_pkt_burst() even without free
mbufs in the pool. In this scenario we'll fail allocating a rep mbuf
on the first iteration (where pkt is still NULL). This would cause us
to deref a NULL pkt (reset refcount and free).
Fix this by checking the pkt before freeing it.
Fixes: a1bdb71a32 ("net/mlx5: fix crash in Rx")
Signed-off-by: Sagi Grimberg <sagi@grimberg.me>
Acked-by: Adrien Mazarguil <adrien.mazarguil@6wind.com>
Now that rte_device is available, drivers can start using its members
(numa, name) as well as link themselves into another rte_device list.
As of now no one is using this list, but can be used for moving over all
devices (pdev/vdev/Xdev) and perform bulk actions (like cleanup).
Signed-off-by: Jan Viktorin <viktorin@rehivetech.com>
[Shreyansh: Reword commit log for extra rte_device list]
Signed-off-by: Shreyansh Jain <shreyansh.jain@nxp.com>
Acked-by: David Marchand <david.marchand@6wind.com>
Remove the 'name' member from rte_pci_driver and move to generic
rte_driver.
Most of the PMD drivers were initially using DRIVER_REGISTER_PCI(<name>..)
as well as assigning a name to eth_driver.pci_drv.name member.
In this patch, only the original DRIVER_REGISTER_PCI(<name>..) name has
been populated into the rte_driver.name member - assignments through
eth_driver has been removed.
Signed-off-by: Jan Viktorin <viktorin@rehivetech.com>
[Shreyansh: Rebase and expand changes to newly added files]
Signed-off-by: Shreyansh Jain <shreyansh.jain@nxp.com>
Acked-by: David Marchand <david.marchand@6wind.com>
Now that hotplug has been moved to eal, there is no reason to keep the
device type in this layer.
Signed-off-by: David Marchand <david.marchand@6wind.com>
Signed-off-by: Shreyansh Jain <shreyansh.jain@nxp.com>
Simplify crypto and ethdev pci drivers init by using newly introduced
init macros and helpers.
Those drivers then don't need to register as "rte_driver"s anymore.
Exceptions:
- virtio and mlx* use RTE_INIT directly as they have custom initialization
steps.
- VDEV devices are not modified - they continue to use PMD_REGISTER_DRIVER.
Update documentation for replacing an example referring to
PMD_REGISTER_DRIVER.
Signed-off-by: David Marchand <david.marchand@6wind.com>
Signed-off-by: Shreyansh Jain <shreyansh.jain@nxp.com>
Probe and Remove are more appropriate names for PCI init and uninint
operations. This is a cosmetic change.
Only MLX* uses the PCI direct registration, bypassing PMD_* macro.
The callbacks for this too have been updated.
VDEV are left out. For them, init/uninit are more appropriate.
Suggested-by: David Marchand <david.marchand@6wind.com>
Signed-off-by: Shreyansh Jain <shreyansh.jain@nxp.com>
Acked-by: David Marchand <david.marchand@6wind.com>
As discussed in the past release, driver names are modified
to be more consistent, and the future driver should follow
this new convention.
Driver names consist of:
"driver category"_"driver folder name"_"optional extra name".
For example:
- Crypto null driver -> "crypto_null"
- Network IXGBE VF driver -> "net_ixgbe_vf"
Signed-off-by: Pablo de Lara <pablo.de.lara.guarch@intel.com>
RHEL 7.1's GCC for POWER8 reports the following error in one rte_memcpy()
macro call by mlx5:
error: array subscript is above array bounds [-Werror=array-bounds]
It appears to be a GCC bug which can be worked around by making parentheses
more explicit.
Signed-off-by: Olga Shern <olgas@mellanox.com>
Signed-off-by: Adrien Mazarguil <adrien.mazarguil@6wind.com>
Fixed issue could occur when Mbuf starvation happens in a middle of
reception of a segmented packet. In such a situation, the PMD has to
release all segments of that packet. The end condition was wrong
causing it to free an Mbuf still handled by the NIC.
Fixes: 9964b965ad ("net/mlx5: re-add Rx scatter support")
Reported-by: Yongseok Koh <yskoh@mellanox.com>
Signed-off-by: Nelio Laranjeiro <nelio.laranjeiro@6wind.com>
Acked-by: Adrien Mazarguil <adrien.mazarguil@6wind.com>
In mlx5 rx function, the packet_type and ol_flags mbuf fields are not
properly initialized when no rx offload feature is enabled (checksum, l2
tun checksum, vlan_strip, crc). Thus, these fields can have a value
different of 0 depending on their value when the mbuf was freed.
This can result in an incorrect application behavior if invalid
ol_flags/ptype are set, or memory corruptions if IND_ATTACHED_MBUF is
set in ol_flags.
Fixes: 081f7eae24 ("mlx5: process offload flags only when requested")
Signed-off-by: Maxime Leroy <maxime.leroy@6wind.com>
Acked-by: Adrien Mazarguil <adrien.mazarguil@6wind.com>
Fixes: 62072098b5 ("mlx5: support setting link up or down")
Signed-off-by: Olivier Matz <olivier.matz@6wind.com>
Acked-by: Adrien Mazarguil <adrien.mazarguil@6wind.com>
According to the documentation, the function
priv_set_flags(priv, keep, flags) should not modify the flags
in "keep" mask.
So 'flags' argument should be masked with '~keep' before ORing
it with the previous flags value.
This avoids messing up the kernel interface flags when calling
priv_set_flags(priv, ~IFF_UP, ~IFF_UP) in priv_set_link():
$ ip link
26: eth0: BROADCAST,MULTICAST,NOARP,ALLMULTI,PROMISC,DEBUG,\
DYNAMIC,AUTOMEDIA,PORTSEL,NOTRAILERS
Fixes: 7fae69eeff ("mlx4: new poll mode driver")
Fixes: 771fa900b7 ("mlx5: introduce new driver for Mellanox ConnectX-4 adapters")
Reported-by: Fengtian Guo <fengtian.guo@6wind.com>
Signed-off-by: Olivier Matz <olivier.matz@6wind.com>
Acked-by: Adrien Mazarguil <adrien.mazarguil@6wind.com>
Since now the PMD_REGISTER_DRIVER macro sets the driver names,
there is no need to have the rte_driver structure setting it
statically, as it will get overridden.
Signed-off-by: Pablo de Lara <pablo.de.lara.guarch@intel.com>
Acked-by: Neil Horman <nhorman@tuxdriver.com>
Compilation fails because of some typos.
Fixes: cb6696d220 ("drivers: update registration macro usage")
Signed-off-by: Thomas Monjalon <thomas.monjalon@6wind.com>
Acked-by: Neil Horman <nhorman@tuxdriver.com>
Tested-by: Adrien Mazarguil <adrien.mazarguil@6wind.com>
Modify the PMD_REGISTER_DRIVER macro, adding a name argument to it. The
addition of a name argument creates a token that can be used for subsequent
macros in the creation of unique symbol names to export additional bits of
information for use by the pmdinfogen tool. For example:
PMD_REGISTER_DRIVER(ena_driver, ena);
registers the ena_driver struct as it always did, and creates a symbol
const char this_pmd_name0[] __attribute__((used)) = "ena";
which pmdinfogen can search for and extract. The subsequent macro
DRIVER_REGISTER_PCI_TABLE(ena, ena_pci_id_map);
creates a symbol const char ena_pci_tbl_export[] __attribute__((used)) =
"ena_pci_id_map";
Which allows pmdinfogen to find the pci table of this driver
Using this pattern, we can export arbitrary bits of information.
pmdinfo uses this information to extract hardware support from an object
file and create a json string to make hardware support info discoverable
later.
Signed-off-by: Neil Horman <nhorman@tuxdriver.com>
Acked-by: Panu Matilainen <pmatilai@redhat.com>
Acked-by: Remy Horton <remy.horton@intel.com>
Compilation errors:
mlx4:
drivers/net/mlx4/mlx4.c(5409): error #188:
enumerated type mixed with another type
priv->intr_handle.type = 0;
^
mlx5:
drivers/net/mlx5/mlx5_rxq.c(282): error #188:
enumerated type mixed with another type
enum hash_rxq_type type = 0;
^
and more same type of error.
Fix these by assigning enum values rather than integer values to the enum
variables
Fixes: c4da6caa42 ("mlx4: handle link status interrupts")
Fixes: 198a3c339a ("mlx5: handle link status interrupts")
Fixes: 0d2186743d ("mlx5: manage all special flow types at once")
Fixes: 612ad38209 ("mlx5: fix hash Rx queue type in RSS mode")
Fixes: 083c2dd317 ("mlx5: refactor special flows handling")
Signed-off-by: Ferruh Yigit <ferruh.yigit@intel.com>
Acked-by: Adrien Mazarguil <adrien.mazarguil@6wind.com>
This commit brings back Rx scatter and related support by the MTU update
function. The maximum number of segments per packet is not a fixed value
anymore (previously MLX5_PMD_SGE_WR_N, set to 4 by default) as it caused
performance issues when fewer segments were actually needed as well as
limitations on the maximum packet size that could be received with the
default mbuf size (supporting at most 8576 bytes).
These limitations are now lifted as the number of SGEs is derived from the
MTU (which implies MRU) at queue initialization and during MTU update.
Signed-off-by: Adrien Mazarguil <adrien.mazarguil@6wind.com>
Signed-off-by: Vasily Philipov <vasilyf@mellanox.com>
Signed-off-by: Nelio Laranjeiro <nelio.laranjeiro@6wind.com>
The primary purpose of rxq_rehash() function is to stop and restart
reception on a queue after re-posting buffers. This may fail if the array
that temporarily stores existing buffers for reuse cannot be allocated.
Update rxq_rehash() to work on the target queue directly (not through a
template copy) and avoid this allocation.
rxq_alloc_elts() is modified accordingly to take buffers from an existing
queue directly and update their refcount.
Unlike rxq_rehash(), rxq_setup() must work on a temporary structure but
should not allocate new mbufs from the pool while reinitializing an
existing queue. This is achieved by using the refcount-aware
rxq_alloc_elts() before overwriting queue data.
Signed-off-by: Adrien Mazarguil <adrien.mazarguil@6wind.com>
Signed-off-by: Vasily Philipov <vasilyf@mellanox.com>
Toggling RX checksum offloads is already done at initialization time. This
code does not belong in rxq_rehash().
Signed-off-by: Adrien Mazarguil <adrien.mazarguil@6wind.com>
Signed-off-by: Nelio Laranjeiro <nelio.laranjeiro@6wind.com>
Compared to its previous incarnation, the software limit on the number of
mbuf segments is no more (previously MLX5_PMD_SGE_WR_N, set to 4 by
default) hence no need for linearization code and related buffers that
permanently consumed a non negligible amount of memory to handle oversized
mbufs.
The resulting code is both lighter and faster.
With the addition of this code, older GCC versions (such
as 4.8.5) may complain about 'wqe' variable being uninitialized, so
initialize it preemptively, even though it is not necessary to do so.
Signed-off-by: Adrien Mazarguil <adrien.mazarguil@6wind.com>
Signed-off-by: Nelio Laranjeiro <nelio.laranjeiro@6wind.com>
The space necessary to store segmented packets cannot be known in advance
and must be verified for each of them.
Signed-off-by: Adrien Mazarguil <adrien.mazarguil@6wind.com>
Signed-off-by: Nelio Laranjeiro <nelio.laranjeiro@6wind.com>
This feature enables the TX burst function to emit up to 5 packets using
only two work queue entries (WQEs) on devices that support it. Saves PCI
bandwidth and improves performance.
Signed-off-by: Nelio Laranjeiro <nelio.laranjeiro@6wind.com>
Signed-off-by: Adrien Mazarguil <adrien.mazarguil@6wind.com>
Signed-off-by: Olga Shern <olgas@mellanox.com>
Implement send inline feature which copies packet data directly into
work queue entries (WQEs) for improved latency. The maximum packet
size and the minimum number of Tx queues to qualify for inline send
are user-configurable.
This feature is effective when HW causes a performance bottleneck.
Signed-off-by: Yaacov Hazan <yaacovh@mellanox.com>
Signed-off-by: Adrien Mazarguil <adrien.mazarguil@6wind.com>
Signed-off-by: Nelio Laranjeiro <nelio.laranjeiro@6wind.com>
Replacing the variable countdown (which depends on the number of
descriptors) with a fixed relative threshold known at compile time improves
performance by reducing the TX queue structure footprint and the amount of
code to manage completions during a burst.
Completions are now requested at most once per burst after threshold is
reached.
Signed-off-by: Adrien Mazarguil <adrien.mazarguil@6wind.com>
Signed-off-by: Nelio Laranjeiro <nelio.laranjeiro@6wind.com>
Signed-off-by: Vasily Philipov <vasilyf@mellanox.com>
Mini (compressed) completion queue entries (CQEs) are returned by the
NIC when PCI back pressure is detected, in which case the first CQE64
contains common packet information followed by a number of CQE8
providing the rest, followed by a matching number of empty CQE64
entries to be used by software for decompression.
Before decompression:
0 1 2 6 7 8
+-------+ +---------+ +-------+ +-------+ +-------+ +-------+
| CQE64 | | CQE64 | | CQE64 | | CQE64 | | CQE64 | | CQE64 |
|-------| |---------| |-------| |-------| |-------| |-------|
| ..... | | cqe8[0] | | | . | | | | | ..... |
| ..... | | cqe8[1] | | | . | | | | | ..... |
| ..... | | ....... | | | . | | | | | ..... |
| ..... | | cqe8[7] | | | | | | | | ..... |
+-------+ +---------+ +-------+ +-------+ +-------+ +-------+
After decompression:
0 1 ... 8
+-------+ +-------+ +-------+
| CQE64 | | CQE64 | | CQE64 |
|-------| |-------| |-------|
| ..... | | ..... | . | ..... |
| ..... | | ..... | . | ..... |
| ..... | | ..... | . | ..... |
| ..... | | ..... | | ..... |
+-------+ +-------+ +-------+
This patch does not perform the entire decompression step as it would be
really expensive, instead the first CQE64 is consumed and an internal
context is maintained to interpret the following CQE8 entries directly.
Intermediate empty CQE64 entries are handed back to HW without further
processing.
Signed-off-by: Nelio Laranjeiro <nelio.laranjeiro@6wind.com>
Signed-off-by: Adrien Mazarguil <adrien.mazarguil@6wind.com>
Signed-off-by: Olga Shern <olgas@mellanox.com>
Signed-off-by: Vasily Philipov <vasilyf@mellanox.com>
The intent is to replace the remaining compile-time options and environment
variables with a common mean of runtime configuration. This commit only
adds the kvargs handling code, subsequent commits will update the rest.
Signed-off-by: Nelio Laranjeiro <nelio.laranjeiro@6wind.com>
Signed-off-by: Adrien Mazarguil <adrien.mazarguil@6wind.com>
These structures and macros extend those exposed by libmlx5 (in mlx5_hw.h)
to let the PMD manage work queue and completion queue elements directly.
Signed-off-by: Nelio Laranjeiro <nelio.laranjeiro@6wind.com>
Signed-off-by: Adrien Mazarguil <adrien.mazarguil@6wind.com>
The latest version of Mellanox OFED exposes hardware definitions necessary
to implement data path operation bypassing Verbs. Update the minimum
version requirement to MLNX_OFED >= 3.3 and clean up compatibility checks
for previous releases.
Signed-off-by: Nelio Laranjeiro <nelio.laranjeiro@6wind.com>
Signed-off-by: Adrien Mazarguil <adrien.mazarguil@6wind.com>
To keep the data path as efficient as possible, move fields only useful to
the control path into new structure rxq_ctrl.
Signed-off-by: Nelio Laranjeiro <nelio.laranjeiro@6wind.com>
Signed-off-by: Adrien Mazarguil <adrien.mazarguil@6wind.com>
To keep the data path as efficient as possible, move fields only useful to
the control path into new structure txq_ctrl.
Signed-off-by: Nelio Laranjeiro <nelio.laranjeiro@6wind.com>
Signed-off-by: Adrien Mazarguil <adrien.mazarguil@6wind.com>
Inline TX will be fully managed by the PMD after Verbs is bypassed in the
data path. Remove the current code until then.
Signed-off-by: Nelio Laranjeiro <nelio.laranjeiro@6wind.com>
Signed-off-by: Adrien Mazarguil <adrien.mazarguil@6wind.com>