Commit Graph

884 Commits

Author SHA1 Message Date
Yongseok Koh
c18cf501a7 net/mlx5: enable secondary process to register DMA memory
The Memory Region (MR) for DMA memory can't be created from secondary
process due to lib/driver limitation. Whenever it is needed, secondary
process can make a request to primary process through the EAL IPC
channel (rte_mp_msg) which is established on initialization. Once a MR
is created by primary process, it is immediately visible to secondary
process because the MR list is global per a device. Thus, secondary
process can look up the list after the request is successfully returned.

Signed-off-by: Yongseok Koh <yskoh@mellanox.com>
Acked-by: Shahaf Shuler <shahafs@mellanox.com>
2019-04-05 17:45:22 +02:00
Yongseok Koh
dceb502942 net/mlx5: add control of excessive memory pinning by kernel
A new PMD parameter (mr_ext_memseg_en) is added to control extension of
memseg when creating a MR. It is enabled by default.

If enabled, mlx5_mr_create() tries to maximize the range of MR
registration so that the LKey lookup tables on datapath become smaller
and get the best performance. However, it may worsen memory utilization
because registered memory is pinned by kernel driver. Even if a page in
the extended chunk is freed, that doesn't become reusable until the
entire memory is freed and the MR is destroyed.

To make freed pages available immediately, this parameter has to be
turned off but it could drop performance.

Signed-off-by: Yongseok Koh <yskoh@mellanox.com>
Acked-by: Shahaf Shuler <shahafs@mellanox.com>
2019-04-05 17:45:22 +02:00
Yongseok Koh
207fe7ac72 net/mlx5: fix external memory registration
Secondary process is not allowed to register MR due to a restriction of
library and kernel driver.

Fixes: 7e43a32ee0 ("net/mlx5: support externally allocated static memory")
Cc: stable@dpdk.org

Signed-off-by: Yongseok Koh <yskoh@mellanox.com>
Acked-by: Shahaf Shuler <shahafs@mellanox.com>
2019-04-05 17:45:22 +02:00
Yongseok Koh
3d1f3c7c83 net/mlx: remove debug messages on datapath
Cc: stable@dpdk.org

Signed-off-by: Yongseok Koh <yskoh@mellanox.com>
Acked-by: Shahaf Shuler <shahafs@mellanox.com>
2019-04-05 17:45:22 +02:00
Yongseok Koh
2aac5b5d11 net/mlx5: sync stop/start with secondary process
Rx/Tx burst function pointers are stored in the rte_eth_dev structure,
which is local to a process. Even though primary process replaces the
function pointers, secondary will not run the new ones. With rte_mp
APIs, primary can easily broadcast a request to stop/start the datapath
of secondary processes.

Signed-off-by: Yongseok Koh <yskoh@mellanox.com>
Acked-by: Shahaf Shuler <shahafs@mellanox.com>
2019-04-05 17:45:22 +02:00
Yongseok Koh
7be600c8d8 net/mlx5: rework PMD global data init
There's more need to have PMD global data structure. This should be
initialized once per a process regardless of how many PMD instances are
probed. mlx5_init_once() is called during probing and make sure all the
init functions are called once per a process. Currently, such global
data and its initialization functions are even scattered. Rather than
'extern'-ing such variables and calling such functions one by one making
sure it is called only once by checking the validity of such variables, it
will be better to have a global storage to hold such data and a
consolidated function having all the initializations. The existing shared
memory gets more extensively used for this purpose. As there could be
multiple secondary processes, a static storage (local to process) is also
added.

As the reserved virtual address for UAR remap is a PMD global resource,
this doesn't need to be stored in the device priv structure, but in the
PMD global data.

Signed-off-by: Yongseok Koh <yskoh@mellanox.com>
Acked-by: Shahaf Shuler <shahafs@mellanox.com>
2019-04-05 17:45:22 +02:00
Yongseok Koh
9a8ab29b84 net/mlx5: replace IPC socket with EAL API
Socket API is used for IPC in order for secondary process to acquire
Verb command file descriptor. The FD is used to remap UAR address.
The multi-process APIs (rte_mp) in EAL are newly introduced.
mlx5_socket.c is replaced with mlx5_mp.c, which uses the new APIs.

As it is PMD global infrastructure, only one IPC channel is established.
All the IPC message types may have port_id in the message if there is
need to reference a specific device.

Signed-off-by: Yongseok Koh <yskoh@mellanox.com>
Acked-by: Shahaf Shuler <shahafs@mellanox.com>
2019-04-05 17:45:22 +02:00
Yongseok Koh
3ebe658059 net/mlx5: fix memory event on secondary process
As the memory event is propagated to secondary processes, the event is
processed redundantly. This should be processed once because the data
structure used for MR and the event is global across the processes.

Fixes: 974f1e7ef1 ("net/mlx5: add new memory region support")
Cc: stable@dpdk.org

Signed-off-by: Yongseok Koh <yskoh@mellanox.com>
Acked-by: Shahaf Shuler <shahafs@mellanox.com>
2019-04-05 17:45:22 +02:00
Dekel Peled
de90612f40 net/mlx5: fix errno typos in comments
Correct typing mistake in several locations:
ernno ==> errno

Fixes: 23c1d42c71 ("net/mlx5: split flow validation to dedicated function")
Cc: stable@dpdk.org

Signed-off-by: Dekel Peled <dekelp@mellanox.com>
Acked-by: Shahaf Shuler <shahafs@mellanox.com>
2019-04-05 17:45:22 +02:00
Yongseok Koh
9c55c6bd86 net/mlx5: revert mbuf address calculation for x86
When replenishing mbufs on Rx, buffer address (mbuf->buf_addr) should be
loaded. non-x86 processors (mostly RISC such as ARM and Power) are more
vulnerable to load stall. For x86, reducing the number of instructions
seems to matter most.

For x86, this is simply a load but for other architectures, it is
calculated from the address of mbuf structure by rte_mbuf_buf_addr()
without having to load the first cacheline of the mbuf.

Fixes: 12d468a62b ("net/mlx5: fix instruction hotspot on replenishing Rx buffer")
Cc: stable@dpdk.org

Signed-off-by: Yongseok Koh <yskoh@mellanox.com>
Acked-by: Shahaf Shuler <shahafs@mellanox.com>
2019-04-05 17:45:22 +02:00
Viacheslav Ovsiienko
0fe3f18f78 net/mlx5: add source vport match to the ingress rules
For E-Switch configurations over multiport Infiniband devices
we should add source vport match to correctly distribute
traffic between representors.

Signed-off-by: Viacheslav Ovsiienko <viacheslavo@mellanox.com>
Acked-by: Shahaf Shuler <shahafs@mellanox.com>
2019-03-29 17:25:32 +01:00
Viacheslav Ovsiienko
028b2a28c3 net/mlx5: update event handler for multiport IB devices
This patch modifies asynchronous event handler to support multiport
Infiniband devices. Handler queries the event parameters, including
event source port index, and invokes the handler for specific
devices with appropriate port_id.

Signed-off-by: Viacheslav Ovsiienko <viacheslavo@mellanox.com>
Acked-by: Shahaf Shuler <shahafs@mellanox.com>
2019-03-29 17:25:32 +01:00
Viacheslav Ovsiienko
53e5a82fd1 net/mlx5: update install/uninstall event handlers
We are implementing the support for multiport Infiniband device
with representors attached to these multiple ports. Asynchronous
device event notifications (link status change, removal event, etc.)
should be shared between ports. We are going to implement shared
event handler and this patch introduces appropriate device
structure changes and updated event handler install and uninstall
routines.

Signed-off-by: Viacheslav Ovsiienko <viacheslavo@mellanox.com>
Acked-by: Shahaf Shuler <shahafs@mellanox.com>
2019-03-29 17:25:32 +01:00
Viacheslav Ovsiienko
1e14090e31 net/mlx5: provide IB port for the object being created
The code is updated to provide IB port index for the Verbs
objects being created - QPs and Verbs Flows.

Signed-off-by: Viacheslav Ovsiienko <viacheslavo@mellanox.com>
Acked-by: Shahaf Shuler <shahafs@mellanox.com>
2019-03-29 17:25:32 +01:00
Viacheslav Ovsiienko
f048f3d479 net/mlx5: switch to the shared IB device context
The code is updated to use the shared IB device context and
device handles. The IB device context is shared between
reprentors created over the single multiport IB device. All
Verbs and DevX objects will be created within this shared context.

Signed-off-by: Viacheslav Ovsiienko <viacheslavo@mellanox.com>
Acked-by: Shahaf Shuler <shahafs@mellanox.com>
2019-03-29 17:25:32 +01:00
Viacheslav Ovsiienko
d485cdca01 net/mlx5: switch to the shared context IB attributes
The code is updated to use the shared IB device attributes,
located in the shared IB context. It saves some memory if
there are representors created over the single Infiniband
device with multiple ports.

Signed-off-by: Viacheslav Ovsiienko <viacheslavo@mellanox.com>
Acked-by: Shahaf Shuler <shahafs@mellanox.com>
2019-03-29 17:25:32 +01:00
Viacheslav Ovsiienko
1b782252cb net/mlx5: switch to the shared protection domain
The PMD code is updated to use Protected Domain from the
shared IB device context. The Domain is shared between
all devices belonging to the same multiport Infiniband device.
If IB device has only one port, the PD is not shared, because
there is only ethernet device created over IB one.

Signed-off-by: Viacheslav Ovsiienko <viacheslavo@mellanox.com>
Acked-by: Shahaf Shuler <shahafs@mellanox.com>
2019-03-29 17:25:32 +01:00
Viacheslav Ovsiienko
9c0a9eed37 net/mlx5: switch to the names in the shared IB context
The IB device names are moved from device private data
to the shared context, code involving the names is updated.
The IB port index treatment is added where it is relevant.

Signed-off-by: Viacheslav Ovsiienko <viacheslavo@mellanox.com>
Acked-by: Shahaf Shuler <shahafs@mellanox.com>
2019-03-29 17:25:32 +01:00
Viacheslav Ovsiienko
17e19bc4dd net/mlx5: add IB shared context alloc/free functions
The Mellanox NICs support SR-IOV and have E-Switch feature.
When SR-IOV is set up in switchdev mode and E-Switch is enabled
we have so called VF representors in the system. All representors
belonging to the same E-Switch are created on the basis of the
single PCI function and with current implementation each representor
has its own dedicated Infiniband device and operates within its
own Infiniband context. It is proposed to provide representors
as ports of the single Infiniband device and operate on the
shared Infiniband context saving various resources. This patch
introduces appropriate structures.

Also the functions to allocate and free shared IB context for
multiport are added. The IB device context, Protection Domain,
device attributes, Infiniband names are going to be relocated
to the shared structure from the device private one.
mlx5_dev_spawn() is updated to support shared context.

Signed-off-by: Viacheslav Ovsiienko <viacheslavo@mellanox.com>
Acked-by: Shahaf Shuler <shahafs@mellanox.com>
2019-03-29 17:25:32 +01:00
Viacheslav Ovsiienko
ad74bc6195 net/mlx5: support multiport IB device during probing
mlx5_pci_probe() routine is refactored to probe the ports
of found Infiniband devices. All active ports (with attached
network interface), belonging to the same Infiniband device
will use the single shared Infiniband context of that device.

Signed-off-by: Viacheslav Ovsiienko <viacheslavo@mellanox.com>
Acked-by: Shahaf Shuler <shahafs@mellanox.com>
2019-03-29 17:25:32 +01:00
Viacheslav Ovsiienko
bbfad6427b net/mlx5: add getting IB ports number for multiport IB
There is the routine mlx5_nl_portnum() added to get
the number of ports of multiport Infiniband device.
It is assumed the Uplink/VF representors are attached
on these ports.

Signed-off-by: Viacheslav Ovsiienko <viacheslavo@mellanox.com>
Acked-by: Shahaf Shuler <shahafs@mellanox.com>
2019-03-29 17:25:32 +01:00
Viacheslav Ovsiienko
e505508a38 net/mlx5: modify get ifindex routine for multiport IB
There is the routine mlx5_nl_ifindex() returning the
network interface index associated with Infiniband device.
We are going to support multiport IB devices, now function
takes the IB port as argument and returns ifindex associated
with tuple <IB device, IB port>

Signed-off-by: Viacheslav Ovsiienko <viacheslavo@mellanox.com>
Acked-by: Shahaf Shuler <shahafs@mellanox.com>
2019-03-29 17:25:32 +01:00
Viacheslav Ovsiienko
299d7dc28c net/mlx5: add representor recognition on Linux 5.x
The master device and VF representors were distinguished by
presence of port name, master device did not have one. The new Linux
kernels starting from 5.0 provide the port name for master device
and the implemented representor recognizing method does not work.
The new recognizing method is based on querying the VF number,
has been created on the base of the device.

The IFLA_NUM_VF attribute is returned by kernel if IFLA_EXT_MASK
attribute is specified in the Netlink request message.

Also the presence check of device symlink in device sysfs folder
is added to distinguish representors with sysfs based method.

Signed-off-by: Viacheslav Ovsiienko <viacheslavo@mellanox.com>
Acked-by: Shahaf Shuler <shahafs@mellanox.com>
2019-03-29 17:25:32 +01:00
Ali Alnubani
e3bcaf3a0f net/mlx5: add missing return value check
This patch fixes the build failure with message:
  drivers/net/mlx5/mlx5_ethdev.c: In function ‘mlx5_sysfs_switch_info’:
  drivers/net/mlx5/mlx5_ethdev.c:1381:3:
    error: ignoring return value of ‘fscanf’, declared with attribute
           warn_unused_result [-Werror=unused-result]
  fscanf(file, "%s", port_name);
    ^

Which reproduces on Ubuntu 16.04 LTS with
gcc (Ubuntu 5.4.0-6ubuntu1~16.04.10) 5.4.0 20160609.

Fixes: b2f3a38101 ("net/mlx5: support new representor naming format")

Signed-off-by: Ali Alnubani <alialnu@mellanox.com>
Acked-by: Viacheslav Ovsiienko <viacheslavo@mellanox.com>
Acked-by: Dekel Peled <dekelp@mellanox.com>
2019-03-29 17:25:32 +01:00
Shahaf Shuler
989e999d93 net/mlx5: support PCI device DMA map and unmap
The implementation reuses the external memory registration work done by
commit[1].

Note about representors:

The current representor design will not work
with those map and unmap functions. The reason is that for representors
we have multiple IB devices share the same PCI function, so mapping will
happen only on one of the representors and not all of them.

While it is possible to implement such support, the IB representor
design is going to be changed during DPDK19.05. The new design will have
a single IB device for all representors, hence sharing of a single
memory region between all representors will be possible.

[1]
commit 7e43a32ee0
("net/mlx5: support externally allocated static memory")

Signed-off-by: Shahaf Shuler <shahafs@mellanox.com>
2019-03-30 16:48:58 +01:00
Shahaf Shuler
0f132546a8 net/mlx5: refactor external memory registration
Move the memory region creation to a separate function to
prepare the ground for the reuse of it on the PCI driver map and unmap
functions.

Signed-off-by: Shahaf Shuler <shahafs@mellanox.com>
2019-03-30 16:48:57 +01:00
Dekel Peled
b2f3a38101 net/mlx5: support new representor naming format
Kernel update [1] introduce new format of representors names.
This patch implements RFC [2], updating MLX5 PMD to support the new
format, while maintaining support of the existing format.

[1] https://github.com/torvalds/linux/commit/c12ecc2
[2] http://mails.dpdk.org/archives/dev/2019-March/125676.html

Signed-off-by: Dekel Peled <dekelp@mellanox.com>
Acked-by: Viacheslav Ovsiienko <viacheslavo@mellanox.com>
Acked-by: Shahaf Shuler <shahafs@mellanox.com>
2019-03-20 18:15:42 +01:00
Shahaf Shuler
57c0e2494c net/mlx5: fix packet inline on Tx queue wraparound
Inlining a packet to WQE that cross the WQ wraparound, i.e. the WQE
starts on the end of the ring and ends on the beginning, is not
supported and blocked by the data path logic.

However, in case of TSO, an extra inline header is required before
inlining. This inline header is not taken into account when checking if
there is enough room left for the required inline size.
On some corner cases were
(ring_tailroom - inline header) < inline size < ring_tailroom ,
this can lead to WQE being written outsize of the ring buffer.

Fixing it by always assuming the worse case that inline of packet will
require the inline header.

Fixes: 3f13f8c23a ("net/mlx5: support hardware TSO")
Cc: stable@dpdk.org

Signed-off-by: Shahaf Shuler <shahafs@mellanox.com>
Acked-by: Yongseok Koh <yskoh@mellanox.com>
2019-03-20 18:15:42 +01:00
Dekel Peled
fd350d3c9a net/mlx5: fix sync when handling Tx completions
Function mlx5_tx_complete() reads completion entry information
from Tx queue.
For some processors not having strongly-ordered memory model,
there has to be a memory barrier between reading the entry index
and the entry fields, in order to guarantee data is valid.

Fixes: 54d3fe948d ("net/mlx5: poll completion queue once per a call")
Cc: stable@dpdk.org

Signed-off-by: Dekel Peled <dekelp@mellanox.com>
Acked-by: Shahaf Shuler <shahafs@mellanox.com>
2019-03-08 17:52:22 +01:00
Dekel Peled
38f0a160b5 net/mlx5: fix hex dump of error completion
struct mlx5_cqe is defined in MLX5 PMD code (mlx5_prm.h).
It includes 64 bytes padding in case of (RTE_CACHE_LINE_SIZE == 128).

struct mlx5_err_cqe is defined in kernel, and doesn't include padding.

When running in debug mode, in case an error CQE is detected
it is printed using rte_hexdump().

The size of data to print should be sizeof(*cqe) instead of
sizeof(*err_cqe), to handle the case of (RTE_CACHE_LINE_SIZE == 128),
and print the full data in any case.

Fixes: c771499209 ("net/mlx5: extend debug logs verbosity")
Cc: stable@dpdk.org

Signed-off-by: Dekel Peled <dekelp@mellanox.com>
Acked-by: Shahaf Shuler <shahafs@mellanox.com>
2019-03-08 17:52:22 +01:00
Thomas Monjalon
09c9c4d23d net/mlx5: call generic strlcpy
The call to strlcpy uses either libc, libbsd or internal rte_strlcpy.
No need to call the DPDK flavor explicitly.

Signed-off-by: Thomas Monjalon <thomas@monjalon.net>
Reviewed-by: Rami Rosen <ramirose@gmail.com>
Acked-by: Shahaf Shuler <shahafs@mellanox.com>
2019-03-08 17:52:22 +01:00
Viacheslav Ovsiienko
4fb27c1dfe net/mlx5: fix flow priorities probing error path
The mlx5 PMD probes the Verbs flow priorities supported with
ibv_create_flow() function. If rdma-core or kernel fails for
some reason, the returned error causes the drop queue is not
destroyed, and pd is locked by not freed resource.

Also the mlx5_flow_discover_priorities() returned negative value
as error, and this code was reported "as is", without sign
changing (eventually causing assert(err > 0)).

Fixes: 2815702bae ("net/mlx5: replace verbs priorities by flow")
Cc: stable@dpdk.org

Signed-off-by: Viacheslav Ovsiienko <viacheslavo@mellanox.com>
Acked-by: Shahaf Shuler <shahafs@mellanox.com>
2019-03-08 17:52:22 +01:00
Thomas Monjalon
dbeba4cf18 net/mlx: prefix private structure
The private structure stored in rte_eth_dev->data->dev_private
was named "struct priv".
In order to ease code browsing, the structure is renamed
"struct mlx[45]_priv".

Cc: stable@dpdk.org

Signed-off-by: Thomas Monjalon <thomas@monjalon.net>
Acked-by: Yongseok Koh <yskoh@mellanox.com>
2019-03-01 18:17:35 +01:00
Luca Boccassi
e30b4e566f build: improve dependency handling
Whenever possible (if the library ships a pkg-config file) use meson's
dependency() function to look for it, as it will automatically add it
to the Requires.private list if needed, to allow for static builds to
succeed for reverse dependencies of DPDK. Otherwise the recursive
dependencies are not parsed, and users doing static builds have to
resolve them manually by themselves.
When using this API avoid additional checks that are superfluous and
take extra time, and avoid adding the linker flag manually which causes
it to be duplicated.

Signed-off-by: Luca Boccassi <bluca@debian.org>
Acked-by: Bruce Richardson <bruce.richardson@intel.com>
Tested-by: Bruce Richardson <bruce.richardson@intel.com>
2019-02-27 12:13:54 +01:00
Thomas Monjalon
714bf46ebb net/mlx: support firmware version query
The API function rte_eth_dev_fw_version_get() is querying drivers
via the operation callback fw_version_get().
The implementation of this operation is added for mlx4 and mlx5.
Both functions are copying the same ibverbs field fw_ver
which is retrieved when calling ibv_query_device[_ex]()
during the port probing.

It is tested with command "drvinfo" of examples/ethtool/.

Signed-off-by: Thomas Monjalon <thomas@monjalon.net>
Acked-by: Shahaf Shuler <shahafs@mellanox.com>
2019-02-13 12:55:38 +01:00
Dekel Peled
7f4019d370 net/mlx5: fix Tx metadata for multi-segment packet
Original patch implemented the use of match_metadata offload in the
different burst functions.
The concurrent use of match_metadata and multi_segs offloads was
not handled.

This patch updates function txq_scatter_v(), to pass metadata value
from mbuf to wqe, when indicated by offload flags.

Fixes: 6bd7fbd03c ("net/mlx5: support metadata as flow rule criteria")
Cc: stable@dpdk.org

Signed-off-by: Dekel Peled <dekelp@mellanox.com>
Acked-by: Yongseok Koh <yskoh@mellanox.com>
2019-02-13 12:55:38 +01:00
Viacheslav Ovsiienko
c9e462d17b net/mlx5: fix build for armv8
Added <rte_cycles.h> inclusion, was not included on some
building setups (armv8).

Fixes: 71ab2d6472 ("net/mlx5: fix VXLAN port registration race condition")
Cc: stable@dpdk.org

Signed-off-by: Viacheslav Ovsiienko <viacheslavo@mellanox.com>
Acked-by: Shahaf Shuler <shahafs@mellanox.com>
2019-01-30 11:08:35 +01:00
Viacheslav Ovsiienko
a9c94cc050 net/mlx5: fix VXLAN without decap action for E-Switch
There is an intention to support VXLAN tunnel match without
hardware offloaded decapsulation, just to redirect ingress
tunnelled frame untouched. This small fix allows to specify
Flows with VXLAN VNI pattern and with or without following
decapsulation action.

Fixes: 251e8d02cf ("net/mlx5: add VXLAN to flow translate routine")
Cc: stable@dpdk.org

Signed-off-by: Viacheslav Ovsiienko <viacheslavo@mellanox.com>
Acked-by: Shahaf Shuler <shahafs@mellanox.com>
2019-01-27 12:33:47 +01:00
Viacheslav Ovsiienko
71ab2d6472 net/mlx5: fix VXLAN port registration race condition
E-Switch VXLAN tunneling rules require virtual VXLAN network
devices be created. These devices are managed by MLX5 PMD and
created/deleted dynamically.

Kernel creates the VXLAN devices and registers VXLAN UDP ports
to be hardware offloaded within the NIC kernel drivers. The
registration process is being performed into context of working
kernel thread and the race conditions might happen.

The VXLAN device is created and success code is returned to calling
application, but the UDP port registration process is not completed
yet and the next applied rule might be rejected by the driver with
ENOSUP code. This patch adds some timeout for new created devices,
allowing port registration process to be completed. The waiting
is performed once after device been created and first rule is being
applied.

Fixes: 95a464cecc ("net/mlx5: add E-switch VXLAN tunnel devices management")
Cc: stable@dpdk.org

Signed-off-by: Viacheslav Ovsiienko <viacheslavo@mellanox.com>
Acked-by: Shahaf Shuler <shahafs@mellanox.com>
2019-01-24 14:53:10 +01:00
Viacheslav Ovsiienko
0deb984fd7 net/mlx5: fix TC rule handle assignment
When tc rule is created via Netlink message application
can provide the unique rule value which can be accepted
by the kernel. Than rule is managed with this assigned
handle. It was found that kernel can reject the proposed
handle and assign its own handle value, the rule control
is lost, because application uses its initially prorosed
rule handle and knows nothing about handle been repleced.

The kernel can assign handle automatically, the application
can get the assigned handle value by specifying NLM_F_ECHO
flag in Netlink message when rule is being created. The
kernel sends back the full descriptor of rule and handle
can be retrieved from and stored by application for further
rule management.

Fixes: 57123c00c1 ("net/mlx5: add Linux TC flower driver for E-Switch flow")
Cc: stable@dpdk.org

Signed-off-by: Viacheslav Ovsiienko <viacheslavo@mellanox.com>
Acked-by: Shahaf Shuler <shahafs@mellanox.com>
2019-01-24 14:53:10 +01:00
Dekel Peled
15c8015587 net/mlx5: block RSS action without Rx queue
This patch modifies function mlx5_flow_validate_action_rss(), to
prevent the setting of rule with rss action, but without specifying
any queues.
For example:
flow create 0 ingress pattern end actions rss queues end / end

Signed-off-by: Dekel Peled <dekelp@mellanox.com>
Acked-by: Shahaf Shuler <shahafs@mellanox.com>
2019-01-24 14:53:10 +01:00
Dekel Peled
ff160dbcba net/mlx5: allow port start with zero Rx queue
During port start, function mlx5_ctrl_flow_vlan() is called to create
default ingress flow rules.
For specific use-cases, a port can be used for Tx only.
In such case, number of Rx queues can be set to 0 to save resources,
hence the default ingress rules are irrelevant.

This patch modifies function mlx5_ctrl_flow_vlan() to avoid the
creation of the default ingress rules when number of Rx queues is 0.
It also includes update of validation functions for relevant actions,
mlx5_flow_validate_action_queue() and mlx5_flow_validate_action_rss(),
to prevent creation of flow rules with these actions when number of Rx
queues is 0.

Signed-off-by: Dekel Peled <dekelp@mellanox.com>
Acked-by: Shahaf Shuler <shahafs@mellanox.com>
2019-01-24 14:53:10 +01:00
Yongseok Koh
2014a7fbae net/mlx5: fix deprecated library API for Rx padding
In rdma-core library IBV_WQ_FLAG_RX_END_PADDING is renamed to
IBV_WQ_FLAGS_PCI_WRITE_END_PADDING. Way to query the capability is also
changed.

Fixes: 43e9d9794c ("net/mlx5: support upstream rdma-core")
Cc: stable@dpdk.org

Signed-off-by: Yongseok Koh <yskoh@mellanox.com>
Reviewed-by: Erez Ferber <erezf@mellanox.com>
Acked-by: Shahaf Shuler <shahafs@mellanox.com>
2019-01-18 09:47:26 +01:00
Yongseok Koh
78c7a16daa net/mlx5: fix Rx packet padding
Rx packet padding is supposed to be set by an environment variable -
MLX5_PMD_ENABLE_PADDING, but it has been missing for some time by mistake.
Rather than using such a variable, a PMD parameter (rxq_pkt_pad_en) is
added instead.

Fixes: a1366b1a2b ("net/mlx5: add reference counter on DPDK Rx queues")
Cc: stable@dpdk.org

Signed-off-by: Yongseok Koh <yskoh@mellanox.com>
Reviewed-by: Erez Ferber <erezf@mellanox.com>
Acked-by: Shahaf Shuler <shahafs@mellanox.com>
2019-01-18 09:47:26 +01:00
Yongseok Koh
12d468a62b net/mlx5: fix instruction hotspot on replenishing Rx buffer
On replenishing Rx buffers for vectorized Rx, mbuf->buf_addr isn't needed
to be accessed as it is static and easily calculated from the mbuf address.
Accessing the mbuf content causes unnecessary load stall and it is worsened
on ARM.

Fixes: 545b884b1d ("net/mlx5: fix buffer address posting in SSE Rx")
Cc: stable@dpdk.org

Signed-off-by: Yongseok Koh <yskoh@mellanox.com>
Acked-by: Shahaf Shuler <shahafs@mellanox.com>
2019-01-15 02:40:40 +01:00
Viacheslav Ovsiienko
55c61fa714 net/mlx5: validate TOS and TTL on E-Switch
This patch adds the type-of-service and time-to-live IP header
fields validation on E-Switch, both for match pattern and
VXLAN encapsulation action IP header itesm. The E-Switch flows
will use the common mlx5_flow_validate_item_ipv4/6 routines
with added extra parameter, specifying the supported fields
mask.

Signed-off-by: Viacheslav Ovsiienko <viacheslavo@mellanox.com>
Acked-by: Shahaf Shuler <shahafs@mellanox.com>
2019-01-14 17:44:30 +01:00
Viacheslav Ovsiienko
363fa2f296 net/mlx5: support TOS and TTL fields on E-Switch
This patch adds the type-of-service and time-to-live IP header
fields support on E-Switch. There match pattern for both fields
with masking is added. Also these fields can be set for VXLAN
tunnel encapsulation header.

This issue is critical for some Open VSwitch configuration
on overlayed (tunneled) networks, where the tos field can be
inherited from outer header to inner header.

Signed-off-by: Viacheslav Ovsiienko <viacheslavo@mellanox.com>
Acked-by: Shahaf Shuler <shahafs@mellanox.com>
2019-01-14 17:44:30 +01:00
Viacheslav Ovsiienko
9d6d159a3f net/mlx5: add TOS and TTL flower match and tunnel keys
This patch is a preparation for adding the type-of-service and
time-to-live IP header fields support on E-Switch. There are
two types of keys added - one for match pattern, other for
tunnel encapsulation header.

This issue is critical for some Open VSwitch configuration
on overlayed (tunneled) networks, where the tos field can be
inherited from outer header to inner header.

Signed-off-by: Viacheslav Ovsiienko <viacheslavo@mellanox.com>
Acked-by: Shahaf Shuler <shahafs@mellanox.com>
2019-01-14 17:44:30 +01:00
Viacheslav Ovsiienko
01925b8c64 net/mlx5: add RHEL-7.2 VXLAN device metadata workaround
RH7.2 with kernel 3.10.0-327 does not support VXLAN
devices metadata and IFLA_VXLAN_COLLECT_METADATA
key is neither defined nor supported. We must specify
VNI parameter, which will be actually ignored by kernel,
applied rules will be processed by mlx5 kernel driver
and the actual VNI from rules will be used.

Signed-off-by: Viacheslav Ovsiienko <viacheslavo@mellanox.com>
Acked-by: Shahaf Shuler <shahafs@mellanox.com>
2019-01-14 17:44:30 +01:00
Viacheslav Ovsiienko
8ba325b051 net/mlx5: switch to detached VXLAN network devices
Current design uses the VXLAN virtual devices attached
to outer network interface for decapsulation. Kernel
allows to use non-attached devices, so now we can create
not attached device and use it both for encapsulation
and decapsulation. Devices management becomes simpler,
less VXLAN devices are created and used.

Signed-off-by: Viacheslav Ovsiienko <viacheslavo@mellanox.com>
Acked-by: Shahaf Shuler <shahafs@mellanox.com>
2019-01-14 17:44:30 +01:00