Public struct rte_eth_dev_info has a "struct rte_pci_device" field in it
although it is common for all ethdev in all buses.
Replacing pci specific struct with generic device struct and updating
places that are using pci device in a way to get this information from
generic device.
Signed-off-by: Ferruh Yigit <ferruh.yigit@intel.com>
Reviewed-by: David Marchand <david.marchand@6wind.com>
Acked-by: Pablo de Lara <pablo.de.lara.guarch@intel.com>
Acked-by: Thomas Monjalon <thomas@monjalon.net>
All Netlink request the PMD will do can also be done by a iproute2 command
line interface, enabling VF behavior configuration without having to modify
the application nor reaching PMD limits (e.g. MAC address number limit).
Signed-off-by: Nelio Laranjeiro <nelio.laranjeiro@6wind.com>
Acked-by: Adrien Mazarguil <adrien.mazarguil@6wind.com>
VF devices are not able to receive promisc or allmulti traffic unless it
fully requests it though Netlink. This will cause the request to be
processed by the PF which will handle the request and enable it.
This requires the VF to be trusted by the PF.
Signed-off-by: Nelio Laranjeiro <nelio.laranjeiro@6wind.com>
Acked-by: Adrien Mazarguil <adrien.mazarguil@6wind.com>
VF devices are not able to receive traffic unless it fully requests it
though Netlink. This will cause the request to be processed by the PF
which will add/remove the MAC address to the VF table if the VF is trusted.
Signed-off-by: Nelio Laranjeiro <nelio.laranjeiro@6wind.com>
Acked-by: Adrien Mazarguil <adrien.mazarguil@6wind.com>
In Enhanced Multi-Packet Send (eMPW), entire packet data is prefetched to
LLC if it isn't inlined. Even though this helps reducing jitter when HW
fetches data by DMA, this can thresh the LLC with evicting precious data.
And if the size of queue is large and there are many queues, this might not
be effective. Also, if application runs on a remote node from the PCIe
link, it may not be helpful and can even cause bad results.
Signed-off-by: Yongseok Koh <yskoh@mellanox.com>
Acked-by: Adrien Mazarguil <adrien.mazarguil@6wind.com>
According to CQE format:
- l4_hdr_type:
0 - None
1 - TCP header was present in the packet
2 - UDP header was present in the packet
3 - TCP header was present in the packet with Empty
TCP ACK indication. (TCP packet <ACK> flag is set,
and packet carries no data)
4 - TCP header was present in the packet with TCP ACK indication.
(TCP packet <ACK> flag is set, and packet carries data).
A packet should be identified as TCP packet if l4_hdr_type is 1, 3 or 4.
Add corresponding idx of TCP ACK to ptype table.
previous discussion:
https://www.mail-archive.com/users@dpdk.org/msg02980.html
Signed-off-by: Bin Huang <bin.huang@hxt-semitech.com>
Acked-by: Yongseok Koh <yskoh@mellanox.com>
When linking the mlx glue code libraries using CC, the linker arguments in
LDFLAGS are not prefixed with -Wl. [The EXTRA_LDFLAGS are though.] This
leads to warning messages on build:
clang-5.0: warning: argument unused during compilation: '-e xport-dynamic'
Fix this by checking for $LINK_USING_CC in the Makefiles and prefixing the
LDFLAGS appropriately if set.
Fixes: 27cea11686ff ("net/mlx4: spawn rdma-core dependency plug-in")
Fixes: 59b91bec12c6 ("net/mlx5: spawn rdma-core dependency plug-in")
Cc: stable@dpdk.org
Signed-off-by: Bruce Richardson <bruce.richardson@intel.com>
Acked-by: Yongseok Koh <yskoh@mellanox.com>
Before, we were aggregating multiple pages into one memseg, so the
number of memsegs was small. Now, each page gets its own memseg,
so the list of memsegs is huge. To accommodate the new memseg list
size and to keep the under-the-hood workings sane, the memseg list
is now not just a single list, but multiple lists. To be precise,
each hugepage size available on the system gets one or more memseg
lists, per socket.
In order to support dynamic memory allocation, we reserve all
memory in advance (unless we're in 32-bit legacy mode, in which
case we do not preallocate memory). As in, we do an anonymous
mmap() of the entire maximum size of memory per hugepage size, per
socket (which is limited to either RTE_MAX_MEMSEG_PER_TYPE pages or
RTE_MAX_MEM_MB_PER_TYPE megabytes worth of memory, whichever is the
smaller one), split over multiple lists (which are limited to
either RTE_MAX_MEMSEG_PER_LIST memsegs or RTE_MAX_MEM_MB_PER_LIST
megabytes per list, whichever is the smaller one). There is also
a global limit of CONFIG_RTE_MAX_MEM_MB megabytes, which is mainly
used for 32-bit targets to limit amounts of preallocated memory,
but can be used to place an upper limit on total amount of VA
memory that can be allocated by DPDK application.
So, for each hugepage size, we get (by default) up to 128G worth
of memory, per socket, split into chunks of up to 32G in size.
The address space is claimed at the start, in eal_common_memory.c.
The actual page allocation code is in eal_memalloc.c (Linux-only),
and largely consists of copied EAL memory init code.
Pages in the list are also indexed by address. That is, in order
to figure out where the page belongs, one can simply look at base
address for a memseg list. Similarly, figuring out IOVA address
of a memzone is a matter of finding the right memseg list, getting
offset and dividing by page size to get the appropriate memseg.
This commit also removes rte_eal_dump_physmem_layout() call,
according to deprecation notice [1], and removes that deprecation
notice as well.
On 32-bit targets due to limited VA space, DPDK will no longer
spread memory to different sockets like before. Instead, it will
(by default) allocate all of the memory on socket where master
lcore is. To override this behavior, --socket-mem must be used.
The rest of the changes are really ripple effects from the memseg
change - heap changes, compile fixes, and rewrites to support
fbarray-backed memseg lists. Due to earlier switch to _walk()
functions, most of the changes are simple fixes, however some
of the _walk() calls were switched to memseg list walk, where
it made sense to do so.
Additionally, we are also switching locks from flock() to fcntl().
Down the line, we will be introducing single-file segments option,
and we cannot use flock() locks to lock parts of the file. Therefore,
we will use fcntl() locks for legacy mem as well, in case someone is
unfortunate enough to accidentally start legacy mem primary process
alongside an already working non-legacy mem-based primary process.
[1] http://dpdk.org/dev/patchwork/patch/34002/
Signed-off-by: Anatoly Burakov <anatoly.burakov@intel.com>
Tested-by: Santosh Shukla <santosh.shukla@caviumnetworks.com>
Tested-by: Hemant Agrawal <hemant.agrawal@nxp.com>
Tested-by: Gowrishankar Muthukrishnan <gowrishankar.m@linux.vnet.ibm.com>
Aligning Mellanox SPDX copyrights to a single format.
In addition replace to SPDX licence files which were missed.
Signed-off-by: Shahaf Shuler <shahafs@mellanox.com>
Acked-by: Adrien Mazarguil <adrien.mazarguil@6wind.com>
Since we have support for the strlcpy function in DPDK, replace all
instances where a string is copied using snprintf.
Signed-off-by: Bruce Richardson <bruce.richardson@intel.com>
Reviewed-by: Stephen Hemminger <stephen@networkplumber.org>
The RSS key length returned by rte_eth_dev_info_get command was taken
from the
PMD private structure. This structure initialization was done only after
the port configuration.
Considering Mellanox device supports only 40B long RSS key, reporting
the fixed number instead.
Fixes: 29c1d8bb3e79 ("net/mlx5: handle a single RSS hash key for all protocols")
Cc: stable@dpdk.org
Signed-off-by: Shahaf Shuler <shahafs@mellanox.com>
Acked-by: Nelio Laranjeiro <nelio.laranjeiro@6wind.com>
In some environments it is desirable to have the NIC perform RSS
normally on the packet regardless of the number of queues configured.
The RSS hash result that is stored in the mbuf can then be used by
the application to make decisions about how to distribute workloads
to threads, secondary processes, or even virtual machines if the
application is a virtual switch. This change to the mlx5 driver
aligns with how other drivers in the Intel family work.
Signed-off-by: Allain Legacy <allain.legacy@windriver.com>
Signed-off-by: Nelio Laranjeiro <nelio.laranjeiro@6wind.com>
Acked-by: Yongseok Koh <yskoh@mellanox.com>
Tested-by: Allain Legacy <allain.legacy@windriver.com>
Remove the second declaration of device_attr [1] inside the loop as well as
the query_device_ex() which has already been done outside of the loop.
[1] https://dpdk.org/ml/archives/dev/2018-March/091744.html
Fixes: 9a761de8ea14 ("net/mlx5: flow counter support")
Cc: stable@dpdk.org
Reported-by: Ferruh Yigit <ferruh.yigit@intel.com>
Signed-off-by: Nelio Laranjeiro <nelio.laranjeiro@6wind.com>
Acked-by: Yongseok Koh <yskoh@mellanox.com>
Tested-by: Ferruh Yigit <ferruh.yigit@intel.com>
TSO should be set if either of the TSO offload flags is requested.
Fixes: dbccb4cddcd2 ("net/mlx5: convert to new Tx offloads API")
Cc: stable@dpdk.org
Signed-off-by: Shahaf Shuler <shahafs@mellanox.com>
Acked-by: Yongseok Koh <yskoh@mellanox.com>
Verbs specification doesn't help to distinguish between packets having an
VLAN and those which do not have, this ends by having flow rule which does
not react as the user expects e.g.
flow create 0 ingress pattern eth / vlan / end action queue index 0 / end
flow create 0 ingress pattern eth / end action queue index 1 / end
are colliding in Verbs definition as in both rule are matching packets with
or without VLAN.
For this reason, the VLAN specification must not be empty, otherwise the
PMD has to refuse it.
Signed-off-by: Nelio Laranjeiro <nelio.laranjeiro@6wind.com>
Fill the error context in conversion function to provide a better reason on
why it cannot be done to the user.
Signed-off-by: Nelio Laranjeiro <nelio.laranjeiro@6wind.com>
Packet matching inner and outer flow rules are caught by the first one
added in the device as both flows are configured with the same priority.
To avoid such situation, the inner flow can have an higher priority than
the outer ones as their pattern matching will otherwise collide.
Signed-off-by: Nelio Laranjeiro <nelio.laranjeiro@6wind.com>
Wait to complete is present to let the application get a correct status
when it requires it, it should not be ignored.
Fixes: e313ef4c2fe8 ("net/mlx5: fix link state on device start")
Fixes: cb8faed7dde8 ("mlx5: support link status update")
Cc: stable@dpdk.org
Signed-off-by: Nelio Laranjeiro <nelio.laranjeiro@6wind.com>
Acked-by: Adrien Mazarguil <adrien.mazarguil@6wind.com>
This behavior is mixed between what should be handled by the application
and what is under PMD responsibility.
According to DPDK API:
- link_update() should only query the link status [1]
- link_set_{up,down}() should only set the link to the according status [1]
- dev_{start,stop}() should enable/disable traffic reception/emission [2]
On this PMD, the link status is retrieved from the net device associated
owned by the Linux Kernel, it does not means that even when this interface
is down, the PMD cannot send/receive traffic from the NIC those two
information are unrelated, until the physical port is active and has a
link, the PMD can receive/send traffic on the wire.
According to DPDK API, calling the rte_eth_dev_start() even when the Linux
interface link is down is then possible and allowed, as the traffic will
flow between the DPDK application and the Physical port.
This also means that a synchronization between the Linux interface and the
DPDK application remains under the DPDK application responsibility.
To handle such synchronization the application should behave as the
following scheme, to start:
rte_eth_get_link(port_id, &link);
if (link.link_status == ETH_DOWN)
rte_eth_dev_set_link_up(port_id);
rte_eth_dev_start(port_id);
Taking in account the possible returned values for each function.
and to stop:
rte_eth_dev_stop(port_id);
rte_eth_dev_set_link_down(port_id);
The application should also set the LSC interrupt callbacks to catch and
behave accordingly when the administrator set the Linux device down/up.
The same callbacks are called when the link on the medium falls/raise.
[1] https://dpdk.org/browse/dpdk/tree/lib/librte_ether/rte_ethdev_core.h
[2] https://dpdk.org/browse/dpdk/tree/lib/librte_ether/rte_ethdev.h#n1677
Fixes: c7bf62255edf ("net/mlx5: fix handling link status event")
Fixes: e313ef4c2fe8 ("net/mlx5: fix link state on device start")
Cc: stable@dpdk.org
Signed-off-by: Nelio Laranjeiro <nelio.laranjeiro@6wind.com>
Acked-by: Adrien Mazarguil <adrien.mazarguil@6wind.com>
Acked-by: Yongseok Koh <yskoh@mellanox.com>
Kernel version check was introduced in
commit 3a49ffe38a95 ("net/mlx5: fix link status query")
due to a bug fixed by
commit ef09a7fc7620 ("net/mlx5: fix inconsistent link status query")
This patch restore the previous behavior as described in Linux API.
Signed-off-by: Nelio Laranjeiro <nelio.laranjeiro@6wind.com>
Acked-by: Adrien Mazarguil <adrien.mazarguil@6wind.com>
rdma-core v16 has a bug. The following compilation error occurs on ARM
hosts.
In file included
from drivers/net/mlx5/mlx5_glue.h:16:0,
from drivers/net/mlx5/mlx5_glue.c:11:
/usr/include/infiniband/mlx5dv.h:144:2: error: unknown type name 'off_t'
off_t uar_mmap_offset;
^
As a temporary fix, sys/types.h is included in PMD. This has been fixed in
rdma-core v17. This can be removed when all the Linux distros are shipped
with rdma-core v17 or back-ported fix. As of now, RedHat 7.5 is known to
have rdma-core v16.
Cc: stable@dpdk.org
Signed-off-by: Yongseok Koh <yskoh@mellanox.com>
Acked-by: Adrien Mazarguil <adrien.mazarguil@6wind.com>
There is no guarantee that the file won't be removed by external
user/application between the stat() and remove() syscalls, remove() will
fail if the file no longer exists.
Fixes: f8b9a3bad467 ("net/mlx5: install a socket to exchange a file descriptor")
Cc: stable@dpdk.org
Signed-off-by: Xueming Li <xuemingl@mellanox.com>
Acked-by: Nelio Laranjeiro <nelio.laranjeiro@6wind.com>
These functions return int although they are not supposed to fail,
resulting in unnecessary checks in their callers.
Some are returning error where is should be a boolean.
Signed-off-by: Nelio Laranjeiro <nelio.laranjeiro@6wind.com>
Acked-by: Adrien Mazarguil <adrien.mazarguil@6wind.com>
This change removes the need to distinguish unlocked priv_*() functions
which are therefore renamed using a mlx5_*() prefix for consistency.
At the same time, all functions from mlx5 uses a pointer to the ETH device
instead of the one to the PMD private data.
Signed-off-by: Nelio Laranjeiro <nelio.laranjeiro@6wind.com>
Acked-by: Adrien Mazarguil <adrien.mazarguil@6wind.com>
In priv struct only the memory region needs to be protected against
concurrent access between the control plane and the data plane.
Signed-off-by: Nelio Laranjeiro <nelio.laranjeiro@6wind.com>
Acked-by: Adrien Mazarguil <adrien.mazarguil@6wind.com>
Some empty lines have been added in the middle of the code without any
reason. This commit removes them.
Signed-off-by: Nelio Laranjeiro <nelio.laranjeiro@6wind.com>
Acked-by: Adrien Mazarguil <adrien.mazarguil@6wind.com>
Replaces all (void)foo; by __rte_unused macro except when variables are
under #if statements.
Signed-off-by: Nelio Laranjeiro <nelio.laranjeiro@6wind.com>
Acked-by: Adrien Mazarguil <adrien.mazarguil@6wind.com>
priv_get_num_vfs() was used to help the PMD in prefetching the mbuf in
datapath when the PMD was behaving in VF mode.
This knowledge is no more used.
Fixes: 528a9fbec6de ("net/mlx5: support ConnectX-5 devices")
Cc: stable@dpdk.org
Signed-off-by: Nelio Laranjeiro <nelio.laranjeiro@6wind.com>
Acked-by: Adrien Mazarguil <adrien.mazarguil@6wind.com>
Glue object files are looked up in RTE_EAL_PMD_PATH by default when set and
should be installed in this directory.
During startup, EAL attempts to load them automatically like other plug-ins
found there. While normally harmless, dlopen() fails when rdma-core is not
installed, EAL interprets this as a fatal error and terminates the
application.
This patch requests glue objects to be installed in a different directory
to prevent their automatic loading by EAL since they are PMD helpers, not
actual DPDK plug-ins.
Fixes: f6242d0655cd ("net/mlx: make rdma-core glue path configurable")
Cc: stable@dpdk.org
Reported-by: Timothy Redaelli <tredaelli@redhat.com>
Signed-off-by: Adrien Mazarguil <adrien.mazarguil@6wind.com>
Tested-by: Timothy Redaelli <tredaelli@redhat.com>
The query for the tunnel stateless offloads is wrongly implemented
because of:
1. It was using the device id to query for the offloads.
2. It was using a compilation flag for Verbs which no longer exits.
The main reason was lack of proper API from Verbs.
Fixing the query to use rdma-core API. The capability returned from
rdma-core refer to both Tx and Rx sides.
Eventhough there is a separate cap for GRE and VXLAN, implementation merge
them into a single flag in order to simplify the checks on the data
path.
Fixes: 43e9d9794cde ("net/mlx5: support upstream rdma-core")
Fixes: f5fde5205101 ("net/mlx5: add hardware checksum offload for tunnel packets")
Cc: stable@dpdk.org
Signed-off-by: Shahaf Shuler <shahafs@mellanox.com>
Acked-by: Yongseok Koh <yskoh@mellanox.com>
Acked-by: Xueming Li <xuemingl@mellanox.com>
Acked-by: Nelio Laranjeiro <nelio.laranjeiro@6wind.com>
Adding a pattern targeting a single queues wrongly behaves as it is an RSS
request, ending by creating several Verbs flows rules to match the RSS
configuration.
Fixes: 8086cf08b2f0 ("net/mlx5: handle RSS hash configuration in RSS flow")
Cc: stable@dpdk.org
Signed-off-by: Nelio Laranjeiro <nelio.laranjeiro@6wind.com>
Acked-by: Adrien Mazarguil <adrien.mazarguil@6wind.com>
Several control operations implemented by these PMDs affect netdevices
through sysfs, itself subject to file system permission checks enforced by
the kernel, which limits their use for most purposes to applications
running with root privileges.
Since performing the same operations through ioctl() requires fewer
capabilities (only CAP_NET_ADMIN) and given the remaining operations are
already implemented this way, this patch standardizes on ioctl() and gets
rid of redundant code.
Signed-off-by: Adrien Mazarguil <adrien.mazarguil@6wind.com>
Reviewed-by: Marcelo Ricardo Leitner <marcelo.leitner@gmail.com>
This is to revert the following commits:
commit da646bd93888 ("net/mlx5: fix all multi verification code position")
commit 0a40a1363a4d ("net/mlx5: fix flow type for allmulti rules")
The last one introduced a bug in the following diff:
@ -1262,6 +1274,7 @@ struct ibv_spec_header {
eth.val.ether_type &= eth.mask.ether_type;
}
mlx5_flow_create_copy(parser, ð, eth_size);
+ parser->allmulti = eth.val.dst_mac[0] & 1;
return 0;
}
As broadcast rules will be considered of type allmulti as well.
The patch was originally intended to enable VF to receive all multicast
traffic by using the IBV_FLOW_ATTR_MC_DEFAULT flow type.
Since the support was removed from the kernel there is no point with
fixing this issue, hence the revert.
Fixes: da646bd93888 ("net/mlx5: fix all multi verification code position")
Fixes: 0a40a1363a4d ("net/mlx5: fix flow type for allmulti rules")
Cc: stable@dpdk.org
Signed-off-by: Shahaf Shuler <shahafs@mellanox.com>
Acked-by: Nelio Laranjeiro <nelio.laranjeiro@6wind.com>
This patch fixed primary socket assertion error during close on a device
that failed to start.
Fixes: f8b9a3bad467 ("net/mlx5: install a socket to exchange a file descriptor")
Cc: stable@dpdk.org
Signed-off-by: Xueming Li <xuemingl@mellanox.com>
Acked-by: Nelio Laranjeiro <nelio.laranjeiro@6wind.com>
Neither upstream kernel nor MLNX_OFED support such filter.
There is no point announcing this feature.
Reverts commit 0fb2c9842b20 ("net/mlx5: support IPv4 time-to-live filter")
Fixes: 0fb2c9842b20 ("net/mlx5: support IPv4 time-to-live filter")
Signed-off-by: Shahaf Shuler <shahafs@mellanox.com>
Acked-by: Nelio Laranjeiro <nelio.laranjeiro@6wind.com>
priv_tx_uar_remap() is wrongly considering the queue is already configured
and thus present in the queue array of the device.
Fixes: f8b9a3bad467 ("net/mlx5: install a socket to exchange a file descriptor")
Cc: stable@dpdk.org
Signed-off-by: Nelio Laranjeiro <nelio.laranjeiro@6wind.com>
An RSS configuration without a key is valid according to the
rte_eth_rss_conf API definition.
Fixes: 8086cf08b2f0 ("net/mlx5: handle RSS hash configuration in RSS flow")
Cc: stable@dpdk.org
Reported-by: Yuanhan Liu <yliu@fridaylinux.org>
Signed-off-by: Nelio Laranjeiro <nelio.laranjeiro@6wind.com>
Since rdma-core glue libraries are intrinsically tied to their respective
PMDs and used as internal plug-ins, their presence in the default search
path among other system libraries for the dynamic linker is not necessarily
desired.
This commit enables their installation and subsequent look-up at run time
in RTE_EAL_PMD_PATH if configured to a nonempty string. This path can also
be overridden by environment variables MLX[45]_GLUE_PATH.
Signed-off-by: Adrien Mazarguil <adrien.mazarguil@6wind.com>
When built as separate objects, these libraries do not have unique names.
Since they do not maintain a stable ABI, loading an incompatible library
may result in a crash (e.g. in case multiple versions are installed).
This patch addresses the above by versioning glue libraries, both on the
file system (version suffix) and by comparing a dedicated version field
member in glue structures.
Signed-off-by: Adrien Mazarguil <adrien.mazarguil@6wind.com>