The PMD was not reporting the supported RSS capabilities.
Fixes: 2f97422e7759 ("mlx5: support RSS hash update and get")
Cc: stable@dpdk.org
Signed-off-by: Shahaf Shuler <shahafs@mellanox.com>
Acked-by: Yongseok Koh <yskoh@mellanox.com>
In OVS-DPDK, the suggested way to identify a port for port addition is
its PCI BDF. mlx5, however, has its own naming style: it names the port
after the ib device name. This breaks the typical OVS-DPDK use case and
confuses end users.
To fix it, this patch changes mlx5 to use the PCI BDF as the name, too.
Also, a postfix " port %u" is added, in case there is more than one
port associated with a PCI device.
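The naming scheme can be illustrated with a minimal sketch. The struct
pci_bdf type and the format_port_name() helper are hypothetical names
used only for this example, and the exact BDF format string is an
assumption; the driver's own code may differ.

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>

/* Hypothetical stand-in for a PCI address (domain:bus:devid.function). */
struct pci_bdf {
	uint32_t domain;
	uint8_t bus;
	uint8_t devid;
	uint8_t function;
};

/*
 * Build "<BDF> port <n>" into buf. Returns the snprintf() result, i.e. the
 * length of the full name, excluding the terminating NUL.
 */
static int
format_port_name(char *buf, size_t len, const struct pci_bdf *addr,
		 unsigned int port)
{
	assert(buf != NULL && addr != NULL);
	return snprintf(buf, len, "%04x:%02x:%02x.%x port %u",
			(unsigned int)addr->domain,
			(unsigned int)addr->bus,
			(unsigned int)addr->devid,
			(unsigned int)addr->function,
			port);
}
```

With this format, two ports behind the same PCI device get names that
share the BDF prefix and differ only in the trailing port number.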
Signed-off-by: Yuanhan Liu <yliu@fridaylinux.org>
Acked-by: Nelio Laranjeiro <nelio.laranjeiro@6wind.com>
Parameter action_flag is not used correctly in i40e_flow_parse_rss_action.
Also change it from pointer type to value type since it is not an output
parameter.
Fixes: ecad87d22383 ("net/i40e: move RSS to flow API")
Signed-off-by: Qi Zhang <qi.z.zhang@intel.com>
Acked-by: Wei Zhao <wei.zhao1@intel.com>
Acked-by: Zhiyong Yang <zhiyong.yang@intel.com>
Several calls to rte_zmalloc() do not check the return value for a null
pointer, and the memory is not freed before returning. Fix this by
adding null pointer checks and rte_free() calls.
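The pattern of the fix can be sketched as below. This is not the driver
code: sketch_zmalloc() and sketch_free() are stand-ins for rte_zmalloc()
and rte_free() so that the example builds without DPDK, and struct
filter, filter_table_insert() and filter_store() are hypothetical.

```c
#include <assert.h>
#include <errno.h>
#include <stdlib.h>

/* Stand-ins so the sketch builds without DPDK; a PMD would call
 * rte_zmalloc() and rte_free() instead. */
static void *sketch_zmalloc(size_t size) { return calloc(1, size); }
static void sketch_free(void *ptr) { free(ptr); }

struct filter { int id; }; /* hypothetical filter entry */

/* Hypothetical insert step that can fail, here when the slot is taken. */
static int
filter_table_insert(struct filter **slot, struct filter *f)
{
	if (*slot != NULL)
		return -EEXIST;
	*slot = f;
	return 0;
}

static int
filter_store(struct filter **slot, int id)
{
	struct filter *f;
	int ret;

	assert(slot != NULL);
	f = sketch_zmalloc(sizeof(*f));
	if (f == NULL)
		return -ENOMEM; /* null pointer check on the allocation */
	f->id = id;
	ret = filter_table_insert(slot, f);
	if (ret < 0)
		sketch_free(f); /* do not leak the entry on the error path */
	return ret;
}
```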
Fixes: 078259773da9 ("net/i40e: store ethertype filter")
Fixes: 425c3325f0b0 ("net/i40e: store tunnel filter")
Fixes: c50474f31efe ("net/i40e: support tunnel filter to VF")
Fixes: 5c53c82c8174 ("net/i40e: store flow director filter")
Cc: stable@dpdk.org
Signed-off-by: Yong Wang <wang.yong19@zte.com.cn>
Acked-by: Beilei Xing <beilei.xing@intel.com>
Several calls to rte_zmalloc() do not check the return value for a null
pointer. Fix this by adding the missing null pointer checks.
Fixes: 22bb13410cb2 ("net/igb: create consistent filter")
Cc: stable@dpdk.org
Signed-off-by: Yong Wang <wang.yong19@zte.com.cn>
Acked-by: Wenzhuo Lu <wenzhuo.lu@intel.com>
Remove some unnecessary explicit type casts to clean up the code.
Signed-off-by: Zhiyong Yang <zhiyong.yang@intel.com>
Acked-by: Qi Zhang <qi.z.zhang@intel.com>
bonding immediately marks the incoming eth device as bonded and doesn't
clear this in later error paths. Delay marking the dev until we are
certain that we are going to add this eth device to the bond group.
Signed-off-by: Chas Williams <chas3@att.com>
Acked-by: Radu Nicolau <radu.nicolau@intel.com>
When a device is plugged out, the fail-safe PMD uses the
failsafe_rx_burst function to receive packets.
This function iterates over the present sub-devices until it
receives traffic from one of them or none of them can receive
packets.
The faulty code did not advance the sub-device pointer when the
sub-device was not present, which caused an infinite loop.
Advance the sub-device pointer in the plugged-out sub-device case too.
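The iteration can be sketched as follows, assuming a much simplified
model. struct sub_dev, its fields and rx_burst_sketch() are hypothetical
and only illustrate why the pointer must advance for absent sub-devices
as well; they are not the fail-safe PMD data structures.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, simplified sub-device: a presence flag and a count of
 * packets waiting to be received. */
struct sub_dev {
	bool present;
	unsigned int pending;
	struct sub_dev *next; /* circular list of sub-devices */
};

/*
 * Poll sub-devices starting at *cur until one returns packets or every
 * sub-device has been visited once. *cur is updated so that the next call
 * starts at the following sub-device.
 */
static unsigned int
rx_burst_sketch(struct sub_dev **cur, unsigned int max)
{
	struct sub_dev *start;
	struct sub_dev *sdev;
	unsigned int nb_rx = 0;

	assert(cur != NULL && *cur != NULL);
	start = *cur;
	sdev = start;
	do {
		if (sdev->present && sdev->pending > 0) {
			nb_rx = sdev->pending < max ? sdev->pending : max;
			sdev->pending -= nb_rx;
		}
		/* Advance unconditionally, also when the sub-device is
		 * absent; otherwise the loop never reaches its end. */
		sdev = sdev->next;
	} while (nb_rx == 0 && sdev != start);
	*cur = sdev;
	return nb_rx;
}
```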
Fixes: 8052bbd9d548 ("net/failsafe: improve Rx sub-devices iteration")
Signed-off-by: Matan Azrad <matan@mellanox.com>
Acked-by: Gaetan Rivet <gaetan.rivet@6wind.com>
enic_cq_rx_to_pkt_flags() currently sets checksum good/bad flags only
for IPv4. The hardware actually validates the TCP/UDP checksum of
IPv6 packets too. Set PKT_RX_L4_CKSUM_{GOOD,BAD} accordingly.
Signed-off-by: Hyong Youb Kim <hyonkim@cisco.com>
Reviewed-by: John Daley <johndale@cisco.com>
Like most NICs, this hardware (Cisco VIC) also requires partial
checksum in the packet for checksum offload and TSO. So, add
the tx_pkt_prepare handler like other PMDs do.
Technically, VIC has an offload mode that does not require partial
checksum for non-TSO packets. But, it has no such mode for TSO
packets, making tx_pkt_prepare unavoidable.
Signed-off-by: Hyong Youb Kim <hyonkim@cisco.com>
Reviewed-by: John Daley <johndale@cisco.com>
ENIC_CQ_MAX, ENIC_WQ_MAX and others are arbitrary values that
prevent the app from using more queues when they are available on
hardware. Remove them and dynamically allocate vnic_cq and such
arrays to accommodate all available hardware queues.
As a side effect of removing ENIC_CQ_MAX, this commit fixes a segfault
that would happen when the app requests more than 16 CQs, because
enic_set_vnic_res() does not consider ENIC_CQ_MAX. For example, the
following command causes a crash.
testpmd -- --rxq=16 --txq=16
Fixes: ce93d3c36db0 ("net/enic: fix resource check failures when bonding devices")
Cc: stable@dpdk.org
Signed-off-by: Hyong Youb Kim <hyonkim@cisco.com>
Reviewed-by: John Daley <johndale@cisco.com>
When no memory is available on the same numa node as the device, the
initialization of the device fails. However, the use case where the
cores and memory are on a different socket than the device is valid,
even if not optimal.
To fix this issue, this commit introduces an infrastructure to select
the socket on which to allocate the verbs objects based on the ethdev
configuration and the object type, rather than the PCI numa node.
Fixes: 1e3a39f72d5d ("net/mlx5: allocate verbs object into shared memory")
Cc: stable@dpdk.org
Signed-off-by: Olivier Matz <olivier.matz@6wind.com>
Signed-off-by: Nelio Laranjeiro <nelio.laranjeiro@6wind.com>
On error, mlx5_dev_start() does not return a negative value
as it is supposed to do. The consequence is that the application
(e.g. testpmd) does not notice that the port is not started
and begins Rx/Tx on an uninitialized port, which crashes.
Fixes: e1016cb73383 ("net/mlx5: fix Rx interrupts management")
Cc: stable@dpdk.org
Signed-off-by: Olivier Matz <olivier.matz@6wind.com>
Acked-by: Nelio Laranjeiro <nelio.laranjeiro@6wind.com>
Allmulti code should be handled in the main line of the function, not
in its exit path.
Fixes: 0a40a1363a4d ("net/mlx5: fix flow type for allmulti rules")
Cc: stable@dpdk.org
Signed-off-by: Nelio Laranjeiro <nelio.laranjeiro@6wind.com>
Acked-by: Yongseok Koh <yskoh@mellanox.com>
Check if the security enable bits are not fused before setting
offload capabilities for security.
Signed-off-by: Radu Nicolau <radu.nicolau@intel.com>
Acked-by: Konstantin Ananyev <konstantin.ananyev@intel.com>
Add checks during build to ensure that all symbols in the EXPERIMENTAL
version map section have __experimental tags on their definitions, and
enable the warnings needed to announce their use. Also add an
ALLOW_EXPERIMENTAL_APIS define to allow individual libraries and files
to declare that experimental API usage is acceptable.
Signed-off-by: Neil Horman <nhorman@tuxdriver.com>
Acked-by: Thomas Monjalon <thomas@monjalon.net>
Append the __rte_experimental tag to API calls appearing in the
EXPERIMENTAL section of their library's version map.
Signed-off-by: Neil Horman <nhorman@tuxdriver.com>
Acked-by: Thomas Monjalon <thomas@monjalon.net>
Polling for a new packet basically means sensing the generation bit in a
completion entry. On processors without a strongly-ordered memory model,
there has to be a memory barrier between reading the generation bit and
reading the other fields of the entry to guarantee the data is not stale.
Fixes: 570acdb1da8a ("net/mlx5: add vectorized Rx/Tx burst for ARM")
Cc: stable@dpdk.org
Signed-off-by: Yongseok Koh <yskoh@mellanox.com>
Acked-by: Shahaf Shuler <shahafs@mellanox.com>
Acked-by: Nelio Laranjeiro <nelio.laranjeiro@6wind.com>
Remove CFLAGS -std=c11 and -pedantic in order to guarantee
a successful vdev_netvsc compilation on old Linux distributions.
Otherwise old GCC compilers may complain as follows:
cc1: error: unrecognized command line option -std=c11
Fixes: 6086ab3bb3d2 ("net/vdev_netvsc: introduce Hyper-V platform driver")
Signed-off-by: Ophir Munk <ophirmu@mellanox.com>
Acked-by: Matan Azrad <matan@mellanox.com>
eBPF support takes a graceful approach: it must compile successfully on
all Linux distributions. If a specific kernel cannot support eBPF, it
will gracefully refuse the eBPF netlink message sent to it.
The kernel header file linux/bpf.h (if present) on different Linux
distributions may not include all definitions required for TAP
compilation.
In order to guarantee a successful eBPF compilation everywhere, all the
definitions required for TAP have been added locally instead of
including the file <linux/bpf.h>.
Signed-off-by: Ophir Munk <ophirmu@mellanox.com>
Tested-by: Harry van Haaren <harry.van.haaren@intel.com>
Create a rte_ethdev_driver.h file and move PMD specific APIs there.
Drivers are updated to include this new header file.
There is no change in header content, and since ethdev.h is included by
ethdev_driver.h, nothing changes from the driver point of view; the APIs
are only grouped logically. From the application point of view, driver
specific APIs can no longer be accessed, and they should not be.
Some PMD specific data structures still remain in ethdev.h because
inline functions in the header use them. Those will be handled separately.
Signed-off-by: Ferruh Yigit <ferruh.yigit@intel.com>
Acked-by: Shreyansh Jain <shreyansh.jain@nxp.com>
Acked-by: Andrew Rybchenko <arybchenko@solarflare.com>
Acked-by: Thomas Monjalon <thomas@monjalon.net>
There is a delay between the physical removal of the device and the
moment sub-device PMDs get an RMV interrupt. During this time DPDK PMDs
and applications still don't know about the removal and may call a
sub-device control operation, which should return an error.
In the previous code this error is reported to the application, contrary
to the fail-safe principle that the app should not be aware of device
removal.
Add a removal check in each relevant control command error flow and
prevent an error report to the application when the sub-device is removed.
Signed-off-by: Matan Azrad <matan@mellanox.com>
Acked-by: Gaetan Rivet <gaetan.rivet@6wind.com>
TAP PMD is required to support RSS queue mapping based on rte_flow API. An
example usage for this requirement is failsafe transparent switching from a
PCI device to a TAP device while still redirecting packets to the same RSS
queues on both devices.
TAP RSS implementation is based on eBPF programs sent to Linux kernel
through BPF system calls and using netlink messages to reference the
programs as part of traffic control commands.
TC uses eBPF programs as classifiers and actions.
eBPF classification: packets marked with an RSS queue will be directed
to this queue using TC with "skbedit" action.
BPF classifiers are downloaded to the kernel once on TAP creation for
each TAP Rx queue.
eBPF action: calculate the Toeplitz RSS hash based on L3 addresses and
L4 ports. Mark the packet with the RSS queue according to the resulting
RSS hash, then reclassify the packet.
BPF actions are downloaded to the kernel for each new RSS rule.
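The hash computation named above can be illustrated in plain user-space
C. This is a minimal sketch of the Toeplitz algorithm, not the eBPF
program itself: toeplitz_hash() and rss_queue() are hypothetical names,
and selecting the queue with a plain modulo is an assumption made for
the example.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Toeplitz hash of data[0..len) under key. The key must hold at least
 * len + 4 bytes, because a 32-bit key window slides by one bit for every
 * input bit. Input bits are consumed most significant bit first.
 */
static uint32_t
toeplitz_hash(const uint8_t *key, size_t key_len,
	      const uint8_t *data, size_t len)
{
	uint32_t window;
	uint32_t hash = 0;
	size_t i;
	int bit;

	assert(key != NULL && data != NULL && key_len >= len + 4);
	window = (uint32_t)key[0] << 24 | (uint32_t)key[1] << 16 |
		 (uint32_t)key[2] << 8 | (uint32_t)key[3];
	for (i = 0; i < len; i++) {
		for (bit = 7; bit >= 0; bit--) {
			/* XOR in the current window for every set bit. */
			if (data[i] & (1u << bit))
				hash ^= window;
			/* Slide the window: shift in the next key bit. */
			window = window << 1 | ((key[i + 4] >> bit) & 1u);
		}
	}
	return hash;
}

/* Map a hash to an Rx queue index (modulo is an assumption here). */
static unsigned int
rss_queue(uint32_t hash, unsigned int nb_queues)
{
	assert(nb_queues > 0);
	return hash % nb_queues;
}
```

For RSS, the input buffer would hold the L3 addresses followed by the L4
ports of the packet, so that all packets of one flow hash to the same
queue.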
TAP eBPF requires Linux version 4.9 configured with BPF. TAP PMD will
successfully compile on systems with old or non-BPF configured kernels, but
RSS rule creation on TAP devices will fail.
Signed-off-by: Ophir Munk <ophirmu@mellanox.com>
Acked-by: Pascal Mazon <pascal.mazon@6wind.com>
This commit includes the BPF API to be used by TAP.
tap_flow_bpf_cls_q() - download to kernel BPF program that classifies
packets to their matching queues
tap_flow_bpf_calc_l3_l4_hash() - download to kernel BPF program that
calculates per packet layer 3 and layer 4 RSS hash
tap_flow_bpf_rss_map_create() - create BPF RSS map for storing RSS
parameters per RSS rule
tap_flow_bpf_update_rss_elem() - update BPF map entry with RSS rule
parameters
Signed-off-by: Ophir Munk <ophirmu@mellanox.com>
Acked-by: Pascal Mazon <pascal.mazon@6wind.com>
File tap_bpf_insns.h was added. It includes the eBPF bytecode
which corresponds to source file tap_bpf_program.c
(see "net/tap: add eBPF program file").
The bytecode is in the format of C arrays of struct bpf_insn and
was generated from the C file tap_bpf_program.c
1. The C file was compiled via LLVM into an object file in ELF
format as:
clang -O2 -emit-llvm -c tap_bpf_program.c -o - | llc -march=bpf \
-filetype=obj -o <tap_bpf_program.o>
The clang version must be 3.7 or above.
The C functions are under different ELF sections and are considered
different BPF programs to be downloaded to the kernel.
2. Using an external tool the ELF sections are parsed and the C arrays
of struct bpf_insn are generated. Each C array (corresponding to a
different function under an ELF section) is downloaded to the kernel
using a BPF system call. The external tool that generates the C arrays
will be added in separate commits.
Signed-off-by: Ophir Munk <ophirmu@mellanox.com>
Acked-by: Pascal Mazon <pascal.mazon@6wind.com>
File tap_bpf_program.c was added with two ELF sections
corresponding to two BPF programs and one BPF map.
Section cls_q - BPF classifier to classify packets to their
corresponding queue after an RSS hash was calculated on the packet
and saved in skb->cb[1]
Section l3_l4 - BPF action to calculate RSS hash on packet
layers 3 and 4
This file is not part of DPDK tree compilation.
Signed-off-by: Ophir Munk <ophirmu@mellanox.com>
Acked-by: Pascal Mazon <pascal.mazon@6wind.com>
Add generic handling for the TC actions "mirred", "gact" and
"skbedit". This will be useful when introducing BPF actions,
since BPF uses TCA_BPF_ACT instead of TCA_FLOWER_ACT.
Signed-off-by: Ophir Munk <ophirmu@mellanox.com>
Acked-by: Pascal Mazon <pascal.mazon@6wind.com>
This patch reverts:
commit 3a6f2eb8c5c5 ("net/mlx5: fix Memory Region registration")
Although the granularity of chunks in a mempool is a cacheline, addresses
are extended to align to a page boundary for performance reasons in the
device when registering an MR (Memory Region). This could make some
regions overlap, which can then cause a Tx completion error due to an
incorrect LKEY search. If the error occurs, the Tx queue gets stuck. This
happens because the buffer address is compared against the aligned
addresses of the Memory Region. Saving the original addresses of the
mempool for comparison doesn't create any overlap.
Fixes: b0b093845793 ("net/mlx5: use buffer address for LKEY search")
Fixes: 3a6f2eb8c5c5 ("net/mlx5: fix Memory Region registration")
Cc: stable@dpdk.org
Reported-by: Xueming Li <xuemingl@mellanox.com>
Signed-off-by: Xueming Li <xuemingl@mellanox.com>
Signed-off-by: Yongseok Koh <yskoh@mellanox.com>
Acked-by: Nelio Laranjeiro <nelio.laranjeiro@6wind.com>
Replace strncmp with strncasecmp in i40e_update_customized_ptype
function for compatibility.
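The difference between the two calls can be shown with a small
standalone sketch. name_has_prefix() and the strings used with it are
hypothetical and are not taken from the i40e code.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>
#include <strings.h> /* strncasecmp() (POSIX) */

/* True if name starts with prefix, ignoring ASCII case. */
static bool
name_has_prefix(const char *name, const char *prefix)
{
	assert(name != NULL && prefix != NULL);
	return strncasecmp(name, prefix, strlen(prefix)) == 0;
}
```

A case-sensitive strncmp() would reject a name written as "ipv4frag"
when matching against "IPV4", while the case-insensitive comparison
accepts both spellings.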
Signed-off-by: Beilei Xing <beilei.xing@intel.com>
Acked-by: Qi Zhang <qi.z.zhang@intel.com>
The PPP profile contains new metadata, IPV4FRAG and IPV6FRAG.
This patch improves the ptype parser to support them.
Signed-off-by: Beilei Xing <beilei.xing@intel.com>
Acked-by: Qi Zhang <qi.z.zhang@intel.com>
The SW ptype mapping table fails to be updated when loading a
PPP profile, even though the profile itself is loaded
successfully. As a result, the SW ptype cannot be parsed when
receiving packets. This patch fixes the issue.
Fixes: 11556c915a08 ("net/i40e: improve packet type parser")
Cc: stable@dpdk.org
Signed-off-by: Beilei Xing <beilei.xing@intel.com>
Acked-by: Qi Zhang <qi.z.zhang@intel.com>
Using DPDK in Hyper-V VM systems requires the vdev_netvsc driver to pair
the NetVSC netdev device with the PCI device that has the same MAC
address, through the fail-safe PMD.
Add vdev_netvsc custom scan in vdev bus to allow automatic probing in
Hyper-V VM systems unless it was already specified by command line.
Add "ignore" parameter to disable this auto-detection.
Signed-off-by: Matan Azrad <matan@mellanox.com>
This parameter allows specifying any non-NetVSC interface or routed
NetVSC interfaces to use with tap sub-devices for development purposes.
Signed-off-by: Adrien Mazarguil <adrien.mazarguil@6wind.com>
Signed-off-by: Matan Azrad <matan@mellanox.com>
NetVSC netdevices which are already routed should not be probed because
they are used for management purposes by Hyper-V.
Prevent probing of routed NetVSC devices.
Signed-off-by: Raslan Darawsheh <rasland@mellanox.com>
Signed-off-by: Matan Azrad <matan@mellanox.com>
As described in more detail in the attached documentation (see patch
contents), this virtual device driver manages NetVSC interfaces in virtual
machines hosted by Hyper-V/Azure platforms.
This driver does not manage traffic nor Ethernet devices directly; it acts
as a thin configuration layer that automatically instantiates and controls
fail-safe PMD instances combining tap and PCI sub-devices, so that each
NetVSC interface is exposed as a single consolidated port to DPDK
applications.
PCI sub-devices being hot-pluggable (e.g. during VM migration),
applications automatically benefit from increased throughput when present
and automatic fallback on NetVSC otherwise without interruption thanks to
fail-safe's hot-plug handling.
Once initialized, the sole job of the vdev_netvsc driver is to regularly
scan for PCI devices to associate with NetVSC interfaces and feed their
addresses to corresponding fail-safe instances.
Signed-off-by: Adrien Mazarguil <adrien.mazarguil@6wind.com>
Signed-off-by: Matan Azrad <matan@mellanox.com>
This patch lays the groundwork for this driver (draft documentation,
copyright notices, code base skeleton and build system hooks). While it can
be successfully compiled and invoked, it's an empty shell at this stage.
Signed-off-by: Adrien Mazarguil <adrien.mazarguil@6wind.com>
Signed-off-by: Matan Azrad <matan@mellanox.com>
Previous fail-safe code didn't support capturing already probed
sub-devices and failed when it tried to probe them.
Skip fail-safe sub-device probing when the sub-device was already probed.
Signed-off-by: Matan Azrad <matan@mellanox.com>
Acked-by: Gaetan Rivet <gaetan.rivet@6wind.com>
rte_free() is not supposed to work with pointers returned by calloc().
Fixes: a0194d828100 ("net/failsafe: add flexible device definition")
Cc: stable@dpdk.org
Signed-off-by: Adrien Mazarguil <adrien.mazarguil@6wind.com>
Acked-by: Gaetan Rivet <gaetan.rivet@6wind.com>
The type_data argument to ef10_rx_qcreate is only used
in builds with EFSYS_OPT_RX_PACKED_STREAM. Mark it as
an unused argument to avoid warnings in builds without
packed stream support.
Fixes: b749646dade4 ("net/sfc/base: add function to create packed stream RxQ")
Signed-off-by: Andy Moreton <amoreton@solarflare.com>
Signed-off-by: Andrew Rybchenko <arybchenko@solarflare.com>
Several calls to rte_zmalloc() do not check the return value for a null
pointer, and the memory is not freed before returning. Fix it by adding
null pointer checks and rte_free() calls.
Fixes: 37f9b54bd3cf ("net/dpaa: support Tx and Rx queue setup")
Fixes: 62f53995caaf ("net/dpaa: add frame count based tail drop with CGR")
Cc: stable@dpdk.org
Signed-off-by: Yong Wang <wang.yong19@zte.com.cn>
Reviewed-by: Shreyansh Jain <shreyansh.jain@nxp.com>