rte_ethdev_core.h created. Internal data structures are moved here.
These structures are mostly intended to be used by drivers, but they
need to be in the public header file because of the inline functions
in the ethdev.h header, and those inline functions are preferred to
kept because of the performance concerns.
The accessibility of the data structures are not changed, only logically
grouped to show that they are not intended to be used by applications.
Signed-off-by: Ferruh Yigit <ferruh.yigit@intel.com>
Acked-by: Thomas Monjalon <thomas@monjalon.net>
Create a rte_ethdev_driver.h file and move PMD specific APIs here.
Drivers updated to include this new header file.
There is no update in header content and since ethdev.h included by
ethdev_driver.h, nothing changed from driver point of view, only
logically grouping of APIs. From applications point of view they can't
access to driver specific APIs anymore and they shouldn't.
More PMD specific data structures still remain in ethdev.h because of
inline functions in header use them. Those will be handled separately.
Signed-off-by: Ferruh Yigit <ferruh.yigit@intel.com>
Acked-by: Shreyansh Jain <shreyansh.jain@nxp.com>
Acked-by: Andrew Rybchenko <arybchenko@solarflare.com>
Acked-by: Thomas Monjalon <thomas@monjalon.net>
There is time between the physical removal of the device until
sub-device PMDs get a RMV interrupt. At this time DPDK PMDs and
applications still don't know about the removal and may call sub-device
control operation which should return an error.
In previous code this error is reported to the application contrary to
fail-safe principle that the app should not be aware of device removal.
Add an removal check in each relevant control command error flow and
prevent an error report to application when the sub-device is removed.
Signed-off-by: Matan Azrad <matan@mellanox.com>
Acked-by: Gaetan Rivet <gaetan.rivet@6wind.com>
rte_eth_dev_is_removed API was added to detect a device removal
synchronously.
When a device removal occurs during flow command execution, many
different errors can be reported to the user.
Adjust all flow APIs error reports to return -EIO in case of device
removal using rte_eth_dev_is_removed API.
Signed-off-by: Matan Azrad <matan@mellanox.com>
Acked-by: Thomas Monjalon <thomas@monjalon.net>
rte_eth_dev_is_removed API was added to detect a device removal
synchronously.
When a device removal occurs during control command execution, many
different errors can be reported to the user.
Adjust all ethdev APIs error reports to return -EIO in case of device
removal using rte_eth_dev_is_removed API.
Signed-off-by: Matan Azrad <matan@mellanox.com>
Acked-by: Thomas Monjalon <thomas@monjalon.net>
There is time between the physical removal of the device until PMDs get
a RMV interrupt. At this time DPDK PMDs and applications still don't
know about the removal.
Current removal detection is achieved only by registration to device RMV
event and the notification comes asynchronously. So, there is no option
to detect a device removal synchronously.
Applications and other DPDK entities may want to check a device removal
synchronously and to take an immediate decision accordingly.
Add new dev op called is_removed to allow DPDK entities to check an
Ethernet device removal status immediately.
Signed-off-by: Matan Azrad <matan@mellanox.com>
Acked-by: Thomas Monjalon <thomas@monjalon.net>
TAP PMD is required to support RSS queue mapping based on rte_flow API. An
example usage for this requirement is failsafe transparent switching from a
PCI device to TAP device while keep redirecting packets to the same RSS
queues on both devices.
TAP RSS implementation is based on eBPF programs sent to Linux kernel
through BPF system calls and using netlink messages to reference the
programs as part of traffic control commands.
TC uses eBPF programs as classifiers and actions.
eBPF classification: packets marked with an RSS queue will be directed
to this queue using TC with "skbedit" action.
BPF classifiers are downloaded to the kernel once on TAP creation for
each TAP Rx queue.
eBPF action: calculate the Toeplitz RSS hash based on L3 addresses and
L4 ports. Mark the packet with the RSS queue according the resulting
RSS hash, then reclassify the packet.
BPF actions are downloaded to the kernel for each new RSS rule.
TAP eBPF requires Linux version 4.9 configured with BPF. TAP PMD will
successfully compile on systems with old or non-BPF configured kernels but
RSS rules creation on TAP devices will not be successful
Signed-off-by: Ophir Munk <ophirmu@mellanox.com>
Acked-by: Pascal Mazon <pascal.mazon@6wind.com>
This commit include BPF API to be used by TAP.
tap_flow_bpf_cls_q() - download to kernel BPF program that classifies
packets to their matching queues
tap_flow_bpf_calc_l3_l4_hash() - download to kernel BPF program that
calculates per packet layer 3 and layer 4 RSS hash
tap_flow_bpf_rss_map_create() - create BPF RSS map for storing RSS
parameters per RSS rule
tap_flow_bpf_update_rss_elem() - update BPF map entry with RSS rule
parameters
Signed-off-by: Ophir Munk <ophirmu@mellanox.com>
Acked-by: Pascal Mazon <pascal.mazon@6wind.com>
File tap_bpf_insns.h was added. It includes eBPF bytes code
which corresponds to source file tap_bpf_program.c
(see "net/tap: add eBPF program file").
The bytes code is in the format of C arrays of struct bpf_insn and
was generated from the C file tap_bpf_program.c
1. The C file was compiled via LLVM into an object file in ELF
format as:
clang -O2 -emit-llvm -c tap_bpf_program.c -o - | llc -march=bpf \
-filetype=obj -o <tap_bpf_program.o>
clang version must be 3.7 and above
The C functions are under different ELF sections and are considered
different BPF programs to be downloaded to the kernel
2. Using an external tool the ELF sections are parsed and the C arrays
of struct bpf_insn are generated. Each C array (corresponding to a
different function under an ELF section) is downloaded to the kernel
using an BPF systm call. The external tool that generates the C arrays
will be added in separate commits.
Signed-off-by: Ophir Munk <ophirmu@mellanox.com>
Acked-by: Pascal Mazon <pascal.mazon@6wind.com>
File tap_bpf_program.c was added with two ELF sections
corresponding to two BPF programs and one BPF map.
Section cls_q - BPF classifier to classify packets to their
corresponding queue after an RSS hash was calculated on the packet
and saved in skb->cb[1]
Section l3_l4 - BPF action to calculate RSS hash on packet
layers 3 and 4
This file is not part of DPDK tree compilation.
Signed-off-by: Ophir Munk <ophirmu@mellanox.com>
Acked-by: Pascal Mazon <pascal.mazon@6wind.com>
Add a generic TC actions handling for TC actions: "mirred",
"gact", "skbedit". This will be useful when introducing
BPF actions, as it uses TCA_BPF_ACT instead of TCA_FLOWER_ACT
Signed-off-by: Ophir Munk <ophirmu@mellanox.com>
Acked-by: Pascal Mazon <pascal.mazon@6wind.com>
This patch reverts:
commit 3a6f2eb8c5 ("net/mlx5: fix Memory Region registration")
Although granularity of chunks in a mempool is a cacheline, addresses are
extended to align to page boundary for performance reason in device when
registering a MR (Memory Region). This could make some regions overlap,
then can cause Tx completion error due to incorrect LKEY search. If the
error occurs, the Tx queue will get stuck. It is because buffer address is
compared against aligned addresses for Memory Region. Saving original
addresses of mempool for comparison doesn't create any overlap.
Fixes: b0b0938457 ("net/mlx5: use buffer address for LKEY search")
Fixes: 3a6f2eb8c5 ("net/mlx5: fix Memory Region registration")
Cc: stable@dpdk.org
Reported-by: Xueming Li <xuemingl@mellanox.com>
Signed-off-by: Xueming Li <xuemingl@mellanox.com>
Signed-off-by: Yongseok Koh <yskoh@mellanox.com>
Acked-by: Nelio Laranjeiro <nelio.laranjeiro@6wind.com>
Replace strncmp with strncasecmp in i40e_update_customized_ptype
function for compatibility.
Signed-off-by: Beilei Xing <beilei.xing@intel.com>
Acked-by: Qi Zhang <qi.z.zhang@intel.com>
There're new metadata IPV4FRAG and IPV6FRAG in PPP
profile, this patch improves ptype parser to support
IPV4FRAG and IPV6FRAG.
Signed-off-by: Beilei Xing <beilei.xing@intel.com>
Acked-by: Qi Zhang <qi.z.zhang@intel.com>
Fail to update SW ptype mapping table when loading
PPP profile, though profile can be loaded successfully.
It will cause fail to parse SW ptype during receiving
packets. This patch fixes this issue.
Fixes: 11556c915a ("net/i40e: improve packet type parser")
Cc: stable@dpdk.org
Signed-off-by: Beilei Xing <beilei.xing@intel.com>
Acked-by: Qi Zhang <qi.z.zhang@intel.com>
Using DPDK in Hyper-V VM systems requires vdev_netvsc driver to pair
the NetVSC netdev device with the same MAC address PCI device by
fail-safe PMD.
Add vdev_netvsc custom scan in vdev bus to allow automatic probing in
Hyper-V VM systems unless it was already specified by command line.
Add "ignore" parameter to disable this auto-detection.
Signed-off-by: Matan Azrad <matan@mellanox.com>
This parameter allows specifying any non-NetVSC interface or routed
NetVSC interfaces to use with tap sub-devices for development purposes.
Signed-off-by: Adrien Mazarguil <adrien.mazarguil@6wind.com>
Signed-off-by: Matan Azrad <matan@mellanox.com>
NetVSC netdevices which are already routed should not be probed because
they are used for management purposes by the HyperV.
prevent routed netvsc devices probing.
Signed-off-by: Raslan Darawsheh <rasland@mellanox.com>
Signed-off-by: Matan Azrad <matan@mellanox.com>
As described in more details in the attached documentation (see patch
contents), this virtual device driver manages NetVSC interfaces in virtual
machines hosted by Hyper-V/Azure platforms.
This driver does not manage traffic nor Ethernet devices directly; it acts
as a thin configuration layer that automatically instantiates and controls
fail-safe PMD instances combining tap and PCI sub-devices, so that each
NetVSC interface is exposed as a single consolidated port to DPDK
applications.
PCI sub-devices being hot-pluggable (e.g. during VM migration),
applications automatically benefit from increased throughput when present
and automatic fallback on NetVSC otherwise without interruption thanks to
fail-safe's hot-plug handling.
Once initialized, the sole job of the vdev_netvsc driver is to regularly
scan for PCI devices to associate with NetVSC interfaces and feed their
addresses to corresponding fail-safe instances.
Signed-off-by: Adrien Mazarguil <adrien.mazarguil@6wind.com>
Signed-off-by: Matan Azrad <matan@mellanox.com>
This patch lays the groundwork for this driver (draft documentation,
copyright notices, code base skeleton and build system hooks). While it can
be successfully compiled and invoked, it's an empty shell at this stage.
Signed-off-by: Adrien Mazarguil <adrien.mazarguil@6wind.com>
Signed-off-by: Matan Azrad <matan@mellanox.com>
Previous fail-safe code didn't support probed sub-devices capture and
failed when it tried to probe them.
Skip fail-safe sub-device probing when it already was probed.
Signed-off-by: Matan Azrad <matan@mellanox.com>
Acked-by: Gaetan Rivet <gaetan.rivet@6wind.com>
rte_free() is not supposed to work with pointers returned by calloc().
Fixes: a0194d8281 ("net/failsafe: add flexible device definition")
Cc: stable@dpdk.org
Signed-off-by: Adrien Mazarguil <adrien.mazarguil@6wind.com>
Acked-by: Gaetan Rivet <gaetan.rivet@6wind.com>
Added missing doxygen for rte_eth_dev_get_sec_ctx
and moved the declaration to the proper place.
Fixes: 4c270218aa ("ethdev: support security APIs")
Signed-off-by: Radu Nicolau <radu.nicolau@intel.com>
Reviewed-by: Ferruh Yigit <ferruh.yigit@intel.com>
The type_data argument to ef10_rx_qcreate is only used
in builds with EFSYS_OPT_RX_PACKED_STREAM. note this as
an unused argument to avoid warnings in builds without
packed stream support.
Fixes: b749646dad ("net/sfc/base: add function to create packed stream RxQ")
Signed-off-by: Andy Moreton <amoreton@solarflare.com>
Signed-off-by: Andrew Rybchenko <arybchenko@solarflare.com>
fix one typo and a grammatical mistake.
Fixes: b0152b1b40 ("doc: update bonding")
Cc: stable@dpdk.org
Signed-off-by: Zhiyong Yang <zhiyong.yang@intel.com>
Acked-by: Marko Kovacevic <marko.kovacevic@intel.com>
There are several func calls to rte_zmalloc() which don't do null
pointer check on the return value. And before return, the memory is not
freed. Fix it by adding null pointer check and rte_free().
Fixes: 37f9b54bd3 ("net/dpaa: support Tx and Rx queue setup")
Fixes: 62f53995ca ("net/dpaa: add frame count based tail drop with CGR")
Cc: stable@dpdk.org
Signed-off-by: Yong Wang <wang.yong19@zte.com.cn>
Reviewed-by: Shreyansh Jain <shreyansh.jain@nxp.com>
A typo in makefile that makes the RX/TX vector code
not to be compiled.
Fixes: 319c421f38 ("net/avf: enable SSE Rx Tx")
Signed-off-by: Wenzhuo Lu <wenzhuo.lu@intel.com>
Even though link of a port gets down, device still can receive traffic.
That is the reason why mlx5_set_link_up/down() switches rx/tx_pkt_burst().
However, if link gets down by an external command (e.g. ifconfig), it isn't
effective. It is better to change burst functions when link status change
is detected.
Fixes: 62072098b5 ("mlx5: support setting link up or down")
Cc: stable@dpdk.org
Signed-off-by: Yongseok Koh <yskoh@mellanox.com>
Acked-by: Nelio Laranjeiro <nelio.laranjeiro@6wind.com>
When performing live migration or memory hot-plugging,
the changes to the device and vrings made by message handler
done independently from vring usage by PMD threads.
This causes for example segfaults during live-migration
with MQ enable, but in general virtually any request
sent by qemu changing the state of device can cause
problems.
These patches fixes all above issues by adding a spinlock
to every vring and requiring message handler to start operation
only after ensuring that all PMD threads related to the device
are out of critical section accessing the vring data.
Each vring has its own lock in order to not create contention
between PMD threads of different vrings and to prevent
performance degradation by scaling queue pair number.
See https://bugzilla.redhat.com/show_bug.cgi?id=1450680
Cc: stable@dpdk.org
Signed-off-by: Victor Kaplansky <victork@redhat.com>
Reviewed-by: Maxime Coquelin <maxime.coquelin@redhat.com>
Acked-by: Yuanhan Liu <yliu@fridaylinux.org>
dequeue zero copy change buf_addr and buf_iova of mbuf, and return
to mbuf pool without restore them, it breaks vm memory if others allocate
mbuf from same pool since mbuf reset doesn't reset buf_addr and buf_iova.
Fixes: b0a985d1f3 ("vhost: add dequeue zero copy")
Cc: stable@dpdk.org
Signed-off-by: Junjie Chen <junjie.j.chen@intel.com>
Reviewed-by: Maxime Coquelin <maxime.coquelin@redhat.com>
Acked-by: Yuanhan Liu <yliu@fridaylinux.org>
When live migration is done, for the backup VM, either the virtio
frontend or the vhost backend needs to send out gratuitous RARP packet
to announce its new network location.
This patch enables VIRTIO_NET_F_GUEST_ANNOUNCE feature to support live
migration scenario where the vhost backend doesn't have the ability to
generate RARP packet.
Brief introduction of the work flow:
1. QEMU finishes live migration, pokes the backup VM with an interrupt.
2. Virtio interrupt handler reads out the interrupt status value, and
realizes it needs to send out RARP packet to announce its location.
3. Pause device to stop worker thread touching the queues.
4. Inject a RARP packet into a Tx Queue.
5. Ack the interrupt via control queue.
6. Resume device to continue packet processing.
Signed-off-by: Xiao Wang <xiao.w.wang@intel.com>
Reviewed-by: Maxime Coquelin <maxime.coquelin@redhat.com>
Acked-by: Yuanhan Liu <yliu@fridaylinux.org>
Due to a mistake operation from me, older version (v10) was merged to
master branch. It's the v11 should be applied. However, the master branch
is not rebase-able. Thus, this patch is made, from the diff between v10
and v11.
The diffs are:
- Add check for parameter and tailroom in rte_net_make_rarp_packet
- Allocate mbuf in rte_net_make_rarp_packet
Besides that, a link error is fixed when shared lib is enabled.
Fixes: 45ae05df82 ("net: add a helper for making RARP packet")
Fixes: c3ffdba0e8 ("vhost: use API to make RARP packet")
Signed-off-by: Xiao Wang <xiao.w.wang@intel.com>
Signed-off-by: Yuanhan Liu <yliu@fridaylinux.org>
When vhost reallocate dev and vq for NUMA enabled case, it doesn't perform
deep copy, which lead to 1) zmbuf list not valid 2) remote memory access.
This patch is to re-initlize the zmbuf list and also do the deep copy.
Signed-off-by: Junjie Chen <junjie.j.chen@intel.com>
Reviewed-by: Zhiyong Yang <zhiyong.yang@intel.com>
Reviewed-by: Maxime Coquelin <maxime.coquelin@redhat.com>
Acked-by: Yuanhan Liu <yliu@fridaylinux.org>
This patch removes all references to old-style offload API
replacing them with new offload flags.
Signed-off-by: Maciej Czekaj <maciej.czekaj@caviumnetworks.com>
Since DPDK restores ether address configuration after device
is started it is safe to add ether address to uninitialized port (ppio).
Fixes: c0511a8f74 ("net/mrvl: check if ppio is initialized")
Cc: stable@dpdk.org
Signed-off-by: Tomasz Duszynski <tdu@semihalf.com>
DPDK updates MTU once mtu_set() callback returns success.
Since PMD changes port's MTU to dev->mtu every time device is
started it is safe to call mtu_set() before MUSDK ppio was initialized.
Fixes: c0511a8f74 ("net/mrvl: check if ppio is initialized")
Cc: stable@dpdk.org
Signed-off-by: Tomasz Duszynski <tdu@semihalf.com>
Ethdev Tx offloads API has changed since:
commit cba7f53b71 ("ethdev: introduce Tx queue offloads API")
This commit support the new Tx offloads API.
The code which fills in txq_flags in default_txconf is preserved
because rte_eth_dev_info_get() lacks conversion between offloads
and txq_flags fields which means that a legacy application which
relies on default_txconf will fail to configure Tx queues in the
case when some bits in txq_flags are mandatory.
Signed-off-by: Ivan Malov <ivan.malov@oktetlabs.ru>
Signed-off-by: Andrew Rybchenko <arybchenko@solarflare.com>
Reviewed-by: Ferruh Yigit <ferruh.yigit@intel.com>
The patch adds a separate function to report supported
Tx capabilities because this function will be required
in more places across the code in the upcoming patches.
Signed-off-by: Ivan Malov <ivan.malov@oktetlabs.ru>
Signed-off-by: Andrew Rybchenko <arybchenko@solarflare.com>
Reviewed-by: Ferruh Yigit <ferruh.yigit@intel.com>
Ethdev Rx offloads API has changed since:
commit ce17eddefc ("ethdev: introduce Rx queue offloads API")
This commit support the new Rx offloads API.
Signed-off-by: Ivan Malov <ivan.malov@oktetlabs.ru>
Signed-off-by: Andrew Rybchenko <arybchenko@solarflare.com>
Reviewed-by: Ferruh Yigit <ferruh.yigit@intel.com>
The patch adds a separate function to report supported
Rx capabilities because this function will be required
in more places across the code in the upcoming patches.
Signed-off-by: Ivan Malov <ivan.malov@oktetlabs.ru>
Signed-off-by: Andrew Rybchenko <arybchenko@solarflare.com>
Reviewed-by: Ferruh Yigit <ferruh.yigit@intel.com>
Commonly, drivers converted to the new offload API
may need to log unsupported offloads as a response
to wrong settings. From this perspective, it would
be convenient to have generic functions to look up
offload names. The patch adds such a helper for Tx.
Signed-off-by: Ivan Malov <ivan.malov@oktetlabs.ru>
Signed-off-by: Andrew Rybchenko <arybchenko@solarflare.com>
Reviewed-by: Ferruh Yigit <ferruh.yigit@intel.com>
Acked-by: Thomas Monjalon <thomas@monjalon.net>
Commonly, drivers converted to the new offload API
may need to log unsupported offloads as a response
to wrong settings. From this perspective, it would
be convenient to have generic functions to look up
offload names. The patch adds such a helper for Rx.
Signed-off-by: Ivan Malov <ivan.malov@oktetlabs.ru>
Signed-off-by: Andrew Rybchenko <arybchenko@solarflare.com>
Reviewed-by: Ferruh Yigit <ferruh.yigit@intel.com>
Acked-by: Thomas Monjalon <thomas@monjalon.net>
RSS is a local variable with address which is never NULL.
Fixes: d77d07391d ("net/sfc: support flow API RSS action")
Cc: stable@dpdk.org
Signed-off-by: Roman Zhukov <roman.zhukov@oktetlabs.ru>
Signed-off-by: Andrew Rybchenko <arybchenko@solarflare.com>
The rte_flow is already filled in with zeros in the
case of create. So memset() with zeros is needed only
in validation.
Fixes: a9825ccf5b ("net/sfc: support flow API filters")
Cc: stable@dpdk.org
Signed-off-by: Roman Zhukov <roman.zhukov@oktetlabs.ru>
Signed-off-by: Andrew Rybchenko <arybchenko@solarflare.com>
Ethdev Rx offloads API has changed since:
commit ce17eddefc ("ethdev: introduce Rx queue offloads API")
This commit adds support for the new Rx offloads API.
Signed-off-by: Moti Haimovsky <motih@mellanox.com>
Acked-by: Pascal Mazon <pascal.mazon@6wind.com>