In vhost datapath, descriptor's length are mostly used in two coherent
operations. First step is used for address translation, second step is
used for memory transaction from guest to host. But the interval between
two steps will give a window for malicious guest, in which can change
descriptor length after vhost calculated buffer size. Thus may lead to
buffer overflow in vhost side. This potential risk can be eliminated by
accessing the descriptor length once.
Fixes: 1be4ebb1c464 ("vhost: support indirect descriptor in mergeable Rx")
Cc: stable@dpdk.org
Signed-off-by: Marvin Liu <yong.liu@intel.com>
Reviewed-by: Maxime Coquelin <maxime.coquelin@redhat.com>
This patch fixes unchecked return value for rte_vhost_get_mem_table(),
which is reported by coverity.
Coverity issue: 364233
Fixes: ca059fa5e290 ("examples/vhost: demonstrate the new generic APIs")
Cc: stable@dpdk.org
Signed-off-by: Chenbo Xia <chenbo.xia@intel.com>
Reviewed-by: Maxime Coquelin <maxime.coquelin@redhat.com>
The return value of rte_pci_read_config should be checked.
Coverity issue: 302860
Fixes: a3f8150eac6d ("net/ifcvf: add ifcvf vDPA driver")
Cc: stable@dpdk.org
Signed-off-by: Xiao Wang <xiao.w.wang@intel.com>
Reviewed-by: Chenbo Xia <chenbo.xia@intel.com>
Avoid calling rte_vhost_get_vhost_ring_inflight() and
rte_vhost_get_vring_base_from_inflight() when
VHOST_USER_PROTOCOL_F_INFLIGHT_SHMFD is not set.
Signed-off-by: Keiichi Watanabe <keiichiw@chromium.org>
Reviewed-by: Maxime Coquelin <maxime.coquelin@redhat.com>
Add rte_vhost_get_negotiated_protocol_features, which returns a set of
enabled protocol features.
Signed-off-by: Keiichi Watanabe <keiichiw@chromium.org>
Reviewed-by: Maxime Coquelin <maxime.coquelin@redhat.com>
This patch moves vhost_virtqueue struct fields in order
to both optimize packing and move hot fields on the first
cachelines.
Signed-off-by: Maxime Coquelin <maxime.coquelin@redhat.com>
Reviewed-by: Chenbo Xia <chenbo.xia@intel.com>
Reviewed-by: David Marchand <david.marchand@redhat.com>
Tested-by: Balazs Nemeth <bnemeth@redhat.com>
This patch moves the per-virtqueue's dirty logging cache
out of the virtqueue struct, by allocating it dynamically
only when live-migration is enabled.
It saves 8 cachelines in vhost_virtqueue struct.
Signed-off-by: Maxime Coquelin <maxime.coquelin@redhat.com>
Reviewed-by: Chenbo Xia <chenbo.xia@intel.com>
Reviewed-by: David Marchand <david.marchand@redhat.com>
Tested-by: Balazs Nemeth <bnemeth@redhat.com>
This patch removes the "backend" field of the
vhost_virtqueue struct, which is not used by the
library.
Signed-off-by: Maxime Coquelin <maxime.coquelin@redhat.com>
Reviewed-by: Chenbo Xia <chenbo.xia@intel.com>
Reviewed-by: David Marchand <david.marchand@redhat.com>
Tested-by: Balazs Nemeth <bnemeth@redhat.com>
This patch optimizes packing of the virtqueue
struct by moving fields around to fill holes.
Offset field is not used and so can be removed.
Signed-off-by: Maxime Coquelin <maxime.coquelin@redhat.com>
Reviewed-by: Chenbo Xia <chenbo.xia@intel.com>
Reviewed-by: David Marchand <david.marchand@redhat.com>
Tested-by: Balazs Nemeth <bnemeth@redhat.com>
While it is worth clarifying whether the fake mbuf
in virtnet_rx struct is really necessary, it is sure
that it heavily impacts cache usage by being part of
the struct. Indeed, it uses two cachelines, and
requires alignment on a cacheline.
Before this series, it means it took 120 bytes in
virtnet_rx struct:
struct virtnet_rx {
struct virtqueue *vq; /*0 8*/
/* XXX 56 bytes hole, try to pack */
/* --- cacheline 1 boundary (64 bytes) --- */
struct rte_mbuf fake_mbuf __attribute__((__aligned__(64))); /*64 128*/
/* --- cacheline 3 boundary (192 bytes) --- */
This patch allocates it using malloc in order to optimize
virtnet_rx cache usage and so virtqueue cache usage.
Signed-off-by: Maxime Coquelin <maxime.coquelin@redhat.com>
Reviewed-by: David Marchand <david.marchand@redhat.com>
Reviewed-by: Chenbo Xia <chenbo.xia@intel.com>
Tested-by: Balazs Nemeth <bnemeth@redhat.com>
This patch improves the error path of virtio_init_queue(),
by cleaning in reversing order all resources that have
been allocated.
Suggested-by: Chenbo Xia <chenbo.xia@intel.com>
Signed-off-by: Maxime Coquelin <maxime.coquelin@redhat.com>
Reviewed-by: David Marchand <david.marchand@redhat.com>
Reviewed-by: Chenbo Xia <chenbo.xia@intel.com>
Tested-by: Balazs Nemeth <bnemeth@redhat.com>
Vrings are part of the virtqueues, so we don't need
to have a pointer to it in Vrings descriptions.
Instead, let's just subtract from its offset to
calculate virtqueue address.
Signed-off-by: Maxime Coquelin <maxime.coquelin@redhat.com>
Reviewed-by: Chenbo Xia <chenbo.xia@intel.com>
Reviewed-by: David Marchand <david.marchand@redhat.com>
Tested-by: Balazs Nemeth <bnemeth@redhat.com>
The member page_offset is always zero. Having this in the qede_rx_entry
makes it larger than it needs to be and this has cache performance
implications so remove that field. In addition, since qede_rx_entry only
has an rte_mbuf*, remove the definition of qede_rx_entry.
Signed-off-by: Balazs Nemeth <bnemeth@redhat.com>
Reviewed-by: Igor Russkikh <irusskikh@marvell.com>
While handling the current mbuf, pull the next mbuf into the cache. Note
that the last four mbufs pulled into the cache are not handled, but that
doesn't matter.
Signed-off-by: Balazs Nemeth <bnemeth@redhat.com>
Reviewed-by: Igor Russkikh <irusskikh@marvell.com>
Ensure that, while ecore_chain_get_cons_idx is running, txq->hw_cons_ptr
is prefetched. This shows a slight performance improvement.
Signed-off-by: Balazs Nemeth <bnemeth@redhat.com>
Reviewed-by: Igor Russkikh <irusskikh@marvell.com>
rte_pktmbuf_free_bulk calls rte_mempool_put_bulk with the number of
pending packets to return to the mempool. In contrast, rte_pktmbuf_free
calls rte_mempool_put that calls rte_mempool_put_bulk with one object.
An important performance related downside of adding one packet at a time
to the mempool is that on each call, the per-core cache pointer needs to
be read from tls while a single rte_mempool_put_bulk only reads from the
tls once.
Signed-off-by: Balazs Nemeth <bnemeth@redhat.com>
Reviewed-by: Igor Russkikh <irusskikh@marvell.com>
The ring txq->sw_tx_ring is managed with txq->sw_tx_cons. As long as
txq->sw_tx_cons is correct, there is no need to check if
txq->sw_tx_ring[idx] is null explicitly.
Signed-off-by: Balazs Nemeth <bnemeth@redhat.com>
Reviewed-by: Igor Russkikh <irusskikh@marvell.com>
Calling ecore_chain_get_cons_idx repeatedly is slower than calling it
once and using the result for the remainder of qede_process_tx_compl.
Signed-off-by: Balazs Nemeth <bnemeth@redhat.com>
Reviewed-by: Igor Russkikh <irusskikh@marvell.com>
Each sw_tx_ring entry was of type struct qede_tx_entry:
struct qede_tx_entry {
struct rte_mbuf *mbuf;
uint8_t flags;
};
Leaving the unused flags member here has a few performance implications.
First, each qede_tx_entry takes up more memory which has caching
implications as less entries fit in a cache line while multiple entries
are frequently handled in batches. Second, an array of qede_tx_entry
entries is incompatible with existing APIs that expect an array of
rte_mbuf pointers. Consequently, an extra array need to be allocated
before calling such APIs and each entry needs to be copied over.
This patch omits the flags field and replaces the qede_tx_entry entry
by a simple rte_mbuf pointer.
Signed-off-by: Balazs Nemeth <bnemeth@redhat.com>
Reviewed-by: Igor Russkikh <irusskikh@marvell.com>
Modification of the 802.1Q Tag Identifier, VXLAN Network
Identifier or GENEVE Network Identifier is not supported.
Reject attempt to modify these fields via the MODIFY_FIELD
action and document this mlx5 driver limitation.
Fixes: 641dbe4fb053 ("net/mlx5: support modify field flow action")
Cc: stable@dpdk.org
Signed-off-by: Alexander Kozyrev <akozyrev@nvidia.com>
Acked-by: Viacheslav Ovsiienko <viacheslavo@nvidia.com>
There is a limitation about copying one header field to another for
the Flow group 0. Such copy action is not allowed there. But setting
a header field with an immediate value is perfectly fine.
Allow the MODIFY_FIELD action on group 0 in case the source field
is an immediate value or a pointer to it.
Signed-off-by: Alexander Kozyrev <akozyrev@nvidia.com>
Acked-by: Viacheslav Ovsiienko <viacheslavo@nvidia.com>
The MODIFY_FIELD action requires the extended metadata support
in order to manipulate on MARK register. Check if it is supported
and reject the MODIFY_FIELD action if it is not.
Fixes: 641dbe4fb053 ("net/mlx5: support modify field flow action")
Cc: stable@dpdk.org
Signed-off-by: Alexander Kozyrev <akozyrev@nvidia.com>
Acked-by: Viacheslav Ovsiienko <viacheslavo@nvidia.com>
Masks that used to modify a packet field must be in a big
endian format. Convert then to BE to ensure proper modification.
Fixes: 641dbe4fb053 ("net/mlx5: support modify field flow action")
Cc: stable@dpdk.org
Signed-off-by: Alexander Kozyrev <akozyrev@nvidia.com>
Acked-by: Viacheslav Ovsiienko <viacheslavo@nvidia.com>
Add a validation check to make sure that the specified width
for MODIFY_FIELD RTE action is not bigger than a field size.
Fixes: 641dbe4fb053 ("net/mlx5: support modify field flow action")
Cc: stable@dpdk.org
Signed-off-by: Alexander Kozyrev <akozyrev@nvidia.com>
Acked-by: Viacheslav Ovsiienko <viacheslavo@nvidia.com>
To simplify BlueField HPF representor(vf[-1]) probe, this patch allows
probe it with "sf" syntax: "sf[-1]".
Signed-off-by: Xueming Li <xuemingl@nvidia.com>
Acked-by: Viacheslav Ovsiienko <viacheslavo@nvidia.com>
In case of kernel bonding device, counter was read from first bonding PF
member.
This patch reads all member PFs and sums to get bond xstats.
Signed-off-by: Xueming Li <xuemingl@nvidia.com>
Acked-by: Viacheslav Ovsiienko <viacheslavo@nvidia.com>
With kernel bonding, there was an error when setting VF MAC address
through representor. The Netlink API requires ifindex of owner PF, not
bonding device ifindex.
Uses owner PF ifindex to modify VF default MAC in case of bonding
device.
Fixes: c21e5facf7d2 ("net/mlx5: use bond index for netdev operations")
Cc: stable@dpdk.org
Signed-off-by: Xueming Li <xuemingl@nvidia.com>
Acked-by: Viacheslav Ovsiienko <viacheslavo@nvidia.com>
Since kernel bonding netdev doesn't provide statistics counter that
reflects all member ports, PMD has to manually summarize counters from
each member ports.
As a preparation, this patch collects bonding member port information
and saves to shared context data.
Signed-off-by: Xueming Li <xuemingl@nvidia.com>
Acked-by: Viacheslav Ovsiienko <viacheslavo@nvidia.com>
To probe representors from different kernel bonding PFs, had to specify
2 separate devargs like this:
-a 03:00.0,representor=pf0vf[0-3] -a 03:00.0,representor=pf1vf[0-3]
This patch supports range or list of PF section in devargs, so the
alternative short devargs of above is:
-a 03:00.0,representor=pf[0-1]vf[0-3]
Signed-off-by: Xueming Li <xuemingl@nvidia.com>
Acked-by: Viacheslav Ovsiienko <viacheslavo@nvidia.com>
To probe representor on 2nd PF of kernel bonding device, had to specify
PF1 BDF in devarg:
<PF1_BDF>,representor=0
When closing bonding device, all representors had to be closed together
and this implies all representors have to use primary PF of bonding
device. So after probing representor port on 2nd PF, when locating new
probed device using device argument, the filter used 2nd PF as PCI
address and failed to locate new device.
Conflict happened by using current representor devargs:
- Use PCI BDF to specify representor owner PF
- Use PCI BDF to locate probed representor device.
- PMD uses primary PCI BDF as PCI device.
To resolve such conflicts, new representor syntax is introduced here:
<primary BDF>,representor=pfXvfY
All representors must use primary PF as owner PCI device, PMD internally
locate owner PCI address by checking representor "pfX" part. To EAL, all
representors are registered to primary PCI device, the 2nd PF is hidden
to EAL, thus all search should be consistent.
Same to VF representor, HPF (host PF on BlueField) uses same syntax to
probe, example: representor=pf1vf[0-3,-1]
This patch also adds pf index into kernel bonding representor port name:
<BDF>_<ib_name>_representor_pf<X>vf<Y>
Signed-off-by: Xueming Li <xuemingl@nvidia.com>
Acked-by: Viacheslav Ovsiienko <viacheslavo@nvidia.com>
With kernel bonding, representors on second PF are being probed by
devargs:
<primary_bdf>,representor=pf1vf<N>
No need to save primary PF port ID and lookup when probing sibling
ports, revert patch [1]
[1]:
commit e6818853c022 ("net/mlx5: set representor to first PF in bonding mode")
Signed-off-by: Xueming Li <xuemingl@nvidia.com>
Acked-by: Viacheslav Ovsiienko <viacheslavo@nvidia.com>
This patch adds support for SF representor. Similar to VF representor,
switch port name of SF representor in phys_port_name sysfs key is
"pf<x>sf<y>".
Device representor argument is "representors=sf[list]", list member
could be mix of instance and range. Example:
representors=sf[0,2,4,8-12,-1]
To probe VF representor and SF representor, need to separate into 2
devices:
-a <BDF>,representor=vf[list] -a <BDF>,representor=sf[list]
Signed-off-by: Xueming Li <xuemingl@nvidia.com>
Acked-by: Viacheslav Ovsiienko <viacheslavo@nvidia.com>
This patch supports representor name parsing for SF.
In sysfs, representor name stored under "phys_port_name" sysfs key,
similar to VF representor, switch port name of SF representor is
"pf<x>sf<y>".
For netlink message, net SF type is supported.
Examples:
pf0sf1
pf0sf[0-3]
Signed-off-by: Xueming Li <xuemingl@nvidia.com>
Acked-by: Viacheslav Ovsiienko <viacheslavo@nvidia.com>
Coverity flags that 'ctx->tunnel' variable is used before
it's checked for NULL. This patch fixes this issue.
Coverity issue: 366201
Fixes: 868d2e342cf3 ("net/mlx5: fix tunnel offload hub multi-thread protection")
Cc: stable@dpdk.org
Signed-off-by: Yunjian Wang <wangyunjian@huawei.com>
Acked-by: Viacheslav Ovsiienko <viacheslavo@nvidia.com>
The ICE_RSS_ANY_HEADERS will try to enable outer RSS for
non-tunnel case and inner RSS for tunnel case. This confuse
user.
As we already have ICE_RSS_INNER_HEADER for tunnel case,
So, replace ICE_RSS_ANY_HEADERS with ICE_RSS_OUTER_HEADERS
for all exist flow which only specified the outer pattern.
To enable inner RSS for any tunnel cases, a separated rule
should be enabled.
The patch also remove some unnecessary condition check for GTPU
in base code, as we already can support outer RSS for GTPU.
Signed-off-by: Qi Zhang <qi.z.zhang@intel.com>
Acked-by: Xuan Ding <xuan.ding@intel.com>
If one calls ‘rte_eth_dev_rss_reta_update’ with ixgbe before starting
the device (but after setting everything else), then RSS RETA
configuration will be zero after starting the device.
This patch gives a notification if the port not started.
Bugzilla ID: 664
Fixes: 249358424eab ("ixgbe: RSS RETA configuration")
Cc: stable@dpdk.org
Signed-off-by: Murphy Yang <murphyx.yang@intel.com>
Acked-by: Haiyue Wang <haiyue.wang@intel.com>
The function 'ice_is_profile_rule' is defined as 'ice_is_prof_rule' in
base code, which has the exactly same function body.
So remove the 'ice_is_profile_rule', use the 'ice_is_prof_rule' instead.
Signed-off-by: Haiyue Wang <haiyue.wang@intel.com>
Acked-by: Qi Zhang <qi.z.zhang@intel.com>
According to Intel® AVF spec
(https://www.intel.com/content/dam/
www/public/us/en/documents/product-specifications/
ethernet-adaptive-virtual-function-hardware-spec.pdf)
section 2.2.2.3:
The max segment size(MSS) of TSO should not be set lower than 88.
Fixes: a2b29a7733ef ("net/avf: enable basic Rx Tx")
Cc: stable@dpdk.org
Signed-off-by: Qi Zhang <qi.z.zhang@intel.com>
Acked-by: Beilei Xing <beilei.xing@intel.com>
Add DEV_RX_OFFLOAD_RSS_HASH flag to the PMD's Rx offload capabilities
for it supports RSS hash delivery.
Fixes: 4f09bc55ac3d ("net/igc: implement device base operations")
Cc: stable@dpdk.org
Signed-off-by: Alvin Zhang <alvinx.zhang@intel.com>
Acked-by: Haiyue Wang <haiyue.wang@intel.com>
A new feature requesting additional queues from PF is added in iavf;
before sending VIRTCHNL_OP_REQUEST_QUEUES op code, the offload
capability flag VIRTCHNL_VF_OFFLOAD_REQ_QUEUES will be checked.
And due to DPDK PF is still used by some cases, add this offload
capability flag in i40e PF.
Fixes: cbdbd360f77f ("net/i40e: support AVF basic interface")
Cc: stable@dpdk.org
Signed-off-by: Robin Zhang <robinx.zhang@intel.com>
Acked-by: Jeff Guo <jia.guo@intel.com>
Fix pkt_len parsing when DEV_RX_OFFLOAD_KEEP_CRC is set in AVX512 path.
Fixes: 31737f2b66fb ("net/iavf: enable AVX512 for legacy Rx")
Fixes: 6df587028e57 ("net/iavf: enable AVX512 for flexible Rx")
Cc: stable@dpdk.org
Signed-off-by: Leyi Rong <leyi.rong@intel.com>
Tested-by: David Coyle <david.coyle@intel.com>
Implement support for the power management API by implementing a
`get_monitor_addr` function that will return an address of an RX ring's
status bit.
This patch is basically a cut-and-paste of the changes already
committed in ixgbe, i40e and ice drivers in 21.02. This extends
the availability of the power-saving mechanism to the iavf driver,
which is needed for those use-cases using virtual functions.
Patchset where PMD Power Management added in 21.02:
http://patchwork.dpdk.org/project/dpdk/list/?series=14756
Signed-off-by: David Hunt <david.hunt@intel.com>
Acked-by: Anatoly Burakov <anatoly.burakov@intel.com>
In i40e NEON vector Rx path, the packet descs processing is incorrect.
This caused wrong packet type been filled in mbuf.
To fix this, when shifting the pktlen field to be 16-bit aligned, it
only needs to process the high 16bit of the packet descs instead of
the high 32bit.
Test Results:
Architecture: arm64
NIC: XL710
Driver: i40e
Package: Ether()/IP()/
Without this patch:
desc_to_ptype_v: ptype = 7 (error)
With this patch:
desc_to_ptype_v: ptype = 23 (correct)
Fixes: ae0eb310f253 ("net/i40e: implement vector PMD for ARM")
Cc: stable@dpdk.org
Signed-off-by: Feifei Wang <feifei.wang2@arm.com>
Reviewed-by: Ruifeng Wang <ruifeng.wang@arm.com>
Tested-by: Kathleen Capella <kathleen.capella@arm.com>
This patch adds more err info for Tx/Rx descriptor query command.
Fixes: fae9aa717d6c ("app/testpmd: support checking descriptor status")
Cc: stable@dpdk.org
Signed-off-by: Hongbo Zheng <zhenghongbo3@huawei.com>
Signed-off-by: Min Hu (Connor) <humin29@huawei.com>
Acked-by: Xiaoyun Li <xiaoyun.li@intel.com>
At the fail label, there's a statement to set general errno and
error message. However, before the label is reached, a custom
error message can be set by the code which parses actions.
This custom (action-specific) message, when present,
must not be replaced by the general one.
Fixes: 662286ae61d2 ("net/sfc: add actions parsing stub to MAE backend")
Cc: stable@dpdk.org
Signed-off-by: Ivan Malov <ivan.malov@oktetlabs.ru>
Signed-off-by: Andrew Rybchenko <andrew.rybchenko@oktetlabs.ru>
Reviewed-by: Andy Moreton <amoreton@xilinx.com>
Coverity complains that the return value of recvfrom() in the AF_XDP
datapath is not checked. We don't care about the return value because in
the case of an error we still return 0 from the receive function to
indicate no packets were received. So to make Coverity happy we cast the
return to 'void'.
Coverity issue: 369671
Fixes: 63e8989fe5a4 ("net/af_xdp: use recvfrom instead of poll syscall")
Signed-off-by: Ciara Loftus <ciara.loftus@intel.com>
Some apps, such as fstack, will use secondary process to access the
memory of eth_dev_ops, and they want to get the info of dev, but hinic
driver does not initialized it when in secondary process.
Fixes: 66f64dd6dc86 ("net/hinic: fix secondary process")
Cc: stable@dpdk.org
Signed-off-by: Guoyang Zhou <zhouguoyang@huawei.com>
Introduce a meson option 'enable_driver_sdk', when true installs internal
driver headers for ethdev. This allows drivers that do not depend on
stable api/abi to be built external to the dpdk source tree.
Signed-off-by: Tyler Retzlaff <roretzla@linux.microsoft.com>
Acked-by: Ferruh Yigit <ferruh.yigit@intel.com>