2123 Commits

Author SHA1 Message Date
Jiawen Wu
79f3128d4d net/ngbe: support scattered Rx
Add scattered Rx function to support receiving segmented mbufs.

Signed-off-by: Jiawen Wu <jiawenwu@trustnetic.com>
2021-10-30 00:53:19 +02:00
Jiawen Wu
f6aef1dacf net/ngbe: support packet type query
Add packet type macro definition and convert ptype to ptid.

Signed-off-by: Jiawen Wu <jiawenwu@trustnetic.com>
2021-10-30 00:53:19 +02:00
Miao Li
327fcd2d38 net/vhost: support power monitor
According to the current semantics of power monitor, this commit adds a
callback function to decide whether to abort the sleep by checking
the current value against the expected one, and vhost_get_monitor_addr
to provide the address to monitor. While no packets come in, the value
at the address does not change and the running core sleeps. Once
packets arrive, the value at the address changes and the running core
wakes up.

Signed-off-by: Miao Li <miao.li@intel.com>
Reviewed-by: Chenbo Xia <chenbo.xia@intel.com>
Acked-by: David Hunt <david.hunt@intel.com>
2021-10-29 12:32:29 +02:00
Miao Li
34fd4373ce vhost: add power monitor API
This commit defines rte_vhost_power_monitor_cond, which is used to
pass information to the vhost driver. The information includes the
address to monitor, the expected value, the mask used to extract the
value read from 'addr', the size of the monitored value, and a match
flag indicating whether to wake on a match or on a mismatch with the
expected value.

The vhost driver can use this information to fill rte_power_monitor_cond.
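
A sketch of the structure as described above (comments mirror the
description; the exact layout in the header may differ):

  struct rte_vhost_power_monitor_cond {
    volatile void *addr; /* address to monitor */
    uint64_t val;        /* expected value */
    uint64_t mask;       /* mask to extract the value read from addr */
    uint8_t size;        /* size of the monitored value */
    uint8_t match;       /* wake on match vs. mismatch */
  };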

Signed-off-by: Miao Li <miao.li@intel.com>
Reviewed-by: Chenbo Xia <chenbo.xia@intel.com>
Acked-by: David Hunt <david.hunt@intel.com>
2021-10-29 12:32:29 +02:00
Miao Li
64ac7e08f6 net/virtio: support power monitor
According to the current semantics of power monitor, this commit adds a
callback function to decide whether to abort the sleep by checking
the current value against the expected one, and virtio_get_monitor_addr
to provide the address to monitor. While no packets come in, the value
at the address does not change and the running core sleeps. Once
packets arrive, the value at the address changes and the running core
wakes up.

Signed-off-by: Miao Li <miao.li@intel.com>
Reviewed-by: Chenbo Xia <chenbo.xia@intel.com>
Acked-by: David Hunt <david.hunt@intel.com>
2021-10-29 12:32:29 +02:00
Maxime Coquelin
0c9d662070 net/virtio: support RSS
Provide the capability to update the hash key, hash types
and RETA table on the fly (without needing to stop/start
the device). However, the key length and the number of RETA
entries are fixed to 40B and 128 entries respectively. This
is done in order to simplify the design, but may be
revisited later as the Virtio spec provides this
flexibility.

Note that only VIRTIO_NET_F_RSS support is implemented;
VIRTIO_NET_F_HASH_REPORT, which would enable reporting the
packet RSS hash calculated by the device into mbuf->hash.rss,
is not yet supported.

Regarding the default RSS configuration, the default Intel ixgbe
key is used as the default key, and the default RETA is a simple
modulo between the hash and the number of Rx queues.
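
A minimal sketch of that default RETA (an illustration, not the
driver code):

  uint16_t reta[128];
  unsigned int i;

  for (i = 0; i < 128; i++)
    reta[i] = i % nb_rx_queues;
  /* lookup: queue = reta[rss_hash % 128], i.e. hash modulo Rx queues */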

Signed-off-by: Maxime Coquelin <maxime.coquelin@redhat.com>
Reviewed-by: Andrew Rybchenko <andrew.rybchenko@oktetlabs.ru>
2021-10-29 11:23:10 +02:00
Radu Nicolau
6bc987ecb8 net/iavf: support IPsec inline crypto
Add support for inline crypto for IPsec, for ESP transport and
tunnel over IPv4 and IPv6, as well as the offload for ESP over UDP,
in conjunction with TSO for UDP and TCP flows.
Implement support for rte_security packet metadata.

Add definitions for IPsec descriptors and extend offload support
in the data and context descriptors.

Add support to the virtual channel mailbox for IPsec Crypto request
operations. IPsec Crypto requests receive an initial acknowledgment
from the physical function driver confirming receipt of the request,
and then an asynchronous response with the success/failure of the
request, including any response data.

Add enhanced descriptor debugging.

Refactor the scalar Tx burst function to support integration of the offloads.

Signed-off-by: Declan Doherty <declan.doherty@intel.com>
Signed-off-by: Abhijit Sinha <abhijit.sinha@intel.com>
Signed-off-by: Radu Nicolau <radu.nicolau@intel.com>
Reviewed-by: Jingjing Wu <jingjing.wu@intel.com>
2021-10-29 04:22:04 +02:00
Rongwei Liu
7299ab6822 net/mlx5: support socket direct mode bonding
In socket direct mode, it's possible to bind any two (maybe four
in the future) PCIe devices with IDs like xxxx:xx:xx.x and
yyyy:yy:yy.y. Bonding member interfaces no longer need to have
the same PCIe domain/bus/device ID.

The kernel driver uses "system_image_guid" to identify whether
devices can be bound together. The sysfs "phys_switch_id" attribute
is used to get the "system_image_guid" of each network interface.

OFED 5.4+ is required to support "phys_switch_id".

Signed-off-by: Rongwei Liu <rongweil@nvidia.com>
Acked-by: Viacheslav Ovsiienko <viacheslavo@nvidia.com>
2021-10-26 13:24:20 +02:00
Vladimir Medvedkin
11c5b9b51a fib: add RIB extension size parameter
This patch adds a new parameter to the FIB configuration to specify
the size of the extension for the internal RIB structure.
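
A hedged configuration sketch (the new field is assumed to be named
rib_ext_sz; struct my_route_ext is hypothetical and the other values
are illustrative):

  struct rte_fib_conf conf = {
    .type = RTE_FIB_DIR24_8,
    .default_nh = 0,
    .max_routes = 1 << 16,
    .rib_ext_sz = sizeof(struct my_route_ext), /* per-node RIB extension */
    .dir24_8 = {
      .nh_sz = RTE_FIB_DIR24_8_4B,
      .num_tbl8 = 1 << 8,
    },
  };
  struct rte_fib *fib = rte_fib_create("example_fib", SOCKET_ID_ANY, &conf);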

Signed-off-by: Vladimir Medvedkin <vladimir.medvedkin@intel.com>
Tested-by: Conor Walsh <conor.walsh@intel.com>
2021-11-04 12:38:03 +01:00
Vladimir Medvedkin
4fd8c4cb0d hash: add new Toeplitz hash implementation
This patch adds a new Toeplitz hash implementation using
Galois Field New Instructions (GFNI).
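
A hedged usage sketch (rte_thash_complete_matrix() and rte_thash_gfni()
from the thash API; the exact signatures and matrix sizing here are
assumptions):

  static const uint8_t rss_key[40] = { /* ... Toeplitz key ... */ };
  uint64_t mtrx[40 * 8]; /* GF(2) matrices derived from the key; sizing assumed */
  uint8_t tuple[12];     /* e.g. IPv4 src/dst addresses plus L4 ports */
  uint32_t hash;

  rte_thash_complete_matrix(mtrx, rss_key, sizeof(rss_key));
  hash = rte_thash_gfni(mtrx, tuple, sizeof(tuple));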

Signed-off-by: Vladimir Medvedkin <vladimir.medvedkin@intel.com>
Acked-by: Konstantin Ananyev <konstantin.ananyev@intel.com>
2021-11-04 11:19:10 +01:00
Ady Agbarih
9fa82d287f regex/mlx5: move RXP to CrSpace
Program the regex database through the ROF file using the firmware,
instead of manually through the software.
There is no need to set up the DB anymore; the regex-daemon is
always responsible for that.
In the new flow the regex driver only has to program the ROF rules
using the set params DevX command, which requires ROF mkey creation.
The rules file has to be read into 4KB-aligned memory.

Signed-off-by: Ady Agbarih <adypodoman@gmail.com>
Acked-by: Ori Kam <orika@nvidia.com>
2021-11-03 23:14:48 +01:00
Zhihong Peng
6e0290250d build: enable AddressSanitizer
AddressSanitizer [1], a.k.a. ASan, is a widely used debugging tool
to detect memory access errors.
It helps to detect issues like use-after-free and various kinds of
buffer overruns in C/C++ programs, and prints out detailed debug
information whenever an error is detected.

ASan is integrated with gcc and clang and can be enabled via a meson
option: -Db_sanitize=address
See the documentation for details (especially regarding clang).
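
For example, a build with ASan enabled (standard meson usage):

  meson setup build -Db_sanitize=address
  ninja -C build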

Enabling ASan has an impact on performance since additional checks are
added to generated binaries.

Enabling ASan with Windows is currently not supported in DPDK.

1: https://github.com/google/sanitizers/wiki/AddressSanitizer

Signed-off-by: Xueqin Lin <xueqin.lin@intel.com>
Signed-off-by: Zhihong Peng <zhihongx.peng@intel.com>
Acked-by: John McNamara <john.mcnamara@intel.com>
2021-10-29 15:25:34 +02:00
Honnappa Nagarahalli
f6c6c686f1 eal: remove FINISHED lcore state
FINISHED state seems to be used to indicate that the worker's update
of the 'state' is not visible to other threads. There seems to be no
requirement to have such a state.

Since the FINISHED state is removed, the API rte_eal_wait_lcore
is updated to always return the status of the last function that
ran in the worker core.

Signed-off-by: Honnappa Nagarahalli <honnappa.nagarahalli@arm.com>
Reviewed-by: Ola Liljedahl <ola.liljedahl@arm.com>
Reviewed-by: Feifei Wang <feifei.wang2@arm.com>
2021-10-25 18:20:59 +02:00
Olivier Matz
daa02b5cdd mbuf: add namespace to offload flags
Fix the mbuf offload flags namespace by adding an RTE_ prefix to the
name. The old flags remain usable, but a deprecation warning is issued
at compilation.
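
Renaming examples (the old names keep compiling, with a deprecation
warning):

  m->ol_flags |= RTE_MBUF_F_TX_IP_CKSUM; /* was PKT_TX_IP_CKSUM */
  if (m->ol_flags & RTE_MBUF_F_RX_VLAN)  /* was PKT_RX_VLAN */
    vlan = m->vlan_tci;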

Signed-off-by: Olivier Matz <olivier.matz@6wind.com>
Acked-by: Andrew Rybchenko <andrew.rybchenko@oktetlabs.ru>
Acked-by: Ajit Khaparde <ajit.khaparde@broadcom.com>
Acked-by: Somnath Kotur <somnath.kotur@broadcom.com>
2021-10-24 13:37:43 +02:00
Akhil Goyal
92cb130919 cryptodev: move device-specific structures
The device-specific structures - rte_cryptodev
and rte_cryptodev_data - are moved to cryptodev_pmd.h
to hide them from applications.

Signed-off-by: Akhil Goyal <gakhil@marvell.com>
Tested-by: Rebecca Troy <rebecca.troy@intel.com>
Acked-by: Fan Zhang <roy.fan.zhang@intel.com>
Acked-by: Konstantin Ananyev <konstantin.ananyev@intel.com>
2021-10-20 15:33:16 +02:00
Kai Ji
418c1a5499 test/crypto: enable chacha_poly PMD
An autotest is added for the new chacha20_poly1305 PMD.
A new test case is also added to cover SGL operation.

Signed-off-by: Kai Ji <kai.ji@intel.com>
Signed-off-by: Ciara Power <ciara.power@intel.com>
Acked-by: Ray Kinsella <mdr@ashroe.eu>
Acked-by: Fan Zhang <roy.fan.zhang@intel.com>
Acked-by: Akhil Goyal <gakhil@marvell.com>
2021-10-20 15:33:16 +02:00
Kai Ji
f166628854 crypto/ipsec_mb: add chacha_poly PMD
Add a new chacha20_poly1305 PMD to the ipsec_mb framework.

Signed-off-by: Kai Ji <kai.ji@intel.com>
Signed-off-by: Ciara Power <ciara.power@intel.com>
Acked-by: Ray Kinsella <mdr@ashroe.eu>
Acked-by: Fan Zhang <roy.fan.zhang@intel.com>
Acked-by: Akhil Goyal <gakhil@marvell.com>
2021-10-20 15:33:16 +02:00
Piotr Bronowski
cde8df1bda crypto/ipsec_mb: move zuc PMD
This patch removes the crypto/zuc folder and gathers all zuc PMD
implementation specific details into two files,
pmd_zuc.c and pmd_zuc_priv.h in the crypto/ipsec_mb folder.

Signed-off-by: Piotr Bronowski <piotrx.bronowski@intel.com>
Signed-off-by: Ciara Power <ciara.power@intel.com>
Acked-by: Ray Kinsella <mdr@ashroe.eu>
Acked-by: Fan Zhang <roy.fan.zhang@intel.com>
Acked-by: Akhil Goyal <gakhil@marvell.com>
2021-10-20 15:32:36 +02:00
Piotr Bronowski
5208d68d30 crypto/ipsec_mb: support snow3g digest appended ops
This patch enables out-of-place auth-cipher operations where
the digest is encrypted along with the rest of the raw data.
It also adds support for a partially encrypted digest when using
auth-cipher operations.

Signed-off-by: Damian Nowak <damianx.nowak@intel.com>
Signed-off-by: Kai Ji <kai.ji@intel.com>
Signed-off-by: Piotr Bronowski <piotrx.bronowski@intel.com>
Signed-off-by: Ciara Power <ciara.power@intel.com>
Acked-by: Ray Kinsella <mdr@ashroe.eu>
Acked-by: Fan Zhang <roy.fan.zhang@intel.com>
Acked-by: Akhil Goyal <gakhil@marvell.com>
2021-10-20 12:06:01 +02:00
Piotr Bronowski
4f1cfda59a crypto/ipsec_mb: move snow3g PMD
This patch removes the crypto/snow3g folder and gathers all snow3g PMD
implementation specific details into a single file,
pmd_snow3g.c in the crypto/ipsec_mb folder.

Signed-off-by: Piotr Bronowski <piotrx.bronowski@intel.com>
Signed-off-by: Ciara Power <ciara.power@intel.com>
Acked-by: Ray Kinsella <mdr@ashroe.eu>
Acked-by: Fan Zhang <roy.fan.zhang@intel.com>
Acked-by: Akhil Goyal <gakhil@marvell.com>
2021-10-20 12:06:01 +02:00
Piotr Bronowski
bc9ef81c42 crypto/ipsec_mb: move kasumi PMD
This patch removes the crypto/kasumi folder and gathers all kasumi PMD
implementation specific details into a single file,
pmd_kasumi.c in the crypto/ipsec_mb folder.

Signed-off-by: Piotr Bronowski <piotrx.bronowski@intel.com>
Signed-off-by: Ciara Power <ciara.power@intel.com>
Acked-by: Ray Kinsella <mdr@ashroe.eu>
Acked-by: Fan Zhang <roy.fan.zhang@intel.com>
Acked-by: Akhil Goyal <gakhil@marvell.com>
2021-10-20 12:06:01 +02:00
Piotr Bronowski
746825e5c0 crypto/ipsec_mb: move aesni_gcm PMD
This patch removes the crypto/aesni_gcm folder and gathers all
aesni-gcm PMD implementation specific details into a single file,
pmd_aesni_gcm.c in the crypto/ipsec_mb folder.
A redundant check for iv length is removed.

GCM ops are stored in the queue pair for multi-process support; they
are updated during queue pair setup for both primary and secondary
processes.

GCM ops are also set per lcore for the CPU crypto mode.

Signed-off-by: Piotr Bronowski <piotrx.bronowski@intel.com>
Signed-off-by: Ciara Power <ciara.power@intel.com>
Acked-by: Ray Kinsella <mdr@ashroe.eu>
Acked-by: Fan Zhang <roy.fan.zhang@intel.com>
Acked-by: Akhil Goyal <gakhil@marvell.com>
2021-10-20 12:06:01 +02:00
Pablo de Lara
8c835018de crypto/ipsec_mb: support ZUC-256 for aesni_mb
Add support for ZUC-EEA3-256 and ZUC-EIA3-256.
Only 4-byte tags are supported for now.

Signed-off-by: Pablo de Lara <pablo.de.lara.guarch@intel.com>
Acked-by: Fan Zhang <roy.fan.zhang@intel.com>
Acked-by: Akhil Goyal <gakhil@marvell.com>
2021-10-20 12:06:01 +02:00
Piotr Bronowski
918fd2f146 crypto/ipsec_mb: move aesni_mb PMD
This patch removes the crypto/aesni_mb folder and gathers all
aesni-mb PMD implementation specific details into a single file,
pmd_aesni_mb.c in crypto/ipsec_mb.

Now that intel-ipsec-mb v1.0 is the minimum supported version, old
macros can be replaced with the newer macros supported by this version.

Signed-off-by: Piotr Bronowski <piotrx.bronowski@intel.com>
Signed-off-by: Ciara Power <ciara.power@intel.com>
Acked-by: Ray Kinsella <mdr@ashroe.eu>
Acked-by: Fan Zhang <roy.fan.zhang@intel.com>
Acked-by: Akhil Goyal <gakhil@marvell.com>
2021-10-20 12:06:01 +02:00
Ciara Power
72a169278a crypto/ipsec_mb: support multi-process
The ipsec_mb SW PMD now has multi-process support.
When v1.1 is used, the queue-pair IMB_MGR is stored in a memzone
instead of being allocated externally by the Intel IPSec MB library.
If v1.0 is used, multi-process is not supported and allocation is
done as before.
The secondary process needs to reconfigure the queue pair so that the
IMB_MGR function pointers are updated.

Intel IPsec MB library version 1.1 is required for this support.

Signed-off-by: Ciara Power <ciara.power@intel.com>
Acked-by: Ray Kinsella <mdr@ashroe.eu>
Acked-by: Fan Zhang <roy.fan.zhang@intel.com>
Acked-by: Akhil Goyal <gakhil@marvell.com>
2021-10-20 12:06:01 +02:00
Fan Zhang
c75542ae42 crypto/ipsec_mb: introduce IPsec_mb framework
This patch introduces the new framework to share common code between
the SW crypto PMDs that depend on the intel-ipsec-mb library.
This change helps to reduce future effort on code maintenance and
feature updates.

The PMDs that will be added to this framework in subsequent patches are:
  - AESNI MB
  - AESNI GCM
  - CHACHA20_POLY1305
  - KASUMI
  - SNOW3G
  - ZUC

The use of these PMDs will not change: they will still be supported
for x86, and will use the same EAL args as before.

The minimum required version for the intel-ipsec-mb library is now v1.0.

Signed-off-by: Fan Zhang <roy.fan.zhang@intel.com>
Signed-off-by: Ciara Power <ciara.power@intel.com>
Acked-by: Ray Kinsella <mdr@ashroe.eu>
Acked-by: Akhil Goyal <gakhil@marvell.com>
2021-10-20 12:06:01 +02:00
Ferruh Yigit
295968d174 ethdev: add namespace
Add 'RTE_ETH' namespace to all enums & macros in a backward compatible
way. The macros for backward compatibility can be removed in next LTS.
Also updated some struct names to have 'rte_eth' prefix.

All internal components switched to using new names.

Syntax fixed on lines that this patch touches.
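
Examples of the renames (compatibility macros keep the old names
working until their removal):

  ETH_MQ_RX_RSS           -> RTE_ETH_MQ_RX_RSS
  ETH_RSS_IP              -> RTE_ETH_RSS_IP
  DEV_RX_OFFLOAD_CHECKSUM -> RTE_ETH_RX_OFFLOAD_CHECKSUM
  struct rte_fdir_conf    -> struct rte_eth_fdir_conf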

Signed-off-by: Ferruh Yigit <ferruh.yigit@intel.com>
Acked-by: Tyler Retzlaff <roretzla@linux.microsoft.com>
Acked-by: Andrew Rybchenko <andrew.rybchenko@oktetlabs.ru>
Acked-by: Ajit Khaparde <ajit.khaparde@broadcom.com>
Acked-by: Jerin Jacob <jerinj@marvell.com>
Acked-by: Wisam Jaddo <wisamm@nvidia.com>
Acked-by: Rosen Xu <rosen.xu@intel.com>
Acked-by: Chenbo Xia <chenbo.xia@intel.com>
Acked-by: Hemant Agrawal <hemant.agrawal@nxp.com>
Acked-by: Somnath Kotur <somnath.kotur@broadcom.com>
2021-10-22 18:15:38 +02:00
Xueming Li
dd22740cc2 ethdev: introduce shared Rx queue
In the current DPDK framework, each Rx queue is pre-loaded with mbufs
to save incoming packets. For some PMDs, when the number of
representors scales out in a switch domain, the memory consumption
becomes significant. Polling all ports also leads to high cache miss
rates, high latency and low throughput.

This patch introduces the shared Rx queue. Ports in the same Rx
domain and switch domain can share an Rx queue set by specifying a
non-zero share group in the Rx queue configuration.

A shared Rx queue is identified by the share_rxq field of the Rx
queue configuration. Port A RxQ X can share an RxQ with Port B RxQ Y
by using the same shared Rx queue ID.
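
A hedged configuration sketch (field names as described above; values
are illustrative):

  struct rte_eth_rxconf rxconf = dev_info.default_rxconf;

  if (dev_info.dev_capa & RTE_ETH_DEV_CAPA_RXQ_SHARE) {
    rxconf.share_group = 1; /* non-zero: join share group 1 */
    rxconf.share_rxq = 0;   /* shared Rx queue ID within the group */
  }
  /* Both ports' RxQ 0 now map onto the same shared queue. */
  rte_eth_rx_queue_setup(port_a, 0, nb_desc, socket_id, &rxconf, mp);
  rte_eth_rx_queue_setup(port_b, 0, nb_desc, socket_id, &rxconf, mp);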

No special API is defined to receive packets from a shared Rx queue.
Polling any member port of a shared Rx queue receives packets of that
queue for all member ports; the port_id is identified by mbuf->port.
The PMD is responsible for resolving the shared Rx queue from the
device and queue data.

A shared Rx queue must be polled in the same thread or core; polling
the queue ID of any member port is essentially the same.

Multiple share groups are supported. A PMD should support mixed
configurations by allowing multiple share groups and non-shared Rx
queues on one port.

Example grouping and polling model to reflect service priority:
 Group1, 2 shared Rx queues per port: PF, rep0, rep1
 Group2, 1 shared Rx queue per port: rep2, rep3, ... rep127
 Core0: poll PF queue0
 Core1: poll PF queue1
 Core2: poll rep2 queue0

PMDs advertise the shared Rx queue capability via RTE_ETH_DEV_CAPA_RXQ_SHARE.

The PMD is responsible for shared Rx queue consistency checks, to
avoid member ports' configurations contradicting each other.

Signed-off-by: Xueming Li <xuemingl@nvidia.com>
Reviewed-by: Andrew Rybchenko <andrew.rybchenko@oktetlabs.ru>
Acked-by: Ajit Khaparde <ajit.khaparde@broadcom.com>
2021-10-22 00:08:50 +02:00
William Tu
d1c7029a52 net/e1000: build on Windows
This patch enables building the e1000 driver for Windows.
I tested using two Windows VMs on top of VMware Fusion,
creating two e1000 devices with device ID 0x10D3 (82574L), and
verified that Rx/Tx works correctly using dpdk-testpmd.exe in
rxonly and txonly modes.

Signed-off-by: William Tu <u9012063@gmail.com>
Acked-by: Haiyue Wang <haiyue.wang@intel.com>
Acked-by: Pallavi Kadam <pallavi.kadam@intel.com>
Tested-by: Dmitry Kozlyuk <dmitry.kozliuk@gmail.com>
Tested-by: Pallavi Kadam <pallavi.kadam@intel.com>
2021-10-21 04:58:40 +02:00
Jie Wang
f30157d988 net/iavf: support PPPoL2TPv2oUDP RSS Hash
Add support for PPP over L2TPv2 over UDP protocol RSS Hash based
on inner IP src/dst address and TCP/UDP src/dst port.

Patterns are listed below:
eth/ipv4(6)/udp/l2tpv2/ppp/ipv4(6)
eth/ipv4(6)/udp/l2tpv2/ppp/ipv4(6)/udp
eth/ipv4(6)/udp/l2tpv2/ppp/ipv4(6)/tcp

Signed-off-by: Wenjun Wu <wenjun1.wu@intel.com>
Signed-off-by: Jie Wang <jie1x.wang@intel.com>
Acked-by: Beilei Xing <beilei.xing@intel.com>
Reviewed-by: Ferruh Yigit <ferruh.yigit@intel.com>
2021-10-21 14:15:59 +02:00
Jie Wang
3a929df1f2 ethdev: support L2TPv2 and PPP protocol
Added flow pattern items and header formats of L2TPv2 and PPP.

Signed-off-by: Wenjun Wu <wenjun1.wu@intel.com>
Signed-off-by: Jie Wang <jie1x.wang@intel.com>
Acked-by: Ori Kam <orika@nvidia.com>
Acked-by: Andrew Rybchenko <andrew.rybchenko@oktetlabs.ru>
Reviewed-by: Ferruh Yigit <ferruh.yigit@intel.com>
2021-10-21 14:15:59 +02:00
Zhihong Peng
6ad06203a5 cmdline: free on exit
cl is allocated in the cmdline_stdin_new function, so releasing it
in the cmdline_stdin_exit function is logical; this way cl does not
have to be freed separately.

Fixes: af75078fece3 ("first public release")

Signed-off-by: Zhihong Peng <zhihongx.peng@intel.com>
Reviewed-by: Dmitry Kozlyuk <dmitry.kozliuk@gmail.com>
Acked-by: Olivier Matz <olivier.matz@6wind.com>
Tested-by: Zhihong Peng <zhihongx.peng@intel.com>
2021-10-22 23:32:00 +02:00
Dmitry Kozlyuk
f8f8dc2890 cmdline: make struct rdline opaque
Hide struct rdline definition and some RDLINE_* constants in order
to be able to change internal buffer sizes transparently to the user.
Add new functions:

* rdline_new(): allocate and initialize struct rdline.
  This function replaces rdline_init() and takes an extra parameter:
  opaque user data for the callbacks.
* rdline_free(): deallocate struct rdline.
* rdline_get_history_buffer_size(): for use in tests.
* rdline_get_opaque(): to obtain user data in callback functions.

Remove rdline_init() function from library headers and export list,
because using it requires the knowledge of sizeof(struct rdline).
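
A hedged sketch of the new lifecycle (assuming rdline_new() takes the
same callbacks as the old rdline_init(), plus the new opaque pointer;
the callback names are illustrative):

  struct rdline *rdl;

  rdl = rdline_new(write_char_cb, validate_cb, complete_cb, user_ctx);
  /* ... inside a callback, the user data can be recovered with: */
  void *ctx = rdline_get_opaque(rdl);
  /* ... */
  rdline_free(rdl);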

Signed-off-by: Dmitry Kozlyuk <dmitry.kozliuk@gmail.com>
Acked-by: Olivier Matz <olivier.matz@6wind.com>
Acked-by: Narcisa Vasile <navasile@linux.microsoft.com>
2021-10-22 23:23:45 +02:00
Dmitry Kozlyuk
f43809d28c cmdline: make struct cmdline opaque
Remove the definition of `struct cmdline` from public header.
Deprecation notice:
https://mails.dpdk.org/archives/dev/2020-September/183310.html

Signed-off-by: Dmitry Kozlyuk <dmitry.kozliuk@gmail.com>
Acked-by: David Marchand <david.marchand@redhat.com>
Acked-by: Olivier Matz <olivier.matz@6wind.com>
Acked-by: Narcisa Vasile <navasile@linux.microsoft.com>
2021-10-22 22:44:18 +02:00
Conor Walsh
a4d11e0386 raw/ioat: deprecate rawdev driver
Deprecate the rawdev IOAT driver as both IOAT and IDXD drivers have
moved to dmadev.

Signed-off-by: Conor Walsh <conor.walsh@intel.com>
Acked-by: Kevin Laatz <kevin.laatz@intel.com>
Acked-by: Bruce Richardson <bruce.richardson@intel.com>
Acked-by: Thomas Monjalon <thomas@monjalon.net>
2021-10-22 22:40:59 +02:00
Conor Walsh
866e46bcd8 dma/ioat: add device probing and removal
Add the basic device probe/remove skeleton code and initial
documentation for the new IOAT DMA driver. A maintainers update is
also included in this patch.

Signed-off-by: Conor Walsh <conor.walsh@intel.com>
Reviewed-by: Kevin Laatz <kevin.laatz@intel.com>
Reviewed-by: Chengwen Feng <fengchengwen@huawei.com>
2021-10-22 22:40:59 +02:00
Kevin Laatz
e33ad06eae dma/idxd: add skeleton for VFIO based DSA device
Add the basic device probe/remove skeleton code for a DSA device
bound to the vfio-pci driver. Relevant documentation and a
MAINTAINERS update are also included.

Signed-off-by: Bruce Richardson <bruce.richardson@intel.com>
Signed-off-by: Kevin Laatz <kevin.laatz@intel.com>
Reviewed-by: Conor Walsh <conor.walsh@intel.com>
2021-10-22 22:40:58 +02:00
Stephen Hemminger
cbb44143be app/dumpcap: add new packet capture application
This is a new packet capture application to replace the existing
pdump. The new application works like the Wireshark dumpcap program
and supports the pdump API features.

It is not complete yet; some features, such as filtering, are not implemented.

Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
2021-10-22 22:40:58 +02:00
Stephen Hemminger
10f726efe2 pdump: support pcapng and filtering
This enhances the DPDK pdump library to support the new
pcapng format and filtering via BPF.

The internal client/server protocol is changed to support
two versions: the original pdump basic version and a
new pcapng version.

The internal version number (not part of the exposed API or ABI)
is intentionally increased to cause any attempt to mix
mismatched primary/secondary processes to fail.

Add a new API to allow filtering of captured packets with a
DPDK BPF (eBPF) filter program. It keeps statistics
on packets captured, filtered, and missed (because the ring was full).

Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
Acked-by: Reshma Pattan <reshma.pattan@intel.com>
Acked-by: Ray Kinsella <mdr@ashroe.eu>
2021-10-22 22:07:48 +02:00
Stephen Hemminger
8d23ce8f5e pcapng: add new library for writing pcapng files
This is a utility library for writing pcapng format files,
as used by the Wireshark family of utilities. Older tcpdump
also knows how to read (but not write) this format.

See
  https://github.com/pcapng/pcapng/

Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
Acked-by: Reshma Pattan <reshma.pattan@intel.com>
Acked-by: Ray Kinsella <mdr@ashroe.eu>
2021-10-22 17:19:07 +02:00
Pavan Nikhilesh
f3f3a91788 eventdev/timer: move adapters memory to hugepage
Move the memory used by timer adapters to hugepages.
Allocate the memory on the first adapter create or lookup to address
both primary and secondary process use cases.
This prevents TLB misses, if any, and aligns with the memory layout
of other subsystems.

Signed-off-by: Pavan Nikhilesh <pbhagavatula@marvell.com>
2021-10-21 10:16:00 +02:00
Pavan Nikhilesh
1dcd67ba1e eventdev/timer: rearrange struct fields
Rearrange fields in rte_event_timer data structure to remove holes.
Also, remove use of volatile from rte_event_timer.

Signed-off-by: Pavan Nikhilesh <pbhagavatula@marvell.com>
2021-10-21 10:14:50 +02:00
Pavan Nikhilesh
d35e61322d eventdev: move inline APIs into separate structure
Move the fastpath inline function pointers from rte_eventdev into a
separate structure accessed via a flat array.
The intention is to make rte_eventdev and related structures private
to avoid future API/ABI breakages.

Signed-off-by: Pavan Nikhilesh <pbhagavatula@marvell.com>
Acked-by: Ray Kinsella <mdr@ashroe.eu>
2021-10-21 10:14:50 +02:00
Ganapati Kundapura
814d017093 eventdev/eth_rx: support telemetry
Added telemetry callbacks to get Rx adapter stats, reset stats and
to get Rx queue config information.

Signed-off-by: Ganapati Kundapura <ganapati.kundapura@intel.com>
Acked-by: Jay Jayatheerthan <jay.jayatheerthan@intel.com>
Acked-by: Naga Harish K S V <s.v.naga.harish.k@intel.com>
Acked-by: Jerin Jacob <jerinj@marvell.com>
2021-10-21 10:14:50 +02:00
Pavan Nikhilesh
929ebdd543 eventdev/eth_rx: simplify event vector config
Include the vector configuration in the structure
``rte_event_eth_rx_adapter_queue_conf`` that is used to configure
the Rx queue parameters of an Rx adapter's ethernet device.
This simplifies event vector configuration, as it avoids a separate
configuration step per Rx queue.
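
A hedged sketch of the consolidated configuration (vector field names
assumed from this change; values are illustrative):

  struct rte_event_eth_rx_adapter_queue_conf qconf = {
    .rx_queue_flags = RTE_EVENT_ETH_RX_ADAPTER_QUEUE_EVENT_VECTOR,
    .vector_sz = 64,             /* max events aggregated per vector */
    .vector_timeout_ns = 100000, /* flush timeout for partial vectors */
    .vector_mp = vector_pool,    /* mempool providing vector events */
  };

  rte_event_eth_rx_adapter_queue_add(adapter_id, eth_dev_id, rx_queue_id,
                                     &qconf);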

Signed-off-by: Pavan Nikhilesh <pbhagavatula@marvell.com>
Acked-by: Jay Jayatheerthan <jay.jayatheerthan@intel.com>
Acked-by: Ray Kinsella <mdr@ashroe.eu>
Acked-by: Jerin Jacob <jerinj@marvell.com>
2021-10-21 10:14:50 +02:00
Viacheslav Ovsiienko
dc4d860e8a ethdev: introduce configurable flexible item
1. Introduction and Retrospective

Nowadays networks are evolving fast and wide, network
structures are getting more and more complicated, and new
application areas are emerging. To address these challenges
new network protocols are continuously being developed,
considered by technical communities, adopted by industry and,
eventually, implemented in hardware and software. The DPDK
framework follows these common trends, and a glance at the
RTE Flow API header shows that multiple new items have been
introduced over the years since the initial release.

The adoption and implementation process for a new protocol is
not straightforward and takes time; the protocol passes through
development, consideration, adoption, and implementation
phases. The industry tries to anticipate and address
forthcoming network protocols; for example, many hardware
vendors are implementing flexible and configurable network
protocol parsers. As DPDK developers, could we anticipate
the near future in the same fashion and introduce similar
flexibility in the RTE Flow API?

Checking what is already merged in the project, we find the
raw item (rte_flow_item_raw). At first glance it looks
suitable, and we can try to implement flow matching on the
header of some relatively new tunnel protocol, say the GENEVE
header with variable-length options. But under further
consideration, we run into the raw item's limitations:

- only a fixed-size network header can be represented
- the entire network header pattern of fixed format
  (header field offsets are fixed) must be provided
- the search for patterns is not robust (the wrong matches
  might be triggered), and actually is not supported
  by existing PMDs
- no explicitly specified relations with preceding
  and following items
- no tunnel hint support

As a result, implementing support for tunnel protocols like the
aforementioned GENEVE with variable extra protocol options
with the flow raw item becomes very complicated and would require
multiple flows and multiple raw items chained in the same
flow (by the way, no support was found for chained raw
items in implemented drivers).

This RFC introduces the dedicated flex item (rte_flow_item_flex)
to handle matches with existing and new network protocol headers
in a unified fashion.

2. Flex Item Life Cycle

Let's assume there is a requirement to support a new
network protocol with RTE Flows. What is given within the protocol
specification:

  - header format
  - header length, (can be variable, depending on options)
  - potential presence of extra options following or included
    in the header
  - the relations with preceding protocols. For example,
    the GENEVE follows UDP, eCPRI can follow either UDP
    or L2 header
  - the relations with following protocols. For example,
    the next layer after tunnel header can be L2 or L3
  - whether the new protocol is a tunnel and the header
    is a splitting point between outer and inner layers

The supposed way to operate with flex item:

  - application defines the header structures according to
    protocol specification

  - application calls rte_flow_flex_item_create() with the desired
    configuration according to the protocol specification; this
    creates the flex item object over the specified ethernet device
    and prepares the PMD and underlying hardware to handle the flex
    item. On the creation call, the PMD backing the specified
    ethernet device returns an opaque handle identifying
    the object that has been created

  - application uses the rte_flow_item_flex with obtained handle
    in the flows, the values/masks to match with fields in the
    header are specified in the flex item per flow as for regular
    items (except that pattern buffer combines all fields)

  - flows with flex items match with packets in a regular fashion,
    the values and masks for the new protocol header match are
    taken from the flex items in the flows

  - application destroys flows with flex items

  - application calls rte_flow_flex_item_release() as part of
    ethernet device API and destroys the flex item object in
    PMD and releases the engaged hardware resources

3. Flex Item Structure

The flex item structure is intended to be used as part of the flow
pattern, like regular RTE flow items, and provides the mask and
value to match with fields of the protocol the item was configured
for.

  struct rte_flow_item_flex {
    void *handle;
    uint32_t length;
    const uint8_t* pattern;
  };

The handle is an opaque object maintained on a per-device basis
by the underlying driver.

The protocol header fields are considered as bit fields; all
offsets and widths are expressed in bits. The pattern is the
buffer containing the bit concatenation of all the fields
presented at item configuration time, in the same order and
same amount. If byte boundary alignment is needed, an application
can use a dummy type field; this is just a gap filler.

The length field specifies the pattern buffer length in bytes
and is needed to allow rte_flow_copy() operations. The approach
of multiple pattern pointers and lengths (per field) was
considered and found clumsy; it seems much more suitable for
the application to maintain a single structure within a
single pattern buffer.

4. Flex Item Configuration

The flex item configuration consists of the following parts:

  - header field descriptors:
    - next header
    - next protocol
    - sample to match
  - input link descriptors
  - output link descriptors

The field descriptors tell the driver and hardware what data should
be extracted from the packet and then control the packet handling
in the flow engine. Besides this, sample fields can be presented
to match with patterns in the flows. Each field is a bit pattern.
It has a width, an offset from the header beginning, a mode of
offset calculation, and offset-related parameters.

The next header field is special: no data are actually taken
from the packet, but its offset is used as a pointer to the next
header in the packet. In other words, the next header offset
specifies the size of the header being parsed by the flex item.

There is one more special field, next protocol: it specifies
where the next protocol identifier is contained, and packet data
sampled from this field will be used to determine the next
protocol header type to continue packet parsing. The next
protocol field is like the ether_type field in the L2 header,
or the proto field in IPv4/v6 headers.

The sample fields are used to represent the data to be sampled
from the packet and then matched with established flows.

There are several methods supposed to calculate field offset
in runtime depending on configuration and packet content:

  - FIELD_MODE_FIXED - fixed offset. The bit offset from the
    header beginning is constant and defined by the field_base
    configuration parameter.

  - FIELD_MODE_OFFSET - the field bit offset is extracted
    from another header field (the indirect offset field). The
    resulting field offset to match is calculated as:

  field_base + (*offset_base & offset_mask) << offset_shift

    This mode is useful for sampling extra options that follow
    the main header when a field contains the main header length.
    It can also be used to calculate the offset to the next
    protocol header; for example, the IPv4 header contains a
    4-bit field with the IPv4 header length expressed in dwords.
    As another example, this mode would allow us to skip the
    GENEVE header's variable-length options.

  - FIELD_MODE_BITMASK - the field bit offset is extracted
    from another header field (the indirect offset field); the
    latter is considered a bitmask containing some number of one
    bits. The resulting field offset to match is calculated as:

  field_base + bitcount(*offset_base & offset_mask) << offset_shift

    This mode would be useful to skip the GTP header and its
    extra options with specified flags.

  - FIELD_MODE_DUMMY - dummy field, optionally used for byte
    boundary alignment in pattern. Pattern mask and data are
    ignored in the match. All configuration parameters besides
    field size and offset are ignored.

  Note:  "*" - means the indirect field offset is calculated
  and actual data are extracted from the packet by this
  offset (like data are fetched by pointer *p from memory).

The offset mode list can be extended by vendors according to
hardware supported options.

The input link configuration section tells the driver which
protocols, and under what conditions, the flex item can follow.
An input link specifies the preceding header pattern; for example,
for GENEVE it can be a UDP item specifying a match on destination
port with value 6081. The flex item can follow multiple header
types, in which case multiple input links should be specified. At
flow creation time, an item with one of the input link types should
precede the flex item, and the driver will select the correct flex
item settings, depending on the actual flow pattern.

The output link configuration section tells the driver how
to continue packet parsing after the flex item protocol.
If multiple protocols can follow the flex item header, the
flex item should contain the field with the next protocol
identifier, and the parsing will continue depending
on the data contained in this field in the actual packet.

The flex item fields can participate in RSS hash calculation,
the dedicated flag is present in the field description to specify
what fields should be provided for hashing.

5. Flex Item Chaining

If there are multiple protocols supposed to be supported with
flex items in a chained fashion (two or more flex items within
the same flow, and these might be neighbors in the pattern), it
means the flex items are mutually referencing. In this case,
the item that occurs first should be created with an empty
output link list, or with a list including existing items;
then the second flex item should be created referencing
the first flex item as an input arc, and the drivers should
adjust the item configuration.

Also, the hardware resources used by flex items to handle
the packet can be limited. If there are multiple flex items
that are supposed to be used within the same flow it would
be nice to provide some hint for the driver that these two
or more flex items are intended for simultaneous usage.
The fields of such items should be assigned hint indices,
and the indices of two or more flex items intended to be
used within the same flow should be the same
as well. In other words, the field hint index specifies
the group of fields that can be matched simultaneously
within a single flow. If hint indices are specified,
the driver will try to engage non-overlapping hardware
resources and provide independent handling of the field
groups with unique indices. If the hint index is zero,
the driver assigns resources on its own.

6. Example of New Protocol Handling

Let's suppose we have a requirement to handle a new tunnel
protocol that follows a UDP header with destination port 0xFADE
and is followed by a MAC header. Let the new protocol header format
be like this:

  struct new_protocol_header {
    rte_be32_t header_length; /* length in dwords, including options */
    rte_be32_t specific0;     /* some protocol data, no intention */
    rte_be32_t specific1;     /* to match in flows on these fields */
    rte_be32_t crucial;       /* data of interest, match is needed */
    rte_be32_t options[0];    /* optional protocol data, variable length */
  };

The supposed flex item configuration:

  struct rte_flow_item_flex_field field0 = {
    .field_mode = FIELD_MODE_DUMMY,  /* Affects match pattern only */
    .field_size = 96,                /* three dwords from the beginning */
  };
  struct rte_flow_item_flex_field field1 = {
    .field_mode = FIELD_MODE_FIXED,
    .field_size = 32,       /* Field size is one dword */
    .field_base = 96,       /* Skip three dwords from the beginning */
  };
  struct rte_flow_item_udp spec0 = {
    .hdr = {
      .dst_port = RTE_BE16(0xFADE),
    }
  };
  struct rte_flow_item_udp mask0 = {
    .hdr = {
      .dst_port = RTE_BE16(0xFFFF),
    }
  };
  struct rte_flow_item_flex_link link0 = {
    .item = {
       .type = RTE_FLOW_ITEM_TYPE_UDP,
       .spec = &spec0,
       .mask = &mask0,
    },
  };

  struct rte_flow_item_flex_conf conf = {
    .next_header = {
      .tunnel = FLEX_TUNNEL_MODE_SINGLE,
      .field_mode = FIELD_MODE_OFFSET,
      .field_base = 0,
      .offset_base = 0,
      .offset_mask = 0xFFFFFFFF,
      .offset_shift = 2	   /* Expressed in dwords, shift left by 2 */
    },
    .sample = {
       &field0,
       &field1,
    },
    .nb_samples = 2,
    .input_link[0] = &link0,
    .nb_inputs = 1
  };

Let's suppose we have created the flex item successfully, and PMD
returned the handle 0x123456789A. We can use the following item
pattern to match the crucial field in the packet with value 0x00112233:

  struct new_protocol_header spec_pattern =
  {
    .crucial = RTE_BE32(0x00112233),
  };
  struct new_protocol_header mask_pattern =
  {
    .crucial = RTE_BE32(0xFFFFFFFF),
  };
  struct rte_flow_item_flex spec_flex = {
    .handle = (void *)0x123456789A, /* handle returned at creation */
    .length = sizeof(struct new_protocol_header),
    .pattern = (const uint8_t *)&spec_pattern,
  };
  struct rte_flow_item_flex mask_flex = {
    .length = sizeof(struct new_protocol_header),
    .pattern = (const uint8_t *)&mask_pattern,
  };
  struct rte_flow_item item_to_match = {
    .type = RTE_FLOW_ITEM_TYPE_FLEX,
    .spec = &spec_flex,
    .mask = &mask_flex,
  };

Signed-off-by: Viacheslav Ovsiienko <viacheslavo@nvidia.com>
Acked-by: Ori Kam <orika@nvidia.com>
2021-10-20 18:58:54 +02:00
Sunil Kumar Kori
58397fedc6 net/cnxk: support meter action to flow create
Meters are configured per flow using the rte_flow_create API.
Implement support for the meter action applied to a flow.
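
A minimal sketch using the generic rte_flow meter action (the meter ID
is illustrative and must reference a previously created meter object):

  struct rte_flow_action_meter meter_conf = { .mtr_id = 1 };
  struct rte_flow_action actions[] = {
    { .type = RTE_FLOW_ACTION_TYPE_METER, .conf = &meter_conf },
    { .type = RTE_FLOW_ACTION_TYPE_END },
  };

  flow = rte_flow_create(port_id, &attr, pattern, actions, &error);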

Signed-off-by: Sunil Kumar Kori <skori@marvell.com>
Signed-off-by: Rakesh Kudurumalla <rkudurumalla@marvell.com>
Acked-by: Jerin Jacob <jerinj@marvell.com>
2021-10-19 16:25:31 +02:00
Michal Krawczyk
f93e20e516 net/ena: check missing Tx completions
In some cases Tx descriptors may never be completed by the HW, and
as a result they will never be released.

This patch adds checking for the missing Tx completions to the ENA timer
service, so in order to use this feature, the application must call the
function rte_timer_manage().

The missing Tx completion reset threshold is determined dynamically,
taking into consideration the ring size and the default value.

Tx cleanup is associated with the Tx burst function. As DPDK
applications can call the Tx burst function at arbitrary times, the
time of the last cleanup must be tracked to avoid false detection of
missing Tx completions.

Signed-off-by: Michal Krawczyk <mk@semihalf.com>
Reviewed-by: Igor Chauskin <igorch@amazon.com>
Reviewed-by: Shai Brandes <shaibran@amazon.com>
2021-10-19 15:04:17 +02:00
Michal Krawczyk
08180833cb net/ena: add NUMA-aware allocations
Only the IO rings' memory was allocated taking the socket ID into
account, while the other structures were allocated using the regular
rte_zmalloc() API.

Ring specific structures are now being allocated using the ring's
socket ID.

Signed-off-by: Michal Krawczyk <mk@semihalf.com>
Reviewed-by: Igor Chauskin <igorch@amazon.com>
Reviewed-by: Shai Brandes <shaibran@amazon.com>
2021-10-19 15:04:17 +02:00
Michal Krawczyk
005064e505 net/ena: support Tx/Rx free thresholds
The caller can pass a Tx or Rx free threshold value in the
configuration structure for each ring. It determines when the Tx/Rx
function should start cleaning up/refilling the descriptors. ENA was
ignoring this value and doing its own calculations.

Now the user can configure ENA's behavior using this parameter; if
the value is not set, ENA will continue with the old behavior and
will use its own threshold value.
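
A minimal sketch of setting the thresholds through the standard ethdev
queue setup (values are illustrative):

  struct rte_eth_txconf txconf = dev_info.default_txconf;
  struct rte_eth_rxconf rxconf = dev_info.default_rxconf;

  txconf.tx_free_thresh = 128; /* when Tx starts cleaning up descriptors */
  rxconf.rx_free_thresh = 64;  /* when Rx starts refilling descriptors */

  rte_eth_tx_queue_setup(port_id, 0, nb_txd, socket_id, &txconf);
  rte_eth_rx_queue_setup(port_id, 0, nb_rxd, socket_id, &rxconf, mbuf_pool);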

The default value is not provided by ENA in ena_infos_get(), as
it is determined dynamically, depending on the requested ring size.

Note that the NULL check for the Tx conf was removed from the function
ena_tx_queue_setup(), as at this place the configuration will be
either provided by the user or the default config will be used, and
this is handled by the upper (rte_ethdev) layer.

The Tx threshold shouldn't be used for the Tx cleanup budget, as it
can be inadequate for the burst in use. Now the PMD tries to release
mbufs for the ring until it is depleted.

Signed-off-by: Michal Krawczyk <mk@semihalf.com>
Reviewed-by: Igor Chauskin <igorch@amazon.com>
Reviewed-by: Shai Brandes <shaibran@amazon.com>
2021-10-19 15:04:17 +02:00