numam-dpdk

Author	SHA1	Message	Date
Volodymyr Fialko	001d402c89	eal/arm64: support ASan This patch defines ASAN_SHADOW_OFFSET for arm64 according to the ASan documentation. This offset should cover all arm64 VMAs supported by ASan. Signed-off-by: Volodymyr Fialko <vfialko@marvell.com> Reviewed-by: David Marchand <david.marchand@redhat.com> Acked-by: Jerin Jacob <jerinj@marvell.com> Acked-by: Ruifeng Wang <ruifeng.wang@arm.com>	2021-11-12 15:30:00 +01:00
Elena Agostini	3a99464456	doc: add CUDA example in GPU guide Add a pseudo-code example to show how to use gpudev API with a CUDA application. Signed-off-by: Elena Agostini <eagostini@nvidia.com>	2021-11-08 17:20:53 +01:00
Elena Agostini	c7ebd65c13	gpudev: add communication list In heterogeneous computing system, processing is not only in the CPU. Some tasks can be delegated to devices working in parallel. When mixing network activity with task processing there may be the need to put in communication the CPU with the device in order to synchronize operations. An example could be a receive-and-process application where CPU is responsible for receiving packets in multiple mbufs and the GPU is responsible for processing the content of those packets. The purpose of this list is to provide a buffer in CPU memory visible from the GPU that can be treated as a circular buffer to let the CPU provide fundamental info of received packets to the GPU. A possible use-case is described below. CPU: - Trigger some task on the GPU - in a loop: - receive a number of packets - provide packets info to the GPU GPU: - Do some pre-processing - Wait to receive a new set of packets to be processed Layout of a communication list would be: ------- | 0 | => pkt_list | status | | #pkts | ------- | 1 | => pkt_list | status | | #pkts | ------- | 2 | => pkt_list | status | | #pkts | ------- | .... | => pkt_list ------- Signed-off-by: Elena Agostini <eagostini@nvidia.com>	2021-11-08 17:20:53 +01:00
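The list layout described in this commit can be modeled in plain C. The following is a minimal sketch, not the gpudev implementation: the type and function names (`comm_item`, `cpu_publish`, `device_consume`, `cpu_recycle`), the sizes, and the three-state status are assumptions made for illustration, and the "device" side is an ordinary function standing in for GPU code.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>
#include <stdint.h>

#define LIST_ITEMS 4 /* hypothetical number of entries in the list */
#define MAX_PKTS   8 /* hypothetical maximum packets per entry */

enum item_status { ITEM_FREE, ITEM_READY, ITEM_DONE };

struct pkt_info {
	uintptr_t addr; /* where the packet payload lives */
	size_t size;    /* payload length in bytes */
};

/* One entry of the list: status, "#pkts" and the pkt_list itself. */
struct comm_item {
	_Atomic int status;
	uint32_t num_pkts;
	struct pkt_info pkt_list[MAX_PKTS];
};

/* CPU side: publish a burst into a free entry. Returns 0, or -1 if busy. */
static int
cpu_publish(struct comm_item *item, const struct pkt_info *pkts, uint32_t n)
{
	if (n > MAX_PKTS ||
	    atomic_load_explicit(&item->status, memory_order_acquire) != ITEM_FREE)
		return -1;
	for (uint32_t i = 0; i < n; i++)
		item->pkt_list[i] = pkts[i];
	item->num_pkts = n;
	/* Release store: packet info is visible before the status flips. */
	atomic_store_explicit(&item->status, ITEM_READY, memory_order_release);
	return 0;
}

/* Device stand-in: consume a ready entry. Returns bytes seen, or -1. */
static long
device_consume(struct comm_item *item)
{
	long bytes = 0;

	if (atomic_load_explicit(&item->status, memory_order_acquire) != ITEM_READY)
		return -1;
	for (uint32_t i = 0; i < item->num_pkts; i++)
		bytes += (long)item->pkt_list[i].size;
	atomic_store_explicit(&item->status, ITEM_DONE, memory_order_release);
	return bytes;
}

/* CPU side: return a finished entry to the free state. */
static int
cpu_recycle(struct comm_item *item)
{
	if (atomic_load_explicit(&item->status, memory_order_acquire) != ITEM_DONE)
		return -1;
	item->num_pkts = 0;
	atomic_store_explicit(&item->status, ITEM_FREE, memory_order_release);
	return 0;
}
```

A caller would keep an array of `LIST_ITEMS` entries and advance an index modulo `LIST_ITEMS` on each side, which gives the circular behavior the commit message describes.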
Elena Agostini	f56160a255	gpudev: add communication flag In heterogeneous computing system, processing is not only in the CPU. Some tasks can be delegated to devices working in parallel. When mixing network activity with task processing there may be the need to put in communication the CPU with the device in order to synchronize operations. The purpose of this flag is to allow the CPU and the GPU to exchange ACKs. A possible use-case is described below. CPU: - Trigger some task on the GPU - Prepare some data - Signal to the GPU the data is ready updating the communication flag GPU: - Do some pre-processing - Wait for more data from the CPU polling on the communication flag - Consume the data prepared by the CPU Signed-off-by: Elena Agostini <eagostini@nvidia.com>	2021-11-08 17:20:53 +01:00
Elena Agostini	2d61b429cf	gpudev: add memory barrier Add a function for the application to ensure the coherency of the writes executed by another device into the GPU memory. Signed-off-by: Elena Agostini <eagostini@nvidia.com>	2021-11-08 17:20:53 +01:00
Elena Agostini	e818c4e2bf	gpudev: add memory API In heterogeneous computing system, processing is not only in the CPU. Some tasks can be delegated to devices working in parallel. Such workload distribution can be achieved by sharing some memory. As a first step, the features are focused on memory management. A function allows to allocate memory inside the device, or in the main (CPU) memory while making it visible for the device. This memory may be used to save packets or for synchronization data. The next step should focus on GPU processing task control. Signed-off-by: Elena Agostini <eagostini@nvidia.com> Signed-off-by: Thomas Monjalon <thomas@monjalon.net>	2021-11-08 17:20:53 +01:00
Thomas Monjalon	82e5f6b658	gpudev: add child device representing a device context The computing device may operate in some isolated contexts. Memory and processing are isolated in a silo represented by a child device. The context is provided as an opaque by the caller of rte_gpu_add_child(). Signed-off-by: Thomas Monjalon <thomas@monjalon.net>	2021-11-08 17:20:52 +01:00
Elena Agostini	8b8036a66e	gpudev: introduce GPU device class library In heterogeneous computing system, processing is not only in the CPU. Some tasks can be delegated to devices working in parallel. The new library gpudev is for dealing with GPGPU computing devices from a DPDK application running on the CPU. The infrastructure is prepared to welcome drivers in drivers/gpu/. Signed-off-by: Elena Agostini <eagostini@nvidia.com> Signed-off-by: Thomas Monjalon <thomas@monjalon.net>	2021-11-08 17:20:52 +01:00
Naga Harish K S V	995b150c1a	eventdev/eth_rx: add queue stats API This patch adds new api ``rte_event_eth_rx_adapter_queue_stats_get`` to retrieve queue stats. The queue stats are in the format ``struct rte_event_eth_rx_adapter_queue_stats``. For resetting the queue stats, ``rte_event_eth_rx_adapter_queue_stats_reset`` api is added. The adapter stats_get and stats_reset apis are also updated to handle queue level event buffer use case. Signed-off-by: Naga Harish K S V <s.v.naga.harish.k@intel.com> Acked-by: Jay Jayatheerthan <jay.jayatheerthan@intel.com>	2021-11-04 08:41:25 +01:00
Gowrishankar Muthukrishnan	259ca6d161	security: add telemetry endpoint for capabilities Add telemetry endpoint for cryptodev security capabilities. Details of endpoints added in documentation. Signed-off-by: Gowrishankar Muthukrishnan <gmuthukrishn@marvell.com> Acked-by: Akhil Goyal <gakhil@marvell.com>	2021-11-04 19:46:27 +01:00
Radu Nicolau	ff4a29d167	ipsec: support TSO Add support for transmit segmentation offload to inline crypto processing mode. This offload is not supported by other offload modes, as at a minimum it requires inline crypto for IPsec to be supported on the network interface. Signed-off-by: Declan Doherty <declan.doherty@intel.com> Signed-off-by: Radu Nicolau <radu.nicolau@intel.com> Signed-off-by: Abhijit Sinha <abhijit.sinha@intel.com> Signed-off-by: Daniel Martin Buckley <daniel.m.buckley@intel.com> Acked-by: Fan Zhang <roy.fan.zhang@intel.com> Acked-by: Konstantin Ananyev <konstantin.ananyev@intel.com> Acked-by: Akhil Goyal <gakhil@marvell.com>	2021-11-04 19:46:27 +01:00
Gowrishankar Muthukrishnan	1c559ee846	cryptodev: add telemetry endpoint for capabilities Add telemetry endpoint for getting cryptodev capabilities. Signed-off-by: Gowrishankar Muthukrishnan <gmuthukrishn@marvell.com> Acked-by: Akhil Goyal <gakhil@marvell.com>	2021-11-04 19:43:14 +01:00
Rebecca Troy	d3d98f5ce9	cryptodev: support telemetry The cryptodev library now registers commands with telemetry, and implements the corresponding callback functions. These commands allow a list of cryptodevs to be queried, as well as info and stats for the corresponding cryptodev. An example usage can be seen below: Connecting to /var/run/dpdk/rte/dpdk_telemetry.v2 {"version": "DPDK 21.11.0-rc0", "pid": 1135019, "max_output_len": 16384} --> / {"/": ["/", "/cryptodev/info", "/cryptodev/list", "/cryptodev/stats", ...]} --> /cryptodev/list {"/cryptodev/list": [0,1,2,3]} --> /cryptodev/info,0 {"/cryptodev/info": {"device_name": "0000:1c:01.0_qat_sym", \ "max_nb_queue_pairs": 2}} --> /cryptodev/stats,0 {"/cryptodev/stats": {"enqueued_count": 0, "dequeued_count": 0, \ "enqueue_err_count": 0, "dequeue_err_count": 0}} Signed-off-by: Rebecca Troy <rebecca.troy@intel.com> Acked-by: Ciara Power <ciara.power@intel.com> Acked-by: Akhil Goyal <gakhil@marvell.com>	2021-11-04 19:43:14 +01:00
Maxime Coquelin	ab4bb42406	vhost: rename driver callbacks struct As previously announced, this patch renames struct vhost_device_ops to struct rte_vhost_device_ops. Signed-off-by: Maxime Coquelin <maxime.coquelin@redhat.com> Reviewed-by: Chenbo Xia <chenbo.xia@intel.com>	2021-11-03 11:59:27 +01:00
Dmitry Kozlyuk	2c9cd45de7	ethdev: add capability to keep shared objects on restart rte_flow_action_handle_create() did not mention what happens with an indirect action when a device is stopped and started again. It is natural for some indirect actions, like counter, to be persistent. Keeping others at least saves application time and complexity. However, not all PMDs can support it, or the support may be limited by particular action kinds, that is, combinations of action type and the value of the transfer bit in its configuration. Add a device capability to indicate if at least some indirect actions are kept across the above sequence. Without this capability the behavior is still unspecified, and the application is required to destroy the indirect actions before stopping the device. In the future, indirect actions may not be the only type of objects shared between flow rules. The capability bit intends to cover all possible types of such objects, hence its name. Declare that the application can test for the persistence of a particular indirect action kind by attempting to create an indirect action of that kind when the device is stopped and checking for the specific error type. This is logical because if the PMD can create an indirect action when the device is not started and use it after the start happens, it is natural that it can move its internal flow shared object to the same state when the device is stopped and restore the state when the device is started. Indirect action persistence across reconfigurations is not required. In case a PMD cannot keep the indirect actions across reconfiguration, it is allowed just to report an error. Application must then flush the indirect actions before attempting it. Signed-off-by: Dmitry Kozlyuk <dkozlyuk@nvidia.com> Acked-by: Ori Kam <orika@nvidia.com> Acked-by: Andrew Rybchenko <andrew.rybchenko@oktetlabs.ru>	2021-11-02 18:59:17 +01:00
Dmitry Kozlyuk	1d5a3d68c0	ethdev: add capability to keep flow rules on restart Previously, it was not specified what happens to the flow rules when the device is stopped, possibly reconfigured, then started. If flow rules were kept, it could be convenient for application developers, because they wouldn't need to save and restore them. However, due to the number of flows and possible creation rate it is impractical to save all flow rules in DPDK layer. This means that flow rules persistence really depends on whether PMD and HW can implement it efficiently. It can also be limited by the rule item and action types, and its attributes transfer bit (a combination of an item/action type and a value of the transfer bit is called a rule feature). Add a device capability bit for PMDs that can keep at least some of the flow rules across restart. Without this capability, behavior is still unspecified and it is declared that the application must flush the rules before stopping the device. Allow the application to test for persistence of rules using a particular feature by attempting to create a flow rule using that feature when the device is stopped and checking for the specific error. This is logical because if the PMD can create the flow rule when the device is not started and use it after the start happens, it is natural that it can move its internal flow rule object to the same state when the device is stopped and restore the state when the device is started. Rule persistence across reconfigurations is not required, because tracking all the rules and configuration-dependent resources they use may be infeasible. In case a PMD cannot keep the rules across reconfiguration, it is allowed just to report an error. Application must then flush the rules before attempting it. Signed-off-by: Dmitry Kozlyuk <dkozlyuk@nvidia.com> Acked-by: Ori Kam <orika@nvidia.com> Acked-by: Andrew Rybchenko <andrew.rybchenko@oktetlabs.ru>	2021-11-02 18:59:17 +01:00
Wojciech Liguzinski	44c730b0e3	sched: add PIE based congestion management Implement PIE based congestion management based on rfc8033. The Proportional Integral Controller Enhanced (PIE) algorithm works by proactively dropping packets randomly. PIE is implemented as more advanced queue management is required to address the bufferbloat problem and provide desirable quality of service to users. Tests for PIE code added to test application. Added PIE related information to documentation. Signed-off-by: Wojciech Liguzinski <wojciechx.liguzinski@intel.com> Acked-by: Cristian Dumitrescu <cristian.dumitrescu@intel.com> Acked-by: Jasvinder Singh <jasvinder.singh@intel.com>	2021-11-04 15:41:49 +01:00
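The core of PIE, as specified in RFC 8033, is a periodic update of the drop probability from the current and previous queueing delay. The sketch below shows only that update and the random drop decision. It is not the code added to the sched library: the names are hypothetical, delays are in seconds, and the auto-tuning of the step size and the burst allowance from the RFC are deliberately left out.

```c
#include <assert.h>

/* Hypothetical per-queue PIE state. */
struct pie_state {
	double drop_prob;  /* current drop probability in [0, 1] */
	double qdelay_old; /* queueing delay seen at the previous update */
};

/*
 * Periodic update: move the drop probability by
 *   alpha * (qdelay - target) + beta * (qdelay - qdelay_old)
 * and clamp the result to [0, 1]. Returns the new probability.
 */
static double
pie_update(struct pie_state *s, double qdelay, double target,
	   double alpha, double beta)
{
	double delta = alpha * (qdelay - target) +
		       beta * (qdelay - s->qdelay_old);

	s->drop_prob += delta;
	if (s->drop_prob < 0.0)
		s->drop_prob = 0.0;
	if (s->drop_prob > 1.0)
		s->drop_prob = 1.0;
	s->qdelay_old = qdelay;
	return s->drop_prob;
}

/* Enqueue-time decision: u is a uniform random number in [0, 1). */
static int
pie_should_drop(const struct pie_state *s, double u)
{
	return u < s->drop_prob;
}
```

The first term pushes the probability up while the delay is above the target, and the second term reacts to the trend, so a queue whose delay is already falling is penalized less.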
Vladimir Medvedkin	31d7c06947	hash: add bulk Toeplitz hash implementation This patch adds a bulk version for the Toeplitz hash implemented with Galois Field New Instructions (GFNI). Signed-off-by: Vladimir Medvedkin <vladimir.medvedkin@intel.com> Acked-by: Konstantin Ananyev <konstantin.ananyev@intel.com>	2021-11-04 11:19:10 +01:00
Vladimir Medvedkin	4fd8c4cb0d	hash: add new Toeplitz hash implementation This patch adds a new Toeplitz hash implementation using Galois Field New Instructions (GFNI). Signed-off-by: Vladimir Medvedkin <vladimir.medvedkin@intel.com> Acked-by: Konstantin Ananyev <konstantin.ananyev@intel.com>	2021-11-04 11:19:10 +01:00
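For reference, the hash that these two commits accelerate can be written as a bit-by-bit scalar loop. This is a sketch of the textbook Toeplitz algorithm, not the GFNI code from the patches; the function name `toeplitz_hash` and its return-0-on-short-key convention are choices made here for illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Scalar Toeplitz hash: for every set bit k of the input (counted from
 * the most significant bit of the first byte), XOR in the 32-bit window
 * of the key that starts at key bit k.
 *
 * The key must be at least data_len + 4 bytes long; 0 is returned
 * otherwise (a convention of this sketch only).
 */
static uint32_t
toeplitz_hash(const uint8_t *key, size_t key_len,
	      const uint8_t *data, size_t data_len)
{
	uint32_t hash = 0;
	uint32_t window;

	if (key_len < data_len + 4)
		return 0;

	/* Window initially covers key bits 0..31. */
	window = ((uint32_t)key[0] << 24) | ((uint32_t)key[1] << 16) |
		 ((uint32_t)key[2] << 8) | (uint32_t)key[3];

	for (size_t i = 0; i < data_len; i++) {
		for (int bit = 7; bit >= 0; bit--) {
			if (data[i] & (1u << bit))
				hash ^= window;
			/* Slide the window one bit, pulling in the next key bit. */
			window = (window << 1) |
				 ((uint32_t)(key[i + 4] >> bit) & 1u);
		}
	}
	return hash;
}
```

Because the result is a XOR of key windows selected by input bits, the hash is linear over XOR: hashing `a ^ b` gives the same value as XORing the hashes of `a` and `b`. That linearity is what makes a matrix-style formulation with Galois field instructions possible.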
Zhihong Peng	6cc51b1293	mem: instrument allocator for ASan This patch adds necessary hooks in the memory allocator for ASan. This feature is currently available in DPDK only on Linux x86_64. If other OS/architectures want to support it, ASAN_SHADOW_OFFSET must be defined and RTE_MALLOC_ASAN must be set accordingly in meson. Signed-off-by: Xueqin Lin <xueqin.lin@intel.com> Signed-off-by: Zhihong Peng <zhihongx.peng@intel.com> Acked-by: Anatoly Burakov <anatoly.burakov@intel.com>	2021-10-29 16:25:03 +02:00
Zhihong Peng	6e0290250d	build: enable AddressSanitizer AddressSanitizer [1] a.k.a. ASan is a widely-used debugging tool to detect memory access errors. It helps to detect issues like use-after-free, various kinds of buffer overruns in C/C++ programs, and other similar errors, as well as printing out detailed debug information whenever an error is detected. ASan is integrated with gcc and clang and can be enabled via a meson option: -Db_sanitize=address See the documentation for details (especially regarding clang). Enabling ASan has an impact on performance since additional checks are added to generated binaries. Enabling ASan with Windows is currently not supported in DPDK. 1: https://github.com/google/sanitizers/wiki/AddressSanitizer Signed-off-by: Xueqin Lin <xueqin.lin@intel.com> Signed-off-by: Zhihong Peng <zhihongx.peng@intel.com> Acked-by: John McNamara <john.mcnamara@intel.com>	2021-10-29 15:25:34 +02:00
Olivier Matz	daa02b5cdd	mbuf: add namespace to offload flags Fix the mbuf offload flags namespace by adding an RTE_ prefix to the name. The old flags remain usable, but a deprecation warning is issued at compilation. Signed-off-by: Olivier Matz <olivier.matz@6wind.com> Acked-by: Andrew Rybchenko <andrew.rybchenko@oktetlabs.ru> Acked-by: Ajit Khaparde <ajit.khaparde@broadcom.com> Acked-by: Somnath Kotur <somnath.kotur@broadcom.com>	2021-10-24 13:37:43 +02:00
Ferruh Yigit	295968d174	ethdev: add namespace Add 'RTE_ETH' namespace to all enums & macros in a backward compatible way. The macros for backward compatibility can be removed in next LTS. Also updated some struct names to have 'rte_eth' prefix. All internal components switched to using new names. Syntax fixed on lines that this patch touches. Signed-off-by: Ferruh Yigit <ferruh.yigit@intel.com> Acked-by: Tyler Retzlaff <roretzla@linux.microsoft.com> Acked-by: Andrew Rybchenko <andrew.rybchenko@oktetlabs.ru> Acked-by: Ajit Khaparde <ajit.khaparde@broadcom.com> Acked-by: Jerin Jacob <jerinj@marvell.com> Acked-by: Wisam Jaddo <wisamm@nvidia.com> Acked-by: Rosen Xu <rosen.xu@intel.com> Acked-by: Chenbo Xia <chenbo.xia@intel.com> Acked-by: Hemant Agrawal <hemant.agrawal@nxp.com> Acked-by: Somnath Kotur <somnath.kotur@broadcom.com>	2021-10-22 18:15:38 +02:00
Xueming Li	dd22740cc2	ethdev: introduce shared Rx queue In current DPDK framework, each Rx queue is pre-loaded with mbufs to save incoming packets. For some PMDs, when number of representors scale out in a switch domain, the memory consumption became significant. Polling all ports also leads to high cache miss, high latency and low throughput. This patch introduces shared Rx queue. Ports in same Rx domain and switch domain could share Rx queue set by specifying non-zero sharing group in Rx queue configuration. Shared Rx queue is identified by share_rxq field of Rx queue configuration. Port A RxQ X can share RxQ with Port B RxQ Y by using same shared Rx queue ID. No special API is defined to receive packets from shared Rx queue. Polling any member port of a shared Rx queue receives packets of that queue for all member ports, port_id is identified by mbuf->port. PMD is responsible to resolve shared Rx queue from device and queue data. Shared Rx queue must be polled in same thread or core, polling a queue ID of any member port is essentially same. Multiple share groups are supported. PMD should support mixed configuration by allowing multiple share groups and non-shared Rx queue on one port. Example grouping and polling model to reflect service priority: Group1, 2 shared Rx queues per port: PF, rep0, rep1 Group2, 1 shared Rx queue per port: rep2, rep3, ... rep127 Core0: poll PF queue0 Core1: poll PF queue1 Core2: poll rep2 queue0 PMD advertise shared Rx queue capability via RTE_ETH_DEV_CAPA_RXQ_SHARE. PMD is responsible for shared Rx queue consistency checks to avoid member port's configuration contradict each other. Signed-off-by: Xueming Li <xuemingl@nvidia.com> Reviewed-by: Andrew Rybchenko <andrew.rybchenko@oktetlabs.ru> Acked-by: Ajit Khaparde <ajit.khaparde@broadcom.com>	2021-10-22 00:08:50 +02:00
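Since a burst polled from one member port of a shared Rx queue can carry packets of every member port, the application has to sort packets by their source port. The following plain C sketch models that step only. The `pkt` struct stands in for the `port` field of an mbuf, and `MAX_PORTS` and `demux_burst` are hypothetical names; no ethdev call is shown.

```c
#include <assert.h>
#include <stdint.h>

#define MAX_PORTS 8 /* hypothetical upper bound on member ports */

/* Stand-in for the single mbuf field the commit message relies on. */
struct pkt {
	uint16_t port; /* source port, as reported in mbuf->port */
};

/*
 * Sort one burst, polled from any member port, into per-port counters.
 * Returns the number of packets whose port was out of range.
 */
static unsigned int
demux_burst(const struct pkt *burst, unsigned int n,
	    unsigned int per_port[MAX_PORTS])
{
	unsigned int unknown = 0;

	for (unsigned int i = 0; i < n; i++) {
		if (burst[i].port < MAX_PORTS)
			per_port[burst[i].port]++;
		else
			unknown++;
	}
	return unknown;
}
```

In a real application the counters would be replaced by per-port processing, but the dispatch on the packet's own port field stays the same.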
Jie Wang	3a929df1f2	ethdev: support L2TPv2 and PPP protocol Added flow pattern items and header formats of L2TPv2 and PPP. Signed-off-by: Wenjun Wu <wenjun1.wu@intel.com> Signed-off-by: Jie Wang <jie1x.wang@intel.com> Acked-by: Ori Kam <orika@nvidia.com> Acked-by: Andrew Rybchenko <andrew.rybchenko@oktetlabs.ru> Reviewed-by: Ferruh Yigit <ferruh.yigit@intel.com>	2021-10-21 14:15:59 +02:00
Kevin Laatz	280c3ca02c	dma/idxd: add operation statistic tracking Add statistic tracking for DSA devices. The dmadev library documentation is also updated to add a generic section for using the library's statistics APIs. Signed-off-by: Bruce Richardson <bruce.richardson@intel.com> Signed-off-by: Kevin Laatz <kevin.laatz@intel.com> Reviewed-by: Conor Walsh <conor.walsh@intel.com> Reviewed-by: Chengwen Feng <fengchengwen@huawei.com>	2021-10-22 22:40:59 +02:00
Kevin Laatz	3d36a0a1c7	dma/idxd: add data path job submission Add data path functions for enqueuing and submitting operations to DSA devices. Documentation updates are included for dmadev library and IDXD driver docs as appropriate. Signed-off-by: Bruce Richardson <bruce.richardson@intel.com> Signed-off-by: Kevin Laatz <kevin.laatz@intel.com> Reviewed-by: Conor Walsh <conor.walsh@intel.com> Reviewed-by: Chengwen Feng <fengchengwen@huawei.com>	2021-10-22 22:40:59 +02:00
Stephen Hemminger	cbb44143be	app/dumpcap: add new packet capture application This is a new packet capture application to replace existing pdump. The new application works like the Wireshark dumpcap program and supports the pdump API features. It is not complete yet; some features, such as filtering, are not implemented. Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>	2021-10-22 22:40:58 +02:00
Stephen Hemminger	10f726efe2	pdump: support pcapng and filtering This enhances the DPDK pdump library to support the new pcapng format and filtering via BPF. The internal client/server protocol is changed to support two versions: the original pdump basic version and a new pcapng version. The internal version number (not part of exposed API or ABI) is intentionally increased to cause any attempt to try mismatched primary/secondary process to fail. Add new API to allow filtering of captured packets with DPDK BPF (eBPF) filter program. It keeps statistics on packets captured, filtered, and missed (because ring was full). Signed-off-by: Stephen Hemminger <stephen@networkplumber.org> Acked-by: Reshma Pattan <reshma.pattan@intel.com> Acked-by: Ray Kinsella <mdr@ashroe.eu>	2021-10-22 22:07:48 +02:00
Stephen Hemminger	8d23ce8f5e	pcapng: add new library for writing pcapng files This is utility library for writing pcapng format files used by Wireshark family of utilities. Older tcpdump also knows how to read (but not write) this format. See https://github.com/pcapng/pcapng/ Signed-off-by: Stephen Hemminger <stephen@networkplumber.org> Acked-by: Reshma Pattan <reshma.pattan@intel.com> Acked-by: Ray Kinsella <mdr@ashroe.eu>	2021-10-22 17:19:07 +02:00
Naga Harish K S V	b06bca69b7	eventdev/eth_rx: add per-queue event buffer Added per queue buffer. To configure per queue event buffer size, application sets rte_event_eth_rx_adapter_params::use_queue_event_buf flag as true while using rte_event_eth_rx_adapter_create_with_params(). The per queue event buffer size is populated in rte_event_eth_rx_adapter_queue_conf::event_buf_size and passed to rte_event_eth_rx_adapter_queue_add(). Signed-off-by: Naga Harish K S V <s.v.naga.harish.k@intel.com> Acked-by: Jay Jayatheerthan <jay.jayatheerthan@intel.com>	2021-10-21 10:14:50 +02:00
Naga Harish K S V	bc0df25c83	eventdev/eth_rx: add event buffer size configurability Currently the event buffer is a static array with a default size defined internally. To configure event buffer size from application, rte_event_eth_rx_adapter_create_with_params() API is added which takes struct rte_event_eth_rx_adapter_params to configure event buffer size in addition to other params. The event buffer size is rounded up for better buffer utilization and performance. In case of NULL params argument, default event buffer size is used. Signed-off-by: Naga Harish K S V <s.v.naga.harish.k@intel.com> Signed-off-by: Ganapati Kundapura <ganapati.kundapura@intel.com> Acked-by: Jay Jayatheerthan <jay.jayatheerthan@intel.com> Acked-by: Jerin Jacob <jerinj@marvell.com>	2021-10-21 10:14:50 +02:00
Ganapati Kundapura	da781e6488	eventdev/eth_rx: support Rx queue config get Added rte_event_eth_rx_adapter_queue_conf_get() API to get rx queue information - event queue identifier, flags for handling received packets, scheduler type, event priority, polling frequency of the receive queue and flow identifier in rte_event_eth_rx_adapter_queue_conf structure Signed-off-by: Ganapati Kundapura <ganapati.kundapura@intel.com> Acked-by: Jay Jayatheerthan <jay.jayatheerthan@intel.com> Acked-by: Jerin Jacob <jerinj@marvell.com>	2021-10-21 10:14:50 +02:00
Pavan Nikhilesh	929ebdd543	eventdev/eth_rx: simplify event vector config Include vector configuration into the structure ``rte_event_eth_rx_adapter_queue_conf`` that is used to configure Rx adapter ethernet device Rx queue parameters. This simplifies event vector configuration as it avoids splitting configuration per Rx queue. Signed-off-by: Pavan Nikhilesh <pbhagavatula@marvell.com> Acked-by: Jay Jayatheerthan <jay.jayatheerthan@intel.com> Acked-by: Ray Kinsella <mdr@ashroe.eu> Acked-by: Jerin Jacob <jerinj@marvell.com>	2021-10-21 10:14:50 +02:00
David Marchand	1752b08781	test: rely on EAL detection for core list Cores count has a direct impact on the time needed to complete unit tests. Currently, the core list used for unit test is enforced to "all cores on the system" with no way for (CI) users to adapt it. On the other hand, EAL default behavior (when no -c/-l option gets passed) is to start threads on as many cores available in the process cpu affinity. Remove logic from meson: users can then select where to run the tests by either running meson with a custom cpu affinity (using taskset/cpuset depending on OS) or by passing a --test-args option to meson. Example: $ sudo meson test -C build --suite fast-tests -t 3 --test-args "-l 0-3" Signed-off-by: David Marchand <david.marchand@redhat.com> Tested-by: Bruce Richardson <bruce.richardson@intel.com> Acked-by: Bruce Richardson <bruce.richardson@intel.com> Acked-by: Aaron Conole <aconole@redhat.com>	2021-10-21 17:48:04 +02:00
Viacheslav Ovsiienko	dc4d860e8a	ethdev: introduce configurable flexible item 1. Introduction and Retrospective Nowadays the networks are evolving fast and wide, the network structures are getting more and more complicated, the new application areas are emerging. To address these challenges the new network protocols are continuously being developed, considered by technical communities, adopted by industry and, eventually implemented in hardware and software. The DPDK framework follows the common trends and if we bother to glance at the RTE Flow API header we see the multiple new items were introduced during the last years since the initial release. The new protocol adoption and implementation process is not straightforward and takes time, the new protocol passes development, consideration, adoption, and implementation phases. The industry tries to mitigate and address the forthcoming network protocols, for example, many hardware vendors are implementing flexible and configurable network protocol parsers. As DPDK developers, could we anticipate the near future in the same fashion and introduce the similar flexibility in RTE Flow API? Let's check what we already have merged in our project, and we see the nice raw item (rte_flow_item_raw). At the first glance, it looks superior and we can try to implement a flow matching on the header of some relatively new tunnel protocol, say on the GENEVE header with variable length options. 
And, under further consideration, we run into the raw item limitations: - only fixed size network header can be represented - the entire network header pattern of fixed format (header field offsets are fixed) must be provided - the search for patterns is not robust (the wrong matches might be triggered), and actually is not supported by existing PMDs - no explicitly specified relations with preceding and following items - no tunnel hint support As a result, implementing the support for tunnel protocols like aforementioned GENEVE with variable extra protocol option with flow raw item becomes very complicated and would require multiple flows and multiple raw items chained in the same flow (by the way, there is no support found for chained raw items in implemented drivers). This RFC introduces the dedicated flex item (rte_flow_item_flex) to handle matches with existing and new network protocol headers in a unified fashion. 2. Flex Item Life Cycle Let's assume there are the requirements to support the new network protocol with RTE Flows. What is given within protocol specification: - header format - header length, (can be variable, depending on options) - potential presence of extra options following or included in the header - the relations with preceding protocols. For example, the GENEVE follows UDP, eCPRI can follow either UDP or L2 header - the relations with following protocols. For example, the next layer after tunnel header can be L2 or L3 - whether the new protocol is a tunnel and the header is a splitting point between outer and inner layers The supposed way to operate with flex item: - application defines the header structures according to protocol specification - application calls rte_flow_flex_item_create() with desired configuration according to the protocol specification, it creates the flex item object over specified ethernet device and prepares PMD and underlying hardware to handle flex item. 
On item creation call PMD backing the specified ethernet device returns the opaque handle identifying the object that has been created - application uses the rte_flow_item_flex with obtained handle in the flows, the values/masks to match with fields in the header are specified in the flex item per flow as for regular items (except that pattern buffer combines all fields) - flows with flex items match with packets in a regular fashion, the values and masks for the new protocol header match are taken from the flex items in the flows - application destroys flows with flex items - application calls rte_flow_flex_item_release() as part of ethernet device API and destroys the flex item object in PMD and releases the engaged hardware resources 3. Flex Item Structure The flex item structure is intended to be used as part of the flow pattern like regular RTE flow items and provides the mask and value to match with fields of the protocol the item was configured for. struct rte_flow_item_flex { void *handle; uint32_t length; const uint8_t *pattern; }; The handle is some opaque object maintained on per device basis by underlying driver. The protocol header fields are considered as bit fields, all offsets and widths are expressed in bits. The pattern is the buffer containing the bit concatenation of all the fields presented at item configuration time, in the same order and same amount. If byte boundary alignment is needed an application can use a dummy type field, this is just some kind of gap filler. The length field specifies the pattern buffer length in bytes and is needed to allow rte_flow_copy() operations. The approach of multiple pattern pointers and lengths (per field) was considered and found clumsy - it seems to be much more suitable for the application to maintain the single structure within the single pattern buffer. 4. 
Flex Item Configuration The flex item configuration consists of the following parts: - header field descriptors: - next header - next protocol - sample to match - input link descriptors - output link descriptors The field descriptors tell the driver and hardware what data should be extracted from the packet and then control the packet handling in the flow engine. Besides this, sample fields can be presented to match with patterns in the flows. Each field is a bit pattern. It has width, offset from the header beginning, mode of offset calculation, and offset related parameters. The next header field is special, no data are actually taken from the packet, but its offset is used as a pointer to the next header in the packet, in other words the next header offset specifies the size of the header being parsed by flex item. There is one more special field - next protocol, it specifies where the next protocol identifier is contained and packet data sampled from this field will be used to determine the next protocol header type to continue packet parsing. The next protocol field is like eth_type field in MAC2, or proto field in IPv4/v6 headers. The sample fields are used to represent the data to be sampled from the packet and then matched with established flows. There are several methods supposed to calculate field offset in runtime depending on configuration and packet content: - FIELD_MODE_FIXED - fixed offset. The bit offset from header beginning is permanent and defined by field_base configuration parameter. - FIELD_MODE_OFFSET - the field bit offset is extracted from other header field (indirect offset field). The resulting field offset to match is calculated as: field_base + (*offset_base & offset_mask) << offset_shift This mode is useful to sample some extra options following the main header with field containing main header length. 
Also, this mode can be used to calculate offset to the next protocol header, for example - IPv4 header contains the 4-bit field with IPv4 header length expressed in dwords. One more example - this mode would allow us to skip GENEVE header variable length options. - FIELD_MODE_BITMASK - the field bit offset is extracted from other header field (indirect offset field), the latter is considered as bitmask containing some number of one bits, the resulting field offset to match is calculated as: field_base + bitcount(*offset_base & offset_mask) << offset_shift This mode would be useful to skip the GTP header and its extra options with specified flags. - FIELD_MODE_DUMMY - dummy field, optionally used for byte boundary alignment in pattern. Pattern mask and data are ignored in the match. All configuration parameters besides field size and offset are ignored. Note: "*" - means the indirect field offset is calculated and actual data are extracted from the packet by this offset (like data are fetched by pointer *p from memory). The offset mode list can be extended by vendors according to hardware supported options. The input link configuration section tells the driver after what protocols and at what conditions the flex item can follow. The input link specifies the preceding header pattern, for example for GENEVE it can be UDP item specifying match on destination port with value 6081. The flex item can follow multiple header types and multiple input links should be specified. At flow creation time the item with one of the input link types should precede the flex item and driver will select the correct flex item settings, depending on the actual flow pattern. The output link configuration section tells the driver how to continue packet parsing after the flex item protocol. 
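The offset calculation modes listed above can be illustrated with a small, self-contained sketch. It is an interpretation, not the PMD logic: it assumes the indirect offset field is a byte-aligned 32-bit big-endian word, and it reads the formulas as field_base + ((value & offset_mask) << offset_shift), with the shift applied before the addition. The struct `field_desc` and the helper names are hypothetical; only the FIELD_MODE_* names and parameter names come from the text.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

enum field_mode {
	FIELD_MODE_DUMMY,
	FIELD_MODE_FIXED,
	FIELD_MODE_OFFSET,
	FIELD_MODE_BITMASK,
};

/* Hypothetical descriptor holding the parameters named in the text. */
struct field_desc {
	enum field_mode mode;
	uint32_t field_base;   /* fixed part of the offset */
	uint32_t offset_base;  /* bit offset of the indirect offset field */
	uint32_t offset_mask;
	uint32_t offset_shift;
};

/* Read a byte-aligned big-endian 32-bit word at a bit offset (assumption). */
static int
read_be32(const uint8_t *hdr, size_t hdr_len, uint32_t bit_off, uint32_t *val)
{
	size_t byte_off = bit_off / 8;

	if (bit_off % 8 != 0 || hdr_len < 4 || byte_off > hdr_len - 4)
		return -1;
	*val = ((uint32_t)hdr[byte_off] << 24) |
	       ((uint32_t)hdr[byte_off + 1] << 16) |
	       ((uint32_t)hdr[byte_off + 2] << 8) |
	       (uint32_t)hdr[byte_off + 3];
	return 0;
}

/* Count the one bits in v. */
static uint32_t
bitcount(uint32_t v)
{
	uint32_t n = 0;

	while (v != 0) {
		v &= v - 1; /* clear the lowest set bit */
		n++;
	}
	return n;
}

/* Compute the field offset for one descriptor. Returns 0, or -1 on error. */
static int
field_offset(const struct field_desc *f, const uint8_t *hdr, size_t hdr_len,
	     uint32_t *out)
{
	uint32_t raw;

	if (f->offset_shift > 31)
		return -1;
	switch (f->mode) {
	case FIELD_MODE_DUMMY:
	case FIELD_MODE_FIXED:
		*out = f->field_base;
		return 0;
	case FIELD_MODE_OFFSET:
		if (read_be32(hdr, hdr_len, f->offset_base, &raw) != 0)
			return -1;
		*out = f->field_base + ((raw & f->offset_mask) << f->offset_shift);
		return 0;
	case FIELD_MODE_BITMASK:
		if (read_be32(hdr, hdr_len, f->offset_base, &raw) != 0)
			return -1;
		*out = f->field_base +
		       (bitcount(raw & f->offset_mask) << f->offset_shift);
		return 0;
	}
	return -1;
}
```

For a header whose first dword holds a length of 5, an OFFSET-mode field with offset_mask 0xFFFFFFFF and offset_shift 2 yields 20 under these assumptions, which is the header length converted from dwords to bytes.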
If multiple protocols can follow the flex item header, the flex item should contain the field with the next protocol identifier, and parsing will continue depending on the data contained in this field in the actual packet.

The flex item fields can participate in RSS hash calculation; a dedicated flag is present in the field description to specify which fields should be provided for hashing.

5. Flex Item Chaining

If multiple protocols are supposed to be supported with flex items in chained fashion (two or more flex items within the same flow, which might be neighbors in the pattern), the flex items reference each other. In this case, the item that occurs first should be created with an empty output link list or with a list including existing items, and then the second flex item should be created referencing the first flex item as an input arc. Drivers should adjust the item configuration.

Also, the hardware resources used by flex items to handle the packet can be limited. If multiple flex items are supposed to be used within the same flow, it helps to give the driver a hint that these flex items are intended for simultaneous usage. The fields of items should be assigned hint indices, and these indices from two or more flex items supposed to be provided within the same flow should be the same as well. In other words, the field hint index specifies the group of fields that can be matched simultaneously within a single flow. If hint indices are specified, the driver will try to engage non-overlapping hardware resources and provide independent handling of the field groups with unique indices. If the hint index is zero, the driver assigns resources on its own.

6. Example of New Protocol Handling

Let's suppose we have the requirement to handle a new tunnel protocol that follows a UDP header with destination port 0xFADE and is followed by a MAC header.
Let the new protocol header format be like this:

  struct new_protocol_header {
    rte_be32_t header_length; /* length in dwords, including options */
    rte_be32_t specific0;     /* some protocol data, no intention */
    rte_be32_t specific1;     /* to match in flows on these fields */
    rte_be32_t crucial;       /* data of interest, match is needed */
    rte_be32_t options[0];    /* optional protocol data, variable length */
  };

The supposed flex item configuration:

  struct rte_flow_item_flex_field field0 = {
    .field_mode = FIELD_MODE_DUMMY,  /* Affects match pattern only */
    .field_size = 96,                /* three dwords from the beginning */
  };
  struct rte_flow_item_flex_field field1 = {
    .field_mode = FIELD_MODE_FIXED,
    .field_size = 32,                /* Field size is one dword */
    .field_base = 96,                /* Skip three dwords from the beginning */
  };
  struct rte_flow_item_udp spec0 = {
    .hdr = {
      .dst_port = RTE_BE16(0xFADE),
    }
  };
  struct rte_flow_item_udp mask0 = {
    .hdr = {
      .dst_port = RTE_BE16(0xFFFF),
    }
  };
  struct rte_flow_item_flex_link link0 = {
    .item = {
      .type = RTE_FLOW_ITEM_TYPE_UDP,
      .spec = &spec0,
      .mask = &mask0,
    },
  };
  struct rte_flow_item_flex_conf conf = {
    .next_header = {
      .tunnel = FLEX_TUNNEL_MODE_SINGLE,
      .field_mode = FIELD_MODE_OFFSET,
      .field_base = 0,
      .offset_base = 0,
      .offset_mask = 0xFFFFFFFF,
      .offset_shift = 2              /* Expressed in dwords, shift left by 2 */
    },
    .sample = {
      &field0,
      &field1,
    },
    .nb_samples = 2,
    .input_link[0] = &link0,
    .nb_inputs = 1
  };

Let's suppose we have created the flex item successfully, and PMD returned the handle 0x123456789A.
We can use the following item pattern to match the crucial field in the packet with value 0x00112233:

  struct new_protocol_header spec_pattern = {
    .crucial = RTE_BE32(0x00112233),
  };
  struct new_protocol_header mask_pattern = {
    .crucial = RTE_BE32(0xFFFFFFFF),
  };
  struct rte_flow_item_flex spec_flex = {
    .handle = 0x123456789A,
    .length = sizeof(struct new_protocol_header),
    .pattern = &spec_pattern,
  };
  struct rte_flow_item_flex mask_flex = {
    .length = sizeof(struct new_protocol_header),
    .pattern = &mask_pattern,
  };
  struct rte_flow_item item_to_match = {
    .type = RTE_FLOW_ITEM_TYPE_FLEX,
    .spec = &spec_flex,
    .mask = &mask_flex,
  };

Signed-off-by: Viacheslav Ovsiienko <viacheslavo@nvidia.com>
Acked-by: Ori Kam <orika@nvidia.com>	2021-10-20 18:58:54 +02:00
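The offset modes described in the flex item commit message above can be sketched in plain C. This is an illustration only, not the driver or ethdev implementation: the `flex_field_desc` structure, the `FLEX_MODE_*` names, and `flex_field_offset()` are hypothetical, the indirect field is assumed to be a byte-aligned 32-bit big-endian word, and the formula is assumed to group as `field_base + ((value & offset_mask) << offset_shift)`. The dummy mode is left out because it does not take part in matching.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical mirror of the offset modes named in the commit message. */
enum flex_mode {
	FLEX_MODE_FIXED,
	FLEX_MODE_OFFSET,
	FLEX_MODE_BITMASK,
};

/* Hypothetical descriptor; it is not the rte_flow structure itself. */
struct flex_field_desc {
	enum flex_mode mode;
	uint32_t field_base;   /* fixed part of the offset */
	uint32_t offset_base;  /* bit position of the indirect field (assumed byte-aligned) */
	uint32_t offset_mask;  /* bits of the indirect field that are used */
	uint32_t offset_shift; /* left shift applied to the masked value (assumed < 32) */
};

/* Read a 32-bit big-endian word, as header fields appear on the wire. */
static uint32_t read_be32(const uint8_t *p)
{
	return ((uint32_t)p[0] << 24) | ((uint32_t)p[1] << 16) |
	       ((uint32_t)p[2] << 8) | (uint32_t)p[3];
}

/* Count the one bits in v (the "bitcount" of the commit message). */
static uint32_t popcount32(uint32_t v)
{
	uint32_t n = 0;

	while (v != 0) {
		v &= v - 1; /* clear the lowest set bit */
		n++;
	}
	return n;
}

/* Compute the field offset for one descriptor against a header buffer. */
uint32_t flex_field_offset(const struct flex_field_desc *d, const uint8_t *hdr)
{
	uint32_t raw;

	assert(d != NULL && hdr != NULL);
	switch (d->mode) {
	case FLEX_MODE_FIXED:
		return d->field_base;
	case FLEX_MODE_OFFSET:
		raw = read_be32(hdr + d->offset_base / 8) & d->offset_mask;
		return d->field_base + (raw << d->offset_shift);
	case FLEX_MODE_BITMASK:
		raw = read_be32(hdr + d->offset_base / 8) & d->offset_mask;
		return d->field_base + (popcount32(raw) << d->offset_shift);
	}
	return 0;
}
```

With the next header settings from the example (base 0, mask 0xFFFFFFFF, shift 2) and a header whose first dword holds 6, the sketch yields 0 + (6 << 2) = 24. In bitmask mode, a masked flags value of 0xB has three one bits, so base 128 with shift 2 yields 128 + (3 << 2) = 140.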
Viacheslav Ovsiienko	14fc81aed7	ethdev: update modify field flow action The generic modify field flow action introduced in [1] has some issues related to the immediate source operand: - immediate source can be presented either as an unsigned 64-bit integer or pointer to data pattern in memory. There was no explicit pointer field defined in the union. - the byte ordering for 64-bit integer was not specified. Many fields have shorter lengths and byte ordering is crucial. - how the bit offset is applied to the immediate source field was not defined and documented. - 64-bit integer size is not enough to provide IPv6 addresses. In order to cover the issues and exclude any ambiguities the following is done: - introduce the explicit pointer field in rte_flow_action_modify_data structure - replace the 64-bit unsigned integer with 16-byte array - update the modify field flow action documentation Appropriate deprecation notice has been removed. [1] commit 73b68f4c54a0 ("ethdev: introduce generic modify flow action") Signed-off-by: Viacheslav Ovsiienko <viacheslavo@nvidia.com> Acked-by: Ori Kam <orika@nvidia.com> Acked-by: Andrew Rybchenko <andrew.rybchenko@oktetlabs.ru>	2021-10-14 14:34:31 +02:00
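As a rough illustration of why the modify field commit above swaps the 64-bit integer for a 16-byte array, the sketch below stores a short field and a full IPv6 address in the same buffer. The `imm_operand` union and both helpers are hypothetical stand-ins, not the `rte_flow_action_modify_data` definition, and the choice of network byte order starting at the first byte is an assumption made for the example; the commit message only says the ordering was previously unspecified.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical stand-in for the immediate source operand. */
union imm_operand {
	uint8_t value[16]; /* byte array, wide enough for an IPv6 address */
	void *pvalue;      /* explicit pointer to a data pattern in memory */
};

/* Store a 16-bit value (for example a UDP port) in network byte order. */
void imm_set_be16(union imm_operand *op, uint16_t host_val)
{
	assert(op != NULL);
	memset(op->value, 0, sizeof(op->value));
	op->value[0] = (uint8_t)(host_val >> 8);   /* most significant byte first */
	op->value[1] = (uint8_t)(host_val & 0xFF);
}

/* Store a 128-bit IPv6 address, which a 64-bit integer could not hold. */
void imm_set_ipv6(union imm_operand *op, const uint8_t addr[16])
{
	assert(op != NULL && addr != NULL);
	memcpy(op->value, addr, 16);
}
```

Writing the bytes explicitly, instead of assigning an integer, keeps the layout independent of host endianness, which is the ambiguity the commit message calls out for short fields.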
Ivan Malov	9d2a349b38	ethdev: deprecate direction attributes in transfer flows Attributes "ingress" and "egress" can only apply unambiguously to non-"transfer" flows. In "transfer" flows, the standpoint is effectively shifted to the embedded switch. There can be many different endpoints connected to the switch, so the use of "ingress" / "egress" does not shed light on which endpoints precisely can be considered as traffic sources. Add relevant deprecation notices and suggest the use of precise traffic source items (PORT_REPRESENTOR and REPRESENTED_PORT). Signed-off-by: Ivan Malov <ivan.malov@oktetlabs.ru> Acked-by: Ori Kam <orika@nvidia.com> Acked-by: Andrew Rybchenko <andrew.rybchenko@oktetlabs.ru> Acked-by: Viacheslav Ovsiienko <viacheslavo@nvidia.com>	2021-10-13 22:59:26 +02:00
Ivan Malov	5da44faa80	ethdev: deprecate hard-to-use or ambiguous items and actions PF, VF and PHY_PORT require that applications have extra knowledge of the underlying NIC and thus are hard to use. Also, the corresponding items depend on the direction attribute (ingress / egress), which complicates their use in applications and interpretation in PMDs. The concept of PORT_ID is ambiguous as it doesn't say whether the port in question is an ethdev or the represented entity. Items and actions PORT_REPRESENTOR, REPRESENTED_PORT should be used instead. Signed-off-by: Ivan Malov <ivan.malov@oktetlabs.ru> Acked-by: Ori Kam <orika@nvidia.com> Acked-by: Andrew Rybchenko <andrew.rybchenko@oktetlabs.ru>	2021-10-13 22:59:26 +02:00
Ivan Malov	88caad251c	ethdev: add represented port action to flow API For use in "transfer" flows. Supposed to send matching traffic to the entity represented by the given ethdev, at embedded switch level. Such an entity can be a network (via a network port), a guest machine (via a VF) or another ethdev in the same application. Signed-off-by: Ivan Malov <ivan.malov@oktetlabs.ru> Acked-by: Ori Kam <orika@nvidia.com> Acked-by: Andrew Rybchenko <andrew.rybchenko@oktetlabs.ru>	2021-10-13 22:59:26 +02:00
Ivan Malov	8edb6bc026	ethdev: add port representor action to flow API For use in "transfer" flows. Supposed to send matching traffic to the given ethdev (to the application), at embedded switch level. Signed-off-by: Ivan Malov <ivan.malov@oktetlabs.ru> Acked-by: Ori Kam <orika@nvidia.com> Acked-by: Andrew Rybchenko <andrew.rybchenko@oktetlabs.ru>	2021-10-13 22:59:26 +02:00
Ivan Malov	49863ae2bf	ethdev: add represented port item to flow API For use in "transfer" flows. Supposed to match traffic entering the embedded switch from the entity represented by the given ethdev. Such an entity can be a network (via a network port), a guest machine (via a VF) or another ethdev in the same application. Must not be combined with direction attributes. Signed-off-by: Ivan Malov <ivan.malov@oktetlabs.ru> Acked-by: Ori Kam <orika@nvidia.com> Acked-by: Andrew Rybchenko <andrew.rybchenko@oktetlabs.ru>	2021-10-13 22:59:26 +02:00
Ivan Malov	081e42dab1	ethdev: add port representor item to flow API For use in "transfer" flows. Supposed to match traffic entering the embedded switch from the given ethdev. Must not be combined with direction attributes. Signed-off-by: Ivan Malov <ivan.malov@oktetlabs.ru> Acked-by: Ori Kam <orika@nvidia.com> Acked-by: Andrew Rybchenko <andrew.rybchenko@oktetlabs.ru>	2021-10-13 22:59:25 +02:00
Andrew Rybchenko	92ef4b8f16	ethdev: remove deprecated shared counter attribute Indirect actions should be used to do shared counters. Signed-off-by: Andrew Rybchenko <andrew.rybchenko@oktetlabs.ru> Acked-by: Thomas Monjalon <thomas@monjalon.net> Acked-by: Ajit Khaparde <ajit.khaparde@broadcom.com> Acked-by: Somnath Kotur <somnath.kotur@broadcom.com> Acked-by: Ori Kam <orika@nvidia.com> Acked-by: Matan Azrad <matan@nvidia.com>	2021-10-12 19:20:57 +02:00
Radu Nicolau	68977baa75	ipsec: support SA telemetry Add telemetry support for ipsec SAs. Signed-off-by: Declan Doherty <declan.doherty@intel.com> Signed-off-by: Radu Nicolau <radu.nicolau@intel.com> Signed-off-by: Abhijit Sinha <abhijit.sinha@intel.com> Signed-off-by: Daniel Martin Buckley <daniel.m.buckley@intel.com> Acked-by: Fan Zhang <roy.fan.zhang@intel.com> Acked-by: Konstantin Ananyev <konstantin.ananyev@intel.com> Acked-by: Akhil Goyal <gakhil@marvell.com>	2021-10-17 14:08:03 +02:00
Radu Nicolau	01eef5907f	ipsec: support NAT-T Add support for the IPsec NAT-Traversal use case for Tunnel mode packets. Signed-off-by: Declan Doherty <declan.doherty@intel.com> Signed-off-by: Radu Nicolau <radu.nicolau@intel.com> Signed-off-by: Abhijit Sinha <abhijit.sinha@intel.com> Signed-off-by: Daniel Martin Buckley <daniel.m.buckley@intel.com> Acked-by: Fan Zhang <roy.fan.zhang@intel.com> Acked-by: Konstantin Ananyev <konstantin.ananyev@intel.com> Acked-by: Akhil Goyal <gakhil@marvell.com>	2021-10-17 14:06:24 +02:00
Radu Nicolau	c99d26197c	ipsec: support more AEAD algorithms Added support for AES_CCM, CHACHA20_POLY1305 and AES_GMAC. Signed-off-by: Declan Doherty <declan.doherty@intel.com> Signed-off-by: Radu Nicolau <radu.nicolau@intel.com> Signed-off-by: Abhijit Sinha <abhijit.sinha@intel.com> Signed-off-by: Daniel Martin Buckley <daniel.m.buckley@intel.com> Acked-by: Fan Zhang <roy.fan.zhang@intel.com> Acked-by: Konstantin Ananyev <konstantin.ananyev@intel.com> Acked-by: Akhil Goyal <gakhil@marvell.com>	2021-10-17 14:03:13 +02:00
Andrew Rybchenko	cb77b060eb	mempool: add namespace to driver register macro Add RTE_ prefix to macro used to register mempool driver. The old one is still available but deprecated. Signed-off-by: Andrew Rybchenko <andrew.rybchenko@oktetlabs.ru> Acked-by: Olivier Matz <olivier.matz@6wind.com>	2021-10-20 10:00:18 +02:00
Chengwen Feng	91e581e5c9	dmadev: add data plane API This patch adds the data plane API for dmadev. Signed-off-by: Chengwen Feng <fengchengwen@huawei.com> Acked-by: Bruce Richardson <bruce.richardson@intel.com> Acked-by: Morten Brørup <mb@smartsharesystems.com> Reviewed-by: Kevin Laatz <kevin.laatz@intel.com> Reviewed-by: Conor Walsh <conor.walsh@intel.com>	2021-10-17 20:49:58 +02:00
Chengwen Feng	e0180db144	dmadev: add control plane API This patch adds the control plane API for dmadev. Signed-off-by: Chengwen Feng <fengchengwen@huawei.com> Acked-by: Bruce Richardson <bruce.richardson@intel.com> Acked-by: Morten Brørup <mb@smartsharesystems.com> Reviewed-by: Kevin Laatz <kevin.laatz@intel.com> Reviewed-by: Conor Walsh <conor.walsh@intel.com>	2021-10-17 20:49:58 +02:00

612 Commits