numam-dpdk

Author	SHA1	Message	Date
Maxime Coquelin	2cbe826e26	vhost: remove notion of async descriptor Now that IO vectors iterator have been simplified, the rte_vhost_async_desc struct only contains a pointer on the iterator array stored in the async metadata. This patch removes it, and pass directly the iterators array pointer to the transfer_data callback. Doing that, we avoid declaring the descriptor array in the stack, and also avoid the cost of filling it. Signed-off-by: Maxime Coquelin <maxime.coquelin@redhat.com> Reviewed-by: Jiayu Hu <jiayu.hu@intel.com>	2021-10-29 12:32:30 +02:00
Maxime Coquelin	d5d25cfd85	vhost: improve IO vector logic IO vectors and their iterators arrays were part of the async metadata but not their indexes. In order to make this more consistent, the patch adds the indexes to the async metadata. Doing that, we can avoid triggering DMA transfer within the loop as IO vector index overflow is now prevented in the async_mbuf_to_desc() function. Note that previous detection mechanism was broken since the overflow already happened when detected, so OOB memory access would already have happened. With these changes done, virtio_dev_rx_async_submit_split() and virtio_dev_rx_async_submit_packed() can be further simplified. Signed-off-by: Maxime Coquelin <maxime.coquelin@redhat.com> Reviewed-by: Jiayu Hu <jiayu.hu@intel.com>	2021-10-29 12:32:30 +02:00
Maxime Coquelin	0af9f99221	vhost: remove useless fields in async iterator struct Offset and count fields are unused and so can be removed. The offset field was actually in the Vhost example, but in a way that does not make sense. Signed-off-by: Maxime Coquelin <maxime.coquelin@redhat.com> Reviewed-by: Chenbo Xia <chenbo.xia@intel.com> Reviewed-by: Jiayu Hu <jiayu.hu@intel.com>	2021-10-29 12:32:30 +02:00
Maxime Coquelin	6171bfbfb2	vhost: introduce specific iovec structure This patch introduces rte_vhost_iovec struct that contains both source and destination addresses since we always have a 1:1 mapping between source and destination. While using the standard iovec struct might have seemed better, having to duplicate IO vectors and their iterators is memory inefficient and makes the implementation more complex. Signed-off-by: Maxime Coquelin <maxime.coquelin@redhat.com> Reviewed-by: Chenbo Xia <chenbo.xia@intel.com> Reviewed-by: Jiayu Hu <jiayu.hu@intel.com>	2021-10-29 12:32:30 +02:00
Maxime Coquelin	8b3fc5a213	vhost: remove async batch threshold Reaching the async batch threshold was one of the conditions to trigger the DMA transfer. However, this condition was never met since the threshold value is 32, same as the MAX_PKT_BURST value. Signed-off-by: Maxime Coquelin <maxime.coquelin@redhat.com> Reviewed-by: Chenbo Xia <chenbo.xia@intel.com> Reviewed-by: Jiayu Hu <jiayu.hu@intel.com>	2021-10-29 12:32:30 +02:00
Maxime Coquelin	3fe629547e	vhost: simplify async IO vectors iterators This patch splits the iterator arrays in two, one for source and one for destination. The goal is to make the code easier to understand. Signed-off-by: Maxime Coquelin <maxime.coquelin@redhat.com> Reviewed-by: Chenbo Xia <chenbo.xia@intel.com> Reviewed-by: Jiayu Hu <jiayu.hu@intel.com>	2021-10-29 12:32:30 +02:00
Maxime Coquelin	97064162d4	vhost: simplify async IO vectors IO vectors implementation is unnecessarily complex, mixing source and destination vectors in the same array. This patch declares two arrays, one for the source and one for the destination. It also gets rid of seg_awaits variable in both packed and split implementations, which is the same as iovec_idx. Signed-off-by: Maxime Coquelin <maxime.coquelin@redhat.com> Reviewed-by: Chenbo Xia <chenbo.xia@intel.com> Reviewed-by: Jiayu Hu <jiayu.hu@intel.com>	2021-10-29 12:32:30 +02:00
Maxime Coquelin	5f89c5e1e9	vhost: hide in-flight async structure This patch moves async_inflight_info struct to internal header since it should not be part of the API. Signed-off-by: Maxime Coquelin <maxime.coquelin@redhat.com> Reviewed-by: Chenbo Xia <chenbo.xia@intel.com> Reviewed-by: Jiayu Hu <jiayu.hu@intel.com>	2021-10-29 12:32:30 +02:00
Maxime Coquelin	ee8024b3d4	vhost: move async data in dedicated structure This patch moves async-related metadata from vhost_virtqueue to a dedicated struct. It makes it clear which fields are async related, and also saves some memory when async feature is not in use. Signed-off-by: Maxime Coquelin <maxime.coquelin@redhat.com> Reviewed-by: Chenbo Xia <chenbo.xia@intel.com> Reviewed-by: Jiayu Hu <jiayu.hu@intel.com>	2021-10-29 12:32:30 +02:00
Miao Li	c6e305141a	power: support missing Rx queue info Since some vdevs like virtio and vhost do not support rxq_info_get and queue state inquiry, the error return value -ENOTSUP needs to be ignored when queue_stopped cannot get rx queue information and rx queue state. This patch changes the return value of queue_stopped when rte_eth_rx_queue_info_get returns -ENOTSUP, so that vdevs which cannot provide rx queue information and rx queue state can enable power management. Fixes: `209fd58545` ("power: make ethdev power management thread unsafe") Cc: stable@dpdk.org Signed-off-by: Miao Li <miao.li@intel.com> Acked-by: Anatoly Burakov <anatoly.burakov@intel.com> Reviewed-by: Chenbo Xia <chenbo.xia@intel.com>	2021-10-29 12:32:29 +02:00
Miao Li	34fd4373ce	vhost: add power monitor API This commit defines rte_vhost_power_monitor_cond which is used to pass some information to vhost driver. The information includes the address to monitor, the expected value, the mask to extract the value read from 'addr', the value size of the monitored address, and the match flag used to distinguish whether the value is used to match something or not match something. Vhost driver can use this information to fill rte_power_monitor_cond. Signed-off-by: Miao Li <miao.li@intel.com> Reviewed-by: Chenbo Xia <chenbo.xia@intel.com> Acked-by: David Hunt <david.hunt@intel.com>	2021-10-29 12:32:29 +02:00
Xuan Ding	5fd6e93b7e	vhost: remove async DMA map status Async DMA map status flag was added to prevent the unnecessary unmap when DMA devices bound to kernel driver. This brings maintenance cost for a lot of code. This patch removes the DMA map status by using rte_errno instead. This patch relies on the following patch to fix a partial unmap check in vfio unmapping API. [1] https://www.mail-archive.com/dev@dpdk.org/msg226464.html Signed-off-by: Xuan Ding <xuan.ding@intel.com> Reviewed-by: Maxime Coquelin <maxime.coquelin@redhat.com>	2021-10-29 12:32:22 +02:00
David Marchand	e7c727c307	net: fix build with sparse on L2TPv2 bitfields An external project that wants to do additional checks on fields endianness can remap rte_beXX types to instrumented types and use sparse. The current code breaks OVS build with sparse: ../../lib/ofp-packet.c: note: in included file (through .../ovs/dpdk-dir/build/include/rte_flow.h, ../../lib/netdev-dpdk.h, ../../lib/dp-packet.h): .../ovs/dpdk-dir/build/include/rte_l2tpv2.h:92:37: error: invalid bitfield specifier for type restricted ovs_be16. .../ovs/dpdk-dir/build/include/rte_l2tpv2.h:93:37: error: invalid bitfield specifier for type restricted ovs_be16. .../ovs/dpdk-dir/build/include/rte_l2tpv2.h:94:40: error: invalid bitfield specifier for type restricted ovs_be16. .../ovs/dpdk-dir/build/include/rte_l2tpv2.h:95:37: error: invalid bitfield specifier for type restricted ovs_be16. .../ovs/dpdk-dir/build/include/rte_l2tpv2.h:96:40: error: invalid bitfield specifier for type restricted ovs_be16. .../ovs/dpdk-dir/build/include/rte_l2tpv2.h:97:37: error: invalid bitfield specifier for type restricted ovs_be16. .../ovs/dpdk-dir/build/include/rte_l2tpv2.h:98:37: error: invalid bitfield specifier for type restricted ovs_be16. .../ovs/dpdk-dir/build/include/rte_l2tpv2.h:99:40: error: invalid bitfield specifier for type restricted ovs_be16. .../ovs/dpdk-dir/build/include/rte_l2tpv2.h:100:39: error: invalid bitfield specifier for type restricted ovs_be16. make[3]: *** [lib/ofp-packet.lo] Error 1 Use simple uint16_t types for bitfields in L2TPv2 struct. Fixes: `3a929df1f2` ("ethdev: support L2TPv2 and PPP procotol") Signed-off-by: David Marchand <david.marchand@redhat.com> Reviewed-by: Ferruh Yigit <ferruh.yigit@intel.com>	2021-10-28 20:28:01 +02:00
David Marchand	41f2f05574	ethdev: warn once when using port not ready Warning continuously is a pain when developing or if a unit test is/gets broken. It could also be a problem if application behaves badly only in some corner cases and a DoS results of those logs being continuously displayed. Let's warn once per port and per rx/tx. Getting such a log is scary, but let's make it more eye catching by dumping a backtrace with it. Tested by introducing a bug in testpmd: static int eth_dev_start_mp(uint16_t port_id) { - if (is_proc_primary()) + if (!is_proc_primary()) return rte_eth_dev_start(port_id); return 0; Then, running a basic null test: $ ./devtools/test-null.sh ... Start automatic packet forwarding io packet forwarding - ports=2 - cores=1 - streams=2 - NUMA support enabled, MP allocation mode: native Logical Core 1 (socket 0) forwards packets on 2 streams: RX P=0/Q=0 (socket 0) -> TX P=1/Q=0 (socket 0) peer=02:00:00:00:00:01 RX P=1/Q=0 (socket 0) -> TX P=0/Q=0 (socket 0) peer=02:00:00:00:00:00 lcore 0 called rx_pkt_burst for not ready port 0 8: [build/app/dpdk-testpmd() [0x59e839]] 7: [/lib64/libc.so.6(__libc_start_main+0xf5) [0x7ff481b69555]] 6: [build/app/dpdk-testpmd(main+0x54b) [0x662d24]] 5: [build/app/dpdk-testpmd(start_packet_forwarding+0x263) [0x65e795]] 4: [build/app/dpdk-testpmd() [0x65e1be]] 3: [build/app/dpdk-testpmd() [0x65a996]] 2: [build/app/dpdk-testpmd() [0xa6cbc7]] 1: [build/app/dpdk-testpmd(rte_dump_stack+0x27) [0xaee796]] lcore 0 called rx_pkt_burst for not ready port 1 8: [build/app/dpdk-testpmd() [0x59e839]] 7: [/lib64/libc.so.6(__libc_start_main+0xf5) [0x7ff481b69555]] 6: [build/app/dpdk-testpmd(main+0x54b) [0x662d24]] 5: [build/app/dpdk-testpmd(start_packet_forwarding+0x263) [0x65e795]] 4: [build/app/dpdk-testpmd() [0x65e1be]] 3: [build/app/dpdk-testpmd() [0x65a996]] 2: [build/app/dpdk-testpmd() [0xa6cbc7]] 1: [build/app/dpdk-testpmd(rte_dump_stack+0x27) [0xaee796]] io packet forwarding packets/burst=32 nb forwarding cores=1 - nb forwarding ports=2 port 0: RX queue number: 1 Tx queue number: 1 Rx offloads=0x0 Tx offloads=0x0 Fixes: `c87d435a4d` ("ethdev: copy fast-path API into separate structure") Signed-off-by: David Marchand <david.marchand@redhat.com> Acked-by: Konstantin Ananyev <konstantin.ananyev@intel.com> Tested-by: Konstantin Ananyev <konstantin.ananyev@intel.com> Acked-by: Thomas Monjalon <thomas@monjalon.net>	2021-10-27 19:28:45 +02:00
Olivier Matz	9bffc92850	mem: fix dynamic hugepage mapping in container Since its introduction in 2018, the SIGBUS handler was never registered, and all related functions were unused. A SIGBUS can be received by the application when accessing hugepages even if mmap() was successful. This happens especially when running inside containers when there are not enough hugepages. In this case, we need to recover. A similar scheme can be found in eal_memory.c. Fixes: `582bed1e1d` ("mem: support mapping hugepages at runtime") Cc: stable@dpdk.org Signed-off-by: Olivier Matz <olivier.matz@6wind.com> Reviewed-by: Maxime Coquelin <maxime.coquelin@redhat.com> Acked-by: David Marchand <david.marchand@redhat.com>	2021-11-05 15:28:55 +01:00
Ilyes Ben Hamouda	770d41bf33	malloc: fix allocation with unknown socket ID When using rte_malloc() from a thread which is not bound to a numa socket (the typical case is a control thread, but it can also happen on a dataplane thread if its cpu affinity is on cores attached to several sockets), the used heap is the one from numa socket 0, which may not have available memory. Fix this by selecting the first socket which has available memory. Note: malloc_get_numa_socket() is only used from one .c file, so move it there, and remove the inline keyword. Fixes: `b94580d688` ("malloc: avoid unknown socket id") Cc: stable@dpdk.org Signed-off-by: Ilyes Ben Hamouda <ilyes.ben_hamouda@6wind.com> Signed-off-by: Olivier Matz <olivier.matz@6wind.com> Acked-by: David Marchand <david.marchand@redhat.com>	2021-11-05 15:28:49 +01:00
David Hunt	bb0bd346d5	eal: suggest using --lcores option If the user requests to use an lcore above 128 using -l, the eal will exit with "EAL: invalid core list syntax" and very little else useful information. This patch adds some extra information suggesting to use --lcores so that physical cores above RTE_MAX_LCORE (default 128) can be used. This is achieved by using the --lcores option by mapping the logical cores in the application to physical cores. For example, if "-l 12-16,130,132" is used, we see the following additional output on the command line: EAL: lcore 132 >= RTE_MAX_LCORE (128) EAL: lcore 133 >= RTE_MAX_LCORE (128) EAL: To use high physical core ids, please use --lcores to map them to lcore ids below RTE_MAX_LCORE, EAL: e.g. --lcores 0@12,1@13,2@14,3@15,4@16,5@132,6@133 The same is added to -c option parsing. For example, if "-c 0x300000000000000000000000000000000" is used, we see the following additional output on the command line: EAL: lcore 128 >= RTE_MAX_LCORE (128) EAL: lcore 129 >= RTE_MAX_LCORE (128) EAL: To use high physical core ids, please use --lcores to map them to lcore ids below RTE_MAX_LCORE, EAL: e.g. --lcores 0@128,1@129 Signed-off-by: David Hunt <david.hunt@intel.com> Acked-by: Bruce Richardson <bruce.richardson@intel.com>	2021-11-05 14:39:37 +01:00
David Marchand	f5fa0e110f	eal: promote non-EAL lcore API as stable This API has been around for more than a year (and is in LTS 20.11). It did not receive negative feedback and will be used in a next OVS release. Mark it stable. Signed-off-by: David Marchand <david.marchand@redhat.com> Acked-by: Kevin Traynor <ktraynor@redhat.com> Acked-by: Thomas Monjalon <thomas@monjalon.net>	2021-11-04 22:57:58 +01:00
Konstantin Ananyev	65d9b7c664	bpf: fix convert API when libpcap missing rte_bpf_convert() implementation depends on libpcap. Right now it is defined only when this library is installed and RTE_PORT_PCAP is defined. Fix that by providing for such case stub rte_bpf_convert() implementation that will always return an error. To draw user attention, if proper implementation is disabled, warning will be thrown at meson configure stage. Also move stub for another function (rte_bpf_elf_load) into the same place (bpf_stub.c). Fixes: `2eccf6afbe` ("bpf: add function to convert classic BPF to DPDK BPF") Signed-off-by: Konstantin Ananyev <konstantin.ananyev@intel.com>	2021-11-04 19:56:20 +01:00
Konstantin Ananyev	7b0a120157	bpf: fix doxygen comment Fix typo in doxygen comments for rte_bpf_convert(). Fixes: `2eccf6afbe` ("bpf: add function to convert classic BPF to DPDK BPF") Signed-off-by: Konstantin Ananyev <konstantin.ananyev@intel.com>	2021-11-04 19:56:14 +01:00
David Marchand	54abd300d5	pipeline: remove unreachable branch A previous change blamed it on compiler/ASan, while this is a real (yet minor) issue. This return -EINVAL is never reached since we test all combinations of fidx and fcin booleans. All branches end up with a return 0, factorize them. Fixes: `84f5ac9418` ("pipeline: fix build with ASan") Fixes: `f38913b7fb` ("pipeline: add meter array to SWX") Signed-off-by: David Marchand <david.marchand@redhat.com> Acked-by: Cristian Dumitrescu <cristian.dumitrescu@intel.com>	2021-11-04 18:11:08 +01:00
Yogesh Jangra	2ce3ccbe44	pipeline: fix dead code Fix minor dead code issue reported by Coverity. Coverity issue: 373653 Fixes: e9d870 ("pipeline: add SWX pipeline tables") Signed-off-by: Yogesh Jangra <yogesh.jangra@intel.com> Acked-by: Cristian Dumitrescu <cristian.dumitrescu@intel.com>	2021-11-04 16:43:27 +01:00
Wojciech Liguzinski	44c730b0e3	sched: add PIE based congestion management Implement PIE based congestion management based on rfc8033. The Proportional Integral Controller Enhanced (PIE) algorithm works by proactively dropping packets randomly. PIE is implemented as more advanced queue management is required to address the bufferbloat problem and provide desirable quality of service to users. Tests for PIE code added to test application. Added PIE related information to documentation. Signed-off-by: Wojciech Liguzinski <wojciechx.liguzinski@intel.com> Acked-by: Cristian Dumitrescu <cristian.dumitrescu@intel.com> Acked-by: Jasvinder Singh <jasvinder.singh@intel.com>	2021-11-04 15:41:49 +01:00
David Marchand	5633173341	eal/linux: fix device hotplug The device event interrupt handler was always freed. Bugzilla ID: 845 Fixes: `c2bd9367e1` ("lib: remove direct access to interrupt handle") Signed-off-by: David Marchand <david.marchand@redhat.com> Tested-by: Yan Xia <yanx.xia@intel.com> Reviewed-by: Maxime Coquelin <maxime.coquelin@redhat.com>	2021-11-04 15:13:41 +01:00
David Marchand	4847122aab	eal/linux: fix uevent message parsing Caught with ASan: ==9727==ERROR: AddressSanitizer: stack-buffer-overflow on address 0x7f0daa2fc0d0 at pc 0x7f0daeefacb2 bp 0x7f0daa2fadd0 sp 0x7f0daa2fa578 READ of size 1 at 0x7f0daa2fc0d0 thread T1 #0 0x7f0daeefacb1 (/lib64/libasan.so.5+0xbacb1) #1 0x115eba1 in dev_uev_parse ../lib/eal/linux/eal_dev.c:167 #2 0x115f281 in dev_uev_handler ../lib/eal/linux/eal_dev.c:248 #3 0x1169b91 in eal_intr_process_interrupts ../lib/eal/linux/eal_interrupts.c:1026 #4 0x116a3a2 in eal_intr_handle_interrupts ../lib/eal/linux/eal_interrupts.c:1100 #5 0x116a7f0 in eal_intr_thread_main ../lib/eal/linux/eal_interrupts.c:1172 #6 0x112640a in ctrl_thread_init ../lib/eal/common/eal_common_thread.c:202 #7 0x7f0dade27159 in start_thread (/lib64/libpthread.so.0+0x8159) #8 0x7f0dadb58f72 in clone (/lib64/libc.so.6+0xfcf72) Address 0x7f0daa2fc0d0 is located in stack of thread T1 at offset 4192 in frame #0 0x115f0c9 in dev_uev_handler ../lib/eal/linux/eal_dev.c:226 This frame has 2 object(s): [32, 48) 'uevent' [96, 4192) 'buf' <== Memory access at offset 4192 overflows this variable HINT: this may be a false positive if your program uses some custom stack unwind mechanism or swapcontext (longjmp and C++ exceptions are supported) Thread T1 created by T0 here: #0 0x7f0daee92ea3 in __interceptor_pthread_create (/lib64/libasan.so.5+0x52ea3) #1 0x1126542 in rte_ctrl_thread_create ../lib/eal/common/eal_common_thread.c:228 #2 0x116a8b5 in rte_eal_intr_init ../lib/eal/linux/eal_interrupts.c:1200 #3 0x1159dd1 in rte_eal_init ../lib/eal/linux/eal.c:1044 #4 0x7a22f8 in main ../app/test-pmd/testpmd.c:4105 #5 0x7f0dada7f802 in __libc_start_main (/lib64/libc.so.6+0x23802) Bugzilla ID: 792 Fixes: `0d0f478d04` ("eal/linux: add uevent parse and process") Cc: stable@dpdk.org Signed-off-by: David Marchand <david.marchand@redhat.com> Tested-by: Yan Xia <yanx.xia@intel.com> Reviewed-by: Maxime Coquelin <maxime.coquelin@redhat.com>	2021-11-04 15:13:41 +01:00
Jim Harris	628bac7df1	eal/linux: remove unused variable for socket memory clang-13 rightfully complains that the total_mem variable in eal_parse_socket_arg is set but not used, since the final accumulated total_mem result isn't used anywhere. So just remove the total_mem variable. Fixes: `0a703f0f36` ("eal/linux: fix parsing zero socket memory and limits") Signed-off-by: Jim Harris <james.r.harris@intel.com> Reviewed-by: David Marchand <david.marchand@redhat.com>	2021-11-04 13:27:18 +01:00
Vladimir Medvedkin	11c5b9b51a	fib: add RIB extension size parameter This patch adds a new parameter to the FIB configuration to specify the size of the extension for internal RIB structure. Signed-off-by: Vladimir Medvedkin <vladimir.medvedkin@intel.com> Tested-by: Conor Walsh <conor.walsh@intel.com>	2021-11-04 12:38:03 +01:00
Xueming Li	fc382022c6	eal: fix device iterator when no bus is selected Devargs used in device iterator initialization wasn't set to zero, random data like bus string lead to invalid address access. This patch initializes devargs. Bugzilla ID: 862 Fixes: `c99a2d4c6b` ("eal: implement device iteration initialization") Cc: stable@dpdk.org Signed-off-by: Xueming Li <xuemingl@nvidia.com>	2021-11-04 11:44:49 +01:00
Vladimir Medvedkin	adeca6685f	hash: fix use after free in Toeplitz hash This patch fixes use after free in thash library, reported by ASAN. Bugzilla ID: 868 Fixes: `28ebff11c2` ("hash: add predictable RSS") Cc: stable@dpdk.org Signed-off-by: Vladimir Medvedkin <vladimir.medvedkin@intel.com> Reviewed-by: David Marchand <david.marchand@redhat.com>	2021-11-04 11:43:20 +01:00
Vladimir Medvedkin	d27e2b7e9c	hash: enable GFNI Toeplitz hash implementation This patch enables new GFNI Toeplitz hash in predictable RSS library. Signed-off-by: Vladimir Medvedkin <vladimir.medvedkin@intel.com> Acked-by: Konstantin Ananyev <konstantin.ananyev@intel.com>	2021-11-04 11:19:10 +01:00
Vladimir Medvedkin	31d7c06947	hash: add bulk Toeplitz hash implementation This patch adds a bulk version for the Toeplitz hash implemented with Galois Field New Instructions (GFNI). Signed-off-by: Vladimir Medvedkin <vladimir.medvedkin@intel.com> Acked-by: Konstantin Ananyev <konstantin.ananyev@intel.com>	2021-11-04 11:19:10 +01:00
Vladimir Medvedkin	4fd8c4cb0d	hash: add new Toeplitz hash implementation This patch adds a new Toeplitz hash implementation using Galois Field New Instructions (GFNI). Signed-off-by: Vladimir Medvedkin <vladimir.medvedkin@intel.com> Acked-by: Konstantin Ananyev <konstantin.ananyev@intel.com>	2021-11-04 11:19:10 +01:00
Dmitry Kozlyuk	9790fc2149	eal/freebsd: fix IOVA mode selection FreeBSD EAL selected IOVA mode PA even in --no-huge mode where PA are not available. Memory zones were created with IOVA equal to RTE_BAD_IOVA with no indication this field is not usable. Change IOVA mode detection: 1. Always allow to force --iova-mode=va. 2. In --no-huge mode, disallow forcing --iova-mode=pa, and select VA. 3. Otherwise select IOVA mode according to bus requests, default to PA. In case contigmem is inaccessible, memory initialization will fail with a message indicating the cause. Fixes: `c2361bab70` ("eal: compute IOVA mode based on PA availability") Cc: stable@dpdk.org Signed-off-by: Dmitry Kozlyuk <dkozlyuk@nvidia.com> Acked-by: Bruce Richardson <bruce.richardson@intel.com>	2021-11-03 18:32:19 +01:00
Feifei Wang	6b70c6b31f	distributor: use wait until scheme Instead of polling for bufptr64 to be updated, use wait until scheme for this case. Signed-off-by: Feifei Wang <feifei.wang2@arm.com> Reviewed-by: Ruifeng Wang <ruifeng.wang@arm.com> Acked-by: Jerin Jacob <jerinj@marvell.com>	2021-11-03 15:50:14 +01:00
Feifei Wang	388bee69a5	bpf: use wait until scheme for Rx/Tx iteration Instead of polling for cbi->use to be updated, use wait until scheme. Signed-off-by: Feifei Wang <feifei.wang2@arm.com> Reviewed-by: Ruifeng Wang <ruifeng.wang@arm.com> Acked-by: Konstantin Ananyev <konstantin.ananyev@intel.com>	2021-11-03 15:50:14 +01:00
Feifei Wang	4ed4e554ac	mcslock: use wait until scheme for unlock Instead of polling for mcslock to be updated, use wait until scheme for this case. Signed-off-by: Feifei Wang <feifei.wang2@arm.com> Reviewed-by: Ruifeng Wang <ruifeng.wang@arm.com> Acked-by: Jerin Jacob <jerinj@marvell.com>	2021-11-03 15:50:14 +01:00
Feifei Wang	41902d2468	pflock: use wait until scheme for read lock Instead of polling for read pflock update, use wait until scheme for this case. Signed-off-by: Feifei Wang <feifei.wang2@arm.com> Reviewed-by: Ruifeng Wang <ruifeng.wang@arm.com> Acked-by: Jerin Jacob <jerinj@marvell.com>	2021-11-03 15:50:14 +01:00
Feifei Wang	875f350924	eal: add a new helper for wait until scheme Add a new generic helper which is a macro for wait until scheme. Furthermore, to prevent compilation warning in arm: ---------------------------------------------- 'warning: implicit declaration of function ...' ---------------------------------------------- Delete 'undef' constructions for '__LOAD_EXC_xx', '__SEVL' and '__WFE'. And add ‘__RTE_ARM’ for these macros to fix the namespace. This is because the original macros are undefined at the end of the file. If the new macro calls them in other files, they will be seen as 'not defined'. Signed-off-by: Feifei Wang <feifei.wang2@arm.com> Reviewed-by: Ruifeng Wang <ruifeng.wang@arm.com> Acked-by: Konstantin Ananyev <konstantin.ananyev@intel.com> Acked-by: Jerin Jacob <jerinj@marvell.com>	2021-11-03 15:50:14 +01:00
Konstantin Ananyev	53caecb844	pdump: fix freeing statistics memzone rte_pdump_init() always allocates a new memzone for pdump_stats, but rte_pdump_uninit() never frees it. So the following combination will always fail: rte_pdump_init(); rte_pdump_uninit(); rte_pdump_init(); The issue was caught by pdump_autotest UT. While the first test run is successful, any consecutive runs of this test-case will fail. Fix the issue by calling rte_memzone_free() for statistics memzone. Fixes: `10f726efe2` ("pdump: support pcapng and filtering") Signed-off-by: Konstantin Ananyev <konstantin.ananyev@intel.com> Acked-by: Reshma Pattan <reshma.pattan@intel.com> Acked-by: Stephen Hemminger <stephen@networkplumber.org>	2021-11-03 12:53:03 +01:00
Stephen Hemminger	b2be63b55a	pdump: fix packet snapshot length initialization If packet dump was enabled via pdump_enable_by_deviceid the packet snapshot length was not being set. Bugzilla ID: 840 Fixes: `10f726efe2` ("pdump: support pcapng and filtering") Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>	2021-11-01 00:36:29 +01:00
Stephen Hemminger	ae1702fffe	pcapng: use new ethdev namespace RTE_ prefix was added by commit `295968d174` ("ethdev: add namespace") Fixes: `8d23ce8f5e` ("pcapng: add new library for writing pcapng files") Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>	2021-10-31 23:25:02 +01:00
Zhihong Peng	6cc51b1293	mem: instrument allocator for ASan This patch adds necessary hooks in the memory allocator for ASan. This feature is currently available in DPDK only on Linux x86_64. If other OS/architectures want to support it, ASAN_SHADOW_OFFSET must be defined and RTE_MALLOC_ASAN must be set accordingly in meson. Signed-off-by: Xueqin Lin <xueqin.lin@intel.com> Signed-off-by: Zhihong Peng <zhihongx.peng@intel.com> Acked-by: Anatoly Burakov <anatoly.burakov@intel.com>	2021-10-29 16:25:03 +02:00
Zhihong Peng	84f5ac9418	pipeline: fix build with ASan Code changes to avoid the following build error: "Control reaches end of non-void function". Signed-off-by: Xueqin Lin <xueqin.lin@intel.com> Signed-off-by: Zhihong Peng <zhihongx.peng@intel.com> Acked-by: Cristian Dumitrescu <cristian.dumitrescu@intel.com>	2021-10-29 15:25:34 +02:00
Anatoly Burakov	ab910a8068	vfio: fix partial unmap Partial unmap support was introduced in commit `c13ca4e81c` ("vfio: fix DMA mapping granularity for IOVA as VA"), and with it was added a check that dereferenced the IOMMU type to determine whether partial unmapping is supported for currently configured IOMMU type. In certain circumstances (such as when VFIO is supported, but no devices were bound to the VFIO driver), the IOMMU type pointer can be NULL. However, dereferencing of IOMMU type was guarded by access to the user maps list - that is, we were always checking the user map list first, and then, if we found a memory region that encloses the one we're trying to unmap, we would have performed the IOMMU type check. This ensured that the IOMMU type check will not cause any NULL pointer dereferences, because in order for an IOMMU type check to have been performed, there necessarily must have been at least one memory region that was previously mapped successfully, and that implies having a defined IOMMU type. When commit `56259f7fc0` ("vfio: allow partially unmapping adjacent memory") was introduced, the IOMMU type check was moved to before we were traversing the user mem maps list, thereby introducing a potential NULL dereference, because the IOMMU type access was no longer guarded by the user mem maps list traversal. Fix the issue by moving the IOMMU type check to after the user mem maps traversal, thereby ensuring that by the time the check happens, the IOMMU type is always valid. Fixes: `56259f7fc0` ("vfio: allow partially unmapping adjacent memory") Signed-off-by: Anatoly Burakov <anatoly.burakov@intel.com> Reviewed-by: David Marchand <david.marchand@redhat.com> Tested-by: Xuan Ding <xuan.ding@intel.com>	2021-10-28 09:51:55 +02:00
Honnappa Nagarahalli	705356f081	eal: simplify control thread creation Remove the usage of pthread barrier and replace it with synchronization using atomic variable. This also removes the use of reference count required to synchronize freeing the memory. Signed-off-by: Honnappa Nagarahalli <honnappa.nagarahalli@arm.com> Reviewed-by: Olivier Matz <olivier.matz@6wind.com>	2021-10-25 21:43:10 +02:00
Harman Kalra	8cb5d08db9	interrupts: extend event list Dynamically allocating the efds and elist array of intr_handle structure, based on size provided by user. E.g. the size can be the number of MSIX interrupts supported by a PCI device. Signed-off-by: Harman Kalra <hkalra@marvell.com> Signed-off-by: David Marchand <david.marchand@redhat.com> Acked-by: Dmitry Kozlyuk <dmitry.kozliuk@gmail.com> Tested-by: Raslan Darawsheh <rasland@nvidia.com>	2021-10-25 21:20:12 +02:00
Harman Kalra	99e6c7e316	interrupts: rename device specific file descriptor VFIO/UIO are mutually exclusive, storing file descriptor in a single field is enough. Signed-off-by: Harman Kalra <hkalra@marvell.com> Signed-off-by: David Marchand <david.marchand@redhat.com> Tested-by: Raslan Darawsheh <rasland@nvidia.com>	2021-10-25 21:20:12 +02:00
Harman Kalra	73d844fd08	interrupts: make interrupt handle structure opaque Moving interrupt handle structure definition inside a EAL private header to make its fields totally opaque to the outside world. Signed-off-by: Harman Kalra <hkalra@marvell.com> Signed-off-by: David Marchand <david.marchand@redhat.com> Tested-by: Raslan Darawsheh <rasland@nvidia.com>	2021-10-25 21:20:12 +02:00
Harman Kalra	d61138d4f0	drivers: remove direct access to interrupt handle Removing direct access to interrupt handle structure fields, rather use respective get set APIs for the same. Making changes to all the drivers access the interrupt handle fields. Signed-off-by: Harman Kalra <hkalra@marvell.com> Acked-by: Hyong Youb Kim <hyonkim@cisco.com> Signed-off-by: David Marchand <david.marchand@redhat.com> Tested-by: Raslan Darawsheh <rasland@nvidia.com>	2021-10-25 21:20:12 +02:00
Harman Kalra	c2bd9367e1	lib: remove direct access to interrupt handle Removing direct access to interrupt handle structure fields, rather use respective get set APIs for the same. Making changes to all the libraries access the interrupt handle fields. Signed-off-by: Harman Kalra <hkalra@marvell.com> Signed-off-by: David Marchand <david.marchand@redhat.com> Tested-by: Raslan Darawsheh <rasland@nvidia.com>	2021-10-25 21:20:12 +02:00
Harman Kalra	90b13ab8d4	alarm: remove direct access to interrupt handle Removing direct access to interrupt handle structure fields, rather use respective get set APIs for the same. Making changes to all the libraries access the interrupt handle fields. Implementing alarm cleanup routine, where the memory allocated for interrupt instance can be freed. Signed-off-by: Harman Kalra <hkalra@marvell.com> Signed-off-by: David Marchand <david.marchand@redhat.com> Tested-by: Raslan Darawsheh <rasland@nvidia.com>	2021-10-25 21:20:12 +02:00
Harman Kalra	bbbac4cd6e	interrupts: remove direct access to interrupt handle Making changes to the interrupt framework to use interrupt handle APIs to get/set any field. Signed-off-by: Harman Kalra <hkalra@marvell.com> Signed-off-by: David Marchand <david.marchand@redhat.com> Tested-by: Raslan Darawsheh <rasland@nvidia.com>	2021-10-25 21:20:12 +02:00
Harman Kalra	b7c9842916	interrupts: add allocator and accessors Prototype/Implement get set APIs for interrupt handle fields. User won't be able to access any of the interrupt handle fields directly and should instead use these get/set APIs to access/manipulate them. Internal interrupt header i.e. rte_eal_interrupt.h is rearranged, as APIs defined are moved to rte_interrupts.h and epoll specific definitions are moved to a new header rte_epoll.h. Later in the series rte_eal_interrupt.h will be removed. Signed-off-by: Harman Kalra <hkalra@marvell.com> Acked-by: Ray Kinsella <mdr@ashroe.eu> Acked-by: Dmitry Kozlyuk <dmitry.kozliuk@gmail.com> Signed-off-by: David Marchand <david.marchand@redhat.com> Tested-by: Raslan Darawsheh <rasland@nvidia.com>	2021-10-25 21:20:12 +02:00
Dmitry Kozlyuk	0c8fc83a71	eal/windows: fix IOVA mode detection and handling Windows EAL did not detect IOVA mode and worked incorrectly if physical addresses could not be obtained (if virt2phys driver was missing or inaccessible). In this case, rte_mem_virt2iova() reported RTE_BAD_IOVA for any address. Inability to obtain IOVA, be it PA or VA, should cause a failure for the DPDK allocator, but it was hidden by the implementation, so allocations did not fail when they should. The mode when DPDK cannot obtain PA but can work is IOVA-as-VA mode. However, rte_eal_iova_mode() always returned RTE_IOVA_DC (while it should only ever return RTE_IOVA_PA or RTE_IOVA_VA), because IOVA mode detection was not implemented. Implement IOVA mode detection: 1. Always allow to force --iova-mode=va. 2. Allow to force --iova-mode=pa only if virt2phys is available. 3. If no mode is forced and virt2phys is available, select the mode according to bus requests, default to PA. 4. If no mode is forced but virt2phys is unavailable, default to VA. Fix rte_mem_virt2iova() by returning VA when using IOVA-as-VA. Fix rte_eal_iova_mode() by returning the selected mode. Fixes: `2a5d547a4a` ("eal/windows: implement basic memory management") Cc: stable@dpdk.org Reported-by: Tal Shnaiderman <talshn@nvidia.com> Signed-off-by: Dmitry Kozlyuk <dkozlyuk@nvidia.com> Tested-by: Pallavi Kadam <pallavi.kadam@intel.com> Acked-by: Pallavi Kadam <pallavi.kadam@intel.com>	2021-10-25 20:59:40 +02:00
Harman Kalra	e6732d0d6e	mem: add telemetry infos Registering new telemetry callbacks to list named (memzones) and unnamed (malloc) memory reserved and return information based on arguments provided by user. Example: Connecting to /var/run/dpdk/rte/dpdk_telemetry.v2 {"version": "DPDK 21.11.0-rc0", "pid": 59754, "max_output_len": 16384} Connected to application: "dpdk-testpmd" --> --> /eal/memzone_list {"/eal/memzone_list": [0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13]} --> --> --> /eal/memzone_info,0 {"/eal/memzone_info": {"Zone": 0, "Name": "rte_eth_dev_data", \ "Length": 225408, "Address": "0x13ffc0280", "Socket": 0, "Flags": 0, \ "Hugepage_size": 536870912, "Hugepage_base": "0x120000000", \ "Hugepage_used": 1}} --> --> --> /eal/memzone_info,6 {"/eal/memzone_info": {"Zone": 6, "Name": "MP_mb_pool_0_0", \ "Length": 669918336, "Address": "0x15811db80", "Socket": 0, \ "Flags": 0, "Hugepage_size": 536870912, "Hugepage_base": "0x140000000", \ "Hugepage_used": 2}} --> --> --> /eal/memzone_info,14 {"/eal/memzone_info": null} --> --> --> /eal/heap_list {"/eal/heap_list": [0]} --> --> --> /eal/heap_info,0 {"/eal/heap_info": {"Head id": 0, "Name": "socket_0", \ "Heap_size": 1610612736, "Free_size": 927645952, \ "Alloc_size": 682966784, "Greatest_free_size": 529153152, \ "Alloc_count": 482, "Free_count": 2}} Signed-off-by: Harman Kalra <hkalra@marvell.com> Acked-by: Ciara Power <ciara.power@intel.com> Acked-by: Bruce Richardson <bruce.richardson@intel.com>	2021-10-25 19:39:54 +02:00
Vladimir Medvedkin	97e2ae4c58	rib: fix IPv6 depth mask Fixes: `03b8372a9a` ("rib: fix max depth IPv6 lookup") Cc: stable@dpdk.org Signed-off-by: Vladimir Medvedkin <vladimir.medvedkin@intel.com>	2021-10-25 19:13:12 +02:00
Vladimir Medvedkin	b16ac53657	lpm6: fix buffer overflow This patch fixes buffer overflow reported by ASAN, please reference https://bugs.dpdk.org/show_bug.cgi?id=819 The rte_lpm6 keeps routing information for control plane purpose inside the rte_hash table which uses rte_jhash() as a hash function. From the rte_jhash() documentation: If input key is not aligned to four byte boundaries or a multiple of four bytes in length, the memory region just after may be read (but not used in the computation). rte_lpm6 uses 17 bytes keys consisting of IPv6 address (16 bytes) + depth (1 byte). This patch increases the size of the depth field up to uint32_t and sets the alignment to 4 bytes. Bugzilla ID: 819 Fixes: `86b3b21952` ("lpm6: store rules in hash table") Cc: stable@dpdk.org Signed-off-by: Vladimir Medvedkin <vladimir.medvedkin@intel.com> Acked-by: Bruce Richardson <bruce.richardson@intel.com>	2021-10-25 19:08:16 +02:00
Vladimir Medvedkin	45523f494c	hash: fix Doxygen comment of Toeplitz file Fixes: `7574c3ef74` ("hash: add toeplitz algorithm used by RSS") Cc: stable@dpdk.org Signed-off-by: Vladimir Medvedkin <vladimir.medvedkin@intel.com> Acked-by: John McNamara <john.mcnamara@intel.com>	2021-10-25 19:06:07 +02:00
Honnappa Nagarahalli	3596537005	eal: fix memory ordering around lcore task accesses Ensure that the memory operations before the call to rte_eal_remote_launch are visible to the worker thread. Use the function pointer to execute in worker thread as the guard variable. Ensure that the memory operations in worker thread, that happen before it returns the status of the assigned function, are visible to the main thread. Use the variable containing the lcore's state as the guard variable. Signed-off-by: Honnappa Nagarahalli <honnappa.nagarahalli@arm.com> Reviewed-by: Ola Liljedahl <ola.liljedahl@arm.com> Reviewed-by: Feifei Wang <feifei.wang2@arm.com>	2021-10-25 18:20:59 +02:00
Honnappa Nagarahalli	f6c6c686f1	eal: remove FINISHED lcore state FINISHED state seems to be used to indicate that the worker's update of the 'state' is not visible to other threads. There seems to be no requirement to have such a state. Since the FINISHED state is removed, the API rte_eal_wait_lcore is updated to always return the status of the last function that ran in the worker core. Signed-off-by: Honnappa Nagarahalli <honnappa.nagarahalli@arm.com> Reviewed-by: Ola Liljedahl <ola.liljedahl@arm.com> Reviewed-by: Feifei Wang <feifei.wang2@arm.com>	2021-10-25 18:20:59 +02:00
Honnappa Nagarahalli	33969e9c61	eal: reset lcore task callback and argument In the rte_eal_remote_launch function, the lcore function pointer is checked for NULL. However, the pointer is never reset to NULL. Reset the lcore function pointer and argument after the worker has completed executing the lcore function. Fixes: `af75078fec` ("first public release") Cc: stable@dpdk.org Signed-off-by: Honnappa Nagarahalli <honnappa.nagarahalli@arm.com> Reviewed-by: Ruifeng Wang <ruifeng.wang@arm.com> Reviewed-by: Feifei Wang <feifei.wang2@arm.com>	2021-10-25 18:20:59 +02:00
Eli Britstein	6de430b707	eal/x86: avoid cast-align warning in memcpy functions Functions and macros in x86 rte_memcpy.h may cause cast-align warnings, when using strict cast align flag with supporting gcc: gcc (Ubuntu 9.3.0-17ubuntu1~20.04) 9.3.0 CFLAGS="-Wcast-align=strict" make V=1 -C examples/l2fwd clean static For example: In file included from main.c:24: /dpdk/build/include/rte_memcpy.h: In function 'rte_mov16': /dpdk/build/include/rte_memcpy.h:306:25: warning: cast increases required alignment of target type [-Wcast-align] 306 \| xmm0 = _mm_loadu_si128((const __m128i *)src); \| ^ As the code assumes correct alignment, add first a (void *) or (const void *) casting, to avoid the warnings. Fixes: `9484092baa` ("eal/x86: optimize memcpy for AVX512 platforms") Cc: stable@dpdk.org Signed-off-by: Eli Britstein <elibr@nvidia.com>	2021-10-25 17:28:12 +02:00
Eli Britstein	da0333c879	mbuf: avoid cast-align warning in data offset macro In rte_pktmbuf_mtod_offset macro, there is a casting from char * to type 't', which may cause cast-align warning when using strict cast align flag with supporting gcc: gcc (Ubuntu 9.3.0-17ubuntu1~20.04) 9.3.0 CFLAGS="-Wcast-align=strict" make V=1 -C examples/l2fwd clean static main.c: In function 'l2fwd_mac_updating': /dpdk/build/include/rte_mbuf_core.h:719:3: warning: cast increases required alignment of target type [-Wcast-align] 719 \| ((t)((char *)(m)->buf_addr + (m)->data_off + (o))) \| ^ /dpdk/build/include/rte_mbuf_core.h:733:32: note: in expansion of macro 'rte_pktmbuf_mtod_offset' 733 \| #define rte_pktmbuf_mtod(m, t) rte_pktmbuf_mtod_offset(m, t, 0) \| ^~~~~~~~~~~~~~~~~~~~~~~ As the code assumes correct alignment, add first a (void *) casting, to avoid the warning. Fixes: `af75078fec` ("first public release") Cc: stable@dpdk.org Signed-off-by: Eli Britstein <elibr@nvidia.com> Acked-by: Olivier Matz <olivier.matz@6wind.com>	2021-10-25 17:27:48 +02:00
Eli Britstein	a3f8d05871	net: avoid cast-align warning in VLAN insert function In rte_vlan_insert there is a casting of rte_pktmbuf_prepend returned value to (struct rte_ether_hdr *), which causes cast-align warning when using strict cast align flag with supporting gcc: gcc (Ubuntu 9.3.0-17ubuntu1~20.04) 9.3.0 CFLAGS="-Wcast-align=strict" make V=1 -C examples/l2fwd clean static In file included from main.c:35: /dpdk/build/include/rte_ether.h:370:7: warning: cast increases required alignment of target type [-Wcast-align] 370 \| nh = (struct rte_ether_hdr *) \| ^ As the code assumes correct alignment, add first a (void *) casting, to avoid the warning. Fixes: `c974021a59` ("ether: add soft vlan encap/decap") Cc: stable@dpdk.org Signed-off-by: Eli Britstein <elibr@nvidia.com> Acked-by: Olivier Matz <olivier.matz@6wind.com>	2021-10-25 17:27:17 +02:00
Dmitry Kozlyuk	6fda3ff6f0	mempool: fix non-IO flag inference When mempool had been created with RTE_MEMPOOL_F_NO_IOVA_CONTIG flag but later populated with valid IOVA, RTE_MEMPOOL_F_NON_IO was unset, while it should be kept. The unit test did not catch this because rte_mempool_populate_default() it used was populating with RTE_BAD_IOVA. Keep setting RTE_MEMPOOL_F_NON_IO at an empty mempool creation and add an assert for it in the unit test (remove the separate case). Do not reset the flag if RTE_MEMPOOL_F_NO_IOVA_CONTIG is set. Fixes: `11541c5c81` ("mempool: add non-IO flag") Signed-off-by: Dmitry Kozlyuk <dkozlyuk@nvidia.com> Acked-by: Olivier Matz <olivier.matz@6wind.com>	2021-10-25 16:52:56 +02:00
Jasvinder Singh	fd9e07a1f4	sched: promote a function as stable This API was introduced in 18.05, therefore removing experimental tag to promote it to stable state Signed-off-by: Jasvinder Singh <jasvinder.singh@intel.com> Acked-by: Cristian Dumitrescu <cristian.dumitrescu@intel.com> Acked-by: Ray Kinsella <mdr@ashroe.eu>	2021-10-25 15:14:22 +02:00
Yogesh Jangra	cd79e02058	pipeline: support action annotations Enable restricting the scope of an action to regular table entries or to the table default entry in order to support the P4 language tableonly or defaultonly annotations. Signed-off-by: Yogesh Jangra <yogesh.jangra@intel.com> Acked-by: Cristian Dumitrescu <cristian.dumitrescu@intel.com>	2021-10-25 14:53:28 +02:00
Yogesh Jangra	0317c4521d	port: configure loop count for source port Add support for configurable number of loops through the input PCAP file for the source port. Added an additional parameter to source port CLI command. Signed-off-by: Yogesh Jangra <yogesh.jangra@intel.com> Acked-by: Cristian Dumitrescu <cristian.dumitrescu@intel.com>	2021-10-25 14:30:32 +02:00
Yogesh Jangra	55095ccb7f	pipeline: fix instruction label check The instruction_data array was incorrectly indexed, which resulted in the array index getting out of bounds and sometimes segfault. Fixes: a1711f (“pipeline: add SWX Rx and extract instructions“) Cc: stable@dpdk.org Signed-off-by: Yogesh Jangra <yogesh.jangra@intel.com> Acked-by: Cristian Dumitrescu <cristian.dumitrescu@intel.com>	2021-10-25 14:06:02 +02:00
David Marchand	e0d3a74d92	net: fix build with pedantic for L2TPv2 definitions Build is broken on RHEL7 following introduction of this new protocol. Fixes: `3a929df1f2` ("ethdev: support L2TPv2 and PPP procotol") Signed-off-by: David Marchand <david.marchand@redhat.com> Tested-by: Raslan Darawsheh <rasland@nvidia.com>	2021-10-25 09:33:15 +02:00
Olivier Matz	daa02b5cdd	mbuf: add namespace to offload flags Fix the mbuf offload flags namespace by adding an RTE_ prefix to the name. The old flags remain usable, but a deprecation warning is issued at compilation. Signed-off-by: Olivier Matz <olivier.matz@6wind.com> Acked-by: Andrew Rybchenko <andrew.rybchenko@oktetlabs.ru> Acked-by: Ajit Khaparde <ajit.khaparde@broadcom.com> Acked-by: Somnath Kotur <somnath.kotur@broadcom.com>	2021-10-24 13:37:43 +02:00
Olivier Matz	5b63493241	mbuf: mark old VLAN offload flags as deprecated The flags PKT_TX_VLAN_PKT and PKT_TX_QINQ_PKT are marked as deprecated since commit `380a7aab1a` ("mbuf: rename deprecated VLAN flags") (2017). But they were not using the RTE_DEPRECATED macro, because it did not exist at this time. Add it, and replace usage of these flags. Signed-off-by: Olivier Matz <olivier.matz@6wind.com> Acked-by: Andrew Rybchenko <andrew.rybchenko@oktetlabs.ru> Acked-by: Ajit Khaparde <ajit.khaparde@broadcom.com>	2021-10-24 13:30:40 +02:00
Olivier Matz	0c03660db1	mbuf: remove duplicate definition of cksum offload flags The flags PKT_RX_L4_CKSUM_BAD and PKT_RX_IP_CKSUM_BAD are defined twice with the same value. Remove one of the occurrences, which was marked as "deprecated". Signed-off-by: Olivier Matz <olivier.matz@6wind.com> Acked-by: Andrew Rybchenko <andrew.rybchenko@oktetlabs.ru>	2021-10-24 13:30:40 +02:00
Radu Nicolau	74176aec37	ipsec: fix telemetry text Set correct tunnel type telemetry text - tunnel type was wrongly set as IPv4-UDP for all types. Fixes: bf5b65a8e781 ("ipsec: support SA telemetry") Signed-off-by: Radu Nicolau <radu.nicolau@intel.com> Acked-by: Akhil Goyal <gakhil@marvell.com>	2021-10-20 15:55:37 +02:00
Akhil Goyal	92cb130919	cryptodev: move device-specific structures The device specific structures - rte_cryptodev and rte_cryptodev_data are moved to cryptodev_pmd.h to hide it from the applications. Signed-off-by: Akhil Goyal <gakhil@marvell.com> Tested-by: Rebecca Troy <rebecca.troy@intel.com> Acked-by: Fan Zhang <roy.fan.zhang@intel.com> Acked-by: Konstantin Ananyev <konstantin.ananyev@intel.com>	2021-10-20 15:33:16 +02:00
Akhil Goyal	f6849cdcc6	cryptodev: use new flat array in fast path API Rework fast-path cryptodev functions to use rte_crypto_fp_ops[]. While it is an API/ABI breakage, this change is intended to be transparent for both users (no changes in user app is required) and PMD developers (no changes in PMD is required). Signed-off-by: Akhil Goyal <gakhil@marvell.com> Acked-by: Fan Zhang <roy.fan.zhang@intel.com> Acked-by: Konstantin Ananyev <konstantin.ananyev@intel.com>	2021-10-20 15:33:16 +02:00
Akhil Goyal	33cd3fd52f	cryptodev: add device probing finish function Added a rte_cryptodev_pmd_probing_finish API which needs to be called by the PMD after the device is initialized completely. This will set the fast path function pointers in the flat array for the secondary process. For the primary process, these are set in rte_cryptodev_start. Signed-off-by: Akhil Goyal <gakhil@marvell.com> Acked-by: Konstantin Ananyev <konstantin.ananyev@intel.com> Acked-by: Fan Zhang <roy.fan.zhang@intel.com>	2021-10-20 15:33:16 +02:00
Akhil Goyal	2fd66f758f	cryptodev: move inline APIs into separate structure Move fastpath inline function pointers from rte_cryptodev into a separate structure accessed via a flat array. The intention is to make rte_cryptodev and related structures private to avoid future API/ABI breakages. Signed-off-by: Akhil Goyal <gakhil@marvell.com> Tested-by: Rebecca Troy <rebecca.troy@intel.com> Acked-by: Fan Zhang <roy.fan.zhang@intel.com> Acked-by: Konstantin Ananyev <konstantin.ananyev@intel.com>	2021-10-20 15:33:16 +02:00
Akhil Goyal	7f3876ad54	cryptodev: allocate max space for internal queue array At queue_pair config stage, allocate memory for maximum number of queue pair pointers that a device can support. This will allow fast path APIs(enqueue_burst/dequeue_burst) to refer pointer to internal QP data without checking for currently configured QPs. This is required to hide the rte_cryptodev and rte_cryptodev_data structure from user. Signed-off-by: Akhil Goyal <gakhil@marvell.com> Acked-by: Fan Zhang <roy.fan.zhang@intel.com> Acked-by: Konstantin Ananyev <konstantin.ananyev@intel.com>	2021-10-20 15:33:16 +02:00
Akhil Goyal	691e1f4d56	cryptodev: separate out internal structures A new header file rte_cryptodev_core.h is added and all internal data structures which need not be exposed directly to application are moved to this file. These structures are mostly used by drivers, but they need to be in the public header file as they are accessed by datapath inline functions for performance reasons. Signed-off-by: Akhil Goyal <gakhil@marvell.com> Tested-by: Rebecca Troy <rebecca.troy@intel.com> Acked-by: Fan Zhang <roy.fan.zhang@intel.com> Acked-by: Konstantin Ananyev <konstantin.ananyev@intel.com>	2021-10-20 15:33:16 +02:00
Andrew Rybchenko	68e8ca7b59	ethdev: avoid usage of ULL for 64-bit unsigned constants Use UINT64_C() macro instead. Signed-off-by: Andrew Rybchenko <andrew.rybchenko@oktetlabs.ru> Reviewed-by: Ferruh Yigit <ferruh.yigit@intel.com>	2021-10-22 19:11:35 +02:00
Andrew Rybchenko	4852c647d1	ethdev: replace single bit masks with macros The macros RTE_BIT32 and RTE_BIT64 are used to replace single bit masks. Do not switch VLAN offload flags since type is not fixed size. Signed-off-by: Andrew Rybchenko <andrew.rybchenko@oktetlabs.ru> Reviewed-by: Ferruh Yigit <ferruh.yigit@intel.com>	2021-10-22 18:36:34 +02:00
Ferruh Yigit	295968d174	ethdev: add namespace Add 'RTE_ETH' namespace to all enums & macros in a backward compatible way. The macros for backward compatibility can be removed in next LTS. Also updated some struct names to have 'rte_eth' prefix. All internal components switched to using new names. Syntax fixed on lines that this patch touches. Signed-off-by: Ferruh Yigit <ferruh.yigit@intel.com> Acked-by: Tyler Retzlaff <roretzla@linux.microsoft.com> Acked-by: Andrew Rybchenko <andrew.rybchenko@oktetlabs.ru> Acked-by: Ajit Khaparde <ajit.khaparde@broadcom.com> Acked-by: Jerin Jacob <jerinj@marvell.com> Acked-by: Wisam Jaddo <wisamm@nvidia.com> Acked-by: Rosen Xu <rosen.xu@intel.com> Acked-by: Chenbo Xia <chenbo.xia@intel.com> Acked-by: Hemant Agrawal <hemant.agrawal@nxp.com> Acked-by: Somnath Kotur <somnath.kotur@broadcom.com>	2021-10-22 18:15:38 +02:00
Ivan Ilchenko	b26bee10ee	ethdev: forbid MTU set before device configure rte_eth_dev_configure() always sets MTU to either dev_conf.rxmode.mtu or RTE_ETHER_MTU if application doesn't provide the value. So, there is no point to allow rte_eth_dev_set_mtu() before since set value will be overwritten on configure anyway. Fixes: `1bb4a528c4` ("ethdev: fix max Rx packet length") Signed-off-by: Ivan Ilchenko <ivan.ilchenko@oktetlabs.ru> Signed-off-by: Andrew Rybchenko <andrew.rybchenko@oktetlabs.ru> Reviewed-by: Ferruh Yigit <ferruh.yigit@intel.com>	2021-10-22 15:26:54 +02:00
Andrew Rybchenko	9ce1717d3e	ethdev: remove unused L2 tunnel mask defines Fixes: `cf47acc0f9` ("ethdev: remove L2 tunnel offload control API") Cc: stable@dpdk.org Signed-off-by: Andrew Rybchenko <andrew.rybchenko@oktetlabs.ru> Reviewed-by: Ferruh Yigit <ferruh.yigit@intel.com>	2021-10-22 12:03:52 +02:00
Xueming Li	93e441c9a0	ethdev: get device capability name as string This patch adds API to return name of device capability. Signed-off-by: Xueming Li <xuemingl@nvidia.com> Reviewed-by: Andrew Rybchenko <andrew.rybchenko@oktetlabs.ru> Acked-by: Ajit Khaparde <ajit.khaparde@broadcom.com> Acked-by: Thomas Monjalon <thomas@monjalon.net>	2021-10-22 00:08:57 +02:00
Xueming Li	dd22740cc2	ethdev: introduce shared Rx queue In the current DPDK framework, each Rx queue is pre-loaded with mbufs to save incoming packets. For some PMDs, when the number of representors scales out in a switch domain, the memory consumption becomes significant. Polling all ports also leads to high cache miss rates, high latency and low throughput. This patch introduces shared Rx queue. Ports in the same Rx domain and switch domain can share an Rx queue set by specifying a non-zero sharing group in the Rx queue configuration. Shared Rx queue is identified by the share_rxq field of the Rx queue configuration. Port A RxQ X can share RxQ with Port B RxQ Y by using the same shared Rx queue ID. No special API is defined to receive packets from a shared Rx queue. Polling any member port of a shared Rx queue receives packets of that queue for all member ports; port_id is identified by mbuf->port. PMD is responsible for resolving the shared Rx queue from device and queue data. Shared Rx queue must be polled in the same thread or core; polling a queue ID of any member port is essentially the same. Multiple share groups are supported. PMD should support mixed configuration by allowing multiple share groups and non-shared Rx queues on one port. Example grouping and polling model to reflect service priority: Group1, 2 shared Rx queues per port: PF, rep0, rep1 Group2, 1 shared Rx queue per port: rep2, rep3, ... rep127 Core0: poll PF queue0 Core1: poll PF queue1 Core2: poll rep2 queue0 PMD advertises shared Rx queue capability via RTE_ETH_DEV_CAPA_RXQ_SHARE. PMD is responsible for shared Rx queue consistency checks to avoid member ports' configurations contradicting each other. Signed-off-by: Xueming Li <xuemingl@nvidia.com> Reviewed-by: Andrew Rybchenko <andrew.rybchenko@oktetlabs.ru> Acked-by: Ajit Khaparde <ajit.khaparde@broadcom.com>	2021-10-22 00:08:50 +02:00
Huisong Li	17faaed854	ethdev: fix PCI device release in secondary process In secondary process, rte_eth_dev_close() doesn't clear eth_dev->data. If calling rte_dev_remove() after rte_eth_dev_close(), in rte_eth_dev_pci_generic_remove() function, the released eth device still can be found by its name in shared memory. As a result, the eth device will be released repeatedly. The state of the eth device is modified to RTE_ETH_DEV_UNUSED after rte_eth_dev_close(). So this state can be used to avoid this problem. Fixes: `dcd5c8112b` ("ethdev: add PCI driver helpers") Cc: stable@dpdk.org Signed-off-by: Huisong Li <lihuisong@huawei.com> Reviewed-by: Ferruh Yigit <ferruh.yigit@intel.com>	2021-10-21 23:15:34 +02:00
Xuan Ding	7c61fa08b7	vhost: enable IOMMU for async vhost The use of IOMMU has many advantages, such as isolation and address translation. This patch extends the capability of the DMA engine to use IOMMU if the DMA engine is bound to vfio. When the memory table is set, the guest memory will be mapped into the default container of DPDK. Signed-off-by: Xuan Ding <xuan.ding@intel.com> Tested-by: Yvonne Yang <yvonnex.yang@intel.com> Reviewed-by: Maxime Coquelin <maxime.coquelin@redhat.com>	2021-10-21 14:24:21 +02:00
Xuan Ding	56259f7fc0	vfio: allow partially unmapping adjacent memory Currently, if we map a memory area A, then map a separate memory area B that by coincidence happens to be adjacent to A, current implementation will merge these two segments into one, and if partial unmapping is not supported, these segments will then be only allowed to be unmapped in one go. In other words, given segments A and B that are adjacent, it is currently not possible to map A, then map B, then unmap A. Fix this by adding a notion of "chunk size", which will allow subdividing segments into equally sized segments whenever we are dealing with an IOMMU that does not support partial unmapping. With this change, we will still be able to merge adjacent segments, but only if they are of the same size. If we keep with our above example, adjacent segments A and B will be stored as separate segments if they are of different sizes. Signed-off-by: Anatoly Burakov <anatoly.burakov@intel.com> Signed-off-by: Xuan Ding <xuan.ding@intel.com> Tested-by: Yvonne Yang <yvonnex.yang@intel.com> Reviewed-by: Maxime Coquelin <maxime.coquelin@redhat.com>	2021-10-21 14:24:21 +02:00
Li Feng	5a4fbe79e6	vhost: add sanity check on inflight last index The index in rte_vhost_set_last_inflight_io_split is from the frontend driver, check if it's in the virtqueue range. Fixes: `bb0c2de960` ("vhost: add APIs to operate inflight ring") Cc: stable@dpdk.org Signed-off-by: Li Feng <fengli@smartx.com> Reviewed-by: Maxime Coquelin <maxime.coquelin@redhat.com>	2021-10-21 14:24:21 +02:00
Jie Wang	3a929df1f2	ethdev: support L2TPv2 and PPP procotol Added flow pattern items and header formats of L2TPv2 and PPP. Signed-off-by: Wenjun Wu <wenjun1.wu@intel.com> Signed-off-by: Jie Wang <jie1x.wang@intel.com> Acked-by: Ori Kam <orika@nvidia.com> Acked-by: Andrew Rybchenko <andrew.rybchenko@oktetlabs.ru> Reviewed-by: Ferruh Yigit <ferruh.yigit@intel.com>	2021-10-21 14:15:59 +02:00
Andrew Rybchenko	55645ee65b	ethdev: remove full stop after short comments A full stop at the end of a short comment just makes the line longer. It should be either everywhere or nowhere to be consistent. Signed-off-by: Andrew Rybchenko <andrew.rybchenko@oktetlabs.ru> Reviewed-by: Ferruh Yigit <ferruh.yigit@intel.com>	2021-10-21 13:43:56 +02:00
Andrew Rybchenko	cc0a644450	ethdev: make device and data structures readable Add empty lines to separate fields commented using different styles. Signed-off-by: Andrew Rybchenko <andrew.rybchenko@oktetlabs.ru> Reviewed-by: Ferruh Yigit <ferruh.yigit@intel.com>	2021-10-21 13:43:56 +02:00
Andrew Rybchenko	32ec9c6be7	ethdev: remove reserved fields from internal structures Fixes: `f9bdee267a` ("ethdev: hide internal structures") Signed-off-by: Andrew Rybchenko <andrew.rybchenko@oktetlabs.ru> Reviewed-by: Ferruh Yigit <ferruh.yigit@intel.com>	2021-10-21 13:43:56 +02:00
Andrew Rybchenko	bf73419d96	ethdev: fix EEPROM spelling Signed-off-by: Andrew Rybchenko <andrew.rybchenko@oktetlabs.ru> Reviewed-by: Ferruh Yigit <ferruh.yigit@intel.com>	2021-10-21 13:43:56 +02:00
Andrew Rybchenko	5906be5af6	ethdev: fix ID spelling in comments and log messages Signed-off-by: Andrew Rybchenko <andrew.rybchenko@oktetlabs.ru> Acked-by: Ori Kam <orika@nvidia.com> Reviewed-by: Ferruh Yigit <ferruh.yigit@intel.com>	2021-10-21 13:43:56 +02:00
Andrew Rybchenko	5b49ba658b	ethdev: fix VLAN spelling including VLAN ID case Signed-off-by: Andrew Rybchenko <andrew.rybchenko@oktetlabs.ru> Acked-by: Ori Kam <orika@nvidia.com> Reviewed-by: Ferruh Yigit <ferruh.yigit@intel.com>	2021-10-21 13:43:56 +02:00
Andrew Rybchenko	064e90c419	ethdev: fix DCB and VMDq spelling Fix both in one changeset since they share line in a number of cases. Signed-off-by: Andrew Rybchenko <andrew.rybchenko@oktetlabs.ru> Reviewed-by: Ferruh Yigit <ferruh.yigit@intel.com>	2021-10-21 13:43:56 +02:00
Andrew Rybchenko	0d9f56a857	ethdev: fix Ethernet spelling Signed-off-by: Andrew Rybchenko <andrew.rybchenko@oktetlabs.ru> Reviewed-by: Ferruh Yigit <ferruh.yigit@intel.com>	2021-10-21 13:43:56 +02:00
Andrew Rybchenko	09fd42275b	ethdev: fix Rx/Tx spelling Fix it everywhere in ethdev including log messages. Signed-off-by: Andrew Rybchenko <andrew.rybchenko@oktetlabs.ru> Reviewed-by: Ferruh Yigit <ferruh.yigit@intel.com>	2021-10-21 13:43:56 +02:00
Andrew Rybchenko	3c2ca0a982	ethdev: avoid documentation in next lines Documentation in the next separate line is confusing. If documentation requires its own line it should be before, not after. Move documentation to the previous line if documentation on the same line makes it too long. Fix a number of incorrect markups on the way. When a line is touched by the patch anyway, do other cosmetic changes to avoid changes in next patches. Signed-off-by: Andrew Rybchenko <andrew.rybchenko@oktetlabs.ru> Acked-by: Ori Kam <orika@nvidia.com> Reviewed-by: Ferruh Yigit <ferruh.yigit@intel.com>	2021-10-21 13:43:56 +02:00
Zhihong Peng	6ad06203a5	cmdline: free on exit cl is allocated in the cmdline_stdin_new function, so it is logical to release it in the cmdline_stdin_exit function, so that cl does not have to be released separately. Fixes: `af75078fec` ("first public release") Signed-off-by: Zhihong Peng <zhihongx.peng@intel.com> Reviewed-by: Dmitry Kozlyuk <dmitry.kozliuk@gmail.com> Acked-by: Olivier Matz <olivier.matz@6wind.com> Tested-by: Zhihong Peng <zhihongx.peng@intel.com>	2021-10-22 23:32:00 +02:00
Dmitry Kozlyuk	f8f8dc2890	cmdline: make struct rdline opaque Hide struct rdline definition and some RDLINE_* constants in order to be able to change internal buffer sizes transparently to the user. Add new functions: * rdline_new(): allocate and initialize struct rdline. This function replaces rdline_init() and takes an extra parameter: opaque user data for the callbacks. * rdline_free(): deallocate struct rdline. * rdline_get_history_buffer_size(): for use in tests. * rdline_get_opaque(): to obtain user data in callback functions. Remove rdline_init() function from library headers and export list, because using it requires the knowledge of sizeof(struct rdline). Signed-off-by: Dmitry Kozlyuk <dmitry.kozliuk@gmail.com> Acked-by: Olivier Matz <olivier.matz@6wind.com> Acked-by: Narcisa Vasile <navasile@linux.microsoft.com>	2021-10-22 23:23:45 +02:00
Dmitry Kozlyuk	f43809d28c	cmdline: make struct cmdline opaque Remove the definition of `struct cmdline` from public header. Deprecation notice: https://mails.dpdk.org/archives/dev/2020-September/183310.html Signed-off-by: Dmitry Kozlyuk <dmitry.kozliuk@gmail.com> Acked-by: David Marchand <david.marchand@redhat.com> Acked-by: Olivier Matz <olivier.matz@6wind.com> Acked-by: Narcisa Vasile <navasile@linux.microsoft.com>	2021-10-22 22:44:18 +02:00
Gowrishankar Muthukrishnan	2f5c4025ab	mempool: add telemetry endpoint Add telemetry endpoint for mempool info. Signed-off-by: Gowrishankar Muthukrishnan <gmuthukrishn@marvell.com> Reviewed-by: Bruce Richardson <bruce.richardson@intel.com>	2021-10-22 22:40:59 +02:00
Bruce Richardson	b1094939a5	build/windows: remove separate list of libs Rather than maintaining a separate list of libraries which are to be built on windows, use the standard library list and explicitly add to each library that is not to be built a check for windows and disable the library at that per-lib level. As well as shortening the main lib/meson.build file, this also leads to the build summary at the end of the meson config run correctly listing the libraries which are not to be built. Signed-off-by: Bruce Richardson <bruce.richardson@intel.com>	2021-10-22 22:40:59 +02:00
Bruce Richardson	fed600889d	dmadev: enable build on Windows The dmadev library was not added to the list of libraries built on Windows, meaning it was skipped in those builds and also that none of the drivers were being considered for build. Adding dmadev to the list fixes this, and also enables the skeleton dmadev driver to be built, albeit with a small fix necessary. Signed-off-by: Bruce Richardson <bruce.richardson@intel.com> Reviewed-by: Chengwen Feng <fengchengwen@huawei.com> Tested-by: Conor Walsh <conor.walsh@intel.com>	2021-10-22 22:40:59 +02:00
David Marchand	223e0f7244	dmadev: remove symbol versioning for inline helpers Inline helpers have no global symbols in shared libraries. There is no reason to ask for versioning (plus this library would not build on Windows). Fixes: `91e581e5c9` ("dmadev: add data plane API") Fixes: `ea8cf0f853` ("dmadev: add burst capacity API") Signed-off-by: David Marchand <david.marchand@redhat.com> Acked-by: Bruce Richardson <bruce.richardson@intel.com>	2021-10-22 22:40:59 +02:00
Stephen Hemminger	10f726efe2	pdump: support pcapng and filtering This enhances the DPDK pdump library to support new pcapng format and filtering via BPF. The internal client/server protocol is changed to support two versions: the original pdump basic version and a new pcapng version. The internal version number (not part of exposed API or ABI) is intentionally increased to cause any attempt to try mismatched primary/secondary process to fail. Add new API to allow filtering of captured packets with DPDK BPF (eBPF) filter program. It keeps statistics on packets captured, filtered, and missed (because ring was full). Signed-off-by: Stephen Hemminger <stephen@networkplumber.org> Acked-by: Reshma Pattan <reshma.pattan@intel.com> Acked-by: Ray Kinsella <mdr@ashroe.eu>	2021-10-22 22:07:48 +02:00
Stephen Hemminger	745b7587f9	bpf: add function to dump eBPF instructions When debugging converted (and other) programs it is useful to see disassembled eBPF output. Signed-off-by: Stephen Hemminger <stephen@networkplumber.org> Acked-by: Konstantin Ananyev <konstantin.ananyev@intel.com> Acked-by: Ray Kinsella <mdr@ashroe.eu>	2021-10-22 22:07:48 +02:00
Stephen Hemminger	2eccf6afbe	bpf: add function to convert classic BPF to DPDK BPF The pcap library emits classic BPF (32 bit) and is useful for creating filter programs. The DPDK BPF library only implements extended BPF (eBPF). Add a function to convert from old to new. The rte_bpf_convert function uses rte_malloc to put the resulting program in hugepage shared memory so it can be passed from a secondary process to a primary process. The code to convert was originally done as part of the Linux kernel implementation then converted to a userspace program. See https://github.com/tklauser/filter2xdp Both authors have agreed that it is allowable to create a modified version of this code and license it with BSD license used by DPDK. Signed-off-by: Stephen Hemminger <stephen@networkplumber.org> Acked-by: Konstantin Ananyev <konstantin.ananyev@intel.com> Acked-by: Ray Kinsella <mdr@ashroe.eu>	2021-10-22 17:19:13 +02:00
Stephen Hemminger	80da61198b	bpf: allow self-xor operation Some BPF programs may use XOR of a register with itself as a way to zero the register in one instruction. The BPF filter converter generates this in the prolog to the generated code. The BPF validator would not allow this because the value of the register was undefined. But after this operation it is always zero. Fixes: `8021917293` ("bpf: add extra validation for input BPF program") Cc: stable@dpdk.org Acked-by: Konstantin Ananyev <konstantin.ananyev@intel.com> Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>	2021-10-22 17:19:13 +02:00
Stephen Hemminger	8d23ce8f5e	pcapng: add new library for writing pcapng files This is utility library for writing pcapng format files used by Wireshark family of utilities. Older tcpdump also knows how to read (but not write) this format. See https://github.com/pcapng/pcapng/ Signed-off-by: Stephen Hemminger <stephen@networkplumber.org> Acked-by: Reshma Pattan <reshma.pattan@intel.com> Acked-by: Ray Kinsella <mdr@ashroe.eu>	2021-10-22 17:19:07 +02:00
Stephen Hemminger	09644b58a1	pdump: disable on Windows The current version of the pdump library was building on Windows, but it was useless since the pdump utility was not being built and Windows does not have multi-process support. The new version of pdump with filtering now has dependency on bpf. But bpf library is not available on Windows. Signed-off-by: Stephen Hemminger <stephen@networkplumber.org> Acked-by: Dmitry Kozlyuk <dmitry.kozliuk@gmail.com>	2021-10-22 15:46:19 +02:00
Chengwen Feng	a188277d53	dmadev: fix debug build This patch fixes a compile error when RTE_DMADEV_DEBUG is enabled. Fixes: `ea8cf0f853` ("dmadev: add burst capacity API") Signed-off-by: Chengwen Feng <fengchengwen@huawei.com> Reviewed-by: David Marchand <david.marchand@redhat.com> Acked-by: Kevin Laatz <kevin.laatz@intel.com> Acked-by: Conor Walsh <conor.walsh@intel.com>	2021-10-21 22:10:22 +02:00
David Marchand	c61c8282ef	dmadev: hide devices array No need to expose rte_dma_devices out of the dmadev library. Existing helpers should be enough, and inlines make use of rte_dma_fp_objs. Signed-off-by: David Marchand <david.marchand@redhat.com> Reviewed-by: Chengwen Feng <fengchengwen@huawei.com> Tested-by: Conor Walsh <conor.walsh@intel.com> Acked-by: Kevin Laatz <kevin.laatz@intel.com>	2021-10-21 22:01:37 +02:00
Harry van Haaren	976329581d	eventdev: add usage hints to port configure API This commit introduces 3 flags to the port configuration flags. These flags allow the application to indicate what type of work is expected to be performed by an eventdev port. The three new flags are - RTE_EVENT_PORT_CFG_HINT_PRODUCER (mostly RTE_EVENT_OP_NEW events) - RTE_EVENT_PORT_CFG_HINT_CONSUMER (mostly RTE_EVENT_OP_RELEASE events) - RTE_EVENT_PORT_CFG_HINT_WORKER (mostly RTE_EVENT_OP_FORWARD events) These flags are only hints, and the PMDs must operate under the assumption that any port can enqueue an event with any type of op. Signed-off-by: Harry van Haaren <harry.van.haaren@intel.com> Acked-by: Jerin Jacob <jerinj@marvell.com>	2021-10-21 10:16:00 +02:00
Naga Harish K S V	81da8a5ff4	eventdev/eth_rx: fix WRR buffer overrun When a poll queue is removed from a rx_adapter instance, the WRR poll array is recomputed. The wrr array length is reduced in this case. The next wrr position to poll is stored in the wrr_pos variable of the rx_adapter instance. This wrr_pos can become invalid in some cases after wrr is recomputed. Using this variable to get the next queue and device pair may lead to wrr buffer overruns. Resetting the wrr_pos to zero after recomputation of the wrr array fixes the buffer overrun issue. Fixes: `9c38b704d2` ("eventdev: add eth Rx adapter implementation") Cc: stable@dpdk.org Signed-off-by: Naga Harish K S V <s.v.naga.harish.k@intel.com> Acked-by: Jay Jayatheerthan <jay.jayatheerthan@intel.com>	2021-10-21 10:16:00 +02:00
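The overrun fix described in the commit above can be illustrated with a minimal, self-contained sketch. The `wrr_state` struct and both helper functions are hypothetical stand-ins invented for illustration; they are not the adapter's real data structures.

```c
#include <stdint.h>

#define WRR_MAX 8u  /* illustrative capacity, not a DPDK constant */

/* Hypothetical, simplified stand-in for the adapter's WRR polling state. */
struct wrr_state {
	uint32_t sched[WRR_MAX]; /* queue to poll at each schedule slot */
	uint32_t len;            /* number of valid entries in sched[] */
	uint32_t pos;            /* next slot to poll */
};

/* Rebuild the schedule, e.g. after a poll queue was removed. */
static void wrr_recompute(struct wrr_state *w, const uint32_t *queues,
			  uint32_t n)
{
	if (n > WRR_MAX)
		n = WRR_MAX;
	for (uint32_t i = 0; i < n; i++)
		w->sched[i] = queues[i];
	w->len = n;
	/* A position saved for the old, longer array may now be out of
	 * range, so restart from slot zero. */
	w->pos = 0;
}

/* Return the next queue to poll and advance with wrap-around. */
static uint32_t wrr_next(struct wrr_state *w)
{
	uint32_t q = w->sched[w->pos];

	if (++w->pos >= w->len)
		w->pos = 0;
	return q;
}
```

Without the reset in `wrr_recompute()`, a position of 3 left over from a four-entry schedule would index past the end of a two-entry schedule on the next poll.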
Pavan Nikhilesh	fcf782051c	eventdev: mark trace variables as internal Mark rte_trace global variables as internal i.e. remove them from experimental section of version map. Some of them are used in inline APIs, mark those as global. Signed-off-by: Pavan Nikhilesh <pbhagavatula@marvell.com> Acked-by: Ray Kinsella <mdr@ashroe.eu>	2021-10-21 10:16:00 +02:00
Pavan Nikhilesh	f26f2ca657	eventdev: make trace API internal Slowpath trace APIs are only used in rte_eventdev.c so make them as internal. Signed-off-by: Pavan Nikhilesh <pbhagavatula@marvell.com> Acked-by: Jay Jayatheerthan <jay.jayatheerthan@intel.com> Acked-by: Abhinandan Gujjar <abhinandan.gujjar@intel.com>	2021-10-21 10:16:00 +02:00
Pavan Nikhilesh	68e9668a09	eventdev: promote event vector API to stable Promote event vector configuration APIs to stable. Signed-off-by: Pavan Nikhilesh <pbhagavatula@marvell.com> Acked-by: Jay Jayatheerthan <jay.jayatheerthan@intel.com> Acked-by: Ray Kinsella <mdr@ashroe.eu>	2021-10-21 10:16:00 +02:00
Pavan Nikhilesh	f3f3a91788	eventdev/timer: move adapters memory to hugepage Move memory used by timer adapters to hugepage. Allocate memory on the first adapter create or lookup to address both primary and secondary process usecases. This will prevent TLB misses if any and aligns to memory structure of other subsystems. Signed-off-by: Pavan Nikhilesh <pbhagavatula@marvell.com>	2021-10-21 10:16:00 +02:00
Pavan Nikhilesh	1dcd67ba1e	eventdev/timer: rearrange struct fields Rearrange fields in rte_event_timer data structure to remove holes. Also, remove use of volatile from rte_event_timer. Signed-off-by: Pavan Nikhilesh <pbhagavatula@marvell.com>	2021-10-21 10:14:50 +02:00
Pavan Nikhilesh	a256a743cf	eventdev: remove rte prefix for internal structs Remove rte_ prefix from rte_eth_event_enqueue_buffer, rte_event_eth_rx_adapter and rte_event_crypto_adapter as they are only used in rte_event_eth_rx_adapter.c and rte_event_crypto_adapter.c Signed-off-by: Pavan Nikhilesh <pbhagavatula@marvell.com> Acked-by: Jay Jayatheerthan <jay.jayatheerthan@intel.com> Acked-by: Abhinandan Gujjar <abhinandan.gujjar@intel.com>	2021-10-21 10:14:50 +02:00
Pavan Nikhilesh	53548ad300	eventdev: hide timer adapter PMD file Hide rte_event_timer_adapter_pmd.h file as it is an internal file. Remove rte_ prefix from rte_event_timer_adapter_ops structure. Signed-off-by: Pavan Nikhilesh <pbhagavatula@marvell.com>	2021-10-21 10:14:50 +02:00
Pavan Nikhilesh	295c053f90	eventdev: hide event device related structures Move rte_eventdev, rte_eventdev_data structures to eventdev_pmd.h. Signed-off-by: Pavan Nikhilesh <pbhagavatula@marvell.com> Acked-by: Harman Kalra <hkalra@marvell.com>	2021-10-21 10:14:50 +02:00
Pavan Nikhilesh	052e25d912	eventdev: use new API for inline functions Use new driver interface for the fastpath enqueue/dequeue inline functions. Signed-off-by: Pavan Nikhilesh <pbhagavatula@marvell.com> Acked-by: Jay Jayatheerthan <jay.jayatheerthan@intel.com> Acked-by: Abhinandan Gujjar <abhinandan.gujjar@intel.com>	2021-10-21 10:14:50 +02:00
Pavan Nikhilesh	d35e61322d	eventdev: move inline APIs into separate structure Move fastpath inline function pointers from rte_eventdev into a separate structure accessed via a flat array. The intention is to make rte_eventdev and related structures private to avoid future API/ABI breakages. Signed-off-by: Pavan Nikhilesh <pbhagavatula@marvell.com> Acked-by: Ray Kinsella <mdr@ashroe.eu>	2021-10-21 10:14:50 +02:00
Pavan Nikhilesh	9c67fcbfd6	eventdev: allocate max space for internal arrays Allocate max space for internal port, port config, queue config and link map arrays. Introduce new macro RTE_EVENT_MAX_PORTS_PER_DEV and set it to max possible value. This simplifies the port and queue reconfigure scenarios and will also allow inline functions to refer pointer to internal port data without extra checking of current number of configured queues. Signed-off-by: Pavan Nikhilesh <pbhagavatula@marvell.com>	2021-10-21 10:14:50 +02:00
Pavan Nikhilesh	26f14535ed	eventdev: separate internal structures Create rte_eventdev_core.h and move all the internal data structures to this file. These structures are mostly used by drivers, but they need to be in the public header file as they are accessed by datapath inline functions for performance reasons. The accessibility of these data structures is not changed. Signed-off-by: Pavan Nikhilesh <pbhagavatula@marvell.com>	2021-10-21 10:14:50 +02:00
Pavan Nikhilesh	23d06e3766	eventdev: make driver interface as internal Mark all the driver specific functions as internal, remove `rte` prefix from `struct rte_eventdev_ops`. Remove experimental tag from internal functions. Remove `eventdev_pmd.h` from non-internal header files. Signed-off-by: Pavan Nikhilesh <pbhagavatula@marvell.com> Acked-by: Hemant Agrawal <hemant.agrawal@nxp.com>	2021-10-21 10:14:50 +02:00
Ganapati Kundapura	814d017093	eventdev/eth_rx: support telemetry Added telemetry callbacks to get Rx adapter stats, reset stats and to get Rx queue config information. Signed-off-by: Ganapati Kundapura <ganapati.kundapura@intel.com> Acked-by: Jay Jayatheerthan <jay.jayatheerthan@intel.com> Acked-by: Naga Harish K S V <s.v.naga.harish.k@intel.com> Acked-by: Jerin Jacob <jerinj@marvell.com>	2021-10-21 10:14:50 +02:00
Naga Harish K S V	b06bca69b7	eventdev/eth_rx: add per-queue event buffer Added per queue buffer. To configure per queue event buffer size, application sets rte_event_eth_rx_adapter_params::use_queue_event_buf flag as true while using rte_event_eth_rx_adapter_create_with_params(). The per queue event buffer size is populated in rte_event_eth_rx_adapter_queue_conf::event_buf_size and passed to rte_event_eth_rx_adapter_queue_add(). Signed-off-by: Naga Harish K S V <s.v.naga.harish.k@intel.com> Acked-by: Jay Jayatheerthan <jay.jayatheerthan@intel.com>	2021-10-21 10:14:50 +02:00
Naga Harish K S V	bc0df25c83	eventdev/eth_rx: add event buffer size configurability Currently the event buffer is a static array with a default size defined internally. To configure the event buffer size from the application, the rte_event_eth_rx_adapter_create_with_params() API is added, which takes struct rte_event_eth_rx_adapter_params to configure the event buffer size in addition to other params. The event buffer size is rounded up for better buffer utilization and performance. In case of a NULL params argument, the default event buffer size is used. Signed-off-by: Naga Harish K S V <s.v.naga.harish.k@intel.com> Signed-off-by: Ganapati Kundapura <ganapati.kundapura@intel.com> Acked-by: Jay Jayatheerthan <jay.jayatheerthan@intel.com> Acked-by: Jerin Jacob <jerinj@marvell.com>	2021-10-21 10:14:50 +02:00
Ganapati Kundapura	da781e6488	eventdev/eth_rx: support Rx queue config get Added rte_event_eth_rx_adapter_queue_conf_get() API to get rx queue information - event queue identifier, flags for handling received packets, scheduler type, event priority, polling frequency of the receive queue and flow identifier in rte_event_eth_rx_adapter_queue_conf structure Signed-off-by: Ganapati Kundapura <ganapati.kundapura@intel.com> Acked-by: Jay Jayatheerthan <jay.jayatheerthan@intel.com> Acked-by: Jerin Jacob <jerinj@marvell.com>	2021-10-21 10:14:50 +02:00
Ganapati Kundapura	83ab470d12	eventdev/eth_rx: use timestamp as dynamic mbuf field Add support to register timestamp dynamic field in mbuf. Update the timestamp in mbuf for each packet before enqueuing to event device if the timestamp is not already set. Adding the timestamp in Rx adapter avoids additional latency due to the event device. Signed-off-by: Ganapati Kundapura <ganapati.kundapura@intel.com> Acked-by: Jerin Jacob <jerinj@marvell.com>	2021-10-21 10:14:50 +02:00
Pavan Nikhilesh	929ebdd543	eventdev/eth_rx: simplify event vector config Include vector configuration into the structure ``rte_event_eth_rx_adapter_queue_conf`` that is used to configure Rx adapter ethernet device Rx queue parameters. This simplifies event vector configuration as it avoids splitting configuration per Rx queue. Signed-off-by: Pavan Nikhilesh <pbhagavatula@marvell.com> Acked-by: Jay Jayatheerthan <jay.jayatheerthan@intel.com> Acked-by: Ray Kinsella <mdr@ashroe.eu> Acked-by: Jerin Jacob <jerinj@marvell.com>	2021-10-21 10:14:50 +02:00
Shijith Thotton	e3f128dbee	eventdev/crypto: add cryptodev start in adapter spec Event crypto adapter spec does not mention about cryptodev start and stop. Cryptodev attached to the adapter should be started before calling crypto adapter start. Added the same in spec and test application. Signed-off-by: Shijith Thotton <sthotton@marvell.com> Acked-by: Akhil Goyal <gakhil@marvell.com>	2021-10-21 10:14:50 +02:00
Ganapati Kundapura	8113fd15e2	eventdev/eth_rx: make enqueue buffer circular Rx adapter uses memmove() to move unprocessed events to the beginning of the packet enqueue buffer. The use of memmove() was found to consume a good amount of CPU cycles (about 20%). This patch removes the use of memmove() by implementing a circular buffer to avoid copying of data. With this change the Rx adapter is able to fill the buffer of 16384 events. Signed-off-by: Ganapati Kundapura <ganapati.kundapura@intel.com> Acked-by: Jerin Jacob <jerinj@marvell.com>	2021-10-21 10:14:49 +02:00
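The idea of replacing the memmove() compaction with a circular buffer can be sketched as follows. This is a minimal illustration only: the `evbuf` type, its eight-entry capacity and the helper names are assumptions, and events are modelled as plain integers.

```c
#include <stdint.h>

#define EVBUF_SIZE 8u /* illustrative capacity; must be a power of two */

/* Hypothetical circular enqueue buffer; events are plain integers here. */
struct evbuf {
	uint32_t ev[EVBUF_SIZE];
	uint32_t head;  /* index of the oldest unprocessed event */
	uint32_t count; /* number of events currently stored */
};

/* Append one event at the tail; returns -1 when the buffer is full. */
static int evbuf_put(struct evbuf *b, uint32_t e)
{
	if (b->count == EVBUF_SIZE)
		return -1;
	b->ev[(b->head + b->count) & (EVBUF_SIZE - 1)] = e;
	b->count++;
	return 0;
}

/* Hand up to n events to the consumer. Leftover events stay where they
 * are; only the head index moves, so nothing is copied to the front. */
static uint32_t evbuf_flush(struct evbuf *b, uint32_t *out, uint32_t n)
{
	uint32_t done = 0;

	while (done < n && b->count > 0) {
		out[done++] = b->ev[b->head];
		b->head = (b->head + 1) & (EVBUF_SIZE - 1);
		b->count--;
	}
	return done;
}
```

After a partial flush, new events wrap around into the slots freed at the start of the array, which is what removes the need for compaction.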
Xueming Li	5adef306da	devargs: make bus optional Global devargs syntax is used as device iteration filter like "class=vdpa", a devargs without bus args is valid from parsing perspective. This patch makes bus args optional. Fixes: `d2a66ad794` ("bus: add device arguments name parsing") Signed-off-by: Xueming Li <xuemingl@nvidia.com> Reviewed-by: Gaetan Rivet <grive@u256.net>	2021-10-21 11:32:44 +02:00
Xueming Li	9a1a9e4a2d	devargs: support path value with global device syntax Slash is used to split global device arguments. To support path value which contains slash, this patch parses devargs by locating both slash and layer name key: bus=a,name=/some/path/class=b,k1=v1/driver=c,k2=v2 "/class=" and "/driver" are valid start of a layer. Signed-off-by: Xueming Li <xuemingl@nvidia.com> Reviewed-by: Gaetan Rivet <grive@u256.net>	2021-10-21 11:32:06 +02:00
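A minimal sketch of the layer-splitting idea from the commit above: rather than splitting on every slash, look for a slash that is immediately followed by a layer key. The helper names are hypothetical and this is not the DPDK parser itself; the sketch assumes both `/class=` and `/driver=` mark a layer start.

```c
#include <stddef.h>
#include <string.h>

/* Return the slash that starts the next layer, or NULL if none is left. */
static const char *next_layer(const char *s)
{
	const char *c = strstr(s, "/class=");
	const char *d = strstr(s, "/driver=");

	if (c == NULL)
		return d;
	if (d == NULL)
		return c;
	return c < d ? c : d;
}

/* Split devargs into at most max layers. start[i] points at layer i and
 * len[i] is its length; the input string is not modified. */
static int split_layers(const char *devargs, const char **start,
			size_t *len, int max)
{
	const char *s = devargs;
	int n = 0;

	while (s != NULL && n < max) {
		const char *sep = next_layer(s);

		start[n] = s;
		len[n] = sep != NULL ? (size_t)(sep - s) : strlen(s);
		n++;
		s = sep != NULL ? sep + 1 : NULL; /* skip the slash */
	}
	return n;
}
```

Applied to the string from the commit message, the slashes inside `/some/path` are left alone because neither is followed by `class=` or `driver=`.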
Olivier Matz	efc6f9104c	mbuf: fix reset on mbuf free m->nb_seg must be reset on mbuf free whatever the value of m->next, because it can happen that m->nb_seg is != 1. For instance in this case: m1 = rte_pktmbuf_alloc(mp); rte_pktmbuf_append(m1, 500); m2 = rte_pktmbuf_alloc(mp); rte_pktmbuf_append(m2, 500); rte_pktmbuf_chain(m1, m2); m0 = rte_pktmbuf_alloc(mp); rte_pktmbuf_append(m0, 500); rte_pktmbuf_chain(m0, m1); As rte_pktmbuf_chain() does not reset nb_seg in the initial m1 segment (this is not required), after this code the mbuf chain has 3 segments: - m0: next=m1, nb_seg=3 - m1: next=m2, nb_seg=2 - m2: next=NULL, nb_seg=1 Then, if this chain is split between m1 and m2, it results in 2 packets: - first packet - m0: next=m1, nb_seg=2 - m1: next=NULL, nb_seg=2 - second packet - m2: next=NULL, nb_seg=1 Freeing the first packet will not restore nb_seg=1 in the second segment. This is an issue because it is expected that mbufs stored in pool have their nb_seg field set to 1. Fixes: `8f094a9ac5` ("mbuf: set mbuf fields while in pool") Cc: stable@dpdk.org Signed-off-by: Olivier Matz <olivier.matz@6wind.com> Acked-by: Morten Brørup <mb@smartsharesystems.com> Acked-by: Ajit Khaparde <ajit.khaparde@broadcom.com> Acked-by: Konstantin Ananyev <konstantin.ananyev@intel.com> Tested-by: Ali Alnubani <alialnu@nvidia.com>	2021-10-21 11:18:54 +02:00
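The scenario in the commit above can be modelled without DPDK. The `seg` struct below is a hypothetical two-field stand-in for an mbuf, and `pkt_free()` only illustrates the "reset unconditionally" rule; it is not the library's free routine.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical minimal model of an mbuf segment; not the DPDK struct. */
struct seg {
	struct seg *next;
	uint16_t nb_seg;
};

/* Free every segment of a packet and restore the in-pool invariants
 * (next == NULL, nb_seg == 1) for each one. */
static void pkt_free(struct seg *m)
{
	while (m != NULL) {
		struct seg *next = m->next;

		m->next = NULL;
		/* Reset regardless of the value next had: a segment at
		 * the end of a split chain can have next == NULL and
		 * still carry a stale nb_seg. */
		m->nb_seg = 1;
		m = next;
	}
}
```

With the split chain from the message (m0: nb_seg=2, m1: next=NULL, nb_seg=2), freeing the first packet leaves both segments with nb_seg equal to 1.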
Honnappa Nagarahalli	f4acb429d0	hash: promote some functions to stable Promote APIs to stable. Signed-off-by: Honnappa Nagarahalli <honnappa.nagarahalli@arm.com> Reviewed-by: Dharmik Thakkar <dharmik.thakkar@arm.com> Acked-by: Yipeng Wang <yipeng1.wang@intel.com>	2021-10-21 09:46:47 +02:00
Honnappa Nagarahalli	0ff26704b4	ring: fix name size in ring structure Use correct define for the name array size. The change breaks ABI and hence cannot be backported to stable branches. Fixes: `38c9817ee1` ("mempool: adjust name size in related data types") Signed-off-by: Honnappa Nagarahalli <honnappa.nagarahalli@arm.com> Acked-by: Konstantin Ananyev <konstantin.ananyev@intel.com>	2021-10-21 09:32:04 +02:00
Thomas Monjalon	e1823e0842	ethdev: replace bit shifts with macros The macros RTE_BIT32 and RTE_BIT64 are used to replace bit shifts. The macro UINT64_C is also used to replace remaining occurrences of ULL. The bit shifts of ETH_RSS_LEVEL_* are kept for aesthetic reason. The API of rte_mtr and rte_tm is using enums for 64-bit variables. As they are enums, unsigned bit cannot be used. Signed-off-by: Thomas Monjalon <thomas@monjalon.net> Reviewed-by: Andrew Rybchenko <andrew.rybchenko@oktetlabs.ru>	2021-10-20 19:34:24 +02:00
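As a rough illustration of the replacement described above, the sketch below defines local `BIT32`/`BIT64` macros written in the style of RTE_BIT32 and RTE_BIT64. The local macros and the two flag functions are assumptions so that the snippet builds without DPDK headers.

```c
#include <stdint.h>

/* Local stand-ins in the style of RTE_BIT32/RTE_BIT64 (assumed shape). */
#define BIT32(nr) (UINT32_C(1) << (nr))
#define BIT64(nr) (UINT64_C(1) << (nr))

/* Before: open-coded shift with an ULL suffix. */
static uint64_t flag_shifted(void)
{
	return 1ULL << 40;
}

/* After: the same value expressed through the macro. */
static uint64_t flag_macro(void)
{
	return BIT64(40);
}
```

Both functions return the same value; the macro form states the intended width explicitly and avoids accidentally shifting a plain `int`.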
Andrew Rybchenko	febc855b35	ethdev: forbid closing started device Ethernet device must be stopped first before close in accordance with the documentation. Fixes: `980995f8cc` ("ethdev: improve API comments of close and detach functions") Cc: stable@dpdk.org Signed-off-by: Andrew Rybchenko <andrew.rybchenko@oktetlabs.ru> Acked-by: Thomas Monjalon <thomas@monjalon.net> Acked-by: Ajit Khaparde <ajit.khaparde@broadcom.com>	2021-10-20 19:24:22 +02:00
Viacheslav Ovsiienko	dc4d860e8a	ethdev: introduce configurable flexible item 1. Introduction and Retrospective Nowadays the networks are evolving fast and wide, the network structures are getting more and more complicated, the new application areas are emerging. To address these challenges the new network protocols are continuously being developed, considered by technical communities, adopted by industry and, eventually implemented in hardware and software. The DPDK framework follows the common trends and if we bother to glance at the RTE Flow API header we see the multiple new items were introduced during the last years since the initial release. The new protocol adoption and implementation process is not straightforward and takes time, the new protocol passes development, consideration, adoption, and implementation phases. The industry tries to mitigate and address the forthcoming network protocols, for example, many hardware vendors are implementing flexible and configurable network protocol parsers. As DPDK developers, could we anticipate the near future in the same fashion and introduce the similar flexibility in RTE Flow API? Let's check what we already have merged in our project, and we see the nice raw item (rte_flow_item_raw). At the first glance, it looks superior and we can try to implement a flow matching on the header of some relatively new tunnel protocol, say on the GENEVE header with variable length options. 
And, under further consideration, we run into the raw item limitations: - only fixed size network header can be represented - the entire network header pattern of fixed format (header field offsets are fixed) must be provided - the search for patterns is not robust (the wrong matches might be triggered), and actually is not supported by existing PMDs - no explicitly specified relations with preceding and following items - no tunnel hint support As a result, implementing the support for tunnel protocols like the aforementioned GENEVE with variable extra protocol options using the flow raw item becomes very complicated and would require multiple flows and multiple raw items chained in the same flow (by the way, there is no support found for chained raw items in implemented drivers). This RFC introduces the dedicated flex item (rte_flow_item_flex) to handle matches with existing and new network protocol headers in a unified fashion. 2. Flex Item Life Cycle Let's assume there are the requirements to support the new network protocol with RTE Flows. What is given within protocol specification: - header format - header length, (can be variable, depending on options) - potential presence of extra options following or included in the header - the relations with preceding protocols. For example, the GENEVE follows UDP, eCPRI can follow either UDP or L2 header - the relations with following protocols. For example, the next layer after tunnel header can be L2 or L3 - whether the new protocol is a tunnel and the header is a splitting point between outer and inner layers The supposed way to operate with flex item: - application defines the header structures according to protocol specification - application calls rte_flow_flex_item_create() with desired configuration according to the protocol specification, it creates the flex item object over specified ethernet device and prepares PMD and underlying hardware to handle flex item. 
On item creation call PMD backing the specified ethernet device returns the opaque handle identifying the object has been created - application uses the rte_flow_item_flex with obtained handle in the flows, the values/masks to match with fields in the header are specified in the flex item per flow as for regular items (except that pattern buffer combines all fields) - flows with flex items match with packets in a regular fashion, the values and masks for the new protocol header match are taken from the flex items in the flows - application destroys flows with flex items - application calls rte_flow_flex_item_release() as part of ethernet device API and destroys the flex item object in PMD and releases the engaged hardware resources 3. Flex Item Structure The flex item structure is intended to be used as part of the flow pattern like regular RTE flow items and provides the mask and value to match with fields of the protocol item was configured for. struct rte_flow_item_flex { void *handle; uint32_t length; const uint8_t *pattern; }; The handle is some opaque object maintained on per device basis by underlying driver. The protocol header fields are considered as bit fields, all offsets and widths are expressed in bits. The pattern is the buffer containing the bit concatenation of all the fields presented at item configuration time, in the same order and same amount. If byte boundary alignment is needed an application can use a dummy type field, this is just some kind of gap filler. The length field specifies the pattern buffer length in bytes and is needed to allow rte_flow_copy() operations. The approach of multiple pattern pointers and lengths (per field) was considered and found clumsy - it seems to be much more suitable for the application to maintain the single structure within the single pattern buffer. 4. 
Flex Item Configuration The flex item configuration consists of the following parts: - header field descriptors: - next header - next protocol - sample to match - input link descriptors - output link descriptors The field descriptors tell the driver and hardware what data should be extracted from the packet and then control the packet handling in the flow engine. Besides this, sample fields can be presented to match with patterns in the flows. Each field is a bit pattern. It has width, offset from the header beginning, mode of offset calculation, and offset related parameters. The next header field is special, no data are actually taken from the packet, but its offset is used as a pointer to the next header in the packet, in other words the next header offset specifies the size of the header being parsed by flex item. There is one more special field - next protocol, it specifies where the next protocol identifier is contained and packet data sampled from this field will be used to determine the next protocol header type to continue packet parsing. The next protocol field is like eth_type field in MAC2, or proto field in IPv4/v6 headers. The sample fields are used to represent the data to be sampled from the packet and then matched with established flows. There are several methods supposed to calculate field offset in runtime depending on configuration and packet content: - FIELD_MODE_FIXED - fixed offset. The bit offset from header beginning is permanent and defined by field_base configuration parameter. - FIELD_MODE_OFFSET - the field bit offset is extracted from other header field (indirect offset field). The resulting field offset to match is calculated as: field_base + (*offset_base & offset_mask) << offset_shift This mode is useful to sample some extra options following the main header with field containing main header length. 
Also, this mode can be used to calculate offset to the next protocol header, for example - IPv4 header contains the 4-bit field with IPv4 header length expressed in dwords. One more example - this mode would allow us to skip GENEVE header variable length options. - FIELD_MODE_BITMASK - the field bit offset is extracted from other header field (indirect offset field), the latter is considered as bitmask containing some number of one bits, the resulting field offset to match is calculated as: field_base + bitcount(*offset_base & offset_mask) << offset_shift This mode would be useful to skip the GTP header and its extra options with specified flags. - FIELD_MODE_DUMMY - dummy field, optionally used for byte boundary alignment in pattern. Pattern mask and data are ignored in the match. All configuration parameters besides field size and offset are ignored. Note: "*" - means the indirect field offset is calculated and actual data are extracted from the packet by this offset (like data are fetched by pointer *p from memory). The offset mode list can be extended by vendors according to hardware supported options. The input link configuration section tells the driver after what protocols and at what conditions the flex item can follow. Input link specified the preceding header pattern, for example for GENEVE it can be UDP item specifying match on destination port with value 6081. The flex item can follow multiple header types and multiple input links should be specified. At flow creation time the item with one of the input link types should precede the flex item and driver will select the correct flex item settings, depending on the actual flow pattern. The output link configuration section tells the driver how to continue packet parsing after the flex item protocol. 
If multiple protocols can follow the flex item header the flex item should contain the field with the next protocol identifier and the parsing will be continued depending on the data contained in this field in the actual packet. The flex item fields can participate in RSS hash calculation, the dedicated flag is present in the field description to specify what fields should be provided for hashing. 5. Flex Item Chaining If there are multiple protocols supposed to be supported with flex items in chained fashion - two or more flex items within the same flow and these ones might be neighbors in the pattern, it means the flex items are mutual referencing. In this case, the item that occurred first should be created with empty output link list or with the list including existing items, and then the second flex item should be created referencing the first flex item as input arc, drivers should adjust the item configuration. Also, the hardware resources used by flex items to handle the packet can be limited. If there are multiple flex items that are supposed to be used within the same flow it would be nice to provide some hint for the driver that these two or more flex items are intended for simultaneous usage. The fields of items should be assigned with hint indices and these indices from two or more flex items supposed to be provided within the same flow should be the same as well. In other words, the field hint index specifies the group of fields that can be matched simultaneously within a single flow. If hint indices are specified, the driver will try to engage not overlapping hardware resources and provide independent handling of the field groups with unique indices. If the hint index is zero the driver assigns resources on its own. 6. Example of New Protocol Handling Let's suppose we have the requirements to handle the new tunnel protocol that follows UDP header with destination port 0xFADE and is followed by MAC header. 
Let the new protocol header format be like this: struct new_protocol_header { rte_be32 header_length; /* length in dwords, including options */ rte_be32 specific0; /* some protocol data, no intention */ rte_be32 specific1; /* to match in flows on these fields */ rte_be32 crucial; /* data of interest, match is needed */ rte_be32 options[0]; /* optional protocol data, variable length */ }; The supposed flex item configuration: struct rte_flow_item_flex_field field0 = { .field_mode = FIELD_MODE_DUMMY, /* Affects match pattern only */ .field_size = 96, /* three dwords from the beginning */ }; struct rte_flow_item_flex_field field1 = { .field_mode = FIELD_MODE_FIXED, .field_size = 32, /* Field size is one dword */ .field_base = 96, /* Skip three dwords from the beginning */ }; struct rte_flow_item_udp spec0 = { .hdr = { .dst_port = RTE_BE16(0xFADE), } }; struct rte_flow_item_udp mask0 = { .hdr = { .dst_port = RTE_BE16(0xFFFF), } }; struct rte_flow_item_flex_link link0 = { .item = { .type = RTE_FLOW_ITEM_TYPE_UDP, .spec = &spec0, .mask = &mask0, } }; struct rte_flow_item_flex_conf conf = { .next_header = { .tunnel = FLEX_TUNNEL_MODE_SINGLE, .field_mode = FIELD_MODE_OFFSET, .field_base = 0, .offset_base = 0, .offset_mask = 0xFFFFFFFF, .offset_shift = 2 /* Expressed in dwords, shift left by 2 */ }, .sample = { &field0, &field1, }, .nb_samples = 2, .input_link[0] = &link0, .nb_inputs = 1 }; Let's suppose we have created the flex item successfully, and PMD returned the handle 0x123456789A. 
We can use the following item pattern to match the crucial field in the packet with value 0x00112233: struct new_protocol_header spec_pattern = { .crucial = RTE_BE32(0x00112233), }; struct new_protocol_header mask_pattern = { .crucial = RTE_BE32(0xFFFFFFFF), }; struct rte_flow_item_flex spec_flex = { .handle = 0x123456789A, .length = sizeof(struct new_protocol_header), .pattern = &spec_pattern, }; struct rte_flow_item_flex mask_flex = { .length = sizeof(struct new_protocol_header), .pattern = &mask_pattern, }; struct rte_flow_item item_to_match = { .type = RTE_FLOW_ITEM_TYPE_FLEX, .spec = &spec_flex, .mask = &mask_flex, }; Signed-off-by: Viacheslav Ovsiienko <viacheslavo@nvidia.com> Acked-by: Ori Kam <orika@nvidia.com>	2021-10-20 18:58:54 +02:00
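The field offset modes listed in the flex item commit above can be sketched as a small standalone function. This is illustrative only: the enum, the function names and the `raw` parameter (the value already sampled from the indirect offset field) are hypothetical, and the sketch assumes the shift applies to the masked value before `field_base` is added, which the formulas in the message do not state explicitly.

```c
#include <stdint.h>

/* Hypothetical mirror of the offset modes named in the commit message. */
enum field_mode { MODE_FIXED, MODE_OFFSET, MODE_BITMASK };

/* Count the one bits in v. */
static uint32_t popcount32(uint32_t v)
{
	uint32_t n = 0;

	while (v != 0) {
		v &= v - 1; /* clear the lowest set bit */
		n++;
	}
	return n;
}

/* Compute a field bit offset from the header start.
 * raw is the value already read from the indirect offset field. */
static uint32_t field_bit_offset(enum field_mode mode, uint32_t field_base,
				 uint32_t raw, uint32_t offset_mask,
				 uint32_t offset_shift)
{
	switch (mode) {
	case MODE_OFFSET:
		return field_base + ((raw & offset_mask) << offset_shift);
	case MODE_BITMASK:
		return field_base +
		       (popcount32(raw & offset_mask) << offset_shift);
	case MODE_FIXED:
	default:
		return field_base;
	}
}
```

For example, a 4-bit length field holding 5 dwords with a shift of 5 (32 bits per dword) gives a bit offset of 160 in the offset mode, while the bitmask mode counts the set flag bits instead of using the value directly.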
Gregory Etelson	6cf7204733	ethdev: support flow elements with variable length Flow API provides RAW item type for packet patterns of variable length. The RAW item structure has fixed size members that describe the variable pattern length and methods to process it. There is the new Flow items with variable lengths coming - flex item. In order to handle this item (and potentially other new ones with variable pattern length) in flow copy and conversion routines the helper function is introduced. Signed-off-by: Gregory Etelson <getelson@nvidia.com> Reviewed-by: Viacheslav Ovsiienko <viacheslavo@nvidia.com> Acked-by: Ori Kam <orika@nvidia.com>	2021-10-20 18:53:46 +02:00
Ferruh Yigit	990912e676	ethdev: unify MTU checks Both 'rte_eth_dev_configure()' & 'rte_eth_dev_set_mtu()' sets MTU but have slightly different checks. Like one checks min MTU against RTE_ETHER_MIN_MTU and other RTE_ETHER_MIN_LEN. Checks moved into common function to unify the checks. Also this has benefit to have common error logs. Default 'dev_info->min_mtu' (the one set by ethdev if driver doesn't provide one), changed to ('RTE_ETHER_MIN_LEN' - overhead). Previously it was 'RTE_ETHER_MIN_MTU' which is min MTU for IPv4 packets. Since the intention is to provide min MTU corresponding minimum frame size, new default value suits better. Suggested-by: Huisong Li <lihuisong@huawei.com> Signed-off-by: Ferruh Yigit <ferruh.yigit@intel.com> Acked-by: Konstantin Ananyev <konstantin.ananyev@intel.com> Acked-by: Andrew Rybchenko <andrew.rybchenko@oktetlabs.ru>	2021-10-18 19:20:21 +02:00
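The relation between MTU, frame overhead and the new default minimum MTU described above can be sketched as follows. The constants are local stand-ins set to the standard Ethernet values (14-byte header, 4-byte CRC, 64-byte minimum frame), and the function names are hypothetical, not the ethdev helpers.

```c
#include <stdint.h>

/* Local stand-ins for the Ethernet constants used in the message. */
#define ETHER_HDR_LEN 14u /* destination MAC + source MAC + EtherType */
#define ETHER_CRC_LEN 4u
#define ETHER_MIN_LEN 64u /* minimum frame size, CRC included */
#define VLAN_TAG_LEN  4u

/* Per-device overhead: header, CRC and any supported VLAN tags. */
static uint32_t frame_overhead(uint32_t vlan_tags)
{
	return ETHER_HDR_LEN + ETHER_CRC_LEN + VLAN_TAG_LEN * vlan_tags;
}

/* Frame length on the wire for a given MTU. */
static uint32_t mtu_to_frame_len(uint32_t mtu, uint32_t overhead)
{
	return mtu + overhead;
}

/* Default minimum MTU: the minimum frame size minus the overhead. */
static uint32_t default_min_mtu(uint32_t overhead)
{
	return ETHER_MIN_LEN - overhead;
}

/* One shared range check, usable by both configure and set-MTU paths. */
static int mtu_is_valid(uint32_t mtu, uint32_t min_mtu, uint32_t max_mtu)
{
	return mtu >= min_mtu && mtu <= max_mtu;
}
```

With no VLAN tags the overhead is 18 bytes, so the default minimum MTU under this scheme is 46 and an MTU of 1500 corresponds to a 1518-byte frame.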
Ferruh Yigit	b563c14212	ethdev: remove jumbo offload flag Removing 'DEV_RX_OFFLOAD_JUMBO_FRAME' offload flag. Instead of drivers announcing this capability, the application can deduce the capability by checking reported 'dev_info.max_mtu' or 'dev_info.max_rx_pktlen'. And instead of the application setting this flag explicitly to enable jumbo frames, this can be deduced by the driver by comparing the requested 'mtu' to 'RTE_ETHER_MTU'. Removing this additional configuration for simplification. Suggested-by: Konstantin Ananyev <konstantin.ananyev@intel.com> Signed-off-by: Ferruh Yigit <ferruh.yigit@intel.com> Acked-by: Andrew Rybchenko <andrew.rybchenko@oktetlabs.ru> Reviewed-by: Rosen Xu <rosen.xu@intel.com> Acked-by: Somnath Kotur <somnath.kotur@broadcom.com> Acked-by: Konstantin Ananyev <konstantin.ananyev@intel.com> Acked-by: Huisong Li <lihuisong@huawei.com> Acked-by: Hyong Youb Kim <hyonkim@cisco.com> Acked-by: Michal Krawczyk <mk@semihalf.com>	2021-10-18 19:20:21 +02:00
Ferruh Yigit	f7e04f57ad	ethdev: move MTU set check to library Move requested MTU value check to the API to prevent the duplicated code. Signed-off-by: Ferruh Yigit <ferruh.yigit@intel.com> Reviewed-by: Andrew Rybchenko <andrew.rybchenko@oktetlabs.ru> Reviewed-by: Rosen Xu <rosen.xu@intel.com> Acked-by: Somnath Kotur <somnath.kotur@broadcom.com> Acked-by: Konstantin Ananyev <konstantin.ananyev@intel.com>	2021-10-18 19:20:21 +02:00
Ferruh Yigit	dd4e429c95	ethdev: move jumbo frame offload check to library Setting MTU bigger than RTE_ETHER_MTU requires the jumbo frame support, and application should enable the jumbo frame offload support for it. When jumbo frame offload is not enabled by application, but MTU bigger than RTE_ETHER_MTU is requested there are two options, either fail or enable jumbo frame offload implicitly. Enabling jumbo frame offload implicitly is selected by many drivers since setting a big MTU value already implies it, and this increases usability. This patch moves this logic from drivers to the library, both to reduce the duplicated code in the drivers and to make behaviour more visible. Signed-off-by: Ferruh Yigit <ferruh.yigit@intel.com> Reviewed-by: Andrew Rybchenko <andrew.rybchenko@oktetlabs.ru> Reviewed-by: Rosen Xu <rosen.xu@intel.com> Acked-by: Ajit Khaparde <ajit.khaparde@broadcom.com> Acked-by: Somnath Kotur <somnath.kotur@broadcom.com> Acked-by: Konstantin Ananyev <konstantin.ananyev@intel.com> Acked-by: Huisong Li <lihuisong@huawei.com>	2021-10-18 19:20:21 +02:00
Ferruh Yigit	1bb4a528c4	ethdev: fix max Rx packet length There is a confusion on setting max Rx packet length, this patch aims to clarify it. 'rte_eth_dev_configure()' API accepts max Rx packet size via 'uint32_t max_rx_pkt_len' field of the config struct 'struct rte_eth_conf'. Also 'rte_eth_dev_set_mtu()' API can be used to set the MTU, and result stored into '(struct rte_eth_dev)->data->mtu'. These two APIs are related but they work in a disconnected way, they store the set values in different variables which makes hard to figure out which one to use, also having two different method for a related functionality is confusing for the users. Other issues causing confusion is: * maximum transmission unit (MTU) is payload of the Ethernet frame. And 'max_rx_pkt_len' is the size of the Ethernet frame. Difference is Ethernet frame overhead, and this overhead may be different from device to device based on what device supports, like VLAN and QinQ. * 'max_rx_pkt_len' is only valid when application requested jumbo frame, which adds additional confusion and some APIs and PMDs already discards this documented behavior. * For the jumbo frame enabled case, 'max_rx_pkt_len' is an mandatory field, this adds configuration complexity for application. As solution, both APIs gets MTU as parameter, and both saves the result in same variable '(struct rte_eth_dev)->data->mtu'. For this 'max_rx_pkt_len' updated as 'mtu', and it is always valid independent from jumbo frame. For 'rte_eth_dev_configure()', 'dev->data->dev_conf.rxmode.mtu' is user request and it should be used only within configure function and result should be stored to '(struct rte_eth_dev)->data->mtu'. After that point both application and PMD uses MTU from this variable. When application doesn't provide an MTU during 'rte_eth_dev_configure()' default 'RTE_ETHER_MTU' value is used. Additional clarification done on scattered Rx configuration, in relation to MTU and Rx buffer size. 
MTU is used to configure the device for physical Rx/Tx size limitation, Rx buffer is where to store Rx packets, many PMDs use mbuf data buffer size as Rx buffer size. PMDs compare MTU against Rx buffer size to decide enabling scattered Rx or not. If scattered Rx is not supported by device, MTU bigger than Rx buffer size should fail. Signed-off-by: Ferruh Yigit <ferruh.yigit@intel.com> Acked-by: Ajit Khaparde <ajit.khaparde@broadcom.com> Acked-by: Somnath Kotur <somnath.kotur@broadcom.com> Acked-by: Huisong Li <lihuisong@huawei.com> Acked-by: Andrew Rybchenko <andrew.rybchenko@oktetlabs.ru> Acked-by: Konstantin Ananyev <konstantin.ananyev@intel.com> Acked-by: Rosen Xu <rosen.xu@intel.com> Acked-by: Hyong Youb Kim <hyonkim@cisco.com>	2021-10-18 19:20:20 +02:00
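The relationship between MTU and Ethernet frame length described in the commit above can be sketched as follows. This is a minimal illustration, not the ethdev implementation: the `SKETCH_` constants assume the usual Ethernet header (14 bytes), CRC (4 bytes) and VLAN tag (4 bytes) sizes, and the helper names and the scattered-Rx comparison are hypothetical. As the message notes, the real overhead varies from device to device.

```c
#include <stdbool.h>
#include <stdint.h>

/* Assumed sizes; a real PMD reports its own per-device overhead. */
#define SKETCH_ETHER_HDR_LEN 14u
#define SKETCH_ETHER_CRC_LEN 4u
#define SKETCH_VLAN_TAG_LEN  4u

/* Frame length = MTU (payload) + Ethernet overhead. */
static uint32_t sketch_frame_len(uint32_t mtu, uint32_t vlan_tags)
{
    return mtu + SKETCH_ETHER_HDR_LEN + SKETCH_ETHER_CRC_LEN +
           vlan_tags * SKETCH_VLAN_TAG_LEN;
}

/* Hypothetical check: a frame that does not fit into one Rx buffer
 * would need scattered Rx (or the configuration should fail). */
static bool sketch_needs_scattered_rx(uint32_t mtu, uint32_t vlan_tags,
                                      uint32_t rx_buf_size)
{
    return sketch_frame_len(mtu, vlan_tags) > rx_buf_size;
}
```

With these assumed sizes, an MTU of 1500 and no VLAN tags corresponds to a 1518-byte frame, which fits a 2048-byte Rx buffer; an MTU of 9000 does not.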
Georg Sauthoff	24f1955d1e	net: fix aliasing in checksum computation That means a superfluous cast is removed and aliasing through a uint8_t pointer is eliminated. NB: The C standard specifies that a unsigned char pointer may alias while the C standard doesn't include such requirement for uint8_t pointers. Also simplified the loop since a modern C compiler can speed up (i.e. auto-vectorize) it in a similar way. For example, GCC auto-vectorizes it for Haswell using AVX registers while halving the number of instructions in the generated code. Fixes: `6006818cfb` ("net: new checksum functions") Fixes: `e079655c41` ("net: fix build with gcc 4.4.7 and strict aliasing") Cc: stable@dpdk.org Signed-off-by: Georg Sauthoff <mail@gms.tf> Reviewed-by: Morten Brørup <mb@smartsharesystems.com> Acked-by: Olivier Matz <olivier.matz@6wind.com>	2021-10-18 18:15:58 +02:00
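The aliasing concern in the commit above can be illustrated with a small sketch. It is not the DPDK code: the function names are hypothetical, and it uses `memcpy` through an `unsigned char` pointer as one standard-conforming way to read 16-bit words without casting the buffer to a `uint16_t` pointer. The 32-bit accumulator is only adequate for short buffers.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Sum 16-bit words in host byte order; unsigned char may alias anything. */
static uint32_t sketch_raw_sum(const void *buf, size_t len)
{
    const unsigned char *p = buf;
    uint32_t sum = 0;
    uint16_t word;

    while (len >= sizeof(word)) {
        memcpy(&word, p, sizeof(word)); /* no uint16_t* cast needed */
        sum += word;
        p += sizeof(word);
        len -= sizeof(word);
    }
    if (len == 1) {
        word = 0;               /* pad the trailing byte with zero */
        memcpy(&word, p, 1);
        sum += word;
    }
    return sum;
}

/* Fold carries back in and complement (one's-complement checksum). */
static uint16_t sketch_cksum(const void *buf, size_t len)
{
    uint32_t sum = sketch_raw_sum(buf, len);

    sum = (sum & 0xffff) + (sum >> 16);
    sum = (sum & 0xffff) + (sum >> 16);
    return (uint16_t)~sum;
}
```

A simple loop of this shape leaves the compiler free to vectorize it, which is the simplification the commit message refers to.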
Jie Wang	632be32735	ethdev: add API to get device configuration The driver may change offloads info into dev->data->dev_conf in dev_configure which may cause apps use outdated values. Add a new API to get actual device configuration. Signed-off-by: Jie Wang <jie1x.wang@intel.com> Acked-by: Andrew Rybchenko <andrew.rybchenko@oktetlabs.ru> Reviewed-by: Ferruh Yigit <ferruh.yigit@intel.com>	2021-10-15 13:27:05 +02:00
Gowrishankar Muthukrishnan	58b43c1ddf	ethdev: add telemetry endpoint for device info Add telemetry endpoint /ethdev/info for device info. Signed-off-by: Gowrishankar Muthukrishnan <gmuthukrishn@marvell.com> Reviewed-by: Ferruh Yigit <ferruh.yigit@intel.com>	2021-10-14 23:44:53 +02:00
Gregory Etelson	63f2bbfa82	net: introduce IPv4 IHL and version fields RTE IPv4 header definition combines the `version' and `ihl' fields into a single structure member. This patch introduces dedicated structure members for both `version' and `ihl' IPv4 fields. Separated header fields definitions allow to create simplified code to match on the IHL value in a flow rule. The original `version_ihl' structure member is kept for backward compatibility. Signed-off-by: Gregory Etelson <getelson@nvidia.com> Acked-by: Olivier Matz <olivier.matz@6wind.com> Acked-by: Ray Kinsella <mdr@ashroe.eu> Acked-by: Ajit Khaparde <ajit.khaparde@broadcom.com>	2021-10-14 23:00:45 +02:00
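For context, the combined `version_ihl` byte mentioned in the commit above carries the version in its upper four bits and the IHL in its lower four. The helpers below are a hypothetical sketch of that split, not the DPDK header definition:

```c
#include <stdint.h>

/* Upper nibble: IP version (4 for IPv4). */
static uint8_t sketch_ipv4_version(uint8_t version_ihl)
{
    return version_ihl >> 4;
}

/* Lower nibble: header length in 32-bit words. */
static uint8_t sketch_ipv4_ihl(uint8_t version_ihl)
{
    return version_ihl & 0x0f;
}

/* Header length in bytes. */
static unsigned int sketch_ipv4_hdr_len(uint8_t version_ihl)
{
    return sketch_ipv4_ihl(version_ihl) * 4u;
}
```

For the common value `0x45` this gives version 4, IHL 5 and a 20-byte header.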
Viacheslav Ovsiienko	50cd0391a4	ethdev: add experimental comment for modify field action EXPERIMENTAL tag was missed in rte_flow_action_modify_data structure description. Fixes: `73b68f4c54` ("ethdev: introduce generic modify flow action") Cc: stable@dpdk.org Signed-off-by: Viacheslav Ovsiienko <viacheslavo@nvidia.com> Acked-by: Ori Kam <orika@nvidia.com> Acked-by: Andrew Rybchenko <andrew.rybchenko@oktetlabs.ru>	2021-10-14 14:34:31 +02:00
Viacheslav Ovsiienko	14fc81aed7	ethdev: update modify field flow action The generic modify field flow action introduced in [1] has some issues related to the immediate source operand: - immediate source can be presented either as an unsigned 64-bit integer or pointer to data pattern in memory. There was no explicit pointer field defined in the union. - the byte ordering for 64-bit integer was not specified. Many fields have shorter lengths and byte ordering is crucial. - how the bit offset is applied to the immediate source field was not defined and documented. - 64-bit integer size is not enough to provide IPv6 addresses. In order to cover the issues and exclude any ambiguities the following is done: - introduce the explicit pointer field in rte_flow_action_modify_data structure - replace the 64-bit unsigned integer with 16-byte array - update the modify field flow action documentation Appropriate deprecation notice has been removed. [1] commit `73b68f4c54` ("ethdev: introduce generic modify flow action") Signed-off-by: Viacheslav Ovsiienko <viacheslavo@nvidia.com> Acked-by: Ori Kam <orika@nvidia.com> Acked-by: Andrew Rybchenko <andrew.rybchenko@oktetlabs.ru>	2021-10-14 14:34:31 +02:00
Ivan Malov	1179f05cc9	ethdev: query proxy port to manage transfer flows Not all DPDK ports in a given switching domain may have the privilege to manage "transfer" flows. Add an API to find a port with sufficient privileges by any port in the domain. Signed-off-by: Ivan Malov <ivan.malov@oktetlabs.ru> Reviewed-by: Andrew Rybchenko <andrew.rybchenko@oktetlabs.ru> Acked-by: Ori Kam <orika@nvidia.com>	2021-10-14 13:42:59 +02:00
Ivan Malov	9d2a349b38	ethdev: deprecate direction attributes in transfer flows Attributes "ingress" and "egress" can only apply unambiguously to non-"transfer" flows. In "transfer" flows, the standpoint is effectively shifted to the embedded switch. There can be many different endpoints connected to the switch, so the use of "ingress" / "egress" does not shed light on which endpoints precisely can be considered as traffic sources. Add relevant deprecation notices and suggest the use of precise traffic source items (PORT_REPRESENTOR and REPRESENTED_PORT). Signed-off-by: Ivan Malov <ivan.malov@oktetlabs.ru> Acked-by: Ori Kam <orika@nvidia.com> Acked-by: Andrew Rybchenko <andrew.rybchenko@oktetlabs.ru> Acked-by: Viacheslav Ovsiienko <viacheslavo@nvidia.com>	2021-10-13 22:59:26 +02:00
Ivan Malov	5da44faa80	ethdev: deprecate hard-to-use or ambiguous items and actions PF, VF and PHY_PORT require that applications have extra knowledge of the underlying NIC and thus are hard to use. Also, the corresponding items depend on the direction attribute (ingress / egress), which complicates their use in applications and interpretation in PMDs. The concept of PORT_ID is ambiguous as it doesn't say whether the port in question is an ethdev or the represented entity. Items and actions PORT_REPRESENTOR, REPRESENTED_PORT should be used instead. Signed-off-by: Ivan Malov <ivan.malov@oktetlabs.ru> Acked-by: Ori Kam <orika@nvidia.com> Acked-by: Andrew Rybchenko <andrew.rybchenko@oktetlabs.ru>	2021-10-13 22:59:26 +02:00
Ivan Malov	88caad251c	ethdev: add represented port action to flow API For use in "transfer" flows. Supposed to send matching traffic to the entity represented by the given ethdev, at embedded switch level. Such an entity can be a network (via a network port), a guest machine (via a VF) or another ethdev in the same application. Signed-off-by: Ivan Malov <ivan.malov@oktetlabs.ru> Acked-by: Ori Kam <orika@nvidia.com> Acked-by: Andrew Rybchenko <andrew.rybchenko@oktetlabs.ru>	2021-10-13 22:59:26 +02:00
Ivan Malov	8edb6bc026	ethdev: add port representor action to flow API For use in "transfer" flows. Supposed to send matching traffic to the given ethdev (to the application), at embedded switch level. Signed-off-by: Ivan Malov <ivan.malov@oktetlabs.ru> Acked-by: Ori Kam <orika@nvidia.com> Acked-by: Andrew Rybchenko <andrew.rybchenko@oktetlabs.ru>	2021-10-13 22:59:26 +02:00
Ivan Malov	49863ae2bf	ethdev: add represented port item to flow API For use in "transfer" flows. Supposed to match traffic entering the embedded switch from the entity represented by the given ethdev. Such an entity can be a network (via a network port), a guest machine (via a VF) or another ethdev in the same application. Must not be combined with direction attributes. Signed-off-by: Ivan Malov <ivan.malov@oktetlabs.ru> Acked-by: Ori Kam <orika@nvidia.com> Acked-by: Andrew Rybchenko <andrew.rybchenko@oktetlabs.ru>	2021-10-13 22:59:26 +02:00
Ivan Malov	081e42dab1	ethdev: add port representor item to flow API For use in "transfer" flows. Supposed to match traffic entering the embedded switch from the given ethdev. Must not be combined with direction attributes. Signed-off-by: Ivan Malov <ivan.malov@oktetlabs.ru> Acked-by: Ori Kam <orika@nvidia.com> Acked-by: Andrew Rybchenko <andrew.rybchenko@oktetlabs.ru>	2021-10-13 22:59:25 +02:00
Konstantin Ananyev	f9bdee267a	ethdev: hide internal structures Move rte_eth_dev, rte_eth_dev_data, rte_eth_rxtx_callback and related data into private header (ethdev_driver.h). Few minor changes to keep DPDK building after that. Signed-off-by: Konstantin Ananyev <konstantin.ananyev@intel.com> Acked-by: Andrew Rybchenko <andrew.rybchenko@oktetlabs.ru> Reviewed-by: Ferruh Yigit <ferruh.yigit@intel.com> Tested-by: Feifei Wang <feifei.wang2@arm.com>	2021-10-13 22:14:59 +02:00
Konstantin Ananyev	27a300e6af	ethdev: add API to retrieve multiple MAC addresses Introduce rte_eth_macaddrs_get() to allow user to retrieve all ethernet addresses assigned to given port. Change testpmd to use this new function and avoid referencing directly rte_eth_devices[]. Signed-off-by: Konstantin Ananyev <konstantin.ananyev@intel.com> Reviewed-by: Andrew Rybchenko <andrew.rybchenko@oktetlabs.ru> Reviewed-by: Ferruh Yigit <ferruh.yigit@intel.com> Tested-by: Feifei Wang <feifei.wang2@arm.com>	2021-10-13 22:14:59 +02:00
Konstantin Ananyev	7a0935239b	ethdev: make fast-path functions to use new flat array Rework fast-path ethdev functions to use rte_eth_fp_ops[]. While it is an API/ABI breakage, this change is intended to be transparent for both users (no changes in user app is required) and PMD developers (no changes in PMD is required). One extra thing to note - RX/TX callback invocation will cause extra function call with these changes. That might cause some insignificant slowdown for code-path where RX/TX callbacks are heavily involved. Signed-off-by: Konstantin Ananyev <konstantin.ananyev@intel.com> Reviewed-by: Ferruh Yigit <ferruh.yigit@intel.com> Tested-by: Feifei Wang <feifei.wang2@arm.com>	2021-10-13 22:14:58 +02:00
Konstantin Ananyev	c87d435a4d	ethdev: copy fast-path API into separate structure Copy public function pointers (rx_pkt_burst(), etc.) and related pointers to internal data from rte_eth_dev structure into a separate flat array. That array will remain in a public header. The intention here is to make rte_eth_dev and related structures internal. That should allow future possible changes to core eth_dev structures to be transparent to the user and help to avoid ABI/API breakages. The plan is to keep minimal part of data from rte_eth_dev public, so we still can use inline functions for fast-path calls (like rte_eth_rx_burst(), etc.) to avoid/minimize slowdown. The whole idea beyond this new schema: 1. PMDs keep to setup fast-path function pointers and related data inside rte_eth_dev struct in the same way they did it before. 2. Inside rte_eth_dev_start() and inside rte_eth_dev_probing_finish() (for secondary process) we call eth_dev_fp_ops_setup, which copies these function and data pointers into rte_eth_fp_ops[port_id]. 3. Inside rte_eth_dev_stop() and inside rte_eth_dev_release_port() we call eth_dev_fp_ops_reset(), which resets rte_eth_fp_ops[port_id] into some dummy values. 4. fast-path ethdev API (rte_eth_rx_burst(), etc.) will use that new flat array to call PMD specific functions. That approach should allow us to make rte_eth_devices[] private without introducing regression and help to avoid changes in drivers code. Signed-off-by: Konstantin Ananyev <konstantin.ananyev@intel.com> Reviewed-by: Andrew Rybchenko <andrew.rybchenko@oktetlabs.ru> Reviewed-by: Ferruh Yigit <ferruh.yigit@intel.com> Tested-by: Feifei Wang <feifei.wang2@arm.com>	2021-10-13 22:14:58 +02:00
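The four-step scheme in the commit above can be sketched roughly as below. Every name here (`sketch_dev`, `sketch_fp_ops`, the setup/reset helpers, the toy driver) is hypothetical and heavily simplified; it only illustrates the idea of copying driver function pointers into a flat per-port array that inline fast-path wrappers then use.

```c
#include <stddef.h>
#include <stdint.h>

#define SKETCH_MAX_PORTS 4

typedef uint16_t (*sketch_rx_burst_t)(void *rxq, void **pkts, uint16_t n);

/* Stand-in for the (now internal) device structure filled in by a PMD. */
struct sketch_dev {
    sketch_rx_burst_t rx_pkt_burst;
    void *rxq;
};

/* Stand-in for one entry of the public flat array. */
struct sketch_fp_ops {
    sketch_rx_burst_t rx_pkt_burst;
    void *rxq;
};

static struct sketch_fp_ops sketch_fp_ops[SKETCH_MAX_PORTS];

/* Dummy used while a port is stopped: receives nothing. */
static uint16_t sketch_dummy_rx(void *rxq, void **pkts, uint16_t n)
{
    (void)rxq; (void)pkts; (void)n;
    return 0;
}

/* Toy driver function: pretends every requested packet arrived. */
static uint16_t sketch_toy_rx(void *rxq, void **pkts, uint16_t n)
{
    (void)rxq; (void)pkts;
    return n;
}

/* Steps 2 and 3: copy on start, reset to dummies on stop. */
static void sketch_fp_ops_setup(uint16_t port, const struct sketch_dev *dev)
{
    sketch_fp_ops[port].rx_pkt_burst = dev->rx_pkt_burst;
    sketch_fp_ops[port].rxq = dev->rxq;
}

static void sketch_fp_ops_reset(uint16_t port)
{
    sketch_fp_ops[port].rx_pkt_burst = sketch_dummy_rx;
    sketch_fp_ops[port].rxq = NULL;
}

/* Step 4: the fast-path wrapper only touches the flat array.
 * Entries must be reset once before first use (they start zeroed). */
static inline uint16_t sketch_rx_burst(uint16_t port, void **pkts, uint16_t n)
{
    const struct sketch_fp_ops *p = &sketch_fp_ops[port];

    return p->rx_pkt_burst(p->rxq, pkts, n);
}
```

Because the wrapper never dereferences the device structure, that structure can be moved to a private header without changing the inline fast path, which is the stated intention of the series.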
Konstantin Ananyev	8d7d4fcdca	ethdev: change input parameters for Rx queue count Currently majority of fast-path ethdev ops take pointers to internal queue data structures as an input parameter. While eth_rx_queue_count() takes a pointer to rte_eth_dev and queue index. For future work to hide rte_eth_devices[] and friends it would be plausible to unify parameters list of all fast-path ethdev ops. This patch changes eth_rx_queue_count() to accept pointer to internal queue data as input parameter. While this change is transparent to user, it still counts as an ABI change, as eth_rx_queue_count_t is used by ethdev public inline function rte_eth_rx_queue_count(). Signed-off-by: Konstantin Ananyev <konstantin.ananyev@intel.com> Reviewed-by: Andrew Rybchenko <andrew.rybchenko@oktetlabs.ru> Acked-by: Hyong Youb Kim <hyonkim@cisco.com> Reviewed-by: Ferruh Yigit <ferruh.yigit@intel.com> Tested-by: Feifei Wang <feifei.wang2@arm.com>	2021-10-13 22:14:58 +02:00
Konstantin Ananyev	c024496ae8	ethdev: allocate max space for internal queue array At queue configure stage always allocate space for maximum possible number (RTE_MAX_QUEUES_PER_PORT) of queue pointers. That will allow 'fast' inline functions (eth_rx_burst, etc.) to refer pointer to internal queue data without extra checking of current number of configured queues. That would help in future to hide rte_eth_dev and related structures. It means that from now on, each ethdev port will always consume: ((2*sizeof(uintptr_t))*RTE_MAX_QUEUES_PER_PORT) bytes of memory for its queue pointers. With RTE_MAX_QUEUES_PER_PORT==1024 (default value) it is 16KB per port. Signed-off-by: Konstantin Ananyev <konstantin.ananyev@intel.com> Reviewed-by: Andrew Rybchenko <andrew.rybchenko@oktetlabs.ru> Reviewed-by: Ferruh Yigit <ferruh.yigit@intel.com> Tested-by: Feifei Wang <feifei.wang2@arm.com>	2021-10-13 22:14:58 +02:00
Ivan Malov	f6d8a6d3fa	ethdev: negotiate delivery of packet metadata from HW to PMD Provide an API to let the application control the NIC's ability to deliver specific kinds of per-packet metadata to the PMD. Checks for the NIC's ability to set these kinds of metadata in the first place (support for the flow actions) belong in flow API responsibility domain (flow validate mechanism). This topic is out of scope of the new API in question. The PMD's ability to deliver received metadata to the user by virtue of mbuf fields should be covered by mbuf library. It is also out of scope of the new API in question. Signed-off-by: Ivan Malov <ivan.malov@oktetlabs.ru> Reviewed-by: Andrew Rybchenko <andrew.rybchenko@oktetlabs.ru> Reviewed-by: Andy Moreton <amoreton@xilinx.com> Acked-by: Ray Kinsella <mdr@ashroe.eu> Acked-by: Jerin Jacob <jerinj@marvell.com> Acked-by: Ajit Khaparde <ajit.khaparde@broadcom.com> Acked-by: Somnath Kotur <somnath.kotur@broadcom.com> Acked-by: Ori Kam <orika@nvidia.com> Acked-by: Wisam Jaddo <wisamm@nvidia.com>	2021-10-13 00:47:42 +02:00
Andrew Rybchenko	92ef4b8f16	ethdev: remove deprecated shared counter attribute Indirect actions should be used to do shared counters. Signed-off-by: Andrew Rybchenko <andrew.rybchenko@oktetlabs.ru> Acked-by: Thomas Monjalon <thomas@monjalon.net> Acked-by: Ajit Khaparde <ajit.khaparde@broadcom.com> Acked-by: Somnath Kotur <somnath.kotur@broadcom.com> Acked-by: Ori Kam <orika@nvidia.com> Acked-by: Matan Azrad <matan@nvidia.com>	2021-10-12 19:20:57 +02:00
Viacheslav Galaktionov	ff4e52efb3	ethdev: fix representor port ID search by name The patch is required for all PMDs which do not provide representors info on the representor itself. The function, rte_eth_representor_id_get(), is used in eth_representor_cmp() which is required in ethdev class iterator to search ethdev port ID by name (representor case). Before the patch the function is called on the representor itself and tries to get representors info to match. Search of port ID by name is used after hotplug to find out port ID of the just plugged device. Getting a list of representors from a representor does not make sense. Instead, a backer device should be used. To this end, extend the rte_eth_dev_data structure to include the port ID of the backing device for representors. Signed-off-by: Viacheslav Galaktionov <viacheslav.galaktionov@oktetlabs.ru> Signed-off-by: Andrew Rybchenko <andrew.rybchenko@oktetlabs.ru> Acked-by: Haiyue Wang <haiyue.wang@intel.com> Acked-by: Beilei Xing <beilei.xing@intel.com> Reviewed-by: Xueming Li <xuemingl@nvidia.com> Acked-by: Viacheslav Ovsiienko <viacheslavo@nvidia.com>	2021-10-12 16:54:20 +02:00
Andrew Rybchenko	6c31a8c20a	ethdev: remove legacy Rx descriptor done API rte_eth_rx_descriptor_status() should be used as a replacement. Signed-off-by: Andrew Rybchenko <andrew.rybchenko@oktetlabs.ru> Reviewed-by: Ferruh Yigit <ferruh.yigit@intel.com> Acked-by: Thomas Monjalon <thomas@monjalon.net>	2021-10-11 16:44:57 +02:00
Akhil Goyal	fb545457ed	security: add reserved bit fields In struct rte_security_ipsec_sa_options, every new option added causes an ABI breakage. To avoid this, a reserved_opts bitfield is added for the remaining bits available in the structure. Now, for every new SA option, reserved_opts can be reduced and the new option added. Signed-off-by: Akhil Goyal <gakhil@marvell.com> Acked-by: Konstantin Ananyev <konstantin.ananyev@intel.com> Acked-by: Ray Kinsella <mdr@ashroe.eu>	2021-10-18 20:12:19 +02:00
Akhil Goyal	3867ed0280	security: hide internal API rte_security_dynfield_register() is an internal API to be used by the driver, hence moving it to internal. Signed-off-by: Akhil Goyal <gakhil@marvell.com> Acked-by: Konstantin Ananyev <konstantin.ananyev@intel.com> Acked-by: Ray Kinsella <mdr@ashroe.eu>	2021-10-18 20:12:19 +02:00
Nicolas Chautru	ab4e19097b	bbdev: add device info for data endianness Added device information to capture explicitly the assumption of the input/output data byte endianness being processed. Signed-off-by: Nicolas Chautru <nicolas.chautru@intel.com> Signed-off-by: Nipun Gupta <nipun.gupta@nxp.com> Acked-by: Akhil Goyal <gakhil@marvell.com>	2021-10-18 20:11:16 +02:00
Gagandeep Singh	8edcb68fd0	cryptodev: fix multi-segment raw vector processing If no next segment available the “for” loop will fail and it still returns i+1 i.e. 2, which is wrong as it has filled only 1 buffer. Fixes: `7adf992fb9` ("cryptodev: introduce CPU crypto API") Cc: stable@dpdk.org Signed-off-by: Gagandeep Singh <g.singh@nxp.com> Acked-by: Konstantin Ananyev <konstantin.ananyev@intel.com> Acked-by: Akhil Goyal <gakhil@marvell.com>	2021-10-17 19:32:13 +02:00
Hemant Agrawal	68f5d3d320	cryptodev: add field for out-of-place in raw vector The structure rte_crypto_sym_vec is updated to add dest_sgl to support out of place processing. Signed-off-by: Hemant Agrawal <hemant.agrawal@nxp.com> Acked-by: Akhil Goyal <gakhil@marvell.com> Acked-by: Konstantin Ananyev <konstantin.ananyev@intel.com>	2021-10-17 19:32:01 +02:00
Gagandeep Singh	6afd461f9f	cryptodev: add total raw buffer length The current crypto raw data vectors is extended to support rte_security usecases, where we need total data length to know how much additional memory space is available in buffer other than data length so that driver/HW can write expanded size data after encryption. Signed-off-by: Gagandeep Singh <g.singh@nxp.com> Acked-by: Akhil Goyal <gakhil@marvell.com> Acked-by: Konstantin Ananyev <konstantin.ananyev@intel.com>	2021-10-17 19:32:01 +02:00
Hemant Agrawal	10488d59ae	cryptodev: rename field in vector struct This patch renames the sgl to src_sgl in struct rte_crypto_sym_vec to help differentiating between source and destination sgl. Signed-off-by: Hemant Agrawal <hemant.agrawal@nxp.com> Acked-by: Akhil Goyal <gakhil@marvell.com> Acked-by: Konstantin Ananyev <konstantin.ananyev@intel.com>	2021-10-17 19:31:15 +02:00
Radu Nicolau	2ed40da848	ipsec: support setting initial ESN value Update IPsec library to support initial ESN value. Signed-off-by: Declan Doherty <declan.doherty@intel.com> Signed-off-by: Radu Nicolau <radu.nicolau@intel.com> Signed-off-by: Abhijit Sinha <abhijit.sinha@intel.com> Signed-off-by: Daniel Martin Buckley <daniel.m.buckley@intel.com> Acked-by: Fan Zhang <roy.fan.zhang@intel.com> Acked-by: Konstantin Ananyev <konstantin.ananyev@intel.com> Acked-by: Akhil Goyal <gakhil@marvell.com>	2021-10-17 14:11:59 +02:00
Radu Nicolau	68977baa75	ipsec: support SA telemetry Add telemetry support for ipsec SAs. Signed-off-by: Declan Doherty <declan.doherty@intel.com> Signed-off-by: Radu Nicolau <radu.nicolau@intel.com> Signed-off-by: Abhijit Sinha <abhijit.sinha@intel.com> Signed-off-by: Daniel Martin Buckley <daniel.m.buckley@intel.com> Acked-by: Fan Zhang <roy.fan.zhang@intel.com> Acked-by: Konstantin Ananyev <konstantin.ananyev@intel.com> Acked-by: Akhil Goyal <gakhil@marvell.com>	2021-10-17 14:08:03 +02:00
Radu Nicolau	64df4712ce	mbuf: add IPsec ESP tunnel type Add ESP tunnel type to the tunnel types list that can be specified for TSO or checksum on the inner part of tunnel packets. Signed-off-by: Declan Doherty <declan.doherty@intel.com> Signed-off-by: Radu Nicolau <radu.nicolau@intel.com> Signed-off-by: Abhijit Sinha <abhijit.sinha@intel.com> Signed-off-by: Daniel Martin Buckley <daniel.m.buckley@intel.com> Acked-by: Fan Zhang <roy.fan.zhang@intel.com> Acked-by: Akhil Goyal <gakhil@marvell.com> Acked-by: Olivier Matz <olivier.matz@6wind.com>	2021-10-17 14:07:03 +02:00
Radu Nicolau	01eef5907f	ipsec: support NAT-T Add support for the IPsec NAT-Traversal use case for Tunnel mode packets. Signed-off-by: Declan Doherty <declan.doherty@intel.com> Signed-off-by: Radu Nicolau <radu.nicolau@intel.com> Signed-off-by: Abhijit Sinha <abhijit.sinha@intel.com> Signed-off-by: Daniel Martin Buckley <daniel.m.buckley@intel.com> Acked-by: Fan Zhang <roy.fan.zhang@intel.com> Acked-by: Konstantin Ananyev <konstantin.ananyev@intel.com> Acked-by: Akhil Goyal <gakhil@marvell.com>	2021-10-17 14:06:24 +02:00
Radu Nicolau	17344c0278	security: add UDP parameters for IPsec NAT-T Add support for specifying UDP port params for UDP encapsulation option. RFC3948 section-2.1 does not enforce using specific UDP ports for the UDP-Encapsulated ESP Header. Signed-off-by: Declan Doherty <declan.doherty@intel.com> Signed-off-by: Radu Nicolau <radu.nicolau@intel.com> Signed-off-by: Abhijit Sinha <abhijit.sinha@intel.com> Signed-off-by: Daniel Martin Buckley <daniel.m.buckley@intel.com> Acked-by: Fan Zhang <roy.fan.zhang@intel.com> Acked-by: Anoob Joseph <anoobj@marvell.com> Acked-by: Konstantin Ananyev <konstantin.ananyev@intel.com> Acked-by: Akhil Goyal <gakhil@marvell.com>	2021-10-17 14:03:43 +02:00
Radu Nicolau	c99d26197c	ipsec: support more AEAD algorithms Added support for AES_CCM, CHACHA20_POLY1305 and AES_GMAC. Signed-off-by: Declan Doherty <declan.doherty@intel.com> Signed-off-by: Radu Nicolau <radu.nicolau@intel.com> Signed-off-by: Abhijit Sinha <abhijit.sinha@intel.com> Signed-off-by: Daniel Martin Buckley <daniel.m.buckley@intel.com> Acked-by: Fan Zhang <roy.fan.zhang@intel.com> Acked-by: Konstantin Ananyev <konstantin.ananyev@intel.com> Acked-by: Akhil Goyal <gakhil@marvell.com>	2021-10-17 14:03:13 +02:00
Radu Nicolau	199fcba1bd	security: add ESN field to IPsec xform Update ipsec_xform definition to include ESN field. This allows the application to control the ESN starting value. Signed-off-by: Declan Doherty <declan.doherty@intel.com> Signed-off-by: Radu Nicolau <radu.nicolau@intel.com> Signed-off-by: Abhijit Sinha <abhijit.sinha@intel.com> Signed-off-by: Daniel Martin Buckley <daniel.m.buckley@intel.com> Acked-by: Fan Zhang <roy.fan.zhang@intel.com> Acked-by: Anoob Joseph <anoobj@marvell.com> Acked-by: Konstantin Ananyev <konstantin.ananyev@intel.com> Acked-by: Akhil Goyal <gakhil@marvell.com>	2021-10-17 13:08:35 +02:00
Matan Azrad	cab0c8f3c0	cryptodev: extend data-unit length field As described in [1] and as announced in [2], The field ``dataunit_len`` of the ``struct rte_crypto_cipher_xform`` moved to the end of the structure and extended to ``uint32_t``. In this way, sizes bigger than 64K bytes can be supported for data-unit lengths. [1] commit `d014dddb2d` ("cryptodev: support multiple cipher data-units") [2] commit `9a5c09211b` ("doc: announce extension of crypto data-unit length") Signed-off-by: Matan Azrad <matan@nvidia.com> Acked-by: Akhil Goyal <gakhil@marvell.com>	2021-10-16 16:24:43 +02:00
David Marchand	afdaa60795	mempool: accept user flags only As reported by Dmitry, RTE_MEMPOOL_F_POOL_CREATED is a flag only manipulated internally. This flag is not supposed to be requested from an application and would probably result in an incorrect behavior if an application did pass it. At least one other internal flag has been added recently and more may be introduced later. Rework the check and export a mask of valid user flags for use in the unit test. Fixes: `b240af8b10` ("mempool: enforce valid flags at creation") Reported-by: Dmitry Kozlyuk <dkozlyuk@nvidia.com> Signed-off-by: David Marchand <david.marchand@redhat.com> Acked-by: Olivier Matz <olivier.matz@6wind.com> Acked-by: Andrew Rybchenko <andrew.rybchenko@oktetlabs.ru>	2021-10-20 10:03:55 +02:00
Andrew Rybchenko	fb11ae8816	mempool: deprecate unused physical page defines MEMPOOL_PG_NUM_DEFAULT and MEMPOOL_PG_SHIFT_MAX are not used. Fixes: `fd943c764a` ("mempool: deprecate xmem functions") Signed-off-by: Andrew Rybchenko <andrew.rybchenko@oktetlabs.ru> Acked-by: Olivier Matz <olivier.matz@6wind.com>	2021-10-20 10:03:41 +02:00
Andrew Rybchenko	cb77b060eb	mempool: add namespace to driver register macro Add RTE_ prefix to macro used to register mempool driver. The old one is still available but deprecated. Signed-off-by: Andrew Rybchenko <andrew.rybchenko@oktetlabs.ru> Acked-by: Olivier Matz <olivier.matz@6wind.com>	2021-10-20 10:00:18 +02:00
Andrew Rybchenko	d720366184	mempool: make header size calculation internal Add RTE_ prefix to helper macro to calculate mempool header size and make it internal. Old macro is still available, but deprecated. Signed-off-by: Andrew Rybchenko <andrew.rybchenko@oktetlabs.ru> Acked-by: Olivier Matz <olivier.matz@6wind.com>	2021-10-20 10:00:18 +02:00
Andrew Rybchenko	ad276d5c7e	mempool: add namespace to internal helpers Add RTE_ prefix to internal API defined in public header. Use the prefix instead of double underscore. Use uppercase for macros in the case of name conflict. Signed-off-by: Andrew Rybchenko <andrew.rybchenko@oktetlabs.ru> Acked-by: Olivier Matz <olivier.matz@6wind.com>	2021-10-20 10:00:18 +02:00
Andrew Rybchenko	c47d7b90a1	mempool: add namespace to flags Fix the mempool flags namespace by adding an RTE_ prefix to the name. The old flags remain usable, to be deprecated in the future. Flag MEMPOOL_F_NON_IO added in the release is just renamed to have RTE_ prefix. Signed-off-by: Andrew Rybchenko <andrew.rybchenko@oktetlabs.ru> Acked-by: Olivier Matz <olivier.matz@6wind.com>	2021-10-20 10:00:16 +02:00
Andrew Rybchenko	925a83a5bf	mempool: enhance flags documentation readability Move documentation into a separate line just before define. Prepare to have a bit longer flag name because of namespace prefix. Signed-off-by: Andrew Rybchenko <andrew.rybchenko@oktetlabs.ru> Acked-by: Olivier Matz <olivier.matz@6wind.com>	2021-10-20 09:58:39 +02:00
Feifei Wang	c4629b02c5	mcslock: use WFE in lock for aarch64 Instead of polling for previous lock holder unlocking, use wait_until_equal API. Signed-off-by: Feifei Wang <feifei.wang2@arm.com> Reviewed-by: Ruifeng Wang <ruifeng.wang@arm.com>	2021-10-20 08:22:41 +02:00
Feifei Wang	a6e24bf417	mem: use WFE for init sync on aarch64 Instead of polling for mcfg->magic to be updated, use wait_until_equal API. Signed-off-by: Feifei Wang <feifei.wang2@arm.com> Reviewed-by: Ruifeng Wang <ruifeng.wang@arm.com>	2021-10-20 08:22:18 +02:00
Joyce Kong	4da0136096	stack: remove unneeded atomic header include In stack module, remove the header file rte_atomic.h as it is not being used. Signed-off-by: Joyce Kong <joyce.kong@arm.com> Signed-off-by: Dharmik Thakkar <dharmik.thakkar@arm.com> Reviewed-by: Ruifeng Wang <ruifeng.wang@arm.com> Acked-by: Olivier Matz <olivier.matz@6wind.com>	2021-10-19 17:15:10 +02:00
Dmitry Kozlyuk	11541c5c81	mempool: add non-IO flag Mempool is a generic allocator that is not necessarily used for device IO operations and its memory for DMA. Add MEMPOOL_F_NON_IO flag to mark such mempools automatically a) if their objects are not contiguous; b) if IOVA is not available for any object. Other components can inspect this flag in order to optimize their memory management. Discussion: https://mails.dpdk.org/archives/dev/2021-August/216654.html Signed-off-by: Dmitry Kozlyuk <dkozlyuk@nvidia.com> Acked-by: Matan Azrad <matan@nvidia.com> Reviewed-by: Andrew Rybchenko <andrew.rybchenko@oktetlabs.ru>	2021-10-19 16:35:16 +02:00
Dmitry Kozlyuk	da2b9cb25e	mempool: add event callbacks Data path performance can benefit if the PMD knows which memory it will need to handle in advance, before the first mbuf is sent to the PMD. It is impractical, however, to consider all allocated memory for this purpose. Most often mbuf memory comes from mempools that can come and go. PMD can enumerate existing mempools on device start, but it also needs to track creation and destruction of mempools after the forwarding starts but before an mbuf from the new mempool is sent to the device. Add an API to register callback for mempool life cycle events: * rte_mempool_event_callback_register() * rte_mempool_event_callback_unregister() Currently tracked events are: * RTE_MEMPOOL_EVENT_READY (after populating a mempool) * RTE_MEMPOOL_EVENT_DESTROY (before freeing a mempool) Provide a unit test for the new API. The new API is internal, because it is primarily demanded by PMDs that may need to deal with any mempools and do not control their creation, while an application, on the other hand, knows which mempools it creates and doesn't care about internal mempools PMDs might create. Signed-off-by: Dmitry Kozlyuk <dkozlyuk@nvidia.com> Acked-by: Matan Azrad <matan@nvidia.com> Reviewed-by: Andrew Rybchenko <andrew.rybchenko@oktetlabs.ru>	2021-10-19 16:35:16 +02:00
Bruce Richardson	2e348d8fe3	dmadev: add flag for error handling support Due to HW or driver limitations, not all dmadevs may support full error handling e.g. safely managing and reporting an invalid address to a copy operation. The skeleton dmadev, for example, being pure software will always seg-fault if passed an invalid address. To indicate the availability of safe error handling by a device, we add a capability flag for it. Signed-off-by: Bruce Richardson <bruce.richardson@intel.com> Reviewed-by: Conor Walsh <conor.walsh@intel.com> Reviewed-by: Kevin Laatz <kevin.laatz@intel.com>	2021-10-18 11:19:27 +02:00
Bruce Richardson	190f7e84c3	dmadev: add device iterator Add a function and wrapper macro to iterate over all DMA devices. Signed-off-by: Bruce Richardson <bruce.richardson@intel.com> Reviewed-by: Conor Walsh <conor.walsh@intel.com> Reviewed-by: Kevin Laatz <kevin.laatz@intel.com>	2021-10-18 11:17:32 +02:00
Kevin Laatz	ea8cf0f853	dmadev: add burst capacity API Add a burst capacity check API to the dmadev library. This API is useful to applications which need to know how many descriptors can be enqueued in the current batch. For example, it could be used to determine whether all segments of a multi-segment packet can be enqueued in the same batch or not (to avoid half-offload of the packet). Signed-off-by: Kevin Laatz <kevin.laatz@intel.com> Reviewed-by: Conor Walsh <conor.walsh@intel.com>	2021-10-18 11:17:30 +02:00
Bruce Richardson	5e0f859127	dmadev: add channel status check for testing use Add in a function to check if a device or vchan has completed all jobs assigned to it, without gathering in the results. This is primarily for use in testing, to allow the hardware to be in a known-state prior to gathering completions. Signed-off-by: Bruce Richardson <bruce.richardson@intel.com> Reviewed-by: Conor Walsh <conor.walsh@intel.com> Reviewed-by: Kevin Laatz <kevin.laatz@intel.com>	2021-10-18 11:17:21 +02:00
Chengwen Feng	2ece65f00f	dmadev: support multi-process This patch adds multi-process support for dmadev. Signed-off-by: Chengwen Feng <fengchengwen@huawei.com> Acked-by: Bruce Richardson <bruce.richardson@intel.com> Acked-by: Morten Brørup <mb@smartsharesystems.com> Reviewed-by: Kevin Laatz <kevin.laatz@intel.com> Reviewed-by: Conor Walsh <conor.walsh@intel.com>	2021-10-17 20:49:58 +02:00
Chengwen Feng	91e581e5c9	dmadev: add data plane API This patch adds data plane API for dmadev. Signed-off-by: Chengwen Feng <fengchengwen@huawei.com> Acked-by: Bruce Richardson <bruce.richardson@intel.com> Acked-by: Morten Brørup <mb@smartsharesystems.com> Reviewed-by: Kevin Laatz <kevin.laatz@intel.com> Reviewed-by: Conor Walsh <conor.walsh@intel.com>	2021-10-17 20:49:58 +02:00
Chengwen Feng	e0180db144	dmadev: add control plane API This patch adds control plane API for dmadev. Signed-off-by: Chengwen Feng <fengchengwen@huawei.com> Acked-by: Bruce Richardson <bruce.richardson@intel.com> Acked-by: Morten Brørup <mb@smartsharesystems.com> Reviewed-by: Kevin Laatz <kevin.laatz@intel.com> Reviewed-by: Conor Walsh <conor.walsh@intel.com>	2021-10-17 20:49:58 +02:00
Chengwen Feng	b36970f2e1	dmadev: introduce DMA device library The 'dmadev' is a generic type of DMA device. This patch introduces the 'dmadev' device allocation functions. The infrastructure is prepared to welcome drivers in drivers/dma/ Signed-off-by: Chengwen Feng <fengchengwen@huawei.com> Acked-by: Bruce Richardson <bruce.richardson@intel.com> Acked-by: Morten Brørup <mb@smartsharesystems.com> Acked-by: Jerin Jacob <jerinj@marvell.com> Reviewed-by: Kevin Laatz <kevin.laatz@intel.com> Reviewed-by: Conor Walsh <conor.walsh@intel.com>	2021-10-17 20:49:57 +02:00
David Marchand	e9123c467d	mbuf: enforce no option for dynamic fields and flags As stated in the API, dynamic field and flags should be created with no additional flag (simply in the API for future changes). Fix the dynamic flag register helper which was not enforcing it and add unit tests. Fixes: `4958ca3a44` ("mbuf: support dynamic fields and flags") Signed-off-by: David Marchand <david.marchand@redhat.com> Acked-by: Stephen Hemminger <stephen@networkplumber.org> Acked-by: Andrew Rybchenko <andrew.rybchenko@oktetlabs.ru> Acked-by: Ray Kinsella <mdr@ashroe.eu>	2021-10-15 10:29:41 +02:00
David Marchand	bc1a35fb3f	memzone: enforce valid flags when reserving If we do not enforce valid flags are passed by an application, this application might face issues in the future when we add more flags. Signed-off-by: David Marchand <david.marchand@redhat.com> Acked-by: Stephen Hemminger <stephen@networkplumber.org> Acked-by: Andrew Rybchenko <andrew.rybchenko@oktetlabs.ru> Acked-by: Ray Kinsella <mdr@ashroe.eu>	2021-10-15 10:29:21 +02:00
David Marchand	b240af8b10	mempool: enforce valid flags at creation If we do not enforce valid flags are passed by an application, this application might face issues in the future when we add more flags. Signed-off-by: David Marchand <david.marchand@redhat.com> Reviewed-by: Andrew Rybchenko <andrew.rybchenko@oktetlabs.ru> Acked-by: Ray Kinsella <mdr@ashroe.eu> Acked-by: Stephen Hemminger <stephen@networkplumber.org> Acked-by: Andrew Rybchenko <andrew.rybchenko@oktetlabs.ru>	2021-10-15 10:24:43 +02:00
Bruce Richardson	85e21b77d8	telemetry: fix socket path conflicts for in-memory mode When running using in-memory mode, multiple processes can use the same runtime dir, leading to conflicts with the telemetry sockets in that directory. We can resolve this by appending a suffix to each socket beyond the first, with the suffix being an increasing counter value. Each process uses the first unused socket counter value. Fixes: `6dd571fd07` ("telemetry: introduce new functionality") Reported-by: David Marchand <david.marchand@redhat.com> Signed-off-by: Bruce Richardson <bruce.richardson@intel.com> Acked-by: Ciara Power <ciara.power@intel.com> Acked-by: Kevin Traynor <ktraynor@redhat.com> Tested-by: Conor Walsh <conor.walsh@intel.com>	2021-10-14 20:31:10 +02:00
Bruce Richardson	e89463a366	eal: limit telemetry to primary processes Telemetry interface should be exposed for primary processes only, since secondary processes will conflict on socket creation, and since all data in secondary process is generally available to primary. For example, all device stats for ethdevs, cryptodevs, etc. will all be common across processes. Signed-off-by: Bruce Richardson <bruce.richardson@intel.com> Acked-by: Ciara Power <ciara.power@intel.com> Tested-by: Conor Walsh <conor.walsh@intel.com>	2021-10-14 20:31:10 +02:00
David Christensen	b698651b91	eal/ppc: use compiler builtins for atomics Replace existing PPC assembly code for rte_atomicXX ops with compiler atomic builtins as previously adopted by DPDK (see [1] and [2]). This has the additional benefit of resolving a POWER10 build failure due to an outstanding gcc issue which fails on the existing PPC assembly code [3]. [1] https://www.dpdk.org/blog/2021/03/26/dpdk-adopts-the-c11-memory-model/ [2] https://doc.dpdk.org/guides/rel_notes/deprecation.html [3] https://gcc.gnu.org/bugzilla/show_bug.cgi?id=98519 Signed-off-by: David Christensen <drc@linux.vnet.ibm.com>	2021-10-14 16:51:25 +02:00
Huichao Cai	567473433b	ip_frag: fix fragmenting IPv4 fragment Current implementation of rte_ipv4_fragment_packet() doesn’t take into account offset and flag values of the given packet, but blindly assumes they are always zero (original packet is not fragmented). According to RFC791, fragment and flag values for new fragment should take into account values provided in the original IPv4 packet. Fixes: `4c38e5532a` ("ip_frag: refactor IPv4 fragmentation into a proper library") Cc: stable@dpdk.org Signed-off-by: Huichao Cai <chcchc88@163.com> Acked-by: Konstantin Ananyev <konstantin.ananyev@intel.com>	2021-10-14 08:52:34 +02:00
Andrew Rybchenko	74a74bf98c	mbuf: remove deprecated flag for bad outer IPv4 checksum Removed offload flag PKT_RX_EIP_CKSUM_BAD. PKT_RX_OUTER_IP_CKSUM_BAD should be used as a replacement. Signed-off-by: Andrew Rybchenko <andrew.rybchenko@oktetlabs.ru> Acked-by: Olivier Matz <olivier.matz@6wind.com>	2021-10-13 23:03:47 +02:00
Andrew Rybchenko	a87a0c0d1a	mempool: fix name size in mempool structure Use correct define as a name array size. The change breaks ABI and therefore cannot be backported to stable branches. Fixes: `38c9817ee1` ("mempool: adjust name size in related data types") Signed-off-by: Andrew Rybchenko <andrew.rybchenko@oktetlabs.ru> Reviewed-by: David Marchand <david.marchand@redhat.com>	2021-10-13 22:54:10 +02:00
Stephen Hemminger	d75eed0fbe	mbuf: fix typo in comment Misspelling of 'copied' Fixes: `c3a90c381d` ("mbuf: add a copy routine") Cc: stable@dpdk.org Signed-off-by: Stephen Hemminger <stephen@networkplumber.org> Acked-by: Olivier Matz <olivier.matz@6wind.com>	2021-10-13 19:02:18 +02:00
Gowrishankar Muthukrishnan	b76731683b	telemetry: fix JSON output buffer length Earlier, JSON message length was limited to 1024 which would not allow data more than this size. Removed this limitation by creating output buffer based on requested data length. Fixes: `52af6ccb2b` ("telemetry: add utility functions for creating JSON") Cc: stable@dpdk.org Signed-off-by: Gowrishankar Muthukrishnan <gmuthukrishn@marvell.com> Acked-by: Ciara Power <ciara.power@intel.com>	2021-10-13 18:17:24 +02:00
Bruce Richardson	0faa4cfc50	eal/freebsd: ignore in-memory option The in-memory option is not supported on FreeBSD so print a warning and ignore the flag when it is specified for BSD apps. The lack of support is due to the different way in which memory is managed on FreeBSD using the contigmem driver rather than via a hugetlbfs filesystem. Fixes: `14de8734c4` ("eal: add --in-memory option") Cc: stable@dpdk.org Signed-off-by: Bruce Richardson <bruce.richardson@intel.com>	2021-10-13 17:11:26 +02:00
Olivier Matz	8e506da755	net: promote IPv6 external headers skip API as stable This function is public since commit `8f0e4d6a78` ("net: export IPv6 header extensions skip function") (2018), and is used by vmxnet3 driver. Promote it as stable. Signed-off-by: Olivier Matz <olivier.matz@6wind.com> Acked-by: David Marchand <david.marchand@redhat.com> Acked-by: Ray Kinsella <mdr@ashroe.eu>	2021-10-13 12:57:12 +02:00
David Marchand	2f3758751b	eal/x86: sort CPU extended features definitions Sort the definitions for extended features (leaf 0) to enhance readability. Signed-off-by: David Marchand <david.marchand@redhat.com> Acked-by: Bruce Richardson <bruce.richardson@intel.com>	2021-10-12 21:07:53 +02:00
David Marchand	aae3037ab1	eal/x86: fix some CPU extended features definitions Caught while checking CPUID related stuff in OVS. According to [1], for Structured Extended Feature Flags Enumeration Leaf (EAX = 0x07H, ECX = 0): - BMI1 is associated to EBX, bit 3 (was incorrectly 2), - SMEP is associated to EBX, bit 7 (was incorrectly 6), - BMI2 is associated to EBX, bit 8 (was incorrectly 7), - ERMS is associated to EBX, bit 9 (was incorrectly 8), 1: https://software.intel.com/sites/default/files/managed/c5/15/architecture-instruction-set-extensions-programming-reference.pdf Fixes: `af75078fec` ("first public release") Cc: stable@dpdk.org Signed-off-by: David Marchand <david.marchand@redhat.com> Acked-by: Bruce Richardson <bruce.richardson@intel.com>	2021-10-12 21:07:50 +02:00
John Levon	24d5a1ce6b	eal/linux: allow hugetlbfs sub-directories get_hugepage_dir() was implemented in such a way that a --huge-dir option had to exactly match the mountpoint, but there's no reason for this restriction: DPDK might not be the only user of hugepages, and shouldn't assume it owns an entire mountpoint. For example, if I have /dev/hugepages/myapp, and /dev/hugepages/dpdk, I should be able to specify: --huge-dir=/dev/hugepages/dpdk/ and have DPDK only use that sub-directory. Fix the implementation to allow a sub-directory within a suitable hugetlbfs mountpoint to be specified, preferring the closest match. Signed-off-by: John Levon <john.levon@nutanix.com> Acked-by: Dmitry Kozlyuk <dmitry.kozliuk@gmail.com> Reviewed-by: David Marchand <david.marchand@redhat.com>	2021-10-12 21:07:46 +02:00
Archana Muniganti	03ab51eafd	security: add SA config option for inner checksum Add inner packet IPv4 hdr and L4 checksum enable options in conf. These will be used in case of protocol offload. Per SA, application could specify whether the checksum(compute/verify) can be offloaded to security device. Signed-off-by: Archana Muniganti <marchana@marvell.com> Acked-by: Konstantin Ananyev <konstantin.ananyev@intel.com> Acked-by: Akhil Goyal <gakhil@marvell.com>	2021-10-08 21:39:39 +02:00
Shijith Thotton	dd451ad152	doc: remove event crypto metadata deprecation note Proposed change to event crypto metadata is not done as per deprecation note. Instead, comments are updated in spec to improve readability. Signed-off-by: Shijith Thotton <sthotton@marvell.com> Acked-by: Abhinandan Gujjar <abhinandan.gujjar@intel.com>	2021-10-08 21:31:07 +02:00
Nicolas Chautru	5f13f4c03d	bbdev: reduce log level of a failure message Queue setup may genuinely fail when adding incremental queues for a given priority level. In that case application would attempt to configure a queue at a different priority level. Not an actual error. Signed-off-by: Nicolas Chautru <nicolas.chautru@intel.com> Reviewed-by: Tom Rix <trix@redhat.com>	2021-10-08 21:31:07 +02:00
Nicolas Chautru	10ea15e35f	bbdev: add capability for 4G CB CRC drop Adding option to drop CRC24B to align with existing feature for 5G Signed-off-by: Nicolas Chautru <nicolas.chautru@intel.com> Reviewed-by: Tom Rix <trix@redhat.com>	2021-10-08 21:31:07 +02:00
Nicolas Chautru	cc360fd3f2	bbdev: add capability for CRC16 check Adding a missing operation when CRC16 is being used for TB CRC check. Signed-off-by: Nicolas Chautru <nicolas.chautru@intel.com> Reviewed-by: Tom Rix <trix@redhat.com>	2021-10-08 21:31:07 +02:00
Tejasree Kondoj	f7e3aa693d	security: add option to configure UDP ports verification Add option to indicate whether UDP encapsulation ports verification need to be done as part of inbound IPsec processing. Signed-off-by: Tejasree Kondoj <ktejasree@marvell.com> Acked-by: Hemant Agrawal <hemant.agrawal@nxp.com> Acked-by: Akhil Goyal <gakhil@marvell.com>	2021-10-08 21:31:07 +02:00
Dmitry Kozlyuk	d47dd94162	eal/windows: do not install virt2phys header The header was not intended to be a public one. DPDK users should use `rte_mem_virt2iova()` to translate addresses. Other virt2phys users should use the header from the driver instead. Fixes: `2a5d547a4a` ("eal/windows: implement basic memory management") Cc: stable@dpdk.org Signed-off-by: Dmitry Kozlyuk <dmitry.kozliuk@gmail.com>	2021-10-11 21:17:12 +02:00
Narcisa Vasile	694c81721e	eal/windows: fix CPU cores counting On Windows, -l/--lcores EAL option was unable to process CPU sets containing CPUs other than 0 and 1, because CPU_COUNT() macro only checked these CPUs in the set. Fix CPU_COUNT() by enumerating all possible CPU indices. Fixes: `e8428a9d89` ("eal/windows: add some basic functions and macros") Cc: stable@dpdk.org Signed-off-by: Narcisa Vasile <navasile@microsoft.com> Acked-by: Dmitry Kozlyuk <dmitry.kozliuk@gmail.com> Acked-by: Pallavi Kadam <pallavi.kadam@intel.com>	2021-10-11 18:52:56 +02:00
Lance Richardson	dc954ae73a	net: fix checksum API documentation Minor corrections and improvements to documentation for checksum APIs. Fixes: `6006818cfb` ("net: new checksum functions") Fixes: `45a08ef55e` ("net: introduce functions to verify L4 checksums") Cc: stable@dpdk.org Signed-off-by: Lance Richardson <lance.richardson@broadcom.com> Acked-by: Ferruh Yigit <ferruh.yigit@intel.com> Acked-by: Olivier Matz <olivier.matz@6wind.com>	2021-10-07 14:42:45 +02:00
Andrew Rybchenko	b225783dda	ethdev: remove legacy mirroring API A more fine-grain flow API action RTE_FLOW_ACTION_TYPE_SAMPLE should be used instead of it. Signed-off-by: Andrew Rybchenko <andrew.rybchenko@oktetlabs.ru> Acked-by: Thomas Monjalon <thomas@monjalon.net> Acked-by: Jerin Jacob <jerinj@marvell.com> Acked-by: Haiyue Wang <haiyue.wang@intel.com> Reviewed-by: Ferruh Yigit <ferruh.yigit@intel.com>	2021-10-07 13:02:26 +02:00
Xueming Li	7483341ae5	ethdev: change queue release callback Currently, most ethdev callback API use queue ID as parameter, but Rx and Tx queue release callback use queue object which is used by Rx and Tx burst data plane callback. To align with other eth device queue configuration callbacks: - queue release callbacks are changed to use queue ID - all drivers are adapted Signed-off-by: Xueming Li <xuemingl@nvidia.com> Reviewed-by: Andrew Rybchenko <andrew.rybchenko@oktetlabs.ru> Acked-by: Ajit Khaparde <ajit.khaparde@broadcom.com> Acked-by: Somnath Kotur <somnath.kotur@broadcom.com> Acked-by: Ferruh Yigit <ferruh.yigit@intel.com>	2021-10-06 19:16:03 +02:00
Xueming Li	49ed322469	ethdev: make queue release callback optional Some drivers don't need Rx and Tx queue release callback, make them optional. Clean up empty queue release callbacks for some drivers. Signed-off-by: Xueming Li <xuemingl@nvidia.com> Reviewed-by: Andrew Rybchenko <andrew.rybchenko@oktetlabs.ru> Acked-by: Ferruh Yigit <ferruh.yigit@intel.com> Acked-by: Thomas Monjalon <thomas@monjalon.net> Acked-by: Hemant Agrawal <hemant.agrawal@nxp.com>	2021-10-06 19:16:03 +02:00
Andrew Rybchenko	8c9f976f05	ethdev: improve xstats names by IDs get prototype Adjust parameters order to eth_xstats_get_by_id_t prototype. Make ids the second parameter similar to eth_xstats_get_by_id_t. Signed-off-by: Andrew Rybchenko <andrew.rybchenko@oktetlabs.ru> Acked-by: Ferruh Yigit <ferruh.yigit@intel.com>	2021-10-06 13:07:11 +02:00
Ivan Ilchenko	71b5e430a6	ethdev: update xstats by ID driver callbacks documentation Update xstats by IDs callbacks documentation in accordance with ethdev usage of these callbacks. Document valid combinations of input arguments to make driver implementation simpler. Signed-off-by: Ivan Ilchenko <ivan.ilchenko@oktetlabs.ru> Signed-off-by: Andrew Rybchenko <andrew.rybchenko@oktetlabs.ru> Reviewed-by: Andy Moreton <amoreton@xilinx.com> Acked-by: Ferruh Yigit <ferruh.yigit@intel.com>	2021-10-06 13:07:11 +02:00
Andrew Rybchenko	113778be13	ethdev: do not use get xstats names by IDs to obtain count Relax requirements on get xstats names by IDs. After the patch, the corresponding driver operation is called with non-NULL ids and xstats_names parameters only. Signed-off-by: Andrew Rybchenko <andrew.rybchenko@oktetlabs.ru> Acked-by: Ferruh Yigit <ferruh.yigit@intel.com>	2021-10-06 13:07:11 +02:00
Ivan Ilchenko	bc5112ca59	ethdev: fix xstats by ID API documentation Document valid combinations of input arguments in accordance with current implementation in ethdev. Fixes: `79c913a42f` ("ethdev: retrieve xstats by ID") Cc: stable@dpdk.org Signed-off-by: Ivan Ilchenko <ivan.ilchenko@oktetlabs.ru> Signed-off-by: Andrew Rybchenko <andrew.rybchenko@oktetlabs.ru> Reviewed-by: Andy Moreton <amoreton@xilinx.com> Reviewed-by: Ferruh Yigit <ferruh.yigit@intel.com>	2021-10-06 13:07:11 +02:00
Dmitry Kozlyuk	04d43857ea	net: rename Ethernet header fields Definition of `rte_ether_addr` structure used a workaround allowing DPDK and Windows SDK headers to be used in the same file, because Windows SDK defines `s_addr` as a macro. Rename `s_addr` to `src_addr` and `d_addr` to `dst_addr` to avoid the conflict and remove the workaround. Deprecation notice: https://mails.dpdk.org/archives/dev/2021-July/215270.html Signed-off-by: Dmitry Kozlyuk <dmitry.kozliuk@gmail.com>	2021-10-08 14:58:11 +02:00
Tal Shnaiderman	56d2c1aa0b	security: build on Windows Build the security library on Windows. Remove unneeded export of inline functions from version file. Signed-off-by: Tal Shnaiderman <talshn@nvidia.com> Acked-by: Akhil Goyal <gakhil@marvell.com> Acked-by: William Tu <u9012063@gmail.com>	2021-10-07 14:47:35 +02:00
Tal Shnaiderman	cb7b6898c8	cryptodev: build on Windows Build the cryptography device library on Windows OS by removing unneeded include and exports of inline functions blocking the compilation. Signed-off-by: Tal Shnaiderman <talshn@nvidia.com> Acked-by: Akhil Goyal <gakhil@marvell.com> Acked-by: William Tu <u9012063@gmail.com>	2021-10-07 14:47:35 +02:00
Tal Shnaiderman	f9b2a75ed4	security: use net library to include IP structs Remove the netinet includes and replace them with rte_ip.h to support the in_addr/in6_addr structs on all operating systems. Signed-off-by: Tal Shnaiderman <talshn@nvidia.com> Acked-by: Akhil Goyal <gakhil@marvell.com> Acked-by: William Tu <u9012063@gmail.com>	2021-10-07 14:47:35 +02:00
David Marchand	ddfc59f4fb	sort symbol maps Fixed with ./devtools/update-abi.sh $(cat ABI_VERSION) Fixes: `e73a7ab224` ("net/softnic: promote manage API") Fixes: `8f532a34c4` ("fib: promote API to stable") Fixes: `4aeb92396b` ("rib: promote API to stable") Signed-off-by: David Marchand <david.marchand@redhat.com> Acked-by: Ray Kinsella <mdr@ashroe.eu>	2021-10-05 17:03:37 +02:00
Xuan Ding	07ee2d7505	vhost: normalize return type and function name In some function definitions, adjust return type and function name on a separate line to be consistent with DPDK coding style. Signed-off-by: Xuan Ding <xuan.ding@intel.com> Reviewed-by: Chenbo Xia <chenbo.xia@intel.com>	2021-09-28 21:23:00 +02:00


7669 Commits