numam-dpdk

Author	SHA1	Message	Date
Maxime Coquelin	a3cfa8081f	vhost: simplify async enqueue completion vhost_poll_enqueue_completed() assumes some inflight packets could have been completed in a previous call but not returned to the application. But this is not the case, since check_completed_copies callback is never called with more than the current count as argument. In other words, async->last_pkts_n is always 0. Removing it greatly simplifies the function. Signed-off-by: Maxime Coquelin <maxime.coquelin@redhat.com> Reviewed-by: Jiayu Hu <jiayu.hu@intel.com>	2021-10-29 12:32:30 +02:00
Maxime Coquelin	2cbe826e26	vhost: remove notion of async descriptor Now that the IO vector iterators have been simplified, the rte_vhost_async_desc struct only contains a pointer to the iterator array stored in the async metadata. This patch removes it and passes the iterator array pointer directly to the transfer_data callback. Doing so avoids declaring the descriptor array on the stack, and also avoids the cost of filling it. Signed-off-by: Maxime Coquelin <maxime.coquelin@redhat.com> Reviewed-by: Jiayu Hu <jiayu.hu@intel.com>	2021-10-29 12:32:30 +02:00
Maxime Coquelin	d5d25cfd85	vhost: improve IO vector logic IO vectors and their iterator arrays were part of the async metadata, but their indexes were not. To make this more consistent, the patch adds the indexes to the async metadata. Doing that, we can avoid triggering a DMA transfer within the loop, as IO vector index overflow is now prevented in the async_mbuf_to_desc() function. Note that the previous detection mechanism was broken: the overflow had already happened by the time it was detected, so an OOB memory access would already have occurred. With these changes done, virtio_dev_rx_async_submit_split() and virtio_dev_rx_async_submit_packed() can be further simplified. Signed-off-by: Maxime Coquelin <maxime.coquelin@redhat.com> Reviewed-by: Jiayu Hu <jiayu.hu@intel.com>	2021-10-29 12:32:30 +02:00
Maxime Coquelin	0af9f99221	vhost: remove useless fields in async iterator struct Offset and count fields are unused and so can be removed. The offset field was actually in the Vhost example, but in a way that does not make sense. Signed-off-by: Maxime Coquelin <maxime.coquelin@redhat.com> Reviewed-by: Chenbo Xia <chenbo.xia@intel.com> Reviewed-by: Jiayu Hu <jiayu.hu@intel.com>	2021-10-29 12:32:30 +02:00
Maxime Coquelin	6171bfbfb2	vhost: introduce specific iovec structure This patch introduces rte_vhost_iovec struct that contains both source and destination addresses since we always have a 1:1 mapping between source and destination. While using the standard iovec struct might have seemed better, having to duplicate IO vectors and their iterators is memory inefficient and makes the implementation more complex. Signed-off-by: Maxime Coquelin <maxime.coquelin@redhat.com> Reviewed-by: Chenbo Xia <chenbo.xia@intel.com> Reviewed-by: Jiayu Hu <jiayu.hu@intel.com>	2021-10-29 12:32:30 +02:00
Maxime Coquelin	8b3fc5a213	vhost: remove async batch threshold Reaching the async batch threshold was one of the conditions to trigger the DMA transfer. However, this condition was never met since the threshold value is 32, same as the MAX_PKT_BURST value. Signed-off-by: Maxime Coquelin <maxime.coquelin@redhat.com> Reviewed-by: Chenbo Xia <chenbo.xia@intel.com> Reviewed-by: Jiayu Hu <jiayu.hu@intel.com>	2021-10-29 12:32:30 +02:00
Maxime Coquelin	3fe629547e	vhost: simplify async IO vectors iterators This patch splits the iterator arrays in two, one for source and one for destination. The goal is to make the code easier to understand. Signed-off-by: Maxime Coquelin <maxime.coquelin@redhat.com> Reviewed-by: Chenbo Xia <chenbo.xia@intel.com> Reviewed-by: Jiayu Hu <jiayu.hu@intel.com>	2021-10-29 12:32:30 +02:00
Maxime Coquelin	97064162d4	vhost: simplify async IO vectors IO vectors implementation is unnecessarily complex, mixing source and destination vectors in the same array. This patch declares two arrays, one for the source and one for the destination. It also gets rid of the seg_awaits variable in both packed and split implementations, which is the same as iovec_idx. Signed-off-by: Maxime Coquelin <maxime.coquelin@redhat.com> Reviewed-by: Chenbo Xia <chenbo.xia@intel.com> Reviewed-by: Jiayu Hu <jiayu.hu@intel.com>	2021-10-29 12:32:30 +02:00
Maxime Coquelin	5f89c5e1e9	vhost: hide in-flight async structure This patch moves async_inflight_info struct to internal header since it should not be part of the API. Signed-off-by: Maxime Coquelin <maxime.coquelin@redhat.com> Reviewed-by: Chenbo Xia <chenbo.xia@intel.com> Reviewed-by: Jiayu Hu <jiayu.hu@intel.com>	2021-10-29 12:32:30 +02:00
Maxime Coquelin	ee8024b3d4	vhost: move async data in dedicated structure This patch moves async-related metadata from vhost_virtqueue to a dedicated struct. It makes it clear which fields are async related, and also saves some memory when async feature is not in use. Signed-off-by: Maxime Coquelin <maxime.coquelin@redhat.com> Reviewed-by: Chenbo Xia <chenbo.xia@intel.com> Reviewed-by: Jiayu Hu <jiayu.hu@intel.com>	2021-10-29 12:32:30 +02:00
Miao Li	c6e305141a	power: support missing Rx queue info Since some vdevs like virtio and vhost do not support rxq_info_get and queue state inquiry, the error return value -ENOTSUP needs to be ignored when queue_stopped cannot get Rx queue information and Rx queue state. This patch changes the return value of queue_stopped when rte_eth_rx_queue_info_get returns -ENOTSUP, so that vdevs which cannot provide Rx queue information and Rx queue state can enable power management. Fixes: `209fd58545` ("power: make ethdev power management thread unsafe") Cc: stable@dpdk.org Signed-off-by: Miao Li <miao.li@intel.com> Acked-by: Anatoly Burakov <anatoly.burakov@intel.com> Reviewed-by: Chenbo Xia <chenbo.xia@intel.com>	2021-10-29 12:32:29 +02:00
Miao Li	34fd4373ce	vhost: add power monitor API This commit defines rte_vhost_power_monitor_cond, which is used to pass information to the vhost driver. The information includes the address to monitor, the expected value, the mask to extract the value read from 'addr', the size of the value at the monitored address, and a match flag indicating whether the value is expected to match or not match. The vhost driver can use this information to fill rte_power_monitor_cond. Signed-off-by: Miao Li <miao.li@intel.com> Reviewed-by: Chenbo Xia <chenbo.xia@intel.com> Acked-by: David Hunt <david.hunt@intel.com>	2021-10-29 12:32:29 +02:00
Xuan Ding	5fd6e93b7e	vhost: remove async DMA map status Async DMA map status flag was added to prevent the unnecessary unmap when DMA devices bound to kernel driver. This brings maintenance cost for a lot of code. This patch removes the DMA map status by using rte_errno instead. This patch relies on the following patch to fix a partial unmap check in vfio unmapping API. [1] https://www.mail-archive.com/dev@dpdk.org/msg226464.html Signed-off-by: Xuan Ding <xuan.ding@intel.com> Reviewed-by: Maxime Coquelin <maxime.coquelin@redhat.com>	2021-10-29 12:32:22 +02:00
David Marchand	e7c727c307	net: fix build with sparse on L2TPv2 bitfields An external project that wants to do additional checks on fields endianness can remap rte_beXX types to instrumented types and use sparse. The current code breaks OVS build with sparse: ../../lib/ofp-packet.c: note: in included file (through .../ovs/dpdk-dir/build/include/rte_flow.h, ../../lib/netdev-dpdk.h, ../../lib/dp-packet.h): .../ovs/dpdk-dir/build/include/rte_l2tpv2.h:92:37: error: invalid bitfield specifier for type restricted ovs_be16. .../ovs/dpdk-dir/build/include/rte_l2tpv2.h:93:37: error: invalid bitfield specifier for type restricted ovs_be16. .../ovs/dpdk-dir/build/include/rte_l2tpv2.h:94:40: error: invalid bitfield specifier for type restricted ovs_be16. .../ovs/dpdk-dir/build/include/rte_l2tpv2.h:95:37: error: invalid bitfield specifier for type restricted ovs_be16. .../ovs/dpdk-dir/build/include/rte_l2tpv2.h:96:40: error: invalid bitfield specifier for type restricted ovs_be16. .../ovs/dpdk-dir/build/include/rte_l2tpv2.h:97:37: error: invalid bitfield specifier for type restricted ovs_be16. .../ovs/dpdk-dir/build/include/rte_l2tpv2.h:98:37: error: invalid bitfield specifier for type restricted ovs_be16. .../ovs/dpdk-dir/build/include/rte_l2tpv2.h:99:40: error: invalid bitfield specifier for type restricted ovs_be16. .../ovs/dpdk-dir/build/include/rte_l2tpv2.h:100:39: error: invalid bitfield specifier for type restricted ovs_be16. make[3]: *** [lib/ofp-packet.lo] Error 1 Use simple uint16_t types for bitfields in L2TPv2 struct. Fixes: `3a929df1f2` ("ethdev: support L2TPv2 and PPP procotol") Signed-off-by: David Marchand <david.marchand@redhat.com> Reviewed-by: Ferruh Yigit <ferruh.yigit@intel.com>	2021-10-28 20:28:01 +02:00
David Marchand	41f2f05574	ethdev: warn once when using port not ready Warning continuously is a pain when developing or if a unit test is/gets broken. It could also be a problem if an application behaves badly only in some corner cases and a DoS results from those logs being continuously displayed. Let's warn once per port and per rx/tx. Getting such a log is scary, but let's make it more eye catching by dumping a backtrace with it. Tested by introducing a bug in testpmd: static int eth_dev_start_mp(uint16_t port_id) { - if (is_proc_primary()) + if (!is_proc_primary()) return rte_eth_dev_start(port_id); return 0; Then, running a basic null test: $ ./devtools/test-null.sh ... Start automatic packet forwarding io packet forwarding - ports=2 - cores=1 - streams=2 - NUMA support enabled, MP allocation mode: native Logical Core 1 (socket 0) forwards packets on 2 streams: RX P=0/Q=0 (socket 0) -> TX P=1/Q=0 (socket 0) peer=02:00:00:00:00:01 RX P=1/Q=0 (socket 0) -> TX P=0/Q=0 (socket 0) peer=02:00:00:00:00:00 lcore 0 called rx_pkt_burst for not ready port 0 8: [build/app/dpdk-testpmd() [0x59e839]] 7: [/lib64/libc.so.6(__libc_start_main+0xf5) [0x7ff481b69555]] 6: [build/app/dpdk-testpmd(main+0x54b) [0x662d24]] 5: [build/app/dpdk-testpmd(start_packet_forwarding+0x263) [0x65e795]] 4: [build/app/dpdk-testpmd() [0x65e1be]] 3: [build/app/dpdk-testpmd() [0x65a996]] 2: [build/app/dpdk-testpmd() [0xa6cbc7]] 1: [build/app/dpdk-testpmd(rte_dump_stack+0x27) [0xaee796]] lcore 0 called rx_pkt_burst for not ready port 1 8: [build/app/dpdk-testpmd() [0x59e839]] 7: [/lib64/libc.so.6(__libc_start_main+0xf5) [0x7ff481b69555]] 6: [build/app/dpdk-testpmd(main+0x54b) [0x662d24]] 5: [build/app/dpdk-testpmd(start_packet_forwarding+0x263) [0x65e795]] 4: [build/app/dpdk-testpmd() [0x65e1be]] 3: [build/app/dpdk-testpmd() [0x65a996]] 2: [build/app/dpdk-testpmd() [0xa6cbc7]] 1: [build/app/dpdk-testpmd(rte_dump_stack+0x27) [0xaee796]] io packet forwarding packets/burst=32 nb forwarding cores=1 - nb forwarding ports=2 port 0: RX queue number: 1 Tx queue number: 1 Rx offloads=0x0 Tx offloads=0x0 Fixes: `c87d435a4d` ("ethdev: copy fast-path API into separate structure") Signed-off-by: David Marchand <david.marchand@redhat.com> Acked-by: Konstantin Ananyev <konstantin.ananyev@intel.com> Tested-by: Konstantin Ananyev <konstantin.ananyev@intel.com> Acked-by: Thomas Monjalon <thomas@monjalon.net>	2021-10-27 19:28:45 +02:00
Olivier Matz	9bffc92850	mem: fix dynamic hugepage mapping in container Since its introduction in 2018, the SIGBUS handler was never registered, and all related functions were unused. A SIGBUS can be received by the application when accessing hugepages even if mmap() was successful. This happens especially when running inside containers when there are not enough hugepages. In this case, we need to recover. A similar scheme can be found in eal_memory.c. Fixes: `582bed1e1d` ("mem: support mapping hugepages at runtime") Cc: stable@dpdk.org Signed-off-by: Olivier Matz <olivier.matz@6wind.com> Reviewed-by: Maxime Coquelin <maxime.coquelin@redhat.com> Acked-by: David Marchand <david.marchand@redhat.com>	2021-11-05 15:28:55 +01:00
Ilyes Ben Hamouda	770d41bf33	malloc: fix allocation with unknown socket ID When using rte_malloc() from a thread which is not bound to a numa socket (the typical case is a control thread, but it can also happen on a dataplane thread if its cpu affinity is on cores attached to several sockets), the used heap is the one from numa socket 0, which may not have available memory. Fix this by selecting the first socket which has available memory. Note: malloc_get_numa_socket() is only used from one .c file, so move it there, and remove the inline keyword. Fixes: `b94580d688` ("malloc: avoid unknown socket id") Cc: stable@dpdk.org Signed-off-by: Ilyes Ben Hamouda <ilyes.ben_hamouda@6wind.com> Signed-off-by: Olivier Matz <olivier.matz@6wind.com> Acked-by: David Marchand <david.marchand@redhat.com>	2021-11-05 15:28:49 +01:00
David Hunt	bb0bd346d5	eal: suggest using --lcores option If the user requests to use an lcore above 128 using -l, the eal will exit with "EAL: invalid core list syntax" and very little else useful information. This patch adds some extra information suggesting to use --lcores so that physical cores above RTE_MAX_LCORE (default 128) can be used. This is achieved by using the --lcores option by mapping the logical cores in the application to physical cores. For example, if "-l 12-16,130,132" is used, we see the following additional output on the command line: EAL: lcore 132 >= RTE_MAX_LCORE (128) EAL: lcore 133 >= RTE_MAX_LCORE (128) EAL: To use high physical core ids, please use --lcores to map them to lcore ids below RTE_MAX_LCORE, EAL: e.g. --lcores 0@12,1@13,2@14,3@15,4@16,5@132,6@133 The same is added to -c option parsing. For example, if "-c 0x300000000000000000000000000000000" is used, we see the following additional output on the command line: EAL: lcore 128 >= RTE_MAX_LCORE (128) EAL: lcore 129 >= RTE_MAX_LCORE (128) EAL: To use high physical core ids, please use --lcores to map them to lcore ids below RTE_MAX_LCORE, EAL: e.g. --lcores 0@128,1@129 Signed-off-by: David Hunt <david.hunt@intel.com> Acked-by: Bruce Richardson <bruce.richardson@intel.com>	2021-11-05 14:39:37 +01:00
David Marchand	f5fa0e110f	eal: promote non-EAL lcore API as stable This API has been around for more than a year (and is in LTS 20.11). It did not receive negative feedback and will be used in a next OVS release. Mark it stable. Signed-off-by: David Marchand <david.marchand@redhat.com> Acked-by: Kevin Traynor <ktraynor@redhat.com> Acked-by: Thomas Monjalon <thomas@monjalon.net>	2021-11-04 22:57:58 +01:00
Konstantin Ananyev	65d9b7c664	bpf: fix convert API when libpcap missing rte_bpf_convert() implementation depends on libpcap. Right now it is defined only when this library is installed and RTE_PORT_PCAP is defined. Fix that by providing for such case stub rte_bpf_convert() implementation that will always return an error. To draw user attention, if proper implementation is disabled, warning will be thrown at meson configure stage. Also move stub for another function (rte_bpf_elf_load) into the same place (bpf_stub.c). Fixes: `2eccf6afbe` ("bpf: add function to convert classic BPF to DPDK BPF") Signed-off-by: Konstantin Ananyev <konstantin.ananyev@intel.com>	2021-11-04 19:56:20 +01:00
Konstantin Ananyev	7b0a120157	bpf: fix doxygen comment Fix typo in doxygen comments for rte_bpf_convert(). Fixes: `2eccf6afbe` ("bpf: add function to convert classic BPF to DPDK BPF") Signed-off-by: Konstantin Ananyev <konstantin.ananyev@intel.com>	2021-11-04 19:56:14 +01:00
David Marchand	54abd300d5	pipeline: remove unreachable branch A previous change blamed it on compiler/ASan, while this is a real (yet minor) issue. This return -EINVAL is never reached since we test all combinations of fidx and fcin booleans. All branches end up with a return 0, factorize them. Fixes: `84f5ac9418` ("pipeline: fix build with ASan") Fixes: `f38913b7fb` ("pipeline: add meter array to SWX") Signed-off-by: David Marchand <david.marchand@redhat.com> Acked-by: Cristian Dumitrescu <cristian.dumitrescu@intel.com>	2021-11-04 18:11:08 +01:00
Yogesh Jangra	2ce3ccbe44	pipeline: fix dead code Fix minor dead code issue reported by Coverity. Coverity issue: 373653 Fixes: e9d870 ("pipeline: add SWX pipeline tables") Signed-off-by: Yogesh Jangra <yogesh.jangra@intel.com> Acked-by: Cristian Dumitrescu <cristian.dumitrescu@intel.com>	2021-11-04 16:43:27 +01:00
Wojciech Liguzinski	44c730b0e3	sched: add PIE based congestion management Implement PIE based congestion management based on rfc8033. The Proportional Integral Controller Enhanced (PIE) algorithm works by proactively dropping packets randomly. PIE is implemented as more advanced queue management is required to address the bufferbloat problem and provide desirable quality of service to users. Tests for PIE code added to test application. Added PIE related information to documentation. Signed-off-by: Wojciech Liguzinski <wojciechx.liguzinski@intel.com> Acked-by: Cristian Dumitrescu <cristian.dumitrescu@intel.com> Acked-by: Jasvinder Singh <jasvinder.singh@intel.com>	2021-11-04 15:41:49 +01:00
David Marchand	5633173341	eal/linux: fix device hotplug The device event interrupt handler was always freed. Bugzilla ID: 845 Fixes: `c2bd9367e1` ("lib: remove direct access to interrupt handle") Signed-off-by: David Marchand <david.marchand@redhat.com> Tested-by: Yan Xia <yanx.xia@intel.com> Reviewed-by: Maxime Coquelin <maxime.coquelin@redhat.com>	2021-11-04 15:13:41 +01:00
David Marchand	4847122aab	eal/linux: fix uevent message parsing Caught with ASan: ==9727==ERROR: AddressSanitizer: stack-buffer-overflow on address 0x7f0daa2fc0d0 at pc 0x7f0daeefacb2 bp 0x7f0daa2fadd0 sp 0x7f0daa2fa578 READ of size 1 at 0x7f0daa2fc0d0 thread T1 #0 0x7f0daeefacb1 (/lib64/libasan.so.5+0xbacb1) #1 0x115eba1 in dev_uev_parse ../lib/eal/linux/eal_dev.c:167 #2 0x115f281 in dev_uev_handler ../lib/eal/linux/eal_dev.c:248 #3 0x1169b91 in eal_intr_process_interrupts ../lib/eal/linux/eal_interrupts.c:1026 #4 0x116a3a2 in eal_intr_handle_interrupts ../lib/eal/linux/eal_interrupts.c:1100 #5 0x116a7f0 in eal_intr_thread_main ../lib/eal/linux/eal_interrupts.c:1172 #6 0x112640a in ctrl_thread_init ../lib/eal/common/eal_common_thread.c:202 #7 0x7f0dade27159 in start_thread (/lib64/libpthread.so.0+0x8159) #8 0x7f0dadb58f72 in clone (/lib64/libc.so.6+0xfcf72) Address 0x7f0daa2fc0d0 is located in stack of thread T1 at offset 4192 in frame #0 0x115f0c9 in dev_uev_handler ../lib/eal/linux/eal_dev.c:226 This frame has 2 object(s): [32, 48) 'uevent' [96, 4192) 'buf' <== Memory access at offset 4192 overflows this variable HINT: this may be a false positive if your program uses some custom stack unwind mechanism or swapcontext (longjmp and C++ exceptions are supported) Thread T1 created by T0 here: #0 0x7f0daee92ea3 in __interceptor_pthread_create (/lib64/libasan.so.5+0x52ea3) #1 0x1126542 in rte_ctrl_thread_create ../lib/eal/common/eal_common_thread.c:228 #2 0x116a8b5 in rte_eal_intr_init ../lib/eal/linux/eal_interrupts.c:1200 #3 0x1159dd1 in rte_eal_init ../lib/eal/linux/eal.c:1044 #4 0x7a22f8 in main ../app/test-pmd/testpmd.c:4105 #5 0x7f0dada7f802 in __libc_start_main (/lib64/libc.so.6+0x23802) Bugzilla ID: 792 Fixes: `0d0f478d04` ("eal/linux: add uevent parse and process") Cc: stable@dpdk.org Signed-off-by: David Marchand <david.marchand@redhat.com> Tested-by: Yan Xia <yanx.xia@intel.com> Reviewed-by: Maxime Coquelin <maxime.coquelin@redhat.com>	2021-11-04 15:13:41 +01:00
Jim Harris	628bac7df1	eal/linux: remove unused variable for socket memory clang-13 rightfully complains that the total_mem variable in eal_parse_socket_arg is set but not used, since the final accumulated total_mem result isn't used anywhere. So just remove the total_mem variable. Fixes: `0a703f0f36` ("eal/linux: fix parsing zero socket memory and limits") Signed-off-by: Jim Harris <james.r.harris@intel.com> Reviewed-by: David Marchand <david.marchand@redhat.com>	2021-11-04 13:27:18 +01:00
Vladimir Medvedkin	11c5b9b51a	fib: add RIB extension size parameter This patch adds a new parameter to the FIB configuration to specify the size of the extension for internal RIB structure. Signed-off-by: Vladimir Medvedkin <vladimir.medvedkin@intel.com> Tested-by: Conor Walsh <conor.walsh@intel.com>	2021-11-04 12:38:03 +01:00
Xueming Li	fc382022c6	eal: fix device iterator when no bus is selected Devargs used in device iterator initialization wasn't set to zero, random data like bus string lead to invalid address access. This patch initializes devargs. Bugzilla ID: 862 Fixes: `c99a2d4c6b` ("eal: implement device iteration initialization") Cc: stable@dpdk.org Signed-off-by: Xueming Li <xuemingl@nvidia.com>	2021-11-04 11:44:49 +01:00
Vladimir Medvedkin	adeca6685f	hash: fix use after free in Toeplitz hash This patch fixes use after free in thash library, reported by ASAN. Bugzilla ID: 868 Fixes: `28ebff11c2` ("hash: add predictable RSS") Cc: stable@dpdk.org Signed-off-by: Vladimir Medvedkin <vladimir.medvedkin@intel.com> Reviewed-by: David Marchand <david.marchand@redhat.com>	2021-11-04 11:43:20 +01:00
Vladimir Medvedkin	d27e2b7e9c	hash: enable GFNI Toeplitz hash implementation This patch enables new GFNI Toeplitz hash in predictable RSS library. Signed-off-by: Vladimir Medvedkin <vladimir.medvedkin@intel.com> Acked-by: Konstantin Ananyev <konstantin.ananyev@intel.com>	2021-11-04 11:19:10 +01:00
Vladimir Medvedkin	31d7c06947	hash: add bulk Toeplitz hash implementation This patch adds a bulk version for the Toeplitz hash implemented with Galois Field New Instructions (GFNI). Signed-off-by: Vladimir Medvedkin <vladimir.medvedkin@intel.com> Acked-by: Konstantin Ananyev <konstantin.ananyev@intel.com>	2021-11-04 11:19:10 +01:00
Vladimir Medvedkin	4fd8c4cb0d	hash: add new Toeplitz hash implementation This patch adds a new Toeplitz hash implementation using Galois Field New Instructions (GFNI). Signed-off-by: Vladimir Medvedkin <vladimir.medvedkin@intel.com> Acked-by: Konstantin Ananyev <konstantin.ananyev@intel.com>	2021-11-04 11:19:10 +01:00
Dmitry Kozlyuk	9790fc2149	eal/freebsd: fix IOVA mode selection FreeBSD EAL selected IOVA mode PA even in --no-huge mode where PA are not available. Memory zones were created with IOVA equal to RTE_BAD_IOVA with no indication this field is not usable. Change IOVA mode detection: 1. Always allow to force --iova-mode=va. 2. In --no-huge mode, disallow forcing --iova-mode=pa, and select VA. 3. Otherwise select IOVA mode according to bus requests, default to PA. In case contigmem is inaccessible, memory initialization will fail with a message indicating the cause. Fixes: `c2361bab70` ("eal: compute IOVA mode based on PA availability") Cc: stable@dpdk.org Signed-off-by: Dmitry Kozlyuk <dkozlyuk@nvidia.com> Acked-by: Bruce Richardson <bruce.richardson@intel.com>	2021-11-03 18:32:19 +01:00
Feifei Wang	6b70c6b31f	distributor: use wait until scheme Instead of polling for bufptr64 to be updated, use wait until scheme for this case. Signed-off-by: Feifei Wang <feifei.wang2@arm.com> Reviewed-by: Ruifeng Wang <ruifeng.wang@arm.com> Acked-by: Jerin Jacob <jerinj@marvell.com>	2021-11-03 15:50:14 +01:00
Feifei Wang	388bee69a5	bpf: use wait until scheme for Rx/Tx iteration Instead of polling for cbi->use to be updated, use wait until scheme. Signed-off-by: Feifei Wang <feifei.wang2@arm.com> Reviewed-by: Ruifeng Wang <ruifeng.wang@arm.com> Acked-by: Konstantin Ananyev <konstantin.ananyev@intel.com>	2021-11-03 15:50:14 +01:00
Feifei Wang	4ed4e554ac	mcslock: use wait until scheme for unlock Instead of polling for mcslock to be updated, use wait until scheme for this case. Signed-off-by: Feifei Wang <feifei.wang2@arm.com> Reviewed-by: Ruifeng Wang <ruifeng.wang@arm.com> Acked-by: Jerin Jacob <jerinj@marvell.com>	2021-11-03 15:50:14 +01:00
Feifei Wang	41902d2468	pflock: use wait until scheme for read lock Instead of polling for read pflock update, use wait until scheme for this case. Signed-off-by: Feifei Wang <feifei.wang2@arm.com> Reviewed-by: Ruifeng Wang <ruifeng.wang@arm.com> Acked-by: Jerin Jacob <jerinj@marvell.com>	2021-11-03 15:50:14 +01:00
Feifei Wang	875f350924	eal: add a new helper for wait until scheme Add a new generic helper which is a macro for wait until scheme. Furthermore, to prevent compilation warning in arm: ---------------------------------------------- 'warning: implicit declaration of function ...' ---------------------------------------------- Delete 'undef' constructions for '__LOAD_EXC_xx', '__SEVL' and '__WFE'. And add ‘__RTE_ARM’ for these macros to fix the namespace. This is because the original macros are undefined at the end of the file. If the new macro calls them in other files, they will be seen as 'not defined'. Signed-off-by: Feifei Wang <feifei.wang2@arm.com> Reviewed-by: Ruifeng Wang <ruifeng.wang@arm.com> Acked-by: Konstantin Ananyev <konstantin.ananyev@intel.com> Acked-by: Jerin Jacob <jerinj@marvell.com>	2021-11-03 15:50:14 +01:00
Konstantin Ananyev	53caecb844	pdump: fix freeing statistics memzone rte_pdump_init() always allocates a new memzone for pdump_stats. However, rte_pdump_uninit() never frees it. So the following combination will always fail: rte_pdump_init(); rte_pdump_uninit(); rte_pdump_init(); The issue was caught by pdump_autotest UT. While the first test run is successful, any subsequent runs of this test case will fail. Fix the issue by calling rte_memzone_free() for statistics memzone. Fixes: `10f726efe2` ("pdump: support pcapng and filtering") Signed-off-by: Konstantin Ananyev <konstantin.ananyev@intel.com> Acked-by: Reshma Pattan <reshma.pattan@intel.com> Acked-by: Stephen Hemminger <stephen@networkplumber.org>	2021-11-03 12:53:03 +01:00
Stephen Hemminger	b2be63b55a	pdump: fix packet snapshot length initialization If packet dump was enabled via pdump_enable_by_deviceid the packet snapshot length was not being set. Bugzilla ID: 840 Fixes: `10f726efe2` ("pdump: support pcapng and filtering") Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>	2021-11-01 00:36:29 +01:00
Stephen Hemminger	ae1702fffe	pcapng: use new ethdev namespace RTE_ prefix was added by commit `295968d174` ("ethdev: add namespace") Fixes: `8d23ce8f5e` ("pcapng: add new library for writing pcapng files") Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>	2021-10-31 23:25:02 +01:00
Zhihong Peng	6cc51b1293	mem: instrument allocator for ASan This patch adds necessary hooks in the memory allocator for ASan. This feature is currently available in DPDK only on Linux x86_64. If other OS/architectures want to support it, ASAN_SHADOW_OFFSET must be defined and RTE_MALLOC_ASAN must be set accordingly in meson. Signed-off-by: Xueqin Lin <xueqin.lin@intel.com> Signed-off-by: Zhihong Peng <zhihongx.peng@intel.com> Acked-by: Anatoly Burakov <anatoly.burakov@intel.com>	2021-10-29 16:25:03 +02:00
Zhihong Peng	84f5ac9418	pipeline: fix build with ASan Code changes to avoid the following build error: "Control reaches end of non-void function". Signed-off-by: Xueqin Lin <xueqin.lin@intel.com> Signed-off-by: Zhihong Peng <zhihongx.peng@intel.com> Acked-by: Cristian Dumitrescu <cristian.dumitrescu@intel.com>	2021-10-29 15:25:34 +02:00
Anatoly Burakov	ab910a8068	vfio: fix partial unmap Partial unmap support was introduced in commit `c13ca4e81c` ("vfio: fix DMA mapping granularity for IOVA as VA"), and with it was added a check that dereferenced the IOMMU type to determine whether partial unmapping is supported for the currently configured IOMMU type. In certain circumstances (such as when VFIO is supported, but no devices were bound to the VFIO driver), the IOMMU type pointer can be NULL. However, dereferencing of IOMMU type was guarded by access to the user maps list - that is, we were always checking the user map list first, and then, if we found a memory region that encloses the one we're trying to unmap, we would have performed the IOMMU type check. This ensured that the IOMMU type check will not cause any NULL pointer dereferences, because in order for an IOMMU type check to have been performed, there necessarily must have been at least one memory region that was previously mapped successfully, and that implies having a defined IOMMU type. When commit `56259f7fc0` ("vfio: allow partially unmapping adjacent memory") was introduced, the IOMMU type check was moved to before we were traversing the user mem maps list, thereby introducing a potential NULL dereference, because the IOMMU type access was no longer guarded by the user mem maps list traversal. Fix the issue by moving the IOMMU type check to after the user mem maps traversal, thereby ensuring that by the time the check happens, the IOMMU type is always valid. Fixes: `56259f7fc0` ("vfio: allow partially unmapping adjacent memory") Signed-off-by: Anatoly Burakov <anatoly.burakov@intel.com> Reviewed-by: David Marchand <david.marchand@redhat.com> Tested-by: Xuan Ding <xuan.ding@intel.com>	2021-10-28 09:51:55 +02:00
Honnappa Nagarahalli	705356f081	eal: simplify control thread creation Remove the usage of pthread barrier and replace it with synchronization using atomic variable. This also removes the use of reference count required to synchronize freeing the memory. Signed-off-by: Honnappa Nagarahalli <honnappa.nagarahalli@arm.com> Reviewed-by: Olivier Matz <olivier.matz@6wind.com>	2021-10-25 21:43:10 +02:00
Harman Kalra	8cb5d08db9	interrupts: extend event list Dynamically allocating the efds and elist array of intr_handle structure, based on size provided by user. Eg size can be MSIX interrupts supported by a PCI device. Signed-off-by: Harman Kalra <hkalra@marvell.com> Signed-off-by: David Marchand <david.marchand@redhat.com> Acked-by: Dmitry Kozlyuk <dmitry.kozliuk@gmail.com> Tested-by: Raslan Darawsheh <rasland@nvidia.com>	2021-10-25 21:20:12 +02:00
Harman Kalra	99e6c7e316	interrupts: rename device specific file descriptor VFIO/UIO are mutually exclusive, storing file descriptor in a single field is enough. Signed-off-by: Harman Kalra <hkalra@marvell.com> Signed-off-by: David Marchand <david.marchand@redhat.com> Tested-by: Raslan Darawsheh <rasland@nvidia.com>	2021-10-25 21:20:12 +02:00
Harman Kalra	73d844fd08	interrupts: make interrupt handle structure opaque Moving interrupt handle structure definition inside a EAL private header to make its fields totally opaque to the outside world. Signed-off-by: Harman Kalra <hkalra@marvell.com> Signed-off-by: David Marchand <david.marchand@redhat.com> Tested-by: Raslan Darawsheh <rasland@nvidia.com>	2021-10-25 21:20:12 +02:00
Harman Kalra	d61138d4f0	drivers: remove direct access to interrupt handle Removing direct access to interrupt handle structure fields, rather use respective get set APIs for the same. Making changes to all the drivers access the interrupt handle fields. Signed-off-by: Harman Kalra <hkalra@marvell.com> Acked-by: Hyong Youb Kim <hyonkim@cisco.com> Signed-off-by: David Marchand <david.marchand@redhat.com> Tested-by: Raslan Darawsheh <rasland@nvidia.com>	2021-10-25 21:20:12 +02:00
Harman Kalra	c2bd9367e1	lib: remove direct access to interrupt handle Removing direct access to interrupt handle structure fields, rather use respective get set APIs for the same. Making changes to all the libraries access the interrupt handle fields. Signed-off-by: Harman Kalra <hkalra@marvell.com> Signed-off-by: David Marchand <david.marchand@redhat.com> Tested-by: Raslan Darawsheh <rasland@nvidia.com>	2021-10-25 21:20:12 +02:00
Harman Kalra	90b13ab8d4	alarm: remove direct access to interrupt handle Removing direct access to interrupt handle structure fields, rather use respective get set APIs for the same. Making changes to all the libraries access the interrupt handle fields. Implementing alarm cleanup routine, where the memory allocated for interrupt instance can be freed. Signed-off-by: Harman Kalra <hkalra@marvell.com> Signed-off-by: David Marchand <david.marchand@redhat.com> Tested-by: Raslan Darawsheh <rasland@nvidia.com>	2021-10-25 21:20:12 +02:00
Harman Kalra	bbbac4cd6e	interrupts: remove direct access to interrupt handle Making changes to the interrupt framework to use interrupt handle APIs to get/set any field. Signed-off-by: Harman Kalra <hkalra@marvell.com> Signed-off-by: David Marchand <david.marchand@redhat.com> Tested-by: Raslan Darawsheh <rasland@nvidia.com>	2021-10-25 21:20:12 +02:00
Harman Kalra	b7c9842916	interrupts: add allocator and accessors Prototype/Implement get set APIs for interrupt handle fields. User won't be able to access any of the interrupt handle fields directly while should use these get/set APIs to access/manipulate them. Internal interrupt header i.e. rte_eal_interrupt.h is rearranged, as APIs defined are moved to rte_interrupts.h and epoll specific definitions are moved to a new header rte_epoll.h. Later in the series rte_eal_interrupt.h will be removed. Signed-off-by: Harman Kalra <hkalra@marvell.com> Acked-by: Ray Kinsella <mdr@ashroe.eu> Acked-by: Dmitry Kozlyuk <dmitry.kozliuk@gmail.com> Signed-off-by: David Marchand <david.marchand@redhat.com> Tested-by: Raslan Darawsheh <rasland@nvidia.com>	2021-10-25 21:20:12 +02:00
Dmitry Kozlyuk	0c8fc83a71	eal/windows: fix IOVA mode detection and handling Windows EAL did not detect IOVA mode and worked incorrectly if physical addresses could not be obtained (if virt2phys driver was missing or inaccessible). In this case, rte_mem_virt2iova() reported RTE_BAD_IOVA for any address. Inability to obtain IOVA, be it PA or VA, should cause a failure for the DPDK allocator, but it was hidden by the implementation, so allocations did not fail when they should. The mode when DPDK cannot obtain PA but can work is IOVA-as-VA mode. However, rte_eal_iova_mode() always returned RTE_IOVA_DC (while it should only ever return RTE_IOVA_PA or RTE_IOVA_VA), because IOVA mode detection was not implemented. Implement IOVA mode detection: 1. Always allow to force --iova-mode=va. 2. Allow to force --iova-mode=pa only if virt2phys is available. 3. If no mode is forced and virt2phys is available, select the mode according to bus requests, default to PA. 4. If no mode is forced but virt2phys is unavailable, default to VA. Fix rte_mem_virt2iova() by returning VA when using IOVA-as-VA. Fix rte_eal_iova_mode() by returning the selected mode. Fixes: `2a5d547a4a` ("eal/windows: implement basic memory management") Cc: stable@dpdk.org Reported-by: Tal Shnaiderman <talshn@nvidia.com> Signed-off-by: Dmitry Kozlyuk <dkozlyuk@nvidia.com> Tested-by: Pallavi Kadam <pallavi.kadam@intel.com> Acked-by: Pallavi Kadam <pallavi.kadam@intel.com>	2021-10-25 20:59:40 +02:00
Harman Kalra	e6732d0d6e	mem: add telemetry infos Registering new telemetry callbacks to list named (memzones) and unnamed (malloc) memory reserved and return information based on arguments provided by user. Example: Connecting to /var/run/dpdk/rte/dpdk_telemetry.v2 {"version": "DPDK 21.11.0-rc0", "pid": 59754, "max_output_len": 16384} Connected to application: "dpdk-testpmd" --> --> /eal/memzone_list {"/eal/memzone_list": [0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13]} --> --> --> /eal/memzone_info,0 {"/eal/memzone_info": {"Zone": 0, "Name": "rte_eth_dev_data", \ "Length": 225408, "Address": "0x13ffc0280", "Socket": 0, "Flags": 0, \ "Hugepage_size": 536870912, "Hugepage_base": "0x120000000", \ "Hugepage_used": 1}} --> --> --> /eal/memzone_info,6 {"/eal/memzone_info": {"Zone": 6, "Name": "MP_mb_pool_0_0", \ "Length": 669918336, "Address": "0x15811db80", "Socket": 0, \ "Flags": 0, "Hugepage_size": 536870912, "Hugepage_base": "0x140000000", \ "Hugepage_used": 2}} --> --> --> /eal/memzone_info,14 {"/eal/memzone_info": null} --> --> --> /eal/heap_list {"/eal/heap_list": [0]} --> --> --> /eal/heap_info,0 {"/eal/heap_info": {"Head id": 0, "Name": "socket_0", \ "Heap_size": 1610612736, "Free_size": 927645952, \ "Alloc_size": 682966784, "Greatest_free_size": 529153152, \ "Alloc_count": 482, "Free_count": 2}} Signed-off-by: Harman Kalra <hkalra@marvell.com> Acked-by: Ciara Power <ciara.power@intel.com> Acked-by: Bruce Richardson <bruce.richardson@intel.com>	2021-10-25 19:39:54 +02:00
Vladimir Medvedkin	97e2ae4c58	rib: fix IPv6 depth mask Fixes: `03b8372a9a` ("rib: fix max depth IPv6 lookup") Cc: stable@dpdk.org Signed-off-by: Vladimir Medvedkin <vladimir.medvedkin@intel.com>	2021-10-25 19:13:12 +02:00
Vladimir Medvedkin	b16ac53657	lpm6: fix buffer overflow This patch fixes buffer overflow reported by ASAN, please reference https://bugs.dpdk.org/show_bug.cgi?id=819 The rte_lpm6 keeps routing information for control plane purpose inside the rte_hash table which uses rte_jhash() as a hash function. From the rte_jhash() documentation: If input key is not aligned to four byte boundaries or a multiple of four bytes in length, the memory region just after may be read (but not used in the computation). rte_lpm6 uses 17 bytes keys consisting of IPv6 address (16 bytes) + depth (1 byte). This patch increases the size of the depth field up to uint32_t and sets the alignment to 4 bytes. Bugzilla ID: 819 Fixes: `86b3b21952` ("lpm6: store rules in hash table") Cc: stable@dpdk.org Signed-off-by: Vladimir Medvedkin <vladimir.medvedkin@intel.com> Acked-by: Bruce Richardson <bruce.richardson@intel.com>	2021-10-25 19:08:16 +02:00
Vladimir Medvedkin	45523f494c	hash: fix Doxygen comment of Toeplitz file Fixes: `7574c3ef74` ("hash: add toeplitz algorithm used by RSS") Cc: stable@dpdk.org Signed-off-by: Vladimir Medvedkin <vladimir.medvedkin@intel.com> Acked-by: John McNamara <john.mcnamara@intel.com>	2021-10-25 19:06:07 +02:00
Honnappa Nagarahalli	3596537005	eal: fix memory ordering around lcore task accesses Ensure that the memory operations before the call to rte_eal_remote_launch are visible to the worker thread. Use the function pointer to execute in worker thread as the guard variable. Ensure that the memory operations in worker thread, that happen before it returns the status of the assigned function, are visible to the main thread. Use the variable containing the lcore's state as the guard variable. Signed-off-by: Honnappa Nagarahalli <honnappa.nagarahalli@arm.com> Reviewed-by: Ola Liljedahl <ola.liljedahl@arm.com> Reviewed-by: Feifei Wang <feifei.wang2@arm.com>	2021-10-25 18:20:59 +02:00
Honnappa Nagarahalli	f6c6c686f1	eal: remove FINISHED lcore state FINISHED state seems to be used to indicate that the worker's update of the 'state' is not visible to other threads. There seems to be no requirement to have such a state. Since the FINISHED state is removed, the API rte_eal_wait_lcore is updated to always return the status of the last function that ran in the worker core. Signed-off-by: Honnappa Nagarahalli <honnappa.nagarahalli@arm.com> Reviewed-by: Ola Liljedahl <ola.liljedahl@arm.com> Reviewed-by: Feifei Wang <feifei.wang2@arm.com>	2021-10-25 18:20:59 +02:00
Honnappa Nagarahalli	33969e9c61	eal: reset lcore task callback and argument In the rte_eal_remote_launch function, the lcore function pointer is checked for NULL. However, the pointer is never reset to NULL. Reset the lcore function pointer and argument after the worker has completed executing the lcore function. Fixes: `af75078fec` ("first public release") Cc: stable@dpdk.org Signed-off-by: Honnappa Nagarahalli <honnappa.nagarahalli@arm.com> Reviewed-by: Ruifeng Wang <ruifeng.wang@arm.com> Reviewed-by: Feifei Wang <feifei.wang2@arm.com>	2021-10-25 18:20:59 +02:00
Eli Britstein	6de430b707	eal/x86: avoid cast-align warning in memcpy functions Functions and macros in x86 rte_memcpy.h may cause cast-align warnings, when using strict cast align flag with supporting gcc: gcc (Ubuntu 9.3.0-17ubuntu1~20.04) 9.3.0 CFLAGS="-Wcast-align=strict" make V=1 -C examples/l2fwd clean static For example: In file included from main.c:24: /dpdk/build/include/rte_memcpy.h: In function 'rte_mov16': /dpdk/build/include/rte_memcpy.h:306:25: warning: cast increases required alignment of target type [-Wcast-align] 306 | xmm0 = _mm_loadu_si128((const __m128i *)src); | ^ As the code assumes correct alignment, add first a (void *) or (const void *) castings, to avoid the warnings. Fixes: `9484092baa` ("eal/x86: optimize memcpy for AVX512 platforms") Cc: stable@dpdk.org Signed-off-by: Eli Britstein <elibr@nvidia.com>	2021-10-25 17:28:12 +02:00
Eli Britstein	da0333c879	mbuf: avoid cast-align warning in data offset macro In rte_pktmbuf_mtod_offset macro, there is a casting from char * to type 't', which may cause cast-align warning when using strict cast align flag with supporting gcc: gcc (Ubuntu 9.3.0-17ubuntu1~20.04) 9.3.0 CFLAGS="-Wcast-align=strict" make V=1 -C examples/l2fwd clean static main.c: In function 'l2fwd_mac_updating': /dpdk/build/include/rte_mbuf_core.h:719:3: warning: cast increases required alignment of target type [-Wcast-align] 719 | ((t)((char *)(m)->buf_addr + (m)->data_off + (o))) | ^ /dpdk/build/include/rte_mbuf_core.h:733:32: note: in expansion of macro 'rte_pktmbuf_mtod_offset' 733 | #define rte_pktmbuf_mtod(m, t) rte_pktmbuf_mtod_offset(m, t, 0) | ^~~~~~~~~~~~~~~~~~~~~~~ As the code assumes correct alignment, add first a (void *) casting, to avoid the warning. Fixes: `af75078fec` ("first public release") Cc: stable@dpdk.org Signed-off-by: Eli Britstein <elibr@nvidia.com> Acked-by: Olivier Matz <olivier.matz@6wind.com>	2021-10-25 17:27:48 +02:00
Eli Britstein	a3f8d05871	net: avoid cast-align warning in VLAN insert function In rte_vlan_insert there is a casting of rte_pktmbuf_prepend returned value to (struct rte_ether_hdr *), which causes cast-align warning when using strict cast align flag with supporting gcc: gcc (Ubuntu 9.3.0-17ubuntu1~20.04) 9.3.0 CFLAGS="-Wcast-align=strict" make V=1 -C examples/l2fwd clean static In file included from main.c:35: /dpdk/build/include/rte_ether.h:370:7: warning: cast increases required alignment of target type [-Wcast-align] 370 | nh = (struct rte_ether_hdr *) | ^ As the code assumes correct alignment, add first a (void *) casting, to avoid the warning. Fixes: `c974021a59` ("ether: add soft vlan encap/decap") Cc: stable@dpdk.org Signed-off-by: Eli Britstein <elibr@nvidia.com> Acked-by: Olivier Matz <olivier.matz@6wind.com>	2021-10-25 17:27:17 +02:00
Dmitry Kozlyuk	6fda3ff6f0	mempool: fix non-IO flag inference When mempool had been created with RTE_MEMPOOL_F_NO_IOVA_CONTIG flag but later populated with valid IOVA, RTE_MEMPOOL_F_NON_IO was unset, while it should be kept. The unit test did not catch this because rte_mempool_populate_default() it used was populating with RTE_BAD_IOVA. Keep setting RTE_MEMPOOL_F_NON_IO at an empty mempool creation and add an assert for it in the unit test (remove the separate case). Do not reset the flag if RTE_MEMPOOL_F_NO_IOVA_CONTIG is set. Fixes: `11541c5c81` ("mempool: add non-IO flag") Signed-off-by: Dmitry Kozlyuk <dkozlyuk@nvidia.com> Acked-by: Olivier Matz <olivier.matz@6wind.com>	2021-10-25 16:52:56 +02:00
Jasvinder Singh	fd9e07a1f4	sched: promote a function as stable This API was introduced in 18.05, therefore removing experimental tag to promote it to stable state Signed-off-by: Jasvinder Singh <jasvinder.singh@intel.com> Acked-by: Cristian Dumitrescu <cristian.dumitrescu@intel.com> Acked-by: Ray Kinsella <mdr@ashroe.eu>	2021-10-25 15:14:22 +02:00
Yogesh Jangra	cd79e02058	pipeline: support action annotations Enable restricting the scope of an action to regular table entries or to the table default entry in order to support the P4 language tableonly or defaultonly annotations. Signed-off-by: Yogesh Jangra <yogesh.jangra@intel.com> Acked-by: Cristian Dumitrescu <cristian.dumitrescu@intel.com>	2021-10-25 14:53:28 +02:00
Yogesh Jangra	0317c4521d	port: configure loop count for source port Add support for configurable number of loops through the input PCAP file for the source port. Added an additional parameter to source port CLI command. Signed-off-by: Yogesh Jangra <yogesh.jangra@intel.com> Acked-by: Cristian Dumitrescu <cristian.dumitrescu@intel.com>	2021-10-25 14:30:32 +02:00
Yogesh Jangra	55095ccb7f	pipeline: fix instruction label check The instruction_data array was incorrectly indexed, which resulted in the array index getting out of bounds and sometimes segfault. Fixes: a1711f (“pipeline: add SWX Rx and extract instructions“) Cc: stable@dpdk.org Signed-off-by: Yogesh Jangra <yogesh.jangra@intel.com> Acked-by: Cristian Dumitrescu <cristian.dumitrescu@intel.com>	2021-10-25 14:06:02 +02:00
David Marchand	e0d3a74d92	net: fix build with pedantic for L2TPv2 definitions Build is broken on RHEL7 following introduction of this new protocol. Fixes: `3a929df1f2` ("ethdev: support L2TPv2 and PPP procotol") Signed-off-by: David Marchand <david.marchand@redhat.com> Tested-by: Raslan Darawsheh <rasland@nvidia.com>	2021-10-25 09:33:15 +02:00
Olivier Matz	daa02b5cdd	mbuf: add namespace to offload flags Fix the mbuf offload flags namespace by adding an RTE_ prefix to the name. The old flags remain usable, but a deprecation warning is issued at compilation. Signed-off-by: Olivier Matz <olivier.matz@6wind.com> Acked-by: Andrew Rybchenko <andrew.rybchenko@oktetlabs.ru> Acked-by: Ajit Khaparde <ajit.khaparde@broadcom.com> Acked-by: Somnath Kotur <somnath.kotur@broadcom.com>	2021-10-24 13:37:43 +02:00
Olivier Matz	5b63493241	mbuf: mark old VLAN offload flags as deprecated The flags PKT_TX_VLAN_PKT and PKT_TX_QINQ_PKT are marked as deprecated since commit `380a7aab1a` ("mbuf: rename deprecated VLAN flags") (2017). But they were not using the RTE_DEPRECATED macro, because it did not exist at this time. Add it, and replace usage of these flags. Signed-off-by: Olivier Matz <olivier.matz@6wind.com> Acked-by: Andrew Rybchenko <andrew.rybchenko@oktetlabs.ru> Acked-by: Ajit Khaparde <ajit.khaparde@broadcom.com>	2021-10-24 13:30:40 +02:00
Olivier Matz	0c03660db1	mbuf: remove duplicate definition of cksum offload flags The flags PKT_RX_L4_CKSUM_BAD and PKT_RX_IP_CKSUM_BAD are defined twice with the same value. Remove one of the occurrence, which was marked as "deprecated". Signed-off-by: Olivier Matz <olivier.matz@6wind.com> Acked-by: Andrew Rybchenko <andrew.rybchenko@oktetlabs.ru>	2021-10-24 13:30:40 +02:00
Radu Nicolau	74176aec37	ipsec: fix telemetry text Set correct tunnel type telemetry text - tunnel type was wrongly set as IPv4-UDP for all types. Fixes: bf5b65a8e781 ("ipsec: support SA telemetry") Signed-off-by: Radu Nicolau <radu.nicolau@intel.com> Acked-by: Akhil Goyal <gakhil@marvell.com>	2021-10-20 15:55:37 +02:00
Akhil Goyal	92cb130919	cryptodev: move device-specific structures The device specific structures - rte_cryptodev and rte_cryptodev_data are moved to cryptodev_pmd.h to hide it from the applications. Signed-off-by: Akhil Goyal <gakhil@marvell.com> Tested-by: Rebecca Troy <rebecca.troy@intel.com> Acked-by: Fan Zhang <roy.fan.zhang@intel.com> Acked-by: Konstantin Ananyev <konstantin.ananyev@intel.com>	2021-10-20 15:33:16 +02:00
Akhil Goyal	f6849cdcc6	cryptodev: use new flat array in fast path API Rework fast-path cryptodev functions to use rte_crypto_fp_ops[]. While it is an API/ABI breakage, this change is intended to be transparent for both users (no changes in user app is required) and PMD developers (no changes in PMD is required). Signed-off-by: Akhil Goyal <gakhil@marvell.com> Acked-by: Fan Zhang <roy.fan.zhang@intel.com> Acked-by: Konstantin Ananyev <konstantin.ananyev@intel.com>	2021-10-20 15:33:16 +02:00
Akhil Goyal	33cd3fd52f	cryptodev: add device probing finish function Added a rte_cryptodev_pmd_probing_finish API which need to be called by the PMD after the device is initialized completely. This will set the fast path function pointers in the flat array for secondary process. For primary process, these are set in rte_cryptodev_start. Signed-off-by: Akhil Goyal <gakhil@marvell.com> Acked-by: Konstantin Ananyev <konstantin.ananyev@intel.com> Acked-by: Fan Zhang <roy.fan.zhang@intel.com>	2021-10-20 15:33:16 +02:00
Akhil Goyal	2fd66f758f	cryptodev: move inline APIs into separate structure Move fastpath inline function pointers from rte_cryptodev into a separate structure accessed via a flat array. The intention is to make rte_cryptodev and related structures private to avoid future API/ABI breakages. Signed-off-by: Akhil Goyal <gakhil@marvell.com> Tested-by: Rebecca Troy <rebecca.troy@intel.com> Acked-by: Fan Zhang <roy.fan.zhang@intel.com> Acked-by: Konstantin Ananyev <konstantin.ananyev@intel.com>	2021-10-20 15:33:16 +02:00
Akhil Goyal	7f3876ad54	cryptodev: allocate max space for internal queue array At queue_pair config stage, allocate memory for maximum number of queue pair pointers that a device can support. This will allow fast path APIs(enqueue_burst/dequeue_burst) to refer pointer to internal QP data without checking for currently configured QPs. This is required to hide the rte_cryptodev and rte_cryptodev_data structure from user. Signed-off-by: Akhil Goyal <gakhil@marvell.com> Acked-by: Fan Zhang <roy.fan.zhang@intel.com> Acked-by: Konstantin Ananyev <konstantin.ananyev@intel.com>	2021-10-20 15:33:16 +02:00
Akhil Goyal	691e1f4d56	cryptodev: separate out internal structures A new header file rte_cryptodev_core.h is added and all internal data structures which need not be exposed directly to application are moved to this file. These structures are mostly used by drivers, but they need to be in the public header file as they are accessed by datapath inline functions for performance reasons. Signed-off-by: Akhil Goyal <gakhil@marvell.com> Tested-by: Rebecca Troy <rebecca.troy@intel.com> Acked-by: Fan Zhang <roy.fan.zhang@intel.com> Acked-by: Konstantin Ananyev <konstantin.ananyev@intel.com>	2021-10-20 15:33:16 +02:00
Andrew Rybchenko	68e8ca7b59	ethdev: avoid usage of ULL for 64-bit unsigned constants Use UINT64_C() macro instead. Signed-off-by: Andrew Rybchenko <andrew.rybchenko@oktetlabs.ru> Reviewed-by: Ferruh Yigit <ferruh.yigit@intel.com>	2021-10-22 19:11:35 +02:00
Andrew Rybchenko	4852c647d1	ethdev: replace single bit masks with macros The macros RTE_BIT32 and RTE_BIT64 are used to replace single bit masks. Do not switch VLAN offload flags since type is not fixed size. Signed-off-by: Andrew Rybchenko <andrew.rybchenko@oktetlabs.ru> Reviewed-by: Ferruh Yigit <ferruh.yigit@intel.com>	2021-10-22 18:36:34 +02:00
Ferruh Yigit	295968d174	ethdev: add namespace Add 'RTE_ETH' namespace to all enums & macros in a backward compatible way. The macros for backward compatibility can be removed in next LTS. Also updated some struct names to have 'rte_eth' prefix. All internal components switched to using new names. Syntax fixed on lines that this patch touches. Signed-off-by: Ferruh Yigit <ferruh.yigit@intel.com> Acked-by: Tyler Retzlaff <roretzla@linux.microsoft.com> Acked-by: Andrew Rybchenko <andrew.rybchenko@oktetlabs.ru> Acked-by: Ajit Khaparde <ajit.khaparde@broadcom.com> Acked-by: Jerin Jacob <jerinj@marvell.com> Acked-by: Wisam Jaddo <wisamm@nvidia.com> Acked-by: Rosen Xu <rosen.xu@intel.com> Acked-by: Chenbo Xia <chenbo.xia@intel.com> Acked-by: Hemant Agrawal <hemant.agrawal@nxp.com> Acked-by: Somnath Kotur <somnath.kotur@broadcom.com>	2021-10-22 18:15:38 +02:00
Ivan Ilchenko	b26bee10ee	ethdev: forbid MTU set before device configure rte_eth_dev_configure() always sets MTU to either dev_conf.rxmode.mtu or RTE_ETHER_MTU if application doesn't provide the value. So, there is no point to allow rte_eth_dev_set_mtu() before since set value will be overwritten on configure anyway. Fixes: `1bb4a528c4` ("ethdev: fix max Rx packet length") Signed-off-by: Ivan Ilchenko <ivan.ilchenko@oktetlabs.ru> Signed-off-by: Andrew Rybchenko <andrew.rybchenko@oktetlabs.ru> Reviewed-by: Ferruh Yigit <ferruh.yigit@intel.com>	2021-10-22 15:26:54 +02:00
Andrew Rybchenko	9ce1717d3e	ethdev: remove unused L2 tunnel mask defines Fixes: `cf47acc0f9` ("ethdev: remove L2 tunnel offload control API") Cc: stable@dpdk.org Signed-off-by: Andrew Rybchenko <andrew.rybchenko@oktetlabs.ru> Reviewed-by: Ferruh Yigit <ferruh.yigit@intel.com>	2021-10-22 12:03:52 +02:00
Xueming Li	93e441c9a0	ethdev: get device capability name as string This patch adds API to return name of device capability. Signed-off-by: Xueming Li <xuemingl@nvidia.com> Reviewed-by: Andrew Rybchenko <andrew.rybchenko@oktetlabs.ru> Acked-by: Ajit Khaparde <ajit.khaparde@broadcom.com> Acked-by: Thomas Monjalon <thomas@monjalon.net>	2021-10-22 00:08:57 +02:00
Xueming Li	dd22740cc2	ethdev: introduce shared Rx queue In current DPDK framework, each Rx queue is pre-loaded with mbufs to save incoming packets. For some PMDs, when number of representors scale out in a switch domain, the memory consumption became significant. Polling all ports also leads to high cache miss, high latency and low throughput. This patch introduces shared Rx queue. Ports in same Rx domain and switch domain could share Rx queue set by specifying non-zero sharing group in Rx queue configuration. Shared Rx queue is identified by share_rxq field of Rx queue configuration. Port A RxQ X can share RxQ with Port B RxQ Y by using same shared Rx queue ID. No special API is defined to receive packets from shared Rx queue. Polling any member port of a shared Rx queue receives packets of that queue for all member ports, port_id is identified by mbuf->port. PMD is responsible to resolve shared Rx queue from device and queue data. Shared Rx queue must be polled in same thread or core, polling a queue ID of any member port is essentially same. Multiple share groups are supported. PMD should support mixed configuration by allowing multiple share groups and non-shared Rx queue on one port. Example grouping and polling model to reflect service priority: Group1, 2 shared Rx queues per port: PF, rep0, rep1 Group2, 1 shared Rx queue per port: rep2, rep3, ... rep127 Core0: poll PF queue0 Core1: poll PF queue1 Core2: poll rep2 queue0 PMD advertise shared Rx queue capability via RTE_ETH_DEV_CAPA_RXQ_SHARE. PMD is responsible for shared Rx queue consistency checks to avoid member port's configuration contradict each other. Signed-off-by: Xueming Li <xuemingl@nvidia.com> Reviewed-by: Andrew Rybchenko <andrew.rybchenko@oktetlabs.ru> Acked-by: Ajit Khaparde <ajit.khaparde@broadcom.com>	2021-10-22 00:08:50 +02:00
Huisong Li	17faaed854	ethdev: fix PCI device release in secondary process In secondary process, rte_eth_dev_close() doesn't clear eth_dev->data. If calling rte_dev_remove() after rte_eth_dev_close(), in rte_eth_dev_pci_generic_remove() function, the released eth device still can be found by its name in shared memory. As a result, the eth device will be released repeatedly. The state of the eth device is modified to RTE_ETH_DEV_UNUSED after rte_eth_dev_close(). So this state can be used to avoid this problem. Fixes: `dcd5c8112b` ("ethdev: add PCI driver helpers") Cc: stable@dpdk.org Signed-off-by: Huisong Li <lihuisong@huawei.com> Reviewed-by: Ferruh Yigit <ferruh.yigit@intel.com>	2021-10-21 23:15:34 +02:00
Xuan Ding	7c61fa08b7	vhost: enable IOMMU for async vhost The use of IOMMU has many advantages, such as isolation and address translation. This patch extends the capability of DMA engine to use IOMMU if the DMA engine is bound to vfio. When set memory table, the guest memory will be mapped into the default container of DPDK. Signed-off-by: Xuan Ding <xuan.ding@intel.com> Tested-by: Yvonne Yang <yvonnex.yang@intel.com> Reviewed-by: Maxime Coquelin <maxime.coquelin@redhat.com>	2021-10-21 14:24:21 +02:00
Xuan Ding	56259f7fc0	vfio: allow partially unmapping adjacent memory Currently, if we map a memory area A, then map a separate memory area B that by coincidence happens to be adjacent to A, current implementation will merge these two segments into one, and if partial unmapping is not supported, these segments will then be only allowed to be unmapped in one go. In other words, given segments A and B that are adjacent, it is currently not possible to map A, then map B, then unmap A. Fix this by adding a notion of "chunk size", which will allow subdividing segments into equally sized segments whenever we are dealing with an IOMMU that does not support partial unmapping. With this change, we will still be able to merge adjacent segments, but only if they are of the same size. If we keep with our above example, adjacent segments A and B will be stored as separate segments if they are of different sizes. Signed-off-by: Anatoly Burakov <anatoly.burakov@intel.com> Signed-off-by: Xuan Ding <xuan.ding@intel.com> Tested-by: Yvonne Yang <yvonnex.yang@intel.com> Reviewed-by: Maxime Coquelin <maxime.coquelin@redhat.com>	2021-10-21 14:24:21 +02:00
Li Feng	5a4fbe79e6	vhost: add sanity check on inflight last index The index in rte_vhost_set_last_inflight_io_split is from the frontend driver, check if it's in the virtqueue range. Fixes: `bb0c2de960` ("vhost: add APIs to operate inflight ring") Cc: stable@dpdk.org Signed-off-by: Li Feng <fengli@smartx.com> Reviewed-by: Maxime Coquelin <maxime.coquelin@redhat.com>	2021-10-21 14:24:21 +02:00
Jie Wang	3a929df1f2	ethdev: support L2TPv2 and PPP procotol Added flow pattern items and header formats of L2TPv2 and PPP. Signed-off-by: Wenjun Wu <wenjun1.wu@intel.com> Signed-off-by: Jie Wang <jie1x.wang@intel.com> Acked-by: Ori Kam <orika@nvidia.com> Acked-by: Andrew Rybchenko <andrew.rybchenko@oktetlabs.ru> Reviewed-by: Ferruh Yigit <ferruh.yigit@intel.com>	2021-10-21 14:15:59 +02:00
Andrew Rybchenko	55645ee65b	ethdev: remove full stop after short comments Full stop at the end of short comment just make line longer. It should be either everywhere or nowhere to be consistent. Signed-off-by: Andrew Rybchenko <andrew.rybchenko@oktetlabs.ru> Reviewed-by: Ferruh Yigit <ferruh.yigit@intel.com>	2021-10-21 13:43:56 +02:00
Andrew Rybchenko	cc0a644450	ethdev: make device and data structures readable Add empty lines to separate fields commented using different styles. Signed-off-by: Andrew Rybchenko <andrew.rybchenko@oktetlabs.ru> Reviewed-by: Ferruh Yigit <ferruh.yigit@intel.com>	2021-10-21 13:43:56 +02:00
Andrew Rybchenko	32ec9c6be7	ethdev: remove reserved fields from internal structures Fixes: `f9bdee267a` ("ethdev: hide internal structures") Signed-off-by: Andrew Rybchenko <andrew.rybchenko@oktetlabs.ru> Reviewed-by: Ferruh Yigit <ferruh.yigit@intel.com>	2021-10-21 13:43:56 +02:00
Andrew Rybchenko	bf73419d96	ethdev: fix EEPROM spelling Signed-off-by: Andrew Rybchenko <andrew.rybchenko@oktetlabs.ru> Reviewed-by: Ferruh Yigit <ferruh.yigit@intel.com>	2021-10-21 13:43:56 +02:00
Andrew Rybchenko	5906be5af6	ethdev: fix ID spelling in comments and log messages Signed-off-by: Andrew Rybchenko <andrew.rybchenko@oktetlabs.ru> Acked-by: Ori Kam <orika@nvidia.com> Reviewed-by: Ferruh Yigit <ferruh.yigit@intel.com>	2021-10-21 13:43:56 +02:00
Andrew Rybchenko	5b49ba658b	ethdev: fix VLAN spelling including VLAN ID case Signed-off-by: Andrew Rybchenko <andrew.rybchenko@oktetlabs.ru> Acked-by: Ori Kam <orika@nvidia.com> Reviewed-by: Ferruh Yigit <ferruh.yigit@intel.com>	2021-10-21 13:43:56 +02:00
Andrew Rybchenko	064e90c419	ethdev: fix DCB and VMDq spelling Fix both in one changeset since they share line in a number of cases. Signed-off-by: Andrew Rybchenko <andrew.rybchenko@oktetlabs.ru> Reviewed-by: Ferruh Yigit <ferruh.yigit@intel.com>	2021-10-21 13:43:56 +02:00
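
The lpm6 fix listed above (commit `b16ac53657`) hinges on key size and alignment: per the quoted rte_jhash() documentation, a key whose length is not a multiple of four bytes may cause the memory just after it to be read. Below is a minimal sketch in C of the layout change the message describes. The struct names, field names, and helper are hypothetical and are not the actual rte_lpm6 definitions; the `aligned` attribute assumes GCC or Clang.

```c
#include <stddef.h>
#include <stdint.h>

#define IPV6_ADDR_SIZE 16 /* bytes in an IPv6 address */

/*
 * Hypothetical pre-fix key: 16-byte address + 1-byte depth = 17 bytes.
 * 17 is not a multiple of 4, so a hash that consumes the key in 32-bit
 * words may read up to 3 bytes located after the key.
 */
struct rule_key_before {
	uint8_t ip[IPV6_ADDR_SIZE];
	uint8_t depth;
};

/*
 * Hypothetical post-fix key: depth widened to uint32_t and the struct
 * aligned to 4 bytes, so the key length becomes a multiple of 4.
 */
struct rule_key_after {
	uint8_t ip[IPV6_ADDR_SIZE];
	uint32_t depth;
} __attribute__((aligned(4)));

/* Returns 1 when a key length is a whole number of 32-bit words. */
static inline int key_len_is_word_multiple(size_t len)
{
	return (len % sizeof(uint32_t)) == 0;
}
```

With these definitions, `key_len_is_word_multiple(sizeof(struct rule_key_before))` is 0 and the same check on `struct rule_key_after` is 1, which is the property the commit message relies on.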

7520 Commits