numam-dpdk

Author	SHA1	Message	Date
Mattias Rönnblom	34d785571f	eventdev/crypto: support appropriately report idle Update the event crypto adapter's service function to report as idle (i.e., return -EAGAIN) in case no crypto operations were performed. Signed-off-by: Mattias Rönnblom <mattias.ronnblom@ericsson.com> Acked-by: Abhinandan Gujjar <abhinandan.gujjar@intel.com>	2022-10-21 11:34:41 +02:00
Erik Gabriel Carrillo	329280c53e	service: fix early move to inactive status Assume thread T2 is a service lcore that is in the middle of executing a service function. Also, assume thread T1 concurrently calls rte_service_lcore_stop(), which will set the "service_active_on_lcore" state to false. If thread T1 then calls rte_service_may_be_active(), it can return zero even though T2 is still running the service function. If T1 then proceeds to free data being used by T2, a crash can ensue. Move the logic that clears the "service_active_on_lcore" state from the rte_service_lcore_stop() function to the service_runner_func() to ensure that we: - don't let the "service_active_on_lcore" state linger as 1 - don't clear the state early Fixes: `6550113be6` ("service: fix lingering active status") Cc: stable@dpdk.org Signed-off-by: Erik Gabriel Carrillo <erik.g.carrillo@intel.com> Acked-by: Harry van Haaren <harry.van.haaren@intel.com>	2022-10-21 14:54:26 +02:00
Stephen Hemminger	8a0cf0c455	pdump: do not allow enable/disable in primary process Attempts to enable or disable pdump in primary process will fail with core dump because it is not valid to call rte_mp_request_sync() unless in a secondary process. Trap the error in the common code used for both enable and disable requests. Fixes: `660098d61f` ("pdump: use generic multi-process channel") Cc: stable@dpdk.org Reported-by: Sylvia Grundwürmer <sylvia.grundwuermer@b-plus.com> Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>	2022-10-21 14:54:26 +02:00
David Marchand	eb870201b4	trace: remove limitation on directory Remove the arbitrary limit of 12 characters on the file prefix used for the directory where traces are stored. Simplify the code by relying on dynamic allocations. Signed-off-by: David Marchand <david.marchand@redhat.com> Acked-by: Jerin Jacob <jerinj@marvell.com> Acked-by: Sunil Kumar Kori <skori@marvell.com>	2022-10-20 13:34:19 +02:00
David Marchand	477cc313a2	trace: remove limitation on trace point name The name of a trace point is provided as a constant string via the RTE_TRACE_POINT_REGISTER macro. We can rely on an explicit constant string in the binary and simply point at it. There is then no need for a (fixed size) copy. Signed-off-by: David Marchand <david.marchand@redhat.com> Acked-by: Jerin Jacob <jerinj@marvell.com>	2022-10-20 13:34:19 +02:00
David Marchand	d4cbbee345	trace: fix metadata dump The API does not state that the metadata dump is conditional on any trace point being enabled. While at it, merge dump unit tests into the generic trace_autotest to enhance coverage. Fixes: `f6b2d65dcd` ("trace: implement debug dump") Cc: stable@dpdk.org Signed-off-by: David Marchand <david.marchand@redhat.com> Acked-by: Sunil Kumar Kori <skori@marvell.com>	2022-10-20 13:34:19 +02:00
David Marchand	782dbf1791	trace: fix race in debug dump trace->nb_trace_mem_list access must be under trace->lock to avoid races with threads allocating/freeing their trace buffers. Fixes: `f6b2d65dcd` ("trace: implement debug dump") Cc: stable@dpdk.org Signed-off-by: David Marchand <david.marchand@redhat.com> Acked-by: Jerin Jacob <jerinj@marvell.com> Acked-by: Sunil Kumar Kori <skori@marvell.com>	2022-10-20 13:34:19 +02:00
David Marchand	d6fd5a018e	trace: fix dynamically enabling trace points Enabling trace points at runtime was not working if no trace point had been enabled first at rte_eal_init() time. The reason was that trace.args reflected the arguments passed to --trace= EAL option. To fix this: - the trace subsystem initialisation is updated: trace directory creation is deferred to when traces are dumped (to avoid creating directories that may not be used), - per lcore memory allocation still relies on rte_trace_is_enabled() but this helper now tracks if any trace point is enabled. The documentation is updated accordingly, - cleanup helpers must always be called in rte_eal_cleanup() since some trace points might have been enabled and disabled in the lifetime of the DPDK application, With this fix, we can update the unit test and check that a trace point callback is invoked when expected. Note: - the 'trace' global variable might be shadowed with the argument passed to the functions dealing with trace point handles. 'tp' has been used for referring to trace_point object. Prefer 't' for referring to handles, Fixes: `84c4fae462` ("trace: implement operation APIs") Cc: stable@dpdk.org Signed-off-by: David Marchand <david.marchand@redhat.com> Acked-by: Sunil Kumar Kori <skori@marvell.com>	2022-10-20 13:34:19 +02:00
David Marchand	3ee927d3e4	trace: rework loop on trace points Directly skip the block when a trace point does not match the user criteria. Signed-off-by: David Marchand <david.marchand@redhat.com> Acked-by: Jerin Jacob <jerinj@marvell.com> Acked-by: Sunil Kumar Kori <skori@marvell.com>	2022-10-20 13:34:19 +02:00
David Marchand	b980ced067	trace: fix leak with regexp The precompiled buffer initialised in regcomp must be freed before leaving rte_trace_regexp. Fixes: `84c4fae462` ("trace: implement operation APIs") Cc: stable@dpdk.org Signed-off-by: David Marchand <david.marchand@redhat.com> Acked-by: Jerin Jacob <jerinj@marvell.com> Acked-by: Sunil Kumar Kori <skori@marvell.com>	2022-10-20 13:34:19 +02:00
David Marchand	1559663872	trace: fix mode change The API does not state that changing mode should be refused if no trace point is enabled. Remove this limitation. Fixes: `84c4fae462` ("trace: implement operation APIs") Cc: stable@dpdk.org Signed-off-by: David Marchand <david.marchand@redhat.com> Acked-by: Sunil Kumar Kori <skori@marvell.com>	2022-10-20 13:34:19 +02:00
David Marchand	12b627bf77	trace: fix mode for new trace point If an application registers trace points later than rte_eal_init(), changes in the trace point mode were not applied. Fixes: `84c4fae462` ("trace: implement operation APIs") Cc: stable@dpdk.org Signed-off-by: David Marchand <david.marchand@redhat.com> Acked-by: Jerin Jacob <jerinj@marvell.com> Acked-by: Sunil Kumar Kori <skori@marvell.com>	2022-10-20 13:34:19 +02:00
Nicolas Chautru	a53a025b45	bbdev: fix build with clang 3.4.2 Casting explicitly from enum to uint8_t to avoid compilation warning with clang 3.4.2: rte_bbdev.c:1179:13: error: comparison of constant 4 with expression of type 'enum rte_bbdev_enqueue_status' is always true [-Werror,-Wtautological-constant-out-of-range-compare] Bugzilla ID: 1095 Fixes: `1be86f2e94` ("bbdev: add device status info") Fixes: `4f08028c5e` ("bbdev: expose queue related warning and status") Signed-off-by: Nicolas Chautru <nicolas.chautru@intel.com> Tested-by: Ali Alnubani <alialnu@nvidia.com>	2022-10-11 01:34:07 +02:00
Shiqi Liu	d914c01036	node: check Rx element allocation Because malloc() can fail, the not_checked and checked pointers could be NULL. Check them to avoid dereferencing a NULL pointer. Fixes: `fa8054c8c8` ("examples/eventdev: add thread safe Tx worker pipeline") Cc: stable@dpdk.org Signed-off-by: Shiqi Liu <835703180@qq.com>	2022-10-10 17:53:12 +02:00
Zhirun Yan	afe67d1414	graph: fix node objects allocation In __rte_node_enqueue_prologue(), if the number of objs is more than node->size * 2, the extra objs are written out of bounds. It should use __rte_node_stream_alloc_size() to request enough memory. In rte_node_next_stream_put(), the stream is re-allocated with too small a size when the node's free space is small and the number of new objs is less than the current node->size. Obj pointers beyond the new size may then be lost, causing a memory leak. It should request enough memory to hold at least the original objs plus the new objs. Fixes: `40d4f51403` ("graph: implement fastpath routines") Cc: stable@dpdk.org Signed-off-by: Zhirun Yan <zhirun.yan@intel.com> Signed-off-by: Cunming Liang <cunming.liang@intel.com> Acked-by: Jerin Jacob <jerinj@marvell.com>	2022-10-10 17:30:39 +02:00
Andrew Rybchenko	90cf759aaf	mempool: avoid usage of term ring on put The term "ring" is misleading: it is the default, but still just one of the possible drivers for storing objects. Signed-off-by: Andrew Rybchenko <andrew.rybchenko@oktetlabs.ru> Reviewed-by: Morten Brørup <mb@smartsharesystems.com>	2022-10-10 17:24:22 +02:00
Andrew Rybchenko	e3f138aa91	mempool: check driver enqueue result in one place Enqueue operation must not fail. Move the corresponding debug check from one particular case to the enqueue operation helper in order to do it for all invocations. Log a critical message with useful information instead of calling rte_panic(). Make the rte_mempool_do_generic_put() implementation more readable and fix the inconsistency where the return value is not checked in one place and checked in another. Signed-off-by: Andrew Rybchenko <andrew.rybchenko@oktetlabs.ru> Reviewed-by: Morten Brørup <mb@smartsharesystems.com>	2022-10-10 17:17:48 +02:00
Bruce Richardson	66f624e4ea	kni: add deprecation warning at runtime When KNI is being used at runtime, output a warning message about its deprecated status. This is part of the deprecation process for KNI agreed by the DPDK technical board.[1] [1] https://mails.dpdk.org/archives/dev/2022-June/243596.html Signed-off-by: Bruce Richardson <bruce.richardson@intel.com> Reviewed-by: David Marchand <david.marchand@redhat.com>	2022-10-10 17:04:09 +02:00
Bruce Richardson	bbaf917565	kni: flag deprecated status at build time To ensure all users are aware of KNI's deprecated status at build time, this library is marked as a deprecated library: the library is disabled by default. It can be re-enabled by setting disabled_libs to the empty string (or other string not including 'kni'). The dependent NIC driver, drivers/net/kni, is disabled accordingly as it depends on the library. NOTE: This is part of the deprecation process for KNI agreed by the DPDK technical board.[1] [1] https://mails.dpdk.org/archives/dev/2022-June/243596.html Signed-off-by: Bruce Richardson <bruce.richardson@intel.com> Reviewed-by: David Marchand <david.marchand@redhat.com>	2022-10-10 17:01:59 +02:00
Bruce Richardson	dfd5b25b57	build: introduce deprecated libraries Add support for a list of deprecated libs to the lib/meson.build file. This will be used to mark libraries that are planned to be removed from DPDK. The first user of this will be KNI in a next patch. Deprecated libraries should still be tested in the CI, so update our build testing and CI scripts. Signed-off-by: Bruce Richardson <bruce.richardson@intel.com> Signed-off-by: David Marchand <david.marchand@redhat.com>	2022-10-10 17:01:56 +02:00
Dmitry Kozlyuk	03b3cdf9c2	mempool: make event callbacks process-private Callbacks for mempool events were registered in a process-shared tailq. This was inherently incorrect because the same function may be loaded to a different address in each process. Make the tailq process-private. Use the EAL tailq lock to reduce the number of different locks this module operates. Fixes: `da2b9cb25e` ("mempool: add event callbacks") Cc: stable@dpdk.org Signed-off-by: Dmitry Kozlyuk <dkozlyuk@nvidia.com> Acked-by: Viacheslav Ovsiienko <viacheslavo@nvidia.com> Reviewed-by: Andrew Rybchenko <andrew.rybchenko@oktetlabs.ru>	2022-10-10 16:38:03 +02:00
Tadhg Kearney	60b8a661a9	power: add Intel uncore frequency control Add API to allow uncore frequency adjustment. Uncore is a term used by Intel to describe the functions of a microprocessor that are closely connected to the core to achieve high performance. This is done by manipulating the related uncore frequency control sysfs entries to adjust the minimum and maximum uncore frequency values, and works on Linux for Intel hardware. Signed-off-by: Tadhg Kearney <tadhg.kearney@intel.com> Reviewed-by: David Hunt <david.hunt@intel.com> Acked-by: David Hunt <david.hunt@intel.com>	2022-10-10 14:53:40 +02:00
Leyi Rong	373b51ef02	member: fix build with GCC 5.4.0 This patch fixes the build failure by typecasting to match _mm512_i32gather_epi64() definition. Bugzilla ID: 1096 Fixes: `db354bd2e1` ("member: add NitroSketch mode") Signed-off-by: Leyi Rong <leyi.rong@intel.com> Tested-by: Ali Alnubani <alialnu@nvidia.com>	2022-10-10 12:20:01 +02:00
Markus Theil	de254dac60	power: read P-state turbo percentage from sysfs If DPDK applications should be used with a minimal set of privileges, using the msr kernel module on linux should not be necessary. Since at least kernel 4.4 the rdmsr call to obtain the last non-turbo boost frequency can be left out, if the sysfs interface is used. Also RHEL 7 with recent kernel updates should include the sysfs interface for this (I only looked this up for CentOS 7). Signed-off-by: Markus Theil <markus.theil@tu-ilmenau.de> Tested-by: David Hunt <david.hunt@intel.com> Acked-by: David Hunt <david.hunt@intel.com>	2022-10-10 02:52:26 +02:00
Mário Kuka	0744f1c9f9	pcapng: fix write more packets than IOV_MAX limit The rte_pcapng_write_packets() function fails when we try to write more packets than the IOV_MAX limit. writev() system call is limited by the IOV_MAX limit. The iovcnt argument is valid if it is greater than 0 and less than or equal to IOV_MAX as defined in <limits.h>. To avoid this problem, we can check that all segments of the next packet will fit into the iovec buffer, whose capacity will be limited by the IOV_MAX limit. If not, we flush the current iovec buffer to the file by calling writev() and, if successful, fit the current packet at the beginning of the flushed iovec buffer. Fixes: `8d23ce8f5e` ("pcapng: add new library for writing pcapng files") Cc: stable@dpdk.org Signed-off-by: Mário Kuka <kuka@cesnet.cz> Acked-by: Stephen Hemminger <stephen@networkplumber.org>	2022-10-10 02:42:36 +02:00
Stephen Hemminger	668958f3c1	eal: fix data race in multi-process support If DPDK is built with thread sanitizer it reports a race in setting of multiprocess file descriptor. The fix is to use atomic operations when updating mp_fd. Build: $ meson -Db_sanitize=address build $ ninja -C build Simple example: $ .build/app/dpdk-testpmd -l 1-3 --no-huge EAL: Detected CPU lcores: 16 EAL: Detected NUMA nodes: 1 EAL: Static memory layout is selected, amount of reserved memory can be adjusted with -m or --socket-mem EAL: Detected static linkage of DPDK EAL: Multi-process socket /run/user/1000/dpdk/rte/mp_socket EAL: Selected IOVA mode 'VA' testpmd: No probed ethernet devices testpmd: create a new mbuf pool <mb_pool_0>: n=163456, size=2176, socket=0 testpmd: preferred mempool ops selected: ring_mp_mc EAL: Error - exiting with code: 1 Cause: Creation of mbuf pool for socket 0 failed: Cannot allocate memory ================== WARNING: ThreadSanitizer: data race (pid=87245) Write of size 4 at 0x558e04d8ff70 by main thread: #0 rte_mp_channel_cleanup <null> (dpdk-testpmd+0x1e7d30c) #1 rte_eal_cleanup <null> (dpdk-testpmd+0x1e85929) #2 rte_exit <null> (dpdk-testpmd+0x1e5bc0a) #3 mbuf_pool_create.cold <null> (dpdk-testpmd+0x274011) #4 main <null> (dpdk-testpmd+0x5cc15d) Previous read of size 4 at 0x558e04d8ff70 by thread T2: #0 mp_handle <null> (dpdk-testpmd+0x1e7c439) #1 ctrl_thread_init <null> (dpdk-testpmd+0x1e6ee1e) As if synchronized via sleep: #0 nanosleep libsanitizer/tsan/tsan_interceptors_posix.cpp:366 #1 get_tsc_freq <null> (dpdk-testpmd+0x1e92ff9) #2 set_tsc_freq <null> (dpdk-testpmd+0x1e6f2fc) #3 rte_eal_timer_init <null> (dpdk-testpmd+0x1e931a4) #4 rte_eal_init.cold <null> (dpdk-testpmd+0x29e578) #5 main <null> (dpdk-testpmd+0x5cbc45) Location is global 'mp_fd' of size 4 at 0x558e04d8ff70 (dpdk-testpmd+0x000003122f70) Thread T2 'rte_mp_handle' (tid=87248, running) created by main thread at: #0 pthread_create libsanitizer/tsan/tsan_interceptors_posix.cpp:969 #1 
rte_ctrl_thread_create <null> (dpdk-testpmd+0x1e6efd0) #2 rte_mp_channel_init.cold <null> (dpdk-testpmd+0x29cb7c) #3 rte_eal_init <null> (dpdk-testpmd+0x1e8662e) #4 main <null> (dpdk-testpmd+0x5cbc45) SUMMARY: ThreadSanitizer: data race (app/dpdk-testpmd+0x1e7d30c) in rte_mp_channel_cleanup ================== ThreadSanitizer: reported 1 warnings Fixes: `bacaa27540` ("eal: add channel for multi-process communication") Cc: stable@dpdk.org Signed-off-by: Stephen Hemminger <stephen@networkplumber.org> Acked-by: Anatoly Burakov <anatoly.burakov@intel.com> Reviewed-by: Chengwen Feng <fengchengwen@huawei.com>	2022-10-10 01:58:31 +02:00
Leyi Rong	db354bd2e1	member: add NitroSketch mode Sketching algorithms provide high-fidelity approximate measurements and appear as a promising alternative to traditional approaches such as packet sampling. NitroSketch [1] is a software sketching framework that optimizes performance, provides accuracy guarantees, and supports a variety of sketches. This commit adds a new data structure called sketch to the membership library. This new data structure is an efficient way to profile the traffic for heavy hitters. Also use a min-heap structure to maintain the top-k flow keys. [1] Zaoxing Liu, Ran Ben-Basat, Gil Einziger, Yaron Kassner, Vladimir Braverman, Roy Friedman, Vyas Sekar, "NitroSketch: Robust and General Sketch-based Monitoring in Software Switches", in ACM SIGCOMM 2019. https://dl.acm.org/doi/pdf/10.1145/3341302.3342076 Signed-off-by: Alan Liu <zaoxingliu@gmail.com> Signed-off-by: Yipeng Wang <yipeng1.wang@intel.com> Signed-off-by: Leyi Rong <leyi.rong@intel.com> Tested-by: Yu Jiang <yux.jiang@intel.com>	2022-10-09 23:11:43 +02:00
Yuan Wang	605975b8b3	ethdev: introduce protocol-based buffer split Currently, Rx buffer split supports length based split. With Rx queue offload RTE_ETH_RX_OFFLOAD_BUFFER_SPLIT enabled and Rx packet segment configured, PMD will be able to split the received packets into multiple segments. However, length based buffer split is not suitable for NICs that do split based on protocol headers. Given an arbitrarily variable length in Rx packet segment, it is almost impossible to pass a fixed protocol header to driver. Besides, the existence of tunneling results in the composition of a packet is various, which makes the situation even worse. This patch extends current buffer split to support protocol header based buffer split. A new proto_hdr field is introduced in the reserved field of rte_eth_rxseg_split structure to specify protocol header. The proto_hdr field defines the split position of packet, splitting will always happen after the protocol header defined in the Rx packet segment. When Rx queue offload RTE_ETH_RX_OFFLOAD_BUFFER_SPLIT is enabled and corresponding protocol header is configured, driver will split the ingress packets into multiple segments. Examples for proto_hdr field defines: To split after ETH-IPV4-UDP, it should be defined as proto_hdr = RTE_PTYPE_L2_ETHER \| RTE_PTYPE_L3_IPV4_EXT_UNKNOWN \| RTE_PTYPE_L4_UDP For inner ETH-IPV4-UDP, it should be defined as proto_hdr = RTE_PTYPE_TUNNEL_GRENAT \| RTE_PTYPE_INNER_L2_ETHER \| RTE_PTYPE_INNER_L3_IPV4_EXT_UNKNOWN \| RTE_PTYPE_INNER_L4_UDP If the protocol header is repeated with the previously defined one, the repeated part should be omitted. For example, split after ETH, ETH-IPV4 and ETH-IPV4-UDP, it should be defined as proto_hdr0 = RTE_PTYPE_L2_ETHER proto_hdr1 = RTE_PTYPE_L3_IPV4_EXT_UNKNOWN proto_hdr2 = RTE_PTYPE_L4_UDP If protocol header split can be supported by a PMD, the rte_eth_buffer_split_get_supported_hdr_ptypes function can be used to obtain a list of these protocol headers. 
For example, let's suppose we configured the Rx queue with the following segments: seg0 - pool0, proto_hdr0=RTE_PTYPE_L2_ETHER \| RTE_PTYPE_L3_IPV4, off0=2B seg1 - pool1, proto_hdr1=RTE_PTYPE_L4_UDP, off1=128B seg2 - pool2, proto_hdr2=0, off2=0B A packet consisting of ETH_IPV4_UDP_PAYLOAD will be split as follows: seg0 - ipv4 header @ RTE_PKTMBUF_HEADROOM + 2 in mbuf from pool0 seg1 - udp header @ 128 in mbuf from pool1 seg2 - payload @ 0 in mbuf from pool2 Buffer split can now be configured in two modes. The user can choose length or protocol header to configure buffer split according to the NIC's capability. For length based buffer split, the mp, length and offset fields in the Rx packet segment should be configured, while the proto_hdr field must be 0. For protocol header based buffer split, the mp, offset and proto_hdr fields in the Rx packet segment should be configured, while the length field must be 0. Note: When protocol header split is enabled, the NIC may receive packets which do not match all the protocol headers within the Rx segments. In that case, the NIC has two possible split behaviors according to the matching results: exact match and longest match. The split result of the NIC must belong to one of them. Exact match means the NIC only splits when the packet exactly matches all the protocol headers in the segments. Otherwise, the whole packet is put into the last valid mempool. Longest match means the NIC splits until the packet mismatches a protocol header in the segments. The rest is put into the last valid pool. 
Pseudo-code for exact match:
FOR each seg in segs except last one
    IF proto_hdr is not matched THEN
        BREAK
    END IF
END FOR
IF loop broke THEN
    put whole pkt in last seg
ELSE
    put protocol header in each seg
    put everything else in last seg
END IF
Pseudo-code for longest match:
FOR each seg in segs except last one
    IF proto_hdr is matched THEN
        put protocol header in seg
    ELSE
        BREAK
    END IF
END FOR
put everything else in last seg
Signed-off-by: Yuan Wang <yuanx.wang@intel.com> Signed-off-by: Xuan Ding <xuan.ding@intel.com> Signed-off-by: Wenxuan Wu <wenxuanx.wu@intel.com> Reviewed-by: Andrew Rybchenko <andrew.rybchenko@oktetlabs.ru>	2022-10-09 16:41:27 +02:00
Yuan Wang	e4e6f4cbf9	ethdev: introduce protocol header API Add a new ethdev API to retrieve supported protocol headers of a PMD, which helps to configure protocol header based buffer split. Signed-off-by: Yuan Wang <yuanx.wang@intel.com> Signed-off-by: Xuan Ding <xuan.ding@intel.com> Signed-off-by: Wenxuan Wu <wenxuanx.wu@intel.com> Reviewed-by: Andrew Rybchenko <andrew.rybchenko@oktetlabs.ru>	2022-10-09 16:41:24 +02:00
Jun Qiu	b8a55871d5	gro: trim tail padding bytes Excluding the CRC field, the minimum Ethernet packet length is 60 bytes. When the actual packet length is less than 60 bytes, padding is added to the tail. When GRO is performed on a packet containing padding, mbuf->pkt_len includes the padding, so the padding is mistakenly treated as actual packet content. We need to trim away this extra padding during GRO processing. Fixes: `0d2cbe59b7` ("lib/gro: support TCP/IPv4") Cc: stable@dpdk.org Signed-off-by: Jun Qiu <jun.qiu@jaguarmicro.com> Acked-by: Jiayu Hu <Jiayu.hu@intel.com>	2022-10-09 19:36:57 +02:00
Nicolas Chautru	b3af222778	bbdev: remove unnecessary checks Code clean up due to if-check not required Signed-off-by: Nicolas Chautru <nicolas.chautru@intel.com> Acked-by: Maxime Coquelin <maxime.coquelin@redhat.com> Acked-by: Akhil Goyal <gakhil@marvell.com>	2022-10-07 08:44:58 +02:00
Nicolas Chautru	4f08028c5e	bbdev: expose queue related warning and status Added parameters in rte_bbdev_queue_data to expose information with regards to any queue related failure and warning which cannot be supported in existing API. Signed-off-by: Nicolas Chautru <nicolas.chautru@intel.com> Acked-by: Maxime Coquelin <maxime.coquelin@redhat.com> Acked-by: Akhil Goyal <gakhil@marvell.com>	2022-10-07 08:44:58 +02:00
Nicolas Chautru	9d3933252d	bbdev: add operation for FFT processing Extended bbdev operations to support FFT based operations. Signed-off-by: Nicolas Chautru <nicolas.chautru@intel.com> Acked-by: Hemant Agrawal <hemant.agrawal@nxp.com> Acked-by: Maxime Coquelin <maxime.coquelin@redhat.com> Acked-by: Akhil Goyal <gakhil@marvell.com>	2022-10-07 08:44:58 +02:00
Nicolas Chautru	53115a4e5d	bbdev: add device info on queue topology Added more options in the API to expose the number of queues exposed and related priority. Signed-off-by: Nicolas Chautru <nicolas.chautru@intel.com> Acked-by: Maxime Coquelin <maxime.coquelin@redhat.com> Acked-by: Akhil Goyal <gakhil@marvell.com>	2022-10-07 08:44:58 +02:00
Nicolas Chautru	1be86f2e94	bbdev: add device status info Added device status information, so that the PMD can expose information related to the underlying accelerator device status. Minor order change in structure to fit into padding hole. Signed-off-by: Nicolas Chautru <nicolas.chautru@intel.com> Acked-by: Mingshan Zhang <mingshan.zhang@intel.com> Acked-by: Hemant Agrawal <hemant.agrawal@nxp.com> Acked-by: Maxime Coquelin <maxime.coquelin@redhat.com> Acked-by: Akhil Goyal <gakhil@marvell.com>	2022-10-07 08:44:58 +02:00
Nicolas Chautru	e70212cc24	bbdev: allow operation type enum for growth Updated the rte_bbdev_op_type enum so that new values can be inserted while keeping ABI compatibility, and added a padded maximum value for sizing arrays. Removing RTE_BBDEV_OP_TYPE_COUNT and instead exposing RTE_BBDEV_OP_TYPE_SIZE_MAX. Signed-off-by: Nicolas Chautru <nicolas.chautru@intel.com> Acked-by: Maxime Coquelin <maxime.coquelin@redhat.com> Acked-by: Akhil Goyal <gakhil@marvell.com>	2022-10-07 08:44:58 +02:00
Gerry Gribbon	70f1ea713f	regexdev: add maximum number of mbuf segments Allows application to query maximum number of mbuf segments that can be chained together. Signed-off-by: Gerry Gribbon <ggribbon@nvidia.com> Acked-by: Ori Kam <orika@nvidia.com>	2022-10-09 14:54:30 +02:00
Shijith Thotton	5812b32773	mbuf: move next pointer to first cache line if PA disabled Swapped position of mbuf next pointer and second dynamic field (dynfield2) if the build is configured to disable IOVA as PA. This is to move the mbuf next pointer to first cache line. Signed-off-by: Shijith Thotton <sthotton@marvell.com> Acked-by: Olivier Matz <olivier.matz@6wind.com>	2022-10-09 13:14:57 +02:00
Shijith Thotton	03b57eb7ab	mbuf: add second dynamic field member If IOVA as PA is disabled during build, mbuf physical address field is undefined. This space is used to add the second dynamic field. Signed-off-by: Shijith Thotton <sthotton@marvell.com> Acked-by: Olivier Matz <olivier.matz@6wind.com>	2022-10-09 13:14:57 +02:00
Shijith Thotton	a986c2b797	build: add option to configure IOVA mode as PA IOVA mode in DPDK is either PA or VA. The new build option enable_iova_as_pa configures the mode to PA at compile time. By default, this option is enabled. If the option is disabled, only drivers which support it are enabled. Supported driver can set the flag pmd_supports_disable_iova_as_pa in its build file. mbuf structure holds the physical (PA) and virtual address (VA). If IOVA as PA is disabled at compile time, PA field (buf_iova) of mbuf is redundant as it is the same as VA and is replaced by a dummy field. Signed-off-by: Shijith Thotton <sthotton@marvell.com> Acked-by: Olivier Matz <olivier.matz@6wind.com>	2022-10-09 13:14:52 +02:00
Shijith Thotton	e811e2d76f	mbuf: add helper to get/set IOVA address Added APIs rte_mbuf_iova_set and rte_mbuf_iova_get to set and get the physical address of an mbuf respectively. Updated applications and library to use the same. Signed-off-by: Shijith Thotton <sthotton@marvell.com> Acked-by: Olivier Matz <olivier.matz@6wind.com>	2022-10-08 23:58:26 +02:00
Morten Brørup	a2833ecc5e	mempool: fix get objects from mempool with cache A flush threshold for the mempool cache was introduced in DPDK version 1.3, but rte_mempool_do_generic_get() was not completely updated back then, and some inefficiencies were introduced. Fix the following in rte_mempool_do_generic_get(): 1. The code that initially screens the cache request was not updated with the change in DPDK version 1.3. The initial screening compared the request length to the cache size, which was correct before, but became irrelevant with the introduction of the flush threshold. E.g. the cache can hold up to flushthresh objects, which is more than its size, so some requests were not served from the cache, even though they could be. The initial screening has now been corrected to match the initial screening in rte_mempool_do_generic_put(), which verifies that a cache is present, and that the length of the request does not overflow the memory allocated for the cache. This bug caused a major performance degradation in scenarios where the application burst length is the same as the cache size. In such cases, the objects were not ever fetched from the mempool cache, regardless if they could have been. This scenario occurs e.g. if an application has configured a mempool with a size matching the application's burst size. 2. The function is a helper for rte_mempool_generic_get(), so it must behave according to the description of that function. Specifically, objects must first be returned from the cache, subsequently from the backend. After the change in DPDK version 1.3, this was not the behavior when the request was partially satisfied from the cache; instead, the objects from the backend were returned ahead of the objects from the cache. This bug degraded application performance on CPUs with a small L1 cache, which benefit from having the hot objects first in the returned array. 
(This is probably also the reason why the function returns the objects in reverse order, which it still does.) Now, all code paths first return objects from the cache, subsequently from the backend. The function was not behaving as described (by the function using it) and expected by applications using it. This in itself is also a bug. 3. If the cache could not be backfilled, the function would attempt to get all the requested objects from the backend (instead of only the number of requested objects minus the objects available in the cache), and the function would fail if that failed. Now, the first part of the request is always satisfied from the cache, and if the subsequent backfilling of the cache from the backend fails, only the remaining requested objects are retrieved from the backend. Previously, the function would fail even though there were enough objects in the cache plus the common pool. 4. The code flow for satisfying the request from the cache was slightly inefficient: The likely code path where the objects are simply served from the cache was treated as unlikely. Now it is treated as likely. Signed-off-by: Morten Brørup <mb@smartsharesystems.com> Signed-off-by: Andrew Rybchenko <andrew.rybchenko@oktetlabs.ru> Reviewed-by: Morten Brørup <mb@smartsharesystems.com>	2022-10-08 22:52:51 +02:00
Hanumanth Pothula	458485eb21	ethdev: support multiple mbuf pools per Rx queue Some HW supports choosing memory pools based on the packet's size. This is useful for saving memory: the application can create separate pools for specific packet sizes, making memory usage more efficient. For example, let's say HW has a capability of three pools, - pool-1 size is 2K - pool-2 size is > 2K and < 4K - pool-3 size is > 4K Here, pool-1 can accommodate packets with sizes < 2K, pool-2 can accommodate packets with sizes > 2K and < 4K, and pool-3 can accommodate packets with sizes > 4K. With multiple mempool capability enabled in SW, an application may create three pools of different sizes and pass them to the PMD, allowing the PMD to program the HW based on packet lengths, so that packets smaller than 2K are received on pool-1, packets with lengths between 2K and 4K are received on pool-2, and packets greater than 4K are received on pool-3. Signed-off-by: Hanumanth Pothula <hpothula@marvell.com> Signed-off-by: Andrew Rybchenko <andrew.rybchenko@oktetlabs.ru>	2022-10-08 18:39:02 +02:00
Andrew Rybchenko	b7fc7c5366	ethdev: factor out helper function to check Rx mempool Avoid Rx mempool checks duplication logic. Signed-off-by: Andrew Rybchenko <andrew.rybchenko@oktetlabs.ru>	2022-10-08 18:34:55 +02:00
Dariusz Sosnowski	bc705061cb	ethdev: introduce hairpin memory capabilities Before this patch, implementation details and configuration of hairpin queues were decided internally by the PMD. Applications had no control over the configuration of Rx and Tx hairpin queues, apart from the number of descriptors, explicit Tx flow mode and disabling automatic binding. This patch addresses that by adding: - Hairpin queue capabilities reported by PMDs. - New configuration options for Rx and Tx hairpin queues. The main goal of this patch is to allow applications to provide configuration hints regarding placement of hairpin queues. These hints specify whether buffers of hairpin queues should be placed in host memory or in dedicated device memory. Different memory options may have different performance characteristics and hairpin configuration should be fine-tuned to the specific application and use case. This patch introduces new hairpin queue configuration options through rte_eth_hairpin_conf struct, allowing applications to tune the memory configuration of Rx and Tx hairpin queues. Hairpin configuration is extended with the following fields: - use_locked_device_memory - If set, PMD will use specialized on-device memory to store RX or TX hairpin queue data. - use_rte_memory - If set, PMD will use DPDK-managed memory to store RX or TX hairpin queue data. - force_memory - If set, PMD will be forced to use provided memory settings. If no appropriate resources are available, then device start will fail. If unset and no resources are available, PMD will fall back to using the default type of resource for the given queue. If the application chooses to use the PMD default memory configuration, all of these flags should remain unset. Hairpin capabilities are also extended, to allow verification of support of given hairpin memory configurations. Struct rte_eth_hairpin_cap is extended with two additional fields of type rte_eth_hairpin_queue_cap: - rx_cap - memory capabilities of hairpin RX queues. 
- tx_cap - memory capabilities of hairpin TX queues. Struct rte_eth_hairpin_queue_cap exposes whether given queue type supports use_locked_device_memory and use_rte_memory flags. Signed-off-by: Dariusz Sosnowski <dsosnowski@nvidia.com>	2022-10-08 18:22:01 +02:00
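The interaction between the three flags can be sketched as a small decision function. This is a model of the semantics described above, not PMD code: `struct toy_hairpin_hints`, `enum toy_mem` and `toy_resolve_hairpin_mem()` are hypothetical, only the flag names come from the commit message, and rejecting a request that sets both memory flags is an assumption of this sketch.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

enum toy_mem { TOY_MEM_DEFAULT, TOY_MEM_LOCKED_DEVICE, TOY_MEM_RTE };

/* Hypothetical mirror of the three flags named in the commit message. */
struct toy_hairpin_hints {
	bool use_locked_device_memory;
	bool use_rte_memory;
	bool force_memory;
};

/*
 * Decide which memory type a queue ends up with. Returns 0 and sets
 * *chosen, or -1 when the request cannot be honoured (modelling
 * "device start will fail").
 */
static int toy_resolve_hairpin_mem(const struct toy_hairpin_hints *h,
				   bool locked_avail, bool rte_avail,
				   enum toy_mem *chosen)
{
	bool avail;

	assert(h != NULL && chosen != NULL);

	/* Assumption: the two memory hints are mutually exclusive. */
	if (h->use_locked_device_memory && h->use_rte_memory)
		return -1;

	/* No hint set: keep the PMD default configuration. */
	if (!h->use_locked_device_memory && !h->use_rte_memory) {
		*chosen = TOY_MEM_DEFAULT;
		return 0;
	}

	avail = h->use_locked_device_memory ? locked_avail : rte_avail;
	if (avail) {
		*chosen = h->use_locked_device_memory ?
			TOY_MEM_LOCKED_DEVICE : TOY_MEM_RTE;
		return 0;
	}

	/* Resource missing: fail if forced, otherwise fall back. */
	if (h->force_memory)
		return -1;
	*chosen = TOY_MEM_DEFAULT;
	return 0;
}
```

A real application would first check rx_cap and tx_cap before setting either memory flag; the sketch folds that check into the two availability arguments.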
Jerin Jacob	6b81dddbb9	ethdev: support congestion management NIC HW controllers often come with congestion management support on various HW objects such as Rx queue depth or mempool queue depth. They can also support various modes of operation such as RED (Random early discard), WRED, etc. on those HW objects. Add a framework to express such modes (enum rte_cman_mode) and introduce enum rte_eth_cman_obj to enumerate the different objects on which the modes can operate. Add RTE_CMAN_RED mode of operation and RTE_ETH_CMAN_OBJ_RX_QUEUE, RTE_ETH_CMAN_OBJ_RX_QUEUE_MEMPOOL objects. Introduce reserved fields in configuration structure backed by rte_eth_cman_config_init() to add new configuration parameters without ABI breakage. Add rte_eth_cman_info_get() API to get information such as supported modes and objects. Add rte_eth_cman_config_init(), rte_eth_cman_config_set() APIs to configure congestion management on those objects with the associated mode. Finally, add rte_eth_cman_config_get() API to retrieve the applied configuration. Signed-off-by: Jerin Jacob <jerinj@marvell.com> Signed-off-by: Sunil Kumar Kori <skori@marvell.com> Signed-off-by: Andrew Rybchenko <andrew.rybchenko@oktetlabs.ru> Acked-by: Sunil Kumar Kori <skori@marvell.com>	2022-10-07 11:50:28 +02:00
Dongdong Liu	092b701fe3	ethdev: introduce Rx/Tx descriptor dump API Add the ethdev Rx/Tx descriptor dump API, which provides functions for querying descriptors from a device. HW descriptor info differs between NICs. The information shows the I/O process, which is important for debugging. As the information differs between NICs, a new API is introduced. Signed-off-by: Min Hu (Connor) <humin29@huawei.com> Signed-off-by: Dongdong Liu <liudongdong3@huawei.com> Reviewed-by: Ferruh Yigit <ferruh.yigit@xilinx.com>	2022-10-06 18:38:48 +02:00
Thomas Monjalon	04cf171cb3	doc: relate bifurcated driver and flow isolated mode The relation between the isolated mode in ethdev flow API and bifurcated driver behaviour was not clearly explained. It is made clear in the how-to guide that isolated mode is required for flow bifurcation to the kernel. On the other side, the impact of the isolated mode on a bifurcated driver is made more explicit. Signed-off-by: Thomas Monjalon <thomas@monjalon.net> Reviewed-by: Dariusz Sosnowski <dsosnowski@nvidia.com> Acked-by: Ori Kam <orika@nvidia.com>	2022-10-04 17:01:03 +02:00
Olivier Matz	7dcd73e379	drivers/bus: set device NUMA node to unknown by default The dev->device.numa_node field is set by each bus driver for every device it manages to indicate on which NUMA node this device lies. When this information is unknown, the assigned value is not consistent across the bus drivers. Set the default value to SOCKET_ID_ANY (-1) by all bus drivers when the NUMA information is unavailable. This change impacts rte_eth_dev_socket_id() in the same manner. Signed-off-by: Olivier Matz <olivier.matz@6wind.com>	2022-10-06 21:26:55 +02:00
Tyler Retzlaff	a2e94ca89f	eal: add thread comparison helper Add rte_thread_equal() that tests if two rte_thread_id are equal. Signed-off-by: Narcisa Vasile <navasile@linux.microsoft.com> Signed-off-by: Tyler Retzlaff <roretzla@linux.microsoft.com> Acked-by: Chengwen Feng <fengchengwen@huawei.com> Acked-by: Dmitry Kozlyuk <dmitry.kozliuk@gmail.com>	2022-10-06 21:05:32 +02:00


8085 Commits