numam-dpdk

Author	SHA1	Message	Date
Jun Qiu	bdd0c62c69	hash: fix RCU configuration memory leak The memory of h->hash_rcu_cfg which is allocated in rte_hash_rcu_qsbr_add was leaked. Fixes: `769b2de7fb` ("hash: implement RCU resources reclamation") Cc: stable@dpdk.org Signed-off-by: Jun Qiu <jun.qiu@jaguarmicro.com> Reviewed-by: David Marchand <david.marchand@redhat.com>	2022-11-14 11:03:54 +01:00
Tadhg Kearney	08aa805a0b	power: fix double free of opened files Fix double free of f_min and f_max by reverting the fclose() for f_min and f_max, as f_min and f_max are stored for further use and closed in uncore deinitialization. Fixes: `b127e74cce` ("power: fix open file descriptors leak") Signed-off-by: Tadhg Kearney <tadhg.kearney@intel.com> Acked-by: Reshma Pattan <reshma.pattan@intel.com>	2022-11-14 10:39:24 +01:00
Jerin Jacob	58794bf8d2	power: fix some doxygen comments Fix following syntax error reported by doxygen 1.9.5 version. lib/power/rte_power.h:169: error: rte_power_freq_up has @param documentation sections but no arguments (warning treated as error, aborting now) Fixes: `d7937e2e3d` ("power: initial import") Cc: stable@dpdk.org Signed-off-by: Jerin Jacob <jerinj@marvell.com>	2022-11-14 10:38:41 +01:00
Jerin Jacob	61c7dfe75a	eal: fix doxygen comments for UUID Fix following syntax error reported by doxygen 1.9.5 version. lib/eal/include/rte_uuid.h:89: error: RTE_UUID_STRLEN has @param documentation sections but no arguments (warning treated as error, aborting now) Fixes: `6bc67c497a` ("eal: add uuid API") Cc: stable@dpdk.org Signed-off-by: Jerin Jacob <jerinj@marvell.com>	2022-11-14 10:37:33 +01:00
Morten Brørup	203dcc9cfe	mempool: use cache for frequently updated stats When built with stats enabled (RTE_LIBRTE_MEMPOOL_STATS defined), the performance of mempools with caches is improved as follows. When accessing objects in the mempool, either the put_bulk and put_objs or the get_success_bulk and get_success_objs statistics counters are likely to be incremented. By adding an alternative set of these counters to the mempool cache structure, accessing the dedicated statistics structure is avoided in the likely cases where these counters are incremented. The trick here is that the cache line holding the mempool cache structure is accessed anyway, in order to access the 'len' or 'flushthresh' fields. Updating some statistics counters in the same cache line has lower performance cost than accessing the statistics counters in the dedicated statistics structure, which resides in another cache line. mempool_perf_autotest with this patch shows the following improvements in rate_persec. The cost of enabling mempool stats (without debug) after this patch: -6.8 % and -6.7 %, respectively without and with cache. Signed-off-by: Morten Brørup <mb@smartsharesystems.com> Reviewed-by: Andrew Rybchenko <andrew.rybchenko@oktetlabs.ru> Reviewed-by: Mattias Rönnblom <mattias.ronnblom@ericsson.com> Acked-by: Konstantin Ananyev <konstantin.ananyev@huawei.com>	2022-11-10 17:32:54 +01:00
Morten Brørup	17749e4d64	mempool: add stats for unregistered non-EAL threads This patch adds statistics for unregistered non-EAL threads, which were previously not included in the statistics. Add one more entry to the stats array, and use the last index for unregistered non-EAL threads. The unregistered non-EAL thread statistics are incremented atomically. In theory, the EAL thread counters should also be accessed atomically to avoid tearing on 32-bit architectures. However, it was decided to avoid the performance cost of using atomic operations, because: 1. these are debug counters, and 2. statistics counters in DPDK are usually incremented non-atomically. Suggested-by: Stephen Hemminger <stephen@networkplumber.org> Signed-off-by: Morten Brørup <mb@smartsharesystems.com> Reviewed-by: Andrew Rybchenko <andrew.rybchenko@oktetlabs.ru> Reviewed-by: Mattias Rönnblom <mattias.ronnblom@ericsson.com> Acked-by: Konstantin Ananyev <konstantin.ananyev@huawei.com>	2022-11-10 17:32:54 +01:00
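The sketch below shows the stats indexing scheme. It assumes that an unregistered non-EAL thread sees LCORE_ID_ANY (UINT32_MAX) as its lcore ID, as it does in DPDK. The array size and all names are hypothetical, and `__atomic_fetch_add` is the GCC/Clang builtin.

```c
#include <assert.h>
#include <stdint.h>

#define SKETCH_MAX_LCORE 4u             /* hypothetical stand-in for RTE_MAX_LCORE */
#define SKETCH_LCORE_ID_ANY UINT32_MAX  /* lcore ID seen by an unregistered thread */

struct sketch_stats {
    uint64_t put_bulk;
};

/* One entry per lcore, plus a last entry shared by all unregistered
 * non-EAL threads. */
static struct sketch_stats stats[SKETCH_MAX_LCORE + 1];

static void sketch_stat_inc_put_bulk(uint32_t lcore_id)
{
    assert(lcore_id < SKETCH_MAX_LCORE || lcore_id == SKETCH_LCORE_ID_ANY);
    if (lcore_id < SKETCH_MAX_LCORE)
        stats[lcore_id].put_bulk++;     /* single writer: plain increment */
    else                                /* shared slot: atomic increment */
        __atomic_fetch_add(&stats[SKETCH_MAX_LCORE].put_bulk, 1,
                           __ATOMIC_RELAXED);
}
```

The per-lcore entries keep the cheap non-atomic increment. Only the shared last entry, which several threads can update concurrently, pays for an atomic operation.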
Morten Brørup	9d87e05d08	mempool: split stats from debug mode Split stats from debug, to make mempool statistics available without the performance cost of continuously validating the debug cookies in the mempool elements. mempool_perf_autotest shows the following improvements in rate_persec. The cost of enabling mempool debug without this patch: -28.1 % and -74.0 %, respectively without and with cache. The cost of enabling mempool stats (without debug) after this patch: -5.8 % and -21.2 %, respectively without and with cache. Signed-off-by: Morten Brørup <mb@smartsharesystems.com> Reviewed-by: Andrew Rybchenko <andrew.rybchenko@oktetlabs.ru> Reviewed-by: Mattias Rönnblom <mattias.ronnblom@ericsson.com> Acked-by: Konstantin Ananyev <konstantin.ananyev@huawei.com>	2022-11-10 17:32:45 +01:00
Nicolas Chautru	c49c880ffe	doc: include bbdev code snippet using literal include Add code snippets using literalinclude, so that these structures in the documentation are automatically kept in sync with the bbdev source code. Signed-off-by: Nicolas Chautru <nicolas.chautru@intel.com>	2022-10-29 13:01:41 +02:00
Morten Brørup	b77f58604a	mempool: align cache objects on cache lines Add __rte_cache_aligned to the objs array. It makes no difference in the general case, but if get/put operations are always 32 objects, it will reduce the number of memory (or last level cache) accesses from five to four 64 B cache lines for every get/put operation. For readability reasons, an example using 16 objects follows: Currently, with 16 objects (128B), we access 3 cache lines:
      ┌────────┐
      │len     │
cache │********│---
line0 │********│ ^
      │********│ |
      ├────────┤ | 16 objects
      │********│ | 128B
cache │********│ |
line1 │********│ |
      │********│ |
      ├────────┤ |
      │********│_v_
cache │        │
line2 │        │
      │        │
      └────────┘
With the alignment, it is also 3 cache lines:
      ┌────────┐
      │len     │
cache │        │
line0 │        │
      │        │
      ├────────┤---
      │********│ ^
cache │********│ |
line1 │********│ |
      │********│ |
      ├────────┤ | 16 objects
      │********│ | 128B
cache │********│ |
line2 │********│ |
      │********│ v
      └────────┘---
However, accessing the objects at the bottom of the mempool cache is a special case, where cache line0 is also used for objects. Consider the next burst (and any following bursts):
Current:
      ┌────────┐
      │len     │
cache │        │
line0 │        │
      │        │
      ├────────┤
      │        │
cache │        │
line1 │        │
      │        │
      ├────────┤
      │        │
cache │********│---
line2 │********│ ^
      │********│ |
      ├────────┤ | 16 objects
      │********│ | 128B
cache │********│ |
line3 │********│ |
      │********│ |
      ├────────┤ |
      │********│_v_
cache │        │
line4 │        │
      │        │
      └────────┘
4 cache lines touched, incl. line0 for len.
With the proposed alignment:
      ┌────────┐
      │len     │
cache │        │
line0 │        │
      │        │
      ├────────┤
      │        │
cache │        │
line1 │        │
      │        │
      ├────────┤
      │        │
cache │        │
line2 │        │
      │        │
      ├────────┤
      │********│---
cache │********│ ^
line3 │********│ |
      │********│ | 16 objects
      ├────────┤ | 128B
      │********│ |
cache │********│ |
line4 │********│ |
      │********│_v_
      └────────┘
Only 3 cache lines touched, incl. line0 for len. Credits go to Olivier Matz for the nice ASCII graphics.
Signed-off-by: Morten Brørup <mb@smartsharesystems.com> Reviewed-by: Andrew Rybchenko <andrew.rybchenko@oktetlabs.ru> Acked-by: Olivier Matz <olivier.matz@6wind.com>	2022-10-30 10:07:58 +01:00
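The cache-line counts in the diagrams can be reproduced with a small calculation. This sketch rests on several assumptions:

- cache lines are 64 B and object pointers are 8 B;
- the cache header, including `len`, sits in line 0;
- `objs[0]` starts at byte offset 16 without alignment (as drawn) or at offset 64 with `__rte_cache_aligned`.

`lines_touched()` is a hypothetical helper, not a DPDK function.

```c
#include <assert.h>
#include <stddef.h>

#define CACHE_LINE_SIZE 64u  /* assumed cache line size */
#define OBJ_PTR_SIZE 8u      /* assumed pointer size (64-bit build) */

/*
 * Hypothetical helper: count the distinct cache lines touched when the
 * cache header (assumed to sit in line 0) is read and n object pointers
 * are accessed starting at index first. objs_offset is the byte offset of
 * objs[0]: 16 for the unaligned layout drawn above, 64 once the array is
 * cache aligned.
 */
static unsigned lines_touched(size_t objs_offset, size_t first, size_t n)
{
    assert(n > 0);
    size_t begin = objs_offset + first * OBJ_PTR_SIZE;
    size_t last = begin + n * OBJ_PTR_SIZE - 1;      /* last byte accessed */
    size_t first_line = begin / CACHE_LINE_SIZE;
    size_t last_line = last / CACHE_LINE_SIZE;
    unsigned count = (unsigned)(last_line - first_line + 1);

    if (first_line != 0)
        count++;                                     /* line 0, for len */
    return count;
}
```

For the first 16-object burst, `lines_touched(16, 0, 16)` and `lines_touched(64, 0, 16)` both give 3. For the next burst, the results are 4 unaligned and 3 aligned. For a 32-object burst past the bottom of the cache, the objects alone span 5 lines unaligned and 4 aligned, in line with the five-to-four figure above.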
Michael Baum	86fe1b01fa	ethdev: add structure for indirect flow age update Add a new structure for indirect AGE update. This new structure enables: 1. Update timeout value. 2. Stop AGE checking. 3. Start AGE checking. 4. Restart AGE checking. Signed-off-by: Michael Baum <michaelba@nvidia.com> Acked-by: Ori Kam <orika@nvidia.com> Acked-by: Andrew Rybchenko <andrew.rybchenko@oktetlabs.ru>	2022-10-28 12:41:03 +02:00
Michael Baum	966eb55e9a	ethdev: add queue-based API to report aged flow rules When an application uses queue-based flow rule management and operates the same flow rule on the same queue, e.g. create/destroy/query, the API for querying aged flow rules should also have a queue ID parameter, just like other queue-based flow APIs. This way, the PMD can work in a more optimized way, since resources are isolated by queue and need no synchronization. If an application does use queue-based flow management but configures the port without RTE_FLOW_PORT_FLAG_STRICT_QUEUE, which means it operates a given flow rule on different queues, the queue ID parameter will be ignored. Signed-off-by: Michael Baum <michaelba@nvidia.com> Acked-by: Ori Kam <orika@nvidia.com> Acked-by: Andrew Rybchenko <andrew.rybchenko@oktetlabs.ru>	2022-10-28 12:41:03 +02:00
Michael Baum	dcc9a80c20	ethdev: add strict queue to pre-configuration flow hints The data-path focused flow rule management can manage flow rules in a more optimized way than the traditional one by using hints provided by the application in the initialization phase. In addition to the current hints in port attr, the application could provide more hints about its behaviour. One example is how the application handles the same flow rule: A. create/destroy the flow on the same queue, but query the flow on a different queue or in a queue-less way (i.e. counter query); B. perform all flow operations on exactly the same queue, in which case the PMD can work in a more optimized way than in A, because resources could be isolated and accessed per queue, without a lock, for example. This patch adds a flag for the above situation and could be extended to cover more situations. Signed-off-by: Michael Baum <michaelba@nvidia.com> Acked-by: Ori Kam <orika@nvidia.com> Acked-by: Andrew Rybchenko <andrew.rybchenko@oktetlabs.ru>	2022-10-28 12:41:03 +02:00
Megha Ajmera	25eae0f790	sched: fix subport profile configuration In rte_sched_subport_config() API, subport_profile_id is not set correctly. Fixes: `ac6fcb841b` ("sched: update subport rate dynamically") Cc: stable@dpdk.org Signed-off-by: Megha Ajmera <megha.ajmera@intel.com> Acked-by: Cristian Dumitrescu <cristian.dumitrescu@intel.com>	2022-10-28 16:20:59 +02:00
David Marchand	9f81548430	flow_classify: mark library as deprecated This library has no maintainer and, for now, nobody expressed interest in taking over. Mark this experimental library as deprecated and announce plan for removal in v23.11. Signed-off-by: David Marchand <david.marchand@redhat.com> Acked-by: Ferruh Yigit <ferruh.yigit@amd.com> Acked-by: Konstantin Ananyev <konstantin.v.ananyev@yandex.ru> Acked-by: Thomas Monjalon <thomas@monjalon.net>	2022-10-28 16:20:59 +02:00
Markus Theil	45d7cf91ba	build: export include directories list In order to perform things like LTO more easily in our DPDK applications, we use DPDK as a meson subproject. Export include directories list in order to be usable in this context. Signed-off-by: Markus Theil <markus.theil@tu-ilmenau.de> Acked-by: Bruce Richardson <bruce.richardson@intel.com>	2022-10-28 14:27:48 +02:00
Maxime Coquelin	755a8eaf3f	vhost: promote per-queue stats API to stable This patch promotes the per-queue stats API to stable. The API has been used by the Vhost PMD since v22.07, and David Marchand posted a patch to make use of it in next OVS release[0]. [0]: http://patchwork.ozlabs.org/project/openvswitch/patch/20221007111613.1695524-4-david.marchand@redhat.com/ Signed-off-by: Maxime Coquelin <maxime.coquelin@redhat.com> Acked-by: David Marchand <david.marchand@redhat.com> Reviewed-by: Chenbo Xia <chenbo.xia@intel.com>	2022-10-26 11:11:03 +02:00
Cheng Jiang	fd03876e71	vhost: fix slot index in async Tx When a packet receive failure and a full DMA ring occur simultaneously in asynchronous vhost, the slot_idx needs to be decreased by 1. For a packed virtqueue, if the slot_idx is currently 0, the slot index should become ring_size - 1, since the ring size is not necessarily a power of 2. Fixes: `84d5204310` ("vhost: support async dequeue for split ring") Fixes: `fe8477ebbd` ("vhost: support async packed ring dequeue") Cc: stable@dpdk.org Signed-off-by: Cheng Jiang <cheng1.jiang@intel.com> Tested-by: Wei Ling <weix.ling@intel.com> Reviewed-by: Maxime Coquelin <maxime.coquelin@redhat.com> Reviewed-by: Chenbo Xia <chenbo.xia@intel.com>	2022-10-26 11:07:29 +02:00
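The sketch below shows the wrap-around rule, assuming a 16-bit index as used for virtqueue sizes. `slot_idx_dec()` is a hypothetical helper, not the vhost library code.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Hypothetical helper: step a ring slot index back by one. Masking with
 * (size - 1) only works when size is a power of 2; a packed virtqueue may
 * have any size, so index 0 wraps explicitly to size - 1.
 */
static uint16_t slot_idx_dec(uint16_t idx, uint16_t size)
{
    assert(size != 0 && idx < size);
    return idx == 0 ? (uint16_t)(size - 1) : (uint16_t)(idx - 1);
}
```

With a ring of 1000 entries, `slot_idx_dec(0, 1000)` yields 999. The explicit comparison behaves the same for power-of-2 and other sizes.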
Cheng Jiang	5c3a69879e	vhost: fix descriptor count in async packed ring When vhost receives packets from the front-end using a packed virtqueue, it might use multiple descriptors for one packet, so we need to calculate and record the descriptor number for each packet to update the available and used descriptor counters, and to roll back when the DMA ring is full. Fixes: `fe8477ebbd` ("vhost: support async packed ring dequeue") Cc: stable@dpdk.org Signed-off-by: Cheng Jiang <cheng1.jiang@intel.com> Reviewed-by: Maxime Coquelin <maxime.coquelin@redhat.com> Reviewed-by: Chenbo Xia <chenbo.xia@intel.com>	2022-10-26 11:07:29 +02:00
Changpeng Liu	830f7e7907	vhost: add non-blocking API for posting interrupt The vhost-user library takes the access locks of all VQs when processing vring-based messages, such as SET_VRING_KICK and SET_VRING_CALL, while the data processing thread may already be started, e.g. SPDK vhost-blk and vhost-scsi start the data processing thread when one vring is ready, so a deadlock may happen when SPDK is posting interrupts to the VM. Here, we add a new API which allows the caller to try again later in this case. Bugzilla ID: 1015 Fixes: `c573699830` ("vhost: fix missing virtqueue lock protection") Cc: stable@dpdk.org Signed-off-by: Changpeng Liu <changpeng.liu@intel.com> Reviewed-by: Chenbo Xia <chenbo.xia@intel.com>	2022-10-26 10:58:48 +02:00
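The sketch below illustrates the try-again-later pattern, using a POSIX mutex in place of the virtqueue access lock. The structure and function names are hypothetical and do not reproduce the new vhost API.

```c
#include <assert.h>
#include <errno.h>
#include <pthread.h>

/* Hypothetical virtqueue: an access lock plus a counter standing in for the
 * eventfd write that posts the interrupt to the guest. */
struct sketch_vq {
    pthread_mutex_t access_lock;
    unsigned calls;
};

/*
 * Non-blocking interrupt post: if another thread (for example the
 * vhost-user message handler) holds the access lock, return -EAGAIN so the
 * caller can retry later instead of blocking on the lock.
 */
static int sketch_vq_call_nonblock(struct sketch_vq *vq)
{
    assert(vq);
    if (pthread_mutex_trylock(&vq->access_lock) != 0)
        return -EAGAIN;
    vq->calls++;                 /* real code would signal the call fd here */
    pthread_mutex_unlock(&vq->access_lock);
    return 0;
}
```

When the lock is held elsewhere, the call returns -EAGAIN immediately and does not post the interrupt. The caller then decides when to retry, which avoids the deadlock described above.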
Xuan Ding	e8c3d496ca	vhost: introduce DMA vChannel unconfiguration Add a new API rte_vhost_async_dma_unconfigure() to unconfigure DMA vChannels in the vhost async data path. Lock protection is also added to protect DMA vChannel configuration and unconfiguration from concurrent calls. Signed-off-by: Xuan Ding <xuan.ding@intel.com> Reviewed-by: Maxime Coquelin <maxime.coquelin@redhat.com> Reviewed-by: Chenbo Xia <chenbo.xia@intel.com>	2022-10-26 10:46:06 +02:00
Andy Pei	71151e7555	vhost: improve vDPA block device configure condition To support multi-queue, configure the device after the call fds of all queues are set. Signed-off-by: Andy Pei <andy.pei@intel.com> Signed-off-by: Huang Wei <wei.huang@intel.com> Reviewed-by: Chenbo Xia <chenbo.xia@intel.com> Reviewed-by: Maxime Coquelin <maxime.coquelin@redhat.com>	2022-10-26 10:40:34 +02:00
Andy Pei	c20c5e9f88	vhost: get ready with first queue of block device When booting from a virtio-blk device, SeaBIOS in QEMU only enables one queue. To work in this scenario, the vDPA BLK device back-end configures the device when the first queue is ready. Signed-off-by: Andy Pei <andy.pei@intel.com> Signed-off-by: Huang Wei <wei.huang@intel.com> Reviewed-by: Chenbo Xia <chenbo.xia@intel.com> Reviewed-by: Maxime Coquelin <maxime.coquelin@redhat.com>	2022-10-26 10:40:34 +02:00
Andy Pei	f92ab3f02c	vhost: add type to vDPA device Add a type to rte_vdpa_device to store the device type. Call the vDPA op get_dev_type to fill in the type when registering a vDPA device. Signed-off-by: Andy Pei <andy.pei@intel.com> Reviewed-by: Chenbo Xia <chenbo.xia@intel.com> Reviewed-by: Maxime Coquelin <maxime.coquelin@redhat.com>	2022-10-26 10:40:34 +02:00
Chengwen Feng	c622735dd3	net/bonding: call Tx prepare before Tx burst Normally, to use the HW offloads capability (e.g. checksum and TSO) in the Tx direction, the application needs to call rte_eth_tx_prepare() to make some adjustments to the packets before sending them. But the tx_prepare callback of the bonding driver is not implemented, so the sent packets may have errors (e.g. checksum errors). However, it is difficult to design the tx_prepare callback for the bonding driver, because when a bonded device sends packets, it allocates the packets to different slave devices based on the real-time link status and bonding mode. That is, it is very difficult for the bonded device to determine which slave device's prepare function should be invoked. So in this patch, the tx_prepare callback of the bonding driver is not implemented. Instead, rte_eth_tx_prepare() will be called before rte_eth_tx_burst(). In this way, all tx_offloads can be processed correctly for all NIC devices. Note: because it is rare to bond different PMDs together, tx-prepare is called only once in broadcast bonding mode. Also the following description was added to the rte_eth_tx_burst() function: "@note This function must not modify mbufs (including packets data) unless the refcnt is 1. The exception is the bonding PMD, which does not have tx-prepare function, in this case, mbufs maybe modified." Signed-off-by: Chengchang Tang <tangchengchang@huawei.com> Signed-off-by: Chengwen Feng <fengchengwen@huawei.com> Reviewed-by: Min Hu (Connor) <humin29@huawei.com> Acked-by: Chas Williams <3chas3@gmail.com>	2022-10-20 08:36:34 +02:00
Kalesh AP	eb0d471a89	ethdev: add proactive error handling mode Some PMDs (e.g. hns3) can detect hardware or firmware errors. One error recovery mode is to report the RTE_ETH_EVENT_INTR_RESET event and wait for the application to invoke rte_eth_dev_reset() to recover the port; however, this mode has the following weaknesses: 1) Due to different hardware and software designs, the recovery process of some NIC ports requires multiple handshakes with the firmware and the PF (when the port is a VF). It takes a long time to complete the entire operation for one port. If multiple ports (for example, multiple VFs of a PF) are reset at the same time, other VFs may fail to be reset (because the reset processing is serial, the previous VFs must be processed before the subsequent VFs). 2) The impact on the application layer is great: the application must stop the working queues, stop calling the Rx and Tx functions, then call rte_eth_dev_reset(), and set everything up again. This patch introduces the proactive error handling mode, in which the PMD tries to recover from the errors itself. In this process, the PMD sets the data path pointers to dummy functions (which prevents crashes), and also makes sure that control path operations fail with return code -EBUSY. Because the PMD recovers automatically, the application can only sense that the data flow is disconnected for a while and that the control API returns an error during this period. In order to sense the error happening/recovering, three events were introduced: 1) RTE_ETH_EVENT_ERR_RECOVERING: used to notify the application that the PMD detected an error and the recovery is being started. Upon receiving the event, the application should not invoke any control path APIs until receiving the RTE_ETH_EVENT_RECOVERY_SUCCESS or RTE_ETH_EVENT_RECOVERY_FAILED event. 2) RTE_ETH_EVENT_RECOVERY_SUCCESS: used to notify the application that the PMD recovered successfully from the error; the PMD has already reconfigured the port, and the effect is the same as that of the restart operation. 
3) RTE_ETH_EVENT_RECOVERY_FAILED: used to notify the application that the PMD failed to recover from the error and the port is no longer usable. The application should close the port. Signed-off-by: Kalesh AP <kalesh-anakkur.purayil@broadcom.com> Signed-off-by: Somnath Kotur <somnath.kotur@broadcom.com> Signed-off-by: Chengwen Feng <fengchengwen@huawei.com> Reviewed-by: Ajit Khaparde <ajit.khaparde@broadcom.com> Acked-by: Andrew Rybchenko <andrew.rybchenko@oktetlabs.ru> Signed-off-by: Thomas Monjalon <thomas@monjalon.net>	2022-10-17 08:27:18 +02:00
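The application side of the three events amounts to a small state machine. The sketch below is self-contained and uses hypothetical names. A real application would receive RTE_ETH_EVENT_ERR_RECOVERING and the other events through a callback registered with rte_eth_dev_callback_register().

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical mirror of the three new events. */
enum sketch_recovery_event {
    SKETCH_ERR_RECOVERING,
    SKETCH_RECOVERY_SUCCESS,
    SKETCH_RECOVERY_FAILED,
};

enum sketch_port_state {
    SKETCH_PORT_RUNNING,      /* control path APIs may be used */
    SKETCH_PORT_RECOVERING,   /* PMD is recovering: no control path calls */
    SKETCH_PORT_MUST_CLOSE,   /* recovery failed: the port should be closed */
};

/* Application-side state transition for each recovery event. */
static enum sketch_port_state
sketch_on_event(enum sketch_port_state s, enum sketch_recovery_event ev)
{
    switch (ev) {
    case SKETCH_ERR_RECOVERING:
        return SKETCH_PORT_RECOVERING;
    case SKETCH_RECOVERY_SUCCESS:
        return SKETCH_PORT_RUNNING;     /* port already reconfigured by the PMD */
    case SKETCH_RECOVERY_FAILED:
        return SKETCH_PORT_MUST_CLOSE;
    }
    return s;                           /* unknown event: keep current state */
}

static bool sketch_control_ops_allowed(enum sketch_port_state s)
{
    return s == SKETCH_PORT_RUNNING;
}
```

Gating every control path call on `sketch_control_ops_allowed()` keeps the application from issuing calls that would fail with -EBUSY while the PMD is recovering.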
Chengwen Feng	0d5c38bac7	ethdev: add error handling mode to device info Currently, the defined error handling modes include: 1) NONE: no error handling mode is supported by this port. 2) PASSIVE: passive error handling; after the PMD detects that a reset is required, it reports the RTE_ETH_EVENT_INTR_RESET event, and the application invokes rte_eth_dev_reset() to recover the port. Signed-off-by: Chengwen Feng <fengchengwen@huawei.com> Reviewed-by: Andrew Rybchenko <andrew.rybchenko@oktetlabs.ru>	2022-10-17 08:26:36 +02:00
Stephen Hemminger	04d59ab2cf	rwlock: promote trylock operations as stable These have been in since 19.02; it is time to take off the experimental tag. Signed-off-by: Stephen Hemminger <stephen@networkplumber.org> Reviewed-by: David Marchand <david.marchand@redhat.com>	2022-10-27 13:00:11 +02:00
Stephen Hemminger	44830cc082	log: promote rte_log_list_types as stable This call was added in 21.05, so it is time to make it stable. Signed-off-by: Stephen Hemminger <stephen@networkplumber.org> Reviewed-by: David Marchand <david.marchand@redhat.com>	2022-10-27 12:59:19 +02:00
Stephen Hemminger	23ce0afd37	eal: promote interruptible epoll wait as stable This call was added in 20.11, so it is time to make it non-experimental. Signed-off-by: Stephen Hemminger <stephen@networkplumber.org> Reviewed-by: David Marchand <david.marchand@redhat.com>	2022-10-27 12:58:02 +02:00
Stephen Hemminger	d2e3c4b8a2	pcapng: record received RSS hash in pcap file There is an option for recording the RSS hash with packets in the pcapng standard. This implements it for all received packets. There is a corner case that cannot be addressed with current DPDK APIs. If using rte_flow with some hardware, it is possible to write a flow rule that uses another hash function, like XOR. But there is no API that records this or provides the algorithm info on a per-packet basis. Wireshark recently merged support for displaying the recorded hash option (for the yet-to-be-released version 4.1). Signed-off-by: Stephen Hemminger <stephen@networkplumber.org> Tested-by: Ben Magistro <koncept1@gmail.com>	2022-10-27 10:29:59 +02:00
Markus Theil	e6b42038e8	power: fix P-state number parsing When atoi was converted to strtol in a revision of the patch introducing sysfs support for turbo percentage, the necessary check against the '\n' returned by sysfs was not added. Fixes: `de254dac60` ("power: read P-state turbo percentage from sysfs") Signed-off-by: Markus Theil <markus.theil@secunet.com> Reviewed-by: Reshma Pattan <reshma.pattan@intel.com>	2022-10-26 23:36:56 +02:00
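The sketch below shows the kind of check involved, assuming the value is read into a NUL-terminated buffer. `parse_sysfs_long()` is hypothetical and is not the power library code.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stdlib.h>

/*
 * Hypothetical helper: parse a decimal value read from a sysfs file.
 * sysfs values end with '\n', so after strtol() the end pointer may stop
 * at '\n' as well as at '\0'; anything else is rejected.
 */
static bool parse_sysfs_long(const char *buf, long *out)
{
    char *end;

    assert(buf && out);
    errno = 0;
    long v = strtol(buf, &end, 10);
    if (errno != 0 || end == buf)
        return false;                    /* out of range, or no digits */
    if (*end != '\0' && *end != '\n')
        return false;                    /* unexpected trailing characters */
    *out = v;
    return true;
}
```

A check that accepts only `'\0'` after the digits would reject a well-formed sysfs value such as `"35\n"`. Accepting `'\n'` as well keeps the strict validation without that false failure.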
Tadhg Kearney	b127e74cce	power: fix open file descriptors leak Close file pointers to Intel uncore sysfiles. Coverity issue: 381400 381397 Fixes: `60b8a661a9` ("power: add Intel uncore frequency control") Signed-off-by: Tadhg Kearney <tadhg.kearney@intel.com> Reviewed-by: Reshma Pattan <reshma.pattan@intel.com>	2022-10-26 23:31:17 +02:00
Ali Alnubani	16de054160	lib: remove empty return types from doxygen comments Recent versions of doxygen (1.9.4 and newer) complain about documented return types for functions that don't return anything. This patch removes these return types to fix build errors similar to this one: [..] Generating doc/api/doxygen with a custom command FAILED: doc/api/html /usr/bin/python3 /path/to/doc/api/generate_doxygen.py doc/api/html /usr/bin/doxygen doc/api/doxy-api.conf /root/dpdk/lib/eal/include/rte_bitmap.h:324: error: found documented return type for rte_bitmap_prefetch0 that does not return anything (warning treated as error, aborting now) [..] Tested with doxygen versions: 1.8.13, 1.8.17, 1.9.1, and 1.9.4. Signed-off-by: Ali Alnubani <alialnu@nvidia.com> Acked-by: Stephen Hemminger <stephen@networkplumber.org>	2022-10-26 17:51:51 +02:00
Kumara Parameshwaran	72f51b097a	gro: check payload length after trim When a packet is padded with extra bytes, the validation of the payload length should be done after the trim operation. Fixes: `b8a55871d5` ("gro: trim tail padding bytes") Cc: stable@dpdk.org Signed-off-by: Kumara Parameshwaran <kumaraparamesh92@gmail.com> Acked-by: Jiayu Hu <jiayu.hu@intel.com>	2022-10-26 17:18:11 +02:00
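The sketch below shows why the order matters, assuming an IPv4/TCP packet and a hypothetical, simplified length structure. A pure ACK is 54 bytes but is padded to the 60-byte Ethernet minimum. Computing the payload before trimming would count the 6 padding bytes as payload.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical, simplified view of the lengths GRO looks at. */
struct sketch_pkt {
    uint32_t frame_len;     /* bytes in the buffer, including any padding */
    uint16_t l2_len;        /* Ethernet header */
    uint16_t l3_len;        /* IPv4 header */
    uint16_t l4_len;        /* TCP header */
    uint16_t ip_total_len;  /* total length field of the IPv4 header */
};

/* Trim tail padding first, then validate the TCP payload length.
 * Returns the payload length, or -1 if there is no payload to merge. */
static int32_t sketch_tcp_payload_len(struct sketch_pkt *p)
{
    assert(p);
    uint32_t hdr_len = (uint32_t)p->l2_len + p->l3_len + p->l4_len;
    uint32_t real_len = (uint32_t)p->l2_len + p->ip_total_len;

    if (p->frame_len > real_len)
        p->frame_len = real_len;         /* drop the padding bytes */
    if (p->frame_len <= hdr_len)
        return -1;                       /* checked only after the trim */
    return (int32_t)(p->frame_len - hdr_len);
}
```

For the padded ACK (frame 60, IP total length 40), the trim brings the frame to 54 bytes and the function reports no payload. Checking before the trim would have reported 6 payload bytes.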
Andrew Rybchenko	e6e62f6f55	mempool: flush cache completely on overflow The cache was still full after flushing. In the opposite direction, i.e. when getting objects from the cache, the cache is refilled to full level when it crosses the low watermark (which happens to be zero). Similarly, the cache should be flushed to empty level when it crosses the high watermark (which happens to be 1.5 x the size of the cache). The existing flushing behaviour was suboptimal for real applications, because crossing the low or high watermark typically happens when the application is in a state where the number of put/get events is out of balance, e.g. when absorbing a burst of packets into a QoS queue (getting more mbufs from the mempool), or when a burst of packets is trickling out from the QoS queue (putting the mbufs back into the mempool). Now, the mempool cache is completely flushed when crossing the flush threshold, so only the newly put (hot) objects remain in the mempool cache afterwards. This bug degraded performance by causing too frequent flushing. Consider this application scenario: Either an lcore thread in the application is in a state of balance, where it uses the mempool cache within its flush/refill boundaries; in this situation, the flush method is less important, and this fix is irrelevant. Or an lcore thread in the application is out of balance (either permanently or temporarily), and mostly gets or puts objects from/to the mempool. If it mostly puts objects, not flushing all of the objects will cause more frequent flushing. This is the scenario addressed by this fix. E.g.: Cache size=256, flushthresh=384 (1.5x size), initial len=256; application burst len=32. If there are "size" objects in the cache after flushing, the cache is flushed at every 4th burst. If the cache is flushed completely, the cache is only flushed at every 16th burst. As you can see, this bug caused the cache to be flushed 4x too frequently in this example. 
And when/if the application thread breaks its pattern of continuously putting objects, and suddenly starts to get objects instead, it will either get objects already in the cache, or the get() function will refill the cache. The concept of not flushing the cache completely was probably based on an assumption that it is more likely for an application's lcore thread to get() after flushing than to put() after flushing. I strongly disagree with this assumption! If an application thread is continuously putting so much that it overflows the cache, it is much more likely to keep putting than it is to start getting. If in doubt, consider how CPU branch predictors work: When the application has done something many times consecutively, the branch predictor will expect the application to do the same again, rather than suddenly do something else. Signed-off-by: Morten Brørup <mb@smartsharesystems.com> Signed-off-by: Andrew Rybchenko <andrew.rybchenko@oktetlabs.ru> Reviewed-by: Morten Brørup <mb@smartsharesystems.com> Acked-by: Olivier Matz <olivier.matz@6wind.com>	2022-10-26 12:10:33 +02:00
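The burst arithmetic in this example can be checked with a small simulation. The sketch models only the cache length, not the objects themselves. It takes the old policy as "add, then flush down to `size` when `len >= flushthresh`", and the new one as "flush everything first if the burst would exceed `flushthresh`". All function names are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>

/* Old policy: add the burst, then flush back down to 'size' objects once
 * len reaches flushthresh. Returns the new length. */
static unsigned put_old(unsigned len, unsigned n, unsigned size,
                        unsigned flushthresh, bool *flushed)
{
    len += n;
    *flushed = len >= flushthresh;
    return *flushed ? size : len;
}

/* New policy: if the burst would exceed flushthresh, flush the whole cache
 * first, so only the newly put (hot) burst remains. */
static unsigned put_new(unsigned len, unsigned n, unsigned flushthresh,
                        bool *flushed)
{
    *flushed = len + n > flushthresh;
    return *flushed ? n : len + n;
}

/* Steady-state number of bursts between two consecutive flushes when an
 * lcore only puts, in bursts of 'burst' objects. */
static unsigned flush_period(bool new_policy, unsigned size,
                             unsigned flushthresh, unsigned len,
                             unsigned burst)
{
    unsigned last_flush = 0, period = 0;

    assert(burst > 0 && burst <= flushthresh);
    for (unsigned i = 1; i <= 1000; i++) {
        bool flushed;

        len = new_policy ? put_new(len, burst, flushthresh, &flushed)
                         : put_old(len, burst, size, flushthresh, &flushed);
        if (flushed) {
            if (last_flush != 0)
                period = i - last_flush;
            last_flush = i;
        }
    }
    return period;
}
```

The parameters are size=256, flushthresh=384, initial len=256 and bursts of 32. Under the old policy, this model flushes on every 4th burst. Under the complete-flush policy, it flushes on every 12th burst in steady state. After a flush the cache holds only the latest 32-object burst, and it takes 12 bursts to go past 384 objects. The 12 is fewer than the 16 quoted above, but flushing still becomes several times rarer.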
Morten Brørup	459531c958	mempool: fix cache flushing algorithm Fix the rte_mempool_do_generic_put() cache flushing algorithm to keep hot objects in cache instead of cold ones. The algorithm was: 1. Add the objects to the cache. 2. Anything greater than the cache size (if it crosses the cache flush threshold) is flushed to the backend. Please note that the description in the source code said that it kept "cache min value" objects after flushing, but the function actually kept the cache full after flushing, which the above description reflects. Now, the algorithm is: 1. If the objects cannot be added to the cache without crossing the flush threshold, flush some cached objects to the backend to free up required space. 2. Add the objects to the cache. Previously, the most recent (hot) objects were flushed, leaving the oldest (cold) objects in the mempool cache. The bug degraded performance, because flushing prevented immediate reuse of the (hot) objects already in the CPU cache. Now, the existing (cold) objects in the mempool cache are flushed before the new (hot) objects are added to the mempool cache. Since nearby code is touched anyway, fix the flush threshold comparison so that flushing happens only if the threshold is really exceeded, not just reached. I.e. it must be "len > flushthresh", not "len >= flushthresh". Consider a flush multiplier of 1 instead of 1.5; the cache would be flushed already when reaching size objects, not when exceeding size objects. In other words, the cache would not be able to hold "size" objects, which is clearly a bug. The bug could degrade performance due to premature flushing. Since the flush threshold is now never exceeded, the cache size in the mempool may be decreased from RTE_MEMPOOL_CACHE_MAX_SIZE * 3 to RTE_MEMPOOL_CACHE_MAX_SIZE * 2. In fact it could be CALC_CACHE_FLUSHTHRESH(RTE_MEMPOOL_CACHE_MAX_SIZE), but the flush threshold multiplier is internal. 
Signed-off-by: Morten Brørup <mb@smartsharesystems.com> Signed-off-by: Andrew Rybchenko <andrew.rybchenko@oktetlabs.ru> Reviewed-by: Morten Brørup <mb@smartsharesystems.com> Acked-by: Olivier Matz <olivier.matz@6wind.com>	2022-10-26 12:09:13 +02:00
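The effect of the comparison can be checked in isolation. The sketch assumes an initially empty cache receiving fixed-size bursts with no gets in between. Both helpers are hypothetical and model only the threshold test, not which objects are flushed.

```c
#include <assert.h>
#include <stdbool.h>

/* Does putting n objects onto a cache holding len objects trigger a flush?
 * strict = true models the fixed "len > flushthresh" test, strict = false
 * the old ">=" test, both applied to the length after the put. */
static bool needs_flush(unsigned len, unsigned n, unsigned flushthresh,
                        bool strict)
{
    unsigned after = len + n;
    return strict ? after > flushthresh : after >= flushthresh;
}

/* How many objects an empty cache reaches, in bursts of n, before the
 * next put would trigger a flush. */
static unsigned fill_without_flush(unsigned flushthresh, unsigned n,
                                   bool strict)
{
    unsigned len = 0;

    assert(n > 0);                        /* otherwise the loop never ends */
    while (!needs_flush(len, n, flushthresh, strict))
        len += n;
    return len;
}
```

Take a multiplier of 1 (`flushthresh` equal to a `size` of 256) and bursts of 32. The old `>=` test triggers a flush once the cache holds 224 objects, so it never reaches 256. The strict test fills it to exactly 256.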
Naga Harish K S V	75c5bfc320	eventdev/eth_tx: fix queue delete To delete all the queues of an ethdev device associated with an adapter instance, the queue_id can be passed as -1 to the queue delete API. When only a subset of the queues of an ethdev device is associated, the queue delete logic exits without deleting the queues in some cases (higher-numbered associated queues) in the above scenario, because it does not check the association status of all queues. This patch fixes this issue by checking the queue association status of all the queues of the Ethernet device. Fixes: `741b499e64` ("eventdev/eth_tx: fix queue delete logic") Cc: stable@dpdk.org Signed-off-by: Naga Harish K S V <s.v.naga.harish.k@intel.com>	2022-10-21 11:42:08 +02:00
Ganapati Kundapura	8f4ff7de39	eventdev/crypto: fix multi-process A secondary process is not able to call the crypto adapter stats get/reset APIs, as the crypto adapter memzone memory is not accessible by the secondary process. Add a memzone lookup so that a secondary process can call the crypto adapter APIs (stats_get etc.). Fixes: `7901eac340` ("eventdev: add crypto adapter implementation") Cc: stable@dpdk.org Signed-off-by: Ganapati Kundapura <ganapati.kundapura@intel.com> Acked-by: Abhinandan Gujjar <abhinandan.gujjar@intel.com>	2022-10-21 11:42:08 +02:00
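The sketch below shows the corrected deletion loop. It uses a hypothetical association table rather than the adapter's real data structures.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define SKETCH_NB_TXQ 8   /* hypothetical number of Tx queues on the ethdev */

/* Hypothetical per-ethdev association state kept by the adapter. */
struct sketch_txa_dev {
    bool added[SKETCH_NB_TXQ];
    unsigned nb_added;
};

/* Delete one queue, or every associated queue when queue_id is -1.
 * Every queue index of the device is visited and its association status
 * checked, so higher-numbered associated queues are not missed. */
static void sketch_queue_del(struct sketch_txa_dev *d, int32_t queue_id)
{
    assert(queue_id == -1 || (queue_id >= 0 && queue_id < SKETCH_NB_TXQ));
    for (int32_t q = 0; q < SKETCH_NB_TXQ; q++) {
        if (queue_id != -1 && q != queue_id)
            continue;
        if (d->added[q]) {
            d->added[q] = false;
            d->nb_added--;
        }
    }
}
```

With queues 1, 6 and 7 associated, deleting with -1 clears all three, including the highest-numbered ones.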
Pavan Nikhilesh	1bdfe4d76e	eventdev: increase xstats ID width to 64 bits Increase xstats ID width from 32 to 64 bits. This also fixes the xstats ID datatype discrepancy between reset and the rest of the xstats family. Signed-off-by: Pavan Nikhilesh <pbhagavatula@marvell.com> Acked-by: Morten Brørup <mb@smartsharesystems.com> Reviewed-by: Mattias Rönnblom <mattias.ronnblom@ericsson.com> Acked-by: Jerin Jacob <jerinj@marvell.com>	2022-10-21 11:42:08 +02:00
Mattias Rönnblom	ed88c5a5e4	eventdev/timer: support appropriately report idle Update the Event Timer Adapter's service function to report as idle (i.e., return -EAGAIN) in case no timer events were enqueued to the event device. Signed-off-by: Mattias Rönnblom <mattias.ronnblom@ericsson.com> Acked-by: Erik Gabriel Carrillo <erik.g.carrillo@intel.com>	2022-10-21 11:34:42 +02:00
Mattias Rönnblom	35d052356b	eventdev/eth_tx: support appropriately report idle Update the Event Ethernet Tx Adapter's service function to report as idle (i.e., return -EAGAIN) in case no events were dequeued from the event device and no Ethernet frames were sent out on the wire. Signed-off-by: Mattias Rönnblom <mattias.ronnblom@ericsson.com> Reviewed-by: Naga Harish K S V <s.v.naga.harish.k@intel.com> Acked-by: Jay Jayatheerthan <jay.jayatheerthan@intel.com>	2022-10-21 11:34:41 +02:00
Mattias Rönnblom	7f33abd49b	eventdev/eth_rx: support appropriately report idle Update the Event Ethernet Rx Adapter's service function to report as idle (i.e., return -EAGAIN) in case no Ethernet frames were received from the ethdev and no events were enqueued to the event device. Signed-off-by: Mattias Rönnblom <mattias.ronnblom@ericsson.com> Reviewed-by: Naga Harish K S V <s.v.naga.harish.k@intel.com> Acked-by: Jay Jayatheerthan <jay.jayatheerthan@intel.com>	2022-10-21 11:34:41 +02:00
Mattias Rönnblom	34d785571f	eventdev/crypto: support appropriately report idle Update the event crypto adapter's service function to report as idle (i.e., return -EAGAIN) in case no crypto operations were performed. Signed-off-by: Mattias Rönnblom <mattias.ronnblom@ericsson.com> Acked-by: Abhinandan Gujjar <abhinandan.gujjar@intel.com>	2022-10-21 11:34:41 +02:00
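The four "report idle" commits above share one convention: the adapter's service function returns -EAGAIN when an iteration did no work. The sketch below uses a hypothetical adapter. Only the callback shape, `int32_t (*)(void *)`, follows DPDK's service function type.

```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>

#define SKETCH_BURST 32u   /* hypothetical burst size */

/* Hypothetical adapter state: events waiting and frames sent so far. */
struct sketch_adapter {
    unsigned pending_events;
    unsigned frames_sent;
};

/* Shaped like a DPDK service callback, int32_t (*)(void *). Returns
 * -EAGAIN when nothing was dequeued or sent, 0 when work was done. */
static int32_t sketch_adapter_service(void *arg)
{
    struct sketch_adapter *a = arg;
    unsigned burst;

    assert(a);
    burst = a->pending_events < SKETCH_BURST ? a->pending_events : SKETCH_BURST;
    if (burst == 0)
        return -EAGAIN;                  /* idle iteration */
    a->pending_events -= burst;
    a->frames_sent += burst;
    return 0;
}
```

With 40 pending events, the first two calls do work and return 0. A third call finds nothing to do and reports idle with -EAGAIN.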
Erik Gabriel Carrillo	329280c53e	service: fix early move to inactive status Assume thread T2 is a service lcore that is in the middle of executing a service function. Also, assume thread T1 concurrently calls rte_service_lcore_stop(), which will set the "service_active_on_lcore" state to false. If thread T1 then calls rte_service_may_be_active(), it can return zero even though T2 is still running the service function. If T1 then proceeds to free data being used by T2, a crash can ensue. Move the logic that clears the "service_active_on_lcore" state from the rte_service_lcore_stop() function to the service_runner_func() to ensure that we: - don't let the "service_active_on_lcore" state linger as 1 - don't clear the state early Fixes: `6550113be6` ("service: fix lingering active status") Cc: stable@dpdk.org Signed-off-by: Erik Gabriel Carrillo <erik.g.carrillo@intel.com> Acked-by: Harry van Haaren <harry.van.haaren@intel.com>	2022-10-21 14:54:26 +02:00
Stephen Hemminger	8a0cf0c455	pdump: do not allow enable/disable in primary process Attempts to enable or disable pdump in primary process will fail with core dump because it is not valid to call rte_mp_request_sync() unless in a secondary process. Trap the error in the common code used for both enable and disable requests. Fixes: `660098d61f` ("pdump: use generic multi-process channel") Cc: stable@dpdk.org Reported-by: Sylvia Grundwürmer <sylvia.grundwuermer@b-plus.com> Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>	2022-10-21 14:54:26 +02:00
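A sketch of trapping the error once, in the path shared by enable and disable, before any multi-process request is sent. It assumes a hypothetical `pdump_prepare_request()` and a `proc_type` enum standing in for `rte_eal_process_type()`. The `-EINVAL` return value is an assumption chosen for illustration.

```c
#include <assert.h>
#include <errno.h>

/* Hypothetical stand-in for the result of rte_eal_process_type(). */
enum proc_type { PROC_PRIMARY, PROC_SECONDARY };

/* Common helper for enable and disable requests: refuse early in the
 * primary, where sending a multi-process sync request is not valid. */
static int
pdump_prepare_request(enum proc_type type)
{
	if (type == PROC_PRIMARY)
		return -EINVAL; /* assumed error code */
	/* ... build the request and send it to the primary here ... */
	return 0;
}
```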
David Marchand	eb870201b4	trace: remove limitation on directory Remove arbitrary limit on 12 characters of the file prefix used for the directory where to store the traces. Simplify the code by relying on dynamic allocations. Signed-off-by: David Marchand <david.marchand@redhat.com> Acked-by: Jerin Jacob <jerinj@marvell.com> Acked-by: Sunil Kumar Kori <skori@marvell.com>	2022-10-20 13:34:19 +02:00
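Sizing the buffer at run time removes any fixed limit on the prefix. The sketch below measures the string with `snprintf(NULL, 0, ...)` and then allocates exactly that much. The `<base>/<prefix>-<stamp>` layout and the `trace_dir_path()` name are hypothetical.

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Build "<base>/<prefix>-<stamp>" in a heap buffer sized to fit, so the
 * prefix can be any length. Returns NULL on error; caller frees. */
static char *
trace_dir_path(const char *base, const char *prefix, const char *stamp)
{
	int len = snprintf(NULL, 0, "%s/%s-%s", base, prefix, stamp);
	char *path;

	if (len < 0)
		return NULL;
	path = malloc((size_t)len + 1);
	if (path == NULL)
		return NULL;
	snprintf(path, (size_t)len + 1, "%s/%s-%s", base, prefix, stamp);
	return path;
}
```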
David Marchand	477cc313a2	trace: remove limitation on trace point name The name of a trace point is provided as a constant string via the RTE_TRACE_POINT_REGISTER macro. We can rely on an explicit constant string in the binary and simply point at it. There is then no need for a (fixed size) copy. Signed-off-by: David Marchand <david.marchand@redhat.com> Acked-by: Jerin Jacob <jerinj@marvell.com>	2022-10-20 13:34:19 +02:00
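A sketch contrasting a fixed-size copy of a name with a pointer to the constant string. It uses hypothetical `tp_copy`/`tp_ref` types and a deliberately small `NAME_MAX_LEN`. The pointer form is only safe because the name has static storage duration, as a string literal does.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

#define NAME_MAX_LEN 8 /* hypothetical fixed limit for the copy variant */

/* Copy variant: the name is copied into a fixed buffer. */
struct tp_copy {
	char name[NAME_MAX_LEN];
};

/* Pointer variant: the descriptor points at the constant string. */
struct tp_ref {
	const char *name;
};

static void
tp_copy_set(struct tp_copy *tp, const char *name)
{
	/* snprintf always NUL-terminates, dropping whatever does not fit */
	snprintf(tp->name, sizeof(tp->name), "%s", name);
}

static void
tp_ref_set(struct tp_ref *tp, const char *name)
{
	/* no copy: 'name' must outlive tp, e.g. a string literal */
	tp->name = name;
}
```

With an 8-byte buffer, a long name is cut to its first 7 characters. The pointer variant keeps the full name at no extra memory cost.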
David Marchand	d4cbbee345	trace: fix metadata dump The API does not describe that metadata dump is conditioned to enabling any trace points. While at it, merge dump unit tests into the generic trace_autotest to enhance coverage. Fixes: `f6b2d65dcd` ("trace: implement debug dump") Cc: stable@dpdk.org Signed-off-by: David Marchand <david.marchand@redhat.com> Acked-by: Sunil Kumar Kori <skori@marvell.com>	2022-10-20 13:34:19 +02:00
David Marchand	782dbf1791	trace: fix race in debug dump trace->nb_trace_mem_list access must be under trace->lock to avoid races with threads allocating/freeing their trace buffers. Fixes: `f6b2d65dcd` ("trace: implement debug dump") Cc: stable@dpdk.org Signed-off-by: David Marchand <david.marchand@redhat.com> Acked-by: Jerin Jacob <jerinj@marvell.com> Acked-by: Sunil Kumar Kori <skori@marvell.com>	2022-10-20 13:34:19 +02:00
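A sketch of the locking rule described above. It assumes a hypothetical `trace_state` holding a fixed array of buffer pointers and a pthread mutex. The paths that add buffers and the dump both touch the count only while holding the same lock.

```c
#include <assert.h>
#include <pthread.h>
#include <stdio.h>

/* Hypothetical, simplified trace state: per-thread buffers whose count
 * changes as threads allocate or free them. */
struct trace_state {
	pthread_mutex_t lock;
	unsigned int nb_mem; /* analogue of nb_trace_mem_list */
	void *mem[16];
};

/* Writer: an allocating thread appends its buffer under the lock. */
static int
trace_mem_add(struct trace_state *t, void *buf)
{
	int rc = -1;

	pthread_mutex_lock(&t->lock);
	if (t->nb_mem < sizeof(t->mem) / sizeof(t->mem[0])) {
		t->mem[t->nb_mem++] = buf;
		rc = 0;
	}
	pthread_mutex_unlock(&t->lock);
	return rc;
}

/* Reader: the dump reads the count and walks the list under the same
 * lock, so it never observes a list that is being modified. */
static unsigned int
trace_mem_dump(struct trace_state *t, FILE *f)
{
	unsigned int i, n;

	pthread_mutex_lock(&t->lock);
	n = t->nb_mem;
	for (i = 0; i < n; i++)
		fprintf(f, "mem[%u]=%p\n", i, t->mem[i]);
	pthread_mutex_unlock(&t->lock);
	return n;
}
```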
David Marchand	d6fd5a018e	trace: fix dynamically enabling trace points Enabling trace points at runtime was not working if no trace point had been enabled first at rte_eal_init() time. The reason was that trace.args reflected the arguments passed to --trace= EAL option. To fix this: - the trace subsystem initialisation is updated: trace directory creation is deferred to when traces are dumped (to avoid creating directories that may not be used), - per lcore memory allocation still relies on rte_trace_is_enabled() but this helper now tracks if any trace point is enabled. The documentation is updated accordingly, - cleanup helpers must always be called in rte_eal_cleanup() since some trace points might have been enabled and disabled in the lifetime of the DPDK application, With this fix, we can update the unit test and check that a trace point callback is invoked when expected. Note: - the 'trace' global variable might be shadowed with the argument passed to the functions dealing with trace point handles. 'tp' has been used for referring to trace_point object. Prefer 't' for referring to handles, Fixes: `84c4fae462` ("trace: implement operation APIs") Cc: stable@dpdk.org Signed-off-by: David Marchand <david.marchand@redhat.com> Acked-by: Sunil Kumar Kori <skori@marvell.com>	2022-10-20 13:34:19 +02:00
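One way a helper can track whether any trace point is enabled is a global counter updated only on enabled/disabled transitions. `tracepoint`, `tp_enable()`, `tp_disable()` and `trace_is_enabled()` are hypothetical analogues for illustration, not the DPDK implementation.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical count of currently enabled trace points. */
static atomic_uint nb_enabled;

/* Hypothetical trace point handle reduced to its enabled flag. */
struct tracepoint {
	atomic_bool enabled;
};

static void
tp_enable(struct tracepoint *t)
{
	/* count only the disabled -> enabled transition */
	if (!atomic_exchange(&t->enabled, true))
		atomic_fetch_add(&nb_enabled, 1);
}

static void
tp_disable(struct tracepoint *t)
{
	/* count only the enabled -> disabled transition */
	if (atomic_exchange(&t->enabled, false))
		atomic_fetch_sub(&nb_enabled, 1);
}

/* Analogue of rte_trace_is_enabled(): true while any point is enabled,
 * whether it was enabled at init time or later at run time. */
static bool
trace_is_enabled(void)
{
	return atomic_load(&nb_enabled) != 0;
}
```

Counting transitions rather than calls keeps the counter correct when a point is enabled twice in a row.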

8127 Commits