numam-dpdk

Author	SHA1	Message	Date
Stephen Hemminger	06c047b680	remove unnecessary null checks Functions like free, rte_free, and rte_mempool_free already handle NULL pointer so the checks here are not necessary. Remove redundant NULL pointer checks before free functions found by nullfree.cocci Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>	2022-02-12 12:07:48 +01:00
Feifei Wang	f0f7c557f3	net/mlx4: remove barrier for memory region cache 'dev_gen' is a variable to trigger all cores to flush their local caches once the global MR cache has been rebuilt. This is due to MR cache's R/W lock can maintain synchronization between threads: 1. dev_gen and global cache updating ordering inside the lock protected section does not matter. Because other threads cannot take the lock until global cache has been updated. Thus, in out of order platform, even if other agents firstly observe updated dev_gen but global does not update, they still have to wait the lock. As a result, it is unnecessary to add a wmb between global cache rebuilding and updating the dev_gen to keep the memory store order. 2. Store-Release of unlock provides the implicit wmb at the level visible by software. This makes 'rebuilding global cache' and 'updating dev_gen' be observed before local_cache starts to be updated by other agents. Thus, wmb after 'updating dev_gen' can be removed. Suggested-by: Ruifeng Wang <ruifeng.wang@arm.com> Signed-off-by: Feifei Wang <feifei.wang2@arm.com> Reviewed-by: Ruifeng Wang <ruifeng.wang@arm.com> Acked-by: Viacheslav Ovsiienko <viacheslavo@nvidia.com>	2021-06-23 17:02:35 +02:00
Alexander Kozyrev	8e08df22f0	net/mlx4: improve assert control Use the MLX4_ASSERT macros instead of the standard assert clause. Depends on the RTE_LIBRTE_MLX4_DEBUG configuration option to define it. If RTE_LIBRTE_MLX4_DEBUG is enabled MLX4_ASSERT is equal to RTE_VERIFY to bypass the global CONFIG_RTE_ENABLE_ASSERT option. If RTE_LIBRTE_MLX4_DEBUG is disabled, the global CONFIG_RTE_ENABLE_ASSERT can still make this assert active by calling RTE_VERIFY inside RTE_ASSERT. Signed-off-by: Alexander Kozyrev <akozyrev@mellanox.com> Acked-by: Viacheslav Ovsiienko <viacheslavo@mellanox.com>	2020-02-05 09:51:21 +01:00
Alexander Kozyrev	e99fdaa7a3	net/mlx4: remove NDEBUG flag Use the RTE_LIBRTE_MLX4_DEBUG compilation flag to get rid of dependency on the NDEBUG definition. This is a preparation step to switch from standard assert clauses to DPDK RTE_ASSERT ones in MLX4 driver. Signed-off-by: Alexander Kozyrev <akozyrev@mellanox.com> Acked-by: Viacheslav Ovsiienko <viacheslavo@mellanox.com>	2020-02-05 09:51:20 +01:00
David Marchand	8ac3591694	remove useless include of EAL memory config header Restrict this header inclusion to its real users. Fixes: `028669bc9f` ("eal: hide shared memory config") Cc: stable@dpdk.org Signed-off-by: David Marchand <david.marchand@redhat.com> Acked-by: Anatoly Burakov <anatoly.burakov@intel.com>	2019-10-09 10:22:24 +02:00
Anatoly Burakov	76f80881ef	mem: add API to lock/unlock memory hotplug Currently, the memory hotplug is locked automatically by all memory-related _walk() functions, but sometimes locking the memory subsystem outside of them is needed. There is no public API to do that, so it creates a dependency on shared memory config to be public. Fix this by introducing a new API to lock/unlock the memory hotplug subsystem. Create a new common file for all things mem config, and a new API namespace rte_mcfg_*, and search-and-replace all usages of the locks with the new API. Signed-off-by: Anatoly Burakov <anatoly.burakov@intel.com> Acked-by: Stephen Hemminger <stephen@networkplumber.org> Acked-by: David Marchand <david.marchand@redhat.com>	2019-07-05 22:12:40 +02:00
Viacheslav Ovsiienko	897dbd3c86	net/mlx4: fix memory region cleanup mlx4 driver has a global list of Memory Regions created by device, and there is a ml4_mr_release() routine which makes a memory cleanup at device closing. The head of device MR list was fetched outside the rwlock protected section. Also some noticed typos are fixed. Fixes: `9797bfcce1` ("net/mlx4: add new memory region support") Cc: stable@dpdk.org Signed-off-by: Viacheslav Ovsiienko <viacheslavo@mellanox.com> Acked-by: Yongseok Koh <yskoh@mellanox.com>	2019-04-19 14:51:55 +02:00
Yongseok Koh	0b259b8e96	net/mlx4: enable secondary process to register DMA memory The Memory Region (MR) for DMA memory can't be created from secondary process due to lib/driver limitation. Whenever it is needed, secondary process can make a request to primary process through the EAL IPC channel (rte_mp_msg) which is established on initialization. Once a MR is created by primary process, it is immediately visible to secondary process because the MR list is global per a device. Thus, secondary process can look up the list after the request is successfully returned. Signed-off-by: Yongseok Koh <yskoh@mellanox.com> Acked-by: Shahaf Shuler <shahafs@mellanox.com>	2019-04-05 17:45:22 +02:00
Yongseok Koh	f4efc0eb97	net/mlx4: add control of excessive memory pinning by kernel A new PMD parameter (mr_ext_memseg_en) is added to control extension of memseg when creating a MR. It is enabled by default. If enabled, mlx4_mr_create() tries to maximize the range of MR registration so that the LKey lookup tables on datapath become smalle and get the best performance. However, it may worsen memory utilization because registered memory is pinned by kernel driver. Even if a page in the extended chunk is freed, that doesn't become reusable until the entire memory is freed and the MR is destroyed. To make freed pages available immediately, this parameter has to be turned off but it could drop performance. Signed-off-by: Yongseok Koh <yskoh@mellanox.com> Acked-by: Shahaf Shuler <shahafs@mellanox.com>	2019-04-05 17:45:22 +02:00
Yongseok Koh	3d1f3c7c83	net/mlx: remove debug messages on datapath Cc: stable@dpdk.org Signed-off-by: Yongseok Koh <yskoh@mellanox.com> Acked-by: Shahaf Shuler <shahafs@mellanox.com>	2019-04-05 17:45:22 +02:00
Yongseok Koh	0203d33a10	net/mlx4: support secondary process In order to support secondary process, a few features are required. a) rdma-core library should allocate device resources using DPDK's memory allocator. b) UAR should be remapped for secondary processes. Currently, in order not to use different data structure for secondary processes, PMD tries to reserve identical virtual address space for both primary and secondary processes. c) IPC channel is necessary, which can be easily set with rte_mp APIs. Through the channel, Verbs command FD is delivered to the secondary process and the device stop/start event is also broadcast from primary process. Signed-off-by: Yongseok Koh <yskoh@mellanox.com> Acked-by: Shahaf Shuler <shahafs@mellanox.com>	2019-04-05 17:45:22 +02:00
Yongseok Koh	099c2c5376	net/mlx4: change device reference for secondary process rte_eth_devices[] is not shared between primary and secondary process, but a static array to each process. The reverse pointer of device (priv->dev) becomes invalid if mlx4 supports secondary process. Instead, priv has the pointer to shared data of the device, struct rte_eth_dev_data *dev_data; Two macros are added, #define PORT_ID(priv) ((priv)->dev_data->port_id) #define ETH_DEV(priv) (&rte_eth_devices[PORT_ID(priv)]) Cc: stable@dpdk.org Suggested-by: Raslan Darawsheh <rasland@mellanox.com> Signed-off-by: Yongseok Koh <yskoh@mellanox.com> Acked-by: Shahaf Shuler <shahafs@mellanox.com>	2019-04-05 17:45:22 +02:00
Thomas Monjalon	dbeba4cf18	net/mlx: prefix private structure The private structure stored in rte_eth_dev->data->dev_private was named "struct priv". In order to ease code browsing, the structure is renamed "struct mlx[45]_priv". Cc: stable@dpdk.org Signed-off-by: Thomas Monjalon <thomas@monjalon.net> Acked-by: Yongseok Koh <yskoh@mellanox.com>	2019-03-01 18:17:35 +01:00
Yongseok Koh	1948776360	net/mlx4: optimize Tx external memory registration There's some performance drop due to extra condition checks on the datapath. Checking for external memory registration should be consolidated to the existing bottom-half. Fixes: `31912d9924` ("net/mlx4: support externally allocated static memory") Cc: stable@dpdk.org Signed-off-by: Yongseok Koh <yskoh@mellanox.com> Acked-by: Shahaf Shuler <shahafs@mellanox.com>	2018-11-16 10:45:37 +01:00
Ali Alnubani	d924d6b964	net/mlx4: fix initialization of struct members This patch fixes compilation errors with meson and the clang compiler caused by some of the struct members not being initialized. ``` ../drivers/net/mlx4/mlx4_mr.c:357:37: error: missing field 'end' initializer [-Werror,-Wmissing-field-initializers] struct mlx4_mr_cache entry = { 0, }; ^ ../drivers/net/mlx4/mlx4_mr.c:401:36: error: missing field 'end' initializer [-Werror,-Wmissing-field-initializers] struct mlx4_mr_cache ret = { 0, }; ^ ../drivers/net/mlx4/mlx4_mr.c:691:35: error: missing field 'end' initializer [-Werror,-Wmissing-field-initializers] struct mlx4_mr_cache ret = { 0, }; ^ ``` The compilation errors reproduce with clang version 3.4.2 (tags/RELEASE_34/dot2-final) on RHEL. Fixes: `9797bfcce1` ("net/mlx4: add new memory region support") Cc: stable@dpdk.org Signed-off-by: Ali Alnubani <alialnu@mellanox.com> Acked-by: Shahaf Shuler <shahafs@mellanox.com>	2018-11-15 23:54:53 +01:00
Ali Alnubani	96c0cc17fc	net/mlx4: fix minor typo Fixes: `9797bfcce1` ("net/mlx4: add new memory region support") Cc: stable@dpdk.org Signed-off-by: Ali Alnubani <alialnu@mellanox.com> Acked-by: Shahaf Shuler <shahafs@mellanox.com>	2018-11-15 23:54:53 +01:00
Yongseok Koh	31912d9924	net/mlx4: support externally allocated static memory When MLX PMD registers memory for DMA, it accesses the global memseg list of DPDK to maximize the range of registration so that LKey search can be more efficient. Granularity of MR registration is per page. Externally allocated memory shouldn't be used for DMA because it can't be searched in the memseg list and free event can't be tracked by DPDK. If it is used, the following error will occur: net_mlx5: port 0 unable to find virtually contiguous chunk for address (0x5600017587c0). rte_memseg_contig_walk() failed. There's a pending patchset [1] which enables externally allocated memory. Once it is merged, users can register their own memory out of EAL then that will resolve this issue. Meanwhile, if the external memory is static (allocated on startup and never freed), such memory can also be registered by little tweak in the code. [1] http://patches.dpdk.org/project/dpdk/list/?series=1415 This patch is not a bug fix but needs to be included in stable versions. Fixes: `9797bfcce1` ("net/mlx4: add new memory region support") Cc: stable@dpdk.org Signed-off-by: Yongseok Koh <yskoh@mellanox.com>	2018-10-11 18:53:49 +02:00
Yongseok Koh	9797bfcce1	net/mlx4: add new memory region support This is the new design of Memory Region (MR) for mlx PMD, in order to: - Accommodate the new memory hotplug model. - Support non-contiguous Mempool. There are multiple layers for MR search. L0 is to look up the last-hit entry which is pointed by mr_ctrl->mru (Most Recently Used). If L0 misses, L1 is to look up the address in a fixed-sized array by linear search. L0/L1 is in an inline function - mlx4_mr_lookup_cache(). If L1 misses, the bottom-half function is called to look up the address from the bigger local cache of the queue. This is L2 - mlx4_mr_addr2mr_bh() and it is not an inline function. Data structure for L2 is the Binary Tree. If L2 misses, the search falls into the slowest path which takes locks in order to access global device cache (priv->mr.cache) which is also a B-tree and caches the original MR list (priv->mr.mr_list) of the device. Unless the global cache is overflowed, it is all-inclusive of the MR list. This is L3 - mlx4_mr_lookup_dev(). The size of the L3 cache table is limited and can't be expanded on the fly due to deadlock. Refer to the comments in the code for the details - mr_lookup_dev(). If L3 is overflowed, the list will have to be searched directly bypassing the cache although it is slower. If L3 misses, a new MR for the address should be created - mlx4_mr_create(). When it creates a new MR, it tries to register adjacent memsegs as much as possible which are virtually contiguous around the address. This must take two locks - memory_hotplug_lock and priv->mr.rwlock. Due to memory_hotplug_lock, there can't be any allocation/free of memory inside. In the free callback of the memory hotplug event, freed space is searched from the MR list and corresponding bits are cleared from the bitmap of MRs. This can fragment a MR and the MR will have multiple search entries in the caches. Once there's a change by the event, the global cache must be rebuilt and all the per-queue caches will be flushed as well. If memory is frequently freed in run-time, that may cause jitter on dataplane processing in the worst case by incurring MR cache flush and rebuild. But, it would be the least probable scenario. To guarantee the most optimal performance, it is highly recommended to use an EAL option - '--socket-mem'. Then, the reserved memory will be pinned and won't be freed dynamically. And it is also recommended to configure per-lcore cache of Mempool. Even though there're many MRs for a device or MRs are highly fragmented, the cache of Mempool will be much helpful to reduce misses on per-queue caches anyway. '--legacy-mem' is also supported. Signed-off-by: Yongseok Koh <yskoh@mellanox.com>	2018-05-14 22:31:52 +01:00
Yongseok Koh	2d684b911d	net/mlx4: remove memory region support This patch removes current support of Memory Region (MR) in order to accommodate the dynamic memory hotplug patch. This patch can be compiled but traffic can't flow and HW will raise faults. Subsequent patches will add new MR support. Signed-off-by: Yongseok Koh <yskoh@mellanox.com>	2018-05-14 22:31:51 +01:00
Yongseok Koh	96525b9e19	net/mlx4: fix alignment of memory region The memory region is [start, end), so if the memseg of 'end' isn't allocated yet, the returned memseg will have zero entries and this will make 'end' zero (nil). Fixes: `c2fe582322` ("net/mlx4: use virt2memseg instead of iteration") Signed-off-by: Yongseok Koh <yskoh@mellanox.com> Acked-by: Adrien Mazarguil <adrien.mazarguil@6wind.com>	2018-04-27 15:54:56 +01:00
Anatoly Burakov	66cc45e293	mem: replace memseg with memseg lists Before, we were aggregating multiple pages into one memseg, so the number of memsegs was small. Now, each page gets its own memseg, so the list of memsegs is huge. To accommodate the new memseg list size and to keep the under-the-hood workings sane, the memseg list is now not just a single list, but multiple lists. To be precise, each hugepage size available on the system gets one or more memseg lists, per socket. In order to support dynamic memory allocation, we reserve all memory in advance (unless we're in 32-bit legacy mode, in which case we do not preallocate memory). As in, we do an anonymous mmap() of the entire maximum size of memory per hugepage size, per socket (which is limited to either RTE_MAX_MEMSEG_PER_TYPE pages or RTE_MAX_MEM_MB_PER_TYPE megabytes worth of memory, whichever is the smaller one), split over multiple lists (which are limited to either RTE_MAX_MEMSEG_PER_LIST memsegs or RTE_MAX_MEM_MB_PER_LIST megabytes per list, whichever is the smaller one). There is also a global limit of CONFIG_RTE_MAX_MEM_MB megabytes, which is mainly used for 32-bit targets to limit amounts of preallocated memory, but can be used to place an upper limit on total amount of VA memory that can be allocated by DPDK application. So, for each hugepage size, we get (by default) up to 128G worth of memory, per socket, split into chunks of up to 32G in size. The address space is claimed at the start, in eal_common_memory.c. The actual page allocation code is in eal_memalloc.c (Linux-only), and largely consists of copied EAL memory init code. Pages in the list are also indexed by address. That is, in order to figure out where the page belongs, one can simply look at base address for a memseg list. Similarly, figuring out IOVA address of a memzone is a matter of finding the right memseg list, getting offset and dividing by page size to get the appropriate memseg. This commit also removes rte_eal_dump_physmem_layout() call, according to deprecation notice [1], and removes that deprecation notice as well. On 32-bit targets due to limited VA space, DPDK will no longer spread memory to different sockets like before. Instead, it will (by default) allocate all of the memory on socket where master lcore is. To override this behavior, --socket-mem must be used. The rest of the changes are really ripple effects from the memseg change - heap changes, compile fixes, and rewrites to support fbarray-backed memseg lists. Due to earlier switch to _walk() functions, most of the changes are simple fixes, however some of the _walk() calls were switched to memseg list walk, where it made sense to do so. Additionally, we are also switching locks from flock() to fcntl(). Down the line, we will be introducing single-file segments option, and we cannot use flock() locks to lock parts of the file. Therefore, we will use fcntl() locks for legacy mem as well, in case someone is unfortunate enough to accidentally start legacy mem primary process alongside an already working non-legacy mem-based primary process. [1] http://dpdk.org/dev/patchwork/patch/34002/ Signed-off-by: Anatoly Burakov <anatoly.burakov@intel.com> Tested-by: Santosh Shukla <santosh.shukla@caviumnetworks.com> Tested-by: Hemant Agrawal <hemant.agrawal@nxp.com> Tested-by: Gowrishankar Muthukrishnan <gowrishankar.m@linux.vnet.ibm.com>	2018-04-11 19:55:39 +02:00
Anatoly Burakov	c2fe582322	net/mlx4: use virt2memseg instead of iteration Reduce dependency on internal details of EAL memory subsystem, and simplify code. Signed-off-by: Anatoly Burakov <anatoly.burakov@intel.com> Tested-by: Santosh Shukla <santosh.shukla@caviumnetworks.com> Tested-by: Hemant Agrawal <hemant.agrawal@nxp.com> Tested-by: Gowrishankar Muthukrishnan <gowrishankar.m@linux.vnet.ibm.com>	2018-04-11 19:55:00 +02:00
Shahaf Shuler	5feecc57d9	align SPDX Mellanox copyrights Aligning Mellanox SPDX copyrights to a single format. In addition replace to SPDX licence files which were missed. Signed-off-by: Shahaf Shuler <shahafs@mellanox.com> Acked-by: Adrien Mazarguil <adrien.mazarguil@6wind.com>	2018-04-11 01:47:47 +02:00
Olivier Matz	82092c8734	net/mlx4: use SPDX tags in 6WIND copyrighted files Signed-off-by: Olivier Matz <olivier.matz@6wind.com> Acked-by: Bruce Richardson <bruce.richardson@intel.com> Acked-by: Thomas Monjalon <thomas@monjalon.net>	2018-02-01 02:33:04 +01:00
Adrien Mazarguil	4eba244b78	net/mlx4: move rdma-core calls to separate file This lays the groundwork for externalizing rdma-core as an optional run-time dependency instead of a mandatory one. No functional change. Signed-off-by: Adrien Mazarguil <adrien.mazarguil@6wind.com> Acked-by: Nelio Laranjeiro <nelio.laranjeiro@6wind.com>	2018-01-31 20:57:29 +01:00
Adrien Mazarguil	0d03353077	net/mlx4: share memory region resources Memory regions assigned to hardware and used during Tx/Rx are mapped to mbuf pools. Each Rx queue creates its own MR based on the mempool provided during queue setup, while each Tx queue looks up and registers MRs for all existing mbuf pools instead. Since most applications use few large mbuf pools (usually only a single one per NUMA node) common to all Tx/Rx queues, the above approach wastes hardware resources due to redundant MRs. This negatively affects performance, particularly with large numbers of queues. This patch therefore makes the entire MR registration common to all queues using a reference count. A spinlock is added to protect against asynchronous registration that may occur from the Tx side where new mempools are discovered based on mbuf data. Signed-off-by: Adrien Mazarguil <adrien.mazarguil@6wind.com>	2017-11-03 21:30:41 +01:00
Ophir Munk	326d2cdf7b	net/mlx4: associate MR to MP in a short function Associate memory region to mempool (on data path) in a short function. Handle the less common case of adding a new memory region to mempool in a separate function. Signed-off-by: Ophir Munk <ophirmu@mellanox.com> Acked-by: Adrien Mazarguil <adrien.mazarguil@6wind.com>	2017-11-03 20:22:07 +01:00
Adrien Mazarguil	655588afc8	net/mlx4: separate memory management functions No impact on functionality. Signed-off-by: Adrien Mazarguil <adrien.mazarguil@6wind.com>	2017-10-06 02:49:48 +02:00

28 Commits