if it is the only multiplexed device. Also enable syncbit checks for them.
This fixes touchpad recognition on Panasonic Toughbook CF-MX4 laptop.
Reported by: Tomasz "CeDeROM" CEDRO <tomek_AT_cedro_DOT_info>
MFC after: 1 month
PR: 253279
Differential revision: https://reviews.freebsd.org/D28502
These registers have read side effects and a read at just the right
(wrong?) time can trash some internal hw state.
Obtained from: Chelsio Communications
MFC after: 1 week
Sponsored by: Chelsio Communications
Reduce the live ranges for three variables so that they do not span the
call to PHYS_TO_VM_PAGE(). This enables the compiler to generate
slightly smaller machine code.
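For illustration, the pattern looks roughly like this; the variable names and
the surrounding pmap details are invented here, not the committed code:

    vm_page_t m;
    vm_offset_t va;
    pt_entry_t *pte;

    m = PHYS_TO_VM_PAGE(pa);
    /*
     * "va" and "pte" are only needed from here on, so they are assigned
     * after the call; their live ranges no longer span PHYS_TO_VM_PAGE()
     * and need not be preserved across it.
     */
    va = PHYS_TO_DMAP(pa);
    pte = (pt_entry_t *)va;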
Reviewed by: kib, markj
MFC after: 1 week
Differential Revision: https://reviews.freebsd.org/D31161
The problem is that ns8250_bus_probe() accesses a field from the
ns8250_softc, which embeds the generic UART softc, but the ns8250_softc
hasn't yet been allocated because we're still probing.
This is a regression from commit 0aefb0a63c. That commit fixed a problem
where one of the upper four IER bits, which are usually reserved, needs
to be set in order to get RX interrupts before the RX FIFO is full. At
the same time, we avoid clearing those reserved bits (see commit
58957d8717, though other UART drivers I looked at do not bother with
this).
So, copy what ns8250_init() does to disable interrupts, since we don't
know what the "right" mask is at this point.
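For reference, a sketch of the masking that ns8250_init() applies and that the
probe path now mirrors; the 0xf0 mask and the surrounding code are illustrative
only:

    uint8_t ier;

    /*
     * Clear the standard interrupt-enable bits while preserving the
     * upper, possibly-reserved IER bits; the per-device ier_mask and
     * ier_rxbits in the ns8250_softc are not available at probe time.
     */
    ier = uart_getreg(bas, REG_IER) & 0xf0;
    uart_setreg(bas, REG_IER, ier);
    uart_barrier(bas);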
Reported by: syzbot+f256beefd0df9eb796e7@syzkaller.appspotmail.com
Reviewed by: imp
MFC after: 2 weeks
Sponsored by: The FreeBSD Foundation
Differential Revision: https://reviews.freebsd.org/D31124
Commit bb4a27f927 added the ability to allocate a span of blocks
crossing a meta node boundary. To ensure that blst_next_leaf_alloc()
does not walk past the end of the tree, an extra all-zero meta node
needs to be present at the end of the allocation, and
blst_next_leaf_alloc() is implemented such that the presence of this
node terminates the search.
blist_create() computes the number of nodes required. It had two
problems:
1. When the size of the blist is a power of BLIST_RADIX, we would
unnecessarily allocate an extra level in the tree.
2. When the size of the blist is a multiple of BLIST_RADIX, we would
fail to allocate a terminator node. In this case,
blst_next_leaf_alloc() could scan beyond the bounds of the
allocation. This was found using KASAN.
Modify blist_create() to handle these cases correctly.
Reported by: pho
Reviewed by: dougm
MFC after: 2 weeks
Differential Revision: https://reviews.freebsd.org/D31158
Its callers do not make use of the modified size that malloc_large() was
returning, so there's no need to pass a pointer. No functional change
intended.
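For reference, a hedged sketch of the resulting interface; the real static
prototype in kern_malloc.c carries additional parameters (e.g. a domainset
policy):

    /*
     * Before, the rounded-up size was passed back through "*size";
     * callers ignore it, so the size is now taken by value.
     */
    static void *malloc_large(size_t size, struct malloc_type *mtp,
        int flags);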
MFC after: 2 weeks
Sponsored by: The FreeBSD Foundation
Ensure that string buffers and pad bytes are zero-filled before writing
graid3 metadata.
Reported by: KMSAN
MFC after: 2 weeks
Sponsored by: The FreeBSD Foundation
Ensure that string buffers and pad bytes are zero-filled before writing
gconcat metadata. Also make sure to zero the full block buffer before
encoding the metadata and writing.
Fix some style bugs in g_concat_write_metadata() while here.
Reported by: KMSAN
MFC after: 2 weeks
Sponsored by: The FreeBSD Foundation
The mirror metadata fields contain string buffers and pad bytes, neither of
which was being zeroed before the metadata was written to disk. Also, the
metadata structure is smaller than the sector size, and in one case
gmirror was failing to zero-fill the full buffer before writing.
Fix these problems by pre-zeroing the metadata structure and the sector
buffer.
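A sketch of the zero-before-encode pattern; the buffer size and the surrounding
code are illustrative, not the literal gmirror change:

    struct g_mirror_metadata md;
    u_char sector[512];             /* illustrative; use the provider's sectorsize */

    bzero(&md, sizeof(md));         /* string buffers and pad bytes start zeroed */
    strlcpy(md.md_name, sc->sc_name, sizeof(md.md_name));
    /* ... fill in the remaining metadata fields ... */
    bzero(sector, sizeof(sector));  /* no stale bytes past the encoded metadata */
    mirror_metadata_encode(&md, sector);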
Reported by: KMSAN
MFC after: 2 weeks
Sponsored by: The FreeBSD Foundation
The fi_rgen and fi_wgen fields are generation numbers used when sleeping
waiting for the other end of the fifo to be opened. The fields were not
explicitly initialized after allocation, but this was harmless. To
avoid false positives from KMSAN, though, ensure that they get
initialized to zero.
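A sketch of the change; the allocation details are illustrative rather than
literal:

    struct fifoinfo *fip;

    fip = malloc(sizeof(*fip), M_VNODE, M_WAITOK);  /* malloc type illustrative */
    fip->fi_readers = fip->fi_writers = 0;
    fip->fi_rgen = fip->fi_wgen = 0;    /* harmless before, but keeps KMSAN
                                         * quiet when a sleeper reads them */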
Reported by: KMSAN
MFC after: 2 weeks
Sponsored by: The FreeBSD Foundation
kern.cam.do_dynamic_iosched is really a bool, so change its type to
bool. While I'm here, also use the CTLFLAG_TUN flag instead of a
separate tunable line for it and kern.cam.iosched_alpha_bits.
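The resulting declaration looks roughly like this; the exact flag combination
and description string are illustrative:

    static bool do_dynamic_iosched = true;
    SYSCTL_BOOL(_kern_cam, OID_AUTO, do_dynamic_iosched, CTLFLAG_RDTUN,
        &do_dynamic_iosched, 0, "Enable dynamic I/O scheduler optimizations");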
MFC after: 1 week
Sponsored by: Netflix
In ata_dev_advinfo() and nvme_dev_advinfo(), if the physical path is
being stored and there is a malloc failure (malloc(9) is called with
M_NOWAIT), we could wind up in a situation where the device's
physpath_len is set to the length the user provided, but the physpath
itself is NULL.
If another context then comes in to fetch the physical path value, we
would wind up trying to memcpy a NULL pointer into the caller's buffer.
So, set the physpath_len to 0 when we free the physpath on entry into
the store case for the physical path. Reset the length to a non-zero
value only after we've successfully malloced a buffer to hold it.
This mirrors what scsi_xpt.c already does.
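A hedged sketch of the resulting ordering in the store path; the error status
and malloc type are illustrative, and the snippet sits inside the existing
switch case on cdai->buftype:

    free(device->physpath, M_CAMXPT);
    device->physpath = NULL;
    device->physpath_len = 0;       /* keep the length consistent with a NULL buffer */
    device->physpath = malloc(cdai->bufsiz, M_CAMXPT, M_NOWAIT);
    if (device->physpath == NULL) {
        start_ccb->ccb_h.status = CAM_REQ_ABORTED;
        break;
    }
    device->physpath_len = cdai->bufsiz;    /* non-zero only once the buffer exists */
    memcpy(device->physpath, cdai->buf, cdai->bufsiz);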
Signed-off-by: Young Xiao <92siuyang@gmail.com>
Reviewed by: imp
PR: 238014
The ACPI parsing code around rid ranges was wrong in assuming that there is
only one pair of start/end device id ranges. Besides, ivhd_dev_parse()
never worked as supposed; the start/end rid info was always zero.
Restructure the code to build dynamically sized tables of device entries
for each IOMMU softc. The device entries are enumerated to find a
suitable IOMMU unit. Operations on devices not governed by an IOMMU
(e.g. the IOMMU unit itself) are no-ops from now on. There is also a
minor fix for a wrong %b format string usage.
Tested on my EPYC 7282.
Sponsored by: The FreeBSD Foundation
Reviewed by: grehan
Differential Revision: https://reviews.freebsd.org/D30827
In reviewing tcp_lro.c we found a possibility that some drivers may send an
mbuf into LRO without making sure that the checksum passes. Some drivers
are aware of this and do not call LRO when the csum failed; others do not,
and thus could end up sending data up that we think has a passing checksum
when it does not.
This change fixes that situation by properly verifying that the mbuf has
the correct markings (the CSUM valid bits as well as the csum in the mbuf
header set to 0xffff).
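A sketch of the check as it applies to IPv4/TCP; the flag names are the
standard mbuf(9) ones and the placement within tcp_lro_rx() is simplified:

    if ((m->m_pkthdr.csum_flags & (CSUM_DATA_VALID | CSUM_PSEUDO_HDR)) !=
        (CSUM_DATA_VALID | CSUM_PSEUDO_HDR) ||
        m->m_pkthdr.csum_data != 0xffff)
        return (TCP_LRO_CANNOT);    /* leave the packet to the normal path */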
Reviewed by: tuexen, hselasky, gallatin
Sponsored by: Netflix Inc.
Differential Revision: https://reviews.freebsd.org/D31155
Changes since 1.25.6.0 are listed here. This list comes from the
Release Notes for "Chelsio Unified Wire 3.14.0.4 for Linux" dated
2021-07-08.
Fixes
-----
BASE:
- Wait 5ms before and after the i2c command that clears the mod_select.
This fixes incorrect port module type read from i2c.
Obtained from: Chelsio Communications
MFC after: 1 week
Sponsored by: Chelsio Communications
If MMC_SIM_CAM_REQUEST() is successful, the ccb could be running or being
completed as the method returns. Modifying the ccb status could override
whatever status was already set by an MMC driver.
I am not sure what the purpose of setting the status to CAM_REQ_INVALID
in the success path was. I assume that it was to catch the possibility that
the ccb could be completed without its status explicitly set. So, I am
keeping the code; it is just moved to before the MMC_SIM_CAM_REQUEST() call.
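Roughly, with illustrative names:

    ccb->ccb_h.status = CAM_REQ_INVALID;    /* fallback, in case the ccb is
                                             * completed without a status */
    MMC_SIM_CAM_REQUEST(dev, ccb);
    /*
     * From here on the ccb may already be running or completed by the
     * MMC driver; its status must not be modified anymore.
     */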
Without this change I was getting random and phantom EIO errors on Rock64
running off an SD card (dwmmc driver) plus occasional panics like:
Memory modified after free 0xffffa00003985800(2040) val=6 @ 0xffffa00003985854
panic: Most recently used by CAM CCB
MFC after: 1 week
Call infiniband_ifdetach() early to stop ifioctl(9) calls from user-space
during device removal. Also make sure that ifioctl(9) calls are blocked from
executing until the device is fully initialized. Ideally we would delay the
infiniband_ifattach() call, but because part of the initialization is to update
the link level address, that is not possible without more significant changes.
MFC after: 1 week
Reviewed by: kib
Sponsored by: Mellanox Technologies // NVIDIA Networking
Properly allocate all mlx5en(4) structures from the correct NUMA domain.
While at it, clean up unused NUMA domain integers deriving from the
Linux version of mlx5en(4).
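A hedged sketch of the allocation pattern; the malloc type, device handle and
structure names are illustrative:

    struct mlx5e_priv *priv;
    struct domainset *ds;
    int domain;

    /* "dev" stands in for the port's device_t. */
    if (bus_get_domain(dev, &domain) == 0)
        ds = DOMAINSET_PREF(domain);
    else
        ds = DOMAINSET_RR();
    priv = malloc_domainset(sizeof(*priv), M_MLX5EN, ds, M_WAITOK | M_ZERO);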
MFC after: 1 week
Reviewed by: kib
Sponsored by: Mellanox Technologies // NVIDIA Networking
Make sure the "uid" field gets properly set when destroying DCT and QP
objects by making a copy of the field when creating such objects.
MFC after: 1 week
Reviewed by: kib
Sponsored by: Mellanox Technologies // NVIDIA Networking
Currently, when we map the hca_core_clock page to user space, there are
vulnerable registers, one of which is a semaphore, on this page as well.
If the user reads the wrong offset, it can modify the above semaphore and
hang the device.
Hence, map the hca_core_clock page to user space only when the user
specifically requests it.
After this patch, the mlx4 core_clock won't be mapped to user space by
default, as opposed to the current state, where the mlx4 core_clock is
always mapped to user space.
MFC after: 1 week
Reviewed by: kib
Sponsored by: Mellanox Technologies // NVIDIA Networking
To avoid congestion on the same PCI memory register space when
traffic consists mostly of small packets.
MFC after: 1 week
Reviewed by: kib
Sponsored by: Mellanox Technologies // NVIDIA Networking
The driver expects all TLS tags to be returned to the driver before
it can free the UMA zone where the TLS tags reside.
MFC after: 1 week
Reviewed by: kib
Sponsored by: Mellanox Technologies // NVIDIA Networking
The IB spec says that a lid should be ignored when the link layer is
Ethernet, for example when building or parsing a CM request message
(CA17-34). However, since ib_lid_be16() and ib_lid_cpu16() validate the
slid, and not only when the link layer is IB, we set the slid to zero to
prevent false warnings in the kernel log.
Linux commit:
65389322b28f81cc137b60a41044c2d958a7b950
MFC after: 1 week
Reviewed by: kib
Sponsored by: Mellanox Technologies // NVIDIA Networking
RoCE is short for Remote Direct Memory Access over Converged Ethernet.
ECN is short for Explicit Congestion Notification.
MFC after: 1 week
Reviewed by: kib
Sponsored by: Mellanox Technologies // NVIDIA Networking
Querying the PCI config space for the offline state for every firmware
command blocks the PCI bus and affects performance, especially for packet
pacing and TLS, where objects are frequently created and destroyed.
MFC after: 1 week
Reviewed by: kib
Sponsored by: Mellanox Technologies // NVIDIA Networking
Since neither ib_post_send() nor ib_post_recv() modify the data structure
their second argument points at, declare that argument const. This change
makes it necessary to declare the 'bad_wr' argument const too and also to
modify all ULPs that call ib_post_send(), ib_post_recv() or
ib_post_srq_recv(). This patch does not change any functionality but makes
it possible for the compiler to verify whether the
ib_post_(send|recv|srq_recv) really do not modify the posted work request.
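The resulting prototypes look roughly like this (see ib_verbs.h for the
authoritative declarations):

    int ib_post_send(struct ib_qp *qp, const struct ib_send_wr *send_wr,
        const struct ib_send_wr **bad_send_wr);
    int ib_post_recv(struct ib_qp *qp, const struct ib_recv_wr *recv_wr,
        const struct ib_recv_wr **bad_recv_wr);
    int ib_post_srq_recv(struct ib_srq *srq, const struct ib_recv_wr *recv_wr,
        const struct ib_recv_wr **bad_recv_wr);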
Linux commit:
f696bf6d64b195b83ca1bdb7cd33c999c9dcf514
7bb1fafc2f163ad03a2007295bb2f57cfdbfb630
d34ac5cd3a73aacd11009c4fc3ba15d7ea62c411
MFC after: 1 week
Reviewed by: kib
Sponsored by: Mellanox Technologies // NVIDIA Networking
These fields declare which timestamp mode is supported
by the device per RQ/SQ/QP.
In addition, add the ts_format field to select the mode for the RQ/SQ/QP.
Linux commit:
a6a217dddcd544f6b75f0e2a60b6e84c1d494b7e
MFC after: 1 week
Reviewed by: kib
Sponsored by: Mellanox Technologies // NVIDIA Networking
Expose ib_ucontext from a given ib_uverbs_file. Drivers that use the ioctl(9)
API may have the ib_uverbs_file and need a way to get the related ib_ucontext
from it; this is enabled by this patch.
Downstream patches from this series will use it.
Linux commit:
7dc08dcfc8c86cb4457e383734ff6844ddaff876
MFC after: 1 week
Reviewed by: kib
Sponsored by: Mellanox Technologies // NVIDIA Networking
We get a harmless warning about the fact that we use the result of a
multiplication as a condition in INIT_UDATA_BUF_OR_NULL():
uverbs_main.c: In function 'ib_uverbs_write':
error: '*' in boolean context, suggest '&&' instead [-Werror=int-in-bool-context]
This avoids the problem by using an inline function in place of
the macro.
After changing INIT_UDATA_BUF_OR_NULL() to an inline function,
do the same change to INIT_UDATA() for consistency.
Using an inline function gives us better type safety here among other
issues with macros. I'm using u64_to_user_ptr() to convert the user
pointer to simplify the logic rather than adding lots of new type casts.
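A sketch of the shape of the resulting inline helper; the exact names follow
the referenced Linux commits:

    static inline void
    ib_uverbs_init_udata_buf_or_null(struct ib_udata *udata,
        const void __user *ibuf, void __user *obuf, size_t ilen, size_t olen)
    {
        ib_uverbs_init_udata(udata, ilen ? ibuf : NULL,
            olen ? obuf : NULL, ilen, olen);
    }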
Linux commit:
12f727721eee61b3d19dedb95cb893b2baa9fe41
40a203396cc1c239f2e71c47c66ed03097123d2c
MFC after: 1 week
Reviewed by: kib
Sponsored by: Mellanox Technologies // NVIDIA Networking
All callers of ib_modify_qp_is_ok() provide an enum ib_qp_state, which makes
the out-of-scope checks redundant. Remove them and update the function
signature to return a boolean result.
While at it, remove the unused "ll" parameter from ib_modify_qp_is_ok().
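The resulting prototype, per the referenced Linux commits:

    bool ib_modify_qp_is_ok(enum ib_qp_state cur_state,
        enum ib_qp_state next_state, enum ib_qp_type type,
        enum ib_qp_attr_mask mask);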
Linux commit:
19b1f54099b6ee334acbfbcfbdffd1d1f057216d
d31131bba5a1630304c55ea775c48cc84912ab59
MFC after: 1 week
Reviewed by: kib
Sponsored by: Mellanox Technologies // NVIDIA Networking
Add a new member, rate_limit, to ib_qp_attr, which holds the packet pacing
rate in kbps; 0 means unlimited.
IB_QP_RATE_LIMIT is added to ib_attr_mask and can be used by RAW QPs when
changing the QP state from RTR to RTS or from RTS to RTS.
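A hedged usage sketch for a RAW packet QP handle "qp" that is already in RTS;
the rate value is arbitrary:

    struct ib_qp_attr attr;
    int ret;

    memset(&attr, 0, sizeof(attr));
    attr.qp_state = IB_QPS_RTS;
    attr.rate_limit = 100000;   /* kbps; 0 means unlimited */
    ret = ib_modify_qp(qp, &attr, IB_QP_STATE | IB_QP_RATE_LIMIT);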
Linux commit:
528e5a1bd3f0e9b760cb3a1062fce7513712a15d
MFC after: 1 week
Reviewed by: kib
Sponsored by: Mellanox Technologies // NVIDIA Networking
Add the new rates that were added to the InfiniBand spec as part of
HDR and 2x support.
Linux commit:
a5a5d1993696419e7d5357fc3128e53d219d382e
MFC after: 1 week
Reviewed by: kib
Sponsored by: Mellanox Technologies // NVIDIA Networking
There is a race condition between ucma_close() and ucma_resolve_ip():
   CPU0                                CPU1
   ucma_resolve_ip():                  ucma_close():

   ctx = ucma_get_ctx(file, cmd.id);

                                       list_for_each_entry_safe(ctx, tmp,
                                               &file->ctx_list, list) {
                                           mutex_lock(&mut);
                                           idr_remove(&ctx_idr, ctx->id);
                                           mutex_unlock(&mut);
                                           ...
                                           mutex_lock(&mut);
                                           if (!ctx->closing) {
                                               mutex_unlock(&mut);
                                               rdma_destroy_id(ctx->cm_id);
                                           ...
                                           ucma_free_ctx(ctx);
                                       }

   ret = rdma_resolve_addr();
   ucma_put_ctx(ctx);
Before idr_remove(), ucma_get_ctx() could still find the ctx, and after
rdma_destroy_id(), rdma_resolve_addr() may still access the id_priv
pointer. Also, ucma_put_ctx() may use the ctx after ucma_free_ctx() too.
ucma_close() should call ucma_put_ctx() too, which tests the refcnt and
waits for the last one to release it. A similar pattern is already used
by ucma_destroy_id().
Linux commit:
5fe23f262e0548ca7f19fb79f89059a60d087d22
MFC after: 1 week
Reviewed by: kib
Sponsored by: Mellanox Technologies // NVIDIA Networking
Define a new option in 'rdma_set_option' to override the calculated QP
timeout when requested to provide QP attributes to modify a QP.
At the same time, pack tos_set to be a bitfield.
Linux commit:
2c1619edef61a03cb516efaa81750784c3071d10
MFC after: 1 week
Reviewed by: kib
Sponsored by: Mellanox Technologies // NVIDIA Networking
When IPoIB receives an SM LID change event, it reacts by flushing its
path record cache and rejoining multicast groups. This is the same
behavior it performs when it receives a reregistration event. This
behavior is unnecessary as an SM may have database backup or
synchronization mechanisms which permit the SM location or LID to change
without loss of multicast membership and without impact to path records.
Both opensm and the OPA FM issue reregistration events if a new SM is
started (or restarted with a new config) or an SM event occurs which
results in loss of multicast membership records by the SM (such as
opensm failover) or the SM encounters new nodes with Active ports (such
as after joining 2 fabrics by connecting switches via ISLs). Hence this
event can be depended on as the trigger for IPoIB cache and multicast
flushing.
It appears that some drivers, such as qib and hfi1, issue the
IB_EVENT_SM_CHANGE, but other drivers, such as mlx4 and mlx5, do not.
Empirical testing on Mellanox EDR using ibv_asyncwatch has confirmed
that Mellanox EDR HCAs do not generate SM change events and that opensm
does generate reregistration.
An SM LID change event is generated by the mentioned drivers to reflect
that sm_lid and/or sm_sl in the local port info has changed. The intent
of this event is to permit applications and ULPs which have a local copy
of this information (or an address handle using it) to update their
information.
The intent is that the reregistration event (caused by the SM via a bit
in Set(PortInfo)) be used to inform nodes that they need to rejoin
multicast groups, resubscribe for notices and potentially update path
records.
When an SM migrates or fails over, an SM LID change event can occur. In
response IPoIB discards path records and multicast membership and loses
connectivity until these records are restored via SA requests. In very
large fabrics, it may take minutes for the SM to be ready and for the SA
responses to be supplied. This can result in undesirable and
unnecessary IPoIB connectivity impacts. It also can result in an
unnecessary storm of SA queries from all nodes in a cluster potentially
followed by yet another storm if the SM issues the reregistration
request.
The fact that Mellanox HCAs do not even generate this event is further
evidence that on modern IB fabrics there will be no ill side effects from
the proposed changes below, which reduce the reaction of 3 kernel
components to this event. So these changes should be benign for Mellanox
IB fabrics and will benefit OPA fabrics, while also making ib_core and
ULP behavior "correct" as intended by the IBTA spec and kernel RDMA event
APIs.
Address these issues by removing IB_EVENT_SM_CHANGE handling from ipoib.
IPoIB does not locally store sm_lid nor sm_sl, so it does not need to do
anything on SM LID change. IPoIB makes use of other ib_core components
to issue SA requests for it and those components correctly track SM LID
and SM LID changes.
Also in ib_core multicast handling, remove the test for
IB_EVENT_SM_CHANGE. This code is moving all multicast groups to the
error state, which will trigger rejoins. This code is used by IPoIB as
well as the connection manager and other clients of multicast groups.
This kernel module centralizes group membership status and joins since a
node can only join a given group once but multiple ULPs or applications
may want to join the same group. It makes use of the sa_query.c
component in ib_core, which correctly tracks SM LID and SL. This
component does not track SM LID nor SL itself and hence need not react
to their changes.
Similarly, in the ib_core cache code, remove the handling for
IB_EVENT_SM_CHANGE. The ib_cache_update function which is ultimately
called there updates local copies of the pkey table, gid table and lmc.
It does not update nor retain sm_lid nor sm_sl. As such it does not need
to be called on an SM LID change. It technically also does not need to
be called on a reregistration. The LID_CHANGE, PKEY_CHANGE, GID_CHANGE
and port state change events (PORT_ERR, PORT_ACTIVE) should be
sufficient triggers.
It is worth noting that the alternative of simply having the hfi1 and
qib drivers not generate the SM LID change event was explored. While
this would duplicate what Mellanox drivers do now, it is not the correct
behavior and removes the ability for an SM to migrate without requiring
reregistration. Since both opensm and OPA SM have mechanisms to backup
or synchronize registration information, it is desirable to let them
perform SM migrations (with LID or SL changes) without requiring
reregistration when they deem it appropriate.
Linux commit:
ba7d8117f3cca8eb70d579fde3f9ec8cd6a28f39
MFC after: 1 week
Reviewed by: kib
Sponsored by: Mellanox Technologies // NVIDIA Networking