FreeBSD src
Hans Petter Selasky df1df0c742 ibcore: Do not overreact to SM LID change event.
When IPoIB receives an SM LID change event, it reacts by flushing its
path record cache and rejoining multicast groups. This is the same
behavior it performs when it receives a reregistration event. This
behavior is unnecessary as an SM may have database backup or
synchronization mechanisms which permit the SM location or LID to change
without loss of multicast membership and without impact to path records.

Both opensm and the OPA FM issue reregistration events if a new SM is
started (or restarted with a new config) or an SM event occurs which
results in loss of multicast membership records by the SM (such as
opensm failover) or the SM encounters new nodes with Active ports (such
as after joining 2 fabrics by connecting switches via ISLs). Hence this
event can be depended on as the trigger for IPoIB cache and multicast
flushing.

It appears that some drivers, such as qib and hfi1, issue the
IB_EVENT_SM_CHANGE event, but other drivers, such as mlx4 and mlx5, do not.
Empirical testing on Mellanox EDR using ibv_asyncwatch has confirmed
that Mellanox EDR HCAs do not generate SM change events and that opensm
does generate reregistration.

An SM LID change event is generated by the mentioned drivers to reflect
that sm_lid and/or sm_sl in the local port info has changed. The intent
of this event is to permit applications and ULPs which have a local copy
of this information (or an address handle using it) to update their
information.

The intent is that the reregistration event (caused by the SM via a bit
in Set(PortInfo)) be used to inform nodes that they need to rejoin
multicast groups, resubscribe for notices and potentially update path
records.

When an SM migrates or fails over, an SM LID change event can occur. In
response IPoIB discards path records and multicast membership and loses
connectivity until these records are restored via SA requests. In very
large fabrics, it may take minutes for the SM to be ready and for the SA
responses to be supplied.  This can result in undesirable and
unnecessary IPoIB connectivity impacts. It also can result in an
unnecessary storm of SA queries from all nodes in a cluster potentially
followed by yet another storm if the SM issues the reregistration
request.

The fact that Mellanox HCAs do not even generate this event is further
evidence that on modern IB fabrics there will be no ill side effects
from the proposed changes below to reduce the reaction by 3 kernel
components to this event. So these changes should be benign for Mellanox
IB fabrics and will benefit OPA fabrics while also making ib_core and
ULP behavior "correct" as intended by the IBTA spec and kernel RDMA event
APIs.

Address these issues by removing IB_EVENT_SM_CHANGE handling from ipoib.
IPoIB does not locally store sm_lid nor sm_sl, so it does not need to do
anything on SM LID change. IPoIB makes use of other ib_core components
to issue SA requests for it and those components correctly track SM LID
and SM LID changes.
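
The IPoIB part of the change can be pictured with a small standalone model.
This is only a sketch: the enum and both functions are hypothetical names
invented for illustration, not the actual ipoib code, and only the two
SM-related events discussed above are modeled.

```c
#include <stdbool.h>

/* Hypothetical stand-ins for the two SM-related kernel events. */
enum sm_event_model {
	EV_SM_CHANGE,		/* models IB_EVENT_SM_CHANGE */
	EV_CLIENT_REREGISTER,	/* models the reregistration event */
	EV_OTHER		/* every other event; outside this sketch */
};

/*
 * Before the change: an SM LID change is treated like a
 * reregistration, so both events flush path records and
 * trigger multicast rejoins.
 */
bool
flush_on_event_before(enum sm_event_model ev)
{
	switch (ev) {
	case EV_SM_CHANGE:
	case EV_CLIENT_REREGISTER:
		return (true);
	default:
		return (false);	/* not modeled here */
	}
}

/*
 * After the change: only the reregistration event flushes.
 * An SM LID change is ignored because IPoIB keeps no local
 * copy of sm_lid or sm_sl.
 */
bool
flush_on_event_after(enum sm_event_model ev)
{
	switch (ev) {
	case EV_CLIENT_REREGISTER:
		return (true);
	default:
		return (false);	/* EV_SM_CHANGE now falls through here */
	}
}
```

In this model the only observable difference is for EV_SM_CHANGE, which
flushes before the change and is ignored after it.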

Also in ib_core multicast handling, remove the test for
IB_EVENT_SM_CHANGE. This code moves all multicast groups to the
error state, which triggers rejoins. It is used by IPoIB as
well as the connection manager and other clients of multicast groups.
This kernel module centralizes group membership status and joins since a
node can only join a given group once but multiple ULPs or applications
may want to join the same group. It makes use of the sa_query.c
component in ib_core, which correctly tracks SM LID and SL. The
multicast module itself does not track SM LID or SL and hence need not
react to their changes.

Similarly, in the ib_core cache code, remove the handling for
IB_EVENT_SM_CHANGE. The ib_cache_update function
which is ultimately called updates local copies of the pkey table,
gid table and lmc. It neither updates nor retains sm_lid or sm_sl. As
such it does not need to be called on an SM LID change. It technically
also does not need to be called on a reregistration. The LID_CHANGE,
PKEY_CHANGE, GID_CHANGE and port state change events (PORT_ERR,
PORT_ACTIVE) should be sufficient triggers.
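
The resulting set of cache-update triggers can be sketched as a simple
predicate. This is a minimal model under stated assumptions: the enum and
function names are hypothetical, and it assumes the reregistration event is
still kept as a trigger, since the text above only describes removing the
SM change handling.

```c
#include <stdbool.h>

/* Hypothetical stand-ins for the kernel event types named above. */
enum cache_event_model {
	CEV_PORT_ERR,
	CEV_PORT_ACTIVE,
	CEV_LID_CHANGE,
	CEV_PKEY_CHANGE,
	CEV_GID_CHANGE,
	CEV_CLIENT_REREGISTER,
	CEV_SM_CHANGE
};

/*
 * Returns true if the event should refresh the cached pkey table,
 * gid table and lmc. An SM LID change touches none of those, so it
 * is not a trigger in this model.
 */
bool
cache_update_needed(enum cache_event_model ev)
{
	switch (ev) {
	case CEV_PORT_ERR:
	case CEV_PORT_ACTIVE:
	case CEV_LID_CHANGE:
	case CEV_PKEY_CHANGE:
	case CEV_GID_CHANGE:
	case CEV_CLIENT_REREGISTER:	/* assumed to remain a trigger */
		return (true);
	case CEV_SM_CHANGE:
	default:
		return (false);
	}
}
```

Listing CEV_SM_CHANGE explicitly in the switch is a design choice for
readability; it documents that the event is deliberately ignored rather
than forgotten.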

It is worth noting that the alternative of simply having the hfi1 and
qib drivers not generate the SM LID change event was explored. While
this would duplicate what Mellanox drivers do now, it is not the correct
behavior and removes the ability for an SM to migrate without requiring
reregistration. Since both opensm and OPA SM have mechanisms to back up
or synchronize registration information, it is desirable to let them
perform SM migrations (with LID or SL changes) without requiring
reregistration when they deem it appropriate.

Linux commit:
ba7d8117f3cca8eb70d579fde3f9ec8cd6a28f39

MFC after:	1 week
Reviewed by:	kib
Sponsored by:	Mellanox Technologies // NVIDIA Networking
2021-07-12 14:22:32 +02:00
.cirrus-ci Cirrus-CI: retry pkg installation on failure 2021-06-02 22:41:20 -04:00
.github [skip ci] volunteer to maintain POSIX AIO 2021-05-30 17:21:12 -06:00
bin pkgbase: Put chio in utilities 2021-06-19 17:49:44 +02:00
cddl zfs: attach zpool_influxdb to build 2021-07-07 20:15:12 +02:00
contrib awk: Reduce diffs with upstream to almost nothing. 2021-07-08 23:05:13 -06:00
crypto kerberos.8: Replace dead link 2021-05-16 01:37:09 -04:00
etc zfs: attach zpool_influxdb to build 2021-07-07 20:15:12 +02:00
gnu dialog: fix macro redefinition 2021-03-01 16:01:44 +01:00
include FreeBSD: Hardcode abd_chunk_size to PAGE_SIZE 2021-07-06 17:39:23 -07:00
kerberos5 kerberos5: fix the WITH_OPENLDAP build 2021-01-30 00:07:50 -06:00
lib pmc(3): mandoc clean ups 2021-07-12 06:28:03 +02:00
libexec devmatch: don't announce autoloading so much 2021-07-08 15:22:22 -06:00
release release: Remove C-like string comparison operator 2021-06-30 11:13:51 -06:00
rescue Fix building rescue/rescue when sanitizers are enabled 2021-07-06 12:18:30 +01:00
sbin fsck_ffs: fix background fsck in preen mode 2021-07-11 12:47:27 -08:00
secure secure/caroot, certctl: Rename secure/caroot/blacklisted 2021-06-18 13:38:07 +01:00
share igc(4): Introduce new driver for the Intel I225 Ethernet controller. 2021-07-12 14:57:18 +10:00
stand loader: support.4th resets the read buffer incorrectly 2021-07-11 08:47:29 -06:00
sys ibcore: Do not overreact to SM LID change event. 2021-07-12 14:22:32 +02:00
targets Remove svnlite. 2021-06-11 14:56:41 -07:00
tests Skip netgraph tests when WITHOUT_NETGRAPH is set 2021-07-06 09:45:34 -04:00
tools nanobsd: enhance fill_pkg.sh 2021-07-11 09:05:16 -06:00
usr.bin grep: fix combination of quite and count flag 2021-07-09 14:09:14 +02:00
usr.sbin nfsd: Fix some issues found by mandoc 2021-07-12 06:31:54 +02:00
.arcconfig Remove history.immutable from .arcconfig 2021-04-13 12:36:25 +01:00
.arclint arc lint: ignore /tests/ in chmod 2017-12-19 03:38:06 +00:00
.cirrus.yml CI: use amd64 EDK II firmware included with QEMU 2021-06-26 14:22:48 -04:00
.clang-format clang-format: Avoid breaking after the opening paren of function definitions 2020-10-28 11:54:00 +00:00
.gitattributes Add a basic clang-format configuration file 2019-06-07 15:23:52 +00:00
.gitignore gitignore: Add .clangd and .ccls-cache 2021-06-04 16:56:08 +08:00
COPYRIGHT copyrights: Happy New Year 2021 2020-12-31 10:29:44 -05:00
LOCKS LOCKS: update current locks 2018-06-09 03:08:04 +00:00
MAINTAINERS [skip ci] volunteer to maintain POSIX AIO 2021-05-30 17:21:12 -06:00
Makefile Remove 'make update'. 2021-06-11 14:56:28 -07:00
Makefile.inc1 Remove svnlite. 2021-06-11 14:56:41 -07:00
Makefile.libcompat libpmc: always generate libpmc_events.c 2021-05-31 17:39:05 -03:00
Makefile.sys.inc AUTO_OBJ: For all top-level targets enforce using an OBJDIR. 2017-12-05 21:29:47 +00:00
ObsoleteFiles.inc ObsoleteFiles.inc: Remove manpages from OLD_FILES 2021-07-07 20:09:02 +02:00
README.md Whitespace cleanup 2021-03-12 19:57:58 +08:00
RELNOTES nfscl: Add entries to UPDATING and RELNOTES for commit a145cf3f73 2021-06-24 19:10:36 -07:00
UPDATING UPDATING: fix typo 2021-07-08 23:42:15 -06:00

FreeBSD Source:

This is the top level of the FreeBSD source directory.

FreeBSD is an operating system used to power modern servers, desktops, and embedded platforms. A large community has continually developed it for more than thirty years. Its advanced networking, security, and storage features have made FreeBSD the platform of choice for many of the busiest web sites and most pervasive embedded networking and storage devices.

For copyright information, please see the file COPYRIGHT in this directory. Additional copyright information also exists for some sources in this tree - please see the specific source directories for more information.

The Makefile in this directory supports a number of targets for building components (or all) of the FreeBSD source tree. See build(7), config(8), and the FreeBSD Handbook sections on building userland and on building kernels for more information, including setting make(1) variables.

Source Roadmap:

Directory Description
bin System/user commands.
cddl Various commands and libraries under the Common Development and Distribution License.
contrib Packages contributed by 3rd parties.
crypto Cryptography stuff (see crypto/README).
etc Template files for /etc.
gnu Various commands and libraries under the GNU Public License. Please see gnu/COPYING and gnu/COPYING.LIB for more information.
include System include files.
kerberos5 Kerberos5 (Heimdal) package.
lib System libraries.
libexec System daemons.
release Release building Makefile & associated tools.
rescue Build system for statically linked /rescue utilities.
sbin System commands.
secure Cryptographic libraries and commands.
share Shared resources.
stand Boot loader sources.
sys Kernel sources.
sys/<arch>/conf Kernel configuration files. GENERIC is the configuration used in release builds. NOTES contains documentation of all possible entries.
tests Regression tests which can be run by Kyua. See tests/README for additional information.
tools Utilities for regression testing and miscellaneous tasks.
usr.bin User commands.
usr.sbin System administration commands.

For information on synchronizing your source tree with one or more of the FreeBSD Project's development branches, please see the FreeBSD Handbook.