the failover protocol is supported due to limitations in the IPoIB
architecture. Refer to the lagg(4) manual page for how to configure
and use this new feature. A new network interface type,
IFT_INFINIBANDLAG, has been added, similar to the existing
IFT_IEEE8023ADLAG.
ifconfig(8) has been updated to accept a new laggtype argument when
creating lagg(4) network interfaces. This new argument is used to
distinguish between Ethernet and InfiniBand types of lagg(4) network
interfaces. The laggtype argument is optional and defaults to
ethernet. The lagg(4) command line syntax is backwards compatible.
Differential Revision: https://reviews.freebsd.org/D26254
Reviewed by: melifaro@
MFC after: 1 week
Sponsored by: Mellanox Technologies // NVIDIA Networking
r286700 added the "lacp_fast_timeout" option to `ifconfig', but we forgot to
include the new option in the string used to decode the option bits. Add
"LACP_FAST_TIMO" to LAGG_OPT_BITS.
Also, s/LAGG_OPT_LACP_TIMEOUT/LAGG_OPT_LACP_FAST_TIMO/g, to be clearer that
the flag indicates "Fast Timeout" mode.
Reported by: Greg Foster <gfoster at panasas dot com>
Reviewed by: jpaetzel
MFC after: 1 week
Sponsored by: Panasas
Differential Revision: https://reviews.freebsd.org/D25239
Add an option flag so that arbitrary updates to a lagg's configuration
do not clear sc_stride. Preserve compatibility for old ifconfig
binaries. Update ifconfig to use the new flag and improve the casting
used when parsing the option parameter.
Modify the RR transmit function to avoid locklessly reading sc_stride
twice. Ensure that sc_stride is always 1 or greater.
Reviewed by: hselasky
MFC after: 1 week
Sponsored by: The FreeBSD Foundation
Differential Revision: https://reviews.freebsd.org/D23092
- Don't allow an unprivileged user to set the stride. [1]
- Only set the stride under the softc lock.
- Rename the internal fields to accurately reflect their use. Keep
ro_bkt to avoid changing the user API.
- Simplify the implementation. The port index is just sc_seq / stride.
- Document rr_limit in ifconfig.8.
Reported by: Ilja Van Sprundel <ivansprundel@ioactive.com> [1]
MFC after: 1 week
Sponsored by: The FreeBSD Foundation
Differential Revision: https://reviews.freebsd.org/D22857
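The round-robin port selection described in the two stride changes above can
be sketched in userland C as follows. This is only an illustration, not the
kernel code: struct rr_state, rr_clamp_stride() and rr_next_port() are
hypothetical names, C11 atomics stand in for the kernel's atomic primitives,
and the final modulo by the port count is an assumption about how the
"sc_seq / stride" index is folded onto the available ports.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdint.h>

/* Hypothetical stand-in for the lagg softc; only the fields named in the log. */
struct rr_state {
	atomic_uint sc_seq;	/* packets handed to the RR transmit path */
	uint32_t sc_stride;	/* packets sent per port before moving on */
};

/* Clamp a configured stride so that it is always 1 or greater. */
static uint32_t
rr_clamp_stride(uint32_t stride)
{
	return (stride == 0 ? 1 : stride);
}

/* Return the index of the port that should carry the next packet. */
static uint32_t
rr_next_port(struct rr_state *st, uint32_t nports)
{
	uint32_t seq, stride;

	assert(nports > 0);
	/* Read the stride once into a local and use only that copy. */
	stride = rr_clamp_stride(st->sc_stride);
	/* Atomically take the next sequence number. */
	seq = (uint32_t)atomic_fetch_add(&st->sc_seq, 1);
	/* Assumption: the sc_seq / stride index wraps over the port count. */
	return ((seq / stride) % nports);
}
```

With a stride of 2 and two ports, successive calls yield ports 0, 0, 1, 1, 0,
and so on, which is the batching effect that rr_limit is meant to provide.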
This change creates an array of port maps indexed by numa domain
for lacp port selection. If we have lacp interfaces in more than
one domain, then we select the egress port by indexing into the
numa port maps and picking a port on the appropriate numa domain.
This behavior is controlled by the new ifconfig use_numa flag
and net.link.lagg.use_numa sysctl/tunable (both modeled after the
existing use_flowid), which default to enabled.
Reviewed by: bz, hselasky, markj (and scottl, earlier version)
Sponsored by: Netflix
Differential Revision: https://reviews.freebsd.org/D20060
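A rough userland sketch of the per-domain port map lookup described above.
All names here (struct sk_portmap, struct sk_lacp, sk_select_port()) are
hypothetical and the array sizes are arbitrary. Falling back to the global
map when the packet's domain has no ports is an assumption, not something the
log states.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define SK_MAXDOMAINS	2	/* arbitrary size for the sketch */
#define SK_MAXPORTS	4	/* arbitrary size for the sketch */

/* Hypothetical port map: a list of port identifiers. */
struct sk_portmap {
	int	ports[SK_MAXPORTS];
	size_t	count;
};

/* Hypothetical LACP state: one global map plus one map per NUMA domain. */
struct sk_lacp {
	struct sk_portmap all;
	struct sk_portmap by_domain[SK_MAXDOMAINS];
	int use_numa;
};

/* Count the domains that have at least one port. */
static size_t
sk_populated_domains(const struct sk_lacp *l)
{
	size_t n = 0;

	for (size_t d = 0; d < SK_MAXDOMAINS; d++)
		if (l->by_domain[d].count > 0)
			n++;
	return (n);
}

/* Pick an egress port, preferring one on the caller's NUMA domain. */
static int
sk_select_port(const struct sk_lacp *l, uint32_t hash, size_t domain)
{
	const struct sk_portmap *pm = &l->all;

	/* Use the per-domain map only if ports span more than one domain. */
	if (l->use_numa && sk_populated_domains(l) > 1 &&
	    domain < SK_MAXDOMAINS && l->by_domain[domain].count > 0)
		pm = &l->by_domain[domain];
	assert(pm->count > 0);
	return (pm->ports[hash % pm->count]);
}
```

When use_numa is cleared, or all ports sit on a single domain, the sketch
behaves like a plain hash over the global map.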
Use the new epoch based reclamation API. Now the hot paths will not
block at all, and the sx lock is used for the softc data. This fixes LORs
reported where the rwlock was obtained when the sxlock was held.
Submitted by: mmacy
Reported by: Harry Schmalzbauer <freebsd@omnilan.de>
Reviewed by: sbruno
Sponsored by: Limelight Networks
Differential Revision: https://reviews.freebsd.org/D15355
Before this change if_lagg was using nonsleepable rmlocks to protect its
internal state. This patch introduces another sx lock to protect code
paths that require sleeping, while still using the old rmlock to protect hot
nonsleepable data paths.
This change allows removing the taskqueue decoupling previously used to
change interface addresses without holding the lock. Instead, it uses the sx
lock to protect direct if_ioctl() calls.
As another bonus, the new code synchronizes enabled capabilities of member
interfaces and allows controlling them with ifconfig laggX, which was
impossible before. This part should fix interoperation with if_bridge,
which may need to disable some capabilities, such as TXCSUM or LRO, to allow
bridging with noncapable interfaces.
MFC after: 2 weeks
Sponsored by: iXsystems, Inc.
Differential Revision: https://reviews.freebsd.org/D10514
This would enqueue an event to send the gratuitous arp on a dying lagg
interface without any physical ports attached to it.
Apart from that, the taskqueue_drain() on lagg_clone_destroy() runs too
late, when the ifp data structure is already freed. Fix that too.
Obtained from: pfSense
MFC after: 2 weeks
Sponsored by: Rubicon Communications, LLC (Netgate)
sent using roundrobin protocol and set a better granularity and distribution
among the interfaces. Tuning the number of packets sent per interface can
increase throughput and reduce out-of-order packets as well as reduce SACK.
Example of usage:
# ifconfig bge0 up
# ifconfig bge1 up
# ifconfig lagg0 create
# ifconfig lagg0 laggproto roundrobin laggport bge0 laggport bge1 \
192.168.1.1 netmask 255.255.255.0
# ifconfig lagg0 rr_limit 500
Reviewed by: thompsa, glebius, adrian (old patch)
Approved by: bapt (mentor)
Relnotes: Yes
Differential Revision: https://reviews.freebsd.org/D540
When using lagg failover mode neither Gratuitous ARP (IPv4) nor Unsolicited
Neighbour Advertisements (IPv6) are sent to notify other nodes that the
address may have moved.
This results in slow failover, dropped packets and network outages for the
lagg interface when the primary link goes down.
We now use the new if_link_state_change_cond with the force param set to
allow lagg to force through link state changes and hence fire an
ifnet_link_event, which is now monitored by rip and nd6.
Upon receiving these events each protocol triggers the relevant
notifications:
* inet4 => Gratuitous ARP
* inet6 => Unsolicited Neighbour Advertisement
This also fixes the carp IPv6 NA's that stopped working after r251584 which
added the ipv6_route__llma route.
The new behaviour can be controlled using the sysctls:
* net.link.ether.inet.arp_on_link
* net.inet6.icmp6.nd6_on_link
Also removed unused param from lagg_port_state and added descriptions for the
sysctls while here.
PR: 156226
MFC after: 1 month
Sponsored by: Multiplay
Differential Revision: https://reviews.freebsd.org/D4111
The removal began with revision r271733.
NOTE: This patch must never be merged to 10-Stable
Reviewed by: glebius
Approved by: bapt (mentor)
Relnotes: Yes
Sponsored by: EuroBSDCon Sweden.
Differential Revision: D3786
drivers can use it. This avoids some code duplication. Add missing
default case to all switch statements while at it. Also move the
hashing of the IPv6 flow field to layer 4 because the IPv6 flow field
is constant on a per L4 connection basis and not on a per L3 network.
Differential Revision: https://reviews.freebsd.org/D1987
Sponsored by: Mellanox Technologies
MFC after: 1 month
from the FreeBSD network code. The flag is still kept around in the
"sys/mbuf.h" header file, but no longer has any users. Instead
the "m_pkthdr.rsstype" field in the mbuf structure is now used to
decide the meaning of the "m_pkthdr.flowid" field. To modify the
"m_pkthdr.rsstype" field please use the existing "M_HASHTYPE_XXX"
macros as defined in the "sys/mbuf.h" header file.
This patch introduces new behaviour in the transmit direction.
Previously network drivers checked if "M_FLOWID" was set in "m_flags"
before using the "m_pkthdr.flowid" field. This check has now been
replaced by checking if "M_HASHTYPE_GET(m)" is different from
"M_HASHTYPE_NONE". In the future more hashtypes will be added, for
example hashtypes for hardware dedicated flows.
"M_HASHTYPE_OPAQUE" indicates that the "m_pkthdr.flowid" value is
valid and has no particular type. This change removes the need for an
"if" statement in TCP transmit code checking for the presence of a
valid flowid value. The "if" statement mentioned above is now a direct
variable assignment which is then later checked by the respective
network drivers like before.
Additional notes:
- The SCTP code changes will be committed as a separate patch.
- Removal of the "M_FLOWID" flag will also be done separately.
- The FreeBSD version has been bumped.
MFC after: 1 month
Sponsored by: Mellanox Technologies
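The transmit-side change described above can be pictured with the following
userland sketch. None of these names come from "sys/mbuf.h": enum sk_hashtype,
struct sk_pkthdr, sk_tag_packet() and sk_pick_txq() are hypothetical
stand-ins, and the enum values are arbitrary. The sketch only illustrates the
idea that the protocol side assigns the flowid and hash type unconditionally,
while the driver side checks the hash type instead of a flag bit.

```c
#include <assert.h>
#include <stdint.h>

/* Illustrative stand-ins; not the definitions from sys/mbuf.h. */
enum sk_hashtype {
	SK_HASHTYPE_NONE = 0,	/* flowid carries no information */
	SK_HASHTYPE_OPAQUE = 1	/* flowid is valid but has no particular type */
};

struct sk_pkthdr {
	uint32_t flowid;
	enum sk_hashtype rsstype;
};

/* Protocol side: copy flowid and hash type with a plain assignment. */
static void
sk_tag_packet(struct sk_pkthdr *ph, uint32_t conn_flowid,
    enum sk_hashtype conn_type)
{
	ph->flowid = conn_flowid;
	ph->rsstype = conn_type;
}

/* Driver side: trust the flowid only when a hash type is set. */
static uint32_t
sk_pick_txq(const struct sk_pkthdr *ph, uint32_t nqueues, uint32_t fallback)
{
	assert(nqueues > 0);
	if (ph->rsstype != SK_HASHTYPE_NONE)
		return (ph->flowid % nqueues);
	/* No usable flowid: use a caller-supplied fallback queue. */
	return (fallback % nqueues);
}
```

A connection without a valid flowid simply carries the "none" type, so the
protocol code needs no branch and the driver still makes the final decision.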
LOR of softc rmlock in iflladdr_event handlers.
- Call if_delmulti_ifma() after LACP_UNLOCK(). This fixes another LOR.
- Fix a panic in lacp_transit_expire().
- Fix a panic in lagg_input() upon shutting down a port.
if_lagg(4) interfaces which were cloned in a vnet jail.
Sysctl nodes which are dynamically generated for each cloned interface
(net.link.lagg.N.*) have been removed, and use_flowid and flowid_shift
ifconfig(8) parameters have been added instead. Flags and per-interface
statistics counters are displayed in "ifconfig -v".
CR: D842
and receives frames on any port of the lagg(4).
Phabric: D549
Reviewed by: glebius, thompsa
Approved by: glebius
Obtained from: OpenBSD
Sponsored by: QNAP Systems Inc.
bits of the flowid as each other, resulting in a poor distribution of
packets among queues in certain cases. Work around this by adding a
set of sysctls for controlling a bit-shift on the flowid when doing
multi-port aggregation in lagg and lacp. By default, lagg/lacp will
now use bits 16 and higher instead of 0 and higher.
Reviewed by: max
Obtained from: Netflix
MFC after: 3 days
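The effect of the flowid bit-shift can be shown with a small sketch.
flowid_to_port() is a hypothetical helper, and reducing the shifted value
with a modulo over the port count is an assumption made for illustration.

```c
#include <assert.h>
#include <stdint.h>

/* Map a flowid to a port index after discarding its low 'shift' bits. */
static uint32_t
flowid_to_port(uint32_t flowid, unsigned int shift, uint32_t nports)
{
	assert(nports > 0);
	assert(shift < 32);	/* a shift of 32 or more is undefined in C */
	return ((flowid >> shift) % nports);
}
```

For example, the flowids 0x00010000 and 0x00020000 share the same low 16
bits, so with a shift of 0 both land on port 0 of a four-port lagg. With a
shift of 16 they land on ports 1 and 2.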
We've been seeing lots of cache line contention (but not lock contention!)
in our workloads between the various TX and RX threads going on.
The write lock is only grabbed when configuration changes are made - which
are infrequent.
With this patch, the contention and cycles spent waiting for updates
disappear.
Sponsored by: Netflix, Inc.
sysctl tree.
* Create a net.link.lagg.X.lacp node
* Add a debug node under that for tx_test and rx_test
* Add lacp_strict_mode, defaulting to 1
tx_test and rx_test are still a bitmap of unit numbers for now.
At some point it would be nice to create child nodes of the lagg bundle
for each sub-interface, and then populate those with various knobs
and statistics.
Sponsored by: Netflix
additions.
* Add some new tracing events to aid in debugging.
* Add in a debugging mode to drop transmit and received frames, specifically
to test whether seeing or hearing heartbeats correctly cause LACP to
drop the port.
* Add in (and make default) a strict LACP mode, which requires the
heartbeat on a port to be heard before it's used. Sometimes vendor ports
will hang but the link layer stays up, resulting in hung traffic.
* Add logging the number of link status flaps, again to aid in debugging
badly behaving switch ports.
* Calculate the lagg interface port speed as the multiple of the
configured ports, rather than the largest.
Obtained from: Netflix
MFC after: 2 weeks
lagg(4) is often used to bond high speed links, so basic per-packet +=
on statistics causes cache misses and statistics loss.
The perfect solution would be to convert ifnet(9) to counters(9), but this
requires much more work, and unfortunately an ABI change, so temporarily
patch lagg(4) manually.
We store counters in the softc, and once per second push their values
to legacy ifnet counters.
Sponsored by: Nginx, Inc.
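The "count privately, publish periodically" scheme described above might look
roughly like this in userland C. The names (struct sk_softc, struct sk_ifnet,
sk_count_packet(), sk_push_counters()) are hypothetical, and splitting the
private counter into one slot per CPU is an assumption made to show why the
hot path avoids a shared cache line. The periodic timer itself is left out.

```c
#include <assert.h>
#include <stdint.h>

#define SK_NSLOTS 4	/* hypothetical: one slot per CPU */

/* Private counters kept in the softc, one slot per CPU. */
struct sk_softc {
	uint64_t ipackets[SK_NSLOTS];
};

/* Stand-in for the legacy interface counter that tools read. */
struct sk_ifnet {
	uint64_t if_ipackets;
};

/* Hot path: bump only the caller's own slot. */
static void
sk_count_packet(struct sk_softc *sc, unsigned int slot)
{
	assert(slot < SK_NSLOTS);
	sc->ipackets[slot]++;
}

/* Slow path, meant to run about once per second: publish the total. */
static void
sk_push_counters(const struct sk_softc *sc, struct sk_ifnet *ifp)
{
	uint64_t total = 0;

	for (unsigned int i = 0; i < SK_NSLOTS; i++)
		total += sc->ipackets[i];
	ifp->if_ipackets = total;
}
```

Readers of the legacy counter see values that lag by at most one push
interval, which is the trade-off the change accepts.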
the traffic flow, this may not be the case giving poor traffic distribution.
Add a sysctl which allows us to fall back to our own flow hash code.
PR: kern/164901
Submitted by: Eugene Grosbein
MFC after: 1 week
1. The locking was changed to shared but roundrobin mode still updated a
pointer in the softc with the next tx interface to use. This will panic
under high load. Change this to an atomically incremented sequence number in
order to choose the tx port in round robin.
2. IFQ_HANDOFF will free the mbuf if the queue is full; the mbuf would then be
freed again by lagg_start(), causing a panic. Reorganised the error handling
and freeing to fix this.
MFC after: 3 days
ports to the lagg interface.
- Use the MTU from the first interface as the lagg MTU, all extra interfaces
must have the same MTU.
This fixes using a lagg interface for a vlan or enabling jumbo frames, etc.
Approved by: re (kensmith)
MFC After: 3 days