Commit Graph

103 Commits

Author SHA1 Message Date
Alexander V. Chernikov
033074c440 Replace 'struct route *' if_output() argument with 'struct nhop_info *'.
Leave 'struct route' as is for legacy routing api users.
Remove most of rtalloc_ign*-derived functions.
2014-11-09 16:33:04 +00:00
Hiroki Sato
bf6d3f0c7c - Fix lladdr configuration which could prevent LACP mode from working.
- Fix LORs when a laggport interface has an IPv6 LLA.

PR:	194321
2014-10-17 09:08:44 +00:00
Hiroki Sato
6d47816791 - Move L2 addr configuration for the primary port to a taskqueue. This fixes
LOR of softc rmlock in iflladdr_event handlers.

- Call if_delmulti_ifma() after LACP_UNLOCK().  This fixes another LOR.

- Fix a panic in lacp_transit_expire().

- Fix a panic in lagg_input() upon shutting down a port.
2014-10-05 02:34:21 +00:00
Hiroki Sato
9732189ca9 Separate option handling from SIOC[SG]LAGG to SIOC[SG]LAGGOPTS for
backward compatibility with old ifconfig(8).
2014-10-02 20:01:13 +00:00
Hiroki Sato
939a050ad9 Virtualize lagg(4) cloner. This change fixes a panic when tearing down
if_lagg(4) interfaces which were cloned in a vnet jail.

Sysctl nodes which are dynamically generated for each cloned interface
(net.link.lagg.N.*) have been removed, and use_flowid and flowid_shift
ifconfig(8) parameters have been added instead.  Flags and per-interface
statistics counters are displayed in "ifconfig -v".

CR:	D842
2014-10-01 21:37:32 +00:00
Gleb Smirnoff
dee826cec0 Fix off by one in lagg_port_destroy().
Reported by:	"Max N. Boyarov" <zotrix bsd.by>
2014-10-01 11:23:54 +00:00
Gleb Smirnoff
112f50ffb2 Finally, convert counters in struct ifnet to counter(9).
Sponsored by:	Netflix
Sponsored by:	Nginx, Inc.
2014-09-28 08:57:07 +00:00
Gleb Smirnoff
2357543753 Convert to if_inc_counter() last remnantes of bare access to struct ifnet
counters.
2014-09-28 07:43:38 +00:00
Alexander V. Chernikov
7d6cc45c9b Use underlying ports counters to get lagg statistics instead of
per-packet accounting.
This introduce user-visible changes like aggregating error counters.

Reviewed by:	asomers (prev.version), glebius
CR:		D781
MFC after:	2 weeks
Sponsored by:	Yandex LLC
2014-09-27 13:57:48 +00:00
Gleb Smirnoff
eade13f9d2 Remove macros that hide access to struct ifnet fields. 2014-09-26 13:02:29 +00:00
Gleb Smirnoff
38738d739a Make all lagg protocol methods live in lagg_protos, not in softc. All
interfaces of a same protocol, use the same methods.

Sponsored by:	Netflix
Sponsored by:	Nginx, Inc.
2014-09-26 12:54:24 +00:00
Andrey V. Elsukov
30e5de489d Keep list of lagg ports sorted by if_index.
Obtained from:	Yandex LLC
MFC after:	1 week
Sponsored by:	Yandex LLC
2014-09-26 12:42:06 +00:00
Gleb Smirnoff
6900d0d328 - Whitespace.
- Remove caddr_t.
2014-09-26 12:35:58 +00:00
Gleb Smirnoff
16ca790ead - Provide lagg_proto_attach(), lagg_proto_detach().
- Make detach a protocol method in lagg_protos.
- Simplify code to lookup protocols.

Sponsored by:	Netflix
Sponsored by:	Nginx, Inc.
2014-09-26 11:01:04 +00:00
Gleb Smirnoff
09c7577ef3 - When reconfiguring protocol on a lagg, first set it to LAGG_PROTO_NONE,
then drop lock, run the attach routines, and then set it to specific
  proto. This removes tons of WITNESS warnings.
- Make lagg protocol attach handlers not failing and allocate memory
  with M_WAITOK.

Sponsored by:	Netflix
Sponsored by:	Nginx, Inc.
2014-09-26 08:42:32 +00:00
Gleb Smirnoff
b1bbc5b3d1 Make lagg protocols detach methods returning void.
Sponsored by:	Netflix
Sponsored by:	Nginx, Inc.
2014-09-26 07:12:40 +00:00
Hans Petter Selasky
9fd573c39d Improve transmit sending offload, TSO, algorithm in general.
The current TSO limitation feature only takes the total number of
bytes in an mbuf chain into account and does not limit by the number
of mbufs in a chain. Some kinds of hardware is limited by two
factors. One is the fragment length and the second is the fragment
count. Both of these limits need to be taken into account when doing
TSO. Else some kinds of hardware might have to drop completely valid
mbuf chains because they cannot loaded into the given hardware's DMA
engine. The new way of doing TSO limitation has been made backwards
compatible as input from other FreeBSD developers and will use
defaults for values not set.

Reviewed by:	adrian, rmacklem
Sponsored by:	Mellanox Technologies
MFC after:	1 week
2014-09-22 08:27:27 +00:00
Marcelo Araujo
99cdd96163 Add laggproto broadcast, it allows sends frames to all ports of the lagg(4) group
and receives frames on any port of the lagg(4).

Phabric:	D549
Reviewed by:	glebius, thompsa
Approved by:	glebius
Obtained from:	OpenBSD
Sponsored by:	QNAP Systems Inc.
2014-09-18 02:12:48 +00:00
Hans Petter Selasky
72f3100047 Revert r271504. A new patch to solve this issue will be made.
Suggested by:	adrian @
2014-09-13 20:52:01 +00:00
Hans Petter Selasky
eb93b77ae4 Improve transmit sending offload, TSO, algorithm in general.
The current TSO limitation feature only takes the total number of
bytes in an mbuf chain into account and does not limit by the number
of mbufs in a chain. Some kinds of hardware is limited by two
factors. One is the fragment length and the second is the fragment
count. Both of these limits need to be taken into account when doing
TSO. Else some kinds of hardware might have to drop completely valid
mbuf chains because they cannot loaded into the given hardware's DMA
engine. The new way of doing TSO limitation has been made backwards
compatible as input from other FreeBSD developers and will use
defaults for values not set.

MFC after:	1 week
Sponsored by:	Mellanox Technologies
2014-09-13 08:26:09 +00:00
Marcelo Araujo
133991579d - Remove unneeded include.
Phabric:	D563
Reviewed by:	kevlo
Approved by:	kevlo
2014-08-11 03:04:16 +00:00
Alexander Motin
2d222cb761 Improve locking of multicast addresses in VLAN and LAGG interfaces.
This fixes several scenarios of reproducible panics, cause by races
between multicast address changes and interface destruction.

MFC after:	2 weeks
2014-08-04 00:58:12 +00:00
Hans Petter Selasky
af3b2549c4 Pull in r267961 and r267973 again. Fix for issues reported will follow. 2014-06-28 03:56:17 +00:00
Glen Barber
37a107a407 Revert r267961, r267973:
These changes prevent sysctl(8) from returning proper output,
such as:

 1) no output from sysctl(8)
 2) erroneously returning ENOMEM with tools like truss(1)
    or uname(1)
 truss: can not get etype: Cannot allocate memory
2014-06-27 22:05:21 +00:00
Hans Petter Selasky
3da1cf1e88 Extend the meaning of the CTLFLAG_TUN flag to automatically check if
there is an environment variable which shall initialize the SYSCTL
during early boot. This works for all SYSCTL types both statically and
dynamically created ones, except for the SYSCTL NODE type and SYSCTLs
which belong to VNETs. A new flag, CTLFLAG_NOFETCH, has been added to
be used in the case a tunable sysctl has a custom initialisation
function allowing the sysctl to still be marked as a tunable. The
kernel SYSCTL API is mostly the same, with a few exceptions for some
special operations like iterating childrens of a static/extern SYSCTL
node. This operation should probably be made into a factored out
common macro, hence some device drivers use this. The reason for
changing the SYSCTL API was the need for a SYSCTL parent OID pointer
and not only the SYSCTL parent OID list pointer in order to quickly
generate the sysctl path. The motivation behind this patch is to avoid
parameter loading cludges inside the OFED driver subsystem. Instead of
adding special code to the OFED driver subsystem to post-load tunables
into dynamically created sysctls, we generalize this in the kernel.

Other changes:
- Corrected a possibly incorrect sysctl name from "hw.cbb.intr_mask"
to "hw.pcic.intr_mask".
- Removed redundant TUNABLE statements throughout the kernel.
- Some minor code rewrites in connection to removing not needed
TUNABLE statements.
- Added a missing SYSCTL_DECL().
- Wrapped two very long lines.
- Avoid malloc()/free() inside sysctl string handling, in case it is
called to initialize a sysctl from a tunable, hence malloc()/free() is
not ready when sysctls from the sysctl dataset are registered.
- Bumped FreeBSD version to indicate SYSCTL API change.

MFC after:	2 weeks
Sponsored by:	Mellanox Technologies
2014-06-27 16:33:43 +00:00
Rick Macklem
d092e11c6a Fix build for non-INET that was broken by r264469.
MFC after:	2 weeks
2014-04-15 13:28:54 +00:00
Rick Macklem
9387570f8a Lagg did not set the value of if_hw_tsomax, so when lagg
was stacked on top of network interfaces that set if_hw_tsomax,
tcp_output() would see the default value instead of the value
set by the network interface(s). This patch modifies lagg so that
it sets if_hw_tsomax to the minimum of the value(s) for the
underlying network interfaces.

Reviewed by:	glebius
MFC after:	2 weeks
2014-04-14 20:34:48 +00:00
Alexander V. Chernikov
95fbe4d0cc Simplify filling sockaddr_dl structure for if_resolvemulti()
callback providers. link_init_sdl() function can be used to
fill most of the parameters. Use caller stack instead of
allocation / freing memory for each request. Do not drop support
for extra-long (probably non-existing) link-layer protocols by
introducing link_alloc_sdl() (used by if_resolvemulti() callback)
and link_free_sdl() (used by caller).
Since this change breaks KBI, MFC requires slightly different approach
(link_init_sdl() auto-allocating buffer if necessary to handle cases
 with unmodified if_resolvemulti() callers).

MFC after:	2 weeks
2014-01-18 23:24:51 +00:00
Scott Long
1a8959dac6 Multi-queue NIC drivers and multi-port lagg tend to use the same lower
bits of the flowid as each other, resulting in a poor distribution of
packets among queues in certain cases.  Work around this by adding a
set of sysctls for controlling a bit-shift on the flowid when doing
multi-port aggrigation in lagg and lacp.  By default, lagg/lacp will
now use bits 16 and higher instead of 0 and higher.

Reviewed by:	max
Obtained from:	Netflix
MFC after:	3 days
2013-12-30 01:32:17 +00:00
Gleb Smirnoff
4cdc1f5421 There are some high performance NICs that count statistics in hardware,
and there are ifnets, that do that via counter(9). Provide a flag that
would skip cache line trashing '+=' operation in ether_input().

Sponsored by:	Netflix
Sponsored by:	Nginx, Inc.
Reviewed by:	melifaro, adrian
Approved by:	re (marius)
2013-10-09 19:04:40 +00:00
Adrian Chadd
310915a45a Convert the if_lagg rwlock to an rmlock.
We've been seeing lots of cache line contention (but not lock contention!)
in our workloads between the various TX and RX threads going on.

The write lock is only grabbed when configuration changes are made - which
are infrequent.

With this patch, the contention and cycles spent waiting for updates
disappear.

Sponsored by:	Netflix, Inc.
2013-08-29 19:35:14 +00:00
Adrian Chadd
49de4f2214 Break out the static, global LACP debug options into a per-lagg unit
sysctl tree.

* Create a net.link.lagg.X.lacp node
* Add a debug node under that for tx_test and rx_test
* Add lacp_strict_mode, defaulting to 1

tx_test and rx_test are still a bitmap of unit numbers for now.
At some point it would be nice to create child nodes of the lagg bundle
for each sub-interface, and then populate those with various knobs
and statistics.

Sponsored by:	Netflix
2013-07-26 19:41:13 +00:00
Adrian Chadd
31402c27b8 Bring over some link aggregation / LACP protocol improvements and debugging
additions.

* Add some new tracing events to aid in debugging.
* Add in a debugging mode to drop transmit and received frames, specifically
  to test whether seeing or hearing heartbeats correctly cause LACP to
  drop the port.
* Add in (and make default) a strict LACP mode, which requires the
  heartbeat on a port to be heard before it's used.  Sometimes vendor ports
  will hang but the link layer stays up, resulting in hung traffic.
* Add logging the number of link status flaps, again to aid in debugging
  badly behaving switch ports.
* Calculate the lagg interface port speed as the multiple of the
  configured ports, rather than the largest.

Obtained from:	Netflix
MFC after:	2 weeks
2013-07-13 04:25:03 +00:00
Hiroki Sato
af8056441e - Allow ND6_IFF_AUTO_LINKLOCAL for IFT_BRIDGE. An interface with IFT_BRIDGE
is initialized with !ND6_IFF_AUTO_LINKLOCAL && !ND6_IFF_ACCEPT_RTADV
  regardless of net.inet6.ip6.accept_rtadv and net.inet6.ip6.auto_linklocal.
  To configure an autoconfigured link-local address (RFC 4862), the
  following rc.conf(5) configuration can be used:

   ifconfig_bridge0_ipv6="inet6 auto_linklocal"

- if_bridge(4) now removes IPv6 addresses on a member interface to be
  added when the parent interface or one of the existing member
  interfaces has an IPv6 address.  if_bridge(4) merges each link-local
  scope zone which the member interfaces form respectively, so it causes
  address scope violation.  Removal of the IPv6 addresses prevents it.

- if_lagg(4) now removes IPv6 addresses on a member interfaces
  unconditionally.

- Set reasonable flags to non-IPv6-capable interfaces. [*]

Submitted by:	rpaulo [*]
MFC after:	1 week
2013-07-02 16:58:15 +00:00
Xin LI
eda6cf02b8 Return ENETDOWN instead of ENOENT when all lagg(4) links are
inactive when upper layer tries to transmit packet.  This
gives better feedback and meaningful errors for applications.

MFC after:	2 weeks
Reviewed by:	thompsa
2013-06-17 19:31:03 +00:00
Mikolaj Golub
f8afe33795 Properly set curvnet context in lagg_port_setlladdr() task handler.
Reported by:	Nikos Vassiliadis <nvass gmx.com>
Submitted by:	zec
Tested by:	Nikos Vassiliadis <nvass gmx.com>
MFC after:	1 week
2013-06-07 10:27:50 +00:00
Gleb Smirnoff
47e8d432d5 Add const qualifier to the dst parameter of the ifnet if_output method. 2013-04-26 12:50:32 +00:00
Gleb Smirnoff
b64478a137 Switch lagg(4) statistics to counter(9).
The lagg(4) is often used to bond high speed links, so basic per-packet +=
on statistics cause cache misses and statistics loss.

Perfect solution would be to convert ifnet(9) to counters(9), but this
requires much more work, and unfortunately ABI change, so temporarily
patch lagg(4) manually.

We store counters in the softc, and once per second push their values
to legacy ifnet counters.

Sponsored by:	Nginx, Inc.
2013-04-15 13:00:42 +00:00
Gleb Smirnoff
209dddb90e Remove __FreeBSD_version ifdefs. 2013-03-22 20:44:16 +00:00
Gleb Smirnoff
1d9797f128 If lagg(4) can't forward a packet due to underlying port problems,
return much more meaningful ENETDOWN to the stack, instead of EBUSY.
2013-01-21 08:59:31 +00:00
Xin LI
6684e46988 Fix build. 2012-10-17 08:19:08 +00:00
Maksim Yevmenkin
e9bbb44e09 report total number of ports for each lagg(4) interface
via net.link.lagg.X.count sysctl

MFC after:	1  week
2012-10-16 22:43:14 +00:00
Gleb Smirnoff
42a58907c3 Make the "struct if_clone" opaque to users of the cloning API. Users
now use function calls:

  if_clone_simple()
  if_clone_advanced()

to initialize a cloner, instead of macros that initialize if_clone
structure.

Discussed with:		brooks, bz, 1 year ago
2012-10-16 13:37:54 +00:00
Kevin Lo
9823d52705 Revert previous commit...
Pointyhat to:	kevlo (myself)
2012-10-10 08:36:38 +00:00
Kevin Lo
a10cee30c9 Prefer NULL over 0 for pointers 2012-10-09 08:27:40 +00:00
Gleb Smirnoff
3b7d677b8f Convert lagg(4) to use if_transmit instead of if_start.
In collaboration with:	thompsa, sbruno, fabient
2012-09-20 10:05:10 +00:00
Andrew Thompson
61587a8403 Add the same check as vlan(4) where we ignore the ifnet departure event if the
interface is just being renamed.

PR:		kern/169557
Submitted by:	Mark Johnston
MFC after:	3 days
2012-06-30 19:09:02 +00:00
Eygene Ryabinkin
f74d5a7a20 if_lagg: allow to invoke SIOCSLAGGPORT multiple times in a row
Currently, 'ifconfig laggX down' does not remove members from this
lagg(4) interface.  So, 'service netif stop laggX' followed by
'service netif start laggX' will choke, because "stop" will leave
interfaces attached to the laggX and ifconfig from the "start" will
refuse to add already-existing interfaces.

The real-world case is when I am bundling together my Ethernet and
WiFi interfaces and using multiple profiles for accessing network in
different places: system being booted up with one profile, but later
this profile being exchanged to another one, followed by 'service
netif restart' will not add WiFi interface back to the lagg: the
"stop" action from 'service netif restart' will shut down my main WiFi
interface, so wlan0 that exists in the lagg0 will be destroyed and
purged from lagg0; the "start" action will try to re-add both
interfaces, but since Ethernet one is already in lagg0, ifconfig will
refuse to add the wlan0 from WiFi interface.

Since adding the interface to the lagg(4) when it is already here
should be an idempotent action: we're really not changing anything,
so this fix doesn't change the semantics of interface addition.

Approved by: thompsa
Reviewed by: emaste
MFC after: 1 week
2012-05-28 12:13:04 +00:00
Ed Maste
6107adc3f5 Relax restriction on direct tx to child ports
Lagg(4) restricts the type of packet that may be sent directly to a child
port, to avoid undesired output from accidental misconfiguration.
Previously only ETHERTYPE_PAE was permitted.

BPF writes to a lagg(4) child port are presumably intentional, so just
allow them, while still blocking other packets that should take the
aggregation path.

PR:		kern/138620
Approved by:	thompsa@
2012-05-03 01:41:12 +00:00
Andrew Thompson
b517176ad9 Set the proto to LAGG_PROTO_NONE before calling the detach routine so packets
are discarded, this is an issue because lacp drops the lock which may allow
network threads to access freed memory. Expand the lock coverage so the
detach/attach happen atomically.

Submitted by:	Andrew Boyer (earlier version)
2012-04-12 01:07:17 +00:00