freebsd-dev

Author	SHA1	Message	Date
Hiroki Sato	023d10cbc7	Fix a bug that caused reinitialization failure of MAC addresses on the lagg interface when removing the primary port. PR: 201916 Differential Revision: https://reviews.freebsd.org/D3301	2015-10-07 06:32:34 +00:00
Marcelo Araujo	973532fc7d	Remove per complete the fec aggregation protocol. The remove began with revision r271733. NOTE: This patch must never be merge to 10-Stable Reviewed by: glebius Approved by: bapt (mentor) Relnotes: Yes Sponsored by: EuroBSDCon Sweden. Differential Revision: D3786	2015-10-04 08:00:29 +00:00
Hiren Panchasara	0e02b43a07	Make LAG LACP fast timeout tunable through IOCTL. Differential Revision: D3300 Submitted by: LN Sundararajan <lakshmi.n at msystechnologies> Reviewed by: wblock, smh, gnn, hiren, rpokala at panasas MFC after: 2 weeks Sponsored by: Panasas	2015-08-12 20:21:04 +00:00
Andrey V. Elsukov	a4b65afcab	Fix a possible mbuf leak on interface departure. Reported by: Alexandre Martins	2015-03-26 23:40:22 +00:00
Hans Petter Selasky	b7ba031ff7	Factor out mbuf hashing code from LAGG driver so that other network drivers can use it. This avoids some code duplication. Add missing default case to all switch statements while at it. Also move the hashing of the IPv6 flow field to layer 4 because the IPv6 flow field is constant on a per L4 connection basis and not on a per L3 network. Differential Revision: https://reviews.freebsd.org/D1987 Sponsored by: Mellanox Technologies MFC after: 1 month	2015-03-11 16:02:24 +00:00
Will Andrews	0e5f55bb95	Improve the distribution of LAGG port traffic. I edited the original change to retain the use of arc4random() as a seed for the hashing as a very basic defense against intentional lagg port selection. The author's original commit message (edited slightly): sys/net/ieee8023ad_lacp.c sys/net/if_lagg.c In lagg_hashmbuf, use the FNV hash instead of the old hash32_buf. The hash32 family of functions operate one octet at a time, and when run on a string s of length n, their output is equivalent to : ----- i=n-1 \ n \ (n-i-1) 32 ( seed^ + / 33^ * s[i] ) % 2^ / ----- i=0 The problem is that the last five bytes of input don't get multiplied by sufficiently many powers of 33 to rollover 2^32. That means that changing the last few bytes (but obviously not the very last) of input will always change the value of the hash by a multiple of 33. In the case of lagg_hashmbuf() with ipv4 input, the last four bytes are the TCP or UDP port numbers. Since the output of lagg_hashmbuf is always taken modulo the port count, and 3 is a common port count for a lagg, that's bad. It means that the UDP or TCP source port will never affect which lagg member is selected on a 3-port lagg. At 10Gbps, I was not able to measure any difference in CPU consumption between the old and new hash. Submitted by: asomers (original commit) Reviewed by: emaste, glebius MFC after: 1 week Sponsored by: Spectra Logic MFSpectraBSD: 1001723 on 2013/08/28 (original) 1114258 on 2015/01/22 (edit)	2015-01-23 00:06:35 +00:00
Andrey V. Elsukov	504289ea5a	Fix condition and really sort ports. Also add comment describing the intent of this code. Reported by: sbruno MFC after: 1 week Sponsored by: Yandex LLC	2015-01-17 11:32:09 +00:00
Hans Petter Selasky	c25290420e	Start process of removing the use of the deprecated "M_FLOWID" flag from the FreeBSD network code. The flag is still kept around in the "sys/mbuf.h" header file, but does no longer have any users. Instead the "m_pkthdr.rsstype" field in the mbuf structure is now used to decide the meaning of the "m_pkthdr.flowid" field. To modify the "m_pkthdr.rsstype" field please use the existing "M_HASHTYPE_XXX" macros as defined in the "sys/mbuf.h" header file. This patch introduces new behaviour in the transmit direction. Previously network drivers checked if "M_FLOWID" was set in "m_flags" before using the "m_pkthdr.flowid" field. This check has now now been replaced by checking if "M_HASHTYPE_GET(m)" is different from "M_HASHTYPE_NONE". In the future more hashtypes will be added, for example hashtypes for hardware dedicated flows. "M_HASHTYPE_OPAQUE" indicates that the "m_pkthdr.flowid" value is valid and has no particular type. This change removes the need for an "if" statement in TCP transmit code checking for the presence of a valid flowid value. The "if" statement mentioned above is now a direct variable assignment which is then later checked by the respective network drivers like before. Additional notes: - The SCTP code changes will be committed as a separate patch. - Removal of the "M_FLOWID" flag will also be done separately. - The FreeBSD version has been bumped. MFC after: 1 month Sponsored by: Mellanox Technologies	2014-12-01 11:45:24 +00:00
Hiroki Sato	bf6d3f0c7c	- Fix lladdr configuration which could prevent LACP mode from working. - Fix LORs when a laggport interface has an IPv6 LLA. PR: 194321	2014-10-17 09:08:44 +00:00
Hiroki Sato	6d47816791	- Move L2 addr configuration for the primary port to a taskqueue. This fixes LOR of softc rmlock in iflladdr_event handlers. - Call if_delmulti_ifma() after LACP_UNLOCK(). This fixes another LOR. - Fix a panic in lacp_transit_expire(). - Fix a panic in lagg_input() upon shutting down a port.	2014-10-05 02:34:21 +00:00
Hiroki Sato	9732189ca9	Separate option handling from SIOC[SG]LAGG to SIOC[SG]LAGGOPTS for backward compatibility with old ifconfig(8).	2014-10-02 20:01:13 +00:00
Hiroki Sato	939a050ad9	Virtualize lagg(4) cloner. This change fixes a panic when tearing down if_lagg(4) interfaces which were cloned in a vnet jail. Sysctl nodes which are dynamically generated for each cloned interface (net.link.lagg.N.*) have been removed, and use_flowid and flowid_shift ifconfig(8) parameters have been added instead. Flags and per-interface statistics counters are displayed in "ifconfig -v". CR: D842	2014-10-01 21:37:32 +00:00
Gleb Smirnoff	dee826cec0	Fix off by one in lagg_port_destroy(). Reported by: "Max N. Boyarov" <zotrix bsd.by>	2014-10-01 11:23:54 +00:00
Gleb Smirnoff	112f50ffb2	Finally, convert counters in struct ifnet to counter(9). Sponsored by: Netflix Sponsored by: Nginx, Inc.	2014-09-28 08:57:07 +00:00
Gleb Smirnoff	2357543753	Convert to if_inc_counter() last remnantes of bare access to struct ifnet counters.	2014-09-28 07:43:38 +00:00
Alexander V. Chernikov	7d6cc45c9b	Use underlying ports counters to get lagg statistics instead of per-packet accounting. This introduce user-visible changes like aggregating error counters. Reviewed by: asomers (prev.version), glebius CR: D781 MFC after: 2 weeks Sponsored by: Yandex LLC	2014-09-27 13:57:48 +00:00
Gleb Smirnoff	eade13f9d2	Remove macros that hide access to struct ifnet fields.	2014-09-26 13:02:29 +00:00
Gleb Smirnoff	38738d739a	Make all lagg protocol methods live in lagg_protos, not in softc. All interfaces of a same protocol, use the same methods. Sponsored by: Netflix Sponsored by: Nginx, Inc.	2014-09-26 12:54:24 +00:00
Andrey V. Elsukov	30e5de489d	Keep list of lagg ports sorted by if_index. Obtained from: Yandex LLC MFC after: 1 week Sponsored by: Yandex LLC	2014-09-26 12:42:06 +00:00
Gleb Smirnoff	6900d0d328	- Whitespace. - Remove caddr_t.	2014-09-26 12:35:58 +00:00
Gleb Smirnoff	16ca790ead	- Provide lagg_proto_attach(), lagg_proto_detach(). - Make detach a protocol method in lagg_protos. - Simplify code to lookup protocols. Sponsored by: Netflix Sponsored by: Nginx, Inc.	2014-09-26 11:01:04 +00:00
Gleb Smirnoff	09c7577ef3	- When reconfiguring protocol on a lagg, first set it to LAGG_PROTO_NONE, then drop lock, run the attach routines, and then set it to specific proto. This removes tons of WITNESS warnings. - Make lagg protocol attach handlers not failing and allocate memory with M_WAITOK. Sponsored by: Netflix Sponsored by: Nginx, Inc.	2014-09-26 08:42:32 +00:00
Gleb Smirnoff	b1bbc5b3d1	Make lagg protocols detach methods returning void. Sponsored by: Netflix Sponsored by: Nginx, Inc.	2014-09-26 07:12:40 +00:00
Hans Petter Selasky	9fd573c39d	Improve transmit sending offload, TSO, algorithm in general. The current TSO limitation feature only takes the total number of bytes in an mbuf chain into account and does not limit by the number of mbufs in a chain. Some kinds of hardware is limited by two factors. One is the fragment length and the second is the fragment count. Both of these limits need to be taken into account when doing TSO. Else some kinds of hardware might have to drop completely valid mbuf chains because they cannot loaded into the given hardware's DMA engine. The new way of doing TSO limitation has been made backwards compatible as input from other FreeBSD developers and will use defaults for values not set. Reviewed by: adrian, rmacklem Sponsored by: Mellanox Technologies MFC after: 1 week	2014-09-22 08:27:27 +00:00
Marcelo Araujo	99cdd96163	Add laggproto broadcast, it allows sends frames to all ports of the lagg(4) group and receives frames on any port of the lagg(4). Phabric: D549 Reviewed by: glebius, thompsa Approved by: glebius Obtained from: OpenBSD Sponsored by: QNAP Systems Inc.	2014-09-18 02:12:48 +00:00
Hans Petter Selasky	72f3100047	Revert r271504. A new patch to solve this issue will be made. Suggested by: adrian @	2014-09-13 20:52:01 +00:00
Hans Petter Selasky	eb93b77ae4	Improve transmit sending offload, TSO, algorithm in general. The current TSO limitation feature only takes the total number of bytes in an mbuf chain into account and does not limit by the number of mbufs in a chain. Some kinds of hardware is limited by two factors. One is the fragment length and the second is the fragment count. Both of these limits need to be taken into account when doing TSO. Else some kinds of hardware might have to drop completely valid mbuf chains because they cannot loaded into the given hardware's DMA engine. The new way of doing TSO limitation has been made backwards compatible as input from other FreeBSD developers and will use defaults for values not set. MFC after: 1 week Sponsored by: Mellanox Technologies	2014-09-13 08:26:09 +00:00
Marcelo Araujo	133991579d	- Remove unneeded include. Phabric: D563 Reviewed by: kevlo Approved by: kevlo	2014-08-11 03:04:16 +00:00
Alexander Motin	2d222cb761	Improve locking of multicast addresses in VLAN and LAGG interfaces. This fixes several scenarios of reproducible panics, cause by races between multicast address changes and interface destruction. MFC after: 2 weeks	2014-08-04 00:58:12 +00:00
Hans Petter Selasky	af3b2549c4	Pull in r267961 and r267973 again. Fix for issues reported will follow.	2014-06-28 03:56:17 +00:00
Glen Barber	37a107a407	Revert r267961, r267973: These changes prevent sysctl(8) from returning proper output, such as: 1) no output from sysctl(8) 2) erroneously returning ENOMEM with tools like truss(1) or uname(1) truss: can not get etype: Cannot allocate memory	2014-06-27 22:05:21 +00:00
Hans Petter Selasky	3da1cf1e88	Extend the meaning of the CTLFLAG_TUN flag to automatically check if there is an environment variable which shall initialize the SYSCTL during early boot. This works for all SYSCTL types both statically and dynamically created ones, except for the SYSCTL NODE type and SYSCTLs which belong to VNETs. A new flag, CTLFLAG_NOFETCH, has been added to be used in the case a tunable sysctl has a custom initialisation function allowing the sysctl to still be marked as a tunable. The kernel SYSCTL API is mostly the same, with a few exceptions for some special operations like iterating childrens of a static/extern SYSCTL node. This operation should probably be made into a factored out common macro, hence some device drivers use this. The reason for changing the SYSCTL API was the need for a SYSCTL parent OID pointer and not only the SYSCTL parent OID list pointer in order to quickly generate the sysctl path. The motivation behind this patch is to avoid parameter loading cludges inside the OFED driver subsystem. Instead of adding special code to the OFED driver subsystem to post-load tunables into dynamically created sysctls, we generalize this in the kernel. Other changes: - Corrected a possibly incorrect sysctl name from "hw.cbb.intr_mask" to "hw.pcic.intr_mask". - Removed redundant TUNABLE statements throughout the kernel. - Some minor code rewrites in connection to removing not needed TUNABLE statements. - Added a missing SYSCTL_DECL(). - Wrapped two very long lines. - Avoid malloc()/free() inside sysctl string handling, in case it is called to initialize a sysctl from a tunable, hence malloc()/free() is not ready when sysctls from the sysctl dataset are registered. - Bumped FreeBSD version to indicate SYSCTL API change. MFC after: 2 weeks Sponsored by: Mellanox Technologies	2014-06-27 16:33:43 +00:00
Rick Macklem	d092e11c6a	Fix build for non-INET that was broken by r264469. MFC after: 2 weeks	2014-04-15 13:28:54 +00:00
Rick Macklem	9387570f8a	Lagg did not set the value of if_hw_tsomax, so when lagg was stacked on top of network interfaces that set if_hw_tsomax, tcp_output() would see the default value instead of the value set by the network interface(s). This patch modifies lagg so that it sets if_hw_tsomax to the minimum of the value(s) for the underlying network interfaces. Reviewed by: glebius MFC after: 2 weeks	2014-04-14 20:34:48 +00:00
Alexander V. Chernikov	95fbe4d0cc	Simplify filling sockaddr_dl structure for if_resolvemulti() callback providers. link_init_sdl() function can be used to fill most of the parameters. Use caller stack instead of allocation / freing memory for each request. Do not drop support for extra-long (probably non-existing) link-layer protocols by introducing link_alloc_sdl() (used by if_resolvemulti() callback) and link_free_sdl() (used by caller). Since this change breaks KBI, MFC requires slightly different approach (link_init_sdl() auto-allocating buffer if necessary to handle cases with unmodified if_resolvemulti() callers). MFC after: 2 weeks	2014-01-18 23:24:51 +00:00
Scott Long	1a8959dac6	Multi-queue NIC drivers and multi-port lagg tend to use the same lower bits of the flowid as each other, resulting in a poor distribution of packets among queues in certain cases. Work around this by adding a set of sysctls for controlling a bit-shift on the flowid when doing multi-port aggrigation in lagg and lacp. By default, lagg/lacp will now use bits 16 and higher instead of 0 and higher. Reviewed by: max Obtained from: Netflix MFC after: 3 days	2013-12-30 01:32:17 +00:00
Gleb Smirnoff	4cdc1f5421	There are some high performance NICs that count statistics in hardware, and there are ifnets, that do that via counter(9). Provide a flag that would skip cache line trashing '+=' operation in ether_input(). Sponsored by: Netflix Sponsored by: Nginx, Inc. Reviewed by: melifaro, adrian Approved by: re (marius)	2013-10-09 19:04:40 +00:00
Adrian Chadd	310915a45a	Convert the if_lagg rwlock to an rmlock. We've been seeing lots of cache line contention (but not lock contention!) in our workloads between the various TX and RX threads going on. The write lock is only grabbed when configuration changes are made - which are infrequent. With this patch, the contention and cycles spent waiting for updates disappear. Sponsored by: Netflix, Inc.	2013-08-29 19:35:14 +00:00
Adrian Chadd	49de4f2214	Break out the static, global LACP debug options into a per-lagg unit sysctl tree. * Create a net.link.lagg.X.lacp node * Add a debug node under that for tx_test and rx_test * Add lacp_strict_mode, defaulting to 1 tx_test and rx_test are still a bitmap of unit numbers for now. At some point it would be nice to create child nodes of the lagg bundle for each sub-interface, and then populate those with various knobs and statistics. Sponsored by: Netflix	2013-07-26 19:41:13 +00:00
Adrian Chadd	31402c27b8	Bring over some link aggregation / LACP protocol improvements and debugging additions. * Add some new tracing events to aid in debugging. * Add in a debugging mode to drop transmit and received frames, specifically to test whether seeing or hearing heartbeats correctly cause LACP to drop the port. * Add in (and make default) a strict LACP mode, which requires the heartbeat on a port to be heard before it's used. Sometimes vendor ports will hang but the link layer stays up, resulting in hung traffic. * Add logging the number of link status flaps, again to aid in debugging badly behaving switch ports. * Calculate the lagg interface port speed as the multiple of the configured ports, rather than the largest. Obtained from: Netflix MFC after: 2 weeks	2013-07-13 04:25:03 +00:00
Hiroki Sato	af8056441e	- Allow ND6_IFF_AUTO_LINKLOCAL for IFT_BRIDGE. An interface with IFT_BRIDGE is initialized with !ND6_IFF_AUTO_LINKLOCAL && !ND6_IFF_ACCEPT_RTADV regardless of net.inet6.ip6.accept_rtadv and net.inet6.ip6.auto_linklocal. To configure an autoconfigured link-local address (RFC 4862), the following rc.conf(5) configuration can be used: ifconfig_bridge0_ipv6="inet6 auto_linklocal" - if_bridge(4) now removes IPv6 addresses on a member interface to be added when the parent interface or one of the existing member interfaces has an IPv6 address. if_bridge(4) merges each link-local scope zone which the member interfaces form respectively, so it causes address scope violation. Removal of the IPv6 addresses prevents it. - if_lagg(4) now removes IPv6 addresses on a member interfaces unconditionally. - Set reasonable flags to non-IPv6-capable interfaces. [] Submitted by: rpaulo [] MFC after: 1 week	2013-07-02 16:58:15 +00:00
Xin LI	eda6cf02b8	Return ENETDOWN instead of ENOENT when all lagg(4) links are inactive when upper layer tries to transmit packet. This gives better feedback and meaningful errors for applications. MFC after: 2 weeks Reviewed by: thompsa	2013-06-17 19:31:03 +00:00
Mikolaj Golub	f8afe33795	Properly set curvnet context in lagg_port_setlladdr() task handler. Reported by: Nikos Vassiliadis <nvass gmx.com> Submitted by: zec Tested by: Nikos Vassiliadis <nvass gmx.com> MFC after: 1 week	2013-06-07 10:27:50 +00:00
Gleb Smirnoff	47e8d432d5	Add const qualifier to the dst parameter of the ifnet if_output method.	2013-04-26 12:50:32 +00:00
Gleb Smirnoff	b64478a137	Switch lagg(4) statistics to counter(9). The lagg(4) is often used to bond high speed links, so basic per-packet += on statistics cause cache misses and statistics loss. Perfect solution would be to convert ifnet(9) to counters(9), but this requires much more work, and unfortunately ABI change, so temporarily patch lagg(4) manually. We store counters in the softc, and once per second push their values to legacy ifnet counters. Sponsored by: Nginx, Inc.	2013-04-15 13:00:42 +00:00
Gleb Smirnoff	209dddb90e	Remove __FreeBSD_version ifdefs.	2013-03-22 20:44:16 +00:00
Gleb Smirnoff	1d9797f128	If lagg(4) can't forward a packet due to underlying port problems, return much more meaningful ENETDOWN to the stack, instead of EBUSY.	2013-01-21 08:59:31 +00:00
Xin LI	6684e46988	Fix build.	2012-10-17 08:19:08 +00:00
Maksim Yevmenkin	e9bbb44e09	report total number of ports for each lagg(4) interface via net.link.lagg.X.count sysctl MFC after: 1 week	2012-10-16 22:43:14 +00:00
Gleb Smirnoff	42a58907c3	Make the "struct if_clone" opaque to users of the cloning API. Users now use function calls: if_clone_simple() if_clone_advanced() to initialize a cloner, instead of macros that initialize if_clone structure. Discussed with: brooks, bz, 1 year ago	2012-10-16 13:37:54 +00:00

1 2 3

110 Commits