freebsd-skq

Author	SHA1	Message	Date
hselasky	727760a4e4	Revert r271504. A new patch to solve this issue will be made. Suggested by: adrian @	2014-09-13 20:52:01 +00:00
hselasky	3d04a989df	Improve the TCP segmentation offload (TSO) algorithm in general. The current TSO limitation feature only takes the total number of bytes in an mbuf chain into account and does not limit by the number of mbufs in a chain. Some kinds of hardware are limited by two factors. One is the fragment length and the second is the fragment count. Both of these limits need to be taken into account when doing TSO. Otherwise some kinds of hardware might have to drop completely valid mbuf chains because they cannot be loaded into the given hardware's DMA engine. The new way of doing TSO limitation has been made backwards compatible as per input from other FreeBSD developers and will use defaults for values not set. MFC after: 1 week Sponsored by: Mellanox Technologies	2014-09-13 08:26:09 +00:00
araujo	9abce0e567	- Remove unneeded include. Phabric: D563 Reviewed by: kevlo Approved by: kevlo	2014-08-11 03:04:16 +00:00
mav	1d4e2a0972	Improve locking of multicast addresses in VLAN and LAGG interfaces. This fixes several scenarios of reproducible panics, caused by races between multicast address changes and interface destruction. MFC after: 2 weeks	2014-08-04 00:58:12 +00:00
hselasky	35b126e324	Pull in r267961 and r267973 again. Fix for issues reported will follow.	2014-06-28 03:56:17 +00:00
gjb	fc21f40567	Revert r267961, r267973: These changes prevent sysctl(8) from returning proper output, such as: 1) no output from sysctl(8) 2) erroneously returning ENOMEM with tools like truss(1) or uname(1) truss: can not get etype: Cannot allocate memory	2014-06-27 22:05:21 +00:00
hselasky	bd1ed65f0f	Extend the meaning of the CTLFLAG_TUN flag to automatically check if there is an environment variable which shall initialize the SYSCTL during early boot. This works for all SYSCTL types, both statically and dynamically created ones, except for the SYSCTL NODE type and SYSCTLs which belong to VNETs. A new flag, CTLFLAG_NOFETCH, has been added to be used in the case a tunable sysctl has a custom initialisation function allowing the sysctl to still be marked as a tunable. The kernel SYSCTL API is mostly the same, with a few exceptions for some special operations like iterating children of a static/extern SYSCTL node. This operation should probably be made into a factored out common macro, since some device drivers use this. The reason for changing the SYSCTL API was the need for a SYSCTL parent OID pointer and not only the SYSCTL parent OID list pointer in order to quickly generate the sysctl path. The motivation behind this patch is to avoid parameter loading kludges inside the OFED driver subsystem. Instead of adding special code to the OFED driver subsystem to post-load tunables into dynamically created sysctls, we generalize this in the kernel. Other changes: - Corrected a possibly incorrect sysctl name from "hw.cbb.intr_mask" to "hw.pcic.intr_mask". - Removed redundant TUNABLE statements throughout the kernel. - Some minor code rewrites in connection to removing not needed TUNABLE statements. - Added a missing SYSCTL_DECL(). - Wrapped two very long lines. - Avoid malloc()/free() inside sysctl string handling, in case it is called to initialize a sysctl from a tunable, since malloc()/free() is not ready when sysctls from the sysctl dataset are registered. - Bumped FreeBSD version to indicate SYSCTL API change. MFC after: 2 weeks Sponsored by: Mellanox Technologies	2014-06-27 16:33:43 +00:00
rmacklem	dc2495c46c	Fix build for non-INET that was broken by r264469. MFC after: 2 weeks	2014-04-15 13:28:54 +00:00
rmacklem	ff97df6be2	Lagg did not set the value of if_hw_tsomax, so when lagg was stacked on top of network interfaces that set if_hw_tsomax, tcp_output() would see the default value instead of the value set by the network interface(s). This patch modifies lagg so that it sets if_hw_tsomax to the minimum of the value(s) for the underlying network interfaces. Reviewed by: glebius MFC after: 2 weeks	2014-04-14 20:34:48 +00:00
melifaro	881c9e28bf	Simplify filling sockaddr_dl structure for if_resolvemulti() callback providers. link_init_sdl() function can be used to fill most of the parameters. Use caller stack instead of allocating / freeing memory for each request. Do not drop support for extra-long (probably non-existing) link-layer protocols by introducing link_alloc_sdl() (used by if_resolvemulti() callback) and link_free_sdl() (used by caller). Since this change breaks KBI, MFC requires slightly different approach (link_init_sdl() auto-allocating buffer if necessary to handle cases with unmodified if_resolvemulti() callers). MFC after: 2 weeks	2014-01-18 23:24:51 +00:00
scottl	4ba8fc2916	Multi-queue NIC drivers and multi-port lagg tend to use the same lower bits of the flowid as each other, resulting in a poor distribution of packets among queues in certain cases. Work around this by adding a set of sysctls for controlling a bit-shift on the flowid when doing multi-port aggregation in lagg and lacp. By default, lagg/lacp will now use bits 16 and higher instead of 0 and higher. Reviewed by: max Obtained from: Netflix MFC after: 3 days	2013-12-30 01:32:17 +00:00
glebius	75528d8e36	There are some high performance NICs that count statistics in hardware, and there are ifnets, that do that via counter(9). Provide a flag that would skip cache line trashing '+=' operation in ether_input(). Sponsored by: Netflix Sponsored by: Nginx, Inc. Reviewed by: melifaro, adrian Approved by: re (marius)	2013-10-09 19:04:40 +00:00
adrian	8f526008d4	Convert the if_lagg rwlock to an rmlock. We've been seeing lots of cache line contention (but not lock contention!) in our workloads between the various TX and RX threads going on. The write lock is only grabbed when configuration changes are made - which are infrequent. With this patch, the contention and cycles spent waiting for updates disappear. Sponsored by: Netflix, Inc.	2013-08-29 19:35:14 +00:00
adrian	1467e47941	Break out the static, global LACP debug options into a per-lagg unit sysctl tree. * Create a net.link.lagg.X.lacp node * Add a debug node under that for tx_test and rx_test * Add lacp_strict_mode, defaulting to 1 tx_test and rx_test are still a bitmap of unit numbers for now. At some point it would be nice to create child nodes of the lagg bundle for each sub-interface, and then populate those with various knobs and statistics. Sponsored by: Netflix	2013-07-26 19:41:13 +00:00
adrian	e729c4bb92	Bring over some link aggregation / LACP protocol improvements and debugging additions. * Add some new tracing events to aid in debugging. * Add in a debugging mode to drop transmit and received frames, specifically to test whether seeing or hearing heartbeats correctly cause LACP to drop the port. * Add in (and make default) a strict LACP mode, which requires the heartbeat on a port to be heard before it's used. Sometimes vendor ports will hang but the link layer stays up, resulting in hung traffic. * Add logging the number of link status flaps, again to aid in debugging badly behaving switch ports. * Calculate the lagg interface port speed as the multiple of the configured ports, rather than the largest. Obtained from: Netflix MFC after: 2 weeks	2013-07-13 04:25:03 +00:00
hrs	50e0add9e4	- Allow ND6_IFF_AUTO_LINKLOCAL for IFT_BRIDGE. An interface with IFT_BRIDGE is initialized with !ND6_IFF_AUTO_LINKLOCAL && !ND6_IFF_ACCEPT_RTADV regardless of net.inet6.ip6.accept_rtadv and net.inet6.ip6.auto_linklocal. To configure an autoconfigured link-local address (RFC 4862), the following rc.conf(5) configuration can be used: ifconfig_bridge0_ipv6="inet6 auto_linklocal" - if_bridge(4) now removes IPv6 addresses on a member interface to be added when the parent interface or one of the existing member interfaces has an IPv6 address. if_bridge(4) merges each link-local scope zone which the member interfaces form respectively, so it causes address scope violation. Removal of the IPv6 addresses prevents it. - if_lagg(4) now removes IPv6 addresses on member interfaces unconditionally. - Set reasonable flags to non-IPv6-capable interfaces. [] Submitted by: rpaulo [] MFC after: 1 week	2013-07-02 16:58:15 +00:00
delphij	d5f66cc889	Return ENETDOWN instead of ENOENT when all lagg(4) links are inactive when upper layer tries to transmit packet. This gives better feedback and meaningful errors for applications. MFC after: 2 weeks Reviewed by: thompsa	2013-06-17 19:31:03 +00:00
trociny	d7bd09411e	Properly set curvnet context in lagg_port_setlladdr() task handler. Reported by: Nikos Vassiliadis <nvass gmx.com> Submitted by: zec Tested by: Nikos Vassiliadis <nvass gmx.com> MFC after: 1 week	2013-06-07 10:27:50 +00:00
glebius	b4bc270e8f	Add const qualifier to the dst parameter of the ifnet if_output method.	2013-04-26 12:50:32 +00:00
glebius	d9c22bdbc9	Switch lagg(4) statistics to counter(9). The lagg(4) is often used to bond high speed links, so basic per-packet += on statistics cause cache misses and statistics loss. Perfect solution would be to convert ifnet(9) to counters(9), but this requires much more work, and unfortunately ABI change, so temporarily patch lagg(4) manually. We store counters in the softc, and once per second push their values to legacy ifnet counters. Sponsored by: Nginx, Inc.	2013-04-15 13:00:42 +00:00
glebius	82edd7c363	Remove __FreeBSD_version ifdefs.	2013-03-22 20:44:16 +00:00
glebius	bc87b91f9e	If lagg(4) can't forward a packet due to underlying port problems, return much more meaningful ENETDOWN to the stack, instead of EBUSY.	2013-01-21 08:59:31 +00:00
delphij	37fb264720	Fix build.	2012-10-17 08:19:08 +00:00
emax	f4127691ff	report total number of ports for each lagg(4) interface via net.link.lagg.X.count sysctl MFC after: 1 week	2012-10-16 22:43:14 +00:00
glebius	05f24a6b77	Make the "struct if_clone" opaque to users of the cloning API. Users now use function calls: if_clone_simple() if_clone_advanced() to initialize a cloner, instead of macros that initialize if_clone structure. Discussed with: brooks, bz, 1 year ago	2012-10-16 13:37:54 +00:00
kevlo	ceb08698f2	Revert previous commit... Pointyhat to: kevlo (myself)	2012-10-10 08:36:38 +00:00
kevlo	8747a46991	Prefer NULL over 0 for pointers	2012-10-09 08:27:40 +00:00
glebius	450219f7cf	Convert lagg(4) to use if_transmit instead of if_start. In collaboration with: thompsa, sbruno, fabient	2012-09-20 10:05:10 +00:00
thompsa	c61cad51ad	Add the same check as vlan(4) where we ignore the ifnet departure event if the interface is just being renamed. PR: kern/169557 Submitted by: Mark Johnston MFC after: 3 days	2012-06-30 19:09:02 +00:00
rea	cda9e92421	if_lagg: allow to invoke SIOCSLAGGPORT multiple times in a row Currently, 'ifconfig laggX down' does not remove members from this lagg(4) interface. So, 'service netif stop laggX' followed by 'service netif start laggX' will choke, because "stop" will leave interfaces attached to the laggX and ifconfig from the "start" will refuse to add already-existing interfaces. The real-world case is when I am bundling together my Ethernet and WiFi interfaces and using multiple profiles for accessing network in different places: system being booted up with one profile, but later this profile being exchanged to another one, followed by 'service netif restart' will not add WiFi interface back to the lagg: the "stop" action from 'service netif restart' will shut down my main WiFi interface, so wlan0 that exists in the lagg0 will be destroyed and purged from lagg0; the "start" action will try to re-add both interfaces, but since Ethernet one is already in lagg0, ifconfig will refuse to add the wlan0 from WiFi interface. Since adding the interface to the lagg(4) when it is already here should be an idempotent action: we're really not changing anything, so this fix doesn't change the semantics of interface addition. Approved by: thompsa Reviewed by: emaste MFC after: 1 week	2012-05-28 12:13:04 +00:00
emaste	62cde8b2a2	Relax restriction on direct tx to child ports Lagg(4) restricts the type of packet that may be sent directly to a child port, to avoid undesired output from accidental misconfiguration. Previously only ETHERTYPE_PAE was permitted. BPF writes to a lagg(4) child port are presumably intentional, so just allow them, while still blocking other packets that should take the aggregation path. PR: kern/138620 Approved by: thompsa@	2012-05-03 01:41:12 +00:00
thompsa	8d206d7279	Set the proto to LAGG_PROTO_NONE before calling the detach routine so packets are discarded, this is an issue because lacp drops the lock which may allow network threads to access freed memory. Expand the lock coverage so the detach/attach happen atomically. Submitted by: Andrew Boyer (earlier version)	2012-04-12 01:07:17 +00:00
thompsa	1ead29e715	Move the vlan buffer space into the union which also fixes an unused variable warning with !INET & !INET6. Spotted by: pluknet	2012-03-07 07:22:53 +00:00
thompsa	b1fbb40a93	Add the ability to set which packet layers are used for the load balance hash calculation.	2012-03-06 22:58:13 +00:00
thompsa	f16a14fbae	Add a sysctl/tunable default value for the use_flowid sysctl in r232008.	2012-02-23 21:56:53 +00:00
thompsa	c8215b632a	Using the flowid in the mbuf assumes the network card is giving a good hash for the traffic flow, this may not be the case giving poor traffic distribution. Add a sysctl which allows us to fall back to our own flow hash code. PR: kern/164901 Submitted by: Eugene Grosbein MFC after: 1 week	2012-02-22 22:01:30 +00:00
brooks	e4a4d6436f	In r191367 the need for if_free_type() was removed and a new member if_alloctype was used to store the original interface type. Take advantage of this change by removing all existing uses of if_free_type() in favor of if_free(). MFC after: 1 Month	2011-11-11 22:57:52 +00:00
ed	0c56cf839d	Mark all SYSCTL_NODEs static that have no corresponding SYSCTL_DECLs. The SYSCTL_NODE macro defines a list that stores all child-elements of that node. If there's no SYSCTL_DECL macro anywhere else, there's no reason why it shouldn't be static.	2011-11-07 15:43:11 +00:00
pluknet	957e60f904	Add missing MODULE_VERSION() definition to protect against duplicating module loads. PR: kern/159345 Reported by: Eugene Grosbein <egrosbein att rdtc ru> Tested by: Eugene Grosbein <egrosbein att rdtc ru> Approved by: re (kib) MFC after: 1 week	2011-08-01 11:24:55 +00:00
thompsa	9d0e437193	Grab the rlock before checking if our interface is enabled, it could be possible to hit a dead pointer when changing interfaces. PR: kern/156978 Submitted by: Andrew Boyer MFC after: 1 week	2011-07-07 20:02:09 +00:00
thompsa	fc83a48265	LACP frames must not be sent VLAN-tagged, check for that before processing. PR: kern/156743 Submitted by: Dmitrij Tejblum MFC after: 1 week	2011-04-30 20:34:52 +00:00
bz	1910487722	Make various (pseudo) interfaces compile without INET in the kernel adding appropriate #ifdefs. For module builds the framework needs adjustments for at least carp. Reviewed by: gnn Sponsored by: The FreeBSD Foundation Sponsored by: iXsystems MFC after: 4 days	2011-04-27 19:30:44 +00:00
eri	2ad117efbd	Fix a panic that can happen when trying to destroy a lagg(4) with scheduler set to none. Approved by: thompsa(mentor) MFC after: 1 week	2011-03-04 20:37:38 +00:00
emaste	a9a1b47f1d	Add a sysctl knob to accept input packets on any link in a failover lagg.	2010-09-01 16:53:38 +00:00
delphij	fe5f1f57b8	Remove the check for IFF_DRV_OACTIVE right before adding a port into lagg interface. The check itself seems to be coming from OpenBSD but does not seem to be useful for our code. Discussed with: thomasa MFC after: 1 month	2010-03-09 00:52:16 +00:00
eri	3c38fdad1e	Propagate the vlan events to the underlying interfaces/members so they can do initialization of hw related features. PR: kern/141646 Reviewed by: thompsa Approved by: thompsa(co-mentor) MFC after: 2 weeks	2010-02-06 13:49:35 +00:00
thompsa	5056e27c2d	Declare a new EVENTHANDLER called iflladdr_event which signals that the L2 address on an interface has changed. This lets stacked interfaces such as vlan(4) detect that their lower interface has changed and adjust things in order to keep working. Previously this situation broke at least vlan(4) and lagg(4) configurations. The EVENTHANDLER_INVOKE call was not placed within if_setlladdr() due to the risk of a loop. PR: kern/142927 Submitted by: Nikolay Denev	2010-01-18 20:34:00 +00:00
trasz	2ced506c4b	Stop GCC from complaining about lagg_port_checkstacking() being unused.	2010-01-08 16:44:33 +00:00
thompsa	f55b83f9f4	Use the flowid if its available for selecting the tx port.	2009-04-30 14:25:44 +00:00
kmacy	24b38efdce	Change if_output to take a struct route as its fourth argument in order to allow passing a cached struct llentry * down to L2 Reviewed by: rwatson	2009-04-16 20:30:28 +00:00

85 Commits