135 Commits

Author SHA1 Message Date
smh
a74104797b Disabled the use of flowid for lagg by default
Disabled the use of RSS hash from the network card aka flowid for
lagg(4) interfaces by default as it's currently incompatible with
the lacp and loadbalance protocols.

The incompatibility is due to the fact that the flowid isn't know
for the first packet of a new outbound stream which can result in
the hash calculation method changing and hence a stream being
incorrectly split across multiple interfaces during normal
operation.

This can be re-enabled by setting the following in loader.conf:
net.link.lagg.default_use_flowid="1"

Discussed with: kmacy
Sponsored by:	Multiplay
2018-01-04 20:05:47 +00:00
sbruno
0c1d43a29b Don't hold the RM lock during lagg_proto_addport() to avoid an LOR.
Submitted by:	Kevin Bowling <kevin.bowling@kev009.com>
Reviewed by:	mav
MFC after:	1 week
Sponsored by:	Limelight Networks
Differential Revision:	https://reviews.freebsd.org/D11711
2017-07-25 14:41:50 +00:00
mav
0c95955a5e Call VLAN_CAPABILITIES() when LAGG capabilities change.
This makes VLAN on top of LAGG to expose proper capabilities if they are
changed after creation.

MFC after:	1 week
2017-05-26 22:22:48 +00:00
mav
b1b0d2011f Improve applying unified capabilities to the lagg ports.
Some NICs have some capabilities dependent, so that disabling one require
disabling some other (TXCSUM/RXCSUM on em).  This code tries to reach the
consensus more insistently.

PR:		219453
MFC after:	1 week
2017-05-26 20:15:33 +00:00
mav
20571e84fe Remove some code, dead from the day one. 2017-05-25 23:19:09 +00:00
mav
d332b8edc8 Relax r317696 locking to not drain taskqueue under the lock.
MFC after:	11 days
2017-05-05 16:51:53 +00:00
mav
d6379610b4 Fix r317696 build without debug.
MFC after:	2 weeks
2017-05-03 02:54:11 +00:00
mav
39f39609ab Introduce sleepable locks into if_lagg.
Before this change if_lagg was using nonsleepable rmlocks to protect its
internal state.  This patch introduces another sx lock to protect code
paths that require sleeping, while still uses old rmlock to protect hot
nonsleepable data paths.

This change allows to remove taskqueue decoupling used before to change
interface addresses without holding the lock.  Instead it uses sx lock to
protect direct if_ioctl() calls.

As another bonus, the new code synchronizes enabled capabilities of member
interfaces, and allows to control them with ifconfig laggX, that was
impossible before.  This part should fix interoperation with if_bridge,
that may need to disable some capabilities, such as TXCSUM or LRO, to allow
bridging with noncapable interfaces.

MFC after:	2 weeks
Sponsored by:	iXsystems, Inc.
Differential Revision:	https://reviews.freebsd.org/D10514
2017-05-02 19:09:11 +00:00
mav
e3b5ab75e5 Remove unneeded conditions.
MFC after:	2 weeks
2017-04-22 08:38:49 +00:00
mav
dfb79d4360 Add interface reference counting to if_lagg.
Using plain ifunit() looks like request for troubles.

MFC after:	2 weeks
2017-04-21 13:45:01 +00:00
loos
1f77bc5d5c Do not update the lagg link layer address when destroying a lagg clone.
This would enqueue an event to send the gratuitous arp on a dying lagg
interface without any physical ports attached to it.

Apart from that, the taskqueue_drain() on lagg_clone_destroy() runs too
late, when the ifp data structure is already freed.  Fix that too.

Obtained from:	pfSense
MFC after:	2 weeks
Sponsored by:	Rubicon Communications, LLC (Netgate)
2017-01-30 03:04:33 +00:00
hselasky
efa6326974 Implement kernel support for hardware rate limited sockets.
- Add RATELIMIT kernel configuration keyword which must be set to
enable the new functionality.

- Add support for hardware driven, Receive Side Scaling, RSS aware, rate
limited sendqueues and expose the functionality through the already
established SO_MAX_PACING_RATE setsockopt(). The API support rates in
the range from 1 to 4Gbytes/s which are suitable for regular TCP and
UDP streams. The setsockopt(2) manual page has been updated.

- Add rate limit function callback API to "struct ifnet" which supports
the following operations: if_snd_tag_alloc(), if_snd_tag_modify(),
if_snd_tag_query() and if_snd_tag_free().

- Add support to ifconfig to view, set and clear the IFCAP_TXRTLMT
flag, which tells if a network driver supports rate limiting or not.

- This patch also adds support for rate limiting through VLAN and LAGG
intermediate network devices.

- How rate limiting works:

1) The userspace application calls setsockopt() after accepting or
making a new connection to set the rate which is then stored in the
socket structure in the kernel. Later on when packets are transmitted
a check is made in the transmit path for rate changes. A rate change
implies a non-blocking ifp->if_snd_tag_alloc() call will be made to the
destination network interface, which then sets up a custom sendqueue
with the given rate limitation parameter. A "struct m_snd_tag" pointer is
returned which serves as a "snd_tag" hint in the m_pkthdr for the
subsequently transmitted mbufs.

2) When the network driver sees the "m->m_pkthdr.snd_tag" different
from NULL, it will move the packets into a designated rate limited sendqueue
given by the snd_tag pointer. It is up to the individual drivers how the rate
limited traffic will be rate limited.

3) Route changes are detected by the NIC drivers in the ifp->if_transmit()
routine when the ifnet pointer in the incoming snd_tag mismatches the
one of the network interface. The network adapter frees the mbuf and
returns EAGAIN which causes the ip_output() to release and clear the send
tag. Upon next ip_output() a new "snd_tag" will be tried allocated.

4) When the PCB is detached the custom sendqueue will be released by a
non-blocking ifp->if_snd_tag_free() call to the currently bound network
interface.

Reviewed by:		wblock (manpages), adrian, gallatin, scottl (network)
Differential Revision:	https://reviews.freebsd.org/D3687
Sponsored by:		Mellanox Technologies
MFC after:		3 months
2017-01-18 13:31:17 +00:00
asomers
d2d98e596c Remove stray debugging code from r310180
Reported by:	rstone
Pointy hat to:	asomers
MFC after:	3 weeks
X-MFC-with:	310180
Sponsored by:	Spectra Logic Corp
2016-12-20 15:45:53 +00:00
asomers
9d0f88b640 Fix panic during lagg destruction with simultaneous status check
If you run "ifconfig lagg0 destroy" and "ifconfig lagg0" at the same time a
page fault may result. The first process will destroy ifp->if_lagg in
lagg_clone_destroy (called by if_clone_destroy). Then the second process
will observe that ifp->if_lagg is NULL at the top of lagg_port_ioctl and
goto fallback: where it will promptly dereference ifp->if_lagg anyway.

The solution is to repeat the NULL check for ifp->if_lagg

MFC after:	4 weeks
Sponsored by:	Spectra Logic Corp
Differential Revision:	https://reviews.freebsd.org/D8512
2016-12-16 22:39:30 +00:00
bz
7a1c0b1ad1 Get closer to a VIMAGE network stack teardown from top to bottom rather
than removing the network interfaces first. This change is rather larger
and convoluted as the ordering requirements cannot be separated.

Move the pfil(9) framework to SI_SUB_PROTO_PFIL, move Firewalls and
related modules to their own SI_SUB_PROTO_FIREWALL.
Move initialization of "physical" interfaces to SI_SUB_DRIVERS,
move virtual (cloned) interfaces to SI_SUB_PSEUDO.
Move Multicast to SI_SUB_PROTO_MC.

Re-work parts of multicast initialisation and teardown, not taking the
huge amount of memory into account if used as a module yet.

For interface teardown we try to do as many of them as we can on
SI_SUB_INIT_IF, but for some this makes no sense, e.g., when tunnelling
over a higher layer protocol such as IP. In that case the interface
has to go along (or before) the higher layer protocol is shutdown.

Kernel hhooks need to go last on teardown as they may be used at various
higher layers and we cannot remove them before we cleaned up the higher
layers.

For interface teardown there are multiple paths:
(a) a cloned interface is destroyed (inside a VIMAGE or in the base system),
(b) any interface is moved from a virtual network stack to a different
network stack ("vmove"), or (c) a virtual network stack is being shut down.
All code paths go through if_detach_internal() where we, depending on the
vmove flag or the vnet state, make a decision on how much to shut down;
in case we are destroying a VNET the individual protocol layers will
cleanup their own parts thus we cannot do so again for each interface as
we end up with, e.g., double-frees, destroying locks twice or acquiring
already destroyed locks.
When calling into protocol cleanups we equally have to tell them
whether they need to detach upper layer protocols ("ulp") or not
(e.g., in6_ifdetach()).

Provide or enahnce helper functions to do proper cleanup at a protocol
rather than at an interface level.

Approved by:		re (hrs)
Obtained from:		projects/vnet
Reviewed by:		gnn, jhb
Sponsored by:		The FreeBSD Foundation
MFC after:		2 weeks
Differential Revision:	https://reviews.freebsd.org/D6747
2016-06-21 13:48:49 +00:00
pfg
d9c9113377 sys/net*: minor spelling fixes.
No functional change.
2016-05-03 18:05:43 +00:00
rpokala
b4cec75b57 Revert accidental submit of WIP as part of r297609
Pointyhat to:	rpokala
2016-04-06 04:58:20 +00:00
rpokala
1f8d2c0640 Storage Controller Interface driver - typo in unimplemented macro in
scic_sds_controller_registers.h

s/contoller/controller/

PR:		207336
Submitted by:	Tony Narlock <tony @ git-pull.com>
2016-04-06 04:50:28 +00:00
araujo
42270c385f Fix regression introduced on 272446r.
lagg(4) supports the protocol none, where it disables any traffic without
disabling the lagg(4) interface itself.

PR:		206921
Submitted by:	Pushkar Kothavade <pushkarbk@gmail.com>
Reviewed by:	rpokala
Approved by:	bapt (mentor)
MFC after:	3 weeks
Sponsored by:	gandi.net
Differential Revision:	https://reviews.freebsd.org/D5076
2016-02-19 06:35:53 +00:00
araujo
a1319b3488 Add an IOCTL rr_limit to let users fine tuning the number of packets to be
sent using roundrobin protocol and set a better granularity and distribution
among the interfaces. Tuning the number of packages sent by interface can
increase throughput and reduce unordered packets as well as reduce SACK.

Example of usage:
# ifconfig bge0 up
# ifconfig bge1 up
# ifconfig lagg0 create
# ifconfig lagg0 laggproto roundrobin laggport bge0 laggport bge1 \
	192.168.1.1 netmask 255.255.255.0
# ifconfig lagg0 rr_limit 500

Reviewed by:	thompsa, glebius, adrian (old patch)
Approved by:	bapt (mentor)
Relnotes:	Yes
Differential Revision:	https://reviews.freebsd.org/D540
2016-01-23 04:18:44 +00:00
smh
45d5617154 Revert r292275 & r292379
glebius has concerns about these changes so reverting those can be discussed
and addressed.

Sponsored by:	Multiplay
2015-12-17 14:41:30 +00:00
smh
864cf18128 Fix lagg failover due to missing notifications
When using lagg failover mode neither Gratuitous ARP (IPv4) or Unsolicited
Neighbour Advertisements (IPv6) are sent to notify other nodes that the
address may have moved.

This results is slow failover, dropped packets and network outages for the
lagg interface when the primary link goes down.

We now use the new if_link_state_change_cond with the force param set to
allow lagg to force through link state changes and hence fire a
ifnet_link_event which are now monitored by rip and nd6.

Upon receiving these events each protocol trigger the relevant
notifications:
* inet4 => Gratuitous ARP
* inet6 => Unsolicited Neighbour Announce

This also fixes the carp IPv6 NA's that stopped working after r251584 which
added the ipv6_route__llma route.

The new behavour can be controlled using the sysctls:
* net.link.ether.inet.arp_on_link
* net.inet6.icmp6.nd6_on_link

Also removed unused param from lagg_port_state and added descriptions for the
sysctls while here.

PR:		156226
MFC after:	1 month
Sponsored by:	Multiplay
Differential Revision:	https://reviews.freebsd.org/D4111
2015-12-15 16:02:11 +00:00
melifaro
c49cb93294 Move iflladdr_event eventhandler invocation to if_setlladdr.
Suggested by:	glebius
2015-11-14 13:34:03 +00:00
melifaro
a23a95f981 Fix lladdr change propagation for on vlans on top of it.
Fix lladdr update when setting mac address manually.
Fix lladdr_event for slave ports addition.

MFC after:		4 weeks
Sponsored by:		Yandex LLC
Differential Revision:	https://reviews.freebsd.org/D4004
2015-11-01 19:59:04 +00:00
hrs
30ae8a4cad Fix a panic when destroying a lagg interface.
Differential Revision:	https://reviews.freebsd.org/D3883
2015-10-16 01:16:01 +00:00
hrs
03da536d33 Fix a bug that caused reinitialization failure of MAC addresses on
the lagg interface when removing the primary port.

PR:			201916
Differential Revision:	https://reviews.freebsd.org/D3301
2015-10-07 06:32:34 +00:00
araujo
2bbb4de455 Remove per complete the fec aggregation protocol.
The remove began with revision r271733.

NOTE: This patch must never be merge to 10-Stable

Reviewed by:	glebius
Approved by:	bapt (mentor)
Relnotes:	Yes
Sponsored by:	EuroBSDCon Sweden.
Differential Revision:	D3786
2015-10-04 08:00:29 +00:00
hiren
99dda03ed4 Make LAG LACP fast timeout tunable through IOCTL.
Differential Revision:	D3300
Submitted by:		LN Sundararajan <lakshmi.n at msystechnologies>
Reviewed by:		wblock, smh, gnn, hiren, rpokala at panasas
MFC after:		2 weeks
Sponsored by:		Panasas
2015-08-12 20:21:04 +00:00
ae
cfc3df2b8f Fix a possible mbuf leak on interface departure.
Reported by:	Alexandre Martins
2015-03-26 23:40:22 +00:00
hselasky
de871211fe Factor out mbuf hashing code from LAGG driver so that other network
drivers can use it. This avoids some code duplication. Add missing
default case to all switch statements while at it. Also move the
hashing of the IPv6 flow field to layer 4 because the IPv6 flow field
is constant on a per L4 connection basis and not on a per L3 network.

Differential Revision:	https://reviews.freebsd.org/D1987
Sponsored by:		Mellanox Technologies
MFC after:		1 month
2015-03-11 16:02:24 +00:00
will
2226bf3149 Improve the distribution of LAGG port traffic.
I edited the original change to retain the use of arc4random() as a seed for
the hashing as a very basic defense against intentional lagg port selection.

The author's original commit message (edited slightly):

sys/net/ieee8023ad_lacp.c
sys/net/if_lagg.c
	In lagg_hashmbuf, use the FNV hash instead of the old
	hash32_buf.  The hash32 family of functions operate one octet
	at a time, and when run on a string s of length n, their output
	is equivalent to :

		   ----- i=n-1
		   \
	       n    \           (n-i-1)              32
	( seed^  +  /        33^        * s[i] ) % 2^
		   /
		   ----- i=0

	The problem is that the last five bytes of input don't get
	multiplied by sufficiently many powers of 33 to rollover 2^32.
	That means that changing the last few bytes (but obviously not
	the very last) of input will always change the value of the
	hash by a multiple of 33.  In the case of lagg_hashmbuf() with
	ipv4 input, the last four bytes are the TCP or UDP port
	numbers.  Since the output of lagg_hashmbuf is always taken
	modulo the port count, and 3 is a common port count for a lagg,
	that's bad.  It means that the UDP or TCP source port will
	never affect which lagg member is selected on a 3-port lagg.

	At 10Gbps, I was not able to measure any difference in CPU
	consumption between the old and new hash.

Submitted by:	asomers (original commit)
Reviewed by:	emaste, glebius
MFC after:	1 week
Sponsored by:	Spectra Logic
MFSpectraBSD:	1001723 on 2013/08/28 (original)
		1114258 on 2015/01/22 (edit)
2015-01-23 00:06:35 +00:00
ae
240f3ae6d2 Fix condition and really sort ports. Also add comment describing
the intent of this code.

Reported by:	sbruno
MFC after:	1 week
Sponsored by:	Yandex LLC
2015-01-17 11:32:09 +00:00
hselasky
12fec3618b Start process of removing the use of the deprecated "M_FLOWID" flag
from the FreeBSD network code. The flag is still kept around in the
"sys/mbuf.h" header file, but does no longer have any users. Instead
the "m_pkthdr.rsstype" field in the mbuf structure is now used to
decide the meaning of the "m_pkthdr.flowid" field. To modify the
"m_pkthdr.rsstype" field please use the existing "M_HASHTYPE_XXX"
macros as defined in the "sys/mbuf.h" header file.

This patch introduces new behaviour in the transmit direction.
Previously network drivers checked if "M_FLOWID" was set in "m_flags"
before using the "m_pkthdr.flowid" field. This check has now now been
replaced by checking if "M_HASHTYPE_GET(m)" is different from
"M_HASHTYPE_NONE". In the future more hashtypes will be added, for
example hashtypes for hardware dedicated flows.

"M_HASHTYPE_OPAQUE" indicates that the "m_pkthdr.flowid" value is
valid and has no particular type. This change removes the need for an
"if" statement in TCP transmit code checking for the presence of a
valid flowid value. The "if" statement mentioned above is now a direct
variable assignment which is then later checked by the respective
network drivers like before.

Additional notes:
- The SCTP code changes will be committed as a separate patch.
- Removal of the "M_FLOWID" flag will also be done separately.
- The FreeBSD version has been bumped.

MFC after:	1 month
Sponsored by:	Mellanox Technologies
2014-12-01 11:45:24 +00:00
hrs
95ad85717f - Fix lladdr configuration which could prevent LACP mode from working.
- Fix LORs when a laggport interface has an IPv6 LLA.

PR:	194321
2014-10-17 09:08:44 +00:00
hrs
2edb8f7e2f - Move L2 addr configuration for the primary port to a taskqueue. This fixes
LOR of softc rmlock in iflladdr_event handlers.

- Call if_delmulti_ifma() after LACP_UNLOCK().  This fixes another LOR.

- Fix a panic in lacp_transit_expire().

- Fix a panic in lagg_input() upon shutting down a port.
2014-10-05 02:34:21 +00:00
hrs
db53b4f174 Separate option handling from SIOC[SG]LAGG to SIOC[SG]LAGGOPTS for
backward compatibility with old ifconfig(8).
2014-10-02 20:01:13 +00:00
hrs
667a2b7369 Virtualize lagg(4) cloner. This change fixes a panic when tearing down
if_lagg(4) interfaces which were cloned in a vnet jail.

Sysctl nodes which are dynamically generated for each cloned interface
(net.link.lagg.N.*) have been removed, and use_flowid and flowid_shift
ifconfig(8) parameters have been added instead.  Flags and per-interface
statistics counters are displayed in "ifconfig -v".

CR:	D842
2014-10-01 21:37:32 +00:00
glebius
57bac09d3f Fix off by one in lagg_port_destroy().
Reported by:	"Max N. Boyarov" <zotrix bsd.by>
2014-10-01 11:23:54 +00:00
glebius
2cb6078939 Finally, convert counters in struct ifnet to counter(9).
Sponsored by:	Netflix
Sponsored by:	Nginx, Inc.
2014-09-28 08:57:07 +00:00
glebius
53d7fba29f Convert to if_inc_counter() last remnantes of bare access to struct ifnet
counters.
2014-09-28 07:43:38 +00:00
melifaro
7d70b89c51 Use underlying ports counters to get lagg statistics instead of
per-packet accounting.
This introduce user-visible changes like aggregating error counters.

Reviewed by:	asomers (prev.version), glebius
CR:		D781
MFC after:	2 weeks
Sponsored by:	Yandex LLC
2014-09-27 13:57:48 +00:00
glebius
58a4ee184a Remove macros that hide access to struct ifnet fields. 2014-09-26 13:02:29 +00:00
glebius
f564c3e730 Make all lagg protocol methods live in lagg_protos, not in softc. All
interfaces of a same protocol, use the same methods.

Sponsored by:	Netflix
Sponsored by:	Nginx, Inc.
2014-09-26 12:54:24 +00:00
ae
530a56d2e5 Keep list of lagg ports sorted by if_index.
Obtained from:	Yandex LLC
MFC after:	1 week
Sponsored by:	Yandex LLC
2014-09-26 12:42:06 +00:00
glebius
7f6197c96b - Whitespace.
- Remove caddr_t.
2014-09-26 12:35:58 +00:00
glebius
62993359de - Provide lagg_proto_attach(), lagg_proto_detach().
- Make detach a protocol method in lagg_protos.
- Simplify code to lookup protocols.

Sponsored by:	Netflix
Sponsored by:	Nginx, Inc.
2014-09-26 11:01:04 +00:00
glebius
680ed8e05c - When reconfiguring protocol on a lagg, first set it to LAGG_PROTO_NONE,
then drop lock, run the attach routines, and then set it to specific
  proto. This removes tons of WITNESS warnings.
- Make lagg protocol attach handlers not failing and allocate memory
  with M_WAITOK.

Sponsored by:	Netflix
Sponsored by:	Nginx, Inc.
2014-09-26 08:42:32 +00:00
glebius
e30ec249f1 Make lagg protocols detach methods returning void.
Sponsored by:	Netflix
Sponsored by:	Nginx, Inc.
2014-09-26 07:12:40 +00:00
hselasky
bdacf9ba4d Improve transmit sending offload, TSO, algorithm in general.
The current TSO limitation feature only takes the total number of
bytes in an mbuf chain into account and does not limit by the number
of mbufs in a chain. Some kinds of hardware is limited by two
factors. One is the fragment length and the second is the fragment
count. Both of these limits need to be taken into account when doing
TSO. Else some kinds of hardware might have to drop completely valid
mbuf chains because they cannot loaded into the given hardware's DMA
engine. The new way of doing TSO limitation has been made backwards
compatible as input from other FreeBSD developers and will use
defaults for values not set.

Reviewed by:	adrian, rmacklem
Sponsored by:	Mellanox Technologies
MFC after:	1 week
2014-09-22 08:27:27 +00:00
araujo
d7a9c633d7 Add laggproto broadcast, it allows sends frames to all ports of the lagg(4) group
and receives frames on any port of the lagg(4).

Phabric:	D549
Reviewed by:	glebius, thompsa
Approved by:	glebius
Obtained from:	OpenBSD
Sponsored by:	QNAP Systems Inc.
2014-09-18 02:12:48 +00:00