Commit Graph

142 Commits

Author SHA1 Message Date
Matt Macy
d7c5a620e2 ifnet: Replace if_addr_lock rwlock with epoch + mutex
Run on LLNW canaries and tested by pho@

gallatin:
Using a 14-core, 28-HTT single socket E5-2697 v3 with a 40GbE MLX5
based ConnectX 4-LX NIC, I see an almost 12% improvement in received
packet rate, and a larger improvement in bytes delivered all the way
to userspace.

When the host receiving 64 streams of netperf -H $DUT -t UDP_STREAM -- -m 1,
I see, using nstat -I mce0 1 before the patch:

InMpps OMpps  InGbs  OGbs err TCP Est %CPU syscalls csw     irq GBfree
4.98   0.00   4.42   0.00 4235592     33   83.80 4720653 2149771   1235 247.32
4.73   0.00   4.20   0.00 4025260     33   82.99 4724900 2139833   1204 247.32
4.72   0.00   4.20   0.00 4035252     33   82.14 4719162 2132023   1264 247.32
4.71   0.00   4.21   0.00 4073206     33   83.68 4744973 2123317   1347 247.32
4.72   0.00   4.21   0.00 4061118     33   80.82 4713615 2188091   1490 247.32
4.72   0.00   4.21   0.00 4051675     33   85.29 4727399 2109011   1205 247.32
4.73   0.00   4.21   0.00 4039056     33   84.65 4724735 2102603   1053 247.32

After the patch

InMpps OMpps  InGbs  OGbs err TCP Est %CPU syscalls csw     irq GBfree
5.43   0.00   4.20   0.00 3313143     33   84.96 5434214 1900162   2656 245.51
5.43   0.00   4.20   0.00 3308527     33   85.24 5439695 1809382   2521 245.51
5.42   0.00   4.19   0.00 3316778     33   87.54 5416028 1805835   2256 245.51
5.42   0.00   4.19   0.00 3317673     33   90.44 5426044 1763056   2332 245.51
5.42   0.00   4.19   0.00 3314839     33   88.11 5435732 1792218   2499 245.52
5.44   0.00   4.19   0.00 3293228     33   91.84 5426301 1668597   2121 245.52

Similarly, netperf reports 230Mb/s before the patch, and 270Mb/s after the patch

Reviewed by:	gallatin
Sponsored by:	Limelight Networks
Differential Revision:	https://reviews.freebsd.org/D15366
2018-05-18 20:13:34 +00:00
Alexander Motin
167a34407c Keep CARP state as INIT when net.inet.carp.allow=0.
Currently when net.inet.carp.allow=0 CARP state remains as MASTER, which is
not very useful (if there are other masters -- it can lead to split brain,
if there are none -- it makes no sense).  Having it as INIT makes it clear
that carp packets are disabled.

Submitted by:	wg
MFC after:	1 month
Relnotes:	yes
Sponsored by:	iXsystems, Inc.
Differential Revision:	https://reviews.freebsd.org/D14477
2018-05-07 14:44:55 +00:00
Stephen Hurd
f3e1324b41 Separate list manipulation locking from state change in multicast
Multicast incorrectly calls in to drivers with a mutex held causing drivers
to have to go through all manner of contortions to use a non sleepable lock.
Serialize multicast updates instead.

Submitted by:	mmacy <mmacy@mattmacy.io>
Reviewed by:	shurd, sbruno
Sponsored by:	Limelight Networks
Differential Revision:	https://reviews.freebsd.org/D14969
2018-05-02 19:36:29 +00:00
Brooks Davis
0437c8e3b1 Remove support for FDDI networks.
Defines in net/if_media.h remain in case code copied from ifconfig is in
use elsewere (supporting non-existant media type is harmless).

Reviewed by:	kib, jhb
Sponsored by:	DARPA, AFRL
Differential Revision:	https://reviews.freebsd.org/D15017
2018-04-11 17:28:24 +00:00
Brooks Davis
541d96aaaf Use an accessor function to access ifr_data.
This fixes 32-bit compat (no ioctl command defintions are required
as struct ifreq is the same size).  This is believed to be sufficent to
fully support ifconfig on 32-bit systems.

Reviewed by:	kib
Obtained from:	CheriBSD
MFC after:	1 week
Relnotes:	yes
Sponsored by:	DARPA, AFRL
Differential Revision:	https://reviews.freebsd.org/D14900
2018-03-30 18:50:13 +00:00
Brooks Davis
69f0fecbd6 Remove infrastructure for token-ring networks.
Reviewed by:	cem, imp, jhb, jmallett
Relnotes:	yes
Sponsored by:	DARPA, AFRL
Differential Revision:	https://reviews.freebsd.org/D14875
2018-03-28 23:33:26 +00:00
Pedro F. Giffuni
fe267a5590 sys: general adoption of SPDX licensing ID tags.
Mainly focus on files that use BSD 2-Clause license, however the tool I
was using misidentified many licenses so this was mostly a manual - error
prone - task.

The Software Package Data Exchange (SPDX) group provides a specification
to make it easier for automated tools to detect and summarize well known
opensource licenses. We are gradually adopting the specification, noting
that the tags are considered only advisory and do not, in any way,
superceed or replace the license texts.

No functional change intended.
2017-11-27 15:23:17 +00:00
Alexander Motin
81098a018e Relax per-ifnet cif_vrs list double locking in carp(4).
In all cases where cif_vrs list is modified, two locks are held: per-ifnet
CIF_LOCK and global carp_sx.  It means to read that list only one of them
is enough to be held, so we can skip CIF_LOCK when we already have carp_sx.

This fixes kernel panic, caused by attempts of copyout() to sleep while
holding non-sleepable CIF_LOCK mutex.

Discussed with:	glebius
MFC after:	2 weeks
Sponsored by:	iXsystems, Inc.
2017-10-19 09:01:15 +00:00
Luiz Otavio O Souza
338e227ac0 After the in_control() changes in r257692, an existing address is
(intentionally) deleted first and then completely added again (so all the
events, announces and hooks are given a chance to run).

This cause an issue with CARP where the existing CARP data structure is
removed together with the last address for a given VHID, which will cause
a subsequent fail when the address is later re-added.

This change fixes this issue by adding a new flag to keep the CARP data
structure when an address is not being removed.

There was an additional issue with IPv6 CARP addresses, where the CARP data
structure would never be removed after a change and lead to VHIDs which
cannot be destroyed.

Reviewed by:	glebius
Obtained from:	pfSense
MFC after:	2 weeks
Sponsored by:	Rubicon Communications, LLC (Netgate)
2017-01-25 19:04:08 +00:00
Enji Cooper
cfff8d3dbd Unbreak ip_carp with WITHOUT_INET6 enabled by conditionalizing all IPv6
structs under the INET6 #ifdef. Similarly (even though it doesn't seem
to affect the build), conditionalize all IPv4 structs under the INET
#ifdef

This also unbreaks the LINT-NOINET6 tinderbox target on amd64; I have not
verified other MACHINE/TARGET pairs (e.g. armv6/arm).

MFC after:	2 weeks
X-MFC with:	r310847
Pointyhat to:	jpaetzel
Reported by:	O. Hartmann <o.hartmann@walstatt.org>
2016-12-30 21:33:01 +00:00
Josh Paetzel
8151740c88 Harden CARP against network loops.
If there is a loop in the network a CARP that is in MASTER state will see it's
own broadcasts, which will then cause it to assume BACKUP state.  When it
assumes BACKUP it will stop sending advertisements.  In that state it will no
longer see advertisements and will assume MASTER...

We can't catch all the cases where we are seeing our own CARP broadcast, but
we can catch the obvious case.

Submitted by:	torek
Obtained from:	FreeNAS
MFC after:	2 weeks
Sponsored by:	iXsystems
2016-12-30 18:46:21 +00:00
Steven Hartland
d6e82913c1 Revert r292275 & r292379
glebius has concerns about these changes so reverting those can be discussed
and addressed.

Sponsored by:	Multiplay
2015-12-17 14:41:30 +00:00
Steven Hartland
3a909afe8e Fix issues introduced by r292275
* Fix panic for etherswitches which don't have a LLADDR.
* Disabled DELAY in unsolicited NDA, which needs further work.
* Fixed missing DELAY in carp_send_na.
* style(9) fix.

Reported by:	kp & melifaro
X-MFC-With:	r292275
MFC after:	1 month
Sponsored by:	Multiplay
2015-12-16 22:26:28 +00:00
Steven Hartland
52e53e2de0 Fix lagg failover due to missing notifications
When using lagg failover mode neither Gratuitous ARP (IPv4) or Unsolicited
Neighbour Advertisements (IPv6) are sent to notify other nodes that the
address may have moved.

This results is slow failover, dropped packets and network outages for the
lagg interface when the primary link goes down.

We now use the new if_link_state_change_cond with the force param set to
allow lagg to force through link state changes and hence fire a
ifnet_link_event which are now monitored by rip and nd6.

Upon receiving these events each protocol trigger the relevant
notifications:
* inet4 => Gratuitous ARP
* inet6 => Unsolicited Neighbour Announce

This also fixes the carp IPv6 NA's that stopped working after r251584 which
added the ipv6_route__llma route.

The new behavour can be controlled using the sysctls:
* net.link.ether.inet.arp_on_link
* net.inet6.icmp6.nd6_on_link

Also removed unused param from lagg_port_state and added descriptions for the
sysctls while here.

PR:		156226
MFC after:	1 month
Sponsored by:	Multiplay
Differential Revision:	https://reviews.freebsd.org/D4111
2015-12-15 16:02:11 +00:00
Steven Hartland
9925ac11da Revert r290403
CARP rework invalidated this change.
2015-11-13 23:14:39 +00:00
Alexander V. Chernikov
1c302b58da Decompose arp_ifinit() into arp_add_ifa_lle() and arp_announce_ifaddr().
Rename arp_ifinit2() into arp_announce_ifaddr().

Eliminate zeroing ifa_rtrequest: it was used for calling arp_rtrequest()
which was responsible for handling route cloning requests. It became
obsolete since r186119 (L2/L3 split).
2015-11-09 10:35:33 +00:00
Steven Hartland
ac19560a34 Add MTU support to carp interfaces
MFC after:	2 weeks
Sponsored by:	Multiplay
2015-11-05 17:23:02 +00:00
Alexander V. Chernikov
3e7a2321e3 * Do more fine-grained locking: call eventhandlers/free_entry
without holding afdata wlock
* convert per-af delete_address callback to global lltable_delete_entry() and
  more low-level "delete this lle" per-af callback
* fix some bugs/inconsistencies in IPv4/IPv6 ifscrub procedures

Sponsored by:		Yandex LLC
Differential Revision:	https://reviews.freebsd.org/D3573
2015-09-14 16:48:19 +00:00
Gleb Smirnoff
9c2cd1aa84 Improve carp(4) locking:
- Use the carp_sx to serialize not only CARP ioctls, but also carp_attach()
  and carp_detach().
- Use cif_mtx to lock only access to those the linked list.
- These locking changes allow us to do some memory allocations with M_WAITOK
  and also properly call callout_drain() in carp_destroy().
- In carp_attach() assert that ifaddr isn't attached. We always come here
  with a pristine address from in[6]_control().

Reviewed by:	oleg
Sponsored by:	Nginx, Inc.
2015-04-21 20:25:12 +00:00
Gleb Smirnoff
93d4534cdc Add sleepable lock to protect at least against two parallel SIOCSVHs.
Sponsored by:	Nginx, Inc.
2015-04-06 15:31:19 +00:00
Gleb Smirnoff
6d947416cc o Use new function ip_fillid() in all places throughout the kernel,
where we want to create a new IP datagram.
o Add support for RFC6864, which allows to set IP ID for atomic IP
  datagrams to any value, to improve performance. The behaviour is
  controlled by net.inet.ip.rfc6864 sysctl knob, which is enabled by
  default.
o In case if we generate IP ID, use counter(9) to improve performance.
o Gather all code related to IP ID into ip_id.c.

Differential Revision:		https://reviews.freebsd.org/D2177
Reviewed by:			adrian, cy, rpaulo
Tested by:			Emeric POUPON <emeric.poupon stormshield.eu>
Sponsored by:			Netflix
Sponsored by:			Nginx, Inc.
Relnotes:			yes
2015-04-01 22:26:39 +00:00
Will Andrews
bb269f3ae4 Log hardware interface up/down as "hardware" rather than just "hw".
Suggested by:	glebius
MFC after:	1 week
MFC with:	277530
2015-01-23 14:30:24 +00:00
Will Andrews
369a670857 When a CARP state change is caused by an ifconfig request, log it accordingly.
Suggested by:	glebius
MFC after:	1 week
MFC with:	277530
2015-01-23 14:28:12 +00:00
Will Andrews
d01641e2c1 Improve CARP logging so that all state transitions are logged.
sys/netinet/ip_carp.c:
	Add a "reason" string parameter to carp_set_state() and
	carp_master_down_locked() allowing more specific logging
	information to be passed into these apis.

	Refactor existing state transition logging into a single
	log call in carp_set_state().

	Update all calls to carp_set_state() and
	carp_master_down_locked() to pass an appropriate reason
	string.  For state transitions that were previously logged,
	the output should be unchanged.

Submitted by:	gibbs (original), asomers (updated)
MFC after:	1 week
Sponsored by:	Spectra Logic
MFSpectraBSD:	1039697 on 2014/02/11 (original)
		1049992 on 2014/03/21 (updated)
2015-01-22 17:09:54 +00:00
Luiz Otavio O Souza
57c5139c46 Remove the check that prevent carp(4) advskew to be set to '0'.
CARP devices are created with advskew set to '0' and once you set it to
any other value in the valid range (0..254) you can't set it back to zero.

The code in question is also used to prevent that zeroed values overwrite
the CARP defaults when a new CARP device is created.  Since advskew already
defaults to '0' for newly created devices and the new value is guaranteed
to be within the valid range, it is safe to overwrite it here.

PR:		194672
Reported by:	cmb@pfsense.org
In collaboration with:	garga
Tested by:	garga
MFC after:	2 weeks
2015-01-06 13:07:13 +00:00
Robert Watson
ed6a66ca6c To ease changes to underlying mbuf structure and the mbuf allocator, reduce
the knowledge of mbuf layout, and in particular constants such as M_EXT,
MLEN, MHLEN, and so on, in mbuf consumers by unifying various alignment
utility functions (M_ALIGN(), MH_ALIGN(), MEXT_ALIGN() in a single
M_ALIGN() macro, implemented by a now-inlined m_align() function:

- Move m_align() from uipc_mbuf.c to mbuf.h; mark as __inline.
- Reimplement M_ALIGN(), MH_ALIGN(), and MEXT_ALIGN() using m_align().
- Update consumers around the tree to simply use M_ALIGN().

This change eliminates a number of cases where mbuf consumers must be aware
of whether or not mbufs returned by the allocator use external storage, but
also assumptions about the size of the returned mbuf. This will make it
easier to introduce changes in how we use external storage, as well as
features such as variable-size mbufs.

Differential Revision:	https://reviews.freebsd.org/D1436
Reviewed by:	glebius, trasz, gnn, bz
Sponsored by:	EMC / Isilon Storage Division
2015-01-05 09:58:32 +00:00
Gleb Smirnoff
6df8a71067 Remove SYSCTL_VNET_* macros, and simply put CTLFLAG_VNET where needed.
Sponsored by:	Nginx, Inc.
2014-11-07 09:39:05 +00:00
Kevin Lo
73d76e77b6 Change pr_output's prototype to avoid the need for explicit casts.
This is a follow up to r269699.

Phabric:	D564
Reviewed by:	jhb
2014-08-15 02:43:02 +00:00
Kevin Lo
8f5a8818f5 Merge 'struct ip6protosw' and 'struct protosw' into one. Now we have
only one protocol switch structure that is shared between ipv4 and ipv6.

Phabric:	D476
Reviewed by:	jhb
2014-08-08 01:57:15 +00:00
Gleb Smirnoff
4a2dd8d4fb Improve logging of send errors, reporting error code and interface.
Reduce code duplication between INET and INET6.

Tested by:	Lytochkin Boris <lytboris gmail.com>
2014-02-22 19:20:40 +00:00
Alexander V. Chernikov
f6b84910bb Further rework netinet6 address handling code:
* Set ia address/mask values BEFORE attaching to address lists.
Inet6 address assignment is not atomic, so the simplest way to
do this atomically is to fill in ia before attach.
* Validate irfa->ia_addr field before use (we permit ANY sockaddr in old code).
* Do some renamings:
  in6_ifinit -> in6_notify_ifa (interaction with other subsystems is here)
  in6_setup_ifa -> in6_broadcast_ifa (LLE/Multicast/DaD code)
  in6_ifaddloop -> nd6_add_ifa_lle
  in6_ifremloop -> nd6_rem_ifa_lle
* Split working with LLE and route announce code for last two.
Add temporary in6_newaddrmsg() function to mimic current rtsock behaviour.
* Call device SIOCSIFADDR handler IFF we're adding first address.
In IPv4 we have to call it on every address change since ARP record
is installed by arp_ifinit() which is called by given handler.
IPv6 stack, on the opposite is responsible to call nd6_add_ifa_lle() so
there is no reason to call SIOCSIFADDR often.
2014-01-19 16:07:27 +00:00
Gleb Smirnoff
0cc726f25a Make failure of ifpromisc() a non-fatal error. This makes it possible to
run carp(4) on vtnet(4).

Sponsored by:	Nginx, Inc.
2014-01-03 11:03:12 +00:00
Gleb Smirnoff
76039bc84f The r48589 promised to remove implicit inclusion of if_var.h soon. Prepare
to this event, adding if_var.h to files that do need it. Also, include
all includes that now are included due to implicit pollution via if_var.h

Sponsored by:	Netflix
Sponsored by:	Nginx, Inc.
2013-10-26 17:58:36 +00:00
Mikolaj Golub
1f6addd92c Relese the interface in the last.
Reviewed by:	glebius
Approved by:	re (kib)
2013-09-08 18:19:40 +00:00
Mikolaj Golub
c5c392e7ed Virtualize carp(4) variables to have per vnet control.
Reviewed by:	ae, glebius
2013-08-13 19:59:49 +00:00
Andrey V. Elsukov
69edf037d7 Migrate struct carpstats to PCPU counters. 2013-07-09 10:02:51 +00:00
Gleb Smirnoff
47e8d432d5 Add const qualifier to the dst parameter of the ifnet if_output method. 2013-04-26 12:50:32 +00:00
Gleb Smirnoff
dc4ad05ecd Use m_get/m_gethdr instead of compat macros.
Sponsored by:	Nginx, Inc.
2013-03-15 12:55:30 +00:00
Gleb Smirnoff
24421c1c32 Resolve source address selection in presense of CARP. Add a couple
of helper functions:

- carp_master()   - boolean function which is true if an address
		    is in the MASTER state.
- ifa_preferred() - boolean function that compares two addresses,
		    and is aware of CARP.

  Utilize ifa_preferred() in ifa_ifwithnet().

  The previous version of patch also changed source address selection
logic in jails using carp_master(), but we failed to negotiate this part
with Bjoern. May be we will approach this problem again later.

Reported & tested by:	Anton Yuzhaninov <citrin citrin.ru>
Sponsored by:		Nginx, Inc
2013-02-11 10:58:22 +00:00
Gleb Smirnoff
c4d0697685 Garbage collect carp_cksum(). 2012-12-25 14:29:38 +00:00
Gleb Smirnoff
7951008b47 Change net.inet.carp.demotion sysctl to add the supplied value
to the current demotion factor instead of assigning it.

  This allows external scripts to control demotion factor together
with kernel in a raceless manner.
2012-12-25 14:08:13 +00:00
Gleb Smirnoff
8f134647ca Switch the entire IPv4 stack to keep the IP packet header
in network byte order. Any host byte order processing is
done in local variables and host byte order values are
never[1] written to a packet.

  After this change a packet processed by the stack isn't
modified at all[2] except for TTL.

  After this change a network stack hacker doesn't need to
scratch his head trying to figure out what is the byte order
at the given place in the stack.

[1] One exception still remains. The raw sockets convert host
byte order before pass a packet to an application. Probably
this would remain for ages for compatibility.

[2] The ip_input() still subtructs header len from ip->ip_len,
but this is planned to be fixed soon.

Reviewed by:	luigi, Maxim Dounin <mdounin mdounin.ru>
Tested by:	ray, Olivier Cochard-Labbe <olivier cochard.me>
2012-10-22 21:09:03 +00:00
Kevin Lo
9823d52705 Revert previous commit...
Pointyhat to:	kevlo (myself)
2012-10-10 08:36:38 +00:00
Kevin Lo
a10cee30c9 Prefer NULL over 0 for pointers 2012-10-09 08:27:40 +00:00
Gleb Smirnoff
891122d180 carp_send_ad() should never return without rescheduling next run. 2012-09-29 05:52:19 +00:00
Bjoern A. Zeeb
8253dcabe7 Fix a problem when CARP is enabled on the interface for IPv4
but not for IPv6.  The current checks in nd6_nbr.c along with the
old version will result in ifa being NULL and subsequently the
packet will be dropped.  This prevented NS/NA, from working and
with that IPv6.

Now return the ifa from the carp lookup function in two cases:
1) if the address matches, is a carp address, and we are MASTER
   (as before),
2) if the address matches but it is not a carp address at all (new).

Reported by:	Peter Wemm (new Y! FreeBSD cluster, eating our own dogfood)
Tested on:	New Y! FreeBSD cluster machines
Reviewed by:	glebius
2012-07-25 12:14:39 +00:00
Gleb Smirnoff
eaf151c49d Improve style(9) of bcopy() to and from mbuf tag.
Submitted by:	bde
2012-05-30 13:51:00 +00:00
Gleb Smirnoff
a856ddc665 After r228571 carp_output() expects carp_softc * pointer in the mtag.
Noticed by:	thompsa
2012-05-30 07:11:27 +00:00
Gleb Smirnoff
a9a2c40ced It is a logical error that in carp_multicast_cleanup()
we look at count of addresses on a particular vhid, we
should account number of addresses on cif.

To achieve this we need to run carp_attach() and
carp_detach() under appropriate cif lock.
2012-04-11 12:26:30 +00:00
Gleb Smirnoff
9ca1fe0db8 CARP should be capable to run on if_bridge(4). Unfortunately,
this commit is not enough to enable CARP operation on
if_bridge(4), because the latter doesn't handle or even
initialize its ifp->if_link_state.

Reported by:	Alexander Lunev <sol289 gmail.com>
2012-04-10 05:42:48 +00:00