Commit Graph

209 Commits

Author SHA1 Message Date
Alexander V. Chernikov
e362cf0e9f MFP r274553:
* Move lle creation/deletion from lla_lookup to separate functions:
  lla_lookup(LLE_CREATE) -> lla_create
  lla_lookup(LLE_DELETE) -> lla_delete
lla_create now returns with LLE_EXCLUSIVE lock for lle.
* Provide typedefs for new/existing lltable callbacks.

Reviewed by:	ae
2015-08-08 17:48:54 +00:00
Andrey V. Elsukov
30aee13117 Add LLE event handler to report ND6 events to userland via rtsock.
Obtained from:	Yandex LLC
MFC after:	2 weeks
Sponsored by:	Yandex LLC
2015-07-20 06:58:32 +00:00
Andrey V. Elsukov
4e870f943f Move RTM announces into generic code to be independent from Layer2 code.
This fixes bug introduced in 274988, when announces about new addresses
don't sent for tunneling interfaces.

Reported by:	tuexen@
MFC after:	1 week
2015-05-29 10:24:16 +00:00
Gleb Smirnoff
0fa5aacd8b Remove #ifdef IFT_FOO.
Submitted by:	Guy Yur <guyyur gmail.com>
2015-05-02 20:31:27 +00:00
Mark Johnston
dff78447a4 Fix a possible refcount leak in regen_tmpaddr().
public_ifa6 may be set to NULL after taking a reference to a previous
address list element. Instead, only take the reference after leaving the
loop but before releasing the address list lock.

Differential Revision:	https://reviews.freebsd.org/D2253
Reviewed by:		ae
MFC after:		2 weeks
2015-04-13 01:55:42 +00:00
Hiroki Sato
11d8451df3 Implement Enhanced DAD algorithm for IPv6 described in
draft-ietf-6man-enhanced-dad-13.

This basically adds a random nonce option (RFC 3971) to NS messages
for DAD probe to detect a looped back packet.  This looped back packet
prevented DAD on some pseudo-interfaces which aggregates multiple L2 links
such as lagg(4).

The length of the nonce is set to 6 bytes.  This algorithm can be disabled by
setting net.inet6.ip6.dad_enhanced sysctl to 0 in a per-vnet basis.

Reported by:		hiren
Reviewed by:		ae
Differential Revision:	https://reviews.freebsd.org/D1835
2015-03-02 17:30:26 +00:00
Randall Stewart
2575fbb827 This fixes a bug in the way that the LLE timers for nd6
and arp were being used. They basically would pass in the
mutex to the callout_init. Because they used this method
to the callout system, it was possible to "stop" the callout.
When flushing the table and you stopped the running callout, the
callout_stop code would return 1 indicating that it was going
to stop the callout (that was about to run on the callout_wheel blocked
by the function calling the stop). Now when 1 was returned, it would
lower the reference count one extra time for the stopped timer, then
a few lines later delete the memory. Of course the callout_wheel was
stuck in the lock code and would then crash since it was accessing
freed memory. By using callout_init(c, 1) we always get a 0 back
and the reference counting bug does not rear its head. We do have
to make a few adjustments to the callouts themselves though to make
sure it does the proper thing if rescheduled as well as gets the lock.

Commented upon by hiren and sbruno
See Phabricator D1777 for more details.

Commented upon by hiren and sbruno
Reviewed by:	adrian, jhb and bz
Sponsored by:	Netflix Inc.
2015-02-09 19:28:11 +00:00
Alexander V. Chernikov
d63e657c04 * Deal with ARCNET L2 multicast mapping for IPv6 the same way as in IPv4:
handle it in arc_output() instead of nd6_storelladdr().
* Remove IFT_ARCNET check from arpresolve() since arc_output() does not
  use arpresolve() to handle broadcast/multicast. This check was there
  since r84931. It looks like it was not used since r89099 (initial
  import of Arcnet support where multicast is handled separately).
* Remove IFT_IEEE1394 case from nd6_storelladdr() since firewire_output()
  calles nd6_storelladdr() for unicast addresses only.
* Remove IFT_ARCNET case from nd6_storelladdr() since arc_output() now
  handles multicast by itself.

As a result, we have the following pattern: all non-ethernet-style
media have their own multicast map handling inside their appropriate
routines. On the other hand, arpresolve() (and nd6_storelladdr()) which
meant to be 'generic' ones de-facto handles ethernet-only multicast maps.

MFC after:	3 weeks
2015-01-09 12:56:51 +00:00
Alexander V. Chernikov
abc1be9062 Add forgotten definition for nd6_output_ifp(). 2015-01-08 18:29:54 +00:00
Alexander V. Chernikov
d7968c29ec * Use newly-created nd6_grab_holdchain() function to retrieve lle
hold mbuf chain instead of calling full-blown nd6_output_lle()
  for each packet. This simplifies both callers and nd6_output_lle()
  implementation.
* Make nd6_output_lle() static and remove now-unused lle and chain
  arguments.
* Rename nd6_output_flush() -> nd6_flush_holdchain() to be consistent.
* Move all pre-send transmit hooks to newly-created nd6_output_ifp().
  Now nd6_output(), nd6_output_lle() and nd6_flush_holdchain() are using
  it to send mbufs to if_output.
* Remove SeND hook from nd6_na_input() because it was implemented
  incorrectly since the beginning (r211501):
  - it tagged initial input mbuf (m) instead of m_hold
  - tagging _all_ mbufs in holdchain seems to be wrong anyway.
2015-01-08 18:02:05 +00:00
Mark Johnston
d6ad6a865a Add refcounting to IPv6 DAD objects and simplify the DAD code to fix a
number of races which could cause double frees or use-after-frees when
performing DAD on an address. In particular, an IPv6 address can now only be
marked as a duplicate from the DAD callout.

Differential Revision:	https://reviews.freebsd.org/D1258
Reviewed by:		ae, hrs
Reported by:		rstone
MFC after:		1 month
2014-12-08 04:44:40 +00:00
Alexander V. Chernikov
74860d4f7c Do not return unlocked/unreferenced lle in arpresolve/nd6_storelladdr -
return lle flags IFF needed.
Do not pass rte to arpresolve - pass is_gateway flag instead.
2014-11-27 23:06:25 +00:00
Andrey V. Elsukov
af6209a133 Skip L2 addresses lookups for p2p interfaces.
Discussed with:	melifaro
Sponsored by:	Yandex LLC
2014-11-24 21:51:43 +00:00
Alexander V. Chernikov
b4b1367ae4 * Move lle creation/deletion from lla_lookup to separate functions:
lla_lookup(LLE_CREATE) -> lla_create
  lla_lookup(LLE_DELETE) -> lla_delete
  Assume lla_create to return LLE_EXCLUSIVE lock for lle.
* Rework lla_rt_output to perform all lle changes under afdata WLOCK.
* change arp_ifscrub() ackquire afdata WLOCK, the same as arp_ifinit().
2014-11-15 18:54:07 +00:00
Gleb Smirnoff
e6abaf91f4 Consistently use if_link.
Reviewed by:	ae, melifaro
2014-11-10 15:56:30 +00:00
Alexander V. Chernikov
d0f9fca40d Remove forgotten arguments. 2014-11-09 16:57:31 +00:00
Alexander V. Chernikov
033074c440 Replace 'struct route *' if_output() argument with 'struct nhop_info *'.
Leave 'struct route' as is for legacy routing api users.
Remove most of rtalloc_ign*-derived functions.
2014-11-09 16:33:04 +00:00
Alexander V. Chernikov
9c9bde01d1 Remove unused 'struct route *' argument from nd6_output_flush(). 2014-11-09 16:20:27 +00:00
Alexander V. Chernikov
a9413f6ca0 Sync to HEAD@r274297. 2014-11-08 18:13:35 +00:00
Gleb Smirnoff
6df8a71067 Remove SYSCTL_VNET_* macros, and simply put CTLFLAG_VNET where needed.
Sponsored by:	Nginx, Inc.
2014-11-07 09:39:05 +00:00
Alexander V. Chernikov
1a75e3b20f Make checks for rt_mtu generic:
Some virtual if drivers has (ab)used ifa ifa_rtrequest hook to enforce
route MTU to be not bigger that interface MTU. While ifa_rtrequest hooking
might be an option in some situation, it is not feasible to do MTU checks
there: generic (or per-domain) routing code is perfectly capable of doing
this.

We currrently have 3 places where MTU is altered:

1) route addition.
 In this case domain overrides radix _addroute callback (in[6]_addroute)
 and all necessary checks/fixes are/can be done there.

2) route change (especially, GW change).
 In this case, there are no explicit per-domain calls, but one can
 override rte by setting ifa_rtrequest hook to domain handler
 (inet6 does this).

3) ifconfig ifaceX mtu YYYY
 In this case, we have no callbacks, but ip[6]_output performes runtime
 checks and decreases rt_mtu if necessary.

Generally, the goals are to be able to handle all MTU changes in
 control plane, not in runtime part, and properly deal with increased
 interface MTU.

This commit changes the following:
* removes hooks setting MTU from drivers side
* adds proper per-doman MTU checks for case 1)
* adds generic MTU check for case 2)

* The latter is done by using new dom_ifmtu callback since
 if_mtu denotes L3 interface MTU, e.g. maximum trasmitted _packet_ size.
 However, IPv6 mtu might be different from if_mtu one (e.g. default 1280)
 for some cases, so we need an abstract way to know maximum MTU size
 for given interface and domain.
* moves rt_setmetrics() before MTU/ifa_rtrequest hooks since it copies
  user-supplied data which must be checked.
* removes RT_LOCK_ASSERT() from other ifa_rtrequest hooks to be able to
  use this functions on new non-inserted rte.

More changes will follow soon.

MFC after:	1 month
Sponsored by:	Yandex LLC
2014-11-06 13:13:09 +00:00
Alexander V. Chernikov
8c3cfe0be0 Hide 'struct rtentry' and all its macro inside new header:
net/route_internal.h
The goal is to make its opaque for all code except route/rtsock and
proto domain _rmx.
2014-11-04 17:28:13 +00:00
Alan Somers
4f8585e021 Revisions 264905 and 266860 added a "int fib" argument to ifa_ifwithnet and
ifa_ifwithdstaddr. For the sake of backwards compatibility, the new
arguments were added to new functions named ifa_ifwithnet_fib and
ifa_ifwithdstaddr_fib, while the old functions became wrappers around the
new ones that passed RT_ALL_FIBS for the fib argument. However, the
backwards compatibility is not desired for FreeBSD 11, because there are
numerous other incompatible changes to the ifnet(9) API. We therefore
decided to remove it from head but leave it in place for stable/9 and
stable/10. In addition, this commit adds the fib argument to
ifa_ifwithbroadaddr for consistency's sake.

sys/sys/param.h
	Increment __FreeBSD_version

sys/net/if.c
sys/net/if_var.h
sys/net/route.c
	Add fibnum argument to ifa_ifwithbroadaddr, and remove the _fib
	versions of ifa_ifwithdstaddr, ifa_ifwithnet, and ifa_ifwithroute.

sys/net/route.c
sys/net/rtsock.c
sys/netinet/in_pcb.c
sys/netinet/ip_options.c
sys/netinet/ip_output.c
sys/netinet6/nd6.c
	Fixup calls of modified functions.

share/man/man9/ifnet.9
	Document changed API.

CR:		https://reviews.freebsd.org/D458
MFC after:	Never
Sponsored by:	Spectra Logic
2014-09-11 20:21:03 +00:00
Andrey V. Elsukov
ff899182ec Fix condition.
Sponsored by:	Yandex LLC
2014-07-11 06:34:15 +00:00
Hajimu UMEMOTO
f4839cbc0a Make nd6_gctimer tunable.
MFC after:	1 week
2014-06-23 16:27:29 +00:00
Alan Somers
2f308a343f Fix unintended KBI change from r264905. Add _fib versions of
ifa_ifwithnet() and ifa_ifwithdstaddr()  The legacy functions will call the
_fib() versions with RT_ALL_FIBS, preserving legacy behavior.

sys/net/if_var.h
sys/net/if.c
	Add legacy-compatible functions as described above.  Ensure legacy
	behavior when RT_ALL_FIBS is passed as fibnum.

sys/netinet/in_pcb.c
sys/netinet/ip_output.c
sys/netinet/ip_options.c
sys/net/route.c
sys/net/rtsock.c
sys/netinet6/nd6.c
	Call with _fib() functions if we must use a specific fib, or the
	legacy functions otherwise.

tests/sys/netinet/fibs_test.sh
tests/sys/netinet/udp_dontroute.c
	Improve the udp_dontroute test.  The bug that this test exercises is
	that ifa_ifwithnet() will return the wrong address, if multiple
	interfaces have addresses on the same subnet but with different
	fibs.  The previous version of the test only considered one possible
	failure mode: that ifa_ifwithnet_fib() might fail to find any
	suitable address at all.  The new version also checks whether
	ifa_ifwithnet_fib() finds the correct address by checking where the
	ARP request goes.

Reported by:	bz, hrs
Reviewed by:	hrs
MFC after:	1 week
X-MFC-with:	264905
Sponsored by:	Spectra Logic
2014-05-29 21:03:49 +00:00
Alan Somers
0cfee0c223 Fix subnet and default routes on different FIBs on the same subnet.
These two bugs are closely related.  The root cause is that ifa_ifwithnet
does not consider FIBs when searching for an interface address.

sys/net/if_var.h
sys/net/if.c
	Add a fib argument to ifa_ifwithnet and ifa_ifwithdstadddr.  Those
	functions will only return an address whose interface fib equals the
	argument.

sys/net/route.c
	Update calls to ifa_ifwithnet and ifa_ifwithdstaddr with fib
	arguments.

sys/netinet/in.c
	Update in_addprefix to consider the interface fib when adding
	prefixes.  This will prevent it from not adding a subnet route when
	one already exists on a different fib.

sys/net/rtsock.c
sys/netinet/in_pcb.c
sys/netinet/ip_output.c
sys/netinet/ip_options.c
sys/netinet6/nd6.c
	Add RT_DEFAULT_FIB arguments to ifa_ifwithdstaddr and ifa_ifwithnet.
	In some cases it there wasn't a clear specific fib number to use.
	In others, I was unable to test those functions so I chose
	RT_DEFAULT_FIB to minimize divergence from current behavior.  I will
	fix some of the latter changes along with PR kern/187553.

tests/sys/netinet/fibs_test.sh
tests/sys/netinet/udp_dontroute.c
tests/sys/netinet/Makefile
	Revert r263738.  The udp_dontroute test was right all along.
	However, bugs kern/187550 and kern/187553 cancelled each other out
	when it came to this test.  Because of kern/187553, ifa_ifwithnet
	searched the default fib instead of the requested one, but because
	of kern/187550, there was an applicable subnet route on the default
	fib.  The new test added in r263738 doesn't work right, however.  I
	can verify with dtrace that ifa_ifwithnet returned the wrong address
	before I applied this commit, but route(8) miraculously found the
	correct interface to use anyway.  I don't know how.

	Clear expected failure messages for kern/187550 and kern/187552.

PR:		kern/187550
PR:		kern/187552
Reviewed by:	melifaro
MFC after:	3 weeks
Sponsored by:	Spectra Logic
2014-04-24 23:56:56 +00:00
Alexander V. Chernikov
f6990c4e3e Further simplify nd6_output_lle.
Currently we have 3 usage patterns:
1) nd6_output (most traffic flow, no lle supplied, lle RLOCK sufficient)
2) corner cases for output (no lle, STALE lle, so on). lle WLOCK needed.
3) nd* iunternal machinery (WLOCK'ed lle provided, perform packet queing).

We separate case 1 and implement it inside its only customer - nd6_output.
This leads to some code duplication (especialy SEND stuff, which should be
hooked to output in a different way), but simplifies locking and control
flow logic fir nd6_output_lle.

Reviewed by:	ae
MFC after:	3 weeks
Sponsored by:	Yandex LLC
2014-02-13 19:09:04 +00:00
Alexander V. Chernikov
9dffa6a3f3 Simplify nd6_output_lle:
* Check ND6_IFF_IFDISABLED before acquiring any locks
* Assume m is always non-NULL
* remove 'bad' case not used anymore
* Simply if_output conditional

MFC after:	2 weeks
Sponsored by:	Yandex LLC
2014-02-10 12:52:33 +00:00
Andrey V. Elsukov
74a976fffd Unlock entry before retry.
Submitted by:	melifaro
MFC after:	1 week
2014-02-07 10:58:46 +00:00
Andrey V. Elsukov
51eecdc35a Take exclusive lock only when lle isn't NULL. We don't need write access
to lle in most cases.

MFC after:	1 week
Sponsored by:	Yandex LLC
2014-02-02 07:28:04 +00:00
Alexander V. Chernikov
f6b84910bb Further rework netinet6 address handling code:
* Set ia address/mask values BEFORE attaching to address lists.
Inet6 address assignment is not atomic, so the simplest way to
do this atomically is to fill in ia before attach.
* Validate irfa->ia_addr field before use (we permit ANY sockaddr in old code).
* Do some renamings:
  in6_ifinit -> in6_notify_ifa (interaction with other subsystems is here)
  in6_setup_ifa -> in6_broadcast_ifa (LLE/Multicast/DaD code)
  in6_ifaddloop -> nd6_add_ifa_lle
  in6_ifremloop -> nd6_rem_ifa_lle
* Split working with LLE and route announce code for last two.
Add temporary in6_newaddrmsg() function to mimic current rtsock behaviour.
* Call device SIOCSIFADDR handler IFF we're adding first address.
In IPv4 we have to call it on every address change since ARP record
is installed by arp_ifinit() which is called by given handler.
IPv6 stack, on the opposite is responsible to call nd6_add_ifa_lle() so
there is no reason to call SIOCSIFADDR often.
2014-01-19 16:07:27 +00:00
Andrey V. Elsukov
ea0c377602 lla_lookup() does modification only when LLE_CREATE is specified.
Thus we can use IF_AFDATA_RLOCK() instead of IF_AFDATA_LOCK() when doing
lla_lookup() without LLE_CREATE flag.

Reviewed by:	glebius, adrian
MFC after:	1 week
Sponsored by:	Yandex LLC
2014-01-02 08:40:37 +00:00
Adrian Chadd
c445d2520d Use an RLOCK here instead of an RWLOCK - matching all the other calls
to lla_lookup().

This drastically reduces the very high lock contention when doing parallel
TCP throughput tests (> 1024 sockets) with IPv6.

Tested:

* parallel IPv6 TCP bulk data exchange, 8192 sockets

MFC after:	1 week
Sponsored by:	Netflix, Inc.
2014-01-01 00:56:26 +00:00
Attilio Rao
54366c0bd7 - For kernel compiled only with KDTRACE_HOOKS and not any lock debugging
option, unbreak the lock tracing release semantic by embedding
  calls to LOCKSTAT_PROFILE_RELEASE_LOCK() direclty in the inlined
  version of the releasing functions for mutex, rwlock and sxlock.
  Failing to do so skips the lockstat_probe_func invokation for
  unlocking.
- As part of the LOCKSTAT support is inlined in mutex operation, for
  kernel compiled without lock debugging options, potentially every
  consumer must be compiled including opt_kdtrace.h.
  Fix this by moving KDTRACE_HOOKS into opt_global.h and remove the
  dependency by opt_kdtrace.h for all files, as now only KDTRACE_FRAMES
  is linked there and it is only used as a compile-time stub [0].

[0] immediately shows some new bug as DTRACE-derived support for debug
in sfxge is broken and it was never really tested.  As it was not
including correctly opt_kdtrace.h before it was never enabled so it
was kept broken for a while.  Fix this by using a protection stub,
leaving sfxge driver authors the responsibility for fixing it
appropriately [1].

Sponsored by:	EMC / Isilon storage division
Discussed with:	rstone
[0] Reported by:	rstone
[1] Discussed with:	philip
2013-11-25 07:38:45 +00:00
Gleb Smirnoff
76039bc84f The r48589 promised to remove implicit inclusion of if_var.h soon. Prepare
to this event, adding if_var.h to files that do need it. Also, include
all includes that now are included due to implicit pollution via if_var.h

Sponsored by:	Netflix
Sponsored by:	Nginx, Inc.
2013-10-26 17:58:36 +00:00
Mark Johnston
57f6086735 Implement the ip, tcp, and udp DTrace providers. The probe definitions use
dynamic translation so that their arguments match the definitions for
these providers in Solaris and illumos. Thus, existing scripts for these
providers should work unmodified on FreeBSD.

Tested by:	gnn, hiren
MFC after:	1 month
2013-08-25 21:54:41 +00:00
Andre Oppermann
86bd049144 Add m_clrprotoflags() to clear protocol specific mbuf flags at up and
downwards layer crossings.

Consistently use it within IP, IPv6 and ethernet protocols.

Discussed with:	trociny, glebius
2013-08-19 13:27:32 +00:00
Hiroki Sato
5a04191532 Return 0 in nbi->expire when la_expire == 0. Conversion from time_uptime to
time_second should not be performed in this case.
2013-08-17 07:14:45 +00:00
Hiroki Sato
ffa0165ae0 Fix incompatibility in ICMPV6CTL_ND6_PRLIST sysctl, and SIOCGPRLST_IN6,
SIOCGDRLST_IN6, and SIOCGNBRINFO_IN6 ioctl.  These userland interfaces
treat expiration times in time_second, not time_uptime.
2013-08-06 17:10:52 +00:00
Hiroki Sato
7d26db1792 - Use time_uptime instead of time_second in data structures for
PF_INET6 in kernel.  This fixes various malfunction when the wall time
  clock is changed.  Bump __FreeBSD_version to 1000041.

- Use clock_gettime(CLOCK_MONOTONIC_FAST) in userland utilities.

MFC after:	1 month
2013-08-05 20:13:02 +00:00
Hiroki Sato
0de0dd9be8 Allocate in6_ifextra (ifp->if_afdata[AF_INET6]) only for IPv6-capable
interfaces.  This eliminates unnecessary IPv6 processing for non-IPv6
interfaces.

MFC after:	3 days
2013-07-31 16:24:49 +00:00
Hiroki Sato
af8056441e - Allow ND6_IFF_AUTO_LINKLOCAL for IFT_BRIDGE. An interface with IFT_BRIDGE
is initialized with !ND6_IFF_AUTO_LINKLOCAL && !ND6_IFF_ACCEPT_RTADV
  regardless of net.inet6.ip6.accept_rtadv and net.inet6.ip6.auto_linklocal.
  To configure an autoconfigured link-local address (RFC 4862), the
  following rc.conf(5) configuration can be used:

   ifconfig_bridge0_ipv6="inet6 auto_linklocal"

- if_bridge(4) now removes IPv6 addresses on a member interface to be
  added when the parent interface or one of the existing member
  interfaces has an IPv6 address.  if_bridge(4) merges each link-local
  scope zone which the member interfaces form respectively, so it causes
  address scope violation.  Removal of the IPv6 addresses prevents it.

- if_lagg(4) now removes IPv6 addresses on a member interfaces
  unconditionally.

- Set reasonable flags to non-IPv6-capable interfaces. [*]

Submitted by:	rpaulo [*]
MFC after:	1 week
2013-07-02 16:58:15 +00:00
Gleb Smirnoff
47e8d432d5 Add const qualifier to the dst parameter of the ifnet if_output method. 2013-04-26 12:50:32 +00:00
Kevin Lo
b3dcd51dde Clean up some unused leftover code.
Pointed out by:	ae
2013-03-22 01:45:54 +00:00
Navdeep Parhar
63a97a4040 Generate lle_event in the IPv6 neighbor discovery code too.
Reviewed by:	bz@
2013-01-26 00:05:22 +00:00
Navdeep Parhar
f31b83e118 Avoid NULL dereference in nd6_storelladdr when no mbuf is provided. It
is called this way from a couple of places in the OFED code.  (toecore
calls it too but that's going to change shortly).

Reviewed by:	bz@
2013-01-25 23:11:13 +00:00
Gleb Smirnoff
b1ec2940af Fix problem in r238990. The LLE_LINKED flag should be tested prior to
entering llentry_free(), and in case if we lose the race, we should simply
perform LLE_FREE_LOCKED(). Otherwise, if the race is lost by the thread
performing arptimer(), it will remove two references from the lle instead
of one.

Reported by:	Ian FREISLICH <ianf clue.co.za>
2012-12-13 11:11:15 +00:00
Andrey V. Elsukov
73cb2f38f2 Reduce the overhead of locking, use IF_AFDATA_RLOCK() when we are doing
simple lookups.

Sponsored by:	Yandex LLC
MFC after:	1 week
2012-11-16 12:12:02 +00:00
Xin LI
6f56329a25 Remove __P.
Submitted by:	kevlo
Reviewed by:	md5(1)
MFC after:	2 months
2012-10-22 21:49:56 +00:00