Commit Graph

256 Commits

Author SHA1 Message Date
melifaro
595bcb4ce1 Unify setting lladdr for AF_INET[6]. 2015-11-07 11:12:00 +00:00
melifaro
b26720e1df Fix deletion of ifaddr lle entries when deleting prefix from interface in
down state.

Regression appeared in r287789, where the "prefix has no corresponding
  installed route" case was forgotten. Additionally, lltable_delete_addr()
  was called with incorrect byte order (default is network for lltable code).
While here, improve comments on given cases and byte order.

PR:		203573
Submitted by:	phk
2015-10-18 12:26:25 +00:00
melifaro
4fed811000 rtsock requests for deleting interface address lles started to return EPERM
instead of old "ignore-and-return 0" in r287789. This broke arp -da /
  ndp -cn behavior (they exit on rtsock command failure). Fix this by
  translating LLE_IFADDR to RTM_PINNED flag, passing it to userland and
  making arp/ndp ignore these entries in batched delete.

MFC after:	2 weeks
2015-09-27 04:54:29 +00:00
melifaro
44be0cdaa8 Unify loopback route switching:
* prepare gateway before insertion
* use RTM_CHANGE instead of explicit find/change route
* Remove fib argument from ifa_switch_loopback_route added in r264887:
  if old ifp fib differes from new one, that the caller
  is doing something wrong
* Make ifa_*_loopback_route call single ifa_maintain_loopback_route().
2015-09-16 06:23:15 +00:00
melifaro
5ad1f2444d * Do more fine-grained locking: call eventhandlers/free_entry
without holding afdata wlock
* convert per-af delete_address callback to global lltable_delete_entry() and
  more low-level "delete this lle" per-af callback
* fix some bugs/inconsistencies in IPv4/IPv6 ifscrub procedures

Sponsored by:		Yandex LLC
Differential Revision:	https://reviews.freebsd.org/D3573
2015-09-14 16:48:19 +00:00
melifaro
54b3b78856 * Split allocation and table linking for lle's.
Before that, the logic besides lle_create() was the following:
  return existing if found, create if not. This behaviour was error-prone
  since we had to deal with 'sudden' static<>dynamic lle changes.
  This commit fixes bunch of different issues like:
  - refcount leak when lle is converted to static.
    Simple check case:
    console 1:
    while true;
      do for i in `arp -an|awk '$4~/incomp/{print$2}'|tr -d '()'`;
        do arp -s $i 00:22:44:66:88:00 ; arp -d $i;
      done;
    done
   console 2:
    ping -f any-dead-host-in-L2
   console 3:
    # watch for memory consumption:
    vmstat -m | awk '$1~/lltable/{print$2}'
  - possible problems in arptimer() / nd6_timer() when dropping/reacquiring
   lock.
  New logic explicitly handles use-or-create cases in every lla_create
  user. Basically, most of the changes are purely mechanical. However,
  we explicitly avoid using existing lle's for interface/static LLE records.
* While here, call lle_event handlers on all real table lle change.
* Create lltable_free_entry() calling existing per-lltable
  lle_free_t callback for entry deletion
2015-08-20 12:05:17 +00:00
melifaro
0c24547a66 Use single 'lle_timer' callout in lltable instead of
two different names of the same timer.
2015-08-11 12:38:54 +00:00
melifaro
d8f92ce2cf Store addresses instead of sockaddrs inside llentry.
This permits us having all (not fully true yet) all the info
needed in lookup process in first 64 bytes of 'struct llentry'.

struct llentry layout:
BEFORE:
[rwlock .. state .. state .. MAC ] (lle+1) [sockaddr_in[6]]
AFTER
[ in[6]_addr MAC .. state .. rwlock ]

Currently, address part of struct llentry has only 16 bytes for the key.
However, lltable does not restrict any custom lltable consumers with long
keys use the previous approach (store key at (lle+1)).

Sponsored by:	Yandex LLC
2015-08-11 09:26:11 +00:00
melifaro
8e6b3a8d59 MFP r276712.
* Split lltable_init() into lltable_allocate_htbl() (alloc
  hash table with default callbacks) and lltable_link() (
  links any lltable to the list).
* Switch from LLTBL_HASHTBL_SIZE to per-lltable hash size field.
* Move lltable setup to separate functions in in[6]_domifattach.
2015-08-11 05:51:00 +00:00
melifaro
4f240a9c31 Partially merge r274887,r275334,r275577,r275578,r275586 to minimize
differences between projects/routing and HEAD.

This commit tries to keep code logic the same while changing underlying
code to use unified callbacks.

* Add llt_foreach_entry method to traverse all entries in given llt
* Add llt_dump_entry method to export particular lle entry in sysctl/rtsock
  format (code is not indented properly to minimize diff). Will be fixed
  in the next commits.
* Add llt_link_entry/llt_unlink_entry methods to link/unlink particular lle.
* Add llt_fill_sa_entry method to export address in the lle to sockaddr
  format.
* Add llt_hash method to use in generic hash table support code.
* Add llt_free_entry method which is used in llt_prefix_free code.

* Prepare for fine-grained locking by separating lle unlink and deletion in
  lltable_free() and lltable_prefix_free().

* Provide lltable_get<ifp|af>() functions to reduce direct 'struct lltable'
 access by external callers.

* Remove @llt agrument from lle_free() lle callback since it was unused.
* Temporarily add L3_CADDR() macro for 'const' sockaddr typecasting.
* Switch to per-af hashing code.
* Rename LLE_FREE_LOCKED() callback from in[6]_lltable_free() to
  in_[6]lltable_destroy() to avoid clashing with llt_free_entry() method.
  Update description from these functions.
* Use unified lltable_free_entry() function instead of per-af one.

Reviewed by:	ae
2015-08-10 12:03:59 +00:00
marius
fbf037b786 Fix compilation after r286457 w/o INVARIANTS or INVARIANT_SUPPORT. 2015-08-08 21:41:59 +00:00
melifaro
20bb5966e2 MFP r274553:
* Move lle creation/deletion from lla_lookup to separate functions:
  lla_lookup(LLE_CREATE) -> lla_create
  lla_lookup(LLE_DELETE) -> lla_delete
lla_create now returns with LLE_EXCLUSIVE lock for lle.
* Provide typedefs for new/existing lltable callbacks.

Reviewed by:	ae
2015-08-08 17:48:54 +00:00
ae
75425458ac Convert in_ifaddr_lock and in6_ifaddr_lock to rmlock.
Both are used to protect access to IP addresses lists and they can be
acquired for reading several times per packet. To reduce lock contention
it is better to use rmlock here.

Reviewed by:	gnn (previous version)
Obtained from:	Yandex LLC
Sponsored by:	Yandex LLC
Differential Revision:	https://reviews.freebsd.org/D3149
2015-07-29 08:12:05 +00:00
glebius
14b7122d6d Provide functions to determine presence of a given address
configured on a given interface.

Discussed with:	np
Sponsored by:	Nginx, Inc.
2015-04-17 11:57:06 +00:00
rrs
e83a007798 This fixes a bug in the way that the LLE timers for nd6
and arp were being used. They basically would pass in the
mutex to the callout_init. Because they used this method
to the callout system, it was possible to "stop" the callout.
When flushing the table and you stopped the running callout, the
callout_stop code would return 1 indicating that it was going
to stop the callout (that was about to run on the callout_wheel blocked
by the function calling the stop). Now when 1 was returned, it would
lower the reference count one extra time for the stopped timer, then
a few lines later delete the memory. Of course the callout_wheel was
stuck in the lock code and would then crash since it was accessing
freed memory. By using callout_init(c, 1) we always get a 0 back
and the reference counting bug does not rear its head. We do have
to make a few adjustments to the callouts themselves though to make
sure it does the proper thing if rescheduled as well as gets the lock.

Commented upon by hiren and sbruno
See Phabricator D1777 for more details.

Commented upon by hiren and sbruno
Reviewed by:	adrian, jhb and bz
Sponsored by:	Netflix Inc.
2015-02-09 19:28:11 +00:00
glebius
99f4ec50e8 Remove SYSCTL_VNET_* macros, and simply put CTLFLAG_VNET where needed.
Sponsored by:	Nginx, Inc.
2014-11-07 09:39:05 +00:00
hrs
a313921d44 Add a change missing in r271916. 2014-09-21 04:38:50 +00:00
delphij
3fc68a2c42 Restore historical behavior of in_control, which, when no matching address
is found, the first usable address is returned for legacy ioctls like
SIOCGIFBRDADDR, SIOCGIFDSTADDR, SIOCGIFNETMASK and SIOCGIFADDR.

While there also fix a subtle issue that a caller from a jail asking for
INADDR_ANY may get the first IP of the host that do not belong to the jail.

Submitted by:	glebius
Differential Revision: https://reviews.freebsd.org/D667
2014-08-22 19:08:12 +00:00
smh
3ae4520699 Ensure that IP's added to CARP always use the CARP MAC
Previously there was a race condition between the address addition
and associating it with the CARP which resulted in the interface
MAC, instead of the CARP MAC, being used for a brief amount of time.

This caused "is using my IP address" warnings as well as data being
sent to the wrong machine due to incorrect ARP entries being recorded
by other devices on the network.
2014-07-31 16:43:56 +00:00
smh
8d283ef60a Only check error if one could have been generated 2014-07-31 09:18:29 +00:00
glebius
d32e428cc3 Garbage collect couple of unused fields from struct ifaddr:
- ifa_claim_addr() unused since removal of NetAtalk
- ifa_metric seems to be never utilized, always a copy of if_metric
2014-07-29 15:01:29 +00:00
asomers
130691029e Fix a panic when removing an IP address from an interface, if the same address
exists on another interface.  The panic was introduced by change 264887, which
changed the fibnum parameter in the call to rtalloc1_fib() in
ifa_switch_loopback_route() from RT_DEFAULT_FIB to RT_ALL_FIBS.  The solution
is to use the interface fib in that call.  For the majority of users, that will
be equivalent to the legacy behavior.

PR:		kern/189089
Reported by:	neel
Reviewed by:	neel
MFC after:	3 weeks
X-MFC with:	264887
Sponsored by:	Spectra Logic
2014-04-29 14:46:45 +00:00
asomers
f8a34b6f49 Fix subnet and default routes on different FIBs on the same subnet.
These two bugs are closely related.  The root cause is that ifa_ifwithnet
does not consider FIBs when searching for an interface address.

sys/net/if_var.h
sys/net/if.c
	Add a fib argument to ifa_ifwithnet and ifa_ifwithdstadddr.  Those
	functions will only return an address whose interface fib equals the
	argument.

sys/net/route.c
	Update calls to ifa_ifwithnet and ifa_ifwithdstaddr with fib
	arguments.

sys/netinet/in.c
	Update in_addprefix to consider the interface fib when adding
	prefixes.  This will prevent it from not adding a subnet route when
	one already exists on a different fib.

sys/net/rtsock.c
sys/netinet/in_pcb.c
sys/netinet/ip_output.c
sys/netinet/ip_options.c
sys/netinet6/nd6.c
	Add RT_DEFAULT_FIB arguments to ifa_ifwithdstaddr and ifa_ifwithnet.
	In some cases it there wasn't a clear specific fib number to use.
	In others, I was unable to test those functions so I chose
	RT_DEFAULT_FIB to minimize divergence from current behavior.  I will
	fix some of the latter changes along with PR kern/187553.

tests/sys/netinet/fibs_test.sh
tests/sys/netinet/udp_dontroute.c
tests/sys/netinet/Makefile
	Revert r263738.  The udp_dontroute test was right all along.
	However, bugs kern/187550 and kern/187553 cancelled each other out
	when it came to this test.  Because of kern/187553, ifa_ifwithnet
	searched the default fib instead of the requested one, but because
	of kern/187550, there was an applicable subnet route on the default
	fib.  The new test added in r263738 doesn't work right, however.  I
	can verify with dtrace that ifa_ifwithnet returned the wrong address
	before I applied this commit, but route(8) miraculously found the
	correct interface to use anyway.  I don't know how.

	Clear expected failure messages for kern/187550 and kern/187552.

PR:		kern/187550
PR:		kern/187552
Reviewed by:	melifaro
MFC after:	3 weeks
Sponsored by:	Spectra Logic
2014-04-24 23:56:56 +00:00
asomers
6e7494c7e1 Fix host and network routes for new interfaces when net.add_addr_allfibs=0
sys/net/route.c
	In rtinit1, use the interface fib instead of the process fib.  The
	latter wasn't very useful because ifconfig(8) is usually invoked
	with the default process fib.  Changing ifconfig(8) to use setfib(2)
	would be redundant, because it already sets the interface fib.

tests/sys/netinet/fibs_test.sh
	Clear the expected ATF failure

sys/net/if.c
	Pass the interface fib in calls to rtrequest1_fib and rtalloc1_fib

sys/netinet/in.c
sys/net/if_var.h
	Add a fibnum argument to ifa_switch_loopback_route, a subroutine of
	in_scrubprefix.  Pass it the interface fib.

PR:		kern/187549
Reviewed by:	melifaro
MFC after:	3 weeks
Sponsored by:	Spectra Logic Corporation
2014-04-24 17:23:16 +00:00
kevlo
45fcb795ff Add support for UDP-Lite protocol (RFC 3828) to IPv4 and IPv6 stacks.
Tested with vlc and a test suite [1].

[1] http://www.erg.abdn.ac.uk/~gerrit/udp-lite/files/udplite_linux.tar.gz

Reviewed by:	jhb, glebius, adrian
2014-04-07 01:53:03 +00:00
asomers
0b79334339 Correct ARP update handling when the routes for network interfaces are
restricted to a single FIB in a multifib system.

Restricting an interface's routes to the FIB to which it is assigned (by
setting net.add_addr_allfibs=0) causes ARP updates to fail with "arpresolve:
can't allocate llinfo for x.x.x.x".  This is due to the ARP update code hard
coding it's lookup for existing routing entries to FIB 0.

sys/netinet/in.c:
	When dealing with RTM_ADD (add route) requests for an interface, use
	the interface's assigned FIB instead of the default (FIB 0).

sys/netinet/if_ether.c:
	In arpresolve(), enhance error message generated when an
	lla_lookup() fails so that the interface causing the error is
	visible in logs.

tests/sys/netinet/fibs_test.sh
	Clear ATF expected error.

PR:		kern/167947
Submitted by:	Nikolay Denev <ndenev@gmail.com> (previous version)
Reviewed by:	melifaro
MFC after:	3 weeks
Sponsored by:	Spectra Logic Corporation
2014-03-26 22:46:03 +00:00
melifaro
0092f95bcc Fix refcount leak on netinet ifa.
Reviewed by:	glebius
MFC after:	2 weeks
Sponsored by:	Yandex LLC
2014-01-16 12:35:18 +00:00
melifaro
cd97f8bba8 Simplify inet alias handling code: if we're adding/removing alias which
has the same prefix as some other alias on the same interface, use
newly-added rt_addrmsg() instead of hand-rolled in_addralias_rtmsg().

This eliminates the following rtsock messages:

Pinned RTM_ADD for prefix (for alias addition).
Pinned RTM_DELETE for prefix (for alias withdrawal).

Example (got 10.0.0.1/24 on vlan4, playing with 10.0.0.2/24):

before commit, addition:

  got message of size 116 on Fri Jan 10 14:13:15 2014
  RTM_NEWADDR: address being added to iface: len 116, metric 0, flags:
  sockaddrs: <NETMASK,IFP,IFA,BRD>
   255.255.255.0 vlan4:8.0.27.c5.29.d4 10.0.0.2 10.0.0.255

  got message of size 192 on Fri Jan 10 14:13:15 2014
  RTM_ADD: Add Route: len 192, pid: 0, seq 0, errno 0, flags:<UP,PINNED>
  locks:  inits:
  sockaddrs: <DST,GATEWAY,NETMASK>
   10.0.0.0 10.0.0.2 (255) ffff ffff ff

after commit, addition:

  got message of size 116 on Fri Jan 10 13:56:26 2014
  RTM_NEWADDR: address being added to iface: len 116, metric 0, flags:
  sockaddrs: <NETMASK,IFP,IFA,BRD>
   255.255.255.0 vlan4:8.0.27.c5.29.d4 14.0.0.2 14.0.0.255

before commit, wihdrawal:

  got message of size 192 on Fri Jan 10 13:58:59 2014
  RTM_DELETE: Delete Route: len 192, pid: 0, seq 0, errno 0, flags:<UP,PINNED>
  locks:  inits:
  sockaddrs: <DST,GATEWAY,NETMASK>
   10.0.0.0 10.0.0.2 (255) ffff ffff ff

  got message of size 116 on Fri Jan 10 13:58:59 2014
  RTM_DELADDR: address being removed from iface: len 116, metric 0, flags:
  sockaddrs: <NETMASK,IFP,IFA,BRD>
   255.255.255.0 vlan4:8.0.27.c5.29.d4 10.0.0.2 10.0.0.255

adter commit, withdrawal:

  got message of size 116 on Fri Jan 10 14:14:11 2014
  RTM_DELADDR: address being removed from iface: len 116, metric 0, flags:
  sockaddrs: <NETMASK,IFP,IFA,BRD>
   255.255.255.0 vlan4:8.0.27.c5.29.d4 10.0.0.2 10.0.0.255

Sending both RTM_ADD/RTM_DELETE messages to rtsock is completely wrong
(and requires some hacks to keep prefix in route table on RTM_DELETE).

I've tested this change with quagga (no change) and bird (*).

bird alias handling is already broken in *BSD sysdep code, so nothing
changes here, too.

I'm going to MFC this change if there will be no complains about behavior
change.

While here, fix some style(9) bugs introduced by r260488
(pointed by glebius and bde).

Sponsored by:	Yandex LLC
MFC after:	4 weeks
2014-01-10 12:13:55 +00:00
ae
941bb837f9 Add IF_AFDATA_WLOCK_ASSERT() in case lla_lookup() is called with
LLE_CREATE flag.

MFC after:	1 week
2014-01-03 02:32:05 +00:00
glebius
5bc18c5e57 Fix couple of bugs from r257692 related to scan of address list on
an interface:
- in in_control() skip over not AF_INET addresses.
- in in_aifaddr_ioctl() and in_difaddr_ioctl() do correct check
  of address family, w/o accessing memory beyond struct ifaddr.

Sponsored by:	Nginx, Inc.
2013-12-29 22:20:06 +00:00
glebius
5e103a98ba In r257692 I intentionally deleted code that handled P2P interfaces
with equal addresses on both sides. It appeared that OpenVPN uses
such configutations.

Submitted by:	trociny
2013-11-17 15:14:07 +00:00
glebius
3c1f482e0e Remove never used ioctls that originate from KAME. The proof
of their zero usage was exp-run from misc/183538.
2013-11-11 05:39:42 +00:00
glebius
1dbe9493b0 Provide compat layer for OSIOCAIFADDR. 2013-11-06 19:46:20 +00:00
glebius
d52532b483 Fix my braino in r257692. For SIOCG*ADDR we don't need exact match on
specified address, actually in most cases the address isn't specified.

Reported by:	peter
2013-11-06 08:36:08 +00:00
nwhitehorn
b6d1ab7a03 Fix build on GCC. 2013-11-06 01:14:00 +00:00
glebius
50953d29a1 Rewrite in_control(), so that it is comprehendable without getting mad.
o Provide separate functions for SIOCAIFADDR and for SIOCDIFADDR, with
  clear code flow from beginning to the end. After that the rest of
  in_control() gets very small and clear.
o Provide sx(9) lock to protect against parallel ioctl() invocations.
o Reimplement logic from r201282, that tried to keep localhost route in
  table when multiple P2P interfaces with same local address are created
  and deleted.

Discussed with:		pluknet, melifaro
Sponsored by:		Netflix
Sponsored by:		Nginx, Inc.
2013-11-05 07:44:15 +00:00
glebius
bce78dfe17 Remove net.link.ether.inet.useloopback sysctl tunable. It was always on by
default from the very beginning. It was placed in wrong namespace
net.link.ether, originally it had been at another wrong namespace. It was
incorrectly documented at incorrect manual page arp(8). Since new-ARP commit,
the tunable have been consulted only on route addition, and ignored on route
deletion. Behaviour of a system with tunable turned off is not fully correct,
and has no advantages comparing to normal behavior.
2013-11-05 07:32:09 +00:00
glebius
396f790863 Cleanup in_ifscrub(), which is just an entry to in_scrubprefix(). 2013-11-01 10:18:41 +00:00
glebius
f469ae1d45 Include necessary headers that now are available due to pollution
via if_var.h.

Sponsored by:	Netflix
Sponsored by:	Nginx, Inc.
2013-10-28 07:29:16 +00:00
glebius
564d02b304 Remove ifa_init() and provide ifa_alloc() that will allocate and setup
struct ifaddr internally.

Sponsored by:	Netflix
Sponsored by:	Nginx, Inc.
2013-10-15 10:31:42 +00:00
ae
705a50a053 Migrate structs arpstat, icmpstat, mrtstat, pimstat and udpstat to PCPU
counters.
2013-07-09 09:50:15 +00:00
oleg
9917da6df0 Plug static llentry leak (ipv4 & ipv6 were affected).
PR:		kern/172985
MFC after:	1 month
2013-04-21 21:28:38 +00:00
glebius
7f832c3059 Retire struct sockaddr_inarp.
Since ARP and routing are separated, "proxy only" entries
don't have any meaning, thus we don't need additional field
in sockaddr to pass SIN_PROXY flag.

New kernel is binary compatible with old tools, since sizes
of sockaddr_inarp and sockaddr_in match, and sa_family are
filled with same value.

The structure declaration is left for compatibility with
third party software, but in tree code no longer use it.

Reviewed by:	ru, andre, net@
2013-01-31 08:55:21 +00:00
peter
3f8d5a8f51 Temporarily revert rev 244678. This is causing loopback problems with
the lo (loopback) interfaces.
2013-01-03 10:21:28 +00:00
glebius
9f622a1b38 The SIOCSIFFLAGS ioctl handler runs if_up()/if_down() that notify
all interested parties in case if interface flag IFF_UP has changed.

  However, not only SIOCSIFFLAGS can raise the flag, but SIOCAIFADDR
and SIOCAIFADDR_IN6 can, too. The actual |= is done not in the protocol
code, but in code of interface drivers. To fix this historical layering
violation, we will check whether ifp->if_ioctl(SIOCSIFADDR) raised the
IFF_UP flag, and if it did, run the if_up() handler.

  This fixes configuring an address under CARP control on an interface
that was initially !IFF_UP.

P.S. I intentionally omitted handling the IFF_SMART flag. This flag was
never ever used in any driver since it was introduced, and since it
means another layering violation, it should be garbage collected instead
of pretended to be supported.
2012-12-25 13:01:58 +00:00
glebius
08c9fb9d97 Minor style(9) changes:
- Remove declaration in initializer.
- Add empty line between logical blocks.
2012-12-24 21:35:48 +00:00
rrs
09ab09a1b5 Though I disagree, I conceed to jhb & Rui. Note
that we still have a problem with this whole structure of
locks and in_input.c [it does not lock which it should not, but
this *can* lead to crashes]. (I have seen it in our SQA
testbed.. besides the one with a refcnt issue that I will
have SQA work on next week ;-)
2012-08-19 11:54:02 +00:00
rrs
1bcd97d239 Ok jhb, lets move the ifa_free() down to the bottom to
assure that *all* tables and such are removed before
we start to free. This won't protect the Hash in ip_input.c
but in theory should protect any other uses that *do* use locks.

MFC after:	1 week (or more)
2012-08-17 05:51:46 +00:00
rrs
7c7c85dcac Its never a good idea to double free the same
address.

MFC after:	1 week (after the other commits ahead of this gets MFC'd)
2012-08-16 17:55:16 +00:00
glebius
abf245020a Fix races between in_lltable_prefix_free(), lla_lookup(),
llentry_free() and arptimer():

o Use callout_init_rw() for lle timeout, this allows us safely
  disestablish them.
  - This allows us to simplify the arptimer() and make it
    race safe.
o Consistently use ifp->if_afdata_lock to lock access to
  linked lists in the lle hashes.
o Introduce new lle flag LLE_LINKED, which marks an entry that
  is attached to the hash.
  - Use LLE_LINKED to avoid double unlinking via consequent
    calls to llentry_free().
  - Mark lle with LLE_DELETED via |= operation istead of =,
    so that other flags won't be lost.
o Make LLE_ADDREF(), LLE_REMREF() and LLE_FREE_LOCKED() more
  consistent and provide more informative KASSERTs.

The patch is a collaborative work of all submitters and myself.

PR:		kern/165863
Submitted by:	Andrey Zonov <andrey zonov.org>
Submitted by:	Ryan Stone <rysto32 gmail.com>
Submitted by:	Eric van Gyzen <eric_van_gyzen dell.com>
2012-08-02 13:57:49 +00:00