223 Commits

Author SHA1 Message Date
rrs
dc494194a2 This fixes several places where callout_stops return is examined. The
new return codes of -1 were mistakenly being considered "true". Callout_stop
now returns -1 to indicate the callout had either already completed or
was not running and 0 to indicate it could not be stopped.  Also update
the manual page to make it more consistent no non-zero in the callout_stop
or callout_reset descriptions.

MFC after:	1 Month with associated callout change.
2015-11-13 22:51:35 +00:00
melifaro
595bcb4ce1 Unify setting lladdr for AF_INET[6]. 2015-11-07 11:12:00 +00:00
melifaro
4fed811000 rtsock requests for deleting interface address lles started to return EPERM
instead of old "ignore-and-return 0" in r287789. This broke arp -da /
  ndp -cn behavior (they exit on rtsock command failure). Fix this by
  translating LLE_IFADDR to RTM_PINNED flag, passing it to userland and
  making arp/ndp ignore these entries in batched delete.

MFC after:	2 weeks
2015-09-27 04:54:29 +00:00
melifaro
5ad1f2444d * Do more fine-grained locking: call eventhandlers/free_entry
without holding afdata wlock
* convert per-af delete_address callback to global lltable_delete_entry() and
  more low-level "delete this lle" per-af callback
* fix some bugs/inconsistencies in IPv4/IPv6 ifscrub procedures

Sponsored by:		Yandex LLC
Differential Revision:	https://reviews.freebsd.org/D3573
2015-09-14 16:48:19 +00:00
hrs
0529f0c80f Remove SIOCGDRLST_IN6 and SIOCGPRLST_IN6 forgotten in the previous commit.
MFC after:	3 days
2015-09-10 08:37:03 +00:00
hrs
e161347f7d Do not add IN6_IFF_TENTATIVE when ND6_IFF_NO_DAD.
MFC after:	3 days
2015-09-10 06:10:30 +00:00
melifaro
d013cff635 Do not skip entries without LLE_VALID flag.
This one fixes showing incomplete entries in ndp -an.

MFC after:	2 weeks
2015-09-05 06:24:00 +00:00
melifaro
3e1524c83e Make in6ifa_ifpwithaddr() take const param.
Remove unneded DECONST from in6_lltable_rtcheck().
2015-09-05 05:54:09 +00:00
melifaro
3f9699bcfe Simplify lla_rt_output()/nd6_add_ifa_lle() by setting lle state in
alloc handler, based on flags.
2015-08-31 05:03:36 +00:00
hrs
d27954934d - Deprecate IN6_IFF_NODAD. It was used to prevent DAD on a loopback
interface but in6if_do_dad() already had a check for IFF_LOOPBACK.

- Remove in6if_do_dad() check in in6_broadcast_ifa().  An address
  which needs DAD always has IN6_IFF_TENTATIVE there.

- in6if_do_dad() now returns EAGAIN when the interface is not ready
  since DAD callout handler ignores such an interface.

- In DAD callout handler, mark an address as IN6_IFF_TENTATIVE
  when the interface has ND6_IFF_IFDISABLED.  And Do IFF_UP and
  IFF_DRV_RUNNING check consistently when DAD is required.

- draft-ietf-6man-enhanced-dad is now published as RFC 7527.

- Fix some typos.
2015-08-24 05:21:49 +00:00
melifaro
54b3b78856 * Split allocation and table linking for lle's.
Before that, the logic besides lle_create() was the following:
  return existing if found, create if not. This behaviour was error-prone
  since we had to deal with 'sudden' static<>dynamic lle changes.
  This commit fixes bunch of different issues like:
  - refcount leak when lle is converted to static.
    Simple check case:
    console 1:
    while true;
      do for i in `arp -an|awk '$4~/incomp/{print$2}'|tr -d '()'`;
        do arp -s $i 00:22:44:66:88:00 ; arp -d $i;
      done;
    done
   console 2:
    ping -f any-dead-host-in-L2
   console 3:
    # watch for memory consumption:
    vmstat -m | awk '$1~/lltable/{print$2}'
  - possible problems in arptimer() / nd6_timer() when dropping/reacquiring
   lock.
  New logic explicitly handles use-or-create cases in every lla_create
  user. Basically, most of the changes are purely mechanical. However,
  we explicitly avoid using existing lle's for interface/static LLE records.
* While here, call lle_event handlers on all real table lle change.
* Create lltable_free_entry() calling existing per-lltable
  lle_free_t callback for entry deletion
2015-08-20 12:05:17 +00:00
melifaro
0c24547a66 Use single 'lle_timer' callout in lltable instead of
two different names of the same timer.
2015-08-11 12:38:54 +00:00
melifaro
d8f92ce2cf Store addresses instead of sockaddrs inside llentry.
This permits us having all (not fully true yet) all the info
needed in lookup process in first 64 bytes of 'struct llentry'.

struct llentry layout:
BEFORE:
[rwlock .. state .. state .. MAC ] (lle+1) [sockaddr_in[6]]
AFTER
[ in[6]_addr MAC .. state .. rwlock ]

Currently, address part of struct llentry has only 16 bytes for the key.
However, lltable does not restrict any custom lltable consumers with long
keys use the previous approach (store key at (lle+1)).

Sponsored by:	Yandex LLC
2015-08-11 09:26:11 +00:00
melifaro
8e6b3a8d59 MFP r276712.
* Split lltable_init() into lltable_allocate_htbl() (alloc
  hash table with default callbacks) and lltable_link() (
  links any lltable to the list).
* Switch from LLTBL_HASHTBL_SIZE to per-lltable hash size field.
* Move lltable setup to separate functions in in[6]_domifattach.
2015-08-11 05:51:00 +00:00
melifaro
4f240a9c31 Partially merge r274887,r275334,r275577,r275578,r275586 to minimize
differences between projects/routing and HEAD.

This commit tries to keep code logic the same while changing underlying
code to use unified callbacks.

* Add llt_foreach_entry method to traverse all entries in given llt
* Add llt_dump_entry method to export particular lle entry in sysctl/rtsock
  format (code is not indented properly to minimize diff). Will be fixed
  in the next commits.
* Add llt_link_entry/llt_unlink_entry methods to link/unlink particular lle.
* Add llt_fill_sa_entry method to export address in the lle to sockaddr
  format.
* Add llt_hash method to use in generic hash table support code.
* Add llt_free_entry method which is used in llt_prefix_free code.

* Prepare for fine-grained locking by separating lle unlink and deletion in
  lltable_free() and lltable_prefix_free().

* Provide lltable_get<ifp|af>() functions to reduce direct 'struct lltable'
 access by external callers.

* Remove @llt agrument from lle_free() lle callback since it was unused.
* Temporarily add L3_CADDR() macro for 'const' sockaddr typecasting.
* Switch to per-af hashing code.
* Rename LLE_FREE_LOCKED() callback from in[6]_lltable_free() to
  in_[6]lltable_destroy() to avoid clashing with llt_free_entry() method.
  Update description from these functions.
* Use unified lltable_free_entry() function instead of per-af one.

Reviewed by:	ae
2015-08-10 12:03:59 +00:00
marius
fbf037b786 Fix compilation after r286457 w/o INVARIANTS or INVARIANT_SUPPORT. 2015-08-08 21:41:59 +00:00
melifaro
20bb5966e2 MFP r274553:
* Move lle creation/deletion from lla_lookup to separate functions:
  lla_lookup(LLE_CREATE) -> lla_create
  lla_lookup(LLE_DELETE) -> lla_delete
lla_create now returns with LLE_EXCLUSIVE lock for lle.
* Provide typedefs for new/existing lltable callbacks.

Reviewed by:	ae
2015-08-08 17:48:54 +00:00
ae
75425458ac Convert in_ifaddr_lock and in6_ifaddr_lock to rmlock.
Both are used to protect access to IP addresses lists and they can be
acquired for reading several times per packet. To reduce lock contention
it is better to use rmlock here.

Reviewed by:	gnn (previous version)
Obtained from:	Yandex LLC
Sponsored by:	Yandex LLC
Differential Revision:	https://reviews.freebsd.org/D3149
2015-07-29 08:12:05 +00:00
ae
291abe56fe Invoke LLE event handler when entry is deleted.
MFC after:	2 weeks
Sponsored by:	Yandex LLC
2015-07-20 06:54:50 +00:00
ae
a1e13e9e59 Move RTM announces into generic code to be independent from Layer2 code.
This fixes bug introduced in 274988, when announces about new addresses
don't sent for tunneling interfaces.

Reported by:	tuexen@
MFC after:	1 week
2015-05-29 10:24:16 +00:00
glebius
1ca5d173ab Remove #ifdef IFT_FOO.
Submitted by:	Guy Yur <guyyur gmail.com>
2015-05-02 20:31:27 +00:00
glebius
37c56fd2f8 Fix r281649: don't call in6_clearscope() twice.
Submitted by:	ae
2015-04-17 15:26:08 +00:00
glebius
14b7122d6d Provide functions to determine presence of a given address
configured on a given interface.

Discussed with:	np
Sponsored by:	Nginx, Inc.
2015-04-17 11:57:06 +00:00
hrs
218de7f1d6 - Implement loopback probing state in enhanced DAD algorithm.
- Add no_dad and ignoreloop per-IF knob.  no_dad disables DAD completely,
  and ignoreloop is to prevent infinite loop in loopback probing state when
  loopback is permanently expected.
2015-03-05 21:27:49 +00:00
rrs
e83a007798 This fixes a bug in the way that the LLE timers for nd6
and arp were being used. They basically would pass in the
mutex to the callout_init. Because they used this method
to the callout system, it was possible to "stop" the callout.
When flushing the table and you stopped the running callout, the
callout_stop code would return 1 indicating that it was going
to stop the callout (that was about to run on the callout_wheel blocked
by the function calling the stop). Now when 1 was returned, it would
lower the reference count one extra time for the stopped timer, then
a few lines later delete the memory. Of course the callout_wheel was
stuck in the lock code and would then crash since it was accessing
freed memory. By using callout_init(c, 1) we always get a 0 back
and the reference counting bug does not rear its head. We do have
to make a few adjustments to the callouts themselves though to make
sure it does the proper thing if rescheduled as well as gets the lock.

Commented upon by hiren and sbruno
See Phabricator D1777 for more details.

Commented upon by hiren and sbruno
Reviewed by:	adrian, jhb and bz
Sponsored by:	Netflix Inc.
2015-02-09 19:28:11 +00:00
ae
e079051f88 Print IPv6 address in log message instead of address of pointer.
MFC after:	1 week
2015-02-05 16:29:26 +00:00
ae
e9254e774f Remove link-local multicast routes remnants from in6_purgeaddr.
Also merge in6_purgeaddr_mc with in6_purgeaddr.

Sponsored by:	Yandex LLC
2014-11-10 16:01:31 +00:00
glebius
f28e253225 Consistently use if_link.
Reviewed by:	ae, melifaro
2014-11-10 15:56:30 +00:00
melifaro
b5d711d3a6 Renove faith(4) and faithd(8) from base. It looks like industry
have chosen different (and more traditional) stateless/statuful
NAT64 as translation mechanism. Last non-trivial commits to both
faith(4) and faithd(8) happened more than 12 years ago, so I assume
it is time to drop RFC3142 in FreeBSD.

No objections from:	net@
2014-11-09 21:33:01 +00:00
melifaro
11af63037f Make checks for rt_mtu generic:
Some virtual if drivers has (ab)used ifa ifa_rtrequest hook to enforce
route MTU to be not bigger that interface MTU. While ifa_rtrequest hooking
might be an option in some situation, it is not feasible to do MTU checks
there: generic (or per-domain) routing code is perfectly capable of doing
this.

We currrently have 3 places where MTU is altered:

1) route addition.
 In this case domain overrides radix _addroute callback (in[6]_addroute)
 and all necessary checks/fixes are/can be done there.

2) route change (especially, GW change).
 In this case, there are no explicit per-domain calls, but one can
 override rte by setting ifa_rtrequest hook to domain handler
 (inet6 does this).

3) ifconfig ifaceX mtu YYYY
 In this case, we have no callbacks, but ip[6]_output performes runtime
 checks and decreases rt_mtu if necessary.

Generally, the goals are to be able to handle all MTU changes in
 control plane, not in runtime part, and properly deal with increased
 interface MTU.

This commit changes the following:
* removes hooks setting MTU from drivers side
* adds proper per-doman MTU checks for case 1)
* adds generic MTU check for case 2)

* The latter is done by using new dom_ifmtu callback since
 if_mtu denotes L3 interface MTU, e.g. maximum trasmitted _packet_ size.
 However, IPv6 mtu might be different from if_mtu one (e.g. default 1280)
 for some cases, so we need an abstract way to know maximum MTU size
 for given interface and domain.
* moves rt_setmetrics() before MTU/ifa_rtrequest hooks since it copies
  user-supplied data which must be checked.
* removes RT_LOCK_ASSERT() from other ifa_rtrequest hooks to be able to
  use this functions on new non-inserted rte.

More changes will follow soon.

MFC after:	1 month
Sponsored by:	Yandex LLC
2014-11-06 13:13:09 +00:00
hrs
948508e096 Fix a bug which prevented ND6_IFF_IFDISABLED flag from clearing when
the newly-added IPv6 address was /128.

PR:	188032
2014-11-02 21:58:31 +00:00
ae
b27ddf1ff3 Do not automatically install routes to link-local and interface-local multicast
addresses.

Obtained from:	Yandex LLC
Sponsored by:	Yandex LLC
2014-10-27 16:15:15 +00:00
ae
8d3b28e72b Remove unused function.
Sponsored by:	Yandex LLC
2014-10-27 10:34:09 +00:00
ae
e12a77b0cb Add const qualifier to in6_addrhash() function.
Add in6ifa_ifwithaddr() function. It is similar to ifa_ifwithaddr,
but does fast lookup in the hash of inet6 addresses.

Obtained from:	Yandex LLC
Sponsored by:	Yandex LLC
2014-09-11 13:18:41 +00:00
markj
72d1944ad2 Add some missing checks for unsupported interfaces (e.g. pflog(4)) when
handling ioctls. While here, remove duplicated checks for a NULL ifp in
in6_control(): this check is already done near the beginning of the
function.

PR:		189117
Reviewed by:	hrs
MFC after:	2 weeks
2014-08-22 19:21:08 +00:00
glebius
d32e428cc3 Garbage collect couple of unused fields from struct ifaddr:
- ifa_claim_addr() unused since removal of NetAtalk
- ifa_metric seems to be never utilized, always a copy of if_metric
2014-07-29 15:01:29 +00:00
melifaro
6ee59753ec Further rework netinet6 address handling code:
* Set ia address/mask values BEFORE attaching to address lists.
Inet6 address assignment is not atomic, so the simplest way to
do this atomically is to fill in ia before attach.
* Validate irfa->ia_addr field before use (we permit ANY sockaddr in old code).
* Do some renamings:
  in6_ifinit -> in6_notify_ifa (interaction with other subsystems is here)
  in6_setup_ifa -> in6_broadcast_ifa (LLE/Multicast/DaD code)
  in6_ifaddloop -> nd6_add_ifa_lle
  in6_ifremloop -> nd6_rem_ifa_lle
* Split working with LLE and route announce code for last two.
Add temporary in6_newaddrmsg() function to mimic current rtsock behaviour.
* Call device SIOCSIFADDR handler IFF we're adding first address.
In IPv4 we have to call it on every address change since ARP record
is installed by arp_ifinit() which is called by given handler.
IPv6 stack, on the opposite is responsible to call nd6_add_ifa_lle() so
there is no reason to call SIOCSIFADDR often.
2014-01-19 16:07:27 +00:00
melifaro
4e82296063 Add in6_prepare_ifra() function to ease preparing in-kernel IPv6
address requests.

MFC after:	2 weeks
2014-01-18 20:32:59 +00:00
melifaro
9f1142ff95 Do some style(9) not done in r260851 to improve readability.
MFC after:	2 weeks
2014-01-18 15:57:43 +00:00
melifaro
9b02dc0fae Split in6_update_ifa() into smaller pieces leaving functionality intact.
Discussed with:	ae
MFC after:	2 weeks
2014-01-18 15:52:52 +00:00
melifaro
db2be6a793 Introduce IN6_MASK_ADDR() macro to unify various hand-rolled code
to do IPv6 addr & mask in different places.

MFC after:	2 weeks
2014-01-08 22:13:32 +00:00
ae
941bb837f9 Add IF_AFDATA_WLOCK_ASSERT() in case lla_lookup() is called with
LLE_CREATE flag.

MFC after:	1 week
2014-01-03 02:32:05 +00:00
glebius
3c1f482e0e Remove never used ioctls that originate from KAME. The proof
of their zero usage was exp-run from misc/183538.
2013-11-11 05:39:42 +00:00
glebius
f469ae1d45 Include necessary headers that now are available due to pollution
via if_var.h.

Sponsored by:	Netflix
Sponsored by:	Nginx, Inc.
2013-10-28 07:29:16 +00:00
glebius
564d02b304 Remove ifa_init() and provide ifa_alloc() that will allocate and setup
struct ifaddr internally.

Sponsored by:	Netflix
Sponsored by:	Nginx, Inc.
2013-10-15 10:31:42 +00:00
des
21e6fd796b Fix the length calculation for the final block of a sendfile(2)
transmission which could be tricked into rounding up to the nearest
page size, leaking up to a page of kernel memory.  [13:11]

In IPv6 and NetATM, stop SIOCSIFADDR, SIOCSIFBRDADDR, SIOCSIFDSTADDR
and SIOCSIFNETMASK at the socket layer rather than pass them on to the
link layer without validation or credential checks.  [SA-13:12]

Prevent cross-mount hardlinks between different nullfs mounts of the
same underlying filesystem.  [SA-13:13]

Security:	CVE-2013-5666
Security:	FreeBSD-SA-13:11.sendfile
Security:	CVE-2013-5691
Security:	FreeBSD-SA-13:12.ifioctl
Security:	CVE-2013-5710
Security:	FreeBSD-SA-13:13.nullfs
Approved by:	re
2013-09-10 10:05:59 +00:00
hrs
13c1bcf2c1 - Use time_uptime instead of time_second in data structures for
PF_INET6 in kernel.  This fixes various malfunction when the wall time
  clock is changed.  Bump __FreeBSD_version to 1000041.

- Use clock_gettime(CLOCK_MONOTONIC_FAST) in userland utilities.

MFC after:	1 month
2013-08-05 20:13:02 +00:00
hrs
64e5ea0653 Allocate in6_ifextra (ifp->if_afdata[AF_INET6]) only for IPv6-capable
interfaces.  This eliminates unnecessary IPv6 processing for non-IPv6
interfaces.

MFC after:	3 days
2013-07-31 16:24:49 +00:00
ae
9bfe2ac5dd Correct the size of allocated memory to store array of counters. 2013-07-09 15:20:46 +00:00
ae
08c6719ac4 Migrate structs in6_ifstat and icmp6_ifstat to PCPU counters. 2013-07-09 09:59:46 +00:00