Commit Graph

129 Commits

Author SHA1 Message Date
Mark Johnston
f60d595f34 Remove a bogus comment from nd6_na_input().
The splnet() call that it refers to has been removed, and a lock for the
default router list is in fact needed.

MFC after:	1 week
2016-02-12 21:01:53 +00:00
Mark Johnston
baebd3e54f Remove superfluous return statements from the neighbour discovery code.
MFC after:	1 week
2016-02-12 20:55:22 +00:00
Alexander V. Chernikov
601c0b8bcc Split in6_selectsrc() into in6_selectsrc_addr() and in6_selectsrc_socket().
in6_selectsrc() has 2 class of users: socket-based one (raw/udp/pcb/etc) and
  socket-less (ND code). The main reason for that change is inability to
  specify non-default FIB for callers w/o socket since (internally) inpcb
  is used to determine fib.

As as result, add 2 wrappers for in6_selectsrc() (making in6_selectsrc()
  static):
1) in6_selectsrc_socket() for the former class. Embed scope_ambiguous check
  along with returning hop limit when needed.
2) in6_selectsrc_addr() for the latter case. Add 'fibnum' argument and
  pass IPv6 address  w/ explicitly specified scope as separate argument.

Reviewed by:	ae (previous version)
2016-01-10 13:40:29 +00:00
Alexander V. Chernikov
9a1b64d5a0 Add rib_lookup_info() to provide API for retrieving individual route
entries data in unified format.

There are control plane functions that require information other than
  just next-hop data (e.g. individual rtentry fields like flags or
  prefix/mask). Given that the goal is to avoid rte reference/refcounting,
  re-use rt_addrinfo structure to store most rte fields. If caller wants
  to retrieve key/mask or gateway (which are sockaddrs and are allocated
  separately), it needs to provide sufficient-sized sockaddrs structures
  w/ ther pointers saved in passed rt_addrinfo.

Convert:
  * lltable new records checks (in_lltable_rtcheck(),
    nd6_is_new_addr_neighbor().
  * rtsock pre-add/change route check.
  * IPv6 NS ND-proxy check (RADIX_MPATH code was eliminated because
     1) we don't support RTF_ANNOUNCE ND-proxy for networks and there should
       not be multiple host routes for such hosts 2) if we have multiple
       routes we should inspect them (which is not done). 3) the entire idea
       of abusing KRT as storage for ND proxy seems odd. Userland programs
       should be used for that purpose).
2016-01-04 15:03:20 +00:00
Alexander V. Chernikov
357ce739b9 Remove 'struct route_int6' argument from in6_selectsrc() and
in6_selectif().

The main task of in6_selectsrc() is to return IPv6 SAS (along with
  output interface used for scope checks). No data-path code uses
  route argument for caching. The only users are icmp6 (reflect code),
  ND6 ns/na generation code. All this fucntions are control-plane, so
  there is no reason to try to 'optimize' something by passing cached
  route into to ip6_output(). Given that, simplify code by eliminating
  in6_selectsrc() 'struct route_in6' argument. Since in6_selectif() is
  used only by in6_selectsrc(), eliminate its 'struct route_in6' argument,
  too. While here, reshape rte-related code inside in6_selectif() to
  free lookup result immediately after saving all the needed fields.
2016-01-03 10:43:23 +00:00
Alexander V. Chernikov
9a7ee988b5 Use lltable_get_ifp() instead of direct access to lltable fields. 2016-01-01 12:35:33 +00:00
Alexander V. Chernikov
4fb3a8208c Implement interface link header precomputation API.
Add if_requestencap() interface method which is capable of calculating
  various link headers for given interface. Right now there is support
  for INET/INET6/ARP llheader calculation (IFENCAP_LL type request).
  Other types are planned to support more complex calculation
  (L2 multipath lagg nexthops, tunnel encap nexthops, etc..).

Reshape 'struct route' to be able to pass additional data (with is length)
  to prepend to mbuf.

These two changes permits routing code to pass pre-calculated nexthop data
  (like L2 header for route w/gateway) down to the stack eliminating the
  need for other lookups. It also brings us closer to more complex scenarios
  like transparently handling MPLS nexthops and tunnel interfaces.
  Last, but not least, it removes layering violation introduced by flowtable
  code (ro_lle) and simplifies handling of existing if_output consumers.

ARP/ND changes:
Make arp/ndp stack pre-calculate link header upon installing/updating lle
  record. Interface link address change are handled by re-calculating
  headers for all lles based on if_lladdr event. After these changes,
  arpresolve()/nd6_resolve() returns full pre-calculated header for
  supported interfaces thus simplifying if_output().
Move these lookups to separate ether_resolve_addr() function which ether
  returs error or fully-prepared link header. Add <arp|nd6_>resolve_addr()
  compat versions to return link addresses instead of pre-calculated data.

BPF changes:
Raw bpf writes occupied _two_ cases: AF_UNSPEC and pseudo_AF_HDRCMPLT.
Despite the naming, both of there have ther header "complete". The only
  difference is that interface source mac has to be filled by OS for
  AF_UNSPEC (controlled via BIOCGHDRCMPLT). This logic has to stay inside
  BPF and not pollute if_output() routines. Convert BPF to pass prepend data
  via new 'struct route' mechanism. Note that it does not change
  non-optimized if_output(): ro_prepend handling is purely optional.
Side note: hackish pseudo_AF_HDRCMPLT is supported for ethernet and FDDI.
  It is not needed for ethernet anymore. The only remaining FDDI user is
  dev/pdq mostly untouched since 2007. FDDI support was eliminated from
  OpenBSD in 2013 (sys/net/if_fddisubr.c rev 1.65).

Flowtable changes:
  Flowtable violates layering by saving (and not correctly managing)
  rtes/lles. Instead of passing lle pointer, pass pointer to pre-calculated
  header data from that lle.

Differential Revision:	https://reviews.freebsd.org/D4102
2015-12-31 05:03:27 +00:00
Steven Hartland
d6e82913c1 Revert r292275 & r292379
glebius has concerns about these changes so reverting those can be discussed
and addressed.

Sponsored by:	Multiplay
2015-12-17 14:41:30 +00:00
Steven Hartland
3a909afe8e Fix issues introduced by r292275
* Fix panic for etherswitches which don't have a LLADDR.
* Disabled DELAY in unsolicited NDA, which needs further work.
* Fixed missing DELAY in carp_send_na.
* style(9) fix.

Reported by:	kp & melifaro
X-MFC-With:	r292275
MFC after:	1 month
Sponsored by:	Multiplay
2015-12-16 22:26:28 +00:00
Steven Hartland
52e53e2de0 Fix lagg failover due to missing notifications
When using lagg failover mode neither Gratuitous ARP (IPv4) or Unsolicited
Neighbour Advertisements (IPv6) are sent to notify other nodes that the
address may have moved.

This results is slow failover, dropped packets and network outages for the
lagg interface when the primary link goes down.

We now use the new if_link_state_change_cond with the force param set to
allow lagg to force through link state changes and hence fire a
ifnet_link_event which are now monitored by rip and nd6.

Upon receiving these events each protocol trigger the relevant
notifications:
* inet4 => Gratuitous ARP
* inet6 => Unsolicited Neighbour Announce

This also fixes the carp IPv6 NA's that stopped working after r251584 which
added the ipv6_route__llma route.

The new behavour can be controlled using the sysctls:
* net.link.ether.inet.arp_on_link
* net.inet6.icmp6.nd6_on_link

Also removed unused param from lagg_port_state and added descriptions for the
sysctls while here.

PR:		156226
MFC after:	1 month
Sponsored by:	Multiplay
Differential Revision:	https://reviews.freebsd.org/D4111
2015-12-15 16:02:11 +00:00
Alexander V. Chernikov
12cb7521c2 Remove LLE read lock from IPv6 fast path.
LLE structure is mostly unchanged during its lifecycle: there are only 2
things relevant for fast path lookup code:
1) link-level address change. Since r286722, these updates are performed
  under AFDATA WLOCK.
2) Some sort of feedback indicating that this particular entry is used so
  we send NS to perform reachability verification instead of expiring entry.
  The only signal that is needed from fast path is something like binary
  yes/no.
The latter is solved by the following changes:

Special r_skip_req (introduced in D3688) value is used for fast path feedback.
  It is read lockless by fast path, but updated under req_mutex mutex. If this
  field is non-zero, then fast path will acquire lock and set it back to 0.

After transitioning to STALE state, callout timer is armed to run each
  V_nd6_delay seconds to make sure that if packet was transmitted at the start
  of given interval, we would be able to switch to PROBE state in V_nd6_delay
  seconds as user expects.
(in STALE state) timer is rescheduled until original V_nd6_gctimer expires
  keeping lle in STALE state (remaining timer value stored in lle_remtime).
(in STALE state) timer is rescheduled if packet was transmitted less that
  V_nd6_delay seconds ago to make sure we transition to PROBE state exactly
  after V_n6_delay seconds.

As a result, all packets towards lle in REACHABLE/STALE/PROBE states are handled
  by fast path without acquiring lle read lock.

Differential Revision:		https://reviews.freebsd.org/D3780
2015-12-13 07:39:49 +00:00
Alexander V. Chernikov
ddd208f7ad Unify setting lladdr for AF_INET[6]. 2015-11-07 11:12:00 +00:00
Hiroki Sato
6401c828ce - Schedule DAD for IN6_IFF_TENTATIVE addresses in nd6_timer(). This
catches cases that DAD probes cannot be sent because of
  IFF_UP && !IFF_DRV_RUNNING.

- nd6_dad_starttimer() now calls nd6_dad_ns_output(), instead of
  calling it before nd6_dad_starttimer().

- Do not release an entry in dadq when a duplicate entry is being
  added.
2015-10-03 12:09:12 +00:00
Alexander V. Chernikov
1558cb2448 Eliminate nd6_nud_hint() and its TCP bindings.
Initially function was introduced in r53541 (KAME initial commit) to
  "provide hints from upper layer protocols that indicate a connection
  is making "forward progress"" (quote from RFC 2461 7.3.1 Reachability
  Confirmation).
However, it was converted to do nothing (e.g. just return) in r122922
  (tcp_hostcache implementation) back in 2003. Some defines were moved
  to tcp_var.h in r169541. Then, it was broken (for non-corner cases)
  by r186119 (L2<>L3 split) in 2008 (NULL ifp in nd6_lookup). So,
  right now this code is broken and has no "real" base users.

Differential Revision:	https://reviews.freebsd.org/D3699
2015-09-27 05:29:34 +00:00
Alexander V. Chernikov
aa5f023eaf Unify nd6 state switching by using newly-created nd6_llinfo_setstate()
function. The change is mostly mechanical with the following exception:
Last piece of nd6_resolve_slow() was refactored: ND6_LLINFO_PERMANENT
  condition was removed as always-true, explicit ND6_LLINFO_NOSTATE ->
  ND6_LLINFO_INCOMPLETE state transition was removed as duplicate.

Reviewed by:	ae
Sponsored by:	Yandex LLC
2015-09-21 11:19:53 +00:00
Alexander V. Chernikov
26deb8826c Do not pass lle to nd6_ns_output(). Use newly-added
nd6_llinfo_get_holdsrc() to extract desired IPv6 source
  from holdchain and pass it to the nd6_ns_output().
2015-09-05 14:14:03 +00:00
Hiroki Sato
fb583bd228 - Deprecate IN6_IFF_NODAD. It was used to prevent DAD on a loopback
interface but in6if_do_dad() already had a check for IFF_LOOPBACK.

- Remove in6if_do_dad() check in in6_broadcast_ifa().  An address
  which needs DAD always has IN6_IFF_TENTATIVE there.

- in6if_do_dad() now returns EAGAIN when the interface is not ready
  since DAD callout handler ignores such an interface.

- In DAD callout handler, mark an address as IN6_IFF_TENTATIVE
  when the interface has ND6_IFF_IFDISABLED.  And Do IFF_UP and
  IFF_DRV_RUNNING check consistently when DAD is required.

- draft-ietf-6man-enhanced-dad is now published as RFC 7527.

- Fix some typos.
2015-08-24 05:21:49 +00:00
Alexander V. Chernikov
314294de5c Store addresses instead of sockaddrs inside llentry.
This permits us having all (not fully true yet) all the info
needed in lookup process in first 64 bytes of 'struct llentry'.

struct llentry layout:
BEFORE:
[rwlock .. state .. state .. MAC ] (lle+1) [sockaddr_in[6]]
AFTER
[ in[6]_addr MAC .. state .. rwlock ]

Currently, address part of struct llentry has only 16 bytes for the key.
However, lltable does not restrict any custom lltable consumers with long
keys use the previous approach (store key at (lle+1)).

Sponsored by:	Yandex LLC
2015-08-11 09:26:11 +00:00
Hiroki Sato
59333867ff - Remove ND6_IFF_IGNORELOOP. This functionality was useless in practice
because a link where looped back NS messages are permanently observed
  does not work with either NDP or ARP for IPv4.

- draft-ietf-6man-enhanced-dad is now RFC 7527.

Discussed with:	hiren
MFC after:	3 days
2015-05-12 03:31:57 +00:00
Gleb Smirnoff
0fa5aacd8b Remove #ifdef IFT_FOO.
Submitted by:	Guy Yur <guyyur gmail.com>
2015-05-02 20:31:27 +00:00
Andrey V. Elsukov
efb19cf6db Fix the check for maximum mbuf's size needed to send ND6 NA and NS.
It is acceptable that the size can be equal to MCLBYTES. In the later
KAME's code this check has been moved under DIAGNOSTIC ifdef, because
the size of NA and NS is much smaller than MCLBYTES. So, it is safe to
replace the check with KASSERT.

PR:		199304
Discussed with:	glebius
MFC after:	1 week
2015-04-09 12:57:58 +00:00
Andrey V. Elsukov
ff9f2a36de To avoid a possible race, release the reference to ifa after return
from nd6_dad_na_input().

Submitted by:	Alexandre Martins
MFC after:	1 week
2015-03-19 00:04:25 +00:00
Hiroki Sato
23e9ffb0e1 - Implement loopback probing state in enhanced DAD algorithm.
- Add no_dad and ignoreloop per-IF knob.  no_dad disables DAD completely,
  and ignoreloop is to prevent infinite loop in loopback probing state when
  loopback is permanently expected.
2015-03-05 21:27:49 +00:00
Andrey V. Elsukov
1eef8a6c08 Create nd6_ns_output_fib() function with extra argument fibnum. Use it
to initialize mbuf's fibnum. Uninitialized fibnum value can lead to
panic in the routing code. Currently we use only RT_DEFAULT_FIB value
for initialization.

Differential Revision:	https://reviews.freebsd.org/D1998
Reviewed by:	hrs (previous version)
Sponsored by:	Yandex LLC
2015-03-03 10:50:03 +00:00
Hiroki Sato
8d56075939 Nonce has to be non-NULL for DAD even if net.inet6.ip6.dad_enhanced=0. 2015-03-03 04:28:19 +00:00
Hiroki Sato
11d8451df3 Implement Enhanced DAD algorithm for IPv6 described in
draft-ietf-6man-enhanced-dad-13.

This basically adds a random nonce option (RFC 3971) to NS messages
for DAD probe to detect a looped back packet.  This looped back packet
prevented DAD on some pseudo-interfaces which aggregates multiple L2 links
such as lagg(4).

The length of the nonce is set to 6 bytes.  This algorithm can be disabled by
setting net.inet6.ip6.dad_enhanced sysctl to 0 in a per-vnet basis.

Reported by:		hiren
Reviewed by:		ae
Differential Revision:	https://reviews.freebsd.org/D1835
2015-03-02 17:30:26 +00:00
Alexander V. Chernikov
d7968c29ec * Use newly-created nd6_grab_holdchain() function to retrieve lle
hold mbuf chain instead of calling full-blown nd6_output_lle()
  for each packet. This simplifies both callers and nd6_output_lle()
  implementation.
* Make nd6_output_lle() static and remove now-unused lle and chain
  arguments.
* Rename nd6_output_flush() -> nd6_flush_holdchain() to be consistent.
* Move all pre-send transmit hooks to newly-created nd6_output_ifp().
  Now nd6_output(), nd6_output_lle() and nd6_flush_holdchain() are using
  it to send mbufs to if_output.
* Remove SeND hook from nd6_na_input() because it was implemented
  incorrectly since the beginning (r211501):
  - it tagged initial input mbuf (m) instead of m_hold
  - tagging _all_ mbufs in holdchain seems to be wrong anyway.
2015-01-08 18:02:05 +00:00
Robert Watson
ed6a66ca6c To ease changes to underlying mbuf structure and the mbuf allocator, reduce
the knowledge of mbuf layout, and in particular constants such as M_EXT,
MLEN, MHLEN, and so on, in mbuf consumers by unifying various alignment
utility functions (M_ALIGN(), MH_ALIGN(), MEXT_ALIGN() in a single
M_ALIGN() macro, implemented by a now-inlined m_align() function:

- Move m_align() from uipc_mbuf.c to mbuf.h; mark as __inline.
- Reimplement M_ALIGN(), MH_ALIGN(), and MEXT_ALIGN() using m_align().
- Update consumers around the tree to simply use M_ALIGN().

This change eliminates a number of cases where mbuf consumers must be aware
of whether or not mbufs returned by the allocator use external storage, but
also assumptions about the size of the returned mbuf. This will make it
easier to introduce changes in how we use external storage, as well as
features such as variable-size mbufs.

Differential Revision:	https://reviews.freebsd.org/D1436
Reviewed by:	glebius, trasz, gnn, bz
Sponsored by:	EMC / Isilon Storage Division
2015-01-05 09:58:32 +00:00
Mark Johnston
a37271c3b8 Revert r275695: nd6_dad_find() was already correct.
Reported by:	ae, kib
Pointy hat to:	markj
2014-12-11 09:16:45 +00:00
Mark Johnston
97712e3efc Fix a bug in r266857: nd6_dad_find() must return NULL if it doesn't find
a matching element in the DAD queue.

Reported by:	Holger Hans Peter Freyther <holger@freyther.de>
MFC after:	3 days
2014-12-11 00:41:54 +00:00
Mark Johnston
d6ad6a865a Add refcounting to IPv6 DAD objects and simplify the DAD code to fix a
number of races which could cause double frees or use-after-frees when
performing DAD on an address. In particular, an IPv6 address can now only be
marked as a duplicate from the DAD callout.

Differential Revision:	https://reviews.freebsd.org/D1258
Reviewed by:		ae, hrs
Reported by:		rstone
MFC after:		1 month
2014-12-08 04:44:40 +00:00
Alexander V. Chernikov
e0c0711e01 * Make nd6_dad_duplicated() constant.
* Simplify refcounting by using nd6_dad_add() / nd6_dad_del().

Reviewed by:	ae
MFC after:	2 weeks
Sponsored by:	Yandex LLC
2014-11-10 16:01:39 +00:00
Alexander V. Chernikov
9c9bde01d1 Remove unused 'struct route *' argument from nd6_output_flush(). 2014-11-09 16:20:27 +00:00
Hiroki Sato
82a9fa4a1d Add rwlock to struct dadq. A panic could occur when a large number of
addresses performed DAD at the same time.
2014-05-29 20:53:53 +00:00
Hiroki Sato
705bef548a Cancel DAD for an ifa when the ifp has ND6_IFF_IFDISABLED as early as
possible and do not clear IN6_IFF_TENTATIVE.  If IFDISABLED was accidentally
set after a DAD started, TENTATIVE could be cleared because no NA was
received due to IFDISABLED, and as a result it could prevent DAD when
manually clearing IFDISABLED after that.
2014-05-16 15:53:31 +00:00
Alexander V. Chernikov
cf58751a44 Use "hash" value in rtalloc_mpath_fib() instead of RTF_ANNOUNCE flag.
Hashing method is the same as in in6_src.c. (Probably we need better one).

MFC after:	2 weeks
2014-04-26 16:46:33 +00:00
Andrey V. Elsukov
ea0c377602 lla_lookup() does modification only when LLE_CREATE is specified.
Thus we can use IF_AFDATA_RLOCK() instead of IF_AFDATA_LOCK() when doing
lla_lookup() without LLE_CREATE flag.

Reviewed by:	glebius, adrian
MFC after:	1 week
Sponsored by:	Yandex LLC
2014-01-02 08:40:37 +00:00
Andrey V. Elsukov
ee674966f4 Fix panic with RADIX_MPATH, when RTFREE_LOCKED() called for already
unlocked route. Use in6_rtalloc() instead of in6_rtalloc1. This helps
simplify the code and remove several now unused variables.

PR:		156283
MFC after:	2 weeks
2013-11-11 12:49:00 +00:00
Gleb Smirnoff
10e5acc3c6 - Use m_getcl() instead of hand allocating.
- Do not calculate constant length values at run time,
  CTASSERT() their sanity.
- Remove superfluous cleaning of mbuf fields after allocation.
- Replace compat macros with function calls.

Sponsored by:	Nginx, Inc.
2013-03-15 13:48:53 +00:00
Navdeep Parhar
63a97a4040 Generate lle_event in the IPv6 neighbor discovery code too.
Reviewed by:	bz@
2013-01-26 00:05:22 +00:00
Gleb Smirnoff
eb1b1807af Mechanically substitute flags from historic mbuf allocator with
malloc(9) flags within sys.

Exceptions:

- sys/contrib not touched
- sys/mbuf.h edited manually
2012-12-05 08:04:20 +00:00
Andre Oppermann
c9b652e3e8 Mechanically remove the last stray remains of spl* calls from net*/*.
They have been Noop's for a long time now.
2012-10-18 13:57:24 +00:00
Gleb Smirnoff
bf9840512a When ip_output()/ip6_output() is supplied a struct route *ro argument,
it skips FLOWTABLE lookup. However, the non-NULL ro has dual meaning
here: it may be supplied to provide route, and it may be supplied to
store and return to caller the route that ip_output()/ip6_output()
finds. In the latter case skipping FLOWTABLE lookup is pessimisation.

The difference between struct route filled by FLOWTABLE and filled
by rtalloc() family is that the former doesn't hold a reference on
its rtentry. Reference is hold by flow entry, and it is about to
be released in future. Thus, route filled by FLOWTABLE shouldn't
be passed to RTFREE() macro.

- Introduce new flag for struct route/route_in6, that marks route
  not holding a reference on rtentry.
- Introduce new macro RO_RTFREE() that cleans up a struct route
  depending on its kind.
- All callers to ip_output()/ip6_output() that do supply non-NULL
  but empty route should use RO_RTFREE() to free results of
  lookup.
- ip_output()/ip6_output() now do FLOWTABLE lookup always when
  ro->ro_rt == NULL.

Tested by:	tuexen (SCTP part)
2012-07-04 07:37:53 +00:00
Bjoern A. Zeeb
1b46c7f832 Allow to provide a hint to in6_selectsrc() for the interface using the
return ifnet double pointer.
Pass that hint down to in6_selectif() to be used when i) the default FIB
is queried and ii) route lookup fails because the network is not present
(i.e. someone deleted the connected subnet).
This hint should not be generally used from anywhere outside the neighbor
discovery code.  We just make use of it from nd6_ns_output().

Extend the nd6_na_output() interface by a nd6_na_output_fib() version
and pass the FIB number from the NS mbuf on to NA to allow the new mbuf
to inherit the FIB tag and a later lookup from ip6_output() to succeed
in the aformentioned example case.
Provide a wrapper function for the old public interface also used from
CARP but mark it with BURN_BRIDGES to cleanup in HEAD after MFC.

Sponsored by:	Cisco Systems, Inc.
2012-02-14 11:51:32 +00:00
Bjoern A. Zeeb
81d5d46b3c Add multi-FIB IPv6 support to the core network stack supplementing
the original IPv4 implementation from r178888:

- Use RT_DEFAULT_FIB in the IPv4 implementation where noticed.
- Use rt*fib() KPI with explicit RT_DEFAULT_FIB where applicable in
  the NFS code.
- Use the new in6_rt* KPI in TCP, gif(4), and the IPv6 network stack
  where applicable.
- Split in6_rtqtimo() and in6_mtutimo() as done in IPv4 and equally
  prevent multiple initializations of callouts in in6_inithead().
- Use wrapper functions where needed to preserve the current KPI to
  ease MFCs.  Use BURN_BRIDGES to indicate expected future cleanup.
- Fix (related) comments (both technical or style).
- Convert to rtinit() where applicable and only use custom loops where
  currently not possible otherwise.
- Multicast group, most neighbor discovery address actions and faith(4)
  are locked to the default FIB.  Individual IPv6 addresses will only
  appear in the default FIB, however redirect information and prefixes
  of connected subnets are automatically propagated to all FIBs by
  default (mimicking IPv4 behavior as closely as possible).

Sponsored by:	Cisco Systems, Inc.
2012-02-03 13:08:44 +00:00
Gleb Smirnoff
08b68b0e4c A major overhaul of the CARP implementation. The ip_carp.c was started
from scratch, copying needed functionality from the old implemenation
on demand, with a thorough review of all code. The main change is that
interface layer has been removed from the CARP. Now redundant addresses
are configured exactly on the interfaces, they run on.

The CARP configuration itself is, as before, configured and read via
SIOCSVH/SIOCGVH ioctls. A new prefix created with SIOCAIFADDR or
SIOCAIFADDR_IN6 may now be configured to a particular virtual host id,
which makes the prefix redundant.

ifconfig(8) semantics has been changed too: now one doesn't need
to clone carpXX interface, he/she should directly configure a vhid
on a Ethernet interface.

To supply vhid data from the kernel to an application the getifaddrs(8)
function had been changed to pass ifam_data with each address. [1]

The new implementation definitely closes all PRs related to carp(4)
being an interface, and may close several others. It also allows
to run a single redundant IP per interface.

Big thanks to Bjoern Zeeb for his help with inet6 part of patch, for
idea on using ifam_data and for several rounds of reviewing!

PR:		kern/117000, kern/126945, kern/126714, kern/120130, kern/117448
Reviewed by:	bz
Submitted by:	bz [1]
2011-12-16 12:16:56 +00:00
Gleb Smirnoff
d5378bb633 Use TAILQ_FOREACH() in the nd6_dad_find() instead of hand-rolled implementation. 2011-10-13 13:33:23 +00:00
Hiroki Sato
e7fa8d0ada - Accept Router Advertisement messages even when net.inet6.ip6.forwarding=1.
- A new per-interface knob IFF_ND6_NO_RADR and sysctl IPV6CTL_NO_RADR.
  This controls if accepting a route in an RA message as the default route.
  The default value for each interface can be set by net.inet6.ip6.no_radr.
  The system wide default value is 0.

- A new sysctl: net.inet6.ip6.norbit_raif.  This controls if setting R-bit in
  NA on RA accepting interfaces.  The default is 0 (R-bit is set based on
  net.inet6.ip6.forwarding).

Background:

 IPv6 host/router model suggests a router sends an RA and a host accepts it for
 router discovery.  Because of that, KAME implementation does not allow
 accepting RAs when net.inet6.ip6.forwarding=1.  Accepting RAs on a router can
 make the routing table confused since it can change the default router
 unintentionally.

 However, in practice there are cases where we cannot distinguish a host from
 a router clearly.  For example, a customer edge router often works as a host
 against the ISP, and as a router against the LAN at the same time.  Another
 example is a complex network configurations like an L2TP tunnel for IPv6
 connection to Internet over an Ethernet link with another native IPv6 subnet.
 In this case, the physical interface for the native IPv6 subnet works as a
 host, and the pseudo-interface for L2TP works as the default IP forwarding
 route.

Problem:

 Disabling processing RA messages when net.inet6.ip6.forwarding=1 and
 accepting them when net.inet6.ip6.forward=0 cause the following practical
 issues:

 - A router cannot perform SLAAC.  It becomes a problem if a box has
   multiple interfaces and you want to use SLAAC on some of them, for
   example.  A customer edge router for IPv6 Internet access service
   using an IPv6-over-IPv6 tunnel sometimes needs SLAAC on the
   physical interface for administration purpose; updating firmware
   and so on (link-local addresses can be used there, but GUAs by
   SLAAC are often used for scalability).

 - When a host has multiple IPv6 interfaces and it receives multiple RAs on
   them, controlling the default route is difficult.  Router preferences
   defined in RFC 4191 works only when the routers on the links are
   under your control.

Details of Implementation Changes:

 Router Advertisement messages will be accepted even when
 net.inet6.ip6.forwarding=1.  More precisely, the conditions are as
 follow:

 (ACCEPT_RTADV && !NO_RADR && !ip6.forwarding)
	=> Normal RA processing on that interface. (as IPv6 host)

 (ACCEPT_RTADV && (NO_RADR || ip6.forwarding))
	=> Accept RA but add the router to the defroute list with
	   rtlifetime=0 unconditionally.  This effectively prevents
	   from setting the received router address as the box's
	   default route.

 (!ACCEPT_RTADV)
	=> No RA processing on that interface.

 ACCEPT_RTADV and NO_RADR are per-interface knob.  In short, all interface
 are classified as "RA-accepting" or not.  An RA-accepting interface always
 processes RA messages regardless of ip6.forwarding.  The difference caused by
 NO_RADR or ip6.forwarding is whether the RA source address is considered as
 the default router or not.

 R-bit in NA on the RA accepting interfaces is set based on
 net.inet6.ip6.forwarding.  While RFC 6204 W-1 rule (for CPE case) suggests
 a router should disable the R-bit completely even when the box has
 net.inet6.ip6.forwarding=1, I believe there is no technical reason with
 doing so.  This behavior can be set by a new sysctl net.inet6.ip6.norbit_raif
 (the default is 0).

Usage:

 # ifconfig fxp0 inet6 accept_rtadv
	=> accept RA on fxp0
 # ifconfig fxp0 inet6 accept_rtadv no_radr
	=> accept RA on fxp0 but ignore default route information in it.
 # sysctl net.inet6.ip6.norbit_no_radr=1
	=> R-bit in NAs on RA accepting interfaces will always be set to 0.
2011-06-06 02:14:23 +00:00
Bjoern A. Zeeb
0105c5eb47 Fix IPv6 ND. After r219562 we in nd6_ns_input() were erroneously always
passing the cached proxydl reference (sockaddr_dl initialized or not) to
nd6_na_output().  nd6_na_output() will thus assume a proxy NA.  Revert to
conditionally passing either &proxydl or NULL if no proxy case desired.

Tested by:	ipv6gw and ref9-i386
Reported by:	Pete French (petefrench ingresso.co.uk on stable)
Reported by:	bz, simon on Y! cluster
Reported by:	kib
PR:		kern/151908
MFC after:	3 days
2011-04-17 16:07:08 +00:00
Jeff Roberson
e4cd31dd3c - Merge changes to the base system to support OFED. These include
a wider arg2 for sysctl, updates to vlan code, IFT_INFINIBAND,
   and other miscellaneous small features.
2011-03-21 09:40:01 +00:00