Commit Graph

1105 Commits

Author SHA1 Message Date
Hiroki Sato
154d5f7321 Fix a problem that an interface unexpectedly becomes IFF_UP by
just doing "ifconfing inet6 -ifdisabled" when the interface has
ND6_IFF_AUTO_LINKLOCAL flag and no link-local address.
2011-10-16 19:46:52 +00:00
Gleb Smirnoff
d5378bb633 Use TAILQ_FOREACH() in the nd6_dad_find() instead of hand-rolled implementation. 2011-10-13 13:33:23 +00:00
Gleb Smirnoff
b590b6ae80 Restore functions in6_ifaddloop() and in6_ifremloop() that were
inlined by Qing Li in his big new-ARP commit. I am going to utilize
them in my newcarp work, and also these functions left declared
in in6_var.h for all the time they were absent.

Reviewed by:	bz
2011-10-13 13:05:36 +00:00
Qing Li
6c6aa80c9d The IFA_RTSELF instead of the IFA_ROUTE flag should be checked to
determine if a loopback route should be installed for an interface
IPv6 address. Another condition is the address must not belong to a
looopback interface.

Reviewed by:	hrs
MFC after:	3 days
2011-10-05 16:27:11 +00:00
Bjoern A. Zeeb
d7ae37140a Fix an obvious bug from r186196 shadowing a variable, not correctly
appending the new mbuf to the chain reference but possibly causing an mbuf
nextpkt loop leading to a memory used after handoff (or having been freed)
and leaking an mbuf here.

Reviewed by:	rwatson, brooks
MFC after:	3 days
2011-09-30 18:20:16 +00:00
Kip Macy
1eeb6d97d0 Make KBI changes required for future MFCing of inpcb rtentry / llentry caching.
Reviewed by:	rwatson, bz
Approved by:	re (kib)
2011-09-20 20:27:26 +00:00
Hiroki Sato
6090ab8bd6 Copy ip6po_minmtu and ip6po_prefer_tempaddr in ip6_copypktopts(). This fixes
inconsistency when options are specified by both setsockopt() and ancillary
data types.

PR:		kern/158307
Approved by:	re (bz)
2011-09-20 00:29:17 +00:00
Hiroki Sato
049087a0f3 Add $ipv6_cpe_wanif to enable functionality required for IPv6 CPE
(r225485).  When setting an interface name to it, the following
configurations will be enabled:

 1. "no_radr" is set to all IPv6 interfaces automatically.

 2. "-no_radr accept_rtadv" will be set only for $ipv6_cpe_wanif.  This is
    done just before evaluating $ifconfig_IF_ipv6 in the rc.d scripts (this
    means you can manually supersede this configuration if necessary).

 3. The node will add RA-sending routers to the default router list
    even if net.inet6.ip6.forwarding=1.

This mode is added to conform to RFC 6204 (a router which connects
the end-user network to a service provider network).  To enable
packet forwarding, you still need to set ipv6_gateway_enable=YES.

Note that accepting router entries into the default router list when
packet forwarding capability and a routing daemon are enabled can
result in messing up the routing table.  To minimize such unexpected
behaviors, "no_radr" is set on all interfaces but $ipv6_cpe_wanif.

Approved by:	re (bz)
2011-09-13 00:06:11 +00:00
Sergey Kandaurov
0ad2addc9d Fix if_addr_mtx recursion in mld6.
mld_set_version() is called only from mld_v1_input_query() and
mld_v2_input_query() both holding the if_addr_mtx lock, and then calling
into mld_v2_cancel_link_timers() acquires it the second time, which results
in mtx recursion. To avoid that, delay if_addr_mtx acquisition until after
mld_set_version() is called; while here, further reduce locking scope
to protect only the needed pieces: if_multiaddrs, in6m_lookup_locked().

PR:		kern/158426
Reported by:	Thomas <tps vr-web.de>,
		Tom Vijlbrief <tom.vijlbrief xs4all.nl>
Tested by:	Tom Vijlbrief
Reviewed by:	bz
Approved by:	re (kib)
2011-08-22 23:39:40 +00:00
Bjoern A. Zeeb
8a006adb24 Add support for IPv6 to ipfw fwd:
Distinguish IPv4 and IPv6 addresses and optional port numbers in
user space to set the option for the correct protocol family.
Add support in the kernel for carrying the new IPv6 destination
address and port.
Add support to TCP and UDP for IPv6 and fix UDP IPv4 to not change
the address in the IP header.
Add support for IPv6 forwarding to a non-local destination.
Add a regession test uitilizing VIMAGE to check all 20 possible
combinations I could think of.

Obtained from:	David Dolson at Sandvine Incorporated
		(original version for ipfw fwd IPv6 support)
Sponsored by:	Sandvine Incorporated
PR:		bin/117214
MFC after:	4 weeks
Approved by:	re (kib)
2011-08-20 17:05:11 +00:00
Bjoern A. Zeeb
90bc35de38 Add an in6_localip() helper function as in6_localaddr() is not doing what
people think: returning true for an address in any connected subnet, not
necessarily on the local machine.

Sponsored by:	Sandvine Incorporated
MFC after:	2 weeks
Approved by:	re (kib)
2011-08-20 16:43:47 +00:00
Michael Tuexen
ca85e9482a The result of a joint work between rrs@ and myself at the IETF:
* Decouple the path supervision using a separate HB timer per path.
* Add support for potentially failed state.
* Bring back RTO.min to 1 second.
* Accept packets on IP-addresses already announced via an ASCONF
* While there: do some cleanups.

Approved by: re@
MFC after: 2 months.
2011-08-03 20:21:00 +00:00
Michael Tuexen
78d9a31d3a The socket API only specifies SCTP for SOCK_SEQPACKET and
SOCK_STREAM, but not SOCK_DGRAM. So don't register it for
SOCK_DGRAM.
While there, fix some indentation.
2011-07-12 19:29:29 +00:00
Marko Zec
13e255fab7 Permit ARP to proceed for IPv4 host routes for which the gateway is the
same as the host address.  This already works fine for INET6 and ND6.

While here, remove two function pointers from struct lltable which are
only initialized but never used.

MFC after:	3 days
2011-07-08 09:38:33 +00:00
Bjoern A. Zeeb
e0bfbfce79 Update packet filter (pf) code to OpenBSD 4.5.
You need to update userland (world and ports) tools
to be in sync with the kernel.

Submitted by:	mlaier
Submitted by:	eri
2011-06-28 11:57:25 +00:00
Bjoern A. Zeeb
869052041d Add the missing call to ip6_ipsec_filtertunnel() to be able to control
whether decapsulated IPsec packets will be passed to pfil again depending
on the setting of the net.ip6.ipsec6.filtertunnel sysctl.

PR:		kern/157670
Submitted by:	Manuel Kasper (mk neon1.net)
MFC after:	2 weeks
2011-06-08 10:59:36 +00:00
Bjoern A. Zeeb
ffe8cd7b10 Correct comments and debug logging in ipsec to better match reality.
MFC after:	3 days
2011-06-08 03:02:11 +00:00
Robert Watson
52cd27cb58 Implement a CPU-affine TCP and UDP connection lookup data structure,
struct inpcbgroup.  pcbgroups, or "connection groups", supplement the
existing inpcbinfo connection hash table, which when pcbgroups are
enabled, might now be thought of more usefully as a per-protocol
4-tuple reservation table.

Connections are assigned to connection groups base on a hash of their
4-tuple; wildcard sockets require special handling, and are members
of all connection groups.  During a connection lookup, a
per-connection group lock is employed rather than the global pcbinfo
lock.  By aligning connection groups with input path processing,
connection groups take on an effective CPU affinity, especially when
aligned with RSS work placement (see a forthcoming commit for
details).  This eliminates cache line migration associated with
global, protocol-layer data structures in steady state TCP and UDP
processing (with the exception of protocol-layer statistics; further
commit to follow).

Elements of this approach were inspired by Willman, Rixner, and Cox's
2006 USENIX paper, "An Evaluation of Network Stack Parallelization
Strategies in Modern Operating Systems".  However, there are also
significant differences: we maintain the inpcb lock, rather than using
the connection group lock for per-connection state.

Likewise, the focus of this implementation is alignment with NIC
packet distribution strategies such as RSS, rather than pure software
strategies.  Despite that focus, software distribution is supported
through the parallel netisr implementation, and works well in
configurations where the number of hardware threads is greater than
the number of NIC input queues, such as in the RMI XLR threaded MIPS
architecture.

Another important difference is the continued maintenance of existing
hash tables as "reservation tables" -- these are useful both to
distinguish the resource allocation aspect of protocol name management
and the more common-case lookup aspect.  In configurations where
connection tables are aligned with hardware hashes, it is desirable to
use the traditional lookup tables for loopback or encapsulated traffic
rather than take the expense of hardware hashes that are hard to
implement efficiently in software (such as RSS Toeplitz).

Connection group support is enabled by compiling "options PCBGROUP"
into your kernel configuration; for the time being, this is an
experimental feature, and hence is not enabled by default.

Subject to the limited MFCability of change dependencies in inpcb,
and its change to the inpcbinfo init function signature, this change
in principle could be merged to FreeBSD 8.x.

Reviewed by:    bz
Sponsored by:   Juniper Networks, Inc.
2011-06-06 12:55:02 +00:00
Hiroki Sato
23be782526 Do not activate automatic LL addr configuration when 0/1->1 transition of
ND6_IFF_IFDISABLED flag.
2011-06-06 04:12:57 +00:00
Hiroki Sato
77bc49858c - Make the code more proactively clear an ND6_IFF_IFDISABLED flag when
an explicit action for INET6 configuration happens.  The changes are:

  1. When an ND6 flag is changed via SIOCSIFINFO_FLAGS ioctl,
     setting ND6_IFF_ACCEPT_RTADV and/or ND6_IFF_AUTO_LINKLOCAL now triggers
     an attempt to clear the ND6_IFF_IFDISABLED flag.

  2. When an AF_INET6 address is added successfully to an interface and
     it is marked as ND6_IFF_IFDISABLED, an attempt to clear the
     ND6_IFF_IFDISABLED happens.

  This simplifies ND6_IFF_IFDISABLED flag manipulation by users via ifconfig(8);
  in most cases manual configuration is no longer needed.

- When ND6_IFF_AUTO_LINKLOCAL is set and no link-local address is assigned to
  an interface, SIOCSIFINFO_FLAGS ioctl now calls in6_ifattach() to configure
  a link-local address.

  This change ensures link-local address configuration when "ifconfig IF inet6"
  command is invoked.  For example, "ifconfig IF inet6 auto_linklocal" now
  always try to configure an LL addr even if ND6_IFF_AUTO_LINKLOCAL is already
  set to 1 (i.e. down/up cycle is no longer needed).

Reviewed by:	bz
2011-06-06 02:37:38 +00:00
Hiroki Sato
e7fa8d0ada - Accept Router Advertisement messages even when net.inet6.ip6.forwarding=1.
- A new per-interface knob IFF_ND6_NO_RADR and sysctl IPV6CTL_NO_RADR.
  This controls if accepting a route in an RA message as the default route.
  The default value for each interface can be set by net.inet6.ip6.no_radr.
  The system wide default value is 0.

- A new sysctl: net.inet6.ip6.norbit_raif.  This controls if setting R-bit in
  NA on RA accepting interfaces.  The default is 0 (R-bit is set based on
  net.inet6.ip6.forwarding).

Background:

 IPv6 host/router model suggests a router sends an RA and a host accepts it for
 router discovery.  Because of that, KAME implementation does not allow
 accepting RAs when net.inet6.ip6.forwarding=1.  Accepting RAs on a router can
 make the routing table confused since it can change the default router
 unintentionally.

 However, in practice there are cases where we cannot distinguish a host from
 a router clearly.  For example, a customer edge router often works as a host
 against the ISP, and as a router against the LAN at the same time.  Another
 example is a complex network configurations like an L2TP tunnel for IPv6
 connection to Internet over an Ethernet link with another native IPv6 subnet.
 In this case, the physical interface for the native IPv6 subnet works as a
 host, and the pseudo-interface for L2TP works as the default IP forwarding
 route.

Problem:

 Disabling processing RA messages when net.inet6.ip6.forwarding=1 and
 accepting them when net.inet6.ip6.forward=0 cause the following practical
 issues:

 - A router cannot perform SLAAC.  It becomes a problem if a box has
   multiple interfaces and you want to use SLAAC on some of them, for
   example.  A customer edge router for IPv6 Internet access service
   using an IPv6-over-IPv6 tunnel sometimes needs SLAAC on the
   physical interface for administration purpose; updating firmware
   and so on (link-local addresses can be used there, but GUAs by
   SLAAC are often used for scalability).

 - When a host has multiple IPv6 interfaces and it receives multiple RAs on
   them, controlling the default route is difficult.  Router preferences
   defined in RFC 4191 works only when the routers on the links are
   under your control.

Details of Implementation Changes:

 Router Advertisement messages will be accepted even when
 net.inet6.ip6.forwarding=1.  More precisely, the conditions are as
 follow:

 (ACCEPT_RTADV && !NO_RADR && !ip6.forwarding)
	=> Normal RA processing on that interface. (as IPv6 host)

 (ACCEPT_RTADV && (NO_RADR || ip6.forwarding))
	=> Accept RA but add the router to the defroute list with
	   rtlifetime=0 unconditionally.  This effectively prevents
	   from setting the received router address as the box's
	   default route.

 (!ACCEPT_RTADV)
	=> No RA processing on that interface.

 ACCEPT_RTADV and NO_RADR are per-interface knob.  In short, all interface
 are classified as "RA-accepting" or not.  An RA-accepting interface always
 processes RA messages regardless of ip6.forwarding.  The difference caused by
 NO_RADR or ip6.forwarding is whether the RA source address is considered as
 the default router or not.

 R-bit in NA on the RA accepting interfaces is set based on
 net.inet6.ip6.forwarding.  While RFC 6204 W-1 rule (for CPE case) suggests
 a router should disable the R-bit completely even when the box has
 net.inet6.ip6.forwarding=1, I believe there is no technical reason with
 doing so.  This behavior can be set by a new sysctl net.inet6.ip6.norbit_raif
 (the default is 0).

Usage:

 # ifconfig fxp0 inet6 accept_rtadv
	=> accept RA on fxp0
 # ifconfig fxp0 inet6 accept_rtadv no_radr
	=> accept RA on fxp0 but ignore default route information in it.
 # sysctl net.inet6.ip6.norbit_no_radr=1
	=> R-bit in NAs on RA accepting interfaces will always be set to 0.
2011-06-06 02:14:23 +00:00
Hiroki Sato
7de7a90404 Use uint8_t for sockaddr sa_len.
Reviewed by:	bz
2011-06-05 11:40:30 +00:00
Robert Watson
d3c1f00350 Add _mbuf() variants of various inpcb-related interfaces, including lookup,
hash install, etc.  For now, these are arguments are unused, but as we add
RSS support, we will want to use hashes extracted from mbufs, rather than
manually calculated hashes of header fields, due to the expensive of the
software version of Toeplitz (and similar hashes).

Add notes that it would be nice to be able to pass mbufs into lookup
routines in pf(4), optimising firewall lookup in the same way, but the
code structure there doesn't facilitate that currently.

(In principle there is no reason this couldn't be MFCed -- the change
extends rather than modifies the KBI.  However, it won't be useful without
other previous possibly less MFCable changes.)

Reviewed by:    bz
Sponsored by:   Juniper Networks, Inc.
2011-06-04 16:33:06 +00:00
Robert Watson
fa046d8774 Decompose the current single inpcbinfo lock into two locks:
- The existing ipi_lock continues to protect the global inpcb list and
  inpcb counter.  This lock is now relegated to a small number of
  allocation and free operations, and occasional operations that walk
  all connections (including, awkwardly, certain UDP multicast receive
  operations -- something to revisit).

- A new ipi_hash_lock protects the two inpcbinfo hash tables for
  looking up connections and bound sockets, manipulated using new
  INP_HASH_*() macros.  This lock, combined with inpcb locks, protects
  the 4-tuple address space.

Unlike the current ipi_lock, ipi_hash_lock follows the individual inpcb
connection locks, so may be acquired while manipulating a connection on
which a lock is already held, avoiding the need to acquire the inpcbinfo
lock preemptively when a binding change might later be required.  As a
result, however, lookup operations necessarily go through a reference
acquire while holding the lookup lock, later acquiring an inpcb lock --
if required.

A new function in_pcblookup() looks up connections, and accepts flags
indicating how to return the inpcb.  Due to lock order changes, callers
no longer need acquire locks before performing a lookup: the lookup
routine will acquire the ipi_hash_lock as needed.  In the future, it will
also be able to use alternative lookup and locking strategies
transparently to callers, such as pcbgroup lookup.  New lookup flags are,
supplementing the existing INPLOOKUP_WILDCARD flag:

  INPLOOKUP_RLOCKPCB - Acquire a read lock on the returned inpcb
  INPLOOKUP_WLOCKPCB - Acquire a write lock on the returned inpcb

Callers must pass exactly one of these flags (for the time being).

Some notes:

- All protocols are updated to work within the new regime; especially,
  TCP, UDPv4, and UDPv6.  pcbinfo ipi_lock acquisitions are largely
  eliminated, and global hash lock hold times are dramatically reduced
  compared to previous locking.
- The TCP syncache still relies on the pcbinfo lock, something that we
  may want to revisit.
- Support for reverting to the FreeBSD 7.x locking strategy in TCP input
  is no longer available -- hash lookup locks are now held only very
  briefly during inpcb lookup, rather than for potentially extended
  periods.  However, the pcbinfo ipi_lock will still be acquired if a
  connection state might change such that a connection is added or
  removed.
- Raw IP sockets continue to use the pcbinfo ipi_lock for protection,
  due to maintaining their own hash tables.
- The interface in6_pcblookup_hash_locked() is maintained, which allows
  callers to acquire hash locks and perform one or more lookups atomically
  with 4-tuple allocation: this is required only for TCPv6, as there is no
  in6_pcbconnect_setup(), which there should be.
- UDPv6 locking remains significantly more conservative than UDPv4
  locking, which relates to source address selection.  This needs
  attention, as it likely significantly reduces parallelism in this code
  for multithreaded socket use (such as in BIND).
- In the UDPv4 and UDPv6 multicast cases, we need to revisit locking
  somewhat, as they relied on ipi_lock to stablise 4-tuple matches, which
  is no longer sufficient.  A second check once the inpcb lock is held
  should do the trick, keeping the general case from requiring the inpcb
  lock for every inpcb visited.
- This work reminds us that we need to revisit locking of the v4/v6 flags,
  which may be accessed lock-free both before and after this change.
- Right now, a single lock name is used for the pcbhash lock -- this is
  undesirable, and probably another argument is required to take care of
  this (or a char array name field in the pcbinfo?).

This is not an MFC candidate for 8.x due to its impact on lookup and
locking semantics.  It's possible some of these issues could be worked
around with compatibility wrappers, if necessary.

Reviewed by:    bz
Sponsored by:   Juniper Networks, Inc.
2011-05-30 09:43:55 +00:00
Bjoern A. Zeeb
8d5a3ca77b Add FEATURE() definitions for IPv4 and IPv6 so that we can use
feature_present(3) to dynamically decide whether to use one or the
other family.

Reviewed by:	gnn
Sponsored by:	The FreeBSD Foundation
Sponsored by:	iXsystems
MFC after:	10 days
2011-05-25 00:34:25 +00:00
Robert Watson
68e0d7e06a Move from passing a wildcard boolean to a general set up lookup flags into
in_pcb_lport(), in_pcblookup_local(), and in_pcblookup_hash(), and similarly
for IPv6 functions.  In the future, we would like to support other flags
relating to locking strategy.

This change doesn't appear to modify the KBI in practice, as callers already
passed in INPLOOKUP_WILDCARD rather than a simple boolean.

MFC after:      3 weeks
Reviewed by:    bz
Sponsored by:   Juniper Networks, Inc.
2011-05-23 15:23:18 +00:00
Qing Li
5b84dc789a The statically configured (permanent) ARP entries are removed when an
interface is brought down, even though the interface address is still
valid. This patch maintains the permanent ARP entries as long as the
interface address (having the same prefix as that of the ARP entries)
is valid.

Reviewed by:	delphij
MFC after:	5 days
2011-05-20 19:12:20 +00:00
Michael Tuexen
274b0bd51d Remove code with any effect. 2011-05-03 20:34:02 +00:00
Michael Tuexen
e6194c2ed4 Improve compilation of SCTP code without INET support.
Some bugs where fixed while doing this:
* ASCONF-ACK messages might use wrong port number when using
  IPv6.
* Checking for additional addresses takes the correct address
  into account and also does not do more comparisons than
  necessary.

This patch is based on one received from bz@ who was
sponsored by The FreeBSD Foundation and iXsystems.

MFC after: 1 week
2011-04-30 11:18:16 +00:00
Bjoern A. Zeeb
79288c112c Make the UDP code compile without INET. Expose udp_usrreq.c to IPv6 only
as well compiling out most functions adding or extending #ifdef INET
coverage.

Reviewed by:	gnn
Sponsored by:	The FreeBSD Foundation
Sponsored by:	iXsystems
MFC after:	4 days
2011-04-30 11:17:00 +00:00
Bjoern A. Zeeb
67107f4594 Make the PCB code compile without INET support by adding #ifdef INETs
and correcting few #includes.

Reviewed by:	gnn
Sponsored by:	The FreeBSD Foundation
Sponsored by:	iXsystems
MFC after:	4 days
2011-04-30 11:04:34 +00:00
Bjoern A. Zeeb
db178eb816 Make IPsec compile without INET adding appropriate #ifdef checks.
Unfold the IPSEC_COMMON_INPUT_CB() macro in xform_{ah,esp,ipcomp}.c
to not need three different versions depending on INET, INET6 or both.

Mark two places preparing for not yet supported functionality with IPv6.

Reviewed by:	gnn
Sponsored by:	The FreeBSD Foundation
Sponsored by:	iXsystems
MFC after:	4 days
2011-04-27 19:28:42 +00:00
Bernd Walter
cae54c668c correct variable type name in comment 2011-04-25 09:00:52 +00:00
Bjoern A. Zeeb
1024547144 MFp4 CH=191760,191770:
Not compiling in and not initializing from inetsw from in_proto.c for
IPv6 only, we need to initialize upper layer protocols from inet6sw.
Make sure to not initialize them twice in a Dual-Stack
environment but only conditionally on no INET as we have done for
TCP for a long time.  Otherwise we would leak resources.

Reviewed by:	gnn
Sponsored by:	The FreeBSD Foundation
Sponsored by:	iXsystems
MFC after:	3 days
2011-04-20 08:05:23 +00:00
Bjoern A. Zeeb
0105c5eb47 Fix IPv6 ND. After r219562 we in nd6_ns_input() were erroneously always
passing the cached proxydl reference (sockaddr_dl initialized or not) to
nd6_na_output().  nd6_na_output() will thus assume a proxy NA.  Revert to
conditionally passing either &proxydl or NULL if no proxy case desired.

Tested by:	ipv6gw and ref9-i386
Reported by:	Pete French (petefrench ingresso.co.uk on stable)
Reported by:	bz, simon on Y! cluster
Reported by:	kib
PR:		kern/151908
MFC after:	3 days
2011-04-17 16:07:08 +00:00
Bjoern A. Zeeb
e2a4005dcc Remove a check in udp6_send() that prevented v4-mapped v6 addresses from
working.  We store v4 and v6 addresses as a union but for v4-mapped
addresses only store the 32bits w/o the ::ffff: word.  That failed the
check as for example 127.0.0.1 would be ::7f00:1 rather than ::ffff:7f00:1
and the IN6_IS_ADDR_V4MAPPED() never worked here.  Given we can hardly get
here with an unbound local address or invalid inp_vflags remove the check.

Reported by:	tuexen
Reviewed by:	tuexen
MFC after:	3 days
2011-04-09 02:22:49 +00:00
Bjoern A. Zeeb
9537bb47b7 After r219579 and r219779 unbreak v4-mapped v6 sockets for UDP
some more.  Similar to what we do for TCP check for v4-mapped
addresses and then handle them or the normal v6 address case.
For either set inp_vflags before calling into the pcb connect
function so that we have an unambiguous view in case we need to
set the local address or port.

Looked at:	tuexen (as part of more)
MFC after:	3 days
2011-04-09 01:29:46 +00:00
Jeff Roberson
e4cd31dd3c - Merge changes to the base system to support OFED. These include
a wider arg2 for sysctl, updates to vlan code, IFT_INFINIBAND,
   and other miscellaneous small features.
2011-03-21 09:40:01 +00:00
Bjoern A. Zeeb
efc76f729a Merge the two identical implementations for local port selections from
in_pcbbind_setup() and in6_pcbsetport() in a single in_pcb_lport().

MFC after:	2 weeks
2011-03-12 21:46:37 +00:00
Bjoern A. Zeeb
4a2b25621f Push a possible "unbind" in some situation from in6_pcbsetport() to
callers.  This also fixes a problem when the prison call could set
the inp->in6p_laddr (laddr) and a following priv_check_cred() call
would return an error and will allow us to merge the IPv4 and IPv6
implementation.

MFC after:	2 weeks
2011-03-12 16:45:15 +00:00
Bjoern A. Zeeb
8b529ca61e Make sure the locally cached value of rt->rt_gateway stays stable,
even after dropping the reference and unlocking. Previously we
have dereferenced a NULL pointer (after r121765).
Simply unlocking after the block does not work either because of
lock ordering (see r121765) and in addition we would still hold
a pointer to something that might be gone by the time we access it.
Thus take a copy of the value rather than just caching the pointer.

PR:		kern/151908
Submitted by:	chenyl (netstar2008 126.com) (initial version)
MFC after:	2 weeks
2011-03-12 09:41:25 +00:00
Rebecca Cran
6bccea7c2b Fix typos - remove duplicate "the".
PR:	bin/154928
Submitted by:	Eitan Adler <lists at eitanadler.com>
MFC after: 	3 days
2011-02-21 09:01:34 +00:00
Michael Tuexen
4c97400f86 Fix bugs related to M_FLOWID:
* Store the flowid when receiving an SCTP/IPv6 packet.
* Store the flowid when receiving an SCTP packet with wrong CRC.
* Initilize flowid correctly.
* Put test code under INVARIANTS.
MFC after: 3 months.
2011-02-07 15:04:23 +00:00
Randall Stewart
5d40cf5d23 1) Typo correction in comments and one spacing change.
2) Mass update to all copyrights.
MFC after:	3 Months
2011-02-05 12:12:51 +00:00
Michael Tuexen
7c99d56fdf Improve plausibility check in sctp_handle_sack().
Allow cmt_on_off to support values 0 (no CMT), 1 (CMT), and 2 (CMT/RP).

MFC after: 3 months.
2010-12-22 17:59:38 +00:00
John Hay
e9a23b5585 Add IFT_L2VLAN to the list that is capable of supplying the ingredients
of the EUI64 part of an IPv6 address. Otherwise vlans will all use the
MAC address of the first ethernet interface of the system.

MFC after:	1 week
2010-12-22 11:58:31 +00:00
Bjoern A. Zeeb
1d5089c2c2 Loosen the locking in nd6-free() again after r216022 to avoid
a LOR and a recursed lock.

Reported by:	delphij
Tested by:	delphij
PR:		kern/148857
MFC After:	3 days
2010-12-07 22:43:29 +00:00
Bjoern A. Zeeb
e6950476b9 Plug well observed races on la_hold entries with the callout handler.
Call the handler function with the lock held, return unlocked as we
might free the entry.  Rework functions later in the call graph to be
either called with the lock held or, only if needed, unlocked.

Place asserts to document and tighten assumptions on various lle locking,
which were not always true before.

We call nd6_ns_output() unlocked and the assignment of ip6->ip6_src was
decentralized to minimize possible complexity introduced with the formerly
missing locking there.  This also resulted in a push down of local
variable scopes into smaller blocks.

Reported by:	many
PR:		kern/148857
Submitted by:	Dmitrij Tejblum (tejblum yandex-team.ru) (original version)
MFC After:	4 days
2010-11-29 00:04:08 +00:00
Rebecca Cran
6d79f3f6ae Fix more continuous/contiguous typos (cf. r215955) 2010-11-27 21:51:39 +00:00
Dimitry Andric
3e288e6238 After some off-list discussion, revert a number of changes to the
DPCPU_DEFINE and VNET_DEFINE macros, as these cause problems for various
people working on the affected files.  A better long-term solution is
still being considered.  This reversal may give some modules empty
set_pcpu or set_vnet sections, but these are harmless.

Changes reverted:

------------------------------------------------------------------------
r215318 | dim | 2010-11-14 21:40:55 +0100 (Sun, 14 Nov 2010) | 4 lines

Instead of unconditionally emitting .globl's for the __start_set_xxx and
__stop_set_xxx symbols, only emit them when the set_vnet or set_pcpu
sections are actually defined.

------------------------------------------------------------------------
r215317 | dim | 2010-11-14 21:38:11 +0100 (Sun, 14 Nov 2010) | 3 lines

Apply the STATIC_VNET_DEFINE and STATIC_DPCPU_DEFINE macros throughout
the tree.

------------------------------------------------------------------------
r215316 | dim | 2010-11-14 21:23:02 +0100 (Sun, 14 Nov 2010) | 2 lines

Add macros to define static instances of VNET_DEFINE and DPCPU_DEFINE.
2010-11-22 19:32:54 +00:00
Bjoern A. Zeeb
8987b01ea9 In case of an early return from the function there is no need to zero
the route upfront, so defer as long as we can.

MFC after:	3 days
2010-11-20 12:27:40 +00:00
Bjoern A. Zeeb
683525038b Do not initialize flag variables before needed.
Consistently use the LLE_ prefix for lla_lookup() and the ND6_ prefix
for nd6_lookup() even though both are defined the same. Use the right
flag variable when checking each.

No real functional change.

MFC after:	4 days
2010-11-17 10:43:20 +00:00
Bjoern A. Zeeb
20723e34e3 No need to re-initialize the callout. We initially do it in in6_lltable_new()
right after allocation.  Worse, we are losing the right flags here.

MFC after:	4 days
2010-11-17 09:25:08 +00:00
Dimitry Andric
31c6a0037e Apply the STATIC_VNET_DEFINE and STATIC_DPCPU_DEFINE macros throughout
the tree.
2010-11-14 20:38:11 +00:00
Bjoern A. Zeeb
4a85b5e2ea Make the IPsec SADB embedded route cache a union to be able to hold both the
legacy and IPv6 route destination address.
Previously in case of IPv6, there was a memory overwrite due to not enough
space for the IPv6 address.

PR:		kern/122565
MFC After:	2 weeks
2010-10-23 20:35:40 +00:00
Rui Paulo
4c88812572 Purposely tell the compiler that we ignore the return value of ADDCARRY()
in the REDUCE macro.

Reviewed by:	dim, rdivacky
2010-10-13 10:45:22 +00:00
Xin LI
64e0f48e7c Add a bandaid for a long-standing race condition during route entry
un-expiring.

The previous version of code have no locking when testing rt_refcnt.
The result of the lack of locking may result in a condition where
a routing entry have a reference count but at the same time have
RTPRF_OURS bit set and an expiration timer.  These would eventually
lead to a panic:

	panic: rtqkill route really not free

When the system have ICMP redirects accepted from local gateway
in a moderate frequency, for instance.

Commit this workaround for now until we have some better solution.

PR:		kern/149804
Reviewed by:	bz
Tested by:	Zhao Xin, Pete French
MFC after:	2 weeks
2010-09-27 19:26:56 +00:00
Attilio Rao
5f6bf4518d IP_BINDANY is not correctly handled in getsockopt() case.
Fix it by specifying the correct bits.

Sponsored by:	Sandvine Incorporated
Reviewed by:	bz, emaste, rstone
Obtained from:	Sandvine Incorporated
MFC after:	10 days
2010-09-24 14:38:54 +00:00
Michael Tuexen
15537f41b4 Remove unused variables.
MFC after: 2 weeks.
2010-09-15 20:41:20 +00:00
Bjoern A. Zeeb
1b48d24533 MFp4 CH=183052 183053 183258:
In protosw we define pr_protocol as short, while on the wire
  it is an uint8_t.  That way we can have "internal" protocols
  like DIVERT, SEND or gaps for modules (PROTO_SPACER).
  Switch ipproto_{un,}register to accept a short protocol number(*)
  and do an upfront check for valid boundries. With this we
  also consistently report EPROTONOSUPPORT for out of bounds
  protocols, as we did for proto == 0.  This allows a caller
  to not error for this case, which is especially important
  if we want to automatically call these from domain handling.

  (*) the functions have been without any in-tree consumer
  since the initial introducation, so this is considered save.

  Implement ip6proto_{un,}register() similarly to their legacy IP
  counter parts to allow modules to hook up dynamically.

Reviewed by:	philip, will
MFC after:	1 week
2010-09-02 17:43:44 +00:00
Michael Tuexen
9c7635e18b Fix the the SCTP_WITH_NO_CSUM option when used in combination with
interface supporting CRC offload. While at it, make use of the
feature that the loopback interface provides CRC offloading.

MFC after: 4 weeks
2010-08-29 18:50:30 +00:00
Michael Tuexen
20083c2eb1 Fix the switching on/off of CMT using sysctl and socket option.
Fix the switching on/off of PF and NR-SACKs using sysctl.
Add minor improvement in handling malloc failures.
Improve the address checks when sending.

MFC after: 4 weeks
2010-08-28 17:59:51 +00:00
Hajimu UMEMOTO
365ccde0fb optp may be NULL. 2010-08-20 17:52:49 +00:00
Ana Kukec
e7a6db7467 Fix mbuf leakages and remove unneccessary duplicate mbuf frees.
Use the right copy of an mbuf for the IP6_EXTHDR_CHECK.

Reported by:	zec, hrs
Approved by:	bz (mentor)
2010-08-19 23:16:44 +00:00
Ana Kukec
1db8d1f843 MFp4: anchie_soc2009 branch:
Add kernel side support for Secure Neighbor Discovery (SeND), RFC 3971.

The implementation consists of a kernel module that gets packets from
the nd6 code, sends them to user space on a dedicated socket and reinjects
them back for further processing.

Hooks are used from nd6 code paths to divert relevant packets to the
send implementation for processing in user space.  The hooks are only
triggered if the send module is loaded. In case no user space
application is connected to the send socket, processing continues
normaly as if the module would not be loaded. Unloading the module
is not possible at this time due to missing nd6 locking.

The native SeND socket is similar to a raw IPv6 socket but with its own,
internal pseudo-protocol.

Approved by:	bz (mentor)
2010-08-19 11:31:03 +00:00
Hajimu UMEMOTO
388288b202 Make `ping6 -I' work with net.inet6.ip6.use_defaultzone=1.
MFC after:	2 weeks
2010-08-17 17:30:56 +00:00
Bjoern A. Zeeb
8c09aa57d9 In rip6_input(), in case of multicast, we might skip the normal processing
and go to the next iteration early if multicast filtering would decide that
this socket shall not receive the data.
Unlock the pcb in that case or we leak the read lock and next time trying
to get a write lock, would hang forever.

PR:		kern/149608
Submitted by:	Chris Luke (chrisy flirble.org)
MFC after:	3 days
2010-08-14 14:13:44 +00:00
Will Andrews
9963e8a52c Unbreak LINT by moving all carp hooks to net/if.c / netinet/ip_carp.h, with
the appropriate ifdefs.

Reviewed by:	bz
Approved by:	ken (mentor)
2010-08-11 20:18:19 +00:00
Will Andrews
54bfbd5153 Allow carp(4) to be loaded as a kernel module. Follow precedent set by
bridge(4), lagg(4) etc. and make use of function pointers and
pf_proto_register() to hook carp into the network stack.

Currently, because of the uncertainty about whether the unload path is free
of race condition panics, unloads are disallowed by default.  Compiling with
CARPMOD_CAN_UNLOAD in CFLAGS removes this anti foot shooting measure.

This commit requires IP6PROTOSPACER, introduced in r211115.

Reviewed by:	bz, simon
Approved by:	ken (mentor)
MFC after:	2 weeks
2010-08-11 00:51:50 +00:00
Bjoern A. Zeeb
4f7495d32a MFp4 CH180235:
Add proto spacers to inet6sw like we have for legacy IP. This allows us
to dynamically pf_proto_register() for INET6 from modules, needed by
upcoming CARP changes and SeND.
MC and SCTP could make use of it as well in theory in the future after
upcoming VIMAGE vnet teardown work.

Discussed with:	will, anchie
MFC after:	10 days
2010-08-09 19:53:24 +00:00
Bjoern A. Zeeb
19291ab3de Document the mandatory argument to the arptimer() and
nd6_llinfo_timer() functions with a KASSERT().
Note: there is no need to return after panic.

In the legacy IP case, only assign the arg after the check,
in the IPv6 case, remove the extra checks for the table and
interface as they have to be there unless we freed and forgot
to cancel the timer.  It doesn't matter anyway as we would
panic on the NULL pointer deref immediately and the bug is
elsewhere.
This unifies the code of both address families to some extend.

Reviewed by:	rwatson
MFC after:	6 days
2010-07-31 21:33:18 +00:00
Bjoern A. Zeeb
101235dcb3 Since r186119 IP6 input counters for octets and packets were not
working anymore.  In addition more checks and operations were missing.

In case lla_lookup results in a match, get the ifaddr to update the
statistics counters, and check that the address is neither tentative,
duplicate or otherwise invalid before accepting the packet.  If ok,
record the address information in the mbuf.  [ as is done in case
lla_lookup does not return a result and we go through the FIB ].

Reported by:	remko
Tested by:	remko
MFC after:	2 weeks
2010-07-21 13:01:21 +00:00
Alfred Perlstein
8e96292d91 Fix our version of IPv6 address representation.
We do not respect rules 3 and 4 in the required list:

1. omit leading zeros

2. "::" used to their maximum extent whenever possible

3. "::" used where shortens address the most

4. "::" used in the former part in case of a tie breaker

5. do not shorten one 16 bit 0 field

6. use lower case

http://tools.ietf.org/html/draft-ietf-6man-text-addr-representation-04.html

Submitted by: Kalluru Abhiram @ Juniper Networks
Obtained from: Juniper Networks
Reviewed by: hrs, dougb
2010-05-19 00:35:47 +00:00
Kip Macy
83e711ec14 allocate ipv6 flows from the ipv6 flow zone
reported by: rrs@

MFC after:	3 days
2010-05-16 21:48:39 +00:00
Kip Macy
94162961c6 do a proper fix
Pointed out by: np@

MFC after:	3 days
2010-05-13 19:47:36 +00:00
Kip Macy
fc21c49a0f fix compile error on some builds by doing the equivalent of
an "extern VNET_DEFINE" without "__used"

MFC after:	3 days
2010-05-13 19:36:13 +00:00
Kip Macy
1f93b77267 try working around panic by validating rt and lle
MFC after:	3 days
2010-05-12 03:29:11 +00:00
Kip Macy
693810835d boot time size the flowtable
MFC after:	3 days
2010-05-10 21:31:20 +00:00
Kip Macy
77931dd513 Add flowtable support to IPv6
Tested by: qingli@

Reviewed by:	qingli@
MFC after:	3 days
2010-05-09 20:32:00 +00:00
Bjoern A. Zeeb
82cea7e6f3 MFP4: @176978-176982, 176984, 176990-176994, 177441
"Whitspace" churn after the VIMAGE/VNET whirls.

Remove the need for some "init" functions within the network
stack, like pim6_init(), icmp_init() or significantly shorten
others like ip6_init() and nd6_init(), using static initialization
again where possible and formerly missed.

Move (most) variables back to the place they used to be before the
container structs and VIMAGE_GLOABLS (before r185088) and try to
reduce the diff to stable/7 and earlier as good as possible,
to help out-of-tree consumers to update from 6.x or 7.x to 8 or 9.

This also removes some header file pollution for putatively
static global variables.

Revert VIMAGE specific changes in ipfilter::ip_auth.c, that are
no longer needed.

Reviewed by:	jhb
Discussed with:	rwatson
Sponsored by:	The FreeBSD Foundation
Sponsored by:	CK Software GmbH
MFC after:	6 days
2010-04-29 11:52:42 +00:00
Bjoern A. Zeeb
7a657e630d Enhance the historic behaviour of raw sockets and jails in a way
that we allow all possible jail IPs as source address rather than
forcing the "primary". While IPv6 naturally has source address
selection, for legacy IP we do not go through the pain in case
IP_HDRINCL was not set. People should bind(2) for that.

This will, for example, allow ping(|6) -S to work correctly for
non-primary addresses.

Reported by:	(ten 211.ru)
Tested by:	(ten 211.ru)
MFC after:	4 days
2010-04-27 15:07:08 +00:00
Bjoern A. Zeeb
877fc3b64b Make sure IPv6 source address selection does not change interface
addresses while walking the IPv6 address list if in the jail case
something is connecting to ::1.

Reported by:	Pieter de Boer (pieter thedarkside.nl)
Tested by:	Pieter de Boer (pieter thedarkside.nl)
MFC after:	4 days
2010-04-27 15:05:03 +00:00
Konstantin Belousov
99c750a814 Provide 32bit compat for SIOCGDEFIFACE_IN6.
Based on submission by:	pluknet gmail com
Reviewed by:	emaste
MFC after:	2 weeks
2010-04-27 09:47:14 +00:00
Bjoern A. Zeeb
becba438d2 Plug reference leaks in the link-layer code ("new-arp") that previously
prevented the link-layer entry from being freed.

In both in.c and in6.c (though that code path seems to be basically dead)
plug a reference leak in case of a pending callout being drained.

In if_ether.c consistently add a reference before resetting the callout
and in case we canceled a pending one remove the reference for that.
In the final case in arptimer, before freeing the expired entry, remove
the reference again and explicitly call callout_stop() to clear the active
flag.

In nd6.c:nd6_free() we are only ever called from the callout function and
thus need to remove the reference there as well before calling into
llentry_free().

In if_llatbl.c when freeing entire tables make sure that in case we cancel
a pending callout to remove the reference as well.

Reviewed by:		qingli (earlier version)
MFC after:		10 days
Problem observed, patch tested by: simon on ipv6gw.f.o,
			Christian Kratzer (ck cksoft.de),
			Evgenii Davidov (dado korolev-net.ru)
PR:			kern/144564
Configurations still affected:	with options FLOWTABLE
2010-04-11 16:04:08 +00:00
Bruce M Simpson
f1014c074d When embedding the scope ID in MLDv1 output, check if the scope of the address
being embedded is in fact link-local, before attempting to embed it.

Note that this operation is a side-effect of trying to avoid recursion on
the IN6 scope lock.

PR:		144560
Submitted by:	Petr Lampa
MFC after:	3 days
2010-04-10 12:24:21 +00:00
Michael Tuexen
b5c164935e * Fix some race condition in SACK/NR-SACK processing.
* Fix handling of mapping arrays when draining mbufs or processing
  FORWARD-TSN chunks.
* Cleanup code (no duplicate code anymore for SACKs and NR-SACKs).
Part of this code was developed together with rrs.
MFC after: 2 weeks.
2010-04-03 15:40:14 +00:00
Bjoern A. Zeeb
d715e397f0 We are holding a write lock here so avoid aquiring it twice calling
the "locked" version rather than the wrapper function.

MFC after:	6 days
2010-03-25 10:29:00 +00:00
Randall Stewart
1966e5b5a1 The proper fix for the delayed SCTP checksum is to
have the delayed function take an argument as to the offset
to the SCTP header. This allows it to work for V4 and V6.
This of course means changing all callers of the function
to either pass the header len, if they have it, or create
it (ip_hl << 2 or sizeof(ip6_hdr)).
PR:		144529
MFC after:	2 weeks
2010-03-12 22:58:52 +00:00
Randall Stewart
9b03990a13 With the recent change of the sctp checksum to support offload,
no delayed checksum was added to the ip6 output code. This
causes cards that do not support SCTP checksum offload to
have SCTP packets that are IPv6 NOT have the sctp checksum
performed. Thus you could not communicate with a peer. This
adds the missing bits to make the checksum happen for these cards.

PR:		144529
MFC after:	2 weeks
2010-03-12 08:10:30 +00:00
Qing Li
c1752bcd65 Use reference counting instead of locking to secure an address while
that address is being used to generate temporary IPv6 address. This
approach is sufficient and avoids recursive locking.

MFC after:	3 days
2010-02-27 07:12:25 +00:00
Pawel Jakub Dawidek
ceda73974b No need to include security/mac/mac_framework.h here. 2010-02-18 22:30:37 +00:00
Bjoern A. Zeeb
681ffdf935 Correct a typo.
Submitted by:	kensmith
MFC after:	3 days
2010-01-24 10:22:39 +00:00
Bjoern A. Zeeb
4dcc55a363 Garbage collect references to the no longer implemented tcp_fasttimo().
Discussed with:	rwatson
MFC after:	5 days
2010-01-17 13:07:52 +00:00
Bjoern A. Zeeb
592bcae802 Add ip4.saddrsel/ip4.nosaddrsel (and equivalent for ip6) to control
whether to use source address selection (default) or the primary
jail address for unbound outgoing connections.

This is intended to be used by people upgrading from single-IP
jails to multi-IP jails but not having to change firewall rules,
application ACLs, ... but to force their connections (unless
otherwise changed) to the primry jail IP they had been used for
years, as well as for people prefering to implement similar policies.

Note that for IPv6, if configured incorrectly, this might lead to
scope violations, which single-IPv6 jails could as well, as by the
design of jails. [1]

Reviewed by:	jamie, hrs (ipv6 part)
Pointed out by:	hrs [1]
MFC After:	2 weeks
Asked for by:	Jase Thew (bazerka beardz.net)
2010-01-17 12:57:11 +00:00
Edward Tomasz Napierala
3745cc73d0 Replace several instances of 'if (!a & b)' with 'if (!(a &b))' in order
to silence newer GCC versions.
2010-01-08 15:44:49 +00:00
Bjoern A. Zeeb
1767c52079 Correct a typo.
Submitted by:	sn_ (sn_ gmx.net) on hackers@
MFC after:	3 days
2010-01-06 23:05:00 +00:00
Qing Li
6f1828763e The IFA_RTSELF address flag marks a loopback route has been installed
for the interface address. This marker is necessary to properly support
PPP types of links where multiple links can have the same local end
IP address. The IFA_RTSELF flag bit maps to the RTF_HOST value, which
was combined into the route flag bits during prefix installation in
IPv6. This inclusion causing the prefix route to be unusable. This
patch fixes this bug by excluding the IFA_RTSELF flag during route
installation.

MFC after:	5 days
2010-01-04 23:39:53 +00:00
Qing Li
baf7c37373 Multiple IPv6 addresses of the same prefix can be installed on the
same interface. The first address will install the prefix route into
the kernel routing table and that prefix will be marked as on-link.
Without RADIX_MPATH enabled, the other address aliases of the same
prefix will update the prefix reference count but no other routes
will be installed. Consequently the prefixes associated with these
addresses would not be marked as on-link. As such, incoming packets
destined to these address aliases will fail the ND6 on-link check
on input. This patch fixes the above problem by searching the kernel
routing table and try to find an on-link prefix on the given interface.

MFC after:	5 days
2009-12-30 21:51:23 +00:00
Qing Li
c7ab66020f The proxy arp entries could not be added into the system over the
IFF_POINTOPOINT link types. The reason was due to the routing
entry returned from the kernel covering the remote end is of an
interface type that does not support ARP. This patch fixes this
problem by providing a hint to the kernel routing code, which
indicates the prefix route instead of the PPP host route should
be returned to the caller. Since a host route to the local end
point is also added into the routing table, and there could be
multiple such instantiations due to multiple PPP links can be
created with the same local end IP address, this patch also fixes
the loopback route installation failure problem observed prior to
this patch. The reference count of loopback route to local end would
be either incremented or decremented. The first instantiation would
create the entry and the last removal would delete the route entry.

MFC after:	5 days
2009-12-30 21:35:34 +00:00
Bruce M Simpson
aa16623133 Use ALLOW_NEW_SOURCES and BLOCK_OLD_SOURCES to signal a join or leave
with SSM MLDv2 by default.
This is current practice and complies with RFC 4604, as well as being
required by production IPv6 networks in Japan.
The behaviour may be disabled by setting the net.inet6.mld.use_allow
sysctl/tunable to 0.

Requested by:	Hideki Yamamoto
MFC after:	1 week
2009-12-22 20:40:22 +00:00