Commit Graph

1174 Commits

Author SHA1 Message Date
bz
2d7f5ff3a8 Plug two interface address refcount leaks in early error return cases
in the ioctl path.

Reported by:	rpaulo
 Reviewed by:	emax
MFC after:	3 days
2012-06-05 13:27:37 +00:00
emax
0985bae1a6 Plug reference leak.
Interface routes are refcounted as packets move through the stack,
and there's garbage collection tied to it so that route changes can
safely propagate while traffic is flowing. In our setup, we weren't
changing or deleting any routes, but the refcounting logic in
ip6_input() was wrong and caused a reference leak on every inbound
V6 packet. This eventually caused a 32bit overflow, and the resulting
0 value caused the garbage collection to run on the active route.
That then snowballed into the panic.

Reviewed by:	scottl
MFC after:	3 days
2012-06-03 07:36:59 +00:00
tuexen
dc64091687 Seperate SCTP checksum offloading for IPv4 and IPv6.
While there: remove some trainling whitespaces.

MFC after: 3 days
X-MFC with: 236170
2012-05-30 20:56:07 +00:00
emax
b57a3bc250 When we return deprecated addresses, we need to reference them.
Reviewed by:	bz, scottl
MFC after:	3 days
2012-05-30 20:02:39 +00:00
bz
ac429c7044 It turns out that too many drivers are not only parsing the L2/3/4
headers for TSO but also for generic checksum offloading.  Ideally we
would only have one common function shared amongst all drivers, and
perhaps when updating them for IPv6 we should introduce that.
Eventually we should provide the meta information along with mbufs to
avoid (re-)parsing entirely.

To not break IPv6 (checksums and offload) and to be able to MFC the
changes without risking to hurt 3rd party drivers, duplicate the v4
framework, as other OSes have done as well.

Introduce interface capability flags for TX/RX checksum offload with
IPv6, to allow independent toggling (where possible).  Add CSUM_*_IPV6
flags for UDP/TCP over IPv6, and reserve further for SCTP, and IPv6
fragmentation.  Define CSUM_DELAY_DATA_IPV6 as we do for legacy IP and
add an alias for CSUM_DATA_VALID_IPV6.

This pretty much brings IPv6 handling in line with IPv4.
TSO is still handled in a different way and not via if_hwassist.

Update ifconfig to allow (un)setting of the new capability flags.
Update loopback to announce the new capabilities and if_hwassist flags.

Individual driver updates will have to follow, as will SCTP.

Reported by:	gallatin, dim, ..
Reviewed by:	gallatin (glanced at?)
MFC after:	3 days
X-MFC with:	r235961,235959,235958
2012-05-28 09:30:13 +00:00
bz
2d07c3f75f Correctly get the payload length in host byte order. While we
already plan to support >64k payload here, the IPv6 header payload
length obviously is only 16 bit and the calculations need to be right.

Reported by:	dim
Tested by:	dim
MFC after:	1 day
X-MFC:		with r235958
2012-05-26 23:58:51 +00:00
tuexen
520d26f351 Get rid of SCTP specific code to avoid CRC32C computations on loopback.
Just just offloading.
MFC after: 3 days
2012-05-26 09:16:33 +00:00
bz
95907ed567 MFp4 bz_ipv6_fast:
Use M_ZERO with malloc rather than calling bzero() ourselves.

  Change if () panic() checks to KASSERT()s as they are only
  catching invariants in code flow but not dependent on network
  input/output.

  Move initial assigments indirecting pointers after the lock
  has been aquired.

  Passing layer boundries, reset M_PROTOFLAGS.

  Remove a NULL assignment before free.

  Sponsored by:	The FreeBSD Foundation
  Sponsored by:	iXsystems

Reviewed by:	gnn (as part of the whole)
MFC After:	3 days
2012-05-25 09:27:16 +00:00
bz
f528e8d71f MFp4 bz_ipv6_fast:
Factor out Hop-By-Hop option processing.  It's still not heavily used,
  it reduces the footprint of ip6_input() and makes ip6_input() more
  readable.

  Sponsored by:	The FreeBSD Foundation
  Sponsored by:	iXsystems

Reviewed by:	gnn (as part of the whole)
MFC After:	3 days
2012-05-25 02:58:21 +00:00
bz
179af0a3e5 MFp4 bz_ipv6_fast:
Defer checksum calulations on UDP6 output and respect the mbuf
  flags set by NICs having done checksum validation for us already,
  thus saving the computing time in the input path as well.

  Sponsored by:	The FreeBSD Foundation
  Sponsored by:	iXsystems

Reviewed by:	gnn (as part of the whole)
MFC After:	3 days
2012-05-25 02:19:17 +00:00
bz
d52a8c3e3b MFp4 bz_ipv6_fast:
Add support for delayed checksum calculations in the IPv6
  output path.  We currently cannot offload to the card if we
  add extension headers (which incl. fragmentation).

  Fix two SCTP offload support copy&paste bugs: calculate
  checksums if fragmenting and no need to flag IPv4 header
  checksums in the IPv6 forwarding path.

  Sponsored by:	The FreeBSD Foundation
  Sponsored by:	iXsystems

Reviewed by:	gnn (as part of the whole)
MFC After:	3 days
2012-05-25 02:17:16 +00:00
bz
eda9e50c52 MFp4 bz_ipv6_fast:
Hide the ip6aux functions.  The only one referenced outside ip6_input.c
  is not compiled in yet (__notyet__) in route6.c (r235954).  We do have
  accessor functions that should be used.

  Sponsored by:	The FreeBSD Foundation
  Sponsored by:	iXsystems

Reviewed by:	gnn (as part of the whole)
MFC After:	3 days
X-MFC:		KPI?
2012-05-25 01:48:15 +00:00
bz
a57f599e5c MFp4 bz_ipv6_fast:
Simplify the code removing a return from an earlier else case,
  not differing from the default function return called now.

  Sponsored by:	The FreeBSD Foundation
  Sponsored by:	iXsystems

Reviewed by:	gnn (as part of the whole)
MFC After:	3 days
2012-05-25 01:45:05 +00:00
bz
2c462da250 MFp4 bz_ipv6_fast:
We currently nowhere set IP6A_SWAP making the entire check useless
  with the current code.  Keep around but do not compile in.

  Sponsored by:	The FreeBSD Foundation
  Sponsored by:	iXsystems

Reviewed by:	gnn (as part of the whole)
MFC After:	3 days
2012-05-25 01:43:52 +00:00
bz
3971a44478 MFp4 bz_ipv6_fast:
No need to hold the (expensive) rt lock over (expensive) logging.

  Sponsored by:	The FreeBSD Foundation
  Sponsored by:	iXsystems

Reviewed by:	gnn (as part of the whole)
MFC After:	3 days
2012-05-25 01:42:48 +00:00
bz
d9ff82d313 MFp4 bz_ipv6_fast:
Introduce a (for now copied stripped down) in6_cksum_pseudo()
  function.  We should be able to use this from in6_cksum() but
  we should also ponder possible MD specific improvements.
  It takes an extra csum argument to allow for easy checks as
  will be done by the upper layer protocol input paths.

  Sponsored by:	The FreeBSD Foundation
  Sponsored by:	iXsystems

Reviewed by:	gnn (as part of the whole)
MFC After:	3 days
2012-05-24 18:25:09 +00:00
bz
0a74a9f559 MFp4 bz_ipv6_fast:
Optimize in6_cksum(), re-ordering work and limiting variable
  initialization, removing a bzero() for mostly re-initialized
  struct values, making use of the newly introduced in6_getscope(),
  as well as converting an if/panic to a KASSERT().

  Sponsored by:	The FreeBSD Foundation
  Sponsored by:	iXsystems

Reviewed by:	gnn (as part of the whole)
MFC After:	3 days
2012-05-24 18:05:10 +00:00
bz
bbf3b9348d MFp4 bz_ipv6_fast:
Introduce in6_getscope() to allow more effective checksum
  computations without the need to copy the address to clear the
  scope.

  Sponsored by:	The FreeBSD Foundation
  Sponsored by:	iXsystems

Reviewed by:	gnn (as part of the whole)
MFC After:	3 days
2012-05-24 16:30:13 +00:00
tuexen
25827ad382 Use consistent text at the begining of the files.
MFC after: 3 days
2012-05-23 11:26:28 +00:00
marius
b8d578077a Rewrite nd6_sysctl_{d,p}rlist() to avoid misaligned accesses to char arrays
casted to structs by getting rid of these buffers entirely. In r169832, it
was tried to paper over this issue by 32-bit aligning the buffers. Depending
on compiler optimizations that still was insufficient for 64-bit architectures
with strong alignment requirements though.
While at it, add comments regarding the total lack of locking in this area.

Tested by:	bz
Reviewed by:	bz (slightly earlier version), yongari (earlier version)
MFC after:	1 week
2012-05-20 05:12:31 +00:00
tuexen
442f3db615 Missed to commit this in r235414.
MFC after: 3 days
2012-05-13 19:25:21 +00:00
tuexen
abe6735879 Use ECONNABORTED in cases where the ABORT was sent to the peer.
MFC after: 3 days
2012-05-13 16:56:16 +00:00
tuexen
b3431e25a4 Provide in the association change notification the received ABORT chunk
if case of SCTP_COMM_LOST or SCTP_CANT_STR_ASSOC as required by RFC 6458.

MFC after: 3 days
2012-05-12 20:11:35 +00:00
glebius
348e8ce8be in6_pcblookup_local() still can return a pcb with NULL
inp_socket. To avoid panic, do not dereference inp_socket,
but obtain reuse port option from inp_flags2, like this
is done after next call to in_pcblookup_local() a few lines
down below.

Submitted by:	rwatson
2012-03-21 08:43:38 +00:00
tuexen
52199a8cf4 Clean up, no functional change.
MFC after: 3 days.
2012-03-15 14:22:05 +00:00
bz
b70c7f1db8 In nd6_options() ignore the RFC 6106 options completely rather than printing
them if nd6_debug is enabled as unknown.  Leave a comment about the RFC4191
option as I am undecided so far.

Discussed with:	hrs
MFC after:	3 days
2012-03-04 18:51:45 +00:00
hrs
c2fcac3f88 Allow to configure net.inet6.ip6.{accept_rtadv,no_radr} by the loader tunables
as well because they have to be configured before interface initialization for
AF_INET6.
2012-03-02 07:23:28 +00:00
hrs
ae62f802c7 Remove a redundant check. 2012-03-02 07:22:04 +00:00
bz
9eb6f57f87 In selectroute() add a missing fibnum argument to an in6_rtalloc()
call in an #if 0 section.

In in6_selecthlim() optimize a case where in6p cannot be NULL due to an
earlier check.

More consistently use u_int instead of int for fibnum function arguments.

Sponsored by:	Cisco Systems, Inc.
MFC after:	3 days
2012-02-24 20:06:04 +00:00
kmacy
a99e9d281d When using flowtable llentrys can outlive the interface with which they're associated
at which the lle_tbl pointer points to freed memory and the llt_free pointer is no longer
valid.

Move the free pointer in to the llentry itself and update the initalization sites.

MFC after:	2 weeks
2012-02-23 18:21:37 +00:00
tuexen
01e294b2fd Remove two clang warnings.
MFC after: 1 month.
2012-02-18 16:06:15 +00:00
bz
dcdb23291f Merge multi-FIB IPv6 support from projects/multi-fibv6/head/:
Extend the so far IPv4-only support for multiple routing tables (FIBs)
introduced in r178888 to IPv6 providing feature parity.

This includes an extended rtalloc(9) KPI for IPv6, the necessary
adjustments to the network stack, and user land support as in netstat.

Sponsored by:	Cisco Systems, Inc.
Reviewed by:	melifaro (basically)
MFC after:	10 days
2012-02-17 02:39:58 +00:00
glebius
b1246dbed6 Remove casts from inet6 address testing macros, thus preserving
qualifier from original argument.

Obtained from:	NetBSD, r. 1.67
Submitted by:	maxim
2012-01-26 12:04:19 +00:00
pluknet
cf0bd26fe1 Remove unused variable.
The actual ia6->ia6_lifetime access is hidden in
IFA6_IS_INVALID/IFA6_IS_DEPRECATED macros since a long time ago
(see netinet6/nd6.c, r1.104 of KAME for the reference).

MFC after:	3 days
2012-01-25 08:53:42 +00:00
bz
e59c01b14f Plug a possible ifa_ref leak in case of premature return from in6_purgeaddr().
Reviewed by:	rwatson
MFC after:	3 days
2012-01-24 13:57:30 +00:00
pluknet
728e6ff16e Remove the stale XXX rt_newaddrmsg comment.
A routing socket message is generated since r192282.

Reviewed by:	bz
MFC after:	3 days
2012-01-24 09:51:42 +00:00
bz
e8bf125640 Remove unnecessary line break.
MFC after:	3 days
2012-01-24 06:21:38 +00:00
bz
a8d3ef905d Clean up some #endif comments removing from short sections. Add #endif
comments to longer, also refining strange ones.

Properly use #ifdef rather than #if defined() where possible.  Four
#if defined(PCBGROUP) occurances (netinet and netinet6) were ignored to
avoid conflicts with eventually upcoming changes for RSS.

Reported by:	bde (most)
Reviewed by:	bde
MFC after:	3 days
2012-01-22 02:13:19 +00:00
tuexen
bddf5b6a08 Small cleanup, no functional change. 2012-01-15 14:03:05 +00:00
tuexen
ebc0602463 Add an SCTP sysctl "blackhole", similar to the one for TCP.
If set to 1, no ABORT is sent back in response to an incoming
INIT. If set to 2, no ABORT is sent back in response to
an out of the blue packet. If set to 0 (the default), ABORTs
are sent.
Discussed with rrs@.

MFC after: 1 month.
2012-01-08 09:56:24 +00:00
jhb
4ef366671a Convert all users of IF_ADDR_LOCK to use new locking macros that specify
either a read lock or write lock.

Reviewed by:	bz
MFC after:	2 weeks
2012-01-05 19:00:36 +00:00
bz
bfb18e3280 Mark a couple of file local functions static and stop exporting them.
MFC after:	1 week
2012-01-05 01:14:35 +00:00
bz
e889f6860b Convert an #ifdef DIAGNOSTIC if/panic to a KASSERT.
MFC after:	1 week
2012-01-05 01:13:25 +00:00
jhb
4e0513fc73 Use the mli_relinmhead list normally used to defer calls to
in6m_release_locked() to defer calls to mld_v1_transmit_report() until
after the IF_ADDR_LOCK is dropped.  This removes a race where the lock
is dropped and reacquired while attempting to walk an interface's
address list.

Reviewed by:	bz
MFC after:	1 week
2012-01-04 13:35:20 +00:00
glebius
2886f1415c Use correct locking when traversing interface address list.
Reviewed by:	bz
2012-01-04 07:01:23 +00:00
jhb
66d3d3405c When cancelling multicast timers on an interface, don't release the
reference on a group in the leaving state while iterating over the loop.
Instead, use the same approach used in igmp_ifdetach() and mld_ifdetach()
of placing the groups to free on pending release list and then releasing
the references after dropping the IF_ADDR_LOCK.  This closes an ugly race
where the code was dropping the lock in the middle of iterating over the
list.  It also fixes some additional potential use-after-free bugs since
the cancellation routine also applied other changes to the group after
dropping the reference.  Now those changes are performed before the
reference is dropped and the group is potentially freed.

Prodded to fix by:	glebius
Reviewed by:	bz
MFC after:	1 week
2012-01-03 20:34:52 +00:00
jhb
badf97ee75 Grab a reference on the matching interface address (ifa) in the handling
of the SIOC[DG]LIFADDR icotls before dropping the IF_ADDR_LOCK() and
release the reference after using it.  This prevents the address from
being potentially freed out from under the ioctl handler.

Reviewed by:	bz
MFC after:	1 week
2012-01-03 19:44:36 +00:00
jhb
dd61fe0873 Use TAILQ_FOREACH() instead of TAILQ_FOREACH_SAFE() for some loops that
do not modify the queues they iterate over.

Submitted by:	glebius
2012-01-03 16:22:29 +00:00
bz
ea62212755 Remove an uneeded inpcb forward declaration and align the function
declaration following to match the style in the rest of the file.

MFC after:	3 days
2012-01-02 13:03:13 +00:00
bz
55bd70ed9a Remove a declaration to a non-existent function.
MFC after:	3 days
Sponsored by:	The FreeBSD Foundation
2011-12-31 16:19:22 +00:00
jhb
7a0151720c Use queue(3) macros instead of home-rolled versions in several places in
the INET6 code.  This includes retiring the 'ndpr_next' and 'pfr_next'
macros.

Submitted by:	pluknet (earlier version)
Reviewed by:	pluknet
2011-12-29 18:25:18 +00:00
tuexen
b9ef107414 Address issues found by clang. While there, fix also some style
issues.

MFC after: 3 months.
2011-12-27 10:16:24 +00:00
jhb
f3f015b978 Fix a bug where TAILQ_FIRST(&V_ifnet) was accessed without holding the
proper lock.

Reviewed by:	bz
MFC after:	1 week
2011-12-24 18:11:54 +00:00
glebius
653f8c5e71 Provide ABI compatibility shim to enable configuring of addresses
with ifconfig(8) prior to r228571.

Requested by:	brooks
2011-12-21 12:39:08 +00:00
maxim
330e98b5de o Convert IPv6 read-only stats sysctls to the read-write ones.
o Teach netstat(1) -z to reset these stats sysctls.

PR:		bin/153206
Reviewed by:	glebuis
Sponsored by:	NGINX, Inc.
MFC after:	1 month
2011-12-19 05:50:34 +00:00
tuexen
3a4d069b21 Fix unused parameter warnings.
While there, fix some whitespace issues.

MFC after: 3 months.
2011-12-17 19:21:40 +00:00
glebius
27a36f6ac8 A major overhaul of the CARP implementation. The ip_carp.c was started
from scratch, copying needed functionality from the old implemenation
on demand, with a thorough review of all code. The main change is that
interface layer has been removed from the CARP. Now redundant addresses
are configured exactly on the interfaces, they run on.

The CARP configuration itself is, as before, configured and read via
SIOCSVH/SIOCGVH ioctls. A new prefix created with SIOCAIFADDR or
SIOCAIFADDR_IN6 may now be configured to a particular virtual host id,
which makes the prefix redundant.

ifconfig(8) semantics has been changed too: now one doesn't need
to clone carpXX interface, he/she should directly configure a vhid
on a Ethernet interface.

To supply vhid data from the kernel to an application the getifaddrs(8)
function had been changed to pass ifam_data with each address. [1]

The new implementation definitely closes all PRs related to carp(4)
being an interface, and may close several others. It also allows
to run a single redundant IP per interface.

Big thanks to Bjoern Zeeb for his help with inet6 part of patch, for
idea on using ifam_data and for several rounds of reviewing!

PR:		kern/117000, kern/126945, kern/126714, kern/120130, kern/117448
Reviewed by:	bz
Submitted by:	bz [1]
2011-12-16 12:16:56 +00:00
glebius
c9b9c0a5a3 Fix double free.
PR:		kern/163089
Submitted by:	Herbie Robinson <Herbie.Robinson stratus.com>
2011-12-07 13:37:42 +00:00
bz
2c9fd688ff Return the correct value for the IPV6_MULTICAST_HOPS getsockopt() call.
Submitted by:	rpaulo
MFC after:	3 days
2011-11-13 02:32:10 +00:00
qingli
3b996bbc11 A default route learned from the RAs could be deleted manually
after its installation. This removal may be accidental and can
prevent the default route from being installed in the future if
the associated default router has the best preference. The cause
is the lack of status update in the default router on the state
of its route installation in the kernel FIB. This patch fixes
the described problem.

Reviewed by:	hrs, discussed with hrs
MFC after:	5 days
2011-11-11 23:22:38 +00:00
trociny
e290adc1a6 Fix false positive EADDRINUSE that could be returned by bind, due to
the typo made in r227207.

Reported by:	kib
Tested by:	kib
2011-11-11 14:09:09 +00:00
ed
0c56cf839d Mark all SYSCTL_NODEs static that have no corresponding SYSCTL_DECLs.
The SYSCTL_NODE macro defines a list that stores all child-elements of
that node. If there's no SYSCTL_DECL macro anywhere else, there's no
reason why it shouldn't be static.
2011-11-07 15:43:11 +00:00
glebius
599a401646 In icmp6_redirect_input:
- Assert that we got a valid mbuf with rcvif pointer. [1]
- Use __func__ in logging.

Submitted by:	prabhakar lakhera <prabhakar.lakhera gmail.com> [1]
Submitted by:	Kristof Provost <kristof sigsegv.be> [1]
2011-11-07 14:22:18 +00:00
ed
e97eae1577 Mark MALLOC_DEFINEs static that have no corresponding MALLOC_DECLAREs.
This means that their use is restricted to a single C file.
2011-11-07 06:44:47 +00:00
trociny
f9135967f2 Cache SO_REUSEPORT socket option in inpcb-layer in order to avoid
inp_socket->so_options dereference when we may not acquire the lock on
the inpcb.

This fixes the crash due to NULL pointer dereference in
in_pcbbind_setup() when inp_socket->so_options in a pcb returned by
in_pcblookup_local() was checked.

Reported by:	dave jones <s.dave.jones@gmail.com>, Arnaud Lacombe <lacombar@gmail.com>
Suggested by:	rwatson
Glanced by:	rwatson
Tested by:	dave jones <s.dave.jones@gmail.com>
2011-11-06 10:47:20 +00:00
trociny
ddbde914da Before dereferencing intotw() check for NULL, the same way as it is
done for in_pcb (see r157474).

MFC after:	1 week
2011-11-06 09:29:52 +00:00
pluknet
494eaea7b0 Remove a couple of write-only variables. 2011-11-03 09:09:05 +00:00
qingli
4fd26a87dc The code change made in r226040 was incomplete and resulted in
routes such as fe80::1%lo0 no being installed. This patch completes
the original intended fix.

Reviewed by:	hrs, bz
MFC after:	3 days
2011-10-16 22:24:04 +00:00
qingli
dd2b5b91eb The IPv6 code was influx at the time of r196865 due to the L2/L3
separation rewrite changes. r196865 was committed to fix a scope
violation problem in the following test scenario:

  box-1# ifconfig em0 inet6 2001:db8:1:: prefixlen 64 anycast
  box-1# ifconfig em1 inet6 2001:db8:2::1 prefixlen 64

  box-2# ifconfig re0 inet6 2001:db8:1::6 prefixlen 64

  em0 and re0 are on the same link.

  box-2# ping6 2001:db8:1::
  PING6(56=40+8+8 bytes) 2001:db8:1::6 --> 2001:db8:1::

the ICMPv6 response should have a source address of em1, which
is 2001:db8:2::1, not the link-local address of em0.

That code is no longer necessary and breaks the IPv6-Ready logo
testing, so revert it now.

Reviewed by:	hrs
MFC after:	3 days
2011-10-16 22:15:13 +00:00
hrs
145ad1ad8c Fix a problem that an interface unexpectedly becomes IFF_UP by
just doing "ifconfing inet6 -ifdisabled" when the interface has
ND6_IFF_AUTO_LINKLOCAL flag and no link-local address.
2011-10-16 19:46:52 +00:00
glebius
16af3ddcd2 Use TAILQ_FOREACH() in the nd6_dad_find() instead of hand-rolled implementation. 2011-10-13 13:33:23 +00:00
glebius
623fcd8af9 Restore functions in6_ifaddloop() and in6_ifremloop() that were
inlined by Qing Li in his big new-ARP commit. I am going to utilize
them in my newcarp work, and also these functions left declared
in in6_var.h for all the time they were absent.

Reviewed by:	bz
2011-10-13 13:05:36 +00:00
qingli
d9f932fe21 The IFA_RTSELF instead of the IFA_ROUTE flag should be checked to
determine if a loopback route should be installed for an interface
IPv6 address. Another condition is the address must not belong to a
looopback interface.

Reviewed by:	hrs
MFC after:	3 days
2011-10-05 16:27:11 +00:00
bz
772dd0b1c3 Fix an obvious bug from r186196 shadowing a variable, not correctly
appending the new mbuf to the chain reference but possibly causing an mbuf
nextpkt loop leading to a memory used after handoff (or having been freed)
and leaking an mbuf here.

Reviewed by:	rwatson, brooks
MFC after:	3 days
2011-09-30 18:20:16 +00:00
kmacy
e3079e1350 Make KBI changes required for future MFCing of inpcb rtentry / llentry caching.
Reviewed by:	rwatson, bz
Approved by:	re (kib)
2011-09-20 20:27:26 +00:00
hrs
45d40bbb1a Copy ip6po_minmtu and ip6po_prefer_tempaddr in ip6_copypktopts(). This fixes
inconsistency when options are specified by both setsockopt() and ancillary
data types.

PR:		kern/158307
Approved by:	re (bz)
2011-09-20 00:29:17 +00:00
hrs
08320280c6 Add $ipv6_cpe_wanif to enable functionality required for IPv6 CPE
(r225485).  When setting an interface name to it, the following
configurations will be enabled:

 1. "no_radr" is set to all IPv6 interfaces automatically.

 2. "-no_radr accept_rtadv" will be set only for $ipv6_cpe_wanif.  This is
    done just before evaluating $ifconfig_IF_ipv6 in the rc.d scripts (this
    means you can manually supersede this configuration if necessary).

 3. The node will add RA-sending routers to the default router list
    even if net.inet6.ip6.forwarding=1.

This mode is added to conform to RFC 6204 (a router which connects
the end-user network to a service provider network).  To enable
packet forwarding, you still need to set ipv6_gateway_enable=YES.

Note that accepting router entries into the default router list when
packet forwarding capability and a routing daemon are enabled can
result in messing up the routing table.  To minimize such unexpected
behaviors, "no_radr" is set on all interfaces but $ipv6_cpe_wanif.

Approved by:	re (bz)
2011-09-13 00:06:11 +00:00
pluknet
081544b729 Fix if_addr_mtx recursion in mld6.
mld_set_version() is called only from mld_v1_input_query() and
mld_v2_input_query() both holding the if_addr_mtx lock, and then calling
into mld_v2_cancel_link_timers() acquires it the second time, which results
in mtx recursion. To avoid that, delay if_addr_mtx acquisition until after
mld_set_version() is called; while here, further reduce locking scope
to protect only the needed pieces: if_multiaddrs, in6m_lookup_locked().

PR:		kern/158426
Reported by:	Thomas <tps vr-web.de>,
		Tom Vijlbrief <tom.vijlbrief xs4all.nl>
Tested by:	Tom Vijlbrief
Reviewed by:	bz
Approved by:	re (kib)
2011-08-22 23:39:40 +00:00
bz
eccbdd061b Add support for IPv6 to ipfw fwd:
Distinguish IPv4 and IPv6 addresses and optional port numbers in
user space to set the option for the correct protocol family.
Add support in the kernel for carrying the new IPv6 destination
address and port.
Add support to TCP and UDP for IPv6 and fix UDP IPv4 to not change
the address in the IP header.
Add support for IPv6 forwarding to a non-local destination.
Add a regession test uitilizing VIMAGE to check all 20 possible
combinations I could think of.

Obtained from:	David Dolson at Sandvine Incorporated
		(original version for ipfw fwd IPv6 support)
Sponsored by:	Sandvine Incorporated
PR:		bin/117214
MFC after:	4 weeks
Approved by:	re (kib)
2011-08-20 17:05:11 +00:00
bz
fa01a4aee0 Add an in6_localip() helper function as in6_localaddr() is not doing what
people think: returning true for an address in any connected subnet, not
necessarily on the local machine.

Sponsored by:	Sandvine Incorporated
MFC after:	2 weeks
Approved by:	re (kib)
2011-08-20 16:43:47 +00:00
tuexen
f47c615e88 The result of a joint work between rrs@ and myself at the IETF:
* Decouple the path supervision using a separate HB timer per path.
* Add support for potentially failed state.
* Bring back RTO.min to 1 second.
* Accept packets on IP-addresses already announced via an ASCONF
* While there: do some cleanups.

Approved by: re@
MFC after: 2 months.
2011-08-03 20:21:00 +00:00
tuexen
8600c7f735 The socket API only specifies SCTP for SOCK_SEQPACKET and
SOCK_STREAM, but not SOCK_DGRAM. So don't register it for
SOCK_DGRAM.
While there, fix some indentation.
2011-07-12 19:29:29 +00:00
zec
99a0b299b3 Permit ARP to proceed for IPv4 host routes for which the gateway is the
same as the host address.  This already works fine for INET6 and ND6.

While here, remove two function pointers from struct lltable which are
only initialized but never used.

MFC after:	3 days
2011-07-08 09:38:33 +00:00
bz
e15f804c7b Update packet filter (pf) code to OpenBSD 4.5.
You need to update userland (world and ports) tools
to be in sync with the kernel.

Submitted by:	mlaier
Submitted by:	eri
2011-06-28 11:57:25 +00:00
bz
1aaf930d63 Add the missing call to ip6_ipsec_filtertunnel() to be able to control
whether decapsulated IPsec packets will be passed to pfil again depending
on the setting of the net.ip6.ipsec6.filtertunnel sysctl.

PR:		kern/157670
Submitted by:	Manuel Kasper (mk neon1.net)
MFC after:	2 weeks
2011-06-08 10:59:36 +00:00
bz
b4b3d062cd Correct comments and debug logging in ipsec to better match reality.
MFC after:	3 days
2011-06-08 03:02:11 +00:00
rwatson
6e29aea1db Implement a CPU-affine TCP and UDP connection lookup data structure,
struct inpcbgroup.  pcbgroups, or "connection groups", supplement the
existing inpcbinfo connection hash table, which when pcbgroups are
enabled, might now be thought of more usefully as a per-protocol
4-tuple reservation table.

Connections are assigned to connection groups base on a hash of their
4-tuple; wildcard sockets require special handling, and are members
of all connection groups.  During a connection lookup, a
per-connection group lock is employed rather than the global pcbinfo
lock.  By aligning connection groups with input path processing,
connection groups take on an effective CPU affinity, especially when
aligned with RSS work placement (see a forthcoming commit for
details).  This eliminates cache line migration associated with
global, protocol-layer data structures in steady state TCP and UDP
processing (with the exception of protocol-layer statistics; further
commit to follow).

Elements of this approach were inspired by Willman, Rixner, and Cox's
2006 USENIX paper, "An Evaluation of Network Stack Parallelization
Strategies in Modern Operating Systems".  However, there are also
significant differences: we maintain the inpcb lock, rather than using
the connection group lock for per-connection state.

Likewise, the focus of this implementation is alignment with NIC
packet distribution strategies such as RSS, rather than pure software
strategies.  Despite that focus, software distribution is supported
through the parallel netisr implementation, and works well in
configurations where the number of hardware threads is greater than
the number of NIC input queues, such as in the RMI XLR threaded MIPS
architecture.

Another important difference is the continued maintenance of existing
hash tables as "reservation tables" -- these are useful both to
distinguish the resource allocation aspect of protocol name management
and the more common-case lookup aspect.  In configurations where
connection tables are aligned with hardware hashes, it is desirable to
use the traditional lookup tables for loopback or encapsulated traffic
rather than take the expense of hardware hashes that are hard to
implement efficiently in software (such as RSS Toeplitz).

Connection group support is enabled by compiling "options PCBGROUP"
into your kernel configuration; for the time being, this is an
experimental feature, and hence is not enabled by default.

Subject to the limited MFCability of change dependencies in inpcb,
and its change to the inpcbinfo init function signature, this change
in principle could be merged to FreeBSD 8.x.

Reviewed by:    bz
Sponsored by:   Juniper Networks, Inc.
2011-06-06 12:55:02 +00:00
hrs
0ae2d5f6c6 Do not activate automatic LL addr configuration when 0/1->1 transition of
ND6_IFF_IFDISABLED flag.
2011-06-06 04:12:57 +00:00
hrs
acbda2ccc1 - Make the code more proactively clear an ND6_IFF_IFDISABLED flag when
an explicit action for INET6 configuration happens.  The changes are:

  1. When an ND6 flag is changed via SIOCSIFINFO_FLAGS ioctl,
     setting ND6_IFF_ACCEPT_RTADV and/or ND6_IFF_AUTO_LINKLOCAL now triggers
     an attempt to clear the ND6_IFF_IFDISABLED flag.

  2. When an AF_INET6 address is added successfully to an interface and
     it is marked as ND6_IFF_IFDISABLED, an attempt to clear the
     ND6_IFF_IFDISABLED happens.

  This simplifies ND6_IFF_IFDISABLED flag manipulation by users via ifconfig(8);
  in most cases manual configuration is no longer needed.

- When ND6_IFF_AUTO_LINKLOCAL is set and no link-local address is assigned to
  an interface, SIOCSIFINFO_FLAGS ioctl now calls in6_ifattach() to configure
  a link-local address.

  This change ensures link-local address configuration when "ifconfig IF inet6"
  command is invoked.  For example, "ifconfig IF inet6 auto_linklocal" now
  always try to configure an LL addr even if ND6_IFF_AUTO_LINKLOCAL is already
  set to 1 (i.e. down/up cycle is no longer needed).

Reviewed by:	bz
2011-06-06 02:37:38 +00:00
hrs
4c2206b625 - Accept Router Advertisement messages even when net.inet6.ip6.forwarding=1.
- A new per-interface knob IFF_ND6_NO_RADR and sysctl IPV6CTL_NO_RADR.
  This controls if accepting a route in an RA message as the default route.
  The default value for each interface can be set by net.inet6.ip6.no_radr.
  The system wide default value is 0.

- A new sysctl: net.inet6.ip6.norbit_raif.  This controls if setting R-bit in
  NA on RA accepting interfaces.  The default is 0 (R-bit is set based on
  net.inet6.ip6.forwarding).

Background:

 IPv6 host/router model suggests a router sends an RA and a host accepts it for
 router discovery.  Because of that, KAME implementation does not allow
 accepting RAs when net.inet6.ip6.forwarding=1.  Accepting RAs on a router can
 make the routing table confused since it can change the default router
 unintentionally.

 However, in practice there are cases where we cannot distinguish a host from
 a router clearly.  For example, a customer edge router often works as a host
 against the ISP, and as a router against the LAN at the same time.  Another
 example is a complex network configurations like an L2TP tunnel for IPv6
 connection to Internet over an Ethernet link with another native IPv6 subnet.
 In this case, the physical interface for the native IPv6 subnet works as a
 host, and the pseudo-interface for L2TP works as the default IP forwarding
 route.

Problem:

 Disabling processing RA messages when net.inet6.ip6.forwarding=1 and
 accepting them when net.inet6.ip6.forward=0 cause the following practical
 issues:

 - A router cannot perform SLAAC.  It becomes a problem if a box has
   multiple interfaces and you want to use SLAAC on some of them, for
   example.  A customer edge router for IPv6 Internet access service
   using an IPv6-over-IPv6 tunnel sometimes needs SLAAC on the
   physical interface for administration purpose; updating firmware
   and so on (link-local addresses can be used there, but GUAs by
   SLAAC are often used for scalability).

 - When a host has multiple IPv6 interfaces and it receives multiple RAs on
   them, controlling the default route is difficult.  Router preferences
   defined in RFC 4191 works only when the routers on the links are
   under your control.

Details of Implementation Changes:

 Router Advertisement messages will be accepted even when
 net.inet6.ip6.forwarding=1.  More precisely, the conditions are as
 follow:

 (ACCEPT_RTADV && !NO_RADR && !ip6.forwarding)
	=> Normal RA processing on that interface. (as IPv6 host)

 (ACCEPT_RTADV && (NO_RADR || ip6.forwarding))
	=> Accept RA but add the router to the defroute list with
	   rtlifetime=0 unconditionally.  This effectively prevents
	   from setting the received router address as the box's
	   default route.

 (!ACCEPT_RTADV)
	=> No RA processing on that interface.

 ACCEPT_RTADV and NO_RADR are per-interface knob.  In short, all interface
 are classified as "RA-accepting" or not.  An RA-accepting interface always
 processes RA messages regardless of ip6.forwarding.  The difference caused by
 NO_RADR or ip6.forwarding is whether the RA source address is considered as
 the default router or not.

 R-bit in NA on the RA accepting interfaces is set based on
 net.inet6.ip6.forwarding.  While RFC 6204 W-1 rule (for CPE case) suggests
 a router should disable the R-bit completely even when the box has
 net.inet6.ip6.forwarding=1, I believe there is no technical reason with
 doing so.  This behavior can be set by a new sysctl net.inet6.ip6.norbit_raif
 (the default is 0).

Usage:

 # ifconfig fxp0 inet6 accept_rtadv
	=> accept RA on fxp0
 # ifconfig fxp0 inet6 accept_rtadv no_radr
	=> accept RA on fxp0 but ignore default route information in it.
 # sysctl net.inet6.ip6.norbit_no_radr=1
	=> R-bit in NAs on RA accepting interfaces will always be set to 0.
2011-06-06 02:14:23 +00:00
hrs
58bf7d3104 Use uint8_t for sockaddr sa_len.
Reviewed by:	bz
2011-06-05 11:40:30 +00:00
rwatson
e9eb5d3b9c Add _mbuf() variants of various inpcb-related interfaces, including lookup,
hash install, etc.  For now, these are arguments are unused, but as we add
RSS support, we will want to use hashes extracted from mbufs, rather than
manually calculated hashes of header fields, due to the expensive of the
software version of Toeplitz (and similar hashes).

Add notes that it would be nice to be able to pass mbufs into lookup
routines in pf(4), optimising firewall lookup in the same way, but the
code structure there doesn't facilitate that currently.

(In principle there is no reason this couldn't be MFCed -- the change
extends rather than modifies the KBI.  However, it won't be useful without
other previous possibly less MFCable changes.)

Reviewed by:    bz
Sponsored by:   Juniper Networks, Inc.
2011-06-04 16:33:06 +00:00
rwatson
fdfdadb612 Decompose the current single inpcbinfo lock into two locks:
- The existing ipi_lock continues to protect the global inpcb list and
  inpcb counter.  This lock is now relegated to a small number of
  allocation and free operations, and occasional operations that walk
  all connections (including, awkwardly, certain UDP multicast receive
  operations -- something to revisit).

- A new ipi_hash_lock protects the two inpcbinfo hash tables for
  looking up connections and bound sockets, manipulated using new
  INP_HASH_*() macros.  This lock, combined with inpcb locks, protects
  the 4-tuple address space.

Unlike the current ipi_lock, ipi_hash_lock follows the individual inpcb
connection locks, so may be acquired while manipulating a connection on
which a lock is already held, avoiding the need to acquire the inpcbinfo
lock preemptively when a binding change might later be required.  As a
result, however, lookup operations necessarily go through a reference
acquire while holding the lookup lock, later acquiring an inpcb lock --
if required.

A new function in_pcblookup() looks up connections, and accepts flags
indicating how to return the inpcb.  Due to lock order changes, callers
no longer need acquire locks before performing a lookup: the lookup
routine will acquire the ipi_hash_lock as needed.  In the future, it will
also be able to use alternative lookup and locking strategies
transparently to callers, such as pcbgroup lookup.  New lookup flags are,
supplementing the existing INPLOOKUP_WILDCARD flag:

  INPLOOKUP_RLOCKPCB - Acquire a read lock on the returned inpcb
  INPLOOKUP_WLOCKPCB - Acquire a write lock on the returned inpcb

Callers must pass exactly one of these flags (for the time being).

Some notes:

- All protocols are updated to work within the new regime; especially,
  TCP, UDPv4, and UDPv6.  pcbinfo ipi_lock acquisitions are largely
  eliminated, and global hash lock hold times are dramatically reduced
  compared to previous locking.
- The TCP syncache still relies on the pcbinfo lock, something that we
  may want to revisit.
- Support for reverting to the FreeBSD 7.x locking strategy in TCP input
  is no longer available -- hash lookup locks are now held only very
  briefly during inpcb lookup, rather than for potentially extended
  periods.  However, the pcbinfo ipi_lock will still be acquired if a
  connection state might change such that a connection is added or
  removed.
- Raw IP sockets continue to use the pcbinfo ipi_lock for protection,
  due to maintaining their own hash tables.
- The interface in6_pcblookup_hash_locked() is maintained, which allows
  callers to acquire hash locks and perform one or more lookups atomically
  with 4-tuple allocation: this is required only for TCPv6, as there is no
  in6_pcbconnect_setup(), which there should be.
- UDPv6 locking remains significantly more conservative than UDPv4
  locking, which relates to source address selection.  This needs
  attention, as it likely significantly reduces parallelism in this code
  for multithreaded socket use (such as in BIND).
- In the UDPv4 and UDPv6 multicast cases, we need to revisit locking
  somewhat, as they relied on ipi_lock to stablise 4-tuple matches, which
  is no longer sufficient.  A second check once the inpcb lock is held
  should do the trick, keeping the general case from requiring the inpcb
  lock for every inpcb visited.
- This work reminds us that we need to revisit locking of the v4/v6 flags,
  which may be accessed lock-free both before and after this change.
- Right now, a single lock name is used for the pcbhash lock -- this is
  undesirable, and probably another argument is required to take care of
  this (or a char array name field in the pcbinfo?).

This is not an MFC candidate for 8.x due to its impact on lookup and
locking semantics.  It's possible some of these issues could be worked
around with compatibility wrappers, if necessary.

Reviewed by:    bz
Sponsored by:   Juniper Networks, Inc.
2011-05-30 09:43:55 +00:00
bz
2a56bd3155 Add FEATURE() definitions for IPv4 and IPv6 so that we can use
feature_present(3) to dynamically decide whether to use one or the
other family.

Reviewed by:	gnn
Sponsored by:	The FreeBSD Foundation
Sponsored by:	iXsystems
MFC after:	10 days
2011-05-25 00:34:25 +00:00
rwatson
79b3da72c2 Move from passing a wildcard boolean to a general set up lookup flags into
in_pcb_lport(), in_pcblookup_local(), and in_pcblookup_hash(), and similarly
for IPv6 functions.  In the future, we would like to support other flags
relating to locking strategy.

This change doesn't appear to modify the KBI in practice, as callers already
passed in INPLOOKUP_WILDCARD rather than a simple boolean.

MFC after:      3 weeks
Reviewed by:    bz
Sponsored by:   Juniper Networks, Inc.
2011-05-23 15:23:18 +00:00
qingli
a1bf1a2582 The statically configured (permanent) ARP entries are removed when an
interface is brought down, even though the interface address is still
valid. This patch maintains the permanent ARP entries as long as the
interface address (having the same prefix as that of the ARP entries)
is valid.

Reviewed by:	delphij
MFC after:	5 days
2011-05-20 19:12:20 +00:00
tuexen
a51881ba59 Remove code with any effect. 2011-05-03 20:34:02 +00:00
tuexen
ad795d2c5d Improve compilation of SCTP code without INET support.
Some bugs where fixed while doing this:
* ASCONF-ACK messages might use wrong port number when using
  IPv6.
* Checking for additional addresses takes the correct address
  into account and also does not do more comparisons than
  necessary.

This patch is based on one received from bz@ who was
sponsored by The FreeBSD Foundation and iXsystems.

MFC after: 1 week
2011-04-30 11:18:16 +00:00
bz
734a66b389 Make the UDP code compile without INET. Expose udp_usrreq.c to IPv6 only
as well compiling out most functions adding or extending #ifdef INET
coverage.

Reviewed by:	gnn
Sponsored by:	The FreeBSD Foundation
Sponsored by:	iXsystems
MFC after:	4 days
2011-04-30 11:17:00 +00:00
bz
337d0d95c2 Make the PCB code compile without INET support by adding #ifdef INETs
and correcting few #includes.

Reviewed by:	gnn
Sponsored by:	The FreeBSD Foundation
Sponsored by:	iXsystems
MFC after:	4 days
2011-04-30 11:04:34 +00:00