Commit Graph

1086 Commits

Author SHA1 Message Date
Michael Tuexen
f7f2907a7e Small cleanup, no functional change. 2012-01-15 14:03:05 +00:00
Michael Tuexen
c58e60be43 Add an SCTP sysctl "blackhole", similar to the one for TCP.
If set to 1, no ABORT is sent back in response to an incoming
INIT. If set to 2, no ABORT is sent back in response to
an out of the blue packet. If set to 0 (the default), ABORTs
are sent.
Discussed with rrs@.

MFC after: 1 month.
2012-01-08 09:56:24 +00:00
John Baldwin
137f91e80f Convert all users of IF_ADDR_LOCK to use new locking macros that specify
either a read lock or write lock.

Reviewed by:	bz
MFC after:	2 weeks
2012-01-05 19:00:36 +00:00
Bjoern A. Zeeb
9d9b92f299 Mark a couple of file local functions static and stop exporting them.
MFC after:	1 week
2012-01-05 01:14:35 +00:00
Bjoern A. Zeeb
f67e13d66d Convert an #ifdef DIAGNOSTIC if/panic to a KASSERT.
MFC after:	1 week
2012-01-05 01:13:25 +00:00
John Baldwin
19b0c9b246 Use the mli_relinmhead list normally used to defer calls to
in6m_release_locked() to defer calls to mld_v1_transmit_report() until
after the IF_ADDR_LOCK is dropped.  This removes a race where the lock
is dropped and reacquired while attempting to walk an interface's
address list.

Reviewed by:	bz
MFC after:	1 week
2012-01-04 13:35:20 +00:00
Gleb Smirnoff
1331bbc33f Use correct locking when traversing interface address list.
Reviewed by:	bz
2012-01-04 07:01:23 +00:00
John Baldwin
c6f4ea8062 When cancelling multicast timers on an interface, don't release the
reference on a group in the leaving state while iterating over the loop.
Instead, use the same approach used in igmp_ifdetach() and mld_ifdetach()
of placing the groups to free on pending release list and then releasing
the references after dropping the IF_ADDR_LOCK.  This closes an ugly race
where the code was dropping the lock in the middle of iterating over the
list.  It also fixes some additional potential use-after-free bugs since
the cancellation routine also applied other changes to the group after
dropping the reference.  Now those changes are performed before the
reference is dropped and the group is potentially freed.

Prodded to fix by:	glebius
Reviewed by:	bz
MFC after:	1 week
2012-01-03 20:34:52 +00:00
John Baldwin
9f745f61b8 Grab a reference on the matching interface address (ifa) in the handling
of the SIOC[DG]LIFADDR icotls before dropping the IF_ADDR_LOCK() and
release the reference after using it.  This prevents the address from
being potentially freed out from under the ioctl handler.

Reviewed by:	bz
MFC after:	1 week
2012-01-03 19:44:36 +00:00
John Baldwin
f5b50e25ec Use TAILQ_FOREACH() instead of TAILQ_FOREACH_SAFE() for some loops that
do not modify the queues they iterate over.

Submitted by:	glebius
2012-01-03 16:22:29 +00:00
Bjoern A. Zeeb
ad05fc1d2d Remove an uneeded inpcb forward declaration and align the function
declaration following to match the style in the rest of the file.

MFC after:	3 days
2012-01-02 13:03:13 +00:00
Bjoern A. Zeeb
1a69707f40 Remove a declaration to a non-existent function.
MFC after:	3 days
Sponsored by:	The FreeBSD Foundation
2011-12-31 16:19:22 +00:00
John Baldwin
3b0b2840be Use queue(3) macros instead of home-rolled versions in several places in
the INET6 code.  This includes retiring the 'ndpr_next' and 'pfr_next'
macros.

Submitted by:	pluknet (earlier version)
Reviewed by:	pluknet
2011-12-29 18:25:18 +00:00
Michael Tuexen
60990c0c06 Address issues found by clang. While there, fix also some style
issues.

MFC after: 3 months.
2011-12-27 10:16:24 +00:00
John Baldwin
94e8313349 Fix a bug where TAILQ_FIRST(&V_ifnet) was accessed without holding the
proper lock.

Reviewed by:	bz
MFC after:	1 week
2011-12-24 18:11:54 +00:00
Gleb Smirnoff
7121247312 Provide ABI compatibility shim to enable configuring of addresses
with ifconfig(8) prior to r228571.

Requested by:	brooks
2011-12-21 12:39:08 +00:00
Maxim Konovalov
d96ea877a7 o Convert IPv6 read-only stats sysctls to the read-write ones.
o Teach netstat(1) -z to reset these stats sysctls.

PR:		bin/153206
Reviewed by:	glebuis
Sponsored by:	NGINX, Inc.
MFC after:	1 month
2011-12-19 05:50:34 +00:00
Michael Tuexen
7215cc1b74 Fix unused parameter warnings.
While there, fix some whitespace issues.

MFC after: 3 months.
2011-12-17 19:21:40 +00:00
Gleb Smirnoff
08b68b0e4c A major overhaul of the CARP implementation. The ip_carp.c was started
from scratch, copying needed functionality from the old implemenation
on demand, with a thorough review of all code. The main change is that
interface layer has been removed from the CARP. Now redundant addresses
are configured exactly on the interfaces, they run on.

The CARP configuration itself is, as before, configured and read via
SIOCSVH/SIOCGVH ioctls. A new prefix created with SIOCAIFADDR or
SIOCAIFADDR_IN6 may now be configured to a particular virtual host id,
which makes the prefix redundant.

ifconfig(8) semantics has been changed too: now one doesn't need
to clone carpXX interface, he/she should directly configure a vhid
on a Ethernet interface.

To supply vhid data from the kernel to an application the getifaddrs(8)
function had been changed to pass ifam_data with each address. [1]

The new implementation definitely closes all PRs related to carp(4)
being an interface, and may close several others. It also allows
to run a single redundant IP per interface.

Big thanks to Bjoern Zeeb for his help with inet6 part of patch, for
idea on using ifam_data and for several rounds of reviewing!

PR:		kern/117000, kern/126945, kern/126714, kern/120130, kern/117448
Reviewed by:	bz
Submitted by:	bz [1]
2011-12-16 12:16:56 +00:00
Gleb Smirnoff
6d18ea8ff9 Fix double free.
PR:		kern/163089
Submitted by:	Herbie Robinson <Herbie.Robinson stratus.com>
2011-12-07 13:37:42 +00:00
Bjoern A. Zeeb
08907004c6 Return the correct value for the IPV6_MULTICAST_HOPS getsockopt() call.
Submitted by:	rpaulo
MFC after:	3 days
2011-11-13 02:32:10 +00:00
Qing Li
0f1aca6519 A default route learned from the RAs could be deleted manually
after its installation. This removal may be accidental and can
prevent the default route from being installed in the future if
the associated default router has the best preference. The cause
is the lack of status update in the default router on the state
of its route installation in the kernel FIB. This patch fixes
the described problem.

Reviewed by:	hrs, discussed with hrs
MFC after:	5 days
2011-11-11 23:22:38 +00:00
Mikolaj Golub
040ee1ec95 Fix false positive EADDRINUSE that could be returned by bind, due to
the typo made in r227207.

Reported by:	kib
Tested by:	kib
2011-11-11 14:09:09 +00:00
Ed Schouten
6472ac3d8a Mark all SYSCTL_NODEs static that have no corresponding SYSCTL_DECLs.
The SYSCTL_NODE macro defines a list that stores all child-elements of
that node. If there's no SYSCTL_DECL macro anywhere else, there's no
reason why it shouldn't be static.
2011-11-07 15:43:11 +00:00
Gleb Smirnoff
37c1ff48a9 In icmp6_redirect_input:
- Assert that we got a valid mbuf with rcvif pointer. [1]
- Use __func__ in logging.

Submitted by:	prabhakar lakhera <prabhakar.lakhera gmail.com> [1]
Submitted by:	Kristof Provost <kristof sigsegv.be> [1]
2011-11-07 14:22:18 +00:00
Ed Schouten
d745c852be Mark MALLOC_DEFINEs static that have no corresponding MALLOC_DECLAREs.
This means that their use is restricted to a single C file.
2011-11-07 06:44:47 +00:00
Mikolaj Golub
fc06cd427e Cache SO_REUSEPORT socket option in inpcb-layer in order to avoid
inp_socket->so_options dereference when we may not acquire the lock on
the inpcb.

This fixes the crash due to NULL pointer dereference in
in_pcbbind_setup() when inp_socket->so_options in a pcb returned by
in_pcblookup_local() was checked.

Reported by:	dave jones <s.dave.jones@gmail.com>, Arnaud Lacombe <lacombar@gmail.com>
Suggested by:	rwatson
Glanced by:	rwatson
Tested by:	dave jones <s.dave.jones@gmail.com>
2011-11-06 10:47:20 +00:00
Mikolaj Golub
29381b363b Before dereferencing intotw() check for NULL, the same way as it is
done for in_pcb (see r157474).

MFC after:	1 week
2011-11-06 09:29:52 +00:00
Sergey Kandaurov
6ba404cab8 Remove a couple of write-only variables. 2011-11-03 09:09:05 +00:00
Qing Li
14417253d8 The code change made in r226040 was incomplete and resulted in
routes such as fe80::1%lo0 no being installed. This patch completes
the original intended fix.

Reviewed by:	hrs, bz
MFC after:	3 days
2011-10-16 22:24:04 +00:00
Qing Li
e74eb12c68 The IPv6 code was influx at the time of r196865 due to the L2/L3
separation rewrite changes. r196865 was committed to fix a scope
violation problem in the following test scenario:

  box-1# ifconfig em0 inet6 2001:db8:1:: prefixlen 64 anycast
  box-1# ifconfig em1 inet6 2001:db8:2::1 prefixlen 64

  box-2# ifconfig re0 inet6 2001:db8:1::6 prefixlen 64

  em0 and re0 are on the same link.

  box-2# ping6 2001:db8:1::
  PING6(56=40+8+8 bytes) 2001:db8:1::6 --> 2001:db8:1::

the ICMPv6 response should have a source address of em1, which
is 2001:db8:2::1, not the link-local address of em0.

That code is no longer necessary and breaks the IPv6-Ready logo
testing, so revert it now.

Reviewed by:	hrs
MFC after:	3 days
2011-10-16 22:15:13 +00:00
Hiroki Sato
154d5f7321 Fix a problem that an interface unexpectedly becomes IFF_UP by
just doing "ifconfing inet6 -ifdisabled" when the interface has
ND6_IFF_AUTO_LINKLOCAL flag and no link-local address.
2011-10-16 19:46:52 +00:00
Gleb Smirnoff
d5378bb633 Use TAILQ_FOREACH() in the nd6_dad_find() instead of hand-rolled implementation. 2011-10-13 13:33:23 +00:00
Gleb Smirnoff
b590b6ae80 Restore functions in6_ifaddloop() and in6_ifremloop() that were
inlined by Qing Li in his big new-ARP commit. I am going to utilize
them in my newcarp work, and also these functions left declared
in in6_var.h for all the time they were absent.

Reviewed by:	bz
2011-10-13 13:05:36 +00:00
Qing Li
6c6aa80c9d The IFA_RTSELF instead of the IFA_ROUTE flag should be checked to
determine if a loopback route should be installed for an interface
IPv6 address. Another condition is the address must not belong to a
looopback interface.

Reviewed by:	hrs
MFC after:	3 days
2011-10-05 16:27:11 +00:00
Bjoern A. Zeeb
d7ae37140a Fix an obvious bug from r186196 shadowing a variable, not correctly
appending the new mbuf to the chain reference but possibly causing an mbuf
nextpkt loop leading to a memory used after handoff (or having been freed)
and leaking an mbuf here.

Reviewed by:	rwatson, brooks
MFC after:	3 days
2011-09-30 18:20:16 +00:00
Kip Macy
1eeb6d97d0 Make KBI changes required for future MFCing of inpcb rtentry / llentry caching.
Reviewed by:	rwatson, bz
Approved by:	re (kib)
2011-09-20 20:27:26 +00:00
Hiroki Sato
6090ab8bd6 Copy ip6po_minmtu and ip6po_prefer_tempaddr in ip6_copypktopts(). This fixes
inconsistency when options are specified by both setsockopt() and ancillary
data types.

PR:		kern/158307
Approved by:	re (bz)
2011-09-20 00:29:17 +00:00
Hiroki Sato
049087a0f3 Add $ipv6_cpe_wanif to enable functionality required for IPv6 CPE
(r225485).  When setting an interface name to it, the following
configurations will be enabled:

 1. "no_radr" is set to all IPv6 interfaces automatically.

 2. "-no_radr accept_rtadv" will be set only for $ipv6_cpe_wanif.  This is
    done just before evaluating $ifconfig_IF_ipv6 in the rc.d scripts (this
    means you can manually supersede this configuration if necessary).

 3. The node will add RA-sending routers to the default router list
    even if net.inet6.ip6.forwarding=1.

This mode is added to conform to RFC 6204 (a router which connects
the end-user network to a service provider network).  To enable
packet forwarding, you still need to set ipv6_gateway_enable=YES.

Note that accepting router entries into the default router list when
packet forwarding capability and a routing daemon are enabled can
result in messing up the routing table.  To minimize such unexpected
behaviors, "no_radr" is set on all interfaces but $ipv6_cpe_wanif.

Approved by:	re (bz)
2011-09-13 00:06:11 +00:00
Sergey Kandaurov
0ad2addc9d Fix if_addr_mtx recursion in mld6.
mld_set_version() is called only from mld_v1_input_query() and
mld_v2_input_query() both holding the if_addr_mtx lock, and then calling
into mld_v2_cancel_link_timers() acquires it the second time, which results
in mtx recursion. To avoid that, delay if_addr_mtx acquisition until after
mld_set_version() is called; while here, further reduce locking scope
to protect only the needed pieces: if_multiaddrs, in6m_lookup_locked().

PR:		kern/158426
Reported by:	Thomas <tps vr-web.de>,
		Tom Vijlbrief <tom.vijlbrief xs4all.nl>
Tested by:	Tom Vijlbrief
Reviewed by:	bz
Approved by:	re (kib)
2011-08-22 23:39:40 +00:00
Bjoern A. Zeeb
8a006adb24 Add support for IPv6 to ipfw fwd:
Distinguish IPv4 and IPv6 addresses and optional port numbers in
user space to set the option for the correct protocol family.
Add support in the kernel for carrying the new IPv6 destination
address and port.
Add support to TCP and UDP for IPv6 and fix UDP IPv4 to not change
the address in the IP header.
Add support for IPv6 forwarding to a non-local destination.
Add a regession test uitilizing VIMAGE to check all 20 possible
combinations I could think of.

Obtained from:	David Dolson at Sandvine Incorporated
		(original version for ipfw fwd IPv6 support)
Sponsored by:	Sandvine Incorporated
PR:		bin/117214
MFC after:	4 weeks
Approved by:	re (kib)
2011-08-20 17:05:11 +00:00
Bjoern A. Zeeb
90bc35de38 Add an in6_localip() helper function as in6_localaddr() is not doing what
people think: returning true for an address in any connected subnet, not
necessarily on the local machine.

Sponsored by:	Sandvine Incorporated
MFC after:	2 weeks
Approved by:	re (kib)
2011-08-20 16:43:47 +00:00
Michael Tuexen
ca85e9482a The result of a joint work between rrs@ and myself at the IETF:
* Decouple the path supervision using a separate HB timer per path.
* Add support for potentially failed state.
* Bring back RTO.min to 1 second.
* Accept packets on IP-addresses already announced via an ASCONF
* While there: do some cleanups.

Approved by: re@
MFC after: 2 months.
2011-08-03 20:21:00 +00:00
Michael Tuexen
78d9a31d3a The socket API only specifies SCTP for SOCK_SEQPACKET and
SOCK_STREAM, but not SOCK_DGRAM. So don't register it for
SOCK_DGRAM.
While there, fix some indentation.
2011-07-12 19:29:29 +00:00
Marko Zec
13e255fab7 Permit ARP to proceed for IPv4 host routes for which the gateway is the
same as the host address.  This already works fine for INET6 and ND6.

While here, remove two function pointers from struct lltable which are
only initialized but never used.

MFC after:	3 days
2011-07-08 09:38:33 +00:00
Bjoern A. Zeeb
e0bfbfce79 Update packet filter (pf) code to OpenBSD 4.5.
You need to update userland (world and ports) tools
to be in sync with the kernel.

Submitted by:	mlaier
Submitted by:	eri
2011-06-28 11:57:25 +00:00
Bjoern A. Zeeb
869052041d Add the missing call to ip6_ipsec_filtertunnel() to be able to control
whether decapsulated IPsec packets will be passed to pfil again depending
on the setting of the net.ip6.ipsec6.filtertunnel sysctl.

PR:		kern/157670
Submitted by:	Manuel Kasper (mk neon1.net)
MFC after:	2 weeks
2011-06-08 10:59:36 +00:00
Bjoern A. Zeeb
ffe8cd7b10 Correct comments and debug logging in ipsec to better match reality.
MFC after:	3 days
2011-06-08 03:02:11 +00:00
Robert Watson
52cd27cb58 Implement a CPU-affine TCP and UDP connection lookup data structure,
struct inpcbgroup.  pcbgroups, or "connection groups", supplement the
existing inpcbinfo connection hash table, which when pcbgroups are
enabled, might now be thought of more usefully as a per-protocol
4-tuple reservation table.

Connections are assigned to connection groups base on a hash of their
4-tuple; wildcard sockets require special handling, and are members
of all connection groups.  During a connection lookup, a
per-connection group lock is employed rather than the global pcbinfo
lock.  By aligning connection groups with input path processing,
connection groups take on an effective CPU affinity, especially when
aligned with RSS work placement (see a forthcoming commit for
details).  This eliminates cache line migration associated with
global, protocol-layer data structures in steady state TCP and UDP
processing (with the exception of protocol-layer statistics; further
commit to follow).

Elements of this approach were inspired by Willman, Rixner, and Cox's
2006 USENIX paper, "An Evaluation of Network Stack Parallelization
Strategies in Modern Operating Systems".  However, there are also
significant differences: we maintain the inpcb lock, rather than using
the connection group lock for per-connection state.

Likewise, the focus of this implementation is alignment with NIC
packet distribution strategies such as RSS, rather than pure software
strategies.  Despite that focus, software distribution is supported
through the parallel netisr implementation, and works well in
configurations where the number of hardware threads is greater than
the number of NIC input queues, such as in the RMI XLR threaded MIPS
architecture.

Another important difference is the continued maintenance of existing
hash tables as "reservation tables" -- these are useful both to
distinguish the resource allocation aspect of protocol name management
and the more common-case lookup aspect.  In configurations where
connection tables are aligned with hardware hashes, it is desirable to
use the traditional lookup tables for loopback or encapsulated traffic
rather than take the expense of hardware hashes that are hard to
implement efficiently in software (such as RSS Toeplitz).

Connection group support is enabled by compiling "options PCBGROUP"
into your kernel configuration; for the time being, this is an
experimental feature, and hence is not enabled by default.

Subject to the limited MFCability of change dependencies in inpcb,
and its change to the inpcbinfo init function signature, this change
in principle could be merged to FreeBSD 8.x.

Reviewed by:    bz
Sponsored by:   Juniper Networks, Inc.
2011-06-06 12:55:02 +00:00
Hiroki Sato
23be782526 Do not activate automatic LL addr configuration when 0/1->1 transition of
ND6_IFF_IFDISABLED flag.
2011-06-06 04:12:57 +00:00