2005-01-07 02:30:35 +00:00
|
|
|
/*-
|
1999-12-07 17:39:16 +00:00
|
|
|
* Copyright (C) 1995, 1996, 1997, and 1998 WIDE Project.
|
Decompose the current single inpcbinfo lock into two locks:
- The existing ipi_lock continues to protect the global inpcb list and
inpcb counter. This lock is now relegated to a small number of
allocation and free operations, and occasional operations that walk
all connections (including, awkwardly, certain UDP multicast receive
operations -- something to revisit).
- A new ipi_hash_lock protects the two inpcbinfo hash tables for
looking up connections and bound sockets, manipulated using new
INP_HASH_*() macros. This lock, combined with inpcb locks, protects
the 4-tuple address space.
Unlike the current ipi_lock, ipi_hash_lock follows the individual inpcb
connection locks, so may be acquired while manipulating a connection on
which a lock is already held, avoiding the need to acquire the inpcbinfo
lock preemptively when a binding change might later be required. As a
result, however, lookup operations necessarily go through a reference
acquire while holding the lookup lock, later acquiring an inpcb lock --
if required.
A new function in_pcblookup() looks up connections, and accepts flags
indicating how to return the inpcb. Due to lock order changes, callers
no longer need acquire locks before performing a lookup: the lookup
routine will acquire the ipi_hash_lock as needed. In the future, it will
also be able to use alternative lookup and locking strategies
transparently to callers, such as pcbgroup lookup. New lookup flags are,
supplementing the existing INPLOOKUP_WILDCARD flag:
INPLOOKUP_RLOCKPCB - Acquire a read lock on the returned inpcb
INPLOOKUP_WLOCKPCB - Acquire a write lock on the returned inpcb
Callers must pass exactly one of these flags (for the time being).
Some notes:
- All protocols are updated to work within the new regime; especially,
TCP, UDPv4, and UDPv6. pcbinfo ipi_lock acquisitions are largely
eliminated, and global hash lock hold times are dramatically reduced
compared to previous locking.
- The TCP syncache still relies on the pcbinfo lock, something that we
may want to revisit.
- Support for reverting to the FreeBSD 7.x locking strategy in TCP input
is no longer available -- hash lookup locks are now held only very
briefly during inpcb lookup, rather than for potentially extended
periods. However, the pcbinfo ipi_lock will still be acquired if a
connection state might change such that a connection is added or
removed.
- Raw IP sockets continue to use the pcbinfo ipi_lock for protection,
due to maintaining their own hash tables.
- The interface in6_pcblookup_hash_locked() is maintained, which allows
callers to acquire hash locks and perform one or more lookups atomically
with 4-tuple allocation: this is required only for TCPv6, as there is no
in6_pcbconnect_setup(), which there should be.
- UDPv6 locking remains significantly more conservative than UDPv4
locking, which relates to source address selection. This needs
attention, as it likely significantly reduces parallelism in this code
for multithreaded socket use (such as in BIND).
- In the UDPv4 and UDPv6 multicast cases, we need to revisit locking
somewhat, as they relied on ipi_lock to stablise 4-tuple matches, which
is no longer sufficient. A second check once the inpcb lock is held
should do the trick, keeping the general case from requiring the inpcb
lock for every inpcb visited.
- This work reminds us that we need to revisit locking of the v4/v6 flags,
which may be accessed lock-free both before and after this change.
- Right now, a single lock name is used for the pcbhash lock -- this is
undesirable, and probably another argument is required to take care of
this (or a char array name field in the pcbinfo?).
This is not an MFC candidate for 8.x due to its impact on lookup and
locking semantics. It's possible some of these issues could be worked
around with compatibility wrappers, if necessary.
Reviewed by: bz
Sponsored by: Juniper Networks, Inc.
2011-05-30 09:43:55 +00:00
|
|
|
* Copyright (c) 2010-2011 Juniper Networks, Inc.
|
1999-12-07 17:39:16 +00:00
|
|
|
* All rights reserved.
|
|
|
|
*
|
Decompose the current single inpcbinfo lock into two locks:
- The existing ipi_lock continues to protect the global inpcb list and
inpcb counter. This lock is now relegated to a small number of
allocation and free operations, and occasional operations that walk
all connections (including, awkwardly, certain UDP multicast receive
operations -- something to revisit).
- A new ipi_hash_lock protects the two inpcbinfo hash tables for
looking up connections and bound sockets, manipulated using new
INP_HASH_*() macros. This lock, combined with inpcb locks, protects
the 4-tuple address space.
Unlike the current ipi_lock, ipi_hash_lock follows the individual inpcb
connection locks, so may be acquired while manipulating a connection on
which a lock is already held, avoiding the need to acquire the inpcbinfo
lock preemptively when a binding change might later be required. As a
result, however, lookup operations necessarily go through a reference
acquire while holding the lookup lock, later acquiring an inpcb lock --
if required.
A new function in_pcblookup() looks up connections, and accepts flags
indicating how to return the inpcb. Due to lock order changes, callers
no longer need acquire locks before performing a lookup: the lookup
routine will acquire the ipi_hash_lock as needed. In the future, it will
also be able to use alternative lookup and locking strategies
transparently to callers, such as pcbgroup lookup. New lookup flags are,
supplementing the existing INPLOOKUP_WILDCARD flag:
INPLOOKUP_RLOCKPCB - Acquire a read lock on the returned inpcb
INPLOOKUP_WLOCKPCB - Acquire a write lock on the returned inpcb
Callers must pass exactly one of these flags (for the time being).
Some notes:
- All protocols are updated to work within the new regime; especially,
TCP, UDPv4, and UDPv6. pcbinfo ipi_lock acquisitions are largely
eliminated, and global hash lock hold times are dramatically reduced
compared to previous locking.
- The TCP syncache still relies on the pcbinfo lock, something that we
may want to revisit.
- Support for reverting to the FreeBSD 7.x locking strategy in TCP input
is no longer available -- hash lookup locks are now held only very
briefly during inpcb lookup, rather than for potentially extended
periods. However, the pcbinfo ipi_lock will still be acquired if a
connection state might change such that a connection is added or
removed.
- Raw IP sockets continue to use the pcbinfo ipi_lock for protection,
due to maintaining their own hash tables.
- The interface in6_pcblookup_hash_locked() is maintained, which allows
callers to acquire hash locks and perform one or more lookups atomically
with 4-tuple allocation: this is required only for TCPv6, as there is no
in6_pcbconnect_setup(), which there should be.
- UDPv6 locking remains significantly more conservative than UDPv4
locking, which relates to source address selection. This needs
attention, as it likely significantly reduces parallelism in this code
for multithreaded socket use (such as in BIND).
- In the UDPv4 and UDPv6 multicast cases, we need to revisit locking
somewhat, as they relied on ipi_lock to stablise 4-tuple matches, which
is no longer sufficient. A second check once the inpcb lock is held
should do the trick, keeping the general case from requiring the inpcb
lock for every inpcb visited.
- This work reminds us that we need to revisit locking of the v4/v6 flags,
which may be accessed lock-free both before and after this change.
- Right now, a single lock name is used for the pcbhash lock -- this is
undesirable, and probably another argument is required to take care of
this (or a char array name field in the pcbinfo?).
This is not an MFC candidate for 8.x due to its impact on lookup and
locking semantics. It's possible some of these issues could be worked
around with compatibility wrappers, if necessary.
Reviewed by: bz
Sponsored by: Juniper Networks, Inc.
2011-05-30 09:43:55 +00:00
|
|
|
* Portions of this software were developed by Robert N. M. Watson under
|
|
|
|
* contract to Juniper Networks, Inc.
|
|
|
|
*
|
1999-12-07 17:39:16 +00:00
|
|
|
* Redistribution and use in source and binary forms, with or without
|
|
|
|
* modification, are permitted provided that the following conditions
|
|
|
|
* are met:
|
|
|
|
* 1. Redistributions of source code must retain the above copyright
|
|
|
|
* notice, this list of conditions and the following disclaimer.
|
|
|
|
* 2. Redistributions in binary form must reproduce the above copyright
|
|
|
|
* notice, this list of conditions and the following disclaimer in the
|
|
|
|
* documentation and/or other materials provided with the distribution.
|
|
|
|
* 3. Neither the name of the project nor the names of its contributors
|
|
|
|
* may be used to endorse or promote products derived from this software
|
|
|
|
* without specific prior written permission.
|
|
|
|
*
|
|
|
|
* THIS SOFTWARE IS PROVIDED BY THE PROJECT AND CONTRIBUTORS ``AS IS'' AND
|
|
|
|
* ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE
|
|
|
|
* IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE
|
|
|
|
* ARE DISCLAIMED. IN NO EVENT SHALL THE PROJECT OR CONTRIBUTORS BE LIABLE
|
|
|
|
* FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL
|
|
|
|
* DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS
|
|
|
|
* OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION)
|
|
|
|
* HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT
|
|
|
|
* LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY
|
|
|
|
* OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF
|
|
|
|
* SUCH DAMAGE.
|
2007-12-10 16:03:40 +00:00
|
|
|
*
|
|
|
|
* $KAME: udp6_usrreq.c,v 1.27 2001/05/21 05:45:10 jinmei Exp $
|
|
|
|
* $KAME: udp6_output.c,v 1.31 2001/05/21 16:39:15 jinmei Exp $
|
1999-12-07 17:39:16 +00:00
|
|
|
*/
|
|
|
|
|
2005-01-07 02:30:35 +00:00
|
|
|
/*-
|
2007-09-08 08:18:24 +00:00
|
|
|
* Copyright (c) 1982, 1986, 1988, 1990, 1993, 1995
|
2007-07-09 17:47:04 +00:00
|
|
|
* The Regents of the University of California.
|
|
|
|
* All rights reserved.
|
1999-12-07 17:39:16 +00:00
|
|
|
*
|
|
|
|
* Redistribution and use in source and binary forms, with or without
|
|
|
|
* modification, are permitted provided that the following conditions
|
|
|
|
* are met:
|
|
|
|
* 1. Redistributions of source code must retain the above copyright
|
|
|
|
* notice, this list of conditions and the following disclaimer.
|
|
|
|
* 2. Redistributions in binary form must reproduce the above copyright
|
|
|
|
* notice, this list of conditions and the following disclaimer in the
|
|
|
|
* documentation and/or other materials provided with the distribution.
|
|
|
|
* 4. Neither the name of the University nor the names of its contributors
|
|
|
|
* may be used to endorse or promote products derived from this software
|
|
|
|
* without specific prior written permission.
|
|
|
|
*
|
|
|
|
* THIS SOFTWARE IS PROVIDED BY THE REGENTS AND CONTRIBUTORS ``AS IS'' AND
|
|
|
|
* ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE
|
|
|
|
* IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE
|
|
|
|
* ARE DISCLAIMED. IN NO EVENT SHALL THE REGENTS OR CONTRIBUTORS BE LIABLE
|
|
|
|
* FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL
|
|
|
|
* DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS
|
|
|
|
* OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION)
|
|
|
|
* HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT
|
|
|
|
* LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY
|
|
|
|
* OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF
|
|
|
|
* SUCH DAMAGE.
|
|
|
|
*
|
2007-09-08 08:18:24 +00:00
|
|
|
* @(#)udp_usrreq.c 8.6 (Berkeley) 5/23/95
|
1999-12-07 17:39:16 +00:00
|
|
|
*/
|
|
|
|
|
2007-12-10 16:03:40 +00:00
|
|
|
#include <sys/cdefs.h>
|
|
|
|
__FBSDID("$FreeBSD$");
|
|
|
|
|
2000-07-04 16:35:15 +00:00
|
|
|
#include "opt_inet.h"
|
|
|
|
#include "opt_inet6.h"
|
2011-08-20 17:05:11 +00:00
|
|
|
#include "opt_ipfw.h"
|
1999-12-22 19:13:38 +00:00
|
|
|
#include "opt_ipsec.h"
|
|
|
|
|
1999-12-07 17:39:16 +00:00
|
|
|
#include <sys/param.h>
|
MFp4:
Bring in updated jail support from bz_jail branch.
This enhances the current jail implementation to permit multiple
addresses per jail. In addtion to IPv4, IPv6 is supported as well.
Due to updated checks it is even possible to have jails without
an IP address at all, which basically gives one a chroot with
restricted process view, no networking,..
SCTP support was updated and supports IPv6 in jails as well.
Cpuset support permits jails to be bound to specific processor
sets after creation.
Jails can have an unrestricted (no duplicate protection, etc.) name
in addition to the hostname. The jail name cannot be changed from
within a jail and is considered to be used for management purposes
or as audit-token in the future.
DDB 'show jails' command was added to aid debugging.
Proper compat support permits 32bit jail binaries to be used on 64bit
systems to manage jails. Also backward compatibility was preserved where
possible: for jail v1 syscalls, as well as with user space management
utilities.
Both jail as well as prison version were updated for the new features.
A gap was intentionally left as the intermediate versions had been
used by various patches floating around the last years.
Bump __FreeBSD_version for the afore mentioned and in kernel changes.
Special thanks to:
- Pawel Jakub Dawidek (pjd) for his multi-IPv4 patches
and Olivier Houchard (cognet) for initial single-IPv6 patches.
- Jeff Roberson (jeff) and Randall Stewart (rrs) for their
help, ideas and review on cpuset and SCTP support.
- Robert Watson (rwatson) for lots and lots of help, discussions,
suggestions and review of most of the patch at various stages.
- John Baldwin (jhb) for his help.
- Simon L. Nielsen (simon) as early adopter testing changes
on cluster machines as well as all the testers and people
who provided feedback the last months on freebsd-jail and
other channels.
- My employer, CK Software GmbH, for the support so I could work on this.
Reviewed by: (see above)
MFC after: 3 months (this is just so that I get the mail)
X-MFC Before: 7.2-RELEASE if possible
2008-11-29 14:32:14 +00:00
|
|
|
#include <sys/jail.h>
|
1999-12-07 17:39:16 +00:00
|
|
|
#include <sys/kernel.h>
|
2002-04-30 01:54:54 +00:00
|
|
|
#include <sys/lock.h>
|
1999-12-07 17:39:16 +00:00
|
|
|
#include <sys/mbuf.h>
|
2006-11-06 13:42:10 +00:00
|
|
|
#include <sys/priv.h>
|
2002-04-30 01:54:54 +00:00
|
|
|
#include <sys/proc.h>
|
1999-12-07 17:39:16 +00:00
|
|
|
#include <sys/protosw.h>
|
2002-04-30 01:54:54 +00:00
|
|
|
#include <sys/signalvar.h>
|
1999-12-07 17:39:16 +00:00
|
|
|
#include <sys/socket.h>
|
|
|
|
#include <sys/socketvar.h>
|
2002-04-30 01:54:54 +00:00
|
|
|
#include <sys/sx.h>
|
|
|
|
#include <sys/sysctl.h>
|
1999-12-07 17:39:16 +00:00
|
|
|
#include <sys/syslog.h>
|
2002-04-30 01:54:54 +00:00
|
|
|
#include <sys/systm.h>
|
1999-12-07 17:39:16 +00:00
|
|
|
|
|
|
|
#include <net/if.h>
|
|
|
|
#include <net/if_types.h>
|
2002-04-30 01:54:54 +00:00
|
|
|
#include <net/route.h>
|
1999-12-07 17:39:16 +00:00
|
|
|
|
|
|
|
#include <netinet/in.h>
|
|
|
|
#include <netinet/in_pcb.h>
|
2002-04-30 01:54:54 +00:00
|
|
|
#include <netinet/in_systm.h>
|
1999-12-07 17:39:16 +00:00
|
|
|
#include <netinet/in_var.h>
|
2002-04-30 01:54:54 +00:00
|
|
|
#include <netinet/ip.h>
|
2007-07-19 22:34:25 +00:00
|
|
|
#include <netinet/ip_icmp.h>
|
2002-04-30 01:54:54 +00:00
|
|
|
#include <netinet/ip6.h>
|
2007-07-19 22:34:25 +00:00
|
|
|
#include <netinet/icmp_var.h>
|
2002-04-30 01:54:54 +00:00
|
|
|
#include <netinet/icmp6.h>
|
1999-12-07 17:39:16 +00:00
|
|
|
#include <netinet/ip_var.h>
|
|
|
|
#include <netinet/udp.h>
|
|
|
|
#include <netinet/udp_var.h>
|
2008-12-02 21:37:28 +00:00
|
|
|
|
2002-04-30 01:54:54 +00:00
|
|
|
#include <netinet6/ip6protosw.h>
|
1999-12-07 17:39:16 +00:00
|
|
|
#include <netinet6/ip6_var.h>
|
|
|
|
#include <netinet6/in6_pcb.h>
|
|
|
|
#include <netinet6/udp6_var.h>
|
2005-07-25 12:31:43 +00:00
|
|
|
#include <netinet6/scope6_var.h>
|
1999-12-07 17:39:16 +00:00
|
|
|
|
2007-07-03 12:13:45 +00:00
|
|
|
#ifdef IPSEC
|
2002-10-16 02:25:05 +00:00
|
|
|
#include <netipsec/ipsec.h>
|
|
|
|
#include <netipsec/ipsec6.h>
|
2007-07-03 12:13:45 +00:00
|
|
|
#endif /* IPSEC */
|
2002-10-16 02:25:05 +00:00
|
|
|
|
2007-07-19 22:34:25 +00:00
|
|
|
#include <security/mac/mac_framework.h>
|
|
|
|
|
1999-12-07 17:39:16 +00:00
|
|
|
/*
|
2007-09-08 08:18:24 +00:00
|
|
|
* UDP protocol implementation.
|
1999-12-07 17:39:16 +00:00
|
|
|
* Per RFC 768, August, 1980.
|
|
|
|
*/
|
|
|
|
|
2007-07-09 17:47:04 +00:00
|
|
|
extern struct protosw inetsw[];
|
|
|
|
static void udp6_detach(struct socket *so);
|
1999-12-07 17:39:16 +00:00
|
|
|
|
2006-05-01 21:39:48 +00:00
|
|
|
static void
|
2007-09-08 08:18:24 +00:00
|
|
|
udp6_append(struct inpcb *inp, struct mbuf *n, int off,
|
2006-05-01 21:39:48 +00:00
|
|
|
struct sockaddr_in6 *fromsa)
|
|
|
|
{
|
|
|
|
struct socket *so;
|
|
|
|
struct mbuf *opts;
|
|
|
|
|
2008-04-22 12:20:33 +00:00
|
|
|
INP_LOCK_ASSERT(inp);
|
2006-05-01 21:39:48 +00:00
|
|
|
|
2007-07-03 12:13:45 +00:00
|
|
|
#ifdef IPSEC
|
2007-07-09 17:47:04 +00:00
|
|
|
/* Check AH/ESP integrity. */
|
2007-09-08 08:18:24 +00:00
|
|
|
if (ipsec6_in_reject(n, inp)) {
|
2006-05-01 21:39:48 +00:00
|
|
|
m_freem(n);
|
Commit step 1 of the vimage project, (network stack)
virtualization work done by Marko Zec (zec@).
This is the first in a series of commits over the course
of the next few weeks.
Mark all uses of global variables to be virtualized
with a V_ prefix.
Use macros to map them back to their global names for
now, so this is a NOP change only.
We hope to have caught at least 85-90% of what is needed
so we do not invalidate a lot of outstanding patches again.
Obtained from: //depot/projects/vimage-commit2/...
Reviewed by: brooks, des, ed, mav, julian,
jamie, kris, rwatson, zec, ...
(various people I forgot, different versions)
md5 (with a bit of help)
Sponsored by: NLnet Foundation, The FreeBSD Foundation
X-MFC after: never
V_Commit_Message_Reviewed_By: more people than the patch
2008-08-17 23:27:27 +00:00
|
|
|
V_ipsec6stat.in_polvio++;
|
2006-05-01 21:39:48 +00:00
|
|
|
return;
|
|
|
|
}
|
2007-07-03 12:13:45 +00:00
|
|
|
#endif /* IPSEC */
|
2007-07-19 22:34:25 +00:00
|
|
|
#ifdef MAC
|
2007-10-24 19:04:04 +00:00
|
|
|
if (mac_inpcb_check_deliver(inp, n) != 0) {
|
2007-07-19 22:34:25 +00:00
|
|
|
m_freem(n);
|
|
|
|
return;
|
|
|
|
}
|
|
|
|
#endif
|
2006-05-01 21:39:48 +00:00
|
|
|
opts = NULL;
|
2008-12-17 13:00:18 +00:00
|
|
|
if (inp->inp_flags & INP_CONTROLOPTS ||
|
2007-09-08 08:18:24 +00:00
|
|
|
inp->inp_socket->so_options & SO_TIMESTAMP)
|
|
|
|
ip6_savecontrol(inp, n, &opts);
|
2006-05-01 21:39:48 +00:00
|
|
|
m_adj(n, off + sizeof(struct udphdr));
|
|
|
|
|
2007-09-08 08:18:24 +00:00
|
|
|
so = inp->inp_socket;
|
2006-05-01 21:39:48 +00:00
|
|
|
SOCKBUF_LOCK(&so->so_rcv);
|
|
|
|
if (sbappendaddr_locked(&so->so_rcv, (struct sockaddr *)fromsa, n,
|
|
|
|
opts) == 0) {
|
|
|
|
SOCKBUF_UNLOCK(&so->so_rcv);
|
|
|
|
m_freem(n);
|
|
|
|
if (opts)
|
|
|
|
m_freem(opts);
|
2009-04-12 11:53:12 +00:00
|
|
|
UDPSTAT_INC(udps_fullsock);
|
2006-05-01 21:39:48 +00:00
|
|
|
} else
|
|
|
|
sorwakeup_locked(so);
|
|
|
|
}
|
|
|
|
|
1999-12-07 17:39:16 +00:00
|
|
|
int
|
2007-07-05 16:23:49 +00:00
|
|
|
udp6_input(struct mbuf **mp, int *offp, int proto)
|
1999-12-07 17:39:16 +00:00
|
|
|
{
|
2006-05-01 21:39:48 +00:00
|
|
|
struct mbuf *m = *mp;
|
Bite the bullet, and make the IPv6 SSM and MLDv2 mega-commit:
import from p4 bms_netdev. Summary of changes:
* Connect netinet6/in6_mcast.c to build.
The legacy KAME KPIs are mostly preserved.
* Eliminate now dead code from ip6_output.c.
Don't do mbuf bingo, we are not going to do RFC 2292 style
CMSG tricks for multicast options as they are not required
by any current IPv6 normative reference.
* Refactor transports (UDP, raw_ip6) to do own mcast filtering.
SCTP, TCP unaffected by this change.
* Add ip6_msource, in6_msource structs to in6_var.h.
* Hookup mld_ifinfo state to in6_ifextra, allocate from
domifattach path.
* Eliminate IN6_LOOKUP_MULTI(), it is no longer referenced.
Kernel consumers which need this should use in6m_lookup().
* Refactor IPv6 socket group memberships to use a vector (like IPv4).
* Update ifmcstat(8) for IPv6 SSM.
* Add witness lock order for IN6_MULTI_LOCK.
* Move IN6_MULTI_LOCK out of lower ip6_output()/ip6_input() paths.
* Introduce IP6STAT_ADD/SUB/INC/DEC as per rwatson's IPv4 cleanup.
* Update carp(4) for new IPv6 SSM KPIs.
* Virtualize ip6_mrouter socket.
Changes mostly localized to IPv6 MROUTING.
* Don't do a local group lookup in MROUTING.
* Kill unused KAME prototypes in6_purgemkludge(), in6_restoremkludge().
* Preserve KAME DAD timer jitter behaviour in MLDv1 compatibility mode.
* Bump __FreeBSD_version to 800084.
* Update UPDATING.
NOTE WELL:
* This code hasn't been tested against real MLDv2 queriers
(yet), although the on-wire protocol has been verified in Wireshark.
* There are a few unresolved issues in the socket layer APIs to
do with scope ID propagation.
* There is a LOR present in ip6_output()'s use of
in6_setscope() which needs to be resolved. See comments in mld6.c.
This is believed to be benign and can't be avoided for the moment
without re-introducing an indirect netisr.
This work was mostly derived from the IGMPv3 implementation, and
has been sponsored by a third party.
2009-04-29 19:19:13 +00:00
|
|
|
struct ifnet *ifp;
|
2007-07-09 17:47:04 +00:00
|
|
|
struct ip6_hdr *ip6;
|
|
|
|
struct udphdr *uh;
|
2007-09-08 08:18:24 +00:00
|
|
|
struct inpcb *inp;
|
2009-05-23 16:51:13 +00:00
|
|
|
struct udpcb *up;
|
1999-12-07 17:39:16 +00:00
|
|
|
int off = *offp;
|
|
|
|
int plen, ulen;
|
2003-11-02 19:09:29 +00:00
|
|
|
struct sockaddr_in6 fromsa;
|
2011-08-20 17:05:11 +00:00
|
|
|
#ifdef IPFIREWALL_FORWARD
|
|
|
|
struct m_tag *fwd_tag;
|
|
|
|
#endif
|
2012-05-25 02:19:17 +00:00
|
|
|
uint16_t uh_sum;
|
1999-12-07 17:39:16 +00:00
|
|
|
|
Bite the bullet, and make the IPv6 SSM and MLDv2 mega-commit:
import from p4 bms_netdev. Summary of changes:
* Connect netinet6/in6_mcast.c to build.
The legacy KAME KPIs are mostly preserved.
* Eliminate now dead code from ip6_output.c.
Don't do mbuf bingo, we are not going to do RFC 2292 style
CMSG tricks for multicast options as they are not required
by any current IPv6 normative reference.
* Refactor transports (UDP, raw_ip6) to do own mcast filtering.
SCTP, TCP unaffected by this change.
* Add ip6_msource, in6_msource structs to in6_var.h.
* Hookup mld_ifinfo state to in6_ifextra, allocate from
domifattach path.
* Eliminate IN6_LOOKUP_MULTI(), it is no longer referenced.
Kernel consumers which need this should use in6m_lookup().
* Refactor IPv6 socket group memberships to use a vector (like IPv4).
* Update ifmcstat(8) for IPv6 SSM.
* Add witness lock order for IN6_MULTI_LOCK.
* Move IN6_MULTI_LOCK out of lower ip6_output()/ip6_input() paths.
* Introduce IP6STAT_ADD/SUB/INC/DEC as per rwatson's IPv4 cleanup.
* Update carp(4) for new IPv6 SSM KPIs.
* Virtualize ip6_mrouter socket.
Changes mostly localized to IPv6 MROUTING.
* Don't do a local group lookup in MROUTING.
* Kill unused KAME prototypes in6_purgemkludge(), in6_restoremkludge().
* Preserve KAME DAD timer jitter behaviour in MLDv1 compatibility mode.
* Bump __FreeBSD_version to 800084.
* Update UPDATING.
NOTE WELL:
* This code hasn't been tested against real MLDv2 queriers
(yet), although the on-wire protocol has been verified in Wireshark.
* There are a few unresolved issues in the socket layer APIs to
do with scope ID propagation.
* There is a LOR present in ip6_output()'s use of
in6_setscope() which needs to be resolved. See comments in mld6.c.
This is believed to be benign and can't be avoided for the moment
without re-introducing an indirect netisr.
This work was mostly derived from the IGMPv3 implementation, and
has been sponsored by a third party.
2009-04-29 19:19:13 +00:00
|
|
|
ifp = m->m_pkthdr.rcvif;
|
2001-06-11 12:39:29 +00:00
|
|
|
ip6 = mtod(m, struct ip6_hdr *);
|
|
|
|
|
2001-09-25 18:40:52 +00:00
|
|
|
if (faithprefix_p != NULL && (*faithprefix_p)(&ip6->ip6_dst)) {
|
2001-06-11 12:39:29 +00:00
|
|
|
/* XXX send icmp6 host/port unreach? */
|
|
|
|
m_freem(m);
|
2007-07-09 17:47:04 +00:00
|
|
|
return (IPPROTO_DONE);
|
1999-12-07 17:39:16 +00:00
|
|
|
}
|
|
|
|
|
2004-02-13 15:11:47 +00:00
|
|
|
#ifndef PULLDOWN_TEST
|
|
|
|
IP6_EXTHDR_CHECK(m, off, sizeof(struct udphdr), IPPROTO_DONE);
|
|
|
|
ip6 = mtod(m, struct ip6_hdr *);
|
|
|
|
uh = (struct udphdr *)((caddr_t)ip6 + off);
|
|
|
|
#else
|
|
|
|
IP6_EXTHDR_GET(uh, struct udphdr *, m, off, sizeof(*uh));
|
|
|
|
if (!uh)
|
2007-07-09 17:47:04 +00:00
|
|
|
return (IPPROTO_DONE);
|
2004-02-13 15:11:47 +00:00
|
|
|
#endif
|
|
|
|
|
2009-04-12 11:53:12 +00:00
|
|
|
UDPSTAT_INC(udps_ipackets);
|
1999-12-07 17:39:16 +00:00
|
|
|
|
2007-07-19 22:34:25 +00:00
|
|
|
/*
|
|
|
|
* Destination port of 0 is illegal, based on RFC768.
|
|
|
|
*/
|
|
|
|
if (uh->uh_dport == 0)
|
|
|
|
goto badunlocked;
|
|
|
|
|
1999-12-07 17:39:16 +00:00
|
|
|
plen = ntohs(ip6->ip6_plen) - off + sizeof(*ip6);
|
|
|
|
ulen = ntohs((u_short)uh->uh_ulen);
|
|
|
|
|
|
|
|
if (plen != ulen) {
|
2009-04-12 11:53:12 +00:00
|
|
|
UDPSTAT_INC(udps_badlen);
|
2007-07-09 17:47:04 +00:00
|
|
|
goto badunlocked;
|
1999-12-07 17:39:16 +00:00
|
|
|
}
|
|
|
|
|
|
|
|
/*
|
|
|
|
* Checksum extended UDP header and data.
|
|
|
|
*/
|
2004-04-01 13:48:23 +00:00
|
|
|
if (uh->uh_sum == 0) {
|
2009-04-12 11:53:12 +00:00
|
|
|
UDPSTAT_INC(udps_nosum);
|
2007-07-09 17:47:04 +00:00
|
|
|
goto badunlocked;
|
2004-04-01 13:48:23 +00:00
|
|
|
}
|
2012-05-25 02:19:17 +00:00
|
|
|
|
It turns out that too many drivers are not only parsing the L2/3/4
headers for TSO but also for generic checksum offloading. Ideally we
would only have one common function shared amongst all drivers, and
perhaps when updating them for IPv6 we should introduce that.
Eventually we should provide the meta information along with mbufs to
avoid (re-)parsing entirely.
To not break IPv6 (checksums and offload) and to be able to MFC the
changes without risking to hurt 3rd party drivers, duplicate the v4
framework, as other OSes have done as well.
Introduce interface capability flags for TX/RX checksum offload with
IPv6, to allow independent toggling (where possible). Add CSUM_*_IPV6
flags for UDP/TCP over IPv6, and reserve further for SCTP, and IPv6
fragmentation. Define CSUM_DELAY_DATA_IPV6 as we do for legacy IP and
add an alias for CSUM_DATA_VALID_IPV6.
This pretty much brings IPv6 handling in line with IPv4.
TSO is still handled in a different way and not via if_hwassist.
Update ifconfig to allow (un)setting of the new capability flags.
Update loopback to announce the new capabilities and if_hwassist flags.
Individual driver updates will have to follow, as will SCTP.
Reported by: gallatin, dim, ..
Reviewed by: gallatin (glanced at?)
MFC after: 3 days
X-MFC with: r235961,235959,235958
2012-05-28 09:30:13 +00:00
|
|
|
if (m->m_pkthdr.csum_flags & CSUM_DATA_VALID_IPV6) {
|
2012-05-25 02:19:17 +00:00
|
|
|
if (m->m_pkthdr.csum_flags & CSUM_PSEUDO_HDR)
|
|
|
|
uh_sum = m->m_pkthdr.csum_data;
|
|
|
|
else
|
|
|
|
uh_sum = in6_cksum_pseudo(ip6, ulen,
|
|
|
|
IPPROTO_UDP, m->m_pkthdr.csum_data);
|
|
|
|
uh_sum ^= 0xffff;
|
|
|
|
} else
|
|
|
|
uh_sum = in6_cksum(m, IPPROTO_UDP, off, ulen);
|
|
|
|
|
|
|
|
if (uh_sum != 0) {
|
2009-04-12 11:53:12 +00:00
|
|
|
UDPSTAT_INC(udps_badsum);
|
2007-07-09 17:47:04 +00:00
|
|
|
goto badunlocked;
|
1999-12-07 17:39:16 +00:00
|
|
|
}
|
|
|
|
|
2006-05-01 21:39:48 +00:00
|
|
|
/*
|
|
|
|
* Construct sockaddr format source address.
|
|
|
|
*/
|
|
|
|
init_sin6(&fromsa, m);
|
|
|
|
fromsa.sin6_port = uh->uh_sport;
|
|
|
|
|
1999-12-07 17:39:16 +00:00
|
|
|
if (IN6_IS_ADDR_MULTICAST(&ip6->ip6_dst)) {
|
2007-07-09 17:47:04 +00:00
|
|
|
struct inpcb *last;
|
Bite the bullet, and make the IPv6 SSM and MLDv2 mega-commit:
import from p4 bms_netdev. Summary of changes:
* Connect netinet6/in6_mcast.c to build.
The legacy KAME KPIs are mostly preserved.
* Eliminate now dead code from ip6_output.c.
Don't do mbuf bingo, we are not going to do RFC 2292 style
CMSG tricks for multicast options as they are not required
by any current IPv6 normative reference.
* Refactor transports (UDP, raw_ip6) to do own mcast filtering.
SCTP, TCP unaffected by this change.
* Add ip6_msource, in6_msource structs to in6_var.h.
* Hookup mld_ifinfo state to in6_ifextra, allocate from
domifattach path.
* Eliminate IN6_LOOKUP_MULTI(), it is no longer referenced.
Kernel consumers which need this should use in6m_lookup().
* Refactor IPv6 socket group memberships to use a vector (like IPv4).
* Update ifmcstat(8) for IPv6 SSM.
* Add witness lock order for IN6_MULTI_LOCK.
* Move IN6_MULTI_LOCK out of lower ip6_output()/ip6_input() paths.
* Introduce IP6STAT_ADD/SUB/INC/DEC as per rwatson's IPv4 cleanup.
* Update carp(4) for new IPv6 SSM KPIs.
* Virtualize ip6_mrouter socket.
Changes mostly localized to IPv6 MROUTING.
* Don't do a local group lookup in MROUTING.
* Kill unused KAME prototypes in6_purgemkludge(), in6_restoremkludge().
* Preserve KAME DAD timer jitter behaviour in MLDv1 compatibility mode.
* Bump __FreeBSD_version to 800084.
* Update UPDATING.
NOTE WELL:
* This code hasn't been tested against real MLDv2 queriers
(yet), although the on-wire protocol has been verified in Wireshark.
* There are a few unresolved issues in the socket layer APIs to
do with scope ID propagation.
* There is a LOR present in ip6_output()'s use of
in6_setscope() which needs to be resolved. See comments in mld6.c.
This is believed to be benign and can't be avoided for the moment
without re-introducing an indirect netisr.
This work was mostly derived from the IGMPv3 implementation, and
has been sponsored by a third party.
2009-04-29 19:19:13 +00:00
|
|
|
struct ip6_moptions *imo;
|
1999-12-07 17:39:16 +00:00
|
|
|
|
Decompose the current single inpcbinfo lock into two locks:
- The existing ipi_lock continues to protect the global inpcb list and
inpcb counter. This lock is now relegated to a small number of
allocation and free operations, and occasional operations that walk
all connections (including, awkwardly, certain UDP multicast receive
operations -- something to revisit).
- A new ipi_hash_lock protects the two inpcbinfo hash tables for
looking up connections and bound sockets, manipulated using new
INP_HASH_*() macros. This lock, combined with inpcb locks, protects
the 4-tuple address space.
Unlike the current ipi_lock, ipi_hash_lock follows the individual inpcb
connection locks, so may be acquired while manipulating a connection on
which a lock is already held, avoiding the need to acquire the inpcbinfo
lock preemptively when a binding change might later be required. As a
result, however, lookup operations necessarily go through a reference
acquire while holding the lookup lock, later acquiring an inpcb lock --
if required.
A new function in_pcblookup() looks up connections, and accepts flags
indicating how to return the inpcb. Due to lock order changes, callers
no longer need acquire locks before performing a lookup: the lookup
routine will acquire the ipi_hash_lock as needed. In the future, it will
also be able to use alternative lookup and locking strategies
transparently to callers, such as pcbgroup lookup. New lookup flags are,
supplementing the existing INPLOOKUP_WILDCARD flag:
INPLOOKUP_RLOCKPCB - Acquire a read lock on the returned inpcb
INPLOOKUP_WLOCKPCB - Acquire a write lock on the returned inpcb
Callers must pass exactly one of these flags (for the time being).
Some notes:
- All protocols are updated to work within the new regime; especially,
TCP, UDPv4, and UDPv6. pcbinfo ipi_lock acquisitions are largely
eliminated, and global hash lock hold times are dramatically reduced
compared to previous locking.
- The TCP syncache still relies on the pcbinfo lock, something that we
may want to revisit.
- Support for reverting to the FreeBSD 7.x locking strategy in TCP input
is no longer available -- hash lookup locks are now held only very
briefly during inpcb lookup, rather than for potentially extended
periods. However, the pcbinfo ipi_lock will still be acquired if a
connection state might change such that a connection is added or
removed.
- Raw IP sockets continue to use the pcbinfo ipi_lock for protection,
due to maintaining their own hash tables.
- The interface in6_pcblookup_hash_locked() is maintained, which allows
callers to acquire hash locks and perform one or more lookups atomically
with 4-tuple allocation: this is required only for TCPv6, as there is no
in6_pcbconnect_setup(), which there should be.
- UDPv6 locking remains significantly more conservative than UDPv4
locking, which relates to source address selection. This needs
attention, as it likely significantly reduces parallelism in this code
for multithreaded socket use (such as in BIND).
- In the UDPv4 and UDPv6 multicast cases, we need to revisit locking
somewhat, as they relied on ipi_lock to stablise 4-tuple matches, which
is no longer sufficient. A second check once the inpcb lock is held
should do the trick, keeping the general case from requiring the inpcb
lock for every inpcb visited.
- This work reminds us that we need to revisit locking of the v4/v6 flags,
which may be accessed lock-free both before and after this change.
- Right now, a single lock name is used for the pcbhash lock -- this is
undesirable, and probably another argument is required to take care of
this (or a char array name field in the pcbinfo?).
This is not an MFC candidate for 8.x due to its impact on lookup and
locking semantics. It's possible some of these issues could be worked
around with compatibility wrappers, if necessary.
Reviewed by: bz
Sponsored by: Juniper Networks, Inc.
2011-05-30 09:43:55 +00:00
|
|
|
INP_INFO_RLOCK(&V_udbinfo);
|
1999-12-07 17:39:16 +00:00
|
|
|
/*
|
2007-07-09 17:47:04 +00:00
|
|
|
* In the event that laddr should be set to the link-local
|
1999-12-07 17:39:16 +00:00
|
|
|
* address (this happens in RIPng), the multicast address
|
2007-07-09 17:47:04 +00:00
|
|
|
* specified in the received packet will not match laddr. To
|
|
|
|
* handle this situation, matching is relaxed if the
|
|
|
|
* receiving interface is the same as one specified in the
|
|
|
|
* socket and if the destination multicast address matches
|
|
|
|
* one of the multicast groups specified in the socket.
|
1999-12-07 17:39:16 +00:00
|
|
|
*/
|
|
|
|
|
|
|
|
/*
|
2007-07-09 17:47:04 +00:00
|
|
|
* KAME note: traditionally we dropped udpiphdr from mbuf
|
|
|
|
* here. We need udphdr for IPsec processing so we do that
|
|
|
|
* later.
|
1999-12-07 17:39:16 +00:00
|
|
|
*/
|
|
|
|
last = NULL;
|
Commit step 1 of the vimage project, (network stack)
virtualization work done by Marko Zec (zec@).
This is the first in a series of commits over the course
of the next few weeks.
Mark all uses of global variables to be virtualized
with a V_ prefix.
Use macros to map them back to their global names for
now, so this is a NOP change only.
We hope to have caught at least 85-90% of what is needed
so we do not invalidate a lot of outstanding patches again.
Obtained from: //depot/projects/vimage-commit2/...
Reviewed by: brooks, des, ed, mav, julian,
jamie, kris, rwatson, zec, ...
(various people I forgot, different versions)
md5 (with a bit of help)
Sponsored by: NLnet Foundation, The FreeBSD Foundation
X-MFC after: never
V_Commit_Message_Reviewed_By: more people than the patch
2008-08-17 23:27:27 +00:00
|
|
|
LIST_FOREACH(inp, &V_udb, inp_list) {
|
2007-09-08 08:18:24 +00:00
|
|
|
if ((inp->inp_vflag & INP_IPV6) == 0)
|
1999-12-07 17:39:16 +00:00
|
|
|
continue;
|
Another step assimilating IPv[46] PCB code - directly use
the inpcb names rather than the following IPv6 compat macros:
in6pcb,in6p_sp, in6p_ip6_nxt,in6p_flowinfo,in6p_vflag,
in6p_flags,in6p_socket,in6p_lport,in6p_fport,in6p_ppcb and
sotoin6pcb().
Apart from removing duplicate code in netipsec, this is a pure
whitespace, not a functional change.
Discussed with: rwatson
Reviewed by: rwatson (version before review requested changes)
MFC after: 4 weeks (set the timer and see then)
2008-12-15 21:50:54 +00:00
|
|
|
if (inp->inp_lport != uh->uh_dport)
|
1999-12-07 17:39:16 +00:00
|
|
|
continue;
|
2007-09-08 08:18:24 +00:00
|
|
|
if (inp->inp_fport != 0 &&
|
|
|
|
inp->inp_fport != uh->uh_sport)
|
2007-07-19 22:34:25 +00:00
|
|
|
continue;
|
2007-09-08 08:18:24 +00:00
|
|
|
if (!IN6_IS_ADDR_UNSPECIFIED(&inp->in6p_laddr)) {
|
|
|
|
if (!IN6_ARE_ADDR_EQUAL(&inp->in6p_laddr,
|
2004-02-13 15:11:47 +00:00
|
|
|
&ip6->ip6_dst))
|
1999-12-07 17:39:16 +00:00
|
|
|
continue;
|
|
|
|
}
|
2007-09-08 08:18:24 +00:00
|
|
|
if (!IN6_IS_ADDR_UNSPECIFIED(&inp->in6p_faddr)) {
|
|
|
|
if (!IN6_ARE_ADDR_EQUAL(&inp->in6p_faddr,
|
1999-12-07 17:39:16 +00:00
|
|
|
&ip6->ip6_src) ||
|
Another step assimilating IPv[46] PCB code - directly use
the inpcb names rather than the following IPv6 compat macros:
in6pcb,in6p_sp, in6p_ip6_nxt,in6p_flowinfo,in6p_vflag,
in6p_flags,in6p_socket,in6p_lport,in6p_fport,in6p_ppcb and
sotoin6pcb().
Apart from removing duplicate code in netipsec, this is a pure
whitespace, not a functional change.
Discussed with: rwatson
Reviewed by: rwatson (version before review requested changes)
MFC after: 4 weeks (set the timer and see then)
2008-12-15 21:50:54 +00:00
|
|
|
inp->inp_fport != uh->uh_sport)
|
1999-12-07 17:39:16 +00:00
|
|
|
continue;
|
|
|
|
}
|
|
|
|
|
Decompose the current single inpcbinfo lock into two locks:
- The existing ipi_lock continues to protect the global inpcb list and
inpcb counter. This lock is now relegated to a small number of
allocation and free operations, and occasional operations that walk
all connections (including, awkwardly, certain UDP multicast receive
operations -- something to revisit).
- A new ipi_hash_lock protects the two inpcbinfo hash tables for
looking up connections and bound sockets, manipulated using new
INP_HASH_*() macros. This lock, combined with inpcb locks, protects
the 4-tuple address space.
Unlike the current ipi_lock, ipi_hash_lock follows the individual inpcb
connection locks, so may be acquired while manipulating a connection on
which a lock is already held, avoiding the need to acquire the inpcbinfo
lock preemptively when a binding change might later be required. As a
result, however, lookup operations necessarily go through a reference
acquire while holding the lookup lock, later acquiring an inpcb lock --
if required.
A new function in_pcblookup() looks up connections, and accepts flags
indicating how to return the inpcb. Due to lock order changes, callers
no longer need acquire locks before performing a lookup: the lookup
routine will acquire the ipi_hash_lock as needed. In the future, it will
also be able to use alternative lookup and locking strategies
transparently to callers, such as pcbgroup lookup. New lookup flags are,
supplementing the existing INPLOOKUP_WILDCARD flag:
INPLOOKUP_RLOCKPCB - Acquire a read lock on the returned inpcb
INPLOOKUP_WLOCKPCB - Acquire a write lock on the returned inpcb
Callers must pass exactly one of these flags (for the time being).
Some notes:
- All protocols are updated to work within the new regime; especially,
TCP, UDPv4, and UDPv6. pcbinfo ipi_lock acquisitions are largely
eliminated, and global hash lock hold times are dramatically reduced
compared to previous locking.
- The TCP syncache still relies on the pcbinfo lock, something that we
may want to revisit.
- Support for reverting to the FreeBSD 7.x locking strategy in TCP input
is no longer available -- hash lookup locks are now held only very
briefly during inpcb lookup, rather than for potentially extended
periods. However, the pcbinfo ipi_lock will still be acquired if a
connection state might change such that a connection is added or
removed.
- Raw IP sockets continue to use the pcbinfo ipi_lock for protection,
due to maintaining their own hash tables.
- The interface in6_pcblookup_hash_locked() is maintained, which allows
callers to acquire hash locks and perform one or more lookups atomically
with 4-tuple allocation: this is required only for TCPv6, as there is no
in6_pcbconnect_setup(), which there should be.
- UDPv6 locking remains significantly more conservative than UDPv4
locking, which relates to source address selection. This needs
attention, as it likely significantly reduces parallelism in this code
for multithreaded socket use (such as in BIND).
- In the UDPv4 and UDPv6 multicast cases, we need to revisit locking
somewhat, as they relied on ipi_lock to stablise 4-tuple matches, which
is no longer sufficient. A second check once the inpcb lock is held
should do the trick, keeping the general case from requiring the inpcb
lock for every inpcb visited.
- This work reminds us that we need to revisit locking of the v4/v6 flags,
which may be accessed lock-free both before and after this change.
- Right now, a single lock name is used for the pcbhash lock -- this is
undesirable, and probably another argument is required to take care of
this (or a char array name field in the pcbinfo?).
This is not an MFC candidate for 8.x due to its impact on lookup and
locking semantics. It's possible some of these issues could be worked
around with compatibility wrappers, if necessary.
Reviewed by: bz
Sponsored by: Juniper Networks, Inc.
2011-05-30 09:43:55 +00:00
|
|
|
/*
|
|
|
|
* XXXRW: Because we weren't holding either the inpcb
|
|
|
|
* or the hash lock when we checked for a match
|
|
|
|
* before, we should probably recheck now that the
|
|
|
|
* inpcb lock is (supposed to be) held.
|
|
|
|
*/
|
|
|
|
|
Bite the bullet, and make the IPv6 SSM and MLDv2 mega-commit:
import from p4 bms_netdev. Summary of changes:
* Connect netinet6/in6_mcast.c to build.
The legacy KAME KPIs are mostly preserved.
* Eliminate now dead code from ip6_output.c.
Don't do mbuf bingo, we are not going to do RFC 2292 style
CMSG tricks for multicast options as they are not required
by any current IPv6 normative reference.
* Refactor transports (UDP, raw_ip6) to do own mcast filtering.
SCTP, TCP unaffected by this change.
* Add ip6_msource, in6_msource structs to in6_var.h.
* Hookup mld_ifinfo state to in6_ifextra, allocate from
domifattach path.
* Eliminate IN6_LOOKUP_MULTI(), it is no longer referenced.
Kernel consumers which need this should use in6m_lookup().
* Refactor IPv6 socket group memberships to use a vector (like IPv4).
* Update ifmcstat(8) for IPv6 SSM.
* Add witness lock order for IN6_MULTI_LOCK.
* Move IN6_MULTI_LOCK out of lower ip6_output()/ip6_input() paths.
* Introduce IP6STAT_ADD/SUB/INC/DEC as per rwatson's IPv4 cleanup.
* Update carp(4) for new IPv6 SSM KPIs.
* Virtualize ip6_mrouter socket.
Changes mostly localized to IPv6 MROUTING.
* Don't do a local group lookup in MROUTING.
* Kill unused KAME prototypes in6_purgemkludge(), in6_restoremkludge().
* Preserve KAME DAD timer jitter behaviour in MLDv1 compatibility mode.
* Bump __FreeBSD_version to 800084.
* Update UPDATING.
NOTE WELL:
* This code hasn't been tested against real MLDv2 queriers
(yet), although the on-wire protocol has been verified in Wireshark.
* There are a few unresolved issues in the socket layer APIs to
do with scope ID propagation.
* There is a LOR present in ip6_output()'s use of
in6_setscope() which needs to be resolved. See comments in mld6.c.
This is believed to be benign and can't be avoided for the moment
without re-introducing an indirect netisr.
This work was mostly derived from the IGMPv3 implementation, and
has been sponsored by a third party.
2009-04-29 19:19:13 +00:00
|
|
|
/*
|
|
|
|
* Handle socket delivery policy for any-source
|
|
|
|
* and source-specific multicast. [RFC3678]
|
|
|
|
*/
|
|
|
|
imo = inp->in6p_moptions;
|
|
|
|
if (imo && IN6_IS_ADDR_MULTICAST(&ip6->ip6_dst)) {
|
|
|
|
struct sockaddr_in6 mcaddr;
|
|
|
|
int blocked;
|
|
|
|
|
2009-05-01 11:05:24 +00:00
|
|
|
INP_RLOCK(inp);
|
|
|
|
|
Bite the bullet, and make the IPv6 SSM and MLDv2 mega-commit:
import from p4 bms_netdev. Summary of changes:
* Connect netinet6/in6_mcast.c to build.
The legacy KAME KPIs are mostly preserved.
* Eliminate now dead code from ip6_output.c.
Don't do mbuf bingo, we are not going to do RFC 2292 style
CMSG tricks for multicast options as they are not required
by any current IPv6 normative reference.
* Refactor transports (UDP, raw_ip6) to do own mcast filtering.
SCTP, TCP unaffected by this change.
* Add ip6_msource, in6_msource structs to in6_var.h.
* Hookup mld_ifinfo state to in6_ifextra, allocate from
domifattach path.
* Eliminate IN6_LOOKUP_MULTI(), it is no longer referenced.
Kernel consumers which need this should use in6m_lookup().
* Refactor IPv6 socket group memberships to use a vector (like IPv4).
* Update ifmcstat(8) for IPv6 SSM.
* Add witness lock order for IN6_MULTI_LOCK.
* Move IN6_MULTI_LOCK out of lower ip6_output()/ip6_input() paths.
* Introduce IP6STAT_ADD/SUB/INC/DEC as per rwatson's IPv4 cleanup.
* Update carp(4) for new IPv6 SSM KPIs.
* Virtualize ip6_mrouter socket.
Changes mostly localized to IPv6 MROUTING.
* Don't do a local group lookup in MROUTING.
* Kill unused KAME prototypes in6_purgemkludge(), in6_restoremkludge().
* Preserve KAME DAD timer jitter behaviour in MLDv1 compatibility mode.
* Bump __FreeBSD_version to 800084.
* Update UPDATING.
NOTE WELL:
* This code hasn't been tested against real MLDv2 queriers
(yet), although the on-wire protocol has been verified in Wireshark.
* There are a few unresolved issues in the socket layer APIs to
do with scope ID propagation.
* There is a LOR present in ip6_output()'s use of
in6_setscope() which needs to be resolved. See comments in mld6.c.
This is believed to be benign and can't be avoided for the moment
without re-introducing an indirect netisr.
This work was mostly derived from the IGMPv3 implementation, and
has been sponsored by a third party.
2009-04-29 19:19:13 +00:00
|
|
|
bzero(&mcaddr, sizeof(struct sockaddr_in6));
|
|
|
|
mcaddr.sin6_len = sizeof(struct sockaddr_in6);
|
|
|
|
mcaddr.sin6_family = AF_INET6;
|
|
|
|
mcaddr.sin6_addr = ip6->ip6_dst;
|
|
|
|
|
|
|
|
blocked = im6o_mc_filter(imo, ifp,
|
|
|
|
(struct sockaddr *)&mcaddr,
|
|
|
|
(struct sockaddr *)&fromsa);
|
|
|
|
if (blocked != MCAST_PASS) {
|
|
|
|
if (blocked == MCAST_NOTGMEMBER)
|
|
|
|
IP6STAT_INC(ip6s_notmember);
|
|
|
|
if (blocked == MCAST_NOTSMEMBER ||
|
|
|
|
blocked == MCAST_MUTED)
|
|
|
|
UDPSTAT_INC(udps_filtermcast);
|
2009-05-01 11:05:24 +00:00
|
|
|
INP_RUNLOCK(inp); /* XXX */
|
Bite the bullet, and make the IPv6 SSM and MLDv2 mega-commit:
import from p4 bms_netdev. Summary of changes:
* Connect netinet6/in6_mcast.c to build.
The legacy KAME KPIs are mostly preserved.
* Eliminate now dead code from ip6_output.c.
Don't do mbuf bingo, we are not going to do RFC 2292 style
CMSG tricks for multicast options as they are not required
by any current IPv6 normative reference.
* Refactor transports (UDP, raw_ip6) to do own mcast filtering.
SCTP, TCP unaffected by this change.
* Add ip6_msource, in6_msource structs to in6_var.h.
* Hookup mld_ifinfo state to in6_ifextra, allocate from
domifattach path.
* Eliminate IN6_LOOKUP_MULTI(), it is no longer referenced.
Kernel consumers which need this should use in6m_lookup().
* Refactor IPv6 socket group memberships to use a vector (like IPv4).
* Update ifmcstat(8) for IPv6 SSM.
* Add witness lock order for IN6_MULTI_LOCK.
* Move IN6_MULTI_LOCK out of lower ip6_output()/ip6_input() paths.
* Introduce IP6STAT_ADD/SUB/INC/DEC as per rwatson's IPv4 cleanup.
* Update carp(4) for new IPv6 SSM KPIs.
* Virtualize ip6_mrouter socket.
Changes mostly localized to IPv6 MROUTING.
* Don't do a local group lookup in MROUTING.
* Kill unused KAME prototypes in6_purgemkludge(), in6_restoremkludge().
* Preserve KAME DAD timer jitter behaviour in MLDv1 compatibility mode.
* Bump __FreeBSD_version to 800084.
* Update UPDATING.
NOTE WELL:
* This code hasn't been tested against real MLDv2 queriers
(yet), although the on-wire protocol has been verified in Wireshark.
* There are a few unresolved issues in the socket layer APIs to
do with scope ID propagation.
* There is a LOR present in ip6_output()'s use of
in6_setscope() which needs to be resolved. See comments in mld6.c.
This is believed to be benign and can't be avoided for the moment
without re-introducing an indirect netisr.
This work was mostly derived from the IGMPv3 implementation, and
has been sponsored by a third party.
2009-04-29 19:19:13 +00:00
|
|
|
continue;
|
|
|
|
}
|
2009-05-01 11:05:24 +00:00
|
|
|
|
|
|
|
INP_RUNLOCK(inp);
|
Bite the bullet, and make the IPv6 SSM and MLDv2 mega-commit:
import from p4 bms_netdev. Summary of changes:
* Connect netinet6/in6_mcast.c to build.
The legacy KAME KPIs are mostly preserved.
* Eliminate now dead code from ip6_output.c.
Don't do mbuf bingo, we are not going to do RFC 2292 style
CMSG tricks for multicast options as they are not required
by any current IPv6 normative reference.
* Refactor transports (UDP, raw_ip6) to do own mcast filtering.
SCTP, TCP unaffected by this change.
* Add ip6_msource, in6_msource structs to in6_var.h.
* Hookup mld_ifinfo state to in6_ifextra, allocate from
domifattach path.
* Eliminate IN6_LOOKUP_MULTI(), it is no longer referenced.
Kernel consumers which need this should use in6m_lookup().
* Refactor IPv6 socket group memberships to use a vector (like IPv4).
* Update ifmcstat(8) for IPv6 SSM.
* Add witness lock order for IN6_MULTI_LOCK.
* Move IN6_MULTI_LOCK out of lower ip6_output()/ip6_input() paths.
* Introduce IP6STAT_ADD/SUB/INC/DEC as per rwatson's IPv4 cleanup.
* Update carp(4) for new IPv6 SSM KPIs.
* Virtualize ip6_mrouter socket.
Changes mostly localized to IPv6 MROUTING.
* Don't do a local group lookup in MROUTING.
* Kill unused KAME prototypes in6_purgemkludge(), in6_restoremkludge().
* Preserve KAME DAD timer jitter behaviour in MLDv1 compatibility mode.
* Bump __FreeBSD_version to 800084.
* Update UPDATING.
NOTE WELL:
* This code hasn't been tested against real MLDv2 queriers
(yet), although the on-wire protocol has been verified in Wireshark.
* There are a few unresolved issues in the socket layer APIs to
do with scope ID propagation.
* There is a LOR present in ip6_output()'s use of
in6_setscope() which needs to be resolved. See comments in mld6.c.
This is believed to be benign and can't be avoided for the moment
without re-introducing an indirect netisr.
This work was mostly derived from the IGMPv3 implementation, and
has been sponsored by a third party.
2009-04-29 19:19:13 +00:00
|
|
|
}
|
1999-12-07 17:39:16 +00:00
|
|
|
if (last != NULL) {
|
2004-02-13 15:11:47 +00:00
|
|
|
struct mbuf *n;
|
1999-12-07 17:39:16 +00:00
|
|
|
|
|
|
|
if ((n = m_copy(m, 0, M_COPYALL)) != NULL) {
|
2008-04-22 12:20:33 +00:00
|
|
|
INP_RLOCK(last);
|
2009-05-23 16:51:13 +00:00
|
|
|
up = intoudpcb(last);
|
|
|
|
if (up->u_tun_func == NULL) {
|
|
|
|
udp6_append(last, n, off, &fromsa);
|
|
|
|
} else {
|
2009-01-06 12:13:40 +00:00
|
|
|
/*
|
|
|
|
* Engage the tunneling
|
|
|
|
* protocol we will have to
|
|
|
|
* leave the info_lock up,
|
|
|
|
* since we are hunting
|
2009-01-06 13:27:56 +00:00
|
|
|
* through multiple UDP's.
|
2009-01-06 12:13:40 +00:00
|
|
|
*
|
|
|
|
*/
|
2009-05-23 16:51:13 +00:00
|
|
|
(*up->u_tun_func)(n, off, last);
|
2009-01-06 12:13:40 +00:00
|
|
|
}
|
2009-05-23 16:51:13 +00:00
|
|
|
INP_RUNLOCK(last);
|
1999-12-07 17:39:16 +00:00
|
|
|
}
|
|
|
|
}
|
2007-09-08 08:18:24 +00:00
|
|
|
last = inp;
|
1999-12-07 17:39:16 +00:00
|
|
|
/*
|
|
|
|
* Don't look for additional matches if this one does
|
|
|
|
* not have either the SO_REUSEPORT or SO_REUSEADDR
|
2007-07-09 17:47:04 +00:00
|
|
|
* socket options set. This heuristic avoids
|
|
|
|
* searching through all pcbs in the common case of a
|
|
|
|
* non-shared port. It assumes that an application
|
|
|
|
* will never clear these options after setting them.
|
1999-12-07 17:39:16 +00:00
|
|
|
*/
|
2007-09-08 08:18:24 +00:00
|
|
|
if ((last->inp_socket->so_options &
|
2002-05-31 11:52:35 +00:00
|
|
|
(SO_REUSEPORT|SO_REUSEADDR)) == 0)
|
1999-12-07 17:39:16 +00:00
|
|
|
break;
|
|
|
|
}
|
|
|
|
|
|
|
|
if (last == NULL) {
|
|
|
|
/*
|
2007-07-09 17:47:04 +00:00
|
|
|
* No matching pcb found; discard datagram. (No need
|
|
|
|
* to send an ICMP Port Unreachable for a broadcast
|
|
|
|
* or multicast datgram.)
|
1999-12-07 17:39:16 +00:00
|
|
|
*/
|
2009-04-12 11:53:12 +00:00
|
|
|
UDPSTAT_INC(udps_noport);
|
|
|
|
UDPSTAT_INC(udps_noportmcast);
|
2007-07-09 17:47:04 +00:00
|
|
|
goto badheadlocked;
|
1999-12-07 17:39:16 +00:00
|
|
|
}
|
2008-04-22 12:20:33 +00:00
|
|
|
INP_RLOCK(last);
|
2008-08-31 13:16:45 +00:00
|
|
|
INP_INFO_RUNLOCK(&V_udbinfo);
|
2009-05-23 16:51:13 +00:00
|
|
|
up = intoudpcb(last);
|
|
|
|
if (up->u_tun_func == NULL) {
|
|
|
|
udp6_append(last, m, off, &fromsa);
|
|
|
|
} else {
|
2009-01-06 12:13:40 +00:00
|
|
|
/*
|
2009-01-06 13:27:56 +00:00
|
|
|
* Engage the tunneling protocol.
|
2009-01-06 12:13:40 +00:00
|
|
|
*/
|
2009-05-23 16:51:13 +00:00
|
|
|
(*up->u_tun_func)(m, off, last);
|
2009-01-06 12:13:40 +00:00
|
|
|
}
|
2008-04-22 12:20:33 +00:00
|
|
|
INP_RUNLOCK(last);
|
2007-07-09 17:47:04 +00:00
|
|
|
return (IPPROTO_DONE);
|
1999-12-07 17:39:16 +00:00
|
|
|
}
|
|
|
|
/*
|
|
|
|
* Locate pcb for datagram.
|
|
|
|
*/
|
2011-08-20 17:05:11 +00:00
|
|
|
#ifdef IPFIREWALL_FORWARD
|
|
|
|
/*
|
|
|
|
* Grab info from PACKET_TAG_IPFORWARD tag prepended to the chain.
|
|
|
|
*/
|
|
|
|
fwd_tag = m_tag_find(m, PACKET_TAG_IPFORWARD, NULL);
|
|
|
|
if (fwd_tag != NULL) {
|
|
|
|
struct sockaddr_in6 *next_hop6;
|
|
|
|
|
|
|
|
next_hop6 = (struct sockaddr_in6 *)(fwd_tag + 1);
|
|
|
|
|
|
|
|
/*
|
|
|
|
* Transparently forwarded. Pretend to be the destination.
|
|
|
|
* Already got one like this?
|
|
|
|
*/
|
|
|
|
inp = in6_pcblookup_mbuf(&V_udbinfo,
|
|
|
|
&ip6->ip6_src, uh->uh_sport, &ip6->ip6_dst, uh->uh_dport,
|
|
|
|
INPLOOKUP_RLOCKPCB, m->m_pkthdr.rcvif, m);
|
|
|
|
if (!inp) {
|
|
|
|
/*
|
|
|
|
* It's new. Try to find the ambushing socket.
|
|
|
|
* Because we've rewritten the destination address,
|
|
|
|
* any hardware-generated hash is ignored.
|
|
|
|
*/
|
|
|
|
inp = in6_pcblookup(&V_udbinfo, &ip6->ip6_src,
|
|
|
|
uh->uh_sport, &next_hop6->sin6_addr,
|
|
|
|
next_hop6->sin6_port ? htons(next_hop6->sin6_port) :
|
|
|
|
uh->uh_dport, INPLOOKUP_WILDCARD |
|
|
|
|
INPLOOKUP_RLOCKPCB, m->m_pkthdr.rcvif);
|
|
|
|
}
|
|
|
|
/* Remove the tag from the packet. We don't need it anymore. */
|
|
|
|
m_tag_delete(m, fwd_tag);
|
|
|
|
} else
|
|
|
|
#endif /* IPFIREWALL_FORWARD */
|
|
|
|
inp = in6_pcblookup_mbuf(&V_udbinfo, &ip6->ip6_src,
|
|
|
|
uh->uh_sport, &ip6->ip6_dst, uh->uh_dport,
|
|
|
|
INPLOOKUP_WILDCARD | INPLOOKUP_RLOCKPCB,
|
|
|
|
m->m_pkthdr.rcvif, m);
|
2007-09-08 08:18:24 +00:00
|
|
|
if (inp == NULL) {
|
2007-02-20 10:20:03 +00:00
|
|
|
if (udp_log_in_vain) {
|
2006-12-12 12:17:58 +00:00
|
|
|
char ip6bufs[INET6_ADDRSTRLEN];
|
|
|
|
char ip6bufd[INET6_ADDRSTRLEN];
|
1999-12-07 17:39:16 +00:00
|
|
|
|
|
|
|
log(LOG_INFO,
|
2002-08-19 19:47:13 +00:00
|
|
|
"Connection attempt to UDP [%s]:%d from [%s]:%d\n",
|
2006-12-12 12:17:58 +00:00
|
|
|
ip6_sprintf(ip6bufd, &ip6->ip6_dst),
|
|
|
|
ntohs(uh->uh_dport),
|
|
|
|
ip6_sprintf(ip6bufs, &ip6->ip6_src),
|
|
|
|
ntohs(uh->uh_sport));
|
1999-12-07 17:39:16 +00:00
|
|
|
}
|
2009-04-12 11:53:12 +00:00
|
|
|
UDPSTAT_INC(udps_noport);
|
1999-12-07 17:39:16 +00:00
|
|
|
if (m->m_flags & M_MCAST) {
|
|
|
|
printf("UDP6: M_MCAST is set in a unicast packet.\n");
|
2009-04-12 11:53:12 +00:00
|
|
|
UDPSTAT_INC(udps_noportmcast);
|
Decompose the current single inpcbinfo lock into two locks:
- The existing ipi_lock continues to protect the global inpcb list and
inpcb counter. This lock is now relegated to a small number of
allocation and free operations, and occasional operations that walk
all connections (including, awkwardly, certain UDP multicast receive
operations -- something to revisit).
- A new ipi_hash_lock protects the two inpcbinfo hash tables for
looking up connections and bound sockets, manipulated using new
INP_HASH_*() macros. This lock, combined with inpcb locks, protects
the 4-tuple address space.
Unlike the current ipi_lock, ipi_hash_lock follows the individual inpcb
connection locks, so may be acquired while manipulating a connection on
which a lock is already held, avoiding the need to acquire the inpcbinfo
lock preemptively when a binding change might later be required. As a
result, however, lookup operations necessarily go through a reference
acquire while holding the lookup lock, later acquiring an inpcb lock --
if required.
A new function in_pcblookup() looks up connections, and accepts flags
indicating how to return the inpcb. Due to lock order changes, callers
no longer need acquire locks before performing a lookup: the lookup
routine will acquire the ipi_hash_lock as needed. In the future, it will
also be able to use alternative lookup and locking strategies
transparently to callers, such as pcbgroup lookup. New lookup flags are,
supplementing the existing INPLOOKUP_WILDCARD flag:
INPLOOKUP_RLOCKPCB - Acquire a read lock on the returned inpcb
INPLOOKUP_WLOCKPCB - Acquire a write lock on the returned inpcb
Callers must pass exactly one of these flags (for the time being).
Some notes:
- All protocols are updated to work within the new regime; especially,
TCP, UDPv4, and UDPv6. pcbinfo ipi_lock acquisitions are largely
eliminated, and global hash lock hold times are dramatically reduced
compared to previous locking.
- The TCP syncache still relies on the pcbinfo lock, something that we
may want to revisit.
- Support for reverting to the FreeBSD 7.x locking strategy in TCP input
is no longer available -- hash lookup locks are now held only very
briefly during inpcb lookup, rather than for potentially extended
periods. However, the pcbinfo ipi_lock will still be acquired if a
connection state might change such that a connection is added or
removed.
- Raw IP sockets continue to use the pcbinfo ipi_lock for protection,
due to maintaining their own hash tables.
- The interface in6_pcblookup_hash_locked() is maintained, which allows
callers to acquire hash locks and perform one or more lookups atomically
with 4-tuple allocation: this is required only for TCPv6, as there is no
in6_pcbconnect_setup(), which there should be.
- UDPv6 locking remains significantly more conservative than UDPv4
locking, which relates to source address selection. This needs
attention, as it likely significantly reduces parallelism in this code
for multithreaded socket use (such as in BIND).
- In the UDPv4 and UDPv6 multicast cases, we need to revisit locking
somewhat, as they relied on ipi_lock to stablise 4-tuple matches, which
is no longer sufficient. A second check once the inpcb lock is held
should do the trick, keeping the general case from requiring the inpcb
lock for every inpcb visited.
- This work reminds us that we need to revisit locking of the v4/v6 flags,
which may be accessed lock-free both before and after this change.
- Right now, a single lock name is used for the pcbhash lock -- this is
undesirable, and probably another argument is required to take care of
this (or a char array name field in the pcbinfo?).
This is not an MFC candidate for 8.x due to its impact on lookup and
locking semantics. It's possible some of these issues could be worked
around with compatibility wrappers, if necessary.
Reviewed by: bz
Sponsored by: Juniper Networks, Inc.
2011-05-30 09:43:55 +00:00
|
|
|
goto badunlocked;
|
1999-12-07 17:39:16 +00:00
|
|
|
}
|
Commit step 1 of the vimage project, (network stack)
virtualization work done by Marko Zec (zec@).
This is the first in a series of commits over the course
of the next few weeks.
Mark all uses of global variables to be virtualized
with a V_ prefix.
Use macros to map them back to their global names for
now, so this is a NOP change only.
We hope to have caught at least 85-90% of what is needed
so we do not invalidate a lot of outstanding patches again.
Obtained from: //depot/projects/vimage-commit2/...
Reviewed by: brooks, des, ed, mav, julian,
jamie, kris, rwatson, zec, ...
(various people I forgot, different versions)
md5 (with a bit of help)
Sponsored by: NLnet Foundation, The FreeBSD Foundation
X-MFC after: never
V_Commit_Message_Reviewed_By: more people than the patch
2008-08-17 23:27:27 +00:00
|
|
|
if (V_udp_blackhole)
|
2007-07-19 22:34:25 +00:00
|
|
|
goto badunlocked;
|
|
|
|
if (badport_bandlim(BANDLIM_ICMP6_UNREACH) < 0)
|
|
|
|
goto badunlocked;
|
1999-12-07 17:39:16 +00:00
|
|
|
icmp6_error(m, ICMP6_DST_UNREACH, ICMP6_DST_UNREACH_NOPORT, 0);
|
2007-07-09 17:47:04 +00:00
|
|
|
return (IPPROTO_DONE);
|
1999-12-07 17:39:16 +00:00
|
|
|
}
|
Decompose the current single inpcbinfo lock into two locks:
- The existing ipi_lock continues to protect the global inpcb list and
inpcb counter. This lock is now relegated to a small number of
allocation and free operations, and occasional operations that walk
all connections (including, awkwardly, certain UDP multicast receive
operations -- something to revisit).
- A new ipi_hash_lock protects the two inpcbinfo hash tables for
looking up connections and bound sockets, manipulated using new
INP_HASH_*() macros. This lock, combined with inpcb locks, protects
the 4-tuple address space.
Unlike the current ipi_lock, ipi_hash_lock follows the individual inpcb
connection locks, so may be acquired while manipulating a connection on
which a lock is already held, avoiding the need to acquire the inpcbinfo
lock preemptively when a binding change might later be required. As a
result, however, lookup operations necessarily go through a reference
acquire while holding the lookup lock, later acquiring an inpcb lock --
if required.
A new function in_pcblookup() looks up connections, and accepts flags
indicating how to return the inpcb. Due to lock order changes, callers
no longer need acquire locks before performing a lookup: the lookup
routine will acquire the ipi_hash_lock as needed. In the future, it will
also be able to use alternative lookup and locking strategies
transparently to callers, such as pcbgroup lookup. New lookup flags are,
supplementing the existing INPLOOKUP_WILDCARD flag:
INPLOOKUP_RLOCKPCB - Acquire a read lock on the returned inpcb
INPLOOKUP_WLOCKPCB - Acquire a write lock on the returned inpcb
Callers must pass exactly one of these flags (for the time being).
Some notes:
- All protocols are updated to work within the new regime; especially,
TCP, UDPv4, and UDPv6. pcbinfo ipi_lock acquisitions are largely
eliminated, and global hash lock hold times are dramatically reduced
compared to previous locking.
- The TCP syncache still relies on the pcbinfo lock, something that we
may want to revisit.
- Support for reverting to the FreeBSD 7.x locking strategy in TCP input
is no longer available -- hash lookup locks are now held only very
briefly during inpcb lookup, rather than for potentially extended
periods. However, the pcbinfo ipi_lock will still be acquired if a
connection state might change such that a connection is added or
removed.
- Raw IP sockets continue to use the pcbinfo ipi_lock for protection,
due to maintaining their own hash tables.
- The interface in6_pcblookup_hash_locked() is maintained, which allows
callers to acquire hash locks and perform one or more lookups atomically
with 4-tuple allocation: this is required only for TCPv6, as there is no
in6_pcbconnect_setup(), which there should be.
- UDPv6 locking remains significantly more conservative than UDPv4
locking, which relates to source address selection. This needs
attention, as it likely significantly reduces parallelism in this code
for multithreaded socket use (such as in BIND).
- In the UDPv4 and UDPv6 multicast cases, we need to revisit locking
somewhat, as they relied on ipi_lock to stablise 4-tuple matches, which
is no longer sufficient. A second check once the inpcb lock is held
should do the trick, keeping the general case from requiring the inpcb
lock for every inpcb visited.
- This work reminds us that we need to revisit locking of the v4/v6 flags,
which may be accessed lock-free both before and after this change.
- Right now, a single lock name is used for the pcbhash lock -- this is
undesirable, and probably another argument is required to take care of
this (or a char array name field in the pcbinfo?).
This is not an MFC candidate for 8.x due to its impact on lookup and
locking semantics. It's possible some of these issues could be worked
around with compatibility wrappers, if necessary.
Reviewed by: bz
Sponsored by: Juniper Networks, Inc.
2011-05-30 09:43:55 +00:00
|
|
|
INP_RLOCK_ASSERT(inp);
|
2009-05-23 16:51:13 +00:00
|
|
|
up = intoudpcb(inp);
|
|
|
|
if (up->u_tun_func == NULL) {
|
|
|
|
udp6_append(inp, m, off, &fromsa);
|
|
|
|
} else {
|
2009-01-06 12:13:40 +00:00
|
|
|
/*
|
2009-01-06 13:27:56 +00:00
|
|
|
* Engage the tunneling protocol.
|
2009-01-06 12:13:40 +00:00
|
|
|
*/
|
|
|
|
|
2009-05-23 16:51:13 +00:00
|
|
|
(*up->u_tun_func)(m, off, inp);
|
2009-01-06 12:13:40 +00:00
|
|
|
}
|
2008-04-22 12:20:33 +00:00
|
|
|
INP_RUNLOCK(inp);
|
2007-07-09 17:47:04 +00:00
|
|
|
return (IPPROTO_DONE);
|
2007-07-19 22:34:25 +00:00
|
|
|
|
2007-07-09 17:47:04 +00:00
|
|
|
badheadlocked:
|
Commit step 1 of the vimage project, (network stack)
virtualization work done by Marko Zec (zec@).
This is the first in a series of commits over the course
of the next few weeks.
Mark all uses of global variables to be virtualized
with a V_ prefix.
Use macros to map them back to their global names for
now, so this is a NOP change only.
We hope to have caught at least 85-90% of what is needed
so we do not invalidate a lot of outstanding patches again.
Obtained from: //depot/projects/vimage-commit2/...
Reviewed by: brooks, des, ed, mav, julian,
jamie, kris, rwatson, zec, ...
(various people I forgot, different versions)
md5 (with a bit of help)
Sponsored by: NLnet Foundation, The FreeBSD Foundation
X-MFC after: never
V_Commit_Message_Reviewed_By: more people than the patch
2008-08-17 23:27:27 +00:00
|
|
|
INP_INFO_RUNLOCK(&V_udbinfo);
|
2007-07-09 17:47:04 +00:00
|
|
|
badunlocked:
|
1999-12-07 17:39:16 +00:00
|
|
|
if (m)
|
|
|
|
m_freem(m);
|
2007-07-09 17:47:04 +00:00
|
|
|
return (IPPROTO_DONE);
|
1999-12-07 17:39:16 +00:00
|
|
|
}
|
|
|
|
|
|
|
|
void
|
2007-07-05 16:23:49 +00:00
|
|
|
udp6_ctlinput(int cmd, struct sockaddr *sa, void *d)
|
1999-12-07 17:39:16 +00:00
|
|
|
{
|
|
|
|
struct udphdr uh;
|
|
|
|
struct ip6_hdr *ip6;
|
|
|
|
struct mbuf *m;
|
2000-07-04 16:35:15 +00:00
|
|
|
int off = 0;
|
2001-06-11 12:39:29 +00:00
|
|
|
struct ip6ctlparam *ip6cp = NULL;
|
|
|
|
const struct sockaddr_in6 *sa6_src = NULL;
|
2004-02-13 14:50:01 +00:00
|
|
|
void *cmdarg;
|
2008-01-08 19:08:58 +00:00
|
|
|
struct inpcb *(*notify)(struct inpcb *, int) = udp_notify;
|
2001-06-11 12:39:29 +00:00
|
|
|
struct udp_portonly {
|
|
|
|
u_int16_t uh_sport;
|
|
|
|
u_int16_t uh_dport;
|
|
|
|
} *uhp;
|
1999-12-07 17:39:16 +00:00
|
|
|
|
|
|
|
if (sa->sa_family != AF_INET6 ||
|
|
|
|
sa->sa_len != sizeof(struct sockaddr_in6))
|
|
|
|
return;
|
|
|
|
|
2000-07-04 16:35:15 +00:00
|
|
|
if ((unsigned)cmd >= PRC_NCMDS)
|
|
|
|
return;
|
|
|
|
if (PRC_IS_REDIRECT(cmd))
|
|
|
|
notify = in6_rtchange, d = NULL;
|
|
|
|
else if (cmd == PRC_HOSTDEAD)
|
|
|
|
d = NULL;
|
|
|
|
else if (inet6ctlerrmap[cmd] == 0)
|
1999-12-07 17:39:16 +00:00
|
|
|
return;
|
|
|
|
|
|
|
|
/* if the parameter is from icmp6, decode it. */
|
|
|
|
if (d != NULL) {
|
2001-06-11 12:39:29 +00:00
|
|
|
ip6cp = (struct ip6ctlparam *)d;
|
1999-12-07 17:39:16 +00:00
|
|
|
m = ip6cp->ip6c_m;
|
|
|
|
ip6 = ip6cp->ip6c_ip6;
|
|
|
|
off = ip6cp->ip6c_off;
|
2004-02-13 14:50:01 +00:00
|
|
|
cmdarg = ip6cp->ip6c_cmdarg;
|
2001-06-11 12:39:29 +00:00
|
|
|
sa6_src = ip6cp->ip6c_src;
|
1999-12-07 17:39:16 +00:00
|
|
|
} else {
|
|
|
|
m = NULL;
|
|
|
|
ip6 = NULL;
|
2004-02-13 14:50:01 +00:00
|
|
|
cmdarg = NULL;
|
2001-06-11 12:39:29 +00:00
|
|
|
sa6_src = &sa6_any;
|
1999-12-07 17:39:16 +00:00
|
|
|
}
|
|
|
|
|
|
|
|
if (ip6) {
|
|
|
|
/*
|
|
|
|
* XXX: We assume that when IPV6 is non NULL,
|
|
|
|
* M and OFF are valid.
|
|
|
|
*/
|
|
|
|
|
2007-07-09 17:47:04 +00:00
|
|
|
/* Check if we can safely examine src and dst ports. */
|
2001-06-11 12:39:29 +00:00
|
|
|
if (m->m_pkthdr.len < off + sizeof(*uhp))
|
2000-10-23 07:11:01 +00:00
|
|
|
return;
|
|
|
|
|
2001-06-11 12:39:29 +00:00
|
|
|
bzero(&uh, sizeof(uh));
|
|
|
|
m_copydata(m, off, sizeof(*uhp), (caddr_t)&uh);
|
|
|
|
|
Commit step 1 of the vimage project, (network stack)
virtualization work done by Marko Zec (zec@).
This is the first in a series of commits over the course
of the next few weeks.
Mark all uses of global variables to be virtualized
with a V_ prefix.
Use macros to map them back to their global names for
now, so this is a NOP change only.
We hope to have caught at least 85-90% of what is needed
so we do not invalidate a lot of outstanding patches again.
Obtained from: //depot/projects/vimage-commit2/...
Reviewed by: brooks, des, ed, mav, julian,
jamie, kris, rwatson, zec, ...
(various people I forgot, different versions)
md5 (with a bit of help)
Sponsored by: NLnet Foundation, The FreeBSD Foundation
X-MFC after: never
V_Commit_Message_Reviewed_By: more people than the patch
2008-08-17 23:27:27 +00:00
|
|
|
(void) in6_pcbnotify(&V_udbinfo, sa, uh.uh_dport,
|
2007-07-09 17:47:04 +00:00
|
|
|
(struct sockaddr *)ip6cp->ip6c_src, uh.uh_sport, cmd,
|
|
|
|
cmdarg, notify);
|
1999-12-07 17:39:16 +00:00
|
|
|
} else
|
Commit step 1 of the vimage project, (network stack)
virtualization work done by Marko Zec (zec@).
This is the first in a series of commits over the course
of the next few weeks.
Mark all uses of global variables to be virtualized
with a V_ prefix.
Use macros to map them back to their global names for
now, so this is a NOP change only.
We hope to have caught at least 85-90% of what is needed
so we do not invalidate a lot of outstanding patches again.
Obtained from: //depot/projects/vimage-commit2/...
Reviewed by: brooks, des, ed, mav, julian,
jamie, kris, rwatson, zec, ...
(various people I forgot, different versions)
md5 (with a bit of help)
Sponsored by: NLnet Foundation, The FreeBSD Foundation
X-MFC after: never
V_Commit_Message_Reviewed_By: more people than the patch
2008-08-17 23:27:27 +00:00
|
|
|
(void) in6_pcbnotify(&V_udbinfo, sa, 0,
|
2007-07-09 17:47:04 +00:00
|
|
|
(const struct sockaddr *)sa6_src, 0, cmd, cmdarg, notify);
|
1999-12-07 17:39:16 +00:00
|
|
|
}
|
|
|
|
|
|
|
|
static int
|
2000-07-04 11:25:35 +00:00
|
|
|
udp6_getcred(SYSCTL_HANDLER_ARGS)
|
1999-12-07 17:39:16 +00:00
|
|
|
{
|
2001-02-18 13:30:20 +00:00
|
|
|
struct xucred xuc;
|
1999-12-07 17:39:16 +00:00
|
|
|
struct sockaddr_in6 addrs[2];
|
|
|
|
struct inpcb *inp;
|
2006-04-12 03:03:47 +00:00
|
|
|
int error;
|
1999-12-07 17:39:16 +00:00
|
|
|
|
2007-06-12 00:12:01 +00:00
|
|
|
error = priv_check(req->td, PRIV_NETINET_GETCRED);
|
1999-12-07 17:39:16 +00:00
|
|
|
if (error)
|
|
|
|
return (error);
|
2000-07-04 16:35:15 +00:00
|
|
|
|
|
|
|
if (req->newlen != sizeof(addrs))
|
|
|
|
return (EINVAL);
|
2001-02-18 13:30:20 +00:00
|
|
|
if (req->oldlen != sizeof(struct xucred))
|
2000-07-04 16:35:15 +00:00
|
|
|
return (EINVAL);
|
1999-12-07 17:39:16 +00:00
|
|
|
error = SYSCTL_IN(req, addrs, sizeof(addrs));
|
|
|
|
if (error)
|
|
|
|
return (error);
|
Commit step 1 of the vimage project, (network stack)
virtualization work done by Marko Zec (zec@).
This is the first in a series of commits over the course
of the next few weeks.
Mark all uses of global variables to be virtualized
with a V_ prefix.
Use macros to map them back to their global names for
now, so this is a NOP change only.
We hope to have caught at least 85-90% of what is needed
so we do not invalidate a lot of outstanding patches again.
Obtained from: //depot/projects/vimage-commit2/...
Reviewed by: brooks, des, ed, mav, julian,
jamie, kris, rwatson, zec, ...
(various people I forgot, different versions)
md5 (with a bit of help)
Sponsored by: NLnet Foundation, The FreeBSD Foundation
X-MFC after: never
V_Commit_Message_Reviewed_By: more people than the patch
2008-08-17 23:27:27 +00:00
|
|
|
if ((error = sa6_embedscope(&addrs[0], V_ip6_use_defzone)) != 0 ||
|
|
|
|
(error = sa6_embedscope(&addrs[1], V_ip6_use_defzone)) != 0) {
|
2005-07-25 12:31:43 +00:00
|
|
|
return (error);
|
|
|
|
}
|
Decompose the current single inpcbinfo lock into two locks:
- The existing ipi_lock continues to protect the global inpcb list and
inpcb counter. This lock is now relegated to a small number of
allocation and free operations, and occasional operations that walk
all connections (including, awkwardly, certain UDP multicast receive
operations -- something to revisit).
- A new ipi_hash_lock protects the two inpcbinfo hash tables for
looking up connections and bound sockets, manipulated using new
INP_HASH_*() macros. This lock, combined with inpcb locks, protects
the 4-tuple address space.
Unlike the current ipi_lock, ipi_hash_lock follows the individual inpcb
connection locks, so may be acquired while manipulating a connection on
which a lock is already held, avoiding the need to acquire the inpcbinfo
lock preemptively when a binding change might later be required. As a
result, however, lookup operations necessarily go through a reference
acquire while holding the lookup lock, later acquiring an inpcb lock --
if required.
A new function in_pcblookup() looks up connections, and accepts flags
indicating how to return the inpcb. Due to lock order changes, callers
no longer need acquire locks before performing a lookup: the lookup
routine will acquire the ipi_hash_lock as needed. In the future, it will
also be able to use alternative lookup and locking strategies
transparently to callers, such as pcbgroup lookup. New lookup flags are,
supplementing the existing INPLOOKUP_WILDCARD flag:
INPLOOKUP_RLOCKPCB - Acquire a read lock on the returned inpcb
INPLOOKUP_WLOCKPCB - Acquire a write lock on the returned inpcb
Callers must pass exactly one of these flags (for the time being).
Some notes:
- All protocols are updated to work within the new regime; especially,
TCP, UDPv4, and UDPv6. pcbinfo ipi_lock acquisitions are largely
eliminated, and global hash lock hold times are dramatically reduced
compared to previous locking.
- The TCP syncache still relies on the pcbinfo lock, something that we
may want to revisit.
- Support for reverting to the FreeBSD 7.x locking strategy in TCP input
is no longer available -- hash lookup locks are now held only very
briefly during inpcb lookup, rather than for potentially extended
periods. However, the pcbinfo ipi_lock will still be acquired if a
connection state might change such that a connection is added or
removed.
- Raw IP sockets continue to use the pcbinfo ipi_lock for protection,
due to maintaining their own hash tables.
- The interface in6_pcblookup_hash_locked() is maintained, which allows
callers to acquire hash locks and perform one or more lookups atomically
with 4-tuple allocation: this is required only for TCPv6, as there is no
in6_pcbconnect_setup(), which there should be.
- UDPv6 locking remains significantly more conservative than UDPv4
locking, which relates to source address selection. This needs
attention, as it likely significantly reduces parallelism in this code
for multithreaded socket use (such as in BIND).
- In the UDPv4 and UDPv6 multicast cases, we need to revisit locking
somewhat, as they relied on ipi_lock to stablise 4-tuple matches, which
is no longer sufficient. A second check once the inpcb lock is held
should do the trick, keeping the general case from requiring the inpcb
lock for every inpcb visited.
- This work reminds us that we need to revisit locking of the v4/v6 flags,
which may be accessed lock-free both before and after this change.
- Right now, a single lock name is used for the pcbhash lock -- this is
undesirable, and probably another argument is required to take care of
this (or a char array name field in the pcbinfo?).
This is not an MFC candidate for 8.x due to its impact on lookup and
locking semantics. It's possible some of these issues could be worked
around with compatibility wrappers, if necessary.
Reviewed by: bz
Sponsored by: Juniper Networks, Inc.
2011-05-30 09:43:55 +00:00
|
|
|
inp = in6_pcblookup(&V_udbinfo, &addrs[1].sin6_addr,
|
|
|
|
addrs[1].sin6_port, &addrs[0].sin6_addr, addrs[0].sin6_port,
|
|
|
|
INPLOOKUP_WILDCARD | INPLOOKUP_RLOCKPCB, NULL);
|
2008-05-29 08:27:14 +00:00
|
|
|
if (inp != NULL) {
|
Decompose the current single inpcbinfo lock into two locks:
- The existing ipi_lock continues to protect the global inpcb list and
inpcb counter. This lock is now relegated to a small number of
allocation and free operations, and occasional operations that walk
all connections (including, awkwardly, certain UDP multicast receive
operations -- something to revisit).
- A new ipi_hash_lock protects the two inpcbinfo hash tables for
looking up connections and bound sockets, manipulated using new
INP_HASH_*() macros. This lock, combined with inpcb locks, protects
the 4-tuple address space.
Unlike the current ipi_lock, ipi_hash_lock follows the individual inpcb
connection locks, so may be acquired while manipulating a connection on
which a lock is already held, avoiding the need to acquire the inpcbinfo
lock preemptively when a binding change might later be required. As a
result, however, lookup operations necessarily go through a reference
acquire while holding the lookup lock, later acquiring an inpcb lock --
if required.
A new function in_pcblookup() looks up connections, and accepts flags
indicating how to return the inpcb. Due to lock order changes, callers
no longer need acquire locks before performing a lookup: the lookup
routine will acquire the ipi_hash_lock as needed. In the future, it will
also be able to use alternative lookup and locking strategies
transparently to callers, such as pcbgroup lookup. New lookup flags are,
supplementing the existing INPLOOKUP_WILDCARD flag:
INPLOOKUP_RLOCKPCB - Acquire a read lock on the returned inpcb
INPLOOKUP_WLOCKPCB - Acquire a write lock on the returned inpcb
Callers must pass exactly one of these flags (for the time being).
Some notes:
- All protocols are updated to work within the new regime; especially,
TCP, UDPv4, and UDPv6. pcbinfo ipi_lock acquisitions are largely
eliminated, and global hash lock hold times are dramatically reduced
compared to previous locking.
- The TCP syncache still relies on the pcbinfo lock, something that we
may want to revisit.
- Support for reverting to the FreeBSD 7.x locking strategy in TCP input
is no longer available -- hash lookup locks are now held only very
briefly during inpcb lookup, rather than for potentially extended
periods. However, the pcbinfo ipi_lock will still be acquired if a
connection state might change such that a connection is added or
removed.
- Raw IP sockets continue to use the pcbinfo ipi_lock for protection,
due to maintaining their own hash tables.
- The interface in6_pcblookup_hash_locked() is maintained, which allows
callers to acquire hash locks and perform one or more lookups atomically
with 4-tuple allocation: this is required only for TCPv6, as there is no
in6_pcbconnect_setup(), which there should be.
- UDPv6 locking remains significantly more conservative than UDPv4
locking, which relates to source address selection. This needs
attention, as it likely significantly reduces parallelism in this code
for multithreaded socket use (such as in BIND).
- In the UDPv4 and UDPv6 multicast cases, we need to revisit locking
somewhat, as they relied on ipi_lock to stablise 4-tuple matches, which
is no longer sufficient. A second check once the inpcb lock is held
should do the trick, keeping the general case from requiring the inpcb
lock for every inpcb visited.
- This work reminds us that we need to revisit locking of the v4/v6 flags,
which may be accessed lock-free both before and after this change.
- Right now, a single lock name is used for the pcbhash lock -- this is
undesirable, and probably another argument is required to take care of
this (or a char array name field in the pcbinfo?).
This is not an MFC candidate for 8.x due to its impact on lookup and
locking semantics. It's possible some of these issues could be worked
around with compatibility wrappers, if necessary.
Reviewed by: bz
Sponsored by: Juniper Networks, Inc.
2011-05-30 09:43:55 +00:00
|
|
|
INP_RLOCK_ASSERT(inp);
|
2008-05-29 08:27:14 +00:00
|
|
|
if (inp->inp_socket == NULL)
|
|
|
|
error = ENOENT;
|
|
|
|
if (error == 0)
|
|
|
|
error = cr_canseesocket(req->td->td_ucred,
|
|
|
|
inp->inp_socket);
|
|
|
|
if (error == 0)
|
2008-10-04 15:06:34 +00:00
|
|
|
cru2x(inp->inp_cred, &xuc);
|
2008-05-29 08:27:14 +00:00
|
|
|
INP_RUNLOCK(inp);
|
Decompose the current single inpcbinfo lock into two locks:
- The existing ipi_lock continues to protect the global inpcb list and
inpcb counter. This lock is now relegated to a small number of
allocation and free operations, and occasional operations that walk
all connections (including, awkwardly, certain UDP multicast receive
operations -- something to revisit).
- A new ipi_hash_lock protects the two inpcbinfo hash tables for
looking up connections and bound sockets, manipulated using new
INP_HASH_*() macros. This lock, combined with inpcb locks, protects
the 4-tuple address space.
Unlike the current ipi_lock, ipi_hash_lock follows the individual inpcb
connection locks, so may be acquired while manipulating a connection on
which a lock is already held, avoiding the need to acquire the inpcbinfo
lock preemptively when a binding change might later be required. As a
result, however, lookup operations necessarily go through a reference
acquire while holding the lookup lock, later acquiring an inpcb lock --
if required.
A new function in_pcblookup() looks up connections, and accepts flags
indicating how to return the inpcb. Due to lock order changes, callers
no longer need acquire locks before performing a lookup: the lookup
routine will acquire the ipi_hash_lock as needed. In the future, it will
also be able to use alternative lookup and locking strategies
transparently to callers, such as pcbgroup lookup. New lookup flags are,
supplementing the existing INPLOOKUP_WILDCARD flag:
INPLOOKUP_RLOCKPCB - Acquire a read lock on the returned inpcb
INPLOOKUP_WLOCKPCB - Acquire a write lock on the returned inpcb
Callers must pass exactly one of these flags (for the time being).
Some notes:
- All protocols are updated to work within the new regime; especially,
TCP, UDPv4, and UDPv6. pcbinfo ipi_lock acquisitions are largely
eliminated, and global hash lock hold times are dramatically reduced
compared to previous locking.
- The TCP syncache still relies on the pcbinfo lock, something that we
may want to revisit.
- Support for reverting to the FreeBSD 7.x locking strategy in TCP input
is no longer available -- hash lookup locks are now held only very
briefly during inpcb lookup, rather than for potentially extended
periods. However, the pcbinfo ipi_lock will still be acquired if a
connection state might change such that a connection is added or
removed.
- Raw IP sockets continue to use the pcbinfo ipi_lock for protection,
due to maintaining their own hash tables.
- The interface in6_pcblookup_hash_locked() is maintained, which allows
callers to acquire hash locks and perform one or more lookups atomically
with 4-tuple allocation: this is required only for TCPv6, as there is no
in6_pcbconnect_setup(), which there should be.
- UDPv6 locking remains significantly more conservative than UDPv4
locking, which relates to source address selection. This needs
attention, as it likely significantly reduces parallelism in this code
for multithreaded socket use (such as in BIND).
- In the UDPv4 and UDPv6 multicast cases, we need to revisit locking
somewhat, as they relied on ipi_lock to stablise 4-tuple matches, which
is no longer sufficient. A second check once the inpcb lock is held
should do the trick, keeping the general case from requiring the inpcb
lock for every inpcb visited.
- This work reminds us that we need to revisit locking of the v4/v6 flags,
which may be accessed lock-free both before and after this change.
- Right now, a single lock name is used for the pcbhash lock -- this is
undesirable, and probably another argument is required to take care of
this (or a char array name field in the pcbinfo?).
This is not an MFC candidate for 8.x due to its impact on lookup and
locking semantics. It's possible some of these issues could be worked
around with compatibility wrappers, if necessary.
Reviewed by: bz
Sponsored by: Juniper Networks, Inc.
2011-05-30 09:43:55 +00:00
|
|
|
} else
|
2007-07-27 08:25:02 +00:00
|
|
|
error = ENOENT;
|
|
|
|
if (error == 0)
|
|
|
|
error = SYSCTL_OUT(req, &xuc, sizeof(struct xucred));
|
1999-12-07 17:39:16 +00:00
|
|
|
return (error);
|
|
|
|
}
|
|
|
|
|
2007-07-09 17:47:04 +00:00
|
|
|
SYSCTL_PROC(_net_inet6_udp6, OID_AUTO, getcred, CTLTYPE_OPAQUE|CTLFLAG_RW, 0,
|
|
|
|
0, udp6_getcred, "S,xucred", "Get the xucred of a UDP6 connection");
|
1999-12-07 17:39:16 +00:00
|
|
|
|
2007-07-23 07:58:58 +00:00
|
|
|
static int
|
2007-09-08 08:18:24 +00:00
|
|
|
udp6_output(struct inpcb *inp, struct mbuf *m, struct sockaddr *addr6,
|
2007-07-23 07:58:58 +00:00
|
|
|
struct mbuf *control, struct thread *td)
|
|
|
|
{
|
|
|
|
u_int32_t ulen = m->m_pkthdr.len;
|
|
|
|
u_int32_t plen = sizeof(struct udphdr) + ulen;
|
|
|
|
struct ip6_hdr *ip6;
|
|
|
|
struct udphdr *udp6;
|
2009-06-23 22:08:55 +00:00
|
|
|
struct in6_addr *laddr, *faddr, in6a;
|
2007-07-23 07:58:58 +00:00
|
|
|
struct sockaddr_in6 *sin6 = NULL;
|
|
|
|
struct ifnet *oifp = NULL;
|
|
|
|
int scope_ambiguous = 0;
|
|
|
|
u_short fport;
|
|
|
|
int error = 0;
|
|
|
|
struct ip6_pktopts *optp, opt;
|
|
|
|
int af = AF_INET6, hlen = sizeof(struct ip6_hdr);
|
|
|
|
int flags;
|
|
|
|
struct sockaddr_in6 tmp;
|
|
|
|
|
2008-04-17 21:38:18 +00:00
|
|
|
INP_WLOCK_ASSERT(inp);
|
Decompose the current single inpcbinfo lock into two locks:
- The existing ipi_lock continues to protect the global inpcb list and
inpcb counter. This lock is now relegated to a small number of
allocation and free operations, and occasional operations that walk
all connections (including, awkwardly, certain UDP multicast receive
operations -- something to revisit).
- A new ipi_hash_lock protects the two inpcbinfo hash tables for
looking up connections and bound sockets, manipulated using new
INP_HASH_*() macros. This lock, combined with inpcb locks, protects
the 4-tuple address space.
Unlike the current ipi_lock, ipi_hash_lock follows the individual inpcb
connection locks, so may be acquired while manipulating a connection on
which a lock is already held, avoiding the need to acquire the inpcbinfo
lock preemptively when a binding change might later be required. As a
result, however, lookup operations necessarily go through a reference
acquire while holding the lookup lock, later acquiring an inpcb lock --
if required.
A new function in_pcblookup() looks up connections, and accepts flags
indicating how to return the inpcb. Due to lock order changes, callers
no longer need acquire locks before performing a lookup: the lookup
routine will acquire the ipi_hash_lock as needed. In the future, it will
also be able to use alternative lookup and locking strategies
transparently to callers, such as pcbgroup lookup. New lookup flags are,
supplementing the existing INPLOOKUP_WILDCARD flag:
INPLOOKUP_RLOCKPCB - Acquire a read lock on the returned inpcb
INPLOOKUP_WLOCKPCB - Acquire a write lock on the returned inpcb
Callers must pass exactly one of these flags (for the time being).
Some notes:
- All protocols are updated to work within the new regime; especially,
TCP, UDPv4, and UDPv6. pcbinfo ipi_lock acquisitions are largely
eliminated, and global hash lock hold times are dramatically reduced
compared to previous locking.
- The TCP syncache still relies on the pcbinfo lock, something that we
may want to revisit.
- Support for reverting to the FreeBSD 7.x locking strategy in TCP input
is no longer available -- hash lookup locks are now held only very
briefly during inpcb lookup, rather than for potentially extended
periods. However, the pcbinfo ipi_lock will still be acquired if a
connection state might change such that a connection is added or
removed.
- Raw IP sockets continue to use the pcbinfo ipi_lock for protection,
due to maintaining their own hash tables.
- The interface in6_pcblookup_hash_locked() is maintained, which allows
callers to acquire hash locks and perform one or more lookups atomically
with 4-tuple allocation: this is required only for TCPv6, as there is no
in6_pcbconnect_setup(), which there should be.
- UDPv6 locking remains significantly more conservative than UDPv4
locking, which relates to source address selection. This needs
attention, as it likely significantly reduces parallelism in this code
for multithreaded socket use (such as in BIND).
- In the UDPv4 and UDPv6 multicast cases, we need to revisit locking
somewhat, as they relied on ipi_lock to stablise 4-tuple matches, which
is no longer sufficient. A second check once the inpcb lock is held
should do the trick, keeping the general case from requiring the inpcb
lock for every inpcb visited.
- This work reminds us that we need to revisit locking of the v4/v6 flags,
which may be accessed lock-free both before and after this change.
- Right now, a single lock name is used for the pcbhash lock -- this is
undesirable, and probably another argument is required to take care of
this (or a char array name field in the pcbinfo?).
This is not an MFC candidate for 8.x due to its impact on lookup and
locking semantics. It's possible some of these issues could be worked
around with compatibility wrappers, if necessary.
Reviewed by: bz
Sponsored by: Juniper Networks, Inc.
2011-05-30 09:43:55 +00:00
|
|
|
INP_HASH_WLOCK_ASSERT(inp->inp_pcbinfo);
|
2007-07-23 07:58:58 +00:00
|
|
|
|
|
|
|
if (addr6) {
|
|
|
|
/* addr6 has been validated in udp6_send(). */
|
|
|
|
sin6 = (struct sockaddr_in6 *)addr6;
|
|
|
|
|
|
|
|
/* protect *sin6 from overwrites */
|
|
|
|
tmp = *sin6;
|
|
|
|
sin6 = &tmp;
|
|
|
|
|
|
|
|
/*
|
|
|
|
* Application should provide a proper zone ID or the use of
|
|
|
|
* default zone IDs should be enabled. Unfortunately, some
|
|
|
|
* applications do not behave as it should, so we need a
|
|
|
|
* workaround. Even if an appropriate ID is not determined,
|
|
|
|
* we'll see if we can determine the outgoing interface. If we
|
|
|
|
* can, determine the zone ID based on the interface below.
|
|
|
|
*/
|
Commit step 1 of the vimage project, (network stack)
virtualization work done by Marko Zec (zec@).
This is the first in a series of commits over the course
of the next few weeks.
Mark all uses of global variables to be virtualized
with a V_ prefix.
Use macros to map them back to their global names for
now, so this is a NOP change only.
We hope to have caught at least 85-90% of what is needed
so we do not invalidate a lot of outstanding patches again.
Obtained from: //depot/projects/vimage-commit2/...
Reviewed by: brooks, des, ed, mav, julian,
jamie, kris, rwatson, zec, ...
(various people I forgot, different versions)
md5 (with a bit of help)
Sponsored by: NLnet Foundation, The FreeBSD Foundation
X-MFC after: never
V_Commit_Message_Reviewed_By: more people than the patch
2008-08-17 23:27:27 +00:00
|
|
|
if (sin6->sin6_scope_id == 0 && !V_ip6_use_defzone)
|
2007-07-23 07:58:58 +00:00
|
|
|
scope_ambiguous = 1;
|
Commit step 1 of the vimage project, (network stack)
virtualization work done by Marko Zec (zec@).
This is the first in a series of commits over the course
of the next few weeks.
Mark all uses of global variables to be virtualized
with a V_ prefix.
Use macros to map them back to their global names for
now, so this is a NOP change only.
We hope to have caught at least 85-90% of what is needed
so we do not invalidate a lot of outstanding patches again.
Obtained from: //depot/projects/vimage-commit2/...
Reviewed by: brooks, des, ed, mav, julian,
jamie, kris, rwatson, zec, ...
(various people I forgot, different versions)
md5 (with a bit of help)
Sponsored by: NLnet Foundation, The FreeBSD Foundation
X-MFC after: never
V_Commit_Message_Reviewed_By: more people than the patch
2008-08-17 23:27:27 +00:00
|
|
|
if ((error = sa6_embedscope(sin6, V_ip6_use_defzone)) != 0)
|
2007-07-23 07:58:58 +00:00
|
|
|
return (error);
|
|
|
|
}
|
|
|
|
|
|
|
|
if (control) {
|
|
|
|
if ((error = ip6_setpktopts(control, &opt,
|
2008-01-24 08:25:59 +00:00
|
|
|
inp->in6p_outputopts, td->td_ucred, IPPROTO_UDP)) != 0)
|
2007-07-23 07:58:58 +00:00
|
|
|
goto release;
|
|
|
|
optp = &opt;
|
|
|
|
} else
|
2007-09-08 08:18:24 +00:00
|
|
|
optp = inp->in6p_outputopts;
|
2007-07-23 07:58:58 +00:00
|
|
|
|
|
|
|
if (sin6) {
|
|
|
|
faddr = &sin6->sin6_addr;
|
|
|
|
|
|
|
|
/*
|
|
|
|
* IPv4 version of udp_output calls in_pcbconnect in this case,
|
|
|
|
* which needs splnet and affects performance.
|
|
|
|
* Since we saw no essential reason for calling in_pcbconnect,
|
|
|
|
* we get rid of such kind of logic, and call in6_selectsrc
|
|
|
|
* and in6_pcbsetport in order to fill in the local address
|
|
|
|
* and the local port.
|
|
|
|
*/
|
|
|
|
if (sin6->sin6_port == 0) {
|
|
|
|
error = EADDRNOTAVAIL;
|
|
|
|
goto release;
|
|
|
|
}
|
|
|
|
|
2007-09-08 08:18:24 +00:00
|
|
|
if (!IN6_IS_ADDR_UNSPECIFIED(&inp->in6p_faddr)) {
|
2007-07-23 07:58:58 +00:00
|
|
|
/* how about ::ffff:0.0.0.0 case? */
|
|
|
|
error = EISCONN;
|
|
|
|
goto release;
|
|
|
|
}
|
|
|
|
|
|
|
|
fport = sin6->sin6_port; /* allow 0 port */
|
|
|
|
|
|
|
|
if (IN6_IS_ADDR_V4MAPPED(faddr)) {
|
Another step assimilating IPv[46] PCB code - directly use
the inpcb names rather than the following IPv6 compat macros:
in6pcb,in6p_sp, in6p_ip6_nxt,in6p_flowinfo,in6p_vflag,
in6p_flags,in6p_socket,in6p_lport,in6p_fport,in6p_ppcb and
sotoin6pcb().
Apart from removing duplicate code in netipsec, this is a pure
whitespace, not a functional change.
Discussed with: rwatson
Reviewed by: rwatson (version before review requested changes)
MFC after: 4 weeks (set the timer and see then)
2008-12-15 21:50:54 +00:00
|
|
|
if ((inp->inp_flags & IN6P_IPV6_V6ONLY)) {
|
2007-07-23 07:58:58 +00:00
|
|
|
/*
|
|
|
|
* I believe we should explicitly discard the
|
|
|
|
* packet when mapped addresses are disabled,
|
|
|
|
* rather than send the packet as an IPv6 one.
|
|
|
|
* If we chose the latter approach, the packet
|
|
|
|
* might be sent out on the wire based on the
|
|
|
|
* default route, the situation which we'd
|
|
|
|
* probably want to avoid.
|
|
|
|
* (20010421 jinmei@kame.net)
|
|
|
|
*/
|
|
|
|
error = EINVAL;
|
|
|
|
goto release;
|
|
|
|
}
|
2007-09-08 08:18:24 +00:00
|
|
|
if (!IN6_IS_ADDR_UNSPECIFIED(&inp->in6p_laddr) &&
|
|
|
|
!IN6_IS_ADDR_V4MAPPED(&inp->in6p_laddr)) {
|
2007-07-23 07:58:58 +00:00
|
|
|
/*
|
|
|
|
* when remote addr is an IPv4-mapped address,
|
|
|
|
* local addr should not be an IPv6 address,
|
|
|
|
* since you cannot determine how to map IPv6
|
|
|
|
* source address to IPv4.
|
|
|
|
*/
|
|
|
|
error = EINVAL;
|
|
|
|
goto release;
|
|
|
|
}
|
|
|
|
|
|
|
|
af = AF_INET;
|
|
|
|
}
|
|
|
|
|
|
|
|
if (!IN6_IS_ADDR_V4MAPPED(faddr)) {
|
2009-06-23 22:08:55 +00:00
|
|
|
error = in6_selectsrc(sin6, optp, inp, NULL,
|
|
|
|
td->td_ucred, &oifp, &in6a);
|
|
|
|
if (error)
|
|
|
|
goto release;
|
2007-07-23 07:58:58 +00:00
|
|
|
if (oifp && scope_ambiguous &&
|
|
|
|
(error = in6_setscope(&sin6->sin6_addr,
|
|
|
|
oifp, NULL))) {
|
|
|
|
goto release;
|
|
|
|
}
|
2009-06-23 22:08:55 +00:00
|
|
|
laddr = &in6a;
|
2007-07-23 07:58:58 +00:00
|
|
|
} else
|
2007-09-08 08:18:24 +00:00
|
|
|
laddr = &inp->in6p_laddr; /* XXX */
|
2007-07-23 07:58:58 +00:00
|
|
|
if (laddr == NULL) {
|
|
|
|
if (error == 0)
|
|
|
|
error = EADDRNOTAVAIL;
|
|
|
|
goto release;
|
|
|
|
}
|
Another step assimilating IPv[46] PCB code - directly use
the inpcb names rather than the following IPv6 compat macros:
in6pcb,in6p_sp, in6p_ip6_nxt,in6p_flowinfo,in6p_vflag,
in6p_flags,in6p_socket,in6p_lport,in6p_fport,in6p_ppcb and
sotoin6pcb().
Apart from removing duplicate code in netipsec, this is a pure
whitespace, not a functional change.
Discussed with: rwatson
Reviewed by: rwatson (version before review requested changes)
MFC after: 4 weeks (set the timer and see then)
2008-12-15 21:50:54 +00:00
|
|
|
if (inp->inp_lport == 0 &&
|
2011-03-12 16:45:15 +00:00
|
|
|
(error = in6_pcbsetport(laddr, inp, td->td_ucred)) != 0) {
|
|
|
|
/* Undo an address bind that may have occurred. */
|
|
|
|
inp->in6p_laddr = in6addr_any;
|
2007-07-23 07:58:58 +00:00
|
|
|
goto release;
|
2011-03-12 16:45:15 +00:00
|
|
|
}
|
2007-07-23 07:58:58 +00:00
|
|
|
} else {
|
2007-09-08 08:18:24 +00:00
|
|
|
if (IN6_IS_ADDR_UNSPECIFIED(&inp->in6p_faddr)) {
|
2007-07-23 07:58:58 +00:00
|
|
|
error = ENOTCONN;
|
|
|
|
goto release;
|
|
|
|
}
|
2007-09-08 08:18:24 +00:00
|
|
|
if (IN6_IS_ADDR_V4MAPPED(&inp->in6p_faddr)) {
|
Another step assimilating IPv[46] PCB code - directly use
the inpcb names rather than the following IPv6 compat macros:
in6pcb,in6p_sp, in6p_ip6_nxt,in6p_flowinfo,in6p_vflag,
in6p_flags,in6p_socket,in6p_lport,in6p_fport,in6p_ppcb and
sotoin6pcb().
Apart from removing duplicate code in netipsec, this is a pure
whitespace, not a functional change.
Discussed with: rwatson
Reviewed by: rwatson (version before review requested changes)
MFC after: 4 weeks (set the timer and see then)
2008-12-15 21:50:54 +00:00
|
|
|
if ((inp->inp_flags & IN6P_IPV6_V6ONLY)) {
|
2007-07-23 07:58:58 +00:00
|
|
|
/*
|
|
|
|
* XXX: this case would happen when the
|
|
|
|
* application sets the V6ONLY flag after
|
|
|
|
* connecting the foreign address.
|
|
|
|
* Such applications should be fixed,
|
|
|
|
* so we bark here.
|
|
|
|
*/
|
|
|
|
log(LOG_INFO, "udp6_output: IPV6_V6ONLY "
|
|
|
|
"option was set for a connected socket\n");
|
|
|
|
error = EINVAL;
|
|
|
|
goto release;
|
|
|
|
} else
|
|
|
|
af = AF_INET;
|
|
|
|
}
|
2007-09-08 08:18:24 +00:00
|
|
|
laddr = &inp->in6p_laddr;
|
|
|
|
faddr = &inp->in6p_faddr;
|
Another step assimilating IPv[46] PCB code - directly use
the inpcb names rather than the following IPv6 compat macros:
in6pcb,in6p_sp, in6p_ip6_nxt,in6p_flowinfo,in6p_vflag,
in6p_flags,in6p_socket,in6p_lport,in6p_fport,in6p_ppcb and
sotoin6pcb().
Apart from removing duplicate code in netipsec, this is a pure
whitespace, not a functional change.
Discussed with: rwatson
Reviewed by: rwatson (version before review requested changes)
MFC after: 4 weeks (set the timer and see then)
2008-12-15 21:50:54 +00:00
|
|
|
fport = inp->inp_fport;
|
2007-07-23 07:58:58 +00:00
|
|
|
}
|
|
|
|
|
|
|
|
if (af == AF_INET)
|
|
|
|
hlen = sizeof(struct ip);
|
|
|
|
|
|
|
|
/*
|
|
|
|
* Calculate data length and get a mbuf
|
|
|
|
* for UDP and IP6 headers.
|
|
|
|
*/
|
|
|
|
M_PREPEND(m, hlen + sizeof(struct udphdr), M_DONTWAIT);
|
|
|
|
if (m == 0) {
|
|
|
|
error = ENOBUFS;
|
|
|
|
goto release;
|
|
|
|
}
|
|
|
|
|
|
|
|
/*
|
|
|
|
* Stuff checksum and output datagram.
|
|
|
|
*/
|
|
|
|
udp6 = (struct udphdr *)(mtod(m, caddr_t) + hlen);
|
Another step assimilating IPv[46] PCB code - directly use
the inpcb names rather than the following IPv6 compat macros:
in6pcb,in6p_sp, in6p_ip6_nxt,in6p_flowinfo,in6p_vflag,
in6p_flags,in6p_socket,in6p_lport,in6p_fport,in6p_ppcb and
sotoin6pcb().
Apart from removing duplicate code in netipsec, this is a pure
whitespace, not a functional change.
Discussed with: rwatson
Reviewed by: rwatson (version before review requested changes)
MFC after: 4 weeks (set the timer and see then)
2008-12-15 21:50:54 +00:00
|
|
|
udp6->uh_sport = inp->inp_lport; /* lport is always set in the PCB */
|
2007-07-23 07:58:58 +00:00
|
|
|
udp6->uh_dport = fport;
|
|
|
|
if (plen <= 0xffff)
|
|
|
|
udp6->uh_ulen = htons((u_short)plen);
|
|
|
|
else
|
|
|
|
udp6->uh_ulen = 0;
|
|
|
|
udp6->uh_sum = 0;
|
|
|
|
|
|
|
|
switch (af) {
|
|
|
|
case AF_INET6:
|
|
|
|
ip6 = mtod(m, struct ip6_hdr *);
|
Another step assimilating IPv[46] PCB code - directly use
the inpcb names rather than the following IPv6 compat macros:
in6pcb,in6p_sp, in6p_ip6_nxt,in6p_flowinfo,in6p_vflag,
in6p_flags,in6p_socket,in6p_lport,in6p_fport,in6p_ppcb and
sotoin6pcb().
Apart from removing duplicate code in netipsec, this is a pure
whitespace, not a functional change.
Discussed with: rwatson
Reviewed by: rwatson (version before review requested changes)
MFC after: 4 weeks (set the timer and see then)
2008-12-15 21:50:54 +00:00
|
|
|
ip6->ip6_flow = inp->inp_flow & IPV6_FLOWINFO_MASK;
|
2007-07-23 07:58:58 +00:00
|
|
|
ip6->ip6_vfc &= ~IPV6_VERSION_MASK;
|
|
|
|
ip6->ip6_vfc |= IPV6_VERSION;
|
|
|
|
#if 0 /* ip6_plen will be filled in ip6_output. */
|
|
|
|
ip6->ip6_plen = htons((u_short)plen);
|
|
|
|
#endif
|
|
|
|
ip6->ip6_nxt = IPPROTO_UDP;
|
2007-09-08 08:18:24 +00:00
|
|
|
ip6->ip6_hlim = in6_selecthlim(inp, NULL);
|
2007-07-23 07:58:58 +00:00
|
|
|
ip6->ip6_src = *laddr;
|
|
|
|
ip6->ip6_dst = *faddr;
|
|
|
|
|
2012-05-25 02:19:17 +00:00
|
|
|
udp6->uh_sum = in6_cksum_pseudo(ip6, plen, IPPROTO_UDP, 0);
|
It turns out that too many drivers are not only parsing the L2/3/4
headers for TSO but also for generic checksum offloading. Ideally we
would only have one common function shared amongst all drivers, and
perhaps when updating them for IPv6 we should introduce that.
Eventually we should provide the meta information along with mbufs to
avoid (re-)parsing entirely.
To not break IPv6 (checksums and offload) and to be able to MFC the
changes without risking to hurt 3rd party drivers, duplicate the v4
framework, as other OSes have done as well.
Introduce interface capability flags for TX/RX checksum offload with
IPv6, to allow independent toggling (where possible). Add CSUM_*_IPV6
flags for UDP/TCP over IPv6, and reserve further for SCTP, and IPv6
fragmentation. Define CSUM_DELAY_DATA_IPV6 as we do for legacy IP and
add an alias for CSUM_DATA_VALID_IPV6.
This pretty much brings IPv6 handling in line with IPv4.
TSO is still handled in a different way and not via if_hwassist.
Update ifconfig to allow (un)setting of the new capability flags.
Update loopback to announce the new capabilities and if_hwassist flags.
Individual driver updates will have to follow, as will SCTP.
Reported by: gallatin, dim, ..
Reviewed by: gallatin (glanced at?)
MFC after: 3 days
X-MFC with: r235961,235959,235958
2012-05-28 09:30:13 +00:00
|
|
|
m->m_pkthdr.csum_flags = CSUM_UDP_IPV6;
|
2012-05-25 02:19:17 +00:00
|
|
|
m->m_pkthdr.csum_data = offsetof(struct udphdr, uh_sum);
|
2007-07-23 07:58:58 +00:00
|
|
|
|
|
|
|
flags = 0;
|
|
|
|
|
2009-04-12 11:53:12 +00:00
|
|
|
UDPSTAT_INC(udps_opackets);
|
2007-09-08 08:18:24 +00:00
|
|
|
error = ip6_output(m, optp, NULL, flags, inp->in6p_moptions,
|
|
|
|
NULL, inp);
|
2007-07-23 07:58:58 +00:00
|
|
|
break;
|
|
|
|
case AF_INET:
|
|
|
|
error = EAFNOSUPPORT;
|
|
|
|
goto release;
|
|
|
|
}
|
|
|
|
goto releaseopt;
|
|
|
|
|
|
|
|
release:
|
|
|
|
m_freem(m);
|
|
|
|
|
|
|
|
releaseopt:
|
|
|
|
if (control) {
|
|
|
|
ip6_clearpktopts(&opt, -1);
|
|
|
|
m_freem(control);
|
|
|
|
}
|
|
|
|
return (error);
|
|
|
|
}
|
|
|
|
|
2006-04-01 15:15:05 +00:00
|
|
|
static void
|
1999-12-07 17:39:16 +00:00
|
|
|
udp6_abort(struct socket *so)
|
|
|
|
{
|
|
|
|
struct inpcb *inp;
|
|
|
|
|
|
|
|
inp = sotoinpcb(so);
|
Update in_pcb-derived basic socket types following changes to
pru_abort(), pru_detach(), and in_pcbdetach():
- Universally support and enforce the invariant that so_pcb is
never NULL, converting dozens of unnecessary NULL checks into
assertions, and eliminating dozens of unnecessary error handling
cases in protocol code.
- In some cases, eliminate unnecessary pcbinfo locking, as it is no
longer required to ensure so_pcb != NULL. For example, in protocol
shutdown methods, and in raw IP send.
- Abort and detach protocol switch methods no longer return failures,
nor attempt to free sockets, as the socket layer does this.
- Invoke in_pcbfree() after in_pcbdetach() in order to free the
detached in_pcb structure for a socket.
MFC after: 3 months
2006-04-01 16:20:54 +00:00
|
|
|
KASSERT(inp != NULL, ("udp6_abort: inp == NULL"));
|
2006-04-12 03:03:47 +00:00
|
|
|
|
2006-07-21 17:11:15 +00:00
|
|
|
#ifdef INET
|
|
|
|
if (inp->inp_vflag & INP_IPV4) {
|
|
|
|
struct pr_usrreqs *pru;
|
|
|
|
|
|
|
|
pru = inetsw[ip_protox[IPPROTO_UDP]].pr_usrreqs;
|
|
|
|
(*pru->pru_abort)(so);
|
|
|
|
return;
|
|
|
|
}
|
|
|
|
#endif
|
|
|
|
|
2008-04-17 21:38:18 +00:00
|
|
|
INP_WLOCK(inp);
|
2006-07-21 17:11:15 +00:00
|
|
|
if (!IN6_IS_ADDR_UNSPECIFIED(&inp->in6p_faddr)) {
|
Decompose the current single inpcbinfo lock into two locks:
- The existing ipi_lock continues to protect the global inpcb list and
inpcb counter. This lock is now relegated to a small number of
allocation and free operations, and occasional operations that walk
all connections (including, awkwardly, certain UDP multicast receive
operations -- something to revisit).
- A new ipi_hash_lock protects the two inpcbinfo hash tables for
looking up connections and bound sockets, manipulated using new
INP_HASH_*() macros. This lock, combined with inpcb locks, protects
the 4-tuple address space.
Unlike the current ipi_lock, ipi_hash_lock follows the individual inpcb
connection locks, so may be acquired while manipulating a connection on
which a lock is already held, avoiding the need to acquire the inpcbinfo
lock preemptively when a binding change might later be required. As a
result, however, lookup operations necessarily go through a reference
acquire while holding the lookup lock, later acquiring an inpcb lock --
if required.
A new function in_pcblookup() looks up connections, and accepts flags
indicating how to return the inpcb. Due to lock order changes, callers
no longer need acquire locks before performing a lookup: the lookup
routine will acquire the ipi_hash_lock as needed. In the future, it will
also be able to use alternative lookup and locking strategies
transparently to callers, such as pcbgroup lookup. New lookup flags are,
supplementing the existing INPLOOKUP_WILDCARD flag:
INPLOOKUP_RLOCKPCB - Acquire a read lock on the returned inpcb
INPLOOKUP_WLOCKPCB - Acquire a write lock on the returned inpcb
Callers must pass exactly one of these flags (for the time being).
Some notes:
- All protocols are updated to work within the new regime; especially,
TCP, UDPv4, and UDPv6. pcbinfo ipi_lock acquisitions are largely
eliminated, and global hash lock hold times are dramatically reduced
compared to previous locking.
- The TCP syncache still relies on the pcbinfo lock, something that we
may want to revisit.
- Support for reverting to the FreeBSD 7.x locking strategy in TCP input
is no longer available -- hash lookup locks are now held only very
briefly during inpcb lookup, rather than for potentially extended
periods. However, the pcbinfo ipi_lock will still be acquired if a
connection state might change such that a connection is added or
removed.
- Raw IP sockets continue to use the pcbinfo ipi_lock for protection,
due to maintaining their own hash tables.
- The interface in6_pcblookup_hash_locked() is maintained, which allows
callers to acquire hash locks and perform one or more lookups atomically
with 4-tuple allocation: this is required only for TCPv6, as there is no
in6_pcbconnect_setup(), which there should be.
- UDPv6 locking remains significantly more conservative than UDPv4
locking, which relates to source address selection. This needs
attention, as it likely significantly reduces parallelism in this code
for multithreaded socket use (such as in BIND).
- In the UDPv4 and UDPv6 multicast cases, we need to revisit locking
somewhat, as they relied on ipi_lock to stablise 4-tuple matches, which
is no longer sufficient. A second check once the inpcb lock is held
should do the trick, keeping the general case from requiring the inpcb
lock for every inpcb visited.
- This work reminds us that we need to revisit locking of the v4/v6 flags,
which may be accessed lock-free both before and after this change.
- Right now, a single lock name is used for the pcbhash lock -- this is
undesirable, and probably another argument is required to take care of
this (or a char array name field in the pcbinfo?).
This is not an MFC candidate for 8.x due to its impact on lookup and
locking semantics. It's possible some of these issues could be worked
around with compatibility wrappers, if necessary.
Reviewed by: bz
Sponsored by: Juniper Networks, Inc.
2011-05-30 09:43:55 +00:00
|
|
|
INP_HASH_WLOCK(&V_udbinfo);
|
2006-07-21 17:11:15 +00:00
|
|
|
in6_pcbdisconnect(inp);
|
|
|
|
inp->in6p_laddr = in6addr_any;
|
Decompose the current single inpcbinfo lock into two locks:
- The existing ipi_lock continues to protect the global inpcb list and
inpcb counter. This lock is now relegated to a small number of
allocation and free operations, and occasional operations that walk
all connections (including, awkwardly, certain UDP multicast receive
operations -- something to revisit).
- A new ipi_hash_lock protects the two inpcbinfo hash tables for
looking up connections and bound sockets, manipulated using new
INP_HASH_*() macros. This lock, combined with inpcb locks, protects
the 4-tuple address space.
Unlike the current ipi_lock, ipi_hash_lock follows the individual inpcb
connection locks, so may be acquired while manipulating a connection on
which a lock is already held, avoiding the need to acquire the inpcbinfo
lock preemptively when a binding change might later be required. As a
result, however, lookup operations necessarily go through a reference
acquire while holding the lookup lock, later acquiring an inpcb lock --
if required.
A new function in_pcblookup() looks up connections, and accepts flags
indicating how to return the inpcb. Due to lock order changes, callers
no longer need acquire locks before performing a lookup: the lookup
routine will acquire the ipi_hash_lock as needed. In the future, it will
also be able to use alternative lookup and locking strategies
transparently to callers, such as pcbgroup lookup. New lookup flags are,
supplementing the existing INPLOOKUP_WILDCARD flag:
INPLOOKUP_RLOCKPCB - Acquire a read lock on the returned inpcb
INPLOOKUP_WLOCKPCB - Acquire a write lock on the returned inpcb
Callers must pass exactly one of these flags (for the time being).
Some notes:
- All protocols are updated to work within the new regime; especially,
TCP, UDPv4, and UDPv6. pcbinfo ipi_lock acquisitions are largely
eliminated, and global hash lock hold times are dramatically reduced
compared to previous locking.
- The TCP syncache still relies on the pcbinfo lock, something that we
may want to revisit.
- Support for reverting to the FreeBSD 7.x locking strategy in TCP input
is no longer available -- hash lookup locks are now held only very
briefly during inpcb lookup, rather than for potentially extended
periods. However, the pcbinfo ipi_lock will still be acquired if a
connection state might change such that a connection is added or
removed.
- Raw IP sockets continue to use the pcbinfo ipi_lock for protection,
due to maintaining their own hash tables.
- The interface in6_pcblookup_hash_locked() is maintained, which allows
callers to acquire hash locks and perform one or more lookups atomically
with 4-tuple allocation: this is required only for TCPv6, as there is no
in6_pcbconnect_setup(), which there should be.
- UDPv6 locking remains significantly more conservative than UDPv4
locking, which relates to source address selection. This needs
attention, as it likely significantly reduces parallelism in this code
for multithreaded socket use (such as in BIND).
- In the UDPv4 and UDPv6 multicast cases, we need to revisit locking
somewhat, as they relied on ipi_lock to stablise 4-tuple matches, which
is no longer sufficient. A second check once the inpcb lock is held
should do the trick, keeping the general case from requiring the inpcb
lock for every inpcb visited.
- This work reminds us that we need to revisit locking of the v4/v6 flags,
which may be accessed lock-free both before and after this change.
- Right now, a single lock name is used for the pcbhash lock -- this is
undesirable, and probably another argument is required to take care of
this (or a char array name field in the pcbinfo?).
This is not an MFC candidate for 8.x due to its impact on lookup and
locking semantics. It's possible some of these issues could be worked
around with compatibility wrappers, if necessary.
Reviewed by: bz
Sponsored by: Juniper Networks, Inc.
2011-05-30 09:43:55 +00:00
|
|
|
INP_HASH_WUNLOCK(&V_udbinfo);
|
2006-07-21 17:11:15 +00:00
|
|
|
soisdisconnected(so);
|
|
|
|
}
|
2008-04-17 21:38:18 +00:00
|
|
|
INP_WUNLOCK(inp);
|
1999-12-07 17:39:16 +00:00
|
|
|
}
|
|
|
|
|
|
|
|
static int
|
2001-09-12 08:38:13 +00:00
|
|
|
udp6_attach(struct socket *so, int proto, struct thread *td)
|
1999-12-07 17:39:16 +00:00
|
|
|
{
|
|
|
|
struct inpcb *inp;
|
2006-04-12 03:03:47 +00:00
|
|
|
int error;
|
1999-12-07 17:39:16 +00:00
|
|
|
|
|
|
|
inp = sotoinpcb(so);
|
2006-04-09 16:33:41 +00:00
|
|
|
KASSERT(inp == NULL, ("udp6_attach: inp != NULL"));
|
2006-04-12 03:03:47 +00:00
|
|
|
|
1999-12-07 17:39:16 +00:00
|
|
|
if (so->so_snd.sb_hiwat == 0 || so->so_rcv.sb_hiwat == 0) {
|
|
|
|
error = soreserve(so, udp_sendspace, udp_recvspace);
|
Update in_pcb-derived basic socket types following changes to
pru_abort(), pru_detach(), and in_pcbdetach():
- Universally support and enforce the invariant that so_pcb is
never NULL, converting dozens of unnecessary NULL checks into
assertions, and eliminating dozens of unnecessary error handling
cases in protocol code.
- In some cases, eliminate unnecessary pcbinfo locking, as it is no
longer required to ensure so_pcb != NULL. For example, in protocol
shutdown methods, and in raw IP send.
- Abort and detach protocol switch methods no longer return failures,
nor attempt to free sockets, as the socket layer does this.
- Invoke in_pcbfree() after in_pcbdetach() in order to free the
detached in_pcb structure for a socket.
MFC after: 3 months
2006-04-01 16:20:54 +00:00
|
|
|
if (error)
|
2007-07-09 17:47:04 +00:00
|
|
|
return (error);
|
1999-12-07 17:39:16 +00:00
|
|
|
}
|
Commit step 1 of the vimage project, (network stack)
virtualization work done by Marko Zec (zec@).
This is the first in a series of commits over the course
of the next few weeks.
Mark all uses of global variables to be virtualized
with a V_ prefix.
Use macros to map them back to their global names for
now, so this is a NOP change only.
We hope to have caught at least 85-90% of what is needed
so we do not invalidate a lot of outstanding patches again.
Obtained from: //depot/projects/vimage-commit2/...
Reviewed by: brooks, des, ed, mav, julian,
jamie, kris, rwatson, zec, ...
(various people I forgot, different versions)
md5 (with a bit of help)
Sponsored by: NLnet Foundation, The FreeBSD Foundation
X-MFC after: never
V_Commit_Message_Reviewed_By: more people than the patch
2008-08-17 23:27:27 +00:00
|
|
|
INP_INFO_WLOCK(&V_udbinfo);
|
|
|
|
error = in_pcballoc(so, &V_udbinfo);
|
2004-07-27 23:44:03 +00:00
|
|
|
if (error) {
|
Commit step 1 of the vimage project, (network stack)
virtualization work done by Marko Zec (zec@).
This is the first in a series of commits over the course
of the next few weeks.
Mark all uses of global variables to be virtualized
with a V_ prefix.
Use macros to map them back to their global names for
now, so this is a NOP change only.
We hope to have caught at least 85-90% of what is needed
so we do not invalidate a lot of outstanding patches again.
Obtained from: //depot/projects/vimage-commit2/...
Reviewed by: brooks, des, ed, mav, julian,
jamie, kris, rwatson, zec, ...
(various people I forgot, different versions)
md5 (with a bit of help)
Sponsored by: NLnet Foundation, The FreeBSD Foundation
X-MFC after: never
V_Commit_Message_Reviewed_By: more people than the patch
2008-08-17 23:27:27 +00:00
|
|
|
INP_INFO_WUNLOCK(&V_udbinfo);
|
2007-07-09 17:47:04 +00:00
|
|
|
return (error);
|
2004-07-27 23:44:03 +00:00
|
|
|
}
|
1999-12-07 17:39:16 +00:00
|
|
|
inp = (struct inpcb *)so->so_pcb;
|
|
|
|
inp->inp_vflag |= INP_IPV6;
|
2006-02-02 11:46:05 +00:00
|
|
|
if ((inp->inp_flags & IN6P_IPV6_V6ONLY) == 0)
|
2002-07-15 19:25:46 +00:00
|
|
|
inp->inp_vflag |= INP_IPV4;
|
1999-12-07 17:39:16 +00:00
|
|
|
inp->in6p_hops = -1; /* use kernel default */
|
|
|
|
inp->in6p_cksum = -1; /* just to be sure */
|
2000-06-22 16:48:59 +00:00
|
|
|
/*
|
|
|
|
* XXX: ugly!!
|
|
|
|
* IPv4 TTL initialization is necessary for an IPv6 socket as well,
|
|
|
|
* because the socket may be bound to an IPv6 wildcard address,
|
|
|
|
* which may match an IPv4-mapped IPv6 address.
|
|
|
|
*/
|
Commit step 1 of the vimage project, (network stack)
virtualization work done by Marko Zec (zec@).
This is the first in a series of commits over the course
of the next few weeks.
Mark all uses of global variables to be virtualized
with a V_ prefix.
Use macros to map them back to their global names for
now, so this is a NOP change only.
We hope to have caught at least 85-90% of what is needed
so we do not invalidate a lot of outstanding patches again.
Obtained from: //depot/projects/vimage-commit2/...
Reviewed by: brooks, des, ed, mav, julian,
jamie, kris, rwatson, zec, ...
(various people I forgot, different versions)
md5 (with a bit of help)
Sponsored by: NLnet Foundation, The FreeBSD Foundation
X-MFC after: never
V_Commit_Message_Reviewed_By: more people than the patch
2008-08-17 23:27:27 +00:00
|
|
|
inp->inp_ip_ttl = V_ip_defttl;
|
2009-05-23 16:51:13 +00:00
|
|
|
|
|
|
|
error = udp_newudpcb(inp);
|
|
|
|
if (error) {
|
|
|
|
in_pcbdetach(inp);
|
|
|
|
in_pcbfree(inp);
|
|
|
|
INP_INFO_WUNLOCK(&V_udbinfo);
|
|
|
|
return (error);
|
|
|
|
}
|
2008-04-17 21:38:18 +00:00
|
|
|
INP_WUNLOCK(inp);
|
2009-05-23 16:51:13 +00:00
|
|
|
INP_INFO_WUNLOCK(&V_udbinfo);
|
2007-07-09 17:47:04 +00:00
|
|
|
return (0);
|
1999-12-07 17:39:16 +00:00
|
|
|
}
|
|
|
|
|
|
|
|
static int
|
2001-09-12 08:38:13 +00:00
|
|
|
udp6_bind(struct socket *so, struct sockaddr *nam, struct thread *td)
|
1999-12-07 17:39:16 +00:00
|
|
|
{
|
|
|
|
struct inpcb *inp;
|
2006-04-12 03:03:47 +00:00
|
|
|
int error;
|
1999-12-07 17:39:16 +00:00
|
|
|
|
|
|
|
inp = sotoinpcb(so);
|
Update in_pcb-derived basic socket types following changes to
pru_abort(), pru_detach(), and in_pcbdetach():
- Universally support and enforce the invariant that so_pcb is
never NULL, converting dozens of unnecessary NULL checks into
assertions, and eliminating dozens of unnecessary error handling
cases in protocol code.
- In some cases, eliminate unnecessary pcbinfo locking, as it is no
longer required to ensure so_pcb != NULL. For example, in protocol
shutdown methods, and in raw IP send.
- Abort and detach protocol switch methods no longer return failures,
nor attempt to free sockets, as the socket layer does this.
- Invoke in_pcbfree() after in_pcbdetach() in order to free the
detached in_pcb structure for a socket.
MFC after: 3 months
2006-04-01 16:20:54 +00:00
|
|
|
KASSERT(inp != NULL, ("udp6_bind: inp == NULL"));
|
2006-04-12 03:03:47 +00:00
|
|
|
|
2008-04-17 21:38:18 +00:00
|
|
|
INP_WLOCK(inp);
|
Decompose the current single inpcbinfo lock into two locks:
- The existing ipi_lock continues to protect the global inpcb list and
inpcb counter. This lock is now relegated to a small number of
allocation and free operations, and occasional operations that walk
all connections (including, awkwardly, certain UDP multicast receive
operations -- something to revisit).
- A new ipi_hash_lock protects the two inpcbinfo hash tables for
looking up connections and bound sockets, manipulated using new
INP_HASH_*() macros. This lock, combined with inpcb locks, protects
the 4-tuple address space.
Unlike the current ipi_lock, ipi_hash_lock follows the individual inpcb
connection locks, so may be acquired while manipulating a connection on
which a lock is already held, avoiding the need to acquire the inpcbinfo
lock preemptively when a binding change might later be required. As a
result, however, lookup operations necessarily go through a reference
acquire while holding the lookup lock, later acquiring an inpcb lock --
if required.
A new function in_pcblookup() looks up connections, and accepts flags
indicating how to return the inpcb. Due to lock order changes, callers
no longer need acquire locks before performing a lookup: the lookup
routine will acquire the ipi_hash_lock as needed. In the future, it will
also be able to use alternative lookup and locking strategies
transparently to callers, such as pcbgroup lookup. New lookup flags are,
supplementing the existing INPLOOKUP_WILDCARD flag:
INPLOOKUP_RLOCKPCB - Acquire a read lock on the returned inpcb
INPLOOKUP_WLOCKPCB - Acquire a write lock on the returned inpcb
Callers must pass exactly one of these flags (for the time being).
Some notes:
- All protocols are updated to work within the new regime; especially,
TCP, UDPv4, and UDPv6. pcbinfo ipi_lock acquisitions are largely
eliminated, and global hash lock hold times are dramatically reduced
compared to previous locking.
- The TCP syncache still relies on the pcbinfo lock, something that we
may want to revisit.
- Support for reverting to the FreeBSD 7.x locking strategy in TCP input
is no longer available -- hash lookup locks are now held only very
briefly during inpcb lookup, rather than for potentially extended
periods. However, the pcbinfo ipi_lock will still be acquired if a
connection state might change such that a connection is added or
removed.
- Raw IP sockets continue to use the pcbinfo ipi_lock for protection,
due to maintaining their own hash tables.
- The interface in6_pcblookup_hash_locked() is maintained, which allows
callers to acquire hash locks and perform one or more lookups atomically
with 4-tuple allocation: this is required only for TCPv6, as there is no
in6_pcbconnect_setup(), which there should be.
- UDPv6 locking remains significantly more conservative than UDPv4
locking, which relates to source address selection. This needs
attention, as it likely significantly reduces parallelism in this code
for multithreaded socket use (such as in BIND).
- In the UDPv4 and UDPv6 multicast cases, we need to revisit locking
somewhat, as they relied on ipi_lock to stablise 4-tuple matches, which
is no longer sufficient. A second check once the inpcb lock is held
should do the trick, keeping the general case from requiring the inpcb
lock for every inpcb visited.
- This work reminds us that we need to revisit locking of the v4/v6 flags,
which may be accessed lock-free both before and after this change.
- Right now, a single lock name is used for the pcbhash lock -- this is
undesirable, and probably another argument is required to take care of
this (or a char array name field in the pcbinfo?).
This is not an MFC candidate for 8.x due to its impact on lookup and
locking semantics. It's possible some of these issues could be worked
around with compatibility wrappers, if necessary.
Reviewed by: bz
Sponsored by: Juniper Networks, Inc.
2011-05-30 09:43:55 +00:00
|
|
|
INP_HASH_WLOCK(&V_udbinfo);
|
1999-12-07 17:39:16 +00:00
|
|
|
inp->inp_vflag &= ~INP_IPV4;
|
|
|
|
inp->inp_vflag |= INP_IPV6;
|
2001-06-11 12:39:29 +00:00
|
|
|
if ((inp->inp_flags & IN6P_IPV6_V6ONLY) == 0) {
|
1999-12-07 17:39:16 +00:00
|
|
|
struct sockaddr_in6 *sin6_p;
|
|
|
|
|
|
|
|
sin6_p = (struct sockaddr_in6 *)nam;
|
|
|
|
|
|
|
|
if (IN6_IS_ADDR_UNSPECIFIED(&sin6_p->sin6_addr))
|
|
|
|
inp->inp_vflag |= INP_IPV4;
|
2011-04-30 11:17:00 +00:00
|
|
|
#ifdef INET
|
1999-12-07 17:39:16 +00:00
|
|
|
else if (IN6_IS_ADDR_V4MAPPED(&sin6_p->sin6_addr)) {
|
|
|
|
struct sockaddr_in sin;
|
|
|
|
|
|
|
|
in6_sin6_2_sin(&sin, sin6_p);
|
|
|
|
inp->inp_vflag |= INP_IPV4;
|
|
|
|
inp->inp_vflag &= ~INP_IPV6;
|
2004-03-27 21:05:46 +00:00
|
|
|
error = in_pcbbind(inp, (struct sockaddr *)&sin,
|
|
|
|
td->td_ucred);
|
2004-07-27 23:44:03 +00:00
|
|
|
goto out;
|
1999-12-07 17:39:16 +00:00
|
|
|
}
|
2011-04-30 11:17:00 +00:00
|
|
|
#endif
|
1999-12-07 17:39:16 +00:00
|
|
|
}
|
|
|
|
|
2004-03-27 21:05:46 +00:00
|
|
|
error = in6_pcbbind(inp, nam, td->td_ucred);
|
2011-04-30 11:17:00 +00:00
|
|
|
#ifdef INET
|
2004-07-27 23:44:03 +00:00
|
|
|
out:
|
2011-04-30 11:17:00 +00:00
|
|
|
#endif
|
Decompose the current single inpcbinfo lock into two locks:
- The existing ipi_lock continues to protect the global inpcb list and
inpcb counter. This lock is now relegated to a small number of
allocation and free operations, and occasional operations that walk
all connections (including, awkwardly, certain UDP multicast receive
operations -- something to revisit).
- A new ipi_hash_lock protects the two inpcbinfo hash tables for
looking up connections and bound sockets, manipulated using new
INP_HASH_*() macros. This lock, combined with inpcb locks, protects
the 4-tuple address space.
Unlike the current ipi_lock, ipi_hash_lock follows the individual inpcb
connection locks, so may be acquired while manipulating a connection on
which a lock is already held, avoiding the need to acquire the inpcbinfo
lock preemptively when a binding change might later be required. As a
result, however, lookup operations necessarily go through a reference
acquire while holding the lookup lock, later acquiring an inpcb lock --
if required.
A new function in_pcblookup() looks up connections, and accepts flags
indicating how to return the inpcb. Due to lock order changes, callers
no longer need acquire locks before performing a lookup: the lookup
routine will acquire the ipi_hash_lock as needed. In the future, it will
also be able to use alternative lookup and locking strategies
transparently to callers, such as pcbgroup lookup. New lookup flags are,
supplementing the existing INPLOOKUP_WILDCARD flag:
INPLOOKUP_RLOCKPCB - Acquire a read lock on the returned inpcb
INPLOOKUP_WLOCKPCB - Acquire a write lock on the returned inpcb
Callers must pass exactly one of these flags (for the time being).
Some notes:
- All protocols are updated to work within the new regime; especially,
TCP, UDPv4, and UDPv6. pcbinfo ipi_lock acquisitions are largely
eliminated, and global hash lock hold times are dramatically reduced
compared to previous locking.
- The TCP syncache still relies on the pcbinfo lock, something that we
may want to revisit.
- Support for reverting to the FreeBSD 7.x locking strategy in TCP input
is no longer available -- hash lookup locks are now held only very
briefly during inpcb lookup, rather than for potentially extended
periods. However, the pcbinfo ipi_lock will still be acquired if a
connection state might change such that a connection is added or
removed.
- Raw IP sockets continue to use the pcbinfo ipi_lock for protection,
due to maintaining their own hash tables.
- The interface in6_pcblookup_hash_locked() is maintained, which allows
callers to acquire hash locks and perform one or more lookups atomically
with 4-tuple allocation: this is required only for TCPv6, as there is no
in6_pcbconnect_setup(), which there should be.
- UDPv6 locking remains significantly more conservative than UDPv4
locking, which relates to source address selection. This needs
attention, as it likely significantly reduces parallelism in this code
for multithreaded socket use (such as in BIND).
- In the UDPv4 and UDPv6 multicast cases, we need to revisit locking
somewhat, as they relied on ipi_lock to stablise 4-tuple matches, which
is no longer sufficient. A second check once the inpcb lock is held
should do the trick, keeping the general case from requiring the inpcb
lock for every inpcb visited.
- This work reminds us that we need to revisit locking of the v4/v6 flags,
which may be accessed lock-free both before and after this change.
- Right now, a single lock name is used for the pcbhash lock -- this is
undesirable, and probably another argument is required to take care of
this (or a char array name field in the pcbinfo?).
This is not an MFC candidate for 8.x due to its impact on lookup and
locking semantics. It's possible some of these issues could be worked
around with compatibility wrappers, if necessary.
Reviewed by: bz
Sponsored by: Juniper Networks, Inc.
2011-05-30 09:43:55 +00:00
|
|
|
INP_HASH_WUNLOCK(&V_udbinfo);
|
2008-04-17 21:38:18 +00:00
|
|
|
INP_WUNLOCK(inp);
|
2007-07-09 17:47:04 +00:00
|
|
|
return (error);
|
1999-12-07 17:39:16 +00:00
|
|
|
}
|
|
|
|
|
2006-07-21 17:11:15 +00:00
|
|
|
static void
|
|
|
|
udp6_close(struct socket *so)
|
|
|
|
{
|
|
|
|
struct inpcb *inp;
|
|
|
|
|
|
|
|
inp = sotoinpcb(so);
|
|
|
|
KASSERT(inp != NULL, ("udp6_close: inp == NULL"));
|
|
|
|
|
|
|
|
#ifdef INET
|
|
|
|
if (inp->inp_vflag & INP_IPV4) {
|
|
|
|
struct pr_usrreqs *pru;
|
|
|
|
|
|
|
|
pru = inetsw[ip_protox[IPPROTO_UDP]].pr_usrreqs;
|
|
|
|
(*pru->pru_disconnect)(so);
|
|
|
|
return;
|
|
|
|
}
|
|
|
|
#endif
|
2008-04-17 21:38:18 +00:00
|
|
|
INP_WLOCK(inp);
|
2006-07-21 17:11:15 +00:00
|
|
|
if (!IN6_IS_ADDR_UNSPECIFIED(&inp->in6p_faddr)) {
|
Decompose the current single inpcbinfo lock into two locks:
- The existing ipi_lock continues to protect the global inpcb list and
inpcb counter. This lock is now relegated to a small number of
allocation and free operations, and occasional operations that walk
all connections (including, awkwardly, certain UDP multicast receive
operations -- something to revisit).
- A new ipi_hash_lock protects the two inpcbinfo hash tables for
looking up connections and bound sockets, manipulated using new
INP_HASH_*() macros. This lock, combined with inpcb locks, protects
the 4-tuple address space.
Unlike the current ipi_lock, ipi_hash_lock follows the individual inpcb
connection locks, so may be acquired while manipulating a connection on
which a lock is already held, avoiding the need to acquire the inpcbinfo
lock preemptively when a binding change might later be required. As a
result, however, lookup operations necessarily go through a reference
acquire while holding the lookup lock, later acquiring an inpcb lock --
if required.
A new function in_pcblookup() looks up connections, and accepts flags
indicating how to return the inpcb. Due to lock order changes, callers
no longer need acquire locks before performing a lookup: the lookup
routine will acquire the ipi_hash_lock as needed. In the future, it will
also be able to use alternative lookup and locking strategies
transparently to callers, such as pcbgroup lookup. New lookup flags are,
supplementing the existing INPLOOKUP_WILDCARD flag:
INPLOOKUP_RLOCKPCB - Acquire a read lock on the returned inpcb
INPLOOKUP_WLOCKPCB - Acquire a write lock on the returned inpcb
Callers must pass exactly one of these flags (for the time being).
Some notes:
- All protocols are updated to work within the new regime; especially,
TCP, UDPv4, and UDPv6. pcbinfo ipi_lock acquisitions are largely
eliminated, and global hash lock hold times are dramatically reduced
compared to previous locking.
- The TCP syncache still relies on the pcbinfo lock, something that we
may want to revisit.
- Support for reverting to the FreeBSD 7.x locking strategy in TCP input
is no longer available -- hash lookup locks are now held only very
briefly during inpcb lookup, rather than for potentially extended
periods. However, the pcbinfo ipi_lock will still be acquired if a
connection state might change such that a connection is added or
removed.
- Raw IP sockets continue to use the pcbinfo ipi_lock for protection,
due to maintaining their own hash tables.
- The interface in6_pcblookup_hash_locked() is maintained, which allows
callers to acquire hash locks and perform one or more lookups atomically
with 4-tuple allocation: this is required only for TCPv6, as there is no
in6_pcbconnect_setup(), which there should be.
- UDPv6 locking remains significantly more conservative than UDPv4
locking, which relates to source address selection. This needs
attention, as it likely significantly reduces parallelism in this code
for multithreaded socket use (such as in BIND).
- In the UDPv4 and UDPv6 multicast cases, we need to revisit locking
somewhat, as they relied on ipi_lock to stablise 4-tuple matches, which
is no longer sufficient. A second check once the inpcb lock is held
should do the trick, keeping the general case from requiring the inpcb
lock for every inpcb visited.
- This work reminds us that we need to revisit locking of the v4/v6 flags,
which may be accessed lock-free both before and after this change.
- Right now, a single lock name is used for the pcbhash lock -- this is
undesirable, and probably another argument is required to take care of
this (or a char array name field in the pcbinfo?).
This is not an MFC candidate for 8.x due to its impact on lookup and
locking semantics. It's possible some of these issues could be worked
around with compatibility wrappers, if necessary.
Reviewed by: bz
Sponsored by: Juniper Networks, Inc.
2011-05-30 09:43:55 +00:00
|
|
|
INP_HASH_WLOCK(&V_udbinfo);
|
2006-07-21 17:11:15 +00:00
|
|
|
in6_pcbdisconnect(inp);
|
|
|
|
inp->in6p_laddr = in6addr_any;
|
Decompose the current single inpcbinfo lock into two locks:
- The existing ipi_lock continues to protect the global inpcb list and
inpcb counter. This lock is now relegated to a small number of
allocation and free operations, and occasional operations that walk
all connections (including, awkwardly, certain UDP multicast receive
operations -- something to revisit).
- A new ipi_hash_lock protects the two inpcbinfo hash tables for
looking up connections and bound sockets, manipulated using new
INP_HASH_*() macros. This lock, combined with inpcb locks, protects
the 4-tuple address space.
Unlike the current ipi_lock, ipi_hash_lock follows the individual inpcb
connection locks, so may be acquired while manipulating a connection on
which a lock is already held, avoiding the need to acquire the inpcbinfo
lock preemptively when a binding change might later be required. As a
result, however, lookup operations necessarily go through a reference
acquire while holding the lookup lock, later acquiring an inpcb lock --
if required.
A new function in_pcblookup() looks up connections, and accepts flags
indicating how to return the inpcb. Due to lock order changes, callers
no longer need acquire locks before performing a lookup: the lookup
routine will acquire the ipi_hash_lock as needed. In the future, it will
also be able to use alternative lookup and locking strategies
transparently to callers, such as pcbgroup lookup. New lookup flags are,
supplementing the existing INPLOOKUP_WILDCARD flag:
INPLOOKUP_RLOCKPCB - Acquire a read lock on the returned inpcb
INPLOOKUP_WLOCKPCB - Acquire a write lock on the returned inpcb
Callers must pass exactly one of these flags (for the time being).
Some notes:
- All protocols are updated to work within the new regime; especially,
TCP, UDPv4, and UDPv6. pcbinfo ipi_lock acquisitions are largely
eliminated, and global hash lock hold times are dramatically reduced
compared to previous locking.
- The TCP syncache still relies on the pcbinfo lock, something that we
may want to revisit.
- Support for reverting to the FreeBSD 7.x locking strategy in TCP input
is no longer available -- hash lookup locks are now held only very
briefly during inpcb lookup, rather than for potentially extended
periods. However, the pcbinfo ipi_lock will still be acquired if a
connection state might change such that a connection is added or
removed.
- Raw IP sockets continue to use the pcbinfo ipi_lock for protection,
due to maintaining their own hash tables.
- The interface in6_pcblookup_hash_locked() is maintained, which allows
callers to acquire hash locks and perform one or more lookups atomically
with 4-tuple allocation: this is required only for TCPv6, as there is no
in6_pcbconnect_setup(), which there should be.
- UDPv6 locking remains significantly more conservative than UDPv4
locking, which relates to source address selection. This needs
attention, as it likely significantly reduces parallelism in this code
for multithreaded socket use (such as in BIND).
- In the UDPv4 and UDPv6 multicast cases, we need to revisit locking
somewhat, as they relied on ipi_lock to stablise 4-tuple matches, which
is no longer sufficient. A second check once the inpcb lock is held
should do the trick, keeping the general case from requiring the inpcb
lock for every inpcb visited.
- This work reminds us that we need to revisit locking of the v4/v6 flags,
which may be accessed lock-free both before and after this change.
- Right now, a single lock name is used for the pcbhash lock -- this is
undesirable, and probably another argument is required to take care of
this (or a char array name field in the pcbinfo?).
This is not an MFC candidate for 8.x due to its impact on lookup and
locking semantics. It's possible some of these issues could be worked
around with compatibility wrappers, if necessary.
Reviewed by: bz
Sponsored by: Juniper Networks, Inc.
2011-05-30 09:43:55 +00:00
|
|
|
INP_HASH_WUNLOCK(&V_udbinfo);
|
2006-07-21 17:11:15 +00:00
|
|
|
soisdisconnected(so);
|
|
|
|
}
|
2008-04-17 21:38:18 +00:00
|
|
|
INP_WUNLOCK(inp);
|
2006-07-21 17:11:15 +00:00
|
|
|
}
|
|
|
|
|
1999-12-07 17:39:16 +00:00
|
|
|
static int
|
2001-09-12 08:38:13 +00:00
|
|
|
udp6_connect(struct socket *so, struct sockaddr *nam, struct thread *td)
|
1999-12-07 17:39:16 +00:00
|
|
|
{
|
|
|
|
struct inpcb *inp;
|
2009-02-05 15:04:23 +00:00
|
|
|
struct sockaddr_in6 *sin6;
|
2006-04-12 03:03:47 +00:00
|
|
|
int error;
|
1999-12-07 17:39:16 +00:00
|
|
|
|
|
|
|
inp = sotoinpcb(so);
|
2009-02-05 15:04:23 +00:00
|
|
|
sin6 = (struct sockaddr_in6 *)nam;
|
Update in_pcb-derived basic socket types following changes to
pru_abort(), pru_detach(), and in_pcbdetach():
- Universally support and enforce the invariant that so_pcb is
never NULL, converting dozens of unnecessary NULL checks into
assertions, and eliminating dozens of unnecessary error handling
cases in protocol code.
- In some cases, eliminate unnecessary pcbinfo locking, as it is no
longer required to ensure so_pcb != NULL. For example, in protocol
shutdown methods, and in raw IP send.
- Abort and detach protocol switch methods no longer return failures,
nor attempt to free sockets, as the socket layer does this.
- Invoke in_pcbfree() after in_pcbdetach() in order to free the
detached in_pcb structure for a socket.
MFC after: 3 months
2006-04-01 16:20:54 +00:00
|
|
|
KASSERT(inp != NULL, ("udp6_connect: inp == NULL"));
|
2006-04-12 03:03:47 +00:00
|
|
|
|
Decompose the current single inpcbinfo lock into two locks:
- The existing ipi_lock continues to protect the global inpcb list and
inpcb counter. This lock is now relegated to a small number of
allocation and free operations, and occasional operations that walk
all connections (including, awkwardly, certain UDP multicast receive
operations -- something to revisit).
- A new ipi_hash_lock protects the two inpcbinfo hash tables for
looking up connections and bound sockets, manipulated using new
INP_HASH_*() macros. This lock, combined with inpcb locks, protects
the 4-tuple address space.
Unlike the current ipi_lock, ipi_hash_lock follows the individual inpcb
connection locks, so may be acquired while manipulating a connection on
which a lock is already held, avoiding the need to acquire the inpcbinfo
lock preemptively when a binding change might later be required. As a
result, however, lookup operations necessarily go through a reference
acquire while holding the lookup lock, later acquiring an inpcb lock --
if required.
A new function in_pcblookup() looks up connections, and accepts flags
indicating how to return the inpcb. Due to lock order changes, callers
no longer need acquire locks before performing a lookup: the lookup
routine will acquire the ipi_hash_lock as needed. In the future, it will
also be able to use alternative lookup and locking strategies
transparently to callers, such as pcbgroup lookup. New lookup flags are,
supplementing the existing INPLOOKUP_WILDCARD flag:
INPLOOKUP_RLOCKPCB - Acquire a read lock on the returned inpcb
INPLOOKUP_WLOCKPCB - Acquire a write lock on the returned inpcb
Callers must pass exactly one of these flags (for the time being).
Some notes:
- All protocols are updated to work within the new regime; especially,
TCP, UDPv4, and UDPv6. pcbinfo ipi_lock acquisitions are largely
eliminated, and global hash lock hold times are dramatically reduced
compared to previous locking.
- The TCP syncache still relies on the pcbinfo lock, something that we
may want to revisit.
- Support for reverting to the FreeBSD 7.x locking strategy in TCP input
is no longer available -- hash lookup locks are now held only very
briefly during inpcb lookup, rather than for potentially extended
periods. However, the pcbinfo ipi_lock will still be acquired if a
connection state might change such that a connection is added or
removed.
- Raw IP sockets continue to use the pcbinfo ipi_lock for protection,
due to maintaining their own hash tables.
- The interface in6_pcblookup_hash_locked() is maintained, which allows
callers to acquire hash locks and perform one or more lookups atomically
with 4-tuple allocation: this is required only for TCPv6, as there is no
in6_pcbconnect_setup(), which there should be.
- UDPv6 locking remains significantly more conservative than UDPv4
locking, which relates to source address selection. This needs
attention, as it likely significantly reduces parallelism in this code
for multithreaded socket use (such as in BIND).
- In the UDPv4 and UDPv6 multicast cases, we need to revisit locking
somewhat, as they relied on ipi_lock to stablise 4-tuple matches, which
is no longer sufficient. A second check once the inpcb lock is held
should do the trick, keeping the general case from requiring the inpcb
lock for every inpcb visited.
- This work reminds us that we need to revisit locking of the v4/v6 flags,
which may be accessed lock-free both before and after this change.
- Right now, a single lock name is used for the pcbhash lock -- this is
undesirable, and probably another argument is required to take care of
this (or a char array name field in the pcbinfo?).
This is not an MFC candidate for 8.x due to its impact on lookup and
locking semantics. It's possible some of these issues could be worked
around with compatibility wrappers, if necessary.
Reviewed by: bz
Sponsored by: Juniper Networks, Inc.
2011-05-30 09:43:55 +00:00
|
|
|
/*
|
|
|
|
* XXXRW: Need to clarify locking of v4/v6 flags.
|
|
|
|
*/
|
2008-04-17 21:38:18 +00:00
|
|
|
INP_WLOCK(inp);
|
2011-04-30 11:17:00 +00:00
|
|
|
#ifdef INET
|
2011-04-09 01:29:46 +00:00
|
|
|
if (IN6_IS_ADDR_V4MAPPED(&sin6->sin6_addr)) {
|
2009-02-05 15:04:23 +00:00
|
|
|
struct sockaddr_in sin;
|
1999-12-07 17:39:16 +00:00
|
|
|
|
2011-04-09 01:29:46 +00:00
|
|
|
if ((inp->inp_flags & IN6P_IPV6_V6ONLY) != 0) {
|
|
|
|
error = EINVAL;
|
|
|
|
goto out;
|
|
|
|
}
|
2009-02-05 15:04:23 +00:00
|
|
|
if (inp->inp_faddr.s_addr != INADDR_ANY) {
|
|
|
|
error = EISCONN;
|
|
|
|
goto out;
|
|
|
|
}
|
|
|
|
in6_sin6_2_sin(&sin, sin6);
|
2011-04-09 01:29:46 +00:00
|
|
|
inp->inp_vflag |= INP_IPV4;
|
|
|
|
inp->inp_vflag &= ~INP_IPV6;
|
2009-02-05 15:04:23 +00:00
|
|
|
error = prison_remote_ip4(td->td_ucred, &sin.sin_addr);
|
|
|
|
if (error != 0)
|
2004-07-27 23:44:03 +00:00
|
|
|
goto out;
|
Decompose the current single inpcbinfo lock into two locks:
- The existing ipi_lock continues to protect the global inpcb list and
inpcb counter. This lock is now relegated to a small number of
allocation and free operations, and occasional operations that walk
all connections (including, awkwardly, certain UDP multicast receive
operations -- something to revisit).
- A new ipi_hash_lock protects the two inpcbinfo hash tables for
looking up connections and bound sockets, manipulated using new
INP_HASH_*() macros. This lock, combined with inpcb locks, protects
the 4-tuple address space.
Unlike the current ipi_lock, ipi_hash_lock follows the individual inpcb
connection locks, so may be acquired while manipulating a connection on
which a lock is already held, avoiding the need to acquire the inpcbinfo
lock preemptively when a binding change might later be required. As a
result, however, lookup operations necessarily go through a reference
acquire while holding the lookup lock, later acquiring an inpcb lock --
if required.
A new function in_pcblookup() looks up connections, and accepts flags
indicating how to return the inpcb. Due to lock order changes, callers
no longer need acquire locks before performing a lookup: the lookup
routine will acquire the ipi_hash_lock as needed. In the future, it will
also be able to use alternative lookup and locking strategies
transparently to callers, such as pcbgroup lookup. New lookup flags are,
supplementing the existing INPLOOKUP_WILDCARD flag:
INPLOOKUP_RLOCKPCB - Acquire a read lock on the returned inpcb
INPLOOKUP_WLOCKPCB - Acquire a write lock on the returned inpcb
Callers must pass exactly one of these flags (for the time being).
Some notes:
- All protocols are updated to work within the new regime; especially,
TCP, UDPv4, and UDPv6. pcbinfo ipi_lock acquisitions are largely
eliminated, and global hash lock hold times are dramatically reduced
compared to previous locking.
- The TCP syncache still relies on the pcbinfo lock, something that we
may want to revisit.
- Support for reverting to the FreeBSD 7.x locking strategy in TCP input
is no longer available -- hash lookup locks are now held only very
briefly during inpcb lookup, rather than for potentially extended
periods. However, the pcbinfo ipi_lock will still be acquired if a
connection state might change such that a connection is added or
removed.
- Raw IP sockets continue to use the pcbinfo ipi_lock for protection,
due to maintaining their own hash tables.
- The interface in6_pcblookup_hash_locked() is maintained, which allows
callers to acquire hash locks and perform one or more lookups atomically
with 4-tuple allocation: this is required only for TCPv6, as there is no
in6_pcbconnect_setup(), which there should be.
- UDPv6 locking remains significantly more conservative than UDPv4
locking, which relates to source address selection. This needs
attention, as it likely significantly reduces parallelism in this code
for multithreaded socket use (such as in BIND).
- In the UDPv4 and UDPv6 multicast cases, we need to revisit locking
somewhat, as they relied on ipi_lock to stablise 4-tuple matches, which
is no longer sufficient. A second check once the inpcb lock is held
should do the trick, keeping the general case from requiring the inpcb
lock for every inpcb visited.
- This work reminds us that we need to revisit locking of the v4/v6 flags,
which may be accessed lock-free both before and after this change.
- Right now, a single lock name is used for the pcbhash lock -- this is
undesirable, and probably another argument is required to take care of
this (or a char array name field in the pcbinfo?).
This is not an MFC candidate for 8.x due to its impact on lookup and
locking semantics. It's possible some of these issues could be worked
around with compatibility wrappers, if necessary.
Reviewed by: bz
Sponsored by: Juniper Networks, Inc.
2011-05-30 09:43:55 +00:00
|
|
|
INP_HASH_WLOCK(&V_udbinfo);
|
2009-02-05 15:04:23 +00:00
|
|
|
error = in_pcbconnect(inp, (struct sockaddr *)&sin,
|
|
|
|
td->td_ucred);
|
Decompose the current single inpcbinfo lock into two locks:
- The existing ipi_lock continues to protect the global inpcb list and
inpcb counter. This lock is now relegated to a small number of
allocation and free operations, and occasional operations that walk
all connections (including, awkwardly, certain UDP multicast receive
operations -- something to revisit).
- A new ipi_hash_lock protects the two inpcbinfo hash tables for
looking up connections and bound sockets, manipulated using new
INP_HASH_*() macros. This lock, combined with inpcb locks, protects
the 4-tuple address space.
Unlike the current ipi_lock, ipi_hash_lock follows the individual inpcb
connection locks, so may be acquired while manipulating a connection on
which a lock is already held, avoiding the need to acquire the inpcbinfo
lock preemptively when a binding change might later be required. As a
result, however, lookup operations necessarily go through a reference
acquire while holding the lookup lock, later acquiring an inpcb lock --
if required.
A new function in_pcblookup() looks up connections, and accepts flags
indicating how to return the inpcb. Due to lock order changes, callers
no longer need acquire locks before performing a lookup: the lookup
routine will acquire the ipi_hash_lock as needed. In the future, it will
also be able to use alternative lookup and locking strategies
transparently to callers, such as pcbgroup lookup. New lookup flags are,
supplementing the existing INPLOOKUP_WILDCARD flag:
INPLOOKUP_RLOCKPCB - Acquire a read lock on the returned inpcb
INPLOOKUP_WLOCKPCB - Acquire a write lock on the returned inpcb
Callers must pass exactly one of these flags (for the time being).
Some notes:
- All protocols are updated to work within the new regime; especially,
TCP, UDPv4, and UDPv6. pcbinfo ipi_lock acquisitions are largely
eliminated, and global hash lock hold times are dramatically reduced
compared to previous locking.
- The TCP syncache still relies on the pcbinfo lock, something that we
may want to revisit.
- Support for reverting to the FreeBSD 7.x locking strategy in TCP input
is no longer available -- hash lookup locks are now held only very
briefly during inpcb lookup, rather than for potentially extended
periods. However, the pcbinfo ipi_lock will still be acquired if a
connection state might change such that a connection is added or
removed.
- Raw IP sockets continue to use the pcbinfo ipi_lock for protection,
due to maintaining their own hash tables.
- The interface in6_pcblookup_hash_locked() is maintained, which allows
callers to acquire hash locks and perform one or more lookups atomically
with 4-tuple allocation: this is required only for TCPv6, as there is no
in6_pcbconnect_setup(), which there should be.
- UDPv6 locking remains significantly more conservative than UDPv4
locking, which relates to source address selection. This needs
attention, as it likely significantly reduces parallelism in this code
for multithreaded socket use (such as in BIND).
- In the UDPv4 and UDPv6 multicast cases, we need to revisit locking
somewhat, as they relied on ipi_lock to stablise 4-tuple matches, which
is no longer sufficient. A second check once the inpcb lock is held
should do the trick, keeping the general case from requiring the inpcb
lock for every inpcb visited.
- This work reminds us that we need to revisit locking of the v4/v6 flags,
which may be accessed lock-free both before and after this change.
- Right now, a single lock name is used for the pcbhash lock -- this is
undesirable, and probably another argument is required to take care of
this (or a char array name field in the pcbinfo?).
This is not an MFC candidate for 8.x due to its impact on lookup and
locking semantics. It's possible some of these issues could be worked
around with compatibility wrappers, if necessary.
Reviewed by: bz
Sponsored by: Juniper Networks, Inc.
2011-05-30 09:43:55 +00:00
|
|
|
INP_HASH_WUNLOCK(&V_udbinfo);
|
2011-04-09 01:29:46 +00:00
|
|
|
if (error == 0)
|
2009-02-05 15:04:23 +00:00
|
|
|
soisconnected(so);
|
|
|
|
goto out;
|
1999-12-07 17:39:16 +00:00
|
|
|
}
|
2011-04-30 11:17:00 +00:00
|
|
|
#endif
|
2004-07-27 23:44:03 +00:00
|
|
|
if (!IN6_IS_ADDR_UNSPECIFIED(&inp->in6p_faddr)) {
|
|
|
|
error = EISCONN;
|
|
|
|
goto out;
|
|
|
|
}
|
2011-04-09 01:29:46 +00:00
|
|
|
inp->inp_vflag &= ~INP_IPV4;
|
|
|
|
inp->inp_vflag |= INP_IPV6;
|
2009-02-05 15:04:23 +00:00
|
|
|
error = prison_remote_ip6(td->td_ucred, &sin6->sin6_addr);
|
|
|
|
if (error != 0)
|
|
|
|
goto out;
|
Decompose the current single inpcbinfo lock into two locks:
- The existing ipi_lock continues to protect the global inpcb list and
inpcb counter. This lock is now relegated to a small number of
allocation and free operations, and occasional operations that walk
all connections (including, awkwardly, certain UDP multicast receive
operations -- something to revisit).
- A new ipi_hash_lock protects the two inpcbinfo hash tables for
looking up connections and bound sockets, manipulated using new
INP_HASH_*() macros. This lock, combined with inpcb locks, protects
the 4-tuple address space.
Unlike the current ipi_lock, ipi_hash_lock follows the individual inpcb
connection locks, so may be acquired while manipulating a connection on
which a lock is already held, avoiding the need to acquire the inpcbinfo
lock preemptively when a binding change might later be required. As a
result, however, lookup operations necessarily go through a reference
acquire while holding the lookup lock, later acquiring an inpcb lock --
if required.
A new function in_pcblookup() looks up connections, and accepts flags
indicating how to return the inpcb. Due to lock order changes, callers
no longer need acquire locks before performing a lookup: the lookup
routine will acquire the ipi_hash_lock as needed. In the future, it will
also be able to use alternative lookup and locking strategies
transparently to callers, such as pcbgroup lookup. New lookup flags are,
supplementing the existing INPLOOKUP_WILDCARD flag:
INPLOOKUP_RLOCKPCB - Acquire a read lock on the returned inpcb
INPLOOKUP_WLOCKPCB - Acquire a write lock on the returned inpcb
Callers must pass exactly one of these flags (for the time being).
Some notes:
- All protocols are updated to work within the new regime; especially,
TCP, UDPv4, and UDPv6. pcbinfo ipi_lock acquisitions are largely
eliminated, and global hash lock hold times are dramatically reduced
compared to previous locking.
- The TCP syncache still relies on the pcbinfo lock, something that we
may want to revisit.
- Support for reverting to the FreeBSD 7.x locking strategy in TCP input
is no longer available -- hash lookup locks are now held only very
briefly during inpcb lookup, rather than for potentially extended
periods. However, the pcbinfo ipi_lock will still be acquired if a
connection state might change such that a connection is added or
removed.
- Raw IP sockets continue to use the pcbinfo ipi_lock for protection,
due to maintaining their own hash tables.
- The interface in6_pcblookup_hash_locked() is maintained, which allows
callers to acquire hash locks and perform one or more lookups atomically
with 4-tuple allocation: this is required only for TCPv6, as there is no
in6_pcbconnect_setup(), which there should be.
- UDPv6 locking remains significantly more conservative than UDPv4
locking, which relates to source address selection. This needs
attention, as it likely significantly reduces parallelism in this code
for multithreaded socket use (such as in BIND).
- In the UDPv4 and UDPv6 multicast cases, we need to revisit locking
somewhat, as they relied on ipi_lock to stablise 4-tuple matches, which
is no longer sufficient. A second check once the inpcb lock is held
should do the trick, keeping the general case from requiring the inpcb
lock for every inpcb visited.
- This work reminds us that we need to revisit locking of the v4/v6 flags,
which may be accessed lock-free both before and after this change.
- Right now, a single lock name is used for the pcbhash lock -- this is
undesirable, and probably another argument is required to take care of
this (or a char array name field in the pcbinfo?).
This is not an MFC candidate for 8.x due to its impact on lookup and
locking semantics. It's possible some of these issues could be worked
around with compatibility wrappers, if necessary.
Reviewed by: bz
Sponsored by: Juniper Networks, Inc.
2011-05-30 09:43:55 +00:00
|
|
|
INP_HASH_WLOCK(&V_udbinfo);
|
2004-03-27 21:05:46 +00:00
|
|
|
error = in6_pcbconnect(inp, nam, td->td_ucred);
|
Decompose the current single inpcbinfo lock into two locks:
- The existing ipi_lock continues to protect the global inpcb list and
inpcb counter. This lock is now relegated to a small number of
allocation and free operations, and occasional operations that walk
all connections (including, awkwardly, certain UDP multicast receive
operations -- something to revisit).
- A new ipi_hash_lock protects the two inpcbinfo hash tables for
looking up connections and bound sockets, manipulated using new
INP_HASH_*() macros. This lock, combined with inpcb locks, protects
the 4-tuple address space.
Unlike the current ipi_lock, ipi_hash_lock follows the individual inpcb
connection locks, so may be acquired while manipulating a connection on
which a lock is already held, avoiding the need to acquire the inpcbinfo
lock preemptively when a binding change might later be required. As a
result, however, lookup operations necessarily go through a reference
acquire while holding the lookup lock, later acquiring an inpcb lock --
if required.
A new function in_pcblookup() looks up connections, and accepts flags
indicating how to return the inpcb. Due to lock order changes, callers
no longer need acquire locks before performing a lookup: the lookup
routine will acquire the ipi_hash_lock as needed. In the future, it will
also be able to use alternative lookup and locking strategies
transparently to callers, such as pcbgroup lookup. New lookup flags are,
supplementing the existing INPLOOKUP_WILDCARD flag:
INPLOOKUP_RLOCKPCB - Acquire a read lock on the returned inpcb
INPLOOKUP_WLOCKPCB - Acquire a write lock on the returned inpcb
Callers must pass exactly one of these flags (for the time being).
Some notes:
- All protocols are updated to work within the new regime; especially,
TCP, UDPv4, and UDPv6. pcbinfo ipi_lock acquisitions are largely
eliminated, and global hash lock hold times are dramatically reduced
compared to previous locking.
- The TCP syncache still relies on the pcbinfo lock, something that we
may want to revisit.
- Support for reverting to the FreeBSD 7.x locking strategy in TCP input
is no longer available -- hash lookup locks are now held only very
briefly during inpcb lookup, rather than for potentially extended
periods. However, the pcbinfo ipi_lock will still be acquired if a
connection state might change such that a connection is added or
removed.
- Raw IP sockets continue to use the pcbinfo ipi_lock for protection,
due to maintaining their own hash tables.
- The interface in6_pcblookup_hash_locked() is maintained, which allows
callers to acquire hash locks and perform one or more lookups atomically
with 4-tuple allocation: this is required only for TCPv6, as there is no
in6_pcbconnect_setup(), which there should be.
- UDPv6 locking remains significantly more conservative than UDPv4
locking, which relates to source address selection. This needs
attention, as it likely significantly reduces parallelism in this code
for multithreaded socket use (such as in BIND).
- In the UDPv4 and UDPv6 multicast cases, we need to revisit locking
somewhat, as they relied on ipi_lock to stablise 4-tuple matches, which
is no longer sufficient. A second check once the inpcb lock is held
should do the trick, keeping the general case from requiring the inpcb
lock for every inpcb visited.
- This work reminds us that we need to revisit locking of the v4/v6 flags,
which may be accessed lock-free both before and after this change.
- Right now, a single lock name is used for the pcbhash lock -- this is
undesirable, and probably another argument is required to take care of
this (or a char array name field in the pcbinfo?).
This is not an MFC candidate for 8.x due to its impact on lookup and
locking semantics. It's possible some of these issues could be worked
around with compatibility wrappers, if necessary.
Reviewed by: bz
Sponsored by: Juniper Networks, Inc.
2011-05-30 09:43:55 +00:00
|
|
|
INP_HASH_WUNLOCK(&V_udbinfo);
|
2011-04-09 01:29:46 +00:00
|
|
|
if (error == 0)
|
1999-12-07 17:39:16 +00:00
|
|
|
soisconnected(so);
|
2004-07-27 23:44:03 +00:00
|
|
|
out:
|
2008-04-17 21:38:18 +00:00
|
|
|
INP_WUNLOCK(inp);
|
2007-07-09 17:47:04 +00:00
|
|
|
return (error);
|
1999-12-07 17:39:16 +00:00
|
|
|
}
|
|
|
|
|
Chance protocol switch method pru_detach() so that it returns void
rather than an error. Detaches do not "fail", they other occur or
the protocol flags SS_PROTOREF to take ownership of the socket.
soclose() no longer looks at so_pcb to see if it's NULL, relying
entirely on the protocol to decide whether it's time to free the
socket or not using SS_PROTOREF. so_pcb is now entirely owned and
managed by the protocol code. Likewise, no longer test so_pcb in
other socket functions, such as soreceive(), which have no business
digging into protocol internals.
Protocol detach routines no longer try to free the socket on detach,
this is performed in the socket code if the protocol permits it.
In rts_detach(), no longer test for rp != NULL in detach, and
likewise in other protocols that don't permit a NULL so_pcb, reduce
the incidence of testing for it during detach.
netinet and netinet6 are not fully updated to this change, which
will be in an upcoming commit. In their current state they may leak
memory or panic.
MFC after: 3 months
2006-04-01 15:42:02 +00:00
|
|
|
static void
|
1999-12-07 17:39:16 +00:00
|
|
|
udp6_detach(struct socket *so)
|
|
|
|
{
|
|
|
|
struct inpcb *inp;
|
2009-05-23 16:51:13 +00:00
|
|
|
struct udpcb *up;
|
1999-12-07 17:39:16 +00:00
|
|
|
|
|
|
|
inp = sotoinpcb(so);
|
Update in_pcb-derived basic socket types following changes to
pru_abort(), pru_detach(), and in_pcbdetach():
- Universally support and enforce the invariant that so_pcb is
never NULL, converting dozens of unnecessary NULL checks into
assertions, and eliminating dozens of unnecessary error handling
cases in protocol code.
- In some cases, eliminate unnecessary pcbinfo locking, as it is no
longer required to ensure so_pcb != NULL. For example, in protocol
shutdown methods, and in raw IP send.
- Abort and detach protocol switch methods no longer return failures,
nor attempt to free sockets, as the socket layer does this.
- Invoke in_pcbfree() after in_pcbdetach() in order to free the
detached in_pcb structure for a socket.
MFC after: 3 months
2006-04-01 16:20:54 +00:00
|
|
|
KASSERT(inp != NULL, ("udp6_detach: inp == NULL"));
|
2006-04-12 03:03:47 +00:00
|
|
|
|
Commit step 1 of the vimage project, (network stack)
virtualization work done by Marko Zec (zec@).
This is the first in a series of commits over the course
of the next few weeks.
Mark all uses of global variables to be virtualized
with a V_ prefix.
Use macros to map them back to their global names for
now, so this is a NOP change only.
We hope to have caught at least 85-90% of what is needed
so we do not invalidate a lot of outstanding patches again.
Obtained from: //depot/projects/vimage-commit2/...
Reviewed by: brooks, des, ed, mav, julian,
jamie, kris, rwatson, zec, ...
(various people I forgot, different versions)
md5 (with a bit of help)
Sponsored by: NLnet Foundation, The FreeBSD Foundation
X-MFC after: never
V_Commit_Message_Reviewed_By: more people than the patch
2008-08-17 23:27:27 +00:00
|
|
|
INP_INFO_WLOCK(&V_udbinfo);
|
2008-04-17 21:38:18 +00:00
|
|
|
INP_WLOCK(inp);
|
2009-05-23 16:51:13 +00:00
|
|
|
up = intoudpcb(inp);
|
|
|
|
KASSERT(up != NULL, ("%s: up == NULL", __func__));
|
2008-11-26 20:52:26 +00:00
|
|
|
in_pcbdetach(inp);
|
2008-11-27 12:04:35 +00:00
|
|
|
in_pcbfree(inp);
|
Commit step 1 of the vimage project, (network stack)
virtualization work done by Marko Zec (zec@).
This is the first in a series of commits over the course
of the next few weeks.
Mark all uses of global variables to be virtualized
with a V_ prefix.
Use macros to map them back to their global names for
now, so this is a NOP change only.
We hope to have caught at least 85-90% of what is needed
so we do not invalidate a lot of outstanding patches again.
Obtained from: //depot/projects/vimage-commit2/...
Reviewed by: brooks, des, ed, mav, julian,
jamie, kris, rwatson, zec, ...
(various people I forgot, different versions)
md5 (with a bit of help)
Sponsored by: NLnet Foundation, The FreeBSD Foundation
X-MFC after: never
V_Commit_Message_Reviewed_By: more people than the patch
2008-08-17 23:27:27 +00:00
|
|
|
INP_INFO_WUNLOCK(&V_udbinfo);
|
2009-05-23 16:51:13 +00:00
|
|
|
udp_discardcb(up);
|
1999-12-07 17:39:16 +00:00
|
|
|
}
|
|
|
|
|
|
|
|
static int
|
|
|
|
udp6_disconnect(struct socket *so)
|
|
|
|
{
|
|
|
|
struct inpcb *inp;
|
2006-04-12 03:03:47 +00:00
|
|
|
int error;
|
1999-12-07 17:39:16 +00:00
|
|
|
|
|
|
|
inp = sotoinpcb(so);
|
Update in_pcb-derived basic socket types following changes to
pru_abort(), pru_detach(), and in_pcbdetach():
- Universally support and enforce the invariant that so_pcb is
never NULL, converting dozens of unnecessary NULL checks into
assertions, and eliminating dozens of unnecessary error handling
cases in protocol code.
- In some cases, eliminate unnecessary pcbinfo locking, as it is no
longer required to ensure so_pcb != NULL. For example, in protocol
shutdown methods, and in raw IP send.
- Abort and detach protocol switch methods no longer return failures,
nor attempt to free sockets, as the socket layer does this.
- Invoke in_pcbfree() after in_pcbdetach() in order to free the
detached in_pcb structure for a socket.
MFC after: 3 months
2006-04-01 16:20:54 +00:00
|
|
|
KASSERT(inp != NULL, ("udp6_disconnect: inp == NULL"));
|
2006-04-12 03:03:47 +00:00
|
|
|
|
2004-02-13 15:11:47 +00:00
|
|
|
#ifdef INET
|
1999-12-07 17:39:16 +00:00
|
|
|
if (inp->inp_vflag & INP_IPV4) {
|
|
|
|
struct pr_usrreqs *pru;
|
|
|
|
|
|
|
|
pru = inetsw[ip_protox[IPPROTO_UDP]].pr_usrreqs;
|
Decompose the current single inpcbinfo lock into two locks:
- The existing ipi_lock continues to protect the global inpcb list and
inpcb counter. This lock is now relegated to a small number of
allocation and free operations, and occasional operations that walk
all connections (including, awkwardly, certain UDP multicast receive
operations -- something to revisit).
- A new ipi_hash_lock protects the two inpcbinfo hash tables for
looking up connections and bound sockets, manipulated using new
INP_HASH_*() macros. This lock, combined with inpcb locks, protects
the 4-tuple address space.
Unlike the current ipi_lock, ipi_hash_lock follows the individual inpcb
connection locks, so may be acquired while manipulating a connection on
which a lock is already held, avoiding the need to acquire the inpcbinfo
lock preemptively when a binding change might later be required. As a
result, however, lookup operations necessarily go through a reference
acquire while holding the lookup lock, later acquiring an inpcb lock --
if required.
A new function in_pcblookup() looks up connections, and accepts flags
indicating how to return the inpcb. Due to lock order changes, callers
no longer need acquire locks before performing a lookup: the lookup
routine will acquire the ipi_hash_lock as needed. In the future, it will
also be able to use alternative lookup and locking strategies
transparently to callers, such as pcbgroup lookup. New lookup flags are,
supplementing the existing INPLOOKUP_WILDCARD flag:
INPLOOKUP_RLOCKPCB - Acquire a read lock on the returned inpcb
INPLOOKUP_WLOCKPCB - Acquire a write lock on the returned inpcb
Callers must pass exactly one of these flags (for the time being).
Some notes:
- All protocols are updated to work within the new regime; especially,
TCP, UDPv4, and UDPv6. pcbinfo ipi_lock acquisitions are largely
eliminated, and global hash lock hold times are dramatically reduced
compared to previous locking.
- The TCP syncache still relies on the pcbinfo lock, something that we
may want to revisit.
- Support for reverting to the FreeBSD 7.x locking strategy in TCP input
is no longer available -- hash lookup locks are now held only very
briefly during inpcb lookup, rather than for potentially extended
periods. However, the pcbinfo ipi_lock will still be acquired if a
connection state might change such that a connection is added or
removed.
- Raw IP sockets continue to use the pcbinfo ipi_lock for protection,
due to maintaining their own hash tables.
- The interface in6_pcblookup_hash_locked() is maintained, which allows
callers to acquire hash locks and perform one or more lookups atomically
with 4-tuple allocation: this is required only for TCPv6, as there is no
in6_pcbconnect_setup(), which there should be.
- UDPv6 locking remains significantly more conservative than UDPv4
locking, which relates to source address selection. This needs
attention, as it likely significantly reduces parallelism in this code
for multithreaded socket use (such as in BIND).
- In the UDPv4 and UDPv6 multicast cases, we need to revisit locking
somewhat, as they relied on ipi_lock to stablise 4-tuple matches, which
is no longer sufficient. A second check once the inpcb lock is held
should do the trick, keeping the general case from requiring the inpcb
lock for every inpcb visited.
- This work reminds us that we need to revisit locking of the v4/v6 flags,
which may be accessed lock-free both before and after this change.
- Right now, a single lock name is used for the pcbhash lock -- this is
undesirable, and probably another argument is required to take care of
this (or a char array name field in the pcbinfo?).
This is not an MFC candidate for 8.x due to its impact on lookup and
locking semantics. It's possible some of these issues could be worked
around with compatibility wrappers, if necessary.
Reviewed by: bz
Sponsored by: Juniper Networks, Inc.
2011-05-30 09:43:55 +00:00
|
|
|
(void)(*pru->pru_disconnect)(so);
|
|
|
|
return (0);
|
1999-12-07 17:39:16 +00:00
|
|
|
}
|
2004-02-13 15:11:47 +00:00
|
|
|
#endif
|
1999-12-07 17:39:16 +00:00
|
|
|
|
Decompose the current single inpcbinfo lock into two locks:
- The existing ipi_lock continues to protect the global inpcb list and
inpcb counter. This lock is now relegated to a small number of
allocation and free operations, and occasional operations that walk
all connections (including, awkwardly, certain UDP multicast receive
operations -- something to revisit).
- A new ipi_hash_lock protects the two inpcbinfo hash tables for
looking up connections and bound sockets, manipulated using new
INP_HASH_*() macros. This lock, combined with inpcb locks, protects
the 4-tuple address space.
Unlike the current ipi_lock, ipi_hash_lock follows the individual inpcb
connection locks, so may be acquired while manipulating a connection on
which a lock is already held, avoiding the need to acquire the inpcbinfo
lock preemptively when a binding change might later be required. As a
result, however, lookup operations necessarily go through a reference
acquire while holding the lookup lock, later acquiring an inpcb lock --
if required.
A new function in_pcblookup() looks up connections, and accepts flags
indicating how to return the inpcb. Due to lock order changes, callers
no longer need acquire locks before performing a lookup: the lookup
routine will acquire the ipi_hash_lock as needed. In the future, it will
also be able to use alternative lookup and locking strategies
transparently to callers, such as pcbgroup lookup. New lookup flags are,
supplementing the existing INPLOOKUP_WILDCARD flag:
INPLOOKUP_RLOCKPCB - Acquire a read lock on the returned inpcb
INPLOOKUP_WLOCKPCB - Acquire a write lock on the returned inpcb
Callers must pass exactly one of these flags (for the time being).
Some notes:
- All protocols are updated to work within the new regime; especially,
TCP, UDPv4, and UDPv6. pcbinfo ipi_lock acquisitions are largely
eliminated, and global hash lock hold times are dramatically reduced
compared to previous locking.
- The TCP syncache still relies on the pcbinfo lock, something that we
may want to revisit.
- Support for reverting to the FreeBSD 7.x locking strategy in TCP input
is no longer available -- hash lookup locks are now held only very
briefly during inpcb lookup, rather than for potentially extended
periods. However, the pcbinfo ipi_lock will still be acquired if a
connection state might change such that a connection is added or
removed.
- Raw IP sockets continue to use the pcbinfo ipi_lock for protection,
due to maintaining their own hash tables.
- The interface in6_pcblookup_hash_locked() is maintained, which allows
callers to acquire hash locks and perform one or more lookups atomically
with 4-tuple allocation: this is required only for TCPv6, as there is no
in6_pcbconnect_setup(), which there should be.
- UDPv6 locking remains significantly more conservative than UDPv4
locking, which relates to source address selection. This needs
attention, as it likely significantly reduces parallelism in this code
for multithreaded socket use (such as in BIND).
- In the UDPv4 and UDPv6 multicast cases, we need to revisit locking
somewhat, as they relied on ipi_lock to stablise 4-tuple matches, which
is no longer sufficient. A second check once the inpcb lock is held
should do the trick, keeping the general case from requiring the inpcb
lock for every inpcb visited.
- This work reminds us that we need to revisit locking of the v4/v6 flags,
which may be accessed lock-free both before and after this change.
- Right now, a single lock name is used for the pcbhash lock -- this is
undesirable, and probably another argument is required to take care of
this (or a char array name field in the pcbinfo?).
This is not an MFC candidate for 8.x due to its impact on lookup and
locking semantics. It's possible some of these issues could be worked
around with compatibility wrappers, if necessary.
Reviewed by: bz
Sponsored by: Juniper Networks, Inc.
2011-05-30 09:43:55 +00:00
|
|
|
INP_WLOCK(inp);
|
|
|
|
|
2004-07-27 23:44:03 +00:00
|
|
|
if (IN6_IS_ADDR_UNSPECIFIED(&inp->in6p_faddr)) {
|
|
|
|
error = ENOTCONN;
|
|
|
|
goto out;
|
|
|
|
}
|
1999-12-07 17:39:16 +00:00
|
|
|
|
Decompose the current single inpcbinfo lock into two locks:
- The existing ipi_lock continues to protect the global inpcb list and
inpcb counter. This lock is now relegated to a small number of
allocation and free operations, and occasional operations that walk
all connections (including, awkwardly, certain UDP multicast receive
operations -- something to revisit).
- A new ipi_hash_lock protects the two inpcbinfo hash tables for
looking up connections and bound sockets, manipulated using new
INP_HASH_*() macros. This lock, combined with inpcb locks, protects
the 4-tuple address space.
Unlike the current ipi_lock, ipi_hash_lock follows the individual inpcb
connection locks, so may be acquired while manipulating a connection on
which a lock is already held, avoiding the need to acquire the inpcbinfo
lock preemptively when a binding change might later be required. As a
result, however, lookup operations necessarily go through a reference
acquire while holding the lookup lock, later acquiring an inpcb lock --
if required.
A new function in_pcblookup() looks up connections, and accepts flags
indicating how to return the inpcb. Due to lock order changes, callers
no longer need acquire locks before performing a lookup: the lookup
routine will acquire the ipi_hash_lock as needed. In the future, it will
also be able to use alternative lookup and locking strategies
transparently to callers, such as pcbgroup lookup. New lookup flags are,
supplementing the existing INPLOOKUP_WILDCARD flag:
INPLOOKUP_RLOCKPCB - Acquire a read lock on the returned inpcb
INPLOOKUP_WLOCKPCB - Acquire a write lock on the returned inpcb
Callers must pass exactly one of these flags (for the time being).
Some notes:
- All protocols are updated to work within the new regime; especially,
TCP, UDPv4, and UDPv6. pcbinfo ipi_lock acquisitions are largely
eliminated, and global hash lock hold times are dramatically reduced
compared to previous locking.
- The TCP syncache still relies on the pcbinfo lock, something that we
may want to revisit.
- Support for reverting to the FreeBSD 7.x locking strategy in TCP input
is no longer available -- hash lookup locks are now held only very
briefly during inpcb lookup, rather than for potentially extended
periods. However, the pcbinfo ipi_lock will still be acquired if a
connection state might change such that a connection is added or
removed.
- Raw IP sockets continue to use the pcbinfo ipi_lock for protection,
due to maintaining their own hash tables.
- The interface in6_pcblookup_hash_locked() is maintained, which allows
callers to acquire hash locks and perform one or more lookups atomically
with 4-tuple allocation: this is required only for TCPv6, as there is no
in6_pcbconnect_setup(), which there should be.
- UDPv6 locking remains significantly more conservative than UDPv4
locking, which relates to source address selection. This needs
attention, as it likely significantly reduces parallelism in this code
for multithreaded socket use (such as in BIND).
- In the UDPv4 and UDPv6 multicast cases, we need to revisit locking
somewhat, as they relied on ipi_lock to stablise 4-tuple matches, which
is no longer sufficient. A second check once the inpcb lock is held
should do the trick, keeping the general case from requiring the inpcb
lock for every inpcb visited.
- This work reminds us that we need to revisit locking of the v4/v6 flags,
which may be accessed lock-free both before and after this change.
- Right now, a single lock name is used for the pcbhash lock -- this is
undesirable, and probably another argument is required to take care of
this (or a char array name field in the pcbinfo?).
This is not an MFC candidate for 8.x due to its impact on lookup and
locking semantics. It's possible some of these issues could be worked
around with compatibility wrappers, if necessary.
Reviewed by: bz
Sponsored by: Juniper Networks, Inc.
2011-05-30 09:43:55 +00:00
|
|
|
INP_HASH_WLOCK(&V_udbinfo);
|
1999-12-07 17:39:16 +00:00
|
|
|
in6_pcbdisconnect(inp);
|
|
|
|
inp->in6p_laddr = in6addr_any;
|
Decompose the current single inpcbinfo lock into two locks:
- The existing ipi_lock continues to protect the global inpcb list and
inpcb counter. This lock is now relegated to a small number of
allocation and free operations, and occasional operations that walk
all connections (including, awkwardly, certain UDP multicast receive
operations -- something to revisit).
- A new ipi_hash_lock protects the two inpcbinfo hash tables for
looking up connections and bound sockets, manipulated using new
INP_HASH_*() macros. This lock, combined with inpcb locks, protects
the 4-tuple address space.
Unlike the current ipi_lock, ipi_hash_lock follows the individual inpcb
connection locks, so may be acquired while manipulating a connection on
which a lock is already held, avoiding the need to acquire the inpcbinfo
lock preemptively when a binding change might later be required. As a
result, however, lookup operations necessarily go through a reference
acquire while holding the lookup lock, later acquiring an inpcb lock --
if required.
A new function in_pcblookup() looks up connections, and accepts flags
indicating how to return the inpcb. Due to lock order changes, callers
no longer need acquire locks before performing a lookup: the lookup
routine will acquire the ipi_hash_lock as needed. In the future, it will
also be able to use alternative lookup and locking strategies
transparently to callers, such as pcbgroup lookup. New lookup flags are,
supplementing the existing INPLOOKUP_WILDCARD flag:
INPLOOKUP_RLOCKPCB - Acquire a read lock on the returned inpcb
INPLOOKUP_WLOCKPCB - Acquire a write lock on the returned inpcb
Callers must pass exactly one of these flags (for the time being).
Some notes:
- All protocols are updated to work within the new regime; especially,
TCP, UDPv4, and UDPv6. pcbinfo ipi_lock acquisitions are largely
eliminated, and global hash lock hold times are dramatically reduced
compared to previous locking.
- The TCP syncache still relies on the pcbinfo lock, something that we
may want to revisit.
- Support for reverting to the FreeBSD 7.x locking strategy in TCP input
is no longer available -- hash lookup locks are now held only very
briefly during inpcb lookup, rather than for potentially extended
periods. However, the pcbinfo ipi_lock will still be acquired if a
connection state might change such that a connection is added or
removed.
- Raw IP sockets continue to use the pcbinfo ipi_lock for protection,
due to maintaining their own hash tables.
- The interface in6_pcblookup_hash_locked() is maintained, which allows
callers to acquire hash locks and perform one or more lookups atomically
with 4-tuple allocation: this is required only for TCPv6, as there is no
in6_pcbconnect_setup(), which there should be.
- UDPv6 locking remains significantly more conservative than UDPv4
locking, which relates to source address selection. This needs
attention, as it likely significantly reduces parallelism in this code
for multithreaded socket use (such as in BIND).
- In the UDPv4 and UDPv6 multicast cases, we need to revisit locking
somewhat, as they relied on ipi_lock to stablise 4-tuple matches, which
is no longer sufficient. A second check once the inpcb lock is held
should do the trick, keeping the general case from requiring the inpcb
lock for every inpcb visited.
- This work reminds us that we need to revisit locking of the v4/v6 flags,
which may be accessed lock-free both before and after this change.
- Right now, a single lock name is used for the pcbhash lock -- this is
undesirable, and probably another argument is required to take care of
this (or a char array name field in the pcbinfo?).
This is not an MFC candidate for 8.x due to its impact on lookup and
locking semantics. It's possible some of these issues could be worked
around with compatibility wrappers, if necessary.
Reviewed by: bz
Sponsored by: Juniper Networks, Inc.
2011-05-30 09:43:55 +00:00
|
|
|
INP_HASH_WUNLOCK(&V_udbinfo);
|
2008-10-12 20:01:32 +00:00
|
|
|
SOCK_LOCK(so);
|
1999-12-07 17:39:16 +00:00
|
|
|
so->so_state &= ~SS_ISCONNECTED; /* XXX */
|
2008-10-12 20:01:32 +00:00
|
|
|
SOCK_UNLOCK(so);
|
2004-07-27 23:44:03 +00:00
|
|
|
out:
|
2008-04-17 21:38:18 +00:00
|
|
|
INP_WUNLOCK(inp);
|
2007-07-09 17:47:04 +00:00
|
|
|
return (0);
|
1999-12-07 17:39:16 +00:00
|
|
|
}
|
|
|
|
|
|
|
|
static int
|
2007-07-09 17:47:04 +00:00
|
|
|
udp6_send(struct socket *so, int flags, struct mbuf *m,
|
|
|
|
struct sockaddr *addr, struct mbuf *control, struct thread *td)
|
1999-12-07 17:39:16 +00:00
|
|
|
{
|
|
|
|
struct inpcb *inp;
|
2000-07-04 16:35:15 +00:00
|
|
|
int error = 0;
|
1999-12-07 17:39:16 +00:00
|
|
|
|
|
|
|
inp = sotoinpcb(so);
|
Update in_pcb-derived basic socket types following changes to
pru_abort(), pru_detach(), and in_pcbdetach():
- Universally support and enforce the invariant that so_pcb is
never NULL, converting dozens of unnecessary NULL checks into
assertions, and eliminating dozens of unnecessary error handling
cases in protocol code.
- In some cases, eliminate unnecessary pcbinfo locking, as it is no
longer required to ensure so_pcb != NULL. For example, in protocol
shutdown methods, and in raw IP send.
- Abort and detach protocol switch methods no longer return failures,
nor attempt to free sockets, as the socket layer does this.
- Invoke in_pcbfree() after in_pcbdetach() in order to free the
detached in_pcb structure for a socket.
MFC after: 3 months
2006-04-01 16:20:54 +00:00
|
|
|
KASSERT(inp != NULL, ("udp6_send: inp == NULL"));
|
2006-04-12 03:03:47 +00:00
|
|
|
|
2008-04-17 21:38:18 +00:00
|
|
|
INP_WLOCK(inp);
|
2000-07-04 16:35:15 +00:00
|
|
|
if (addr) {
|
2007-07-05 16:29:40 +00:00
|
|
|
if (addr->sa_len != sizeof(struct sockaddr_in6)) {
|
2000-07-04 16:35:15 +00:00
|
|
|
error = EINVAL;
|
|
|
|
goto bad;
|
|
|
|
}
|
|
|
|
if (addr->sa_family != AF_INET6) {
|
|
|
|
error = EAFNOSUPPORT;
|
|
|
|
goto bad;
|
|
|
|
}
|
|
|
|
}
|
|
|
|
|
2004-02-13 15:11:47 +00:00
|
|
|
#ifdef INET
|
2006-02-02 11:46:05 +00:00
|
|
|
if ((inp->inp_flags & IN6P_IPV6_V6ONLY) == 0) {
|
1999-12-07 17:39:16 +00:00
|
|
|
int hasv4addr;
|
|
|
|
struct sockaddr_in6 *sin6 = 0;
|
|
|
|
|
|
|
|
if (addr == 0)
|
|
|
|
hasv4addr = (inp->inp_vflag & INP_IPV4);
|
|
|
|
else {
|
|
|
|
sin6 = (struct sockaddr_in6 *)addr;
|
|
|
|
hasv4addr = IN6_IS_ADDR_V4MAPPED(&sin6->sin6_addr)
|
2007-07-09 17:47:04 +00:00
|
|
|
? 1 : 0;
|
1999-12-07 17:39:16 +00:00
|
|
|
}
|
|
|
|
if (hasv4addr) {
|
|
|
|
struct pr_usrreqs *pru;
|
|
|
|
|
2008-09-22 06:44:03 +00:00
|
|
|
/*
|
|
|
|
* XXXRW: We release UDP-layer locks before calling
|
|
|
|
* udp_send() in order to avoid recursion. However,
|
|
|
|
* this does mean there is a short window where inp's
|
|
|
|
* fields are unstable. Could this lead to a
|
|
|
|
* potential race in which the factors causing us to
|
|
|
|
* select the UDPv4 output routine are invalidated?
|
|
|
|
*/
|
|
|
|
INP_WUNLOCK(inp);
|
1999-12-07 17:39:16 +00:00
|
|
|
if (sin6)
|
|
|
|
in6_sin6_2_sin_in_sock(addr);
|
|
|
|
pru = inetsw[ip_protox[IPPROTO_UDP]].pr_usrreqs;
|
|
|
|
/* addr will just be freed in sendit(). */
|
2008-09-22 06:44:03 +00:00
|
|
|
return ((*pru->pru_send)(so, flags, m, addr, control,
|
|
|
|
td));
|
1999-12-07 17:39:16 +00:00
|
|
|
}
|
|
|
|
}
|
2004-02-13 15:11:47 +00:00
|
|
|
#endif
|
2007-07-19 22:34:25 +00:00
|
|
|
#ifdef MAC
|
2007-10-24 19:04:04 +00:00
|
|
|
mac_inpcb_create_mbuf(inp, m);
|
2007-07-19 22:34:25 +00:00
|
|
|
#endif
|
Decompose the current single inpcbinfo lock into two locks:
- The existing ipi_lock continues to protect the global inpcb list and
inpcb counter. This lock is now relegated to a small number of
allocation and free operations, and occasional operations that walk
all connections (including, awkwardly, certain UDP multicast receive
operations -- something to revisit).
- A new ipi_hash_lock protects the two inpcbinfo hash tables for
looking up connections and bound sockets, manipulated using new
INP_HASH_*() macros. This lock, combined with inpcb locks, protects
the 4-tuple address space.
Unlike the current ipi_lock, ipi_hash_lock follows the individual inpcb
connection locks, so may be acquired while manipulating a connection on
which a lock is already held, avoiding the need to acquire the inpcbinfo
lock preemptively when a binding change might later be required. As a
result, however, lookup operations necessarily go through a reference
acquire while holding the lookup lock, later acquiring an inpcb lock --
if required.
A new function in_pcblookup() looks up connections, and accepts flags
indicating how to return the inpcb. Due to lock order changes, callers
no longer need acquire locks before performing a lookup: the lookup
routine will acquire the ipi_hash_lock as needed. In the future, it will
also be able to use alternative lookup and locking strategies
transparently to callers, such as pcbgroup lookup. New lookup flags are,
supplementing the existing INPLOOKUP_WILDCARD flag:
INPLOOKUP_RLOCKPCB - Acquire a read lock on the returned inpcb
INPLOOKUP_WLOCKPCB - Acquire a write lock on the returned inpcb
Callers must pass exactly one of these flags (for the time being).
Some notes:
- All protocols are updated to work within the new regime; especially,
TCP, UDPv4, and UDPv6. pcbinfo ipi_lock acquisitions are largely
eliminated, and global hash lock hold times are dramatically reduced
compared to previous locking.
- The TCP syncache still relies on the pcbinfo lock, something that we
may want to revisit.
- Support for reverting to the FreeBSD 7.x locking strategy in TCP input
is no longer available -- hash lookup locks are now held only very
briefly during inpcb lookup, rather than for potentially extended
periods. However, the pcbinfo ipi_lock will still be acquired if a
connection state might change such that a connection is added or
removed.
- Raw IP sockets continue to use the pcbinfo ipi_lock for protection,
due to maintaining their own hash tables.
- The interface in6_pcblookup_hash_locked() is maintained, which allows
callers to acquire hash locks and perform one or more lookups atomically
with 4-tuple allocation: this is required only for TCPv6, as there is no
in6_pcbconnect_setup(), which there should be.
- UDPv6 locking remains significantly more conservative than UDPv4
locking, which relates to source address selection. This needs
attention, as it likely significantly reduces parallelism in this code
for multithreaded socket use (such as in BIND).
- In the UDPv4 and UDPv6 multicast cases, we need to revisit locking
somewhat, as they relied on ipi_lock to stablise 4-tuple matches, which
is no longer sufficient. A second check once the inpcb lock is held
should do the trick, keeping the general case from requiring the inpcb
lock for every inpcb visited.
- This work reminds us that we need to revisit locking of the v4/v6 flags,
which may be accessed lock-free both before and after this change.
- Right now, a single lock name is used for the pcbhash lock -- this is
undesirable, and probably another argument is required to take care of
this (or a char array name field in the pcbinfo?).
This is not an MFC candidate for 8.x due to its impact on lookup and
locking semantics. It's possible some of these issues could be worked
around with compatibility wrappers, if necessary.
Reviewed by: bz
Sponsored by: Juniper Networks, Inc.
2011-05-30 09:43:55 +00:00
|
|
|
INP_HASH_WLOCK(&V_udbinfo);
|
2004-07-27 23:44:03 +00:00
|
|
|
error = udp6_output(inp, m, addr, control, td);
|
Decompose the current single inpcbinfo lock into two locks:
- The existing ipi_lock continues to protect the global inpcb list and
inpcb counter. This lock is now relegated to a small number of
allocation and free operations, and occasional operations that walk
all connections (including, awkwardly, certain UDP multicast receive
operations -- something to revisit).
- A new ipi_hash_lock protects the two inpcbinfo hash tables for
looking up connections and bound sockets, manipulated using new
INP_HASH_*() macros. This lock, combined with inpcb locks, protects
the 4-tuple address space.
Unlike the current ipi_lock, ipi_hash_lock follows the individual inpcb
connection locks, so may be acquired while manipulating a connection on
which a lock is already held, avoiding the need to acquire the inpcbinfo
lock preemptively when a binding change might later be required. As a
result, however, lookup operations necessarily go through a reference
acquire while holding the lookup lock, later acquiring an inpcb lock --
if required.
A new function in_pcblookup() looks up connections, and accepts flags
indicating how to return the inpcb. Due to lock order changes, callers
no longer need acquire locks before performing a lookup: the lookup
routine will acquire the ipi_hash_lock as needed. In the future, it will
also be able to use alternative lookup and locking strategies
transparently to callers, such as pcbgroup lookup. New lookup flags are,
supplementing the existing INPLOOKUP_WILDCARD flag:
INPLOOKUP_RLOCKPCB - Acquire a read lock on the returned inpcb
INPLOOKUP_WLOCKPCB - Acquire a write lock on the returned inpcb
Callers must pass exactly one of these flags (for the time being).
Some notes:
- All protocols are updated to work within the new regime; especially,
TCP, UDPv4, and UDPv6. pcbinfo ipi_lock acquisitions are largely
eliminated, and global hash lock hold times are dramatically reduced
compared to previous locking.
- The TCP syncache still relies on the pcbinfo lock, something that we
may want to revisit.
- Support for reverting to the FreeBSD 7.x locking strategy in TCP input
is no longer available -- hash lookup locks are now held only very
briefly during inpcb lookup, rather than for potentially extended
periods. However, the pcbinfo ipi_lock will still be acquired if a
connection state might change such that a connection is added or
removed.
- Raw IP sockets continue to use the pcbinfo ipi_lock for protection,
due to maintaining their own hash tables.
- The interface in6_pcblookup_hash_locked() is maintained, which allows
callers to acquire hash locks and perform one or more lookups atomically
with 4-tuple allocation: this is required only for TCPv6, as there is no
in6_pcbconnect_setup(), which there should be.
- UDPv6 locking remains significantly more conservative than UDPv4
locking, which relates to source address selection. This needs
attention, as it likely significantly reduces parallelism in this code
for multithreaded socket use (such as in BIND).
- In the UDPv4 and UDPv6 multicast cases, we need to revisit locking
somewhat, as they relied on ipi_lock to stablise 4-tuple matches, which
is no longer sufficient. A second check once the inpcb lock is held
should do the trick, keeping the general case from requiring the inpcb
lock for every inpcb visited.
- This work reminds us that we need to revisit locking of the v4/v6 flags,
which may be accessed lock-free both before and after this change.
- Right now, a single lock name is used for the pcbhash lock -- this is
undesirable, and probably another argument is required to take care of
this (or a char array name field in the pcbinfo?).
This is not an MFC candidate for 8.x due to its impact on lookup and
locking semantics. It's possible some of these issues could be worked
around with compatibility wrappers, if necessary.
Reviewed by: bz
Sponsored by: Juniper Networks, Inc.
2011-05-30 09:43:55 +00:00
|
|
|
INP_HASH_WUNLOCK(&V_udbinfo);
|
2010-05-09 20:32:00 +00:00
|
|
|
#ifdef INET
|
|
|
|
#endif
|
2008-04-17 21:38:18 +00:00
|
|
|
INP_WUNLOCK(inp);
|
2007-07-09 17:47:04 +00:00
|
|
|
return (error);
|
2000-07-04 16:35:15 +00:00
|
|
|
|
2007-07-09 17:47:04 +00:00
|
|
|
bad:
|
2008-04-17 21:38:18 +00:00
|
|
|
INP_WUNLOCK(inp);
|
2000-07-04 16:35:15 +00:00
|
|
|
m_freem(m);
|
2003-10-06 14:02:09 +00:00
|
|
|
return (error);
|
1999-12-07 17:39:16 +00:00
|
|
|
}
|
|
|
|
|
|
|
|
struct pr_usrreqs udp6_usrreqs = {
|
2004-11-08 14:44:54 +00:00
|
|
|
.pru_abort = udp6_abort,
|
|
|
|
.pru_attach = udp6_attach,
|
|
|
|
.pru_bind = udp6_bind,
|
|
|
|
.pru_connect = udp6_connect,
|
|
|
|
.pru_control = in6_control,
|
|
|
|
.pru_detach = udp6_detach,
|
|
|
|
.pru_disconnect = udp6_disconnect,
|
|
|
|
.pru_peeraddr = in6_mapped_peeraddr,
|
|
|
|
.pru_send = udp6_send,
|
|
|
|
.pru_shutdown = udp_shutdown,
|
|
|
|
.pru_sockaddr = in6_mapped_sockaddr,
|
2008-07-08 10:15:23 +00:00
|
|
|
.pru_soreceive = soreceive_dgram,
|
|
|
|
.pru_sosend = sosend_dgram,
|
2006-07-21 17:11:15 +00:00
|
|
|
.pru_sosetlabel = in_pcbsosetlabel,
|
|
|
|
.pru_close = udp6_close
|
1999-12-07 17:39:16 +00:00
|
|
|
};
|