2005-01-06 23:35:40 +00:00
|
|
|
/*-
|
1994-05-24 10:09:53 +00:00
|
|
|
* Copyright (c) 1982, 1986, 1989, 1991, 1993
|
2005-09-23 12:41:06 +00:00
|
|
|
* The Regents of the University of California.
|
2006-03-17 13:52:57 +00:00
|
|
|
* Copyright 2004-2006 Robert N. M. Watson
|
2005-09-23 12:41:06 +00:00
|
|
|
* All rights reserved.
|
1994-05-24 10:09:53 +00:00
|
|
|
*
|
|
|
|
* Redistribution and use in source and binary forms, with or without
|
|
|
|
* modification, are permitted provided that the following conditions
|
|
|
|
* are met:
|
|
|
|
* 1. Redistributions of source code must retain the above copyright
|
|
|
|
* notice, this list of conditions and the following disclaimer.
|
|
|
|
* 2. Redistributions in binary form must reproduce the above copyright
|
|
|
|
* notice, this list of conditions and the following disclaimer in the
|
|
|
|
* documentation and/or other materials provided with the distribution.
|
|
|
|
* 4. Neither the name of the University nor the names of its contributors
|
|
|
|
* may be used to endorse or promote products derived from this software
|
|
|
|
* without specific prior written permission.
|
|
|
|
*
|
|
|
|
* THIS SOFTWARE IS PROVIDED BY THE REGENTS AND CONTRIBUTORS ``AS IS'' AND
|
|
|
|
* ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE
|
|
|
|
* IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE
|
|
|
|
* ARE DISCLAIMED. IN NO EVENT SHALL THE REGENTS OR CONTRIBUTORS BE LIABLE
|
|
|
|
* FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL
|
|
|
|
* DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS
|
|
|
|
* OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION)
|
|
|
|
* HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT
|
|
|
|
* LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY
|
|
|
|
* OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF
|
|
|
|
* SUCH DAMAGE.
|
|
|
|
*
|
1995-05-11 00:13:26 +00:00
|
|
|
* From: @(#)uipc_usrreq.c 8.3 (Berkeley) 1/4/94
|
1994-05-24 10:09:53 +00:00
|
|
|
*/
|
|
|
|
|
2006-07-23 20:06:45 +00:00
|
|
|
/*
|
|
|
|
* UNIX Domain (Local) Sockets
|
|
|
|
*
|
|
|
|
* This is an implementation of UNIX (local) domain sockets. Each socket has
|
|
|
|
* an associated struct unpcb (UNIX protocol control block). Stream sockets
|
|
|
|
* may be connected to 0 or 1 other socket. Datagram sockets may be
|
|
|
|
* connected to 0, 1, or many other sockets. Sockets may be created and
|
|
|
|
* connected in pairs (socketpair(2)), or bound/connected to using the file
|
|
|
|
* system name space. For most purposes, only the receive socket buffer is
|
|
|
|
* used, as sending on one socket delivers directly to the receive socket
|
|
|
|
* buffer of a second socket. The implementation is substantially
|
|
|
|
* complicated by the fact that "ancillary data", such as file descriptors or
|
2006-07-23 21:01:09 +00:00
|
|
|
* credentials, may be passed across UNIX domain sockets. The potential for
|
|
|
|
* passing UNIX domain sockets over other UNIX domain sockets requires the
|
|
|
|
* implementation of a simple garbage collector to find and tear down cycles
|
|
|
|
* of disconnected sockets.
|
2006-07-23 20:06:45 +00:00
|
|
|
*/
|
|
|
|
|
2003-06-11 00:56:59 +00:00
|
|
|
#include <sys/cdefs.h>
|
|
|
|
__FBSDID("$FreeBSD$");
|
|
|
|
|
2002-07-31 03:03:22 +00:00
|
|
|
#include "opt_mac.h"
|
|
|
|
|
1994-05-24 10:09:53 +00:00
|
|
|
#include <sys/param.h>
|
2001-05-01 08:13:21 +00:00
|
|
|
#include <sys/domain.h>
|
2002-04-30 01:54:54 +00:00
|
|
|
#include <sys/fcntl.h>
|
1997-11-23 10:43:49 +00:00
|
|
|
#include <sys/malloc.h> /* XXX must be before <sys/file.h> */
|
2006-04-21 09:25:40 +00:00
|
|
|
#include <sys/eventhandler.h>
|
1997-02-24 20:30:58 +00:00
|
|
|
#include <sys/file.h>
|
2002-04-30 01:54:54 +00:00
|
|
|
#include <sys/filedesc.h>
|
|
|
|
#include <sys/jail.h>
|
|
|
|
#include <sys/kernel.h>
|
|
|
|
#include <sys/lock.h>
|
2002-08-01 01:18:42 +00:00
|
|
|
#include <sys/mac.h>
|
1997-02-24 20:30:58 +00:00
|
|
|
#include <sys/mbuf.h>
|
2006-01-30 08:19:01 +00:00
|
|
|
#include <sys/mount.h>
|
2002-04-30 01:54:54 +00:00
|
|
|
#include <sys/mutex.h>
|
1997-02-24 20:30:58 +00:00
|
|
|
#include <sys/namei.h>
|
|
|
|
#include <sys/proc.h>
|
1994-05-24 10:09:53 +00:00
|
|
|
#include <sys/protosw.h>
|
2002-04-30 01:54:54 +00:00
|
|
|
#include <sys/resourcevar.h>
|
1994-05-24 10:09:53 +00:00
|
|
|
#include <sys/socket.h>
|
|
|
|
#include <sys/socketvar.h>
|
2002-04-30 01:54:54 +00:00
|
|
|
#include <sys/signalvar.h>
|
1997-02-24 20:30:58 +00:00
|
|
|
#include <sys/stat.h>
|
2002-04-30 01:54:54 +00:00
|
|
|
#include <sys/sx.h>
|
1997-02-24 20:30:58 +00:00
|
|
|
#include <sys/sysctl.h>
|
2002-04-30 01:54:54 +00:00
|
|
|
#include <sys/systm.h>
|
Correct a number of serious and closely related bugs in the UNIX domain
socket file descriptor garbage collection code, which is intended to
detect and clear cycles of orphaned file descriptors that are "in-flight"
in a socket when that socket is closed before they are received. The
algorithm present was both run at poor times (resulting in recursion and
reentrance), and also buggy in the presence of parallelism. In order to
fix these problems, make the following changes:
- When there are in-flight sockets and a UNIX domain socket is destroyed,
asynchronously schedule the garbage collector, rather than running it
synchronously in the current context. This avoids lock order issues
when the garbage collection code reenters the UNIX domain socket code,
avoiding lock order reversals, deadlocks, etc. Run the code
asynchronously in a task queue.
- In the garbage collector, when skipping file descriptors that have
entered a closing state (i.e., have f_count == 0), re-test the FDEFER
flag, and decrement unp_defer. As file descriptors can now transition
to a closed state, while the garbage collector is running, it is no
longer the case that unp_defer will remain an accurate count of
deferred sockets in the mark portion of the GC algorithm. Otherwise,
the garbage collector will loop waiting for unp_defer to reach
zero, which it will never do as it is skipping file descriptors that
were marked in an earlier pass, but now closed.
- Acquire the UNIX domain socket subsystem lock in unp_discard() when
modifying the unp_rights counter, or a read/write race is risked with
other threads also manipulating the counter.
While here:
- Remove #if 0'd code regarding acquiring the socket buffer sleep lock in
the garbage collector, this is not required as we are able to use the
socket buffer receive lock to protect scanning the receive buffer for
in-flight file descriptors on the socket buffer.
- Annotate that the description of the garbage collector implementation
is increasingly inaccurate and needs to be updated.
- Add counters of the number of deferred garbage collections and recycled
file descriptors. This will be removed and is here temporarily for
debugging purposes.
With these changes in place, the unp_passfd regression test now appears
to pass consistently on UP and SMP systems for extended runs,
whereas before it hung quickly or panicked, depending on which bug was
triggered.
Reported by: Philip Kizer <pckizer at nostrum dot com>
MFC after: 2 weeks
2005-11-10 16:06:04 +00:00
|
|
|
#include <sys/taskqueue.h>
|
1994-05-24 10:09:53 +00:00
|
|
|
#include <sys/un.h>
|
1998-05-15 20:11:40 +00:00
|
|
|
#include <sys/unpcb.h>
|
1994-05-24 10:09:53 +00:00
|
|
|
#include <sys/vnode.h>
|
|
|
|
|
2002-03-20 04:11:52 +00:00
|
|
|
#include <vm/uma.h>
|
1998-05-15 20:11:40 +00:00
|
|
|
|
2002-03-20 04:11:52 +00:00
|
|
|
static uma_zone_t unp_zone;
|
1998-05-15 20:11:40 +00:00
|
|
|
static unp_gen_t unp_gencnt;
|
|
|
|
static u_int unp_count;
|
|
|
|
|
|
|
|
static struct unp_head unp_shead, unp_dhead;
|
|
|
|
|
1994-05-24 10:09:53 +00:00
|
|
|
/*
|
|
|
|
* Unix communications domain.
|
|
|
|
*
|
|
|
|
* TODO:
|
|
|
|
* SEQPACKET, RDM
|
|
|
|
* rethink name space problems
|
|
|
|
* need a proper out-of-band
|
1998-05-15 20:11:40 +00:00
|
|
|
* lock pushdown
|
1994-05-24 10:09:53 +00:00
|
|
|
*/
|
2004-06-04 04:07:08 +00:00
|
|
|
static const struct sockaddr sun_noname = { sizeof(sun_noname), AF_LOCAL };
|
1995-12-14 09:55:16 +00:00
|
|
|
static ino_t unp_ino; /* prototype for fake inode numbers */
|
2005-04-13 00:01:46 +00:00
|
|
|
struct mbuf *unp_addsockcred(struct thread *, struct mbuf *);
|
1995-12-14 09:55:16 +00:00
|
|
|
|
2006-07-23 10:19:04 +00:00
|
|
|
/*
|
|
|
|
* Both send and receive buffers are allocated PIPSIZ bytes of buffering for
|
|
|
|
* stream sockets, although the total for sender and receiver is actually
|
|
|
|
* only PIPSIZ.
|
|
|
|
*
|
|
|
|
* Datagram sockets really use the sendspace as the maximum datagram size,
|
|
|
|
* and don't really want to reserve the sendspace. Their recvspace should be
|
|
|
|
* large enough for at least one max-size datagram plus address.
|
|
|
|
*/
|
|
|
|
#ifndef PIPSIZ
|
|
|
|
#define PIPSIZ 8192
|
|
|
|
#endif
|
|
|
|
static u_long unpst_sendspace = PIPSIZ;
|
|
|
|
static u_long unpst_recvspace = PIPSIZ;
|
|
|
|
static u_long unpdg_sendspace = 2*1024; /* really max datagram size */
|
|
|
|
static u_long unpdg_recvspace = 4*1024;
|
|
|
|
|
|
|
|
static int unp_rights; /* file descriptors in flight */
|
|
|
|
|
2006-08-07 12:02:43 +00:00
|
|
|
SYSCTL_NODE(_net, PF_LOCAL, local, CTLFLAG_RW, 0, "Local domain");
|
|
|
|
SYSCTL_NODE(_net_local, SOCK_STREAM, stream, CTLFLAG_RW, 0, "SOCK_STREAM");
|
|
|
|
SYSCTL_NODE(_net_local, SOCK_DGRAM, dgram, CTLFLAG_RW, 0, "SOCK_DGRAM");
|
|
|
|
|
2006-07-23 10:19:04 +00:00
|
|
|
SYSCTL_ULONG(_net_local_stream, OID_AUTO, sendspace, CTLFLAG_RW,
|
|
|
|
&unpst_sendspace, 0, "");
|
|
|
|
SYSCTL_ULONG(_net_local_stream, OID_AUTO, recvspace, CTLFLAG_RW,
|
|
|
|
&unpst_recvspace, 0, "");
|
|
|
|
SYSCTL_ULONG(_net_local_dgram, OID_AUTO, maxdgram, CTLFLAG_RW,
|
|
|
|
&unpdg_sendspace, 0, "");
|
|
|
|
SYSCTL_ULONG(_net_local_dgram, OID_AUTO, recvspace, CTLFLAG_RW,
|
|
|
|
&unpdg_recvspace, 0, "");
|
|
|
|
SYSCTL_INT(_net_local, OID_AUTO, inflight, CTLFLAG_RD, &unp_rights, 0, "");
|
|
|
|
|
2004-08-16 01:52:04 +00:00
|
|
|
/*
|
|
|
|
* Currently, UNIX domain sockets are protected by a single subsystem lock,
|
|
|
|
* which covers global data structures and variables, the contents of each
|
|
|
|
* per-socket unpcb structure, and the so_pcb field in sockets attached to
|
|
|
|
 * the UNIX domain.  This provides for a moderate degree of parallelism, as
|
|
|
|
* receive operations on UNIX domain sockets do not need to acquire the
|
|
|
|
* subsystem lock. Finer grained locking to permit send() without acquiring
|
|
|
|
* a global lock would be a logical next step.
|
|
|
|
*
|
|
|
|
 * The UNIX domain socket lock precedes all socket layer locks, including the
|
|
|
|
* socket lock and socket buffer lock, permitting UNIX domain socket code to
|
|
|
|
* call into socket support routines without releasing its locks.
|
|
|
|
*
|
|
|
|
* Some caution is required in areas where the UNIX domain socket code enters
|
|
|
|
* VFS in order to create or find rendezvous points. This results in
|
|
|
|
* dropping of the UNIX domain socket subsystem lock, acquisition of the
|
|
|
|
* Giant lock, and potential sleeping. This increases the chances of races,
|
|
|
|
* and exposes weaknesses in the socket->protocol API by offering poor
|
|
|
|
* failure modes.
|
|
|
|
*/
|
Introduce a subsystem lock around UNIX domain sockets in order to protect
global and allocated variables. This strategy is derived from work
originally developed by BSDi for BSD/OS, and applied to FreeBSD by Sam
Leffler:
- Add unp_mtx, a global mutex which will protect all UNIX domain socket
related variables, structures, etc.
- Add UNP_LOCK(), UNP_UNLOCK(), UNP_LOCK_ASSERT() macros.
- Acquire unp_mtx on entering most UNIX domain socket code,
drop/re-acquire around calls into VFS, and release it on return.
- Avoid performing sodupsockaddr() while holding the mutex, so in general
move to allocating storage before acquiring the mutex to copy the data.
- Make a stack copy of the xucred rather than copying out while holding
unp_mtx. Copy the peer credential out after releasing the mutex.
- Add additional assertions of vnode locks following VOP_CREATE().
A few notes:
- Use of an sx lock for the file list mutex may cause problems with regard
to unp_mtx when garbage collection passes file descriptors.
- The locking in unp_pcblist() for sysctl monitoring is correct subject to
the unpcb zone not returning memory for reuse by other subsystems
(consistent with similar existing concerns).
- Sam's version of this change, as with the BSD/OS version, made use of
both a global lock and per-unpcb locks. However, in practice, the
global lock covered all accesses, so I have simplified out the unpcb
locks in the interest of getting this merged faster (reducing the
overhead but not sacrificing granularity in most cases). We will want
to explore possibilities for improving lock granularity in this code in
the future.
Submitted by: sam
Sponsored by:	FreeBSD Foundation
Obtained from: BSD/OS 5 snapshot provided by BSDi
2004-06-10 21:34:38 +00:00
|
|
|
static struct mtx unp_mtx;
|
|
|
|
#define UNP_LOCK_INIT() \
|
|
|
|
mtx_init(&unp_mtx, "unp", NULL, MTX_DEF)
|
|
|
|
#define UNP_LOCK() mtx_lock(&unp_mtx)
|
|
|
|
#define UNP_UNLOCK() mtx_unlock(&unp_mtx)
|
|
|
|
#define UNP_LOCK_ASSERT() mtx_assert(&unp_mtx, MA_OWNED)
|
2004-08-19 01:45:16 +00:00
|
|
|
#define UNP_UNLOCK_ASSERT() mtx_assert(&unp_mtx, MA_NOTOWNED)
|
2004-06-10 21:34:38 +00:00
|
|
|
|
2005-11-10 16:06:04 +00:00
|
|
|
/*
|
|
|
|
* Garbage collection of cyclic file descriptor/socket references occurs
|
|
|
|
* asynchronously in a taskqueue context in order to avoid recursion and
|
|
|
|
* reentrance in the UNIX domain socket, file descriptor, and socket layer
|
|
|
|
* code. See unp_gc() for a full description.
|
|
|
|
*/
|
|
|
|
static struct task unp_gc_task;
|
|
|
|
|
2002-03-24 05:09:11 +00:00
|
|
|
static int	unp_connect(struct socket *, struct sockaddr *, struct thread *);
|
2005-04-13 00:01:46 +00:00
|
|
|
static int unp_connect2(struct socket *so, struct socket *so2, int);
|
2002-03-19 21:25:46 +00:00
|
|
|
static void unp_disconnect(struct unpcb *);
|
|
|
|
static void unp_shutdown(struct unpcb *);
|
|
|
|
static void unp_drop(struct unpcb *, int);
|
2005-11-10 16:06:04 +00:00
|
|
|
static void unp_gc(__unused void *, int);
|
2002-03-19 21:25:46 +00:00
|
|
|
static void unp_scan(struct mbuf *, void (*)(struct file *));
|
|
|
|
static void unp_mark(struct file *);
|
|
|
|
static void unp_discard(struct file *);
|
|
|
|
static void unp_freerights(struct file **, int);
|
|
|
|
static int unp_internalize(struct mbuf **, struct thread *);
|
2005-10-30 19:44:40 +00:00
|
|
|
static int unp_listen(struct socket *, struct unpcb *, int,
|
|
|
|
struct thread *);
|
1995-12-14 09:55:16 +00:00
|
|
|
|
2006-08-07 12:02:43 +00:00
|
|
|
/*
|
|
|
|
* Definitions of protocols supported in the LOCAL domain.
|
|
|
|
*/
|
|
|
|
static struct domain localdomain;
|
|
|
|
static struct protosw localsw[] = {
|
|
|
|
{
|
|
|
|
.pr_type = SOCK_STREAM,
|
|
|
|
.pr_domain = &localdomain,
|
|
|
|
.pr_flags = PR_CONNREQUIRED|PR_WANTRCVD|PR_RIGHTS,
|
|
|
|
.pr_ctloutput = &uipc_ctloutput,
|
|
|
|
.pr_usrreqs = &uipc_usrreqs
|
|
|
|
},
|
|
|
|
{
|
|
|
|
.pr_type = SOCK_DGRAM,
|
|
|
|
.pr_domain = &localdomain,
|
|
|
|
.pr_flags = PR_ATOMIC|PR_ADDR|PR_RIGHTS,
|
|
|
|
.pr_usrreqs = &uipc_usrreqs
|
|
|
|
},
|
|
|
|
};
|
|
|
|
|
|
|
|
static struct domain localdomain = {
|
|
|
|
.dom_family = AF_LOCAL,
|
|
|
|
.dom_name = "local",
|
|
|
|
.dom_init = unp_init,
|
|
|
|
.dom_externalize = unp_externalize,
|
|
|
|
.dom_dispose = unp_dispose,
|
|
|
|
.dom_protosw = localsw,
|
|
|
|
.dom_protoswNPROTOSW = &localsw[sizeof(localsw)/sizeof(localsw[0])]
|
|
|
|
};
|
|
|
|
DOMAIN_SET(local);
|
|
|
|
|
2006-04-01 15:15:05 +00:00
|
|
|
static void
|
1997-04-27 20:01:29 +00:00
|
|
|
uipc_abort(struct socket *so)
|
|
|
|
{
|
2004-08-16 04:41:03 +00:00
|
|
|
struct unpcb *unp;
|
1994-05-24 10:09:53 +00:00
|
|
|
|
2004-08-16 04:41:03 +00:00
|
|
|
unp = sotounpcb(so);
|
2006-03-17 13:52:57 +00:00
|
|
|
KASSERT(unp != NULL, ("uipc_abort: unp == NULL"));
|
|
|
|
UNP_LOCK();
|
1997-04-27 20:01:29 +00:00
|
|
|
unp_drop(unp, ECONNABORTED);
|
2006-07-21 17:11:15 +00:00
|
|
|
UNP_UNLOCK();
|
1997-04-27 20:01:29 +00:00
|
|
|
}
|
|
|
|
|
|
|
|
static int
|
1997-08-16 19:16:27 +00:00
|
|
|
uipc_accept(struct socket *so, struct sockaddr **nam)
|
1994-05-24 10:09:53 +00:00
|
|
|
{
|
2004-08-16 04:41:03 +00:00
|
|
|
struct unpcb *unp;
|
2004-06-10 21:34:38 +00:00
|
|
|
const struct sockaddr *sa;
|
1994-05-24 10:09:53 +00:00
|
|
|
|
1997-04-27 20:01:29 +00:00
|
|
|
/*
|
2006-07-22 17:24:55 +00:00
|
|
|
* Pass back name of connected socket, if it was bound and we are
|
|
|
|
* still connected (our peer may have closed already!).
|
1997-04-27 20:01:29 +00:00
|
|
|
*/
|
2006-03-17 13:52:57 +00:00
|
|
|
unp = sotounpcb(so);
|
|
|
|
KASSERT(unp != NULL, ("uipc_accept: unp == NULL"));
|
2004-06-10 21:34:38 +00:00
|
|
|
*nam = malloc(sizeof(struct sockaddr_un), M_SONAME, M_WAITOK);
|
|
|
|
UNP_LOCK();
|
|
|
|
if (unp->unp_conn != NULL && unp->unp_conn->unp_addr != NULL)
|
|
|
|
sa = (struct sockaddr *) unp->unp_conn->unp_addr;
|
|
|
|
else
|
|
|
|
sa = &sun_noname;
|
|
|
|
bcopy(sa, *nam, sa->sa_len);
|
|
|
|
UNP_UNLOCK();
|
2004-01-11 19:48:19 +00:00
|
|
|
return (0);
|
1997-04-27 20:01:29 +00:00
|
|
|
}
|
1994-05-24 10:09:53 +00:00
|
|
|
|
1997-04-27 20:01:29 +00:00
|
|
|
static int
|
2001-09-12 08:38:13 +00:00
|
|
|
uipc_attach(struct socket *so, int proto, struct thread *td)
|
1997-04-27 20:01:29 +00:00
|
|
|
{
|
2006-07-23 10:25:28 +00:00
|
|
|
struct unpcb *unp;
|
|
|
|
int error;
|
|
|
|
|
|
|
|
KASSERT(so->so_pcb == NULL, ("uipc_attach: so_pcb != NULL"));
|
|
|
|
if (so->so_snd.sb_hiwat == 0 || so->so_rcv.sb_hiwat == 0) {
|
|
|
|
switch (so->so_type) {
|
|
|
|
case SOCK_STREAM:
|
|
|
|
error = soreserve(so, unpst_sendspace, unpst_recvspace);
|
|
|
|
break;
|
|
|
|
|
|
|
|
case SOCK_DGRAM:
|
|
|
|
error = soreserve(so, unpdg_sendspace, unpdg_recvspace);
|
|
|
|
break;
|
|
|
|
|
|
|
|
default:
|
|
|
|
panic("unp_attach");
|
|
|
|
}
|
|
|
|
if (error)
|
|
|
|
return (error);
|
|
|
|
}
|
|
|
|
unp = uma_zalloc(unp_zone, M_WAITOK | M_ZERO);
|
|
|
|
if (unp == NULL)
|
|
|
|
return (ENOBUFS);
|
|
|
|
LIST_INIT(&unp->unp_refs);
|
|
|
|
unp->unp_socket = so;
|
|
|
|
so->so_pcb = unp;
|
1994-05-24 10:09:53 +00:00
|
|
|
|
2006-07-23 10:25:28 +00:00
|
|
|
UNP_LOCK();
|
|
|
|
unp->unp_gencnt = ++unp_gencnt;
|
|
|
|
unp_count++;
|
2006-08-13 23:16:59 +00:00
|
|
|
LIST_INSERT_HEAD(so->so_type == SOCK_DGRAM ? &unp_dhead : &unp_shead,
|
|
|
|
unp, unp_link);
|
2006-07-23 10:25:28 +00:00
|
|
|
UNP_UNLOCK();
|
|
|
|
|
|
|
|
return (0);
|
1997-04-27 20:01:29 +00:00
|
|
|
}
|
1994-05-24 10:09:53 +00:00
|
|
|
|
1997-04-27 20:01:29 +00:00
|
|
|
static int
|
2001-09-12 08:38:13 +00:00
|
|
|
uipc_bind(struct socket *so, struct sockaddr *nam, struct thread *td)
|
1997-04-27 20:01:29 +00:00
|
|
|
{
|
2006-07-23 11:02:12 +00:00
|
|
|
struct sockaddr_un *soun = (struct sockaddr_un *)nam;
|
|
|
|
struct vattr vattr;
|
|
|
|
int error, namelen;
|
|
|
|
struct nameidata nd;
|
2004-08-16 04:41:03 +00:00
|
|
|
struct unpcb *unp;
|
2006-07-23 11:02:12 +00:00
|
|
|
struct vnode *vp;
|
|
|
|
struct mount *mp;
|
|
|
|
char *buf;
|
1994-05-24 10:09:53 +00:00
|
|
|
|
2004-08-16 04:41:03 +00:00
|
|
|
unp = sotounpcb(so);
|
2006-03-17 13:52:57 +00:00
|
|
|
KASSERT(unp != NULL, ("uipc_bind: unp == NULL"));
|
2006-07-23 12:01:14 +00:00
|
|
|
|
|
|
|
namelen = soun->sun_len - offsetof(struct sockaddr_un, sun_path);
|
|
|
|
if (namelen <= 0)
|
|
|
|
return (EINVAL);
|
	/*
	 * We don't allow simultaneous bind() calls on a single UNIX domain
	 * socket, so flag in-progress operations, and return an error if an
	 * operation is already in progress.
	 *
	 * Historically, we have not allowed a socket to be rebound, so this
	 * also returns an error.  Not allowing re-binding certainly
	 * simplifies the implementation and avoids a great many possible
	 * failure modes.
	 */
	UNP_LOCK();
	if (unp->unp_vnode != NULL) {
		UNP_UNLOCK();
		return (EINVAL);
	}
	if (unp->unp_flags & UNP_BINDING) {
		UNP_UNLOCK();
		return (EALREADY);
	}
	unp->unp_flags |= UNP_BINDING;
	UNP_UNLOCK();

	buf = malloc(namelen + 1, M_TEMP, M_WAITOK);
	strlcpy(buf, soun->sun_path, namelen + 1);

	mtx_lock(&Giant);
restart:
	mtx_assert(&Giant, MA_OWNED);
	NDINIT(&nd, CREATE, NOFOLLOW | LOCKPARENT | SAVENAME, UIO_SYSSPACE,
	    buf, td);
/* SHOULD BE ABLE TO ADOPT EXISTING AND wakeup() ALA FIFO's */
	error = namei(&nd);
	if (error)
		goto error;
	vp = nd.ni_vp;
	if (vp != NULL || vn_start_write(nd.ni_dvp, &mp, V_NOWAIT) != 0) {
		NDFREE(&nd, NDF_ONLY_PNBUF);
		if (nd.ni_dvp == vp)
			vrele(nd.ni_dvp);
		else
			vput(nd.ni_dvp);
		if (vp != NULL) {
			vrele(vp);
			error = EADDRINUSE;
			goto error;
		}
		error = vn_start_write(NULL, &mp, V_XSLEEP | PCATCH);
		if (error)
			goto error;
		goto restart;
	}
	VATTR_NULL(&vattr);
	vattr.va_type = VSOCK;
	vattr.va_mode = (ACCESSPERMS & ~td->td_proc->p_fd->fd_cmask);
#ifdef MAC
	error = mac_check_vnode_create(td->td_ucred, nd.ni_dvp, &nd.ni_cnd,
	    &vattr);
#endif
	if (error == 0) {
		VOP_LEASE(nd.ni_dvp, td, td->td_ucred, LEASE_WRITE);
		error = VOP_CREATE(nd.ni_dvp, &nd.ni_vp, &nd.ni_cnd, &vattr);
	}
	NDFREE(&nd, NDF_ONLY_PNBUF);
	vput(nd.ni_dvp);
	if (error) {
		vn_finished_write(mp);
		goto error;
	}
	vp = nd.ni_vp;
	ASSERT_VOP_LOCKED(vp, "uipc_bind");
	soun = (struct sockaddr_un *)sodupsockaddr(nam, M_WAITOK);
	UNP_LOCK();
	vp->v_socket = unp->unp_socket;
	unp->unp_vnode = vp;
	unp->unp_addr = soun;
	unp->unp_flags &= ~UNP_BINDING;
	UNP_UNLOCK();
	VOP_UNLOCK(vp, 0, td);
	vn_finished_write(mp);
	mtx_unlock(&Giant);
	free(buf, M_TEMP);
	return (0);

error:
	UNP_LOCK();
	unp->unp_flags &= ~UNP_BINDING;
	UNP_UNLOCK();
	mtx_unlock(&Giant);
	free(buf, M_TEMP);
	return (error);
}

static int
uipc_connect(struct socket *so, struct sockaddr *nam, struct thread *td)
{
	int error;

	KASSERT(td == curthread, ("uipc_connect: td != curthread"));
	UNP_LOCK();
	error = unp_connect(so, nam, td);
	UNP_UNLOCK();
	return (error);
}

/*
 * XXXRW: Should also unbind?
 */
static void
uipc_close(struct socket *so)
{
	struct unpcb *unp;

	unp = sotounpcb(so);
	KASSERT(unp != NULL, ("uipc_close: unp == NULL"));
	UNP_LOCK();
	unp_disconnect(unp);
	UNP_UNLOCK();
}

int
uipc_connect2(struct socket *so1, struct socket *so2)
{
	struct unpcb *unp;
	int error;

	unp = sotounpcb(so1);
	KASSERT(unp != NULL, ("uipc_connect2: unp == NULL"));
	UNP_LOCK();
	error = unp_connect2(so1, so2, PRU_CONNECT2);
	UNP_UNLOCK();
	return (error);
}

/* control is EOPNOTSUPP */

static void
uipc_detach(struct socket *so)
{
	int local_unp_rights;
	struct unpcb *unp;
	struct vnode *vp;

	unp = sotounpcb(so);
	KASSERT(unp != NULL, ("uipc_detach: unp == NULL"));
	UNP_LOCK();
	LIST_REMOVE(unp, unp_link);
	unp->unp_gencnt = ++unp_gencnt;
	--unp_count;
	if ((vp = unp->unp_vnode) != NULL) {
		/*
		 * XXXRW: should v_socket be frobbed only while holding
		 * Giant?
		 */
		unp->unp_vnode->v_socket = NULL;
		unp->unp_vnode = NULL;
	}
	if (unp->unp_conn != NULL)
		unp_disconnect(unp);
	while (!LIST_EMPTY(&unp->unp_refs)) {
		struct unpcb *ref = LIST_FIRST(&unp->unp_refs);
		unp_drop(ref, ECONNRESET);
	}
	unp->unp_socket->so_pcb = NULL;
	local_unp_rights = unp_rights;
	UNP_UNLOCK();
	if (unp->unp_addr != NULL)
		FREE(unp->unp_addr, M_SONAME);
	uma_zfree(unp_zone, unp);
	if (vp) {
		int vfslocked;

		vfslocked = VFS_LOCK_GIANT(vp->v_mount);
		vrele(vp);
		VFS_UNLOCK_GIANT(vfslocked);
	}
	if (local_unp_rights)
		taskqueue_enqueue(taskqueue_thread, &unp_gc_task);
}

static int
uipc_disconnect(struct socket *so)
{
	struct unpcb *unp;

	unp = sotounpcb(so);
	KASSERT(unp != NULL, ("uipc_disconnect: unp == NULL"));
	UNP_LOCK();
	unp_disconnect(unp);
	UNP_UNLOCK();
	return (0);
}

static int
uipc_listen(struct socket *so, int backlog, struct thread *td)
{
	struct unpcb *unp;
	int error;

	unp = sotounpcb(so);
	KASSERT(unp != NULL, ("uipc_listen: unp == NULL"));
	UNP_LOCK();
	if (unp->unp_vnode == NULL) {
		UNP_UNLOCK();
		return (EINVAL);
	}
	error = unp_listen(so, unp, backlog, td);
	UNP_UNLOCK();
	return (error);
}

static int
uipc_peeraddr(struct socket *so, struct sockaddr **nam)
{
	struct unpcb *unp;
	const struct sockaddr *sa;

	unp = sotounpcb(so);
	KASSERT(unp != NULL, ("uipc_peeraddr: unp == NULL"));
	*nam = malloc(sizeof(struct sockaddr_un), M_SONAME, M_WAITOK);
	UNP_LOCK();
	if (unp->unp_conn != NULL && unp->unp_conn->unp_addr != NULL)
		sa = (struct sockaddr *) unp->unp_conn->unp_addr;
	else {
		/*
		 * XXX: It seems that this test always fails even when
		 * connection is established.  So, this else clause is
		 * added as workaround to return PF_LOCAL sockaddr.
		 */
		sa = &sun_noname;
	}
	bcopy(sa, *nam, sa->sa_len);
	UNP_UNLOCK();
	return (0);
}

static int
uipc_rcvd(struct socket *so, int flags)
{
	struct unpcb *unp;
	struct socket *so2;
	u_int mbcnt, sbcc;
	u_long newhiwat;

	unp = sotounpcb(so);
	KASSERT(unp != NULL, ("uipc_rcvd: unp == NULL"));
	switch (so->so_type) {
	case SOCK_DGRAM:
		panic("uipc_rcvd DGRAM?");
		/*NOTREACHED*/

	case SOCK_STREAM:
		/*
		 * Adjust backpressure on sender and wakeup any waiting to
		 * write.
		 */
		SOCKBUF_LOCK(&so->so_rcv);
		mbcnt = so->so_rcv.sb_mbcnt;
		sbcc = so->so_rcv.sb_cc;
		SOCKBUF_UNLOCK(&so->so_rcv);
		UNP_LOCK();
		if (unp->unp_conn == NULL) {
			UNP_UNLOCK();
			break;
		}
		so2 = unp->unp_conn->unp_socket;
		SOCKBUF_LOCK(&so2->so_snd);
		so2->so_snd.sb_mbmax += unp->unp_mbcnt - mbcnt;
		newhiwat = so2->so_snd.sb_hiwat + unp->unp_cc - sbcc;
		(void)chgsbsize(so2->so_cred->cr_uidinfo, &so2->so_snd.sb_hiwat,
		    newhiwat, RLIM_INFINITY);
		sowwakeup_locked(so2);
		unp->unp_mbcnt = mbcnt;
		unp->unp_cc = sbcc;
		UNP_UNLOCK();
		break;

	default:
		panic("uipc_rcvd unknown socktype");
	}
	return (0);
}

/* pru_rcvoob is EOPNOTSUPP */

static int
uipc_send(struct socket *so, int flags, struct mbuf *m, struct sockaddr *nam,
    struct mbuf *control, struct thread *td)
{
	struct unpcb *unp, *unp2;
	struct socket *so2;
	u_int mbcnt, sbcc;
	u_long newhiwat;
	int error = 0;

	unp = sotounpcb(so);
	KASSERT(unp != NULL, ("uipc_send: unp == NULL"));
	if (flags & PRUS_OOB) {
		error = EOPNOTSUPP;
		goto release;
	}

	if (control != NULL && (error = unp_internalize(&control, td)))
		goto release;

	UNP_LOCK();
	switch (so->so_type) {
	case SOCK_DGRAM:
	{
		const struct sockaddr *from;

		if (nam != NULL) {
			if (unp->unp_conn != NULL) {
				error = EISCONN;
				break;
			}
			error = unp_connect(so, nam, td);
			if (error)
				break;
		}
		/*
		 * Because connect() and send() are non-atomic in a sendto()
		 * with a target address, it's possible that the socket will
		 * have disconnected before the send() can run.  In that case
		 * return the slightly counter-intuitive but otherwise
		 * correct error that the socket is not connected.
		 */
		unp2 = unp->unp_conn;
		if (unp2 == NULL) {
			error = ENOTCONN;
			break;
		}
		so2 = unp2->unp_socket;
		if (unp->unp_addr != NULL)
			from = (struct sockaddr *)unp->unp_addr;
		else
			from = &sun_noname;
		if (unp2->unp_flags & UNP_WANTCRED)
			control = unp_addsockcred(td, control);
		SOCKBUF_LOCK(&so2->so_rcv);
		if (sbappendaddr_locked(&so2->so_rcv, from, m, control)) {
			sorwakeup_locked(so2);
			m = NULL;
			control = NULL;
		} else {
			SOCKBUF_UNLOCK(&so2->so_rcv);
			error = ENOBUFS;
		}
		if (nam != NULL)
			unp_disconnect(unp);
		break;
	}

	case SOCK_STREAM:
		/*
		 * Connect if not connected yet.
		 *
		 * Note: A better implementation would complain if not equal
		 * to the peer's address.
		 */
		if ((so->so_state & SS_ISCONNECTED) == 0) {
			if (nam != NULL) {
				error = unp_connect(so, nam, td);
				if (error)
					break;	/* XXX */
			} else {
				error = ENOTCONN;
				break;
			}
		}

		/* Lockless read. */
		if (so->so_snd.sb_state & SBS_CANTSENDMORE) {
			error = EPIPE;
			break;
		}
		/*
		 * Because connect() and send() are non-atomic in a sendto()
		 * with a target address, it's possible that the socket will
		 * have disconnected before the send() can run.  In that case
		 * return the slightly counter-intuitive but otherwise
		 * correct error that the socket is not connected.
		 */
		unp2 = unp->unp_conn;
		if (unp2 == NULL) {
			error = ENOTCONN;
			break;
		}
		so2 = unp2->unp_socket;
		SOCKBUF_LOCK(&so2->so_rcv);
		if (unp2->unp_flags & UNP_WANTCRED) {
			/*
			 * Credentials are passed only once on
			 * SOCK_STREAM.
			 */
			unp2->unp_flags &= ~UNP_WANTCRED;
			control = unp_addsockcred(td, control);
		}
		/*
		 * Send to paired receive port, and then reduce send buffer
		 * hiwater marks to maintain backpressure.  Wake up readers.
		 */
		if (control != NULL) {
			if (sbappendcontrol_locked(&so2->so_rcv, m, control))
				control = NULL;
		} else {
			sbappend_locked(&so2->so_rcv, m);
		}
		mbcnt = so2->so_rcv.sb_mbcnt - unp2->unp_mbcnt;
		unp2->unp_mbcnt = so2->so_rcv.sb_mbcnt;
		sbcc = so2->so_rcv.sb_cc;
		sorwakeup_locked(so2);

		SOCKBUF_LOCK(&so->so_snd);
		newhiwat = so->so_snd.sb_hiwat - (sbcc - unp2->unp_cc);
		(void)chgsbsize(so->so_cred->cr_uidinfo, &so->so_snd.sb_hiwat,
		    newhiwat, RLIM_INFINITY);
		so->so_snd.sb_mbmax -= mbcnt;
		SOCKBUF_UNLOCK(&so->so_snd);

		unp2->unp_cc = sbcc;
		m = NULL;
		break;

	default:
		panic("uipc_send unknown socktype");
	}

	/*
	 * SEND_EOF is equivalent to a SEND followed by a SHUTDOWN.
	 */
	if (flags & PRUS_EOF) {
		socantsendmore(so);
		unp_shutdown(unp);
	}
	UNP_UNLOCK();

	if (control != NULL && error != 0)
		unp_dispose(control);

release:
	if (control != NULL)
		m_freem(control);
	if (m != NULL)
		m_freem(m);
	return (error);
}
|
|
|
|
|
|
|
|
static int
|
|
|
|
uipc_sense(struct socket *so, struct stat *sb)
|
|
|
|
{
|
2004-08-16 04:41:03 +00:00
|
|
|
struct unpcb *unp;
|
1997-04-27 20:01:29 +00:00
|
|
|
struct socket *so2;
|
|
|
|
|
2004-08-16 04:41:03 +00:00
|
|
|
unp = sotounpcb(so);
|
2006-03-17 13:52:57 +00:00
|
|
|
KASSERT(unp != NULL, ("uipc_sense: unp == NULL"));
|
|
|
|
UNP_LOCK();
|
1997-04-27 20:01:29 +00:00
|
|
|
sb->st_blksize = so->so_snd.sb_hiwat;
|
2004-03-30 02:16:25 +00:00
|
|
|
if (so->so_type == SOCK_STREAM && unp->unp_conn != NULL) {
|
1997-04-27 20:01:29 +00:00
|
|
|
so2 = unp->unp_conn->unp_socket;
|
|
|
|
sb->st_blksize += so2->so_rcv.sb_cc;
|
|
|
|
}
|
2004-06-17 17:16:53 +00:00
|
|
|
sb->st_dev = NODEV;
|
1997-04-27 20:01:29 +00:00
|
|
|
if (unp->unp_ino == 0)
|
2002-12-25 07:59:39 +00:00
|
|
|
unp->unp_ino = (++unp_ino == 0) ? ++unp_ino : unp_ino;
|
1997-04-27 20:01:29 +00:00
|
|
|
sb->st_ino = unp->unp_ino;
|
/*
 * Introduce a subsystem lock around UNIX domain sockets in order to protect
 * global and allocated variables.  This strategy is derived from work
 * originally developed by BSDi for BSD/OS, and applied to FreeBSD by Sam
 * Leffler:
 *
 * - Add unp_mtx, a global mutex which will protect all UNIX domain socket
 *   related variables, structures, etc.
 * - Add UNP_LOCK(), UNP_UNLOCK(), UNP_LOCK_ASSERT() macros.
 * - Acquire unp_mtx on entering most UNIX domain socket code,
 *   drop/re-acquire around calls into VFS, and release it on return.
 * - Avoid performing sodupsockaddr() while holding the mutex, so in general
 *   move to allocating storage before acquiring the mutex to copy the data.
 * - Make a stack copy of the xucred rather than copying out while holding
 *   unp_mtx.  Copy the peer credential out after releasing the mutex.
 * - Add additional assertions of vnode locks following VOP_CREATE().
 *
 * A few notes:
 *
 * - Use of an sx lock for the file list mutex may cause problems with regard
 *   to unp_mtx when garbage collecting passed file descriptors.
 * - The locking in unp_pcblist() for sysctl monitoring is correct subject to
 *   the unpcb zone not returning memory for reuse by other subsystems
 *   (consistent with similar existing concerns).
 * - Sam's version of this change, as with the BSD/OS version, made use of
 *   both a global lock and per-unpcb locks.  However, in practice, the
 *   global lock covered all accesses, so I have simplified out the unpcb
 *   locks in the interest of getting this merged faster (reducing the
 *   overhead but not sacrificing granularity in most cases).  We will want
 *   to explore possibilities for improving lock granularity in this code in
 *   the future.
 *
 * Submitted by:	sam
 * Sponsored by:	FreeBSD Foundation
 * Obtained from:	BSD/OS 5 snapshot provided by BSDi
 */
	UNP_UNLOCK();
	return (0);
}

static int
uipc_shutdown(struct socket *so)
{
	struct unpcb *unp;

	unp = sotounpcb(so);
	KASSERT(unp != NULL, ("uipc_shutdown: unp == NULL"));
	UNP_LOCK();
	socantsendmore(so);
	unp_shutdown(unp);
	UNP_UNLOCK();
	return (0);
}

static int
uipc_sockaddr(struct socket *so, struct sockaddr **nam)
{
	struct unpcb *unp;
	const struct sockaddr *sa;

	unp = sotounpcb(so);
	KASSERT(unp != NULL, ("uipc_sockaddr: unp == NULL"));
	*nam = malloc(sizeof(struct sockaddr_un), M_SONAME, M_WAITOK);
	UNP_LOCK();
	if (unp->unp_addr != NULL)
		sa = (struct sockaddr *)unp->unp_addr;
	else
		sa = &sun_noname;
	bcopy(sa, *nam, sa->sa_len);
	UNP_UNLOCK();
	return (0);
}

struct pr_usrreqs uipc_usrreqs = {
	.pru_abort =		uipc_abort,
	.pru_accept =		uipc_accept,
	.pru_attach =		uipc_attach,
	.pru_bind =		uipc_bind,
	.pru_connect =		uipc_connect,
	.pru_connect2 =		uipc_connect2,
	.pru_detach =		uipc_detach,
	.pru_disconnect =	uipc_disconnect,
	.pru_listen =		uipc_listen,
	.pru_peeraddr =		uipc_peeraddr,
	.pru_rcvd =		uipc_rcvd,
	.pru_send =		uipc_send,
	.pru_sense =		uipc_sense,
	.pru_shutdown =		uipc_shutdown,
	.pru_sockaddr =		uipc_sockaddr,
	.pru_close =		uipc_close,
};

int
uipc_ctloutput(struct socket *so, struct sockopt *sopt)
{
	struct unpcb *unp;
	struct xucred xu;
	int error, optval;

	if (sopt->sopt_level != 0)
		return (EINVAL);

	unp = sotounpcb(so);
	KASSERT(unp != NULL, ("uipc_ctloutput: unp == NULL"));
	UNP_LOCK();
	error = 0;
	switch (sopt->sopt_dir) {
	case SOPT_GET:
		switch (sopt->sopt_name) {
		case LOCAL_PEERCRED:
			if (unp->unp_flags & UNP_HAVEPC)
				xu = unp->unp_peercred;
			else {
				if (so->so_type == SOCK_STREAM)
					error = ENOTCONN;
				else
					error = EINVAL;
			}
			if (error == 0)
				error = sooptcopyout(sopt, &xu, sizeof(xu));
			break;
		case LOCAL_CREDS:
			optval = unp->unp_flags & UNP_WANTCRED ? 1 : 0;
			error = sooptcopyout(sopt, &optval, sizeof(optval));
			break;
		case LOCAL_CONNWAIT:
			optval = unp->unp_flags & UNP_CONNWAIT ? 1 : 0;
			error = sooptcopyout(sopt, &optval, sizeof(optval));
			break;
		default:
			error = EOPNOTSUPP;
			break;
		}
		break;

	case SOPT_SET:
		switch (sopt->sopt_name) {
		case LOCAL_CREDS:
		case LOCAL_CONNWAIT:
			error = sooptcopyin(sopt, &optval, sizeof(optval),
			    sizeof(optval));
			if (error)
				break;

#define	OPTSET(bit) \
	if (optval) \
		unp->unp_flags |= bit; \
	else \
		unp->unp_flags &= ~bit;

			switch (sopt->sopt_name) {
			case LOCAL_CREDS:
				OPTSET(UNP_WANTCRED);
				break;
			case LOCAL_CONNWAIT:
				OPTSET(UNP_CONNWAIT);
				break;
			default:
				break;
			}
			break;
#undef	OPTSET
		default:
			error = ENOPROTOOPT;
			break;
		}
		break;

	default:
		error = EOPNOTSUPP;
		break;
	}
	UNP_UNLOCK();
	return (error);
}

static int
unp_connect(struct socket *so, struct sockaddr *nam, struct thread *td)
{
	struct sockaddr_un *soun = (struct sockaddr_un *)nam;
	struct vnode *vp;
	struct socket *so2, *so3;
	struct unpcb *unp, *unp2, *unp3;
	int error, len;
	struct nameidata nd;
	char buf[SOCK_MAXADDRLEN];
	struct sockaddr *sa;

	UNP_LOCK_ASSERT();

	unp = sotounpcb(so);
	KASSERT(unp != NULL, ("unp_connect: unp == NULL"));
	len = nam->sa_len - offsetof(struct sockaddr_un, sun_path);
	if (len <= 0)
		return (EINVAL);
	strlcpy(buf, soun->sun_path, len + 1);
	if (unp->unp_flags & UNP_CONNECTING) {
		UNP_UNLOCK();
		return (EALREADY);
	}
	UNP_UNLOCK();
	sa = malloc(sizeof(struct sockaddr_un), M_SONAME, M_WAITOK);
	mtx_lock(&Giant);
	NDINIT(&nd, LOOKUP, FOLLOW | LOCKLEAF, UIO_SYSSPACE, buf, td);
	error = namei(&nd);
	if (error)
		vp = NULL;
	else
		vp = nd.ni_vp;
	ASSERT_VOP_LOCKED(vp, "unp_connect");
	NDFREE(&nd, NDF_ONLY_PNBUF);
	if (error)
		goto bad;

	if (vp->v_type != VSOCK) {
		error = ENOTSOCK;
		goto bad;
	}
	error = VOP_ACCESS(vp, VWRITE, td->td_ucred, td);
	if (error)
		goto bad;
	mtx_unlock(&Giant);
	UNP_LOCK();
	unp = sotounpcb(so);
	KASSERT(unp != NULL, ("unp_connect: unp == NULL"));
	so2 = vp->v_socket;
	if (so2 == NULL) {
		error = ECONNREFUSED;
		goto bad2;
	}
	if (so->so_type != so2->so_type) {
		error = EPROTOTYPE;
		goto bad2;
	}
	if (so->so_proto->pr_flags & PR_CONNREQUIRED) {
2004-06-10 21:34:38 +00:00
|
|
|
if (so2->so_options & SO_ACCEPTCONN) {
|
|
|
|
/*
|
2006-07-22 17:24:55 +00:00
|
|
|
* NB: drop locks here so unp_attach is entered w/o
|
|
|
|
* locks; this avoids a recursive lock of the head
|
|
|
|
* and holding sleep locks across a (potentially)
|
|
|
|
* blocking malloc.
|
Introduce a subsystem lock around UNIX domain sockets in order to protect
global and allocated variables. This strategy is derived from work
originally developed by BSDi for BSD/OS, and applied to FreeBSD by Sam
Leffler:
- Add unp_mtx, a global mutex which will protect all UNIX domain socket
related variables, structures, etc.
- Add UNP_LOCK(), UNP_UNLOCK(), UNP_LOCK_ASSERT() macros.
- Acquire unp_mtx on entering most UNIX domain socket code,
drop/re-acquire around calls into VFS, and release it on return.
- Avoid performing sodupsockaddr() while holding the mutex, so in general
move to allocating storage before acquiring the mutex to copy the data.
- Make a stack copy of the xucred rather than copying out while holding
unp_mtx. Copy the peer credential out after releasing the mutex.
- Add additional assertions of vnode locks following VOP_CREATE().
A few notes:
- Use of an sx lock for the file list mutex may cause problems with regard
  to unp_mtx when garbage collecting passed file descriptors.
- The locking in unp_pcblist() for sysctl monitoring is correct subject to
the unpcb zone not returning memory for reuse by other subsystems
(consistent with similar existing concerns).
- Sam's version of this change, as with the BSD/OS version, made use of
both a global lock and per-unpcb locks. However, in practice, the
global lock covered all accesses, so I have simplified out the unpcb
locks in the interest of getting this merged faster (reducing the
overhead but not sacrificing granularity in most cases). We will want
to explore possibilities for improving lock granularity in this code in
the future.
Submitted by: sam
Sponsored by: FreeBSD Foundation
Obtained from: BSD/OS 5 snapshot provided by BSDi
2004-06-10 21:34:38 +00:00
|
|
|
*/
|
|
|
|
UNP_UNLOCK();
|
|
|
|
so3 = sonewconn(so2, 0);
|
|
|
|
UNP_LOCK();
|
|
|
|
} else
|
|
|
|
so3 = NULL;
|
|
|
|
if (so3 == NULL) {
|
1994-05-24 10:09:53 +00:00
|
|
|
error = ECONNREFUSED;
|
2004-06-10 21:34:38 +00:00
|
|
|
goto bad2;
|
1994-05-24 10:09:53 +00:00
|
|
|
}
|
2001-08-17 22:01:18 +00:00
|
|
|
unp = sotounpcb(so);
|
1994-05-24 10:09:53 +00:00
|
|
|
unp2 = sotounpcb(so2);
|
|
|
|
unp3 = sotounpcb(so3);
|
2004-06-10 21:34:38 +00:00
|
|
|
if (unp2->unp_addr != NULL) {
|
|
|
|
bcopy(unp2->unp_addr, sa, unp2->unp_addr->sun_len);
|
|
|
|
unp3->unp_addr = (struct sockaddr_un *) sa;
|
|
|
|
sa = NULL;
|
|
|
|
}
|
2001-08-17 22:01:18 +00:00
|
|
|
/*
|
|
|
|
* unp_peercred management:
|
|
|
|
*
|
2006-07-22 17:24:55 +00:00
|
|
|
* The connecter's (client's) credentials are copied from its
|
|
|
|
* process structure at the time of connect() (which is now).
|
2001-08-17 22:01:18 +00:00
|
|
|
*/
|
2002-02-27 18:32:23 +00:00
|
|
|
cru2x(td->td_ucred, &unp3->unp_peercred);
|
2001-08-17 22:01:18 +00:00
|
|
|
unp3->unp_flags |= UNP_HAVEPC;
|
|
|
|
/*
|
2006-07-22 17:24:55 +00:00
|
|
|
* The receiver's (server's) credentials are copied from the
|
|
|
|
* unp_peercred member of socket on which the former called
|
|
|
|
* listen(); unp_listen() cached that process's credentials
|
|
|
|
* at that time so we can use them now.
|
2001-08-17 22:01:18 +00:00
|
|
|
*/
|
|
|
|
KASSERT(unp2->unp_flags & UNP_HAVEPCCACHED,
|
|
|
|
("unp_connect: listener without cached peercred"));
|
|
|
|
memcpy(&unp->unp_peercred, &unp2->unp_peercred,
|
|
|
|
sizeof(unp->unp_peercred));
|
|
|
|
unp->unp_flags |= UNP_HAVEPC;
|
2006-04-24 19:09:33 +00:00
|
|
|
if (unp2->unp_flags & UNP_WANTCRED)
|
|
|
|
unp3->unp_flags |= UNP_WANTCRED;
|
2002-07-31 03:03:22 +00:00
|
|
|
#ifdef MAC
|
2004-06-13 02:50:07 +00:00
|
|
|
SOCK_LOCK(so);
|
2002-07-31 03:03:22 +00:00
|
|
|
mac_set_socket_peer_from_socket(so, so3);
|
|
|
|
mac_set_socket_peer_from_socket(so3, so);
|
2004-06-13 02:50:07 +00:00
|
|
|
SOCK_UNLOCK(so);
|
2002-07-31 03:03:22 +00:00
|
|
|
#endif
|
2001-08-17 22:01:18 +00:00
|
|
|
|
1994-05-24 10:09:53 +00:00
|
|
|
so2 = so3;
|
|
|
|
}
|
2005-04-13 00:01:46 +00:00
|
|
|
error = unp_connect2(so, so2, PRU_CONNECT);
|
2004-06-10 21:34:38 +00:00
|
|
|
bad2:
|
|
|
|
UNP_UNLOCK();
|
|
|
|
mtx_lock(&Giant);
|
1994-05-24 10:09:53 +00:00
|
|
|
bad:
|
2004-06-10 21:34:38 +00:00
|
|
|
mtx_assert(&Giant, MA_OWNED);
|
|
|
|
if (vp != NULL)
|
|
|
|
vput(vp);
|
|
|
|
mtx_unlock(&Giant);
|
|
|
|
free(sa, M_SONAME);
|
|
|
|
UNP_LOCK();
|
2006-07-23 12:01:14 +00:00
|
|
|
unp->unp_flags &= ~UNP_CONNECTING;
|
1994-05-24 10:09:53 +00:00
|
|
|
return (error);
|
|
|
|
}
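The UNP_LOCK()/UNP_UNLOCK()/UNP_LOCK_ASSERT() macros named in the commit message above, and the drop/re-acquire pattern unp_connect() uses around sonewconn(), can be sketched in userspace with pthreads standing in for the kernel mtx(9) API. The macro names mirror the commit message; everything else (the owner flag, touch_count()) is illustrative, not kernel code.

```c
#include <assert.h>
#include <pthread.h>

/* Userspace analog of the global subsystem mutex described above. */
static pthread_mutex_t unp_mtx = PTHREAD_MUTEX_INITIALIZER;
static int unp_mtx_owned;		/* crude owner flag for the assert */

#define UNP_LOCK()	do { pthread_mutex_lock(&unp_mtx);		\
				unp_mtx_owned = 1; } while (0)
#define UNP_UNLOCK()	do { unp_mtx_owned = 0;				\
				pthread_mutex_unlock(&unp_mtx); } while (0)
#define UNP_LOCK_ASSERT()	assert(unp_mtx_owned)

static int global_count;		/* stands in for protected subsystem state */

/* Typical call pattern: acquire on entry, assert while touching
 * shared state, release on return. */
int
touch_count(void)
{
	int v;

	UNP_LOCK();
	UNP_LOCK_ASSERT();
	v = ++global_count;
	UNP_UNLOCK();
	return (v);
}
```

In the kernel the same shape appears around blocking calls: UNP_UNLOCK() before sonewconn() or a VFS operation, UNP_LOCK() after, so sleep locks are never held across a potentially blocking allocation.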
|
|
|
|
|
2004-03-31 01:41:30 +00:00
|
|
|
static int
|
2005-04-13 00:01:46 +00:00
|
|
|
unp_connect2(struct socket *so, struct socket *so2, int req)
|
1994-05-24 10:09:53 +00:00
|
|
|
{
|
2005-02-20 23:22:13 +00:00
|
|
|
struct unpcb *unp = sotounpcb(so);
|
|
|
|
struct unpcb *unp2;
|
1994-05-24 10:09:53 +00:00
|
|
|
|
2004-06-10 21:34:38 +00:00
|
|
|
UNP_LOCK_ASSERT();
|
|
|
|
|
1994-05-24 10:09:53 +00:00
|
|
|
if (so2->so_type != so->so_type)
|
|
|
|
return (EPROTOTYPE);
|
|
|
|
unp2 = sotounpcb(so2);
|
2006-03-17 13:52:57 +00:00
|
|
|
KASSERT(unp2 != NULL, ("unp_connect2: unp2 == NULL"));
|
1994-05-24 10:09:53 +00:00
|
|
|
unp->unp_conn = unp2;
|
|
|
|
switch (so->so_type) {
|
|
|
|
case SOCK_DGRAM:
|
1998-05-15 20:11:40 +00:00
|
|
|
LIST_INSERT_HEAD(&unp2->unp_refs, unp, unp_reflink);
|
1994-05-24 10:09:53 +00:00
|
|
|
soisconnected(so);
|
|
|
|
break;
|
|
|
|
|
|
|
|
case SOCK_STREAM:
|
|
|
|
unp2->unp_conn = unp;
|
2005-04-13 00:01:46 +00:00
|
|
|
if (req == PRU_CONNECT &&
|
|
|
|
((unp->unp_flags | unp2->unp_flags) & UNP_CONNWAIT))
|
|
|
|
soisconnecting(so);
|
|
|
|
else
|
|
|
|
soisconnected(so);
|
1994-05-24 10:09:53 +00:00
|
|
|
soisconnected(so2);
|
|
|
|
break;
|
|
|
|
|
|
|
|
default:
|
|
|
|
panic("unp_connect2");
|
|
|
|
}
|
|
|
|
return (0);
|
|
|
|
}
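The cross-linking unp_connect2() performs for SOCK_STREAM, where each control block is pointed at its peer and both sides are marked connected, can be reduced to a small sketch. The struct and field names below are simplified stand-ins, not the kernel's struct unpcb.

```c
#include <stddef.h>

struct mini_unpcb {
	struct mini_unpcb *conn;	/* peer pointer, like unp_conn */
	int connected;			/* stands in for soisconnected() state */
};

/* Link client and server control blocks both ways (stream semantics). */
int
mini_connect2(struct mini_unpcb *unp, struct mini_unpcb *unp2)
{
	if (unp2 == NULL)
		return (-1);
	unp->conn = unp2;		/* client -> server */
	unp2->conn = unp;		/* server -> client */
	unp->connected = 1;
	unp2->connected = 1;
	return (0);
}
```

For SOCK_DGRAM the kernel instead leaves unp2->unp_conn alone and puts unp on unp2's unp_refs list, since many datagram senders may target one receiver.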
|
|
|
|
|
1995-12-14 09:55:16 +00:00
|
|
|
static void
|
2005-02-20 23:22:13 +00:00
|
|
|
unp_disconnect(struct unpcb *unp)
|
1994-05-24 10:09:53 +00:00
|
|
|
{
|
2005-02-20 23:22:13 +00:00
|
|
|
struct unpcb *unp2 = unp->unp_conn;
|
2004-06-20 21:29:56 +00:00
|
|
|
struct socket *so;
|
1994-05-24 10:09:53 +00:00
|
|
|
|
2004-06-10 21:34:38 +00:00
|
|
|
UNP_LOCK_ASSERT();
|
|
|
|
|
2004-03-30 02:16:25 +00:00
|
|
|
if (unp2 == NULL)
|
1994-05-24 10:09:53 +00:00
|
|
|
return;
|
2004-03-30 02:16:25 +00:00
|
|
|
unp->unp_conn = NULL;
|
1994-05-24 10:09:53 +00:00
|
|
|
switch (unp->unp_socket->so_type) {
|
|
|
|
case SOCK_DGRAM:
|
1998-05-15 20:11:40 +00:00
|
|
|
LIST_REMOVE(unp, unp_reflink);
|
2004-06-20 21:29:56 +00:00
|
|
|
so = unp->unp_socket;
|
|
|
|
SOCK_LOCK(so);
|
|
|
|
so->so_state &= ~SS_ISCONNECTED;
|
|
|
|
SOCK_UNLOCK(so);
|
1994-05-24 10:09:53 +00:00
|
|
|
break;
|
|
|
|
|
|
|
|
case SOCK_STREAM:
|
|
|
|
soisdisconnected(unp->unp_socket);
|
2004-03-30 02:16:25 +00:00
|
|
|
unp2->unp_conn = NULL;
|
1994-05-24 10:09:53 +00:00
|
|
|
soisdisconnected(unp2->unp_socket);
|
|
|
|
break;
|
|
|
|
}
|
|
|
|
}
|
|
|
|
|
2004-06-10 21:34:38 +00:00
|
|
|
/*
|
2006-07-22 17:24:55 +00:00
|
|
|
* unp_pcblist() assumes that UNIX domain socket memory is never reclaimed by
|
|
|
|
* the zone (UMA_ZONE_NOFREE), and as such potentially stale pointers are
|
|
|
|
* safe to reference. It first scans the list of struct unpcb's to generate
|
|
|
|
* a pointer list, then it rescans its list one entry at a time to
|
2004-06-10 21:34:38 +00:00
|
|
|
* externalize and copyout. It checks the generation number to see if a
|
|
|
|
* struct unpcb has been reused, and will skip it if so.
|
|
|
|
*/
|
1998-05-15 20:11:40 +00:00
|
|
|
static int
|
2000-07-04 11:25:35 +00:00
|
|
|
unp_pcblist(SYSCTL_HANDLER_ARGS)
|
1998-05-15 20:11:40 +00:00
|
|
|
{
|
1998-10-25 17:44:59 +00:00
|
|
|
int error, i, n;
|
1998-05-15 20:11:40 +00:00
|
|
|
struct unpcb *unp, **unp_list;
|
|
|
|
unp_gen_t gencnt;
|
2001-08-18 02:53:50 +00:00
|
|
|
struct xunpgen *xug;
|
1998-05-15 20:11:40 +00:00
|
|
|
struct unp_head *head;
|
2001-08-18 02:53:50 +00:00
|
|
|
struct xunpcb *xu;
|
1998-05-15 20:11:40 +00:00
|
|
|
|
1998-07-15 02:32:35 +00:00
|
|
|
head = ((intptr_t)arg1 == SOCK_DGRAM ? &unp_dhead : &unp_shead);
|
1998-05-15 20:11:40 +00:00
|
|
|
|
|
|
|
/*
|
|
|
|
* The process of preparing the PCB list is too time-consuming and
|
|
|
|
* resource-intensive to repeat twice on every request.
|
|
|
|
*/
|
2004-03-30 02:16:25 +00:00
|
|
|
if (req->oldptr == NULL) {
|
1998-05-15 20:11:40 +00:00
|
|
|
n = unp_count;
|
2001-08-18 02:53:50 +00:00
|
|
|
req->oldidx = 2 * (sizeof *xug)
|
1998-05-15 20:11:40 +00:00
|
|
|
+ (n + n/8) * sizeof(struct xunpcb);
|
2004-01-11 19:48:19 +00:00
|
|
|
return (0);
|
1998-05-15 20:11:40 +00:00
|
|
|
}
|
|
|
|
|
2004-03-30 02:16:25 +00:00
|
|
|
if (req->newptr != NULL)
|
2004-01-11 19:48:19 +00:00
|
|
|
return (EPERM);
|
1998-05-15 20:11:40 +00:00
|
|
|
|
|
|
|
/*
|
|
|
|
* OK, now we're committed to doing something.
|
|
|
|
*/
|
2003-02-19 05:47:46 +00:00
|
|
|
xug = malloc(sizeof(*xug), M_TEMP, M_WAITOK);
|
2004-06-10 21:34:38 +00:00
|
|
|
UNP_LOCK();
|
1998-05-15 20:11:40 +00:00
|
|
|
gencnt = unp_gencnt;
|
|
|
|
n = unp_count;
|
2004-06-10 21:34:38 +00:00
|
|
|
UNP_UNLOCK();
|
1998-05-15 20:11:40 +00:00
|
|
|
|
2001-08-18 02:53:50 +00:00
|
|
|
xug->xug_len = sizeof *xug;
|
|
|
|
xug->xug_count = n;
|
|
|
|
xug->xug_gen = gencnt;
|
|
|
|
xug->xug_sogen = so_gencnt;
|
|
|
|
error = SYSCTL_OUT(req, xug, sizeof *xug);
|
|
|
|
if (error) {
|
|
|
|
free(xug, M_TEMP);
|
2004-01-11 19:48:19 +00:00
|
|
|
return (error);
|
2001-08-18 02:53:50 +00:00
|
|
|
}
|
1998-05-15 20:11:40 +00:00
|
|
|
|
2003-02-19 05:47:46 +00:00
|
|
|
unp_list = malloc(n * sizeof *unp_list, M_TEMP, M_WAITOK);
|
2004-01-11 19:48:19 +00:00
|
|
|
|
2004-06-10 21:34:38 +00:00
|
|
|
UNP_LOCK();
|
1999-11-16 10:56:05 +00:00
|
|
|
for (unp = LIST_FIRST(head), i = 0; unp && i < n;
|
|
|
|
unp = LIST_NEXT(unp, unp_link)) {
|
2001-10-09 21:40:30 +00:00
|
|
|
if (unp->unp_gencnt <= gencnt) {
|
2002-02-27 18:32:23 +00:00
|
|
|
if (cr_cansee(req->td->td_ucred,
|
2001-10-09 21:40:30 +00:00
|
|
|
unp->unp_socket->so_cred))
|
2001-10-05 07:06:32 +00:00
|
|
|
continue;
|
1998-05-15 20:11:40 +00:00
|
|
|
unp_list[i++] = unp;
|
2001-10-05 07:06:32 +00:00
|
|
|
}
|
1998-05-15 20:11:40 +00:00
|
|
|
}
|
2004-06-10 21:34:38 +00:00
|
|
|
UNP_UNLOCK();
|
2006-07-22 17:24:55 +00:00
|
|
|
n = i; /* In case we lost some during malloc. */
|
1998-05-15 20:11:40 +00:00
|
|
|
|
|
|
|
error = 0;
|
2005-05-07 00:41:36 +00:00
|
|
|
xu = malloc(sizeof(*xu), M_TEMP, M_WAITOK | M_ZERO);
|
1998-05-15 20:11:40 +00:00
|
|
|
for (i = 0; i < n; i++) {
|
|
|
|
unp = unp_list[i];
|
|
|
|
if (unp->unp_gencnt <= gencnt) {
|
2001-08-18 02:53:50 +00:00
|
|
|
xu->xu_len = sizeof *xu;
|
|
|
|
xu->xu_unpp = unp;
|
1998-05-15 20:11:40 +00:00
|
|
|
/*
|
|
|
|
* XXX - need more locking here to protect against
|
|
|
|
* connect/disconnect races for SMP.
|
|
|
|
*/
|
2004-03-30 02:16:25 +00:00
|
|
|
if (unp->unp_addr != NULL)
|
2004-01-11 19:48:19 +00:00
|
|
|
bcopy(unp->unp_addr, &xu->xu_addr,
|
1998-05-15 20:11:40 +00:00
|
|
|
unp->unp_addr->sun_len);
|
2004-03-30 02:16:25 +00:00
|
|
|
if (unp->unp_conn != NULL &&
|
|
|
|
unp->unp_conn->unp_addr != NULL)
|
1998-05-15 20:11:40 +00:00
|
|
|
bcopy(unp->unp_conn->unp_addr,
|
2001-08-18 02:53:50 +00:00
|
|
|
&xu->xu_caddr,
|
1998-05-15 20:11:40 +00:00
|
|
|
unp->unp_conn->unp_addr->sun_len);
|
2001-08-18 02:53:50 +00:00
|
|
|
bcopy(unp, &xu->xu_unp, sizeof *unp);
|
|
|
|
sotoxsocket(unp->unp_socket, &xu->xu_socket);
|
|
|
|
error = SYSCTL_OUT(req, xu, sizeof *xu);
|
1998-05-15 20:11:40 +00:00
|
|
|
}
|
|
|
|
}
|
2001-08-18 02:53:50 +00:00
|
|
|
free(xu, M_TEMP);
|
1998-05-15 20:11:40 +00:00
|
|
|
if (!error) {
|
|
|
|
/*
|
2006-07-22 17:24:55 +00:00
|
|
|
* Give the user an updated idea of our state. If the
|
|
|
|
* generation differs from what we told her before, she knows
|
|
|
|
* that something happened while we were processing this
|
|
|
|
* request, and it might be necessary to retry.
|
1998-05-15 20:11:40 +00:00
|
|
|
*/
|
2001-08-18 02:53:50 +00:00
|
|
|
xug->xug_gen = unp_gencnt;
|
|
|
|
xug->xug_sogen = so_gencnt;
|
|
|
|
xug->xug_count = unp_count;
|
|
|
|
error = SYSCTL_OUT(req, xug, sizeof *xug);
|
1998-05-15 20:11:40 +00:00
|
|
|
}
|
|
|
|
free(unp_list, M_TEMP);
|
2001-08-18 02:53:50 +00:00
|
|
|
free(xug, M_TEMP);
|
2004-01-11 19:48:19 +00:00
|
|
|
return (error);
|
1998-05-15 20:11:40 +00:00
|
|
|
}
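The generation-count protocol that unp_pcblist() implements above can be sketched from the consumer's side: remember the generation reported with the list, and retry the whole request when the trailing generation differs, since that means PCBs were added or removed mid-copy. The following is a hypothetical self-contained userland simulation (not FreeBSD code; all names are invented for illustration):

```c
#include <assert.h>

/*
 * Hypothetical model of the kernel side: gen bumps whenever a
 * connect/disconnect races with a list request in flight.
 */
struct fake_kernel {
	unsigned gen;		/* generation counter */
	int	 racy_passes;	/* passes that will observe a bump */
};

/* Simulate one list request; returns the generation seen at the end. */
static unsigned
request_list(struct fake_kernel *k)
{
	if (k->racy_passes > 0) {
		k->racy_passes--;
		k->gen++;	/* a PCB changed while we were copying */
	}
	return (k->gen);
}

/*
 * Consumer retry loop: a pass is trusted only if the generation did
 * not move while it ran.  Returns the number of tries, or -1.
 */
static int
stable_list(struct fake_kernel *k, int max_tries)
{
	int tries;

	for (tries = 1; tries <= max_tries; tries++) {
		unsigned before = k->gen;
		unsigned after = request_list(k);

		if (before == after)
			return (tries);
	}
	return (-1);
}
```

With two racy passes queued, the third attempt observes a stable generation and is accepted.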
|
|
|
|
|
2004-01-11 19:48:19 +00:00
|
|
|
SYSCTL_PROC(_net_local_dgram, OID_AUTO, pcblist, CTLFLAG_RD,
|
1998-05-15 20:11:40 +00:00
|
|
|
(caddr_t)(long)SOCK_DGRAM, 0, unp_pcblist, "S,xunpcb",
|
|
|
|
"List of active local datagram sockets");
|
2004-01-11 19:48:19 +00:00
|
|
|
SYSCTL_PROC(_net_local_stream, OID_AUTO, pcblist, CTLFLAG_RD,
|
1998-05-15 20:11:40 +00:00
|
|
|
(caddr_t)(long)SOCK_STREAM, 0, unp_pcblist, "S,xunpcb",
|
|
|
|
"List of active local stream sockets");
|
|
|
|
|
1995-12-14 09:55:16 +00:00
|
|
|
static void
|
2005-02-20 23:22:13 +00:00
|
|
|
unp_shutdown(struct unpcb *unp)
|
1994-05-24 10:09:53 +00:00
|
|
|
{
|
|
|
|
struct socket *so;
|
|
|
|
|
Introduce a subsystem lock around UNIX domain sockets in order to protect
global and allocated variables. This strategy is derived from work
originally developed by BSDi for BSD/OS, and applied to FreeBSD by Sam
Leffler:
- Add unp_mtx, a global mutex which will protect all UNIX domain socket
related variables, structures, etc.
- Add UNP_LOCK(), UNP_UNLOCK(), UNP_LOCK_ASSERT() macros.
- Acquire unp_mtx on entering most UNIX domain socket code,
drop/re-acquire around calls into VFS, and release it on return.
- Avoid performing sodupsockaddr() while holding the mutex, so in general
move to allocating storage before acquiring the mutex to copy the data.
- Make a stack copy of the xucred rather than copying out while holding
unp_mtx. Copy the peer credential out after releasing the mutex.
- Add additional assertions of vnode locks following VOP_CREATE().
A few notes:
- Use of an sx lock for the file list mutex may cause problems with regard
to unp_mtx when garbage-collecting passed file descriptors.
- The locking in unp_pcblist() for sysctl monitoring is correct subject to
the unpcb zone not returning memory for reuse by other subsystems
(consistent with similar existing concerns).
- Sam's version of this change, as with the BSD/OS version, made use of
both a global lock and per-unpcb locks. However, in practice, the
global lock covered all accesses, so I have simplified out the unpcb
locks in the interest of getting this merged faster (reducing the
overhead but not sacrificing granularity in most cases). We will want
to explore possibilities for improving lock granularity in this code in
the future.
Submitted by: sam
Sponsored by: FreeBSD Foundation
Obtained from: BSD/OS 5 snapshot provided by BSDi
2004-06-10 21:34:38 +00:00
|
|
|
UNP_LOCK_ASSERT();
|
|
|
|
|
1994-05-24 10:09:53 +00:00
|
|
|
if (unp->unp_socket->so_type == SOCK_STREAM && unp->unp_conn &&
|
|
|
|
(so = unp->unp_conn->unp_socket))
|
|
|
|
socantrcvmore(so);
|
|
|
|
}
|
|
|
|
|
1995-12-14 09:55:16 +00:00
|
|
|
static void
|
2005-02-20 23:22:13 +00:00
|
|
|
unp_drop(struct unpcb *unp, int errno)
|
1994-05-24 10:09:53 +00:00
|
|
|
{
|
|
|
|
struct socket *so = unp->unp_socket;
|
|
|
|
|
2004-06-10 21:34:38 +00:00
|
|
|
UNP_LOCK_ASSERT();
|
|
|
|
|
1994-05-24 10:09:53 +00:00
|
|
|
so->so_error = errno;
|
|
|
|
unp_disconnect(unp);
|
|
|
|
}
|
|
|
|
|
2001-10-04 13:11:48 +00:00
|
|
|
static void
|
2005-02-20 23:22:13 +00:00
|
|
|
unp_freerights(struct file **rp, int fdcount)
|
2001-10-04 13:11:48 +00:00
|
|
|
{
|
|
|
|
int i;
|
|
|
|
struct file *fp;
|
|
|
|
|
|
|
|
for (i = 0; i < fdcount; i++) {
|
|
|
|
fp = *rp;
|
|
|
|
/*
|
2006-07-22 17:24:55 +00:00
|
|
|
* Zero the pointer before calling unp_discard since it may
|
|
|
|
* end up in unp_gc().
|
2006-01-13 00:00:32 +00:00
|
|
|
*
|
|
|
|
* XXXRW: This is less true than it used to be.
|
2001-10-04 13:11:48 +00:00
|
|
|
*/
|
|
|
|
*rp++ = 0;
|
|
|
|
unp_discard(fp);
|
|
|
|
}
|
|
|
|
}
|
|
|
|
|
1994-05-25 09:21:21 +00:00
|
|
|
int
|
2005-02-20 23:22:13 +00:00
|
|
|
unp_externalize(struct mbuf *control, struct mbuf **controlp)
|
1994-05-24 10:09:53 +00:00
|
|
|
{
|
2001-09-12 08:38:13 +00:00
|
|
|
struct thread *td = curthread; /* XXX */
|
2001-10-04 13:11:48 +00:00
|
|
|
struct cmsghdr *cm = mtod(control, struct cmsghdr *);
|
|
|
|
int i;
|
|
|
|
int *fdp;
|
|
|
|
struct file **rp;
|
|
|
|
struct file *fp;
|
|
|
|
void *data;
|
|
|
|
socklen_t clen = control->m_len, datalen;
|
|
|
|
int error, newfds;
|
1994-05-24 10:09:53 +00:00
|
|
|
int f;
|
2001-10-04 13:11:48 +00:00
|
|
|
u_int newlen;
|
1994-05-24 10:09:53 +00:00
|
|
|
|
2004-08-19 01:45:16 +00:00
|
|
|
UNP_UNLOCK_ASSERT();
|
|
|
|
|
2001-10-04 13:11:48 +00:00
|
|
|
error = 0;
|
|
|
|
if (controlp != NULL) /* controlp == NULL => free control messages */
|
|
|
|
*controlp = NULL;
|
|
|
|
|
|
|
|
while (cm != NULL) {
|
|
|
|
if (sizeof(*cm) > clen || cm->cmsg_len > clen) {
|
|
|
|
error = EINVAL;
|
|
|
|
break;
|
|
|
|
}
|
|
|
|
|
|
|
|
data = CMSG_DATA(cm);
|
|
|
|
datalen = (caddr_t)cm + cm->cmsg_len - (caddr_t)data;
|
|
|
|
|
|
|
|
if (cm->cmsg_level == SOL_SOCKET
|
|
|
|
&& cm->cmsg_type == SCM_RIGHTS) {
|
|
|
|
newfds = datalen / sizeof(struct file *);
|
|
|
|
rp = data;
|
|
|
|
|
2003-03-23 19:41:34 +00:00
|
|
|
/* If we're not outputting the descriptors, free them. */
|
2001-10-04 13:11:48 +00:00
|
|
|
if (error || controlp == NULL) {
|
|
|
|
unp_freerights(rp, newfds);
|
|
|
|
goto next;
|
|
|
|
}
|
2002-01-13 11:58:06 +00:00
|
|
|
FILEDESC_LOCK(td->td_proc->p_fd);
|
2001-10-04 13:11:48 +00:00
|
|
|
/* If the new FDs will not fit, free them. */
|
|
|
|
if (!fdavail(td, newfds)) {
|
2002-01-13 11:58:06 +00:00
|
|
|
FILEDESC_UNLOCK(td->td_proc->p_fd);
|
2001-10-04 13:11:48 +00:00
|
|
|
error = EMSGSIZE;
|
|
|
|
unp_freerights(rp, newfds);
|
|
|
|
goto next;
|
|
|
|
}
|
2000-03-09 15:15:27 +00:00
|
|
|
/*
|
2006-07-22 17:24:55 +00:00
|
|
|
* Now change each pointer to an fd in the global
|
|
|
|
* table to an integer that is the index to the local
|
|
|
|
* fd table entry that we set up to point to the
|
|
|
|
* global one we are transferring.
|
2000-03-09 15:15:27 +00:00
|
|
|
*/
|
2001-10-04 13:11:48 +00:00
|
|
|
newlen = newfds * sizeof(int);
|
|
|
|
*controlp = sbcreatecontrol(NULL, newlen,
|
|
|
|
SCM_RIGHTS, SOL_SOCKET);
|
|
|
|
if (*controlp == NULL) {
|
2002-01-13 11:58:06 +00:00
|
|
|
FILEDESC_UNLOCK(td->td_proc->p_fd);
|
2001-10-04 13:11:48 +00:00
|
|
|
error = E2BIG;
|
|
|
|
unp_freerights(rp, newfds);
|
|
|
|
goto next;
|
|
|
|
}
|
|
|
|
|
|
|
|
fdp = (int *)
|
|
|
|
CMSG_DATA(mtod(*controlp, struct cmsghdr *));
|
|
|
|
for (i = 0; i < newfds; i++) {
|
2004-01-17 00:59:04 +00:00
|
|
|
if (fdalloc(td, 0, &f))
|
2001-10-04 13:11:48 +00:00
|
|
|
panic("unp_externalize fdalloc failed");
|
|
|
|
fp = *rp++;
|
|
|
|
td->td_proc->p_fd->fd_ofiles[f] = fp;
|
2002-01-13 11:58:06 +00:00
|
|
|
FILE_LOCK(fp);
|
2001-10-04 13:11:48 +00:00
|
|
|
fp->f_msgcount--;
|
2002-01-13 11:58:06 +00:00
|
|
|
FILE_UNLOCK(fp);
|
2001-10-04 13:11:48 +00:00
|
|
|
unp_rights--;
|
|
|
|
*fdp++ = f;
|
|
|
|
}
|
2002-01-13 11:58:06 +00:00
|
|
|
FILEDESC_UNLOCK(td->td_proc->p_fd);
|
2006-07-22 17:24:55 +00:00
|
|
|
} else {
|
|
|
|
/* We can just copy anything else across. */
|
2001-10-04 13:11:48 +00:00
|
|
|
if (error || controlp == NULL)
|
|
|
|
goto next;
|
|
|
|
*controlp = sbcreatecontrol(NULL, datalen,
|
|
|
|
cm->cmsg_type, cm->cmsg_level);
|
|
|
|
if (*controlp == NULL) {
|
|
|
|
error = ENOBUFS;
|
|
|
|
goto next;
|
|
|
|
}
|
|
|
|
bcopy(data,
|
|
|
|
CMSG_DATA(mtod(*controlp, struct cmsghdr *)),
|
|
|
|
datalen);
|
2000-03-09 15:15:27 +00:00
|
|
|
}
|
2001-10-04 13:11:48 +00:00
|
|
|
|
|
|
|
controlp = &(*controlp)->m_next;
|
|
|
|
|
|
|
|
next:
|
|
|
|
if (CMSG_SPACE(datalen) < clen) {
|
|
|
|
clen -= CMSG_SPACE(datalen);
|
|
|
|
cm = (struct cmsghdr *)
|
|
|
|
((caddr_t)cm + CMSG_SPACE(datalen));
|
|
|
|
} else {
|
|
|
|
clen = 0;
|
|
|
|
cm = NULL;
|
2000-03-09 15:15:27 +00:00
|
|
|
}
|
1994-05-24 10:09:53 +00:00
|
|
|
}
|
2000-03-09 15:15:27 +00:00
|
|
|
|
2001-10-04 13:11:48 +00:00
|
|
|
m_freem(control);
|
|
|
|
|
|
|
|
return (error);
|
1994-05-24 10:09:53 +00:00
|
|
|
}
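The control-message walk at the end of unp_externalize() advances the cursor by CMSG_SPACE(datalen) until clen is exhausted, validating each header against the remaining length. The same arithmetic can be exercised in userland against a hand-built buffer; this is an illustrative sketch using the standard CMSG macros (the helper names are invented):

```c
#include <string.h>
#include <sys/socket.h>

/* Count the control messages in buf using the same cursor arithmetic. */
static int
count_cmsgs(char *buf, socklen_t clen)
{
	struct cmsghdr *cm = (struct cmsghdr *)buf;
	int count = 0;

	while (cm != NULL) {
		socklen_t datalen;

		if (sizeof(*cm) > clen || cm->cmsg_len > clen)
			break;		/* truncated or corrupt header */
		datalen = cm->cmsg_len -
		    ((char *)CMSG_DATA(cm) - (char *)cm);
		count++;
		if (CMSG_SPACE(datalen) < clen) {
			clen -= CMSG_SPACE(datalen);
			cm = (struct cmsghdr *)((char *)cm +
			    CMSG_SPACE(datalen));
		} else {
			clen = 0;
			cm = NULL;	/* last message consumed */
		}
	}
	return (count);
}

/* Build two SCM_RIGHTS messages back to back; returns total length. */
static socklen_t
build_two_cmsgs(char *buf)
{
	struct cmsghdr *cm = (struct cmsghdr *)buf;
	int payload = 1;

	cm->cmsg_len = CMSG_LEN(sizeof(payload));
	cm->cmsg_level = SOL_SOCKET;
	cm->cmsg_type = SCM_RIGHTS;
	memcpy(CMSG_DATA(cm), &payload, sizeof(payload));

	cm = (struct cmsghdr *)(buf + CMSG_SPACE(sizeof(payload)));
	cm->cmsg_len = CMSG_LEN(sizeof(payload));
	cm->cmsg_level = SOL_SOCKET;
	cm->cmsg_type = SCM_RIGHTS;
	memcpy(CMSG_DATA(cm), &payload, sizeof(payload));

	return (2 * CMSG_SPACE(sizeof(payload)));
}

/* Align the buffer as a cmsghdr would require, then walk it. */
static int
demo(void)
{
	union {
		char buf[256];
		struct cmsghdr align;
	} u;
	socklen_t clen = build_two_cmsgs(u.buf);

	return (count_cmsgs(u.buf, clen));
}
```

Both messages are found because the cursor lands exactly on the second header after one CMSG_SPACE step.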
|
|
|
|
|
2006-04-21 09:25:40 +00:00
|
|
|
static void
|
|
|
|
unp_zone_change(void *tag)
|
|
|
|
{
|
|
|
|
|
|
|
|
uma_zone_set_max(unp_zone, maxsockets);
|
|
|
|
}
|
|
|
|
|
1998-05-15 20:11:40 +00:00
|
|
|
void
|
|
|
|
unp_init(void)
|
|
|
|
{
|
2006-07-22 17:24:55 +00:00
|
|
|
|
2002-03-20 04:11:52 +00:00
|
|
|
unp_zone = uma_zcreate("unpcb", sizeof(struct unpcb), NULL, NULL,
|
|
|
|
NULL, NULL, UMA_ALIGN_PTR, UMA_ZONE_NOFREE);
|
2004-03-30 02:16:25 +00:00
|
|
|
if (unp_zone == NULL)
|
1998-05-15 20:11:40 +00:00
|
|
|
panic("unp_init");
|
2006-04-21 09:25:40 +00:00
|
|
|
uma_zone_set_max(unp_zone, maxsockets);
|
|
|
|
EVENTHANDLER_REGISTER(maxsockets_change, unp_zone_change,
|
|
|
|
NULL, EVENTHANDLER_PRI_ANY);
|
1998-05-15 20:11:40 +00:00
|
|
|
LIST_INIT(&unp_dhead);
|
|
|
|
LIST_INIT(&unp_shead);
|
Correct a number of serious and closely related bugs in the UNIX domain
socket file descriptor garbage collection code, which is intended to
detect and clear cycles of orphaned file descriptors that are "in-flight"
in a socket when that socket is closed before they are received. The
algorithm present was both run at poor times (resulting in recursion and
reentrance), and also buggy in the presence of parallelism. In order to
fix these problems, make the following changes:
- When there are in-flight sockets and a UNIX domain socket is destroyed,
asynchronously schedule the garbage collector, rather than running it
synchronously in the current context. This avoids lock order issues
when the garbage collection code reenters the UNIX domain socket code,
avoiding lock order reversals, deadlocks, etc. Run the code
asynchronously in a task queue.
- In the garbage collector, when skipping file descriptors that have
entered a closing state (i.e., have f_count == 0), re-test the FDEFER
flag, and decrement unp_defer. As file descriptors can now transition
to a closed state, while the garbage collector is running, it is no
longer the case that unp_defer will remain an accurate count of
deferred sockets in the mark portion of the GC algorithm. Otherwise,
the garbage collector will loop waiting for unp_defer to reach
zero, which it will never do as it is skipping file descriptors that
were marked in an earlier pass, but now closed.
- Acquire the UNIX domain socket subsystem lock in unp_discard() when
modifying the unp_rights counter, or a read/write race is risked with
other threads also manipulating the counter.
While here:
- Remove #if 0'd code regarding acquiring the socket buffer sleep lock in
the garbage collector, this is not required as we are able to use the
socket buffer receive lock to protect scanning the receive buffer for
in-flight file descriptors on the socket buffer.
- Annotate that the description of the garbage collector implementation
is increasingly inaccurate and needs to be updated.
- Add counters of the number of deferred garbage collections and recycled
file descriptors. This will be removed and is here temporarily for
debugging purposes.
With these changes in place, the unp_passfd regression test now appears
to pass consistently on UP and SMP systems for extended runs,
whereas before it hung quickly or panicked, depending on which bug was
triggered.
Reported by: Philip Kizer <pckizer at nostrum dot com>
MFC after: 2 weeks
2005-11-10 16:06:04 +00:00
|
|
|
TASK_INIT(&unp_gc_task, 0, unp_gc, NULL);
|
2004-06-10 21:34:38 +00:00
|
|
|
UNP_LOCK_INIT();
|
1998-05-15 20:11:40 +00:00
|
|
|
}
|
|
|
|
|
1995-12-14 09:55:16 +00:00
|
|
|
static int
|
2005-02-20 23:22:13 +00:00
|
|
|
unp_internalize(struct mbuf **controlp, struct thread *td)
|
1994-05-24 10:09:53 +00:00
|
|
|
{
|
2001-10-04 13:11:48 +00:00
|
|
|
struct mbuf *control = *controlp;
|
2001-09-12 08:38:13 +00:00
|
|
|
struct proc *p = td->td_proc;
|
2000-03-09 15:15:27 +00:00
|
|
|
struct filedesc *fdescp = p->p_fd;
|
2001-10-04 13:11:48 +00:00
|
|
|
struct cmsghdr *cm = mtod(control, struct cmsghdr *);
|
|
|
|
struct cmsgcred *cmcred;
|
|
|
|
struct file **rp;
|
|
|
|
struct file *fp;
|
|
|
|
struct timeval *tv;
|
|
|
|
int i, fd, *fdp;
|
|
|
|
void *data;
|
|
|
|
socklen_t clen = control->m_len, datalen;
|
|
|
|
int error, oldfds;
|
2000-03-09 15:15:27 +00:00
|
|
|
u_int newlen;
|
1994-05-24 10:09:53 +00:00
|
|
|
|
2004-08-19 01:45:16 +00:00
|
|
|
UNP_UNLOCK_ASSERT();
|
|
|
|
|
2001-10-04 13:11:48 +00:00
|
|
|
error = 0;
|
|
|
|
*controlp = NULL;
|
Add support to sendmsg()/recvmsg() for passing credentials between
processes using AF_LOCAL sockets. This hack is going to be used with
Secure RPC to duplicate a feature of STREAMS which has no real counterpart
in sockets (with STREAMS/TLI, you can apparently use t_getinfo() to learn
UID of a local process on the other side of a transport endpoint).
What happens is this: the client sets up a sendmsg() call with ancillary
data using the SCM_CREDS socket-level control message type. It does not
need to fill in the structure. When the kernel notices the data,
unp_internalize() fills in the cmsgcred structure with the sending
process' credentials (UID, EUID, GID, and ancillary groups). This data
is later delivered to the receiving process. The receiver can then
perform the following tests:
- Did the client send ancillary data?
o Yes, proceed.
o No, refuse to authenticate the client.
- Did the client send data of type SCM_CREDS?
o Yes, proceed.
o No, refuse to authenticate the client.
- Is the cmsgcred structure the right size?
o Yes, proceed.
o No, signal a possible error.
The receiver can now inspect the credential information and use it to
authenticate the client.
1997-03-21 16:12:32 +00:00
|
|
|
|
2001-10-04 13:11:48 +00:00
|
|
|
while (cm != NULL) {
|
|
|
|
if (sizeof(*cm) > clen || cm->cmsg_level != SOL_SOCKET
|
|
|
|
|| cm->cmsg_len > clen) {
|
|
|
|
error = EINVAL;
|
|
|
|
goto out;
|
|
|
|
}
|
|
|
|
|
|
|
|
data = CMSG_DATA(cm);
|
|
|
|
datalen = (caddr_t)cm + cm->cmsg_len - (caddr_t)data;
|
|
|
|
|
|
|
|
switch (cm->cmsg_type) {
|
|
|
|
/*
|
|
|
|
* Fill in credential information.
|
|
|
|
*/
|
|
|
|
case SCM_CREDS:
|
|
|
|
*controlp = sbcreatecontrol(NULL, sizeof(*cmcred),
|
|
|
|
SCM_CREDS, SOL_SOCKET);
|
|
|
|
if (*controlp == NULL) {
|
|
|
|
error = ENOBUFS;
|
|
|
|
goto out;
|
|
|
|
}
|
|
|
|
|
|
|
|
cmcred = (struct cmsgcred *)
|
|
|
|
CMSG_DATA(mtod(*controlp, struct cmsghdr *));
|
|
|
|
cmcred->cmcred_pid = p->p_pid;
|
2002-02-27 18:32:23 +00:00
|
|
|
cmcred->cmcred_uid = td->td_ucred->cr_ruid;
|
|
|
|
cmcred->cmcred_gid = td->td_ucred->cr_rgid;
|
|
|
|
cmcred->cmcred_euid = td->td_ucred->cr_uid;
|
|
|
|
cmcred->cmcred_ngroups = MIN(td->td_ucred->cr_ngroups,
|
1997-03-21 16:12:32 +00:00
|
|
|
CMGROUP_MAX);
|
2001-10-04 13:11:48 +00:00
|
|
|
for (i = 0; i < cmcred->cmcred_ngroups; i++)
|
|
|
|
cmcred->cmcred_groups[i] =
|
2002-02-27 18:32:23 +00:00
|
|
|
td->td_ucred->cr_groups[i];
|
2001-10-04 13:11:48 +00:00
|
|
|
break;
|
1997-03-21 16:12:32 +00:00
|
|
|
|
2001-10-04 13:11:48 +00:00
|
|
|
case SCM_RIGHTS:
|
|
|
|
oldfds = datalen / sizeof (int);
|
|
|
|
/*
|
2006-07-22 17:24:55 +00:00
|
|
|
* Check that all the FDs passed in refer to legal
|
|
|
|
* files. If not, reject the entire operation.
|
2001-10-04 13:11:48 +00:00
|
|
|
*/
|
|
|
|
fdp = data;
|
2002-01-13 11:58:06 +00:00
|
|
|
FILEDESC_LOCK(fdescp);
|
2001-10-04 13:11:48 +00:00
|
|
|
for (i = 0; i < oldfds; i++) {
|
|
|
|
fd = *fdp++;
|
|
|
|
if ((unsigned)fd >= fdescp->fd_nfiles ||
|
|
|
|
fdescp->fd_ofiles[fd] == NULL) {
|
2002-01-13 11:58:06 +00:00
|
|
|
FILEDESC_UNLOCK(fdescp);
|
2001-10-04 13:11:48 +00:00
|
|
|
error = EBADF;
|
|
|
|
goto out;
|
|
|
|
}
|
2003-02-15 06:04:55 +00:00
|
|
|
fp = fdescp->fd_ofiles[fd];
|
|
|
|
if (!(fp->f_ops->fo_flags & DFLAG_PASSABLE)) {
|
|
|
|
FILEDESC_UNLOCK(fdescp);
|
|
|
|
error = EOPNOTSUPP;
|
|
|
|
goto out;
|
|
|
|
}
|
|
|
|
|
2001-10-04 13:11:48 +00:00
|
|
|
}
|
|
|
|
/*
|
2006-07-22 17:24:55 +00:00
|
|
|
* Now replace the integer FDs with pointers to the
|
|
|
|
* associated global file table entry.
|
2001-10-04 13:11:48 +00:00
|
|
|
*/
|
|
|
|
newlen = oldfds * sizeof(struct file *);
|
|
|
|
*controlp = sbcreatecontrol(NULL, newlen,
|
|
|
|
SCM_RIGHTS, SOL_SOCKET);
|
|
|
|
if (*controlp == NULL) {
|
2002-01-13 11:58:06 +00:00
|
|
|
FILEDESC_UNLOCK(fdescp);
|
2001-10-04 13:11:48 +00:00
|
|
|
error = E2BIG;
|
|
|
|
goto out;
|
|
|
|
}
|
2000-03-09 15:15:27 +00:00
|
|
|
|
2001-10-04 13:11:48 +00:00
|
|
|
fdp = data;
|
|
|
|
rp = (struct file **)
|
|
|
|
CMSG_DATA(mtod(*controlp, struct cmsghdr *));
|
|
|
|
for (i = 0; i < oldfds; i++) {
|
|
|
|
fp = fdescp->fd_ofiles[*fdp++];
|
|
|
|
*rp++ = fp;
|
2002-01-13 11:58:06 +00:00
|
|
|
FILE_LOCK(fp);
|
2001-10-04 13:11:48 +00:00
|
|
|
fp->f_count++;
|
|
|
|
fp->f_msgcount++;
|
2002-01-13 11:58:06 +00:00
|
|
|
FILE_UNLOCK(fp);
|
2001-10-04 13:11:48 +00:00
|
|
|
unp_rights++;
|
|
|
|
}
|
2002-01-13 11:58:06 +00:00
|
|
|
FILEDESC_UNLOCK(fdescp);
|
2001-10-04 13:11:48 +00:00
|
|
|
break;
|
2000-03-09 15:15:27 +00:00
|
|
|
|
2001-10-04 13:11:48 +00:00
|
|
|
case SCM_TIMESTAMP:
|
|
|
|
*controlp = sbcreatecontrol(NULL, sizeof(*tv),
|
|
|
|
SCM_TIMESTAMP, SOL_SOCKET);
|
|
|
|
if (*controlp == NULL) {
|
|
|
|
error = ENOBUFS;
|
|
|
|
goto out;
|
|
|
|
}
|
|
|
|
tv = (struct timeval *)
|
|
|
|
CMSG_DATA(mtod(*controlp, struct cmsghdr *));
|
|
|
|
microtime(tv);
|
|
|
|
break;
|
|
|
|
|
|
|
|
default:
|
|
|
|
error = EINVAL;
|
|
|
|
goto out;
|
2000-03-09 15:15:27 +00:00
|
|
|
}
|
2001-10-04 13:11:48 +00:00
|
|
|
|
|
|
|
controlp = &(*controlp)->m_next;
|
|
|
|
|
|
|
|
if (CMSG_SPACE(datalen) < clen) {
|
|
|
|
clen -= CMSG_SPACE(datalen);
|
|
|
|
cm = (struct cmsghdr *)
|
|
|
|
((caddr_t)cm + CMSG_SPACE(datalen));
|
|
|
|
} else {
|
|
|
|
clen = 0;
|
|
|
|
cm = NULL;
|
2000-03-09 15:15:27 +00:00
|
|
|
}
|
1994-05-24 10:09:53 +00:00
|
|
|
}
|
2001-10-04 13:11:48 +00:00
|
|
|
|
|
|
|
out:
|
|
|
|
m_freem(control);
|
|
|
|
|
|
|
|
return (error);
|
1994-05-24 10:09:53 +00:00
|
|
|
}
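The SCM_RIGHTS branch of unp_internalize() above validates every descriptor in the batch before touching any reference counts, so one bad entry rejects the entire operation with no side effects. That validate-then-commit pattern can be sketched in isolation; this is a hypothetical userland model (the table and names are invented, not kernel structures):

```c
#define TABLE_SIZE 8

/* Toy descriptor table: slots hold refcounts, -1 marks an empty slot. */
struct table {
	int slots[TABLE_SIZE];
};

/*
 * Bump the refcount of every slot named in idx[], or fail without
 * touching anything.  Returns 0 on success, -1 if any entry is bad.
 */
static int
hold_all(struct table *t, const int *idx, int n)
{
	int i;

	/* Pass 1: validate the whole batch. */
	for (i = 0; i < n; i++)
		if (idx[i] < 0 || idx[i] >= TABLE_SIZE ||
		    t->slots[idx[i]] < 0)
			return (-1);

	/* Pass 2: commit, now that failure is impossible. */
	for (i = 0; i < n; i++)
		t->slots[idx[i]]++;
	return (0);
}
```

A failed batch leaves every refcount unchanged, which is what makes the later commit loop safe to run without per-entry error handling.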
|
|
|
|
|
2005-04-13 00:01:46 +00:00
|
|
|
struct mbuf *
|
|
|
|
unp_addsockcred(struct thread *td, struct mbuf *control)
|
|
|
|
{
|
2006-06-13 14:33:35 +00:00
|
|
|
struct mbuf *m, *n, *n_prev;
|
2005-04-13 00:01:46 +00:00
|
|
|
struct sockcred *sc;
|
2006-06-13 14:33:35 +00:00
|
|
|
const struct cmsghdr *cm;
|
2005-04-13 00:01:46 +00:00
|
|
|
int ngroups;
|
|
|
|
int i;
|
|
|
|
|
|
|
|
ngroups = MIN(td->td_ucred->cr_ngroups, CMGROUP_MAX);
|
|
|
|
|
|
|
|
m = sbcreatecontrol(NULL, SOCKCREDSIZE(ngroups), SCM_CREDS, SOL_SOCKET);
|
|
|
|
if (m == NULL)
|
|
|
|
return (control);
|
|
|
|
|
|
|
|
sc = (struct sockcred *) CMSG_DATA(mtod(m, struct cmsghdr *));
|
|
|
|
sc->sc_uid = td->td_ucred->cr_ruid;
|
|
|
|
sc->sc_euid = td->td_ucred->cr_uid;
|
|
|
|
sc->sc_gid = td->td_ucred->cr_rgid;
|
|
|
|
sc->sc_egid = td->td_ucred->cr_gid;
|
|
|
|
sc->sc_ngroups = ngroups;
|
|
|
|
for (i = 0; i < sc->sc_ngroups; i++)
|
|
|
|
sc->sc_groups[i] = td->td_ucred->cr_groups[i];
|
|
|
|
|
|
|
|
/*
|
2006-07-22 17:24:55 +00:00
|
|
|
* Unlink any existing SCM_CREDS control messages (struct cmsgcred),
|
|
|
|
* since the SCM_CREDS message just created (struct sockcred) has a
|
|
|
|
* different format.
|
2005-04-13 00:01:46 +00:00
|
|
|
*/
|
2006-06-13 14:33:35 +00:00
|
|
|
if (control != NULL)
|
|
|
|
for (n = control, n_prev = NULL; n != NULL;) {
|
|
|
|
cm = mtod(n, struct cmsghdr *);
|
|
|
|
if (cm->cmsg_level == SOL_SOCKET &&
|
|
|
|
cm->cmsg_type == SCM_CREDS) {
|
|
|
|
if (n_prev == NULL)
|
|
|
|
control = n->m_next;
|
|
|
|
else
|
|
|
|
n_prev->m_next = n->m_next;
|
|
|
|
n = m_free(n);
|
|
|
|
} else {
|
|
|
|
n_prev = n;
|
|
|
|
n = n->m_next;
|
|
|
|
}
|
|
|
|
}
|
|
|
|
|
|
|
|
/* Prepend it to the head. */
|
|
|
|
m->m_next = control;
|
|
|
|
|
|
|
|
return (m);
|
2005-04-13 00:01:46 +00:00
|
|
|
}
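unp_addsockcred() above walks a singly linked chain with a trailing pointer, unlinks every node matching a predicate, and then prepends its replacement. The same unlink-then-prepend idiom can be shown on a plain list; this is an illustrative sketch with invented types (not mbufs):

```c
#include <stdlib.h>

struct node {
	int	     tag;
	struct node *next;
};

/*
 * Remove (and free) every node whose tag matches strip_tag, keeping a
 * trailing pointer so head and interior unlinks share one loop, then
 * prepend the fresh node and return the new head.
 */
static struct node *
strip_and_prepend(struct node *head, int strip_tag, struct node *fresh)
{
	struct node *n = head, *prev = NULL;

	while (n != NULL) {
		if (n->tag == strip_tag) {
			struct node *dead = n;

			if (prev == NULL)
				head = n->next;
			else
				prev->next = n->next;
			n = n->next;
			free(dead);
		} else {
			prev = n;
			n = n->next;
		}
	}
	fresh->next = head;	/* prepend the replacement */
	return (fresh);
}

/* Build 1->2->1->3, strip tag 1, prepend 9; encode the result as digits. */
static int
demo_tags(void)
{
	struct node *head = NULL, *fresh, *n;
	int tags[] = { 3, 1, 2, 1 };	/* pushed in reverse */
	int sum = 0;
	size_t i;

	for (i = 0; i < sizeof(tags) / sizeof(tags[0]); i++) {
		n = malloc(sizeof(*n));
		n->tag = tags[i];
		n->next = head;
		head = n;
	}
	fresh = malloc(sizeof(*fresh));
	fresh->tag = 9;
	head = strip_and_prepend(head, 1, fresh);
	for (n = head; n != NULL; n = n->next)
		sum = sum * 10 + n->tag;
	return (sum);	/* 9, 2, 3 encoded as 923 */
}
```

The single loop handles both head and interior removals because prev is NULL exactly when the match sits at the head.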
|
|
|
|
|
2004-08-25 21:24:36 +00:00
|
|
|
/*
|
2005-11-10 16:06:04 +00:00
|
|
|
* unp_defer indicates whether additional work has been deferred for a future
|
|
|
|
* pass through unp_gc(). It is thread local and does not require explicit
|
|
|
|
* synchronization.
|
2004-08-25 21:24:36 +00:00
|
|
|
*/
|
2005-11-10 16:06:04 +00:00
|
|
|
static int unp_defer;
|
|
|
|
|
|
|
|
static int unp_taskcount;
|
|
|
|
SYSCTL_INT(_net_local, OID_AUTO, taskcount, CTLFLAG_RD, &unp_taskcount, 0, "");
|
|
|
|
|
|
|
|
static int unp_recycled;
|
|
|
|
SYSCTL_INT(_net_local, OID_AUTO, recycled, CTLFLAG_RD, &unp_recycled, 0, "");
|
1994-05-24 10:09:53 +00:00
|
|
|
|
1995-12-14 09:55:16 +00:00
|
|
|
static void
|
Correct a number of serious and closely related bugs in the UNIX domain
socket file descriptor garbage collection code, which is intended to
detect and clear cycles of orphaned file descriptors that are "in-flight"
in a socket when that socket is closed before they are received. The
algorithm present was both run at poor times (resulting in recursion and
reentrance), and also buggy in the presence of parallelism. In order to
fix these problems, make the following changes:
- When there are in-flight sockets and a UNIX domain socket is destroyed,
asynchronously schedule the garbage collector, rather than running it
  synchronously in the current context. This avoids lock order reversals,
  deadlocks, and similar issues when the garbage collection code reenters
  the UNIX domain socket code. Run the code asynchronously in a task
  queue.
- In the garbage collector, when skipping file descriptors that have
entered a closing state (i.e., have f_count == 0), re-test the FDEFER
  flag, and decrement unp_defer. As file descriptors can now transition
  to a closed state while the garbage collector is running, it is no
  longer the case that unp_defer will remain an accurate count of
  deferred sockets in the mark portion of the GC algorithm. Otherwise,
  the garbage collector will loop forever waiting for unp_defer to reach
  zero, which it never will, since it is skipping file descriptors that
  were marked in an earlier pass but have since closed.
- Acquire the UNIX domain socket subsystem lock in unp_discard() when
  modifying the unp_rights counter; otherwise a read/write race is risked
  with other threads also manipulating the counter.
While here:
- Remove #if 0'd code regarding acquiring the socket buffer sleep lock in
  the garbage collector; this is not required, as the socket buffer receive
  lock suffices to protect scanning the receive buffer for in-flight file
  descriptors.
- Annotate that the description of the garbage collector implementation
is increasingly inaccurate and needs to be updated.
- Add counters of the number of deferred garbage collections and recycled
  file descriptors. These will be removed; they are here temporarily for
  debugging purposes.
With these changes in place, the unp_passfd regression test now passes
consistently on UP and SMP systems during extended runs, whereas before it
hung quickly or panicked, depending on which bug was triggered.
Reported by: Philip Kizer <pckizer at nostrum dot com>
MFC after: 2 weeks
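The mark phase described above can be modeled in a few lines of userspace C. This is a hedged sketch, not the kernel code: `struct file`, the flag values, and `mark_pass()` are simplified stand-ins for the real `filehead` scan, but the `f_count == 0` branch shows why decrementing `unp_defer` for stale FDEFER flags matters for termination:

```c
#include <assert.h>
#include <stddef.h>

#define FMARK	0x1	/* descriptor known externally accessible */
#define FDEFER	0x2	/* descriptor deferred to the next mark pass */

struct file {
	int f_count;	/* total references */
	int f_msgcount;	/* references held by in-flight messages */
	int f_gcflag;
};

/*
 * One mark pass over a table of files.  A file whose references are not
 * all from in-flight messages is externally accessible and gets FMARK.
 * Files that closed since the last pass (f_count == 0) must have any
 * stale FDEFER cleared, with unp_defer decremented, or the caller's
 * "loop until unp_defer reaches zero" termination condition is never
 * met -- this is the bug the commit fixes.
 */
static void
mark_pass(struct file *fps, size_t n, int *unp_defer)
{
	struct file *fp;
	size_t i;

	for (i = 0; i < n; i++) {
		fp = &fps[i];
		if (fp->f_count == 0) {		/* closing: drop stale state */
			if (fp->f_gcflag & FDEFER)
				(*unp_defer)--;
			fp->f_gcflag &= ~(FMARK | FDEFER);
			continue;
		}
		if (fp->f_gcflag & FDEFER) {	/* deferred earlier: process */
			fp->f_gcflag &= ~FDEFER;
			(*unp_defer)--;
		} else if (fp->f_gcflag & FMARK) {
			continue;		/* already marked: skip */
		} else if (fp->f_count == fp->f_msgcount) {
			continue;		/* only in-flight refs: skip */
		}
		fp->f_gcflag |= FMARK;		/* externally accessible */
	}
}
```

Without the first branch, a descriptor deferred in an earlier pass and closed since would leave `unp_defer` permanently above zero, and the surrounding do/while loop would never exit.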
2005-11-10 16:06:04 +00:00
|
|
|
unp_gc(__unused void *arg, int pending)
|
1994-05-24 10:09:53 +00:00
|
|
|
{
|
2005-02-20 23:22:13 +00:00
|
|
|
struct file *fp, *nextfp;
|
|
|
|
struct socket *so;
|
1994-05-24 10:09:53 +00:00
|
|
|
struct file **extra_ref, **fpp;
|
|
|
|
int nunref, i;
|
2004-07-02 07:40:10 +00:00
|
|
|
int nfiles_snap;
|
|
|
|
int nfiles_slack = 20;
|
1994-05-24 10:09:53 +00:00
|
|
|
|
2005-11-10 16:06:04 +00:00
|
|
|
unp_taskcount++;
|
1994-05-24 10:09:53 +00:00
|
|
|
unp_defer = 0;
|
2004-01-11 19:48:19 +00:00
|
|
|
/*
|
2006-07-22 17:24:55 +00:00
|
|
|
* Before going through all this, set all FDs to be NOT deferred and
|
|
|
|
* NOT externally accessible.
|
1996-12-05 22:41:13 +00:00
|
|
|
*/
|
2002-01-13 11:58:06 +00:00
|
|
|
sx_slock(&filelist_lock);
|
1999-11-16 10:56:05 +00:00
|
|
|
LIST_FOREACH(fp, &filehead, f_list)
|
2002-01-13 11:58:06 +00:00
|
|
|
fp->f_gcflag &= ~(FMARK|FDEFER);
|
1994-05-24 10:09:53 +00:00
|
|
|
do {
|
2005-10-31 15:41:29 +00:00
|
|
|
KASSERT(unp_defer >= 0, ("unp_gc: unp_defer %d", unp_defer));
|
1999-11-16 10:56:05 +00:00
|
|
|
LIST_FOREACH(fp, &filehead, f_list) {
|
2002-01-13 11:58:06 +00:00
|
|
|
FILE_LOCK(fp);
|
1996-12-05 22:41:13 +00:00
|
|
|
/*
|
2005-11-10 16:06:04 +00:00
|
|
|
* If the file is not open, skip it -- could be a
|
|
|
|
* file in the process of being opened, or in the
|
|
|
|
* process of being closed. If the file is
|
|
|
|
* "closing", it may have been marked for deferred
|
|
|
|
* consideration. Clear the flag now if so.
|
1996-12-05 22:41:13 +00:00
|
|
|
*/
|
2002-01-13 11:58:06 +00:00
|
|
|
if (fp->f_count == 0) {
|
2005-11-10 16:06:04 +00:00
|
|
|
if (fp->f_gcflag & FDEFER)
|
|
|
|
unp_defer--;
|
|
|
|
fp->f_gcflag &= ~(FMARK|FDEFER);
|
2002-01-13 11:58:06 +00:00
|
|
|
FILE_UNLOCK(fp);
|
1994-05-24 10:09:53 +00:00
|
|
|
continue;
|
2002-01-13 11:58:06 +00:00
|
|
|
}
|
1996-12-05 22:41:13 +00:00
|
|
|
/*
|
2006-07-22 17:24:55 +00:00
|
|
|
* If we already marked it as 'defer' in a previous
|
|
|
|
* pass, then try to process it this time and un-mark
|
|
|
|
* it.
|
1996-12-05 22:41:13 +00:00
|
|
|
*/
|
2002-01-13 11:58:06 +00:00
|
|
|
if (fp->f_gcflag & FDEFER) {
|
|
|
|
fp->f_gcflag &= ~FDEFER;
|
1994-05-24 10:09:53 +00:00
|
|
|
unp_defer--;
|
|
|
|
} else {
|
1996-12-05 22:41:13 +00:00
|
|
|
/*
|
|
|
|
* If it's not deferred, then check if it's
|
|
|
|
* already marked; if so, skip it.
|
|
|
|
*/
|
2002-01-13 11:58:06 +00:00
|
|
|
if (fp->f_gcflag & FMARK) {
|
|
|
|
FILE_UNLOCK(fp);
|
1994-05-24 10:09:53 +00:00
|
|
|
continue;
|
2002-01-13 11:58:06 +00:00
|
|
|
}
|
2004-01-11 19:48:19 +00:00
|
|
|
/*
|
2006-07-22 17:24:55 +00:00
|
|
|
* If all references are from messages in
|
|
|
|
* transit, then skip it; it's not externally
|
|
|
|
* accessible.
|
2004-01-11 19:48:19 +00:00
|
|
|
*/
|
2002-01-13 11:58:06 +00:00
|
|
|
if (fp->f_count == fp->f_msgcount) {
|
|
|
|
FILE_UNLOCK(fp);
|
1994-05-24 10:09:53 +00:00
|
|
|
continue;
|
2002-01-13 11:58:06 +00:00
|
|
|
}
|
2004-01-11 19:48:19 +00:00
|
|
|
/*
|
1996-12-05 22:41:13 +00:00
|
|
|
* If it got this far then it must be
|
|
|
|
* externally accessible.
|
|
|
|
*/
|
2002-01-13 11:58:06 +00:00
|
|
|
fp->f_gcflag |= FMARK;
|
1994-05-24 10:09:53 +00:00
|
|
|
}
|
1996-12-05 22:41:13 +00:00
|
|
|
/*
|
2006-07-22 17:24:55 +00:00
|
|
|
* Either it was deferred, or it is externally
|
|
|
|
* accessible and not already marked so. Now check
|
|
|
|
* if it is possibly one of OUR sockets.
|
2004-01-11 19:48:19 +00:00
|
|
|
*/
|
1994-05-24 10:09:53 +00:00
|
|
|
if (fp->f_type != DTYPE_SOCKET ||
|
2003-01-13 00:33:17 +00:00
|
|
|
(so = fp->f_data) == NULL) {
|
2002-01-13 11:58:06 +00:00
|
|
|
FILE_UNLOCK(fp);
|
1994-05-24 10:09:53 +00:00
|
|
|
continue;
|
2002-01-13 11:58:06 +00:00
|
|
|
}
|
|
|
|
FILE_UNLOCK(fp);
|
1995-05-11 00:13:26 +00:00
|
|
|
if (so->so_proto->pr_domain != &localdomain ||
|
1994-05-24 10:09:53 +00:00
|
|
|
(so->so_proto->pr_flags&PR_RIGHTS) == 0)
|
|
|
|
continue;
|
1996-12-05 22:41:13 +00:00
|
|
|
/*
|
2006-07-22 17:24:55 +00:00
|
|
|
* So, OK, it's one of our sockets and it IS
|
|
|
|
* externally accessible (or was deferred). Now we
|
|
|
|
* look to see if we hold any file descriptors in its
|
2004-01-11 19:48:19 +00:00
|
|
|
* message buffers. Follow those links and mark them
|
1996-12-05 22:41:13 +00:00
|
|
|
* as accessible too.
|
|
|
|
*/
|
2004-06-27 03:29:25 +00:00
|
|
|
SOCKBUF_LOCK(&so->so_rcv);
|
1994-05-24 10:09:53 +00:00
|
|
|
unp_scan(so->so_rcv.sb_mb, unp_mark);
|
2004-06-27 03:29:25 +00:00
|
|
|
SOCKBUF_UNLOCK(&so->so_rcv);
|
1994-05-24 10:09:53 +00:00
|
|
|
}
|
|
|
|
} while (unp_defer);
|
2002-01-13 11:58:06 +00:00
|
|
|
sx_sunlock(&filelist_lock);
|
1994-05-24 10:09:53 +00:00
|
|
|
/*
|
2005-11-10 16:06:04 +00:00
|
|
|
* XXXRW: The following comments need updating for a post-SMPng and
|
|
|
|
* deferred unp_gc() world, but are still generally accurate.
|
|
|
|
*
|
2006-07-22 17:24:55 +00:00
|
|
|
* We grab an extra reference to each of the file table entries that
|
|
|
|
* are not otherwise accessible and then free the rights that are
|
|
|
|
* stored in messages on them.
|
1994-05-24 10:09:53 +00:00
|
|
|
*
|
|
|
|
* The bug in the original code is a little tricky, so I'll describe
|
|
|
|
* what's wrong with it here.
|
|
|
|
*
|
|
|
|
* It is incorrect to simply unp_discard each entry for f_msgcount
|
|
|
|
* times -- consider the case of sockets A and B that contain
|
|
|
|
* references to each other. On a last close of some other socket,
|
|
|
|
* we trigger a gc since the number of outstanding rights (unp_rights)
|
2005-11-10 16:06:04 +00:00
|
|
|
* is non-zero. If during the sweep phase the gc code unp_discards,
|
1994-05-24 10:09:53 +00:00
|
|
|
* we end up doing a (full) closef on the descriptor. A closef on A
|
|
|
|
* results in the following chain. Closef calls soo_close, which
|
|
|
|
* calls soclose. Soclose calls first (through the switch
|
|
|
|
* uipc_usrreq) unp_detach, which re-invokes unp_gc. Unp_gc simply
|
2006-07-22 17:24:55 +00:00
|
|
|
* returns because the previous instance had set unp_gcing, and we
|
|
|
|
* return all the way back to soclose, which marks the socket with
|
|
|
|
* SS_NOFDREF, and then calls sofree. Sofree calls sorflush to free
|
|
|
|
* up the rights that are queued in messages on the socket A, i.e.,
|
|
|
|
* the reference on B. The sorflush calls via the dom_dispose switch
|
|
|
|
* unp_dispose, which unp_scans with unp_discard. This second
|
1994-05-24 10:09:53 +00:00
|
|
|
* instance of unp_discard just calls closef on B.
|
|
|
|
*
|
|
|
|
* Well, a similar chain occurs on B, resulting in a sorflush on B,
|
|
|
|
* which results in another closef on A. Unfortunately, A is already
|
|
|
|
* being closed, and the descriptor has already been marked with
|
|
|
|
* SS_NOFDREF, and soclose panics at this point.
|
|
|
|
*
|
|
|
|
* Here, we first take an extra reference to each inaccessible
|
2006-07-22 17:24:55 +00:00
|
|
|
* descriptor. Then, we call sorflush ourself, since we know it is a
|
|
|
|
* Unix domain socket anyhow. After we destroy all the rights
|
|
|
|
* carried in messages, we do a last closef to get rid of our extra
|
|
|
|
* reference. This is the last close, and the unp_detach etc will
|
|
|
|
* shut down the socket.
|
1994-05-24 10:09:53 +00:00
|
|
|
*
|
|
|
|
* 91/09/19, bsy@cs.cmu.edu
|
|
|
|
*/
|
2004-07-02 07:40:10 +00:00
|
|
|
again:
|
2004-12-01 09:22:26 +00:00
|
|
|
nfiles_snap = openfiles + nfiles_slack; /* some slack */
|
2004-07-02 07:40:10 +00:00
|
|
|
extra_ref = malloc(nfiles_snap * sizeof(struct file *), M_TEMP,
|
|
|
|
M_WAITOK);
|
2002-01-13 11:58:06 +00:00
|
|
|
sx_slock(&filelist_lock);
|
2004-12-01 09:22:26 +00:00
|
|
|
if (nfiles_snap < openfiles) {
|
2004-07-02 07:40:10 +00:00
|
|
|
sx_sunlock(&filelist_lock);
|
|
|
|
free(extra_ref, M_TEMP);
|
|
|
|
nfiles_slack += 20;
|
|
|
|
goto again;
|
|
|
|
}
|
2004-03-30 02:16:25 +00:00
|
|
|
for (nunref = 0, fp = LIST_FIRST(&filehead), fpp = extra_ref;
|
|
|
|
fp != NULL; fp = nextfp) {
|
1999-11-16 10:56:05 +00:00
|
|
|
nextfp = LIST_NEXT(fp, f_list);
|
2002-01-13 11:58:06 +00:00
|
|
|
FILE_LOCK(fp);
|
2004-01-11 19:48:19 +00:00
|
|
|
/*
|
1996-12-05 22:41:13 +00:00
|
|
|
* If it's not open, skip it
|
|
|
|
*/
|
2002-01-13 11:58:06 +00:00
|
|
|
if (fp->f_count == 0) {
|
|
|
|
FILE_UNLOCK(fp);
|
1994-05-24 10:09:53 +00:00
|
|
|
continue;
|
2002-01-13 11:58:06 +00:00
|
|
|
}
|
2004-01-11 19:48:19 +00:00
|
|
|
/*
|
1996-12-05 22:41:13 +00:00
|
|
|
* If all refs are from msgs, and it's not marked accessible
|
2006-07-22 17:24:55 +00:00
|
|
|
* then it must be referenced from some unreachable cycle of
|
|
|
|
* (shut-down) FDs, so include it in our list of FDs to
|
|
|
|
* remove.
|
1996-12-05 22:41:13 +00:00
|
|
|
*/
|
2002-01-13 11:58:06 +00:00
|
|
|
if (fp->f_count == fp->f_msgcount && !(fp->f_gcflag & FMARK)) {
|
1994-05-24 10:09:53 +00:00
|
|
|
*fpp++ = fp;
|
|
|
|
nunref++;
|
|
|
|
fp->f_count++;
|
|
|
|
}
|
2002-01-13 11:58:06 +00:00
|
|
|
FILE_UNLOCK(fp);
|
1994-05-24 10:09:53 +00:00
|
|
|
}
|
2002-01-13 11:58:06 +00:00
|
|
|
sx_sunlock(&filelist_lock);
|
2004-01-11 19:48:19 +00:00
|
|
|
/*
|
2006-07-22 17:24:55 +00:00
|
|
|
* For each FD on our hit list, do the following two things:
|
1996-12-05 22:41:13 +00:00
|
|
|
*/
|
1999-01-21 08:29:12 +00:00
|
|
|
for (i = nunref, fpp = extra_ref; --i >= 0; ++fpp) {
|
|
|
|
struct file *tfp = *fpp;
|
2002-01-13 11:58:06 +00:00
|
|
|
FILE_LOCK(tfp);
|
2003-01-12 01:37:13 +00:00
|
|
|
if (tfp->f_type == DTYPE_SOCKET &&
|
2003-01-13 00:33:17 +00:00
|
|
|
tfp->f_data != NULL) {
|
2002-01-13 11:58:06 +00:00
|
|
|
FILE_UNLOCK(tfp);
|
2003-01-13 00:33:17 +00:00
|
|
|
sorflush(tfp->f_data);
|
2004-01-11 19:48:19 +00:00
|
|
|
} else {
|
2002-01-13 11:58:06 +00:00
|
|
|
FILE_UNLOCK(tfp);
|
2004-01-11 19:48:19 +00:00
|
|
|
}
|
1999-01-21 08:29:12 +00:00
|
|
|
}
|
2005-11-10 16:06:04 +00:00
|
|
|
for (i = nunref, fpp = extra_ref; --i >= 0; ++fpp) {
|
2001-09-12 08:38:13 +00:00
|
|
|
closef(*fpp, (struct thread *) NULL);
|
2005-11-10 16:06:04 +00:00
|
|
|
unp_recycled++;
|
|
|
|
}
|
2002-06-28 23:17:36 +00:00
|
|
|
free(extra_ref, M_TEMP);
|
1994-05-24 10:09:53 +00:00
|
|
|
}
|
|
|
|
|
1994-05-25 09:21:21 +00:00
|
|
|
void
|
2005-02-20 23:22:13 +00:00
|
|
|
unp_dispose(struct mbuf *m)
|
1994-05-24 10:09:53 +00:00
|
|
|
{
|
1997-02-10 02:22:35 +00:00
|
|
|
|
1994-05-24 10:09:53 +00:00
|
|
|
if (m)
|
|
|
|
unp_scan(m, unp_discard);
|
|
|
|
}
|
|
|
|
|
2001-08-17 22:01:18 +00:00
|
|
|
static int
|
2005-10-30 19:44:40 +00:00
|
|
|
unp_listen(struct socket *so, struct unpcb *unp, int backlog,
|
|
|
|
struct thread *td)
|
2001-08-17 22:01:18 +00:00
|
|
|
{
|
In the current world order, solisten() implements the state transition of
a socket from a regular socket to a listening socket able to accept new
connections. As part of this state transition, solisten() calls into the
protocol to update protocol-layer state. There were several bugs in this
implementation that could result in a race wherein a TCP SYN received
in the interval between the protocol state transition and the shortly
following socket layer transition would result in a panic in the TCP code,
as the socket would be in the TCPS_LISTEN state, but the socket would not
have the SO_ACCEPTCONN flag set.
This change does the following:
- Pushes the socket state transition from the socket layer solisten()
to socket "library" routines called from the protocol. This permits
the socket routines to be called while holding the protocol mutexes,
preventing a race exposing the incomplete socket state transition to TCP
after the TCP state transition has completed. The check for a socket
layer state transition is performed by solisten_proto_check(), and the
actual transition is performed by solisten_proto().
- Holds the socket lock for the duration of the socket state test and set,
and over the protocol layer state transition, which is now possible as
the socket lock is acquired by the protocol layer, rather than vice
versa. This prevents additional state related races in the socket
layer.
This permits the dual transition of socket layer and protocol layer state
to occur while holding locks for both layers, making the two changes
  atomic with respect to one another. Similar changes are likely required
elsewhere in the socket/protocol code.
Reported by: Peter Holm <peter@holm.cc>
Review and fixes from: emax, Antoine Brodin <antoine.brodin@laposte.net>
Philosophical head nod: gnn
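The locking pattern this change describes — a socket-layer check and commit performed while the protocol holds the lock — can be sketched roughly as below. This is a userspace model under stated assumptions: a held-flag with assertions stands in for the real socket lock, and the names mirror the solisten_proto_check()/solisten_proto() split named above; error handling and protocol state are simplified:

```c
#include <assert.h>
#include <errno.h>

#define SO_ACCEPTCONN	0x2

struct socket {
	int so_lock_held;	/* stand-in for the real socket lock */
	int so_options;
	int so_error;
};

static void
so_lock(struct socket *so)
{
	assert(so->so_lock_held == 0);
	so->so_lock_held = 1;
}

static void
so_unlock(struct socket *so)
{
	assert(so->so_lock_held == 1);
	so->so_lock_held = 0;
}

/* Socket-layer check: may this socket become a listener?  Runs with
 * the lock held, called from the protocol. */
static int
solisten_proto_check(struct socket *so)
{
	assert(so->so_lock_held);
	return (so->so_error == 0);	/* simplified eligibility test */
}

/* Socket-layer commit: set the flag while the lock is still held. */
static void
solisten_proto(struct socket *so)
{
	assert(so->so_lock_held);
	so->so_options |= SO_ACCEPTCONN;
}

/*
 * Protocol-layer listen: take the lock once, then check, move the
 * protocol state, and move the socket state -- atomically, so a peer
 * can never observe "protocol listening" without SO_ACCEPTCONN set.
 */
static int
proto_listen(struct socket *so, int *proto_state)
{
	int error = 0;

	so_lock(so);
	if (!solisten_proto_check(so))
		error = EINVAL;
	else {
		*proto_state = 1;	/* e.g., TCPS_LISTEN */
		solisten_proto(so);
	}
	so_unlock(so);
	return (error);
}
```

Because both transitions happen inside one lock/unlock pair, the race window between "protocol listening" and "socket accepting" is closed.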
2005-02-21 21:58:17 +00:00
|
|
|
int error;
|
|
|
|
|
Introduce a subsystem lock around UNIX domain sockets in order to protect
global and allocated variables. This strategy is derived from work
originally developed by BSDi for BSD/OS, and applied to FreeBSD by Sam
Leffler:
- Add unp_mtx, a global mutex which will protect all UNIX domain socket
related variables, structures, etc.
- Add UNP_LOCK(), UNP_UNLOCK(), UNP_LOCK_ASSERT() macros.
- Acquire unp_mtx on entering most UNIX domain socket code,
drop/re-acquire around calls into VFS, and release it on return.
- Avoid performing sodupsockaddr() while holding the mutex, so in general
move to allocating storage before acquiring the mutex to copy the data.
- Make a stack copy of the xucred rather than copying out while holding
unp_mtx. Copy the peer credential out after releasing the mutex.
- Add additional assertions of vnode locks following VOP_CREATE().
A few notes:
- Use of an sx lock for the file list mutex may cause problems with regard
to unp_mtx when garbage collection passes file descriptors.
- The locking in unp_pcblist() for sysctl monitoring is correct subject to
the unpcb zone not returning memory for reuse by other subsystems
(consistent with similar existing concerns).
- Sam's version of this change, as with the BSD/OS version, made use of
both a global lock and per-unpcb locks. However, in practice, the
global lock covered all accesses, so I have simplified out the unpcb
locks in the interest of getting this merged faster (reducing the
overhead but not sacrificing granularity in most cases). We will want
to explore possibilities for improving lock granularity in this code in
the future.
Submitted by: sam
Sponsored by: FreeBSD Foundation
Obtained from: BSD/OS 5 snapshot provided by BSDi
2004-06-10 21:34:38 +00:00
|
|
|
UNP_LOCK_ASSERT();
|
2001-08-17 22:01:18 +00:00
|
|
|
|
2005-02-21 21:58:17 +00:00
|
|
|
SOCK_LOCK(so);
|
|
|
|
error = solisten_proto_check(so);
|
|
|
|
if (error == 0) {
|
|
|
|
cru2x(td->td_ucred, &unp->unp_peercred);
|
|
|
|
unp->unp_flags |= UNP_HAVEPCCACHED;
|
2005-10-30 19:44:40 +00:00
|
|
|
solisten_proto(so, backlog);
|
2005-02-21 21:58:17 +00:00
|
|
|
}
|
|
|
|
SOCK_UNLOCK(so);
|
|
|
|
return (error);
|
2001-08-17 22:01:18 +00:00
|
|
|
}
|
|
|
|
|
1995-12-14 09:55:16 +00:00
|
|
|
static void
|
2005-02-20 23:22:13 +00:00
|
|
|
unp_scan(struct mbuf *m0, void (*op)(struct file *))
|
1994-05-24 10:09:53 +00:00
|
|
|
{
|
2001-10-04 13:11:48 +00:00
|
|
|
struct mbuf *m;
|
|
|
|
struct file **rp;
|
|
|
|
struct cmsghdr *cm;
|
|
|
|
void *data;
|
|
|
|
int i;
|
|
|
|
socklen_t clen, datalen;
|
1994-05-24 10:09:53 +00:00
|
|
|
int qfds;
|
|
|
|
|
2004-03-30 02:16:25 +00:00
|
|
|
while (m0 != NULL) {
|
2001-10-04 13:11:48 +00:00
|
|
|
for (m = m0; m; m = m->m_next) {
|
2001-10-29 20:04:03 +00:00
|
|
|
if (m->m_type != MT_CONTROL)
|
2001-10-04 13:11:48 +00:00
|
|
|
continue;
|
|
|
|
|
|
|
|
cm = mtod(m, struct cmsghdr *);
|
|
|
|
clen = m->m_len;
|
|
|
|
|
|
|
|
while (cm != NULL) {
|
|
|
|
if (sizeof(*cm) > clen || cm->cmsg_len > clen)
|
|
|
|
break;
|
|
|
|
|
|
|
|
data = CMSG_DATA(cm);
|
|
|
|
datalen = (caddr_t)cm + cm->cmsg_len
|
|
|
|
- (caddr_t)data;
|
|
|
|
|
|
|
|
if (cm->cmsg_level == SOL_SOCKET &&
|
|
|
|
cm->cmsg_type == SCM_RIGHTS) {
|
|
|
|
qfds = datalen / sizeof (struct file *);
|
|
|
|
rp = data;
|
|
|
|
for (i = 0; i < qfds; i++)
|
|
|
|
(*op)(*rp++);
|
|
|
|
}
|
|
|
|
|
|
|
|
if (CMSG_SPACE(datalen) < clen) {
|
|
|
|
clen -= CMSG_SPACE(datalen);
|
|
|
|
cm = (struct cmsghdr *)
|
|
|
|
((caddr_t)cm + CMSG_SPACE(datalen));
|
|
|
|
} else {
|
|
|
|
clen = 0;
|
|
|
|
cm = NULL;
|
|
|
|
}
|
1994-05-24 10:09:53 +00:00
|
|
|
}
|
2001-10-04 13:11:48 +00:00
|
|
|
}
|
1994-05-24 10:09:53 +00:00
|
|
|
m0 = m0->m_act;
|
|
|
|
}
|
|
|
|
}
|
|
|
|
|
1995-12-14 09:55:16 +00:00
|
|
|
static void
|
2005-02-20 23:22:13 +00:00
|
|
|
unp_mark(struct file *fp)
|
1994-05-24 10:09:53 +00:00
|
|
|
{
|
2002-01-13 11:58:06 +00:00
|
|
|
if (fp->f_gcflag & FMARK)
|
1994-05-24 10:09:53 +00:00
|
|
|
return;
|
|
|
|
unp_defer++;
|
2002-01-13 11:58:06 +00:00
|
|
|
fp->f_gcflag |= (FMARK|FDEFER);
|
1994-05-24 10:09:53 +00:00
|
|
|
}
|
|
|
|
|
1995-12-14 09:55:16 +00:00
|
|
|
static void
|
2005-02-20 23:22:13 +00:00
|
|
|
unp_discard(struct file *fp)
|
1994-05-24 10:09:53 +00:00
|
|
|
{
|
Correct a number of serious and closely related bugs in the UNIX domain
socket file descriptor garbage collection code, which is intended to
detect and clear cycles of orphaned file descriptors that are "in-flight"
in a socket when that socket is closed before they are received. The
algorithm present was both run at poor times (resulting in recursion and
reentrance), and also buggy in the presence of parallelism. In order to
fix these problems, make the following changes:
- When there are in-flight sockets and a UNIX domain socket is destroyed,
asynchronously schedule the garbage collector, rather than running it
synchronously in the current context. This avoids lock order issues
when the garbage collection code reenters the UNIX domain socket code,
avoiding lock order reversals, deadlocks, etc. Run the code
asynchronously in a task queue.
- In the garbage collector, when skipping file descriptors that have
entered a closing state (i.e., have f_count == 0), re-test the FDEFER
flag, and decrement unp_defer. As file descriptors can now transition
to a closed state, while the garbage collector is running, it is no
longer the case that unp_defer will remain an accurate count of
deferred sockets in the mark portion of the GC algorithm. Otherwise,
the garbage collector will loop waiting for unp_defer to reach
zero, which it will never do as it is skipping file descriptors that
were marked in an earlier pass, but now closed.
- Acquire the UNIX domain socket subsystem lock in unp_discard() when
modifying the unp_rights counter, or a read/write race is risked with
other threads also manipulating the counter.
While here:
- Remove #if 0'd code regarding acquiring the socket buffer sleep lock in
the garbage collector, this is not required as we are able to use the
socket buffer receive lock to protect scanning the receive buffer for
in-flight file descriptors on the socket buffer.
- Annotate that the description of the garbage collector implementation
is increasingly inaccurate and needs to be updated.
- Add counters of the number of deferred garbage collections and recycled
file descriptors. This will be removed and is here temporarily for
debugging purposes.
With these changes in place, the unp_passfd regression test now appears
to pass consistently on UP and SMP systems for extended runs,
whereas before it hung quickly or panicked, depending on which bug was
triggered.
Reported by: Philip Kizer <pckizer at nostrum dot com>
MFC after: 2 weeks
2005-11-10 16:06:04 +00:00
|
|
|
UNP_LOCK();
|
2002-01-13 11:58:06 +00:00
|
|
|
FILE_LOCK(fp);
|
1994-05-24 10:09:53 +00:00
|
|
|
fp->f_msgcount--;
|
|
|
|
unp_rights--;
|
2002-01-13 11:58:06 +00:00
|
|
|
FILE_UNLOCK(fp);
|
2005-11-10 16:06:04 +00:00
|
|
|
UNP_UNLOCK();
|
2001-09-12 08:38:13 +00:00
|
|
|
(void) closef(fp, (struct thread *)NULL);
|
1994-05-24 10:09:53 +00:00
|
|
|
}
|