Commit Graph

187 Commits

Author SHA1 Message Date
Robert Watson
b7e2f3ec76 Minor white space tweaks. 2006-08-13 23:16:59 +00:00
Robert Watson
e4445a031f Move definition of UNIX domain socket protosw and domain entries from
uipc_proto.c to uipc_usrreq.c, making localdomain static.  Remove
uipc_proto.c as it's no longer used.  With this change, UNIX domain
sockets are entirely encapsulated in uipc_usrreq.c.
2006-08-07 12:02:43 +00:00
Robert Watson
52b384621e Don't set pru_sosend, pru_soreceive, pru_sopoll to default values, as they
are already set to default values.
2006-08-06 10:39:21 +00:00
Robert Watson
f8b20fb6d6 Remove now unneeded ENOTCONN clause from SOCK_DGRAM side of uipc_send():
we have to check it regardless of the target address, so don't check it
twice.
2006-08-02 14:30:58 +00:00
Robert Watson
b5ff091431 Close a race that occurs when using sendto() to connect and send on a
UNIX domain socket at the same time as the remote host is closing the
new connections as quickly as they open.  Since the connect() and
send() paths are non-atomic with respect to another, it is possible
for the second thread's close() call to disconnect the two sockets
as connect() returns, leading to the consumer (which plans to send())
with a NULL kernel pointer to its proposed peer.  As a result, after
acquiring the UNIX domain socket subsystem lock, we need to revalidate
the connection pointers even though connect() has technically succeed,
and reurn an error to say that there's no connection on which to
perform the send.

We might want to rethink the specific errno number, perhaps ECONNRESET
would be better.

PR:		100940
Reported by:	Young Hyun <youngh at caida dot org>
MFC after:	2 weeks
MFC note:	Some adaptation will be required
2006-07-31 23:00:05 +00:00
Robert Watson
0075d85869 Remove call to soisdisconnected() in uipc_detach(), since it will already
have been invoked by uipc_close() or uipc_abort(), and the socket is in a
state of being torn down by the time we get to this point, so kqueue
state frobbed by soisdisconnected() is not available, so frobbing it will
result in a panic.

Reported by:	Munehiro Matsuda <haro at h4 dot dion dot ne dot jp>
2006-07-26 19:16:34 +00:00
Robert Watson
b0668f7151 soreceive_generic(), and sopoll_generic(). Add new functions sosend(),
soreceive(), and sopoll(), which are wrappers for pru_sosend,
pru_soreceive, and pru_sopoll, and are now used univerally by socket
consumers rather than either directly invoking the old so*() functions
or directly invoking the protocol switch method (about an even split
prior to this commit).

This completes an architectural change that was begun in 1996 to permit
protocols to provide substitute implementations, as now used by UDP.
Consumers now uniformly invoke sosend(), soreceive(), and sopoll() to
perform these operations on sockets -- in particular, distributed file
systems and socket system calls.

Architectural head nod:	sam, gnn, wollman
2006-07-24 15:20:08 +00:00
Robert Watson
ca948c5e93 Remove duplicate 'or'.
Submitted by:	ru
2006-07-23 21:01:09 +00:00
Robert Watson
f23929fbc5 Add additional comments to the top of the UNIX domain socket implementation
providing some high level pointers regarding the implementation.
2006-07-23 20:06:45 +00:00
Robert Watson
4f1f0ef523 Add two new unpcb flags, UNP_BINDING and UNP_CONNECTING, which will be
used to mark UNIX domain sockets as being in the process of binding or
connecting.  Use these to prevent simultaneous bind or connect
operations by multiple threads or processes on the same socket at the
same time, which closes race conditions present in the UNIX domain
socket implementation since inception.
2006-07-23 12:01:14 +00:00
Robert Watson
dd47f5ca9c Merge unp_bind() into uipc_bind(), as it is called only from uipc_bind(). 2006-07-23 11:02:12 +00:00
Robert Watson
6d32873c29 Since unp_attach() and unp_detach() are now called only from uipc_attach()
and uipc_detach(), merge them into their calling functions.
2006-07-23 10:25:28 +00:00
Robert Watson
7e711c3aae Move various UNIX socket global variables and sysctls from the middle of
the file to the top.
2006-07-23 10:19:04 +00:00
Robert Watson
f3f49bbbe8 In uipc_send() and uipc_rcvd(), store unp->unp_conn pointer in unp2
while working with the second unpcb to make the code more clear.
2006-07-22 18:41:42 +00:00
Robert Watson
1c381b19ff Re-wrap and other minor formatting and punctuation fixes for UNIX domain
socket comments.
2006-07-22 17:24:55 +00:00
Robert Watson
a152f8a361 Change semantics of socket close and detach. Add a new protocol switch
function, pru_close, to notify protocols that the file descriptor or
other consumer of a socket is closing the socket.  pru_abort is now a
notification of close also, and no longer detaches.  pru_detach is no
longer used to notify of close, and will be called during socket
tear-down by sofree() when all references to a socket evaporate after
an earlier call to abort or close the socket.  This means detach is now
an unconditional teardown of a socket, whereas previously sockets could
persist after detach of the protocol retained a reference.

This faciliates sharing mutexes between layers of the network stack as
the mutex is required during the checking and removal of references at
the head of sofree().  With this change, pru_detach can now assume that
the mutex will no longer be required by the socket layer after
completion, whereas before this was not necessarily true.

Reviewed by:	gnn
2006-07-21 17:11:15 +00:00
Robert Watson
337cc6b60e Reduce periods of simultaneous acquisition of various socket buffer
locks and the unplock during uipc_rcvd() and uipc_send() by caching
certain values from one structure while its locks are held, and
applying them to a second structure while its locks are held.  If
done carefully, this should be correct, and will reduce the amount
of work done with the global unp lock held.

Tested by:	kris (earlier version)
2006-07-11 21:49:54 +00:00
Robert Watson
e83b30bdcb Trim basically unused 'unp' in uipc_connect(). 2006-06-26 16:18:22 +00:00
Robert Watson
9a44cbf19c Remove unused (and ifdef'd) unp_abort() and unp_drain().
MFC after:	1 month
2006-06-16 22:11:49 +00:00
Maxim Konovalov
70df31f4de o There are two methods to get a process credentials over the unix
sockets:

1) A sender sends SCM_CREDS message to a reciever, struct cmsgcred;
2) A reciever sets LOCAL_CREDS socket option and gets sender
credentials in control message, struct sockcred.

Both methods use the same control message type SCM_CREDS with the
same control message level SOL_SOCKET, so they are indistinguishable
for the receiver.  A difference in struct cmsgcred and struct sockcred
layouts may lead to unwanted effects.

Now for sockets with LOCAL_CREDS option remove all previous linked
SCM_CREDS control messages and then add a control message with
struct sockcred so the process specifically asked for the peer
credentials by LOCAL_CREDS option always gets struct sockcred.

PR:		kern/90800
Submitted by:	Andrey Simonenko
Regres. tests:	tools/regression/sockets/unix_cmsg/
MFC after:	1 month
2006-06-13 14:33:35 +00:00
Maxim Konovalov
481f8fe85f Inherit LOCAL_CREDS option from listen socket for sockets returned
by accept(2).

PR:		kern/90644
Submitted by:	Andrey Simonenko
OK'ed by:	mdodd
Tested by:	NetBSD regress/sys/kern/unfdpass/unfdpass.c
MFC after:	1 month
2006-04-24 19:09:33 +00:00
Paul Saab
4f590175b7 Allow for nmbclusters and maxsockets to be increased via sysctl.
An eventhandler is used to update all the various zones that depend
on these values.
2006-04-21 09:25:40 +00:00
Robert Watson
bc725eafc7 Chance protocol switch method pru_detach() so that it returns void
rather than an error.  Detaches do not "fail", they other occur or
the protocol flags SS_PROTOREF to take ownership of the socket.

soclose() no longer looks at so_pcb to see if it's NULL, relying
entirely on the protocol to decide whether it's time to free the
socket or not using SS_PROTOREF.  so_pcb is now entirely owned and
managed by the protocol code.  Likewise, no longer test so_pcb in
other socket functions, such as soreceive(), which have no business
digging into protocol internals.

Protocol detach routines no longer try to free the socket on detach,
this is performed in the socket code if the protocol permits it.

In rts_detach(), no longer test for rp != NULL in detach, and
likewise in other protocols that don't permit a NULL so_pcb, reduce
the incidence of testing for it during detach.

netinet and netinet6 are not fully updated to this change, which
will be in an upcoming commit.  In their current state they may leak
memory or panic.

MFC after:	3 months
2006-04-01 15:42:02 +00:00
Robert Watson
ac45e92ff2 Change protocol switch pru_abort() API so that it returns void rather
than an int, as an error here is not meaningful.  Modify soabort() to
unconditionally free the socket on the return of pru_abort(), and
modify most protocols to no longer conditionally free the socket,
since the caller will do this.

This commit likely leaves parts of netinet and netinet6 in a situation
where they may panic or leak memory, as they have not are not fully
updated by this commit.  This will be corrected shortly in followup
commits to these components.

MFC after:      3 months
2006-04-01 15:15:05 +00:00
Robert Watson
4d4b555efa Modify UNIX domain sockets to guarantee, and assume, that so_pcb is always
defined for an in-use socket.  This allows us to eliminate countless tests
of whether so_pcb is non-NULL, eliminating dozens of error cases.  For
now, retain the call to sotryfree() in the uipc_abort() path, but this
will eventually move to soabort().

These new assumptions should be largely correct, and will become more so
as the socket/pcb reference model is fixed.  Removing the notion that
so_pcb can be non-NULL is a critical step towards further fine-graining
of the UNIX domain socket locking, as the so_pcb reference no longer
needs to be protected using locks, instead it is a property of the socket
life cycle.
2006-03-17 13:52:57 +00:00
Jeff Roberson
033eb86e52 - Lock access to vrele() with VFS_LOCK_GIANT() rather than mtx_lock(&Giant).
Sponsored by:	Isilon Systems, Inc.
2006-01-30 08:19:01 +00:00
Robert Watson
d7dca9034c XXX a comment in uipc_usrreq.c that requires updating. 2006-01-13 00:00:32 +00:00
Maxime Henrion
e59898ff36 Fix a bunch of SYSCTL_INT() that should have been SYSCTL_ULONG() to
match the type of the variable they are exporting.

Spotted by:	Thomas Hurst <tom@hur.st>
MFC after:	3 days
2005-12-14 22:27:48 +00:00
Robert Watson
a0ec558af0 Correct a number of serious and closely related bugs in the UNIX domain
socket file descriptor garbage collection code, which is intended to
detect and clear cycles of orphaned file descriptors that are "in-flight"
in a socket when that socket is closed before they are received.  The
algorithm present was both run at poor times (resulting in recursion and
reentrance), and also buggy in the presence of parallelism.  In order to
fix these problems, make the following changes:

- When there are in-flight sockets and a UNIX domain socket is destroyed,
  asynchronously schedule the garbage collector, rather than running it
  synchronously in the current context.  This avoids lock order issues
  when the garbage collection code reenters the UNIX domain socket code,
  avoiding lock order reversals, deadlocks, etc.  Run the code
  asynchronously in a task queue.

- In the garbage collector, when skipping file descriptors that have
  entered a closing state (i.e., have f_count == 0), re-test the FDEFER
  flag, and decrement unp_defer.  As file descriptors can now transition
  to a closed state, while the garbage collector is running, it is no
  longer the case that unp_defer will remain an accurate count of
  deferred sockets in the mark portion of the GC algorithm.  Otherwise,
  the garbage collector will loop waiting waiting for unp_defer to reach
  zero, which it will never do as it is skipping file descriptors that
  were marked in an earlier pass, but now closed.

- Acquire the UNIX domain socket subsystem lock in unp_discard() when
  modifying the unp_rights counter, or a read/write race is risked with
  other threads also manipulating the counter.

While here:

- Remove #if 0'd code regarding acquiring the socket buffer sleep lock in
  the garbage collector, this is not required as we are able to use the
  socket buffer receive lock to protect scanning the receive buffer for
  in-flight file descriptors on the socket buffer.

- Annotate that the description of the garbage collector implementation
  is increasingly inaccurate and needs to be updated.

- Add counters of the number of deferred garbage collections and recycled
  file descriptors.  This will be removed and is here temporarily for
  debugging purposes.

With these changes in place, the unp_passfd regression test now appears
to be passed consistently on UP and SMP systems for extended runs,
whereas before it hung quickly or panicked, depending on which bug was
triggered.

Reported by:	Philip Kizer <pckizer at nostrum dot com>
MFC after:	2 weeks
2005-11-10 16:06:04 +00:00
Robert Watson
5bb84bc84b Normalize a significant number of kernel malloc type names:
- Prefer '_' to ' ', as it results in more easily parsed results in
  memory monitoring tools such as vmstat.

- Remove punctuation that is incompatible with using memory type names
  as file names, such as '/' characters.

- Disambiguate some collisions by adding subsystem prefixes to some
  memory types.

- Generally prefer lower case to upper case.

- If the same type is defined in multiple architecture directories,
  attempt to use the same name in additional cases.

Not all instances were caught in this change, so more work is required to
finish this conversion.  Similar changes are required for UMA zone names.
2005-10-31 15:41:29 +00:00
Robert Watson
d374e81efd Push the assignment of a new or updated so_qlimit from solisten()
following the protocol pru_listen() call to solisten_proto(), so
that it occurs under the socket lock acquisition that also sets
SO_ACCEPTCONN.  This requires passing the new backlog parameter
to the protocol, which also allows the protocol to be aware of
changes in queue limit should it wish to do something about the
new queue limit.  This continues a move towards the socket layer
acting as a library for the protocol.

Bump __FreeBSD_version due to a change in the in-kernel protocol
interface.  This change has been tested with IPv4 and UNIX domain
sockets, but not other protocols.
2005-10-30 19:44:40 +00:00
Robert Watson
e1ac28e239 Canonicalize the UNIX domain socket copyright layout: original holders
before more recent holders.

MFC after:	3 days
2005-09-23 12:41:06 +00:00
Colin Percival
fe2eee8231 Fix two issues which were missed in FreeBSD-SA-05:08.kmem.
Reported by:	Uwe Doering
2005-05-07 00:41:36 +00:00
Matthew N. Dodd
abb886facb Add missing break.
Found by:	marcus
2005-04-25 00:48:04 +00:00
Matthew N. Dodd
96a041b533 Check sopt_level in uipc_ctloutput() and return early if it is non-zero.
This prevents unintended consequnces when an application calls things like
setsockopt(x, SOL_SOCKET, SO_REUSEADDR, ...) on a Unix domain socket.
2005-04-20 02:57:56 +00:00
Matthew N. Dodd
6a2989fd54 Implement unix(4) socket options LOCAL_CREDS and LOCAL_CONNWAIT.
- Add unp_addsockcred() (for LOCAL_CREDS).
- Add an argument to unp_connect2() to differentiate between
  PRU_CONNECT and PRU_CONNECT2. (for LOCAL_CONNWAIT)

Obtained from:	 NetBSD (with some changes)
2005-04-13 00:01:46 +00:00
Robert Watson
0daccb9c94 In the current world order, solisten() implements the state transition of
a socket from a regular socket to a listening socket able to accept new
connections.  As part of this state transition, solisten() calls into the
protocol to update protocol-layer state.  There were several bugs in this
implementation that could result in a race wherein a TCP SYN received
in the interval between the protocol state transition and the shortly
following socket layer transition would result in a panic in the TCP code,
as the socket would be in the TCPS_LISTEN state, but the socket would not
have the SO_ACCEPTCONN flag set.

This change does the following:

- Pushes the socket state transition from the socket layer solisten() to
  to socket "library" routines called from the protocol.  This permits
  the socket routines to be called while holding the protocol mutexes,
  preventing a race exposing the incomplete socket state transition to TCP
  after the TCP state transition has completed.  The check for a socket
  layer state transition is performed by solisten_proto_check(), and the
  actual transition is performed by solisten_proto().

- Holds the socket lock for the duration of the socket state test and set,
  and over the protocol layer state transition, which is now possible as
  the socket lock is acquired by the protocol layer, rather than vice
  versa.  This prevents additional state related races in the socket
  layer.

This permits the dual transition of socket layer and protocol layer state
to occur while holding locks for both layers, making the two changes
atomic with respect to one another.  Similar changes are likely require
elsewhere in the socket/protocol code.

Reported by:		Peter Holm <peter@holm.cc>
Review and fixes from:	emax, Antoine Brodin <antoine.brodin@laposte.net>
Philosophical head nod:	gnn
2005-02-21 21:58:17 +00:00
Robert Watson
c364c823d0 When aborting a UNIX domain socket bind() because VOP_CREATE() failed,
make sure to call vn_finished_write(mp) before returning.

MFC after:	3 days
2005-02-21 14:21:50 +00:00
Robert Watson
892af6b930 style(9)-ize function headers, remove use of 'register'.
MFC after:	3 days
2005-02-20 23:22:13 +00:00
Robert Watson
d664e4fa50 In unp_attach(), allow uma_zalloc to zero the new unpcb rather than
explicitly using bzero().

Update copyright.

MFC after:	3 days
2005-02-20 20:05:11 +00:00
Robert Watson
7301cf23ef Move assignment of UNIX domain socket pcb during unp_attach() outside
of the global UNIX domain socket mutex: no protection is needed that
early in the setup of the UNIX domain socket and socket structures.

MFC after:	3 days
2005-02-20 04:18:22 +00:00
Warner Losh
9454b2d864 /* -> /*- for copyright notices, minor format tweaks as necessary 2005-01-06 23:35:40 +00:00
Robert Watson
38e6a58c77 Remove temporary debugging printf that was used to detect the presence
of a race that had previously caused a panic in order to determine if
the fix was for the right problem.  It was.

MFC after:	2 weeks
2004-12-23 01:19:27 +00:00
Alan Cox
7abe2ac214 Add send buffer locking to uipc_send(). Without this locking a race can
occur between a reader and a writer that results in a panic upon close,
e.g.,
	"panic: sbflush_locked: cc 4 || mb 0xffffff0052afa400 || mbcnt 0"

Reviewed by: rwatson@
MFC after: 2 weeks
2004-12-22 20:28:46 +00:00
Poul-Henning Kamp
e4643c730a "nfiles" is a bad name for a global variable. Call it "openfiles" instead
as this is more correct and matches the sysctl variable.
2004-12-01 09:22:26 +00:00
Poul-Henning Kamp
756d52a195 Initialize struct pr_userreqs in new/sparse style and fill in common
default elements in net_init_domain().

This makes it possible to grep these structures and see any bogosities.
2004-11-08 14:44:54 +00:00
Robert Watson
81158452be Push acquisition of the accept mutex out of sofree() into the caller
(sorele()/sotryfree()):

- This permits the caller to acquire the accept mutex before the socket
  mutex, avoiding sofree() having to drop the socket mutex and re-order,
  which could lead to races permitting more than one thread to enter
  sofree() after a socket is ready to be free'd.

- This also covers clearing of the so_pcb weak socket reference from
  the protocol to the socket, preventing races in clearing and
  evaluation of the reference such that sofree() might be called more
  than once on the same socket.

This appears to close a race I was able to easily trigger by repeatedly
opening and resetting TCP connections to a host, in which the
tcp_close() code called as a result of the RST raced with the close()
of the accepted socket in the user process resulting in simultaneous
attempts to de-allocate the same socket.  The new locking increases
the overhead for operations that may potentially free the socket, so we
will want to revise the synchronization strategy here as we normalize
the reference counting model for sockets.  The use of the accept mutex
in freeing of sockets that are not listen sockets is primarily
motivated by the potential need to remove the socket from the
incomplete connection queue on its parent (listen) socket, so cleaning
up the reference model here may allow us to substantially weaken the
synchronization requirements.

RELENG_5_3 candidate.

MFC after:	3 days
Reviewed by:	dwhite
Discussed with:	gnn, dwhite, green
Reported by:	Marc UBM Bocklet <ubm at u-boot-man dot de>
Reported by:	Vlad <marchenko at gmail dot com>
2004-10-18 22:19:43 +00:00
Robert Watson
161a0c7cff Don't hold the UNIX domain socket subsystem lock over the body of the
UNIX domain socket garbage collection implementation, as that risks
holding the mutex over potentially sleeping operations (as well as
introducing some nasty lock order issues, etc).  unp_gc() will hold
the lock long enough to do necessary deferal checks and set that it's
running, but then release it until it needs to reset the gc state.

RELENG_5 candidate.

Discussed with:	alfred
2004-08-25 21:24:36 +00:00
Robert Watson
4c5bc1ca39 Add UNP_UNLOCK_ASSERT() to asser that the UNIX domain socket subsystem
lock is not held.

Rather than annotating that the lock is released after calls to
unp_detach() with a comment, annotate with an assertion.

Assert that the UNIX domain socket subsystem lock is not held when
unp_externalize() and unp_internalize() are called.
2004-08-19 01:45:16 +00:00
Robert Watson
40f2ac28a0 Always acquire the UNIX domain socket subsystem lock (UNP lock)
before dereferencing sotounpcb() and checking its value, as so_pcb
is protected by protocol locking, not subsystem locking.  This
prevents races during close() by one thread and use of ths socket
in another.

unp_bind() now assert the UNP lock, and uipc_bind() now acquires
the lock around calls to unp_bind().
2004-08-16 04:41:03 +00:00