freebsd-nq

Author	SHA1	Message	Date
Konstantin Belousov	f7780c61e7	The unp_gc() function drops and reaquires lock between scan and collect phases. The unp_discard() function executes unp_externalize_fp(), which might make the socket eligible for gc-ing, and then, later, taskqueue will close the socket. Since unp_gc() dropped the list lock to do the malloc, close might happen after the mark step but before the collection step, causing collection to not find the socket and miss one array element. I believe that the race was there before r216158, but the stated revision made the window much wider by postponing the close to taskqueue sometimes. Only process as much array elements as we find the sockets during second phase of gc [1]. Take linkage lock and recheck the eligibility of the socket for gc, as well as call fhold() under the linkage lock. Reported and tested by: jmallett Submitted by: jmallett [1] Reviewed by: rwatson, jeff (possibly) MFC after: 1 week	2011-02-01 13:33:49 +00:00
Matthew D Fleming	2fee06f087	Specify a CTLTYPE_FOO so that a future sysctl(8) change does not need to rely on the format string.	2011-01-18 21:14:18 +00:00
Konstantin Belousov	9f4ba450f2	Trim whitespaces at the end of lines. Use the commit to record proper log message for r216150. MFC after: 1 week If unix socket has a unix socket attached as the rights that has a unix socket attached as the rights that has a unix socket attached as the rights ... Kernel may overflow the stack on attempt to close such socket. Only close the rights file in the context of the current close if the file is not unix domain socket. Otherwise, postpone the work to taskqueue, preventing unlimited recursion. The pass of the unix domain sockets over the SCM_RIGHTS message control is not widely used, and more, the close of the socket with still attached rights is mostly an application failure. The change should not affect the performance of typical users of SCM_RIGHTS. Reviewed by: jeff, rwatson	2010-12-03 20:39:06 +00:00
Konstantin Belousov	0cb64678bc	Reviewed by: jeff, rwatson MFC after: 1 week	2010-12-03 16:15:44 +00:00
Edward Tomasz Napierala	175389cff2	Remove spurious '/*-' marks and fix some other style problems. Submitted by: bde@	2010-07-22 05:42:29 +00:00
Edward Tomasz Napierala	1a996ed1d8	Revert r210225 - turns out I was wrong; the "/*-" is not license-only thing; it's also used to indicate that the comment should not be automatically rewrapped. Explained by: cperciva@	2010-07-18 20:57:53 +00:00
Edward Tomasz Napierala	805cc58ac0	The "/*-" comment marker is supposed to denote copyrights. Remove non-copyright occurences from sys/sys/ and sys/kern/.	2010-07-18 20:23:10 +00:00
Robert Watson	604f19c91e	Fix build on amd64, where sysctl arg1 is a pointer. Reported by: Mr Tinderbox MFC after: 3 months	2009-10-05 22:23:12 +00:00
Robert Watson	84d61770bc	First cut at implementing SOCK_SEQPACKET support for UNIX (local) domain sockets. This allows for reliable bi-directional datagram communication over UNIX domain sockets, in contrast to SOCK_DGRAM (M:N, unreliable) or SOCK_STERAM (bi-directional bytestream). Largely, this reuses existing UNIX domain socket code. This allows applications requiring record- oriented semantics to do so reliably via local IPC. Some implementation notes (also present in XXX comments): - Currently we lack an sbappend variant able to do datagrams and control data without doing addresses, so we mark SOCK_SEQPACKET as PR_ADDR. Adding a new variant will solve this problem. - UNIX domain sockets on FreeBSD provide back-pressure/flow control notification for stream sockets by manipulating the send socket buffer's size during pru_send and pru_rcvd. This trick works less well for SOCK_SEQPACKET as sosend_generic() uses sb_hiwat not just to manage blocking, but also to determine maximum datagram size. Fixing this requires rethinking how back-pressure is done for SOCK_SEQPACKET; in the mean time, it's possible to get EMSGSIZE when buffers fill, instead of blocking. Discussed with: benl Reviewed by: bz, rpaulo MFC after: 3 months Sponsored by: Google	2009-10-05 14:49:16 +00:00
Robert Watson	530c006014	Merge the remainder of kern_vimage.c and vimage.h into vnet.c and vnet.h, we now use jails (rather than vimages) as the abstraction for virtualization management, and what remained was specific to virtual network stacks. Minor cleanups are done in the process, and comments updated to reflect these changes. Reviewed by: bz Approved by: re (vimage blanket)	2009-08-01 19:26:27 +00:00
Jamie Gritton	399645e1b9	Remove unnecessary/redundant includes. Approved by: bz (mentor)	2009-06-23 14:39:21 +00:00
John Baldwin	afd9f91c63	Fix a deadlock in the getpeername() method for UNIX domain sockets. Instead of locking the local unp followed by the remote unp, use the same locking model as accept() and read lock the global link lock followed by the remote unp while fetching the remote sockaddr. Reported by: Mel Flynn mel.flynn of mailing.thruhere.net Reviewed by: rwatson MFC after: 1 week	2009-06-18 20:56:22 +00:00
Robert Watson	bcf11e8d00	Move "options MAC" from opt_mac.h to opt_global.h, as it's now in GENERIC and used in a large number of files, but also because an increasing number of incorrect uses of MAC calls were sneaking in due to copy-and-paste of MAC-aware code without the associated opt_mac.h include. Discussed with: pjd	2009-06-05 14:55:22 +00:00
Robert Watson	f93bfb23dc	Add internal 'mac_policy_count' counter to the MAC Framework, which is a count of the number of registered policies. Rather than unconditionally locking sockets before passing them into MAC, lock them in the MAC entry points only if mac_policy_count is non-zero. This avoids locking overhead for a number of socket system calls when no policies are registered, eliminating measurable overhead for the MAC Framework for the socket subsystem when there are no active policies. Possibly socket locks should be acquired by policies if they are required for socket labels, which would further avoid locking overhead when there are policies but they don't require labeling of sockets, or possibly don't even implement socket controls. Obtained from: TrustedBSD Project	2009-06-02 18:26:17 +00:00
Marko Zec	21ca7b57bd	Change the curvnet variable from a global const struct vnet , previously always pointing to the default vnet context, to a dynamically changing thread-local one. The currvnet context should be set on entry to networking code via CURVNET_SET() macros, and reverted to previous state via CURVNET_RESTORE(). Recursions on curvnet are permitted, though strongly discuouraged. This change should have no functional impact on nooptions VIMAGE kernel builds, where CURVNET_ macros expand to whitespace. The curthread->td_vnet (aka curvnet) variable's purpose is to be an indicator of the vnet context in which the current network-related operation takes place, in case we cannot deduce the current vnet context from any other source, such as by looking at mbuf's m->m_pkthdr.rcvif->if_vnet, sockets's so->so_vnet etc. Moreover, so far curvnet has turned out to be an invaluable consistency checking aid: it helps to catch cases when sockets, ifnets or any other vnet-aware structures may have leaked from one vnet to another. The exact placement of the CURVNET_SET() / CURVNET_RESTORE() macros was a result of an empirical iterative process, whith an aim to reduce recursions on CURVNET_SET() to a minimum, while still reducing the scope of CURVNET_SET() to networking only operations - the alternative would be calling CURVNET_SET() on each system call entry. In general, curvnet has to be set in three typicall cases: when processing socket-related requests from userspace or from within the kernel; when processing inbound traffic flowing from device drivers to upper layers of the networking stack, and when executing timer-driven networking functions. This change also introduces a DDB subcommand to show the list of all vnet instances. Approved by: julian (mentor)	2009-05-05 10:56:12 +00:00
Robert Watson	885868cd8f	Remove VOP_LEASE and supporting functions. This hasn't been used since the removal of NQNFS, but was left in in case it was required for NFSv4. Since our new NFSv4 client and server can't use it for their requirements, GC the old mechanism, as well as other unused lease- related code and interfaces. Due to its impact on kernel programming and binary interfaces, this change should not be MFC'd. Proposed by: jeff Reviewed by: jeff Discussed with: rmacklem, zach loafman @ isilon	2009-04-10 10:52:19 +00:00
Robert Watson	3dab55bc86	Decompose the global UNIX domain sockets rwlock into two different locks: a global list/counter/generation counter protected by a new mutex unp_list_lock, and a global linkage rwlock, unp_global_rwlock, which protects the connections between UNIX domain sockets. This eliminates conditional lock acquisition that was previously a property of the global lock being held over sonewconn() leading to a call to uipc_attach(), which also required the global lock, but couldn't rely on it as other paths existed to uipc_attach() that didn't hold it: now uipc_attach() uses only the list lock, which follows the linkage lock in the lock order. It may also reduce contention on the global lock for some workloads. Add global UNIX domain socket locks to hard-coded witness lock order. MFC after: 1 week Discussed with: kris	2009-03-08 21:48:29 +00:00
Robert Watson	b523ec24b9	White space and comment tweaks. MFC after: 3 weeks	2009-01-01 20:03:22 +00:00
Robert Watson	a9f3c7d2ff	Rename mbcnt to mbcnt_delta in uipc_send() -- unlike other local variables named mbcnt in uipc_usrreq.c, this instance is a delta rather than a cache of sb_mbcnt. MFC after: 3 weeks	2008-12-30 16:09:57 +00:00
Dag-Erling Smørgrav	1ede983cc9	Retire the MALLOC and FREE macros. They are an abomination unto style(9). MFC after: 3 months	2008-10-23 15:53:51 +00:00
Robert Watson	4b0e2b9add	Remove stale comment: while uipc_connect2() was, until recently, not static so it could be used by fifofs (actually portalfs), it is now static. Submitted by: kensmith	2008-10-11 17:28:22 +00:00
Robert Watson	e298cf5902	Remove stale comment (and XXX saying so) about why we zero the file descriptor pointer in unp_freerights: we can no longer recurse into unp_gc due to unp_gc being invoked in a deferred way, but it's still a good idea. MFC after: 3 days	2008-10-08 06:26:51 +00:00
Robert Watson	fa9402f28a	Differentiate pr_usrreqs for stream and datagram UNIX domain sockets, and employ soreceive_dgram for the datagram case. MFC after: 3 months	2008-10-08 06:19:49 +00:00
Robert Watson	2c8995842c	Now that portalfs doesn't directly invoke uipc_connect2(), make it a static symbol. MFC after: 3 days	2008-10-06 18:43:11 +00:00
Robert Watson	0b36cd25fc	Further minor cleanups to UNIX domain sockets: - Staticize and locally prototype functions uipc_ctloutput(), unp_dispose(), unp_init(), and unp_externalize(), none of which have been required outside of uipc_usrreq.c since uipc_proto.c was removed. - Remove stale prototype for uipc_usrreq(), which has not existed in the code since 1997 - Forward declare and staticize uipc_usrreqs structure in uipc_usrreq.c and not un.h. - Comment on why uipc_connect2() is still non-static -- it is used directly by fifofs. - Remove stale comments, tidy up whitespace. MFC after: 3 days (where applicable)	2008-10-03 13:01:56 +00:00
Robert Watson	60a5ef26a1	Remove or update several stale comments. A bit of whitespace/style cleanup. Update copyright. MFC after: 3 days (applicable changes)	2008-10-03 09:01:55 +00:00
Tom Rhodes	be6b130476	Fill in a few sysctl descriptions. Approved by: rwatson	2008-07-26 00:55:35 +00:00
Ed Maste	7928893d83	Use bcopy instead of strlcpy in uipc_bind and unp_connect, since soun->sun_path isn't a null-terminated string. As UNIX(4) states, "the terminating NUL is not part of the address." Since strlcpy has to return "the total length of the string [it] tried to create," it walks off the end of soun->sun_path looking for a \0. This reverts r105332. Reported by: Ryan Stone	2008-07-03 23:26:10 +00:00
Robert Watson	8c96f9c193	Move unlock of global UNIX domain socket lock slightly lower in unp_connect(): it is expected to return with the lock held, and two possible error paths otherwise returned with it unlocked. The fix committed here is slightly different from the patch in the PR, but along an alternative line suggested in the PR. PR: 119778 MFC after: 3 days Submitted by: James Juran <james dot juran at baesystems dot com>	2008-01-18 19:16:03 +00:00
Attilio Rao	22db15c06f	VOP_LOCK1() (and so VOP_LOCK()) and VOP_UNLOCK() are only used in conjuction with 'thread' argument passing which is always curthread. Remove the unuseful extra-argument and pass explicitly curthread to lower layer functions, when necessary. KPI results broken by this change, which should affect several ports, so version bumping and manpage update will be further committed. Tested by: kris, pho, Diego Sardina <siarodx at gmail dot com>	2008-01-13 14:44:15 +00:00
Robert Watson	8a69e5fa71	Remove "lock pushdown" todo item in comment -- I did that for 7.0. MFC after: 3 weeks	2008-01-10 12:38:17 +00:00
Robert Watson	a635784569	Correct typos in comments. MFC after: 3 weeks	2008-01-10 12:29:12 +00:00
Jeff Roberson	41e0f66d41	- Place the fhold() in unp_internalize_fp to be more consistent with refs. - Clear all of the gc flags before doing a run. Stale flags were causing us to skip some descriptors. - If a unp socket has been marked REF in a gc pass it can't be dead. Found by: rwatson's test tool.	2008-01-01 01:46:42 +00:00
Jeff Roberson	6f552cb098	- Check the correct variable against NULL in two places. - If the unp_file is NULL that means it has never been internalized and it must be reachable.	2007-12-31 03:44:54 +00:00
Jeff Roberson	397c19d175	Remove explicit locking of struct file. - Introduce a finit() which is used to initailize the fields of struct file in such a way that the ops vector is only valid after the data, type, and flags are valid. - Protect f_flag and f_count with atomic operations. - Remove the global list of all files and associated accounting. - Rewrite the unp garbage collection such that it no longer requires the global list of all files and instead uses a list of all unp sockets. - Mark sockets in the accept queue so we don't incorrectly gc them. Tested by: kris, pho	2007-12-30 01:42:15 +00:00
Robert Watson	30d239bc4c	Merge first in a series of TrustedBSD MAC Framework KPI changes from Mac OS X Leopard--rationalize naming for entry points to the following general forms: mac_<object>_<method/action> mac_<object>_check_<method/action> The previous naming scheme was inconsistent and mostly reversed from the new scheme. Also, make object types more consistent and remove spaces from object types that contain multiple parts ("posix_sem" -> "posixsem") to make mechanical parsing easier. Introduce a new "netinet" object type for certain IPv4/IPv6-related methods. Also simplify, slightly, some entry point names. All MAC policy modules will need to be recompiled, and modules not updates as part of this commit will need to be modified to conform to the new KPI. Sponsored by: SPARTA (original patches against Mac OS X) Obtained from: TrustedBSD Project, Apple Computer	2007-10-24 19:04:04 +00:00
Pawel Jakub Dawidek	57fd3d5572	When we do open, we should lock the vnode exclusively. This fixes few races: - fifo race, where two threads assign v_fifoinfo, - v_writecount modifications, - v_object modifications, - and probably more... Discussed with: kib, ups Approved by: re (rwatson)	2007-07-26 16:58:09 +00:00
Robert Watson	03c96c3176	Add DDB "show unpcb" command, allowing DDB to print out many pertinent details from UNIX domain socket protocol layer state.	2007-05-29 12:36:00 +00:00
Robert Watson	08d73f1370	Remove more one more stale comment regarding unpcb type-safety.	2007-05-11 12:28:45 +00:00
Robert Watson	d7924b7086	Clarify and update quite a few comments to reflect locking optimizations, the addition of unpcb refcounts, and bug fixes. Some of these fixes are appropriate for MFC. MFC after: 3 days	2007-05-11 12:10:45 +00:00
Wojciech A. Koszek	9e2894466a	Don't acquire Giant unconditionally. Reviewed by: rwatson	2007-05-06 12:00:38 +00:00
Robert Watson	5e3f7694b1	Replace custom file descriptor array sleep lock constructed using a mutex and flags with an sxlock. This leads to a significant and measurable performance improvement as a result of access to shared locking for frequent lookup operations, reduced general overhead, and reduced overhead in the event of contention. All of these are imported for threaded applications where simultaneous access to a shared file descriptor array occurs frequently. Kris has reported 2x-4x transaction rate improvements on 8-core MySQL benchmarks; smaller improvements can be expected for many workloads as a result of reduced overhead. - Generally eliminate the distinction between "fast" and regular acquisisition of the filedesc lock; the plan is that they will now all be fast. Change all locking instances to either shared or exclusive locks. - Correct a bug (pointed out by kib) in fdfree() where previously msleep() was called without the mutex held; sx_sleep() is now always called with the sxlock held exclusively. - Universally hold the struct file lock over changes to struct file, rather than the filedesc lock or no lock. Always update the f_ops field last. A further memory barrier is required here in the future (discussed with jhb). - Improve locking and reference management in linux_at(), which fails to properly acquire vnode references before using vnode pointers. Annotate improper use of vn_fullpath(), which will be replaced at a future date. In fcntl(), we conservatively acquire an exclusive lock, even though in some cases a shared lock may be sufficient, which should be revisited. The dropping of the filedesc lock in fdgrowtable() is no longer required as the sxlock can be held over the sleep operation; we should consider removing that (pointed out by attilio). Tested by: kris Discussed with: jhb, kris, attilio, jeff	2007-04-04 09:11:34 +00:00
Robert Watson	6e2faa2444	In uipc_close(), we no longer always free the unpcb, as the last reference may be dropped later. In this case, always unlock the unpcb so as not to leak the lock. Found by: kris (BugMagnet)	2007-03-12 14:52:00 +00:00
Robert Watson	ede6e136f8	Remove two simultaneous acquisitions of multiple unpcb locks from uipc_send in cases where only a global read lock is held by breaking them out and avoiding the unpcb lock acquire in the common case. This avoids deadlocks which manifested with X11, and should also marginally further improve performance. Reported by: sepotvin, brooks	2007-03-01 09:00:42 +00:00
Robert Watson	3592fd4de5	Lock unp2 after checking for a non-NULL unp2 pointer in uipc_send() on datagram UNIX domain sockets, not before.	2007-02-28 08:08:50 +00:00
Robert Watson	e7c33e29ed	Revise locking strategy used for UNIX domain sockets in order to improve concurrency: - Add per-unpcb mutexes protecting unpcb connection state, fields, etc. - Replace global UNP mutex with a global UNP rwlock, which will protect the UNIX domain socket connection topology, v_socket, and be acquired exclusively before acquiring more than per-unpcb at a time in order to avoid lock order issues. In performance measurements involving MySQL, this change has little or no overhead on UP (+/- 1%), but leads to a significant (5%-30%) improvement in multi-processor measurements using the sysbench and supersmack benchmarks. Much testing by: kris Approved by: re (kensmith)	2007-02-26 20:47:52 +00:00
Robert Watson	6fac927ccc	Add an additional MAC check to the UNIX domain socket connect path: check that the subject has read/write access to the vnode using the vnode MAC check. MFC after: 3 weeks Submitted by: Spencer Minear <spencer_minear at securecomputing dot com> Obtained from: TrustedBSD Project	2007-02-22 09:37:44 +00:00
Robert Watson	5b950deabc	Break introductory comment into two paragraphs to separate material on the garbage collection complications from general discussion of UNIX domain sockets. Staticize unp_addsockcred(). Remove XXX comment regarding Giant and v_socket -- v_socket is protected by the global UNIX domain socket lock.	2007-02-20 10:50:02 +00:00
Robert Watson	aea52f1bf8	Minor rearrangement of global variables, comments, etc, in UNIX domain sockets.	2007-02-14 15:05:40 +00:00
Robert Watson	46a1d9bfe8	Change unp_mtx to supporting recursion, and do not drop the unp_mtx over sonewconn() in unp_connect(). This avoids a race that occurs due to v_socket being an uncounted reference, as the lock was being released in order to call sonewconn(), which otherwise recurses into the UNIX domain socket code via pru_attach, as well as holding the lock over a sleeping memory allocation in uipc_attach(). Switch to a non-sleeping memory allocation during UNIX domain socket attach. This fix non-ideal in that it requires enabling recursion, but is a much smaller change than moving to using true references for v_socket. The reported panic occurs in unp_connect() following the return of sonewconn(). Update copyright year. Panic reported by: jhb	2007-02-14 12:22:11 +00:00

1 2 3 4 5

242 Commits