freebsd-nq

Author	SHA1	Message	Date
Robert Watson	265de5bb62	Correct two problems relating to sorflush(), which is called to flush read socket buffers in shutdown() and close(): - Call socantrcvmore() before sblock() to dislodge any threads that might be sleeping (potentially indefinitely) while holding sblock(), such as a thread blocked in recv(). - Flag the sblock() call as non-interruptible so that a signal delivered to the thread calling sorflush() doesn't cause sblock() to fail. The sblock() is required to ensure that all other socket consumer threads have, in fact, left, and do not enter, the socket buffer until we're done flushing it. To implement the latter, change the 'flags' argument to sblock() to accept two flags, SBL_WAIT and SBL_NOINTR, rather than one M_WAITOK flag. When SBL_NOINTR is set, it forces a non-interruptible sx acquisition, regardless of the setting of the disposition of SB_NOINTR on the socket buffer; without this change it would be possible for another thread to clear SB_NOINTR between when the socket buffer mutex is released and sblock() is invoked. Reviewed by: bz, kmacy Reported by: Jos Backus <jos at catnook dot com>	2008-01-31 08:22:24 +00:00
Robert Watson	30d239bc4c	Merge first in a series of TrustedBSD MAC Framework KPI changes from Mac OS X Leopard--rationalize naming for entry points to the following general forms: mac_<object>_<method/action> mac_<object>_check_<method/action> The previous naming scheme was inconsistent and mostly reversed from the new scheme. Also, make object types more consistent and remove spaces from object types that contain multiple parts ("posix_sem" -> "posixsem") to make mechanical parsing easier. Introduce a new "netinet" object type for certain IPv4/IPv6-related methods. Also simplify, slightly, some entry point names. All MAC policy modules will need to be recompiled, and modules not updated as part of this commit will need to be modified to conform to the new KPI. Sponsored by: SPARTA (original patches against Mac OS X) Obtained from: TrustedBSD Project, Apple Computer	2007-10-24 19:04:04 +00:00
David Malone	041b706b2f	Despite several examples in the kernel, the third argument of sysctl_handle_int is not sizeof the int type you want to export. The type must always be an int or an unsigned int. Remove the instances where a sizeof(variable) is passed to stop people accidentally cutting and pasting these examples. In a few places sysctl_handle_int was being used on 64 bit types, which would truncate the value to be exported. In these cases use sysctl_handle_quad to export them and change the format to Q so that sysctl(1) can still print them.	2007-06-04 18:25:08 +00:00
Jeff Roberson	1c4bcd050a	- Move rusage from being per-process in struct pstats to per-thread in td_ru. This removes the requirement for per-process synchronization in statclock() and mi_switch(). This was previously supported by sched_lock which is going away. All modifications to rusage are now done in the context of the owning thread. Reads proceed without locks. - Aggregate exiting threads' rusage in thread_exit() such that the exiting thread's rusage is not lost. - Provide a new routine, rufetch() to fetch an aggregate of all rusage structures from all threads in a process. This routine must be used in any place requiring a rusage from a process prior to its exit. The exited process's rusage is still available via p_ru. - Aggregate tick statistics only on demand via rufetch() or when a thread exits. Tick statistics are kept in the thread and protected by sched_lock until it exits. Initial patch by: attilio Reviewed by: attilio, bde (some objections), arch (mostly silent)	2007-06-01 01:12:45 +00:00
Robert Watson	d19e16a72c	Generally migrate to ANSI function headers, and remove 'register' use.	2007-05-16 20:41:08 +00:00
Pyun YongHyeon	ccd8d954f3	Add missing socket buffer unlock before returning to userland. Reviewed by: rwatson	2007-05-08 12:34:14 +00:00
Robert Watson	7abab91135	sblock() implements a sleep lock by interlocking SB_WANT and SB_LOCK flags on each socket buffer with the socket buffer's mutex. This sleep lock is used to serialize I/O on sockets in order to prevent I/O interlacing. This change replaces the custom sleep lock with an sx(9) lock, which results in marginally better performance, better handling of contention during simultaneous socket I/O across multiple threads, and a cleaner separation between the different layers of locking in socket buffers. Specifically, the socket buffer mutex is now solely responsible for serializing simultaneous operation on the socket buffer data structure, and not for I/O serialization. While here, fix two historic bugs: (1) a bug allowing I/O to be occasionally interlaced during long I/O operations (discovered by Isilon). (2) a bug in which failed non-blocking acquisition of the socket buffer I/O serialization lock might be ignored (discovered by sam). SCTP portion of this patch submitted by rrs.	2007-05-03 14:42:42 +00:00
Robert Watson	8c799760e1	Following movement of functions from uipc_socket2.c to uipc_socket.c and uipc_sockbuf.c, clean up and update comments.	2007-03-26 17:05:09 +00:00
Robert Watson	20d9e5e87c	Complete removal of uipc_socket2.c by moving the last few functions to other C files: - Move sbcreatecontrol() and sbtoxsockbuf() to uipc_sockbuf.c. While sbcreatecontrol() is really an mbuf allocation routine, it does its work with awareness of the layout of socket buffer memory. - Move pru_*() protocol switch stubs to uipc_socket.c where the non-stub versions of several of these functions live. Likewise, move socket state transition calls (soisconnecting(), etc) to uipc_socket.c. Move sodupsockaddr() and sotoxsocket().	2007-03-26 08:59:03 +00:00
Gleb Smirnoff	cd68a3f706	Move the dom_dispose and pru_detach calls in sofree() earlier. Only after calling pru_detach we can be absolutely sure, that we don't have any references to the socket in the stack. This closes race between lockless sbdestroy() and data arriving on socket. Reviewed by: rwatson	2007-03-22 13:21:24 +00:00
John Baldwin	7568503421	- Use m_gethdr(), m_get(), and m_clget() instead of the macros in sosend_copyin(). - Use M_WAITOK instead of M_TRYWAIT in sosend_copyin(). - Don't check for NULL from M_WAITOK and return ENOBUFS. M_WAITOK/M_TRYWAIT allocations don't fail with NULL. Reviewed by: andre Requested by: andre (2)	2007-03-12 19:27:36 +00:00
Ruslan Ermilov	fac61393b9	Don't block on the socket zone limit during the socket() call which can easily lock up a system otherwise; instead, return ENOBUFS as documented in a manpage, thus reverting us to the FreeBSD 4.x behavior. Reviewed by: rwatson MFC after: 2 weeks	2007-02-26 10:45:21 +00:00
Robert Watson	f58dd47091	Rename somaxconn_sysctl() to sysctl_somaxconn() so that I will be able to claim that sofoo() functions all accept a socket as their first argument.	2007-02-15 10:11:00 +00:00
Bruce M Simpson	7dc8d021ea	Diff reduction with RELENG_6, style(9): Remove unnecessary brace; && should be on end of line. No functional changes.	2007-02-03 03:57:45 +00:00
Andre Oppermann	6a37f331d7	Generic socket buffer auto sizing support, header defines, flag inheritance. MFC after: 1 month	2007-02-01 17:53:41 +00:00
Andre Oppermann	7c32173ba8	Unbreak writes of 0 bytes. Zero byte writes happen when only ancillary control data but no payload data is passed. Change m_uiotombuf() to return at least one empty mbuf if the requested length was zero. Add comment to sosend_dgram and sosend_generic(). Diagnosed by: jhb Regression test by: rwatson Pointy hat to: andre	2007-01-22 14:50:28 +00:00
Robert Watson	abdeb3b01f	Canonicalize copyrights in some files I hold copyrights on: - Sort by date in license blocks, oldest copyright first. - All rights reserved after all copyrights, not just the first. - Use (c) to be consistent with other entries. MFC after: 3 days	2007-01-08 17:49:59 +00:00
Bruce M Simpson	a86ec33820	Drop all received data mbufs from a socket's queue if the MT_SONAME mbuf is dropped, to preserve the invariant in the PR_ADDR case. Add a regression test to detect this condition, but do not hook it up to the build for now. PR: kern/38495 Submitted by: James Juran Reviewed by: sam, rwatson Obtained from: NetBSD MFC after: 2 weeks	2006-12-23 21:07:07 +00:00
Mohan Srinivasan	84eab9ad73	Fix a race in soclose() where connections could be queued to the listening socket after the pass that cleans those queues. This results in these connections being orphaned (and leaked). The fix is to clean up the so queues after detaching the socket from the protocol. Thanks to ups and jhb for discussions and a thorough code review.	2006-11-22 23:54:29 +00:00
Andre Oppermann	1ae4d97d51	Use the improved m_uiotombuf() function instead of home grown sosend_copyin() to do the userland to kernel copying in sosend_generic() and sosend_dgram(). sosend_copyin() is retained for ZERO_COPY_SOCKETS which are not yet supported by m_uiotombuf(). Benchmarking shows significant improvements (95% confidence): 66% less cpu (or 2.9 times better) with new sosend vs. old sosend (non-TSO) 65% less cpu (or 2.8 times better) with new sosend vs. old sosend (TSO) (Sender AMD Opteron 852 (2.6GHz) with em(4) PCI-X-133 interface and receiver DELL Poweredge SC1425 P-IV Xeon 3.2GHz with em(4) LOM connected back to back at 1000Base-TX full duplex.) Sponsored by: TCP/IP Optimization Fundraise 2005 MFC after: 3 months	2006-11-02 17:45:28 +00:00
Robert Watson	aed5570872	Complete break-out of sys/sys/mac.h into sys/security/mac/mac_framework.h begun with a repo-copy of mac.h to mac_framework.h. sys/mac.h now contains the userspace and user<->kernel API and definitions, with all in-kernel interfaces moved to mac_framework.h, which is now included across most of the kernel instead. This change is the first step in a larger cleanup and sweep of MAC Framework interfaces in the kernel, and will not be MFC'd. Obtained from: TrustedBSD Project Sponsored by: SPARTA	2006-10-22 11:52:19 +00:00
Bruce M Simpson	4a75dc2585	Fix a case where socket I/O atomicity is violated due to not dropping the entire record when a non-data mbuf is removed in the soreceive() path. This only triggers a panic directly when compiled with INVARIANTS. PR: 38495 Submitted by: James Juran MFC after: 1 week	2006-09-22 15:34:16 +00:00
Pawel Jakub Dawidek	689f94bfe6	Fix a lock leak in an error case. Reported by: netchild Reviewed by: rwatson	2006-09-13 06:58:40 +00:00
Andre Oppermann	805def2e04	New sockets created by incoming connections into listen sockets should inherit all settings and options except listen specific options. Add the missing send/receive timeouts and low watermarks. Remove inheritance of the field so_timeo which is unused. Noticed by: phk Reviewed by: rwatson Sponsored by: TCP/IP Optimization Fundraise 2005 MFC after: 3 days	2006-09-10 17:08:06 +00:00
George V. Neville-Neil	daa5817e92	Fix a kernel panic based on receiving an ICMPv6 Packet too Big message. PR: 99779 Submitted by: Jinmei Tatuya Reviewed by: clement, rwatson MFC after: 1 week	2006-08-18 14:05:13 +00:00
Robert Watson	79ad81c06d	Before performing a sodealloc() when pru_attach() fails, assert that the socket refcount remains 1, and then drop to 0 before freeing the socket. PR: 101763 Reported by: Gleb Kozyrev <gkozyrev at ukr dot net>	2006-08-11 23:03:10 +00:00
Robert Watson	9126410f4b	Move destroying kqueue state from above pru_detach to below it in sofree(), as a number of protocols expect to be able to call soisdisconnected() during detach. That may not be a good assumption, but until I'm sure if it's a good assumption or not, allow it.	2006-08-02 18:37:44 +00:00
Robert Watson	c0e1415d51	Move update of 'numopensockets' from bottom of sodealloc() to the top, eliminating a second set of identical mutex operations at the bottom. This allows brief exceeding of the max sockets limit, but only by sockets in the last stages of being torn down.	2006-08-02 00:45:27 +00:00
Robert Watson	eaa6dfbcc2	Reimplement socket buffer tear-down in sofree(): as the socket is no longer referenced by other threads (hence our freeing it), we don't need to set the can't send and can't receive flags, wake up the consumers, perform two levels of locking, etc. Implement a fast-path teardown, sbdestroy(), which flushes and releases each socket buffer. A manual dom_dispose of the receive buffer is still required explicitly to GC any in-flight file descriptors, etc, before flushing the buffer. This results in a 9% UP performance improvement and 16% SMP performance improvement on a tight loop of socket();close(); in micro-benchmarking, but will likely also affect CPU-bound macro-benchmark performance.	2006-08-01 10:30:26 +00:00
Robert Watson	b0668f7151	soreceive_generic(), and sopoll_generic(). Add new functions sosend(), soreceive(), and sopoll(), which are wrappers for pru_sosend, pru_soreceive, and pru_sopoll, and are now used universally by socket consumers rather than either directly invoking the old so*() functions or directly invoking the protocol switch method (about an even split prior to this commit). This completes an architectural change that was begun in 1996 to permit protocols to provide substitute implementations, as now used by UDP. Consumers now uniformly invoke sosend(), soreceive(), and sopoll() to perform these operations on sockets -- in particular, distributed file systems and socket system calls. Architectural head nod: sam, gnn, wollman	2006-07-24 15:20:08 +00:00
Robert Watson	809c2b789c	Update various uipc_socket.c comments, and reformat others.	2006-07-23 20:36:04 +00:00
Robert Watson	a152f8a361	Change semantics of socket close and detach. Add a new protocol switch function, pru_close, to notify protocols that the file descriptor or other consumer of a socket is closing the socket. pru_abort is now a notification of close also, and no longer detaches. pru_detach is no longer used to notify of close, and will be called during socket tear-down by sofree() when all references to a socket evaporate after an earlier call to abort or close the socket. This means detach is now an unconditional teardown of a socket, whereas previously sockets could persist after detach if the protocol retained a reference. This facilitates sharing mutexes between layers of the network stack as the mutex is required during the checking and removal of references at the head of sofree(). With this change, pru_detach can now assume that the mutex will no longer be required by the socket layer after completion, whereas before this was not necessarily true. Reviewed by: gnn	2006-07-21 17:11:15 +00:00
Robert Watson	5cd1a27145	Change comment on soabort() to more accurately describe how/when soabort() is used. Remove trailing white space.	2006-07-16 23:09:39 +00:00
Robert Watson	5908c617bb	Several protocol switch functions (pru_abort, pru_detach, pru_sosetlabel) return void, so don't implement no-op versions of these functions. Instead, consistently check if those switch pointers are NULL before invoking them.	2006-07-11 23:18:28 +00:00
Robert Watson	f949ae9b31	When pru_attach() fails, call sodealloc() on the socket rather than using sorele() and the full tear-down path. Since protocol state allocation failed, this is not required (and is arguably undesirable). This matches the behavior of sonewconn() under the same circumstances.	2006-07-11 21:56:58 +00:00
Robert Watson	721150ad8f	When retrieving SO_ERROR via getsockopt(), hold the socket lock around the retrieval and replacement with 0. MFC after: 1 week	2006-06-18 19:02:49 +00:00
Robert Watson	b37ffd3189	Move some functions and definitions from uipc_socket2.c to uipc_socket.c: - Move sonewconn(), which creates new sockets for incoming connections on listen sockets, so that all socket allocate code is together in uipc_socket.c. - Move 'maxsockets' and associated sysctls to uipc_socket.c with the socket allocation code. - Move kern.ipc sysctl node to uipc_socket.c, add a SYSCTL_DECL() for it to sysctl.h and remove lots of scattered implementations in various IPC modules. - Sort sodealloc() after soalloc() in uipc_socket.c for dependency order reasons. Staticize soalloc() and sodealloc() as they are now required only in uipc_socket.c, and are internal to the socket implementation. After this change, socket allocation and deallocation is entirely centralized in one file, and uipc_socket2.c consists entirely of socket buffer manipulation and default protocol switch functions. MFC after: 1 month	2006-06-10 14:34:07 +00:00
Robert Watson	e02421f3fb	Rearrange code in soalloc() so that it's less indented by returning early if uma_zalloc() from the socket zone fails. No functional change. MFC after: 1 week	2006-06-08 22:33:18 +00:00
Robert Watson	0cec9959e8	Assert that sockets passed into soabort() not be SQ_COMP or SQ_INCOMP, since that removal should have been done a layer up. MFC after: 3 months	2006-04-23 18:15:54 +00:00
Robert Watson	28ea180136	Add missing 'not' to SQ_COMP comment. MFC after: 3 months	2006-04-23 15:37:23 +00:00
Robert Watson	6ca35d4b81	Move handling of SQ_COMP exception case in sofree() to the top of the function along with the remainder of the reference checking code. Move comment from body to header with remainder of comments. Inclusion of a socket in a completed connection queue counts as a true reference, and should not be handled as an under-documented edge case. MFC after: 3 months	2006-04-23 15:33:38 +00:00
Robert Watson	bc725eafc7	Change protocol switch method pru_detach() so that it returns void rather than an error. Detaches do not "fail", they either occur or the protocol flags SS_PROTOREF to take ownership of the socket. soclose() no longer looks at so_pcb to see if it's NULL, relying entirely on the protocol to decide whether it's time to free the socket or not using SS_PROTOREF. so_pcb is now entirely owned and managed by the protocol code. Likewise, no longer test so_pcb in other socket functions, such as soreceive(), which have no business digging into protocol internals. Protocol detach routines no longer try to free the socket on detach, this is performed in the socket code if the protocol permits it. In rts_detach(), no longer test for rp != NULL in detach, and likewise in other protocols that don't permit a NULL so_pcb, reduce the incidence of testing for it during detach. netinet and netinet6 are not fully updated to this change, which will be in an upcoming commit. In their current state they may leak memory or panic. MFC after: 3 months	2006-04-01 15:42:02 +00:00
Robert Watson	ac45e92ff2	Change protocol switch pru_abort() API so that it returns void rather than an int, as an error here is not meaningful. Modify soabort() to unconditionally free the socket on the return of pru_abort(), and modify most protocols to no longer conditionally free the socket, since the caller will do this. This commit likely leaves parts of netinet and netinet6 in a situation where they may panic or leak memory, as they are not fully updated by this commit. This will be corrected shortly in followup commits to these components. MFC after: 3 months	2006-04-01 15:15:05 +00:00
Robert Watson	7f689de232	Assert so->so_pcb is NULL in sodealloc() -- the protocol state should not be present at this point. We will eventually remove this assert because the socket layer should never look at so_pcb, but for now it's a useful debugging tool. MFC after: 3 months	2006-04-01 10:45:52 +00:00
Robert Watson	220c1357ed	Add a somewhat sizable comment documenting the semantics of various kernel socket calls relating to the creation and destruction of sockets. This will eventually form the foundation of socket(9), but is currently in too much flux to do so. MFC after: 3 months	2006-04-01 10:43:02 +00:00
Robert Watson	92c07a345e	Change soabort() from returning int to returning void, since all consumers ignore the return value, soabort() is required to succeed, and protocols produce errors here to report multiple freeing of the pcb, which we hope to eliminate.	2006-03-16 07:03:14 +00:00
Robert Watson	93709ad0be	As with socket consumer references (so_count), make sofree() return without GC'ing the socket if a strong protocol reference to the socket is present (SS_PROTOREF).	2006-03-15 12:45:35 +00:00
Robert Watson	13f322c2fc	Improve consistency of return() style. MFC after: 3 days	2006-02-12 15:00:27 +00:00
Robert Watson	b8ae1cd619	Add sosend_dgram(), a greatly reduced and simplified version of sosend() intended for use solely with atomic datagram socket types, and relies on the previous break-out of sosend_copyin(). Changes to allow UDP to optionally use this instead of sosend() will be committed as a follow-up.	2006-01-13 10:22:01 +00:00
John Baldwin	398293a8de	Fix snderr() to not leak the socket buffer lock if an error occurs in sosend(). Robert accidentally changed the snderr() macro to jump to the out label which assumes the lock is already released rather than the release label which drops the lock in his previous change to sosend(). This should fix the recent panics about returning from write(2) with the socket lock held and the most recent LOR on current@.	2005-11-29 23:07:14 +00:00
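
The most recent entry above (398293a8de) describes an error path that jumped to a label past the unlock and so returned with the lock still held. A minimal userspace sketch of the two-label cleanup pattern that message refers to follows. It is not the kernel's sosend(): `toy_lock`, `toy_lock_acquire()`, `toy_lock_release()`, `toy_send()` and the `fail_after_lock` parameter are hypothetical names invented for illustration, and a boolean stands in for the real socket buffer lock.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

/* Hypothetical stand-in for a socket buffer I/O lock; not the kernel's sblock(). */
struct toy_lock {
	bool held;
};

static int
toy_lock_acquire(struct toy_lock *l)
{
	assert(!l->held);
	l->held = true;
	return (0);
}

static void
toy_lock_release(struct toy_lock *l)
{
	assert(l->held);
	l->held = false;
}

/*
 * Hypothetical send path.  'fail_after_lock' simulates an error that is
 * detected while the lock is held.
 */
int
toy_send(struct toy_lock *l, bool fail_after_lock)
{
	int error;

	error = toy_lock_acquire(l);
	if (error != 0)
		goto out;		/* Lock never taken: skip the unlock. */

	if (fail_after_lock) {
		error = EPIPE;
		goto release;		/* Lock held: must pass through the unlock. */
	}

	/* ... the actual data transfer would happen here ... */

release:
	toy_lock_release(l);
out:
	return (error);
}
```

In this sketch, changing `goto release` to `goto out` in the error branch reproduces the class of bug the commit message describes: the function would return with `l->held` still true.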

304 Commits