freebsd-skq

Author	SHA1	Message	Date
George V. Neville-Neil	daa5817e92	Fix a kernel panic based on receiving an ICMPv6 Packet too Big message. PR: 99779 Submitted by: Jinmei Tatuya Reviewed by: clement, rwatson MFC after: 1 week	2006-08-18 14:05:13 +00:00
Robert Watson	79ad81c06d	Before performing a sodealloc() when pru_attach() fails, assert that the socket refcount remains 1, and then drop to 0 before freeing the socket. PR: 101763 Reported by: Gleb Kozyrev <gkozyrev at ukr dot net>	2006-08-11 23:03:10 +00:00
Robert Watson	9126410f4b	Move destroying kqueue state from above pru_detach to below it in sofree(), as a number of protocols expect to be able to call soisdisconnected() during detach. That may not be a good assumption, but until I'm sure if it's a good assumption or not, allow it.	2006-08-02 18:37:44 +00:00
Robert Watson	c0e1415d51	Move updated of 'numopensockets' from bottom of sodealloc() to the top, eliminating a second set of identical mutex operations at the bottom. This allows brief exceeding of the max sockets limit, but only by sockets in the last stages of being torn down.	2006-08-02 00:45:27 +00:00
Robert Watson	eaa6dfbcc2	Reimplement socket buffer tear-down in sofree(): as the socket is no longer referenced by other threads (hence our freeing it), we don't need to set the can't send and can't receive flags, wake up the consumers, perform two levels of locking, etc. Implement a fast-path teardown, sbdestroy(), which flushes and releases each socket buffer. A manual dom_dispose of the receive buffer is still required explicitly to GC any in-flight file descriptors, etc, before flushing the buffer. This results in a 9% UP performance improvement and 16% SMP performance improvement on a tight loop of socket();close(); in micro-benchmarking, but will likely also affect CPU-bound macro-benchmark performance.	2006-08-01 10:30:26 +00:00
Robert Watson	b0668f7151	soreceive_generic(), and sopoll_generic(). Add new functions sosend(), soreceive(), and sopoll(), which are wrappers for pru_sosend, pru_soreceive, and pru_sopoll, and are now used univerally by socket consumers rather than either directly invoking the old so*() functions or directly invoking the protocol switch method (about an even split prior to this commit). This completes an architectural change that was begun in 1996 to permit protocols to provide substitute implementations, as now used by UDP. Consumers now uniformly invoke sosend(), soreceive(), and sopoll() to perform these operations on sockets -- in particular, distributed file systems and socket system calls. Architectural head nod: sam, gnn, wollman	2006-07-24 15:20:08 +00:00
Robert Watson	809c2b789c	Update various uipc_socket.c comments, and reformat others.	2006-07-23 20:36:04 +00:00
Robert Watson	a152f8a361	Change semantics of socket close and detach. Add a new protocol switch function, pru_close, to notify protocols that the file descriptor or other consumer of a socket is closing the socket. pru_abort is now a notification of close also, and no longer detaches. pru_detach is no longer used to notify of close, and will be called during socket tear-down by sofree() when all references to a socket evaporate after an earlier call to abort or close the socket. This means detach is now an unconditional teardown of a socket, whereas previously sockets could persist after detach of the protocol retained a reference. This faciliates sharing mutexes between layers of the network stack as the mutex is required during the checking and removal of references at the head of sofree(). With this change, pru_detach can now assume that the mutex will no longer be required by the socket layer after completion, whereas before this was not necessarily true. Reviewed by: gnn	2006-07-21 17:11:15 +00:00
Robert Watson	5cd1a27145	Change comment on soabort() to more accurately describe how/when soabort() is used. Remove trailing white space.	2006-07-16 23:09:39 +00:00
Robert Watson	5908c617bb	Several protocol switch functions (pru_abort, pru_detach, pru_sosetlabel) return void, so don't implement no-op versions of these functions. Instead, consistently check if those switch pointers are NULL before invoking them.	2006-07-11 23:18:28 +00:00
Robert Watson	f949ae9b31	When pru_attach() fails, call sodealloc() on the socket rather than using sorele() and the full tear-down path. Since protocol state allocation failed, this is not required (and is arguably undesirable). This matches the behavior of sonewconn() under the same circumstances.	2006-07-11 21:56:58 +00:00
Robert Watson	721150ad8f	When retrieving SO_ERROR via getsockopt(), hold the socket lock around the retrieval and replacement with 0. MFC after: 1 week	2006-06-18 19:02:49 +00:00
Robert Watson	b37ffd3189	Move some functions and definitions from uipc_socket2.c to uipc_socket.c: - Move sonewconn(), which creates new sockets for incoming connections on listen sockets, so that all socket allocate code is together in uipc_socket.c. - Move 'maxsockets' and associated sysctls to uipc_socket.c with the socket allocation code. - Move kern.ipc sysctl node to uipc_socket.c, add a SYSCTL_DECL() for it to sysctl.h and remove lots of scattered implementations in various IPC modules. - Sort sodealloc() after soalloc() in uipc_socket.c for dependency order reasons. Statisticize soalloc() and sodealloc() as they are now required only in uipc_socket.c, and are internal to the socket implementation. After this change, socket allocation and deallocation is entirely centralized in one file, and uipc_socket2.c consists entirely of socket buffer manipulation and default protocol switch functions. MFC after: 1 month	2006-06-10 14:34:07 +00:00
Robert Watson	e02421f3fb	Rearrange code in soalloc() so that it's less indented by returning early if uma_zalloc() from the socket zone fails. No functional change. MFC after: 1 week	2006-06-08 22:33:18 +00:00
Robert Watson	0cec9959e8	Assert that sockets passed into soabort() not be SQ_COMP or SQ_INCOMP, since that removal should have been done a layer up. MFC after: 3 months	2006-04-23 18:15:54 +00:00
Robert Watson	28ea180136	Add missing 'not' to SQ_COMP comment. MFC after: 3 months	2006-04-23 15:37:23 +00:00
Robert Watson	6ca35d4b81	Move handling of SQ_COMP exception case in sofree() to the top of the function along with the remainder of the reference checking code. Move comment from body to header with remainder of comments. Inclusion of a socket in a completed connection queue counts as a true reference, and should not be handled as an under-documented edge case. MFC after: 3 months	2006-04-23 15:33:38 +00:00
Robert Watson	bc725eafc7	Chance protocol switch method pru_detach() so that it returns void rather than an error. Detaches do not "fail", they other occur or the protocol flags SS_PROTOREF to take ownership of the socket. soclose() no longer looks at so_pcb to see if it's NULL, relying entirely on the protocol to decide whether it's time to free the socket or not using SS_PROTOREF. so_pcb is now entirely owned and managed by the protocol code. Likewise, no longer test so_pcb in other socket functions, such as soreceive(), which have no business digging into protocol internals. Protocol detach routines no longer try to free the socket on detach, this is performed in the socket code if the protocol permits it. In rts_detach(), no longer test for rp != NULL in detach, and likewise in other protocols that don't permit a NULL so_pcb, reduce the incidence of testing for it during detach. netinet and netinet6 are not fully updated to this change, which will be in an upcoming commit. In their current state they may leak memory or panic. MFC after: 3 months	2006-04-01 15:42:02 +00:00
Robert Watson	ac45e92ff2	Change protocol switch pru_abort() API so that it returns void rather than an int, as an error here is not meaningful. Modify soabort() to unconditionally free the socket on the return of pru_abort(), and modify most protocols to no longer conditionally free the socket, since the caller will do this. This commit likely leaves parts of netinet and netinet6 in a situation where they may panic or leak memory, as they have not are not fully updated by this commit. This will be corrected shortly in followup commits to these components. MFC after: 3 months	2006-04-01 15:15:05 +00:00
Robert Watson	7f689de232	Assert so->so_pcb is NULL in sodealloc() -- the protocol state should not be present at this point. We will eventually remove this assert because the socket layer should never look at so_pcb, but for now it's a useful debugging tool. MFC after: 3 months	2006-04-01 10:45:52 +00:00
Robert Watson	220c1357ed	Add a somewhat sizable comment documenting the semantics of various kernel socket calls relating to the creation and destruction of sockets. This will eventually form the foundation of socket(9), but is currently in too much flux to do so. MFC after: 3 months	2006-04-01 10:43:02 +00:00
Robert Watson	92c07a345e	Change soabort() from returning int to returning void, since all consumers ignore the return value, soabort() is required to succeed, and protocols produce errors here to report multiple freeing of the pcb, which we hope to eliminate.	2006-03-16 07:03:14 +00:00
Robert Watson	93709ad0be	As with socket consumer references (so_count), make sofree() return without GC'ing the socket if a strong protocol reference to the socket is present (SS_PROTOREF).	2006-03-15 12:45:35 +00:00
Robert Watson	13f322c2fc	Improve consistency of return() style. MFC after: 3 days	2006-02-12 15:00:27 +00:00
Robert Watson	b8ae1cd619	Add sosend_dgram(), a greatly reduced and simplified version of sosend() intended for use solely with atomic datagram socket types, and relies on the previous break-out of sosend_copyin(). Changes to allow UDP to optionally use this instead of sosend() will be committed as a follow-up.	2006-01-13 10:22:01 +00:00
John Baldwin	398293a8de	Fix snderr() to not leak the socket buffer lock if an error occurs in sosend(). Robert accidentally changed the snderr() macro to jump to the out label which assumes the lock is already released rather than the release label which drops the lock in his previous change to sosend(). This should fix the recent panics about returning from write(2) with the socket lock held and the most recent LOR on current@.	2005-11-29 23:07:14 +00:00
Robert Watson	66dd8a6f99	Move zero copy statistics structure before sosend_copyin(). MFC after: 1 month Reported by: tinderbox, sam	2005-11-28 21:45:36 +00:00
Robert Watson	a725629cf8	Break out functionality in sosend() responsible for building mbuf chains and copying in mbufs from the body of the send logic, creating a new function sosend_copyin(). This changes makes sosend() almost readable, and will allow the same logic to be used by tailored socket send routines. MFC after: 1 month Reviewed by: andre, glebius	2005-11-28 18:09:03 +00:00
Andre Oppermann	34333b16cd	Retire MT_HEADER mbuf type and change its users to use MT_DATA. Having an additional MT_HEADER mbuf type is superfluous and redundant as nothing depends on it. It only adds a layer of confusion. The distinction between header mbuf's and data mbuf's is solely done through the m->m_flags M_PKTHDR flag. Non-native code is not changed in this commit. For compatibility MT_HEADER is mapped to MT_DATA. Sponsored by: TCP/IP Optimization Fundraise 2005	2005-11-02 13:46:32 +00:00
Robert Watson	d374e81efd	Push the assignment of a new or updated so_qlimit from solisten() following the protocol pru_listen() call to solisten_proto(), so that it occurs under the socket lock acquisition that also sets SO_ACCEPTCONN. This requires passing the new backlog parameter to the protocol, which also allows the protocol to be aware of changes in queue limit should it wish to do something about the new queue limit. This continues a move towards the socket layer acting as a library for the protocol. Bump __FreeBSD_version due to a change in the in-kernel protocol interface. This change has been tested with IPv4 and UNIX domain sockets, but not other protocols.	2005-10-30 19:44:40 +00:00
Paul Saab	53f5742d33	Allow 32bit get/setsockopt with SO_SNDTIMEO or SO_RECVTIMEO to work.	2005-10-27 04:26:35 +00:00
Robert Watson	8434c29b28	Add three new read-only socket options, which allow regression tests and other applications to query the state of the stack regarding the accept queue on a listen socket: SO_LISTENQLIMIT Return the value of so_qlimit (socket backlog) SO_LISTENQLEN Return the value of so_qlen (complete sockets) SO_LISTENINCQLEN Return the value of so_incqlen (incomplete sockets) Minor white space tweaks to existing socket options to make them consistent. Discussed with: andre MFC after: 1 week	2005-09-18 21:08:03 +00:00
Robert Watson	bc6b8b5d64	Fix spelling in a comment. MFC after: 3 days	2005-09-18 10:46:34 +00:00
Maxim Konovalov	aada5cccd8	Backout rev. 1.246, it breaks code uses shutdown(2) on non-connected sockets. Pointed out by: rwatson	2005-09-15 13:18:05 +00:00
Maxim Konovalov	c5cff17017	o Return ENOTCONN when shutdown(2) on non-connected socket. PR: kern/84761 Submitted by: James Juran R-test: tools/regression/sockets/shutdown MFC after: 1 month	2005-09-15 11:45:36 +00:00
Gleb Smirnoff	016e62123a	In soreceive(), when a first mbuf is removed from socket buffer use sockbuf_pushsync(). Previous manipulation could lead to an inconsistent mbuf. Reviewed by: rwatson	2005-09-06 17:05:11 +00:00
Kelly Yancey	dcb5fef5db	Make getsockopt(..., SOL_SOCKET, SO_ACCEPTCONN, ...) work per IEEE Std 1003.1 (POSIX).	2005-08-01 21:15:09 +00:00
George V. Neville-Neil	0d52d7b01a	Fix for PR 83885. Make sure that there actually is a next packet before setting nextrecord to that field. PR: 83885 Submitted by: hirose@comm.yamaha.co.jp Obtained from: Patch suggested in the PR MFC after: 1 week	2005-07-28 10:10:01 +00:00
Suleiman Souhlal	571dcd15e2	Fix the recent panics/LORs/hangs created by my kqueue commit by: - Introducing the possibility of using locks different than mutexes for the knlist locking. In order to do this, we add three arguments to knlist_init() to specify the functions to use to lock, unlock and check if the lock is owned. If these arguments are NULL, we assume mtx_lock, mtx_unlock and mtx_owned, respectively. - Using the vnode lock for the knlist locking, when doing kqueue operations on a vnode. This way, we don't have to lock the vnode while holding a mutex, in filt_vfsread. Reviewed by: jmg Approved by: re (scottl), scottl (mentor override) Pointyhat to: ssouhlal Will be happy: everyone	2005-07-01 16:28:32 +00:00
Brooks Davis	fc74a9f93a	Stop embedding struct ifnet at the top of driver softcs. Instead the struct ifnet or the layer 2 common structure it was embedded in have been replaced with a struct ifnet pointer to be filled by a call to the new function, if_alloc(). The layer 2 common structure is also allocated via if_alloc() based on the interface type. It is hung off the new struct ifnet member, if_l2com. This change removes the size of these structures from the kernel ABI and will allow us to better manage them as interfaces come and go. Other changes of note: - Struct arpcom is no longer referenced in normal interface code. Instead the Ethernet address is accessed via the IFP2ENADDR() macro. To enforce this ac_enaddr has been renamed to _ac_enaddr. - The second argument to ether_ifattach is now always the mac address from driver private storage rather than sometimes being ac_enaddr. Reviewed by: sobomax, sam	2005-06-10 16:49:24 +00:00
Scott Long	8bde93598a	Drat! Committed from the wrong branch. Restore HEAD to its previous goodness.	2005-06-09 19:59:09 +00:00
Scott Long	76b472dbda	Back out 1.68.2.26. It was a mis-guided change that was already backed out of HEAD and should not have been MFC'd. This will restore UDP socket functionality, which will correct the recent NFS problems. Submitted by: rwatson	2005-06-09 19:56:38 +00:00
Andrew Gallatin	92dd256bd4	Allow sends sent from non page-aligned userspace addresses to be considered for zero-copy sends. Reviewed by: alc Submitted by: Romer Gil at Rice University	2005-06-05 17:13:23 +00:00
Robert Watson	a59f81d263	Move the logic implementing retrieval of the SO_ACCEPTFILTER socket option from uipc_socket.c to uipc_accf.c in do_getopt_accept_filter(), so that it now matches do_setopt_accept_filter(). Slightly reformulate the logic to match the optimistic allocation of storage for the argument in advance, and slightly expand the coverage of the socket lock.	2005-03-12 12:57:18 +00:00
Robert Watson	56856fbfb4	Remove an additional commented out reference to a possible future sx lock.	2005-03-11 19:16:02 +00:00
Robert Watson	2b37548a71	When setting up a socket in socreate(), there's no need to lock the socket lock around knlist_init(), so don't. Hard code the setting of the socket reference count to 1 rather than using soref() to avoid asserting the socket lock, since we've not yet exposed the socket to other threads. This removes two mutex operations from each socket allocation.	2005-03-11 16:30:02 +00:00
Robert Watson	5fab68b19e	Remove suggestive sx_init() comment in soalloc(). We will have something like this at some point, but for now it clutters the source.	2005-03-11 16:26:33 +00:00
Robert Watson	0daccb9c94	In the current world order, solisten() implements the state transition of a socket from a regular socket to a listening socket able to accept new connections. As part of this state transition, solisten() calls into the protocol to update protocol-layer state. There were several bugs in this implementation that could result in a race wherein a TCP SYN received in the interval between the protocol state transition and the shortly following socket layer transition would result in a panic in the TCP code, as the socket would be in the TCPS_LISTEN state, but the socket would not have the SO_ACCEPTCONN flag set. This change does the following: - Pushes the socket state transition from the socket layer solisten() to to socket "library" routines called from the protocol. This permits the socket routines to be called while holding the protocol mutexes, preventing a race exposing the incomplete socket state transition to TCP after the TCP state transition has completed. The check for a socket layer state transition is performed by solisten_proto_check(), and the actual transition is performed by solisten_proto(). - Holds the socket lock for the duration of the socket state test and set, and over the protocol layer state transition, which is now possible as the socket lock is acquired by the protocol layer, rather than vice versa. This prevents additional state related races in the socket layer. This permits the dual transition of socket layer and protocol layer state to occur while holding locks for both layers, making the two changes atomic with respect to one another. Similar changes are likely require elsewhere in the socket/protocol code. Reported by: Peter Holm <peter@holm.cc> Review and fixes from: emax, Antoine Brodin <antoine.brodin@laposte.net> Philosophical head nod: gnn	2005-02-21 21:58:17 +00:00
Robert Watson	a00428ef92	In soreceive(), when considering delivery to a socket in SS_ISCONFIRMING, only call the protocol's pru_rcvd() if the protocol has the flag PR_WANTRCVD set. This brings that instance of pru_rcvd() into line with the rest, which do check the flag. MFC after: 3 days	2005-02-20 15:54:44 +00:00
Robert Watson	a7ae36bc45	Correct a typo in the comment describing soreceive_rcvoob(). MFC after: 3 days	2005-02-18 19:15:22 +00:00

1 2 3 4 5 ...

280 Commits