freebsd-skq

Author	SHA1	Message	Date
rwatson	b3be1c6e3b	Introduce and use a sysinit-based initialization scheme for virtual network stacks, VNET_SYSINIT: - Add VNET_SYSINIT and VNET_SYSUNINIT macros to declare events that will occur each time a network stack is instantiated and destroyed. In the !VIMAGE case, these are simply mapped into regular SYSINIT/SYSUNINIT. For the VIMAGE case, we instead use SYSINIT's to track their order and properties on registration, using them for each vnet when created/ destroyed, or immediately on module load for already-started vnets. - Remove vnet_modinfo mechanism that existed to serve this purpose previously, as well as its dependency scheme: we now just use the SYSINIT ordering scheme. - Implement VNET_DOMAIN_SET() to allow protocol domains to declare that they want init functions to be called for each virtual network stack rather than just once at boot, compiling down to DOMAIN_SET() in the non-VIMAGE case. - Walk all virtualized kernel subsystems and make use of these instead of modinfo or DOMAIN_SET() for init/uninit events. In some cases, convert modular components from using modevent to using sysinit (where appropriate). In some cases, do minor rejuggling of SYSINIT ordering to make room for or better manage events. Portions submitted by: jhb (VNET_SYSINIT), bz (cleanup) Discussed with: jhb, bz, julian, zec Reviewed by: bz Approved by: re (VIMAGE blanket)	2009-07-23 20:46:49 +00:00
jhb	a1af9ecca4	Rework socket upcalls to close some races with setup/teardown of upcalls. - Each socket upcall is now invoked with the appropriate socket buffer locked. It is not permissible to call soisconnected() with this lock held, however, so socket upcalls now return an integer value. The two possible values are SU_OK and SU_ISCONNECTED. If an upcall returns SU_ISCONNECTED, then soisconnected() will be invoked on the socket after the socket buffer lock is dropped. - A new API is provided for setting and clearing socket upcalls. The API consists of soupcall_set() and soupcall_clear(). - To simplify locking, each socket buffer now has a separate upcall. - When a socket upcall returns SU_ISCONNECTED, the upcall is cleared from the receive socket buffer automatically. Note that a SO_SND upcall should never return SU_ISCONNECTED. - All this means that accept filters should now return SU_ISCONNECTED instead of calling soisconnected() directly. They also no longer need to explicitly clear the upcall on the new socket. - The HTTP accept filter still uses soupcall_set() to manage its internal state machine, but other accept filters no longer have any explicit knowledge of socket upcall internals aside from their return value. - The various RPC client upcalls currently drop the socket buffer lock while invoking soreceive() as a temporary band-aid. The plan for the future is to add a new flag to allow soreceive() to be called with the socket buffer locked. - The AIO callback for socket I/O is now also invoked with the socket buffer locked. Previously sowakeup() would drop the socket buffer lock only to call aio_swake() which immediately re-acquired the socket buffer lock for the duration of the function call. Discussed with: rwatson, rmacklem	2009-06-01 21:17:03 +00:00
emax	b3c91fe7cc	Update comment. soalloc() is no longer performing M_WAITOK memory allocations. Submitted by: ru MFC after: 3 days	2009-02-10 20:27:05 +00:00
emax	9bddc26cc8	Allow unprivileged users to run l2ping(8). MFC after: 1 month	2009-02-04 22:44:09 +00:00
des	a1e1ad22e0	Fix a number of style issues in the MALLOC / FREE commit. I've tried to be careful not to fix anything that was already broken; the NFSv4 code is particularly bad in this respect.	2008-10-23 20:26:15 +00:00
des	66f807ed8b	Retire the MALLOC and FREE macros. They are an abomination unto style(9). MFC after: 3 months	2008-10-23 15:53:51 +00:00
emax	5d25ad7852	Implement ratelimiting for debug messages. For now, allow at most one message per second. In the future might add a sysctl knob for each socket family to fine tune this. MFC after: 1 week	2008-08-01 00:36:43 +00:00
emax	dbb1414312	Increase maximum input queue size limit for raw Bluetooth HCI sockets. MFC after: 3 days	2008-08-01 00:16:40 +00:00
emax	8ea609367b	Fix locking bug, i.e. lock "wildcard" matched pcb before return.	2008-08-01 00:13:32 +00:00
emax	bb4c6de0cf	Introduce support for Bluetooth SCO sockets. This is based on older code that was revisited. MFC after: 3 months	2008-07-30 22:41:23 +00:00
emax	ff226f6ee0	Fix locking issue in ng_btsocket_l2cap_ctloutput() Submitted by: Heiko Wundram (Beenic) < wundram at beenic dot net > MFC after: 3 days	2007-10-31 16:17:20 +00:00
emax	0cf18b2c7c	Allow RFCOMM servers to bind to a ''wildcard'' RFCOMM channel zero (0). Actual RFCOMM channel will be assigned after listen(2) call is done on a RFCOMM socket bound to a ''wildcard'' RFCOMM channel zero (0). Address locking issues in ng_btsocket_rfcomm_bind() Submitted by: Heiko Wundram (Beenic) < wundram at beenic dot net > MFC after: 1 week	2007-10-29 19:06:47 +00:00
emax	e04fc3e9d0	Return EADDRNOTAVAIL instead of EDESTADDRREQ error when listen(2) is called on improperly bound socket. Suggested by: Iain Hibbert Approved by: re (kensmith) MFC after: 3 days	2007-08-23 16:55:22 +00:00
emax	0d312fb512	Replace sosend() with direct call to .pru_send method on the L2CAP socket. This is to avoid LOR with sx(9) lock in sblock() called from sosend_generic(). Approved by: re (kensmith) MFC after: 1 week	2007-06-21 19:55:49 +00:00
rwatson	79a2e40812	Universally adopt most conventional spelling of acquire.	2007-05-27 20:50:23 +00:00
maxim	f01ad312df	o Update a comment: sonewconn() lives in uipc_socket.c now.	2007-03-26 18:17:57 +00:00
rwatson	10d0d9cf47	Sweep kernel replacing suser(9) calls with priv(9) calls, assigning specific privilege names to a broad range of privileges. These may require some future tweaking. Sponsored by: nCircle Network Security, Inc. Obtained from: TrustedBSD Project Discussed on: arch@ Reviewed (at least in part) by: mlaier, jmg, pjd, bde, ceri, Alex Lyashkov <umka at sevcity dot net>, Skip Ford <skip dot ford at verizon dot net>, Antoine Brodin <antoine dot brodin at laposte dot net>	2006-11-06 13:42:10 +00:00
emax	06408c929f	- Catch up with ongoing rwatson's socket work; - Fix a couple of LORs and panics; - Temporarily remove the code that tries to cleanup sockets that stuck on accepting queues (both complete and incomplete). I'm taking an ostrich approach here until I find a better way to deal with sockets that were disconnected before accepting (i.e. while socket was on complete or incomplete accept queue).	2006-08-25 17:53:13 +00:00
emax	2b65f3157a	Define mtu as u_int16_t not as int. This should fix problem with rfcomm on sparc64. Reported by: Andrew Belashov <bel at orel dot ru> Tested by: Andrew Belashov <bel at orel dot ru> MFC after: 3 days	2006-08-24 16:51:02 +00:00
rwatson	40868fda8a	soreceive_generic(), and sopoll_generic(). Add new functions sosend(), soreceive(), and sopoll(), which are wrappers for pru_sosend, pru_soreceive, and pru_sopoll, and are now used universally by socket consumers rather than either directly invoking the old so*() functions or directly invoking the protocol switch method (about an even split prior to this commit). This completes an architectural change that was begun in 1996 to permit protocols to provide substitute implementations, as now used by UDP. Consumers now uniformly invoke sosend(), soreceive(), and sopoll() to perform these operations on sockets -- in particular, distributed file systems and socket system calls. Architectural head nod: sam, gnn, wollman	2006-07-24 15:20:08 +00:00
rwatson	720efebbba	Change semantics of socket close and detach. Add a new protocol switch function, pru_close, to notify protocols that the file descriptor or other consumer of a socket is closing the socket. pru_abort is now a notification of close also, and no longer detaches. pru_detach is no longer used to notify of close, and will be called during socket tear-down by sofree() when all references to a socket evaporate after an earlier call to abort or close the socket. This means detach is now an unconditional teardown of a socket, whereas previously sockets could persist after detach if the protocol retained a reference. This facilitates sharing mutexes between layers of the network stack as the mutex is required during the checking and removal of references at the head of sofree(). With this change, pru_detach can now assume that the mutex will no longer be required by the socket layer after completion, whereas before this was not necessarily true. Reviewed by: gnn	2006-07-21 17:11:15 +00:00
emax	2ec6a6b5e7	Add new SIOC_HCI_RAW_NODE_LIST_NAMES ioctl. User-space applications can use this ioctl to obtain the list of HCI nodes. User-space application is expected to preallocate 'ng_btsocket_hci_raw_node_list_names' structure and set limit in 'num_nodes' field. The 'nodes' field should be allocated as well and it should have space for at least 'num_nodes' elements. The SIOC_HCI_RAW_NODE_LIST_NAMES should be issued on bound raw HCI socket. It does not really matter what HCI name the socket is bound to, as long as it is not empty. MFC after: 1 week	2006-05-17 00:13:07 +00:00
rwatson	5479e5d692	Change protocol switch method pru_detach() so that it returns void rather than an error. Detaches do not "fail", they either occur or the protocol flags SS_PROTOREF to take ownership of the socket. soclose() no longer looks at so_pcb to see if it's NULL, relying entirely on the protocol to decide whether it's time to free the socket or not using SS_PROTOREF. so_pcb is now entirely owned and managed by the protocol code. Likewise, no longer test so_pcb in other socket functions, such as soreceive(), which have no business digging into protocol internals. Protocol detach routines no longer try to free the socket on detach, this is performed in the socket code if the protocol permits it. In rts_detach(), no longer test for rp != NULL in detach, and likewise in other protocols that don't permit a NULL so_pcb, reduce the incidence of testing for it during detach. netinet and netinet6 are not fully updated to this change, which will be in an upcoming commit. In their current state they may leak memory or panic. MFC after: 3 months	2006-04-01 15:42:02 +00:00
rwatson	8622e776f9	Change protocol switch pru_abort() API so that it returns void rather than an int, as an error here is not meaningful. Modify soabort() to unconditionally free the socket on the return of pru_abort(), and modify most protocols to no longer conditionally free the socket, since the caller will do this. This commit likely leaves parts of netinet and netinet6 in a situation where they may panic or leak memory, as they are not fully updated by this commit. This will be corrected shortly in followup commits to these components. MFC after: 3 months	2006-04-01 15:15:05 +00:00
ru	dcace5669d	Use sparse initializers for "struct domain" and "struct protosw", so they are easier to follow for the human being.	2005-11-09 13:29:16 +00:00
rwatson	49831ed8da	Push the assignment of a new or updated so_qlimit from solisten() following the protocol pru_listen() call to solisten_proto(), so that it occurs under the socket lock acquisition that also sets SO_ACCEPTCONN. This requires passing the new backlog parameter to the protocol, which also allows the protocol to be aware of changes in queue limit should it wish to do something about the new queue limit. This continues a move towards the socket layer acting as a library for the protocol. Bump __FreeBSD_version due to a change in the in-kernel protocol interface. This change has been tested with IPv4 and UNIX domain sockets, but not other protocols.	2005-10-30 19:44:40 +00:00
emax	1dfaa5f929	Fix multiple typos in the mutex names. This fixes false positive (and pretty strange looking too) LORs I have seen on my system. Pointy hat goes to me. MFC after: 1 day	2005-08-23 00:50:59 +00:00
emax	abb41f91bc	Address minor locking issues. Use taskqueue_swi instead of taskqueue_swi_giant. MFC after: 1 month	2005-07-28 17:43:20 +00:00
emax	3ca52382ee	Remove PR_ATOMIC flag in ng_btsocket_protosw[] for BLUETOOTH_PROTO_RFCOMM protocol. RFCOMM is a SOCK_STREAM protocol not SOCK_SEQPACKET. This was a serious bug caused by cut-and-paste. I'm surprised it did not bite me before. Dunce hat goes to me. MFC after: 3 days	2005-04-06 20:54:05 +00:00
emax	04072c22e0	In ng_btsocket_rfcomm_receive_frame() correctly set length variable when EA bit is set in hdr->length (16-bit length). This currently has no effect on the rest of the code. It just fixes the debug message. MFC After: 3 weeks	2005-04-06 18:55:58 +00:00
sam	91d370b82c	move ptr use down to after null check Noticed by: Coverity Prevent analysis tool Reviewed by: emax	2005-02-26 02:31:34 +00:00
rwatson	26df80bf2c	In the current world order, solisten() implements the state transition of a socket from a regular socket to a listening socket able to accept new connections. As part of this state transition, solisten() calls into the protocol to update protocol-layer state. There were several bugs in this implementation that could result in a race wherein a TCP SYN received in the interval between the protocol state transition and the shortly following socket layer transition would result in a panic in the TCP code, as the socket would be in the TCPS_LISTEN state, but the socket would not have the SO_ACCEPTCONN flag set. This change does the following: - Pushes the socket state transition from the socket layer solisten() to socket "library" routines called from the protocol. This permits the socket routines to be called while holding the protocol mutexes, preventing a race exposing the incomplete socket state transition to TCP after the TCP state transition has completed. The check for a socket layer state transition is performed by solisten_proto_check(), and the actual transition is performed by solisten_proto(). - Holds the socket lock for the duration of the socket state test and set, and over the protocol layer state transition, which is now possible as the socket lock is acquired by the protocol layer, rather than vice versa. This prevents additional state related races in the socket layer. This permits the dual transition of socket layer and protocol layer state to occur while holding locks for both layers, making the two changes atomic with respect to one another. Similar changes are likely required elsewhere in the socket/protocol code. Reported by: Peter Holm <peter@holm.cc> Review and fixes from: emax, Antoine Brodin <antoine.brodin@laposte.net> Philosophical head nod: gnn	2005-02-21 21:58:17 +00:00
imp	a50ffc2912	/* -> /*- for license, minor formatting changes	2005-01-07 01:45:51 +00:00
mlaier	ea0fd1c083	Move ng_socket and ng_btsocket initialization to SI_SUB_PROTO_DOMAIN as they call net_add_domain(). Calling this function too early (or late) breaks assertions about the global domains list. Actually it should be forbidden to call net_add_domain() outside of SI_SUB_PROTO_DOMAIN completely as there are many places where we traverse the domains list unprotected, but for now we allow late calls (mostly to support netgraph). In order to really fix this we have to lock the domains list in all places or find another way to ensure that we can safely walk the list while another thread might be adding a new domain. Spotted by: se Reviewed by: julian, glebius PR: kern/73321 (partly)	2004-11-30 22:28:50 +00:00
phk	027fce30f5	Initialize struct pr_userreqs in new/sparse style and fill in common default elements in net_init_domain(). This makes it possible to grep these structures and see any bogosities.	2004-11-08 14:44:54 +00:00
rwatson	4b81ce6dd2	Push acquisition of the accept mutex out of sofree() into the caller (sorele()/sotryfree()): - This permits the caller to acquire the accept mutex before the socket mutex, avoiding sofree() having to drop the socket mutex and re-order, which could lead to races permitting more than one thread to enter sofree() after a socket is ready to be free'd. - This also covers clearing of the so_pcb weak socket reference from the protocol to the socket, preventing races in clearing and evaluation of the reference such that sofree() might be called more than once on the same socket. This appears to close a race I was able to easily trigger by repeatedly opening and resetting TCP connections to a host, in which the tcp_close() code called as a result of the RST raced with the close() of the accepted socket in the user process resulting in simultaneous attempts to de-allocate the same socket. The new locking increases the overhead for operations that may potentially free the socket, so we will want to revise the synchronization strategy here as we normalize the reference counting model for sockets. The use of the accept mutex in freeing of sockets that are not listen sockets is primarily motivated by the potential need to remove the socket from the incomplete connection queue on its parent (listen) socket, so cleaning up the reference model here may allow us to substantially weaken the synchronization requirements. RELENG_5_3 candidate. MFC after: 3 days Reviewed by: dwhite Discussed with: gnn, dwhite, green Reported by: Marc UBM Bocklet <ubm at u-boot-man dot de> Reported by: Vlad <marchenko at gmail dot com>	2004-10-18 22:19:43 +00:00
emax	3ba687c4f4	Add '#include <sys/mbuf.h>' to fix the kernel build.	2004-06-25 23:03:33 +00:00
rwatson	081dc461f1	Correct merge-o: make sure to unlock symmetrically socket buffer locks on bluetooth sockets when clearing upcall flags. Submitted by: emax	2004-06-18 05:09:42 +00:00
rwatson	855c4bb01f	Merge additional socket buffer locking from rwatson_netperf: - Lock down low hanging fruit use of sb_flags with socket buffer lock. - Lock down low hanging fruit use of so_state with socket lock. - Lock down low hanging fruit use of so_options. - Lock down low-hanging fruit use of sb_lowwat and sb_hiwat with socket buffer lock. - Annotate situations in which we unlock the socket lock and then grab the receive socket buffer lock, which are currently actually the same lock. Depending on how we want to play our cards, we may want to coalesce these lock uses to reduce overhead. - Convert an if()->panic() into a KASSERT relating to so_state in soaccept(). - Remove a number of splnet()/splx() references. More complex merging of socket and socket buffer locking to follow.	2004-06-17 22:48:11 +00:00
rwatson	f2c0db1521	The socket field so_state is used to hold a variety of socket related flags relating to several aspects of socket functionality. This change breaks out several bits relating to send and receive operation into a new per-socket buffer field, sb_state, in order to facilitate locking. This is required because, in order to provide more granular locking of sockets, different state fields have different locking properties. The following fields are moved to sb_state: SS_CANTRCVMORE (so_state) SS_CANTSENDMORE (so_state) SS_RCVATMARK (so_state) Rename respectively to: SBS_CANTRCVMORE (so_rcv.sb_state) SBS_CANTSENDMORE (so_snd.sb_state) SBS_RCVATMARK (so_rcv.sb_state) This facilitates locking by isolating fields to be located with other identically locked fields, and permits greater granularity in socket locking by avoiding storing fields with different locking semantics in the same short (avoiding locking conflicts). In the future, we may wish to coalesce sb_state and sb_flags; for the time being I leave them separate and there is no additional memory overhead due to the packing/alignment of shorts in the socket buffer structure.	2004-06-14 18:16:22 +00:00
rwatson	82295697cd	Extend coverage of SOCK_LOCK(so) to include so_count, the socket reference count: - Assert SOCK_LOCK(so) macros that directly manipulate so_count: soref(), sorele(). - Assert SOCK_LOCK(so) in macros/functions that rely on the state of so_count: sofree(), sotryfree(). - Acquire SOCK_LOCK(so) before calling these functions or macros in various contexts in the stack, both at the socket and protocol layers. - In some cases, perform soisdisconnected() before sotryfree(), as this could result in frobbing of a non-present socket if sotryfree() actually frees the socket. - Note that sofree()/sotryfree() will release the socket lock even if they don't free the socket. Submitted by: sam Sponsored by: FreeBSD Foundation Obtained from: BSD/OS	2004-06-12 20:47:32 +00:00
rwatson	576b26bafd	Integrate accept locking from rwatson_netperf, introducing a new global mutex, accept_mtx, which serializes access to the following fields across all sockets: so_qlen so_incqlen so_qstate so_comp so_incomp so_list so_head While providing only coarse granularity, this approach avoids lock order issues between sockets by avoiding ownership of the fields by a specific socket and its per-socket mutexes. While here, rewrite soclose(), sofree(), soaccept(), and sonewconn() to add assertions, close additional races and address lock order concerns. In particular: - Reorganize the optimistic concurrency behavior in accept1() to always allocate a file descriptor with falloc() so that if we do find a socket, we don't have to encounter the "Oh, there wasn't a socket" race that can occur if falloc() sleeps in the current code, which broke inbound accept() ordering, not to mention requiring backing out socket state changes in a way that raced with the protocol level. We may want to add a lockless read of the queue state if polling of empty queues proves to be important to optimize. - In accept1(), soref() the socket while holding the accept lock so that the socket cannot be free'd in a race with the protocol layer. Likewise in netgraph equivalents of the accept1() code. - In sonewconn(), loop waiting for the queue to be small enough to insert our new socket once we've committed to inserting it, or races can occur that cause the incomplete socket queue to overfill. In the previous implementation, it was sufficient to simply test once since calling soabort() didn't release synchronization permitting another thread to insert a socket as we discard a previous one. - In soclose()/sofree()/et al, it is the responsibility of the caller to remove a socket from the incomplete connection queue before calling soabort(), which prevents soabort() from having to walk into the accept socket to release the socket from its queue, and avoids races when releasing the accept mutex to enter soabort(), permitting soabort() to avoid lock ordering issues with the caller. - Generally cluster accept queue related operations together throughout these functions in order to facilitate locking. Annotate new locking in socketvar.h.	2004-06-02 04:15:39 +00:00
rwatson	bddadcf71a	The SS_COMP and SS_INCOMP flags in the so_state field indicate whether the socket is on an accept queue of a listen socket. This change renames the flags to SQ_COMP and SQ_INCOMP, and moves them to a new state field on the socket, so_qstate, as the locking for these flags is substantially different for the locking on the remainder of the flags in so_state.	2004-06-01 02:42:56 +00:00
julian	c85e63d425	Switch to using C99 sparse initialisers for the type methods array. Should make no binary difference. Submitted by: Gleb Smirnoff <glebius@cell.sick.ru> Reviewed by: Harti Brandt <harti@freebsd.org> MFC after: 1 week	2004-05-29 00:51:19 +00:00
emax	8a65e07a87	Address few style issues pointed out by bde Reviewed by: bde, ru	2004-04-27 16:38:15 +00:00
emax	a2939bc1de	Make sure RFCOMM multiplexor channel does not hang in DISCONNECTING state. Apparently it happens when both devices try to disconnect RFCOMM multiplexor channel at the same time. The scenario is as follows: - local device initiates RFCOMM connection to the remote device. This creates both RFCOMM multiplexor channel and data channel; - remote device terminates RFCOMM data channel (inactivity timeout); - local device acknowledges RFCOMM data channel termination. Because there are no more active data channels and local device has initiated connection it terminates RFCOMM multiplexor channel; - remote device does not acknowledge RFCOMM multiplexor channel termination. Instead it sends its own request to terminate RFCOMM multiplexor channel. Even though local device acknowledges RFCOMM multiplexor channel termination the remote device still keeps L2CAP connection open. Because of hanging RFCOMM multiplexor channel subsequent RFCOMM connections between local and remote devices will fail. Reported by: Johann Hugo <jhugo@icomtek.csir.co.za>	2004-04-23 20:21:17 +00:00
rwatson	b0b5f961bd	Rename dup_sockaddr() to sodupsockaddr() for consistency with other functions in kern_socket.c. Rename the "canwait" field to "mflags" and pass M_WAITOK and M_NOWAIT in from the caller context rather than "1" or "0". Correct mflags pass into mac_init_socket() from previous commit to not include M_ZERO. Submitted by: sam	2004-03-01 03:14:23 +00:00
harti	5e802bbf2a	Replace deprecated NG_NODELEN with the new NG_NODESIZ. There is one problem here still to be solved: the sockaddr_hci has still a 16 byte field for the node name. The code currently does not correctly use the length field in the sockaddr to handle the address length, so node names get truncated to 15 characters when put into a sockaddr_hci.	2004-01-26 15:19:43 +00:00
alfred	fc379f67bb	NULL -> 0 where appropriate.	2003-12-24 18:51:01 +00:00
rwatson	9c969b771a	Introduce a MAC label reference in 'struct inpcb', which caches the MAC label referenced from 'struct socket' in the IPv4 and IPv6-based protocols. This permits MAC labels to be checked during network delivery operations without dereferencing inp->inp_socket to get to so->so_label, which will eventually avoid our having to grab the socket lock during delivery at the network layer. This change introduces 'struct inpcb' as a labeled object to the MAC Framework, along with the normal circus of entry points: initialization, creation from socket, destruction, as well as a delivery access control check. For most policies, the inpcb label will simply be a cache of the socket label, so a new protocol switch method is introduced, pr_sosetlabel() to notify protocols that the socket layer label has been updated so that the cache can be updated while holding appropriate locks. Most protocols implement this using pru_sosetlabel_null(), but IPv4/IPv6 protocols using inpcbs use the worker function in_pcbsosetlabel(), which calls into the MAC Framework to perform a cache update. Biba, LOMAC, and MLS implement these entry points, as do the stub policy, and test policy. Reviewed by: sam, bms Obtained from: TrustedBSD Project Sponsored by: DARPA, Network Associates Laboratories	2003-11-18 00:39:07 +00:00

59 Commits