freebsd-dev

Author	SHA1	Message	Date
Gleb Smirnoff	7737de9515	Check return value from soaccept(). Coverity: 1376209	2017-06-14 16:13:20 +00:00
Gleb Smirnoff	779f106aa1	Listening sockets improvements. o Separate fields of struct socket that belong to listening sockets from fields that belong to normal dataflow, and unionize them. This shrinks the structure a bit. - Move the selinfos out of the socket buffers into the socket. The first reason is to support the braindamaged scenario where a socket is added to kevent(2) and then listen(2) is called on it. The second reason is that there is a future plan to make socket buffers pluggable, so that the socket buffer of a dataflow socket can be changed, and in this case we also want to keep the same selinfos for the lifetime of the socket. - Remove struct so_accf. Since listening state no longer affects the size of struct socket, just move its fields into the listening part of the union. - Provide a sol_upcall field and enforce that so_upcall_set() may be called only on a dataflow socket, which has buffers; for listening sockets, provide solisten_upcall_set(). o Remove the ACCEPT_LOCK() global. - Add a mutex to the socket, used instead of the socket buffer lock to protect fields of struct socket that don't belong to a socket buffer. - Allow acquiring two socket locks, but the first one must belong to a listening socket. - Make soref()/sorele() use atomic(9). In some situations this allows soref() without owning the socket lock. There is room for improvement here: sorele() could also be made to lock optionally. - Most protocols aren't touched by this change, except UNIX local sockets. See below for more information. o Reduce copy-and-paste in kernel modules that accept connections from listening sockets: provide the function solisten_dequeue(), and use it in the following modules: ctl(4), iscsi(4), ng_btsocket(4), ng_ksocket(4), infiniband, rpc. o UNIX local sockets. - Removal of the ACCEPT_LOCK() global uncovered several races in UNIX local sockets. Most races exist around spawning a new socket when we are connecting to a local listening socket. To cover them, we need to hold locks on both PCBs when spawning a third one. This means holding them across sonewconn(). This creates a LOR between the pcb locks and unp_list_lock. - To fix the new LOR, abandon the global unp_list_lock in favor of the global unp_link_lock. Indeed, separating these two locks didn't provide any extra parallelism in UNIX sockets. - Now a call into uipc_attach() may happen with unp_link_lock held if we are accepting, or without unp_link_lock if we are just creating a socket. - Another problem in UNIX sockets is that uipc_close() basically did nothing for a listening socket. The vnode remained open for connections. This is fixed by removing the vnode in uipc_close(). Maybe the right way would be to do it for all sockets (not only listening), simply moving the vnode teardown from uipc_detach() to uipc_close()? Sponsored by: Netflix Differential Revision: https://reviews.freebsd.org/D9770	2017-06-08 21:30:34 +00:00
Pedro F. Giffuni	053359b7f4	sys/netgraph: spelling fixes in comments. No functional change.	2016-04-29 21:25:05 +00:00
Gleb Smirnoff	45c203fce2	Remove AppleTalk support. AppleTalk was a network transport protocol for Apple Macintosh devices in the 80s and 90s. Starting with Mac OS X in 2000, AppleTalk was a legacy protocol and the primary networking protocol was TCP/IP. The last Mac OS X release to support AppleTalk came out in 2009. The same year, routing equipment vendors (namely Cisco) ended their support. Thus, AppleTalk won't be supported in FreeBSD 11.0-RELEASE.	2014-03-14 06:29:43 +00:00
Gleb Smirnoff	2c284d9395	Remove IPX support. IPX was a network transport protocol in Novell's NetWare network operating system from the late 80s and 90s. NetWare itself switched to TCP/IP as its default transport in 1998. Later, in this century, Novell Open Enterprise Server became the successor of Novell NetWare. The last release that claimed to still support IPX was OES 2 in 2007. Routing equipment vendors (e.g. Cisco) discontinued support for IPX in 2011. Thus, IPX won't be supported in FreeBSD 11.0-RELEASE.	2014-03-14 02:58:48 +00:00
Gleb Smirnoff	9165bf6297	In r248885 I reduced the size of the fake uio resid that ng_ksocket(4) passes to soreceive(). This exposed a bug. When reading from a raw socket, once our fake limit is depleted, we receive a truncated mbuf chain, with m->m_pkthdr.len > m_length(m). The first problem is that MSG_TRUNC was not handled. The second one is that we didn't reinit uio_resid (nor flags) in our endless loop, and if the socket buffer contained several records, we quickly depleted our fake limit. The third bug, actually introduced in r248885, is that MJUMPAGESIZE isn't enough to handle the maximum packet that ng_ksocket(4) can theoretically receive. Changes: - Reinit uio_resid and flags before every call to soreceive(). - Set the maximum acceptable packet size to IP_MAXPACKET. For now the module doesn't support INET6. - Properly handle MSG_TRUNC returned from soreceive(). PR: 184601 Submitted & tested by: Viktor Velichkin <avisom yandex.ru> Sponsored by: Nginx, Inc.	2013-12-21 14:41:32 +00:00
Gleb Smirnoff	9a4d9e198a	Revamp mbuf handling in ng_ksocket_incoming2(): - Remove code that worked around a bug in FreeBSD 3, and even predated the import of netgraph(4). - Remove the workaround for m_nextpkt pointing into the next record in the buffer (fixed in r248884). Assert that m_nextpkt is clear. - Do not rely on SOCK_STREAM sockets containing M_PKTHDR mbufs. Create a header ourselves and attach the chain to it. This is the correct fix for kern/154676. PR: kern/154676 Sponsored by: Nginx, Inc	2013-03-29 14:04:26 +00:00
Gleb Smirnoff	6b1781e3ea	Whitespace.	2013-03-29 13:53:14 +00:00
Gleb Smirnoff	d09c774bb5	Non-functional cleanup of ng_ksocket_incoming2().	2013-03-29 13:51:01 +00:00
Andre Oppermann	c9b652e3e8	Mechanically remove the last stray remains of spl* calls from net/. They have been no-ops for a long time now.	2012-10-18 13:57:24 +00:00
Gleb Smirnoff	38f1b2d1bc	Revert r220768 for ng_ksocket. This node is special and when it is cloning, its constructor method may be called in a context that isn't allowed to sleep. Noticed by: Vadim Goncharov	2012-05-24 18:22:57 +00:00
Ed Schouten	dc15eac046	Use strchr() and strrchr(). It seems strchr() and strrchr() are used more often than index() and rindex(). Therefore, simply migrate all kernel code to use them. For the XFS code, remove an empty line to make the code identical to the code in the Linux kernel.	2012-01-02 12:12:10 +00:00
Ed Schouten	d745c852be	Mark MALLOC_DEFINEs static that have no corresponding MALLOC_DECLAREs. This means that their use is restricted to a single C file.	2011-11-07 06:44:47 +00:00
Gleb Smirnoff	674d86bf91	Node constructor methods are supposed to be called in syscall context always. Convert nodes to consistently use M_WAITOK flag for memory allocation. Reviewed by: julian	2011-04-18 09:12:27 +00:00
Fabien Thomas	f9e4dd7122	Fix an invalid parameter detected by INVARIANT and confirmed by r193272.	2010-05-06 20:58:23 +00:00
Alexander Motin	5c100aeaad	Make ng_ksocket fulfill lower protocol stack layers alignment requirements on platforms with strict alignment constraints. This fixes kernel panics on arm and probably other architectures. PR: sparc64/80410	2010-03-31 22:16:05 +00:00
Stanislav Sedov	fe1d3f15f6	- Turn the third (islocked) argument of the knote call into flags parameter. Introduce the new flag KNF_NOKQLOCK to allow event callers to be called without KQ_LOCK mtx held. - Modify VFS knote calls to always use KNF_NOKQLOCK flag. This is required for ZFS as its getattr implementation may sleep. Approved by: re (rwatson) Reviewed by: kib MFC after: 2 weeks	2009-06-28 21:49:43 +00:00
John Baldwin	74fb0ba732	Rework socket upcalls to close some races with setup/teardown of upcalls. - Each socket upcall is now invoked with the appropriate socket buffer locked. It is not permissible to call soisconnected() with this lock held, so socket upcalls now return an integer value. The two possible values are SU_OK and SU_ISCONNECTED. If an upcall returns SU_ISCONNECTED, then soisconnected() will be invoked on the socket after the socket buffer lock is dropped. - A new API is provided for setting and clearing socket upcalls. The API consists of soupcall_set() and soupcall_clear(). - To simplify locking, each socket buffer now has a separate upcall. - When a socket upcall returns SU_ISCONNECTED, the upcall is cleared from the receive socket buffer automatically. Note that a SO_SND upcall should never return SU_ISCONNECTED. - All this means that accept filters should now return SU_ISCONNECTED instead of calling soisconnected() directly. They also no longer need to explicitly clear the upcall on the new socket. - The HTTP accept filter still uses soupcall_set() to manage its internal state machine, but other accept filters no longer have any explicit knowledge of socket upcall internals aside from their return value. - The various RPC client upcalls currently drop the socket buffer lock while invoking soreceive() as a temporary band-aid. The plan for the future is to add a new flag to allow soreceive() to be called with the socket buffer locked. - The AIO callback for socket I/O is now also invoked with the socket buffer locked. Previously sowakeup() would drop the socket buffer lock only to call aio_swake(), which immediately re-acquired the socket buffer lock for the duration of the function call. Discussed with: rwatson, rmacklem	2009-06-01 21:17:03 +00:00
Dag-Erling Smørgrav	1ede983cc9	Retire the MALLOC and FREE macros. They are an abomination unto style(9). MFC after: 3 months	2008-10-23 15:53:51 +00:00
Alexander Motin	6e7ed93017	Send only one incoming notification at a time to reduce queue thrashing and improve performance. Remove the waitflag argument from ng_ksocket_incoming2(); it means nothing, as the function call was queued by netgraph. Remove the node validity check, as node validity is guaranteed by netgraph. Update comments.	2008-03-07 21:12:56 +00:00
Bruce M Simpson	4ae54e2fad	In the output path, mask off M_BCAST|M_MCAST so as to prevent incorrect addressing if a packet is later re-encapsulated and sent to a non-broadcast, non-multicast destination after being received on the ng_ksocket input hook. PR: 106999 Submitted by: Kevin Lahey MFC after: 4 weeks	2007-02-09 12:35:29 +00:00
Robert Watson	b0668f7151	soreceive_generic(), and sopoll_generic(). Add new functions sosend(), soreceive(), and sopoll(), which are wrappers for pru_sosend, pru_soreceive, and pru_sopoll, and are now used universally by socket consumers rather than either directly invoking the old so*() functions or directly invoking the protocol switch method (about an even split prior to this commit). This completes an architectural change that was begun in 1996 to permit protocols to provide substitute implementations, as now used by UDP. Consumers now uniformly invoke sosend(), soreceive(), and sopoll() to perform these operations on sockets -- in particular, distributed file systems and socket system calls. Architectural head nod: sam, gnn, wollman	2006-07-24 15:20:08 +00:00
Ruslan Ermilov	aa00bc830f	Clear csum_flags after reading data from socket buffer. Otherwise, if ksocket is connected to an interface-type node somewhere later in the graph (e.g., ng_eiface or ng_iface), the csum_data may be applied to a wrong packet (if we encapsulate Ethernet or IP). MFC after: 3 days	2006-02-21 13:04:39 +00:00
Gleb Smirnoff	e71fefbe21	When we read data from the socket buffer using soreceive(), the socket layer does not clear m_nextpkt for us. The mbufs are sent into netgraph and then, if they contain a TCP packet delivered locally, they will enter socket code again. They can pass the first assert in sbappendstream() because m_nextpkt may be set not in the first mbuf, but deeper in the chain. So the problem will trigger much later, when a local program reads the data from the socket and an mbuf with m_nextpkt becomes the first one. This bug was unmasked by revision 1.54, when I made the upcall queueable. Before revision 1.54 there was a very small probability of having 2 mbufs in the GRE socket buffer, because ng_ksocket_incoming2() dequeued the first one immediately. - in ng_ksocket_incoming2() clear m_nextpkt on all mbufs read from the socket. - restore rev. 1.54 change in ng_ksocket_incoming(). PR: kern/84952 PR: kern/82413 In collaboration with: rwatson	2005-09-06 17:15:42 +00:00
Gleb Smirnoff	d7f56eabab	Back out revision 1.54, because it exposes a worse problem than it fixes. I believe the problem lives somewhere outside ng_ksocket, but until it is found, keep the node working. PR: kern/84952 PR: kern/82413 MFC after: 3 days	2005-08-25 07:21:15 +00:00
Gleb Smirnoff	f6c9d18d2f	Catch up with new ng_send_fn1() interface.	2005-05-16 17:07:39 +00:00
Gleb Smirnoff	0f4a3524dd	When used as divert socket we need to decouple stack when node is entered from socket side. Use ng_queue_fn() instead of ng_send_fn().	2005-05-13 11:40:08 +00:00
Gleb Smirnoff	bc90ff47ff	Fix panics with misconfigured routing: - Backout previous revision, the check is useless. - Turn node to queue mode, since it is edge node. Reported by: sem	2005-04-18 11:32:17 +00:00
Gleb Smirnoff	f1c6a420b1	Reimplement recursion protection, checking whether current thread holds sockbuf mutex. Reviewed by: rwatson	2005-02-19 14:41:49 +00:00
Gleb Smirnoff	848a25c773	Remove a recursion protection, which we inherited from splnet() netgraph times. Now several threads may write data to ng_ksocket. Locking of socket is done in sosend(). Reviewed by: archie, julian, rwatson MFC after: 2 weeks	2005-02-16 16:00:35 +00:00
Gleb Smirnoff	d96bd8d144	Allocate enough space for new tag. Pointy hat to: glebius	2005-02-12 16:26:36 +00:00
Gleb Smirnoff	b07785ef50	When netgraph(4) was converted to use mbuf_tags(9) instead of meta-data, a particular setup was broken: two ng_ksockets are connected to each other, connect()ed to different remote hosts, and bind()ed to different local interfaces. In this case one ng_ksocket is fooled by the tag from the other one. Put the node id into the tag. In the rcvdata method, use the tag only if it has our own id inside or the id equals zero. The latter case is added to support packets sent by some third, non-ng_ksocket node. MFC after: 1 week	2005-02-12 14:54:19 +00:00
Warner Losh	c398230b64	/* -> /*- for license, minor formatting changes	2005-01-07 01:45:51 +00:00
Robert Watson	42ec1da481	In FreeBSD 5.x, curthread is always defined, so we don't need to test and optionally use &thread0 if it's NULL. Spotted by: julian	2004-09-02 19:53:13 +00:00
Julian Elischer	327b288e5c	Convert Netgraph to use mbuf tags to pass its meta information around. Thanks to Sam for importing tags in a way that allowed this to be done. Submitted by: Gleb Smirnoff <glebius@cell.sick.ru> Also allow the sr and ar drivers to create netgraph versions of their modules. Document the change to the ksocket node.	2004-06-25 19:22:05 +00:00
Robert Watson	9535efc00d	Merge additional socket buffer locking from rwatson_netperf: - Lock down low-hanging fruit use of sb_flags with socket buffer lock. - Lock down low-hanging fruit use of so_state with socket lock. - Lock down low-hanging fruit use of so_options. - Lock down low-hanging fruit use of sb_lowwat and sb_hiwat with socket buffer lock. - Annotate situations in which we unlock the socket lock and then grab the receive socket buffer lock, which are currently actually the same lock. Depending on how we want to play our cards, we may want to coalesce these lock uses to reduce overhead. - Convert an if()->panic() into a KASSERT relating to so_state in soaccept(). - Remove a number of splnet()/splx() references. More complex merging of socket and socket buffer locking to follow.	2004-06-17 22:48:11 +00:00
Robert Watson	c0b99ffa02	The socket field so_state is used to hold a variety of socket related flags relating to several aspects of socket functionality. This change breaks out several bits relating to send and receive operation into a new per-socket buffer field, sb_state, in order to facilitate locking. This is required because, in order to provide more granular locking of sockets, different state fields have different locking properties. The following fields are moved to sb_state: SS_CANTRCVMORE (so_state) SS_CANTSENDMORE (so_state) SS_RCVATMARK (so_state) Rename respectively to: SBS_CANTRCVMORE (so_rcv.sb_state) SBS_CANTSENDMORE (so_snd.sb_state) SBS_RCVATMARK (so_rcv.sb_state) This facilitates locking by isolating fields to be located with other identically locked fields, and permits greater granularity in socket locking by avoiding storing fields with different locking semantics in the same short (avoiding locking conflicts). In the future, we may wish to coalesce sb_state and sb_flags; for the time being I leave them separate and there is no additional memory overhead due to the packing/alignment of shorts in the socket buffer structure.	2004-06-14 18:16:22 +00:00
Robert Watson	395a08c904	Extend coverage of SOCK_LOCK(so) to include so_count, the socket reference count: - Assert SOCK_LOCK(so) macros that directly manipulate so_count: soref(), sorele(). - Assert SOCK_LOCK(so) in macros/functions that rely on the state of so_count: sofree(), sotryfree(). - Acquire SOCK_LOCK(so) before calling these functions or macros in various contexts in the stack, both at the socket and protocol layers. - In some cases, perform soisdisconnected() before sotryfree(), as this could result in frobbing of a non-present socket if sotryfree() actually frees the socket. - Note that sofree()/sotryfree() will release the socket lock even if they don't free the socket. Submitted by: sam Sponsored by: FreeBSD Foundation Obtained from: BSD/OS	2004-06-12 20:47:32 +00:00
Robert Watson	2658b3bb8e	Integrate accept locking from rwatson_netperf, introducing a new global mutex, accept_mtx, which serializes access to the following fields across all sockets: so_qlen so_incqlen so_qstate so_comp so_incomp so_list so_head. While providing only coarse granularity, this approach avoids lock order issues between sockets by avoiding ownership of the fields by a specific socket and its per-socket mutexes. While here, rewrite soclose(), sofree(), soaccept(), and sonewconn() to add assertions, close additional races, and address lock order concerns. In particular: - Reorganize the optimistic concurrency behavior in accept1() to always allocate a file descriptor with falloc() so that if we do find a socket, we don't have to encounter the "Oh, there wasn't a socket" race that can occur if falloc() sleeps in the current code, which broke inbound accept() ordering, not to mention requiring backing out socket state changes in a way that raced with the protocol level. We may want to add a lockless read of the queue state if polling of empty queues proves to be important to optimize. - In accept1(), soref() the socket while holding the accept lock so that the socket cannot be freed in a race with the protocol layer. Likewise in netgraph equivalents of the accept1() code. - In sonewconn(), loop waiting for the queue to be small enough to insert our new socket once we've committed to inserting it, or races can occur that cause the incomplete socket queue to overfill. In the previous implementation, it was sufficient to simply test once since calling soabort() didn't release synchronization permitting another thread to insert a socket as we discard a previous one. - In soclose()/sofree()/et al, it is the responsibility of the caller to remove a socket from the incomplete connection queue before calling soabort(), which prevents soabort() from having to walk into the accept socket to release the socket from its queue, and avoids races when releasing the accept mutex to enter soabort(), permitting soabort() to avoid lock ordering issues with the caller. - Generally cluster accept queue related operations together throughout these functions in order to facilitate locking. Annotate new locking in socketvar.h.	2004-06-02 04:15:39 +00:00
Robert Watson	36568179e3	The SS_COMP and SS_INCOMP flags in the so_state field indicate whether the socket is on an accept queue of a listen socket. This change renames the flags to SQ_COMP and SQ_INCOMP, and moves them to a new state field on the socket, so_qstate, as the locking for these flags is substantially different from the locking on the remainder of the flags in so_state.	2004-06-01 02:42:56 +00:00
Julian Elischer	f8aae7776f	Switch to using C99 sparse initialisers for the type methods array. Should make no binary difference. Submitted by: Gleb Smirnoff <glebius@cell.sick.ru> Reviewed by: Harti Brandt <harti@freebsd.org> MFC after: 1 week	2004-05-29 00:51:19 +00:00
Hartmut Brandt	87e2c66a6a	Get rid of the deprecated LEN constants in favour of the new SIZ constants that include the trailing \0 byte.	2004-01-26 14:05:31 +00:00
Ruslan Ermilov	7304a833fb	Replaced two bzero() calls with the M_ZERO flag to malloc(). Reviewed by: julian	2003-12-17 11:48:18 +00:00
Jeffrey Hsu	33583c6f18	Add Protocol Independent Multicast protocol. Submitted by: Pavlin Radoslavov <pavlin@icir.org>	2003-08-20 22:11:58 +00:00
Archie Cobbs	7d78074030	Add missing braces. Submitted by: Andrew Lankford <arlankfo@141.com>	2003-04-28 20:38:05 +00:00
Benno Rice	fcfa0b48b3	Reference the socket we're accepting.	2002-09-14 08:56:10 +00:00
Benno Rice	a7d83226f0	Remember who asked for a connect or accept operation so we can actually tell them when it's done. Reviewed by: archie	2002-09-11 00:52:50 +00:00
Archie Cobbs	facfd88935	Don't use "NULL" when "0" is really meant.	2002-08-22 00:30:03 +00:00
Archie Cobbs	f0184ff8e3	Fix GCC warnings caused by initializing a zero length array. In the process, simplify things a bit by getting rid of 'struct ng_parse_struct_info' which was useless because it only contained one field. MFC after: 2 weeks	2002-05-31 23:48:03 +00:00
Seigo Tanimura	4cc20ab1f0	Back out my last commit locking down a socket; it conflicts with hsu's work. Requested by: hsu	2002-05-31 11:52:35 +00:00
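
The receive-loop fix in 9165bf6297 (reinit the residual length and flags before every soreceive(), and handle MSG_TRUNC) has a close user-space analogue. The sketch below is not the ng_ksocket(4) kernel code. It assumes recvmsg(2) on a UNIX datagram socketpair in place of soreceive(), and the names drain_datagrams() and run_demo() are hypothetical, chosen only for illustration.

```c
#include <assert.h>
#include <errno.h>
#include <string.h>
#include <sys/types.h>
#include <sys/socket.h>
#include <sys/uio.h>
#include <unistd.h>

/*
 * Hypothetical helper: read every queued datagram from fd into buf.
 * Returns the number of datagrams read (or -1 on a real error) and
 * stores in *ntrunc how many of them did not fit in buf.
 */
static int
drain_datagrams(int fd, char *buf, size_t bufsize, int *ntrunc)
{
	int nread = 0;

	assert(bufsize > 0);
	*ntrunc = 0;
	for (;;) {
		struct iovec iov;
		struct msghdr msg;
		ssize_t n;

		/*
		 * Rebuild the request on every pass so that each call
		 * starts with the full buffer length and cleared flags,
		 * mirroring "reinit uio_resid and flags" in the commit.
		 */
		iov.iov_base = buf;
		iov.iov_len = bufsize;
		memset(&msg, 0, sizeof(msg));
		msg.msg_iov = &iov;
		msg.msg_iovlen = 1;

		n = recvmsg(fd, &msg, MSG_DONTWAIT);
		if (n < 0) {
			if (errno == EAGAIN || errno == EWOULDBLOCK)
				return (nread);	/* queue is empty */
			return (-1);
		}
		nread++;
		/*
		 * MSG_TRUNC means the datagram was longer than buf and its
		 * tail was discarded; count it instead of treating the
		 * partial record as complete.
		 */
		if (msg.msg_flags & MSG_TRUNC)
			(*ntrunc)++;
	}
}

/*
 * Hypothetical demo: queue a 5-byte and a 100-byte datagram, then
 * drain them through a 16-byte buffer.
 */
static int
run_demo(int *nread, int *ntrunc)
{
	int sv[2];
	char big[100], buf[16];

	if (socketpair(AF_UNIX, SOCK_DGRAM, 0, sv) != 0)
		return (-1);
	memset(big, 'x', sizeof(big));
	if (send(sv[0], "hello", 5, 0) != 5 ||
	    send(sv[0], big, sizeof(big), 0) != (ssize_t)sizeof(big)) {
		close(sv[0]);
		close(sv[1]);
		return (-1);
	}
	*nread = drain_datagrams(sv[1], buf, sizeof(buf), ntrunc);
	close(sv[0]);
	close(sv[1]);
	return (*nread < 0 ? -1 : 0);
}
```

With these inputs run_demo() should report two datagrams read and one truncated, because only the 100-byte datagram exceeds the 16-byte buffer. Rebuilding the msghdr inside the loop, rather than once before it, is the user-space counterpart of the commit's point: per-call state must not leak from one record into the next.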


79 Commits
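
The tag check described in b07785ef50 (use the tag only if it carries our own node id, or an id of zero from a non-ng_ksocket sender) reduces to a small predicate. A minimal sketch, assuming a simplified tag layout: struct addr_tag and tag_usable() are hypothetical names, not the ng_ksocket(4) source, and only the id comparison from the commit message is modelled.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Hypothetical, simplified stand-in for the address tag a node attaches
 * to a packet. Only the field needed for the check is modelled; a real
 * tag would also carry the socket address itself.
 */
struct addr_tag {
	uint32_t id;	/* ID of the ng_ksocket node that created it, or 0 */
};

/*
 * Return true if the node with ID my_id may use the address in tag:
 * either it created the tag itself, or the id is 0, meaning the tag came
 * from some other (non-ng_ksocket) node. A tag carrying another
 * ng_ksocket's ID is ignored, so two back-to-back ksockets no longer
 * pick up each other's addresses.
 */
static bool
tag_usable(const struct addr_tag *tag, uint32_t my_id)
{
	/* Assumption: a real node ID is never 0, since 0 means "no owner". */
	assert(my_id != 0);
	if (tag == NULL)
		return (false);
	return (tag->id == my_id || tag->id == 0);
}
```

Treating 0 as a wildcard keeps packets tagged by third-party nodes working, while a non-zero foreign id is rejected rather than trusted.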