freebsd-dev

Author	SHA1	Message	Date
Eugene Grosbein	4824d78872	listen(2): improve administrator control over logging As documented in listen.2 manual page, the kernel emits a LOG_DEBUG syslog message if a socket listen queue overflows. For some appliances, it may be desirable to change the priority to some higher value like LOG_INFO while keeping other debugging suppressed. OTOH there are cases when such overflows are normal and expected. Then it may be desirable to suppress overflow logging altogether, so that dmesg buffer is not flooded over the long run. In addition to existing sysctl kern.ipc.sooverinterval, introduce new sysctl kern.ipc.sooverprio that defaults to 7 (LOG_DEBUG) to preserve current behavior. It may be changed to any value in a range of 0..7 for corresponding priority or to -1 to suppress logging. Document it in the listen.2 manual page. MFC after: 1 month	2023-05-01 03:26:44 +07:00
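A minimal user-space sketch of the value range described for kern.ipc.sooverprio, assuming only what the commit states: 0..7 selects the syslog(3) priority (LOG_EMERG through LOG_DEBUG from <syslog.h>) and -1 suppresses the message. The helpers `sooverprio_valid()` and `overflow_should_log()` are hypothetical; this is not the kernel's sysctl handler. On a FreeBSD host the knob itself would be changed with `sysctl kern.ipc.sooverprio=6`.

```c
#include <stdbool.h>
#include <syslog.h>   /* LOG_EMERG (0) .. LOG_DEBUG (7) */

/* Hypothetical model of the accepted values for kern.ipc.sooverprio:
 * -1 disables listen queue overflow messages, 0..7 pick a priority. */
#define SOOVERPRIO_OFF (-1)

static bool sooverprio_valid(int prio)
{
    return prio == SOOVERPRIO_OFF || (prio >= LOG_EMERG && prio <= LOG_DEBUG);
}

/* Would an overflow message be emitted at all with this setting? */
static bool overflow_should_log(int prio)
{
    return sooverprio_valid(prio) && prio != SOOVERPRIO_OFF;
}
```

The default of 7 keeps the historical LOG_DEBUG behaviour; the separate kern.ipc.sooverinterval knob still rate-limits whatever is logged.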
Mark Johnston	b4b33821fa	ktls: Fix interlocking between ktls_enable_rx() and listen(2) The TCP_TXTLS_ENABLE and TCP_RXTLS_ENABLE socket option handlers check whether the socket is a listening socket and fail if so, but this check is racy. Since we have to lock the socket buffer later anyway, defer the check to that point. ktls_enable_tx() locks the send buffer's I/O lock, which will fail if the socket is a listening socket, so no explicit checks are needed. In ktls_enable_rx(), which does not acquire the I/O lock (see the review for some discussion on this), use an explicit SOLISTENING() check after locking the recv socket buffer. Otherwise, a concurrent solisten_proto() call can trigger crashes and memory leaks by wiping out socket buffers as ktls_enable_*() is modifying them. Also make sure that a KTLS-enabled socket can't be converted to a listening socket, and use SOCK_(SEND|RECV)BUF_LOCK macros instead of the old ones while here. Add some simple regression tests involving listen(2). Reported by: syzkaller MFC after: 2 weeks Reviewed by: gallatin, glebius, jhb Sponsored by: The FreeBSD Foundation Differential Revision: https://reviews.freebsd.org/D38504	2023-03-21 16:04:00 -04:00
Mark Johnston	636b19ead4	tcp: Disallow re-connection of a connected socket soconnectat() tries to ensure that one cannot connect a connected socket. However, the check is racy and does not really prevent two threads from attempting to connect the same TCP socket. Modify tcp_connect() and tcp6_connect() to perform the check again, this time synchronized by the inpcb lock, under which we call soisconnecting(). Reported by: syzkaller Reviewed by: glebius MFC after: 2 weeks Sponsored by: Klara, Inc. Sponsored by: Modirum MDPay Differential Revision: https://reviews.freebsd.org/D38507	2023-02-14 10:07:19 -05:00
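The guarantee being restored is visible from userland: a second connect(2) on an already connected TCP socket should fail with EISCONN, as POSIX specifies. A minimal loopback sketch of the sequential case; the helper name `second_connect_errno()` is hypothetical, and the sketch does not try to reproduce the two-thread race the commit closes.

```c
#include <arpa/inet.h>
#include <errno.h>
#include <netinet/in.h>
#include <string.h>
#include <sys/socket.h>
#include <unistd.h>

/* Connect twice to a loopback listener. Returns the errno of the second
 * connect(2), 0 if it unexpectedly succeeded, or -1 on setup failure. */
static int second_connect_errno(void)
{
    struct sockaddr_in sin;
    socklen_t len = sizeof(sin);
    int lfd, cfd, err = -1;

    lfd = socket(AF_INET, SOCK_STREAM, 0);
    if (lfd < 0)
        return -1;

    /* Listener on 127.0.0.1 with a kernel-chosen port. */
    memset(&sin, 0, sizeof(sin));
    sin.sin_family = AF_INET;
    sin.sin_addr.s_addr = htonl(INADDR_LOOPBACK);
    sin.sin_port = 0;
    if (bind(lfd, (struct sockaddr *)&sin, sizeof(sin)) != 0 ||
        listen(lfd, 1) != 0 ||
        getsockname(lfd, (struct sockaddr *)&sin, &len) != 0) {
        close(lfd);
        return -1;
    }

    cfd = socket(AF_INET, SOCK_STREAM, 0);
    if (cfd >= 0) {
        /* The first connect completes via the listen queue; no accept needed. */
        if (connect(cfd, (struct sockaddr *)&sin, sizeof(sin)) == 0)
            err = connect(cfd, (struct sockaddr *)&sin, sizeof(sin)) == 0 ? 0 : errno;
        close(cfd);
    }
    close(lfd);
    return err;
}
```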
Gleb Smirnoff	a0102dee34	sockets: in sousrsend() pass down the error to aio(4) This somewhat undermines the initial goal of sousrsend() to have all the special error handling for a write on a socket in a single place. The aio(4) needs to see EWOULDBLOCK to re-schedule the job. Because aio(4) handles return from soreceive() and sousrsend() with the same code, we can't check for (error == 0 && done < job_nbytes). Keeping this exclusion for aio(4) seems a lesser evil. Fixes: `7a2c93b86e`	2023-02-01 13:03:10 -08:00
Gleb Smirnoff	7a2c93b86e	sockets: provide sousrsend() that does socket specific error handling Sockets have special handling for EPIPE on a write, that was spread out into several places. Treating transient errors is also special - if protocol is atomic, then we should ignore any changes to uio_resid, a transient error means the write had completely failed (see `d2b3a0ed31`). - Provide sousrsend() that expects a valid uio, and leave sosend() for kernel consumers only. Do all special error handling right here. - In dofilewrite() don't do special handling of error for DTYPE_SOCKET. - For send(2), write(2) and aio_write(2) call into sousrsend() and remove error handling for kern_sendit(), soo_write() and soaio_process_job(). PR: 265087 Reported by: rz-rpi03 at h-ka.de Reviewed by: markj Differential revision: https://reviews.freebsd.org/D35863	2022-12-14 10:02:44 -08:00
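From userland, the EPIPE case that sousrsend() now handles in one place shows up when writing to a stream socket whose peer has gone away. A minimal sketch (the helper name `send_after_peer_close()` is ours); it passes MSG_NOSIGNAL so the kernel reports EPIPE without also delivering SIGPIPE to the process.

```c
#include <errno.h>
#include <sys/socket.h>
#include <unistd.h>

/* Write to a stream socket after its peer closed; return the errno seen
 * (0 if the send unexpectedly succeeded, -1 on setup failure). */
static int send_after_peer_close(void)
{
    int sv[2], err = 0;

    if (socketpair(AF_UNIX, SOCK_STREAM, 0, sv) < 0)
        return -1;
    close(sv[1]);                              /* the peer goes away */
    if (send(sv[0], "x", 1, MSG_NOSIGNAL) < 0) /* no SIGPIPE, just EPIPE */
        err = errno;
    close(sv[0]);
    return err;
}
```

Without MSG_NOSIGNAL (or FreeBSD's SO_NOSIGPIPE socket option) the same write would also raise SIGPIPE, which terminates the process by default.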
Mateusz Guzik	ebdf27b6f3	uipc: remove accept_mtx It is unused since `779f106aa1` ("Listening sockets improvements.") Sponsored by: Rubicon Communications, LLC ("Netgate")	2022-12-11 02:47:07 +00:00
Michael Tuexen	bc0d407676	Revert "listen(): improve POSIX compliance" This reverts commit `76e6e4d72f`. Several programs in the tree use -1 instead of INT_MAX to use the maximum value. Thanks to Eugene Grosbein for pointing this out.	2022-10-12 04:33:00 +02:00
Michael Tuexen	76e6e4d72f	listen(): improve POSIX compliance Ensure that a negative backlog argument is handled as if it were 0. Reviewed by: markj@, glebius@ Sponsored by: Netflix, Inc. Differential Revision: https://reviews.freebsd.org/D31821	2022-10-11 22:46:51 +02:00
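The revert above keeps the convention that a negative backlog means "use the maximum". A minimal sketch of that normalization, assuming it behaves roughly like a clamp to kern.ipc.somaxconn; `effective_backlog()` and `listen_with_max_backlog()` are hypothetical helpers, not kernel functions.

```c
#include <sys/socket.h>
#include <unistd.h>

/* Hypothetical model: negative or oversized backlogs fall back to the
 * system maximum (kern.ipc.somaxconn on FreeBSD). */
static int effective_backlog(int backlog, int somaxconn)
{
    if (backlog < 0 || backlog > somaxconn)
        return somaxconn;
    return backlog;
}

/* Many programs pass -1 to request the maximum; this must keep working.
 * The unbound socket is given an ephemeral port by listen(2). */
static int listen_with_max_backlog(void)
{
    int fd = socket(AF_INET, SOCK_STREAM, 0), rc;

    if (fd < 0)
        return -1;
    rc = listen(fd, -1);
    close(fd);
    return rc;
}
```

Under the reverted change, `listen(fd, -1)` would instead have produced a queue limit of 0, silently shrinking the queue for those programs.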
Alexander V. Chernikov	f66968564d	protocols: make socket buffers ioctl handler changeable Allow to set custom per-protocol handlers for the socket buffers ioctls by introducing pr_setsbopt callback with the default value set to the currently-used sbsetopt(). Reviewed by: glebius Differential Revision: https://reviews.freebsd.org/D36746	2022-09-28 10:20:09 +00:00
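The change is a classic "overridable callback with a default" in a dispatch table. A minimal sketch of that pattern; all structure and function names here are hypothetical stand-ins, not the kernel's protosw or sbsetopt() signatures. The commit fills the field with sbsetopt() by default; this sketch applies the same default lazily when the pointer is NULL.

```c
#include <errno.h>
#include <stddef.h>

/* Hypothetical, heavily simplified stand-ins for the kernel structures. */
struct sockopt_model { int name; int value; };
struct socket_model  { int sb_hiwat; };

typedef int (*setsbopt_fn)(struct socket_model *, struct sockopt_model *);

/* Generic handler, playing the role of sbsetopt(). */
static int generic_setsbopt(struct socket_model *so, struct sockopt_model *sopt)
{
    if (sopt->value < 1)
        return EINVAL;
    so->sb_hiwat = sopt->value;
    return 0;
}

/* An invented protocol that caps the buffer size before deferring. */
static int capped_setsbopt(struct socket_model *so, struct sockopt_model *sopt)
{
    if (sopt->value > 4096)
        sopt->value = 4096;
    return generic_setsbopt(so, sopt);
}

struct protosw_model { const char *name; setsbopt_fn pr_setsbopt; };

/* Dispatch: use the protocol's callback, falling back to the default. */
static int proto_setsbopt(const struct protosw_model *pr,
                          struct socket_model *so, struct sockopt_model *sopt)
{
    setsbopt_fn fn = pr->pr_setsbopt != NULL ? pr->pr_setsbopt : generic_setsbopt;
    return fn(so, sopt);
}
```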
Gleb Smirnoff	e80062a2d4	tcp: avoid call to soisconnected() on transition to ESTABLISHED This call existed since pre-FreeBSD times, and it is hard to understand why it was there in the first place. After `6f3caa6d81` it definitely became necessary always and commit message from `f1ee30ccd6` confirms that. Now that `6f3caa6d81` is effectively backed out by `07285bb4c2`, the call appears to be useful only for sockets that landed on the incomplete queue, e.g. sockets that have accept_filter(9) enabled on them. Provide a new TCP flag to mark connections that are known to be on the incomplete queue, and call soisconnected() only for those connections. Reviewed by: rrs, tuexen Differential revision: https://reviews.freebsd.org/D36488	2022-09-08 09:16:04 -07:00
Gleb Smirnoff	24af7808fa	protosw: repair protocol selection logic in socket(2) Pointy hat to: glebius Fixes: `61f7427f02`	2022-08-30 21:19:46 -07:00
Gleb Smirnoff	61f7427f02	protosw: cleanup protocols that existed merely to provide pr_input Since 4.4BSD the protosw was used to implement socket types created by socket(2) syscall and at the same time to demultiplex incoming IPv4 datagrams (later copied to IPv6). This story ended with `78b1fc05b2`. These entries (e.g. IPPROTO_ICMP) in inetsw, which were added to catch packets in ip_input(), would also be returned by pffindproto() if user says socket(AF_INET, SOCK_RAW, IPPROTO_ICMP). Thus, for raw sockets to work correctly, all the entries were pointing at raw_usrreq differentiating only in the value of pr_protocol. With `78b1fc05b2` all these entries are no longer needed, as ip_protox is independent of protosw. Any socket syscall requesting SOCK_RAW type would end up with rip_protosw. And this protosw has its pr_protocol set to 0, allowing to mark socket with any protocol. For IPv6 raw socket the change required two small fixes: o Validate user provided protocol value o Always use protocol number stored in inp in rip6_attach, instead of protosw value, which is now always 0. Differential revision: https://reviews.freebsd.org/D36380	2022-08-30 15:09:21 -07:00
Gleb Smirnoff	8624f4347e	divert: declare PF_DIVERT domain and stop abusing PF_INET The divert(4) is not a protocol of IPv4. It is a socket to intercept packets from ipfw(4) to userland and re-inject them back. It can divert and re-inject IPv4 and IPv6 packets today, but potentially it is not limited to these two protocols. The IPPROTO_DIVERT does not belong to known IP protocols, it doesn't even fit into u_char. I guess, the implementation of divert(4) was done the way it is done basically because it was easier to do it this way, back when protocols for sockets were intertwined with IP protocols and domains were statically compiled in. Moving divert(4) out of inetsw accomplished two important things: 1) IPDIVERT is getting much closer to be not dependent on INET. This will be finalized in following changes. 2) Now divert socket no longer aliases with raw IPv4 socket. Domain/proto selection code won't need a hack for SOCK_RAW and multiple entries in inetsw implementing different flavors of raw socket can merge into one without requirement of raw IPv4 being the last member of dom_protosw. Differential revision: https://reviews.freebsd.org/D36379	2022-08-30 15:09:21 -07:00
Gleb Smirnoff	e7d02be19d	protosw: refactor protosw and domain static declaration and load o Assert that every protosw has pr_attach. Now this structure is only for socket protocols declarations and nothing else. o Merge struct pr_usrreqs into struct protosw. This was suggested in 1996 by wollman@ (see `7b187005d1`), and later reiterated in 2006 by rwatson@ (see `6fbb9cf860`). o Make struct domain hold a variable sized array of protosw pointers. For most protocols these pointers are initialized statically. Those domains that may have loadable protocols have spacers. IPv4 and IPv6 have 8 spacers each (andre@ `dff3237ee5`). o For inetsw and inet6sw leave a comment noting that many protosw entries very likely are dead code. o Refactor pf_proto_[un]register() into protosw_[un]register(). o Isolate pr_*_notsupp() methods into uipc_domain.c Reviewed by: melifaro Differential revision: https://reviews.freebsd.org/D36232	2022-08-17 11:50:32 -07:00
Gleb Smirnoff	f277746e13	protosw: change prototype for pr_control For some reason protosw.h is used during world compilation and userland is not aware of caddr_t, a relic from the first version of C. Broken buildworld is a good reason to get rid of yet another caddr_t in kernel. Fixes: `886fc1e804`	2022-08-12 12:08:18 -07:00
Gleb Smirnoff	07285bb4c2	tcp: utilize new solisten_clone() and solisten_enqueue() This streamlines cloning of a socket from a listener. Now we do not drop the inpcb lock during creation of a new socket, do not do useless state transitions, and put a fully initialized socket+inpcb+tcpcb into the listen queue. Before this change, first we would allocate the socket and inpcb+tcpcb via tcp_usr_attach() as TCPS_CLOSED, link them into global list of pcbs, unlock pcb and put this onto incomplete queue (see `6f3caa6d81`). Then, after sonewconn() we would lock it again, transition into TCPS_SYN_RECEIVED, insert into inpcb hash, finalize initialization of tcpcb. And then, in call into tcp_do_segment() and upon transition to TCPS_ESTABLISHED call soisconnected(). This call would lock the listening socket once again with a LOR protection sequence and then we would relocate the socket onto the complete queue and only now it is ready for accept(2). Reviewed by: rrs, tuexen Differential revision: https://reviews.freebsd.org/D36064	2022-08-10 11:09:34 -07:00
Gleb Smirnoff	8f5a0a2e4f	sockets: provide solisten_clone(), solisten_enqueue() as alternative KPI to sonewconn(). The latter has three stages: - check the listening socket queue limits - allocate a new socket - call into protocol attach method - link the new socket into the listen queue of the listening socket The attach method, originally designed for a creation of socket by the socket(2) syscall has slightly different semantics than attach of a socket cloned by listener. Make it possible for protocols to call into the first stage, then perform a different attach, and then call into the final stage. The first stage, that checks limits and clones a socket is called solisten_clone(), and the function that enqueues the socket is solisten_enqueue(). Reviewed by: tuexen Differential revision: https://reviews.freebsd.org/D36063	2022-08-10 11:09:34 -07:00
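The point of the split is that a child socket only becomes visible on the listen queue after the protocol has finished its own attach. A minimal sketch of that staging, with hypothetical names and a simple FIFO queue; the real limit checks in solisten_clone() are more involved than the `qlen >= qlimit` test used here.

```c
#include <stdbool.h>
#include <stdlib.h>

/* Hypothetical miniature of a listening socket and its accept queue. */
struct child { struct child *next; bool attached; };
struct listener { int qlen, qlimit; struct child *queue; };

/* Stage 1 (solisten_clone() analogue): enforce the limit, allocate a child. */
static struct child *clone_child(struct listener *l)
{
    if (l->qlen >= l->qlimit)
        return NULL;                    /* queue full: refuse the connection */
    return calloc(1, sizeof(struct child));
}

/* Stage 2: the protocol's own attach; here it just marks the child ready. */
static void proto_attach(struct child *c)
{
    c->attached = true;
}

/* Stage 3 (solisten_enqueue() analogue): append a fully built child. */
static void enqueue_child(struct listener *l, struct child *c)
{
    struct child **pp = &l->queue;

    while (*pp != NULL)
        pp = &(*pp)->next;
    c->next = NULL;
    *pp = c;
    l->qlen++;
}

/* What a protocol might compose instead of calling sonewconn(). */
static struct child *accept_new_connection(struct listener *l)
{
    struct child *c = clone_child(l);

    if (c == NULL)
        return NULL;
    proto_attach(c);
    enqueue_child(l, c);
    return c;
}
```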
Alexander V. Chernikov	be1f485d7d	sockets: add MSG_TRUNC flag handling for recvfrom()/recvmsg(). Implement Linux-variant of MSG_TRUNC input flag used in recv(), recvfrom() and recvmsg(). Posix defines MSG_TRUNC as an output flag, indicating packet/datagram truncation. Linux extended it a while (~15+ years) ago to act as input flag, resulting in returning the full packet size regardless of the input buffer size. It's a (relatively) popular pattern to do recvmsg(MSG_PEEK | MSG_TRUNC) to get the packet size, allocate the buffer and issue another call to fetch the packet. In particular, it's popular in userland netlink code, which is the primary driving factor of this change. This commit implements the MSG_TRUNC support for SOCK_DGRAM sockets (udp, unix and all soreceive_generic() users). PR: kern/176322 Reviewed by: pauamma(doc) Differential Revision: https://reviews.freebsd.org/D35909 MFC after: 1 month	2022-07-30 18:21:51 +00:00
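The peek-then-allocate pattern the commit enables can be sketched as below, assuming a system that implements the Linux-style input semantics of MSG_TRUNC on datagram sockets (as this commit adds). The helper name `recv_whole_datagram()` is hypothetical.

```c
#include <stdlib.h>
#include <sys/socket.h>
#include <sys/types.h>
#include <unistd.h>

/* Receive one datagram of unknown size: peek with MSG_TRUNC to learn its
 * full length, allocate exactly that much, then read it for real.
 * Returns a malloc'd buffer (caller frees) and stores the length in *lenp. */
static char *recv_whole_datagram(int fd, size_t *lenp)
{
    char probe;
    char *buf;
    ssize_t n;

    /* With MSG_TRUNC as an input flag, n is the real datagram length. */
    n = recv(fd, &probe, sizeof(probe), MSG_PEEK | MSG_TRUNC);
    if (n < 0)
        return NULL;
    buf = malloc(n > 0 ? (size_t)n : 1);
    if (buf == NULL)
        return NULL;
    n = recv(fd, buf, (size_t)n, 0);    /* now dequeue the datagram */
    if (n < 0) {
        free(buf);
        return NULL;
    }
    *lenp = (size_t)n;
    return buf;
}
```

Without the input-flag semantics, the peek would return at most 1 and the second call would truncate the datagram.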
Gleb Smirnoff	c261510ef5	sockets: fix setsockopt(SO_RCVTIMEO) on a listening socket MFC after: 3 weeks	2022-07-08 11:33:24 -07:00
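The commit message is terse; the sketch below only exercises the case it names, setting SO_RCVTIMEO after listen(2) and reading the value back. Whether accept(2) then honours that timeout is not stated in the commit, so the sketch makes no claim about it. The helper name is hypothetical.

```c
#include <sys/socket.h>
#include <sys/time.h>
#include <unistd.h>

/* Set a 2 s receive timeout on a listening socket and read it back.
 * Returns 0 if the stored value matches what was set, -1 otherwise. */
static int listen_rcvtimeo_roundtrip(void)
{
    struct timeval set = { .tv_sec = 2, .tv_usec = 0 }, got = { 0, 0 };
    socklen_t len = sizeof(got);
    int fd, rc = -1;

    fd = socket(AF_INET, SOCK_STREAM, 0);
    if (fd < 0)
        return -1;
    if (listen(fd, 5) == 0 &&
        setsockopt(fd, SOL_SOCKET, SO_RCVTIMEO, &set, sizeof(set)) == 0 &&
        getsockopt(fd, SOL_SOCKET, SO_RCVTIMEO, &got, &len) == 0 &&
        got.tv_sec == set.tv_sec && got.tv_usec == set.tv_usec)
        rc = 0;
    close(fd);
    return rc;
}
```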
Gleb Smirnoff	d8596171c5	sockets: use only soref()/sorele() as socket reference count o Retire SS_FDREF as it is basically a debug flag on top of already existing soref()/sorele(). o Convert SS_PROTOREF into soref()/sorele(). o Change reference model for the listen queues, see below. o Make sofree() private. The correct KPI to use is only sorele(). o Make soabort() respect the model and sorele() instead of sofree(). Note on listening queues. Until now the sockets on a queue had zero reference count. A reference was given only upon accept(2). The assumption was that there is no way to see the queued socket from anywhere except its head. This is not true, since queued sockets already have pcbs, which are linked at least into the global pcb lists. With this change we put the reference right in the sonewconn() and on accept(2) path we just hand the existing reference to the file descriptor. Differential revision: https://reviews.freebsd.org/D35679	2022-07-04 12:40:51 -07:00
Gleb Smirnoff	bc7605647c	sockets: use positive flag for file descriptor socket reference Rename SS_NOFDREF to SS_FDREF and flip all bitwise operations. Mark sockets created by socreate() with SS_FDREF. This change is mostly illustrative. With it we see that SS_FDREF is a debugging flag, since: * socreate() takes a reference with soref(). * on accept path solisten_dequeue() takes a reference with soref() and then soaccept() sets SS_FDREF. * soclose() checks SS_FDREF, removes it and does sorele(). Reviewed by: tuexen Differential revision: https://reviews.freebsd.org/D35678	2022-07-04 12:40:51 -07:00
Gleb Smirnoff	66c8e3fccf	socket: fix listen(2) on an already listening socket Reviewed by: markj Differential revision: https://reviews.freebsd.org/D35669 Fixes: `141fe2dcee`	2022-06-30 07:50:29 -07:00
Gleb Smirnoff	a4fc41423f	sockets: enable protocol specific socket buffers Split struct sockbuf into common shared fields and protocol specific union, where protocols are free to implement whatever buffer they want. Such protocols should mark themselves with PR_SOCKBUF and are expected to initialize their buffers in their pr_attach and tear them down in pr_detach. Reviewed by: markj Differential revision: https://reviews.freebsd.org/D35299	2022-06-24 09:09:10 -07:00
Mark Johnston	f6379f7fde	socket: Fix a race between kevent(2) and listen(2) When locking the knote list for a socket, we check whether the socket is a listening socket in order to select the appropriate mutex; a listening socket uses the socket lock, while data sockets use socket buffer mutexes. If SOLISTENING(so) is false and the knote lock routine locks a socket buffer, then it must re-check whether the socket is a listening socket since solisten_proto() could have changed the socket's identity while we were blocked on the socket buffer lock. Reported by: syzkaller Reviewed by: glebius MFC after: 2 weeks Sponsored by: The FreeBSD Foundation Differential Revision: https://reviews.freebsd.org/D35492	2022-06-16 10:20:04 -04:00
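The fix is an instance of "pick a lock from the current state, then re-validate the state once the lock is held". A minimal pthread model, assuming (as the commit implies) that the switch to listening is one-way and happens while the buffer mutex is held; all names are hypothetical, not the kernel's knlist callbacks.

```c
#include <pthread.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical model: data sockets protect knotes with the receive buffer
 * mutex, listeners with the socket mutex. Listening never reverts. */
struct sock_model {
    pthread_mutex_t sock_mtx;
    pthread_mutex_t rcvbuf_mtx;
    atomic_bool listening;
};

/* Lock whichever mutex currently protects the knote list; return it. */
static pthread_mutex_t *knlist_lock(struct sock_model *so)
{
    for (;;) {
        if (atomic_load(&so->listening)) {
            pthread_mutex_lock(&so->sock_mtx);   /* one-way: stays valid */
            return &so->sock_mtx;
        }
        pthread_mutex_lock(&so->rcvbuf_mtx);
        /* listen(2) may have run while we waited for the mutex: re-check. */
        if (!atomic_load(&so->listening))
            return &so->rcvbuf_mtx;
        pthread_mutex_unlock(&so->rcvbuf_mtx);   /* identity changed; retry */
    }
}

/* Model of solisten_proto(): flip the identity under both mutexes. */
static void make_listening(struct sock_model *so)
{
    pthread_mutex_lock(&so->sock_mtx);
    pthread_mutex_lock(&so->rcvbuf_mtx);
    atomic_store(&so->listening, true);
    pthread_mutex_unlock(&so->rcvbuf_mtx);
    pthread_mutex_unlock(&so->sock_mtx);
}
```

Because the flag can only change while the buffer mutex is held, observing "not listening" under that mutex is stable until it is released.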
Gleb Smirnoff	a8e286bb5d	sockets: use socket buffer mutexes in struct socket directly Convert more generic socket code to not use sockbuf compat pointer. Continuation of `4328318445`.	2022-06-03 12:55:44 -07:00
Rick Macklem	373511338d	uipc_socket.c: Modify MSG_TLSAPPDATA to only do Alert Records Without this patch, the MSG_TLSAPPDATA flag would cause soreceive_generic() to return ENXIO for any non-application data record in a TLS receive stream. This works ok for TLS1.2, since Alert records appear to be the only non-application data records received. However, for TLS1.3, there can be post-handshake handshake records, such as NewSessionTicket sent to the client from the server. These handshake records cannot be handled by the upcall which does an SSL_read() with length == 0. It appears that the client can simply throw away these NewSessionTicket records, but to do so, it needs to receive them within the kernel. This patch modifies the semantics of MSG_TLSAPPDATA slightly, so that it only applies to Alert records and not Handshake records. It is needed to allow the krpc to work with KTLS1.3. Reviewed by: hselasky MFC after: 2 weeks Differential Revision: https://reviews.freebsd.org/D35170	2022-05-14 12:56:50 -07:00
Gleb Smirnoff	4328318445	sockets: use socket buffer mutexes in struct socket directly Since `c67f3b8b78` the sockbuf mutexes belong to the containing socket, and socket buffers just point to it. In `74a68313b5` macros that access this mutex directly were added. Go over the core socket code and eliminate code that reaches the mutex by dereferencing the sockbuf compatibility pointer. This change requires a KPI change, as some functions were given the sockbuf pointer only without any hint if it is a receive or send buffer. This change doesn't cover the whole kernel, many protocols still use compatibility pointers internally. However, it allows operation of a protocol that doesn't use them. Reviewed by: markj Differential revision: https://reviews.freebsd.org/D35152	2022-05-12 13:22:12 -07:00
Gleb Smirnoff	2e4e5ee23f	sockets: delete stale comment from sofree() First paragraph refers to old past "we used to" and is no longer important today. Second paragraph has just a wrong statement that socket buffer is destroyed before pru_detach.	2022-05-12 11:02:50 -07:00
Gleb Smirnoff	a982ce0442	sockets: remove the socket-on-stack hack from sorflush() The hack can be tracked down to 4.4BSD, where copy was performed under splimp() and then after splx() dom_dispose was called. Stevens has a chapter on this function, but he doesn't answer why this trick is necessary. Why can't we call into dom_dispose under splimp()? Anyway, with multithreaded kernel the hack seems to be necessary to avoid LORs between socket buffer lock and different filesystem locks, especially network file systems. The new socket buffers KPI sbcut() from `1d2df300e9` allow us to get rid of the hack. Reviewed by: markj Differential revision: https://reviews.freebsd.org/D35125	2022-05-09 10:43:01 -07:00
Gleb Smirnoff	42f2fa9953	sockets: don't call dom_dispose() on a listening socket sorflush() already did the right thing, so only sofree() needed a fix. Turn check into assertion in our only dom_dispose method. Reviewed by: markj Differential revision: https://reviews.freebsd.org/D35124	2022-05-09 10:42:57 -07:00
Gleb Smirnoff	c17418a0ba	sockets: assert that any protocol with PR_RIGHTS has dom_dispose() Through the entire history only PF_UNIX has this feature. Reviewed by: markj Differential revision: https://reviews.freebsd.org/D35123	2022-05-09 10:42:48 -07:00
Gleb Smirnoff	97f8198e95	sockets: make SO_SND/SO_RCV an enum Not a functional change now. The enum will also be used for other socket buffer related KPIs.	2022-05-09 10:42:47 -07:00
Alexander Leidinger	aeb91e95cf	Log euid, rgid and jail on listen queue overflow If you have numerous jails with multiple similar services running, this helps to narrow down which services this log is referring to.	2022-03-26 11:17:55 +01:00
Mark Johnston	5de79eeddb	ktls: Disallow transmitting empty frames outside of TLS 1.0/CBC mode There was nothing preventing one from sending an empty fragment on an arbitrary KTLS TX-enabled socket, but ktls_frame() asserts that this could not happen. Though the transmit path handles this case for TLS 1.0 with AES-CBC, we should be strict and allow empty fragments only in modes where it is explicitly allowed. Modify sosend_generic() to reject writes to a KTLS-enabled socket if the number of data bytes is zero, so that userspace cannot trigger the aforementioned assertion. Add regression tests to exercise this case. Reported by: syzkaller Reviewed by: gallatin, jhb MFC after: 1 week Sponsored by: The FreeBSD Foundation Differential Revision: https://reviews.freebsd.org/D34195	2022-02-08 12:40:41 -05:00
Alexander Motin	fe27f1db5f	kern: Remove CTLFLAG_NEEDGIANT from some sysctls. MFC after: 2 weeks	2021-12-26 12:03:33 -05:00
Hans Petter Selasky	f9978339d1	Remove dead code. The variable orig_resid is always set to zero right after the while loop where it is cleared. Reviewed by: gallatin@ and glebius@ Differential Revision: https://reviews.freebsd.org/D33589 MFC after: 1 week Sponsored by: NVIDIA Networking	2021-12-21 18:35:03 +01:00
John Baldwin	e3ba94d4f3	Don't require the socket lock for sorele(). Previously, sorele() always required the socket lock and dropped the lock if the released reference was not the last reference. Many callers locked the socket lock just before calling sorele() resulting in a wasted lock/unlock when not dropping the last reference. Move the previous implementation of sorele() into a new sorele_locked() function and use it instead of sorele() for various places in uipc_socket.c that called sorele() while already holding the socket lock. The sorele() macro now uses refcount_release_if_not_last() to try to drop the socket reference without locking the socket. If that shortcut fails, it locks the socket and calls sorele_locked(). Reviewed by: kib, markj Sponsored by: Chelsio Communications Differential Revision: https://reviews.freebsd.org/D32741	2021-11-09 10:50:12 -08:00
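A minimal C11 sketch of the lock-avoiding release described above: try a compare-and-swap decrement that never takes the count from 1 to 0, and only fall back to the locked path when this might be the last reference. The names are hypothetical analogues of refcount_release_if_not_last() and sorele_locked(), not the kernel code; the "free" is modelled by a flag.

```c
#include <pthread.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical object with a lock and a reference count. */
struct obj {
    pthread_mutex_t lock;
    atomic_uint refs;
    bool freed;
};

/* Drop a reference unless it is the last one. Returns true on success. */
static bool release_if_not_last(atomic_uint *refs)
{
    unsigned old = atomic_load(refs);

    while (old > 1) {
        if (atomic_compare_exchange_weak(refs, &old, old - 1))
            return true;
        /* CAS failure reloaded 'old' with the current value; retry. */
    }
    return false;
}

/* Slow path, entered with the lock held; consumes the lock. */
static void release_locked(struct obj *o)
{
    if (atomic_fetch_sub(&o->refs, 1) == 1)
        o->freed = true;          /* last reference: teardown happens here */
    pthread_mutex_unlock(&o->lock);
}

/* Fast path first; lock only when we may be dropping the last reference. */
static void obj_release(struct obj *o)
{
    if (release_if_not_last(&o->refs))
        return;
    pthread_mutex_lock(&o->lock);
    release_locked(o);
}
```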
Allan Jude	c441592a0e	Allow kern.ipc.maxsockets to be set to current value without error Normally setting kern.ipc.maxsockets returns EINVAL if the new value is not greater than the previous value. This can cause spurious error messages when sysctl.conf is processed multiple times, or when automation systems try to ensure the sysctl is set to the correct value. If the value is unchanged, then just do nothing. PR: 243532 Reviewed by: markj MFC after: 3 days Sponsored by: Modirum MDPay Sponsored by: Klara Inc. Differential Revision: https://reviews.freebsd.org/D32775	2021-11-04 12:56:09 +00:00
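The new rule makes the handler idempotent: re-applying the current value succeeds, shrinking still fails. A minimal sketch of just that rule; `set_maxsockets()` is hypothetical, and any other checks the real sysctl handler performs are omitted.

```c
#include <errno.h>

/* Hypothetical model of the kern.ipc.maxsockets update rule: the limit
 * may only grow, and re-setting the current value is a no-op. */
static int set_maxsockets(int *cur, int newval)
{
    if (newval == *cur)
        return 0;            /* unchanged: succeed without doing anything */
    if (newval < *cur)
        return EINVAL;       /* shrinking is still refused */
    *cur = newval;
    return 0;
}
```

This is what lets sysctl.conf be processed more than once, or configuration tools re-assert the value, without spurious errors.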
Gleb Smirnoff	a37e4fd1ea	Re-style `dfcef87714` to keep the code and variables related to listening sockets separated from code for generic sockets. No objection: markj	2021-10-01 13:38:24 -07:00
Mark Johnston	ade1daa5c0	socket: Synchronize soshutdown() with listen(2) and AIO To handle shutdown(SHUT_RD) we flush the receive buffer of the socket. This may involve searching for control messages of type SCM_RIGHTS, since we need to close the file references. Closing arbitrary files with socket buffer locks held is undesirable, mainly due to lock ordering issues, so we instead make a copy of the socket buffer and operate on that without any locks. Fields in the original buffer are cleared. This behaviour clobbered the AIO job queue associated with a receive buffer. It could also cause us to leak a KTLS session reference. Reorder socket buffer fields to address this. An alternate solution would be to remove the hack in sorflush(), but this is not quite feasible (yet). In particular, though sorflush() flags the sockbuf with SBS_CANTRCVMORE, it is possible for more data to be queued - the flag just prevents userspace from reading more data. I suspect we should fix this; SBS_CANTRCVMORE represents a terminal state and protocols can likely just drop any data destined for such a buffer. Many of them already do, but in some cases the check is racy, and some KPI churn will be needed to fix everything. This approach is more straightforward for now. Reported by: syzbot+104d8ee3430361cb2795@syzkaller.appspotmail.com Reported by: syzbot+5bd2e7d05f84a59d0d1b@syzkaller.appspotmail.com Reviewed by: jhb Sponsored by: The FreeBSD Foundation Differential Revision: https://reviews.freebsd.org/D31976	2021-09-17 14:19:06 -04:00
Mark Johnston	883761f0a8	socket: Remove NOFREE from the socket zone This flag was added during the transition away from the legacy zone allocator, commit `c897b81311`. The old zone allocator effectively provided _NOFREE semantics, but it seems that they are not required for sockets. In particular, we use reference counting to keep sockets live. One somewhat dangerous case is sonewconn(), which returns a pointer to a socket with reference count 0. This socket is still effectively owned by the listening socket. Protocols must therefore be careful to synchronize sonewconn() calls with their pru_close implementations, since for listening sockets soclose() will abort the child sockets. For example, TCP holds the listening socket's PCB read locked across the sonewconn() call, which blocks tcp_usr_close(), and sofree() synchronizes with a concurrent soabort() of the nascent socket. However, _NOFREE semantics are not required here. Eliminating _NOFREE has several benefits: it enables use-after-free detection (e.g., by KASAN) and lets the system reclaim memory from the socket zone under memory pressure. No functional change intended. Sponsored by: The FreeBSD Foundation Differential Revision: https://reviews.freebsd.org/D31975	2021-09-17 14:19:06 -04:00
Mark Johnston	6b288408ca	socket: Add assertions around naked refcount decrements Sockets in a listen queue hold a reference to the parent listening socket. Several code paths release this reference manually when moving a child socket out of the queue. Replace comments about the expected post-decrement refcount value with assertions. Use refcount_load() instead of a plain load. No functional change intended. MFC after: 1 week Sponsored by: The FreeBSD Foundation Differential Revision: https://reviews.freebsd.org/D31974	2021-09-17 14:19:06 -04:00
Mark Johnston	dfcef87714	socket: Fix a use-after-free in soclose() After releasing the fd reference to a socket "so", we should avoid testing SOLISTENING(so) since the socket may have been freed. Instead, directly test whether the list of unaccepted sockets is empty. Fixes: `f4bb1869dd` ("Consistently use the SOLISTENING() macro") Pointy hat: markj MFC after: 1 week Sponsored by: The FreeBSD Foundation Differential Revision: https://reviews.freebsd.org/D31973	2021-09-17 14:19:05 -04:00
Mark Johnston	fa0463c384	socket: De-duplicate SBLOCKWAIT() definitions MFC after: 1 week Sponsored by: The FreeBSD Foundation	2021-09-14 09:01:32 -04:00
Mark Johnston	141fe2dcee	aio: Interlock with listen(2) soo_aio_queue() did not handle the possibility that the provided socket is a listening socket. Up until recently, to fix this one would have to acquire the socket lock first and check, since the socket buffer locks were destroyed by listen(2). Now that the socket buffer locks belong to the socket, simply check SOLISTENING(so) after acquiring them, and make listen(2) return an error if any AIO jobs are enqueued on the socket. Add a couple of simple regression test cases. Note that this fixes things only for the default AIO implementation; cxgbe(4)'s TCP offload has a separate pru_aio_queue implementation which requires its own solution. Reported by: syzbot+c8aa122fa2c6a4e2a28b@syzkaller.appspotmail.com Reported by: syzbot+39af117d43d4f0faf512@syzkaller.appspotmail.com Reported by: syzbot+60cceb9569145a0b993b@syzkaller.appspotmail.com Reported by: syzbot+2d522c5db87710277ca5@syzkaller.appspotmail.com Reviewed by: tuexen, gallatin, jhb Sponsored by: The FreeBSD Foundation Differential Revision: https://reviews.freebsd.org/D31901	2021-09-10 17:21:11 -04:00
Mark Johnston	523d58aad1	socket: Remove unneeded SOLISTENING checks Now that SOCK_IO_*_LOCK() checks for listening sockets, we can eliminate some racy SOLISTENING() checks. No functional change intended. Reviewed by: tuexen MFC after: 1 month Sponsored by: The FreeBSD Foundation Differential Revision: https://reviews.freebsd.org/D31660	2021-09-07 17:12:09 -04:00
Mark Johnston	bd4a39cc93	socket: Properly interlock when transitioning to a listening socket Currently, most protocols implement pru_listen with something like the following: SOCK_LOCK(so); error = solisten_proto_check(so); if (error) { SOCK_UNLOCK(so); return (error); } solisten_proto(so); SOCK_UNLOCK(so); solisten_proto_check() fails if the socket is connected or connecting. However, the socket lock is not used during I/O, so this pattern is racy. The change modifies solisten_proto_check() to additionally acquire socket buffer locks, and the calling thread holds them until solisten_proto() or solisten_proto_abort() is called. Now that the socket buffer locks are preserved across a listen(2), this change allows socket I/O paths to properly interlock with listen(2). This fixes a large number of syzbot reports, only one is listed below and the rest will be dup'ed to it. Reported by: syzbot+9fece8a63c0e27273821@syzkaller.appspotmail.com Reviewed by: tuexen, gallatin MFC after: 1 month Sponsored by: The FreeBSD Foundation Differential Revision: https://reviews.freebsd.org/D31659	2021-09-07 17:11:43 -04:00
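A minimal pthread model of the interlock: the listen check takes the socket lock plus both buffer locks and keeps them until the caller commits or aborts, while the I/O path refuses to proceed on a listening socket. It is a sketch under stated assumptions: one mutex per buffer stands in for both the kernel's buffer mutexes and its I/O sx locks, the lock order is this model's choice, and the errno values (EINVAL, ENOTCONN) are our assumption rather than taken from the commit.

```c
#include <errno.h>
#include <pthread.h>
#include <stdbool.h>

/* Hypothetical socket model: a socket lock plus send/receive buffer locks. */
struct lsock {
    pthread_mutex_t sock, snd, rcv;
    bool connected, listening;
};

/* Like solisten_proto_check(): on success return 0 with all locks held. */
static int listen_check(struct lsock *so)
{
    pthread_mutex_lock(&so->sock);
    pthread_mutex_lock(&so->snd);
    pthread_mutex_lock(&so->rcv);
    if (so->connected) {
        pthread_mutex_unlock(&so->rcv);
        pthread_mutex_unlock(&so->snd);
        pthread_mutex_unlock(&so->sock);
        return EINVAL;                 /* assumed errno */
    }
    return 0;
}

/* Like solisten_proto() (commit) or solisten_proto_abort(); drops locks. */
static void listen_finish(struct lsock *so, bool commit)
{
    if (commit)
        so->listening = true;
    pthread_mutex_unlock(&so->rcv);
    pthread_mutex_unlock(&so->snd);
    pthread_mutex_unlock(&so->sock);
}

/* Like SOCK_IO_RECV_LOCK(): refuse I/O on a listening socket. */
static int io_recv_lock(struct lsock *so)
{
    pthread_mutex_lock(&so->rcv);
    if (so->listening) {
        pthread_mutex_unlock(&so->rcv);
        return ENOTCONN;               /* assumed errno */
    }
    return 0;
}
```

Because the identity changes only while the receive lock is held, a reader that holds it sees a stable answer for the whole I/O operation.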
Mark Johnston	f94acf52a4	socket: Rename sb(un)lock() and interlock with listen(2) In preparation for moving sockbuf locks into the containing socket, provide alternative macros for the sockbuf I/O locks: SOCK_IO_SEND_(UN)LOCK() and SOCK_IO_RECV_(UN)LOCK(). These operate on a socket rather than a socket buffer. Note that these locks are used only to prevent concurrent readers and writers from interleaving I/O. When locking for I/O, return an error if the socket is a listening socket. Currently the check is racy since the sockbuf sx locks are destroyed during the transition to a listening socket, but that will no longer be true after some follow-up changes. Modify a few places to check for errors from sblock()/SOCK_IO_(SEND|RECV)_LOCK() where they were not before. In particular, add checks to sendfile() and sorflush(). Reviewed by: tuexen, gallatin MFC after: 1 month Sponsored by: The FreeBSD Foundation Differential Revision: https://reviews.freebsd.org/D31657	2021-09-07 15:06:48 -04:00
Roy Marples	7045b1603b	socket: Implement SO_RERROR SO_RERROR indicates that receive buffer overflows should be handled as errors. Historically receive buffer overflows have been ignored and programs could not tell if they missed messages or messages had been truncated because of overflows. Since programs historically do not expect to get receive overflow errors, this behavior is not the default. This is really really important for programs that use route(4) to keep in sync with the system. If we lose a message then we need to reload the full system state, otherwise the behaviour from that point is undefined and can lead to chasing bogus bug reports. Reviewed by: philip (network), kbowling (transport), gbe (manpages) MFC after: 2 weeks Differential Revision: https://reviews.freebsd.org/D26652	2021-07-28 09:35:09 -07:00
Mark Johnston	a100217489	Consistently use the SOCKBUF_MTX() and SOCK_MTX() macros This makes it easier to change the socket locking protocols. No functional change intended. MFC after: 1 week Sponsored by: The FreeBSD Foundation	2021-06-14 17:32:32 -04:00


559 Commits