freebsd-dev

Author	SHA1	Message	Date
Gleb Smirnoff	458f475df8	unix/dgram: smart socket buffers for one-to-many sockets A one-to-many unix/dgram socket is a socket that has been bound with bind(2) and can get multiple connections. A typical example is /var/run/log bound by syslogd(8) and receiving multiple connections from libc syslog(3) API. Until now all of these connections shared the same receive socket buffer of the bound socket. This made the socket vulnerable to overflow attack. See `240d5a9b1c` for a historical attempt to workaround the problem. This commit creates a per-connection socket buffer for every single connected socket and eliminates the problem. The new behavior will optimize seldom writers over frequent writers. See added test case scenarios and code comments for more detailed description of the new behavior. Reviewed by: markj Differential revision: https://reviews.freebsd.org/D35303	2022-06-24 09:09:11 -07:00
Gleb Smirnoff	1093f16487	unix/dgram: reduce mbuf chain traversals in send(2) and recv(2) o Use m_pkthdr.memlen from m_uiotombuf() o Modify unp_internalize() to keep track of allocated space and memory as well as pointer to the last buffer. o Modify unp_addsockcred() to keep track of allocated space and memory as well as pointer to the last buffer. o Record the datagram len/memlen/ctllen in the first (from) mbuf of the chain in uipc_sosend_dgram() and reuse it in uipc_soreceive_dgram(). Reviewed by: markj Differential revision: https://reviews.freebsd.org/D35302	2022-06-24 09:09:11 -07:00
Gleb Smirnoff	9b841b0e23	m_uiotombuf: write total memory length of the allocated chain in pkthdr Data allocated by m_uiotombuf() usually goes into a socket buffer. We are interested in the length of useful data to be added to sb_acc, as well as total memory used by mbufs. The latter would be added to sb_mbcnt. Calculating this value at allocation time saves an extra traversal of the mbuf chain. Reviewed by: markj Differential revision: https://reviews.freebsd.org/D35301	2022-06-24 09:09:11 -07:00
Gleb Smirnoff	a7444f807e	unix/dgram: use minimal possible socket buffer for PF_UNIX/SOCK_DGRAM This change fully splits away PF_UNIX/SOCK_DGRAM from other socket buffer implementations, without any behavior changes. Generic socket implementation is reduced down to one STAILQ and very little code. Reviewed by: markj Differential revision: https://reviews.freebsd.org/D35300	2022-06-24 09:09:11 -07:00
Gleb Smirnoff	315167c0de	unix: provide an option to return locked from unp_connectat() Use this new version in unix/dgram socket when sending to a target address. This removes extra lock release/acquisition and possible counter-intuitive ENOTCONN. Reviewed by: markj Differential revision: https://reviews.freebsd.org/D35298	2022-06-24 09:09:10 -07:00
Gleb Smirnoff	5dc8dd5f3a	unix/dgram: inline sbappendaddr_locked() into uipc_sosend_dgram() This allows to remove one M_NOWAIT allocation and also makes it more clear what's going on. Reviewed by: markj Differential revision: https://reviews.freebsd.org/D35297	2022-06-24 09:09:10 -07:00
Gleb Smirnoff	e3fbbf965e	unix/dgram: add a specific receive method - uipc_soreceive_dgram With this second step PF_UNIX/SOCK_DGRAM has a protocol-specific implementation. This opens up some possible performance optimizations. However, it still operates on the same struct socket as all other sockets do. Reviewed by: markj Differential revision: https://reviews.freebsd.org/D35296	2022-06-24 09:09:10 -07:00
Gleb Smirnoff	f384a97c83	unix/dgram: cleanup uipc_send of PF_UNIX/SOCK_DGRAM, step 2 Just remove one level of indentation as the case clause always match. Reviewed by: markj Differential revision: https://reviews.freebsd.org/D35295	2022-06-24 09:09:10 -07:00
Gleb Smirnoff	7e5b6b391e	unix/dgram: cleanup uipc_send of PF_UNIX/SOCK_DGRAM, step 1 Remove the dead code. The new uipc_sosend_dgram() handles send() on PF_UNIX/SOCK_DGRAM in full. Reviewed by: markj Differential revision: https://reviews.freebsd.org/D35294	2022-06-24 09:09:10 -07:00
Gleb Smirnoff	3464958246	unix/dgram: add a specific send method - uipc_sosend_dgram() This is the first step towards splitting the classic BSD socket implementation into separate classes. The first to be split is PF_UNIX/SOCK_DGRAM as it has the most differences from SOCK_STREAM sockets and from PF_INET sockets. Historically a protocol shall provide two methods for sendmsg(2): pru_sosend and pru_send. The former is a generic send method, e.g. sosend_generic(), which would internally call the latter, uipc_send() in our case. There is one important exception, though: the sendfile(2) code will call pru_send directly. But sendfile doesn't work on SOCK_DGRAM, so we can do the trick. We will create a socket class specific uipc_sosend_dgram() which will carry only the important bits from sosend_generic() and uipc_send(). Reviewed by: markj Differential revision: https://reviews.freebsd.org/D35293	2022-06-24 09:09:10 -07:00
Gleb Smirnoff	d97922c6c6	unix/*: rewrite unp_internalize() cmsg parsing cycle Make it a complex, but single, for(;;) statement. The previous cycle with some loop logic at the beginning and some loop logic at the end was confusing. Both markj@ and I were misled into concluding that some checks were unnecessary, while they actually were necessary. While here, handle an edge case found by Mark, where on a 64-bit platform an incorrect message from userland would underflow the length counter, but return without any error. Provide a test case for such a message. Reviewed by: markj Differential revision: https://reviews.freebsd.org/D35375	2022-06-06 10:05:28 -07:00
Gleb Smirnoff	2573e6ced9	unix/dgram: rename unpdg_sendspace to unpdg_maxdgram Matches the meaning of the variable and sysctl node name.	2022-06-03 12:55:44 -07:00
Gleb Smirnoff	d64f2f42c1	unix: unp_externalize() can M_WAITOK Reviewed by: markj Differential revision: https://reviews.freebsd.org/D35318	2022-05-27 20:48:38 -07:00
Gleb Smirnoff	75e7e3ce34	unix: fix incorrect assertion in `4682ac697c` Pointy hat to: glebius Fixes: `4682ac697c`	2022-05-26 11:35:05 -07:00
Gleb Smirnoff	4682ac697c	unix: turn check in unp_externalize() into assertion In this function we always work with mbufs that we previously created ourselves in unp_internalize(). They must be valid. Reviewed by: markj Differential revision: https://reviews.freebsd.org/D35319	2022-05-25 13:29:20 -07:00
Gleb Smirnoff	579b45e203	unix/*: check new control size in unp_internalize() Now that we call sbcreatecontrol() with M_WAITOK, we are expected to pass a valid size. Return same error code, we are returning for an oversized control from sockargs(). Reviewed by: markj Differential revision: https://reviews.freebsd.org/D35317	2022-05-25 13:29:13 -07:00
Gleb Smirnoff	b46667c63e	sockbuf: merge two versions of sbcreatecontrol() into one No functional change.	2022-05-17 10:10:42 -07:00
Gleb Smirnoff	eac7f0798b	unix: garbage collect unp_dispose_mbuf() for brevity	2022-05-17 10:10:41 -07:00
Gleb Smirnoff	2e5bf7c49f	unix: fix mbuf leak on close of socket with data Fixes: `1f32cef471`	2022-05-17 10:10:41 -07:00
Gleb Smirnoff	bb35a4e11d	unix: microoptimize unp_connectat() - one less lock on success This change is also a preparation for further optimization to allow locked return on success. Reviewed by: markj Differential revision: https://reviews.freebsd.org/D35182	2022-05-12 13:22:39 -07:00
Gleb Smirnoff	08f17d1432	unix: make unp_connect2() void Assert that sockets are of the same type. unp_connectat() already did this check. Add the check to uipc_connect2(). Reviewed by: markj Differential revision: https://reviews.freebsd.org/D35181	2022-05-12 13:22:39 -07:00
Gleb Smirnoff	4328318445	sockets: use socket buffer mutexes in struct socket directly Since `c67f3b8b78` the sockbuf mutexes belong to the containing socket, and socket buffers just point to it. In `74a68313b5` macros that access this mutex directly were added. Go over the core socket code and eliminate code that reaches the mutex by dereferencing the sockbuf compatibility pointer. This change requires a KPI change, as some functions were given the sockbuf pointer only without any hint if it is a receive or send buffer. This change doesn't cover the whole kernel, many protocols still use compatibility pointers internally. However, it allows operation of a protocol that doesn't use them. Reviewed by: markj Differential revision: https://reviews.freebsd.org/D35152	2022-05-12 13:22:12 -07:00
Gleb Smirnoff	01235012e5	unix/dgram: uipc_listen() is specific for SOCK_STREAM and SOCK_SEQPACKET Rely on pr_usrreqs_init() to init SOCK_DGRAM to pru_listen_notsupp().	2022-05-12 11:04:40 -07:00
Gleb Smirnoff	3c87ba3c3b	unix/dgram: pru_rcvd never called since PR_WANTRCVD not set	2022-05-12 11:04:40 -07:00
Gleb Smirnoff	1f32cef471	unix: don't call sbrelease() in uipc_detach() Since `a982ce0442` the socket buffer is already cleared and released in unp_dispose() that is called just before uipc_detach().	2022-05-12 11:02:50 -07:00
Gleb Smirnoff	a982ce0442	sockets: remove the socket-on-stack hack from sorflush() The hack can be tracked down to 4.4BSD, where copy was performed under splimp() and then after splx() dom_dispose was called. Stevens has a chapter on this function, but he doesn't answer why this trick is necessary. Why can't we call into dom_dispose under splimp()? Anyway, with multithreaded kernel the hack seems to be necessary to avoid LORs between socket buffer lock and different filesystem locks, especially network file systems. The new socket buffers KPI sbcut() from `1d2df300e9` allow us to get rid of the hack. Reviewed by: markj Differential revision: https://reviews.freebsd.org/D35125	2022-05-09 10:43:01 -07:00
Gleb Smirnoff	42f2fa9953	sockets: don't call dom_dispose() on a listening socket sorflush() already did the right thing, so only sofree() needed a fix. Turn check into assertion in our only dom_dispose method. Reviewed by: markj Differential revision: https://reviews.freebsd.org/D35124	2022-05-09 10:42:57 -07:00
Gleb Smirnoff	24df85d29a	unix/*: unp_internalize() can sleep, so allocate mbufs with M_WAITOK	2022-05-09 10:42:48 -07:00
Mateusz Guzik	bb92cd7bcd	vfs: NDFREE(&nd, NDF_ONLY_PNBUF) -> NDFREE_PNBUF(&nd)	2022-03-24 10:20:51 +00:00
Mateusz Guzik	f17ef28674	fd: rename fget_locked to fget_noref This gets rid of the error prone naming where fget_unlocked returns with a ref held, while fget_locked requires a lock but provides nothing in terms of making sure the file lives past unlock. No functional changes.	2022-02-22 18:53:43 +00:00
Gleb Smirnoff	65572cade3	unix/dgram: return EAGAIN instead of ENOBUFS when O_NONBLOCK set This is behavior what some programs expect and what Linux does. For example nginx expects EAGAIN when sending messages to /var/run/log, which it connects to with O_NONBLOCK. Particularly with nginx the problem is magnified by the fact that a ENOBUFS on send(2) is also logged, so situation creates a log-bomb - a failed log message triggers another log message. Reviewed by: markj Differential revision: https://reviews.freebsd.org/D34187	2022-02-14 09:21:55 -08:00
Gleb Smirnoff	24e1c6ae7d	domains: init with standard SYSINIT(9) or VNET_SYSINIT() There left only three modules that used dom_init(). And netipsec was the last one to use dom_destroy(). Differential revision: https://reviews.freebsd.org/D33540	2022-01-03 10:15:22 -08:00
Mark Johnston	d157f2627b	unix: Increase the default datagram recv buffer size syslog(3) was recently changed to support larger messages, up to 8KB. Our syslogd handles this fine, as it adjusts /dev/log's recv buffer to a large size. rsyslog, however, uses the system default of 4KB. This leads to problems since our syslog(3) retries indefinitely when a send() returns ENOBUFS, but if the message is large enough this will never succeed. Increase the default recv buffer size for datagram sockets to support 8KB syslog messages without requiring the logging daemon to adjust its buffers. PR: 260126 Reviewed by: asomers MFC after: 2 weeks Sponsored by: The FreeBSD Foundation Differential Revision: https://reviews.freebsd.org/D33380	2021-12-17 13:09:49 -05:00
Mateusz Guzik	7e1d3eefd4	vfs: remove the unused thread argument from NDINIT* See `b4a58fbf64` ("vfs: remove cn_thread") Bump __FreeBSD_version to 1400043.	2021-11-25 22:50:42 +00:00
Mark Johnston	42188bb5c1	unix: Remove a write-only local variable Reported by: clang MFC after: 1 week Sponsored by: The FreeBSD Foundation	2021-11-16 13:30:22 -05:00
Mark Johnston	50b07c1f71	unix: Fix a use-after-free in unp_drop() We need to load the socket pointer after locking the PCB, otherwise the socket may have been detached and freed by the time that unp_drop() sets so_error. This previously went unnoticed as the socket zone was _NOFREE. Reported by: pho MFC after: 1 week	2021-09-18 10:38:39 -04:00
Mark Johnston	bd4a39cc93	socket: Properly interlock when transitioning to a listening socket Currently, most protocols implement pru_listen with something like the following: SOCK_LOCK(so); error = solisten_proto_check(so); if (error) { SOCK_UNLOCK(so); return (error); } solisten_proto(so); SOCK_UNLOCK(so); solisten_proto_check() fails if the socket is connected or connecting. However, the socket lock is not used during I/O, so this pattern is racy. The change modifies solisten_proto_check() to additionally acquire socket buffer locks, and the calling thread holds them until solisten_proto() or solisten_proto_abort() is called. Now that the socket buffer locks are preserved across a listen(2), this change allows socket I/O paths to properly interlock with listen(2). This fixes a large number of syzbot reports, only one is listed below and the rest will be dup'ed to it. Reported by: syzbot+9fece8a63c0e27273821@syzkaller.appspotmail.com Reviewed by: tuexen, gallatin MFC after: 1 month Sponsored by: The FreeBSD Foundation Differential Revision: https://reviews.freebsd.org/D31659	2021-09-07 17:11:43 -04:00
Roy Marples	7045b1603b	socket: Implement SO_RERROR SO_RERROR indicates that receive buffer overflows should be handled as errors. Historically receive buffer overflows have been ignored and programs could not tell if they missed messages or messages had been truncated because of overflows. Since programs historically do not expect to get receive overflow errors, this behavior is not the default. This is really really important for programs that use route(4) to keep in sync with the system. If we lose a message then we need to reload the full system state, otherwise the behaviour from that point is undefined and can lead to chasing bogus bug reports. Reviewed by: philip (network), kbowling (transport), gbe (manpages) MFC after: 2 weeks Differential Revision: https://reviews.freebsd.org/D26652	2021-07-28 09:35:09 -07:00
Mark Johnston	f4bb1869dd	Consistently use the SOLISTENING() macro Some code was using it already, but in many places we were testing SO_ACCEPTCONN directly. As a small step towards fixing some bugs involving synchronization with listen(2), make the kernel consistently use SOLISTENING(). No functional change intended. MFC after: 1 week Sponsored by: The FreeBSD Foundation	2021-06-14 17:32:27 -04:00
Mark Johnston	274579831b	capsicum: Limit socket operations in capability mode Capsicum did not prevent certain privileged networking operations, specifically creation of raw sockets and network configuration ioctls. However, these facilities can be used to circumvent some of the restrictions that capability mode is supposed to enforce. Add capability mode checks to disallow network configuration ioctls and creation of sockets other than PF_LOCAL and SOCK_DGRAM/STREAM/SEQPACKET internet sockets. Reviewed by: oshogbo Discussed with: emaste Reported by: manu Sponsored by: The FreeBSD Foundation MFC after: 2 weeks Differential Revision: https://reviews.freebsd.org/D29423	2021-04-07 14:32:56 -04:00
Alex Richardson	6ceacebdf5	Unbreak MSG_CMSG_CLOEXEC MSG_CMSG_CLOEXEC has not been working since 2015 (SVN r284380) because _finstall expects O_CLOEXEC and not UF_EXCLOSE as the flags argument. This was probably not noticed because we don't have a test for this flag so this commit adds one. I found this problem because one of the libwayland tests was failing. Fixes: `ea31808c3b` ("fd: move out actual fp installation to _finstall") MFC after: 3 days Reviewed By: mjg, kib Differential Revision: https://reviews.freebsd.org/D29328	2021-03-18 20:52:20 +00:00
Konstantin Belousov	3b2aa36024	Use VOP_VPUT_PAIR() for eligible VFS syscalls. The current list is limited to the cases where UFS needs to handle vput(dvp) specially. Which means VOP_CREATE(), VOP_MKDIR(), VOP_MKNOD(), VOP_LINK(), and VOP_SYMLINK(). Reviewed by: chs, mkcusick Tested by: pho MFC after: 2 weeks Sponsored by: The FreeBSD Foundation	2021-02-12 03:02:20 +02:00
Alexander V. Chernikov	924d1c9a05	Revert "SO_RERROR indicates that receive buffer overflows should be handled as errors." The wrong version of the change was pushed inadvertently. This reverts commit `4a01b854ca`.	2021-02-08 22:32:32 +00:00
Alexander V. Chernikov	4a01b854ca	SO_RERROR indicates that receive buffer overflows should be handled as errors. Historically receive buffer overflows have been ignored and programs could not tell if they missed messages or messages had been truncated because of overflows. Since programs historically do not expect to get receive overflow errors, this behavior is not the default. This is really really important for programs that use route(4) to keep in sync with the system. If we lose a message then we need to reload the full system state, otherwise the behaviour from that point is undefined and can lead to chasing bogus bug reports.	2021-02-08 21:42:20 +00:00
Mateusz Guzik	4faa375cdd	fd: provide a dedicated closef variant for unix socket code This avoids testing for td != NULL.	2021-01-13 03:27:03 +01:00
Mateusz Guzik	cdb62ab74e	vfs: add NDFREE_NOTHING and convert several NDFREE_PNBUF callers Check the comment above the routine for reasoning.	2021-01-12 13:16:10 +00:00
Mateusz Guzik	6b3a9a0f3d	Convert remaining cap_rights_init users to cap_rights_init_one semantic patch: @@ expression rights, r; @@ - cap_rights_init(&rights, r) + cap_rights_init_one(&rights, r)	2021-01-12 13:16:10 +00:00
Mateusz Guzik	6404d7ffc1	uipc: disable prediction in unp_pcb_lock_peer The branch is not very predictable one way or the other, at least during buildkernel where it only correctly matched 57% of calls.	2020-12-13 21:32:19 +00:00
Conrad Meyer	85078b8573	Split out cwd/root/jail, cmask state from filedesc table No functional change intended. Tracking these structures separately for each proc enables future work to correctly emulate clone(2) in linux(4). __FreeBSD_version is bumped (to 1300130) for consumption by, e.g., lsof. Reviewed by: kib Discussed with: markj, mjg Differential Revision: https://reviews.freebsd.org/D27037	2020-11-17 21:14:13 +00:00
Conrad Meyer	ede4af47ae	unix(4): Enhance LOCAL_CREDS_PERSISTENT ABI As this ABI is still fresh (r367287), let's correct some mistakes now: - Version the structure to allow for future changes - Include sender's pid in control message structure - Use a distinct control message type from the cmsgcred / sockcred mess Discussed with: kib, markj, trasz Differential Revision: https://reviews.freebsd.org/D27084	2020-11-17 20:01:21 +00:00
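
The error-code change described in `65572cade3` above (EAGAIN instead of ENOBUFS on a non-blocking unix/dgram send) can be observed from userland. The following is a minimal sketch in C, not code from any of these commits: the helper names `dgram_overflow_errno()` and `is_transient_overflow()`, the 1024-byte message size, and the loop limit are all assumptions made for illustration. It fills a non-blocking `socketpair(2)` until `send(2)` fails and reports which errno came back.

```c
#include <errno.h>
#include <fcntl.h>
#include <string.h>
#include <sys/types.h>
#include <sys/socket.h>
#include <unistd.h>

/*
 * Hypothetical helper: send datagrams on a non-blocking unix/dgram
 * socketpair that nobody reads from, and return the errno of the first
 * failed send(2). Returns 0 if the (arbitrary) loop limit is reached
 * without a failure, and -1 if the sockets could not be set up.
 */
static int
dgram_overflow_errno(void)
{
	int sv[2], flags, err = 0;
	char msg[1024];		/* arbitrary message size for the sketch */

	if (socketpair(AF_UNIX, SOCK_DGRAM, 0, sv) == -1)
		return (-1);

	/* Put the sending end into non-blocking mode. */
	flags = fcntl(sv[0], F_GETFL);
	if (flags == -1 || fcntl(sv[0], F_SETFL, flags | O_NONBLOCK) == -1) {
		close(sv[0]);
		close(sv[1]);
		return (-1);
	}

	memset(msg, 'x', sizeof(msg));
	for (int i = 0; i < 100000; i++) {
		if (send(sv[0], msg, sizeof(msg), 0) == -1) {
			err = errno;	/* receive side is full */
			break;
		}
	}

	close(sv[0]);
	close(sv[1]);
	return (err);
}

/*
 * Hypothetical helper: a portable sender may want to treat both the
 * old (ENOBUFS) and the new (EAGAIN) result as "queue full, try later".
 */
static int
is_transient_overflow(int err)
{
	return (err == EAGAIN || err == EWOULDBLOCK || err == ENOBUFS);
}
```

Calling `dgram_overflow_errno()` should yield EAGAIN on a kernel with this change and on Linux, and would be expected to yield ENOBUFS on an older FreeBSD kernel, going by the commit message. A logger that checks `is_transient_overflow()` and silently drops the message, rather than logging the failure, avoids the log-bomb the commit describes.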


407 Commits