388 Commits

Author SHA1 Message Date
Gleb Smirnoff
bb35a4e11d unix: microoptimize unp_connectat() - one less lock on success
This change is also a preparation for further optimization to
allow locked return on success.

Reviewed by:		markj
Differential revision:	https://reviews.freebsd.org/D35182
2022-05-12 13:22:39 -07:00
Gleb Smirnoff
08f17d1432 unix: make unp_connect2() void
Assert that sockets are of the same type.  unp_connectat() already did
this check.  Add the check to uipc_connect2().

Reviewed by:		markj
Differential revision:	https://reviews.freebsd.org/D35181
2022-05-12 13:22:39 -07:00
Gleb Smirnoff
4328318445 sockets: use socket buffer mutexes in struct socket directly
Since c67f3b8b78e the sockbuf mutexes belong to the containing socket,
and socket buffers just point to it.  In 74a68313b50 macros that access
this mutex directly were added.  Go over the core socket code and
eliminate code that reaches the mutex by dereferencing the sockbuf
compatibility pointer.

This change requires a KPI change, as some functions were given the
sockbuf pointer only without any hint if it is a receive or send buffer.

This change doesn't cover the whole kernel, many protocols still use
compatibility pointers internally.  However, it allows operation of a
protocol that doesn't use them.

Reviewed by:		markj
Differential revision:	https://reviews.freebsd.org/D35152
2022-05-12 13:22:12 -07:00
Gleb Smirnoff
01235012e5 unix/dgram: uipc_listen() is specific for SOCK_STREAM and SOCK_SEQPACKET
Rely on pr_usrreqs_init() to init SOCK_DGRAM to pru_listen_notsupp().
2022-05-12 11:04:40 -07:00
Gleb Smirnoff
3c87ba3c3b unix/dgram: pru_rcvd never called since PR_WANTRCVD not set 2022-05-12 11:04:40 -07:00
Gleb Smirnoff
1f32cef471 unix: don't call sbrelease() in uipc_detach()
Since a982ce04428e the socket buffer is already cleared and released in
unp_dispose() that is called just before uipc_detach().
2022-05-12 11:02:50 -07:00
Gleb Smirnoff
a982ce0442 sockets: remove the socket-on-stack hack from sorflush()
The hack can be tracked down to 4.4BSD, where copy was performed
under splimp() and then after splx() dom_dispose was called.
Stevens has a chapter on this function, but he doesn't answer why
this trick is necessary.  Why can't we call into dom_dispose under
splimp()?  Anyway, with multithreaded kernel the hack seems to be
necessary to avoid LORs between socket buffer lock and different
filesystem locks, especially network file systems.

The new socket buffers KPI sbcut() from 1d2df300e9b allow us to get
rid of the hack.

Reviewed by:		markj
Differential revision:	https://reviews.freebsd.org/D35125
2022-05-09 10:43:01 -07:00
Gleb Smirnoff
42f2fa9953 sockets: don't call dom_dispose() on a listening socket
sorflush() already did the right thing, so only sofree() needed
a fix.  Turn check into assertion in our only dom_dispose method.

Reviewed by:		markj
Differential revision:	https://reviews.freebsd.org/D35124
2022-05-09 10:42:57 -07:00
Gleb Smirnoff
24df85d29a unix/*: unp_internalize() can sleep, so allocate mbufs with M_WAITOK 2022-05-09 10:42:48 -07:00
Mateusz Guzik
bb92cd7bcd vfs: NDFREE(&nd, NDF_ONLY_PNBUF) -> NDFREE_PNBUF(&nd) 2022-03-24 10:20:51 +00:00
Mateusz Guzik
f17ef28674 fd: rename fget*_locked to fget*_noref
This gets rid of the error prone naming where fget_unlocked returns with
a ref held, while fget_locked requires a lock but provides nothing in
terms of making sure the file lives past unlock.

No functional changes.
2022-02-22 18:53:43 +00:00
Gleb Smirnoff
65572cade3 unix/dgram: return EAGAIN instead of ENOBUFS when O_NONBLOCK set
This is behavior what some programs expect and what Linux does.  For
example nginx expects EAGAIN when sending messages to /var/run/log,
which it connects to with O_NONBLOCK.  Particularly with nginx the
problem is magnified by the fact that a ENOBUFS on send(2) is also
logged, so situation creates a log-bomb - a failed log message
triggers another log message.

Reviewed by:		markj
Differential revision:	https://reviews.freebsd.org/D34187
2022-02-14 09:21:55 -08:00
Gleb Smirnoff
24e1c6ae7d domains: init with standard SYSINIT(9) or VNET_SYSINIT()
There left only three modules that used dom_init().  And netipsec
was the last one to use dom_destroy().

Differential revision:	https://reviews.freebsd.org/D33540
2022-01-03 10:15:22 -08:00
Mark Johnston
d157f2627b unix: Increase the default datagram recv buffer size
syslog(3) was recently change to support larger messages, up to 8KB.
Our syslogd handles this fine, as it adjusts /dev/log's recv buffer to a
large size.  rsyslog, however, uses the system default of 4KB.  This
leads to problems since our syslog(3) retries indefinitely when a send()
returns ENOBUFS, but if the message is large enough this will never
succeed.

Increase the default recv buffer size for datagram sockets to support
8KB syslog messages without requiring the logging daemon to adjust its
buffers.

PR:		260126
Reviewed by:	asomers
MFC after:	2 weeks
Sponsored by:	The FreeBSD Foundation
Differential Revision:	https://reviews.freebsd.org/D33380
2021-12-17 13:09:49 -05:00
Mateusz Guzik
7e1d3eefd4 vfs: remove the unused thread argument from NDINIT*
See b4a58fbf640409a1 ("vfs: remove cn_thread")

Bump __FreeBSD_version to 1400043.
2021-11-25 22:50:42 +00:00
Mark Johnston
42188bb5c1 unix: Remove a write-only local variable
Reported by:	clang
MFC after:	1 week
Sponsored by:	The FreeBSD Foundation
2021-11-16 13:30:22 -05:00
Mark Johnston
50b07c1f71 unix: Fix a use-after-free in unp_drop()
We need to load the socket pointer after locking the PCB, otherwise
the socket may have been detached and freed by the time that unp_drop()
sets so_error.

This previously went unnoticed as the socket zone was _NOFREE.

Reported by:	pho
MFC after:	1 week
2021-09-18 10:38:39 -04:00
Mark Johnston
bd4a39cc93 socket: Properly interlock when transitioning to a listening socket
Currently, most protocols implement pru_listen with something like the
following:

	SOCK_LOCK(so);
	error = solisten_proto_check(so);
	if (error) {
		SOCK_UNLOCK(so);
		return (error);
	}
	solisten_proto(so);
	SOCK_UNLOCK(so);

solisten_proto_check() fails if the socket is connected or connecting.
However, the socket lock is not used during I/O, so this pattern is
racy.

The change modifies solisten_proto_check() to additionally acquire
socket buffer locks, and the calling thread holds them until
solisten_proto() or solisten_proto_abort() is called.  Now that the
socket buffer locks are preserved across a listen(2), this change allows
socket I/O paths to properly interlock with listen(2).

This fixes a large number of syzbot reports, only one is listed below
and the rest will be dup'ed to it.

Reported by:	syzbot+9fece8a63c0e27273821@syzkaller.appspotmail.com
Reviewed by:	tuexen, gallatin
MFC after:	1 month
Sponsored by:	The FreeBSD Foundation
Differential Revision:	https://reviews.freebsd.org/D31659
2021-09-07 17:11:43 -04:00
Roy Marples
7045b1603b socket: Implement SO_RERROR
SO_RERROR indicates that receive buffer overflows should be handled as
errors. Historically receive buffer overflows have been ignored and
programs could not tell if they missed messages or messages had been
truncated because of overflows. Since programs historically do not
expect to get receive overflow errors, this behavior is not the
default.

This is really really important for programs that use route(4) to keep
in sync with the system. If we loose a message then we need to reload
the full system state, otherwise the behaviour from that point is
undefined and can lead to chasing bogus bug reports.

Reviewed by:	philip (network), kbowling (transport), gbe (manpages)
MFC after:	2 weeks
Differential Revision:	https://reviews.freebsd.org/D26652
2021-07-28 09:35:09 -07:00
Mark Johnston
f4bb1869dd Consistently use the SOLISTENING() macro
Some code was using it already, but in many places we were testing
SO_ACCEPTCONN directly.  As a small step towards fixing some bugs
involving synchronization with listen(2), make the kernel consistently
use SOLISTENING().  No functional change intended.

MFC after:	1 week
Sponsored by:	The FreeBSD Foundation
2021-06-14 17:32:27 -04:00
Mark Johnston
274579831b capsicum: Limit socket operations in capability mode
Capsicum did not prevent certain privileged networking operations,
specifically creation of raw sockets and network configuration ioctls.
However, these facilities can be used to circumvent some of the
restrictions that capability mode is supposed to enforce.

Add capability mode checks to disallow network configuration ioctls and
creation of sockets other than PF_LOCAL and SOCK_DGRAM/STREAM/SEQPACKET
internet sockets.

Reviewed by:	oshogbo
Discussed with:	emaste
Reported by:	manu
Sponsored by:	The FreeBSD Foundation
MFC after:	2 weeks
Differential Revision:	https://reviews.freebsd.org/D29423
2021-04-07 14:32:56 -04:00
Alex Richardson
6ceacebdf5 Unbreak MSG_CMSG_CLOEXEC
MSG_CMSG_CLOEXEC has not been working since 2015 (SVN r284380) because
_finstall expects O_CLOEXEC and not UF_EXCLOSE as the flags argument.
This was probably not noticed because we don't have a test for this flag
so this commit adds one. I found this problem because one of the
libwayland tests was failing.

Fixes:		ea31808c3b07 ("fd: move out actual fp installation to _finstall")
MFC after:	3 days
Reviewed By:	mjg, kib
Differential Revision: https://reviews.freebsd.org/D29328
2021-03-18 20:52:20 +00:00
Konstantin Belousov
3b2aa36024 Use VOP_VPUT_PAIR() for eligible VFS syscalls.
The current list is limited to the cases where UFS needs to handle
vput(dvp) specially. Which means VOP_CREATE(), VOP_MKDIR(), VOP_MKNOD(),
VOP_LINK(), and VOP_SYMLINK().

Reviewed by:	chs, mkcusick
Tested by:	pho
MFC after:	2 weeks
Sponsored by:	The FreeBSD Foundation
2021-02-12 03:02:20 +02:00
Alexander V. Chernikov
924d1c9a05 Revert "SO_RERROR indicates that receive buffer overflows should be handled as errors."
Wrong version of the change was pushed inadvertenly.

This reverts commit 4a01b854ca5c2e5124958363b3326708b913af71.
2021-02-08 22:32:32 +00:00
Alexander V. Chernikov
4a01b854ca SO_RERROR indicates that receive buffer overflows should be handled as errors.
Historically receive buffer overflows have been ignored and programs
could not tell if they missed messages or messages had been truncated
because of overflows. Since programs historically do not expect to get
receive overflow errors, this behavior is not the default.

This is really really important for programs that use route(4) to keep in sync
with the system. If we loose a message then we need to reload the full system
state, otherwise the behaviour from that point is undefined and can lead
to chasing bogus bug reports.
2021-02-08 21:42:20 +00:00
Mateusz Guzik
4faa375cdd fd: provide a dedicated closef variant for unix socket code
This avoids testing for td != NULL.
2021-01-13 03:27:03 +01:00
Mateusz Guzik
cdb62ab74e vfs: add NDFREE_NOTHING and convert several NDFREE_PNBUF callers
Check the comment above the routine for reasoning.
2021-01-12 13:16:10 +00:00
Mateusz Guzik
6b3a9a0f3d Convert remaining cap_rights_init users to cap_rights_init_one
semantic patch:

@@

expression rights, r;

@@

- cap_rights_init(&rights, r)
+ cap_rights_init_one(&rights, r)
2021-01-12 13:16:10 +00:00
Mateusz Guzik
6404d7ffc1 uipc: disable prediction in unp_pcb_lock_peer
The branch is not very predictable one way or the other, at least during
buildkernel where it only correctly matched 57% of calls.
2020-12-13 21:32:19 +00:00
Conrad Meyer
85078b8573 Split out cwd/root/jail, cmask state from filedesc table
No functional change intended.

Tracking these structures separately for each proc enables future work to
correctly emulate clone(2) in linux(4).

__FreeBSD_version is bumped (to 1300130) for consumption by, e.g., lsof.

Reviewed by:	kib
Discussed with:	markj, mjg
Differential Revision:	https://reviews.freebsd.org/D27037
2020-11-17 21:14:13 +00:00
Conrad Meyer
ede4af47ae unix(4): Enhance LOCAL_CREDS_PERSISTENT ABI
As this ABI is still fresh (r367287), let's correct some mistakes now:

- Version the structure to allow for future changes
- Include sender's pid in control message structure
- Use a distinct control message type from the cmsgcred / sockcred mess

Discussed with:	kib, markj, trasz
Differential Revision:	https://reviews.freebsd.org/D27084
2020-11-17 20:01:21 +00:00
Konstantin Belousov
441eb16a95 Allow some VOPs to return ERELOOKUP to indicate VFS operation restart at top level.
Restart syscalls and some sync operations when filesystem indicated
ERELOOKUP condition, mostly for VOPs operating on metdata.  In
particular, lookup results cached in the inode/v_data is no longer
valid and needs recalculating.  Right now this should be nop.

Assert that ERELOOKUP is catched everywhere and not returned to
userspace, by asserting that td_errno != ERELOOKUP on syscall return
path.

In collaboration with:	pho
Reviewed by:	mckusick (previous version), markj
Tested by:	markj (syzkaller), pho
Sponsored by:	The FreeBSD Foundation
Differential revision:	https://reviews.freebsd.org/D26136
2020-11-13 09:42:32 +00:00
Mateusz Guzik
3c50616fc1 fd: make all f_count uses go through refcount_* 2020-11-05 02:12:33 +00:00
Conrad Meyer
2de07e4096 unix(4): Add SOL_LOCAL:LOCAL_CREDS_PERSISTENT
This option is intended to be semantically identical to Linux's
SOL_SOCKET:SO_PASSCRED.  For now, it is mutually exclusive with the
pre-existing sockopt SOL_LOCAL:LOCAL_CREDS.

Reviewed by:	markj (penultimate version)
Differential Revision:	https://reviews.freebsd.org/D27011
2020-11-03 01:17:45 +00:00
Mark Johnston
b4e07e3da5 Fix locking in uipc_accept().
Reported by:	cy
MFC after:	1 week
Sponsored by:	The FreeBSD Foundation
2020-09-15 23:03:56 +00:00
Mark Johnston
448000279e Fix locking in uipc_accept().
This function wasn't converted to use the new locking protocol in
r333744.  Make it use the PCB lock for synchronizing connection state.

Tested by:	pho
Sponsored by:	The FreeBSD Foundation
Differential Revision:	https://reviews.freebsd.org/D26300
2020-09-15 19:23:42 +00:00
Mark Johnston
ccdadf1a9b Simplify unix socket connection peer locking.
unp_pcb_owned_lock2() has some sharp edges and forces callers to deal
with a bunch of cases.  Simplify it:

- Rename to unp_pcb_lock_peer().
- Return the connected peer instead of forcing callers to load it
  beforehand.
- Handle self-connected sockets.
- In unp_connectat(), just lock the accept socket directly.  It should
  not be possible for the nascent socket to participate in any other
  lock orders.
- Get rid of connect_internal().  It does not provide any useful
  checking anymore.
- Block in unp_connectat() when a different thread is concurrently
  attempting to lock both sides of a connection.  This provides simpler
  semantics for callers of unp_pcb_lock_peer().
- Make unp_connectat() return EISCONN if the socket is already
  connected.  This fixes a race[1] when multiple threads attempt to
  connect() to different addresses using the same datagram socket.
  Upper layers will disconnect a connected datagram socket before
  calling the protocol connect's method, but there is no synchronization
  between this and protocol-layer code.

Reported by:	syzkaller [1]
Tested by:	pho
Sponsored by:	The FreeBSD Foundation
Differential Revision:	https://reviews.freebsd.org/D26299
2020-09-15 19:23:22 +00:00
Mark Johnston
ed92e1c78c Avoid an unnecessary malloc() when connecting dgram sockets.
The allocated memory is only required for SOCK_STREAM and SOCK_SEQPACKET
sockets.

Reviewed by:	kevans
Tested by:	pho
Sponsored by:	The FreeBSD Foundation
Differential Revision:	https://reviews.freebsd.org/D26298
2020-09-15 19:23:01 +00:00
Mark Johnston
f0317f868b Simplify unp_disconnect() callers.
In all cases, PCBs are unlocked after unp_disconnect() returns.  Since
unp_disconnect() may release the last PCB reference, callers may have to
bump the refcount before the call just so that they can release them
again.

Change unp_disconnect() to release PCB locks as well as connection
references; this lets us remove several refcount manipulations.  Tighten
assertions.

Tested by:	pho
Sponsored by:	The FreeBSD Foundation
Differential Revision:	https://reviews.freebsd.org/D26297
2020-09-15 19:22:37 +00:00
Mark Johnston
4820bf6ac2 Rename unp_pcb_lock2().
unp_pcb_lock_pair() seems like a better name.  Also make it handle the
case where the two sockets are the same instead of making callers do it.
No functional change intended.

Reviewed by:	glebius, kevans, kib
Tested by:	pho
Sponsored by:	The FreeBSD Foundation
Differential Revision:	https://reviews.freebsd.org/D26296
2020-09-15 19:22:16 +00:00
Mark Johnston
5362170da7 Improve unix socket PCB refcounting.
- Use refcount_init().
- Define an INVARIANTS-only zone destructor to assert that various
  bits of PCB state aren't left dangling.
- Annotate unp_pcb_rele() with __result_use_check.
- Simplify control flow.

Reviewed by:	glebius, kevans, kib
Tested by:	pho
Sponsored by:	The FreeBSD Foundation
Differential Revision:	https://reviews.freebsd.org/D26295
2020-09-15 19:21:58 +00:00
Mark Johnston
d5cbccecd8 Update unix domain socket locking comments.
- Define a locking key for unpcb members.
- Rewrite some of the locking protocol description to make it less
  verbose and avoid referencing some subroutines which will be renamed.
- Reorder includes.

Reviewed by:	glebius, kevans, kib
Tested by:	pho
Sponsored by:	The FreeBSD Foundation
Differential Revision:	https://reviews.freebsd.org/D26294
2020-09-15 19:21:33 +00:00
Mateusz Guzik
6fed89b179 kern: clean up empty lines in .c and .h files 2020-09-01 22:12:32 +00:00
Konstantin Belousov
6e0c8e1ae2 Add SOL_LOCAL symbolic constant for unix socket option level.
The constant seems to exists on MacOS X >= 10.8.

Requested by:	swills
Reviewed by:	allanjude, kevans
Sponsored by:	The FreeBSD Foundation
MFC after:	1 week
Differential revision:	https://reviews.freebsd.org/D25933
2020-08-03 22:13:02 +00:00
Mark Johnston
1b778ba260 Fix a logic error in uipc_ready_scan().
When processing the last record in a socket buffer, take care to avoid a
NULL pointer dereference when advancing the record iterator.

Reported by:	syzbot+6a689cc9c27bd265237a@syzkaller.appspotmail.com
Fixes:		r359778
MFC after:	1 week
Sponsored by:	The FreeBSD Foundation
2020-07-30 00:52:37 +00:00
Mark Johnston
b36871af6d Fix sendto() on unconnected SOCK_STREAM/SEQPACKET unix sockets.
Previously the unpcb pointer of the newly connected remote socket was
not initialized correctly, so attempting to lock it would result in a
null pointer dereference.

Reported by:	syzkaller
MFC after:	1 week
Sponsored by:	The FreeBSD Foundation
2020-04-13 19:22:05 +00:00
Mark Johnston
25f4ddfb2b sbappendcontrol() needs to avoid clearing M_NOTREADY on data mbufs.
If LOCAL_CREDS is set on a unix socket and sendfile() is called,
sendfile will call uipc_send(PRUS_NOTREADY), prepending a control
message to the M_NOTREADY mbufs.  uipc_send() then calls
sbappendcontrol() instead of sbappend(), and sbappendcontrol() would
erroneously clear M_NOTREADY.

Pass send flags to sbappendcontrol(), like we do for sbappend(), to
preserve M_READY when necessary.

Reported by:	syzkaller
MFC after:	2 weeks
Sponsored by:	The FreeBSD Foundation
Differential Revision:	https://reviews.freebsd.org/D24333
2020-04-10 20:42:11 +00:00
Mark Johnston
a50b1900a0 Properly handle disconnected sockets in uipc_ready().
When transmitting over a unix socket, data is placed directly into the
receiving socket's receive buffer, instead of the transmitting socket's
send buffer.  This means that when pru_ready is called during
sendfile(), the passed socket does not contain M_NOTREADY mbufs in its
buffers; uipc_ready() must locate the linked socket.

Currently uipc_ready() frees the mbufs if the socket is disconnected,
but this is wrong since the mbufs may still be present in the receiving
socket's buffer after a disconnect.  This can result in a use-after-free
and potentially a double free if the receive buffer is flushed after
uipc_ready() frees the mbufs.

Fix the problem by trying harder to locate the correct socket buffer and
calling sbready(): use the global list of SOCK_STREAM unix sockets to
search for a sockbuf containing the now-ready mbufs.  Only free the
mbufs if we fail this search.

Reviewed by:	jah, kib
Reported and tested by:	pho
MFC after:	1 week
Sponsored by:	The FreeBSD Foundation
Differential Revision:	https://reviews.freebsd.org/D24332
2020-04-10 20:41:59 +00:00
Mark Johnston
db38b699b4 Simplify uipc_detach() slightly.
Remove a goto and an unneeded local variable, and fix style.  No
functional change intended.

Tested by:	pho
MFC after:	2 weeks
Sponsored by:	The FreeBSD Foundation
2020-03-20 16:18:54 +00:00
Mark Johnston
569a186d02 Remove UNP_NASCENT, reverting r303855.
unp_connectat() no longer holds the link lock across calls to
sonewconn(), so the recursion described in r303855 can no longer occur.
No functional change intended.

Tested by:	pho
MFC after:	2 weeks
Sponsored by:	The FreeBSD Foundation
2020-03-20 16:17:54 +00:00