This change is also a preparation for further optimization to
allow locked return on success.
Reviewed by: markj
Differential revision: https://reviews.freebsd.org/D35182
Assert that sockets are of the same type. unp_connectat() already did
this check. Add the check to uipc_connect2().
Reviewed by: markj
Differential revision: https://reviews.freebsd.org/D35181
Since c67f3b8b78e the sockbuf mutexes belong to the containing socket,
and socket buffers just point to it. In 74a68313b50 macros that access
this mutex directly were added. Go over the core socket code and
eliminate code that reaches the mutex by dereferencing the sockbuf
compatibility pointer.
This change requires a KPI change, as some functions were given the
sockbuf pointer only without any hint if it is a receive or send buffer.
This change doesn't cover the whole kernel, many protocols still use
compatibility pointers internally. However, it allows operation of a
protocol that doesn't use them.
Reviewed by: markj
Differential revision: https://reviews.freebsd.org/D35152
The hack can be tracked down to 4.4BSD, where copy was performed
under splimp() and then after splx() dom_dispose was called.
Stevens has a chapter on this function, but he doesn't answer why
this trick is necessary. Why can't we call into dom_dispose under
splimp()? Anyway, with multithreaded kernel the hack seems to be
necessary to avoid LORs between socket buffer lock and different
filesystem locks, especially network file systems.
The new socket buffers KPI sbcut() from 1d2df300e9b allow us to get
rid of the hack.
Reviewed by: markj
Differential revision: https://reviews.freebsd.org/D35125
sorflush() already did the right thing, so only sofree() needed
a fix. Turn check into assertion in our only dom_dispose method.
Reviewed by: markj
Differential revision: https://reviews.freebsd.org/D35124
This gets rid of the error prone naming where fget_unlocked returns with
a ref held, while fget_locked requires a lock but provides nothing in
terms of making sure the file lives past unlock.
No functional changes.
This is behavior what some programs expect and what Linux does. For
example nginx expects EAGAIN when sending messages to /var/run/log,
which it connects to with O_NONBLOCK. Particularly with nginx the
problem is magnified by the fact that a ENOBUFS on send(2) is also
logged, so situation creates a log-bomb - a failed log message
triggers another log message.
Reviewed by: markj
Differential revision: https://reviews.freebsd.org/D34187
There left only three modules that used dom_init(). And netipsec
was the last one to use dom_destroy().
Differential revision: https://reviews.freebsd.org/D33540
syslog(3) was recently change to support larger messages, up to 8KB.
Our syslogd handles this fine, as it adjusts /dev/log's recv buffer to a
large size. rsyslog, however, uses the system default of 4KB. This
leads to problems since our syslog(3) retries indefinitely when a send()
returns ENOBUFS, but if the message is large enough this will never
succeed.
Increase the default recv buffer size for datagram sockets to support
8KB syslog messages without requiring the logging daemon to adjust its
buffers.
PR: 260126
Reviewed by: asomers
MFC after: 2 weeks
Sponsored by: The FreeBSD Foundation
Differential Revision: https://reviews.freebsd.org/D33380
We need to load the socket pointer after locking the PCB, otherwise
the socket may have been detached and freed by the time that unp_drop()
sets so_error.
This previously went unnoticed as the socket zone was _NOFREE.
Reported by: pho
MFC after: 1 week
Currently, most protocols implement pru_listen with something like the
following:
SOCK_LOCK(so);
error = solisten_proto_check(so);
if (error) {
SOCK_UNLOCK(so);
return (error);
}
solisten_proto(so);
SOCK_UNLOCK(so);
solisten_proto_check() fails if the socket is connected or connecting.
However, the socket lock is not used during I/O, so this pattern is
racy.
The change modifies solisten_proto_check() to additionally acquire
socket buffer locks, and the calling thread holds them until
solisten_proto() or solisten_proto_abort() is called. Now that the
socket buffer locks are preserved across a listen(2), this change allows
socket I/O paths to properly interlock with listen(2).
This fixes a large number of syzbot reports, only one is listed below
and the rest will be dup'ed to it.
Reported by: syzbot+9fece8a63c0e27273821@syzkaller.appspotmail.com
Reviewed by: tuexen, gallatin
MFC after: 1 month
Sponsored by: The FreeBSD Foundation
Differential Revision: https://reviews.freebsd.org/D31659
SO_RERROR indicates that receive buffer overflows should be handled as
errors. Historically receive buffer overflows have been ignored and
programs could not tell if they missed messages or messages had been
truncated because of overflows. Since programs historically do not
expect to get receive overflow errors, this behavior is not the
default.
This is really really important for programs that use route(4) to keep
in sync with the system. If we loose a message then we need to reload
the full system state, otherwise the behaviour from that point is
undefined and can lead to chasing bogus bug reports.
Reviewed by: philip (network), kbowling (transport), gbe (manpages)
MFC after: 2 weeks
Differential Revision: https://reviews.freebsd.org/D26652
Some code was using it already, but in many places we were testing
SO_ACCEPTCONN directly. As a small step towards fixing some bugs
involving synchronization with listen(2), make the kernel consistently
use SOLISTENING(). No functional change intended.
MFC after: 1 week
Sponsored by: The FreeBSD Foundation
Capsicum did not prevent certain privileged networking operations,
specifically creation of raw sockets and network configuration ioctls.
However, these facilities can be used to circumvent some of the
restrictions that capability mode is supposed to enforce.
Add capability mode checks to disallow network configuration ioctls and
creation of sockets other than PF_LOCAL and SOCK_DGRAM/STREAM/SEQPACKET
internet sockets.
Reviewed by: oshogbo
Discussed with: emaste
Reported by: manu
Sponsored by: The FreeBSD Foundation
MFC after: 2 weeks
Differential Revision: https://reviews.freebsd.org/D29423
MSG_CMSG_CLOEXEC has not been working since 2015 (SVN r284380) because
_finstall expects O_CLOEXEC and not UF_EXCLOSE as the flags argument.
This was probably not noticed because we don't have a test for this flag
so this commit adds one. I found this problem because one of the
libwayland tests was failing.
Fixes: ea31808c3b07 ("fd: move out actual fp installation to _finstall")
MFC after: 3 days
Reviewed By: mjg, kib
Differential Revision: https://reviews.freebsd.org/D29328
The current list is limited to the cases where UFS needs to handle
vput(dvp) specially. Which means VOP_CREATE(), VOP_MKDIR(), VOP_MKNOD(),
VOP_LINK(), and VOP_SYMLINK().
Reviewed by: chs, mkcusick
Tested by: pho
MFC after: 2 weeks
Sponsored by: The FreeBSD Foundation
Historically receive buffer overflows have been ignored and programs
could not tell if they missed messages or messages had been truncated
because of overflows. Since programs historically do not expect to get
receive overflow errors, this behavior is not the default.
This is really really important for programs that use route(4) to keep in sync
with the system. If we loose a message then we need to reload the full system
state, otherwise the behaviour from that point is undefined and can lead
to chasing bogus bug reports.
No functional change intended.
Tracking these structures separately for each proc enables future work to
correctly emulate clone(2) in linux(4).
__FreeBSD_version is bumped (to 1300130) for consumption by, e.g., lsof.
Reviewed by: kib
Discussed with: markj, mjg
Differential Revision: https://reviews.freebsd.org/D27037
As this ABI is still fresh (r367287), let's correct some mistakes now:
- Version the structure to allow for future changes
- Include sender's pid in control message structure
- Use a distinct control message type from the cmsgcred / sockcred mess
Discussed with: kib, markj, trasz
Differential Revision: https://reviews.freebsd.org/D27084
Restart syscalls and some sync operations when filesystem indicated
ERELOOKUP condition, mostly for VOPs operating on metdata. In
particular, lookup results cached in the inode/v_data is no longer
valid and needs recalculating. Right now this should be nop.
Assert that ERELOOKUP is catched everywhere and not returned to
userspace, by asserting that td_errno != ERELOOKUP on syscall return
path.
In collaboration with: pho
Reviewed by: mckusick (previous version), markj
Tested by: markj (syzkaller), pho
Sponsored by: The FreeBSD Foundation
Differential revision: https://reviews.freebsd.org/D26136
This option is intended to be semantically identical to Linux's
SOL_SOCKET:SO_PASSCRED. For now, it is mutually exclusive with the
pre-existing sockopt SOL_LOCAL:LOCAL_CREDS.
Reviewed by: markj (penultimate version)
Differential Revision: https://reviews.freebsd.org/D27011
This function wasn't converted to use the new locking protocol in
r333744. Make it use the PCB lock for synchronizing connection state.
Tested by: pho
Sponsored by: The FreeBSD Foundation
Differential Revision: https://reviews.freebsd.org/D26300
unp_pcb_owned_lock2() has some sharp edges and forces callers to deal
with a bunch of cases. Simplify it:
- Rename to unp_pcb_lock_peer().
- Return the connected peer instead of forcing callers to load it
beforehand.
- Handle self-connected sockets.
- In unp_connectat(), just lock the accept socket directly. It should
not be possible for the nascent socket to participate in any other
lock orders.
- Get rid of connect_internal(). It does not provide any useful
checking anymore.
- Block in unp_connectat() when a different thread is concurrently
attempting to lock both sides of a connection. This provides simpler
semantics for callers of unp_pcb_lock_peer().
- Make unp_connectat() return EISCONN if the socket is already
connected. This fixes a race[1] when multiple threads attempt to
connect() to different addresses using the same datagram socket.
Upper layers will disconnect a connected datagram socket before
calling the protocol connect's method, but there is no synchronization
between this and protocol-layer code.
Reported by: syzkaller [1]
Tested by: pho
Sponsored by: The FreeBSD Foundation
Differential Revision: https://reviews.freebsd.org/D26299
The allocated memory is only required for SOCK_STREAM and SOCK_SEQPACKET
sockets.
Reviewed by: kevans
Tested by: pho
Sponsored by: The FreeBSD Foundation
Differential Revision: https://reviews.freebsd.org/D26298
In all cases, PCBs are unlocked after unp_disconnect() returns. Since
unp_disconnect() may release the last PCB reference, callers may have to
bump the refcount before the call just so that they can release them
again.
Change unp_disconnect() to release PCB locks as well as connection
references; this lets us remove several refcount manipulations. Tighten
assertions.
Tested by: pho
Sponsored by: The FreeBSD Foundation
Differential Revision: https://reviews.freebsd.org/D26297
unp_pcb_lock_pair() seems like a better name. Also make it handle the
case where the two sockets are the same instead of making callers do it.
No functional change intended.
Reviewed by: glebius, kevans, kib
Tested by: pho
Sponsored by: The FreeBSD Foundation
Differential Revision: https://reviews.freebsd.org/D26296
- Use refcount_init().
- Define an INVARIANTS-only zone destructor to assert that various
bits of PCB state aren't left dangling.
- Annotate unp_pcb_rele() with __result_use_check.
- Simplify control flow.
Reviewed by: glebius, kevans, kib
Tested by: pho
Sponsored by: The FreeBSD Foundation
Differential Revision: https://reviews.freebsd.org/D26295
- Define a locking key for unpcb members.
- Rewrite some of the locking protocol description to make it less
verbose and avoid referencing some subroutines which will be renamed.
- Reorder includes.
Reviewed by: glebius, kevans, kib
Tested by: pho
Sponsored by: The FreeBSD Foundation
Differential Revision: https://reviews.freebsd.org/D26294
The constant seems to exists on MacOS X >= 10.8.
Requested by: swills
Reviewed by: allanjude, kevans
Sponsored by: The FreeBSD Foundation
MFC after: 1 week
Differential revision: https://reviews.freebsd.org/D25933
When processing the last record in a socket buffer, take care to avoid a
NULL pointer dereference when advancing the record iterator.
Reported by: syzbot+6a689cc9c27bd265237a@syzkaller.appspotmail.com
Fixes: r359778
MFC after: 1 week
Sponsored by: The FreeBSD Foundation
Previously the unpcb pointer of the newly connected remote socket was
not initialized correctly, so attempting to lock it would result in a
null pointer dereference.
Reported by: syzkaller
MFC after: 1 week
Sponsored by: The FreeBSD Foundation
If LOCAL_CREDS is set on a unix socket and sendfile() is called,
sendfile will call uipc_send(PRUS_NOTREADY), prepending a control
message to the M_NOTREADY mbufs. uipc_send() then calls
sbappendcontrol() instead of sbappend(), and sbappendcontrol() would
erroneously clear M_NOTREADY.
Pass send flags to sbappendcontrol(), like we do for sbappend(), to
preserve M_READY when necessary.
Reported by: syzkaller
MFC after: 2 weeks
Sponsored by: The FreeBSD Foundation
Differential Revision: https://reviews.freebsd.org/D24333
When transmitting over a unix socket, data is placed directly into the
receiving socket's receive buffer, instead of the transmitting socket's
send buffer. This means that when pru_ready is called during
sendfile(), the passed socket does not contain M_NOTREADY mbufs in its
buffers; uipc_ready() must locate the linked socket.
Currently uipc_ready() frees the mbufs if the socket is disconnected,
but this is wrong since the mbufs may still be present in the receiving
socket's buffer after a disconnect. This can result in a use-after-free
and potentially a double free if the receive buffer is flushed after
uipc_ready() frees the mbufs.
Fix the problem by trying harder to locate the correct socket buffer and
calling sbready(): use the global list of SOCK_STREAM unix sockets to
search for a sockbuf containing the now-ready mbufs. Only free the
mbufs if we fail this search.
Reviewed by: jah, kib
Reported and tested by: pho
MFC after: 1 week
Sponsored by: The FreeBSD Foundation
Differential Revision: https://reviews.freebsd.org/D24332
Remove a goto and an unneeded local variable, and fix style. No
functional change intended.
Tested by: pho
MFC after: 2 weeks
Sponsored by: The FreeBSD Foundation
unp_connectat() no longer holds the link lock across calls to
sonewconn(), so the recursion described in r303855 can no longer occur.
No functional change intended.
Tested by: pho
MFC after: 2 weeks
Sponsored by: The FreeBSD Foundation