No functional change intended.
Tracking these structures separately for each proc enables future work to
correctly emulate clone(2) in linux(4).
__FreeBSD_version is bumped (to 1300130) for consumption by, e.g., lsof.
Reviewed by: kib
Discussed with: markj, mjg
Differential Revision: https://reviews.freebsd.org/D27037
As this ABI is still fresh (r367287), let's correct some mistakes now:
- Version the structure to allow for future changes
- Include sender's pid in control message structure
- Use a distinct control message type from the cmsgcred / sockcred mess
Discussed with: kib, markj, trasz
Differential Revision: https://reviews.freebsd.org/D27084
Restart syscalls and some sync operations when filesystem indicated
ERELOOKUP condition, mostly for VOPs operating on metdata. In
particular, lookup results cached in the inode/v_data is no longer
valid and needs recalculating. Right now this should be nop.
Assert that ERELOOKUP is catched everywhere and not returned to
userspace, by asserting that td_errno != ERELOOKUP on syscall return
path.
In collaboration with: pho
Reviewed by: mckusick (previous version), markj
Tested by: markj (syzkaller), pho
Sponsored by: The FreeBSD Foundation
Differential revision: https://reviews.freebsd.org/D26136
This option is intended to be semantically identical to Linux's
SOL_SOCKET:SO_PASSCRED. For now, it is mutually exclusive with the
pre-existing sockopt SOL_LOCAL:LOCAL_CREDS.
Reviewed by: markj (penultimate version)
Differential Revision: https://reviews.freebsd.org/D27011
This function wasn't converted to use the new locking protocol in
r333744. Make it use the PCB lock for synchronizing connection state.
Tested by: pho
Sponsored by: The FreeBSD Foundation
Differential Revision: https://reviews.freebsd.org/D26300
unp_pcb_owned_lock2() has some sharp edges and forces callers to deal
with a bunch of cases. Simplify it:
- Rename to unp_pcb_lock_peer().
- Return the connected peer instead of forcing callers to load it
beforehand.
- Handle self-connected sockets.
- In unp_connectat(), just lock the accept socket directly. It should
not be possible for the nascent socket to participate in any other
lock orders.
- Get rid of connect_internal(). It does not provide any useful
checking anymore.
- Block in unp_connectat() when a different thread is concurrently
attempting to lock both sides of a connection. This provides simpler
semantics for callers of unp_pcb_lock_peer().
- Make unp_connectat() return EISCONN if the socket is already
connected. This fixes a race[1] when multiple threads attempt to
connect() to different addresses using the same datagram socket.
Upper layers will disconnect a connected datagram socket before
calling the protocol connect's method, but there is no synchronization
between this and protocol-layer code.
Reported by: syzkaller [1]
Tested by: pho
Sponsored by: The FreeBSD Foundation
Differential Revision: https://reviews.freebsd.org/D26299
The allocated memory is only required for SOCK_STREAM and SOCK_SEQPACKET
sockets.
Reviewed by: kevans
Tested by: pho
Sponsored by: The FreeBSD Foundation
Differential Revision: https://reviews.freebsd.org/D26298
In all cases, PCBs are unlocked after unp_disconnect() returns. Since
unp_disconnect() may release the last PCB reference, callers may have to
bump the refcount before the call just so that they can release them
again.
Change unp_disconnect() to release PCB locks as well as connection
references; this lets us remove several refcount manipulations. Tighten
assertions.
Tested by: pho
Sponsored by: The FreeBSD Foundation
Differential Revision: https://reviews.freebsd.org/D26297
unp_pcb_lock_pair() seems like a better name. Also make it handle the
case where the two sockets are the same instead of making callers do it.
No functional change intended.
Reviewed by: glebius, kevans, kib
Tested by: pho
Sponsored by: The FreeBSD Foundation
Differential Revision: https://reviews.freebsd.org/D26296
- Use refcount_init().
- Define an INVARIANTS-only zone destructor to assert that various
bits of PCB state aren't left dangling.
- Annotate unp_pcb_rele() with __result_use_check.
- Simplify control flow.
Reviewed by: glebius, kevans, kib
Tested by: pho
Sponsored by: The FreeBSD Foundation
Differential Revision: https://reviews.freebsd.org/D26295
- Define a locking key for unpcb members.
- Rewrite some of the locking protocol description to make it less
verbose and avoid referencing some subroutines which will be renamed.
- Reorder includes.
Reviewed by: glebius, kevans, kib
Tested by: pho
Sponsored by: The FreeBSD Foundation
Differential Revision: https://reviews.freebsd.org/D26294
The constant seems to exists on MacOS X >= 10.8.
Requested by: swills
Reviewed by: allanjude, kevans
Sponsored by: The FreeBSD Foundation
MFC after: 1 week
Differential revision: https://reviews.freebsd.org/D25933
When processing the last record in a socket buffer, take care to avoid a
NULL pointer dereference when advancing the record iterator.
Reported by: syzbot+6a689cc9c27bd265237a@syzkaller.appspotmail.com
Fixes: r359778
MFC after: 1 week
Sponsored by: The FreeBSD Foundation
Previously the unpcb pointer of the newly connected remote socket was
not initialized correctly, so attempting to lock it would result in a
null pointer dereference.
Reported by: syzkaller
MFC after: 1 week
Sponsored by: The FreeBSD Foundation
If LOCAL_CREDS is set on a unix socket and sendfile() is called,
sendfile will call uipc_send(PRUS_NOTREADY), prepending a control
message to the M_NOTREADY mbufs. uipc_send() then calls
sbappendcontrol() instead of sbappend(), and sbappendcontrol() would
erroneously clear M_NOTREADY.
Pass send flags to sbappendcontrol(), like we do for sbappend(), to
preserve M_READY when necessary.
Reported by: syzkaller
MFC after: 2 weeks
Sponsored by: The FreeBSD Foundation
Differential Revision: https://reviews.freebsd.org/D24333
When transmitting over a unix socket, data is placed directly into the
receiving socket's receive buffer, instead of the transmitting socket's
send buffer. This means that when pru_ready is called during
sendfile(), the passed socket does not contain M_NOTREADY mbufs in its
buffers; uipc_ready() must locate the linked socket.
Currently uipc_ready() frees the mbufs if the socket is disconnected,
but this is wrong since the mbufs may still be present in the receiving
socket's buffer after a disconnect. This can result in a use-after-free
and potentially a double free if the receive buffer is flushed after
uipc_ready() frees the mbufs.
Fix the problem by trying harder to locate the correct socket buffer and
calling sbready(): use the global list of SOCK_STREAM unix sockets to
search for a sockbuf containing the now-ready mbufs. Only free the
mbufs if we fail this search.
Reviewed by: jah, kib
Reported and tested by: pho
MFC after: 1 week
Sponsored by: The FreeBSD Foundation
Differential Revision: https://reviews.freebsd.org/D24332
Remove a goto and an unneeded local variable, and fix style. No
functional change intended.
Tested by: pho
MFC after: 2 weeks
Sponsored by: The FreeBSD Foundation
unp_connectat() no longer holds the link lock across calls to
sonewconn(), so the recursion described in r303855 can no longer occur.
No functional change intended.
Tested by: pho
MFC after: 2 weeks
Sponsored by: The FreeBSD Foundation
r357614 added CTLFLAG_NEEDGIANT to make it easier to find nodes that are
still not MPSAFE (or already are but aren’t properly marked).
Use it in preparation for a general review of all nodes.
This is non-functional change that adds annotations to SYSCTL_NODE and
SYSCTL_PROC nodes using one of the soon-to-be-required flags.
Mark all obvious cases as MPSAFE. All entries that haven't been marked
as MPSAFE before are by default marked as NEEDGIANT
Approved by: kib (mentor, blanket)
Commented by: kib, gallatin, melifaro
Differential Revision: https://reviews.freebsd.org/D23718
The existing AF_UNIX socket garbage collector destroys any socket
which may potentially be in a cycle, as indicated by its file reference
count being equal to its enqueue count. However, this can produce false
positives for in-flight sockets which aren't part of a cycle but are
part of one or more SCM_RIGHTS mssages and which have been closed
on the sending side. If the garbage collector happens to run at
exactly the wrong time, destruction of these sockets will render them
unusable on the receiving side, such that no previously-written data
may be read.
This change rewrites the garbage collector to precisely detect cycles:
1. The existing check of msgcount==f_count is still used to determine
whether the socket is potentially in a cycle.
2. The socket is now placed on a local "dead list", which is used to
reduce iteration time (and therefore contention on the global
unp_link_rwlock).
3. The first pass through the dead list removes each potentially-dead
socket's outgoing references from the graph of potentially-dead
sockets, using a gc-specific copy of the original reference count.
4. The second series of passes through the dead list removes from the
list any socket whose remaining gc refcount is non-zero, as this
indicates the socket is actually accessible outside of any possible
cycle. Iteration is repeated until no further sockets are removed
from the dead list.
5. Sockets remaining in the dead list are destroyed as before.
PR: 227285
Submitted by: jan.kokemueller@gmail.com (prior version)
Reviewed by: markj
Differential Revision: https://reviews.freebsd.org/D23142
Filesystems which want to use it in limited capacity can employ the
VOP_UNLOCK_FLAGS macro.
Reviewed by: kib (previous version)
Differential Revision: https://reviews.freebsd.org/D21427
As unp_internalize() processes the input control messages, it builds
an output mbuf chain containing the internalized representations of
those messages. In one special case, that of an empty SCM_RIGHTS
message, the message is simply discarded. However, the loop which
appends mbufs to the output chain assumed that each iteration would
produce an mbuf, resulting in a null pointer dereference if an empty
SCM_RIGHTS message was followed by a non-empty message.
Fix this by advancing the output mbuf chain tail pointer only if an
internalized control message was produced.
Reported by: syzbot+1b5cced0f7fad26ae382@syzkaller.appspotmail.com
MFC after: 1 week
Sponsored by: The FreeBSD Foundation
fget_unlocked() and fhold().
On sufficiently large machine, f_count can be legitimately very large,
e.g. malicious code can dup same fd up to the per-process
filedescriptors limit, and then fork as much as it can.
On some smaller machine, I see
kern.maxfilesperproc: 939132
kern.maxprocperuid: 34203
which already overflows u_int. More, the malicious code can create
transient references by sending fds over unix sockets.
I realized that this check is missed after reading
https://secfault-security.com/blog/FreeBSD-SA-1902.fd.html
Reviewed by: markj (previous version), mjg
Tested by: pho (previous version)
Sponsored by: The FreeBSD Foundation
MFC after: 1 week
Differential revision: https://reviews.freebsd.org/D20947
When sendmsg(2) sucessfully internalized one SCM_RIGHTS control
message, but failed to process some other control message later, both
file references and filedescent memory needs to be freed. This was not
done, only mbuf chain was freed.
Noted, test case written, reviewed by: markj
Sponsored by: The FreeBSD Foundation
MFC after: 1 week
Differential revision: https://reviews.freebsd.org/D21000
struct xucred. Do not bump XUCRED_VERSION as struct layout is not changed.
PR: 215202
Reviewed by: tijl
MFC after: 1 week
Differential Revision: https://reviews.freebsd.org/D20415
Various network protocol sysctl handlers were not zero-filling their
output buffers and thus would export uninitialized stack memory to
userland. Fix a number of such handlers.
Reported by: Thomas Barabosch, Fraunhofer FKIE
Reviewed by: tuexen
MFC after: 3 days
Security: kernel memory disclosure
Sponsored by: The FreeBSD Foundation
Differential Revision: https://reviews.freebsd.org/D18301
The code was incrementing a global variable in an unsafe manner.
Two different threads stating two different sockets could have resulted
in the same inode numbers assigned to both.
Creation is protected with a global lock, move the assigment there.
Since inode numbers are 64-bit now drop the check for overflows.
Sponsored by: The FreeBSD Foundation
If a recvmsg(2) or recvmmsg(2) caller doesn't provide sufficient space
for all control messages, the kernel sets MSG_CTRUNC in the message
flags to indicate truncation of the control messages. In the case
of SCM_RIGHTS messages, however, we were failing to dispose of the
rights that had already been externalized into the recipient's file
descriptor table. Add a new function and mbuf type to handle this
cleanup task, and use it any time we fail to copy control messages
out to the recipient. To simplify cleanup, control message truncation
is now only performed at control message boundaries.
The change also fixes a few related bugs:
- Rights could be leaked to the recipient process if an error occurred
while copying out a message's contents.
- We failed to set MSG_CTRUNC if the truncation occurred on a control
message boundary, e.g., if the caller received two control messages
and provided only the exact amount of buffer space needed for the
first.
PR: 131876
Reviewed by: ed (previous version)
MFC after: 1 month
Sponsored by: The FreeBSD Foundation
Differential Revision: https://reviews.freebsd.org/D16561
sosend_generic() performs an initial comparison of the amount of data
(including control messages) to be transmitted with the send buffer
size. When transmitting on a unix socket, we then compare the amount
of data being sent with the amount of space in the receive buffer size;
if insufficient space is available, sbappendcontrol() returns an error
and the data is lost. This is easily triggered by sending control
messages together with an amount of data roughly equal to the send
buffer size, since the control message size may change in uipc_send()
as file descriptors are internalized.
Fix the problem by removing the space check in sbappendcontrol(),
whose only consumer is the unix sockets code. The stream sockets code
uses the SB_STOP mechanism to ensure that senders will block if the
receive buffer fills up.
PR: 181741
MFC after: 1 month
Sponsored by: The FreeBSD Foundation
Differential Revision: https://reviews.freebsd.org/D16515
Enable the LOCAL_PEERCRED socket option for unix domain stream sockets
created with socketpair(2). Previously, it only worked with unix domain
stream sockets created with socket(2)/listen(2)/connect(2)/accept(2).
PR: 176419
Reported by: Nicholas Wilson <nicholas@nicholaswilson.me.uk>
MFC after: 2 weeks
Differential Revision: https://reviews.freebsd.org/D16350
Replace size_t members with ksize_t (uint64_t) and pointer members
(never used as pointers in userspace, but instead as unique
idenitifiers) with kvaddr_t (uint64_t). This makes the structs
identical between 32-bit and 64-bit ABIs.
On 64-bit bit systems, the ABI is maintained. On 32-bit systems,
this is an ABI breaking change. The ABI of most of these structs
was previously broken in r315662. This also imposes a small API
change on userspace consumers who must handle kernel pointers
becoming virtual addresses.
PR: 228301 (exp-run by antoine)
Reviewed by: jtl, kib, rwatson (various versions)
Sponsored by: DARPA, AFRL
Differential Revision: https://reviews.freebsd.org/D15386
to themselves. The updated code assumed that that could not happen
and would try to lock the unp mutex twice.
There may be a lingering issue here but this fixes it for the
reporter.
PR: 228458
Reported by: marieheleneka at gmail.com
two copyrights. The line originated with the Berkeely Regents, who
we have not approached about removing it (it's honestly too trivial
to be worth that fight). Restore it to rwatson's line as well. He
can decide if he wants it or not on his own. Matt clearly doesn't
want it, per project preference and his own statements on IRC.
Noticed by: rgrimes@