Commit Graph

359 Commits

Author SHA1 Message Date
Conrad Meyer
85078b8573 Split out cwd/root/jail, cmask state from filedesc table
No functional change intended.

Tracking these structures separately for each proc enables future work to
correctly emulate clone(2) in linux(4).

__FreeBSD_version is bumped (to 1300130) for consumption by, e.g., lsof.

Reviewed by:	kib
Discussed with:	markj, mjg
Differential Revision:	https://reviews.freebsd.org/D27037
2020-11-17 21:14:13 +00:00
Conrad Meyer
ede4af47ae unix(4): Enhance LOCAL_CREDS_PERSISTENT ABI
As this ABI is still fresh (r367287), let's correct some mistakes now:

- Version the structure to allow for future changes
- Include sender's pid in control message structure
- Use a distinct control message type from the cmsgcred / sockcred mess

Discussed with:	kib, markj, trasz
Differential Revision:	https://reviews.freebsd.org/D27084
2020-11-17 20:01:21 +00:00
Konstantin Belousov
441eb16a95 Allow some VOPs to return ERELOOKUP to indicate VFS operation restart at top level.
Restart syscalls and some sync operations when filesystem indicated
ERELOOKUP condition, mostly for VOPs operating on metdata.  In
particular, lookup results cached in the inode/v_data is no longer
valid and needs recalculating.  Right now this should be nop.

Assert that ERELOOKUP is catched everywhere and not returned to
userspace, by asserting that td_errno != ERELOOKUP on syscall return
path.

In collaboration with:	pho
Reviewed by:	mckusick (previous version), markj
Tested by:	markj (syzkaller), pho
Sponsored by:	The FreeBSD Foundation
Differential revision:	https://reviews.freebsd.org/D26136
2020-11-13 09:42:32 +00:00
Mateusz Guzik
3c50616fc1 fd: make all f_count uses go through refcount_* 2020-11-05 02:12:33 +00:00
Conrad Meyer
2de07e4096 unix(4): Add SOL_LOCAL:LOCAL_CREDS_PERSISTENT
This option is intended to be semantically identical to Linux's
SOL_SOCKET:SO_PASSCRED.  For now, it is mutually exclusive with the
pre-existing sockopt SOL_LOCAL:LOCAL_CREDS.

Reviewed by:	markj (penultimate version)
Differential Revision:	https://reviews.freebsd.org/D27011
2020-11-03 01:17:45 +00:00
Mark Johnston
b4e07e3da5 Fix locking in uipc_accept().
Reported by:	cy
MFC after:	1 week
Sponsored by:	The FreeBSD Foundation
2020-09-15 23:03:56 +00:00
Mark Johnston
448000279e Fix locking in uipc_accept().
This function wasn't converted to use the new locking protocol in
r333744.  Make it use the PCB lock for synchronizing connection state.

Tested by:	pho
Sponsored by:	The FreeBSD Foundation
Differential Revision:	https://reviews.freebsd.org/D26300
2020-09-15 19:23:42 +00:00
Mark Johnston
ccdadf1a9b Simplify unix socket connection peer locking.
unp_pcb_owned_lock2() has some sharp edges and forces callers to deal
with a bunch of cases.  Simplify it:

- Rename to unp_pcb_lock_peer().
- Return the connected peer instead of forcing callers to load it
  beforehand.
- Handle self-connected sockets.
- In unp_connectat(), just lock the accept socket directly.  It should
  not be possible for the nascent socket to participate in any other
  lock orders.
- Get rid of connect_internal().  It does not provide any useful
  checking anymore.
- Block in unp_connectat() when a different thread is concurrently
  attempting to lock both sides of a connection.  This provides simpler
  semantics for callers of unp_pcb_lock_peer().
- Make unp_connectat() return EISCONN if the socket is already
  connected.  This fixes a race[1] when multiple threads attempt to
  connect() to different addresses using the same datagram socket.
  Upper layers will disconnect a connected datagram socket before
  calling the protocol connect's method, but there is no synchronization
  between this and protocol-layer code.

Reported by:	syzkaller [1]
Tested by:	pho
Sponsored by:	The FreeBSD Foundation
Differential Revision:	https://reviews.freebsd.org/D26299
2020-09-15 19:23:22 +00:00
Mark Johnston
ed92e1c78c Avoid an unnecessary malloc() when connecting dgram sockets.
The allocated memory is only required for SOCK_STREAM and SOCK_SEQPACKET
sockets.

Reviewed by:	kevans
Tested by:	pho
Sponsored by:	The FreeBSD Foundation
Differential Revision:	https://reviews.freebsd.org/D26298
2020-09-15 19:23:01 +00:00
Mark Johnston
f0317f868b Simplify unp_disconnect() callers.
In all cases, PCBs are unlocked after unp_disconnect() returns.  Since
unp_disconnect() may release the last PCB reference, callers may have to
bump the refcount before the call just so that they can release them
again.

Change unp_disconnect() to release PCB locks as well as connection
references; this lets us remove several refcount manipulations.  Tighten
assertions.

Tested by:	pho
Sponsored by:	The FreeBSD Foundation
Differential Revision:	https://reviews.freebsd.org/D26297
2020-09-15 19:22:37 +00:00
Mark Johnston
4820bf6ac2 Rename unp_pcb_lock2().
unp_pcb_lock_pair() seems like a better name.  Also make it handle the
case where the two sockets are the same instead of making callers do it.
No functional change intended.

Reviewed by:	glebius, kevans, kib
Tested by:	pho
Sponsored by:	The FreeBSD Foundation
Differential Revision:	https://reviews.freebsd.org/D26296
2020-09-15 19:22:16 +00:00
Mark Johnston
5362170da7 Improve unix socket PCB refcounting.
- Use refcount_init().
- Define an INVARIANTS-only zone destructor to assert that various
  bits of PCB state aren't left dangling.
- Annotate unp_pcb_rele() with __result_use_check.
- Simplify control flow.

Reviewed by:	glebius, kevans, kib
Tested by:	pho
Sponsored by:	The FreeBSD Foundation
Differential Revision:	https://reviews.freebsd.org/D26295
2020-09-15 19:21:58 +00:00
Mark Johnston
d5cbccecd8 Update unix domain socket locking comments.
- Define a locking key for unpcb members.
- Rewrite some of the locking protocol description to make it less
  verbose and avoid referencing some subroutines which will be renamed.
- Reorder includes.

Reviewed by:	glebius, kevans, kib
Tested by:	pho
Sponsored by:	The FreeBSD Foundation
Differential Revision:	https://reviews.freebsd.org/D26294
2020-09-15 19:21:33 +00:00
Mateusz Guzik
6fed89b179 kern: clean up empty lines in .c and .h files 2020-09-01 22:12:32 +00:00
Konstantin Belousov
6e0c8e1ae2 Add SOL_LOCAL symbolic constant for unix socket option level.
The constant seems to exists on MacOS X >= 10.8.

Requested by:	swills
Reviewed by:	allanjude, kevans
Sponsored by:	The FreeBSD Foundation
MFC after:	1 week
Differential revision:	https://reviews.freebsd.org/D25933
2020-08-03 22:13:02 +00:00
Mark Johnston
1b778ba260 Fix a logic error in uipc_ready_scan().
When processing the last record in a socket buffer, take care to avoid a
NULL pointer dereference when advancing the record iterator.

Reported by:	syzbot+6a689cc9c27bd265237a@syzkaller.appspotmail.com
Fixes:		r359778
MFC after:	1 week
Sponsored by:	The FreeBSD Foundation
2020-07-30 00:52:37 +00:00
Mark Johnston
b36871af6d Fix sendto() on unconnected SOCK_STREAM/SEQPACKET unix sockets.
Previously the unpcb pointer of the newly connected remote socket was
not initialized correctly, so attempting to lock it would result in a
null pointer dereference.

Reported by:	syzkaller
MFC after:	1 week
Sponsored by:	The FreeBSD Foundation
2020-04-13 19:22:05 +00:00
Mark Johnston
25f4ddfb2b sbappendcontrol() needs to avoid clearing M_NOTREADY on data mbufs.
If LOCAL_CREDS is set on a unix socket and sendfile() is called,
sendfile will call uipc_send(PRUS_NOTREADY), prepending a control
message to the M_NOTREADY mbufs.  uipc_send() then calls
sbappendcontrol() instead of sbappend(), and sbappendcontrol() would
erroneously clear M_NOTREADY.

Pass send flags to sbappendcontrol(), like we do for sbappend(), to
preserve M_READY when necessary.

Reported by:	syzkaller
MFC after:	2 weeks
Sponsored by:	The FreeBSD Foundation
Differential Revision:	https://reviews.freebsd.org/D24333
2020-04-10 20:42:11 +00:00
Mark Johnston
a50b1900a0 Properly handle disconnected sockets in uipc_ready().
When transmitting over a unix socket, data is placed directly into the
receiving socket's receive buffer, instead of the transmitting socket's
send buffer.  This means that when pru_ready is called during
sendfile(), the passed socket does not contain M_NOTREADY mbufs in its
buffers; uipc_ready() must locate the linked socket.

Currently uipc_ready() frees the mbufs if the socket is disconnected,
but this is wrong since the mbufs may still be present in the receiving
socket's buffer after a disconnect.  This can result in a use-after-free
and potentially a double free if the receive buffer is flushed after
uipc_ready() frees the mbufs.

Fix the problem by trying harder to locate the correct socket buffer and
calling sbready(): use the global list of SOCK_STREAM unix sockets to
search for a sockbuf containing the now-ready mbufs.  Only free the
mbufs if we fail this search.

Reviewed by:	jah, kib
Reported and tested by:	pho
MFC after:	1 week
Sponsored by:	The FreeBSD Foundation
Differential Revision:	https://reviews.freebsd.org/D24332
2020-04-10 20:41:59 +00:00
Mark Johnston
db38b699b4 Simplify uipc_detach() slightly.
Remove a goto and an unneeded local variable, and fix style.  No
functional change intended.

Tested by:	pho
MFC after:	2 weeks
Sponsored by:	The FreeBSD Foundation
2020-03-20 16:18:54 +00:00
Mark Johnston
569a186d02 Remove UNP_NASCENT, reverting r303855.
unp_connectat() no longer holds the link lock across calls to
sonewconn(), so the recursion described in r303855 can no longer occur.
No functional change intended.

Tested by:	pho
MFC after:	2 weeks
Sponsored by:	The FreeBSD Foundation
2020-03-20 16:17:54 +00:00
Pawel Biernacki
7029da5c36 Mark more nodes as CTLFLAG_MPSAFE or CTLFLAG_NEEDGIANT (17 of many)
r357614 added CTLFLAG_NEEDGIANT to make it easier to find nodes that are
still not MPSAFE (or already are but aren’t properly marked).
Use it in preparation for a general review of all nodes.

This is non-functional change that adds annotations to SYSCTL_NODE and
SYSCTL_PROC nodes using one of the soon-to-be-required flags.

Mark all obvious cases as MPSAFE.  All entries that haven't been marked
as MPSAFE before are by default marked as NEEDGIANT

Approved by:	kib (mentor, blanket)
Commented by:	kib, gallatin, melifaro
Differential Revision:	https://reviews.freebsd.org/D23718
2020-02-26 14:26:36 +00:00
Mateusz Guzik
3ff65f71cb Remove duplicated empty lines from kern/*.c
No functional changes.
2020-01-30 20:05:05 +00:00
Jason A. Harmening
a9aa06f7b1 Implement cycle-detecting garbage collector for AF_UNIX sockets
The existing AF_UNIX socket garbage collector destroys any socket
which may potentially be in a cycle, as indicated by its file reference
count being equal to its enqueue count. However, this can produce false
positives for in-flight sockets which aren't part of a cycle but are
part of one or more SCM_RIGHTS mssages and which have been closed
on the sending side. If the garbage collector happens to run at
exactly the wrong time, destruction of these sockets will render them
unusable on the receiving side, such that no previously-written data
may be read.

This change rewrites the garbage collector to precisely detect cycles:

1. The existing check of msgcount==f_count is still used to determine
   whether the socket is potentially in a cycle.
2. The socket is now placed on a local "dead list", which is used to
   reduce iteration time (and therefore contention on the global
   unp_link_rwlock).
3. The first pass through the dead list removes each potentially-dead
   socket's outgoing references from the graph of potentially-dead
   sockets, using a gc-specific copy of the original reference count.
4. The second series of passes through the dead list removes from the
   list any socket whose remaining gc refcount is non-zero, as this
   indicates the socket is actually accessible outside of any possible
   cycle.  Iteration is repeated until no further sockets are removed
   from the dead list.
5. Sockets remaining in the dead list are destroyed as before.

PR:		227285
Submitted by:	jan.kokemueller@gmail.com (prior version)
Reviewed by:	markj
Differential Revision:	https://reviews.freebsd.org/D23142
2020-01-25 08:57:26 +00:00
Mateusz Guzik
b249ce48ea vfs: drop the mostly unused flags argument from VOP_UNLOCK
Filesystems which want to use it in limited capacity can employ the
VOP_UNLOCK_FLAGS macro.

Reviewed by:	kib (previous version)
Differential Revision:	https://reviews.freebsd.org/D21427
2020-01-03 22:29:58 +00:00
Mark Johnston
4013d72684 Fix handling of empty SCM_RIGHTS messages.
As unp_internalize() processes the input control messages, it builds
an output mbuf chain containing the internalized representations of
those messages.  In one special case, that of an empty SCM_RIGHTS
message, the message is simply discarded.  However, the loop which
appends mbufs to the output chain assumed that each iteration would
produce an mbuf, resulting in a null pointer dereference if an empty
SCM_RIGHTS message was followed by a non-empty message.

Fix this by advancing the output mbuf chain tail pointer only if an
internalized control message was produced.

Reported by:	syzbot+1b5cced0f7fad26ae382@syzkaller.appspotmail.com
MFC after:	1 week
Sponsored by:	The FreeBSD Foundation
2019-10-08 23:34:48 +00:00
Konstantin Belousov
f1cf2b9dcb Check and avoid overflow when incrementing fp->f_count in
fget_unlocked() and fhold().

On sufficiently large machine, f_count can be legitimately very large,
e.g. malicious code can dup same fd up to the per-process
filedescriptors limit, and then fork as much as it can.
On some smaller machine, I see
	kern.maxfilesperproc: 939132
	kern.maxprocperuid: 34203
which already overflows u_int.  More, the malicious code can create
transient references by sending fds over unix sockets.

I realized that this check is missed after reading
https://secfault-security.com/blog/FreeBSD-SA-1902.fd.html

Reviewed by:	markj (previous version), mjg
Tested by:	pho (previous version)
Sponsored by:	The FreeBSD Foundation
MFC after:	1 week
Differential revision:	https://reviews.freebsd.org/D20947
2019-07-21 15:07:12 +00:00
Konstantin Belousov
47c3450e50 Fix leak of memory and file refs with sendmsg(2) over unix domain sockets.
When sendmsg(2) sucessfully internalized one SCM_RIGHTS control
message, but failed to process some other control message later, both
file references and filedescent memory needs to be freed. This was not
done, only mbuf chain was freed.

Noted, test case written, reviewed by:	markj
Sponsored by:	The FreeBSD Foundation
MFC after:	1 week
Differential revision:	https://reviews.freebsd.org/D21000
2019-07-19 20:51:39 +00:00
Dmitry Chagin
c5afec6e89 Complete LOCAL_PEERCRED support. Cache pid of the remote process in the
struct xucred. Do not bump XUCRED_VERSION as struct layout is not changed.

PR:		215202
Reviewed by:	tijl
MFC after:	1 week
Differential Revision:	https://reviews.freebsd.org/D20415
2019-05-30 14:24:26 +00:00
Mark Johnston
79db6fe7aa Plug some networking sysctl leaks.
Various network protocol sysctl handlers were not zero-filling their
output buffers and thus would export uninitialized stack memory to
userland.  Fix a number of such handlers.

Reported by:	Thomas Barabosch, Fraunhofer FKIE
Reviewed by:	tuexen
MFC after:	3 days
Security:	kernel memory disclosure
Sponsored by:	The FreeBSD Foundation
Differential Revision:	https://reviews.freebsd.org/D18301
2018-11-22 20:49:41 +00:00
Mateusz Guzik
f218ac5087 uipc_usrreq: fix inode number assignment
The code was incrementing a global variable in an unsafe manner.
Two different threads stating two different sockets could have resulted
in the same inode numbers assigned to both.

Creation is protected with a global lock, move the assigment there.
Since inode numbers are 64-bit now drop the check for overflows.

Sponsored by:	The FreeBSD Foundation
2018-11-21 22:25:05 +00:00
Mark Johnston
c7902fbeae Improve handling of control message truncation.
If a recvmsg(2) or recvmmsg(2) caller doesn't provide sufficient space
for all control messages, the kernel sets MSG_CTRUNC in the message
flags to indicate truncation of the control messages.  In the case
of SCM_RIGHTS messages, however, we were failing to dispose of the
rights that had already been externalized into the recipient's file
descriptor table.  Add a new function and mbuf type to handle this
cleanup task, and use it any time we fail to copy control messages
out to the recipient.  To simplify cleanup, control message truncation
is now only performed at control message boundaries.

The change also fixes a few related bugs:
- Rights could be leaked to the recipient process if an error occurred
  while copying out a message's contents.
- We failed to set MSG_CTRUNC if the truncation occurred on a control
  message boundary, e.g., if the caller received two control messages
  and provided only the exact amount of buffer space needed for the
  first.

PR:		131876
Reviewed by:	ed (previous version)
MFC after:	1 month
Sponsored by:	The FreeBSD Foundation
Differential Revision:	https://reviews.freebsd.org/D16561
2018-08-07 16:36:48 +00:00
Mark Johnston
5b0480f2cc Don't check rcv sockbuf limits when sending on a unix stream socket.
sosend_generic() performs an initial comparison of the amount of data
(including control messages) to be transmitted with the send buffer
size. When transmitting on a unix socket, we then compare the amount
of data being sent with the amount of space in the receive buffer size;
if insufficient space is available, sbappendcontrol() returns an error
and the data is lost.  This is easily triggered by sending control
messages together with an amount of data roughly equal to the send
buffer size, since the control message size may change in uipc_send()
as file descriptors are internalized.

Fix the problem by removing the space check in sbappendcontrol(),
whose only consumer is the unix sockets code.  The stream sockets code
uses the SB_STOP mechanism to ensure that senders will block if the
receive buffer fills up.

PR:		181741
MFC after:	1 month
Sponsored by:	The FreeBSD Foundation
Differential Revision:	https://reviews.freebsd.org/D16515
2018-08-04 20:26:54 +00:00
Mark Johnston
e62ca80bde Style. 2018-08-04 20:16:36 +00:00
Alan Somers
da4465506d Fix LOCAL_PEERCRED with socketpair(2)
Enable the LOCAL_PEERCRED socket option for unix domain stream sockets
created with socketpair(2). Previously, it only worked with unix domain
stream sockets created with socket(2)/listen(2)/connect(2)/accept(2).

PR:		176419
Reported by:	Nicholas Wilson <nicholas@nicholaswilson.me.uk>
MFC after:	2 weeks
Differential Revision:	https://reviews.freebsd.org/D16350
2018-08-03 01:37:00 +00:00
Brooks Davis
3a20f06a1c Use uintptr_t alone when assigning to kvaddr_t variables.
Suggested by:	jhb
2018-07-10 13:03:06 +00:00
Brooks Davis
7524b4c14b Correct breakage on 32-bit platforms from r335979. 2018-07-06 10:03:33 +00:00
Brooks Davis
f38b68ae8a Make struct xinpcb and friends word-size independent.
Replace size_t members with ksize_t (uint64_t) and pointer members
(never used as pointers in userspace, but instead as unique
idenitifiers) with kvaddr_t (uint64_t). This makes the structs
identical between 32-bit and 64-bit ABIs.

On 64-bit bit systems, the ABI is maintained. On 32-bit systems,
this is an ABI breaking change. The ABI of most of these structs
was previously broken in r315662.  This also imposes a small API
change on userspace consumers who must handle kernel pointers
becoming virtual addresses.

PR:		228301 (exp-run by antoine)
Reviewed by:	jtl, kib, rwatson (various versions)
Sponsored by:	DARPA, AFRL
Differential Revision:	https://reviews.freebsd.org/D15386
2018-07-05 13:13:48 +00:00
Matt Macy
a62b4665f4 AF_UNIX: bring uipc_ready in compliance with new locking protocol
PR:	228742
Submitted by: markj
Reviewed by:	markj
2018-06-08 20:31:59 +00:00
Matt Macy
fcabd54160 AF_UNIX: check for unp == unp2 on disconnect 2018-06-07 04:57:40 +00:00
Matt Macy
acf9fd05d8 AF_UNIX: It is possible for UNIX datagram sockets to be connected
to themselves. The updated code assumed that that could not happen
and would try to lock the unp mutex twice.

There may be a lingering issue here but this fixes it for the
reporter.

PR:	228458
Reported by:	marieheleneka at gmail.com
2018-05-24 21:13:46 +00:00
Matt Macy
c684c14ce3 AF_UNIX: evidently Samba likes to connect a unix socket to itself, fix locking 2018-05-24 18:22:13 +00:00
Matt Macy
a3a734908b AF_UNIX in connectat unp and unp2 can be the same 2018-05-24 18:22:05 +00:00
Matt Macy
16529dace8 AF_UNIX: assert that we're not acquiring the same lock 2018-05-24 15:28:16 +00:00
Matt Macy
9725c9cef7 AF_UNIX gc unused label
...sigh
2018-05-20 21:37:34 +00:00
Matt Macy
cb8f450b94 AF_UNIX: Don't unlock unp/unp2 if they're not locked
Reported by:	mjg
2018-05-20 21:20:26 +00:00
Matt Macy
e10ef65d23 AF_UNIX: fix LOR introduced by the locking rewrite 2018-05-20 05:50:53 +00:00
Matt Macy
d95253403f AF_UNIX: make unpcb lock name line up with what's in witness 2018-05-20 04:32:48 +00:00
Warner Losh
f344fb0b4b Restore the all rights reserved language. Put it on each of the prior
two copyrights. The line originated with the Berkeely Regents, who
we have not approached about removing it (it's honestly too trivial
to be worth that fight). Restore it to rwatson's line as well. He
can decide if he wants it or not on his own. Matt clearly doesn't
want it, per project preference and his own statements on IRC.

Noticed by: rgrimes@
2018-05-19 17:29:57 +00:00
Matt Macy
4b06dee1e5 AF_UNIX: switch to annotations to avoid warnings 2018-05-19 05:37:58 +00:00