Commit Graph

333 Commits

Author SHA1 Message Date
kib
7d29da5483 Check and avoid overflow when incrementing fp->f_count in
fget_unlocked() and fhold().

On sufficiently large machine, f_count can be legitimately very large,
e.g. malicious code can dup same fd up to the per-process
filedescriptors limit, and then fork as much as it can.
On some smaller machine, I see
	kern.maxfilesperproc: 939132
	kern.maxprocperuid: 34203
which already overflows u_int.  More, the malicious code can create
transient references by sending fds over unix sockets.

I realized that this check is missed after reading
https://secfault-security.com/blog/FreeBSD-SA-1902.fd.html

Reviewed by:	markj (previous version), mjg
Tested by:	pho (previous version)
Sponsored by:	The FreeBSD Foundation
MFC after:	1 week
Differential revision:	https://reviews.freebsd.org/D20947
2019-07-21 15:07:12 +00:00
kib
ab87bfcd7a Fix leak of memory and file refs with sendmsg(2) over unix domain sockets.
When sendmsg(2) sucessfully internalized one SCM_RIGHTS control
message, but failed to process some other control message later, both
file references and filedescent memory needs to be freed. This was not
done, only mbuf chain was freed.

Noted, test case written, reviewed by:	markj
Sponsored by:	The FreeBSD Foundation
MFC after:	1 week
Differential revision:	https://reviews.freebsd.org/D21000
2019-07-19 20:51:39 +00:00
dchagin
a25b408b04 Complete LOCAL_PEERCRED support. Cache pid of the remote process in the
struct xucred. Do not bump XUCRED_VERSION as struct layout is not changed.

PR:		215202
Reviewed by:	tijl
MFC after:	1 week
Differential Revision:	https://reviews.freebsd.org/D20415
2019-05-30 14:24:26 +00:00
markj
5c563658ea Plug some networking sysctl leaks.
Various network protocol sysctl handlers were not zero-filling their
output buffers and thus would export uninitialized stack memory to
userland.  Fix a number of such handlers.

Reported by:	Thomas Barabosch, Fraunhofer FKIE
Reviewed by:	tuexen
MFC after:	3 days
Security:	kernel memory disclosure
Sponsored by:	The FreeBSD Foundation
Differential Revision:	https://reviews.freebsd.org/D18301
2018-11-22 20:49:41 +00:00
mjg
ffdee46ab5 uipc_usrreq: fix inode number assignment
The code was incrementing a global variable in an unsafe manner.
Two different threads stating two different sockets could have resulted
in the same inode numbers assigned to both.

Creation is protected with a global lock, move the assigment there.
Since inode numbers are 64-bit now drop the check for overflows.

Sponsored by:	The FreeBSD Foundation
2018-11-21 22:25:05 +00:00
markj
7a979485ab Improve handling of control message truncation.
If a recvmsg(2) or recvmmsg(2) caller doesn't provide sufficient space
for all control messages, the kernel sets MSG_CTRUNC in the message
flags to indicate truncation of the control messages.  In the case
of SCM_RIGHTS messages, however, we were failing to dispose of the
rights that had already been externalized into the recipient's file
descriptor table.  Add a new function and mbuf type to handle this
cleanup task, and use it any time we fail to copy control messages
out to the recipient.  To simplify cleanup, control message truncation
is now only performed at control message boundaries.

The change also fixes a few related bugs:
- Rights could be leaked to the recipient process if an error occurred
  while copying out a message's contents.
- We failed to set MSG_CTRUNC if the truncation occurred on a control
  message boundary, e.g., if the caller received two control messages
  and provided only the exact amount of buffer space needed for the
  first.

PR:		131876
Reviewed by:	ed (previous version)
MFC after:	1 month
Sponsored by:	The FreeBSD Foundation
Differential Revision:	https://reviews.freebsd.org/D16561
2018-08-07 16:36:48 +00:00
markj
75b64fe9d9 Don't check rcv sockbuf limits when sending on a unix stream socket.
sosend_generic() performs an initial comparison of the amount of data
(including control messages) to be transmitted with the send buffer
size. When transmitting on a unix socket, we then compare the amount
of data being sent with the amount of space in the receive buffer size;
if insufficient space is available, sbappendcontrol() returns an error
and the data is lost.  This is easily triggered by sending control
messages together with an amount of data roughly equal to the send
buffer size, since the control message size may change in uipc_send()
as file descriptors are internalized.

Fix the problem by removing the space check in sbappendcontrol(),
whose only consumer is the unix sockets code.  The stream sockets code
uses the SB_STOP mechanism to ensure that senders will block if the
receive buffer fills up.

PR:		181741
MFC after:	1 month
Sponsored by:	The FreeBSD Foundation
Differential Revision:	https://reviews.freebsd.org/D16515
2018-08-04 20:26:54 +00:00
markj
c0f926949d Style. 2018-08-04 20:16:36 +00:00
asomers
fabe732b5e Fix LOCAL_PEERCRED with socketpair(2)
Enable the LOCAL_PEERCRED socket option for unix domain stream sockets
created with socketpair(2). Previously, it only worked with unix domain
stream sockets created with socket(2)/listen(2)/connect(2)/accept(2).

PR:		176419
Reported by:	Nicholas Wilson <nicholas@nicholaswilson.me.uk>
MFC after:	2 weeks
Differential Revision:	https://reviews.freebsd.org/D16350
2018-08-03 01:37:00 +00:00
brooks
39f527e7ee Use uintptr_t alone when assigning to kvaddr_t variables.
Suggested by:	jhb
2018-07-10 13:03:06 +00:00
brooks
8baf738e84 Correct breakage on 32-bit platforms from r335979. 2018-07-06 10:03:33 +00:00
brooks
6615ed4c61 Make struct xinpcb and friends word-size independent.
Replace size_t members with ksize_t (uint64_t) and pointer members
(never used as pointers in userspace, but instead as unique
idenitifiers) with kvaddr_t (uint64_t). This makes the structs
identical between 32-bit and 64-bit ABIs.

On 64-bit bit systems, the ABI is maintained. On 32-bit systems,
this is an ABI breaking change. The ABI of most of these structs
was previously broken in r315662.  This also imposes a small API
change on userspace consumers who must handle kernel pointers
becoming virtual addresses.

PR:		228301 (exp-run by antoine)
Reviewed by:	jtl, kib, rwatson (various versions)
Sponsored by:	DARPA, AFRL
Differential Revision:	https://reviews.freebsd.org/D15386
2018-07-05 13:13:48 +00:00
mmacy
54ac1282ce AF_UNIX: bring uipc_ready in compliance with new locking protocol
PR:	228742
Submitted by: markj
Reviewed by:	markj
2018-06-08 20:31:59 +00:00
mmacy
2b52b582f3 AF_UNIX: check for unp == unp2 on disconnect 2018-06-07 04:57:40 +00:00
mmacy
36c253e5ad AF_UNIX: It is possible for UNIX datagram sockets to be connected
to themselves. The updated code assumed that that could not happen
and would try to lock the unp mutex twice.

There may be a lingering issue here but this fixes it for the
reporter.

PR:	228458
Reported by:	marieheleneka at gmail.com
2018-05-24 21:13:46 +00:00
mmacy
018b2ffa87 AF_UNIX: evidently Samba likes to connect a unix socket to itself, fix locking 2018-05-24 18:22:13 +00:00
mmacy
a5b8ee8c85 AF_UNIX in connectat unp and unp2 can be the same 2018-05-24 18:22:05 +00:00
mmacy
dc8bdd983f AF_UNIX: assert that we're not acquiring the same lock 2018-05-24 15:28:16 +00:00
mmacy
1789285c11 AF_UNIX gc unused label
...sigh
2018-05-20 21:37:34 +00:00
mmacy
b0c65d1080 AF_UNIX: Don't unlock unp/unp2 if they're not locked
Reported by:	mjg
2018-05-20 21:20:26 +00:00
mmacy
8e113981f9 AF_UNIX: fix LOR introduced by the locking rewrite 2018-05-20 05:50:53 +00:00
mmacy
d518618594 AF_UNIX: make unpcb lock name line up with what's in witness 2018-05-20 04:32:48 +00:00
imp
fe24edb902 Restore the all rights reserved language. Put it on each of the prior
two copyrights. The line originated with the Berkeely Regents, who
we have not approached about removing it (it's honestly too trivial
to be worth that fight). Restore it to rwatson's line as well. He
can decide if he wants it or not on his own. Matt clearly doesn't
want it, per project preference and his own statements on IRC.

Noticed by: rgrimes@
2018-05-19 17:29:57 +00:00
mmacy
20798cced4 AF_UNIX: switch to annotations to avoid warnings 2018-05-19 05:37:58 +00:00
mmacy
092fac4e4a fix gcc8 unused variable and set but not used variable in unix sockets
add copyright from lock rewrite while here
2018-05-19 02:15:40 +00:00
mmacy
7c5c49366c AF_UNIX: make unix socket locking finer grained
This change moves to using a reference count across lock drop / reacquire
to guarantee liveness.

Currently sends on unix sockets contend heavily on read locking the list lock.
unix1_processes in will-it-scale peaks at 6 processes and then declines.

With this change I get a substantial improvement in number of operations per second
with 96 processes:

x before
+ after
    N           Min           Max        Median           Avg        Stddev
x  11       1688420       1696389       1693578     1692766.3     2971.1702
+  10      63417955      71030114      70662504      69576423     2374684.6
Difference at 95.0% confidence
        6.78837e+07 +/- 1.49463e+06
        4010.22% +/- 88.4246%
        (Student's t, pooled s = 1.63437e+06)

And even for 2 processes shows a ~18% improvement.
"Small" iron changes (1, 2, and 4 processes):

x before1
+ after1.2
+------------------------------------------------------------------------+
|                                                                  +     |
|                                                           x      +     |
|                                                           x      +     |
|                                                           x      +     |
|                                                           x     ++     |
|                                                          xx     ++     |
|x                                                       x xx     ++     |
|                                  |__________________A_____M_____AM____||
+------------------------------------------------------------------------+
    N           Min           Max        Median           Avg        Stddev
x  10       1131648       1197750     1197138.5     1190369.3     20651.839
+  10       1203840       1205056       1204919     1204827.9     353.27404
Difference at 95.0% confidence
        14458.6 +/- 13723
        1.21463% +/- 1.16683%
        (Student's t, pooled s = 14605.2)

x before2
+ after2.2
+------------------------------------------------------------------------+
|                                                                       +|
|                                                                       +|
|                                                                       +|
|                                                                       +|
|                                                                       +|
|                                                                       +|
|           x                                                           +|
|           x                                                           +|
|         x xx                                                          +|
|x        xxxx                                                          +|
|      |___AM_|                                                         A|
+------------------------------------------------------------------------+
    N           Min           Max        Median           Avg        Stddev
x  10       1972843       2045866     2038186.5     2030443.8     21367.694
+  10       2400853       2402196     2401043.5     2401172.7     385.40024
Difference at 95.0% confidence
        370729 +/- 14198.9
        18.2585% +/- 0.826943%
        (Student's t, pooled s = 15111.7)

x before4
+ after4.2
    N           Min           Max        Median           Avg        Stddev
x  10       3986994       3991728     3990137.5     3989985.2     1300.0164
+  10       4799990       4806664     4806116.5       4805194     1990.6625
Difference at 95.0% confidence
        815209 +/- 1579.64
        20.4314% +/- 0.0421713%
        (Student's t, pooled s = 1681.19)

Tested by: pho
Reported by:	mjg
Approved by:	sbruno
Sponsored by:	Limelight Networks
Differential Revision:	https://reviews.freebsd.org/D15430
2018-05-17 17:59:35 +00:00
pfg
6f66677652 Forgot to sort here in r328238. 2018-01-22 02:26:10 +00:00
pfg
f0c6025eb6 Unsign some values related to allocation.
When allocating memory through malloc(9), we always expect the amount of
memory requested to be unsigned as a negative value would either stand for
an error or an overflow.
Unsign some values, found when considering the use of mallocarray(9), to
avoid unnecessary casting. Also consider that indexes should be of
at least the same size/type as the upper limit they pretend to index.

MFC after:	3 weeks
2018-01-22 02:08:10 +00:00
pfg
4736ccfd9c sys: further adoption of SPDX licensing ID tags.
Mainly focus on files that use BSD 3-Clause license.

The Software Package Data Exchange (SPDX) group provides a specification
to make it easier for automated tools to detect and summarize well known
opensource licenses. We are gradually adopting the specification, noting
that the tags are considered only advisory and do not, in any way,
superceed or replace the license texts.

Special thanks to Wind River for providing access to "The Duke of
Highlander" tool: an older (2014) run over FreeBSD tree was useful as a
starting point.
2017-11-20 19:43:44 +00:00
glebius
7168fac388 Hide struct socket and struct unpcb from the userland.
Violators may define _WANT_SOCKET and _WANT_UNPCB respectively and
are not guaranteed for stability of the structures.  The violators
list is the the usual one: libprocstat(3) and netstat(1) internally
and lsof in ports.

In struct xunpcb remove the inclusion of kernel structure and add
a bunch of spare fields.  The xsocket already has socket not included,
but add there spares as well.  Embed xsockbuf into xsocket.

Sort declarations in sys/socketvar.h to separate kernel only from
userland available ones.

PR:		221820 (exp-run)
2017-10-02 23:29:56 +00:00
glebius
c3c8e7f59c Fix two issues with not ready data in sockets (read: sendfile)
in UNIX sockets.

o Check that socket is still connected in uipc_ready(). If not
  we are responsible to free mbufs.
o In uipc_send() if socket appears to be disconnected, but we
  are sending data with pending I/Os, don't free mbufs.

Reported by:	Kevin Bowling <kbowling llnw.com>
Tested by:	Kevin Bowling <kbowling llnw.com>
PR:		222259
Reported by:	Mark Martinec <Mark.Martinec ijs.si>
MFC after:	3 days
2017-09-13 16:47:23 +00:00
glebius
e35d543ec1 Listening sockets improvements.
o Separate fields of struct socket that belong to listening from
  fields that belong to normal dataflow, and unionize them.  This
  shrinks the structure a bit.
  - Take out selinfo's from the socket buffers into the socket. The
    first reason is to support braindamaged scenario when a socket is
    added to kevent(2) and then listen(2) is cast on it. The second
    reason is that there is future plan to make socket buffers pluggable,
    so that for a dataflow socket a socket buffer can be changed, and
    in this case we also want to keep same selinfos through the lifetime
    of a socket.
  - Remove struct struct so_accf. Since now listening stuff no longer
    affects struct socket size, just move its fields into listening part
    of the union.
  - Provide sol_upcall field and enforce that so_upcall_set() may be called
    only on a dataflow socket, which has buffers, and for listening sockets
    provide solisten_upcall_set().

o Remove ACCEPT_LOCK() global.
  - Add a mutex to socket, to be used instead of socket buffer lock to lock
    fields of struct socket that don't belong to a socket buffer.
  - Allow to acquire two socket locks, but the first one must belong to a
    listening socket.
  - Make soref()/sorele() to use atomic(9).  This allows in some situations
    to do soref() without owning socket lock.  There is place for improvement
    here, it is possible to make sorele() also to lock optionally.
  - Most protocols aren't touched by this change, except UNIX local sockets.
    See below for more information.

o Reduce copy-and-paste in kernel modules that accept connections from
  listening sockets: provide function solisten_dequeue(), and use it in
  the following modules: ctl(4), iscsi(4), ng_btsocket(4), ng_ksocket(4),
  infiniband, rpc.

o UNIX local sockets.
  - Removal of ACCEPT_LOCK() global uncovered several races in the UNIX
    local sockets.  Most races exist around spawning a new socket, when we
    are connecting to a local listening socket.  To cover them, we need to
    hold locks on both PCBs when spawning a third one.  This means holding
    them across sonewconn().  This creates a LOR between pcb locks and
    unp_list_lock.
  - To fix the new LOR, abandon the global unp_list_lock in favor of global
    unp_link_lock.  Indeed, separating these two locks didn't provide us any
    extra parralelism in the UNIX sockets.
  - Now call into uipc_attach() may happen with unp_link_lock hold if, we
    are accepting, or without unp_link_lock in case if we are just creating
    a socket.
  - Another problem in UNIX sockets is that uipc_close() basicly did nothing
    for a listening socket.  The vnode remained opened for connections.  This
    is fixed by removing vnode in uipc_close().  Maybe the right way would be
    to do it for all sockets (not only listening), simply move the vnode
    teardown from uipc_detach() to uipc_close()?

Sponsored by:		Netflix
Differential Revision:	https://reviews.freebsd.org/D9770
2017-06-08 21:30:34 +00:00
glebius
6ab6f6bd42 Remove write only flag UNP_HAVEPCCACHED. 2017-06-02 17:39:05 +00:00
glebius
8a7f8bb123 For UNIX sockets make vnode point not to the socket, but to the UNIX PCB,
since the latter is the thing that links together VFS and sockets.

While here, make the union in the struct vnode anonymous.
2017-06-02 17:31:25 +00:00
glebius
0fb0d228f8 For non-listening AF_UNIX sockets return error code EOPNOTSUPP to match
documentation and SUS.
2017-01-25 22:26:45 +00:00
sobomax
701697521c Add a new socket option SO_TS_CLOCK to pick from several different clock
sources to return timestamps when SO_TIMESTAMP is enabled. Two additional
clock sources are:

o nanosecond resolution realtime clock (equivalent of CLOCK_REALTIME);
o nanosecond resolution monotonic clock (equivalent of CLOCK_MONOTONIC).

In addition to this, this option provides unified interface to get bintime
(equivalent of using SO_BINTIME), except it also supported with IPv6 where
SO_BINTIME has never been supported. The long term plan is to depreciate
SO_BINTIME and move everything to using SO_TS_CLOCK.

Idea for this enhancement has been briefly discussed on the Net session
during dev summit in Ottawa last June and the general input was positive.

This change is believed to benefit network benchmarks/profiling as well
as other scenarios where precise time of arrival measurement is necessary.

There are two regression test cases as part of this commit: one extends unix
domain test code (unix_cmsg) to test new SCM_XXX types and another one
implementis totally new test case which exchanges UDP packets between two
processes using both conventional methods (i.e. calling clock_gettime(2)
before recv(2) and after send(2)), as well as using setsockopt()+recv() in
receive path. The resulting delays are checked for sanity for all supported
clock types.

Reviewed by:    adrian, gnn
Differential Revision:  https://reviews.freebsd.org/D9171
2017-01-16 17:46:38 +00:00
emaste
00b67b15b9 Renumber license clauses in sys/kern to avoid skipping #3 2016-09-15 13:16:20 +00:00
markj
1ae67e4491 Rename unp_dispose_so() to unp_dispose().
It implements the dom_dispose method for local socket domain, so its name
should match the method name.
2016-08-31 21:48:22 +00:00
markj
cfea0efd4a Handle races with listening socket close when connecting a unix socket.
If the listening socket is closed while sonewconn() is executing, the
nascent child socket is aborted, which results in recursion on the
unp_link lock when the child's pru_detach method is invoked. Fix this
by using a flag to mark such sockets, and skip a part of the socket's
teardown during detach.

Reported by:	Raviprakash Darbha <rdarbha@juniper.net>
Tested by:	pho
MFC after:	2 weeks
Differential Revision:	https://reviews.freebsd.org/D7398
2016-08-08 20:25:04 +00:00
pfg
a7d40a88c9 kernel: use our nitems() macro when it is available through param.h.
No functional change, only trivial cases are done in this sweep,

Discussed in:	freebsd-current
2016-04-19 23:48:27 +00:00
ed
4a923c8cd0 Remove the errno argument from unp_drop().
While there, add a comment to clarify that ECONNRESET should always be
returned for POSIX conformance.

Suggested by:	Steven Hartland
2016-02-26 12:46:34 +00:00
ed
0151167359 Make asynchronous connection failures on UNIX sockets fail with ECONNRESET.
While making CloudABI work well on Linux, I discovered that I had a
FreeBSD-ism in one of my unit tests. The test did the following:

- Create UNIX socket 1, bind it, make it listen.
- Create UNIX socket 2, connect it to UNIX socket 1.
- Close UNIX socket 1.
- Obtain SO_ERROR from socket 2.

On FreeBSD this returns ECONNABORTED, while on Linux it returns
ECONNRESET. I dug through some of the relevant specifications[1] and it
looks like Linux is all right here. ECONNABORTED should only be returned
when the local connection (socket 2) is aborted; not the peer (socket 1).

It is of course slightly misleading: the function in which we set this
error is called uipc_abort(), but keep in mind that we're aborting the
peer, thus resetting the local socket.

[1] http://pubs.opengroup.org/onlinepubs/9699919799/functions/connect.html

Reviewed by:	cem
Sponsored by:	Nuxi, the Netherlands
Differential Revision:	https://reviews.freebsd.org/D5419
2016-02-24 17:10:32 +00:00
glebius
e25e77f91d Make it possible for sbappend() to preserve M_NOTREADY on mbufs, just like
sbappendstream() does. Although, M_NOTREADY may appear only on SOCK_STREAM
sockets, due to sendfile(2) supporting only the latter, there is a corner
case of AF_UNIX/SOCK_STREAM socket, that still uses records for the sake
of control data, albeit being stream socket.

Provide private version of m_clrprotoflags(), which understands PRUS_NOTREADY,
similar to m_demote().
2016-01-08 19:03:20 +00:00
glebius
088235535d Revert r293405: it breaks socket buffer INVARIANTS when sending control
data over local sockets.
2016-01-08 17:27:23 +00:00
glebius
a4cad9f2ef For SOCK_STREAM socket use sbappendstream() instead of sbappend(). 2016-01-08 01:16:03 +00:00
mjg
cc8534cb73 fd: make the common case in filecaps_copy work lockless
The filedesc lock is only needed if ioctls caps are present, which is a
rare situation. This is a step towards reducing the scope of the filedesc
lock.
2015-09-07 20:02:56 +00:00
cem
576619e564 Fix cleanup race between unp_dispose and unp_gc
unp_dispose and unp_gc could race to teardown the same mbuf chains, which
can lead to dereferencing freed filedesc pointers.

This patch adds an IGNORE_RIGHTS flag on unpcbs marking the unpcb's RIGHTS
as invalid/freed. The flag is protected by UNP_LIST_LOCK.

To serialize against unp_gc, unp_dispose needs the socket object. Change the
dom_dispose() KPI to take a socket object instead of an mbuf chain directly.

PR:		194264
Differential Revision:	https://reviews.freebsd.org/D3044
Reviewed by:	mjg (earlier version)
Approved by:	markj (mentor)
Obtained from:	mjg
MFC after:	1 month
Sponsored by:	EMC / Isilon Storage Division
2015-07-14 02:00:50 +00:00
ed
790c476c1a Let listen() return EDESTADDRREQ when not bound.
We currently return EINVAL when calling listen() on a UNIX socket that
has not been bound to a pathname. If my interpretation of POSIX is
correct, we should return EDESTADDRREQ: "The socket is not bound to a
local address, and the protocol does not support listening on an unbound
socket."

Return EDESTADDRREQ instead when not bound and not connected.

Differential Revision:	https://reviews.freebsd.org/D3038
Reviewed by:	gnn, network
2015-07-10 06:47:14 +00:00
mjg
98e752b84d fd: move out actual fp installation to _finstall
Use it in fd passing functions as the first step towards fd code cleanup.
2015-06-14 14:08:52 +00:00
mjg
a75e86bc6b ussreq: use saved fdp pointer insted of td->td_proc->p_fd
No functional changes.
2015-06-12 06:28:22 +00:00