freebsd-skq

Author	SHA1	Message	Date
kib	7d29da5483	Check and avoid overflow when incrementing fp->f_count in fget_unlocked() and fhold(). On sufficiently large machine, f_count can be legitimately very large, e.g. malicious code can dup same fd up to the per-process filedescriptors limit, and then fork as much as it can. On some smaller machine, I see kern.maxfilesperproc: 939132 kern.maxprocperuid: 34203 which already overflows u_int. More, the malicious code can create transient references by sending fds over unix sockets. I realized that this check is missed after reading https://secfault-security.com/blog/FreeBSD-SA-1902.fd.html Reviewed by: markj (previous version), mjg Tested by: pho (previous version) Sponsored by: The FreeBSD Foundation MFC after: 1 week Differential revision: https://reviews.freebsd.org/D20947	2019-07-21 15:07:12 +00:00
kib	ab87bfcd7a	Fix leak of memory and file refs with sendmsg(2) over unix domain sockets. When sendmsg(2) successfully internalized one SCM_RIGHTS control message, but failed to process some other control message later, both file references and filedescent memory need to be freed. This was not done; only the mbuf chain was freed. Noted, test case written, reviewed by: markj Sponsored by: The FreeBSD Foundation MFC after: 1 week Differential revision: https://reviews.freebsd.org/D21000	2019-07-19 20:51:39 +00:00
dchagin	a25b408b04	Complete LOCAL_PEERCRED support. Cache pid of the remote process in the struct xucred. Do not bump XUCRED_VERSION as struct layout is not changed. PR: 215202 Reviewed by: tijl MFC after: 1 week Differential Revision: https://reviews.freebsd.org/D20415	2019-05-30 14:24:26 +00:00
markj	5c563658ea	Plug some networking sysctl leaks. Various network protocol sysctl handlers were not zero-filling their output buffers and thus would export uninitialized stack memory to userland. Fix a number of such handlers. Reported by: Thomas Barabosch, Fraunhofer FKIE Reviewed by: tuexen MFC after: 3 days Security: kernel memory disclosure Sponsored by: The FreeBSD Foundation Differential Revision: https://reviews.freebsd.org/D18301	2018-11-22 20:49:41 +00:00
mjg	ffdee46ab5	uipc_usrreq: fix inode number assignment The code was incrementing a global variable in an unsafe manner. Two different threads stating two different sockets could have resulted in the same inode numbers assigned to both. Creation is protected with a global lock, move the assignment there. Since inode numbers are 64-bit now drop the check for overflows. Sponsored by: The FreeBSD Foundation	2018-11-21 22:25:05 +00:00
markj	7a979485ab	Improve handling of control message truncation. If a recvmsg(2) or recvmmsg(2) caller doesn't provide sufficient space for all control messages, the kernel sets MSG_CTRUNC in the message flags to indicate truncation of the control messages. In the case of SCM_RIGHTS messages, however, we were failing to dispose of the rights that had already been externalized into the recipient's file descriptor table. Add a new function and mbuf type to handle this cleanup task, and use it any time we fail to copy control messages out to the recipient. To simplify cleanup, control message truncation is now only performed at control message boundaries. The change also fixes a few related bugs: - Rights could be leaked to the recipient process if an error occurred while copying out a message's contents. - We failed to set MSG_CTRUNC if the truncation occurred on a control message boundary, e.g., if the caller received two control messages and provided only the exact amount of buffer space needed for the first. PR: 131876 Reviewed by: ed (previous version) MFC after: 1 month Sponsored by: The FreeBSD Foundation Differential Revision: https://reviews.freebsd.org/D16561	2018-08-07 16:36:48 +00:00
markj	75b64fe9d9	Don't check rcv sockbuf limits when sending on a unix stream socket. sosend_generic() performs an initial comparison of the amount of data (including control messages) to be transmitted with the send buffer size. When transmitting on a unix socket, we then compare the amount of data being sent with the amount of space in the receive buffer size; if insufficient space is available, sbappendcontrol() returns an error and the data is lost. This is easily triggered by sending control messages together with an amount of data roughly equal to the send buffer size, since the control message size may change in uipc_send() as file descriptors are internalized. Fix the problem by removing the space check in sbappendcontrol(), whose only consumer is the unix sockets code. The stream sockets code uses the SB_STOP mechanism to ensure that senders will block if the receive buffer fills up. PR: 181741 MFC after: 1 month Sponsored by: The FreeBSD Foundation Differential Revision: https://reviews.freebsd.org/D16515	2018-08-04 20:26:54 +00:00
markj	c0f926949d	Style.	2018-08-04 20:16:36 +00:00
asomers	fabe732b5e	Fix LOCAL_PEERCRED with socketpair(2) Enable the LOCAL_PEERCRED socket option for unix domain stream sockets created with socketpair(2). Previously, it only worked with unix domain stream sockets created with socket(2)/listen(2)/connect(2)/accept(2). PR: 176419 Reported by: Nicholas Wilson <nicholas@nicholaswilson.me.uk> MFC after: 2 weeks Differential Revision: https://reviews.freebsd.org/D16350	2018-08-03 01:37:00 +00:00
brooks	39f527e7ee	Use uintptr_t alone when assigning to kvaddr_t variables. Suggested by: jhb	2018-07-10 13:03:06 +00:00
brooks	8baf738e84	Correct breakage on 32-bit platforms from r335979.	2018-07-06 10:03:33 +00:00
brooks	6615ed4c61	Make struct xinpcb and friends word-size independent. Replace size_t members with ksize_t (uint64_t) and pointer members (never used as pointers in userspace, but instead as unique identifiers) with kvaddr_t (uint64_t). This makes the structs identical between 32-bit and 64-bit ABIs. On 64-bit systems, the ABI is maintained. On 32-bit systems, this is an ABI breaking change. The ABI of most of these structs was previously broken in r315662. This also imposes a small API change on userspace consumers who must handle kernel pointers becoming virtual addresses. PR: 228301 (exp-run by antoine) Reviewed by: jtl, kib, rwatson (various versions) Sponsored by: DARPA, AFRL Differential Revision: https://reviews.freebsd.org/D15386	2018-07-05 13:13:48 +00:00
mmacy	54ac1282ce	AF_UNIX: bring uipc_ready in compliance with new locking protocol PR: 228742 Submitted by: markj Reviewed by: markj	2018-06-08 20:31:59 +00:00
mmacy	2b52b582f3	AF_UNIX: check for unp == unp2 on disconnect	2018-06-07 04:57:40 +00:00
mmacy	36c253e5ad	AF_UNIX: It is possible for UNIX datagram sockets to be connected to themselves. The updated code assumed that that could not happen and would try to lock the unp mutex twice. There may be a lingering issue here but this fixes it for the reporter. PR: 228458 Reported by: marieheleneka at gmail.com	2018-05-24 21:13:46 +00:00
mmacy	018b2ffa87	AF_UNIX: evidently Samba likes to connect a unix socket to itself, fix locking	2018-05-24 18:22:13 +00:00
mmacy	a5b8ee8c85	AF_UNIX in connectat unp and unp2 can be the same	2018-05-24 18:22:05 +00:00
mmacy	dc8bdd983f	AF_UNIX: assert that we're not acquiring the same lock	2018-05-24 15:28:16 +00:00
mmacy	1789285c11	AF_UNIX gc unused label ...sigh	2018-05-20 21:37:34 +00:00
mmacy	b0c65d1080	AF_UNIX: Don't unlock unp/unp2 if they're not locked Reported by: mjg	2018-05-20 21:20:26 +00:00
mmacy	8e113981f9	AF_UNIX: fix LOR introduced by the locking rewrite	2018-05-20 05:50:53 +00:00
mmacy	d518618594	AF_UNIX: make unpcb lock name line up with what's in witness	2018-05-20 04:32:48 +00:00
imp	fe24edb902	Restore the all rights reserved language. Put it on each of the prior two copyrights. The line originated with the Berkeley Regents, who we have not approached about removing it (it's honestly too trivial to be worth that fight). Restore it to rwatson's line as well. He can decide if he wants it or not on his own. Matt clearly doesn't want it, per project preference and his own statements on IRC. Noticed by: rgrimes@	2018-05-19 17:29:57 +00:00
mmacy	20798cced4	AF_UNIX: switch to annotations to avoid warnings	2018-05-19 05:37:58 +00:00
mmacy	092fac4e4a	fix gcc8 unused variable and set but not used variable in unix sockets add copyright from lock rewrite while here	2018-05-19 02:15:40 +00:00
mmacy	7c5c49366c	AF_UNIX: make unix socket locking finer grained This change moves to using a reference count across lock drop / reacquire to guarantee liveness. Currently sends on unix sockets contend heavily on read locking the list lock. unix1_processes in will-it-scale peaks at 6 processes and then declines. With this change I get a substantial improvement in number of operations per second with 96 processes: x before + after N Min Max Median Avg Stddev x 11 1688420 1696389 1693578 1692766.3 2971.1702 + 10 63417955 71030114 70662504 69576423 2374684.6 Difference at 95.0% confidence 6.78837e+07 +/- 1.49463e+06 4010.22% +/- 88.4246% (Student's t, pooled s = 1.63437e+06) And even for 2 processes shows a ~18% improvement. "Small" iron changes (1, 2, and 4 processes): x before1 + after1.2 [ministat ASCII distribution plot omitted] N Min Max Median Avg Stddev x 10 1131648 1197750 1197138.5 1190369.3 20651.839 + 10 1203840 1205056 1204919 1204827.9 353.27404 Difference at 95.0% confidence 14458.6 +/- 13723 1.21463% +/- 1.16683% (Student's t, pooled s = 14605.2) x before2 + after2.2 [ministat ASCII distribution plot omitted] N Min Max Median Avg Stddev x 10 1972843 2045866 2038186.5 2030443.8 21367.694 + 10 2400853 2402196 2401043.5 2401172.7 385.40024 Difference at 95.0% confidence 370729 +/- 14198.9 18.2585% +/- 0.826943% (Student's t, pooled s = 15111.7) x before4 + after4.2 N Min Max Median Avg Stddev x 10 3986994 3991728 3990137.5 3989985.2 1300.0164 + 10 4799990 4806664 4806116.5 4805194 1990.6625 Difference at 95.0% confidence 815209 +/- 1579.64 20.4314% +/- 0.0421713% (Student's t, pooled s = 1681.19) Tested by: pho Reported by: mjg Approved by: sbruno Sponsored by: Limelight Networks Differential Revision: https://reviews.freebsd.org/D15430	2018-05-17 17:59:35 +00:00
pfg	6f66677652	Forgot to sort here in r328238.	2018-01-22 02:26:10 +00:00
pfg	f0c6025eb6	Unsign some values related to allocation. When allocating memory through malloc(9), we always expect the amount of memory requested to be unsigned as a negative value would either stand for an error or an overflow. Unsign some values, found when considering the use of mallocarray(9), to avoid unnecessary casting. Also consider that indexes should be of at least the same size/type as the upper limit they pretend to index. MFC after: 3 weeks	2018-01-22 02:08:10 +00:00
pfg	4736ccfd9c	sys: further adoption of SPDX licensing ID tags. Mainly focus on files that use BSD 3-Clause license. The Software Package Data Exchange (SPDX) group provides a specification to make it easier for automated tools to detect and summarize well known opensource licenses. We are gradually adopting the specification, noting that the tags are considered only advisory and do not, in any way, supersede or replace the license texts. Special thanks to Wind River for providing access to "The Duke of Highlander" tool: an older (2014) run over FreeBSD tree was useful as a starting point.	2017-11-20 19:43:44 +00:00
glebius	7168fac388	Hide struct socket and struct unpcb from the userland. Violators may define _WANT_SOCKET and _WANT_UNPCB respectively and are not guaranteed for stability of the structures. The violators list is the usual one: libprocstat(3) and netstat(1) internally and lsof in ports. In struct xunpcb remove the inclusion of kernel structure and add a bunch of spare fields. The xsocket already has socket not included, but add there spares as well. Embed xsockbuf into xsocket. Sort declarations in sys/socketvar.h to separate kernel only from userland available ones. PR: 221820 (exp-run)	2017-10-02 23:29:56 +00:00
glebius	c3c8e7f59c	Fix two issues with not ready data in sockets (read: sendfile) in UNIX sockets. o Check that socket is still connected in uipc_ready(). If not we are responsible to free mbufs. o In uipc_send() if socket appears to be disconnected, but we are sending data with pending I/Os, don't free mbufs. Reported by: Kevin Bowling <kbowling llnw.com> Tested by: Kevin Bowling <kbowling llnw.com> PR: 222259 Reported by: Mark Martinec <Mark.Martinec ijs.si> MFC after: 3 days	2017-09-13 16:47:23 +00:00
glebius	e35d543ec1	Listening sockets improvements. o Separate fields of struct socket that belong to listening from fields that belong to normal dataflow, and unionize them. This shrinks the structure a bit. - Take out selinfo's from the socket buffers into the socket. The first reason is to support braindamaged scenario when a socket is added to kevent(2) and then listen(2) is cast on it. The second reason is that there is future plan to make socket buffers pluggable, so that for a dataflow socket a socket buffer can be changed, and in this case we also want to keep same selinfos through the lifetime of a socket. - Remove struct so_accf. Since now listening stuff no longer affects struct socket size, just move its fields into listening part of the union. - Provide sol_upcall field and enforce that so_upcall_set() may be called only on a dataflow socket, which has buffers, and for listening sockets provide solisten_upcall_set(). o Remove ACCEPT_LOCK() global. - Add a mutex to socket, to be used instead of socket buffer lock to lock fields of struct socket that don't belong to a socket buffer. - Allow to acquire two socket locks, but the first one must belong to a listening socket. - Make soref()/sorele() use atomic(9). This allows in some situations to do soref() without owning socket lock. There is place for improvement here, it is possible to make sorele() also to lock optionally. - Most protocols aren't touched by this change, except UNIX local sockets. See below for more information. o Reduce copy-and-paste in kernel modules that accept connections from listening sockets: provide function solisten_dequeue(), and use it in the following modules: ctl(4), iscsi(4), ng_btsocket(4), ng_ksocket(4), infiniband, rpc. o UNIX local sockets. - Removal of ACCEPT_LOCK() global uncovered several races in the UNIX local sockets. Most races exist around spawning a new socket, when we are connecting to a local listening socket. To cover them, we need to hold locks on both PCBs when spawning a third one. This means holding them across sonewconn(). This creates a LOR between pcb locks and unp_list_lock. - To fix the new LOR, abandon the global unp_list_lock in favor of global unp_link_lock. Indeed, separating these two locks didn't provide us any extra parallelism in the UNIX sockets. - Now call into uipc_attach() may happen with unp_link_lock held if we are accepting, or without unp_link_lock if we are just creating a socket. - Another problem in UNIX sockets is that uipc_close() basically did nothing for a listening socket. The vnode remained open for connections. This is fixed by removing vnode in uipc_close(). Maybe the right way would be to do it for all sockets (not only listening), simply move the vnode teardown from uipc_detach() to uipc_close()? Sponsored by: Netflix Differential Revision: https://reviews.freebsd.org/D9770	2017-06-08 21:30:34 +00:00
glebius	6ab6f6bd42	Remove write only flag UNP_HAVEPCCACHED.	2017-06-02 17:39:05 +00:00
glebius	8a7f8bb123	For UNIX sockets make vnode point not to the socket, but to the UNIX PCB, since the latter is the thing that links together VFS and sockets. While here, make the union in the struct vnode anonymous.	2017-06-02 17:31:25 +00:00
glebius	0fb0d228f8	For non-listening AF_UNIX sockets return error code EOPNOTSUPP to match documentation and SUS.	2017-01-25 22:26:45 +00:00
sobomax	701697521c	Add a new socket option SO_TS_CLOCK to pick from several different clock sources to return timestamps when SO_TIMESTAMP is enabled. Two additional clock sources are: o nanosecond resolution realtime clock (equivalent of CLOCK_REALTIME); o nanosecond resolution monotonic clock (equivalent of CLOCK_MONOTONIC). In addition to this, this option provides unified interface to get bintime (equivalent of using SO_BINTIME), except it is also supported with IPv6 where SO_BINTIME has never been supported. The long term plan is to deprecate SO_BINTIME and move everything to using SO_TS_CLOCK. Idea for this enhancement has been briefly discussed on the Net session during dev summit in Ottawa last June and the general input was positive. This change is believed to benefit network benchmarks/profiling as well as other scenarios where precise time of arrival measurement is necessary. There are two regression test cases as part of this commit: one extends unix domain test code (unix_cmsg) to test new SCM_XXX types and another one implements totally new test case which exchanges UDP packets between two processes using both conventional methods (i.e. calling clock_gettime(2) before recv(2) and after send(2)), as well as using setsockopt()+recv() in receive path. The resulting delays are checked for sanity for all supported clock types. Reviewed by: adrian, gnn Differential Revision: https://reviews.freebsd.org/D9171	2017-01-16 17:46:38 +00:00
emaste	00b67b15b9	Renumber license clauses in sys/kern to avoid skipping #3	2016-09-15 13:16:20 +00:00
markj	1ae67e4491	Rename unp_dispose_so() to unp_dispose(). It implements the dom_dispose method for local socket domain, so its name should match the method name.	2016-08-31 21:48:22 +00:00
markj	cfea0efd4a	Handle races with listening socket close when connecting a unix socket. If the listening socket is closed while sonewconn() is executing, the nascent child socket is aborted, which results in recursion on the unp_link lock when the child's pru_detach method is invoked. Fix this by using a flag to mark such sockets, and skip a part of the socket's teardown during detach. Reported by: Raviprakash Darbha <rdarbha@juniper.net> Tested by: pho MFC after: 2 weeks Differential Revision: https://reviews.freebsd.org/D7398	2016-08-08 20:25:04 +00:00
pfg	a7d40a88c9	kernel: use our nitems() macro when it is available through param.h. No functional change, only trivial cases are done in this sweep, Discussed in: freebsd-current	2016-04-19 23:48:27 +00:00
ed	4a923c8cd0	Remove the errno argument from unp_drop(). While there, add a comment to clarify that ECONNRESET should always be returned for POSIX conformance. Suggested by: Steven Hartland	2016-02-26 12:46:34 +00:00
ed	0151167359	Make asynchronous connection failures on UNIX sockets fail with ECONNRESET. While making CloudABI work well on Linux, I discovered that I had a FreeBSD-ism in one of my unit tests. The test did the following: - Create UNIX socket 1, bind it, make it listen. - Create UNIX socket 2, connect it to UNIX socket 1. - Close UNIX socket 1. - Obtain SO_ERROR from socket 2. On FreeBSD this returns ECONNABORTED, while on Linux it returns ECONNRESET. I dug through some of the relevant specifications[1] and it looks like Linux is all right here. ECONNABORTED should only be returned when the local connection (socket 2) is aborted; not the peer (socket 1). It is of course slightly misleading: the function in which we set this error is called uipc_abort(), but keep in mind that we're aborting the peer, thus resetting the local socket. [1] http://pubs.opengroup.org/onlinepubs/9699919799/functions/connect.html Reviewed by: cem Sponsored by: Nuxi, the Netherlands Differential Revision: https://reviews.freebsd.org/D5419	2016-02-24 17:10:32 +00:00
glebius	e25e77f91d	Make it possible for sbappend() to preserve M_NOTREADY on mbufs, just like sbappendstream() does. Although, M_NOTREADY may appear only on SOCK_STREAM sockets, due to sendfile(2) supporting only the latter, there is a corner case of AF_UNIX/SOCK_STREAM socket, that still uses records for the sake of control data, albeit being stream socket. Provide private version of m_clrprotoflags(), which understands PRUS_NOTREADY, similar to m_demote().	2016-01-08 19:03:20 +00:00
glebius	088235535d	Revert r293405: it breaks socket buffer INVARIANTS when sending control data over local sockets.	2016-01-08 17:27:23 +00:00
glebius	a4cad9f2ef	For SOCK_STREAM socket use sbappendstream() instead of sbappend().	2016-01-08 01:16:03 +00:00
mjg	cc8534cb73	fd: make the common case in filecaps_copy work lockless The filedesc lock is only needed if ioctls caps are present, which is a rare situation. This is a step towards reducing the scope of the filedesc lock.	2015-09-07 20:02:56 +00:00
cem	576619e564	Fix cleanup race between unp_dispose and unp_gc unp_dispose and unp_gc could race to teardown the same mbuf chains, which can lead to dereferencing freed filedesc pointers. This patch adds an IGNORE_RIGHTS flag on unpcbs marking the unpcb's RIGHTS as invalid/freed. The flag is protected by UNP_LIST_LOCK. To serialize against unp_gc, unp_dispose needs the socket object. Change the dom_dispose() KPI to take a socket object instead of an mbuf chain directly. PR: 194264 Differential Revision: https://reviews.freebsd.org/D3044 Reviewed by: mjg (earlier version) Approved by: markj (mentor) Obtained from: mjg MFC after: 1 month Sponsored by: EMC / Isilon Storage Division	2015-07-14 02:00:50 +00:00
ed	790c476c1a	Let listen() return EDESTADDRREQ when not bound. We currently return EINVAL when calling listen() on a UNIX socket that has not been bound to a pathname. If my interpretation of POSIX is correct, we should return EDESTADDRREQ: "The socket is not bound to a local address, and the protocol does not support listening on an unbound socket." Return EDESTADDRREQ instead when not bound and not connected. Differential Revision: https://reviews.freebsd.org/D3038 Reviewed by: gnn, network	2015-07-10 06:47:14 +00:00
mjg	98e752b84d	fd: move out actual fp installation to _finstall Use it in fd passing functions as the first step towards fd code cleanup.	2015-06-14 14:08:52 +00:00
mjg	a75e86bc6b	ussreq: use saved fdp pointer instead of td->td_proc->p_fd No functional changes.	2015-06-12 06:28:22 +00:00
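
The overflow argument in commit 7d29da5483 comes down to simple arithmetic, which can be checked directly. The Python sketch below is an illustration only, not the kernel code (fhold() and fget_unlocked() are C and differ in detail). It multiplies the two sysctl values quoted in the commit message and models an increment that refuses to wrap. The names `UINT_MAX`, `worst_case_refs`, and `checked_hold` are hypothetical, and a 32-bit u_int is assumed.

```python
from typing import Tuple

# Assumption: u_int is 32 bits wide, as on the platforms the commit discusses.
UINT_MAX = 2**32 - 1

# Values quoted in the commit message.
MAXFILESPERPROC = 939132   # kern.maxfilesperproc
MAXPROCPERUID = 34203      # kern.maxprocperuid


def worst_case_refs(files_per_proc: int, procs: int) -> int:
    """Upper bound on references to one file: every fd slot in every process."""
    return files_per_proc * procs


def checked_hold(count: int) -> Tuple[int, bool]:
    """Increment a reference count without wrapping past UINT_MAX.

    Returns (new_count, ok). On failure the count is left unchanged,
    which is the behaviour this hypothetical model chooses to illustrate.
    """
    if count >= UINT_MAX:
        return count, False
    return count + 1, True


if __name__ == "__main__":
    refs = worst_case_refs(MAXFILESPERPROC, MAXPROCPERUID)
    print("worst-case references:", refs)
    print("exceeds UINT_MAX:", refs > UINT_MAX)
    print("hold at the limit:", checked_hold(UINT_MAX))
```

The point of the checked variant is that a caller learns the reference could not be taken, rather than silently wrapping the counter to a small value.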


333 Commits
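
Commit 7a979485ab states that control message truncation is now performed only at control message boundaries, and that MSG_CTRUNC must be set even when the cut falls exactly on a boundary. A rough Python model of that rule follows. It is a sketch under stated assumptions, not the kernel logic: the function name `deliver_control` is hypothetical, and plain byte counts stand in for aligned control message lengths.

```python
from typing import List, Tuple


def deliver_control(cmsg_sizes: List[int], buf_space: int) -> Tuple[List[int], bool]:
    """Model boundary-only truncation of a control message sequence.

    Whole messages are copied out while they fit in buf_space. The first
    message that does not fit, and everything after it, is dropped, and the
    returned flag (standing in for MSG_CTRUNC) is set. In the real kernel
    any dropped SCM_RIGHTS would also need their file references disposed
    of; that cleanup is outside this model.
    """
    delivered: List[int] = []
    used = 0
    for size in cmsg_sizes:
        if used + size > buf_space:
            return delivered, True   # truncated at a message boundary
        delivered.append(size)
        used += size
    return delivered, False          # everything fit, no truncation


if __name__ == "__main__":
    # Two 24-byte messages, room for exactly one: the case the commit
    # describes, where the truncation flag was previously not set.
    print(deliver_control([24, 24], 24))
```

Note that no partial message is ever returned: a buffer with room for one and a half messages yields one message plus the truncation flag.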