Commit Graph

700 Commits

Author SHA1 Message Date
Mark Johnston
36bd49ac4d fd: Avoid truncating output buffers for KERN_PROC_{CWD,FILEDESC}
These sysctls failed to return an error if the caller had provided too
short an output buffer.  Change them to return ENOMEM instead, to ensure
that callers can detect truncation in the face of a concurrently
changing fd table.

PR:		228432
Discussed with:	cem, jhb
MFC after:	1 month
Differential Revision:	https://reviews.freebsd.org/D15607
2021-12-17 13:10:23 -05:00
Mark Johnston
327060bd77 fd: Initialize more export_fd_buf fields in kern_proc_cwd_out()
In particular, we need to initialize efbuf->flags, since
export_vnode_to_sb() loads that field.  This was mostly harmless since
the flag only determines whether the output kinfo_file is packed, and
KERN_PROC_CWD only ever emits a single kinfo_file anyway.

MFC after:	2 weeks
Sponsored by:	The FreeBSD Foundation
2021-12-17 13:10:22 -05:00
Konstantin Belousov
794d3e8e63 fcntl(2): add F_KINFO operation
that returns struct kinfo_file for the given file descriptor.  Among
other data, it also returns kf_path, if file op was able to restore file
path.

Reviewed by:	jhb, markj
Sponsored by:	The FreeBSD Foundation
MFC after:	1 week
Differential revision:	https://reviews.freebsd.org/D33277
2021-12-06 22:18:09 +02:00
Konstantin Belousov
6e51d61a96 Add declaration for static export_file_to_kinfo()
Reviewed by:	jhb, markj
Sponsored by:	The FreeBSD Foundation
MFC after:	1 week
Differential revision:	https://reviews.freebsd.org/D33277
2021-12-06 22:18:09 +02:00
Brooks Davis
6eefabd4ca syscalls: improve nstat, nfstat, nlstat
Optionally return errors when truncating dev_t, ino_t, and nlink_t.
In the interest of code reuse, use freebsd11_cvtstat() to perform the
truncation and error handling and then convert the resulting struct
freebsd11_stat to struct nstat.

Add missing freebsd32 compat syscalls. These syscalls require
translation because struct nstat contains four instances of struct
timespec which in turn contains a time_t and a long.

Reviewed by:	kib
2021-11-22 22:36:56 +00:00
Konstantin Belousov
be10c0a910 fexecve(2): allow O_PATH file descriptors opened without O_EXEC
This improves compatibility with Linux.

Noted by:	Drew DeVault <sir@cmpwn.com>
Reviewed by:	markj
Sponsored by:	The FreeBSD Foundation
MFC after:	1 week
Differential revision:	https://reviews.freebsd.org/D32821
2021-11-03 18:00:42 +02:00
Mateusz Guzik
7dd419cabc cache: add empty path support
This avoids spurious drop offs as EMPTY is passed regardless of the
actual path name.

Pushign the work inside the lookup instead of just ignorign the flag
allows avoid checking for empty pathname for all other lookups.
2021-10-16 20:08:37 +00:00
Mateusz Guzik
2b68eb8e1d vfs: remove thread argument from VOP_STAT
and fo_stat.
2021-10-11 13:22:32 +00:00
Mateusz Guzik
a0558fe90d Retire code added to support CloudABI
CloudABI was removed in cf0ee8738e
2021-10-10 18:24:29 +00:00
Mateusz Guzik
85c855d31b fd: add pwd_hold_proc 2021-09-30 12:49:51 +02:00
Mateusz Guzik
d71e1a883c fifo: support flock
This evens it up with Linux.

Original patch by:	Greg V <greg@unrelenting.technology>
Differential Revision:	https://reviews.freebsd.org/D24255#565302
2021-09-25 14:58:31 +00:00
Mark Johnston
7326e8589c fsetown: Avoid process group lock recursion
Restore the pre-1d874ba4f8ba behaviour of disassociating the current
SIGIO recipient before looking up the specified process or process
group.  This avoids a lock recursion in the scenario where a process
group is configured to receive SIGIO for an fd when it has already been
so configured.

Reported by:	pho
Tested by:	pho
Reviewed by:	kib
MFC after:	3 days
2021-08-28 15:50:44 -04:00
Mark Johnston
a507a40f3b fsetown: Simplify error handling
No functional change intended.

Suggested by:	kib
Reviewed by:	kib
MFC after:	1 week
Sponsored by:	The FreeBSD Foundation
Differential Revision:	https://reviews.freebsd.org/D31671
2021-08-25 16:20:07 -04:00
Mark Johnston
1d874ba4f8 fsetown: Fix process lookup bugs
- pget()/pfind() will acquire the PID hash bucket locks, which are
  sleepable sx locks, but this means that the sigio mutex cannot be held
  while calling these functions.  Instead, use pget() to hold the
  process, after which we lock the sigio and proc locks, respectively.
- funsetownlst() assumes that processes cannot be registered for SIGIO
  once they have P_WEXIT set.  However, pfind() will happily return
  exiting processes, breaking the invariant.  Add an explicit check for
  P_WEXIT in fsetown() to fix this. [1]

Fixes:	f52979098d ("Fix a pair of races in SIGIO registration")
Reported by:	syzkaller [1]
Reviewed by:	kib
MFC after:	1 week
Sponsored by:	The FreeBSD Foundation
Differential Revision:	https://reviews.freebsd.org/D31661
2021-08-25 16:18:10 -04:00
Mark Johnston
0dcef81de9 Add required sysctl name length checks to various handlers
Reported by:	KMSAN
MFC after:	1 week
Sponsored by:	The FreeBSD Foundation
2021-07-23 10:47:13 -04:00
Mateusz Guzik
9bfddb3ac4 fd: use PROC_WAIT_UNLOCKED when clearing p_fd/p_pd 2021-05-29 22:04:09 +00:00
Konstantin Belousov
1762f674cc ktrace: pack all ktrace parameters into allocated structure ktr_io_params
Ref-count the ktr_io_params structure instead of vnode/cred.

Reviewed by:	markj
Tested by:	pho
Sponsored by:	The FreeBSD Foundation
MFC after:	1 week
Differential revision:	https://reviews.freebsd.org/D30257
2021-05-22 15:16:08 +03:00
Konstantin Belousov
70c05850e2 kern_descrip.c: Style
Wrap too long lines.

Reviewed by:	markj
Sponsored by:	The FreeBSD Foundation
MFC after:	1 week
Differential revision:	https://reviews.freebsd.org/D30257
2021-05-22 15:16:08 +03:00
Konstantin Belousov
bbf7a4e878 O_PATH: allow vnode kevent filter on such files
if VREAD access is checked as allowed during open

Requested by:	wulf
Reviewed by:	markj
Tested by:	pho
Sponsored by:	The FreeBSD Foundation
MFC after:	1 week
Differential revision:	https://reviews.freebsd.org/D29323
2021-04-15 12:49:18 +03:00
Konstantin Belousov
a5970a529c Make files opened with O_PATH to not block non-forced unmount
by only keeping hold count on the vnode, instead of the use count.

Reviewed by:	markj
Tested by:	pho
Sponsored by:	The FreeBSD Foundation
MFC after:	1 week
Differential revision:	https://reviews.freebsd.org/D29323
2021-04-15 12:48:27 +03:00
Konstantin Belousov
8d9ed174f3 open(2): Implement O_PATH
Reviewed by:	markj
Tested by:	pho
Discussed with:	walker.aj325_gmail.com, wulf
Sponsored by:	The FreeBSD Foundation
MFC after:	1 week
Differential revision:	https://reviews.freebsd.org/D29323
2021-04-15 12:48:24 +03:00
Konstantin Belousov
42be0a7b10 Style.
Add missed spaces, wrap long lines.

Reviewed by:	markj
Tested by:	pho
Sponsored by:	The FreeBSD Foundation
MFC after:	1 week
Differential revision:	https://reviews.freebsd.org/D29323
2021-04-15 12:47:46 +03:00
Alex Richardson
fa32350347 close_range: add audit support
This fixes the closefrom test in sys/audit.

Includes cherry-picks of the following commits from openbsm:

4dfc628aaf
99ff6fe32a
da48a0399e

Reviewed By:	kevans
Differential Revision: https://reviews.freebsd.org/D28388
2021-02-23 17:47:07 +00:00
Jamie Gritton
d4380c0cdd jail: Change both root and working directories in jail_attach(2)
jail_attach(2) performs an internal chroot operation, leaving it up to
the calling process to assure the working directory is inside the jail.

Add a matching internal chdir operation to the jail's root.  Also
ignore kern.chroot_allow_open_directories, and always disallow the
operation if there are any directory descriptors open.

Reported by:    mjg
Approved by:    markj, kib
MFC after:      3 days
2021-02-19 14:13:35 -08:00
Alex Richardson
0482d7c9e9 Fix fget_only_user() to return ENOTCAPABLE on a failed capsicum check
After eaad8d1303 four additional
capsicum-test tests started failing. It turns out this is because
fget_only_user() was returning EBADF on a failed capsicum check instead
of forwarding the return value of cap_check_inline() like
fget_unlocked_seq().

capsicum-test failures before this:
```
[  FAILED  ] 7 tests, listed below:
[  FAILED  ] Capability.OperationsForked
[  FAILED  ] Capability.NoBypassDAC
[  FAILED  ] Pdfork.OtherUserForked
[  FAILED  ] PipePdfork.WildcardWait
[  FAILED  ] OpenatTest.WithFlag
[  FAILED  ] ForkedOpenatTest_WithFlagInCapabilityMode._
[  FAILED  ] Select.LotsOFileDescriptorsForked
```
After:
```
[  FAILED  ] 3 tests, listed below:
[  FAILED  ] Capability.NoBypassDAC
[  FAILED  ] Pdfork.OtherUserForked
[  FAILED  ] PipePdfork.WildcardWait
```

Reviewed By:	mjg
MFC after:	1 week
Differential Revision: https://reviews.freebsd.org/D28691
2021-02-15 22:55:12 +00:00
Mateusz Guzik
eaad8d1303 fd: add fget_only_user
This can be used by single-threaded processes which don't share a file
descriptor table to access their file objects without having to
reference them.

For example select consumers tend to match the requirement and have
several file descriptors to inspect.
2021-01-29 11:23:43 +00:00
Mateusz Guzik
5753be8e43 fd: add refcount argument to falloc_noinstall
This lets callers avoid atomic ops by initializing the count to required
value from the get go.

While here add falloc_abort to backpedal from this without having to
fdrop.
2021-01-13 15:29:34 +00:00
Mateusz Guzik
530b699a62 fd: add finstall_refed
Can be used to consume an already existing reference and consequently
avoid atomic ops.
2021-01-13 03:27:03 +01:00
Mateusz Guzik
4faa375cdd fd: provide a dedicated closef variant for unix socket code
This avoids testing for td != NULL.
2021-01-13 03:27:03 +01:00
Mateusz Guzik
71bd18d373 fd: use seqc_read_notmodify when translating fds 2021-01-07 23:30:04 +00:00
Mateusz Guzik
20ac5cda96 fd: make fd/fp mandatory
They are both always passed anyway.
2021-01-07 23:30:04 +00:00
Mateusz Guzik
bb3a12f0e5 fd: inline pwd_get_smr
Tested by:	pho
2021-01-01 00:10:42 +00:00
Konstantin Belousov
7a202823aa Expose eventfd in the native API/ABI using a new __specialfd syscall
eventfd is a Linux system call that produces special file descriptors
for event notification. When porting Linux software, it is currently
usually emulated by epoll-shim on top of kqueues.  Unfortunately, kqueues
are not passable between processes.  And, as noted by the author of
epoll-shim, even if they were, the library state would also have to be
passed somehow.  This came up when debugging strange HW video decode
failures in Firefox.  A native implementation would avoid these problems
and help with porting Linux software.

Since we now already have an eventfd implementation in the kernel (for
the Linuxulator), it's pretty easy to expose it natively, which is what
this patch does.

Submitted by:   greg@unrelenting.technology
Reviewed by:    markj (previous version)
MFC after:      2 weeks
Differential Revision:  https://reviews.freebsd.org/D26668
2020-12-27 12:57:26 +02:00
Mateusz Guzik
57efe26bcb fd: reimplement close_range to avoid spurious relocking 2020-12-17 18:52:30 +00:00
Mateusz Guzik
08a5615cfe audit: rework AUDIT_SYSCLOSE
This in particular avoids spurious lookups on close.
2020-12-17 18:52:04 +00:00
Mateusz Guzik
1e71e7c4f6 fd: refactor closefp in preparation for close_range rework 2020-12-17 18:51:09 +00:00
Mateusz Guzik
08241fedc4 fd: remove redundant saturation check from fget_unlocked_seq
refcount_acquire_if_not_zero returns true on saturation.
The case of 0 is handled by looping again, after which the originally
found pointer will no longer be there.

Noted by:	kib
2020-12-16 18:01:41 +00:00
Mateusz Guzik
edcdcefb88 fd: fix fdrop prediction when closing a fd
Most of the time this is the last reference, contrary to typical fdrop use.
2020-12-13 18:06:24 +00:00
Mateusz Guzik
0ecce93dca fd: make serialization in fdescfree_fds conditional on hold count
p_fd nullification in fdescfree serializes against new threads transitioning
the count 1 -> 2, meaning that fdescfree_fds observing the count of 1 can
safely assume there is nobody else using the table. Losing the race and
observing > 1 is harmless.

Reviewed by:	markj
Differential Revision:	https://reviews.freebsd.org/D27522
2020-12-10 17:17:22 +00:00
Mark Johnston
3309fa7403 Plug a race between fd table teardown and several loops
To export information from fd tables we have several loops which do
this:

FILDESC_SLOCK(fdp);
for (i = 0; fdp->fd_refcount > 0 && i <= lastfile; i++)
	<export info for fd i>;
FILDESC_SUNLOCK(fdp);

Before r367777, fdescfree() acquired the fd table exclusive lock between
decrementing fdp->fd_refcount and freeing table entries.  This
serialized with the loop above, so the file at descriptor i would remain
valid until the lock is dropped.  Now there is no serialization, so the
loops may race with teardown of file descriptor tables.

Acquire the exclusive fdtable lock after releasing the final table
reference to provide a barrier synchronizing with these loops.

Reported by:	pho
Reviewed by:	kib (previous version), mjg
Sponsored by:	The FreeBSD Foundation
Differential Revision:	https://reviews.freebsd.org/D27513
2020-12-09 14:05:08 +00:00
Mark Johnston
4c1c90ea95 Use refcount_load(9) to load fd table reference counts
No functional change intended.

Reviewed by:	kib, mjg
Sponsored by:	The FreeBSD Foundation
Differential Revision:	https://reviews.freebsd.org/D27512
2020-12-09 14:04:54 +00:00
Kyle Evans
c7ef3490e2 kern: never restart syscalls calling closefp(), e.g. close(2)
All paths leading into closefp() will either replace or remove the fd from
the filedesc table, and closefp() will call fo_close methods that can and do
currently sleep without regard for the possibility of an ERESTART. This can
be dangerous in multithreaded applications as another thread could have
opened another file in its place that is subsequently operated on upon
restart.

The following are seemingly the only ones that will pass back ERESTART
in-tree:
- sockets (SO_LINGER)
- fusefs
- nfsclient

Reviewed by:	jilles, kib
MFC after:	1 week
Differential Revision:	https://reviews.freebsd.org/D27310
2020-11-25 01:08:57 +00:00
Kyle Evans
f96078b8fe kern: dup: do not assume oldfde is valid
oldfde may be invalidated if the table has grown due to the operation that
we're performing, either via fdalloc() or a direct fdgrowtable_exp().

This was technically OK before rS367927 because the old table remained valid
until the filedesc became unused, but now it may be freed immediately if
it's an unshared table in a single-threaded process, so it is no longer a
good assumption to make.

This fixes dup/dup2 invocations that grow the file table; in the initial
report, it manifested as a kernel panic in devel/gmake's configure script.

Reported by:	Guy Yur <guyyur gmail com>
Reviewed by:	rew
Differential Revision:	https://reviews.freebsd.org/D27319
2020-11-23 00:33:06 +00:00
Robert Wing
3c85ca21d1 fd: free old file descriptor tables when not shared
During the life of a process, new file descriptor tables may be allocated. When
a new table is allocated, the old table is placed in a free list and held onto
until all processes referencing them exit.

When a new file descriptor table is allocated, the old file descriptor table
can be freed when the current process has a single-thread and the file
descriptor table is not being shared with any other processes.

Reviewed by:    kevans
Approved by:    kevans (mentor)
Differential Revision:  https://reviews.freebsd.org/D18617
2020-11-22 05:00:28 +00:00
Conrad Meyer
85078b8573 Split out cwd/root/jail, cmask state from filedesc table
No functional change intended.

Tracking these structures separately for each proc enables future work to
correctly emulate clone(2) in linux(4).

__FreeBSD_version is bumped (to 1300130) for consumption by, e.g., lsof.

Reviewed by:	kib
Discussed with:	markj, mjg
Differential Revision:	https://reviews.freebsd.org/D27037
2020-11-17 21:14:13 +00:00
Mark Johnston
f52979098d Fix a pair of races in SIGIO registration
First, funsetownlst() list looks at the first element of the list to see
whether it's processing a process or a process group list.  Then it
acquires the global sigio lock and processes the list.  However, nothing
prevents the first sigio tracker from being freed by a concurrent
funsetown() before the sigio lock is acquired.

Fix this by acquiring the global sigio lock immediately after checking
whether the list is empty.  Callers of funsetownlst() ensure that new
sigio trackers cannot be added concurrently.

Second, fsetown() uses funsetown() to remove an existing sigio structure
from a file object.  However, funsetown() uses a racy check to avoid the
sigio lock, so two threads may call fsetown() on the same file object,
both observe that no sigio tracker is present, and enqueue two sigio
trackers for the same file object.  However, if the file object is
destroyed, funsetown() will only remove one sigio tracker, and
funsetownlst() may later trigger a use-after-free when it clears the
file object reference for each entry in the list.

Fix this by introducing funsetown_locked(), which avoids the racy check.

Reviewed by:	kib
Reported by:	pho
Tested by:	pho
MFC after:	1 week
Sponsored by:	The FreeBSD Foundation
Differential Revision:	https://reviews.freebsd.org/D27157
2020-11-11 13:44:27 +00:00
Mateusz Guzik
3c50616fc1 fd: make all f_count uses go through refcount_* 2020-11-05 02:12:33 +00:00
Mateusz Guzik
d737e9eaf5 fd: hide _fdrop 0 count check behind INVARIANTS
While here use refcount_load and make sure to report the tested value.
2020-11-05 02:12:08 +00:00
Mateusz Guzik
dd28b379cb vfs: support lockless dirfd lookups 2020-10-10 03:48:17 +00:00
Mateusz Guzik
4e2266100d cache: fix pwd use-after-free in setting up fallback
Since the code exits smr section prior to calling pwd_hold, the used
pwd can be freed and a new one allocated with the same address, making
the comparison erroneously true.

Note it is very unlikely anyone ran into it.
2020-10-05 19:38:51 +00:00