This avoids spurious drop offs as EMPTY is passed regardless of the
actual path name.
Pushign the work inside the lookup instead of just ignorign the flag
allows avoid checking for empty pathname for all other lookups.
Restore the pre-1d874ba4f8ba behaviour of disassociating the current
SIGIO recipient before looking up the specified process or process
group. This avoids a lock recursion in the scenario where a process
group is configured to receive SIGIO for an fd when it has already been
so configured.
Reported by: pho
Tested by: pho
Reviewed by: kib
MFC after: 3 days
- pget()/pfind() will acquire the PID hash bucket locks, which are
sleepable sx locks, but this means that the sigio mutex cannot be held
while calling these functions. Instead, use pget() to hold the
process, after which we lock the sigio and proc locks, respectively.
- funsetownlst() assumes that processes cannot be registered for SIGIO
once they have P_WEXIT set. However, pfind() will happily return
exiting processes, breaking the invariant. Add an explicit check for
P_WEXIT in fsetown() to fix this. [1]
Fixes: f52979098d ("Fix a pair of races in SIGIO registration")
Reported by: syzkaller [1]
Reviewed by: kib
MFC after: 1 week
Sponsored by: The FreeBSD Foundation
Differential Revision: https://reviews.freebsd.org/D31661
Wrap too long lines.
Reviewed by: markj
Sponsored by: The FreeBSD Foundation
MFC after: 1 week
Differential revision: https://reviews.freebsd.org/D30257
if VREAD access is checked as allowed during open
Requested by: wulf
Reviewed by: markj
Tested by: pho
Sponsored by: The FreeBSD Foundation
MFC after: 1 week
Differential revision: https://reviews.freebsd.org/D29323
by only keeping hold count on the vnode, instead of the use count.
Reviewed by: markj
Tested by: pho
Sponsored by: The FreeBSD Foundation
MFC after: 1 week
Differential revision: https://reviews.freebsd.org/D29323
jail_attach(2) performs an internal chroot operation, leaving it up to
the calling process to assure the working directory is inside the jail.
Add a matching internal chdir operation to the jail's root. Also
ignore kern.chroot_allow_open_directories, and always disallow the
operation if there are any directory descriptors open.
Reported by: mjg
Approved by: markj, kib
MFC after: 3 days
This can be used by single-threaded processes which don't share a file
descriptor table to access their file objects without having to
reference them.
For example select consumers tend to match the requirement and have
several file descriptors to inspect.
This lets callers avoid atomic ops by initializing the count to required
value from the get go.
While here add falloc_abort to backpedal from this without having to
fdrop.
eventfd is a Linux system call that produces special file descriptors
for event notification. When porting Linux software, it is currently
usually emulated by epoll-shim on top of kqueues. Unfortunately, kqueues
are not passable between processes. And, as noted by the author of
epoll-shim, even if they were, the library state would also have to be
passed somehow. This came up when debugging strange HW video decode
failures in Firefox. A native implementation would avoid these problems
and help with porting Linux software.
Since we now already have an eventfd implementation in the kernel (for
the Linuxulator), it's pretty easy to expose it natively, which is what
this patch does.
Submitted by: greg@unrelenting.technology
Reviewed by: markj (previous version)
MFC after: 2 weeks
Differential Revision: https://reviews.freebsd.org/D26668
refcount_acquire_if_not_zero returns true on saturation.
The case of 0 is handled by looping again, after which the originally
found pointer will no longer be there.
Noted by: kib
p_fd nullification in fdescfree serializes against new threads transitioning
the count 1 -> 2, meaning that fdescfree_fds observing the count of 1 can
safely assume there is nobody else using the table. Losing the race and
observing > 1 is harmless.
Reviewed by: markj
Differential Revision: https://reviews.freebsd.org/D27522
To export information from fd tables we have several loops which do
this:
FILDESC_SLOCK(fdp);
for (i = 0; fdp->fd_refcount > 0 && i <= lastfile; i++)
<export info for fd i>;
FILDESC_SUNLOCK(fdp);
Before r367777, fdescfree() acquired the fd table exclusive lock between
decrementing fdp->fd_refcount and freeing table entries. This
serialized with the loop above, so the file at descriptor i would remain
valid until the lock is dropped. Now there is no serialization, so the
loops may race with teardown of file descriptor tables.
Acquire the exclusive fdtable lock after releasing the final table
reference to provide a barrier synchronizing with these loops.
Reported by: pho
Reviewed by: kib (previous version), mjg
Sponsored by: The FreeBSD Foundation
Differential Revision: https://reviews.freebsd.org/D27513
All paths leading into closefp() will either replace or remove the fd from
the filedesc table, and closefp() will call fo_close methods that can and do
currently sleep without regard for the possibility of an ERESTART. This can
be dangerous in multithreaded applications as another thread could have
opened another file in its place that is subsequently operated on upon
restart.
The following are seemingly the only ones that will pass back ERESTART
in-tree:
- sockets (SO_LINGER)
- fusefs
- nfsclient
Reviewed by: jilles, kib
MFC after: 1 week
Differential Revision: https://reviews.freebsd.org/D27310
oldfde may be invalidated if the table has grown due to the operation that
we're performing, either via fdalloc() or a direct fdgrowtable_exp().
This was technically OK before rS367927 because the old table remained valid
until the filedesc became unused, but now it may be freed immediately if
it's an unshared table in a single-threaded process, so it is no longer a
good assumption to make.
This fixes dup/dup2 invocations that grow the file table; in the initial
report, it manifested as a kernel panic in devel/gmake's configure script.
Reported by: Guy Yur <guyyur gmail com>
Reviewed by: rew
Differential Revision: https://reviews.freebsd.org/D27319
During the life of a process, new file descriptor tables may be allocated. When
a new table is allocated, the old table is placed in a free list and held onto
until all processes referencing them exit.
When a new file descriptor table is allocated, the old file descriptor table
can be freed when the current process has a single-thread and the file
descriptor table is not being shared with any other processes.
Reviewed by: kevans
Approved by: kevans (mentor)
Differential Revision: https://reviews.freebsd.org/D18617
No functional change intended.
Tracking these structures separately for each proc enables future work to
correctly emulate clone(2) in linux(4).
__FreeBSD_version is bumped (to 1300130) for consumption by, e.g., lsof.
Reviewed by: kib
Discussed with: markj, mjg
Differential Revision: https://reviews.freebsd.org/D27037
First, funsetownlst() list looks at the first element of the list to see
whether it's processing a process or a process group list. Then it
acquires the global sigio lock and processes the list. However, nothing
prevents the first sigio tracker from being freed by a concurrent
funsetown() before the sigio lock is acquired.
Fix this by acquiring the global sigio lock immediately after checking
whether the list is empty. Callers of funsetownlst() ensure that new
sigio trackers cannot be added concurrently.
Second, fsetown() uses funsetown() to remove an existing sigio structure
from a file object. However, funsetown() uses a racy check to avoid the
sigio lock, so two threads may call fsetown() on the same file object,
both observe that no sigio tracker is present, and enqueue two sigio
trackers for the same file object. However, if the file object is
destroyed, funsetown() will only remove one sigio tracker, and
funsetownlst() may later trigger a use-after-free when it clears the
file object reference for each entry in the list.
Fix this by introducing funsetown_locked(), which avoids the racy check.
Reviewed by: kib
Reported by: pho
Tested by: pho
MFC after: 1 week
Sponsored by: The FreeBSD Foundation
Differential Revision: https://reviews.freebsd.org/D27157
Since the code exits smr section prior to calling pwd_hold, the used
pwd can be freed and a new one allocated with the same address, making
the comparison erroneously true.
Note it is very unlikely anyone ran into it.
The pointer to vnode is already stored into f_vnode, so f_data can be
reused. Fix all found users of f_data for DTYPE_VNODE.
Provide finit_vnode() helper to initialize file of DTYPE_VNODE type.
Reviewed by: markj (previous version)
Discussed with: freqlabs (openzfs chunk)
Tested by: pho (previous version)
Sponsored by: The FreeBSD Foundation
Differential revision: https://reviews.freebsd.org/D26346