freebsd-skq

Author	SHA1	Message	Date
Konstantin Belousov	1762f674cc	ktrace: pack all ktrace parameters into allocated structure ktr_io_params Ref-count the ktr_io_params structure instead of vnode/cred. Reviewed by: markj Tested by: pho Sponsored by: The FreeBSD Foundation MFC after: 1 week Differential revision: https://reviews.freebsd.org/D30257	2021-05-22 15:16:08 +03:00
Konstantin Belousov	70c05850e2	kern_descrip.c: Style Wrap too long lines. Reviewed by: markj Sponsored by: The FreeBSD Foundation MFC after: 1 week Differential revision: https://reviews.freebsd.org/D30257	2021-05-22 15:16:08 +03:00
Konstantin Belousov	bbf7a4e878	O_PATH: allow vnode kevent filter on such files if VREAD access is checked as allowed during open Requested by: wulf Reviewed by: markj Tested by: pho Sponsored by: The FreeBSD Foundation MFC after: 1 week Differential revision: https://reviews.freebsd.org/D29323	2021-04-15 12:49:18 +03:00
Konstantin Belousov	a5970a529c	Make files opened with O_PATH to not block non-forced unmount by only keeping hold count on the vnode, instead of the use count. Reviewed by: markj Tested by: pho Sponsored by: The FreeBSD Foundation MFC after: 1 week Differential revision: https://reviews.freebsd.org/D29323	2021-04-15 12:48:27 +03:00
Konstantin Belousov	8d9ed174f3	open(2): Implement O_PATH Reviewed by: markj Tested by: pho Discussed with: walker.aj325_gmail.com, wulf Sponsored by: The FreeBSD Foundation MFC after: 1 week Differential revision: https://reviews.freebsd.org/D29323	2021-04-15 12:48:24 +03:00
Konstantin Belousov	42be0a7b10	Style. Add missed spaces, wrap long lines. Reviewed by: markj Tested by: pho Sponsored by: The FreeBSD Foundation MFC after: 1 week Differential revision: https://reviews.freebsd.org/D29323	2021-04-15 12:47:46 +03:00
Alex Richardson	fa32350347	close_range: add audit support This fixes the closefrom test in sys/audit. Includes cherry-picks of the following commits from openbsm: `4dfc628aaf` `99ff6fe32a` `da48a0399e` Reviewed By: kevans Differential Revision: https://reviews.freebsd.org/D28388	2021-02-23 17:47:07 +00:00
Jamie Gritton	d4380c0cdd	jail: Change both root and working directories in jail_attach(2) jail_attach(2) performs an internal chroot operation, leaving it up to the calling process to assure the working directory is inside the jail. Add a matching internal chdir operation to the jail's root. Also ignore kern.chroot_allow_open_directories, and always disallow the operation if there are any directory descriptors open. Reported by: mjg Approved by: markj, kib MFC after: 3 days	2021-02-19 14:13:35 -08:00
Alex Richardson	0482d7c9e9	Fix fget_only_user() to return ENOTCAPABLE on a failed capsicum check After `eaad8d1303` four additional capsicum-test tests started failing. It turns out this is because fget_only_user() was returning EBADF on a failed capsicum check instead of forwarding the return value of cap_check_inline() like fget_unlocked_seq(). capsicum-test failures before this: ``` [ FAILED ] 7 tests, listed below: [ FAILED ] Capability.OperationsForked [ FAILED ] Capability.NoBypassDAC [ FAILED ] Pdfork.OtherUserForked [ FAILED ] PipePdfork.WildcardWait [ FAILED ] OpenatTest.WithFlag [ FAILED ] ForkedOpenatTest_WithFlagInCapabilityMode._ [ FAILED ] Select.LotsOFileDescriptorsForked ``` After: ``` [ FAILED ] 3 tests, listed below: [ FAILED ] Capability.NoBypassDAC [ FAILED ] Pdfork.OtherUserForked [ FAILED ] PipePdfork.WildcardWait ``` Reviewed By: mjg MFC after: 1 week Differential Revision: https://reviews.freebsd.org/D28691	2021-02-15 22:55:12 +00:00
Mateusz Guzik	eaad8d1303	fd: add fget_only_user This can be used by single-threaded processes which don't share a file descriptor table to access their file objects without having to reference them. For example select consumers tend to match the requirement and have several file descriptors to inspect.	2021-01-29 11:23:43 +00:00
Mateusz Guzik	5753be8e43	fd: add refcount argument to falloc_noinstall This lets callers avoid atomic ops by initializing the count to required value from the get go. While here add falloc_abort to backpedal from this without having to fdrop.	2021-01-13 15:29:34 +00:00
Mateusz Guzik	530b699a62	fd: add finstall_refed Can be used to consume an already existing reference and consequently avoid atomic ops.	2021-01-13 03:27:03 +01:00
Mateusz Guzik	4faa375cdd	fd: provide a dedicated closef variant for unix socket code This avoids testing for td != NULL.	2021-01-13 03:27:03 +01:00
Mateusz Guzik	71bd18d373	fd: use seqc_read_notmodify when translating fds	2021-01-07 23:30:04 +00:00
Mateusz Guzik	20ac5cda96	fd: make fd/fp mandatory They are both always passed anyway.	2021-01-07 23:30:04 +00:00
Mateusz Guzik	bb3a12f0e5	fd: inline pwd_get_smr Tested by: pho	2021-01-01 00:10:42 +00:00
Konstantin Belousov	7a202823aa	Expose eventfd in the native API/ABI using a new __specialfd syscall eventfd is a Linux system call that produces special file descriptors for event notification. When porting Linux software, it is currently usually emulated by epoll-shim on top of kqueues. Unfortunately, kqueues are not passable between processes. And, as noted by the author of epoll-shim, even if they were, the library state would also have to be passed somehow. This came up when debugging strange HW video decode failures in Firefox. A native implementation would avoid these problems and help with porting Linux software. Since we now already have an eventfd implementation in the kernel (for the Linuxulator), it's pretty easy to expose it natively, which is what this patch does. Submitted by: greg@unrelenting.technology Reviewed by: markj (previous version) MFC after: 2 weeks Differential Revision: https://reviews.freebsd.org/D26668	2020-12-27 12:57:26 +02:00
Mateusz Guzik	57efe26bcb	fd: reimplement close_range to avoid spurious relocking	2020-12-17 18:52:30 +00:00
Mateusz Guzik	08a5615cfe	audit: rework AUDIT_SYSCLOSE This in particular avoids spurious lookups on close.	2020-12-17 18:52:04 +00:00
Mateusz Guzik	1e71e7c4f6	fd: refactor closefp in preparation for close_range rework	2020-12-17 18:51:09 +00:00
Mateusz Guzik	08241fedc4	fd: remove redundant saturation check from fget_unlocked_seq refcount_acquire_if_not_zero returns true on saturation. The case of 0 is handled by looping again, after which the originally found pointer will no longer be there. Noted by: kib	2020-12-16 18:01:41 +00:00
Mateusz Guzik	edcdcefb88	fd: fix fdrop prediction when closing a fd Most of the time this is the last reference, contrary to typical fdrop use.	2020-12-13 18:06:24 +00:00
Mateusz Guzik	0ecce93dca	fd: make serialization in fdescfree_fds conditional on hold count p_fd nullification in fdescfree serializes against new threads transitioning the count 1 -> 2, meaning that fdescfree_fds observing the count of 1 can safely assume there is nobody else using the table. Losing the race and observing > 1 is harmless. Reviewed by: markj Differential Revision: https://reviews.freebsd.org/D27522	2020-12-10 17:17:22 +00:00
Mark Johnston	3309fa7403	Plug a race between fd table teardown and several loops To export information from fd tables we have several loops which do this: FILDESC_SLOCK(fdp); for (i = 0; fdp->fd_refcount > 0 && i <= lastfile; i++) <export info for fd i>; FILDESC_SUNLOCK(fdp); Before r367777, fdescfree() acquired the fd table exclusive lock between decrementing fdp->fd_refcount and freeing table entries. This serialized with the loop above, so the file at descriptor i would remain valid until the lock is dropped. Now there is no serialization, so the loops may race with teardown of file descriptor tables. Acquire the exclusive fdtable lock after releasing the final table reference to provide a barrier synchronizing with these loops. Reported by: pho Reviewed by: kib (previous version), mjg Sponsored by: The FreeBSD Foundation Differential Revision: https://reviews.freebsd.org/D27513	2020-12-09 14:05:08 +00:00
Mark Johnston	4c1c90ea95	Use refcount_load(9) to load fd table reference counts No functional change intended. Reviewed by: kib, mjg Sponsored by: The FreeBSD Foundation Differential Revision: https://reviews.freebsd.org/D27512	2020-12-09 14:04:54 +00:00
Kyle Evans	c7ef3490e2	kern: never restart syscalls calling closefp(), e.g. close(2) All paths leading into closefp() will either replace or remove the fd from the filedesc table, and closefp() will call fo_close methods that can and do currently sleep without regard for the possibility of an ERESTART. This can be dangerous in multithreaded applications as another thread could have opened another file in its place that is subsequently operated on upon restart. The following are seemingly the only ones that will pass back ERESTART in-tree: - sockets (SO_LINGER) - fusefs - nfsclient Reviewed by: jilles, kib MFC after: 1 week Differential Revision: https://reviews.freebsd.org/D27310	2020-11-25 01:08:57 +00:00
Kyle Evans	f96078b8fe	kern: dup: do not assume oldfde is valid oldfde may be invalidated if the table has grown due to the operation that we're performing, either via fdalloc() or a direct fdgrowtable_exp(). This was technically OK before rS367927 because the old table remained valid until the filedesc became unused, but now it may be freed immediately if it's an unshared table in a single-threaded process, so it is no longer a good assumption to make. This fixes dup/dup2 invocations that grow the file table; in the initial report, it manifested as a kernel panic in devel/gmake's configure script. Reported by: Guy Yur <guyyur gmail com> Reviewed by: rew Differential Revision: https://reviews.freebsd.org/D27319	2020-11-23 00:33:06 +00:00
Robert Wing	3c85ca21d1	fd: free old file descriptor tables when not shared During the life of a process, new file descriptor tables may be allocated. When a new table is allocated, the old table is placed in a free list and held onto until all processes referencing them exit. When a new file descriptor table is allocated, the old file descriptor table can be freed when the current process has a single-thread and the file descriptor table is not being shared with any other processes. Reviewed by: kevans Approved by: kevans (mentor) Differential Revision: https://reviews.freebsd.org/D18617	2020-11-22 05:00:28 +00:00
Conrad Meyer	85078b8573	Split out cwd/root/jail, cmask state from filedesc table No functional change intended. Tracking these structures separately for each proc enables future work to correctly emulate clone(2) in linux(4). __FreeBSD_version is bumped (to 1300130) for consumption by, e.g., lsof. Reviewed by: kib Discussed with: markj, mjg Differential Revision: https://reviews.freebsd.org/D27037	2020-11-17 21:14:13 +00:00
Mark Johnston	f52979098d	Fix a pair of races in SIGIO registration First, funsetownlst() list looks at the first element of the list to see whether it's processing a process or a process group list. Then it acquires the global sigio lock and processes the list. However, nothing prevents the first sigio tracker from being freed by a concurrent funsetown() before the sigio lock is acquired. Fix this by acquiring the global sigio lock immediately after checking whether the list is empty. Callers of funsetownlst() ensure that new sigio trackers cannot be added concurrently. Second, fsetown() uses funsetown() to remove an existing sigio structure from a file object. However, funsetown() uses a racy check to avoid the sigio lock, so two threads may call fsetown() on the same file object, both observe that no sigio tracker is present, and enqueue two sigio trackers for the same file object. However, if the file object is destroyed, funsetown() will only remove one sigio tracker, and funsetownlst() may later trigger a use-after-free when it clears the file object reference for each entry in the list. Fix this by introducing funsetown_locked(), which avoids the racy check. Reviewed by: kib Reported by: pho Tested by: pho MFC after: 1 week Sponsored by: The FreeBSD Foundation Differential Revision: https://reviews.freebsd.org/D27157	2020-11-11 13:44:27 +00:00
Mateusz Guzik	3c50616fc1	fd: make all f_count uses go through refcount_*	2020-11-05 02:12:33 +00:00
Mateusz Guzik	d737e9eaf5	fd: hide _fdrop 0 count check behind INVARIANTS While here use refcount_load and make sure to report the tested value.	2020-11-05 02:12:08 +00:00
Mateusz Guzik	dd28b379cb	vfs: support lockless dirfd lookups	2020-10-10 03:48:17 +00:00
Mateusz Guzik	4e2266100d	cache: fix pwd use-after-free in setting up fallback Since the code exits smr section prior to calling pwd_hold, the used pwd can be freed and a new one allocated with the same address, making the comparison erroneously true. Note it is very unlikely anyone ran into it.	2020-10-05 19:38:51 +00:00
Konstantin Belousov	96474d2a3f	Do not copy vp into f_data for DTYPE_VNODE files. The pointer to vnode is already stored into f_vnode, so f_data can be reused. Fix all found users of f_data for DTYPE_VNODE. Provide finit_vnode() helper to initialize file of DTYPE_VNODE type. Reviewed by: markj (previous version) Discussed with: freqlabs (openzfs chunk) Tested by: pho (previous version) Sponsored by: The FreeBSD Foundation Differential revision: https://reviews.freebsd.org/D26346	2020-09-15 21:55:21 +00:00
Mateusz Guzik	54052edaa0	fd: fix fhold on an uninitialized var in fdcopy_remapped Reported by: gcc9	2020-09-08 16:07:47 +00:00
Mateusz Guzik	cd4a1797b0	fd: pwd_drop after releasing filedesc lock Fixes a potential LOR against vnode lock.	2020-08-22 16:57:45 +00:00
Mateusz Guzik	e914224af1	fd: put back FILEDESC_SUNLOCK to pwd_hold lost during rebase Reported by: pho	2020-07-25 15:34:29 +00:00
Mateusz Guzik	07d2145a17	vfs: add the infrastructure for lockless lookup Reviewed by: kib Tested by: pho (in a patchset) Differential Revision: https://reviews.freebsd.org/D25577	2020-07-25 10:32:45 +00:00
Mateusz Guzik	d8bc2a17a5	fd: remove fd_lastfile It keeps recalculated way more often than it is needed. Provide a routine (fdlastfile) to get it if necessary. Consumers may be better off with a bitmap iterator instead.	2020-07-15 10:24:04 +00:00
Mateusz Guzik	7177149a4d	fd: add obvious branch predictions to fdalloc	2020-07-15 10:14:00 +00:00
Mateusz Guzik	373278a7f6	fd: stop looping in pwd_hold We don't expect to fail acquiring the reference unless running into a corner case. Just in case ensure forward progress by taking the lock. Reviewed by: kib, markj Differential Revision: https://reviews.freebsd.org/D25616	2020-07-11 21:57:03 +00:00
Thomas Munro	f270658873	vfs: track sequential reads and writes separately For software like PostgreSQL and SQLite that sometimes reads sequentially while also writing sequentially some distance behind with interleaved syscalls on the same fd, performance is better on UFS if we do sequential access heuristics separately for reads and writes. Patch originally by Andrew Gierth in 2008, updated and proposed by me with his permission. Reviewed by: mjg, kib, tmunro Approved by: mjg (mentor) Obtained from: Andrew Gierth <andrew@tao11.riddles.org.uk> Differential Revision: https://reviews.freebsd.org/D25024	2020-06-21 08:51:24 +00:00
Mateusz Guzik	21d3be9105	pwd: unbreak repeated calls to set_rootvnode Prior to the change the once set pointer would never be updated. Unbreaks reboot -r. Reported by: Ross Gohlke	2020-04-27 13:54:00 +00:00
Kyle Evans	7d03e08112	Mark closefrom(2) COMPAT12, reimplement in libc to wrap close_range Include a temporarily compatibility shim as well for kernels predating close_range, since closefrom is used in some critical areas. Reviewed by: markj (previous version), kib Differential Revision: https://reviews.freebsd.org/D24399	2020-04-14 18:07:42 +00:00
Kyle Evans	605c4cda2f	close_range/closefrom: fix regression from close_range introduction close_range will clamp the range between [0, fdp->fd_lastfile], but failed to take into account that fdp->fd_lastfile can become -1 if all fds are closed. =-( In this scenario, just return because there's nothing further we can do at the moment. Add a test case for this, fork() and simply closefrom(0) twice in the child; on the second invocation, fdp->fd_lastfile == -1 and will trigger a panic before this change. X-MFC-With: r359836	2020-04-13 17:55:31 +00:00
Kyle Evans	472ced39ef	Implement a close_range(2) syscall close_range(min, max, flags) allows for a range of descriptors to be closed. The Python folk have indicated that they would much prefer this interface to closefrom(2), as the case may be that they/someone have special fds dup'd to higher in the range and they can't necessarily closefrom(min) because they don't want to hit the upper range, but relocating them to lower isn't necessarily feasible. sys_closefrom has been rewritten to use kern_close_range() using ~0U to indicate closing to the end of the range. This was chosen rather than requiring callers of kern_close_range() to hold FILEDESC_SLOCK across the call to kern_close_range for simplicity. The flags argument of close_range(2) is currently unused, so any flags set is currently EINVAL. It was added to the interface in Linux so that future flags could be added for, e.g., "halt on first error" and things of this nature. This patch is based on a syscall of the same design that is expected to be merged into Linux. Reviewed by: kib, markj, vangyzen (all slightly earlier revisions) Differential Revision: https://reviews.freebsd.org/D21627	2020-04-12 21:23:19 +00:00
Mark Johnston	429537caeb	kern_dup(): Call filecaps_free_prep() in a write section. filecaps_free_prep() bzeros the capabilities structure and we need to be careful to synchronize with unlocked readers, which expect a consistent rights structure. Reviewed by: kib, mjg Reported by: syzbot+5f30b507f91ddedded21@syzkaller.appspotmail.com MFC after: 2 weeks Sponsored by: The FreeBSD Foundation Differential Revision: https://reviews.freebsd.org/D24120	2020-03-19 15:40:05 +00:00
Mateusz Guzik	d2222aa0e9	fd: use smr for managing struct pwd This has a side effect of eliminating filedesc slock/sunlock during path lookup, which in turn removes contention vs concurrent modifications to the fd table. Reviewed by: markj, kib Differential Revision: https://reviews.freebsd.org/D23889	2020-03-08 00:23:36 +00:00
Mateusz Guzik	8d03b99b9d	fd: move vnodes out of filedesc into a dedicated structure The new structure is copy-on-write. With the assumption that path lookups are significantly more frequent than chdirs and chrooting this is a win. This provides stable root and jail root vnodes without the need to reference them on lookup, which in turn means less work on globally shared structures. Note this also happens to fix a bug where jail vnode was never referenced, meaning subsequent access on lookup could run into use-after-free. Reviewed by: kib Differential Revision: https://reviews.freebsd.org/D23884	2020-03-01 21:53:46 +00:00

1 2 3 4 5 ...

684 Commits