freebsd-dev

Author	SHA1	Message	Date
Mateusz Guzik	4602214772	vfs: refactor vputx and add more comment Reviewed by: jeff (previous version) Tested by: pho (previous version) Differential Revision: https://reviews.freebsd.org/D23530	2020-02-12 11:19:07 +00:00
Mateusz Guzik	ed67a63c39	vfs: drop remaining zpcpu casts	2020-02-12 11:18:12 +00:00
Mateusz Guzik	123c519731	vfs: switch to smp_rendezvous_cpus_retry for vfs_op_thread_enter/exit In particular on amd64 this eliminates an atomic op in the common case, trading it for IPIs in the uncommon case of catching CPUs executing the code while the filesystem is getting suspended or unmounted.	2020-02-12 11:17:45 +00:00
Mateusz Guzik	00ac9d2632	rms: use smp_rendezvous_cpus_retry instead of a hand-rolled variant	2020-02-12 11:17:18 +00:00
Mateusz Guzik	e4f584971b	Add smp_rendezvous_cpus_retry This is a wrapper around smp_rendezvous_cpus which enables use of IPI handlers which can fail and require retrying. A wait_func argument is added to provide a routine which can be used to poll the CPU of interest for when the IPI can be retried. Handlers which succeed must call smp_rendezvous_cpus_done to denote that fact. Discussed with: jeff Differential Revision: https://reviews.freebsd.org/D23582	2020-02-12 11:16:55 +00:00
Mateusz Guzik	3acb6572fc	Store offset into zpcpu allocations in the per-cpu area. This shortens zpcpu_get and allows more optimizations. Reviewed by: jeff Differential Revision: https://reviews.freebsd.org/D23570	2020-02-12 11:11:22 +00:00
Mateusz Guzik	48baf00f54	epoch: convert zpcpu_get_cpua(.., curcpu) to zpcpu_get	2020-02-12 11:10:10 +00:00
Gleb Smirnoff	4426b2e64b	Add flag to struct task to mark the task as requiring network epoch. When processing a taskqueue, if a task has an associated epoch, enter it for the duration of the task. If consecutive tasks belong to the same epoch, batch them. For now this covers the network epoch only. Shrink the ta_priority size to 8 bits. No current consumers use a priority that won't fit into 8 bits. Also, the complexity of taskqueue_enqueue() is the square of the maximum priority value, so we are unlikely to ever want to go over UCHAR_MAX here. Reviewed by: hselasky Differential Revision: https://reviews.freebsd.org/D23518	2020-02-11 18:48:07 +00:00
Mateusz Guzik	57349a4f41	vfs: fix vhold race in mnt_vnode_next_lazy_relock vdrop can set the hold count to 0 and wait for the ->mnt_listmtx held by mnt_vnode_next_lazy_relock caller. The routine incorrectly asserted the count has to be > 0. Reported by: pho Tested by: pho	2020-02-11 18:19:56 +00:00
Mateusz Guzik	1b853b62f3	capsicum: restore the cap_rights_contains symbol It is expected to be provided by libc. PR: 244033 Reported by: Jan Kokemueller	2020-02-11 18:13:53 +00:00
Mateusz Guzik	2e57c8fde7	vfs: fix device count leak on vrele racing with vgone The race is: CPU2 is in devfs_reclaim_vchr; CPU1 makes v_usecount 0; CPU2 takes VI_LOCK, sees v_usecount == 0 and makes no updates, sets vp->v_rdev = NULL, ..., then does VI_UNLOCK; CPU1 takes VI_LOCK and v_decr_devcount sees v_rdev == NULL and makes no updates. In this scenario the si_devcount decrement is not performed. Note this can only happen if the vnode lock is not held. Reviewed by: kib Tested by: pho Differential Revision: https://reviews.freebsd.org/D23529	2020-02-10 22:28:54 +00:00
Li-Wen Hsu	37d4ece7c5	Restore the behavior of allowing empty string in a string sysctl Added as a special case to avoid unnecessary memory operations. Reviewed by: delphij Sponsored by: The FreeBSD Foundation	2020-02-10 20:53:59 +00:00
Hans Petter Selasky	f912e8f2ff	Fix for unbalanced EPOCH(9) usage in the generic kernel interrupt handler. Interrupt handlers are removed via intr_event_execute_handlers() when IH_DEAD is set. The thread removing the interrupt is woken up, and calls intr_event_update(). When this happens, the ie_hflags are cleared and re-built from all the remaining handlers sharing the event. When the last IH_NET handler is removed, the IH_NET flag will be cleared from ih_hflags (or ie_hflags may still be being rebuilt in a different context), and the ithread_execute_handlers() may return with ie_hflags missing IH_NET. This can lead to a scenario where IH_NET was present before calling ithread_execute_handlers, and is not present at its return, meaning the need for epoch must be cached locally. This can happen when loading and unloading network drivers. Also make sure the ie_hflags is not cleared before being updated. This is a regression issue after r357004. Backtrace: panic() # trying to access epoch tracker on stack of dead thread _epoch_enter_preempt() ifunit_ref() ifioctl() fo_ioctl() kern_ioctl() sys_ioctl() syscallenter() amd64_syscall() Differential Revision: https://reviews.freebsd.org/D23483 Reviewed by: glebius@, gallatin@, mav@, jeff@ and kib@ Sponsored by: Mellanox Technologies	2020-02-10 20:23:08 +00:00
Mateusz Guzik	cd951a0d8e	vfs: fix lock recursion in vrele vrele is supposed to be called with an unlocked vnode, but this was never asserted for if v_usecount was > 0. For such counts the lock is never touched by the routine. As a result the kernel has several consumers which expect vunref semantics and get away with calling vrele since they happen to never do it when this is the last reference (and for some of them this may happen to be a guarantee). Work around the problem by changing vrele semantics to tolerate being called with a lock. This eliminates a possible bug where the lock is already held and vputx takes it anyway. Reviewed by: kib Tested by: pho Differential Revision: https://reviews.freebsd.org/D23528	2020-02-10 13:54:34 +00:00
Konstantin Belousov	48fcb46311	Add sysctl kern.proc.sigfastblk for reporting sigfastblock word address. Tested by: pho Discussed with: cem, emaste, jilles Sponsored by: The FreeBSD Foundation Differential revision: https://reviews.freebsd.org/D12773	2020-02-09 12:29:51 +00:00
Konstantin Belousov	944cf37bb5	Add AT_BSDFLAGS auxv entry. The intent is to provide bsd-specific flags relevant to the interpreter and C runtime. I did not want to reuse AT_FLAGS, which is a common ELF auxv entry. Use bsdflags to report kernel support for sigfastblock(2). This allows rtld and libthr to safely infer the syscall presence without SIGSYS. The tunable kern.elf{32,64}.sigfastblock blocks reporting. Tested by: pho Discussed with: cem, emaste, jilles Sponsored by: The FreeBSD Foundation Differential revision: https://reviews.freebsd.org/D12773	2020-02-09 12:10:37 +00:00
Konstantin Belousov	f88c67a625	Regen.	2020-02-09 11:53:37 +00:00
Konstantin Belousov	146fc63fce	Add a way to manage thread signal mask using shared word, instead of syscall. A new syscall sigfastblock(2) is added which registers a uint32_t variable as containing the count of blocks for signal delivery. Its content is read by the kernel on each syscall entry and on AST processing; a non-zero count of blocks is interpreted the same as a signal mask blocking all signals. The biggest downside of the feature that I see is that memory corruption that affects the registered fast sigblock location would cause quite strange application misbehavior. For instance, the process would be immune to ^C (but killable by SIGKILL). With consumers (rtld and libthr) added, benchmarks do not show a slow-down of the syscalls in micro-measurements, and macro benchmarks like buildworld do not demonstrate a difference. Part of the reason is that buildworld time is dominated by the compiler, and clang already links to libthr. On the other hand, small utilities typically used by shell scripts have the total number of syscalls cut by half. The syscall is not exported from the stable libc version namespace on purpose. It is intended to be used only by our C runtime implementation internals. Tested by: pho Discussed with: cem, emaste, jilles Sponsored by: The FreeBSD Foundation Differential revision: https://reviews.freebsd.org/D12773	2020-02-09 11:53:12 +00:00
Mateusz Guzik	2f7f11b7de	vfs: tidy up vget_finish and vn_lock - remove assertion which duplicates vn_lock - use VNPASS instead of retyping the failure - report what flags were passed if panicking on them	2020-02-08 15:52:20 +00:00
Mateusz Guzik	3eb6b656c2	vfs: remove now useless ENODEV handling from vn_fullpath consumers Noted by: ngie	2020-02-08 15:51:08 +00:00
Konstantin Belousov	300b525d29	Correct the function name in the comment. Sponsored by: The FreeBSD Foundation MFC after: 3 days	2020-02-08 15:06:06 +00:00
Mateusz Guzik	ea77ce6ef9	rms: use newly added zpcpu routines instead of direct access where appropriate	2020-02-07 22:44:41 +00:00
Jeff Roberson	a40068e524	Fix a race in smr_advance() that could result in unnecessary poll calls. This was relatively harmless but surprising to see in counters. The race occurred when rd_seq was read after the goal was updated and we incorrectly calculated the delta between them. Reviewed by: rlibby Differential Revision: https://reviews.freebsd.org/D23464	2020-02-06 20:51:46 +00:00
Jeff Roberson	8d7f16a5db	Add some global counters for SMR. These may eventually become per-smr counters. In my stress test there is only one poll for every 15,000 frees. This means we are effectively amortizing the cache coherency overhead even with very high write rates (3M/s/core). Reviewed by: markj, rlibby Differential Revision: https://reviews.freebsd.org/D23463	2020-02-06 20:10:21 +00:00
Pawel Biernacki	210176ad76	sysctl(9): add CTLFLAG_NEEDGIANT flag Add CTLFLAG_NEEDGIANT flag (modelled after D_NEEDGIANT) that will be used to mark sysctls that still require locking Giant. Rewrite sysctl_handle_string() to use internal locking instead of locking Giant. Mark SYSCTL_STRING, SYSCTL_OPAQUE and their variants as MPSAFE. Add infrastructure support for enforcing proper use of CTLFLAG_NEEDGIANT and CTLFLAG_MPSAFE flags with SYSCTL_PROC and SYSCTL_NODE, not enabled yet. Reviewed by: kib (mentor) Approved by: kib (mentor) Differential Revision: https://reviews.freebsd.org/D23378	2020-02-06 12:45:58 +00:00
Mark Johnston	d3631aa582	Avoid releasing object PIP in vn_sendfile() if no pages were grabbed. sendfile(2) optionally takes a set of headers that get prepended to the file data. If the request length is less than that of the headers, sendfile may not allocate an sfio structure, in which case its pointer is null and we should be careful not to dereference. This was introduced in r356902. Reported by: syzkaller Sponsored by: The FreeBSD Foundation	2020-02-05 16:09:21 +00:00
Leandro Lupori	eb5a41cf2f	Add SYSCTL to get KERNBASE and relocated KERNBASE This change adds 2 new SYSCTLs, to retrieve the original and relocated KERNBASE values. This provides an easy, architecture independent way to calculate the running kernel displacement (current/load address minus original base address). The initial goal for this change is to add a new libkvm function that returns the kernel displacement, both for live kernels and crashdumps. This would in turn be used by kgdb to find out how to relocate kernel symbols (if needed). Reviewed by: jhb Differential Revision: https://reviews.freebsd.org/D23284	2020-02-05 11:34:10 +00:00
Mateusz Guzik	1a9fe4528b	fd: always nullify fdp in fget routines Some consumers depend on the pointer being NULL if an error is returned. The guarantee got broken in r357469. Reported by: https://syzkaller.appspot.com/bug?extid=0c9b05e2b727aae21eef Noted by: markj	2020-02-05 00:20:26 +00:00
Ryan Libby	10c8fb47d9	uma: convert mbuf_jumbo_alloc to UMA_ZONE_CONTIG & tag others Remove mbuf_jumbo_alloc and let large mbuf zones use the new uma default contig allocator (a copy of mbuf_jumbo_alloc). Tag other zones which require contiguous objects, even if they don't use the new default contig allocator, so that uma knows about their constraints. Reviewed by: jeff, markj Sponsored by: Dell EMC Isilon Differential Revision: https://reviews.freebsd.org/D23238	2020-02-04 22:40:23 +00:00
Konstantin Belousov	0783b70974	Remove unneeded assert for curproc. Simplify. Reported by: syzkaller by markj Sponsored by: The FreeBSD Foundation	2020-02-04 21:02:08 +00:00
Mark Johnston	60185d649b	Correct the malloc tag used when freeing the temporary semop(2) buffer. MFC after: 1 week Sponsored by: The FreeBSD Foundation	2020-02-04 20:00:45 +00:00
Dmitry Chagin	cbc1089190	For code reuse in Linuxulator rename get_proccess_cputime() and get_thread_cputime() and add prototypes for it to <sys/syscallsubr.h>. As both functions become a public interface add process lock assert to ensure that the process is not exiting under it. Fix whitespace nit while here. Reviewed by: kib Differential Revision: https://reviews.freebsd.org/D23340 MFC after 2 weeks	2020-02-04 05:25:51 +00:00
Jeff Roberson	bc6509845d	Implement a deferred write advancement feature that can be used to further amortize shared cacheline writes. Discussed with: rlibby Differential Revision: https://reviews.freebsd.org/D23462	2020-02-04 02:44:52 +00:00
Jeff Roberson	c8ea36e881	Fix a recursion on the thread lock by acquiring it after calling rtp_to_pri(). Reported by: swills Reviewed by: kib, markj Differential Revision: https://reviews.freebsd.org/D23495	2020-02-04 02:42:54 +00:00
Mark Johnston	e489450589	Fix the !SMP case in sched_add() after r355779. If the thread's lock is already that of the runqueue, don't recurse on the queue lock. Reviewed by: jeff, kib Sponsored by: The FreeBSD Foundation Differential Revision: https://reviews.freebsd.org/D23492	2020-02-03 22:49:05 +00:00
Mateusz Guzik	8151b6e92a	fd: partially unengrish the previous commit	2020-02-03 22:34:50 +00:00
Mateusz Guzik	e10f063b30	fd: streamline fget_unlocked clang has the unfortunate property of paying little attention to prediction hints when faced with a loop spanning the majority of the routine. In particular fget_unlocked has an unlikely corner case where it starts almost from scratch. Faced with this clang generates a maze of taken jumps, whereas gcc produces jump-free code (in the expected case). Work around the problem by providing a variant which only tries once and resorts to calling the original code if anything goes wrong. While here note that the 'seq' parameter is almost never passed, thus the few users that need it are redirected to call the original code directly.	2020-02-03 22:32:49 +00:00
Mateusz Guzik	52604ed792	fd: remove the seq argument from fget_unlocked It is almost always NULL.	2020-02-03 22:27:55 +00:00
Mateusz Guzik	7f1566f884	fd: remove the seq argument from fget routines It is almost always NULL.	2020-02-03 22:27:03 +00:00
Mateusz Guzik	0a1427c5ab	ktrace: provide ktrstat_error This eliminates a branch from its consumers trading it for an extra call if ktrace is enabled for curthread. Given that this is almost never true, the tradeoff is worth it.	2020-02-03 22:26:00 +00:00
Gleb Smirnoff	0017b2adac	A couple of protocol drain routines (frag6_drain and sctp_drain) may send packets, which is unexpected behaviour for a memory reclamation routine. Regardless, we need to enter the network epoch to do that.	2020-02-03 20:48:57 +00:00
Kyle Evans	3d62f685d5	namei: preserve errors from fget_cap_locked Most notably, we want to make sure we don't clobber any capabilities-related errors. This is a regression from r357412 (O_SEARCH) that was picked up by the capsicum tests. PR: 243839 Reviewed by: kib (committed form recommended by) Tested by: lwhsu Differential Revision: https://reviews.freebsd.org/D23479	2020-02-03 18:59:07 +00:00
Warner Losh	58aa35d429	Remove sparc64 kernel support Remove all sparc64 specific files Remove all sparc64 ifdefs Remove indirect sparc64 ifdefs	2020-02-03 17:35:11 +00:00
Mateusz Guzik	bcd1cf4f03	capsicum: faster cap_rights_contains Instead of doing a 2 iteration loop (determined at runtime), take advantage of the fact that the size is already known. While here provide cap_check_inline so that fget_unlocked does not have to do a function call. Verified with the capsicum suite /usr/tests.	2020-02-03 17:08:11 +00:00
Mateusz Guzik	fee204544e	fd: fix f_count acquire in fget_unlocked The code was using a hand-rolled fcmpset loop, while in other places the same count is manipulated with the refcount API. This transferred from a stylistic issue into a bug after the API got extended to support flags. As a result the hand-rolled loop could bump the count high enough to set the bit flag. Another bump + refcount_release would then free the file prematurely. The bug is only present in -CURRENT.	2020-02-03 14:28:31 +00:00
Mateusz Guzik	f1fa1ba3d0	Fix up various vnode-related asserts which did not dump the used vnode	2020-02-03 14:25:32 +00:00
Kyle Evans	6a5abb1ee5	Provide O_SEARCH O_SEARCH is defined by POSIX [0] to open a directory for searching, skipping permissions checks on the directory itself after the initial open(). This is close to the semantics we've historically applied for O_EXEC on a directory, which is UB according to POSIX. Conveniently, O_SEARCH on a file is also explicitly undefined behavior according to POSIX, so O_EXEC would be a fine choice. The spec goes on to state that O_SEARCH and O_EXEC need not be distinct values, but they're not defined to be the same value. This was pointed out as an incompatibility with other systems that had made its way into libarchive, which had assumed that O_EXEC was an alias for O_SEARCH. This defines compatibility O_SEARCH/FSEARCH (equivalent to O_EXEC and FEXEC respectively) and expands our UB for O_EXEC on a directory. O_EXEC on a directory is checked in vn_open_vnode already, so for completeness we add a NOEXECCHECK when O_SEARCH has been specified on the top-level fd and do not re-check that when descending in namei. [0] https://pubs.opengroup.org/onlinepubs/9699919799/ Reviewed by: kib Differential Revision: https://reviews.freebsd.org/D23247	2020-02-02 16:34:57 +00:00
Mateusz Guzik	2568d5bb79	fd: sprinkle some predicts around fget clang inlines fget -> _fget into kern_fstat and eliminates several checks, but prior to this change it would assume fget_unlocked was likely to fail and consequently avoidable jumps got generated.	2020-02-02 09:38:40 +00:00
Mateusz Guzik	da4f45ea5c	fd: use atomic_load_ptr instead of hand-rolled cast through volatile No change in assembly.	2020-02-02 09:37:16 +00:00
Mateusz Guzik	6698e11f4b	vfs: remove the now empty vop_unlock_post	2020-02-02 09:36:32 +00:00
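
The f_count bug described in fee204544e (a hand-rolled compare-and-swap loop carrying the count into a flag bit) can be illustrated with a minimal user-space sketch. Everything here is an assumption made for illustration: the `sk_` names, the choice of the top bit as the flag, and the use of C11 atomics. It is not the kernel's refcount code, only the general shape of the problem and of a checked acquire.

```c
#include <stdatomic.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical layout: the top bit is a flag, the low 31 bits are the count. */
#define SK_FLAG		0x80000000u
#define SK_COUNT_MAX	(SK_FLAG - 1)

/*
 * Naive hand-rolled acquire: it only refuses a zero count, so one more
 * increment at SK_COUNT_MAX carries into the flag bit.
 */
static bool
sk_acquire_naive(_Atomic uint32_t *cnt)
{
	uint32_t old = atomic_load(cnt);

	for (;;) {
		if (old == 0)
			return (false);
		/* On failure 'old' is reloaded with the current value. */
		if (atomic_compare_exchange_weak(cnt, &old, old + 1))
			return (true);
	}
}

/*
 * Checked acquire: also refuses when the flag is already set or when the
 * increment would overflow the count field into the flag bit.
 */
static bool
sk_acquire_checked(_Atomic uint32_t *cnt)
{
	uint32_t old = atomic_load(cnt);

	for (;;) {
		if (old == 0 || (old & SK_FLAG) != 0 || old == SK_COUNT_MAX)
			return (false);
		if (atomic_compare_exchange_weak(cnt, &old, old + 1))
			return (true);
	}
}
```

With a counter sitting at `SK_COUNT_MAX`, the naive variant succeeds and leaves the flag bit set, while the checked variant fails and leaves the value untouched. Keeping all manipulation of one counter behind a single API avoids this kind of divergence when the encoding changes.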

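
The idea behind bcd1cf4f03 (replacing a loop whose bound is determined at runtime with straight-line code once the size is known to be 2) can be sketched as follows. The `sk_rights` type and both function names are hypothetical, and the sketch treats the rights as two plain 64-bit masks; the real cap_rights_t encoding has more to it than this.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define SK_NWORDS	2

/* Hypothetical stand-in for a rights set: two plain bitmask words. */
struct sk_rights {
	uint64_t r[SK_NWORDS];
};

/* Loop variant: the iteration count is only known at runtime. */
static bool
sk_contains_loop(const struct sk_rights *big, const struct sk_rights *little,
    size_t n)
{
	for (size_t i = 0; i < n; i++) {
		if ((big->r[i] & little->r[i]) != little->r[i])
			return (false);
	}
	return (true);
}

/* Fixed-size variant: two words, no loop, easy to inline at call sites. */
static inline bool
sk_contains_fixed(const struct sk_rights *big, const struct sk_rights *little)
{
	return ((big->r[0] & little->r[0]) == little->r[0] &&
	    (big->r[1] & little->r[1]) == little->r[1]);
}
```

Both variants answer the same question, namely whether every bit set in `little` is also set in `big`; the fixed-size one simply gives the compiler nothing to iterate over.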

17205 Commits