freebsd-nq

Author	SHA1	Message	Date
Mateusz Guzik	9b0c2e5909	vfs: expand on vhold_smr comment	2020-07-06 02:00:35 +00:00
Mateusz Guzik	f8022be3e6	vfs: protect vnodes with smr vget_prep_smr and vhold_smr can be used to ref a vnode while within vfs_smr section, allowing consumers to get away without locking. See vhold_smr and vdropl for comments explaining caveats. Reviewed by: kib Testec by: pho Differential Revision: https://reviews.freebsd.org/D23913	2020-07-01 05:56:29 +00:00
Ryan Moeller	245bfd34da	Deduplicate fsid comparisons Comparing fsid_t objects requires internal knowledge of the fsid structure and yet this is duplicated across a number of places in the code. Simplify by creating a fsidcmp function (macro). Reviewed by: mjg, rmacklem Approved by: mav (mentor) MFC after: 1 week Sponsored by: iXsystems, Inc. Differential Revision: https://reviews.freebsd.org/D24749	2020-05-21 01:55:35 +00:00
Chuck Silvers	f15ccf8836	Add a new "mntfs" pseudo file system which provides private device vnodes for file systems to safely access their disk devices, and adapt FFS to use it. Also add a new BO_NOBUFS flag to allow enforcing that file systems using mntfs vnodes do not accidentally use the original devfs vnode to create buffers. Reviewed by: kib, mckusick Approved by: imp (mentor) Sponsored by: Netflix Differential Revision: https://reviews.freebsd.org/D23787	2020-03-06 18:41:37 +00:00
Ryan Libby	2782c00c04	vfs: quiet -Wwrite-strings Reviewed by: kib, markj Differential Revision: https://reviews.freebsd.org/D23797	2020-02-23 03:32:11 +00:00
Jeff Roberson	6c5f36ff30	Eliminate some unnecessary uses of UMA_ZONE_VM. Only zones involved in virtual address or physical page allocation need to be marked with this flag. Reviewed by: markj Tested by: pho Differential Revision: https://reviews.freebsd.org/D23712	2020-02-19 08:17:27 +00:00
Mateusz Guzik	3403d5245e	vfs: fix vlrureclaim ->v_object access The routine was checking for ->v_type == VBAD. Since vgone drops the interlock early sets this type at the end of the process of dooming a vnode, this opens a time window where it can clear the pointer while the inerlock-holders is accessing it. Another note is that the code was: (vp->v_object != NULL && vp->v_object->resident_page_count > trigger) With the compiler being fully allowed to emit another read to get the pointer, and in fact it did on the kernel used by pho. Use atomic_load_ptr and remember the result. Note that this depends on type-safety of vm_object. Reported by: pho	2020-02-16 03:33:34 +00:00
Mateusz Guzik	c615009461	vfs: check early for VCHR in vput_final to short-circuit in the common case Otherwise the compiler inlines v_decr_devcount which keps getting jumped over in the common case of not dealing with a device.	2020-02-16 03:16:28 +00:00
Mateusz Guzik	df0d5a2a85	vfs: remove no longer needed atomic_load_ptr casts	2020-02-14 23:18:32 +00:00
Mateusz Guzik	4602214772	vfs: refactor vputx and add more comment Reviewed by: jeff (previous version) Tested by: pho (previous version) Differential Revision: https://reviews.freebsd.org/D23530	2020-02-12 11:19:07 +00:00
Mateusz Guzik	123c519731	vfs: switch to smp_rendezvous_cpus_retry for vfs_op_thread_enter/exit In particular on amd64 this eliminates an atomic op in the common case, trading it for IPIs in the uncommon case of catching CPUs executing the code while the filesystem is getting suspended or unmounted.	2020-02-12 11:17:45 +00:00
Mateusz Guzik	57349a4f41	vfs: fix vhold race in mnt_vnode_next_lazy_relock vdrop can set the hold count to 0 and wait for the ->mnt_listmtx held by mnt_vnode_next_lazy_relock caller. The routine incorrectly asserted the count has to be > 0. Reported by: pho Tested by: pho	2020-02-11 18:19:56 +00:00
Mateusz Guzik	2e57c8fde7	vfs: fix device count leak on vrele racing with vgone The race is: CPU1 CPU2 devfs_reclaim_vchr make v_usecount 0 VI_LOCK sees v_usecount == 0, no updates vp->v_rdev = NULL; ... VI_UNLOCK VI_LOCK v_decr_devcount sees v_rdev == NULL, no updates In this scenario si_devcount decrement is not performed. Note this can only happen if the vnode lock is not held. Reviewed by: kib Tested by: pho Differential Revision: https://reviews.freebsd.org/D23529	2020-02-10 22:28:54 +00:00
Mateusz Guzik	cd951a0d8e	vfs: fix lock recursion in vrele vrele is supposed to be called with an unlocked vnode, but this was never asserted for if v_usecount was > 0. For such counts the lock is never touched by the routine. As a result the kernel has several consumers which expect vunref semantics and get away with calling vrele since they happen to never do it when this is the last reference (and for some of them this may happen to be a guarantee). Work around the problem by changing vrele semantics to tolerate being called with a lock. This eliminates a possible bug where the lock is already held and vputx takes it anyway. Reviewed by: kib Tested by: pho Differential Revision: https://reviews.freebsd.org/D23528	2020-02-10 13:54:34 +00:00
Mateusz Guzik	2f7f11b7de	vfs: tidy up vget_finish and vn_lock - remove assertion which duplicates vn_lock - use VNPASS instead of retyping the failure - report what flags were passed if panicking on them	2020-02-08 15:52:20 +00:00
Kyle Evans	6a5abb1ee5	Provide O_SEARCH O_SEARCH is defined by POSIX [0] to open a directory for searching, skipping permissions checks on the directory itself after the initial open(). This is close to the semantics we've historically applied for O_EXEC on a directory, which is UB according to POSIX. Conveniently, O_SEARCH on a file is also explicitly undefined behavior according to POSIX, so O_EXEC would be a fine choice. The spec goes on to state that O_SEARCH and O_EXEC need not be distinct values, but they're not defined to be the same value. This was pointed out as an incompatibility with other systems that had made its way into libarchive, which had assumed that O_EXEC was an alias for O_SEARCH. This defines compatibility O_SEARCH/FSEARCH (equivalent to O_EXEC and FEXEC respectively) and expands our UB for O_EXEC on a directory. O_EXEC on a directory is checked in vn_open_vnode already, so for completeness we add a NOEXECCHECK when O_SEARCH has been specified on the top-level fd and do not re-check that when descending in namei. [0] https://pubs.opengroup.org/onlinepubs/9699919799/ Reviewed by: kib Differential Revision: https://reviews.freebsd.org/D23247	2020-02-02 16:34:57 +00:00
Mateusz Guzik	6698e11f4b	vfs: remove the now empty vop_unlock_post	2020-02-02 09:36:32 +00:00
Mateusz Guzik	643656cfaf	vfs: replace VOP_MARKATIME with VOP_MMAPPED The routine is only provided by ufs and is only used on mmap and exec. Reviewed by: kib Differential Revision: https://reviews.freebsd.org/D23422	2020-02-01 06:46:55 +00:00
Mateusz Guzik	21c4f1041e	vfs: add vrefactn Differential Revision: https://reviews.freebsd.org/D23427	2020-02-01 06:39:49 +00:00
Mateusz Guzik	0f4d8b77c0	vfs: revert the overzealous assert added in r357285 to vgone The intent was to make it more likely to catch filesystems with custom need_inactive routines which fail to call vn_need_pageq_flush (or do an equivalent). One immediate case which is missed is vgone from called by inactive itself. A better assertion may land later. The routine is not added to vputx because it is of no use to tmpfs et al. Reported by: syzbot+5f697ec11f89b60941db@syzkaller.appspotmail.com	2020-01-31 11:31:14 +00:00
Mateusz Guzik	3ff65f71cb	Remove duplicated empty lines from kern/*.c No functional changes.	2020-01-30 20:05:05 +00:00
Mateusz Guzik	c2ef6aa3d5	vfs: assert that doomed vnodes don't need to call vm_object_page_clean ... after the optional inactive processing.	2020-01-30 04:59:08 +00:00
Mateusz Guzik	07c6e2f4ab	vfs: unlazy before dooming the vnode With this change having the listmtx lock held postpones dooming the vnode. Use this fact to simplify iteration over the lazy list. It also allows filters to safely access ->v_data. Reviewed by: kib (early version) Differential Revision: https://reviews.freebsd.org/D23397	2020-01-30 02:12:52 +00:00
Gleb Smirnoff	79674264df	Fix text format definition for kern.maxvnodes, vfs.wantfreevnodes. This is a regression from r356642, r356645.	2020-01-30 00:18:00 +00:00
Mateusz Guzik	1513f80391	vfs: do an unlocked check before iterating the lazy list For most filesystems it is expected to be empty most of the time.	2020-01-26 07:06:18 +00:00
Mateusz Guzik	6d69e665dd	vfs: fix freevnodes count update race against preemption vdbatch_process leaves the critical section too early, openign a time window where another thread can get scheduled and modify vd->freevnodes. Once it the preempted thread gets back it overrides the value with 0. Just move critical_exit to the end of the function.	2020-01-26 00:40:27 +00:00
Mateusz Guzik	dc9a1cb60b	vfs: predict vn_lock failure as unlikely in vget	2020-01-26 00:34:57 +00:00
Mateusz Guzik	28eb39a5ab	vfs: allow v_usecount to transition 0->1 without the interlock There is nothing to do but to bump the count even during said transition. There are 2 places which can do it: - vget only does this after locking the vnode, meaning there is no change in contract versus inactive or reclamantion - vref only ever did it with the interlock held which did not protect against either (that is, it would always succeed) VCHR vnodes retain special casing due to the need to maintain dev use count. Reviewed by: jeff, kib Tested by: pho (previous version) Differential Revision: https://reviews.freebsd.org/D23185	2020-01-24 07:47:44 +00:00
Mateusz Guzik	d93762b94d	vfs: stop handling VI_OWEINACT in vget vget is almost always called with LK_SHARED, meaning the flag (if present) is almost guaranteed to get cleared. Stop handling it in the first place and instead let the thread which wanted to do inactive handle the bumepd usecount. Reviewed by: jeff Tested by: pho Differential Revision: https://reviews.freebsd.org/D23184	2020-01-24 07:45:59 +00:00
Mateusz Guzik	74c4b7cc60	vfs: stop unlocking the vnode upfront in vput Doing so runs into races with filesystems which make half-constructed vnodes visible to other users, while depending on the chain vput -> vinactive -> vrecycle to be executed without dropping the vnode lock. Impediments for making this work got cleared up (notably vop_unlock_post now does not do anything and lockmgr stops touching the lock after the final write). Stacked filesystems keep vhold/vdrop across unlock, which arguably can now be eliminated. Reviewed by: jeff Differential Revision: https://reviews.freebsd.org/D23344	2020-01-24 07:44:25 +00:00
Mateusz Guzik	28479aaae2	vfs: allow v_holdcnt to transition 0->1 without the interlock Since r356672 ("vfs: rework vnode list management") there is nothing to do apart from altering freevnodes count, but this much can be safely done based on the result of atomic_fetchadd. Reviewed by: kib Tested by: pho Differential Revision: https://reviews.freebsd.org/D23186	2020-01-19 17:47:04 +00:00
Mateusz Guzik	512fa9a4e0	vfs: plug a conditional assigment of lo_name in getnewvnode It only matters for witness. No functional changes.	2020-01-19 05:36:45 +00:00
Mateusz Guzik	2d0c620272	vfs: distribute freevnodes counter per-cpu It gets rolled up to the global when deferred requeueing is performed. A dedicated read routine makes sure to return a value only off by a certain amount. This soothes a global serialisation point for all 0<->1 hold count transitions. Reviewed by: jeff Differential Revision: https://reviews.freebsd.org/D23235	2020-01-18 01:29:02 +00:00
Mateusz Guzik	1ad72b270c	vfs: shorten lock hold time in vdbatch_process	2020-01-17 14:39:00 +00:00
Mateusz Guzik	66f67d5e5e	vfs: increment numvnodes without the vnode list lock unless under pressure The vnode list lock is only needed to reclaim free vnodes or kick the vnlru thread (or to block and not miss a wake up (but note the sleep has a timeout so this would not be a correctness issue)). Try to get away without the lock by just doing an atomic increment. The lock is contended e.g., during poudriere -j 104 where about half of all acquires come from vnode allocation code. Note the entire scheme needs a rewrite, the above just reduces it's SMP impact. Reviewed by: kib Differential Revision: https://reviews.freebsd.org/D23140	2020-01-16 21:45:21 +00:00
Mateusz Guzik	b7f50b9ad1	vfs: refcator vnode allocation Semantics are almost identical. Some code is deduplicated and there are fewer memory accesses. Reviewed by: kib, jeff Differential Revision: https://reviews.freebsd.org/D23158	2020-01-16 21:43:13 +00:00
Mateusz Guzik	875cfc082d	vfs: reimplement vlrureclaim to actually use LRU Take advantage of global ordering introduced in r356672. Reviewed by: mckusick (previous version) Differential Revision: https://reviews.freebsd.org/D23067	2020-01-16 10:44:02 +00:00
Mateusz Guzik	0c236d3d52	vfs: per-cpu batched requeuing of free vnodes Constant requeuing adds significant lock contention in certain workloads. Lessen the problem by batching it. Per-cpu areas are locked in order to synchronize against UMA freeing memory. vnode's v_mflag is converted to short to prevent the struct from growing. Sample result from an incremental make -s -j 104 bzImage on tmpfs: stock: 122.38s user 1780.45s system 6242% cpu 30.480 total patched: 144.84s user 985.90s system 4856% cpu 23.282 total Reviewed by: jeff Tested by: pho (in a larger patch, previous version) Differential Revision: https://reviews.freebsd.org/D22998	2020-01-13 02:39:41 +00:00
Mateusz Guzik	cc3593fbd9	vfs: rework vnode list management The current notion of an active vnode is eliminated. Vnodes transition between 0<->1 hold counts all the time and the associated traversal between different lists induces significant scalability problems in certain workloads. Introduce a global list containing all allocated vnodes. They get unlinked only when UMA reclaims memory and are only requeued when hold count reaches 0. Sample result from an incremental make -s -j 104 bzImage on tmpfs: stock: 118.55s user 3649.73s system 7479% cpu 50.382 total patched: 122.38s user 1780.45s system 6242% cpu 30.480 total Reviewed by: jeff Tested by: pho (in a larger patch, previous version) Differential Revision: https://reviews.freebsd.org/D22997	2020-01-13 02:37:25 +00:00
Mateusz Guzik	57083d2576	vfs: add per-mount vnode lazy list and use it for deferred inactive + msync This obviates the need to scan the entire active list looking for vnodes of interest. msync is handled by adding all vnodes with write count to the lazy list. deferred inactive directly adds vnodes as it sets the VI_DEFINACT flag. Vnodes get dequeued from the list when their hold count reaches 0. Newly added MNT_VNODE_FOREACH_LAZY* macros support filtering so that spurious locking is avoided in the common case. Reviewed by: jeff Tested by: pho (in a larger patch, previous version) Differential Revision: https://reviews.freebsd.org/D22995	2020-01-13 02:34:02 +00:00
Mateusz Guzik	879e0604ee	Add KERNEL_PANICKED macro for use in place of direct panicstr tests	2020-01-12 06:07:54 +00:00
Mateusz Guzik	91de98e6d4	vfs: only recalculate watermarks when limits are changing Previously they would get recalculated all the time, in particular in: getnewvnode -> vcheckspace -> vspace	2020-01-11 23:00:57 +00:00
Mateusz Guzik	e6ae744e0e	vfs: deduplicate vnode allocation logic This creates a dedicated routine (vn_alloc) to allocate vnodes. As a side effect code duplicationw with getnewvnode_reserve is eleminated. Add vn_free for symmetry.	2020-01-11 22:59:44 +00:00
Mateusz Guzik	b52d50cf69	vfs: prealloc vnodes in getnewvnode_reserve Having a reserved vnode count does not guarantee that getnewvnodes wont block later. Said blocking partially defeats the purpose of reserving in the first place. Preallocate instaed. The only consumer was always passing "1" as count and never nesting reservations.	2020-01-11 22:58:14 +00:00
Mateusz Guzik	6928306764	vfs: incomplete pass at converting more ints to u_long Most notably numvnodes and freevnodes were u_long, but parameters used to govern them remained as ints.	2020-01-11 22:56:20 +00:00
Mateusz Guzik	bf62296f35	vfs: add missing CLTFLA_MPSAFE annotations This covers all kern/vfs_*.c files.	2020-01-11 22:55:12 +00:00
Mateusz Guzik	a9a047bc87	vfs: handle doomed vnodes in vdefer_inactive vgone dooms the vnode while keeping VI_OWEINACT set and then drops the interlock. vputx can pick up the interlock and pass it to vdefer_inactive since the flag is set. The race is harmless, just don't defer anything as vgone will take care of it. Reported by: pho	2020-01-07 20:24:21 +00:00
Mateusz Guzik	c8b3463dd0	vfs: reimplement deferred inactive to use a dedicated flag (VI_DEFINACT) The previous behavior of leaving VI_OWEINACT vnodes on the active list without a hold count is eliminated. Hold count is kept and inactive processing gets explicitly deferred by setting the VI_DEFINACT flag. The syncer is then responsible for vdrop. Reviewed by: kib (previous version) Tested by: pho (in a larger patch, previous version) Differential Revision: https://reviews.freebsd.org/D23036	2020-01-07 15:56:24 +00:00
Mateusz Guzik	b7cc9d1847	vfs: trylock in vfs_msync and refactor the func - use LK_NOWAIT instead of calling VOP_ISLOCKED before deciding to lock - evaluate flags before looping over vnodes Reviewed by: kib Tested by: pho (in a larger patch, previous version) Differential Revision: https://reviews.freebsd.org/D23035	2020-01-07 15:44:19 +00:00
Mateusz Guzik	c92fe112a7	vfs: use a dedicated counter for free vnode recycling Otherwise vlrureclaim activitity is mixed in and it is hard to tell which vnodes got reclaimed.	2020-01-07 15:42:01 +00:00

1 2 3 4 5 ...

1082 Commits