Commit Graph

1150 Commits

Author SHA1 Message Date
Mateusz Guzik
a92a971bbb vfs: remove the thread argument from vget
It was already asserted to be curthread.

Semantic patch:

@@

expression arg1, arg2, arg3;

@@

- vget(arg1, arg2, arg3)
+ vget(arg1, arg2)
2020-08-16 17:18:54 +00:00
Mateusz Guzik
36f47512d9 vfs: inline vrefcnt 2020-08-12 04:53:20 +00:00
Mateusz Guzik
4c2d103a02 vfs: garbage collect vrefactn 2020-08-12 04:53:02 +00:00
Mateusz Guzik
6883f07e97 vfs: reimplement vref on top of vget
No change in generated assembly.
2020-08-12 04:52:35 +00:00
Mateusz Guzik
3b44443626 devfs: rework si_usecount to track opens
This removes a lot of special casing from the VFS layer.

Reviewed by:	kib (previous version)
Tested by:	pho (previous version)
Differential Revision:	https://reviews.freebsd.org/D25612
2020-08-11 14:27:57 +00:00
Mateusz Guzik
7f70080150 vfs: disallow NOCACHE with LOOKUP
This means there is no expectation lookup will purge the terminal entry,
which simplifies lockless lookup.

Tested by:	pho
Sponsored by:	The FreeBSD Foundation
2020-08-10 10:33:40 +00:00
Mateusz Guzik
1ff80a3400 vfs: release the interlock after failing to set VHOLD_NO_SMR
While here add more comments.

Diagnosed by:	markj
Reported by:	pho
Fixes:	r362827 ("vfs: protect vnodes with smr")
2020-08-07 19:36:08 +00:00
Mark Johnston
0ffec1b03d Clean up reassignbuf() and buf_vlist_remove() a bit.
- Convert panic() calls to INVARIANTS-only assertions.  The PCTRIE code
  provides some of the same protection since it will panic upon an
  attempt to remove a non-resident buffer.
- Update the comment above reassignbuf() to reflect reality.

Reviewed by:	cem, kib, mjg
MFC after:	1 week
Sponsored by:	The FreeBSD Foundation
Differential Revision:	https://reviews.freebsd.org/D25965
2020-08-06 15:43:15 +00:00
Mark Johnston
7013797e34 Remove the vfs.reassignbufcalls counter and sysctl.
As the 20-year old comment above it suggests, the counter is of dubious
value.  Moreover, the (global) counter was not updated precisely and
hurts scalability.

Reviewed by:	cem, kib, mjg
MFC after:	1 week
Sponsored by:	The FreeBSD Foundation
Differential Revision:	https://reviews.freebsd.org/D25965
2020-08-06 15:42:59 +00:00
Mateusz Guzik
d292b1940c vfs: remove the obsolete privused argument from vaccess
This brings argument count down to 6, which is passable without the
stack on amd64.
2020-08-05 09:27:03 +00:00
Mateusz Guzik
db99ec5656 vfs: support lockless dotdot lookup
Tested by:	pho
2020-08-04 23:07:42 +00:00
Mateusz Guzik
6e10434c02 cache: add cache_purge_vgone
cache_purge locklessly checks whether the vnode at hand has any namecache
entries. This can race with a concurrent purge which managed to remove
the last entry, but may not be done touching the vnode.

Make sure we observe the relevant vnode lock as not taken before proceeding
with vgone.

Paired with the fact that doomed vnodes cannnot receive entries this restores
the invariant that there are no namecache-related writing users past cache_purge
in vgone.

Reported by:	pho
2020-08-04 23:04:29 +00:00
Mateusz Guzik
838984de32 vfs: move namecache initialisation into cache_vnode_init 2020-08-02 19:42:06 +00:00
Mateusz Guzik
848f8effdd vfs: inline vops if there are no pre/post associated calls
This removes a level of indirection from frequently used methods, most notably
VOP_LOCK1 and VOP_UNLOCK1.

Tested by:	pho
2020-07-30 15:50:51 +00:00
Mateusz Guzik
07d2145a17 vfs: add the infrastructure for lockless lookup
Reviewed by:    kib
Tested by:      pho (in a patchset)
Differential Revision:	https://reviews.freebsd.org/D25577
2020-07-25 10:32:45 +00:00
Mateusz Guzik
0379ff6ae3 vfs: introduce vnode sequence counters
Modified on each permission change and link/unlink.

Reviewed by:	kib
Tested by:	pho (in a patchset)
Differential Revision:	https://reviews.freebsd.org/D25573
2020-07-25 10:31:52 +00:00
Conrad Meyer
68ee1dda06 Add unlocked/SMR fast path to getblk()
Convert the bufobj tries to an SMR zone/PCTRIE and add a gbincore_unlocked()
API wrapping this functionality.  Use it for a fast path in getblkx(),
falling back to locked lookup if we raced a thread changing the buf's
identity.

Reported by:	Attilio
Reviewed by:	kib, markj
Testing:	pho (in progress)
Sponsored by:	Isilon
Differential Revision:	https://reviews.freebsd.org/D25782
2020-07-24 17:34:04 +00:00
Mateusz Guzik
422f38d8ea vfs: fix trivial whitespace issues which don't interefere with blame
.. even without the -w switch
2020-07-10 09:01:36 +00:00
Mateusz Guzik
9b0c2e5909 vfs: expand on vhold_smr comment 2020-07-06 02:00:35 +00:00
Mateusz Guzik
f8022be3e6 vfs: protect vnodes with smr
vget_prep_smr and vhold_smr can be used to ref a vnode while within vfs_smr
section, allowing consumers to get away without locking.

See vhold_smr and vdropl for comments explaining caveats.

Reviewed by:	kib
Testec by:	pho
Differential Revision:	https://reviews.freebsd.org/D23913
2020-07-01 05:56:29 +00:00
Ryan Moeller
245bfd34da Deduplicate fsid comparisons
Comparing fsid_t objects requires internal knowledge of the fsid structure
and yet this is duplicated across a number of places in the code.

Simplify by creating a fsidcmp function (macro).

Reviewed by:	mjg, rmacklem
Approved by:	mav (mentor)
MFC after:	1 week
Sponsored by:	iXsystems, Inc.
Differential Revision:	https://reviews.freebsd.org/D24749
2020-05-21 01:55:35 +00:00
Chuck Silvers
f15ccf8836 Add a new "mntfs" pseudo file system which provides private device vnodes for
file systems to safely access their disk devices, and adapt FFS to use it.
Also add a new BO_NOBUFS flag to allow enforcing that file systems using
mntfs vnodes do not accidentally use the original devfs vnode to create buffers.

Reviewed by:	kib, mckusick
Approved by:	imp (mentor)
Sponsored by:	Netflix
Differential Revision:	https://reviews.freebsd.org/D23787
2020-03-06 18:41:37 +00:00
Ryan Libby
2782c00c04 vfs: quiet -Wwrite-strings
Reviewed by:	kib, markj
Differential Revision:	https://reviews.freebsd.org/D23797
2020-02-23 03:32:11 +00:00
Jeff Roberson
6c5f36ff30 Eliminate some unnecessary uses of UMA_ZONE_VM. Only zones involved in
virtual address or physical page allocation need to be marked with this
flag.

Reviewed by:	markj
Tested by:	pho
Differential Revision:	https://reviews.freebsd.org/D23712
2020-02-19 08:17:27 +00:00
Mateusz Guzik
3403d5245e vfs: fix vlrureclaim ->v_object access
The routine was checking for ->v_type == VBAD. Since vgone drops the interlock
early sets this type at the end of the process of dooming a vnode, this opens
a time window where it can clear the pointer while the inerlock-holders is
accessing it.

Another note is that the code was:
	   (vp->v_object != NULL &&
	   vp->v_object->resident_page_count > trigger)

With the compiler being fully allowed to emit another read to get the pointer,
and in fact it did on the kernel used by pho.

Use atomic_load_ptr and remember the result.

Note that this depends on type-safety of vm_object.

Reported by:	pho
2020-02-16 03:33:34 +00:00
Mateusz Guzik
c615009461 vfs: check early for VCHR in vput_final to short-circuit in the common case
Otherwise the compiler inlines v_decr_devcount which keps getting jumped over
in the common case of not dealing with a device.
2020-02-16 03:16:28 +00:00
Mateusz Guzik
df0d5a2a85 vfs: remove no longer needed atomic_load_ptr casts 2020-02-14 23:18:32 +00:00
Mateusz Guzik
4602214772 vfs: refactor vputx and add more comment
Reviewed by:	jeff (previous version)
Tested by:	pho (previous version)
Differential Revision:	https://reviews.freebsd.org/D23530
2020-02-12 11:19:07 +00:00
Mateusz Guzik
123c519731 vfs: switch to smp_rendezvous_cpus_retry for vfs_op_thread_enter/exit
In particular on amd64 this eliminates an atomic op in the common case,
trading it for IPIs in the uncommon case of catching CPUs executing the
code while the filesystem is getting suspended or unmounted.
2020-02-12 11:17:45 +00:00
Mateusz Guzik
57349a4f41 vfs: fix vhold race in mnt_vnode_next_lazy_relock
vdrop can set the hold count to 0 and wait for the ->mnt_listmtx held by
mnt_vnode_next_lazy_relock caller. The routine incorrectly asserted the
count has to be > 0.

Reported by:	pho
Tested by:	pho
2020-02-11 18:19:56 +00:00
Mateusz Guzik
2e57c8fde7 vfs: fix device count leak on vrele racing with vgone
The race is:

CPU1                                CPU2
                                    devfs_reclaim_vchr
make v_usecount 0
                                      VI_LOCK
                                      sees v_usecount == 0, no updates
                                      vp->v_rdev = NULL;
                                      ...
                                      VI_UNLOCK
VI_LOCK
v_decr_devcount
  sees v_rdev == NULL, no updates

In this scenario si_devcount decrement is not performed.

Note this can only happen if the vnode lock is not held.

Reviewed by:	kib
Tested by:	pho
Differential Revision:	https://reviews.freebsd.org/D23529
2020-02-10 22:28:54 +00:00
Mateusz Guzik
cd951a0d8e vfs: fix lock recursion in vrele
vrele is supposed to be called with an unlocked vnode, but this was never
asserted for if v_usecount was > 0. For such counts the lock is never touched
by the routine. As a result the kernel has several consumers which expect
vunref semantics and get away with calling vrele since they happen to never do
it when this is the last reference (and for some of them this may happen to be
a guarantee).

Work around the problem by changing vrele semantics to tolerate being called
with a lock. This eliminates a possible bug where the lock is already held and
vputx takes it anyway.

Reviewed by:	kib
Tested by:	pho
Differential Revision:	https://reviews.freebsd.org/D23528
2020-02-10 13:54:34 +00:00
Mateusz Guzik
2f7f11b7de vfs: tidy up vget_finish and vn_lock
- remove assertion which duplicates vn_lock
- use VNPASS instead of retyping the failure
- report what flags were passed if panicking on them
2020-02-08 15:52:20 +00:00
Kyle Evans
6a5abb1ee5 Provide O_SEARCH
O_SEARCH is defined by POSIX [0] to open a directory for searching, skipping
permissions checks on the directory itself after the initial open(). This is
close to the semantics we've historically applied for O_EXEC on a directory,
which is UB according to POSIX. Conveniently, O_SEARCH on a file is also
explicitly undefined behavior according to POSIX, so O_EXEC would be a fine
choice. The spec goes on to state that O_SEARCH and O_EXEC need not be
distinct values, but they're not defined to be the same value.

This was pointed out as an incompatibility with other systems that had made
its way into libarchive, which had assumed that O_EXEC was an alias for
O_SEARCH.

This defines compatibility O_SEARCH/FSEARCH (equivalent to O_EXEC and FEXEC
respectively) and expands our UB for O_EXEC on a directory. O_EXEC on a
directory is checked in vn_open_vnode already, so for completeness we add a
NOEXECCHECK when O_SEARCH has been specified on the top-level fd and do not
re-check that when descending in namei.

[0] https://pubs.opengroup.org/onlinepubs/9699919799/

Reviewed by:	kib
Differential Revision:	https://reviews.freebsd.org/D23247
2020-02-02 16:34:57 +00:00
Mateusz Guzik
6698e11f4b vfs: remove the now empty vop_unlock_post 2020-02-02 09:36:32 +00:00
Mateusz Guzik
643656cfaf vfs: replace VOP_MARKATIME with VOP_MMAPPED
The routine is only provided by ufs and is only used on mmap and exec.

Reviewed by:	kib
Differential Revision:	https://reviews.freebsd.org/D23422
2020-02-01 06:46:55 +00:00
Mateusz Guzik
21c4f1041e vfs: add vrefactn
Differential Revision:	https://reviews.freebsd.org/D23427
2020-02-01 06:39:49 +00:00
Mateusz Guzik
0f4d8b77c0 vfs: revert the overzealous assert added in r357285 to vgone
The intent was to make it more likely to catch filesystems with custom
need_inactive routines which fail to call vn_need_pageq_flush (or do an
equivalent).

One immediate case which is missed is vgone from called by inactive itself.

A better assertion may land later. The routine is not added to vputx because
it is of no use to tmpfs et al.

Reported by:	syzbot+5f697ec11f89b60941db@syzkaller.appspotmail.com
2020-01-31 11:31:14 +00:00
Mateusz Guzik
3ff65f71cb Remove duplicated empty lines from kern/*.c
No functional changes.
2020-01-30 20:05:05 +00:00
Mateusz Guzik
c2ef6aa3d5 vfs: assert that doomed vnodes don't need to call vm_object_page_clean
... after the optional inactive processing.
2020-01-30 04:59:08 +00:00
Mateusz Guzik
07c6e2f4ab vfs: unlazy before dooming the vnode
With this change having the listmtx lock held postpones dooming the vnode.
Use this fact to simplify iteration over the lazy list. It also allows
filters to safely access ->v_data.

Reviewed by:	kib (early version)
Differential Revision:	https://reviews.freebsd.org/D23397
2020-01-30 02:12:52 +00:00
Gleb Smirnoff
79674264df Fix text format definition for kern.maxvnodes, vfs.wantfreevnodes. This
is a regression from r356642, r356645.
2020-01-30 00:18:00 +00:00
Mateusz Guzik
1513f80391 vfs: do an unlocked check before iterating the lazy list
For most filesystems it is expected to be empty most of the time.
2020-01-26 07:06:18 +00:00
Mateusz Guzik
6d69e665dd vfs: fix freevnodes count update race against preemption
vdbatch_process leaves the critical section too early, openign a time
window where another thread can get scheduled and modify vd->freevnodes.
Once it the preempted thread gets back it overrides the value with 0.

Just move critical_exit to the end of the function.
2020-01-26 00:40:27 +00:00
Mateusz Guzik
dc9a1cb60b vfs: predict vn_lock failure as unlikely in vget 2020-01-26 00:34:57 +00:00
Mateusz Guzik
28eb39a5ab vfs: allow v_usecount to transition 0->1 without the interlock
There is nothing to do but to bump the count even during said transition.
There are 2 places which can do it:
- vget only does this after locking the vnode, meaning there is no change in
  contract versus inactive or reclamantion
- vref only ever did it with the interlock held which did not protect against
  either (that is, it would always succeed)

VCHR vnodes retain special casing due to the need to maintain dev use count.

Reviewed by:	jeff, kib
Tested by:	pho (previous version)
Differential Revision:	https://reviews.freebsd.org/D23185
2020-01-24 07:47:44 +00:00
Mateusz Guzik
d93762b94d vfs: stop handling VI_OWEINACT in vget
vget is almost always called with LK_SHARED, meaning the flag (if present) is
almost guaranteed to get cleared. Stop handling it in the first place and
instead let the thread which wanted to do inactive handle the bumepd usecount.

Reviewed by:	jeff
Tested by:	pho
Differential Revision:	https://reviews.freebsd.org/D23184
2020-01-24 07:45:59 +00:00
Mateusz Guzik
74c4b7cc60 vfs: stop unlocking the vnode upfront in vput
Doing so runs into races with filesystems which make half-constructed vnodes
visible to other users, while depending on the chain vput -> vinactive ->
vrecycle to be executed without dropping the vnode lock.

Impediments for making this work got cleared up (notably vop_unlock_post now
does not do anything and lockmgr stops touching the lock after the final
write). Stacked filesystems keep vhold/vdrop across unlock, which arguably can
now be eliminated.

Reviewed by:	jeff
Differential Revision:	https://reviews.freebsd.org/D23344
2020-01-24 07:44:25 +00:00
Mateusz Guzik
28479aaae2 vfs: allow v_holdcnt to transition 0->1 without the interlock
Since r356672 ("vfs: rework vnode list management") there is nothing to do
apart from altering freevnodes count, but this much can be safely done based
on the result of atomic_fetchadd.

Reviewed by:	kib
Tested by:	pho
Differential Revision:	https://reviews.freebsd.org/D23186
2020-01-19 17:47:04 +00:00
Mateusz Guzik
512fa9a4e0 vfs: plug a conditional assigment of lo_name in getnewvnode
It only matters for witness. No functional changes.
2020-01-19 05:36:45 +00:00
Mateusz Guzik
2d0c620272 vfs: distribute freevnodes counter per-cpu
It gets rolled up to the global when deferred requeueing is performed.
A dedicated read routine makes sure to return a value only off by a certain
amount.

This soothes a global serialisation point for all 0<->1 hold count transitions.

Reviewed by:	jeff
Differential Revision:	https://reviews.freebsd.org/D23235
2020-01-18 01:29:02 +00:00
Mateusz Guzik
1ad72b270c vfs: shorten lock hold time in vdbatch_process 2020-01-17 14:39:00 +00:00
Mateusz Guzik
66f67d5e5e vfs: increment numvnodes without the vnode list lock unless under pressure
The vnode list lock is only needed to reclaim free vnodes or kick the vnlru
thread (or to block and not miss a wake up (but note the sleep has a timeout so
this would not be a correctness issue)). Try to get away without the lock by
just doing an atomic increment.

The lock is contended e.g., during poudriere -j 104 where about half of all
acquires come from vnode allocation code.

Note the entire scheme needs a rewrite, the above just reduces it's SMP impact.

Reviewed by:	kib
Differential Revision:	https://reviews.freebsd.org/D23140
2020-01-16 21:45:21 +00:00
Mateusz Guzik
b7f50b9ad1 vfs: refcator vnode allocation
Semantics are almost identical. Some code is deduplicated and there are
fewer memory accesses.

Reviewed by:	kib, jeff
Differential Revision:	https://reviews.freebsd.org/D23158
2020-01-16 21:43:13 +00:00
Mateusz Guzik
875cfc082d vfs: reimplement vlrureclaim to actually use LRU
Take advantage of global ordering introduced in r356672.

Reviewed by:	mckusick (previous version)
Differential Revision:	https://reviews.freebsd.org/D23067
2020-01-16 10:44:02 +00:00
Mateusz Guzik
0c236d3d52 vfs: per-cpu batched requeuing of free vnodes
Constant requeuing adds significant lock contention in certain
workloads. Lessen the problem by batching it.

Per-cpu areas are locked in order to synchronize against UMA freeing
memory.

vnode's v_mflag is converted to short to prevent the struct from
growing.

Sample result from an incremental make -s -j 104 bzImage on tmpfs:
stock:   122.38s user 1780.45s system 6242% cpu 30.480 total
patched: 144.84s user 985.90s system 4856% cpu 23.282 total

Reviewed by:	jeff
Tested by:	pho (in a larger patch, previous version)
Differential Revision:	https://reviews.freebsd.org/D22998
2020-01-13 02:39:41 +00:00
Mateusz Guzik
cc3593fbd9 vfs: rework vnode list management
The current notion of an active vnode is eliminated.

Vnodes transition between 0<->1 hold counts all the time and the
associated traversal between different lists induces significant
scalability problems in certain workloads.

Introduce a global list containing all allocated vnodes. They get
unlinked only when UMA reclaims memory and are only requeued when
hold count reaches 0.

Sample result from an incremental make -s -j 104 bzImage on tmpfs:
stock:   118.55s user 3649.73s system 7479% cpu 50.382 total
patched: 122.38s user 1780.45s system 6242% cpu 30.480 total

Reviewed by:	jeff
Tested by:	pho (in a larger patch, previous version)
Differential Revision:	https://reviews.freebsd.org/D22997
2020-01-13 02:37:25 +00:00
Mateusz Guzik
57083d2576 vfs: add per-mount vnode lazy list and use it for deferred inactive + msync
This obviates the need to scan the entire active list looking for vnodes
of interest.

msync is handled by adding all vnodes with write count to the lazy list.

deferred inactive directly adds vnodes as it sets the VI_DEFINACT flag.

Vnodes get dequeued from the list when their hold count reaches 0.

Newly added MNT_VNODE_FOREACH_LAZY* macros support filtering so that
spurious locking is avoided in the common case.

Reviewed by:	jeff
Tested by:	pho (in a larger patch, previous version)
Differential Revision:	https://reviews.freebsd.org/D22995
2020-01-13 02:34:02 +00:00
Mateusz Guzik
879e0604ee Add KERNEL_PANICKED macro for use in place of direct panicstr tests 2020-01-12 06:07:54 +00:00
Mateusz Guzik
91de98e6d4 vfs: only recalculate watermarks when limits are changing
Previously they would get recalculated all the time, in particular in:
getnewvnode -> vcheckspace -> vspace
2020-01-11 23:00:57 +00:00
Mateusz Guzik
e6ae744e0e vfs: deduplicate vnode allocation logic
This creates a dedicated routine (vn_alloc) to allocate vnodes.

As a side effect code duplicationw with getnewvnode_reserve is eleminated.

Add vn_free for symmetry.
2020-01-11 22:59:44 +00:00
Mateusz Guzik
b52d50cf69 vfs: prealloc vnodes in getnewvnode_reserve
Having a reserved vnode count does not guarantee that getnewvnodes wont
block later. Said blocking partially defeats the purpose of reserving in
the first place.

Preallocate instaed. The only consumer was always passing "1" as count
and never nesting reservations.
2020-01-11 22:58:14 +00:00
Mateusz Guzik
6928306764 vfs: incomplete pass at converting more ints to u_long
Most notably numvnodes and freevnodes were u_long, but parameters used to
govern them remained as ints.
2020-01-11 22:56:20 +00:00
Mateusz Guzik
bf62296f35 vfs: add missing CLTFLA_MPSAFE annotations
This covers all kern/vfs_*.c files.
2020-01-11 22:55:12 +00:00
Mateusz Guzik
a9a047bc87 vfs: handle doomed vnodes in vdefer_inactive
vgone dooms the vnode while keeping VI_OWEINACT set and then drops the
interlock.

vputx can pick up the interlock and pass it to vdefer_inactive since the
flag is set.

The race is harmless, just don't defer anything as vgone will take care of it.

Reported by:	pho
2020-01-07 20:24:21 +00:00
Mateusz Guzik
c8b3463dd0 vfs: reimplement deferred inactive to use a dedicated flag (VI_DEFINACT)
The previous behavior of leaving VI_OWEINACT vnodes on the active list without
a hold count is eliminated. Hold count is kept and inactive processing gets
explicitly deferred by setting the VI_DEFINACT flag. The syncer is then
responsible for vdrop.

Reviewed by:	kib (previous version)
Tested by:	pho (in a larger patch, previous version)
Differential Revision:	https://reviews.freebsd.org/D23036
2020-01-07 15:56:24 +00:00
Mateusz Guzik
b7cc9d1847 vfs: trylock in vfs_msync and refactor the func
- use LK_NOWAIT instead of calling VOP_ISLOCKED before deciding to lock
- evaluate flags before looping over vnodes

Reviewed by:	kib
Tested by:	pho (in a larger patch, previous version)
Differential Revision:	https://reviews.freebsd.org/D23035
2020-01-07 15:44:19 +00:00
Mateusz Guzik
c92fe112a7 vfs: use a dedicated counter for free vnode recycling
Otherwise vlrureclaim activitity is mixed in and it is hard to tell which
vnodes got reclaimed.
2020-01-07 15:42:01 +00:00
Mateusz Guzik
cc2b586d69 vfs: prevent numvnodes and freevnodes re-reads when appropriate
Otherwise in code like this:
if (numvnodes > desiredvnodes)
	vnlru_free_locked(numvnodes - desiredvnodes, NULL);

numvnodes can drop below desiredvnodes prior to the call and if the
compiler generated another read the subtraction would get a negative
value.
2020-01-07 04:34:03 +00:00
Mateusz Guzik
37fe521a6f vfs: annotate numvnodes and vnode_free_list_mtx with __exclusive_cache_line 2020-01-07 04:30:49 +00:00
Mateusz Guzik
478368ca41 vfs: eliminate v_tag from struct vnode
There was only one consumer and it was using it incorrectly.

It is given an equivalent hack.

Reviewed by:	jeff
Differential Revision:	https://reviews.freebsd.org/D23037
2020-01-07 04:29:34 +00:00
Mateusz Guzik
a91190c63e vfs: add a helper for allocating marker vnodes 2020-01-07 04:27:40 +00:00
Mateusz Guzik
8dbc63520c vfs: drop thread argument from vinactive 2020-01-05 00:59:47 +00:00
Mateusz Guzik
867fd730c6 vfs: patch up vnode count assertions to report found value 2020-01-05 00:59:16 +00:00
Mateusz Guzik
b249ce48ea vfs: drop the mostly unused flags argument from VOP_UNLOCK
Filesystems which want to use it in limited capacity can employ the
VOP_UNLOCK_FLAGS macro.

Reviewed by:	kib (previous version)
Differential Revision:	https://reviews.freebsd.org/D21427
2020-01-03 22:29:58 +00:00
Mateusz Guzik
57db0e12c8 vfs: drop an always-false check from vlrureclaim
The vnode gets held few lines prior, making the VI_FREE condition
illegal.
2020-01-01 22:51:17 +00:00
Mateusz Guzik
eb9764615d vfs: remove production kernel checks and mp == NULL support from vdrop
1. The only place in the tree which calls getnewvnode with mp == NULL does it
for vp_crossmp which will never execute this codepath. Any vnode which legally
has ->v_mount == NULL is also doomed, which once more wont execute this code.
2. Remove an assertion for v_holdcnt from production kernels. It gets taken care
of by refcount macros in debug kernels.

Any code which would want to pass NULL mp can construct a fake one instead.

Reviewed by:	kib (previous version)
Differential Revision:	https://reviews.freebsd.org/D22722
2019-12-27 11:26:12 +00:00
Mateusz Guzik
6fa079fc3f vfs: flatten vop vectors
This eliminates the following loop from all VOP calls:

while(vop != NULL && \
    vop->vop_spare2 == NULL && vop->vop_bypass == NULL)
        vop = vop->vop_default;

Reviewed by:	jeff
Tesetd by:	pho
Differential Revision:	https://reviews.freebsd.org/D22738
2019-12-16 00:06:22 +00:00
Mateusz Guzik
ff4486e827 vfs: refactor vhold and vdrop
No fuctional changes.
2019-12-10 00:08:05 +00:00
Mateusz Guzik
abd80ddb94 vfs: introduce v_irflag and make v_type smaller
The current vnode layout is not smp-friendly by having frequently read data
avoidably sharing cachelines with very frequently modified fields. In
particular v_iflag inspected for VI_DOOMED can be found in the same line with
v_usecount. Instead make it available in the same cacheline as the v_op, v_data
and v_type which all get read all the time.

v_type is avoidably 4 bytes while the necessary data will easily fit in 1.
Shrinking it frees up 3 bytes, 2 of which get used here to introduce a new
flag field with a new value: VIRF_DOOMED.

Reviewed by:	kib, jeff
Differential Revision:	https://reviews.freebsd.org/D22715
2019-12-08 21:30:04 +00:00
Mateusz Guzik
791a24c7ea vfs: clean up vputx a little
1. replace hand-rolled macros for operation type with enum
2. unlock the vnode in vput itself, there is no need to branch on it. existence
of VPUTX_VPUT remains significant in that the inactive variant adds LK_NOWAIT
to locking request.
3. remove the useless v_usecount assertion. few lines above the checks if
v_usecount > 0 and leaves. should the value be negative, refcount would fail.
4. the CTR return vnode %p to the freelist is incorrect as vdrop may find the
vnode with holdcnt > 1. if the like should exist, it should be moved there
5. no need to error = 0 for everyone

Reviewed by:	kib, jeff (previous version)
Differential Revision:	https://reviews.freebsd.org/D22718
2019-12-08 21:13:07 +00:00
Mateusz Guzik
fd6e0c43a6 vfs: factor out vnode destruction out of vdrop
Sponsored by:	The FreeBSD Foundation
2019-12-08 21:11:25 +00:00
Mateusz Guzik
12e483e5f7 vfs: clean up delmntque similarly to vdrop r355414 2019-12-07 12:56:24 +00:00
Mateusz Guzik
4f4d9a086a vfs: catch vn_printf up with reality
- add the missing VV_VMSIZEVNLOCK and VV_READLINK flags
- add decoding v_mflag

While here sort flags.
2019-12-07 12:55:58 +00:00
Mateusz Guzik
3eeb8a1fba vfs: remove 'active' variable from _vdrop
No functional changes.
2019-12-05 13:40:10 +00:00
Mateusz Guzik
d957f3a4f0 vfs: perform a more racy check in vfs_notify_upper
Locking mp does not buy anything interms of correctness and only contributes to
contention.
2019-11-20 12:07:54 +00:00
Mateusz Guzik
1fccb43c39 vfs: change si_usecount management to count used vnodes
Currently si_usecount is effectively a sum of usecounts from all associated
vnodes. This is maintained by special-casing for VCHR every time usecount is
modified. Apart from complicating the code a little bit, it has a scalability
impact since it forces a read from a cacheline shared with said count.

There are no consumers of the feature in the ports tree. In head there are only
2: revoke and devfs_close. Both can get away with a weaker requirement than the
exact usecount, namely just the count of active vnodes. Changing the meaning to
the latter means we only need to modify it on 0<->1 transitions, avoiding the
check plenty of times (and entirely in something like vrefact).

Reviewed by:	kib, jeff
Tested by:	pho
Differential Revision:	https://reviews.freebsd.org/D22202
2019-11-20 12:05:59 +00:00
Jeff Roberson
67d0e29304 Replace OBJ_MIGHTBEDIRTY with a system using atomics. Remove the TMPFS_DIRTY
flag and use the same system.

This enables further fault locking improvements by allowing more faults to
proceed with a shared lock.

Reviewed by:	kib
Tested by:	pho
Differential Revision:	https://reviews.freebsd.org/D22116
2019-10-29 21:06:34 +00:00
Konstantin Belousov
c92f130498 Fix undefined behavior.
Create a sequence point by ending a full expression for call to
vspace() and use of the globals which are modified by vspace().

Reported and reviewed by:	imp
Sponsored by:	The FreeBSD Foundation
MFC after:	1 week
Differential revision:	https://reviews.freebsd.org/D22126
2019-10-23 16:06:47 +00:00
Konstantin Belousov
8076c4e7d1 vn_printf(): Decode VI_TEXT_REF.
Sponsored by:	The FreeBSD Foundation
MFC after:	3 days
2019-10-23 15:51:26 +00:00
Mateusz Guzik
d1cbf3eeea vfs: add MNTK_NOMSYNC
On many filesystems the traversal is effectively a no-op. Add a way to avoid
the overhead.

Reviewed by:	kib
Sponsored by:	The FreeBSD Foundation
Differential Revision:	https://reviews.freebsd.org/D22009
2019-10-13 15:40:34 +00:00
Mateusz Guzik
737241cd51 vfs: return free vnode batches in sync instead of vfs_msync
It is a more natural fit. vfs_msync only deals with active vnodes.

Reviewed by:	kib
Sponsored by:	The FreeBSD Foundation
Differential Revision:	https://reviews.freebsd.org/D22008
2019-10-13 15:39:11 +00:00
Mateusz Guzik
dc20b834ca vfs: add optional root vnode caching
Root vnodes looekd up all the time, e.g. when crossing a mount point.
Currently used routines always perform a costly lookup which can be
trivially avoided.

Reviewed by:	jeff (previous version), kib
Sponsored by:	The FreeBSD Foundation
Differential Revision:	https://reviews.freebsd.org/D21646
2019-10-06 22:14:32 +00:00
Eric van Gyzen
e61e783b83 Add CTLFLAG_STATS to some vfs sysctl OIDs
Add CTLFLAG_STATS to the following OIDs:

vfs.altbufferflushes
vfs.recursiveflushes
vfs.barrierwrites
vfs.flushwithdeps
vfs.reassignbufcalls

Refer to r353111.

MFC after:	2 weeks
Sponsored by:	Dell EMC Isilon
2019-10-04 21:43:43 +00:00
Ed Maste
f91dd6091b simplify path handling in sysctl_try_reclaim_vnode
MAXPATHLEN / PATH_MAX includes space for the terminating NUL, and namei
verifies the presence of the NUL.  Thus there is no need to increase the
buffer size here.

The sysctl passes the string excluding the NUL, so req->newlen equal to
PATH_MAX is too long.

Reviewed by:	kib
MFC after:	1 week
Sponsored by:	The FreeBSD Foundation
Differential Revision:	https://reviews.freebsd.org/D21876
2019-10-02 21:01:23 +00:00
Sean Eric Fagan
ba7a55d934 Add two options to allow mount to avoid covering up existing mount points.
The two options are

* nocover/cover:  Prevent/allow mounting over an existing root mountpoint.
E.g., "mount -t ufs -o nocover /dev/sd1a /usr/local" will fail if /usr/local
is already a mountpoint.
* emptydir/noemptydir:  Prevent/allow mounting on a non-empty directory.
E.g., "mount -t ufs -o emptydir /dev/sd1a /usr" will fail.

Neither of these options is intended to be a default, for historical and
compatibility reasons.

Reviewed by:	allanjude, kib
Differential Revision:	https://reviews.freebsd.org/D21458
2019-09-23 04:28:07 +00:00
Mateusz Guzik
4cace859c2 vfs: convert struct mount counters to per-cpu
There are 3 counters modified all the time in this structure - one for
keeping the structure alive, one for preventing unmount and one for
tracking active writers. Exact values of these counters are very rarely
needed, which makes them a prime candidate for conversion to a per-cpu
scheme, resulting in much better performance.

Sample benchmark performing fstatfs (modifying 2 out of 3 counters) on
a 104-way 2 socket Skylake system:
before:   852393 ops/s
after:  76682077 ops/s

Reviewed by:	kib, jeff
Tested by:	pho
Sponsored by:	The FreeBSD Foundation
Differential Revision:	https://reviews.freebsd.org/D21637
2019-09-16 21:37:47 +00:00
Mateusz Guzik
ee831b2543 vfs: manage mnt_lockref with atomics
See r352424.

Reviewed by:	kib, jeff
Tested by:	pho
Sponsored by:	The FreeBSD Foundation
Differential Revision:	https://reviews.freebsd.org/D21574
2019-09-16 21:32:21 +00:00
Mateusz Guzik
a8c8e44bf0 vfs: manage mnt_ref with atomics
New primitive is introduced to denote sections can operate locklessly
on aspects of struct mount, but which can also be disabled if necessary.
This provides an opportunity to start scaling common case modifications
while providing stable state of the struct when facing unmount, write
suspendion or other events.

mnt_ref is the first counter to start being managed in this manner with
the intent to make it per-cpu.

Reviewed by:	kib, jeff
Sponsored by:	The FreeBSD Foundation
Differential Revision:	https://reviews.freebsd.org/D21425
2019-09-16 21:31:02 +00:00
Mateusz Guzik
ce3ba63f67 vfs: release usecount using fetchadd
1. If we release the last usecount we take ownership of the hold count, which
means the vnode will remain allocated until we vdrop it.
2. If someone else vrefs they will find no usecount and will proceed to add
their own hold count.
3. No code has a problem with v_usecount transitioning to 0 without the
interlock

These facts combined mean we can fetchadd instead of having a cmpset loop.

Reviewed by:	kib (previous version)
Tested by:	pho
Sponsored by:	The FreeBSD Foundation
Differential Revision:	https://reviews.freebsd.org/D21528
2019-09-13 15:49:04 +00:00