1038 Commits

Author SHA1 Message Date
Mateusz Guzik
6928306764 vfs: incomplete pass at converting more ints to u_long
Most notably numvnodes and freevnodes were u_long, but parameters used to
govern them remained as ints.
2020-01-11 22:56:20 +00:00
Mateusz Guzik
bf62296f35 vfs: add missing CLTFLA_MPSAFE annotations
This covers all kern/vfs_*.c files.
2020-01-11 22:55:12 +00:00
Mateusz Guzik
a9a047bc87 vfs: handle doomed vnodes in vdefer_inactive
vgone dooms the vnode while keeping VI_OWEINACT set and then drops the
interlock.

vputx can pick up the interlock and pass it to vdefer_inactive since the
flag is set.

The race is harmless, just don't defer anything as vgone will take care of it.

Reported by:	pho
2020-01-07 20:24:21 +00:00
Mateusz Guzik
c8b3463dd0 vfs: reimplement deferred inactive to use a dedicated flag (VI_DEFINACT)
The previous behavior of leaving VI_OWEINACT vnodes on the active list without
a hold count is eliminated. Hold count is kept and inactive processing gets
explicitly deferred by setting the VI_DEFINACT flag. The syncer is then
responsible for vdrop.

Reviewed by:	kib (previous version)
Tested by:	pho (in a larger patch, previous version)
Differential Revision:	https://reviews.freebsd.org/D23036
2020-01-07 15:56:24 +00:00
Mateusz Guzik
b7cc9d1847 vfs: trylock in vfs_msync and refactor the func
- use LK_NOWAIT instead of calling VOP_ISLOCKED before deciding to lock
- evaluate flags before looping over vnodes

Reviewed by:	kib
Tested by:	pho (in a larger patch, previous version)
Differential Revision:	https://reviews.freebsd.org/D23035
2020-01-07 15:44:19 +00:00
Mateusz Guzik
c92fe112a7 vfs: use a dedicated counter for free vnode recycling
Otherwise vlrureclaim activitity is mixed in and it is hard to tell which
vnodes got reclaimed.
2020-01-07 15:42:01 +00:00
Mateusz Guzik
cc2b586d69 vfs: prevent numvnodes and freevnodes re-reads when appropriate
Otherwise in code like this:
if (numvnodes > desiredvnodes)
	vnlru_free_locked(numvnodes - desiredvnodes, NULL);

numvnodes can drop below desiredvnodes prior to the call and if the
compiler generated another read the subtraction would get a negative
value.
2020-01-07 04:34:03 +00:00
Mateusz Guzik
37fe521a6f vfs: annotate numvnodes and vnode_free_list_mtx with __exclusive_cache_line 2020-01-07 04:30:49 +00:00
Mateusz Guzik
478368ca41 vfs: eliminate v_tag from struct vnode
There was only one consumer and it was using it incorrectly.

It is given an equivalent hack.

Reviewed by:	jeff
Differential Revision:	https://reviews.freebsd.org/D23037
2020-01-07 04:29:34 +00:00
Mateusz Guzik
a91190c63e vfs: add a helper for allocating marker vnodes 2020-01-07 04:27:40 +00:00
Mateusz Guzik
8dbc63520c vfs: drop thread argument from vinactive 2020-01-05 00:59:47 +00:00
Mateusz Guzik
867fd730c6 vfs: patch up vnode count assertions to report found value 2020-01-05 00:59:16 +00:00
Mateusz Guzik
b249ce48ea vfs: drop the mostly unused flags argument from VOP_UNLOCK
Filesystems which want to use it in limited capacity can employ the
VOP_UNLOCK_FLAGS macro.

Reviewed by:	kib (previous version)
Differential Revision:	https://reviews.freebsd.org/D21427
2020-01-03 22:29:58 +00:00
Mateusz Guzik
57db0e12c8 vfs: drop an always-false check from vlrureclaim
The vnode gets held few lines prior, making the VI_FREE condition
illegal.
2020-01-01 22:51:17 +00:00
Mateusz Guzik
eb9764615d vfs: remove production kernel checks and mp == NULL support from vdrop
1. The only place in the tree which calls getnewvnode with mp == NULL does it
for vp_crossmp which will never execute this codepath. Any vnode which legally
has ->v_mount == NULL is also doomed, which once more wont execute this code.
2. Remove an assertion for v_holdcnt from production kernels. It gets taken care
of by refcount macros in debug kernels.

Any code which would want to pass NULL mp can construct a fake one instead.

Reviewed by:	kib (previous version)
Differential Revision:	https://reviews.freebsd.org/D22722
2019-12-27 11:26:12 +00:00
Mateusz Guzik
6fa079fc3f vfs: flatten vop vectors
This eliminates the following loop from all VOP calls:

while(vop != NULL && \
    vop->vop_spare2 == NULL && vop->vop_bypass == NULL)
        vop = vop->vop_default;

Reviewed by:	jeff
Tesetd by:	pho
Differential Revision:	https://reviews.freebsd.org/D22738
2019-12-16 00:06:22 +00:00
Mateusz Guzik
ff4486e827 vfs: refactor vhold and vdrop
No fuctional changes.
2019-12-10 00:08:05 +00:00
Mateusz Guzik
abd80ddb94 vfs: introduce v_irflag and make v_type smaller
The current vnode layout is not smp-friendly by having frequently read data
avoidably sharing cachelines with very frequently modified fields. In
particular v_iflag inspected for VI_DOOMED can be found in the same line with
v_usecount. Instead make it available in the same cacheline as the v_op, v_data
and v_type which all get read all the time.

v_type is avoidably 4 bytes while the necessary data will easily fit in 1.
Shrinking it frees up 3 bytes, 2 of which get used here to introduce a new
flag field with a new value: VIRF_DOOMED.

Reviewed by:	kib, jeff
Differential Revision:	https://reviews.freebsd.org/D22715
2019-12-08 21:30:04 +00:00
Mateusz Guzik
791a24c7ea vfs: clean up vputx a little
1. replace hand-rolled macros for operation type with enum
2. unlock the vnode in vput itself, there is no need to branch on it. existence
of VPUTX_VPUT remains significant in that the inactive variant adds LK_NOWAIT
to locking request.
3. remove the useless v_usecount assertion. few lines above the checks if
v_usecount > 0 and leaves. should the value be negative, refcount would fail.
4. the CTR return vnode %p to the freelist is incorrect as vdrop may find the
vnode with holdcnt > 1. if the like should exist, it should be moved there
5. no need to error = 0 for everyone

Reviewed by:	kib, jeff (previous version)
Differential Revision:	https://reviews.freebsd.org/D22718
2019-12-08 21:13:07 +00:00
Mateusz Guzik
fd6e0c43a6 vfs: factor out vnode destruction out of vdrop
Sponsored by:	The FreeBSD Foundation
2019-12-08 21:11:25 +00:00
Mateusz Guzik
12e483e5f7 vfs: clean up delmntque similarly to vdrop r355414 2019-12-07 12:56:24 +00:00
Mateusz Guzik
4f4d9a086a vfs: catch vn_printf up with reality
- add the missing VV_VMSIZEVNLOCK and VV_READLINK flags
- add decoding v_mflag

While here sort flags.
2019-12-07 12:55:58 +00:00
Mateusz Guzik
3eeb8a1fba vfs: remove 'active' variable from _vdrop
No functional changes.
2019-12-05 13:40:10 +00:00
Mateusz Guzik
d957f3a4f0 vfs: perform a more racy check in vfs_notify_upper
Locking mp does not buy anything interms of correctness and only contributes to
contention.
2019-11-20 12:07:54 +00:00
Mateusz Guzik
1fccb43c39 vfs: change si_usecount management to count used vnodes
Currently si_usecount is effectively a sum of usecounts from all associated
vnodes. This is maintained by special-casing for VCHR every time usecount is
modified. Apart from complicating the code a little bit, it has a scalability
impact since it forces a read from a cacheline shared with said count.

There are no consumers of the feature in the ports tree. In head there are only
2: revoke and devfs_close. Both can get away with a weaker requirement than the
exact usecount, namely just the count of active vnodes. Changing the meaning to
the latter means we only need to modify it on 0<->1 transitions, avoiding the
check plenty of times (and entirely in something like vrefact).

Reviewed by:	kib, jeff
Tested by:	pho
Differential Revision:	https://reviews.freebsd.org/D22202
2019-11-20 12:05:59 +00:00
Jeff Roberson
67d0e29304 Replace OBJ_MIGHTBEDIRTY with a system using atomics. Remove the TMPFS_DIRTY
flag and use the same system.

This enables further fault locking improvements by allowing more faults to
proceed with a shared lock.

Reviewed by:	kib
Tested by:	pho
Differential Revision:	https://reviews.freebsd.org/D22116
2019-10-29 21:06:34 +00:00
Konstantin Belousov
c92f130498 Fix undefined behavior.
Create a sequence point by ending a full expression for call to
vspace() and use of the globals which are modified by vspace().

Reported and reviewed by:	imp
Sponsored by:	The FreeBSD Foundation
MFC after:	1 week
Differential revision:	https://reviews.freebsd.org/D22126
2019-10-23 16:06:47 +00:00
Konstantin Belousov
8076c4e7d1 vn_printf(): Decode VI_TEXT_REF.
Sponsored by:	The FreeBSD Foundation
MFC after:	3 days
2019-10-23 15:51:26 +00:00
Mateusz Guzik
d1cbf3eeea vfs: add MNTK_NOMSYNC
On many filesystems the traversal is effectively a no-op. Add a way to avoid
the overhead.

Reviewed by:	kib
Sponsored by:	The FreeBSD Foundation
Differential Revision:	https://reviews.freebsd.org/D22009
2019-10-13 15:40:34 +00:00
Mateusz Guzik
737241cd51 vfs: return free vnode batches in sync instead of vfs_msync
It is a more natural fit. vfs_msync only deals with active vnodes.

Reviewed by:	kib
Sponsored by:	The FreeBSD Foundation
Differential Revision:	https://reviews.freebsd.org/D22008
2019-10-13 15:39:11 +00:00
Mateusz Guzik
dc20b834ca vfs: add optional root vnode caching
Root vnodes looekd up all the time, e.g. when crossing a mount point.
Currently used routines always perform a costly lookup which can be
trivially avoided.

Reviewed by:	jeff (previous version), kib
Sponsored by:	The FreeBSD Foundation
Differential Revision:	https://reviews.freebsd.org/D21646
2019-10-06 22:14:32 +00:00
Eric van Gyzen
e61e783b83 Add CTLFLAG_STATS to some vfs sysctl OIDs
Add CTLFLAG_STATS to the following OIDs:

vfs.altbufferflushes
vfs.recursiveflushes
vfs.barrierwrites
vfs.flushwithdeps
vfs.reassignbufcalls

Refer to r353111.

MFC after:	2 weeks
Sponsored by:	Dell EMC Isilon
2019-10-04 21:43:43 +00:00
Ed Maste
f91dd6091b simplify path handling in sysctl_try_reclaim_vnode
MAXPATHLEN / PATH_MAX includes space for the terminating NUL, and namei
verifies the presence of the NUL.  Thus there is no need to increase the
buffer size here.

The sysctl passes the string excluding the NUL, so req->newlen equal to
PATH_MAX is too long.

Reviewed by:	kib
MFC after:	1 week
Sponsored by:	The FreeBSD Foundation
Differential Revision:	https://reviews.freebsd.org/D21876
2019-10-02 21:01:23 +00:00
Sean Eric Fagan
ba7a55d934 Add two options to allow mount to avoid covering up existing mount points.
The two options are

* nocover/cover:  Prevent/allow mounting over an existing root mountpoint.
E.g., "mount -t ufs -o nocover /dev/sd1a /usr/local" will fail if /usr/local
is already a mountpoint.
* emptydir/noemptydir:  Prevent/allow mounting on a non-empty directory.
E.g., "mount -t ufs -o emptydir /dev/sd1a /usr" will fail.

Neither of these options is intended to be a default, for historical and
compatibility reasons.

Reviewed by:	allanjude, kib
Differential Revision:	https://reviews.freebsd.org/D21458
2019-09-23 04:28:07 +00:00
Mateusz Guzik
4cace859c2 vfs: convert struct mount counters to per-cpu
There are 3 counters modified all the time in this structure - one for
keeping the structure alive, one for preventing unmount and one for
tracking active writers. Exact values of these counters are very rarely
needed, which makes them a prime candidate for conversion to a per-cpu
scheme, resulting in much better performance.

Sample benchmark performing fstatfs (modifying 2 out of 3 counters) on
a 104-way 2 socket Skylake system:
before:   852393 ops/s
after:  76682077 ops/s

Reviewed by:	kib, jeff
Tested by:	pho
Sponsored by:	The FreeBSD Foundation
Differential Revision:	https://reviews.freebsd.org/D21637
2019-09-16 21:37:47 +00:00
Mateusz Guzik
ee831b2543 vfs: manage mnt_lockref with atomics
See r352424.

Reviewed by:	kib, jeff
Tested by:	pho
Sponsored by:	The FreeBSD Foundation
Differential Revision:	https://reviews.freebsd.org/D21574
2019-09-16 21:32:21 +00:00
Mateusz Guzik
a8c8e44bf0 vfs: manage mnt_ref with atomics
New primitive is introduced to denote sections can operate locklessly
on aspects of struct mount, but which can also be disabled if necessary.
This provides an opportunity to start scaling common case modifications
while providing stable state of the struct when facing unmount, write
suspendion or other events.

mnt_ref is the first counter to start being managed in this manner with
the intent to make it per-cpu.

Reviewed by:	kib, jeff
Sponsored by:	The FreeBSD Foundation
Differential Revision:	https://reviews.freebsd.org/D21425
2019-09-16 21:31:02 +00:00
Mateusz Guzik
ce3ba63f67 vfs: release usecount using fetchadd
1. If we release the last usecount we take ownership of the hold count, which
means the vnode will remain allocated until we vdrop it.
2. If someone else vrefs they will find no usecount and will proceed to add
their own hold count.
3. No code has a problem with v_usecount transitioning to 0 without the
interlock

These facts combined mean we can fetchadd instead of having a cmpset loop.

Reviewed by:	kib (previous version)
Tested by:	pho
Sponsored by:	The FreeBSD Foundation
Differential Revision:	https://reviews.freebsd.org/D21528
2019-09-13 15:49:04 +00:00
Mateusz Guzik
68c3c1abe1 vfs: temporarily revert r351825
There are 2 problems:
- it introduces a funny bug where it can end up trylocking the same vnode [1]
- it exposes a pre-existing softdep deadlock [2]

Both are easier to run into that the bug which got fixed, so revert until
a complete solution is worked out.

Reported by:	cy [1], pho [2]
Sponsored by:	The FreeBSD Foundation
2019-09-05 18:19:51 +00:00
Mateusz Guzik
c07d4a0a68 vfs: fully hold vnodes in vnlru_free_locked
Currently the code only bumps holdcnt and clears the VI_FREE flag, not
performing actual vhold. Since the vnode is still visible elsewhere, a
potential new user can find it and incorrectly assume it is properly held.

Use vholdl instead to correctly hold the vnode. Another place recycling
(vlrureclaim) does this already.

Reviewed by:	kib
Sponsored by:	The FreeBSD Foundation
Differential Revision:	https://reviews.freebsd.org/D21522
2019-09-04 19:23:18 +00:00
Mateusz Guzik
e3c3248cc7 vfs: implement usecount implying holdcnt
vnodes have 2 reference counts - holdcnt to keep the vnode itself from getting
freed and usecount to denote it is actively used.

Previously all operations bumping usecount would also bump holdcnt, which is
not necessary. We can detect if usecount is already > 1 (in which case holdcnt
is also > 1) and utilize it to avoid bumping holdcnt on our own. This saves
on atomic ops.

Reviewed by:	kib
Tested by:	pho (previous version)
Sponsored by:	The FreeBSD Foundation
Differential Revision:	https://reviews.freebsd.org/D21471
2019-09-03 15:42:11 +00:00
Mark Johnston
08cfa56ea3 Extend uma_reclaim() to permit different reclamation targets.
The page daemon periodically invokes uma_reclaim() to reclaim cached
items from each zone when the system is under memory pressure.  This
is important since the size of these caches is unbounded by default.
However it also results in bursts of high latency when allocating from
heavily used zones as threads miss in the per-CPU caches and must
access the keg in order to allocate new items.

With r340405 we maintain an estimate of each zone's usage of its
(per-NUMA domain) cache of full buckets.  Start making use of this
estimate to avoid reclaiming the entire cache when under memory
pressure.  In particular, introduce TRIM, DRAIN and DRAIN_CPU
verbs for uma_reclaim() and uma_zone_reclaim().  When trimming, only
items in excess of the estimate are reclaimed.  Draining a zone
reclaims all of the cached full buckets (the previous behaviour of
uma_reclaim()), and may further drain the per-CPU caches in extreme
cases.

Now, when under memory pressure, the page daemon will trim zones
rather than draining them.  As a result, heavily used zones do not incur
bursts of bucket cache misses following reclamation, but large, unused
caches will be reclaimed as before.

Reviewed by:	jeff
Tested by:	pho (an earlier version)
MFC after:	2 months
Sponsored by:	Netflix
Differential Revision:	https://reviews.freebsd.org/D16667
2019-09-01 22:22:43 +00:00
Mateusz Guzik
c2b600f98f vfs: add a missing VNODE_REFCOUNT_FENCE_REL to v_incr_usecount_locked
Sponsored by:	The FreeBSD Foundation
2019-08-30 21:54:45 +00:00
Mateusz Guzik
3bb8d8d8c9 vfs: tidy up assertions in vfs_subr
- assert unlocked vnode interlock in vref
- assert right counts in vputx
- print debug info for panic in vdrop

Sponsored by:	The FreeBSD Foundation
2019-08-30 00:45:53 +00:00
Konstantin Belousov
6470c8d3db Rework v_object lifecycle for vnodes.
Current implementation of vnode_create_vobject() and
vnode_destroy_vobject() is written so that it prepared to handle the
vm object destruction for live vnode.  Practically, no filesystems use
this, except for some remnants that were present in UFS till today.
One of the consequences of that model is that each filesystem must
call vnode_destroy_vobject() in VOP_RECLAIM() or earlier, as result
all of them get rid of the v_object in reclaim.

Move the call to vnode_destroy_vobject() to vgonel() before
VOP_RECLAIM().  This makes v_object stable: either the object is NULL,
or it is valid vm object till the vnode reclamation.  Remove code from
vnode_create_vobject() to handle races with the parallel destruction.

Reviewed by:	markj
Tested by:	pho
Sponsored by:	The FreeBSD Foundation
Differential revision:	https://reviews.freebsd.org/D21412
2019-08-29 07:50:25 +00:00
Mateusz Guzik
1e2f0ceb2f vfs: add VOP_NEED_INACTIVE
vnode usecount drops to 0 all the time (e.g. for directories during path lookup).
When that happens the kernel would always lock the exclusive lock for the vnode
in order to call vinactive(). This blocks other threads who want to use the vnode
for looukp.

vinactive is very rarely needed and can be tested for without the vnode lock held.

This patch gives filesytems an opportunity to do it, sample total wait time for
tmpfs over 500 minutes of poudriere -j 104:

before: 557563641706 (lockmgr:tmpfs)
after:   46309603301 (lockmgr:tmpfs)

Sponsored by:	The FreeBSD Foundation
Differential Revision:	https://reviews.freebsd.org/D21371
2019-08-28 20:34:24 +00:00
Mateusz Guzik
368cabbcb5 vfs: stop passing LK_INTERLOCK to VOP_UNLOCK
The plan is to drop the flags argument. There is also a temporary bug
now that nullfs ignores the flag.

Reviewed by:	kib
Sponsored by:	The FreeBSD Foundation
Differential Revision:	https://reviews.freebsd.org/D21252
2019-08-27 20:30:56 +00:00
Mateusz Guzik
0256405e98 vfs: add vholdnz (for already held vnodes)
Reviewed by:	kib (previous version)
Sponsored by:	The FreeBSD Foundation
Differential Revision:	https://reviews.freebsd.org/D21358
2019-08-25 05:11:43 +00:00
Konstantin Belousov
e671edac06 De-commision the MNTK_NOINSMNTQ kernel mount flag.
After all the changes, its dynamic scope is same as for MNTK_UNMOUNT,
but to allow the syncer vnode to be re-installed on unmount failure.
But the case of syncer was already handled by using the VV_FORCEINSMQ
flag for quite some time.

Tested by:	pho
Sponsored by:	The FreeBSD Foundation
MFC after:	1 week
2019-08-23 19:40:10 +00:00
Jeff Roberson
cf27e0d125 Use an atomic reference count for paging in progress so that callers do not
require the object lock.

Reviewed by:	markj
Tested by:	pho (as part of a larger branch)
Sponsored by:	Netflix
Differential Revision:	https://reviews.freebsd.org/D21311
2019-08-19 23:09:38 +00:00