vgone dooms the vnode while keeping VI_OWEINACT set and then drops the
interlock.
vputx can pick up the interlock and pass it to vdefer_inactive since the
flag is set.
The race is harmless, just don't defer anything as vgone will take care of it.
Reported by: pho
The previous behavior of leaving VI_OWEINACT vnodes on the active list without
a hold count is eliminated. Hold count is kept and inactive processing gets
explicitly deferred by setting the VI_DEFINACT flag. The syncer is then
responsible for vdrop.
Reviewed by: kib (previous version)
Tested by: pho (in a larger patch, previous version)
Differential Revision: https://reviews.freebsd.org/D23036
- use LK_NOWAIT instead of calling VOP_ISLOCKED before deciding to lock
- evaluate flags before looping over vnodes
Reviewed by: kib
Tested by: pho (in a larger patch, previous version)
Differential Revision: https://reviews.freebsd.org/D23035
Otherwise in code like this:
if (numvnodes > desiredvnodes)
vnlru_free_locked(numvnodes - desiredvnodes, NULL);
numvnodes can drop below desiredvnodes prior to the call and if the
compiler generated another read the subtraction would get a negative
value.
There was only one consumer and it was using it incorrectly.
It is given an equivalent hack.
Reviewed by: jeff
Differential Revision: https://reviews.freebsd.org/D23037
Filesystems which want to use it in limited capacity can employ the
VOP_UNLOCK_FLAGS macro.
Reviewed by: kib (previous version)
Differential Revision: https://reviews.freebsd.org/D21427
1. The only place in the tree which calls getnewvnode with mp == NULL does it
for vp_crossmp which will never execute this codepath. Any vnode which legally
has ->v_mount == NULL is also doomed, which once more wont execute this code.
2. Remove an assertion for v_holdcnt from production kernels. It gets taken care
of by refcount macros in debug kernels.
Any code which would want to pass NULL mp can construct a fake one instead.
Reviewed by: kib (previous version)
Differential Revision: https://reviews.freebsd.org/D22722
The current vnode layout is not smp-friendly by having frequently read data
avoidably sharing cachelines with very frequently modified fields. In
particular v_iflag inspected for VI_DOOMED can be found in the same line with
v_usecount. Instead make it available in the same cacheline as the v_op, v_data
and v_type which all get read all the time.
v_type is avoidably 4 bytes while the necessary data will easily fit in 1.
Shrinking it frees up 3 bytes, 2 of which get used here to introduce a new
flag field with a new value: VIRF_DOOMED.
Reviewed by: kib, jeff
Differential Revision: https://reviews.freebsd.org/D22715
1. replace hand-rolled macros for operation type with enum
2. unlock the vnode in vput itself, there is no need to branch on it. existence
of VPUTX_VPUT remains significant in that the inactive variant adds LK_NOWAIT
to locking request.
3. remove the useless v_usecount assertion. few lines above the checks if
v_usecount > 0 and leaves. should the value be negative, refcount would fail.
4. the CTR return vnode %p to the freelist is incorrect as vdrop may find the
vnode with holdcnt > 1. if the like should exist, it should be moved there
5. no need to error = 0 for everyone
Reviewed by: kib, jeff (previous version)
Differential Revision: https://reviews.freebsd.org/D22718
Currently si_usecount is effectively a sum of usecounts from all associated
vnodes. This is maintained by special-casing for VCHR every time usecount is
modified. Apart from complicating the code a little bit, it has a scalability
impact since it forces a read from a cacheline shared with said count.
There are no consumers of the feature in the ports tree. In head there are only
2: revoke and devfs_close. Both can get away with a weaker requirement than the
exact usecount, namely just the count of active vnodes. Changing the meaning to
the latter means we only need to modify it on 0<->1 transitions, avoiding the
check plenty of times (and entirely in something like vrefact).
Reviewed by: kib, jeff
Tested by: pho
Differential Revision: https://reviews.freebsd.org/D22202
flag and use the same system.
This enables further fault locking improvements by allowing more faults to
proceed with a shared lock.
Reviewed by: kib
Tested by: pho
Differential Revision: https://reviews.freebsd.org/D22116
Create a sequence point by ending a full expression for call to
vspace() and use of the globals which are modified by vspace().
Reported and reviewed by: imp
Sponsored by: The FreeBSD Foundation
MFC after: 1 week
Differential revision: https://reviews.freebsd.org/D22126
On many filesystems the traversal is effectively a no-op. Add a way to avoid
the overhead.
Reviewed by: kib
Sponsored by: The FreeBSD Foundation
Differential Revision: https://reviews.freebsd.org/D22009
It is a more natural fit. vfs_msync only deals with active vnodes.
Reviewed by: kib
Sponsored by: The FreeBSD Foundation
Differential Revision: https://reviews.freebsd.org/D22008
Root vnodes looekd up all the time, e.g. when crossing a mount point.
Currently used routines always perform a costly lookup which can be
trivially avoided.
Reviewed by: jeff (previous version), kib
Sponsored by: The FreeBSD Foundation
Differential Revision: https://reviews.freebsd.org/D21646
MAXPATHLEN / PATH_MAX includes space for the terminating NUL, and namei
verifies the presence of the NUL. Thus there is no need to increase the
buffer size here.
The sysctl passes the string excluding the NUL, so req->newlen equal to
PATH_MAX is too long.
Reviewed by: kib
MFC after: 1 week
Sponsored by: The FreeBSD Foundation
Differential Revision: https://reviews.freebsd.org/D21876
The two options are
* nocover/cover: Prevent/allow mounting over an existing root mountpoint.
E.g., "mount -t ufs -o nocover /dev/sd1a /usr/local" will fail if /usr/local
is already a mountpoint.
* emptydir/noemptydir: Prevent/allow mounting on a non-empty directory.
E.g., "mount -t ufs -o emptydir /dev/sd1a /usr" will fail.
Neither of these options is intended to be a default, for historical and
compatibility reasons.
Reviewed by: allanjude, kib
Differential Revision: https://reviews.freebsd.org/D21458
There are 3 counters modified all the time in this structure - one for
keeping the structure alive, one for preventing unmount and one for
tracking active writers. Exact values of these counters are very rarely
needed, which makes them a prime candidate for conversion to a per-cpu
scheme, resulting in much better performance.
Sample benchmark performing fstatfs (modifying 2 out of 3 counters) on
a 104-way 2 socket Skylake system:
before: 852393 ops/s
after: 76682077 ops/s
Reviewed by: kib, jeff
Tested by: pho
Sponsored by: The FreeBSD Foundation
Differential Revision: https://reviews.freebsd.org/D21637
New primitive is introduced to denote sections can operate locklessly
on aspects of struct mount, but which can also be disabled if necessary.
This provides an opportunity to start scaling common case modifications
while providing stable state of the struct when facing unmount, write
suspendion or other events.
mnt_ref is the first counter to start being managed in this manner with
the intent to make it per-cpu.
Reviewed by: kib, jeff
Sponsored by: The FreeBSD Foundation
Differential Revision: https://reviews.freebsd.org/D21425
1. If we release the last usecount we take ownership of the hold count, which
means the vnode will remain allocated until we vdrop it.
2. If someone else vrefs they will find no usecount and will proceed to add
their own hold count.
3. No code has a problem with v_usecount transitioning to 0 without the
interlock
These facts combined mean we can fetchadd instead of having a cmpset loop.
Reviewed by: kib (previous version)
Tested by: pho
Sponsored by: The FreeBSD Foundation
Differential Revision: https://reviews.freebsd.org/D21528
There are 2 problems:
- it introduces a funny bug where it can end up trylocking the same vnode [1]
- it exposes a pre-existing softdep deadlock [2]
Both are easier to run into that the bug which got fixed, so revert until
a complete solution is worked out.
Reported by: cy [1], pho [2]
Sponsored by: The FreeBSD Foundation
Currently the code only bumps holdcnt and clears the VI_FREE flag, not
performing actual vhold. Since the vnode is still visible elsewhere, a
potential new user can find it and incorrectly assume it is properly held.
Use vholdl instead to correctly hold the vnode. Another place recycling
(vlrureclaim) does this already.
Reviewed by: kib
Sponsored by: The FreeBSD Foundation
Differential Revision: https://reviews.freebsd.org/D21522
vnodes have 2 reference counts - holdcnt to keep the vnode itself from getting
freed and usecount to denote it is actively used.
Previously all operations bumping usecount would also bump holdcnt, which is
not necessary. We can detect if usecount is already > 1 (in which case holdcnt
is also > 1) and utilize it to avoid bumping holdcnt on our own. This saves
on atomic ops.
Reviewed by: kib
Tested by: pho (previous version)
Sponsored by: The FreeBSD Foundation
Differential Revision: https://reviews.freebsd.org/D21471
The page daemon periodically invokes uma_reclaim() to reclaim cached
items from each zone when the system is under memory pressure. This
is important since the size of these caches is unbounded by default.
However it also results in bursts of high latency when allocating from
heavily used zones as threads miss in the per-CPU caches and must
access the keg in order to allocate new items.
With r340405 we maintain an estimate of each zone's usage of its
(per-NUMA domain) cache of full buckets. Start making use of this
estimate to avoid reclaiming the entire cache when under memory
pressure. In particular, introduce TRIM, DRAIN and DRAIN_CPU
verbs for uma_reclaim() and uma_zone_reclaim(). When trimming, only
items in excess of the estimate are reclaimed. Draining a zone
reclaims all of the cached full buckets (the previous behaviour of
uma_reclaim()), and may further drain the per-CPU caches in extreme
cases.
Now, when under memory pressure, the page daemon will trim zones
rather than draining them. As a result, heavily used zones do not incur
bursts of bucket cache misses following reclamation, but large, unused
caches will be reclaimed as before.
Reviewed by: jeff
Tested by: pho (an earlier version)
MFC after: 2 months
Sponsored by: Netflix
Differential Revision: https://reviews.freebsd.org/D16667
Current implementation of vnode_create_vobject() and
vnode_destroy_vobject() is written so that it prepared to handle the
vm object destruction for live vnode. Practically, no filesystems use
this, except for some remnants that were present in UFS till today.
One of the consequences of that model is that each filesystem must
call vnode_destroy_vobject() in VOP_RECLAIM() or earlier, as result
all of them get rid of the v_object in reclaim.
Move the call to vnode_destroy_vobject() to vgonel() before
VOP_RECLAIM(). This makes v_object stable: either the object is NULL,
or it is valid vm object till the vnode reclamation. Remove code from
vnode_create_vobject() to handle races with the parallel destruction.
Reviewed by: markj
Tested by: pho
Sponsored by: The FreeBSD Foundation
Differential revision: https://reviews.freebsd.org/D21412
vnode usecount drops to 0 all the time (e.g. for directories during path lookup).
When that happens the kernel would always lock the exclusive lock for the vnode
in order to call vinactive(). This blocks other threads who want to use the vnode
for looukp.
vinactive is very rarely needed and can be tested for without the vnode lock held.
This patch gives filesytems an opportunity to do it, sample total wait time for
tmpfs over 500 minutes of poudriere -j 104:
before: 557563641706 (lockmgr:tmpfs)
after: 46309603301 (lockmgr:tmpfs)
Sponsored by: The FreeBSD Foundation
Differential Revision: https://reviews.freebsd.org/D21371
The plan is to drop the flags argument. There is also a temporary bug
now that nullfs ignores the flag.
Reviewed by: kib
Sponsored by: The FreeBSD Foundation
Differential Revision: https://reviews.freebsd.org/D21252
After all the changes, its dynamic scope is same as for MNTK_UNMOUNT,
but to allow the syncer vnode to be re-installed on unmount failure.
But the case of syncer was already handled by using the VV_FORCEINSMQ
flag for quite some time.
Tested by: pho
Sponsored by: The FreeBSD Foundation
MFC after: 1 week