They were only modified to accommodate a redundant assertion.
This runs into problems as lockless lookup can still try to use the vnode
and crash instead of getting an error.
The bug was only present in kernels with INVARIANTS.
Reported by: kevans
- Mask out recently added VV_* bits to avoid printing them twice.
- Keep VI_LOCKed on the same line as the rest of the flags.
Reviewed by: kib
Obtained from: CheriBSD
Sponsored by: DARPA
Differential Revision: https://reviews.freebsd.org/D27261
Restart syscalls and some sync operations when the filesystem indicated
the ERELOOKUP condition, mostly for VOPs operating on metadata. In
particular, lookup results cached in the inode/v_data are no longer
valid and need recalculating. Right now this should be a no-op.
Assert that ERELOOKUP is caught everywhere and not returned to
userspace, by asserting that td_errno != ERELOOKUP on the syscall
return path.
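To illustrate the pattern (a sketch only; kern_frobnicate() is a
hypothetical stand-in for an affected syscall helper):

    /*
     * Retry the operation when the filesystem requests a restart via
     * the kernel-internal ERELOOKUP errno value; the loop sits above
     * the VOP layer so the lookup is redone from scratch.
     */
    do {
            error = kern_frobnicate(td, uap->path);
    } while (error == ERELOOKUP);
    return (error);

The return-path check then reduces to a KASSERT along the lines of
KASSERT(td->td_errno != ERELOOKUP, ("ERELOOKUP leaked to userspace")).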
In collaboration with: pho
Reviewed by: mckusick (previous version), markj
Tested by: markj (syzkaller), pho
Sponsored by: The FreeBSD Foundation
Differential revision: https://reviews.freebsd.org/D26136
All vnodes allocated by UMA are present on the global list used by
vnlru. getnewvnode modifies the state of the vnode (most notably
altering v_holdcnt) but never locks it. Moreover, filesystems also
modify it in arbitrary ways, sometimes before taking the vnode
lock or adding any other indicator that the vnode can be used.
Picking up such a vnode by vnlru would be problematic.
To that end there are two fixes:
- vlrureclaim, not recycling v_holdcnt == 0 vnodes, takes the
interlock and verifies that v_mount has been set. It is an
invariant that the vnode lock is held by that point, providing
the necessary serialisation against locking after vhold.
- vnlru_free_locked, only wanting to free v_holdcnt == 0 vnodes,
now makes sure to only transition the count 0->1. Newly allocated
vnodes start with v_holdcnt == VHOLD_NO_SMR, and getnewvnode will only
transition VHOLD_NO_SMR->1, once more making the hold fail; see the
sketch below.
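A minimal sketch of the resulting hold protocol (function name
hypothetical, modeled on the description above):

    /*
     * Attempt the 0->1 hold transition used when recycling free
     * vnodes. Freshly allocated vnodes sit at VHOLD_NO_SMR, so the
     * cmpset fails for them and they are left alone.
     */
    static bool
    vhold_free_sketch(struct vnode *vp)
    {
            int count;

            count = atomic_load_int(&vp->v_holdcnt);
            for (;;) {
                    if (count != 0)
                            return (false);
                    if (atomic_fcmpset_int(&vp->v_holdcnt, &count, 1))
                            return (true);
            }
    }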
Tested by: pho
It no longer protects any of the tested fields, keeping all the checks
racy. While here, make vtryrecycle drop the vnode on its own, avoiding
an additional lock trip.
mkdir -p /foo/bar/baz will mkdir each path component and ignore EEXIST.
The NOCACHE lookup will make the namecache unnecessarily evict the existing
entry and then fall back to the fs lookup routine, eventually leading namei
to return an error as the directory is already there.
For invocations like mkdir -p /usr/obj/usr/src/sys/GENERIC/modules this
triggers fallbacks to the slowpath for concurrently executing lookups.
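The change itself is a one-flag diff; a sketch assuming the pre-change
kern_mkdirat() passed NOCACHE to namei (flag set abbreviated):

    /* Before: evicts an existing terminal entry even though mkdir
     * will simply fail with EEXIST. */
    NDINIT(&nd, CREATE, LOCKPARENT | SAVENAME | NOCACHE,
        UIO_USERSPACE, path, td);

    /* After: leave the namecache entry intact; EEXIST is produced
     * via the regular lookup. */
    NDINIT(&nd, CREATE, LOCKPARENT | SAVENAME,
        UIO_USERSPACE, path, td);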
Tested by: pho
Discussed with: kib
It is a common pattern for filesystems' VOP_INACTIVE() implementation
to forcibly reclaim the vnode when its state is final. For instance, a
UFS vnode with a zero link count is removed once it is inactivated,
i.e. once the last open reference on it has been dropped.
On the other hand, a vnode might get a spurious usecount reference for
many reasons. If the spurious reference exists while vgonel() checks
for the active state of the vnode, it recurses into VOP_INACTIVE().
Fix it by detecting this case and skipping inactivation when vgone()
was called from the inactive VOP.
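A condensed sketch of the check (locking simplified; VI_DOINGINACT
marks a vnode whose VOP_INACTIVE() is currently executing):

    /*
     * In vgonel(): only inactivate an active vnode if we did not get
     * here from within VOP_INACTIVE() itself, avoiding the recursion.
     */
    VI_LOCK(vp);
    if (vp->v_usecount > 0 && (vp->v_iflag & VI_DOINGINACT) == 0)
            vinactive(vp);
    VI_UNLOCK(vp);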
Reported and tested by: pho
Discussed with: mjg
Sponsored by: The FreeBSD Foundation
MFC after: 1 week
There are several negative side-effects of not calling into the VOP
layer at all for page cache reads. The biggest is the missed activation
of EVFILT_READ knotes.
Also, calling into the VOP layer allows the filesystem to make a more
fine-grained decision to refuse reads from the page cache.
Keep the VIRF_PGREAD flag around; it is still useful for nullfs and
for asserts.
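For instance, a filesystem can refuse such a read on a per-call basis;
a sketch with hypothetical foo_* helpers, assuming the convention that
EJUSTRETURN routes the caller back to the regular VOP_READ() path:

    static int
    foo_read_pgcache(struct vop_read_pgcache_args *ap)
    {

            /* fs-specific policy check; fall back to VOP_READ() */
            if (!foo_pgcache_read_ok(ap->a_vp))
                    return (EJUSTRETURN);
            return (foo_read_cached(ap));
    }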
Reviewed by: markj
Tested by: pho
Discussed with: mjg
Sponsored by: The FreeBSD Foundation
Differential revision: https://reviews.freebsd.org/D26346
If possible, i.e. if the requested range is resident and valid in the
vm object queue, and some secondary conditions hold, copy data for
read(2) directly from the valid cached pages, avoiding the vnode lock
and buffer instantiation. I intentionally do not start read-ahead, nor
handle the advice on the cached range.
Filesystems indicate support for VMIO reads by setting the VIRF_PGREAD
flag, which must not be cleared until vnode reclamation.
Currently only filesystems that use the vnode pager for their v_objects
can enable it, due to the reliance on vnp_size. There is a WIP to
handle it for tmpfs.
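The core of the fast path, heavily condensed (the real code also
busies pages, rechecks vnp_size, and honors the secondary conditions;
EJUSTRETURN stands in for the fall-back-to-locked-path signal):

    /* Copy one page worth of data straight from a resident, fully
     * valid page, without taking the vnode lock. */
    m = vm_page_lookup(obj, OFF_TO_IDX(uio->uio_offset));
    if (m == NULL || !vm_page_all_valid(m))
            return (EJUSTRETURN);       /* fall back to locked read */
    n = MIN(uio->uio_resid,
        PAGE_SIZE - (uio->uio_offset & PAGE_MASK));
    error = uiomove_fromphys(&m, uio->uio_offset & PAGE_MASK, n, uio);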
Reviewed by: markj
Discussed with: jeff
Tested by: pho
Benchmarked by: mjg
Sponsored by: The FreeBSD Foundation
Differential revision: https://reviews.freebsd.org/D25968
This removes a lot of special casing from the VFS layer.
Reviewed by: kib (previous version)
Tested by: pho (previous version)
Differential Revision: https://reviews.freebsd.org/D25612
This means there is no expectation that lookup will purge the terminal
entry, which simplifies lockless lookup.
Tested by: pho
Sponsored by: The FreeBSD Foundation
- Convert panic() calls to INVARIANTS-only assertions. The PCTRIE code
provides some of the same protection since it will panic upon an
attempt to remove a non-resident buffer.
- Update the comment above reassignbuf() to reflect reality.
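The conversion pattern, illustrated on a hypothetical check:

    /* Before: unconditional panic, even in production kernels. */
    if (bp->b_bufobj != bo)
            panic("bufobj %p: buf %p belongs elsewhere", bo, bp);

    /* After: compiled away unless the kernel is built with
     * INVARIANTS; the PCTRIE removal path still panics on an
     * attempt to remove a non-resident buffer. */
    KASSERT(bp->b_bufobj == bo,
        ("bufobj %p: buf %p belongs elsewhere", bo, bp));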
Reviewed by: cem, kib, mjg
MFC after: 1 week
Sponsored by: The FreeBSD Foundation
Differential Revision: https://reviews.freebsd.org/D25965
As the 20-year-old comment above it suggests, the counter is of dubious
value. Moreover, the (global) counter was not updated precisely and
hurt scalability.
Reviewed by: cem, kib, mjg
MFC after: 1 week
Sponsored by: The FreeBSD Foundation
Differential Revision: https://reviews.freebsd.org/D25965
cache_purge locklessly checks whether the vnode at hand has any
namecache entries. This can race with a concurrent purge which managed
to remove the last entry but may not be done touching the vnode.
Make sure we observe the relevant vnode lock as not taken before
proceeding with vgone.
Paired with the fact that doomed vnodes cannot receive new entries,
this restores the invariant that there are no namecache-related writing
users past cache_purge in vgone.
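A sketch of the resulting synchronization (VP2VNODELOCK naming follows
the namecache's per-vnode lock scheme; details condensed):

    /*
     * The vnode looks namecache-empty. Take and drop its namecache
     * lock so that a concurrent purge which just removed the last
     * entry is known to have stopped touching the vnode.
     */
    if (LIST_EMPTY(&vp->v_cache_src) && TAILQ_EMPTY(&vp->v_cache_dst) &&
        vp->v_cache_dd == NULL) {
            vlp = VP2VNODELOCK(vp);
            mtx_lock(vlp);
            mtx_unlock(vlp);
    }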
Reported by: pho
Modified on each permission change and link/unlink.
Reviewed by: kib
Tested by: pho (in a patchset)
Differential Revision: https://reviews.freebsd.org/D25573
Convert the bufobj tries to an SMR zone/PCTRIE and add a gbincore_unlocked()
API wrapping this functionality. Use it for a fast path in getblkx(),
falling back to locked lookup if we raced a thread changing the buf's
identity.
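A sketch of the described fast path in getblkx() (structure condensed;
error handling elided):

    bp = gbincore_unlocked(bo, blkno);      /* SMR lookup, no bo lock */
    if (bp != NULL) {
            if (BUF_LOCK(bp, LK_EXCLUSIVE | LK_NOWAIT, NULL) != 0)
                    bp = NULL;              /* contended: slow path */
            else if (bp->b_bufobj != bo || bp->b_lblkno != blkno) {
                    BUF_UNLOCK(bp);         /* raced identity change */
                    bp = NULL;
            }
    }
    if (bp == NULL) {
            BO_RLOCK(bo);
            bp = gbincore(bo, blkno);       /* classic locked lookup */
    }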
Reported by: Attilio
Reviewed by: kib, markj
Testing: pho (in progress)
Sponsored by: Isilon
Differential Revision: https://reviews.freebsd.org/D25782