Commit Graph

1124 Commits

Author SHA1 Message Date
Mateusz Guzik
f6dd1aefb7 vfs: group mount per-cpu vars into one struct
While here move frequently read stuff into the same cacheline.

This shrinks struct mount by 64 bytes.

Tested by:	pho
2020-11-09 23:02:13 +00:00
Mateusz Guzik
e90afaa015 kqueue: save space by using only one func pointer for assertions 2020-11-09 00:04:35 +00:00
Mateusz Guzik
0685574968 vfs: change vnode poll to just a malloc type
The size is 120, close fit for 128 and rarely used. The infrequent use
avoidably populates per-CPU caches and ends up with more memory.
2020-10-30 14:02:56 +00:00
Mateusz Guzik
11743b6e47 vfs: tidy up vnlru_free
Apart from cosmeatic changes make sure to only decrease the recycled counter
if vtryrecycle succeeded.

Tested by:	pho
2020-10-27 18:13:09 +00:00
Mateusz Guzik
68ac2b804c vfs: fix vnode reclaim races against getnwevnode
All vnodes allocated by UMA are present on the global list used by
vnlru. getnewvnode modifies the state of the vnode (most notably
altering v_holdcnt) but never locks it. Moreover filesystems also
modify it in arbitrary manners sometimes before taking the vnode
lock or adding any other indicator that the vnode can be used.

Picking up such a vnode by vnlru would be problematic.

To that end there are 2 fixes:
- vlrureclaim, not recycling v_holdcnt == 0 vnodes, takes the
interlock and verifies that v_mount has been set. It is an
invariant that the vnode lock is held by that point, providing
the necessary serialisation against locking after vhold.
- vnlru_free_locked, only wanting to free v_holdcnt == 0 vnodes,
now makes sure to only transition the count 0->1 and newly allocated
vnodes start with v_holdcnt == VHOLD_NO_SMR. getnewvnode will only
transition VHOLD_NO_SMR->1 once more making the hold fail

Tested by:	pho
2020-10-27 18:12:07 +00:00
Mateusz Guzik
7cc1718613 vfs: fix a race where reclaim vholds freed vnodes
Reported by:	pho
Tested by:	pho (previous version)
Fixes:	r366974 ("vfs: stop taking the interlock in vnode reclaim")
2020-10-24 13:30:37 +00:00
Mateusz Guzik
703f3fafa5 vfs: stop taking the interlock in vnode reclaim
It no longer protects any of tested fields, keeping all the checks racy.

While here make vtryrecycle drop the vnode on its own. Avoids an additional
lock trip.
2020-10-23 15:49:18 +00:00
Mateusz Guzik
c7520caa4f vfs: prevent avoidable evictions on mkdir of existing directories
mkdir -p /foo/bar/baz will mkdir each path component and ignore EEXIST.

The NOCACHE lookup will make the namecache unnecessarily evict the existing entry,
and then fallback to the fs lookup routine eventually leading namei to return an
error as the directory is already there.

For invocations like mkdir -p /usr/obj/usr/src/sys/GENERIC/modules this triggers
fallbacks to the slowpath for concurrently executing lookups.

Tested by:	pho
Discussed with:	kib
2020-10-22 19:28:12 +00:00
Mateusz Guzik
ab21ed17ed vfs: drop the de facto curthread argument from VOP_INACTIVE 2020-10-20 07:19:03 +00:00
Konstantin Belousov
c0baa3dc4a vgonel(): avoid recursing into VOP_INACTIVE().
It is a common pattern for filesystems' VOP_INACTIVE() implementation
to forcibly reclaim the vnode when its state is final.  For instance,
UFS vnode with zero link count is removed, and since it is
inactivated, the last open reference on it is dropped.

On the other hand, vnode might get spurious usecount reference for
many reasons.  If the spurious reference exists while vgonel() checks
for active state of the vnode, it would recurse into VOP_INACTIVE().

Fix it by checking and not doing inactivation when vgone() was called
from inactive VOP.

Reported and tested by:	pho
Discussed with:	mjg
Sponsored by:	The FreeBSD Foundation
MFC after:	1 week
2020-10-19 19:20:23 +00:00
Konstantin Belousov
3c484f325e Convert page cache read to VOP.
There are several negative side-effects of not calling into VOP layer
at all for page cache reads.  The biggest is the missed activation of
EVFILT_READ knotes.

Also, it allows filesystem to make more fine grained decision to
refuse read from page cache.

Keep VIRF_PGREAD flag around, it is still useful for nullfs, and for
asserts.

Reviewed by:	markj
Tested by:	pho
Discussed with:	mjg
Sponsored by:	The FreeBSD Foundation
Differential revision:	https://reviews.freebsd.org/D26346
2020-09-15 22:06:36 +00:00
Mateusz Guzik
2bcfa5ba6f vfs: drop a write-only var in vfs_periodic_msync_inactive 2020-09-08 16:06:26 +00:00
Mateusz Guzik
b1a824b684 vfs: retire vholdl as a symbol
Similarly to vrefl in r364283.
2020-09-02 19:21:37 +00:00
Mateusz Guzik
2b4632aee9 vfs: purge cache entries early on vgone
There is no reason for them to linger across reclaim and it is an
invariant that doomed vnodes are not added to the namecache.
2020-09-02 19:21:10 +00:00
Mateusz Guzik
6fed89b179 kern: clean up empty lines in .c and .h files 2020-09-01 22:12:32 +00:00
Mateusz Guzik
a459a6cfe7 vfs: respect PRIV_VFS_LOOKUP in vaccess_smr
Reported by:	novel
2020-08-25 14:18:50 +00:00
Mateusz Guzik
9ce9158b53 vfs: support denying access in vaccess_vexec_smr 2020-08-23 21:05:06 +00:00
Mateusz Guzik
ba3b099198 vfs: factor away doomed vnode handling into vdropl_final 2020-08-23 21:04:35 +00:00
Mateusz Guzik
2ca83b5c27 vfs: mark freevnode as noinline 2020-08-23 11:05:26 +00:00
Mateusz Guzik
19337211f8 vfs: fix freevnode accounting
Most notably add the missing decrement to vhold_smr.

    .-'---`-.
  ,'          `.
  |             \
  |              \
  \           _  \
  ,\  _    ,'-,/-)\
  ( * \ \,' ,' ,'-)
   `._,)     -',-')
     \/         ''/
      )        / /
     /       ,'-'

Reported by:	Dan Nelson <dnelson_1901@yahoo.com>
Fixes:	r362827 ("vfs: protect vnodes with smr")
2020-08-21 21:24:14 +00:00
Mateusz Guzik
8f226f4c23 vfs: remove the always-curthread td argument from VOP_RECLAIM 2020-08-19 07:28:01 +00:00
Mateusz Guzik
7ad2a82da2 vfs: drop the error parameter from vn_isdisk, introduce vn_isdisk_error
Most consumers pass NULL.
2020-08-19 02:51:17 +00:00
Konstantin Belousov
fbca789fc3 VMIO read
If possible, i.e. if the requested range is resident valid in the vm
object queue, and some secondary conditions hold, copy data for
read(2) directly from the valid cached pages, avoiding vnode lock and
instantiating buffers.  I intentionally do not start read-ahead, nor
handle the advises on the cached range.

Filesystems indicate support for VMIO reads by setting VIRF_PGREAD
flag, which must not be cleared until vnode reclamation.

Currently only filesystems that use vnode pager for v_objects can
enable it, due to reliance on vnp_size.  There is a WIP to handle it
for tmpfs.

Reviewed by:	markj
Discussed with:	jeff
Tested by:	pho
Benchmarked by:	mjg
Sponsored by:	The FreeBSD Foundation
Differential revision:	https://reviews.freebsd.org/D25968
2020-08-16 21:02:45 +00:00
Mateusz Guzik
6041408826 vfs: retire vrefl as a symbol
vrefl calls vref and there is only one in-tree consumer.

Keep it as a macro for assertion purposes.
2020-08-16 18:51:12 +00:00
Mateusz Guzik
a92a971bbb vfs: remove the thread argument from vget
It was already asserted to be curthread.

Semantic patch:

@@

expression arg1, arg2, arg3;

@@

- vget(arg1, arg2, arg3)
+ vget(arg1, arg2)
2020-08-16 17:18:54 +00:00
Mateusz Guzik
36f47512d9 vfs: inline vrefcnt 2020-08-12 04:53:20 +00:00
Mateusz Guzik
4c2d103a02 vfs: garbage collect vrefactn 2020-08-12 04:53:02 +00:00
Mateusz Guzik
6883f07e97 vfs: reimplement vref on top of vget
No change in generated assembly.
2020-08-12 04:52:35 +00:00
Mateusz Guzik
3b44443626 devfs: rework si_usecount to track opens
This removes a lot of special casing from the VFS layer.

Reviewed by:	kib (previous version)
Tested by:	pho (previous version)
Differential Revision:	https://reviews.freebsd.org/D25612
2020-08-11 14:27:57 +00:00
Mateusz Guzik
7f70080150 vfs: disallow NOCACHE with LOOKUP
This means there is no expectation lookup will purge the terminal entry,
which simplifies lockless lookup.

Tested by:	pho
Sponsored by:	The FreeBSD Foundation
2020-08-10 10:33:40 +00:00
Mateusz Guzik
1ff80a3400 vfs: release the interlock after failing to set VHOLD_NO_SMR
While here add more comments.

Diagnosed by:	markj
Reported by:	pho
Fixes:	r362827 ("vfs: protect vnodes with smr")
2020-08-07 19:36:08 +00:00
Mark Johnston
0ffec1b03d Clean up reassignbuf() and buf_vlist_remove() a bit.
- Convert panic() calls to INVARIANTS-only assertions.  The PCTRIE code
  provides some of the same protection since it will panic upon an
  attempt to remove a non-resident buffer.
- Update the comment above reassignbuf() to reflect reality.

Reviewed by:	cem, kib, mjg
MFC after:	1 week
Sponsored by:	The FreeBSD Foundation
Differential Revision:	https://reviews.freebsd.org/D25965
2020-08-06 15:43:15 +00:00
Mark Johnston
7013797e34 Remove the vfs.reassignbufcalls counter and sysctl.
As the 20-year old comment above it suggests, the counter is of dubious
value.  Moreover, the (global) counter was not updated precisely and
hurts scalability.

Reviewed by:	cem, kib, mjg
MFC after:	1 week
Sponsored by:	The FreeBSD Foundation
Differential Revision:	https://reviews.freebsd.org/D25965
2020-08-06 15:42:59 +00:00
Mateusz Guzik
d292b1940c vfs: remove the obsolete privused argument from vaccess
This brings argument count down to 6, which is passable without the
stack on amd64.
2020-08-05 09:27:03 +00:00
Mateusz Guzik
db99ec5656 vfs: support lockless dotdot lookup
Tested by:	pho
2020-08-04 23:07:42 +00:00
Mateusz Guzik
6e10434c02 cache: add cache_purge_vgone
cache_purge locklessly checks whether the vnode at hand has any namecache
entries. This can race with a concurrent purge which managed to remove
the last entry, but may not be done touching the vnode.

Make sure we observe the relevant vnode lock as not taken before proceeding
with vgone.

Paired with the fact that doomed vnodes cannnot receive entries this restores
the invariant that there are no namecache-related writing users past cache_purge
in vgone.

Reported by:	pho
2020-08-04 23:04:29 +00:00
Mateusz Guzik
838984de32 vfs: move namecache initialisation into cache_vnode_init 2020-08-02 19:42:06 +00:00
Mateusz Guzik
848f8effdd vfs: inline vops if there are no pre/post associated calls
This removes a level of indirection from frequently used methods, most notably
VOP_LOCK1 and VOP_UNLOCK1.

Tested by:	pho
2020-07-30 15:50:51 +00:00
Mateusz Guzik
07d2145a17 vfs: add the infrastructure for lockless lookup
Reviewed by:    kib
Tested by:      pho (in a patchset)
Differential Revision:	https://reviews.freebsd.org/D25577
2020-07-25 10:32:45 +00:00
Mateusz Guzik
0379ff6ae3 vfs: introduce vnode sequence counters
Modified on each permission change and link/unlink.

Reviewed by:	kib
Tested by:	pho (in a patchset)
Differential Revision:	https://reviews.freebsd.org/D25573
2020-07-25 10:31:52 +00:00
Conrad Meyer
68ee1dda06 Add unlocked/SMR fast path to getblk()
Convert the bufobj tries to an SMR zone/PCTRIE and add a gbincore_unlocked()
API wrapping this functionality.  Use it for a fast path in getblkx(),
falling back to locked lookup if we raced a thread changing the buf's
identity.

Reported by:	Attilio
Reviewed by:	kib, markj
Testing:	pho (in progress)
Sponsored by:	Isilon
Differential Revision:	https://reviews.freebsd.org/D25782
2020-07-24 17:34:04 +00:00
Mateusz Guzik
422f38d8ea vfs: fix trivial whitespace issues which don't interefere with blame
.. even without the -w switch
2020-07-10 09:01:36 +00:00
Mateusz Guzik
9b0c2e5909 vfs: expand on vhold_smr comment 2020-07-06 02:00:35 +00:00
Mateusz Guzik
f8022be3e6 vfs: protect vnodes with smr
vget_prep_smr and vhold_smr can be used to ref a vnode while within vfs_smr
section, allowing consumers to get away without locking.

See vhold_smr and vdropl for comments explaining caveats.

Reviewed by:	kib
Testec by:	pho
Differential Revision:	https://reviews.freebsd.org/D23913
2020-07-01 05:56:29 +00:00
Ryan Moeller
245bfd34da Deduplicate fsid comparisons
Comparing fsid_t objects requires internal knowledge of the fsid structure
and yet this is duplicated across a number of places in the code.

Simplify by creating a fsidcmp function (macro).

Reviewed by:	mjg, rmacklem
Approved by:	mav (mentor)
MFC after:	1 week
Sponsored by:	iXsystems, Inc.
Differential Revision:	https://reviews.freebsd.org/D24749
2020-05-21 01:55:35 +00:00
Chuck Silvers
f15ccf8836 Add a new "mntfs" pseudo file system which provides private device vnodes for
file systems to safely access their disk devices, and adapt FFS to use it.
Also add a new BO_NOBUFS flag to allow enforcing that file systems using
mntfs vnodes do not accidentally use the original devfs vnode to create buffers.

Reviewed by:	kib, mckusick
Approved by:	imp (mentor)
Sponsored by:	Netflix
Differential Revision:	https://reviews.freebsd.org/D23787
2020-03-06 18:41:37 +00:00
Ryan Libby
2782c00c04 vfs: quiet -Wwrite-strings
Reviewed by:	kib, markj
Differential Revision:	https://reviews.freebsd.org/D23797
2020-02-23 03:32:11 +00:00
Jeff Roberson
6c5f36ff30 Eliminate some unnecessary uses of UMA_ZONE_VM. Only zones involved in
virtual address or physical page allocation need to be marked with this
flag.

Reviewed by:	markj
Tested by:	pho
Differential Revision:	https://reviews.freebsd.org/D23712
2020-02-19 08:17:27 +00:00
Mateusz Guzik
3403d5245e vfs: fix vlrureclaim ->v_object access
The routine was checking for ->v_type == VBAD. Since vgone drops the interlock
early sets this type at the end of the process of dooming a vnode, this opens
a time window where it can clear the pointer while the inerlock-holders is
accessing it.

Another note is that the code was:
	   (vp->v_object != NULL &&
	   vp->v_object->resident_page_count > trigger)

With the compiler being fully allowed to emit another read to get the pointer,
and in fact it did on the kernel used by pho.

Use atomic_load_ptr and remember the result.

Note that this depends on type-safety of vm_object.

Reported by:	pho
2020-02-16 03:33:34 +00:00
Mateusz Guzik
c615009461 vfs: check early for VCHR in vput_final to short-circuit in the common case
Otherwise the compiler inlines v_decr_devcount which keps getting jumped over
in the common case of not dealing with a device.
2020-02-16 03:16:28 +00:00