freebsd-dev

Author	SHA1	Message	Date
Mark Johnston	b261bb4057	vfs: Add KASAN state transitions for vnodes vnodes are a bit special in that they may exist on per-CPU lists even while free. Add a KASAN-only destructor that poisons regions of each vnode that are not expected to be accessed after a free. MFC after: 2 weeks Sponsored by: The FreeBSD Foundation Differential Revision: https://reviews.freebsd.org/D29459	2021-04-13 17:42:21 -04:00
Mateusz Guzik	e9272225e6	vfs: fix vnlru marker handling for filtered/unfiltered cases The global list has a marker with an invariant that free vnodes are placed somewhere past that. A caller which performs filtering (like ZFS) can move said marker all the way to the end, across free vnodes which don't match. Then a caller which does not perform filtering will fail to find them. This makes vn_alloc_hard sleep for 1 second instead of reclaiming, resulting in significant stalls. Fix the problem by requiring an explicit marker by callers which do filtering. As a temporary measure extend vnlru_free to restart if it fails to reclaim anything. Big thanks go to the reporter for testing several iterations of the patch. Reported by: Yamagi <lists yamagi.org> Tested by: Yamagi <lists yamagi.org> Reviewed by: kib Differential Revision: https://reviews.freebsd.org/D29324	2021-03-18 14:59:03 +00:00
Konstantin Belousov	44691b33cc	vlrureclaim: only skip vnode with resident pages if it own the pages Nullfs vnode which shares vm_object and pages with the lower vnode should not be exempt from the reclaim just because lower vnode cached a lot. Their reclamation is actually very cheap and should be preferred over real fs vnodes, but this change is already useful. Reported and tested by: pho Reviewed by: mckusick Sponsored by: The FreeBSD Foundation MFC after: 2 weeks Differential revision: https://reviews.freebsd.org/D29178	2021-03-12 13:31:08 +02:00
Robert Wing	0cc746f193	filt_fsevent: only record interested events Respect filter-specific flags for the EVFILT_FS filter. When a kevent is registered with the EVFILT_FS filter, it is always triggered when an EVFILT_FS event occurs, regardless of the filter-specific flags used. Fix that. Reviewed by: kib Differential Revision: https://reviews.freebsd.org/D28974	2021-03-02 14:19:22 -09:00
Konstantin Belousov	2bfd8992c7	vnode: move write cluster support data to inodes. The data is only needed by filesystems that 1. use buffer cache 2. utilize clustering write support. Requested by: mjg Reviewed by: asomers (previous version), fsu (ext2 parts), mckusick Tested by: pho Sponsored by: The FreeBSD Foundation Differential Revision: https://reviews.freebsd.org/D28679	2021-02-21 11:38:21 +02:00
Konstantin Belousov	662283b108	vn_printf: handle VI_FOPENING Noted by: mjg Sponsored by: The FreeBSD Foundation MFC after: 6 days Fixes: `fa3bd463ce`	2021-02-18 16:28:28 +02:00
Konstantin Belousov	b59a8e63d6	Stop ignoring ERELOOKUP from VOP_INACTIVE() When possible, relock the vnode and retry inactivation. Only vunref() is required not to drop the vnode lock, so handle it specially by not retrying. This is a part of the efforts to ensure that unlinked not referenced vnode does not prevent inode from reusing. Reviewed by: chs, mckusick Tested by: pho MFC after: 2 weeks Sponsored by: The FreeBSD Foundation	2021-02-12 03:02:21 +02:00
Mateusz Guzik	739ecbcf1c	cache: add symlink support to lockless lookup Reviewed by: kib (previous version) Tested by: pho (previous version) Differential Revision: https://reviews.freebsd.org/D27488	2021-01-23 15:04:43 +00:00
Mateusz Guzik	33a195baf3	vfs: keep seqc unchanged as long as the vnode is accessible via SMR	2021-01-03 21:22:16 +00:00
Mateusz Guzik	82397d7919	vfs: denote vnode being a mount point with VIRF_MOUNTPOINT Reviewed by: kib (previous version) Differential Revision: https://reviews.freebsd.org/D27794	2021-01-03 06:50:06 +00:00
Mateusz Guzik	3e506a67bb	vfs: add v_irflag accessors Reviewed by: kib (previous version) Differential Revision: https://reviews.freebsd.org/D27793	2021-01-03 06:50:06 +00:00
Mateusz Guzik	0c23d26230	vfs: keep bad ops on vnode reclaim They were only modified to accomodate a redundant assertion. This runs into problems as lockless lookup can still try to use the vnode and crash instead of getting an error. The bug was only present in kernels with INVARIANTS. Reported by: kevans	2020-12-05 05:56:23 +00:00
John Baldwin	5335f6434b	Fix a few nits in vn_printf(). - Mask out recently added VV_* bits to avoid printing them twice. - Keep VI_LOCKed on the same line as the rest of the flags. Reviewed by: kib Obtained from: CheriBSD Sponsored by: DARPA Differential Revision: https://reviews.freebsd.org/D27261	2020-11-18 16:21:37 +00:00
Konstantin Belousov	441eb16a95	Allow some VOPs to return ERELOOKUP to indicate VFS operation restart at top level. Restart syscalls and some sync operations when filesystem indicated ERELOOKUP condition, mostly for VOPs operating on metdata. In particular, lookup results cached in the inode/v_data is no longer valid and needs recalculating. Right now this should be nop. Assert that ERELOOKUP is catched everywhere and not returned to userspace, by asserting that td_errno != ERELOOKUP on syscall return path. In collaboration with: pho Reviewed by: mckusick (previous version), markj Tested by: markj (syzkaller), pho Sponsored by: The FreeBSD Foundation Differential revision: https://reviews.freebsd.org/D26136	2020-11-13 09:42:32 +00:00
Mateusz Guzik	f6dd1aefb7	vfs: group mount per-cpu vars into one struct While here move frequently read stuff into the same cacheline. This shrinks struct mount by 64 bytes. Tested by: pho	2020-11-09 23:02:13 +00:00
Mateusz Guzik	e90afaa015	kqueue: save space by using only one func pointer for assertions	2020-11-09 00:04:35 +00:00
Mateusz Guzik	0685574968	vfs: change vnode poll to just a malloc type The size is 120, close fit for 128 and rarely used. The infrequent use avoidably populates per-CPU caches and ends up with more memory.	2020-10-30 14:02:56 +00:00
Mateusz Guzik	11743b6e47	vfs: tidy up vnlru_free Apart from cosmeatic changes make sure to only decrease the recycled counter if vtryrecycle succeeded. Tested by: pho	2020-10-27 18:13:09 +00:00
Mateusz Guzik	68ac2b804c	vfs: fix vnode reclaim races against getnwevnode All vnodes allocated by UMA are present on the global list used by vnlru. getnewvnode modifies the state of the vnode (most notably altering v_holdcnt) but never locks it. Moreover filesystems also modify it in arbitrary manners sometimes before taking the vnode lock or adding any other indicator that the vnode can be used. Picking up such a vnode by vnlru would be problematic. To that end there are 2 fixes: - vlrureclaim, not recycling v_holdcnt == 0 vnodes, takes the interlock and verifies that v_mount has been set. It is an invariant that the vnode lock is held by that point, providing the necessary serialisation against locking after vhold. - vnlru_free_locked, only wanting to free v_holdcnt == 0 vnodes, now makes sure to only transition the count 0->1 and newly allocated vnodes start with v_holdcnt == VHOLD_NO_SMR. getnewvnode will only transition VHOLD_NO_SMR->1 once more making the hold fail Tested by: pho	2020-10-27 18:12:07 +00:00
Mateusz Guzik	7cc1718613	vfs: fix a race where reclaim vholds freed vnodes Reported by: pho Tested by: pho (previous version) Fixes: r366974 ("vfs: stop taking the interlock in vnode reclaim")	2020-10-24 13:30:37 +00:00
Mateusz Guzik	703f3fafa5	vfs: stop taking the interlock in vnode reclaim It no longer protects any of tested fields, keeping all the checks racy. While here make vtryrecycle drop the vnode on its own. Avoids an additional lock trip.	2020-10-23 15:49:18 +00:00
Mateusz Guzik	c7520caa4f	vfs: prevent avoidable evictions on mkdir of existing directories mkdir -p /foo/bar/baz will mkdir each path component and ignore EEXIST. The NOCACHE lookup will make the namecache unnecessarily evict the existing entry, and then fallback to the fs lookup routine eventually leading namei to return an error as the directory is already there. For invocations like mkdir -p /usr/obj/usr/src/sys/GENERIC/modules this triggers fallbacks to the slowpath for concurrently executing lookups. Tested by: pho Discussed with: kib	2020-10-22 19:28:12 +00:00
Mateusz Guzik	ab21ed17ed	vfs: drop the de facto curthread argument from VOP_INACTIVE	2020-10-20 07:19:03 +00:00
Konstantin Belousov	c0baa3dc4a	vgonel(): avoid recursing into VOP_INACTIVE(). It is a common pattern for filesystems' VOP_INACTIVE() implementation to forcibly reclaim the vnode when its state is final. For instance, UFS vnode with zero link count is removed, and since it is inactivated, the last open reference on it is dropped. On the other hand, vnode might get spurious usecount reference for many reasons. If the spurious reference exists while vgonel() checks for active state of the vnode, it would recurse into VOP_INACTIVE(). Fix it by checking and not doing inactivation when vgone() was called from inactive VOP. Reported and tested by: pho Discussed with: mjg Sponsored by: The FreeBSD Foundation MFC after: 1 week	2020-10-19 19:20:23 +00:00
Konstantin Belousov	3c484f325e	Convert page cache read to VOP. There are several negative side-effects of not calling into VOP layer at all for page cache reads. The biggest is the missed activation of EVFILT_READ knotes. Also, it allows filesystem to make more fine grained decision to refuse read from page cache. Keep VIRF_PGREAD flag around, it is still useful for nullfs, and for asserts. Reviewed by: markj Tested by: pho Discussed with: mjg Sponsored by: The FreeBSD Foundation Differential revision: https://reviews.freebsd.org/D26346	2020-09-15 22:06:36 +00:00
Mateusz Guzik	2bcfa5ba6f	vfs: drop a write-only var in vfs_periodic_msync_inactive	2020-09-08 16:06:26 +00:00
Mateusz Guzik	b1a824b684	vfs: retire vholdl as a symbol Similarly to vrefl in r364283.	2020-09-02 19:21:37 +00:00
Mateusz Guzik	2b4632aee9	vfs: purge cache entries early on vgone There is no reason for them to linger across reclaim and it is an invariant that doomed vnodes are not added to the namecache.	2020-09-02 19:21:10 +00:00
Mateusz Guzik	6fed89b179	kern: clean up empty lines in .c and .h files	2020-09-01 22:12:32 +00:00
Mateusz Guzik	a459a6cfe7	vfs: respect PRIV_VFS_LOOKUP in vaccess_smr Reported by: novel	2020-08-25 14:18:50 +00:00
Mateusz Guzik	9ce9158b53	vfs: support denying access in vaccess_vexec_smr	2020-08-23 21:05:06 +00:00
Mateusz Guzik	ba3b099198	vfs: factor away doomed vnode handling into vdropl_final	2020-08-23 21:04:35 +00:00
Mateusz Guzik	2ca83b5c27	vfs: mark freevnode as noinline	2020-08-23 11:05:26 +00:00
Mateusz Guzik	19337211f8	vfs: fix freevnode accounting Most notably add the missing decrement to vhold_smr. .-'---`-. ,' `. \| \ \| \ \ _ \ ,\ _ ,'-,/-)\ ( * \ \,' ,' ,'-) `._,) -',-') \/ ''/ ) / / / ,'-' Reported by: Dan Nelson <dnelson_1901@yahoo.com> Fixes: r362827 ("vfs: protect vnodes with smr")	2020-08-21 21:24:14 +00:00
Mateusz Guzik	8f226f4c23	vfs: remove the always-curthread td argument from VOP_RECLAIM	2020-08-19 07:28:01 +00:00
Mateusz Guzik	7ad2a82da2	vfs: drop the error parameter from vn_isdisk, introduce vn_isdisk_error Most consumers pass NULL.	2020-08-19 02:51:17 +00:00
Konstantin Belousov	fbca789fc3	VMIO read If possible, i.e. if the requested range is resident valid in the vm object queue, and some secondary conditions hold, copy data for read(2) directly from the valid cached pages, avoiding vnode lock and instantiating buffers. I intentionally do not start read-ahead, nor handle the advises on the cached range. Filesystems indicate support for VMIO reads by setting VIRF_PGREAD flag, which must not be cleared until vnode reclamation. Currently only filesystems that use vnode pager for v_objects can enable it, due to reliance on vnp_size. There is a WIP to handle it for tmpfs. Reviewed by: markj Discussed with: jeff Tested by: pho Benchmarked by: mjg Sponsored by: The FreeBSD Foundation Differential revision: https://reviews.freebsd.org/D25968	2020-08-16 21:02:45 +00:00
Mateusz Guzik	6041408826	vfs: retire vrefl as a symbol vrefl calls vref and there is only one in-tree consumer. Keep it as a macro for assertion purposes.	2020-08-16 18:51:12 +00:00
Mateusz Guzik	a92a971bbb	vfs: remove the thread argument from vget It was already asserted to be curthread. Semantic patch: @@ expression arg1, arg2, arg3; @@ - vget(arg1, arg2, arg3) + vget(arg1, arg2)	2020-08-16 17:18:54 +00:00
Mateusz Guzik	36f47512d9	vfs: inline vrefcnt	2020-08-12 04:53:20 +00:00
Mateusz Guzik	4c2d103a02	vfs: garbage collect vrefactn	2020-08-12 04:53:02 +00:00
Mateusz Guzik	6883f07e97	vfs: reimplement vref on top of vget No change in generated assembly.	2020-08-12 04:52:35 +00:00
Mateusz Guzik	3b44443626	devfs: rework si_usecount to track opens This removes a lot of special casing from the VFS layer. Reviewed by: kib (previous version) Tested by: pho (previous version) Differential Revision: https://reviews.freebsd.org/D25612	2020-08-11 14:27:57 +00:00
Mateusz Guzik	7f70080150	vfs: disallow NOCACHE with LOOKUP This means there is no expectation lookup will purge the terminal entry, which simplifies lockless lookup. Tested by: pho Sponsored by: The FreeBSD Foundation	2020-08-10 10:33:40 +00:00
Mateusz Guzik	1ff80a3400	vfs: release the interlock after failing to set VHOLD_NO_SMR While here add more comments. Diagnosed by: markj Reported by: pho Fixes: r362827 ("vfs: protect vnodes with smr")	2020-08-07 19:36:08 +00:00
Mark Johnston	0ffec1b03d	Clean up reassignbuf() and buf_vlist_remove() a bit. - Convert panic() calls to INVARIANTS-only assertions. The PCTRIE code provides some of the same protection since it will panic upon an attempt to remove a non-resident buffer. - Update the comment above reassignbuf() to reflect reality. Reviewed by: cem, kib, mjg MFC after: 1 week Sponsored by: The FreeBSD Foundation Differential Revision: https://reviews.freebsd.org/D25965	2020-08-06 15:43:15 +00:00
Mark Johnston	7013797e34	Remove the vfs.reassignbufcalls counter and sysctl. As the 20-year old comment above it suggests, the counter is of dubious value. Moreover, the (global) counter was not updated precisely and hurts scalability. Reviewed by: cem, kib, mjg MFC after: 1 week Sponsored by: The FreeBSD Foundation Differential Revision: https://reviews.freebsd.org/D25965	2020-08-06 15:42:59 +00:00
Mateusz Guzik	d292b1940c	vfs: remove the obsolete privused argument from vaccess This brings argument count down to 6, which is passable without the stack on amd64.	2020-08-05 09:27:03 +00:00
Mateusz Guzik	db99ec5656	vfs: support lockless dotdot lookup Tested by: pho	2020-08-04 23:07:42 +00:00
Mateusz Guzik	6e10434c02	cache: add cache_purge_vgone cache_purge locklessly checks whether the vnode at hand has any namecache entries. This can race with a concurrent purge which managed to remove the last entry, but may not be done touching the vnode. Make sure we observe the relevant vnode lock as not taken before proceeding with vgone. Paired with the fact that doomed vnodes cannnot receive entries this restores the invariant that there are no namecache-related writing users past cache_purge in vgone. Reported by: pho	2020-08-04 23:04:29 +00:00

1 2 3 4 5 ...

1138 Commits