287 Commits

Author SHA1 Message Date
Alex Richardson
1d15bceae6 tmpfs: implement pathconf(_PC_SYMLINK_MAX)
This fixes one of the sys/audit tests when running them on tmpfs.

Reviewed By:	delphij, kib
Differential Revision: https://reviews.freebsd.org/D28387
2021-01-29 09:30:25 +00:00
Kyle Evans
0f919ed4ae tmpfs: push VEXEC check into tmpfs_lookup()
vfs_cache_lookup() has already done the appropriate VEXEC check, therefore
we must not re-check in VOP_CACHEDLOOKUP.

This fixes O_SEARCH semantics on tmpfs and removes a redundant descent into
VOP_ACCESS() in the common case.

Reported-by:	arichardson (via CheriBSD Jenkins CI)
Reviewed-by:	kib
MFC-after:	3 days
Differential Revision:	https://reviews.freebsd.org/D28401
2021-01-28 19:25:11 -06:00
Mateusz Guzik
c09f799271 tmpfs: drop acq fence now that vn_load_v_data_smr has consume semantics 2021-01-25 22:40:15 +00:00
Mateusz Guzik
cc96f92a57 atomic: make atomic_store_ptr type-aware 2021-01-25 22:40:15 +00:00
Mateusz Guzik
618029af50 tmpfs: add support for lockless symlink lookup
Reviewed by:	kib (previous version)
Tested by:	pho (previous version)
Differential Revision:	https://reviews.freebsd.org/D27488
2021-01-23 15:04:43 +00:00
Konstantin Belousov
2d1e4220eb tmpfs_reclaim: detach unlinked node on dereferencing.
Otherwise it is dereferenced one extra time at unmount, if it survives
long enough.  One way to hold the reference on such node is to keep it
open.

tmpfs_vptocnp() now needs to account for the possibility that unlocked
node was removed from the list.

Reported by:	danfe
Tested by:	danfe, pho
MFC after:	1 week
Sponsored by:	The FreeBSD Foundation
2021-01-14 14:51:37 +02:00
Konstantin Belousov
685265ecfb tmpfs_reclaim: style
MFC after:	3 days
Sponsored by:	The FreeBSD Foundation
2021-01-14 14:43:13 +02:00
Konstantin Belousov
ac2576b9f7 tmpfs open: assert that there is no double-init of f_data.
Sponsored by:	The FreeBSD Foundation
2021-01-10 04:48:36 +02:00
Konstantin Belousov
9f200bc47b tmpfs_free_tmp(): explicitly assert that tmp is locked
Despite TMPFS_UNLOCK() is done in both paths later, unlocking not locked
mutex provides different failure mode.

MFC after:	1 week
Sponsored by:	The FreeBSD Foundation
2021-01-10 04:48:29 +02:00
Konstantin Belousov
42bebbda9e tmpfs: make M_TMPFSMNT static to tmpfs_vfsops.c
This malloc type is only used in this file.

MFC after:	1 week
Sponsored by:	The FreeBSD Foundation
2021-01-10 04:44:55 +02:00
Mark Johnston
90f580b954 Ensure that dirent's d_off field is initialized
We have the d_off field in struct dirent for providing the seek offset
of the next directory entry.  Several filesystems were not initializing
the field, which ends up being copied out to userland.

Reported by:	Syed Faraz Abrar <faraz@elttam.com>
Reviewed by:	kib
MFC after:	3 days
Sponsored by:	The FreeBSD Foundation
Differential Revision:	https://reviews.freebsd.org/D27792
2021-01-03 11:50:31 -05:00
Mateusz Guzik
3e506a67bb vfs: add v_irflag accessors
Reviewed by:	kib (previous version)
Differential Revision:	https://reviews.freebsd.org/D27793
2021-01-03 06:50:06 +00:00
Mateusz Guzik
d71965127f tmpfs: use VNPASS when asserting on a vnode in tmpfs_read_pgcache 2021-01-01 03:23:01 +00:00
Mateusz Guzik
f24aa01f9d tmpfs: reorder struct tmpfs_node to shrink it by 8 bytes
The reduction (232 -> 224 bytes) allows UMA to fit one more item (17 -> 18)
per slab as reported in vm.uma.TMPFS_node.keg.ipers.
2020-11-05 11:24:45 +00:00
Mateusz Guzik
7c58c37ebb tmpfs: change tmpfs dirent zone into a malloc type
It is 64 bytes.
2020-10-30 14:07:25 +00:00
Mateusz Guzik
4bfebc8d2c cache: add cache_vop_mkdir and rename cache_rename to cache_vop_rename 2020-10-30 10:46:35 +00:00
Mateusz Guzik
25fb30bd9a vfs: drop spurious cache_purge on rmdir
The removed directory gets cache_purged which is sufficient to remove any entries
related to the parent.

Note only tmpfs, ufs and zfs are patched.
2020-10-23 15:50:49 +00:00
Eric van Gyzen
f9cc8410e1 vm_ooffset_t is now unsigned
vm_ooffset_t is now unsigned. Remove some tests for negative values,
or make other adjustments accordingly.

Reported by:	Coverity
Reviewed by:	kib markj
Sponsored by:	Dell EMC Isilon
Differential Revision:	https://reviews.freebsd.org/D26214
2020-09-18 16:48:08 +00:00
Konstantin Belousov
016b7c7e39 tmpfs: restore atime updates for reads from page cache.
Split TMPFS_NODE_ACCCESSED bit into dedicated byte that can be updated
atomically without locks or (locked) atomics.

tn_update_getattr() change also contains unrelated bug fix.

Reported by:	lwhsu
PR:	249362
Reviewed by:	markj (previous version)
Discussed with:	mjg
Sponsored by:	The FreeBSD Foundation
Differential revision:	https://reviews.freebsd.org/D26451
2020-09-16 21:28:18 +00:00
Konstantin Belousov
23f9071466 Style.
Sponsored by:	The FreeBSD Foundation
MFC after:	3 days
2020-09-16 21:24:34 +00:00
Konstantin Belousov
081e36e760 Add tmpfs page cache read support.
Or it could be explained as lockless (for vnode lock) reads.  Reads
are performed from the node tn_obj object.  Tmpfs regular vnode object
lifecycle is significantly different from the normal OBJT_VNODE: it is
alive as far as ref_count > 0.

Ensure liveness of the tmpfs VREG node and consequently v_object
inside VOP_READ_PGCACHE by referencing tmpfs node in tmpfs_open().
Provide custom tmpfs fo_close() method on file, to ensure that close
is paired with open.

Add tmpfs VOP_READ_PGCACHE that takes advantage of all tmpfs quirks.
It is quite cheap in code size sense to support page-ins for read for
tmpfs even if we do not own tmpfs vnode lock.  Also, we can handle
holes in tmpfs node without additional efforts, and do not have
limitation of the transfer size.

Reviewed by:	markj
Discussed with and benchmarked by:	mjg (previous version)
Tested by:	pho
Sponsored by:	The FreeBSD Foundation
Differential revision:	https://reviews.freebsd.org/D26346
2020-09-15 22:19:16 +00:00
Konstantin Belousov
4601f5f5ee Microoptimize tmpfs node ref/unref by using atomics.
Avoid tmpfs mount and node locks when ref count is greater than zero,
which is the case until node is being destroyed by unlink or unmount.

Reviewed by:	markj
Tested by:	pho
Sponsored by:	The FreeBSD Foundation
Differential revision:	https://reviews.freebsd.org/D26346
2020-09-15 22:13:21 +00:00
Mateusz Guzik
c86d2ba8a5 tmpfs: drop spurious cache_purge in tmpfs_reclaim
vgone already performs it.
2020-09-04 19:30:15 +00:00
Mateusz Guzik
586ee69f09 fs: clean up empty lines in .c and .h files 2020-09-01 21:18:40 +00:00
Mateusz Guzik
39f8815070 cache: add cache_rename, a dedicated helper to use for renames
While here make both tmpfs and ufs use it.

No fuctional changes.
2020-08-20 10:05:46 +00:00
Mateusz Guzik
1abe36567f tmpfs: use vget_prep/vget_finish instead of vget + vnode 2020-08-16 17:19:23 +00:00
Mateusz Guzik
a92a971bbb vfs: remove the thread argument from vget
It was already asserted to be curthread.

Semantic patch:

@@

expression arg1, arg2, arg3;

@@

- vget(arg1, arg2, arg3)
+ vget(arg1, arg2)
2020-08-16 17:18:54 +00:00
Mateusz Guzik
03337743db vfs: clean MNTK_FPLOOKUP if MNT_UNION is set
Elides checking it during lookup.
2020-08-10 11:51:21 +00:00
Mateusz Guzik
9a14439f2f tmpfs: add VOP_STAT handler 2020-08-07 23:07:47 +00:00
Mateusz Guzik
d292b1940c vfs: remove the obsolete privused argument from vaccess
This brings argument count down to 6, which is passable without the
stack on amd64.
2020-08-05 09:27:03 +00:00
Mateusz Guzik
172ffe702c tmpfs: add support for lockless lookup
Reviewed by:    kib
Tested by:      pho (in a patchset)
Differential Revision:	https://reviews.freebsd.org/D25580
2020-07-25 10:38:44 +00:00
Mark Johnston
84242cf68a Call swap_pager_freespace() from vm_object_page_remove().
All vm_object_page_remove() callers, except
linux_invalidate_mapping_pages() in the LinuxKPI, free swap space when
removing a range of pages from an object.  The LinuxKPI case appears to
be an unintentional omission that could result in leaked swap blocks, so
unconditionally free swap space in vm_object_page_remove() to protect
against similar bugs in the future.

Reviewed by:	alc, kib
Tested by:	pho
Sponsored by:	The FreeBSD Foundation
Differential Revision:	https://reviews.freebsd.org/D25329
2020-06-25 15:21:21 +00:00
Ryan Moeller
693d10a291 tmpfs: Preserve alignment of struct fid fields
On 64-bit platforms, the two short fields in `struct tmpfs_fid` are padded to
the 64-bit alignment of the long field.  This pushes the offsets of the
subsequent fields by 4 bytes and makes `struct tmpfs_fid` bigger than
`struct fid`.  `tmpfs_vptofh()` casts a `struct fid *` to `struct tmpfs_fid *`,
causing 4 bytes of adjacent memory to be overwritten when the struct fields are
set.  Through several layers of indirection and embedded structs, the adjacent
memory for one particular call to `tmpfs_vptofh()` happens to be the stack
canary for `nfsrvd_compound()`.  Half of the canary ends up being clobbered,
going unnoticed until eventually the stack check fails when `nfsrvd_compound()`
returns and a panic is triggered.

Instead of duplicating fields of `struct fid` in `struct tmpfs_fid`, narrow the
struct to cover only the unique fields for tmpfs and assert at compile time
that the struct fits in the allotted space.  This way we don't have to
replicate the offsets of `struct fid` fields, we just use them directly.

Reviewed by:	kib, mav, rmacklem
Approved by:	mav (mentor)
MFC after:	1 week
Sponsored by:	iXsystems, Inc.
Differential Revision:	https://reviews.freebsd.org/D25077
2020-06-03 09:38:51 +00:00
Pawel Biernacki
7029da5c36 Mark more nodes as CTLFLAG_MPSAFE or CTLFLAG_NEEDGIANT (17 of many)
r357614 added CTLFLAG_NEEDGIANT to make it easier to find nodes that are
still not MPSAFE (or already are but aren’t properly marked).
Use it in preparation for a general review of all nodes.

This is non-functional change that adds annotations to SYSCTL_NODE and
SYSCTL_PROC nodes using one of the soon-to-be-required flags.

Mark all obvious cases as MPSAFE.  All entries that haven't been marked
as MPSAFE before are by default marked as NEEDGIANT

Approved by:	kib (mentor, blanket)
Commented by:	kib, gallatin, melifaro
Differential Revision:	https://reviews.freebsd.org/D23718
2020-02-26 14:26:36 +00:00
Mateusz Guzik
074ad60a4c vfs: make write suspension mandatory
At the time opt-in was introduced adding yourself as a writer was esrializing
across the mount point. Nowadays it is fully per-cpu, the only impact being
a small single-threaded hit on top of what's there right now.

Vast majority of the overhead stems from the call to VOP_GETWRITEMOUNT which
has is done regardless.

Should someone want to microoptimize this single-threaded they can coalesce
looking the mount up with adding a write to it.
2020-02-15 13:00:39 +00:00
Konstantin Belousov
c1e84733ac tmpfs: add nomtime mount option,
which disables tracking mtime updates due to writes through the shared
mapped areas backed by tmpfs files.  This removes periodic scans which
downgrades rw mapped pages to ro to note the writes.

Suggested by:	mjg
Reviewed by:	markj
Tested by:	pho
Sponsored by:	The FreeBSD Foundation
MFC after:	1 week
Differential revision:	https://reviews.freebsd.org/D23432
2020-02-04 19:05:58 +00:00
Konstantin Belousov
b66352b787 tmpfs_mount update: simplify, cache the value of VFS_TO_TMPFS() calculation.
Sponsored by:	The FreeBSD Foundation
MFC after:	1 week
2020-02-04 18:52:25 +00:00
Mateusz Guzik
2abdae33b1 tmpfs: inline tmpfs_update
It was generated to be just a jumping off point to tmpfs_itimes.

While here provide a dedicated variant for getattr since we normally don't
expect to need to the update from that caller.
2020-02-03 17:06:21 +00:00
Kyle Evans
6a5abb1ee5 Provide O_SEARCH
O_SEARCH is defined by POSIX [0] to open a directory for searching, skipping
permissions checks on the directory itself after the initial open(). This is
close to the semantics we've historically applied for O_EXEC on a directory,
which is UB according to POSIX. Conveniently, O_SEARCH on a file is also
explicitly undefined behavior according to POSIX, so O_EXEC would be a fine
choice. The spec goes on to state that O_SEARCH and O_EXEC need not be
distinct values, but they're not defined to be the same value.

This was pointed out as an incompatibility with other systems that had made
its way into libarchive, which had assumed that O_EXEC was an alias for
O_SEARCH.

This defines compatibility O_SEARCH/FSEARCH (equivalent to O_EXEC and FEXEC
respectively) and expands our UB for O_EXEC on a directory. O_EXEC on a
directory is checked in vn_open_vnode already, so for completeness we add a
NOEXECCHECK when O_SEARCH has been specified on the top-level fd and do not
re-check that when descending in namei.

[0] https://pubs.opengroup.org/onlinepubs/9699919799/

Reviewed by:	kib
Differential Revision:	https://reviews.freebsd.org/D23247
2020-02-02 16:34:57 +00:00
Mateusz Guzik
45757984f8 vfs: consistently use size_t for buflen around VOP_VPTOCNP 2020-02-01 20:34:43 +00:00
Jeff Roberson
d6e13f3b4d Don't hold the object lock while calling getpages.
The vnode pager does not want the object lock held.  Moving this out allows
further object lock scope reduction in callers.  While here add some missing
paging in progress calls and an assert.  The object handle is now protected
explicitly with pip.

Reviewed by:	kib, markj
Differential Revision:	https://reviews.freebsd.org/D23033
2020-01-19 23:47:32 +00:00
Mateusz Guzik
2a829749d3 tmpfs: add missing CLTFLAG_MPSAFE annotation 2020-01-15 01:32:11 +00:00
Mateusz Guzik
cc3593fbd9 vfs: rework vnode list management
The current notion of an active vnode is eliminated.

Vnodes transition between 0<->1 hold counts all the time and the
associated traversal between different lists induces significant
scalability problems in certain workloads.

Introduce a global list containing all allocated vnodes. They get
unlinked only when UMA reclaims memory and are only requeued when
hold count reaches 0.

Sample result from an incremental make -s -j 104 bzImage on tmpfs:
stock:   118.55s user 3649.73s system 7479% cpu 50.382 total
patched: 122.38s user 1780.45s system 6242% cpu 30.480 total

Reviewed by:	jeff
Tested by:	pho (in a larger patch, previous version)
Differential Revision:	https://reviews.freebsd.org/D22997
2020-01-13 02:37:25 +00:00
Mateusz Guzik
b249ce48ea vfs: drop the mostly unused flags argument from VOP_UNLOCK
Filesystems which want to use it in limited capacity can employ the
VOP_UNLOCK_FLAGS macro.

Reviewed by:	kib (previous version)
Differential Revision:	https://reviews.freebsd.org/D21427
2020-01-03 22:29:58 +00:00
Mark Johnston
9f5632e6c8 Remove page locking for queue operations.
With the previous reviews, the page lock is no longer required in order
to perform queue operations on a page.  It is also no longer needed in
the page queue scans.  This change effectively eliminates remaining uses
of the page lock and also the false sharing caused by multiple pages
sharing a page lock.

Reviewed by:	jeff
Tested by:	pho
Sponsored by:	Netflix, Intel
Differential Revision:	https://reviews.freebsd.org/D22885
2019-12-28 19:04:00 +00:00
Doug Moore
f9f4c60aa7 Including <sys/tmpfs.h> into non-kernel software leads to a
compilation error because, without _KERNEL defined, the macro
TMPFS_VALIDATE_DIR is invoked, but never defined. User-level software
that includes sys/tmpfs.h must define _KERNEL to make the definition
of TMPFS_VALIDATE_DIR visible.

This change puts all the inline functions that, directly or
indirectly, invoke MPASS into the scope of the _KERNEL block, allowing
many user-space includers of <sys/tmpfs.h> to stop defining _KERNEL.

Reviewed by:	alc, kib
MFC after:	1 week
Differential Revision:	https://reviews.freebsd.org/D22874
2019-12-19 16:39:52 +00:00
Mateusz Guzik
6fa079fc3f vfs: flatten vop vectors
This eliminates the following loop from all VOP calls:

while(vop != NULL && \
    vop->vop_spare2 == NULL && vop->vop_bypass == NULL)
        vop = vop->vop_default;

Reviewed by:	jeff
Tesetd by:	pho
Differential Revision:	https://reviews.freebsd.org/D22738
2019-12-16 00:06:22 +00:00
Jeff Roberson
a808177864 Add a deferred free mechanism for freeing swap space that does not require
an exclusive object lock.

Previously swap space was freed on a best effort basis when a page that
had valid swap was dirtied, thus invalidating the swap copy.  This may be
done inconsistently and requires the object lock which is not always
convenient.

Instead, track when swap space is present.  The first dirty is responsible
for deleting space or setting PGA_SWAP_FREE which will trigger background
scans to free the swap space.

Simplify the locking in vm_fault_dirty() now that we can reliably identify
the first dirty.

Discussed with:	alc, kib, markj
Differential Revision:	https://reviews.freebsd.org/D22654
2019-12-15 03:15:06 +00:00
Mateusz Guzik
c8b29d1212 vfs: locking primitives which elide ->v_vnlock and shared locking disablement
Both of these features are not needed by many consumers and result in avoidable
reads which in turn puts them on profiles due to cache-line ping ponging.

On top of that the current lockgmr entry point is slower than necessary
single-threaded. As an attempted clean up preparing for other changes,
provide new routines which don't support any of the aforementioned features.

With these patches in place vop_stdlock and vop_stdunlock disappear from
flamegraphs during -j 104 buildkernel.

Reviewed by:	jeff (previous version)
Tested by:	pho
Differential Revision:	https://reviews.freebsd.org/D22665
2019-12-11 23:11:21 +00:00
Mateusz Guzik
abd80ddb94 vfs: introduce v_irflag and make v_type smaller
The current vnode layout is not smp-friendly by having frequently read data
avoidably sharing cachelines with very frequently modified fields. In
particular v_iflag inspected for VI_DOOMED can be found in the same line with
v_usecount. Instead make it available in the same cacheline as the v_op, v_data
and v_type which all get read all the time.

v_type is avoidably 4 bytes while the necessary data will easily fit in 1.
Shrinking it frees up 3 bytes, 2 of which get used here to introduce a new
flag field with a new value: VIRF_DOOMED.

Reviewed by:	kib, jeff
Differential Revision:	https://reviews.freebsd.org/D22715
2019-12-08 21:30:04 +00:00