Commit Graph

4247 Commits

Author SHA1 Message Date
Fedor Uporov
3767ed5b11 Add a EXT2FS-specific implementation for lseek(SEEK_DATA).
The lseek(SEEK_DATA) optimization logic could be simply borrowed from ufs side.
See, https://reviews.freebsd.org/D19599.

Reviewed by:    pfg
MFC after:      1 week

Differential Revision:    https://reviews.freebsd.org/D23605
2020-02-18 16:39:57 +00:00
Pawel Biernacki
e0d69c5a88 Mark more nodes as CTLFLAG_MPSAFE or CTLFLAG_NEEDGIANT (1 of many)
r357614 added CTLFLAG_NEEDGIANT to make it easier to find nodes that are
still not MPSAFE (or already are but aren’t properly marked). Use it in
preparation for a general review of all nodes.
This is non-functional change that adds annotations to SYSCTL_NODE and
SYSCTL_PROC nodes using one of the soon-to-be-required flags.

Reviewed by:	kib, trasz
Approved by:	kib (mentor)
Differential Revision:	https://reviews.freebsd.org/D23640
2020-02-15 18:48:38 +00:00
Mateusz Guzik
074ad60a4c vfs: make write suspension mandatory
At the time opt-in was introduced adding yourself as a writer was esrializing
across the mount point. Nowadays it is fully per-cpu, the only impact being
a small single-threaded hit on top of what's there right now.

Vast majority of the overhead stems from the call to VOP_GETWRITEMOUNT which
has is done regardless.

Should someone want to microoptimize this single-threaded they can coalesce
looking the mount up with adding a write to it.
2020-02-15 13:00:39 +00:00
Konstantin Belousov
c1e84733ac tmpfs: add nomtime mount option,
which disables tracking mtime updates due to writes through the shared
mapped areas backed by tmpfs files.  This removes periodic scans which
downgrades rw mapped pages to ro to note the writes.

Suggested by:	mjg
Reviewed by:	markj
Tested by:	pho
Sponsored by:	The FreeBSD Foundation
MFC after:	1 week
Differential revision:	https://reviews.freebsd.org/D23432
2020-02-04 19:05:58 +00:00
Konstantin Belousov
b66352b787 tmpfs_mount update: simplify, cache the value of VFS_TO_TMPFS() calculation.
Sponsored by:	The FreeBSD Foundation
MFC after:	1 week
2020-02-04 18:52:25 +00:00
Mateusz Guzik
2abdae33b1 tmpfs: inline tmpfs_update
It was generated to be just a jumping off point to tmpfs_itimes.

While here provide a dedicated variant for getattr since we normally don't
expect to need to the update from that caller.
2020-02-03 17:06:21 +00:00
Mateusz Guzik
f1fa1ba3d0 Fix up various vnode-related asserts which did not dump the used vnode 2020-02-03 14:25:32 +00:00
Kyle Evans
6a5abb1ee5 Provide O_SEARCH
O_SEARCH is defined by POSIX [0] to open a directory for searching, skipping
permissions checks on the directory itself after the initial open(). This is
close to the semantics we've historically applied for O_EXEC on a directory,
which is UB according to POSIX. Conveniently, O_SEARCH on a file is also
explicitly undefined behavior according to POSIX, so O_EXEC would be a fine
choice. The spec goes on to state that O_SEARCH and O_EXEC need not be
distinct values, but they're not defined to be the same value.

This was pointed out as an incompatibility with other systems that had made
its way into libarchive, which had assumed that O_EXEC was an alias for
O_SEARCH.

This defines compatibility O_SEARCH/FSEARCH (equivalent to O_EXEC and FEXEC
respectively) and expands our UB for O_EXEC on a directory. O_EXEC on a
directory is checked in vn_open_vnode already, so for completeness we add a
NOEXECCHECK when O_SEARCH has been specified on the top-level fd and do not
re-check that when descending in namei.

[0] https://pubs.opengroup.org/onlinepubs/9699919799/

Reviewed by:	kib
Differential Revision:	https://reviews.freebsd.org/D23247
2020-02-02 16:34:57 +00:00
Kyle Evans
bd11e674ec pseudofs: don't do VEXEC check in VOP_CACHEDLOOKUP
VOP_CACHEDLOOKUP should assume that the appropriate VEXEC check has been
done in the caller (vfs_cache_lookup), so it does not belong here.
2020-02-02 15:36:12 +00:00
Mateusz Guzik
10a15df653 vfs: remove the never set VDESC_VPP_WILLRELE flag 2020-02-02 09:35:48 +00:00
Mateusz Guzik
45757984f8 vfs: consistently use size_t for buflen around VOP_VPTOCNP 2020-02-01 20:34:43 +00:00
Konstantin Belousov
dc1d2cc648 Fix a bug in r357199.
Around a generic call to null_nodeget(), there is nothing that would
prevent the unmount of the nullfs mp until we process to the
insmntque1() point.  Calculate the VV_ROOT flag after insmntque1() to
not access mp->mnt_data before we have an exclusively locked vnode
from this mount point on the mp vnode list.

Reported and tested by:	pho
Sponsored by:	The FreeBSD Foundation
MFC after:	1 week
2020-01-30 19:34:37 +00:00
Mateusz Guzik
3cfabd81a1 vfs: remove the never set VDESC_NOMAP_VPP flag 2020-01-30 08:56:22 +00:00
Konstantin Belousov
5fc9e11c42 Save lower root vnode in nullfs mnt data instead of upper.
Nullfs needs to know the root vnode of the lower fs during the
operation.  Currently it caches the upper vnode of it, which is also
the root of the nullfs mount.  On unmount, nullfs calls vflush() with
rootrefs == 1, and aborts non-forced unmount if there are any more
vnodes instantiated during vflush().  This means that the reference to
the root vnode after failed non-forced unmount could be lost and
nullm_rootvp points to the freed memory.

Fix it by storing the reference for lower vnode instead, which is kept
intact during vflush().  nullfs_root() now instantiates the upper
vnode of lower root.  Care about VV_ROOT flag in null_nodeget().

Reported and tested by:	pho
Sponsored by:	The FreeBSD Foundation
MFC after:	1 week
2020-01-28 11:29:06 +00:00
Alex Richardson
162ae9c834 Allow bootstrapping makefs on older FreeBSD hosts and Linux/macOS
In order to do so we need to install the msdosfs headers to the bootstrap
sysroot and avoid includes of kernel headers that may not exist on every
host (e.g. sys/lockmgr.h). This change should allow bootstrapping of makefs
on FreeBSD 11+ as well as Linux and macOS.

We also have to avoid using the IO_SYNC macro since that may not be
available. In makefs it is only used to switch between calling
bwrite() and bdwrite() which both call the same function. Therefore we
can simply always call bwrite().

For our CheriBSD builds we always bootstrap makefs by setting
LOCAL_XTOOL_DIRS='lib/libnetbsd usr.sbin/makefs' and use the makefs binary
from the build tree to create a bootable disk image.

Reviewed By:	brooks
Differential Revision: https://reviews.freebsd.org/D23201
2020-01-27 12:02:41 +00:00
Rick Macklem
60a09a94cf Fix a crash in the NFSv4 server.
The PR reported a crash that occurred when a file was removed while
client(s) were actively doing lock operations on it.
Since nfsvno_getvp() will return NULL when the file does not exist,
the bug was obvious and easy to fix via this patch. It is a little
surprising that this wasn't found sooner, but I guess the above
case rarely occurs.

Tested by:	iron.udjin@gmail.com
PR:		242768
Reported by:	iron.udjin@gmail.com
MFC after:	2 weeks
2020-01-26 17:59:05 +00:00
Jeff Roberson
d6e13f3b4d Don't hold the object lock while calling getpages.
The vnode pager does not want the object lock held.  Moving this out allows
further object lock scope reduction in callers.  While here add some missing
paging in progress calls and an assert.  The object handle is now protected
explicitly with pip.

Reviewed by:	kib, markj
Differential Revision:	https://reviews.freebsd.org/D23033
2020-01-19 23:47:32 +00:00
Mateusz Guzik
d3cc535474 vfs: provide F_ISUNIONSTACK as a kludge for libc
Prior to introduction of this op libc's readdir would call fstatfs(2), in
effect unnecessarily copying kilobytes of data just to check fs name and a
mount flag.

Reviewed by:	kib (previous version)
Differential Revision:	https://reviews.freebsd.org/D23162
2020-01-17 14:42:25 +00:00
Mateusz Guzik
e2fa68513e unionfs: use MNTK_NOMSYNC 2020-01-16 22:45:08 +00:00
Mateusz Guzik
2a829749d3 tmpfs: add missing CLTFLAG_MPSAFE annotation 2020-01-15 01:32:11 +00:00
Mateusz Guzik
7493134e08 nfs: add missing CLTFLAG_MPSAFE annotations 2020-01-15 01:31:57 +00:00
Mateusz Guzik
388820fbef fusefs: add missing CLTFLAG_MPSAFE annotation 2020-01-15 01:31:28 +00:00
Eric van Gyzen
cf64777f50 Add missing comma in nfsv4_errstr
Reported by:	Coverity
CID:		1412243
Sponsored by:	Dell EMC Isilon
2020-01-13 21:49:27 +00:00
Mateusz Guzik
cc3593fbd9 vfs: rework vnode list management
The current notion of an active vnode is eliminated.

Vnodes transition between 0<->1 hold counts all the time and the
associated traversal between different lists induces significant
scalability problems in certain workloads.

Introduce a global list containing all allocated vnodes. They get
unlinked only when UMA reclaims memory and are only requeued when
hold count reaches 0.

Sample result from an incremental make -s -j 104 bzImage on tmpfs:
stock:   118.55s user 3649.73s system 7479% cpu 50.382 total
patched: 122.38s user 1780.45s system 6242% cpu 30.480 total

Reviewed by:	jeff
Tested by:	pho (in a larger patch, previous version)
Differential Revision:	https://reviews.freebsd.org/D22997
2020-01-13 02:37:25 +00:00
Mateusz Guzik
57083d2576 vfs: add per-mount vnode lazy list and use it for deferred inactive + msync
This obviates the need to scan the entire active list looking for vnodes
of interest.

msync is handled by adding all vnodes with write count to the lazy list.

deferred inactive directly adds vnodes as it sets the VI_DEFINACT flag.

Vnodes get dequeued from the list when their hold count reaches 0.

Newly added MNT_VNODE_FOREACH_LAZY* macros support filtering so that
spurious locking is avoided in the common case.

Reviewed by:	jeff
Tested by:	pho (in a larger patch, previous version)
Differential Revision:	https://reviews.freebsd.org/D22995
2020-01-13 02:34:02 +00:00
Mateusz Guzik
b249ce48ea vfs: drop the mostly unused flags argument from VOP_UNLOCK
Filesystems which want to use it in limited capacity can employ the
VOP_UNLOCK_FLAGS macro.

Reviewed by:	kib (previous version)
Differential Revision:	https://reviews.freebsd.org/D21427
2020-01-03 22:29:58 +00:00
Mateusz Guzik
4a20fe31c3 unionfs: fix up VOP_UNLOCK use after flags stopped being supported
For the most part the code was passing the LK_RELEASE flag.
The 2 cases which did not use the VOP_UNLOCK_FLAGS macro.

This fixes a panic when stacking unionfs on top of e.g., tmpfs when
debug is enabled.

Note there are latent bugs which prevent unionfs from working with debug
regardless of this change.

PR:		243064
Reported by:	Mason Loring Bliss
2020-01-03 22:12:25 +00:00
Mateusz Guzik
d2203b48a5 msdos: vgone unconstructed vnode before vputing it
Otherwise someone else may race to start using it. Race window
was opened by r351748 ("vfs: implement usecount implying holdcnt").

Noted by:	kib
2020-01-01 22:50:23 +00:00
Mateusz Guzik
f342b91c76 msdosfs: add a missing MNT_VNODE_FOREACH_ALL_ABORT to msdosfs_sync 2020-01-01 22:47:00 +00:00
Mark Johnston
9f5632e6c8 Remove page locking for queue operations.
With the previous reviews, the page lock is no longer required in order
to perform queue operations on a page.  It is also no longer needed in
the page queue scans.  This change effectively eliminates remaining uses
of the page lock and also the false sharing caused by multiple pages
sharing a page lock.

Reviewed by:	jeff
Tested by:	pho
Sponsored by:	Netflix, Intel
Differential Revision:	https://reviews.freebsd.org/D22885
2019-12-28 19:04:00 +00:00
Rick Macklem
57fd4aa7e4 Change NFSv4.1 and NFSv4.2 error strings to start with lower case letter.
r356084 added error strings for NFSv4.1 and NFSv4.2, with the first
character capitalized. Since the other error strings were not capitalized
and these strings would usually be imbedded in an error, I decided to
make the first characters lower cased.
No real effect but more consistent.
2019-12-26 21:06:34 +00:00
Rick Macklem
8f2940cec7 Add NFSv4.1 and NFSv4.2 errors to nfsv4_errstr.h.
nfsv4_errstr.h only had strings for NFSv4.0 errors. This patch adds the
errors for NFSv4.1 and NFSv4.2. At this time, this file is not used by
any sources in the tree, so the change is not significant.
I do plan on using nfsv4_errstr.h in a future patch to mount_nfs.c.
Since I am doing this patch so that "minor version mismatch" will be
recognized, I made that string less abbreviated.
2019-12-25 22:25:30 +00:00
Rick Macklem
05dcd5d2c8 Fix nfsmount() so that it will return NFSERR_MINORVERMISMATCH.
If nfsrpc_getdirpath() returns NFSERR_MINORVERMISMATCH, it would erroneously
get mapped to EIO. This was not particularily harmful, but would make it
hard for sysadmins to diagnose why an NFSv4 mount is failing.

mount_nfs.c still needs to be fixed so that it does not report
NFSERR_MINORVERMISMATCH as an unknown error 10021.

MFC after:	1 week
2019-12-25 01:15:38 +00:00
Doug Moore
f9f4c60aa7 Including <sys/tmpfs.h> into non-kernel software leads to a
compilation error because, without _KERNEL defined, the macro
TMPFS_VALIDATE_DIR is invoked, but never defined. User-level software
that includes sys/tmpfs.h must define _KERNEL to make the definition
of TMPFS_VALIDATE_DIR visible.

This change puts all the inline functions that, directly or
indirectly, invoke MPASS into the scope of the _KERNEL block, allowing
many user-space includers of <sys/tmpfs.h> to stop defining _KERNEL.

Reviewed by:	alc, kib
MFC after:	1 week
Differential Revision:	https://reviews.freebsd.org/D22874
2019-12-19 16:39:52 +00:00
Mateusz Guzik
6fa079fc3f vfs: flatten vop vectors
This eliminates the following loop from all VOP calls:

while(vop != NULL && \
    vop->vop_spare2 == NULL && vop->vop_bypass == NULL)
        vop = vop->vop_default;

Reviewed by:	jeff
Tesetd by:	pho
Differential Revision:	https://reviews.freebsd.org/D22738
2019-12-16 00:06:22 +00:00
Jeff Roberson
a808177864 Add a deferred free mechanism for freeing swap space that does not require
an exclusive object lock.

Previously swap space was freed on a best effort basis when a page that
had valid swap was dirtied, thus invalidating the swap copy.  This may be
done inconsistently and requires the object lock which is not always
convenient.

Instead, track when swap space is present.  The first dirty is responsible
for deleting space or setting PGA_SWAP_FREE which will trigger background
scans to free the swap space.

Simplify the locking in vm_fault_dirty() now that we can reliably identify
the first dirty.

Discussed with:	alc, kib, markj
Differential Revision:	https://reviews.freebsd.org/D22654
2019-12-15 03:15:06 +00:00
Rick Macklem
f808cf7294 Silence some "might not be initialized" warnings for riscv64.
None of these case were actually using the variable(s) uninitialized, but
I figured that silencing the warnings via initializing them made sense.

Some of these predated r355677.
2019-12-13 21:38:08 +00:00
Rick Macklem
bf6ac05aa3 Add some more initializations to quiet riscv build.
The one case in nfs_copy_file_range() was a legitimate case, although
it would probably never occur in practice.
2019-12-13 01:34:25 +00:00
Rick Macklem
95bf2e523b Fix the build for MAC not defined and a couple of might not be initialized.
r355677 broke the build for the not MAC defined case and a couple of
might not be initialized warnings were generated for riscv. Others seem
to be erroneous.

Hopefully there won't be too many more build errors.

Pointy hat goes on me.
2019-12-13 00:45:14 +00:00
Rick Macklem
c057a37818 Add support for NFSv4.2 to the NFS client and server.
This patch adds support for NFSv4.2 (RFC-7862) and Extended Attributes
(RFC-8276) to the NFS client and server.
NFSv4.2 is comprised of several optional features that can be supported
in addition to NFSv4.1. This patch adds the following optional features:
   - posix_fadvise(POSIX_FADV_WILLNEED/POSIX_FADV_DONTNEED)
   - posix_fallocate()
   - intra server file range copying via the copy_file_range(2) syscall
     --> Avoiding data tranfer over the wire to/from the NFS client.
   - lseek(SEEK_DATA/SEEK_HOLE)
   - Extended attribute syscalls for "user" namespace attributes as defined
     by RFC-8276.

Although this patch is fairly large, it should not affect support for
the other versions of NFS. However it does add two new sysctls that allow
a sysadmin to limit which minor versions of NFSv4 a server supports, allowing
a sysadmin to disable NFSv4.2.

Unfortunately, when the NFS stats structure was last revised, it was assumed
that there would be no additional operations added beyond what was
specified in RFC-7862. However RFC-8276 did add additional operations,
forcing the NFS stats structure to revised again. It now has extra unused
entries in all arrays, so that future extensions to NFSv4.2 can be
accomodated without revising this structure again.

A future commit will update nfsstat(1) to report counts for the new NFSv4.2
specific operations/procedures.

This patch affects the internal interface between the nfscommon, nfscl and
nfsd modules and, as such, they all must be upgraded simultaneously.
I will do a version bump (although arguably not needed), due to this.

This code has survived a "make universe" but has not been built with a
recent GCC. If you encounter build problems, please email me.

Relnotes:	yes
2019-12-12 23:22:55 +00:00
Mateusz Guzik
c8b29d1212 vfs: locking primitives which elide ->v_vnlock and shared locking disablement
Both of these features are not needed by many consumers and result in avoidable
reads which in turn puts them on profiles due to cache-line ping ponging.

On top of that the current lockgmr entry point is slower than necessary
single-threaded. As an attempted clean up preparing for other changes,
provide new routines which don't support any of the aforementioned features.

With these patches in place vop_stdlock and vop_stdunlock disappear from
flamegraphs during -j 104 buildkernel.

Reviewed by:	jeff (previous version)
Tested by:	pho
Differential Revision:	https://reviews.freebsd.org/D22665
2019-12-11 23:11:21 +00:00
Mateusz Guzik
abd80ddb94 vfs: introduce v_irflag and make v_type smaller
The current vnode layout is not smp-friendly by having frequently read data
avoidably sharing cachelines with very frequently modified fields. In
particular v_iflag inspected for VI_DOOMED can be found in the same line with
v_usecount. Instead make it available in the same cacheline as the v_op, v_data
and v_type which all get read all the time.

v_type is avoidably 4 bytes while the necessary data will easily fit in 1.
Shrinking it frees up 3 bytes, 2 of which get used here to introduce a new
flag field with a new value: VIRF_DOOMED.

Reviewed by:	kib, jeff
Differential Revision:	https://reviews.freebsd.org/D22715
2019-12-08 21:30:04 +00:00
Rick Macklem
a95cd06e9a Delete an unused external declaration.
Since nfsv4_opflag is no longer used in nfs_clcomsubs.c, delete the
external declaration of it. Found during NFSv4.2 code merge.

MFC after:	2 weeks
2019-12-08 16:59:36 +00:00
Rick Macklem
8e1906f700 Fix kernel handling of a NFSERR_MINORVERSMISMATCH NFSv4 server reply.
When an NFSv4 server replies NFSERR_MINORVERSMISMATCH, it does not generate
a status result for the first operation in the compound. Without this
patch, this will result in a bogus EBADXDR error return.
Returning EBADXDR is relatively harmless, but a correct reply of
NFSERR_MINORVERSMISMATCH is needed by the pNFS client to select the correct
minor version to use for a File Layout DS now that there can be NFSv4.2
DS servers.

mount_nfs.c still needs to be fixed for this, although how the mount fails
is only useful to help sysadmins isolate why a mount fails.

Found during testing of the NFSv4.2 client and server.

MFC after:	2 weeks
2019-12-08 00:06:00 +00:00
Rick Macklem
238da71f91 Add some definitions for NFSv4.2 which will be used by subsequent commits.
This is a preliminary commit of NFSv4.2 definitions that will be used by
subsequent commits which adds NFSv4.2 support to the NFS client and server.

There will be a series of these preliminary commits that will prepare for
a major commit of the NFSv4.2 client/server changes currently found in
subversion under projects/nfsv42/sys.
2019-12-07 23:13:51 +00:00
Rick Macklem
394dae30b1 Set the XATTRSUPPORT attribute bit for NFSv4.2, always cleared for now.
Since r355472 added code which clears the XATTRSUPPORT bit for non-NFSv4.2
mounts, it is now safe to set it.

There will be a series of these preliminary commits that will prepare for
a major commit of the NFSv4.2 client/server changes currently found in
subversion under projects/nfsv42/sys.
This commit completes updates to nfsproto.h required by the NFSv4.2.
2019-12-07 01:10:38 +00:00
Rick Macklem
2096ce0339 Add a couple of definitions for NFSv4.2 and update macros to use them.
This patch adds code to macros to clear attribute bits not supported
by NFSv4.2. For now, these bits are never set anyhow, but this prepares
the code for the addition of NFSv4.2 support in a future commit.

There will be a series of these preliminary commits that will prepare for
a major commit of the NFSv4.2 client/server changes currently found in
subversion under projects/nfsv42/sys.
2019-12-06 23:51:11 +00:00
Rick Macklem
8f9259a550 Add some definitions for NFSv4.2 which will be used by subsequent commits.
This is a preliminary commit of NFSv4.2 definitions that will be used by
subsequent commits which adds NFSv4.2 support to the NFS client and server.

There will be a series of these preliminary commits that will prepare for
a major commit of the NFSv4.2 client/server changes currently found in
subversion under projects/nfsv42/sys.
2019-12-06 01:53:02 +00:00
Mateusz Guzik
1e0006e49c nullfs: locklessly check for entries in null_hashget
During random sampling over poudriere -j 104 over 10% of calls returned NULL.
2019-12-05 13:41:22 +00:00
Konstantin Belousov
a51c8071a3 Stop using per-mount tmpfs zones.
Requested and reviewed by:	jeff
Sponsored by:	The FreeBSD Foundation
MFC after:	1 week
Differential revision:	https://reviews.freebsd.org/D22643
2019-12-05 00:03:17 +00:00