Commit Graph

4640 Commits

Author SHA1 Message Date
Rick Macklem
7f5508fe78 nfscl: Avoid KASSERT() panic in cache_enter_time()
Commit 844aa31c6d added cache_enter_time_flags(), specifically
so that the NFS client could specify that cache enter replace
any stale entry for the same name.  Doing so avoids a KASSERT()
panic() in cache_enter_time(), as reported by the PR.

This patch uses cache_enter_time_flags() for Readdirplus, to
avoid the panic(), since it is impossible for the NFS client
to know if another client (or a local process on the NFS server)
has replaced a file with another file of the same name.

This patch only affects NFS mounts that use the "rdirplus"
mount option.

There may be other places in the NFS client where this needs
to be done, but no panic() has been observed during testing.

PR:	257043
MFC after:	2 weeks
2021-07-14 13:33:37 -07:00
Mark Johnston
b9ca419a21 fifo: Explicitly initialize generation numbers when opening
The fi_rgen and fi_wgen fields are generation numbers used when sleeping
waiting for the other end of the fifo to be opened.  The fields were not
explicitly initialized after allocation, but this was harmless.  To
avoid false positives from KMSAN, though, ensure that they get
initialized to zero.

Reported by:	KMSAN
MFC after:	2 weeks
Sponsored by:	The FreeBSD Foundation
2021-07-13 17:45:49 -04:00
Rick Macklem
1e0a518d65 nfscl: Add a Linux compatible "nconnect" mount option
Linux has had an "nconnect" NFS mount option for some time.
It specifies that N (up to 16) TCP connections are to be created for a mount,
instead of just one TCP connection.

A discussion on freebsd-net@ indicated that this could improve
client<-->server network bandwidth, if either the client or server
have one of the following:
- multiple network ports aggregated together with lagg/lacp.
- a fast NIC that is using multiple queues
It does result in using more IP port#s and might increase server
peak load for a client.

One difference from the Linux implementation is that this implementation
uses the first TCP connection for all RPCs composed of small messages
and uses the additional TCP connections for RPCs that normally have
large messages (Read/Readdir/Write).  The Linux implementation spreads
all RPCs across all TCP connections in a round robin fashion, whereas
this implementation spreads Read/Readdir/Write across the additional
TCP connections in a round robin fashion.
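
The connection selection described above can be sketched as follows. This is a minimal illustration, not the client's actual code: the rpc_kind enum, struct conn_picker and both functions are hypothetical names, and connection 0 stands for the first TCP connection.

```c
#include <assert.h>

/* Hypothetical RPC classes; a real client would key off the NFS procedure. */
enum rpc_kind { RPC_SMALL, RPC_READ, RPC_READDIR, RPC_WRITE };

struct conn_picker {
	int nconnect;	/* total TCP connections, 1..16 */
	int next;	/* next additional connection, 1..nconnect-1 */
};

static void
conn_picker_init(struct conn_picker *cp, int nconnect)
{
	assert(nconnect >= 1 && nconnect <= 16);
	cp->nconnect = nconnect;
	cp->next = 1;
}

/* Return the index of the connection to use for this RPC. */
static int
conn_pick(struct conn_picker *cp, enum rpc_kind kind)
{
	int idx;

	/* Small RPCs, or a mount with one connection, use connection 0. */
	if (kind == RPC_SMALL || cp->nconnect == 1)
		return (0);
	idx = cp->next;
	/* Advance round robin over connections 1..nconnect-1. */
	cp->next = (cp->next % (cp->nconnect - 1)) + 1;
	return (idx);
}
```

With nconnect set to 3, successive Read/Readdir/Write RPCs alternate between connections 1 and 2, while small RPCs stay on connection 0.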

Reviewed by:	markj
MFC after:	2 weeks
Differential Revision:	https://reviews.freebsd.org/D30970
2021-07-08 17:39:04 -07:00
Rick Macklem
c5f4772c66 nfscl: Improve "Consider increasing kern.ipc.maxsockbuf" message
When the setting of kern.ipc.maxsockbuf is less than what is
desired for I/O based on vfs.maxbcachebuf and vfs.nfs.bufpackets,
a console message of "Consider increasing kern.ipc.maxsockbuf"
is printed.

This patch modifies the message to provide a suggested value
for kern.ipc.maxsockbuf.
Note that the setting is only needed when the NFS rsize/wsize
is set to vfs.maxbcachebuf.

While here, make nfs_bufpackets global, so that it can be used
by a future patch that adds a sysctl to set the NFS server's
maximum I/O size.  Also, remove "sizeof(u_int32_t)" from the maximum
packet length, since NFS_MAXXDR is already an "overestimate"
of the actual length.

MFC after:	2 weeks
2021-06-30 15:15:41 -07:00
Jason A. Harmening
372691a7ae unionfs: release parent vnodes in deferred context
Each unionfs node holds a reference to its parent directory vnode.
A single open file reference can therefore end up keeping an
arbitrarily deep vnode hierarchy in place.  When that reference is
released, the resulting VOP_RECLAIM call chain can then exhaust the
kernel stack.

This is easily reproducible by running the unionfs.sh stress2 test.
Fix it by deferring recursive unionfs vnode release to taskqueue
context.
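
The general idea of replacing a recursive release chain with a deferred work list can be sketched in userland as below. This is an assumption-laden illustration: struct node, the list head and both functions are hypothetical, a single global list stands in for the taskqueue, and each node is assumed to be queued at most once at a time.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for a unionfs node that references its parent. */
struct node {
	struct node *parent;
	int refs;
	int freed;		/* set instead of freeing, for inspection */
	struct node *defer_next;
};

/* Pending releases; a kernel would drain this from a taskqueue thread. */
static struct node *deferred_head;

static void
node_release_deferred(struct node *n)
{
	n->defer_next = deferred_head;
	deferred_head = n;
}

/* Drain the queue with a loop, so stack depth stays constant. */
static int
drain_deferred(void)
{
	int nfreed = 0;

	while (deferred_head != NULL) {
		struct node *n = deferred_head;

		deferred_head = n->defer_next;
		assert(n->refs > 0);
		if (--n->refs > 0)
			continue;
		n->freed = 1;
		nfreed++;
		/* Queue the parent instead of recursing into it. */
		if (n->parent != NULL)
			node_release_deferred(n->parent);
	}
	return (nfreed);
}
```

However deep the hierarchy is, the drain loop uses one stack frame, which is the property the deferred release is after.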

PR: 238883
Reviewed By:	kib (earlier version), markj
Differential Revision: https://reviews.freebsd.org/D30748
2021-06-29 06:02:01 -07:00
Alan Somers
18b19f8c6e fusefs: correctly set lock owner during FUSE_SETLK
During FUSE_SETLK, the owner field should uniquely identify the calling
process.  The fusefs module now sets it to the process's pid.
Previously, it expected the calling process to set it directly, which
was wrong.

libfuse also apparently expects the owner field to be set during
FUSE_GETLK, though I'm not sure why.

PR:		256005
Reported by:	Agata <chogata@moosefs.pro>
MFC after:	2 weeks
Reviewed by:	pfg
Differential Revision: https://reviews.freebsd.org/D30622
2021-06-25 20:40:08 -06:00
Rick Macklem
a145cf3f73 nfscl: Change the default minor version for NFSv4 mounts
When NFSv4.1 support was added to the client, the implementation was
still experimental and, as such, the default minor version was set to 0.
Since the NFSv4.1 client implementation is now believed to be solid
and the NFSv4.1/4.2 protocol is significantly better than NFSv4.0,
I believe that NFSv4.1/4.2 should be used where possible.

This patch changes the default minor version for NFSv4 to be the highest
minor version supported by the NFSv4 server. If a specific minor version
is desired, the "minorversion" mount option can be used to override
this default.  This is compatible with the Linux NFSv4 client behaviour.
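
A minimal sketch of "use the highest minor version the server supports unless one is requested". The bitmask of supported versions, the MINOR_* constants and pick_minor_version() are hypothetical; the real client discovers support by attempting the mount protocol exchange, not by consulting a mask.

```c
#include <assert.h>

#define MINOR_MAX	2	/* NFSv4.2 */
#define MINOR_DEFAULT	(-1)	/* no "minorversion" option given */

/*
 * supported: bit N set means the server accepts NFSv4.N (assumed input).
 * requested: MINOR_DEFAULT, or a specific minor version.
 * Returns the minor version to use, or -1 if none is usable.
 */
static int
pick_minor_version(unsigned supported, int requested)
{
	int v;

	assert(requested >= MINOR_DEFAULT && requested <= MINOR_MAX);
	/* An explicit request overrides the default and is not downgraded. */
	if (requested != MINOR_DEFAULT)
		return ((supported & (1u << requested)) ? requested : -1);
	/* Default: try the highest version first. */
	for (v = MINOR_MAX; v >= 0; v--)
		if (supported & (1u << v))
			return (v);
	return (-1);
}
```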

This was discussed on freebsd-current@ in mid-May 2021 under
the subject "changing the default NFSv4 minor version" and
the consensus seemed to be support for this change.
It also appeared that changing this for FreeBSD 13.1 was
not considered a POLA violation, so long as UPDATING
and RELNOTES entries were made for it.

MFC after:	2 weeks
2021-06-24 18:52:23 -07:00
Konstantin Belousov
190110f2eb unionfs: do not use bare struct componentname
Allocate nameidata on stack and NDPREINIT() it, for compatibility with
assumptions from other filesystems' lookup code.

Reviewed by:	mckusick
Discussed with:	markj
Tested by:	pho
Sponsored by:	The FreeBSD Foundation
MFC after:	2 weeks
Differential revision:	https://reviews.freebsd.org/D30041
2021-06-23 23:46:15 +03:00
Alan Somers
5403f2c163 fusefs: ensure that FUSE ops' headers' unique values are actually unique
Every FUSE operation has a unique value in its header.  As the name
implies, these values are supposed to be unique among all outstanding
operations.  And since FUSE_INTERRUPT is asynchronous and racy, it is
desirable that the unique values be unique among all operations that are
"close in time".

Ensure that they are actually unique by incrementing them whenever we
reuse a fuse_dispatcher object, for example during fsync, write, and
listextattr.

PR:		244686
MFC after:	2 weeks
Reviewed by:	pfg
Differential Revision: https://reviews.freebsd.org/D30810
2021-06-19 14:45:29 -06:00
Alan Somers
b97c7abc1a fusefs: delete dead code
It was always dead, accidentally included in SVN r345876.

MFC after:	2 weeks
Reviewed by:	pfg
2021-06-19 14:45:04 -06:00
gAlfonso-bit
9b876fbd50 Simplify fuse_device_filt_write
It always returns 1, so why bother having a variable.

MFC after:	2 weeks
MFC with:	7b8622fa22
Pull Request:	https://github.com/freebsd/freebsd-src/pull/478
2021-06-16 15:54:24 -06:00
Alan Somers
7b8622fa22 fusefs: support EVFILT_WRITE on /dev/fuse
/dev/fuse is always ready for writing, so it's kind of dumb to poll it.
But some applications do it anyway.  Better to return ready than EINVAL.

MFC after:	2 weeks
Reviewed by:	emaste, pfg
Differential Revision: https://reviews.freebsd.org/D30784
2021-06-16 13:34:14 -06:00
Alan Somers
0b9a5c6fa1 fusefs: improve warnings about buggy FUSE servers
The fusefs driver will print warning messages about FUSE servers that
commit protocol violations.  Previously it would print those warnings on
every violation, but that could spam the console.  Now it will print
each warning no more than once per lifetime of the mount.  There is also
now a dtrace probe for each violation.
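
Limiting each warning to once per mount can be done with a per-mount bitmask, roughly as sketched here. The flag names, struct mount_data and warn_once() are hypothetical and do not reflect the driver's actual identifiers.

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>

/* Hypothetical violation classes, one bit each. */
#define WARN_BAD_ATTR	(1u << 0)
#define WARN_BAD_SIZE	(1u << 1)

struct mount_data {
	uint32_t warned;	/* bits for warnings already printed */
};

/* Print msg only the first time this flag is seen; return 1 if printed. */
static int
warn_once(struct mount_data *md, uint32_t flag, const char *msg)
{
	/* Exactly one bit must be set. */
	assert(flag != 0 && (flag & (flag - 1)) == 0);
	if (md->warned & flag)
		return (0);
	md->warned |= flag;
	fprintf(stderr, "WARNING: %s\n", msg);
	return (1);
}
```

Because the mask lives in the per-mount structure, a remount starts with a clean slate and each warning can appear again once.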

MFC after:	2 weeks
Sponsored by:	Axcient
Reviewed by:	emaste, pfg
Differential Revision: https://reviews.freebsd.org/D30780
2021-06-16 13:31:31 -06:00
Rick Macklem
aed98fa5ac nfscl: Make NFSv4.0 client acquisition NFSv4.1/4.2 compatible
When the NFSv4.0 client was implemented, acquisition of a clientid
via SetClientID/SetClientIDConfirm was done upon the first Open,
since that was when it was needed.  NFSv4.1/4.2 acquires the clientid
during mount (via ExchangeID/CreateSession), since the associated
session is required during mount.

This patch modifies the NFSv4.0 mount so that it acquires the
clientid during mount.  This simplifies the code and makes it
easy to implement "find the highest minor version supported by
the NFSv4 server", which will be done for the default minorversion
in a future commit.
The "start_renewthread" argument for nfscl_getcl() is replaced
by "tryminvers", which will be used by the aforementioned
future commit.

MFC after:	2 weeks
2021-06-15 17:48:51 -07:00
Alan Somers
d63e6bc256 fusefs: delete dead code
Delete two fields in the per-mountpoint struct that have never been
used.

MFC after:	2 weeks
Sponsored by:	Axcient
2021-06-15 13:34:01 -06:00
Rick Macklem
e1a907a25c krpc: Acquire ref count of CLIENT for backchannel use
Michael Dexter <editor@callfortesting.org> reported
a crash in FreeNAS, where the first argument to
clnt_bck_svccall() was no longer valid.
This argument is a pointer to the callback CLIENT
structure, which is free'd when the associated
NFSv4 ClientID is free'd.

This appears to have occurred because a callback
reply was still in the socket receive queue when
the CLIENT structure was free'd.

This patch acquires a reference count on the CLIENT
that is not CLNT_RELEASE()'d until the socket structure
is destroyed. This should guarantee that the CLIENT
structure is still valid when clnt_bck_svccall() is called.
It also adds a check for closed or closing to
clnt_bck_svccall() so that it will not process the callback
RPC reply message after the ClientID is free'd.
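
The lifetime rule described above, sketched with a plain counter. struct client and the three functions are hypothetical stand-ins for the CLIENT structure, CLNT_ACQUIRE()/CLNT_RELEASE() and clnt_bck_svccall(); a kernel would also need locking or atomics, which are omitted here.

```c
#include <assert.h>
#include <stdbool.h>

struct client {
	int refs;
	bool closed;	/* set when the owning ClientID is torn down */
	bool freed;	/* stands in for the memory being freed */
};

static void
client_acquire(struct client *cl)
{
	assert(!cl->freed);
	cl->refs++;
}

static void
client_release(struct client *cl)
{
	assert(cl->refs > 0);
	if (--cl->refs == 0)
		cl->freed = true;
}

/* Process a callback reply; skip it if the client is closed or closing. */
static bool
handle_callback_reply(struct client *cl)
{
	assert(!cl->freed);	/* guaranteed by the socket's reference */
	if (cl->closed)
		return (false);
	return (true);
}
```

The socket holds its own reference until it is destroyed, so a reply still sitting in the receive queue finds a valid (though closed) structure.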

Comments by:	mav
MFC after:	2 weeks
Differential Revision:	https://reviews.freebsd.org/D30153
2021-06-11 16:57:14 -07:00
Rick Macklem
5e5ca4c8fc nfscl: Add a "has acquired a delegation" flag for delegations
A problem was reported via email, where a large (130000+) accumulation
of NFSv4 opens on an NFSv4 mount caused significant lock contention
on the mutex used to protect the client mount's open/lock state.
Although the root cause for the accumulation of opens was not
resolved, it is obvious that the NFSv4 client is not designed to
handle 100000+ opens efficiently.

For a common case where delegations are not being issued by the
NFSv4 server, the code acquires the mutex lock for open/lock state,
finds the delegation list empty and just unlocks the mutex and returns.
This patch adds an NFS mount point flag that is set when a delegation
is issued for the mount.  Then the patched code checks for this flag
before acquiring the open/lock mutex, avoiding the need to acquire
the lock for the case where delegations are not being issued by the
NFSv4 server.
This change appears to be performance neutral for a small number
of opens, but should reduce lock contention for a large number of opens
for the common case where server is not issuing delegations.
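
The fast path can be illustrated as below. Everything here is hypothetical: the lock is simulated by a counter so the number of acquisitions can be observed, and the unlocked read of the flag is shown without the memory-ordering care a real kernel would need.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical per-mount state; the lock is simulated by a counter. */
struct mount_state {
	bool has_deleg;		/* set once a delegation has been issued */
	bool locked;
	int lock_count;		/* how many times the lock was taken */
	int ndeleg;		/* number of delegations on the list */
};

static void
state_lock(struct mount_state *ms)
{
	assert(!ms->locked);
	ms->locked = true;
	ms->lock_count++;
}

static void
state_unlock(struct mount_state *ms)
{
	assert(ms->locked);
	ms->locked = false;
}

static void
deleg_add(struct mount_state *ms)
{
	state_lock(ms);
	ms->ndeleg++;
	ms->has_deleg = true;
	state_unlock(ms);
}

/* Return the number of delegations, skipping the lock when none exist. */
static int
deleg_count(struct mount_state *ms)
{
	int n;

	if (!ms->has_deleg)
		return (0);
	state_lock(ms);
	n = ms->ndeleg;
	state_unlock(ms);
	return (n);
}
```

The flag is only ever set, never cleared, in this sketch, so a stale read can at worst cause one unnecessary lock acquisition.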

This commit should not affect the high level semantics of delegation
handling.

MFC after:      2 weeks
2021-06-09 08:00:43 -07:00
Rick Macklem
03c81af249 nfscl: Fix generation of va_fsid for a tree of NFSv4 server file systems
Pre-r318997 the code looked like:
if (vp->v_mount->mnt_stat.f_fsid.val[0] != (uint32_t)np->n_vattr.na_filesid[0])
         vap->va_fsid = (uint32_t)np->n_vattr.na_filesid[0];
This assignment was lost in r318997 and, as such, NFSv4 mounts
of servers with trees of file systems are broken, due to duplicate
fileno values for the same st_dev/va_fsid.

Although I could have re-introduced the assignment, since the value of
na_filesid[0] is not guaranteed to be unique across the server file systems,
I felt it was better to always do the hash for na_filesid[0,1].
Since dev_t (st_dev/va_fsid) is now 64bits, I switched to a 64bit hash.
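
As an illustration of hashing both na_filesid words into a 64-bit value, the sketch below uses 64-bit FNV-1a. The choice of FNV-1a and the names fnv1a_64() and fsid_hash() are assumptions made for this example; the commit text does not say which hash function is used.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define FNV64_OFFSET	0xcbf29ce484222325ULL
#define FNV64_PRIME	0x100000001b3ULL

/* 64-bit FNV-1a over an arbitrary buffer. */
static uint64_t
fnv1a_64(const void *buf, size_t len)
{
	const unsigned char *p = buf;
	uint64_t h = FNV64_OFFSET;
	size_t i;

	for (i = 0; i < len; i++) {
		h ^= p[i];
		h *= FNV64_PRIME;
	}
	return (h);
}

/* Hash both filesid words, so neither needs to be unique on its own. */
static uint64_t
fsid_hash(const uint64_t filesid[2])
{
	assert(filesid != NULL);
	return (fnv1a_64(filesid, 2 * sizeof(filesid[0])));
}
```

Two different filesid pairs can still map to the same value, which is the conflict the next paragraph discusses.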

There is a slight chance of a hash conflict where 2 different na_filesid
values map to same va_fsid, which will be documented in the BUGS
section of the man page for mount_nfs(8).  Using a table to keep track
of mappings to catch conflicts would not easily scale to 10,000+ server file
systems and, when the conflict occurs, it only results in fts(3) reporting
a "directory cycle" under certain circumstances.

Reviewed by:	kib
MFC after:	2 weeks
Differential Revision:	https://reviews.freebsd.org/D30660
2021-06-07 13:48:25 -07:00
Jason A. Harmening
59409cb90f Add a generic mechanism for preventing forced unmount
This is aimed at preventing stacked filesystems like nullfs and unionfs
from "losing" their lower mounts due to forced unmount.  Otherwise,
VFS operations that are passed through to the lower filesystem(s) may
crash or otherwise cause unpredictable behavior.

Introduce two new functions: vfs_pin_from_vp() and vfs_unpin(),
which are intended to be called on the lower mount(s) when the stacked
filesystem is mounted and unmounted, respectively.
Much as registration in the mnt_uppers list previously did, pinning
will prevent even forced unmount of the lower FS and will allow the
stacked FS to freely operate on the lower mount either by direct
use of the struct mount* or indirect use through a properly-referenced
vnode's v_mount field.

vfs_pin_from_vp() is modeled after vfs_ref_from_vp() in that it uses
the mount interlock coupled with re-checking vp->v_mount to ensure
that it will fail in the face of a pending unmount request, even if
the concurrent unmount fully completes.

Adopt these new functions in both nullfs and unionfs.

Reviewed By:	kib, markj
Differential Revision: https://reviews.freebsd.org/D30401
2021-06-05 18:20:36 -07:00
Rick Macklem
a5df139ec6 nfsd: Fix when NFSERR_WRONGSEC may be replied to NFSv4 clients
Commit d224f05fcf pre-parsed the next operation number for
the put file handle operations.  This patch uses this next
operation number, plus the type of the file handle being set by
the put file handle operation, to implement the rules in RFC5661
Sec. 2.6 with respect to replying NFSERR_WRONGSEC.

This patch also adds a check to see if NFSERR_WRONGSEC should be
replied when about to perform Lookup, Lookupp or Open with a file
name component, so that the NFSERR_WRONGSEC reply is done for
these operations, as required by RFC5661 Sec. 2.6.

This patch does not have any practical effect for the FreeBSD NFSv4
client and I believe that the same is true for the Linux client,
since NFSERR_WRONGSEC is considered a fatal error at this time.

MFC after:	2 weeks
2021-06-05 16:53:07 -07:00
Rick Macklem
56e9d8e38e nfsd: Fix NFSv4.1/4.2 Secinfo_no_name when security flavors empty
Commit 947bd2479b added support for the Secinfo_no_name operation.
When a non-exported file system is being traversed, the list of
security flavors is empty.  It turns out that the Linux client
mount attempt fails when the security flavors list in the
Secinfo_no_name reply is empty.

This patch modifies Secinfo/Secinfo_no_name so that it replies
with all four security flavors when the list is empty.
This fixes Linux NFSv4.1/4.2 mounts when the file system at
the NFSv4 root (as specified on a V4: exports(5) line) is
not exported.

MFC after:	2 weeks
2021-06-04 20:31:20 -07:00
Rick Macklem
d224f05fcf nfsd: Pre-parse the next NFSv4 operation number for put FH operations
RFC5661 Sec. 2.6 specifies when a NFSERR_WRONGSEC error reply can be done.
For the four operations PutFH, PutrootFH, PutpublicFH and RestoreFH,
NFSERR_WRONGSEC can or cannot be replied, depending upon what operation
follows one of these operations in the compound.

This patch modifies nfsrvd_compound() so that it parses the next operation
number before executing any of the above four operations, storing it in
"nextop".

A future commit will implement use of "nextop" to decide if NFSERR_WRONGSEC
can be replied for the above four operations.
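
Pre-parsing amounts to peeking at the next 32-bit XDR operation number without consuming it. The sketch below works on a flat byte buffer and assumes the caller already knows the offset of the next operation; the real server parses mbuf chains, and peek_nextop(), xdr_get32() and OP_NONE are hypothetical names.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define OP_NONE	(-1)	/* hypothetical marker: no operation follows */

/* Decode one big-endian (XDR) 32-bit value. */
static uint32_t
xdr_get32(const unsigned char *p)
{
	return ((uint32_t)p[0] << 24 | (uint32_t)p[1] << 16 |
	    (uint32_t)p[2] << 8 | (uint32_t)p[3]);
}

/*
 * Return the operation number stored at byte offset off, leaving the
 * buffer position untouched, or OP_NONE if fewer than 4 bytes remain.
 */
static int64_t
peek_nextop(const unsigned char *buf, size_t len, size_t off)
{
	assert(buf != NULL);
	if (off > len || len - off < 4)
		return (OP_NONE);
	return (xdr_get32(buf + off));
}
```

For a compound holding PutrootFH (24) followed by Lookup (15), peeking at offset 4 while handling PutrootFH yields 15.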

This commit should not change the semantics of performing the compound RPC.

MFC after:	2 weeks
2021-06-03 20:48:26 -07:00
Konstantin Belousov
f9b1e711f0 fdescfs: add an option to return underlying file vnode on lookup
The 'nodup' option forces fdescfs to return real vnode behind file
descriptor instead of the fdescfs fd vnode, on lookup. The end result
is that e.g. stat("/dev/fd/3") returns the stat data for the underlying
vnode, if any.  Similarly, fchdir(2) works in the expected way.

For open(2), if applied over a file descriptor opened with O_PATH, it
effectively re-opens that vnode into a normal file descriptor which has the
specified access mode, assuming the current vnode permissions allow it.

If the file descriptor does not reference vnode, the behavior is unchanged.

This is done by a mount option, because permission check on open(2) breaks
established fdescfs open semantic of dup(2)-ing the descriptor.  So it
is not suitable for /dev/fd mount.

Tested by:	Andrew Walker <awalker@ixsystems.com>
Reviewed by:	markj
Sponsored by:	The FreeBSD Foundation
MFC after:	1 week
Differential revision:	https://reviews.freebsd.org/D30140
2021-06-04 03:30:12 +03:00
Rick Macklem
984c71f903 nfsd: Fix the failure return for non-fh NFSv4 operations
Without this patch, nfsd_checkrootexp() returns failure
and then the NFSv4 operation would reply NFSERR_WRONGSEC.
RFC5661 Sec. 2.6 only allows a few NFSv4 operations, none
of which call nfsd_checkrootexp(), to return NFSERR_WRONGSEC.
This patch modifies nfsd_checkrootexp() to return the
error instead of a boolean and sets the returned error to an RPC
layer AUTH_ERR, as discussed on nfsv4@ietf.org.
The patch also fixes nfsd_errmap() so that the pseudo
error NFSERR_AUTHERR is handled correctly such that an RPC layer
AUTH_ERR is replied to the NFSv4 client.

The two new "enum auth_stat" values have not yet been assigned
by IANA, but are the expected next two values.

The effect on extant NFSv4 clients of this change appears
limited to reporting a different failure error when a
mount that does not use adequate security is attempted.

MFC after:	2 weeks
2021-06-02 15:28:07 -07:00
Rick Macklem
1d4afcaca2 nfsd: Delete extraneous NFSv4 root checks
There are several NFSv4.1/4.2 server operation functions which
have unneeded checks for the NFSv4 root being set up.
The checks are not needed because the operations always follow
a Sequence operation, which performs the check.

This patch deletes these checks, simplifying the code so
that a future patch that fixes the checks to conform with
RFC5661 Sec. 2.6 will be less extensive.

MFC after:	2 weeks
2021-05-31 19:41:17 -07:00
Mateusz Guzik
f4aa64528e tmpfs: save on relocking the allnode lock in tmpfs_free_node_locked
2021-05-31 23:21:15 +00:00
Mateusz Guzik
68c2544264 nfs: even up value returned by nfsrv_parsename with copyinstr
Reported by:	dim
Reviewed by:	rmacklem
2021-05-31 16:32:04 +00:00
Rick Macklem
947bd2479b nfsd: Add support for the NFSv4.1/4.2 Secinfo_no_name operation
The Linux client is now attempting to use the Secinfo_no_name
operation for NFSv4.1/4.2 mounts.  Although it does not seem to
mind the NFSERR_NOTSUPP reply, adding support for it seems
reasonable.

I also noticed that "savflag" needed to be 64bits in
nfsrvd_secinfo() since nd_flag is now 64bits, so I changed
the declaration of it there.  I also added code to set "vp" NULL
after performing Secinfo/Secinfo_no_name, since these
operations consume the current FH, which is represented
by "vp" in nfsrvd_compound().

Fixing when the server replies NFSERR_WRONGSEC so that
it conforms to RFC5661 Sec. 2.6 still needs to be done
in a future commit.

MFC after:	2 weeks
2021-05-30 17:52:43 -07:00
Jason A. Harmening
a4b07a2701 VFS_QUOTACTL(9): allow implementation to indicate busy state changes
Instead of requiring all implementations of vfs_quotactl to unbusy
the mount for Q_QUOTAON and Q_QUOTAOFF, add an "mp_busy" in/out param
to VFS_QUOTACTL(9).  The implementation may then indicate to the caller
whether it needed to unbusy the mount.
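
The in/out parameter pattern can be sketched as follows. The types, command constants and function names are hypothetical userland stand-ins, not the VFS_QUOTACTL(9) signature itself.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical command numbers and mount stand-in. */
enum { CMD_QUOTAON, CMD_QUOTAOFF, CMD_GETQUOTA };

struct fake_mount {
	int busy;	/* busy reference count */
};

static void
mount_unbusy(struct fake_mount *mp)
{
	assert(mp->busy > 0);
	mp->busy--;
}

/* Filesystem side: clears *mp_busy only if it unbusied the mount. */
static int
fs_quotactl(struct fake_mount *mp, int cmd, bool *mp_busy)
{
	assert(*mp_busy);
	if (cmd == CMD_QUOTAON || cmd == CMD_QUOTAOFF) {
		mount_unbusy(mp);
		*mp_busy = false;
	}
	return (0);
}

/* Caller side: unbusy afterwards only if the implementation did not. */
static int
do_quotactl(struct fake_mount *mp, int cmd)
{
	bool mp_busy = true;
	int error;

	mp->busy++;
	error = fs_quotactl(mp, cmd, &mp_busy);
	if (mp_busy)
		mount_unbusy(mp);
	return (error);
}
```

Either way the busy count returns to its starting value, and implementations that never need to unbusy can ignore the parameter.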

Also, add stdbool.h to libprocstat modules which #define _KERNEL
before including sys/mount.h.  Otherwise they'll pull in sys/types.h
before defining _KERNEL and therefore won't have the bool definition
they need for mp_busy.

Reviewed By:	kib, markj
Differential Revision: https://reviews.freebsd.org/D30556
2021-05-30 14:53:47 -07:00
Mateusz Guzik
284cf3f18b ext2: add missing uio_td initialization to ext2_htree_append_block
Reported by:	pho
2021-05-30 17:19:31 +00:00
Jason A. Harmening
271fcf1c28 Revert commits 6d3e78ad6c and 54256e7954
Parts of libprocstat like to pretend they're kernel components for the
sake of including mount.h, and including sys/types.h in the _KERNEL
case doesn't fix the build for some reason.  Revert both the
VFS_QUOTACTL() change and the follow-up "fix" for now.
2021-05-29 17:48:02 -07:00
Mateusz Guzik
331a7601c9 tmpfs: save on common case relocking in tmpfs_reclaim
2021-05-29 22:04:10 +00:00
Mateusz Guzik
439d942b9e tmpfs: drop a redundant NULL check in tmpfs_alloc_vp
2021-05-29 22:04:10 +00:00
Mateusz Guzik
7fbeaf33b8 tmpfs: drop useless parent locking from tmpfs_dir_getdotdotdent
The id field is immutable until the node gets freed.
2021-05-29 22:04:10 +00:00
Jason A. Harmening
6d3e78ad6c VFS_QUOTACTL(9): allow implementation to indicate busy state changes
Instead of requiring all implementations of vfs_quotactl to unbusy
the mount for Q_QUOTAON and Q_QUOTAOFF, add an "mp_busy" in/out param
to VFS_QUOTACTL(9).  The implementation may then indicate to the caller
whether it needed to unbusy the mount.

Reviewed By:	kib, markj
Differential Revision: https://reviews.freebsd.org/D30218
2021-05-29 14:05:39 -07:00
Rick Macklem
96b40b8967 nfscl: Use hash lists to improve expected search performance for opens
A problem was reported via email, where a large (130000+) accumulation
of NFSv4 opens on an NFSv4 mount caused significant lock contention
on the mutex used to protect the client mount's open/lock state.
Although the root cause for the accumulation of opens was not
resolved, it is obvious that the NFSv4 client is not designed to
handle 100000+ opens efficiently.  When searching for an open,
usually for a match by file handle, a linear search of all opens
is done.

Commit 3f7e14ad93 added a hash table of lists hashed on file handle
for the opens.  This patch uses the hash lists for searching for
a matching open based on file handle instead of an exhaustive
linear search of all opens.
This change appears to be performance neutral for a small number
of opens, but should improve expected performance for a large
number of opens.

This commit should not affect the high level semantics of open
handling.

MFC after:	2 weeks
2021-05-27 19:08:36 -07:00
Rick Macklem
724072ab1d nfscl: Use hash lists to improve expected search performance for opens
A problem was reported via email, where a large (130000+) accumulation
of NFSv4 opens on an NFSv4 mount caused significant lock contention
on the mutex used to protect the client mount's open/lock state.
Although the root cause for the accumulation of opens was not
resolved, it is obvious that the NFSv4 client is not designed to
handle 100000+ opens efficiently.  When searching for an open,
usually for a match by file handle, a linear search of all opens
is done.

Commit 3f7e14ad93 added a hash table of lists hashed on file handle
for the opens.  This patch uses the hash lists for searching for
a matching open based on file handle instead of an exhaustive
linear search of all opens.
This change appears to be performance neutral for a small number
of opens, but should improve expected performance for a large
number of opens.  This patch also moves any found match to the front
of the hash list, to try and maintain the hash lists in recently
used ordering (least recently used at the end of the list).
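
A small sketch of a hash table of lists keyed on file handle, with a found entry moved to the front of its list. The table size, the hash function and all names are hypothetical and chosen for brevity; the real client uses its own list macros and hash.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define OPEN_HASH_SIZE	8	/* hypothetical table size */
#define FH_MAX		16

struct open_ent {
	unsigned char fh[FH_MAX];
	size_t fhlen;
	struct open_ent *next;
};

static struct open_ent *open_hash[OPEN_HASH_SIZE];

static unsigned
fh_hash(const unsigned char *fh, size_t len)
{
	unsigned h = 0;
	size_t i;

	for (i = 0; i < len; i++)
		h = h * 31 + fh[i];
	return (h % OPEN_HASH_SIZE);
}

static void
open_insert(struct open_ent *op)
{
	unsigned b = fh_hash(op->fh, op->fhlen);

	assert(op->fhlen <= FH_MAX);
	op->next = open_hash[b];
	open_hash[b] = op;
}

/* Search one hash list only; move a match to the front of that list. */
static struct open_ent *
open_find(const unsigned char *fh, size_t len)
{
	unsigned b = fh_hash(fh, len);
	struct open_ent **pp, *op;

	for (pp = &open_hash[b]; (op = *pp) != NULL; pp = &op->next) {
		if (op->fhlen == len && memcmp(op->fh, fh, len) == 0) {
			*pp = op->next;
			op->next = open_hash[b];
			open_hash[b] = op;
			return (op);
		}
	}
	return (NULL);
}
```

Only the entries that share a bucket are compared, and repeated lookups of the same file handle find it at the head of the list.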

This commit should not affect the high level semantics of open
handling.

MFC after:	2 weeks
2021-05-25 14:19:29 -07:00
Rick Macklem
3f7e14ad93 nfscl: Add hash lists for the NFSv4 opens
A problem was reported via email, where a large (130000+) accumulation
of NFSv4 opens on an NFSv4 mount caused significant lock contention
on the mutex used to protect the client mount's open/lock state.
Although the root cause for the accumulation of opens was not
resolved, it is obvious that the NFSv4 client is not designed to
handle 100000+ opens efficiently.  When searching for an open,
usually for a match by file handle, a linear search of all opens
is done.

This patch adds a table of hash lists for the opens, hashed on
file handle.  This table will be used by future commits to
search for an open based on file handle more efficiently.

MFC after:	2 weeks
2021-05-22 14:53:56 -07:00
Konstantin Belousov
f784da883f Move mnt_maxsymlinklen into appropriate fs mount data structures
Reviewed by:	mckusick
Tested by:	pho
Sponsored by:	The FreeBSD Foundation
MFC after:	1 week
X-MFC-Note:	struct mount layout
Differential revision:	https://reviews.freebsd.org/D30325
2021-05-22 15:16:09 +03:00
Konstantin Belousov
42881526d4 nullfs: dirty v_object must imply the need for inactivation
Otherwise pages are cleaned some time later when the lower fs decides
that it is time to do it.  This mostly manifests itself as delayed
mtime update, e.g. breaking make-like programs.

Reported by:	mav
Tested by:	mav, pho
Sponsored by:	The FreeBSD Foundation
MFC after:	1 week
2021-05-22 12:30:17 +03:00
Rick Macklem
d80a903a1c nfsd: Add support for CLAIM_DELEG_PREV_FH to the NFSv4.1/4.2 Open
Commit b3d4c70dc6 added support for CLAIM_DELEG_CUR_FH to Open.
While doing this, I noticed that CLAIM_DELEG_PREV_FH support
could be added the same way.  Although I am not aware of any extant
NFSv4.1/4.2 client that uses this claim type, it seems prudent to add
support for this variant of Open to the NFSv4.1/4.2 server.

This patch does not affect mounts from extant NFSv4.1/4.2 clients,
as far as I know.

MFC after:	2 weeks
2021-05-20 18:37:40 -07:00
Rick Macklem
c28cb257dd nfscl: Fix NFSv4.1/4.2 mount recovery from an expired lease
The most difficult NFSv4 client recovery case happens when the
lease has expired on the server.  For NFSv4.0, the client will
receive a NFSERR_EXPIRED reply from the server to indicate this
has happened.
For NFSv4.1/4.2, most RPCs have a Sequence operation and, as such,
the client will receive a NFSERR_BADSESSION reply when the lease
has expired for these RPCs.  The client will then call nfscl_recover()
to handle the NFSERR_BADSESSION reply.  However, for the expired lease
case, the first reclaim Open will fail with NFSERR_NOGRACE.

This patch recognizes this case and calls nfscl_expireclient()
to handle the recovery from an expired lease.

This patch only affects NFSv4.1/4.2 mounts when the lease
expires on the server, due to a network partitioning that
exceeds the lease duration or similar.

MFC after:	2 weeks
2021-05-19 14:52:56 -07:00
Mateusz Guzik
4fe925b81e fdescfs: allow shared locking of root vnode
Eliminates fdescfs from lock profile when running poudriere.
2021-05-19 17:58:54 +00:00
Mateusz Guzik
43999a5cba pseudofs: use vget_prep + vget_finish instead of vget + the interlock
2021-05-19 17:58:42 +00:00
Rick Macklem
fc0dc94029 nfsd: Reduce the callback timeout to 800msec
Recent discussion on the nfsv4@ietf.org mailing list confirmed
that an NFSv4 server should reply to an RPC in less than 1 second.
If an NFSv4 RPC requires a delegation be recalled,
the server will attempt a CB_RECALL callback.
If the client is not responsive, the RPC reply will be delayed
until the callback times out.
Without this patch, the timeout is set to 4 seconds (set in
ticks, but used as seconds), resulting in the RPC reply taking over 4 seconds.
This patch redefines the constant as being in milliseconds and
implements that for a value of 800msec, to ensure the RPC
reply is sent in less than 1 second.
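
Keeping the unit in the constant's name and converting explicitly avoids the kind of ticks/seconds mix-up described above. The sketch below converts a millisecond timeout to a struct timeval; CB_TIMEOUT_MS and ms_to_timeval() are hypothetical names, not the server's identifiers.

```c
#include <assert.h>
#include <sys/time.h>

#define CB_TIMEOUT_MS	800	/* hypothetical name; value from the commit */

/* Convert a millisecond count to seconds plus microseconds. */
static struct timeval
ms_to_timeval(int ms)
{
	struct timeval tv;

	assert(ms >= 0);
	tv.tv_sec = ms / 1000;
	tv.tv_usec = (ms % 1000) * 1000;
	return (tv);
}
```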

This patch only affects mounts from clients when delegations
are enabled on the server and the client is unresponsive to callbacks.

MFC after:	2 weeks
2021-05-18 16:17:58 -07:00
Rick Macklem
b3d4c70dc6 nfsd: Add support for CLAIM_DELEG_CUR_FH to the NFSv4.1/4.2 Open
The Linux NFSv4.1/4.2 client now uses the CLAIM_DELEG_CUR_FH
variant of the Open operation when delegations are recalled and
the client has a local open of the file.  This patch adds
support for this variant of Open to the NFSv4.1/4.2 server.

This patch only affects mounts from Linux clients when delegations
are enabled on the server.

MFC after:	2 weeks
2021-05-18 15:53:54 -07:00
Rick Macklem
46269d66ed NFSv4 server: Re-establish the delegation recall timeout
Commit 7a606f280a allowed the server to do retries of CB_RECALL
callbacks every couple of seconds.  This was needed to allow the
Linux client to re-establish the back channel.
However, that commit broke the delegation timeout check, such that
it would just keep retrying CB_RECALLs.
If the client has crashed or been network partitioned from the
server, this continues until the client TCP reconnects to
the server and re-establishes the back channel.

This patch modifies the code such that it still times out the
delegation recall after some minutes, so that the server will
allow the conflicting client request once the delegation times out.

This patch only affects the NFSv4 server when delegations are
enabled and a NFSv4 client that holds a delegation has crashed
or been network partitioned from the server for at least several
minutes when a delegation needs to be recalled.

MFC after:	2 weeks
2021-05-16 16:40:01 -07:00
Mateusz Guzik
eec2e4ef7f tmpfs: reimplement the mtime scan to use the lazy list
Tested by:	pho
Reviewed by:	kib, markj
Differential Revision:	https://reviews.freebsd.org/D30065
2021-05-15 20:48:45 +00:00
Mateusz Guzik
128e25842e vm: add another pager private flag
Move OBJ_SHADOWLIST around to let pager flags be next to each other.

Reviewed by:	kib, markj
Differential Revision:	https://reviews.freebsd.org/D30258
2021-05-15 20:47:29 +00:00
Konstantin Belousov
28bc23ab92 tmpfs: dynamically register tmpfs pager
Remove OBJT_SWAP_TMPFS. Move tmpfs-specific swap pager bits into
tmpfs_subr.c.

There is no longer any code to directly support tmpfs in sys/vm; most
tmpfs knowledge is shared by the non-anon swap object type implementation.
The tmpfs-specific methods are provided by registered tmpfs pager, which
inherits from the swap pager.

Reviewed by:	markj
Tested by:	pho
Sponsored by:	The FreeBSD Foundation
MFC after:	1 week
Differential revision:	https://reviews.freebsd.org/D30168
2021-05-13 20:13:34 +03:00
Konstantin Belousov
8b99833ac2 procfs_map: switch to use vm_object_kvme_type
to get object type, and stop enumerating OBJT_XXX constants.  This also
properly provides a pointer to the vnode, if the object backs one.

Reviewed by:	markj
Tested by:	pho
Sponsored by:	The FreeBSD Foundation
MFC after:	1 week
Differential revision:	https://reviews.freebsd.org/D30168
2021-05-13 20:10:35 +03:00
Rick Macklem
cb07628d9e nfscl: Delete unneeded redundant MODULE_DEPEND() calls
There are two module declarations in the nfscl.ko module for "nfscl"
and "nfs".  Both of these declarations had MODULE_DEPEND() calls.
This patch deletes the MODULE_DEPEND() calls for "nfs" to avoid
confusion with respect to what modules this module is dependent upon.

The patch also adds comments explaining why there are two module
declarations within the module.

Reviewed by:	kib
Differential Revision:	https://reviews.freebsd.org/D30102
2021-05-10 17:34:29 -07:00
Fedor Uporov
2a984c2b49 Make encode/decode extra time functions inline.
Mentioned by:   pfg
MFC after:      2 weeks
2021-05-08 06:42:20 +03:00
Rick Macklem
dd02d9d605 nfscl: Add support for va_birthtime to NFSv4
There is a NFSv4 file attribute called TimeCreate
that can be used for va_birthtime.
r362175 added some support for use of TimeCreate.
This patch completes support of va_birthtime by adding
support for setting this attribute to the server.
It also enables the client to
acquire and set the attribute for a NFSv4
server that supports the attribute.

Reviewed by:	markj
MFC after:	2 weeks
Differential Revision:	https://reviews.freebsd.org/D30156
2021-05-07 17:30:56 -07:00
Konstantin Belousov
4b8365d752 Add OBJT_SWAP_TMPFS pager
This is OBJT_SWAP pager, specialized for tmpfs.  Right now, both swap pager
and generic vm code have to explicitly handle swap objects which are tmpfs
vnode v_object, in the special ways.  Replace (almost) all such places with
proper methods.

Since VM still needs a notion of the 'swap object', regardless of its
use, add yet another type-classification flag OBJ_SWAP. Set it in
vm_object_allocate() where other type-class flags are set.

This change almost completely eliminates the knowledge of tmpfs from VM,
and opens a way to make OBJT_SWAP_TMPFS loadable from tmpfs.ko.

Reviewed by:	markj
Tested by:	pho
Sponsored by:	The FreeBSD Foundation
MFC after:	1 week
Differential revision:	https://reviews.freebsd.org/D30070
2021-05-07 17:08:03 +03:00
Fedor Uporov
c40a160fd0 Make inode extra time fields updating logic closer to Linux.
Found using pjdfstest:
pjdfstest/tests/utimensat/09.t

Reviewed by:    pfg
MFC after:      2 weeks
Differential Revision:  https://reviews.freebsd.org/D29933
2021-05-07 10:46:55 +03:00
Fedor Uporov
b3f4665639 Invalidate inode extents cache on truncation.
The cache must be invalidated when inode space is removed,
to avoid the situation where the extents cache returns a nonexistent extent.

Reviewed by:    pfg
MFC after:      2 weeks
Differential Revision:  https://reviews.freebsd.org/D29931
2021-05-07 10:27:37 +03:00
Fedor Uporov
5679656e09 Improve extents verification logic.
It is possible to walk thru inode extents if EXT2FS_PRINT_EXTENTS
macro is defined. The extents headers magics and physical blocks
ranges are checked during extents walk.

Reviewed by:    pfg
MFC after:      2 weeks
Differential Revision:  https://reviews.freebsd.org/D29932
2021-05-07 10:27:28 +03:00
Fedor Uporov
1ed5f62d61 Add chr/blk devices support.
The dev field is placed into the inode structure.
The major/minor numbers conversion to/from linux compatible
format happens during on-disk inodes writing/reading.

Reviewed by:    pfg
MFC after:      2 weeks
Differential Revision:  https://reviews.freebsd.org/D29930
2021-05-07 10:08:31 +03:00
Fedor Uporov
1484574843 Fix inode birthtime updating logic.
The birthtime field of struct vattr was not checked
for VNOVAL in ext2_setattr(), which produced incorrect
inode birthtime values.

Found using pjdfstest:
    pjdfstest/tests/utimensat/03.t

Reviewed by:    pfg
MFC after:      2 weeks
Differential Revision:  https://reviews.freebsd.org/D29929
2021-05-07 10:08:20 +03:00
Mark Johnston
8bde6d15d1 nfsclient: Copy only initialized fields in nfs_getattr()
When loading attributes from the cache, the NFS client is careful to
copy only the fields that it initialized.  After fetching attributes
from the server, however, it would copy the entire vattr structure
initialized from the RPC response, so uninitialized stack bytes would
end up being copied to userspace.  In particular, va_birthtime (v2 and
v3) and va_gen (v3) had this problem.

Use a common subroutine to copy fields provided by the NFS client, and
ensure that we provide a dummy va_gen for the v3 case.

Reviewed by:	rmacklem
Reported by:	KMSAN
MFC after:	1 week
Sponsored by:	The FreeBSD Foundation
Differential Revision:	https://reviews.freebsd.org/D30090
2021-05-04 08:53:57 -04:00
Rick Macklem
0755df1eee nfscl: fix typo in a comment
MFC after:	2 weeks
2021-05-03 18:29:27 -07:00
Mark Johnston
243b324f96 devfs: Avoid comparison with an uninitialized var in devfs_fp_check()
devvn_refthread() will initialize *devp only if it succeeds, so check for
success before comparing with fp->f_data.  Other devvn_refthread()
callers are careful to do this.

Reported by:	KMSAN
Reviewed by:	kib
Sponsored by:	The FreeBSD Foundation
Differential Revision:	https://reviews.freebsd.org/D30068
2021-05-03 13:24:30 -04:00
Rick Macklem
f6fec55fe3 nfscl: add check for NULL clp and forced dismounts to nfscl_delegreturnvp()
Commit aad780464f added a function called nfscl_delegreturnvp()
to return delegations during the NFS VOP_RECLAIM().
The function erroneously assumed that nm_clp would
be non-NULL. It will be NULL for NFSV4.0 mounts until
a regular file is opened. It will also be NULL during
vflush() in nfs_unmount() for a forced dismount.

This patch adds a check for clp == NULL to fix this.

Also, since it makes no sense to call nfscl_delegreturnvp()
during a forced dismount, the patch adds a check for that
case and does not do the call during forced dismounts.

PR:	255436
Reported by:	ish@amail.plala.or.jp
MFC after:	2 weeks
2021-04-27 17:30:16 -07:00
Rick Macklem
f5ff282bc0 nfscl: fix the handling of NFSERR_DELAY for Open/LayoutGet RPCs
For a pNFS mount, the NFSv4.1/4.2 client uses compound RPCs that
have both Open and LayoutGet operations in them.
If the pNFS server were to reply NFSERR_DELAY for one of these
compounds, the retry after a delay cannot be handled by
newnfs_request(), since there is a reference held on the open
state for the Open operation in them.

Fix this by adding these RPCs to the "don't do delay here"
list in newnfs_request().

This patch is only needed if the mount is using pNFS (the "pnfs"
mount option) and probably only matters if the MDS server
is issuing delegations as well as pNFS layouts.

Found by code inspection.

MFC after:	2 weeks
2021-04-26 17:48:21 -07:00
Rick Macklem
8759773148 nfsd: fix the slot sequence# when a callback fails
Commit 4281bfec36 patched the server so that the
callback session slot would be free'd for reuse when
a callback attempt fails.
However, this can often result in the sequence# for
the session slot being advanced such that the client
end will reply NFSERR_SEQMISORDERED.

To avoid the NFSERR_SEQMISORDERED client reply,
this patch negates the sequence# advance for the
case where the callback has failed.
The common case is a failed back channel, where
the callback cannot be sent to the client, and
not advancing the sequence# is correct for this
case.  For the uncommon case where the client's
reply to the callback is lost, not advancing the
sequence# will indicate to the client that the
next callback is a retry and not a new callback.
But, since the FreeBSD server always sets "csa_cachethis"
false in the callback sequence operation, a retry
and a new callback should be handled the same way
by the client, so this should not matter.

Until you have this patch in your NFSv4.1/4.2 server,
you should consider avoiding the use of delegations.
Even with this patch, interoperation with the
Linux NFSv4.1/4.2 client in kernel versions prior
to 5.3 can result in frequent 15 second delays if
delegations are enabled.  This occurs because, for
kernels prior to 5.3, the Linux client does a TCP
reconnect every time it sees multiple concurrent
callbacks and then it takes 15 seconds to recover
the back channel after doing so.

MFC after:	2 weeks
2021-04-26 16:24:10 -07:00
Rick Macklem
aad780464f nfscl: return delegations in the NFS VOP_RECLAIM()
After a vnode is recycled it can no longer be
acquired via vfs_hash_get() and, as such,
a delegation for the vnode cannot be recalled.

In the unlikely event that a delegation still
exists when the vnode is being recycled, return
the delegation since it will no longer be
recallable.

Until you have this patch in your NFSv4 client,
you should consider avoiding the use of delegations.

MFC after:	2 weeks
2021-04-25 17:57:55 -07:00
Rick Macklem
02695ea890 nfscl: fix delegation recall when the file is not open
Without this patch, if a NFSv4 server recalled a
delegation when the file is not open, the renew
thread would block in the NFS VOP_INACTIVE()
trying to acquire the client state lock that it
already holds.

This patch fixes the problem by delaying the
vrele() call until after the client state
lock is released.

This bug has been in the NFSv4 client for
a long time, but since it only affects
delegation when recalled due to another
client opening the file, it got missed
during previous testing.

Until you have this patch in your client,
you should avoid the use of delegations.

MFC after:	2 weeks
2021-04-25 12:55:00 -07:00
Rick Macklem
4281bfec36 nfsd: fix session slot handling for failed callbacks
When the NFSv4.1/4.2 server does a callback to a client
on the back channel, it will use a session slot in the
back channel session. If the back channel has failed,
the callback will fail and, without this patch, the
session slot will not be released.
As more callbacks are attempted, all session slots
can become busy and then the nfsd thread gets stuck
waiting for a back channel session slot.

This patch frees the session slot upon callback
failure to avoid this problem.
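
The leak described above can be illustrated with a small user-space model.
This is only a sketch, not the nfsd code: the SlotTable class, do_callback()
and the send callable are hypothetical names invented for the example.  The
point is that the slot is released on every exit path, including failure.

```python
class SlotTable:
    """Toy model of a fixed-size back channel session slot table."""

    def __init__(self, nslots):
        self.busy = [False] * nslots

    def acquire(self):
        # Return the index of the first free slot, or None if all are busy.
        # (The real server would sleep here waiting for a free slot.)
        for i, in_use in enumerate(self.busy):
            if not in_use:
                self.busy[i] = True
                return i
        return None

    def release(self, slot):
        self.busy[slot] = False


def do_callback(table, send):
    """Run one callback; `send` is a hypothetical function doing the RPC."""
    slot = table.acquire()
    if slot is None:
        raise RuntimeError("no free back channel session slot")
    try:
        return send(slot)
    finally:
        # Release the slot whether the callback succeeded or failed.
        table.release(slot)
```

Without the release in the failure path, each failed callback in this model
would leave one slot marked busy until acquire() could no longer find a free
one.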

Without this patch, the problem can be avoided by leaving
delegations disabled in the NFS server.

MFC after:	2 weeks
2021-04-23 15:24:47 -07:00
Rick Macklem
78ffcb86d9 nfscommon: fix function name in comment
MFC after:	2 weeks
2021-04-19 20:09:46 -07:00
Rick Macklem
5a89498d19 nfsd: fix stripe size reply for the File Layout pNFS server
At a recent testing event I found out that I had misinterpreted
RFC5661 where it describes the stripe size in the File Layout's
nfl_util field. This patch fixes the pNFS File Layout server
so that it returns the correct value to the NFSv4.1/4.2 pNFS
enabled client.

This affects almost no one, since pNFS server configurations
are rare and the extant pNFS aware NFS clients seemed to
function correctly despite the erroneous stripe size.
It *might* be needed for correct behaviour if a recent
Linux client mounts a FreeBSD pNFS server configuration
that is using File Layout (non-mirrored configuration).

MFC after:	2 weeks
2021-04-19 17:54:54 -07:00
Rick Macklem
34256484af Revert "nfsd: cut the Linux NFSv4.1/4.2 some slack w.r.t. RFC5661"
This reverts commit 9edaceca81.

It turns out that the Linux client intentionally does an NFSv4.1
RPC with only a Sequence operation in it and with "seqid + 1"
for the slot.  This is used to re-synchronize the slot's seqid
and the client expects the NFS4ERR_SEQ_MISORDERED error reply.

As such, revert the patch, so that the server remains RFC5661
compliant.
2021-04-15 14:08:40 -07:00
Konstantin Belousov
5edf7227ec pseudofs: limit writes to 1M
Noted and reviewed by:	markj
Tested by:	pho
Sponsored by:	The FreeBSD Foundation
MFC after:	1 week
Differential revision:	https://reviews.freebsd.org/D29752
2021-04-14 10:23:21 +03:00
Konstantin Belousov
8cca7b7f28 nfs client: depend on xdr
Since 7763814fc9 nfsrpc_setclient() uses mem_alloc() that is macro
around malloc(M_RPC).  M_RPC is provided by xdr.ko.

Reviewed by:	rmacklem
Sponsored by:	Mellanox Technologies/NVidia Networking
MFC after:	1 week
2021-04-13 18:04:43 +03:00
Rick Macklem
9edaceca81 nfsd: cut the Linux NFSv4.1/4.2 some slack w.r.t. RFC5661
Recent testing of network partitioning a FreeBSD NFSv4.1
server from a Linux NFSv4.1 client identified problems
with both the FreeBSD server and Linux client.

Sometimes, after some Linux NFSv4.1/4.2 clients establish
a new TCP connection, they will advance the sequence number
for a session slot by 2 instead of 1.
RFC5661 specifies that a server should reply
NFS4ERR_SEQ_MISORDERED for this case.
This might result in a system call error in the client and
seems to disable future use of the slot by the client.
Since advancing the sequence number by 2 seems harmless,
allow this case if vfs.nfs.linuxseqsesshack is non-zero.

Note that, if the order of RPCs is actually reversed,
a subsequent RPC with a smaller sequence number value
for the slot will be received.  This will result in
a NFS4ERR_SEQ_MISORDERED reply.
This has not been observed during testing.
Setting vfs.nfs.linuxseqsesshack to 0 will provide
RFC5661 compliant behaviour.
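
As a rough illustration of the check discussed above (not the server code),
the per-slot sequence number test can be modelled as follows.  The function
and enum names are made up for the example, and allow_skip_by_one stands in
for a non-zero vfs.nfs.linuxseqsesshack.

```python
from enum import Enum

SEQ_MASK = 0xFFFFFFFF  # slot sequence numbers are 32-bit and wrap around


class SlotResult(Enum):
    NEW_REQUEST = "new request"
    REPLAY_CACHED = "reply from slot cache"
    MISORDERED = "NFS4ERR_SEQ_MISORDERED"


def check_slot_seqid(slot_seqid, request_seqid, allow_skip_by_one=False):
    """Classify a request's sequence number against the slot's current one."""
    if request_seqid == slot_seqid:
        # Same sequence number: a retry, answered from the cached reply.
        return SlotResult.REPLAY_CACHED
    if request_seqid == (slot_seqid + 1) & SEQ_MASK:
        # Advanced by exactly one: a new request.
        return SlotResult.NEW_REQUEST
    if allow_skip_by_one and request_seqid == (slot_seqid + 2) & SEQ_MASK:
        # Hypothetical stand-in for the relaxed mode: accept an advance by 2.
        return SlotResult.NEW_REQUEST
    return SlotResult.MISORDERED
```

With allow_skip_by_one left False the function follows the RFC5661 rule,
and a request with a smaller sequence number is rejected in either mode.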

This fix affects the fairly rare case where a NFSv4
Linux client does a TCP reconnect and then apparently
erroneously increments the sequence number for the
session slot twice during the reconnect cycle.

PR:	254816
MFC after:	2 weeks
2021-04-11 16:51:25 -07:00
Rick Macklem
7763814fc9 nfsv4 client: do the BindConnectionToSession as required
During a recent testing event, it was reported that the NFSv4.1/4.2
server erroneously bound the back channel to a new TCP connection.
RFC5661 specifies that the fore channel is implicitly bound to a
new TCP connection when an RPC with Sequence (almost any of them)
is done on it.  For the back channel to be bound to the new TCP
connection, an explicit BindConnectionToSession must be done as
the first RPC on the new connection.

Since new TCP connections are created by the "reconnect" layer
(sys/rpc/clnt_rc.c) of the krpc, this patch adds an optional
upcall done by the krpc whenever a new connection is created.
The patch also adds the specific upcall function that does a
BindConnectionToSession and configures the krpc to call it
when required.

This is necessary for correct interoperability with NFSv4.1/NFSv4.2
servers when the nfscbd daemon is running.

If doing NFSv4.1/NFSv4.2 mounts without this patch, it is
recommended that the nfscbd daemon not be running and that
the "pnfs" mount option not be specified.

PR:	254840
Comments by:	asomers
MFC after:	2 weeks
Differential Revision:	https://reviews.freebsd.org/D29475
2021-04-11 14:34:57 -07:00
Rick Macklem
22cefe3d83 nfsd: fix replies from session cache for multiple retries
Recent testing of network partitioning a FreeBSD NFSv4.1
server from a Linux NFSv4.1 client identified problems
with both the FreeBSD server and Linux client.

Commit 05a39c2c1c fixed replying with the cached reply in
the session slot when the session slot sequence# is the same.
However, the code uses the reply and, as such,
will fail for a subsequent retry of the RPC.
A subsequent retry would be an extremely rare event,
but this patch fixes this, so long as m_copym(..M_NOWAIT)
does not fail, which should also be a rare event.

This fix affects the exceedingly rare case where a NFSv4
client retries a non-idempotent RPC, such as a lock
operation, multiple times.  Note that retries only occur
after the client has needed to create a new TCP connection,
with a new TCP connection for each retry.

MFC after:	2 weeks
2021-04-10 15:50:25 -07:00
Rick Macklem
05a39c2c1c nfsd: fix replies from session cache for retried RPCs
Recent testing of network partitioning a FreeBSD NFSv4.1
server from a Linux NFSv4.1 client identified problems
with both the FreeBSD server and Linux client.

The FreeBSD server failed to reply using the cached
reply in the session slot when an RPC was retried on
the session slot, as indicated by same slot sequence#.

This patch fixes this.  It should also fix a similar
failure for NFSv4.0 mounts, when the sequence# in
the open/lock_owner requires a reply be done from
an entry locked into the DRC.

This fix affects the fairly rare case where a NFSv4
client retries a non-idempotent RPC, such as a lock
operation.  Note that retries only occur after the
client has needed to create a new TCP connection.

MFC after:	2 weeks
2021-04-08 14:04:22 -07:00
Rick Macklem
7a606f280a nfsd: make the server repeat CB_RECALL every couple of seconds
Commit 01ae8969a9 stopped the NFSv4.1/4.2 server from implicitly
binding the back channel to a new TCP connection so that it
conforms to RFC5661, for NFSv4.1/4.2. An effect of this
for the Linux NFS client is that it will do a
BindConnectionToSession when it sees NFSV4SEQ_CBPATHDOWN
set in a sequence reply. This will fix the back channel, but the
first attempt at a callback like CB_RECALL will already have
failed. Without this patch, a CB_RECALL will not be retried
and that can result in a 5 minute delay until the delegation
times out.

This patch modifies the code so that it will retry the
CB_RECALL every couple of seconds, often avoiding the
5 minute delay.

This is not critical for correct behaviour, but avoids
the 5 minute delay for the case where the Linux client
re-binds the back channel via BindConnectionToSession.

MFC after:	2 weeks
2021-04-04 18:15:54 -07:00
Rick Macklem
6f2addd838 nfsd: fix BindConnectionToSession so that it clears "cb path down"
Commit 01ae8969a9 stopped the NFSv4.1/4.2 server from implicitly
binding the back channel to a new TCP connection so that it
conforms to RFC5661, for NFSv4.1/4.2. An effect of this
for the Linux NFS client is that it will do a
BindConnectionToSession when it sees NFSV4SEQ_CBPATHDOWN
set in a sequence reply. It will do this for every RPC
reply until it no longer sees the flag.
Without that patch, this will happen until the client does
an Open, which will clear LCL_CBDOWN.

This patch clears LCL_CBDOWN right away, so that
NFSV4SEQ_CBPATHDOWN will no longer be sent to the client
in Sequence replies and the Linux client will not repeat
the BindConnectionToSession RPCs.

This is not critical for correct behaviour, but reduces
RPC overheads for cases where the Open will not be done
for a while.

MFC after:	2 weeks
2021-04-04 15:05:39 -07:00
Konstantin Belousov
76b1b5ce6d nullfs: protect against user creating inconsistent state
The VFS convention is that VOP_LOOKUP() methods do not need to handle
ISDOTDOT lookups for VV_ROOT vnodes (since they cannot, after all).  Nullfs
bypasses VOP_LOOKUP() to lower filesystem, and there, due to user actions,
it is possible to get into a situation where
- upper vnode does not have VV_ROOT set
- lower vnode is root
- ISDOTDOT is requested
User just needs to nullfs-mount non-root of some filesystem, and then move
some directory under mount, out of mount, using lower filesystem.

In this case, nullfs cannot do much, but we still should and can ensure
internal kernel structures are consistent.  Avoid ISDOTDOT lookup forwarding
when VV_ROOT is set on lower dvp, return somewhat arbitrary ENOENT.

PR:	253593
Reported by:	Gregor Koscak <elogin41@gmail.com>
Tested by:	Patrick Sullivan <sulli00777@gmail.com>
Tested by:	pho
Sponsored by:	The FreeBSD Foundation
MFC after:	1 week
2021-04-02 15:40:25 +03:00
Rick Macklem
4e6c2a1ee9 nfsv4 client: factor loop contents out into a separate function
Commit fdc9b2d50f replaced a couple of while loops with LIST_FOREACH()
loops.  This patch factors the body of that loop out into a separate
function called nfscl_checkown().
This prepares the code for future changes to use a hash table of
lists for open searches via file handle.

This patch should not result in a semantics change.

MFC after:	2 weeks
2021-04-01 15:36:37 -07:00
Rick Macklem
01ae8969a9 nfsd: do not implicitly bind the back channel for NFSv4.1/4.2 mounts
The NFSv4.1 (and 4.2 on 13) server incorrectly binds
a new TCP connection to the back channel when first
used by an RPC with a Sequence op in it (almost all of them).
RFC5661 specifies that only the fore channel should be bound.

This was done because early clients (including FreeBSD)
did not do the required BindConnectionToSession RPC.

Unfortunately, this breaks the Linux client when the
"nconnects" mount option is used, since the server
may do a callback on the incorrect TCP connection.

This patch converts the server behaviour to that
required by the RFC.  It also makes the server test/indicate
failure of the back channel more aggressively.

Until this patch is applied to the server, the
"nconnects" mount option is not recommended for a Linux
NFSv4.1/4.2 client mount to the FreeBSD server.

Reported by:	bcodding@redhat.com
Tested by:	bcodding@redhat.com
PR:		254560
MFC after:	1 week
2021-03-30 14:31:05 -07:00
Rick Macklem
fdc9b2d50f nfsv4 client: replace while loops with LIST_FOREACH() loops
This patch replaces a couple of while() loops with LIST_FOREACH() loops.
While here, declare a couple of variables "bool".
I think LIST_FOREACH() is preferred and makes the code more readable.
This also prepares the code for future changes to use a hash table of
lists for open searches via file handle.

This patch should not result in a semantics change.

MFC after:	2 weeks
2021-03-29 14:14:51 -07:00
Rick Macklem
e61b29ab5d nfsv4.1/4.2 client: fix handling of delegations for "oneopenown" mnt option
If a delegation for a file has been acquired, the "oneopenown" option
was ignored when the local open was issued. This could result in multiple
openowners/opens for a file, that would be transferred to the server
when the delegation was recalled.
This would not be serious, but could result in more than one openowner.
Since the Amazon/EFS does not issue delegations, this probably never
occurs in practice.
Spotted during code inspection.

This small patch fixes the code so that it checks for "oneopenown"
when doing client local opens on a delegation.

MFC after:	2 weeks
2021-03-29 12:09:19 -07:00
Rick Macklem
82ee386c2a nfsv4 client: fix forced dismount when sleeping in the renew thread
During a recent NFSv4 testing event a test server caused a hang
where "umount -N" failed.  The renew thread was sleeping on "nfsv4lck"
and the "umount" was sleeping, waiting for the renew thread to
terminate.

This is the second of two patches that are hoped to fix the renew thread
so that it will terminate when "umount -N" is done on the mount.

This patch adds a 5 second timeout on the msleep()s and checks for
the forced dismount flag so that the renew thread will
wake up and see the forced dismount flag.  Normally a wakeup()
will occur in less than 5 seconds, but if a premature return from
msleep() does occur, it will simply loop around and msleep() again.
The patch also adds the "mp" argument to nfsv4_lock() so that it
will return when the forced dismount flag is set.

While here, replace the nfsmsleep() wrapper that was used for portability
with the actual msleep() call.

MFC after:	2 weeks
2021-03-23 13:04:37 -07:00
Alan Somers
9c5aac8f2e fusefs: fix a dead store in fuse_vnop_advlock
kevans actually caught this in the original review and I fixed it, but
then I committed an older copy of the branch.  Whoops.

Reported by:	kevans
MFC after:	13 days
MFC with:	929acdb19a
Differential Revision:	https://reviews.freebsd.org/D29031
2021-03-19 19:38:57 -06:00
Rick Macklem
5f742d3879 nfsv4 client: fix forced dismount when sleeping on nfsv4lck
During a recent NFSv4 testing event a test server caused a hang
where "umount -N" failed.  The renew thread was sleeping on "nfsv4lck"
and the "umount" was sleeping, waiting for the renew thread to
terminate.

This is the first of two patches that are hoped to fix the renew thread
so that it will terminate when "umount -N" is done on the mount.

nfsv4_lock() checks for forced dismount, but only after it wakes up
from msleep().  Without this patch, a wakeup() call was required.
This patch adds a 1 second timeout on the msleep(), so that it will
wake up and see the forced dismount flag.  Normally a wakeup()
will occur in less than 1 second, but if a premature return from
msleep() does occur, it will simply loop around and msleep() again.
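
The timed-sleep pattern described above can be sketched in user space with a
condition variable.  This is an analogy only, not the kernel code: the
SleepLock class and its fields are hypothetical, and Condition.wait() with a
timeout stands in for msleep() with a timeout.

```python
import threading


class SleepLock:
    """Toy lock whose sleepers re-check a forced dismount flag periodically."""

    def __init__(self):
        self.cond = threading.Condition()
        self.locked = False
        self.forced_dismount = False

    def acquire(self, timeout=1.0):
        """Return True once the lock is taken, False on forced dismount."""
        with self.cond:
            while self.locked:
                # Returns on notify, or after `timeout` seconds with no notify.
                self.cond.wait(timeout)
                # The flag is only checked after waking, so the timeout is
                # what guarantees it is noticed without an explicit wakeup.
                if self.forced_dismount:
                    return False
            self.locked = True
            return True

    def release(self):
        with self.cond:
            self.locked = False
            self.cond.notify_all()
```

A premature return from wait() is harmless here: if the lock is still held
and the flag is clear, the loop simply waits again.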

While here, replace the nfsmsleep() wrapper that was used for portability
with the actual msleep() call and make the same change for nfsv4_getref().

MFC after:	2 weeks
2021-03-19 14:09:33 -07:00
Alan Somers
929acdb19a fusefs: fix two bugs regarding fcntl file locks
1) F_SETLKW (blocking) operations would be sent to the FUSE server as
   F_SETLK (non-blocking).

2) Release operations, F_SETLK with lk_type = F_UNLCK, would simply
   return EINVAL.

PR:		253500
Reported by:	John Millikin <jmillikin@gmail.com>
MFC after:	2 weeks
2021-03-18 17:09:10 -06:00
Rick Macklem
fd232a21bb nfsv4 pnfs client: fix updating of the layout stateid.seqid
During a recent NFSv4 testing event a test server was replying
NFSERR_OLDSTATEID for layout stateids presented to the server
for LayoutReturn operations.  Upon rereading RFC5661, it was
apparent that the FreeBSD NFSv4.1/4.2 pNFS client did not
maintain the seqid field of the layout stateid correctly.

This patch is believed to correct the problem.  Tested against
a FreeBSD pNFS server with diagnostics added to check the stateid's
seqid did not indicate problems.  Unfortunately, testing against
this server will not happen in the near future, so the fix may
not be correct yet.

MFC after:	2 weeks
2021-03-18 12:20:25 -07:00
Gordon Bergling
5666643a95 Fix some common typos in comments
- occured -> occurred
- normaly -> normally
- controling -> controlling
- fileds -> fields
- insterted -> inserted
- outputing -> outputting

MFC after:	1 week
2021-03-13 18:26:15 +01:00
Konstantin Belousov
16dea83410 null_vput_pair(): release use reference on dvp earlier
We might own the last use reference, and then vrele() at the end would
need to take the dvp vnode lock to inactivate, which causes deadlock
with vp. We cannot vrele() dvp from start since this might unlock ldvp.

Handle it by holding the vnode and dropping use ref after lowerfs
VOP_VPUT_PAIR() ended.  This effectively requires unlock of the vp vnode
after VOP_VPUT_PAIR(), so the call is changed to set unlock_vp to true
unconditionally.  This opens more opportunities for vp to be reclaimed,
if lvp is still alive we reinstantiate vp with null_nodeget().

Reported and tested by:	pho
Reviewed by:	mckusick
Sponsored by:	The FreeBSD Foundation
MFC after:	2 weeks
Differential revision:	https://reviews.freebsd.org/D29178
2021-03-12 13:31:08 +02:00
Rick Macklem
c04199affe nfsclient: Fix ReadDS/WriteDS/CommitDS nfsstats RPC counts for a NFSv3 DS
During a recent virtual NFSv4 testing event, a bug in the FreeBSD client
was detected when doing I/O DS operations on a Flexible File Layout pNFS
server.  For an NFSv3 DS, the Read/Write/Commit nfsstats were incremented
instead of the ReadDS/WriteDS/CommitDS counts.
This patch fixes this.

Only the RPC counts reported by nfsstat(1) were affected by this bug,
the I/O operations were performed correctly.

MFC after:	2 weeks
2021-03-02 14:18:23 -08:00
Rick Macklem
94f2e42f5e nfsclient: Fix the stripe unit size for a File Layout pNFS layout
During a recent virtual NFSv4 testing event, a bug in the FreeBSD client
was detected when doing a File Layout pNFS DS I/O operation.
The size of the I/O operation was smaller than expected.
The I/O size is specified as a stripe unit size in bits 6->31 of nflh_util
in the layout.  I had misinterpreted RFC5661 and had shifted the value
right by 6 bits. The correct interpretation is to use the value as
presented (it is always an exact multiple of 64), clearing bits 0->5.
This patch fixes this.

Without the patch, I/O through the DSs works, but the I/O size is 1/64th
of what is optimal.
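
A small worked example of the two interpretations, as a sketch rather than
the client code.  The function names and the sample value are invented for
illustration; the mask simply selects bits 6->31 of a 32-bit word, as
described above.

```python
STRIPE_UNIT_MASK = 0xFFFFFFC0  # bits 6..31; bits 0..5 carry other flags


def stripe_unit_shifted(util):
    """The misreading: treat bits 6..31 as a count and shift it down."""
    return (util & STRIPE_UNIT_MASK) >> 6


def stripe_unit_masked(util):
    """The corrected reading: use the value as presented, clearing bits 0..5."""
    return util & STRIPE_UNIT_MASK


# Hypothetical util word: a 1 MiB stripe unit with one low flag bit set.
util = 0x00100000 | 0x1
print(stripe_unit_masked(util))   # 1048576
print(stripe_unit_shifted(util))  # 16384, i.e. 1/64th of the above
```

Because the masked value is always a multiple of 64, the shifted result is
exactly 1/64th of it, which matches the reduced I/O size noted above.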

MFC after:	2 weeks
2021-03-01 12:49:32 -08:00
Rick Macklem
15bed8c46b nfsclient: add nfs node locking around uses of n_direofoffset
During code inspection I noticed that the n_direofoffset field
of the NFS node was being manipulated without any lock being
held to make it SMP safe.
This patch adds locking of the NFS node's mutex around
handling of n_direofoffset to make it SMP safe.

I have not seen any failure that could be attributed to n_direofoffset
being manipulated concurrently by multiple processors, but I think this
is possible, since directories are read with shared vnode
locking, plus locks only on individual buffer cache blocks.
However, there have been as yet unexplained issues w.r.t reading
large directories over NFS that could have conceivably been caused
by concurrent manipulation of n_direofoffset.

MFC after:	2 weeks
2021-02-28 14:53:54 -08:00
Rick Macklem
3e04ab36ba nfsclient: add checks for a server returning the current directory
Commit 3fe2c68ba2 dealt with a panic in cache_enter_time() where
the vnode referred to the directory argument.
It would also be possible to get these panics if a broken
NFS server were to return the directory as a new object being
created within the directory or in a Lookup reply.

This patch adds checks to avoid the panics and logs
messages to indicate that the server is broken for the
file object creation cases.

Reviewed by:	kib
MFC after:	2 weeks
Differential Revision:	https://reviews.freebsd.org/D28987
2021-02-28 14:15:32 -08:00
Alexander Motin
d01032736c Fix diroffdiroff, probably copy/paste bug.
Too long name looks bad in `vmstat -m`.

MFC after:	1 week
2021-02-28 09:08:31 -05:00
Rick Macklem
3fe2c68ba2 nfsclient: fix panic in cache_enter_time()
Juraj Lutter (otis@) reported a panic "dvp != vp not true" in
cache_enter_time() called from the NFS client's nfsrpc_readdirplus()
function.
This is specific to an NFSv3 mount with the "rdirplus" mount
option. Unlike NFSv4, NFSv3 replies to ReaddirPlus
include entries for the current directory.

This trivial patch avoids doing a cache_enter_time()
call for the current directory to avoid the panic.

Reported by:	otis
Tested by:	otis
Reviewed by:	mjg
MFC after:	3 days
Differential Revision:	https://reviews.freebsd.org/D28969
2021-02-27 17:54:05 -08:00
Ryan Libby
d7671ad8d6 Close races in vm object chain traversal for unlock
We were unlocking the vm object before reading the backing_object field.
In the meantime, the object could be freed and reused.  This could cause
us to go off the rails in the object chain traversal, failing to unlock
the rest of the objects in the original chain and corrupting the lock
state of the victim chain.

Reviewed by:	bdrewery, kib, markj, vangyzen
MFC after:	3 days
Sponsored by:	Dell EMC Isilon
Differential Revision:	https://reviews.freebsd.org/D28926
2021-02-25 12:11:19 -08:00
Alex Richardson
ba2cfa80e1 Fix makefs bootstrap after d485c77f20
The makefs msdosfs code includes fs/msdosfs/denode.h which directly uses
struct buf from <sys/buf.h> rather than the makefs struct m_buf.
To work around this problem provide a local denode.h that includes
ffs/buf.h and defines buf as an alias for m_buf.

Reviewed By:	kib, emaste
Differential Revision: https://reviews.freebsd.org/D28835
2021-02-22 17:55:45 +00:00
Konstantin Belousov
8b7239681e ext2fs: clear write cluster tracking on truncation
Reviewed by:	fsu, mckusick
Tested by:	pho
Sponsored by:	The FreeBSD Foundation
MFC after:	1 week
Differential Revision:	https://reviews.freebsd.org/D28679
2021-02-21 11:38:21 +02:00
Konstantin Belousov
2bfd8992c7 vnode: move write cluster support data to inodes.
The data is only needed by filesystems that
1. use buffer cache
2. utilize clustering write support.

Requested by:	mjg
Reviewed by:	asomers (previous version), fsu (ext2 parts), mckusick
Tested by:	pho
Sponsored by:	The FreeBSD Foundation
Differential Revision:	https://reviews.freebsd.org/D28679
2021-02-21 11:38:21 +02:00
Konstantin Belousov
d485c77f20 Remove #define _KERNEL hacks from libprocstat
Make sys/buf.h, sys/pipe.h, sys/fs/devfs/devfs*.h headers usable in
userspace, assuming that the consumer has an idea what it is for.
Unhide more material from sys/mount.h and sys/ufs/ufs/inode.h,
sys/ufs/ufs/ufsmount.h for consumption of userspace tools, with the
same caveat.

Remove unacceptable hack from usr.sbin/makefs which relied on sys/buf.h
being unusable in userspace, where it overrode struct buf with its own
definition.  Instead, provide struct m_buf and struct m_vnode and adapt
code to use local variants.

Reviewed by:	mckusick
Tested by:	pho
Sponsored by:	The FreeBSD Foundation
Differential revision:	https://reviews.freebsd.org/D28679
2021-02-21 11:38:21 +02:00
Alexander V. Chernikov
605284b894 Enforce net epoch in in6_selectsrc().
in6_selectsrc() may call fib6_lookup() in some cases, which requires
 epoch. Wrap in6_selectsrc* calls into epoch inside its users.
Mark it as requiring epoch by adding NET_EPOCH_ASSERT().

MFC after:	1 week
Differential Revision:	https://reviews.freebsd.org/D28647
2021-02-15 22:33:12 +00:00
Alan Somers
71befc3506 fusefs: set d_off during VOP_READDIR
This allows d_off to be used with lseek to position the file so that
getdirentries(2) will return the next entry.  It is not used by
readdir(3).

PR:		253411
Reported by:	John Millikin <jmillikin@gmail.com>
Reviewed by:	cem
MFC after:	1 week
Differential Revision:	https://reviews.freebsd.org/D28605
2021-02-12 21:50:52 -07:00
Konstantin Belousov
4a21bcb241 nfsserver: use VOP_VPUT_PAIR().
Apply VOP_VPUT_PAIR() to the end of vnode operations after the
VOP_MKNOD(), VOP_MKDIR(), VOP_LINK(), VOP_SYMLINK(), VOP_CREATE().

Reviewed by:	chs, mckusick
Tested by:	pho
MFC after:	2 weeks
Sponsored by:	The FreeBSD Foundation
2021-02-12 03:02:21 +02:00
Konstantin Belousov
e4aaf35ab5 nullfs: provide special bypass for VOP_VPUT_PAIR
Generic bypass cannot understand the rules of liveness for the VOP.

Reviewed by:	chs, mckusick
Tested by:	pho
MFC after:	2 weeks
Sponsored by:	The FreeBSD Foundation
2021-02-12 03:02:20 +02:00
Konstantin Belousov
ee965dfa64 vn_open(): If the vnode is reclaimed during open(2), do not return error.
Most future operations on the returned file descriptor will fail
anyway, and the application should be ready to handle those failures.  Not
forcing it to understand the transient failure mode on open, which is
implementation-specific, should make us less special without loss of
reporting of errors.

Suggested by: chs
Reviewed by:	chs, mckusick
Tested by:	pho
MFC after:	2 weeks
Sponsored by:	The FreeBSD Foundation
2021-02-12 03:02:20 +02:00
Mateusz Guzik
3bc17248d3 devfs: fix use count leak when using TIOCSCTTY
by matching devfs_ctty_ref

Fixes: 3b44443626 ("devfs: rework si_usecount to track opens")
2021-02-09 01:54:21 +00:00
Edward Tomasz Napierala
b8073b3c74 msdosfs: fix vnode leak with msdosfs_rename()
This could happen when failing due to disappearing source file.

Reviewed By:	kib
Tested by:	pho
Sponsored by:	NetApp, Inc.
Sponsored by:	Klara, Inc.
Differential Revision: https://reviews.freebsd.org/D27338
2021-01-31 21:37:44 +00:00
Edward Tomasz Napierala
cb69621249 msdosfs: fix double unlock if the source file disappears
We would unlock fvp here, only to unlock it again below,
just before "bad".

Reviewed By:	kib
Tested by:	pho
Sponsored by:	NetApp, Inc.
Sponsored by:	Klara, Inc.
Differential Revision: https://reviews.freebsd.org/D27339
2021-01-31 21:35:34 +00:00
Alex Richardson
1d15bceae6 tmpfs: implement pathconf(_PC_SYMLINK_MAX)
This fixes one of the sys/audit tests when running them on tmpfs.

Reviewed By:	delphij, kib
Differential Revision: https://reviews.freebsd.org/D28387
2021-01-29 09:30:25 +00:00
Kyle Evans
0f919ed4ae tmpfs: push VEXEC check into tmpfs_lookup()
vfs_cache_lookup() has already done the appropriate VEXEC check, therefore
we must not re-check in VOP_CACHEDLOOKUP.

This fixes O_SEARCH semantics on tmpfs and removes a redundant descent into
VOP_ACCESS() in the common case.

Reported-by:	arichardson (via CheriBSD Jenkins CI)
Reviewed-by:	kib
MFC-after:	3 days
Differential Revision:	https://reviews.freebsd.org/D28401
2021-01-28 19:25:11 -06:00
Mateusz Guzik
c09f799271 tmpfs: drop acq fence now that vn_load_v_data_smr has consume semantics
2021-01-25 22:40:15 +00:00
Mateusz Guzik
cc96f92a57 atomic: make atomic_store_ptr type-aware
2021-01-25 22:40:15 +00:00
Alex Richardson
8d55837dc1 queue.h: Add {SLIST,STAILQ,LIST,TAILQ}_END()
We provide these for compat with other queue.h headers since some software
assumes it exists (e.g. the libevent contrib code), but we are not
encouraging their use (NULL should be used instead).

This fixes the following warning (which should arguably be an error since
it results in a function call to an undefined function):

.../contrib/libevent/buffer.c:495:16: warning: implicit declaration of function 'LIST_END' is invalid in C99 [-Wimplicit-function-declaration]
             cbent != LIST_END(&buffer->callbacks);
                      ^
.../contrib/libevent/buffer.c:495:13: warning: comparison between pointer and integer ('struct evbuffer_cb_entry *' and 'int') [-Wpointer-integer-compare]
             cbent != LIST_END(&buffer->callbacks);
             ~~~~~ ^  ~~~~~~~~~~~~~~~~~~~~~~~~~~~~

Reviewed By:	jhb
Differential Revision: https://reviews.freebsd.org/D27151
2021-01-25 15:09:35 +00:00
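The compat macros from 8d55837dc1 can be illustrated with a small sketch.
It assumes only what the commit message states, namely that the end-of-list
value is NULL; struct cb_entry, cb_list and count_entries() are hypothetical
names made up for this illustration, and the #ifndef fallback is there so the
snippet also builds against queue.h variants that lack LIST_END().

```c
#include <assert.h>
#include <stddef.h>
#include <sys/queue.h>

/*
 * Fallback for queue.h variants without the compat macro.  Assumption:
 * the end-of-list value is NULL, as the commit message says.
 */
#ifndef LIST_END
#define LIST_END(head)	NULL
#endif

/* Hypothetical list element, used only for this illustration. */
struct cb_entry {
	int id;
	LIST_ENTRY(cb_entry) next;
};

LIST_HEAD(cb_list, cb_entry);

/* Count entries, terminating on LIST_END() the way libevent-style code does. */
static int
count_entries(struct cb_list *head)
{
	struct cb_entry *e;
	int n = 0;

	assert(head != NULL);
	for (e = LIST_FIRST(head); e != LIST_END(head); e = LIST_NEXT(e, next))
		n++;
	return (n);
}
```

New code would normally compare against NULL directly or use LIST_FOREACH();
the macro only keeps existing third-party loops like the one above compiling.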
Konstantin Belousov
bd01a69f48 nfs_write(): do not call ncl_pager_setsize() after clearing TDP2_SBPAGES
This might unnecessarily truncate the file, undoing the extension done by the write.

Reported by:	Yasuhiro Kimura <yasu@utahime.org>
Reviewed by:	rmacklem
Tested by:	rmacklem, Yasuhiro Kimura <yasu@utahime.org>
MFC after:	6 days
Sponsored by:	The FreeBSD Foundation
2021-01-25 01:02:03 +02:00
Konstantin Belousov
aa8c1f8d84 nfs client: block vnode_pager_setsize() calls from nfscl_loadattrcache in nfs_write
Otherwise writing thread might wait on sbusy state of the pages which were
busied by itself, similarly to nfs_read().  But also we need to clear
NVNSETSZKSIP flag possibly set by ncl_pager_setsize(), to not undo
extension done by write.

Reported by:	bdrewery
Reviewed by:	rmacklem
Tested by:	pho
MFC after:	1 week
Sponsored by:	The FreeBSD Foundation
Differential Revision:	https://reviews.freebsd.org/D28306
2021-01-23 17:24:32 +02:00
Mateusz Guzik
618029af50 tmpfs: add support for lockless symlink lookup
Reviewed by:	kib (previous version)
Tested by:	pho (previous version)
Differential Revision:	https://reviews.freebsd.org/D27488
2021-01-23 15:04:43 +00:00
Mateusz Guzik
739ecbcf1c cache: add symlink support to lockless lookup
Reviewed by:	kib (previous version)
Tested by:	pho (previous version)
Differential Revision:	https://reviews.freebsd.org/D27488
2021-01-23 15:04:43 +00:00
Konstantin Belousov
2d1e4220eb tmpfs_reclaim: detach unlinked node on dereferencing.
Otherwise it is dereferenced one extra time at unmount, if it survives
long enough.  One way to hold the reference on such node is to keep it
open.

tmpfs_vptocnp() now needs to account for the possibility that unlocked
node was removed from the list.

Reported by:	danfe
Tested by:	danfe, pho
MFC after:	1 week
Sponsored by:	The FreeBSD Foundation
2021-01-14 14:51:37 +02:00
Konstantin Belousov
685265ecfb tmpfs_reclaim: style
MFC after:	3 days
Sponsored by:	The FreeBSD Foundation
2021-01-14 14:43:13 +02:00
Mateusz Guzik
6b3a9a0f3d Convert remaining cap_rights_init users to cap_rights_init_one
semantic patch:

@@

expression rights, r;

@@

- cap_rights_init(&rights, r)
+ cap_rights_init_one(&rights, r)
2021-01-12 13:16:10 +00:00
Rick Macklem
148a227bf8 nfsd: add KASSERTs to nfsm_trimtrailing() for M_EXTPG mbufs
Add KASSERTS to nfsm_trimtrailing() to confirm the sanity of
the arguments for the M_EXTPG case.

Suggested by:	kib
Reviewed by:	kib
Differential Revision:	https://reviews.freebsd.org/D28053
2021-01-10 13:50:15 -08:00
Konstantin Belousov
ac2576b9f7 tmpfs open: assert that there is no double-init of f_data.
Sponsored by:	The FreeBSD Foundation
2021-01-10 04:48:36 +02:00
Konstantin Belousov
9f200bc47b tmpfs_free_tmp(): explicitly assert that tmp is locked
Although TMPFS_UNLOCK() is done in both paths later, unlocking a mutex
that is not locked provides a different failure mode.

MFC after:	1 week
Sponsored by:	The FreeBSD Foundation
2021-01-10 04:48:29 +02:00
Konstantin Belousov
42bebbda9e tmpfs: make M_TMPFSMNT static to tmpfs_vfsops.c
This malloc type is only used in this file.

MFC after:	1 week
Sponsored by:	The FreeBSD Foundation
2021-01-10 04:44:55 +02:00
Alan Somers
17a82e6af8 Fix vnode locking bug in fuse_vnop_copy_file_range
MFC-With:	92bbfe1f0d
Reviewed by:	cem
Differential Revision:	https://reviews.freebsd.org/D27938
2021-01-03 11:16:20 -07:00
Mark Johnston
90f580b954 Ensure that dirent's d_off field is initialized
We have the d_off field in struct dirent for providing the seek offset
of the next directory entry.  Several filesystems were not initializing
the field, which ends up being copied out to userland.

Reported by:	Syed Faraz Abrar <faraz@elttam.com>
Reviewed by:	kib
MFC after:	3 days
Sponsored by:	The FreeBSD Foundation
Differential Revision:	https://reviews.freebsd.org/D27792
2021-01-03 11:50:31 -05:00
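The class of bug addressed by 90f580b954 (and by the msdosfs padding fix
599f904463 further down) can be sketched as follows.  This is not the code
from either commit: struct ex_dirent is a hypothetical stand-in rather than
the real struct dirent layout, and ex_fill_dirent() is an invented helper.
It only shows the general pattern of zeroing a record before filling it, so
that d_off and any unused bytes never carry stale data out to the consumer.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical stand-in for struct dirent; not the real FreeBSD layout. */
struct ex_dirent {
	uint64_t d_fileno;
	int64_t	 d_off;		/* seek offset of the next entry */
	uint16_t d_reclen;
	uint8_t	 d_type;
	uint8_t	 d_namlen;
	char	 d_name[20];
};

/* Fill one record, leaving no byte of it uninitialized. */
static void
ex_fill_dirent(struct ex_dirent *de, uint64_t ino, int64_t next_off,
    const char *name)
{
	size_t len = strlen(name);

	assert(len < sizeof(de->d_name));
	/* Zero the whole record first, including unused name bytes. */
	memset(de, 0, sizeof(*de));
	de->d_fileno = ino;
	de->d_off = next_off;	/* the field several filesystems left unset */
	de->d_reclen = (uint16_t)sizeof(*de);
	de->d_namlen = (uint8_t)len;
	memcpy(de->d_name, name, len);
}
```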
Alan Somers
34477e25c1 fusefs: only check vnode locks with DEBUG_VFS_LOCKS
MFC-With:	37df9d3bba
Reviewed by:	cem
Differential Revision:	https://reviews.freebsd.org/D27939
2021-01-03 09:19:00 -07:00
Alan Somers
542711e520 Fix a vnode locking bug in fuse_vnop_advlock.
Must lock the vnode before accessing the fufh table.  Also, check for
invalid parameters earlier.  Bug introduced by r346170.

MFC after:	2 weeks

Reviewed by:	cem
Differential Revision:	https://reviews.freebsd.org/D27936
2021-01-03 09:16:23 -07:00
Mateusz Guzik
3e506a67bb vfs: add v_irflag accessors
Reviewed by:	kib (previous version)
Differential Revision:	https://reviews.freebsd.org/D27793
2021-01-03 06:50:06 +00:00
Konstantin Belousov
51a9b978e7 nfs server: improve use of the VFS KPI
In particular, do not assume that vn_start_write() returns the same mp
as it was passed in, or never returns error.

Also be more careful to return NULL vp and mp when an error occurred, to
catch wrong control flow more easily.

Stop checking for NULL mp before calling vn_finished_write(), NULL mp
is handled transparently by the function.

Reviewed by:	rmacklem
Tested by:	pho
MFC after:	1 week
Sponsored by:	The FreeBSD Foundation
Differential Revision:	https://reviews.freebsd.org/D27881
2021-01-02 20:17:12 +02:00
Rick Macklem
dc78533a52 nfsd: fix NFSv4.0 seqid handling for ERELOOKUP
Commit 774a36851e fixed the NFS server so that it could handle
ERELOOKUP returns from VOP calls by redoing the operation/RPC.
However, for NFSv4.0, redoing an Open would increment
the open_owner's seqid multiple times, breaking the protocol.
This patch sets a new flag called ND_ERELOOKUP on the RPC when
a redo is in progress.  Then the code that increments the seqid
avoids the seqid increment/check when the flag is set, since
it indicates this has already been done for the Open.
2021-01-01 14:21:51 -08:00
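The interaction between the redo loop (774a36851e) and the seqid handling
(dc78533a52) can be sketched roughly as below.  Everything here is a
simplified model and not the nfsd code: the ex_ structures, the flag value,
the error value and the "fails once" behaviour of ex_do_open() are all
assumptions made for illustration.  The point is only that a redone Open
must not increment the open_owner's seqid a second time.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define EX_ERELOOKUP	(-5)	/* placeholder error value for this sketch */
#define EX_ND_ERELOOKUP	0x1u	/* hypothetical "redo in progress" flag */

struct ex_owner { uint32_t seqid; };
struct ex_req { unsigned flags; int attempts; };

/* Hypothetical Open: asks for a redo on the first attempt only. */
static int
ex_do_open(struct ex_req *rq, struct ex_owner *ow)
{
	/* Increment the seqid only when this is not a redo. */
	if ((rq->flags & EX_ND_ERELOOKUP) == 0)
		ow->seqid++;
	return (rq->attempts++ == 0 ? EX_ERELOOKUP : 0);
}

/* Redo the operation until it stops returning EX_ERELOOKUP. */
static int
ex_serve_open(struct ex_req *rq, struct ex_owner *ow)
{
	int error;

	assert(rq != NULL && ow != NULL);
	for (;;) {
		error = ex_do_open(rq, ow);
		if (error != EX_ERELOOKUP)
			break;
		/* Mark the request so the retry skips the seqid step. */
		rq->flags |= EX_ND_ERELOOKUP;
	}
	rq->flags &= ~EX_ND_ERELOOKUP;
	return (error);
}
```

The real server additionally rolls the request and reply mbuf lists back to
their state before the operation, which this model leaves out.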
Rick Macklem
774a36851e nfsd: fix NFS server for ERELOOKUP
r367672 modified UFS such that certain VOPs, such as
VOP_CREATE() will intermittently return ERELOOKUP.
When this happens, the entire system call, or NFS
operation in the case of the NFS server, must be redone.

This patch adds that support to the NFS server by rolling
back the state of the NFS request arguments and NFS
reply arguments mbuf lists to the condition they were
in before the operation and then redoing the operation.

Tested by:	pho
Reviewed by:	kib
Differential Revision:	https://reviews.freebsd.org/D27875
2021-01-01 13:55:51 -08:00
Alan Somers
92bbfe1f0d fusefs: implement FUSE_COPY_FILE_RANGE.
This updates the FUSE protocol to 7.28, though most of the new features
are optional and are not yet implemented.

MFC after:	2 weeks
Relnotes:	yes
Reviewed by:	cem
Differential Revision:	https://reviews.freebsd.org/D27818
2021-01-01 10:18:23 -07:00
Mateusz Guzik
d71965127f tmpfs: use VNPASS when asserting on a vnode in tmpfs_read_pgcache
2021-01-01 03:23:01 +00:00
Alan Somers
37df9d3bba fusefs: update FUSE protocol to 7.24 and implement FUSE_LSEEK
FUSE_LSEEK reports holes on fuse file systems, and is used for example
by bsdtar.

MFC after:	2 weeks
Relnotes:	yes
Reviewed by:	cem
Differential Revision: https://reviews.freebsd.org/D27804
2020-12-31 08:51:47 -07:00
Edward Tomasz Napierala
4ddb3cc597 devfs(4): defer freeing until we drop devmtx ("cdev")
Before r332974 the old code would sometimes cause a rare lock order
reversal against pagequeue, which looked roughly like this:

witness_checkorder()
__mtx_lock_flags()
vm_page_alloc()
uma_small_alloc()
keg_alloc_slab()
keg_fetch_slab()
zone_fetch_slab()
zone_import()
zone_alloc_bucket()
uma_zalloc_arg()
bucket_alloc()
uma_zfree_arg()
free()
devfs_metoo()
devfs_populate_loop()
devfs_populate()
devfs_rioctl()
VOP_IOCTL_APV()
VOP_IOCTL()
vn_ioctl()
fo_ioctl()
kern_ioctl()
sys_ioctl()

Since r332974 the original problem no longer exists, but it still
makes sense to move things out of the - often congested - lock.

Reviewed By:	kib, markj
Sponsored by:	NetApp, Inc.
Sponsored by:	Klara, Inc.
Differential Revision: https://reviews.freebsd.org/D27334
2020-12-29 13:47:36 +00:00
Alan Somers
4f4111d2c5 fusefs: delete some dead code
The original fusefs GSoC project seems to have envisioned exchanging two
types of messages with FUSE servers.  Perhaps vectored and non-vectored?
But in practice only one type has ever been used.  Delete the other type.

Reviewed by:		cem
Differential Revision:	https://reviews.freebsd.org/D27770
2020-12-28 19:05:35 +00:00
Mark Johnston
599f904463 msdosfs: Fix a leak of dirent padding bytes
This was missed in r340856 / commit
6d2e2df764.  Three bytes from the kernel
stack may be leaked when reading directory entries.

Reported by:	Syed Faraz Abrar <faraz@elttam.com>
MFC after:	3 days
Sponsored by:	The FreeBSD Foundation
2020-12-27 17:01:44 -05:00
Rick Macklem
665b1365fe Add a new "tlscertname" NFS mount option.
When using NFS-over-TLS, an NFS client can optionally provide an X.509
certificate to the server during the TLS handshake.  For some situations,
such as different NFS servers or different certificates being mapped
to different user credentials on the NFS server, there may be a need
for different mounts to provide different certificates.

This new mount option called "tlscertname" may be used to specify a
non-default certificate be provided.  This alternate certificate will
be stored in /etc/rpc.tlsclntd in a file with a name based on what is
provided by this mount option.
2020-12-23 13:42:55 -08:00
Brooks Davis
52e63ec2f1 VFS_QUOTACTL: Remove needless casts of arg
The argument is a void * so there's no need to cast it to caddr_t.

Update documentation to match function declaration.

Reviewed by:	freqlabs
Obtained from:	CheriBSD
MFC after:	1 week
Sponsored by:	DARPA
Differential Revision:	https://reviews.freebsd.org/D27093
2020-12-17 21:58:10 +00:00
Kirk McKusick
645027c89d In ext2fs, BA_CLRBUF is used in ext2_balloc() not UFS_BALLOC().
Noted by:     kib
MFC after:    3 days
Sponsored by: Netflix
2020-12-08 00:49:31 +00:00
Kirk McKusick
bb3c01ec79 Document the BA_CLRBUF flag used in ufs and ext2fs filesystems.
Suggested by: kib
MFC after:    3 days
Sponsored by: Netflix
2020-12-06 20:50:21 +00:00
Konstantin Belousov
cd85379104 Make MAXPHYS tunable. Bump MAXPHYS to 1M.
Replace MAXPHYS by runtime variable maxphys. It is initialized from
MAXPHYS by default, but can be also adjusted with the tunable kern.maxphys.

Make b_pages[] array in struct buf flexible.  Size b_pages[] for buffer
cache buffers exactly to atop(maxbcachebuf) (currently it is sized to
atop(MAXPHYS)), and b_pages[] for pbufs is sized to atop(maxphys) + 1.
The +1 for pbufs allows several pbuf consumers, among them vmapbuf(),
to use unaligned buffers still sized to maxphys, esp. when such
buffers come from userspace (*).  Overall, we save significant amount
of otherwise wasted memory in b_pages[] for buffer cache buffers,
while bumping MAXPHYS to desired high value.

Eliminate all direct uses of the MAXPHYS constant in kernel and driver
sources, except the place which initializes maxphys.  Some random (and
arguably weird) uses of MAXPHYS, e.g. in linuxolator, are converted
straight.  Some drivers, which use MAXPHYS to size embedded structures,
get a private MAXPHYS-like constant; their conversion is out of scope
for this work.

Changes to cam/, dev/ahci, dev/ata, dev/mpr, dev/mpt, dev/mvs,
dev/siis, were either submitted by, or based on changes by mav.

Suggested by: mav (*)
Reviewed by:	imp, mav, mckusick, scottl (intermediate versions)
Tested by:	pho
Sponsored by:	The FreeBSD Foundation
Differential revision:	https://reviews.freebsd.org/D27225
2020-11-28 12:12:51 +00:00
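The "+1" in the pbuf sizing from cd85379104 follows from simple page
arithmetic, sketched below.  The 4 KiB page size is an assumption for the
example, and ex_atop() and ex_pages_spanned() are illustrative helpers, not
kernel functions (ex_atop() merely mirrors what atop() does).  A buffer of
maxphys bytes that starts on a page boundary touches atop(maxphys) pages,
while the same length starting mid-page touches one more.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Assumption for illustration: 4 KiB pages. */
#define EX_PAGE_SHIFT	12

/* Bytes to whole pages, rounding down, in the spirit of the kernel's atop(). */
static size_t
ex_atop(size_t bytes)
{
	return (bytes >> EX_PAGE_SHIFT);
}

/* Number of pages touched by the byte range [addr, addr + len). */
static size_t
ex_pages_spanned(uintptr_t addr, size_t len)
{
	uintptr_t first, last;

	assert(len > 0);
	first = addr >> EX_PAGE_SHIFT;
	last = (addr + len - 1) >> EX_PAGE_SHIFT;
	return ((size_t)(last - first + 1));
}
```

With maxphys at 1M, ex_atop() gives 256 pages, ex_pages_spanned(0, 1 << 20)
also gives 256, and ex_pages_spanned(512, 1 << 20) gives 257, which is the
case the extra b_pages[] slot covers.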
Konstantin Belousov
f7af6e5e54 nullfs: provide custom bypass for VOP_READ_PGCACHE().
Normal bypass expects locked vnode, which is not true for
VOP_READ_PGCACHE().  Ensure liveness of the lower vnode by taking the
upper vnode interlock, which is also taken by null_reclaim() when
setting v_data to NULL.

Reported and tested by:	pho
Reviewed by:	markj, mjg
Sponsored by:	The FreeBSD Foundation
Differential revision:	https://reviews.freebsd.org/D27327
2020-11-26 18:16:32 +00:00
Konstantin Belousov
6936779347 msdosfs: suspend around unmount or remount rw->ro.
This also eliminates unsafe use of VFS_SYNC(MNT_WAIT).

Requested by:	mckusick
Discussed with:	imp
Tested by:	pho (previous version)
Sponsored by:	The FreeBSD Foundation
MFC after:	2 weeks
Differential revision:	https://reviews.freebsd.org/D27269
2020-11-20 15:19:30 +00:00
Konstantin Belousov
1b3cb4dc04 msdosfs: Add trivial support for suspension.
Tested by:	pho (previous version)
Sponsored by:	The FreeBSD Foundation
MFC after:	1 week
Differential revision:	https://reviews.freebsd.org/D27269
2020-11-20 12:31:02 +00:00
Conrad Meyer
c1c4d0e9a8 msdosfs(5): Fix debug-only format string
No functional change; MSDOSFS_DEBUG isn't a real build option, so this isn't
covered by LINT kernels.
2020-11-18 20:20:03 +00:00