The daemon can specify fsname=XXX in its mount options. If so, the file
system should report f_mntfromname as XXX during statfs. This will show
up in the output of commands like mount and df.
Submitted by: Ali Abdallah <ali.abdallah@suse.com>
MFC after: 2 weeks
Differential Revision: https://reviews.freebsd.org/D35090
Prior to fuse protocol version 7.9, the fuse_entry_out structure had a
smaller size. But fuse_vnop_create did not take that into account when
working with servers that use older protocols. The bug does not matter
for servers which don't use file handles or open flags (the only fields
affected).
PR: 263625
Submitted by: Ali Abdallah <ali.abdallah@suse.com>
MFC after: 2 weeks
During a FUSE_WRITE, the kernel requests the server to write a certain
amount of data, and the server responds with the amount that it actually
did write. It is obviously an error for the server to write more than
it was provided, and we always treated it as such, but there were two
problems:
* If the server responded with a huge amount, greater than INT_MAX, it
would trigger an integer overflow which would cause a panic.
* When extending the file, we wrongly set the file's size before
validing the amount written.
PR: 263263
Reported by: Robert Morris <rtm@lcs.mit.edu>
MFC after: 2 weeks
Sponsored by: Axcient
Reviewed by: emaste
Differential Revision: https://reviews.freebsd.org/D34955
The "void *stuff" (also called fstuff and dstuff) argument
was used by the Mac OSX port. For FreeBSD, this argument
is always NULL, so remove it to clean up the code.
This commit gets rid of "stuff" for assorted functions
local to nfs_clrpcops.c.
Future commits will do the same for other functions.
Formerly fusefs would pass up the stack any error value returned by the
fuse server. However, some values aren't valid for userland, but have
special meanings within the kernel. One of these, EJUSTRETURN, could
cause a kernel page fault if the server returned it in response to
FUSE_LOOKUP. Fix by validating all errors returned by the server.
Also, fix a data lifetime bug in the FUSE_DESTROY test.
PR: 263220
Reported by: Robert Morris <rtm@lcs.mit.edu>
MFC after: 3 weeks
Sponsored by: Axcient
Reviewed by: emaste
Differential Revision: https://reviews.freebsd.org/D34931
The "void *stuff" (also called fstuff and dstuff) argument
was used by the Mac OSX port. For FreeBSD, this argument
is always NULL, so remove it to clean up the code.
This commit gets rid of "stuff" for nfscl_nget().
Future commits will do the same for other functions.
The "void *stuff" (also called fstuff and dstuff) argument
was used by the Mac OSX port. For FreeBSD, this argument
is always NULL, so remove it to clean up the code.
This commit gets rid of "stuff" for nfscl_loadattrcache().
Future commits will do the same for other functions.
The "void *stuff" (also called fstuff and dstuff) argument
was used by the Mac OSX port. For FreeBSD, this argument
is always NULL, so remove it to clean up the code.
This commit gets rid of "stuff" for nfscl_request().
Future commits will do the same for other functions.
The "void *stuff" (also called fstuff and dstuff) argument
was used by the Mac OSX port. For FreeBSD, this argument
is always NULL, so remove it to clean up the code.
This commit gets rid of "stuff" for nfscl_postop_attr().
Future commits will do the same for other functions.
For IO_APPEND VOP_WRITE()s, the code first does a
Getattr RPC to acquire the file's size, before it
can do the Write RPC.
Although NFS does not have an append write operation,
an NFSv4 compound can use a Verify operation to check
that the client's notion of the file's size is
correct, followed by the Write operation.
This patch modifies nfscl_wcc_data() to optionally
acquire the file's size, for use with an AppendWrite.
Although the "stuff" arguments are always NULL
(these were used for the Mac OSX port and should be
cleared out someday), make the argument to
nfscl_wcc_data() explicitly NULL for clarity.
This patch does not cause any semantics change until
the AppendWrite is added in a future commit.
* We never send FUSE_LOOKUP for the root inode, since its inode number
is hard-coded to 1. Therefore, we should not send FUSE_FORGET for it,
lest the server see its lookup count fall below 0.
* During VOP_RECLAIM, if we are reclaiming the root inode, we must clear
the file system's vroot pointer. Otherwise it will be left pointing
at a reclaimed vnode, which will cause future VOP_LOOKUP operations to
fail. Previously we only cleared that pointer during VFS_UMOUNT. I
don't know of any real-world way to trigger this bug.
MFC after: 2 weeks
Reviewed by: pfg
Differential Revision: https://reviews.freebsd.org/D34753
For IO_APPEND VOP_WRITE()s, the code first does a
Getattr RPC to acquire the file's size, before it
can do the Write RPC.
Although NFS does not have an append write operation,
an NFSv4 compound can use a Verify operation to check
that the client's notion of the file's size is
correct before doing the Write operation.
This patch prepares the NFSv4 client for such an
RPC, which will be added in a future commit.
This patch does not cause any semantics change.
Commit 867c27c23a5c modified the NFS client so that
it did IO_APPEND writes directly to the NFS server
bypassing the buffer cache, via a call to
nfs_directio_write(). Unfortunately, this (very old)
function assumed that the uio iov was for user space
addresses. As such, a IO_APPEND VOP_WRITE() that
was for system space, such as ktrace(1) does, would
write bogus data.
This patch fixes nfs_directio_write() so that it
handles kernel space uio iovs.
Reported by: bz
Tested by: bz
MFC after: 2 weeks
A NFSv4.1/4.2 pNFS mount needs to do a
separate Open+LayoutGet RPC, so do not do
a Lookup+Open RPC for these mounts.
The Lookup+Open RPCs are still disabled,
until further testing is done, so this patch
has no effect at this time.
Use of the Lookup+Open RPC is currently disabled,
due to a problem detected during testing. This
patch fixes this problem. The problem was that
nfscl_postop_attr() does not parse the attributes
if nd_repstat != 0. It also would parse the
return status for the operation, where the
Lookup+Open code had already parsed it.
The first change in the patch does not make any
semantics change, but makes the code identical
to what is done later in the function, so that
it is apparent that the semantics should be the
same in both places.
Lookup+Open remains disabled while further
testing is being done, so this patch has no
effect at this time.
The Fsinfo RPC is exempt from the check for
Kerberized NFS being required, as recommended
by RFC2623. However, there is no reason to
exempt Fsinfo from the requirement to use TLS.
This patch fixes the code so that the exemption
only applies to Kerberized NFS and not
NFS-over-TLS.
This only affects NFS-over-TLS for an NFSv3
mount when it is required, but the client does
not do so.
MFC after: 1 month
ler@, markj@ reported a use after free in nfscl_cleanupkext().
They also provided two possible causes:
- In nfscl_cleanup_common(), "own" is the owner string
owp->nfsow_owner. If we free that particular
owner structure, than in subsequent comparisons
"own" will point to freed memory.
- nfscl_cleanup_common() can free more than one owner, so the use
of LIST_FOREACH_SAFE() in nfscl_cleanupkext() is not sufficient.
I also believe there is a 3rd:
- If nfscl_freeopenowner() or nfscl_freelockowner() is called
without the NFSCLSTATE mutex held, this could race with
nfscl_cleanupkext().
This could happen when the exclusive lock is held
on the client, such as when delegations are being returned
or when recovering from NFSERR_EXPIRED.
This patch fixes them as follows:
1 - Copy the owner string to a local variable before the
nfscl_cleanup_common() call.
2 - Modify nfscl_cleanup_common() so that it will never free more
than the first matching element. Normally there should only
be one element in each list with a matching open/lock owner
anyhow (but there might be a bug that results in a duplicate).
This should guarantee that the FOREACH_SAFE loops in
nfscl_cleanupkext() are adequate.
3 - Acquire the NFSCLSTATE mutex in nfscl_freeopenowner()
and nfscl_freelockowner(), if it is not already held.
This serializes all of these calls with the ones done in
nfscl_cleanup_common().
Reported by: ler
Reviewed by: markj
Tested by: cy
MFC after: 1 week
Differential Revision: https://reviews.freebsd.org/D34334
When renaming a directory into a different parent directory, invalidate
the cached attributes of the new parent. Otherwise, stat will show the
wrong st_nlink value.
MFC after: 1 week
Reviewed by: ngie
Differential Revision: https://reviews.freebsd.org/D34336
This reverts commit dd08b84e35b6fdee0df5fd0e0533cd361dec71cb.
cy@ reported a problem caused by this patch. He will be
testing an alternate patch, but I'm reverting this one.
VOP_GETWRITEMOUNT() is called on the vn_start_write() path without any
vnode locks guaranteed to be held. It's therefore unsafe to blindly
access per-mount and per-vnode data. Instead, follow the approach taken
by nullfs and use the vnode interlock coupled with the hold count to
ensure the mount and the vnode won't be recycled while they are being
accessed.
Reviewed by: kib (earlier version), markj, pho
Tested by: pho
Differential Revision: https://reviews.freebsd.org/D34282
ler@, markj@ reported a use after free in nfscl_cleanupkext().
They also provided two possible causes:
- In nfscl_cleanup_common(), "own" is the owner string
owp->nfsow_owner. If we free that particular
owner structure, than in subsequent comparisons
"own" will point to freed memory.
- nfscl_cleanup_common() can free more than one owner, so the use
of LIST_FOREACH_SAFE() in nfscl_cleanupkext() is not sufficient.
I also believe there is a 3rd:
- If nfscl_freeopenowner() or nfscl_freelockowner() is called
without the NFSCLSTATE mutex held, this could race with
nfscl_cleanupkext().
This could happen when the exclusive lock is held
on the client, such as when delegations are being returned.
This patch fixes them as follows:
1 - Copy the owner string to a local variable before the
nfscl_cleanup_common() call.
2 - Modify nfscl_cleanup_common() to return whether or not a
free was done.
When a free was done, do a goto to restart the loop, instead
of using FOREACH_SAFE, which was not safe in this case.
3 - Acquire the NFSCLSTATE mutex in nfscl_freeopenowner()
and nfscl_freelockowner(), if it not already held.
This serializes all of these calls with the ones done in
nfscl_cleanup_common().
Reported by: ler
Reviewed by: markj
MFC after: 1 week
Differential Revision: https://reviews.freebsd.org/D34334
HugeSectors * BytesPerSec should be computed before converting
HugeSectors to a DEV_BSIZE-based count.
Fixes: ba2c98389b78 ("msdosfs: sanity check sector count from BPB")
Reviewed by: kib
MFC after: 1 week
Sponsored by: The FreeBSD Foundation
Differential Revision: https://reviews.freebsd.org/D34264
The ESXi NFSv4.1 client bogusly sends the wrong value
for the csa_sequence argument for a Create_session operation.
RFC8881 requires this value to be the same as the sequence
reply from the ExchangeID operation most recently done for
the client ID.
Without this patch, the server replies NFSERR_STALECLIENTID,
which is the correct response for an NFSv4.0 SetClientIDConfirm
but is not the correct error for NFSv4.1/4.2, which is
specified as NFSERR_SEQMISORDERED in RFC8881.
This patch fixes this.
This change does not fix the issue reported in the PR, where
the ESXi client loops, attempting ExchangeID/Create_session
repeatedly.
Reported by: asomers
Tested by: asomers
PR: 261291
MFC after: 1 week
FUSE file systems that do not set FUSE_NO_OPENDIR_SUPPORT do not
guarantee that d_off will be valid after closing and reopening a
directory. That conflicts with NFS's statelessness, that results in
unresolvable bugs when NFS reads large directories, if:
* The file system _does_ change the d_off field for the last directory
entry previously returned by VOP_READDIR, or
* The file system deletes the last directory entry previously seen by
NFS.
Rather than doing a poor job of exporting such file systems, it's better
just to refuse.
Even though this is technically a breaking change, 13.0-RELEASE's
NFS-FUSE support was bad enough that an MFC should be allowed.
MFC after: 3 weeks.
Reviewed by: rmacklem
Differential Revision: https://reviews.freebsd.org/D33726
In its lowest common denominator, FUSE does not require that a directory
entry's d_off field is valid outside of the lifetime of the directory's
FUSE file handle. But since NFS is stateless, it must reopen the
directory on every call to VOP_READDIR. That means reading the
directory all the way from the first entry. Not only does this create
an O(n^2) condition for large directories, but it can also result in
incorrect behavior if either:
* The file system _does_ change the d_off field for the last directory
entry previously seen by NFS, or
* The file system deletes the last directory entry previously seen by
NFS.
Handily, for file systems that set FUSE_NO_OPENDIR_SUPPORT d_off is
guaranteed to be valid for the lifetime of the directory entry, there is
no need to read the directory from the start.
MFC after: 3 weeks
Reviewed by: rmacklem
The FUSE protocol does not require that a directory entry's d_off field
outlive the lifetime of its directory's file handle. Since the NFS
server must reopen the directory on every VOP_READDIR call, that means
it can't pass uio->uio_offset down to the FUSE server. Instead, it must
read the directory from 0 each time. It may need to issue multiple
FUSE_READDIR operations until it finds the d_off field that it's looking
for. That was the intention behind SVN r348209 and r297887, but a logic
bug prevented subsequent FUSE_READDIR operations from ever being issued,
rendering large directories incompletely browseable.
MFC after: 3 weeks
Reviewed by: rmacklem
I see no apparent need to avoid waiting on the lock just because
vinactive() may be called on another thread while the thread that
cleared the vnode refcount has the lock dropped. In fact, this
can at least lead to a panic of the form "vn_lock: error <errno>
incompatible with flags" if LK_RETRY was passed to VOP_LOCK().
In this case LK_NOWAIT may cause the underlying FS to return an
error which is incompatible with LK_RETRY.
Reported by: pho
Reviewed by: kib, markj, pho
Differential Revision: https://reviews.freebsd.org/D34109
The unionfs root vnode will always share a lock with its lower vnode.
If unionfs was mounted with the 'below' option, this will also be the
vnode covered by the unionfs mount. During unmount, the covered vnode
will be locked by dounmount() while the unionfs root vnode will be
locked by vgone(). This effectively requires recursion on the same
underlying like, albeit through two different vnodes.
Reported by: pho
Reviewed by: kib, markj, pho
Differential Revision: https://reviews.freebsd.org/D34109
VOP_LOCK() may be handed a vnode that is concurrently reclaimed.
unionfs_lock() accounts for this by checking for empty vnode private
data under the interlock. But it incorrectly asserts that the vnode
is using the unionfs dispatch table before making this check.
Reverse the order, and also update KASSERT_UNIONFS_VNODE() to provide
more useful information.
Reported by: pho
Reviewed by: kib, markj, pho
Differential Revision: https://reviews.freebsd.org/D34109
Commit b0b7d978b6a8 changed the NFSv4 server's default
behaviour to check the file's mode or ACL for permission to
open the file, to be Linux and Solaris compatible.
However, it turns out that Linux makes an exception for
the case of Claim_delegate_cur(_fh).
When a NFSv4 client is returning a delegation, it must
acquire Opens against the server to replace the ones
done locally in the client. The client does this via
an Open operation with Claim_delegate_cur(_fh). If
this operation fails, due to a change to the file's
mode or ACL after the delegation was issued, the
client does not have any way to retain the open.
As such, the Linux client allows the file's owner
to perform an Open with Claim_delegate_cur(_fh)
no matter what the mode or ACL allows.
This patch makes the FreeBSD server allow this case,
to be Linux compatible.
This patch only affects the case where delegations
are enabled, which is not the default.
MFC after: 2 weeks
When allocating new vnode, we need to lock it exclusively before
making it externally visible. Since other threads cannot observe the
vnode yet, current lock order cannot create LoR conditions.
Reviewed by: mckusick
Sponsored by: The FreeBSD Foundation
MFC after: 1 week
Differential revision: https://reviews.freebsd.org/D34126
Also remove once-used functions to clean up after failed insmntque1(),
which were destructor callbacks in previous life.
Reviewed by: markj
Tested by: pho
Sponsored by: The FreeBSD Foundation
Differential revision: https://reviews.freebsd.org/D34071
This avoids a potentially wild reference to the mount object.
Additionally, simplify some of the checks around VV_ROOT in
unionfs_nodeget().
Reviewed by: kib
Differential Revision: https://reviews.freebsd.org/D33914