When an ACL is presented to the NFSv4 server in
Setattr or Verify, parsing of the ACL assumed a
sane acecnt and sane sizes for the "who" strings.
This patch adds sanity checks for these.
The patch also fixes handling of an error
return from nfsrv_dissectacl() for one broken
case.
Reported by: rtm@lcs.mit.edu
Tested by: rtm@lcs.mit.edu
PR: 260111
MFC after: 2 weeks
When nfsrv_checksequence() replies NFSERR_BADSLOT,
the value of nd_slotid is not valid. As such, the
reply cannot be cached in the session.
Do not set ND_HASSEQUENCE for this case.
Reported by: rtm@lcs.mit.edu
Tested by: rtm@lcs.mit.edu
PR: 260076
MFC after: 2 weeks
This prevents a kernel panic on a damaged ext2 superblock.
PR: 259107
Reported by: Robert Morris <rtm@lcs.mit.edu>
Differential Revision: https://reviews.freebsd.org/D33029
When using cached attributes, whether or not the data cache is enabled,
fusefs must update a file's atime whenever it reads from it, so long as
it wasn't mounted with -o noatime. Update it in-kernel, and flush it to
the server on close or during the next setattr operation.
The downside is that close() will now frequently trigger a FUSE_SETATTR
upcall. But if you care about performance, you should be using
-o noatime anyway.
MFC after: 2 weeks
Reviewed by: pfg
Differential Revision: https://reviews.freebsd.org/D33145
When copy_file_range extends a file, it must update the cached file
size.
MFC after: 2 weeks
Reviewed by: rmacklem, pfg
Differential Revision: https://reviews.freebsd.org/D33151
For most of these warnings, the variable is loaded
with data parsed out of an RPC messages. In case
the data is useful in the future, I just marked
these with __unused.
The LookupOpen RPC reduces the number of Open RPCs
needed. Unfortunately, it breaks certain software
builds over NFS, so disable it until this is fixed.
The LookupOpen RPC is only used for NFSv4.1/4.2
mounts when the "oneopenown" mount option is
specified, so this should not affect many users.
This allows to stop maintaing the VI_TEXT_REF flag and consequently
opens up fully lockless v_writecount adjustment.
Reviewed by: kib
Differential Revision: https://reviews.freebsd.org/D33127
The slotid in the Sequence reply must be the same as
in the request. Check that it is the same and log
a console message if it is not, plus set it to the
correct value.
Reported by: rtm@lcs.mit.edu
Tested by: rtm@lcs.mit.edu
PR: 260071
MFC after: 2 weeks
The check for the original len being >= retlen needs to
be done before the "if (nd->nd_repstat == 0)" code, so
that it can be reported as too small.
Reported by: rtm@lcs.mit.edu
Tested by: rtm@lcs.mit.edu
PR: 260046
MFC after: 2 weeks
For a LayoutReturn when using the Flexible File Layout,
error reports may be provided in the request.
Sanity check the size of these error reports and
check that they exist before calling nfsrv_flexlayouterr().
Reported by: rtm@lcs.mit.edu
Tested by: rtm@lcs.mit.edu
PR: 260012
MFC after: 2 weeks
This avoids needing to inspect the mount point every time.
Reviewed by: kib (previous version)
Differential Revision: https://reviews.freebsd.org/D33125
VOP_FSYNC() asserts that the vnode is exclusively locked for NFS.
If we try to execute file with recently modified content, the assert is
triggered.
Reviewed by: rmacklem
Tested by: pho
Sponsored by: The FreeBSD Foundation
MFC after: 1 week
Differential revision: https://reviews.freebsd.org/D32999
VOP_RMDIR() is called with both parent and child directory vnodes
locked. The relookup operation performed by the unionfs implementation
may relock both vnodes. Accordingly, unionfs_relookup() drops the
parent vnode lock, but the child vnode lock is never dropped.
Although relookup() will very likely try to relock the child vnode
which is already locked, in most cases this doesn't produce a deadlock
because unionfs_lock() forces LK_CANRECURSE (!). However, relocking
of the parent vnode while the child vnode remains locked effectively
reverses the expected parent->child lock order, which can produce
a deadlock e.g. in the presence of a concurrent unionfs_lookup()
operation. Address the issue by dropping the child lock around
the unionfs_relookup() call in unionfs_rmdir().
Reported by: pho
Reviewed by: kib, markj
Differential Revision: https://reviews.freebsd.org/D32986
For pNFS, Layouts are issued by the server to indicate
where a file's data resides on the DS(s). This patch
adds counters for how many layouts are allocated to
the nfsstatsv1 structure, using two reserved fields.
MFC after: 2 weeks
Require devvp locked for mntfs_freevp(), to have it locked around
vgone(). Make that true for ffs, which is the only consumer of
the interface.
Reported and tested by: pho
Reviewed by: markj
Sponsored by: The FreeBSD Foundation
MFC after: 1 week
Differential revision: https://reviews.freebsd.org/D32761
If a pNFS server's DS runs out of disk space, it replies
NFSERR_NOSPC to the client doing writing. For the Linux
client, it then sends a LayoutError RPC to the MDS server to
tell it about the error. This patch adds the same to the
FreeBSD NFSv4.2 pNFS client, to maintain Linux compatible
behaviour, particlularily for non-FreeBSD pNFS servers.
MFC after: 2 weeks
If a pNFS server's DS runs out of disk space, it replies
NFSERR_NOSPC to the client doing writing. For the Linux
client, it then sends a LayoutError RPC to the MDS server to
tell it about the error and keeps retrying, doing repeated
LayoutGets to the MDS and Write RPCs to the DS. The Linux client is
"stuck" until disk space on the DS is free'd up unless
a subsequent LayoutGet request is sent a NFSERR_NOSPC
reply.
The looping problem still occurs for NFSv4.1 mounts, but no
fix for this is known at this time.
This patch changes the pNFS MDS server to reply to LayoutGet
operations with NFSERR_NOSPC once a LayoutError reports the
problem, until the DS has available space. This keeps the Linux
NFSv4.2 from looping.
Found during recent testing because of issues w.r.t. a DS
being out of space found during a recent IEFT NFSv4 working
group testing event.
MFC after: 2 weeks
Since the NFS Space_available and Files_available are unsigned,
the NFSv3 server sets them to 0 when negative, so that they
do not appear to be large positive values for non-FreeBSD clients.
This patch fixes the NFSv4 server to do the same.
Found during a recent IEFT NFSv4 working group testing event.
MFC after: 2 weeks
If a pNFS server's DS runs out of disk space, it replies
NFSERR_NOSPC to the client doing writing. For the Linux
client, it then sends a LayoutError RPC to the server to
tell it about the error and keeps retrying, doing repeated
LayoutGet and Write RPCs to the DS. The Linux client is
"stuck" until disk space on the DS is free'd up.
For a mirrored server configuration, the first mirror that
ran out of space was taken offline. This does not make
much sense, since the other mirror(s) will run out of space
soon and the fix is a manual cleanup up disk space.
This patch changes the pNFS server to not disable a mirror
for the mirrored case when this occurs.
Further work is needed, since the Linux client expects the
MDS to reply NFSERR_NOSPC to LayoutGets once the DS is out
of space. Without this further change, the above mentioned
looping occurs.
Found during a recent IEFT NFSv4 working group testing event.
MFC after: 2 weeks
When the NFSv4.2 server does a VOP_ALLOCATE(), it needs
the operation to be done for the RPC's credential and not
td_ucred. It also needs the writing to be done synchronously.
This patch adds "ioflag" and "cred" arguments to VOP_ALLOCATE()
and modifies vop_stdallocate() to use these arguments.
The VOP_ALLOCATE.9 man page will be patched separately.
Reviewed by: khng, kib
Differential Revision: https://reviews.freebsd.org/D32865
Instead of validating that a vnode belongs to unionfs only when the
caller attempts to extract the upper or lower vnode pointers, do this
validation any time the caller tries to extract a unionfs_node from
the vnode private data.
Reviewed by: kib
Differential Revision: https://reviews.freebsd.org/D32629
The lower FS VOP_READDIR() shouldn't return an empty read without
setting EOF; don't try to handle this case only for non-DIAGNOSTIC
builds.
Noted by: kib
Reviewed by: kib
Differential Revision: https://reviews.freebsd.org/D32629
Add an assertion to unionfs_node_update() that the upper vnode is
exclusively locked; we already make the same assertion for the lower
vnode.
Also, assert in unionfs_noderem() that the vnode lock is not recursed
and acquire v_lock with LK_NOWAIT. Since v_lock is not the active
lock for the vnode at this point, it should not be contended.
Finally, remove VDIR assertions from unionfs_get_cached_vnode().
lvp/uvp will be referenced but not locked at this point, so v_type
may concurrently change due to vgonel(). The cached unionfs node,
if one exists, would only have made it into the cache if lvp/uvp
were of type VDIR at the time of insertion; the corresponding
VDIR assert in unionfs_ins_cached_vnode() should be safe because
lvp/uvp will be locked by that time and will not be used if either
is doomed.
Noted by: kib
Reviewed by: kib
Differential Revision: https://reviews.freebsd.org/D32629
--Clearing cached subdirectories in unionfs_noderem() should be done
under the vnode interlock
--When preparing to switch the vnode lock in both unionfs_node_update()
and unionfs_noderem(), the incoming lock should be acquired before
updating the v_vnlock field to point to it. Otherwise we effectively
break the locking contract for a brief window.
Reviewed by: kib
Differential Revision: https://reviews.freebsd.org/D32629
Although I was not able to cause a failure during testing, there
are places in nfscl_removedeleg() and nfscl_renamedeleg() where
I think a forced dismount could get hung. This patch fixes those.
This patch only affects forced dismount and only if the NFSv4
server is issuing delegations to the client.
Found by code inspection.
MFC after: 2 weeks
When a mount with the "pnfs" and "nconnect" options specified
does an I/O operation, it erroneously uses a TCP connection
to the MDS when it is meant to be a DS operation and, as such,
needs to use a TCP connection to the DS. This patch fixes this.
When the "pnfs" and "nconnect" options are specified for a
NFSv4.1/4.2 mount, there probably should be N connections
established to each DS for I/O RPCs. This is a fair amount
of work and may be done in a future commit.
This problem was found during a recent IETF NFSv4 working
group testing event.
MFC after: 2 weeks
Now that proc_get_binpath() does not return NULL in fullpath on success,
directly use sbuf_cat() over the value.
Sponsored by: The FreeBSD Foundation
MFC after: 3 days
When a forced dismount is in progress, it is possible to
end up looping, retrying commits that fail.
This patch fixes the problem by pretending
that commits succeeded when a forced dismount is in prgress.
MFC after: 2 weeks
When a forced dismount is done and the "nconnect" mount
option was used, the additional connections must be closed.
This patch does that.
Found during a recent IETF NFSv4 working group testing event.
MFC after: 2 weeks
When a forced dismount is done and delegations are being
issued by the server (disabled by default for FreeBSD
servers), the delegation structure is free'd before the
loop calling vflush(). This could result in a use after
free of the delegation structure.
This patch changes the code so that the delegation
structures are not free'd until after the vflush()
loop for forced dismounts.
Found during a recent IETF NFSv4 working group testing event.
MFC after: 2 weeks
The nfscl_getref() function is called within nfscl_doiods() when
the NFSv4.1/4.2 pNFS client is doing I/O on a DS. As such,
nfscl_getref() needs to check for a forced dismount.
This patch adds that check.
Found during a recent IETF NFSv4 working group testing event.
MFC after: 2 weeks
For NFS RPCs that receive a NFSERR_DELAY reply, the delay time
is initially 1sec and then increases exponentially to NFS_TRYLATERDEL.
It was found that this delay time is excessive for some NFSv4
servers, which work well with a 1msec delay.
A 1sec delay resulted in very slow performance for Remove and
Rename when delegations and pNFS were enabled.
This patch decreases the initial delay time to 1msec.
Found during a recent IETF NFSv4 working group testing event.
MFC after: 2 weeks
For pNFS servers that specify that Layouts are to be returned
upon close, they may expect that LayoutReturn to happen before
the associated Close.
This patch modifies the NFSv4.1/4.2 pNFS client so that this
is done. This only affects a pNFS mount against a non-FreeBSD
NFSv4.1/4.2 server that specifies return_on_close in LayoutGet
replies.
Found during a recent IETF NFSv4 working group testing event.
MFC after: 2 weeks
Use proc_get_binpath() to get the hardlink right.
PR: 248184
Reviewed by: emaste, markj
Sponsored by: The FreeBSD Foundation
MFC after: 1 week
Differential revision: https://reviews.freebsd.org/D32738
Similar to commit 2be417843a, I believe there could be a race between
the NFS client VOP_LOOKUP() and file Writing that could result in stale
file attributes being loaded into the NFS vnode by VOP_LOOKUP().
I have not been able to reproduce a failure due to this race, but
I believe that there are two possibilities:
The Lookup RPC happens while VOP_WRITE() is being executed and loads
stale file attributes after VOP_WRITE() returns when it has already
completed the Write/Commit RPC(s).
--> For this case, setting the local modify timestamp at the end of
VOP_WRITE() should ensure that stale file attributes are not loaded.
The Lookup RPC occurs after VOP_WRITE() has returned, while
asynchronous Write/Commit RPCs are in progress and then is
blocked by the vnode held by VOP_OPEN/VOP_CLOSE/VOP_FSYNC which
will flush writes via ncl_flush() or ncl_vinvalbuf(), clearing the
NMODIFIED flag (which indicates Writes-in-progress). The VOP_LOOKUP()
then acquires the NFS vnode lock and fills in stale file attributes.
--> Setting the local modify timestamp in ncl_flsuh() and ncl_vinvalbuf()
when they clear NMODIFIED should ensure that stale file attributes
are not loaded.
This patch does the above.
PR: 259071
Reviewed by: asomers
MFC after: 2 weeks
Differential Revision: https://reviews.freebsd.org/D32677
Commit 2be417843a added n_localmodtime, which is used by Lookup
and ReaddirPlus to check to see if the file attributes in an RPC
reply might be stale. This patch sets n_localmodtime in Deallocate.
Done as a separate commit, since Deallocate is not in stable/13.
PR: 259071
Reviewed by: asomers
Differential Revision: https://reviews.freebsd.org/D32635
Testing with it, there appears to be a race between Lookup
and VOPs like Setattr-of-size, where Lookup ends up loading
stale attributes (including what might be the wrong file size)
into the NFS vnode's attribute cache.
The race occurs when the modifying VOP (which holds a lock
on the vnode), blocks the acquisition of the vnode in Lookup,
after the RPC (with now potentially stale attributes).
Here's what seems to happen:
Child Parent
does stat(), which does
VOP_LOOKUP(), doing the Lookup
RPC with the directory vnode
locked, acquiring file attributes
valid at this point in time
blocks waiting for locked file does ftruncate(), which
vnode does VOP_SETATTR() of Size,
changing the file's size
while holding an exclusive
lock on the file's vnode
releases the vnode lock
acquires file vnode and fills in
now stale attributes including
the old wrong Size
does a read() which returns
wrong data size
This patch fixes the problem by saving a timestamp in the NFS vnode
in the VOPs that modify the file (Setattr-of-size, Allocate).
Then lookup/readdirplus compares that timestamp with the time just
before starting the RPC after it has acquired the file's vnode.
If the modifying RPC occurred during the Lookup, the attributes
in the RPC reply are discarded, since they might be stale.
With this patch the test program works as expected.
Note that the test program does not fail on a July stable/12,
although this race is in the NFS client code. I suspect a
fairly recent change to the name caching code exposed this
bug.
PR: 259071
Reviewed by: asomers
MFC after: 2 weeks
Differential Revision: https://reviews.freebsd.org/D32635
Commit 5e5ca4c8fc added a NFSMNTP_DELEGISSUED flag to indicate when
a delegation has been issued to the mount. For the common case
where an NFSv4 server is not issuing delegations, this flag
can be checked to avoid acquisition of the NFSCLSTATEMUTEX.
This patch adds checks for NFSMNTP_DELEGISSUED being set
to two more functions.
This change appears to be performance neutral for a small number
of opens, but should reduce lock contention for a large number of opens
for the common case where server is not issuing delegations.
MFC after: 2 week
There was a case in nfscl_doiods() where the function would return
without releasing the delegation shared lock, if it was aquired by
the call to nfscl_getstateid(). This patch adds that release.
I have never observed a failure due to this missing release, so I
do not know if it ever happens in practice. However, since the pNFS
client is not yet heavily used, it might be the case.
Found by code inspection during a recent NFSv4 IETF working group
testing event.
MFC after: 2 week
unionfs uses a per-directory hashtable to cache subdirectory nodes.
Currently this hashtable is looked up using the directory name, but
since unionfs nodes aren't removed from the cache until they're
reclaimed, this poses some problems. For example, if a directory is
created on a unionfs mount shortly after deleting a previous directory
with the same path, the cache may end up reusing the node for the
previous directory, including its upper/lower FS vnodes. Operations
against those vnodes with then likely fail because the vnodes
represent deleted files; for example UFS will reject VOP_MKDIR()
against such a vnode because its effective link count is 0. This may
then manifest as e.g. mkdir(2) or open(2) returning ENOENT for an
attempt to create a file under the re-created directory.
While it would be possible to fix this by explicitly managing the
name-based cache during delete or rename operations, or by rejecting
cache hits if the underlying FS vnodes don't match those passed to
unionfs_nodeget(), it seems cleaner to instead hash the unionfs nodes
based on their underlying FS vnodes. Since unionfs prefers to operate
against the upper vnode if one is present, the lower vnode will only
be used for hashing as long as the upper vnode is NULL. This should
also make hashing faster by eliminating string traversal and using
the already-computed hash index stored in each vnode.
While here, fix a couple of other cache-related issues:
--Remove 8 bytes of unnecessary baggage from each unionfs node by
getting rid of the stored hash mask field. The mask is knowable
at compile time.
--When a matching node is found in the cache, reference its vnode
using vrefl() while still holding the vnode interlock. Previously
unionfs_nodeget() would vref() the vnode after the interlock was
dropped, but the vnode may be reclaimed during that window. This
caused intermittent panics from vn_lock(9) during unionfs stress
testing.
Reviewed by: kib, markj
Tested by: pho
Differential Revision: https://reviews.freebsd.org/D32533