Use 'td' as the local thread name.
Wrap long lines.
Remove unneeded blank lines.
Reviewed by: asomers, jah, markj
Sponsored by: The FreeBSD Foundation
MFC after: 2 weeks
Differential revision: https://reviews.freebsd.org/D36625
This removes some of the complexity needed to maintain HASBUF and
allows for removing injecting SAVENAME by filesystems.
Reviewed by: kib (previous version)
Differential Revision: https://reviews.freebsd.org/D36542
Commit 40ada74ee1 modified the NFSv4.1/4.2 client so
that it would issue a DestroySession to the server when
all session slots are marked bad. This handles the
case where session slots get broken when "intr" or "soft"
NFSv4 fairly well.1/4.2 mounts are done.
There are two other cases where having an NFSv4.1/4.2
RPC attempt terminate without completion can leave
state in a non-determinate condition.
One is file locking RPCs. If the "nolockd" option is
used, this avoids file locking RPCs by doing locking
locally within the client.
The other is Open locks, but since all FreeBSD Open
locks are done with OPEN_SHARE_DENY_NONE, the locking
state for these should not be critical.
This patch enables use of "nolockd" for NFSv4 mounts,
so that it can be combined with "intr" and/or "soft",
making the latter more usable.
Use of "intr" or "soft" NFSv4 mounts are still not
recommended, but when combined with "nolockd" should
now work fairly well.
A man page update will be done as a separate commit.
MFC after: 2 weeks
Commit 40ada74ee1 modified the NFSv4.1/4.2 client so
that it would issue a DestroySession to the server when
all session slots are marked bad. Once this is done,
the Sequence operation should get a NFSERR_BADSESSION
reply from the server.
Without this patch, the code was setting ND_HASSLOTID
when, in fact, there was no slot marked in use by
nfsv4_sequencelookup(). This would result in the
code freeing a slot not in use. The effect of this
was minimal, since the session was already destroyed.
This patch fixes the code so that it does not set
ND_HASSLOTID for this case.
MFC after: 2 weeks
The NFSv4.1/4.2 client does recovery when it receives a
NFSERR_BADSESSION reply from the server. If the server has
not rebooted, this is often caused by multiple clients using
the same /etc/hostid and, as such, not being recognized as
different clients by the server.
This trivial patch adds a console message to suggest that
client's /etc/hostid's need to be checked for uniqueness.
MFC after: 2 weeks
The NFSv4.1/4.2 server generates a console message that indicates
that there is no session. I was until recently perplexed w.r.t. how
this could occur. It turns out that the common cause is multiple NFS
clients with the same /etc/hostid.
The host uuid is used by the FreeBSD NFSv4.1/4.2 client as a unique
identifier for the client. If multiple clients use the same host uuid,
this indicates to the NFSv4.1/4.2 server that they are the same client
and confusion occurs.
This trivial patch modifies the console message to suggest that the
client's /etc/hostid needs to be checked for uniqueness.
Reviewed by: asomers
MFC after: 2 weeks
Differential Revision: https://reviews.freebsd.org/D36377
When the NFSv4.1/4.2 client is handling a server error
of NFSERR_BADSESSION, it retries RPCs with a new session.
Without this patch, the nd_slotid was not being updated
for the new session.
This would result in a bogus console message like
"Wrong session srvslot=X slot=Y" and then it would
free the incorrect slot, often generating a
"freeing free slot!!" console message as well.
This patch fixes the problem.
Note that FreeBSD NFSv4.1/4.2 servers only
generate a NFSERR_BADSESSION error after a reboot
or after a client does a DestroySession operation.
PR: 260011
MFC after: 1 week
When the NFSv4.1/4.2 client is handling a server error
of NFSERR_BADSESSION, it retries RPCs with a new session.
Without this patch, the nd_slotid was not being updated
for the new session.
This would result in a bogus console message like
"Wrong session srvslot=X slot=Y" and then it would
free the incorrect slot, often generating a
"freeing free slot!!" console message as well.
This patch fixes the problem.
Note that FreeBSD NFSv4.1/4.2 servers only
generate a NFSERR_BADSESSION error after a reboot
or after a client does a DestroySession operation.
PR: 260011
MFC after: 1 week
When a session has been marked defunct by the server
sending a NFSERR_BADSESSION reply to the NFSv4.1/4.2
client, nfsv4_sequencelookup() returns NFSERR_BADSESSION
without actually assigning a session slot.
Without this patch, newnfs_request() would erroneously
free slot 0.
This could result in the slot being reused prematurely,
but most likely just generated a "freeing free slot!!"
console message.
This patch fixes the code to not do the erroneous
freeing of the slot for this case.
PR: 260011
MFC after: 1 week
It is only ever xlocked in drain_dev_clone_events and the only consumer of
that routine does not need it -- eventhandler code already makes sure the
relevant callback is no longer running.
Reviewed by: kib
Differential Revision: https://reviews.freebsd.org/D36268
Make most AST handlers dynamically registered. This allows to have
subsystem-specific handler source located in the subsystem files,
instead of making subr_trap.c aware of it. For instance, signal
delivery code on return to userspace is now moved to kern_sig.c.
Also, it allows to have some handlers designated as the cleanup (kclear)
type, which are called both at AST and on thread/process exit. For
instance, ast(), exit1(), and NFS server no longer need to be aware
about UFS softdep processing.
The dynamic registration also allows third-party modules to register AST
handlers if needed. There is one caveat with loadable modules: the
code does not make any effort to ensure that the module is not unloaded
before all threads processed through AST handler in it. In fact, this
is already present behavior for hwpmc.ko and ufs.ko. I do not think it
is worth the efforts and the runtime overhead to try to fix it.
Reviewed by: markj
Tested by: emaste (arm64), pho
Discussed with: jhb
Sponsored by: The FreeBSD Foundation
MFC after: 1 week
Differential revision: https://reviews.freebsd.org/D35888
Currently the cuse(3) mmap(2) offset is split into 128 banks of 16 Mbytes.
Allow cuse(3) to make allocations that span multiple banks at the expense
of any fragmentation issues that may arise. Typically mmap(2) buffers are
well below 16 Mbytes. This allows 8K video resolution to work using webcamd.
Reviewed by: markj @
Differential Revision: https://reviews.freebsd.org/D35830
MFC after: 1 week
Sponsored by: NVIDIA Networking
With clang 15, the following -Werror warnings is produced:
sys/fs/nfsclient/nfs_clkdtrace.c:544:19: error: a function declaration without a prototype is deprecated in all versions of C [-Werror,-Wstrict-prototypes]
dtnfsclient_unload()
^
void
This is because dtnfsclient_unload() is declared with a (void) argument
list, but defined with an empty argument list. Make the definition match
the declaration.
MFC after: 3 days
Keep the definition around since it's used by userspace.
Reviewed by: alc, imp, kib
Tested by: pho
Sponsored by: The FreeBSD Foundation
Differential Revision: https://reviews.freebsd.org/D35791
I mis-read the RFC w.r.t. handling of the sequenceid
when a CreateSession is done after the initial one
that confirms the ClientID. Fortunately this does
not affect most extant NFSv4.1/4.2 clients, since
they only acquire a single session for TCP for a
ClientID (Solaris might be an exception?).
This patch fixes the server to handle this case,
where the RFC requires the sequenceid be incremented
for each CreateSession and is required to reply to
a retried CreateSession with a cached reply.
It adds a field to nfsclient called lc_prevsess,
which caches the sessionid, which is the only field
in a CreateSession reply that will change for a
retry, to implement this reply cache.
The recent commits up to d4a11b3e3b that mark
session slots bad when "intr" and/or "soft" mounts
are used by the client needs this server patch.
Without this patch, the client will do a full
recovery, including a new ClientID, losing all
byte range locks. However, prior to the recent
client commits, the client would hang when all
session slots were bad, so even without this
patch it is not a regression.
PR: 260011
MFC after: 2 weeks
Commit 981ef32230 added optional use of the session
slots marked bad to recover a new session when all
slots are marked bad. The recovery worked against
a FreeBSD NFSv4.1/4.2 server, but not a Linux one.
It turns out that it was a bug in the FreeBSD client
and not the Linux server.
This patch fixes the client so that DeleteSession
followed by CreateSession after receiving a
NFSERR_BADSESSION error reply works against the
Linux server (and conforms to the RFC).
This also implies that the FreeBSD NFSv4.1/4.2
server needs to be fixed in a future commit.
Without the fix, the FreeBSD server does a full
recovery, including creation of a new ClientID,
but since "intr" mounts were broken, this does
not result in a regression.
This patch only affects the case where a CreateSession
is done for an already confirmed ClientID, which was
not being done prior to commit 981ef32230.
PR: 260011
MFC after: 2 weeks
Commit 326bcf9394 added a new "cred" argument to nfscl_reqstart().
Fsinfo is a NFSv3 RPC and since the "cred" argument is not
used for NFSv3, it does not matter what is passed in.
However, to be consistent with the rest of the patch, change the
argument to NULL.
This patch should not result in a semantics change.
PR: 260011
MFC after: 2 weeks
Commit 981ef32230 enabled marking of potentially bad
session slots when an RPC is interrupted if the "intr"
mount option is used. As such, it no longer makes
sense to call nfscl_hasexpired() for I/O operations that
reply NFSERR_BADSTATEID for NFSv4.1/4.2, which does a full
recovery of NFSv4 open state, destroying all byte range locks.
Recovery of open state should not be usually needed, since
the session slot has been marked potentially bad and,
although opens for the process that has been terminated via
a signal may be broken, locks for other processes will still
be valid.
This patch disables calls to nfscl_hasexpired for NFSv4.1/4.2
mounts, when I/O RPCs receive NFSERR_BADSTATEID replies.
It does not affect the behaviour of NFSv4.0 mounts nor
hard (non "intr") mounts.
PR: 260011
MFC after: 2 weeks
To deal with broken session slots caused by the use of the
"soft" and/or "intr" mount options, nfsv4_sequencelookup()
has been modified to track the potentially broken session
slots (commit 40ada74ee1). Then, when all session slots
are potentially broken, nfsv4_sequencelookup() does a
DeleteSession operation, so that the NFSv4.1/4.2 server will
reply NFSERR_BADSESSION to uses of the session.
The client will then recover by doing a CreateSession to
acquire a new session.
This patch adds the code that marks potentially bad
slots, so that the above semantics become functional.
It has been successfully tested against a FreeBSD
NFSv4.1/4.2 server, but does not work against a Linux 5.15
NFSv4.1/4.2 server. (The Linux 5.15 server creates
a new session with the same sessionid as the destroyed
one and, as such, keeps returning NFSERR_BADSESSION.
I believe this is a bug in the Linux server.)
However, this should not cause a regression and will
make "intr" mounts fairly usable against the NFSv4.1/4.2
servers where it works.
PR: 260011
MFC after: 2 weeks
This patch adds support for session slots marked bad
to nfsv4_sequencelookup(). An additional boolean
argument indicates if the check for slots marked bad
should be done.
The "cred" argument added to nfscl_reqstart() by
commit 326bcf9394 is now passed into nfsv4_setquence()
so that it can optionally set the boolean argument
for nfsv4_sequencelookup(). When optionally enabled,
nfsv4_setsequence() will do a DestroySession when all
slots are marked bad.
Since the code that marks slots bad is not yet committed,
this patch should not result in a semantics change.
PR: 260011
MFC after: 2 weeks
This patch moves nfsrpc_destroysession() into nfscommon.ko
and also modifies its arguments slightly. This will allow
the function to be called from nfsv4_sequencelookup() in
a future commit.
This patch should not result in a semantics change.
PR: 260011
MFC after: 2 weeks
Commit 326bcf9394 added a "cred" argument to nfscl_reqstart().
For the pNFS proxy calls on the server, the argument
should be "cred" instead of NULL.
This patch fixes this.
Since the argument is not yet used, this patch
should not result in a semantics change.
PR: 260011
MFC after: 2 weeks
To deal with broken session slots caused by the use of the
"soft" and/or "intr" mount options, nfsv4_sequencelookup()
will be modified to track the potentially broken session
slots. Then, when all session slots are potentially
broken, do a DeleteSession operation, so that the NFSv4
server will reply NFSERR_BADSESSION to uses of the session.
These changes will be done in future commits. However,
to do the DeleteSession RPC, a "cred" argument is needed
for nfscl_reqstart(). This patch adds this argument,
which is unused at this time. If the argument is NULL,
it indicates that DeleteSession should not be done
(usually because the RPC does not use sessions).
This patch should not cause any semantics change.
PR: 260011
MFC after: 2 weeks
Commit a7bb120f8b added a printf for the case where recovery
has not marked the session defunct by setting nfsess_defunct
to 1. It turns out that nfscl_hasexpired() calls
nfsrpc_setclient() directly, without setting nfsess_defunct.
This patch replaces the printf with code that sets
nfsess_defunct to 1 to handle this case.
If SIGTERM is issued to a process when it is doing I/O on
an "intr" mount, the NFSv4 server may reply NFSERR_BADSTATEID,
due to the Open being prematurely closed.
This can result in a call to nfscl_hasexpired() to do a
recovery.
This would explain at least one hang described in the PR.
PR: 260011
MFC after: 2 weeks
To allow for a dynamic page size on arm64 remove the static value from libcuse.
Differential Revision: https://reviews.freebsd.org/D35585
MFC after: 1 week
Sponsored by: NVIDIA Networking
The vnode_vtype() macro was used to make the code compatible
with Mac OSX, for the Mac OSX port.
For FreeBSD, this macro just obscured the code and, therefore,
use of the macro has been deleted by previous commits.
This commit deletes the, now unused, macro.
This commit should not result in a semantics change.
The vnode_vtype() macro was used to make the code compatible
with Mac OSX, for the Mac OSX port.
For FreeBSD, this macro just obscured the code, so
avoid using it to clean up the code.
This commit should not result in a semantics change.
The vnode_vtype() macro was used to make the code compatible
with Mac OSX, for the Mac OSX port.
For FreeBSD, this macro just obscured the code, so
avoid using it to clean up the code.
This commit should not result in a semantics change.
The vnode_vtype() macro was used to make the code compatible
with Mac OSX, for the Mac OSX port.
For FreeBSD, this macro just obscured the code, so
avoid using it to clean up the code.
This commit should not result in a semantics change.
The vfs_flags() macro was used to make the code compatible
with Mac OSX, for the Mac OSX port.
For FreeBSD, this macro just obscured the code, so
remove it to clean up the code.
This commit should not result in a semantics change.
The definition of "APPLE" was used by the Mac OSX port.
For FreeBSD, this definition is never used, so remove
the references to it to clean up the code.
This commit should not result in a semantics change.
The "void *stuff" (also called fstuff and dstuff) argument
was used by the Mac OSX port. For FreeBSD, this argument
is always NULL, so remove it to clean up the code.
This commit gets rid of "stuff" for assorted functions
defined in nfs_clrpcops.c and called in nfs_clvnops.c and
nfs_clstate.c.
This commit should not result in a semantics change.
The "void *stuff" (also called fstuff and dstuff) argument
was used by the Mac OSX port. For FreeBSD, this argument
is always NULL, so remove it to clean up the code.
This commit gets rid of "stuff" for assorted functions
defined in nfs_clrpcops.c and called in nfs_clvnops.c and
nfs_clvfsops.c. Future commits will do the same for other functions.
This commit should not result in a semantics change.
The "void *stuff" (also called fstuff and dstuff) argument
was used by the Mac OSX port. For FreeBSD, this argument
is always NULL, so remove it to clean up the code.
This commit gets rid of "stuff" for assorted functions
defined in nfs_clrpcops.c and called in nfs_clvnops.c.
Future commits will do the same for other functions.
This commit should not result in a semantics change.
The "void *stuff" (also called fstuff and dstuff) argument
was used by the Mac OSX port. For FreeBSD, this argument
is always NULL, so remove it to clean up the code.
This commit gets rid of "stuff" for assorted functions
defined in nfs_clrpcops.c and called in nfs_clvnops.c.
Future commits will do the same for other functions.
This commit should not result in a semantics change.
The "void *stuff" (also called fstuff and dstuff) argument
was used by the Mac OSX port. For FreeBSD, this argument
is always NULL, so remove it to clean up the code.
This commit gets rid of "stuff" for assorted functions
defined in nfs_clrpcops.c and called in nfs_clvnops.c.
Future commits will do the same for other functions.
This commit should not result in a semantics change.
The "void *stuff" (also called fstuff and dstuff) argument
was used by the Mac OSX port. For FreeBSD, this argument
is always NULL, so remove it to clean up the code.
This commit gets rid of "stuff" for assorted functions
defined in nfs_clrpcops.c and called in nfs_clvnops.c.
Future commits will do the same for other functions.
This commit should not result in a semantics change.
The "void *stuff" (also called fstuff and dstuff) argument
was used by the Mac OSX port. For FreeBSD, this argument
is always NULL, so remove it to clean up the code.
This commit gets rid of "stuff" for assorted functions
defined in nfs_clrpcops.c and called in nfs_clvnops.c.
Future commits will do the same for other functions.
This commit should not result in a semantics change.
null_nodeget() needs a valid mount point data, otherwise we might
race and dereference NULL.
Using MBF_NOWAIT makes non-forced unmount non-transparent for
vn_fullpath() over nullfs, but we make no guarantee that fullpath
calculation succeeds anyway.
Reported and tested by: pho
Reviewed by: jah
Sponsored by: The FreeBSD Foundation
MFC after: 1 week
Differential revision: https://reviews.freebsd.org/D35477
The "void *stuff" (also called fstuff and dstuff) argument
was used by the Mac OSX port. For FreeBSD, this argument
is always NULL, so remove it to clean up the code.
This commit gets rid of "stuff" for assorted functions
defined in nfs_clrpcops.c and called in nfs_clvnops.c.
Future commits will do the same for other functions.
This commit should not result in a semantics change.
When a NFSv4 byte range write lock is unlocked, all
data modifications need to be flushed to the server
to satisfy the coherency requirements for byte range
locking. However, if a write delegation for the
file is held by the client, flushing is not required,
since no other NFSv4 client can have the file NFSv4
Opened.
Found by inspection as suggested by a similar change
that was done to the Linux NFSv4 client.
Commits 3ad1e1c1ce and 57014f21e7 added a Lookup+Open
RPC for NFSv4.1/4.2, which can reduce the RPC count by
10-20% for some loads. This has now received a fair amount
of testing, so I think it is ok to enable it.
Note that the Lookup+Open RPC is only used when the
"oneopenown" mount option is specified. As such, this
change won't affect most NFSv4.1/4.2 mounts.
When a NFSv4.1/4.2 session to the NFS server (not a pNFS DS) is
replaced, the old session should always be marked defunct by
nfsess_defunct being set non-zero.
However, the hang reported by the PR suggests that this might
be the case.
This patch adds a printf() to indicate this has somehow happened.
PR: 260011
MFC after: 2 weeks
The NFSERR_BADSESSION reply from a NFSv4.1/4.2 server
is handled by newnfs_request(). It should not be handled
separately after newnfs_request() has returned.
These two cases were spotted during code inspection.
One of them should only redo what newnfs_request() already
did by the same "nfscl" thread. The other might have
resulted in recovery being done twice, but the code is
only used for "pnfs" mounts, so that would be rare.
Also, since NFSERR_BADSESSION should only be replied by
a server after the server reboots, this would be extremely
rare.
MFC after: 2 weeks
* If during FUSE_CREATE, FUSE_MKDIR, etc the server returns the same
inode number for the new file as for its parent directory, reject it.
Previously this would triggers a recurse-on-non-recursive lock panic.
* If during FUSE_LINK the server returns a different inode number for
the new name as for the old one, reject it. Obviously, that can't be
a hard link.
* If during FUSE_LOOKUP the server returns the same inode number for the
new file as for its parent directory, reject it. Nothing good can
come of this.
PR: 263662
Reported by: Robert Morris <rtm@lcs.mit.edu>
MFC after: 2 weeks
Reviewed by: pfg
Differential Revision: https://reviews.freebsd.org/D35128
Robert Morris reported that, if a client sends an absurdly
large Owner/OwnerGroup string, the kernel malloc() for the
large size string can block forever.
This patch adds a sanity limit for Owner/OwnerGroup string
length. Since the RFCs do not specify any limit and FreeBSD
can handle a group name greater than 1Kbyte, the limit is
set at a generous 10Kbytes.
Reported by: rtm@lcs.mit.edu
PR: 260546
MFC after: 2 weeks
When the MDS of a pNFS service receives an Open/Create
and the file already exists, it must do a Setattr of
size == 0. Without this patch, this was eroneously
done via a VOP_SETAATR() call, which would set the
length of the MDS file to 0 (which is already is,
since all data lives on the DSs).
This patch fixes the problem by doing a nfsvno_setattr()
instead of VOP_SETATTR(), which knows to do a proxied
Setattr on the DSs.
For a non-pNFS server, the change has no effect, since
nfsvno_setattr() only does a VOP_SETATTR() for that case.
This was found during a recent IETF NFSv4 testing event.
MFC after: 2 weeks
When the NFSv4.1/4.2 client is doing a pnfs mount to
mirrored DS(s), asynchronous threads are used to do the
RPCs against the DS(s) concurrently. If a DS is slow
to reply, it is possible for the "cred" to be free'd
before the asynchronous thread is done with it, causing
a panic/crash.
This patch fixes the problem by acquiring a refcount on
the "cred" while it is being used by the asynchronous thread
for a DS RPC. This bug was found during a recent IETF
NFSv4 testing event.
This bug only affects "pnfs" mounts to mirrored pNFS
servers.
MFC after: 2 weeks
Without this patch the NFSv4.1/4.2 server erroneously
always frees session slot zero for callbacks. This only
affects 4.1/4.2 mounts if the server has delegations
enabled or is a pNFS configuration. Even for those
cases, the effect is mainly to only use slot 0 for
callbacks, serializing all of them. There is a slight
chance that callbacks will fail if the client performs
them in a different order than received on the TCP
connection.
If this bug affects your server, you will see console
messages like:
newnfs_request: Bad session slot
This patch fixes the problem. Found during a recent
IETF NFSv4 testing event.
PR: 263728
MFC after: 2 weeks
Robert Morris reported that, for the case of SecinfoNoname
with the Parent option, providing a non-directory could
cause a crash.
This patch adds a sanity check for v_type == VDIR for
this case, to avoid the crash.
Reported by: rtm@lcs.mit.edu
PR: 260300
MFC after: 2 weeks
For IO_APPEND VOP_WRITE()s, the code first does a
Getattr RPC to acquire the file's size, before it
can do the Write RPC.
Although NFS does not have an append write operation,
an NFSv4 compound can use a Verify operation to check
that the client's notion of the file's size is
correct, followed by the Write operation.
This patch modifies the NFSv4 client to use an Appendwrite
RPC, which does a Verify to check the file's size before
doing the Write. This avoids the need for a Getattr RPC
to preceed this RPC and reduces the RPC count by half for
IO_APPEND writes, so long as the client knows the file's
size.
The nfsd structure was moved from the stack to be malloc()'d,
since the kernel stack limit was being exceeded.
While here, fix the types of a few variables, although
there should not be any semantics change caused by these
type changes.
The daemon can specify fsname=XXX in its mount options. If so, the file
system should report f_mntfromname as XXX during statfs. This will show
up in the output of commands like mount and df.
Submitted by: Ali Abdallah <ali.abdallah@suse.com>
MFC after: 2 weeks
Differential Revision: https://reviews.freebsd.org/D35090
Prior to fuse protocol version 7.9, the fuse_entry_out structure had a
smaller size. But fuse_vnop_create did not take that into account when
working with servers that use older protocols. The bug does not matter
for servers which don't use file handles or open flags (the only fields
affected).
PR: 263625
Submitted by: Ali Abdallah <ali.abdallah@suse.com>
MFC after: 2 weeks
During a FUSE_WRITE, the kernel requests the server to write a certain
amount of data, and the server responds with the amount that it actually
did write. It is obviously an error for the server to write more than
it was provided, and we always treated it as such, but there were two
problems:
* If the server responded with a huge amount, greater than INT_MAX, it
would trigger an integer overflow which would cause a panic.
* When extending the file, we wrongly set the file's size before
validing the amount written.
PR: 263263
Reported by: Robert Morris <rtm@lcs.mit.edu>
MFC after: 2 weeks
Sponsored by: Axcient
Reviewed by: emaste
Differential Revision: https://reviews.freebsd.org/D34955
The "void *stuff" (also called fstuff and dstuff) argument
was used by the Mac OSX port. For FreeBSD, this argument
is always NULL, so remove it to clean up the code.
This commit gets rid of "stuff" for assorted functions
local to nfs_clrpcops.c.
Future commits will do the same for other functions.
Formerly fusefs would pass up the stack any error value returned by the
fuse server. However, some values aren't valid for userland, but have
special meanings within the kernel. One of these, EJUSTRETURN, could
cause a kernel page fault if the server returned it in response to
FUSE_LOOKUP. Fix by validating all errors returned by the server.
Also, fix a data lifetime bug in the FUSE_DESTROY test.
PR: 263220
Reported by: Robert Morris <rtm@lcs.mit.edu>
MFC after: 3 weeks
Sponsored by: Axcient
Reviewed by: emaste
Differential Revision: https://reviews.freebsd.org/D34931
The "void *stuff" (also called fstuff and dstuff) argument
was used by the Mac OSX port. For FreeBSD, this argument
is always NULL, so remove it to clean up the code.
This commit gets rid of "stuff" for nfscl_nget().
Future commits will do the same for other functions.
The "void *stuff" (also called fstuff and dstuff) argument
was used by the Mac OSX port. For FreeBSD, this argument
is always NULL, so remove it to clean up the code.
This commit gets rid of "stuff" for nfscl_loadattrcache().
Future commits will do the same for other functions.
The "void *stuff" (also called fstuff and dstuff) argument
was used by the Mac OSX port. For FreeBSD, this argument
is always NULL, so remove it to clean up the code.
This commit gets rid of "stuff" for nfscl_request().
Future commits will do the same for other functions.
The "void *stuff" (also called fstuff and dstuff) argument
was used by the Mac OSX port. For FreeBSD, this argument
is always NULL, so remove it to clean up the code.
This commit gets rid of "stuff" for nfscl_postop_attr().
Future commits will do the same for other functions.
For IO_APPEND VOP_WRITE()s, the code first does a
Getattr RPC to acquire the file's size, before it
can do the Write RPC.
Although NFS does not have an append write operation,
an NFSv4 compound can use a Verify operation to check
that the client's notion of the file's size is
correct, followed by the Write operation.
This patch modifies nfscl_wcc_data() to optionally
acquire the file's size, for use with an AppendWrite.
Although the "stuff" arguments are always NULL
(these were used for the Mac OSX port and should be
cleared out someday), make the argument to
nfscl_wcc_data() explicitly NULL for clarity.
This patch does not cause any semantics change until
the AppendWrite is added in a future commit.
* We never send FUSE_LOOKUP for the root inode, since its inode number
is hard-coded to 1. Therefore, we should not send FUSE_FORGET for it,
lest the server see its lookup count fall below 0.
* During VOP_RECLAIM, if we are reclaiming the root inode, we must clear
the file system's vroot pointer. Otherwise it will be left pointing
at a reclaimed vnode, which will cause future VOP_LOOKUP operations to
fail. Previously we only cleared that pointer during VFS_UMOUNT. I
don't know of any real-world way to trigger this bug.
MFC after: 2 weeks
Reviewed by: pfg
Differential Revision: https://reviews.freebsd.org/D34753
For IO_APPEND VOP_WRITE()s, the code first does a
Getattr RPC to acquire the file's size, before it
can do the Write RPC.
Although NFS does not have an append write operation,
an NFSv4 compound can use a Verify operation to check
that the client's notion of the file's size is
correct before doing the Write operation.
This patch prepares the NFSv4 client for such an
RPC, which will be added in a future commit.
This patch does not cause any semantics change.
Commit 867c27c23a modified the NFS client so that
it did IO_APPEND writes directly to the NFS server
bypassing the buffer cache, via a call to
nfs_directio_write(). Unfortunately, this (very old)
function assumed that the uio iov was for user space
addresses. As such, a IO_APPEND VOP_WRITE() that
was for system space, such as ktrace(1) does, would
write bogus data.
This patch fixes nfs_directio_write() so that it
handles kernel space uio iovs.
Reported by: bz
Tested by: bz
MFC after: 2 weeks
A NFSv4.1/4.2 pNFS mount needs to do a
separate Open+LayoutGet RPC, so do not do
a Lookup+Open RPC for these mounts.
The Lookup+Open RPCs are still disabled,
until further testing is done, so this patch
has no effect at this time.
Use of the Lookup+Open RPC is currently disabled,
due to a problem detected during testing. This
patch fixes this problem. The problem was that
nfscl_postop_attr() does not parse the attributes
if nd_repstat != 0. It also would parse the
return status for the operation, where the
Lookup+Open code had already parsed it.
The first change in the patch does not make any
semantics change, but makes the code identical
to what is done later in the function, so that
it is apparent that the semantics should be the
same in both places.
Lookup+Open remains disabled while further
testing is being done, so this patch has no
effect at this time.
The Fsinfo RPC is exempt from the check for
Kerberized NFS being required, as recommended
by RFC2623. However, there is no reason to
exempt Fsinfo from the requirement to use TLS.
This patch fixes the code so that the exemption
only applies to Kerberized NFS and not
NFS-over-TLS.
This only affects NFS-over-TLS for an NFSv3
mount when it is required, but the client does
not do so.
MFC after: 1 month
ler@, markj@ reported a use after free in nfscl_cleanupkext().
They also provided two possible causes:
- In nfscl_cleanup_common(), "own" is the owner string
owp->nfsow_owner. If we free that particular
owner structure, than in subsequent comparisons
"own" will point to freed memory.
- nfscl_cleanup_common() can free more than one owner, so the use
of LIST_FOREACH_SAFE() in nfscl_cleanupkext() is not sufficient.
I also believe there is a 3rd:
- If nfscl_freeopenowner() or nfscl_freelockowner() is called
without the NFSCLSTATE mutex held, this could race with
nfscl_cleanupkext().
This could happen when the exclusive lock is held
on the client, such as when delegations are being returned
or when recovering from NFSERR_EXPIRED.
This patch fixes them as follows:
1 - Copy the owner string to a local variable before the
nfscl_cleanup_common() call.
2 - Modify nfscl_cleanup_common() so that it will never free more
than the first matching element. Normally there should only
be one element in each list with a matching open/lock owner
anyhow (but there might be a bug that results in a duplicate).
This should guarantee that the FOREACH_SAFE loops in
nfscl_cleanupkext() are adequate.
3 - Acquire the NFSCLSTATE mutex in nfscl_freeopenowner()
and nfscl_freelockowner(), if it is not already held.
This serializes all of these calls with the ones done in
nfscl_cleanup_common().
Reported by: ler
Reviewed by: markj
Tested by: cy
MFC after: 1 week
Differential Revision: https://reviews.freebsd.org/D34334
When renaming a directory into a different parent directory, invalidate
the cached attributes of the new parent. Otherwise, stat will show the
wrong st_nlink value.
MFC after: 1 week
Reviewed by: ngie
Differential Revision: https://reviews.freebsd.org/D34336
VOP_GETWRITEMOUNT() is called on the vn_start_write() path without any
vnode locks guaranteed to be held. It's therefore unsafe to blindly
access per-mount and per-vnode data. Instead, follow the approach taken
by nullfs and use the vnode interlock coupled with the hold count to
ensure the mount and the vnode won't be recycled while they are being
accessed.
Reviewed by: kib (earlier version), markj, pho
Tested by: pho
Differential Revision: https://reviews.freebsd.org/D34282
ler@, markj@ reported a use after free in nfscl_cleanupkext().
They also provided two possible causes:
- In nfscl_cleanup_common(), "own" is the owner string
owp->nfsow_owner. If we free that particular
owner structure, than in subsequent comparisons
"own" will point to freed memory.
- nfscl_cleanup_common() can free more than one owner, so the use
of LIST_FOREACH_SAFE() in nfscl_cleanupkext() is not sufficient.
I also believe there is a 3rd:
- If nfscl_freeopenowner() or nfscl_freelockowner() is called
without the NFSCLSTATE mutex held, this could race with
nfscl_cleanupkext().
This could happen when the exclusive lock is held
on the client, such as when delegations are being returned.
This patch fixes them as follows:
1 - Copy the owner string to a local variable before the
nfscl_cleanup_common() call.
2 - Modify nfscl_cleanup_common() to return whether or not a
free was done.
When a free was done, do a goto to restart the loop, instead
of using FOREACH_SAFE, which was not safe in this case.
3 - Acquire the NFSCLSTATE mutex in nfscl_freeopenowner()
and nfscl_freelockowner(), if it not already held.
This serializes all of these calls with the ones done in
nfscl_cleanup_common().
Reported by: ler
Reviewed by: markj
MFC after: 1 week
Differential Revision: https://reviews.freebsd.org/D34334
HugeSectors * BytesPerSec should be computed before converting
HugeSectors to a DEV_BSIZE-based count.
Fixes: ba2c98389b ("msdosfs: sanity check sector count from BPB")
Reviewed by: kib
MFC after: 1 week
Sponsored by: The FreeBSD Foundation
Differential Revision: https://reviews.freebsd.org/D34264
The ESXi NFSv4.1 client bogusly sends the wrong value
for the csa_sequence argument for a Create_session operation.
RFC8881 requires this value to be the same as the sequence
reply from the ExchangeID operation most recently done for
the client ID.
Without this patch, the server replies NFSERR_STALECLIENTID,
which is the correct response for an NFSv4.0 SetClientIDConfirm
but is not the correct error for NFSv4.1/4.2, which is
specified as NFSERR_SEQMISORDERED in RFC8881.
This patch fixes this.
This change does not fix the issue reported in the PR, where
the ESXi client loops, attempting ExchangeID/Create_session
repeatedly.
Reported by: asomers
Tested by: asomers
PR: 261291
MFC after: 1 week
FUSE file systems that do not set FUSE_NO_OPENDIR_SUPPORT do not
guarantee that d_off will be valid after closing and reopening a
directory. That conflicts with NFS's statelessness, that results in
unresolvable bugs when NFS reads large directories, if:
* The file system _does_ change the d_off field for the last directory
entry previously returned by VOP_READDIR, or
* The file system deletes the last directory entry previously seen by
NFS.
Rather than doing a poor job of exporting such file systems, it's better
just to refuse.
Even though this is technically a breaking change, 13.0-RELEASE's
NFS-FUSE support was bad enough that an MFC should be allowed.
MFC after: 3 weeks.
Reviewed by: rmacklem
Differential Revision: https://reviews.freebsd.org/D33726
In its lowest common denominator, FUSE does not require that a directory
entry's d_off field is valid outside of the lifetime of the directory's
FUSE file handle. But since NFS is stateless, it must reopen the
directory on every call to VOP_READDIR. That means reading the
directory all the way from the first entry. Not only does this create
an O(n^2) condition for large directories, but it can also result in
incorrect behavior if either:
* The file system _does_ change the d_off field for the last directory
entry previously seen by NFS, or
* The file system deletes the last directory entry previously seen by
NFS.
Handily, for file systems that set FUSE_NO_OPENDIR_SUPPORT d_off is
guaranteed to be valid for the lifetime of the directory entry, there is
no need to read the directory from the start.
MFC after: 3 weeks
Reviewed by: rmacklem
The FUSE protocol does not require that a directory entry's d_off field
outlive the lifetime of its directory's file handle. Since the NFS
server must reopen the directory on every VOP_READDIR call, that means
it can't pass uio->uio_offset down to the FUSE server. Instead, it must
read the directory from 0 each time. It may need to issue multiple
FUSE_READDIR operations until it finds the d_off field that it's looking
for. That was the intention behind SVN r348209 and r297887, but a logic
bug prevented subsequent FUSE_READDIR operations from ever being issued,
rendering large directories incompletely browseable.
MFC after: 3 weeks
Reviewed by: rmacklem
I see no apparent need to avoid waiting on the lock just because
vinactive() may be called on another thread while the thread that
cleared the vnode refcount has the lock dropped. In fact, this
can at least lead to a panic of the form "vn_lock: error <errno>
incompatible with flags" if LK_RETRY was passed to VOP_LOCK().
In this case LK_NOWAIT may cause the underlying FS to return an
error which is incompatible with LK_RETRY.
Reported by: pho
Reviewed by: kib, markj, pho
Differential Revision: https://reviews.freebsd.org/D34109
The unionfs root vnode will always share a lock with its lower vnode.
If unionfs was mounted with the 'below' option, this will also be the
vnode covered by the unionfs mount. During unmount, the covered vnode
will be locked by dounmount() while the unionfs root vnode will be
locked by vgone(). This effectively requires recursion on the same
underlying like, albeit through two different vnodes.
Reported by: pho
Reviewed by: kib, markj, pho
Differential Revision: https://reviews.freebsd.org/D34109
VOP_LOCK() may be handed a vnode that is concurrently reclaimed.
unionfs_lock() accounts for this by checking for empty vnode private
data under the interlock. But it incorrectly asserts that the vnode
is using the unionfs dispatch table before making this check.
Reverse the order, and also update KASSERT_UNIONFS_VNODE() to provide
more useful information.
Reported by: pho
Reviewed by: kib, markj, pho
Differential Revision: https://reviews.freebsd.org/D34109
Commit b0b7d978b6 changed the NFSv4 server's default
behaviour to check the file's mode or ACL for permission to
open the file, to be Linux and Solaris compatible.
However, it turns out that Linux makes an exception for
the case of Claim_delegate_cur(_fh).
When a NFSv4 client is returning a delegation, it must
acquire Opens against the server to replace the ones
done locally in the client. The client does this via
an Open operation with Claim_delegate_cur(_fh). If
this operation fails, due to a change to the file's
mode or ACL after the delegation was issued, the
client does not have any way to retain the open.
As such, the Linux client allows the file's owner
to perform an Open with Claim_delegate_cur(_fh)
no matter what the mode or ACL allows.
This patch makes the FreeBSD server allow this case,
to be Linux compatible.
This patch only affects the case where delegations
are enabled, which is not the default.
MFC after: 2 weeks
When allocating new vnode, we need to lock it exclusively before
making it externally visible. Since other threads cannot observe the
vnode yet, current lock order cannot create LoR conditions.
Reviewed by: mckusick
Sponsored by: The FreeBSD Foundation
MFC after: 1 week
Differential revision: https://reviews.freebsd.org/D34126
Also remove once-used functions to clean up after failed insmntque1(),
which were destructor callbacks in previous life.
Reviewed by: markj
Tested by: pho
Sponsored by: The FreeBSD Foundation
Differential revision: https://reviews.freebsd.org/D34071
This avoids a potentially wild reference to the mount object.
Additionally, simplify some of the checks around VV_ROOT in
unionfs_nodeget().
Reviewed by: kib
Differential Revision: https://reviews.freebsd.org/D33914
The function nfscl_getcookie(), which is essentially the
same as ncl_getcookie(), is never called, so delete it.
This is probably cruft left over from the port of the
NFSv4 code to FreeBSD several years ago.
Found while modifying the code to better use the
directory offset cookies.
MFC after: 2 weeks
I was somehow convinced that insmntque calls insmntque1 with a NULL
destructor. Unfortunately this worked well enough to not immediately
blow up in simple testing.
Keep not using the destructor in previously patched filesystems though
as it avoids unnecessary casts.
Noted by: kib
Reported by: pho
do_execve() will hold the vnode lock shared when it calls VOP_OPEN(),
but unionfs_open() requires the lock to be held exclusive to
correctly synchronize node status updates. This requirement is
asserted in unionfs_get_node_status().
Change unionfs_open() to temporarily upgrade the lock as is already
done in unionfs_close(). Related to this, fix various cases throughout
unionfs in which vnodes are not checked for reclamation following lock
upgrades that may have temporarily dropped the lock. Also fix another
related issue in which unionfs_lock() can incorrectly add LK_NOWAIT
during a downgrade operation, which trips a lockmgr assertion.
Reviewed by: kib (prior version), markj, pho
Reported by: pho
Differential Revision: https://reviews.freebsd.org/D33729
The UFS and ZFS file systems only support Allow/Deny ACEs
in the NFSv4 ACLs. This patch does not allow the server
to parse Audit/Alarm ACEs. The NFSv4 client is still
allowed to pase Audit/Alarm ACEs, since non-FreeBSD NFSv4
servers may use them.
This patch should not have a significant effect, since the
UFS and ZFS file systems will not handle these ACEs anyhow.
It simply serves as an additional "safety belt" for the
NFSv4 server.
MFC after: 2 weeks
This reverts commit 0fa074b53e.
I now see that the implementation of the "dacl" operation
requires that the NFSv4 server to "automatic inheritance"
and I do not plan on doing this. As such, this patch is
harmless, but unneeded.
This reverts commit f10dc28ec2.
The client should still be able to getfacl
audit and alarm ACEs, for non-FreeBSD NFSv4 servers.
A patch that only disables audit/alarm for the server
side will be committed to replace this patch.
Before this callouts were scheduled twice a seconds even if nfsd was
never used. This reduces the rate to ~1Hz and only after nfsd first
started.
MFC after: 2 weeks
to prevent races with devfs VCHR vnode reclamation, same as it was
done for UFS.
Reported by: pho
Reviewed by: markj
Sponsored by: The FreeBSD Foundation
MFC after: 1 week
Differential revision: https://reviews.freebsd.org/D33721
A function to remount the filesystem from rw to ro on integrity error.
The work is performed in taskqueue to allow the call to be done from
almost arbitrary context where erronous state was detected.
Tested by: pho
Reviewed by: markj, mckusick
Sponsored by: The FreeBSD Foundation
MFC after: 1 week
Differential revision: https://reviews.freebsd.org/D33721
We use sector count to size the FAT inuse bitset. If sector count is
corrupted, kernel might be tricked into doing unbound allocation.
Ensure that the sector count does not exceed the actual volume size.
In collaboration with: pho
Reviewed by: markj, mckusick
Sponsored by: The FreeBSD Foundation
MFC after: 1 week
Differential revision: https://reviews.freebsd.org/D33721
Change its return type to void, because its result is ignored in both
call sites. Remove oldcnp argument as well, it is NULL always.
Suggested and reviewed by: markj
Sponsored by: The FreeBSD Foundation
MFC after: 1 week
Differential revision: https://reviews.freebsd.org/D33721
This means that filesystem is corrupted, there is a loop.
In collaboration with: pho
Reviewed by: markj, mckusick
Sponsored by: The FreeBSD Foundation
MFC after: 1 week
Differential revision: https://reviews.freebsd.org/D33721
In other words, stop silently accepting freeing free cluster in
non-debug kernels, but return the error to the caller. Modify callers
to handle errors from usemap_free().
In collaboration with: pho
Reviewed by: markj, mckusick
Sponsored by: The FreeBSD Foundation
MFC after: 1 week
Differential revision: https://reviews.freebsd.org/D33721
It is possible, on the corrupted msdosfs volume, to have file which
denode inode number does not match the one calculated using directory
cluster. Instead of asserting the condition as impossible, handle it
and return error, after reclaiming the aliased vnode.
In collaboration with: pho
Reviewed by: markj, mckusick
Sponsored by: The FreeBSD Foundation
MFC after: 1 week
Differential revision: https://reviews.freebsd.org/D33721
kib@ reported a problem which was resolved by
reverting commit 867c27c23a, which changed the NFS
client to use direct RPCs to the server for
IO_APPEND writes. He also spotted that the
code only invalidated buffer cache buffers
when they were marked NMODIFIED (had been
written into).
This patch modifies the NFS VOP_WRITE() to
always invalidate the buffer cache buffers
and pages for the file when IO_APPEND is
specified. It also includes some cleanup
suggested by kib@.
Reported by: kib
Tested by: kib
Reviewed by: kib
MFC after: 10 weeks
The implementation simply passes the text ref to the appropriate
underlying vnode. Without this, the default [un]set_text
implementation will only manage the text ref on the unionfs vnode,
causing it to be out of sync with the underlying filesystems and
potentially allowing corruption of executable file contents.
On INVARIANTS kernels, it also readily produces a panic on process
termination because the VM object representing the executable mapping
is backed by the underlying vnode, not the unionfs vnode.
PR: 251342
Reviewed by: kib
Differential Revision: https://reviews.freebsd.org/D33611
Use atomics to track the writecount granted to the underlying FS,
and avoid holding the vnode interlock while calling the underling FS'
VOP_ADD_WRITECOUNT(). This also fixes a WITNESS warning about nesting
the same lock type. Also add comments explaining why we need to track
the writecount on the unionfs vnode in the first place. Finally,
simplify writecount management to only use the upper vnode and assert
that we shouldn't have an active writecount on the lower vnode through
unionfs.
Reviewed by: kib
Differential Revision: https://reviews.freebsd.org/D33611
Now posix_fallocate will be correctly forwarded to fuse file system
servers, for those that support it.
MFC after: 2 weeks
Reviewed by: pfg
Differential Revision: https://reviews.freebsd.org/D33389
By default, FUSE file systems are assumed not to support lookups for "."
and "..". They must opt-in to that. To cope with this limitation, the
fusefs kernel module caches every fuse vnode's parent's inode number,
and uses that during VOP_LOOKUP for "..". But if the parent's vnode has
been reclaimed that won't be possible. Previously we paniced in this
situation. Now, we'll return ESTALE instead. Or, if the file system
has opted into ".." lookups, we'll just do that instead.
This commit also fixes VOP_LOOKUP to respect the cache timeout for ".."
lookups, if the FUSE file system specified a finite timeout.
PR: 259974
MFC after: 2 weeks
Reviewed by: pfg
Differential Revision: https://reviews.freebsd.org/D33239
If FUSE_COPY_FILE_RANGE returns successfully, update the atime of the
source and the mtime and ctime of the destination.
MFC after: 2 weeks
Reviewers: pfg
Differential Revision: https://reviews.freebsd.org/D33159
VOPs like VOP_SETATTR can change a file's size, with the vnode
exclusively locked. But VOPs like VOP_LOOKUP look up the file size from
the server without the vnode locked. So a race is possible. For
example:
1) One thread calls VOP_SETATTR to truncate a file. It locks the vnode
and sends FUSE_SETATTR to the server.
2) A second thread calls VOP_LOOKUP and fetches the file's attributes from
the server. Then it blocks trying to acquire the vnode lock.
3) FUSE_SETATTR returns and the first thread releases the vnode lock.
4) The second thread acquires the vnode lock and caches the file's
attributes, which are now out-of-date.
Fix this race by recording a timestamp in the vnode of the last time
that its filesize was modified. Check that timestamp during VOP_LOOKUP
and VFS_VGET. If it's newer than the time at which FUSE_LOOKUP was
issued to the server, ignore the attributes returned by FUSE_LOOKUP.
PR: 259071
Reported by: Agata <chogata@moosefs.pro>
Reviewed by: pfg
MFC after: 2 weeks
Differential Revision: https://reviews.freebsd.org/D33158
Add functionality for extents validation inside the filesystem
extents block. The main logic is implemented under
ext4_validate_extent_entries() function, which verifies extents
or extents indexes depending of extent depth value.
PR: 259112
Reported by: Robert Morris
Reviewed by: pfg
MFC after: 2 weeks
Differential Revision: https://reviews.freebsd.org/D33375
Rename ext2_dirbadentry() to ext2_check_direntry(). Add directory
entry inode value check, and call ext2_check_direntry() in all cases.
The dirchk sysctl is removed.
PR: 259024,259041
Reported by: Robert Morris
Reviewed by: pfg
MFC after: 2 weeks
Differential Revision: https://reviews.freebsd.org/D33374
FreeBSD only supports Allow/Deny ACEs in NFSv4 ACLs.
As such, it does not make sense to parse Audit/Alarm
ACEs. Modify nfsrv_dissectace() so that it returns
NFSERR_ATTRNOTSUPP if an Audit/Alarm ACE is found in
the ACL being parsed. The code has been #ifdef notnow'd,
since Audit/Alarm ACEs might be supported someday.
This should not have significant impact, since FreeBSD
reports to clients that only Allow/Deny ACEs are
supported and an attempt to set one would have failed
anyhow.
MFC after: 2 weeks
NFSv4.1/4.2 has an alternative to the acl attribute, called
dacl, that includes support for the ACL_ENTRY_INHERITED flag,
called NFSV4ACE_INHERITED in NFSv4.
This patch adds a dacl argument to nfsrv_buildacl(),
nfsrv_dissectacl() and nfsrv_dissectace(), so that they
will handle NFSV4ACE_INHERITED when dacl == true.
Since these functions are always called with dacl == false
for this patch, semantics should not have changed.
A future patch will add support for dacl.
MFC after: 2 weeks
I thought that these new auth_stat values had been agreed
upon by the IETF NFSv4 working group, but that no longer
is the case. As such, delete them and use AUTH_TOOWEAK
instead. Leave the code that uses these new auth_stat
values in the sources #ifdef notnow, in case they are
defined in the future.
MFC after: 1 week
Commit 867c27c23a modified the NFS client so that
it does IO_APPEND writes directly to the NFS server,
bypassing the buffer cache. However, this could result
in stale data in client pages when the file is mmap(2)'d.
As such, the NFS client needs to call vm_object_is_active()
to check if the file is mmap(2)'d and only do direct
output if the file is not mmap(2)'d.
This patch adds this check.
Although a simple patch, I have given it a long MFC,
since the related commit 867c27c23a made a significant
semantics change and, as such, has a long MFC.
MFC after: 3 months
Commit 867c27c23a enabled the n_directio_opens code
in open/close, which sets/clears NNONCACHE, for
IO_APPEND. This code should not be enabled unless
newnfs_directio_enable is non-zero.
This patch reverts that part of commit 867c27c23a.
A future patch that fixes the case where the
file that is being written IO_APPEND is mmap()'d.
MFC after: 3 months
The cookies argument is only used by the NFS server. NFSv2 defines the
cookie as 32 bits on the wire, but NFSv3 increased it to 64 bits. Our
VOP_READDIR, however, has always defined it as u_long, which is 32 bits
on some architectures. Change it to 64 bits on all architectures. This
doesn't matter for any in-tree file systems, but it matters for some
FUSE file systems that use 64-bit directory cookies.
PR: 260375
Reviewed by: rmacklem
Differential Revision: https://reviews.freebsd.org/D33404
In NFSv2, the directory cookie was 32-bits. NFSv3 widened it to
64-bits and SVN r22521 widened the corresponding argument in
VOP_READDIR, but FreeBSD's NFS server continued to treat the cookies as
32-bits, and 0-extended to fill the field on the wire. Nobody ever
noticed, because every in-tree file system generates cookies that fit
comfortably within 32-bits.
Also, have better type safety for txdr_hyper. Turn it into an inline
function that type-checks its arguments. Prevents warnings about
shift-count-overflow.
PR: 260375
MFC after: 2 weeks
Reviewed by: rmacklem
Differential Revision: https://reviews.freebsd.org/D33404
The check for "not first operation" in CB_SEQUENCE
was done after the slot, etc. was updated. This patch
moves the check to the beginning of CB_SEQUENCE
processing.
While here, also fix the check for "no CB_SEQUENCE operation first"
by moving the check to the beginning of callback operation parsing,
since the check was in a couple of the other operations, but
not all of them.
Reported by: rtm@lcs.mit.edu
Tested by: rtm@lcs.mit.edu
PR: 260412
MFC after: 2 weeks
IO_APPEND writes have always been very slow over NFS, due to
the need to acquire an up to date file size after flushing
all writes to the NFS server.
This patch switches the IO_APPEND writes to use direct I/O,
bypassing the buffer cache. As such, flushing of writes
normally only occurs when the open(..O_APPEND..) is done.
It does imply that all writes must be done synchronously
and must be committed to stable storage on the file server
(NFSWRITE_FILESYNC).
For a simple test program that does 10,000 IO_APPEND writes
in a loop, performance improved significantly with this patch.
For a UFS exported file system, the test ran 12x faster.
This drops to 3x faster when the open(2)/close(2) are done
for each loop iteration.
For a ZFS exported file system, the test ran 40% faster.
The much smaller improvement may have been because the ZFS
file system I tested against does not have a ZIL log and
does have "sync" enabled.
Note that IO_APPEND write performance is still much slower
than when done on local file systems.
Although this is a simple patch, it does result in a
significant semantics change, so I have given it a
large MFC time.
Tested by: otis
MFC after: 3 months
As reported in PR#260343, nfs_allocate() did not check
the filesize rlimit. This patch adds that check.
PR: 260343
Reviewed by: asomers
MFC after: 2 weeks
Differential Revision: https://reviews.freebsd.org/D33422
This patch decrements maxcnt by the appropriate
number of bytes during parsing and checks to see
if there is data remaining. If not, it just returns
from nfsrv_flexlayouterr() without further processing.
This prevents the tl pointer from running off the end
of the error data pointed at by layp, if there are
flaws in the data.
Reported by: rtm@lcs.mit.edu
Tested by: rtm@lcs.mit.edu
PR: 260293
MFC after: 2 weeks
For pNFS mounts to mirrored Flexible File layout pNFS servers,
the "must_commit" component in the nfsclwritedsdorpc
structure must be checked and the "must_commit" argument passed
into nfscl_doiods() must be updated. Technically, only writes to
the DS with a writeverf change must be redone, but since this
occurrence will be rare, the must_commit argument to nfscl_doiosd()
is set to 1, so all writes to all DSs will be redone.
This bug would affect few, since use of mirrored pNFS servers
is rare and "writeverf" rarely changes. Normally "writeverf"
only changes when a NFS server reboots.
MFC after: 2 weeks
Without this patch, the KASSERT(must_commit == 0,..) can be
triggered by the writeverf in the Direct I/O write reply changing.
This is not a situation that should cause a panic(). Correct
handling is to ignore the change in "writeverf" for Direct
I/O, since it is done with NFSWRITE_FILESYNC.
This patch modifies the semantics of the "must_commit"
argument slightly, allowing an initial value of 2 to indicate
that a change in "writeverf" should be ignored.
It also fixes the KASSERT()s.
This bug would affect few, since Direct I/O is not enabled
by default and "writeverf" rarely changes. Normally "writeverf"
only changes when a NFS server reboots, however I found the
bug when testing against a Linux 5.15.1 kernel nfsd, which
replied to a NFSWRITE_FILESYNC write with a "writeverf" of all
0x0 bytes.
MFC after: 2 weeks
There were two places where the client code did not check
for a parse error return from nfsrv_getattrbits().
This patch fixes both of these cases.
Reported by: rtm@lcs.mit.edu
Tested by: rtm@lcs.mit.edu
PR: 260272
MFC after: 2 weeks
The sanity check for tag length in a callback request
was broken in two ways:
It checked for a negative value, but not a large positive
value.
It did not set taglen to -1, to indicate to the code that
it should not be used.
This patch fixes both of these issues.
Reported by: rtm@lcs.mit.edu
Tested by: rtm@lcs.mit.edu
PR: 260266
MFC after: 2 weeks
With various firmware files used by graphics and wireless drivers
we are exceeding the current 32 character module name (file path
in kldxref) length.
In order to overcome this issue bump it to the maximum path length
for the next version.
To be able to MFC provide backward compat support for another version
of the struct as the offsets for the second half change due to the
array size increase.
MAXMODNAME being defined to MAXPATHLEN needs param.h to be
included first. With only 7 modules (or LinuxKPI module.h) not
doing that adjust them rather than including param.h in module.h [1].
Reported by: Greg V (greg unrelenting.technology)
Sponsored by: The FreeBSD Foundation
Suggested by: imp [1]
MFC after: 10 days
Reviewed by: imp (and others to different level)
Differential Revision: https://reviews.freebsd.org/D32383
unionfs must pass VOP_VPUT_PAIR directly to the underlying FS so that
it can have a chance to manage any special locking considerations that
may be necessary. The unionfs implementation is based heavily on the
corresponding nullfs implementation.
Also note some outstanding issues with the unionfs locking scheme, as
a first step in fixing those issues in a future change.
Discussed with: kib
Tested by: pho
Differential Revision: https://reviews.freebsd.org/D33008
Also remove a couple of write-only variables found by the recent clang
update. No functional change intended.
Discussed with: kib
Differential Revision: https://reviews.freebsd.org/D33008
FUSE_COPY_FILE_RANGE instructs the server to write data to a file.
fusefs must invalidate any cached data within the written range.
PR: 260242
MFC after: 2 weeks
Reviewed by: pfg
Differential Revision: https://reviews.freebsd.org/D33280
This function was always confusing, because it created an H-shaped
callgraph: two functions called in and left via different paths based on
which which called.
MFC after: 2 weeks
Correctly handle the situation where a FUSE server unlinks a file, then
creates a new file of a different type but with the same inode number.
Previously fuse_vnop_lookup in this situation would return EAGAIN. But
since it didn't call vgone(), the vnode couldn't be reused right away.
Fix this by immediately calling vgone() and reallocating a new vnode.
This problem can occur in three code paths, during VOP_LOOKUP,
VOP_SETATTR, or following FUSE_GETATTR, which usually happens during
VOP_GETATTR but can occur during other vops, too. Note that the correct
response actually doesn't depend on whether the entry cache has expired.
In fact, during VOP_LOOKUP, we can't even tell. Either it has expired
already, or else the vnode got reclaimed by vnlru.
Also, correct the error code during the VOP_SETATTR path.
PR: 258022
Reported by: chogata@moosefs.pro
MFC after: 2 weeks
Reviewed by: pfg
Differential Revision: https://reviews.freebsd.org/D33283
When the Verify operation calls nfsv4_loadattr(), it provides
the "struct statfs" information that can be used for doing a
compare for FilesAvail, FilesFree, FilesTotal, SpaceAvail,
SpaceFree and SpaceTotal. However, the code erroneously
used the "struct nfsstatfs *" argument that is NULL.
This patch fixes these cases to use the correct argument
structure. For the case of FilesAvail, the code in
nfsv4_fillattr() was factored out into a separate function
called nfsv4_filesavail(), so that it can be called from
nfsv4_loadattr() as well as nfsv4_fillattr().
In fact, most of the code in nfsv4_filesavail() is old
OpenBSD code that does not build/run on FreeBSD, but I
left it in place, in case it is of some use someday.
I am not aware of any extant NFSv4 client that does Verify
on these attributes.
Reported by: rtm@lcs.mit.edu
Tested by: rtm@lcs.mit.edu
PR: 260176
MFC after: 2 weeks
This reverts commit 266f97b5e9, reversing
changes made to a10253cffe.
A mismerge of a merge to catch up to main resulted in files being
committed which should not have been.
When an ACL is presented to the NFSv4 server in
Setattr or Verify, parsing of the ACL assumed a
sane acecnt and sane sizes for the "who" strings.
This patch adds sanity checks for these.
The patch also fixes handling of an error
return from nfsrv_dissectacl() for one broken
case.
Reported by: rtm@lcs.mit.edu
Tested by: rtm@lcs.mit.edu
PR: 260111
MFC after: 2 weeks
When nfsrv_checksequence() replies NFSERR_BADSLOT,
the value of nd_slotid is not valid. As such, the
reply cannot be cached in the session.
Do not set ND_HASSEQUENCE for this case.
Reported by: rtm@lcs.mit.edu
Tested by: rtm@lcs.mit.edu
PR: 260076
MFC after: 2 weeks
This prevents a kernel panic on a damaged ext2 superblock.
PR: 259107
Reported by: Robert Morris <rtm@lcs.mit.edu>
Differential Revision: https://reviews.freebsd.org/D33029
When using cached attributes, whether or not the data cache is enabled,
fusefs must update a file's atime whenever it reads from it, so long as
it wasn't mounted with -o noatime. Update it in-kernel, and flush it to
the server on close or during the next setattr operation.
The downside is that close() will now frequently trigger a FUSE_SETATTR
upcall. But if you care about performance, you should be using
-o noatime anyway.
MFC after: 2 weeks
Reviewed by: pfg
Differential Revision: https://reviews.freebsd.org/D33145
When copy_file_range extends a file, it must update the cached file
size.
MFC after: 2 weeks
Reviewed by: rmacklem, pfg
Differential Revision: https://reviews.freebsd.org/D33151
For most of these warnings, the variable is loaded
with data parsed out of an RPC messages. In case
the data is useful in the future, I just marked
these with __unused.
The LookupOpen RPC reduces the number of Open RPCs
needed. Unfortunately, it breaks certain software
builds over NFS, so disable it until this is fixed.
The LookupOpen RPC is only used for NFSv4.1/4.2
mounts when the "oneopenown" mount option is
specified, so this should not affect many users.
This allows to stop maintaing the VI_TEXT_REF flag and consequently
opens up fully lockless v_writecount adjustment.
Reviewed by: kib
Differential Revision: https://reviews.freebsd.org/D33127
The slotid in the Sequence reply must be the same as
in the request. Check that it is the same and log
a console message if it is not, plus set it to the
correct value.
Reported by: rtm@lcs.mit.edu
Tested by: rtm@lcs.mit.edu
PR: 260071
MFC after: 2 weeks
The check for the original len being >= retlen needs to
be done before the "if (nd->nd_repstat == 0)" code, so
that it can be reported as too small.
Reported by: rtm@lcs.mit.edu
Tested by: rtm@lcs.mit.edu
PR: 260046
MFC after: 2 weeks
For a LayoutReturn when using the Flexible File Layout,
error reports may be provided in the request.
Sanity check the size of these error reports and
check that they exist before calling nfsrv_flexlayouterr().
Reported by: rtm@lcs.mit.edu
Tested by: rtm@lcs.mit.edu
PR: 260012
MFC after: 2 weeks
This avoids needing to inspect the mount point every time.
Reviewed by: kib (previous version)
Differential Revision: https://reviews.freebsd.org/D33125