The "void *stuff" (also called fstuff and dstuff) argument
was used by the Mac OSX port. For FreeBSD, this argument
is always NULL, so remove it to clean up the code.
This commit gets rid of "stuff" for nfscl_request().
Future commits will do the same for other functions.
The "void *stuff" (also called fstuff and dstuff) argument
was used by the Mac OSX port. For FreeBSD, this argument
is always NULL, so remove it to clean up the code.
This commit gets rid of "stuff" for nfscl_postop_attr().
Future commits will do the same for other functions.
For IO_APPEND VOP_WRITE()s, the code first does a
Getattr RPC to acquire the file's size, before it
can do the Write RPC.
Although NFS does not have an append write operation,
an NFSv4 compound can use a Verify operation to check
that the client's notion of the file's size is
correct, followed by the Write operation.
This patch modifies nfscl_wcc_data() to optionally
acquire the file's size, for use with an AppendWrite.
Although the "stuff" arguments are always NULL
(these were used for the Mac OSX port and should be
cleared out someday), make the argument to
nfscl_wcc_data() explicitly NULL for clarity.
This patch does not cause any semantics change until
the AppendWrite is added in a future commit.
For IO_APPEND VOP_WRITE()s, the code first does a
Getattr RPC to acquire the file's size, before it
can do the Write RPC.
Although NFS does not have an append write operation,
an NFSv4 compound can use a Verify operation to check
that the client's notion of the file's size is
correct before doing the Write operation.
This patch prepares the NFSv4 client for such an
RPC, which will be added in a future commit.
This patch does not cause any semantics change.
The function nfscl_getcookie(), which is essentially the
same as ncl_getcookie(), is never called, so delete it.
This is probably cruft left over from the port of the
NFSv4 code to FreeBSD several years ago.
Found while modifying the code to better use the
directory offset cookies.
MFC after: 2 weeks
The UFS and ZFS file systems only support Allow/Deny ACEs
in the NFSv4 ACLs. This patch does not allow the server
to parse Audit/Alarm ACEs. The NFSv4 client is still
allowed to pase Audit/Alarm ACEs, since non-FreeBSD NFSv4
servers may use them.
This patch should not have a significant effect, since the
UFS and ZFS file systems will not handle these ACEs anyhow.
It simply serves as an additional "safety belt" for the
NFSv4 server.
MFC after: 2 weeks
This reverts commit 0fa074b53e.
I now see that the implementation of the "dacl" operation
requires that the NFSv4 server to "automatic inheritance"
and I do not plan on doing this. As such, this patch is
harmless, but unneeded.
This reverts commit f10dc28ec2.
The client should still be able to getfacl
audit and alarm ACEs, for non-FreeBSD NFSv4 servers.
A patch that only disables audit/alarm for the server
side will be committed to replace this patch.
Before this callouts were scheduled twice a seconds even if nfsd was
never used. This reduces the rate to ~1Hz and only after nfsd first
started.
MFC after: 2 weeks
FreeBSD only supports Allow/Deny ACEs in NFSv4 ACLs.
As such, it does not make sense to parse Audit/Alarm
ACEs. Modify nfsrv_dissectace() so that it returns
NFSERR_ATTRNOTSUPP if an Audit/Alarm ACE is found in
the ACL being parsed. The code has been #ifdef notnow'd,
since Audit/Alarm ACEs might be supported someday.
This should not have significant impact, since FreeBSD
reports to clients that only Allow/Deny ACEs are
supported and an attempt to set one would have failed
anyhow.
MFC after: 2 weeks
NFSv4.1/4.2 has an alternative to the acl attribute, called
dacl, that includes support for the ACL_ENTRY_INHERITED flag,
called NFSV4ACE_INHERITED in NFSv4.
This patch adds a dacl argument to nfsrv_buildacl(),
nfsrv_dissectacl() and nfsrv_dissectace(), so that they
will handle NFSV4ACE_INHERITED when dacl == true.
Since these functions are always called with dacl == false
for this patch, semantics should not have changed.
A future patch will add support for dacl.
MFC after: 2 weeks
In NFSv2, the directory cookie was 32-bits. NFSv3 widened it to
64-bits and SVN r22521 widened the corresponding argument in
VOP_READDIR, but FreeBSD's NFS server continued to treat the cookies as
32-bits, and 0-extended to fill the field on the wire. Nobody ever
noticed, because every in-tree file system generates cookies that fit
comfortably within 32-bits.
Also, have better type safety for txdr_hyper. Turn it into an inline
function that type-checks its arguments. Prevents warnings about
shift-count-overflow.
PR: 260375
MFC after: 2 weeks
Reviewed by: rmacklem
Differential Revision: https://reviews.freebsd.org/D33404
When the Verify operation calls nfsv4_loadattr(), it provides
the "struct statfs" information that can be used for doing a
compare for FilesAvail, FilesFree, FilesTotal, SpaceAvail,
SpaceFree and SpaceTotal. However, the code erroneously
used the "struct nfsstatfs *" argument that is NULL.
This patch fixes these cases to use the correct argument
structure. For the case of FilesAvail, the code in
nfsv4_fillattr() was factored out into a separate function
called nfsv4_filesavail(), so that it can be called from
nfsv4_loadattr() as well as nfsv4_fillattr().
In fact, most of the code in nfsv4_filesavail() is old
OpenBSD code that does not build/run on FreeBSD, but I
left it in place, in case it is of some use someday.
I am not aware of any extant NFSv4 client that does Verify
on these attributes.
Reported by: rtm@lcs.mit.edu
Tested by: rtm@lcs.mit.edu
PR: 260176
MFC after: 2 weeks
This reverts commit 266f97b5e9, reversing
changes made to a10253cffe.
A mismerge of a merge to catch up to main resulted in files being
committed which should not have been.
When an ACL is presented to the NFSv4 server in
Setattr or Verify, parsing of the ACL assumed a
sane acecnt and sane sizes for the "who" strings.
This patch adds sanity checks for these.
The patch also fixes handling of an error
return from nfsrv_dissectacl() for one broken
case.
Reported by: rtm@lcs.mit.edu
Tested by: rtm@lcs.mit.edu
PR: 260111
MFC after: 2 weeks
For most of these warnings, the variable is loaded
with data parsed out of an RPC messages. In case
the data is useful in the future, I just marked
these with __unused.
The slotid in the Sequence reply must be the same as
in the request. Check that it is the same and log
a console message if it is not, plus set it to the
correct value.
Reported by: rtm@lcs.mit.edu
Tested by: rtm@lcs.mit.edu
PR: 260071
MFC after: 2 weeks
For pNFS, Layouts are issued by the server to indicate
where a file's data resides on the DS(s). This patch
adds counters for how many layouts are allocated to
the nfsstatsv1 structure, using two reserved fields.
MFC after: 2 weeks
If a pNFS server's DS runs out of disk space, it replies
NFSERR_NOSPC to the client doing writing. For the Linux
client, it then sends a LayoutError RPC to the MDS server to
tell it about the error. This patch adds the same to the
FreeBSD NFSv4.2 pNFS client, to maintain Linux compatible
behaviour, particlularily for non-FreeBSD pNFS servers.
MFC after: 2 weeks
If a pNFS server's DS runs out of disk space, it replies
NFSERR_NOSPC to the client doing writing. For the Linux
client, it then sends a LayoutError RPC to the MDS server to
tell it about the error and keeps retrying, doing repeated
LayoutGets to the MDS and Write RPCs to the DS. The Linux client is
"stuck" until disk space on the DS is free'd up unless
a subsequent LayoutGet request is sent a NFSERR_NOSPC
reply.
The looping problem still occurs for NFSv4.1 mounts, but no
fix for this is known at this time.
This patch changes the pNFS MDS server to reply to LayoutGet
operations with NFSERR_NOSPC once a LayoutError reports the
problem, until the DS has available space. This keeps the Linux
NFSv4.2 from looping.
Found during recent testing because of issues w.r.t. a DS
being out of space found during a recent IEFT NFSv4 working
group testing event.
MFC after: 2 weeks
Since the NFS Space_available and Files_available are unsigned,
the NFSv3 server sets them to 0 when negative, so that they
do not appear to be large positive values for non-FreeBSD clients.
This patch fixes the NFSv4 server to do the same.
Found during a recent IEFT NFSv4 working group testing event.
MFC after: 2 weeks
When a mount with the "pnfs" and "nconnect" options specified
does an I/O operation, it erroneously uses a TCP connection
to the MDS when it is meant to be a DS operation and, as such,
needs to use a TCP connection to the DS. This patch fixes this.
When the "pnfs" and "nconnect" options are specified for a
NFSv4.1/4.2 mount, there probably should be N connections
established to each DS for I/O RPCs. This is a fair amount
of work and may be done in a future commit.
This problem was found during a recent IETF NFSv4 working
group testing event.
MFC after: 2 weeks
When a forced dismount is done and the "nconnect" mount
option was used, the additional connections must be closed.
This patch does that.
Found during a recent IETF NFSv4 working group testing event.
MFC after: 2 weeks
When a forced dismount is done and delegations are being
issued by the server (disabled by default for FreeBSD
servers), the delegation structure is free'd before the
loop calling vflush(). This could result in a use after
free of the delegation structure.
This patch changes the code so that the delegation
structures are not free'd until after the vflush()
loop for forced dismounts.
Found during a recent IETF NFSv4 working group testing event.
MFC after: 2 weeks
For NFS RPCs that receive a NFSERR_DELAY reply, the delay time
is initially 1sec and then increases exponentially to NFS_TRYLATERDEL.
It was found that this delay time is excessive for some NFSv4
servers, which work well with a 1msec delay.
A 1sec delay resulted in very slow performance for Remove and
Rename when delegations and pNFS were enabled.
This patch decreases the initial delay time to 1msec.
Found during a recent IETF NFSv4 working group testing event.
MFC after: 2 weeks
For pNFS servers that specify that Layouts are to be returned
upon close, they may expect that LayoutReturn to happen before
the associated Close.
This patch modifies the NFSv4.1/4.2 pNFS client so that this
is done. This only affects a pNFS mount against a non-FreeBSD
NFSv4.1/4.2 server that specifies return_on_close in LayoutGet
replies.
Found during a recent IETF NFSv4 working group testing event.
MFC after: 2 weeks
Remove page zeroing code from consumers and stop specifying
VM_ALLOC_NOOBJ. In a few places, also convert an allocation loop to
simply use VM_ALLOC_WAITOK.
Similarly, convert vm_page_alloc_domain() callers.
Note that callers are now responsible for assigning the pindex.
Reviewed by: alc, hselasky, kib
MFC after: 1 week
Sponsored by: The FreeBSD Foundation
Differential Revision: https://reviews.freebsd.org/D31986
Without this patch, if a NFSv4.1/4.2 server replies NFSERR_DELAY to
a Close operation, the client loops retrying the Close while holding
a shared lock on the clientID. This shared lock blocks returns of
delegations, even though the server has issued a CB_RECALL to request
the delegation return.
This patch delays doing a retry of a Close that received a reply of
NFSERR_DELAY until after the shared lock on the clientID is released,
for NFSv4.1/4.2. To fix this for NFSv4.0 would be very difficult and
since the only known NFSv4 server to reply NFSERR_DELAY to Close only
does NFSv4.1/4.2, this fix is hoped to be sufficient.
This problem was detected during a recent IETF working group NFSv4
testing event.
MFC after: 2 week
This patch adds a new argument to nfscl_tryclose() to indicate
whether or not it should loop when a NFSERR_DELAY reply is received
from the NFSv4 server. Since this new argument is always passed in
as "true" at this time, no semantics change should occur.
This is being done to prepare the code for a future patch that fixes
the case where an NFSv4.1/4.2 server replies NFSERR_DELAY to a Close
operation.
MFC after: 2 week
This patch factors the unlinking of the nfsclopen structure out of
nfscl_freeopen() into a separate function called nfscl_unlinkopen().
It also adds a new argument to nfscl_freeopen() to conditionally do
the unlink. Since this new argument is always passed in as "true"
at this time, no semantics change should occur.
This is being done to prepare the code for a future patch that fixes
the case where an NFSv4.1/4.2 server replies NFSERR_DELAY to a Close
operation.
MFC after: 2 week
Without this patch, if a pNFS read layout has already been acquired
for a file, writes would be redirected to the Metadata Server (MDS),
because nfscl_getlayout() would not acquire a read/write layout for
the file. This happened because there was no "mode" argument to
nfscl_getlayout() to indicate whether reading or writing was being done.
Since doing I/O through the Metadata Server is not encouraged for some
pNFS servers, it is preferable to get a read/write layout for writes
instead of redirecting the write to the MDS.
This patch adds a access mode argument to nfscl_getlayout() and
nfsrpc_getlayout(), so that nfscl_getlayout() knows to acquire a read/write
layout for writing, even if a read layout has already been acquired.
This patch only affects NFSv4.1/4.2 client behaviour when pNFS ("pnfs" mount
option against a server that supports pNFS) is in use.
This problem was detected during a recent NFSv4 interoperability
testing event held by the IETF working group.
MFC after: 2 week
Without this patch, it is possible for a process doing an NFSv4
Open/create of a file to block to allow another process
to acquire the exclusive lock on the clientID when holding
a shared lock on the clientID. As such, both processes
deadlock, with one wanting the exclusive lock, while the
other holds the shared lock. This deadlock is unlikely to occur
unless delegations are in use on the NFSv4 mount.
This patch fixes the problem by not deferring to the process
waiting for the exclusive lock when a shared lock (reference cnt)
is already held by the process.
This problem was detected during a recent NFSv4 interoperability
testing event held by the IETF working group.
MFC after: 1 week
As of commit 103b207536, the NFSv4.2 server will limit the size
of a Copy operation based upon a 1 second timeout. The Linux 5.2
kernel server also limits Copy operation size to 4Mbytes.
As such, the NFSv4.2 client can attempt a large Copy without
resulting in a long RPC RTT for these servers.
This patch changes vfs.nfs.maxcopyrange to 64bits and sets
the default to the maximum possible size of SSIZE_MAX, since
a larger size makes the Copy operation more efficient and
allows for copying to complete with fewer RPCs.
The sysctl may be need to be made smaller for other non-FreeBSD
NFSv4.2 servers.
MFC after: 2 weeks
This patch adds a VOP_DEALLOCATE() to the NFS client.
For NFSv4.2 servers that support the Deallocate operation,
it is used. Otherwise, it falls back on calling
vop_stddeallocate().
Reviewed by: kib
Differential Revision: https://reviews.freebsd.org/D31640
The recently added VOP_DEALLOCATE(9) VOP call allows
implementation of the Deallocate NFSv4.2 operation.
Since the Deallocate operation is a single succeed/fail
operation, the call to VOP_DEALLOCATE(9) loops so long
as progress is being made. It calls maybe_yield()
between loop iterations to allow other processes
to preempt it.
Where RFC 7862 underspecifies behaviour, the code
is written to be Linux NFSv4.2 server compatible.
Reviewed by: khng
Differential Revision: https://reviews.freebsd.org/D31624
This patch adds a Lookup+Open compound RPC to the NFSv4.1/4.2
NFS client, which can be used by nfs_lookup() so that a
subsequent Open RPC is not required.
It uses the cn_flags OPENREAD, OPENWRITE added by commit c18c74a87c.
This reduced the number of RPCs by about 15% for a kernel
build over NFS.
For now, use of Lookup+Open is only done when the "oneopenown"
mount option is used. It may be possible for Lookup+Open to
be used for non-oneopenown NFSv4.1/4.2 mounts, but that will
require extensive further testing to determine if it works.
While here, I've added the changes to the nfscommon module
that are needed to implement the Deallocate NFSv4.2 operation.
This avoids needing another cycle of changes to the internal
KAPI between the NFS modules.
This commit has changed the internal KAPI between the NFS
modules and, as such, all need to be rebuilt from sources.
I have not bumped __FreeBSD_version, since it was bumped a
few days ago.
Since MAXPHYS now allows the FreeBSD NFS client
to do 1Mbyte I/O operations, add a sysctl called vfs.nfsd.srvmaxio
so that the maximum NFS server I/O size can be set up to 1Mbyte.
The Linux NFS client can also do 1Mbyte I/O operations.
The default of 128Kbytes for the maximum I/O size has
not been changed for two reasons:
- kern.ipc.maxsockbuf must be increased to support 1Mbyte I/O
- The limited benchmarking I can do actually shows a drop in I/O rate
when the I/O size is above 256Kbytes.
However, daveb@spectralogic.com reports seeing an increase
in I/O rate for the 1Mbyte I/O size vs 128Kbytes using a Linux client.
Reviewed by: asomers
MFC after: 2 weeks
Differential Revision: https://reviews.freebsd.org/D30826
Linux has had an "nconnect" NFS mount option for some time.
It specifies that N (up to 16) TCP connections are to created for a mount,
instead of just one TCP connection.
A discussion on freebsd-net@ indicated that this could improve
client<-->server network bandwidth, if either the client or server
have one of the following:
- multiple network ports aggregated to-gether with lagg/lacp.
- a fast NIC that is using multiple queues
It does result in using more IP port#s and might increase server
peak load for a client.
One difference from the Linux implementation is that this implementation
uses the first TCP connection for all RPCs composed of small messages
and uses the additional TCP connections for RPCs that normally have
large messages (Read/Readdir/Write). The Linux implementation spreads
all RPCs across all TCP connections in a round robin fashion, whereas
this implementation spreads Read/Readdir/Write across the additional
TCP connections in a round robin fashion.
Reviewed by: markj
MFC after: 2 weeks
Differential Revision: https://reviews.freebsd.org/D30970
When the setting of kern.ipc.maxsockbuf is less than what is
desired for I/O based on vfs.maxbcachebuf and vfs.nfs.bufpackets,
a console message of "Consider increasing kern.ipc.maxsockbuf".
is printed.
This patch modifies the message to provide a suggested value
for kern.ipc.maxsockbuf.
Note that the setting is only needed when the NFS rsize/wsize
is set to vfs.maxbcachebuf.
While here, make nfs_bufpackets global, so that it can be used
by a future patch that adds a sysctl to set the NFS server's
maximum I/O size. Also, remove "sizeof(u_int32_t)" from the maximum
packet length, since NFS_MAXXDR is already an "overestimate"
of the actual length.
MFC after: 2 weeks
When the NFSv4.0 client was implemented, acquisition of a clientid
via SetClientID/SetClientIDConfirm was done upon the first Open,
since that was when it was needed. NFSv4.1/4.2 acquires the clientid
during mount (via ExchangeID/CreateSession), since the associated
session is required during mount.
This patch modifies the NFSv4.0 mount so that it acquires the
clientid during mount. This simplifies the code and makes it
easy to implement "find the highest minor version supported by
the NFSv4 server", which will be done for the default minorversion
in a future commit.
The "start_renewthread" argument for nfscl_getcl() is replaced
by "tryminvers", which will be used by the aforementioned
future commit.
MFC after: 2 weeks
Commit d224f05fcf pre-parsed the next operation number for
the put file handle operations. This patch uses this next
operation number, plus the type of the file handle being set by
the put file handle operation, to implement the rules in RFC5661
Sec. 2.6 with respect to replying NFSERR_WRONGSEC.
This patch also adds a check to see if NFSERR_WRONGSEC should be
replied when about to perform Lookup, Lookupp or Open with a file
name component, so that the NFSERR_WRONGSEC reply is done for
these operations, as required by RFC5661 Sec. 2.6.
This patch does not have any practical effect for the FreeBSD NFSv4
client and I believe that the same is true for the Linux client,
since NFSERR_WRONGSEC is considered a fatal error at this time.
MFC after: 2 weeks
The Linux client is now attempting to use the Secinfo_no_name
operation for NFSv4.1/4.2 mounts. Although it does not seem to
mind the NFSERR_NOTSUPP reply, adding support for it seems
reasonable.
I also noticed that "savflag" needed to be 64bits in
nfsrvd_secinfo() since nd_flag in now 64bits, so I changed
the declaration of it there. I also added code to set "vp" NULL
after performing Secinfo/Secinfo_no_name, since these
operations consume the current FH, which is represented
by "vp" in nfsrvd_compound().
Fixing when the server replies NFSERR_WRONGSEC so that
it conforms to RFC5661 Sec. 2.6 still needs to be done
in a future commit.
MFC after: 2 weeks
A problem was reported via email, where a large (130000+) accumulation
of NFSv4 opens on an NFSv4 mount caused significant lock contention
on the mutex used to protect the client mount's open/lock state.
Although the root cause for the accumulation of opens was not
resolved, it is obvious that the NFSv4 client is not designed to
handle 100000+ opens efficiently. When searching for an open,
usually for a match by file handle, a linear search of all opens
is done.
This patch adds a table of hash lists for the opens, hashed on
file handle. This table will be used by future commits to
search for an open based on file handle more efficiently.
MFC after: 2 weeks
Recent discussion on the nfsv4@ietf.org mailing list confirmed
that an NFSv4 server should reply to an RPC in less than 1second.
If an NFSv4 RPC requires a delegation be recalled,
the server will attempt a CB_RECALL callback.
If the client is not responsive, the RPC reply will be delayed
until the callback times out.
Without this patch, the timeout is set to 4 seconds (set in
ticks, but used as seconds), resulting in the RPC reply taking over 4sec.
This patch redefines the constant as being in milliseconds and it
implements that for a value of 800msec, to ensure the RPC
reply is sent in less than 1second.
This patch only affects mounts from clients when delegations
are enabled on the server and the client is unresponsive to callbacks.
MFC after: 2 weeks