NFSv4.1/4.2 has an alternative to the acl attribute, called
dacl, that includes support for the ACL_ENTRY_INHERITED flag,
called NFSV4ACE_INHERITED in NFSv4.
This patch adds a dacl argument to nfsrv_buildacl(),
nfsrv_dissectacl() and nfsrv_dissectace(), so that they
will handle NFSV4ACE_INHERITED when dacl == true.
Since these functions are always called with dacl == false
for this patch, semantics should not have changed.
A future patch will add support for dacl.
MFC after: 2 weeks
I thought that these new auth_stat values had been agreed
upon by the IETF NFSv4 working group, but that no longer
is the case. As such, delete them and use AUTH_TOOWEAK
instead. Leave the code that uses these new auth_stat
values in the sources #ifdef notnow, in case they are
defined in the future.
MFC after: 1 week
The cookies argument is only used by the NFS server. NFSv2 defines the
cookie as 32 bits on the wire, but NFSv3 increased it to 64 bits. Our
VOP_READDIR, however, has always defined it as u_long, which is 32 bits
on some architectures. Change it to 64 bits on all architectures. This
doesn't matter for any in-tree file systems, but it matters for some
FUSE file systems that use 64-bit directory cookies.
PR: 260375
Reviewed by: rmacklem
Differential Revision: https://reviews.freebsd.org/D33404
In NFSv2, the directory cookie was 32-bits. NFSv3 widened it to
64-bits and SVN r22521 widened the corresponding argument in
VOP_READDIR, but FreeBSD's NFS server continued to treat the cookies as
32-bits, and 0-extended to fill the field on the wire. Nobody ever
noticed, because every in-tree file system generates cookies that fit
comfortably within 32-bits.
Also, have better type safety for txdr_hyper. Turn it into an inline
function that type-checks its arguments. Prevents warnings about
shift-count-overflow.
PR: 260375
MFC after: 2 weeks
Reviewed by: rmacklem
Differential Revision: https://reviews.freebsd.org/D33404
This patch decrements maxcnt by the appropriate
number of bytes during parsing and checks to see
if there is data remaining. If not, it just returns
from nfsrv_flexlayouterr() without further processing.
This prevents the tl pointer from running off the end
of the error data pointed at by layp, if there are
flaws in the data.
Reported by: rtm@lcs.mit.edu
Tested by: rtm@lcs.mit.edu
PR: 260293
MFC after: 2 weeks
This reverts commit 266f97b5e9, reversing
changes made to a10253cffe.
A mismerge of a merge to catch up to main resulted in files being
committed which should not have been.
When nfsrv_checksequence() replies NFSERR_BADSLOT,
the value of nd_slotid is not valid. As such, the
reply cannot be cached in the session.
Do not set ND_HASSEQUENCE for this case.
Reported by: rtm@lcs.mit.edu
Tested by: rtm@lcs.mit.edu
PR: 260076
MFC after: 2 weeks
For most of these warnings, the variable is loaded
with data parsed out of an RPC messages. In case
the data is useful in the future, I just marked
these with __unused.
The check for the original len being >= retlen needs to
be done before the "if (nd->nd_repstat == 0)" code, so
that it can be reported as too small.
Reported by: rtm@lcs.mit.edu
Tested by: rtm@lcs.mit.edu
PR: 260046
MFC after: 2 weeks
For a LayoutReturn when using the Flexible File Layout,
error reports may be provided in the request.
Sanity check the size of these error reports and
check that they exist before calling nfsrv_flexlayouterr().
Reported by: rtm@lcs.mit.edu
Tested by: rtm@lcs.mit.edu
PR: 260012
MFC after: 2 weeks
For pNFS, Layouts are issued by the server to indicate
where a file's data resides on the DS(s). This patch
adds counters for how many layouts are allocated to
the nfsstatsv1 structure, using two reserved fields.
MFC after: 2 weeks
If a pNFS server's DS runs out of disk space, it replies
NFSERR_NOSPC to the client doing writing. For the Linux
client, it then sends a LayoutError RPC to the MDS server to
tell it about the error and keeps retrying, doing repeated
LayoutGets to the MDS and Write RPCs to the DS. The Linux client is
"stuck" until disk space on the DS is free'd up unless
a subsequent LayoutGet request is sent a NFSERR_NOSPC
reply.
The looping problem still occurs for NFSv4.1 mounts, but no
fix for this is known at this time.
This patch changes the pNFS MDS server to reply to LayoutGet
operations with NFSERR_NOSPC once a LayoutError reports the
problem, until the DS has available space. This keeps the Linux
NFSv4.2 from looping.
Found during recent testing because of issues w.r.t. a DS
being out of space found during a recent IEFT NFSv4 working
group testing event.
MFC after: 2 weeks
If a pNFS server's DS runs out of disk space, it replies
NFSERR_NOSPC to the client doing writing. For the Linux
client, it then sends a LayoutError RPC to the server to
tell it about the error and keeps retrying, doing repeated
LayoutGet and Write RPCs to the DS. The Linux client is
"stuck" until disk space on the DS is free'd up.
For a mirrored server configuration, the first mirror that
ran out of space was taken offline. This does not make
much sense, since the other mirror(s) will run out of space
soon and the fix is a manual cleanup up disk space.
This patch changes the pNFS server to not disable a mirror
for the mirrored case when this occurs.
Further work is needed, since the Linux client expects the
MDS to reply NFSERR_NOSPC to LayoutGets once the DS is out
of space. Without this further change, the above mentioned
looping occurs.
Found during a recent IEFT NFSv4 working group testing event.
MFC after: 2 weeks
When the NFSv4.2 server does a VOP_ALLOCATE(), it needs
the operation to be done for the RPC's credential and not
td_ucred. It also needs the writing to be done synchronously.
This patch adds "ioflag" and "cred" arguments to VOP_ALLOCATE()
and modifies vop_stdallocate() to use these arguments.
The VOP_ALLOCATE.9 man page will be patched separately.
Reviewed by: khng, kib
Differential Revision: https://reviews.freebsd.org/D32865
Some exported file systems, such as ZFS ones, cannot do VOP_ALLOCATE().
Since an NFSv4.2 server must either support the Allocate operation for
all file systems or not support it at all, define a sysctl called
vfs.nfsd.enable_v42allocate to enable the Allocate operation.
This sysctl is false by default and can only be set true if all
exported file systems (or all DSs for a pNFS server) can perform
VOP_ALLOCATE().
Unfortunately, there is no way to know if a ZFS file system will
be exported once the nfsd is operational, even if there are none
exported when the nfsd is started up, so enabling Allocate must
be done manually for a server configuration.
This problem was detected during a recent NFSv4 interoperability
testing event held by the IETF working group.
MFC after: 2 weeks
For a pNFS server configuration, an NFSv4.2 Deallocate operation
is proxied to the DS(s). The code that parsed the reply for the
proxy RPC is broken and did not process the pre-operation attributes.
This patch fixes this problem.
This bug would only affect pNFS servers built from recent main/FreeBSD14
sources.
By default NFS server reports as scope and owner major the host UUID
value and zero for owner minor. It works good in case of standalone
server. But in case of CARP-based HA cluster failover the values
should remain persistent, otherwise some clients like VMware ESXi
get confused by the change and fail to reconnect automatically.
The patch makes server scope, major owner and minor owner values
configurable via sysctls. If not set (by default) the host UUID
value is still used.
Reviewed by: rmacklem
MFC after: 2 weeks
Differential Revision: https://reviews.freebsd.org/D31952
Although it is not specified in the RFCs, the concept that
the NFSv4 server should reply to an RPC request within a
reasonable time is accepted practice within the NFSv4 community.
Without this patch, the NFSv4.2 server attempts to reply to
a Copy operation within 1 second by limiting the copy to
vfs.nfs.maxcopyrange bytes (default 10Mbytes). This is crude at
best, given the large variation in I/O subsystem performance.
This patch uses the COPY_FILE_RANGE_TIMEO1SEC flag added by
commit c5128c48df to limit the reply time for a Copy
operation to approximately 1 second.
MFC after: 2 weeks
The NFSv4.2 Deallocate operation loops on VOP_DEALLOCATE()
while progress is being made (remaining length decreasing).
This patch changes the loop on VOP_ALLOCATE() for the NFSv4.2
Allocate operation do the same, instead of stopping after
an arbitrary 20 iterations.
MFC after: 2 weeks
The recently added VOP_DEALLOCATE(9) VOP call allows
implementation of the Deallocate NFSv4.2 operation.
Since the Deallocate operation is a single succeed/fail
operation, the call to VOP_DEALLOCATE(9) loops so long
as progress is being made. It calls maybe_yield()
between loop iterations to allow other processes
to preempt it.
Where RFC 7862 underspecifies behaviour, the code
is written to be Linux NFSv4.2 server compatible.
Reviewed by: khng
Differential Revision: https://reviews.freebsd.org/D31624
The NFSv4.2 Allocate operation sanity checks the aa_offset
and aa_length arguments. Since they are assigned to variables
of type off_t (signed) it was possible for them to be negative.
It was also possible for aa_offset+aa_length to exceed OFF_MAX
when stored in lo_end, which is uint64_t.
This patch adds checks for these cases to the sanity check.
Reviewed by: kib
MFC after: 2 weeks
Differential Revision: https://reviews.freebsd.org/D31511
Since MAXPHYS now allows the FreeBSD NFS client
to do 1Mbyte I/O operations, add a sysctl called vfs.nfsd.srvmaxio
so that the maximum NFS server I/O size can be set up to 1Mbyte.
The Linux NFS client can also do 1Mbyte I/O operations.
The default of 128Kbytes for the maximum I/O size has
not been changed for two reasons:
- kern.ipc.maxsockbuf must be increased to support 1Mbyte I/O
- The limited benchmarking I can do actually shows a drop in I/O rate
when the I/O size is above 256Kbytes.
However, daveb@spectralogic.com reports seeing an increase
in I/O rate for the 1Mbyte I/O size vs 128Kbytes using a Linux client.
Reviewed by: asomers
MFC after: 2 weeks
Differential Revision: https://reviews.freebsd.org/D30826
Linux has had an "nconnect" NFS mount option for some time.
It specifies that N (up to 16) TCP connections are to created for a mount,
instead of just one TCP connection.
A discussion on freebsd-net@ indicated that this could improve
client<-->server network bandwidth, if either the client or server
have one of the following:
- multiple network ports aggregated to-gether with lagg/lacp.
- a fast NIC that is using multiple queues
It does result in using more IP port#s and might increase server
peak load for a client.
One difference from the Linux implementation is that this implementation
uses the first TCP connection for all RPCs composed of small messages
and uses the additional TCP connections for RPCs that normally have
large messages (Read/Readdir/Write). The Linux implementation spreads
all RPCs across all TCP connections in a round robin fashion, whereas
this implementation spreads Read/Readdir/Write across the additional
TCP connections in a round robin fashion.
Reviewed by: markj
MFC after: 2 weeks
Differential Revision: https://reviews.freebsd.org/D30970
Michael Dexter <editor@callfortesting.org> reported
a crash in FreeNAS, where the first argument to
clnt_bck_svccall() was no longer valid.
This argument is a pointer to the callback CLIENT
structure, which is free'd when the associated
NFSv4 ClientID is free'd.
This appears to have occurred because a callback
reply was still in the socket receive queue when
the CLIENT structure was free'd.
This patch acquires a reference count on the CLIENT
that is not CLNT_RELEASE()'d until the socket structure
is destroyed. This should guarantee that the CLIENT
structure is still valid when clnt_bck_svccall() is called.
It also adds a check for closed or closing to
clnt_bck_svccall() so that it will not process the callback
RPC reply message after the ClientID is free'd.
Comments by: mav
MFC after: 2 weeks
Differential Revision: https://reviews.freebsd.org/D30153
Commit d224f05fcf pre-parsed the next operation number for
the put file handle operations. This patch uses this next
operation number, plus the type of the file handle being set by
the put file handle operation, to implement the rules in RFC5661
Sec. 2.6 with respect to replying NFSERR_WRONGSEC.
This patch also adds a check to see if NFSERR_WRONGSEC should be
replied when about to perform Lookup, Lookupp or Open with a file
name component, so that the NFSERR_WRONGSEC reply is done for
these operations, as required by RFC5661 Sec. 2.6.
This patch does not have any practical effect for the FreeBSD NFSv4
client and I believe that the same is true for the Linux client,
since NFSERR_WRONGSEC is considered a fatal error at this time.
MFC after: 2 weeks
Commit 947bd2479b added support for the Secinfo_no_name operation.
When a non-exported file system is being traversed, the list of
security flavors is empty. It turns out that the Linux client
mount attempt fails when the security flavors list in the
Secinfo_no_name reply is empty.
This patch modifies Secinfo/Secinfo_no_name so that it replies
with all four security flavors when the list is empty.
This fixes Linux NFSv4.1/4.2 mounts when the file system at
the NFSv4 root (as specified on a V4: exports(5) line) is
not exported.
MFC after: 2 weeks
RFC5661 Sec. 2.6 specifies when a NFSERR_WRONGSEC error reply can be done.
For the four operations PutFH, PutrootFH, PutpublicFH and RestoreFH,
NFSERR_WRONGSEC can or cannot be replied, depending upon what operation
follows one of these operations in the compound.
This patch modifies nfsrvd_compound() so that it parses the next operation
number before executing any of the above four operations, storing it in
"nextop".
A future commit will implement use of "nextop" to decide if NFSERR_WRONGSEC
can be replied for the above four operations.
This commit should not change the semantics of performing the compound RPC.
MFC after: 2 weeks
Without this patch, nfsd_checkrootexp() returns failure
and then the NFSv4 operation would reply NFSERR_WRONGSEC.
RFC5661 Sec. 2.6 only allows a few NFSv4 operations, none
of which call nfsv4_checktootexp(), to return NFSERR_WRONGSEC.
This patch modifies nfsd_checkrootexp() to return the
error instead of a boolean and sets the returned error to an RPC
layer AUTH_ERR, as discussed on nfsv4@ietf.org.
The patch also fixes nfsd_errmap() so that the pseudo
error NFSERR_AUTHERR is handled correctly such that an RPC layer
AUTH_ERR is replied to the NFSv4 client.
The two new "enum auth_stat" values have not yet been assigned
by IANA, but are the expected next two values.
The effect on extant NFSv4 clients of this change appears
limited to reporting a different failure error when a
mount that does not use adequate security is attempted.
MFC after: 2 weeks
There are several NFSv4.1/4.2 server operation functions which
have unneeded checks for the NFSv4 root being set up.
The checks are not needed because the operations always follow
a Sequence operation, which performs the check.
This patch deletes these checks, simplifying the code so
that a future patch that fixes the checks to conform with
RFC5661 Sec. 2.6 will be less extension.
MFC after: 2 weeks
The Linux client is now attempting to use the Secinfo_no_name
operation for NFSv4.1/4.2 mounts. Although it does not seem to
mind the NFSERR_NOTSUPP reply, adding support for it seems
reasonable.
I also noticed that "savflag" needed to be 64bits in
nfsrvd_secinfo() since nd_flag in now 64bits, so I changed
the declaration of it there. I also added code to set "vp" NULL
after performing Secinfo/Secinfo_no_name, since these
operations consume the current FH, which is represented
by "vp" in nfsrvd_compound().
Fixing when the server replies NFSERR_WRONGSEC so that
it conforms to RFC5661 Sec. 2.6 still needs to be done
in a future commit.
MFC after: 2 weeks
Commit b3d4c70dc6 added support for CLAIM_DELEG_CUR_FH to Open.
While doing this, I noticed that CLAIM_DELEG_PREV_FH support
could be added the same way. Although I am not aware of any extant
NFSv4.1/4.2 client that uses this claim type, it seems prudent to add
support for this variant of Open to the NFSv4.1/4.2 server.
This patch does not affect mounts from extant NFSv4.1/4.2 clients,
as far as I know.
MFC after: 2 weeks
The Linux NFSv4.1/4.2 client now uses the CLAIM_DELEG_CUR_FH
variant of the Open operation when delegations are recalled and
the client has a local open of the file. This patch adds
support for this variant of Open to the NFSv4.1/4.2 server.
This patch only affects mounts from Linux clients when delegations
are enabled on the server.
MFC after: 2 weeks
Commit 7a606f280a allowed the server to do retries of CB_RECALL
callbacks every couple of seconds. This was needed to allow the
Linux client to re-establish the back channel.
However this patch broke the delegation timeout check, such that
it would just keep retrying CB_RECALLS.
If the client has crashed or been network patitioned from the
server, this continues until the client TCP reconnects to
the server and re-establishes the back channel.
This patch modifies the code such that it still times out the
delegation recall after some minutes, so that the server will
allow the conflicting client request once the delegation times out.
This patch only affects the NFSv4 server when delegations are
enabled and a NFSv4 client that holds a delegation has crashed
or been network partitioned from the server for at least several
minutes when a delegation needs to be recalled.
MFC after: 2 weeks
There is a NFSv4 file attribute called TimeCreate
that can be used for va_birthtime.
r362175 added some support for use of TimeCreate.
This patch completes support of va_birthtime by adding
support for setting this attribute to the server.
It also eanbles the client to
acquire and set the attribute for a NFSv4
server that supports the attribute.
Reviewed by: markj
MFC after: 2 weeks
Differential Revision: https://reviews.freebsd.org/D30156
Commit 4281bfec36 patched the server so that the
callback session slot would be free'd for reuse when
a callback attempt fails.
However, this can often result in the sequence# for
the session slot to be advanced such that the client
end will reply NFSERR_SEQMISORDERED.
To avoid the NFSERR_SEQMISORDERED client reply,
this patch negates the sequence# advance for the
case where the callback has failed.
The common case is a failed back channel, where
the callback cannot be sent to the client, and
not advancing the sequence# is correct for this
case. For the uncommon case where the client's
reply to the callback is lost, not advancing the
sequence# will indicate to the client that the
next callback is a retry and not a new callback.
But, since the FreeBSD server always sets "csa_cachethis"
false in the callback sequence operation, a retry
and a new callback should be handled the same way
by the client, so this should not matter.
Until you have this patch in your NFSv4.1/4.2 server,
you should consider avoiding the use of delegations.
Even with this patch, interoperation with the
Linux NFSv4.1/4.2 client in kernel versions prior
to 5.3 can result in frequent 15second delays if
delegations are enabled. This occurs because, for
kernels prior to 5.3, the Linux client does a TCP
reconnect every time it sees multiple concurrent
callbacks and then it takes 15seconds to recover
the back channel after doing so.
MFC after: 2 weeks
When the NFSv4.1/4.2 server does a callback to a client
on the back channel, it will use a session slot in the
back channel session. If the back channel has failed,
the callback will fail and, without this patch, the
session slot will not be released.
As more callbacks are attempted, all session slots
can become busy and then the nfsd thread gets stuck
waiting for a back channel session slot.
This patch frees the session slot upon callback
failure to avoid this problem.
Without this patch, the problem can be avoided by leaving
delegations disabled in the NFS server.
MFC after: 2 weeks
At a recent testing event I found out that I had misinterpreted
RFC5661 where it describes the stripe size in the File Layout's
nfl_util field. This patch fixes the pNFS File Layout server
so that it returns the correct value to the NFSv4.1/4.2 pNFS
enabled client.
This affects almost no one, since pNFS server configurations
are rare and the extant pNFS aware NFS clients seemed to
function correctly despite the erroneous stripe size.
It *might* be needed for correct behaviour if a recent
Linux client mounts a FreeBSD pNFS server configuration
that is using File Layout (non-mirrored configuration).
MFC after: 2 weeks
Recent testing of network partitioning a FreeBSD NFSv4.1
server from a Linux NFSv4.1 client identified problems
with both the FreeBSD server and Linux client.
The FreeBSD server failec to reply using the cached
reply in the session slot when an RPC was retried on
the session slot, as indicated by same slot sequence#.
This patch fixes this. It should also fix a similar
failure for NFSv4.0 mounts, when the sequence# in
the open/lock_owner requires a reply be done from
an entry locked into the DRC.
This fix affects the fairly rare case where a NFSv4
client retries a non-idempotent RPC, such as a lock
operation. Note that retries only occur after the
client has needed to create a new TCP connection.
MFC after: 2 weeks
Commit 01ae8969a9 stopped the NFSv4.1/4.2 server from implicitly
binding the back channel to a new TCP connection so that it
conforms to RFC5661, for NFSv4.1/4.2. An effect of this
for the Linux NFS client is that it will do a
BindConnectionToSession when it sees NFSV4SEQ_CBPATHDOWN
set in a sequence reply. This will fix the back channel, but the
first attempt at a callback like CB_RECALL will already have
failed. Without this patch, a CB_RECALL will not be retried
and that can result in a 5 minute delay until the delegation
times out.
This patch modifies the code so that it will retry the
CB_RECALL every couple of seconds, often avoiding the
5 minute delay.
This is not critical for correct behaviour, but avoids
the 5 minute delay for the case where the Linux client
re-binds the back channel via BindConnectionToSession.
MFC after: 2 weeks
Commit 01ae8969a9 stopped the NFSv4.1/4.2 server from implicitly
binding the back channel to a new TCP connection so that it
conforms to RFC5661, for NFSv4.1/4.2. An effect of this
for the Linux NFS client is that it will do a
BindConnectionToSession when it sees NFSV4SEQ_CBPATHDOWN
set in a sequence reply. It will do this for every RPC
reply until it no longer sees the flag.
Without that patch, this will happen until the client does
an Open, which will clear LCL_CBDOWN.
This patch clears LCL_CBDOWN right away, so that
NFSV4SEQ_CBPATHDOWN will no longer be sent to the client
in Sequence replies and the Linux client will not repeat
the BindConnectionToSession RPCs.
This is not critical for correct behaviour, but reduces
RPC overheads for cases where the Open will not be done
for a while.
MFC after: 2 weeks
The NFSv4.1 (and 4.2 on 13) server incorrectly binds
a new TCP connection to the back channel when first
used by an RPC with a Sequence op in it (almost all of them).
RFC5661 specifies that only the fore channel should be bound.
This was done because early clients (including FreeBSD)
did not do the required BindConnectionToSession RPC.
Unfortunately, this breaks the Linux client when the
"nconnects" mount option is used, since the server
may do a callback on the incorrect TCP connection.
This patch converts the server behaviour to that
required by the RFC. It also makes the server test/indicate
failure of the back channel more aggressively.
Until this patch is applied to the server, the
"nconnects" mount option is not recommended for a Linux
NFSv4.1/4.2 client mount to the FreeBSD server.
Reported by: bcodding@redhat.com
Tested by: bcodding@redhat.com
PR: 254560
MFC after: 1 week
Apply VOP_VPUT_PAIR() to the end of vnode operations after the
VOP_MKNOD(), VOP_MKDIR(), VOP_LINK(), VOP_SYMLINK(), VOP_CREATE().
Reviewed by: chs, mckusick
Tested by: pho
MFC after: 2 weeks
Sponsored by: The FreeBSD Foundation
Add KASSERTS to nfsm_trimtrailing() to confirm the sanity of
the arguments for the M_EXTPG case.
Suggested by: kib
Reviewed by: kib
Differential Revision: https://reviews.freebsd.org/D28053