When the MDS of a pNFS service receives an Open/Create
and the file already exists, it must do a Setattr of
size == 0. Without this patch, this was eroneously
done via a VOP_SETAATR() call, which would set the
length of the MDS file to 0 (which is already is,
since all data lives on the DSs).
This patch fixes the problem by doing a nfsvno_setattr()
instead of VOP_SETATTR(), which knows to do a proxied
Setattr on the DSs.
For a non-pNFS server, the change has no effect, since
nfsvno_setattr() only does a VOP_SETATTR() for that case.
This was found during a recent IETF NFSv4 testing event.
MFC after: 2 weeks
Without this patch the NFSv4.1/4.2 server erroneously
always frees session slot zero for callbacks. This only
affects 4.1/4.2 mounts if the server has delegations
enabled or is a pNFS configuration. Even for those
cases, the effect is mainly to only use slot 0 for
callbacks, serializing all of them. There is a slight
chance that callbacks will fail if the client performs
them in a different order than received on the TCP
connection.
If this bug affects your server, you will see console
messages like:
newnfs_request: Bad session slot
This patch fixes the problem. Found during a recent
IETF NFSv4 testing event.
PR: 263728
MFC after: 2 weeks
Robert Morris reported that, for the case of SecinfoNoname
with the Parent option, providing a non-directory could
cause a crash.
This patch adds a sanity check for v_type == VDIR for
this case, to avoid the crash.
Reported by: rtm@lcs.mit.edu
PR: 260300
MFC after: 2 weeks
The Fsinfo RPC is exempt from the check for
Kerberized NFS being required, as recommended
by RFC2623. However, there is no reason to
exempt Fsinfo from the requirement to use TLS.
This patch fixes the code so that the exemption
only applies to Kerberized NFS and not
NFS-over-TLS.
This only affects NFS-over-TLS for an NFSv3
mount when it is required, but the client does
not do so.
MFC after: 1 month
The ESXi NFSv4.1 client bogusly sends the wrong value
for the csa_sequence argument for a Create_session operation.
RFC8881 requires this value to be the same as the sequence
reply from the ExchangeID operation most recently done for
the client ID.
Without this patch, the server replies NFSERR_STALECLIENTID,
which is the correct response for an NFSv4.0 SetClientIDConfirm
but is not the correct error for NFSv4.1/4.2, which is
specified as NFSERR_SEQMISORDERED in RFC8881.
This patch fixes this.
This change does not fix the issue reported in the PR, where
the ESXi client loops, attempting ExchangeID/Create_session
repeatedly.
Reported by: asomers
Tested by: asomers
PR: 261291
MFC after: 1 week
Commit b0b7d978b6 changed the NFSv4 server's default
behaviour to check the file's mode or ACL for permission to
open the file, to be Linux and Solaris compatible.
However, it turns out that Linux makes an exception for
the case of Claim_delegate_cur(_fh).
When a NFSv4 client is returning a delegation, it must
acquire Opens against the server to replace the ones
done locally in the client. The client does this via
an Open operation with Claim_delegate_cur(_fh). If
this operation fails, due to a change to the file's
mode or ACL after the delegation was issued, the
client does not have any way to retain the open.
As such, the Linux client allows the file's owner
to perform an Open with Claim_delegate_cur(_fh)
no matter what the mode or ACL allows.
This patch makes the FreeBSD server allow this case,
to be Linux compatible.
This patch only affects the case where delegations
are enabled, which is not the default.
MFC after: 2 weeks
The UFS and ZFS file systems only support Allow/Deny ACEs
in the NFSv4 ACLs. This patch does not allow the server
to parse Audit/Alarm ACEs. The NFSv4 client is still
allowed to pase Audit/Alarm ACEs, since non-FreeBSD NFSv4
servers may use them.
This patch should not have a significant effect, since the
UFS and ZFS file systems will not handle these ACEs anyhow.
It simply serves as an additional "safety belt" for the
NFSv4 server.
MFC after: 2 weeks
This reverts commit 0fa074b53e.
I now see that the implementation of the "dacl" operation
requires that the NFSv4 server to "automatic inheritance"
and I do not plan on doing this. As such, this patch is
harmless, but unneeded.
Before this callouts were scheduled twice a seconds even if nfsd was
never used. This reduces the rate to ~1Hz and only after nfsd first
started.
MFC after: 2 weeks
NFSv4.1/4.2 has an alternative to the acl attribute, called
dacl, that includes support for the ACL_ENTRY_INHERITED flag,
called NFSV4ACE_INHERITED in NFSv4.
This patch adds a dacl argument to nfsrv_buildacl(),
nfsrv_dissectacl() and nfsrv_dissectace(), so that they
will handle NFSV4ACE_INHERITED when dacl == true.
Since these functions are always called with dacl == false
for this patch, semantics should not have changed.
A future patch will add support for dacl.
MFC after: 2 weeks
I thought that these new auth_stat values had been agreed
upon by the IETF NFSv4 working group, but that no longer
is the case. As such, delete them and use AUTH_TOOWEAK
instead. Leave the code that uses these new auth_stat
values in the sources #ifdef notnow, in case they are
defined in the future.
MFC after: 1 week
The cookies argument is only used by the NFS server. NFSv2 defines the
cookie as 32 bits on the wire, but NFSv3 increased it to 64 bits. Our
VOP_READDIR, however, has always defined it as u_long, which is 32 bits
on some architectures. Change it to 64 bits on all architectures. This
doesn't matter for any in-tree file systems, but it matters for some
FUSE file systems that use 64-bit directory cookies.
PR: 260375
Reviewed by: rmacklem
Differential Revision: https://reviews.freebsd.org/D33404
In NFSv2, the directory cookie was 32-bits. NFSv3 widened it to
64-bits and SVN r22521 widened the corresponding argument in
VOP_READDIR, but FreeBSD's NFS server continued to treat the cookies as
32-bits, and 0-extended to fill the field on the wire. Nobody ever
noticed, because every in-tree file system generates cookies that fit
comfortably within 32-bits.
Also, have better type safety for txdr_hyper. Turn it into an inline
function that type-checks its arguments. Prevents warnings about
shift-count-overflow.
PR: 260375
MFC after: 2 weeks
Reviewed by: rmacklem
Differential Revision: https://reviews.freebsd.org/D33404
This patch decrements maxcnt by the appropriate
number of bytes during parsing and checks to see
if there is data remaining. If not, it just returns
from nfsrv_flexlayouterr() without further processing.
This prevents the tl pointer from running off the end
of the error data pointed at by layp, if there are
flaws in the data.
Reported by: rtm@lcs.mit.edu
Tested by: rtm@lcs.mit.edu
PR: 260293
MFC after: 2 weeks
This reverts commit 266f97b5e9, reversing
changes made to a10253cffe.
A mismerge of a merge to catch up to main resulted in files being
committed which should not have been.
When nfsrv_checksequence() replies NFSERR_BADSLOT,
the value of nd_slotid is not valid. As such, the
reply cannot be cached in the session.
Do not set ND_HASSEQUENCE for this case.
Reported by: rtm@lcs.mit.edu
Tested by: rtm@lcs.mit.edu
PR: 260076
MFC after: 2 weeks
For most of these warnings, the variable is loaded
with data parsed out of an RPC messages. In case
the data is useful in the future, I just marked
these with __unused.
The check for the original len being >= retlen needs to
be done before the "if (nd->nd_repstat == 0)" code, so
that it can be reported as too small.
Reported by: rtm@lcs.mit.edu
Tested by: rtm@lcs.mit.edu
PR: 260046
MFC after: 2 weeks
For a LayoutReturn when using the Flexible File Layout,
error reports may be provided in the request.
Sanity check the size of these error reports and
check that they exist before calling nfsrv_flexlayouterr().
Reported by: rtm@lcs.mit.edu
Tested by: rtm@lcs.mit.edu
PR: 260012
MFC after: 2 weeks
For pNFS, Layouts are issued by the server to indicate
where a file's data resides on the DS(s). This patch
adds counters for how many layouts are allocated to
the nfsstatsv1 structure, using two reserved fields.
MFC after: 2 weeks
If a pNFS server's DS runs out of disk space, it replies
NFSERR_NOSPC to the client doing writing. For the Linux
client, it then sends a LayoutError RPC to the MDS server to
tell it about the error and keeps retrying, doing repeated
LayoutGets to the MDS and Write RPCs to the DS. The Linux client is
"stuck" until disk space on the DS is free'd up unless
a subsequent LayoutGet request is sent a NFSERR_NOSPC
reply.
The looping problem still occurs for NFSv4.1 mounts, but no
fix for this is known at this time.
This patch changes the pNFS MDS server to reply to LayoutGet
operations with NFSERR_NOSPC once a LayoutError reports the
problem, until the DS has available space. This keeps the Linux
NFSv4.2 from looping.
Found during recent testing because of issues w.r.t. a DS
being out of space found during a recent IEFT NFSv4 working
group testing event.
MFC after: 2 weeks
If a pNFS server's DS runs out of disk space, it replies
NFSERR_NOSPC to the client doing writing. For the Linux
client, it then sends a LayoutError RPC to the server to
tell it about the error and keeps retrying, doing repeated
LayoutGet and Write RPCs to the DS. The Linux client is
"stuck" until disk space on the DS is free'd up.
For a mirrored server configuration, the first mirror that
ran out of space was taken offline. This does not make
much sense, since the other mirror(s) will run out of space
soon and the fix is a manual cleanup up disk space.
This patch changes the pNFS server to not disable a mirror
for the mirrored case when this occurs.
Further work is needed, since the Linux client expects the
MDS to reply NFSERR_NOSPC to LayoutGets once the DS is out
of space. Without this further change, the above mentioned
looping occurs.
Found during a recent IEFT NFSv4 working group testing event.
MFC after: 2 weeks
When the NFSv4.2 server does a VOP_ALLOCATE(), it needs
the operation to be done for the RPC's credential and not
td_ucred. It also needs the writing to be done synchronously.
This patch adds "ioflag" and "cred" arguments to VOP_ALLOCATE()
and modifies vop_stdallocate() to use these arguments.
The VOP_ALLOCATE.9 man page will be patched separately.
Reviewed by: khng, kib
Differential Revision: https://reviews.freebsd.org/D32865
Some exported file systems, such as ZFS ones, cannot do VOP_ALLOCATE().
Since an NFSv4.2 server must either support the Allocate operation for
all file systems or not support it at all, define a sysctl called
vfs.nfsd.enable_v42allocate to enable the Allocate operation.
This sysctl is false by default and can only be set true if all
exported file systems (or all DSs for a pNFS server) can perform
VOP_ALLOCATE().
Unfortunately, there is no way to know if a ZFS file system will
be exported once the nfsd is operational, even if there are none
exported when the nfsd is started up, so enabling Allocate must
be done manually for a server configuration.
This problem was detected during a recent NFSv4 interoperability
testing event held by the IETF working group.
MFC after: 2 weeks
For a pNFS server configuration, an NFSv4.2 Deallocate operation
is proxied to the DS(s). The code that parsed the reply for the
proxy RPC is broken and did not process the pre-operation attributes.
This patch fixes this problem.
This bug would only affect pNFS servers built from recent main/FreeBSD14
sources.
By default NFS server reports as scope and owner major the host UUID
value and zero for owner minor. It works good in case of standalone
server. But in case of CARP-based HA cluster failover the values
should remain persistent, otherwise some clients like VMware ESXi
get confused by the change and fail to reconnect automatically.
The patch makes server scope, major owner and minor owner values
configurable via sysctls. If not set (by default) the host UUID
value is still used.
Reviewed by: rmacklem
MFC after: 2 weeks
Differential Revision: https://reviews.freebsd.org/D31952
Although it is not specified in the RFCs, the concept that
the NFSv4 server should reply to an RPC request within a
reasonable time is accepted practice within the NFSv4 community.
Without this patch, the NFSv4.2 server attempts to reply to
a Copy operation within 1 second by limiting the copy to
vfs.nfs.maxcopyrange bytes (default 10Mbytes). This is crude at
best, given the large variation in I/O subsystem performance.
This patch uses the COPY_FILE_RANGE_TIMEO1SEC flag added by
commit c5128c48df to limit the reply time for a Copy
operation to approximately 1 second.
MFC after: 2 weeks
The NFSv4.2 Deallocate operation loops on VOP_DEALLOCATE()
while progress is being made (remaining length decreasing).
This patch changes the loop on VOP_ALLOCATE() for the NFSv4.2
Allocate operation do the same, instead of stopping after
an arbitrary 20 iterations.
MFC after: 2 weeks
The recently added VOP_DEALLOCATE(9) VOP call allows
implementation of the Deallocate NFSv4.2 operation.
Since the Deallocate operation is a single succeed/fail
operation, the call to VOP_DEALLOCATE(9) loops so long
as progress is being made. It calls maybe_yield()
between loop iterations to allow other processes
to preempt it.
Where RFC 7862 underspecifies behaviour, the code
is written to be Linux NFSv4.2 server compatible.
Reviewed by: khng
Differential Revision: https://reviews.freebsd.org/D31624
The NFSv4.2 Allocate operation sanity checks the aa_offset
and aa_length arguments. Since they are assigned to variables
of type off_t (signed) it was possible for them to be negative.
It was also possible for aa_offset+aa_length to exceed OFF_MAX
when stored in lo_end, which is uint64_t.
This patch adds checks for these cases to the sanity check.
Reviewed by: kib
MFC after: 2 weeks
Differential Revision: https://reviews.freebsd.org/D31511
Since MAXPHYS now allows the FreeBSD NFS client
to do 1Mbyte I/O operations, add a sysctl called vfs.nfsd.srvmaxio
so that the maximum NFS server I/O size can be set up to 1Mbyte.
The Linux NFS client can also do 1Mbyte I/O operations.
The default of 128Kbytes for the maximum I/O size has
not been changed for two reasons:
- kern.ipc.maxsockbuf must be increased to support 1Mbyte I/O
- The limited benchmarking I can do actually shows a drop in I/O rate
when the I/O size is above 256Kbytes.
However, daveb@spectralogic.com reports seeing an increase
in I/O rate for the 1Mbyte I/O size vs 128Kbytes using a Linux client.
Reviewed by: asomers
MFC after: 2 weeks
Differential Revision: https://reviews.freebsd.org/D30826
Linux has had an "nconnect" NFS mount option for some time.
It specifies that N (up to 16) TCP connections are to created for a mount,
instead of just one TCP connection.
A discussion on freebsd-net@ indicated that this could improve
client<-->server network bandwidth, if either the client or server
have one of the following:
- multiple network ports aggregated to-gether with lagg/lacp.
- a fast NIC that is using multiple queues
It does result in using more IP port#s and might increase server
peak load for a client.
One difference from the Linux implementation is that this implementation
uses the first TCP connection for all RPCs composed of small messages
and uses the additional TCP connections for RPCs that normally have
large messages (Read/Readdir/Write). The Linux implementation spreads
all RPCs across all TCP connections in a round robin fashion, whereas
this implementation spreads Read/Readdir/Write across the additional
TCP connections in a round robin fashion.
Reviewed by: markj
MFC after: 2 weeks
Differential Revision: https://reviews.freebsd.org/D30970
Michael Dexter <editor@callfortesting.org> reported
a crash in FreeNAS, where the first argument to
clnt_bck_svccall() was no longer valid.
This argument is a pointer to the callback CLIENT
structure, which is free'd when the associated
NFSv4 ClientID is free'd.
This appears to have occurred because a callback
reply was still in the socket receive queue when
the CLIENT structure was free'd.
This patch acquires a reference count on the CLIENT
that is not CLNT_RELEASE()'d until the socket structure
is destroyed. This should guarantee that the CLIENT
structure is still valid when clnt_bck_svccall() is called.
It also adds a check for closed or closing to
clnt_bck_svccall() so that it will not process the callback
RPC reply message after the ClientID is free'd.
Comments by: mav
MFC after: 2 weeks
Differential Revision: https://reviews.freebsd.org/D30153
Commit d224f05fcf pre-parsed the next operation number for
the put file handle operations. This patch uses this next
operation number, plus the type of the file handle being set by
the put file handle operation, to implement the rules in RFC5661
Sec. 2.6 with respect to replying NFSERR_WRONGSEC.
This patch also adds a check to see if NFSERR_WRONGSEC should be
replied when about to perform Lookup, Lookupp or Open with a file
name component, so that the NFSERR_WRONGSEC reply is done for
these operations, as required by RFC5661 Sec. 2.6.
This patch does not have any practical effect for the FreeBSD NFSv4
client and I believe that the same is true for the Linux client,
since NFSERR_WRONGSEC is considered a fatal error at this time.
MFC after: 2 weeks
Commit 947bd2479b added support for the Secinfo_no_name operation.
When a non-exported file system is being traversed, the list of
security flavors is empty. It turns out that the Linux client
mount attempt fails when the security flavors list in the
Secinfo_no_name reply is empty.
This patch modifies Secinfo/Secinfo_no_name so that it replies
with all four security flavors when the list is empty.
This fixes Linux NFSv4.1/4.2 mounts when the file system at
the NFSv4 root (as specified on a V4: exports(5) line) is
not exported.
MFC after: 2 weeks
RFC5661 Sec. 2.6 specifies when a NFSERR_WRONGSEC error reply can be done.
For the four operations PutFH, PutrootFH, PutpublicFH and RestoreFH,
NFSERR_WRONGSEC can or cannot be replied, depending upon what operation
follows one of these operations in the compound.
This patch modifies nfsrvd_compound() so that it parses the next operation
number before executing any of the above four operations, storing it in
"nextop".
A future commit will implement use of "nextop" to decide if NFSERR_WRONGSEC
can be replied for the above four operations.
This commit should not change the semantics of performing the compound RPC.
MFC after: 2 weeks
Without this patch, nfsd_checkrootexp() returns failure
and then the NFSv4 operation would reply NFSERR_WRONGSEC.
RFC5661 Sec. 2.6 only allows a few NFSv4 operations, none
of which call nfsv4_checktootexp(), to return NFSERR_WRONGSEC.
This patch modifies nfsd_checkrootexp() to return the
error instead of a boolean and sets the returned error to an RPC
layer AUTH_ERR, as discussed on nfsv4@ietf.org.
The patch also fixes nfsd_errmap() so that the pseudo
error NFSERR_AUTHERR is handled correctly such that an RPC layer
AUTH_ERR is replied to the NFSv4 client.
The two new "enum auth_stat" values have not yet been assigned
by IANA, but are the expected next two values.
The effect on extant NFSv4 clients of this change appears
limited to reporting a different failure error when a
mount that does not use adequate security is attempted.
MFC after: 2 weeks
There are several NFSv4.1/4.2 server operation functions which
have unneeded checks for the NFSv4 root being set up.
The checks are not needed because the operations always follow
a Sequence operation, which performs the check.
This patch deletes these checks, simplifying the code so
that a future patch that fixes the checks to conform with
RFC5661 Sec. 2.6 will be less extension.
MFC after: 2 weeks
The Linux client is now attempting to use the Secinfo_no_name
operation for NFSv4.1/4.2 mounts. Although it does not seem to
mind the NFSERR_NOTSUPP reply, adding support for it seems
reasonable.
I also noticed that "savflag" needed to be 64bits in
nfsrvd_secinfo() since nd_flag in now 64bits, so I changed
the declaration of it there. I also added code to set "vp" NULL
after performing Secinfo/Secinfo_no_name, since these
operations consume the current FH, which is represented
by "vp" in nfsrvd_compound().
Fixing when the server replies NFSERR_WRONGSEC so that
it conforms to RFC5661 Sec. 2.6 still needs to be done
in a future commit.
MFC after: 2 weeks
Commit b3d4c70dc6 added support for CLAIM_DELEG_CUR_FH to Open.
While doing this, I noticed that CLAIM_DELEG_PREV_FH support
could be added the same way. Although I am not aware of any extant
NFSv4.1/4.2 client that uses this claim type, it seems prudent to add
support for this variant of Open to the NFSv4.1/4.2 server.
This patch does not affect mounts from extant NFSv4.1/4.2 clients,
as far as I know.
MFC after: 2 weeks
The Linux NFSv4.1/4.2 client now uses the CLAIM_DELEG_CUR_FH
variant of the Open operation when delegations are recalled and
the client has a local open of the file. This patch adds
support for this variant of Open to the NFSv4.1/4.2 server.
This patch only affects mounts from Linux clients when delegations
are enabled on the server.
MFC after: 2 weeks
Commit 7a606f280a allowed the server to do retries of CB_RECALL
callbacks every couple of seconds. This was needed to allow the
Linux client to re-establish the back channel.
However this patch broke the delegation timeout check, such that
it would just keep retrying CB_RECALLS.
If the client has crashed or been network patitioned from the
server, this continues until the client TCP reconnects to
the server and re-establishes the back channel.
This patch modifies the code such that it still times out the
delegation recall after some minutes, so that the server will
allow the conflicting client request once the delegation times out.
This patch only affects the NFSv4 server when delegations are
enabled and a NFSv4 client that holds a delegation has crashed
or been network partitioned from the server for at least several
minutes when a delegation needs to be recalled.
MFC after: 2 weeks
There is a NFSv4 file attribute called TimeCreate
that can be used for va_birthtime.
r362175 added some support for use of TimeCreate.
This patch completes support of va_birthtime by adding
support for setting this attribute to the server.
It also eanbles the client to
acquire and set the attribute for a NFSv4
server that supports the attribute.
Reviewed by: markj
MFC after: 2 weeks
Differential Revision: https://reviews.freebsd.org/D30156