Commit Graph

255 Commits

Author SHA1 Message Date
rmacklem
0d33c81f6d Fix the NFSv4 server to obey vfs.nfsd.nfs_privport.
When the NFSv4 server was coded, I believed that the specification authors
did not want NFSv4 servers to require a client to use a reserved port#.
However, recently it has been noted that the Linux NFSv4 server does support
a check for a reserved port#.
Since both the FreeBSD and Linux NFSv4 clients use a reserved port# by
default, enabling vfs.nfsd.nfs_privport to require a reserved port# for
NFSv4 the same as it does for NFSv2, 3 seems reasonable.
The only case where this could cause a POLA violation is a FreeBSD NFSv4
server with vfs.nfsd.nfs_privport set, but with NFSv4 clients doing mounts
without using a reserved port# (< 1024).

Tested by:	chaz.newton58@gmail.com
PR:		234106
MFC after:	1 week
2018-12-20 22:21:41 +00:00
mjg
7e31d1de7e Remove unused argument to priv_check_cred.
Patch mostly generated with cocinnelle:

@@
expression E1,E2;
@@

- priv_check_cred(E1,E2,0)
+ priv_check_cred(E1,E2)

Sponsored by:	The FreeBSD Foundation
2018-12-11 19:32:16 +00:00
rmacklem
8a42e50bb3 Improve sanity checking for the dircount hint argument to
NFSv3's ReaddirPlus and NFSv4's Readdir operations. The code
checked for a zero argument, but did not check for a very large value.
This patch clips dircount at the server's maximum data size.

MFC after:	1 week
2018-11-20 01:59:57 +00:00
rmacklem
89ba9a8c9e r304026 added code that started statistics gathering for an operation
before the operation number (the variable called "op") was sanity checked.
This patch moves the code down to below the range sanity check for "op".
2018-11-20 01:52:45 +00:00
avg
c98294c273 nfsrvd_readdirplus: for some errors, do not fail the entire request
Instead, a failing entry is skipped.
This change consist of two logical changes.

A failure to vget or lookup an entry is considered to be a result of a
concurrent removal, which is the only reasonable explanation given that
the filesystem is busied.  So, the entry would be silently skipped.

In the case of a failure to get attributes of an entry for an NFSv3
request, the entry would be silently skipped.  There can be legitimate
reasons for the failure, but NFSv3 does not provide any means to report
the error, so we have two options: either fail the whole request or
ignore the failed entry.  Traditionally, the old NFS server used the
latter option, so the code is reverted to it.  Making the whole
directory unreadable because of a single entry seems to be unpractical.

Additionally, some bits of code are slightly re-arranged to account for
the new control flow and to honor style(9).

Reviewed by:	rmacklem
Sponsored by:	Panzura
Differential Revision: https://reviews.freebsd.org/D15424
2018-10-22 15:33:05 +00:00
rmacklem
a0aee6c7bd Fix the pNFS server's reporting of disk space usage for the "#<path>" case.
The pNFS server would report the total disk space used and free for all
of the DSs, even when certain DSs are assigned to the file system via
the "#<path>" suffix used in the "nfsd -p" option argument.
This patch fixes this case. It only reports usage for the file system
that the argument vnode resides on. This is consistent with the non-pNFS
NFSv4 server. In NFSv4 it is possible to have subtrees on other file
systems, but these are not included in the usage information for NFSv4.

Approved by:	re (gjb)
2018-10-09 01:10:50 +00:00
rmacklem
453e23bd43 Fix LORs between vn_start_write() and vn_lock() in nfsrv_copymr().
When coding the pNFS server, I added vn_start_write() calls in nfsrv_copymr()
done while the vnodes were locked, not realizing I had introduced LORs and
possible deadlock when an exported file system on the MDS is suspended.
This patch fixes the LORs by moving the vn_start_write() calls up to before
where the vnodes are locked. For "tvp", the vn_start_write() probaby isn't
necessary, because NFS mounts can't be suspended. However, I think doing
so is harmless.
Thanks go to kib@ for letting me know that I had introduced these LORs.
This patch only affects the behaviour of the pNFS server when pnfsdscopymr(8)
is used to recover a mirrored DS.
2018-08-18 19:14:06 +00:00
rmacklem
4178db1cdc Fix LORs between vn_start_write() and vn_lock() in the pNFS server.
When coding the pNFS server, I added several vn_start_write() calls done
while the vnode was locked, not realizing I had introduced LORs and
possible deadlock when an exported file system on the MDS is suspended.
This patch fixes this by removing the added vn_start_write() calls and
modifying the code so that the extant vn_start_write() call before the
NFS RPC/operation is done when needed by the pNFS server.
Flags are changed so that LayoutCommit and LayoutReturn now get a
vn_start_write() done for them.
When the pNFS server is enabled, the code now also changes the flags for
Getattr, so that the vn_start_write() is done for Getattr, since it may
need to do a vn_set_extattr(). The nfs_writerpc flag array was made global
to the NFS server and renamed nfsrv_writerpc, which is consistent naming
for globals in the NFS server.
Thanks go to kib@ for reporting that doing vn_start_write() while the vnode is
locked results in a LOR.
This patch only affects the behaviour of the pNFS server.
2018-08-17 21:12:16 +00:00
rmacklem
624b46f401 Don't set a file's size for the MDS file of a pNFS service.
When a pNFS service is running, the size of the files created on the MDS
are normally 0, since the data is written to the data files on the DS(s).
However, without this patch, if a Setattr with a non-zero size was done by
a client, the MDS file was set to that size.  This was thought to be benign,
but it turns out that files with a non-zero size plus extended attributes
can cause a "ffs_truncate3" panic in UFS. Although the exact cause of this
panic() has not been isolated, this patch avoids the panic() and leaves
the MDS files in a consistent state of always having a size == 0.
Note that these MDS files never store data. The patch also includes an
unnecessary initialization of savsize in case some compiler or static
analyser complains it might not be initialized.
This patch only affects the NFS server when pNFS is enabled via the "-p"
command line option on nfsd.
2018-08-17 12:32:38 +00:00
rmacklem
db4bc2514c Assorted fixes to handling of LayoutRecall callbacks, mostly error handling.
After a re-read of the appropriate section of RFC5661, I decided that a
few things should be changed related to LayoutRecall callback handling.
Here are the things fixed by this patch.
- For two of the three cases that LayoutRecall is done, I now think
  setting the clora_changed argument false is correct.
- All errors other than NFSERR_DELAY returned by LayoutRecall appear
  permanent, so don't retry for any of them. (NFSERR_DELAY is retried by
  newnfs_request(), so it is not affected by this patch.)
- Instead of waiting "forever" (actually until the process is SIGTERM'd)
  for Layouts to be returned during a mirror copy, fail and return
  ENXIO after about 1minute.
  Waiting for a <ctrl>C made sense when pnfsdscopymr() was done by itself,
  but did not make sense when done via find(1).
This patch only affects the pNFS server.
2018-08-08 20:21:45 +00:00
rmacklem
6e654ddb19 Copy all bits of a file handle in case there is padding in the structure.
At least on x86, fhandle_t is a packed structure, so I believe an
assignment will copy all the bits. However, for some current/future
architectures, there might be padding in the structure that doesn't get
copied via an assignment.
Since NFS assumes a file handle is an opaque blob of bits that can be
compared via memcmp()/bcmp(), all the bits including any padding must be
copied.
This patch replaces the assignments with a call to a byte copy function.
Spotted during code inspection.
2018-08-05 19:21:50 +00:00
rmacklem
fa822f0042 Silence newer gcc warnings.
Newer versions of gcc generate "set, but not used" warnings in the NFS server.
Add __unused macros to silence these warnings.

Requested by:	mmacy
2018-07-29 21:51:17 +00:00
rmacklem
5fb1d5502d Modify the NFSv4.1 server so that it allows ReclaimComplete as done by ESXi 6.7.
I believe that a ReclaimComplete with rca_one_fs == TRUE is only
to be used after a file system has been transferred to a different
file server.  However, RFC5661 is somewhat vague w.r.t. this and
the ESXi 6.7 client does both a ReclaimComplete with rca_one_fs == TRUE
and one with ReclaimComplete with rca_one_fs == FALSE.
Therefore, just ignore the rca_one_fs == TRUE operation and return
NFS_OK without doing anything instead of replying NFS4ERR_NOTSUPP.
This allows the ESXi 6.7 NFSv4.1 client to do a mount.
After discussion on the NFSv4 IETF working group mailing list, doing this
along with setting a flag to note that a ReclaimComplete with rca_one_fs TRUE
was an appropriate way to handle this.
The flag that indicates that a ReclaimComplete with rca_one_fs == TRUE was
done may be used to disable replies of NFS4ERR_GRACE for non-reclaim
state operations in a future commit.

This patch along with r332790, r334492 and r336357 allow ESXi 6.7 NFSv4.1 mounts
work ok. ESX 6.5 NFSv4.1 mounts do not work well, due to what I believe are
violations of RFC-5661 and should not be used.

Reported by:	andreas.nagy@frequentis.com
Tested by:	andreas.nagy@frequentis.com, daniel@ftml.net (earlier version)
MFC after:	2 weeks
Relnotes:	yes
2018-07-28 20:21:04 +00:00
rmacklem
4d66080e75 Modify the reasons for not issuing a delegation in the NFSv4.1 server.
The ESXi NFSv4.1 client will generate warning messages when the reason for
not issuing a delegation is two. Two refers to a resource limit and I do
not see why it would be considered invalid. However it probably was not the
best choice of reason for not issuing a delegation.
This patch changes the reasons used to ones that the ESXi client doesn't
complain about. This change does not affect the FreeBSD client and does
not appear to affect behaviour of the Linux NFSv4.1 client.
RFC5661 defines these "reasons" but does not give any guidance w.r.t. which
ones are more appropriate to return to a client.

Tested by:	andreas.nagy@frequentis.com
PR:		226650
MFC after:	2 weeks
2018-07-16 21:32:50 +00:00
rmacklem
7ad0403b62 Ignore the cookie verifier for NFSv4.1 when the cookie is 0.
RFC5661 states that the cookie verifier should be 0 when the cookie is 0.
However, the wording is somewhat unclear and a recent discussion on the
nfsv4@ietf.org mailing list indicated that the NFSv4 server should ignore
the cookie verifier's value when the dirctory offset cookie is 0.
This patch deletes the check for this that would return NFSERR_BAD_COOKIE
when the verifier was not 0.
This was found during testing of the ESXi client against the NFSv4.1 server.

Reported by:	daniel@ftml.net (via packet trace)
MFC after:	2 weeks
2018-07-11 23:23:29 +00:00
rmacklem
2af903ca13 Add support for a "forced" pnfsdskill to the pNFS server kernel code.
The pnfsdskill(8) command will normally fail if there is no valid mirror
for the DS to be disabled. However, a system administrator may need to
disable a DS which does not have a valid mirror so that the nfsd threads
can be terminated. This patch adds the kernel code needed by pnfsdskill(8)
to implement this "forced" case of disabling a DS.
This patch only affects the pNFS server.
2018-07-09 19:58:01 +00:00
rmacklem
da485b59a5 Fix the kernel part of pnfsdscopymr() to handle holes in the file being copied.
If a mirrored DS is being recovered that has a lot of large sparse files,
pnfsdscopymr(8) would use a lot of space on the recovered mirror since it
would write the "holes" in the file being mirrored.
This patch adds code to check for a "hole" and skip doing the write.
The check is done on a "per PNFSDS_COPYSIZ size block", which is currently 64K.
I think that most file server file systems will be using a blocksize at
least this large. If the file server is using a smaller blocksize and
smaller holes need to be preserved, PNFSDS_COPYSIZ could be decreased.
The block of 0s is malloc()d, since pnfsdcopymr(8) should be an infrequent
occurrence.
2018-07-08 18:15:55 +00:00
rmacklem
356c1de1ef Fix handling of the hybrid DS case for a pNFS server.
After the addition of the "#mds_path" suffix for a DS specification on the
"-p" nfsd option, it is possible to have a mix of DSs assigned to an MDS
file system and DSs that store files for all DSs. This is what I referred
to as "hybrid" above.
At first, I didn't think this hybrid case would be useful, but I now believe
that some system administrators may fine it useful.
This patch modifies the file storage assignment algorithm so that it
makes the "#mds_path" DSs take priority and the all file systems DSs
are now only used for MDS file systems with no "#mds_path" DS servers.
This only affects the pNFS server for this "hybrid" case.
2018-07-07 19:27:49 +00:00
rmacklem
f28840fbcf Change the pNFS server so that it does not disable a mirrored DS for
an NFSERR_STALE error reported via a LayoutReturn.

The current FreeBSD client can generate these errors for an operational
DS while doing a recovery of a mirror after a mirrored DS has been repaired.
I am not sure why these errors occur, but my best current guess is a race
between the Layout Recall issued by the kernel code run from pnfsdscopymr(8)
and a Read operation on the DS for the file bing copied.
The errors are not fatal, since the client falls back on doing I/O through
the MDS, which can do the I/O successfully as a proxy. (The fact that the
MDS can do this indicates that the file does still exist on the functioning
DS.)
This change only affects the pNFS server and only when a client does a
LayoutReturn with the NFSERR_STALE error report.
2018-07-06 19:18:45 +00:00
rmacklem
bc2294b707 Fix the pNFS server so that it handles the "#mds_path" check for mirrors.
The recently added feature of the pNFS server will set an fsid for the
MDS file system to define the file system a DS should store files for.
For a case where a DS handling all file systems has failed, it was possible
for the code to check for a mirror with a specified fs, even though
nfsdev_mdsisset was 0, possibly causing a false successful check for a mirror.
This patch adds a check for nfsdev_mdsisset != 0 to avoid this.
It only affects the pNFS server for a rare case. Found via code inspection.
2018-07-04 19:46:26 +00:00
rmacklem
59f9a8892a Add an optional feature to the pNFS server.
Without this patch, the pNFS server distributes the data storage files across
all of the specified DSs.
A tester noted that it would be nice if a system administrator could control
which DSs are used to store the file data for a given exported MDS file system.
This patch adds the kernel support to do this. It also makes a slight semantic
change to nfsv4_findmirror(), since some uses of it no longer require that
the DS being searched for have a current mirror.
A patch that will be committed in a few minutes will modify the nfsd daemon
to support this feature.
The patch should only affect sites using the pNFS server (specified via the
"-p" command line option for nfsd.

Suggested by:	james.rose@framestore.com
2018-07-02 19:21:33 +00:00
rmacklem
9892d91051 Fix the pNFS server for a case where mirror level equals number of DSs.
If a pNFS service was set up where the number of DSs equals the mirror level
and then a DS was disabled, the service would create files with duplicate
entries for the same DS. This bug occurred because I didn't realize that
TAILQ_FOREACH_FROM() would start at the beginning of the list when the
inital value of the variable was NULL.
This patch also changes the pNFS server DS file creation code so that it
creates entrie(s) with 0.0.0.0 IP address when it cannot create mirror level
files due to lack of DSs.
The patch only affects the pNFS service and only when it was created with
a number of DSs equal to the mirror level and mirroring is enabled.
2018-06-29 12:41:36 +00:00
rmacklem
3dd7911419 Add a counter to limit the number of disabled DSs for a mirrored pNFS MDS.
This patch adds a counter that limits the number of disabled mirrored DSs
to mirror level - 1.  It also makes a small change that keeps a Write that
has failed with EACCES when attempted by a client to a DS from disabling
the DS.
This patch only affects the pNFS server.
2018-06-22 00:55:39 +00:00
rmacklem
77f312d46a Merge the pNFS server code from projects/pnfs-planb-server into head.
This code merge adds a pNFS service to the NFSv4.1 server. Although it is
a large commit it should not affect behaviour for a non-pNFS NFS server.
Some documentation on how this works can be found at:
http://people.freebsd.org/~rmacklem/pnfs-planb-setup.txt
and will hopefully be turned into a proper document soon.
This is a merge of the kernel code. Userland and man page changes will
come soon, once the dust settles on this merge.
It has passed a "make universe", so I hope it will not cause build problems.
It also adds NFSv4.1 server support for the "current stateid".

Here is a brief overview of the pNFS service:
A pNFS service separates the Read/Write oeprations from all the other NFSv4.1
Metadata operations. It is hoped that this separation allows a pNFS service
to be configured that exceeds the limits of a single NFS server for either
storage capacity and/or I/O bandwidth.
It is possible to configure mirroring within the data servers (DSs) so that
the data storage file for an MDS file will be mirrored on two or more of
the DSs.
When this is used, failure of a DS will not stop the pNFS service and a
failed DS can be recovered once repaired while the pNFS service continues
to operate.  Although two way mirroring would be the norm, it is possible
to set a mirroring level of up to four or the number of DSs, whichever is
less.
The Metadata server will always be a single point of failure,
just as a single NFS server is.

A Plan B pNFS service consists of a single MetaData Server (MDS) and K
Data Servers (DS), all of which are recent FreeBSD systems.
Clients will mount the MDS as they would a single NFS server.
When files are created, the MDS creates a file tree identical to what a
single NFS server creates, except that all the regular (VREG) files will
be empty. As such, if you look at the exported tree on the MDS directly
on the MDS server (not via an NFS mount), the files will all be of size 0.
Each of these files will also have two extended attributes in the system
attribute name space:
pnfsd.dsfile - This extended attrbute stores the information that
    the MDS needs to find the data storage file(s) on DS(s) for this file.
pnfsd.dsattr - This extended attribute stores the Size, AccessTime, ModifyTime
    and Change attributes for the file, so that the MDS doesn't need to
    acquire the attributes from the DS for every Getattr operation.
For each regular (VREG) file, the MDS creates a data storage file on one
(or more if mirroring is enabled) of the DSs in one of the "dsNN"
subdirectories.  The name of this file is the file handle
of the file on the MDS in hexadecimal so that the name is unique.
The DSs use subdirectories named "ds0" to "dsN" so that no one directory
gets too large. The value of "N" is set via the sysctl vfs.nfsd.dsdirsize
on the MDS, with the default being 20.
For production servers that will store a lot of files, this value should
probably be much larger.
It can be increased when the "nfsd" daemon is not running on the MDS,
once the "dsK" directories are created.

For pNFS aware NFSv4.1 clients, the FreeBSD server will return two pieces
of information to the client that allows it to do I/O directly to the DS.
DeviceInfo - This is relatively static information that defines what a DS
             is. The critical bits of information returned by the FreeBSD
             server is the IP address of the DS and, for the Flexible
             File layout, that NFSv4.1 is to be used and that it is
             "tightly coupled".
             There is a "deviceid" which identifies the DeviceInfo.
Layout     - This is per file and can be recalled by the server when it
             is no longer valid. For the FreeBSD server, there is support
             for two types of layout, call File and Flexible File layout.
             Both allow the client to do I/O on the DS via NFSv4.1 I/O
             operations. The Flexible File layout is a more recent variant
             that allows specification of mirrors, where the client is
             expected to do writes to all mirrors to maintain them in a
             consistent state. The Flexible File layout also allows the
             client to report I/O errors for a DS back to the MDS.
             The Flexible File layout supports two variants referred to as
             "tightly coupled" vs "loosely coupled". The FreeBSD server always
             uses the "tightly coupled" variant where the client uses the
             same credentials to do I/O on the DS as it would on the MDS.
             For the "loosely coupled" variant, the layout specifies a
             synthetic user/group that the client uses to do I/O on the DS.
             The FreeBSD server does not do striping and always returns
             layouts for the entire file. The critical information in a layout
             is Read vs Read/Writea and DeviceID(s) that identify which
             DS(s) the data is stored on.

At this time, the MDS generates File Layout layouts to NFSv4.1 clients
that know how to do pNFS for the non-mirrored DS case unless the sysctl
vfs.nfsd.default_flexfile is set non-zero, in which case Flexible File
layouts are generated.
The mirrored DS configuration always generates Flexible File layouts.
For NFS clients that do not support NFSv4.1 pNFS, all I/O operations
are done against the MDS which acts as a proxy for the appropriate DS(s).
When the MDS receives an I/O RPC, it will do the RPC on the DS as a proxy.
If the DS is on the same machine, the MDS/DS will do the RPC on the DS as
a proxy and so on, until the machine runs out of some resource, such as
session slots or mbufs.
As such, DSs must be separate systems from the MDS.

Tested by:	james.rose@framestore.com
Relnotes:	yes
2018-06-12 19:36:32 +00:00
rmacklem
d080b0360c Revert r334586 since I now think __unused is the better way to handle this. 2018-06-04 11:35:04 +00:00
rmacklem
c5c0d85fbc Fix a gcc8 warning about a write only variable.
gcc8 warns that "verf" was set but not used. This was because the code
that uses it is disabled via a "#if 0".
This patch adds a "#if 0" to the variable's declaration and assignment
to get rid of the warning.
This way the code could be re-enabled without difficulty.

Requested by:	mmacy
MFC after:	2 weeks
2018-06-03 19:46:44 +00:00
rmacklem
3cc36cf922 Add the BindConnectiontoSession operation to the NFSv4.1 server.
Under some fairly unusual circumstances, the Linux NFSv4.1 client is
doing a BindConnectiontoSession operation for TCP connections.
It is also used by the ESXi6.5 NFSv4.1 client.
This patch adds this operation to the NFSv4.1 server.

Reported by:	andreas.nagy@frequentis.com
Tested by:	andreas.nagy@frequentis.com
MFC after:	2 weeks
2018-06-01 19:47:41 +00:00
rmacklem
7e7e0a78a9 Strengthen locking for the NFSv4.1 server DestroySession operation.
If a client did a DestroySession on a session while it was still in use,
the server might try to use the session structure after it is free'd.
I think the client has violated RFC5661 if it does this, but this patch
makes DestroySession block all other nfsd threads so no thread could
be using the session when it is free'd. After the DestroySession, nfsd
threads will not be able to find the session. The patch also adds a check
for nd_sessionid being set, although if that was not the case it would have
been all 0s and unlikely to have a false match.
This might fix the crashes described in PR#228497 for the FreeNAS server.

PR:		228497
MFC after:	1 week
2018-05-30 20:16:17 +00:00
rmacklem
fafcbcec14 Add a missing nfsrv_freesession() call for an unlikely failure case.
Since NFSv4.1 clients normally create a single session which supports
both fore and back channels, it is unlikely that a callback will fail
due to a lack of a back channel.
However, if this failure occurred, the session wasn't being dereferenced
and would never be free'd.
Found by inspection during pNFS server development.

Tested by:	andreas.nagy@frequentis.com
MFC after:	2 months
2018-05-17 21:17:20 +00:00
rmacklem
098223ca00 End grace for the NFSv4 server if all mounts do ReclaimComplete.
The NFSv4 protocol requires that the server only allow reclaim of state
and not issue any new open/lock state for a grace period after booting.
The NFSv4.0 protocol required this grace period to be greater than the
lease duration (over 2minutes). For NFSv4.1, the client tells the server
that it has done reclaiming state by doing a ReclaimComplete operation.
If all NFSv4 clients are NFSv4.1, the grace period can end once all the
clients have done ReclaimComplete, shortening the time period considerably.
This patch does this. If there are any NFSv4.0 mounts, the grace period
will still be over 2minutes.
This change is only an optimization and does not affect correct operation.

Tested by:	andreas.nagy@frequentis.com
MFC after:	2 months
2018-05-15 20:28:50 +00:00
rmacklem
082a58b8e3 Fix the eir_server_scope reply argument for NFSv4.1 ExchangeID.
In the reply to an ExchangeID operation, the NFSv4.1 server returns a
"scope" value (eir_server_scope). If this value is the same, it indicates
that two servers share state, which is never the case for FreeBSD servers.
As such, the value needs to be unique and it was without this patch.
However, I just found out that it is not supposed to change when the
server reboots and without this patch, it did change.
This patch fixes eir_server_scope so that it does not change when the
server is rebooted.
The only affect not having this patch has is that Linux clients don't
reclaim opens and locks after a server reboot, which meant they lost
any byte range locks held before the server rebooted.
It only affects NFSv4.1 mounts and the FreeBSD NFSv4.1 client was not
affected by this bug.

MFC after:	1 week
2018-05-13 23:38:01 +00:00
rmacklem
de34a49ecf Fix a slow leak of session structures in the NFSv4.1 server.
For a fairly rare case of a client doing an ExchangeID after a hard reboot,
the old confirmed clientid still exists, but some clients use a new
co_verifier. For this case, the server was not freeing up the sessions on
the old confirmed clientid.
This patch fixes this case. It also adds two LIST_INIT() macros, which are
actually no-ops, since the structure is malloc()d with M_ZERO so the pointer
is already set to NULL.
It should have minimal impact, since the only way I could exercise this
code path was by doing a hard power cycle (pulling the plus) on a machine
running Linux with a NFSv4.1 mount on the server.
Originally spotted during testing of the ESXi 6.5 client.

Tested by:	andreas.nagy@frequentis.com
MFC after:	2 months
2018-05-13 12:42:53 +00:00
rmacklem
1e4675982e The NFSv4.1 server should return NFSERR_BACKCHANBUSY instead of NFS_OK.
When an NFSv4.1 session is busy due to a callback being in progress,
nfsrv_freesession() should return NFSERR_BACKCHANBUSY instead of NFS_OK.
The only effect this has is that the DestroySession operation will report
the failure for this case and this probably has little or no effect on a
client. Spotted by inspection and no failures related to this have been
reported.

MFC after:	2 months
2018-05-13 12:29:09 +00:00
rmacklem
962b30b569 Add support for the TestStateID operation to the NFSv4.1 server.
The Linux client now uses the TestStateID operation, so this patch adds
support for it to the NFSv4.1 server. The FreeBSD client never uses this
operation, so it should not be affected.

MFC after:	2 months
2018-05-11 22:16:23 +00:00
rmacklem
d3389fdd15 Revert r333183, since I am not sure that just initializing the
list is the correct thing to do and that is already done without
this commit.
2018-05-02 21:29:42 +00:00
rmacklem
713342ac10 Add two missing LIST_INIT()s.
This patch adds two missing LIST_INIT()s. Found by inspection.
In practice, these are currently no-ops, since the structure they are
in is malloc'd with M_ZERO and all LIST_INIT does is set the pointer
in the list head to NULL. (In other words, the M_ZERO has already
correctly initialized it.)

MFC after:	2 months
2018-05-02 20:36:11 +00:00
rmacklem
8cd51ec4a0 Fix OpenDowngrade for NFSv4.1 if a client sets the OPEN_SHARE_ACCESS_WANT* bits.
The NFSv4.1 RFC specifies that the OPEN_SHARE_ACCESS_WANT bits can be set
in the OpenDowngrade share_access argument and are basically ignored.
I do not know of a extant NFSv4.1 client that does this, but this little
patch fixes it just in case.
It also changes the error from NFSERR_BADXDR to NFSERR_INVAL since the NFSv4.1
RFC specifies this as the error to be returned if bogus bits are set.
(The NFSv4.0 RFC didn't specify any error for this, so the error reply can
 be changed for NFSv4.0 as well.)
Found by inspection while looking at a problem with OpenDowngrade reported
for the ESXi 6.5 NFSv4.1 client.

Reported by:	andreas.nagy@frequentis.com
PR:		227214
MFC after:	1 week
2018-04-19 20:30:33 +00:00
cem
e83cbe4f78 nfs: Remove NFSSOCKADDRALLOC, NFSSOCKADDRFREE macros
They were just thin wrappers over malloc(9) w/ M_ZERO and free(9).

Discussed with:	rmacklem, markj
Sponsored by:	Dell EMC Isilon
2018-01-25 22:38:39 +00:00
cem
c060d198e3 style: Remove remaining deprecated MALLOC/FREE macros
Mechanically replace uses of MALLOC/FREE with appropriate invocations of
malloc(9) / free(9) (a series of sed expressions).  Something like:

* MALLOC(a, b, ... -> a = malloc(...
* FREE( -> free(
* free((caddr_t) -> free(

No functional change.

For now, punt on modifying contrib ipfilter code, leaving a definition of
the macro in its KMALLOC().

Reported by:	jhb
Reviewed by:	cy, imp, markj, rmacklem
Sponsored by:	Dell EMC Isilon
Differential Revision:	https://reviews.freebsd.org/D14035
2018-01-25 22:25:13 +00:00
manu
77258f226e nfs: Do not printf each time a lock structure is freed during module unload
There can be a lot of those structures and printing a line each time we free
one on module unload.

MFC after:	3 days
2018-01-18 15:28:49 +00:00
jhb
0aaf564f94 Use long for the last argument to VOP_PATHCONF rather than a register_t.
pathconf(2) and fpathconf(2) both return a long.  The kern_[f]pathconf()
functions now accept a pointer to a long value rather than modifying
td_retval directly.  Instead, the system calls explicitly store the
returned long value in td_retval[0].

Requested by:	bde
Reviewed by:	kib
Sponsored by:	Chelsio Communications
2018-01-17 22:36:58 +00:00
kan
c8da6fae2c Do pass removing some write-only variables from the kernel.
This reduces noise when kernel is compiled by newer GCC versions,
such as one used by external toolchain ports.

Reviewed by: kib, andrew(sys/arm and sys/arm64), emaste(partial), erj(partial)
Reviewed by: jhb (sys/dev/pci/* sys/kern/vfs_aio.c and sys/kern/kern_synch.c)
Differential Revision: https://reviews.freebsd.org/D10385
2017-12-25 04:48:39 +00:00
rmacklem
7a197a7ade Avoid the overhead of acquiring a lock in nfsrv_checkgetattr() when
there are no write delegations issued.

manu@ reported on the freebsd-current@ mailing list that there was
a significant performance hit in nfsrv_checkgetattr() caused by
the acquisition/release of a state lock, even when there were no
write delegations issued.
This patch add a count of outstanding issued write delegations to the
NFSv4 server. This count allows nfsrv_checkgetattr() to return without
acquiring any lock when the count is 0, avoiding the performance hit
for the case where no write delegations are issued.

Reported by:	manu
Reviewed by:	kib
MFC after:	2 weeks
Differential Revision:	https://reviews.freebsd.org/D13327
2017-12-04 21:50:27 +00:00
pfg
43f5681c36 sys/fs: further adoption of SPDX licensing ID tags.
Mainly focus on files that use BSD 2-Clause license, however the tool I
was using misidentified many licenses so this was mostly a manual - error
prone - task.

The Software Package Data Exchange (SPDX) group provides a specification
to make it easier for automated tools to detect and summarize well known
opensource licenses. We are gradually adopting the specification, noting
that the tags are considered only advisory and do not, in any way,
superceed or replace the license texts.
2017-11-27 15:15:37 +00:00
pfg
4736ccfd9c sys: further adoption of SPDX licensing ID tags.
Mainly focus on files that use BSD 3-Clause license.

The Software Package Data Exchange (SPDX) group provides a specification
to make it easier for automated tools to detect and summarize well known
opensource licenses. We are gradually adopting the specification, noting
that the tags are considered only advisory and do not, in any way,
superceed or replace the license texts.

Special thanks to Wind River for providing access to "The Duke of
Highlander" tool: an older (2014) run over FreeBSD tree was useful as a
starting point.
2017-11-20 19:43:44 +00:00
rmacklem
da67d313c5 Fix the client IP address reported by nfsdumpstate for 64bit arch and NFSv4.1.
The client IP address was not being reported for some NFSv4 mounts by
nfsdumpstate. Upon investigation, two problems were found for mounts
using IPv4. One was that the code (originally written and tested on i386)
assumed that a "u_long" was a "uint32_t" and would exactly store an
IPv4 host address. Not correct for 64bit arches.
Also, for NFSv4.1 mounts, the field was not being filled in. This was
basically correct, because NFSv4.1 does not use a callback address.
However, it meant that nfsdumpstate could not report the client IP addr.
This patch should fix both of these issues.
For IPv6, the address will still not be reported. The original NFSv4 RFC
only specified IPv4 callback addresses. I think this has changed and, if so,
a future commit to fix reporting of IPv6 addresses will be needed.

Reported by:	manu
PR:		223036
MFC after:	2 weeks
2017-10-15 22:22:27 +00:00
rmacklem
a18386826f Change a panic to an error return.
There was a panic() in the NFS server's write operation that didn't
need to be a panic() and could just be an error return.
This patch makes that change.
Found by code inspection during development of the pNFS service.

MFC after:	2 weeks
2017-09-24 20:05:48 +00:00
mav
c2a28dc66d Improve FHA locality control for NFS read/write requests.
This change adds two new tunables, allowing to control serialization for
read and write NFS requests separately.  It does not change the default
behavior since there are too many factors to consider, but gives additional
space for further experiments and tuning.

The main motivation for this change is very low write speed in case of ZFS
with sync=always or when NFS clients requests sychronous operation, when
every separate request has to be written/flushed to ZIL, and requests are
processed one at a time.  Setting vfs.nfsd.fha.write=0 in that case allows
to increase ZIL throughput by several times by coalescing writes and cache
flushes.  There is a worry that doing it may increase data fragmentation
on disks, but I suppose it should not happen for pool with SLOG.

MFC after:	1 week
Sponsored by:	iXsystems, Inc.
2017-07-31 15:23:19 +00:00
trasz
04dc5d07ae Rename vfs.nfsd.enable_uidtostring to vfs.nfs.enable_uidtostring.
It applies to both NFS client and NFS server, and is useful for both.
This is different from vfs.nfsd.enable_stringtouid, which is specific
to server side.

Reviewed by:	rmacklem@
MFC after:	2 weeks
Sponsored by:	DARPA, AFRL
2017-07-19 09:59:32 +00:00
trasz
cdfada8c4c Add vfs.nfsd.nfsd_enable_uidtostring, which works just like
vfs.nfsd.nfsd_enable_stringtouid, but in reverse - when set to 1,
it forces the NFSv4 server to return numeric UIDs and GIDs instead
of "user@domain" strings. This helps with clients that can't
translate returned identifiers, eg when rerooting.

The same can be achieved by just never running nfsuserd(8),
but the sysctl is useful to toggle the behaviour back and forth
without rebooting.

Reviewed by:	rmacklem (earlier version)
MFC after:	2 weeks
Sponsored by:	DARPA, AFRL
Differential Revision:	https://reviews.freebsd.org/D11326
2017-06-26 13:11:21 +00:00