Commit Graph

57 Commits

Author SHA1 Message Date
Rick Macklem
e3e7c612f3 Replace mbuf macros with the code they would generate in the NFS code.
When the code was ported to Mac OS/X, mbuf handling functions were
converted to using the Mac OS/X accessor functions. For FreeBSD, they
are a simple set of macros in sys/fs/nfs/nfskpiport.h.
Since porting to Mac OS/X is no longer a consideration, replacement of
these macros with the code generated by them makes the code more
readable.
When support for external page mbufs is added as needed by the KERN_TLS,
the patch becomes simpler if done without the macros.

This patch should not result in any semantic change.

This is the final patch of this series and the macros should now be
able to be deleted from the .h files in a future commit.
2020-04-11 23:37:58 +00:00
Rick Macklem
9f6624d317 Replace mbuf macros with the code they would generate in the NFS code.
When the code was ported to Mac OS/X, mbuf handling functions were
converted to using the Mac OS/X accessor functions. For FreeBSD, they
are a simple set of macros in sys/fs/nfs/nfskpiport.h.
Since porting to Mac OS/X is no longer a consideration, replacement of
these macros with the code generated by them makes the code more
readable.
When support for external page mbufs is added as needed by the KERN_TLS,
the patch becomes simpler if done without the macros.

This patch should not result in any semantic change.
2020-04-11 20:57:15 +00:00
Mark Johnston
355b3b7fd7 Simplify td_ucred handling in newnfs_connect().
No functional change intended.

MFC after:	1 week
2020-03-26 15:02:56 +00:00
Rick Macklem
8e1906f700 Fix kernel handling of a NFSERR_MINORVERSMISMATCH NFSv4 server reply.
When an NFSv4 server replies NFSERR_MINORVERSMISMATCH, it does not generate
a status result for the first operation in the compound. Without this
patch, this will result in a bogus EBADXDR error return.
Returning EBADXDR is relatively harmless, but a correct reply of
NFSERR_MINORVERSMISMATCH is needed by the pNFS client to select the correct
minor version to use for a File Layout DS now that there can be NFSv4.2
DS servers.

mount_nfs.c still needs to be fixed for this, although how the mount fails
is only useful to help sysadmins isolate why a mount fails.

Found during testing of the NFSv4.2 client and server.

MFC after:	2 weeks
2019-12-08 00:06:00 +00:00
Rick Macklem
02c8dd7d72 Revert r320698, since the related userland changes were reverted by r338192.
r338192 reverted the changes to nfsuserd so that it could use an AF_LOCAL
socket, since it resulted in a vnode locking panic().
Post r338192 nfsuserd daemons use the old AF_INET socket for upcalls and
do not use these kernel changes.
I left them in for a while, so that nfsuserd daemons built from head sources
between r320757 (Jul. 6, 2017) and r338192 (Aug. 22, 2018) would need them
by default.
This only affects head, since the changes were never MFC'd.
I will add an UPDATING entry, since an nfsuserd daemon built from head
sources between r320757 and r338192 will not run unless the "-use-udpsock"
option is specified. (This command line option is only in the affected
revisions of the nfsuserd daemon.)

I suspect few will be affected by this, since most who run systems built
from head sources (not stable or releases) will have rebuilt their nfsuserd
daemon from sources post r338192 (Aug. 22, 2018)

This is being reverted in preparation for an update to include AF_INET6
support to the code.
2019-04-04 23:30:27 +00:00
Rick Macklem
93df87f208 Allow newnfs_request() to retry all callback RPCs with an NFSERR_DELAY reply.
The code in newnfs_request() retries RPCs that get a reply of NFSERR_DELAY,
but exempts certain NFSv4 operations. However, for callback RPCs, there
should not be any exemptions at this time. The code would have erroneously
exempted the CBRECALL callback, since it has the same operation number as
the CLOSE operation.
This patch fixes this by checking for a callback RPC (indicated by clp != NULL)
and not checking for exempt operations for callbacks.
This would have only affected the NFSv4 server when delegations are enabled
(they are not enabled by default) and the client replies to CBRECALL with
NFSERR_DELAY. This may never actually happen.
Spotted during code inspection.

MFC after:	2 weeks
2018-08-07 21:29:14 +00:00
Rick Macklem
cecf6c6e9c Set CLSET_TIMEOUT on TCP connections to pNFS DSs.
Use CLSET_TIMEOUT to set the timeout for connections to DSs instead of
specifying a timeout on each RPC. This is done so that SO_SNDTIMEO
is set on the TCP socket as well as specifying a time limit when
waiting for an RPC reply.  Useful if the send queue for the TCP
connection has become constipated, due to a failed DS.
The choice of lease_duration / 4 is fairly arbitrary, but seems to work
ok, with a lower bound of 10sec.
For client connections to a DS, set the retry limit to vfs.nfsd.dsretries,
which is 2 by default.
This patch should only affect pNFS connections to DSs.
This patch requires r336542.

MFC after:	2 weeks
2018-07-21 01:33:07 +00:00
Rick Macklem
ba6cce3aea Fix the handling of NFSv4.1 sessions for "soft" mounts.
When a "soft" mount is used for NFSv4.1, an RPC that fails without completing
will leave a slot in the NFSv4.1 session in an indeterminate state.
As such, all that can be done is free up the slot while making is no longer
usable.
A "soft" NFSv4.1 mount is not recommended in general, since it will leave
Open/Lock state in an indeterminate state. An exception is a pNFS mount of
a DS, since there are no Opens/Locks done for them except file creates
where loss of the Open state does not matter.
The patch also makes connections to DSs soft, so that they will fail when
a DS is non-functional or network partitioned, allowing the pNFS MDS to disable
the DS for a mirrored configuration.
This patch should not affect normal "hard" NFSv4.1 mounts.

MFC after:	2 weeks
2018-06-22 21:37:20 +00:00
Rick Macklem
755e4b7936 Revert r335263, since it can cause crashes in unusual circumstances.
This needs to be fixed in a different way.
2018-06-17 23:08:54 +00:00
Rick Macklem
46d30d3d9c Fix NFSv4.1 client side handling of "soft,retrans=2" mounts.
Normally "soft,retrans=2" cannot be safely used on NFSv4 mounts, since
the RPC can fail and leave the open/lock state in an undefined state.
Doing I/O on a pNFS DS is an exception to this, since no open/lock state
is maintained on the DS server.
It is useful to do "soft,retrans=2" connections to a DS when it is mirrored,
so that the client can detect failure of the DS. As such, mounts from the MDS
to the DSs should use these mount options when mirroring is enabled.
However, the NFSv4.1 client still leaves the session in an undefined state
when this happens.
This patch fixes the problem by setting the session defunct, so it will
no longer be used.
The patch also sets "retries=2" on the connections done by the client to
a DS, which is the internal equivalent of "soft,retrans=2".
The client does not know if the server implements mirroring at connection
time, but always doing this should be safe, since it will fall back on doing
I/O via the MDS as a proxy when there is a failure doing an I/O RPC to the DS.

This patch should not affect non-pNFS client mounts.

MFC after:	2 weeks
2018-06-16 19:45:06 +00:00
Rick Macklem
73b1879c2d Add a couple of safety belt checks to the NFSv4.1 client related to sessions.
There were a couple of cases in newnfs_request() that it assumed that it
was an NFSv4.1 mount with a session. This should always be the case when
a Sequence operation is in the reply or the server replies NFSERR_BADSESSION.
However, if a server was broken and sent an erroneous reply, these safety
belt checks should avoid trouble.
The one check required a small tweak to nfsmnt_mdssession() so that it
returns NULL when there is no session instead of the offset of the field
in the structure (0x8 for i386).
This patch should have no effect on normal operation of the client.
Found by inspection during pNFS server development.

MFC after:	2 weeks
2018-06-11 19:00:07 +00:00
Conrad Meyer
222daa421f style: Remove remaining deprecated MALLOC/FREE macros
Mechanically replace uses of MALLOC/FREE with appropriate invocations of
malloc(9) / free(9) (a series of sed expressions).  Something like:

* MALLOC(a, b, ... -> a = malloc(...
* FREE( -> free(
* free((caddr_t) -> free(

No functional change.

For now, punt on modifying contrib ipfilter code, leaving a definition of
the macro in its KMALLOC().

Reported by:	jhb
Reviewed by:	cy, imp, markj, rmacklem
Sponsored by:	Dell EMC Isilon
Differential Revision:	https://reviews.freebsd.org/D14035
2018-01-25 22:25:13 +00:00
Alexander Kabaev
151ba7933a Do pass removing some write-only variables from the kernel.
This reduces noise when kernel is compiled by newer GCC versions,
such as one used by external toolchain ports.

Reviewed by: kib, andrew(sys/arm and sys/arm64), emaste(partial), erj(partial)
Reviewed by: jhb (sys/dev/pci/* sys/kern/vfs_aio.c and sys/kern/kern_synch.c)
Differential Revision: https://reviews.freebsd.org/D10385
2017-12-25 04:48:39 +00:00
Pedro F. Giffuni
51369649b0 sys: further adoption of SPDX licensing ID tags.
Mainly focus on files that use BSD 3-Clause license.

The Software Package Data Exchange (SPDX) group provides a specification
to make it easier for automated tools to detect and summarize well known
opensource licenses. We are gradually adopting the specification, noting
that the tags are considered only advisory and do not, in any way,
superceed or replace the license texts.

Special thanks to Wind River for providing access to "The Duke of
Highlander" tool: an older (2014) run over FreeBSD tree was useful as a
starting point.
2017-11-20 19:43:44 +00:00
Rick Macklem
63918d3848 Fix forced dismount when a pNFS mount is hung on a DS.
When a "pnfs" NFSv4.1 mount is hung because of an unresponsive DS,
a forced dismount wouldn't work, because the RPC socket for the DS
was not being closed. This patch fixes this.
This will only affect "pnfs" mounts where the pNFS server's DS
is unresponsive (crashed or network partitioned or...).
Found during testing of the pNFS server.

MFC after:	2 weeks
2017-10-10 21:05:40 +00:00
Rick Macklem
16f300fa4a Replace the checks for MNTK_UNMOUNTF with a macro that does the same thing.
This patch defines a macro that checks for MNTK_UNMOUNTF and replaces
explicit checks with this macro. It has no effect on semantics, but
prepares the code for a future patch where there will also be a
NFS specific flag for "forced dismount about to occur".

Suggested by:	kib
MFC after:	2 weeks
2017-07-27 20:55:31 +00:00
Rick Macklem
25d694a6fa Add support for AF_LOCAL socket upcalls to the nfsuserd daemon.
This patch adds support for AF_LOCAL socket upcalls to an nfsuserd daemon
that supports them. A future patch to the nfsuserd daemon will use AF_LOCAL
sockets to avoid a problem when using upcalls to 127.0.0.1 if jails are
in use.

Suggested by:	dfr
PR:		205193
2017-07-06 00:53:12 +00:00
Rick Macklem
ee791357a2 Add the definition of maxbcachebuf to sys/buf.h.
r320070 removed the definition of maxbcachebuf from sys/param.h to
fix the build for arm.
This patch adds the definition of maxbcachebuf to sys/buf.h, which
should be ok, since sys/buf.h is not being included in arm/arm/elf_note.S.

Suggested by:	kib
MFC after:	2 weeks
2017-06-19 22:07:53 +00:00
Rick Macklem
1d9f01b18e Take "extern int maxbcachebuf" out of sys/param.h, since it breaks the
arm build.

In the arm build, elf_note.S includes sys/param.h and then does an
elf macro called ELFNOTE(). Although the compile error doesn't make
sense to me, I believe it just means that an "extern ..." can't exist
in param.h for this inclusion case.
I suspect adding #if !defined(LOCORE) might fix the build, but this
commit just takes the definition out.
I will ask freebsd-current@ what is the best was to deal with this
and do a subsequent commit after that.

Reported by:	melounmichal@gmail.com
2017-06-18 12:28:43 +00:00
Rick Macklem
d1c5e240a8 Make MAXBCACHEBUF a tunable called vfs.maxbcachebuf.
By making MAXBCACHEBUF a tunable, it can be increased to allow for
larger read/write data sizes for the NFS client.
The tunable is limited to MAXPHYS, which is currently 128K.
Making MAXPHYS a tunable or increasing its value is being discussed,
since it would be nice to support a read/write data size of 1Mbyte
for the NFS client when mounting the AmazonEFS file service.

Reviewed by:	kib
MFC after:	2 weeks
Relnotes:	yes
Differential Revision:	https://reviews.freebsd.org/D10991
2017-06-17 22:24:19 +00:00
Rick Macklem
0596f343f8 Don't set ND_NOMOREDATA for a failed Setattr operation (NFSv4).
The NFSv4 Setattr operation always has reply data even when it fails,
so don't set the ND_NOMOREDATA for it. This would only affect unusual
cases where Setattr fails and the RPC code wants to parse the rest of
the compound. Detected during recent development related to the pNFS server.

MFC after:	2 weeks
2017-04-21 23:01:32 +00:00
Rick Macklem
40f8ff4800 Don't create a backchannel for a DS connection.
An NFSv4.1 client connection to a Data Server (DS) should not have a
backchannel. This patch fixes the NFSv4.1/pNFS client to not do a backchannel
for this case.
Found during recent testing with the pNFS server under development.

MFC after:	2 weeks
2017-04-21 22:38:26 +00:00
Warner Losh
fbbd9655e5 Renumber copyright clause 4
Renumber cluase 4 to 3, per what everybody else did when BSD granted
them permission to remove clause 3. My insistance on keeping the same
numbering for legal reasons is too pedantic, so give up on that point.

Submitted by:	Jan Schaumann <jschauma@stevens.edu>
Pull Request:	https://github.com/freebsd/freebsd/pull/96
2017-02-28 23:42:47 +00:00
Rick Macklem
b2fc0141d9 Fix NFSv4.1 client recovery from NFS4ERR_BAD_SESSION errors.
For most NFSv4.1 servers, a NFS4ERR_BAD_SESSION error is a rare failure
that indicates that the server has lost session/open/lock state.
However, recent testing by cperciva@ against the AmazonEFS server found
several problems with client recovery from this due to it generating this
failure frequently.
Briefly, the problems fixed are:
- If all session slots were in use at the time of the failure, some processes
  would continue to loop waiting for a slot on the old session forever.
- If an RPC that doesn't use open/lock state failed with NFS4ERR_BAD_SESSION,
  it would fail the RPC/syscall instead of initiating recovery and then
  looping to retry the RPC.
- If a successful reply to an RPC for an old session wasn't processed
  until after a new session was created for a NFS4ERR_BAD_SESSION error,
  it would erroneously update the new session and corrupt it.
- The use of the first element of the session list in the nfs mount
  structure (which is always the current metadata session) was slightly
  racey. With changes for the above problems it became more racey, so all
  uses of this head pointer was wrapped with a NFSLOCKMNT()/NFSUNLOCKMNT().
- Although the kernel malloc() usually allocates more bytes than requested
  and, as such, this wouldn't have caused problems, the allocation of a
  session structure was 1 byte smaller than it should have been.
  (Null termination byte for the string not included in byte count.)

There are probably still problems with a pNFS data server that fails
with NFS4ERR_BAD_SESSION, but I have no server that does this to test
against (the AmazonEFS server doesn't do pNFS), so I can't fix these yet.

Although this patch is fairly large, it should only affect the handling
of NFS4ERR_BAD_SESSION error replies from an NFSv4.1 server.
Thanks go to cperciva@ for the extension testing he did to help isolate/fix
these problems.

Reported by:	cperciva
Tested by:	cperciva
MFC after:	3 months
Differential Revision:	https://reviews.freebsd.org/D8745
2016-12-23 23:14:53 +00:00
Rick Macklem
1b819cf265 Update the nfsstats structure to include the changes needed by
the patch in D1626 plus changes so that it includes counts for
NFSv4.1 (and the draft of NFSv4.2).
Also, make all the counts uint64_t and add a vers field at the
beginning, so that future revisions can easily be implemented.
There is code in place to handle the old vesion of the nfsstats
structure for backwards binary compatibility.

Subsequent commits will update nfsstat(8) to use the new fields.

Submitted by:	will (earlier version)
Reviewed by:	ken
MFC after:	1 month
Relnotes:	yes
Differential Revision:	https://reviews.freebsd.org/D1626
2016-08-12 22:44:59 +00:00
Pedro F. Giffuni
02abd40029 kernel: use our nitems() macro when it is available through param.h.
No functional change, only trivial cases are done in this sweep,

Discussed in:	freebsd-current
2016-04-19 23:48:27 +00:00
Rick Macklem
c15882f091 Remove the old NFS client and server from head,
which means that the NFSCLIENT and NFSSERVER
kernel options will no longer work. This commit
only removes the kernel components. Removal of
unused code in the user utilities will be done
later. This commit does not include an addition
to UPDATING, but that will be committed in a
few minutes.

Discussed on: freebsd-fs
2014-12-23 00:47:46 +00:00
Rick Macklem
c59e4cc34d Merge the NFSv4.1 server code in projects/nfsv4.1-server over
into head. The code is not believed to have any effect
on the semantics of non-NFSv4.1 server behaviour.
It is a rather large merge, but I am hoping that there will
not be any regressions for the NFS server.

MFC after:	1 month
2014-07-01 20:47:16 +00:00
Attilio Rao
54366c0bd7 - For kernel compiled only with KDTRACE_HOOKS and not any lock debugging
option, unbreak the lock tracing release semantic by embedding
  calls to LOCKSTAT_PROFILE_RELEASE_LOCK() direclty in the inlined
  version of the releasing functions for mutex, rwlock and sxlock.
  Failing to do so skips the lockstat_probe_func invokation for
  unlocking.
- As part of the LOCKSTAT support is inlined in mutex operation, for
  kernel compiled without lock debugging options, potentially every
  consumer must be compiled including opt_kdtrace.h.
  Fix this by moving KDTRACE_HOOKS into opt_global.h and remove the
  dependency by opt_kdtrace.h for all files, as now only KDTRACE_FRAMES
  is linked there and it is only used as a compile-time stub [0].

[0] immediately shows some new bug as DTRACE-derived support for debug
in sfxge is broken and it was never really tested.  As it was not
including correctly opt_kdtrace.h before it was never enabled so it
was kept broken for a while.  Fix this by using a protection stub,
leaving sfxge driver authors the responsibility for fixing it
appropriately [1].

Sponsored by:	EMC / Isilon storage division
Discussed with:	rstone
[0] Reported by:	rstone
[1] Discussed with:	philip
2013-11-25 07:38:45 +00:00
Rick Macklem
cc085ba84d During code inspection, I spotted that there was a code path where
CLNT_CONTROL() would be called on "client" after it was
released via CLNT_RELEASE(). It was unlikely that this
code path gets executed and I have not heard of any problem
report caused by this bug. This patch fixes the code so that
this cannot happen.

MFC after:	2 months
2013-11-03 23:17:30 +00:00
Rick Macklem
88a2437a65 Add support for host-based (Kerberos 5 service principal) initiator
credentials to the kernel rpc. Modify the NFSv4 client to add
support for the gssname and allgssname mount options to use this
capability. Requires the gssd daemon to be running with the "-h" option.

Reviewed by:	jhb
2013-07-09 01:05:28 +00:00
Kenneth D. Merry
d96b98a360 Revamp the old NFS server's File Handle Affinity (FHA) code so that
it will work with either the old or new server.

The FHA code keeps a cache of currently active file handles for
NFSv2 and v3 requests, so that read and write requests for the same
file are directed to the same group of threads (reads) or thread
(writes).  It does not currently work for NFSv4 requests.  They are
more complex, and will take more work to support.

This improves read-ahead performance, especially with ZFS, if the
FHA tuning parameters are configured appropriately.  Without the
FHA code, concurrent reads that are part of a sequential read from
a file will be directed to separate NFS threads.  This has the
effect of confusing the ZFS zfetch (prefetch) code and makes
sequential reads significantly slower with clients like Linux that
do a lot of prefetching.

The FHA code has also been updated to direct write requests to nearby
file offsets to the same thread in the same way it batches reads,
and the FHA code will now also send writes to multiple threads when
needed.

This improves sequential write performance in ZFS, because writes
to a file are now more ordered.  Since NFS writes (generally
less than 64K) are smaller than the typical ZFS record size
(usually 128K), out of order NFS writes to the same block can
trigger a read in ZFS.  Sending them down the same thread increases
the odds of their being in order.

In order for multiple write threads per file in the FHA code to be
useful, writes in the NFS server have been changed to use a LK_SHARED
vnode lock, and upgrade that to LK_EXCLUSIVE if the filesystem
doesn't allow multiple writers to a file at once.  ZFS is currently
the only filesystem that allows multiple writers to a file, because
it has internal file range locking.  This change does not affect the
NFSv4 code.

This improves random write performance to a single file in ZFS, since
we can now have multiple writers inside ZFS at one time.

I have changed the default tuning parameters to a 22 bit (4MB)
window size (from 256K) and unlimited commands per thread as a
result of my benchmarking with ZFS.

The FHA code has been updated to allow configuring the tuning
parameters from loader tunable variables in addition to sysctl
variables.  The read offset window calculation has been slightly
modified as well.  Instead of having separate bins, each file
handle has a rolling window of bin_shift size.  This minimizes
glitches in throughput when shifting from one bin to another.

sys/conf/files:
	Add nfs_fha_new.c and nfs_fha_old.c.  Compile nfs_fha.c
	when either the old or the new NFS server is built.

sys/fs/nfs/nfsport.h,
sys/fs/nfs/nfs_commonport.c:
	Bring in changes from Rick Macklem to newnfs_realign that
	allow it to operate in blocking (M_WAITOK) or non-blocking
	(M_NOWAIT) mode.

sys/fs/nfs/nfs_commonsubs.c,
sys/fs/nfs/nfs_var.h:
	Bring in a change from Rick Macklem to allow telling
	nfsm_dissect() whether or not to wait for mallocs.

sys/fs/nfs/nfsm_subs.h:
	Bring in changes from Rick Macklem to create a new
	nfsm_dissect_nonblock() inline function and
	NFSM_DISSECT_NONBLOCK() macro.

sys/fs/nfs/nfs_commonkrpc.c,
sys/fs/nfsclient/nfs_clkrpc.c:
	Add the malloc wait flag to a newnfs_realign() call.

sys/fs/nfsserver/nfs_nfsdkrpc.c:
	Setup the new NFS server's RPC thread pool so that it will
	call the FHA code.

	Add the malloc flag argument to newnfs_realign().

	Unstaticize newnfs_nfsv3_procid[] so that we can use it in
	the FHA code.

sys/fs/nfsserver/nfs_nfsdsocket.c:
	In nfsrvd_dorpc(), add NFSPROC_WRITE to the list of RPC types
	that use the LK_SHARED lock type.

sys/fs/nfsserver/nfs_nfsdport.c:
	In nfsd_fhtovp(), if we're starting a write, check to see
	whether the underlying filesystem supports shared writes.
	If not, upgrade the lock type from LK_SHARED to LK_EXCLUSIVE.

sys/nfsserver/nfs_fha.c:
	Remove all code that is specific to the NFS server
	implementation.  Anything that is server-specific is now
	accessed through a callback supplied by that server's FHA
	shim in the new softc.

	There are now separate sysctls and tunables for the FHA
	implementations for the old and new NFS servers.  The new
	NFS server has its tunables under vfs.nfsd.fha, the old
	NFS server's tunables are under vfs.nfsrv.fha as before.

	In fha_extract_info(), use callouts for all server-specific
	code.  Getting file handles and offsets is now done in the
	individual server's shim module.

	In fha_hash_entry_choose_thread(), change the way we decide
	whether two reads are in proximity to each other.
	Previously, the calculation was a simple shift operation to
	see whether the offsets were in the same power of 2 bucket.
	The issue was that there would be a bucket (and therefore
	thread) transition, even if the reads were in close
	proximity.  When there is a thread transition, reads wind
	up going somewhat out of order, and ZFS gets confused.

	The new calculation simply tries to see whether the offsets
	are within 1 << bin_shift of each other.  If they are, the
	reads will be sent to the same thread.

	The effect of this change is that for sequential reads, if
	the client doesn't exceed the max_reqs_per_nfsd parameter
	and the bin_shift is set to a reasonable value (22, or
	4MB works well in my tests), the reads in any sequential
	stream will largely be confined to a single thread.

	Change fha_assign() so that it takes a softc argument.  It
	is now called from the individual server's shim code, which
	will pass in the softc.

	Change fhe_stats_sysctl() so that it takes a softc
	parameter.  It is now called from the individual server's
	shim code.  Add the current offset to the list of things
	printed out about each active thread.

	Change the num_reads and num_writes counters in the
	fha_hash_entry structure to 32-bit values, and rename them
	num_rw and num_exclusive, respectively, to reflect their
	changed usage.

	Add an enable sysctl and tunable that allows the user to
	disable the FHA code (when vfs.XXX.fha.enable = 0).  This
	is useful for before/after performance comparisons.

nfs_fha.h:
	Move most structure definitions out of nfs_fha.c and into
	the header file, so that the individual server shims can
	see them.

	Change the default bin_shift to 22 (4MB) instead of 18
	(256K).  Allow unlimited commands per thread.

sys/nfsserver/nfs_fha_old.c,
sys/nfsserver/nfs_fha_old.h,
sys/fs/nfsserver/nfs_fha_new.c,
sys/fs/nfsserver/nfs_fha_new.h:
	Add shims for the old and new NFS servers to interface with
	the FHA code, and callbacks for the

	The shims contain all of the code and definitions that are
	specific to the NFS servers.

	They setup the server-specific callbacks and set the server
	name for the sysctl and loader tunable variables.

sys/nfsserver/nfs_srvkrpc.c:
	Configure the RPC code to call fhaold_assign() instead of
	fha_assign().

sys/modules/nfsd/Makefile:
	Add nfs_fha.c and nfs_fha_new.c.

sys/modules/nfsserver/Makefile:
	Add nfs_fha_old.c.

Reviewed by:	rmacklem
Sponsored by:	Spectra Logic
MFC after:	2 weeks
2013-04-17 21:00:22 +00:00
John Baldwin
593efaf9f7 Further refine the handling of stop signals in the NFS client. The
changes in r246417 were incomplete as they did not add explicit calls to
sigdeferstop() around all the places that previously passed SBDRY to
_sleep().  In addition, nfs_getcacheblk() could trigger a write RPC from
getblk() resulting in sigdeferstop() recursing.  Rather than manually
deferring stop signals in specific places, change the VFS_*() and VOP_*()
methods to defer stop signals for filesystems which request this behavior
via a new VFCF_SBDRY flag.  Note that this has to be a VFC flag rather than
a MNTK flag so that it works properly with VFS_MOUNT() when the mount is
not yet fully constructed.  For now, only the NFS clients are set this new
flag in VFS_SET().

A few other related changes:
- Add an assertion to ensure that TDF_SBDRY doesn't leak to userland.
- When a lookup request uses VOP_READLINK() to follow a symlink, mark
  the request as being on behalf of the thread performing the lookup
  (cnp_thread) rather than using a NULL thread pointer.  This causes
  NFS to properly handle signals during this VOP on an interruptible
  mount.

PR:		kern/176179
Reported by:	Russell Cattelan (sigdeferstop() recursion)
Reviewed by:	kib
MFC after:	1 month
2013-02-21 19:02:50 +00:00
John Baldwin
a120a7a3cd Rework the handling of stop signals in the NFS client. The changes in
195702, 195703, and 195821 prevented a thread from suspending while holding
locks inside of NFS by forcing the thread to fail sleeps with EINTR or
ERESTART but defer the thread suspension to the user boundary.  However,
this had the effect that stopping a process during an NFS request could
abort the request and trigger EINTR errors that were visible to userland
processes (previously the thread would have suspended and completed the
request once it was resumed).

This change instead effectively masks stop signals while in the NFS client.
It uses the existing TDF_SBDRY flag to effect this since SIGSTOP cannot
be masked directly.  Also, instead of setting PBDRY on individual sleeps,
the NFS client now sets the TDF_SBDRY flag around each NFS request and
stop signals are masked for all sleeps during that region (the previous
change missed sleeps in lockmgr locks).  The end result is that stop
signals sent to threads performing an NFS request are completely
ignored until after the NFS request has finished processing and the
thread prepares to return to userland.  This restores the behavior of
stop signals being transparent to userland processes while still
preventing threads from suspending while holding NFS locks.

Reviewed by:	kib
MFC after:	1 month
2013-02-06 17:06:51 +00:00
John Baldwin
a89a2c8ba4 Further cleanups to use of timestamps in NFS:
- Use NFSD_MONOSEC (which maps to time_uptime) instead of the seconds
  portion of wall-time stamps to manage timeouts on events.
- Remove unused nd_starttime from the per-request structure in the new
  NFS server.
- Use nanotime() for the modification time on a delegation to get as
  precise a time as possible.
- Use time_second instead of extracting the second from a call to
  getmicrotime().

Submitted by:	bde (3)
Reviewed by:	bde, rmacklem
MFC after:	2 weeks
2013-01-25 15:25:24 +00:00
John Baldwin
6910d7a0d8 - More properly handle interrupted NFS requests on an interruptible mount
by returning an error of EINTR rather than EACCES.
- While here, bring back some (but not all) of the NFS RPC statistics lost
  when krpc was committed.

Reviewed by:	rmacklem
MFC after:	1 week
2013-01-15 22:08:17 +00:00
Rick Macklem
1f60bfd822 Move the NFSv4.1 client patches over from projects/nfsv4.1-client
to head. I don't think the NFS client behaviour will change unless
the new "minorversion=1" mount option is used. It includes basic
NFSv4.1 support plus support for pNFS using the Files Layout only.
All problems detecting during an NFSv4.1 Bakeathon testing event
in June 2012 have been resolved in this code and it has been tested
against the NFSv4.1 server available to me.
Although not reviewed, I believe that kib@ has looked at it.
2012-12-08 22:52:39 +00:00
Rick Macklem
23b3566364 Martin Cracauer reported a problem to freebsd-current@ under the
subject "Data corruption over NFS in -current". During investigation
of this, I came across an ugly bogusity in the new NFS client where
it replaced the cr_uid with the one used for the mount. This was
done so that "system operations" like the NFSv4 Renew would be
performed as the user that did the mount. However, if any other
thread shares the credential with the one doing this operation,
it could do an RPC (or just about anything else) as the wrong cr_uid.
This patch fixes the above, by using the mount credentials instead of
the one provided as an argument for this case. It appears
to have fixed Martin's problem.
This patch is needed for NFSv4 mounts and NFSv3 mounts against
some non-FreeBSD servers that do not put post operation attributes
in the NFSv3 Statfs RPC reply.

Tested by:	Martin Cracauer (cracauer at cons.org)
Reviewed by:	jhb
MFC after:	2 weeks
2012-01-20 00:58:51 +00:00
Rick Macklem
f725864490 opt_inet6.h was missing from some files in the new NFS subsystem.
The effect of this was, for clients mounted via inet6 addresses,
that the DRC cache would never have a hit in the server. It also
broke NFSv4 callbacks when an inet6 address was the only one available
in the client. This patch fixes the above, plus deletes opt_inet6.h
from a couple of files it is not needed for.

MFC after:	2 weeks
2012-01-08 01:54:46 +00:00
Rick Macklem
713f46ac47 jwd@ reported a problem via email where the old NFS client would
get a reply of EEXIST from an NFS server when a Mkdir RPC was retried,
for an NFS over UDP mount.
Upon investigation, it was found that the client was retransmitting
the Mkdir RPC request over UDP, but with a different xid. As such,
the retransmitted message would miss the Duplicate Request Cache
in the server, causing it to reply EEXIST. The kernel client side
UDP rpc code has two timers. The first one causes a retransmit using
the same xid and socket and was set to a fixed value of 3seconds.
(The default can be overridden via CLSET_RETRY_TIMEOUT.)
The second one creates a new socket and xid and should be larger
than the first. However, both NFS clients were setting the second
timer to nm_timeo ("timeout=<value>" mount argument), which defaulted to
1second, so the first timer would never time out.
This patch fixes both NFS clients so that they set the first timer
using nm_timeo and makes the second timer larger than the first one.

Reported by:	jwd
Tested by:	jwd
Reviewed by:	jhb
MFC after:	2 weeks
2011-12-21 02:45:51 +00:00
Rick Macklem
6a536ceea5 The new NFSv4 client handled NFSERR_GRACE as a fatal error
for the remove and rename operations. Some NFSv4 servers will
report NFSERR_GRACE for these operations. This patch changes
the behaviour of the client so that it handles NFSERR_GRACE
like NFSERR_DELAY for non-state related operations like
remove and rename. It also exempts the delegreturn operation
from handling within newnfs_request() for NFSERR_DELAY/NFSERR_GRACE
so that it can handle NFSERR_GRACE in the same manner as before.
This problem was resolved thanks to discussion with bfields at fieldses.org.
The problem was identified at the recent NFSv4 ineroperability
bakeathon.

MFC after:	2 weeks
2011-07-16 20:53:27 +00:00
Zack Kirsch
a9285ae5c4 Add DEXITCODE plumbing to NFS.
Isilon has the concept of an in-memory exit-code ring that saves the last exit
code of a function and allows for stack tracing. This is very helpful when
debugging tough issues.

This patch is essentially a no-op for BSD at this point, until we upstream
the dexitcode logic itself. The patch adds DEXITCODE calls to every NFS
function that returns an errno error code. A number of code paths were also
reorganized to have single exit paths, to reduce code duplication.

Submitted by:   David Kwan <dkwan@isilon.com>
Reviewed by:    rmacklem
Approved by:    zml (mentor)
MFC after:      2 weeks
2011-07-16 08:51:09 +00:00
Rick Macklem
7bb55def77 Plug an mbuf leak in the new NFS client that occurred when a
server replied NFS3ERR_JUKEBOX/NFS4ERR_DELAY to an rpc.
This affected both NFSv3 and NFSv4. Found during testing
at the recent NFSv4 interoperability Bakeathon.

MFC after:	2 weeks
2011-06-22 21:10:12 +00:00
Rick Macklem
72b7c8ddb1 Fix the new NFSv4 client so that it uses the same uid as
was used for doing a mount when performing system operations
on AUTH_SYS mounts.  This resolved an issue when mounting
a Linux server. Found during testing at the recent
NFSv4 interoperability Bakeathon.

MFC after:	2 weeks
2011-06-22 19:47:45 +00:00
Rick Macklem
7e7fd7d177 Fix the kgssapi so that it can be loaded as a module. Currently
the NFS subsystems use five of the rpcsec_gss/kgssapi entry points,
but since it was not obvious which others might be useful, all
nineteen were included. Basically the nineteen entry points are
set in a structure called rpc_gss_entries and inline functions
defined in sys/rpc/rpcsec_gss.h check for the entry points being
non-NULL and then call them. A default value is returned otherwise.
Requested by rwatson.

Reviewed by:	jhb
MFC after:	2 weeks
2011-06-19 22:08:55 +00:00
Rick Macklem
8f0e65c915 Add DTrace support to the new NFS client. This is essentially
cloned from the old NFS client, plus additions for NFSv4. A
review of this code is in progress, however it was felt by the
reviewer that it could go in now, before code slush. Any changes
required by the review can be committed as bug fixes later.
2011-06-18 23:02:53 +00:00
Rick Macklem
1f3765902c Change the sysctl naming for the old and new NFS clients
to vfs.oldnfs.xxx and vfs.nfs.xxx respectively. This makes
the default nfs client use vfs.nfs.xxx after r221124.
2011-05-15 20:52:43 +00:00
Rick Macklem
ebd9ef339f Get rid of the "nfscl: consider increasing kern.ipc.maxsockbuf"
message that was generated when doing experimental NFS client
mounts. I put that message in because the krpc would hang with
the default size for mounts that used large rsize/wsize values.
Since the bug that caused these hangs was fixed by r213756,
I think the message is no longer needed.

MFC after:	2 weeks
2011-04-17 20:01:32 +00:00
Rick Macklem
23d9efa7a8 Patch the experimental NFS client so that it works for NFSv2
by adding the necessary mapping from NFSv3 procedure numbers
to NFSv2 procedure numbers when doing NFSv2 RPCs.

MFC after:	1 week
2010-05-08 01:24:18 +00:00
Rick Macklem
23f929dfe8 An NFSv4 server will reply NFSERR_GRACE for non-recovery RPCs
during the grace period after startup. This grace period must
be at least the lease duration, which is typically 1-2 minutes.
It seems prudent for the experimental NFS client to wait a few
seconds before retrying such an RPC, so that the server isn't
flooded with non-recovery RPCs during recovery. This patch adds
an argument to nfs_catnap() to implement a 5 second delay
for this case.

MFC after:	1 week
2010-04-24 22:52:14 +00:00