freebsd-nq/sys/fs
Rick Macklem b2fc0141d9 Fix NFSv4.1 client recovery from NFS4ERR_BAD_SESSION errors.
For most NFSv4.1 servers, a NFS4ERR_BAD_SESSION error is a rare failure
that indicates that the server has lost session/open/lock state.
However, recent testing by cperciva@ against the AmazonEFS server, which
generates this failure frequently, found several problems with client
recovery from it.
Briefly, the problems fixed are:
- If all session slots were in use at the time of the failure, some processes
  would continue to loop waiting for a slot on the old session forever.
- If an RPC that doesn't use open/lock state failed with NFS4ERR_BAD_SESSION,
  it would fail the RPC/syscall instead of initiating recovery and then
  looping to retry the RPC.
- If a successful reply to an RPC for an old session wasn't processed
  until after a new session was created for a NFS4ERR_BAD_SESSION error,
  it would erroneously update the new session and corrupt it.
- The use of the first element of the session list in the nfs mount
  structure (which is always the current metadata session) was slightly
  racy. With changes for the above problems it became more racy, so all
  uses of this head pointer were wrapped with NFSLOCKMNT()/NFSUNLOCKMNT().
- Although the kernel malloc() usually allocates more bytes than requested
  and, as such, this wouldn't have caused problems, the allocation of a
  session structure was 1 byte smaller than it should have been.
  (The null termination byte for the string was not included in the byte count.)
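The second problem (an RPC failing instead of recovering and retrying) follows a familiar retry-after-recovery pattern. The sketch below is a hypothetical user-space model: the status codes are illustrative rather than real NFS error numbers, `start_recovery()` merely bumps a session counter in place of creating a new session, and the `max_tries` bound exists only so the example terminates. The commit describes the client simply looping to retry:

```c
#include <assert.h>

/* Illustrative status codes, not the real NFS error numbers. */
enum rpc_status { RPC_OK, RPC_BAD_SESSION, RPC_OTHER_ERROR };

/* Hypothetical client state, reduced to what the sketch needs. */
struct rpc_client {
    unsigned session_id;   /* changes every time a new session is created */
    int recoveries;        /* number of times recovery has run */
};

typedef enum rpc_status (*rpc_send_fn)(struct rpc_client *, void *);

/* Stand-in for session recovery (a new session plus any state reclaim). */
static void start_recovery(struct rpc_client *c)
{
    c->session_id++;
    c->recoveries++;
}

/*
 * Send an RPC; on RPC_BAD_SESSION, recover and retry instead of returning
 * the error to the caller. The retry bound exists only for the sketch.
 */
static enum rpc_status rpc_with_recovery(struct rpc_client *c,
    rpc_send_fn sendfn, void *arg, int max_tries)
{
    enum rpc_status st = RPC_BAD_SESSION;

    assert(max_tries > 0);
    for (int attempt = 0; attempt < max_tries; attempt++) {
        st = sendfn(c, arg);
        if (st != RPC_BAD_SESSION)
            return st;       /* success or an unrelated error */
        start_recovery(c);   /* recover, then loop to retry the RPC */
    }
    return st;
}

/* Fake server: rejects RPCs until the client reaches `good_session`. */
struct fake_server {
    unsigned good_session;
    int calls;
};

static enum rpc_status fake_send(struct rpc_client *c, void *arg)
{
    struct fake_server *srv = arg;

    srv->calls++;
    return c->session_id >= srv->good_session ? RPC_OK : RPC_BAD_SESSION;
}
```

With a fake server that only accepts session 2, a client starting at session 0 runs recovery twice and succeeds on the third call. The caller never sees RPC_BAD_SESSION.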
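One way to keep a late reply for an old session from corrupting a new one (the third problem) is to record which session each request was sent on and to apply the reply's slot update only when that still matches the current session. This is a hedged sketch with hypothetical structures; only the 16-byte session ID size comes from the NFSv4.1 protocol, and the commit does not say whether FreeBSD compares IDs, pointers, or something else:

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

#define SESSIONID_LEN 16   /* NFSv4.1 session IDs are 16 opaque bytes */
#define NSLOTS 4

/* Hypothetical, simplified session and reply records. */
struct seq_sess {
    unsigned char id[SESSIONID_LEN];
    uint32_t slot_seq[NSLOTS];     /* last sequence number seen per slot */
};

struct seq_reply {
    unsigned char sessid[SESSIONID_LEN];  /* session the request went out on */
    int slot;
    uint32_t seq;
};

/*
 * Apply a reply's slot update only if it belongs to the current session.
 * A late reply for an old session is dropped instead of corrupting the new one.
 */
static bool apply_reply(struct seq_sess *cur, const struct seq_reply *r)
{
    assert(r->slot >= 0 && r->slot < NSLOTS);
    if (memcmp(cur->id, r->sessid, SESSIONID_LEN) != 0)
        return false;              /* stale reply: leave `cur` untouched */
    cur->slot_seq[r->slot] = r->seq;
    return true;
}
```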
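The locking change (the fourth item) amounts to reading and replacing the session list head only while holding the mount lock. The user-space sketch below uses a POSIX mutex as a stand-in for NFSLOCKMNT()/NFSUNLOCKMNT(). `struct mount_sketch`, `current_session_id()` and `install_session()` are hypothetical names, and copying the ID out under the lock is just one simple way to avoid using the head pointer after the lock is dropped:

```c
#include <assert.h>
#include <pthread.h>
#include <stddef.h>

/* Hypothetical session list element. */
struct sess_elem {
    unsigned id;
    struct sess_elem *next;
};

/* Hypothetical mount: the list head is the current metadata session. */
struct mount_sketch {
    pthread_mutex_t mtx;        /* stands in for the NFSLOCKMNT() mutex */
    struct sess_elem *head;
};

/* Read the current session's ID while the head cannot change underneath us. */
static unsigned current_session_id(struct mount_sketch *m)
{
    unsigned id;

    pthread_mutex_lock(&m->mtx);
    assert(m->head != NULL);
    id = m->head->id;
    pthread_mutex_unlock(&m->mtx);
    return id;
}

/* Install a new session at the head, as recovery would. */
static void install_session(struct mount_sketch *m, struct sess_elem *s)
{
    pthread_mutex_lock(&m->mtx);
    s->next = m->head;
    m->head = s;
    pthread_mutex_unlock(&m->mtx);
}
```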
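The off-by-one in the last item is the classic mistake of sizing a structure plus a trailing string as `sizeof(struct) + strlen(name)`. This hypothetical user-space sketch (`struct named_sess` and `named_sess_alloc()` are invented, and it uses libc malloc() rather than the kernel's malloc(9)) shows the corrected count with room for the terminating NUL:

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical structure whose name string is stored right after it. */
struct named_sess {
    size_t namelen;
    char *name;      /* points just past the structure */
};

static struct named_sess *named_sess_alloc(const char *name)
{
    assert(name != NULL);
    size_t len = strlen(name);
    /* "+ 1" reserves the NUL terminator that strlen() does not count. */
    struct named_sess *s = malloc(sizeof(*s) + len + 1);

    if (s == NULL)
        return NULL;
    s->namelen = len;
    s->name = (char *)(s + 1);
    memcpy(s->name, name, len + 1);  /* copies the terminator as well */
    return s;
}
```

Without the `+ 1`, the final `memcpy()` would write one byte past the allocation. As the commit notes, that usually goes unnoticed because allocators round requests up.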

There are probably still problems with a pNFS data server that fails
with NFS4ERR_BAD_SESSION, but I have no server that does this to test
against (the AmazonEFS server doesn't do pNFS), so I can't fix these yet.

Although this patch is fairly large, it should only affect the handling
of NFS4ERR_BAD_SESSION error replies from an NFSv4.1 server.
Thanks go to cperciva@ for the extensive testing he did to help isolate/fix
these problems.

Reported by:	cperciva
Tested by:	cperciva
MFC after:	3 months
Differential Revision:	https://reviews.freebsd.org/D8745
2016-12-23 23:14:53 +00:00
autofs Remove spurious space. 2016-11-13 12:06:25 +00:00
cd9660 Use buffer pager for cd9660. 2016-10-28 11:46:39 +00:00
cuse Prevent cuse4bsd.ko and cuse.ko from loading at the same time by 2016-09-23 07:41:23 +00:00
deadfs Style changes for deadfs: 2014-10-15 13:22:33 +00:00
devfs Hide the boottime and bootimebin globals, provide the getboottime(9) 2016-07-27 11:08:59 +00:00
ext2fs ext2fs: renumber the license clauses to avoid skipping #3. 2016-12-02 19:47:23 +00:00
fdescfs Hide the boottime and bootimebin globals, provide the getboottime(9) 2016-07-27 11:08:59 +00:00
fifofs Rewrite sigdeferstop(9) and sigallowstop(9) into more flexible 2016-06-26 20:07:24 +00:00
fuse If a local (AF_LOCAL, AF_UNIX) socket creation (bind) is attempted 2016-05-18 22:23:20 +00:00
msdosfs Use buffer pager for msdosfs. 2016-10-28 11:46:15 +00:00
nandfs Fix panic() message reporting ufs instead of nandfs 2016-10-13 19:33:07 +00:00
nfs Fix NFSv4.1 client recovery from NFS4ERR_BAD_SESSION errors. 2016-12-23 23:14:53 +00:00
nfsclient Fix NFSv4.1 client recovery from NFS4ERR_BAD_SESSION errors. 2016-12-23 23:14:53 +00:00
nfsserver Fix the NFSv4.1 server for Open reclaim after a reboot. 2016-12-05 22:36:25 +00:00
nullfs NFSv4 client tracks opens, and the track records are only dropped when 2016-11-27 09:20:58 +00:00
procfs Hide the boottime and bootimebin globals, provide the getboottime(9) 2016-07-27 11:08:59 +00:00
pseudofs Remove Giant asserts. Update comment. 2016-08-03 08:57:15 +00:00
smbfs Replace all remaining calls to vprint(9) with vn_printf(9), and remove 2016-08-10 16:12:31 +00:00
tmpfs When tmpfs and POSIX shm pagein a page for the sole purpose of performing 2016-12-11 19:24:41 +00:00
udf On error, bread(9) zeroes buffer pointer, do not dereference it. 2016-11-22 13:24:57 +00:00
unionfs Replace all remaining calls to vprint(9) with vn_printf(9), and remove 2016-08-10 16:12:31 +00:00