freebsd-skq/sys/rpc
Rick Macklem dfd174d6e0 Fix the client side krpc from doing TCP reconnects for ERESTART from sosend().
When sosend() returns ERESTART in the client side krpc, it indicates that
the RPC message hasn't yet been sent and that the send queue is full or
locked while a signal is posted for the process.
Without this patch, this would result in an RPC_CANTSEND reply from
clnt_vc_call(), which would cause clnt_reconnect_call() to create a new
TCP transport connection. For most NFS servers, this wasn't a serious problem,
although it did imply retries of outstanding RPCs, which could possibly
have missed the server's duplicate request cache (DRC).
For an NFSv4.1 mount to AmazonEFS, this caused a serious problem, since
AmazonEFS often didn't retain the NFSv4.1 session and would reply with
NFS4ERR_BAD_SESSION. To the client, this implies a server crash/reboot,
which requires open/lock state recovery.

Three options were considered to fix this:
- Return the ERESTART all the way up to the system call boundary and then
  have the system call redone. This is fraught with risk, due to convoluted
  code paths, asynchronous I/O RPCs etc. cperciva@ worked on this, but it
  is still a work in progress and may not be feasible.
- Set SB_NOINTR for the socket buffer. This fixes the problem, but makes
  the sosend() completely non-interruptible, which kib@ considered
  inappropriate. It also would break forced dismount when a thread
  was blocked in sosend().
- Modify the retry loop in clnt_vc_call(), so that it loops for this case
  for up to 15sec. Testing showed that the sosend() usually succeeded by
  the 2nd retry. The extreme case observed was 111 loop iterations, or
  about 100msec of delay.
This third alternative is what is implemented in this patch, since the
change:
- is localized
- is straightforward
- does not break forced dismount.

This patch has been tested by cperciva@ extensively against AmazonEFS.

Reported by:	cperciva
Tested by:	cperciva
MFC after:	2 weeks
2017-05-07 12:12:45 +00:00
..
rpcsec_gss Hide the boottime and boottimebin globals, provide the getboottime(9) 2016-07-27 11:08:59 +00:00
auth_none.c
auth_unix.c
auth.h
authunix_prot.c
clnt_bck.c Deobfuscate cleanup path in clnt_bck_create(..) 2016-06-10 17:53:28 +00:00
clnt_dg.c Deobfuscate cleanup path in clnt_dg_create(..) 2016-07-11 06:58:24 +00:00
clnt_rc.c Fix a crash during unmount of an NFSv4.1 mount. 2017-04-10 22:47:18 +00:00
clnt_stat.h
clnt_vc.c Fix the client side krpc from doing TCP reconnects for ERESTART from sosend(). 2017-05-07 12:12:45 +00:00
clnt.h
getnetconfig.c
krpc.h
netconfig.h
nettype.h
pmap_prot.h
replay.c
replay.h
rpc_callmsg.c
rpc_com.h
rpc_generic.c Remove some NULL checks for M_WAITOK allocations. 2016-03-29 13:56:59 +00:00
rpc_msg.h
rpc_prot.c
rpc.h
rpcb_clnt.c Fix the rpcb_getaddr() definition to match its declaration. 2016-06-09 14:33:00 +00:00
rpcb_clnt.h
rpcb_prot.c
rpcb_prot.h
rpcm_subs.h Renumber copyright clause 4 2017-02-28 23:42:47 +00:00
rpcsec_gss.h
svc_auth_unix.c
svc_auth.c
svc_auth.h
svc_dg.c Don't test for xpt not being NULL before calling svc_xprt_free(..) 2016-07-11 07:24:56 +00:00
svc_generic.c
svc_vc.c Quell false positives in svc_vc_create and svc_vc_create_conn with cd and xprt 2016-05-27 08:48:33 +00:00
svc.c add svcpool_close to handle killed nfsd threads 2017-02-14 17:49:08 +00:00
svc.h add svcpool_close to handle killed nfsd threads 2017-02-14 17:49:08 +00:00
types.h sys/rpc: minor spelling fixes. 2016-05-06 01:49:46 +00:00
xdr.h