Remote and local adv lock servers might de-synchronize (the added comment
explains the plausible scenario), resulting in EDEADLK returned on the
local registration attempt. Handle this by re-trying the local op [1].

On unmount, local registration abort is indicated as EINTR, abort the
nlm call as well.

Reported and tested by: pho
Suggested and reviewed by: dfr (previous version, [1])
Sponsored by: The FreeBSD Foundation
MFC after: 1 week
Approved by: re (delphij)
svn path=/head/; revision=302020
2016-06-19 18:32:35 +00:00 · 2016-06-19 18:32:35 +00:00 · 04b49c9154 · 2020-12-20 02:59:44 +00:00
commit 04b49c9154
parent e37dfd3d2b
1 changed files with 31 additions and 1 deletions
--- a/sys/nlm/nlm_advlock.c
+++ b/sys/nlm/nlm_advlock.c
@@ -713,7 +713,37 @@ nlm_record_lock(struct vnode *vp, int op, struct flock *fl,
 	newfl.l_pid = svid;
 	newfl.l_sysid = NLM_SYSID_CLIENT | sysid;

-	error = lf_advlockasync(&a, &vp->v_lockf, size);
+	for (;;) {
+		error = lf_advlockasync(&a, &vp->v_lockf, size);
+		if (error == EDEADLK) {
+			/*
+			 * Locks are associated with the processes and
+			 * not with threads.  Suppose we have two
+			 * threads A1 A2 in one process, A1 locked
+			 * file f1, A2 is locking file f2, and A1 is
+			 * unlocking f1. Then remote server may
+			 * already unlocked f1, while local still not
+			 * yet scheduled A1 to make the call to local
+			 * advlock manager. The process B owns lock on
+			 * f2 and issued the lock on f1.  Remote would
+			 * grant B the request on f1, but local would
+			 * return EDEADLK.
+			*/
+			pause("nlmdlk", 1);
+			/* XXXKIB allow suspend */
+		} else if (error == EINTR) {
+			/*
+			 * lf_purgelocks() might wake up the lock
+			 * waiter and removed our lock graph edges.
+			 * There is no sense in re-trying recording
+			 * the lock to the local manager after
+			 * reclaim.
+			 */
+			error = 0;
+			break;
+		} else
+			break;
+	}
 	KASSERT(error == 0 || error == ENOENT,
 	    ("Failed to register NFS lock locally - error=%d", error));
 }
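
For readers who want to try the retry policy outside the kernel, here is a minimal user-space sketch of the loop added above. It is an illustration under stated assumptions, not the kernel code: `record_lock_with_retry`, `register_fn`, `fake_register`, and `struct fake_state` are hypothetical names invented for this example, the one-tick `pause("nlmdlk", 1)` is reduced to a comment, and the fake manager simply returns EDEADLK for a fixed number of rounds to imitate the window in which the local lock graph is stale.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/* Hypothetical stand-in for the local lock manager call: returns 0 or an errno. */
typedef int (*register_fn)(void *ctx);

/*
 * Retry policy modeled on the patch:
 *   EDEADLK -> local state is assumed stale, try again;
 *   EINTR   -> registration was aborted (unmount), stop and report success;
 *   other   -> stop and return the value unchanged.
 * If attempts is non-NULL it is incremented once per call to fn.
 */
static int
record_lock_with_retry(register_fn fn, void *ctx, int *attempts)
{
	int error;

	assert(fn != NULL);
	for (;;) {
		error = fn(ctx);
		if (attempts != NULL)
			(*attempts)++;
		if (error == EDEADLK) {
			/* The kernel code sleeps for one tick here before retrying. */
			continue;
		}
		if (error == EINTR)
			error = 0;
		break;
	}
	return (error);
}

/* Hypothetical fake manager: EDEADLK for stale_rounds calls, then final_error. */
struct fake_state {
	int stale_rounds;
	int final_error;
};

static int
fake_register(void *ctx)
{
	struct fake_state *st = ctx;

	if (st->stale_rounds > 0) {
		st->stale_rounds--;
		return (EDEADLK);
	}
	return (st->final_error);
}
```

Calling `record_lock_with_retry(fake_register, &st, &attempts)` with `st.stale_rounds = 3` and `st.final_error = 0` makes four calls to the fake manager before returning 0. Like the patch, the sketch puts no upper bound on retries, so it relies on the stale state eventually clearing.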