Fix up NFS client write error handling. Errors are split into recoverable and unrecoverable. For the former, we redirty the buffer and hang onto it for future retries. For the latter (e.g. ESTALE), we discard the buffer and return the error to the user on the next syscall. This fixes a number of vfs panics and keeps a large number of dirty buffers (that cannot be written out and reclaimed) from hanging around.

Thanks to ups@ for discussions on this issue.

Reported by: kris, Kai, others
Approved by: re (kensmith)
2007-07-03 18:30:55 +00:00
commit 03e557fd5a
parent 9d53363bc8
1 changed file with 15 additions and 0 deletions
--- a/sys/nfsclient/nfs_bio.c
+++ b/sys/nfsclient/nfs_bio.c
@@ -1714,6 +1714,19 @@ nfs_doio(struct vnode *vp, struct buf *bp, struct ucred *cr, struct thread *td)
 		 * the vp's paging queues so we cannot call bdirty().  The
 		 * bp in this case is not an NFS cache block so we should
 		 * be safe. XXX
+		 *
+		 * The logic below breaks up errors into recoverable and 
+		 * unrecoverable. For the former, we clear B_INVAL|B_NOCACHE
+		 * and keep the buffer around for potential write retries.
+		 * For the latter (eg ESTALE), we toss the buffer away (B_INVAL)
+		 * and save the error in the nfsnode. This is less than ideal 
+		 * but necessary. Keeping such buffers around could potentially
+		 * cause buffer exhaustion eventually (they can never be written
+		 * out, so will constantly be re-dirtied). It also causes
+		 * all sorts of vfs panics. For non-recoverable write errors, 
+		 * also invalidate the attrcache, so we'll be forced to go over
+		 * the wire for this object, returning an error to user on next
+		 * call (most of the time).
 		 */
    		if (error == EINTR || error == EIO || error == ETIMEDOUT
 		    || (!error && (bp->b_flags & B_NEEDCOMMIT))) {
@@ -1731,9 +1744,11 @@ nfs_doio(struct vnode *vp, struct buf *bp, struct ucred *cr, struct thread *td)
 	    	} else {
 		    if (error) {
 			bp->b_ioflags |= BIO_ERROR;
+			bp->b_flags |= B_INVAL;
 			bp->b_error = np->n_error = error;
 			mtx_lock(&np->n_mtx);
 			np->n_flag |= NWRITEERR;
+			np->n_attrstamp = 0;
 			mtx_unlock(&np->n_mtx);
 		    }
 		    bp->b_dirtyoff = bp->b_dirtyend = 0;
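
The split between recoverable and unrecoverable write errors can be modelled outside the kernel as a small user-space sketch. It mirrors only the condition visible in the hunks above (EINTR, EIO, ETIMEDOUT, or a clean write that still needs a commit are retried; any other error discards the buffer). The enum `write_outcome` and the function `classify_write` are hypothetical names introduced for illustration and do not exist in nfs_bio.c; the `needcommit` parameter stands in for the `bp->b_flags & B_NEEDCOMMIT` test.

```c
#include <errno.h>
#include <stdbool.h>

/* Hypothetical outcome labels; the kernel code does not define this enum. */
enum write_outcome {
	WRITE_DONE,	/* no error and nothing left to commit */
	WRITE_RETRY,	/* recoverable: keep the buffer dirty for a later retry */
	WRITE_DISCARD	/* unrecoverable: invalidate the buffer, record the error */
};

/*
 * Classify the result of a write the way the diff's if/else does.
 * 'error' is the errno-style result of the write; 'needcommit' is an
 * assumed stand-in for (bp->b_flags & B_NEEDCOMMIT).
 */
static enum write_outcome
classify_write(int error, bool needcommit)
{
	/* Transient failures, or a successful but uncommitted write. */
	if (error == EINTR || error == EIO || error == ETIMEDOUT ||
	    (error == 0 && needcommit))
		return (WRITE_RETRY);

	/* Any other error (eg ESTALE) can never succeed on retry. */
	if (error != 0)
		return (WRITE_DISCARD);

	return (WRITE_DONE);
}
```

With this sketch, `classify_write(ESTALE, false)` yields `WRITE_DISCARD`, which corresponds to the branch where the commit sets B_INVAL and zeroes `n_attrstamp`, while `classify_write(ETIMEDOUT, false)` yields `WRITE_RETRY`.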