From a37313d23426c47e1813239ae8963bb2024ed621 Mon Sep 17 00:00:00 2001 From: Matthew Dillon Date: Wed, 19 Jun 2002 09:39:41 +0000 Subject: [PATCH] In rev 1.72 a situation related to write/mmap was fixed which could result in a user process gaining visibility into the 'old' contents of a filesystem block. There were two cases: (1) when uiomove() fails (user process issues illegal write), and (2) when uiomove() overlaps a mmap() of the same file at the same offset (fault -> recursive buffer I/O reads contents of old block). Unfortunately 1.72 also had the unintended effect of forcing the filesystem to do a read-before-write in the case of a full-block-write (non append case), e.g. 'dd if=/dev/zero of=test.dat bs=1m count=256 conv=notrunc'. This destroys performance.. not only is a read forced for every write, but clustering breaks as well. The solution is to clear the buffer manually in the full-block case rather then asking BALLOC to do it (BALLOC issues the read-before-write). In the partial-block case we want BALLOC to do it because the read-before-write is necessary. This patch should greatly improve database and news-feed server performance. Found by: MKI MFC after: 3 days --- sys/ufs/ufs/ufs_readwrite.c | 18 +++++++++++------- 1 file changed, 11 insertions(+), 7 deletions(-) diff --git a/sys/ufs/ufs/ufs_readwrite.c b/sys/ufs/ufs/ufs_readwrite.c index af7298c26029..c76081f5bd62 100644 --- a/sys/ufs/ufs/ufs_readwrite.c +++ b/sys/ufs/ufs/ufs_readwrite.c @@ -491,23 +491,27 @@ WRITE(ap) vnode_pager_setsize(vp, uio->uio_offset + xfersize); /* - * Avoid a data-consistency race between write() and mmap() - * by ensuring that newly allocated blocks are zerod. The - * race can occur even in the case where the write covers - * the entire block. + * We must perform a read-before-write if the transfer size + * does not cover the entire buffer. */ - flags |= B_CLRBUF; -#if 0 if (fs->fs_bsize > xfersize) flags |= B_CLRBUF; else flags &= ~B_CLRBUF; -#endif /* XXX is uio->uio_offset the right thing here? */ error = UFS_BALLOC(vp, uio->uio_offset, xfersize, ap->a_cred, flags, &bp); if (error != 0) break; + /* + * If the buffer is not valid we have to clear out any + * garbage data from the pages instantiated for the buffer. + * If we do not, a failed uiomove() during a write can leave + * the prior contents of the pages exposed to a userland + * mmap(). XXX deal with uiomove() errors a better way. + */ + if ((bp->b_flags & B_CACHE) == 0 && fs->fs_bsize <= xfersize) + vfs_bio_clrbuf(bp); if (ioflag & IO_DIRECT) bp->b_flags |= B_DIRECT; if (ioflag & IO_NOWDRAIN)