In rev 1.72 a situation related to write/mmap was fixed which could result
in a user process gaining visibility into the 'old' contents of a filesystem
block.  There were two cases:  (1) when uiomove() fails (user process issues
illegal write), and (2) when uiomove() overlaps a mmap() of the same file at
the same offset (fault -> recursive buffer I/O reads contents of old block).

Unfortunately 1.72 also had the unintended effect of forcing the filesystem
to do a read-before-write in the case of a full-block-write (non append case),
e.g. 'dd if=/dev/zero of=test.dat bs=1m count=256 conv=notrunc'.  This
destroys performance: not only is a read forced for every write, but
clustering breaks as well.
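
The access pattern behind that dd command can be approximated in userland
C.  This is only a sketch: overwrite_full_blocks() is a hypothetical helper
name, not something from the tree, and it assumes a POSIX system.  The file
is opened without O_TRUNC, so existing blocks are overwritten in place, and
each write() supplies one whole buffer of zeros.

```c
#include <assert.h>
#include <fcntl.h>
#include <stdlib.h>
#include <sys/types.h>
#include <unistd.h>

/*
 * Hypothetical helper: overwrite the start of `path` with `count` zero-filled
 * blocks of `bsize` bytes each, without truncating the file first (the
 * equivalent of dd's conv=notrunc).  Returns the number of bytes written,
 * or -1 on error.
 */
static long long
overwrite_full_blocks(const char *path, size_t bsize, int count)
{
	long long total = 0;
	char *block;
	int fd, i;

	assert(bsize > 0);

	/* No O_TRUNC: existing blocks stay allocated and are rewritten. */
	fd = open(path, O_WRONLY | O_CREAT, 0644);
	if (fd < 0)
		return (-1);

	block = calloc(1, bsize);	/* zero-filled, like /dev/zero */
	if (block == NULL) {
		close(fd);
		return (-1);
	}

	for (i = 0; i < count; i++) {
		ssize_t n = write(fd, block, bsize);

		if (n < 0) {
			total = -1;
			break;
		}
		total += n;
		if ((size_t)n < bsize)	/* short write: stop early */
			break;
	}

	free(block);
	close(fd);
	return (total);
}
```

Calling overwrite_full_blocks("test.dat", 1024 * 1024, 256) on an existing
256MB file would roughly reproduce the dd invocation above.  Whether a given
write counts as a full-block write still depends on the filesystem block
size and on the offset alignment.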

The solution is to clear the buffer manually in the full-block case rather
than asking BALLOC to do it (BALLOC issues the read-before-write).  In the
partial-block case we want BALLOC to do it because the read-before-write
is necessary.  This patch should greatly improve database and news-feed
server performance.
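
The decision described above can be modeled outside the kernel.  The sketch
below is illustrative only: MODEL_B_CLRBUF, struct model_buf and both
functions are hypothetical stand-ins, not the kernel's B_CLRBUF, struct buf
or vfs_bio_clrbuf().  It shows the two halves of the approach: request the
read-before-write only for partial-block transfers, and zero the buffer by
hand when a full-block transfer lands on a buffer that is not yet valid.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

#define MODEL_B_CLRBUF	0x01	/* hypothetical stand-in for B_CLRBUF */

struct model_buf {		/* hypothetical stand-in for struct buf */
	bool cached;		/* models (bp->b_flags & B_CACHE) != 0 */
	size_t size;		/* bytes of data[] in use */
	unsigned char data[64];
};

/*
 * Partial-block transfer: keep the clear/read-before-write request so the
 * untouched part of the block stays correct.  Full-block transfer: drop it,
 * since the caller clears the buffer itself.
 */
static int
model_balloc_flags(int flags, size_t bsize, size_t xfersize)
{
	if (bsize > xfersize)
		flags |= MODEL_B_CLRBUF;
	else
		flags &= ~MODEL_B_CLRBUF;
	return (flags);
}

/*
 * After allocation: if the buffer holds no valid data and the transfer
 * covers the whole block, zero it so a failed copy cannot expose stale
 * contents.  Returns true if the buffer was cleared.
 */
static bool
model_clear_if_needed(struct model_buf *bp, size_t bsize, size_t xfersize)
{
	assert(bp->size <= sizeof(bp->data));
	if (!bp->cached && bsize <= xfersize) {
		memset(bp->data, 0, bp->size);
		return (true);
	}
	return (false);
}
```

In this model a full-block write to an uncached buffer costs one memset()
and no read, while a partial-block write leaves the buffer untouched here
and relies on the allocator-side request instead.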

Found by: MKI <mki@mozone.net>
MFC after:	3 days
Matthew Dillon 2002-06-19 09:39:41 +00:00
parent 98dd638658
commit a37313d234

@@ -491,23 +491,27 @@ WRITE(ap)
 			vnode_pager_setsize(vp, uio->uio_offset + xfersize);
 
 		/*
-		 * Avoid a data-consistency race between write() and mmap()
-		 * by ensuring that newly allocated blocks are zerod.  The
-		 * race can occur even in the case where the write covers
-		 * the entire block.
+		 * We must perform a read-before-write if the transfer size
+		 * does not cover the entire buffer.
 		 */
-		flags |= B_CLRBUF;
-#if 0
 		if (fs->fs_bsize > xfersize)
 			flags |= B_CLRBUF;
 		else
 			flags &= ~B_CLRBUF;
-#endif
 /* XXX is uio->uio_offset the right thing here? */
 		error = UFS_BALLOC(vp, uio->uio_offset, xfersize,
 		    ap->a_cred, flags, &bp);
 		if (error != 0)
 			break;
+		/*
+		 * If the buffer is not valid we have to clear out any
+		 * garbage data from the pages instantiated for the buffer.
+		 * If we do not, a failed uiomove() during a write can leave
+		 * the prior contents of the pages exposed to a userland
+		 * mmap().  XXX deal with uiomove() errors a better way.
+		 */
+		if ((bp->b_flags & B_CACHE) == 0 && fs->fs_bsize <= xfersize)
+			vfs_bio_clrbuf(bp);
 		if (ioflag & IO_DIRECT)
 			bp->b_flags |= B_DIRECT;
 		if (ioflag & IO_NOWDRAIN)