In rev 1.72 a write/mmap problem was fixed that could let a user process
see the 'old' contents of a filesystem block. There were two cases:
(1) when uiomove() fails (the user process issues an illegal write), and
(2) when uiomove() overlaps a mmap() of the same file at the same offset
(fault -> recursive buffer I/O reads the contents of the old block).

Unfortunately 1.72 also had the unintended effect of forcing the
filesystem to do a read-before-write for a full-block write (the
non-append case), e.g.
'dd if=/dev/zero of=test.dat bs=1m count=256 conv=notrunc'.
This destroys performance: not only is a read forced for every write,
but clustering breaks as well.

The solution is to clear the buffer manually in the full-block case
rather than asking BALLOC to do it (BALLOC issues the read-before-write).
In the partial-block case we want BALLOC to do it because the
read-before-write is necessary.

This patch should greatly improve database and news-feed server
performance.

Found by: MKI <mki@mozone.net>
MFC after: 3 days
parent 98dd638658
commit a37313d234
@@ -491,23 +491,27 @@ WRITE(ap)
 			vnode_pager_setsize(vp, uio->uio_offset + xfersize);
 
 		/*
-		 * Avoid a data-consistency race between write() and mmap()
-		 * by ensuring that newly allocated blocks are zerod.  The
-		 * race can occur even in the case where the write covers
-		 * the entire block.
+		 * We must perform a read-before-write if the transfer size
+		 * does not cover the entire buffer.
 		 */
-		flags |= B_CLRBUF;
-#if 0
 		if (fs->fs_bsize > xfersize)
 			flags |= B_CLRBUF;
 		else
 			flags &= ~B_CLRBUF;
-#endif
 /* XXX is uio->uio_offset the right thing here? */
 		error = UFS_BALLOC(vp, uio->uio_offset, xfersize,
 		    ap->a_cred, flags, &bp);
 		if (error != 0)
 			break;
+		/*
+		 * If the buffer is not valid we have to clear out any
+		 * garbage data from the pages instantiated for the buffer.
+		 * If we do not, a failed uiomove() during a write can leave
+		 * the prior contents of the pages exposed to a userland
+		 * mmap().  XXX deal with uiomove() errors a better way.
+		 */
+		if ((bp->b_flags & B_CACHE) == 0 && fs->fs_bsize <= xfersize)
+			vfs_bio_clrbuf(bp);
 		if (ioflag & IO_DIRECT)
 			bp->b_flags |= B_DIRECT;
 		if (ioflag & IO_NOWDRAIN)
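The decision this hunk makes can be modelled outside the kernel. The following is only a user-space sketch under stated assumptions: `SK_CLRBUF`, `struct sk_buf`, `sk_balloc_flags()` and `sk_clear_if_needed()` are hypothetical stand-ins for `B_CLRBUF`, `struct buf`, the flag selection before `UFS_BALLOC()`, and the `vfs_bio_clrbuf()` call. None of them are the real kernel definitions or values.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical flag bit; NOT the kernel's real B_CLRBUF value. */
#define SK_CLRBUF 0x01

/* Hypothetical, heavily simplified stand-in for struct buf. */
struct sk_buf {
	bool cached;		/* models (bp->b_flags & B_CACHE) != 0 */
	unsigned char *data;	/* block-sized backing store */
	size_t size;		/* size of data in bytes */
};

/*
 * Choose the allocator flags. A partial-block write needs the rest of
 * the block to be valid, so the allocator is asked to do the
 * read-before-write. A full-block write overwrites everything, so the
 * request is dropped.
 */
static int
sk_balloc_flags(int flags, size_t bsize, size_t xfersize)
{
	/* Assumed precondition: a transfer never exceeds one block. */
	assert(xfersize > 0 && xfersize <= bsize);
	if (bsize > xfersize)
		flags |= SK_CLRBUF;
	else
		flags &= ~SK_CLRBUF;
	return (flags);
}

/*
 * For an uncached buffer that a full-block write is about to fill,
 * zero it by hand so that a failed copy cannot leave stale data
 * visible. Returns true if the buffer was cleared.
 */
static bool
sk_clear_if_needed(struct sk_buf *bp, size_t bsize, size_t xfersize)
{
	if (!bp->cached && bsize <= xfersize) {
		memset(bp->data, 0, bp->size);
		return (true);
	}
	return (false);
}
```

With a 16384-byte block, `sk_balloc_flags(0, 16384, 512)` requests the clear, while `sk_balloc_flags(0, 16384, 16384)` does not. In the second case `sk_clear_if_needed()` zeroes the buffer only when it is not already cached. The memset costs no disk I/O, which is how the full-block path avoids the forced read.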