to write all the dirty blocks. If some of those blocks have dependencies,
they will be remarked dirty when the I/O completes. On systems with
really fast I/O systems, it is possible to get in an infinite loop trying
to flush the buffers, because the I/O finishes before we can get all the
dirty buffers off the v_dirtyblkhd list and into the I/O queue. (The
previous algorithm looped over the v_dirtyblkhd list writing out buffers
until the list emptied.) So, now we mark each buffer that we try to
write so that we can distinguish the ones that are being remarked dirty
from those that we have not yet tried to flush. Once we have tried to
push every buffer once, we then push any associated metadata that is
causing the remaining buffers to be redirtied.
Submitted by: Matthew Dillon <dillon@apollo.backplane.com>
Specifically, the test was in the wrong place, lacked a cast, didn't
unlock the node, and exited to bad rather than abortit. Now we don't
allow renaming of a file with LINK_MAX references. Move the test to
earlier in the code as it is closer to where ip is obtained, as that
is the style of the rest of the function.
Didn't fix the problems bruce pointed out in the rename man page to
include EMLINK, nor address his complaints about how the whole idea of
incrementing the link count during a rename is potentially asking for
trouble.
Also didn't try to correct potential problem Terry pointed out with
decrements not being similarly protected against underflow.
turns out to not be useful to unwind the dependencies and continue in
the face of a fatal error.
Also changed the log() to a printf() in softdep_error() so that it will
be output in the case of a impending panic.
Submitted by: Kirk McKusick <mckusick@mckusick.com>
changes to the VM system to support the new swapper, VM bug
fixes, several VM optimizations, and some additional revamping of the
VM code. The specific bug fixes will be documented with additional
forced commits. This commit is somewhat rough in regards to code
cleanup issues.
Reviewed by: "John S. Dyson" <root@dyson.iquest.net>, "David Greenman" <dg@root.com>
MNT_WAIT when we mean boolean `true' or check for that value not being
passed. There was no problem in practice because MNT_WAIT had the
magic value of 1.
I/O requests must be marked P_SYSTEM because if it isn't and the system
decides to swap it or (god forbid) kill it, the system stands a good
chance of locking up.
may be revoked, so vnop routines must be careful about accessing
the vnode if they may have blocked.
Fixed marking for update after successfully reading or writing 0
bytes. In this case, POSIX.1 specifies marking if and only if the
requested count is nonzero, but rev.1.86 never marked.
basically do a on-the-fly defragmentation of the FFS filesystem, changing
file block allocations to make them contiguous. Thanks to Kirk McKusick
for providing hints on what needed to be done to get this working.
They checked for the magic major number for the "device" behind mfs
mount points. Use a more obvious check for this device.
Debugged by: Andrew Gallatin <gallatin@cs.duke.edu>
when bdevsw[] became sparse. We still depend on magic to avoid having to
check that (v_rdev) device numbers in vnodes are not NODEV.
Removed redundant `major(dev) < nblkdev' tests instead of updating them.
partition that the label ioctl is being done on just because it has
offset 0, since there is no guarantee that such a partition is large
enough to contain the label. Don't use the wrong raw partition (0
instead of RAW_PART).
This fixes problems rewriting bizarre labels (with a nonzero offset
for the 'a' partition) in newfs(8). Such labels shouldn't normally
be used, but creating them was allowed if the ioctl was done on the
raw partition, and sysinstall creates them if the root partition isn't
allocated first.
Note that allowing write access to a partition other than the one that
has been checked for write access doesn't increase security holes
significantly, since write access to any partition already allows
changing the in-core label.
This fix should be in 3.0R. Rev.1.26 of newfs/newfs.c shouldn't be
in 3.0R.
but when i_effnlink was added to support soft updates, there was only
room for 4 spares. The number of spares was not reduced, so the inode
size became 260 (on i386's), or 512 after rounding up by malloc().
Use one spare field in `struct dinode' instead of the 5th spare field
in the inode and reduced to 4 spares in the inode so that the size is
256 again.
Changed the types of the spares in the inode from int to u_int32_t
so that the inode size has more chance of being <= 256 under other
arches, and downdated ext2fs to match (it was broken to use ints
before rev.1.1).
As a side effect, a few wakeup() calls are added, which might fix some of the
missing vm_page wakeups people have been seeing.
Reviewed by: Doug Rabson <dfr@nlsystems.com>
The problem is caused when a directory block is compacted. When this
occurs, softdep_change_directoryentry_offset() is called to relocate each
directory entry and adjust its matching diradd structure, if any, to match
the new location of the entry. The bug is that while
softdep_change_directoryentry_offset() correctly adjusts the offsets of
the diradd structures on the pd_diraddhd[] lists (which are not yet ready
to be committed to disk), it fails to adjust the offsets of the diradd
structures on the pd_pendinghd list (which are ready to be committed to
disk). This causes the dependency structures to be inconsistent with
the buf contents. Now, if the compaction has moved a directory entry to
the same offset as one of the diradd structures on the pd_pendinghd list
*and* a syscall is done that tries to remove this directory entry before
this directory block has been written to disk (which would empty
pd_pendinghd), a sanity check in newdirrem() will call panic() when it
notices that the inode number in the entry that it is to be removed doesn't
match the inode number in the diradd structure with that offset of that
entry.
Reviewed by: Kirk McKusick <mckusick@McKusick.COM>
Submitted by: Don Lewis <Don.Lewis@tsc.tdk.com>
happen when an NFS exported filesystem tries to remove a locally
mounted on directory.
PR: kern/7272
Submitted by: Andre Albsmeier <andre.albsmeier@mchp.siemens.de>
- don't set the clean flag on unmount of an unclean filesystem that was
(forcibly) mounted rw.
- set the clean flag on rw -> ro update of a mounted initially-clean
filesystem.
- fixed some style bugs (mostly long lines).
This uses the fs_flags field and FS_UNCLEAN state bit which were
introduced in the softdep changes. NetBSD uses extra state bits in
fs_clean.
Reviewed by: luoqui
Submitted by: Kirk McKusick <mckusick@McKusick.COM>
Two minor changes are also included,
1. Remove gratuitious checks for error return from vn_lock with LK_RETRY set,
vn_lock should always succeed in these cases.
2. Back out change rev. 1.36->1.37, which unnecessarily makes async mount
a little more unstable. It also keeps us in sync with other BSDs.
Suggested by: Bruce Evans <bde@zeta.org.au>
references to them.
The change a couple of days ago to ignore these numbers in statically
configured vfsconf structs was slightly premature because the cd9660,
cfs, devfs, ext2fs, nfs vfs's still used MOUNT_* instead of the number
in their vfsconf struct.
device drivers about sectors no longer in use.
Device-drivers receive the call through d_strategy, if they have
D_CANFREE in d_flags.
This allows flash based devices to erase the sectors and avoid
pointlessly carrying them around in compactions.
Reviewed by: Kirk Mckusick, bde
Sponsored by: M-Systems (www.m-sys.com)
clustering is obsolescent technology so hardly anyone noticed. On
a DORS 32160 SCSI drive with 4 tags, read clustering makes very
little difference even for huge sequential reads. However, on a
ZIP SCSI drive with 0 tags, the minimum overhead per block is about
40 msec, so very large clusters must be used to get anywhere near
the maximum transfer rate. Using clusters consisting of 1 8K block
reduces the transfer rate to about 250K/sec. Under msdosfs, missing
read clustering is normal and a cluster size of 1 512 byte block
reduces the transfer rate to about 25K/sec.
Broken in: rev.1.18
formats and args to match. Fixed old printf format errors (all related;
most were hidden by calling printf indirectly).
This change somehow avoids compiler bugs for 64-bit longs on i386's,
although it increases the number of 64-bit calculations.
respectively. Most of the longs should probably have been
u_longs, but this changes is just to prevent warnings about
casts between pointers and integers of different sizes, not
to fix poorly chosen types.
as the value in b_vp is often not really what you want.
(and needs to be frobbed). more cleanups will follow this.
Reviewed by: Bruce Evans <bde@freebsd.org>
as possible (when the inode is reclaimed). Temporarily only do
this if option UFS_LAZYMOD configured and softupdates aren't enabled.
UFS_LAZYMOD is intentionally left out of /sys/conf/options.
This is mainly to avoid almost useless disk i/o on battery powered
machines. It's silly to write to disk (on the next sync or when the
inode becomes inactive) just because someone hit a key or something
wrote to the screen or /dev/null.
PR: 5577
Previous version reviewed by: phk
in ufs_setattr() so that there is no need to pass timestamps to
UFS_UPDATE() (everything else just needs the current time). Ignore
the passed-in timestamps in UFS_UPDATE() and always call ufs_itimes()
(was: itimes()) to do the update. The timestamps are still passed
so that all the callers don't need to be changed yet.
that had an inode that has not yet been written to disk, when the inode of the
new file is also not yet written to disk, and your old directory entry is not
yet on disk but you need to remove it and the new name exists in memory
but has been deleted but the transaction to write the deleted name to disk
exists and has not yet been cancelled by the request to delete the non
existant name. I don't know how kirk could have missed such a glaring
problem for so long. :-) Especially since the inconsitency survived on
the disk for a whole 4 second on average before being fixed by other code.
This was not a crashing bug but just led to filesystem inconsitencies
if you crashed.
Submitted by: Kirk McKusick (mckusick@mckusick.com)
(doingdirectory && !newparent) case of ufs_rename().
rename("D1/X/", "D2/Y/") gives a wrong link count for D2.
Submitted by: Bruce Evans <bde@zeta.org.au>
Reviewed by: Kirk McKusick <mckusick@McKusick.COM>
is now ignored for special files, so that mounting root with option
noatime doesn't break reporting of idle times in programs like `w'.
The problem of execessive disk updates just to stamp atimes will be
handled for special files by only writing atimes to disk when inodes
become active. This works well because special files are relatively
uncommon and their atimes are even more disposable at panic time than
regular files' atimes.
1. mark atimes and mtimes of special files and fifos for update upon
successful completion of non-null i/o, not at the beginning of the
syscall.
2. never update file times for readonly filesystems. They were updated
for stats and closes but not for syncs. The updates were of course
only in-core and were thrown away when the inode was uncached, so
the times sometimes appeared to go backwards.
Improved comments in code related to (1) (mostly by removing them).
Unmacroized ITIMES(). The test in (2) bloated it even more. Don't
call getmicrotime() in the function version of it when we only need
the time in seconds.