- markvoldirty() needs to write to underlying GEOM provider. We
have to do that *before* g_access() which sets the GEOM provider
to read-only.
- Remove dirty flag before free'ing iconv related resources. The
dirty flag removal could fail, and it is hard to revert the
iconv-free after the fail.
- Mark volume as dirty if we have failed to mark it clean for safe.
- Other style fixes to the touched functions.
This is much simpler than for ffs since there are many fewer places
where we need to choose between a delayed write and a sync write --
just 5 in msdosfs and more than 30 in ffs.
This is more complete and correct than in ffs. Several places in ffs
are are still missing the choice. ffs_update() has a layering violation
that breaks callers which want to force a sync update (mainly fsync(2)
and O_SYNC write(2)).
However, fsync(2) and O_SYNC write(2) are still more broken than in
ffs, since they are broken for default (non-sync non-async) mounts
too. Both fail to sync the FAT in all cases, and both fail to sync
the directory entry in some cases after losing a race. Async everything
is probably safer than the half-baked sync of metadata given by default
mounts.
leaving space for adding missing options. Negative options are sorted
after removing their "no" prefix, and generic options are sorted before
msdosfs-specific ones.
(except indirectly for the size pseudo-attribute). If anything deserves
a sync update, then it is ids and immutable flags, since these are
related to security, but ffs never synced these and msdosfs doesn't
support them. (ufs_setattr() only does an update in one case where
it is least needed (for timestamps); it did pessimal sync updates for
timestamps until 1998/03/08 but was changed for unlogged reasons related
to soft updates.)
Now msdosfs calls deupdat() with waitfor == 0, which normally gives a
delayed update to disk but always gives a sync update of timestamps
in core, while for ffs everything is delayed until the syncer daemon
or other activity causes an update (except for timestamps).
This gives a large optimization mainly for things like cp -p, where
attribute adjustment could easily triple the number of physical I/O's
if it is done synchronously (but cp -p to msdosfs is not as bad as
that, since msdosfs doesn't support many attributes so null adjustments
are more common, and msdosfs doesn't support ctimes so even if cp
doesn't weed out null adjustments they don't become non-null after
clobbering the ctime).
(it is established practice) and ``-o whiteout=whenneeded'' is less
disk-space using mode especially for resource restricted environments
like embedded environments. (Contributed by Ed Schouten. Thanks)
Submitted by: Masanori Ozawa <ozawa@ongs.co.jp> (unionfs developer)
Reviewed by: jeff, kensmith
Approved by: re (kensmith)
MFC after: 1 week
Some folks who have reported some issues have solved with transparent mode.
We guess it is time to change the default copy mode. The transparent-mode is
the best in most situations.
Submitted by: Masanori Ozawa <ozawa@ongs.co.jp> (unionfs developer)
Reviewed by: jeff, kensmith
Approved by: re (kensmith)
MFC after: 1 week
applications that use procfs on unionfs.
- Removed unionfs internal cache mechanism because it has
vfs_cache support instead. As a result, it just simplified code of
unionfs.
- Fixed kern/111262 issue.
Submitted by: Masanori Ozawa <ozawa@ongs.co.jp> (unionfs developer)
Reviewed by: jeff, kensmith
Approved by: re (kensmith)
MFC after: 1 week
directory itself (rather than any of its contents) is visible to the
current thread.
MFC after: 1 week
PR: kern/90063
Submitted by: john of 8192.net
Approved by: re (kensmith)
All active fields in fsi are advisory/optional, so we shouldn't do
extra work to make them valid at all times, but instead we write to
the fsi too often (we still do), and we searched for a free cluster
for fsinxtfree too often.
This commit just removes the whole search and its results, so that we
write out our in-core copy of fsinxtfree instead of writing a "fixed"
copy and clobbering our in-core copy. This saves fixing 3 bugs:
- off-by-1 error for the end of the search, resulting in fsinxtfree
not actually being adjusted iff only the last cluster is free.
- missing adjustment when no clusters are free.
- off-by-many error for the start of the search. Starting the search
at 0 instead of at (the in-core copy of) fsinxtfree did more than
defeat the reasons for existence of fsinxtfree. fsinxtfree exists
mainly to avoid having to start at 0 for just the first search per
mount, but has the side effect of reducing bias towards allocating
near cluster 0. The bias would normally only be generated by the
first search per mount (if fsinxtfree is not supported), but since
we also adjusted the in-core copy of fsinxtfree here, we were doing
extra work to maximize the bias.
Approved by: re (kensmith)
Eliminates panics due to locking issues.
Idea taken from src/sys/gnu/fs/xfs/FreeBSD/xfs_super.c.
PR: 89966, 92000, 104393
Reported by: H. Matsuo <hiroshi50000 yahoo co jp>,
Chris <m2chrischou gmail.com>,
Andrey V. Elsukov <bu7cher yandex ru>,
Jan Henrik Sylvester <me janh de>
Approved by: re (kensmith)
- p_sflag was mostly protected by PROC_LOCK rather than the PROC_SLOCK or
previously the sched_lock. These bugs have existed for some time.
- Allow swapout to try each thread in a process individually and then
swapin the whole process if any of these fail. This allows us to move
most scheduler related swap flags into td_flags.
- Keep ki_sflag for backwards compat but change all in source tools to
use the new and more correct location of P_INMEM.
Reported by: pho
Reviewed by: attilio, kib
Approved by: re (kensmith)
can easily block in bread(), and then there was nothing to prevent the
static buffer (nambuf_{ptr,len,last_id}) being clobbered by another
thread.
The effects of the bug seem to have been limited to failed lookups and
mangled names in readdir(), since Giant locking provides enough
serialization to prevent concurrent calls to the functions that access
the buffer. They were very obvious for multiple concurrent tree walks,
especially with a small cluster size.
The bug was introduced in msdosfs_conv.c 1.34 and associated changes,
and is in all releases starting with 5.2.
The fix is to allocate the buffer as a local variable and pass around
pointers to it like "_r" functions in libc do. Stack use from this
is large but not too large. This also fixes a memory leak on module
unload.
Reviewed by: kib
Approved by: re (kensmith)
% mount | grep home
/dev/ad4s1e on /home (ufs, local, noatime, soft-updates)
% mount -u -o atime /home
% mount | grep home
/dev/ad4s1e on /home (ufs, local, soft-updates)
Restore this behavior for on 7.x for the following mount options:
noatime, noclusterr, noclusterw, noexec, nosuid, nosymfollow
In addition, on 7.x, the following are equivalent:
mount -u -o atime /home
mount -u -o nonoatime /home
Ideally, when we introduce new mount options, we should avoid
options starting with "no". :)
Requested by: jhb
Reported by: Karol Kwiat <karol.kwiat gmail com>, Scott Hetzel <swhetzel gmail com>
Approved by: re (bmah)
Proxy commit for: rodrigc
- LK_RETRY prohibits vget() and vn_lock() to return error.
Remove associated code. [1]
- Properly use vhold() and vdrop() instead of their unlocked
versions, we are guaranteed to have the vnode's interlock
unheld. [1]
- Fix a pseudo-infinite loop caused by 64/32-bit arithmetic
with the same way used in modern NetBSD versions. [2]
- Reorganize tmpfs_readdir to reduce duplicated code.
Submitted by: kib [1]
Obtained from: NetBSD [2]
Approved by: re (tmpfs blanket)
- Respect cnflag and don't lock vnode always as LK_EXCLUSIVE [1]
- Properly lock around tn_vnode to avoid NULL deference
- Be more careful handling vnodes (*)
(*) This is a WIP
[1] by pjd via howardsu
Thanks kib@ for his valuable VFS related comments.
Tested with: fsx, fstest, tmpfs regression test set
Found by: pho's stress2 suite
Approved by: re (tmpfs blanket)
(uio_offset < 0) since this can't happen. If this happens, then the
general code handles the problem safely (better than before for reading,
returning 0 (EOF) instead of the bogus errno EINVAL, and the same as
before for writing, returning EFBIG).
In msdosfs_read(), don't check for (uio_resid < 0). msdosfs_write()
already didn't check.
In msdosfs_read(), document in a comment our assumptions that the caller
passed a valid uio_offset and uio_resid. ffs checks using KASSERT(),
and that is enough sanity checking. In the same comment, partly document
there is no need to check for the EOVERFLOW case, unlike in ffs where this
case can happen at least in theory.
In msdosfs_write(), add a comment about why the checking of
(uio_resid == 0) is explicit, unlike in ffs.
In msdosfs_write(), check for impossibly large final offsets before
checking if the file size rlimit would be exceeded, so that we don't
have an overflow bug in the rlimit check and are consistent with ffs.
We now return EFBIG instead of EFBIG plus a SIGXFSZ signal if the final
offset would be impossibly large but not so large as to cause overflow.
Overflow normally gave the benign behaviour of no signal.
Approved by: re (kensmith) (blanket)
remove some parentheses; fix some whitespace errors; fix only one case of
a boolean comparison of a non-boolean).
Improve an error message by quoting ".", and by not printing large positive
values as negative ones.
Approved by: re (kensmith) (blanket)
namespace pollution in <sys/vnode.h>.
Sort the include of <sys/mutex.h> instead of unsorting it after
<sys/vnode.h> and depending on the pollution there.
Approved by: re (kensmith) (blanket)
sector, instead of failing the whole mount if it is garbage. Fields
in the fsinfo sector are only advisory, so there are better sanity
checks than this, and we already silently fix up the only other advisory
field in the fsinfo (the free cluster count).
This wasn't handled quite right in rev.1.92, 1.117, or in NetBSD. 1.92
also failed the whole mount for the non-garbage magic value 0xffffffff
1.117 fixed this well enough in practice since garbage values shouldn't
occur in practice, but left the error handling larger and more convoluted
than necessary. Now we handle the magic value as a special case of
fixing up all out of bounds values.
Also fix up the estimated next free cluster number when there is no
fsinfo sector. We were using 0, but CLUST_FIRST is safer.
Approved by: re (kensmith)
message explained why the size is 1 sector, but the code used a
size of 1 cluster.
I/o sizes larger than necessary may cause serious coherency problems
in the buffer cache. Here I think there were only minor efficiency
problems, since a too-large fsinfo buffer could only get far enough
to overlap buffers for the same vnode (the device vnode), so mappings
are coherent at the page level although not at the buffer level, and
the former is probably enough due to our limited use of the fsinfo
buffer.
Approved by: re (kensmith)
- Copy before testing a pointer. This closes a race window.
- Use msleep with the node interlock instead of tsleep.
- Do proper locking around access to tn_vpstate.
- Assert vnode VOP lock for dir_{atta,de}tach to capture
inconsistent locking.
Suggested by: kib
Submitted by: delphij
Reviewed by: Howard Su
Approved by: re (tmpfs blanket)
This fixes tmpfs caculations on 32-bit systems equipped with more than
4GB swap.
Reported by: Craig Boston <craig xfoil gank org>
PR: kern/114870
Approved by: re (tmpfs blanket)
o Initialize ownerships and permissions. They were garbage (0) for
root mounts since vfs_mountroot_try() doesn't ask for them to be set
and msdosfs's old incomplete code to set them was removed. The
garbage happened to give the correct ownerships root:wheel, but it
gave permissions 000 so init could not be execed. Use the macros
for root: wheel and 0755. (The removed code gave 0:0 and 0777. 0755
is more normal and secure, thought wrong for /tmp.)
o Check the readonly flag for initial (non-MNT_UPDATE) mounts in the
correct place, as in ffs. For root mounts, it is only passed in
mp->mnt_flags, since vfs_mountroot_try() only passes it as a flag
and nothing translates the flag to the "ro" option string. msdosfs
only looked for it in the string, so it gave a rw mount for root
mounts without even clearing the flag in mp->mnt_flags, so the final
state was inconsistent. Checking the flag only in mp->mnt_flags
works for initial userland mounts too. The MNT_UPDATE case is
messier.
The main point that should work but doesn't is fsck of msdosfs root
while it is mounted ro. This needs mainly MNT_RELOAD support to work.
It should be possible to run fsck -p and succeed provided the fs is
consistent, not just for msdosfs, but this fails because fsck -p always
tries to open the device rw. The hack that allows open for writing
in ffs is not implemented in msdosfs, since without MNT_RELOAD support
writing could only be harmful. So fsck must be turned off to use
msdosfs as root. This is quite dangerous, since msdosfs is still missing
actually using its fs-dirty flag internally, so it is happy to mount
dirty fileystems rw.
Unrelated changes:
- Fix missing error handling for MNT_UPDATE from rw to ro.
- Catch up with renaming msdos to msdosfs in a string.
Approved by: re (kensmith)
physical memory pages into account for tm_maxfilesize.
Reported by: Dominique Goncalves <dominique.goncalves gmail.com>
Submitted by: Howard Su
Approved by: re (tmpfs blanket)
This gives a very large speedup for small block sizes (in my tests,
about 5 times for write and 3 times for read with a block size of 512,
if clustering is possible) and a moderate speedup for the moderatatly
large block sizes that should be used on non-small media (4K is the
best size in most cases, and the speedup for that is about 1.3 times
for write and 1.2 times for read). mmap() should benefit from clustering
like read()/write(), but the current implementation of vm only supports
clustering (at least for getpages) if the fs block size is >= PAGE SIZE.
msdosfs is now only slightly slower than ffs with soft updates for
writing and slightly faster for reading when both use their best block
sizes. Writing is slower for msdosfs because of more sync writes.
Reading is faster for msdosfs because indirect blocks interfere with
clustering in ffs.
The changes in msdosfs_read() and msdosfs_write() are simpler merges
of corresponding code in ffs (after fixing some style bugs in ffs).
msdosfs_bmap() needs fs-specific code. This implementation loops
calling a lower level bmap function to do the hard parts. This is a
bit inefficient, but is efficient enough since msdsfs_bmap() is only
called when there is physical i/o to do.
Approved by: re (hrs)