Commit Graph

2086 Commits

Author SHA1 Message Date
rwatson
af893b7bf8 Use VOP_NULL rather than VOP_PANIC for Coda's vop_print routine, so as
to avoid panicking in DDB show lockedvnods.

MFC after:	3 days
2008-01-19 13:41:56 +00:00
rwatson
7c43871d32 Lock the new directory vnode returned by coda_mkdir(), as this is required
by FreeBSD's vnode locking protocol.

MFC after:	3 days
2008-01-19 13:29:14 +00:00
rwatson
642dbf24b6 Borrow the VM object associated with an underlying cache vnode with the
Coda vnode derived from it, in the style of nullfs.  This allows files
in the Coda file system to be memory-mapped, such as with execve(2) or
mmap(2).

MFC after:	3 days
Reported by:	Rune <u+openafsdev-sr55 at chalmers dot se>
2008-01-19 13:27:14 +00:00
kib
8aee27b5a3 udf_vget() shall vgone() the vnode when the file_entry cannot be allocated
or read from the volume. Otherwise, half-constructed vnode could be found
later and cause panic when accessed.

PR:	118322
MFC after:	1 week
2008-01-18 12:09:54 +00:00
attilio
71b7824213 VOP_LOCK1() (and so VOP_LOCK()) and VOP_UNLOCK() are only used in
conjuction with 'thread' argument passing which is always curthread.
Remove the unuseful extra-argument and pass explicitly curthread to lower
layer functions, when necessary.

KPI results broken by this change, which should affect several ports, so
version bumping and manpage update will be further committed.

Tested by: kris, pho, Diego Sardina <siarodx at gmail dot com>
2008-01-13 14:44:15 +00:00
attilio
18d0a0dd51 vn_lock() is currently only used with the 'curthread' passed as argument.
Remove this argument and pass curthread directly to underlying
VOP_LOCK1() VFS method. This modify makes the code cleaner and in
particular remove an annoying dependence helping next lockmgr() cleanup.
KPI results, obviously, changed.

Manpage and FreeBSD_version will be updated through further commits.

As a side note, would be valuable to say that next commits will address
a similar cleanup about VFS methods, in particular vop_lock1 and
vop_unlock.

Tested by:	Diego Sardina <siarodx at gmail dot com>,
		Andrea Di Pasquale <whyx dot it at gmail dot com>
2008-01-10 01:10:58 +00:00
attilio
9e182da191 Remove explicit calling of lockmgr() with the NULL argument.
Now, lockmgr() function can only be called passing curthread and the
KASSERT() is upgraded according with this.

In order to support on-the-fly owner switching, the new function
lockmgr_disown() has been introduced and gets used in BUF_KERNPROC().
KPI, so, results changed and FreeBSD version will be bumped soon.
Differently from previous code, we assume idle thread cannot try to
acquire the lockmgr as it cannot sleep, so loose the relative check[1]
in BUF_KERNPROC().

Tested by: kris

[1] kib asked for a KASSERT in the lockmgr_disown() about this
condition, but after thinking at it, as this is a well known general
rule, I found it not really necessary.
2008-01-08 23:48:31 +00:00
jhb
77e5cd5541 Lock the vnode interlock while reading v_usecount to update si_usecount
in a cdev in devfs_reclaim().

MFC after:	3 days
Reviewed by:	jeff (a while ago)
2008-01-08 04:45:24 +00:00
jhb
f8a246b979 Make ftruncate a 'struct file' operation rather than a vnode operation.
This makes it possible to support ftruncate() on non-vnode file types in
the future.
- 'struct fileops' grows a 'fo_truncate' method to handle an ftruncate() on
  a given file descriptor.
- ftruncate() moves to kern/sys_generic.c and now just fetches a file
  object and invokes fo_truncate().
- The vnode-specific portions of ftruncate() move to vn_truncate() in
  vfs_vnops.c which implements fo_truncate() for vnode file types.
- Non-vnode file types return EINVAL in their fo_truncate() method.

Submitted by:	rwatson
2008-01-07 20:05:19 +00:00
attilio
be66714153 g_vfs_close() wants the sx topology lock held while executing, so just
add correct locking to the operation of unmounting.
This will prevent debugging kernels from panicking if mounting a
non-hpfs partition (I'm not sure if this can be a problem with a
successful mounting operation though).

MFC: 3 days
2008-01-07 16:51:24 +00:00
jeff
ce18638805 Remove explicit locking of struct file.
- Introduce a finit() which is used to initailize the fields of struct file
   in such a way that the ops vector is only valid after the data, type,
   and flags are valid.
 - Protect f_flag and f_count with atomic operations.
 - Remove the global list of all files and associated accounting.
 - Rewrite the unp garbage collection such that it no longer requires
   the global list of all files and instead uses a list of all unp sockets.
 - Mark sockets in the accept queue so we don't incorrectly gc them.

Tested by:	kris, pho
2007-12-30 01:42:15 +00:00
attilio
c720b77ca9 Trimm out now unused option LK_EXCLUPGRADE from the lockmgr namespace.
This option just adds complexity and the new implementation no longer
will support it, so axing it now that it is unused is probabilly the
better idea.

FreeBSD version is bumped in order to reflect the KPI breakage introduced
by this patch.

In the ports tree, kris found that only old OSKit code uses it, but as
it is thought to work only on 2.x kernels serie, version bumping will
solve any problem.
2007-12-28 00:38:13 +00:00
rwatson
bdee30611d Add a new 'why' argument to kdb_enter(), and a set of constants to use
for that argument.  This will allow DDB to detect the broad category of
reason why the debugger has been entered, which it can use for the
purposes of deciding which DDB script to run.

Assign approximate why values to all current consumers of the
kdb_enter() interface.
2007-12-25 17:52:02 +00:00
markus
042cca2ff9 Fix calculation of descriptor tag checksums. According to ECMA-167, Part 4,
7.2.3, bytes 0-3 and 5-15 are used to calculate the checksum of a descriptor
tag.

PR:		kern/90521
Submitted by:	Björn König <bkoenig@cs.tu-berlin.de>
Reviewed by:	scottl
Approved by:	emax (mentor)
2007-12-11 19:49:40 +00:00
delphij
e93a39275f Turn MPASS(0) into panic with more obvious reason why the assertion
is failed.
2007-12-07 00:00:21 +00:00
delphij
17d10d92fe size_max should be unsigned, as such, use size_t here. 2007-12-06 23:19:05 +00:00
wkoszek
411cf00f62 Explicitly initialize 'error' to 0 (two places). It lets one to build tmpfs
from the latest source tree with older compiler--gcc3.

Reviewed by:	kib@ (on freebsd-current@)
Approved by:	cognet@ (mentor)
2007-12-04 20:14:15 +00:00
maxim
f5a81b2cdb o English lesson from bde@: "iff" is not a typo, it means "if and only if".
Backout previous.
2007-11-18 09:21:30 +00:00
delphij
c5d6580fd4 MFp4: Several fixes to tmpfs which makes it to survive from pho@'s
strees2 suite, to quote his letter, this change:

1. It removes the tn_lookup_dirent stuff. I think this cannot be fixed,
   because nothing protects vnode/tmpfs node between lookup is done, and
   actual operation is performed, in the case the vnode lock is dropped.
   At least, this is the case with the from vnode for rename.

   For now, we do the linear lookup in the parent node. This has its own
   drawbacks. Not mentioning speed (that could be fixed by using hash), the
   real problem is the situation where several hardlinks exist in the dvp.
   But, I think this is fixable.

2. The patch restores the VV_ROOT flag on the root vnode after it became
   reclaimed and allocated again. This fixes MPASS assertion at the start
   of the tmpfs_lookup() reported by many.

Submitted by:	kib
2007-11-18 04:52:40 +00:00
delphij
b0d971c673 MFp4: Fix several style(9) bugs.
Submitted by:	des
2007-11-18 04:40:42 +00:00
maxim
8654cdfefc o Mask maximum file permissions we get from mount_ntfs -m
with ACCESSPERMS.  Document in mount_ntfs(8) only the nine
low-order bits of mask are used (taken from mount_msdosfs(8)).

PR:		kern/114856
Submitted by:	Ighighi
MFC after:	1 month
2007-11-17 17:05:01 +00:00
maxim
31bc38f095 o Fix a typo in the comment. 2007-11-17 16:19:48 +00:00
maxim
92ae81ca91 o Do not leak inodes hash table at module unload.
PR:		kern/118017
Submitted by:	Ighighi
MFC after:	1 week
2007-11-13 19:34:06 +00:00
delphij
99be157da9 Correct a stack overflow which will trigger panics when
mode= is specified, caused by incorrect format string
specified to vfs_scanopt() and subsequently vsscanf().

Pointed out by:	kib
Submitted by:	des
2007-11-12 18:57:33 +00:00
trhodes
9d167f82e4 Remove some debugging code that, while useful, doesn't belong in the committed
version.  While here, expand a macro only used once.

Discussed with/oked by:	bde
2007-10-25 08:23:08 +00:00
rwatson
60570a92bf Merge first in a series of TrustedBSD MAC Framework KPI changes
from Mac OS X Leopard--rationalize naming for entry points to
the following general forms:

  mac_<object>_<method/action>
  mac_<object>_check_<method/action>

The previous naming scheme was inconsistent and mostly
reversed from the new scheme.  Also, make object types more
consistent and remove spaces from object types that contain
multiple parts ("posix_sem" -> "posixsem") to make mechanical
parsing easier.  Introduce a new "netinet" object type for
certain IPv4/IPv6-related methods.  Also simplify, slightly,
some entry point names.

All MAC policy modules will need to be recompiled, and modules
not updates as part of this commit will need to be modified to
conform to the new KPI.

Sponsored by:	SPARTA (original patches against Mac OS X)
Obtained from:	TrustedBSD Project, Apple Computer
2007-10-24 19:04:04 +00:00
delphij
f1d1779743 Fixes to msdosfs dirtyflag related stuff:
- markvoldirty() needs to write to underlying GEOM provider.  We
   have to do that *before* g_access() which sets the GEOM provider
   to read-only.
 - Remove dirty flag before free'ing iconv related resources.  The
   dirty flag removal could fail, and it is hard to revert the
   iconv-free after the fail.
 - Mark volume as dirty if we have failed to mark it clean for safe.
 - Other style fixes to the touched functions.
2007-10-22 17:43:43 +00:00
bde
c590272b42 Implement the async (really, delayed-write) mount option for msdosfs.
This is much simpler than for ffs since there are many fewer places
where we need to choose between a delayed write and a sync write --
just 5 in msdosfs and more than 30 in ffs.

This is more complete and correct than in ffs.  Several places in ffs
are are still missing the choice.  ffs_update() has a layering violation
that breaks callers which want to force a sync update (mainly fsync(2)
and O_SYNC write(2)).

However, fsync(2) and O_SYNC write(2) are still more broken than in
ffs, since they are broken for default (non-sync non-async) mounts
too.  Both fail to sync the FAT in all cases, and both fail to sync
the directory entry in some cases after losing a race.  Async everything
is probably safer than the half-baked sync of metadata given by default
mounts.
2007-10-19 12:23:25 +00:00
bde
d2c2b5f35c Add noclusterr and noclusterw options to the options list. I forgot these
when I implemented clustering.
2007-10-18 16:25:47 +00:00
bde
adbeba35f8 Fix some style bugs in the mount options list. Mainly, sort the list,
leaving space for adding missing options.  Negative options are sorted
after removing their "no" prefix, and generic options are sorted before
msdosfs-specific ones.
2007-10-18 15:48:10 +00:00
bde
896bab2157 In msdosfs_settattr(), don't do synchronous updates of the denode
(except indirectly for the size pseudo-attribute).  If anything deserves
a sync update, then it is ids and immutable flags, since these are
related to security, but ffs never synced these and msdosfs doesn't
support them.  (ufs_setattr() only does an update in one case where
it is least needed (for timestamps); it did pessimal sync updates for
timestamps until 1998/03/08 but was changed for unlogged reasons related
to soft updates.)

Now msdosfs calls deupdat() with waitfor == 0, which normally gives a
delayed update to disk but always gives a sync update of timestamps
in core, while for ffs everything is delayed until the syncer daemon
or other activity causes an update (except for timestamps).

This gives a large optimization mainly for things like cp -p, where
attribute adjustment could easily triple the number of physical I/O's
if it is done synchronously (but cp -p to msdosfs is not as bad as
that, since msdosfs doesn't support many attributes so null adjustments
are more common, and msdosfs doesn't support ctimes so even if cp
doesn't weed out null adjustments they don't become non-null after
clobbering the ctime).
2007-10-18 07:26:21 +00:00
alfred
3a60df401c Get rid of qaddr_t.
Requested by: bde
2007-10-16 10:54:55 +00:00
daichi
87bd60ac74 This changes give nullfs correctly work with latest unionfs.
Submitted by:   Masanori Ozawa <ozawa@ongs.co.jp> (unionfs developer)
Reviewed by:    jeff, kensmith
Approved by:    re (kensmith)
MFC after:      1 week
2007-10-14 13:57:11 +00:00
daichi
b4e293afdf Added whiteout behavior option. ``-o whiteout=always'' is default mode
(it is established practice) and ``-o whiteout=whenneeded'' is less
disk-space using mode especially for resource restricted environments
like embedded environments. (Contributed by Ed Schouten. Thanks)

Submitted by:   Masanori Ozawa <ozawa@ongs.co.jp> (unionfs developer)
Reviewed by:    jeff, kensmith
Approved by:    re (kensmith)
MFC after:      1 week
2007-10-14 13:55:38 +00:00
daichi
7759a8a0eb Default copy mode has been changed from traditional-mode to transparent-mode.
Some folks who have reported some issues have solved with transparent mode.
We guess it is time to change the default copy mode. The transparent-mode is
the best in most situations.

Submitted by:   Masanori Ozawa <ozawa@ongs.co.jp> (unionfs developer)
Reviewed by:    jeff, kensmith
Approved by:    re (kensmith)
MFC after:      1 week
2007-10-14 13:53:38 +00:00
daichi
1b42caf41d Fixed un-vrele issue of upper layer root vnode of unionfs.
Submitted by:   Masanori Ozawa <ozawa@ongs.co.jp> (unionfs developer)
Reviewed by:    jeff, kensmith
Approved by:    re (kensmith)
MFC after:      1 week
2007-10-14 13:52:01 +00:00
daichi
1f6ec6407c Added NULL check code pointed out by Coverity. (via Stanislav
Sedov. Thanks)

Submitted by:   Masanori Ozawa <ozawa@ongs.co.jp> (unionfs developer)
Reviewed by:    jeff, kensmith
Approved by:    re (kensmith)
MFC after:      1 week
2007-10-14 13:50:58 +00:00
daichi
bf7aeca620 - It has been become MPSAFE.
- Fixed lock panic issue under MPSAFE.
- Fixed panic issue whenever it locks vnode with reclaim.
- Fixed lock implementations not conforming to vnode_if.src style.

Submitted by:   Masanori Ozawa <ozawa@ongs.co.jp> (unionfs developer)
Reviewed by:    jeff, kensmith
Approved by:    re (kensmith)
MFC after:      1 week
2007-10-14 13:49:30 +00:00
daichi
f3fd8ae96c Fixed vnode unlock/vrele untreated issues whenever errors have
occurred during some treatments.

Submitted by:   Masanori Ozawa <ozawa@ongs.co.jp> (unionfs developer)
Reviewed by:    jeff, kensmith
Approved by:    re (kensmith)
MFC after:      1 week
2007-10-14 13:47:44 +00:00
daichi
a009cf6b3c - Added support for vfs_cache on unionfs. As a result, you can use
applications that use procfs on unionfs.
- Removed unionfs internal cache mechanism because it has
  vfs_cache support instead. As a result, it just simplified code of
  unionfs.
- Fixed kern/111262 issue.

Submitted by:   Masanori Ozawa <ozawa@ongs.co.jp> (unionfs developer)
Reviewed by:    jeff, kensmith
Approved by:    re (kensmith)
MFC after:      1 week
2007-10-14 13:46:11 +00:00
daichi
4aad1608ad Added treatments to prevent readdir infinity loop using with Linux binary
compatibility feature.

Submitted by:   Masanori Ozawa <ozawa@ongs.co.jp> (unionfs developer)
Reviewed by:    jeff, kensmith
Approved by:    re (kensmith)
MFC after:      1 week
2007-10-14 13:44:06 +00:00
daichi
a763e0d0a2 Changed it frees unneeded memory ASAP.
Submitted by:   Masanori Ozawa <ozawa@ongs.co.jp> (unionfs developer)
Reviewed by:    jeff, kensmith
Approved by:    re (kensmith)
MFC after:      1 week
2007-10-14 13:42:05 +00:00
daichi
dc348d6e70 Log:
Improved access permission check treatments.

Submitted by:   Masanori Ozawa <ozawa@ongs.co.jp> (unionfs developer)
Reviewed by:    jeff, kensmith
Approved by:    re (kensmith)
MFC after:      1 week
2007-10-14 13:37:52 +00:00
jhb
3739d97391 Use the correct pid when checking to see whether or not the /proc/<pid>
directory itself (rather than any of its contents) is visible to the
current thread.

MFC after:	1 week
PR:		kern/90063
Submitted by:	john of 8192.net
Approved by:	re (kensmith)
2007-10-05 17:37:25 +00:00
delphij
0c91cfd26b MFp4: Provide a dummy verb "export" to shut up the message
showed up at start when NFS is enabled.

Reported by:	rafan
Approved by:	re (tmpfs blanket)
2007-10-04 17:11:48 +00:00
delphij
679cdcf0e4 Additional work is still needed before we can claim that tmpfs
is stable enough for production usage.  Warn user upon mount.

Approved by:	re (tmpfs blanket)
2007-10-04 17:08:46 +00:00
bde
5cdc06872e Remove some of the pessimizations involving writing the fsi sector.
All active fields in fsi are advisory/optional, so we shouldn't do
extra work to make them valid at all times, but instead we write to
the fsi too often (we still do), and we searched for a free cluster
for fsinxtfree too often.

This commit just removes the whole search and its results, so that we
write out our in-core copy of fsinxtfree instead of writing a "fixed"
copy and clobbering our in-core copy.  This saves fixing 3 bugs:
- off-by-1 error for the end of the search, resulting in fsinxtfree
  not actually being adjusted iff only the last cluster is free.
- missing adjustment when no clusters are free.
- off-by-many error for the start of the search.  Starting the search
  at 0 instead of at (the in-core copy of) fsinxtfree did more than
  defeat the reasons for existence of fsinxtfree.  fsinxtfree exists
  mainly to avoid having to start at 0 for just the first search per
  mount, but has the side effect of reducing bias towards allocating
  near cluster 0.  The bias would normally only be generated by the
  first search per mount (if fsinxtfree is not supported), but since
  we also adjusted the in-core copy of fsinxtfree here, we were doing
  extra work to maximize the bias.

Approved by:	re (kensmith)
2007-09-23 14:49:32 +00:00
rodrigc
b2b7d089f7 Disable multiple ntfs mounts to the same mountpoint.
Eliminates panics due to locking issues.
Idea taken from src/sys/gnu/fs/xfs/FreeBSD/xfs_super.c.

PR:	89966, 92000, 104393
Reported by:	H. Matsuo <hiroshi50000 yahoo co jp>,
		Chris <m2chrischou gmail.com>,
		Andrey V. Elsukov <bu7cher yandex ru>,
		Jan Henrik Sylvester <me janh de>
Approved by:	re (kensmith)
2007-09-21 23:50:15 +00:00
jeff
3fc0f8b973 - Move all of the PS_ flags into either p_flag or td_flags.
- p_sflag was mostly protected by PROC_LOCK rather than the PROC_SLOCK or
   previously the sched_lock.  These bugs have existed for some time.
 - Allow swapout to try each thread in a process individually and then
   swapin the whole process if any of these fail.  This allows us to move
   most scheduler related swap flags into td_flags.
 - Keep ki_sflag for backwards compat but change all in source tools to
   use the new and more correct location of P_INMEM.

Reported by:	pho
Reviewed by:	attilio, kib
Approved by:	re (kensmith)
2007-09-17 05:31:39 +00:00
bde
8e0e951bed Fix races in msdosfs_lookup() and msdosfs_readdir(). These functions
can easily block in bread(), and then there was nothing to prevent the
static buffer (nambuf_{ptr,len,last_id}) being clobbered by another
thread.

The effects of the bug seem to have been limited to failed lookups and
mangled names in readdir(), since Giant locking provides enough
serialization to prevent concurrent calls to the functions that access
the buffer.  They were very obvious for multiple concurrent tree walks,
especially with a small cluster size.

The bug was introduced in msdosfs_conv.c 1.34 and associated changes,
and is in all releases starting with 5.2.

The fix is to allocate the buffer as a local variable and pass around
pointers to it like "_r" functions in libc do.  Stack use from this
is large but not too large.  This also fixes a memory leak on module
unload.

Reviewed by:	kib
Approved by:	re (kensmith)
2007-08-31 22:29:55 +00:00
delphij
e83de305a6 MFp4: rework tmpfs_readdir() logic in terms of correctness.
Approved by:	re (tmpfs blanket)
Tested with:	fstest, fsx
2007-08-16 11:00:07 +00:00
jhb
7fdc86bfe3 On 6.x this works:
% mount | grep home
/dev/ad4s1e on /home (ufs, local, noatime, soft-updates)
% mount -u -o atime /home
% mount | grep home
/dev/ad4s1e on /home (ufs, local, soft-updates)

Restore this behavior for on 7.x for the following mount options:
noatime, noclusterr, noclusterw, noexec, nosuid, nosymfollow

In addition, on 7.x, the following are equivalent:
mount -u -o atime /home
mount -u -o nonoatime /home

Ideally, when we introduce new mount options, we should avoid
options starting with "no". :)

Requested by:	jhb
Reported by:	Karol Kwiat <karol.kwiat gmail com>, Scott Hetzel <swhetzel gmail com>
Approved by:	re (bmah)
Proxy commit for:	rodrigc
2007-08-15 17:40:09 +00:00
delphij
5496743409 MFp4:
- LK_RETRY prohibits vget() and vn_lock() to return error.
   Remove associated code. [1]
 - Properly use vhold() and vdrop() instead of their unlocked
   versions, we are guaranteed to have the vnode's interlock
   unheld. [1]
 - Fix a pseudo-infinite loop caused by 64/32-bit arithmetic
   with the same way used in modern NetBSD versions. [2]
 - Reorganize tmpfs_readdir to reduce duplicated code.

Submitted by:	kib [1]
Obtained from:	NetBSD [2]
Approved by:	re (tmpfs blanket)
2007-08-10 11:00:30 +00:00
delphij
1e2d5f7f4a MFp4:
- Respect cnflag and don't lock vnode always as LK_EXCLUSIVE [1]
 - Properly lock around tn_vnode to avoid NULL deference
 - Be more careful handling vnodes (*)

(*) This is a WIP
[1] by pjd via howardsu

Thanks kib@ for his valuable VFS related comments.

Tested with:	fsx, fstest, tmpfs regression test set
Found by:	pho's stress2 suite
Approved by:	re (tmpfs blanket)
2007-08-10 05:24:49 +00:00
bde
7fe18219e6 In msdosfs_read() and msdosfs_write(), don't check explicitly for
(uio_offset < 0) since this can't happen.  If this happens, then the
general code handles the problem safely (better than before for reading,
returning 0 (EOF) instead of the bogus errno EINVAL, and the same as
before for writing, returning EFBIG).

In msdosfs_read(), don't check for (uio_resid < 0).  msdosfs_write()
already didn't check.

In msdosfs_read(), document in a comment our assumptions that the caller
passed a valid uio_offset and uio_resid.  ffs checks using KASSERT(),
and that is enough sanity checking.  In the same comment, partly document
there is no need to check for the EOVERFLOW case, unlike in ffs where this
case can happen at least in theory.

In msdosfs_write(), add a comment about why the checking of
(uio_resid == 0) is explicit, unlike in ffs.

In msdosfs_write(), check for impossibly large final offsets before
checking if the file size rlimit would be exceeded, so that we don't
have an overflow bug in the rlimit check and are consistent with ffs.
We now return EFBIG instead of EFBIG plus a SIGXFSZ signal if the final
offset would be impossibly large but not so large as to cause overflow.
Overflow normally gave the benign behaviour of no signal.

Approved by:	re (kensmith) (blanket)
2007-08-07 10:35:27 +00:00
bde
c2333909d4 Fix and update the comments about the effect of the read-only flag on writing.
They are still too verbose.

Remove nearby unreachable code for handling symlinks.

Approved by:	re (kensmith) (blanket)
2007-08-07 05:42:10 +00:00
bde
bc5f57144e Fix some style bugs (don't assume that off_t == int64_t; fix some comments;
remove some parentheses; fix some whitespace errors; fix only one case of
a boolean comparison of a non-boolean).

Improve an error message by quoting ".", and by not printing large positive
values as negative ones.

Approved by:	re (kensmith) (blanket)
2007-08-07 03:59:49 +00:00
bde
fa70acb379 Fix some style bugs (don't assume that off_t == int64_t; fix some comments;
remove some parentheses; fix only a couple of whtespace errors).

Approved by:	re (kensmith) (blanket)
2007-08-07 03:43:28 +00:00
bde
e46ce9b810 Fix some style bugs (mainly some whitespace errors).
Approved by:	re (kensmith) (blanket)
2007-08-07 03:38:36 +00:00
bde
23aced0f9b Fix some style bugs (some whitespace errors only).
Approved by:	re (kensmith) (blanket)
2007-08-07 03:22:10 +00:00
bde
a5b7135230 Sort includes.
Remove rotted banal comment attached to includes.

Approved by:	re (kensmith) (blanket)
2007-08-07 02:28:33 +00:00
bde
17de976386 Sort includes.
Remove banal comments attached to includes.

Approved by:	re (kensmith) (blanket)
2007-08-07 02:27:35 +00:00
bde
2a03f71880 Sort includes.
Remove banal comments before includes.  Remove rotted banal comments attached
to includes.

Approved by:	re (kensmith) (blanket)
2007-08-07 02:20:37 +00:00
bde
874f990600 Remove unused include(s).
Remove banal comments before includes.

Approved by:	re (kensmith) (blanket)
2007-08-07 02:11:16 +00:00
bde
8bf123bea8 Remove unused include(s).
Approved by:	re (kensmith) (blanket)
2007-08-07 02:08:06 +00:00
bde
cc09c09c22 Include <sys/mutex.h> and its prerequisite <sys/lock.h> instead of
depending on namespace pollution in <sys/buf.h> and/or <sys/vnode.h>

Approved by:	re (kensmith) (blanket)
2007-08-07 01:40:27 +00:00
bde
c7403014c7 Include <sys/mutex.h>'s prerequisite <sys/lock.h> instead of depending on
namespace pollution in <sys/vnode.h>.

Sort the include of <sys/mutex.h> instead of unsorting it after
<sys/vnode.h> and depending on the pollution there.

Approved by:	re (kensmith) (blanket)
2007-08-07 01:37:59 +00:00
bde
602091cba8 Remove unused include(s).
Approved by:	re (kensmith) (blanket)
2007-08-07 01:07:16 +00:00
bde
2e613b8127 Silently fix up the estimated next free cluster number from the fsinfo
sector, instead of failing the whole mount if it is garbage.  Fields
in the fsinfo sector are only advisory, so there are better sanity
checks than this, and we already silently fix up the only other advisory
field in the fsinfo (the free cluster count).

This wasn't handled quite right in rev.1.92, 1.117, or in NetBSD.  1.92
also failed the whole mount for the non-garbage magic value 0xffffffff
1.117 fixed this well enough in practice since garbage values shouldn't
occur in practice, but left the error handling larger and more convoluted
than necessary.  Now we handle the magic value as a special case of
fixing up all out of bounds values.

Also fix up the estimated next free cluster number when there is no
fsinfo sector.  We were using 0, but CLUST_FIRST is safer.

Approved by:	re (kensmith)
2007-08-05 12:58:34 +00:00
bde
e8377e738c Oops, fix the fix for the i/o size of the fsinfo block. Its log
message explained why the size is 1 sector, but the code used a
size of 1 cluster.

I/o sizes larger than necessary may cause serious coherency problems
in the buffer cache.  Here I think there were only minor efficiency
problems, since a too-large fsinfo buffer could only get far enough
to overlap buffers for the same vnode (the device vnode), so mappings
are coherent at the page level although not at the buffer level, and
the former is probably enough due to our limited use of the fsinfo
buffer.

Approved by:	re (kensmith)
2007-08-03 23:13:50 +00:00
delphij
8f39689f22 MFp4 - Refine locking to eliminate some potential race/panics:
- Copy before testing a pointer.  This closes a race window.
 - Use msleep with the node interlock instead of tsleep.
 - Do proper locking around access to tn_vpstate.
 - Assert vnode VOP lock for dir_{atta,de}tach to capture
   inconsistent locking.

Suggested by:	kib
Submitted by:	delphij
Reviewed by:	Howard Su
Approved by:	re (tmpfs blanket)
2007-08-03 06:24:31 +00:00
pjd
fe74e944d1 When we do open, we should lock the vnode exclusively. This fixes few races:
- fifo race, where two threads assign v_fifoinfo,
- v_writecount modifications,
- v_object modifications,
- and probably more...

Discussed with:	kib, ups
Approved by:	re (rwatson)
2007-07-26 16:58:09 +00:00
delphij
41fd4384d1 MFp4: Force 64-bit arithmatic when caculating the maximum file size.
This fixes tmpfs caculations on 32-bit systems equipped with more than
4GB swap.

Reported by:	Craig Boston <craig xfoil gank org>
PR:		kern/114870
Approved by:	re (tmpfs blanket)
2007-07-24 17:14:53 +00:00
bde
3a5cb59d83 Make using msdosfs as the root file system sort of work:
o Initialize ownerships and permissions.  They were garbage (0) for
  root mounts since vfs_mountroot_try() doesn't ask for them to be set
  and msdosfs's old incomplete code to set them was removed.  The
  garbage happened to give the correct ownerships root:wheel, but it
  gave permissions 000 so init could not be execed.  Use the macros
  for root: wheel and 0755.  (The removed code gave 0:0 and 0777.  0755
  is more normal and secure, thought wrong for /tmp.)

o Check the readonly flag for initial (non-MNT_UPDATE) mounts in the
  correct place, as in ffs.  For root mounts, it is only passed in
  mp->mnt_flags, since vfs_mountroot_try() only passes it as a flag
  and nothing translates the flag to the "ro" option string.  msdosfs
  only looked for it in the string, so it gave a rw mount for root
  mounts without even clearing the flag in mp->mnt_flags, so the final
  state was inconsistent.  Checking the flag only in mp->mnt_flags
  works for initial userland mounts too.  The MNT_UPDATE case is
  messier.

The main point that should work but doesn't is fsck of msdosfs root
while it is mounted ro.  This needs mainly MNT_RELOAD support to work.
It should be possible to run fsck -p and succeed provided the fs is
consistent, not just for msdosfs, but this fails because fsck -p always
tries to open the device rw.  The hack that allows open for writing
in ffs is not implemented in msdosfs, since without MNT_RELOAD support
writing could only be harmful.  So fsck must be turned off to use
msdosfs as root.  This is quite dangerous, since msdosfs is still missing
actually using its fs-dirty flag internally, so it is happy to mount
dirty fileystems rw.

Unrelated changes:
- Fix missing error handling for MNT_UPDATE from rw to ro.
- Catch up with renaming msdos to msdosfs in a string.

Approved by:	re (kensmith)
2007-07-23 07:10:17 +00:00
delphij
0321a712a7 MFp4: When swapping is not enabled, allow creating files by taking
physical memory pages into account for tm_maxfilesize.

Reported by:	Dominique Goncalves <dominique.goncalves gmail.com>
Submitted by:	Howard Su
Approved by:	re (tmpfs blanket)
2007-07-23 06:54:58 +00:00
bde
8fdd1a79d0 Implement vfs clustering for msdosfs.
This gives a very large speedup for small block sizes (in my tests,
about 5 times for write and 3 times for read with a block size of 512,
if clustering is possible) and a moderate speedup for the moderatatly
large block sizes that should be used on non-small media (4K is the
best size in most cases, and the speedup for that is about 1.3 times
for write and 1.2 times for read).  mmap() should benefit from clustering
like read()/write(), but the current implementation of vm only supports
clustering (at least for getpages) if the fs block size is >= PAGE SIZE.

msdosfs is now only slightly slower than ffs with soft updates for
writing and slightly faster for reading when both use their best block
sizes.  Writing is slower for msdosfs because of more sync writes.
Reading is faster for msdosfs because indirect blocks interfere with
clustering in ffs.

The changes in msdosfs_read() and msdosfs_write() are simpler merges
of corresponding code in ffs (after fixing some style bugs in ffs).
msdosfs_bmap() needs fs-specific code.  This implementation loops
calling a lower level bmap function to do the hard parts.  This is a
bit inefficient, but is efficient enough since msdsfs_bmap() is only
called when there is physical i/o to do.

Approved by:	re (hrs)
2007-07-20 17:06:57 +00:00
bde
b2bdcce9e1 Clean up before implementing vfs clustering for msdosfs:
In msdosfs_read(), mainly reorder the main loop to the same order as in
ffs_read().

In msdosfs_write() and extendfile(), use vfs_bio_clrbuf() instead of
clrbuf().  I think this just just a bogus optimization, but ffs always
does it and msdosfs already did it in one place, and it is what I've
tested.

In msdosfs_write(), merge good bits from a comment in ffs_write(), and
fix 1 style bug.

In the main comment for msdosfs_pcbmap(), improve wording and catch
up with 13 years of changes in the function.  This comment belongs in
VOP_BMAP.9 but that doesn't exist.

In msdosfs_bmap(), return EFBIG if the requested cluster number is out
of bounds instead of blindly truncating it, and fix many style bugs.

Approved by:	re (hrs)
2007-07-20 16:21:47 +00:00
rwatson
f1da927af0 Make sure we release the control vnode in Coda:
We allocate coda_ctlvp when /coda is mounted, but never release it.
During the unmount this vnode was marked as UNMOUNTING and when venus
is started a second time the system would hang, possibly waiting for
the old vnode to disappear.

So now we call vrele on the control vnode when file system is unmounted
to drop the reference we got during the mount. I'm pretty sure it is
also necessary to not skip the handling in coda_inactive for the control
vnode, it seems like that is the place we actually get rid of the vnode
once the refcount has dropped to 0.

Submitted by:	Jan Harkes <jaharkes at cs dot cmu dot edu>
Approved by:	re (kensmith)
2007-07-20 11:14:51 +00:00
delphij
d6ea8c65f3 MFp4: Rework on tmpfs's mapped read/write procedures. This
should finally fix fsx test case.

The printf's added here would be eventually turned into
assertions.

Submitted by:	Mingyan Guo (mostly)
Approved by:	re (tmpfs blanket)
2007-07-19 03:34:50 +00:00
rwatson
25869d84fa Complete repo-copy and move of Coda from src/sys/coda to src/sys/fs/coda
by removing files from src/sys/coda, and updating include paths in the
new location, kernel configuration, and  Makefiles.  In one case add
$FreeBSD$.

Discussed with:		anderson, Jan Harkes <jaharkes@cs.cmu.edu>
Approved by:		re (kensmith)
Repo-copy madness:	simon
2007-07-12 21:04:58 +00:00
rwatson
fd977ace64 Forced commit to recognize repo-copy of Coda files from src/sys/coda to
src/sys/fs/coda.

Discussed with:         anderson, Jan Harkes <jaharkes@cs.cmu.edu>
Approved by:            re (kensmith)
Repo-copy madness:      simon
2007-07-12 20:40:38 +00:00
bde
dba17f21d3 Round up the FAT block size to a multiple of the sector size so that i/o
to the FAT is possible.

Make the FAT block size less arbitrary before it is rounded up:
- for FAT12, default to 3*512 instead of to 3 sectors.  The magic 3 is
  the default number of 512-byte FAT sectors on a floppy drive.  That
  many sectors is too many if the sector size is larger.
- for !FAT12, default to PAGE_SIZE instead of to 4096.  Remove
  MSDOSFS_DFLTBSIZE since it only obfuscated this 4096.

For reading the BPB, use a block size of 8192 instead of 2048 so that
sector sizes up to 8192 can work.  We should try several sizes, or just
try the maximum supported size (MAXBSIZE = 64K).  I use 8192 because
that is enough for DVD-RW's (even 2048 is enough) and 8192 has been
tested a lot in use by ffs.

This completes fixing msdosfs for some large sector sizes (up to 8K
for read and 64K for write).  Microsoft documents support for sector
sizes up to 4K in mdosfs.  ffs is currently limited to 8K for both
read and write.

Approved by:	re (kensmith)
Approved by:	nyan (several years ago)
2007-07-12 17:17:47 +00:00
bde
01c1ec9b1a Fix some bugs involving the fsinfo block (many remain unfixed). This is
part of fixing msdosfs for large sector sizes.  One of the fixed bugs
was fatal for large sector sizes.

1. The fsinfo block has size 512, but it was misunderstood and declared
   as having size 1024, with nothing in the second 512 bytes except a
   signature at the end.  The second 512 bytes actually normally (if
   the file system was created by Windows) consist of a second boot
   sector which is normally (in WinXP) empty except for a signature --
   the normal layout is one boot sector, one fsinfo sector, another
   boot sector, then these 3 sectors duplicated.  However, other
   layouts are valid.  newfs_msdos produces a valid layout with one
   boot sector, one fsinfo sector, then these 2 sectors duplicated.
   The signature check for the extra part of the fsinfo was thus
   normally checking the signature in either the second boot sector
   or the first boot sector in the copy, and thus accidentally
   succeeding.  The extra signature check would just fail for weirder
   layouts with 512-byte sectors, and for normal layouts with any other
   sector size.

   Remove the extra bytes and the extra signature check.

2. Old versions did i/o to the fsinfo block using size 1024, with the
   second half only used for the extra signature check on read.  This
   was harmless for sector size 512, and worked accidentally for sector
   size 1024.  The i/o just failed for larger sector sizes.

   The version being fixed did i/o to the fsinfo block using size
   fsi_size(pmp) = (1024 << ((pmp)->pm_BlkPerSec >> 2)).  This
   expression makes no sense.  It happens to work for sector small
   sector sizes, but for sector size 32K it gives the preposterous
   value of 64M and thus causes panics.  A sector size of 32768 is
   necessary for at least some DVD-RW's (where the minimum write size
   is 32768 although the minimum read size is 2048).

   Now that the size of the fsinfo block is 512, it always fits in
   one sector so there is no need for a macro to express it.  Just
   use the sector size where the old code uses 1024.

Approved by:	re (kensmith)
Approved by:	nyan (several years ago for a different version of (2))
2007-07-12 16:09:07 +00:00
rwatson
e9fe8191d7 Fix ioctls on the control vnode: ioctls on a character device fail with
ENOTTY.  Make the control vnode a regular file so that ioctls are passed
through to our kernel module.

Submitted by:	Jan Harkes <jaharkes@cs.cmu.edu>
Approved by:	re (kensmith)
2007-07-11 21:34:41 +00:00
rwatson
1c2785e3fe Avoid a panic in insmntque when we pass a NULL mount: this reenables
some previously disabled code which according to the comment caused a
problem during shutdown.  But even that is still better than
triggering a kernel panic whenever venus is started.

Submitted by:	Jan Harkes <jaharkes@cs.cmu.edu>
Approved by:	re (kensmith)
2007-07-11 21:33:46 +00:00
rwatson
4f942d92aa Replace CODA_OPEN with CODA_OPEN_BY_FD: coda_open was disabled because
we can't open container files by device/inode number pair anymore.
Replace the CODA_OPEN upcall with CODA_OPEN_BY_FD, where venus returns
an open file descriptor for the container file.  We can then grab a
reference on the vnode coda_psdev.c:vc_nb_write and use this vnode for
further accesses to the container file.

Submitted by:	Jan Harkes <jaharkes@cs.cmu.edu>
Approved by:	re (kensmith)
2007-07-11 21:32:08 +00:00
rwatson
f30a555ada Resolve Coda mount failing because Coda failed to match the device
operations.  But we don't have to, if we find the coda_mntinfo structure
for this device in our linked list, we know the device is good.

Submitted by:	Jan Harkes <jaharkes@cs.cmu.edu>
Approved by:	re (kensmith)
2007-07-11 21:21:55 +00:00
rwatson
3287a2c53f Avoid crash when opening Coda device: when allocating coda_mntinfo, we
need to initialize dev so that we can actually find the allocated
coda_mntinfo structure later on.

Submitted by:	Jan Harkes <jaharkes@cs.cmu.edu>
Approved by:	re (kensmith)
2007-07-11 20:39:53 +00:00
delphij
ff9bf4b792 MFp4: Make use of the kernel unit number allocation facility
for tmpfs nodes.

Submitted by:	Mingyan Guo <guomingyan gmail com>
Approved by:	re (tmpfs blanket)
2007-07-11 14:26:27 +00:00
bde
fb1dc96e72 Don't use almost perfectly pessimal cluster allocation. Allocation
of the the first cluster in a file (and, if the allocation cannot be
continued contiguously, for subsequent clusters in a file) was randomized
in an attempt to leave space for contiguous allocation of subsequent
clusters in each file when there are multiple writers.  This reduced
internal fragmentation by a few percent, but it increased external
fragmentation by up to a few thousand percent.

Use simple sequential allocation instead.  Actually maintain the fsinfo
sequence index for this.  The read and write of this index from/to
disk still have many non-critical bugs, but we now write an index that
has something to do with our allocations instead of being modified
garbage.  If there is no fsinfo on the disk, then we maintain the index
internally and don't go near the bugs for writing it.

Allocating the first free cluster gives a layout that is almost as good
(better in some cases), but takes too much CPU if the FAT is large and
the first free cluster is not near the beginning.

The effect of this change for untar and tar of a slightly reduced copy
of /usr/src on a new file system was:

Before (msdosfs 4K-clusters):
untar:  459.57 real              untar from cached file (actually a pipe)
tar:    342.50 real              tar from uncached tree to /dev/zero
Before (ffs2 soft updates 4K-blocks 4K-frags)
untar:   39.18 real
tar:     29.94 real
Before (ffs2 soft updates 16K-blocks 2K-frags)
untar:   31.35 real
tar:     18.30 real

After (msdosfs 4K-clusters):
untar    54.83 real
tar      16.18 real

All of these times can be improved further.

With multiple concurrent writers or readers (especially readers), the
improvement is smaller, but I couldn't find any case where it is
negative.  342 seconds for tarring up about 342 MB on a ~47MB/S partition
is just hard to unimprove on.  (This operation would take about 7.3
seconds with reasonably localized allocation and perfect read-ahead.)
However, for active file systems, 342 seconds is closer to normal than
the 16+ seconds above or the 11 seconds with other changes (best I've
measured -- won easily by msdosfs!).  E.g., my active /usr/src on ffs1
is quite old and fragmented, so reading to prepare for the above
benchmark takes about 6 times longer than reading back the fresh copies
of it.

Approved by:	re (kensmith)
2007-07-10 13:20:24 +00:00
delphij
b4ff595242 MFp4:
- Plug memory leak.
 - Respect underlying vnode's properties rather than assuming that
   the user want root:wheel + 0755.  Useful for using tmpfs(5) for
   /tmp.
 - Use roundup2 and howmany macros instead of rolling our own version.
 - Try to fix fsx -W -R foo case.
 - Instead of blindly zeroing a page, determine whether we need a pagein
   order to prevent data corruption.
 - Fix several bugs reported by Coverity.

Submitted by:	Mingyan Guo <guomingyan gmail com>, Howard Su, delphij
Coverity ID:	CID 2550, 2551, 2552, 2557
Approved by:	re (tmpfs blanket)
2007-07-08 15:56:12 +00:00
kib
0ae42a4095 Since rev. 1.199 of sys/kern/kern_conf.c, the thread that calls
destroy_dev() from d_close() cdev method would self-deadlock.
devfs_close() bump device thread reference counter, and destroy_dev()
sleeps, waiting for si_threadcount to reach zero for cdev without
d_purge method.

destroy_dev_sched() could be used instead from d_close(), to
schedule execution of destroy_dev() in another context. The
destroy_dev_sched_drain() function can be used to drain the scheduled
calls to destroy_dev_sched(). Similarly, drain_dev_clone_events() drains
the events clone to make sure no lingering devices are left after
dev_clone event handler deregistered.

make_dev_credf(MAKEDEV_REF) function should be used from dev_clone
event handlers instead of make_dev()/make_dev_cred() to ensure that created
device has reference counter bumped before cdev mutex is dropped inside
make_dev().

Reviewed by:	tegge (early versions), njl (programming interface)
Debugging help and testing by:	Peter Holm
Approved by:	re (kensmith)
2007-07-03 17:42:37 +00:00
delphij
f5523801a3 MFp4:
- Remove unnecessary NULL checks after M_WAITOK allocations.
 - Use VOP_ACCESS instead of hand-rolled suser_cred()
   calls. [1]
 - Use malloc(9) KPI to allocate memory for string.  The
   optimization taken from NetBSD is not valid for FreeBSD
   because our malloc(9) already act that way. [2]

Requested by:	rwatson [1]
Submitted by:	Howard Su [2]
Approved by:	re (tmpfs blanket)
2007-06-29 05:23:15 +00:00
delphij
66a8b537d6 Space/style cleanups after last set of commits.
Approved by:	re (tmpfs blanket)
2007-06-28 02:39:31 +00:00
delphij
17bb881f75 Staticify most of fifo/vn operations, they should not
be directly exposed outside.

Approved by:	re (tmpfs blanket)
2007-06-28 02:36:41 +00:00
delphij
ee97250932 Use vfs_timestamp instead of nanotime when obtaining
a timestamp for use with timekeeping.

Approved by:	re (tmpfs blanket)
2007-06-28 02:34:32 +00:00
delphij
eedf8bf53d Reorder tf_gen and tf_id in struct tmpfs_fid. This
saves 8 bytes on amd64 architecture.

Obtained from:	NetBSD
Approved by:	re (tmpfs blanket)
2007-06-28 02:32:44 +00:00
delphij
a404044c90 Remove two function prototypes that are no longer used.
Approved by:	re (tmpfs blanket)
2007-06-26 02:08:29 +00:00
delphij
73fffc81bc - Sync with NetBSD's RCSID (HEAD preferred).
- Correct a typo.

Approved by:	re (tmpfs blanket)
2007-06-26 02:07:08 +00:00
delphij
a78e2646a2 MFp4: Several clean-ups and improvements over tmpfs:
- Remove tmpfs_zone_xxx KPI, the uma(9) wrapper, since
   they does not bring any value now.
 - Use |= instead of = when applying VV_ROOT flag.
 - Remove tm_avariable_nodes list.  Use uma to hold the
   released nodes.
 - init/destory interlock mutex of node when init/fini
   instead of ctor/dtor.
 - Change memory computing using u_int to fix negative
   value in 2G mem machine.
 - Remove unnecessary bzero's
 - Rely uma logic to make file id allocation harder to
   guess.
 - Fix some unsigned/signed related things.  Make sure
   we respect -o size=xxxx
 - Use wire instead of hold a page.
 - Pass allocate_zero to obtain zeroed pages upon first
   use.

Submitted by:	Howard Su
Approved by:	re (tmpfs blanket, kensmith)
2007-06-25 18:46:13 +00:00