Commit Graph

503 Commits

Author SHA1 Message Date
Konstantin Belousov
ef6a2be307 Assert that the msdosfs vnode is (e)locked in several places.
The plan is to use vnode lock to protect denode and fat cache,
and having separate lock for block use map.

Change the check and return on impossible condition into KASSERT().

Tested by:	pho
MFC after:	3 weeks
2010-02-28 17:07:49 +00:00
Konstantin Belousov
35fcc0662b Remove unused global statistic about fat cache usage.
Tested by:	pho
MFC after:	3 weeks
2010-02-28 17:06:42 +00:00
Ulrich Spörlein
8fa03d08ca Fix common misspelling of hierarchy
Pointed out by:		bf1783 at gmail
Approved by:		np (cxgb), kientzle (tar, etc.), philip (mentor)
2010-02-20 10:19:19 +00:00
Konstantin Belousov
3c8b687fe1 Invalid filesystem might cause the bp to be never read.
Noted by:	Pedro F. Giffuni <giffunip tutopia com>
Obtanined from:	NetBSD
MFC after:	1 week
2010-02-14 12:10:49 +00:00
Konstantin Belousov
699d124f23 Fix function name in the comment in the second location too.
Submitted by:	ed
MFC after:	1 week
2010-02-13 12:50:09 +00:00
Konstantin Belousov
48d1bcf8e0 - Add idempotency guards so the structures can be used in other utilities.
- Update bpb structs with reserved fields.
- In direntry struct join deName with deExtension. Although a
  fix was attempted in the past, these fields were being overflowed,
  Now this is consistent with the spec, and we can now share the
  WinChksum code with NetBSD.

Submitted by:	Pedro F. Giffuni <giffunip tutopia com>
Mostly obtained from:	NetBSD
Reviewed by:	bde
MFC after:	2 weeks
2010-02-13 12:41:07 +00:00
Konstantin Belousov
8b36e81367 Use M_ZERO instead of calling bzero().
Fix function name in the comment.

MFC after:	1 week
2010-02-13 12:11:03 +00:00
Konstantin Belousov
4f160a1c59 Remove unused macros.
MFC after:	1 week
2010-02-13 11:34:25 +00:00
Poul-Henning Kamp
6778431478 Revert previous commit and add myself to the list of people who should
know better than to commit with a cat in the area.
2009-09-08 13:19:05 +00:00
Poul-Henning Kamp
b34421bf9c Add necessary include. 2009-09-08 13:16:55 +00:00
Konstantin Belousov
d6da640860 Fix r193923 by noting that type of a_fp is struct file *, not int.
It was assumed that r193923 was trivial change that cannot be done
wrong.

MFC after:	2 weeks
2009-06-10 14:24:31 +00:00
Konstantin Belousov
e4d9bdc105 s/a_fdidx/a_fp/ for VOP_OPEN comments that inline struct vop_open_args
definition.

Discussed with:	bde
MFC after:	2 weeks
2009-06-10 14:09:05 +00:00
Attilio Rao
dfd233edd5 Remove the thread argument from the FSD (File-System Dependent) parts of
the VFS.  Now all the VFS_* functions and relating parts don't want the
context as long as it always refers to curthread.

In some points, in particular when dealing with VOPs and functions living
in the same namespace (eg. vflush) which still need to be converted,
pass curthread explicitly in order to retain the old behaviour.
Such loose ends will be fixed ASAP.

While here fix a bug: now, UFS_EXTATTR can be compiled alone without the
UFS_EXTATTR_AUTOSTART option.

VFS KPI is heavilly changed by this commit so thirdy parts modules needs
to be recompiled.  Bump __FreeBSD_version in order to signal such
situation.
2009-05-11 15:33:26 +00:00
John Baldwin
c72ae1423b - Hold a reference on the cdev a filesystem is mounted from in the mount.
- Remove the cdev pointers from the denode and instead use the mountpoint's
  reference to call dev2udev() in getattr().

Reviewed by:	kib, julian
2009-02-27 20:00:15 +00:00
Edward Tomasz Napierala
4f560d7595 Right now, when trying to unmount a device that's already gone,
msdosfs_unmount() and ffs_unmount() exit early after getting ENXIO.
However, dounmount() treats ENXIO as a success and proceeds with
unmounting.  In effect, the filesystem gets unmounted without closing
GEOM provider etc.

Reviewed by:	kib
Approved by:	rwatson (mentor)
Tested by:	dho
Sponsored by:	FreeBSD Foundation
2009-02-23 21:09:28 +00:00
Edward Tomasz Napierala
abb0cbf9c9 Turn a "panic: non-decreasing id" into an error printf. This seems
to be caused by a metadata corruption that occurs quite often after
unplugging a pendrive during write activity.

Reviewed by:	scottl
Approved by:	rwatson (mentor)
Sponsored by:	FreeBSD Foundation
2009-01-13 22:35:26 +00:00
Edward Tomasz Napierala
f99f675d5a Fix msdosfs_print(), which in turn fixes "show lockedvnods" for msdosfs
vnodes.

Reviewed by:	kib
Approved by:	rwatson (mentor)
Sponsored by:	FreeBSD Foundation
2009-01-11 17:11:01 +00:00
Edward Tomasz Napierala
0da50f6ef8 According to phk@, VOP_STRATEGY should never, _ever_, return
anything other than 0.  Make it so.  This fixes
"panic: VOP_STRATEGY failed bp=0xc320dd90 vp=0xc3b9f648",
encountered when writing to an orphaned filesystem.  Reason
for the panic was the following assert:
KASSERT(i == 0, ("VOP_STRATEGY failed bp=%p vp=%p", bp, bp->b_vp));
at vfs_bio:bufstrategy().

Reviewed by:	scottl, phk
Approved by:	rwatson (mentor)
Sponsored by:	FreeBSD Foundation
2008-12-16 21:13:11 +00:00
Edward Tomasz Napierala
15bc6b2bd8 Introduce accmode_t. This is required for NFSv4 ACLs - it will be neccessary
to add more V* constants, and the variables changed by this patch were often
being assigned to mode_t variables, which is 16 bit.

Approved by:	rwatson (mentor)
2008-10-28 13:44:11 +00:00
Dag-Erling Smørgrav
1ede983cc9 Retire the MALLOC and FREE macros. They are an abomination unto style(9).
MFC after:	3 months
2008-10-23 15:53:51 +00:00
Attilio Rao
0d7935fd01 Remove the struct thread unuseful argument from bufobj interface.
In particular following functions KPI results modified:
- bufobj_invalbuf()
- bufsync()

and BO_SYNC() "virtual method" of the buffer objects set.
Main consumers of bufobj functions are affected by this change too and,
in particular, functions which changed their KPI are:
- vinvalbuf()
- g_vfs_close()

Due to the KPI breakage, __FreeBSD_version will be bumped in a later
commit.

As a side note, please consider just temporary the 'curthread' argument
passing to VOP_SYNC() (in bufsync()) as it will be axed out ASAP

Reviewed by:	kib
Tested by:	Giovanni Trematerra <giovanni dot trematerra at gmail dot com>
2008-10-10 21:23:50 +00:00
Konstantin Belousov
4c5a20e3da Initialize va_rdev to NODEV instead of 0 or VNOVAL in VOP_GETATTR().
NODEV is more appropriate when va_rdev doesn't have a meaningful value.

Submitted by:   Jaakko Heinonen <jh saunalahti fi>
Suggested by:   bde
Discussed on:   freebsd-fs
MFC after:	1 month
2008-09-20 19:49:15 +00:00
Konstantin Belousov
67c7bbf39c In rev. 1.17 (r33548) of msdosfs_fat.c, relative cluster numbers were
replaced by file relative sector numbers as the buffer block number when
zero-padding a file during extension. Revert the change, it causes wrong
blocks filled with zeroes on seeking beyond end of file.

PR:	kern/47628
Submitted by:	tegge
MFC after:	3 days
2008-09-01 13:18:16 +00:00
Attilio Rao
0359a12ead Decontextualize the couplet VOP_GETATTR / VOP_SETATTR as the passed thread
was always curthread and totally unuseful.

Tested by: Giovanni Trematerra <giovanni dot trematerra at gmail dot com>
2008-08-28 15:23:18 +00:00
Konstantin Belousov
813d71de08 The uniqdosname() function takes char[12] as it third argument.
Found by:	-fstack-protector
Reported by:	dougb
Tested by:	dougb, Rainer Hurling <rhurlin gwdg de>
MFC after:	3 days
2008-07-04 09:40:52 +00:00
Konstantin Belousov
eab626f110 Move the head of byte-level advisory lock list from the
filesystem-specific vnode data to the struct vnode. Provide the
default implementation for the vop_advlock and vop_advlockasync.
Purge the locks on the vnode reclaim by using the lf_purgelocks().
The default implementation is augmented for the nfs and smbfs.
In the nfs_advlock, push the Giant inside the nfs_dolock.

Before the change, the vop_advlock and vop_advlockasync have taken the
unlocked vnode and dereferenced the fs-private inode data, racing with
with the vnode reclamation due to forced unmount. Now, the vop_getattr
under the shared vnode lock is used to obtain the inode size, and
later, in the lf_advlockasync, after locking the vnode interlock, the
VI_DOOMED flag is checked to prevent an operation on the doomed vnode.

The implementation of the lf_purgelocks() is submitted by dfr.

Reported by:	kris
Tested by:	kris, pho
Discussed with:	jeff, dfr
MFC after:	2 weeks
2008-04-16 11:33:32 +00:00
Konstantin Belousov
57b4252e45 Add the support for the AT_FDCWD and fd-relative name lookups to the
namei(9).

Based on the submission by rdivacky,
	sponsored by Google Summer of Code 2007
Reviewed by:	rwatson, rdivacky
Tested by:	pho
2008-03-31 12:01:21 +00:00
Doug Rabson
dfdcada31e Add the new kernel-mode NFS Lock Manager. To use it instead of the
user-mode lock manager, build a kernel with the NFSLOCKD option and
add '-k' to 'rpc_lockd_flags' in rc.conf.

Highlights include:

* Thread-safe kernel RPC client - many threads can use the same RPC
  client handle safely with replies being de-multiplexed at the socket
  upcall (typically driven directly by the NIC interrupt) and handed
  off to whichever thread matches the reply. For UDP sockets, many RPC
  clients can share the same socket. This allows the use of a single
  privileged UDP port number to talk to an arbitrary number of remote
  hosts.

* Single-threaded kernel RPC server. Adding support for multi-threaded
  server would be relatively straightforward and would follow
  approximately the Solaris KPI. A single thread should be sufficient
  for the NLM since it should rarely block in normal operation.

* Kernel mode NLM server supporting cancel requests and granted
  callbacks. I've tested the NLM server reasonably extensively - it
  passes both my own tests and the NFS Connectathon locking tests
  running on Solaris, Mac OS X and Ubuntu Linux.

* Userland NLM client supported. While the NLM server doesn't have
  support for the local NFS client's locking needs, it does have to
  field async replies and granted callbacks from remote NLMs that the
  local client has contacted. We relay these replies to the userland
  rpc.lockd over a local domain RPC socket.

* Robust deadlock detection for the local lock manager. In particular
  it will detect deadlocks caused by a lock request that covers more
  than one blocking request. As required by the NLM protocol, all
  deadlock detection happens synchronously - a user is guaranteed that
  if a lock request isn't rejected immediately, the lock will
  eventually be granted. The old system allowed for a 'deferred
  deadlock' condition where a blocked lock request could wake up and
  find that some other deadlock-causing lock owner had beaten them to
  the lock.

* Since both local and remote locks are managed by the same kernel
  locking code, local and remote processes can safely use file locks
  for mutual exclusion. Local processes have no fairness advantage
  compared to remote processes when contending to lock a region that
  has just been unlocked - the local lock manager enforces a strict
  first-come first-served model for both local and remote lockers.

Sponsored by:	Isilon Systems
PR:		95247 107555 115524 116679
MFC after:	2 weeks
2008-03-26 15:23:12 +00:00
Jeff Roberson
698b1a6643 - Complete part of the unfinished bufobj work by consistently using
BO_LOCK/UNLOCK/MTX when manipulating the bufobj.
 - Create a new lock in the bufobj to lock bufobj fields independently.
   This leaves the vnode interlock as an 'identity' lock while the bufobj
   is an io lock.  The bufobj lock is ordered before the vnode interlock
   and also before the mnt ilock.
 - Exploit this new lock order to simplify softdep_check_suspend().
 - A few sync related functions are marked with a new XXX to note that
   we may not properly interlock against a non-zero bv_cnt when
   attempting to sync all vnodes on a mountlist.  I do not believe this
   race is important.  If I'm wrong this will make these locations easier
   to find.

Reviewed by:	kib (earlier diff)
Tested by:	kris, pho (earlier diff)
2008-03-22 09:15:16 +00:00
Marcel Moolenaar
043ec583dc Don't check the bpbSecPerTrack and bpbHeads fields of the BPB.
They are typically 0 on new ia64 systems. Since we don't use
either field, there's no harm in not checking.
2008-02-21 03:19:46 +00:00
Attilio Rao
0e9eb108f0 Cleanup lockmgr interface and exported KPI:
- Remove the "thread" argument from the lockmgr() function as it is
  always curthread now
- Axe lockcount() function as it is no longer used
- Axe LOCKMGR_ASSERT() as it is bogus really and no currently used.
  Hopefully this will be soonly replaced by something suitable for it.
- Remove the prototype for dumplockinfo() as the function is no longer
  present

Addictionally:
- Introduce a KASSERT() in lockstatus() in order to let it accept only
  curthread or NULL as they should only be passed
- Do a little bit of style(9) cleanup on lockmgr.h

KPI results heavilly broken by this change, so manpages and
FreeBSD_version will be modified accordingly by further commits.

Tested by: matteo
2008-01-24 12:34:30 +00:00
Attilio Rao
22db15c06f VOP_LOCK1() (and so VOP_LOCK()) and VOP_UNLOCK() are only used in
conjuction with 'thread' argument passing which is always curthread.
Remove the unuseful extra-argument and pass explicitly curthread to lower
layer functions, when necessary.

KPI results broken by this change, which should affect several ports, so
version bumping and manpage update will be further committed.

Tested by: kris, pho, Diego Sardina <siarodx at gmail dot com>
2008-01-13 14:44:15 +00:00
Attilio Rao
cb05b60a89 vn_lock() is currently only used with the 'curthread' passed as argument.
Remove this argument and pass curthread directly to underlying
VOP_LOCK1() VFS method. This modify makes the code cleaner and in
particular remove an annoying dependence helping next lockmgr() cleanup.
KPI results, obviously, changed.

Manpage and FreeBSD_version will be updated through further commits.

As a side note, would be valuable to say that next commits will address
a similar cleanup about VFS methods, in particular vop_lock1 and
vop_unlock.

Tested by:	Diego Sardina <siarodx at gmail dot com>,
		Andrea Di Pasquale <whyx dot it at gmail dot com>
2008-01-10 01:10:58 +00:00
Maxim Konovalov
23c1e989a6 o English lesson from bde@: "iff" is not a typo, it means "if and only if".
Backout previous.
2007-11-18 09:21:30 +00:00
Maxim Konovalov
4adf89efc6 o Fix a typo in the comment. 2007-11-17 16:19:48 +00:00
Tom Rhodes
ededffc06b Remove some debugging code that, while useful, doesn't belong in the committed
version.  While here, expand a macro only used once.

Discussed with/oked by:	bde
2007-10-25 08:23:08 +00:00
Xin LI
3247c9ddcc Fixes to msdosfs dirtyflag related stuff:
- markvoldirty() needs to write to underlying GEOM provider.  We
   have to do that *before* g_access() which sets the GEOM provider
   to read-only.
 - Remove dirty flag before free'ing iconv related resources.  The
   dirty flag removal could fail, and it is hard to revert the
   iconv-free after the fail.
 - Mark volume as dirty if we have failed to mark it clean for safe.
 - Other style fixes to the touched functions.
2007-10-22 17:43:43 +00:00
Bruce Evans
cb65c1ee29 Implement the async (really, delayed-write) mount option for msdosfs.
This is much simpler than for ffs since there are many fewer places
where we need to choose between a delayed write and a sync write --
just 5 in msdosfs and more than 30 in ffs.

This is more complete and correct than in ffs.  Several places in ffs
are are still missing the choice.  ffs_update() has a layering violation
that breaks callers which want to force a sync update (mainly fsync(2)
and O_SYNC write(2)).

However, fsync(2) and O_SYNC write(2) are still more broken than in
ffs, since they are broken for default (non-sync non-async) mounts
too.  Both fail to sync the FAT in all cases, and both fail to sync
the directory entry in some cases after losing a race.  Async everything
is probably safer than the half-baked sync of metadata given by default
mounts.
2007-10-19 12:23:25 +00:00
Bruce Evans
9e916c3163 Add noclusterr and noclusterw options to the options list. I forgot these
when I implemented clustering.
2007-10-18 16:25:47 +00:00
Bruce Evans
7c3fc9de5c Fix some style bugs in the mount options list. Mainly, sort the list,
leaving space for adding missing options.  Negative options are sorted
after removing their "no" prefix, and generic options are sorted before
msdosfs-specific ones.
2007-10-18 15:48:10 +00:00
Bruce Evans
cefb55828f In msdosfs_settattr(), don't do synchronous updates of the denode
(except indirectly for the size pseudo-attribute).  If anything deserves
a sync update, then it is ids and immutable flags, since these are
related to security, but ffs never synced these and msdosfs doesn't
support them.  (ufs_setattr() only does an update in one case where
it is least needed (for timestamps); it did pessimal sync updates for
timestamps until 1998/03/08 but was changed for unlogged reasons related
to soft updates.)

Now msdosfs calls deupdat() with waitfor == 0, which normally gives a
delayed update to disk but always gives a sync update of timestamps
in core, while for ffs everything is delayed until the syncer daemon
or other activity causes an update (except for timestamps).

This gives a large optimization mainly for things like cp -p, where
attribute adjustment could easily triple the number of physical I/O's
if it is done synchronously (but cp -p to msdosfs is not as bad as
that, since msdosfs doesn't support many attributes so null adjustments
are more common, and msdosfs doesn't support ctimes so even if cp
doesn't weed out null adjustments they don't become non-null after
clobbering the ctime).
2007-10-18 07:26:21 +00:00
Alfred Perlstein
77465d9390 Get rid of qaddr_t.
Requested by: bde
2007-10-16 10:54:55 +00:00
Bruce Evans
ed316d339f Remove some of the pessimizations involving writing the fsi sector.
All active fields in fsi are advisory/optional, so we shouldn't do
extra work to make them valid at all times, but instead we write to
the fsi too often (we still do), and we searched for a free cluster
for fsinxtfree too often.

This commit just removes the whole search and its results, so that we
write out our in-core copy of fsinxtfree instead of writing a "fixed"
copy and clobbering our in-core copy.  This saves fixing 3 bugs:
- off-by-1 error for the end of the search, resulting in fsinxtfree
  not actually being adjusted iff only the last cluster is free.
- missing adjustment when no clusters are free.
- off-by-many error for the start of the search.  Starting the search
  at 0 instead of at (the in-core copy of) fsinxtfree did more than
  defeat the reasons for existence of fsinxtfree.  fsinxtfree exists
  mainly to avoid having to start at 0 for just the first search per
  mount, but has the side effect of reducing bias towards allocating
  near cluster 0.  The bias would normally only be generated by the
  first search per mount (if fsinxtfree is not supported), but since
  we also adjusted the in-core copy of fsinxtfree here, we were doing
  extra work to maximize the bias.

Approved by:	re (kensmith)
2007-09-23 14:49:32 +00:00
Bruce Evans
c2819440b3 Fix races in msdosfs_lookup() and msdosfs_readdir(). These functions
can easily block in bread(), and then there was nothing to prevent the
static buffer (nambuf_{ptr,len,last_id}) being clobbered by another
thread.

The effects of the bug seem to have been limited to failed lookups and
mangled names in readdir(), since Giant locking provides enough
serialization to prevent concurrent calls to the functions that access
the buffer.  They were very obvious for multiple concurrent tree walks,
especially with a small cluster size.

The bug was introduced in msdosfs_conv.c 1.34 and associated changes,
and is in all releases starting with 5.2.

The fix is to allocate the buffer as a local variable and pass around
pointers to it like "_r" functions in libc do.  Stack use from this
is large but not too large.  This also fixes a memory leak on module
unload.

Reviewed by:	kib
Approved by:	re (kensmith)
2007-08-31 22:29:55 +00:00
John Baldwin
1dc5b1cc56 On 6.x this works:
% mount | grep home
/dev/ad4s1e on /home (ufs, local, noatime, soft-updates)
% mount -u -o atime /home
% mount | grep home
/dev/ad4s1e on /home (ufs, local, soft-updates)

Restore this behavior for on 7.x for the following mount options:
noatime, noclusterr, noclusterw, noexec, nosuid, nosymfollow

In addition, on 7.x, the following are equivalent:
mount -u -o atime /home
mount -u -o nonoatime /home

Ideally, when we introduce new mount options, we should avoid
options starting with "no". :)

Requested by:	jhb
Reported by:	Karol Kwiat <karol.kwiat gmail com>, Scott Hetzel <swhetzel gmail com>
Approved by:	re (bmah)
Proxy commit for:	rodrigc
2007-08-15 17:40:09 +00:00
Bruce Evans
a4e6807c49 In msdosfs_read() and msdosfs_write(), don't check explicitly for
(uio_offset < 0) since this can't happen.  If this happens, then the
general code handles the problem safely (better than before for reading,
returning 0 (EOF) instead of the bogus errno EINVAL, and the same as
before for writing, returning EFBIG).

In msdosfs_read(), don't check for (uio_resid < 0).  msdosfs_write()
already didn't check.

In msdosfs_read(), document in a comment our assumptions that the caller
passed a valid uio_offset and uio_resid.  ffs checks using KASSERT(),
and that is enough sanity checking.  In the same comment, partly document
there is no need to check for the EOVERFLOW case, unlike in ffs where this
case can happen at least in theory.

In msdosfs_write(), add a comment about why the checking of
(uio_resid == 0) is explicit, unlike in ffs.

In msdosfs_write(), check for impossibly large final offsets before
checking if the file size rlimit would be exceeded, so that we don't
have an overflow bug in the rlimit check and are consistent with ffs.
We now return EFBIG instead of EFBIG plus a SIGXFSZ signal if the final
offset would be impossibly large but not so large as to cause overflow.
Overflow normally gave the benign behaviour of no signal.

Approved by:	re (kensmith) (blanket)
2007-08-07 10:35:27 +00:00
Bruce Evans
b7837a91c9 Fix and update the comments about the effect of the read-only flag on writing.
They are still too verbose.

Remove nearby unreachable code for handling symlinks.

Approved by:	re (kensmith) (blanket)
2007-08-07 05:42:10 +00:00
Bruce Evans
e3117f852e Fix some style bugs (don't assume that off_t == int64_t; fix some comments;
remove some parentheses; fix some whitespace errors; fix only one case of
a boolean comparison of a non-boolean).

Improve an error message by quoting ".", and by not printing large positive
values as negative ones.

Approved by:	re (kensmith) (blanket)
2007-08-07 03:59:49 +00:00
Bruce Evans
c0f5121cac Fix some style bugs (don't assume that off_t == int64_t; fix some comments;
remove some parentheses; fix only a couple of whtespace errors).

Approved by:	re (kensmith) (blanket)
2007-08-07 03:43:28 +00:00
Bruce Evans
2d7c6b2724 Fix some style bugs (mainly some whitespace errors).
Approved by:	re (kensmith) (blanket)
2007-08-07 03:38:36 +00:00
Bruce Evans
b6d0381e7e Fix some style bugs (some whitespace errors only).
Approved by:	re (kensmith) (blanket)
2007-08-07 03:22:10 +00:00
Bruce Evans
d2bb66bacd Sort includes.
Remove rotted banal comment attached to includes.

Approved by:	re (kensmith) (blanket)
2007-08-07 02:28:33 +00:00
Bruce Evans
6becd1c855 Sort includes.
Remove banal comments attached to includes.

Approved by:	re (kensmith) (blanket)
2007-08-07 02:27:35 +00:00
Bruce Evans
5696c6e0b2 Sort includes.
Remove banal comments before includes.  Remove rotted banal comments attached
to includes.

Approved by:	re (kensmith) (blanket)
2007-08-07 02:20:37 +00:00
Bruce Evans
9b0802c90b Remove unused include(s).
Remove banal comments before includes.

Approved by:	re (kensmith) (blanket)
2007-08-07 02:11:16 +00:00
Bruce Evans
a878a31c13 Remove unused include(s).
Approved by:	re (kensmith) (blanket)
2007-08-07 02:08:06 +00:00
Bruce Evans
eba34270fa Include <sys/mutex.h> and its prerequisite <sys/lock.h> instead of
depending on namespace pollution in <sys/buf.h> and/or <sys/vnode.h>

Approved by:	re (kensmith) (blanket)
2007-08-07 01:40:27 +00:00
Bruce Evans
1103771d95 Include <sys/mutex.h>'s prerequisite <sys/lock.h> instead of depending on
namespace pollution in <sys/vnode.h>.

Sort the include of <sys/mutex.h> instead of unsorting it after
<sys/vnode.h> and depending on the pollution there.

Approved by:	re (kensmith) (blanket)
2007-08-07 01:37:59 +00:00
Bruce Evans
6fd81fc7a6 Remove unused include(s).
Approved by:	re (kensmith) (blanket)
2007-08-07 01:07:16 +00:00
Bruce Evans
8d61a735c6 Silently fix up the estimated next free cluster number from the fsinfo
sector, instead of failing the whole mount if it is garbage.  Fields
in the fsinfo sector are only advisory, so there are better sanity
checks than this, and we already silently fix up the only other advisory
field in the fsinfo (the free cluster count).

This wasn't handled quite right in rev.1.92, 1.117, or in NetBSD.  1.92
also failed the whole mount for the non-garbage magic value 0xffffffff
1.117 fixed this well enough in practice since garbage values shouldn't
occur in practice, but left the error handling larger and more convoluted
than necessary.  Now we handle the magic value as a special case of
fixing up all out of bounds values.

Also fix up the estimated next free cluster number when there is no
fsinfo sector.  We were using 0, but CLUST_FIRST is safer.

Approved by:	re (kensmith)
2007-08-05 12:58:34 +00:00
Bruce Evans
3726942956 Oops, fix the fix for the i/o size of the fsinfo block. Its log
message explained why the size is 1 sector, but the code used a
size of 1 cluster.

I/o sizes larger than necessary may cause serious coherency problems
in the buffer cache.  Here I think there were only minor efficiency
problems, since a too-large fsinfo buffer could only get far enough
to overlap buffers for the same vnode (the device vnode), so mappings
are coherent at the page level although not at the buffer level, and
the former is probably enough due to our limited use of the fsinfo
buffer.

Approved by:	re (kensmith)
2007-08-03 23:13:50 +00:00
Bruce Evans
4eb3abf0a5 Make using msdosfs as the root file system sort of work:
o Initialize ownerships and permissions.  They were garbage (0) for
  root mounts since vfs_mountroot_try() doesn't ask for them to be set
  and msdosfs's old incomplete code to set them was removed.  The
  garbage happened to give the correct ownerships root:wheel, but it
  gave permissions 000 so init could not be execed.  Use the macros
  for root: wheel and 0755.  (The removed code gave 0:0 and 0777.  0755
  is more normal and secure, thought wrong for /tmp.)

o Check the readonly flag for initial (non-MNT_UPDATE) mounts in the
  correct place, as in ffs.  For root mounts, it is only passed in
  mp->mnt_flags, since vfs_mountroot_try() only passes it as a flag
  and nothing translates the flag to the "ro" option string.  msdosfs
  only looked for it in the string, so it gave a rw mount for root
  mounts without even clearing the flag in mp->mnt_flags, so the final
  state was inconsistent.  Checking the flag only in mp->mnt_flags
  works for initial userland mounts too.  The MNT_UPDATE case is
  messier.

The main point that should work but doesn't is fsck of msdosfs root
while it is mounted ro.  This needs mainly MNT_RELOAD support to work.
It should be possible to run fsck -p and succeed provided the fs is
consistent, not just for msdosfs, but this fails because fsck -p always
tries to open the device rw.  The hack that allows open for writing
in ffs is not implemented in msdosfs, since without MNT_RELOAD support
writing could only be harmful.  So fsck must be turned off to use
msdosfs as root.  This is quite dangerous, since msdosfs is still missing
actually using its fs-dirty flag internally, so it is happy to mount
dirty fileystems rw.

Unrelated changes:
- Fix missing error handling for MNT_UPDATE from rw to ro.
- Catch up with renaming msdos to msdosfs in a string.

Approved by:	re (kensmith)
2007-07-23 07:10:17 +00:00
Bruce Evans
6b6c5f5ef9 Implement vfs clustering for msdosfs.
This gives a very large speedup for small block sizes (in my tests,
about 5 times for write and 3 times for read with a block size of 512,
if clustering is possible) and a moderate speedup for the moderatatly
large block sizes that should be used on non-small media (4K is the
best size in most cases, and the speedup for that is about 1.3 times
for write and 1.2 times for read).  mmap() should benefit from clustering
like read()/write(), but the current implementation of vm only supports
clustering (at least for getpages) if the fs block size is >= PAGE SIZE.

msdosfs is now only slightly slower than ffs with soft updates for
writing and slightly faster for reading when both use their best block
sizes.  Writing is slower for msdosfs because of more sync writes.
Reading is faster for msdosfs because indirect blocks interfere with
clustering in ffs.

The changes in msdosfs_read() and msdosfs_write() are simpler merges
of corresponding code in ffs (after fixing some style bugs in ffs).
msdosfs_bmap() needs fs-specific code.  This implementation loops
calling a lower level bmap function to do the hard parts.  This is a
bit inefficient, but is efficient enough since msdsfs_bmap() is only
called when there is physical i/o to do.

Approved by:	re (hrs)
2007-07-20 17:06:57 +00:00
Bruce Evans
d34b0a1bac Clean up before implementing vfs clustering for msdosfs:
In msdosfs_read(), mainly reorder the main loop to the same order as in
ffs_read().

In msdosfs_write() and extendfile(), use vfs_bio_clrbuf() instead of
clrbuf().  I think this just just a bogus optimization, but ffs always
does it and msdosfs already did it in one place, and it is what I've
tested.

In msdosfs_write(), merge good bits from a comment in ffs_write(), and
fix 1 style bug.

In the main comment for msdosfs_pcbmap(), improve wording and catch
up with 13 years of changes in the function.  This comment belongs in
VOP_BMAP.9 but that doesn't exist.

In msdosfs_bmap(), return EFBIG if the requested cluster number is out
of bounds instead of blindly truncating it, and fix many style bugs.

Approved by:	re (hrs)
2007-07-20 16:21:47 +00:00
Bruce Evans
93fe42b62f Round up the FAT block size to a multiple of the sector size so that i/o
to the FAT is possible.

Make the FAT block size less arbitrary before it is rounded up:
- for FAT12, default to 3*512 instead of to 3 sectors.  The magic 3 is
  the default number of 512-byte FAT sectors on a floppy drive.  That
  many sectors is too many if the sector size is larger.
- for !FAT12, default to PAGE_SIZE instead of to 4096.  Remove
  MSDOSFS_DFLTBSIZE since it only obfuscated this 4096.

For reading the BPB, use a block size of 8192 instead of 2048 so that
sector sizes up to 8192 can work.  We should try several sizes, or just
try the maximum supported size (MAXBSIZE = 64K).  I use 8192 because
that is enough for DVD-RW's (even 2048 is enough) and 8192 has been
tested a lot in use by ffs.

This completes fixing msdosfs for some large sector sizes (up to 8K
for read and 64K for write).  Microsoft documents support for sector
sizes up to 4K in mdosfs.  ffs is currently limited to 8K for both
read and write.

Approved by:	re (kensmith)
Approved by:	nyan (several years ago)
2007-07-12 17:17:47 +00:00
Bruce Evans
fd7c4230b2 Fix some bugs involving the fsinfo block (many remain unfixed). This is
part of fixing msdosfs for large sector sizes.  One of the fixed bugs
was fatal for large sector sizes.

1. The fsinfo block has size 512, but it was misunderstood and declared
   as having size 1024, with nothing in the second 512 bytes except a
   signature at the end.  The second 512 bytes actually normally (if
   the file system was created by Windows) consist of a second boot
   sector which is normally (in WinXP) empty except for a signature --
   the normal layout is one boot sector, one fsinfo sector, another
   boot sector, then these 3 sectors duplicated.  However, other
   layouts are valid.  newfs_msdos produces a valid layout with one
   boot sector, one fsinfo sector, then these 2 sectors duplicated.
   The signature check for the extra part of the fsinfo was thus
   normally checking the signature in either the second boot sector
   or the first boot sector in the copy, and thus accidentally
   succeeding.  The extra signature check would just fail for weirder
   layouts with 512-byte sectors, and for normal layouts with any other
   sector size.

   Remove the extra bytes and the extra signature check.

2. Old versions did i/o to the fsinfo block using size 1024, with the
   second half only used for the extra signature check on read.  This
   was harmless for sector size 512, and worked accidentally for sector
   size 1024.  The i/o just failed for larger sector sizes.

   The version being fixed did i/o to the fsinfo block using size
   fsi_size(pmp) = (1024 << ((pmp)->pm_BlkPerSec >> 2)).  This
   expression makes no sense.  It happens to work for sector small
   sector sizes, but for sector size 32K it gives the preposterous
   value of 64M and thus causes panics.  A sector size of 32768 is
   necessary for at least some DVD-RW's (where the minimum write size
   is 32768 although the minimum read size is 2048).

   Now that the size of the fsinfo block is 512, it always fits in
   one sector so there is no need for a macro to express it.  Just
   use the sector size where the old code uses 1024.

Approved by:	re (kensmith)
Approved by:	nyan (several years ago for a different version of (2))
2007-07-12 16:09:07 +00:00
Bruce Evans
8e55bfaf4b Don't use almost perfectly pessimal cluster allocation. Allocation
of the the first cluster in a file (and, if the allocation cannot be
continued contiguously, for subsequent clusters in a file) was randomized
in an attempt to leave space for contiguous allocation of subsequent
clusters in each file when there are multiple writers.  This reduced
internal fragmentation by a few percent, but it increased external
fragmentation by up to a few thousand percent.

Use simple sequential allocation instead.  Actually maintain the fsinfo
sequence index for this.  The read and write of this index from/to
disk still have many non-critical bugs, but we now write an index that
has something to do with our allocations instead of being modified
garbage.  If there is no fsinfo on the disk, then we maintain the index
internally and don't go near the bugs for writing it.

Allocating the first free cluster gives a layout that is almost as good
(better in some cases), but takes too much CPU if the FAT is large and
the first free cluster is not near the beginning.

The effect of this change for untar and tar of a slightly reduced copy
of /usr/src on a new file system was:

Before (msdosfs 4K-clusters):
untar:  459.57 real              untar from cached file (actually a pipe)
tar:    342.50 real              tar from uncached tree to /dev/zero
Before (ffs2 soft updates 4K-blocks 4K-frags)
untar:   39.18 real
tar:     29.94 real
Before (ffs2 soft updates 16K-blocks 2K-frags)
untar:   31.35 real
tar:     18.30 real

After (msdosfs 4K-clusters):
untar    54.83 real
tar      16.18 real

All of these times can be improved further.

With multiple concurrent writers or readers (especially readers), the
improvement is smaller, but I couldn't find any case where it is
negative.  342 seconds for tarring up about 342 MB on a ~47MB/S partition
is just hard to unimprove on.  (This operation would take about 7.3
seconds with reasonably localized allocation and perfect read-ahead.)
However, for active file systems, 342 seconds is closer to normal than
the 16+ seconds above or the 11 seconds with other changes (best I've
measured -- won easily by msdosfs!).  E.g., my active /usr/src on ffs1
is quite old and fragmented, so reading to prepare for the above
benchmark takes about 6 times longer than reading back the fresh copies
of it.

Approved by:	re (kensmith)
2007-07-10 13:20:24 +00:00
Robert Watson
32f9753cfb Eliminate now-unused SUSER_ALLOWJAIL arguments to priv_check_cred(); in
some cases, move to priv_check() if it was an operation on a thread and
no other flags were present.

Eliminate caller-side jail exception checking (also now-unused); jail
privilege exception code now goes solely in kern_jail.c.

We can't yet eliminate suser() due to some cases in the KAME code where
a privilege check is performed and then used in many different deferred
paths.  Do, however, move those prototypes to priv.h.

Reviewed by:	csjp
Obtained from:	TrustedBSD Project
2007-06-12 00:12:01 +00:00
Tom Rhodes
1be5bc7459 Revert previous, part of NFS that I didn't know about. 2007-06-01 17:06:46 +00:00
Tom Rhodes
a33ebaecf6 Garbage collect msdosfs_fhtovp; it appears unused and I have been using
MSDOSFS without this function and problems for the last month.
2007-06-01 14:57:19 +00:00
Tor Egge
61b9d89ff0 Make insmntque() externally visibile and allow it to fail (e.g. during
late stages of unmount).  On failure, the vnode is recycled.

Add insmntque1(), to allow for file system specific cleanup when
recycling vnode on failure.

Change getnewvnode() to no longer call insmntque().  Previously,
embryonic vnodes were put onto the list of vnode belonging to a file
system, which is unsafe for a file system marked MPSAFE.

Change vfs_hash_insert() to no longer lock the vnode.  The caller now
has that responsibility.

Change most file systems to lock the vnode and call insmntque() or
insmntque1() after a new vnode has been sufficiently setup.  Handle
failed insmntque*() calls by propagating errors to callers, possibly
after some file system specific cleanup.

Approved by:	re (kensmith)
Reviewed by:	kib
In collaboration with:	kib
2007-03-13 01:50:27 +00:00
Pawel Jakub Dawidek
10bcafe9ab Move vnode-to-file-handle translation from vfs_vptofh to vop_vptofh method.
This way we may support multiple structures in v_data vnode field within
one file system without using black magic.

Vnode-to-file-handle should be VOP in the first place, but was made VFS
operation to keep interface as compatible as possible with SUN's VFS.
BTW. Now Solaris also implements vnode-to-file-handle as VOP operation.

VFS_VPTOFH() was left for API backward compatibility, but is marked for
removal before 8.0-RELEASE.

Approved by:	mckusick
Discussed with:	many (on IRC)
Tested with:	ufs, msdosfs, cd9660, nullfs and zfs
2007-02-15 22:08:35 +00:00
Craig Rodrigues
d6140aaa69 Add noatime to the list of mount options that msdosfs accepts.
PR:		108896
Submitted by:	Eugene Grosbein <eugen grosbein pp ru>
2007-02-08 02:30:55 +00:00
Craig Rodrigues
dc9a617afb Style fixes: use ANSI C function declarations. 2007-02-08 02:25:35 +00:00
Craig Rodrigues
8a4cab026b Eliminate some dead code which was introduced in 1.23, yet was always
commented out.
2007-02-06 03:30:58 +00:00
Tai-hwa Liang
61ad2e26ef Fixing compilation bustage by removing references to opt_msdosfs.h.
This auto-generated header file no longer exists since the removal of
MSDOSFS_LARGE in sys/conf/options:1.574.
2007-01-30 08:05:04 +00:00
Tom Rhodes
bade0e00f3 Fix spacing from my previous commit to this file:
Noticed by:	fjoe
2007-01-30 04:41:38 +00:00
Craig Rodrigues
f458f2a553 Add a "-o large" mount option for msdosfs. Convert compile-time checks for
#ifdef MSDOSFS_LARGE to run-time checks to see if "-o large" was specified.

Test case provided by Oliver Fromme:
  truncate -s 200G test.img
  mdconfig -a -t vnode -f test.img -u 9
  newfs_msdos -s 419430400 -n 1 /dev/md9 zip250
  mount -t msdosfs /dev/md9 /mnt    # should fail
  mount -t msdosfs -o large /dev/md9 /mnt   # should succeed

PR:		105964
Requested by:	Oliver Fromme <olli lurza secnetix de>
Tested by:	trhodes
MFC after:	2 weeks
2007-01-30 03:11:45 +00:00
Tom Rhodes
752945d6c0 Add a 3rd entry in the cache, which keeps the end position
from just before extending a file.  This has the desired effect
of keeping the write speed constant.  And yes, that helps a lot
copying large files always at full speed now, and I have seen
improvements using benchmarks/bonnie.

Stolen from:	NetBSD
Reviewed by:	bde
2007-01-16 23:43:14 +00:00
Craig Rodrigues
82c59ec651 When performing a mount update to change a mount from read-only to read-write,
do not call markvoldirty() until the mount has been flagged as read-write.
Due to the nature of the msdosfs code, this bug only seemed to appear for
FAT-16 and FAT-32.

This fixes the testcase:
#!/bin/sh
dd if=/dev/zero bs=1m count=1 oseek=119 of=image.msdos
mdconfig -a -t vnode -f image.msdos
newfs_msdos -F 16 /dev/md0 fd120m
mount_msdosfs -o ro /dev/md0 /mnt
mount | grep md0
mount -u -o rw /dev/md0; echo $?
mount | grep md0
umount /mnt
mdconfig -d -u 0

PR:		105412
Tested by:	Eugene Grosbein <eugen grosbein pp ru>
2007-01-06 20:46:02 +00:00
Craig Rodrigues
9170c87faa Eliminate obsolete comment, now that getushort() is implemented in
terms of functions in <sys/endian.h>.
2007-01-05 05:28:57 +00:00
Marcel Moolenaar
94632b9fe1 Unbreak 64-bit little-endian systems that do require alignment.
The fix involves using le16dec(), le32dec(), le16enc() and
le32enc(). This eliminates invalid casts and duplicated logic.
2006-12-21 05:40:46 +00:00
Craig Rodrigues
3244bb8a12 For big-endian version of getulong() macro, cast result to u_int32_t.
This macro was written expecting a 32-bit unsigned long, and
doesn't work properly on 64-bit systems.  This bug caused vn_stat()
to return incorrect values for files larger than 2gb on msdosfs filesystems
on 64-bit systems.

PR:		106703
Submitted by:	Axel Gonzalez <loox e-shell net>
MFC after:	3 days
2006-12-19 02:31:58 +00:00
Craig Rodrigues
d01e83878b Fix get_ulong() macro on AMD64 (or any little-endian 64-bit platform).
This bug caused vn_stat() to fail on files larger than 2gb on msdosfs
filesystems on AMD64.

PR:		106703
Tested by:	Axel Gonzalez <loox e-shell net>
MFC after:	3 days
2006-12-19 01:55:45 +00:00
Craig Rodrigues
e9022ef898 Minor cleanup. If we are doing a mount update, and we pass in
an "export" flag indicating that we are trying to NFS export the
filesystem, and the MSDOSFS_LARGEFS flag is set on the filesystem,
then deny the mount update and export request.  Otherwise,
let the full mount update proceed normally.
MSDOSFS_LARGES and NFS don't mix because of the way inodes are calculated
for MSDOSFS_LARGEFS.

MFC after:	3 days
2006-12-09 01:49:19 +00:00
Maxim Konovalov
1c5cf521ae o Do not leave uninitialized birthtime: in MSDOSFSMNT_LONGNAME
set birthtime to FAT CTime (creation time) and in the other cases
set birthtime to -1.

o Set ctime to mtime instead of FAT CTime which has completely
different meaning.

PR:		kern/106018
Submitted by:	Oliver Fromme
MFC after:	1 month
2006-12-03 19:04:26 +00:00
Maxim Konovalov
cc005bb92c o From the submitter: dos2unixchr will convert to lower case if
LCASE_BASE or LCASE_EXT or both are set.  But dos2unixfn uses
dos2unixchr separately for the basename and the extension.  So if
either LCASE_BASE or LCASE_EXT is set, dos2unixfn will convert both
the basename and extension to lowercase because it is blindly
passing in the state of both flags to dos2unixchr.  The bit masks I
used ensure that only the state of LCASE_BASE gets passed to
dos2unixchr when the basename is converted, and only the state of
LCASE_EXT is passed in when the extension is converted.

PR:		kern/86655
Submitted by:	Micah Lieske
MFC after:	3 weeks
2006-11-26 18:49:44 +00:00
Robert Watson
acd3428b7d Sweep kernel replacing suser(9) calls with priv(9) calls, assigning
specific privilege names to a broad range of privileges.  These may
require some future tweaking.

Sponsored by:           nCircle Network Security, Inc.
Obtained from:          TrustedBSD Project
Discussed on:           arch@
Reviewed (at least in part) by: mlaier, jmg, pjd, bde, ceri,
                        Alex Lyashkov <umka at sevcity dot net>,
                        Skip Ford <skip dot ford at verizon dot net>,
                        Antoine Brodin <antoine dot brodin at laposte dot net>
2006-11-06 13:42:10 +00:00
Poul-Henning Kamp
3c960d9379 Replace slightly crummy fattime<->timespec conversion functions. 2006-10-24 11:14:05 +00:00
Poul-Henning Kamp
e5037a18a9 Use utc_offset() where applicable, and hide the internals of it
as static variables.
2006-10-02 18:23:37 +00:00
Poul-Henning Kamp
f645b0b51c First part of a little cleanup in the calendar/timezone/RTC handling.
Move relevant variables to <sys/clock.h> and fix #includes as necessary.

Use libkern's much more time- & spamce-efficient BCD routines.
2006-10-02 12:59:59 +00:00
Tor Egge
5da56ddb21 Use mount interlock to protect all changes to mnt_flag and mnt_kern_flag.
This eliminates a race where MNT_UPDATE flag could be lost when nmount()
raced against sync(), sync_fsync() or quotactl().
2006-09-26 04:12:49 +00:00
Warner Losh
1a3c917f9d while (0); -> while (0) in multi-line macros 2006-08-17 22:50:33 +00:00
Xin LI
bcc4260f3b When the volume is being downgraded from a read-write mode, mark
it as clean.

PR:		kern/85366
Submitted by:	Dan Lukes <dan at obluda dot cz>
MFC After:	2 weeks
2006-08-03 03:55:52 +00:00
Craig Rodrigues
829b898c7c mount_msdosfs.c:
- remove call to getmntopts(), and just pass -o options to
    nmount().  This removes some confusion as to what options
    msdosfs can parse, by pushing the responsibility of option parsing
    to the VFS and FS specific code in the kernel.

msdosfs_vfsops.c:
  - add "force" and "sync" to msdosfs_opts.  They used to be specified
    in mount_msdosfs.c, so move them here.  It's not clear whethere these
    options should be placed into global_opts in vfs_mount.c or not.

Motivated by:	marcus
2006-06-01 02:25:00 +00:00
Craig Rodrigues
5eb304a91a Remove calls to vfs_export() for exporting a filesystem for NFS mounting
from individual filesystems.  Call it instead in vfs_mount.c,
after we call VFS_MOUNT() for a specific filesystem.
2006-05-26 00:32:21 +00:00
Jeff Roberson
89b0e10910 - Reorder calls to vrele() after calls to vput() when the vrele is a
directory.  vrele() may lock the passed vnode, which in these cases would
   give an invalid lock order of child -> parent.  These situations are
   deadlock prone although do not typically deadlock because the vrele
   is typically not releasing the last reference to the vnode.  Users of
   vrele must consider it as a call to vn_lock() and order it appropriately.

MFC After: 	1 week
Sponsored by:	Isilon Systems, Inc.
Tested by:	kkenn
2006-02-01 00:25:26 +00:00
Tom Rhodes
9fc31f8a5f Update incorrect comments here, there should not be a call to panic()
over fs corruption.

Discussed with:	alfred, phk
2006-01-23 17:45:57 +00:00
Max Khon
710a9accfe Do not assume that `char direntry::deExtension[3]' starts right after
`char direntry::deName[8]' and access deExtension[] explicitly.

Found by:	Coverity Prevent(tm)
CID:		350, 351, 352
2006-01-22 21:09:38 +00:00
Alfred Perlstein
92e73f5711 I ran into an nfs client panic a couple of times in a row over the
last few days.  I tracked it down to the fact that nfs_reclaim()
is setting vp->v_data to NULL _before_ calling vnode_destroy_object().
After silence from the mailing list I checked further and discovered
that ufs_reclaim() is unique among FreeBSD filesystems for calling
vnode_destroy_object() early, long before tossing v_data or much
of anything else, for that matter.  The rest, including NFS, appear
to be identical, as if they were just clones of one original routine.

The enclosed patch fixes all file systems in essentially the same
way, by moving the call to vnode_destroy_object() to early in the
routine (before the call to vfs_hash_remove(), if any).  I have
only tested NFS, but I've now run for over eighteen hours with the
patch where I wouldn't get past four or five without it.

Submitted by: Frank Mayhar
Requested by: Mohan Srinivasan
MFC After: 1 week
2006-01-17 17:29:03 +00:00