Commit Graph

1793 Commits

Author SHA1 Message Date
Mike Pritchard
6192525baf Fix a spelling error in some comments. heirarchy -> hierarchy.
Obtained from: OpenBSD
2007-01-16 19:35:43 +00:00
Robert Watson
8102a9d4d5 Canonicalize copyright: use a date range rather than comma-delimited
list.

MFC after:	3 days
2007-01-08 17:55:32 +00:00
Kip Macy
2f6a774be4 change vop_lock handling to allowing tracking of callers' file and line for
acquisition of lockmgr locks

Approved by: scottl (standing in for mentor rwatson)
2006-11-13 05:51:22 +00:00
Robert Watson
acd3428b7d Sweep kernel replacing suser(9) calls with priv(9) calls, assigning
specific privilege names to a broad range of privileges.  These may
require some future tweaking.

Sponsored by:           nCircle Network Security, Inc.
Obtained from:          TrustedBSD Project
Discussed on:           arch@
Reviewed (at least in part) by: mlaier, jmg, pjd, bde, ceri,
                        Alex Lyashkov <umka at sevcity dot net>,
                        Skip Ford <skip dot ford at verizon dot net>,
                        Antoine Brodin <antoine dot brodin at laposte dot net>
2006-11-06 13:42:10 +00:00
Konstantin Belousov
2276d0814f Aquire Giant in the softdep_flush for clear_remove() and clear_inodedeps()
processing when QUOTA is set.

Reported and tested by:	Peter Holm
Reviewed by:	tegge
MFC after:	3 days
2006-11-01 13:48:44 +00:00
Pawel Jakub Dawidek
1a60c7fc8e Add gjournal specific code to the UFS file system:
- Add FS_GJOURNAL flag which enables gjournal support on a file system.
- Add cg_unrefs field to the cylinder group structure which holds
  number of unreferenced (orphaned) inodes in the given cylinder group.
- Add fs_unrefs field to the super block structure which holds
  total number of unreferenced (orphaned) inodes.
- When file or a directory is orphaned (last reference is removed, but
  object is still open), increase fs_unrefs and cg_unrefs fields,
  which is a hint for fsck in which cylinder groups looks for such
  (orphaned) objects.
- When file is last closed, decrease {fs,cg}_unrefs fields.
- Add VV_DELETED vnode flag which points at orphaned objects.

Sponsored by:	home.pl
2006-10-31 21:48:54 +00:00
Robert Watson
aed5570872 Complete break-out of sys/sys/mac.h into sys/security/mac/mac_framework.h
begun with a repo-copy of mac.h to mac_framework.h.  sys/mac.h now
contains the userspace and user<->kernel API and definitions, with all
in-kernel interfaces moved to mac_framework.h, which is now included
across most of the kernel instead.

This change is the first step in a larger cleanup and sweep of MAC
Framework interfaces in the kernel, and will not be MFC'd.

Obtained from:	TrustedBSD Project
Sponsored by:	SPARTA
2006-10-22 11:52:19 +00:00
Konstantin Belousov
ec7a247a24 Do not translate the IN_ACCESS inode flag into the IN_MODIFIED while filesystem
is suspending/suspended. Doing so may result in deadlock. Instead, set the
(new) IN_LAZYACCESS flag, that becomes IN_MODIFIED when suspend is lifted.

Change the locking protocol in order to set the IN_ACCESS and timestamps
without upgrading shared vnode lock to exclusive (see comments in the
inode.h). Before that, inode was modified while holding only shared
lock.

Tested by:	Peter Holm
Reviewed by:	tegge, bde
Approved by:	pjd (mentor)
MFC after:	3 weeks
2006-10-10 09:20:54 +00:00
Tor Egge
ad4276811a Correct check for when IO_SYNC should be set for filesystem
not using softupdates when truncating a directory to zero length.

Discussed with:	bde
2006-10-02 02:08:31 +00:00
Tor Egge
8d0547c68b Protect change to bo_flag by holding the bufobj mutex. 2006-09-26 04:21:20 +00:00
Tor Egge
e60c361218 Reduce fluctuations of mnt_flag to allow unlocked readers to get a
slightly more consistent view.
2006-09-26 04:20:09 +00:00
Tor Egge
9b65c22cf4 Don't restore MNT_QUOTA bit in mnt_flag after snapshot creation,
closing a race between nmount() and quotactl().
2006-09-26 04:19:11 +00:00
Tor Egge
55b4ff0d9f Increase mnt_noasync once in softdep_mount() to disallow async io,
closing a window where a file system using softupdates could be async
for a short while if both MNT_UPDATE and MNT_ASYNC were passed as flags
to nmount().  Add MNTK_SOFTDEP flag to ensure that softdep_mount()
doesn't increase mnt_noasync multiple times.
2006-09-26 04:17:17 +00:00
Tor Egge
a1e363f256 Add mnt_noasync counter to better handle interleaved calls to nmount(),
sync() and sync_fsync() without losing MNT_ASYNC.  Add MNTK_ASYNC flag
which is set only when MNT_ASYNC is set and mnt_noasync is zero, and
check that flag instead of MNT_ASYNC before initiating async io.
2006-09-26 04:15:59 +00:00
Tor Egge
5da56ddb21 Use mount interlock to protect all changes to mnt_flag and mnt_kern_flag.
This eliminates a race where MNT_UPDATE flag could be lost when nmount()
raced against sync(), sync_fsync() or quotactl().
2006-09-26 04:12:49 +00:00
Konstantin Belousov
28de2218ec Fix the glitch introduced in rev. 1.93. In softdep_sync_metadata(),
switch by worklist type contains two for() loops, for D_INDIRDEP and
D_PAGEDEP. On error, these loops are exited by break, where the switch
actually shall be leaved. Use goto instead of break to reach the error
handling code.

Reported by:	Peter Holm
Reviewed by:	tegge
Approved by:	pjd (mentor)
MFC after:	2 weeks
2006-09-20 07:49:28 +00:00
Robert Watson
5702e0965e Declare security and security.bsd sysctl hierarchies in sysctl.h along
with other commonly used sysctl name spaces, rather than declaring them
all over the place.

MFC after:	1 month
Sponsored by:	nCircle Network Security, Inc.
2006-09-17 20:00:36 +00:00
Konstantin Belousov
3f65847e2f While checking for update of snapshot file in the ffs_copyonwrite,
first filter out metadata update. Otherwise, devfs vnode could be
erronously interpreted as ufs one, causing further check of i_flags
to use random memory.

PR:	kern/100365
Debugged and fix described by:	tegge
Approved by:	pjd (mentor)
MFC after:	2 weeks
2006-08-21 17:20:19 +00:00
Pawel Jakub Dawidek
f4cc92c97c Correct typo in comment. 2006-08-20 10:52:44 +00:00
David E. O'Brien
80cd95f9cc Rather than print out a nice error message giving details sufficent to fix
a 'ufs_dirbad' and then panicing (making it very hard to see the details),
put them in the panic message itself.
2006-07-31 15:44:13 +00:00
Stefan Farfeleder
2b8c9fa46b Drop two unnecessary casts. 2006-07-18 07:03:43 +00:00
Daichi GOTO
55e9893a66 The ufs_lookup.c has a critical bug around the whiteout
process. UFS must check a whiteout name when it uses the
whiteout, but the current implementation does not check
the whileout name, so sometimes UFS writes over a wrong
whtieout. UFS *MUST* check the whiteout name to use a
corrent whiteout. This bug leads unionfs. panic.
This commit fixes this trouble.

Submitted by:	Masanori Ozawa <ozawa@ongs.co.jp> (unionfs developer)
Reviewed by:	tegge & rodrigc (mentor)
Approved by:	rodrigc (mentor)
MFC after:	2 weeks
2006-07-11 17:27:04 +00:00
Pawel Jakub Dawidek
5fe6d2beb4 Declare UFS module version. 2006-07-09 14:11:09 +00:00
Pawel Jakub Dawidek
946478fca6 Change fs->fs_fsmnt to mp->mnt_stat.f_mntonname in warnings about missing
MAC and ACLs support in the kernel. If it is a first mount, fs->fs_fsmnt
is empty.

MFC after:	1 week
2006-07-09 14:10:35 +00:00
Craig Rodrigues
71ac2d7c7c Check the sectorsize of the underlying disk before trying to
bread() the UFS superblock.  Should eliminate crashes when trying
to do: mount -t ufs on an audio CD.

PR:		kern/85893
Reported by:	Russell Francis <rfrancis at ev dot net>
MFC after:	1 week
2006-06-03 21:20:37 +00:00
Maxim Konovalov
e680b88a3d o Rearrange and remove incorrect comments.
Requested by:	bde
2006-05-31 15:55:52 +00:00
Maxim Konovalov
3593da9e81 o According to POSIX, the result of ftruncate(2) is unspecified
for file types other than VREG, VDIR and shared memory objects.
We already handle VREG, VLNK and VDIR cases.  Silently ignore
truncate requests for all the rest.  Adjust comments.

PR:		kern/98064
Submitted by:	bde
Security:	local DoS
Regress. test:	regression/fifo/fifo_misc
MFC after:	2 weeks
2006-05-31 13:15:29 +00:00
Craig Rodrigues
ee98eb825b Remove "update" from ffs_opts. It has been moved to global_opts
in vfs_mount.c.
2006-05-26 12:44:12 +00:00
Craig Rodrigues
5eb304a91a Remove calls to vfs_export() for exporting a filesystem for NFS mounting
from individual filesystems.  Call it instead in vfs_mount.c,
after we call VFS_MOUNT() for a specific filesystem.
2006-05-26 00:32:21 +00:00
Craig Rodrigues
4ba8c2a5d3 Take errmsg out of ffs_opts. It is already part of global_opts
in vfs_mount.c.
2006-05-24 00:12:21 +00:00
Maxim Konovalov
5d1d31b4b3 o Fix a comment: ufs2_dinode.di_blocks counts blocks not bytes actually held. 2006-05-21 21:55:29 +00:00
Maxim Konovalov
b6893ab299 o Fix a comment: directory whiteout type is DT_WHT not DT_W. 2006-05-21 21:28:34 +00:00
Tom Rhodes
e45269bf51 Provide a less cryptic panic message in place of just "found inode." 2006-05-16 18:51:22 +00:00
Tor Egge
e0cf717542 Read block hints list from last snapshot on the active snapshot list. 2006-05-16 00:14:20 +00:00
Tor Egge
d93d98d98f Copy last block on file system again after file system has been suspended.
Obtained from:	NetBSD
2006-05-15 23:18:49 +00:00
Tor Egge
ae5d9f3b1c Don't leak a locked buffer if last block on file system cannot be read. 2006-05-15 22:59:23 +00:00
Tor Egge
ebb78f64c7 Errors detected while file system is suspended should not trigger an
assertion failure.
2006-05-15 22:52:22 +00:00
Tor Egge
b405cb5ea5 Expunge traces of unlinked snapshot files when making a new snapshot. 2006-05-13 20:41:37 +00:00
Tor Egge
4613aa0e99 Bring the call to softdep_releasefile() within the region protected by
vn_start_secondary_write() since it might cause file system write activity
(e.g. ffs_snapremove()).
2006-05-09 22:33:43 +00:00
Tor Egge
43e07fffb6 ffs_syncvnode() might skip some of the blocks due to them being locked,
assuming them to be inflight write buffers.  This is not always the case.
bufdaemon might hold the buffer lock and give up writing the buffer due to it
having dependencies, the file system being suspended or the vnode lock being
held by another thread.  When bufdaemon decides to write the buffer there is
still a window before bufobj_wref() has been called, allowing other threads to
believe that the vnode has no dirty buffers or inflight writes.

Try harder to flush first block of new subdirectory to get rid of MKDIR_BODY
dependency.
2006-05-06 20:51:31 +00:00
Tor Egge
b673e7b7eb Return error if vnode was reclaimed while it was temporarily unlocked.
Add missing calls to vn_finished_write() in error handling.
2006-05-05 21:27:31 +00:00
Tor Egge
0911ecffe7 Turn off disk quotas for snapshot files. 2006-05-05 20:10:04 +00:00
Tor Egge
c7793f61dc Avoid locking overhead when snapshots are disabled. 2006-05-05 19:58:36 +00:00
Pawel Jakub Dawidek
5b139b2d75 - Set bio_done directly to NULL to indicate that we want to wait for the bio.
- Use biowait() instead of copying the code.

MFC after:	1 month
2006-05-05 10:06:22 +00:00
Tor Egge
d81daf63bc Detect the snapshot file being prematurely unlinked. 2006-05-03 00:29:22 +00:00
Tor Egge
868bb88ff2 Temporarily undo clusters contribution to global runningbufspace while
handling copy on write for the buffers taking part in the cluster.
2006-05-03 00:10:29 +00:00
Tor Egge
5515ad4282 A side effect of calling runningbufwakeup() is that bp->b_runningbufspace is
cleared.  Save old value and restore bp->b_runningbufspace before returning
from ffs_copyonwrite().
2006-05-03 00:04:38 +00:00
Tor Egge
6d94935d36 Close a race when VOP_LOCK() on a snapshot file is attempted at the
same time as it is changed back into a normal file.  The locker would
get the shared "snaplk" lock which would no longer be the correct lock
for the vnode.
2006-05-02 23:52:43 +00:00
Scott Long
cbd6fedbf2 Fix a typo. 2006-04-28 04:39:50 +00:00
Jeff Roberson
6ca9fcc586 - Add a BO_NEEDSGIANT flag to the bufobj. This flag forces all child
buffers to go on the buf daemon's DIRTYGIANT queue.
 - Set BO_NEEDSGIANT on ffs's devvp since the ffs_copyonwrite handler
   runs in the context of the buf daemon and may require Giant.
2006-04-28 01:05:31 +00:00
Tom Rhodes
7b3f1bbd61 Revert previous to this file before an actual request is made. 2006-04-22 04:22:15 +00:00
Tom Rhodes
8fc22c9d2e Remove what I believe are two useless ifdefs. If a user or administrator
enables multilabel, or any option for that matter, most likely they have
a reason.  This will allow users to see that mulilabel is enabled via an
issued "mount" command and remove an annoying warning - printed only when
a MAC kernel is not installed - on boot up.

Discussed with:	green, brueffer, Samy Al Bahra.
Probably ran past:	csjp (though I can't remember).
2006-04-21 07:14:25 +00:00
Ken Smith
39fac37953 Fix panic() message to give the right function name. 2006-04-17 07:43:56 +00:00
Tor Egge
68e8466655 Eliminate softdep_flush() livelock by accounting for number of worklist items
marked as being in progress.
2006-04-03 22:23:23 +00:00
Jeff Roberson
3bbd6d8ae6 - Release the references acquired by VOP_GETWRITEMOUNT and vfs_getvfs().
Discussed with:	tegge
Tested by:	kris
Sponsored by:	Isilon Systems, Inc.
2006-03-31 03:54:20 +00:00
Tor Egge
700118c72f Allow compilation when not using softupdates. 2006-03-19 22:16:44 +00:00
Tor Egge
7de3839d0d Let snapshots make a copy of old contents for all buffers taking part in a
cluster instead of just the first buffer.

Delay buf_start() calls until snapshots have a copy of old content.

PR:		kern/93942
2006-03-19 21:43:36 +00:00
Tor Egge
30b3a49fab Add kludge to avoid deadlock when unlinking snapshot. 2006-03-19 21:29:20 +00:00
Tor Egge
95e7a3c3ac Reduce probability of unmount failing after having unmounted snapshots. 2006-03-19 21:09:19 +00:00
Tor Egge
8c86028f11 Ensure that vnode for directory isn't reclaimed before ffs_snapshot() has
completed expunging unlinked files.  It could come back at another memory
location causing a lock order reversal.
2006-03-19 21:05:10 +00:00
Jeff Roberson
8db357205c - Remove the call to softdep_waitidle after suspending the filesystem.
This does not do what I wanted as all dirty buffers must be flushed
   by the call to ffs_sync and any remaining dependency work would mean
   that this failed.

Pointed out by: tegge
2006-03-12 05:26:12 +00:00
Jeff Roberson
2eedeb7e60 - Remove the call to softdep_waitidle after suspending the filesystem.
This does not do what I wanted as all dirty buffers must be flushed
   by the call to ffs_sync and any remaining dependency work would mean
   that this failed.

Pointed out by:	tegge
2006-03-12 05:24:14 +00:00
Tor Egge
ca2fa80767 Block secondary writes while expunging active unlinked files.
Fix detection of active unlinked files by checking VI_OWEINACT and
VI_DOINGINACT in addition to v_usecount.

Defer inactive handling for unlinked files if the file system is mostly
suspended (secondary writes being blocked).

Perform deferred inactive handling after the file system is resumed.
2006-03-11 01:08:37 +00:00
Tor Egge
1e70cd7fc7 Remove unneeded (and broken) usage of MNT_REF()/MNT_REL(). 2006-03-10 02:31:12 +00:00
Tor Egge
791dd2fade Use vn_start_secondary_write() and vn_finished_secondary_write() as a
replacement for vn_write_suspend_wait() to better account for secondary write
processing.

Close race where secondary writes could be started after ffs_sync() returned
but before the file system was marked as suspended.

Detect if secondary writes or softdep processing occurred during vnode sync
loop in ffs_sync() and retry the loop if needed.
2006-03-08 23:43:39 +00:00
Tor Egge
a695d54404 Don't set IN_CHANGE and IN_UPDATE on inodes for potentially suspended
file systems.  This could cause deadlocks when creating snapshots.

Reviewed by:	jeff
2006-03-08 02:14:39 +00:00
Tor Egge
3b582b4e72 Eliminate a deadlock when creating snapshots. Blocking vn_start_write() must
be called without any vnode locks held.  Remove calls to vn_start_write() and
vn_finished_write() in vnode_pager_putpages() and add these calls before the
vnode lock is obtained to most of the callers that don't already have them.
2006-03-02 22:13:28 +00:00
Jeff Roberson
b9b12498fd - Acquire lk in softdep_slowdown so that it's owned when we call
softdep_speedup().
 - Assert that lk is held in softdep_speedup() rather than acquiring it.
   This avoids a potential lock recursion.
2006-03-02 08:52:53 +00:00
Jeff Roberson
eb2ea10590 - Move softdep from using a global worklist to per-mount worklists. This
has many positive effects including improved smp locking, reducing
   interdependencies between mounts that can lead to deadlocks, etc.
 - Add the softdep worklist and various counters to the ufsmnt structure.
 - Add a mount pointer to the workitem and remove mount pointers from the
   various structures derived from the workitem as they are now redundant.
 - Remove the poor-man's semaphore protecting softdep_process_worklist and
   softdep_flushworklist.  Several threads may now process the list
   simultaneously.
 - Add softdep_waitidle() to block the thread until all pending
   dependencies being operated on by other threads have been flushed.
 - Use softdep_waitidle() in unmount and snapshots to block either
   operation until the fs is stable.
 - Remove softdep worklist processing from the syncer and move it into the
   softdep_flush() thread.  This thread processes all softdep mounts
   once each second and when it is called via the new softdep_speedup()
   when there is a resource shortage.  This removes the softdep hook
   from the kernel and various hacks in header files to support it.

Reviewed by/Discussed with:	tegge, truckman, mckusick
Tested by:	kris
2006-03-02 05:50:23 +00:00
Jeff Roberson
f5a4db791d - Using LK_NOWAIT in qsync() can get us into infinite loop situations that
lead to deadlocks.  Remove it.

MFC After:	1 week
2006-02-22 06:12:53 +00:00
Robert Watson
5652c15c24 In quotaoff(), lock the vnode instead of asserting it when manipulating
v_vflags.

MFC after:	1 week
Submitted by:	Antoine Brodin <antoine at brodin at laposte dot net>
2006-02-12 13:20:06 +00:00
Robert Watson
4a99d6f90a Instead of asserting the vnode lock before manipulating v_vflag, acquire
it and drop it afterwards.

Found by:	kris
MFC after:	1 week
2006-02-11 21:09:27 +00:00
Jeff Roberson
89b0e10910 - Reorder calls to vrele() after calls to vput() when the vrele is a
directory.  vrele() may lock the passed vnode, which in these cases would
   give an invalid lock order of child -> parent.  These situations are
   deadlock prone although do not typically deadlock because the vrele
   is typically not releasing the last reference to the vnode.  Users of
   vrele must consider it as a call to vn_lock() and order it appropriately.

MFC After: 	1 week
Sponsored by:	Isilon Systems, Inc.
Tested by:	kkenn
2006-02-01 00:25:26 +00:00
Tor Egge
82be0a5a24 Add marker vnodes to ensure that all vnodes associated with the mount point are
iterated over when using MNT_VNODE_FOREACH.

Reviewed by:	truckman
2006-01-09 20:42:19 +00:00
Tor Egge
6c62b2acd0 If the lock passed to getdirtybuf() is the softdep lock then the background
write completed wakeup could be missed.  Close the race by grabbing the lock
normally used for protection of bp->b_xflags.

Reviewed by:	truckman
2006-01-09 19:32:21 +00:00
Tor Egge
c8c7711d66 Broaden scope of softdep_worklist_busy rwlock protection of softdep processing
to avoid some dependencies being missed by softdep_flushworklist().

Reviewed by:	truckman
2006-01-09 19:16:56 +00:00
Warner Losh
5c65ae3a88 New option: NO_FFS_SNAPSHOT. I did this in p4 about the same time
that NetBSD implemented it independently of them (don't know which one
was actually first).  This saves about 24k for those times you don't
need snapshot support (like when running off a ram disk, or in an
embedded environment where size matters).
2006-01-06 04:44:09 +00:00
Xin LI
cd34c8b6a2 Typo. 2005-12-23 15:50:57 +00:00
Dag-Erling Smørgrav
0430a5e289 Eradicate caddr_t from the VFS API. 2005-12-14 00:49:52 +00:00
Craig Rodrigues
b6bd025c35 Fix parsing of atime, clusterr, clusterw, exec, suid, symfollow
mount options.

Noticed by:	Amir Shalem < amir at boom dot org dot il>
2005-11-24 15:06:40 +00:00
Craig Rodrigues
cea903627f If export mount flag is not passed in, set default parameters
for export structure and pass that to vfs_export().
Currently in userland mount(8), an export structure is unconditionally
passed in, only for UFS.  This is an attempt to move that UFS-specific
behavior out of mount(8) and into the UFS filesystem code.
2005-11-20 17:04:50 +00:00
Craig Rodrigues
359d438885 Add more options to ffs_opts, so that vfs_filteropts() will not
complain when we pass these options to a UFS filesystem as strings
via nmount():  noexec, nosuid, nosymfollow, sync, suiddir
2005-11-19 23:28:19 +00:00
Craig Rodrigues
26f59b6455 - Add parsing for the following existing UFS/FFS mount options in the nmount()
callpath via vfs_getopt(), and set the appropriate MNT_* flag:
  -> acls, async, force, multilabel, noasync, noatime,
  -> noclusterr, noclusterw, snapshot, update

- Allow errmsg as a valid mount option via vfs_getopt(),
  so we can later add a hook to propagate mount errors back
  to userspace via vfs_mount_error().
2005-11-18 06:06:10 +00:00
Xin LI
fad951e3a0 Slightly reorganize to reduce duplicated code.
Reviewed by:	rwatson
2005-11-07 18:25:23 +00:00
Paul Saab
e1cef62715 Rate limit filesystem full and out of inodes messages to once a
second.
2005-10-31 20:33:28 +00:00
Robert Watson
5bb84bc84b Normalize a significant number of kernel malloc type names:
- Prefer '_' to ' ', as it results in more easily parsed results in
  memory monitoring tools such as vmstat.

- Remove punctuation that is incompatible with using memory type names
  as file names, such as '/' characters.

- Disambiguate some collisions by adding subsystem prefixes to some
  memory types.

- Generally prefer lower case to upper case.

- If the same type is defined in multiple architecture directories,
  attempt to use the same name in additional cases.

Not all instances were caught in this change, so more work is required to
finish this conversion.  Similar changes are required for UMA zone names.
2005-10-31 15:41:29 +00:00
Xin LI
eb2893ec18 Remove an unneeded "a" from comment. 2005-10-25 19:46:15 +00:00
Nate Lawson
8680d6985f Adjust maxfilesize for UFS1 and old 4.4 FFS. For UFS1, increase the limit
to (max block - 1) * bsize.  For DEV_BSIZE, this doubles the limit from
0.5 TB to 1 TB.  For the old 4.4 FFS case, decrease the limit from 0.5 TB
to 2 GB - 1.  Older systems had a 32 bit off_t so they couldn't access the
larger files anyway.

Collaboration with:	bde
2005-10-21 01:54:00 +00:00
Don Lewis
875e108755 Correct the type of the temporary variable used by ufs_lookup.c:1.78
to fix the race condition in the ufs_lookup() ISDOTDOT code.

Noticed by:	bde
MFC after:	12 days
2005-10-16 21:31:46 +00:00
Don Lewis
12d360453c Close a race in the ufs_lookup() code that handles the ISDOTDOT
case by saving the value of dp->i_ino before unlocking the vnode
for the current directory and passing the saved value to VFS_VGET().

Without this change, another thread can overwrite dp->i_ino after
the current directory is unlocked, causing  ufs_lookup() to lock
and return the wrong vnode in place of the vnode for its parent
directory.  A deadlock can occur if dp->i_ino was changed to a
subdirectory of the current directory because the root to leaf vnode
lock ordering will be violated.  A vnode lock can be leaked if
dp->i_ino was changed to point to the current directory, which
causes the current vnode lock for the current directory to be
recursed, which confuses lookup() into calling vrele() when it
should be calling vput().

The probability of this bug being triggered seems to be quite low
unless the sysctl variable debug.vfscache is set to 0.

Reviewed by:	jhb
MFC after:	2 weeks
2005-10-14 22:13:33 +00:00
Robert Watson
606dcf085f When performing a VOP_LOOKUP() as part of UFS1 extended attribute
auto-start, set cnp.cn_lkflags to LK_EXCLUSIVE.  This flag must now
be set so that lockmgr knows what kind of lock to acquire, and it
will panic if not specified.  This resulted in a panic when using
extended attributes on UFS1 as of locking work present in the 6.x
branch.

This is a RELENG_6_0 merge candidate.

Reported by:	lofi
MFC after:	3 days
2005-10-12 14:18:58 +00:00
Diomidis Spinellis
9f5c1d1955 Move execve's access time update functionality into a new
vfs_mark_atime() function, and use the new function for
performing efficient atime updates in mmap().

Reviewed by:	bde
MFC after:	2 weeks
2005-10-12 06:56:00 +00:00
Tor Egge
48c2ac4539 Avoid unintended VMIO on directories and symlinks due to leftover object
not having been destroyed.
2005-10-10 19:02:04 +00:00
Tor Egge
4e0cd00988 Adjust totread argument passed to cluster_read() to account for offset not
being block aligned.
2005-10-09 21:11:25 +00:00
Tor Egge
9248a8271c Don't pretend that a failed sync write was succesful. 2005-10-09 20:49:01 +00:00
Tor Egge
021869b542 Reduce probability for a deadlock that can occur when a snapshot inode is
updated by a process holding the snapshot lock.  Another process updating a
different inode in the same inodeblock will do copy on write checks and lock in
the opposite direction.

The snapshot code force a copy on write of these blocks manually (cf. start of
expunge_ufs[12]) and these inode blocks are later put on snapblklist.

This partial fix is to 'drain' the relevant ffs_copyonwrite() operation after
installing new snapblklist.  This is not a 100% solution since a failed block
allocation can cause implicit fsync() which might deadlock before the new
snapblklist has been installed.
2005-10-09 20:15:15 +00:00
Tor Egge
d4d530da96 Eliminate a deadlock that can occur when a dirty block belonging to a snapshot
file is flushed by a process not holding snaplk (e.g. bufdaemon).  Another
process might hold snaplk and try to access the block due to ffs_copyonwrite
processing.
2005-10-09 20:07:51 +00:00
Tor Egge
45f91051da Eliminate a deadlock that can occur during the cgaccount() processing due to
the cg map buffer being held when writing indirect blocks.  The process ends up
in ffs_copyonwrite(), attempting to get snaplk while holding the cg map buffer
lock.

Another process might be in ffs_copyonwrite(), trying to allocate a new block
for a copy.  It would hold snaplk while trying to get the cg map buffer lock.

Release the cg map buffer early and use the copy for most of the cgaccount
processing to avoid this deadlock.
2005-10-09 20:00:16 +00:00
Tor Egge
17026ff61a Reduce the probability of low block numbers passed to ffs_snapblkfree() by
skipping the call from ffs_snapremove() if the block number is zero.

Simplify snapshot locking in ffs_copyonwrite() and ffs_snapblkfree() by using
the same locking protocol for low block numbers as for larger block numbers.
This removes a lock leak that could happen if vn_lock() succeeded after
lockmgr() failed in ffs_snapblkfree().

Check if snapshot is gone before retrying a lock in ffs_copyonwrite().
2005-10-09 19:45:01 +00:00
Tor Egge
c73e9e9c7b Reinitialize v_type and v_op fields in case vnode has been reused without
reclamation.  If the vnode previously was a fifo then v_op would point to
ffs_fifoops[12] instead of the expected ffs_vnodeops[12], causing a panic at
the end of ffsext_strategy.
2005-10-09 19:06:34 +00:00
Don Lewis
448434c3c9 Initialize the inode i_flag field in ffs_valloc() to clean up any
stale flag bits left over from before the inode was recycled.

Without this change, a leftover IN_SPACECOUNTED flag could prevent
softdep_freefile() and softdep_releasefile() from incrementing
fs_pendinginodes.  Because handle_workitem_freefile() unconditionally
decrements fs_pendinginodes, a negative value could be reported at
file system unmount time with a message like:
        unmount pending error: blocks 0 files -3
The pending block count in fs_pendingblocks could also be negative
for similar reasons.  These errors can cause the data returned by
statfs() to be slightly incorrect.  Some other cleanup code in
softdep_releasefile() could also be incorrectly bypassed.

MFC after:	3 days
2005-10-03 21:57:43 +00:00
Don Lewis
460858e9ef Correct previous commit to fix the sense of the TDP_NORUNNINGBUF
check in ffs_copyonwrite() that is a precondition for calling
waitrunningbufspace().

Pointed out by:	tegge
Pointy hat to:	truckman
MFC after:	3 days
2005-10-01 19:10:48 +00:00
Don Lewis
bd3c2d867d Un-staticize waitrunningbufspace() and call it before returning from
ffs_copyonwrite() if any async writes were launched.

Restore the threads previous TDP_NORUNNINGBUF state before returning
from ffs_copyonwrite().
2005-09-30 18:07:41 +00:00
Don Lewis
6c8b634f1d Un-staticize runningbufwakeup() and staticize updateproc.
Add a new private thread flag to indicate that the thread should
not sleep if runningbufspace is too large.

Set this flag on the bufdaemon and syncer threads so that they skip
the waitrunningbufspace() call in bufwrite() rather than than
checking the proc pointer vs. the known proc pointers for these two
threads.  A way of preventing these threads from being starved for
I/O but still placing limits on their outstanding I/O would be
desirable.

Set this flag in ffs_copyonwrite() to prevent bufwrite() calls from
blocking on the runningbufspace check while holding snaplk.  This
prevents snaplk from being held for an arbitrarily long period of
time if runningbufspace is high and greatly reduces the contention
for snaplk.  The disadvantage is that ffs_copyonwrite() can start
a large amount of I/O if there are a large number of snapshots,
which could cause a deadlock in other parts of the code.

Call runningbufwakeup() in ffs_copyonwrite() to decrement runningbufspace
before attempting to grab snaplk so that I/O requests waiting on
snaplk are not counted in runningbufspace as being in-progress.
Increment runningbufspace again before actually launching the
original I/O request.

Prior to the above two changes, the system could deadlock if enough
I/O requests were blocked by snaplk to prevent runningbufspace from
falling below lorunningspace and one of the bawrite() calls in
ffs_copyonwrite() blocked in waitrunningbufspace() while holding
snaplk.

See <http://www.holm.cc/stress/log/cons143.html>
2005-09-30 01:30:01 +00:00
Don Lewis
445193b887 After a rmdir()ed directory has been truncated, force an update of
the directory's inode after queuing the dirrem that will decrement
the parent directory's link count.  This will force the update of
the parent directory's actual link to actually be scheduled.  Without
this change the parent directory's actual link count would not be
updated until ufs_inactive() cleared the inode of the newly removed
directory, which might be deferred indefinitely.  ufs_inactive()
will not be called as long as any process holds a reference to the
removed directory, and ufs_inactive() will not clear the inode if
the link count is non-zero, which could be the result of an earlier
system crash.

If a background fsck is run before the update of the parent directory's
actual link count has been performed, or at least scheduled by
putting the dirrem on the leaf directory's inodedep id_bufwait list,
fsck will corrupt the file system by decrementing the parent
directory's effective link count, which was previously correct
because it already took the removal of the leaf directory into
account, and setting the actual link count to the same value as the
effective link count after the dangling, removed, leaf directory
has been removed.  This happens because fsck acts based on the
actual link count, which will be too high when fsck creates the
file system snapshot that it references.

This change has the fortunate side effect of more quickly cleaning
up the large number dirrem structures that linger for an extended
time after the removal of a large directory tree.  It also fixes a
potential problem with the shutdown of the syncer thread timing out
if the system is rebooted immediately after removing a large directory
tree.

Submitted by:	tegge
MFC after:	3 days
2005-09-29 21:50:26 +00:00
Robert Watson
5f419982c2 Back out alpha/alpha/trap.c:1.124, osf1_ioctl.c:1.14, osf1_misc.c:1.57,
osf1_signal.c:1.41, amd64/amd64/trap.c:1.291, linux_socket.c:1.60,
svr4_fcntl.c:1.36, svr4_ioctl.c:1.23, svr4_ipc.c:1.18, svr4_misc.c:1.81,
svr4_signal.c:1.34, svr4_stat.c:1.21, svr4_stream.c:1.55,
svr4_termios.c:1.13, svr4_ttold.c:1.15, svr4_util.h:1.10,
ext2_alloc.c:1.43, i386/i386/trap.c:1.279, vm86.c:1.58,
unaligned.c:1.12, imgact_elf.c:1.164, ffs_alloc.c:1.133:

Now that Giant is acquired in uprintf() and tprintf(), the caller no
longer leads to acquire Giant unless it also holds another mutex that
would generate a lock order reversal when calling into these functions.
Specifically not backed out is the acquisition of Giant in nfs_socket.c
and rpcclnt.c, where local mutexes are held and would otherwise violate
the lock order with Giant.

This aligns this code more with the eventual locking of ttys.

Suggested by:	bde
2005-09-28 07:03:03 +00:00
John Baldwin
7e9e371f2d Use the refcount API to manage the reference count for user credentials
rather than using pool mutexes.

Tested on:	i386, alpha, sparc64
2005-09-27 18:09:42 +00:00
Xin LI
0e4f6eecd7 Restore a historical ufs_inactive behavior that has been changed
in rev. 1.40 of ufs_inode.c, which allows an inode being truncated
even when the filesystem itself is marked RDONLY.  A subsequent
call of UFS_TRUNCATE (ffs_truncate) would panic the system as it
asserts that it can only be called when the filesystem is mounted
read-write (same changeset, rev. 1.74 of sys/ufs/ffs/ffs_inode.c).

Because ffs_mount() already takes care of sync'ing the filesystem
to disk before being downgraded to readonly, it appears to be more
desirable that we should not permit this sort of writes to disk.

This change would fix a panic that occours when read-only mounted
a corrupted filesystem and doing some file operations.

MT6/5/4 candidate

Reviewed by:	mckusick
2005-09-23 20:49:57 +00:00
Robert Watson
84d2b7df26 Add GIANT_REQUIRED and WITNESS sleep warnings to uprintf() and tprintf(),
as they both interact with the tty code (!MPSAFE) and may sleep if the
tty buffer is full (per comment).

Modify all consumers of uprintf() and tprintf() to hold Giant around
calls into these functions.  In most cases, this means adding an
acquisition of Giant immediately around the function.  In some cases
(nfs_timer()), it means acquiring Giant higher up in the callout.

With these changes, UFS no longer panics on SMP when either blocks are
exhausted or inodes are exhausted under load due to races in the tty
code when running without Giant.

NB: Some reduction in calls to uprintf() in the svr4 code is probably
desirable.

NB: In the case of nfs_timer(), calling uprintf() while holding a mutex,
or even in a callout at all, is a bad idea, and will generate warnings
and potential upset.  This needs to be fixed, but was a problem before
this change.

NB: uprintf()/tprintf() sleeping is generally a bad ideas, as is having
non-MPSAFE tty code.

MFC after:	1 week
2005-09-19 16:51:43 +00:00
Tor Egge
2f0ffabcf4 Giant is no longer needed here. 2005-09-12 01:21:42 +00:00
Christian S.J. Peron
d1dfd92177 Convert the primary ACL allocator from malloc(9) to using a UMA zone instead.
Also introduce an aclinit function which will be used to create the UMA zone
for use by file systems at system start up.

MFC after:	1 month
Discussed with:	rwatson
2005-09-06 00:06:30 +00:00
Tor Egge
d536ff2edb Retain generation count when writing zeroes instead of an inode to disk.
Don't free a struct inodedep if another process is allocating saved inode
memory for the same struct inodedep in initiate_write_inodeblock_ufs[12]().

Handle disappearing dependencies in softdep_disk_io_initiation().

Reviewed by:	mckusick
2005-09-05 22:14:33 +00:00
Suleiman Souhlal
fdedad764a ffs_mountfs() needs devvp to be locked, so lock it.
Glanced at by:	phk
Tested by:	pjd
MFC after:	3 days
2005-09-02 13:52:55 +00:00
Suleiman Souhlal
93373c422f Set the mountpoint path in the superblock (fs_fsmnt) at mount-time
so that it appears in the various messages (not cleanly unmounted,
filesystem full, etc). This has been broken since rev 1.261.
2005-08-21 22:06:41 +00:00
Tor Egge
15da51f73c Don't set the COMPLETE flag in an inodedep structure before the related
inode has been written.
2005-08-21 18:19:06 +00:00
Ian Dowse
65ed954554 In the ufsdirhash_build() failure case for corrupted directories
or unreadable blocks, make sure to destroy the mutex we created.
Also fix an unrelated typo in a comment.

Found by:	Peter Holm's stress tests
Reviewed by:	dwmalone
MFC after:	3 days
2005-08-17 08:48:42 +00:00
Stephan Uphoff
ed8938e082 Delay freeing disk space for file system blocks until all dirty buffers
are safely released. This fixes softdep problems on truncation (deletion)
of files with dirty buffers.

Reviewed by:	jeff@, mckusick@, ps@, tegge@
Tested by: 	glebius@, ps@
MFC after:	3 weeks
2005-07-31 20:24:14 +00:00
Alan Cox
ec9c9e7363 Eliminate inconsistency in the setting of the B_DONE flag. Specifically,
make the b_iodone callback responsible for setting it if it is needed.
Previously, it was set unconditionally by bufdone() without holding
whichever lock is shared by the b_iodone callback and the corresponding
top-half function.  Consequently, in a race, the top-half function could
conclude that operation was done before the b_iodone callback finished.
See, for example, aio_physwakeup() and aio_fphysio().

Note: I don't believe that the other, more widely-used b_iodone callbacks
are affected.

Discussed with: jeff
Reviewed by: phk
MFC after: 2 weeks
2005-07-20 19:06:06 +00:00
Suleiman Souhlal
679985d03a Allow EVFILT_VNODE events to work on every filesystem type, not just
UFS by:
- Making the pre and post hooks for the VOP functions work even when
DEBUG_VFS_LOCKS is not defined.
- Moving the KNOTE activations into the corresponding VOP hooks.
- Creating a MNTK_NOKNOTE flag for the mnt_kern_flag field of struct
mount that permits filesystems to disable the new behavior.
- Creating a default VOP_KQFILTER function: vfs_kqfilter()

My benchmarks have not revealed any performance degradation.

Reviewed by:	jeff, bde
Approved by:	rwatson, jmg (kqueue changes), grehan (mentor)
2005-06-09 20:20:31 +00:00
Ken Smith
6341095e0d This patch addresses a standards violation issue. The standards say a
file's access time should be updated when it gets executed.  A while
ago the mechanism used to exec was changed to use a more mmap based
mechanism and this behavior was broken as a side-effect of that.

A new vnode flag is added that gets set when the file gets executed,
and the VOP_SETATTR() vnode operation gets called.  The underlying
filesystem is expected to handle it based on its own semantics, some
filesystems don't support access time at all.  Those that do should
handle it in a way that does not block, does not generate I/O if possible,
etc.  In particular vn_start_write() has not been called.  The UFS code
handles it the same way as it would normally handle the access time if
a file was read - the IN_ACCESS flag gets set in the inode but no other
action happens at this point.  The actual time update will happen later
during a sync (which handles all the necessary locking).

Got me into this:	cperciva
Discussed with:		a lot with bde, a little with kan
Showed patches to:	phk, jeffr, standards@, arch@
Minor discussion on:	arch@
2005-05-31 19:39:52 +00:00
Jeff Roberson
204ec66d38 - Don't set our bio op to be a READ when we've just completed a write. There
are subtle differences in the read and write completion path.  Instead,
   grab an extra write ref so the write path can drop it when we recursively
   call bufdone().  I believe this may be the source of the wrong bufobj
   panics.

Reported by:	pho, kkenn
2005-05-30 07:04:15 +00:00
Kirk McKusick
6a52a06851 Allow removal of empty directories with high link counts. These can
occur on a filesystem running with soft updates after a crash and
before a background fsck has been run. To prevent discrepancies
from arising in a background fsck that may already be running,
the directory is removed but its inode is not freed and is left
with the residual reference count. When encountered by the
background fsck it will be reclaimed.
2005-05-18 22:18:21 +00:00
Jeff Roberson
6c71a2208d - Don't restrict the softdep stats to DEBUG kernels, they cost nothing to
export.  This was happening anyway since this file manually sets DEBUG.
 - Add a sysctl for the number of items on the worklist.
 - Use a more canonical loop restart in softdep_fsync_mountdev, it saves
   some code at the expense of a goto and makes me worry less about
   modifying a variable that should be private to the TAILQ_FOREACH_SAFE
   macro.
2005-05-03 11:03:29 +00:00
Jeff Roberson
2524c26de8 - Use bdone() directly instead of calling it indirectly through
ffs_rawreaddone().

Sponsored by:	Isilon Systems, Inc.
2005-04-30 11:28:19 +00:00
Pawel Jakub Dawidek
231b1be179 - Plug memory leak.
- Fix two style nits.

Found by:	Coverity Prevent analysis tool
Reviewed by:	rwatson
MFC after:	1 week
2005-04-16 10:57:49 +00:00
Jeff Roberson
4585e3ac5a - Change all filesystems and vfs_cache to relock the dvp once the child is
locked in the ISDOTDOT case.  Se vfs_lookup.c r1.79 for details.

Sponsored by:	Isilon Systems, Inc.
2005-04-13 10:59:09 +00:00
Jeff Roberson
ece9473efa - Consistently call 'vp' vp rather than ovp sometimes in ffs_truncate().
Do the same for oip.

Pointed out by:	glebius
2005-04-05 08:49:41 +00:00
Jeff Roberson
bcc8f66c8b - Use M_ZERO rather than explicitly calling bzero().
- Don't intermingle direct calls to lockmgr and indirect calls through
   VOPs.  This will be important in the future.
 - Dont lock the devvp's interlock just to release it on the next line by
   passing LK_INTERLOCK to lockmgr.
 - Restructure ffs_snapshot_unmount so we don't call free() with the
   devvp's interlock locked.
2005-04-03 12:03:44 +00:00
Jeff Roberson
41d4783d49 - In ffs_sync we need to pass LK_SLEEPFAIL in when we lock the vnode
because it may change identities while we're sleeping on the lock.
   Otherwise we may bail out of ffs_sync() early due to an error from
   deadfs.
 - Collapse a VOP_UNLOCK, vrele into a single vput().
2005-04-03 10:38:18 +00:00
Jeff Roberson
153910e0f5 - Move the contents of softdep_disk_prewrite into ffs_geom_strategy to fix
two bugs.
 - ffs_disk_prewrite was pulling the vp from the buf and checking for
   COPYONWRITE, when really it wanted the vp from the bufobj that we're
   writing to, which is the devvp.  This lead to us skipping the copy on
   write to all file data, which significantly broke snapshots for the
   last few months.
 - When the SOFTUPDATES option was not included in the kernel config we
   would also skip the copy on write check, which would effectively disable
   snapshots.
 - Remove an invalid mp_fixme().

Debugging tips from:	mckusick
Reported by:		iedowse, others
Discussed with:		phk
2005-04-03 10:29:55 +00:00
Jeff Roberson
278c5a6efa - Fix botched LK_NOWAIT removal. I mistakenly thought this compiled as
part of GENERIC.
2005-03-31 05:58:14 +00:00
Jeff Roberson
aa7ba42796 - FFS supports shared locks, clear LK_NOSHARE from our vnode locks.
Sponsored by:	Isilon Systems, Inc.
2005-03-31 05:23:20 +00:00
Jeff Roberson
ec3db02a3e - Set LK_NOSHARE for snapshot locks. snapshots require exclusive only
access.
 - Remove the hack from ffs_lock() to implement LK_NOSHARE in a ffs
   specific way.

Sponsored by:	Isilon Systems, Inc.
2005-03-31 05:21:17 +00:00
Jeff Roberson
f247a5240d - LK_NOPAUSE is a nop now.
Sponsored by:   Isilon Systems, Inc.
2005-03-31 04:37:09 +00:00
Jeff Roberson
52f6886551 - Remove wantparent, it is no longer necessary. An assert in vfs_lookup.c
prevents any callers from doing a modifying op without
   LOCKPARENT or WANTPARENT.  It wasn't even properly used in the CREATE
   or DELETE cases.
2005-03-29 13:16:38 +00:00
Jeff Roberson
d6919865fa - Upgrade a shared lock request to exclusive in ffs_vget() if we have
to create the vnode.

Sponsored by:	Isilon Systems, Inc.
2005-03-29 10:10:51 +00:00
Jeff Roberson
a69c43548d - Honor the cn_lkflags passed from namei() when locking the leaf.
Sponsored by:	Isilon Systems, Inc.
2005-03-29 10:10:01 +00:00
Jeff Roberson
e19881ff08 - UFS no longer uses PDIRUNLOCK to track the parent state. Instead, we now
rely on ufs to always leave the parent locked except in the ISDOTDOT
   case.  Adjust asserts to deal with these changes.

Sponsored by:	Isilon Systems, Inc.
2005-03-28 09:35:58 +00:00
Jeff Roberson
eddcb03d02 - We no longer have to bother with PDIRUNLOCK, lookup() handles it for us.
Sponsored by:   Isilon Systems, Inc.
2005-03-28 09:34:36 +00:00
David Schultz
188f6433f6 When the softupdates worklist gets too long, threads that attempt to
add more work are forced to process two worklist items first.
However, processing an item may generate additional work, causing the
unlucky thread to recursively process the worklist.  Add a per-thread
flag to detect this situation and avoid the recursion.  This should
fix the stack overflows that could occur while removing large
directory trees.

Tested by:	kris
Reviewed by:	mckusick
2005-03-25 17:30:31 +00:00
Jeff Roberson
080c061ad0 - Call VFS_ROOT() with LK_EXCLUSIVE.
Sponsored by:	Isilon Systems, Inc.
2005-03-24 07:33:45 +00:00
Jeff Roberson
469ec10c1e - Update the ufs_root() prototype.
- Pass the ufs_root() flags argument to VFS_VGET() to allow callers to
   specify shared locks.

Sponsored by:	Isilon Systems, Inc.
2005-03-24 07:32:50 +00:00
Jeff Roberson
23d15e852d - Lock the clearing of v_data in ufs_reclaim() to prevent a pagefault
in ffs_lock() when it acesses v_data without the vnlock.

Sponsored by:	Isilon Systems, Inc.
2005-03-17 11:58:43 +00:00
Poul-Henning Kamp
51f5ce0c8c Add two arguments to the vfs_hash() KPI so that filesystems which do
not have unique hashes (NFS) can also use it.
2005-03-16 11:20:51 +00:00
Poul-Henning Kamp
de68347b1b Don't hold a reference on the disk vnode for each inode. 2005-03-15 20:50:58 +00:00
Poul-Henning Kamp
45c26fa2b6 Improve the vfs_hash() API: vput() the unneeded vnode centrally to
avoid replicating the vput in all the filesystems.
2005-03-15 20:00:03 +00:00
Poul-Henning Kamp
e82ef95c11 Simplify the vfs_hash calling convention. 2005-03-15 08:07:07 +00:00
Jeff Roberson
4483fe9227 - Destroy the vnode object earlier in VOP_RECLAIM as we need more of
the vnode valid before the vm flushes pages.
 - Get rid of some extraneous uses of the vnode interlock.

Sponsored by:	Isilon Systems, Inc.
2005-03-15 01:42:58 +00:00
Poul-Henning Kamp
14bc0685ac Use vfs_hash instead of home-rolled. 2005-03-14 10:21:16 +00:00
Jeff Roberson
9cbe5da9d5 - It is not legal to access v_data without the vnode lock or interlock
held.  Grab the vnode interlock if LK_INTERLOCK has not been passed in
   so that we can inspect v_data in ffs_lock().

Sponsored by:	Isilon Systems, Inc.
2005-03-13 12:04:12 +00:00
Jeff Roberson
fe68abe291 - The VI_DOOMED flag now signals the end of a vnode's relationship with
the filesystem.  Check that rather than VI_XLOCK.
 - Shorten ffs_reload by one step.  The old check for an inactive vnode
   was slightly racey, and the code which deals with still active vnodes
   is not much more expensive.

Sponsored by:	Isilon Systems, Inc.
2005-03-13 12:03:14 +00:00
Jeff Roberson
fdcc82276e - The VI_DOOMED flag now signals the end of a vnode's relationship with
the filesystem.  Check that rather than VI_XLOCK.

Sponsored by:	Isilon Systems, Inc.
2005-03-13 12:01:50 +00:00
Jeff Roberson
b5411d4fcb - Fix an assert now that the XLOCK no longer exists.
Sponsored by:	Isilon Systems, Inc.
2005-03-13 12:00:41 +00:00
Jeff Roberson
b6ee8476d3 - In ufs_mknod(), hold the lock across the call to vgone() as that is now
required.
 - In ufs_close(), don't do the EAGAIN vrele hack, the top layer now calls
   vn_start_write before the lock is acquired as it should.

Sponsored by:	Isilon Systems, Inc.
2005-03-13 11:59:14 +00:00
Jeff Roberson
38d504db44 - Don't drop the lock in ufs_inactive().
- Also in ufs_inactive, don't acquire the vnode interlock where it isn't
   strictly needed.  Also owning the vnode interlock while calling vprint()
   will cause locking assertions to trip.

Sponsored by:	Isilon Systems, Inc.
2005-03-13 11:57:39 +00:00
Jeff Roberson
41766826eb - Fix anoter dyslexic moment; an atomic_set_int should've become ACTIVESET,
not ACTIVECLEAR.

Submitted by:	iedowse
2005-03-01 07:38:45 +00:00
Poul-Henning Kamp
7ce296cf04 Remove debug printout of major/minor numbers, print name instead. 2005-02-27 21:16:26 +00:00
Sam Leffler
d5bbad8372 use uiomove return value instead of always returning 0 when doing a
readlink of a fast link

Noticed by:	Coverity Prevent analysis tool
Reviewed by:	phk
2005-02-27 18:58:31 +00:00
Jeff Roberson
1a4a9672f1 - Add VOP locking asserts in several functions that have been implicated in
recent deadlocks.
2005-02-22 23:56:42 +00:00
Xin LI
a16baf37b9 The recomputation of file system summary at mount time can be a
very slow process, especially for large file systems that is just
recovered from a crash.

Since the summary is already re-sync'ed every 30 second, we will
not lag behind too much after a crash.  With this consideration
in mind, it is more reasonable to transfer the responsibility to
background fsck, to reduce the delay after a crash.

Add a new sysctl variable, vfs.ffs.compute_summary_at_mount, to
control this behavior.  When set to nonzero, we will get the
"old" behavior, that the summary is computed immediately at mount
time.

Add five new sysctl variables to adjust ndir, nbfree, nifree,
nffree and numclusters respectively.  Teach fsck_ffs about these
API, however, intentionally not to check the existence, since
kernels without these sysctls must have recomputed the summary
and hence no adjustments are necessary.

This change has eliminated the usual tens of minutes of delay of
mounting large dirty volumes.

Reviewed by:	mckusick
MFC After:	1 week
2005-02-20 08:02:15 +00:00
Poul-Henning Kamp
dfd4be14bd Try to unbreak the vnode locking around vop_reclaim() (based mostly on
patch from kan@).

Pull bufobj_invalbuf() out of vinvalbuf() and make g_vfs call it on
close.  This is not yet a generally safe function, but for this very
specific use it is safe.  This solves the problem with buffers not
being flushed by unmount or after failed mount attempts.
2005-02-19 11:44:57 +00:00
Xin LI
d5128ab2af When clearing a fragment, it's possible that the length is zero.
Reviewed by:	mckusick
MFC After:	1 week
2005-02-19 07:31:33 +00:00
Jeff Roberson
a8127ebb5d - Remove the unused and unsafe ufs_ihashlookup. This function returned a
vnode pointer that could not be used since no locks were held.

Sponsored by:	Isilon Systems, Inc.
2005-02-14 20:51:39 +00:00
Poul-Henning Kamp
1121c39497 Make non-SOFTUPDATES kernels compile again.
Integrate the stubfile into the main file now that license issues have been
long resolved.
2005-02-11 08:13:31 +00:00
Poul-Henning Kamp
adf4157738 Make a some SYSCTL_NODEs and some of FFS's VFS_ methods static. 2005-02-10 12:20:08 +00:00
Jeff Roberson
a3caf16e99 - In the softupdates case for ffs_truncate() we use vinvalbuf() to
invalidate pending io and dependencies.  However, vinvalbuf() rightfully
   does not call vnode_pager_setsize() for us.  We must do this here.  This
   could potentially have caused numerous kinds of bugs, but it was
   specifically causing msync() deadlocks because msync() was writing
   flushing pages that should not have been valid.

Sponsored by:	Isilon Systems, Inc.
Reported by:	kkenn
2005-02-09 23:05:20 +00:00
Poul-Henning Kamp
365b18aa89 style polishing. 2005-02-09 12:22:16 +00:00
Colin Percival
79653046d8 Add a new sysctl, "security.jail.chflags_allowed", which controls the
behaviour of chflags within a jail.  If set to 0 (the default), then a
jailed root user is treated as an unprivileged user; if set to 1, then
a jailed root user is treated the same as an unjailed root user.

This is necessary to allow "make installworld" to work inside a jail,
since it attempts to manipulate the system immutable flag on certain
files.

Discussed with:	csjp, rwatson
MFC after:	2 weeks
2005-02-08 21:31:11 +00:00
Poul-Henning Kamp
02f2c6a9d8 Split the vop_vector for ffs1 and ffs2, this is mostly for the different
EXTATTR support.
2005-02-08 21:03:52 +00:00
Poul-Henning Kamp
44787ceb0b Use ffs_truncate() directly instead of UFS_TRUNCATE() 2005-02-08 20:51:00 +00:00
Poul-Henning Kamp
dd19a799b8 Background writes are entirely an FFS/Softupdates thing.
Give FFS vnodes a specific bufwrite method which contains all the
background write stuff and then calls into the default bufwrite()
for the rest of the job.

Remove all the background write related stuff from the normal bufwrite.

This drags the softdep_move_dependencies() back into FFS.

Long term, it is worth looking at simply copying the data into
allocated memory and issuing the bio directly and not create the
"shadow buf" in the first place (just like copy-on-write is done
in snapshots for instance).  I don't think we really gain anything
but complexity from doing this with a buf.
2005-02-08 20:29:10 +00:00
Poul-Henning Kamp
88e5b12a20 Drag another softupdates tentacle back into FFS: Now that FFS's
vop_fsync is separate from the internal use we can do the full job
there.
2005-02-08 18:09:11 +00:00
Poul-Henning Kamp
efd6d9808c Don't use the UFS_* and VFS_* functions where a direct call is possble.
The UFS_ functions are for UFS to call back into VFS.  The VFS functions
are external entry points into the filesystem.
2005-02-08 17:40:01 +00:00
Robert Watson
45faa442c3 Don't use VOP_LEASE() with operations on extended attribute backing
files.

Pointed out by:	phk
2005-02-08 17:05:38 +00:00
Poul-Henning Kamp
40854ff546 For snapshots we need all VOP_LOCKs to be exclusive.
The "business class upgrade" was implemented in UFS's VOP_LOCK
implementation ufs_lock() which is the wrong layer, so move it to
ffs_lock().

Also, as long as we have not abandonned advanced vfs-stacking we
should not preclude it from happening: instead of implementing a
copy locally, use the VOP_LOCK_APV(&ufs) to correctly arrive at
vop_stdlock() at the bottom.
2005-02-08 16:25:50 +00:00
Poul-Henning Kamp
d6f622cc2f For snapshots we need all VOP_LOCKs to be exclusive.
The "business class upgrade" was implemented in UFS's VOP_LOCK
implementation ufs_lock() which is the wrong layer, so move it to
ffs_lock().

Also, as long as we have not abandonned advanced vfs-stacking we
should not preclude it from happening: instead of implementing a
copy locally, use the VOP_LOCK_APV(&ufs) to correctly arrive at
vop_stdlock() at the bottom.
2005-02-08 15:54:30 +00:00
Poul-Henning Kamp
32a870da8a Use VOP_STRATEGY_APV() instead of direct dereference, this is more
correct.
2005-02-08 15:40:11 +00:00
Jeff Roberson
9087d86e66 - Use a seperate malloc tag for saved inode contents to help in debugging
memory modified after free errors.

Sponsored by:	Isilon Systems, Inc.
2005-02-02 20:30:47 +00:00
Ken Smith
87c29bf93e Back out previous commit, bde@ provided an example of something this
breaks.
2005-02-02 14:21:01 +00:00
Ken Smith
0fac1537a2 It was noticed that we do not change a file's access time when it gets
executed.  This appears to violate most of the UNIX-ish standards.
One example quote from:

  http://www.opengroup.org/onlinepubs/009695399/functions/exec.html

    Upon successful completion, the exec functions shall mark for update
    the st_atime field of the file. If an exec function failed but was
    able to locate the process image file, whether the st_atime field is
    marked for update is unspecified. Should the exec function succeed,
    the process image file shall be considered to have been opened with
    open().

This appears to take care of it for ufs filesystems, doing the necessary
sanity checks (read-only filesystem, etc) without violating any other
standards (setting atime for any open appears to be allowed in any standards
I could find).

Noticed by:	cperciva
Reviewed by:	kan, rwatson
2005-02-02 00:21:38 +00:00
Warner Losh
1f0ce611b3 nit in /*- 2005-01-31 08:16:45 +00:00
Peter Edwards
e697161fa2 Tell vnode_create_vobject() how big an object to create, rather
than having it work it out via the more expensive VOP_GETATTR

Reviewed by: phk@
2005-01-29 14:23:09 +00:00
Poul-Henning Kamp
a369f34d76 Make filesystems get rid of their own vnodes vnode_pager object in
VOP_RECLAIM().
2005-01-28 14:42:17 +00:00
Poul-Henning Kamp
d4eb29ba71 Remove unused argument to vrecycle() 2005-01-28 13:08:21 +00:00
Poul-Henning Kamp
84a6975215 Introduce and use g_vfs_close(). 2005-01-25 15:52:04 +00:00
Poul-Henning Kamp
8516dd18e1 Don't use VOP_GETVOBJECT, use vp->v_object directly. 2005-01-25 00:40:01 +00:00
Poul-Henning Kamp
f74b3b1f6c Create a vnode object when the file is opened. Trust that we did so. 2005-01-24 23:04:33 +00:00
Poul-Henning Kamp
ce12d37e7b Don't create vnode_pager objects for the disk device.
geom_vfs will do that.
2005-01-24 22:41:59 +00:00
Poul-Henning Kamp
625d4bc03a Create a vp->v_object in VFS_FHTOVP() if we want to be exportable
with NFS.

We are moving responsibility for creating the vnode_pager object into
the filesystems which own the vnode, and this is one of the places
we have to cover.

We call vnode_create_vobject() directly because we own the vnode.

If we can get the size easily, pass it as an argument to save the
call to VOP_GETATTR() in vnode_create_vobject()
2005-01-24 21:51:19 +00:00
Poul-Henning Kamp
091710ab22 Polish style. 2005-01-24 12:19:28 +00:00
Jeff Roberson
08023360a0 - Convert the global LK lock to a mutex.
- Expand the scope of lk to cover not only interrupt races, but also
   top-half races, which includes many new uses over global top-half
   only data.
 - Get rid of interlocked_sleep() and use msleep or BUF_LOCK where
   appropriate.
 - Use the lk mutex in place of the various hand rolled semaphores.
 - Stop dropping the lk lock before we panic.
 - Fix getdirtybuf() callers so that they reacquire access to whatever
   softdep datastructure they were inxpecting in the failure/retry
   case.  Previously, sleeps in getdirtybuf() could leave us with
   pointers to bad memory.
 - Update handling of ffs to be compatible with ffs locking changes.

Sponsored By:	Isilon Systems, Inc.
2005-01-24 10:18:31 +00:00
Jeff Roberson
3ba649d792 - Initialize and destroy the per-filesystem ufs lock where appropriate.
- Use the buffer lock on the superblock buf to serialize calls to
   sbupdate.
 - Set the MNTK_MPSAFE flag when QUOTA is not defined in the kernel.

Sponsored By:	Isilon Systems, Inc.
2005-01-24 10:12:28 +00:00
Jeff Roberson
dec351f69e - Remove GIANT_REQUIRED where giant is no longer required.
Sponsored By:	Isilon Systems, Inc.
2005-01-24 10:10:47 +00:00
Jeff Roberson
5cef9d6add - Use the ufs lock to protect fs_active.
Sponsored By:	Isilon Systems, Inc.
2005-01-24 10:10:11 +00:00
Jeff Roberson
353255885c - Acquire the ufs lock around several ffs_alloc functions that require
it.

Sponsored By:	Isilon Systems, Inc.
2005-01-24 10:09:10 +00:00
Jeff Roberson
8e37fbad3a - Don't use atomic operations to deal with the active array, instead
it is now quite naturally protected by the ufsmount mutex.
 - Use the ufs lock to protect various fields in struct fs, primarily the
   cg summary needs protection to avoid allocation races.  Several
   functions have been slightly re-arranged to reduce the number of
   lock operations.
 - Adjust several functions (blkfree, freefile, etc.) to accept a
   ufsmount as an argument so that we may access the ufs lock.

Sponsored By:	Isilon Systems, Inc.
2005-01-24 10:08:35 +00:00
Jeff Roberson
5c77b03eff - Acquire the ufs lock when manipulating some fields of struct fs.
- Change arguments to various ffs functions to match their new
   prototypes.

Sponsored By:	Isilon Systems, Inc.
2005-01-24 10:04:22 +00:00
Jeff Roberson
f2aa1113a3 - Mark the struct fs members that require the ufsmount mutex.
- Define some macros for manipulating the fs_active bitmap.

Sponsored By:	Isilon Systems, Inc.
2005-01-24 10:03:17 +00:00
Jeff Roberson
aaee366929 - Change some function parameters so that the ufsmount structure is
accessable in places where the ufs lock will be needed.

Sponsored By:	Isilon Systems, Inc.
2005-01-24 10:02:11 +00:00
Jeff Roberson
751d0d9fc9 - Add a mutex to the ufsmount structure. This mutex is used to protect
any per-instance global data that is not already protected by a
   buf or vnode lock.  Presently, only fields in ffs's struct fs utilize
   this lock.
 - Sort some ufsmount members so that fields used for quotas are grouped
   together.  This is in anticipation of quota locking.

Sponsored By:	Isilon Systems, Inc.
2005-01-24 10:01:10 +00:00
Pawel Jakub Dawidek
39cfb23935 Fix ACLs handling for the root file system.
Without this fix, when ACLs are set via tunefs(8) on the root file system,
they are removed on boot when 'mount -a' is called, because mount(8)
called for the root file system always add MNT_UPDATE flag and MNT_UPDATE
flag isn't perfect.
Now, one cannot remove ACLs stored in superblock (configured with tunefs(8))
via 'mount -a' nor 'mount -u -o noacls <file system>', but it is still
possible to mount file system which doesn't have ACLs in superblock via
'mount -o acls <file system>' or /etc/fstab's 'acls' option.

Reported by:	Lech Lorens/pl.comp.os.bsd
Discussed with:	phk, rwatson
Reviewed by:	rwatson
MFC after:	2 weeks
2005-01-15 17:09:53 +00:00
Poul-Henning Kamp
7c0745eeae Eliminate unused and unnecessary "cred" argument from vinvalbuf() 2005-01-14 07:33:51 +00:00
Poul-Henning Kamp
e39db32ab0 Ditch vfs_object_create() and make the callers call VOP_CREATEVOBJECT()
directly.
2005-01-13 12:25:19 +00:00
Poul-Henning Kamp
6ef8480a88 Add BO_SYNC() and add a default which uses the secret vnode pointer
and VOP_FSYNC() for now.
2005-01-11 10:43:08 +00:00
Poul-Henning Kamp
0391e5a151 Wrap the bufobj operations in macros: BO_STRATEGY() and BO_WRITE() 2005-01-11 09:10:46 +00:00
Poul-Henning Kamp
8df6bac4c7 Remove the unused credential argument from VOP_FSYNC() and VFS_SYNC().
I'm not sure why a credential was added to these in the first place, it is
not used anywhere and it doesn't make much sense:

	The credentials for syncing a file (ability to write to the
	file) should be checked at the system call level.

	Credentials for syncing one or more filesystems ("none")
	should be checked at the system call level as well.

	If the filesystem implementation needs a particular credential
	to carry out the syncing it would logically have to the
	cached mount credential, or a credential cached along with
	any delayed write data.

Discussed with:	rwatson
2005-01-11 07:36:22 +00:00
Warner Losh
60727d8b86 /* -> /*- for license, minor formatting changes 2005-01-07 02:29:27 +00:00
Poul-Henning Kamp
a7e8286f28 white space 2004-12-14 21:35:00 +00:00
Poul-Henning Kamp
59d42685ad Implement simpler panics for VOP_{read,write} on fifos. 2004-12-14 21:30:45 +00:00
Warner Losh
7a7e867742 LINT defines things which compile in code that as referring to the old
a_desc element.  change this to the new a_gen.a_desc to reflect
changes to vnode_if.h generation.

Noticed by: tinderbox, phk
2004-12-13 17:53:20 +00:00
Poul-Henning Kamp
4a18054d7b With the introduction of UFS2 we started looking for superblocks in
four different locations on a prospective filesystem.

If we found none, we forgot to invalidate the four buffers, thus the
following sequence would fails:

	(md0 = blank disk)
	mount /dev/md0 /mnt
	(fails, no superblocks)
	newfs /dev/md0
	(writes using physio which does not go through buffercache).
	mount /dev/md0 /mnt
	(still fails, the four cached buffers still contain no superblocks)

Found by:	ru
2004-12-12 14:19:11 +00:00
Marcel Moolenaar
9effe51e45 Revert previous commit. The null-pointer function call (a dereference
on ia64) was not the result of a change in the vector operations. It
was caused by the NFS locking code using a FIFO and those bypassing
the vnode. This indirectly caused the panic. The NFS locking code has
been changed.

Requested by: phk
2004-12-11 23:05:30 +00:00
Kirk McKusick
364ed814e7 Fixes a bug that caused UFS2 filesystems bigger than 2TB to
prematurely report that they were full and/or to panic the kernel
with the message ``ffs_clusteralloc: allocated out of group''.

Submitted by:	Henry Whincup <henry@jot.to>
MFC after:	1 week
2004-12-09 21:24:00 +00:00
Poul-Henning Kamp
8f25bad356 Fix snapshot creation. 2004-12-08 11:54:06 +00:00
Poul-Henning Kamp
f21cc2cafc Fix nfs exports (for now). The real fix is to teach mountd about
nmount.
2004-12-07 15:09:30 +00:00
Poul-Henning Kamp
20a92a18f1 The remaining part of nmount/omount/rootfs mount changes. I cannot sensibly
split the conversion of the remaining three filesystems out from the root
mounting changes, so in one go:

cd9660:
	Convert to nmount.
	Add omount compat shims.
	Remove dedicated rootfs mounting code.
	Use vfs_mountedfrom()
	Rely on vfs_mount.c calling VFS_STATFS()

nfs(client):
	Convert to nmount (the simple way, mount_nfs(8) is still necessary).
	Add omount compat shims.
	Drop COMPAT_PRELITE2 mount arg compatibility.

ffs:
	Convert to nmount.
	Add omount compat shims.
	Remove dedicated rootfs mounting code.
	Use vfs_mountedfrom()
	Rely on vfs_mount.c calling VFS_STATFS()

Remove vfs_omount() method, all filesystems are now converted.

Remove MNTK_WANTRDWR, handling RO/RW conversions is a filesystem
task, and they all do it now.

Change rootmounting to use DEVFS trampoline:

vfs_mount.c:
	Mount devfs on /.  Devfs needs no 'from' so this is clean.
	symlink /dev to /.  This makes it possible to lookup /dev/foo.
	Mount "real" root filesystem on /.
	Surgically move the devfs mountpoint from under the real root
	filesystem onto /dev in the real root filesystem.

Remove now unnecessary getdiskbyname().

kern_init.c:
	Don't do devfs mounting and rootvnode assignment here, it was
	already handled by vfs_mount.c.

Remove now unused bdevvp(), addaliasu() and addalias().  Put the
few necessary lines in devfs where they belong.  This eliminates the
second-last source of bogo vnodes, leaving only the lemming-syncer.

Remove rootdev variable, it doesn't give meaning in a global context and
was not trustworth anyway.  Correct information is provided by
statfs(/).
2004-12-07 08:15:41 +00:00
Poul-Henning Kamp
743312367a VFS_STATFS(mp, ...) is mostly called with &mp->mnt_stat, but a few cases
doesn't.  Most of the implementations have grown weeds for this so they
copy some fields from mnt_stat if the passed argument isn't that.

Fix this the cleaner way:  Always call the implementation on mnt_stat
and copy that in toto to the VFS_STATFS argument if different.
2004-12-05 22:41:02 +00:00
Marcel Moolenaar
061f5ec825 Fix null-pointer indirect function calls introduced in the previous
commit. In the new world order, the transitive closure on the vector
operations is not precomputed. As such, it's unsafe to actually use
any of the function pointers in an indirect function call. They can
be null, and we need to use the default vector in that case.
This is mostly a quick fix for the four function pointers that are
ed explicitly. A more generic or scalable solution is likely to see
the light of day.

No pathos on: current@
2004-12-05 22:30:28 +00:00
Poul-Henning Kamp
93e0b506e3 typo in comment. 2004-12-03 20:36:55 +00:00
Poul-Henning Kamp
aec0fb7b40 Back when VOP_* was introduced, we did not have new-style struct
initializations but we did have lofty goals and big ideals.

Adjust to more contemporary circumstances and gain type checking.

	Replace the entire vop_t frobbing thing with properly typed
	structures.  The only casualty is that we can not add a new
	VOP_ method with a loadable module.  History has not given
	us reason to belive this would ever be feasible in the the
	first place.

	Eliminate in toto VOCALL(), vop_t, VNODEOP_SET() etc.

	Give coda correct prototypes and function definitions for
	all vop_()s.

	Generate a bit more data from the vnode_if.src file:  a
	struct vop_vector and protype typedefs for all vop methods.

	Add a new vop_bypass() and make vop_default be a pointer
	to another struct vop_vector.

	Remove a lot of vfs_init since vop_vector is ready to use
	from the compiler.

	Cast various vop_mumble() to void * with uppercase name,
	for instance VOP_PANIC, VOP_NULL etc.

	Implement VCALL() by making vdesc_offset the offsetof() the
	relevant function pointer in vop_vector.  This is disgusting
	but since the code is generated by a script comparatively
	safe.  The alternative for nullfs etc. would be much worse.

	Fix up all vnode method vectors to remove casts so they
	become typesafe.  (The bulk of this is generated by scripts)
2004-12-01 23:16:38 +00:00
Poul-Henning Kamp
6fde64c778 Mechanically change prototypes for vnode operations to use the new typedefs. 2004-12-01 12:24:41 +00:00
Poul-Henning Kamp
964ebefd8d Use system wide no-op vfs_start function. 2004-11-25 09:11:27 +00:00
Jeff Roberson
b646893f0f - Eliminate the acquisition and release of the bqlock in bremfree() by
setting the B_REMFREE flag in the buf.  This is done to prevent lock order
   reversals with code that must call bremfree() with a local lock held.
   This also reduces overhead by removing two lock operations per buf for
   fsync() and similar.
 - Check for the B_REMFREE flag in brelse() and bqrelse() after the bqlock
   has been acquired so that we may remove ourself from the free-list.
 - Provide a bremfreef() function to immediately remove a buf from a
   free-list for use only by NFS.  This is done because the nfsclient code
   overloads the b_freelist queue for its own async. io queue.
 - Simplify the numfreebuffers accounting by removing a switch statement
   that executed the same code in every possible case.
 - getnewbuf() can encounter locked bufs on free-lists once Giant is removed.
   Remove a panic associated with this condition and delay asserts that
   inspect the buf until after it is locked.

Reviewed by:	phk
Sponsored by:	Isilon Systems, Inc.
2004-11-18 08:44:09 +00:00
Poul-Henning Kamp
9c83534dd8 Make VOP_BMAP return a struct bufobj for the underlying storage device
instead of a vnode for it.

The vnode_pager does not and should not have any interest in what
the filesystem uses for backend.

(vfs_cluster doesn't use the backing store argument.)
2004-11-15 09:18:27 +00:00
Poul-Henning Kamp
51ac12ab28 Be prepared to accept NULL mountargs as part of root-mounting. 2004-11-13 13:04:31 +00:00
Poul-Henning Kamp
cf5e414960 Put back the vfs_object_create() calls, they do make a difference when
my test-setup does what I want it to instead of what I ask it to.

Pointed out by:	tegge
2004-11-12 10:27:14 +00:00
Poul-Henning Kamp
40ce27cb57 fix some comments 2004-11-10 06:53:31 +00:00
Poul-Henning Kamp
2e6649198a Use mount flags instead of NULL path to detect root filesystem mount. 2004-11-09 23:38:10 +00:00
Poul-Henning Kamp
5e2ccaff7a Stop pretending to have a vm_object backing the underlying disk vnode:
it isn't used for anything anywhere and the vnode_pager would explode
if we attempted to.
2004-11-09 23:12:45 +00:00
Poul-Henning Kamp
5349c79d75 Properly implement a default version of VOP_GETWRITEMOUNT.
Remove improper access to vop_stdgetwritemount() which should and
will instead rely on the VOP default path.
2004-11-06 11:41:22 +00:00
Poul-Henning Kamp
40c340aa5d Don't grab the exclusive bit on a root filesystem until we are willing
to mount it.  Doing so prevented fsck to be run after a refused mount.
2004-11-04 09:11:22 +00:00
Poul-Henning Kamp
4392001125 Move UFS from DEVFS backing to GEOM backing.
This eliminates a bunch of vnode overhead (approx 1-2 % speed
improvement) and gives us more control over the access to the storage
device.

Access counts on the underlying device are not correctly tracked and
therefore it is possible to read-only mount the same disk device multiple
times:
	syv# mount -p
	/dev/md0        /var    ufs rw  2 2
	/dev/ad0        /mnt    ufs ro  1 1
	/dev/ad0        /mnt2   ufs ro  1 1
	/dev/ad0        /mnt3   ufs ro  1 1

Since UFS/FFS is not a synchrousely consistent filesystem (ie: it caches
things in RAM) this is not possible with read-write mounts, and the system
will correctly reject this.

Details:

	Add a geom consumer and a bufobj pointer to ufsmount.

	Eliminate the vnode argument from softdep_disk_prewrite().
	Pick the vnode out of bp->b_vp for now.  Eventually we
	should find it through bp->b_bufobj->b_private.

	In the mountcode, use g_vfs_open() once we have used
	VOP_ACCESS() to check permissions.

	When upgrading and downgrading between r/o and r/w do the
	right thing with GEOM access counts.  Remove all the
	workarounds for not being able to do this with VOP_OPEN().

	If we are the root mount, drop the exclusive access count
	until we upgrade to r/w.  This allows fsck of the root
	filesystem and the MNT_RELOAD to work correctly.

	Set bo_private to the GEOM consumer on the device bufobj.

	Change the ffs_ops->strategy function to call g_vfs_strategy()

	In ufs_strategy() directly call the strategy on the disk
	bufobj.  Same in rawread.

	In ffs_fsync() we will no longer see VCHR device nodes, so
	remove code which synced the filesystem mounted on it, in
	case we came there.  I'm not sure this code made sense in
	the first place since we would have taken the specfs route
	on such a vnode.

	Redo the highly bogus readblock() function in the snapshot
	code to something slightly less bogus: Constructing an uio
	and using physio was really quite a detour.  Instead just
	fill in a bio and ship it down.
2004-10-29 10:15:56 +00:00
Poul-Henning Kamp
570a7ddaa3 We only support backing UFS/FFS with disks. 2004-10-28 06:19:28 +00:00
Poul-Henning Kamp
a40a512387 Eliminate unnecessary KASSERTS. 2004-10-27 06:45:06 +00:00
Poul-Henning Kamp
93d244fb1a KASSERT that we only get to prewrite() on writes. 2004-10-26 20:13:49 +00:00
Poul-Henning Kamp
8dd5650594 White space changes. Add missing static. 2004-10-26 20:13:21 +00:00
Poul-Henning Kamp
53389dd64a Replace single case switch() with if(). 2004-10-26 20:12:25 +00:00
Poul-Henning Kamp
b6e2606155 Vertically align comment. 2004-10-26 20:12:00 +00:00
Poul-Henning Kamp
6e77a04170 The island council met and voted buf_prewrite() home.
Give ffs it's own bufobj->bo_ops vector and create a private strategy
routine, (currently misnamed for forwards compatibility), which is
just a copy of the generic bufstrategy routine except we call
softdep_disk_prewrite() directly instead of through the buf_prewrite()
indirection.

Teach UFS about the need for softdep_disk_prewrite() and call the
function directly in FFS.

Remove buf_prewrite() from the default bufstrategy() and from the
global bio_ops method vector.
2004-10-26 10:44:10 +00:00
Poul-Henning Kamp
58883a1fe5 Fix syntax errors introduced by last commit.
Why isn't DIRECTIO in NOTES/LINT ?
2004-10-26 09:04:20 +00:00
Poul-Henning Kamp
5d9d81e7ea Put the I/O block size in bufobj->bo_bsize.
We keep si_bsize_phys around for now as that is the simplest way to pull
the number out of disk device drivers in devfs_open().  The correct solution
would be to do an ioctl(DIOCGSECTORSIZE), but the point is probably mooth
when filesystems sit on GEOM, so don't bother for now.
2004-10-26 07:39:12 +00:00
Poul-Henning Kamp
fae974f156 Degeneralize the per cdev copyonwrite callback. The only possible value
is ffs_copyonwrite() and the only place it can be called from is FFS which
would never want to call another filesystems copyonwrite method, should one
exist, so there is no reason why anything generic should know about this.
2004-10-26 06:25:56 +00:00
Poul-Henning Kamp
156cb26583 Loose the v_dirty* and v_clean* alias macros.
Check the count field where we just want to know the full/empty state,
rather than using TAILQ_EMPTY() or TAILQ_FIRST().
2004-10-25 09:14:03 +00:00
Poul-Henning Kamp
ee1d0eb330 Remove vnode->v_bsize. This was a dead-end. 2004-10-25 07:50:59 +00:00
Poul-Henning Kamp
b792bebeea Move the buffer method vector (buf->b_op) to the bufobj.
Extend it with a strategy method.

Add bufstrategy() which do the usual VOP_SPECSTRATEGY/VOP_STRATEGY
song and dance.

Rename ibwrite to bufwrite().

Move the two NFS buf_ops to more sensible places, add bufstrategy
to them.

Add inlines for bwrite() and bstrategy() which calls through
buf->b_bufobj->b_ops->b_{write,strategy}().

Replace almost all VOP_STRATEGY()/VOP_SPECSTRATEGY() calls with bstrategy().
2004-10-24 20:03:41 +00:00
Poul-Henning Kamp
494eb176e7 Add b_bufobj to struct buf which eventually will eliminate the need for b_vp.
Initialize b_bufobj for all buffers.

Make incore() and gbincore() take a bufobj instead of a vnode.

Make inmem() local to vfs_bio.c

Change a lot of VI_[UN]LOCK(bp->b_vp) to BO_[UN]LOCK(bp->b_bufobj)
also VI_MTX() to BO_MTX(),

Make buf_vlist_add() take a bufobj instead of a vnode.

Eliminate other uses of bp->b_vp where bp->b_bufobj will do.

Various minor polishing: remove "register", turn panic into KASSERT,
use new function declarations, TAILQ_FOREACH_SAFE() etc.
2004-10-22 08:47:20 +00:00
Poul-Henning Kamp
a76d8f4ec9 Move the VI_BWAIT flag into no bo_flag element of bufobj and call it BO_WWAIT
Add bufobj_wref(), bufobj_wdrop() and bufobj_wwait() to handle the write
count on a bufobj.  Bufobj_wdrop() replaces vwakeup().

Use these functions all relevant places except in ffs_softdep.c where
the use if interlocked_sleep() makes this impossible.

Rename b_vnbufs to b_bobufs now that we touch all the relevant files anyway.
2004-10-21 15:53:54 +00:00
Robert Watson
60c9762920 Explicitly break out NETA license from Berkeley license to clearly
indicate license grant, as well as to indicate that NETA is asserting
only two clauses, not four clauses.

Requested by:	imp
2004-10-20 08:05:02 +00:00
Nate Lawson
894d8d3c03 Fix fsbtodb() for UFS1. This fixes an overflow for file sizes >1 TB,
allowing for sizes up to 4 TB.  This doesn't affect UFS2 since b is already
a 64 bit type, coincidental with daddr_t.

Submitted by:	bde
2004-10-09 20:16:06 +00:00
Pawel Jakub Dawidek
8d02a378aa Back out changes which were introduced to delay mounting root file system.
Those changes were made on gmirror needs, but now gmirror handles this
by itself.
2004-10-05 11:26:43 +00:00
Poul-Henning Kamp
4f116178ba Remove support for accessing device nodes in UFS/FFS.
Device nodes can still be created and exported with NFS.
2004-09-28 13:30:58 +00:00
Poul-Henning Kamp
961da2716b Give cluster_write() an explicit vnode argument.
In the future a struct buf will not automatically point out a vnode for us.
2004-09-27 19:14:10 +00:00
Pawel Jakub Dawidek
5a19f8b0c4 Introduce new /boot/loader.conf variable: root_mount_delay.
It can be used to delay mounting root partition to give a chance to GEOM
providers to show up.
Now, when there is no needed provider, vfs_rootmount() function will look
for it every second and if it can't be find in defined time, it'll ask
for root device name (before this change it was done immediately).

This will allow to boot from gmirror device in degraded mode.
2004-09-23 10:13:18 +00:00
Poul-Henning Kamp
d705e025d0 The getpages VOP was a good stab at getting scatter/gather I/O without
too much kernel copying, but it is not the right way to do it, and it is
in the way for straightening out the buffer cache.

The right way is to pass the VM page array down through the struct
bio to the disk device driver and DMA directly in to/out off the
physical memory.  Once the VM/buf thing is sorted out it is next on
the list.

Retire most of vnode method. ffs_getpages().  It is not clear if what is
left shouldn't be in the default implementation which we now fall back to.

Retire specfs_getpages() as well, as it has no users now.
2004-09-19 08:14:55 +00:00
Poul-Henning Kamp
b08c753baa Do not traverse list of snapshots if there isn't one.
Found by:	scottl
2004-09-16 17:28:56 +00:00
Poul-Henning Kamp
b85e29f007 Missed a place where snapshots were allocated in my last commit to
this file.
2004-09-16 15:58:18 +00:00
Poul-Henning Kamp
67673e6677 Create struct snapdata which contains the snapshot fields from cdev
and the previously malloc'ed snapshot lock.

Malloc struct snapdata instead of just the lock.

Replace snapshot fields in cdev with pointer to snapdata (saves 16 bytes).

While here, give the private readblock() function a vnode argument
in preparation for moving UFS to access GEOM directly.
2004-09-13 07:29:45 +00:00
Poul-Henning Kamp
883d3c0c07 Remove the buffercache/vnode side of BIO_DELETE processing in
preparation for integration of p4::phk_bufwork.  In the future,
local filesystems will talk to GEOM directly and they will consequently
be able to issue BIO_DELETE directly.  Since the removal of the fla
driver, BIO_DELETE has effectively been a no-op anyway.
2004-09-13 06:50:42 +00:00
Poul-Henning Kamp
1affa3adc8 Create simple function init_va_filerev() for initializing a va_filerev
field.

Replace three instances of longhaired initialization va_filerev fields.

Added XXX comment wondering why we don't use random bits instead of
uptime of the system for this purpose.
2004-09-07 09:17:05 +00:00
Christian S.J. Peron
60088fb7b1 Currently, if the secure level is low enough, system flags can
be manipulated by prison root. In 4.x prison root can not manipulate
system flags, regardless of the security level. This behavior
should remain consistent to avoid any surprises which could lead
to security problems for system administrators which give out
privileged access to jails.

This commit changes suser_cred's flag argument from SUSER_ALLOWJAIL
to 0. This will prevent prison root from being able to manipulate
system flags on files.

This may be a MFC candidate for RELENG_5.

Discussed with:	cperciva
Reviewed by:	rwatson
Approved by:	bmilekic (mentor)
PR:		kern/70298
2004-08-22 02:03:41 +00:00
John Baldwin
b72ea57f3b Generalize the UFS bad magic value used to determine when a filesystem
has only been partly initialized via newfs(8) so that it applies to both
UFS1 and UFS2.

Submitted by:	"Xin LI" delphij at frontfree dot net
MFC:		maybe?
2004-08-19 11:09:13 +00:00
David Malone
da126abaf1 When looking for some extra data to include in the hash, use the
address of the dirhash, rather than the first sizeof(struct dirhash
*) bytes of the structure (which, thankfully, seem to be constant).

Submitted by:	Ted Unangst <tedu@zeitbombe.org>
MFC after:	2 weeks
2004-08-16 10:00:44 +00:00
John-Mark Gurney
ad3b9257c2 Add locking to the kqueue subsystem. This also makes the kqueue subsystem
a more complete subsystem, and removes the knowlege of how things are
implemented from the drivers.  Include locking around filter ops, so a
module like aio will know when not to be unloaded if there are outstanding
knotes using it's filter ops.

Currently, it uses the MTX_DUPOK even though it is not always safe to
aquire duplicate locks.  Witness currently doesn't support the ability
to discover if a dup lock is ok (in some cases).

Reviewed by:	green, rwatson (both earlier versions)
2004-08-15 06:24:42 +00:00
Poul-Henning Kamp
7ac439fec4 use bufdone() not biodone(). 2004-08-08 13:23:05 +00:00
Poul-Henning Kamp
5e8c582ac2 Put a version element in the VFS filesystem configuration structure
and refuse initializing filesystems with a wrong version.  This will
aid maintenance activites on the 5-stable branch.

s/vfs_mount/vfs_omount/

s/vfs_nmount/vfs_mount/

Name our filesystems mount function consistently.

Eliminate the namiedata argument to both vfs_mount and vfs_omount.
It was originally there to save stack space.  A few places abused
it to get hold of some credentials to pass around.  Effectively
it is unused.

Reorganize the root filesystem selection code.
2004-07-30 22:08:52 +00:00
Poul-Henning Kamp
d634f69316 Remove global variable rootdevs and rootvp, they are unused as such.
Add local rootvp variables as needed.

Remove checks for miniroot's in the swappartition.  We never did that
and most of the filesystems could never be used for that, but it had
still been copy&pasted all over the place.
2004-07-28 20:21:04 +00:00
Alexander Kabaev
b403319b8d Avoid using casts as lvalues. Introduce DIP_SET macro which sets proper
inode field based on UFS version. Use DIP ro read values and DIP_SET
to modify them throughout FFS code base.
2004-07-28 06:41:27 +00:00
Colin Percival
56f21b9d74 Rename suser_cred()'s PRISON_ROOT flag to SUSER_ALLOWJAIL. This is
somewhat clearer, but more importantly allows for a consistent naming
scheme for suser_cred flags.

The old name is still defined, but will be removed in a few days (unless I
hear any complaints...)

Discussed with:	rwatson, scottl
Requested by:	jhb
2004-07-26 07:24:04 +00:00
Poul-Henning Kamp
d8d3d4158b Make sure to update the mnt_stats before UFS1 extattr tried to
do I/O on the device.  Otherwise the blocksize is undefined in the
buffer cache.
2004-07-14 14:19:32 +00:00
Alfred Perlstein
f257b7a54b Make VFS_ROOT() and vflush() take a thread argument.
This is to allow filesystems to decide based on the passed thread
which vnode to return.
Several filesystems used curthread, they now use the passed thread.
2004-07-12 08:14:09 +00:00
Marcel Moolenaar
f65de26bf6 Update for the KDB debugger framework:
o  Make debugging code conditional upon KDB.
o  Use kdb_backtrace() instead of backtrace().
o  Remove inclusion of opt_ddb.h.
2004-07-10 20:45:47 +00:00
Poul-Henning Kamp
c94cd5fc8c Explicity initialize vp->v_bsize. 2004-07-07 20:04:06 +00:00
Poul-Henning Kamp
e3c5a7a4dd When we traverse the vnodes on a mountpoint we need to look out for
our cached 'next vnode' being removed from this mountpoint.  If we
find that it was recycled, we restart our traversal from the start
of the list.

Code to do that is in all local disk filesystems (and a few other
places) and looks roughly like this:

		MNT_ILOCK(mp);
	loop:
		for (vp = TAILQ_FIRST(&mp...);
		    (vp = nvp) != NULL;
		    nvp = TAILQ_NEXT(vp,...)) {
			if (vp->v_mount != mp)
				goto loop;
			MNT_IUNLOCK(mp);
			...
			MNT_ILOCK(mp);
		}
		MNT_IUNLOCK(mp);

The code which takes vnodes off a mountpoint looks like this:

	MNT_ILOCK(vp->v_mount);
	...
	TAILQ_REMOVE(&vp->v_mount->mnt_nvnodelist, vp, v_nmntvnodes);
	...
	MNT_IUNLOCK(vp->v_mount);
	...
	vp->v_mount = something;

(Take a moment and try to spot the locking error before you read on.)

On a SMP system, one CPU could have removed nvp from our mountlist
but not yet gotten to assign a new value to vp->v_mount while another
CPU simultaneously get to the top of the traversal loop where it
finds that (vp->v_mount != mp) is not true despite the fact that
the vnode has indeed been removed from our mountpoint.

Fix:

Introduce the macro MNT_VNODE_FOREACH() to traverse the list of
vnodes on a mountpoint while taking into account that vnodes may
be removed from the list as we go.  This saves approx 65 lines of
duplicated code.

Split the insmntque() which potentially moves a vnode from one mount
point to another into delmntque() and insmntque() which does just
what the names say.

Fix delmntque() to set vp->v_mount to NULL while holding the
mountpoint lock.
2004-07-04 08:52:35 +00:00
Robert Watson
12ec7658a4 Annotate that we don't check the returned data length from ufs_readdir()
because UFS uses fixed-size directory blocks.  When using this code with
other file systems, such as HFS+, the value of auio.uio_resid will need
to be taken into account.
2004-06-24 18:31:23 +00:00
Robert Watson
bb0527fdd3 Remove unnecessary setting of VV_SYSTEM on extended attribute backing
files.  When this flag is used in our port of this code to Darwin, it
caused remarkable pain, and doesn't offer a benefit in FreeBSD.
2004-06-24 18:17:41 +00:00
Robert Watson
00a460dcf4 Protect a non-text comment with a '-'. 2004-06-24 17:45:45 +00:00
Robert Watson
cd39d9b661 White space cleanup: use spaces instead of tabs in variable declarations
local to a function.  Remove a couple of blank lines in variable
declarations.

In one case, explicitly test against NULL rather than using a pointer
as a boolean directly.
2004-06-24 17:44:14 +00:00
Bruce Evans
8f7c483f5c Backed out previous commit. The dev_t -> `struct cdev *' changes have
lots of errors.  Blind substitution of "dev_t foo" by "struct cdev *foo"
in comments usually just created an English syntax error (e.g.,
"struct cdev *changes"), but here it did less than that since the dev_t
is a user dev_t.
2004-06-20 03:11:19 +00:00
Jun Kuriyama
86030e4a00 Avoid deadlock which is caused by locking VDIR of parent and VREG of
snapshot itself in wrong order.
We can skip unlink check of that directory because it must have
snapshot in it.

Reviewed by:	mckusick and current@
2004-06-18 14:35:17 +00:00
Poul-Henning Kamp
89c9c53da0 Do the dreaded s/dev_t/struct cdev */
Bump __FreeBSD_version accordingly.
2004-06-16 09:47:26 +00:00
Julian Elischer
fa88511615 Nice, is a property of a process as a whole..
I mistakenly moved it to the ksegroup when breaking up the process
structure. Put it back in the proc structure.
2004-06-16 00:26:31 +00:00
Stefan Farfeleder
1a5ff9285a Avoid assignments to cast expressions.
Reviewed by:	md5
Approved by:	das (mentor)
2004-06-08 13:08:19 +00:00
Tim J. Robbins
fa2a4d0595 Move TDF_DEADLKTREAT into td_pflags (and rename it accordingly) to avoid
having to acquire sched_lock when manipulating it in lockmgr(), uiomove(),
and uiomove_fromphys().

Reviewed by:	jhb
2004-06-03 01:47:37 +00:00
Kirill Ponomarev
b4a1d9299a - Fix typo
Approved by:	tobez
2004-05-31 16:55:12 +00:00
Ken Smith
4b14cc0205 Upon further review it was decided this piece of the msync(2)
fixes was applicable to HEAD, originally it was thought this
should only be done in RELENG_4.  Implement IO_INVAL in the vnode
op for writing by marking the buffer as "no cache".  This fix
has already been applied to RELENG_4 as Rev. 1.65.2.15 of
ufs/ufs/ufs_readwrite.c.

Reviewed by:	alc, tegge
2004-05-21 12:05:48 +00:00
Ken Smith
83d8045f16 Style fixup in previous commit.
Noticed by:	bde (thanks!)
2004-05-19 18:06:21 +00:00
Ken Smith
f7dd67d801 Change ffs_realloccg() to set the valid bits for the extended part of the
fragment to zero the valid parts of a VM_IO buffer.

RE would like this to be part of 4.10-RC3 so this will be MFC-ed immediately.

Reviewed by:	alc, tegge
2004-05-14 22:00:08 +00:00
Bosko Milekic
451079d4ab Revert previous change to this file because it breaks some
things which compare /etc/fstab entries to results from
getfsstat().  The real way to fix this is to make 'ufs2'
a recognized filesystem (for real, no beating around the
bush).

This should fix things like 'umount -a -t ufs' now.
Appologies for the previous breakage.
2004-04-29 15:10:42 +00:00
Bosko Milekic
2aebb586db The previous change to mount(8) to report ufs or ufs2 used
libufs, which only works for Charlie root.

This change reverts the introduction of libufs and moves the
check into the kernel.  Since the f_fstypename is the same
for both ufs and ufs2, we check fs_magic for presence of
ufs2 and copy "ufs2" explicitly instead.

Submitted by: Christian S.J. Peron <maneo@bsdpro.com>
2004-04-26 15:13:46 +00:00
Bruce Evans
f679aa45a7 Record where half the bits in this file came from (from ufs_readwrite.c).
Damage to history from moving bits was especially large since a repo copy
is not feasible for partial files.
2004-04-07 11:21:18 +00:00
Warner Losh
012d41340a Remove advertising clause from University of California Regent's
license, per letter dated July 22, 1999 and irc message from Robert
Watson saying that clause 3 can be removed from those files with an
NAI copyright that also have only a University of California
copyrights.

Approved by: core, rwatson
2004-04-07 03:47:21 +00:00
John Baldwin
255ec151e6 Fix a paste-o from the buf_prewrite() cleanup commit and check for the
MNTK_SUSPEND flag on the correct vnode pointer in softdep_disk_prewrite().

Reviewed by:	phk
Tested by:	kensmith
2004-04-06 19:20:24 +00:00
Maxime Henrion
b1fddb236f Fix the remaining warnings of growfs(8) on my sparc64 box with
WARNS=6.  I don't change the WARNS level in the Makefile because I
didn't tested this on other archs.

The fs.h fix was suggested by:	marcel
Reviewed by:	md5(1)
2004-04-03 23:30:59 +00:00
Alexander Kabaev
c355fd5a84 Avoid doing bawrite to initialize inode block while holding cylinder
group block locked. If filesystem has any active snapshots, bawrite
can come back trying to allocate new snapshot data block from the same
cylinder group and cause panic due to recursive lock attempt.

PR:		64206
Reviewed by:	mckusick
Tested by:	pjd
2004-03-16 22:06:32 +00:00
Poul-Henning Kamp
ceb58ca58f When I was a kid my work table was one cluttered mess an cleaning it up
were a rather overwhelming task.  I soon learned that if you don't know
where you're going to store something, at least try to pile it next to
something slightly related in the hope that a pattern emerges.

Apply the same principle to the ffs/snapshot/softupdates code which have
leaked into specfs:  Add yet a buf-quasi-method and call it from the
only two places I can see it can make a difference and implement the
magic in ffs_softdep.c where it belongs.

It's not pretty, but at least it's one less layer violated.
2004-03-11 18:50:33 +00:00
Poul-Henning Kamp
4d453ef101 Properly vector all bwrite() and BUF_WRITE() calls through the same path
and s/BUF_WRITE()/bwrite()/ since it now does the same as bwrite().
2004-03-11 18:02:36 +00:00
Kirk McKusick
ecef42e1eb A more accurate test in the new ufs_lock than that in 1.235. 2004-02-23 19:05:05 +00:00
Kirk McKusick
546a1660f0 In the function clear_inodedeps(), a FREE_LOCK() should be called
AFTER the call to vn_start_write(), not before it. Otherwise, it is
possible to unlock it multiple times if the vn_start_write() fails.

Submitted by:	Juergen Hannken-Illjes <hannken@eis.cs.tu-bs.de>
2004-02-23 06:56:31 +00:00
Kirk McKusick
6c053cec34 Change UFS from using vop_stdlock to using its own ufs_lock.
In ufs_lock, check for attempts to acquire shared locks on
snapshot files and change them to be exclusive locks. This
change eliminates deadlocks and machine lockups reported in
-current since most read requests started using shared lock
requests.

Submitted by:	Jun Kuriyama <kuriyama@imgsrc.co.jp>
2004-02-23 06:40:17 +00:00
Robert Watson
f6a4109212 Update my personal copyrights and NETA copyrights in the kernel
to use the "year1-year3" format, as opposed to "year1, year2, year3".
This seems to make lawyers more happy, but also prevents the
lines from getting excessively long as the years start to add up.

Suggested by:	imp
2004-02-22 00:33:12 +00:00
David Malone
346180de08 Abstract dirhash's locking using macros. This should make it easier to
use the same dirhash code on different branches/platforms.

Reviewed by:	Ted Unangst <tedu@zeitbombe.org>
Reviewed by:	iedowse
MFC after:	3 weeks
2004-02-15 21:39:35 +00:00
Bruce Evans
e9827c6d93 Fixed some style bugs:
- don't unlock the vnode after vinvalbuf() only to have to relock it
  almost immediately.
- don't refer to devices classified by vn_isdisk() as block devices.
2004-02-14 04:41:13 +00:00
Bruce Evans
0efb13948d MFextfs: backed out secondary changes in rev.1.40 that had become just
style bugs (a variable that is used only once, and misformattings).
2004-02-13 03:05:12 +00:00
Jun Kuriyama
df1941fb59 Fix style bugs in previous commit.
Submitted by:	bde
2004-02-13 02:02:06 +00:00
Bruce Evans
8adff5fc12 Fixed some minor style bugs (English usage and formatting of binary
operators) in and near revs.1.169-1.170 (open mode bandaid).  This
(or better a proper fix) should have been done before cloning the
bandaid to many other file systems.
2004-02-12 16:52:24 +00:00
Jun Kuriyama
5580f04ab0 Reverse lock order by using local variable. This will shut up "acquiring
duplicate lock of same type" message.

Reviewed by:	mckusick
2004-02-12 08:52:08 +00:00
Bruce Evans
1723bc36ef Removed more vestiges of vfs_ioopt:
- rev.1.42 of ffs_readwrite.c added a special case in ffs_read() for reads
  that are initially at EOF, and rev.1.62 of ufs_readwrite.c fixed
  timestamp bugs in it.  Removal of most of vfs_ioopt made it just and
  optimization, and removal of the vm object reference calls made it less
  than an optimization.  It was cloned in rev.1.94 of ufs_readwrite.c as
  part of cloning ffs_extwrite() although it was always less than an
  optimization in ffs_extwrite().
- some comments, compound statements and vertical whitespace were vestiges
  of dead code.
2004-02-11 15:27:26 +00:00
John Baldwin
91d5354a2c Locking for the per-process resource limits structure.
- struct plimit includes a mutex to protect a reference count.  The plimit
  structure is treated similarly to struct ucred in that is is always copy
  on write, so having a reference to a structure is sufficient to read from
  it without needing a further lock.
- The proc lock protects the p_limit pointer and must be held while reading
  limits from a process to keep the limit structure from changing out from
  under you while reading from it.
- Various global limits that are ints are not protected by a lock since
  int writes are atomic on all the archs we support and thus a lock
  wouldn't buy us anything.
- All accesses to individual resource limits from a process are abstracted
  behind a simple lim_rlimit(), lim_max(), and lim_cur() API that return
  either an rlimit, or the current or max individual limit of the specified
  resource from a process.
- dosetrlimit() was renamed to kern_setrlimit() to match existing style of
  other similar syscall helper functions.
- The alpha OSF/1 compat layer no longer calls getrlimit() and setrlimit()
  (it didn't used the stackgap when it should have) but uses lim_rlimit()
  and kern_setrlimit() instead.
- The svr4 compat no longer uses the stackgap for resource limits calls,
  but uses lim_rlimit() and kern_setrlimit() instead.
- The ibcs2 compat no longer uses the stackgap for resource limits.  It
  also no longer uses the stackgap for accessing sysctl's for the
  ibcs2_sysconf() syscall but uses kernel_sysctl() instead.  As a result,
  ibcs2_sysconf() no longer needs Giant.
- The p_rlimit macro no longer exists.

Submitted by:	mtm (mostly, I only did a few cleanups and catchups)
Tested on:	i386
Compiled on:	alpha, amd64
2004-02-04 21:52:57 +00:00
Alan Cox
bfb7317ebf Remove unnecessary vm object reference and deallocate calls from ffs_read()
and ffs_write().  These calls trace their origins to the dead vfs_ioopt
code, first appearing in revision 1.39 of ufs_readwrite.c.

Observed by:	bde
Discussed with:	tegge
2004-01-31 05:42:58 +00:00
Andrey A. Chernov
a0036d23a6 Turn uio_resid/uio_offset comments into KASSERTs
Reviewed by:    bde
2004-01-27 11:28:38 +00:00
Andrey A. Chernov
51cf017614 Copy comment about caller check from ffs_read to ffs_extread, don't
check for uio_resid < 0 here too.
2004-01-23 06:00:41 +00:00
Andrey A. Chernov
070f8eefb1 Fix various panic() strings to reflect true function name to allow
easy grep.
Small code reorganization to look more logic.
Copy ffs_write check from prev. commit to ffs_extwrite.
2004-01-23 05:52:31 +00:00
Andrey A. Chernov
bd0cc17757 ffs_read:
Replace wrong check returned EFBIG with EOVERFLOW handling from POSIX:

36708 [EOVERFLOW] The file is a regular file, nbyte is greater than 0, the
starting position is before the end-of-file, and the starting position is
greater than or equal to the offset maximum established in the open file
description associated with fildes.

ffs_write:
Replace u_int64_t cast with uoff_t cast which is more natural for types
used.

ffs_write & ffs_read:
Remove uio_offset and uio_resid checks for negative values, the caller
supposed to do it already. Add comments about it.

Reviewed by:    bde
2004-01-23 05:38:02 +00:00
Alexander Kabaev
6bd39fe978 Spell magic '16' number as IO_SEQSHIFT. 2004-01-19 20:03:43 +00:00
Alexander Kabaev
291027ce9c Avoid calling vprint on a vnode while holding its interlock mutex.
Move diagnostic printf after vget. This might delay the debug
output some, but at least it keeps kernel from exploding if
DEBUG_VFS_LOCKS is in effect.
2004-01-04 04:08:34 +00:00
Don Lewis
31c81e4bed Set fs_ronly to the correct value in ffs_reload() when reloading the file
system super block after fsck has repaired the file system.  The value of
fs_ronly was getting overwritten, which caused ffs_update() to attempt to
update inode timestamps even though the file system was still mounted
read-only.

This fixes the "giving up on N buffers" error that is triggered by running
fsck on the root file system and then rebooting without mounting the file
system read-write.
2003-12-07 05:16:52 +00:00
Wes Peters
ec52df8eb9 Write the UFS2 superblock with a 'BAD' magic number at the beginning
of newfs, to signify the newfs operation has not yet completed.  Re-
write the superblock with the correct magic number once all of the
cylinder groups have been created to show the operation has finished.

Sponsored by:	St. Bernard Software
2003-11-16 07:08:27 +00:00
Poul-Henning Kamp
00cbe31bd8 Send B_PHYS out to pasture, it no longer serves any function. 2003-11-15 09:28:09 +00:00
Alan Cox
c78b8dfacf Call free(9) after the vnode interlock is released, avoiding a lock-order
reversal.
2003-11-13 03:56:32 +00:00
Kirk McKusick
fde81c7d8e Update the statfs structure with 64-bit fields to allow
accurate reporting of multi-terabyte filesystem sizes.

You should build and boot a new kernel BEFORE doing a `make world'
as the new kernel will know about binaries using the old statfs
structure, but an old kernel will not know about the new system
calls that support the new statfs structure. Running an old kernel
after a `make world' will cause programs such as `df' that do a
statfs system call to fail with a bad system call.

Reviewed by:	Bruce Evans <bde@zeta.org.au>
Reviewed by:	Tim Robbins <tjr@freebsd.org>
Reviewed by:	Julian Elischer <julian@elischer.org>
Reviewed by:	the hoards of <arch@freebsd.org>
Sponsored by:   DARPA & NAI Labs.
2003-11-12 08:01:40 +00:00
Alexander Kabaev
ca430f2e92 Remove mntvnode_mtx and replace it with per-mountpoint mutex.
Introduce two new macros MNT_ILOCK(mp)/MNT_IUNLOCK(mp) to
operate on this mutex transparently.

Eventually new mutex will be protecting more fields in
struct mount, not only vnode list.

Discussed with: jeff
2003-11-05 04:30:08 +00:00
Alexander Kabaev
45d45c6cde Use VOP_UNLOCK/vrele instead of vput. td was erecived as a parameter
and one cannot be sure it is equal to curthread.
2003-11-03 04:46:19 +00:00
Alexander Kabaev
cb9ddc80ae Take care not to call vput if thread used in corresponding vget
wasn't curthread, i.e. when we receive a thread pointer to use
as a function argument. Use VOP_UNLOCK/vrele in these cases.

The only case there td != curthread known at the moment is
boot() calling sync with thread0 pointer.

This fixes the panic on shutdown people have reported.
2003-11-02 04:52:53 +00:00
Alexander Kabaev
492c1e68fb Temporarily undo parts of the stuct mount locking commit by jeff.
It is unsafe to hold a mutex across vput/vrele calls.

This will be redone when a better locking strategy is agreed upon.

Discussed with: jeff
2003-11-01 05:51:54 +00:00
Don Lewis
9f206707a5 Tweak the calculation of minbfree in ffs_dirpref() so that only
those cylinder groups that have at least 75% of the average free
space per cylinder group for that file system are considered as
candidates for the creation of a new directory.  The previous formula
for minbfree would set it to zero if the file system was more than
75% full, which allowed cylinder groups with no free space at all
to be chosen as candidates for directory creation, which resulted
in an expensive search for free blocks for each file that was
subsequently created in that directory.

Modify the calculation of minifree in the same way.

Decrease maxcontigdirs as the file system fills to decrease the
likelyhood that a cluster of directories will overflow the available
space in a cylinder group.

Reviewed by:	mckusick
Tested by:	kmarx@vicor.com
MFC after:	2 weeks
2003-10-31 07:25:06 +00:00
John Baldwin
787f162df6 Move the P_COWINPROGRESS flag from being a per-process p_flag to being a
per-thread td_pflag which doesn't require any locks to read or write as it
is only read or written by curthread on itself.

Glanced at by:	mckusick
2003-10-23 21:14:08 +00:00
Tor Egge
f0da6ec99b Initialize bp->b_offset to the physical offset in partition
so GEOM knows where to read from disk.
2003-10-22 18:57:59 +00:00
Poul-Henning Kamp
2c18019f14 DuH!
bp->b_iooffset (the spot on the disk), not bp->b_offset (the offset in
the file)
2003-10-18 14:10:28 +00:00
Poul-Henning Kamp
4e1694ecaf Initialize bp->b_offset before calling VOP_[SPEC]STRATEGY() 2003-10-18 11:16:33 +00:00
Kirk McKusick
bd189c8c3e When expunging unlinked files from a snapshot, skip over holes in the
file rather than panicing with "indiracct: botched params".

Submitted by:	Mark Santcroos <marks@ripe.net>
2003-10-17 13:57:58 +00:00
Jeff Roberson
a844eb934c - My last commit to this file is still not safe, I believe that it may be
due to the recursion in indir_trunc().
2003-10-06 03:28:03 +00:00
Jeff Roberson
8af6a57099 - Reinstate 1.142 this was fixed by 1.144. 2003-10-06 02:39:37 +00:00
Jeff Roberson
69b609a85d - The VCHR case in ffs_sync() is an unneccsary optimization especially
considering how infrequently we access devices via ffs now that we have
   devfs.   Collapse this case with the other case.

Obtained from:	bde
2003-10-05 22:56:33 +00:00
Jeff Roberson
ab1f917b53 - Further simplify ffs_sync(). The vnode lock is required for UFS_UPDATE()
so make the code slightly more uniform.  The vnode lock is acquired in
   all cases and now the only difference between VCHR and other is we
   call UFS_UPDATE instead of VOP_FSYNC().
2003-10-05 09:42:24 +00:00
Jeff Roberson
cffa37d466 - In ffs_update() assert that either the vnode lock or the XLOCK is held. 2003-10-05 09:39:02 +00:00
Jeff Roberson
2f05568aa8 - Check the XLOCK before inspecting v_data.
- Slightly rewrite the fsync loop to be more lock friendly.  We must
   acquire the vnode interlock before dropping the mnt lock.  We must
   also check XLOCK to prevent vclean() races.
 - Use LK_INTERLOCK in the vget() in ffs_sync to further prevent vclean()
   races.
 - Use a local variable to store the results of the nvp == TAILQ_NEXT
   test so that we do not access the vp after we've vrele()d it.
 - Add an XXX comment about UFS_UPDATE() not being protected by any lock
   here.  I suspect that it should need the VOP lock.
2003-10-05 07:16:45 +00:00
Jeff Roberson
53938b4a86 - Skip over xvp if XLOCK is set. 2003-10-05 06:48:37 +00:00
Jeff Roberson
5c014b9d6d - Don't cache_purge() in ufs_reclaim. vclean() does it for us so
this is redundant.
2003-10-05 02:45:00 +00:00
Alan Cox
ccf78b6895 Synchronize access to a vm page's valid field using the containing
vm object's lock.
2003-10-04 20:38:32 +00:00
Jeff Roberson
cac3558da3 - The VI assert in getdirtybuf() is only valid if we're not on a VCHR
vnode.  VCHR vnodes don't do background writes.

Reported by:	kan
2003-10-04 15:57:05 +00:00
Jeff Roberson
04a17687ea - Increase the scope of the interlock in ffs_reload(). Acquire it before
we release the mntvnode_mtx.
 - Call vgonel() directly instead of going through vrecycle() since we own
   the interlock now.
 - Remove a few cases where we locked the interlock just so that we could
   call VOP_UNLOCK with interlock held.
2003-10-04 14:27:49 +00:00
Jeff Roberson
934914d2ef - Fix an unlocked call to GETATTR by slightly shuffling the code in
ffs_snapshot() around.
 - Acquire the interlock before releasing the mntvnode_mtx.  Use the
   interlock to protect v_usecount access.
2003-10-04 14:25:45 +00:00
Jeff Roberson
90e1659e41 - Use the VI_LOCK macro in two places where we directly called mtx_lock()
before.  Direct calls indicated places that needed review and these have
   now been reviewed.
2003-10-04 14:03:28 +00:00
Jeff Roberson
8f2e9e4388 - Properly acquire the vnode interlock before releasing the
mntvnode_mtx.
 - Use a local variable to store the results of the test to see if the
   next vnode on the mount list has changed.  This is so that we no longer
   acess the vnode after we vput() it.
2003-10-04 14:02:32 +00:00
Jeff Roberson
04c81ad83c - Remove a mp_fixme() and some locks that weren't necessary. I now
understand how this works.
2003-10-04 11:06:43 +00:00
Jeff Roberson
cfd5600c66 - Several of the callers to getdirtybuf() were erroneously changed to pass
in a list head instead of a pointer to the first element at the time of
   the first call.  These lists are subject to change, and getdirtybuf()
   would refetch from the wrong list in some cases.

Spottedy by:	tegge
Pointy hat to:	me
2003-09-03 04:08:15 +00:00
Jeff Roberson
23efe6dafc - Backout rev 1.142. This caused a deadlock that I do not understand. More
investigation is required.
2003-08-31 11:26:52 +00:00
Jeff Roberson
d919a11d06 - Define a new flag for getblk(): GB_NOCREAT. This flag causes getblk() to
bail out if the buffer is not already present.
 - The buffer returned by incore() is not locked and should not be sent to
   brelse().  Use getblk() with the new GB_NOCREAT flag to preserve the
   desired semantics.
2003-08-31 08:50:11 +00:00
Jeff Roberson
a0ebaaddef - Don't acquire the vnode interlock in drain_output(). Instead, require the
caller to acquire it.  This permits drain_output() to be done atomically
   with other operations as well as reducing the number of lock operations.
 - Assert that the proper locks are held in drain_output().
 - Change getdirtybuf() to accept a mutex as an argument.  This mutex is used
   to protect the vnode's buf list and the BKGRDWAIT flag.  This lock is
   dropped when we successfully acquire a buffer and held on return
   otherwise.  These semantics reduce the number of cumbersome cases in
   calling code.
 - Pass the mtx from getdirtybuf() into interlocked_sleep() and allow this
   mutex to be used as the interlock argument to BUF_LOCK() in the LOCKBUF
   case of interlocked_sleep().
 - Change the return value of getdirtybuf() to be the resulting locked buffer
   or NULL otherwise.  This is for callers who pass in a list head that
   requires a lock.  It is necessary since the lock that protects the list
   head must be dropped in getdirtybuf() so that we don't have a lock order
   reversal with the buf queues lock in bremfree().
 - Adjust all callers of getdirtybuf() to match the new semantics.
 - Add a comment in indir_trunc() that points at unlocked access to a buf.
   This may also be one of the last instances of incore() in the tree.
2003-08-31 07:29:34 +00:00
Jeff Roberson
9dbfeb0ae6 - Move BX_BKGRDWAIT and BX_BKGRDINPROG to BV_ and the b_vflags field.
- Surround all accesses of the BKGRD{WAIT,INPROG} flags with the vnode
   interlock.
 - Don't use the B_LOCKED flag and QUEUE_LOCKED for background write
   buffers.  Check for the BKGRDINPROG flag before recycling or throwing
   away a buffer.  We do this instead because it is not safe for us to move
   the original buffer to a new queue from the callback on the background
   write buffer.
 - Remove the B_LOCKED flag and the locked buffer queue.  They are no longer
   used.
 - The vnode interlock is used around checks for BKGRDINPROG where it may
   not be strictly necessary.  If we hold the buf lock the a back-ground
   write will not be started without our knowledge, one may only be
   completed while we're not looking.  Rather than remove the code, Document
   two of the places where this extra locking is done.  A pass should be
   done to verify and minimize the locking later.
2003-08-28 06:55:18 +00:00
Alan Cox
9cf8f2f707 The previous change necessitates the addition of a new #include. Otherwise,
there is a compilation warning.
2003-08-18 17:27:08 +00:00
Poul-Henning Kamp
b103854847 Don't use a VOP_*() function on our own vnodes, go directly to the
relevant internal function, in this case ufs_bmaparray().
2003-08-17 19:26:03 +00:00
Alan Cox
f6c098e569 Revision 1.44 of ufs/ufs/inode.h has made it necessary to add two new
#includes to this file.  Otherwise, it doesn't compile.
2003-08-16 06:15:17 +00:00
Poul-Henning Kamp
5c24d6ee26 Eliminate the i_devvp field from the incore UFS inodes, we can
get the same value from ip->i_ump->um_devvp.

This saves a pointer in the memory copies of inodes, which can
easily run into several hundred kilobytes.

The extra indirection is unmeasurable in benchmarks.

Approved by:	mckusick
2003-08-15 20:03:19 +00:00
John Baldwin
8b149b5131 Consistently use the BSD u_int and u_short instead of the SYSV uint and
ushort.  In most of these files, there was a mixture of both styles and
this change just makes them self-consistent.

Requested by:	bde (kern_ktrace.c)
2003-08-07 15:04:27 +00:00
Robert Watson
2495048579 Now that the central POSIX.1e ACL code implements functions to
generate the inode mode from a default ACL and creation mask,
implement ufs_sync_inode_from_acl() using acl_posix1e_newfilemode().

Since ACL_OVERRIDE_MASK/ACL_PRESERVE_MASK are defined, we no
longer need to explicitly pass in a "preserve_mask" field: this
is implicit in the use of POSIX.1e semantics.

Note: this change contains a semantic bugfix for new file creation:
we now intersect the ACL-generated mode and the cmode requested by
the user process.  This means permissions on newly created file
objects will now be more conservative.  In the future, we may want
to provide alternative semantics (similar to Solaris and Linux) in
which the ACL mask overrides the umask, permitting ACLs to broaden
the rights beyond the requested umask.

PR:		50148
Reported by:	Ritz, Bruno <bruno_ritz@gmx.ch>
Obtained from:	TrustedBSD Project
2003-08-04 03:29:13 +00:00
Robert Watson
7942b925b8 In ufs_chmod(), use privilege only when required in the following
cases:

- Setting sticky bit on non-directory
- Setting setgid on a file with a group that isn't in the effective
  or extended groups of the authorizing credential

I.e., test the requirement first, then do the privilege test,
rather than doing the privilege test regardless of the need for
privilege.

Obtained from:	TrustedBSD Project
Sponsored by:	DARPA, Network Associates Laboratories
2003-08-04 00:31:01 +00:00
Robert Watson
9080ff25cf Rename VOP_RMEXTATTR() to VOP_DELETEEXTATTR() for consistency with the
kernel ACL interfaces and system call names.

Break out UFS2 and FFS extattr delete and list vnode operations from
setextattr and getextattr to deleteextattr and listextattr, which
cleans up the implementations, and makes the results more readable,
and makes the APIs more clear.

Obtained from:	TrustedBSD Project
Sponsored by:	DARPA, Network Associates Laboratories
2003-07-28 18:53:29 +00:00
Poul-Henning Kamp
7c89f162bc Add fdidx argument to vn_open() and vn_open_cred() and pass -1 throughout. 2003-07-27 17:04:56 +00:00
Poul-Henning Kamp
a8d43c90af Add a "int fd" argument to VOP_OPEN() which in the future will
contain the filedescriptor number on opens from userland.

The index is used rather than a "struct file *" since it conveys a bit
more information, which may be useful to in particular fdescfs and /dev/fd/*

For now pass -1 all over the place.
2003-07-26 07:32:23 +00:00
Poul-Henning Kamp
b941a2beb7 We just cached the inode pointer, no need to call VTOI() again. 2003-07-04 12:16:33 +00:00
Alan Cox
4e28b22e35 Lock the vm object when freeing pages. 2003-06-15 21:50:38 +00:00
Poul-Henning Kamp
cefb5754dd Add the same KASSERT to all VOP_STRATEGY and VOP_SPECSTRATEGY implementations
to check that the buffer points to the correct vnode.
2003-06-15 18:53:00 +00:00
Robert Watson
44533b1722 Re-implement kernel access control for quotactl() as found in the
UFS quota implementation.  Push some quite broken access control
logic out of ufs_quotactl() into the individual command
implementations in ufs_quota.c; fix that logic.  Pass in the thread
argument to any quotactl command that will need to perform access
control.

o quotaon() requires privilege (PRISON_ROOT).

o quotaoff() requires privilege (PRISON_ROOT).

o getquota() requires that:

    If the type is USRQUOTA, either the effective uid match the
    requested quota ID, that the unprivileged_get_quota flag be
    set, or that the thread be privileged (PRISON_ROOT).

    If the type is GRPQUOTA, require that either the thread be
    a member of the group represented by the requested quota ID,
    that the unprivileged_get_quota flag be set, or that the
    thread be privileged (PRISON_ROOT).

o setquota() requires privilege (PRISON_ROOT).

o setuse() requires privilege (PRISON_ROOT).

o qsync() requires no special privilege (consistent with what
  was present before, but probably not very useful).

Add a new sysctl, security.bsd.unprivileged_get_quota, which when
set to a non-zero value, will permit unprivileged users to query user
quotas with non-matching uids and gids.  Set this to 0 by default
to be mostly consistent with the previous behavior (the same for
USRQUOTA, but not for GRPQUOTA).

Obtained from:	TrustedBSD Project
Sponsored by:	DARPA, Network Associates Laboratories
2003-06-15 06:36:19 +00:00
Poul-Henning Kamp
7652131bee Initialize struct vfsops C99-sparsely.
Submitted by:   hmp
Reviewed by:	phk
2003-06-12 20:48:38 +00:00
David E. O'Brien
f4636c5959 Use __FBSDID(). 2003-06-11 06:34:30 +00:00
Robert Watson
1e9e2eb598 Implement ffs_listextattr() by breaking out that logic and special-cased
attribute name of "" from ffs_getextattr().  Invoking VOP_GETETATTR()
with an empty name is now no longer supported; user application
compatibility is provided by a system call level compatibility
wrapper.  We make sure to explicitly reject attempts to set an EA
with the name "".

Obtained from:	TrustedBSD Project
Sponsored by:	DARPA, Network Associates Laboratories
2003-06-05 05:57:39 +00:00
Robert Watson
bd38ab57a1 Don't special-case handling of the empty string in the UFS1
extended attribute retrieval code: it's no longer special-cased,
and is caught by the normal UFS1 EA validity checks (and, in
fact, returns the same error, EINVAL).

Obtained from:	TrustedBSD Project
Sponsored by:	DARPA, Network Associates Laboratories
2003-06-05 04:58:58 +00:00
Robert Watson
e1249def7d Return EOPNOTSUPP for attempted EA operations on VCHR vnodes in UFS2;
if we permit them to occur, the kernel panics due to our performing
EA operations using VOP_STRATEGY on the vnode.  This went unnoticed
previously because there are very for users of device nodes on UFS2
due to the introduction of devfs.  However, this can come up with
the Linux compat directories and its hard-coded dev nodes (which will
need to go away as we move away from hard-coded device numbers).
This can come up if you use EA-intensive features such as ACLs and
MAC.

The proper fix is pretty complicated, but this band-aid would be
an excellent MFC candidate for the release.
2003-06-01 02:42:18 +00:00
Poul-Henning Kamp
61301f74d0 Remove unused variable.
Found by:       FlexeLint
2003-05-31 19:56:09 +00:00
Poul-Henning Kamp
6280ed26af Remove unused local variables.
Found by:       FlexeLint
2003-05-31 18:17:32 +00:00
Poul-Henning Kamp
17a1391990 The IO_NOWDRAIN and B_NOWDRAIN hacks are no longer needed to prevent
deadlocks with vnode backed md(4) devices because md now uses a
kthread to run the bio requests instead of doing it directly from
the bio down path.
2003-05-31 16:42:45 +00:00
Alan Cox
7f758dabbb Lock the vm object when performing vm_object_page_clean().
Approved by:	re (rwatson)
2003-05-18 22:02:51 +00:00
Robert Watson
62d4b85ec1 Jeff added locking assertions that the VV_ flags on vnodes were modified
only while holding appropriate vnode locks.  This patch slides the lock
release for ufs_extattr_enable() to continue to hold the active vnode lock
on a backing file until after the flag change; it also acquires a vnode
lock when disabling an attribute and hence clearing a flag on the backing
vnode.  This permits VFS_DEBUG_LOCKS to run UFS1 extended attributes
without panicking, as well as preventing a potential race and vnode flag
problem.

Approved by:	re (jhb)
Pointed out by:	DEBUG_VFS_LOCKS
2003-05-15 21:07:33 +00:00
Alan Cox
ad682c4825 Lock the vm_object on entry to vm_object_vndeallocate(). 2003-05-03 20:28:26 +00:00
Tim J. Robbins
3632928957 Do not attempt to free NULL dinodes (i_din1 or i_din2) in ffs_ifree().
These fields can be left as NULL if ffs_vget() allocates an inode but
fails before the dinode memory has been allocated. There are two cases
when this can occur: when we lose a race and another process has added
the inode to the hash, and when reading the inode off disk fails.

The bug was observed by Kris on one of the package-building machines.
See http://marc.theaimsgroup.com/?l=freebsd-current&m=105172731013411&w=2
In Kris's case, it was the bread() that failed because of a disk error.

The alternative to this patch is to ensure that ffs_vget() does not call
vput() when the inode that hasn't been properly initialised.
2003-05-01 06:41:59 +00:00
Tim J. Robbins
8d721e877d Free i_din2 instead of i_din1 in ffs_ifree() on UFS2 filesystems.
This is purely a cosmetic change because these members are in a
union together.
2003-05-01 06:38:27 +00:00
Mark Murray
51da11a27a Fix some easy, global, lint warnings. In most cases, this means
making some local variables static. In a couple of cases, this means
removing an unused variable.
2003-04-30 12:57:40 +00:00
Alexander Kabaev
104a9b7e3e Deprecate machine/limits.h in favor of new sys/limits.h.
Change all in-tree consumers to include <sys/limits.h>

Discussed on:	standards@
Partially submitted by: Craig Rodrigues <rodrigc@attbi.com>
2003-04-29 13:36:06 +00:00
John Baldwin
a15cc35909 Lock both the proc lock and sched_lock when calling sched_nice since
kg_nice is now protected by both.  Being protected by both means that
other places in the kernel that want to read kg_nice only need one of the
two locks.
2003-04-22 20:45:38 +00:00
Jeff Roberson
86711bae9b - Use the sched_nice() api instead of setting the nice value directly.
Tested by:	Steve Kargl <sgk@troutmask.apl.washington.edu>
2003-04-12 01:05:19 +00:00
Alan Cox
6134838f99 Sufficient access checks are performed by vmapbuf() that calling useracc()
is pointless.  Remove the call to useracc().

Don't reinitialize fields that are already initialized by getpbuf().

Reviewed by:	tegge
2003-04-06 19:26:30 +00:00
Tor Egge
5e2e6a67c4 Check return value from vmapbuf instead of the function address. 2003-03-27 20:48:34 +00:00
Tor Egge
10dccf8ff2 Eliminate a buffer sleep/wakeup race. 2003-03-27 19:28:11 +00:00
Tor Egge
5bbb806004 Add support for reading directly from file to userland buffer when the
O_DIRECT descriptor status flag is set and both offset and length is a
multiple of the physical media sector size.
2003-03-26 23:40:42 +00:00
John Baldwin
31566c96f4 Use td->td_ucred instead of td->td_proc->p_ucred. 2003-03-20 21:17:40 +00:00
John Baldwin
2a53bfbe62 Minor fixes to ffs_fserr():
- Assume that curthread is not NULL.  It never is in -current.
- Use td_ucred instead of p_ucred.
2003-03-20 21:15:54 +00:00
Poul-Henning Kamp
b4b138c27f Including <sys/stdint.h> is (almost?) universally only to be able to use
%j in printfs, so put a newsted include in <sys/systm.h> where the printf
prototype lives and save everybody else the trouble.
2003-03-18 08:45:25 +00:00
Jeff Roberson
09f11da5a3 - Remove a race between fsync like functions and flushbufqueues() by
requiring locked bufs in vfs_bio_awrite().  Previously the buf could
   have been written out by fsync before we acquired the buf lock if it
   weren't for giant.  The cluster_wbuild() handles this race properly but
   the single write at the end of vfs_bio_awrite() would not.
 - Modify flushbufqueues() so there is only one copy of the loop.  Pass a
   parameter in that says whether or not we should sync bufs with deps.
 - Call flushbufqueues() a second time and then break if we couldn't find
   any bufs without deps.
2003-03-13 07:19:23 +00:00
Kirk McKusick
34968037b1 Use the appropriate size when zeroing out the unused portion
of a snapshot's copy of a superblock. This patch fixes a panic
when taking a snapshot of a 4096/512 filesystem.

Reported by:	Ian Freislich <ianf@za.uu.net>
Sponsored by:   DARPA & NAI Labs.
2003-03-07 23:49:16 +00:00
Alan Cox
09c80124a3 Remove ENABLE_VFS_IOOPT. It is a long unfinished work-in-progress.
Discussed on:	arch@
2003-03-06 03:41:02 +00:00
Jeff Roberson
7261f5f68e - Add a new 'flags' parameter to getblk().
- Define one flag GB_LOCK_NOWAIT that tells getblk() to pass the LK_NOWAIT
   flag to the initial BUF_LOCK().  This will eventually be used in cases
   were we want to use a buffer only if it is not currently in use.
 - Convert all consumers of the getblk() api to use this extra parameter.

Reviwed by:	arch
Not objected to by:	mckusick
2003-03-04 00:04:44 +00:00
Nate Lawson
99648386d3 Finish cleanup of vprint() which was begun with changing v_tag to a string.
Remove extraneous uses of vop_null, instead defering to the default op.
Rename vnode type "vfs" to the more descriptive "syncer".
Fix formatting for various filesystems that use vop_print.
2003-03-03 19:15:40 +00:00
Dag-Erling Smørgrav
521f364b80 More low-hanging fruit: kill caddr_t in calls to wakeup(9) / [mt]sleep(9). 2003-03-02 16:54:40 +00:00
Kirk McKusick
74f3809a19 Change the field used to test whether the superblock has been updated
from the filesystem size field to the filesystem maximum blocksize
field. The problem is that older versions of growfs updated only the
new size field and not the old size field. This resulted in the old
(smaller) size field being copied up to the new size field which
caused the filesystem to appear to fsck to be badly trashed.

This also adds a sanity check to ensure that the superblock is not
being updated when the filesystem is mounted read-only. Obviously
such an update should never happen.

Reported by:	Nate Lawson <nate@root.org>
Sponsored by:   DARPA & NAI Labs.
2003-02-25 23:21:08 +00:00
Jeff Roberson
17661e5ac4 - Add an interlock argument to BUF_LOCK and BUF_TIMELOCK.
- Remove the buftimelock mutex and acquire the buf's interlock to protect
   these fields instead.
 - Hold the vnode interlock while locking bufs on the clean/dirty queues.
   This reduces some cases from one BUF_LOCK with a LK_NOWAIT and another
   BUF_LOCK with a LK_TIMEFAIL to a single lock.

Reviewed by:	arch, mckusick
2003-02-25 03:37:48 +00:00
David Schultz
9cdb2d4d9d Expand the reference count on struct dquot to 32 bits.
This fixes a panic on large systems where a single user
may have more than 64K active or inactive vnodes.

PR:		48234
Reviewed by:	mike (mentor)
2003-02-24 08:49:59 +00:00
Kirk McKusick
3bf0ed940b When removing the last item from a non-empty worklist, the worklist
tail pointer must be updated.

Reported by:	Kris Kennaway <kris@obsecurity.org>
Sponsored by:   DARPA & NAI Labs.
2003-02-24 07:28:41 +00:00
Kirk McKusick
5bb651cb72 This patch fixes a deadlock between the bufdaemon and a process taking
a snapshot. As part of taking a snapshot of a filesystem, the kernel
builds up a list of the filesystem metadata (such as the cylinder
group bitmaps) that are contained in the snapshot. When doing a
copy-on-write check, the list is first consulted. If the block being
written is found on the list, then the full snapshot lookup can be
avoided. Besides providing an important performance speedup this
check also avoids a potential deadlock between the code creating
the snapshot and the bufdaemon trying to cleanup snapshot related
buffers. This fix creates a temporary list containing the key
metadata blocks that can cause the deadlock. This temporary list
is used between the time that the snapshot is first enabled and the
time that the fully complete list is built.

Reported by:	Attila Nagy <bra@fsn.hu>
Sponsored by:   DARPA & NAI Labs.
2003-02-22 00:59:34 +00:00
Kirk McKusick
37e2ebfdba This patch fixes a bug on an active filesystem on which a snapshot
is being taken from panicing with either "freeing free block" or
"freeing free inode". The problem arises when the snapshot code
is scanning the filesystem looking for inodes with a reference
count of zero (e.g., unlinked but still open) so that it can
expunge them from its view. If it encounters a reclaimed vnode
and has to restart its scan, then it will panic if it encounters
and tries to free an inode that it has already processed. The fix
is to check each candidate inode to see if it has already been
processed before trying to delete it from the snapshot image.

Sponsored by:   DARPA & NAI Labs.
2003-02-22 00:29:51 +00:00