Commit Graph

1344 Commits

Author SHA1 Message Date
csjp
223ed343c6 MFC Log:
Convert the primary ACL allocator from malloc(9) to using a UMA zone instead.
  Also introduce an aclinit function which will be used to create the UMA zone
  for use by file systems at system start up.
2005-11-12 20:55:59 +00:00
scottl
2c9b2a4d12 MFC rev 1.158
Submitted by: tegge
Approved by: re
2005-10-29 06:43:55 +00:00
scottl
88d8ec8306 MFC rev 1.294
Submitted by: tegge
Approved by: re
2005-10-29 06:42:25 +00:00
scottl
1a6eedb3ba MFC rev 1.106 - 1.110
Submitted by: tegge
Approved by: re
2005-10-29 06:40:41 +00:00
scottl
25763ae2a7 MFC rev 1.136 and 1.137
Submitted by: tegge
Approved by: re
2005-10-29 06:38:13 +00:00
truckman
35158d47d9 MFC ufs_lookup.c 1.78 and 1.79.
Original commit messages:
  Modified files:
    sys/ufs/ufs          ufs_lookup.c
  Log:
  Close a race in the ufs_lookup() code that handles the ISDOTDOT
  case by saving the value of dp->i_ino before unlocking the vnode
  for the current directory and passing the saved value to VFS_VGET().

  Without this change, another thread can overwrite dp->i_ino after
  the current directory is unlocked, causing  ufs_lookup() to lock
  and return the wrong vnode in place of the vnode for its parent
  directory.  A deadlock can occur if dp->i_ino was changed to a
  subdirectory of the current directory because the root to leaf vnode
  lock ordering will be violated.  A vnode lock can be leaked if
  dp->i_ino was changed to point to the current directory, which
  causes the current vnode lock for the current directory to be
  recursed, which confuses lookup() into calling vrele() when it
  should be calling vput().

  The probability of this bug being triggered seems to be quite low
  unless the sysctl variable debug.vfscache is set to 0.

  Reviewed by:    jhb
  MFC after:      2 weeks

  Revision  Changes    Path
  1.78      +3 -1      src/sys/ufs/ufs/ufs_lookup.c

  Modified files:
    sys/ufs/ufs          ufs_lookup.c
  Log:
  Correct the type of the temporary variable used by ufs_lookup.c:1.78
  to fix the race condition in the ufs_lookup() ISDOTDOT code.

  Noticed by:     bde
  MFC after:      12 days

  Revision  Changes    Path
  1.79      +1 -1      src/sys/ufs/ufs/ufs_lookup.c

Approved by:	re (scottl)
2005-10-19 20:31:45 +00:00
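The ISDOTDOT fix described in the entry above can be illustrated with a minimal userland sketch. This is not the kernel code: `struct model_dir`, `model_vget()`, `model_clobber()` and both `lookup_dotdot_*` functions are hypothetical stand-ins, the lock is a plain integer, and `uint32_t` is only an assumed type for the saved inode number.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical userland stand-in for a locked directory inode. */
struct model_dir {
	int      locked;  /* 1 while the modelled vnode lock is held */
	uint32_t i_ino;   /* may be rewritten by others once unlocked */
};

/* Hypothetical stand-in for VFS_VGET(): reports which inode was requested. */
static uint32_t
model_vget(uint32_t ino)
{
	return (ino);
}

/* Simulates another thread overwriting i_ino after the unlock. */
static void
model_clobber(struct model_dir *dp)
{
	if (!dp->locked)
		dp->i_ino = 999;
}

/* Racy variant: reads dp->i_ino only after the lock has been dropped. */
static uint32_t
lookup_dotdot_racy(struct model_dir *dp)
{
	dp->locked = 0;
	model_clobber(dp);
	return (model_vget(dp->i_ino));
}

/* Fixed variant: saves dp->i_ino while the lock is still held. */
static uint32_t
lookup_dotdot_fixed(struct model_dir *dp)
{
	uint32_t saved_ino = dp->i_ino;

	dp->locked = 0;
	model_clobber(dp);
	return (model_vget(saved_ino));
}
```

Starting from a directory whose `i_ino` is 2, the racy variant ends up requesting the clobbered value, while the fixed variant still requests 2.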
rwatson
bf446dfcca Merge ufs_extattr.c:1.82 from HEAD to RELENG_6:
When performing a VOP_LOOKUP() as part of UFS1 extended attribute
  auto-start, set cnp.cn_lkflags to LK_EXCLUSIVE.  This flag must now
  be set so that lockmgr knows what kind of lock to acquire, and it
  will panic if not specified.  This resulted in a panic when using
  extended attributes on UFS1 as of locking work present in the 6.x
  branch.

  This is a RELENG_6_0 merge candidate.

  Reported by:    lofi

Approved by:	re (kensmith)
MFC after:	1 day
2005-10-15 18:32:55 +00:00
truckman
7267aabbe1 MFC ffs_alloc.c 1.135 - clear i_flag field in recycled inodes
Original commit message:

  FreeBSD src repository

  Modified files:
    sys/ufs/ffs          ffs_alloc.c
  Log:
  Initialize the inode i_flag field in ffs_valloc() to clean up any
  stale flag bits left over from before the inode was recycled.

  Without this change, a leftover IN_SPACECOUNTED flag could prevent
  softdep_freefile() and softdep_releasefile() from incrementing
  fs_pendinginodes.  Because handle_workitem_freefile() unconditionally
  decrements fs_pendinginodes, a negative value could be reported at
  file system unmount time with a message like:
          unmount pending error: blocks 0 files -3
  The pending block count in fs_pendingblocks could also be negative
  for similar reasons.  These errors can cause the data returned by
  statfs() to be slightly incorrect.  Some other cleanup code in
  softdep_releasefile() could also be incorrectly bypassed.

Reviewed by:	tegge
Approved by:	re (scottl)
2005-10-05 05:24:53 +00:00
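The accounting problem in the entry above can be sketched as a small userland model. All `model_*` names and the flag value are assumptions made for illustration; only the shape of the logic (a conditional increment paired with an unconditional decrement) is taken from the commit message.

```c
#include <assert.h>

#define MODEL_IN_SPACECOUNTED 0x1 /* illustrative bit value, not the kernel's */

struct model_inode { int i_flag; };
struct model_fs { long fs_pendinginodes; };

/* Allocation path: optionally clears stale flags, as the fix does. */
static void
model_valloc(struct model_inode *ip, int clear_flags)
{
	if (clear_flags)
		ip->i_flag = 0;
}

/* Free path: counts the inode as pending only if not already counted. */
static void
model_freefile(struct model_fs *fs, struct model_inode *ip)
{
	if ((ip->i_flag & MODEL_IN_SPACECOUNTED) == 0)
		fs->fs_pendinginodes++;
}

/* Deferred work item: always decrements the pending count. */
static void
model_workitem_freefile(struct model_fs *fs)
{
	fs->fs_pendinginodes--;
}

/* Recycle an inode carrying a stale flag, free it, return the final count. */
static long
model_recycle_and_free(int clear_flags)
{
	struct model_fs fs = { 0 };
	struct model_inode ip = { MODEL_IN_SPACECOUNTED };

	model_valloc(&ip, clear_flags);
	model_freefile(&fs, &ip);
	model_workitem_freefile(&fs);
	return (fs.fs_pendinginodes);
}
```

In this model, skipping the flag reset leaves the pending count at -1, and clearing the flags at allocation time brings it back to 0.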
truckman
4df51e5f85 MFC snaplk deadlock fix
src/sys/kern/vfs_bio.c          1.495, 1.496
        src/sys/kern/vfs_subr.c         1.648
        src/sys/sys/buf.h               1.190, 1.191
        src/sys/sys/proc.h              1.436
        src/sys/ufs/ffs/ffs_snapshot.c  1.104, 1.105, 1.106

Original commit messages:

    Log:
    Un-staticize runningbufwakeup() and staticize updateproc.

    Add a new private thread flag to indicate that the thread should
    not sleep if runningbufspace is too large.

    Set this flag on the bufdaemon and syncer threads so that they skip
    the waitrunningbufspace() call in bufwrite() rather than
    checking the proc pointer vs. the known proc pointers for these two
    threads.  A way of preventing these threads from being starved for
    I/O but still placing limits on their outstanding I/O would be
    desirable.

    Set this flag in ffs_copyonwrite() to prevent bufwrite() calls from
    blocking on the runningbufspace check while holding snaplk.  This
    prevents snaplk from being held for an arbitrarily long period of
    time if runningbufspace is high and greatly reduces the contention
    for snaplk.  The disadvantage is that ffs_copyonwrite() can start
    a large amount of I/O if there are a large number of snapshots,
    which could cause a deadlock in other parts of the code.

    Call runningbufwakeup() in ffs_copyonwrite() to decrement runningbufspace
    before attempting to grab snaplk so that I/O requests waiting on
    snaplk are not counted in runningbufspace as being in-progress.
    Increment runningbufspace again before actually launching the
    original I/O request.

    Prior to the above two changes, the system could deadlock if enough
    I/O requests were blocked by snaplk to prevent runningbufspace from
    falling below lorunningspace and one of the bawrite() calls in
    ffs_copyonwrite() blocked in waitrunningbufspace() while holding
    snaplk.

    See <http://www.holm.cc/stress/log/cons143.html>

    Revision  Changes    Path
    1.495     +3 -3      src/sys/kern/vfs_bio.c
    1.648     +2 -1      src/sys/kern/vfs_subr.c
    1.190     +1 -0      src/sys/sys/buf.h
    1.436     +1 -1      src/sys/sys/proc.h
    1.104     +16 -4     src/sys/ufs/ffs/ffs_snapshot.c

    Log:
    Un-staticize waitrunningbufspace() and call it before returning from
    ffs_copyonwrite() if any async writes were launched.

    Restore the thread's previous TDP_NORUNNINGBUF state before returning
    from ffs_copyonwrite().

    Revision  Changes    Path
    1.496     +1 -1      src/sys/kern/vfs_bio.c
    1.191     +1 -0      src/sys/sys/buf.h
    1.105     +13 -1     src/sys/ufs/ffs/ffs_snapshot.c

    Log:
    Correct previous commit to fix the sense of the TDP_NORUNNINGBUF
    check in ffs_copyonwrite() that is a precondition for calling
    waitrunningbufspace().

    Pointed out by: tegge
    Pointy hat to:  truckman
    MFC after:      3 days

    Revision  Changes    Path
    1.106     +1 -1      src/sys/ufs/ffs/ffs_snapshot.c

Approved by:	re (scottl)
2005-10-04 04:41:27 +00:00
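The save-and-restore handling of the private thread flag mentioned in the entry above might look roughly like the following userland sketch. `struct model_thread`, `model_copyonwrite()` and the flag value are hypothetical; the real ffs_copyonwrite() does far more than this.

```c
#include <assert.h>

#define MODEL_TDP_NORUNNINGBUF 0x1 /* illustrative bit value */

struct model_thread { int td_pflags; };

/*
 * Sets the flag for the duration of the modelled work, then restores the
 * caller's previous state of that one bit.  Returns 1 if the flag was
 * observed set while the work was running.
 */
static int
model_copyonwrite(struct model_thread *td)
{
	int saved = td->td_pflags & MODEL_TDP_NORUNNINGBUF;
	int seen;

	td->td_pflags |= MODEL_TDP_NORUNNINGBUF;

	/* Placeholder for the writes issued while holding the snapshot lock. */
	seen = (td->td_pflags & MODEL_TDP_NORUNNINGBUF) != 0;

	/* Clear the bit, then put back whatever the caller had. */
	td->td_pflags = (td->td_pflags & ~MODEL_TDP_NORUNNINGBUF) | saved;
	return (seen);
}
```

Restoring the saved bit rather than clearing it unconditionally matters for callers that already had the flag set, such as the bufdaemon and syncer threads named in the log.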
truckman
670ac96738 MFC ffs_softdep.c 1.185
Original commit message:

  truckman    2005-09-29 21:50:26 UTC

   FreeBSD src repository

   Modified files:
     sys/ufs/ffs          ffs_softdep.c
   Log:
   After a rmdir()ed directory has been truncated, force an update of
   the directory's inode after queuing the dirrem that will decrement
   the parent directory's link count.  This will force the update of
   the parent directory's actual link count to actually be scheduled.  Without
   this change the parent directory's actual link count would not be
   updated until ufs_inactive() cleared the inode of the newly removed
   directory, which might be deferred indefinitely.  ufs_inactive()
   will not be called as long as any process holds a reference to the
   removed directory, and ufs_inactive() will not clear the inode if
   the link count is non-zero, which could be the result of an earlier
   system crash.

   If a background fsck is run before the update of the parent directory's
   actual link count has been performed, or at least scheduled by
   putting the dirrem on the leaf directory's inodedep id_bufwait list,
   fsck will corrupt the file system by decrementing the parent
   directory's effective link count, which was previously correct
   because it already took the removal of the leaf directory into
   account, and setting the actual link count to the same value as the
   effective link count after the dangling, removed, leaf directory
   has been removed.  This happens because fsck acts based on the
   actual link count, which will be too high when fsck creates the
   file system snapshot that it references.

   This change has the fortunate side effect of more quickly cleaning
   up the large number of dirrem structures that linger for an extended
   time after the removal of a large directory tree.  It also fixes a
   potential problem with the shutdown of the syncer thread timing out
   if the system is rebooted immediately after removing a large directory
   tree.

   Submitted by:   tegge
   MFC after:      3 days

   Revision  Changes    Path
   1.185     +2 -0      src/sys/ufs/ffs/ffs_softdep.c

Submitted by:	tegge
Approved by:	re (scottl)
2005-10-02 08:25:33 +00:00
delphij
8eaf585962 MFC 1.293 (by ssouhlal):
ffs_mountfs() needs devvp to be locked, so lock it.

Approved by:	re (scottl)
2005-09-30 06:14:44 +00:00
delphij
2e3c58f846 MFC 1.64: Restore a historical ufs_inactive behavior that respects
the RDONLY option, so a subsequent call to UFS_TRUNCATE (ffs_truncate)
would not panic the system.  This fixes a panic that can happen
when mounting a corrupted filesystem read-only, and reading data
from it.

Reviewed by:	mckusick
Approved by:	re (scottl)
2005-09-27 17:03:53 +00:00
tegge
1323e854ee MFC: Giant is no longer needed here.
Approved by:	re (scottl)
2005-09-12 15:56:07 +00:00
tegge
a09e810913 MFC: Retain generation count when writing zeroes instead of an inode to disk.
Don't free a struct inodedep if another process is allocating saved
     inode memory for the same struct inodedep in
     initiate_write_inodeblock_ufs[12]().

     Handle disappearing dependencies in softdep_disk_io_initiation().

Approved by:	re (scottl)
2005-09-07 00:03:38 +00:00
ssouhlal
bfc657ece4 MFC rev 1.292:
Set the mountpoint path in the superblock (fs_fsmnt) at mount-time
  so that it appears in the various messages (not cleanly unmounted,
  filesystem full, etc). This has been broken since rev 1.261.

Approved by:	re (scottl)
2005-08-28 17:04:43 +00:00
tegge
df06317979 MFC: Don't set the COMPLETE flag in an inodedep structure before the
related inode has been written.

Approved by:	re (scottl)
2005-08-27 18:40:06 +00:00
alc
a574f3c833 MFC
Eliminate inconsistency in the setting of the B_DONE flag.

Approved by:	re (kensmith)
2005-08-20 06:07:55 +00:00
iedowse
b9efd783a8 MFC 1.22: in the ufsdirhash_build() failure case for corrupted
directories or unreadable blocks, make sure to destroy the mutex
we created.

Approved by:	re (scottl)
2005-08-20 04:27:15 +00:00
ups
31fe0d40f0 MFC ffs_softdep.c 1.182, softdep.h 1.18
Delay freeing disk space for file system blocks until all
dirty buffers are safely released. This fixes softdep
problems on truncation (deletion) of files with dirty
buffers.

Approved by:	re (kensmith)
2005-08-10 14:09:25 +00:00
ssouhlal
0835f7b4a9 Allow EVFILT_VNODE events to work on every filesystem type, not just
UFS by:
- Making the pre and post hooks for the VOP functions work even when
DEBUG_VFS_LOCKS is not defined.
- Moving the KNOTE activations into the corresponding VOP hooks.
- Creating a MNTK_NOKNOTE flag for the mnt_kern_flag field of struct
mount that permits filesystems to disable the new behavior.
- Creating a default VOP_KQFILTER function: vfs_kqfilter()

My benchmarks have not revealed any performance degradation.

Reviewed by:	jeff, bde
Approved by:	rwatson, jmg (kqueue changes), grehan (mentor)
2005-06-09 20:20:31 +00:00
kensmith
3a7e275ce6 This patch addresses a standards violation issue. The standards say a
file's access time should be updated when it gets executed.  A while
ago the mechanism used to exec was changed to use a more mmap based
mechanism and this behavior was broken as a side-effect of that.

A new vnode flag is added that gets set when the file gets executed,
and the VOP_SETATTR() vnode operation gets called.  The underlying
filesystem is expected to handle it based on its own semantics; some
filesystems don't support access time at all.  Those that do should
handle it in a way that does not block, does not generate I/O if possible,
etc.  In particular vn_start_write() has not been called.  The UFS code
handles it the same way as it would normally handle the access time if
a file was read - the IN_ACCESS flag gets set in the inode but no other
action happens at this point.  The actual time update will happen later
during a sync (which handles all the necessary locking).

Got me into this:	cperciva
Discussed with:		a lot with bde, a little with kan
Showed patches to:	phk, jeffr, standards@, arch@
Minor discussion on:	arch@
2005-05-31 19:39:52 +00:00
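The deferred access-time handling described in the entry above (mark the inode at exec time, write the timestamp later during a sync) can be sketched in userland as follows. The `model_*` names, the flag value and the use of a plain `long` timestamp are all assumptions for illustration.

```c
#include <assert.h>

#define MODEL_IN_ACCESS 0x1 /* illustrative bit value */

struct model_inode {
	int  i_flag;
	long i_atime;
};

/* Exec path: only marks the inode; no I/O and nothing that can block. */
static void
model_mark_accessed(struct model_inode *ip)
{
	ip->i_flag |= MODEL_IN_ACCESS;
}

/* Sync path: applies a pending access-time update, if there is one. */
static void
model_sync(struct model_inode *ip, long now)
{
	if (ip->i_flag & MODEL_IN_ACCESS) {
		ip->i_atime = now;
		ip->i_flag &= ~MODEL_IN_ACCESS;
	}
}
```

In this model the timestamp is unchanged right after `model_mark_accessed()` and only moves on the next `model_sync()`; a later sync with no pending flag leaves it alone.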
jeff
36b04bf4ea - Don't set our bio op to be a READ when we've just completed a write. There
are subtle differences in the read and write completion path.  Instead,
   grab an extra write ref so the write path can drop it when we recursively
   call bufdone().  I believe this may be the source of the wrong bufobj
   panics.

Reported by:	pho, kkenn
2005-05-30 07:04:15 +00:00
mckusick
72bafed72f Allow removal of empty directories with high link counts. These can
occur on a filesystem running with soft updates after a crash and
before a background fsck has been run. To prevent discrepancies
from arising in a background fsck that may already be running,
the directory is removed but its inode is not freed and is left
with the residual reference count. When encountered by the
background fsck it will be reclaimed.
2005-05-18 22:18:21 +00:00
jeff
dde0e2eb94 - Don't restrict the softdep stats to DEBUG kernels, they cost nothing to
export.  This was happening anyway since this file manually sets DEBUG.
 - Add a sysctl for the number of items on the worklist.
 - Use a more canonical loop restart in softdep_fsync_mountdev, it saves
   some code at the expense of a goto and makes me worry less about
   modifying a variable that should be private to the TAILQ_FOREACH_SAFE
   macro.
2005-05-03 11:03:29 +00:00
jeff
808d90b655 - Use bdone() directly instead of calling it indirectly through
ffs_rawreaddone().

Sponsored by:	Isilon Systems, Inc.
2005-04-30 11:28:19 +00:00
pjd
18f74c4005 - Plug memory leak.
- Fix two style nits.

Found by:	Coverity Prevent analysis tool
Reviewed by:	rwatson
MFC after:	1 week
2005-04-16 10:57:49 +00:00
jeff
afab3762a0 - Change all filesystems and vfs_cache to relock the dvp once the child is
locked in the ISDOTDOT case.  See vfs_lookup.c r1.79 for details.

Sponsored by:	Isilon Systems, Inc.
2005-04-13 10:59:09 +00:00
jeff
2a0f491ada - Consistently call 'vp' vp rather than ovp sometimes in ffs_truncate().
Do the same for oip.

Pointed out by:	glebius
2005-04-05 08:49:41 +00:00
jeff
184934a8ee - Use M_ZERO rather than explicitly calling bzero().
- Don't intermingle direct calls to lockmgr and indirect calls through
   VOPs.  This will be important in the future.
 - Don't lock the devvp's interlock just to release it on the next line by
   passing LK_INTERLOCK to lockmgr.
 - Restructure ffs_snapshot_unmount so we don't call free() with the
   devvp's interlock locked.
2005-04-03 12:03:44 +00:00
jeff
e2abc701a5 - In ffs_sync we need to pass LK_SLEEPFAIL in when we lock the vnode
because it may change identities while we're sleeping on the lock.
   Otherwise we may bail out of ffs_sync() early due to an error from
   deadfs.
 - Collapse a VOP_UNLOCK, vrele into a single vput().
2005-04-03 10:38:18 +00:00
jeff
e0e3d6c9e0 - Move the contents of softdep_disk_prewrite into ffs_geom_strategy to fix
two bugs.
 - ffs_disk_prewrite was pulling the vp from the buf and checking for
   COPYONWRITE, when really it wanted the vp from the bufobj that we're
   writing to, which is the devvp.  This led to us skipping the copy on
   write to all file data, which significantly broke snapshots for the
   last few months.
 - When the SOFTUPDATES option was not included in the kernel config we
   would also skip the copy on write check, which would effectively disable
   snapshots.
 - Remove an invalid mp_fixme().

Debugging tips from:	mckusick
Reported by:		iedowse, others
Discussed with:		phk
2005-04-03 10:29:55 +00:00
jeff
569acf54a8 - Fix botched LK_NOWAIT removal. I mistakenly thought this compiled as
part of GENERIC.
2005-03-31 05:58:14 +00:00
jeff
322d56df72 - FFS supports shared locks, clear LK_NOSHARE from our vnode locks.
Sponsored by:	Isilon Systems, Inc.
2005-03-31 05:23:20 +00:00
jeff
f13ee4b8f2 - Set LK_NOSHARE for snapshot locks.  Snapshots require exclusive-only
access.
 - Remove the hack from ffs_lock() to implement LK_NOSHARE in a ffs
   specific way.

Sponsored by:	Isilon Systems, Inc.
2005-03-31 05:21:17 +00:00
jeff
97c40ebd49 - LK_NOPAUSE is a nop now.
Sponsored by:   Isilon Systems, Inc.
2005-03-31 04:37:09 +00:00
jeff
e6d7b24c6e - Remove wantparent, it is no longer necessary. An assert in vfs_lookup.c
prevents any callers from doing a modifying op without
   LOCKPARENT or WANTPARENT.  It wasn't even properly used in the CREATE
   or DELETE cases.
2005-03-29 13:16:38 +00:00
jeff
36bc306f63 - Upgrade a shared lock request to exclusive in ffs_vget() if we have
to create the vnode.

Sponsored by:	Isilon Systems, Inc.
2005-03-29 10:10:51 +00:00
jeff
16ed71ae90 - Honor the cn_lkflags passed from namei() when locking the leaf.
Sponsored by:	Isilon Systems, Inc.
2005-03-29 10:10:01 +00:00
jeff
f4493fbc1c - UFS no longer uses PDIRUNLOCK to track the parent state. Instead, we now
rely on ufs to always leave the parent locked except in the ISDOTDOT
   case.  Adjust asserts to deal with these changes.

Sponsored by:	Isilon Systems, Inc.
2005-03-28 09:35:58 +00:00
jeff
b136fd4eee - We no longer have to bother with PDIRUNLOCK, lookup() handles it for us.
Sponsored by:   Isilon Systems, Inc.
2005-03-28 09:34:36 +00:00
das
3b88b0f403 When the softupdates worklist gets too long, threads that attempt to
add more work are forced to process two worklist items first.
However, processing an item may generate additional work, causing the
unlucky thread to recursively process the worklist.  Add a per-thread
flag to detect this situation and avoid the recursion.  This should
fix the stack overflows that could occur while removing large
directory trees.

Tested by:	kris
Reviewed by:	mckusick
2005-03-25 17:30:31 +00:00
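The recursion guard described in the entry above can be modelled with a small userland sketch. It assumes the worklist stays over its limit for the whole run and that every processed item generates one more item until a budget runs out; `struct model_thread`, the `in_worklist` flag and all `model_*` functions are hypothetical names, not the softdep code.

```c
#include <assert.h>

struct model_thread {
	int in_worklist;  /* hypothetical per-thread "already processing" flag */
	int depth;        /* current nesting of item processing */
	int max_depth;    /* deepest nesting seen */
};

static void model_add_work(struct model_thread *td, int guard, int budget);

/* Processing an item may itself queue more work while budget remains. */
static void
model_process_item(struct model_thread *td, int guard, int budget)
{
	td->depth++;
	if (td->depth > td->max_depth)
		td->max_depth = td->depth;
	if (budget > 0)
		model_add_work(td, guard, budget - 1);
	td->depth--;
}

/* Adding work while over the limit forces processing of two items first. */
static void
model_add_work(struct model_thread *td, int guard, int budget)
{
	if (guard && td->in_worklist)
		return;  /* already processing: queue only, do not recurse */
	td->in_worklist = 1;
	model_process_item(td, guard, budget);
	model_process_item(td, guard, budget);
	td->in_worklist = 0;
}

/* Runs the model with a budget of 3 and reports the deepest nesting. */
static int
model_max_depth(int guard)
{
	struct model_thread td = { 0, 0, 0 };

	model_add_work(&td, guard, 3);
	return (td.max_depth);
}
```

With the guard disabled the nesting grows with the amount of generated work (depth 4 for a budget of 3 in this model); with the guard enabled it stays at 1.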
jeff
c9591f9ecd - Call VFS_ROOT() with LK_EXCLUSIVE.
Sponsored by:	Isilon Systems, Inc.
2005-03-24 07:33:45 +00:00
jeff
ca7edef8ef - Update the ufs_root() prototype.
- Pass the ufs_root() flags argument to VFS_VGET() to allow callers to
   specify shared locks.

Sponsored by:	Isilon Systems, Inc.
2005-03-24 07:32:50 +00:00
jeff
479ac055a1 - Lock the clearing of v_data in ufs_reclaim() to prevent a pagefault
in ffs_lock() when it accesses v_data without the vnlock.

Sponsored by:	Isilon Systems, Inc.
2005-03-17 11:58:43 +00:00
phk
98f1c9b062 Add two arguments to the vfs_hash() KPI so that filesystems which do
not have unique hashes (NFS) can also use it.
2005-03-16 11:20:51 +00:00
phk
54d4b170ba Don't hold a reference on the disk vnode for each inode.
2005-03-15 20:50:58 +00:00
phk
d043926750 Improve the vfs_hash() API: vput() the unneeded vnode centrally to
avoid replicating the vput in all the filesystems.
2005-03-15 20:00:03 +00:00
phk
124bf5e823 Simplify the vfs_hash calling convention.
2005-03-15 08:07:07 +00:00
jeff
10270f3a1e - Destroy the vnode object earlier in VOP_RECLAIM as we need more of
the vnode valid before the vm flushes pages.
 - Get rid of some extraneous uses of the vnode interlock.

Sponsored by:	Isilon Systems, Inc.
2005-03-15 01:42:58 +00:00
phk
503a6885b8 Use vfs_hash instead of home-rolled.
2005-03-14 10:21:16 +00:00