Commit Graph

170 Commits

Author SHA1 Message Date
Kirk McKusick
43a3cc7796 Ensure that filesystem metadata contained within persistent snapshots
is always kept consistent.

Suggested by:	Jeff Roberson
2011-06-15 23:19:09 +00:00
Kirk McKusick
2191e465cc With the restructuring of the block reclaimation code, the notification
messages for a filesystem being out of space need to be moved so that
they do not print out until after a failed cleanup attempt.

Suggested by:	Jeff Roberson
2011-06-15 18:05:08 +00:00
Kirk McKusick
9eb8728aa5 Update to soft updates journaling to properly track freed blocks
that get claimed by snapshots.

Submitted by:	Jeff Roberson
Tested by:	Peter Holm
2011-06-12 19:27:05 +00:00
Jeff Roberson
280e091a99 Implement fully asynchronous partial truncation with softupdates journaling
to resolve errors which can cause corruption on recovery with the old
synchronous mechanism.

 - Append partial truncation freework structures to indirdeps while
   truncation is proceeding.  These prevent new block pointers from
   becoming valid until truncation completes and serialize truncations.
 - On completion of a partial truncate journal work waits for zeroed
   pointers to hit indirects.
 - softdep_journal_freeblocks() handles last frag allocation and last
   block zeroing.
 - vtruncbuf/ffs_page_remove moved into softdep_*_freeblocks() so it
   is only implemented in one place.
 - Block allocation failure handling moved up one level so it does not
   proceed with buf locks held.  This permits us to do more extensive
   reclaims when filesystem space is exhausted.
 - softdep_sync_metadata() is broken into two parts, the first executes
   once at the start of ffs_syncvnode() and flushes truncations and
   inode dependencies.  The second is called on each locked buf.  This
   eliminates excessive looping and rollbacks.
 - Improve the mechanism in process_worklist_item() that handles
   acquiring vnode locks for handle_workitem_remove() so that it works
   more generally and does not loop excessively over the same worklist
   items on each call.
 - Don't corrupt directories by zeroing the tail in fsck.  This is only
   done for regular files.
 - Push a fsync complete record for files that need it so the checker
   knows a truncation in the journal is no longer valid.

Discussed with:	mckusick, kib (ffs_pages_remove and ffs_truncate parts)
Tested by:	pho
2011-06-10 22:48:35 +00:00
Kirk McKusick
9f62b10cb3 Grammer fix in comment.
Eliminate one (of several) possible conflicting buffer locks when
trying to reclaim blocks. Rest of fix to be incorporated as part
of SUJ update by jeff.

Pointed out by: Kostik Belousov
2011-06-05 22:36:30 +00:00
Kirk McKusick
1508294bb6 Due to a lag in updating the fs_pendinginodes count, we cannot depend
on it to decide whether we should try to reclaim inodes when we run
short.

Discovered by: Peter Holm
2011-05-28 15:07:29 +00:00
Kirk McKusick
99f6ac66ad The check for whether a block is going to be claimed by a snapshot
needs to happen before we notify the underlying layer that it is
being freed.
2011-05-26 23:56:58 +00:00
Konstantin Belousov
d9ca1af7ed VFS sometimes is unable to inactivate a vnode when vnode use count
goes to zero. E.g., the vnode might be only shared-locked at the time of
vput() call. Such vnodes are kept in the hash, so they can be found later.

If ffs_valloc() allocated an inode that has its vnode cached in hash, and
still owing the inactivation, then vget() call from ffs_valloc() clears
VI_OWEINACT, and then the vnode is reused for the newly allocated inode.

The problem is, the vnode is not reclaimed before it is put to the new
use. ffs_valloc() recycles vnode vm object, but this is not enough.
In particular, at least v_vflag should be cleared, and several bits of
UFS state need to be removed.

It is very inconvenient to call vgone() at this point. Instead, move
some parts of ufs_reclaim() into helper function ufs_prepare_reclaim(),
and call the helper from VOP_RECLAIM and ffs_valloc().

Reviewed by:	mckusick
Tested by:	pho
MFC after:	3 weeks
2011-04-24 10:47:56 +00:00
Kirk McKusick
4c821a3978 Be far more persistent in reclaiming blocks and inodes before giving
up and declaring a filesystem out of space. Especially necessary when
running on a small filesystem. With this improvement, it should be
possible to use soft updates on a small root filesystem.

Kudos to: Peter Holm
Testing by: Peter Holm
MFC: 2 weeks
2011-04-05 21:26:05 +00:00
Kirk McKusick
0a809056ce Add retry code analogous to the block allocation retry code
to avoid running out of inodes.

Reported by: Peter Holm
2011-03-23 05:13:54 +00:00
John Baldwin
96e1934a43 Use ffs() to locate free bits in the inode bitmap rather than a loop with
bit shifts.

Reviewed by:	mckusick
MFC after:	1 month
2011-03-04 22:26:41 +00:00
Konstantin Belousov
8c2a54de80 Add kernel side support for BIO_DELETE/TRIM on UFS.
The FS_TRIM fs flag indicates that administrator requested issuing of
TRIM commands for the volume. UFS will only send the command to disk
if the disk reports GEOM::candelete attribute.

Since disk queue is reordered, data block is marked as free in the bitmap
only after TRIM command completed. Due to need to sleep waiting for
i/o to finish, TRIM bio_done routine schedules taskqueue to set the
bitmap bit.

Based on the patch by:	mckusick
Reviewed by:	mckusick, pjd
Tested by:	pho
MFC after:	1 month
2010-12-29 12:25:28 +00:00
Jeff Roberson
9f9c8c59ae - Handle the truncation of an inode with an effective link count of 0 in
the context of the process that reduced the effective count.  Previously
   all truncation as a result of unlink happened in the softdep flush
   thread.  This had the effect of being impossible to rate limit properly
   with the journal code.  Now the process issuing unlinks is suspended
   when the journal files.  This has a side-effect of improving rm
   performance by allowing more concurrent work.
 - Handle two cases in inactive, one for effnlink == 0 and another when
   nlink finally reaches 0.
 - Eliminate the SPACECOUNTED related code since the truncation is no
   longer delayed.

Discussed with:	mckusick
2010-07-06 07:11:04 +00:00
Jeff Roberson
113db2dddb - Merge soft-updates journaling from projects/suj/head into head. This
brings in support for an optional intent log which eliminates the need
   for background fsck on unclean shutdown.

Sponsored by:   iXsystems, Yahoo!, and Juniper.
With help from: McKusick and Peter Holm
2010-04-24 07:05:35 +00:00
Konstantin Belousov
2950ff259c When ffs_realloccg() failed to allocate bigger fragment and, because
pending blocks are scheduled for removal, goes to retry the (re)allocation,
clear the bp pointer. It might happen that meantime free space is really
exhausted and we are entering nospace: label without bread()ing buffer,
causing stale bp value to be brelse()d again.

Tested by:	pho
    (Producing a scenario to reliably reproduce the
     race appeared to be much harder then fixing the bug)
MFC after:	1 week
2010-02-13 10:34:50 +00:00
Kirk McKusick
e870d1e6f9 This fix corrects a problem in the file system that treats large
inode numbers as negative rather than unsigned. For a default
(16K block) file system, this bug began to show up at a file system
size above about 16Tb.

To fully handle this problem, newfs must be updated to ensure that
it will never create a filesystem with more than 2^32 inodes. That
patch will be forthcoming soon.

Reported by: Scott Burns, John Kilburg, Bruce Evans
Followup by: Jeff Roberson
PR:          133980
MFC after:   2 weeks
2010-02-10 20:10:35 +00:00
Kirk McKusick
53298164b8 Cast 64-bit quantity to intptr_t rather than int so as to work properly
with 64-bit architectures (such as amd64).

Reported by:	bz
2010-01-11 22:42:06 +00:00
Kirk McKusick
e268f54cb4 Background:
When renaming a directory it passes through several intermediate
states. First its new name will be created causing it to have two
names (from possibly different parents). Next, if it has different
parents, its value of ".." will be changed from pointing to the old
parent to pointing to the new parent. Concurrently, its old name
will be removed bringing it back into a consistent state. When fsck
encounters an extra name for a directory, it offers to remove the
"extraneous hard link"; when it finds that the names have been
changed but the update to ".." has not happened, it offers to rewrite
".." to point at the correct parent. Both of these changes were
considered unexpected so would cause fsck in preen mode or fsck in
background mode to fail with the need to run fsck manually to fix
these problems. Fsck running in preen mode or background mode now
corrects these expected inconsistencies that arise during directory
rename. The functionality added with this update is used by fsck
running in background mode to make these fixes.

Solution:

This update adds three new fsck sysctl commands to support background
fsck in correcting expected inconsistencies that arise from incomplete
directory rename operations. They are:

setcwd(dirinode) - set the current directory to dirinode in the
    filesystem associated with the snapshot.
setdotdot(oldvalue, newvalue) - Verify that the inode number for ".."
    in the current directory is oldvalue then change it to newvalue.
unlink(nameptr, oldvalue) - Verify that the inode number associated
    with nameptr in the current directory is oldvalue then unlink it.

As with all other fsck sysctls, these new ones may only be used by
processes with appropriate priviledge.

Reported by:    	jeff
Security issues:	rwatson
2010-01-11 20:44:05 +00:00
Alan Cox
6e5982caf7 Introduce vfs_bio_set_valid() and use it from ffs_realloccg(). This
eliminates the misuse of vfs_bio_clrbuf() by ffs_realloccg().

In collaboration with:	tegge
2009-05-17 20:26:00 +00:00
Edward Tomasz Napierala
8a3f2c376a When a device containing mounted UFS filesystem disappears, the type
of devvp becomes VBAD, which UFS incorrectly interprets as snapshot
vnode, which in turns causes panic.  Fix it by replacing '!= VCHR'
with '== VREG'.

With this fix in place, you should no longer be able to panic the system
by removing a device with an UFS filesystem mounted from it - assuming
you don't use softupdates.

Reviewed by:	kib
Tested by:	pho
Approved by:	rwatson (mentor)
Sponsored by:	FreeBSD Foundation
2009-02-06 17:14:07 +00:00
Robert Watson
ec7e66e84c Following a fair amount of real world experience with ACLs and
extended attributes since FreeBSD 5, make the following semantic
changes:

- Don't update the inode modification time (mtime) when extended
  attributes (and hence also ACLs) are added, modified, or removed.
- Don't update the inode access tie (atime) when extended attributes
  (and hence also ACLs) are queried.

This means that rsync (and related tools) won't improperly think
that the data in the file has changed when only the ACL has changed.

Note that ffs_reallocblks() has not been changed to not update on an
IO_EXT transaction, but currently EAs don't use the cluster write
routines so this shouldn't be a problem.  If EAs grow support for
clustering, then VOP_REALLOCBLKS() will need to grow a flag argument
to carry down IO_EXT to UFS.

MFC after:	1 week
PR:             ports/125739
Reported by:    Alexander Zagrebin <alexz@visp.ru>
Tested by:      pluknet <pluknet@gmail.com>,
                Greg Byshenk <freebsd@byshenk.net>
Discussed with: kib, kientzle, timur, Alexander Bokovoy <ab@samba.org>
2009-01-27 21:48:47 +00:00
Konstantin Belousov
acd05e468a In ffs_valloc(), ffs_vget() may fail because insmntque() refused to
insert new vnode into the mount vnode list. Then, for the SU-enabled
mount, ffs_vfree could create freefile dependency. This dependency can
hang around forever since inode is not marked as IN_MODIFIED and
correspondingly inodeblock may be not marked as dirty.

After ffs_vget() fails, retry with FFSV_FORCEINSMQ, mark the inode as
modified, and vput() it immediately. Take care of the dup alloc.

Tested by:	pho
Reviewed by:	tegge
MFC after:	1 month
2008-08-28 09:19:50 +00:00
Ken Smith
d9e6294e4f Fix a broken check that recently became more annoying because it now
gets enabled when INVARIANTS is on instead of DIAGNOSTIC (which apparently
nobody uses).  From Tor's description:

  This happens when the block range spans two block maps, the first in the
  inode (mapping up to NDADDR direct blocks) and the second being the first
  indirect block.  The current check assumes that both block maps are
  indirect blocks.

Work done by:	tegge
Tested by:	kris, kensmith
2007-12-01 13:12:43 +00:00
David E. O'Brien
1102b89baa Turn most ffs 'DIAGNOSTIC's into INVARIANTS. 2007-11-08 17:21:51 +00:00
Bjoern A. Zeeb
7fd627f00f Fix a DIV0 in case a large value for fs_avgfilesize or fs_avgfpdir
is given (with newfs or tunefs) and dirsize overflows.

In case dirsize is <= 0 because of an overflow set maxcontigdirs
to 0 so it will be 1 later. This is what would happen for large
fs_avgfilesize. [1]

Identified with help from:	roberto, pjd
Submitted by:			pjd [1]
Approved by:			re (rwatson)
MFC after:			8 days
2007-09-10 14:12:29 +00:00
Robert Watson
32f9753cfb Eliminate now-unused SUSER_ALLOWJAIL arguments to priv_check_cred(); in
some cases, move to priv_check() if it was an operation on a thread and
no other flags were present.

Eliminate caller-side jail exception checking (also now-unused); jail
privilege exception code now goes solely in kern_jail.c.

We can't yet eliminate suser() due to some cases in the KAME code where
a privilege check is performed and then used in many different deferred
paths.  Do, however, move those prototypes to priv.h.

Reviewed by:	csjp
Obtained from:	TrustedBSD Project
2007-06-12 00:12:01 +00:00
Brian Somers
98fff6b57c Account for di_blocks allocations when IN_SPACECOUNTED is set in an
inode's i_flag.

It's possible that after ufs_infactive() calls softdep_releasefile(),
i_nlink stays >0 for a considerable amount of time (> 60 seconds here).
During this period, any ffs allocation routines that alter di_blocks
must also account for the blocks in the filesystem's fs_pendingblocks
value.

This change fixes an eventual df/du discrepency that will happen as
the result of fs_pendingblocks being reduced to <0.

The only manifestation of this that people may recognise is the
following message on boot:

    /somefs: update error: blocks -N files M

at which point the negative pending block count is adjusted to zero.

Reviewed by:	tegge
MFC after:	3 weeks
2007-02-23 20:23:35 +00:00
Mike Pritchard
db9b81eabc Quota system cleanup.
1) Do not do quota accounting for the actual quota data files
   or for file system snapshot files ("system" files).  This
   prevents a deadlock descibed in PR kern/30958 if the kernel
   ever has to grow the quota file.  Snapshot files were already
   exempt from the quota checks, but this change generalized the check.
2) Fix a cast that caused extremely large uids/gids to incorrectly
   write the quota information to the data file at a truncated
   value for a uint_t32 id value.  The incorrect cast caused quota
   files in this case to be around 4GB in size, with the correct cast
   they can now be 131GB in size.  Also related to PR kern/30958.
3) Check for what appear to be negative UIDs/GIDs and not account
   for them.  This prevents the quota files from becoming 131GB in
   size and causing quotacheck to run forever at bootup.  This could
   also cause the kernel to try and expand the quota file, which might
   deadlock due to the issue in #1.  kern/30958 and kern/38156
   (and some much older closed PR's).
4) With the deadlock problems gone, the kernel can now expand the
   size of the quota database files if it needs to.
5) Pass in the i-node count change value to chkiq and chkiqchg as an
   int, like it used to be before the common routine was split up
   into 2 different routines to increase / decrease the i-node in-use
   count.  Prevents an underflow on the i-node count.  Related
   to PR kern/89247.
6) Prevent the block usage from growing slowly if a file system is
   full and the write was denied due to that fact.  PR kern/89247.

Some of these changes require an updated quotacheck to prevent
the creation of huge (131GB) quota data files (item #3).

#1/#4 probably fixes a lot of the random hangs when quotas are enabled,
possibly some of the jail hangs.
2007-01-20 11:58:32 +00:00
Mike Pritchard
6192525baf Fix a spelling error in some comments. heirarchy -> hierarchy.
Obtained from: OpenBSD
2007-01-16 19:35:43 +00:00
Robert Watson
acd3428b7d Sweep kernel replacing suser(9) calls with priv(9) calls, assigning
specific privilege names to a broad range of privileges.  These may
require some future tweaking.

Sponsored by:           nCircle Network Security, Inc.
Obtained from:          TrustedBSD Project
Discussed on:           arch@
Reviewed (at least in part) by: mlaier, jmg, pjd, bde, ceri,
                        Alex Lyashkov <umka at sevcity dot net>,
                        Skip Ford <skip dot ford at verizon dot net>,
                        Antoine Brodin <antoine dot brodin at laposte dot net>
2006-11-06 13:42:10 +00:00
Stefan Farfeleder
2b8c9fa46b Drop two unnecessary casts. 2006-07-18 07:03:43 +00:00
Jeff Roberson
eb2ea10590 - Move softdep from using a global worklist to per-mount worklists. This
has many positive effects including improved smp locking, reducing
   interdependencies between mounts that can lead to deadlocks, etc.
 - Add the softdep worklist and various counters to the ufsmnt structure.
 - Add a mount pointer to the workitem and remove mount pointers from the
   various structures derived from the workitem as they are now redundant.
 - Remove the poor-man's semaphore protecting softdep_process_worklist and
   softdep_flushworklist.  Several threads may now process the list
   simultaneously.
 - Add softdep_waitidle() to block the thread until all pending
   dependencies being operated on by other threads have been flushed.
 - Use softdep_waitidle() in unmount and snapshots to block either
   operation until the fs is stable.
 - Remove softdep worklist processing from the syncer and move it into the
   softdep_flush() thread.  This thread processes all softdep mounts
   once each second and when it is called via the new softdep_speedup()
   when there is a resource shortage.  This removes the softdep hook
   from the kernel and various hacks in header files to support it.

Reviewed by/Discussed with:	tegge, truckman, mckusick
Tested by:	kris
2006-03-02 05:50:23 +00:00
Paul Saab
e1cef62715 Rate limit filesystem full and out of inodes messages to once a
second.
2005-10-31 20:33:28 +00:00
Tor Egge
48c2ac4539 Avoid unintended VMIO on directories and symlinks due to leftover object
not having been destroyed.
2005-10-10 19:02:04 +00:00
Tor Egge
c73e9e9c7b Reinitialize v_type and v_op fields in case vnode has been reused without
reclamation.  If the vnode previously was a fifo then v_op would point to
ffs_fifoops[12] instead of the expected ffs_vnodeops[12], causing a panic at
the end of ffsext_strategy.
2005-10-09 19:06:34 +00:00
Don Lewis
448434c3c9 Initialize the inode i_flag field in ffs_valloc() to clean up any
stale flag bits left over from before the inode was recycled.

Without this change, a leftover IN_SPACECOUNTED flag could prevent
softdep_freefile() and softdep_releasefile() from incrementing
fs_pendinginodes.  Because handle_workitem_freefile() unconditionally
decrements fs_pendinginodes, a negative value could be reported at
file system unmount time with a message like:
        unmount pending error: blocks 0 files -3
The pending block count in fs_pendingblocks could also be negative
for similar reasons.  These errors can cause the data returned by
statfs() to be slightly incorrect.  Some other cleanup code in
softdep_releasefile() could also be incorrectly bypassed.

MFC after:	3 days
2005-10-03 21:57:43 +00:00
Robert Watson
5f419982c2 Back out alpha/alpha/trap.c:1.124, osf1_ioctl.c:1.14, osf1_misc.c:1.57,
osf1_signal.c:1.41, amd64/amd64/trap.c:1.291, linux_socket.c:1.60,
svr4_fcntl.c:1.36, svr4_ioctl.c:1.23, svr4_ipc.c:1.18, svr4_misc.c:1.81,
svr4_signal.c:1.34, svr4_stat.c:1.21, svr4_stream.c:1.55,
svr4_termios.c:1.13, svr4_ttold.c:1.15, svr4_util.h:1.10,
ext2_alloc.c:1.43, i386/i386/trap.c:1.279, vm86.c:1.58,
unaligned.c:1.12, imgact_elf.c:1.164, ffs_alloc.c:1.133:

Now that Giant is acquired in uprintf() and tprintf(), the caller no
longer leads to acquire Giant unless it also holds another mutex that
would generate a lock order reversal when calling into these functions.
Specifically not backed out is the acquisition of Giant in nfs_socket.c
and rpcclnt.c, where local mutexes are held and would otherwise violate
the lock order with Giant.

This aligns this code more with the eventual locking of ttys.

Suggested by:	bde
2005-09-28 07:03:03 +00:00
Robert Watson
84d2b7df26 Add GIANT_REQUIRED and WITNESS sleep warnings to uprintf() and tprintf(),
as they both interact with the tty code (!MPSAFE) and may sleep if the
tty buffer is full (per comment).

Modify all consumers of uprintf() and tprintf() to hold Giant around
calls into these functions.  In most cases, this means adding an
acquisition of Giant immediately around the function.  In some cases
(nfs_timer()), it means acquiring Giant higher up in the callout.

With these changes, UFS no longer panics on SMP when either blocks are
exhausted or inodes are exhausted under load due to races in the tty
code when running without Giant.

NB: Some reduction in calls to uprintf() in the svr4 code is probably
desirable.

NB: In the case of nfs_timer(), calling uprintf() while holding a mutex,
or even in a callout at all, is a bad idea, and will generate warnings
and potential upset.  This needs to be fixed, but was a problem before
this change.

NB: uprintf()/tprintf() sleeping is generally a bad ideas, as is having
non-MPSAFE tty code.

MFC after:	1 week
2005-09-19 16:51:43 +00:00
Xin LI
a16baf37b9 The recomputation of file system summary at mount time can be a
very slow process, especially for large file systems that is just
recovered from a crash.

Since the summary is already re-sync'ed every 30 second, we will
not lag behind too much after a crash.  With this consideration
in mind, it is more reasonable to transfer the responsibility to
background fsck, to reduce the delay after a crash.

Add a new sysctl variable, vfs.ffs.compute_summary_at_mount, to
control this behavior.  When set to nonzero, we will get the
"old" behavior, that the summary is computed immediately at mount
time.

Add five new sysctl variables to adjust ndir, nbfree, nifree,
nffree and numclusters respectively.  Teach fsck_ffs about these
API, however, intentionally not to check the existence, since
kernels without these sysctls must have recomputed the summary
and hence no adjustments are necessary.

This change has eliminated the usual tens of minutes of delay of
mounting large dirty volumes.

Reviewed by:	mckusick
MFC After:	1 week
2005-02-20 08:02:15 +00:00
Poul-Henning Kamp
adf4157738 Make a some SYSCTL_NODEs and some of FFS's VFS_ methods static. 2005-02-10 12:20:08 +00:00
Poul-Henning Kamp
efd6d9808c Don't use the UFS_* and VFS_* functions where a direct call is possble.
The UFS_ functions are for UFS to call back into VFS.  The VFS functions
are external entry points into the filesystem.
2005-02-08 17:40:01 +00:00
Jeff Roberson
8e37fbad3a - Don't use atomic operations to deal with the active array, instead
it is now quite naturally protected by the ufsmount mutex.
 - Use the ufs lock to protect various fields in struct fs, primarily the
   cg summary needs protection to avoid allocation races.  Several
   functions have been slightly re-arranged to reduce the number of
   lock operations.
 - Adjust several functions (blkfree, freefile, etc.) to accept a
   ufsmount as an argument so that we may access the ufs lock.

Sponsored By:	Isilon Systems, Inc.
2005-01-24 10:08:35 +00:00
Warner Losh
60727d8b86 /* -> /*- for license, minor formatting changes 2005-01-07 02:29:27 +00:00
Kirk McKusick
364ed814e7 Fixes a bug that caused UFS2 filesystems bigger than 2TB to
prematurely report that they were full and/or to panic the kernel
with the message ``ffs_clusteralloc: allocated out of group''.

Submitted by:	Henry Whincup <henry@jot.to>
MFC after:	1 week
2004-12-09 21:24:00 +00:00
Poul-Henning Kamp
4392001125 Move UFS from DEVFS backing to GEOM backing.
This eliminates a bunch of vnode overhead (approx 1-2 % speed
improvement) and gives us more control over the access to the storage
device.

Access counts on the underlying device are not correctly tracked and
therefore it is possible to read-only mount the same disk device multiple
times:
	syv# mount -p
	/dev/md0        /var    ufs rw  2 2
	/dev/ad0        /mnt    ufs ro  1 1
	/dev/ad0        /mnt2   ufs ro  1 1
	/dev/ad0        /mnt3   ufs ro  1 1

Since UFS/FFS is not a synchrousely consistent filesystem (ie: it caches
things in RAM) this is not possible with read-write mounts, and the system
will correctly reject this.

Details:

	Add a geom consumer and a bufobj pointer to ufsmount.

	Eliminate the vnode argument from softdep_disk_prewrite().
	Pick the vnode out of bp->b_vp for now.  Eventually we
	should find it through bp->b_bufobj->b_private.

	In the mountcode, use g_vfs_open() once we have used
	VOP_ACCESS() to check permissions.

	When upgrading and downgrading between r/o and r/w do the
	right thing with GEOM access counts.  Remove all the
	workarounds for not being able to do this with VOP_OPEN().

	If we are the root mount, drop the exclusive access count
	until we upgrade to r/w.  This allows fsck of the root
	filesystem and the MNT_RELOAD to work correctly.

	Set bo_private to the GEOM consumer on the device bufobj.

	Change the ffs_ops->strategy function to call g_vfs_strategy()

	In ufs_strategy() directly call the strategy on the disk
	bufobj.  Same in rawread.

	In ffs_fsync() we will no longer see VCHR device nodes, so
	remove code which synced the filesystem mounted on it, in
	case we came there.  I'm not sure this code made sense in
	the first place since we would have taken the specfs route
	on such a vnode.

	Redo the highly bogus readblock() function in the snapshot
	code to something slightly less bogus: Constructing an uio
	and using physio was really quite a detour.  Instead just
	fill in a bio and ship it down.
2004-10-29 10:15:56 +00:00
Robert Watson
60c9762920 Explicitly break out NETA license from Berkeley license to clearly
indicate license grant, as well as to indicate that NETA is asserting
only two clauses, not four clauses.

Requested by:	imp
2004-10-20 08:05:02 +00:00
Poul-Henning Kamp
883d3c0c07 Remove the buffercache/vnode side of BIO_DELETE processing in
preparation for integration of p4::phk_bufwork.  In the future,
local filesystems will talk to GEOM directly and they will consequently
be able to issue BIO_DELETE directly.  Since the removal of the fla
driver, BIO_DELETE has effectively been a no-op anyway.
2004-09-13 06:50:42 +00:00
Alexander Kabaev
b403319b8d Avoid using casts as lvalues. Introduce DIP_SET macro which sets proper
inode field based on UFS version. Use DIP ro read values and DIP_SET
to modify them throughout FFS code base.
2004-07-28 06:41:27 +00:00
Colin Percival
56f21b9d74 Rename suser_cred()'s PRISON_ROOT flag to SUSER_ALLOWJAIL. This is
somewhat clearer, but more importantly allows for a consistent naming
scheme for suser_cred flags.

The old name is still defined, but will be removed in a few days (unless I
hear any complaints...)

Discussed with:	rwatson, scottl
Requested by:	jhb
2004-07-26 07:24:04 +00:00
Poul-Henning Kamp
89c9c53da0 Do the dreaded s/dev_t/struct cdev */
Bump __FreeBSD_version accordingly.
2004-06-16 09:47:26 +00:00