Commit Graph

675 Commits

Author SHA1 Message Date
Alexander Kabaev
45d45c6cde Use VOP_UNLOCK/vrele instead of vput. td was erecived as a parameter
and one cannot be sure it is equal to curthread.
2003-11-03 04:46:19 +00:00
Alexander Kabaev
cb9ddc80ae Take care not to call vput if thread used in corresponding vget
wasn't curthread, i.e. when we receive a thread pointer to use
as a function argument. Use VOP_UNLOCK/vrele in these cases.

The only case there td != curthread known at the moment is
boot() calling sync with thread0 pointer.

This fixes the panic on shutdown people have reported.
2003-11-02 04:52:53 +00:00
Alexander Kabaev
492c1e68fb Temporarily undo parts of the stuct mount locking commit by jeff.
It is unsafe to hold a mutex across vput/vrele calls.

This will be redone when a better locking strategy is agreed upon.

Discussed with: jeff
2003-11-01 05:51:54 +00:00
Don Lewis
9f206707a5 Tweak the calculation of minbfree in ffs_dirpref() so that only
those cylinder groups that have at least 75% of the average free
space per cylinder group for that file system are considered as
candidates for the creation of a new directory.  The previous formula
for minbfree would set it to zero if the file system was more than
75% full, which allowed cylinder groups with no free space at all
to be chosen as candidates for directory creation, which resulted
in an expensive search for free blocks for each file that was
subsequently created in that directory.

Modify the calculation of minifree in the same way.

Decrease maxcontigdirs as the file system fills to decrease the
likelyhood that a cluster of directories will overflow the available
space in a cylinder group.

Reviewed by:	mckusick
Tested by:	kmarx@vicor.com
MFC after:	2 weeks
2003-10-31 07:25:06 +00:00
John Baldwin
787f162df6 Move the P_COWINPROGRESS flag from being a per-process p_flag to being a
per-thread td_pflag which doesn't require any locks to read or write as it
is only read or written by curthread on itself.

Glanced at by:	mckusick
2003-10-23 21:14:08 +00:00
Tor Egge
f0da6ec99b Initialize bp->b_offset to the physical offset in partition
so GEOM knows where to read from disk.
2003-10-22 18:57:59 +00:00
Poul-Henning Kamp
2c18019f14 DuH!
bp->b_iooffset (the spot on the disk), not bp->b_offset (the offset in
the file)
2003-10-18 14:10:28 +00:00
Poul-Henning Kamp
4e1694ecaf Initialize bp->b_offset before calling VOP_[SPEC]STRATEGY() 2003-10-18 11:16:33 +00:00
Kirk McKusick
bd189c8c3e When expunging unlinked files from a snapshot, skip over holes in the
file rather than panicing with "indiracct: botched params".

Submitted by:	Mark Santcroos <marks@ripe.net>
2003-10-17 13:57:58 +00:00
Jeff Roberson
a844eb934c - My last commit to this file is still not safe, I believe that it may be
due to the recursion in indir_trunc().
2003-10-06 03:28:03 +00:00
Jeff Roberson
8af6a57099 - Reinstate 1.142 this was fixed by 1.144. 2003-10-06 02:39:37 +00:00
Jeff Roberson
69b609a85d - The VCHR case in ffs_sync() is an unneccsary optimization especially
considering how infrequently we access devices via ffs now that we have
   devfs.   Collapse this case with the other case.

Obtained from:	bde
2003-10-05 22:56:33 +00:00
Jeff Roberson
ab1f917b53 - Further simplify ffs_sync(). The vnode lock is required for UFS_UPDATE()
so make the code slightly more uniform.  The vnode lock is acquired in
   all cases and now the only difference between VCHR and other is we
   call UFS_UPDATE instead of VOP_FSYNC().
2003-10-05 09:42:24 +00:00
Jeff Roberson
cffa37d466 - In ffs_update() assert that either the vnode lock or the XLOCK is held. 2003-10-05 09:39:02 +00:00
Jeff Roberson
2f05568aa8 - Check the XLOCK before inspecting v_data.
- Slightly rewrite the fsync loop to be more lock friendly.  We must
   acquire the vnode interlock before dropping the mnt lock.  We must
   also check XLOCK to prevent vclean() races.
 - Use LK_INTERLOCK in the vget() in ffs_sync to further prevent vclean()
   races.
 - Use a local variable to store the results of the nvp == TAILQ_NEXT
   test so that we do not access the vp after we've vrele()d it.
 - Add an XXX comment about UFS_UPDATE() not being protected by any lock
   here.  I suspect that it should need the VOP lock.
2003-10-05 07:16:45 +00:00
Jeff Roberson
53938b4a86 - Skip over xvp if XLOCK is set. 2003-10-05 06:48:37 +00:00
Alan Cox
ccf78b6895 Synchronize access to a vm page's valid field using the containing
vm object's lock.
2003-10-04 20:38:32 +00:00
Jeff Roberson
cac3558da3 - The VI assert in getdirtybuf() is only valid if we're not on a VCHR
vnode.  VCHR vnodes don't do background writes.

Reported by:	kan
2003-10-04 15:57:05 +00:00
Jeff Roberson
04a17687ea - Increase the scope of the interlock in ffs_reload(). Acquire it before
we release the mntvnode_mtx.
 - Call vgonel() directly instead of going through vrecycle() since we own
   the interlock now.
 - Remove a few cases where we locked the interlock just so that we could
   call VOP_UNLOCK with interlock held.
2003-10-04 14:27:49 +00:00
Jeff Roberson
934914d2ef - Fix an unlocked call to GETATTR by slightly shuffling the code in
ffs_snapshot() around.
 - Acquire the interlock before releasing the mntvnode_mtx.  Use the
   interlock to protect v_usecount access.
2003-10-04 14:25:45 +00:00
Jeff Roberson
04c81ad83c - Remove a mp_fixme() and some locks that weren't necessary. I now
understand how this works.
2003-10-04 11:06:43 +00:00
Jeff Roberson
cfd5600c66 - Several of the callers to getdirtybuf() were erroneously changed to pass
in a list head instead of a pointer to the first element at the time of
   the first call.  These lists are subject to change, and getdirtybuf()
   would refetch from the wrong list in some cases.

Spottedy by:	tegge
Pointy hat to:	me
2003-09-03 04:08:15 +00:00
Jeff Roberson
23efe6dafc - Backout rev 1.142. This caused a deadlock that I do not understand. More
investigation is required.
2003-08-31 11:26:52 +00:00
Jeff Roberson
d919a11d06 - Define a new flag for getblk(): GB_NOCREAT. This flag causes getblk() to
bail out if the buffer is not already present.
 - The buffer returned by incore() is not locked and should not be sent to
   brelse().  Use getblk() with the new GB_NOCREAT flag to preserve the
   desired semantics.
2003-08-31 08:50:11 +00:00
Jeff Roberson
a0ebaaddef - Don't acquire the vnode interlock in drain_output(). Instead, require the
caller to acquire it.  This permits drain_output() to be done atomically
   with other operations as well as reducing the number of lock operations.
 - Assert that the proper locks are held in drain_output().
 - Change getdirtybuf() to accept a mutex as an argument.  This mutex is used
   to protect the vnode's buf list and the BKGRDWAIT flag.  This lock is
   dropped when we successfully acquire a buffer and held on return
   otherwise.  These semantics reduce the number of cumbersome cases in
   calling code.
 - Pass the mtx from getdirtybuf() into interlocked_sleep() and allow this
   mutex to be used as the interlock argument to BUF_LOCK() in the LOCKBUF
   case of interlocked_sleep().
 - Change the return value of getdirtybuf() to be the resulting locked buffer
   or NULL otherwise.  This is for callers who pass in a list head that
   requires a lock.  It is necessary since the lock that protects the list
   head must be dropped in getdirtybuf() so that we don't have a lock order
   reversal with the buf queues lock in bremfree().
 - Adjust all callers of getdirtybuf() to match the new semantics.
 - Add a comment in indir_trunc() that points at unlocked access to a buf.
   This may also be one of the last instances of incore() in the tree.
2003-08-31 07:29:34 +00:00
Jeff Roberson
9dbfeb0ae6 - Move BX_BKGRDWAIT and BX_BKGRDINPROG to BV_ and the b_vflags field.
- Surround all accesses of the BKGRD{WAIT,INPROG} flags with the vnode
   interlock.
 - Don't use the B_LOCKED flag and QUEUE_LOCKED for background write
   buffers.  Check for the BKGRDINPROG flag before recycling or throwing
   away a buffer.  We do this instead because it is not safe for us to move
   the original buffer to a new queue from the callback on the background
   write buffer.
 - Remove the B_LOCKED flag and the locked buffer queue.  They are no longer
   used.
 - The vnode interlock is used around checks for BKGRDINPROG where it may
   not be strictly necessary.  If we hold the buf lock the a back-ground
   write will not be started without our knowledge, one may only be
   completed while we're not looking.  Rather than remove the code, Document
   two of the places where this extra locking is done.  A pass should be
   done to verify and minimize the locking later.
2003-08-28 06:55:18 +00:00
Alan Cox
9cf8f2f707 The previous change necessitates the addition of a new #include. Otherwise,
there is a compilation warning.
2003-08-18 17:27:08 +00:00
Poul-Henning Kamp
b103854847 Don't use a VOP_*() function on our own vnodes, go directly to the
relevant internal function, in this case ufs_bmaparray().
2003-08-17 19:26:03 +00:00
Alan Cox
f6c098e569 Revision 1.44 of ufs/ufs/inode.h has made it necessary to add two new
#includes to this file.  Otherwise, it doesn't compile.
2003-08-16 06:15:17 +00:00
Poul-Henning Kamp
5c24d6ee26 Eliminate the i_devvp field from the incore UFS inodes, we can
get the same value from ip->i_ump->um_devvp.

This saves a pointer in the memory copies of inodes, which can
easily run into several hundred kilobytes.

The extra indirection is unmeasurable in benchmarks.

Approved by:	mckusick
2003-08-15 20:03:19 +00:00
John Baldwin
8b149b5131 Consistently use the BSD u_int and u_short instead of the SYSV uint and
ushort.  In most of these files, there was a mixture of both styles and
this change just makes them self-consistent.

Requested by:	bde (kern_ktrace.c)
2003-08-07 15:04:27 +00:00
Robert Watson
9080ff25cf Rename VOP_RMEXTATTR() to VOP_DELETEEXTATTR() for consistency with the
kernel ACL interfaces and system call names.

Break out UFS2 and FFS extattr delete and list vnode operations from
setextattr and getextattr to deleteextattr and listextattr, which
cleans up the implementations, and makes the results more readable,
and makes the APIs more clear.

Obtained from:	TrustedBSD Project
Sponsored by:	DARPA, Network Associates Laboratories
2003-07-28 18:53:29 +00:00
Poul-Henning Kamp
a8d43c90af Add a "int fd" argument to VOP_OPEN() which in the future will
contain the filedescriptor number on opens from userland.

The index is used rather than a "struct file *" since it conveys a bit
more information, which may be useful to in particular fdescfs and /dev/fd/*

For now pass -1 all over the place.
2003-07-26 07:32:23 +00:00
Alan Cox
4e28b22e35 Lock the vm object when freeing pages. 2003-06-15 21:50:38 +00:00
Poul-Henning Kamp
cefb5754dd Add the same KASSERT to all VOP_STRATEGY and VOP_SPECSTRATEGY implementations
to check that the buffer points to the correct vnode.
2003-06-15 18:53:00 +00:00
Poul-Henning Kamp
7652131bee Initialize struct vfsops C99-sparsely.
Submitted by:   hmp
Reviewed by:	phk
2003-06-12 20:48:38 +00:00
David E. O'Brien
f4636c5959 Use __FBSDID(). 2003-06-11 06:34:30 +00:00
Robert Watson
1e9e2eb598 Implement ffs_listextattr() by breaking out that logic and special-cased
attribute name of "" from ffs_getextattr().  Invoking VOP_GETETATTR()
with an empty name is now no longer supported; user application
compatibility is provided by a system call level compatibility
wrapper.  We make sure to explicitly reject attempts to set an EA
with the name "".

Obtained from:	TrustedBSD Project
Sponsored by:	DARPA, Network Associates Laboratories
2003-06-05 05:57:39 +00:00
Robert Watson
e1249def7d Return EOPNOTSUPP for attempted EA operations on VCHR vnodes in UFS2;
if we permit them to occur, the kernel panics due to our performing
EA operations using VOP_STRATEGY on the vnode.  This went unnoticed
previously because there are very for users of device nodes on UFS2
due to the introduction of devfs.  However, this can come up with
the Linux compat directories and its hard-coded dev nodes (which will
need to go away as we move away from hard-coded device numbers).
This can come up if you use EA-intensive features such as ACLs and
MAC.

The proper fix is pretty complicated, but this band-aid would be
an excellent MFC candidate for the release.
2003-06-01 02:42:18 +00:00
Poul-Henning Kamp
6280ed26af Remove unused local variables.
Found by:       FlexeLint
2003-05-31 18:17:32 +00:00
Poul-Henning Kamp
17a1391990 The IO_NOWDRAIN and B_NOWDRAIN hacks are no longer needed to prevent
deadlocks with vnode backed md(4) devices because md now uses a
kthread to run the bio requests instead of doing it directly from
the bio down path.
2003-05-31 16:42:45 +00:00
Alan Cox
7f758dabbb Lock the vm object when performing vm_object_page_clean().
Approved by:	re (rwatson)
2003-05-18 22:02:51 +00:00
Alan Cox
ad682c4825 Lock the vm_object on entry to vm_object_vndeallocate(). 2003-05-03 20:28:26 +00:00
Tim J. Robbins
3632928957 Do not attempt to free NULL dinodes (i_din1 or i_din2) in ffs_ifree().
These fields can be left as NULL if ffs_vget() allocates an inode but
fails before the dinode memory has been allocated. There are two cases
when this can occur: when we lose a race and another process has added
the inode to the hash, and when reading the inode off disk fails.

The bug was observed by Kris on one of the package-building machines.
See http://marc.theaimsgroup.com/?l=freebsd-current&m=105172731013411&w=2
In Kris's case, it was the bread() that failed because of a disk error.

The alternative to this patch is to ensure that ffs_vget() does not call
vput() when the inode that hasn't been properly initialised.
2003-05-01 06:41:59 +00:00
Tim J. Robbins
8d721e877d Free i_din2 instead of i_din1 in ffs_ifree() on UFS2 filesystems.
This is purely a cosmetic change because these members are in a
union together.
2003-05-01 06:38:27 +00:00
Mark Murray
51da11a27a Fix some easy, global, lint warnings. In most cases, this means
making some local variables static. In a couple of cases, this means
removing an unused variable.
2003-04-30 12:57:40 +00:00
Alexander Kabaev
104a9b7e3e Deprecate machine/limits.h in favor of new sys/limits.h.
Change all in-tree consumers to include <sys/limits.h>

Discussed on:	standards@
Partially submitted by: Craig Rodrigues <rodrigc@attbi.com>
2003-04-29 13:36:06 +00:00
John Baldwin
a15cc35909 Lock both the proc lock and sched_lock when calling sched_nice since
kg_nice is now protected by both.  Being protected by both means that
other places in the kernel that want to read kg_nice only need one of the
two locks.
2003-04-22 20:45:38 +00:00
Jeff Roberson
86711bae9b - Use the sched_nice() api instead of setting the nice value directly.
Tested by:	Steve Kargl <sgk@troutmask.apl.washington.edu>
2003-04-12 01:05:19 +00:00
Alan Cox
6134838f99 Sufficient access checks are performed by vmapbuf() that calling useracc()
is pointless.  Remove the call to useracc().

Don't reinitialize fields that are already initialized by getpbuf().

Reviewed by:	tegge
2003-04-06 19:26:30 +00:00
Tor Egge
5e2e6a67c4 Check return value from vmapbuf instead of the function address. 2003-03-27 20:48:34 +00:00
Tor Egge
10dccf8ff2 Eliminate a buffer sleep/wakeup race. 2003-03-27 19:28:11 +00:00
Tor Egge
5bbb806004 Add support for reading directly from file to userland buffer when the
O_DIRECT descriptor status flag is set and both offset and length is a
multiple of the physical media sector size.
2003-03-26 23:40:42 +00:00
John Baldwin
31566c96f4 Use td->td_ucred instead of td->td_proc->p_ucred. 2003-03-20 21:17:40 +00:00
John Baldwin
2a53bfbe62 Minor fixes to ffs_fserr():
- Assume that curthread is not NULL.  It never is in -current.
- Use td_ucred instead of p_ucred.
2003-03-20 21:15:54 +00:00
Poul-Henning Kamp
b4b138c27f Including <sys/stdint.h> is (almost?) universally only to be able to use
%j in printfs, so put a newsted include in <sys/systm.h> where the printf
prototype lives and save everybody else the trouble.
2003-03-18 08:45:25 +00:00
Jeff Roberson
09f11da5a3 - Remove a race between fsync like functions and flushbufqueues() by
requiring locked bufs in vfs_bio_awrite().  Previously the buf could
   have been written out by fsync before we acquired the buf lock if it
   weren't for giant.  The cluster_wbuild() handles this race properly but
   the single write at the end of vfs_bio_awrite() would not.
 - Modify flushbufqueues() so there is only one copy of the loop.  Pass a
   parameter in that says whether or not we should sync bufs with deps.
 - Call flushbufqueues() a second time and then break if we couldn't find
   any bufs without deps.
2003-03-13 07:19:23 +00:00
Kirk McKusick
34968037b1 Use the appropriate size when zeroing out the unused portion
of a snapshot's copy of a superblock. This patch fixes a panic
when taking a snapshot of a 4096/512 filesystem.

Reported by:	Ian Freislich <ianf@za.uu.net>
Sponsored by:   DARPA & NAI Labs.
2003-03-07 23:49:16 +00:00
Alan Cox
09c80124a3 Remove ENABLE_VFS_IOOPT. It is a long unfinished work-in-progress.
Discussed on:	arch@
2003-03-06 03:41:02 +00:00
Jeff Roberson
7261f5f68e - Add a new 'flags' parameter to getblk().
- Define one flag GB_LOCK_NOWAIT that tells getblk() to pass the LK_NOWAIT
   flag to the initial BUF_LOCK().  This will eventually be used in cases
   were we want to use a buffer only if it is not currently in use.
 - Convert all consumers of the getblk() api to use this extra parameter.

Reviwed by:	arch
Not objected to by:	mckusick
2003-03-04 00:04:44 +00:00
Dag-Erling Smørgrav
521f364b80 More low-hanging fruit: kill caddr_t in calls to wakeup(9) / [mt]sleep(9). 2003-03-02 16:54:40 +00:00
Kirk McKusick
74f3809a19 Change the field used to test whether the superblock has been updated
from the filesystem size field to the filesystem maximum blocksize
field. The problem is that older versions of growfs updated only the
new size field and not the old size field. This resulted in the old
(smaller) size field being copied up to the new size field which
caused the filesystem to appear to fsck to be badly trashed.

This also adds a sanity check to ensure that the superblock is not
being updated when the filesystem is mounted read-only. Obviously
such an update should never happen.

Reported by:	Nate Lawson <nate@root.org>
Sponsored by:   DARPA & NAI Labs.
2003-02-25 23:21:08 +00:00
Jeff Roberson
17661e5ac4 - Add an interlock argument to BUF_LOCK and BUF_TIMELOCK.
- Remove the buftimelock mutex and acquire the buf's interlock to protect
   these fields instead.
 - Hold the vnode interlock while locking bufs on the clean/dirty queues.
   This reduces some cases from one BUF_LOCK with a LK_NOWAIT and another
   BUF_LOCK with a LK_TIMEFAIL to a single lock.

Reviewed by:	arch, mckusick
2003-02-25 03:37:48 +00:00
Kirk McKusick
3bf0ed940b When removing the last item from a non-empty worklist, the worklist
tail pointer must be updated.

Reported by:	Kris Kennaway <kris@obsecurity.org>
Sponsored by:   DARPA & NAI Labs.
2003-02-24 07:28:41 +00:00
Kirk McKusick
5bb651cb72 This patch fixes a deadlock between the bufdaemon and a process taking
a snapshot. As part of taking a snapshot of a filesystem, the kernel
builds up a list of the filesystem metadata (such as the cylinder
group bitmaps) that are contained in the snapshot. When doing a
copy-on-write check, the list is first consulted. If the block being
written is found on the list, then the full snapshot lookup can be
avoided. Besides providing an important performance speedup this
check also avoids a potential deadlock between the code creating
the snapshot and the bufdaemon trying to cleanup snapshot related
buffers. This fix creates a temporary list containing the key
metadata blocks that can cause the deadlock. This temporary list
is used between the time that the snapshot is first enabled and the
time that the fully complete list is built.

Reported by:	Attila Nagy <bra@fsn.hu>
Sponsored by:   DARPA & NAI Labs.
2003-02-22 00:59:34 +00:00
Kirk McKusick
37e2ebfdba This patch fixes a bug on an active filesystem on which a snapshot
is being taken from panicing with either "freeing free block" or
"freeing free inode". The problem arises when the snapshot code
is scanning the filesystem looking for inodes with a reference
count of zero (e.g., unlinked but still open) so that it can
expunge them from its view. If it encounters a reclaimed vnode
and has to restart its scan, then it will panic if it encounters
and tries to free an inode that it has already processed. The fix
is to check each candidate inode to see if it has already been
processed before trying to delete it from the snapshot image.

Sponsored by:   DARPA & NAI Labs.
2003-02-22 00:29:51 +00:00
Kirk McKusick
d60682c239 This patch fixes a bug in the logical block calculation macros so
that they convert to 64-bit values before shifting rather than
afterwards. Once fixed, they can be used rather than inline expanded.

Sponsored by:   DARPA & NAI Labs.
2003-02-22 00:19:26 +00:00
Warner Losh
a163d034fa Back out M_* changes, per decision of the TRB.
Approved by: trb
2003-02-19 05:47:46 +00:00
Kirk McKusick
aca3e4974f Replace use of random() with arc4random() to provide less guessable
values for the initial inode generation numbers in newfs and for
newly allocated inode generation numbers in the kernel.

Submitted by:	Theo de Raadt <deraadt@cvs.openbsd.org>
Sponsored by:   DARPA & NAI Labs.
2003-02-14 21:31:58 +00:00
Kirk McKusick
50bd54e391 Correct lines incorrectly added to the copyright message.
Submitted by:	Frank van der Linden <fvdl@wasabisystems.com>
Sponsored by:   DARPA & NAI Labs.
2003-02-14 00:31:06 +00:00
Jeff Roberson
767b9a529d - Cleanup unlocked accesses to buf flags by introducing a new b_vflag member
that is protected by the vnode lock.
 - Move B_SCANNED into b_vflags and call it BV_SCANNED.
 - Create a vop_stdfsync() modeled after spec's sync.
 - Replace spec_fsync, msdos_fsync, and hpfs_fsync with the stdfsync and some
   fs specific processing.  This gives all of these filesystems proper
   behavior wrt MNT_WAIT/NOWAIT and the use of the B_SCANNED flag.
 - Annotate the locking in buf.h
2003-02-09 11:28:35 +00:00
Alfred Perlstein
44956c9863 Remove M_TRYWAIT/M_WAITOK/M_WAIT. Callers should use 0.
Merge M_NOWAIT/M_DONTWAIT into a single flag M_NOWAIT.
2003-01-21 08:56:16 +00:00
Matthew Dillon
48e3128b34 Bow to the whining masses and change a union back into void *. Retain
removal of unnecessary casts and throw in some minor cleanups to see if
anyone complains, just for the hell of it.
2003-01-13 00:33:17 +00:00
Matthew Dillon
cd72f2180b Change struct file f_data to un_data, a union of the correct struct
pointer types, and remove a huge number of casts from code using it.

Change struct xfile xf_data to xun_data (ABI is still compatible).

If we need to add a #define for f_data and xf_data we can, but I don't
think it will be necessary.  There are no operational changes in this
commit.
2003-01-12 01:37:13 +00:00
Marcel Moolenaar
cc4a858397 o Improve wording of the comment that accompanies fs_pad. The
padding is not specific to non-i386 architectures. It is
   caused by non-i386 specific alignment requirements of
   fs_swuid,
o  Add a CTASSERT to catch a change in the size of struct fs
   at compile-time rather than run-time.

Ok'd: gordon
Tested on: i386 ia64
2003-01-10 06:59:34 +00:00
Gordon Tetlow
963cae780f Fix superblock alignment problems on non-i386 platforms. Also change fs_uuid
to fs_swuid, making it more descriptive.

Submitted by:	marcel
Reviewed by:	peter
Pointy hat to:	gordon
2003-01-09 23:53:30 +00:00
Gordon Tetlow
291871da9e Steal some space from fs_fsmnt to create fs_volname and fs_uuid. The volname
will be used to support volume names with the help of a GEOM module (to be
committed). uuid will be used to deal with conflicting volume names (which
doesn't work just yet).

Approved by:	mckusick@
2003-01-08 22:53:54 +00:00
Kirk McKusick
fa06a012cd This patch fixes a problem caused by applications that rapidly and
repeatedly truncate the same file. Each time the file is truncated,
a buffer is grabbed to store the indirect block numbers that need
to be freed. Those blocks cannot be freed until the inode claiming
them is written to disk. Thus, the number of buffers being held by
soft updates explodes and in extreme cases can run the kernel out
of buffers. The problem can be avoided by doing an fsync on the
file every debug.maxindirdep truncates (currently defaulted to 50).
The fsync causes the inode to be written so that the held buffers
can be freed. The check for excessive buffers is checked as part
of the existing hook for excessive dependencies (softdep_slowdown)
in the truncate code.

Reported by:	David Schultz <dschultz@uclink.Berkeley.EDU>
Sponsored by:   DARPA & NAI Labs.
MFC after:	3 weeks
2003-01-07 18:23:50 +00:00
Poul-Henning Kamp
862702306b Convert calls to BUF_STRATEGY to VOP_STRATEGY calls. This is a no-op since
all BUF_STRATEGY did in the first place was call VOP_STRATEGY.
2003-01-03 06:32:15 +00:00
Jens Schweikhardt
9d5abbddbf Correct typos, mostly s/ a / an / where appropriate. Some whitespace cleanup,
especially in troff files.
2003-01-01 18:49:04 +00:00
Alfred Perlstein
13438f6823 When compiling the kernel do not implicitly include filedesc.h from proc.h,
this was causing filedesc work to be very painful.
In order to make this work split out sigio definitions to thier own header
(sigio.h) which is included from proc.h for the time being.
2003-01-01 01:56:19 +00:00
Poul-Henning Kamp
aa4d7a8a4b Use three UMA zones for FFS/UFS inodes instead of malloc space.
Since inodes are currently 144 bytes, this will save 112 bytes per
inode.  This can amount to up to 10MByte on large systems.
2002-12-27 11:05:05 +00:00
Poul-Henning Kamp
de6ba7c016 Move the allocation of the inode contents into ffs_vfsops.c rather than
passing malloc types around.
2002-12-27 10:23:03 +00:00
Poul-Henning Kamp
975512a907 Make ffs_mountfs() static.
Remove the malloctype from the ufs mount structure, instead add a callback
to the storage method for freeing inodes: UFS_IFREE().

Add vfs_ifree() method function which frees an inode.

Unvariablelize the malloc type used for allocating inodes.
2002-12-27 10:06:37 +00:00
Kirk McKusick
4c572f6222 Fix corruption introduced in previous delta.
Reported by:	Aurelien Nephtali <aurelien.nephtali@wanadoo.fr>
Sponsored by:   DARPA & NAI Labs.
2002-12-18 19:50:28 +00:00
Kirk McKusick
6d967351b4 Keep comments consistent with the code. Minor optimization.
Sponsored by:   DARPA & NAI Labs.
2002-12-18 07:19:41 +00:00
Kirk McKusick
c021e44776 Cosmetic cleanup of unsigned buglets.
Submitted by:	Bruce Evans <bde@zeta.org.au>
Sponsored by:   DARPA & NAI Labs.
2002-12-18 00:53:45 +00:00
Poul-Henning Kamp
120a6d842a Remove unused lockcnt variable.
Approved by:	mckusick
2002-12-17 20:23:51 +00:00
Kirk McKusick
8efcd9a794 Update to previous change (1.54) to use an approperly wide inode field
so as to work correctly on 64-bit platforms.

Reported-by:	Jake Burkholder <jake@locore.ca>
Sponsored by:   DARPA & NAI Labs.
Approved by:	Ian Dowse <iedowse@maths.tcd.ie>
2002-12-15 19:25:59 +00:00
Kirk McKusick
0db138a6b0 Only the most recent snapshot contains the complete list of blocks
that were copied in all of the earlier snapshots, thus its precomputed
list must be used in the copyonwrite test. Using incomplete lists may
lead to deadlock. Also do not include the blocks used for the indirect
pointers in the indirect pointers as this may lead to inconsistent
snapshots.

Sponsored by:   DARPA & NAI Labs.
Approved by:	re
2002-12-14 01:36:59 +00:00
Tom Rhodes
1626155b82 Remove the comment about dump(8) not working properly with snapshots.
Discussed with:	mckusick
Approved by:	re (rwatson)
2002-12-12 00:31:45 +00:00
Kirk McKusick
8d6754f289 More tightly verify the preference returned for the new inode.
Submitted by:	Kris Kennaway <kris@obsecurity.org>
Sponsored by:   DARPA & NAI Labs.
Approved by:	re
2002-12-06 02:08:46 +00:00
Kirk McKusick
0cb652d925 Have to use bread() rather than UFS_BALLOC() when obtaining a
previously allocated block as the previous use of the block may
have fallen out of the cache. Failure to reread its contents cause
zeroed results to be written instead of the proper contents.
Conversely, when the block is going to be entirely filled in, it
is not necessary reread the old contents.

Sponsored by:   DARPA & NAI Labs.
Approved by:	re
2002-12-03 18:19:27 +00:00
Kirk McKusick
31574422a3 Add a check to disable the previous patch so that future filesystems
that choose to place their superblocks in non-standard locations will
not get them smashed.

Sponsored by:   DARPA & NAI Labs.
2002-11-30 19:04:57 +00:00
Kirk McKusick
c6964d3bc9 Remove a race condition / deadlock from snapshots. When
converting from individual vnode locks to the snapshot
lock, be sure to pass any waiting processes along to the
new lock as well. This transfer is done by a new function
in the lock manager, transferlockers(from_lock, to_lock);
Thanks to Lamont Granquist <lamont@scriptkiddie.org> for
his help in pounding on snapshots beyond all reason and
finding this deadlock.

Sponsored by:   DARPA & NAI Labs.
2002-11-30 19:00:51 +00:00
Kirk McKusick
63cf5b0ee2 Fix two deadlocks in snapshots:
1) Release the snapshot file lock while suspending the system. Otherwise
   a process trying to read the lock may block on its containing directory
   preventing the suspension from completing. Thanks to Sean Kelly
   <smkelly@zombie.org> for finding this deadlock.

2) Replace some bdwrite's with bawrite's so as not to fill all the
   buffers with dirty data. The buffers could not be cleaned as the
   snapshot vnode was locked hence the system could deadlock when
   making snapshots of really massive filesystems. Thanks to
   Hidetoshi Shimokawa <simokawa@sat.t.u-tokyo.ac.jp> for figuring
   this out.

Sponsored by:   DARPA & NAI Labs.
2002-11-30 07:27:12 +00:00
Kirk McKusick
fa5d33e242 Check to make sure that the fs_sblockloc field was properly updated
before using it to write the superblock. This is to guard against
accidentally trashing the disklabel if the superblock format missed
being upgraded by the new kernel.

Reported by:	Sam Leffler <sam@errno.com>
Sponsored by:   DARPA & NAI Labs.
Approved by:	Murray Stokely <murray@FreeBSD.org>
2002-11-29 19:20:15 +00:00
Kirk McKusick
ada981b228 Create a new 32-bit fs_flags word in the superblock. Add code to move
the old 8-bit fs_old_flags to the new location the first time that the
filesystem is mounted by a new kernel. One of the unused flags in
fs_old_flags is used to indicate that the flags have been moved.
Leave the fs_old_flags word intact so that it will work properly if
used on an old kernel.

Change the fs_sblockloc superblock location field to be in units
of bytes instead of in units of filesystem fragments. The old units
did not work properly when the fragment size exceeeded the superblock
size (8192). Update old fs_sblockloc values at the same time that
the flags are moved.

Suggested by:	BOUWSMA Barry <freebsd-misuser@netscum.dyndns.dk>
Sponsored by:   DARPA & NAI Labs.
2002-11-27 02:18:58 +00:00
Kirk McKusick
f5235f70a4 The target for the maximum number of dependencies has been cut
in half because of reports that under heavy load the kernel could
exhaust its memory pool. The limit is now (desiredvnodes * 4)
rather than (desiredvnodes * 8), so it will still scale with
larger systems, just not as quickly.

Sponsored by:   DARPA & NAI Labs.
2002-11-20 05:16:11 +00:00
Kirk McKusick
3374bb5ad6 If an error occurs while writing a buffer, then the data will
not have hit the disk and the dependencies cannot be unrolled.
In this case, the system will mark the buffer as dirty again so
that the write can be retried in the future. When the write
succeeds or the system gives up on the buffer and marks it as
invalid (B_INVAL), the dependencies will be cleared.

Sponsored by:   DARPA & NAI Labs.
2002-11-20 05:14:16 +00:00