Commit Graph

1029 Commits

Author SHA1 Message Date
phk
157437ec08 Since Jeffr made the std* functions the default in rev 1.63 of
kern/vfs_defaults.c it is wrong for the individual filesystems to use
the std* functions as that prevents override of the default.

Found by:       src/tools/tools/vop_table
2003-01-04 08:47:19 +00:00
phk
daf6948653 Convert calls to BUF_STRATEGY to VOP_STRATEGY calls. This is a no-op since
all BUF_STRATEGY did in the first place was call VOP_STRATEGY.
2003-01-03 06:32:15 +00:00
schweikh
d3367c5f5d Correct typos, mostly s/ a / an / where appropriate. Some whitespace cleanup,
especially in troff files.
2003-01-01 18:49:04 +00:00
alfred
927595101c When compiling the kernel do not implicitly include filedesc.h from proc.h,
this was causing filedesc work to be very painful.
In order to make this work split out sigio definitions to thier own header
(sigio.h) which is included from proc.h for the time being.
2003-01-01 01:56:19 +00:00
phk
72aba11981 Use three UMA zones for FFS/UFS inodes instead of malloc space.
Since inodes are currently 144 bytes, this will save 112 bytes per
inode.  This can amount to up to 10MByte on large systems.
2002-12-27 11:05:05 +00:00
phk
3d9be4e20a Move the allocation of the inode contents into ffs_vfsops.c rather than
passing malloc types around.
2002-12-27 10:23:03 +00:00
phk
afd8ad09d7 Make ffs_mountfs() static.
Remove the malloctype from the ufs mount structure, instead add a callback
to the storage method for freeing inodes: UFS_IFREE().

Add vfs_ifree() method function which frees an inode.

Unvariablelize the malloc type used for allocating inodes.
2002-12-27 10:06:37 +00:00
mckusick
91195ac766 Fix corruption introduced in previous delta.
Reported by:	Aurelien Nephtali <aurelien.nephtali@wanadoo.fr>
Sponsored by:   DARPA & NAI Labs.
2002-12-18 19:50:28 +00:00
mckusick
f86b91ebe0 Keep comments consistent with the code. Minor optimization.
Sponsored by:   DARPA & NAI Labs.
2002-12-18 07:19:41 +00:00
mckusick
345fe7ca1f Cosmetic cleanup of unsigned buglets.
Submitted by:	Bruce Evans <bde@zeta.org.au>
Sponsored by:   DARPA & NAI Labs.
2002-12-18 00:53:45 +00:00
phk
723e3b218d Remove unused lockcnt variable.
Approved by:	mckusick
2002-12-17 20:23:51 +00:00
mckusick
668d95fdc4 Update to previous change (1.54) to use an approperly wide inode field
so as to work correctly on 64-bit platforms.

Reported-by:	Jake Burkholder <jake@locore.ca>
Sponsored by:   DARPA & NAI Labs.
Approved by:	Ian Dowse <iedowse@maths.tcd.ie>
2002-12-15 19:25:59 +00:00
iedowse
3bf3f7056c Undo the adjustment of the total memory used by dirhash in the case
where allocating the dirhash structure fails. Fix a few typos in
comments and update copyright.

MFC after:	1 week
2002-12-14 17:16:16 +00:00
mckusick
069ffa44b5 Only the most recent snapshot contains the complete list of blocks
that were copied in all of the earlier snapshots, thus its precomputed
list must be used in the copyonwrite test. Using incomplete lists may
lead to deadlock. Also do not include the blocks used for the indirect
pointers in the indirect pointers as this may lead to inconsistent
snapshots.

Sponsored by:   DARPA & NAI Labs.
Approved by:	re
2002-12-14 01:36:59 +00:00
trhodes
b45b4e9e9e Remove the comment about dump(8) not working properly with snapshots.
Discussed with:	mckusick
Approved by:	re (rwatson)
2002-12-12 00:31:45 +00:00
mckusick
461d1f1c94 More tightly verify the preference returned for the new inode.
Submitted by:	Kris Kennaway <kris@obsecurity.org>
Sponsored by:   DARPA & NAI Labs.
Approved by:	re
2002-12-06 02:08:46 +00:00
mckusick
6822be5fe2 Have to use bread() rather than UFS_BALLOC() when obtaining a
previously allocated block as the previous use of the block may
have fallen out of the cache. Failure to reread its contents cause
zeroed results to be written instead of the proper contents.
Conversely, when the block is going to be entirely filled in, it
is not necessary reread the old contents.

Sponsored by:   DARPA & NAI Labs.
Approved by:	re
2002-12-03 18:19:27 +00:00
mckusick
69c6a0dc68 Add a check to disable the previous patch so that future filesystems
that choose to place their superblocks in non-standard locations will
not get them smashed.

Sponsored by:   DARPA & NAI Labs.
2002-11-30 19:04:57 +00:00
mckusick
c1f52553a2 Remove a race condition / deadlock from snapshots. When
converting from individual vnode locks to the snapshot
lock, be sure to pass any waiting processes along to the
new lock as well. This transfer is done by a new function
in the lock manager, transferlockers(from_lock, to_lock);
Thanks to Lamont Granquist <lamont@scriptkiddie.org> for
his help in pounding on snapshots beyond all reason and
finding this deadlock.

Sponsored by:   DARPA & NAI Labs.
2002-11-30 19:00:51 +00:00
mckusick
d4a32db4ae Fix two deadlocks in snapshots:
1) Release the snapshot file lock while suspending the system. Otherwise
   a process trying to read the lock may block on its containing directory
   preventing the suspension from completing. Thanks to Sean Kelly
   <smkelly@zombie.org> for finding this deadlock.

2) Replace some bdwrite's with bawrite's so as not to fill all the
   buffers with dirty data. The buffers could not be cleaned as the
   snapshot vnode was locked hence the system could deadlock when
   making snapshots of really massive filesystems. Thanks to
   Hidetoshi Shimokawa <simokawa@sat.t.u-tokyo.ac.jp> for figuring
   this out.

Sponsored by:   DARPA & NAI Labs.
2002-11-30 07:27:12 +00:00
mckusick
78d15991e4 Check to make sure that the fs_sblockloc field was properly updated
before using it to write the superblock. This is to guard against
accidentally trashing the disklabel if the superblock format missed
being upgraded by the new kernel.

Reported by:	Sam Leffler <sam@errno.com>
Sponsored by:   DARPA & NAI Labs.
Approved by:	Murray Stokely <murray@FreeBSD.org>
2002-11-29 19:20:15 +00:00
mckusick
9251693096 Create a new 32-bit fs_flags word in the superblock. Add code to move
the old 8-bit fs_old_flags to the new location the first time that the
filesystem is mounted by a new kernel. One of the unused flags in
fs_old_flags is used to indicate that the flags have been moved.
Leave the fs_old_flags word intact so that it will work properly if
used on an old kernel.

Change the fs_sblockloc superblock location field to be in units
of bytes instead of in units of filesystem fragments. The old units
did not work properly when the fragment size exceeeded the superblock
size (8192). Update old fs_sblockloc values at the same time that
the flags are moved.

Suggested by:	BOUWSMA Barry <freebsd-misuser@netscum.dyndns.dk>
Sponsored by:   DARPA & NAI Labs.
2002-11-27 02:18:58 +00:00
mckusick
a51ebc9dc0 The target for the maximum number of dependencies has been cut
in half because of reports that under heavy load the kernel could
exhaust its memory pool. The limit is now (desiredvnodes * 4)
rather than (desiredvnodes * 8), so it will still scale with
larger systems, just not as quickly.

Sponsored by:   DARPA & NAI Labs.
2002-11-20 05:16:11 +00:00
mckusick
637af64f54 If an error occurs while writing a buffer, then the data will
not have hit the disk and the dependencies cannot be unrolled.
In this case, the system will mark the buffer as dirty again so
that the write can be retried in the future. When the write
succeeds or the system gives up on the buffer and marks it as
invalid (B_INVAL), the dependencies will be cleared.

Sponsored by:   DARPA & NAI Labs.
2002-11-20 05:14:16 +00:00
peter
c841be9bcb Do not assume that time_t is an int.
Approved by:	re (jhb)
2002-11-15 22:36:57 +00:00
jhb
b8daeabf59 Print daddr_t's with %j and intmax_t. 2002-11-08 22:28:35 +00:00
rwatson
3ad18c8074 Update licenses and wording: NAI has authorized the removal of clause three
of their BSD-style license; also, carry out the NAI Labs -> Network
Associates Laboratories renaming in these files.
2002-11-04 02:35:46 +00:00
wollman
ce3867deda Implement the new 1003.1-2001 pathconf() keys, including the Advisory
Information option.  Other filesystem implementations should do something
similar.

With advice from:	mckusick, phk
2002-10-27 18:09:49 +00:00
rwatson
312cab0dee Slightly change the semantics of vnode labels for MAC: rather than
"refreshing" the label on the vnode before use, just get the label
right from inception.  For single-label file systems, set the label
in the generic VFS getnewvnode() code; for multi-label file systems,
leave the labeling up to the file system.  With UFS1/2, this means
reading the extended attribute during vfs_vget() as the inode is
pulled off disk, rather than hitting the extended attributes
frequently during operations later, improving performance.  This
also corrects sematics for shared vnode locks, which were not
previously present in the system.  This chances the cache
coherrency properties WRT out-of-band access to label data, but in
an acceptable form.  With UFS1, there is a small race condition
during automatic extended attribute start -- this is not present
with UFS2, and occurs because EAs aren't available at vnode
inception.  We'll introduce a work around for this shortly.

Approved by:	re
Obtained from:	TrustedBSD Project
Sponsored by:	DARPA, Network Associates Laboratories
2002-10-26 14:38:24 +00:00
mckusick
6b1611bd94 Within ufs, the ffs_sync and ffs_fsync functions did not always
check for and/or report I/O errors. The result is that a VFS_SYNC
or VOP_FSYNC called with MNT_WAIT could loop infinitely on ufs in
the presence of a hard error writing a disk sector or in a filesystem
full condition. This patch ensures that I/O errors will always be
checked and returned.  This patch also ensures that every call to
VFS_SYNC or VOP_FSYNC with MNT_WAIT set checks for and takes
appropriate action when an error is returned.

Sponsored by:   DARPA & NAI Labs.
2002-10-25 00:20:37 +00:00
mckusick
0337df10b7 We must be careful to avoid recursive copy-on-write faults when
trying to clean up during disk-full senarios.

Sponsored by:	DARPA & NAI Labs.
2002-10-23 21:47:02 +00:00
mckusick
3819d46020 Missplaced FREE_LOCK causes a panic when hit while taking a snapshot.
Sponsored by:	DARPA & NAI Labs.
2002-10-23 05:14:06 +00:00
mckusick
04450228c6 This update further fine tunes the locking of snapshot vnodes in
the ffs_copyonwrite routine to avoid a deadlock between the syncer
daemon trying to sync out a snapshot vnode and the bufdaemon
trying to write out a buffer containing the snapshot inode.
With any luck this will be the last snapshot race condition.

Sponsored by:	DARPA & NAI Labs.
2002-10-22 01:23:00 +00:00
mckusick
a515fcf789 This update is a performance improvement when allocating blocks on
a full filesystem. Previously, if the allocation failed, we had to
fsync the file before rolling back any partial allocation of indirect
blocks. Most block allocation requests only need to allocate a single
data block and if that allocation fails, there is nothing to unroll.
So, before doing the fsync, we check to see if any rollback will
really be necessary. If none is necessary, then we simply return.
This update eliminates the flurry of disk activity that got triggered
whenever a filesystem would run out of space.

Sponsored by:	DARPA & NAI Labs.
2002-10-22 01:14:25 +00:00
mckusick
305e5868f3 This checkin reimplements the io-request priority hack in a way
that works in the new threaded kernel. It was commented out of
the disksort routine earlier this year for the reasons given in
kern/subr_disklabel.c (which is where this code used to reside
before it moved to kern/subr_disk.c):

----------------------------
revision 1.65
date: 2002/04/22 06:53:20;  author: phk;  state: Exp;  lines: +5 -0
Comment out Kirks io-request priority hack until we can do this in a
civilized way which doesn't cause grief.

The problem is that it is not generally safe to cast a "struct bio
*" to a "struct buf *".  Things like ccd, vinum, ata-raid and GEOM
constructs bio's which are not entrails of a struct buf.

Also, curthread may or may not have anything to do with the I/O request
at hand.

The correct solution can either be to tag struct bio's with a
priority derived from the requesting threads nice and have disksort
act on this field, this wouldn't address the "silly-seek syndrome"
where two equal processes bang the diskheads from one edge to the
other of the disk repeatedly.

Alternatively, and probably better: a sleep should be introduced
either at the time the I/O is requested or at the time it is completed
where we can be sure to sleep in the right thread.

The sleep also needs to be in constant timeunits, 1/hz can be practicaly
any sub-second size, at high HZ the current code practically doesn't
do anything.
----------------------------

As suggested in this comment, it is no longer located in the disk sort
routine, but rather now resides in spec_strategy where the disk operations
are being queued by the thread that is associated with the process that
is really requesting the I/O. At that point, the disk queues are not
visible, so the I/O for positively niced processes is always slowed
down whether or not there is other activity on the disk.

On the issue of scaling HZ, I believe that the current scheme is
better than using a fixed quantum of time. As machines and I/O
subsystems get faster, the resolution on the clock also rises.
So, ten years from now we will be slowing things down for shorter
periods of time, but the proportional effect on the system will
be about the same as it is today. So, I view this as a feature
rather than a drawback. Hence this patch sticks with using HZ.

Sponsored by:	DARPA & NAI Labs.
Reviewed by:	Poul-Henning Kamp <phk@critter.freebsd.dk>
2002-10-22 00:59:49 +00:00
rwatson
d862ecfee8 Rename _POSIX_FOO_PRESENT and friends from POSIX.1e to _PC_FOO_PRESENT
and related friends.  This would have been corrected had POSIX.1e
progressed to a standard.

Pointed out by:	wollman
2002-10-20 22:11:13 +00:00
rwatson
438835cabb Implement _POSIX_ACL_PATH_MAX, which returns the maximum number of ACL
entries for a file system node using pathconf().

Obtained from:	TrustedBSD Project
Sponsored by:	DARPA, Network Associates Laboratories
2002-10-20 22:08:26 +00:00
rwatson
9d17032f64 Teach UFS to respond to pathconf() tests for _POSIX_ACL_EXTENDED and
_POSIX_MAC_PRESENT based on available mount flags, if the services are
available.

Obtained from:	TrustedBSD Project
Sponsored by:	DARPA, Network Associates Laboratories
2002-10-20 21:49:41 +00:00
rwatson
a2eb2e3662 Clarify that the UFS1 extended attribute configuration steps do not apply
to UFS2 file systems.

Submitted by:	jedgar
Obtained from:	TrustedBSD Project
2002-10-19 16:09:16 +00:00
dillon
d155b8f135 Fix a file-rewrite performance case for UFS[2]. When rewriting portions
of a file in chunks that are less then the filesystem block size, if the
data is not already cached the system will perform a read-before-write.
The problem is that it does this on a block-by-block basis, breaking up the
I/Os and making clustering impossible for the writes.  Programs such
as INN using cyclic file buffers suffer greatly.  This problem is only going
to get worse as we use larger and larger filesystem block sizes.

The solution is to extend the sequential heuristic so UFS[2] can perform
a far larger read and readahead when dealing with this case.

(note: maximum disk write bandwidth is 27MB/sec thru filesystem)
(note: filesystem blocksize in test is 8K (1K frag))
dd if=/dev/zero of=test.dat bs=1k count=2m conv=notrunc

Before:  (note half of these are reads)
      tty             da0              da1             acd0             cpu
 tin tout  KB/t tps  MB/s   KB/t tps  MB/s   KB/t tps  MB/s  us ni sy in id
   0   76 14.21 598  8.30   0.00   0  0.00   0.00   0  0.00   0  0  7  1 92
   0   76 14.09 813 11.19   0.00   0  0.00   0.00   0  0.00   0  0  9  5 86
   0   76 14.28 821 11.45   0.00   0  0.00   0.00   0  0.00   0  0  8  1 91

After:	(note half of these are reads)
      tty             da0              da1             acd0             cpu
 tin tout  KB/t tps  MB/s   KB/t tps  MB/s   KB/t tps  MB/s  us ni sy in id
   0   76 63.62 434 26.99   0.00   0  0.00   0.00   0  0.00   0  0 18  1 80
   0   76 63.58 424 26.30   0.00   0  0.00   0.00   0  0.00   0  0 17  2 82
   0   76 63.82 438 27.32   0.00   0  0.00   0.00   0  0.00   1  0 19  2 79

Reviewed by:	mckusick
Approved by:	re
X-MFC after:	immediately (was heavily tested in -stable for 4 months)
2002-10-18 22:52:41 +00:00
rwatson
ab9568ccbf Update extended attribute readme file to note that no special configuration
is required to use EAs with UFS2, and that UFS2 is recommend for EA use
for a variety of reasons.

Obtained from:	TrustedBSD Project
Sponsored by:	DARPA, Network Associates Laboratories
2002-10-18 21:11:36 +00:00
rwatson
10e2a00a6a Update instructions for ACLs given recent tunefs, mount changes. Also
note that UFS2 doesn't require explicit extended attribute configuration,
and is recommends for this and other reasons if you plan to use ACLs.

Obtained from:	TrustedBSD Project
Sponsored by:	DARPA, Network Associates Laboratories
2002-10-18 21:09:57 +00:00
rwatson
749729a702 Use 'size_t' instead of 'int' for the result of sizeof(). 2002-10-18 21:03:30 +00:00
mckusick
0af0d22682 With the revised single-lock method used in snapshots, the
BA_NOWAIT flag is no longer needed.

Sponsored by:	DARPA & NAI Labs.
2002-10-18 01:17:28 +00:00
mckusick
733bfbdd78 Change locking so that all snapshots on a particular filesystem share
a common lock. This change avoids a deadlock between snapshots when
separate requests cause them to deadlock checking each other for a
need to copy blocks that are close enough together that they fall
into the same indirect block. Although I had anticipated a slowdown
from contention for the single lock, my filesystem benchmarks show
no measurable change in throughput on a uniprocessor system with
three active snapshots. I conjecture that this result is because
every copy-on-write fault must check all the active snapshots, so
the process was inherently serial already. This change removes the
last of the deadlocks of which I am aware in snapshots.

Sponsored by:	DARPA & NAI Labs.
2002-10-16 00:19:23 +00:00
rwatson
174c4a0034 Push most UFS ACL behavior behind a check for MNT_ACLS, permitting ACLs
to be administratively disabled as needed on UFS/UFS2 file systems.  This
also has the effect of preventing the slightly more expensive ACL code
from running on non-ACL file systems, avoiding storage allocation for
ACLs that may be read from disk.  MNT_ACLS may be set at mount-time
using mount -o acls, or implicitly by setting the FS_ACLS flag using
tunefs.  On UFS1, you may also have to configure ACL store.

Obtained from:	TrustedBSD Project
Sponsored by:	DARPA, Network Associates Laboratories
2002-10-15 21:28:24 +00:00
rwatson
e112f21cae If the FS_MULTILABEL flag is set in a UFS or UFS2 superblock,
automatically set MNT_MULTILABEL in the mount flags.

If FS_ACLS is set in a UFS or UFS2 superblock, automatically
set MNT_ACLS in the mount flags.

If either of these flags is set, but the appropriate kernel option
to support the features associated with the flag isn't available,
then print a warning at mount-time.

Obtained from:	TrustedBSD Project
Sponsored by:	DARPA, Network Associates Laboratories
2002-10-15 20:00:06 +00:00
mckusick
750c7540cc When reading or writing the extended attributes of a special device
or fifo in UFS2, the normal ufs_strategy routine needs to be used
rather than the spec_strategy or fifo_strategy routine. Thus the
ffsext_strategy routine is interposed in the ffs_vnops vectors for
special devices and fifo's to pick off this special case. Otherwise
it simply falls through to the usual spec_strategy or fifo_strategy
routine.

Submitted by:	Robert Watson <rwatson@FreeBSD.org>
Sponsored by:	DARPA & NAI Labs.
2002-10-14 23:18:09 +00:00
rwatson
67d568c288 Fix two memory leaks in error conditions involving the UFS ACL code:
if failures occur, make sure that we release both the default ACL
and access ACL storage during new object creation.

Spotted by:	phk and his pet flexelint
Sponsored by:	DARPA, Network Associates Laboratories
2002-10-14 19:55:49 +00:00
rwatson
9c6b9f51d1 Define two new superblock file system flags:
FS_ACLS		Administrative enable/disable of extended ACL support
FS_MULTILABEL	Administrative flag to indicate to the MAC Framework
		that objects in the file system are individually
		labeled using extended attributes.

Obtained from:	TrustedBSD Project
Sponsored by:	DARPA, Network Associates Laboratories
Reviewed by:	(in principal) mckusick, phk
2002-10-14 17:07:11 +00:00