914 Commits

Author SHA1 Message Date
mckusick
88d85c15ef This commit adds basic support for the UFS2 filesystem. The UFS2
filesystem expands the inode to 256 bytes to make space for 64-bit
block pointers. It also adds a file-creation time field, an ability
to use jumbo blocks per inode to allow extent like pointer density,
and space for extended attributes (up to twice the filesystem block
size worth of attributes, e.g., on a 16K filesystem, there is space
for 32K of attributes). UFS2 fully supports and runs existing UFS1
filesystems. New filesystems built using newfs can be built in either
UFS1 or UFS2 format using the -O option. In this commit UFS1 is
the default format, so if you want to build UFS2 format filesystems,
you must specify -O 2. This default will be changed to UFS2 when
UFS2 proves itself to be stable. In this commit the boot code for
reading UFS2 filesystems is not compiled (see /sys/boot/common/ufsread.c)
as there is insufficient space in the boot block. Once the size of the
boot block is increased, this code can be defined.

Things to note: the definition of SBSIZE has changed to SBLOCKSIZE.
The header file <ufs/ufs/dinode.h> must be included before
<ufs/ffs/fs.h> so as to get the definitions of ufs2_daddr_t and
ufs_lbn_t.

Still TODO:
Verify that the first level bootstraps work for all the architectures.
Convert the utility ffsinfo to understand UFS2 and test growfs.
Add support for the extended attribute storage. Update soft updates
to ensure integrity of extended attribute storage. Switch the
current extended attribute interfaces to use the extended attribute
storage. Add the extent like functionality (framework is there,
but is currently never used).

Sponsored by: DARPA & NAI Labs.
Reviewed by:	Poul-Henning Kamp <phk@freebsd.org>
2002-06-21 06:18:05 +00:00
dillon
7aa1bb9d05 In rev 1.72 a situation related to write/mmap was fixed which could result
in a user process gaining visibility into the 'old' contents of a filesystem
block.  There were two cases:  (1) when uiomove() fails (user process issues
illegal write), and (2) when uiomove() overlaps a mmap() of the same file at
the same offset (fault -> recursive buffer I/O reads contents of old block).

Unfortunately 1.72 also had the unintended effect of forcing the filesystem
to do a read-before-write in the case of a full-block-write (non append case),
e.g. 'dd if=/dev/zero of=test.dat bs=1m count=256 conv=notrunc'.  This
destroys performance.. not only is a read forced for every write, but
clustering breaks as well.

The solution is to clear the buffer manually in the full-block case rather
then asking BALLOC to do it (BALLOC issues the read-before-write).  In the
partial-block case we want BALLOC to do it because the read-before-write
is necessary.  This patch should greatly improve database and news-feed
server performance.

Found by: MKI <mki@mozone.net>
MFC after:	3 days
2002-06-19 09:39:41 +00:00
semenu
f48bc32790 Fix a typo in my recently added comment: s/beleived/believed/
Submitted by:	keramida
2002-06-06 20:43:03 +00:00
alfred
28ce0e9ae5 Backout/modify previous revision:
"empty default cases shouldn't be removed, they should have a break;
  statement added to them."

Requested by: billf
2002-06-01 20:54:21 +00:00
alfred
04d357d9e8 Silence warnings, remove some empty 'default' switch cases. 2002-06-01 20:40:42 +00:00
semenu
b9960bece1 Remove lock from ffs_vget introduced by v1.24. Instead of locking the
vnode creation globaly, we allow processes to create vnodes concurently.
In case of concurent creation of vnode for the one ino, we allow processes
to race and then check who wins.

Assuming that concurent creation of vnode for same ino is really rare case,
this is belived to be an improvement, as it just allows concurent creation
of vnodes.

Idea by:	bp
Reviewed by:	dillon
MFC after:	1 month
2002-05-30 22:04:17 +00:00
rwatson
930f7599ed Remove IFS from 5.0-CURRENT. This facilitates introducing UFS2 as
IFS had its fingers deep in the belly of the UFS/FFS split.  IFS
will be reimplemented by the maintainer at a later date.

Requested by:	adrian (maintainer)
2002-05-19 00:11:08 +00:00
iedowse
caf2c5a16f Fix two casts to "daddr_t *" that should have been "ufs_daddr_t *". 2002-05-18 19:03:00 +00:00
iedowse
cc9d214396 Fix a typo where sizeof(daddr_t) was specified instead of sizeof(doff_t).
Now that daddr_t is 64-bit, this caused hash blocks to be allocated
twice as large as they need to be.
2002-05-18 18:58:27 +00:00
iedowse
794dddf9bf Remove um_i_effnlink_valid, i_spare[] and the ufsmount_u and inode_u
unions, since these were only necessary when ext2fs used ufs code.

Reviewed by:	mckusick
2002-05-18 18:51:14 +00:00
phk
e127001407 Fix ufs_daddr_t/daddr_t type problems.
Sponsored by:	DARPA & NAI labs.
2002-05-17 18:59:53 +00:00
phk
152c7cfbca Call ufs_bmaparray() with right parameter type.
Sponsored by: DARPA & NAI Labs.
2002-05-17 18:53:29 +00:00
trhodes
28d42899b7 More s/file system/filesystem/g 2002-05-16 21:28:32 +00:00
phk
8536ea3cdb Make daddr_t and u_daddr_t 64bits wide.
Retire daddr64_t and use daddr_t instead.

Sponsored by:	DARPA & NAI Labs.
2002-05-14 11:09:43 +00:00
phk
b591373208 Remove register keyword.
Sponsored by:	DARPA & NAI Labs.
Submitted by:	mckusick
2002-05-13 09:22:31 +00:00
phk
e76663bf87 Remove two "register" and a blank line.
Submitted by:	mckusick
Sponsored by:	DARPA & NAI Labs.
2002-05-12 22:54:48 +00:00
phk
e27fed0950 ARGH! SBLOCK is not unused. Try to get this right.
BBSIZE belongs in <sys/disklabel.h> (but shouldn't be a constant).

Define SBLOCK again, using the right math.

Sponsored by: DARPA & NAI Labs.
2002-05-12 20:21:40 +00:00
phk
4aa59982cb Remove #define for BBOFF, it is assumed == 0 so many places that we might
as well forget about it.  In fact the only thing which used it was the
SBOFF macro.

Sponsored by: DARPA & NAI Labs.
2002-05-12 20:00:21 +00:00
phk
57e4ad9a8b Remove unused BBLOCK and SBLOCK #defines.
Sponsored by: DARPA & NAI Labs.
2002-05-12 19:56:31 +00:00
alc
d60a525036 o Condition the compilation and use of vm_freeze_copyopts()
on ENABLE_VFS_IOOPT.
2002-05-06 05:45:57 +00:00
phk
f075f5e39a Move some UFS related stuff home where it belongs. 2002-05-05 20:04:33 +00:00
jeff
86e123b512 Include systm.h so panic(9) is defined when doing DEBUG_ALL_VFS_LOCKS. 2002-05-04 02:40:37 +00:00
phk
a1f99d3509 Name ufs_vop_[gs]etextattr() consistently with the rest of our VOPs and
put then in the ufs_vnops where they belong, rather than in the ffs_vnops.

Ok'ed by:	rwatson
Sponsored by:	DARPA & NAI Labs.
2002-05-03 08:40:33 +00:00
phk
860ddc59bc Use vop_panic() instead of our home-rolled version. 2002-05-02 19:15:52 +00:00
alfred
d1526107c3 Remove support for using soon to be retired "special" poll(2) ops.
Replace with kevent(2) ops.

This is untested, but the code would rot even further if this wasn't
applied.  I've chosen to apply this to prompt some cleanup.

Submitted by: bde
2002-04-18 14:52:28 +00:00
jeff
b972756970 Don't peak into the malloc_type structure for limits. The desired vnodes
check should be sufficient.  This is required for the pending removal of
malloc_type limits.
2002-04-15 03:35:35 +00:00
phk
33405073ec Move generic disk ioctls from <sys/disklabel.h> to <sys/disk.h>.
Sponsored by:	DARPA & NAI Labs
2002-04-08 09:20:07 +00:00
jhb
db9aa81e23 Change callers of mtx_init() to pass in an appropriate lock type name. In
most cases NULL is passed, but in some cases such as network driver locks
(which use the MTX_NETWORK_LOCK macro) and UMA zone locks, a name is used.

Tested on:	i386, alpha, sparc64
2002-04-04 21:03:38 +00:00
phk
65c4a4cb9d Move the FFS parameter MAXFRAG from <sys/param.h> to <ufs/ffs/fs.h>
Sponsored by:	DARPA & NAI Labs.
2002-04-03 20:39:27 +00:00
phk
79a894e10e Use DIOCGSECTORSIZE instead of the bogus DIOCGPART ioctl. 2002-04-02 11:23:14 +00:00
jhb
dc2e474f79 Change the suser() API to take advantage of td_ucred as well as do a
general cleanup of the API.  The entire API now consists of two functions
similar to the pre-KSE API.  The suser() function takes a thread pointer
as its only argument.  The td_ucred member of this thread must be valid
so the only valid thread pointers are curthread and a few kernel threads
such as thread0.  The suser_cred() function takes a pointer to a struct
ucred as its first argument and an integer flag as its second argument.
The flag is currently only used for the PRISON_ROOT flag.

Discussed on:	smp@
2002-04-01 21:31:13 +00:00
bde
a9b7c63b29 In ffs_mountffs(), set mnt_iosize_max to si_iosize_max unconditionally
provided the latter is nonzero.  At this point, the former is a fairly
arbitrary default value (DFTPHYS), so changing it to any reasonable
value specified by the device driver is safe.  Using the maximum of
these limits broke ffs clustered i/o for devices whose si_iosize_max
is < DFLTPHYS.  Using the minimum would break device drivers' ability
to increase the active limit from DFTLPHYS up to MAXPHYS.

Copied the code for this and the associated (unnecessary?) fixup of
mp_iosize_max to all other filesystems that use clustering (ext2fs and
msdosfs).  It was completely missing.

PR:		36309
MFC-after:	1 week
2002-03-30 15:12:57 +00:00
dwmalone
246869b676 Two minor changes to dirhash, which result in some marginal benchmark
improvements.

1) If deleting an entry results in a chain of deleted slots ending in an
   empty slot, then we can be a bit more aggressive about marking slots as
   empty.

2) The last stage of the FNV hash is to xor the last byte of data
   into the hash. This means that filenames which differ only in
   the last byte will be placed close to one another in the hash
   table, which forms longer chains. To work around this common
   case, we also hash in the address of the dirhash structure.

     news/cancel = news/articles/control/cancel for a tradspool inn server
     squid2 = squid level 2 directory (dirs called 00->FF)
     squid3 = squid level 3 directory (files called 00001F00->00001FFF)

                             mean #probes for
                  home dir  mh inbox  news/cancel  tmp    squid2  squid3
old   successful  1.02      3.19      4.07         1.10    7.85   2.06
new   successful  1.04      1.32      1.27         1.04    1.93   1.17

old unsuccessful  1.08      4.50      5.37         1.17   10.76   2.69
new unsuccessful  1.08      1.73      1.64         1.17    2.89   1.37

Reviewed by:	iedowse
MFC after:	2 weeks
2002-03-20 17:58:02 +00:00
jeff
7bd172afc2 Remove references to vm_zone.h and switch over to the new uma API. 2002-03-20 08:48:07 +00:00
alfred
79061a9306 Remove __P. 2002-03-19 22:40:48 +00:00
bde
df8144b98e Fixed some printf format errors (hopefully all of the remaining daddr64_t
ones for GENERIC, and all others on the same line as those).  Reformat
the printfs if necessary to avoid new long lones or old format printf
errors.
2002-03-19 04:09:21 +00:00
mckusick
14dd08fd15 Add a flags parameter to VFS_VGET to pass through the desired
locking flags when acquiring a vnode. The immediate purpose is
to allow polling lock requests (LK_NOWAIT) needed by soft updates
to avoid deadlock when enlisting other processes to help with
the background cleanup. For the future it will allow the use of
shared locks for read access to vnodes. This change touches a
lot of files as it affects most filesystems within the system.
It has been well tested on FFS, loopback, and CD-ROM filesystems.
only lightly on the others, so if you find a problem there, please
let me (mckusick@mckusick.com) know.
2002-03-17 01:25:47 +00:00
mckusick
e929f2e4f0 Introduce the new 64-bit size disk block, daddr64_t. Change
the bio and buffer structures to have daddr64_t bio_pblkno,
b_blkno, and b_lblkno fields which allows access to disks
larger than a Terabyte in size. This change also requires
that the VOP_BMAP vnode operation accept and return daddr64_t
blocks. This delta should not affect system operation in
any way. It merely sets up the necessary interfaces to allow
the development of disk drivers that work with these larger
disk block addresses. It also allows for the development of
UFS2 which will use 64-bit block addresses.
2002-03-15 18:49:47 +00:00
obrien
b4c5b60736 Quiet a warning on the Alpha. 2002-03-15 04:06:10 +00:00
mckusick
670fad8a9e This corrects the first of two known deadlock conditions that
come from the presence of a snapshot file.
2002-03-14 01:21:13 +00:00
iedowse
cf446bea56 Fix a bug in ufsdirhash_adjfree() that caused it to incorrectly
update the free-space statistics in some cases. The problem affected
directory blocks when the free space dropped below the size of the
maximum allowed entry size. When this happened, the free-space
summary information could claim that there are no further blocks
that can fit a maximum-size entry, even if there are.

The effect of this bug is that the directory may be enlarged even
though there is space within the directory for the new entry. This
wastes disk space and has a negative impact on performance.

Fix it by correctly computing the dh_firstfree array index, adding
a helper macro for clarity. Put an extra sanity check into
ufsdirhash_checkblock() to detect the situation in future.

Found by:	dwmalone
Reviewed by:	dwmalone
MFC after:	1 week
2002-03-11 19:13:22 +00:00
phk
2098322771 I missed one VOP_CLOSE in the previous commit.
Pointed out by:	bde
2002-03-11 16:27:04 +00:00
phk
c68bd5ed71 As a XXX bandaid open the mounted device READ/WRITE even if we only mount
read-only.

The trouble here is that we don't reopen the device in read/write mode
when we remount in read/write mode resulting in a filesystem sending
write requests to a device which was only opened read/only.

I'm not quite sure how such a reopen would best be done and defer
the problem to more agile hackers.
2002-03-11 13:53:00 +00:00
rwatson
1a2700eed9 Update DBA for NAI. We have several. We used the wrong one. :-) 2002-03-07 17:49:06 +00:00
green
965906f60d Add new errno ``ENOATTR''. 2002-03-07 15:13:44 +00:00
dillon
e05fb41e7e cleanup readability syntax prior to ongoing b_resid work commits.
MFC after:	1 day
2002-03-06 00:44:30 +00:00
jhb
b8b3ac8816 Use thread0.td_ucred instead of proc0.p_ucred. This change is cosmetic
and isn't strictly required.  However, it lowers the number of false
positives found when grep'ing the kernel sources for p_ucred to ensure
proper locking.
2002-02-27 19:18:10 +00:00
jhb
3706cd3509 Simple p_ucred -> td_ucred changes to start using the per-thread ucred
reference.
2002-02-27 18:32:23 +00:00
phk
62d248fb9e Replace bowrite() with BUF_WRITE in ufs.
Remove bowrite(), it is now unused.

This is the first step in getting entirely rid of BIO_ORDERED which is
a generally accepted evil thing.

Approved by:	mckusick
2002-02-22 09:03:00 +00:00
rwatson
ad827f33f7 o Minor style fix on #endif, missing '_' in comment. 2002-02-20 15:44:43 +00:00