freebsd-skq

Author	SHA1	Message	Date
grog	1f5de30718	Correct #includes to work with fixed sys/mount.h.	2001-04-23 09:05:15 +00:00
mckusick	6ea67910b6	Background fsck sysctl operations must use vn_start_write and vn_finished_write so that they do not attempt to modify a suspended filesystem.	2001-04-17 05:06:37 +00:00
mckusick	3931e94b1f	Directory layout preference improvements from Grigoriy Orlov <gluk@ptci.ru>. His description of the problem and solution follow. My own tests show speedups on typical filesystem intensive workloads of 5% to 12% which is very impressive considering the small amount of code change involved.

------

One day I noticed that some file operations run much faster on small file systems than on big ones. I've looked at the ffs algorithms, thought about them, and redesigned the dirpref algorithm.

First I want to describe the results of my tests. These results are old and I have improved the algorithm after these tests were done. Nevertheless they show how big the performance speedup may be. I have done two file/directory intensive tests on two OpenBSD systems with the old and new dirpref algorithm. The first test is "tar -xzf ports.tar.gz", the second is "rm -rf ports". The ports.tar.gz file is the ports collection from the OpenBSD 2.8 release. It contains 6596 directories and 13868 files. The test systems are:

1. Celeron-450, 128Mb, two IDE drives, the system at wd0, file system for test is at wd1. Size of test file system is 8 Gb, number of cg=991, size of cg is 8m, block size = 8k, fragment size = 1k. OpenBSD-current from Dec 2000 with BUFCACHEPERCENT=35

2. PIII-600, 128Mb, two IBM DTLA-307045 IDE drives at i815e, the system at wd0, file system for test is at wd1. Size of test file system is 40 Gb, number of cg=5324, size of cg is 8m, block size = 8k, fragment size = 1k. OpenBSD-current from Dec 2000 with BUFCACHEPERCENT=50

You can get more info about the test systems and methods at: http://www.ptci.ru/gluk/dirpref/old/dirpref.html

Test Results

                   tar -xzf ports.tar.gz                  rm -rf ports
  mode      old dirpref  new dirpref  speedup    old dirpref  new dirpref  speedup
  First system
  normal        667          472        1.41         477          331        1.44
  async         285          144        1.98         130           14        9.29
  sync          768          616        1.25         477          334        1.43
  softdep       413          252        1.64         241           38        6.34
  Second system
  normal        329           81        4.06         263.5         93.5      2.81
  async         302           25.7     11.75         112            2.26    49.56
  sync          281           57.0      4.93         263           90.5      2.9
  softdep       341           40.6      8.4          284            4.76    59.66

The "old dirpref" and "new dirpref" columns give the test time in seconds. The speedup column is the ratio old dirpref / new dirpref.

------

Algorithm description

The old dirpref algorithm is described in comments:

    /*
     * Find a cylinder to place a directory.
     *
     * The policy implemented by this algorithm is to select from
     * among those cylinder groups with above the average number of
     * free inodes, the one with the smallest number of directories.
     */

A new directory is allocated in a different cylinder group than its parent directory, resulting in a directory tree that is spread across all the cylinder groups. This spreading out results in non-optimal access to the directories and files. When we have a small filesystem it is not a problem, but when the filesystem is big the performance degradation becomes very apparent.

What do I mean by a big file system?

1. A big filesystem is a filesystem which occupies 20-30 or more percent of total drive space, i.e. the first and last cylinders are physically located relatively far from each other.

2. It has a relatively large number of cylinder groups, for example more cylinder groups than 50% of the buffers in the buffer cache.

The first results in long access times, while the second results in many buffers being used by metadata operations. Such operations use cylinder group blocks and on-disk inode blocks. The cylinder group block (fs->fs_cblkno) contains struct cg and the inode and block bit maps. It is 2k in size for the default filesystem parameters. If the new and parent directories are located in different cylinder groups then the system performs more input/output operations and uses more buffers. On filesystems with many cylinder groups, lots of cache buffers are used for metadata operations.

My solution for this problem is very simple. I allocate many directories in one cylinder group. I also take some measures so that the new allocation method does not cause excessive fragmentation and so that directory inodes are not located far from their files' inodes and data. The algorithm is:

    /*
     * Find a cylinder group to place a directory.
     *
     * The policy implemented by this algorithm is to allocate a
     * directory inode in the same cylinder group as its parent
     * directory, but also to reserve space for its files inodes
     * and data. Restrict the number of directories which may be
     * allocated one after another in the same cylinder group
     * without intervening allocation of files.
     *
     * If we allocate a first level directory then force allocation
     * in another cylinder group.
     */

My early versions of dirpref gave me good results for a wide range of file operations and different filesystem capacities except for one case: those applications that create their entire directory structure first and only later fill this structure with files.

My solution for such and similar cases is to limit the number of directories which may be created one after another in the same cylinder group without intervening file creations. For this purpose, I allocate an array of counters at mount time. This array is linked to the superblock as fs->fs_contigdirs[cg]. Each time a directory is created the counter increases and each time a file is created the counter decreases. A 60Gb filesystem with 8mb/cg requires 10kb of memory for the counters array.

The maxcontigdirs is the maximum number of directories which may be created without an intervening file creation. I found in my tests that the best performance occurs when I restrict the number of directories in one cylinder group such that all its files may be located in the same cylinder group. There may be some deterioration in performance if all the file inodes are in the same cylinder group as their containing directory, but their data partially resides in a different cylinder group. The maxcontigdirs value is calculated to try to prevent this condition. Since there is no way to know how many files and directories will be allocated later, I added two optimization parameters in superblock/tunefs. They are:

    int32_t fs_avgfilesize; /* expected average file size */
    int32_t fs_avgfpdir;    /* expected # of files per directory */

These parameters have reasonable defaults but may be tweaked for special uses of a filesystem. They are only necessary in rare cases like better tuning a filesystem being used to store a squid cache.

I have been using this algorithm for about 3 months. I have done a lot of testing on filesystems with different capacities, average filesize, average number of files per directory, and so on. I think this algorithm has no negative impact on filesystem performance. It works better than the default one in all cases. The new dirpref will greatly improve untarring/removing/copying of big directories, decrease load on cvs servers and much more. The new dirpref doesn't speed up a compilation process, but also doesn't slow it down.

Obtained from: Grigoriy Orlov <gluk@ptci.ru>	2001-04-10 08:38:59 +00:00
asmodai	05c87e82c2	Fix typo ); -> ,	2001-03-24 15:25:04 +00:00
mckusick	c6fdb61aa7	Check that background fsck operation is being done on a ufs filesystem. Obtained from: Robert Watson <rwatson@FreeBSD.org>	2001-03-23 20:58:25 +00:00
mckusick	69603157de	Add kernel support for running fsck on active filesystems.	2001-03-21 04:09:01 +00:00
mckusick	d22815bec3	Report the correct inode number when panicking with freeing free inode. Report the correct block number when panicking with freeing free block.	2001-03-21 04:01:02 +00:00
asmodai	3065478332	Preceed/preceeding are not english words. Use precede and preceding.	2001-02-18 10:43:53 +00:00
peter	cfc0cd38b7	Minor change: fix warning - move a 'struct vnode *vp' declaration inside a #ifdef DIAGNOSTIC to match its corresponding usage.	2000-07-28 22:27:00 +00:00
mckusick	a3d0c189ea	Add snapshots to the fast filesystem. Most of the changes support the gating of system calls that cause modifications to the underlying filesystem. The gating can be enabled by any filesystem that needs to consistently suspend operations by adding the vop_stdgetwritemount to their set of vnops. Once gating is enabled, the function vfs_write_suspend stops all new write operations to a filesystem, allows any filesystem modifying system calls already in progress to complete, then syncs the filesystem to disk and returns. The function vfs_write_resume allows the suspended write operations to begin again. Gating is not added by default for all filesystems as for SMP systems it adds two extra locks to such critical kernel paths as the write system call. Thus, gating should only be added as needed. Details on the use and current status of snapshots in FFS can be found in /sys/ufs/ffs/README.snapshot so for brevity and timeliness are not included here. Unless and until you create a snapshot file, these changes should have no effect on your system (famous last words).	2000-07-11 22:07:57 +00:00
phk	36c3965ff9	Separate the struct bio related stuff out of <sys/buf.h> into <sys/bio.h>. <sys/bio.h> is now a prerequisite for <sys/buf.h> but it shall not be made a nested include according to bde's teachings on the subject of nested includes. Disk drivers and similar stuff below specfs::strategy() should no longer need to include <sys/buf.h> unless they need caching of data. Still a few bogus uses of struct buf to track down. Repocopy by: peter	2000-05-05 09:59:14 +00:00
rwatson	a0dd5ab0fd	Introduce extended attribute support for FFS, allowing arbitrary (name, value) pairs to be associated with inodes. This support is used for ACLs, MAC labels, and Capabilities in the TrustedBSD security extensions, which are currently under development. In this implementation, attributes are backed to data vnodes in the style of the quota support in FFS. Support for FFS extended attributes may be enabled using the FFS_EXTATTR kernel option (disabled by default). Userland utilities and man pages will be committed in the next batch. VFS interfaces and man pages have been in the repo since 4.0-RELEASE and are unchanged. o ufs/ufs/extattr.h: UFS-specific extattr defines o ufs/ufs/ufs_extattr.c: bulk of support routines o ufs/{ufs,ffs,mfs}/*.[ch]: hooks and extattr.h includes o contrib/softupdates/ffs_softdep.c: extattr.h includes o conf/options, conf/files, i386/conf/LINT: added FFS_EXTATTR o coda/coda_vfsops.c: XXX required extattr.h due to ufsmount.h (This should not be the case, and will be fixed in a future commit) Currently attributes are not supported in MFS. This will be fixed. Reviewed by: adrian, bp, freebsd-fs, other unthanked souls Obtained from: TrustedBSD Project	2000-04-15 03:34:27 +00:00
mckusick	acdd0d6f53	Use 64-bit math to decide if optimization needs to be changed. Necessary for coherent results on filesystems bigger than 0.5Tb. Submitted by: Paul Saab <ps@yahoo-inc.com>	2000-03-15 07:08:36 +00:00
mckusick	d4409da210	Several performance improvements for soft updates have been added: 1) Fastpath deletions. When a file is being deleted, check to see if it was so recently created that its inode has not yet been written to disk. If so, the delete can proceed to immediately free the inode. 2) Background writes: No file or block allocations can be done while the bitmap is being written to disk. To avoid these stalls, the bitmap is copied to another buffer which is written thus leaving the original available for further allocations. 3) Link count tracking. Constantly track the difference in i_effnlink and i_nlink so that inodes that have had no change other than i_effnlink need not be written. 4) Identify buffers with rollback dependencies so that the buffer flushing daemon can choose to skip over them.	2000-01-10 00:24:24 +00:00
eivind	8befc1a2b8	Change incorrect NULLs to 0s	1999-12-21 11:14:12 +00:00
mckusick	579c93e793	Preferentially allocate the first indirect block in the same cylinder group as the inode. This makes a 15% difference in read speed for files in the 96K to 500K size range.	1999-12-01 19:33:12 +00:00
peter	3b842d34e8	$Id$ -> $FreeBSD$	1999-08-28 01:08:13 +00:00
sheldonh	190863bb6d	Fix bug introduced in rev 1.28, which causes kernel build to break for the case where DEBUG is defined but not DIAGNOSTIC. ffs_checkblk is declared conditionally on DIAGNOSTIC, not DEBUG. PR: 13314 Reviewed by: bde	1999-08-24 08:39:41 +00:00
bde	2a5ff1f726	Use devtoname() to print dev_t's instead of casting them to long or u_long for misprinting in %lx format.	1999-08-23 20:35:21 +00:00
peter	5c0287c834	Try and fix a dev_t/major/minor etc nit.	1999-05-12 22:32:07 +00:00
peter	73556bfee1	Add sufficient braces to keep egcs happy about potentially ambiguous if/else nesting.	1999-05-06 18:13:11 +00:00
bde	2facf6978a	Don't pass unused timestamp args to UFS_UPDATE() or waste time initializing them. This almost finishes centralizing (in-core) timestamp updates in ufs_itimes().	1999-01-07 16:14:19 +00:00
bde	e5ba679f2a	Ifdefed the conditionally used variable `prtrealloc'. Declare it as volatile so that there is no chance that the code that it controls is optimised away.	1999-01-06 17:04:33 +00:00
dg	841cc6703a	Restored the "reallocblks" code to its former glory. What this does is basically perform an on-the-fly defragmentation of the FFS filesystem, changing file block allocations to make them contiguous. Thanks to Kirk McKusick for providing hints on what needed to be done to get this working.	1998-11-13 01:01:44 +00:00
bde	4100b68615	Put the zombie ffs sysctl node in "notyet" state together with its few remaining children. Prepare it for MOUNT_UFS going away.	1998-09-07 11:50:19 +00:00
phk	4630814c8b	Add a new vnode op, VOP_FREEBLKS(), which filesystems can use to inform device drivers about sectors no longer in use. Device drivers receive the call through d_strategy, if they have D_CANFREE in d_flags. This allows flash based devices to erase the sectors and avoid pointlessly carrying them around in compactions. Reviewed by: Kirk McKusick, bde Sponsored by: M-Systems (www.m-sys.com)	1998-09-05 14:13:12 +00:00
bde	4e2d834c27	Removed unused includes.	1998-08-17 19:09:36 +00:00
bde	f0b863f4b5	Fixed printf format errors.	1998-07-11 07:46:16 +00:00
phk	9b703b1455	Eradicate the variable "time" from the kernel, using various measures. "time" wasn't an atomic variable, so splfoo() protection was needed around any access to it, unless you just wanted the seconds part. Most uses of time.tv_sec now use the new variable time_second instead. gettime() changed to getmicrotime(). Remove a couple of unneeded splfoo() protections, the new getmicrotime() is atomic, (until Bruce sets a breakpoint in it). A couple of places needed random data, so use read_random() instead of mucking about with time which isn't random. Add a new nfs_curusec() function. Mark a couple of bogosities involving the now disappeared time variable. Update ffs_update() to avoid the weird "== &time" checks, by fixing the one remaining call that passed &time as args. Change profiling in ncr.c to use ticks instead of time. Resolution is the same. Add new function "tvtohz()" to avoid the bogus "splfoo(), add time, call hzto() which subtracts time" sequences. Reviewed by: bde	1998-03-30 09:56:58 +00:00
julian	10c5ccc30a	Reviewed by: dyson@freebsd.org (John Dyson), dg@root.com (David Greenman) Submitted by: Kirk McKusick (mcKusick@mckusick.com) Obtained from: Whistle development tree	1998-03-08 09:59:44 +00:00
eivind	4547a09753	Back out DIAGNOSTIC changes.	1998-02-06 12:14:30 +00:00
eivind	c552a9a1c3	Turn DIAGNOSTIC into a new-style option.	1998-02-04 22:34:03 +00:00
bde	e5bce5c78c	Fix a small style bug in the generation number change (rev.1.33) before copying the change to other fs's.	1997-12-02 11:21:16 +00:00
bde	975c3797b1	Staticized.	1997-11-22 08:35:46 +00:00
bde	3304eb82bb	Unremoved prtrealloc and the declaration of ffs_clusteralloc(). These are used in the `#ifdef notyet' case :-). This case is used except in the `#if !defined (not_yes)' case :-\|. This has something to do with the `#ifdef notyet_block_reallocation_enabled' case in vfs_cluster.c :-(.	1997-11-22 07:00:40 +00:00
phk	4d26888936	Remove a bunch of variables which were unused both in GENERIC and LINT. Found by: -Wunused	1997-11-07 08:53:44 +00:00
phk	373a865574	Another VFS cleanup "kilo commit" 1. Remove VOP_UPDATE, it is (also) an UFS/{FFS,LFS,EXT2FS,MFS} interface function, and now lives in the ufsmount structure. 2. Remove VOP_SEEK, it was unused. 3. Add more default vops: VOP_ADVLOCK vop_einval VOP_CLOSE vop_null VOP_FSYNC vop_null VOP_IOCTL vop_enotty VOP_MMAP vop_einval VOP_OPEN vop_null VOP_PATHCONF vop_einval VOP_READLINK vop_einval VOP_REALLOCBLKS vop_eopnotsupp And remove identical functionality from filesystems 4. Add vop_stdpathconf, which returns the canonical stuff. Use it in the filesystems. (XXX: It's probably wrong that specfs and fifofs set this vop, shouldn't it come from the "host" filesystem, for instance ufs or cd9660 ?) 5. Try to make system wide VOP functions have vop_* names. 6. Initialize the um_* vectors in LFS. (Recompile your LKMS!!!)	1997-10-16 20:32:40 +00:00
phk	d166441755	VFS mega cleanup commit (x/N) 1. Add new file "sys/kern/vfs_default.c" where default actions for VOPs go. Implement proper defaults for ABORTOP, BWRITE, LEASE, POLL, REVOKE and STRATEGY. Various stuff spread over the entire tree belongs here. 2. Change VOP_BLKATOFF to a normal function in cd9660. 3. Kill VOP_BLKATOFF, VOP_TRUNCATE, VOP_VFREE, VOP_VALLOC. These are private interface functions between UFS and the underlying storage manager layer (FFS/LFS/MFS/EXT2FS). The functions now live in struct ufsmount instead. 4. Remove a kludge of VOP_ functions in all filesystems, that did nothing but obscure the simplicity and break the expandability. If a filesystem doesn't implement VOP_FOO, it shouldn't have an entry for it in its vnops table. The system will try to DTRT if it is not implemented. There is still some cruft left, but the bulk of it is done. 5. Fix another VCALL in vfs_cache.c (thanks Bruce!)	1997-10-16 10:50:27 +00:00
phk	15a4cff98f	I think my previous change may have opened a race condition. This patch does the same thing, with no change in semantics.	1997-10-14 18:46:48 +00:00
phk	2b4a6ad696	ufs_ihashrem() should not be called from the UFS layer, but from the lower layer (LFS/FFS/?) like the rest of the ihash functions. Otherwise it is impossible to make a lower layer that doesn't use the ihash facility.	1997-10-14 14:22:31 +00:00
phk	f6da000c05	[Regarding the previous patch] This is completely wrong. 1. ffs_alloc() actually allowed writing one block less one frag (normally 7 frags or 7/8 blocks) beyond the limit. 2. freebufspace() gives the free space in frags, but `size' is in bytes, so the change results in approximately `size' fragments too many being reserved. 3. ffs_realloccg() has the same bug but wasn't changed. PR: 3398 Submitted by: bde Eyeballed by: phk	1997-09-19 11:13:16 +00:00
phk	b24a8c6a9a	ffs_alloc allowed users to write one block beyond the limit. PR: 3398 Reviewed by: phk Submitted by: Wolfram Schneider <wosch@apfel.de>	1997-09-18 18:07:45 +00:00
bde	6ffb8bf9af	Removed unused #includes.	1997-09-02 20:06:59 +00:00
phk	b3f221cd7a	We got a couple of "map mismatch" panics from the following code. According to the crash dump, bpref is set to 445 and cgp->cg_nclusterblks is 444. Hence in the for loop, the test fails immediately but the following failure check (got == cgp->cg_nclusterblks) doesn't trigger because got > cgp->cg_nclusterblks. This wreaks havoc in the code after that. Fix: Move one source bit to the left :-) Noticed by: Mike Hibler <mike@fast.cs.utah.edu> Submitted by: Kirk McKusick <mckusick@McKusick.COM>	1997-08-04 07:30:43 +00:00
guido	c337c37259	Add generation number randomization. Newly created filesystems will now automatically have random generation numbers. The kernel way of handling those also changed. Further it is advised to run fsirand on all your nfs exported filesystems. The code is mostly copied from OpenBSD, with the randomization changed to use /dev/urandom Reviewed by: Garrett Obtained from: OpenBSD	1997-03-23 20:08:22 +00:00
bde	0bc1781701	Fixed some invalid (non-atomic) accesses to `time', mostly ones of the form `tv = time'. Use a new function gettime(). The current version just forces atomicity without fixing precision or efficiency bugs. Simplified some related valid accesses by using the central function.	1997-03-22 06:53:45 +00:00
mpp	e016dd059c	Update a number of panic messages to reflect the actual name of the routine that caused the panic.	1997-03-09 06:00:44 +00:00
peter	94b6d72794	Back out part 1 of the MCFH that changed $Id$ to $FreeBSD$. We are not ready for it yet.	1997-02-22 09:48:43 +00:00
mpp	34c278f970	Correct the new Lite2 #ifdef DIAGNOSTIC ffs_checkblk routine to not return without setting a return value when it fails to read a block or detects a bad cylinder group, since the caller is expecting a return value. It will now panic at this point, since the thing to do in this case would be to return a "bad block" status to the caller, and the caller will panic anyways when that happens. Also updated the panic strings in this routine to read "ffs_checkblk: ..." instead of "checkblk: ...".	1997-02-10 17:05:30 +00:00
dyson	10f666af84	This is the kernel Lite/2 commit. There are some requisite userland changes, so don't expect to be able to run the kernel as-is (very well) without the appropriate Lite/2 userland changes. The system boots and can mount UFS filesystems. Untested: ext2fs, msdosfs, NFS Known problems: Incorrect Berkeley ID strings in some files. Mount_std mounts will not work until the getfsent library routine is changed. Reviewed by: various people Submitted by: Jeffery Hsu <hsu@freebsd.org>	1997-02-10 02:22:35 +00:00
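
The per-cylinder-group counter scheme described in commit 3931e94b1f can be illustrated with a small sketch. This is not the code from that commit: the names `NCG`, `MAXCONTIGDIRS`, `contigdirs`, `pick_dir_cg` and `note_file_created` are hypothetical, the limit is a fixed constant rather than a value derived from fs_avgfilesize and fs_avgfpdir, and the free-inode and free-space checks that a real allocator needs are left out. It only shows the counting idea: prefer the parent's cylinder group, move on once too many directories have been created there without an intervening file creation.

```c
#include <stdint.h>

#define NCG           4   /* hypothetical number of cylinder groups */
#define MAXCONTIGDIRS 2   /* hypothetical limit on back-to-back directories */

/* One counter per cylinder group, standing in for fs->fs_contigdirs[cg]. */
static uint8_t contigdirs[NCG];

/*
 * Choose a cylinder group for a new directory whose parent lives in
 * parent_cg. Start at the parent's group and take the first group whose
 * counter is still below the limit, incrementing that counter.
 */
static int pick_dir_cg(int parent_cg)
{
    for (int i = 0; i < NCG; i++) {
        int cg = (parent_cg + i) % NCG;
        if (contigdirs[cg] < MAXCONTIGDIRS) {
            contigdirs[cg]++;
            return cg;
        }
    }
    /* Every group is at the limit: fall back to the parent's group. */
    return parent_cg;
}

/* A file creation in cg makes room for another directory there. */
static void note_file_created(int cg)
{
    if (contigdirs[cg] > 0)
        contigdirs[cg]--;
}
```

With these assumed constants, two directories in a row land in the parent's group, the third spills into the next group, and a file creation in the parent's group lets the following directory return to it.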

77 Commits