freebsd-dev

Author	SHA1	Message	Date
dd	515b3c8c27	size_t is not a struct (fix mislabelling in a comment).	2002-10-02 05:15:34 +00:00
jmallett	4991b48519	When spamming me with a printf(9), under DIAGNOSTIC, at least be nice enough to include a newline. MFC after: 4 days Sponsored by: Bright Path Solutions	2002-09-28 19:04:49 +00:00
phk	2175ba2a2d	Be consistent about "static" functions: if the function is marked static in its prototype, mark it static at the definition too. Inspired by: FlexeLint warning #512	2002-09-28 17:15:38 +00:00
phk	7416ce6c57	Use our mount-credential if we get a NOCRED when we try to write out EA space back to disk. This is wrong in many ways, but not as wrong as a panic. Pancied on: rwatson & jmallet Sponsored by: DARPA & NAI Labs.	2002-09-27 20:00:03 +00:00
jeff	b145d94bc3	- Convert locks to use standard macros. - Lock access to the buflists. - Document broken locking. - Use vrefcnt().	2002-09-25 02:49:48 +00:00
jeff	8eeb859399	- Document broken locking. - Use vrefcnt().	2002-09-25 02:47:49 +00:00
phk	f1b5b39d15	We don't need to #include <sys/disklabel.h>. We don't need to #include <sys/disklabel.h> second time either. Sponsored by: DARPA & NAI Labs.	2002-09-20 16:42:33 +00:00
obrien	2b8e2c45e7	intmax_t is printed with %jd, not %lld.	2002-09-19 03:55:30 +00:00
njl	a4849fcf1b	Remove all use of vnode->v_tag, replacing with appropriate substitutes. v_tag is now const char * and should only be used for debugging. Additionally: 1. All users of VT_NTS now check vfsconf->vf_type VFCF_NETWORK 2. The user of VT_PROCFS now checks for the new flag VV_PROCDEP, which is propagated by pseudofs to all child vnodes if the fs sets PFS_PROCDEP. Suggested by: phk Reviewed by: bde, rwatson (earlier version)	2002-09-14 09:02:28 +00:00
phk	a1052e1e82	Implement the VOP_OPENEXTATTR() and VOP_CLOSEEXTATTR() methods. Use extattr_check_cred() to check access to EAs. This is still a WIP. Sponsored by: DARPA & NAI Labs.	2002-09-05 20:59:42 +00:00
bde	5100a085d0	Include <sys/malloc.h> instead of depending on namespace pollution 2 layers deep in <sys/proc.h> or <sys/vnode.h>. Include <sys/vmmeter.h> instead of depending on namespace pollution in <sys/pcpu.h>. Sorted includes as much as possible.	2002-09-05 09:43:24 +00:00
phk	88cf68c906	Correctly handle setting, getting and deleting EA's with zero length content. Sponsored by: DARPA & NAI Labs.	2002-08-30 08:57:09 +00:00
alc	0437141e85	o Retire vm_page_zero_fill() and vm_page_zero_fill_area(). Ever since pmap_zero_page() and pmap_zero_page_area() were modified to accept a struct vm_page * instead of a physical address, vm_page_zero_fill() and vm_page_zero_fill_area() have served no purpose.	2002-08-25 00:22:31 +00:00
phk	dc31df855c	Implement list of EA return functionality. Correctly delete EA's when the content length is set to zero. Sponsored by: DARPA & NAI Labs.	2002-08-20 11:34:58 +00:00
phk	17f38b1373	First snapshot of UFS2 EA support. Sponsored by: DARPA & NAI Labs.	2002-08-19 07:01:55 +00:00
phk	bcbdd3577c	Expand the arguments to ffs_ext{read,write}() to their component parts rather than use vop_{read,write}_args. Access to these functions will ultimately not be available through the "vop_{read,write}+IO_EXT" API but this functionality is retained for debugging purposes for now. Sponsored by: DARPA & NAI Labs.	2002-08-13 11:33:01 +00:00
phk	cb5c8c694f	Unravel the UFS_EXTATTR incest between FFS and UFS: UFS_EXTATTR is an UFS only thing, and FFS should in principle not know if it is enabled or not. This commit cleans ffs_vnops.c for such knowledge, but not ffs_vfsops.c Sponsored by: DARPA and NAI Labs.	2002-08-13 10:33:57 +00:00
phk	b5da86a878	Introduce typedefs for the member functions of struct vfsops and employ these in the main filesystems. This does not change the resulting code but makes the source a little bit more grepable. Sponsored by: DARPA and NAI Labs.	2002-08-13 10:05:50 +00:00
phk	8200dbe2fd	Stop pretending that the FFS file ufs_readwrite.c is a UFS file. Instead of #including it, pull it into ffs_vnops.c and name things correctly. Sponsored by: DARPA & NAI Labs.	2002-08-12 10:32:56 +00:00
iedowse	fb0fc73e34	Don't call softdep_slowdown() if soft updates are not active on the filesystem. This causes a panic for kernels compiled without softupdates. Reported by: luigi	2002-08-05 17:59:20 +00:00
jeff	da7d3beee0	- Replace v_flag with v_iflag and v_vflag - v_vflag is protected by the vnode lock and is used when synchronization with VOP calls is needed. - v_iflag is protected by interlock and is used for dealing with vnode management issues. These flags include X/O LOCK, FREE, DOOMED, etc. - All accesses to v_iflag and v_vflag have either been locked or marked with mp_fixme's. - Many ASSERT_VOP_LOCKED calls have been added where the locking was not clear. - Many functions in vfs_subr.c were restructured to provide for stronger locking. Idea stolen from: BSD/OS	2002-08-04 10:29:36 +00:00
phk	b0e7dc01b7	I forgot this bit of uglyness in the fsck_ffs cleanup.	2002-07-31 07:01:18 +00:00
phk	b9ebc84981	Fix braino in last commit.	2002-07-30 12:02:41 +00:00
phk	4aae061a5c	Move ffs_isfreeblock() to ffs_alloc.c and make it static. Sponsored by: DARPA & NAI Labs.	2002-07-30 11:54:48 +00:00
benno	a11fa1ad3e	Add a missing argument to the stub for softdep_setup_freeblocks. Forgotten by: mckusick	2002-07-20 04:07:15 +00:00
peter	4c33ee9f15	Fix a warning: ffs_softdep.c:1630: warning: int format, different type arg (arg 2)	2002-07-20 01:09:35 +00:00
mckusick	9fe9a146c9	Add support to UFS2 to provide storage for extended attributes. As this code is not actually used by any of the existing interfaces, it seems unlikely to break anything (famous last words). The internal kernel interface to manipulate these attributes is invoked using two new IO_ flags: IO_NORMAL and IO_EXT. These flags may be specified in the ioflags word of VOP_READ, VOP_WRITE, and VOP_TRUNCATE. Specifying IO_NORMAL means that you want to do I/O to the normal data part of the file and IO_EXT means that you want to do I/O to the extended attributes part of the file. IO_NORMAL and IO_EXT are mutually exclusive for VOP_READ and VOP_WRITE, but may be specified individually or together in the case of VOP_TRUNCATE. For example, when removing a file, VOP_TRUNCATE is called with both IO_NORMAL and IO_EXT set. For backward compatibility, if neither IO_NORMAL nor IO_EXT is set, then IO_NORMAL is assumed. Note that the BA_ and IO_ flags have been `merged' so that they may both be used in the same flags word. This merger is possible by assigning the IO_ flags to the low sixteen bits and the BA_ flags the high sixteen bits. This works because the high sixteen bits of the IO_ word is reserved for read-ahead and help with write clustering so will never be used for flags. This merge lets us get away from code of the form: if (ioflags & IO_SYNC) flags \|= BA_SYNC; For the future, I have considered adding a new field to the vattr structure, va_extsize. This addition could then be exported through the stat structure to allow applications to find out the size of the extended attribute storage and also would provide a more standard interface for truncating them (via VOP_SETATTR rather than VOP_TRUNCATE). I am also contemplating adding a pathconf parameter (for concreteness, lets call it _PC_MAX_EXTSIZE) which would let an application determine the maximum size of the extended atribute storage. Sponsored by: DARPA & NAI Labs.	2002-07-19 07:29:39 +00:00
mckusick	0e74bce51c	Change the name of st_createtime to st_birthtime. This change is made to reduce confusion between st_ctime and st_createtime. Submitted by: Eric Allman <eric@sendmail.org> Sponsored by: DARPA & NAI Labs.	2002-07-16 22:36:00 +00:00
trhodes	97e17c531d	Fix a type: s/your are/you are/	2002-07-12 19:56:31 +00:00
bde	f7264c2c84	Fixed some printf format errors (4 new ones reported by gcc and 5 nearby old ones not reported by gcc). This helps unbreak LINT.	2002-07-08 12:42:29 +00:00
iedowse	c9b544afdb	Use indirect function pointer hooks instead of #ifdef SOFTUPDATES direct calls for the two places where the kernel calls into soft updates code. Set up the hooks in softdep_initialize() and NULL them out in softdep_uninitialize(). This change allows soft updates to function correctly when ufs is loaded as a module. Reviewed by: mckusick	2002-07-01 17:59:40 +00:00
iedowse	864dc55250	Add the ffs bits necessary to support unloading of the ufs kernel module. This adds an ffs_uninit() function that calls ufs_uninit() and also calls a new softdep_uninitialize() function. Add a stub for softdep_uninitialize() to cover the non-SOFTUPDATES case. Reviewed by: mckusick	2002-07-01 11:00:47 +00:00
iedowse	1bd3996837	Remove the kernel file-size limit for UFS2, so that only the limit imposed by the filesystem structure itself remains. With 16k blocks, the maximum file size is now just over 128TB. For now, the UFS1 file size limit is left unchanged so as to remain consistent with RELENG_4, but it too could be removed in the future. Reviewed by: mckusick	2002-06-26 18:34:51 +00:00
jlemon	ae81ab8edf	Prototype fixes (long newinum --> ino_t newinum).	2002-06-24 17:20:19 +00:00
mux	3d9fcd01da	Warning fixes for 64 bits platforms. This eliminates all the warnings I have had in the FFS code on sparc64. Reviewed by: mckusick	2002-06-23 18:17:27 +00:00
dillon	0d1889fa3a	Rename the BALLOC flags from B_* to BA_* to avoid confusion with the struct buf B_ flags. Approved by: mckusick	2002-06-23 06:12:22 +00:00
mckusick	8d943d9a95	This patch fixes a problem whereby filesystems that ran out of inodes in a cylinder group would fail to check for free inodes in other cylinder groups. This bug was introduced in the UFS2 code merge two days ago. An inode is allocated by calling ffs_valloc which calls ffs_hashalloc to do the filesystem scan. Ffs_hashalloc walks around the cylinder groups calling its passed allocator (ffs_nodealloccg in this case) until the allocator returns a non-zero result. The bug is that ffs_hashalloc expects the passed allocator function to return a 64-bit ufs2_daddr_t. When allocating inodes, it calls ffs_nodealloccg which was returning a 32-bit ino_t. The ffs_hashalloc code checked a 64-bit return value and usually found random non-zero bits in the high 32-bits so decided that the allocation had succeeded (in this case in the only cylinder group that it checked). When the result was passed back to ffs_valloc it looked at only the bottom 32-bits, saw zero and declared the system out of inodes. But ffs_hashalloc had really only checked one cylinder group. The fix is to change ffs_nodealloccg to return 64-bit results. Sponsored by: DARPA & NAI Labs. Submitted by: Poul-Henning Kamp <phk@critter.freebsd.dk> Reviewed by: Maxime Henrion <mux@freebsd.org>	2002-06-22 21:24:58 +00:00
mckusick	0694ff55c2	This commit adds basic support for the UFS2 filesystem. The UFS2 filesystem expands the inode to 256 bytes to make space for 64-bit block pointers. It also adds a file-creation time field, an ability to use jumbo blocks per inode to allow extent like pointer density, and space for extended attributes (up to twice the filesystem block size worth of attributes, e.g., on a 16K filesystem, there is space for 32K of attributes). UFS2 fully supports and runs existing UFS1 filesystems. New filesystems built using newfs can be built in either UFS1 or UFS2 format using the -O option. In this commit UFS1 is the default format, so if you want to build UFS2 format filesystems, you must specify -O 2. This default will be changed to UFS2 when UFS2 proves itself to be stable. In this commit the boot code for reading UFS2 filesystems is not compiled (see /sys/boot/common/ufsread.c) as there is insufficient space in the boot block. Once the size of the boot block is increased, this code can be defined. Things to note: the definition of SBSIZE has changed to SBLOCKSIZE. The header file <ufs/ufs/dinode.h> must be included before <ufs/ffs/fs.h> so as to get the definitions of ufs2_daddr_t and ufs_lbn_t. Still TODO: Verify that the first level bootstraps work for all the architectures. Convert the utility ffsinfo to understand UFS2 and test growfs. Add support for the extended attribute storage. Update soft updates to ensure integrity of extended attribute storage. Switch the current extended attribute interfaces to use the extended attribute storage. Add the extent like functionality (framework is there, but is currently never used). Sponsored by: DARPA & NAI Labs. Reviewed by: Poul-Henning Kamp <phk@freebsd.org>	2002-06-21 06:18:05 +00:00
semenu	4767980b4a	Fix a typo in my recently added comment: s/beleived/believed/ Submitted by: keramida	2002-06-06 20:43:03 +00:00
semenu	0a2cef21b1	Remove lock from ffs_vget introduced by v1.24. Instead of locking the vnode creation globaly, we allow processes to create vnodes concurently. In case of concurent creation of vnode for the one ino, we allow processes to race and then check who wins. Assuming that concurent creation of vnode for same ino is really rare case, this is belived to be an improvement, as it just allows concurent creation of vnodes. Idea by: bp Reviewed by: dillon MFC after: 1 month	2002-05-30 22:04:17 +00:00
iedowse	67a7ae436f	Remove um_i_effnlink_valid, i_spare[] and the ufsmount_u and inode_u unions, since these were only necessary when ext2fs used ufs code. Reviewed by: mckusick	2002-05-18 18:51:14 +00:00
phk	ded0e74c64	Fix ufs_daddr_t/daddr_t type problems. Sponsored by: DARPA & NAI labs.	2002-05-17 18:59:53 +00:00
trhodes	bf5f8aaae5	More s/file system/filesystem/g	2002-05-16 21:28:32 +00:00
phk	eb078c598f	Remove register keyword. Sponsored by: DARPA & NAI Labs. Submitted by: mckusick	2002-05-13 09:22:31 +00:00
phk	cd0ff99c54	ARGH! SBLOCK is not unused. Try to get this right. BBSIZE belongs in <sys/disklabel.h> (but shouldn't be a constant). Define SBLOCK again, using the right math. Sponsored by: DARPA & NAI Labs.	2002-05-12 20:21:40 +00:00
phk	135175475d	Remove #define for BBOFF, it is assumed == 0 so many places that we might as well forget about it. In fact the only thing which used it was the SBOFF macro. Sponsored by: DARPA & NAI Labs.	2002-05-12 20:00:21 +00:00
phk	7f89ad4a41	Remove unused BBLOCK and SBLOCK #defines. Sponsored by: DARPA & NAI Labs.	2002-05-12 19:56:31 +00:00
phk	983c9d7ce7	Name ufs_vop_[gs]etextattr() consistently with the rest of our VOPs and put then in the ufs_vnops where they belong, rather than in the ffs_vnops. Ok'ed by: rwatson Sponsored by: DARPA & NAI Labs.	2002-05-03 08:40:33 +00:00
jeff	6e7ba7da0d	Don't peak into the malloc_type structure for limits. The desired vnodes check should be sufficient. This is required for the pending removal of malloc_type limits.	2002-04-15 03:35:35 +00:00
phk	36e7ec5e79	Move generic disk ioctls from <sys/disklabel.h> to <sys/disk.h>. Sponsored by: DARPA & NAI Labs	2002-04-08 09:20:07 +00:00
jhb	5a4577c2f0	Change callers of mtx_init() to pass in an appropriate lock type name. In most cases NULL is passed, but in some cases such as network driver locks (which use the MTX_NETWORK_LOCK macro) and UMA zone locks, a name is used. Tested on: i386, alpha, sparc64	2002-04-04 21:03:38 +00:00
phk	5afaf0fe02	Move the FFS parameter MAXFRAG from <sys/param.h> to <ufs/ffs/fs.h> Sponsored by: DARPA & NAI Labs.	2002-04-03 20:39:27 +00:00
phk	ac70edf84a	Use DIOCGSECTORSIZE instead of the bogus DIOCGPART ioctl.	2002-04-02 11:23:14 +00:00
jhb	e3cbbf9804	Change the suser() API to take advantage of td_ucred as well as do a general cleanup of the API. The entire API now consists of two functions similar to the pre-KSE API. The suser() function takes a thread pointer as its only argument. The td_ucred member of this thread must be valid so the only valid thread pointers are curthread and a few kernel threads such as thread0. The suser_cred() function takes a pointer to a struct ucred as its first argument and an integer flag as its second argument. The flag is currently only used for the PRISON_ROOT flag. Discussed on: smp@	2002-04-01 21:31:13 +00:00
bde	0cfbb0a37b	In ffs_mountffs(), set mnt_iosize_max to si_iosize_max unconditionally provided the latter is nonzero. At this point, the former is a fairly arbitrary default value (DFTPHYS), so changing it to any reasonable value specified by the device driver is safe. Using the maximum of these limits broke ffs clustered i/o for devices whose si_iosize_max is < DFLTPHYS. Using the minimum would break device drivers' ability to increase the active limit from DFTLPHYS up to MAXPHYS. Copied the code for this and the associated (unnecessary?) fixup of mp_iosize_max to all other filesystems that use clustering (ext2fs and msdosfs). It was completely missing. PR: 36309 MFC-after: 1 week	2002-03-30 15:12:57 +00:00
alfred	acd70619ab	Remove __P.	2002-03-19 22:40:48 +00:00
bde	97b1d5e817	Fixed some printf format errors (hopefully all of the remaining daddr64_t ones for GENERIC, and all others on the same line as those). Reformat the printfs if necessary to avoid new long lones or old format printf errors.	2002-03-19 04:09:21 +00:00
mckusick	ee79f6a54d	Add a flags parameter to VFS_VGET to pass through the desired locking flags when acquiring a vnode. The immediate purpose is to allow polling lock requests (LK_NOWAIT) needed by soft updates to avoid deadlock when enlisting other processes to help with the background cleanup. For the future it will allow the use of shared locks for read access to vnodes. This change touches a lot of files as it affects most filesystems within the system. It has been well tested on FFS, loopback, and CD-ROM filesystems. only lightly on the others, so if you find a problem there, please let me (mckusick@mckusick.com) know.	2002-03-17 01:25:47 +00:00
mckusick	11d31f8461	Introduce the new 64-bit size disk block, daddr64_t. Change the bio and buffer structures to have daddr64_t bio_pblkno, b_blkno, and b_lblkno fields which allows access to disks larger than a Terabyte in size. This change also requires that the VOP_BMAP vnode operation accept and return daddr64_t blocks. This delta should not affect system operation in any way. It merely sets up the necessary interfaces to allow the development of disk drivers that work with these larger disk block addresses. It also allows for the development of UFS2 which will use 64-bit block addresses.	2002-03-15 18:49:47 +00:00
obrien	c6db5d10d3	Quiet a warning on the Alpha.	2002-03-15 04:06:10 +00:00
mckusick	1e898731a1	This corrects the first of two known deadlock conditions that come from the presence of a snapshot file.	2002-03-14 01:21:13 +00:00
phk	3c1100894c	I missed one VOP_CLOSE in the previous commit. Pointed out by: bde	2002-03-11 16:27:04 +00:00
phk	44066f813f	As a XXX bandaid open the mounted device READ/WRITE even if we only mount read-only. The trouble here is that we don't reopen the device in read/write mode when we remount in read/write mode resulting in a filesystem sending write requests to a device which was only opened read/only. I'm not quite sure how such a reopen would best be done and defer the problem to more agile hackers.	2002-03-11 13:53:00 +00:00
jhb	1dc49d429f	Use thread0.td_ucred instead of proc0.p_ucred. This change is cosmetic and isn't strictly required. However, it lowers the number of false positives found when grep'ing the kernel sources for p_ucred to ensure proper locking.	2002-02-27 19:18:10 +00:00
jhb	8bda78899a	Simple p_ucred -> td_ucred changes to start using the per-thread ucred reference.	2002-02-27 18:32:23 +00:00
julian	5c85128ac6	In a threaded world, differnt priorirites become properties of different entities. Make it so. Reviewed by: jhb@freebsd.org (john baldwin)	2002-02-11 20:37:54 +00:00
mckusick	08a5a86637	Occationally background fsck would cause a spurious ``freeing free inode'' panic. This change corrects that problem by setting the fs_active flag when the inode map changes to notify the snapshot code that the cylinder group must be rescanned. Submitted by: Robert Watson <rwatson@FreeBSD.org>	2002-02-07 22:13:56 +00:00
mckusick	21f56a6f78	Occationally deleted files would hang around for hours or days without being reclaimed. This bug was introduced in revision 1.95 dealing with filenames placed in newly allocated directory blocks, thus is not present in 4.X systems. The bug is triggered when a new entry is made in a directory after the data block containing the original new entry has been written, but before the inode that references the data block has been written. Submitted by: Bill Fenner <fenner@research.att.com>	2002-02-07 00:54:32 +00:00
mckusick	9faad9c5e6	When taking a snapshot, we must check for active files that have been unlinked (e.g., with a zero link count). We have to expunge all trace of these files from the snapshot so that they are neither reclaimed prematurely by fsck nor saved unnecessarily by dump.	2002-02-02 01:42:44 +00:00
mckusick	a643d17fe3	Add a stub for softdep_request_cleanup() so that compilation without SOFTUPDATES option works properly. Submitted by: Benno Rice <benno@jeamland.net>	2002-01-23 02:18:56 +00:00
mckusick	339c35e22d	This patch fixes a long standing complaint with soft updates in which small and/or nearly full filesystems would fail with `file system full' messages when trying to replace a number of existing files (for example during a system installation). When the allocation routines are about to fail with a file system full condition, they make a call to softdep_request_cleanup() which attempts to accelerate the flushing of pending deletion requests in an effort to free up space. In the face of filesystem I/O requests that exceed the available disk transfer capacity, the cleanup request could take an unbounded amount of time. Thus, the softdep_request_cleanup() routine will only try for tickdelay seconds (default 2 seconds) before giving up and returning a filesystem full error. Under typical conditions, the softdep_request_cleanup() routine is able to free up space in under fifty milliseconds.	2002-01-22 06:17:22 +00:00
mckusick	d595c0925a	Fix a bug introduced in ffs_snapshot.c -r1.25 and fs.h -r1.26 which caused incomplete snapshots to be taken. When background fsck would run on these snapshots, the result would be files being incorrectly released which would subsequently panic the kernel with ``handle_workitem_freefile: inodedep survived'', ``handle_written_inodeblock: live inodedep'', and ``handle_workitem_remove: lost inodedep'' errors.	2002-01-17 08:33:32 +00:00
mckusick	55caac3c81	Put write on read-only filesystem panic after we have weeded out block and character devices, fifo's, etc. Submitted by: Bruce Evans <bde@zeta.org.au>	2002-01-16 04:59:09 +00:00
mckusick	11173c24fa	When downgrading a filesystem from read-write to read-only, operations involving file removal or file update were not always being fully committed to disk. The result was lost files or corrupted file data. This change ensures that the filesystem is properly synced to disk before the filesystem is down-graded. This delta also fixes a long standing bug in which a file open for reading has been unlinked. When the last open reference to the file is closed, the inode is reclaimed by the filesystem. Previously, if the filesystem had been down-graded to read-only, the inode could not be reclaimed, and thus was lost and had to be later recovered by fsck. With this change, such files are found at the time of the down-grade. Normally they will result in the filesystem down-grade failing with `device busy'. If a forcible down-grade is done, then the affected files will be revoked causing the inode to be released and the open file descriptors to begin failing on attempts to read. Submitted by: "Sam Leffler" <sam@errno.com>	2002-01-15 07:17:12 +00:00
alfred	d6fae34086	SMP Lock struct file, filedesc and the global file list. Seigo Tanimura (tanimura) posted the initial delta. I've polished it quite a bit reducing the need for locking and adapting it for KSE. Locks: 1 mutex in each filedesc protects all the fields. protects "struct file" initialization, while a struct file is being changed from &badfileops -> &pipeops or something the filedesc should be locked. 1 mutex in each struct file protects the refcount fields. doesn't protect anything else. the flags used for garbage collection have been moved to f_gcflag which was the FILLER short, this doesn't need locking because the garbage collection is a single threaded container. could likely be made to use a pool mutex. 1 sx lock for the global filelist. struct file * fhold(struct file fp); / increments reference count on a file / struct file fhold_locked(struct file fp); / like fhold but expects file to locked / struct file ffind_hold(struct thread , int fd); / finds the struct file in thread, adds one reference and returns it unlocked / struct file ffind_lock(struct thread , int fd); / ffind_hold, but returns file locked */ I still have to smp-safe the fget cruft, I'll get to that asap.	2002-01-13 11:58:06 +00:00
mckusick	f8ec12d8e9	When going to sleep, we must save our SPL so that it does not get lost if some other process uses the lock while we are sleeping. We restore it after we have slept. This functionality is provided by a new routine interlocked_sleep() that wraps the interlocking with functions that sleep. This function is then used in place of the old ACQUIRE_LOCK_INTERLOCKED() and FREE_LOCK_INTERLOCKED() macros. Submitted by: Debbie Chu <dchu@juniper.net>	2002-01-12 20:57:36 +00:00
mckusick	c96bbc0dd7	Must call drain_output() before checking the dirty block list in softdep_sync_metadata(). Otherwise we may miss dependencies that need to be flushed which will result in a later panic with the message ``vinvalbuf: dirty bufs''. Submitted by: Matthew Dillon <dillon@apollo.backplane.com> MFC after: 1 week	2002-01-11 19:59:27 +00:00
msmith	746c9274c4	Initialise the bioops vector hack at runtime rather than at link time. This avoids the use of common variables. Reviewed by: mckusick	2002-01-08 19:32:18 +00:00
dillon	7add2f35d4	Fix a BUF_TIMELOCK race against BUF_LOCK and fix a deadlock in vget() against VM_WAIT in the pageout code. Both fixes involve adjusting the lockmgr's timeout capability so locks obtained with timeouts do not interfere with locks obtained without a timeout. Hopefully MFC: before the 4.5 release	2001-12-20 22:42:27 +00:00
mckusick	c8566412f3	Change the atomic_set_char to atomic_set_int and atomic_clear_char to atomic_clear_int to ease the implementation for the sparc64. Requested by: Jake Burkholder <jake@locore.ca>	2001-12-18 18:05:17 +00:00
iedowse	fe09bab338	Make sure we ignore the value of `fs_active' when reloading the superblock, and move the initialisation of it to beside where other pointer fields are initialised.	2001-12-16 18:54:09 +00:00
iedowse	c8ec3c90a3	Move the new superblock field `fs_active' into the region of the superblock that is already set up to handle pointer types. This fixes an accidental change in the superblock size on 64-bit platforms caused by revision 1.24.	2001-12-16 18:51:11 +00:00
mckusick	2b5acac851	Minimize the time necessary to suspend operations on a filesystem when taking a snapshot. The two time consuming operations are scanning all the filesystem bitmaps to determine which blocks are in use and scanning all the other snapshots so as to be able to expunge their blocks from the view of the current snapshot. The bitmap scanning is broken into two passes. Before suspending the filesystem all bitmaps are scanned. After the suspension, those bitmaps that changed after being scanned the first time are rescanned. Typically there are few bitmaps that need to be rescanned. The expunging of other snapshots is now done after the suspension is released by observing that we can easily identify any blocks that were allocated to them after the suspension (they will be maked as `not needing to be copied' in the just created snapshot). For all the gory details, see the ``Running fsck in the Background'' paper in the Usenix BSDCon 2002 Conference Proceedings, pages 55-64.	2001-12-14 00:15:06 +00:00
mckusick	240246479c	When a file is partially truncated, we first check to see if the new file end will land in the middle of a file hole. Since the last block of a file must always be allocated, the hole is filled by allocating a block at that location. If the hole being filled is a direct block, then the truncation may eventually reduce the full sized block down to a fragment. When running with soft updates, it is necessary to FSYNC the file after allocating the block and before creating the fragment to avoid triggering a soft updates inconsistency when the block unexpectedly shrinks. Found by: Matthew Dillon <dillon@apollo.backplane.com> MFC after: 1 week	2001-12-13 05:07:48 +00:00
dillon	aafb37b657	Implement kern.maxvnodes. adjusting kern.maxvnodes now actually has a real effect. Optimize vfs_msync(). Avoid having to continually drop and re-obtain mutexes when scanning the vnode list. Improves looping case by 500%. Optimize ffs_sync(). Avoid having to continually drop and re-obtain mutexes when scanning the vnode list. This makes a couple of assumptions, which I believe are ok, in regards to vnode stability when the mount list mutex is held. Improves looping case by 500%. (more optimization work is needed on top of these fixes) MFC after: 1 week	2001-10-26 00:08:05 +00:00
dillon	c11fc13d2b	Change the vnode list under the mount point from a LIST to a TAILQ in preparation for an implementation of limiting code for kern.maxvnodes. MFC after: 3 days	2001-10-23 01:21:29 +00:00
rwatson	494f543d01	o Replace two direct uid!=0 comparisons with suser_xxx() calls. Obtained from: TrustedBSD Project	2001-10-02 14:41:43 +00:00
rwatson	43f61f37a3	o Replace two direct uid!=0 comparisons with suser_td() calls. Obtained from: TrustedBSD Project	2001-10-02 14:34:22 +00:00
jhb	66509c6872	- Fix some minor whitespace nits. - Move the SPECIAL_FLAG #define up next to the NOHOLDER #define and fix a little nit that caused it to be defined as -(sizeof (struct thread) + 1) instead of -2.	2001-09-27 21:04:13 +00:00
julian	aff3918a2c	KSE Milestone 2 Note ALL MODULES MUST BE RECOMPILED make the kernel aware that there are smaller units of scheduling than the process. (but only allow one thread per process at this time). This is functionally equivalent to teh previousl -current except that there is a thread associated with each process. Sorry john! (your next MFC will be a doosie!) Reviewed by: peter@freebsd.org, dillon@freebsd.org X-MFC after: ha ha ha ha	2001-09-12 08:38:13 +00:00
iedowse	71b3fc3b44	The "dirpref" directory layout preference improvements make use of an array "fs_contigdirs[]" to avoid too many directories getting created in each cylinder group. The memory required for this and two other arrays (fs_csp[] and fs_maxcluster[]) is allocated with a single malloc() call, and divided up afterwards. However, the 'space' pointer is not advanced correctly, so fs_contigdirs and fs_maxcluster end up pointing to the same address. Add the missing code to advance the 'space' pointer, and remove an unnecessary update of the pointer that follows. This is likely to fix the "ffs_clusteralloc: map mismatch" panics that have been reported recently. Submitted by: Luke Mewburn <lukem@wasabisystems.com>	2001-09-09 23:48:28 +00:00
rwatson	dc12246dad	o At some point, unmounting a non-EA file system with EA's compiled in got a bit broken, when ufs_extattr_stop() was called and failed, ufs_extattr_destroy() would panic. This makes the call to destroy() conditional on the success of stop(). Submitted by: Christian Carstensen <cc@devcon.net> Obtained from: TrustedBSD Project	2001-09-01 20:11:05 +00:00
peter	b1cf415d98	Use a fixed type for times in on-disk structures for ufs rather than something that could potentially change like time_t.	2001-07-16 00:55:27 +00:00
jhb	62d93ab4ad	Fix more mntvnode and vnode interlock order reversals.	2001-06-28 22:21:33 +00:00
jhb	d19d170601	- Fix a mntvnode and vnode interlock reversal. - Protect the mnt_vnode list with the mntvnode lock. - Use queue(9) macros.	2001-06-28 04:12:56 +00:00
peter	b38f2aba5e	Fix warning: 1973: warning: int format, long int arg (arg 5)	2001-06-15 07:44:39 +00:00
mckusick	ff309fb095	Build on the change in revision 1.98 by Tor.Egge@fast.no. The symptom being treated in 1.98 was to avoid freeing a pagedep dependency if there was still a newdirblk dependency referencing it. That change is correct and no longer prints a warning message when it occurs. The other part of revision 1.98 was to panic when a newdirblk dependency was encountered during a file truncation. This fix removes that panic and replaces it with code to find and delete the newdirblk dependency so that the truncation can succeed.	2001-06-13 23:13:13 +00:00
obrien	9ef33c8546	There seems to be a problem that the order of disk write operation being incorrect due to a missing check for some dependency. This change avoids the freelist corruption (but not the temporarily inconsistent state of the file system). A message is printed as a reminder of the under lying problem when a pagedep structure is not freed due to the NEWBLOCK flag being set. Submitted by: Tor.Egge@fast.no	2001-06-05 01:49:37 +00:00
jhb	80cd95fd1a	Revert the previous commit in favor of the fix in rev 1.42 of ufs/ffs/ffs_extern.h instead. Requested by: bde	2001-05-30 23:09:19 +00:00
jhb	38492f2308	Forward declare struct cg to quiet a warning. Submitted by: bde	2001-05-30 23:08:40 +00:00
jhb	38e1153eed	Include <ufs/ffs/fs.h> to get the definition of struct cg to quiet a warning.	2001-05-29 23:53:16 +00:00
phk	88aea0d7ff	Remove last vestiges of MFS.	2001-05-29 21:21:53 +00:00
mckusick	0f917f00a5	Update softdep_setup_directory_add prototype to reflect changes in actual function. Obtained from: Jim Bloom <bloom@jbloom.jbloom.org>	2001-05-20 15:59:55 +00:00
mckusick	576ed04ba3	Must ensure that all the entries on the pd_pendinghd list have been committed to disk before clearing them. More specifically, when free_newdirblk is called, we know that the inode claims the new directory block. However, if the associated pagedep is still linked onto the directory buffer dependency chain, then some of the entries on the pd_pendinghd list may not be committed to disk yet. In this case, we will simply note that the inode claims the block and let the pd_pendinghd list be processed when the pagedep is next written. If the pagedep is no longer on the buffer dependency chain, then all the entries on the pd_pending list are committed to disk and we can free them in free_newdirblk. This corrects a window of vulnerability introduced in the code added in version 1.95.	2001-05-19 19:24:26 +00:00
mckusick	6e83687b77	Must be a bit less aggressive about freeing pagedep structures. Obtained from: Robert Watson <rwatson@FreeBSD.org> and Matthew Jacob <mjacob@feral.com>	2001-05-18 22:16:28 +00:00
mckusick	a3c514dadf	When a new block is allocated to a directory, an fsync of a file whose name is within that block must ensure not only that the block containing the file name has been written, but also that the on-disk directory inode references that block. When a new directory block is created, we allocate a newdirblk structure which is linked to the associated allocdirect (on its ad_newdirblk list). When the allocdirect has been satisfied, the newdirblk structure is moved to the inodedep id_bufwait list of its directory to await the inode being written. When the inode is written, the directory entries are fully committed and can be deleted from their pagedep->id_pendinghd and inodedep->id_pendinghd lists.	2001-05-17 07:24:03 +00:00
iedowse	14dd931aa7	Change the second argument of vflush() to an integer that specifies the number of references on the filesystem root vnode to be both expected and released. Many filesystems hold an extra reference on the filesystem root vnode, which must be accounted for when determining if the filesystem is busy and then released if it isn't busy. The old `skipvp' approach required individual filesystem xxx_unmount functions to re-implement much of vflush()'s logic to deal with the root vnode. All 9 filesystems that hold an extra reference on the root vnode got the logic wrong in the case of forced unmounts, so `umount -f' would always fail if there were any extra root vnode references. Fix this issue centrally in vflush(), now that we can. This commit also fixes a vnode reference leak in devfs, which could result in idle devfs filesystems that refuse to unmount. Reviewed by: phk, bp	2001-05-16 18:04:37 +00:00
mckusick	bed12dd525	Further fixes for deadlock in the presence of multiple snapshots. There are still more to find, but this fix should cover the common cases that folks are hitting.	2001-05-14 17:16:49 +00:00
mckusick	6f851aa835	Remove yet another deadlock case.	2001-05-11 07:12:03 +00:00
mckusick	f9ea5f696e	When running with soft updates, track the number of blocks and files that are committed to being freed and reflect these blocks in the counts returned by statfs (and thus also by the `df' command). This change allows programs such as those that do news expiration to know when to stop if they are trying to create a certain percentage of free space. Note that this change does not solve the much harder problem of making this to-be-freed space available to applications that want it (thus on a nearly full filesystem, you may still encounter out-of-space conditions even though the free space will show up eventually). Hopefully this harder problem will be the subject of a future enhancement.	2001-05-08 07:42:20 +00:00
mckusick	97d5d5bddb	Several fixes for units errors: 1) Do not assume that the superblock will be of size fs->fs_bsize. This fixes a panic when taking a snapshot on a filesystem with a block size bigger than 8K. 2) Properly calculate the number of fragments that follow the superblock summary information. This fixes a bug with inconsistent snapshots. 3) When cleaning up a snapshot that is about to be removed, properly calculate the number of blocks that need to be checked. This fixes a bug that created partially allocated inodes. 4) When moving blocks from a snapshot that is about to be removed to another snapshot, properly account for the reduced number of blocks in the snapshot from which they are taken. This fixes a bug in which the number of blocks released from a snapshot did not match the number that it claimed to have.	2001-05-08 07:29:03 +00:00
mckusick	c48e2c8b82	When syncing out snapshot metadata, we must temporarily allow recursive buffer locking so as to avoid locking against ourselves if we need to write filesystem metadata.	2001-05-08 07:13:00 +00:00
mckusick	4bd6a98c02	Refinement to revision 1.16 of ufs/ffs/ffs_snapshot.c to reduce the amount of time that the filesystem must be suspended. The current snapshot is elided as well as the earlier snapshots.	2001-05-04 05:49:28 +00:00
phk	e249b8764b	Remove blatantly pointless call to VOP_BMAP(). Use ufs_bmaparray() rather than VOP_BMAP() on our own vnodes.	2001-05-01 09:12:31 +00:00
phk	9f0c2a965a	Implement vop_std{get\|put}pages() and add them to the default vop[]. Un-copy&paste all the VOP_{GET\|PUT}PAGES() functions which do nothing but the default.	2001-05-01 08:34:45 +00:00
phk	aa58659992	VOP_BALLOC was never really a VOP in the first place, so convert it to UFS_BALLOC like the other "between UFS and FFS function interfaces".	2001-04-29 12:36:52 +00:00
phk	aa1facaa59	Remove faint traces of non-existant ffs_bmap().	2001-04-29 10:23:32 +00:00
grog	609bc7e870	Revert consequences of changes to mount.h, part 2. Requested by: bde	2001-04-29 02:45:39 +00:00
mckusick	61ba5bd5a3	Rather than copying all the indirect blocks of the snapshot, simply mark them as BLK_NOCOPY. This trick cuts the initial size of the snapshot in half and cuts the time to take a snapshot by a third.	2001-04-26 00:50:53 +00:00
mckusick	472f7b8265	When closing the last reference to an unlinked file, it is freed by the inactive routine. Because the freeing causes the filesystem to be modified, the close must be held up during periods when the filesystem is suspended. For snapshots to be consistent across crashes, they must write blocks that they copy and claim those written blocks in their on-disk block pointers before the old blocks that they referenced can be allowed to be written. Close a loophole that allowed unwritten blocks to be skipped when doing ffs_sync with a request to wait for all I/O activity to be completed.	2001-04-25 08:11:18 +00:00
phk	e5523c794f	Move the netexport structure from the fs-specific mountstructure to struct mount. This makes the "struct netexport *" paramter to the vfs_export and vfs_checkexport interface unneeded. Consequently that all non-stacking filesystems can use vfs_stdcheckexp(). At the same time, make it a pointer to a struct netexport in struct mount, so that we can remove the bogus AF_MAX and #include <net/radix.h> from <sys/mount.h>	2001-04-25 07:07:52 +00:00
iedowse	b37cd4dec0	Pre-dirpref versions of fsck may zero out the new superblock fields fs_contigdirs, fs_avgfilesize and fs_avgfpdir. This could cause panics if these fields were zeroed while a filesystem was mounted read-only, and then remounted read-write. Add code to ffs_reload() which copies the fs_contigdirs pointer from the previous superblock, and reinitialises fs_avgf* if necessary. Reviewed by: mckusick	2001-04-24 00:37:16 +00:00
grog	405d532596	Correct #includes to work with fixed sys/mount.h.	2001-04-23 09:05:15 +00:00
mckusick	4501b8e393	Add debugging option to always read/write cylinder groups as full sized blocks. To enable this option, use: `sysctl -w debug.bigcgs=1'. Add debugging option to disable background writes of cylinder groups. To enable this option, use: `sysctl -w debug.dobkgrdwrite=0'. These debugging options should be tried on systems that are panicing with corrupted cylinder group maps to see if it makes the problem go away. The set of panics in question are: ffs_clusteralloc: map mismatch ffs_nodealloccg: map corrupted ffs_nodealloccg: block not in map ffs_alloccg: map corrupted ffs_alloccg: block not in map ffs_alloccgblk: cyl groups corrupted ffs_alloccgblk: can't find blk in cyl ffs_checkblk: partially free fragment The following panics are less likely to be related to this problem, but might be helped by these debugging options: ffs_valloc: dup alloc ffs_blkfree: freeing free block ffs_blkfree: freeing free frag ffs_vfree: freeing free inode If you try these options, please report whether they helped reduce your bitmap corruption panics to Kirk McKusick at <mckusick@mckusick.com> and to Matt Dillon <dillon@earth.backplane.com>.	2001-04-17 05:37:51 +00:00
mckusick	e06af95642	Background fsck sysctl operations must use vn_start_write and vn_finished_write so that they do not attempt to modify a suspended filesystem.	2001-04-17 05:06:37 +00:00
mckusick	9bf65b6596	Update to describe use of mdconfig instead of deprecated vnconfig. Submitted by: Steve Ames <steve@virtual-voodoo.com>	2001-04-14 18:32:09 +00:00
mckusick	5d96cc9cf4	This checkin adds support in ufs/ffs for the FS_NEEDSFSCK flag. It is described in ufs/ffs/fs.h as follows: /* * Filesystem flags. * * Note that the FS_NEEDSFSCK flag is set and cleared only by the * fsck utility. It is set when background fsck finds an unexpected * inconsistency which requires a traditional foreground fsck to be * run. Such inconsistencies should only be found after an uncorrectable * disk error. A foreground fsck will clear the FS_NEEDSFSCK flag when * it has successfully cleaned up the filesystem. The kernel uses this * flag to enforce that inconsistent filesystems be mounted read-only. / #define FS_UNCLEAN 0x01 / filesystem not clean at mount / #define FS_DOSOFTDEP 0x02 / filesystem using soft dependencies / #define FS_NEEDSFSCK 0x04 / filesystem needs sync fsck before mount */	2001-04-14 05:26:28 +00:00
mckusick	d84cca13c9	Directory layout preference improvements from Grigoriy Orlov <gluk@ptci.ru>. His description of the problem and solution follow. My own tests show speedups on typical filesystem intensive workloads of 5% to 12% which is very impressive considering the small amount of code change involved. ------ One day I noticed that some file operations run much faster on small file systems then on big ones. I've looked at the ffs algorithms, thought about them, and redesigned the dirpref algorithm. First I want to describe the results of my tests. These results are old and I have improved the algorithm after these tests were done. Nevertheless they show how big the perfomance speedup may be. I have done two file/directory intensive tests on a two OpenBSD systems with old and new dirpref algorithm. The first test is "tar -xzf ports.tar.gz", the second is "rm -rf ports". The ports.tar.gz file is the ports collection from the OpenBSD 2.8 release. It contains 6596 directories and 13868 files. The test systems are: 1. Celeron-450, 128Mb, two IDE drives, the system at wd0, file system for test is at wd1. Size of test file system is 8 Gb, number of cg=991, size of cg is 8m, block size = 8k, fragment size = 1k OpenBSD-current from Dec 2000 with BUFCACHEPERCENT=35 2. PIII-600, 128Mb, two IBM DTLA-307045 IDE drives at i815e, the system at wd0, file system for test is at wd1. Size of test file system is 40 Gb, number of cg=5324, size of cg is 8m, block size = 8k, fragment size = 1k OpenBSD-current from Dec 2000 with BUFCACHEPERCENT=50 You can get more info about the test systems and methods at: http://www.ptci.ru/gluk/dirpref/old/dirpref.html Test Results tar -xzf ports.tar.gz rm -rf ports mode old dirpref new dirpref speedup old dirprefnew dirpref speedup First system normal 667 472 1.41 477 331 1.44 async 285 144 1.98 130 14 9.29 sync 768 616 1.25 477 334 1.43 softdep 413 252 1.64 241 38 6.34 Second system normal 329 81 4.06 263.5 93.5 2.81 async 302 25.7 11.75 112 2.26 49.56 sync 281 57.0 4.93 263 90.5 2.9 softdep 341 40.6 8.4 284 4.76 59.66 "old dirpref" and "new dirpref" columns give a test time in seconds. speedup - speed increasement in times, ie. old dirpref / new dirpref. ------ Algorithm description The old dirpref algorithm is described in comments: /* * Find a cylinder to place a directory. * * The policy implemented by this algorithm is to select from * among those cylinder groups with above the average number of * free inodes, the one with the smallest number of directories. / A new directory is allocated in a different cylinder groups than its parent directory resulting in a directory tree that is spreaded across all the cylinder groups. This spreading out results in a non-optimal access to the directories and files. When we have a small filesystem it is not a problem but when the filesystem is big then perfomance degradation becomes very apparent. What I mean by a big file system ? 1. A big filesystem is a filesystem which occupy 20-30 or more percent of total drive space, i.e. first and last cylinder are physically located relatively far from each other. 2. It has a relatively large number of cylinder groups, for example more cylinder groups than 50% of the buffers in the buffer cache. The first results in long access times, while the second results in many buffers being used by metadata operations. Such operations use cylinder group blocks and on-disk inode blocks. The cylinder group block (fs->fs_cblkno) contains struct cg, inode and block bit maps. It is 2k in size for the default filesystem parameters. If new and parent directories are located in different cylinder groups then the system performs more input/output operations and uses more buffers. On filesystems with many cylinder groups, lots of cache buffers are used for metadata operations. My solution for this problem is very simple. I allocate many directories in one cylinder group. I also do some things, so that the new allocation method does not cause excessive fragmentation and all directory inodes will not be located at a location far from its file's inodes and data. The algorithm is: / * Find a cylinder group to place a directory. * * The policy implemented by this algorithm is to allocate a * directory inode in the same cylinder group as its parent * directory, but also to reserve space for its files inodes * and data. Restrict the number of directories which may be * allocated one after another in the same cylinder group * without intervening allocation of files. * * If we allocate a first level directory then force allocation * in another cylinder group. / My early versions of dirpref give me a good results for a wide range of file operations and different filesystem capacities except one case: those applications that create their entire directory structure first and only later fill this structure with files. My solution for such and similar cases is to limit a number of directories which may be created one after another in the same cylinder group without intervening file creations. For this purpose, I allocate an array of counters at mount time. This array is linked to the superblock fs->fs_contigdirs[cg]. Each time a directory is created the counter increases and each time a file is created the counter decreases. A 60Gb filesystem with 8mb/cg requires 10kb of memory for the counters array. The maxcontigdirs is a maximum number of directories which may be created without an intervening file creation. I found in my tests that the best performance occurs when I restrict the number of directories in one cylinder group such that all its files may be located in the same cylinder group. There may be some deterioration in performance if all the file inodes are in the same cylinder group as its containing directory, but their data partially resides in a different cylinder group. The maxcontigdirs value is calculated to try to prevent this condition. Since there is no way to know how many files and directories will be allocated later I added two optimization parameters in superblock/tunefs. They are: int32_t fs_avgfilesize; / expected average file size / int32_t fs_avgfpdir; / expected # of files per directory */ These parameters have reasonable defaults but may be tweeked for special uses of a filesystem. They are only necessary in rare cases like better tuning a filesystem being used to store a squid cache. I have been using this algorithm for about 3 months. I have done a lot of testing on filesystems with different capacities, average filesize, average number of files per directory, and so on. I think this algorithm has no negative impact on filesystem perfomance. It works better than the default one in all cases. The new dirpref will greatly improve untarring/removing/coping of big directories, decrease load on cvs servers and much more. The new dirpref doesn't speedup a compilation process, but also doesn't slow it down. Obtained from: Grigoriy Orlov <gluk@ptci.ru>	2001-04-10 08:38:59 +00:00
asmodai	de38bad452	Fix typo ); -> ,	2001-03-24 15:25:04 +00:00
mckusick	9ebaece818	Check that background fsck operation is being done on a ufs filesystem. Obtained from: Robert Watson <rwatson@FreeBSD.org>	2001-03-23 20:58:25 +00:00
mckusick	a768b3faf9	Add kernel support for running fsck on active filesystems.	2001-03-21 04:09:01 +00:00
mckusick	bdf6c34569	Clear the fs_clean flag only when the FS_UNCLEAN flag is not set (as is done in unmount). Remove a snapshot inode from the superblock list when its last name goes away rather than when its last reference goes away. That way it will be properly reclaimed by fsck after a crash rather than reenabled when the filesystem is mounted.	2001-03-21 04:05:20 +00:00
mckusick	684cbbdfad	Report the correct inode number when panicing with freeing free inode. Report the correct block number when panicing with freeing free block.	2001-03-21 04:01:02 +00:00
rwatson	bf17769feb	o Change options FFS_EXTATTR and options FFS_EXTATTR_AUTOSTART to options UFS_EXTATTR and UFS_EXTATTR_AUTOSTART respectively. This change reflects the fact that our EA support is implemented entirely at the UFS layer (modulo FFS start/stop/autostart hooks for mount and unmount events). This also better reflects the fact that [shortly] MFS will also support EAs, as well as possibly IFS. o Consumers of the EA support in FFS are reminded that as a result, they must change kernel config files to reflect the new option names. Obtained from: TrustedBSD Project	2001-03-19 04:35:40 +00:00
rwatson	37684f8763	o Implement "options FFS_EXTATTR_AUTOSTART", which depends on "options FFS_EXTATTR". When extended attribute auto-starting is enabled, FFS will scan the .attribute directory off of the root of each file system, as it is mounted. If .attribute exists, EA support will be started for the file system. If there are files in the directory, FFS will attempt to start them as attribute backing files for attributes baring the same name. All attributes are started before access to the file system is permitted, so this permits race-free enabling of attributes. For attributes backing support for security features, such as ACLs, MAC, Capabilities, this is vital, as it prevents the file system attributes from getting out of sync as a result of file system operations between mount-time and the enabling of the extended attribute. The userland extattrctl tool will still function exactly as previously. Files must be placed directly in .attribute, which must be directly off of the file system root: symbolic links are not permitted. FFS_EXTATTR will continue to be able to function without FFS_EXTATTR_AUTOSTART for sites that do not want/require auto-starting. If you're using the UFS_ACL code available from www.TrustedBSD.org, using FFS_EXTATTR_AUTOSTART is recommended. o This support is implemented by adding an invocation of ufs_extattr_autostart() to ffs_mountfs(). In addition, several new supporting calls are introduced in ufs_extattr.c: ufs_extattr_autostart(): start EAs on the specified mount ufs_extattr_lookup(): given a directory and filename, return the vnode for the file. ufs_extattr_enable_with_open(): invoke ufs_extattr_enable() after doing the equililent of vn_open() on the passed file. ufs_extattr_iterate_directory(): iterate over a directory, invoking ufs_extattr_lookup() and ufs_extattr_enable_with_open() on each entry. o This feature is not widely tested, and therefore may contain bugs, caution is advised. Several changes are in the pipeline for this feature, including breaking out of EA namespaces into subdirectories of .attribute (this is waiting on the updated EA API), as well as a per-filesystem flag indicating whether or not EAs should be auto-started. This is required because administrators may not want .attribute auto-started on all file systems, especially if non-administrators have write access to the root of a file system. Obtained from: TrustedBSD Project	2001-03-14 05:32:31 +00:00
mckusick	f01ec099d7	Fixes to track snapshot copy-on-write checking in the specinfo structure rather than assuming that the device vnode would reside in the FFS filesystem (which is obviously a broken assumption with the device filesystem).	2001-03-07 07:09:55 +00:00
mckusick	1e6e2e153e	Free lock before returning from process_worklist_item. Obtained from: Constantine Sapuntzakis <csapuntz@stanford.edu>	2001-03-01 21:43:46 +00:00
adrian	8650fa6cdb	Reviewed by: jlemon An initial tidyup of the mount() syscall and VFS mount code. This code replaces the earlier work done by jlemon in an attempt to make linux_mount() work. * the guts of the mount work has been moved into vfs_mount(). * move `type', `path' and `flags' from being userland variables into being kernel variables in vfs_mount(). `data' remains a pointer into userspace. * Attempt to verify the `type' and `path' strings passed to vfs_mount() aren't too long. * rework mount() and linux_mount() to take the userland parameters (besides data, as mentioned) and pass kernel variables to vfs_mount(). (linux_mount() already did this, I've just tidied it up a little more.) * remove the copyin() stuff for `path'. `data' still requires copyin() since its a pointer into userland. * set `mount->mnt_statf_mntonname' in vfs_mount() rather than in each filesystem. This variable is generally initialised with `path', and each filesystem can override it if they want to. * NOTE: f_mntonname is intiailised with "/" in the case of a root mount.	2001-03-01 21:00:17 +00:00
mckusick	bc35986d9d	Free lock before calling panic so that subsequent attempt to write out buffers does not re-panic with `locking against myself'. This change should not affect normal operations of soft updates in any way.	2001-02-23 09:01:31 +00:00
mckusick	3c351c2d02	When cleaning up excess inode dependencies, check for being done. Reviewed by: Jan Koum <jkb@yahoo-inc.com>	2001-02-22 10:17:57 +00:00
mckusick	b5b51d050c	This patch corrects two problems with the rate limiting code that was introduced in revision 1.80. The problem manifested itself with a `locking against myself' panic and could also result in soft updates inconsistences associated with inodedeps. The two problems are: 1) One of the background operations could manipulate the bitmap while holding it locked with intent to create. This held lock results in a `locking against myself' panic, when the background processing that we have been coopted to do tries to lock the bitmap which we are already holding locked. To understand how to fix this problem, first, observe that we can do the background cleanups in inodedep_lookup only when allocating inodedeps (DEPALLOC is set in the call to inodedep_lookup). Second observe that calls to inodedep_lookup with DEPALLOC set can only happen from the following calls into the softdep code: softdep_setup_inomapdep softdep_setup_allocdirect softdep_setup_remove softdep_setup_freeblocks softdep_setup_directory_change softdep_setup_directory_add softdep_change_linkcnt Only the first two of these can come from ffs_alloc.c while holding a bitmap locked. Thus, inodedep_lookup must not go off to do request_cleanups when being called from these functions. This change adds a flag, NODELAY, that can be passed to inodedep_lookup to let it know that it should not do background processing in those cases. 2) The return value from request_cleanup when helping out with the cleanup was 0 instead of 1. This meant that despite the fact that we may have slept while doing the cleanups, the code did not recheck for the appearance of an inodedep (e.g., goto top in inodedep_lookup). This lead to the softdep inconsistency in which we ended up with two inodedep's for the same inode. Reviewed by: Peter Wemm <peter@yahoo-inc.com>, Matt Dillon <dillon@earth.backplane.com>	2001-02-20 11:14:38 +00:00
asmodai	d8144cf92b	Preceed/preceeding are not english words. Use precede and preceding.	2001-02-18 10:43:53 +00:00
jake	45e3d6a42c	Implement a unified run queue and adjust priority levels accordingly. - All processes go into the same array of queues, with different scheduling classes using different portions of the array. This allows user processes to have their priorities propogated up into interrupt thread range if need be. - I chose 64 run queues as an arbitrary number that is greater than 32. We used to have 4 separate arrays of 32 queues each, so this may not be optimal. The new run queue code was written with this in mind; changing the number of run queues only requires changing constants in runq.h and adjusting the priority levels. - The new run queue code takes the run queue as a parameter. This is intended to be used to create per-cpu run queues. Implement wrappers for compatibility with the old interface which pass in the global run queue structure. - Group the priority level, user priority, native priority (before propogation) and the scheduling class into a struct priority. - Change any hard coded priority levels that I found to use symbolic constants (TTIPRI and TTOPRI). - Remove the curpriority global variable and use that of curproc. This was used to detect when a process' priority had lowered and it should yield. We now effectively yield on every interrupt. - Activate propogate_priority(). It should now have the desired effect without needing to also propogate the scheduling class. - Temporarily comment out the call to vm_page_zero_idle() in the idle loop. It interfered with propogate_priority() because the idle process needed to do a non-blocking acquire of Giant and then other processes would try to propogate their priority onto it. The idle process should not do anything except idle. vm_page_zero_idle() will return in the form of an idle priority kernel thread which is woken up at apprioriate times by the vm system. - Update struct kinfo_proc to the new priority interface. Deliberately change its size by adjusting the spare fields. It remained the same size, but the layout has changed, so userland processes that use it would parse the data incorrectly. The size constraint should really be changed to an arbitrary version number. Also add a debug.sizeof sysctl node for struct kinfo_proc.	2001-02-12 00:20:08 +00:00
bmilekic	e67bcfcaf3	Change and clean the mutex lock interface. mtx_enter(lock, type) becomes: mtx_lock(lock) for sleep locks (MTX_DEF-initialized locks) mtx_lock_spin(lock) for spin locks (MTX_SPIN-initialized) similarily, for releasing a lock, we now have: mtx_unlock(lock) for MTX_DEF and mtx_unlock_spin(lock) for MTX_SPIN. We change the caller interface for the two different types of locks because the semantics are entirely different for each case, and this makes it explicitly clear and, at the same time, it rids us of the extra `type' argument. The enter->lock and exit->unlock change has been made with the idea that we're "locking data" and not "entering locked code" in mind. Further, remove all additional "flags" previously passed to the lock acquire/release routines with the exception of two: MTX_QUIET and MTX_NOSWITCH The functionality of these flags is preserved and they can be passed to the lock/unlock routines by calling the corresponding wrappers: mtx_{lock, unlock}_flags(lock, flag(s)) and mtx_{lock, unlock}_spin_flags(lock, flag(s)) for MTX_DEF and MTX_SPIN locks, respectively. Re-inline some lock acq/rel code; in the sleep lock case, we only inline the _obtain_lock()s in order to ensure that the inlined code fits into a cache line. In the spin lock case, we inline recursion and actually only perform a function call if we need to spin. This change has been made with the idea that we generally tend to avoid spin locks and that also the spin locks that we do have and are heavily used (i.e. sched_lock) do recurse, and therefore in an effort to reduce function call overhead for some architectures (such as alpha), we inline recursion for this case. Create a new malloc type for the witness code and retire from using the M_DEV type. The new type is called M_WITNESS and is only declared if WITNESS is enabled. Begin cleaning up some machdep/mutex.h code - specifically updated the "optimized" inlined code in alpha/mutex.h and wrote MTX_LOCK_SPIN and MTX_UNLOCK_SPIN asm macros for the i386/mutex.h as we presently need those. Finally, caught up to the interface changes in all sys code. Contributors: jake, jhb, jasone (in no particular order)	2001-02-09 06:11:45 +00:00
phk	780a12ba92	Another round of the <sys/queue.h> FOREACH transmogriffer. Created with: sed(1) Reviewed by: md5(1)	2001-02-04 16:08:18 +00:00
phk	8363061cc7	Mechanical change to use <sys/queue.h> macro API instead of fondling implementation details. Created with: sed(1) Reviewed by: md5(1)	2001-02-04 13:13:25 +00:00
phk	5888a3757b	Use <sys/queue.h> macro API.	2001-02-04 12:37:48 +00:00
dillon	fa650a9da8	Fix a race between the syncer and umount. When you umount a softupdates filesystem softdep_process_worklist() is called in a loop until it indicates that no dependancies remain, but the determination of that fact depends on there only being one softdep_process_worklist() instance running. It was possible for the syncer to also be running softdep_process_worklist() and the pre-existing checks in the code to prevent this were not sufficient to prevent the race. This patch solves the problem. Approved-by: mckusick	2001-01-30 06:31:59 +00:00
jasone	cd97c8f6f2	Convert all simplelocks to mutexes and remove the simplelock implementations.	2001-01-24 12:35:55 +00:00
iedowse	5c5ba0de81	The ffs superblock includes a 128-byte region for use by temporary in-core pointers to summary information. An array in this region (fs_csp) could overflow on filesystems with a very large number of cylinder groups (~16000 on i386 with 8k blocks). When this happens, other fields in the superblock get corrupted, and fsck refuses to check the filesystem. Solve this problem by replacing the fs_csp array in 'struct fs' with a single pointer, and add padding to keep the length of the 128-byte region fixed. Update the kernel and userland utilities to use just this single pointer. With this change, the kernel no longer makes use of the superblock fields 'fs_csshift' and 'fs_csmask'. Add a comment to newfs/mkfs.c to indicate that these fields must be calculated for compatibility with older kernels. Reviewed by: mckusick	2001-01-15 18:30:40 +00:00

1 2 3 4 5 ...

607 Commits