freebsd-nq

Author	SHA1	Message	Date
Kirk McKusick	bd189c8c3e	When expunging unlinked files from a snapshot, skip over holes in the file rather than panicking with "indiracct: botched params". Submitted by: Mark Santcroos <marks@ripe.net>	2003-10-17 13:57:58 +00:00
Jeff Roberson	a844eb934c	- My last commit to this file is still not safe, I believe that it may be due to the recursion in indir_trunc().	2003-10-06 03:28:03 +00:00
Jeff Roberson	8af6a57099	- Reinstate 1.142; this was fixed by 1.144.	2003-10-06 02:39:37 +00:00
Jeff Roberson	69b609a85d	- The VCHR case in ffs_sync() is an unnecessary optimization especially considering how infrequently we access devices via ffs now that we have devfs. Collapse this case with the other case. Obtained from: bde	2003-10-05 22:56:33 +00:00
Jeff Roberson	ab1f917b53	- Further simplify ffs_sync(). The vnode lock is required for UFS_UPDATE() so make the code slightly more uniform. The vnode lock is acquired in all cases and now the only difference between VCHR and other is we call UFS_UPDATE instead of VOP_FSYNC().	2003-10-05 09:42:24 +00:00
Jeff Roberson	cffa37d466	- In ffs_update() assert that either the vnode lock or the XLOCK is held.	2003-10-05 09:39:02 +00:00
Jeff Roberson	2f05568aa8	- Check the XLOCK before inspecting v_data. - Slightly rewrite the fsync loop to be more lock friendly. We must acquire the vnode interlock before dropping the mnt lock. We must also check XLOCK to prevent vclean() races. - Use LK_INTERLOCK in the vget() in ffs_sync to further prevent vclean() races. - Use a local variable to store the results of the nvp == TAILQ_NEXT test so that we do not access the vp after we've vrele()d it. - Add an XXX comment about UFS_UPDATE() not being protected by any lock here. I suspect that it should need the VOP lock.	2003-10-05 07:16:45 +00:00
Jeff Roberson	53938b4a86	- Skip over xvp if XLOCK is set.	2003-10-05 06:48:37 +00:00
Alan Cox	ccf78b6895	Synchronize access to a vm page's valid field using the containing vm object's lock.	2003-10-04 20:38:32 +00:00
Jeff Roberson	cac3558da3	- The VI assert in getdirtybuf() is only valid if we're not on a VCHR vnode. VCHR vnodes don't do background writes. Reported by: kan	2003-10-04 15:57:05 +00:00
Jeff Roberson	04a17687ea	- Increase the scope of the interlock in ffs_reload(). Acquire it before we release the mntvnode_mtx. - Call vgonel() directly instead of going through vrecycle() since we own the interlock now. - Remove a few cases where we locked the interlock just so that we could call VOP_UNLOCK with interlock held.	2003-10-04 14:27:49 +00:00
Jeff Roberson	934914d2ef	- Fix an unlocked call to GETATTR by slightly shuffling the code in ffs_snapshot() around. - Acquire the interlock before releasing the mntvnode_mtx. Use the interlock to protect v_usecount access.	2003-10-04 14:25:45 +00:00
Jeff Roberson	04c81ad83c	- Remove a mp_fixme() and some locks that weren't necessary. I now understand how this works.	2003-10-04 11:06:43 +00:00
Jeff Roberson	cfd5600c66	- Several of the callers to getdirtybuf() were erroneously changed to pass in a list head instead of a pointer to the first element at the time of the first call. These lists are subject to change, and getdirtybuf() would refetch from the wrong list in some cases. Spotted by: tegge Pointy hat to: me	2003-09-03 04:08:15 +00:00
Jeff Roberson	23efe6dafc	- Backout rev 1.142. This caused a deadlock that I do not understand. More investigation is required.	2003-08-31 11:26:52 +00:00
Jeff Roberson	d919a11d06	- Define a new flag for getblk(): GB_NOCREAT. This flag causes getblk() to bail out if the buffer is not already present. - The buffer returned by incore() is not locked and should not be sent to brelse(). Use getblk() with the new GB_NOCREAT flag to preserve the desired semantics.	2003-08-31 08:50:11 +00:00
Jeff Roberson	a0ebaaddef	- Don't acquire the vnode interlock in drain_output(). Instead, require the caller to acquire it. This permits drain_output() to be done atomically with other operations as well as reducing the number of lock operations. - Assert that the proper locks are held in drain_output(). - Change getdirtybuf() to accept a mutex as an argument. This mutex is used to protect the vnode's buf list and the BKGRDWAIT flag. This lock is dropped when we successfully acquire a buffer and held on return otherwise. These semantics reduce the number of cumbersome cases in calling code. - Pass the mtx from getdirtybuf() into interlocked_sleep() and allow this mutex to be used as the interlock argument to BUF_LOCK() in the LOCKBUF case of interlocked_sleep(). - Change the return value of getdirtybuf() to be the resulting locked buffer or NULL otherwise. This is for callers who pass in a list head that requires a lock. It is necessary since the lock that protects the list head must be dropped in getdirtybuf() so that we don't have a lock order reversal with the buf queues lock in bremfree(). - Adjust all callers of getdirtybuf() to match the new semantics. - Add a comment in indir_trunc() that points at unlocked access to a buf. This may also be one of the last instances of incore() in the tree.	2003-08-31 07:29:34 +00:00
Jeff Roberson	9dbfeb0ae6	- Move BX_BKGRDWAIT and BX_BKGRDINPROG to BV_ and the b_vflags field. - Surround all accesses of the BKGRD{WAIT,INPROG} flags with the vnode interlock. - Don't use the B_LOCKED flag and QUEUE_LOCKED for background write buffers. Check for the BKGRDINPROG flag before recycling or throwing away a buffer. We do this instead because it is not safe for us to move the original buffer to a new queue from the callback on the background write buffer. - Remove the B_LOCKED flag and the locked buffer queue. They are no longer used. - The vnode interlock is used around checks for BKGRDINPROG where it may not be strictly necessary. If we hold the buf lock then a background write will not be started without our knowledge; one may only be completed while we're not looking. Rather than remove the code, document two of the places where this extra locking is done. A pass should be done to verify and minimize the locking later.	2003-08-28 06:55:18 +00:00
Alan Cox	9cf8f2f707	The previous change necessitates the addition of a new #include. Otherwise, there is a compilation warning.	2003-08-18 17:27:08 +00:00
Poul-Henning Kamp	b103854847	Don't use a VOP_*() function on our own vnodes, go directly to the relevant internal function, in this case ufs_bmaparray().	2003-08-17 19:26:03 +00:00
Alan Cox	f6c098e569	Revision 1.44 of ufs/ufs/inode.h has made it necessary to add two new #includes to this file. Otherwise, it doesn't compile.	2003-08-16 06:15:17 +00:00
Poul-Henning Kamp	5c24d6ee26	Eliminate the i_devvp field from the incore UFS inodes, we can get the same value from ip->i_ump->um_devvp. This saves a pointer in the memory copies of inodes, which can easily run into several hundred kilobytes. The extra indirection is unmeasurable in benchmarks. Approved by: mckusick	2003-08-15 20:03:19 +00:00
John Baldwin	8b149b5131	Consistently use the BSD u_int and u_short instead of the SYSV uint and ushort. In most of these files, there was a mixture of both styles and this change just makes them self-consistent. Requested by: bde (kern_ktrace.c)	2003-08-07 15:04:27 +00:00
Robert Watson	9080ff25cf	Rename VOP_RMEXTATTR() to VOP_DELETEEXTATTR() for consistency with the kernel ACL interfaces and system call names. Break out UFS2 and FFS extattr delete and list vnode operations from setextattr and getextattr to deleteextattr and listextattr, which cleans up the implementations, and makes the results more readable, and makes the APIs more clear. Obtained from: TrustedBSD Project Sponsored by: DARPA, Network Associates Laboratories	2003-07-28 18:53:29 +00:00
Poul-Henning Kamp	a8d43c90af	Add a "int fd" argument to VOP_OPEN() which in the future will contain the filedescriptor number on opens from userland. The index is used rather than a "struct file *" since it conveys a bit more information, which may be useful to fdescfs and /dev/fd/ in particular. For now pass -1 all over the place.	2003-07-26 07:32:23 +00:00
Alan Cox	4e28b22e35	Lock the vm object when freeing pages.	2003-06-15 21:50:38 +00:00
Poul-Henning Kamp	cefb5754dd	Add the same KASSERT to all VOP_STRATEGY and VOP_SPECSTRATEGY implementations to check that the buffer points to the correct vnode.	2003-06-15 18:53:00 +00:00
Poul-Henning Kamp	7652131bee	Initialize struct vfsops C99-sparsely. Submitted by: hmp Reviewed by: phk	2003-06-12 20:48:38 +00:00
David E. O'Brien	f4636c5959	Use __FBSDID().	2003-06-11 06:34:30 +00:00
Robert Watson	1e9e2eb598	Implement ffs_listextattr() by breaking out that logic and special-cased attribute name of "" from ffs_getextattr(). Invoking VOP_GETEXTATTR() with an empty name is now no longer supported; user application compatibility is provided by a system call level compatibility wrapper. We make sure to explicitly reject attempts to set an EA with the name "". Obtained from: TrustedBSD Project Sponsored by: DARPA, Network Associates Laboratories	2003-06-05 05:57:39 +00:00
Robert Watson	e1249def7d	Return EOPNOTSUPP for attempted EA operations on VCHR vnodes in UFS2; if we permit them to occur, the kernel panics due to our performing EA operations using VOP_STRATEGY on the vnode. This went unnoticed previously because there are very few users of device nodes on UFS2 due to the introduction of devfs. However, this can come up with the Linux compat directories and its hard-coded dev nodes (which will need to go away as we move away from hard-coded device numbers). This can come up if you use EA-intensive features such as ACLs and MAC. The proper fix is pretty complicated, but this band-aid would be an excellent MFC candidate for the release.	2003-06-01 02:42:18 +00:00
Poul-Henning Kamp	6280ed26af	Remove unused local variables. Found by: FlexeLint	2003-05-31 18:17:32 +00:00
Poul-Henning Kamp	17a1391990	The IO_NOWDRAIN and B_NOWDRAIN hacks are no longer needed to prevent deadlocks with vnode backed md(4) devices because md now uses a kthread to run the bio requests instead of doing it directly from the bio down path.	2003-05-31 16:42:45 +00:00
Alan Cox	7f758dabbb	Lock the vm object when performing vm_object_page_clean(). Approved by: re (rwatson)	2003-05-18 22:02:51 +00:00
Alan Cox	ad682c4825	Lock the vm_object on entry to vm_object_vndeallocate().	2003-05-03 20:28:26 +00:00
Tim J. Robbins	3632928957	Do not attempt to free NULL dinodes (i_din1 or i_din2) in ffs_ifree(). These fields can be left as NULL if ffs_vget() allocates an inode but fails before the dinode memory has been allocated. There are two cases when this can occur: when we lose a race and another process has added the inode to the hash, and when reading the inode off disk fails. The bug was observed by Kris on one of the package-building machines. See http://marc.theaimsgroup.com/?l=freebsd-current&m=105172731013411&w=2 In Kris's case, it was the bread() that failed because of a disk error. The alternative to this patch is to ensure that ffs_vget() does not call vput() when the inode hasn't been properly initialised.	2003-05-01 06:41:59 +00:00
Tim J. Robbins	8d721e877d	Free i_din2 instead of i_din1 in ffs_ifree() on UFS2 filesystems. This is purely a cosmetic change because these members are in a union together.	2003-05-01 06:38:27 +00:00
Mark Murray	51da11a27a	Fix some easy, global, lint warnings. In most cases, this means making some local variables static. In a couple of cases, this means removing an unused variable.	2003-04-30 12:57:40 +00:00
Alexander Kabaev	104a9b7e3e	Deprecate machine/limits.h in favor of new sys/limits.h. Change all in-tree consumers to include <sys/limits.h> Discussed on: standards@ Partially submitted by: Craig Rodrigues <rodrigc@attbi.com>	2003-04-29 13:36:06 +00:00
John Baldwin	a15cc35909	Lock both the proc lock and sched_lock when calling sched_nice since kg_nice is now protected by both. Being protected by both means that other places in the kernel that want to read kg_nice only need one of the two locks.	2003-04-22 20:45:38 +00:00
Jeff Roberson	86711bae9b	- Use the sched_nice() api instead of setting the nice value directly. Tested by: Steve Kargl <sgk@troutmask.apl.washington.edu>	2003-04-12 01:05:19 +00:00
Alan Cox	6134838f99	Sufficient access checks are performed by vmapbuf() that calling useracc() is pointless. Remove the call to useracc(). Don't reinitialize fields that are already initialized by getpbuf(). Reviewed by: tegge	2003-04-06 19:26:30 +00:00
Tor Egge	5e2e6a67c4	Check return value from vmapbuf instead of the function address.	2003-03-27 20:48:34 +00:00
Tor Egge	10dccf8ff2	Eliminate a buffer sleep/wakeup race.	2003-03-27 19:28:11 +00:00
Tor Egge	5bbb806004	Add support for reading directly from file to userland buffer when the O_DIRECT descriptor status flag is set and both offset and length are multiples of the physical media sector size.	2003-03-26 23:40:42 +00:00
John Baldwin	31566c96f4	Use td->td_ucred instead of td->td_proc->p_ucred.	2003-03-20 21:17:40 +00:00
John Baldwin	2a53bfbe62	Minor fixes to ffs_fserr(): - Assume that curthread is not NULL. It never is in -current. - Use td_ucred instead of p_ucred.	2003-03-20 21:15:54 +00:00
Poul-Henning Kamp	b4b138c27f	Including <sys/stdint.h> is (almost?) universally only to be able to use %j in printfs, so put a nested include in <sys/systm.h> where the printf prototype lives and save everybody else the trouble.	2003-03-18 08:45:25 +00:00
Jeff Roberson	09f11da5a3	- Remove a race between fsync like functions and flushbufqueues() by requiring locked bufs in vfs_bio_awrite(). Previously the buf could have been written out by fsync before we acquired the buf lock if it weren't for giant. The cluster_wbuild() handles this race properly but the single write at the end of vfs_bio_awrite() would not. - Modify flushbufqueues() so there is only one copy of the loop. Pass a parameter in that says whether or not we should sync bufs with deps. - Call flushbufqueues() a second time and then break if we couldn't find any bufs without deps.	2003-03-13 07:19:23 +00:00
Kirk McKusick	34968037b1	Use the appropriate size when zeroing out the unused portion of a snapshot's copy of a superblock. This patch fixes a panic when taking a snapshot of a 4096/512 filesystem. Reported by: Ian Freislich <ianf@za.uu.net> Sponsored by: DARPA & NAI Labs.	2003-03-07 23:49:16 +00:00
Alan Cox	09c80124a3	Remove ENABLE_VFS_IOOPT. It is a long unfinished work-in-progress. Discussed on: arch@	2003-03-06 03:41:02 +00:00
Jeff Roberson	7261f5f68e	- Add a new 'flags' parameter to getblk(). - Define one flag GB_LOCK_NOWAIT that tells getblk() to pass the LK_NOWAIT flag to the initial BUF_LOCK(). This will eventually be used in cases where we want to use a buffer only if it is not currently in use. - Convert all consumers of the getblk() api to use this extra parameter. Reviewed by: arch Not objected to by: mckusick	2003-03-04 00:04:44 +00:00
Dag-Erling Smørgrav	521f364b80	More low-hanging fruit: kill caddr_t in calls to wakeup(9) / [mt]sleep(9).	2003-03-02 16:54:40 +00:00
Kirk McKusick	74f3809a19	Change the field used to test whether the superblock has been updated from the filesystem size field to the filesystem maximum blocksize field. The problem is that older versions of growfs updated only the new size field and not the old size field. This resulted in the old (smaller) size field being copied up to the new size field which caused the filesystem to appear to fsck to be badly trashed. This also adds a sanity check to ensure that the superblock is not being updated when the filesystem is mounted read-only. Obviously such an update should never happen. Reported by: Nate Lawson <nate@root.org> Sponsored by: DARPA & NAI Labs.	2003-02-25 23:21:08 +00:00
Jeff Roberson	17661e5ac4	- Add an interlock argument to BUF_LOCK and BUF_TIMELOCK. - Remove the buftimelock mutex and acquire the buf's interlock to protect these fields instead. - Hold the vnode interlock while locking bufs on the clean/dirty queues. This reduces some cases from one BUF_LOCK with a LK_NOWAIT and another BUF_LOCK with a LK_TIMEFAIL to a single lock. Reviewed by: arch, mckusick	2003-02-25 03:37:48 +00:00
Kirk McKusick	3bf0ed940b	When removing the last item from a non-empty worklist, the worklist tail pointer must be updated. Reported by: Kris Kennaway <kris@obsecurity.org> Sponsored by: DARPA & NAI Labs.	2003-02-24 07:28:41 +00:00
Kirk McKusick	5bb651cb72	This patch fixes a deadlock between the bufdaemon and a process taking a snapshot. As part of taking a snapshot of a filesystem, the kernel builds up a list of the filesystem metadata (such as the cylinder group bitmaps) that are contained in the snapshot. When doing a copy-on-write check, the list is first consulted. If the block being written is found on the list, then the full snapshot lookup can be avoided. Besides providing an important performance speedup this check also avoids a potential deadlock between the code creating the snapshot and the bufdaemon trying to cleanup snapshot related buffers. This fix creates a temporary list containing the key metadata blocks that can cause the deadlock. This temporary list is used between the time that the snapshot is first enabled and the time that the fully complete list is built. Reported by: Attila Nagy <bra@fsn.hu> Sponsored by: DARPA & NAI Labs.	2003-02-22 00:59:34 +00:00
Kirk McKusick	37e2ebfdba	This patch fixes a bug that caused an active filesystem on which a snapshot is being taken to panic with either "freeing free block" or "freeing free inode". The problem arises when the snapshot code is scanning the filesystem looking for inodes with a reference count of zero (e.g., unlinked but still open) so that it can expunge them from its view. If it encounters a reclaimed vnode and has to restart its scan, then it will panic if it encounters and tries to free an inode that it has already processed. The fix is to check each candidate inode to see if it has already been processed before trying to delete it from the snapshot image. Sponsored by: DARPA & NAI Labs.	2003-02-22 00:29:51 +00:00
Kirk McKusick	d60682c239	This patch fixes a bug in the logical block calculation macros so that they convert to 64-bit values before shifting rather than afterwards. Once fixed, they can be used rather than inline expanded. Sponsored by: DARPA & NAI Labs.	2003-02-22 00:19:26 +00:00
Warner Losh	a163d034fa	Back out M_* changes, per decision of the TRB. Approved by: trb	2003-02-19 05:47:46 +00:00
Kirk McKusick	aca3e4974f	Replace use of random() with arc4random() to provide less guessable values for the initial inode generation numbers in newfs and for newly allocated inode generation numbers in the kernel. Submitted by: Theo de Raadt <deraadt@cvs.openbsd.org> Sponsored by: DARPA & NAI Labs.	2003-02-14 21:31:58 +00:00
Kirk McKusick	50bd54e391	Correct lines incorrectly added to the copyright message. Submitted by: Frank van der Linden <fvdl@wasabisystems.com> Sponsored by: DARPA & NAI Labs.	2003-02-14 00:31:06 +00:00
Jeff Roberson	767b9a529d	- Cleanup unlocked accesses to buf flags by introducing a new b_vflag member that is protected by the vnode lock. - Move B_SCANNED into b_vflags and call it BV_SCANNED. - Create a vop_stdfsync() modeled after spec's sync. - Replace spec_fsync, msdos_fsync, and hpfs_fsync with the stdfsync and some fs specific processing. This gives all of these filesystems proper behavior wrt MNT_WAIT/NOWAIT and the use of the B_SCANNED flag. - Annotate the locking in buf.h	2003-02-09 11:28:35 +00:00
Alfred Perlstein	44956c9863	Remove M_TRYWAIT/M_WAITOK/M_WAIT. Callers should use 0. Merge M_NOWAIT/M_DONTWAIT into a single flag M_NOWAIT.	2003-01-21 08:56:16 +00:00
Matthew Dillon	48e3128b34	Bow to the whining masses and change a union back into void *. Retain removal of unnecessary casts and throw in some minor cleanups to see if anyone complains, just for the hell of it.	2003-01-13 00:33:17 +00:00
Matthew Dillon	cd72f2180b	Change struct file f_data to un_data, a union of the correct struct pointer types, and remove a huge number of casts from code using it. Change struct xfile xf_data to xun_data (ABI is still compatible). If we need to add a #define for f_data and xf_data we can, but I don't think it will be necessary. There are no operational changes in this commit.	2003-01-12 01:37:13 +00:00
Marcel Moolenaar	cc4a858397	o Improve wording of the comment that accompanies fs_pad. The padding is not specific to non-i386 architectures. It is caused by non-i386 specific alignment requirements of fs_swuid. o Add a CTASSERT to catch a change in the size of struct fs at compile-time rather than run-time. Ok'd: gordon Tested on: i386 ia64	2003-01-10 06:59:34 +00:00
Gordon Tetlow	963cae780f	Fix superblock alignment problems on non-i386 platforms. Also change fs_uuid to fs_swuid, making it more descriptive. Submitted by: marcel Reviewed by: peter Pointy hat to: gordon	2003-01-09 23:53:30 +00:00
Gordon Tetlow	291871da9e	Steal some space from fs_fsmnt to create fs_volname and fs_uuid. The volname will be used to support volume names with the help of a GEOM module (to be committed). uuid will be used to deal with conflicting volume names (which doesn't work just yet). Approved by: mckusick@	2003-01-08 22:53:54 +00:00
Kirk McKusick	fa06a012cd	This patch fixes a problem caused by applications that rapidly and repeatedly truncate the same file. Each time the file is truncated, a buffer is grabbed to store the indirect block numbers that need to be freed. Those blocks cannot be freed until the inode claiming them is written to disk. Thus, the number of buffers being held by soft updates explodes and in extreme cases can run the kernel out of buffers. The problem can be avoided by doing an fsync on the file every debug.maxindirdep truncates (currently defaulted to 50). The fsync causes the inode to be written so that the held buffers can be freed. The check for excessive buffers is checked as part of the existing hook for excessive dependencies (softdep_slowdown) in the truncate code. Reported by: David Schultz <dschultz@uclink.Berkeley.EDU> Sponsored by: DARPA & NAI Labs. MFC after: 3 weeks	2003-01-07 18:23:50 +00:00
Poul-Henning Kamp	862702306b	Convert calls to BUF_STRATEGY to VOP_STRATEGY calls. This is a no-op since all BUF_STRATEGY did in the first place was call VOP_STRATEGY.	2003-01-03 06:32:15 +00:00
Jens Schweikhardt	9d5abbddbf	Correct typos, mostly s/ a / an / where appropriate. Some whitespace cleanup, especially in troff files.	2003-01-01 18:49:04 +00:00
Alfred Perlstein	13438f6823	When compiling the kernel do not implicitly include filedesc.h from proc.h, this was causing filedesc work to be very painful. In order to make this work split out sigio definitions to their own header (sigio.h) which is included from proc.h for the time being.	2003-01-01 01:56:19 +00:00
Poul-Henning Kamp	aa4d7a8a4b	Use three UMA zones for FFS/UFS inodes instead of malloc space. Since inodes are currently 144 bytes, this will save 112 bytes per inode. This can amount to up to 10MByte on large systems.	2002-12-27 11:05:05 +00:00
Poul-Henning Kamp	de6ba7c016	Move the allocation of the inode contents into ffs_vfsops.c rather than passing malloc types around.	2002-12-27 10:23:03 +00:00
Poul-Henning Kamp	975512a907	Make ffs_mountfs() static. Remove the malloctype from the ufs mount structure, instead add a callback to the storage method for freeing inodes: UFS_IFREE(). Add vfs_ifree() method function which frees an inode. Unvariablelize the malloc type used for allocating inodes.	2002-12-27 10:06:37 +00:00
Kirk McKusick	4c572f6222	Fix corruption introduced in previous delta. Reported by: Aurelien Nephtali <aurelien.nephtali@wanadoo.fr> Sponsored by: DARPA & NAI Labs.	2002-12-18 19:50:28 +00:00
Kirk McKusick	6d967351b4	Keep comments consistent with the code. Minor optimization. Sponsored by: DARPA & NAI Labs.	2002-12-18 07:19:41 +00:00
Kirk McKusick	c021e44776	Cosmetic cleanup of unsigned buglets. Submitted by: Bruce Evans <bde@zeta.org.au> Sponsored by: DARPA & NAI Labs.	2002-12-18 00:53:45 +00:00
Poul-Henning Kamp	120a6d842a	Remove unused lockcnt variable. Approved by: mckusick	2002-12-17 20:23:51 +00:00
Kirk McKusick	8efcd9a794	Update to previous change (1.54) to use an appropriately wide inode field so as to work correctly on 64-bit platforms. Reported-by: Jake Burkholder <jake@locore.ca> Sponsored by: DARPA & NAI Labs. Approved by: Ian Dowse <iedowse@maths.tcd.ie>	2002-12-15 19:25:59 +00:00
Kirk McKusick	0db138a6b0	Only the most recent snapshot contains the complete list of blocks that were copied in all of the earlier snapshots, thus its precomputed list must be used in the copyonwrite test. Using incomplete lists may lead to deadlock. Also do not include the blocks used for the indirect pointers in the indirect pointers as this may lead to inconsistent snapshots. Sponsored by: DARPA & NAI Labs. Approved by: re	2002-12-14 01:36:59 +00:00
Tom Rhodes	1626155b82	Remove the comment about dump(8) not working properly with snapshots. Discussed with: mckusick Approved by: re (rwatson)	2002-12-12 00:31:45 +00:00
Kirk McKusick	8d6754f289	More tightly verify the preference returned for the new inode. Submitted by: Kris Kennaway <kris@obsecurity.org> Sponsored by: DARPA & NAI Labs. Approved by: re	2002-12-06 02:08:46 +00:00
Kirk McKusick	0cb652d925	Have to use bread() rather than UFS_BALLOC() when obtaining a previously allocated block as the previous use of the block may have fallen out of the cache. Failure to reread its contents causes zeroed results to be written instead of the proper contents. Conversely, when the block is going to be entirely filled in, it is not necessary to reread the old contents. Sponsored by: DARPA & NAI Labs. Approved by: re	2002-12-03 18:19:27 +00:00
Kirk McKusick	31574422a3	Add a check to disable the previous patch so that future filesystems that choose to place their superblocks in non-standard locations will not get them smashed. Sponsored by: DARPA & NAI Labs.	2002-11-30 19:04:57 +00:00
Kirk McKusick	c6964d3bc9	Remove a race condition / deadlock from snapshots. When converting from individual vnode locks to the snapshot lock, be sure to pass any waiting processes along to the new lock as well. This transfer is done by a new function in the lock manager, transferlockers(from_lock, to_lock); Thanks to Lamont Granquist <lamont@scriptkiddie.org> for his help in pounding on snapshots beyond all reason and finding this deadlock. Sponsored by: DARPA & NAI Labs.	2002-11-30 19:00:51 +00:00
Kirk McKusick	63cf5b0ee2	Fix two deadlocks in snapshots: 1) Release the snapshot file lock while suspending the system. Otherwise a process trying to read the lock may block on its containing directory preventing the suspension from completing. Thanks to Sean Kelly <smkelly@zombie.org> for finding this deadlock. 2) Replace some bdwrite's with bawrite's so as not to fill all the buffers with dirty data. The buffers could not be cleaned as the snapshot vnode was locked hence the system could deadlock when making snapshots of really massive filesystems. Thanks to Hidetoshi Shimokawa <simokawa@sat.t.u-tokyo.ac.jp> for figuring this out. Sponsored by: DARPA & NAI Labs.	2002-11-30 07:27:12 +00:00
Kirk McKusick	fa5d33e242	Check to make sure that the fs_sblockloc field was properly updated before using it to write the superblock. This is to guard against accidentally trashing the disklabel if the superblock format missed being upgraded by the new kernel. Reported by: Sam Leffler <sam@errno.com> Sponsored by: DARPA & NAI Labs. Approved by: Murray Stokely <murray@FreeBSD.org>	2002-11-29 19:20:15 +00:00
Kirk McKusick	ada981b228	Create a new 32-bit fs_flags word in the superblock. Add code to move the old 8-bit fs_old_flags to the new location the first time that the filesystem is mounted by a new kernel. One of the unused flags in fs_old_flags is used to indicate that the flags have been moved. Leave the fs_old_flags word intact so that it will work properly if used on an old kernel. Change the fs_sblockloc superblock location field to be in units of bytes instead of in units of filesystem fragments. The old units did not work properly when the fragment size exceeded the superblock size (8192). Update old fs_sblockloc values at the same time that the flags are moved. Suggested by: BOUWSMA Barry <freebsd-misuser@netscum.dyndns.dk> Sponsored by: DARPA & NAI Labs.	2002-11-27 02:18:58 +00:00
Kirk McKusick	f5235f70a4	The target for the maximum number of dependencies has been cut in half because of reports that under heavy load the kernel could exhaust its memory pool. The limit is now (desiredvnodes * 4) rather than (desiredvnodes * 8), so it will still scale with larger systems, just not as quickly. Sponsored by: DARPA & NAI Labs.	2002-11-20 05:16:11 +00:00
Kirk McKusick	3374bb5ad6	If an error occurs while writing a buffer, then the data will not have hit the disk and the dependencies cannot be unrolled. In this case, the system will mark the buffer as dirty again so that the write can be retried in the future. When the write succeeds or the system gives up on the buffer and marks it as invalid (B_INVAL), the dependencies will be cleared. Sponsored by: DARPA & NAI Labs.	2002-11-20 05:14:16 +00:00
Peter Wemm	cdf5e9ccb6	Do not assume that time_t is an int. Approved by: re (jhb)	2002-11-15 22:36:57 +00:00
Robert Watson	763bbd2f4f	Slightly change the semantics of vnode labels for MAC: rather than "refreshing" the label on the vnode before use, just get the label right from inception. For single-label file systems, set the label in the generic VFS getnewvnode() code; for multi-label file systems, leave the labeling up to the file system. With UFS1/2, this means reading the extended attribute during vfs_vget() as the inode is pulled off disk, rather than hitting the extended attributes frequently during operations later, improving performance. This also corrects semantics for shared vnode locks, which were not previously present in the system. This changes the cache coherency properties WRT out-of-band access to label data, but in an acceptable form. With UFS1, there is a small race condition during automatic extended attribute start -- this is not present with UFS2, and occurs because EAs aren't available at vnode inception. We'll introduce a workaround for this shortly. Approved by: re Obtained from: TrustedBSD Project Sponsored by: DARPA, Network Associates Laboratories	2002-10-26 14:38:24 +00:00
Kirk McKusick	9ab73fd11a	Within ufs, the ffs_sync and ffs_fsync functions did not always check for and/or report I/O errors. The result is that a VFS_SYNC or VOP_FSYNC called with MNT_WAIT could loop infinitely on ufs in the presence of a hard error writing a disk sector or in a filesystem full condition. This patch ensures that I/O errors will always be checked and returned. This patch also ensures that every call to VFS_SYNC or VOP_FSYNC with MNT_WAIT set checks for and takes appropriate action when an error is returned. Sponsored by: DARPA & NAI Labs.	2002-10-25 00:20:37 +00:00
Kirk McKusick	c0762674c9	We must be careful to avoid recursive copy-on-write faults when trying to clean up during disk-full scenarios. Sponsored by: DARPA & NAI Labs.	2002-10-23 21:47:02 +00:00
Kirk McKusick	2eff16f057	Misplaced FREE_LOCK causes a panic when hit while taking a snapshot. Sponsored by: DARPA & NAI Labs.	2002-10-23 05:14:06 +00:00
Kirk McKusick	0152387ade	This update further fine tunes the locking of snapshot vnodes in the ffs_copyonwrite routine to avoid a deadlock between the syncer daemon trying to sync out a snapshot vnode and the bufdaemon trying to write out a buffer containing the snapshot inode. With any luck this will be the last snapshot race condition. Sponsored by: DARPA & NAI Labs.	2002-10-22 01:23:00 +00:00
Kirk McKusick	127ab960d5	This update is a performance improvement when allocating blocks on a full filesystem. Previously, if the allocation failed, we had to fsync the file before rolling back any partial allocation of indirect blocks. Most block allocation requests only need to allocate a single data block and if that allocation fails, there is nothing to unroll. So, before doing the fsync, we check to see if any rollback will really be necessary. If none is necessary, then we simply return. This update eliminates the flurry of disk activity that got triggered whenever a filesystem would run out of space. Sponsored by: DARPA & NAI Labs.	2002-10-22 01:14:25 +00:00
Kirk McKusick	e03486d198	This checkin reimplements the io-request priority hack in a way that works in the new threaded kernel. It was commented out of the disksort routine earlier this year for the reasons given in kern/subr_disklabel.c (which is where this code used to reside before it moved to kern/subr_disk.c): ---------------------------- revision 1.65 date: 2002/04/22 06:53:20; author: phk; state: Exp; lines: +5 -0 Comment out Kirk's io-request priority hack until we can do this in a civilized way which doesn't cause grief. The problem is that it is not generally safe to cast a "struct bio *" to a "struct buf *". Things like ccd, vinum, ata-raid and GEOM constructs bio's which are not entrails of a struct buf. Also, curthread may or may not have anything to do with the I/O request at hand. The correct solution can either be to tag struct bio's with a priority derived from the requesting threads nice and have disksort act on this field, this wouldn't address the "silly-seek syndrome" where two equal processes bang the diskheads from one edge to the other of the disk repeatedly. Alternatively, and probably better: a sleep should be introduced either at the time the I/O is requested or at the time it is completed where we can be sure to sleep in the right thread. The sleep also needs to be in constant timeunits, 1/hz can be practically any sub-second size, at high HZ the current code practically doesn't do anything. ---------------------------- As suggested in this comment, it is no longer located in the disk sort routine, but rather now resides in spec_strategy where the disk operations are being queued by the thread that is associated with the process that is really requesting the I/O. At that point, the disk queues are not visible, so the I/O for positively niced processes is always slowed down whether or not there is other activity on the disk. On the issue of scaling HZ, I believe that the current scheme is better than using a fixed quantum of time. As machines and I/O subsystems get faster, the resolution on the clock also rises. So, ten years from now we will be slowing things down for shorter periods of time, but the proportional effect on the system will be about the same as it is today. So, I view this as a feature rather than a drawback. Hence this patch sticks with using HZ. Sponsored by: DARPA & NAI Labs. Reviewed by: Poul-Henning Kamp <phk@critter.freebsd.dk>	2002-10-22 00:59:49 +00:00
Matthew Dillon	1b7e3dafdf	Fix a file-rewrite performance case for UFS[2]. When rewriting portions of a file in chunks that are less than the filesystem block size, if the data is not already cached the system will perform a read-before-write. The problem is that it does this on a block-by-block basis, breaking up the I/Os and making clustering impossible for the writes. Programs such as INN using cyclic file buffers suffer greatly. This problem is only going to get worse as we use larger and larger filesystem block sizes. The solution is to extend the sequential heuristic so UFS[2] can perform a far larger read and readahead when dealing with this case. (note: maximum disk write bandwidth is 27MB/sec thru filesystem) (note: filesystem blocksize in test is 8K (1K frag)) dd if=/dev/zero of=test.dat bs=1k count=2m conv=notrunc Before: (note half of these are reads) tty da0 da1 acd0 cpu tin tout KB/t tps MB/s KB/t tps MB/s KB/t tps MB/s us ni sy in id 0 76 14.21 598 8.30 0.00 0 0.00 0.00 0 0.00 0 0 7 1 92 0 76 14.09 813 11.19 0.00 0 0.00 0.00 0 0.00 0 0 9 5 86 0 76 14.28 821 11.45 0.00 0 0.00 0.00 0 0.00 0 0 8 1 91 After: (note half of these are reads) tty da0 da1 acd0 cpu tin tout KB/t tps MB/s KB/t tps MB/s KB/t tps MB/s us ni sy in id 0 76 63.62 434 26.99 0.00 0 0.00 0.00 0 0.00 0 0 18 1 80 0 76 63.58 424 26.30 0.00 0 0.00 0.00 0 0.00 0 0 17 2 82 0 76 63.82 438 27.32 0.00 0 0.00 0.00 0 0.00 1 0 19 2 79 Reviewed by: mckusick Approved by: re X-MFC after: immediately (was heavily tested in -stable for 4 months)	2002-10-18 22:52:41 +00:00
Kirk McKusick	86aeb27fa2	Change locking so that all snapshots on a particular filesystem share a common lock. This change avoids a deadlock between snapshots when separate requests cause them to deadlock checking each other for a need to copy blocks that are close enough together that they fall into the same indirect block. Although I had anticipated a slowdown from contention for the single lock, my filesystem benchmarks show no measurable change in throughput on a uniprocessor system with three active snapshots. I conjecture that this result is because every copy-on-write fault must check all the active snapshots, so the process was inherently serial already. This change removes the last of the deadlocks of which I am aware in snapshots. Sponsored by: DARPA & NAI Labs.	2002-10-16 00:19:23 +00:00
Robert Watson	80830407c6	If the FS_MULTILABEL flag is set in a UFS or UFS2 superblock, automatically set MNT_MULTILABEL in the mount flags. If FS_ACLS is set in a UFS or UFS2 superblock, automatically set MNT_ACLS in the mount flags. If either of these flags is set, but the appropriate kernel option to support the features associated with the flag isn't available, then print a warning at mount-time. Obtained from: TrustedBSD Project Sponsored by: DARPA, Network Associates Laboratories	2002-10-15 20:00:06 +00:00
Kirk McKusick	48f0495d85	When reading or writing the extended attributes of a special device or fifo in UFS2, the normal ufs_strategy routine needs to be used rather than the spec_strategy or fifo_strategy routine. Thus the ffsext_strategy routine is interposed in the ffs_vnops vectors for special devices and fifo's to pick off this special case. Otherwise it simply falls through to the usual spec_strategy or fifo_strategy routine. Submitted by: Robert Watson <rwatson@FreeBSD.org> Sponsored by: DARPA & NAI Labs.	2002-10-14 23:18:09 +00:00
Robert Watson	3ceef565b2	Define two new superblock file system flags: FS_ACLS Administrative enable/disable of extended ACL support FS_MULTILABEL Administrative flag to indicate to the MAC Framework that objects in the file system are individually labeled using extended attributes. Obtained from: TrustedBSD Project Sponsored by: DARPA, Network Associates Laboratories Reviewed by: (in principle) mckusick, phk	2002-10-14 17:07:11 +00:00
Kirk McKusick	a5b65058d5	Regularize the vop_stdlock'ing protocol across all the filesystems that use it. Specifically, vop_stdlock uses the lock pointed to by vp->v_vnlock. By default, getnewvnode sets up vp->v_vnlock to reference vp->v_lock. Filesystems that wish to use the default do not need to allocate a lock at the front of their node structure (as some still did) or do a lockinit. They can simply start using vn_lock/VOP_UNLOCK. Filesystems that wish to manage their own locks, but still use the vop_stdlock functions (such as nullfs) can simply replace vp->v_vnlock with a pointer to the lock that they wish to have used for the vnode. Such filesystems are responsible for setting the vp->v_vnlock back to the default in their vop_reclaim routine (e.g., vp->v_vnlock = &vp->v_lock). In theory, this set of changes cleans up the existing filesystem lock interface and should have no function change to the existing locking scheme. Sponsored by: DARPA & NAI Labs.	2002-10-14 03:20:36 +00:00
Maxime Henrion	cba63e0291	Fix build of 64 bit platforms.	2002-10-09 12:19:36 +00:00
Kirk McKusick	4d533db182	When creating a snapshot, create a list of initially allocated blocks. Whenever doing a copy-on-write check, first look in the list of initially allocated blocks to see if it is there. If so, no further check is needed. If not, fall through and do the full check. This change eliminates one of two known deadlocks caused by snapshots. Handling the second deadlock will be the subject of another check-in. This change also reduces the cost of the copy-on-write check by speeding up the verification of frequently checked blocks. Sponsored by: DARPA & NAI Labs.	2002-10-09 06:13:48 +00:00
Kirk McKusick	b6cef5648d	The appropriate units for disk block addresses are always DEV_BSIZE, even when the underlying device has a larger sector size. Therefore, the filesystem code should not (and with this patch does not) try to use the underlying sector size when doing disk block address calculations. This patch fixes problems in -current when using the swap-based memory-disk device (mdconfig -a -t swap ...). This bugfix is not relevant to -stable as -stable does not have the memory-disk device. Sponsored by: DARPA & NAI Labs.	2002-10-09 04:01:23 +00:00
Jeff Roberson	a2c4ff970b	- Remove LK_INTERLOCK from the vn_lock() in ffs_snapshot(). Pointy hat to: me Found by: green	2002-10-08 21:00:52 +00:00
Dima Dorfman	85bba62925	size_t is not a struct (fix mislabelling in a comment).	2002-10-02 05:15:34 +00:00
Juli Mallett	85de3147ea	When spamming me with a printf(9), under DIAGNOSTIC, at least be nice enough to include a newline. MFC after: 4 days Sponsored by: Bright Path Solutions	2002-09-28 19:04:49 +00:00
Poul-Henning Kamp	37c841831f	Be consistent about "static" functions: if the function is marked static in its prototype, mark it static at the definition too. Inspired by: FlexeLint warning #512	2002-09-28 17:15:38 +00:00
Poul-Henning Kamp	993b0567b2	Use our mount-credential if we get a NOCRED when we try to write out EA space back to disk. This is wrong in many ways, but not as wrong as a panic. Panicked on: rwatson & jmallet Sponsored by: DARPA & NAI Labs.	2002-09-27 20:00:03 +00:00
Jeff Roberson	2ee5711e84	- Convert locks to use standard macros. - Lock access to the buflists. - Document broken locking. - Use vrefcnt().	2002-09-25 02:49:48 +00:00
Jeff Roberson	6ef1763407	- Document broken locking. - Use vrefcnt().	2002-09-25 02:47:49 +00:00
Poul-Henning Kamp	cf09d67418	We don't need to #include <sys/disklabel.h>. We don't need to #include <sys/disklabel.h> second time either. Sponsored by: DARPA & NAI Labs.	2002-09-20 16:42:33 +00:00
David E. O'Brien	47a561263d	intmax_t is printed with %jd, not %lld.	2002-09-19 03:55:30 +00:00
Nate Lawson	06be2aaa83	Remove all use of vnode->v_tag, replacing with appropriate substitutes. v_tag is now const char * and should only be used for debugging. Additionally: 1. All users of VT_NTS now check vfsconf->vf_type VFCF_NETWORK 2. The user of VT_PROCFS now checks for the new flag VV_PROCDEP, which is propagated by pseudofs to all child vnodes if the fs sets PFS_PROCDEP. Suggested by: phk Reviewed by: bde, rwatson (earlier version)	2002-09-14 09:02:28 +00:00
Poul-Henning Kamp	0e168822b2	Implement the VOP_OPENEXTATTR() and VOP_CLOSEEXTATTR() methods. Use extattr_check_cred() to check access to EAs. This is still a WIP. Sponsored by: DARPA & NAI Labs.	2002-09-05 20:59:42 +00:00
Bruce Evans	8f767abf71	Include <sys/malloc.h> instead of depending on namespace pollution 2 layers deep in <sys/proc.h> or <sys/vnode.h>. Include <sys/vmmeter.h> instead of depending on namespace pollution in <sys/pcpu.h>. Sorted includes as much as possible.	2002-09-05 09:43:24 +00:00
Poul-Henning Kamp	d0e9b8dbc4	Correctly handle setting, getting and deleting EA's with zero length content. Sponsored by: DARPA & NAI Labs.	2002-08-30 08:57:09 +00:00
Alan Cox	fff6062ab6	o Retire vm_page_zero_fill() and vm_page_zero_fill_area(). Ever since pmap_zero_page() and pmap_zero_page_area() were modified to accept a struct vm_page * instead of a physical address, vm_page_zero_fill() and vm_page_zero_fill_area() have served no purpose.	2002-08-25 00:22:31 +00:00
Poul-Henning Kamp	7428de69d2	Implement list of EA return functionality. Correctly delete EA's when the content length is set to zero. Sponsored by: DARPA & NAI Labs.	2002-08-20 11:34:58 +00:00
Poul-Henning Kamp	0176455bc8	First snapshot of UFS2 EA support. Sponsored by: DARPA & NAI Labs.	2002-08-19 07:01:55 +00:00
Poul-Henning Kamp	18280bc653	Expand the arguments to ffs_ext{read,write}() to their component parts rather than use vop_{read,write}_args. Access to these functions will ultimately not be available through the "vop_{read,write}+IO_EXT" API but this functionality is retained for debugging purposes for now. Sponsored by: DARPA & NAI Labs.	2002-08-13 11:33:01 +00:00
Poul-Henning Kamp	d6fe88e475	Unravel the UFS_EXTATTR incest between FFS and UFS: UFS_EXTATTR is an UFS only thing, and FFS should in principle not know if it is enabled or not. This commit cleans ffs_vnops.c for such knowledge, but not ffs_vfsops.c Sponsored by: DARPA and NAI Labs.	2002-08-13 10:33:57 +00:00
Poul-Henning Kamp	9bf1a75697	Introduce typedefs for the member functions of struct vfsops and employ these in the main filesystems. This does not change the resulting code but makes the source a little bit more grepable. Sponsored by: DARPA and NAI Labs.	2002-08-13 10:05:50 +00:00
Poul-Henning Kamp	e179b40f14	Stop pretending that the FFS file ufs_readwrite.c is a UFS file. Instead of #including it, pull it into ffs_vnops.c and name things correctly. Sponsored by: DARPA & NAI Labs.	2002-08-12 10:32:56 +00:00
Ian Dowse	98caa2e4e9	Don't call softdep_slowdown() if soft updates are not active on the filesystem. This causes a panic for kernels compiled without softupdates. Reported by: luigi	2002-08-05 17:59:20 +00:00
Jeff Roberson	e6e370a7fe	- Replace v_flag with v_iflag and v_vflag - v_vflag is protected by the vnode lock and is used when synchronization with VOP calls is needed. - v_iflag is protected by interlock and is used for dealing with vnode management issues. These flags include X/O LOCK, FREE, DOOMED, etc. - All accesses to v_iflag and v_vflag have either been locked or marked with mp_fixme's. - Many ASSERT_VOP_LOCKED calls have been added where the locking was not clear. - Many functions in vfs_subr.c were restructured to provide for stronger locking. Idea stolen from: BSD/OS	2002-08-04 10:29:36 +00:00
Poul-Henning Kamp	c3a0d1d4e1	I forgot this bit of ugliness in the fsck_ffs cleanup.	2002-07-31 07:01:18 +00:00
Poul-Henning Kamp	9fbc6a330d	Fix braino in last commit.	2002-07-30 12:02:41 +00:00
Poul-Henning Kamp	17b1994bbe	Move ffs_isfreeblock() to ffs_alloc.c and make it static. Sponsored by: DARPA & NAI Labs.	2002-07-30 11:54:48 +00:00
Benno Rice	683eac8dbb	Add a missing argument to the stub for softdep_setup_freeblocks. Forgotten by: mckusick	2002-07-20 04:07:15 +00:00
Peter Wemm	382f95d332	Fix a warning: ffs_softdep.c:1630: warning: int format, different type arg (arg 2)	2002-07-20 01:09:35 +00:00
Kirk McKusick	7aca6291e3	Add support to UFS2 to provide storage for extended attributes. As this code is not actually used by any of the existing interfaces, it seems unlikely to break anything (famous last words). The internal kernel interface to manipulate these attributes is invoked using two new IO_ flags: IO_NORMAL and IO_EXT. These flags may be specified in the ioflags word of VOP_READ, VOP_WRITE, and VOP_TRUNCATE. Specifying IO_NORMAL means that you want to do I/O to the normal data part of the file and IO_EXT means that you want to do I/O to the extended attributes part of the file. IO_NORMAL and IO_EXT are mutually exclusive for VOP_READ and VOP_WRITE, but may be specified individually or together in the case of VOP_TRUNCATE. For example, when removing a file, VOP_TRUNCATE is called with both IO_NORMAL and IO_EXT set. For backward compatibility, if neither IO_NORMAL nor IO_EXT is set, then IO_NORMAL is assumed. Note that the BA_ and IO_ flags have been `merged' so that they may both be used in the same flags word. This merger is possible by assigning the IO_ flags to the low sixteen bits and the BA_ flags the high sixteen bits. This works because the high sixteen bits of the IO_ word is reserved for read-ahead and help with write clustering so will never be used for flags. This merge lets us get away from code of the form: if (ioflags & IO_SYNC) flags |= BA_SYNC; For the future, I have considered adding a new field to the vattr structure, va_extsize. This addition could then be exported through the stat structure to allow applications to find out the size of the extended attribute storage and also would provide a more standard interface for truncating them (via VOP_SETATTR rather than VOP_TRUNCATE). I am also contemplating adding a pathconf parameter (for concreteness, let's call it _PC_MAX_EXTSIZE) which would let an application determine the maximum size of the extended attribute storage. Sponsored by: DARPA & NAI Labs.	2002-07-19 07:29:39 +00:00
Kirk McKusick	faab4e2722	Change the name of st_createtime to st_birthtime. This change is made to reduce confusion between st_ctime and st_createtime. Submitted by: Eric Allman <eric@sendmail.org> Sponsored by: DARPA & NAI Labs.	2002-07-16 22:36:00 +00:00
Tom Rhodes	ae76f60046	Fix a typo: s/your are/you are/	2002-07-12 19:56:31 +00:00
Bruce Evans	2daf9dc825	Fixed some printf format errors (4 new ones reported by gcc and 5 nearby old ones not reported by gcc). This helps unbreak LINT.	2002-07-08 12:42:29 +00:00
Ian Dowse	6bd521df93	Use indirect function pointer hooks instead of #ifdef SOFTUPDATES direct calls for the two places where the kernel calls into soft updates code. Set up the hooks in softdep_initialize() and NULL them out in softdep_uninitialize(). This change allows soft updates to function correctly when ufs is loaded as a module. Reviewed by: mckusick	2002-07-01 17:59:40 +00:00
Ian Dowse	5346934fe7	Add the ffs bits necessary to support unloading of the ufs kernel module. This adds an ffs_uninit() function that calls ufs_uninit() and also calls a new softdep_uninitialize() function. Add a stub for softdep_uninitialize() to cover the non-SOFTUPDATES case. Reviewed by: mckusick	2002-07-01 11:00:47 +00:00
Ian Dowse	8f42fb8fc9	Remove the kernel file-size limit for UFS2, so that only the limit imposed by the filesystem structure itself remains. With 16k blocks, the maximum file size is now just over 128TB. For now, the UFS1 file size limit is left unchanged so as to remain consistent with RELENG_4, but it too could be removed in the future. Reviewed by: mckusick	2002-06-26 18:34:51 +00:00
Jonathan Lemon	c86c4abf99	Prototype fixes (long newinum --> ino_t newinum).	2002-06-24 17:20:19 +00:00
Maxime Henrion	cfbf0a4678	Warning fixes for 64 bits platforms. This eliminates all the warnings I have had in the FFS code on sparc64. Reviewed by: mckusick	2002-06-23 18:17:27 +00:00
Matthew Dillon	10cfbc1978	Rename the BALLOC flags from B_* to BA_* to avoid confusion with the struct buf B_ flags. Approved by: mckusick	2002-06-23 06:12:22 +00:00
Kirk McKusick	5006e77609	This patch fixes a problem whereby filesystems that ran out of inodes in a cylinder group would fail to check for free inodes in other cylinder groups. This bug was introduced in the UFS2 code merge two days ago. An inode is allocated by calling ffs_valloc which calls ffs_hashalloc to do the filesystem scan. Ffs_hashalloc walks around the cylinder groups calling its passed allocator (ffs_nodealloccg in this case) until the allocator returns a non-zero result. The bug is that ffs_hashalloc expects the passed allocator function to return a 64-bit ufs2_daddr_t. When allocating inodes, it calls ffs_nodealloccg which was returning a 32-bit ino_t. The ffs_hashalloc code checked a 64-bit return value and usually found random non-zero bits in the high 32-bits so decided that the allocation had succeeded (in this case in the only cylinder group that it checked). When the result was passed back to ffs_valloc it looked at only the bottom 32-bits, saw zero and declared the system out of inodes. But ffs_hashalloc had really only checked one cylinder group. The fix is to change ffs_nodealloccg to return 64-bit results. Sponsored by: DARPA & NAI Labs. Submitted by: Poul-Henning Kamp <phk@critter.freebsd.dk> Reviewed by: Maxime Henrion <mux@freebsd.org>	2002-06-22 21:24:58 +00:00
Kirk McKusick	1c85e6a35d	This commit adds basic support for the UFS2 filesystem. The UFS2 filesystem expands the inode to 256 bytes to make space for 64-bit block pointers. It also adds a file-creation time field, an ability to use jumbo blocks per inode to allow extent like pointer density, and space for extended attributes (up to twice the filesystem block size worth of attributes, e.g., on a 16K filesystem, there is space for 32K of attributes). UFS2 fully supports and runs existing UFS1 filesystems. New filesystems built using newfs can be built in either UFS1 or UFS2 format using the -O option. In this commit UFS1 is the default format, so if you want to build UFS2 format filesystems, you must specify -O 2. This default will be changed to UFS2 when UFS2 proves itself to be stable. In this commit the boot code for reading UFS2 filesystems is not compiled (see /sys/boot/common/ufsread.c) as there is insufficient space in the boot block. Once the size of the boot block is increased, this code can be defined. Things to note: the definition of SBSIZE has changed to SBLOCKSIZE. The header file <ufs/ufs/dinode.h> must be included before <ufs/ffs/fs.h> so as to get the definitions of ufs2_daddr_t and ufs_lbn_t. Still TODO: Verify that the first level bootstraps work for all the architectures. Convert the utility ffsinfo to understand UFS2 and test growfs. Add support for the extended attribute storage. Update soft updates to ensure integrity of extended attribute storage. Switch the current extended attribute interfaces to use the extended attribute storage. Add the extent like functionality (framework is there, but is currently never used). Sponsored by: DARPA & NAI Labs. Reviewed by: Poul-Henning Kamp <phk@freebsd.org>	2002-06-21 06:18:05 +00:00
Semen Ustimenko	13866b3fd2	Fix a typo in my recently added comment: s/beleived/believed/ Submitted by: keramida	2002-06-06 20:43:03 +00:00
Semen Ustimenko	f576a00d1b	Remove lock from ffs_vget introduced by v1.24. Instead of locking the vnode creation globally, we allow processes to create vnodes concurrently. In case of concurrent creation of a vnode for the same ino, we allow processes to race and then check who wins. Assuming that concurrent creation of a vnode for the same ino is a really rare case, this is believed to be an improvement, as it just allows concurrent creation of vnodes. Idea by: bp Reviewed by: dillon MFC after: 1 month	2002-05-30 22:04:17 +00:00
Ian Dowse	00b162d018	Remove um_i_effnlink_valid, i_spare[] and the ufsmount_u and inode_u unions, since these were only necessary when ext2fs used ufs code. Reviewed by: mckusick	2002-05-18 18:51:14 +00:00
Poul-Henning Kamp	8fdbc99b69	Fix ufs_daddr_t/daddr_t type problems. Sponsored by: DARPA & NAI labs.	2002-05-17 18:59:53 +00:00
Tom Rhodes	d394511de3	More s/file system/filesystem/g	2002-05-16 21:28:32 +00:00
Poul-Henning Kamp	05f4ff5da1	Remove register keyword. Sponsored by: DARPA & NAI Labs. Submitted by: mckusick	2002-05-13 09:22:31 +00:00
Poul-Henning Kamp	7110af7577	ARGH! SBLOCK is not unused. Try to get this right. BBSIZE belongs in <sys/disklabel.h> (but shouldn't be a constant). Define SBLOCK again, using the right math. Sponsored by: DARPA & NAI Labs.	2002-05-12 20:21:40 +00:00
Poul-Henning Kamp	7cb71b749c	Remove #define for BBOFF, it is assumed == 0 so many places that we might as well forget about it. In fact the only thing which used it was the SBOFF macro. Sponsored by: DARPA & NAI Labs.	2002-05-12 20:00:21 +00:00
Poul-Henning Kamp	16910634dd	Remove unused BBLOCK and SBLOCK #defines. Sponsored by: DARPA & NAI Labs.	2002-05-12 19:56:31 +00:00
Poul-Henning Kamp	afe564a200	Name ufs_vop_[gs]etextattr() consistently with the rest of our VOPs and put them in the ufs_vnops where they belong, rather than in the ffs_vnops. Ok'ed by: rwatson Sponsored by: DARPA & NAI Labs.	2002-05-03 08:40:33 +00:00
Jeff Roberson	5dacf95488	Don't peek into the malloc_type structure for limits. The desired vnodes check should be sufficient. This is required for the pending removal of malloc_type limits.	2002-04-15 03:35:35 +00:00
Poul-Henning Kamp	2dd527b3ac	Move generic disk ioctls from <sys/disklabel.h> to <sys/disk.h>. Sponsored by: DARPA & NAI Labs	2002-04-08 09:20:07 +00:00
John Baldwin	6008862bc2	Change callers of mtx_init() to pass in an appropriate lock type name. In most cases NULL is passed, but in some cases such as network driver locks (which use the MTX_NETWORK_LOCK macro) and UMA zone locks, a name is used. Tested on: i386, alpha, sparc64	2002-04-04 21:03:38 +00:00
Poul-Henning Kamp	a463023d6d	Move the FFS parameter MAXFRAG from <sys/param.h> to <ufs/ffs/fs.h> Sponsored by: DARPA & NAI Labs.	2002-04-03 20:39:27 +00:00
Poul-Henning Kamp	46a67eaced	Use DIOCGSECTORSIZE instead of the bogus DIOCGPART ioctl.	2002-04-02 11:23:14 +00:00
John Baldwin	44731cab3b	Change the suser() API to take advantage of td_ucred as well as do a general cleanup of the API. The entire API now consists of two functions similar to the pre-KSE API. The suser() function takes a thread pointer as its only argument. The td_ucred member of this thread must be valid so the only valid thread pointers are curthread and a few kernel threads such as thread0. The suser_cred() function takes a pointer to a struct ucred as its first argument and an integer flag as its second argument. The flag is currently only used for the PRISON_ROOT flag. Discussed on: smp@	2002-04-01 21:31:13 +00:00
Bruce Evans	0508986cce	In ffs_mountffs(), set mnt_iosize_max to si_iosize_max unconditionally provided the latter is nonzero. At this point, the former is a fairly arbitrary default value (DFLTPHYS), so changing it to any reasonable value specified by the device driver is safe. Using the maximum of these limits broke ffs clustered i/o for devices whose si_iosize_max is < DFLTPHYS. Using the minimum would break device drivers' ability to increase the active limit from DFLTPHYS up to MAXPHYS. Copied the code for this and the associated (unnecessary?) fixup of mp_iosize_max to all other filesystems that use clustering (ext2fs and msdosfs). It was completely missing. PR: 36309 MFC-after: 1 week	2002-03-30 15:12:57 +00:00
Alfred Perlstein	6f1e855112	Remove __P.	2002-03-19 22:40:48 +00:00
Bruce Evans	367b50a28f	Fixed some printf format errors (hopefully all of the remaining daddr64_t ones for GENERIC, and all others on the same line as those). Reformat the printfs if necessary to avoid new long lines or old format printf errors.	2002-03-19 04:09:21 +00:00
Kirk McKusick	a0595d0249	Add a flags parameter to VFS_VGET to pass through the desired locking flags when acquiring a vnode. The immediate purpose is to allow polling lock requests (LK_NOWAIT) needed by soft updates to avoid deadlock when enlisting other processes to help with the background cleanup. For the future it will allow the use of shared locks for read access to vnodes. This change touches a lot of files as it affects most filesystems within the system. It has been well tested on FFS, loopback, and CD-ROM filesystems, but only lightly on the others, so if you find a problem there, please let me (mckusick@mckusick.com) know.	2002-03-17 01:25:47 +00:00
Kirk McKusick	0d2af52141	Introduce the new 64-bit size disk block, daddr64_t. Change the bio and buffer structures to have daddr64_t bio_pblkno, b_blkno, and b_lblkno fields which allows access to disks larger than a Terabyte in size. This change also requires that the VOP_BMAP vnode operation accept and return daddr64_t blocks. This delta should not affect system operation in any way. It merely sets up the necessary interfaces to allow the development of disk drivers that work with these larger disk block addresses. It also allows for the development of UFS2 which will use 64-bit block addresses.	2002-03-15 18:49:47 +00:00
David E. O'Brien	f0c8652ed4	Quiet a warning on the Alpha.	2002-03-15 04:06:10 +00:00
Kirk McKusick	9721068f95	This corrects the first of two known deadlock conditions that come from the presence of a snapshot file.	2002-03-14 01:21:13 +00:00
Poul-Henning Kamp	063f776327	I missed one VOP_CLOSE in the previous commit. Pointed out by: bde	2002-03-11 16:27:04 +00:00
Poul-Henning Kamp	3dbceccb78	As a XXX bandaid open the mounted device READ/WRITE even if we only mount read-only. The trouble here is that we don't reopen the device in read/write mode when we remount in read/write mode resulting in a filesystem sending write requests to a device which was only opened read/only. I'm not quite sure how such a reopen would best be done and defer the problem to more agile hackers.	2002-03-11 13:53:00 +00:00
John Baldwin	fdcc1cc09f	Use thread0.td_ucred instead of proc0.p_ucred. This change is cosmetic and isn't strictly required. However, it lowers the number of false positives found when grep'ing the kernel sources for p_ucred to ensure proper locking.	2002-02-27 19:18:10 +00:00
John Baldwin	a854ed9893	Simple p_ucred -> td_ucred changes to start using the per-thread ucred reference.	2002-02-27 18:32:23 +00:00
Julian Elischer	2c1007663f	In a threaded world, different priorities become properties of different entities. Make it so. Reviewed by: jhb@freebsd.org (john baldwin)	2002-02-11 20:37:54 +00:00
Kirk McKusick	b06051cf7c	Occasionally background fsck would cause a spurious ``freeing free inode'' panic. This change corrects that problem by setting the fs_active flag when the inode map changes to notify the snapshot code that the cylinder group must be rescanned. Submitted by: Robert Watson <rwatson@FreeBSD.org>	2002-02-07 22:13:56 +00:00
Kirk McKusick	cfdaa88697	Occasionally deleted files would hang around for hours or days without being reclaimed. This bug was introduced in revision 1.95 dealing with filenames placed in newly allocated directory blocks, thus is not present in 4.X systems. The bug is triggered when a new entry is made in a directory after the data block containing the original new entry has been written, but before the inode that references the data block has been written. Submitted by: Bill Fenner <fenner@research.att.com>	2002-02-07 00:54:32 +00:00
Kirk McKusick	c9f96392c7	When taking a snapshot, we must check for active files that have been unlinked (e.g., with a zero link count). We have to expunge all trace of these files from the snapshot so that they are neither reclaimed prematurely by fsck nor saved unnecessarily by dump.	2002-02-02 01:42:44 +00:00
Kirk McKusick	7b60855308	Add a stub for softdep_request_cleanup() so that compilation without SOFTUPDATES option works properly. Submitted by: Benno Rice <benno@jeamland.net>	2002-01-23 02:18:56 +00:00
Kirk McKusick	03a2057a5b	This patch fixes a long standing complaint with soft updates in which small and/or nearly full filesystems would fail with `file system full' messages when trying to replace a number of existing files (for example during a system installation). When the allocation routines are about to fail with a file system full condition, they make a call to softdep_request_cleanup() which attempts to accelerate the flushing of pending deletion requests in an effort to free up space. In the face of filesystem I/O requests that exceed the available disk transfer capacity, the cleanup request could take an unbounded amount of time. Thus, the softdep_request_cleanup() routine will only try for tickdelay seconds (default 2 seconds) before giving up and returning a filesystem full error. Under typical conditions, the softdep_request_cleanup() routine is able to free up space in under fifty milliseconds.	2002-01-22 06:17:22 +00:00
Kirk McKusick	99bef8782b	Fix a bug introduced in ffs_snapshot.c -r1.25 and fs.h -r1.26 which caused incomplete snapshots to be taken. When background fsck would run on these snapshots, the result would be files being incorrectly released which would subsequently panic the kernel with ``handle_workitem_freefile: inodedep survived'', ``handle_written_inodeblock: live inodedep'', and ``handle_workitem_remove: lost inodedep'' errors.	2002-01-17 08:33:32 +00:00
Kirk McKusick	8af31e7b46	Put write on read-only filesystem panic after we have weeded out block and character devices, fifo's, etc. Submitted by: Bruce Evans <bde@zeta.org.au>	2002-01-16 04:59:09 +00:00
Kirk McKusick	cd6005961f	When downgrading a filesystem from read-write to read-only, operations involving file removal or file update were not always being fully committed to disk. The result was lost files or corrupted file data. This change ensures that the filesystem is properly synced to disk before the filesystem is down-graded. This delta also fixes a long standing bug in which a file open for reading has been unlinked. When the last open reference to the file is closed, the inode is reclaimed by the filesystem. Previously, if the filesystem had been down-graded to read-only, the inode could not be reclaimed, and thus was lost and had to be later recovered by fsck. With this change, such files are found at the time of the down-grade. Normally they will result in the filesystem down-grade failing with `device busy'. If a forcible down-grade is done, then the affected files will be revoked causing the inode to be released and the open file descriptors to begin failing on attempts to read. Submitted by: "Sam Leffler" <sam@errno.com>	2002-01-15 07:17:12 +00:00
Alfred Perlstein	426da3bcfb	SMP Lock struct file, filedesc and the global file list. Seigo Tanimura (tanimura) posted the initial delta. I've polished it quite a bit reducing the need for locking and adapting it for KSE. Locks: 1 mutex in each filedesc protects all the fields. protects "struct file" initialization, while a struct file is being changed from &badfileops -> &pipeops or something the filedesc should be locked. 1 mutex in each struct file protects the refcount fields. doesn't protect anything else. the flags used for garbage collection have been moved to f_gcflag which was the FILLER short, this doesn't need locking because the garbage collection is a single threaded container. could likely be made to use a pool mutex. 1 sx lock for the global filelist. struct file *fhold(struct file *fp); /* increments reference count on a file */ struct file *fhold_locked(struct file *fp); /* like fhold but expects file to be locked */ struct file *ffind_hold(struct thread *, int fd); /* finds the struct file in thread, adds one reference and returns it unlocked */ struct file *ffind_lock(struct thread *, int fd); /* ffind_hold, but returns file locked */ I still have to smp-safe the fget cruft, I'll get to that asap.	2002-01-13 11:58:06 +00:00
Kirk McKusick	0bc7a833ec	When going to sleep, we must save our SPL so that it does not get lost if some other process uses the lock while we are sleeping. We restore it after we have slept. This functionality is provided by a new routine interlocked_sleep() that wraps the interlocking with functions that sleep. This function is then used in place of the old ACQUIRE_LOCK_INTERLOCKED() and FREE_LOCK_INTERLOCKED() macros. Submitted by: Debbie Chu <dchu@juniper.net>	2002-01-12 20:57:36 +00:00
Kirk McKusick	794ef3471f	Must call drain_output() before checking the dirty block list in softdep_sync_metadata(). Otherwise we may miss dependencies that need to be flushed which will result in a later panic with the message ``vinvalbuf: dirty bufs''. Submitted by: Matthew Dillon <dillon@apollo.backplane.com> MFC after: 1 week	2002-01-11 19:59:27 +00:00
Mike Smith	b9a4338d29	Initialise the bioops vector hack at runtime rather than at link time. This avoids the use of common variables. Reviewed by: mckusick	2002-01-08 19:32:18 +00:00
Matthew Dillon	23b590188f	Fix a BUF_TIMELOCK race against BUF_LOCK and fix a deadlock in vget() against VM_WAIT in the pageout code. Both fixes involve adjusting the lockmgr's timeout capability so locks obtained with timeouts do not interfere with locks obtained without a timeout. Hopefully MFC: before the 4.5 release	2001-12-20 22:42:27 +00:00
Kirk McKusick	f305c5d199	Change the atomic_set_char to atomic_set_int and atomic_clear_char to atomic_clear_int to ease the implementation for the sparc64. Requested by: Jake Burkholder <jake@locore.ca>	2001-12-18 18:05:17 +00:00
Ian Dowse	143a5346c9	Make sure we ignore the value of `fs_active' when reloading the superblock, and move the initialisation of it to beside where other pointer fields are initialised.	2001-12-16 18:54:09 +00:00
Ian Dowse	3fa4044e34	Move the new superblock field `fs_active' into the region of the superblock that is already set up to handle pointer types. This fixes an accidental change in the superblock size on 64-bit platforms caused by revision 1.24.	2001-12-16 18:51:11 +00:00
Kirk McKusick	cc5a92334f	Minimize the time necessary to suspend operations on a filesystem when taking a snapshot. The two time consuming operations are scanning all the filesystem bitmaps to determine which blocks are in use and scanning all the other snapshots so as to be able to expunge their blocks from the view of the current snapshot. The bitmap scanning is broken into two passes. Before suspending the filesystem all bitmaps are scanned. After the suspension, those bitmaps that changed after being scanned the first time are rescanned. Typically there are few bitmaps that need to be rescanned. The expunging of other snapshots is now done after the suspension is released by observing that we can easily identify any blocks that were allocated to them after the suspension (they will be marked as `not needing to be copied' in the just created snapshot). For all the gory details, see the ``Running fsck in the Background'' paper in the Usenix BSDCon 2002 Conference Proceedings, pages 55-64.	2001-12-14 00:15:06 +00:00
Kirk McKusick	9db12e5108	When a file is partially truncated, we first check to see if the new file end will land in the middle of a file hole. Since the last block of a file must always be allocated, the hole is filled by allocating a block at that location. If the hole being filled is a direct block, then the truncation may eventually reduce the full sized block down to a fragment. When running with soft updates, it is necessary to FSYNC the file after allocating the block and before creating the fragment to avoid triggering a soft updates inconsistency when the block unexpectedly shrinks. Found by: Matthew Dillon <dillon@apollo.backplane.com> MFC after: 1 week	2001-12-13 05:07:48 +00:00
Matthew Dillon	245df27cee	Implement kern.maxvnodes. adjusting kern.maxvnodes now actually has a real effect. Optimize vfs_msync(). Avoid having to continually drop and re-obtain mutexes when scanning the vnode list. Improves looping case by 500%. Optimize ffs_sync(). Avoid having to continually drop and re-obtain mutexes when scanning the vnode list. This makes a couple of assumptions, which I believe are ok, in regards to vnode stability when the mount list mutex is held. Improves looping case by 500%. (more optimization work is needed on top of these fixes) MFC after: 1 week	2001-10-26 00:08:05 +00:00
Matthew Dillon	c72ccd014d	Change the vnode list under the mount point from a LIST to a TAILQ in preparation for an implementation of limiting code for kern.maxvnodes. MFC after: 3 days	2001-10-23 01:21:29 +00:00
Robert Watson	ab66aa1468	o Replace two direct uid!=0 comparisons with suser_xxx() calls. Obtained from: TrustedBSD Project	2001-10-02 14:41:43 +00:00
Robert Watson	b73d2870cd	o Replace two direct uid!=0 comparisons with suser_td() calls. Obtained from: TrustedBSD Project	2001-10-02 14:34:22 +00:00
John Baldwin	eb46fac565	- Fix some minor whitespace nits. - Move the SPECIAL_FLAG #define up next to the NOHOLDER #define and fix a little nit that caused it to be defined as -(sizeof (struct thread) + 1) instead of -2.	2001-09-27 21:04:13 +00:00
Julian Elischer	b40ce4165d	KSE Milestone 2 Note ALL MODULES MUST BE RECOMPILED make the kernel aware that there are smaller units of scheduling than the process. (but only allow one thread per process at this time). This is functionally equivalent to the previous -current except that there is a thread associated with each process. Sorry john! (your next MFC will be a doosie!) Reviewed by: peter@freebsd.org, dillon@freebsd.org X-MFC after: ha ha ha ha	2001-09-12 08:38:13 +00:00
Ian Dowse	4691e9ead0	The "dirpref" directory layout preference improvements make use of an array "fs_contigdirs[]" to avoid too many directories getting created in each cylinder group. The memory required for this and two other arrays (fs_csp[] and fs_maxcluster[]) is allocated with a single malloc() call, and divided up afterwards. However, the 'space' pointer is not advanced correctly, so fs_contigdirs and fs_maxcluster end up pointing to the same address. Add the missing code to advance the 'space' pointer, and remove an unnecessary update of the pointer that follows. This is likely to fix the "ffs_clusteralloc: map mismatch" panics that have been reported recently. Submitted by: Luke Mewburn <lukem@wasabisystems.com>	2001-09-09 23:48:28 +00:00
Robert Watson	7df97b6117	o At some point, unmounting a non-EA file system with EA's compiled in got a bit broken, when ufs_extattr_stop() was called and failed, ufs_extattr_destroy() would panic. This makes the call to destroy() conditional on the success of stop(). Submitted by: Christian Carstensen <cc@devcon.net> Obtained from: TrustedBSD Project	2001-09-01 20:11:05 +00:00
Peter Wemm	815d14ddab	Use a fixed type for times in on-disk structures for ufs rather than something that could potentially change like time_t.	2001-07-16 00:55:27 +00:00
John Baldwin	ed87274d16	Fix more mntvnode and vnode interlock order reversals.	2001-06-28 22:21:33 +00:00
John Baldwin	49d2d9f4a4	- Fix a mntvnode and vnode interlock reversal. - Protect the mnt_vnode list with the mntvnode lock. - Use queue(9) macros.	2001-06-28 04:12:56 +00:00
Peter Wemm	78236790cd	Fix warning: 1973: warning: int format, long int arg (arg 5)	2001-06-15 07:44:39 +00:00
Kirk McKusick	eb87cd754f	Build on the change in revision 1.98 by Tor.Egge@fast.no. The symptom being treated in 1.98 was to avoid freeing a pagedep dependency if there was still a newdirblk dependency referencing it. That change is correct and no longer prints a warning message when it occurs. The other part of revision 1.98 was to panic when a newdirblk dependency was encountered during a file truncation. This fix removes that panic and replaces it with code to find and delete the newdirblk dependency so that the truncation can succeed.	2001-06-13 23:13:13 +00:00
David E. O'Brien	1239674238	There seems to be a problem with the order of disk write operations being incorrect due to a missing check for some dependency. This change avoids the freelist corruption (but not the temporarily inconsistent state of the file system). A message is printed as a reminder of the underlying problem when a pagedep structure is not freed due to the NEWBLOCK flag being set. Submitted by: Tor.Egge@fast.no	2001-06-05 01:49:37 +00:00
John Baldwin	1c11b01562	Revert the previous commit in favor of the fix in rev 1.42 of ufs/ffs/ffs_extern.h instead. Requested by: bde	2001-05-30 23:09:19 +00:00
John Baldwin	55d132317c	Forward declare struct cg to quiet a warning. Submitted by: bde	2001-05-30 23:08:40 +00:00
John Baldwin	59718ee556	Include <ufs/ffs/fs.h> to get the definition of struct cg to quiet a warning.	2001-05-29 23:53:16 +00:00
Poul-Henning Kamp	c7a3e2379c	Remove last vestiges of MFS.	2001-05-29 21:21:53 +00:00
Kirk McKusick	57042c7f72	Update softdep_setup_directory_add prototype to reflect changes in actual function. Obtained from: Jim Bloom <bloom@jbloom.jbloom.org>	2001-05-20 15:59:55 +00:00
Kirk McKusick	dc01275be9	Must ensure that all the entries on the pd_pendinghd list have been committed to disk before clearing them. More specifically, when free_newdirblk is called, we know that the inode claims the new directory block. However, if the associated pagedep is still linked onto the directory buffer dependency chain, then some of the entries on the pd_pendinghd list may not be committed to disk yet. In this case, we will simply note that the inode claims the block and let the pd_pendinghd list be processed when the pagedep is next written. If the pagedep is no longer on the buffer dependency chain, then all the entries on the pd_pending list are committed to disk and we can free them in free_newdirblk. This corrects a window of vulnerability introduced in the code added in version 1.95.	2001-05-19 19:24:26 +00:00
Kirk McKusick	9f5192ff71	Must be a bit less aggressive about freeing pagedep structures. Obtained from: Robert Watson <rwatson@FreeBSD.org> and Matthew Jacob <mjacob@feral.com>	2001-05-18 22:16:28 +00:00
Kirk McKusick	24a83a4b3f	When a new block is allocated to a directory, an fsync of a file whose name is within that block must ensure not only that the block containing the file name has been written, but also that the on-disk directory inode references that block. When a new directory block is created, we allocate a newdirblk structure which is linked to the associated allocdirect (on its ad_newdirblk list). When the allocdirect has been satisfied, the newdirblk structure is moved to the inodedep id_bufwait list of its directory to await the inode being written. When the inode is written, the directory entries are fully committed and can be deleted from their pagedep->pd_pendinghd and inodedep->id_pendinghd lists.	2001-05-17 07:24:03 +00:00
Ian Dowse	0864ef1e8a	Change the second argument of vflush() to an integer that specifies the number of references on the filesystem root vnode to be both expected and released. Many filesystems hold an extra reference on the filesystem root vnode, which must be accounted for when determining if the filesystem is busy and then released if it isn't busy. The old `skipvp' approach required individual filesystem xxx_unmount functions to re-implement much of vflush()'s logic to deal with the root vnode. All 9 filesystems that hold an extra reference on the root vnode got the logic wrong in the case of forced unmounts, so `umount -f' would always fail if there were any extra root vnode references. Fix this issue centrally in vflush(), now that we can. This commit also fixes a vnode reference leak in devfs, which could result in idle devfs filesystems that refuse to unmount. Reviewed by: phk, bp	2001-05-16 18:04:37 +00:00
Kirk McKusick	7389126d9a	Further fixes for deadlock in the presence of multiple snapshots. There are still more to find, but this fix should cover the common cases that folks are hitting.	2001-05-14 17:16:49 +00:00
Kirk McKusick	9b35c30cf7	Remove yet another deadlock case.	2001-05-11 07:12:03 +00:00
Kirk McKusick	9ccb939ef0	When running with soft updates, track the number of blocks and files that are committed to being freed and reflect these blocks in the counts returned by statfs (and thus also by the `df' command). This change allows programs such as those that do news expiration to know when to stop if they are trying to create a certain percentage of free space. Note that this change does not solve the much harder problem of making this to-be-freed space available to applications that want it (thus on a nearly full filesystem, you may still encounter out-of-space conditions even though the free space will show up eventually). Hopefully this harder problem will be the subject of a future enhancement.	2001-05-08 07:42:20 +00:00
Kirk McKusick	27b047acf0	Several fixes for units errors: 1) Do not assume that the superblock will be of size fs->fs_bsize. This fixes a panic when taking a snapshot on a filesystem with a block size bigger than 8K. 2) Properly calculate the number of fragments that follow the superblock summary information. This fixes a bug with inconsistent snapshots. 3) When cleaning up a snapshot that is about to be removed, properly calculate the number of blocks that need to be checked. This fixes a bug that created partially allocated inodes. 4) When moving blocks from a snapshot that is about to be removed to another snapshot, properly account for the reduced number of blocks in the snapshot from which they are taken. This fixes a bug in which the number of blocks released from a snapshot did not match the number that it claimed to have.	2001-05-08 07:29:03 +00:00
Kirk McKusick	0c6fbff0a5	When syncing out snapshot metadata, we must temporarily allow recursive buffer locking so as to avoid locking against ourselves if we need to write filesystem metadata.	2001-05-08 07:13:00 +00:00
Kirk McKusick	23371b2f22	Refinement to revision 1.16 of ufs/ffs/ffs_snapshot.c to reduce the amount of time that the filesystem must be suspended. The current snapshot is elided as well as the earlier snapshots.	2001-05-04 05:49:28 +00:00
Poul-Henning Kamp	3c7a8027cb	Remove blatantly pointless call to VOP_BMAP(). Use ufs_bmaparray() rather than VOP_BMAP() on our own vnodes.	2001-05-01 09:12:31 +00:00
Poul-Henning Kamp	a62615e59b	Implement vop_std{get|put}pages() and add them to the default vop[]. Un-copy&paste all the VOP_{GET|PUT}PAGES() functions which do nothing but the default.	2001-05-01 08:34:45 +00:00
Poul-Henning Kamp	855aa097af	VOP_BALLOC was never really a VOP in the first place, so convert it to UFS_BALLOC like the other "between UFS and FFS function interfaces".	2001-04-29 12:36:52 +00:00
Poul-Henning Kamp	0c25dbeb17	Remove faint traces of non-existent ffs_bmap().	2001-04-29 10:23:32 +00:00
Greg Lehey	60fb0ce365	Revert consequences of changes to mount.h, part 2. Requested by: bde	2001-04-29 02:45:39 +00:00
Kirk McKusick	c9509f5865	Rather than copying all the indirect blocks of the snapshot, simply mark them as BLK_NOCOPY. This trick cuts the initial size of the snapshot in half and cuts the time to take a snapshot by a third.	2001-04-26 00:50:53 +00:00
Kirk McKusick	112f737245	When closing the last reference to an unlinked file, it is freed by the inactive routine. Because the freeing causes the filesystem to be modified, the close must be held up during periods when the filesystem is suspended. For snapshots to be consistent across crashes, they must write blocks that they copy and claim those written blocks in their on-disk block pointers before the old blocks that they referenced can be allowed to be written. Close a loophole that allowed unwritten blocks to be skipped when doing ffs_sync with a request to wait for all I/O activity to be completed.	2001-04-25 08:11:18 +00:00
Poul-Henning Kamp	a13234bb35	Move the netexport structure from the fs-specific mount structure to struct mount. This makes the "struct netexport *" parameter to the vfs_export and vfs_checkexport interface unneeded. Consequently, all non-stacking filesystems can use vfs_stdcheckexp(). At the same time, make it a pointer to a struct netexport in struct mount, so that we can remove the bogus AF_MAX and #include <net/radix.h> from <sys/mount.h>	2001-04-25 07:07:52 +00:00
Ian Dowse	5d69bac493	Pre-dirpref versions of fsck may zero out the new superblock fields fs_contigdirs, fs_avgfilesize and fs_avgfpdir. This could cause panics if these fields were zeroed while a filesystem was mounted read-only, and then remounted read-write. Add code to ffs_reload() which copies the fs_contigdirs pointer from the previous superblock, and reinitialises fs_avgf* if necessary. Reviewed by: mckusick	2001-04-24 00:37:16 +00:00
Greg Lehey	d98dc34f52	Correct #includes to work with fixed sys/mount.h.	2001-04-23 09:05:15 +00:00
Kirk McKusick	5819ab3f12	Add debugging option to always read/write cylinder groups as full sized blocks. To enable this option, use: `sysctl -w debug.bigcgs=1'. Add debugging option to disable background writes of cylinder groups. To enable this option, use: `sysctl -w debug.dobkgrdwrite=0'. These debugging options should be tried on systems that are panicking with corrupted cylinder group maps to see if it makes the problem go away. The set of panics in question are: ``ffs_clusteralloc: map mismatch'', ``ffs_nodealloccg: map corrupted'', ``ffs_nodealloccg: block not in map'', ``ffs_alloccg: map corrupted'', ``ffs_alloccg: block not in map'', ``ffs_alloccgblk: cyl groups corrupted'', ``ffs_alloccgblk: can't find blk in cyl'', ``ffs_checkblk: partially free fragment''. The following panics are less likely to be related to this problem, but might be helped by these debugging options: ``ffs_valloc: dup alloc'', ``ffs_blkfree: freeing free block'', ``ffs_blkfree: freeing free frag'', ``ffs_vfree: freeing free inode''. If you try these options, please report whether they helped reduce your bitmap corruption panics to Kirk McKusick at <mckusick@mckusick.com> and to Matt Dillon <dillon@earth.backplane.com>.	2001-04-17 05:37:51 +00:00
Kirk McKusick	f0f3f19f05	Background fsck sysctl operations must use vn_start_write and vn_finished_write so that they do not attempt to modify a suspended filesystem.	2001-04-17 05:06:37 +00:00
Kirk McKusick	74046077a7	Update to describe use of mdconfig instead of deprecated vnconfig. Submitted by: Steve Ames <steve@virtual-voodoo.com>	2001-04-14 18:32:09 +00:00
Kirk McKusick	1a6a661032	This checkin adds support in ufs/ffs for the FS_NEEDSFSCK flag. It is described in ufs/ffs/fs.h as follows: /* * Filesystem flags. * * Note that the FS_NEEDSFSCK flag is set and cleared only by the * fsck utility. It is set when background fsck finds an unexpected * inconsistency which requires a traditional foreground fsck to be * run. Such inconsistencies should only be found after an uncorrectable * disk error. A foreground fsck will clear the FS_NEEDSFSCK flag when * it has successfully cleaned up the filesystem. The kernel uses this * flag to enforce that inconsistent filesystems be mounted read-only. */ #define FS_UNCLEAN 0x01 /* filesystem not clean at mount */ #define FS_DOSOFTDEP 0x02 /* filesystem using soft dependencies */ #define FS_NEEDSFSCK 0x04 /* filesystem needs sync fsck before mount */	2001-04-14 05:26:28 +00:00
Kirk McKusick	a61ab64ac4	Directory layout preference improvements from Grigoriy Orlov <gluk@ptci.ru>. His description of the problem and solution follows. My own tests show speedups on typical filesystem intensive workloads of 5% to 12% which is very impressive considering the small amount of code change involved. ------ One day I noticed that some file operations run much faster on small file systems than on big ones. I've looked at the ffs algorithms, thought about them, and redesigned the dirpref algorithm. First I want to describe the results of my tests. These results are old and I have improved the algorithm after these tests were done. Nevertheless they show how big the performance speedup may be. I have done two file/directory intensive tests on two OpenBSD systems with the old and new dirpref algorithms. The first test is "tar -xzf ports.tar.gz", the second is "rm -rf ports". The ports.tar.gz file is the ports collection from the OpenBSD 2.8 release. It contains 6596 directories and 13868 files. The test systems are: 1. Celeron-450, 128Mb, two IDE drives, the system at wd0, file system for test is at wd1. Size of test file system is 8 Gb, number of cg=991, size of cg is 8m, block size = 8k, fragment size = 1k OpenBSD-current from Dec 2000 with BUFCACHEPERCENT=35 2. PIII-600, 128Mb, two IBM DTLA-307045 IDE drives at i815e, the system at wd0, file system for test is at wd1.
Size of test file system is 40 Gb, number of cg=5324, size of cg is 8m, block size = 8k, fragment size = 1k OpenBSD-current from Dec 2000 with BUFCACHEPERCENT=50 You can get more info about the test systems and methods at: http://www.ptci.ru/gluk/dirpref/old/dirpref.html
Test Results
                 tar -xzf ports.tar.gz                   rm -rf ports
mode      old dirpref  new dirpref  speedup   old dirpref  new dirpref  speedup
First system
normal        667          472        1.41        477          331        1.44
async         285          144        1.98        130           14        9.29
sync          768          616        1.25        477          334        1.43
softdep       413          252        1.64        241           38        6.34
Second system
normal        329           81        4.06        263.5         93.5      2.81
async         302           25.7     11.75        112            2.26    49.56
sync          281           57.0      4.93        263           90.5      2.9
softdep       341           40.6      8.4         284            4.76    59.66
The "old dirpref" and "new dirpref" columns give the test time in seconds. The speedup is the ratio old dirpref / new dirpref. ------ Algorithm description The old dirpref algorithm is described in comments: /* * Find a cylinder to place a directory. * * The policy implemented by this algorithm is to select from * among those cylinder groups with above the average number of * free inodes, the one with the smallest number of directories. */ A new directory is allocated in a different cylinder group than its parent directory, resulting in a directory tree that is spread across all the cylinder groups. This spreading out results in non-optimal access to the directories and files. When we have a small filesystem it is not a problem, but when the filesystem is big the performance degradation becomes very apparent. What do I mean by a big file system? 1. A big filesystem is a filesystem which occupies 20-30 or more percent of total drive space, i.e. first and last cylinder are physically located relatively far from each other. 2. It has a relatively large number of cylinder groups, for example more cylinder groups than 50% of the buffers in the buffer cache. The first results in long access times, while the second results in many buffers being used by metadata operations.
Such operations use cylinder group blocks and on-disk inode blocks. The cylinder group block (fs->fs_cblkno) contains struct cg, inode and block bit maps. It is 2k in size for the default filesystem parameters. If new and parent directories are located in different cylinder groups then the system performs more input/output operations and uses more buffers. On filesystems with many cylinder groups, lots of cache buffers are used for metadata operations. My solution for this problem is very simple. I allocate many directories in one cylinder group. I also do some things so that the new allocation method does not cause excessive fragmentation and so that directory inodes are not located far from their files' inodes and data. The algorithm is: /* * Find a cylinder group to place a directory. * * The policy implemented by this algorithm is to allocate a * directory inode in the same cylinder group as its parent * directory, but also to reserve space for its files inodes * and data. Restrict the number of directories which may be * allocated one after another in the same cylinder group * without intervening allocation of files. * * If we allocate a first level directory then force allocation * in another cylinder group. */ My early versions of dirpref gave me good results for a wide range of file operations and different filesystem capacities except one case: those applications that create their entire directory structure first and only later fill this structure with files. My solution for such and similar cases is to limit the number of directories which may be created one after another in the same cylinder group without intervening file creations. For this purpose, I allocate an array of counters at mount time. This array is linked to the superblock fs->fs_contigdirs[cg]. Each time a directory is created the counter increases and each time a file is created the counter decreases. A 60Gb filesystem with 8mb/cg requires 10kb of memory for the counters array.
The maxcontigdirs value is the maximum number of directories which may be created without an intervening file creation. I found in my tests that the best performance occurs when I restrict the number of directories in one cylinder group such that all its files may be located in the same cylinder group. There may be some deterioration in performance if all the file inodes are in the same cylinder group as its containing directory, but their data partially resides in a different cylinder group. The maxcontigdirs value is calculated to try to prevent this condition. Since there is no way to know how many files and directories will be allocated later I added two optimization parameters in superblock/tunefs. They are: int32_t fs_avgfilesize; /* expected average file size */ int32_t fs_avgfpdir; /* expected # of files per directory */ These parameters have reasonable defaults but may be tweaked for special uses of a filesystem. They are only necessary in rare cases like better tuning a filesystem being used to store a squid cache. I have been using this algorithm for about 3 months. I have done a lot of testing on filesystems with different capacities, average filesize, average number of files per directory, and so on. I think this algorithm has no negative impact on filesystem performance. It works better than the default one in all cases. The new dirpref will greatly improve untarring/removing/copying of big directories, decrease load on cvs servers and much more. The new dirpref doesn't speed up a compilation process, but also doesn't slow it down. Obtained from: Grigoriy Orlov <gluk@ptci.ru>	2001-04-10 08:38:59 +00:00
Jeroen Ruigrok van der Werven	5d0b660f2a	Fix typo ); -> ,	2001-03-24 15:25:04 +00:00
Kirk McKusick	fca26df055	Check that background fsck operation is being done on a ufs filesystem. Obtained from: Robert Watson <rwatson@FreeBSD.org>	2001-03-23 20:58:25 +00:00
Kirk McKusick	812b1d416c	Add kernel support for running fsck on active filesystems.	2001-03-21 04:09:01 +00:00
Kirk McKusick	31c6ce0aed	Clear the fs_clean flag only when the FS_UNCLEAN flag is not set (as is done in unmount). Remove a snapshot inode from the superblock list when its last name goes away rather than when its last reference goes away. That way it will be properly reclaimed by fsck after a crash rather than reenabled when the filesystem is mounted.	2001-03-21 04:05:20 +00:00
Kirk McKusick	7e72e9918a	Report the correct inode number when panicking with freeing free inode. Report the correct block number when panicking with freeing free block.	2001-03-21 04:01:02 +00:00
Robert Watson	516081f288	o Change options FFS_EXTATTR and options FFS_EXTATTR_AUTOSTART to options UFS_EXTATTR and UFS_EXTATTR_AUTOSTART respectively. This change reflects the fact that our EA support is implemented entirely at the UFS layer (modulo FFS start/stop/autostart hooks for mount and unmount events). This also better reflects the fact that [shortly] MFS will also support EAs, as well as possibly IFS. o Consumers of the EA support in FFS are reminded that as a result, they must change kernel config files to reflect the new option names. Obtained from: TrustedBSD Project	2001-03-19 04:35:40 +00:00
Robert Watson	f5161237ad	o Implement "options FFS_EXTATTR_AUTOSTART", which depends on "options FFS_EXTATTR". When extended attribute auto-starting is enabled, FFS will scan the .attribute directory off of the root of each file system, as it is mounted. If .attribute exists, EA support will be started for the file system. If there are files in the directory, FFS will attempt to start them as attribute backing files for attributes bearing the same name. All attributes are started before access to the file system is permitted, so this permits race-free enabling of attributes. For attributes backing support for security features, such as ACLs, MAC, Capabilities, this is vital, as it prevents the file system attributes from getting out of sync as a result of file system operations between mount-time and the enabling of the extended attribute. The userland extattrctl tool will still function exactly as previously. Files must be placed directly in .attribute, which must be directly off of the file system root: symbolic links are not permitted. FFS_EXTATTR will continue to be able to function without FFS_EXTATTR_AUTOSTART for sites that do not want/require auto-starting. If you're using the UFS_ACL code available from www.TrustedBSD.org, using FFS_EXTATTR_AUTOSTART is recommended. o This support is implemented by adding an invocation of ufs_extattr_autostart() to ffs_mountfs(). In addition, several new supporting calls are introduced in ufs_extattr.c: ufs_extattr_autostart(): start EAs on the specified mount ufs_extattr_lookup(): given a directory and filename, return the vnode for the file. ufs_extattr_enable_with_open(): invoke ufs_extattr_enable() after doing the equivalent of vn_open() on the passed file. ufs_extattr_iterate_directory(): iterate over a directory, invoking ufs_extattr_lookup() and ufs_extattr_enable_with_open() on each entry. o This feature is not widely tested, and therefore may contain bugs, caution is advised.
Several changes are in the pipeline for this feature, including breaking out of EA namespaces into subdirectories of .attribute (this is waiting on the updated EA API), as well as a per-filesystem flag indicating whether or not EAs should be auto-started. This is required because administrators may not want .attribute auto-started on all file systems, especially if non-administrators have write access to the root of a file system. Obtained from: TrustedBSD Project	2001-03-14 05:32:31 +00:00
Kirk McKusick	589c7af992	Fixes to track snapshot copy-on-write checking in the specinfo structure rather than assuming that the device vnode would reside in the FFS filesystem (which is obviously a broken assumption with the device filesystem).	2001-03-07 07:09:55 +00:00
Kirk McKusick	8775e64a5d	Free lock before returning from process_worklist_item. Obtained from: Constantine Sapuntzakis <csapuntz@stanford.edu>	2001-03-01 21:43:46 +00:00
Adrian Chadd	f3a90da995	Reviewed by: jlemon An initial tidyup of the mount() syscall and VFS mount code. This code replaces the earlier work done by jlemon in an attempt to make linux_mount() work. * the guts of the mount work has been moved into vfs_mount(). * move `type', `path' and `flags' from being userland variables into being kernel variables in vfs_mount(). `data' remains a pointer into userspace. * Attempt to verify the `type' and `path' strings passed to vfs_mount() aren't too long. * rework mount() and linux_mount() to take the userland parameters (besides data, as mentioned) and pass kernel variables to vfs_mount(). (linux_mount() already did this, I've just tidied it up a little more.) * remove the copyin() stuff for `path'. `data' still requires copyin() since it's a pointer into userland. * set `mount->mnt_statf_mntonname' in vfs_mount() rather than in each filesystem. This variable is generally initialised with `path', and each filesystem can override it if they want to. * NOTE: f_mntonname is initialised with "/" in the case of a root mount.	2001-03-01 21:00:17 +00:00
Kirk McKusick	a5a94e3936	Free lock before calling panic so that subsequent attempt to write out buffers does not re-panic with `locking against myself'. This change should not affect normal operations of soft updates in any way.	2001-02-23 09:01:31 +00:00
Kirk McKusick	cc686e21c0	When cleaning up excess inode dependencies, check for being done. Reviewed by: Jan Koum <jkb@yahoo-inc.com>	2001-02-22 10:17:57 +00:00
Kirk McKusick	2cf5d587a9	This patch corrects two problems with the rate limiting code that was introduced in revision 1.80. The problem manifested itself with a `locking against myself' panic and could also result in soft updates inconsistencies associated with inodedeps. The two problems are: 1) One of the background operations could manipulate the bitmap while holding it locked with intent to create. This held lock results in a `locking against myself' panic, when the background processing that we have been coopted to do tries to lock the bitmap which we are already holding locked. To understand how to fix this problem, first, observe that we can do the background cleanups in inodedep_lookup only when allocating inodedeps (DEPALLOC is set in the call to inodedep_lookup). Second, observe that calls to inodedep_lookup with DEPALLOC set can only happen from the following calls into the softdep code: softdep_setup_inomapdep softdep_setup_allocdirect softdep_setup_remove softdep_setup_freeblocks softdep_setup_directory_change softdep_setup_directory_add softdep_change_linkcnt Only the first two of these can come from ffs_alloc.c while holding a bitmap locked. Thus, inodedep_lookup must not go off to do request_cleanups when being called from these functions. This change adds a flag, NODELAY, that can be passed to inodedep_lookup to let it know that it should not do background processing in those cases. 2) The return value from request_cleanup when helping out with the cleanup was 0 instead of 1. This meant that despite the fact that we may have slept while doing the cleanups, the code did not recheck for the appearance of an inodedep (e.g., goto top in inodedep_lookup). This led to the softdep inconsistency in which we ended up with two inodedep's for the same inode. Reviewed by: Peter Wemm <peter@yahoo-inc.com>, Matt Dillon <dillon@earth.backplane.com>	2001-02-20 11:14:38 +00:00
Jeroen Ruigrok van der Werven	d7d97eb0aa	Preceed/preceeding are not english words. Use precede and preceding.	2001-02-18 10:43:53 +00:00
Jake Burkholder	d5a08a6065	Implement a unified run queue and adjust priority levels accordingly. - All processes go into the same array of queues, with different scheduling classes using different portions of the array. This allows user processes to have their priorities propagated up into interrupt thread range if need be. - I chose 64 run queues as an arbitrary number that is greater than 32. We used to have 4 separate arrays of 32 queues each, so this may not be optimal. The new run queue code was written with this in mind; changing the number of run queues only requires changing constants in runq.h and adjusting the priority levels. - The new run queue code takes the run queue as a parameter. This is intended to be used to create per-cpu run queues. Implement wrappers for compatibility with the old interface which pass in the global run queue structure. - Group the priority level, user priority, native priority (before propagation) and the scheduling class into a struct priority. - Change any hard coded priority levels that I found to use symbolic constants (TTIPRI and TTOPRI). - Remove the curpriority global variable and use that of curproc. This was used to detect when a process' priority had lowered and it should yield. We now effectively yield on every interrupt. - Activate propogate_priority(). It should now have the desired effect without needing to also propagate the scheduling class. - Temporarily comment out the call to vm_page_zero_idle() in the idle loop. It interfered with propogate_priority() because the idle process needed to do a non-blocking acquire of Giant and then other processes would try to propagate their priority onto it. The idle process should not do anything except idle. vm_page_zero_idle() will return in the form of an idle priority kernel thread which is woken up at appropriate times by the vm system. - Update struct kinfo_proc to the new priority interface. Deliberately change its size by adjusting the spare fields. 
It remained the same size, but the layout has changed, so userland processes that use it would parse the data incorrectly. The size constraint should really be changed to an arbitrary version number. Also add a debug.sizeof sysctl node for struct kinfo_proc.	2001-02-12 00:20:08 +00:00
Bosko Milekic	9ed346bab0	Change and clean the mutex lock interface. mtx_enter(lock, type) becomes: mtx_lock(lock) for sleep locks (MTX_DEF-initialized locks) mtx_lock_spin(lock) for spin locks (MTX_SPIN-initialized) similarly, for releasing a lock, we now have: mtx_unlock(lock) for MTX_DEF and mtx_unlock_spin(lock) for MTX_SPIN. We change the caller interface for the two different types of locks because the semantics are entirely different for each case, and this makes it explicitly clear and, at the same time, it rids us of the extra `type' argument. The enter->lock and exit->unlock change has been made with the idea that we're "locking data" and not "entering locked code" in mind. Further, remove all additional "flags" previously passed to the lock acquire/release routines with the exception of two: MTX_QUIET and MTX_NOSWITCH The functionality of these flags is preserved and they can be passed to the lock/unlock routines by calling the corresponding wrappers: mtx_{lock, unlock}_flags(lock, flag(s)) and mtx_{lock, unlock}_spin_flags(lock, flag(s)) for MTX_DEF and MTX_SPIN locks, respectively. Re-inline some lock acq/rel code; in the sleep lock case, we only inline the _obtain_lock()s in order to ensure that the inlined code fits into a cache line. In the spin lock case, we inline recursion and actually only perform a function call if we need to spin. This change has been made with the idea that we generally tend to avoid spin locks and that also the spin locks that we do have and are heavily used (i.e. sched_lock) do recurse, and therefore in an effort to reduce function call overhead for some architectures (such as alpha), we inline recursion for this case. Create a new malloc type for the witness code and retire from using the M_DEV type. The new type is called M_WITNESS and is only declared if WITNESS is enabled. 
Begin cleaning up some machdep/mutex.h code - specifically updated the "optimized" inlined code in alpha/mutex.h and wrote MTX_LOCK_SPIN and MTX_UNLOCK_SPIN asm macros for the i386/mutex.h as we presently need those. Finally, caught up to the interface changes in all sys code. Contributors: jake, jhb, jasone (in no particular order)	2001-02-09 06:11:45 +00:00
Poul-Henning Kamp	37d4006626	Another round of the <sys/queue.h> FOREACH transmogriffer. Created with: sed(1) Reviewed by: md5(1)	2001-02-04 16:08:18 +00:00
Poul-Henning Kamp	fc2ffbe604	Mechanical change to use <sys/queue.h> macro API instead of fondling implementation details. Created with: sed(1) Reviewed by: md5(1)	2001-02-04 13:13:25 +00:00
Poul-Henning Kamp	ef9e85abba	Use <sys/queue.h> macro API.	2001-02-04 12:37:48 +00:00
Matthew Dillon	f8e071a1eb	Fix a race between the syncer and umount. When you umount a softupdates filesystem softdep_process_worklist() is called in a loop until it indicates that no dependencies remain, but the determination of that fact depends on there only being one softdep_process_worklist() instance running. It was possible for the syncer to also be running softdep_process_worklist() and the pre-existing checks in the code to prevent this were not sufficient to prevent the race. This patch solves the problem. Approved-by: mckusick	2001-01-30 06:31:59 +00:00
Jason Evans	1b367556b5	Convert all simplelocks to mutexes and remove the simplelock implementations.	2001-01-24 12:35:55 +00:00
Ian Dowse	f55ff3f3ef	The ffs superblock includes a 128-byte region for use by temporary in-core pointers to summary information. An array in this region (fs_csp) could overflow on filesystems with a very large number of cylinder groups (~16000 on i386 with 8k blocks). When this happens, other fields in the superblock get corrupted, and fsck refuses to check the filesystem. Solve this problem by replacing the fs_csp array in 'struct fs' with a single pointer, and add padding to keep the length of the 128-byte region fixed. Update the kernel and userland utilities to use just this single pointer. With this change, the kernel no longer makes use of the superblock fields 'fs_csshift' and 'fs_csmask'. Add a comment to newfs/mkfs.c to indicate that these fields must be calculated for compatibility with older kernels. Reviewed by: mckusick	2001-01-15 18:30:40 +00:00
Kirk McKusick	cb3ab5aaf7	Properly compute the size of the final block of superblock summary information. Submitted by: Ian Dowse <iedowse@maths.tcd.ie>	2001-01-12 21:56:55 +00:00
Kirk McKusick	48d617487d	Several small but important fixes for snapshots: 1) Be more tolerant of missing snapshot files by only trying to decrement their reference count if they are registered as active. 2) Fix for snapshots of filesystems with block sizes larger than 8K (from Ollivier Robert <roberto@eurocontrol.fr>). 3) Fix to avoid losing last block in snapshot file when calculating blocks that need to be copied (from Don Coleman <coleman@coleman.org>).	2000-12-19 04:41:09 +00:00
Kirk McKusick	6da443cb22	Get rid of spurious check in ffs_truncate for i_size == length which fails to set the modification time on the file. The same check a few lines later takes the correct action. Submitted by: Ian Dowse <iedowse@maths.tcd.ie>	2000-12-19 04:20:13 +00:00
Assar Westerlund	ca85ca6099	add a stub for softdep_slowdown so that it's possible to build the kernel without SOFTUPDATES	2000-12-17 23:59:56 +00:00
Seigo Tanimura	937c4dfa08	Do not race for the lock of an inode hash. Reviewed by: jhb	2000-12-13 10:04:01 +00:00
Kirk McKusick	1d733bbd10	Preventing runaway kernel soft updates memory, take three. Previously, the syncer process was the only process in the system that could process the soft updates background work list. If enough other processes were adding requests to that list, it would eventually grow without bound. Because some of the work list requests require vnodes to be locked, it was not generally safe to let random processes process the work list while they already held vnodes locked. By adding a flag to the work list queue processing function to indicate whether the calling process could safely lock vnodes, it becomes possible to co-opt other processes into helping out with the work list. Now when the worklist gets too large, other processes can safely help out by picking off those work requests that can be handled without locking a vnode, leaving only the small number of requests requiring a vnode lock for the syncer process. With this change, it appears possible to keep even the nastiest workloads under control. Submitted by: Paul Saab <ps@yahoo-inc.com>	2000-12-13 08:30:35 +00:00
David Malone	7cc0979fd6	Convert more malloc+bzero to malloc+M_ZERO. Submitted by: josh@zipperup.org Submitted by: Robert Drehmel <robd@gmx.net>	2000-12-08 21:51:06 +00:00
Poul-Henning Kamp	959b7375ed	Staticize some malloc M_ instances.	2000-12-08 20:09:00 +00:00
Kirk McKusick	71868b020d	More aggressively rate limit the growth of soft dependency structures in the face of multiple processes doing massive numbers of filesystem operations. While this patch will work in nearly all situations, there are still some perverse workloads that can overwhelm the system. Detecting and handling these perverse workloads will be the subject of another patch. Reviewed by: Paul Saab <ps@yahoo-inc.com> Obtained from: Ethan Solomita <ethan@geocast.com>	2000-11-20 06:22:39 +00:00
Matthew Dillon	936524aa02	Implement a low-memory deadlock solution. Removed most of the hacks that were trying to deal with low-memory situations prior to now. The new code is based on the concept that I/O must be able to function in a low memory situation. All major modules related to I/O (except networking) have been adjusted to allow allocation out of the system reserve memory pool. These modules now detect a low memory situation but rather than block they instead continue to operate, then return resources to the memory pool instead of cache them or leave them wired. Code has been added to stall in a low-memory situation prior to a vnode being locked. Thus situations where a process blocks in a low-memory condition while holding a locked vnode have been reduced to near nothing. Not only will I/O continue to operate, but many prior deadlock conditions simply no longer exist. Implement a number of VFS/BIO fixes (found by Ian): in biodone(), bogus-page replacement code, the loop was not properly incrementing loop variables prior to a continue statement. We do not believe this code can be hit anyway but we aren't taking any chances. We'll turn the whole section into a panic (as it already is in brelse()) after the release is rolled. In biodone(), the foff calculation was incorrectly clamped to the iosize, causing the wrong foff to be calculated for pages in the case of an I/O error or biodone() called without initiating I/O. The problem always caused a panic before. Now it doesn't. The problem is mainly an issue with NFS. Fixed casts for ~PAGE_MASK. This code worked properly before only because the calculations use signed arithmetic. Better to properly extend PAGE_MASK first before inverting it for the 64 bit masking op. In brelse(), the bogus_page fixup code was improperly throwing away the original contents of 'm' when it did the j-loop to fix the bogus pages. The result was that it would potentially invalidate parts of the WRONG page(!), leading to corruption. 
There may still be cases where a background bitmap write is being duplicated, causing potential corruption. We have identified a potentially serious bug related to this but the fix is still TBD. So instead this patch contains a KASSERT to detect the problem and panic the machine rather than continue to corrupt the filesystem. The problem does not occur very often; it is very hard to reproduce, and it may or may not be the cause of the corruption people have reported. Review by: (VFS/BIO: mckusick, Ian Dowse <iedowse@maths.tcd.ie>) Testing by: (VM/Deadlock) Paul Saab <ps@yahoo-inc.com>	2000-11-18 23:06:26 +00:00
Kirk McKusick	bd4bd019fb	When deleting a file, the ordering of events imposed by soft updates is to first write the deleted directory entry to disk, second write the zero'ed inode to disk, and finally to release the freed blocks and the inode back to the cylinder-group map. As this ordering requires two disk writes to occur which are normally spaced about 30 seconds apart (except when memory is under duress), it takes about a minute from the time that a file is deleted until its inode and data blocks show up in the cylinder-group map for reallocation. If a file has had only a brief lifetime (less than 30 seconds from creation to deletion), neither its inode nor its directory entry may have been written to disk. If its directory entry has not been written to disk, then we need not wait for that directory block to be written as the on-disk directory block does not reference the inode. Similarly, if the allocated inode has never been written to disk, we do not have to wait for it to be written back either as its on-disk representation is still zero'ed out. Thus, in the case of a short lived file, we can simply release the blocks and inode to the cylinder-group map immediately. As the inode and its blocks are released immediately, they are immediately available for other uses. If they are not released for a minute, then other inodes and blocks must be allocated for short lived files, cluttering up the vnode and buffer caches. The previous code was a bit too aggressive in trying to release the blocks and inode back to the cylinder-group map resulting in their being made available when in fact the inode on disk had not yet been zero'ed. This patch takes a more conservative approach to doing the release which avoids doing the release prematurely.	2000-11-14 09:00:25 +00:00
Adrian Chadd	0b0c10b48d	Initial commit of IFS - an inode-namespaced FFS. Here is a short description: How it works: -- Basically ifs is a copy of ffs, overriding some vfs/vnops. (Yes, hack.) I didn't see the need in duplicating all of sys/ufs/ffs to get this off the ground. File creation is done through a special file - 'newfile' . When newfile is called, the system allocates and returns an inode. Note that newfile is done in a cloning fashion: fd = open("newfile", O_CREAT|O_RDWR, 0644); fstat(fd, &st); printf("new file is %d\n", (int)st.st_ino); Once you have created a file, you can open() and unlink() it by its returned inode number retrieved from the stat call, ie: fd = open("5", O_RDWR); The creation permissions depend entirely if you have write access to the root directory of the filesystem. To get the list of currently allocated inodes, VOP_READDIR has been added which returns a directory listing of those currently allocated. -- What this entails: * patching conf/files and conf/options to include IFS as a new compile option (and since ifs depends upon FFS, include the FFS routines) * An entry in i386/conf/NOTES indicating IFS exists and where to go for an explanation * Unstaticize a couple of routines in src/sys/ufs/ffs/ which the IFS routines require (ffs_mount() and ffs_reload()) * a new bunch of routines in src/sys/ufs/ifs/ which implement the IFS routines. IFS replaces some of the vfsops, and a handful of vnops - most notably are VFS_VGET(), VOP_LOOKUP(), VOP_UNLINK() and VOP_READDIR(). Any other directory operation is marked as invalid. What this results in: * an IFS partition's create permissions are controlled by the perm/ownership of the root mount point, just like a normal directory * Each inode has perm and ownership too * IFS does NOT mean an FFS partition can be opened per inode. This is a completely separate filesystem here * Softupdates doesn't work with IFS, and really I don't think it needs it. Besides, fsck's are FAST. 
(Try it :-) * Inodes 0 and 1 aren't allocatable because they are special (dump/swap IIRC). Inode 2 isn't allocatable since UFS/FFS locks all inodes in the system against this particular inode, and unravelling THAT code isn't trivial. Therefore, useful inodes start at 3. Enjoy, and feedback is definitely appreciated!	2000-10-14 03:02:30 +00:00
Eivind Eklund	7eb9fca557	Blow away the v_specmountpoint define, replacing it with what it was defined as (rdev->si_mountpoint)	2000-10-09 17:31:39 +00:00
Robert Watson	ff435dcb91	o Move initialization of ump from mp to the top of the function so that it is defined when used in ufs_extattr_uepm_destroy(), fixing a panic due to a NULL pointer dereference. Submitted by: Wesley Morgan <morganw@chemicals.tacorp.com>	2000-10-06 15:31:28 +00:00
Robert Watson	9de54ba513	o Add call to ufs_extattr_uepm_destroy() in ffs_unmount() so as to clean up lock on extattrs. o Get for free a comment indicating where auto-starting of extended attributes will eventually occur, as it was in my commit tree also. No implementation change here, only a comment.	2000-10-04 04:44:51 +00:00
Jason Evans	a18b1f1d4d	Convert lockmgr locks from using simple locks to using mutexes. Add lockdestroy() and appropriate invocations, which corresponds to lockinit() and must be called to clean up after a lockmgr lock is no longer needed.	2000-10-04 01:29:17 +00:00
Boris Popov	67e871664b	Add a lock structure to vnode structure. Previously it was either allocated separately (nfs, cd9660 etc) or kept as a first element of structure referenced by v_data pointer(ffs). Such organization leads to known problems with stacked filesystems. From this point vop_nolock() functions maintain only interlock lock. vop_stdlock() functions maintain built-in v_lock structure using lockmgr(). vop_sharedlock() is compatible with vop_stdunlock(), but maintains a shared lock on vnode. If filesystem wishes to export lockmgr compatible lock, it can put an address of this lock to v_vnlock field. This indicates that the upper filesystem can take advantage of it and use single lock structure for entire (or part) of stack of vnodes. This field shouldn't be examined or modified by VFS code except for initialization purposes. Reviewed in general by: mckusick	2000-09-25 15:24:04 +00:00
Robert Watson	907da7c385	o Permit UFS Extended Attributes to be associated with special devices and FIFOs. Obtained from: TrustedBSD Project	2000-09-21 19:06:02 +00:00
Dag-Erling Smørgrav	8461bdba85	Silence a warning.	2000-09-17 19:41:26 +00:00
Kirk McKusick	52a3bfa2e7	Cannot do MALLOC with M_WAITOK while holding ACQUIRE_LOCK Obtained from: Ethan Solomita <ethan@geocast.com>	2000-09-07 23:02:55 +00:00
Jason Evans	0384fff8c5	Major update to the way synchronization is done in the kernel. Highlights include: * Mutual exclusion is used instead of spl(). See mutex(9). (Note: The alpha port is still in transition and currently uses both.) Per-CPU idle processes. * Interrupts are run in their own separate kernel threads and can be preempted (i386 only). Partially contributed by: BSDi (BSD/OS) Submissions by (at least): cp, dfr, dillon, grog, jake, jhb, sheldonh	2000-09-07 01:33:02 +00:00
Tor Egge	b5ee7ec63a	Initialize *countp to 0 in stub for softdep_flushworklist(). This allows ffs_fsync() to break out of a loop that might otherwise be infinite on kernels compiled without the SOFTUPDATES option. The observed symptom was a system hang at the first unmount attempt.	2000-08-09 00:41:54 +00:00
Ollivier Robert	8694d8e912	Fix the lockmgr panic everyone is seeing at shutdown time. vput assumes curproc is the lock holder, but it's not true in this case. Thanks a lot Luoqi ! Submitted by: luoqi Tested by: phk	2000-08-01 14:15:07 +00:00
Peter Wemm	6ee6b42ef7	Minor change: fix warning - move a 'struct vnode *vp' declaration inside a #ifdef DIAGNOSTIC to match its corresponding usage.	2000-07-28 22:27:00 +00:00
Kirk McKusick	3592b7155c	Clean up the snapshot code so that it no longer depends on the use of the SF_IMMUTABLE flag to prevent writing. Instead put in explicit checking for the SF_SNAPSHOT flag in the appropriate places. With this change, it is now possible to rename and link to snapshot files. It is also possible to set or clear any of the owner, group, or other read bits on the file, though none of the write or execute bits can be set. There is also an explicit test to prevent the setting or clearing of the SF_SNAPSHOT flag via chflags() or fchflags(). Note also that the modify time cannot be changed as it needs to accurately reflect the time that the snapshot was taken. Submitted by: Robert Watson <rwatson@FreeBSD.org>	2000-07-26 23:07:01 +00:00
Kirk McKusick	55ba28c60a	Add stub for softdep_flushworklist() so that kernels compiled without the SOFTUPDATES option will load correctly. Obtained from: John Baldwin <jhb@bsdi.com>	2000-07-25 05:28:59 +00:00
Kirk McKusick	9b97113391	This patch corrects the first round of panics and hangs reported with the new snapshot code. Update addaliasu to correctly implement the semantics of the old checkalias function. When a device vnode first comes into existence, check to see if an anonymous vnode for the same device was created at boot time by bdevvp(). If so, adopt the bdevvp vnode rather than creating a new vnode for the device. This corrects a problem which caused the kernel to panic when taking a snapshot of the root filesystem. Change the calling convention of vn_write_suspend_wait() to be the same as vn_start_write(). Split out softdep_flushworklist() from softdep_flushfiles() so that it can be used to clear the work queue when suspending filesystem operations. Access to buffers becomes recursive so that snapshots can recursively traverse their indirect blocks using ffs_copyonwrite() when checking for the need for copy on write when flushing one of their own indirect blocks. This eliminates a deadlock between the syncer daemon and a process taking a snapshot. Ensure that softdep_process_worklist() can never block because of a snapshot being taken. This eliminates a problem with buffer starvation. Cleanup change in ffs_sync() which did not synchronously wait when MNT_WAIT was specified. The result was an unclean filesystem panic when doing forcible unmount with heavy filesystem I/O in progress. Return a zero'ed block when reading a block that was not in use at the time that a snapshot was taken. Normally, these blocks should never be read. However, the readahead code will occasionally read them which can cause unexpected behavior. Clean up the debugging code that ensures that no blocks be written on a filesystem while it is suspended. Snapshots must explicitly label the blocks that they are writing during the suspension so that they do not cause a `write on suspended filesystem' panic. 
Reorganize ffs_copyonwrite() to eliminate a deadlock and also to prevent a race condition that would permit the same block to be copied twice. This change eliminates an unexpected soft updates inconsistency in fsck caused by the double allocation. Use bqrelse rather than brelse for buffers that will be needed soon again by the snapshot code. This improves snapshot performance.	2000-07-24 05:28:33 +00:00
Boris Popov	3fbd97427e	Prevent possible dereference of NULL pointer. Submitted by: Marius Bendiksen <mbendiks@eunet.no>	2000-07-13 02:17:14 +00:00
Kirk McKusick	d303f71fdc	Brain fault, forgot to update ffs_snapshot.c with the new calling convention for vn_start_write.	2000-07-12 00:27:27 +00:00
Kirk McKusick	f2a2857bb3	Add snapshots to the fast filesystem. Most of the changes support the gating of system calls that cause modifications to the underlying filesystem. The gating can be enabled by any filesystem that needs to consistently suspend operations by adding the vop_stdgetwritemount to their set of vnops. Once gating is enabled, the function vfs_write_suspend stops all new write operations to a filesystem, allows any filesystem modifying system calls already in progress to complete, then sync's the filesystem to disk and returns. The function vfs_write_resume allows the suspended write operations to begin again. Gating is not added by default for all filesystems as for SMP systems it adds two extra locks to such critical kernel paths as the write system call. Thus, gating should only be added as needed. Details on the use and current status of snapshots in FFS can be found in /sys/ufs/ffs/README.snapshot so for brevity and timeliness is not included here. Unless and until you create a snapshot file, these changes should have no effect on your system (famous last words).	2000-07-11 22:07:57 +00:00
Kirk McKusick	d4c1816924	Clean up warning about undeclared function by declaring softdep_fsync in mount.h instead of ffs_extern.h. The correct solution is to use an indirect function pointer so that the kernel does not have to be built with options FFS, but that will be left for another day.	2000-07-11 19:28:26 +00:00
Kirk McKusick	cc3962a9cd	Delete README as it is now obsolete. Relevant information is in README.softupdates.	2000-07-08 02:32:49 +00:00
Kirk McKusick	876578906d	Update to reflect current status.	2000-07-08 02:31:21 +00:00
Kirk McKusick	22e5a6234e	Get userland visible flags added for snapshots to give a few days advance preparation for them to get migrated into place so that subsequent changes in utilities will not fail to compile for lack of up-to-date header files in /usr/include.	2000-07-04 04:58:34 +00:00
Poul-Henning Kamp	3275cf7379	Make the two calls from kern/* into softupdates #ifdef SOFTUPDATES, that is way cleaner than using the softupdates_stub stunt, which should be killed when convenient. Discussed with: mckusick	2000-07-03 13:26:54 +00:00
Andrey A. Chernov	2d90744fd8	Remove obsoleted info about linking from contrib	2000-06-24 13:29:25 +00:00
Kirk McKusick	858c16fab8	Update to new copyright.	2000-06-22 00:29:53 +00:00
Kirk McKusick	6019e6208f	When running with quotas enabled on a filesystem using soft updates, the system would panic when a user's inode quota was exceeded (see PR 18959 for details). This fixes that problem. PR: 18959 Submitted by: Jason Godsey <jason@unixguy.fidalgo.net>	2000-06-18 22:14:28 +00:00
Kirk McKusick	d3abb52714	Some additional performance improvements. When freeing an inode check to see if it has been committed to disk. If it has never been written, it can be freed immediately. For short lived files this change allows the same inode to be reused repeatedly. Similarly, when upgrading a fragment to a larger size, if it has never been claimed by an inode on disk, it too can be freed immediately making it available for reuse often in the next slowly growing block of the same file.	2000-06-18 22:05:57 +00:00
Poul-Henning Kamp	7c50d77218	Revert part of my bioops change which implemented panic(8).	2000-06-16 14:32:13 +00:00
Poul-Henning Kamp	7523681895	ARGH! I have too many source trees :-( Fix prototype errors in last commit.	2000-06-16 13:00:33 +00:00
Poul-Henning Kamp	a2e7a027a7	Virtualizes & untangles the bioops operations vector. Ref: Message-ID: <18317.961014572@critter.freebsd.dk> To: current@	2000-06-16 08:48:51 +00:00
Poul-Henning Kamp	6ea6805f8c	Remove a comment which should never have made it in.	2000-06-14 21:48:19 +00:00
Robert Watson	b2b0497ab5	o If FFS_EXTATTR is defined, don't print out an error message on unmount if an FFS partition returns EOPNOTSUPP, as it just means extended attributes weren't enabled on that partition. Prevents spurious warning per-partition at shutdown.	2000-06-04 04:50:36 +00:00
Jake Burkholder	e39756439c	Back out the previous change to the queue(3) interface. It was not discussed and should probably not happen. Requested by: msmith and others	2000-05-26 02:09:24 +00:00
Jake Burkholder	740a1973a6	Change the way that the queue(3) structures are declared; don't assume that the type argument to _HEAD and _ENTRY is a struct. Suggested by: phk Reviewed by: phk Approved by: mdodd	2000-05-23 20:41:01 +00:00
Robert Watson	f3706a0361	s/ffs_unmonut/ffs_unmount/ in a gratuitous ufs_extattr printf. Reported by: knu	2000-05-07 17:21:08 +00:00
Poul-Henning Kamp	9626b608de	Separate the struct bio related stuff out of <sys/buf.h> into <sys/bio.h>. <sys/bio.h> is now a prerequisite for <sys/buf.h> but it shall not be made a nested include according to bde's teachings on the subject of nested includes. Diskdrivers and similar stuff below specfs::strategy() should no longer need to include <sys/buf.h> unless they need caching of data. Still a few bogus uses of struct buf to track down. Repocopy by: peter	2000-05-05 09:59:14 +00:00
Poul-Henning Kamp	2c9b67a8df	Remove unneeded #include <vm/vm_zone.h> Generated by: src/tools/tools/kerninclude	2000-04-30 18:52:11 +00:00
Poul-Henning Kamp	87150cb06d	s/biowait/bufwait/g Prodded by: several.	2000-04-29 16:25:22 +00:00
Poul-Henning Kamp	eb95c536ad	Remove unneeded #include <sys/kernel.h>	2000-04-29 15:36:14 +00:00
Poul-Henning Kamp	3389ae9350	Remove ~25 unneeded #include <sys/conf.h> Remove ~60 unneeded #include <sys/malloc.h>	2000-04-19 14:58:28 +00:00
Robert Watson	a64ed08955	Introduce extended attribute support for FFS, allowing arbitrary (name, value) pairs to be associated with inodes. This support is used for ACLs, MAC labels, and Capabilities in the TrustedBSD security extensions, which are currently under development. In this implementation, attributes are backed to data vnodes in the style of the quota support in FFS. Support for FFS extended attributes may be enabled using the FFS_EXTATTR kernel option (disabled by default). Userland utilities and man pages will be committed in the next batch. VFS interfaces and man pages have been in the repo since 4.0-RELEASE and are unchanged. o ufs/ufs/extattr.h: UFS-specific extattr defines o ufs/ufs/ufs_extattr.c: bulk of support routines o ufs/{ufs,ffs,mfs}/*.[ch]: hooks and extattr.h includes o contrib/softupdates/ffs_softdep.c: extattr.h includes o conf/options, conf/files, i386/conf/LINT: added FFS_EXTATTR o coda/coda_vfsops.c: XXX required extattr.h due to ufsmount.h (This should not be the case, and will be fixed in a future commit) Currently attributes are not supported in MFS. This will be fixed. Reviewed by: adrian, bp, freebsd-fs, other unthanked souls Obtained from: TrustedBSD Project	2000-04-15 03:34:27 +00:00
Poul-Henning Kamp	c244d2de43	Move B_ERROR flag to b_ioflags and call it BIO_ERROR. (Much of this done by script) Move B_ORDERED flag to b_ioflags and call it BIO_ORDERED. Move b_pblkno and b_iodone_chain to struct bio while we transition, they will be obsoleted once bio structs chain/stack. Add bio_queue field for struct bio aware disksort. Address a lot of stylistic issues brought up by bde.	2000-04-02 15:24:56 +00:00
Poul-Henning Kamp	b99c307a21	Rename the existing BUF_STRATEGY() to DEV_STRATEGY() substitute BUF_WRITE(foo) for VOP_BWRITE(foo->b_vp, foo) substitute BUF_STRATEGY(foo) for VOP_STRATEGY(foo->b_vp, foo) This patch is machine generated except for the ccd.c and buf.h parts.	2000-03-20 11:29:10 +00:00
Poul-Henning Kamp	21144e3bf1	Remove B_READ, B_WRITE and B_FREEBUF and replace them with a new field in struct buf: b_iocmd. The b_iocmd is enforced to have exactly one bit set. B_WRITE was bogusly defined as zero giving rise to obvious coding mistakes. Also eliminate the redundant struct buf flag B_CALL, it can just as efficiently be done by comparing b_iodone to NULL. Should you get a panic or drop into the debugger, complaining about "b_iocmd", don't continue. It is likely to write on your disk where it should have been reading. This change is a step in the direction towards a stackable BIO capability. A lot of this patch was machine generated (Thanks to style(9) compliance!) Vinum users: Greg has not had time to test this yet, be careful.	2000-03-20 10:44:49 +00:00
Kirk McKusick	584508a741	Use 64-bit math to calculate if we have hit our freespace limit. Necessary for coherent results on filesystems bigger than 0.5Tb.	2000-03-17 03:44:47 +00:00
Kirk McKusick	9f043878d0	Use 64-bit math to decide if optimization needs to be changed. Necessary for coherent results on filesystems bigger than 0.5Tb. Submitted by: Paul Saab <ps@yahoo-inc.com>	2000-03-15 07:08:36 +00:00
Matthew Dillon	f8fa53397f	Fix a 'freeing free block' panic in UFS. The problem occurs when the filesystem fills up. If the first indirect block exists and FFS is able to allocate deeper indirect blocks, but is not able to allocate the data block, FFS improperly unwinds the indirect blocks and leaves a block pointer hanging to a freed block. This will cause a panic later when the file is removed. The solution is to properly account for the first block-pointer-to-an-indirect-block we had to create in a balloc operation and then unwind it if a failure occurs. Detective work by: Ian Dowse <iedowse@maths.tcd.ie> Reviewed by: mckusick, Ian Dowse <iedowse@maths.tcd.ie> Approved by: jkh	2000-02-24 20:43:20 +00:00
Kirk McKusick	4434ff1d38	When writing out bitmap buffers, need to skip over ones that already have a write in progress. Otherwise one can get in an infinite loop trying to get them all flushed. Submitted by: Matthew Dillon <dillon@apollo.backplane.com>	2000-01-30 20:32:59 +00:00
Kirk McKusick	57a91f6fb0	During fastpath processing for removal of a short-lived inode, the set of restrictions for cancelling an inode dependency (inodedep) is somewhat stronger than originally coded. Since this check appears in two places, we codify it into the function check_inode_unwritten which we then call from the two sites, one freeing blocks and the other freeing directory entries. Submitted by: Steinar Haug via Matthew Dillon	2000-01-18 01:33:05 +00:00
Kirk McKusick	4c6adb0622	Need to reorganize the flushing of directory entry (pagedep) dependencies so that they never try to lock an inode corresponding to ".." as this can lead to deadlock. We observe that any inode with an updated link count is always pushed into its buffer at the time of the link count change, so we do not need to do a VOP_UPDATE, but merely find its buffer and write it. The only time we need to get the inode itself is from the result of a mkdir whose name will never be ".." and hence locking such an inode will never request a lock above us in the filesystem tree. Thanks to Brian Fundakowski Feldman for providing the test program that tickled soft updates into hanging in "inode" sleep. Submitted by: Brian Fundakowski Feldman <green@FreeBSD.org>	2000-01-18 01:30:03 +00:00
Kirk McKusick	105ef72c55	Better bounding on softdep_flushfiles; other minor tweaks to checks.	2000-01-17 06:35:11 +00:00
Kirk McKusick	107d5039ef	Must track multiple uncommitted renames until one ultimately gets committed to disk or is removed.	2000-01-17 06:28:18 +00:00
Matthew Dillon	173cce7c8e	Non-operational change, fix compiler warning. Reviewed by: mckusick	2000-01-14 04:39:28 +00:00
Kirk McKusick	d7127837a2	Confirming Peter's fix (locking 101: release the lock before you go to sleep). Locking 101, part 2: do not look at buffer contents after you have been asleep. There is no telling what wondrous changes may have occurred.	2000-01-13 20:03:22 +00:00
Peter Wemm	7f473504e6	Free the global softupdates lock prior to tsleep() in getdirtybuf(). This seems to be responsible for a bunch of panics where the process sleeps and something else finds softupdates "locked" when it shouldn't be. This commit is unreviewed, but has been a big help here. Previously my boxes would panic pretty much on the first fsync() that wrote something to disk.	2000-01-13 18:48:12 +00:00
Kirk McKusick	1c2ceb2880	Because cylinder group blocks are now written in background, it is no longer sufficient to get a lock on a buffer to know that its write has been completed. We have to first get the lock on the buffer, then check to see if it is doing a background write. If it is doing background write, we have to wait for the background write to finish, then check to see if that fulfilled our dependency, and if not to start another write. Luckily the explanation is longer than the fix.	2000-01-13 07:20:01 +00:00
Kirk McKusick	94313add1f	A panic occurs during an fsync when a dirty block associated with a vnode has not been written (which would clear certain of its dependencies). The problem arises because fsync with MNT_NOWAIT no longer pushes all the dirty blocks associated with a vnode. It skips those that require rollbacks, since they will just get instantly dirty again. Such skipped blocks are marked so that they will not be skipped a second time (otherwise circular dependencies would never clear). So, we fsync twice to ensure that everything will be written at least once.	2000-01-13 07:17:39 +00:00
Kirk McKusick	4ed62fbd7f	The only known cause of this panic is running out of disk space. The problem occurs when an indirect block and a data block are being allocated at the same time. For example when the 13th block of the file is written, the filesystem needs to allocate the first indirect block and a data block. If the indirect block allocation succeeds, but the data block allocation fails, the error code deallocates the indirect block as it has nothing at which to point. Unfortunately, it does not deallocate the indirect block's associated dependencies which then fail when they find the block unexpectedly gone (ptr == 0 instead of its expected value). The fix is to fsync the file before doing the block rollback, as the fsync will flush out all of the dependencies. Once the rollback is done the file must be fsync'ed again so that the soft updates code does not find unexpected changes. This approach is much slower than writing the code to back out the extraneous dependencies, but running out of disk space is not expected to be a common occurrence, so just getting it right is the main criterion. PR: kern/15063 Submitted by: Assar Westerlund <assar@stacken.kth.se>	2000-01-11 08:27:00 +00:00
Kirk McKusick	10767f840b	We cannot proceed to free the blocks of the file until the dependencies have been cleaned up by deallocate_dependencies(). Once that is done, it is safe to post the request to free the blocks. A similar change is also needed for the freefile case.	2000-01-11 06:52:35 +00:00
Poul-Henning Kamp	ba4ad1fcea	Give vn_isdisk() a second argument where it can return a suitable errno. Suggested by: bde	2000-01-10 12:04:27 +00:00
Kirk McKusick	26e5527c86	Missing FREE_LOCK call before handle_workitem_freeblocks. Submitted by: "Kenneth D. Merry" <ken@kdm.org>	2000-01-10 08:39:03 +00:00
Kirk McKusick	cf60e8e4bf	Several performance improvements for soft updates have been added: 1) Fastpath deletions. When a file is being deleted, check to see if it was so recently created that its inode has not yet been written to disk. If so, the delete can proceed to immediately free the inode. 2) Background writes: No file or block allocations can be done while the bitmap is being written to disk. To avoid these stalls, the bitmap is copied to another buffer which is written thus leaving the original available for further allocations. 3) Link count tracking. Constantly track the difference in i_effnlink and i_nlink so that inodes that have had no change other than i_effnlink need not be written. 4) Identify buffers with rollback dependencies so that the buffer flushing daemon can choose to skip over them.	2000-01-10 00:24:24 +00:00
Kirk McKusick	f0f7d38386	Keep tighter control of removal dependencies by limiting the number of dirrem structures rather than the collaterally created freeblks and freefile structures. Limit the rate of buffer dirtying by the syncer process during periods of intense file removal.	2000-01-09 23:35:38 +00:00
Kirk McKusick	3f5b28bc07	Reorganize softdep_fsync so that it only does the inode-is-flushed check before the inode is unlocked while grabbing its parent directory. Once it is unlocked, other operations may slip in that could make the inode-is-flushed check fail. Allowing other writes to the inode before returning from fsync does not break the semantics of fsync since we have flushed everything that was dirty at the time of the fsync call.	2000-01-09 23:14:57 +00:00
Kirk McKusick	e2dc60835d	Get rid of unreferenced function.	2000-01-09 22:42:42 +00:00
Kirk McKusick	83aaf63ab2	Make static non-exported functions from soft updates.	2000-01-09 22:40:09 +00:00
Peter Wemm	c447342094	Change #ifdef KERNEL to #ifdef _KERNEL in the public headers. "KERNEL" is an application space macro and the applications are supposed to be free to use it as they please (but cannot). This is consistent with the other BSDs who made this change quite some time ago. More commits to come.	1999-12-29 05:07:58 +00:00
Bruce Evans	7e58bfacbe	Update the unclean flag for mount -u. I forgot to handle this case when I made the absence of the clean flag sticky in rev.1.88. This was a problem mainly for "mount /". There is no way to mount "/" for writing without using mount -u (normally implicitly), so after "mount -f /" of an unclean filesystem, the absence of the clean flag was sticky forever.	1999-12-23 15:42:14 +00:00
Eivind Eklund	369dc8ceb8	Change incorrect NULLs to 0s	1999-12-21 11:14:12 +00:00
Robert Watson	91f37dcba1	Second pass commit to introduce new ACL and Extended Attribute system calls, vnops, vfsops, both in /kern, and to individual file systems that require a vfsop_ array entry. Reviewed by: eivind	1999-12-19 06:08:07 +00:00
Kirk McKusick	6a4152243f	The function request_cleanup() had a tsleep() with PCATCH. It is quite dangerous, since the process may hold locks at the point, and if it is stopped in that tsleep the machine may hang. Because the sleep is so short, the PCATCH is not required here, so it has been removed. For the future, the FreeBSD team needs to decide whether it is still reasonable to stop a process in tsleep, as that may affect any other code that uses PCATCH while holding kernel locks. Submitted by: Dmitrij Tejblum <tejblum@arc.hq.cti.ru> Reviewed by: Kirk McKusick <mckusick@mckusick.com>	1999-12-16 22:02:09 +00:00
Eivind Eklund	762e6b856c	Introduce NDFREE (and remove VOP_ABORTOP)	1999-12-15 23:02:35 +00:00
Eivind Eklund	6bdfe06ad9	Lock reporting and assertion changes. * lockstatus() and VOP_ISLOCKED() gets a new process argument and a new return value: LK_EXCLOTHER, when the lock is held exclusively by another process. * The ASSERT_VOP_(UN)LOCKED family is extended to use what this gives them * Extend the vnode_if.src format to allow more exact specification than locked/unlocked. This commit should not do any semantic changes unless you are using DEBUG_VFS_LOCKS. Discussed with: grog, mch, peter, phk Reviewed by: peter	1999-12-11 16:13:02 +00:00
Bill Fumerola	43cd4e8815	Remove the 'alpha, use at your own risk' death-statement. Reviewed by: mckusick (verbally at FreeBSDcon)	1999-12-03 00:40:31 +00:00
Bill Fumerola	cfa5001489	Fix typo, add $FreeBSD$	1999-12-03 00:34:26 +00:00
Kirk McKusick	9f54c05286	Preferentially allocate the first indirect block in the same cylinder group as the inode. This makes a 15% difference in read speed for files in the 96K to 500K size range.	1999-12-01 19:33:12 +00:00
Poul-Henning Kamp	38224dcd59	Convert various pieces of code to use vn_isdisk() rather than checking for vp->v_type == VBLK. In ccd: we don't need to call VOP_GETATTR to find the type of a vnode. Reviewed by: sos	1999-11-22 10:33:55 +00:00
Eivind Eklund	b2f2b704d0	We do not have ffs_checkexp, so remove the prototype	1999-11-20 16:44:44 +00:00
Poul-Henning Kamp	0429e37ade	struct mountlist and struct mount.mnt_list have no business being a CIRCLEQ. Change them to TAILQ_HEAD and TAILQ_ENTRY respectively. This removes ugly mp != (void*)&mountlist comparisons. Requested by: phk Submitted by: Jake Burkholder jake@checker.org PR: 14967	1999-11-20 10:00:46 +00:00
Poul-Henning Kamp	698f9cf828	Next step in the device cleanup process. Correctly lock vnodes when calling VOP_OPEN() from filesystem mount code. Unify spec_open() for bdev and cdev cases. Remove the disabled bdev specific read/write code.	1999-11-09 14:15:33 +00:00
Bruce Evans	5bd5c8b9e5	Quick fix for breakage of ext2fs link counts as reported by stat(2) by the soft updates changes: only report the link count to be i_effnlink in ufs_getattr() for file systems that maintain i_effnlink. Tested by: Mike Dracopoulos <mdraco@math.uoa.gr>	1999-11-03 12:05:39 +00:00
Mike Smith	6d14782861	Newline-terminate the complaint message about not being able to find the root vnode pointer.	1999-11-01 23:57:28 +00:00
Poul-Henning Kamp	923502ff91	useracc() the prequel: Merge the contents (less some trivial bordering the silly comments) of <vm/vm_prot.h> and <vm/vm_inherit.h> into <vm/vm.h>. This puts the #defines for the vm_inherit_t and vm_prot_t types next to their typedefs. This paves the road for the commit to follow shortly: change useracc() to use VM_PROT_{READ|WRITE} rather than B_{READ|WRITE} as argument.	1999-10-29 18:09:36 +00:00
Poul-Henning Kamp	b89392e703	Remove the D_NOCLUSTER[RW] options which were added because vn had problems. Now that Matt has fixed vn, this can go. The vn driver should have used d_maxio (now si_iosize_max) anyway.	1999-09-30 07:11:30 +00:00
Poul-Henning Kamp	1b5464ef9d	Remove v_maxio from struct vnode. Replace it with mnt_iosize_max in struct mount. Nits from: bde	1999-09-29 20:05:33 +00:00
Alfred Perlstein	c24fda81c9	Separate the export check in VFS_FHTOVP, exports are now checked via VFS_CHECKEXP. Add fh(open|stat|statfs) syscalls to allow userland to query filesystems based on (network) filehandle. Obtained from: NetBSD	1999-09-11 00:46:08 +00:00
Peter Wemm	280652828b	$Id$ -> $FreeBSD$	1999-08-28 02:16:32 +00:00
Peter Wemm	c3aac50f28	$Id$ -> $FreeBSD$	1999-08-28 01:08:13 +00:00
Poul-Henning Kamp	41d2e3e09e	Introduce vn_isdisk(struct vnode *vp) function, and use it to test for diskness.	1999-08-25 12:24:39 +00:00
Sheldon Hearn	740e3a15f7	Fix bug introduced in rev 1.28, which causes kernel build to break for the case where DEBUG is defined but not DIAGNOSTIC. ffs_checkblk is declared conditionally on DIAGNOSTIC, not DEBUG. PR: 13314 Reviewed by: bde	1999-08-24 08:39:41 +00:00
Bruce Evans	d918320517	Use devtoname() to print dev_t's instead of casting them to long or u_long for misprinting in %lx format.	1999-08-23 20:35:21 +00:00
Poul-Henning Kamp	7dc5cd047f	The bdevsw() and cdevsw() are now identical, so kill the former.	1999-08-13 10:29:38 +00:00
Poul-Henning Kamp	0ef1c82630	Decommission miscfs/specfs/specdev.h. Most of it goes into <sys/conf.h>, a few lines into <sys/vnode.h>. Add a few fields to struct specinfo, paving the way for the fun part.	1999-08-08 18:43:05 +00:00
Kirk McKusick	4dc0c8f521	Create the macro DOINGASYNC to check whether the MNT_ASYNC flag has been set for a mount point. Insert missing checks to ensure that all write operations are done asynchronously when the MNT_ASYNC option has been requested. Submitted by: Craig A Soules <soules+@andrew.cmu.edu> Reviewed by: Kirk McKusick <mckusick@mckusick.com>	1999-07-13 18:20:13 +00:00
Poul-Henning Kamp	68de329e34	Use the fsid from the superblock, unless it looks bogus or has already been taken by some other filesystem.	1999-07-11 19:16:50 +00:00
Ollivier Robert	7fe29b0aef	Add $Id$ Approved by: kirk	1999-07-07 07:51:04 +00:00
John Polstra	24755bdc25	Update pathnames for new location of soft-updates sources.	1999-07-03 21:34:05 +00:00
Kirk McKusick	48703fedf1	No longer need to set B_ASYNC flag since BUF_KERNPROC now unconditionally sets the identity of the buffer.	1999-06-29 15:57:40 +00:00
Peter Wemm	a6451da76b	Keep the inlines for <sys/buf.h> happy..	1999-06-27 13:26:23 +00:00
Kirk McKusick	67812eacd7	Convert buffer locking from using the B_BUSY and B_WANTED flags to using lockmgr locks. This commit should be functionally equivalent to the old semantics. That is, all buffer locking is done with LK_EXCLUSIVE requests. Changes to take advantage of LK_SHARED and LK_RECURSIVE will be done in future commits.	1999-06-26 02:47:16 +00:00
Kirk McKusick	7481264c1e	On our final pass through ffs_fsync, do all I/O synchronously so that we can find out if our flush is failing because of write errors. This change avoids a "flush failed" panic during unrecoverable disk errors.	1999-06-18 05:49:46 +00:00
Kirk McKusick	f9c8cab591	Add a vnode argument to VOP_BWRITE to get rid of the last vnode operator special case. Delete special case code from vnode_if.sh, vnode_if.src, umap_vnops.c, and null_vnops.c.	1999-06-16 23:27:55 +00:00
Kirk McKusick	e4ab40bcb6	Get rid of the global variable rushjob and replace it with a function in kern/vfs_subr.c named speedup_syncer() which handles the speedup request. Change the various clients of rushjob to use the new function.	1999-06-15 23:37:29 +00:00
Poul-Henning Kamp	2447bec829	Simplify cdevsw registration. The cdevsw_add() function now finds the major number(s) in the struct cdevsw passed to it. cdevsw_add_generic() is no longer needed, cdevsw_add() does the same thing. cdevsw_add() will print a message if the d_maj field looks bogus. Remove nblkdev and nchrdev variables. Most places they were used bogusly. Instead check a dev_t for validity by seeing if devsw() or bdevsw() returns NULL. Move bdevsw() and devsw() functions to kern/kern_conf.c Bump __FreeBSD_version to 400006 This commit removes: 72 bogus makedev() calls 26 bogus SYSINIT functions if_xe.c bogusly accessed cdevsw[], author/maintainer please fix. I4b and vinum not changed. Patches emailed to authors. LINT probably broken until they catch up.	1999-05-31 11:29:30 +00:00
Julian Elischer	2e897e94b6	Cosmetic changes to make it compile without errors in gcc -Wall	1999-05-22 04:43:04 +00:00
Kirk McKusick	c2606ec5c6	Add a hook to ffs_fsync to allow soft updates to get first chance at doing a sync on the block device for the filesystem. That allows it to push the bitmap blocks before the inode blocks which greatly reduces the number of inode rollbacks that need to be done.	1999-05-14 01:26:46 +00:00
Peter Wemm	51b5226683	Try and fix a dev_t/major/minor etc nit.	1999-05-12 22:32:07 +00:00
Kirk McKusick	71a0942aca	Put back changes that might be causing trouble on Alpha.	1999-05-09 19:39:54 +00:00
Poul-Henning Kamp	4be2eb8c49	I got tired of seeing all the cdevsw[major(foo)] all over the place. Made a new (inline) function devsw(dev_t dev) and substituted it. Changed the BDEV variant to this format as well: bdevsw(dev_t dev) DEVFS will eventually benefit from this change too.	1999-05-08 06:40:31 +00:00
Poul-Henning Kamp	46eede0058	Continue where Julian left off in July 1998: Virtualize bdevsw[] from cdevsw. bdevsw() is now an (inline) function. Join CDEV_MODULE and BDEV_MODULE to DEV_MODULE (please pay attention to the order of the cmaj/bmaj arguments!) Join CDEV_DRIVER_MODULE and BDEV_DRIVER_MODULE to DEV_DRIVER_MODULE (ditto!) (Next step will be to convert all bdev dev_t's to cdev dev_t's before they get to do any damage^H^H^H^H^H^Hwork in the kernel.)	1999-05-07 10:11:40 +00:00
Kirk McKusick	36cfb417de	Whitespace cleanup.	1999-05-07 05:21:16 +00:00
Kirk McKusick	7957996abd	Get rid of random debugging cruft; sync up with latest version.	1999-05-07 05:11:31 +00:00
Kirk McKusick	224a6aa241	Severe slowdowns have been reported when creating or removing many files at once on a filesystem running soft updates. The root of the problem is that soft updates limits the amount of memory that may be allocated to dependency structures so as to avoid hogging kernel memory. The original algorithm just waited for the disk I/O to catch up and reduce the number of dependencies. This new code takes a much more aggressive approach. Basically there are two resources that routinely hit the limit. Inode dependencies during periods with a high file creation rate and file and block removal dependencies during periods with a high file removal rate. I have attacked these problems from two fronts. When the inode dependency limits are reached, I pick a random inode dependency, UFS_UPDATE it together with all the other dirty inodes contained within its disk block and then write that disk block. This trick usually clears 5-50 inode dependencies in a single disk I/O. For block and file removal dependencies, I pick a random directory page that has at least one remove pending and VOP_FSYNC its directory. That releases all its removal dependencies to the work queue. To further hasten things along, I also immediately start the work queue process rather than waiting for its next one second scheduled run.	1999-05-07 02:26:47 +00:00
Peter Wemm	dfd5dee1b0	Add sufficient braces to keep egcs happy about potentially ambiguous if/else nesting.	1999-05-06 18:13:11 +00:00
Alan Cox	4221e284a3	The VFS/BIO subsystem contained a number of hacks in order to optimize piecemeal, middle-of-file writes for NFS. These hacks have caused no end of trouble, especially when combined with mmap(). I've removed them. Instead, NFS will issue a read-before-write to fully instantiate the struct buf containing the write. NFS does, however, optimize piecemeal appends to files. For most common file operations, you will not notice the difference. The sole remaining fragment in the VFS/BIO system is b_dirtyoff/end, which NFS uses to avoid cache coherency issues with read-merge-write style operations. NFS also optimizes the write-covers-entire-buffer case by avoiding the read-before-write. There is quite a bit of room for further optimization in these areas. The VM system marks pages fully-valid (AKA vm_page_t->valid = VM_PAGE_BITS_ALL) in several places, most notably in vm_fault. This is not correct operation. The vm_pager_get_pages() code is now responsible for marking VM pages all-valid. A number of VM helper routines have been added to aid in zeroing-out the invalid portions of a VM page prior to the page being marked all-valid. This operation is necessary to properly support mmap(). The zeroing occurs most often when dealing with file-EOF situations. Several bugs have been fixed in the NFS subsystem, including bits handling file and directory EOF situations and buf->b_flags consistency issues relating to clearing B_ERROR & B_INVAL, and handling B_DONE. getblk() and allocbuf() have been rewritten. B_CACHE operation is now formally defined in comments and more straightforward in implementation. B_CACHE for VMIO buffers is based on the validity of the backing store. B_CACHE for non-VMIO buffers is based simply on whether the buffer is B_INVAL or not (B_CACHE set if B_INVAL clear, and vice versa). biodone() is now responsible for setting B_CACHE when a successful read completes. B_CACHE is also set when a bdwrite() is initiated and when a bwrite() is initiated. VFS VOP_BWRITE routines (there are only two - nfs_bwrite() and bwrite()) are now expected to set B_CACHE. This means that bowrite() and bawrite() also set B_CACHE indirectly. There are a number of places in the code which were previously using buf->b_bufsize (which is DEV_BSIZE aligned) when they should have been using buf->b_bcount. These have been fixed. getblk() now clears B_DONE on return because the rest of the system is so bad about dealing with B_DONE. Major fixes to NFS/TCP have been made. A server-side bug could cause requests to be lost by the server due to nfs_realign() overwriting other rpc's in the same TCP mbuf chain. The server's kernel must be recompiled to get the benefit of the fixes. Submitted by: Matthew Dillon <dillon@apollo.backplane.com>	1999-05-02 23:57:16 +00:00
Mike Smith	f4711b2df4	Simplify the tunefs example, since tunefs uses getfsfile(). Lots of people complain about working out what device their filesystems are mounted on.	1999-04-27 21:11:19 +00:00
Kirk McKusick	38e28fd66b	Reorganize locking to avoid holding the lock during calls to bdwrite and brelse (which may sleep in some systems). Obtained from: Matthew Dillon <dillon@apollo.backplane.com>	1999-03-02 06:38:07 +00:00
Kirk McKusick	eef33ce9bd	When fsync'ing a file on a filesystem using soft updates, we first try to write all the dirty blocks. If some of those blocks have dependencies, they will be remarked dirty when the I/O completes. On systems with really fast I/O systems, it is possible to get in an infinite loop trying to flush the buffers, because the I/O finishes before we can get all the dirty buffers off the v_dirtyblkhd list and into the I/O queue. (The previous algorithm looped over the v_dirtyblkhd list writing out buffers until the list emptied.) So, now we mark each buffer that we try to write so that we can distinguish the ones that are being remarked dirty from those that we have not yet tried to flush. Once we have tried to push every buffer once, we then push any associated metadata that is causing the remaining buffers to be redirtied. Submitted by: Matthew Dillon <dillon@apollo.backplane.com>	1999-03-02 04:04:31 +00:00
Kirk McKusick	4cbb89d95d	Ensure that softdep_sync_metadata can handle bmsafemap and mkdir entries if they ever arise (which should not happen as softdep_sync_metadata is currently used).	1999-03-02 00:19:47 +00:00
Kirk McKusick	133ff2619a	fix double LIST_REMOVE; other cosmetic changes to match version 9.32. Obtained from: Jeffrey Hsu <hsu@FreeBSD.ORG>	1999-02-17 20:01:20 +00:00
Matthew Dillon	8aef171243	Fix warnings in preparation for adding -Wall -Wcast-qual to the kernel compile	1999-01-28 00:57:57 +00:00
David Greenman	8ab2fa0073	Gutted softdep_deallocate_dependencies and replaced it with a panic. It turns out to not be useful to unwind the dependencies and continue in the face of a fatal error. Also changed the log() to a printf() in softdep_error() so that it will be output in the case of an impending panic. Submitted by: Kirk McKusick <mckusick@mckusick.com>	1999-01-22 09:07:32 +00:00
Eivind Eklund	5b1b6c5859	Silence warning about unused debug function. (I'll turn this function into a DDB command in my next staticization sweep).	1999-01-12 11:42:41 +00:00
Eivind Eklund	a862221fa0	Add a warning about the copyright restraints.	1999-01-08 16:03:12 +00:00
Bruce Evans	de5d1ba57c	Don't pass unused timestamp args to UFS_UPDATE() or waste time initializing them. This almost finishes centralizing (in-core) timestamp updates in ufs_itimes().	1999-01-07 16:14:19 +00:00
Bruce Evans	4591d9bb7e	UFS_UPDATE() takes a boolean `waitfor' arg, so don't pass it the value MNT_WAIT when we mean boolean `true' or check for that value not being passed. There was no problem in practice because MNT_WAIT had the magic value of 1.	1999-01-06 18:18:06 +00:00
Bruce Evans	d64dbc8719	Ifdefed the conditionally used variable `prtrealloc'. Declare it as volatile so that there is no chance that the code that it controls is optimised away.	1999-01-06 17:04:33 +00:00
Bruce Evans	5991fd0370	Backed out rev.1.47. It just broke my optimisations for lazy syncing of timestamps in rev.1.45. The soft updates bug was elsewhere. Forgotten by: luoqi	1999-01-06 16:52:38 +00:00
Eivind Eklund	fb1167777a	Remove the 'waslocked' parameter to vfs_object_create().	1999-01-05 18:50:03 +00:00
Eivind Eklund	a777e82019	Remove the last clients of vfs_object_create(..., waslocked=1); waslocked will go away shortly. Reviewed by: dg	1999-01-02 01:32:36 +00:00
Julian Elischer	1f35e8c8da	Remove some compiler warnings.	1998-12-10 20:11:47 +00:00
Bruce Evans	672be20b9f	Don't use the strange null pointer constant `(ufs_daddr_t)0' in a call to VOP_BMAP(). Don't use uncast NULLs in the same call.	1998-11-29 03:12:06 +00:00
David Greenman	1c680b45a2	Restored the "reallocblks" code to its former glory. What this does is basically do an on-the-fly defragmentation of the FFS filesystem, changing file block allocations to make them contiguous. Thanks to Kirk McKusick for providing hints on what needed to be done to get this working.	1998-11-13 01:01:44 +00:00
Peter Wemm	2ec07c6614	Change dirty block list handling to use TAILQ macros.	1998-10-31 15:33:32 +00:00
Peter Wemm	40c8cfe552	Use TAILQ macros for clean/dirty block list processing. Set b_xflags rather than abusing the list next pointer with a magic number.	1998-10-31 15:31:29 +00:00
Jordan K. Hubbard	2dcc2f0693	Clarify a rather ambiguous debugging message.	1998-10-28 10:37:54 +00:00
Bruce Evans	b5ee16407f	Oops, the redundant tests for major numbers weren't redundant here. They checked for the magic major number for the "device" behind mfs mount points. Use a more obvious check for this device. Debugged by: Andrew Gallatin <gallatin@cs.duke.edu>	1998-10-27 11:47:08 +00:00
Bruce Evans	9c0619dace	Don't follow null bdevsw pointers. The `major(dev) < nblkdev' test rotted when bdevsw[] became sparse. We still depend on magic to avoid having to check that (v_rdev) device numbers in vnodes are not NODEV. Removed redundant `major(dev) < nblkdev' tests instead of updating them.	1998-10-25 19:02:48 +00:00
Poul-Henning Kamp	f5ef029e92	Nitpicking and dusting performed on a train. Removes trivial warnings about unused variables, labels and other lint.	1998-10-25 17:44:59 +00:00
Nate Williams	ed8d80c2de	Fix 'noatime' bug that was unrelated to use of noatime. The problem is caused when a directory block is compacted. When this occurs, softdep_change_directoryentry_offset() is called to relocate each directory entry and adjust its matching diradd structure, if any, to match the new location of the entry. The bug is that while softdep_change_directoryentry_offset() correctly adjusts the offsets of the diradd structures on the pd_diraddhd[] lists (which are not yet ready to be committed to disk), it fails to adjust the offsets of the diradd structures on the pd_pendinghd list (which are ready to be committed to disk). This causes the dependency structures to be inconsistent with the buf contents. Now, if the compaction has moved a directory entry to the same offset as one of the diradd structures on the pd_pendinghd list and a syscall is done that tries to remove this directory entry before this directory block has been written to disk (which would empty pd_pendinghd), a sanity check in newdirrem() will call panic() when it notices that the inode number in the entry that is to be removed doesn't match the inode number in the diradd structure with the offset of that entry. Reviewed by: Kirk McKusick <mckusick@McKusick.COM> Submitted by: Don Lewis <Don.Lewis@tsc.tdk.com>	1998-10-03 19:17:11 +00:00
Bruce Evans	0922cce61c	Fixed clean flag handling: - don't set the clean flag on unmount of an unclean filesystem that was (forcibly) mounted rw. - set the clean flag on rw -> ro update of a mounted initially-clean filesystem. - fixed some style bugs (mostly long lines). This uses the fs_flags field and FS_UNCLEAN state bit which were introduced in the softdep changes. NetBSD uses extra state bits in fs_clean. Reviewed by: luoqui	1998-09-26 04:59:42 +00:00
Luoqi Chen	e266594c25	Eliminate a race in VOP_FSYNC() when softupdates is enabled. Submitted by: Kirk McKusick <mckusick@McKusick.COM> Two minor changes are also included: 1. Remove gratuitous checks for error return from vn_lock with LK_RETRY set, vn_lock should always succeed in these cases. 2. Back out change rev. 1.36->1.37, which unnecessarily makes async mount a little more unstable. It also keeps us in sync with other BSDs. Suggested by: Bruce Evans <bde@zeta.org.au>	1998-09-24 15:02:46 +00:00
Luoqi Chen	f9e84c2fee	Restore pre-v1.44 behavior: always copy modified in-core inode to disk buffer. Otherwise some in-core inode changes might be lost, including important meta data (e.g. size) if softupdates is enabled.	1998-09-15 14:45:28 +00:00
Søren Schmidt	d024c95599	Remove the SLICE code. This clearly needs a lot more thought, and we don't need this to hunt us down in 3.0-RELEASE.	1998-09-14 19:56:42 +00:00
Bruce Evans	9164000766	Don't dereference an uninitialized pointer in dead code. The dead code gets executed if it is compiled without optimization.	1998-09-12 14:46:15 +00:00
Bruce Evans	8994ca3ce9	Removed statically configured mount type numbers (MOUNT_) and all references to them. The change a couple of days ago to ignore these numbers in statically configured vfsconf structs was slightly premature because the cd9660, cfs, devfs, ext2fs, nfs vfs's still used MOUNT_ instead of the number in their vfsconf struct.	1998-09-07 13:17:06 +00:00
Bruce Evans	ff261f16f6	Put the zombie ffs sysctl node in "notyet" state together with its few remaining children. Prepare it for MOUNT_UFS going away.	1998-09-07 11:50:19 +00:00
Poul-Henning Kamp	0375c9f2b8	Add a new vnode op, VOP_FREEBLKS(), which filesystems can use to inform device drivers about sectors no longer in use. Device-drivers receive the call through d_strategy, if they have D_CANFREE in d_flags. This allows flash based devices to erase the sectors and avoid pointlessly carrying them around in compactions. Reviewed by: Kirk Mckusick, bde Sponsored by: M-Systems (www.m-sys.com)	1998-09-05 14:13:12 +00:00
Bruce Evans	0492d857d1	Removed unused includes.	1998-08-17 19:09:36 +00:00
Julian Elischer	55d80b2df1	Handle the case of moving a directory onto the top of a sibling's child of the same name. Submitted by: Kirk Mckusick with fixes from luoqi Chen Obtained from: Whistle test tree.	1998-08-12 20:46:47 +00:00
Bruce Evans	ac1e407b32	Fixed printf format errors.	1998-07-11 07:46:16 +00:00
Julian Elischer	bcbd6c6fdd	Don't update superblock if mounted readonly, also fixes some problems with softupdates on root. More cleanups are needed here.. Submitted by: Luoqi Chen <luoqi@watermarkgroup.com>	1998-07-08 23:52:27 +00:00
Julian Elischer	fd5d1124e2	VOP_STRATEGY grows an (struct vnode *) argument as the value in b_vp is often not really what you want. (and needs to be frobbed). more cleanups will follow this. Reviewed by: Bruce Evans <bde@freebsd.org>	1998-07-04 20:45:42 +00:00
Bruce Evans	3055187290	Sync timestamp changes for inodes of special files to disk as late as possible (when the inode is reclaimed). Temporarily only do this if option UFS_LAZYMOD configured and softupdates aren't enabled. UFS_LAZYMOD is intentionally left out of /sys/conf/options. This is mainly to avoid almost useless disk i/o on battery powered machines. It's silly to write to disk (on the next sync or when the inode becomes inactive) just because someone hit a key or something wrote to the screen or /dev/null. PR: 5577 Previous version reviewed by: phk	1998-07-03 22:17:03 +00:00
Bruce Evans	33cc029eab	Centralized in-core inode update. Update the in-core inode directly in ufs_setattr() so that there is no need to pass timestamps to UFS_UPDATE() (everything else just needs the current time). Ignore the passed-in timestamps in UFS_UPDATE() and always call ufs_itimes() (was: itimes()) to do the update. The timestamps are still passed so that all the callers don't need to be changed yet.	1998-07-03 18:46:52 +00:00
Jordan K. Hubbard	d94ce17be4	Flesh this document out just a little in response to some user questions and also recommend linking over copying since, at this stage, a stale copy is a real concern.	1998-06-26 10:35:55 +00:00
Julian Elischer	c619155f0e	Slight change to directory cleanup Makes soft updates a bit cleaner. Eliminates some warnings about 'corrupted directories' from fsck.	1998-06-14 19:31:28 +00:00
Julian Elischer	28ed032673	Note which version of Kirk's sources this corresponds to.	1998-06-12 21:21:26 +00:00
Julian Elischer	aa75cb86b4	Fix the case when renaming to a file that you've just created and deleted, that had an inode that has not yet been written to disk, when the inode of the new file is also not yet written to disk, and your old directory entry is not yet on disk but you need to remove it and the new name exists in memory but has been deleted but the transaction to write the deleted name to disk exists and has not yet been cancelled by the request to delete the nonexistent name. I don't know how kirk could have missed such a glaring problem for so long. :-) Especially since the inconsistency survived on the disk for a whole 4 seconds on average before being fixed by other code. This was not a crashing bug but just led to filesystem inconsistencies if you crashed. Submitted by: Kirk McKusick (mckusick@mckusick.com)	1998-06-12 20:48:30 +00:00
Julian Elischer	6d0ba44288	Add B_NOCACHE to several cases where BSD4.4 only required a B_INVAL. Change worked out by john and kirk in consort.	1998-06-11 17:44:32 +00:00
Julian Elischer	8c221701c3	Fix for "live inode" panic. Submitted by: Kirk McKusick <mckusick@McKusick.COM> Reviewed by: yeah right...	1998-06-10 20:45:46 +00:00
Julian Elischer	4af0bb0f9e	Remove buggy debugging code.	1998-06-10 20:03:16 +00:00
Julian Elischer	939001af5c	Back out John's changes 1.45 -> 1.46 Kirk confirms that the original semantic was what he wanted... (well, a very slight difference) May fix "dangling deps" panic with soft updates.	1998-06-10 19:27:56 +00:00
Doug Rabson	8435e0aef5	Use size_t instead of u_int for sizes.	1998-06-04 17:21:39 +00:00
Julian Elischer	00076e7cf9	Add a reference to the original softupdates paper	1998-06-02 01:30:51 +00:00
Julian Elischer	3942b533f8	Add a reference to the Ganger/Patt paper	1998-06-02 01:27:27 +00:00
Julian Elischer	b8cf4de4c8	A fix to a debug test from Kirk.	1998-05-27 03:32:23 +00:00
Julian Elischer	928c9ddf81	Ensure that there is enough information here, so that people can use soft updates should they desire.	1998-05-19 23:18:37 +00:00
Julian Elischer	25db4e8a66	Bring up-to-date with Whistle's current version Includes some debugging code.	1998-05-19 23:07:25 +00:00
Julian Elischer	46e752be05	Merge with Kirk's version as of Feb 20 His version 9.23 == our version 1.5 of ffs_softdep.c His version 9.5 == our version 1.4 of softdep.c	1998-05-19 22:54:53 +00:00
Julian Elischer	62e12c760c	Merge in Kirk's changes to stop softupdates from hogging all of memory.	1998-05-19 21:45:53 +00:00
Julian Elischer	b6dad36385	Change to stop a silly panic. This should be understood better. Change a buffer swizzle trick to a bcopy. It would be nice if the efficient trick could be used in the future.	1998-05-19 20:50:41 +00:00
Julian Elischer	987614a910	First published FreeBSD version of soft updates Feb 5.	1998-05-19 20:18:42 +00:00
Julian Elischer	a697eb98d4	This commit was generated by cvs2svn to compensate for changes in r36206, which included commits to RCS files with non-trunk default branches.	1998-05-19 20:03:29 +00:00
Julian Elischer	8e95b94dec	Import the next version received from kirk after some FreeBSD feedback.	1998-05-19 20:03:29 +00:00
Julian Elischer	8d1c524575	This commit was generated by cvs2svn to compensate for changes in r36201, which included commits to RCS files with non-trunk default branches.	1998-05-19 19:47:22 +00:00
Julian Elischer	467e1a6e7a	Import the earliest version of the soft update code that I have.	1998-05-19 19:47:22 +00:00
Julian Elischer	c11d29814e	Try to stop the user from using mount -u to set the async flag on a filesystem currently using soft updates. Also needs a new copy of ffs_softdep.c to complete the fix.	1998-05-18 06:38:18 +00:00


1017 Commits