freebsd-skq

Author	SHA1	Message	Date
Poul-Henning Kamp	67673e6677	Create struct snapdata which contains the snapshot fields from cdev and the previously malloc'ed snapshot lock. Malloc struct snapdata instead of just the lock. Replace snapshot fields in cdev with pointer to snapdata (saves 16 bytes). While here, give the private readblock() function a vnode argument in preparation for moving UFS to access GEOM directly.	2004-09-13 07:29:45 +00:00
Poul-Henning Kamp	883d3c0c07	Remove the buffercache/vnode side of BIO_DELETE processing in preparation for integration of p4::phk_bufwork. In the future, local filesystems will talk to GEOM directly and they will consequently be able to issue BIO_DELETE directly. Since the removal of the fla driver, BIO_DELETE has effectively been a no-op anyway.	2004-09-13 06:50:42 +00:00
John Baldwin	b72ea57f3b	Generalize the UFS bad magic value used to determine when a filesystem has only been partly initialized via newfs(8) so that it applies to both UFS1 and UFS2. Submitted by: "Xin LI" delphij at frontfree dot net MFC: maybe?	2004-08-19 11:09:13 +00:00
John-Mark Gurney	ad3b9257c2	Add locking to the kqueue subsystem. This also makes the kqueue subsystem a more complete subsystem, and removes the knowledge of how things are implemented from the drivers. Include locking around filter ops, so a module like aio will know when not to be unloaded if there are outstanding knotes using its filter ops. Currently, it uses the MTX_DUPOK even though it is not always safe to acquire duplicate locks. Witness currently doesn't support the ability to discover if a dup lock is ok (in some cases). Reviewed by: green, rwatson (both earlier versions)	2004-08-15 06:24:42 +00:00
Poul-Henning Kamp	7ac439fec4	use bufdone() not biodone().	2004-08-08 13:23:05 +00:00
Poul-Henning Kamp	5e8c582ac2	Put a version element in the VFS filesystem configuration structure and refuse initializing filesystems with a wrong version. This will aid maintenance activities on the 5-stable branch. s/vfs_mount/vfs_omount/ s/vfs_nmount/vfs_mount/ Name our filesystems mount function consistently. Eliminate the nameidata argument to both vfs_mount and vfs_omount. It was originally there to save stack space. A few places abused it to get hold of some credentials to pass around. Effectively it is unused. Reorganize the root filesystem selection code.	2004-07-30 22:08:52 +00:00
Poul-Henning Kamp	d634f69316	Remove global variables rootdevs and rootvp, they are unused as such. Add local rootvp variables as needed. Remove checks for miniroots in the swap partition. We never did that and most of the filesystems could never be used for that, but it had still been copy&pasted all over the place.	2004-07-28 20:21:04 +00:00
Alexander Kabaev	b403319b8d	Avoid using casts as lvalues. Introduce DIP_SET macro which sets proper inode field based on UFS version. Use DIP to read values and DIP_SET to modify them throughout FFS code base.	2004-07-28 06:41:27 +00:00
Colin Percival	56f21b9d74	Rename suser_cred()'s PRISON_ROOT flag to SUSER_ALLOWJAIL. This is somewhat clearer, but more importantly allows for a consistent naming scheme for suser_cred flags. The old name is still defined, but will be removed in a few days (unless I hear any complaints...) Discussed with: rwatson, scottl Requested by: jhb	2004-07-26 07:24:04 +00:00
Poul-Henning Kamp	d8d3d4158b	Make sure to update the mnt_stats before UFS1 extattr tried to do I/O on the device. Otherwise the blocksize is undefined in the buffer cache.	2004-07-14 14:19:32 +00:00
Alfred Perlstein	f257b7a54b	Make VFS_ROOT() and vflush() take a thread argument. This is to allow filesystems to decide based on the passed thread which vnode to return. Several filesystems used curthread, they now use the passed thread.	2004-07-12 08:14:09 +00:00
Marcel Moolenaar	f65de26bf6	Update for the KDB debugger framework: o Make debugging code conditional upon KDB. o Use kdb_backtrace() instead of backtrace(). o Remove inclusion of opt_ddb.h.	2004-07-10 20:45:47 +00:00
Poul-Henning Kamp	c94cd5fc8c	Explicitly initialize vp->v_bsize.	2004-07-07 20:04:06 +00:00
Poul-Henning Kamp	e3c5a7a4dd	When we traverse the vnodes on a mountpoint we need to look out for our cached 'next vnode' being removed from this mountpoint. If we find that it was recycled, we restart our traversal from the start of the list. Code to do that is in all local disk filesystems (and a few other places) and looks roughly like this: MNT_ILOCK(mp); loop: for (vp = TAILQ_FIRST(&mp...); (vp = nvp) != NULL; nvp = TAILQ_NEXT(vp,...)) { if (vp->v_mount != mp) goto loop; MNT_IUNLOCK(mp); ... MNT_ILOCK(mp); } MNT_IUNLOCK(mp); The code which takes vnodes off a mountpoint looks like this: MNT_ILOCK(vp->v_mount); ... TAILQ_REMOVE(&vp->v_mount->mnt_nvnodelist, vp, v_nmntvnodes); ... MNT_IUNLOCK(vp->v_mount); ... vp->v_mount = something; (Take a moment and try to spot the locking error before you read on.) On an SMP system, one CPU could have removed nvp from our mountlist but not yet gotten to assign a new value to vp->v_mount while another CPU simultaneously gets to the top of the traversal loop where it finds that (vp->v_mount != mp) is not true despite the fact that the vnode has indeed been removed from our mountpoint. Fix: Introduce the macro MNT_VNODE_FOREACH() to traverse the list of vnodes on a mountpoint while taking into account that vnodes may be removed from the list as we go. This saves approx 65 lines of duplicated code. Split the insmntque() which potentially moves a vnode from one mount point to another into delmntque() and insmntque() which do just what the names say. Fix delmntque() to set vp->v_mount to NULL while holding the mountpoint lock.	2004-07-04 08:52:35 +00:00
Jun Kuriyama	86030e4a00	Avoid deadlock which is caused by locking VDIR of parent and VREG of snapshot itself in wrong order. We can skip unlink check of that directory because it must have snapshot in it. Reviewed by: mckusick and current@	2004-06-18 14:35:17 +00:00
Poul-Henning Kamp	89c9c53da0	Do the dreaded s/dev_t/struct cdev */ Bump __FreeBSD_version accordingly.	2004-06-16 09:47:26 +00:00
Julian Elischer	fa88511615	Nice is a property of a process as a whole. I mistakenly moved it to the ksegroup when breaking up the process structure. Put it back in the proc structure.	2004-06-16 00:26:31 +00:00
Stefan Farfeleder	1a5ff9285a	Avoid assignments to cast expressions. Reviewed by: md5 Approved by: das (mentor)	2004-06-08 13:08:19 +00:00
Tim J. Robbins	fa2a4d0595	Move TDF_DEADLKTREAT into td_pflags (and rename it accordingly) to avoid having to acquire sched_lock when manipulating it in lockmgr(), uiomove(), and uiomove_fromphys(). Reviewed by: jhb	2004-06-03 01:47:37 +00:00
Kirill Ponomarev	b4a1d9299a	- Fix typo Approved by: tobez	2004-05-31 16:55:12 +00:00
Ken Smith	4b14cc0205	Upon further review it was decided this piece of the msync(2) fixes was applicable to HEAD, originally it was thought this should only be done in RELENG_4. Implement IO_INVAL in the vnode op for writing by marking the buffer as "no cache". This fix has already been applied to RELENG_4 as Rev. 1.65.2.15 of ufs/ufs/ufs_readwrite.c. Reviewed by: alc, tegge	2004-05-21 12:05:48 +00:00
Ken Smith	83d8045f16	Style fixup in previous commit. Noticed by: bde (thanks!)	2004-05-19 18:06:21 +00:00
Ken Smith	f7dd67d801	Change ffs_realloccg() to set the valid bits for the extended part of the fragment to zero the valid parts of a VM_IO buffer. RE would like this to be part of 4.10-RC3 so this will be MFC-ed immediately. Reviewed by: alc, tegge	2004-05-14 22:00:08 +00:00
Bosko Milekic	451079d4ab	Revert previous change to this file because it breaks some things which compare /etc/fstab entries to results from getfsstat(). The real way to fix this is to make 'ufs2' a recognized filesystem (for real, no beating around the bush). This should fix things like 'umount -a -t ufs' now. Apologies for the previous breakage.	2004-04-29 15:10:42 +00:00
Bosko Milekic	2aebb586db	The previous change to mount(8) to report ufs or ufs2 used libufs, which only works for Charlie root. This change reverts the introduction of libufs and moves the check into the kernel. Since the f_fstypename is the same for both ufs and ufs2, we check fs_magic for presence of ufs2 and copy "ufs2" explicitly instead. Submitted by: Christian S.J. Peron <maneo@bsdpro.com>	2004-04-26 15:13:46 +00:00
Bruce Evans	f679aa45a7	Record where half the bits in this file came from (from ufs_readwrite.c). Damage to history from moving bits was especially large since a repo copy is not feasible for partial files.	2004-04-07 11:21:18 +00:00
Warner Losh	012d41340a	Remove advertising clause from University of California Regent's license, per letter dated July 22, 1999 and irc message from Robert Watson saying that clause 3 can be removed from those files with an NAI copyright that also have only a University of California copyright. Approved by: core, rwatson	2004-04-07 03:47:21 +00:00
John Baldwin	255ec151e6	Fix a paste-o from the buf_prewrite() cleanup commit and check for the MNTK_SUSPEND flag on the correct vnode pointer in softdep_disk_prewrite(). Reviewed by: phk Tested by: kensmith	2004-04-06 19:20:24 +00:00
Maxime Henrion	b1fddb236f	Fix the remaining warnings of growfs(8) on my sparc64 box with WARNS=6. I don't change the WARNS level in the Makefile because I didn't test this on other archs. The fs.h fix was suggested by: marcel Reviewed by: md5(1)	2004-04-03 23:30:59 +00:00
Alexander Kabaev	c355fd5a84	Avoid doing bawrite to initialize inode block while holding cylinder group block locked. If filesystem has any active snapshots, bawrite can come back trying to allocate new snapshot data block from the same cylinder group and cause panic due to recursive lock attempt. PR: 64206 Reviewed by: mckusick Tested by: pjd	2004-03-16 22:06:32 +00:00
Poul-Henning Kamp	ceb58ca58f	When I was a kid my work table was one cluttered mess and cleaning it up was a rather overwhelming task. I soon learned that if you don't know where you're going to store something, at least try to pile it next to something slightly related in the hope that a pattern emerges. Apply the same principle to the ffs/snapshot/softupdates code which has leaked into specfs: Add yet another buf-quasi-method and call it from the only two places I can see it can make a difference and implement the magic in ffs_softdep.c where it belongs. It's not pretty, but at least it's one less layer violated.	2004-03-11 18:50:33 +00:00
Poul-Henning Kamp	4d453ef101	Properly vector all bwrite() and BUF_WRITE() calls through the same path and s/BUF_WRITE()/bwrite()/ since it now does the same as bwrite().	2004-03-11 18:02:36 +00:00
Kirk McKusick	546a1660f0	In the function clear_inodedeps(), a FREE_LOCK() should be called AFTER the call to vn_start_write(), not before it. Otherwise, it is possible to unlock it multiple times if the vn_start_write() fails. Submitted by: Juergen Hannken-Illjes <hannken@eis.cs.tu-bs.de>	2004-02-23 06:56:31 +00:00
Bruce Evans	e9827c6d93	Fixed some style bugs: - don't unlock the vnode after vinvalbuf() only to have to relock it almost immediately. - don't refer to devices classified by vn_isdisk() as block devices.	2004-02-14 04:41:13 +00:00
Bruce Evans	0efb13948d	MFextfs: backed out secondary changes in rev.1.40 that had become just style bugs (a variable that is used only once, and misformattings).	2004-02-13 03:05:12 +00:00
Jun Kuriyama	df1941fb59	Fix style bugs in previous commit. Submitted by: bde	2004-02-13 02:02:06 +00:00
Bruce Evans	8adff5fc12	Fixed some minor style bugs (English usage and formatting of binary operators) in and near revs.1.169-1.170 (open mode bandaid). This (or better a proper fix) should have been done before cloning the bandaid to many other file systems.	2004-02-12 16:52:24 +00:00
Jun Kuriyama	5580f04ab0	Reverse lock order by using local variable. This will shut up "acquiring duplicate lock of same type" message. Reviewed by: mckusick	2004-02-12 08:52:08 +00:00
Bruce Evans	1723bc36ef	Removed more vestiges of vfs_ioopt: - rev.1.42 of ffs_readwrite.c added a special case in ffs_read() for reads that are initially at EOF, and rev.1.62 of ufs_readwrite.c fixed timestamp bugs in it. Removal of most of vfs_ioopt made it just an optimization, and removal of the vm object reference calls made it less than an optimization. It was cloned in rev.1.94 of ufs_readwrite.c as part of cloning ffs_extwrite() although it was always less than an optimization in ffs_extwrite(). - some comments, compound statements and vertical whitespace were vestiges of dead code.	2004-02-11 15:27:26 +00:00
John Baldwin	91d5354a2c	Locking for the per-process resource limits structure. - struct plimit includes a mutex to protect a reference count. The plimit structure is treated similarly to struct ucred in that it is always copy on write, so having a reference to a structure is sufficient to read from it without needing a further lock. - The proc lock protects the p_limit pointer and must be held while reading limits from a process to keep the limit structure from changing out from under you while reading from it. - Various global limits that are ints are not protected by a lock since int writes are atomic on all the archs we support and thus a lock wouldn't buy us anything. - All accesses to individual resource limits from a process are abstracted behind a simple lim_rlimit(), lim_max(), and lim_cur() API that return either an rlimit, or the current or max individual limit of the specified resource from a process. - dosetrlimit() was renamed to kern_setrlimit() to match existing style of other similar syscall helper functions. - The alpha OSF/1 compat layer no longer calls getrlimit() and setrlimit() (it didn't use the stackgap when it should have) but uses lim_rlimit() and kern_setrlimit() instead. - The svr4 compat no longer uses the stackgap for resource limits calls, but uses lim_rlimit() and kern_setrlimit() instead. - The ibcs2 compat no longer uses the stackgap for resource limits. It also no longer uses the stackgap for accessing sysctl's for the ibcs2_sysconf() syscall but uses kernel_sysctl() instead. As a result, ibcs2_sysconf() no longer needs Giant. - The p_rlimit macro no longer exists. Submitted by: mtm (mostly, I only did a few cleanups and catchups) Tested on: i386 Compiled on: alpha, amd64	2004-02-04 21:52:57 +00:00
Alan Cox	bfb7317ebf	Remove unnecessary vm object reference and deallocate calls from ffs_read() and ffs_write(). These calls trace their origins to the dead vfs_ioopt code, first appearing in revision 1.39 of ufs_readwrite.c. Observed by: bde Discussed with: tegge	2004-01-31 05:42:58 +00:00
Andrey A. Chernov	a0036d23a6	Turn uio_resid/uio_offset comments into KASSERTs Reviewed by: bde	2004-01-27 11:28:38 +00:00
Andrey A. Chernov	51cf017614	Copy comment about caller check from ffs_read to ffs_extread, don't check for uio_resid < 0 here too.	2004-01-23 06:00:41 +00:00
Andrey A. Chernov	070f8eefb1	Fix various panic() strings to reflect true function name to allow easy grep. Small code reorganization to look more logical. Copy ffs_write check from prev. commit to ffs_extwrite.	2004-01-23 05:52:31 +00:00
Andrey A. Chernov	bd0cc17757	ffs_read: Replace wrong check returned EFBIG with EOVERFLOW handling from POSIX: 36708 [EOVERFLOW] The file is a regular file, nbyte is greater than 0, the starting position is before the end-of-file, and the starting position is greater than or equal to the offset maximum established in the open file description associated with fildes. ffs_write: Replace u_int64_t cast with uoff_t cast which is more natural for types used. ffs_write & ffs_read: Remove uio_offset and uio_resid checks for negative values, the caller is supposed to do it already. Add comments about it. Reviewed by: bde	2004-01-23 05:38:02 +00:00
Alexander Kabaev	6bd39fe978	Spell magic '16' number as IO_SEQSHIFT.	2004-01-19 20:03:43 +00:00
Alexander Kabaev	291027ce9c	Avoid calling vprint on a vnode while holding its interlock mutex. Move diagnostic printf after vget. This might delay the debug output some, but at least it keeps kernel from exploding if DEBUG_VFS_LOCKS is in effect.	2004-01-04 04:08:34 +00:00
Don Lewis	31c81e4bed	Set fs_ronly to the correct value in ffs_reload() when reloading the file system super block after fsck has repaired the file system. The value of fs_ronly was getting overwritten, which caused ffs_update() to attempt to update inode timestamps even though the file system was still mounted read-only. This fixes the "giving up on N buffers" error that is triggered by running fsck on the root file system and then rebooting without mounting the file system read-write.	2003-12-07 05:16:52 +00:00
Wes Peters	ec52df8eb9	Write the UFS2 superblock with a 'BAD' magic number at the beginning of newfs, to signify the newfs operation has not yet completed. Re- write the superblock with the correct magic number once all of the cylinder groups have been created to show the operation has finished. Sponsored by: St. Bernard Software	2003-11-16 07:08:27 +00:00
Poul-Henning Kamp	00cbe31bd8	Send B_PHYS out to pasture, it no longer serves any function.	2003-11-15 09:28:09 +00:00
Alan Cox	c78b8dfacf	Call free(9) after the vnode interlock is released, avoiding a lock-order reversal.	2003-11-13 03:56:32 +00:00
Kirk McKusick	fde81c7d8e	Update the statfs structure with 64-bit fields to allow accurate reporting of multi-terabyte filesystem sizes. You should build and boot a new kernel BEFORE doing a `make world' as the new kernel will know about binaries using the old statfs structure, but an old kernel will not know about the new system calls that support the new statfs structure. Running an old kernel after a `make world' will cause programs such as `df' that do a statfs system call to fail with a bad system call. Reviewed by: Bruce Evans <bde@zeta.org.au> Reviewed by: Tim Robbins <tjr@freebsd.org> Reviewed by: Julian Elischer <julian@elischer.org> Reviewed by: the hoards of <arch@freebsd.org> Sponsored by: DARPA & NAI Labs.	2003-11-12 08:01:40 +00:00
Alexander Kabaev	ca430f2e92	Remove mntvnode_mtx and replace it with per-mountpoint mutex. Introduce two new macros MNT_ILOCK(mp)/MNT_IUNLOCK(mp) to operate on this mutex transparently. Eventually new mutex will be protecting more fields in struct mount, not only vnode list. Discussed with: jeff	2003-11-05 04:30:08 +00:00
Alexander Kabaev	45d45c6cde	Use VOP_UNLOCK/vrele instead of vput. td was received as a parameter and one cannot be sure it is equal to curthread.	2003-11-03 04:46:19 +00:00
Alexander Kabaev	cb9ddc80ae	Take care not to call vput if thread used in corresponding vget wasn't curthread, i.e. when we receive a thread pointer to use as a function argument. Use VOP_UNLOCK/vrele in these cases. The only case where td != curthread known at the moment is boot() calling sync with thread0 pointer. This fixes the panic on shutdown people have reported.	2003-11-02 04:52:53 +00:00
Alexander Kabaev	492c1e68fb	Temporarily undo parts of the struct mount locking commit by jeff. It is unsafe to hold a mutex across vput/vrele calls. This will be redone when a better locking strategy is agreed upon. Discussed with: jeff	2003-11-01 05:51:54 +00:00
Don Lewis	9f206707a5	Tweak the calculation of minbfree in ffs_dirpref() so that only those cylinder groups that have at least 75% of the average free space per cylinder group for that file system are considered as candidates for the creation of a new directory. The previous formula for minbfree would set it to zero if the file system was more than 75% full, which allowed cylinder groups with no free space at all to be chosen as candidates for directory creation, which resulted in an expensive search for free blocks for each file that was subsequently created in that directory. Modify the calculation of minifree in the same way. Decrease maxcontigdirs as the file system fills to decrease the likelihood that a cluster of directories will overflow the available space in a cylinder group. Reviewed by: mckusick Tested by: kmarx@vicor.com MFC after: 2 weeks	2003-10-31 07:25:06 +00:00
John Baldwin	787f162df6	Move the P_COWINPROGRESS flag from being a per-process p_flag to being a per-thread td_pflag which doesn't require any locks to read or write as it is only read or written by curthread on itself. Glanced at by: mckusick	2003-10-23 21:14:08 +00:00
Tor Egge	f0da6ec99b	Initialize bp->b_offset to the physical offset in partition so GEOM knows where to read from disk.	2003-10-22 18:57:59 +00:00
Poul-Henning Kamp	2c18019f14	DuH! bp->b_iooffset (the spot on the disk), not bp->b_offset (the offset in the file)	2003-10-18 14:10:28 +00:00
Poul-Henning Kamp	4e1694ecaf	Initialize bp->b_offset before calling VOP_[SPEC]STRATEGY()	2003-10-18 11:16:33 +00:00
Kirk McKusick	bd189c8c3e	When expunging unlinked files from a snapshot, skip over holes in the file rather than panicking with "indiracct: botched params". Submitted by: Mark Santcroos <marks@ripe.net>	2003-10-17 13:57:58 +00:00
Jeff Roberson	a844eb934c	- My last commit to this file is still not safe, I believe that it may be due to the recursion in indir_trunc().	2003-10-06 03:28:03 +00:00
Jeff Roberson	8af6a57099	- Reinstate 1.142 this was fixed by 1.144.	2003-10-06 02:39:37 +00:00
Jeff Roberson	69b609a85d	- The VCHR case in ffs_sync() is an unnecessary optimization especially considering how infrequently we access devices via ffs now that we have devfs. Collapse this case with the other case. Obtained from: bde	2003-10-05 22:56:33 +00:00
Jeff Roberson	ab1f917b53	- Further simplify ffs_sync(). The vnode lock is required for UFS_UPDATE() so make the code slightly more uniform. The vnode lock is acquired in all cases and now the only difference between VCHR and other is we call UFS_UPDATE instead of VOP_FSYNC().	2003-10-05 09:42:24 +00:00
Jeff Roberson	cffa37d466	- In ffs_update() assert that either the vnode lock or the XLOCK is held.	2003-10-05 09:39:02 +00:00
Jeff Roberson	2f05568aa8	- Check the XLOCK before inspecting v_data. - Slightly rewrite the fsync loop to be more lock friendly. We must acquire the vnode interlock before dropping the mnt lock. We must also check XLOCK to prevent vclean() races. - Use LK_INTERLOCK in the vget() in ffs_sync to further prevent vclean() races. - Use a local variable to store the results of the nvp == TAILQ_NEXT test so that we do not access the vp after we've vrele()d it. - Add an XXX comment about UFS_UPDATE() not being protected by any lock here. I suspect that it should need the VOP lock.	2003-10-05 07:16:45 +00:00
Jeff Roberson	53938b4a86	- Skip over xvp if XLOCK is set.	2003-10-05 06:48:37 +00:00
Alan Cox	ccf78b6895	Synchronize access to a vm page's valid field using the containing vm object's lock.	2003-10-04 20:38:32 +00:00
Jeff Roberson	cac3558da3	- The VI assert in getdirtybuf() is only valid if we're not on a VCHR vnode. VCHR vnodes don't do background writes. Reported by: kan	2003-10-04 15:57:05 +00:00
Jeff Roberson	04a17687ea	- Increase the scope of the interlock in ffs_reload(). Acquire it before we release the mntvnode_mtx. - Call vgonel() directly instead of going through vrecycle() since we own the interlock now. - Remove a few cases where we locked the interlock just so that we could call VOP_UNLOCK with interlock held.	2003-10-04 14:27:49 +00:00
Jeff Roberson	934914d2ef	- Fix an unlocked call to GETATTR by slightly shuffling the code in ffs_snapshot() around. - Acquire the interlock before releasing the mntvnode_mtx. Use the interlock to protect v_usecount access.	2003-10-04 14:25:45 +00:00
Jeff Roberson	04c81ad83c	- Remove a mp_fixme() and some locks that weren't necessary. I now understand how this works.	2003-10-04 11:06:43 +00:00
Jeff Roberson	cfd5600c66	- Several of the callers to getdirtybuf() were erroneously changed to pass in a list head instead of a pointer to the first element at the time of the first call. These lists are subject to change, and getdirtybuf() would refetch from the wrong list in some cases. Spotted by: tegge Pointy hat to: me	2003-09-03 04:08:15 +00:00
Jeff Roberson	23efe6dafc	- Backout rev 1.142. This caused a deadlock that I do not understand. More investigation is required.	2003-08-31 11:26:52 +00:00
Jeff Roberson	d919a11d06	- Define a new flag for getblk(): GB_NOCREAT. This flag causes getblk() to bail out if the buffer is not already present. - The buffer returned by incore() is not locked and should not be sent to brelse(). Use getblk() with the new GB_NOCREAT flag to preserve the desired semantics.	2003-08-31 08:50:11 +00:00
Jeff Roberson	a0ebaaddef	- Don't acquire the vnode interlock in drain_output(). Instead, require the caller to acquire it. This permits drain_output() to be done atomically with other operations as well as reducing the number of lock operations. - Assert that the proper locks are held in drain_output(). - Change getdirtybuf() to accept a mutex as an argument. This mutex is used to protect the vnode's buf list and the BKGRDWAIT flag. This lock is dropped when we successfully acquire a buffer and held on return otherwise. These semantics reduce the number of cumbersome cases in calling code. - Pass the mtx from getdirtybuf() into interlocked_sleep() and allow this mutex to be used as the interlock argument to BUF_LOCK() in the LOCKBUF case of interlocked_sleep(). - Change the return value of getdirtybuf() to be the resulting locked buffer or NULL otherwise. This is for callers who pass in a list head that requires a lock. It is necessary since the lock that protects the list head must be dropped in getdirtybuf() so that we don't have a lock order reversal with the buf queues lock in bremfree(). - Adjust all callers of getdirtybuf() to match the new semantics. - Add a comment in indir_trunc() that points at unlocked access to a buf. This may also be one of the last instances of incore() in the tree.	2003-08-31 07:29:34 +00:00
Jeff Roberson	9dbfeb0ae6	- Move BX_BKGRDWAIT and BX_BKGRDINPROG to BV_ and the b_vflags field. - Surround all accesses of the BKGRD{WAIT,INPROG} flags with the vnode interlock. - Don't use the B_LOCKED flag and QUEUE_LOCKED for background write buffers. Check for the BKGRDINPROG flag before recycling or throwing away a buffer. We do this instead because it is not safe for us to move the original buffer to a new queue from the callback on the background write buffer. - Remove the B_LOCKED flag and the locked buffer queue. They are no longer used. - The vnode interlock is used around checks for BKGRDINPROG where it may not be strictly necessary. If we hold the buf lock then a background write will not be started without our knowledge, one may only be completed while we're not looking. Rather than remove the code, document two of the places where this extra locking is done. A pass should be done to verify and minimize the locking later.	2003-08-28 06:55:18 +00:00
Alan Cox	9cf8f2f707	The previous change necessitates the addition of a new #include. Otherwise, there is a compilation warning.	2003-08-18 17:27:08 +00:00
Poul-Henning Kamp	b103854847	Don't use a VOP_*() function on our own vnodes, go directly to the relevant internal function, in this case ufs_bmaparray().	2003-08-17 19:26:03 +00:00
Alan Cox	f6c098e569	Revision 1.44 of ufs/ufs/inode.h has made it necessary to add two new #includes to this file. Otherwise, it doesn't compile.	2003-08-16 06:15:17 +00:00
Poul-Henning Kamp	5c24d6ee26	Eliminate the i_devvp field from the incore UFS inodes, we can get the same value from ip->i_ump->um_devvp. This saves a pointer in the memory copies of inodes, which can easily run into several hundred kilobytes. The extra indirection is unmeasurable in benchmarks. Approved by: mckusick	2003-08-15 20:03:19 +00:00
John Baldwin	8b149b5131	Consistently use the BSD u_int and u_short instead of the SYSV uint and ushort. In most of these files, there was a mixture of both styles and this change just makes them self-consistent. Requested by: bde (kern_ktrace.c)	2003-08-07 15:04:27 +00:00
Robert Watson	9080ff25cf	Rename VOP_RMEXTATTR() to VOP_DELETEEXTATTR() for consistency with the kernel ACL interfaces and system call names. Break out UFS2 and FFS extattr delete and list vnode operations from setextattr and getextattr to deleteextattr and listextattr, which cleans up the implementations, and makes the results more readable, and makes the APIs more clear. Obtained from: TrustedBSD Project Sponsored by: DARPA, Network Associates Laboratories	2003-07-28 18:53:29 +00:00
Poul-Henning Kamp	a8d43c90af	Add a "int fd" argument to VOP_OPEN() which in the future will contain the filedescriptor number on opens from userland. The index is used rather than a "struct file *" since it conveys a bit more information, which may be useful to in particular fdescfs and /dev/fd/. For now pass -1 all over the place.	2003-07-26 07:32:23 +00:00
Alan Cox	4e28b22e35	Lock the vm object when freeing pages.	2003-06-15 21:50:38 +00:00
Poul-Henning Kamp	cefb5754dd	Add the same KASSERT to all VOP_STRATEGY and VOP_SPECSTRATEGY implementations to check that the buffer points to the correct vnode.	2003-06-15 18:53:00 +00:00
Poul-Henning Kamp	7652131bee	Initialize struct vfsops C99-sparsely. Submitted by: hmp Reviewed by: phk	2003-06-12 20:48:38 +00:00
David E. O'Brien	f4636c5959	Use __FBSDID().	2003-06-11 06:34:30 +00:00
Robert Watson	1e9e2eb598	Implement ffs_listextattr() by breaking out that logic and special-cased attribute name of "" from ffs_getextattr(). Invoking VOP_GETEXTATTR() with an empty name is now no longer supported; user application compatibility is provided by a system call level compatibility wrapper. We make sure to explicitly reject attempts to set an EA with the name "". Obtained from: TrustedBSD Project Sponsored by: DARPA, Network Associates Laboratories	2003-06-05 05:57:39 +00:00
Robert Watson	e1249def7d	Return EOPNOTSUPP for attempted EA operations on VCHR vnodes in UFS2; if we permit them to occur, the kernel panics due to our performing EA operations using VOP_STRATEGY on the vnode. This went unnoticed previously because there are very few users of device nodes on UFS2 due to the introduction of devfs. However, this can come up with the Linux compat directories and its hard-coded dev nodes (which will need to go away as we move away from hard-coded device numbers). This can come up if you use EA-intensive features such as ACLs and MAC. The proper fix is pretty complicated, but this band-aid would be an excellent MFC candidate for the release.	2003-06-01 02:42:18 +00:00
Poul-Henning Kamp	6280ed26af	Remove unused local variables. Found by: FlexeLint	2003-05-31 18:17:32 +00:00
Poul-Henning Kamp	17a1391990	The IO_NOWDRAIN and B_NOWDRAIN hacks are no longer needed to prevent deadlocks with vnode backed md(4) devices because md now uses a kthread to run the bio requests instead of doing it directly from the bio down path.	2003-05-31 16:42:45 +00:00
Alan Cox	7f758dabbb	Lock the vm object when performing vm_object_page_clean(). Approved by: re (rwatson)	2003-05-18 22:02:51 +00:00
Alan Cox	ad682c4825	Lock the vm_object on entry to vm_object_vndeallocate().	2003-05-03 20:28:26 +00:00
Tim J. Robbins	3632928957	Do not attempt to free NULL dinodes (i_din1 or i_din2) in ffs_ifree(). These fields can be left as NULL if ffs_vget() allocates an inode but fails before the dinode memory has been allocated. There are two cases when this can occur: when we lose a race and another process has added the inode to the hash, and when reading the inode off disk fails. The bug was observed by Kris on one of the package-building machines. See http://marc.theaimsgroup.com/?l=freebsd-current&m=105172731013411&w=2 In Kris's case, it was the bread() that failed because of a disk error. The alternative to this patch is to ensure that ffs_vget() does not call vput() when the inode hasn't been properly initialised.	2003-05-01 06:41:59 +00:00
Tim J. Robbins	8d721e877d	Free i_din2 instead of i_din1 in ffs_ifree() on UFS2 filesystems. This is purely a cosmetic change because these members are in a union together.	2003-05-01 06:38:27 +00:00
Mark Murray	51da11a27a	Fix some easy, global, lint warnings. In most cases, this means making some local variables static. In a couple of cases, this means removing an unused variable.	2003-04-30 12:57:40 +00:00
Alexander Kabaev	104a9b7e3e	Deprecate machine/limits.h in favor of new sys/limits.h. Change all in-tree consumers to include <sys/limits.h> Discussed on: standards@ Partially submitted by: Craig Rodrigues <rodrigc@attbi.com>	2003-04-29 13:36:06 +00:00
John Baldwin	a15cc35909	Lock both the proc lock and sched_lock when calling sched_nice since kg_nice is now protected by both. Being protected by both means that other places in the kernel that want to read kg_nice only need one of the two locks.	2003-04-22 20:45:38 +00:00
Jeff Roberson	86711bae9b	- Use the sched_nice() api instead of setting the nice value directly. Tested by: Steve Kargl <sgk@troutmask.apl.washington.edu>	2003-04-12 01:05:19 +00:00
Alan Cox	6134838f99	Sufficient access checks are performed by vmapbuf() that calling useracc() is pointless. Remove the call to useracc(). Don't reinitialize fields that are already initialized by getpbuf(). Reviewed by: tegge	2003-04-06 19:26:30 +00:00
Tor Egge	5e2e6a67c4	Check return value from vmapbuf instead of the function address.	2003-03-27 20:48:34 +00:00
Tor Egge	10dccf8ff2	Eliminate a buffer sleep/wakeup race.	2003-03-27 19:28:11 +00:00
Tor Egge	5bbb806004	Add support for reading directly from file to userland buffer when the O_DIRECT descriptor status flag is set and both offset and length are multiples of the physical media sector size.	2003-03-26 23:40:42 +00:00
John Baldwin	31566c96f4	Use td->td_ucred instead of td->td_proc->p_ucred.	2003-03-20 21:17:40 +00:00
John Baldwin	2a53bfbe62	Minor fixes to ffs_fserr(): - Assume that curthread is not NULL. It never is in -current. - Use td_ucred instead of p_ucred.	2003-03-20 21:15:54 +00:00
Poul-Henning Kamp	b4b138c27f	Including <sys/stdint.h> is (almost?) universally only to be able to use %j in printfs, so put a nested include in <sys/systm.h> where the printf prototype lives and save everybody else the trouble.	2003-03-18 08:45:25 +00:00
Jeff Roberson	09f11da5a3	- Remove a race between fsync like functions and flushbufqueues() by requiring locked bufs in vfs_bio_awrite(). Previously the buf could have been written out by fsync before we acquired the buf lock if it weren't for giant. The cluster_wbuild() handles this race properly but the single write at the end of vfs_bio_awrite() would not. - Modify flushbufqueues() so there is only one copy of the loop. Pass a parameter in that says whether or not we should sync bufs with deps. - Call flushbufqueues() a second time and then break if we couldn't find any bufs without deps.	2003-03-13 07:19:23 +00:00
Kirk McKusick	34968037b1	Use the appropriate size when zeroing out the unused portion of a snapshot's copy of a superblock. This patch fixes a panic when taking a snapshot of a 4096/512 filesystem. Reported by: Ian Freislich <ianf@za.uu.net> Sponsored by: DARPA & NAI Labs.	2003-03-07 23:49:16 +00:00
Alan Cox	09c80124a3	Remove ENABLE_VFS_IOOPT. It is a long unfinished work-in-progress. Discussed on: arch@	2003-03-06 03:41:02 +00:00
Jeff Roberson	7261f5f68e	- Add a new 'flags' parameter to getblk(). - Define one flag GB_LOCK_NOWAIT that tells getblk() to pass the LK_NOWAIT flag to the initial BUF_LOCK(). This will eventually be used in cases where we want to use a buffer only if it is not currently in use. - Convert all consumers of the getblk() api to use this extra parameter. Reviewed by: arch Not objected to by: mckusick	2003-03-04 00:04:44 +00:00
Dag-Erling Smørgrav	521f364b80	More low-hanging fruit: kill caddr_t in calls to wakeup(9) / [mt]sleep(9).	2003-03-02 16:54:40 +00:00
Kirk McKusick	74f3809a19	Change the field used to test whether the superblock has been updated from the filesystem size field to the filesystem maximum blocksize field. The problem is that older versions of growfs updated only the new size field and not the old size field. This resulted in the old (smaller) size field being copied up to the new size field which caused the filesystem to appear to fsck to be badly trashed. This also adds a sanity check to ensure that the superblock is not being updated when the filesystem is mounted read-only. Obviously such an update should never happen. Reported by: Nate Lawson <nate@root.org> Sponsored by: DARPA & NAI Labs.	2003-02-25 23:21:08 +00:00
Jeff Roberson	17661e5ac4	- Add an interlock argument to BUF_LOCK and BUF_TIMELOCK. - Remove the buftimelock mutex and acquire the buf's interlock to protect these fields instead. - Hold the vnode interlock while locking bufs on the clean/dirty queues. This reduces some cases from one BUF_LOCK with a LK_NOWAIT and another BUF_LOCK with a LK_TIMEFAIL to a single lock. Reviewed by: arch, mckusick	2003-02-25 03:37:48 +00:00
Kirk McKusick	3bf0ed940b	When removing the last item from a non-empty worklist, the worklist tail pointer must be updated. Reported by: Kris Kennaway <kris@obsecurity.org> Sponsored by: DARPA & NAI Labs.	2003-02-24 07:28:41 +00:00
Kirk McKusick	5bb651cb72	This patch fixes a deadlock between the bufdaemon and a process taking a snapshot. As part of taking a snapshot of a filesystem, the kernel builds up a list of the filesystem metadata (such as the cylinder group bitmaps) that are contained in the snapshot. When doing a copy-on-write check, the list is first consulted. If the block being written is found on the list, then the full snapshot lookup can be avoided. Besides providing an important performance speedup this check also avoids a potential deadlock between the code creating the snapshot and the bufdaemon trying to cleanup snapshot related buffers. This fix creates a temporary list containing the key metadata blocks that can cause the deadlock. This temporary list is used between the time that the snapshot is first enabled and the time that the fully complete list is built. Reported by: Attila Nagy <bra@fsn.hu> Sponsored by: DARPA & NAI Labs.	2003-02-22 00:59:34 +00:00
Kirk McKusick	37e2ebfdba	This patch fixes a bug that causes an active filesystem on which a snapshot is being taken to panic with either "freeing free block" or "freeing free inode". The problem arises when the snapshot code is scanning the filesystem looking for inodes with a reference count of zero (e.g., unlinked but still open) so that it can expunge them from its view. If it encounters a reclaimed vnode and has to restart its scan, then it will panic if it encounters and tries to free an inode that it has already processed. The fix is to check each candidate inode to see if it has already been processed before trying to delete it from the snapshot image. Sponsored by: DARPA & NAI Labs.	2003-02-22 00:29:51 +00:00
Kirk McKusick	d60682c239	This patch fixes a bug in the logical block calculation macros so that they convert to 64-bit values before shifting rather than afterwards. Once fixed, they can be used rather than inline expanded. Sponsored by: DARPA & NAI Labs.	2003-02-22 00:19:26 +00:00
Warner Losh	a163d034fa	Back out M_* changes, per decision of the TRB. Approved by: trb	2003-02-19 05:47:46 +00:00
Kirk McKusick	aca3e4974f	Replace use of random() with arc4random() to provide less guessable values for the initial inode generation numbers in newfs and for newly allocated inode generation numbers in the kernel. Submitted by: Theo de Raadt <deraadt@cvs.openbsd.org> Sponsored by: DARPA & NAI Labs.	2003-02-14 21:31:58 +00:00
Kirk McKusick	50bd54e391	Correct lines incorrectly added to the copyright message. Submitted by: Frank van der Linden <fvdl@wasabisystems.com> Sponsored by: DARPA & NAI Labs.	2003-02-14 00:31:06 +00:00
Jeff Roberson	767b9a529d	- Cleanup unlocked accesses to buf flags by introducing a new b_vflag member that is protected by the vnode lock. - Move B_SCANNED into b_vflags and call it BV_SCANNED. - Create a vop_stdfsync() modeled after spec's sync. - Replace spec_fsync, msdos_fsync, and hpfs_fsync with the stdfsync and some fs specific processing. This gives all of these filesystems proper behavior wrt MNT_WAIT/NOWAIT and the use of the B_SCANNED flag. - Annotate the locking in buf.h	2003-02-09 11:28:35 +00:00
Alfred Perlstein	44956c9863	Remove M_TRYWAIT/M_WAITOK/M_WAIT. Callers should use 0. Merge M_NOWAIT/M_DONTWAIT into a single flag M_NOWAIT.	2003-01-21 08:56:16 +00:00
Matthew Dillon	48e3128b34	Bow to the whining masses and change a union back into void *. Retain removal of unnecessary casts and throw in some minor cleanups to see if anyone complains, just for the hell of it.	2003-01-13 00:33:17 +00:00
Matthew Dillon	cd72f2180b	Change struct file f_data to un_data, a union of the correct struct pointer types, and remove a huge number of casts from code using it. Change struct xfile xf_data to xun_data (ABI is still compatible). If we need to add a #define for f_data and xf_data we can, but I don't think it will be necessary. There are no operational changes in this commit.	2003-01-12 01:37:13 +00:00
Marcel Moolenaar	cc4a858397	o Improve wording of the comment that accompanies fs_pad. The padding is not specific to non-i386 architectures. It is caused by non-i386 specific alignment requirements of fs_swuid. o Add a CTASSERT to catch a change in the size of struct fs at compile-time rather than run-time. Ok'd: gordon Tested on: i386 ia64	2003-01-10 06:59:34 +00:00
Gordon Tetlow	963cae780f	Fix superblock alignment problems on non-i386 platforms. Also change fs_uuid to fs_swuid, making it more descriptive. Submitted by: marcel Reviewed by: peter Pointy hat to: gordon	2003-01-09 23:53:30 +00:00
Gordon Tetlow	291871da9e	Steal some space from fs_fsmnt to create fs_volname and fs_uuid. The volname will be used to support volume names with the help of a GEOM module (to be committed). uuid will be used to deal with conflicting volume names (which doesn't work just yet). Approved by: mckusick@	2003-01-08 22:53:54 +00:00
Kirk McKusick	fa06a012cd	This patch fixes a problem caused by applications that rapidly and repeatedly truncate the same file. Each time the file is truncated, a buffer is grabbed to store the indirect block numbers that need to be freed. Those blocks cannot be freed until the inode claiming them is written to disk. Thus, the number of buffers being held by soft updates explodes and in extreme cases can run the kernel out of buffers. The problem can be avoided by doing an fsync on the file every debug.maxindirdep truncates (currently defaulted to 50). The fsync causes the inode to be written so that the held buffers can be freed. The check for excessive buffers is performed as part of the existing hook for excessive dependencies (softdep_slowdown) in the truncate code. Reported by: David Schultz <dschultz@uclink.Berkeley.EDU> Sponsored by: DARPA & NAI Labs. MFC after: 3 weeks	2003-01-07 18:23:50 +00:00
Poul-Henning Kamp	862702306b	Convert calls to BUF_STRATEGY to VOP_STRATEGY calls. This is a no-op since all BUF_STRATEGY did in the first place was call VOP_STRATEGY.	2003-01-03 06:32:15 +00:00
Jens Schweikhardt	9d5abbddbf	Correct typos, mostly s/ a / an / where appropriate. Some whitespace cleanup, especially in troff files.	2003-01-01 18:49:04 +00:00
Alfred Perlstein	13438f6823	When compiling the kernel do not implicitly include filedesc.h from proc.h, this was causing filedesc work to be very painful. In order to make this work split out sigio definitions to their own header (sigio.h) which is included from proc.h for the time being.	2003-01-01 01:56:19 +00:00
Poul-Henning Kamp	aa4d7a8a4b	Use three UMA zones for FFS/UFS inodes instead of malloc space. Since inodes are currently 144 bytes, this will save 112 bytes per inode. This can amount to up to 10MByte on large systems.	2002-12-27 11:05:05 +00:00
Poul-Henning Kamp	de6ba7c016	Move the allocation of the inode contents into ffs_vfsops.c rather than passing malloc types around.	2002-12-27 10:23:03 +00:00
Poul-Henning Kamp	975512a907	Make ffs_mountfs() static. Remove the malloctype from the ufs mount structure, instead add a callback to the storage method for freeing inodes: UFS_IFREE(). Add vfs_ifree() method function which frees an inode. Unvariablelize the malloc type used for allocating inodes.	2002-12-27 10:06:37 +00:00
Kirk McKusick	4c572f6222	Fix corruption introduced in previous delta. Reported by: Aurelien Nephtali <aurelien.nephtali@wanadoo.fr> Sponsored by: DARPA & NAI Labs.	2002-12-18 19:50:28 +00:00
Kirk McKusick	6d967351b4	Keep comments consistent with the code. Minor optimization. Sponsored by: DARPA & NAI Labs.	2002-12-18 07:19:41 +00:00
Kirk McKusick	c021e44776	Cosmetic cleanup of unsigned buglets. Submitted by: Bruce Evans <bde@zeta.org.au> Sponsored by: DARPA & NAI Labs.	2002-12-18 00:53:45 +00:00
Poul-Henning Kamp	120a6d842a	Remove unused lockcnt variable. Approved by: mckusick	2002-12-17 20:23:51 +00:00
Kirk McKusick	8efcd9a794	Update to previous change (1.54) to use an appropriately wide inode field so as to work correctly on 64-bit platforms. Reported-by: Jake Burkholder <jake@locore.ca> Sponsored by: DARPA & NAI Labs. Approved by: Ian Dowse <iedowse@maths.tcd.ie>	2002-12-15 19:25:59 +00:00
Kirk McKusick	0db138a6b0	Only the most recent snapshot contains the complete list of blocks that were copied in all of the earlier snapshots, thus its precomputed list must be used in the copyonwrite test. Using incomplete lists may lead to deadlock. Also do not include the blocks used for the indirect pointers in the indirect pointers as this may lead to inconsistent snapshots. Sponsored by: DARPA & NAI Labs. Approved by: re	2002-12-14 01:36:59 +00:00
Tom Rhodes	1626155b82	Remove the comment about dump(8) not working properly with snapshots. Discussed with: mckusick Approved by: re (rwatson)	2002-12-12 00:31:45 +00:00
Kirk McKusick	8d6754f289	More tightly verify the preference returned for the new inode. Submitted by: Kris Kennaway <kris@obsecurity.org> Sponsored by: DARPA & NAI Labs. Approved by: re	2002-12-06 02:08:46 +00:00
Kirk McKusick	0cb652d925	Have to use bread() rather than UFS_BALLOC() when obtaining a previously allocated block as the previous use of the block may have fallen out of the cache. Failure to reread its contents causes zeroed results to be written instead of the proper contents. Conversely, when the block is going to be entirely filled in, it is not necessary to reread the old contents. Sponsored by: DARPA & NAI Labs. Approved by: re	2002-12-03 18:19:27 +00:00
Kirk McKusick	31574422a3	Add a check to disable the previous patch so that future filesystems that choose to place their superblocks in non-standard locations will not get them smashed. Sponsored by: DARPA & NAI Labs.	2002-11-30 19:04:57 +00:00
Kirk McKusick	c6964d3bc9	Remove a race condition / deadlock from snapshots. When converting from individual vnode locks to the snapshot lock, be sure to pass any waiting processes along to the new lock as well. This transfer is done by a new function in the lock manager, transferlockers(from_lock, to_lock); Thanks to Lamont Granquist <lamont@scriptkiddie.org> for his help in pounding on snapshots beyond all reason and finding this deadlock. Sponsored by: DARPA & NAI Labs.	2002-11-30 19:00:51 +00:00
Kirk McKusick	63cf5b0ee2	Fix two deadlocks in snapshots: 1) Release the snapshot file lock while suspending the system. Otherwise a process trying to read the lock may block on its containing directory preventing the suspension from completing. Thanks to Sean Kelly <smkelly@zombie.org> for finding this deadlock. 2) Replace some bdwrite's with bawrite's so as not to fill all the buffers with dirty data. The buffers could not be cleaned as the snapshot vnode was locked hence the system could deadlock when making snapshots of really massive filesystems. Thanks to Hidetoshi Shimokawa <simokawa@sat.t.u-tokyo.ac.jp> for figuring this out. Sponsored by: DARPA & NAI Labs.	2002-11-30 07:27:12 +00:00
Kirk McKusick	fa5d33e242	Check to make sure that the fs_sblockloc field was properly updated before using it to write the superblock. This is to guard against accidentally trashing the disklabel if the superblock format missed being upgraded by the new kernel. Reported by: Sam Leffler <sam@errno.com> Sponsored by: DARPA & NAI Labs. Approved by: Murray Stokely <murray@FreeBSD.org>	2002-11-29 19:20:15 +00:00
Kirk McKusick	ada981b228	Create a new 32-bit fs_flags word in the superblock. Add code to move the old 8-bit fs_old_flags to the new location the first time that the filesystem is mounted by a new kernel. One of the unused flags in fs_old_flags is used to indicate that the flags have been moved. Leave the fs_old_flags word intact so that it will work properly if used on an old kernel. Change the fs_sblockloc superblock location field to be in units of bytes instead of in units of filesystem fragments. The old units did not work properly when the fragment size exceeded the superblock size (8192). Update old fs_sblockloc values at the same time that the flags are moved. Suggested by: BOUWSMA Barry <freebsd-misuser@netscum.dyndns.dk> Sponsored by: DARPA & NAI Labs.	2002-11-27 02:18:58 +00:00
Kirk McKusick	f5235f70a4	The target for the maximum number of dependencies has been cut in half because of reports that under heavy load the kernel could exhaust its memory pool. The limit is now (desiredvnodes * 4) rather than (desiredvnodes * 8), so it will still scale with larger systems, just not as quickly. Sponsored by: DARPA & NAI Labs.	2002-11-20 05:16:11 +00:00
Kirk McKusick	3374bb5ad6	If an error occurs while writing a buffer, then the data will not have hit the disk and the dependencies cannot be unrolled. In this case, the system will mark the buffer as dirty again so that the write can be retried in the future. When the write succeeds or the system gives up on the buffer and marks it as invalid (B_INVAL), the dependencies will be cleared. Sponsored by: DARPA & NAI Labs.	2002-11-20 05:14:16 +00:00
Peter Wemm	cdf5e9ccb6	Do not assume that time_t is an int. Approved by: re (jhb)	2002-11-15 22:36:57 +00:00
Robert Watson	763bbd2f4f	Slightly change the semantics of vnode labels for MAC: rather than "refreshing" the label on the vnode before use, just get the label right from inception. For single-label file systems, set the label in the generic VFS getnewvnode() code; for multi-label file systems, leave the labeling up to the file system. With UFS1/2, this means reading the extended attribute during vfs_vget() as the inode is pulled off disk, rather than hitting the extended attributes frequently during operations later, improving performance. This also corrects semantics for shared vnode locks, which were not previously present in the system. This changes the cache coherency properties WRT out-of-band access to label data, but in an acceptable form. With UFS1, there is a small race condition during automatic extended attribute start -- this is not present with UFS2, and occurs because EAs aren't available at vnode inception. We'll introduce a work around for this shortly. Approved by: re Obtained from: TrustedBSD Project Sponsored by: DARPA, Network Associates Laboratories	2002-10-26 14:38:24 +00:00
Kirk McKusick	9ab73fd11a	Within ufs, the ffs_sync and ffs_fsync functions did not always check for and/or report I/O errors. The result is that a VFS_SYNC or VOP_FSYNC called with MNT_WAIT could loop infinitely on ufs in the presence of a hard error writing a disk sector or in a filesystem full condition. This patch ensures that I/O errors will always be checked and returned. This patch also ensures that every call to VFS_SYNC or VOP_FSYNC with MNT_WAIT set checks for and takes appropriate action when an error is returned. Sponsored by: DARPA & NAI Labs.	2002-10-25 00:20:37 +00:00
Kirk McKusick	c0762674c9	We must be careful to avoid recursive copy-on-write faults when trying to clean up during disk-full scenarios. Sponsored by: DARPA & NAI Labs.	2002-10-23 21:47:02 +00:00
Kirk McKusick	2eff16f057	Misplaced FREE_LOCK causes a panic when hit while taking a snapshot. Sponsored by: DARPA & NAI Labs.	2002-10-23 05:14:06 +00:00
Kirk McKusick	0152387ade	This update further fine tunes the locking of snapshot vnodes in the ffs_copyonwrite routine to avoid a deadlock between the syncer daemon trying to sync out a snapshot vnode and the bufdaemon trying to write out a buffer containing the snapshot inode. With any luck this will be the last snapshot race condition. Sponsored by: DARPA & NAI Labs.	2002-10-22 01:23:00 +00:00
Kirk McKusick	127ab960d5	This update is a performance improvement when allocating blocks on a full filesystem. Previously, if the allocation failed, we had to fsync the file before rolling back any partial allocation of indirect blocks. Most block allocation requests only need to allocate a single data block and if that allocation fails, there is nothing to unroll. So, before doing the fsync, we check to see if any rollback will really be necessary. If none is necessary, then we simply return. This update eliminates the flurry of disk activity that got triggered whenever a filesystem would run out of space. Sponsored by: DARPA & NAI Labs.	2002-10-22 01:14:25 +00:00
Kirk McKusick	e03486d198	This checkin reimplements the io-request priority hack in a way that works in the new threaded kernel. It was commented out of the disksort routine earlier this year for the reasons given in kern/subr_disklabel.c (which is where this code used to reside before it moved to kern/subr_disk.c): ---------------------------- revision 1.65 date: 2002/04/22 06:53:20; author: phk; state: Exp; lines: +5 -0 Comment out Kirks io-request priority hack until we can do this in a civilized way which doesn't cause grief. The problem is that it is not generally safe to cast a "struct bio *" to a "struct buf *". Things like ccd, vinum, ata-raid and GEOM constructs bio's which are not entrails of a struct buf. Also, curthread may or may not have anything to do with the I/O request at hand. The correct solution can either be to tag struct bio's with a priority derived from the requesting threads nice and have disksort act on this field, this wouldn't address the "silly-seek syndrome" where two equal processes bang the diskheads from one edge to the other of the disk repeatedly. Alternatively, and probably better: a sleep should be introduced either at the time the I/O is requested or at the time it is completed where we can be sure to sleep in the right thread. The sleep also needs to be in constant timeunits, 1/hz can be practically any sub-second size, at high HZ the current code practically doesn't do anything. ---------------------------- As suggested in this comment, it is no longer located in the disk sort routine, but rather now resides in spec_strategy where the disk operations are being queued by the thread that is associated with the process that is really requesting the I/O. At that point, the disk queues are not visible, so the I/O for positively niced processes is always slowed down whether or not there is other activity on the disk. On the issue of scaling HZ, I believe that the current scheme is better than using a fixed quantum of time. As machines and I/O subsystems get faster, the resolution on the clock also rises. So, ten years from now we will be slowing things down for shorter periods of time, but the proportional effect on the system will be about the same as it is today. So, I view this as a feature rather than a drawback. Hence this patch sticks with using HZ. Sponsored by: DARPA & NAI Labs. Reviewed by: Poul-Henning Kamp <phk@critter.freebsd.dk>	2002-10-22 00:59:49 +00:00
Matthew Dillon	1b7e3dafdf	Fix a file-rewrite performance case for UFS[2]. When rewriting portions of a file in chunks that are less than the filesystem block size, if the data is not already cached the system will perform a read-before-write. The problem is that it does this on a block-by-block basis, breaking up the I/Os and making clustering impossible for the writes. Programs such as INN using cyclic file buffers suffer greatly. This problem is only going to get worse as we use larger and larger filesystem block sizes. The solution is to extend the sequential heuristic so UFS[2] can perform a far larger read and readahead when dealing with this case. (note: maximum disk write bandwidth is 27MB/sec thru filesystem) (note: filesystem blocksize in test is 8K (1K frag)) dd if=/dev/zero of=test.dat bs=1k count=2m conv=notrunc Before: (note half of these are reads) tty da0 da1 acd0 cpu tin tout KB/t tps MB/s KB/t tps MB/s KB/t tps MB/s us ni sy in id 0 76 14.21 598 8.30 0.00 0 0.00 0.00 0 0.00 0 0 7 1 92 0 76 14.09 813 11.19 0.00 0 0.00 0.00 0 0.00 0 0 9 5 86 0 76 14.28 821 11.45 0.00 0 0.00 0.00 0 0.00 0 0 8 1 91 After: (note half of these are reads) tty da0 da1 acd0 cpu tin tout KB/t tps MB/s KB/t tps MB/s KB/t tps MB/s us ni sy in id 0 76 63.62 434 26.99 0.00 0 0.00 0.00 0 0.00 0 0 18 1 80 0 76 63.58 424 26.30 0.00 0 0.00 0.00 0 0.00 0 0 17 2 82 0 76 63.82 438 27.32 0.00 0 0.00 0.00 0 0.00 1 0 19 2 79 Reviewed by: mckusick Approved by: re X-MFC after: immediately (was heavily tested in -stable for 4 months)	2002-10-18 22:52:41 +00:00
Kirk McKusick	86aeb27fa2	Change locking so that all snapshots on a particular filesystem share a common lock. This change avoids a deadlock between snapshots when separate requests cause them to deadlock checking each other for a need to copy blocks that are close enough together that they fall into the same indirect block. Although I had anticipated a slowdown from contention for the single lock, my filesystem benchmarks show no measurable change in throughput on a uniprocessor system with three active snapshots. I conjecture that this result is because every copy-on-write fault must check all the active snapshots, so the process was inherently serial already. This change removes the last of the deadlocks of which I am aware in snapshots. Sponsored by: DARPA & NAI Labs.	2002-10-16 00:19:23 +00:00
Robert Watson	80830407c6	If the FS_MULTILABEL flag is set in a UFS or UFS2 superblock, automatically set MNT_MULTILABEL in the mount flags. If FS_ACLS is set in a UFS or UFS2 superblock, automatically set MNT_ACLS in the mount flags. If either of these flags is set, but the appropriate kernel option to support the features associated with the flag isn't available, then print a warning at mount-time. Obtained from: TrustedBSD Project Sponsored by: DARPA, Network Associates Laboratories	2002-10-15 20:00:06 +00:00
Kirk McKusick	48f0495d85	When reading or writing the extended attributes of a special device or fifo in UFS2, the normal ufs_strategy routine needs to be used rather than the spec_strategy or fifo_strategy routine. Thus the ffsext_strategy routine is interposed in the ffs_vnops vectors for special devices and fifo's to pick off this special case. Otherwise it simply falls through to the usual spec_strategy or fifo_strategy routine. Submitted by: Robert Watson <rwatson@FreeBSD.org> Sponsored by: DARPA & NAI Labs.	2002-10-14 23:18:09 +00:00
Robert Watson	3ceef565b2	Define two new superblock file system flags: FS_ACLS Administrative enable/disable of extended ACL support FS_MULTILABEL Administrative flag to indicate to the MAC Framework that objects in the file system are individually labeled using extended attributes. Obtained from: TrustedBSD Project Sponsored by: DARPA, Network Associates Laboratories Reviewed by: (in principle) mckusick, phk	2002-10-14 17:07:11 +00:00
Kirk McKusick	a5b65058d5	Regularize the vop_stdlock'ing protocol across all the filesystems that use it. Specifically, vop_stdlock uses the lock pointed to by vp->v_vnlock. By default, getnewvnode sets up vp->v_vnlock to reference vp->v_lock. Filesystems that wish to use the default do not need to allocate a lock at the front of their node structure (as some still did) or do a lockinit. They can simply start using vn_lock/VOP_UNLOCK. Filesystems that wish to manage their own locks, but still use the vop_stdlock functions (such as nullfs) can simply replace vp->v_vnlock with a pointer to the lock that they wish to have used for the vnode. Such filesystems are responsible for setting the vp->v_vnlock back to the default in their vop_reclaim routine (e.g., vp->v_vnlock = &vp->v_lock). In theory, this set of changes cleans up the existing filesystem lock interface and should have no function change to the existing locking scheme. Sponsored by: DARPA & NAI Labs.	2002-10-14 03:20:36 +00:00
Maxime Henrion	cba63e0291	Fix build of 64 bit platforms.	2002-10-09 12:19:36 +00:00
Kirk McKusick	4d533db182	When creating a snapshot, create a list of initially allocated blocks. Whenever doing a copy-on-write check, first look in the list of initially allocated blocks to see if it is there. If so, no further check is needed. If not, fall through and do the full check. This change eliminates one of two known deadlocks caused by snapshots. Handling the second deadlock will be the subject of another check-in. This change also reduces the cost of the copy-on-write check by speeding up the verification of frequently checked blocks. Sponsored by: DARPA & NAI Labs.	2002-10-09 06:13:48 +00:00
Kirk McKusick	b6cef5648d	The appropriate units for disk block addresses are always DEV_BSIZE, even when the underlying device has a larger sector size. Therefore, the filesystem code should not (and with this patch does not) try to use the underlying sector size when doing disk block address calculations. This patch fixes problems in -current when using the swap-based memory-disk device (mdconfig -a -t swap ...). This bugfix is not relevant to -stable as -stable does not have the memory-disk device. Sponsored by: DARPA & NAI Labs.	2002-10-09 04:01:23 +00:00
Jeff Roberson	a2c4ff970b	- Remove LK_INTERLOCK from the vn_lock() in ffs_snapshot(). Pointy hat to: me Found by: green	2002-10-08 21:00:52 +00:00
Dima Dorfman	85bba62925	size_t is not a struct (fix mislabelling in a comment).	2002-10-02 05:15:34 +00:00
Juli Mallett	85de3147ea	When spamming me with a printf(9), under DIAGNOSTIC, at least be nice enough to include a newline. MFC after: 4 days Sponsored by: Bright Path Solutions	2002-09-28 19:04:49 +00:00
Poul-Henning Kamp	37c841831f	Be consistent about "static" functions: if the function is marked static in its prototype, mark it static at the definition too. Inspired by: FlexeLint warning #512	2002-09-28 17:15:38 +00:00
Poul-Henning Kamp	993b0567b2	Use our mount-credential if we get a NOCRED when we try to write out EA space back to disk. This is wrong in many ways, but not as wrong as a panic. Panicked on: rwatson & jmallet Sponsored by: DARPA & NAI Labs.	2002-09-27 20:00:03 +00:00
Jeff Roberson	2ee5711e84	- Convert locks to use standard macros. - Lock access to the buflists. - Document broken locking. - Use vrefcnt().	2002-09-25 02:49:48 +00:00
Jeff Roberson	6ef1763407	- Document broken locking. - Use vrefcnt().	2002-09-25 02:47:49 +00:00
Poul-Henning Kamp	cf09d67418	We don't need to #include <sys/disklabel.h>. We don't need to #include <sys/disklabel.h> a second time either. Sponsored by: DARPA & NAI Labs.	2002-09-20 16:42:33 +00:00
David E. O'Brien	47a561263d	intmax_t is printed with %jd, not %lld.	2002-09-19 03:55:30 +00:00
Nate Lawson	06be2aaa83	Remove all use of vnode->v_tag, replacing with appropriate substitutes. v_tag is now const char * and should only be used for debugging. Additionally: 1. All users of VT_NTS now check vfsconf->vf_type VFCF_NETWORK 2. The user of VT_PROCFS now checks for the new flag VV_PROCDEP, which is propagated by pseudofs to all child vnodes if the fs sets PFS_PROCDEP. Suggested by: phk Reviewed by: bde, rwatson (earlier version)	2002-09-14 09:02:28 +00:00
Poul-Henning Kamp	0e168822b2	Implement the VOP_OPENEXTATTR() and VOP_CLOSEEXTATTR() methods. Use extattr_check_cred() to check access to EAs. This is still a WIP. Sponsored by: DARPA & NAI Labs.	2002-09-05 20:59:42 +00:00
Bruce Evans	8f767abf71	Include <sys/malloc.h> instead of depending on namespace pollution 2 layers deep in <sys/proc.h> or <sys/vnode.h>. Include <sys/vmmeter.h> instead of depending on namespace pollution in <sys/pcpu.h>. Sorted includes as much as possible.	2002-09-05 09:43:24 +00:00
Poul-Henning Kamp	d0e9b8dbc4	Correctly handle setting, getting and deleting EA's with zero length content. Sponsored by: DARPA & NAI Labs.	2002-08-30 08:57:09 +00:00
Alan Cox	fff6062ab6	o Retire vm_page_zero_fill() and vm_page_zero_fill_area(). Ever since pmap_zero_page() and pmap_zero_page_area() were modified to accept a struct vm_page * instead of a physical address, vm_page_zero_fill() and vm_page_zero_fill_area() have served no purpose.	2002-08-25 00:22:31 +00:00
Poul-Henning Kamp	7428de69d2	Implement list of EA return functionality. Correctly delete EA's when the content length is set to zero. Sponsored by: DARPA & NAI Labs.	2002-08-20 11:34:58 +00:00
Poul-Henning Kamp	0176455bc8	First snapshot of UFS2 EA support. Sponsored by: DARPA & NAI Labs.	2002-08-19 07:01:55 +00:00
Poul-Henning Kamp	18280bc653	Expand the arguments to ffs_ext{read,write}() to their component parts rather than use vop_{read,write}_args. Access to these functions will ultimately not be available through the "vop_{read,write}+IO_EXT" API but this functionality is retained for debugging purposes for now. Sponsored by: DARPA & NAI Labs.	2002-08-13 11:33:01 +00:00
Poul-Henning Kamp	d6fe88e475	Unravel the UFS_EXTATTR incest between FFS and UFS: UFS_EXTATTR is an UFS only thing, and FFS should in principle not know if it is enabled or not. This commit cleans ffs_vnops.c for such knowledge, but not ffs_vfsops.c Sponsored by: DARPA and NAI Labs.	2002-08-13 10:33:57 +00:00
Poul-Henning Kamp	9bf1a75697	Introduce typedefs for the member functions of struct vfsops and employ these in the main filesystems. This does not change the resulting code but makes the source a little bit more grepable. Sponsored by: DARPA and NAI Labs.	2002-08-13 10:05:50 +00:00
Poul-Henning Kamp	e179b40f14	Stop pretending that the FFS file ufs_readwrite.c is a UFS file. Instead of #including it, pull it into ffs_vnops.c and name things correctly. Sponsored by: DARPA & NAI Labs.	2002-08-12 10:32:56 +00:00
Ian Dowse	98caa2e4e9	Don't call softdep_slowdown() if soft updates are not active on the filesystem. This causes a panic for kernels compiled without softupdates. Reported by: luigi	2002-08-05 17:59:20 +00:00
Jeff Roberson	e6e370a7fe	- Replace v_flag with v_iflag and v_vflag - v_vflag is protected by the vnode lock and is used when synchronization with VOP calls is needed. - v_iflag is protected by interlock and is used for dealing with vnode management issues. These flags include X/O LOCK, FREE, DOOMED, etc. - All accesses to v_iflag and v_vflag have either been locked or marked with mp_fixme's. - Many ASSERT_VOP_LOCKED calls have been added where the locking was not clear. - Many functions in vfs_subr.c were restructured to provide for stronger locking. Idea stolen from: BSD/OS	2002-08-04 10:29:36 +00:00
Poul-Henning Kamp	c3a0d1d4e1	I forgot this bit of ugliness in the fsck_ffs cleanup.	2002-07-31 07:01:18 +00:00
Poul-Henning Kamp	9fbc6a330d	Fix braino in last commit.	2002-07-30 12:02:41 +00:00
Poul-Henning Kamp	17b1994bbe	Move ffs_isfreeblock() to ffs_alloc.c and make it static. Sponsored by: DARPA & NAI Labs.	2002-07-30 11:54:48 +00:00
Benno Rice	683eac8dbb	Add a missing argument to the stub for softdep_setup_freeblocks. Forgotten by: mckusick	2002-07-20 04:07:15 +00:00
Peter Wemm	382f95d332	Fix a warning: ffs_softdep.c:1630: warning: int format, different type arg (arg 2)	2002-07-20 01:09:35 +00:00
Kirk McKusick	7aca6291e3	Add support to UFS2 to provide storage for extended attributes. As this code is not actually used by any of the existing interfaces, it seems unlikely to break anything (famous last words). The internal kernel interface to manipulate these attributes is invoked using two new IO_ flags: IO_NORMAL and IO_EXT. These flags may be specified in the ioflags word of VOP_READ, VOP_WRITE, and VOP_TRUNCATE. Specifying IO_NORMAL means that you want to do I/O to the normal data part of the file and IO_EXT means that you want to do I/O to the extended attributes part of the file. IO_NORMAL and IO_EXT are mutually exclusive for VOP_READ and VOP_WRITE, but may be specified individually or together in the case of VOP_TRUNCATE. For example, when removing a file, VOP_TRUNCATE is called with both IO_NORMAL and IO_EXT set. For backward compatibility, if neither IO_NORMAL nor IO_EXT is set, then IO_NORMAL is assumed. Note that the BA_ and IO_ flags have been `merged' so that they may both be used in the same flags word. This merger is possible by assigning the IO_ flags to the low sixteen bits and the BA_ flags the high sixteen bits. This works because the high sixteen bits of the IO_ word is reserved for read-ahead and help with write clustering so will never be used for flags. This merge lets us get away from code of the form: if (ioflags & IO_SYNC) flags |= BA_SYNC; For the future, I have considered adding a new field to the vattr structure, va_extsize. This addition could then be exported through the stat structure to allow applications to find out the size of the extended attribute storage and also would provide a more standard interface for truncating them (via VOP_SETATTR rather than VOP_TRUNCATE). I am also contemplating adding a pathconf parameter (for concreteness, let's call it _PC_MAX_EXTSIZE) which would let an application determine the maximum size of the extended attribute storage. Sponsored by: DARPA & NAI Labs.	2002-07-19 07:29:39 +00:00
Kirk McKusick	faab4e2722	Change the name of st_createtime to st_birthtime. This change is made to reduce confusion between st_ctime and st_createtime. Submitted by: Eric Allman <eric@sendmail.org> Sponsored by: DARPA & NAI Labs.	2002-07-16 22:36:00 +00:00
Tom Rhodes	ae76f60046	Fix a typo: s/your are/you are/	2002-07-12 19:56:31 +00:00

828 Commits