freebsd-nq

Author	SHA1	Message	Date
Jeff Roberson	41766826eb	- Fix another dyslexic moment; an atomic_set_int should've become ACTIVESET, not ACTIVECLEAR. Submitted by: iedowse	2005-03-01 07:38:45 +00:00
Poul-Henning Kamp	7ce296cf04	Remove debug printout of major/minor numbers, print name instead.	2005-02-27 21:16:26 +00:00
Sam Leffler	d5bbad8372	Use the uiomove() return value instead of always returning 0 when doing a readlink of a fast link. Noticed by: Coverity Prevent analysis tool Reviewed by: phk	2005-02-27 18:58:31 +00:00
Jeff Roberson	1a4a9672f1	- Add VOP locking asserts in several functions that have been implicated in recent deadlocks.	2005-02-22 23:56:42 +00:00
Xin LI	a16baf37b9	The recomputation of the file system summary at mount time can be a very slow process, especially for large file systems that have just recovered from a crash. Since the summary is already re-sync'ed every 30 seconds, we will not lag behind too much after a crash. With this in mind, it is more reasonable to transfer the responsibility to background fsck, to reduce the delay after a crash. Add a new sysctl variable, vfs.ffs.compute_summary_at_mount, to control this behavior. When set to nonzero, we get the "old" behavior, in which the summary is computed immediately at mount time. Add five new sysctl variables to adjust ndir, nbfree, nifree, nffree and numclusters respectively. Teach fsck_ffs about these APIs; however, it intentionally does not check for their existence, since kernels without these sysctls must have recomputed the summary and hence no adjustments are necessary. This change has eliminated the usual tens of minutes of delay when mounting large dirty volumes. Reviewed by: mckusick MFC After: 1 week	2005-02-20 08:02:15 +00:00
Poul-Henning Kamp	dfd4be14bd	Try to unbreak the vnode locking around vop_reclaim() (based mostly on patch from kan@). Pull bufobj_invalbuf() out of vinvalbuf() and make g_vfs call it on close. This is not yet a generally safe function, but for this very specific use it is safe. This solves the problem with buffers not being flushed by unmount or after failed mount attempts.	2005-02-19 11:44:57 +00:00
Xin LI	d5128ab2af	When clearing a fragment, it's possible that the length is zero. Reviewed by: mckusick MFC After: 1 week	2005-02-19 07:31:33 +00:00
Jeff Roberson	a8127ebb5d	- Remove the unused and unsafe ufs_ihashlookup. This function returned a vnode pointer that could not be used since no locks were held. Sponsored by: Isilon Systems, Inc.	2005-02-14 20:51:39 +00:00
Poul-Henning Kamp	1121c39497	Make non-SOFTUPDATES kernels compile again. Integrate the stubfile into the main file now that license issues have been long resolved.	2005-02-11 08:13:31 +00:00
Poul-Henning Kamp	adf4157738	Make some SYSCTL_NODEs and some of FFS's VFS_ methods static.	2005-02-10 12:20:08 +00:00
Jeff Roberson	a3caf16e99	- In the softupdates case for ffs_truncate() we use vinvalbuf() to invalidate pending io and dependencies. However, vinvalbuf() rightfully does not call vnode_pager_setsize() for us. We must do this here. This could potentially have caused numerous kinds of bugs, but it was specifically causing msync() deadlocks because msync() was flushing pages that should not have been valid. Sponsored by: Isilon Systems, Inc. Reported by: kkenn	2005-02-09 23:05:20 +00:00
Poul-Henning Kamp	365b18aa89	style polishing.	2005-02-09 12:22:16 +00:00
Colin Percival	79653046d8	Add a new sysctl, "security.jail.chflags_allowed", which controls the behaviour of chflags within a jail. If set to 0 (the default), then a jailed root user is treated as an unprivileged user; if set to 1, then a jailed root user is treated the same as an unjailed root user. This is necessary to allow "make installworld" to work inside a jail, since it attempts to manipulate the system immutable flag on certain files. Discussed with: csjp, rwatson MFC after: 2 weeks	2005-02-08 21:31:11 +00:00
Poul-Henning Kamp	02f2c6a9d8	Split the vop_vector for ffs1 and ffs2, this is mostly for the different EXTATTR support.	2005-02-08 21:03:52 +00:00
Poul-Henning Kamp	44787ceb0b	Use ffs_truncate() directly instead of UFS_TRUNCATE()	2005-02-08 20:51:00 +00:00
Poul-Henning Kamp	dd19a799b8	Background writes are entirely an FFS/Softupdates thing. Give FFS vnodes a specific bufwrite method which contains all the background write stuff and then calls into the default bufwrite() for the rest of the job. Remove all the background write related stuff from the normal bufwrite. This drags the softdep_move_dependencies() back into FFS. Long term, it is worth looking at simply copying the data into allocated memory and issuing the bio directly and not create the "shadow buf" in the first place (just like copy-on-write is done in snapshots for instance). I don't think we really gain anything but complexity from doing this with a buf.	2005-02-08 20:29:10 +00:00
Poul-Henning Kamp	88e5b12a20	Drag another softupdates tentacle back into FFS: Now that FFS's vop_fsync is separate from the internal use we can do the full job there.	2005-02-08 18:09:11 +00:00
Poul-Henning Kamp	efd6d9808c	Don't use the UFS_* and VFS_* functions where a direct call is possible. The UFS_ functions are for UFS to call back into VFS. The VFS functions are external entry points into the filesystem.	2005-02-08 17:40:01 +00:00
Robert Watson	45faa442c3	Don't use VOP_LEASE() with operations on extended attribute backing files. Pointed out by: phk	2005-02-08 17:05:38 +00:00
Poul-Henning Kamp	40854ff546	For snapshots we need all VOP_LOCKs to be exclusive. The "business class upgrade" was implemented in UFS's VOP_LOCK implementation ufs_lock() which is the wrong layer, so move it to ffs_lock(). Also, as long as we have not abandoned advanced vfs-stacking we should not preclude it from happening: instead of implementing a copy locally, use the VOP_LOCK_APV(&ufs) to correctly arrive at vop_stdlock() at the bottom.	2005-02-08 16:25:50 +00:00
Poul-Henning Kamp	d6f622cc2f	For snapshots we need all VOP_LOCKs to be exclusive. The "business class upgrade" was implemented in UFS's VOP_LOCK implementation ufs_lock() which is the wrong layer, so move it to ffs_lock(). Also, as long as we have not abandoned advanced vfs-stacking we should not preclude it from happening: instead of implementing a copy locally, use the VOP_LOCK_APV(&ufs) to correctly arrive at vop_stdlock() at the bottom.	2005-02-08 15:54:30 +00:00
Poul-Henning Kamp	32a870da8a	Use VOP_STRATEGY_APV() instead of direct dereference, this is more correct.	2005-02-08 15:40:11 +00:00
Jeff Roberson	9087d86e66	- Use a separate malloc tag for saved inode contents to help in debugging memory modified after free errors. Sponsored by: Isilon Systems, Inc.	2005-02-02 20:30:47 +00:00
Ken Smith	87c29bf93e	Back out previous commit, bde@ provided an example of something this breaks.	2005-02-02 14:21:01 +00:00
Ken Smith	0fac1537a2	It was noticed that we do not change a file's access time when it gets executed. This appears to violate most of the UNIX-ish standards. One example quote from: http://www.opengroup.org/onlinepubs/009695399/functions/exec.html Upon successful completion, the exec functions shall mark for update the st_atime field of the file. If an exec function failed but was able to locate the process image file, whether the st_atime field is marked for update is unspecified. Should the exec function succeed, the process image file shall be considered to have been opened with open(). This appears to take care of it for ufs filesystems, doing the necessary sanity checks (read-only filesystem, etc) without violating any other standards (setting atime for any open appears to be allowed in any standards I could find). Noticed by: cperciva Reviewed by: kan, rwatson	2005-02-02 00:21:38 +00:00
Warner Losh	1f0ce611b3	nit in /*-	2005-01-31 08:16:45 +00:00
Peter Edwards	e697161fa2	Tell vnode_create_vobject() how big an object to create, rather than having it work it out via the more expensive VOP_GETATTR. Reviewed by: phk@	2005-01-29 14:23:09 +00:00
Poul-Henning Kamp	a369f34d76	Make filesystems get rid of their own vnodes' vnode_pager objects in VOP_RECLAIM().	2005-01-28 14:42:17 +00:00
Poul-Henning Kamp	d4eb29ba71	Remove unused argument to vrecycle()	2005-01-28 13:08:21 +00:00
Poul-Henning Kamp	84a6975215	Introduce and use g_vfs_close().	2005-01-25 15:52:04 +00:00
Poul-Henning Kamp	8516dd18e1	Don't use VOP_GETVOBJECT, use vp->v_object directly.	2005-01-25 00:40:01 +00:00
Poul-Henning Kamp	f74b3b1f6c	Create a vnode object when the file is opened. Trust that we did so.	2005-01-24 23:04:33 +00:00
Poul-Henning Kamp	ce12d37e7b	Don't create vnode_pager objects for the disk device. geom_vfs will do that.	2005-01-24 22:41:59 +00:00
Poul-Henning Kamp	625d4bc03a	Create a vp->v_object in VFS_FHTOVP() if we want to be exportable with NFS. We are moving responsibility for creating the vnode_pager object into the filesystems which own the vnode, and this is one of the places we have to cover. We call vnode_create_vobject() directly because we own the vnode. If we can get the size easily, pass it as an argument to save the call to VOP_GETATTR() in vnode_create_vobject()	2005-01-24 21:51:19 +00:00
Poul-Henning Kamp	091710ab22	Polish style.	2005-01-24 12:19:28 +00:00
Jeff Roberson	08023360a0	- Convert the global LK lock to a mutex. - Expand the scope of lk to cover not only interrupt races, but also top-half races, which includes many new uses over global top-half only data. - Get rid of interlocked_sleep() and use msleep or BUF_LOCK where appropriate. - Use the lk mutex in place of the various hand rolled semaphores. - Stop dropping the lk lock before we panic. - Fix getdirtybuf() callers so that they reacquire access to whatever softdep datastructure they were inspecting in the failure/retry case. Previously, sleeps in getdirtybuf() could leave us with pointers to bad memory. - Update handling of ffs to be compatible with ffs locking changes. Sponsored By: Isilon Systems, Inc.	2005-01-24 10:18:31 +00:00
Jeff Roberson	3ba649d792	- Initialize and destroy the per-filesystem ufs lock where appropriate. - Use the buffer lock on the superblock buf to serialize calls to sbupdate. - Set the MNTK_MPSAFE flag when QUOTA is not defined in the kernel. Sponsored By: Isilon Systems, Inc.	2005-01-24 10:12:28 +00:00
Jeff Roberson	dec351f69e	- Remove GIANT_REQUIRED where giant is no longer required. Sponsored By: Isilon Systems, Inc.	2005-01-24 10:10:47 +00:00
Jeff Roberson	5cef9d6add	- Use the ufs lock to protect fs_active. Sponsored By: Isilon Systems, Inc.	2005-01-24 10:10:11 +00:00
Jeff Roberson	353255885c	- Acquire the ufs lock around several ffs_alloc functions that require it. Sponsored By: Isilon Systems, Inc.	2005-01-24 10:09:10 +00:00
Jeff Roberson	8e37fbad3a	- Don't use atomic operations to deal with the active array, instead it is now quite naturally protected by the ufsmount mutex. - Use the ufs lock to protect various fields in struct fs, primarily the cg summary needs protection to avoid allocation races. Several functions have been slightly re-arranged to reduce the number of lock operations. - Adjust several functions (blkfree, freefile, etc.) to accept a ufsmount as an argument so that we may access the ufs lock. Sponsored By: Isilon Systems, Inc.	2005-01-24 10:08:35 +00:00
Jeff Roberson	5c77b03eff	- Acquire the ufs lock when manipulating some fields of struct fs. - Change arguments to various ffs functions to match their new prototypes. Sponsored By: Isilon Systems, Inc.	2005-01-24 10:04:22 +00:00
Jeff Roberson	f2aa1113a3	- Mark the struct fs members that require the ufsmount mutex. - Define some macros for manipulating the fs_active bitmap. Sponsored By: Isilon Systems, Inc.	2005-01-24 10:03:17 +00:00
Jeff Roberson	aaee366929	- Change some function parameters so that the ufsmount structure is accessible in places where the ufs lock will be needed. Sponsored By: Isilon Systems, Inc.	2005-01-24 10:02:11 +00:00
Jeff Roberson	751d0d9fc9	- Add a mutex to the ufsmount structure. This mutex is used to protect any per-instance global data that is not already protected by a buf or vnode lock. Presently, only fields in ffs's struct fs utilize this lock. - Sort some ufsmount members so that fields used for quotas are grouped together. This is in anticipation of quota locking. Sponsored By: Isilon Systems, Inc.	2005-01-24 10:01:10 +00:00
Pawel Jakub Dawidek	39cfb23935	Fix ACLs handling for the root file system. Without this fix, when ACLs are set via tunefs(8) on the root file system, they are removed on boot when 'mount -a' is called, because mount(8) called for the root file system always adds the MNT_UPDATE flag, and MNT_UPDATE handling isn't perfect. Now, one cannot remove ACLs stored in the superblock (configured with tunefs(8)) via 'mount -a' or 'mount -u -o noacls <file system>', but it is still possible to mount a file system that doesn't have ACLs in its superblock via 'mount -o acls <file system>' or /etc/fstab's 'acls' option. Reported by: Lech Lorens/pl.comp.os.bsd Discussed with: phk, rwatson Reviewed by: rwatson MFC after: 2 weeks	2005-01-15 17:09:53 +00:00
Poul-Henning Kamp	7c0745eeae	Eliminate unused and unnecessary "cred" argument from vinvalbuf()	2005-01-14 07:33:51 +00:00
Poul-Henning Kamp	e39db32ab0	Ditch vfs_object_create() and make the callers call VOP_CREATEVOBJECT() directly.	2005-01-13 12:25:19 +00:00
Poul-Henning Kamp	6ef8480a88	Add BO_SYNC() and add a default which uses the secret vnode pointer and VOP_FSYNC() for now.	2005-01-11 10:43:08 +00:00
Poul-Henning Kamp	0391e5a151	Wrap the bufobj operations in macros: BO_STRATEGY() and BO_WRITE()	2005-01-11 09:10:46 +00:00
Poul-Henning Kamp	8df6bac4c7	Remove the unused credential argument from VOP_FSYNC() and VFS_SYNC(). I'm not sure why a credential was added to these in the first place, it is not used anywhere and it doesn't make much sense: The credentials for syncing a file (ability to write to the file) should be checked at the system call level. Credentials for syncing one or more filesystems ("none") should be checked at the system call level as well. If the filesystem implementation needs a particular credential to carry out the syncing it would logically have to use the cached mount credential, or a credential cached along with any delayed write data. Discussed with: rwatson	2005-01-11 07:36:22 +00:00
Warner Losh	60727d8b86	/* -> /*- for license, minor formatting changes	2005-01-07 02:29:27 +00:00
Poul-Henning Kamp	a7e8286f28	white space	2004-12-14 21:35:00 +00:00
Poul-Henning Kamp	59d42685ad	Implement simpler panics for VOP_{read,write} on fifos.	2004-12-14 21:30:45 +00:00
Warner Losh	7a7e867742	LINT defines things which compile in code that was referring to the old a_desc element. Change this to the new a_gen.a_desc to reflect changes to vnode_if.h generation. Noticed by: tinderbox, phk	2004-12-13 17:53:20 +00:00
Poul-Henning Kamp	4a18054d7b	With the introduction of UFS2 we started looking for superblocks in four different locations on a prospective filesystem. If we found none, we forgot to invalidate the four buffers, thus the following sequence would fail: (md0 = blank disk) mount /dev/md0 /mnt (fails, no superblocks) newfs /dev/md0 (writes using physio which does not go through buffercache). mount /dev/md0 /mnt (still fails, the four cached buffers still contain no superblocks) Found by: ru	2004-12-12 14:19:11 +00:00
Marcel Moolenaar	9effe51e45	Revert previous commit. The null-pointer function call (a dereference on ia64) was not the result of a change in the vector operations. It was caused by the NFS locking code using a FIFO and thus bypassing the vnode. This indirectly caused the panic. The NFS locking code has been changed. Requested by: phk	2004-12-11 23:05:30 +00:00
Kirk McKusick	364ed814e7	Fixes a bug that caused UFS2 filesystems bigger than 2TB to prematurely report that they were full and/or to panic the kernel with the message ``ffs_clusteralloc: allocated out of group''. Submitted by: Henry Whincup <henry@jot.to> MFC after: 1 week	2004-12-09 21:24:00 +00:00
Poul-Henning Kamp	8f25bad356	Fix snapshot creation.	2004-12-08 11:54:06 +00:00
Poul-Henning Kamp	f21cc2cafc	Fix nfs exports (for now). The real fix is to teach mountd about nmount.	2004-12-07 15:09:30 +00:00
Poul-Henning Kamp	20a92a18f1	The remaining part of nmount/omount/rootfs mount changes. I cannot sensibly split the conversion of the remaining three filesystems out from the root mounting changes, so in one go: cd9660: Convert to nmount. Add omount compat shims. Remove dedicated rootfs mounting code. Use vfs_mountedfrom() Rely on vfs_mount.c calling VFS_STATFS() nfs(client): Convert to nmount (the simple way, mount_nfs(8) is still necessary). Add omount compat shims. Drop COMPAT_PRELITE2 mount arg compatibility. ffs: Convert to nmount. Add omount compat shims. Remove dedicated rootfs mounting code. Use vfs_mountedfrom() Rely on vfs_mount.c calling VFS_STATFS() Remove vfs_omount() method, all filesystems are now converted. Remove MNTK_WANTRDWR, handling RO/RW conversions is a filesystem task, and they all do it now. Change rootmounting to use DEVFS trampoline: vfs_mount.c: Mount devfs on /. Devfs needs no 'from' so this is clean. symlink /dev to /. This makes it possible to lookup /dev/foo. Mount "real" root filesystem on /. Surgically move the devfs mountpoint from under the real root filesystem onto /dev in the real root filesystem. Remove now unnecessary getdiskbyname(). kern_init.c: Don't do devfs mounting and rootvnode assignment here, it was already handled by vfs_mount.c. Remove now unused bdevvp(), addaliasu() and addalias(). Put the few necessary lines in devfs where they belong. This eliminates the second-last source of bogo vnodes, leaving only the lemming-syncer. Remove rootdev variable, it doesn't give meaning in a global context and was not trustworthy anyway. Correct information is provided by statfs(/).	2004-12-07 08:15:41 +00:00
Poul-Henning Kamp	743312367a	VFS_STATFS(mp, ...) is mostly called with &mp->mnt_stat, but a few callers don't. Most of the implementations have grown weeds for this, so they copy some fields from mnt_stat if the passed argument isn't that. Fix this the cleaner way: Always call the implementation on mnt_stat and copy that in toto to the VFS_STATFS argument if different.	2004-12-05 22:41:02 +00:00
Marcel Moolenaar	061f5ec825	Fix null-pointer indirect function calls introduced in the previous commit. In the new world order, the transitive closure on the vector operations is not precomputed. As such, it's unsafe to actually use any of the function pointers in an indirect function call. They can be null, and we need to use the default vector in that case. This is mostly a quick fix for the four function pointers that are ed explicitly. A more generic or scalable solution is likely to see the light of day. No pathos on: current@	2004-12-05 22:30:28 +00:00
Poul-Henning Kamp	93e0b506e3	typo in comment.	2004-12-03 20:36:55 +00:00
Poul-Henning Kamp	aec0fb7b40	Back when VOP_* was introduced, we did not have new-style struct initializations but we did have lofty goals and big ideals. Adjust to more contemporary circumstances and gain type checking. Replace the entire vop_t frobbing thing with properly typed structures. The only casualty is that we can not add a new VOP_ method with a loadable module. History has not given us reason to believe this would ever be feasible in the first place. Eliminate in toto VOCALL(), vop_t, VNODEOP_SET() etc. Give coda correct prototypes and function definitions for all vop_()s. Generate a bit more data from the vnode_if.src file: a struct vop_vector and prototype typedefs for all vop methods. Add a new vop_bypass() and make vop_default be a pointer to another struct vop_vector. Remove a lot of vfs_init since vop_vector is ready to use from the compiler. Cast various vop_mumble() to void * with uppercase name, for instance VOP_PANIC, VOP_NULL etc. Implement VCALL() by making vdesc_offset the offsetof() the relevant function pointer in vop_vector. This is disgusting but since the code is generated by a script comparatively safe. The alternative for nullfs etc. would be much worse. Fix up all vnode method vectors to remove casts so they become typesafe. (The bulk of this is generated by scripts)	2004-12-01 23:16:38 +00:00
Poul-Henning Kamp	6fde64c778	Mechanically change prototypes for vnode operations to use the new typedefs.	2004-12-01 12:24:41 +00:00
Poul-Henning Kamp	964ebefd8d	Use system wide no-op vfs_start function.	2004-11-25 09:11:27 +00:00
Jeff Roberson	b646893f0f	- Eliminate the acquisition and release of the bqlock in bremfree() by setting the B_REMFREE flag in the buf. This is done to prevent lock order reversals with code that must call bremfree() with a local lock held. This also reduces overhead by removing two lock operations per buf for fsync() and similar. - Check for the B_REMFREE flag in brelse() and bqrelse() after the bqlock has been acquired so that we may remove ourself from the free-list. - Provide a bremfreef() function to immediately remove a buf from a free-list for use only by NFS. This is done because the nfsclient code overloads the b_freelist queue for its own async. io queue. - Simplify the numfreebuffers accounting by removing a switch statement that executed the same code in every possible case. - getnewbuf() can encounter locked bufs on free-lists once Giant is removed. Remove a panic associated with this condition and delay asserts that inspect the buf until after it is locked. Reviewed by: phk Sponsored by: Isilon Systems, Inc.	2004-11-18 08:44:09 +00:00
Poul-Henning Kamp	9c83534dd8	Make VOP_BMAP return a struct bufobj for the underlying storage device instead of a vnode for it. The vnode_pager does not and should not have any interest in what the filesystem uses for backend. (vfs_cluster doesn't use the backing store argument.)	2004-11-15 09:18:27 +00:00
Poul-Henning Kamp	51ac12ab28	Be prepared to accept NULL mountargs as part of root-mounting.	2004-11-13 13:04:31 +00:00
Poul-Henning Kamp	cf5e414960	Put back the vfs_object_create() calls, they do make a difference when my test-setup does what I want it to instead of what I ask it to. Pointed out by: tegge	2004-11-12 10:27:14 +00:00
Poul-Henning Kamp	40ce27cb57	fix some comments	2004-11-10 06:53:31 +00:00
Poul-Henning Kamp	2e6649198a	Use mount flags instead of NULL path to detect root filesystem mount.	2004-11-09 23:38:10 +00:00
Poul-Henning Kamp	5e2ccaff7a	Stop pretending to have a vm_object backing the underlying disk vnode: it isn't used for anything anywhere and the vnode_pager would explode if we attempted to.	2004-11-09 23:12:45 +00:00
Poul-Henning Kamp	5349c79d75	Properly implement a default version of VOP_GETWRITEMOUNT. Remove improper access to vop_stdgetwritemount() which should and will instead rely on the VOP default path.	2004-11-06 11:41:22 +00:00
Poul-Henning Kamp	40c340aa5d	Don't grab the exclusive bit on a root filesystem until we are willing to mount it. Doing so prevented fsck from being run after a refused mount.	2004-11-04 09:11:22 +00:00
Poul-Henning Kamp	4392001125	Move UFS from DEVFS backing to GEOM backing. This eliminates a bunch of vnode overhead (approx 1-2 % speed improvement) and gives us more control over the access to the storage device. Access counts on the underlying device are not correctly tracked and therefore it is possible to read-only mount the same disk device multiple times: syv# mount -p /dev/md0 /var ufs rw 2 2 /dev/ad0 /mnt ufs ro 1 1 /dev/ad0 /mnt2 ufs ro 1 1 /dev/ad0 /mnt3 ufs ro 1 1 Since UFS/FFS is not a synchronously consistent filesystem (i.e., it caches things in RAM) this is not possible with read-write mounts, and the system will correctly reject this. Details: Add a geom consumer and a bufobj pointer to ufsmount. Eliminate the vnode argument from softdep_disk_prewrite(). Pick the vnode out of bp->b_vp for now. Eventually we should find it through bp->b_bufobj->b_private. In the mountcode, use g_vfs_open() once we have used VOP_ACCESS() to check permissions. When upgrading and downgrading between r/o and r/w do the right thing with GEOM access counts. Remove all the workarounds for not being able to do this with VOP_OPEN(). If we are the root mount, drop the exclusive access count until we upgrade to r/w. This allows fsck of the root filesystem and the MNT_RELOAD to work correctly. Set bo_private to the GEOM consumer on the device bufobj. Change the ffs_ops->strategy function to call g_vfs_strategy() In ufs_strategy() directly call the strategy on the disk bufobj. Same in rawread. In ffs_fsync() we will no longer see VCHR device nodes, so remove code which synced the filesystem mounted on it, in case we came there. I'm not sure this code made sense in the first place since we would have taken the specfs route on such a vnode. Redo the highly bogus readblock() function in the snapshot code to something slightly less bogus: Constructing an uio and using physio was really quite a detour. Instead just fill in a bio and ship it down.	2004-10-29 10:15:56 +00:00
Poul-Henning Kamp	570a7ddaa3	We only support backing UFS/FFS with disks.	2004-10-28 06:19:28 +00:00
Poul-Henning Kamp	a40a512387	Eliminate unnecessary KASSERTS.	2004-10-27 06:45:06 +00:00
Poul-Henning Kamp	93d244fb1a	KASSERT that we only get to prewrite() on writes.	2004-10-26 20:13:49 +00:00
Poul-Henning Kamp	8dd5650594	White space changes. Add missing static.	2004-10-26 20:13:21 +00:00
Poul-Henning Kamp	53389dd64a	Replace single case switch() with if().	2004-10-26 20:12:25 +00:00
Poul-Henning Kamp	b6e2606155	Vertically align comment.	2004-10-26 20:12:00 +00:00
Poul-Henning Kamp	6e77a04170	The island council met and voted buf_prewrite() home. Give ffs its own bufobj->bo_ops vector and create a private strategy routine, (currently misnamed for forwards compatibility), which is just a copy of the generic bufstrategy routine except we call softdep_disk_prewrite() directly instead of through the buf_prewrite() indirection. Teach UFS about the need for softdep_disk_prewrite() and call the function directly in FFS. Remove buf_prewrite() from the default bufstrategy() and from the global bio_ops method vector.	2004-10-26 10:44:10 +00:00
Poul-Henning Kamp	58883a1fe5	Fix syntax errors introduced by last commit. Why isn't DIRECTIO in NOTES/LINT ?	2004-10-26 09:04:20 +00:00
Poul-Henning Kamp	5d9d81e7ea	Put the I/O block size in bufobj->bo_bsize. We keep si_bsize_phys around for now as that is the simplest way to pull the number out of disk device drivers in devfs_open(). The correct solution would be to do an ioctl(DIOCGSECTORSIZE), but the point is probably moot when filesystems sit on GEOM, so don't bother for now.	2004-10-26 07:39:12 +00:00
Poul-Henning Kamp	fae974f156	Degeneralize the per cdev copyonwrite callback. The only possible value is ffs_copyonwrite() and the only place it can be called from is FFS which would never want to call another filesystems copyonwrite method, should one exist, so there is no reason why anything generic should know about this.	2004-10-26 06:25:56 +00:00
Poul-Henning Kamp	156cb26583	Lose the v_dirty* and v_clean* alias macros. Check the count field where we just want to know the full/empty state, rather than using TAILQ_EMPTY() or TAILQ_FIRST().	2004-10-25 09:14:03 +00:00
Poul-Henning Kamp	ee1d0eb330	Remove vnode->v_bsize. This was a dead-end.	2004-10-25 07:50:59 +00:00
Poul-Henning Kamp	b792bebeea	Move the buffer method vector (buf->b_op) to the bufobj. Extend it with a strategy method. Add bufstrategy() which does the usual VOP_SPECSTRATEGY/VOP_STRATEGY song and dance. Rename ibwrite to bufwrite(). Move the two NFS buf_ops to more sensible places, add bufstrategy to them. Add inlines for bwrite() and bstrategy() which call through buf->b_bufobj->b_ops->b_{write,strategy}(). Replace almost all VOP_STRATEGY()/VOP_SPECSTRATEGY() calls with bstrategy().	2004-10-24 20:03:41 +00:00
Poul-Henning Kamp	494eb176e7	Add b_bufobj to struct buf which eventually will eliminate the need for b_vp. Initialize b_bufobj for all buffers. Make incore() and gbincore() take a bufobj instead of a vnode. Make inmem() local to vfs_bio.c Change a lot of VI_[UN]LOCK(bp->b_vp) to BO_[UN]LOCK(bp->b_bufobj) also VI_MTX() to BO_MTX(), Make buf_vlist_add() take a bufobj instead of a vnode. Eliminate other uses of bp->b_vp where bp->b_bufobj will do. Various minor polishing: remove "register", turn panic into KASSERT, use new function declarations, TAILQ_FOREACH_SAFE() etc.	2004-10-22 08:47:20 +00:00
Poul-Henning Kamp	a76d8f4ec9	Move the VI_BWAIT flag into no bo_flag element of bufobj and call it BO_WWAIT Add bufobj_wref(), bufobj_wdrop() and bufobj_wwait() to handle the write count on a bufobj. Bufobj_wdrop() replaces vwakeup(). Use these functions in all relevant places except in ffs_softdep.c where the use of interlocked_sleep() makes this impossible. Rename b_vnbufs to b_bobufs now that we touch all the relevant files anyway.	2004-10-21 15:53:54 +00:00
Robert Watson	60c9762920	Explicitly break out NETA license from Berkeley license to clearly indicate license grant, as well as to indicate that NETA is asserting only two clauses, not four clauses. Requested by: imp	2004-10-20 08:05:02 +00:00
Nate Lawson	894d8d3c03	Fix fsbtodb() for UFS1. This fixes an overflow for file sizes >1 TB, allowing for sizes up to 4 TB. This doesn't affect UFS2 since b is already a 64 bit type, coincidental with daddr_t. Submitted by: bde	2004-10-09 20:16:06 +00:00
Pawel Jakub Dawidek	8d02a378aa	Back out changes which were introduced to delay mounting root file system. Those changes were made on gmirror needs, but now gmirror handles this by itself.	2004-10-05 11:26:43 +00:00
Poul-Henning Kamp	4f116178ba	Remove support for accessing device nodes in UFS/FFS. Device nodes can still be created and exported with NFS.	2004-09-28 13:30:58 +00:00
Poul-Henning Kamp	961da2716b	Give cluster_write() an explicit vnode argument. In the future a struct buf will not automatically point out a vnode for us.	2004-09-27 19:14:10 +00:00
Pawel Jakub Dawidek	5a19f8b0c4	Introduce new /boot/loader.conf variable: root_mount_delay. It can be used to delay mounting the root partition to give GEOM providers a chance to show up. Now, when the needed provider is not present, vfs_rootmount() function will look for it every second and if it can't be found within the defined time, it'll ask for the root device name (before this change it was done immediately). This will allow booting from a gmirror device in degraded mode.	2004-09-23 10:13:18 +00:00
Poul-Henning Kamp	d705e025d0	The getpages VOP was a good stab at getting scatter/gather I/O without too much kernel copying, but it is not the right way to do it, and it is in the way for straightening out the buffer cache. The right way is to pass the VM page array down through the struct bio to the disk device driver and DMA directly into/out of the physical memory. Once the VM/buf thing is sorted out it is next on the list. Retire most of the vnode method ffs_getpages(). It is not clear if what is left shouldn't be in the default implementation, which we now fall back to. Retire specfs_getpages() as well, as it has no users now.	2004-09-19 08:14:55 +00:00
Poul-Henning Kamp	b08c753baa	Do not traverse list of snapshots if there isn't one. Found by: scottl	2004-09-16 17:28:56 +00:00
Poul-Henning Kamp	b85e29f007	Missed a place where snapshots were allocated in my last commit to this file.	2004-09-16 15:58:18 +00:00
Poul-Henning Kamp	67673e6677	Create struct snapdata which contains the snapshot fields from cdev and the previously malloc'ed snapshot lock. Malloc struct snapdata instead of just the lock. Replace snapshot fields in cdev with pointer to snapdata (saves 16 bytes). While here, give the private readblock() function a vnode argument in preparation for moving UFS to access GEOM directly.	2004-09-13 07:29:45 +00:00
Poul-Henning Kamp	883d3c0c07	Remove the buffercache/vnode side of BIO_DELETE processing in preparation for integration of p4::phk_bufwork. In the future, local filesystems will talk to GEOM directly and they will consequently be able to issue BIO_DELETE directly. Since the removal of the fla driver, BIO_DELETE has effectively been a no-op anyway.	2004-09-13 06:50:42 +00:00
Poul-Henning Kamp	1affa3adc8	Create simple function init_va_filerev() for initializing a va_filerev field. Replace three instances of longhaired initialization of va_filerev fields. Added XXX comment wondering why we don't use random bits instead of uptime of the system for this purpose.	2004-09-07 09:17:05 +00:00
Christian S.J. Peron	60088fb7b1	Currently, if the secure level is low enough, system flags can be manipulated by prison root. In 4.x prison root can not manipulate system flags, regardless of the security level. This behavior should remain consistent to avoid any surprises which could lead to security problems for system administrators which give out privileged access to jails. This commit changes suser_cred's flag argument from SUSER_ALLOWJAIL to 0. This will prevent prison root from being able to manipulate system flags on files. This may be a MFC candidate for RELENG_5. Discussed with: cperciva Reviewed by: rwatson Approved by: bmilekic (mentor) PR: kern/70298	2004-08-22 02:03:41 +00:00
John Baldwin	b72ea57f3b	Generalize the UFS bad magic value used to determine when a filesystem has only been partly initialized via newfs(8) so that it applies to both UFS1 and UFS2. Submitted by: "Xin LI" delphij at frontfree dot net MFC: maybe?	2004-08-19 11:09:13 +00:00
David Malone	da126abaf1	When looking for some extra data to include in the hash, use the address of the dirhash, rather than the first sizeof(struct dirhash *) bytes of the structure (which, thankfully, seem to be constant). Submitted by: Ted Unangst <tedu@zeitbombe.org> MFC after: 2 weeks	2004-08-16 10:00:44 +00:00
John-Mark Gurney	ad3b9257c2	Add locking to the kqueue subsystem. This also makes the kqueue subsystem a more complete subsystem, and removes the knowledge of how things are implemented from the drivers. Include locking around filter ops, so a module like aio will know when not to be unloaded if there are outstanding knotes using its filter ops. Currently, it uses the MTX_DUPOK even though it is not always safe to acquire duplicate locks. Witness currently doesn't support the ability to discover if a dup lock is ok (in some cases). Reviewed by: green, rwatson (both earlier versions)	2004-08-15 06:24:42 +00:00
Poul-Henning Kamp	7ac439fec4	use bufdone() not biodone().	2004-08-08 13:23:05 +00:00
Poul-Henning Kamp	5e8c582ac2	Put a version element in the VFS filesystem configuration structure and refuse initializing filesystems with a wrong version. This will aid maintenance activities on the 5-stable branch. s/vfs_mount/vfs_omount/ s/vfs_nmount/vfs_mount/ Name our filesystems' mount functions consistently. Eliminate the nameidata argument to both vfs_mount and vfs_omount. It was originally there to save stack space. A few places abused it to get hold of some credentials to pass around. Effectively it is unused. Reorganize the root filesystem selection code.	2004-07-30 22:08:52 +00:00
Poul-Henning Kamp	d634f69316	Remove global variables rootdevs and rootvp, they are unused as such. Add local rootvp variables as needed. Remove checks for miniroots in the swappartition. We never did that and most of the filesystems could never be used for that, but it had still been copy&pasted all over the place.	2004-07-28 20:21:04 +00:00
Alexander Kabaev	b403319b8d	Avoid using casts as lvalues. Introduce DIP_SET macro which sets proper inode field based on UFS version. Use DIP to read values and DIP_SET to modify them throughout FFS code base.	2004-07-28 06:41:27 +00:00
Colin Percival	56f21b9d74	Rename suser_cred()'s PRISON_ROOT flag to SUSER_ALLOWJAIL. This is somewhat clearer, but more importantly allows for a consistent naming scheme for suser_cred flags. The old name is still defined, but will be removed in a few days (unless I hear any complaints...) Discussed with: rwatson, scottl Requested by: jhb	2004-07-26 07:24:04 +00:00
Poul-Henning Kamp	d8d3d4158b	Make sure to update the mnt_stats before UFS1 extattr tried to do I/O on the device. Otherwise the blocksize is undefined in the buffer cache.	2004-07-14 14:19:32 +00:00
Alfred Perlstein	f257b7a54b	Make VFS_ROOT() and vflush() take a thread argument. This is to allow filesystems to decide based on the passed thread which vnode to return. Several filesystems used curthread, they now use the passed thread.	2004-07-12 08:14:09 +00:00
Marcel Moolenaar	f65de26bf6	Update for the KDB debugger framework: o Make debugging code conditional upon KDB. o Use kdb_backtrace() instead of backtrace(). o Remove inclusion of opt_ddb.h.	2004-07-10 20:45:47 +00:00
Poul-Henning Kamp	c94cd5fc8c	Explicitly initialize vp->v_bsize.	2004-07-07 20:04:06 +00:00
Poul-Henning Kamp	e3c5a7a4dd	When we traverse the vnodes on a mountpoint we need to look out for our cached 'next vnode' being removed from this mountpoint. If we find that it was recycled, we restart our traversal from the start of the list. Code to do that is in all local disk filesystems (and a few other places) and looks roughly like this: MNT_ILOCK(mp); loop: for (vp = TAILQ_FIRST(&mp...); (vp = nvp) != NULL; nvp = TAILQ_NEXT(vp,...)) { if (vp->v_mount != mp) goto loop; MNT_IUNLOCK(mp); ... MNT_ILOCK(mp); } MNT_IUNLOCK(mp); The code which takes vnodes off a mountpoint looks like this: MNT_ILOCK(vp->v_mount); ... TAILQ_REMOVE(&vp->v_mount->mnt_nvnodelist, vp, v_nmntvnodes); ... MNT_IUNLOCK(vp->v_mount); ... vp->v_mount = something; (Take a moment and try to spot the locking error before you read on.) On an SMP system, one CPU could have removed nvp from our mountlist but not yet gotten to assign a new value to vp->v_mount while another CPU simultaneously gets to the top of the traversal loop where it finds that (vp->v_mount != mp) is not true despite the fact that the vnode has indeed been removed from our mountpoint. Fix: Introduce the macro MNT_VNODE_FOREACH() to traverse the list of vnodes on a mountpoint while taking into account that vnodes may be removed from the list as we go. This saves approx 65 lines of duplicated code. Split the insmntque() which potentially moves a vnode from one mount point to another into delmntque() and insmntque() which does just what the names say. Fix delmntque() to set vp->v_mount to NULL while holding the mountpoint lock.	2004-07-04 08:52:35 +00:00
Robert Watson	12ec7658a4	Annotate that we don't check the returned data length from ufs_readdir() because UFS uses fixed-size directory blocks. When using this code with other file systems, such as HFS+, the value of auio.uio_resid will need to be taken into account.	2004-06-24 18:31:23 +00:00
Robert Watson	bb0527fdd3	Remove unnecessary setting of VV_SYSTEM on extended attribute backing files. When this flag is used in our port of this code to Darwin, it caused remarkable pain, and doesn't offer a benefit in FreeBSD.	2004-06-24 18:17:41 +00:00
Robert Watson	00a460dcf4	Protect a non-text comment with a '-'.	2004-06-24 17:45:45 +00:00
Robert Watson	cd39d9b661	White space cleanup: use spaces instead of tabs in variable declarations local to a function. Remove a couple of blank lines in variable declarations. In one case, explicitly test against NULL rather than using a pointer as a boolean directly.	2004-06-24 17:44:14 +00:00
Bruce Evans	8f7c483f5c	Backed out previous commit. The dev_t -> `struct cdev *' changes have lots of errors. Blind substitution of "dev_t foo" by "struct cdev *foo" in comments usually just created an English syntax error (e.g., "struct cdev *changes"), but here it did less than that since the dev_t is a user dev_t.	2004-06-20 03:11:19 +00:00
Jun Kuriyama	86030e4a00	Avoid deadlock which is caused by locking VDIR of parent and VREG of snapshot itself in wrong order. We can skip unlink check of that directory because it must have snapshot in it. Reviewed by: mckusick and current@	2004-06-18 14:35:17 +00:00
Poul-Henning Kamp	89c9c53da0	Do the dreaded s/dev_t/struct cdev */ Bump __FreeBSD_version accordingly.	2004-06-16 09:47:26 +00:00
Julian Elischer	fa88511615	Nice is a property of a process as a whole. I mistakenly moved it to the ksegroup when breaking up the process structure. Put it back in the proc structure.	2004-06-16 00:26:31 +00:00
Stefan Farfeleder	1a5ff9285a	Avoid assignments to cast expressions. Reviewed by: md5 Approved by: das (mentor)	2004-06-08 13:08:19 +00:00
Tim J. Robbins	fa2a4d0595	Move TDF_DEADLKTREAT into td_pflags (and rename it accordingly) to avoid having to acquire sched_lock when manipulating it in lockmgr(), uiomove(), and uiomove_fromphys(). Reviewed by: jhb	2004-06-03 01:47:37 +00:00
Kirill Ponomarev	b4a1d9299a	- Fix typo Approved by: tobez	2004-05-31 16:55:12 +00:00
Ken Smith	4b14cc0205	Upon further review it was decided this piece of the msync(2) fixes was applicable to HEAD, originally it was thought this should only be done in RELENG_4. Implement IO_INVAL in the vnode op for writing by marking the buffer as "no cache". This fix has already been applied to RELENG_4 as Rev. 1.65.2.15 of ufs/ufs/ufs_readwrite.c. Reviewed by: alc, tegge	2004-05-21 12:05:48 +00:00
Ken Smith	83d8045f16	Style fixup in previous commit. Noticed by: bde (thanks!)	2004-05-19 18:06:21 +00:00
Ken Smith	f7dd67d801	Change ffs_realloccg() to set the valid bits for the extended part of the fragment to zero the valid parts of a VM_IO buffer. RE would like this to be part of 4.10-RC3 so this will be MFC-ed immediately. Reviewed by: alc, tegge	2004-05-14 22:00:08 +00:00
Bosko Milekic	451079d4ab	Revert previous change to this file because it breaks some things which compare /etc/fstab entries to results from getfsstat(). The real way to fix this is to make 'ufs2' a recognized filesystem (for real, no beating around the bush). This should fix things like 'umount -a -t ufs' now. Apologies for the previous breakage.	2004-04-29 15:10:42 +00:00
Bosko Milekic	2aebb586db	The previous change to mount(8) to report ufs or ufs2 used libufs, which only works for Charlie root. This change reverts the introduction of libufs and moves the check into the kernel. Since the f_fstypename is the same for both ufs and ufs2, we check fs_magic for presence of ufs2 and copy "ufs2" explicitly instead. Submitted by: Christian S.J. Peron <maneo@bsdpro.com>	2004-04-26 15:13:46 +00:00
Bruce Evans	f679aa45a7	Record where half the bits in this file came from (from ufs_readwrite.c). Damage to history from moving bits was especially large since a repo copy is not feasible for partial files.	2004-04-07 11:21:18 +00:00
Warner Losh	012d41340a	Remove advertising clause from University of California Regents' license, per letter dated July 22, 1999 and irc message from Robert Watson saying that clause 3 can be removed from those files with an NAI copyright that also have only a University of California copyright. Approved by: core, rwatson	2004-04-07 03:47:21 +00:00
John Baldwin	255ec151e6	Fix a paste-o from the buf_prewrite() cleanup commit and check for the MNTK_SUSPEND flag on the correct vnode pointer in softdep_disk_prewrite(). Reviewed by: phk Tested by: kensmith	2004-04-06 19:20:24 +00:00
Maxime Henrion	b1fddb236f	Fix the remaining warnings of growfs(8) on my sparc64 box with WARNS=6. I don't change the WARNS level in the Makefile because I didn't test this on other archs. The fs.h fix was suggested by: marcel Reviewed by: md5(1)	2004-04-03 23:30:59 +00:00
Alexander Kabaev	c355fd5a84	Avoid doing bawrite to initialize inode block while holding cylinder group block locked. If filesystem has any active snapshots, bawrite can come back trying to allocate new snapshot data block from the same cylinder group and cause panic due to recursive lock attempt. PR: 64206 Reviewed by: mckusick Tested by: pjd	2004-03-16 22:06:32 +00:00
Poul-Henning Kamp	ceb58ca58f	When I was a kid my work table was one cluttered mess and cleaning it up was a rather overwhelming task. I soon learned that if you don't know where you're going to store something, at least try to pile it next to something slightly related in the hope that a pattern emerges. Apply the same principle to the ffs/snapshot/softupdates code which has leaked into specfs: Add yet another buf-quasi-method and call it from the only two places I can see it can make a difference and implement the magic in ffs_softdep.c where it belongs. It's not pretty, but at least it's one less layer violated.	2004-03-11 18:50:33 +00:00
Poul-Henning Kamp	4d453ef101	Properly vector all bwrite() and BUF_WRITE() calls through the same path and s/BUF_WRITE()/bwrite()/ since it now does the same as bwrite().	2004-03-11 18:02:36 +00:00
Kirk McKusick	ecef42e1eb	A more accurate test in the new ufs_lock than that in 1.235.	2004-02-23 19:05:05 +00:00
Kirk McKusick	546a1660f0	In the function clear_inodedeps(), a FREE_LOCK() should be called AFTER the call to vn_start_write(), not before it. Otherwise, it is possible to unlock it multiple times if the vn_start_write() fails. Submitted by: Juergen Hannken-Illjes <hannken@eis.cs.tu-bs.de>	2004-02-23 06:56:31 +00:00
Kirk McKusick	6c053cec34	Change UFS from using vop_stdlock to using its own ufs_lock. In ufs_lock, check for attempts to acquire shared locks on snapshot files and change them to be exclusive locks. This change eliminates deadlocks and machine lockups reported in -current since most read requests started using shared lock requests. Submitted by: Jun Kuriyama <kuriyama@imgsrc.co.jp>	2004-02-23 06:40:17 +00:00
Robert Watson	f6a4109212	Update my personal copyrights and NETA copyrights in the kernel to use the "year1-year3" format, as opposed to "year1, year2, year3". This seems to make lawyers happier, but also prevents the lines from getting excessively long as the years start to add up. Suggested by: imp	2004-02-22 00:33:12 +00:00
David Malone	346180de08	Abstract dirhash's locking using macros. This should make it easier to use the same dirhash code on different branches/platforms. Reviewed by: Ted Unangst <tedu@zeitbombe.org> Reviewed by: iedowse MFC after: 3 weeks	2004-02-15 21:39:35 +00:00
Bruce Evans	e9827c6d93	Fixed some style bugs: - don't unlock the vnode after vinvalbuf() only to have to relock it almost immediately. - don't refer to devices classified by vn_isdisk() as block devices.	2004-02-14 04:41:13 +00:00
Bruce Evans	0efb13948d	MFextfs: backed out secondary changes in rev.1.40 that had become just style bugs (a variable that is used only once, and misformattings).	2004-02-13 03:05:12 +00:00
Jun Kuriyama	df1941fb59	Fix style bugs in previous commit. Submitted by: bde	2004-02-13 02:02:06 +00:00
Bruce Evans	8adff5fc12	Fixed some minor style bugs (English usage and formatting of binary operators) in and near revs.1.169-1.170 (open mode bandaid). This (or better a proper fix) should have been done before cloning the bandaid to many other file systems.	2004-02-12 16:52:24 +00:00
Jun Kuriyama	5580f04ab0	Reverse lock order by using local variable. This will shut up "acquiring duplicate lock of same type" message. Reviewed by: mckusick	2004-02-12 08:52:08 +00:00
Bruce Evans	1723bc36ef	Removed more vestiges of vfs_ioopt: - rev.1.42 of ffs_readwrite.c added a special case in ffs_read() for reads that are initially at EOF, and rev.1.62 of ufs_readwrite.c fixed timestamp bugs in it. Removal of most of vfs_ioopt made it just an optimization, and removal of the vm object reference calls made it less than an optimization. It was cloned in rev.1.94 of ufs_readwrite.c as part of cloning ffs_extwrite() although it was always less than an optimization in ffs_extwrite(). - some comments, compound statements and vertical whitespace were vestiges of dead code.	2004-02-11 15:27:26 +00:00
John Baldwin	91d5354a2c	Locking for the per-process resource limits structure. - struct plimit includes a mutex to protect a reference count. The plimit structure is treated similarly to struct ucred in that it is always copy on write, so having a reference to a structure is sufficient to read from it without needing a further lock. - The proc lock protects the p_limit pointer and must be held while reading limits from a process to keep the limit structure from changing out from under you while reading from it. - Various global limits that are ints are not protected by a lock since int writes are atomic on all the archs we support and thus a lock wouldn't buy us anything. - All accesses to individual resource limits from a process are abstracted behind a simple lim_rlimit(), lim_max(), and lim_cur() API that return either an rlimit, or the current or max individual limit of the specified resource from a process. - dosetrlimit() was renamed to kern_setrlimit() to match existing style of other similar syscall helper functions. - The alpha OSF/1 compat layer no longer calls getrlimit() and setrlimit() (it didn't use the stackgap when it should have) but uses lim_rlimit() and kern_setrlimit() instead. - The svr4 compat no longer uses the stackgap for resource limits calls, but uses lim_rlimit() and kern_setrlimit() instead. - The ibcs2 compat no longer uses the stackgap for resource limits. It also no longer uses the stackgap for accessing sysctl's for the ibcs2_sysconf() syscall but uses kernel_sysctl() instead. As a result, ibcs2_sysconf() no longer needs Giant. - The p_rlimit macro no longer exists. Submitted by: mtm (mostly, I only did a few cleanups and catchups) Tested on: i386 Compiled on: alpha, amd64	2004-02-04 21:52:57 +00:00
Alan Cox	bfb7317ebf	Remove unnecessary vm object reference and deallocate calls from ffs_read() and ffs_write(). These calls trace their origins to the dead vfs_ioopt code, first appearing in revision 1.39 of ufs_readwrite.c. Observed by: bde Discussed with: tegge	2004-01-31 05:42:58 +00:00
Andrey A. Chernov	a0036d23a6	Turn uio_resid/uio_offset comments into KASSERTs Reviewed by: bde	2004-01-27 11:28:38 +00:00
Andrey A. Chernov	51cf017614	Copy comment about caller check from ffs_read to ffs_extread, don't check for uio_resid < 0 here too.	2004-01-23 06:00:41 +00:00
Andrey A. Chernov	070f8eefb1	Fix various panic() strings to reflect true function name to allow easy grep. Small code reorganization to look more logical. Copy ffs_write check from prev. commit to ffs_extwrite.	2004-01-23 05:52:31 +00:00
Andrey A. Chernov	bd0cc17757	ffs_read: Replace wrong check returned EFBIG with EOVERFLOW handling from POSIX: 36708 [EOVERFLOW] The file is a regular file, nbyte is greater than 0, the starting position is before the end-of-file, and the starting position is greater than or equal to the offset maximum established in the open file description associated with fildes. ffs_write: Replace u_int64_t cast with uoff_t cast which is more natural for types used. ffs_write & ffs_read: Remove uio_offset and uio_resid checks for negative values, the caller is supposed to do it already. Add comments about it. Reviewed by: bde	2004-01-23 05:38:02 +00:00
Alexander Kabaev	6bd39fe978	Spell magic '16' number as IO_SEQSHIFT.	2004-01-19 20:03:43 +00:00
Alexander Kabaev	291027ce9c	Avoid calling vprint on a vnode while holding its interlock mutex. Move diagnostic printf after vget. This might delay the debug output some, but at least it keeps kernel from exploding if DEBUG_VFS_LOCKS is in effect.	2004-01-04 04:08:34 +00:00
Don Lewis	31c81e4bed	Set fs_ronly to the correct value in ffs_reload() when reloading the file system super block after fsck has repaired the file system. The value of fs_ronly was getting overwritten, which caused ffs_update() to attempt to update inode timestamps even though the file system was still mounted read-only. This fixes the "giving up on N buffers" error that is triggered by running fsck on the root file system and then rebooting without mounting the file system read-write.	2003-12-07 05:16:52 +00:00
Wes Peters	ec52df8eb9	Write the UFS2 superblock with a 'BAD' magic number at the beginning of newfs, to signify the newfs operation has not yet completed. Re- write the superblock with the correct magic number once all of the cylinder groups have been created to show the operation has finished. Sponsored by: St. Bernard Software	2003-11-16 07:08:27 +00:00
Poul-Henning Kamp	00cbe31bd8	Send B_PHYS out to pasture, it no longer serves any function.	2003-11-15 09:28:09 +00:00
Alan Cox	c78b8dfacf	Call free(9) after the vnode interlock is released, avoiding a lock-order reversal.	2003-11-13 03:56:32 +00:00
Kirk McKusick	fde81c7d8e	Update the statfs structure with 64-bit fields to allow accurate reporting of multi-terabyte filesystem sizes. You should build and boot a new kernel BEFORE doing a `make world' as the new kernel will know about binaries using the old statfs structure, but an old kernel will not know about the new system calls that support the new statfs structure. Running an old kernel after a `make world' will cause programs such as `df' that do a statfs system call to fail with a bad system call. Reviewed by: Bruce Evans <bde@zeta.org.au> Reviewed by: Tim Robbins <tjr@freebsd.org> Reviewed by: Julian Elischer <julian@elischer.org> Reviewed by: the hordes of <arch@freebsd.org> Sponsored by: DARPA & NAI Labs.	2003-11-12 08:01:40 +00:00
Alexander Kabaev	ca430f2e92	Remove mntvnode_mtx and replace it with per-mountpoint mutex. Introduce two new macros MNT_ILOCK(mp)/MNT_IUNLOCK(mp) to operate on this mutex transparently. Eventually new mutex will be protecting more fields in struct mount, not only vnode list. Discussed with: jeff	2003-11-05 04:30:08 +00:00
Alexander Kabaev	45d45c6cde	Use VOP_UNLOCK/vrele instead of vput. td was received as a parameter and one cannot be sure it is equal to curthread.	2003-11-03 04:46:19 +00:00
Alexander Kabaev	cb9ddc80ae	Take care not to call vput if thread used in corresponding vget wasn't curthread, i.e. when we receive a thread pointer to use as a function argument. Use VOP_UNLOCK/vrele in these cases. The only case where td != curthread known at the moment is boot() calling sync with thread0 pointer. This fixes the panic on shutdown people have reported.	2003-11-02 04:52:53 +00:00
Alexander Kabaev	492c1e68fb	Temporarily undo parts of the struct mount locking commit by jeff. It is unsafe to hold a mutex across vput/vrele calls. This will be redone when a better locking strategy is agreed upon. Discussed with: jeff	2003-11-01 05:51:54 +00:00
Don Lewis	9f206707a5	Tweak the calculation of minbfree in ffs_dirpref() so that only those cylinder groups that have at least 75% of the average free space per cylinder group for that file system are considered as candidates for the creation of a new directory. The previous formula for minbfree would set it to zero if the file system was more than 75% full, which allowed cylinder groups with no free space at all to be chosen as candidates for directory creation, which resulted in an expensive search for free blocks for each file that was subsequently created in that directory. Modify the calculation of minifree in the same way. Decrease maxcontigdirs as the file system fills to decrease the likelihood that a cluster of directories will overflow the available space in a cylinder group. Reviewed by: mckusick Tested by: kmarx@vicor.com MFC after: 2 weeks	2003-10-31 07:25:06 +00:00
John Baldwin	787f162df6	Move the P_COWINPROGRESS flag from being a per-process p_flag to being a per-thread td_pflag which doesn't require any locks to read or write as it is only read or written by curthread on itself. Glanced at by: mckusick	2003-10-23 21:14:08 +00:00
Tor Egge	f0da6ec99b	Initialize bp->b_offset to the physical offset in partition so GEOM knows where to read from disk.	2003-10-22 18:57:59 +00:00
Poul-Henning Kamp	2c18019f14	DuH! bp->b_iooffset (the spot on the disk), not bp->b_offset (the offset in the file)	2003-10-18 14:10:28 +00:00
Poul-Henning Kamp	4e1694ecaf	Initialize bp->b_offset before calling VOP_[SPEC]STRATEGY()	2003-10-18 11:16:33 +00:00
Kirk McKusick	bd189c8c3e	When expunging unlinked files from a snapshot, skip over holes in the file rather than panicking with "indiracct: botched params". Submitted by: Mark Santcroos <marks@ripe.net>	2003-10-17 13:57:58 +00:00
Jeff Roberson	a844eb934c	- My last commit to this file is still not safe, I believe that it may be due to the recursion in indir_trunc().	2003-10-06 03:28:03 +00:00
Jeff Roberson	8af6a57099	- Reinstate 1.142 this was fixed by 1.144.	2003-10-06 02:39:37 +00:00
Jeff Roberson	69b609a85d	- The VCHR case in ffs_sync() is an unnecessary optimization especially considering how infrequently we access devices via ffs now that we have devfs. Collapse this case with the other case. Obtained from: bde	2003-10-05 22:56:33 +00:00
Jeff Roberson	ab1f917b53	- Further simplify ffs_sync(). The vnode lock is required for UFS_UPDATE() so make the code slightly more uniform. The vnode lock is acquired in all cases and now the only difference between VCHR and other is we call UFS_UPDATE instead of VOP_FSYNC().	2003-10-05 09:42:24 +00:00
Jeff Roberson	cffa37d466	- In ffs_update() assert that either the vnode lock or the XLOCK is held.	2003-10-05 09:39:02 +00:00
Jeff Roberson	2f05568aa8	- Check the XLOCK before inspecting v_data. - Slightly rewrite the fsync loop to be more lock friendly. We must acquire the vnode interlock before dropping the mnt lock. We must also check XLOCK to prevent vclean() races. - Use LK_INTERLOCK in the vget() in ffs_sync to further prevent vclean() races. - Use a local variable to store the results of the nvp == TAILQ_NEXT test so that we do not access the vp after we've vrele()d it. - Add an XXX comment about UFS_UPDATE() not being protected by any lock here. I suspect that it should need the VOP lock.	2003-10-05 07:16:45 +00:00
Jeff Roberson	53938b4a86	- Skip over xvp if XLOCK is set.	2003-10-05 06:48:37 +00:00
Jeff Roberson	5c014b9d6d	- Don't cache_purge() in ufs_reclaim. vclean() does it for us so this is redundant.	2003-10-05 02:45:00 +00:00
Alan Cox	ccf78b6895	Synchronize access to a vm page's valid field using the containing vm object's lock.	2003-10-04 20:38:32 +00:00
Jeff Roberson	cac3558da3	- The VI assert in getdirtybuf() is only valid if we're not on a VCHR vnode. VCHR vnodes don't do background writes. Reported by: kan	2003-10-04 15:57:05 +00:00
Jeff Roberson	04a17687ea	- Increase the scope of the interlock in ffs_reload(). Acquire it before we release the mntvnode_mtx. - Call vgonel() directly instead of going through vrecycle() since we own the interlock now. - Remove a few cases where we locked the interlock just so that we could call VOP_UNLOCK with interlock held.	2003-10-04 14:27:49 +00:00
Jeff Roberson	934914d2ef	- Fix an unlocked call to GETATTR by slightly shuffling the code in ffs_snapshot() around. - Acquire the interlock before releasing the mntvnode_mtx. Use the interlock to protect v_usecount access.	2003-10-04 14:25:45 +00:00
Jeff Roberson	90e1659e41	- Use the VI_LOCK macro in two places where we directly called mtx_lock() before. Direct calls indicated places that needed review and these have now been reviewed.	2003-10-04 14:03:28 +00:00
Jeff Roberson	8f2e9e4388	- Properly acquire the vnode interlock before releasing the mntvnode_mtx. - Use a local variable to store the results of the test to see if the next vnode on the mount list has changed. This is so that we no longer access the vnode after we vput() it.	2003-10-04 14:02:32 +00:00
Jeff Roberson	04c81ad83c	- Remove a mp_fixme() and some locks that weren't necessary. I now understand how this works.	2003-10-04 11:06:43 +00:00
Jeff Roberson	cfd5600c66	- Several of the callers to getdirtybuf() were erroneously changed to pass in a list head instead of a pointer to the first element at the time of the first call. These lists are subject to change, and getdirtybuf() would refetch from the wrong list in some cases. Spotted by: tegge Pointy hat to: me	2003-09-03 04:08:15 +00:00
Jeff Roberson	23efe6dafc	- Backout rev 1.142. This caused a deadlock that I do not understand. More investigation is required.	2003-08-31 11:26:52 +00:00
Jeff Roberson	d919a11d06	- Define a new flag for getblk(): GB_NOCREAT. This flag causes getblk() to bail out if the buffer is not already present. - The buffer returned by incore() is not locked and should not be sent to brelse(). Use getblk() with the new GB_NOCREAT flag to preserve the desired semantics.	2003-08-31 08:50:11 +00:00
Jeff Roberson	a0ebaaddef	- Don't acquire the vnode interlock in drain_output(). Instead, require the caller to acquire it. This permits drain_output() to be done atomically with other operations as well as reducing the number of lock operations. - Assert that the proper locks are held in drain_output(). - Change getdirtybuf() to accept a mutex as an argument. This mutex is used to protect the vnode's buf list and the BKGRDWAIT flag. This lock is dropped when we successfully acquire a buffer and held on return otherwise. These semantics reduce the number of cumbersome cases in calling code. - Pass the mtx from getdirtybuf() into interlocked_sleep() and allow this mutex to be used as the interlock argument to BUF_LOCK() in the LOCKBUF case of interlocked_sleep(). - Change the return value of getdirtybuf() to be the resulting locked buffer or NULL otherwise. This is for callers who pass in a list head that requires a lock. It is necessary since the lock that protects the list head must be dropped in getdirtybuf() so that we don't have a lock order reversal with the buf queues lock in bremfree(). - Adjust all callers of getdirtybuf() to match the new semantics. - Add a comment in indir_trunc() that points at unlocked access to a buf. This may also be one of the last instances of incore() in the tree.	2003-08-31 07:29:34 +00:00
Jeff Roberson	9dbfeb0ae6	- Move BX_BKGRDWAIT and BX_BKGRDINPROG to BV_ and the b_vflags field. - Surround all accesses of the BKGRD{WAIT,INPROG} flags with the vnode interlock. - Don't use the B_LOCKED flag and QUEUE_LOCKED for background write buffers. Check for the BKGRDINPROG flag before recycling or throwing away a buffer. We do this instead because it is not safe for us to move the original buffer to a new queue from the callback on the background write buffer. - Remove the B_LOCKED flag and the locked buffer queue. They are no longer used. - The vnode interlock is used around checks for BKGRDINPROG where it may not be strictly necessary. If we hold the buf lock, a background write will not be started without our knowledge; one may only be completed while we're not looking. Rather than remove the code, document two of the places where this extra locking is done. A pass should be done to verify and minimize the locking later.	2003-08-28 06:55:18 +00:00
Alan Cox	9cf8f2f707	The previous change necessitates the addition of a new #include. Otherwise, there is a compilation warning.	2003-08-18 17:27:08 +00:00
Poul-Henning Kamp	b103854847	Don't use a VOP_*() function on our own vnodes, go directly to the relevant internal function, in this case ufs_bmaparray().	2003-08-17 19:26:03 +00:00
Alan Cox	f6c098e569	Revision 1.44 of ufs/ufs/inode.h has made it necessary to add two new #includes to this file. Otherwise, it doesn't compile.	2003-08-16 06:15:17 +00:00
Poul-Henning Kamp	5c24d6ee26	Eliminate the i_devvp field from the incore UFS inodes, we can get the same value from ip->i_ump->um_devvp. This saves a pointer in the memory copies of inodes, which can easily run into several hundred kilobytes. The extra indirection is unmeasurable in benchmarks. Approved by: mckusick	2003-08-15 20:03:19 +00:00
John Baldwin	8b149b5131	Consistently use the BSD u_int and u_short instead of the SYSV uint and ushort. In most of these files, there was a mixture of both styles and this change just makes them self-consistent. Requested by: bde (kern_ktrace.c)	2003-08-07 15:04:27 +00:00
Robert Watson	2495048579	Now that the central POSIX.1e ACL code implements functions to generate the inode mode from a default ACL and creation mask, implement ufs_sync_inode_from_acl() using acl_posix1e_newfilemode(). Since ACL_OVERRIDE_MASK/ACL_PRESERVE_MASK are defined, we no longer need to explicitly pass in a "preserve_mask" field: this is implicit in the use of POSIX.1e semantics. Note: this change contains a semantic bugfix for new file creation: we now intersect the ACL-generated mode and the cmode requested by the user process. This means permissions on newly created file objects will now be more conservative. In the future, we may want to provide alternative semantics (similar to Solaris and Linux) in which the ACL mask overrides the umask, permitting ACLs to broaden the rights beyond the requested umask. PR: 50148 Reported by: Ritz, Bruno <bruno_ritz@gmx.ch> Obtained from: TrustedBSD Project	2003-08-04 03:29:13 +00:00
Robert Watson	7942b925b8	In ufs_chmod(), use privilege only when required in the following cases: - Setting sticky bit on non-directory - Setting setgid on a file with a group that isn't in the effective or extended groups of the authorizing credential I.e., test the requirement first, then do the privilege test, rather than doing the privilege test regardless of the need for privilege. Obtained from: TrustedBSD Project Sponsored by: DARPA, Network Associates Laboratories	2003-08-04 00:31:01 +00:00
Robert Watson	9080ff25cf	Rename VOP_RMEXTATTR() to VOP_DELETEEXTATTR() for consistency with the kernel ACL interfaces and system call names. Break out UFS2 and FFS extattr delete and list vnode operations from setextattr and getextattr to deleteextattr and listextattr, which cleans up the implementations, and makes the results more readable, and makes the APIs more clear. Obtained from: TrustedBSD Project Sponsored by: DARPA, Network Associates Laboratories	2003-07-28 18:53:29 +00:00
Poul-Henning Kamp	7c89f162bc	Add fdidx argument to vn_open() and vn_open_cred() and pass -1 throughout.	2003-07-27 17:04:56 +00:00
Poul-Henning Kamp	a8d43c90af	Add an "int fd" argument to VOP_OPEN() which in the future will contain the file descriptor number on opens from userland. The index is used rather than a "struct file" since it conveys a bit more information, which may be useful, in particular to fdescfs and /dev/fd/. For now pass -1 all over the place.	2003-07-26 07:32:23 +00:00
Poul-Henning Kamp	b941a2beb7	We just cached the inode pointer, no need to call VTOI() again.	2003-07-04 12:16:33 +00:00
Alan Cox	4e28b22e35	Lock the vm object when freeing pages.	2003-06-15 21:50:38 +00:00
Poul-Henning Kamp	cefb5754dd	Add the same KASSERT to all VOP_STRATEGY and VOP_SPECSTRATEGY implementations to check that the buffer points to the correct vnode.	2003-06-15 18:53:00 +00:00
Robert Watson	44533b1722	Re-implement kernel access control for quotactl() as found in the UFS quota implementation. Push some quite broken access control logic out of ufs_quotactl() into the individual command implementations in ufs_quota.c; fix that logic. Pass in the thread argument to any quotactl command that will need to perform access control. o quotaon() requires privilege (PRISON_ROOT). o quotaoff() requires privilege (PRISON_ROOT). o getquota() requires that: If the type is USRQUOTA, either the effective uid match the requested quota ID, that the unprivileged_get_quota flag be set, or that the thread be privileged (PRISON_ROOT). If the type is GRPQUOTA, require that either the thread be a member of the group represented by the requested quota ID, that the unprivileged_get_quota flag be set, or that the thread be privileged (PRISON_ROOT). o setquota() requires privilege (PRISON_ROOT). o setuse() requires privilege (PRISON_ROOT). o qsync() requires no special privilege (consistent with what was present before, but probably not very useful). Add a new sysctl, security.bsd.unprivileged_get_quota, which when set to a non-zero value, will permit unprivileged users to query user quotas with non-matching uids and gids. Set this to 0 by default to be mostly consistent with the previous behavior (the same for USRQUOTA, but not for GRPQUOTA). Obtained from: TrustedBSD Project Sponsored by: DARPA, Network Associates Laboratories	2003-06-15 06:36:19 +00:00
Poul-Henning Kamp	7652131bee	Initialize struct vfsops C99-sparsely. Submitted by: hmp Reviewed by: phk	2003-06-12 20:48:38 +00:00
David E. O'Brien	f4636c5959	Use __FBSDID().	2003-06-11 06:34:30 +00:00
Robert Watson	1e9e2eb598	Implement ffs_listextattr() by breaking out that logic and special-cased attribute name of "" from ffs_getextattr(). Invoking VOP_GETEXTATTR() with an empty name is now no longer supported; user application compatibility is provided by a system call level compatibility wrapper. We make sure to explicitly reject attempts to set an EA with the name "". Obtained from: TrustedBSD Project Sponsored by: DARPA, Network Associates Laboratories	2003-06-05 05:57:39 +00:00
Robert Watson	bd38ab57a1	Don't special-case handling of the empty string in the UFS1 extended attribute retrieval code: it's no longer special-cased, and is caught by the normal UFS1 EA validity checks (and, in fact, returns the same error, EINVAL). Obtained from: TrustedBSD Project Sponsored by: DARPA, Network Associates Laboratories	2003-06-05 04:58:58 +00:00
Robert Watson	e1249def7d	Return EOPNOTSUPP for attempted EA operations on VCHR vnodes in UFS2; if we permit them to occur, the kernel panics due to our performing EA operations using VOP_STRATEGY on the vnode. This went unnoticed previously because there are very few users of device nodes on UFS2 due to the introduction of devfs. However, this can come up with the Linux compat directories and its hard-coded dev nodes (which will need to go away as we move away from hard-coded device numbers). This can come up if you use EA-intensive features such as ACLs and MAC. The proper fix is pretty complicated, but this band-aid would be an excellent MFC candidate for the release.	2003-06-01 02:42:18 +00:00
Poul-Henning Kamp	61301f74d0	Remove unused variable. Found by: FlexeLint	2003-05-31 19:56:09 +00:00
Poul-Henning Kamp	6280ed26af	Remove unused local variables. Found by: FlexeLint	2003-05-31 18:17:32 +00:00
Poul-Henning Kamp	17a1391990	The IO_NOWDRAIN and B_NOWDRAIN hacks are no longer needed to prevent deadlocks with vnode backed md(4) devices because md now uses a kthread to run the bio requests instead of doing it directly from the bio down path.	2003-05-31 16:42:45 +00:00
Alan Cox	7f758dabbb	Lock the vm object when performing vm_object_page_clean(). Approved by: re (rwatson)	2003-05-18 22:02:51 +00:00
Robert Watson	62d4b85ec1	Jeff added locking assertions that the VV_ flags on vnodes were modified only while holding appropriate vnode locks. This patch slides the lock release for ufs_extattr_enable() to continue to hold the active vnode lock on a backing file until after the flag change; it also acquires a vnode lock when disabling an attribute and hence clearing a flag on the backing vnode. This permits DEBUG_VFS_LOCKS to run UFS1 extended attributes without panicking, as well as preventing a potential race and vnode flag problem. Approved by: re (jhb) Pointed out by: DEBUG_VFS_LOCKS	2003-05-15 21:07:33 +00:00
Alan Cox	ad682c4825	Lock the vm_object on entry to vm_object_vndeallocate().	2003-05-03 20:28:26 +00:00
Tim J. Robbins	3632928957	Do not attempt to free NULL dinodes (i_din1 or i_din2) in ffs_ifree(). These fields can be left as NULL if ffs_vget() allocates an inode but fails before the dinode memory has been allocated. There are two cases when this can occur: when we lose a race and another process has added the inode to the hash, and when reading the inode off disk fails. The bug was observed by Kris on one of the package-building machines. See http://marc.theaimsgroup.com/?l=freebsd-current&m=105172731013411&w=2 In Kris's case, it was the bread() that failed because of a disk error. The alternative to this patch is to ensure that ffs_vget() does not call vput() on an inode that hasn't been properly initialised.	2003-05-01 06:41:59 +00:00
Tim J. Robbins	8d721e877d	Free i_din2 instead of i_din1 in ffs_ifree() on UFS2 filesystems. This is purely a cosmetic change because these members are in a union together.	2003-05-01 06:38:27 +00:00
Mark Murray	51da11a27a	Fix some easy, global, lint warnings. In most cases, this means making some local variables static. In a couple of cases, this means removing an unused variable.	2003-04-30 12:57:40 +00:00
Alexander Kabaev	104a9b7e3e	Deprecate machine/limits.h in favor of new sys/limits.h. Change all in-tree consumers to include <sys/limits.h> Discussed on: standards@ Partially submitted by: Craig Rodrigues <rodrigc@attbi.com>	2003-04-29 13:36:06 +00:00
John Baldwin	a15cc35909	Lock both the proc lock and sched_lock when calling sched_nice since kg_nice is now protected by both. Being protected by both means that other places in the kernel that want to read kg_nice only need one of the two locks.	2003-04-22 20:45:38 +00:00
Jeff Roberson	86711bae9b	- Use the sched_nice() api instead of setting the nice value directly. Tested by: Steve Kargl <sgk@troutmask.apl.washington.edu>	2003-04-12 01:05:19 +00:00
Alan Cox	6134838f99	Sufficient access checks are performed by vmapbuf() that calling useracc() is pointless. Remove the call to useracc(). Don't reinitialize fields that are already initialized by getpbuf(). Reviewed by: tegge	2003-04-06 19:26:30 +00:00
Tor Egge	5e2e6a67c4	Check return value from vmapbuf instead of the function address.	2003-03-27 20:48:34 +00:00
Tor Egge	10dccf8ff2	Eliminate a buffer sleep/wakeup race.	2003-03-27 19:28:11 +00:00
Tor Egge	5bbb806004	Add support for reading directly from file to userland buffer when the O_DIRECT descriptor status flag is set and both offset and length is a multiple of the physical media sector size.	2003-03-26 23:40:42 +00:00
John Baldwin	31566c96f4	Use td->td_ucred instead of td->td_proc->p_ucred.	2003-03-20 21:17:40 +00:00
John Baldwin	2a53bfbe62	Minor fixes to ffs_fserr(): - Assume that curthread is not NULL. It never is in -current. - Use td_ucred instead of p_ucred.	2003-03-20 21:15:54 +00:00
Poul-Henning Kamp	b4b138c27f	Including <sys/stdint.h> is (almost?) universally only to be able to use %j in printfs, so put a nested include in <sys/systm.h> where the printf prototype lives and save everybody else the trouble.	2003-03-18 08:45:25 +00:00
Jeff Roberson	09f11da5a3	- Remove a race between fsync like functions and flushbufqueues() by requiring locked bufs in vfs_bio_awrite(). Previously the buf could have been written out by fsync before we acquired the buf lock if it weren't for giant. The cluster_wbuild() handles this race properly but the single write at the end of vfs_bio_awrite() would not. - Modify flushbufqueues() so there is only one copy of the loop. Pass a parameter in that says whether or not we should sync bufs with deps. - Call flushbufqueues() a second time and then break if we couldn't find any bufs without deps.	2003-03-13 07:19:23 +00:00
Kirk McKusick	34968037b1	Use the appropriate size when zeroing out the unused portion of a snapshot's copy of a superblock. This patch fixes a panic when taking a snapshot of a 4096/512 filesystem. Reported by: Ian Freislich <ianf@za.uu.net> Sponsored by: DARPA & NAI Labs.	2003-03-07 23:49:16 +00:00
Alan Cox	09c80124a3	Remove ENABLE_VFS_IOOPT. It is a long unfinished work-in-progress. Discussed on: arch@	2003-03-06 03:41:02 +00:00
Jeff Roberson	7261f5f68e	- Add a new 'flags' parameter to getblk(). - Define one flag GB_LOCK_NOWAIT that tells getblk() to pass the LK_NOWAIT flag to the initial BUF_LOCK(). This will eventually be used in cases where we want to use a buffer only if it is not currently in use. - Convert all consumers of the getblk() api to use this extra parameter. Reviewed by: arch Not objected to by: mckusick	2003-03-04 00:04:44 +00:00
Nate Lawson	99648386d3	Finish cleanup of vprint() which was begun with changing v_tag to a string. Remove extraneous uses of vop_null, instead deferring to the default op. Rename vnode type "vfs" to the more descriptive "syncer". Fix formatting for various filesystems that use vop_print.	2003-03-03 19:15:40 +00:00
Dag-Erling Smørgrav	521f364b80	More low-hanging fruit: kill caddr_t in calls to wakeup(9) / [mt]sleep(9).	2003-03-02 16:54:40 +00:00
Kirk McKusick	74f3809a19	Change the field used to test whether the superblock has been updated from the filesystem size field to the filesystem maximum blocksize field. The problem is that older versions of growfs updated only the new size field and not the old size field. This resulted in the old (smaller) size field being copied up to the new size field which caused the filesystem to appear to fsck to be badly trashed. This also adds a sanity check to ensure that the superblock is not being updated when the filesystem is mounted read-only. Obviously such an update should never happen. Reported by: Nate Lawson <nate@root.org> Sponsored by: DARPA & NAI Labs.	2003-02-25 23:21:08 +00:00
Jeff Roberson	17661e5ac4	- Add an interlock argument to BUF_LOCK and BUF_TIMELOCK. - Remove the buftimelock mutex and acquire the buf's interlock to protect these fields instead. - Hold the vnode interlock while locking bufs on the clean/dirty queues. This reduces some cases from one BUF_LOCK with a LK_NOWAIT and another BUF_LOCK with a LK_TIMEFAIL to a single lock. Reviewed by: arch, mckusick	2003-02-25 03:37:48 +00:00
David Schultz	9cdb2d4d9d	Expand the reference count on struct dquot to 32 bits. This fixes a panic on large systems where a single user may have more than 64K active or inactive vnodes. PR: 48234 Reviewed by: mike (mentor)	2003-02-24 08:49:59 +00:00
Kirk McKusick	3bf0ed940b	When removing the last item from a non-empty worklist, the worklist tail pointer must be updated. Reported by: Kris Kennaway <kris@obsecurity.org> Sponsored by: DARPA & NAI Labs.	2003-02-24 07:28:41 +00:00
Kirk McKusick	5bb651cb72	This patch fixes a deadlock between the bufdaemon and a process taking a snapshot. As part of taking a snapshot of a filesystem, the kernel builds up a list of the filesystem metadata (such as the cylinder group bitmaps) that are contained in the snapshot. When doing a copy-on-write check, the list is first consulted. If the block being written is found on the list, then the full snapshot lookup can be avoided. Besides providing an important performance speedup this check also avoids a potential deadlock between the code creating the snapshot and the bufdaemon trying to cleanup snapshot related buffers. This fix creates a temporary list containing the key metadata blocks that can cause the deadlock. This temporary list is used between the time that the snapshot is first enabled and the time that the fully complete list is built. Reported by: Attila Nagy <bra@fsn.hu> Sponsored by: DARPA & NAI Labs.	2003-02-22 00:59:34 +00:00
Kirk McKusick	37e2ebfdba	This patch prevents an active filesystem on which a snapshot is being taken from panicking with either "freeing free block" or "freeing free inode". The problem arises when the snapshot code is scanning the filesystem looking for inodes with a reference count of zero (e.g., unlinked but still open) so that it can expunge them from its view. If it encounters a reclaimed vnode and has to restart its scan, then it will panic if it encounters and tries to free an inode that it has already processed. The fix is to check each candidate inode to see if it has already been processed before trying to delete it from the snapshot image. Sponsored by: DARPA & NAI Labs.	2003-02-22 00:29:51 +00:00
Kirk McKusick	d60682c239	This patch fixes a bug in the logical block calculation macros so that they convert to 64-bit values before shifting rather than afterwards. Once fixed, they can be used rather than inline expanded. Sponsored by: DARPA & NAI Labs.	2003-02-22 00:19:26 +00:00
Warner Losh	a163d034fa	Back out M_* changes, per decision of the TRB. Approved by: trb	2003-02-19 05:47:46 +00:00
Kirk McKusick	aca3e4974f	Replace use of random() with arc4random() to provide less guessable values for the initial inode generation numbers in newfs and for newly allocated inode generation numbers in the kernel. Submitted by: Theo de Raadt <deraadt@cvs.openbsd.org> Sponsored by: DARPA & NAI Labs.	2003-02-14 21:31:58 +00:00
Kirk McKusick	50bd54e391	Correct lines incorrectly added to the copyright message. Submitted by: Frank van der Linden <fvdl@wasabisystems.com> Sponsored by: DARPA & NAI Labs.	2003-02-14 00:31:06 +00:00
Jeff Roberson	767b9a529d	- Cleanup unlocked accesses to buf flags by introducing a new b_vflag member that is protected by the vnode lock. - Move B_SCANNED into b_vflags and call it BV_SCANNED. - Create a vop_stdfsync() modeled after spec's sync. - Replace spec_fsync, msdos_fsync, and hpfs_fsync with the stdfsync and some fs specific processing. This gives all of these filesystems proper behavior wrt MNT_WAIT/NOWAIT and the use of the B_SCANNED flag. - Annotate the locking in buf.h	2003-02-09 11:28:35 +00:00


1488 Commits