freebsd-skq

Author	SHA1	Message	Date
Tor Egge	791dd2fade	Use vn_start_secondary_write() and vn_finished_secondary_write() as a replacement for vn_write_suspend_wait() to better account for secondary write processing. Close race where secondary writes could be started after ffs_sync() returned but before the file system was marked as suspended. Detect if secondary writes or softdep processing occurred during vnode sync loop in ffs_sync() and retry the loop if needed.	2006-03-08 23:43:39 +00:00
Tor Egge	3b582b4e72	Eliminate a deadlock when creating snapshots. Blocking vn_start_write() must be called without any vnode locks held. Remove calls to vn_start_write() and vn_finished_write() in vnode_pager_putpages() and add these calls before the vnode lock is obtained to most of the callers that don't already have them.	2006-03-02 22:13:28 +00:00
Jeff Roberson	b9b12498fd	- Acquire lk in softdep_slowdown so that it's owned when we call softdep_speedup(). - Assert that lk is held in softdep_speedup() rather than acquiring it. This avoids a potential lock recursion.	2006-03-02 08:52:53 +00:00
Jeff Roberson	eb2ea10590	- Move softdep from using a global worklist to per-mount worklists. This has many positive effects including improved smp locking, reducing interdependencies between mounts that can lead to deadlocks, etc. - Add the softdep worklist and various counters to the ufsmnt structure. - Add a mount pointer to the workitem and remove mount pointers from the various structures derived from the workitem as they are now redundant. - Remove the poor-man's semaphore protecting softdep_process_worklist and softdep_flushworklist. Several threads may now process the list simultaneously. - Add softdep_waitidle() to block the thread until all pending dependencies being operated on by other threads have been flushed. - Use softdep_waitidle() in unmount and snapshots to block either operation until the fs is stable. - Remove softdep worklist processing from the syncer and move it into the softdep_flush() thread. This thread processes all softdep mounts once each second and when it is called via the new softdep_speedup() when there is a resource shortage. This removes the softdep hook from the kernel and various hacks in header files to support it. Reviewed by/Discussed with: tegge, truckman, mckusick Tested by: kris	2006-03-02 05:50:23 +00:00
Tor Egge	82be0a5a24	Add marker vnodes to ensure that all vnodes associated with the mount point are iterated over when using MNT_VNODE_FOREACH. Reviewed by: truckman	2006-01-09 20:42:19 +00:00
Tor Egge	6c62b2acd0	If the lock passed to getdirtybuf() is the softdep lock then the background write completed wakeup could be missed. Close the race by grabbing the lock normally used for protection of bp->b_xflags. Reviewed by: truckman	2006-01-09 19:32:21 +00:00
Tor Egge	c8c7711d66	Broaden scope of softdep_worklist_busy rwlock protection of softdep processing to avoid some dependencies being missed by softdep_flushworklist(). Reviewed by: truckman	2006-01-09 19:16:56 +00:00
Warner Losh	5c65ae3a88	New option: NO_FFS_SNAPSHOT. I did this in p4 about the same time that NetBSD implemented it independently of them (don't know which one was actually first). This saves about 24k for those times you don't need snapshot support (like when running off a ram disk, or in an embedded environment where size matters).	2006-01-06 04:44:09 +00:00
Xin LI	cd34c8b6a2	Typo.	2005-12-23 15:50:57 +00:00
Craig Rodrigues	b6bd025c35	Fix parsing of atime, clusterr, clusterw, exec, suid, symfollow mount options. Noticed by: Amir Shalem < amir at boom dot org dot il>	2005-11-24 15:06:40 +00:00
Craig Rodrigues	cea903627f	If export mount flag is not passed in, set default parameters for export structure and pass that to vfs_export(). Currently in userland mount(8), an export structure is unconditionally passed in, only for UFS. This is an attempt to move that UFS-specific behavior out of mount(8) and into the UFS filesystem code.	2005-11-20 17:04:50 +00:00
Craig Rodrigues	359d438885	Add more options to ffs_opts, so that vfs_filteropts() will not complain when we pass these options to a UFS filesystem as strings via nmount(): noexec, nosuid, nosymfollow, sync, suiddir	2005-11-19 23:28:19 +00:00
Craig Rodrigues	26f59b6455	- Add parsing for the following existing UFS/FFS mount options in the nmount() callpath via vfs_getopt(), and set the appropriate MNT_* flag: -> acls, async, force, multilabel, noasync, noatime, -> noclusterr, noclusterw, snapshot, update - Allow errmsg as a valid mount option via vfs_getopt(), so we can later add a hook to propagate mount errors back to userspace via vfs_mount_error().	2005-11-18 06:06:10 +00:00
Paul Saab	e1cef62715	Rate limit filesystem full and out of inodes messages to once a second.	2005-10-31 20:33:28 +00:00
Nate Lawson	8680d6985f	Adjust maxfilesize for UFS1 and old 4.4 FFS. For UFS1, increase the limit to (max block - 1) * bsize. For DEV_BSIZE, this doubles the limit from 0.5 TB to 1 TB. For the old 4.4 FFS case, decrease the limit from 0.5 TB to 2 GB - 1. Older systems had a 32 bit off_t so they couldn't access the larger files anyway. Collaboration with: bde	2005-10-21 01:54:00 +00:00
Tor Egge	48c2ac4539	Avoid unintended VMIO on directories and symlinks due to leftover object not having been destroyed.	2005-10-10 19:02:04 +00:00
Tor Egge	4e0cd00988	Adjust totread argument passed to cluster_read() to account for offset not being block aligned.	2005-10-09 21:11:25 +00:00
Tor Egge	9248a8271c	Don't pretend that a failed sync write was succesful.	2005-10-09 20:49:01 +00:00
Tor Egge	021869b542	Reduce probability for a deadlock that can occur when a snapshot inode is updated by a process holding the snapshot lock. Another process updating a different inode in the same inodeblock will do copy on write checks and lock in the opposite direction. The snapshot code force a copy on write of these blocks manually (cf. start of expunge_ufs[12]) and these inode blocks are later put on snapblklist. This partial fix is to 'drain' the relevant ffs_copyonwrite() operation after installing new snapblklist. This is not a 100% solution since a failed block allocation can cause implicit fsync() which might deadlock before the new snapblklist has been installed.	2005-10-09 20:15:15 +00:00
Tor Egge	d4d530da96	Eliminate a deadlock that can occur when a dirty block belonging to a snapshot file is flushed by a process not holding snaplk (e.g. bufdaemon). Another process might hold snaplk and try to access the block due to ffs_copyonwrite processing.	2005-10-09 20:07:51 +00:00
Tor Egge	45f91051da	Eliminate a deadlock that can occur during the cgaccount() processing due to the cg map buffer being held when writing indirect blocks. The process ends up in ffs_copyonwrite(), attempting to get snaplk while holding the cg map buffer lock. Another process might be in ffs_copyonwrite(), trying to allocate a new block for a copy. It would hold snaplk while trying to get the cg map buffer lock. Release the cg map buffer early and use the copy for most of the cgaccount processing to avoid this deadlock.	2005-10-09 20:00:16 +00:00
Tor Egge	17026ff61a	Reduce the probability of low block numbers passed to ffs_snapblkfree() by skipping the call from ffs_snapremove() if the block number is zero. Simplify snapshot locking in ffs_copyonwrite() and ffs_snapblkfree() by using the same locking protocol for low block numbers as for larger block numbers. This removes a lock leak that could happen if vn_lock() succeeded after lockmgr() failed in ffs_snapblkfree(). Check if snapshot is gone before retrying a lock in ffs_copyonwrite().	2005-10-09 19:45:01 +00:00
Tor Egge	c73e9e9c7b	Reinitialize v_type and v_op fields in case vnode has been reused without reclamation. If the vnode previously was a fifo then v_op would point to ffs_fifoops[12] instead of the expected ffs_vnodeops[12], causing a panic at the end of ffsext_strategy.	2005-10-09 19:06:34 +00:00
Don Lewis	448434c3c9	Initialize the inode i_flag field in ffs_valloc() to clean up any stale flag bits left over from before the inode was recycled. Without this change, a leftover IN_SPACECOUNTED flag could prevent softdep_freefile() and softdep_releasefile() from incrementing fs_pendinginodes. Because handle_workitem_freefile() unconditionally decrements fs_pendinginodes, a negative value could be reported at file system unmount time with a message like: unmount pending error: blocks 0 files -3 The pending block count in fs_pendingblocks could also be negative for similar reasons. These errors can cause the data returned by statfs() to be slightly incorrect. Some other cleanup code in softdep_releasefile() could also be incorrectly bypassed. MFC after: 3 days	2005-10-03 21:57:43 +00:00
Don Lewis	460858e9ef	Correct previous commit to fix the sense of the TDP_NORUNNINGBUF check in ffs_copyonwrite() that is a precondition for calling waitrunningbufspace(). Pointed out by: tegge Pointy hat to: truckman MFC after: 3 days	2005-10-01 19:10:48 +00:00
Don Lewis	bd3c2d867d	Un-staticize waitrunningbufspace() and call it before returning from ffs_copyonwrite() if any async writes were launched. Restore the threads previous TDP_NORUNNINGBUF state before returning from ffs_copyonwrite().	2005-09-30 18:07:41 +00:00
Don Lewis	6c8b634f1d	Un-staticize runningbufwakeup() and staticize updateproc. Add a new private thread flag to indicate that the thread should not sleep if runningbufspace is too large. Set this flag on the bufdaemon and syncer threads so that they skip the waitrunningbufspace() call in bufwrite() rather than than checking the proc pointer vs. the known proc pointers for these two threads. A way of preventing these threads from being starved for I/O but still placing limits on their outstanding I/O would be desirable. Set this flag in ffs_copyonwrite() to prevent bufwrite() calls from blocking on the runningbufspace check while holding snaplk. This prevents snaplk from being held for an arbitrarily long period of time if runningbufspace is high and greatly reduces the contention for snaplk. The disadvantage is that ffs_copyonwrite() can start a large amount of I/O if there are a large number of snapshots, which could cause a deadlock in other parts of the code. Call runningbufwakeup() in ffs_copyonwrite() to decrement runningbufspace before attempting to grab snaplk so that I/O requests waiting on snaplk are not counted in runningbufspace as being in-progress. Increment runningbufspace again before actually launching the original I/O request. Prior to the above two changes, the system could deadlock if enough I/O requests were blocked by snaplk to prevent runningbufspace from falling below lorunningspace and one of the bawrite() calls in ffs_copyonwrite() blocked in waitrunningbufspace() while holding snaplk. See <http://www.holm.cc/stress/log/cons143.html>	2005-09-30 01:30:01 +00:00
Don Lewis	445193b887	After a rmdir()ed directory has been truncated, force an update of the directory's inode after queuing the dirrem that will decrement the parent directory's link count. This will force the update of the parent directory's actual link to actually be scheduled. Without this change the parent directory's actual link count would not be updated until ufs_inactive() cleared the inode of the newly removed directory, which might be deferred indefinitely. ufs_inactive() will not be called as long as any process holds a reference to the removed directory, and ufs_inactive() will not clear the inode if the link count is non-zero, which could be the result of an earlier system crash. If a background fsck is run before the update of the parent directory's actual link count has been performed, or at least scheduled by putting the dirrem on the leaf directory's inodedep id_bufwait list, fsck will corrupt the file system by decrementing the parent directory's effective link count, which was previously correct because it already took the removal of the leaf directory into account, and setting the actual link count to the same value as the effective link count after the dangling, removed, leaf directory has been removed. This happens because fsck acts based on the actual link count, which will be too high when fsck creates the file system snapshot that it references. This change has the fortunate side effect of more quickly cleaning up the large number dirrem structures that linger for an extended time after the removal of a large directory tree. It also fixes a potential problem with the shutdown of the syncer thread timing out if the system is rebooted immediately after removing a large directory tree. Submitted by: tegge MFC after: 3 days	2005-09-29 21:50:26 +00:00
Robert Watson	5f419982c2	Back out alpha/alpha/trap.c:1.124, osf1_ioctl.c:1.14, osf1_misc.c:1.57, osf1_signal.c:1.41, amd64/amd64/trap.c:1.291, linux_socket.c:1.60, svr4_fcntl.c:1.36, svr4_ioctl.c:1.23, svr4_ipc.c:1.18, svr4_misc.c:1.81, svr4_signal.c:1.34, svr4_stat.c:1.21, svr4_stream.c:1.55, svr4_termios.c:1.13, svr4_ttold.c:1.15, svr4_util.h:1.10, ext2_alloc.c:1.43, i386/i386/trap.c:1.279, vm86.c:1.58, unaligned.c:1.12, imgact_elf.c:1.164, ffs_alloc.c:1.133: Now that Giant is acquired in uprintf() and tprintf(), the caller no longer leads to acquire Giant unless it also holds another mutex that would generate a lock order reversal when calling into these functions. Specifically not backed out is the acquisition of Giant in nfs_socket.c and rpcclnt.c, where local mutexes are held and would otherwise violate the lock order with Giant. This aligns this code more with the eventual locking of ttys. Suggested by: bde	2005-09-28 07:03:03 +00:00
Robert Watson	84d2b7df26	Add GIANT_REQUIRED and WITNESS sleep warnings to uprintf() and tprintf(), as they both interact with the tty code (!MPSAFE) and may sleep if the tty buffer is full (per comment). Modify all consumers of uprintf() and tprintf() to hold Giant around calls into these functions. In most cases, this means adding an acquisition of Giant immediately around the function. In some cases (nfs_timer()), it means acquiring Giant higher up in the callout. With these changes, UFS no longer panics on SMP when either blocks are exhausted or inodes are exhausted under load due to races in the tty code when running without Giant. NB: Some reduction in calls to uprintf() in the svr4 code is probably desirable. NB: In the case of nfs_timer(), calling uprintf() while holding a mutex, or even in a callout at all, is a bad idea, and will generate warnings and potential upset. This needs to be fixed, but was a problem before this change. NB: uprintf()/tprintf() sleeping is generally a bad ideas, as is having non-MPSAFE tty code. MFC after: 1 week	2005-09-19 16:51:43 +00:00
Tor Egge	2f0ffabcf4	Giant is no longer needed here.	2005-09-12 01:21:42 +00:00
Tor Egge	d536ff2edb	Retain generation count when writing zeroes instead of an inode to disk. Don't free a struct inodedep if another process is allocating saved inode memory for the same struct inodedep in initiate_write_inodeblock_ufs[12](). Handle disappearing dependencies in softdep_disk_io_initiation(). Reviewed by: mckusick	2005-09-05 22:14:33 +00:00
Suleiman Souhlal	fdedad764a	ffs_mountfs() needs devvp to be locked, so lock it. Glanced at by: phk Tested by: pjd MFC after: 3 days	2005-09-02 13:52:55 +00:00
Suleiman Souhlal	93373c422f	Set the mountpoint path in the superblock (fs_fsmnt) at mount-time so that it appears in the various messages (not cleanly unmounted, filesystem full, etc). This has been broken since rev 1.261.	2005-08-21 22:06:41 +00:00
Tor Egge	15da51f73c	Don't set the COMPLETE flag in an inodedep structure before the related inode has been written.	2005-08-21 18:19:06 +00:00
Stephan Uphoff	ed8938e082	Delay freeing disk space for file system blocks until all dirty buffers are safely released. This fixes softdep problems on truncation (deletion) of files with dirty buffers. Reviewed by: jeff@, mckusick@, ps@, tegge@ Tested by: glebius@, ps@ MFC after: 3 weeks	2005-07-31 20:24:14 +00:00
Alan Cox	ec9c9e7363	Eliminate inconsistency in the setting of the B_DONE flag. Specifically, make the b_iodone callback responsible for setting it if it is needed. Previously, it was set unconditionally by bufdone() without holding whichever lock is shared by the b_iodone callback and the corresponding top-half function. Consequently, in a race, the top-half function could conclude that operation was done before the b_iodone callback finished. See, for example, aio_physwakeup() and aio_fphysio(). Note: I don't believe that the other, more widely-used b_iodone callbacks are affected. Discussed with: jeff Reviewed by: phk MFC after: 2 weeks	2005-07-20 19:06:06 +00:00
Suleiman Souhlal	679985d03a	Allow EVFILT_VNODE events to work on every filesystem type, not just UFS by: - Making the pre and post hooks for the VOP functions work even when DEBUG_VFS_LOCKS is not defined. - Moving the KNOTE activations into the corresponding VOP hooks. - Creating a MNTK_NOKNOTE flag for the mnt_kern_flag field of struct mount that permits filesystems to disable the new behavior. - Creating a default VOP_KQFILTER function: vfs_kqfilter() My benchmarks have not revealed any performance degradation. Reviewed by: jeff, bde Approved by: rwatson, jmg (kqueue changes), grehan (mentor)	2005-06-09 20:20:31 +00:00
Jeff Roberson	204ec66d38	- Don't set our bio op to be a READ when we've just completed a write. There are subtle differences in the read and write completion path. Instead, grab an extra write ref so the write path can drop it when we recursively call bufdone(). I believe this may be the source of the wrong bufobj panics. Reported by: pho, kkenn	2005-05-30 07:04:15 +00:00
Jeff Roberson	6c71a2208d	- Don't restrict the softdep stats to DEBUG kernels, they cost nothing to export. This was happening anyway since this file manually sets DEBUG. - Add a sysctl for the number of items on the worklist. - Use a more canonical loop restart in softdep_fsync_mountdev, it saves some code at the expense of a goto and makes me worry less about modifying a variable that should be private to the TAILQ_FOREACH_SAFE macro.	2005-05-03 11:03:29 +00:00
Jeff Roberson	2524c26de8	- Use bdone() directly instead of calling it indirectly through ffs_rawreaddone(). Sponsored by: Isilon Systems, Inc.	2005-04-30 11:28:19 +00:00
Jeff Roberson	ece9473efa	- Consistently call 'vp' vp rather than ovp sometimes in ffs_truncate(). Do the same for oip. Pointed out by: glebius	2005-04-05 08:49:41 +00:00
Jeff Roberson	bcc8f66c8b	- Use M_ZERO rather than explicitly calling bzero(). - Don't intermingle direct calls to lockmgr and indirect calls through VOPs. This will be important in the future. - Dont lock the devvp's interlock just to release it on the next line by passing LK_INTERLOCK to lockmgr. - Restructure ffs_snapshot_unmount so we don't call free() with the devvp's interlock locked.	2005-04-03 12:03:44 +00:00
Jeff Roberson	41d4783d49	- In ffs_sync we need to pass LK_SLEEPFAIL in when we lock the vnode because it may change identities while we're sleeping on the lock. Otherwise we may bail out of ffs_sync() early due to an error from deadfs. - Collapse a VOP_UNLOCK, vrele into a single vput().	2005-04-03 10:38:18 +00:00
Jeff Roberson	153910e0f5	- Move the contents of softdep_disk_prewrite into ffs_geom_strategy to fix two bugs. - ffs_disk_prewrite was pulling the vp from the buf and checking for COPYONWRITE, when really it wanted the vp from the bufobj that we're writing to, which is the devvp. This lead to us skipping the copy on write to all file data, which significantly broke snapshots for the last few months. - When the SOFTUPDATES option was not included in the kernel config we would also skip the copy on write check, which would effectively disable snapshots. - Remove an invalid mp_fixme(). Debugging tips from: mckusick Reported by: iedowse, others Discussed with: phk	2005-04-03 10:29:55 +00:00
Jeff Roberson	aa7ba42796	- FFS supports shared locks, clear LK_NOSHARE from our vnode locks. Sponsored by: Isilon Systems, Inc.	2005-03-31 05:23:20 +00:00
Jeff Roberson	ec3db02a3e	- Set LK_NOSHARE for snapshot locks. snapshots require exclusive only access. - Remove the hack from ffs_lock() to implement LK_NOSHARE in a ffs specific way. Sponsored by: Isilon Systems, Inc.	2005-03-31 05:21:17 +00:00
Jeff Roberson	f247a5240d	- LK_NOPAUSE is a nop now. Sponsored by: Isilon Systems, Inc.	2005-03-31 04:37:09 +00:00
Jeff Roberson	d6919865fa	- Upgrade a shared lock request to exclusive in ffs_vget() if we have to create the vnode. Sponsored by: Isilon Systems, Inc.	2005-03-29 10:10:51 +00:00
David Schultz	188f6433f6	When the softupdates worklist gets too long, threads that attempt to add more work are forced to process two worklist items first. However, processing an item may generate additional work, causing the unlucky thread to recursively process the worklist. Add a per-thread flag to detect this situation and avoid the recursion. This should fix the stack overflows that could occur while removing large directory trees. Tested by: kris Reviewed by: mckusick	2005-03-25 17:30:31 +00:00
Poul-Henning Kamp	51f5ce0c8c	Add two arguments to the vfs_hash() KPI so that filesystems which do not have unique hashes (NFS) can also use it.	2005-03-16 11:20:51 +00:00
Poul-Henning Kamp	de68347b1b	Don't hold a reference on the disk vnode for each inode.	2005-03-15 20:50:58 +00:00
Poul-Henning Kamp	45c26fa2b6	Improve the vfs_hash() API: vput() the unneeded vnode centrally to avoid replicating the vput in all the filesystems.	2005-03-15 20:00:03 +00:00
Poul-Henning Kamp	e82ef95c11	Simplify the vfs_hash calling convention.	2005-03-15 08:07:07 +00:00
Poul-Henning Kamp	14bc0685ac	Use vfs_hash instead of home-rolled.	2005-03-14 10:21:16 +00:00
Jeff Roberson	9cbe5da9d5	- It is not legal to access v_data without the vnode lock or interlock held. Grab the vnode interlock if LK_INTERLOCK has not been passed in so that we can inspect v_data in ffs_lock(). Sponsored by: Isilon Systems, Inc.	2005-03-13 12:04:12 +00:00
Jeff Roberson	fe68abe291	- The VI_DOOMED flag now signals the end of a vnode's relationship with the filesystem. Check that rather than VI_XLOCK. - Shorten ffs_reload by one step. The old check for an inactive vnode was slightly racey, and the code which deals with still active vnodes is not much more expensive. Sponsored by: Isilon Systems, Inc.	2005-03-13 12:03:14 +00:00
Jeff Roberson	fdcc82276e	- The VI_DOOMED flag now signals the end of a vnode's relationship with the filesystem. Check that rather than VI_XLOCK. Sponsored by: Isilon Systems, Inc.	2005-03-13 12:01:50 +00:00
Jeff Roberson	b5411d4fcb	- Fix an assert now that the XLOCK no longer exists. Sponsored by: Isilon Systems, Inc.	2005-03-13 12:00:41 +00:00
Jeff Roberson	41766826eb	- Fix anoter dyslexic moment; an atomic_set_int should've become ACTIVESET, not ACTIVECLEAR. Submitted by: iedowse	2005-03-01 07:38:45 +00:00
Jeff Roberson	1a4a9672f1	- Add VOP locking asserts in several functions that have been implicated in recent deadlocks.	2005-02-22 23:56:42 +00:00
Xin LI	a16baf37b9	The recomputation of file system summary at mount time can be a very slow process, especially for large file systems that is just recovered from a crash. Since the summary is already re-sync'ed every 30 second, we will not lag behind too much after a crash. With this consideration in mind, it is more reasonable to transfer the responsibility to background fsck, to reduce the delay after a crash. Add a new sysctl variable, vfs.ffs.compute_summary_at_mount, to control this behavior. When set to nonzero, we will get the "old" behavior, that the summary is computed immediately at mount time. Add five new sysctl variables to adjust ndir, nbfree, nifree, nffree and numclusters respectively. Teach fsck_ffs about these API, however, intentionally not to check the existence, since kernels without these sysctls must have recomputed the summary and hence no adjustments are necessary. This change has eliminated the usual tens of minutes of delay of mounting large dirty volumes. Reviewed by: mckusick MFC After: 1 week	2005-02-20 08:02:15 +00:00
Poul-Henning Kamp	dfd4be14bd	Try to unbreak the vnode locking around vop_reclaim() (based mostly on patch from kan@). Pull bufobj_invalbuf() out of vinvalbuf() and make g_vfs call it on close. This is not yet a generally safe function, but for this very specific use it is safe. This solves the problem with buffers not being flushed by unmount or after failed mount attempts.	2005-02-19 11:44:57 +00:00
Xin LI	d5128ab2af	When clearing a fragment, it's possible that the length is zero. Reviewed by: mckusick MFC After: 1 week	2005-02-19 07:31:33 +00:00
Poul-Henning Kamp	1121c39497	Make non-SOFTUPDATES kernels compile again. Integrate the stubfile into the main file now that license issues have been long resolved.	2005-02-11 08:13:31 +00:00
Poul-Henning Kamp	adf4157738	Make a some SYSCTL_NODEs and some of FFS's VFS_ methods static.	2005-02-10 12:20:08 +00:00
Jeff Roberson	a3caf16e99	- In the softupdates case for ffs_truncate() we use vinvalbuf() to invalidate pending io and dependencies. However, vinvalbuf() rightfully does not call vnode_pager_setsize() for us. We must do this here. This could potentially have caused numerous kinds of bugs, but it was specifically causing msync() deadlocks because msync() was writing flushing pages that should not have been valid. Sponsored by: Isilon Systems, Inc. Reported by: kkenn	2005-02-09 23:05:20 +00:00
Poul-Henning Kamp	365b18aa89	style polishing.	2005-02-09 12:22:16 +00:00
Poul-Henning Kamp	02f2c6a9d8	Split the vop_vector for ffs1 and ffs2, this is mostly for the different EXTATTR support.	2005-02-08 21:03:52 +00:00
Poul-Henning Kamp	44787ceb0b	Use ffs_truncate() directly instead of UFS_TRUNCATE()	2005-02-08 20:51:00 +00:00
Poul-Henning Kamp	dd19a799b8	Background writes are entirely an FFS/Softupdates thing. Give FFS vnodes a specific bufwrite method which contains all the background write stuff and then calls into the default bufwrite() for the rest of the job. Remove all the background write related stuff from the normal bufwrite. This drags the softdep_move_dependencies() back into FFS. Long term, it is worth looking at simply copying the data into allocated memory and issuing the bio directly and not create the "shadow buf" in the first place (just like copy-on-write is done in snapshots for instance). I don't think we really gain anything but complexity from doing this with a buf.	2005-02-08 20:29:10 +00:00
Poul-Henning Kamp	88e5b12a20	Drag another softupdates tentacle back into FFS: Now that FFS's vop_fsync is separate from the internal use we can do the full job there.	2005-02-08 18:09:11 +00:00
Poul-Henning Kamp	efd6d9808c	Don't use the UFS_* and VFS_* functions where a direct call is possble. The UFS_ functions are for UFS to call back into VFS. The VFS functions are external entry points into the filesystem.	2005-02-08 17:40:01 +00:00
Poul-Henning Kamp	40854ff546	For snapshots we need all VOP_LOCKs to be exclusive. The "business class upgrade" was implemented in UFS's VOP_LOCK implementation ufs_lock() which is the wrong layer, so move it to ffs_lock(). Also, as long as we have not abandonned advanced vfs-stacking we should not preclude it from happening: instead of implementing a copy locally, use the VOP_LOCK_APV(&ufs) to correctly arrive at vop_stdlock() at the bottom.	2005-02-08 16:25:50 +00:00
Poul-Henning Kamp	d6f622cc2f	For snapshots we need all VOP_LOCKs to be exclusive. The "business class upgrade" was implemented in UFS's VOP_LOCK implementation ufs_lock() which is the wrong layer, so move it to ffs_lock(). Also, as long as we have not abandonned advanced vfs-stacking we should not preclude it from happening: instead of implementing a copy locally, use the VOP_LOCK_APV(&ufs) to correctly arrive at vop_stdlock() at the bottom.	2005-02-08 15:54:30 +00:00
Poul-Henning Kamp	32a870da8a	Use VOP_STRATEGY_APV() instead of direct dereference, this is more correct.	2005-02-08 15:40:11 +00:00
Jeff Roberson	9087d86e66	- Use a seperate malloc tag for saved inode contents to help in debugging memory modified after free errors. Sponsored by: Isilon Systems, Inc.	2005-02-02 20:30:47 +00:00
Poul-Henning Kamp	84a6975215	Introduce and use g_vfs_close().	2005-01-25 15:52:04 +00:00
Poul-Henning Kamp	8516dd18e1	Don't use VOP_GETVOBJECT, use vp->v_object directly.	2005-01-25 00:40:01 +00:00
Poul-Henning Kamp	ce12d37e7b	Don't create vnode_pager objects for the disk device. geom_vfs will do that.	2005-01-24 22:41:59 +00:00
Jeff Roberson	08023360a0	- Convert the global LK lock to a mutex. - Expand the scope of lk to cover not only interrupt races, but also top-half races, which includes many new uses over global top-half only data. - Get rid of interlocked_sleep() and use msleep or BUF_LOCK where appropriate. - Use the lk mutex in place of the various hand rolled semaphores. - Stop dropping the lk lock before we panic. - Fix getdirtybuf() callers so that they reacquire access to whatever softdep datastructure they were inxpecting in the failure/retry case. Previously, sleeps in getdirtybuf() could leave us with pointers to bad memory. - Update handling of ffs to be compatible with ffs locking changes. Sponsored By: Isilon Systems, Inc.	2005-01-24 10:18:31 +00:00
Jeff Roberson	3ba649d792	- Initialize and destroy the per-filesystem ufs lock where appropriate. - Use the buffer lock on the superblock buf to serialize calls to sbupdate. - Set the MNTK_MPSAFE flag when QUOTA is not defined in the kernel. Sponsored By: Isilon Systems, Inc.	2005-01-24 10:12:28 +00:00
Jeff Roberson	dec351f69e	- Remove GIANT_REQUIRED where giant is no longer required. Sponsored By: Isilon Systems, Inc.	2005-01-24 10:10:47 +00:00
Jeff Roberson	5cef9d6add	- Use the ufs lock to protect fs_active. Sponsored By: Isilon Systems, Inc.	2005-01-24 10:10:11 +00:00
Jeff Roberson	353255885c	- Acquire the ufs lock around several ffs_alloc functions that require it. Sponsored By: Isilon Systems, Inc.	2005-01-24 10:09:10 +00:00
Jeff Roberson	8e37fbad3a	- Don't use atomic operations to deal with the active array, instead it is now quite naturally protected by the ufsmount mutex. - Use the ufs lock to protect various fields in struct fs, primarily the cg summary needs protection to avoid allocation races. Several functions have been slightly re-arranged to reduce the number of lock operations. - Adjust several functions (blkfree, freefile, etc.) to accept a ufsmount as an argument so that we may access the ufs lock. Sponsored By: Isilon Systems, Inc.	2005-01-24 10:08:35 +00:00
Jeff Roberson	5c77b03eff	- Acquire the ufs lock when manipulating some fields of struct fs. - Change arguments to various ffs functions to match their new prototypes. Sponsored By: Isilon Systems, Inc.	2005-01-24 10:04:22 +00:00
Jeff Roberson	f2aa1113a3	- Mark the struct fs members that require the ufsmount mutex. - Define some macros for manipulating the fs_active bitmap. Sponsored By: Isilon Systems, Inc.	2005-01-24 10:03:17 +00:00
Jeff Roberson	aaee366929	- Change some function parameters so that the ufsmount structure is accessable in places where the ufs lock will be needed. Sponsored By: Isilon Systems, Inc.	2005-01-24 10:02:11 +00:00
Pawel Jakub Dawidek	39cfb23935	Fix ACLs handling for the root file system. Without this fix, when ACLs are set via tunefs(8) on the root file system, they are removed on boot when 'mount -a' is called, because mount(8) called for the root file system always add MNT_UPDATE flag and MNT_UPDATE flag isn't perfect. Now, one cannot remove ACLs stored in superblock (configured with tunefs(8)) via 'mount -a' nor 'mount -u -o noacls <file system>', but it is still possible to mount file system which doesn't have ACLs in superblock via 'mount -o acls <file system>' or /etc/fstab's 'acls' option. Reported by: Lech Lorens/pl.comp.os.bsd Discussed with: phk, rwatson Reviewed by: rwatson MFC after: 2 weeks	2005-01-15 17:09:53 +00:00
Poul-Henning Kamp	7c0745eeae	Eliminate unused and unnecessary "cred" argument from vinvalbuf()	2005-01-14 07:33:51 +00:00
Poul-Henning Kamp	e39db32ab0	Ditch vfs_object_create() and make the callers call VOP_CREATEVOBJECT() directly.	2005-01-13 12:25:19 +00:00
Poul-Henning Kamp	6ef8480a88	Add BO_SYNC() and add a default which uses the secret vnode pointer and VOP_FSYNC() for now.	2005-01-11 10:43:08 +00:00
Poul-Henning Kamp	0391e5a151	Wrap the bufobj operations in macros: BO_STRATEGY() and BO_WRITE()	2005-01-11 09:10:46 +00:00
Poul-Henning Kamp	8df6bac4c7	Remove the unused credential argument from VOP_FSYNC() and VFS_SYNC(). I'm not sure why a credential was added to these in the first place, it is not used anywhere and it doesn't make much sense: The credentials for syncing a file (ability to write to the file) should be checked at the system call level. Credentials for syncing one or more filesystems ("none") should be checked at the system call level as well. If the filesystem implementation needs a particular credential to carry out the syncing it would logically have to the cached mount credential, or a credential cached along with any delayed write data. Discussed with: rwatson	2005-01-11 07:36:22 +00:00
Warner Losh	60727d8b86	/* -> /*- for license, minor formatting changes	2005-01-07 02:29:27 +00:00
Poul-Henning Kamp	a7e8286f28	white space	2004-12-14 21:35:00 +00:00
Poul-Henning Kamp	4a18054d7b	With the introduction of UFS2 we started looking for superblocks in four different locations on a prospective filesystem. If we found none, we forgot to invalidate the four buffers, thus the following sequence would fails: (md0 = blank disk) mount /dev/md0 /mnt (fails, no superblocks) newfs /dev/md0 (writes using physio which does not go through buffercache). mount /dev/md0 /mnt (still fails, the four cached buffers still contain no superblocks) Found by: ru	2004-12-12 14:19:11 +00:00
Kirk McKusick	364ed814e7	Fixes a bug that caused UFS2 filesystems bigger than 2TB to prematurely report that they were full and/or to panic the kernel with the message ``ffs_clusteralloc: allocated out of group''. Submitted by: Henry Whincup <henry@jot.to> MFC after: 1 week	2004-12-09 21:24:00 +00:00
Poul-Henning Kamp	8f25bad356	Fix snapshot creation.	2004-12-08 11:54:06 +00:00
Poul-Henning Kamp	f21cc2cafc	Fix nfs exports (for now). The real fix is to teach mountd about nmount.	2004-12-07 15:09:30 +00:00
Poul-Henning Kamp	20a92a18f1	The remaining part of nmount/omount/rootfs mount changes. I cannot sensibly split the conversion of the remaining three filesystems out from the root mounting changes, so in one go: cd9660: Convert to nmount. Add omount compat shims. Remove dedicated rootfs mounting code. Use vfs_mountedfrom() Rely on vfs_mount.c calling VFS_STATFS() nfs(client): Convert to nmount (the simple way, mount_nfs(8) is still necessary). Add omount compat shims. Drop COMPAT_PRELITE2 mount arg compatibility. ffs: Convert to nmount. Add omount compat shims. Remove dedicated rootfs mounting code. Use vfs_mountedfrom() Rely on vfs_mount.c calling VFS_STATFS() Remove vfs_omount() method, all filesystems are now converted. Remove MNTK_WANTRDWR, handling RO/RW conversions is a filesystem task, and they all do it now. Change rootmounting to use DEVFS trampoline: vfs_mount.c: Mount devfs on /. Devfs needs no 'from' so this is clean. symlink /dev to /. This makes it possible to lookup /dev/foo. Mount "real" root filesystem on /. Surgically move the devfs mountpoint from under the real root filesystem onto /dev in the real root filesystem. Remove now unnecessary getdiskbyname(). kern_init.c: Don't do devfs mounting and rootvnode assignment here, it was already handled by vfs_mount.c. Remove now unused bdevvp(), addaliasu() and addalias(). Put the few necessary lines in devfs where they belong. This eliminates the second-last source of bogo vnodes, leaving only the lemming-syncer. Remove rootdev variable, it doesn't give meaning in a global context and was not trustworth anyway. Correct information is provided by statfs(/).	2004-12-07 08:15:41 +00:00
Poul-Henning Kamp	743312367a	VFS_STATFS(mp, ...) is mostly called with &mp->mnt_stat, but a few cases doesn't. Most of the implementations have grown weeds for this so they copy some fields from mnt_stat if the passed argument isn't that. Fix this the cleaner way: Always call the implementation on mnt_stat and copy that in toto to the VFS_STATFS argument if different.	2004-12-05 22:41:02 +00:00
Poul-Henning Kamp	93e0b506e3	typo in comment.	2004-12-03 20:36:55 +00:00
Poul-Henning Kamp	aec0fb7b40	Back when VOP_* was introduced, we did not have new-style struct initializations but we did have lofty goals and big ideals. Adjust to more contemporary circumstances and gain type checking. Replace the entire vop_t frobbing thing with properly typed structures. The only casualty is that we can not add a new VOP_ method with a loadable module. History has not given us reason to belive this would ever be feasible in the the first place. Eliminate in toto VOCALL(), vop_t, VNODEOP_SET() etc. Give coda correct prototypes and function definitions for all vop_()s. Generate a bit more data from the vnode_if.src file: a struct vop_vector and protype typedefs for all vop methods. Add a new vop_bypass() and make vop_default be a pointer to another struct vop_vector. Remove a lot of vfs_init since vop_vector is ready to use from the compiler. Cast various vop_mumble() to void * with uppercase name, for instance VOP_PANIC, VOP_NULL etc. Implement VCALL() by making vdesc_offset the offsetof() the relevant function pointer in vop_vector. This is disgusting but since the code is generated by a script comparatively safe. The alternative for nullfs etc. would be much worse. Fix up all vnode method vectors to remove casts so they become typesafe. (The bulk of this is generated by scripts)	2004-12-01 23:16:38 +00:00
Poul-Henning Kamp	6fde64c778	Mechanically change prototypes for vnode operations to use the new typedefs.	2004-12-01 12:24:41 +00:00
Poul-Henning Kamp	964ebefd8d	Use system wide no-op vfs_start function.	2004-11-25 09:11:27 +00:00
Jeff Roberson	b646893f0f	- Eliminate the acquisition and release of the bqlock in bremfree() by setting the B_REMFREE flag in the buf. This is done to prevent lock order reversals with code that must call bremfree() with a local lock held. This also reduces overhead by removing two lock operations per buf for fsync() and similar. - Check for the B_REMFREE flag in brelse() and bqrelse() after the bqlock has been acquired so that we may remove ourself from the free-list. - Provide a bremfreef() function to immediately remove a buf from a free-list for use only by NFS. This is done because the nfsclient code overloads the b_freelist queue for its own async. io queue. - Simplify the numfreebuffers accounting by removing a switch statement that executed the same code in every possible case. - getnewbuf() can encounter locked bufs on free-lists once Giant is removed. Remove a panic associated with this condition and delay asserts that inspect the buf until after it is locked. Reviewed by: phk Sponsored by: Isilon Systems, Inc.	2004-11-18 08:44:09 +00:00
Poul-Henning Kamp	51ac12ab28	Be prepared to accept NULL mountargs as part of root-mounting.	2004-11-13 13:04:31 +00:00
Poul-Henning Kamp	cf5e414960	Put back the vfs_object_create() calls, they do make a difference when my test-setup does what I want it to instead of what I ask it to. Pointed out by: tegge	2004-11-12 10:27:14 +00:00
Poul-Henning Kamp	40ce27cb57	fix some comments	2004-11-10 06:53:31 +00:00
Poul-Henning Kamp	2e6649198a	Use mount flags instead of NULL path to detect root filesystem mount.	2004-11-09 23:38:10 +00:00
Poul-Henning Kamp	5e2ccaff7a	Stop pretending to have a vm_object backing the underlying disk vnode: it isn't used for anything anywhere and the vnode_pager would explode if we attempted to.	2004-11-09 23:12:45 +00:00
Poul-Henning Kamp	40c340aa5d	Don't grab the exclusive bit on a root filesystem until we are willing to mount it. Doing so prevented fsck to be run after a refused mount.	2004-11-04 09:11:22 +00:00
Poul-Henning Kamp	4392001125	Move UFS from DEVFS backing to GEOM backing. This eliminates a bunch of vnode overhead (approx 1-2 % speed improvement) and gives us more control over the access to the storage device. Access counts on the underlying device are not correctly tracked and therefore it is possible to read-only mount the same disk device multiple times: syv# mount -p /dev/md0 /var ufs rw 2 2 /dev/ad0 /mnt ufs ro 1 1 /dev/ad0 /mnt2 ufs ro 1 1 /dev/ad0 /mnt3 ufs ro 1 1 Since UFS/FFS is not a synchrousely consistent filesystem (ie: it caches things in RAM) this is not possible with read-write mounts, and the system will correctly reject this. Details: Add a geom consumer and a bufobj pointer to ufsmount. Eliminate the vnode argument from softdep_disk_prewrite(). Pick the vnode out of bp->b_vp for now. Eventually we should find it through bp->b_bufobj->b_private. In the mountcode, use g_vfs_open() once we have used VOP_ACCESS() to check permissions. When upgrading and downgrading between r/o and r/w do the right thing with GEOM access counts. Remove all the workarounds for not being able to do this with VOP_OPEN(). If we are the root mount, drop the exclusive access count until we upgrade to r/w. This allows fsck of the root filesystem and the MNT_RELOAD to work correctly. Set bo_private to the GEOM consumer on the device bufobj. Change the ffs_ops->strategy function to call g_vfs_strategy() In ufs_strategy() directly call the strategy on the disk bufobj. Same in rawread. In ffs_fsync() we will no longer see VCHR device nodes, so remove code which synced the filesystem mounted on it, in case we came there. I'm not sure this code made sense in the first place since we would have taken the specfs route on such a vnode. Redo the highly bogus readblock() function in the snapshot code to something slightly less bogus: Constructing an uio and using physio was really quite a detour. Instead just fill in a bio and ship it down.	2004-10-29 10:15:56 +00:00
Poul-Henning Kamp	570a7ddaa3	We only support backing UFS/FFS with disks.	2004-10-28 06:19:28 +00:00
Poul-Henning Kamp	a40a512387	Eliminate unnecessary KASSERTS.	2004-10-27 06:45:06 +00:00
Poul-Henning Kamp	93d244fb1a	KASSERT that we only get to prewrite() on writes.	2004-10-26 20:13:49 +00:00
Poul-Henning Kamp	8dd5650594	White space changes. Add missing static.	2004-10-26 20:13:21 +00:00
Poul-Henning Kamp	6e77a04170	The island council met and voted buf_prewrite() home. Give ffs it's own bufobj->bo_ops vector and create a private strategy routine, (currently misnamed for forwards compatibility), which is just a copy of the generic bufstrategy routine except we call softdep_disk_prewrite() directly instead of through the buf_prewrite() indirection. Teach UFS about the need for softdep_disk_prewrite() and call the function directly in FFS. Remove buf_prewrite() from the default bufstrategy() and from the global bio_ops method vector.	2004-10-26 10:44:10 +00:00
Poul-Henning Kamp	58883a1fe5	Fix syntax errors introduced by last commit. Why isn't DIRECTIO in NOTES/LINT ?	2004-10-26 09:04:20 +00:00
Poul-Henning Kamp	5d9d81e7ea	Put the I/O block size in bufobj->bo_bsize. We keep si_bsize_phys around for now as that is the simplest way to pull the number out of disk device drivers in devfs_open(). The correct solution would be to do an ioctl(DIOCGSECTORSIZE), but the point is probably mooth when filesystems sit on GEOM, so don't bother for now.	2004-10-26 07:39:12 +00:00
Poul-Henning Kamp	fae974f156	Degeneralize the per cdev copyonwrite callback. The only possible value is ffs_copyonwrite() and the only place it can be called from is FFS which would never want to call another filesystems copyonwrite method, should one exist, so there is no reason why anything generic should know about this.	2004-10-26 06:25:56 +00:00
Poul-Henning Kamp	156cb26583	Loose the v_dirty* and v_clean* alias macros. Check the count field where we just want to know the full/empty state, rather than using TAILQ_EMPTY() or TAILQ_FIRST().	2004-10-25 09:14:03 +00:00
Poul-Henning Kamp	ee1d0eb330	Remove vnode->v_bsize. This was a dead-end.	2004-10-25 07:50:59 +00:00
Poul-Henning Kamp	b792bebeea	Move the buffer method vector (buf->b_op) to the bufobj. Extend it with a strategy method. Add bufstrategy() which do the usual VOP_SPECSTRATEGY/VOP_STRATEGY song and dance. Rename ibwrite to bufwrite(). Move the two NFS buf_ops to more sensible places, add bufstrategy to them. Add inlines for bwrite() and bstrategy() which calls through buf->b_bufobj->b_ops->b_{write,strategy}(). Replace almost all VOP_STRATEGY()/VOP_SPECSTRATEGY() calls with bstrategy().	2004-10-24 20:03:41 +00:00
Poul-Henning Kamp	494eb176e7	Add b_bufobj to struct buf which eventually will eliminate the need for b_vp. Initialize b_bufobj for all buffers. Make incore() and gbincore() take a bufobj instead of a vnode. Make inmem() local to vfs_bio.c Change a lot of VI_[UN]LOCK(bp->b_vp) to BO_[UN]LOCK(bp->b_bufobj) also VI_MTX() to BO_MTX(), Make buf_vlist_add() take a bufobj instead of a vnode. Eliminate other uses of bp->b_vp where bp->b_bufobj will do. Various minor polishing: remove "register", turn panic into KASSERT, use new function declarations, TAILQ_FOREACH_SAFE() etc.	2004-10-22 08:47:20 +00:00
Poul-Henning Kamp	a76d8f4ec9	Move the VI_BWAIT flag into no bo_flag element of bufobj and call it BO_WWAIT Add bufobj_wref(), bufobj_wdrop() and bufobj_wwait() to handle the write count on a bufobj. Bufobj_wdrop() replaces vwakeup(). Use these functions all relevant places except in ffs_softdep.c where the use if interlocked_sleep() makes this impossible. Rename b_vnbufs to b_bobufs now that we touch all the relevant files anyway.	2004-10-21 15:53:54 +00:00
Robert Watson	60c9762920	Explicitly break out NETA license from Berkeley license to clearly indicate license grant, as well as to indicate that NETA is asserting only two clauses, not four clauses. Requested by: imp	2004-10-20 08:05:02 +00:00
Nate Lawson	894d8d3c03	Fix fsbtodb() for UFS1. This fixes an overflow for file sizes >1 TB, allowing for sizes up to 4 TB. This doesn't affect UFS2 since b is already a 64 bit type, coincidental with daddr_t. Submitted by: bde	2004-10-09 20:16:06 +00:00
Pawel Jakub Dawidek	8d02a378aa	Back out changes which were introduced to delay mounting root file system. Those changes were made on gmirror needs, but now gmirror handles this by itself.	2004-10-05 11:26:43 +00:00
Poul-Henning Kamp	4f116178ba	Remove support for accessing device nodes in UFS/FFS. Device nodes can still be created and exported with NFS.	2004-09-28 13:30:58 +00:00
Poul-Henning Kamp	961da2716b	Give cluster_write() an explicit vnode argument. In the future a struct buf will not automatically point out a vnode for us.	2004-09-27 19:14:10 +00:00
Pawel Jakub Dawidek	5a19f8b0c4	Introduce new /boot/loader.conf variable: root_mount_delay. It can be used to delay mounting root partition to give a chance to GEOM providers to show up. Now, when there is no needed provider, vfs_rootmount() function will look for it every second and if it can't be find in defined time, it'll ask for root device name (before this change it was done immediately). This will allow to boot from gmirror device in degraded mode.	2004-09-23 10:13:18 +00:00
Poul-Henning Kamp	d705e025d0	The getpages VOP was a good stab at getting scatter/gather I/O without too much kernel copying, but it is not the right way to do it, and it is in the way for straightening out the buffer cache. The right way is to pass the VM page array down through the struct bio to the disk device driver and DMA directly in to/out off the physical memory. Once the VM/buf thing is sorted out it is next on the list. Retire most of vnode method. ffs_getpages(). It is not clear if what is left shouldn't be in the default implementation which we now fall back to. Retire specfs_getpages() as well, as it has no users now.	2004-09-19 08:14:55 +00:00
Poul-Henning Kamp	b08c753baa	Do not traverse list of snapshots if there isn't one. Found by: scottl	2004-09-16 17:28:56 +00:00
Poul-Henning Kamp	b85e29f007	Missed a place where snapshots were allocated in my last commit to this file.	2004-09-16 15:58:18 +00:00
Poul-Henning Kamp	67673e6677	Create struct snapdata which contains the snapshot fields from cdev and the previously malloc'ed snapshot lock. Malloc struct snapdata instead of just the lock. Replace snapshot fields in cdev with pointer to snapdata (saves 16 bytes). While here, give the private readblock() function a vnode argument in preparation for moving UFS to access GEOM directly.	2004-09-13 07:29:45 +00:00
Poul-Henning Kamp	883d3c0c07	Remove the buffercache/vnode side of BIO_DELETE processing in preparation for integration of p4::phk_bufwork. In the future, local filesystems will talk to GEOM directly and they will consequently be able to issue BIO_DELETE directly. Since the removal of the fla driver, BIO_DELETE has effectively been a no-op anyway.	2004-09-13 06:50:42 +00:00
John Baldwin	b72ea57f3b	Generalize the UFS bad magic value used to determine when a filesystem has only been partly initialized via newfs(8) so that it applies to both UFS1 and UFS2. Submitted by: "Xin LI" delphij at frontfree dot net MFC: maybe?	2004-08-19 11:09:13 +00:00
John-Mark Gurney	ad3b9257c2	Add locking to the kqueue subsystem. This also makes the kqueue subsystem a more complete subsystem, and removes the knowlege of how things are implemented from the drivers. Include locking around filter ops, so a module like aio will know when not to be unloaded if there are outstanding knotes using it's filter ops. Currently, it uses the MTX_DUPOK even though it is not always safe to aquire duplicate locks. Witness currently doesn't support the ability to discover if a dup lock is ok (in some cases). Reviewed by: green, rwatson (both earlier versions)	2004-08-15 06:24:42 +00:00
Poul-Henning Kamp	7ac439fec4	use bufdone() not biodone().	2004-08-08 13:23:05 +00:00
Poul-Henning Kamp	5e8c582ac2	Put a version element in the VFS filesystem configuration structure and refuse initializing filesystems with a wrong version. This will aid maintenance activites on the 5-stable branch. s/vfs_mount/vfs_omount/ s/vfs_nmount/vfs_mount/ Name our filesystems mount function consistently. Eliminate the namiedata argument to both vfs_mount and vfs_omount. It was originally there to save stack space. A few places abused it to get hold of some credentials to pass around. Effectively it is unused. Reorganize the root filesystem selection code.	2004-07-30 22:08:52 +00:00
Poul-Henning Kamp	d634f69316	Remove global variable rootdevs and rootvp, they are unused as such. Add local rootvp variables as needed. Remove checks for miniroot's in the swappartition. We never did that and most of the filesystems could never be used for that, but it had still been copy&pasted all over the place.	2004-07-28 20:21:04 +00:00
Alexander Kabaev	b403319b8d	Avoid using casts as lvalues. Introduce DIP_SET macro which sets proper inode field based on UFS version. Use DIP ro read values and DIP_SET to modify them throughout FFS code base.	2004-07-28 06:41:27 +00:00
Colin Percival	56f21b9d74	Rename suser_cred()'s PRISON_ROOT flag to SUSER_ALLOWJAIL. This is somewhat clearer, but more importantly allows for a consistent naming scheme for suser_cred flags. The old name is still defined, but will be removed in a few days (unless I hear any complaints...) Discussed with: rwatson, scottl Requested by: jhb	2004-07-26 07:24:04 +00:00
Poul-Henning Kamp	d8d3d4158b	Make sure to update the mnt_stats before UFS1 extattr tried to do I/O on the device. Otherwise the blocksize is undefined in the buffer cache.	2004-07-14 14:19:32 +00:00
Alfred Perlstein	f257b7a54b	Make VFS_ROOT() and vflush() take a thread argument. This is to allow filesystems to decide based on the passed thread which vnode to return. Several filesystems used curthread, they now use the passed thread.	2004-07-12 08:14:09 +00:00
Marcel Moolenaar	f65de26bf6	Update for the KDB debugger framework: o Make debugging code conditional upon KDB. o Use kdb_backtrace() instead of backtrace(). o Remove inclusion of opt_ddb.h.	2004-07-10 20:45:47 +00:00
Poul-Henning Kamp	c94cd5fc8c	Explicity initialize vp->v_bsize.	2004-07-07 20:04:06 +00:00
Poul-Henning Kamp	e3c5a7a4dd	When we traverse the vnodes on a mountpoint we need to look out for our cached 'next vnode' being removed from this mountpoint. If we find that it was recycled, we restart our traversal from the start of the list. Code to do that is in all local disk filesystems (and a few other places) and looks roughly like this: MNT_ILOCK(mp); loop: for (vp = TAILQ_FIRST(&mp...); (vp = nvp) != NULL; nvp = TAILQ_NEXT(vp,...)) { if (vp->v_mount != mp) goto loop; MNT_IUNLOCK(mp); ... MNT_ILOCK(mp); } MNT_IUNLOCK(mp); The code which takes vnodes off a mountpoint looks like this: MNT_ILOCK(vp->v_mount); ... TAILQ_REMOVE(&vp->v_mount->mnt_nvnodelist, vp, v_nmntvnodes); ... MNT_IUNLOCK(vp->v_mount); ... vp->v_mount = something; (Take a moment and try to spot the locking error before you read on.) On a SMP system, one CPU could have removed nvp from our mountlist but not yet gotten to assign a new value to vp->v_mount while another CPU simultaneously get to the top of the traversal loop where it finds that (vp->v_mount != mp) is not true despite the fact that the vnode has indeed been removed from our mountpoint. Fix: Introduce the macro MNT_VNODE_FOREACH() to traverse the list of vnodes on a mountpoint while taking into account that vnodes may be removed from the list as we go. This saves approx 65 lines of duplicated code. Split the insmntque() which potentially moves a vnode from one mount point to another into delmntque() and insmntque() which does just what the names say. Fix delmntque() to set vp->v_mount to NULL while holding the mountpoint lock.	2004-07-04 08:52:35 +00:00
Jun Kuriyama	86030e4a00	Avoid deadlock which is caused by locking VDIR of parent and VREG of snapshot itself in wrong order. We can skip unlink check of that directory because it must have snapshot in it. Reviewed by: mckusick and current@	2004-06-18 14:35:17 +00:00
Poul-Henning Kamp	89c9c53da0	Do the dreaded s/dev_t/struct cdev */ Bump __FreeBSD_version accordingly.	2004-06-16 09:47:26 +00:00
Julian Elischer	fa88511615	Nice, is a property of a process as a whole.. I mistakenly moved it to the ksegroup when breaking up the process structure. Put it back in the proc structure.	2004-06-16 00:26:31 +00:00
Stefan Farfeleder	1a5ff9285a	Avoid assignments to cast expressions. Reviewed by: md5 Approved by: das (mentor)	2004-06-08 13:08:19 +00:00
Tim J. Robbins	fa2a4d0595	Move TDF_DEADLKTREAT into td_pflags (and rename it accordingly) to avoid having to acquire sched_lock when manipulating it in lockmgr(), uiomove(), and uiomove_fromphys(). Reviewed by: jhb	2004-06-03 01:47:37 +00:00
Kirill Ponomarev	b4a1d9299a	- Fix typo Approved by: tobez	2004-05-31 16:55:12 +00:00
Ken Smith	4b14cc0205	Upon further review it was decided this piece of the msync(2) fixes was applicable to HEAD, originally it was thought this should only be done in RELENG_4. Implement IO_INVAL in the vnode op for writing by marking the buffer as "no cache". This fix has already been applied to RELENG_4 as Rev. 1.65.2.15 of ufs/ufs/ufs_readwrite.c. Reviewed by: alc, tegge	2004-05-21 12:05:48 +00:00
Ken Smith	83d8045f16	Style fixup in previous commit. Noticed by: bde (thanks!)	2004-05-19 18:06:21 +00:00
Ken Smith	f7dd67d801	Change ffs_realloccg() to set the valid bits for the extended part of the fragment to zero the valid parts of a VM_IO buffer. RE would like this to be part of 4.10-RC3 so this will be MFC-ed immediately. Reviewed by: alc, tegge	2004-05-14 22:00:08 +00:00
Bosko Milekic	451079d4ab	Revert previous change to this file because it breaks some things which compare /etc/fstab entries to results from getfsstat(). The real way to fix this is to make 'ufs2' a recognized filesystem (for real, no beating around the bush). This should fix things like 'umount -a -t ufs' now. Appologies for the previous breakage.	2004-04-29 15:10:42 +00:00
Bosko Milekic	2aebb586db	The previous change to mount(8) to report ufs or ufs2 used libufs, which only works for Charlie root. This change reverts the introduction of libufs and moves the check into the kernel. Since the f_fstypename is the same for both ufs and ufs2, we check fs_magic for presence of ufs2 and copy "ufs2" explicitly instead. Submitted by: Christian S.J. Peron <maneo@bsdpro.com>	2004-04-26 15:13:46 +00:00
Bruce Evans	f679aa45a7	Record where half the bits in this file came from (from ufs_readwrite.c). Damage to history from moving bits was especially large since a repo copy is not feasible for partial files.	2004-04-07 11:21:18 +00:00
Warner Losh	012d41340a	Remove advertising clause from University of California Regent's license, per letter dated July 22, 1999 and irc message from Robert Watson saying that clause 3 can be removed from those files with an NAI copyright that also have only a University of California copyrights. Approved by: core, rwatson	2004-04-07 03:47:21 +00:00
John Baldwin	255ec151e6	Fix a paste-o from the buf_prewrite() cleanup commit and check for the MNTK_SUSPEND flag on the correct vnode pointer in softdep_disk_prewrite(). Reviewed by: phk Tested by: kensmith	2004-04-06 19:20:24 +00:00
Maxime Henrion	b1fddb236f	Fix the remaining warnings of growfs(8) on my sparc64 box with WARNS=6. I don't change the WARNS level in the Makefile because I didn't tested this on other archs. The fs.h fix was suggested by: marcel Reviewed by: md5(1)	2004-04-03 23:30:59 +00:00
Alexander Kabaev	c355fd5a84	Avoid doing bawrite to initialize inode block while holding cylinder group block locked. If filesystem has any active snapshots, bawrite can come back trying to allocate new snapshot data block from the same cylinder group and cause panic due to recursive lock attempt. PR: 64206 Reviewed by: mckusick Tested by: pjd	2004-03-16 22:06:32 +00:00
Poul-Henning Kamp	ceb58ca58f	When I was a kid my work table was one cluttered mess an cleaning it up were a rather overwhelming task. I soon learned that if you don't know where you're going to store something, at least try to pile it next to something slightly related in the hope that a pattern emerges. Apply the same principle to the ffs/snapshot/softupdates code which have leaked into specfs: Add yet a buf-quasi-method and call it from the only two places I can see it can make a difference and implement the magic in ffs_softdep.c where it belongs. It's not pretty, but at least it's one less layer violated.	2004-03-11 18:50:33 +00:00
Poul-Henning Kamp	4d453ef101	Properly vector all bwrite() and BUF_WRITE() calls through the same path and s/BUF_WRITE()/bwrite()/ since it now does the same as bwrite().	2004-03-11 18:02:36 +00:00
Kirk McKusick	546a1660f0	In the function clear_inodedeps(), a FREE_LOCK() should be called AFTER the call to vn_start_write(), not before it. Otherwise, it is possible to unlock it multiple times if the vn_start_write() fails. Submitted by: Juergen Hannken-Illjes <hannken@eis.cs.tu-bs.de>	2004-02-23 06:56:31 +00:00
Bruce Evans	e9827c6d93	Fixed some style bugs: - don't unlock the vnode after vinvalbuf() only to have to relock it almost immediately. - don't refer to devices classified by vn_isdisk() as block devices.	2004-02-14 04:41:13 +00:00
Bruce Evans	0efb13948d	MFextfs: backed out secondary changes in rev.1.40 that had become just style bugs (a variable that is used only once, and misformattings).	2004-02-13 03:05:12 +00:00
Jun Kuriyama	df1941fb59	Fix style bugs in previous commit. Submitted by: bde	2004-02-13 02:02:06 +00:00
Bruce Evans	8adff5fc12	Fixed some minor style bugs (English usage and formatting of binary operators) in and near revs.1.169-1.170 (open mode bandaid). This (or better a proper fix) should have been done before cloning the bandaid to many other file systems.	2004-02-12 16:52:24 +00:00
Jun Kuriyama	5580f04ab0	Reverse lock order by using local variable. This will shut up "acquiring duplicate lock of same type" message. Reviewed by: mckusick	2004-02-12 08:52:08 +00:00
Bruce Evans	1723bc36ef	Removed more vestiges of vfs_ioopt: - rev.1.42 of ffs_readwrite.c added a special case in ffs_read() for reads that are initially at EOF, and rev.1.62 of ufs_readwrite.c fixed timestamp bugs in it. Removal of most of vfs_ioopt made it just and optimization, and removal of the vm object reference calls made it less than an optimization. It was cloned in rev.1.94 of ufs_readwrite.c as part of cloning ffs_extwrite() although it was always less than an optimization in ffs_extwrite(). - some comments, compound statements and vertical whitespace were vestiges of dead code.	2004-02-11 15:27:26 +00:00
John Baldwin	91d5354a2c	Locking for the per-process resource limits structure. - struct plimit includes a mutex to protect a reference count. The plimit structure is treated similarly to struct ucred in that is is always copy on write, so having a reference to a structure is sufficient to read from it without needing a further lock. - The proc lock protects the p_limit pointer and must be held while reading limits from a process to keep the limit structure from changing out from under you while reading from it. - Various global limits that are ints are not protected by a lock since int writes are atomic on all the archs we support and thus a lock wouldn't buy us anything. - All accesses to individual resource limits from a process are abstracted behind a simple lim_rlimit(), lim_max(), and lim_cur() API that return either an rlimit, or the current or max individual limit of the specified resource from a process. - dosetrlimit() was renamed to kern_setrlimit() to match existing style of other similar syscall helper functions. - The alpha OSF/1 compat layer no longer calls getrlimit() and setrlimit() (it didn't used the stackgap when it should have) but uses lim_rlimit() and kern_setrlimit() instead. - The svr4 compat no longer uses the stackgap for resource limits calls, but uses lim_rlimit() and kern_setrlimit() instead. - The ibcs2 compat no longer uses the stackgap for resource limits. It also no longer uses the stackgap for accessing sysctl's for the ibcs2_sysconf() syscall but uses kernel_sysctl() instead. As a result, ibcs2_sysconf() no longer needs Giant. - The p_rlimit macro no longer exists. Submitted by: mtm (mostly, I only did a few cleanups and catchups) Tested on: i386 Compiled on: alpha, amd64	2004-02-04 21:52:57 +00:00
Alan Cox	bfb7317ebf	Remove unnecessary vm object reference and deallocate calls from ffs_read() and ffs_write(). These calls trace their origins to the dead vfs_ioopt code, first appearing in revision 1.39 of ufs_readwrite.c. Observed by: bde Discussed with: tegge	2004-01-31 05:42:58 +00:00
Andrey A. Chernov	a0036d23a6	Turn uio_resid/uio_offset comments into KASSERTs Reviewed by: bde	2004-01-27 11:28:38 +00:00
Andrey A. Chernov	51cf017614	Copy comment about caller check from ffs_read to ffs_extread, don't check for uio_resid < 0 here too.	2004-01-23 06:00:41 +00:00
Andrey A. Chernov	070f8eefb1	Fix various panic() strings to reflect true function name to allow easy grep. Small code reorganization to look more logic. Copy ffs_write check from prev. commit to ffs_extwrite.	2004-01-23 05:52:31 +00:00
Andrey A. Chernov	bd0cc17757	ffs_read: Replace wrong check returned EFBIG with EOVERFLOW handling from POSIX: 36708 [EOVERFLOW] The file is a regular file, nbyte is greater than 0, the starting position is before the end-of-file, and the starting position is greater than or equal to the offset maximum established in the open file description associated with fildes. ffs_write: Replace u_int64_t cast with uoff_t cast which is more natural for types used. ffs_write & ffs_read: Remove uio_offset and uio_resid checks for negative values, the caller supposed to do it already. Add comments about it. Reviewed by: bde	2004-01-23 05:38:02 +00:00
Alexander Kabaev	6bd39fe978	Spell magic '16' number as IO_SEQSHIFT.	2004-01-19 20:03:43 +00:00
Alexander Kabaev	291027ce9c	Avoid calling vprint on a vnode while holding its interlock mutex. Move diagnostic printf after vget. This might delay the debug output some, but at least it keeps kernel from exploding if DEBUG_VFS_LOCKS is in effect.	2004-01-04 04:08:34 +00:00
Don Lewis	31c81e4bed	Set fs_ronly to the correct value in ffs_reload() when reloading the file system super block after fsck has repaired the file system. The value of fs_ronly was getting overwritten, which caused ffs_update() to attempt to update inode timestamps even though the file system was still mounted read-only. This fixes the "giving up on N buffers" error that is triggered by running fsck on the root file system and then rebooting without mounting the file system read-write.	2003-12-07 05:16:52 +00:00
Wes Peters	ec52df8eb9	Write the UFS2 superblock with a 'BAD' magic number at the beginning of newfs, to signify the newfs operation has not yet completed. Re- write the superblock with the correct magic number once all of the cylinder groups have been created to show the operation has finished. Sponsored by: St. Bernard Software	2003-11-16 07:08:27 +00:00
Poul-Henning Kamp	00cbe31bd8	Send B_PHYS out to pasture, it no longer serves any function.	2003-11-15 09:28:09 +00:00
Alan Cox	c78b8dfacf	Call free(9) after the vnode interlock is released, avoiding a lock-order reversal.	2003-11-13 03:56:32 +00:00
Kirk McKusick	fde81c7d8e	Update the statfs structure with 64-bit fields to allow accurate reporting of multi-terabyte filesystem sizes. You should build and boot a new kernel BEFORE doing a `make world' as the new kernel will know about binaries using the old statfs structure, but an old kernel will not know about the new system calls that support the new statfs structure. Running an old kernel after a `make world' will cause programs such as `df' that do a statfs system call to fail with a bad system call. Reviewed by: Bruce Evans <bde@zeta.org.au> Reviewed by: Tim Robbins <tjr@freebsd.org> Reviewed by: Julian Elischer <julian@elischer.org> Reviewed by: the hoards of <arch@freebsd.org> Sponsored by: DARPA & NAI Labs.	2003-11-12 08:01:40 +00:00
Alexander Kabaev	ca430f2e92	Remove mntvnode_mtx and replace it with per-mountpoint mutex. Introduce two new macros MNT_ILOCK(mp)/MNT_IUNLOCK(mp) to operate on this mutex transparently. Eventually new mutex will be protecting more fields in struct mount, not only vnode list. Discussed with: jeff	2003-11-05 04:30:08 +00:00
Alexander Kabaev	45d45c6cde	Use VOP_UNLOCK/vrele instead of vput. td was erecived as a parameter and one cannot be sure it is equal to curthread.	2003-11-03 04:46:19 +00:00
Alexander Kabaev	cb9ddc80ae	Take care not to call vput if thread used in corresponding vget wasn't curthread, i.e. when we receive a thread pointer to use as a function argument. Use VOP_UNLOCK/vrele in these cases. The only case there td != curthread known at the moment is boot() calling sync with thread0 pointer. This fixes the panic on shutdown people have reported.	2003-11-02 04:52:53 +00:00
Alexander Kabaev	492c1e68fb	Temporarily undo parts of the stuct mount locking commit by jeff. It is unsafe to hold a mutex across vput/vrele calls. This will be redone when a better locking strategy is agreed upon. Discussed with: jeff	2003-11-01 05:51:54 +00:00
Don Lewis	9f206707a5	Tweak the calculation of minbfree in ffs_dirpref() so that only those cylinder groups that have at least 75% of the average free space per cylinder group for that file system are considered as candidates for the creation of a new directory. The previous formula for minbfree would set it to zero if the file system was more than 75% full, which allowed cylinder groups with no free space at all to be chosen as candidates for directory creation, which resulted in an expensive search for free blocks for each file that was subsequently created in that directory. Modify the calculation of minifree in the same way. Decrease maxcontigdirs as the file system fills to decrease the likelyhood that a cluster of directories will overflow the available space in a cylinder group. Reviewed by: mckusick Tested by: kmarx@vicor.com MFC after: 2 weeks	2003-10-31 07:25:06 +00:00
John Baldwin	787f162df6	Move the P_COWINPROGRESS flag from being a per-process p_flag to being a per-thread td_pflag which doesn't require any locks to read or write as it is only read or written by curthread on itself. Glanced at by: mckusick	2003-10-23 21:14:08 +00:00
Tor Egge	f0da6ec99b	Initialize bp->b_offset to the physical offset in partition so GEOM knows where to read from disk.	2003-10-22 18:57:59 +00:00
Poul-Henning Kamp	2c18019f14	DuH! bp->b_iooffset (the spot on the disk), not bp->b_offset (the offset in the file)	2003-10-18 14:10:28 +00:00
Poul-Henning Kamp	4e1694ecaf	Initialize bp->b_offset before calling VOP_[SPEC]STRATEGY()	2003-10-18 11:16:33 +00:00
Kirk McKusick	bd189c8c3e	When expunging unlinked files from a snapshot, skip over holes in the file rather than panicing with "indiracct: botched params". Submitted by: Mark Santcroos <marks@ripe.net>	2003-10-17 13:57:58 +00:00
Jeff Roberson	a844eb934c	- My last commit to this file is still not safe, I believe that it may be due to the recursion in indir_trunc().	2003-10-06 03:28:03 +00:00

... 2 3 4 5 6 ...

965 Commits