freebsd-skq

Author	SHA1	Message	Date
Tor Egge	a695d54404	Don't set IN_CHANGE and IN_UPDATE on inodes for potentially suspended file systems. This could cause deadlocks when creating snapshots. Reviewed by: jeff	2006-03-08 02:14:39 +00:00
Tor Egge	3b582b4e72	Eliminate a deadlock when creating snapshots. Blocking vn_start_write() must be called without any vnode locks held. Remove calls to vn_start_write() and vn_finished_write() in vnode_pager_putpages() and add these calls before the vnode lock is obtained to most of the callers that don't already have them.	2006-03-02 22:13:28 +00:00
Jeff Roberson	b9b12498fd	- Acquire lk in softdep_slowdown so that it's owned when we call softdep_speedup(). - Assert that lk is held in softdep_speedup() rather than acquiring it. This avoids a potential lock recursion.	2006-03-02 08:52:53 +00:00
Jeff Roberson	eb2ea10590	- Move softdep from using a global worklist to per-mount worklists. This has many positive effects including improved smp locking, reducing interdependencies between mounts that can lead to deadlocks, etc. - Add the softdep worklist and various counters to the ufsmnt structure. - Add a mount pointer to the workitem and remove mount pointers from the various structures derived from the workitem as they are now redundant. - Remove the poor-man's semaphore protecting softdep_process_worklist and softdep_flushworklist. Several threads may now process the list simultaneously. - Add softdep_waitidle() to block the thread until all pending dependencies being operated on by other threads have been flushed. - Use softdep_waitidle() in unmount and snapshots to block either operation until the fs is stable. - Remove softdep worklist processing from the syncer and move it into the softdep_flush() thread. This thread processes all softdep mounts once each second and when it is called via the new softdep_speedup() when there is a resource shortage. This removes the softdep hook from the kernel and various hacks in header files to support it. Reviewed by/Discussed with: tegge, truckman, mckusick Tested by: kris	2006-03-02 05:50:23 +00:00
Jeff Roberson	f5a4db791d	- Using LK_NOWAIT in qsync() can get us into infinite loop situations that lead to deadlocks. Remove it. MFC After: 1 week	2006-02-22 06:12:53 +00:00
Robert Watson	5652c15c24	In quotaoff(), lock the vnode instead of asserting it when manipulating v_vflags. MFC after: 1 week Submitted by: Antoine Brodin <antoine at brodin at laposte dot net>	2006-02-12 13:20:06 +00:00
Robert Watson	4a99d6f90a	Instead of asserting the vnode lock before manipulating v_vflag, acquire it and drop it afterwards. Found by: kris MFC after: 1 week	2006-02-11 21:09:27 +00:00
Jeff Roberson	89b0e10910	- Reorder calls to vrele() after calls to vput() when the vrele is a directory. vrele() may lock the passed vnode, which in these cases would give an invalid lock order of child -> parent. These situations are deadlock prone although do not typically deadlock because the vrele is typically not releasing the last reference to the vnode. Users of vrele must consider it as a call to vn_lock() and order it appropriately. MFC After: 1 week Sponsored by: Isilon Systems, Inc. Tested by: kkenn	2006-02-01 00:25:26 +00:00
Tor Egge	82be0a5a24	Add marker vnodes to ensure that all vnodes associated with the mount point are iterated over when using MNT_VNODE_FOREACH. Reviewed by: truckman	2006-01-09 20:42:19 +00:00
Tor Egge	6c62b2acd0	If the lock passed to getdirtybuf() is the softdep lock then the background write completed wakeup could be missed. Close the race by grabbing the lock normally used for protection of bp->b_xflags. Reviewed by: truckman	2006-01-09 19:32:21 +00:00
Tor Egge	c8c7711d66	Broaden scope of softdep_worklist_busy rwlock protection of softdep processing to avoid some dependencies being missed by softdep_flushworklist(). Reviewed by: truckman	2006-01-09 19:16:56 +00:00
Warner Losh	5c65ae3a88	New option: NO_FFS_SNAPSHOT. I did this in p4 about the same time that NetBSD implemented it independently of them (don't know which one was actually first). This saves about 24k for those times you don't need snapshot support (like when running off a ram disk, or in an embedded environment where size matters).	2006-01-06 04:44:09 +00:00
Xin LI	cd34c8b6a2	Typo.	2005-12-23 15:50:57 +00:00
Dag-Erling Smørgrav	0430a5e289	Eradicate caddr_t from the VFS API.	2005-12-14 00:49:52 +00:00
Craig Rodrigues	b6bd025c35	Fix parsing of atime, clusterr, clusterw, exec, suid, symfollow mount options. Noticed by: Amir Shalem < amir at boom dot org dot il>	2005-11-24 15:06:40 +00:00
Craig Rodrigues	cea903627f	If export mount flag is not passed in, set default parameters for export structure and pass that to vfs_export(). Currently in userland mount(8), an export structure is unconditionally passed in, only for UFS. This is an attempt to move that UFS-specific behavior out of mount(8) and into the UFS filesystem code.	2005-11-20 17:04:50 +00:00
Craig Rodrigues	359d438885	Add more options to ffs_opts, so that vfs_filteropts() will not complain when we pass these options to a UFS filesystem as strings via nmount(): noexec, nosuid, nosymfollow, sync, suiddir	2005-11-19 23:28:19 +00:00
Craig Rodrigues	26f59b6455	- Add parsing for the following existing UFS/FFS mount options in the nmount() callpath via vfs_getopt(), and set the appropriate MNT_* flag: -> acls, async, force, multilabel, noasync, noatime, -> noclusterr, noclusterw, snapshot, update - Allow errmsg as a valid mount option via vfs_getopt(), so we can later add a hook to propagate mount errors back to userspace via vfs_mount_error().	2005-11-18 06:06:10 +00:00
Xin LI	fad951e3a0	Slightly reorganize to reduce duplicated code. Reviewed by: rwatson	2005-11-07 18:25:23 +00:00
Paul Saab	e1cef62715	Rate limit filesystem full and out of inodes messages to once a second.	2005-10-31 20:33:28 +00:00
Robert Watson	5bb84bc84b	Normalize a significant number of kernel malloc type names: - Prefer '_' to ' ', as it results in more easily parsed results in memory monitoring tools such as vmstat. - Remove punctuation that is incompatible with using memory type names as file names, such as '/' characters. - Disambiguate some collisions by adding subsystem prefixes to some memory types. - Generally prefer lower case to upper case. - If the same type is defined in multiple architecture directories, attempt to use the same name in additional cases. Not all instances were caught in this change, so more work is required to finish this conversion. Similar changes are required for UMA zone names.	2005-10-31 15:41:29 +00:00
Xin LI	eb2893ec18	Remove an unneeded "a" from comment.	2005-10-25 19:46:15 +00:00
Nate Lawson	8680d6985f	Adjust maxfilesize for UFS1 and old 4.4 FFS. For UFS1, increase the limit to (max block - 1) * bsize. For DEV_BSIZE, this doubles the limit from 0.5 TB to 1 TB. For the old 4.4 FFS case, decrease the limit from 0.5 TB to 2 GB - 1. Older systems had a 32 bit off_t so they couldn't access the larger files anyway. Collaboration with: bde	2005-10-21 01:54:00 +00:00
Don Lewis	875e108755	Correct the type of the temporary variable used by ufs_lookup.c:1.78 to fix the race condition in the ufs_lookup() ISDOTDOT code. Noticed by: bde MFC after: 12 days	2005-10-16 21:31:46 +00:00
Don Lewis	12d360453c	Close a race in the ufs_lookup() code that handles the ISDOTDOT case by saving the value of dp->i_ino before unlocking the vnode for the current directory and passing the saved value to VFS_VGET(). Without this change, another thread can overwrite dp->i_ino after the current directory is unlocked, causing ufs_lookup() to lock and return the wrong vnode in place of the vnode for its parent directory. A deadlock can occur if dp->i_ino was changed to a subdirectory of the current directory because the root to leaf vnode lock ordering will be violated. A vnode lock can be leaked if dp->i_ino was changed to point to the current directory, which causes the current vnode lock for the current directory to be recursed, which confuses lookup() into calling vrele() when it should be calling vput(). The probability of this bug being triggered seems to be quite low unless the sysctl variable debug.vfscache is set to 0. Reviewed by: jhb MFC after: 2 weeks	2005-10-14 22:13:33 +00:00
Robert Watson	606dcf085f	When performing a VOP_LOOKUP() as part of UFS1 extended attribute auto-start, set cnp.cn_lkflags to LK_EXCLUSIVE. This flag must now be set so that lockmgr knows what kind of lock to acquire, and it will panic if not specified. This resulted in a panic when using extended attributes on UFS1 as of locking work present in the 6.x branch. This is a RELENG_6_0 merge candidate. Reported by: lofi MFC after: 3 days	2005-10-12 14:18:58 +00:00
Diomidis Spinellis	9f5c1d1955	Move execve's access time update functionality into a new vfs_mark_atime() function, and use the new function for performing efficient atime updates in mmap(). Reviewed by: bde MFC after: 2 weeks	2005-10-12 06:56:00 +00:00
Tor Egge	48c2ac4539	Avoid unintended VMIO on directories and symlinks due to leftover object not having been destroyed.	2005-10-10 19:02:04 +00:00
Tor Egge	4e0cd00988	Adjust totread argument passed to cluster_read() to account for offset not being block aligned.	2005-10-09 21:11:25 +00:00
Tor Egge	9248a8271c	Don't pretend that a failed sync write was succesful.	2005-10-09 20:49:01 +00:00
Tor Egge	021869b542	Reduce probability for a deadlock that can occur when a snapshot inode is updated by a process holding the snapshot lock. Another process updating a different inode in the same inodeblock will do copy on write checks and lock in the opposite direction. The snapshot code force a copy on write of these blocks manually (cf. start of expunge_ufs[12]) and these inode blocks are later put on snapblklist. This partial fix is to 'drain' the relevant ffs_copyonwrite() operation after installing new snapblklist. This is not a 100% solution since a failed block allocation can cause implicit fsync() which might deadlock before the new snapblklist has been installed.	2005-10-09 20:15:15 +00:00
Tor Egge	d4d530da96	Eliminate a deadlock that can occur when a dirty block belonging to a snapshot file is flushed by a process not holding snaplk (e.g. bufdaemon). Another process might hold snaplk and try to access the block due to ffs_copyonwrite processing.	2005-10-09 20:07:51 +00:00
Tor Egge	45f91051da	Eliminate a deadlock that can occur during the cgaccount() processing due to the cg map buffer being held when writing indirect blocks. The process ends up in ffs_copyonwrite(), attempting to get snaplk while holding the cg map buffer lock. Another process might be in ffs_copyonwrite(), trying to allocate a new block for a copy. It would hold snaplk while trying to get the cg map buffer lock. Release the cg map buffer early and use the copy for most of the cgaccount processing to avoid this deadlock.	2005-10-09 20:00:16 +00:00
Tor Egge	17026ff61a	Reduce the probability of low block numbers passed to ffs_snapblkfree() by skipping the call from ffs_snapremove() if the block number is zero. Simplify snapshot locking in ffs_copyonwrite() and ffs_snapblkfree() by using the same locking protocol for low block numbers as for larger block numbers. This removes a lock leak that could happen if vn_lock() succeeded after lockmgr() failed in ffs_snapblkfree(). Check if snapshot is gone before retrying a lock in ffs_copyonwrite().	2005-10-09 19:45:01 +00:00
Tor Egge	c73e9e9c7b	Reinitialize v_type and v_op fields in case vnode has been reused without reclamation. If the vnode previously was a fifo then v_op would point to ffs_fifoops[12] instead of the expected ffs_vnodeops[12], causing a panic at the end of ffsext_strategy.	2005-10-09 19:06:34 +00:00
Don Lewis	448434c3c9	Initialize the inode i_flag field in ffs_valloc() to clean up any stale flag bits left over from before the inode was recycled. Without this change, a leftover IN_SPACECOUNTED flag could prevent softdep_freefile() and softdep_releasefile() from incrementing fs_pendinginodes. Because handle_workitem_freefile() unconditionally decrements fs_pendinginodes, a negative value could be reported at file system unmount time with a message like: unmount pending error: blocks 0 files -3 The pending block count in fs_pendingblocks could also be negative for similar reasons. These errors can cause the data returned by statfs() to be slightly incorrect. Some other cleanup code in softdep_releasefile() could also be incorrectly bypassed. MFC after: 3 days	2005-10-03 21:57:43 +00:00
Don Lewis	460858e9ef	Correct previous commit to fix the sense of the TDP_NORUNNINGBUF check in ffs_copyonwrite() that is a precondition for calling waitrunningbufspace(). Pointed out by: tegge Pointy hat to: truckman MFC after: 3 days	2005-10-01 19:10:48 +00:00
Don Lewis	bd3c2d867d	Un-staticize waitrunningbufspace() and call it before returning from ffs_copyonwrite() if any async writes were launched. Restore the threads previous TDP_NORUNNINGBUF state before returning from ffs_copyonwrite().	2005-09-30 18:07:41 +00:00
Don Lewis	6c8b634f1d	Un-staticize runningbufwakeup() and staticize updateproc. Add a new private thread flag to indicate that the thread should not sleep if runningbufspace is too large. Set this flag on the bufdaemon and syncer threads so that they skip the waitrunningbufspace() call in bufwrite() rather than than checking the proc pointer vs. the known proc pointers for these two threads. A way of preventing these threads from being starved for I/O but still placing limits on their outstanding I/O would be desirable. Set this flag in ffs_copyonwrite() to prevent bufwrite() calls from blocking on the runningbufspace check while holding snaplk. This prevents snaplk from being held for an arbitrarily long period of time if runningbufspace is high and greatly reduces the contention for snaplk. The disadvantage is that ffs_copyonwrite() can start a large amount of I/O if there are a large number of snapshots, which could cause a deadlock in other parts of the code. Call runningbufwakeup() in ffs_copyonwrite() to decrement runningbufspace before attempting to grab snaplk so that I/O requests waiting on snaplk are not counted in runningbufspace as being in-progress. Increment runningbufspace again before actually launching the original I/O request. Prior to the above two changes, the system could deadlock if enough I/O requests were blocked by snaplk to prevent runningbufspace from falling below lorunningspace and one of the bawrite() calls in ffs_copyonwrite() blocked in waitrunningbufspace() while holding snaplk. See <http://www.holm.cc/stress/log/cons143.html>	2005-09-30 01:30:01 +00:00
Don Lewis	445193b887	After a rmdir()ed directory has been truncated, force an update of the directory's inode after queuing the dirrem that will decrement the parent directory's link count. This will force the update of the parent directory's actual link to actually be scheduled. Without this change the parent directory's actual link count would not be updated until ufs_inactive() cleared the inode of the newly removed directory, which might be deferred indefinitely. ufs_inactive() will not be called as long as any process holds a reference to the removed directory, and ufs_inactive() will not clear the inode if the link count is non-zero, which could be the result of an earlier system crash. If a background fsck is run before the update of the parent directory's actual link count has been performed, or at least scheduled by putting the dirrem on the leaf directory's inodedep id_bufwait list, fsck will corrupt the file system by decrementing the parent directory's effective link count, which was previously correct because it already took the removal of the leaf directory into account, and setting the actual link count to the same value as the effective link count after the dangling, removed, leaf directory has been removed. This happens because fsck acts based on the actual link count, which will be too high when fsck creates the file system snapshot that it references. This change has the fortunate side effect of more quickly cleaning up the large number dirrem structures that linger for an extended time after the removal of a large directory tree. It also fixes a potential problem with the shutdown of the syncer thread timing out if the system is rebooted immediately after removing a large directory tree. Submitted by: tegge MFC after: 3 days	2005-09-29 21:50:26 +00:00
Robert Watson	5f419982c2	Back out alpha/alpha/trap.c:1.124, osf1_ioctl.c:1.14, osf1_misc.c:1.57, osf1_signal.c:1.41, amd64/amd64/trap.c:1.291, linux_socket.c:1.60, svr4_fcntl.c:1.36, svr4_ioctl.c:1.23, svr4_ipc.c:1.18, svr4_misc.c:1.81, svr4_signal.c:1.34, svr4_stat.c:1.21, svr4_stream.c:1.55, svr4_termios.c:1.13, svr4_ttold.c:1.15, svr4_util.h:1.10, ext2_alloc.c:1.43, i386/i386/trap.c:1.279, vm86.c:1.58, unaligned.c:1.12, imgact_elf.c:1.164, ffs_alloc.c:1.133: Now that Giant is acquired in uprintf() and tprintf(), the caller no longer leads to acquire Giant unless it also holds another mutex that would generate a lock order reversal when calling into these functions. Specifically not backed out is the acquisition of Giant in nfs_socket.c and rpcclnt.c, where local mutexes are held and would otherwise violate the lock order with Giant. This aligns this code more with the eventual locking of ttys. Suggested by: bde	2005-09-28 07:03:03 +00:00
John Baldwin	7e9e371f2d	Use the refcount API to manage the reference count for user credentials rather than using pool mutexes. Tested on: i386, alpha, sparc64	2005-09-27 18:09:42 +00:00
Xin LI	0e4f6eecd7	Restore a historical ufs_inactive behavior that has been changed in rev. 1.40 of ufs_inode.c, which allows an inode being truncated even when the filesystem itself is marked RDONLY. A subsequent call of UFS_TRUNCATE (ffs_truncate) would panic the system as it asserts that it can only be called when the filesystem is mounted read-write (same changeset, rev. 1.74 of sys/ufs/ffs/ffs_inode.c). Because ffs_mount() already takes care of sync'ing the filesystem to disk before being downgraded to readonly, it appears to be more desirable that we should not permit this sort of writes to disk. This change would fix a panic that occours when read-only mounted a corrupted filesystem and doing some file operations. MT6/5/4 candidate Reviewed by: mckusick	2005-09-23 20:49:57 +00:00
Robert Watson	84d2b7df26	Add GIANT_REQUIRED and WITNESS sleep warnings to uprintf() and tprintf(), as they both interact with the tty code (!MPSAFE) and may sleep if the tty buffer is full (per comment). Modify all consumers of uprintf() and tprintf() to hold Giant around calls into these functions. In most cases, this means adding an acquisition of Giant immediately around the function. In some cases (nfs_timer()), it means acquiring Giant higher up in the callout. With these changes, UFS no longer panics on SMP when either blocks are exhausted or inodes are exhausted under load due to races in the tty code when running without Giant. NB: Some reduction in calls to uprintf() in the svr4 code is probably desirable. NB: In the case of nfs_timer(), calling uprintf() while holding a mutex, or even in a callout at all, is a bad idea, and will generate warnings and potential upset. This needs to be fixed, but was a problem before this change. NB: uprintf()/tprintf() sleeping is generally a bad ideas, as is having non-MPSAFE tty code. MFC after: 1 week	2005-09-19 16:51:43 +00:00
Tor Egge	2f0ffabcf4	Giant is no longer needed here.	2005-09-12 01:21:42 +00:00
Christian S.J. Peron	d1dfd92177	Convert the primary ACL allocator from malloc(9) to using a UMA zone instead. Also introduce an aclinit function which will be used to create the UMA zone for use by file systems at system start up. MFC after: 1 month Discussed with: rwatson	2005-09-06 00:06:30 +00:00
Tor Egge	d536ff2edb	Retain generation count when writing zeroes instead of an inode to disk. Don't free a struct inodedep if another process is allocating saved inode memory for the same struct inodedep in initiate_write_inodeblock_ufs[12](). Handle disappearing dependencies in softdep_disk_io_initiation(). Reviewed by: mckusick	2005-09-05 22:14:33 +00:00
Suleiman Souhlal	fdedad764a	ffs_mountfs() needs devvp to be locked, so lock it. Glanced at by: phk Tested by: pjd MFC after: 3 days	2005-09-02 13:52:55 +00:00
Suleiman Souhlal	93373c422f	Set the mountpoint path in the superblock (fs_fsmnt) at mount-time so that it appears in the various messages (not cleanly unmounted, filesystem full, etc). This has been broken since rev 1.261.	2005-08-21 22:06:41 +00:00
Tor Egge	15da51f73c	Don't set the COMPLETE flag in an inodedep structure before the related inode has been written.	2005-08-21 18:19:06 +00:00

1 2 3 4 5 ...

1378 Commits