freebsd-dev

Author	SHA1	Message	Date
Konstantin Belousov	2814d5ba5f	When an attempt is made to suspend a filesystem that is already suspended, wait until the current suspension is lifted instead of silently returning success immediately. The consequences of calling vfs_write_resume() when not owning the suspension are not well-defined at best. Add the vfs_susp_clean() mount method to be called from vfs_write_resume(). Set it to process_deferred_inactive() for ffs, and stop calling it manually. Add the thread flag TDP_IGNSUSP that allows bypassing the suspension point in vn_start_write(). It is intended for use by VFS in situations where the suspender wants to do some i/o requiring calls to vn_start_write(), and this i/o cannot be done later. Reviewed by: tegge In collaboration with: pho MFC after: 1 month	2008-09-16 11:51:06 +00:00
Konstantin Belousov	52dfc8d7da	Add the ffs structures introspection functions for ddb. Show the b_dep value for the buffer in the show buffer command. Add a command to dump the dirty/clean buffer list for a vnode. Reviewed by: tegge Tested and used by: pho MFC after: 1 month	2008-09-16 11:19:38 +00:00
Konstantin Belousov	90446e360c	When downgrading the read-write mount to read-only, do_unmount() sets the MNT_RDONLY flag before VFS_MOUNT() is called. In ufs_inactive() and ufs_itimes_locked(), UFS verifies whether the fs is read-only by checking MNT_RDONLY, but this may cause loss of the IN_MODIFIED flag for an inode on the fs being remounted rw->ro. Introduce the UFS_RDONLY() struct ufsmount method that reports the value of fs_ronly. The latter is set to 1 only after the remount is finished. Reviewed by: tegge In collaboration with: pho MFC after: 1 month	2008-09-16 10:59:35 +00:00
Konstantin Belousov	0411d79138	The struct inode *ip supplied to softdep_freefile is not necessarily the inode having number ino. In r170991, the ip was marked IN_MODIFIED, which is not quite correct. Mark only the right inode as modified by checking the inode number. Reviewed by: tegge In collaboration with: pho MFC after: 1 month	2008-09-16 10:52:25 +00:00
Edward Tomasz Napierala	86a0c0aa7b	When calling extattr_check_cred, use V{READ,WRITE}, not I{READ,WRITE}. Approved by: rwatson (mentor)	2008-09-03 12:46:09 +00:00
Attilio Rao	59d4932531	Decontextualize vfs_busy(), vfs_unbusy() and vfs_mount_alloc() functions. Manpages are updated accordingly. Tested by: Diego Sardina <siarodx at gmail dot com>	2008-08-31 14:26:08 +00:00
Attilio Rao	0359a12ead	Decontextualize the couplet VOP_GETATTR / VOP_SETATTR as the passed thread was always curthread and totally useless. Tested by: Giovanni Trematerra <giovanni dot trematerra at gmail dot com>	2008-08-28 15:23:18 +00:00
Konstantin Belousov	acd05e468a	In ffs_valloc(), ffs_vget() may fail because insmntque() refused to insert new vnode into the mount vnode list. Then, for the SU-enabled mount, ffs_vfree could create freefile dependency. This dependency can hang around forever since inode is not marked as IN_MODIFIED and correspondingly inodeblock may be not marked as dirty. After ffs_vget() fails, retry with FFSV_FORCEINSMQ, mark the inode as modified, and vput() it immediately. Take care of the dup alloc. Tested by: pho Reviewed by: tegge MFC after: 1 month	2008-08-28 09:19:50 +00:00
Konstantin Belousov	7b7ed832e4	Softdep code may need to instantiate vnode when processing dependencies. In particular, it may need this while syncing filesystem being unmounted. Since during unmount MNTK_NOINSMNTQUE flag is set, that could sometimes disallow insertion of the vnode into the vnode mount list, softdep code needs to overwrite the MNTK_NOINSMNTQUE flag. Create the ffs_vgetf() function that sets the VV_FORCEINSMQ flag for new vnode and use it consistently from the softdep code instead of ffs_vget(). Add the retry logic to the softdep_flushfiles() to flush the vnodes that could be instantiated while flushing softdep dependencies. Tested by: pho, kris Reviewed by: tegge MFC after: 1 month	2008-08-28 09:18:20 +00:00
Konstantin Belousov	f2228325de	Put the relocked variable from the r182111 into the #ifdef QUOTA braces to prevent warning about unused var on the !QUOTA kernels. Reported by: ed MFC after: 1 week	2008-08-24 19:06:19 +00:00
Konstantin Belousov	689eae1d90	Revert the r167541: "Remove unneeded getinoquota() call in the ufs_access()." The call to getinoquota in ufs_access() serves the purpose of instantiating inode dquot from the vn_open(). Since quotas are accounted only for the inodes with already attached dquot, removal of the call prevented opened inodes from participation in the quota calculations. Since ufs_access() may be called with the vnode being only shared locked, upgrade (and then downgrade) vnode lock if calling getinoquota(). Reported by: simon at optinet com In collaboration with: pho MFC after: 1 week	2008-08-24 17:24:22 +00:00
Konstantin Belousov	e792b09be2	Revert r181345. Move the NULL pointer check to the vfs_deleteopt() function. Discussed with: rodrigc MFC after: 3 days	2008-08-10 12:15:36 +00:00
Konstantin Belousov	a1a917e029	User may do "mount -o snapshot ...", that causes new FFS mount to be performed with snapshot option, while the mp->mnt_opt is NULL. Protect against NULL pointer dereference. Noted by: Mateusz Guzik <mjguzik gmail com> MFC after: 3 days	2008-08-06 14:47:19 +00:00
Dag-Erling Smørgrav	20ed1beeb5	ufsmount.h uses "struct\tfoo bar;", except where it doesn't. quota.h uses "struct foo\tbar;", except where it doesn't. Try to make them both agree with themselves (though not with each other)	2008-08-05 15:24:07 +00:00
Dag-Erling Smørgrav	1ac541a69a	Whitespace, prototypes	2008-08-05 10:25:55 +00:00
John Baldwin	4a67a0d994	Whitespace tweak.	2008-07-30 21:07:56 +00:00
Konstantin Belousov	89672c6337	The ffs_balloc_ufs{1,2} functions call bdwrite() while having several vnode buffers locked at once. In particular, there are indirect buffers among the locked ones. The bdwrite() may start flushing to keep the dirty buffer list within bounds. If any buffer on the dirty list requires translation from logical to physical block number, the code may end up trying to lock an indirect buffer already locked in ffs_balloc_ufsX. Prevent the bdflush() activity when several buffers are locked at once by setting TDP_INBDFLUSH for the problematic code blocks. Reported and tested by: pho, Josef Buchsteiner at Juniper In collaboration with: kan MFC after: 1 month	2008-07-23 14:32:44 +00:00
Pawel Jakub Dawidek	a80d8caa74	Say hi to svn, by simplifying ffs_vget() function a bit - there is no need for a variable that is used only once.	2008-07-19 22:29:44 +00:00
Craig Rodrigues	8e7a2353ec	Fix comments to replace SBSIZE with SBLOCKSIZE, since SBSIZE was renamed to SBLOCKSIZE in version 1.33 Reviewed by: mckusick	2008-05-24 20:44:14 +00:00
Craig Rodrigues	fb77e0af12	After converting the "snapshot" mount option to the MNT_SNAPSHOT flag, delete "snapshot" from the persistent mount options list. This should fix problems with doing a mount -o snapshot of a file system, followed by an NFS export of the same file system. PR: 122833 Reported by: Leon Kos <leon.kos lecad fs uni-lj si>, Jaakko Heinonen <jh saunalahti fi> MFC after: 1 month	2008-05-24 00:41:32 +00:00
Craig Rodrigues	02a871f1ea	For the following mount options, do not perform the string to flag conversions here, because we already do them further up in vfs_donmount() in vfs_mount.c async -> MNT_ASYNC force -> MNT_FORCE multilabel -> MNT_MULTILABEL noatime -> MNT_NOATIME noclusterr -> MNT_NOCLUSTERR noclusterw -> MNT_NOCLUSTERW MFC after: 1 month	2008-05-24 00:02:12 +00:00
Stephan Uphoff	2ac78f0e1a	Allow VM object creation in ufs_lookup. (If vfs.vmiodirenable is set) Directory IO without a VM object will store data in 'malloced' buffers severely limiting caching of the data. Without this change VM objects for directories are only created on an open() of the directory. TODO: Inline test if VM object already exists to avoid locking/function call overhead. Tested by: kris@ Reviewed by: jeff@ Reported by: David Filo	2008-05-20 19:05:43 +00:00
Jeff Roberson	721cc5664f	- Use a local variable for i_ino in ufs_lookup. It is only used to communicate between two parts of this one function. This was causing problems with shared lookups as each would trash the ino value in the inode. - Remove the unused i_ino field from the inode structure.	2008-04-22 12:34:16 +00:00
Konstantin Belousov	eab626f110	Move the head of byte-level advisory lock list from the filesystem-specific vnode data to the struct vnode. Provide the default implementation for the vop_advlock and vop_advlockasync. Purge the locks on the vnode reclaim by using the lf_purgelocks(). The default implementation is augmented for the nfs and smbfs. In the nfs_advlock, push the Giant inside the nfs_dolock. Before the change, the vop_advlock and vop_advlockasync have taken the unlocked vnode and dereferenced the fs-private inode data, racing with the vnode reclamation due to forced unmount. Now, the vop_getattr under the shared vnode lock is used to obtain the inode size, and later, in the lf_advlockasync, after locking the vnode interlock, the VI_DOOMED flag is checked to prevent an operation on the doomed vnode. The implementation of the lf_purgelocks() is submitted by dfr. Reported by: kris Tested by: kris, pho Discussed with: jeff, dfr MFC after: 2 weeks	2008-04-16 11:33:32 +00:00
Jeff Roberson	b300d706ea	- Use a lockmgr lock rather than a mtx to protect dirhash. This lock may be held for the duration of the various dirhash operations which avoids many complex unlock/lock/revalidate sequences. - Permit shared locks on lookup. To protect the ip->i_dirhash pointer we use the vnode interlock in the shared case. Callers holding the exclusive vnode lock can run without fear of concurrent modification to i_dirhash. - Hold an exclusive dirhash lock when creating the dirhash structure for the first time or when re-creating a dirhash structure which has been recycled. Tested by: kris, pho	2008-04-11 09:48:12 +00:00
Jeff Roberson	eb1314a249	- cache dp->i_offset in the local 'i_offset' variable for use in loop indexes so directory lookup becomes shared lock safe. In the modifying cases an exclusive lock is held here so the commit routine may rely on the state of i_offset. - Similarly handle i_diroff by fetching at the start and setting only once the operation is complete. Without the exclusive lock these are only considered hints. - Assert that an exclusive lock is held when we're preparing for a commit routine. - Honor the lock type request from lookup instead of always using exclusive locking. Tested by: pho, kris	2008-04-11 09:44:25 +00:00
Pawel Jakub Dawidek	58e9afacb4	Correct function name in panic(). Reported by: kensmith	2008-04-07 18:12:37 +00:00
Attilio Rao	047dd67e96	Optimize lockmgr in order to get rid of the pool mutex interlock, of the state transitioning flags and of msleep(9) calls. Use, instead, an algorithm very similar to what sx(9) and rwlock(9) already do and direct accesses to the sleepqueue(9) primitive. In order to avoid writer starvation a mechanism very similar to what rwlock(9) uses now is implemented, with the corresponding per-thread shared lockmgrs counter. This patch also adds 2 new functions to lockmgr KPI: lockmgr_rw() and lockmgr_args_rw(). These two are like the 2 "normal" versions, but they both accept a rwlock as interlock. In order to realize this, the general lockmgr manager function "__lockmgr_args()" has been implemented through the generic lock layer. It supports all the blocking primitives, but currently only these 2 mappers live. The patch drops the support for WITNESS atm, but it will probably be added soon. Also, there is a little race in the draining code which is also present in the current CVS stock implementation: if some sharers, once they wake up, are in the runqueue they can contend for the lock with the exclusive drainer. This is hard to fix but the now committed code mitigates this issue a lot better than the (past) CVS version. In addition assertive KA_HELD and KA_UNHELD have been made mute assertions because they are dangerous and they will no longer be supported soon. In order to avoid namespace pollution, stack.h is split into two parts: one which includes only the "struct stack" definition (_stack.h) and one defining the KPI. In this way, newly added _lockmgr.h can just include _stack.h. The kernel ABI is heavily changed by this commit (the now committed version of "struct lock" is a lot smaller than the previous one) and the KPI is broken by the lockmgr_rw() / lockmgr_args_rw() introduction, so manpages and __FreeBSD_version will be updated accordingly.
Tested by: kris, pho, jeff, danger Reviewed by: jeff Sponsored by: Google, Summer of Code program 2007	2008-04-06 20:08:51 +00:00
Konstantin Belousov	57b4252e45	Add the support for the AT_FDCWD and fd-relative name lookups to the namei(9). Based on the submission by rdivacky, sponsored by Google Summer of Code 2007 Reviewed by: rwatson, rdivacky Tested by: pho	2008-03-31 12:01:21 +00:00
Jeff Roberson	d04963d0f4	- Since rev 1.142 of ffs_snapshot.c the interlock has not been required to protect the v_lock pointer. Removing the interlock acquisition here allows vn_lock() to proceed without requiring the interlock at all. - If the lock mutated while we were sleeping on it the interlock has been dropped. It is conceivable that the upper layer code was relying on the interlock and LK_NOWAIT to protect the identity or state of the vnode while acquiring the lock. In this case return EBUSY rather than trying the new lock to prevent potential races. Reviewed by: tegge	2008-03-31 07:55:45 +00:00
Jeff Roberson	9c0cdb8253	- Don't free snapdata structures when they are no longer in use. Keeping the lockmgr lock valid allows us to switch the v_lock pointer in snapshot vnodes between the embedded lockmgr lock and snapdata lock without needing the vnode interlock to protect against races - Keep unused snapdata structures in a list. - Add a function to lock the devvp and allocate a snapdata to it or acquire a new one without races. The old function was safe from creation races because we set the mount flag when creating snapshots and thus serializing them. However, it might have been subject to destroying races. Reviewed by: tegge	2008-03-31 07:47:08 +00:00
John Baldwin	d952ba1bd5	Fix a nit with the 'nofoo' options where 'foo' is mapped to 'nonofoo' (such as 'atime' vs 'noatime'). The filesystems will always see either 'nofoo' or 'nonofoo', never plain 'foo'. As such, their list of valid mount options should include 'nofoo' instead of 'foo'. With this fix, you can do 'mount -u -o atime' on a FFS filesystem that isn't marked as noatime without getting an error. You can also update a noatime FFS filesystem mounted via mount(2) (e.g. 6.x /sbin/mount binary) to 'atime' using nmount(2) (e.g. 7.x /sbin/mount binary). MFC after: 1 week Reviewed by: crodig	2008-03-26 20:48:07 +00:00
Doug Rabson	dfdcada31e	Add the new kernel-mode NFS Lock Manager. To use it instead of the user-mode lock manager, build a kernel with the NFSLOCKD option and add '-k' to 'rpc_lockd_flags' in rc.conf. Highlights include: * Thread-safe kernel RPC client - many threads can use the same RPC client handle safely with replies being de-multiplexed at the socket upcall (typically driven directly by the NIC interrupt) and handed off to whichever thread matches the reply. For UDP sockets, many RPC clients can share the same socket. This allows the use of a single privileged UDP port number to talk to an arbitrary number of remote hosts. * Single-threaded kernel RPC server. Adding support for multi-threaded server would be relatively straightforward and would follow approximately the Solaris KPI. A single thread should be sufficient for the NLM since it should rarely block in normal operation. * Kernel mode NLM server supporting cancel requests and granted callbacks. I've tested the NLM server reasonably extensively - it passes both my own tests and the NFS Connectathon locking tests running on Solaris, Mac OS X and Ubuntu Linux. * Userland NLM client supported. While the NLM server doesn't have support for the local NFS client's locking needs, it does have to field async replies and granted callbacks from remote NLMs that the local client has contacted. We relay these replies to the userland rpc.lockd over a local domain RPC socket. * Robust deadlock detection for the local lock manager. In particular it will detect deadlocks caused by a lock request that covers more than one blocking request. As required by the NLM protocol, all deadlock detection happens synchronously - a user is guaranteed that if a lock request isn't rejected immediately, the lock will eventually be granted. The old system allowed for a 'deferred deadlock' condition where a blocked lock request could wake up and find that some other deadlock-causing lock owner had beaten them to the lock. 
* Since both local and remote locks are managed by the same kernel locking code, local and remote processes can safely use file locks for mutual exclusion. Local processes have no fairness advantage compared to remote processes when contending to lock a region that has just been unlocked - the local lock manager enforces a strict first-come first-served model for both local and remote lockers. Sponsored by: Isilon Systems PR: 95247 107555 115524 116679 MFC after: 2 weeks	2008-03-26 15:23:12 +00:00
Konstantin Belousov	1be222e9df	Yield the cpu in the kernel while iterating the list of the vnodes belonging to the mountpoint. Also, yield when in the softdep_process_worklist() even when we are not going to sleep due to buffer drain. It is believed that the ULE fixed the problem [1], but the yielding seems to be needed at least for the 4BSD case. Discussed: on stable@, with bde Reviewed by: tegge, jeff [1] MFC after: 2 weeks	2008-03-23 13:45:24 +00:00
Jeff Roberson	698b1a6643	- Complete part of the unfinished bufobj work by consistently using BO_LOCK/UNLOCK/MTX when manipulating the bufobj. - Create a new lock in the bufobj to lock bufobj fields independently. This leaves the vnode interlock as an 'identity' lock while the bufobj is an io lock. The bufobj lock is ordered before the vnode interlock and also before the mnt ilock. - Exploit this new lock order to simplify softdep_check_suspend(). - A few sync related functions are marked with a new XXX to note that we may not properly interlock against a non-zero bv_cnt when attempting to sync all vnodes on a mountlist. I do not believe this race is important. If I'm wrong this will make these locations easier to find. Reviewed by: kib (earlier diff) Tested by: kris, pho (earlier diff)	2008-03-22 09:15:16 +00:00
Konstantin Belousov	0e2c6b177f	Reduce the acquisition of the vnode interlock in the ffs_read() and ffs_extread() when setting the IN_ACCESS flag by checking whether the IN_ACCESS is already set. The possible race there is admissible. Tested by: pho Submitted by: jeff	2008-03-21 12:33:00 +00:00
Jeff Roberson	374ae2a393	- Relax requirements for p_numthreads, p_threads, p_swtick, and p_nice from requiring the per-process spinlock to only requiring the process lock. - Reflect these changes in the proc.h documentation and consumers throughout the kernel. This is a substantial reduction in locking cost for these fields and was made possible by recent changes to threading support.	2008-03-19 06:19:01 +00:00
Robert Watson	237fdd787b	In keeping with style(9)'s recommendations on macros, use a ';' after each SYSINIT() macro invocation. This makes a number of lightweight C parsers much happier with the FreeBSD kernel source, including cflow's prcc and lxr. MFC after: 1 month Discussed with: imp, rink	2008-03-16 10:58:09 +00:00
Coleman Kane	6c62df7e49	Replace the non-MPSAFE timeout(9) API in ffs_softdep.c with the MPSAFE callout_* API (e.g. callout_init_mtx(9)). This was one of the numerous items on the http://wiki.freebsd.org/SMPTODO list. Reviewed by: imp, obrien, jhb MFC after: 1 week	2008-03-13 20:15:48 +00:00
Ed Maste	3eb8098d2b	Remove include of opt_quota.h; as of revision 1.205 there is no longer any #ifdef QUOTA conditional code.	2008-03-10 18:44:07 +00:00
Konstantin Belousov	e7fd887711	Initialize mnt_stat.f_iosize before autostarting UFS1 extattrs. It is normally initialized by ffs_statfs() after ffs_mount finished. The extattr autostart code calls ufs_lookup(), which uses the value above to iterate over the directory blocks, see the bmask initialization in ufs_lookup() and ufsdirhash. Having a filesystem with the root directory spanning more than one block would result in reading random kernel memory. PR: kern/120781 Test case provided by: rwatson MFC after: 1 week	2008-03-05 16:34:03 +00:00
Robert Watson	631ea79e3f	Continue on-going campaign to replace lockmgr locks with sx locks where the specific semantics of lockmgr aren't required: update UFS1 extended attributes to protect its data structures using an sx lock. While here, update comments on lock granularity. MFC after: 2 weeks	2008-03-04 12:50:11 +00:00
Robert Watson	6cf7bc60ec	Move setting of MNTK_MPSAFE flag before UFS1 extended attribute auto-start so that the flag is set before we start performing I/O in the auto-start routine. MFC after: 2 weeks Suggested by: kib	2008-03-04 12:10:03 +00:00
Robert Watson	874f7ae331	Don't auto-start or allow extattrctl for UFS2 file systems, as UFS2 has native extended attributes. This didn't interfere with the operation of UFS2 extended attributes, but the code shouldn't be running for UFS2. MFC after: 2 weeks	2008-03-02 22:52:14 +00:00
Giorgos Keramidas	53a5cd3485	Minor typo nit.	2008-02-25 19:31:44 +00:00
Attilio Rao	81c794f998	Axe the 'thread' argument from VOP_ISLOCKED() and lockstatus() as it is always curthread. As KPI gets broken by this patch, manpages and __FreeBSD_version will be updated by further commits. Tested by: Andrea Barberio <insomniac at slackware dot it>	2008-02-25 18:45:57 +00:00
Attilio Rao	628f51d275	Introduce some functions in the vnode locks namespace and in the ffs namespace in order to handle lockmgr fields in a controlled way instead than spreading all around bogus stubs: - VN_LOCK_AREC() allows lock recursion for a specified vnode - VN_LOCK_ASHARE() allows lock sharing for a specified vnode In FFS land: - BUF_AREC() allows lock recursion for a specified buffer lock - BUF_NOREC() disallows recursion for a specified buffer lock Side note: union_subr.c::unionfs_node_update() is the only other function directly handling lockmgr fields. As this is not simple to fix, it has been left behind as "sole" exception.	2008-02-24 16:38:58 +00:00
Attilio Rao	24463dbbee	- Introduce lockmgr_args() in the lockmgr space. This function performs the same operation as lockmgr() but accepts a custom wmesg, prio and timo for the particular lock instance, overriding default values lkp->lk_wmesg, lkp->lk_prio and lkp->lk_timo. - Use lockmgr_args() in order to implement BUF_TIMELOCK() - Cleanup BUF_LOCK() - Remove LK_INTERNAL as it is no longer used in the lockmgr namespace Tested by: Andrea Barberio <insomniac at slackware dot it>	2008-02-15 21:04:36 +00:00
Attilio Rao	0e9eb108f0	Cleanup lockmgr interface and exported KPI: - Remove the "thread" argument from the lockmgr() function as it is always curthread now - Axe lockcount() function as it is no longer used - Axe LOCKMGR_ASSERT() as it is really bogus and not currently used. Hopefully this will soon be replaced by something suitable. - Remove the prototype for dumplockinfo() as the function is no longer present Additionally: - Introduce a KASSERT() in lockstatus() in order to let it accept only curthread or NULL as they should only be passed - Do a little bit of style(9) cleanup on lockmgr.h The KPI is heavily broken by this change, so manpages and FreeBSD_version will be modified accordingly by further commits. Tested by: matteo	2008-01-24 12:34:30 +00:00
Attilio Rao	d638e093d6	- Introduce the function lockmgr_recursed() which returns true if the lockmgr lkp, when held in exclusive mode, is recursed - Introduce the function BUF_RECURSED() which does the same for bufobj locks based on the top of lockmgr_recursed() - Introduce the function BUF_ISLOCKED() which works like the counterpart VOP_ISLOCKED(9), showing the state of lockmgr linked with the bufobj BUF_RECURSED() and BUF_ISLOCKED() entirely replace the usage of bogus BUF_REFCNT() in a more explicative and SMP-compliant way. This allows us to axe out BUF_REFCNT() and leaving the function lockcount() totally unused in our stock kernel. Further commits will axe lockcount() as well as part of lockmgr() cleanup. KPI results, obviously, broken so further commits will update manpages and freebsd version. Tested by: kris (on UFS and NFS)	2008-01-19 17:36:23 +00:00
Attilio Rao	22db15c06f	VOP_LOCK1() (and so VOP_LOCK()) and VOP_UNLOCK() are only used in conjunction with 'thread' argument passing which is always curthread. Remove the useless extra argument and pass curthread explicitly to lower layer functions, when necessary. The KPI is broken by this change, which should affect several ports, so a version bump and manpage update will be committed later. Tested by: kris, pho, Diego Sardina <siarodx at gmail dot com>	2008-01-13 14:44:15 +00:00
Attilio Rao	cb05b60a89	vn_lock() is currently only used with 'curthread' passed as argument. Remove this argument and pass curthread directly to the underlying VOP_LOCK1() VFS method. This change makes the code cleaner and in particular removes an annoying dependence, helping the next lockmgr() cleanup. The KPI is, obviously, changed. Manpage and FreeBSD_version will be updated through further commits. As a side note, it is worth saying that the next commits will address a similar cleanup of VFS methods, in particular vop_lock1 and vop_unlock. Tested by: Diego Sardina <siarodx at gmail dot com>, Andrea Di Pasquale <whyx dot it at gmail dot com>	2008-01-10 01:10:58 +00:00
Konstantin Belousov	9ddfa9c6e9	ffs_balloc_ufsX() routines, in the case of recovering from the failed allocation, free the indirect blocks before clearing the disk pointers, that could lead to the softupdate inconsistencies in the case of the machine or disk crash at the wrong time. Rearrange the recover code to do the ffs_blkfree() after the second ffs_syncvnode(), that clears the pointers chain. Proposed and reviewed by: tegge Tested by: Peter Holm MFC after: 3 weeks	2008-01-03 12:28:57 +00:00
David E. O'Brien	029839a449	style(9)	2008-01-02 01:19:17 +00:00
Konstantin Belousov	e7627b2c62	The ffs_balloc() routines, when allocating the indirect blocks for the inode, do the rollback in case the allocation failed (due to insufficient free space or quota limits). But the code leaves the buffers corresponding to the indirect blocks on the vnode bufobj list. This causes several assertions (for instance, "ffs_truncate3" in ffs_truncate()) to fail, and could result in the indirect block aliasing problem, like writing the contents of such blocks to a random disk location. Remove the buffers from the bufobj properly. Reported and tested by: Peter Holm Reviewed by: tegge MFC after: 3 weeks	2007-12-29 13:31:27 +00:00
Ken Smith	d9e6294e4f	Fix a broken check that recently became more annoying because it now gets enabled when INVARIANTS is on instead of DIAGNOSTIC (which apparently nobody uses). From Tor's description: This happens when the block range spans two block maps, the first in the inode (mapping up to NDADDR direct blocks) and the second being the first indirect block. The current check assumes that both block maps are indirect blocks. Work done by: tegge Tested by: kris, kensmith	2007-12-01 13:12:43 +00:00
Ruslan Ermilov	5b4ab4a032	Fix build without INVARIANTS and update a comment to match a change made in previous revision.	2007-11-09 11:04:36 +00:00
David E. O'Brien	1102b89baa	Turn most ffs 'DIAGNOSTIC's into INVARIANTS.	2007-11-08 17:21:51 +00:00
Robert Watson	30d239bc4c	Merge first in a series of TrustedBSD MAC Framework KPI changes from Mac OS X Leopard--rationalize naming for entry points to the following general forms: mac_<object>_<method/action> mac_<object>_check_<method/action> The previous naming scheme was inconsistent and mostly reversed from the new scheme. Also, make object types more consistent and remove spaces from object types that contain multiple parts ("posix_sem" -> "posixsem") to make mechanical parsing easier. Introduce a new "netinet" object type for certain IPv4/IPv6-related methods. Also simplify, slightly, some entry point names. All MAC policy modules will need to be recompiled, and modules not updated as part of this commit will need to be modified to conform to the new KPI. Sponsored by: SPARTA (original patches against Mac OS X) Obtained from: TrustedBSD Project, Apple Computer	2007-10-24 19:04:04 +00:00
Julian Elischer	3745c395ec	Rename the kthread_xxx (e.g. kthread_create()) calls to kproc_xxx as they actually make whole processes. This makes way for us to add REAL kthread_create() and friends that actually make threads. It turns out that most of these calls actually end up being moved back to the thread version when it's added, but we need to make this cosmetic change first. I'd LOVE to do this rename in 7.0 so that we can eventually MFC the new kthread_xxx() calls.	2007-10-20 23:23:23 +00:00
Alfred Perlstein	77465d9390	Get rid of qaddr_t. Requested by: bde	2007-10-16 10:54:55 +00:00
Bjoern A. Zeeb	7fd627f00f	Fix a DIV0 in case a large value for fs_avgfilesize or fs_avgfpdir is given (with newfs or tunefs) and dirsize overflows. In case dirsize is <= 0 because of an overflow set maxcontigdirs to 0 so it will be 1 later. This is what would happen for large fs_avgfilesize. [1] Identified with help from: roberto, pjd Submitted by: pjd [1] Approved by: re (rwatson) MFC after: 8 days	2007-09-10 14:12:29 +00:00
Craig Rodrigues	7a920f5761	Perform range check before allocating memory when reading extended attributes. Reviewed by: kib Approved by: re (hrs) PR: 114389	2007-07-13 18:51:08 +00:00
Peter Wemm	ae259a3d16	Fix an annoying pointer/int cast warning that shows up on 64 bit systems. Approved by: re	2007-07-02 01:31:43 +00:00
Konstantin Belousov	d66ba37013	Fix livelock that could occur when snapshotting UFS with quotas, where some quota limit was exceeded. The sequence of UFS_VALLOC()/UFS_VFREE() calls there could cause inodeblock to have both freefile and inodedep dependencies without any inode in the block being marked for write. Then, softdep_check_suspend() would return EAGAIN forever. Force write of inodeblock with allocated freefile softdependency by setting IN_MODIFIED flag in softdep_freefile and unconditionally calling UFS_UPDATE() in ufs_reclaim. Reported by: kris Debug help and tested by: Peter Holm Approved by: re (kensmith) MFC after: 3 weeks	2007-06-22 13:22:37 +00:00
Robert Watson	32f9753cfb	Eliminate now-unused SUSER_ALLOWJAIL arguments to priv_check_cred(); in some cases, move to priv_check() if it was an operation on a thread and no other flags were present. Eliminate caller-side jail exception checking (also now-unused); jail privilege exception code now goes solely in kern_jail.c. We can't yet eliminate suser() due to some cases in the KAME code where a privilege check is performed and then used in many different deferred paths. Do, however, move those prototypes to priv.h. Reviewed by: csjp Obtained from: TrustedBSD Project	2007-06-12 00:12:01 +00:00
Jeff Roberson	982d11f836	Commit 14/14 of sched_lock decomposition. - Use thread_lock() rather than sched_lock for per-thread scheduling synchronization. - Use the per-process spinlock rather than the sched_lock for per-process scheduling synchronization. Tested by: kris, current@ Tested on: i386, amd64, ULE, 4BSD, libthr, libkse, PREEMPTION, etc. Discussed with: kris, attilio, kmacy, jhb, julian, bde (small parts each)	2007-06-05 00:00:57 +00:00
Konstantin Belousov	7a31868ed0	Change the VOP_OPEN(), vn_open() vnode operation and d_fdopen() cdev operation argument from being file descriptor index into the pointer to struct file: part 2. Convert calls missed in the first big commit. Noted by: rwatson Pointy hat to: kib	2007-06-01 14:33:11 +00:00
Jeff Roberson	1c4bcd050a	- Move rusage from being per-process in struct pstats to per-thread in td_ru. This removes the requirement for per-process synchronization in statclock() and mi_switch(). This was previously supported by sched_lock which is going away. All modifications to rusage are now done in the context of the owning thread. Reads proceed without locks. - Aggregate exiting threads rusage in thread_exit() such that the exiting thread's rusage is not lost. - Provide a new routine, rufetch() to fetch an aggregate of all rusage structures from all threads in a process. This routine must be used in any place requiring a rusage from a process prior to its exit. The exited process's rusage is still available via p_ru. - Aggregate tick statistics only on demand via rufetch() or when a thread exits. Tick statistics are kept in the thread and protected by sched_lock until it exits. Initial patch by: attilio Reviewed by: attilio, bde (some objections), arch (mostly silent)	2007-06-01 01:12:45 +00:00
Konstantin Belousov	9e223287c0	Revert UF_OPENING workaround for CURRENT. Change the VOP_OPEN(), vn_open() vnode operation and d_fdopen() cdev operation argument from being file descriptor index into the pointer to struct file. Proposed and reviewed by: jhb Reviewed by: daichi (unionfs) Approved by: re (kensmith)	2007-05-31 11:51:53 +00:00
Pawel Jakub Dawidek	5d14c414ec	- Remove unnecessary vnode internal locking - v_vflag is protected by vnode's lock (not vnode's interlock). - Simplify code a bit.	2007-05-28 00:28:15 +00:00
Pawel Jakub Dawidek	64c40cdcb0	Eliminate VI_LOCK()/VI_UNLOCK() pair from getattr and close code paths. It's hard to measure performance improvement on my test machine, but the change won't degrade performance for sure. I can measure slight improvement for debugging kernel and it can also be a win for machines where atomic operation is more expensive. Reviewed by: kib	2007-05-23 11:06:09 +00:00
Konstantin Belousov	d413d21071	Since renaming of vop_lock to _vop_lock, pre- and post-condition function calls are no more generated for vop_lock. Rename _vop_lock to vop_lock1 to satisfy tools/vnode_if.awk assumption about vop naming conventions. This restores pre/post-condition calls.	2007-05-18 13:02:13 +00:00
Andrew Thompson	832eef31d1	Add a newline to the printf message.	2007-05-03 22:39:52 +00:00
Konstantin Belousov	5b959aa44f	Fix the NAMEI zone leak when snapshot was successfully created. Reported and tested by: Peter Holm MFC after: 2 weeks	2007-04-10 09:31:42 +00:00
Konstantin Belousov	9724167c2a	Recalculate the NEWBLOCK flag for pagedep structure after the softdep lock is dropped, since pagedep may be already processed and deallocated. Found and tested by: kris MFC after: 2 weeks	2007-04-10 09:30:41 +00:00
Konstantin Belousov	23743f6a11	When LK_NOWAIT is passed as argument to process_worklist_item(), this does not prevent handle_workitem_remove() from recursing into a blocking version. Add the dirrem to worklist instead of processing it now if this is the case. Reported and tested by: kris Submitted by: tegge MFC after: 2 weeks	2007-04-10 09:28:17 +00:00
Xin LI	04533fc68e	Use *_EMPTY macros when appropriate.	2007-04-04 07:29:53 +00:00
Konstantin Belousov	06f0c8dc4d	Revert rev. 1.205. Replace unconditional acquisition of Giant when QUOTAS are defined with VFS_LOCK_GIANT(NULL) call. This shall fix softdep operation when mpsafe_vfs = 0. Reported and tested by: kris Submitted by: tegge MFC after: 1 week	2007-03-29 08:26:04 +00:00
Konstantin Belousov	36d4667907	Mark UFS as being MP-Safe in "options QUOTA" case too. Remove no longer necessary Giant acquisitions in softdepend processing code. Tested by: Peter Holm Reviewed by: tegge Approved by: re (kensmith)	2007-03-20 10:51:45 +00:00
Brian Somers	dd51858d31	When we write extended attributes, assert that the inode hasn't already been deleted. The assertion is important to show that we won't end up accounting for extended attribute blocks (using fs_pendingblocks) in our subsequent call to fs_alloc(). Agreed verbally by: mckusick MFC after: 3 weeks	2007-03-19 18:51:02 +00:00
Konstantin Belousov	088ffd2086	Implement fine-grained locking for UFS quotas. Each struct dquot gets dq_lock mutex to protect dq_flags and to interlock with DQ_LOCK. qhash, dqfreelist and dq.dq_cnt are protected by global dqhlock mutex. i_dquot array for inode is protected by lockmgr' vnode lock, corresponding assert added to the dqget(). Access to struct ufsmount quota-related fields (um_quotas and um_qflags) is protected by um_lock. Tested by: Peter Holm Reviewed by: tegge Approved by: re (kensmith) This work would not have been possible without the enormous amount of help given by Tor Egge and Peter Holm. Tor reviewed each version of patch, pointed out numerous errors and provided invaluable suggestions. Peter did tireless testing of the patch as it was developed.	2007-03-14 08:54:08 +00:00
Konstantin Belousov	df0f953ae2	Call getinoquota() before allocating new block for the directory to properly account for block allocation. Tested by: Peter Holm Reviewed by: tegge Approved by: re (kensmith)	2007-03-14 08:50:27 +00:00
Konstantin Belousov	762c75b209	Remove unneeded getinoquota() call in the ufs_access(). Tested by: Peter Holm Reviewed by: tegge Approved by: re (kensmith)	2007-03-14 08:48:57 +00:00
Tor Egge	61b9d89ff0	Make insmntque() externally visible and allow it to fail (e.g. during late stages of unmount). On failure, the vnode is recycled. Add insmntque1(), to allow for file system specific cleanup when recycling vnode on failure. Change getnewvnode() to no longer call insmntque(). Previously, embryonic vnodes were put onto the list of vnode belonging to a file system, which is unsafe for a file system marked MPSAFE. Change vfs_hash_insert() to no longer lock the vnode. The caller now has that responsibility. Change most file systems to lock the vnode and call insmntque() or insmntque1() after a new vnode has been sufficiently setup. Handle failed insmntque*() calls by propagating errors to callers, possibly after some file system specific cleanup. Approved by: re (kensmith) Reviewed by: kib In collaboration with: kib	2007-03-13 01:50:27 +00:00
Kirk McKusick	a9093e846d	Move macros describing extended attributes in UFS from <sys/extattr.h> to <ufs/ufs/extattr.h>. Move description of extended attributes in UFS from man9/extattr.9 to man5/fs.5. Note that restore will not compile until <sys/extattr.h> and <ufs/ufs/extattr.h> have been updated. Suggested by: Robert Watson	2007-03-06 08:13:21 +00:00
Pawel Jakub Dawidek	b6f6e672f7	Fix build breakage.	2007-03-01 23:14:46 +00:00
Pawel Jakub Dawidek	7869327cfa	Change: "... try to use VADMIN in preference to VADMIN ..." To: "... try to use VADMIN in preference to VWRITE ..."	2007-03-01 21:44:08 +00:00
Pawel Jakub Dawidek	bb531912ff	Rename PRIV_VFS_CLEARSUGID to PRIV_VFS_RETAINSUGID, which seems to better describe the privilege. OK'ed by: rwatson	2007-03-01 20:47:42 +00:00
Pawel Jakub Dawidek	3b2eb461e0	Avoid checking for privileges if there is no need to. Discussed with: rwatson	2007-03-01 20:38:24 +00:00
Brian Somers	98fff6b57c	Account for di_blocks allocations when IN_SPACECOUNTED is set in an inode's i_flag. It's possible that after ufs_inactive() calls softdep_releasefile(), i_nlink stays >0 for a considerable amount of time (> 60 seconds here). During this period, any ffs allocation routines that alter di_blocks must also account for the blocks in the filesystem's fs_pendingblocks value. This change fixes an eventual df/du discrepancy that will happen as the result of fs_pendingblocks being reduced to <0. The only manifestation of this that people may recognise is the following message on boot: /somefs: update error: blocks -N files M at which point the negative pending block count is adjusted to zero. Reviewed by: tegge MFC after: 3 weeks	2007-02-23 20:23:35 +00:00
Kirk McKusick	6e6b7d44ef	The functions that set and delete external attributes must check that the filesystem is not mounted read-only before proceeding. Reported by: Ryan Beasley <ryanb@FreeBSD.org> MFC after: 1 week	2007-02-21 08:50:06 +00:00
Robert Watson	95b091d2f2	Rename three quota privileges from the UFS privilege namespace to the VFS privilege namespace: exceedquota, getquota, and setquota. Leave UFS-specific quota configuration privileges in the UFS name space. This renumbers VFS and UFS privileges, so requires rebuilding modules if you are using security policies aware of privilege identifiers. This is likely no one at this point since none of the committed MAC policies use the privilege checks.	2007-02-19 13:33:10 +00:00
Robert Watson	e82d0201bd	Limit quota privileges in jail to PRIV_UFS_GETQUOTA and PRIV_UFS_SETQUOTA.	2007-02-19 13:26:39 +00:00
Kirk McKusick	5a86fe5361	This README file is obsolete. The cited problems were fixed long ago and the code is installed by default so no longer requires action by the administrator to be included.	2007-02-17 08:25:43 +00:00
Pawel Jakub Dawidek	10bcafe9ab	Move vnode-to-file-handle translation from vfs_vptofh to vop_vptofh method. This way we may support multiple structures in v_data vnode field within one file system without using black magic. Vnode-to-file-handle should be VOP in the first place, but was made VFS operation to keep interface as compatible as possible with SUN's VFS. BTW. Now Solaris also implements vnode-to-file-handle as VOP operation. VFS_VPTOFH() was left for API backward compatibility, but is marked for removal before 8.0-RELEASE. Approved by: mckusick Discussed with: many (on IRC) Tested with: ufs, msdosfs, cd9660, nullfs and zfs	2007-02-15 22:08:35 +00:00
Konstantin Belousov	32e2b5f1e5	Style(9).	2007-02-15 09:24:58 +00:00
Konstantin Belousov	6a000036fc	Remove unneeded acquisition of the mount interlock around reading of mnt_kern_flags in ufs_itimes(). Suggested by: ssouhlal Confirmed by: tegge MFC after: 2 weeks	2007-02-08 09:47:19 +00:00
Tor Egge	0d86a7f7c2	Call pbgetvp() and pbrelvp() instead of setting b_vp directly. PR: kern/108151	2007-02-04 23:42:02 +00:00
Mike Pritchard	522883b87f	If quotacheck or edquota reset the block or inode grace time for a user or group, when the kernel first sees this, it will update the grace time value. However, it never flags the quota as modified and the updated value never makes it to the quota data file unless the user actually makes some other change that would write the data out. Fixed to flag the quota as modified if the soft limit has actually been reached and should be now enforced.	2007-02-04 06:46:57 +00:00
Mike Pritchard	6c62e3fce9	Prevent quotactl calls that pass in an id of -1 from incorrectly using the callers UID instead of the GID when performing group operations. This could allow users to determine group quota information for groups they are not a member of in some cases. Rename the "uid" parameter in ufs_quotactl to "id" to better show that it is used for more than just the uid, and to be more in line with the naming conventions in the other quota routines. PR: kern/33940	2007-02-01 02:13:53 +00:00
Mike Pritchard	3c0508582d	Disallow negative UIDs when processing quotactl options.	2007-02-01 01:01:56 +00:00
Konstantin Belousov	2cc7d26f7f	Cylinder group bitmaps and blocks containing inode for a snapshot file are after snaplock, while other ffs device buffers are before snaplock in global lock order. By itself, this could cause deadlock when bdwrite() tries to flush dirty buffers on snapshotted ffs. If, during the flush, COW activity for snapshot needs to allocate block and ffs_alloccg() selects the cylinder group that is being written by bdwrite(), then kernel would panic due to recursive buffer lock acquisition. Avoid dealing with buffers in bdwrite() that are from other side of snaplock divisor in the lock order than the buffer being written. Add new BOP, bop_bdwrite(), to do dirty buffer flushing for same vnode in the bdwrite(). Default implementation, bufbdflush(), refactors the code from bdwrite(). For ffs device buffers, specialized implementation is used. Reviewed by: tegge, jeff, Russell Cattelan (cattelan xfs org, xfs changes) Tested by: Peter Holm X-MFC after: 3 weeks (if ever: it changes ABI)	2007-01-23 10:01:19 +00:00
Xin LI	e499c6135c	Fix build. chkdquot() should not return anything.	2007-01-20 13:54:28 +00:00
Mike Pritchard	db9b81eabc	Quota system cleanup. 1) Do not do quota accounting for the actual quota data files or for file system snapshot files ("system" files). This prevents a deadlock described in PR kern/30958 if the kernel ever has to grow the quota file. Snapshot files were already exempt from the quota checks, but this change generalized the check. 2) Fix a cast that caused extremely large uids/gids to incorrectly write the quota information to the data file at a truncated value for a uint_t32 id value. The incorrect cast caused quota files in this case to be around 4GB in size, with the correct cast they can now be 131GB in size. Also related to PR kern/30958. 3) Check for what appear to be negative UIDs/GIDs and not account for them. This prevents the quota files from becoming 131GB in size and causing quotacheck to run forever at bootup. This could also cause the kernel to try and expand the quota file, which might deadlock due to the issue in #1. kern/30958 and kern/38156 (and some much older closed PR's). 4) With the deadlock problems gone, the kernel can now expand the size of the quota database files if it needs to. 5) Pass in the i-node count change value to chkiq and chkiqchg as an int, like it used to be before the common routine was split up into 2 different routines to increase / decrease the i-node in-use count. Prevents an underflow on the i-node count. Related to PR kern/89247. 6) Prevent the block usage from growing slowly if a file system is full and the write was denied due to that fact. PR kern/89247. Some of these changes require an updated quotacheck to prevent the creation of huge (131GB) quota data files (item #3). #1/#4 probably fixes a lot of the random hangs when quotas are enabled, possibly some of the jail hangs.	2007-01-20 11:58:32 +00:00
Mike Pritchard	6a5c532911	Fix a spelling error. heirarchy -> hierarchy. Obtained from: OpenBSD	2007-01-16 19:40:25 +00:00
Mike Pritchard	6192525baf	Fix a spelling error in some comments. heirarchy -> hierarchy. Obtained from: OpenBSD	2007-01-16 19:35:43 +00:00
Robert Watson	8102a9d4d5	Canonicalize copyright: use a date range rather than comma-delimited list. MFC after: 3 days	2007-01-08 17:55:32 +00:00
Kip Macy	2f6a774be4	Change vop_lock handling to allow tracking of callers' file and line for acquisition of lockmgr locks Approved by: scottl (standing in for mentor rwatson)	2006-11-13 05:51:22 +00:00
Robert Watson	acd3428b7d	Sweep kernel replacing suser(9) calls with priv(9) calls, assigning specific privilege names to a broad range of privileges. These may require some future tweaking. Sponsored by: nCircle Network Security, Inc. Obtained from: TrustedBSD Project Discussed on: arch@ Reviewed (at least in part) by: mlaier, jmg, pjd, bde, ceri, Alex Lyashkov <umka at sevcity dot net>, Skip Ford <skip dot ford at verizon dot net>, Antoine Brodin <antoine dot brodin at laposte dot net>	2006-11-06 13:42:10 +00:00
Konstantin Belousov	2276d0814f	Acquire Giant in the softdep_flush for clear_remove() and clear_inodedeps() processing when QUOTA is set. Reported and tested by: Peter Holm Reviewed by: tegge MFC after: 3 days	2006-11-01 13:48:44 +00:00
Pawel Jakub Dawidek	1a60c7fc8e	Add gjournal specific code to the UFS file system: - Add FS_GJOURNAL flag which enables gjournal support on a file system. - Add cg_unrefs field to the cylinder group structure which holds number of unreferenced (orphaned) inodes in the given cylinder group. - Add fs_unrefs field to the super block structure which holds total number of unreferenced (orphaned) inodes. - When file or a directory is orphaned (last reference is removed, but object is still open), increase fs_unrefs and cg_unrefs fields, which is a hint for fsck in which cylinder groups looks for such (orphaned) objects. - When file is last closed, decrease {fs,cg}_unrefs fields. - Add VV_DELETED vnode flag which points at orphaned objects. Sponsored by: home.pl	2006-10-31 21:48:54 +00:00
Robert Watson	aed5570872	Complete break-out of sys/sys/mac.h into sys/security/mac/mac_framework.h begun with a repo-copy of mac.h to mac_framework.h. sys/mac.h now contains the userspace and user<->kernel API and definitions, with all in-kernel interfaces moved to mac_framework.h, which is now included across most of the kernel instead. This change is the first step in a larger cleanup and sweep of MAC Framework interfaces in the kernel, and will not be MFC'd. Obtained from: TrustedBSD Project Sponsored by: SPARTA	2006-10-22 11:52:19 +00:00
Konstantin Belousov	ec7a247a24	Do not translate the IN_ACCESS inode flag into the IN_MODIFIED while filesystem is suspending/suspended. Doing so may result in deadlock. Instead, set the (new) IN_LAZYACCESS flag, that becomes IN_MODIFIED when suspend is lifted. Change the locking protocol in order to set the IN_ACCESS and timestamps without upgrading shared vnode lock to exclusive (see comments in the inode.h). Before that, inode was modified while holding only shared lock. Tested by: Peter Holm Reviewed by: tegge, bde Approved by: pjd (mentor) MFC after: 3 weeks	2006-10-10 09:20:54 +00:00
Tor Egge	ad4276811a	Correct check for when IO_SYNC should be set for filesystem not using softupdates when truncating a directory to zero length. Discussed with: bde	2006-10-02 02:08:31 +00:00
Tor Egge	8d0547c68b	Protect change to bo_flag by holding the bufobj mutex.	2006-09-26 04:21:20 +00:00
Tor Egge	e60c361218	Reduce fluctuations of mnt_flag to allow unlocked readers to get a slightly more consistent view.	2006-09-26 04:20:09 +00:00
Tor Egge	9b65c22cf4	Don't restore MNT_QUOTA bit in mnt_flag after snapshot creation, closing a race between nmount() and quotactl().	2006-09-26 04:19:11 +00:00
Tor Egge	55b4ff0d9f	Increase mnt_noasync once in softdep_mount() to disallow async io, closing a window where a file system using softupdates could be async for a short while if both MNT_UPDATE and MNT_ASYNC were passed as flags to nmount(). Add MNTK_SOFTDEP flag to ensure that softdep_mount() doesn't increase mnt_noasync multiple times.	2006-09-26 04:17:17 +00:00
Tor Egge	a1e363f256	Add mnt_noasync counter to better handle interleaved calls to nmount(), sync() and sync_fsync() without losing MNT_ASYNC. Add MNTK_ASYNC flag which is set only when MNT_ASYNC is set and mnt_noasync is zero, and check that flag instead of MNT_ASYNC before initiating async io.	2006-09-26 04:15:59 +00:00
Tor Egge	5da56ddb21	Use mount interlock to protect all changes to mnt_flag and mnt_kern_flag. This eliminates a race where MNT_UPDATE flag could be lost when nmount() raced against sync(), sync_fsync() or quotactl().	2006-09-26 04:12:49 +00:00
Konstantin Belousov	28de2218ec	Fix the glitch introduced in rev. 1.93. In softdep_sync_metadata(), switch by worklist type contains two for() loops, for D_INDIRDEP and D_PAGEDEP. On error, these loops are exited by break, where the switch actually should be left. Use goto instead of break to reach the error handling code. Reported by: Peter Holm Reviewed by: tegge Approved by: pjd (mentor) MFC after: 2 weeks	2006-09-20 07:49:28 +00:00
Robert Watson	5702e0965e	Declare security and security.bsd sysctl hierarchies in sysctl.h along with other commonly used sysctl name spaces, rather than declaring them all over the place. MFC after: 1 month Sponsored by: nCircle Network Security, Inc.	2006-09-17 20:00:36 +00:00
Konstantin Belousov	3f65847e2f	While checking for update of snapshot file in the ffs_copyonwrite, first filter out metadata update. Otherwise, devfs vnode could be erroneously interpreted as ufs one, causing further check of i_flags to use random memory. PR: kern/100365 Debugged and fix described by: tegge Approved by: pjd (mentor) MFC after: 2 weeks	2006-08-21 17:20:19 +00:00
Pawel Jakub Dawidek	f4cc92c97c	Correct typo in comment.	2006-08-20 10:52:44 +00:00
David E. O'Brien	80cd95f9cc	Rather than print out a nice error message giving details sufficient to fix a 'ufs_dirbad' and then panicking (making it very hard to see the details), put them in the panic message itself.	2006-07-31 15:44:13 +00:00
Stefan Farfeleder	2b8c9fa46b	Drop two unnecessary casts.	2006-07-18 07:03:43 +00:00
Daichi GOTO	55e9893a66	The ufs_lookup.c has a critical bug around the whiteout process. UFS must check a whiteout name when it uses the whiteout, but the current implementation does not check the whiteout name, so sometimes UFS writes over a wrong whiteout. UFS MUST check the whiteout name to use a correct whiteout. This bug leads to a unionfs panic. This commit fixes this trouble. Submitted by: Masanori Ozawa <ozawa@ongs.co.jp> (unionfs developer) Reviewed by: tegge & rodrigc (mentor) Approved by: rodrigc (mentor) MFC after: 2 weeks	2006-07-11 17:27:04 +00:00
Pawel Jakub Dawidek	5fe6d2beb4	Declare UFS module version.	2006-07-09 14:11:09 +00:00
Pawel Jakub Dawidek	946478fca6	Change fs->fs_fsmnt to mp->mnt_stat.f_mntonname in warnings about missing MAC and ACLs support in the kernel. If it is a first mount, fs->fs_fsmnt is empty. MFC after: 1 week	2006-07-09 14:10:35 +00:00
Craig Rodrigues	71ac2d7c7c	Check the sectorsize of the underlying disk before trying to bread() the UFS superblock. Should eliminate crashes when trying to do: mount -t ufs on an audio CD. PR: kern/85893 Reported by: Russell Francis <rfrancis at ev dot net> MFC after: 1 week	2006-06-03 21:20:37 +00:00
Maxim Konovalov	e680b88a3d	o Rearrange and remove incorrect comments. Requested by: bde	2006-05-31 15:55:52 +00:00
Maxim Konovalov	3593da9e81	o According to POSIX, the result of ftruncate(2) is unspecified for file types other than VREG, VDIR and shared memory objects. We already handle VREG, VLNK and VDIR cases. Silently ignore truncate requests for all the rest. Adjust comments. PR: kern/98064 Submitted by: bde Security: local DoS Regress. test: regression/fifo/fifo_misc MFC after: 2 weeks	2006-05-31 13:15:29 +00:00
Craig Rodrigues	ee98eb825b	Remove "update" from ffs_opts. It has been moved to global_opts in vfs_mount.c.	2006-05-26 12:44:12 +00:00
Craig Rodrigues	5eb304a91a	Remove calls to vfs_export() for exporting a filesystem for NFS mounting from individual filesystems. Call it instead in vfs_mount.c, after we call VFS_MOUNT() for a specific filesystem.	2006-05-26 00:32:21 +00:00
Craig Rodrigues	4ba8c2a5d3	Take errmsg out of ffs_opts. It is already part of global_opts in vfs_mount.c.	2006-05-24 00:12:21 +00:00
Maxim Konovalov	5d1d31b4b3	o Fix a comment: ufs2_dinode.di_blocks counts blocks not bytes actually held.	2006-05-21 21:55:29 +00:00
Maxim Konovalov	b6893ab299	o Fix a comment: directory whiteout type is DT_WHT not DT_W.	2006-05-21 21:28:34 +00:00
Tom Rhodes	e45269bf51	Provide a less cryptic panic message in place of just "found inode."	2006-05-16 18:51:22 +00:00
Tor Egge	e0cf717542	Read block hints list from last snapshot on the active snapshot list.	2006-05-16 00:14:20 +00:00
Tor Egge	d93d98d98f	Copy last block on file system again after file system has been suspended. Obtained from: NetBSD	2006-05-15 23:18:49 +00:00
Tor Egge	ae5d9f3b1c	Don't leak a locked buffer if last block on file system cannot be read.	2006-05-15 22:59:23 +00:00
Tor Egge	ebb78f64c7	Errors detected while file system is suspended should not trigger an assertion failure.	2006-05-15 22:52:22 +00:00
Tor Egge	b405cb5ea5	Expunge traces of unlinked snapshot files when making a new snapshot.	2006-05-13 20:41:37 +00:00
Tor Egge	4613aa0e99	Bring the call to softdep_releasefile() within the region protected by vn_start_secondary_write() since it might cause file system write activity (e.g. ffs_snapremove()).	2006-05-09 22:33:43 +00:00
Tor Egge	43e07fffb6	ffs_syncvnode() might skip some of the blocks due to them being locked, assuming them to be inflight write buffers. This is not always the case. bufdaemon might hold the buffer lock and give up writing the buffer due to it having dependencies, the file system being suspended or the vnode lock being held by another thread. When bufdaemon decides to write the buffer there is still a window before bufobj_wref() has been called, allowing other threads to believe that the vnode has no dirty buffers or inflight writes. Try harder to flush first block of new subdirectory to get rid of MKDIR_BODY dependency.	2006-05-06 20:51:31 +00:00
Tor Egge	b673e7b7eb	Return error if vnode was reclaimed while it was temporarily unlocked. Add missing calls to vn_finished_write() in error handling.	2006-05-05 21:27:31 +00:00
Tor Egge	0911ecffe7	Turn off disk quotas for snapshot files.	2006-05-05 20:10:04 +00:00
Tor Egge	c7793f61dc	Avoid locking overhead when snapshots are disabled.	2006-05-05 19:58:36 +00:00
Pawel Jakub Dawidek	5b139b2d75	- Set bio_done directly to NULL to indicate that we want to wait for the bio. - Use biowait() instead of copying the code. MFC after: 1 month	2006-05-05 10:06:22 +00:00
Tor Egge	d81daf63bc	Detect the snapshot file being prematurely unlinked.	2006-05-03 00:29:22 +00:00
Tor Egge	868bb88ff2	Temporarily undo clusters contribution to global runningbufspace while handling copy on write for the buffers taking part in the cluster.	2006-05-03 00:10:29 +00:00
Tor Egge	5515ad4282	A side effect of calling runningbufwakeup() is that bp->b_runningbufspace is cleared. Save old value and restore bp->b_runningbufspace before returning from ffs_copyonwrite().	2006-05-03 00:04:38 +00:00
Tor Egge	6d94935d36	Close a race when VOP_LOCK() on a snapshot file is attempted at the same time as it is changed back into a normal file. The locker would get the shared "snaplk" lock which would no longer be the correct lock for the vnode.	2006-05-02 23:52:43 +00:00
Scott Long	cbd6fedbf2	Fix a typo.	2006-04-28 04:39:50 +00:00
Jeff Roberson	6ca9fcc586	- Add a BO_NEEDSGIANT flag to the bufobj. This flag forces all child buffers to go on the buf daemon's DIRTYGIANT queue. - Set BO_NEEDSGIANT on ffs's devvp since the ffs_copyonwrite handler runs in the context of the buf daemon and may require Giant.	2006-04-28 01:05:31 +00:00
Tom Rhodes	7b3f1bbd61	Revert previous to this file before an actual request is made.	2006-04-22 04:22:15 +00:00
Tom Rhodes	8fc22c9d2e	Remove what I believe are two useless ifdefs. If a user or administrator enables multilabel, or any option for that matter, most likely they have a reason. This will allow users to see that multilabel is enabled via an issued "mount" command and remove an annoying warning - printed only when a MAC kernel is not installed - on boot up. Discussed with: green, brueffer, Samy Al Bahra. Probably ran past: csjp (though I can't remember).	2006-04-21 07:14:25 +00:00
Ken Smith	39fac37953	Fix panic() message to give the right function name.	2006-04-17 07:43:56 +00:00
Tor Egge	68e8466655	Eliminate softdep_flush() livelock by accounting for number of worklist items marked as being in progress.	2006-04-03 22:23:23 +00:00
Jeff Roberson	3bbd6d8ae6	- Release the references acquired by VOP_GETWRITEMOUNT and vfs_getvfs(). Discussed with: tegge Tested by: kris Sponsored by: Isilon Systems, Inc.	2006-03-31 03:54:20 +00:00
Tor Egge	700118c72f	Allow compilation when not using softupdates.	2006-03-19 22:16:44 +00:00
Tor Egge	7de3839d0d	Let snapshots make a copy of old contents for all buffers taking part in a cluster instead of just the first buffer. Delay buf_start() calls until snapshots have a copy of old content. PR: kern/93942	2006-03-19 21:43:36 +00:00
Tor Egge	30b3a49fab	Add kludge to avoid deadlock when unlinking snapshot.	2006-03-19 21:29:20 +00:00
Tor Egge	95e7a3c3ac	Reduce probability of unmount failing after having unmounted snapshots.	2006-03-19 21:09:19 +00:00
Tor Egge	8c86028f11	Ensure that vnode for directory isn't reclaimed before ffs_snapshot() has completed expunging unlinked files. It could come back at another memory location causing a lock order reversal.	2006-03-19 21:05:10 +00:00
Jeff Roberson	8db357205c	- Remove the call to softdep_waitidle after suspending the filesystem. This does not do what I wanted as all dirty buffers must be flushed by the call to ffs_sync and any remaining dependency work would mean that this failed. Pointed out by: tegge	2006-03-12 05:26:12 +00:00
Jeff Roberson	2eedeb7e60	- Remove the call to softdep_waitidle after suspending the filesystem. This does not do what I wanted as all dirty buffers must be flushed by the call to ffs_sync and any remaining dependency work would mean that this failed. Pointed out by: tegge	2006-03-12 05:24:14 +00:00
Tor Egge	ca2fa80767	Block secondary writes while expunging active unlinked files. Fix detection of active unlinked files by checking VI_OWEINACT and VI_DOINGINACT in addition to v_usecount. Defer inactive handling for unlinked files if the file system is mostly suspended (secondary writes being blocked). Perform deferred inactive handling after the file system is resumed.	2006-03-11 01:08:37 +00:00
Tor Egge	1e70cd7fc7	Remove unneeded (and broken) usage of MNT_REF()/MNT_REL().	2006-03-10 02:31:12 +00:00
Tor Egge	791dd2fade	Use vn_start_secondary_write() and vn_finished_secondary_write() as a replacement for vn_write_suspend_wait() to better account for secondary write processing. Close race where secondary writes could be started after ffs_sync() returned but before the file system was marked as suspended. Detect if secondary writes or softdep processing occurred during vnode sync loop in ffs_sync() and retry the loop if needed.	2006-03-08 23:43:39 +00:00
Tor Egge	a695d54404	Don't set IN_CHANGE and IN_UPDATE on inodes for potentially suspended file systems. This could cause deadlocks when creating snapshots. Reviewed by: jeff	2006-03-08 02:14:39 +00:00
Tor Egge	3b582b4e72	Eliminate a deadlock when creating snapshots. Blocking vn_start_write() must be called without any vnode locks held. Remove calls to vn_start_write() and vn_finished_write() in vnode_pager_putpages() and add these calls before the vnode lock is obtained to most of the callers that don't already have them.	2006-03-02 22:13:28 +00:00
Jeff Roberson	b9b12498fd	- Acquire lk in softdep_slowdown so that it's owned when we call softdep_speedup(). - Assert that lk is held in softdep_speedup() rather than acquiring it. This avoids a potential lock recursion.	2006-03-02 08:52:53 +00:00
Jeff Roberson	eb2ea10590	- Move softdep from using a global worklist to per-mount worklists. This has many positive effects including improved smp locking, reducing interdependencies between mounts that can lead to deadlocks, etc. - Add the softdep worklist and various counters to the ufsmnt structure. - Add a mount pointer to the workitem and remove mount pointers from the various structures derived from the workitem as they are now redundant. - Remove the poor-man's semaphore protecting softdep_process_worklist and softdep_flushworklist. Several threads may now process the list simultaneously. - Add softdep_waitidle() to block the thread until all pending dependencies being operated on by other threads have been flushed. - Use softdep_waitidle() in unmount and snapshots to block either operation until the fs is stable. - Remove softdep worklist processing from the syncer and move it into the softdep_flush() thread. This thread processes all softdep mounts once each second and when it is called via the new softdep_speedup() when there is a resource shortage. This removes the softdep hook from the kernel and various hacks in header files to support it. Reviewed by/Discussed with: tegge, truckman, mckusick Tested by: kris	2006-03-02 05:50:23 +00:00
Jeff Roberson	f5a4db791d	- Using LK_NOWAIT in qsync() can get us into infinite loop situations that lead to deadlocks. Remove it. MFC After: 1 week	2006-02-22 06:12:53 +00:00
Robert Watson	5652c15c24	In quotaoff(), lock the vnode instead of asserting it when manipulating v_vflags. MFC after: 1 week Submitted by: Antoine Brodin <antoine at brodin at laposte dot net>	2006-02-12 13:20:06 +00:00
Robert Watson	4a99d6f90a	Instead of asserting the vnode lock before manipulating v_vflag, acquire it and drop it afterwards. Found by: kris MFC after: 1 week	2006-02-11 21:09:27 +00:00
Jeff Roberson	89b0e10910	- Reorder calls to vrele() after calls to vput() when the vrele is a directory. vrele() may lock the passed vnode, which in these cases would give an invalid lock order of child -> parent. These situations are deadlock prone although do not typically deadlock because the vrele is typically not releasing the last reference to the vnode. Users of vrele must consider it as a call to vn_lock() and order it appropriately. MFC After: 1 week Sponsored by: Isilon Systems, Inc. Tested by: kkenn	2006-02-01 00:25:26 +00:00
Tor Egge	82be0a5a24	Add marker vnodes to ensure that all vnodes associated with the mount point are iterated over when using MNT_VNODE_FOREACH. Reviewed by: truckman	2006-01-09 20:42:19 +00:00
Tor Egge	6c62b2acd0	If the lock passed to getdirtybuf() is the softdep lock then the background write completed wakeup could be missed. Close the race by grabbing the lock normally used for protection of bp->b_xflags. Reviewed by: truckman	2006-01-09 19:32:21 +00:00
Tor Egge	c8c7711d66	Broaden scope of softdep_worklist_busy rwlock protection of softdep processing to avoid some dependencies being missed by softdep_flushworklist(). Reviewed by: truckman	2006-01-09 19:16:56 +00:00
Warner Losh	5c65ae3a88	New option: NO_FFS_SNAPSHOT. I did this in p4 about the same time that NetBSD implemented it independently of them (don't know which one was actually first). This saves about 24k for those times you don't need snapshot support (like when running off a ram disk, or in an embedded environment where size matters).	2006-01-06 04:44:09 +00:00
Xin LI	cd34c8b6a2	Typo.	2005-12-23 15:50:57 +00:00
Dag-Erling Smørgrav	0430a5e289	Eradicate caddr_t from the VFS API.	2005-12-14 00:49:52 +00:00
Craig Rodrigues	b6bd025c35	Fix parsing of atime, clusterr, clusterw, exec, suid, symfollow mount options. Noticed by: Amir Shalem < amir at boom dot org dot il>	2005-11-24 15:06:40 +00:00
Craig Rodrigues	cea903627f	If export mount flag is not passed in, set default parameters for export structure and pass that to vfs_export(). Currently in userland mount(8), an export structure is unconditionally passed in, only for UFS. This is an attempt to move that UFS-specific behavior out of mount(8) and into the UFS filesystem code.	2005-11-20 17:04:50 +00:00
Craig Rodrigues	359d438885	Add more options to ffs_opts, so that vfs_filteropts() will not complain when we pass these options to a UFS filesystem as strings via nmount(): noexec, nosuid, nosymfollow, sync, suiddir	2005-11-19 23:28:19 +00:00
Craig Rodrigues	26f59b6455	- Add parsing for the following existing UFS/FFS mount options in the nmount() callpath via vfs_getopt(), and set the appropriate MNT_* flag: -> acls, async, force, multilabel, noasync, noatime, -> noclusterr, noclusterw, snapshot, update - Allow errmsg as a valid mount option via vfs_getopt(), so we can later add a hook to propagate mount errors back to userspace via vfs_mount_error().	2005-11-18 06:06:10 +00:00
Xin LI	fad951e3a0	Slightly reorganize to reduce duplicated code. Reviewed by: rwatson	2005-11-07 18:25:23 +00:00
Paul Saab	e1cef62715	Rate limit filesystem full and out of inodes messages to once a second.	2005-10-31 20:33:28 +00:00
Robert Watson	5bb84bc84b	Normalize a significant number of kernel malloc type names: - Prefer '_' to ' ', as it results in more easily parsed results in memory monitoring tools such as vmstat. - Remove punctuation that is incompatible with using memory type names as file names, such as '/' characters. - Disambiguate some collisions by adding subsystem prefixes to some memory types. - Generally prefer lower case to upper case. - If the same type is defined in multiple architecture directories, attempt to use the same name in additional cases. Not all instances were caught in this change, so more work is required to finish this conversion. Similar changes are required for UMA zone names.	2005-10-31 15:41:29 +00:00
Xin LI	eb2893ec18	Remove an unneeded "a" from comment.	2005-10-25 19:46:15 +00:00
Nate Lawson	8680d6985f	Adjust maxfilesize for UFS1 and old 4.4 FFS. For UFS1, increase the limit to (max block - 1) * bsize. For DEV_BSIZE, this doubles the limit from 0.5 TB to 1 TB. For the old 4.4 FFS case, decrease the limit from 0.5 TB to 2 GB - 1. Older systems had a 32 bit off_t so they couldn't access the larger files anyway. Collaboration with: bde	2005-10-21 01:54:00 +00:00
Don Lewis	875e108755	Correct the type of the temporary variable used by ufs_lookup.c:1.78 to fix the race condition in the ufs_lookup() ISDOTDOT code. Noticed by: bde MFC after: 12 days	2005-10-16 21:31:46 +00:00
Don Lewis	12d360453c	Close a race in the ufs_lookup() code that handles the ISDOTDOT case by saving the value of dp->i_ino before unlocking the vnode for the current directory and passing the saved value to VFS_VGET(). Without this change, another thread can overwrite dp->i_ino after the current directory is unlocked, causing ufs_lookup() to lock and return the wrong vnode in place of the vnode for its parent directory. A deadlock can occur if dp->i_ino was changed to a subdirectory of the current directory because the root to leaf vnode lock ordering will be violated. A vnode lock can be leaked if dp->i_ino was changed to point to the current directory, which causes the current vnode lock for the current directory to be recursed, which confuses lookup() into calling vrele() when it should be calling vput(). The probability of this bug being triggered seems to be quite low unless the sysctl variable debug.vfscache is set to 0. Reviewed by: jhb MFC after: 2 weeks	2005-10-14 22:13:33 +00:00
Robert Watson	606dcf085f	When performing a VOP_LOOKUP() as part of UFS1 extended attribute auto-start, set cnp.cn_lkflags to LK_EXCLUSIVE. This flag must now be set so that lockmgr knows what kind of lock to acquire, and it will panic if not specified. This resulted in a panic when using extended attributes on UFS1 as of locking work present in the 6.x branch. This is a RELENG_6_0 merge candidate. Reported by: lofi MFC after: 3 days	2005-10-12 14:18:58 +00:00
Diomidis Spinellis	9f5c1d1955	Move execve's access time update functionality into a new vfs_mark_atime() function, and use the new function for performing efficient atime updates in mmap(). Reviewed by: bde MFC after: 2 weeks	2005-10-12 06:56:00 +00:00
Tor Egge	48c2ac4539	Avoid unintended VMIO on directories and symlinks due to leftover object not having been destroyed.	2005-10-10 19:02:04 +00:00
Tor Egge	4e0cd00988	Adjust totread argument passed to cluster_read() to account for offset not being block aligned.	2005-10-09 21:11:25 +00:00
Tor Egge	9248a8271c	Don't pretend that a failed sync write was successful.	2005-10-09 20:49:01 +00:00
Tor Egge	021869b542	Reduce probability for a deadlock that can occur when a snapshot inode is updated by a process holding the snapshot lock. Another process updating a different inode in the same inodeblock will do copy on write checks and lock in the opposite direction. The snapshot code forces a copy on write of these blocks manually (cf. start of expunge_ufs[12]) and these inode blocks are later put on snapblklist. This partial fix is to 'drain' the relevant ffs_copyonwrite() operation after installing new snapblklist. This is not a 100% solution since a failed block allocation can cause implicit fsync() which might deadlock before the new snapblklist has been installed.	2005-10-09 20:15:15 +00:00
Tor Egge	d4d530da96	Eliminate a deadlock that can occur when a dirty block belonging to a snapshot file is flushed by a process not holding snaplk (e.g. bufdaemon). Another process might hold snaplk and try to access the block due to ffs_copyonwrite processing.	2005-10-09 20:07:51 +00:00
Tor Egge	45f91051da	Eliminate a deadlock that can occur during the cgaccount() processing due to the cg map buffer being held when writing indirect blocks. The process ends up in ffs_copyonwrite(), attempting to get snaplk while holding the cg map buffer lock. Another process might be in ffs_copyonwrite(), trying to allocate a new block for a copy. It would hold snaplk while trying to get the cg map buffer lock. Release the cg map buffer early and use the copy for most of the cgaccount processing to avoid this deadlock.	2005-10-09 20:00:16 +00:00
Tor Egge	17026ff61a	Reduce the probability of low block numbers passed to ffs_snapblkfree() by skipping the call from ffs_snapremove() if the block number is zero. Simplify snapshot locking in ffs_copyonwrite() and ffs_snapblkfree() by using the same locking protocol for low block numbers as for larger block numbers. This removes a lock leak that could happen if vn_lock() succeeded after lockmgr() failed in ffs_snapblkfree(). Check if snapshot is gone before retrying a lock in ffs_copyonwrite().	2005-10-09 19:45:01 +00:00
Tor Egge	c73e9e9c7b	Reinitialize v_type and v_op fields in case vnode has been reused without reclamation. If the vnode previously was a fifo then v_op would point to ffs_fifoops[12] instead of the expected ffs_vnodeops[12], causing a panic at the end of ffsext_strategy.	2005-10-09 19:06:34 +00:00
Don Lewis	448434c3c9	Initialize the inode i_flag field in ffs_valloc() to clean up any stale flag bits left over from before the inode was recycled. Without this change, a leftover IN_SPACECOUNTED flag could prevent softdep_freefile() and softdep_releasefile() from incrementing fs_pendinginodes. Because handle_workitem_freefile() unconditionally decrements fs_pendinginodes, a negative value could be reported at file system unmount time with a message like: unmount pending error: blocks 0 files -3 The pending block count in fs_pendingblocks could also be negative for similar reasons. These errors can cause the data returned by statfs() to be slightly incorrect. Some other cleanup code in softdep_releasefile() could also be incorrectly bypassed. MFC after: 3 days	2005-10-03 21:57:43 +00:00
Don Lewis	460858e9ef	Correct previous commit to fix the sense of the TDP_NORUNNINGBUF check in ffs_copyonwrite() that is a precondition for calling waitrunningbufspace(). Pointed out by: tegge Pointy hat to: truckman MFC after: 3 days	2005-10-01 19:10:48 +00:00
Don Lewis	bd3c2d867d	Un-staticize waitrunningbufspace() and call it before returning from ffs_copyonwrite() if any async writes were launched. Restore the thread's previous TDP_NORUNNINGBUF state before returning from ffs_copyonwrite().	2005-09-30 18:07:41 +00:00
Don Lewis	6c8b634f1d	Un-staticize runningbufwakeup() and staticize updateproc. Add a new private thread flag to indicate that the thread should not sleep if runningbufspace is too large. Set this flag on the bufdaemon and syncer threads so that they skip the waitrunningbufspace() call in bufwrite() rather than checking the proc pointer vs. the known proc pointers for these two threads. A way of preventing these threads from being starved for I/O but still placing limits on their outstanding I/O would be desirable. Set this flag in ffs_copyonwrite() to prevent bufwrite() calls from blocking on the runningbufspace check while holding snaplk. This prevents snaplk from being held for an arbitrarily long period of time if runningbufspace is high and greatly reduces the contention for snaplk. The disadvantage is that ffs_copyonwrite() can start a large amount of I/O if there are a large number of snapshots, which could cause a deadlock in other parts of the code. Call runningbufwakeup() in ffs_copyonwrite() to decrement runningbufspace before attempting to grab snaplk so that I/O requests waiting on snaplk are not counted in runningbufspace as being in-progress. Increment runningbufspace again before actually launching the original I/O request. Prior to the above two changes, the system could deadlock if enough I/O requests were blocked by snaplk to prevent runningbufspace from falling below lorunningspace and one of the bawrite() calls in ffs_copyonwrite() blocked in waitrunningbufspace() while holding snaplk. See <http://www.holm.cc/stress/log/cons143.html>	2005-09-30 01:30:01 +00:00
Don Lewis	445193b887	After a rmdir()ed directory has been truncated, force an update of the directory's inode after queuing the dirrem that will decrement the parent directory's link count. This will force the update of the parent directory's actual link count to actually be scheduled. Without this change the parent directory's actual link count would not be updated until ufs_inactive() cleared the inode of the newly removed directory, which might be deferred indefinitely. ufs_inactive() will not be called as long as any process holds a reference to the removed directory, and ufs_inactive() will not clear the inode if the link count is non-zero, which could be the result of an earlier system crash. If a background fsck is run before the update of the parent directory's actual link count has been performed, or at least scheduled by putting the dirrem on the leaf directory's inodedep id_bufwait list, fsck will corrupt the file system by decrementing the parent directory's effective link count, which was previously correct because it already took the removal of the leaf directory into account, and setting the actual link count to the same value as the effective link count after the dangling, removed, leaf directory has been removed. This happens because fsck acts based on the actual link count, which will be too high when fsck creates the file system snapshot that it references. This change has the fortunate side effect of more quickly cleaning up the large number of dirrem structures that linger for an extended time after the removal of a large directory tree. It also fixes a potential problem with the shutdown of the syncer thread timing out if the system is rebooted immediately after removing a large directory tree. Submitted by: tegge MFC after: 3 days	2005-09-29 21:50:26 +00:00
Robert Watson	5f419982c2	Back out alpha/alpha/trap.c:1.124, osf1_ioctl.c:1.14, osf1_misc.c:1.57, osf1_signal.c:1.41, amd64/amd64/trap.c:1.291, linux_socket.c:1.60, svr4_fcntl.c:1.36, svr4_ioctl.c:1.23, svr4_ipc.c:1.18, svr4_misc.c:1.81, svr4_signal.c:1.34, svr4_stat.c:1.21, svr4_stream.c:1.55, svr4_termios.c:1.13, svr4_ttold.c:1.15, svr4_util.h:1.10, ext2_alloc.c:1.43, i386/i386/trap.c:1.279, vm86.c:1.58, unaligned.c:1.12, imgact_elf.c:1.164, ffs_alloc.c:1.133: Now that Giant is acquired in uprintf() and tprintf(), the caller no longer needs to acquire Giant unless it also holds another mutex that would generate a lock order reversal when calling into these functions. Specifically not backed out is the acquisition of Giant in nfs_socket.c and rpcclnt.c, where local mutexes are held and would otherwise violate the lock order with Giant. This aligns this code more with the eventual locking of ttys. Suggested by: bde	2005-09-28 07:03:03 +00:00
John Baldwin	7e9e371f2d	Use the refcount API to manage the reference count for user credentials rather than using pool mutexes. Tested on: i386, alpha, sparc64	2005-09-27 18:09:42 +00:00
Xin LI	0e4f6eecd7	Restore a historical ufs_inactive behavior that has been changed in rev. 1.40 of ufs_inode.c, which allows an inode being truncated even when the filesystem itself is marked RDONLY. A subsequent call of UFS_TRUNCATE (ffs_truncate) would panic the system as it asserts that it can only be called when the filesystem is mounted read-write (same changeset, rev. 1.74 of sys/ufs/ffs/ffs_inode.c). Because ffs_mount() already takes care of sync'ing the filesystem to disk before being downgraded to readonly, it appears to be more desirable that we should not permit this sort of writes to disk. This change would fix a panic that occurs when a corrupted filesystem is mounted read-only and some file operations are performed on it. MT6/5/4 candidate Reviewed by: mckusick	2005-09-23 20:49:57 +00:00
Robert Watson	84d2b7df26	Add GIANT_REQUIRED and WITNESS sleep warnings to uprintf() and tprintf(), as they both interact with the tty code (!MPSAFE) and may sleep if the tty buffer is full (per comment). Modify all consumers of uprintf() and tprintf() to hold Giant around calls into these functions. In most cases, this means adding an acquisition of Giant immediately around the function. In some cases (nfs_timer()), it means acquiring Giant higher up in the callout. With these changes, UFS no longer panics on SMP when either blocks are exhausted or inodes are exhausted under load due to races in the tty code when running without Giant. NB: Some reduction in calls to uprintf() in the svr4 code is probably desirable. NB: In the case of nfs_timer(), calling uprintf() while holding a mutex, or even in a callout at all, is a bad idea, and will generate warnings and potential upset. This needs to be fixed, but was a problem before this change. NB: uprintf()/tprintf() sleeping is generally a bad idea, as is having non-MPSAFE tty code. MFC after: 1 week	2005-09-19 16:51:43 +00:00
Tor Egge	2f0ffabcf4	Giant is no longer needed here.	2005-09-12 01:21:42 +00:00
Christian S.J. Peron	d1dfd92177	Convert the primary ACL allocator from malloc(9) to using a UMA zone instead. Also introduce an aclinit function which will be used to create the UMA zone for use by file systems at system start up. MFC after: 1 month Discussed with: rwatson	2005-09-06 00:06:30 +00:00
Tor Egge	d536ff2edb	Retain generation count when writing zeroes instead of an inode to disk. Don't free a struct inodedep if another process is allocating saved inode memory for the same struct inodedep in initiate_write_inodeblock_ufs[12](). Handle disappearing dependencies in softdep_disk_io_initiation(). Reviewed by: mckusick	2005-09-05 22:14:33 +00:00
Suleiman Souhlal	fdedad764a	ffs_mountfs() needs devvp to be locked, so lock it. Glanced at by: phk Tested by: pjd MFC after: 3 days	2005-09-02 13:52:55 +00:00
Suleiman Souhlal	93373c422f	Set the mountpoint path in the superblock (fs_fsmnt) at mount-time so that it appears in the various messages (not cleanly unmounted, filesystem full, etc). This has been broken since rev 1.261.	2005-08-21 22:06:41 +00:00
Tor Egge	15da51f73c	Don't set the COMPLETE flag in an inodedep structure before the related inode has been written.	2005-08-21 18:19:06 +00:00
Ian Dowse	65ed954554	In the ufsdirhash_build() failure case for corrupted directories or unreadable blocks, make sure to destroy the mutex we created. Also fix an unrelated typo in a comment. Found by: Peter Holm's stress tests Reviewed by: dwmalone MFC after: 3 days	2005-08-17 08:48:42 +00:00
Stephan Uphoff	ed8938e082	Delay freeing disk space for file system blocks until all dirty buffers are safely released. This fixes softdep problems on truncation (deletion) of files with dirty buffers. Reviewed by: jeff@, mckusick@, ps@, tegge@ Tested by: glebius@, ps@ MFC after: 3 weeks	2005-07-31 20:24:14 +00:00
Alan Cox	ec9c9e7363	Eliminate inconsistency in the setting of the B_DONE flag. Specifically, make the b_iodone callback responsible for setting it if it is needed. Previously, it was set unconditionally by bufdone() without holding whichever lock is shared by the b_iodone callback and the corresponding top-half function. Consequently, in a race, the top-half function could conclude that operation was done before the b_iodone callback finished. See, for example, aio_physwakeup() and aio_fphysio(). Note: I don't believe that the other, more widely-used b_iodone callbacks are affected. Discussed with: jeff Reviewed by: phk MFC after: 2 weeks	2005-07-20 19:06:06 +00:00
Suleiman Souhlal	679985d03a	Allow EVFILT_VNODE events to work on every filesystem type, not just UFS by: - Making the pre and post hooks for the VOP functions work even when DEBUG_VFS_LOCKS is not defined. - Moving the KNOTE activations into the corresponding VOP hooks. - Creating a MNTK_NOKNOTE flag for the mnt_kern_flag field of struct mount that permits filesystems to disable the new behavior. - Creating a default VOP_KQFILTER function: vfs_kqfilter() My benchmarks have not revealed any performance degradation. Reviewed by: jeff, bde Approved by: rwatson, jmg (kqueue changes), grehan (mentor)	2005-06-09 20:20:31 +00:00
Ken Smith	6341095e0d	This patch addresses a standards violation issue. The standards say a file's access time should be updated when it gets executed. A while ago the mechanism used to exec was changed to use a more mmap based mechanism and this behavior was broken as a side-effect of that. A new vnode flag is added that gets set when the file gets executed, and the VOP_SETATTR() vnode operation gets called. The underlying filesystem is expected to handle it based on its own semantics, some filesystems don't support access time at all. Those that do should handle it in a way that does not block, does not generate I/O if possible, etc. In particular vn_start_write() has not been called. The UFS code handles it the same way as it would normally handle the access time if a file was read - the IN_ACCESS flag gets set in the inode but no other action happens at this point. The actual time update will happen later during a sync (which handles all the necessary locking). Got me into this: cperciva Discussed with: a lot with bde, a little with kan Showed patches to: phk, jeffr, standards@, arch@ Minor discussion on: arch@	2005-05-31 19:39:52 +00:00
Jeff Roberson	204ec66d38	- Don't set our bio op to be a READ when we've just completed a write. There are subtle differences in the read and write completion path. Instead, grab an extra write ref so the write path can drop it when we recursively call bufdone(). I believe this may be the source of the wrong bufobj panics. Reported by: pho, kkenn	2005-05-30 07:04:15 +00:00
Kirk McKusick	6a52a06851	Allow removal of empty directories with high link counts. These can occur on a filesystem running with soft updates after a crash and before a background fsck has been run. To prevent discrepancies from arising in a background fsck that may already be running, the directory is removed but its inode is not freed and is left with the residual reference count. When encountered by the background fsck it will be reclaimed.	2005-05-18 22:18:21 +00:00
Jeff Roberson	6c71a2208d	- Don't restrict the softdep stats to DEBUG kernels, they cost nothing to export. This was happening anyway since this file manually sets DEBUG. - Add a sysctl for the number of items on the worklist. - Use a more canonical loop restart in softdep_fsync_mountdev, it saves some code at the expense of a goto and makes me worry less about modifying a variable that should be private to the TAILQ_FOREACH_SAFE macro.	2005-05-03 11:03:29 +00:00
Jeff Roberson	2524c26de8	- Use bdone() directly instead of calling it indirectly through ffs_rawreaddone(). Sponsored by: Isilon Systems, Inc.	2005-04-30 11:28:19 +00:00
Pawel Jakub Dawidek	231b1be179	- Plug memory leak. - Fix two style nits. Found by: Coverity Prevent analysis tool Reviewed by: rwatson MFC after: 1 week	2005-04-16 10:57:49 +00:00
Jeff Roberson	4585e3ac5a	- Change all filesystems and vfs_cache to relock the dvp once the child is locked in the ISDOTDOT case. See vfs_lookup.c r1.79 for details. Sponsored by: Isilon Systems, Inc.	2005-04-13 10:59:09 +00:00
Jeff Roberson	ece9473efa	- Consistently call 'vp' vp rather than ovp sometimes in ffs_truncate(). Do the same for oip. Pointed out by: glebius	2005-04-05 08:49:41 +00:00
Jeff Roberson	bcc8f66c8b	- Use M_ZERO rather than explicitly calling bzero(). - Don't intermingle direct calls to lockmgr and indirect calls through VOPs. This will be important in the future. - Don't lock the devvp's interlock just to release it on the next line by passing LK_INTERLOCK to lockmgr. - Restructure ffs_snapshot_unmount so we don't call free() with the devvp's interlock locked.	2005-04-03 12:03:44 +00:00
Jeff Roberson	41d4783d49	- In ffs_sync we need to pass LK_SLEEPFAIL in when we lock the vnode because it may change identities while we're sleeping on the lock. Otherwise we may bail out of ffs_sync() early due to an error from deadfs. - Collapse a VOP_UNLOCK, vrele into a single vput().	2005-04-03 10:38:18 +00:00
Jeff Roberson	153910e0f5	- Move the contents of softdep_disk_prewrite into ffs_geom_strategy to fix two bugs. - ffs_disk_prewrite was pulling the vp from the buf and checking for COPYONWRITE, when really it wanted the vp from the bufobj that we're writing to, which is the devvp. This led to us skipping the copy on write to all file data, which significantly broke snapshots for the last few months. - When the SOFTUPDATES option was not included in the kernel config we would also skip the copy on write check, which would effectively disable snapshots. - Remove an invalid mp_fixme(). Debugging tips from: mckusick Reported by: iedowse, others Discussed with: phk	2005-04-03 10:29:55 +00:00
Jeff Roberson	278c5a6efa	- Fix botched LK_NOWAIT removal. I mistakenly thought this compiled as part of GENERIC.	2005-03-31 05:58:14 +00:00
Jeff Roberson	aa7ba42796	- FFS supports shared locks, clear LK_NOSHARE from our vnode locks. Sponsored by: Isilon Systems, Inc.	2005-03-31 05:23:20 +00:00
Jeff Roberson	ec3db02a3e	- Set LK_NOSHARE for snapshot locks. snapshots require exclusive only access. - Remove the hack from ffs_lock() to implement LK_NOSHARE in a ffs specific way. Sponsored by: Isilon Systems, Inc.	2005-03-31 05:21:17 +00:00
Jeff Roberson	f247a5240d	- LK_NOPAUSE is a nop now. Sponsored by: Isilon Systems, Inc.	2005-03-31 04:37:09 +00:00
Jeff Roberson	52f6886551	- Remove wantparent, it is no longer necessary. An assert in vfs_lookup.c prevents any callers from doing a modifying op without LOCKPARENT or WANTPARENT. It wasn't even properly used in the CREATE or DELETE cases.	2005-03-29 13:16:38 +00:00
Jeff Roberson	d6919865fa	- Upgrade a shared lock request to exclusive in ffs_vget() if we have to create the vnode. Sponsored by: Isilon Systems, Inc.	2005-03-29 10:10:51 +00:00
Jeff Roberson	a69c43548d	- Honor the cn_lkflags passed from namei() when locking the leaf. Sponsored by: Isilon Systems, Inc.	2005-03-29 10:10:01 +00:00
Jeff Roberson	e19881ff08	- UFS no longer uses PDIRUNLOCK to track the parent state. Instead, we now rely on ufs to always leave the parent locked except in the ISDOTDOT case. Adjust asserts to deal with these changes. Sponsored by: Isilon Systems, Inc.	2005-03-28 09:35:58 +00:00
Jeff Roberson	eddcb03d02	- We no longer have to bother with PDIRUNLOCK, lookup() handles it for us. Sponsored by: Isilon Systems, Inc.	2005-03-28 09:34:36 +00:00
David Schultz	188f6433f6	When the softupdates worklist gets too long, threads that attempt to add more work are forced to process two worklist items first. However, processing an item may generate additional work, causing the unlucky thread to recursively process the worklist. Add a per-thread flag to detect this situation and avoid the recursion. This should fix the stack overflows that could occur while removing large directory trees. Tested by: kris Reviewed by: mckusick	2005-03-25 17:30:31 +00:00
Jeff Roberson	080c061ad0	- Call VFS_ROOT() with LK_EXCLUSIVE. Sponsored by: Isilon Systems, Inc.	2005-03-24 07:33:45 +00:00
Jeff Roberson	469ec10c1e	- Update the ufs_root() prototype. - Pass the ufs_root() flags argument to VFS_VGET() to allow callers to specify shared locks. Sponsored by: Isilon Systems, Inc.	2005-03-24 07:32:50 +00:00
Jeff Roberson	23d15e852d	- Lock the clearing of v_data in ufs_reclaim() to prevent a pagefault in ffs_lock() when it accesses v_data without the vnlock. Sponsored by: Isilon Systems, Inc.	2005-03-17 11:58:43 +00:00
Poul-Henning Kamp	51f5ce0c8c	Add two arguments to the vfs_hash() KPI so that filesystems which do not have unique hashes (NFS) can also use it.	2005-03-16 11:20:51 +00:00
Poul-Henning Kamp	de68347b1b	Don't hold a reference on the disk vnode for each inode.	2005-03-15 20:50:58 +00:00
Poul-Henning Kamp	45c26fa2b6	Improve the vfs_hash() API: vput() the unneeded vnode centrally to avoid replicating the vput in all the filesystems.	2005-03-15 20:00:03 +00:00
Poul-Henning Kamp	e82ef95c11	Simplify the vfs_hash calling convention.	2005-03-15 08:07:07 +00:00
Jeff Roberson	4483fe9227	- Destroy the vnode object earlier in VOP_RECLAIM as we need more of the vnode valid before the vm flushes pages. - Get rid of some extraneous uses of the vnode interlock. Sponsored by: Isilon Systems, Inc.	2005-03-15 01:42:58 +00:00
Poul-Henning Kamp	14bc0685ac	Use vfs_hash instead of home-rolled.	2005-03-14 10:21:16 +00:00
Jeff Roberson	9cbe5da9d5	- It is not legal to access v_data without the vnode lock or interlock held. Grab the vnode interlock if LK_INTERLOCK has not been passed in so that we can inspect v_data in ffs_lock(). Sponsored by: Isilon Systems, Inc.	2005-03-13 12:04:12 +00:00
Jeff Roberson	fe68abe291	- The VI_DOOMED flag now signals the end of a vnode's relationship with the filesystem. Check that rather than VI_XLOCK. - Shorten ffs_reload by one step. The old check for an inactive vnode was slightly racy, and the code which deals with still active vnodes is not much more expensive. Sponsored by: Isilon Systems, Inc.	2005-03-13 12:03:14 +00:00
Jeff Roberson	fdcc82276e	- The VI_DOOMED flag now signals the end of a vnode's relationship with the filesystem. Check that rather than VI_XLOCK. Sponsored by: Isilon Systems, Inc.	2005-03-13 12:01:50 +00:00
Jeff Roberson	b5411d4fcb	- Fix an assert now that the XLOCK no longer exists. Sponsored by: Isilon Systems, Inc.	2005-03-13 12:00:41 +00:00
Jeff Roberson	b6ee8476d3	- In ufs_mknod(), hold the lock across the call to vgone() as that is now required. - In ufs_close(), don't do the EAGAIN vrele hack, the top layer now calls vn_start_write before the lock is acquired as it should. Sponsored by: Isilon Systems, Inc.	2005-03-13 11:59:14 +00:00
Jeff Roberson	38d504db44	- Don't drop the lock in ufs_inactive(). - Also in ufs_inactive, don't acquire the vnode interlock where it isn't strictly needed. Also owning the vnode interlock while calling vprint() will cause locking assertions to trip. Sponsored by: Isilon Systems, Inc.	2005-03-13 11:57:39 +00:00
Jeff Roberson	41766826eb	- Fix another dyslexic moment; an atomic_set_int should've become ACTIVESET, not ACTIVECLEAR. Submitted by: iedowse	2005-03-01 07:38:45 +00:00
Poul-Henning Kamp	7ce296cf04	Remove debug printout of major/minor numbers, print name instead.	2005-02-27 21:16:26 +00:00
Sam Leffler	d5bbad8372	use uiomove return value instead of always returning 0 when doing a readlink of a fast link Noticed by: Coverity Prevent analysis tool Reviewed by: phk	2005-02-27 18:58:31 +00:00
Jeff Roberson	1a4a9672f1	- Add VOP locking asserts in several functions that have been implicated in recent deadlocks.	2005-02-22 23:56:42 +00:00
Xin LI	a16baf37b9	The recomputation of file system summary at mount time can be a very slow process, especially for large file systems that have just recovered from a crash. Since the summary is already re-sync'ed every 30 seconds, we will not lag behind too much after a crash. With this consideration in mind, it is more reasonable to transfer the responsibility to background fsck, to reduce the delay after a crash. Add a new sysctl variable, vfs.ffs.compute_summary_at_mount, to control this behavior. When set to nonzero, we will get the "old" behavior, that the summary is computed immediately at mount time. Add five new sysctl variables to adjust ndir, nbfree, nifree, nffree and numclusters respectively. Teach fsck_ffs about these APIs, however, intentionally not to check the existence, since kernels without these sysctls must have recomputed the summary and hence no adjustments are necessary. This change has eliminated the usual tens of minutes of delay of mounting large dirty volumes. Reviewed by: mckusick MFC After: 1 week	2005-02-20 08:02:15 +00:00
Poul-Henning Kamp	dfd4be14bd	Try to unbreak the vnode locking around vop_reclaim() (based mostly on patch from kan@). Pull bufobj_invalbuf() out of vinvalbuf() and make g_vfs call it on close. This is not yet a generally safe function, but for this very specific use it is safe. This solves the problem with buffers not being flushed by unmount or after failed mount attempts.	2005-02-19 11:44:57 +00:00
Xin LI	d5128ab2af	When clearing a fragment, it's possible that the length is zero. Reviewed by: mckusick MFC After: 1 week	2005-02-19 07:31:33 +00:00
Jeff Roberson	a8127ebb5d	- Remove the unused and unsafe ufs_ihashlookup. This function returned a vnode pointer that could not be used since no locks were held. Sponsored by: Isilon Systems, Inc.	2005-02-14 20:51:39 +00:00
Poul-Henning Kamp	1121c39497	Make non-SOFTUPDATES kernels compile again. Integrate the stubfile into the main file now that license issues have been long resolved.	2005-02-11 08:13:31 +00:00
Poul-Henning Kamp	adf4157738	Make some SYSCTL_NODEs and some of FFS's VFS_ methods static.	2005-02-10 12:20:08 +00:00
Jeff Roberson	a3caf16e99	- In the softupdates case for ffs_truncate() we use vinvalbuf() to invalidate pending io and dependencies. However, vinvalbuf() rightfully does not call vnode_pager_setsize() for us. We must do this here. This could potentially have caused numerous kinds of bugs, but it was specifically causing msync() deadlocks because msync() was writing flushing pages that should not have been valid. Sponsored by: Isilon Systems, Inc. Reported by: kkenn	2005-02-09 23:05:20 +00:00
Poul-Henning Kamp	365b18aa89	style polishing.	2005-02-09 12:22:16 +00:00
Colin Percival	79653046d8	Add a new sysctl, "security.jail.chflags_allowed", which controls the behaviour of chflags within a jail. If set to 0 (the default), then a jailed root user is treated as an unprivileged user; if set to 1, then a jailed root user is treated the same as an unjailed root user. This is necessary to allow "make installworld" to work inside a jail, since it attempts to manipulate the system immutable flag on certain files. Discussed with: csjp, rwatson MFC after: 2 weeks	2005-02-08 21:31:11 +00:00
Poul-Henning Kamp	02f2c6a9d8	Split the vop_vector for ffs1 and ffs2, this is mostly for the different EXTATTR support.	2005-02-08 21:03:52 +00:00
Poul-Henning Kamp	44787ceb0b	Use ffs_truncate() directly instead of UFS_TRUNCATE()	2005-02-08 20:51:00 +00:00
Poul-Henning Kamp	dd19a799b8	Background writes are entirely an FFS/Softupdates thing. Give FFS vnodes a specific bufwrite method which contains all the background write stuff and then calls into the default bufwrite() for the rest of the job. Remove all the background write related stuff from the normal bufwrite. This drags the softdep_move_dependencies() back into FFS. Long term, it is worth looking at simply copying the data into allocated memory and issuing the bio directly and not create the "shadow buf" in the first place (just like copy-on-write is done in snapshots for instance). I don't think we really gain anything but complexity from doing this with a buf.	2005-02-08 20:29:10 +00:00
Poul-Henning Kamp	88e5b12a20	Drag another softupdates tentacle back into FFS: Now that FFS's vop_fsync is separate from the internal use we can do the full job there.	2005-02-08 18:09:11 +00:00
Poul-Henning Kamp	efd6d9808c	Don't use the UFS_* and VFS_* functions where a direct call is possible. The UFS_ functions are for UFS to call back into VFS. The VFS functions are external entry points into the filesystem.	2005-02-08 17:40:01 +00:00
Robert Watson	45faa442c3	Don't use VOP_LEASE() with operations on extended attribute backing files. Pointed out by: phk	2005-02-08 17:05:38 +00:00
Poul-Henning Kamp	40854ff546	For snapshots we need all VOP_LOCKs to be exclusive. The "business class upgrade" was implemented in UFS's VOP_LOCK implementation ufs_lock() which is the wrong layer, so move it to ffs_lock(). Also, as long as we have not abandoned advanced vfs-stacking we should not preclude it from happening: instead of implementing a copy locally, use the VOP_LOCK_APV(&ufs) to correctly arrive at vop_stdlock() at the bottom.	2005-02-08 16:25:50 +00:00
Poul-Henning Kamp	d6f622cc2f	For snapshots we need all VOP_LOCKs to be exclusive. The "business class upgrade" was implemented in UFS's VOP_LOCK implementation ufs_lock() which is the wrong layer, so move it to ffs_lock(). Also, as long as we have not abandoned advanced vfs-stacking we should not preclude it from happening: instead of implementing a copy locally, use the VOP_LOCK_APV(&ufs) to correctly arrive at vop_stdlock() at the bottom.	2005-02-08 15:54:30 +00:00
Poul-Henning Kamp	32a870da8a	Use VOP_STRATEGY_APV() instead of direct dereference, this is more correct.	2005-02-08 15:40:11 +00:00
Jeff Roberson	9087d86e66	- Use a separate malloc tag for saved inode contents to help in debugging memory modified after free errors. Sponsored by: Isilon Systems, Inc.	2005-02-02 20:30:47 +00:00
Ken Smith	87c29bf93e	Back out previous commit, bde@ provided an example of something this breaks.	2005-02-02 14:21:01 +00:00
Ken Smith	0fac1537a2	It was noticed that we do not change a file's access time when it gets executed. This appears to violate most of the UNIX-ish standards. One example quote from: http://www.opengroup.org/onlinepubs/009695399/functions/exec.html Upon successful completion, the exec functions shall mark for update the st_atime field of the file. If an exec function failed but was able to locate the process image file, whether the st_atime field is marked for update is unspecified. Should the exec function succeed, the process image file shall be considered to have been opened with open(). This appears to take care of it for ufs filesystems, doing the necessary sanity checks (read-only filesystem, etc) without violating any other standards (setting atime for any open appears to be allowed in any standards I could find). Noticed by: cperciva Reviewed by: kan, rwatson	2005-02-02 00:21:38 +00:00
Warner Losh	1f0ce611b3	nit in /*-	2005-01-31 08:16:45 +00:00
Peter Edwards	e697161fa2	Tell vnode_create_vobject() how big an object to create, rather than having it work it out via the more expensive VOP_GETATTR Reviewed by: phk@	2005-01-29 14:23:09 +00:00
Poul-Henning Kamp	a369f34d76	Make filesystems get rid of their own vnodes vnode_pager object in VOP_RECLAIM().	2005-01-28 14:42:17 +00:00
Poul-Henning Kamp	d4eb29ba71	Remove unused argument to vrecycle()	2005-01-28 13:08:21 +00:00
Poul-Henning Kamp	84a6975215	Introduce and use g_vfs_close().	2005-01-25 15:52:04 +00:00
Poul-Henning Kamp	8516dd18e1	Don't use VOP_GETVOBJECT, use vp->v_object directly.	2005-01-25 00:40:01 +00:00
Poul-Henning Kamp	f74b3b1f6c	Create a vnode object when the file is opened. Trust that we did so.	2005-01-24 23:04:33 +00:00
Poul-Henning Kamp	ce12d37e7b	Don't create vnode_pager objects for the disk device. geom_vfs will do that.	2005-01-24 22:41:59 +00:00
Poul-Henning Kamp	625d4bc03a	Create a vp->v_object in VFS_FHTOVP() if we want to be exportable with NFS. We are moving responsibility for creating the vnode_pager object into the filesystems which own the vnode, and this is one of the places we have to cover. We call vnode_create_vobject() directly because we own the vnode. If we can get the size easily, pass it as an argument to save the call to VOP_GETATTR() in vnode_create_vobject()	2005-01-24 21:51:19 +00:00
Poul-Henning Kamp	091710ab22	Polish style.	2005-01-24 12:19:28 +00:00
Jeff Roberson	08023360a0	- Convert the global LK lock to a mutex. - Expand the scope of lk to cover not only interrupt races, but also top-half races, which includes many new uses over global top-half only data. - Get rid of interlocked_sleep() and use msleep or BUF_LOCK where appropriate. - Use the lk mutex in place of the various hand rolled semaphores. - Stop dropping the lk lock before we panic. - Fix getdirtybuf() callers so that they reacquire access to whatever softdep datastructure they were expecting in the failure/retry case. Previously, sleeps in getdirtybuf() could leave us with pointers to bad memory. - Update handling of ffs to be compatible with ffs locking changes. Sponsored By: Isilon Systems, Inc.	2005-01-24 10:18:31 +00:00
Jeff Roberson	3ba649d792	- Initialize and destroy the per-filesystem ufs lock where appropriate. - Use the buffer lock on the superblock buf to serialize calls to sbupdate. - Set the MNTK_MPSAFE flag when QUOTA is not defined in the kernel. Sponsored By: Isilon Systems, Inc.	2005-01-24 10:12:28 +00:00
Jeff Roberson	dec351f69e	- Remove GIANT_REQUIRED where giant is no longer required. Sponsored By: Isilon Systems, Inc.	2005-01-24 10:10:47 +00:00
Jeff Roberson	5cef9d6add	- Use the ufs lock to protect fs_active. Sponsored By: Isilon Systems, Inc.	2005-01-24 10:10:11 +00:00
Jeff Roberson	353255885c	- Acquire the ufs lock around several ffs_alloc functions that require it. Sponsored By: Isilon Systems, Inc.	2005-01-24 10:09:10 +00:00
Jeff Roberson	8e37fbad3a	- Don't use atomic operations to deal with the active array, instead it is now quite naturally protected by the ufsmount mutex. - Use the ufs lock to protect various fields in struct fs, primarily the cg summary needs protection to avoid allocation races. Several functions have been slightly re-arranged to reduce the number of lock operations. - Adjust several functions (blkfree, freefile, etc.) to accept a ufsmount as an argument so that we may access the ufs lock. Sponsored By: Isilon Systems, Inc.	2005-01-24 10:08:35 +00:00
Jeff Roberson	5c77b03eff	- Acquire the ufs lock when manipulating some fields of struct fs. - Change arguments to various ffs functions to match their new prototypes. Sponsored By: Isilon Systems, Inc.	2005-01-24 10:04:22 +00:00
Jeff Roberson	f2aa1113a3	- Mark the struct fs members that require the ufsmount mutex. - Define some macros for manipulating the fs_active bitmap. Sponsored By: Isilon Systems, Inc.	2005-01-24 10:03:17 +00:00
Jeff Roberson	aaee366929	- Change some function parameters so that the ufsmount structure is accessible in places where the ufs lock will be needed. Sponsored By: Isilon Systems, Inc.	2005-01-24 10:02:11 +00:00
Jeff Roberson	751d0d9fc9	- Add a mutex to the ufsmount structure. This mutex is used to protect any per-instance global data that is not already protected by a buf or vnode lock. Presently, only fields in ffs's struct fs utilize this lock. - Sort some ufsmount members so that fields used for quotas are grouped together. This is in anticipation of quota locking. Sponsored By: Isilon Systems, Inc.	2005-01-24 10:01:10 +00:00
Pawel Jakub Dawidek	39cfb23935	Fix ACLs handling for the root file system. Without this fix, when ACLs are set via tunefs(8) on the root file system, they are removed on boot when 'mount -a' is called, because mount(8) called for the root file system always add MNT_UPDATE flag and MNT_UPDATE flag isn't perfect. Now, one cannot remove ACLs stored in superblock (configured with tunefs(8)) via 'mount -a' nor 'mount -u -o noacls <file system>', but it is still possible to mount file system which doesn't have ACLs in superblock via 'mount -o acls <file system>' or /etc/fstab's 'acls' option. Reported by: Lech Lorens/pl.comp.os.bsd Discussed with: phk, rwatson Reviewed by: rwatson MFC after: 2 weeks	2005-01-15 17:09:53 +00:00
Poul-Henning Kamp	7c0745eeae	Eliminate unused and unnecessary "cred" argument from vinvalbuf()	2005-01-14 07:33:51 +00:00
Poul-Henning Kamp	e39db32ab0	Ditch vfs_object_create() and make the callers call VOP_CREATEVOBJECT() directly.	2005-01-13 12:25:19 +00:00
Poul-Henning Kamp	6ef8480a88	Add BO_SYNC() and add a default which uses the secret vnode pointer and VOP_FSYNC() for now.	2005-01-11 10:43:08 +00:00
Poul-Henning Kamp	0391e5a151	Wrap the bufobj operations in macros: BO_STRATEGY() and BO_WRITE()	2005-01-11 09:10:46 +00:00
Poul-Henning Kamp	8df6bac4c7	Remove the unused credential argument from VOP_FSYNC() and VFS_SYNC(). I'm not sure why a credential was added to these in the first place, it is not used anywhere and it doesn't make much sense: The credentials for syncing a file (ability to write to the file) should be checked at the system call level. Credentials for syncing one or more filesystems ("none") should be checked at the system call level as well. If the filesystem implementation needs a particular credential to carry out the syncing it would logically have to the cached mount credential, or a credential cached along with any delayed write data. Discussed with: rwatson	2005-01-11 07:36:22 +00:00
Warner Losh	60727d8b86	/* -> /*- for license, minor formatting changes	2005-01-07 02:29:27 +00:00
Poul-Henning Kamp	a7e8286f28	white space	2004-12-14 21:35:00 +00:00
Poul-Henning Kamp	59d42685ad	Implement simpler panics for VOP_{read,write} on fifos.	2004-12-14 21:30:45 +00:00
Warner Losh	7a7e867742	LINT defines things which compile in code that as referring to the old a_desc element. change this to the new a_gen.a_desc to reflect changes to vnode_if.h generation. Noticed by: tinderbox, phk	2004-12-13 17:53:20 +00:00
Poul-Henning Kamp	4a18054d7b	With the introduction of UFS2 we started looking for superblocks in four different locations on a prospective filesystem. If we found none, we forgot to invalidate the four buffers, thus the following sequence would fail: (md0 = blank disk) mount /dev/md0 /mnt (fails, no superblocks) newfs /dev/md0 (writes using physio which does not go through buffercache). mount /dev/md0 /mnt (still fails, the four cached buffers still contain no superblocks) Found by: ru	2004-12-12 14:19:11 +00:00
Marcel Moolenaar	9effe51e45	Revert previous commit. The null-pointer function call (a dereference on ia64) was not the result of a change in the vector operations. It was caused by the NFS locking code using a FIFO and those bypassing the vnode. This indirectly caused the panic. The NFS locking code has been changed. Requested by: phk	2004-12-11 23:05:30 +00:00
Kirk McKusick	364ed814e7	Fixes a bug that caused UFS2 filesystems bigger than 2TB to prematurely report that they were full and/or to panic the kernel with the message ``ffs_clusteralloc: allocated out of group''. Submitted by: Henry Whincup <henry@jot.to> MFC after: 1 week	2004-12-09 21:24:00 +00:00
Poul-Henning Kamp	8f25bad356	Fix snapshot creation.	2004-12-08 11:54:06 +00:00
Poul-Henning Kamp	f21cc2cafc	Fix nfs exports (for now). The real fix is to teach mountd about nmount.	2004-12-07 15:09:30 +00:00
Poul-Henning Kamp	20a92a18f1	The remaining part of nmount/omount/rootfs mount changes. I cannot sensibly split the conversion of the remaining three filesystems out from the root mounting changes, so in one go: cd9660: Convert to nmount. Add omount compat shims. Remove dedicated rootfs mounting code. Use vfs_mountedfrom() Rely on vfs_mount.c calling VFS_STATFS() nfs(client): Convert to nmount (the simple way, mount_nfs(8) is still necessary). Add omount compat shims. Drop COMPAT_PRELITE2 mount arg compatibility. ffs: Convert to nmount. Add omount compat shims. Remove dedicated rootfs mounting code. Use vfs_mountedfrom() Rely on vfs_mount.c calling VFS_STATFS() Remove vfs_omount() method, all filesystems are now converted. Remove MNTK_WANTRDWR, handling RO/RW conversions is a filesystem task, and they all do it now. Change rootmounting to use DEVFS trampoline: vfs_mount.c: Mount devfs on /. Devfs needs no 'from' so this is clean. symlink /dev to /. This makes it possible to lookup /dev/foo. Mount "real" root filesystem on /. Surgically move the devfs mountpoint from under the real root filesystem onto /dev in the real root filesystem. Remove now unnecessary getdiskbyname(). kern_init.c: Don't do devfs mounting and rootvnode assignment here, it was already handled by vfs_mount.c. Remove now unused bdevvp(), addaliasu() and addalias(). Put the few necessary lines in devfs where they belong. This eliminates the second-last source of bogo vnodes, leaving only the lemming-syncer. Remove rootdev variable, it doesn't give meaning in a global context and was not trustworthy anyway. Correct information is provided by statfs(/).	2004-12-07 08:15:41 +00:00
Poul-Henning Kamp	743312367a	VFS_STATFS(mp, ...) is mostly called with &mp->mnt_stat, but a few cases doesn't. Most of the implementations have grown weeds for this so they copy some fields from mnt_stat if the passed argument isn't that. Fix this the cleaner way: Always call the implementation on mnt_stat and copy that in toto to the VFS_STATFS argument if different.	2004-12-05 22:41:02 +00:00
Marcel Moolenaar	061f5ec825	Fix null-pointer indirect function calls introduced in the previous commit. In the new world order, the transitive closure on the vector operations is not precomputed. As such, it's unsafe to actually use any of the function pointers in an indirect function call. They can be null, and we need to use the default vector in that case. This is mostly a quick fix for the four function pointers that are ed explicitly. A more generic or scalable solution is likely to see the light of day. No pathos on: current@	2004-12-05 22:30:28 +00:00
Poul-Henning Kamp	93e0b506e3	typo in comment.	2004-12-03 20:36:55 +00:00
Poul-Henning Kamp	aec0fb7b40	Back when VOP_* was introduced, we did not have new-style struct initializations but we did have lofty goals and big ideals. Adjust to more contemporary circumstances and gain type checking. Replace the entire vop_t frobbing thing with properly typed structures. The only casualty is that we can not add a new VOP_ method with a loadable module. History has not given us reason to believe this would ever be feasible in the first place. Eliminate in toto VOCALL(), vop_t, VNODEOP_SET() etc. Give coda correct prototypes and function definitions for all vop_()s. Generate a bit more data from the vnode_if.src file: a struct vop_vector and prototype typedefs for all vop methods. Add a new vop_bypass() and make vop_default be a pointer to another struct vop_vector. Remove a lot of vfs_init since vop_vector is ready to use from the compiler. Cast various vop_mumble() to void * with uppercase name, for instance VOP_PANIC, VOP_NULL etc. Implement VCALL() by making vdesc_offset the offsetof() the relevant function pointer in vop_vector. This is disgusting but since the code is generated by a script comparatively safe. The alternative for nullfs etc. would be much worse. Fix up all vnode method vectors to remove casts so they become typesafe. (The bulk of this is generated by scripts)	2004-12-01 23:16:38 +00:00
Poul-Henning Kamp	6fde64c778	Mechanically change prototypes for vnode operations to use the new typedefs.	2004-12-01 12:24:41 +00:00
Poul-Henning Kamp	964ebefd8d	Use system wide no-op vfs_start function.	2004-11-25 09:11:27 +00:00
Jeff Roberson	b646893f0f	- Eliminate the acquisition and release of the bqlock in bremfree() by setting the B_REMFREE flag in the buf. This is done to prevent lock order reversals with code that must call bremfree() with a local lock held. This also reduces overhead by removing two lock operations per buf for fsync() and similar. - Check for the B_REMFREE flag in brelse() and bqrelse() after the bqlock has been acquired so that we may remove ourself from the free-list. - Provide a bremfreef() function to immediately remove a buf from a free-list for use only by NFS. This is done because the nfsclient code overloads the b_freelist queue for its own async. io queue. - Simplify the numfreebuffers accounting by removing a switch statement that executed the same code in every possible case. - getnewbuf() can encounter locked bufs on free-lists once Giant is removed. Remove a panic associated with this condition and delay asserts that inspect the buf until after it is locked. Reviewed by: phk Sponsored by: Isilon Systems, Inc.	2004-11-18 08:44:09 +00:00
Poul-Henning Kamp	9c83534dd8	Make VOP_BMAP return a struct bufobj for the underlying storage device instead of a vnode for it. The vnode_pager does not and should not have any interest in what the filesystem uses for backend. (vfs_cluster doesn't use the backing store argument.)	2004-11-15 09:18:27 +00:00
Poul-Henning Kamp	51ac12ab28	Be prepared to accept NULL mountargs as part of root-mounting.	2004-11-13 13:04:31 +00:00
Poul-Henning Kamp	cf5e414960	Put back the vfs_object_create() calls, they do make a difference when my test-setup does what I want it to instead of what I ask it to. Pointed out by: tegge	2004-11-12 10:27:14 +00:00
Poul-Henning Kamp	40ce27cb57	fix some comments	2004-11-10 06:53:31 +00:00
Poul-Henning Kamp	2e6649198a	Use mount flags instead of NULL path to detect root filesystem mount.	2004-11-09 23:38:10 +00:00
Poul-Henning Kamp	5e2ccaff7a	Stop pretending to have a vm_object backing the underlying disk vnode: it isn't used for anything anywhere and the vnode_pager would explode if we attempted to.	2004-11-09 23:12:45 +00:00
Poul-Henning Kamp	5349c79d75	Properly implement a default version of VOP_GETWRITEMOUNT. Remove improper access to vop_stdgetwritemount() which should and will instead rely on the VOP default path.	2004-11-06 11:41:22 +00:00
Poul-Henning Kamp	40c340aa5d	Don't grab the exclusive bit on a root filesystem until we are willing to mount it. Doing so prevented fsck to be run after a refused mount.	2004-11-04 09:11:22 +00:00
Poul-Henning Kamp	4392001125	Move UFS from DEVFS backing to GEOM backing. This eliminates a bunch of vnode overhead (approx 1-2 % speed improvement) and gives us more control over the access to the storage device. Access counts on the underlying device are not correctly tracked and therefore it is possible to read-only mount the same disk device multiple times: syv# mount -p /dev/md0 /var ufs rw 2 2 /dev/ad0 /mnt ufs ro 1 1 /dev/ad0 /mnt2 ufs ro 1 1 /dev/ad0 /mnt3 ufs ro 1 1 Since UFS/FFS is not a synchronously consistent filesystem (ie: it caches things in RAM) this is not possible with read-write mounts, and the system will correctly reject this. Details: Add a geom consumer and a bufobj pointer to ufsmount. Eliminate the vnode argument from softdep_disk_prewrite(). Pick the vnode out of bp->b_vp for now. Eventually we should find it through bp->b_bufobj->b_private. In the mountcode, use g_vfs_open() once we have used VOP_ACCESS() to check permissions. When upgrading and downgrading between r/o and r/w do the right thing with GEOM access counts. Remove all the workarounds for not being able to do this with VOP_OPEN(). If we are the root mount, drop the exclusive access count until we upgrade to r/w. This allows fsck of the root filesystem and the MNT_RELOAD to work correctly. Set bo_private to the GEOM consumer on the device bufobj. Change the ffs_ops->strategy function to call g_vfs_strategy() In ufs_strategy() directly call the strategy on the disk bufobj. Same in rawread. In ffs_fsync() we will no longer see VCHR device nodes, so remove code which synced the filesystem mounted on it, in case we came there. I'm not sure this code made sense in the first place since we would have taken the specfs route on such a vnode. Redo the highly bogus readblock() function in the snapshot code to something slightly less bogus: Constructing an uio and using physio was really quite a detour. Instead just fill in a bio and ship it down.	2004-10-29 10:15:56 +00:00
Poul-Henning Kamp	570a7ddaa3	We only support backing UFS/FFS with disks.	2004-10-28 06:19:28 +00:00
Poul-Henning Kamp	a40a512387	Eliminate unnecessary KASSERTS.	2004-10-27 06:45:06 +00:00
Poul-Henning Kamp	93d244fb1a	KASSERT that we only get to prewrite() on writes.	2004-10-26 20:13:49 +00:00
Poul-Henning Kamp	8dd5650594	White space changes. Add missing static.	2004-10-26 20:13:21 +00:00
Poul-Henning Kamp	53389dd64a	Replace single case switch() with if().	2004-10-26 20:12:25 +00:00
Poul-Henning Kamp	b6e2606155	Vertically align comment.	2004-10-26 20:12:00 +00:00
Poul-Henning Kamp	6e77a04170	The island council met and voted buf_prewrite() home. Give ffs its own bufobj->bo_ops vector and create a private strategy routine, (currently misnamed for forwards compatibility), which is just a copy of the generic bufstrategy routine except we call softdep_disk_prewrite() directly instead of through the buf_prewrite() indirection. Teach UFS about the need for softdep_disk_prewrite() and call the function directly in FFS. Remove buf_prewrite() from the default bufstrategy() and from the global bio_ops method vector.	2004-10-26 10:44:10 +00:00
Poul-Henning Kamp	58883a1fe5	Fix syntax errors introduced by last commit. Why isn't DIRECTIO in NOTES/LINT ?	2004-10-26 09:04:20 +00:00
Poul-Henning Kamp	5d9d81e7ea	Put the I/O block size in bufobj->bo_bsize. We keep si_bsize_phys around for now as that is the simplest way to pull the number out of disk device drivers in devfs_open(). The correct solution would be to do an ioctl(DIOCGSECTORSIZE), but the point is probably moot when filesystems sit on GEOM, so don't bother for now.	2004-10-26 07:39:12 +00:00
Poul-Henning Kamp	fae974f156	Degeneralize the per cdev copyonwrite callback. The only possible value is ffs_copyonwrite() and the only place it can be called from is FFS which would never want to call another filesystems copyonwrite method, should one exist, so there is no reason why anything generic should know about this.	2004-10-26 06:25:56 +00:00
Poul-Henning Kamp	156cb26583	Lose the v_dirty* and v_clean* alias macros. Check the count field where we just want to know the full/empty state, rather than using TAILQ_EMPTY() or TAILQ_FIRST().	2004-10-25 09:14:03 +00:00
Poul-Henning Kamp	ee1d0eb330	Remove vnode->v_bsize. This was a dead-end.	2004-10-25 07:50:59 +00:00


1849 Commits