freebsd-dev

Author	SHA1	Message	Date
Konstantin Belousov	6c21f6edb8	The VOP_LOOKUP() implementations for CREATE op do not put the name into namecache, to avoid cache trashing when doing large operations. E.g., tar archive extraction is not usually followed by access to many of the files created. Right now, each VOP_LOOKUP() implementation explicitely knowns about this quirk and tests for both MAKEENTRY flag presence and op != CREATE to make the call to cache_enter(). Centralize the handling of the quirk into VFS, by deciding to cache only by MAKEENTRY flag in VOP. VFS now sets NOCACHE flag for CREATE namei() calls. Note that the change in semantic is backward-compatible and could be merged to the stable branch, and is compatible with non-changed third-party filesystems which correctly handle MAKEENTRY. Suggested by: Chris Torek <torek@pi-coral.com> Reviewed by: mckusick Tested by: pho Sponsored by: The FreeBSD Foundation MFC after: 2 weeks	2014-12-18 10:01:12 +00:00
Pedro F. Giffuni	4896af9f55	ufs: small formatting fixes. Cleanup some extra space. Use of tabs vs. spaces. No functional change. MFC after: 3 days Reviewed by: mckusick	2014-03-02 02:52:34 +00:00
Konstantin Belousov	2aea094f65	Only copy as much bytes as there in superblock, instead of the full block copy, when copying the superblock into the snapshot. UFS1 does not align superblock on the block boundary, and bcopy runs off the end of the buffer. Reported by: Andre Albsmeier <Andre.Albsmeier@siemens.com> Reviewed by: mckusick Sponsored by: The FreeBSD Foundation MFC after: 1 week	2013-07-12 18:52:33 +00:00
Konstantin Belousov	cc3d8c35f5	There are several code sequences like vfs_busy(mp); vfs_write_suspend(mp); which are problematic if other thread starts unmount between two calls. The unmount starts a write, while vfs_write_suspend() drain writers. On the other hand, unmount drains busy references, causing the deadlock. Add a flag argument to vfs_write_suspend and require the callers of it to specify VS_SKIP_UNMOUNT flag, when the call is performed not in the mount path, i.e. the covered vnode is not locked. The suspension is not attempted if VS_SKIP_UNMOUNT is specified and unmount is in progress. Reported and tested by: Andreas Longwitz <longwitz@incore.de> Sponsored by: The FreeBSD Foundation MFC after: 3 weeks	2013-07-09 20:49:32 +00:00
Jeff Roberson	22a722605d	- Convert the bufobj lock to rwlock. - Use a shared bufobj lock in getblk() and inmem(). - Convert softdep's lk to rwlock to match the bufobj lock. - Move INFREECNT to b_flags and protect it with the buf lock. - Remove unnecessary locking around bremfree() and BKGRDINPROG. Sponsored by: EMC / Isilon Storage Division Discussed with: mckusick, kib, mdf	2013-05-31 00:43:41 +00:00
Konstantin Belousov	ddd6b3fc33	Add flags argument to vfs_write_resume() and remove vfs_write_resume_flags(). Sponsored by: The FreeBSD Foundation	2013-01-11 06:08:32 +00:00
Konstantin Belousov	f99cb34c4f	The process_deferred_inactive() function locks the vnodes of the ufs mount, which means that is must not be called while the snaplock is owned. The vfs_write_resume(9) does call the function as the VFS_SUSP_CLEAN() method, which is too early and falls into the region still protected by snaplock. Add yet another flag for the vfs_write_resume_flags() to avoid calling suspension cleanup handler after the suspend is lifted, and use it in the ffs_snapshot() call to vfs_write_resume. Reported and tested by: pho Sponsored by: The FreeBSD Foundation MFC after: 2 weeks	2013-01-01 16:14:48 +00:00
Konstantin Belousov	91e9474552	Make it possible to atomically resume writes on the mount and account the write start, by adding a variation of the vfs_write_resume(9) which accepts flags. Use the new function to prevent a deadlock between parallel suspension and snapshotting a UFS mount. The ffs_snapshot() code performed vfs_write_resume() followed by vn_start_write() while owning the snaplock. If the suspension intervene between resume and vn_start_write(), the deadlock occured after the suspending thread tried to lock the snaplock, most typically during the write in the ffs_copyonwrite(). Reported and tested by: Andreas Longwitz <longwitz@incore.de> Reviewed by: mckusick MFC after: 2 weeks X-MFC-note: make the vfs_write_resume(9) function a macro after the MFC, in HEAD	2012-12-28 23:08:30 +00:00
Matthew D Fleming	fc8fdae0df	Fix up kernel sources to be ready for a 64-bit ino_t. Original code by: Gleb Kurtsou	2012-09-27 23:30:49 +00:00
Kevin Lo	f7a3729c91	Use NULL instead of 0 for pointers	2012-07-22 15:40:31 +00:00
Edward Tomasz Napierala	c52fd858ae	Remove unused thread argument from vtruncbuf(). Reviewed by: kib	2012-04-23 13:21:28 +00:00
Kirk McKusick	71469bb38f	Replace the MNT_VNODE_FOREACH interface with MNT_VNODE_FOREACH_ALL. The primary changes are that the user of the interface no longer needs to manage the mount-mutex locking and that the vnode that is returned has its mutex locked (thus avoiding the need to check to see if its is DOOMED or other possible end of life senarios). To minimize compatibility issues for third-party developers, the old MNT_VNODE_FOREACH interface will remain available so that this change can be MFC'ed to 9. Following the MFC to 9, MNT_VNODE_FOREACH will be removed in head. The reason for this update is to prepare for the addition of the MNT_VNODE_FOREACH_ACTIVE interface that will loop over just the active vnodes associated with a mount point (typically less than 1% of the vnodes associated with the mount point). Reviewed by: kib Tested by: Peter Holm MFC after: 2 weeks	2012-04-17 16:28:22 +00:00
Kirk McKusick	ecb6e528c5	Export vinactive() from kern/vfs_subr.c (e.g., make it no longer static and declare its prototype in sys/vnode.h) so that it can be called from process_deferred_inactive() (in ufs/ffs/ffs_snapshot.c) instead of the body of vinactive() being cut and pasted into process_deferred_inactive(). Reviewed by: kib MFC after: 2 weeks	2012-04-11 23:01:11 +00:00
Kirk McKusick	75a5838904	Add a third flags argument to ffs_syncvnode to avoid a possible conflict with MNT_WAIT flags that passed in its second argument. This will be MFC'ed together with r232351. Discussed with: kib	2012-03-25 00:02:37 +00:00
Kirk McKusick	35338e6091	This change avoids a kernel deadlock on "snaplk" when using snapshots on UFS filesystems running with journaled soft updates. This is the first of several bugs that need to be fixed before removing the restriction added in -r230250 to prevent the use of snapshots on filesystems running with journaled soft updates. The deadlock occurs when holding the snapshot lock (snaplk) and then trying to flush an inode via ffs_update(). We become blocked by another process trying to flush a different inode contained in the same inode block that we need. It holds the inode block for which we are waiting locked. When it tries to write the inode block, it gets blocked waiting for the our snaplk when it calls ffs_copyonwrite() to see if the inode block needs to be copied in our snapshot. The most obvious place that this deadlock arises is in the ffs_copyonwrite() routine when it updates critical metadata in a snapshot and tries to write it out before proceeding. The fix here is to write the data and indirect block pointer for the snapshot, but to skip the call to ffs_update() to write the snapshot inode. To ensure that we will never have to update a pointer in the inode itself, the ffs_snapshot() routine that creates the snapshot has to ensure that all the direct blocks are allocated as part of the creation of the snapshot. A less obvious place that this deadlock occurs is when we hold the snaplk because we are deleting a snapshot. In the course of doing the deletion, we need to allocate various soft update dependency structures and allocate some journal space. If we hit a resource limit while doing this we decrease the resources in use by flushing out an existing dirty file to get it to give up the soft dependency resources that it holds. The flush can cause an ffs_update() to be done on the inode for the file that we have selected to flush resulting in the same deadlock as described above when the inode that we have chosen to flush resides in the same inode block as the snapshot inode that we hold. The fix is to defer cleaning up any time that the inode on which we are operating is a snapshot. Help and review by: Jeff Roberson Tested by: Peter Holm MFC (to 9 only) after: 2 weeks	2012-03-01 18:45:25 +00:00
Kirk McKusick	86b571509a	There are several bugs/hangs when trying to take a snapshot on a UFS/FFS filesystem running with journaled soft updates. Until these problems have been tracked down, return ENOTSUPP when an attempt is made to take a snapshot on a filesystem running with journaled soft updates. MFC after: 2 weeks	2012-01-17 01:14:56 +00:00
Kirk McKusick	8cd680f8c3	This update eliminates a lock-order reversal warning discovered whle tracking down the system hang reported in kern/160662 and corrected in revision 225806. The LOR is not the cause of the system hang and indeed cannot cause an actual deadlock. However, it can be easily eliminated by defering the acquisition of a buflock until after all the vnode locks have been acquired. Reported by: Hans Ottevanger PR: kern/160662	2011-09-27 17:41:48 +00:00
Kirk McKusick	6b3b8a2109	This update eliminates the system hang reported in kern/160662 when taking a snapshot on a filesystem running with journaled soft updates. Reported by: Hans Ottevanger Fix verified by: Hans Ottevanger PR: kern/160662	2011-09-27 17:34:02 +00:00
Kirk McKusick	9957ac07b2	Fixed dereference of a NULL pointer. Reported by: Peter Holm	2011-06-18 21:10:03 +00:00
Kirk McKusick	43a3cc7796	Ensure that filesystem metadata contained within persistent snapshots is always kept consistent. Suggested by: Jeff Roberson	2011-06-15 23:19:09 +00:00
Kirk McKusick	9eb8728aa5	Update to soft updates journaling to properly track freed blocks that get claimed by snapshots. Submitted by: Jeff Roberson Tested by: Peter Holm	2011-06-12 19:27:05 +00:00
Alexander Leidinger	3eb6e1317c	Add some FEATURE macros for some UFS features. SU+J is not included as a FEATURE macro: - it was not in the tree during the GSoC - I do not see an option to en-/disable it in NOTES Two minor changes where made during the review compared to what was developed during GSoC 2010. No FreeBSD version bump, the userland application to query the features will be committed last and can serve as an indication of the availablility if needed. Sponsored by: Google Summer of Code 2010 Submitted by: kibab Reviewed by: kib X-MFC after: to be determined in last commit with code from this project	2011-02-09 15:33:13 +00:00
Jeff Roberson	8ef48de888	- Call softdep_prealloc() before any of the balloc routines in the snapshot code. - Don't fsync() vnodes in prealloc if copy on write is in progress. It is not safe to recurse back into the write path here. Reported by: Vladimir Grebenschikov <vova@fbsd.ru>	2010-05-07 08:45:21 +00:00
Jeff Roberson	113db2dddb	- Merge soft-updates journaling from projects/suj/head into head. This brings in support for an optional intent log which eliminates the need for background fsck on unclean shutdown. Sponsored by: iXsystems, Yahoo!, and Juniper. With help from: McKusick and Peter Holm	2010-04-24 07:05:35 +00:00
Martin Blapp	c2ede4b379	Remove extraneous semicolons, no functional changes. Submitted by: Marc Balmer <marc@msys.ch> MFC after: 1 week	2010-01-07 21:01:37 +00:00
Robert Watson	885868cd8f	Remove VOP_LEASE and supporting functions. This hasn't been used since the removal of NQNFS, but was left in in case it was required for NFSv4. Since our new NFSv4 client and server can't use it for their requirements, GC the old mechanism, as well as other unused lease- related code and interfaces. Due to its impact on kernel programming and binary interfaces, this change should not be MFC'd. Proposed by: jeff Reviewed by: jeff Discussed with: rmacklem, zach loafman @ isilon	2009-04-10 10:52:19 +00:00
John Baldwin	5bd65606f4	Adjust some variables (mostly related to the buffer cache) that hold address space sizes to be longs instead of ints. Specifically, the follow values are now longs: runningbufspace, bufspace, maxbufspace, bufmallocspace, maxbufmallocspace, lobufspace, hibufspace, lorunningspace, hirunningspace, maxswzone, maxbcache, and maxpipekva. Previously, a relatively small number (~ 44000) of buffers set in kern.nbuf would result in integer overflows resulting either in hangs or bogus values of hidirtybuffers and lodirtybuffers. Now one has to overflow a long to see such problems. There was a check for a nbuf setting that would cause overflows in the auto-tuning of nbuf. I've changed it to always check and cap nbuf but warn if a user-supplied tunable would cause overflow. Note that this changes the ABI of several sysctls that are used by things like top(1), etc., so any MFC would probably require a some gross shims to allow for that. MFC after: 1 month	2009-03-09 19:35:20 +00:00
Doug Ambrisko	f1c1cdbb9c	For now on every 10 cyclinder groups flush the buffer cache to free up space. If the buffer cache fills up then the disk systems can grind to a halt. Better tuning can be figured out later. Tested by: Tim, others and work Reviewed by: Kostik Belousov PR: 128832	2008-11-13 17:40:21 +00:00
Dag-Erling Smørgrav	1ede983cc9	Retire the MALLOC and FREE macros. They are an abomination unto style(9). MFC after: 3 months	2008-10-23 15:53:51 +00:00
Konstantin Belousov	4560452f01	Sync up summary information for cylinder groups while data is already in memory during snapshot creation. This improves the results of the background fsck. Submitted by: tegge MFC after: 1 week	2008-10-13 14:05:01 +00:00
Konstantin Belousov	2814d5ba5f	When attempt is made to suspend a filesystem that is already syspended, wait until the current suspension is lifted instead of silently returning success immediately. The consequences of calling vfs_write() resume when not owning the suspension are not well-defined at best. Add the vfs_susp_clean() mount method to be called from vfs_write_resume(). Set it to process_deferred_inactive() for ffs, and stop calling it manually. Add the thread flag TDP_IGNSUSP that allows to bypass the suspension point in the vn_start_write. It is intended for use by VFS in the situations where the suspender want to do some i/o requiring calls to vn_start_write(), and this i/o cannot be done later. Reviewed by: tegge In collaboration with: pho MFC after: 1 month	2008-09-16 11:51:06 +00:00
Attilio Rao	0359a12ead	Decontextualize the couplet VOP_GETATTR / VOP_SETATTR as the passed thread was always curthread and totally unuseful. Tested by: Giovanni Trematerra <giovanni dot trematerra at gmail dot com>	2008-08-28 15:23:18 +00:00
Konstantin Belousov	57b4252e45	Add the support for the AT_FDCWD and fd-relative name lookups to the namei(9). Based on the submission by rdivacky, sponsored by Google Summer of Code 2007 Reviewed by: rwatson, rdivacky Tested by: pho	2008-03-31 12:01:21 +00:00
Jeff Roberson	9c0cdb8253	- Don't free snapdata structures when they are no longer in use. Keeping the lockmgr lock valid allows us to switch the v_lock pointer in snapshot vnodes between the embedded lockmgr lock and snapdata lock without needing the vnode interlock to protect against races - Keep unused snapdata structures in a list. - Add a function to lock the devvp and allocate a snapdata to it or acquire a new one without races. The old function was safe from creation races because we set the mount flag when creating snapshots and thus serializing them. However, it might have been subject to destroying races. Reviewed by: tegge	2008-03-31 07:47:08 +00:00
Jeff Roberson	374ae2a393	- Relax requirements for p_numthreads, p_threads, p_swtick, and p_nice from requiring the per-process spinlock to only requiring the process lock. - Reflect these changes in the proc.h documentation and consumers throughout the kernel. This is a substantial reduction in locking cost for these fields and was made possible by recent changes to threading support.	2008-03-19 06:19:01 +00:00
Attilio Rao	0e9eb108f0	Cleanup lockmgr interface and exported KPI: - Remove the "thread" argument from the lockmgr() function as it is always curthread now - Axe lockcount() function as it is no longer used - Axe LOCKMGR_ASSERT() as it is bogus really and no currently used. Hopefully this will be soonly replaced by something suitable for it. - Remove the prototype for dumplockinfo() as the function is no longer present Addictionally: - Introduce a KASSERT() in lockstatus() in order to let it accept only curthread or NULL as they should only be passed - Do a little bit of style(9) cleanup on lockmgr.h KPI results heavilly broken by this change, so manpages and FreeBSD_version will be modified accordingly by further commits. Tested by: matteo	2008-01-24 12:34:30 +00:00
Attilio Rao	22db15c06f	VOP_LOCK1() (and so VOP_LOCK()) and VOP_UNLOCK() are only used in conjuction with 'thread' argument passing which is always curthread. Remove the unuseful extra-argument and pass explicitly curthread to lower layer functions, when necessary. KPI results broken by this change, which should affect several ports, so version bumping and manpage update will be further committed. Tested by: kris, pho, Diego Sardina <siarodx at gmail dot com>	2008-01-13 14:44:15 +00:00
Attilio Rao	cb05b60a89	vn_lock() is currently only used with the 'curthread' passed as argument. Remove this argument and pass curthread directly to underlying VOP_LOCK1() VFS method. This modify makes the code cleaner and in particular remove an annoying dependence helping next lockmgr() cleanup. KPI results, obviously, changed. Manpage and FreeBSD_version will be updated through further commits. As a side note, would be valuable to say that next commits will address a similar cleanup about VFS methods, in particular vop_lock1 and vop_unlock. Tested by: Diego Sardina <siarodx at gmail dot com>, Andrea Di Pasquale <whyx dot it at gmail dot com>	2008-01-10 01:10:58 +00:00
David E. O'Brien	1102b89baa	Turn most ffs 'DIAGNOSTIC's into INVARIANTS.	2007-11-08 17:21:51 +00:00
Jeff Roberson	982d11f836	Commit 14/14 of sched_lock decomposition. - Use thread_lock() rather than sched_lock for per-thread scheduling sychronization. - Use the per-process spinlock rather than the sched_lock for per-process scheduling synchronization. Tested by: kris, current@ Tested on: i386, amd64, ULE, 4BSD, libthr, libkse, PREEMPTION, etc. Discussed with: kris, attilio, kmacy, jhb, julian, bde (small parts each)	2007-06-05 00:00:57 +00:00
Konstantin Belousov	5b959aa44f	Fix the NAMEI zone leak when snapshot was successfully created. Reported and tested by: Peter Holm MFC after: 2 weeks	2007-04-10 09:31:42 +00:00
Xin LI	04533fc68e	Use *_EMPTY macros when appropriate.	2007-04-04 07:29:53 +00:00
Konstantin Belousov	2cc7d26f7f	Cylinder group bitmaps and blocks containing inode for a snapshot file are after snaplock, while other ffs device buffers are before snaplock in global lock order. By itself, this could cause deadlock when bdwrite() tries to flush dirty buffers on snapshotted ffs. If, during the flush, COW activity for snapshot needs to allocate block and ffs_alloccg() selects the cylinder group that is being written by bdwrite(), then kernel would panic due to recursive buffer lock acquision. Avoid dealing with buffers in bdwrite() that are from other side of snaplock divisor in the lock order then the buffer being written. Add new BOP, bop_bdwrite(), to do dirty buffer flushing for same vnode in the bdwrite(). Default implementation, bufbdflush(), refactors the code from bdwrite(). For ffs device buffers, specialized implementation is used. Reviewed by: tegge, jeff, Russell Cattelan (cattelan xfs org, xfs changes) Tested by: Peter Holm X-MFC after: 3 weeks (if ever: it changes ABI)	2007-01-23 10:01:19 +00:00
Mike Pritchard	db9b81eabc	Quota system cleanup. 1) Do not do quota accounting for the actual quota data files or for file system snapshot files ("system" files). This prevents a deadlock descibed in PR kern/30958 if the kernel ever has to grow the quota file. Snapshot files were already exempt from the quota checks, but this change generalized the check. 2) Fix a cast that caused extremely large uids/gids to incorrectly write the quota information to the data file at a truncated value for a uint_t32 id value. The incorrect cast caused quota files in this case to be around 4GB in size, with the correct cast they can now be 131GB in size. Also related to PR kern/30958. 3) Check for what appear to be negative UIDs/GIDs and not account for them. This prevents the quota files from becoming 131GB in size and causing quotacheck to run forever at bootup. This could also cause the kernel to try and expand the quota file, which might deadlock due to the issue in #1. kern/30958 and kern/38156 (and some much older closed PR's). 4) With the deadlock problems gone, the kernel can now expand the size of the quota database files if it needs to. 5) Pass in the i-node count change value to chkiq and chkiqchg as an int, like it used to be before the common routine was split up into 2 different routines to increase / decrease the i-node in-use count. Prevents an underflow on the i-node count. Related to PR kern/89247. 6) Prevent the block usage from growing slowly if a file system is full and the write was denied due to that fact. PR kern/89247. Some of these changes require an updated quotacheck to prevent the creation of huge (131GB) quota data files (item #3). #1/#4 probably fixes a lot of the random hangs when quotas are enabled, possibly some of the jail hangs.	2007-01-20 11:58:32 +00:00
Konstantin Belousov	ec7a247a24	Do not translate the IN_ACCESS inode flag into the IN_MODIFIED while filesystem is suspending/suspended. Doing so may result in deadlock. Instead, set the (new) IN_LAZYACCESS flag, that becomes IN_MODIFIED when suspend is lifted. Change the locking protocol in order to set the IN_ACCESS and timestamps without upgrading shared vnode lock to exclusive (see comments in the inode.h). Before that, inode was modified while holding only shared lock. Tested by: Peter Holm Reviewed by: tegge, bde Approved by: pjd (mentor) MFC after: 3 weeks	2006-10-10 09:20:54 +00:00
Tor Egge	9b65c22cf4	Don't restore MNT_QUOTA bit in mnt_flag after snapshot creation, closing a race between nmount() and quotactl().	2006-09-26 04:19:11 +00:00
Tor Egge	5da56ddb21	Use mount interlock to protect all changes to mnt_flag and mnt_kern_flag. This eliminates a race where MNT_UPDATE flag could be lost when nmount() raced against sync(), sync_fsync() or quotactl().	2006-09-26 04:12:49 +00:00
Konstantin Belousov	3f65847e2f	While checking for update of snapshot file in the ffs_copyonwrite, first filter out metadata update. Otherwise, devfs vnode could be erronously interpreted as ufs one, causing further check of i_flags to use random memory. PR: kern/100365 Debugged and fix described by: tegge Approved by: pjd (mentor) MFC after: 2 weeks	2006-08-21 17:20:19 +00:00
Tor Egge	e0cf717542	Read block hints list from last snapshot on the active snapshot list.	2006-05-16 00:14:20 +00:00
Tor Egge	d93d98d98f	Copy last block on file system again after file system has been suspended. Obtained from: NetBSD	2006-05-15 23:18:49 +00:00

1 2 3 4

174 Commits