freebsd-dev

Author	SHA1	Message	Date
Jeff Roberson	22a722605d	- Convert the bufobj lock to rwlock. - Use a shared bufobj lock in getblk() and inmem(). - Convert softdep's lk to rwlock to match the bufobj lock. - Move INFREECNT to b_flags and protect it with the buf lock. - Remove unnecessary locking around bremfree() and BKGRDINPROG. Sponsored by: EMC / Isilon Storage Division Discussed with: mckusick, kib, mdf	2013-05-31 00:43:41 +00:00
Kirk McKusick	97371fa56d	Properly spell sentinel (missed in 250891) No functional changes. Spotted by: Navdeep Parhar and Alexey Dokuchaev MFC after: 2 weeks	2013-05-22 05:07:55 +00:00
Kirk McKusick	b1bd9340fa	Add missing buffer releases (brelse) after bread calls that return an error. One could argue that returning a buffer even when it is not valid is incorrect, but bread has always returned a buffer valid or not. Reviewed by: kib MFC after: 2 weeks	2013-05-22 00:57:22 +00:00
Kirk McKusick	21844a3d5d	Add missing 28th element to softdep types name array. Found by: Coverity Scan, CID 1007621 Reviewed by: kib MFC after: 2 weeks	2013-05-22 00:48:24 +00:00
Kirk McKusick	d80dbbdb4a	Null a pointer after it is freed so that when it is returned the return value is NULL. Based on the returned flags, the return value should never be inspected in the case where NULL is returned, but it is good coding practice not to return a pointer to freed memory. Found by: Coverity Scan, CID 1006096 Reviewed by: kib MFC after: 2 weeks	2013-05-22 00:40:26 +00:00
Kirk McKusick	64e2b0887c	Remove a bogus check for a NULL buffer pointer. Add a KASSERT that it is not NULL. Found by: Coverity Scan, CID 1009114 Reviewed by: kib MFC after: 2 weeks	2013-05-22 00:30:34 +00:00
Kirk McKusick	13e369a747	Properly spell sentinel (not sintenel or sentinal). No functional changes. Spotted by: kib MFC after: 2 weeks	2013-05-22 00:17:50 +00:00
Jeff Roberson	26089666b6	Prepare to replace the buf splay with a trie: - Don't insert BKGRDMARKER bufs into the splay or dirty/clean buf lists. No consumers need to find them there and it complicates the tree. These flags are all FFS specific and could be moved out of the buf cache. - Use pbgetvp() and pbrelvp() to associate the background and journal bufs with the vp. Not only is this much cheaper it makes more sense for these transient bufs. - Fix the assertions in pbget* and pbrel*. It's not safe to check list pointers which were never initialized. Use the BX flags instead. We also check B_PAGING in reassignbuf() so this should cover all cases. Discussed with: kib, mckusick, attilio Sponsored by: EMC / Isilon Storage Division	2013-04-06 22:21:23 +00:00
Kirk McKusick	cd861931e6	The code in clear_remove() and clear_inodedeps() skips one entry in the pagedep and inodedep hash tables. An entry in the table is skipped because 'pagedep_hash' and 'inodedep_hash' hold the size of the hash tables - 1. An operational failure from this is extremely unlikely. These functions only need to find a single entry and are only called when there are too many entries. The chance that they would fail because all the entries are on the single skipped hash chain is remote. Submitted by: Pedro Martelletto Reviewed by: kib MFC after: 2 weeks	2013-04-03 19:26:32 +00:00
Konstantin Belousov	ba05dec5a4	The softdep freeblks workitem might hold a reference on the dquot. Current dqflush() panics when a dquot with non-zero refcount is encountered. The situation is possible because quotas are turned off before the softdep workitem queue is flushed, since the quota file writes might create softdep workitems. Make encountering an active dquot in dqflush() not fatal, and return the error from quotaoff() instead. Ignore the quotaoff() failures when ffs_flushfiles() is called in the course of the softdep_flushfiles() loop, until the last iteration. At the last iteration, the quotas must be closed, and because SU workitems should already be flushed, the references to dquot are gone. Sponsored by: The FreeBSD Foundation Reported and tested by: pho Reviewed by: mckusick MFC after: 2 weeks	2013-02-27 07:32:39 +00:00
Konstantin Belousov	ddd6b3fc33	Add flags argument to vfs_write_resume() and remove vfs_write_resume_flags(). Sponsored by: The FreeBSD Foundation	2013-01-11 06:08:32 +00:00
Attilio Rao	b1308d72c2	Fixup r218424: uio_yield() was scaling directly to userland priority. When kern_yield() was introduced with the possibility to specify a new priority, the behaviour changed by not lowering priority at all in the consumers, making the yielding mechanism highly ineffective for high priority kthreads like bufdaemon, syncer, vlrudaemon, etc. There is no evidence that consumers could bear such a change in semantics, and this situation could finally lead to bugs similar to the ones fixed in r244240. Re-specify userland pri for kthreads involved. Tested by: pho Reviewed by: kib, mdf MFC after: 1 week	2012-12-21 13:14:12 +00:00
Jeff Roberson	ad9cdc05ba	- Fix a truncation bug with softdep journaling that could leak blocks on crash. When truncating a file that never made it to disk we use the canceled allocation dependencies to hold the journal records until the truncation completes. Previously allocdirect dependencies on the id_bufwait list were not considered and their journal space could expire before the bitmaps were written. Cancel them and attach them to the freeblks as we do for other allocdirects. - Add KTR traces that were used to debug this problem. - When adding jsegdeps, always use jwork_insert() so we don't have more than one segdep on a given jwork list. Sponsored by: EMC / Isilon Storage Division	2012-11-14 06:37:43 +00:00
Jeff Roberson	b2c29d39cd	- Fix a bug that has existed since the original softdep implementation. When a background copy of a cg is written we complete any work associated with that bmsafemap. If new work has been added to the non-background copy of the buffer it will be completed before the next write happens. The solution is to do the rollbacks when we make the copy so only those dependencies that were present at the time of writing will be completed when the background write completes. This would've resulted in various bitmap related corruptions and panics. It also would've expired journal entries early causing journal replay to miss some records. MFC after: 2 weeks	2012-11-12 19:53:55 +00:00
Jeff Roberson	53cc0bebb9	- Correct rev 242734, segments can sometimes get stuck. Be a bit more defensive with segment state. Reported by: b. f. <bf1783@googlemail.com>	2012-11-09 04:04:25 +00:00
Jeff Roberson	40b43503c0	- Implement BIO_FLUSH support around journal entries. This will not 100% solve power loss problems with dishonest write caches. However, it should improve the situation and force a full fsck when it is unable to resolve with the journal. - Resolve a case where the journal could wrap in an unsafe way causing us to prematurely lose journal entries in very specific scenarios. Discussed with: mckusick MFC after: 1 month	2012-11-08 01:41:04 +00:00
Jeff Roberson	6d95eb4c5f	- In cancel_mkdir_dotdot don't panic if the inodedep is not available. If the previous diradd had already finished it could have been reclaimed already. This would only happen under heavy dependency pressure. Reported by: Andrey Zonov <zont@FreeBSD.org> Discussed with: mckusick MFC after: 1 week	2012-11-02 21:04:06 +00:00
Edward Tomasz Napierala	f1988d463c	Fix two problems that caused instant panic when the device mounted with softupdates went away. Note that this does not fix the problem entirely; I'm committing it now to make it easier for someone to pick up the work. Reviewed by: mckusick	2012-10-28 18:53:28 +00:00
Konstantin Belousov	5050aa86cf	Remove the support for using non-mpsafe filesystem modules. In particular, do not lock Giant conditionally when calling into the filesystem module, remove the VFS_LOCK_GIANT() and related macros. Stop handling buffers belonging to non-mpsafe filesystems. The VFS_VERSION is bumped to indicate the interface change which does not result in the interface signatures changes. Conducted and reviewed by: attilio Tested by: pho	2012-10-22 17:50:54 +00:00
Matthew D Fleming	fc8fdae0df	Fix up kernel sources to be ready for a 64-bit ino_t. Original code by: Gleb Kurtsou	2012-09-27 23:30:49 +00:00
Kirk McKusick	aa445c9d7c	In softdep_setup_inomapdep() we may have to allocate both inodedep and bmsafemap dependency structures in inodedep_lookup() and bmsafemap_lookup() respectively. The setup of these structures must be done while holding the soft-dependency mutex. If the inodedep is allocated first, it may be freed in the I/O completion callback when the mutex is released to allocate the bmsafemap. If the bmsafemap is allocated first, it may be freed in the I/O completion callback when the mutex is released to allocate the inodedep. To resolve this problem, bmsafemap_lookup has had a parameter added that allows a pre-malloc'ed bmsafemap to be passed in so that it does not need to release the mutex to create a new bmsafemap. The softdep_setup_inomapdep() routine pre-malloc's a bmsafemap dependency before acquiring the mutex and starting to build the inodedep with a call to inodedep_lookup(). The subsequent call to bmsafemap_lookup() is passed this pre-allocated bmsafemap entry so that it need not release the mutex if it needs to create a new one. Reported by: Peter Holm Tested by: Peter Holm MFC after: 1 week	2012-06-11 23:07:21 +00:00
Kirk McKusick	8b6207110d	Add missing `continue' statement at end of case. Found by: Kevin Lo (kevlo@) MFC after: 1 week	2012-05-18 15:20:21 +00:00
Edward Tomasz Napierala	26621e1f06	Remove unused thread argument from clear_inodedeps() and clear_remove().	2012-04-23 14:44:18 +00:00
Kirk McKusick	71469bb38f	Replace the MNT_VNODE_FOREACH interface with MNT_VNODE_FOREACH_ALL. The primary changes are that the user of the interface no longer needs to manage the mount-mutex locking and that the vnode that is returned has its mutex locked (thus avoiding the need to check to see if it is DOOMED or other possible end of life scenarios). To minimize compatibility issues for third-party developers, the old MNT_VNODE_FOREACH interface will remain available so that this change can be MFC'ed to 9. Following the MFC to 9, MNT_VNODE_FOREACH will be removed in head. The reason for this update is to prepare for the addition of the MNT_VNODE_FOREACH_ACTIVE interface that will loop over just the active vnodes associated with a mount point (typically less than 1% of the vnodes associated with the mount point). Reviewed by: kib Tested by: Peter Holm MFC after: 2 weeks	2012-04-17 16:28:22 +00:00
Kirk McKusick	23d6e518da	A file cannot be deallocated until its last name has been removed and it is no longer referenced by a user process. The inode for a file whose name has been removed, but is still referenced at the time of a crash will still be allocated in the filesystem, but will have no references (e.g., they will have no names referencing them from any directory). With traditional soft updates these unreferenced inodes will be found and reclaimed when the background fsck is run. When using journaled soft updates, the kernel must keep track of these inodes so that it can find and reclaim them during the cleanup process. Their existence cannot be stored in the journal as the journal only handles short-term events, and they may persist for days. So, they are tracked by keeping them in a linked list whose head pointer is stored in the superblock. The journal tracks them only until their linked list pointers have been committed to disk. Part of the cleanup process involves traversing the list of unreferenced inodes and reclaiming them. This bug was triggered when confusion arose in the commit steps of keeping the unreferenced-inode linked list coherent on disk. Notably, a race between the link() system call adding a link-count to a file and the unlink() system call removing a link-count from the file. Here if the unlink() ran after link() had looked up the file but before link() had incremented the link-count of the file, the file's link-count would drop to zero before the link() incremented it back up to one. If the file was referenced by a user process, the first transition through zero made it appear that it should be added to the unreferenced-inode list when in fact it should not have been added. If the new name created by link() was deleted within a few seconds (with the file still referenced by a user process) it would legitimately be a candidate for addition to the unreferenced-inode list.
The result was that there were two attempts to add the same inode to the unreferenced-inode list which scrambled the unreferenced-inode list's pointers leading to a panic. The fix is to detect and avoid the false attempt at adding it to the unreferenced-inode list by having the link() system call check to see if the link count is zero before it increments it. If it is, the link() fails with ENOENT (showing that it has failed the link()/unlink() race). While tracking down this bug, we have added additional assertions to detect the problem sooner and also simplified some of the code. Reported by: Kirk Russell Fix submitted by: Jeff Roberson Tested by: Peter Holm PR: kern/159971 MFC (to 9 only): 2 weeks	2012-04-02 21:58:37 +00:00
Kirk McKusick	75a5838904	Add a third flags argument to ffs_syncvnode to avoid a possible conflict with MNT_WAIT flags that are passed in its second argument. This will be MFC'ed together with r232351. Discussed with: kib	2012-03-25 00:02:37 +00:00
Konstantin Belousov	064f517d2b	Supply boolean as the second argument to ffs_update(), and not a MNT_[NO]WAIT constant, which in fact always caused sync operation. Based on the submission by: bde Reviewed by: mckusick MFC after: 2 weeks	2012-03-13 22:04:27 +00:00
Konstantin Belousov	38ddb5725b	Decommission mnt_noasync. Introduce MNTK_NOASYNC mnt_kern_flag which allows a filesystem to request VFS to not allow MNTK_ASYNC. MFC after: 1 week	2012-03-09 00:12:05 +00:00
Kirk McKusick	35338e6091	This change avoids a kernel deadlock on "snaplk" when using snapshots on UFS filesystems running with journaled soft updates. This is the first of several bugs that need to be fixed before removing the restriction added in -r230250 to prevent the use of snapshots on filesystems running with journaled soft updates. The deadlock occurs when holding the snapshot lock (snaplk) and then trying to flush an inode via ffs_update(). We become blocked by another process trying to flush a different inode contained in the same inode block that we need. It holds the inode block for which we are waiting locked. When it tries to write the inode block, it gets blocked waiting for our snaplk when it calls ffs_copyonwrite() to see if the inode block needs to be copied in our snapshot. The most obvious place that this deadlock arises is in the ffs_copyonwrite() routine when it updates critical metadata in a snapshot and tries to write it out before proceeding. The fix here is to write the data and indirect block pointer for the snapshot, but to skip the call to ffs_update() to write the snapshot inode. To ensure that we will never have to update a pointer in the inode itself, the ffs_snapshot() routine that creates the snapshot has to ensure that all the direct blocks are allocated as part of the creation of the snapshot. A less obvious place that this deadlock occurs is when we hold the snaplk because we are deleting a snapshot. In the course of doing the deletion, we need to allocate various soft update dependency structures and allocate some journal space. If we hit a resource limit while doing this we decrease the resources in use by flushing out an existing dirty file to get it to give up the soft dependency resources that it holds.
The flush can cause an ffs_update() to be done on the inode for the file that we have selected to flush resulting in the same deadlock as described above when the inode that we have chosen to flush resides in the same inode block as the snapshot inode that we hold. The fix is to defer cleaning up any time that the inode on which we are operating is a snapshot. Help and review by: Jeff Roberson Tested by: Peter Holm MFC (to 9 only) after: 2 weeks	2012-03-01 18:45:25 +00:00
Kirk McKusick	e8e848ef8e	Missing conditions in checking whether an inode has been written. Found and tested by: Peter Holm MFC after: 2 weeks (to 9 only)	2012-02-13 01:33:39 +00:00
Konstantin Belousov	752a98b13e	Add missing opt_quota.h include to activate #ifdef QUOTA blocks, apparently a step in unbreaking QUOTA support. Reported and tested by: Adam Strohl <adams-freebsd ateamsystems com> MFC after: 1 week	2012-02-06 17:59:14 +00:00
Konstantin Belousov	b313a71044	JNEWBLK dependency may legitimately appear on the buf dependency list. If softdep_sync_buf() discovers such dependency, it should do nothing, which is safe as it is only waiting on the parent buffer to be written, so it can be removed. Committed on behalf of: jeff MFC after: 1 week	2012-02-06 11:47:24 +00:00
Ed Schouten	6472ac3d8a	Mark all SYSCTL_NODEs static that have no corresponding SYSCTL_DECLs. The SYSCTL_NODE macro defines a list that stores all child-elements of that node. If there's no SYSCTL_DECL macro anywhere else, there's no reason why it shouldn't be static.	2011-11-07 15:43:11 +00:00
Konstantin Belousov	b296414c62	Use nowait sync request for a vnode when doing softdep cleanup. We possibly own the unrelated vnode lock, doing waiting sync causes deadlocks. Reported and tested by: pho Approved by: re (bz)	2011-09-20 21:53:26 +00:00
Martin Matuska	82378711f9	Generalize ffs_pages_remove() into vn_pages_remove(). Remove mapped pages for all dataset vnodes in zfs_rezget() using new vn_pages_remove() to fix mmapped files changed by zfs rollback or zfs receive -F. PR: kern/160035, kern/156933 Reviewed by: kib, pjd Approved by: re (kib) MFC after: 1 week	2011-08-25 08:17:39 +00:00
Kirk McKusick	fddf7baebe	Update to -r224294 to ensure that only one of MNT_SUJ or MNT_SOFTDEP is set so that mount can revert back to using MNT_NOWAIT when doing getmntinfo. Approved by: re (kib)	2011-07-30 00:43:18 +00:00
Kirk McKusick	d716efa9f7	Move the MNTK_SUJ flag in mnt_kern_flag to MNT_SUJ in mnt_flag so that it is visible to userland programs. This change enables the `mount' command with no arguments to be able to show if a filesystem is mounted using journaled soft updates as opposed to just normal soft updates. Approved by: re (bz)	2011-07-24 18:27:09 +00:00
Kirk McKusick	b8ea56d7e4	Consistently check mount flag (MNTK_SUJ) rather than superblock flag (FS_SUJ) when determining whether to do journaling-based operations. The mount flag is set only when journaling is active while the superblock flag is set to indicate that journaling is to be used. For example, when the filesystem is mounted read-only, the journaling may be present (FS_SUJ) but not active (MNTK_SUJ). Inappropriate checking of the FS_SUJ flag was causing some journaling actions to be attempted at inappropriate times.	2011-07-14 18:06:13 +00:00
Jeff Roberson	e9b4d8327f	- Speed up pendingblock processing again. Having too much delay between ffs_blkfree() and the pending adjustment causes all kinds of space related problems.	2011-07-04 22:08:04 +00:00
Jeff Roberson	f2803e61fa	- Handle D_JSEGDEP in the softdep_sync_buf() switch. These can now find themselves on snapshot vnodes. Reported by: pho	2011-07-04 21:04:25 +00:00
Jeff Roberson	8e4f5b70b0	- It is impossible to run request_cleanup() while doing a copyonwrite. This will most likely cause new block allocations which can recurse into request cleanup. - While here optimize the ufs locking slightly. We need only acquire and drop once. - process_removes() and process_truncates() are also only needed once. - Attempt to flush each item on the worklist once but do not loop forever if some can not be completed. Discussed with: mckusick	2011-07-04 20:53:55 +00:00
Kirk McKusick	08af0c8b8d	Handle the FREEDEP case in softdep_sync_buf(). This fix failed to get added in -r223325. Submitted by: Peter Holm	2011-06-29 22:12:43 +00:00
Jeff Roberson	16f7d82285	- Fix directory count rollbacks by passing the mode to the journal dep earlier. - Add rollback/forward code for frag and cluster accounting. - Handle the FREEDEP case in softdep_sync_buf(). (submitted by pho)	2011-06-20 03:25:09 +00:00
Kirk McKusick	43a3cc7796	Ensure that filesystem metadata contained within persistent snapshots is always kept consistent. Suggested by: Jeff Roberson	2011-06-15 23:19:09 +00:00
Kirk McKusick	e34a713594	Missing cleanup case after completion of a snapshot vnode write claiming a released block. Submitted by: Jeff Roberson Tested by: Peter Holm	2011-06-15 06:13:08 +00:00
Kirk McKusick	9eb8728aa5	Update to soft updates journaling to properly track freed blocks that get claimed by snapshots. Submitted by: Jeff Roberson Tested by: Peter Holm	2011-06-12 19:27:05 +00:00
Kirk McKusick	9420dc62cd	Disable the soft updates journaling after a filesystem is successfully downgraded to read-only. It will be restarted if the filesystem is upgraded back to read-write.	2011-06-12 18:46:48 +00:00
Jeff Roberson	280e091a99	Implement fully asynchronous partial truncation with softupdates journaling to resolve errors which can cause corruption on recovery with the old synchronous mechanism. - Append partial truncation freework structures to indirdeps while truncation is proceeding. These prevent new block pointers from becoming valid until truncation completes and serialize truncations. - On completion of a partial truncate journal work waits for zeroed pointers to hit indirects. - softdep_journal_freeblocks() handles last frag allocation and last block zeroing. - vtruncbuf/ffs_page_remove moved into softdep_*_freeblocks() so it is only implemented in one place. - Block allocation failure handling moved up one level so it does not proceed with buf locks held. This permits us to do more extensive reclaims when filesystem space is exhausted. - softdep_sync_metadata() is broken into two parts, the first executes once at the start of ffs_syncvnode() and flushes truncations and inode dependencies. The second is called on each locked buf. This eliminates excessive looping and rollbacks. - Improve the mechanism in process_worklist_item() that handles acquiring vnode locks for handle_workitem_remove() so that it works more generally and does not loop excessively over the same worklist items on each call. - Don't corrupt directories by zeroing the tail in fsck. This is only done for regular files. - Push a fsync complete record for files that need it so the checker knows a truncation in the journal is no longer valid. Discussed with: mckusick, kib (ffs_pages_remove and ffs_truncate parts) Tested by: pho	2011-06-10 22:48:35 +00:00
Matthew D Fleming	3d08a76bbc	Use a name instead of a magic number for kern_yield(9) when the priority should not change. Fetch the td_user_pri under the thread lock. This is probably not necessary but a magic number also seems preferable to knowing the implementation details here. Requested by: Jason Behmer < jason DOT behmer AT isilon DOT com >	2011-05-13 05:27:58 +00:00
Jeff Roberson	273ca85137	- Refactor softdep_setup_freeblocks() into a set of functions to prepare for a new journal specific partial truncate routine. - Use dep_current[] in place of specific dependency counts. This is automatically maintained when workitems are allocated and has less risk of becoming incorrect.	2011-04-11 01:43:59 +00:00
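
The off-by-one described in cd861931e6 (a bound variable that holds the table size - 1 being used as if it were the size) can be illustrated with a minimal sketch. This is not the FreeBSD code: TOY_TABLE_SIZE, toy_walk_exclusive(), toy_walk_inclusive() and toy_check() are hypothetical names invented here, and the loops only count buckets rather than inspect dependency structures.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical power-of-two table size, chosen only for illustration. */
#define TOY_TABLE_SIZE 8UL

/*
 * Walk a table whose bound variable holds (size - 1), as the commit
 * message describes for 'pagedep_hash' and 'inodedep_hash'.
 * Treating the bound as a size stops one bucket early.
 */
static size_t
toy_walk_exclusive(unsigned long hash_mask)
{
	size_t visited = 0;

	for (unsigned long i = 0; i < hash_mask; i++)
		visited++;
	return (visited);
}

/* Treating the bound as the last valid index visits every bucket. */
static size_t
toy_walk_inclusive(unsigned long hash_mask)
{
	size_t visited = 0;

	for (unsigned long i = 0; i <= hash_mask; i++)
		visited++;
	return (visited);
}

/* Self-check: the exclusive walk misses exactly one bucket. */
static void
toy_check(void)
{
	unsigned long mask = TOY_TABLE_SIZE - 1;

	assert(toy_walk_exclusive(mask) == TOY_TABLE_SIZE - 1);
	assert(toy_walk_inclusive(mask) == TOY_TABLE_SIZE);
}
```

Calling toy_check() from a main() exercises both walks. As the commit message notes, the skipped bucket matters little in practice, because the callers only need to find a single entry.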


321 Commits