freebsd-skq

Author	SHA1	Message	Date
kib	dadf5cd065	The softdep_setup_freeblocks() adds worklist items before deallocate_dependencies() is done. This opens a race between softdep thread and the thread that does the truncation: A write of the indirect block causes the freeblks to become ALLCOMPLETE while softdep_setup_freeblocks() dropped softdep lock. And then, softdep_disk_write_complete() would reassign the workitem to the mount point worklist, causing premature processing of the workitem, or journal write exhaust the fb_jfreeblkhd and handle_written_jfreeblk does the same reassign. indir_trunc() then would find the indirect block that is locked (with lock owned by kernel) but without any dependencies, causing it to hang in getblk() waiting for buffer lock. Do not mark freeblks as DEPCOMPLETE until deallocate_dependencies() finished. Analyzed, suggested and reviewed by: jeff Tested by: pho	2010-11-11 11:54:01 +00:00
kib	ac0984a653	Change #ifdef INVARIANTS panic into KASSERT, and print some useful information to diagnose the issue, in handle_complete_freeblocks(). Reviewed by: jeff Tested by: pho	2010-11-11 11:41:52 +00:00
kib	90d9ba4d7c	In journal_mount(), only set MNTK_SUJ flag after the jblocks are mapped. I believe there is a window otherwise where jblocks can be accessed without proper initialization. Reviewed by: jeff Tested by: pho	2010-11-11 11:38:57 +00:00
kib	a16cecf311	Add function lbn_offset to calculate offset of the indirect block of given level. Reviewed by: jeff Tested by: pho	2010-11-11 11:35:42 +00:00
kib	57488f7bec	Fix typo. Function is called ffs_blkfree.	2010-11-11 11:26:59 +00:00
jhb	c016e5df49	Remove unused includes of <sys/mutex.h> and <machine/mutex.h>.	2010-11-09 20:41:10 +00:00
ivoras	4dfcef74e1	Bring vfs.ufs.dirhash_maxmem into the age of the fruitbat and make it autotuned. It is only an upper bound (the memory is not always allocated) and the system contains a vm_lowmem handler so nothing will crash and burn if it's tuned too high. Reviewed by: mckusick	2010-10-25 21:46:23 +00:00
kib	4036cd070d	The r184588 changed the layout of struct export_args, causing an ABI breakage for old mount(2) syscall, since most struct <filesystem>_args embed export_args. The mount(2) is supposed to provide ABI compatibility for pre-nmount mount(8) binaries, so restore ABI to pre-r184588. Requested and reviewed by: bde MFC after: 2 weeks	2010-10-10 07:05:47 +00:00
alc	e9dc33bfce	M_USE_RESERVE has been deprecated for a decade. Eliminate any uses that have no run-time effect.	2010-10-02 17:58:57 +00:00
mckusick	bca797c285	Since local variable 'i' is used only in a KASSERT, declare and initialize it only if INVARIANTS is defined to avoid a declared but unused warning. Suggested by: Brian Somers <brian@FreeBSD.org>	2010-09-29 14:46:57 +00:00
kib	bcef52d3bf	Fix typo in comment.	2010-09-29 07:40:11 +00:00
obrien	55122bfc2d	Correct some non-code typos.	2010-09-17 09:14:40 +00:00
mckusick	dd70ac636a	Update comments in soft updates code to more fully describe the addition of journalling. Only functional change is to tighten a KASSERT. Reviewed by: jeff Roberson	2010-09-14 18:04:05 +00:00
jhb	d4890c88b0	Add dedicated routines to toggle lockmgr flags such as LK_NOSHARE and LK_CANRECURSE after a lock is created. Use them to implement macros that otherwise manipulated the flags directly. Assert that the associated lockmgr lock is exclusively locked by the current thread when manipulating these flags to ensure the flag updates are safe. This last change required some minor shuffling in a few filesystems to exclusively lock a brand new vnode slightly earlier. Reviewed by: kib MFC after: 3 days	2010-08-20 19:46:50 +00:00
kib	60a46e5ff9	Softdep_process_worklist() should unsuspend not only before processing the worklist (in softdep_process_journal), but also after flushing the workitems. Might be, we should even do this before bwillwrite() too, but this seems to be not needed for now. Fs might be suspended during processing the queue, and then there is nobody around to unsuspend. In collaboration with: pho Tested by: bz Reviewed by: jeff	2010-08-12 08:35:24 +00:00
jhb	7f218ea7f4	Revert the previous commit. The race is not applicable to the lockmgr implementation in 8.0 and later as its flags field does not hold dynamic state such as waiters flags, but is only modified in lockinit() aside from VN_LOCK_*(). Discussed with: attilio	2010-07-16 19:52:03 +00:00
jhb	ea417bf09a	When the MNTK_EXTENDED_SHARED mount option was added, some filesystems were changed to defer the setting of VN_LOCK_ASHARE() (which clears LK_NOSHARE in the vnode lock's flags) until after they had determined if the vnode was a FIFO. This occurs after the vnode has been inserted a VFS hash or some similar table, so it is possible for another thread to find this vnode via vget() on an i-node number and block on the vnode lock. If the lockmgr interlock (vnode interlock for vnode locks) is not held when clearing the LK_NOSHARE flag, then the lk_flags field can be clobbered. As a result the thread blocked on the vnode lock may never get woken up. Fix this by holding the vnode interlock while modifying the lock flags in this case. MFC after: 3 days	2010-07-16 19:20:20 +00:00
jeff	285c3f355c	- Handle the truncation of an inode with an effective link count of 0 in the context of the process that reduced the effective count. Previously all truncation as a result of unlink happened in the softdep flush thread. This had the effect of being impossible to rate limit properly with the journal code. Now the process issuing unlinks is suspended when the journal files. This has a side-effect of improving rm performance by allowing more concurrent work. - Handle two cases in inactive, one for effnlink == 0 and another when nlink finally reaches 0. - Eliminate the SPACECOUNTED related code since the truncation is no longer delayed. Discussed with: mckusick	2010-07-06 07:11:04 +00:00
kib	e735ee5c8d	Ensure that VOP_ACCESSX is called with exclusively locked vnode for the kernel compiled with QUOTA option. ufs_accessx() upgrades the vdp vnode lock from shared to exclusive to assign the dquot structure to the vnode, and ufs_delete_denied() is called when tvp is locked. Since upgrade drops shared lock when non-blocked upgrade failed, LOR is there. Reported and tested by: Dmitry Pryanishnikov <lynx.ripe gmail com> Tested by: pho PR: kern/147890 MFC after: 1 week	2010-06-20 13:35:16 +00:00
avg	13985611dd	ffs_softdep: change K&R in function defintions to ANSI prototypes Apparently it's bad when we first have an ANSI prototype in function declaration, but then use K&R in its defintion. Complaint from: clang MFC after: 2 weeks	2010-06-11 18:26:53 +00:00
kib	881c9b1a5c	Extend the scope of the lock on the quota file vnode in quotaon() to cover the initial read by dqopen(). Assert that vnode is locked in dqopen(). Remove VFS_LOCK_GIANT() from dqopen(), since quotaon() keeps Giant locked if needed around the call.	2010-06-03 10:24:53 +00:00
avg	4e8fc6f387	ffs_mount: accept and drop userland-only options that can be passed from loader(8) In r193192 loader(8) has grown an ability to pass root mount options from fstab via vfs.root.mountfrom.options. Unfortunately, some options that can be present in fstab are for userland only and lead to root mounting failure when seen by kernel. Rather than teaching loader about FFS-specific options that should be filtered out, ffs_mount recognizes those options as valid, but ignores and deletes[1] them. [1] is suggested by jh. PR: kern/141050 Reported by: many Reviewed by: jh, bde MFC after: 4 days	2010-05-19 09:32:11 +00:00
jeff	ebb7d74dae	- Don't immediately re-run softdepflush if we didn't make any progress on the last iteration. This can lead to a deadlock when we have worklist items that cannot be immediately satisfied. Reported by: uqs, Dimitry Andric <dimitry@andric.com> - Remove some unnecessary debugging code and place some other under SUJ_DEBUG. - Examine the journal state in softdep_slowdown(). - Re-format some comments so I may more easily add flag descriptions.	2010-05-19 06:18:01 +00:00
jeff	8fb90eedbc	- Call softdep_prealloc() before any of the balloc routines in the snapshot code. - Don't fsync() vnodes in prealloc if copy on write is in progress. It is not safe to recurse back into the write path here. Reported by: Vladimir Grebenschikov <vova@fbsd.ru>	2010-05-07 08:45:21 +00:00
jeff	0b3e023908	- Use the correct flag mask when determining whether an inode has successfully made it to the free list yet or not. This fixes a deadlock that can occur with unlinked but referenced files. Journal space and inodedeps were not correctly reclaimed because the inode block was not left dirty. Tested/Reported by: lwindschuh@googlemail.com	2010-05-07 08:20:56 +00:00
mckusick	e95ff34dac	Merger of the quota64 project into head. This joint work of Dag-Erling Smørgrav and myself updates the FFS quota system to support both traditional 32-bit and new 64-bit quotas (for those of you who want to put 2+Tb quotas on your users). By default quotas are not compiled into the kernel. To include them in your kernel configuration you need to specify: options QUOTA # Enable FFS quotas If you are already running with the current 32-bit quotas, they should continue to work just as they have in the past. If you wish to convert to using 64-bit quotas, use `quotacheck -c 64'; if you wish to revert from 64-bit quotas back to 32-bit quotas, use `quotacheck -c 32'. There is a new library of functions to simplify the use of the quota system, do `man quotafile' for details. If your application is currently using the quotactl(2), it is highly recommended that you convert your application to use the quotafile interface. Note that existing binaries will continue to work. Special thanks to John Kozubik of rsync.net for getting me interested in pursuing 64-bit quota support and for funding part of my development time on this project.	2010-05-07 00:41:12 +00:00
alc	fecc56fac1	Eliminate page queues locking around most calls to vm_page_free().	2010-05-06 18:58:32 +00:00
mckusick	b25e55dcc5	Final update to current version of head in preparation for reintegration.	2010-05-06 17:37:23 +00:00
alc	5c7ca3ee73	Acquire the page lock around all remaining calls to vm_page_free() on managed pages that didn't already have that lock held. (Freeing an unmanaged page, such as the various pmaps use, doesn't require the page lock.) This allows a change in vm_page_remove()'s locking requirements. It now expects the page lock to be held instead of the page queues lock. Consequently, the page queues lock is no longer required at all by callers to vm_page_rename(). Discussed with: kib	2010-05-05 18:16:06 +00:00
trasz	402e3baade	Move checking against RLIMIT_FSIZE into one place, vn_rlimit_fsize(). Reviewed by: kib	2010-05-05 16:44:25 +00:00
avg	043deeb564	ffs_vfsops: restore alphabetic order of options in ffs_opts The order was not correct only for nfsv4acls. ("no" prefix is ignored) MFC after: 1 week	2010-04-29 10:04:00 +00:00
jeff	47b1b89a95	- When canceling jaddrefs they may not yet be in the journal if this is via a revert call. In this case don't attempt to remove something that has not yet been added. Otherwise this jaddref must hang around to prevent the bitmap write as normal.	2010-04-28 07:57:37 +00:00
jeff	564f436237	- Fix builds without SOFTUPDATES defined in the kernel config.	2010-04-28 07:26:41 +00:00
mckusick	3a0f5972a0	Update to current version of head.	2010-04-28 05:33:59 +00:00
pjd	57ec1f2624	Fix build for UFS without SOFTUPDATES.	2010-04-24 07:36:33 +00:00
jeff	a574495410	- Merge soft-updates journaling from projects/suj/head into head. This brings in support for an optional intent log which eliminates the need for background fsck on unclean shutdown. Sponsored by: iXsystems, Yahoo!, and Juniper. With help from: McKusick and Peter Holm	2010-04-24 07:05:35 +00:00
kib	d5b92466a9	The cache_enter(9) function shall not be called for doomed dvp. Assert this. In the reported panic, vdestroy() fired the assertion "vp has namecache for ..", because pseudofs may end up doing cache_enter() with reclaimed dvp, after dotdot lookup temporary unlocked dvp. Similar problem exists in ufs_lookup() for "." lookup, when vnode lock needs to be upgraded. Verify that dvp is not reclaimed before calling cache_enter(). Reported and tested by: pho Reviewed by: kan MFC after: 2 weeks	2010-04-20 10:19:27 +00:00
avg	d488f0b549	ffs_mount: remove redundant assignment of geom consumer to devvp.v_bufobj The assignment is already done in g_vfs_open. Redundant assignment is harmless, but can become a problem if g_vfs_open logic is changed. MFC after: 1 week	2010-04-03 08:25:04 +00:00
mckusick	f63b97928b	Debugging nits found while testing the new 64-bit quota code.	2010-03-16 06:12:30 +00:00
des	834fb25a9e	IFH@204581	2010-03-04 13:35:57 +00:00
kib	d068222571	When ffs_realloccg() failed to allocate bigger fragment and, because pending blocks are scheduled for removal, goes to retry the (re)allocation, clear the bp pointer. It might happen that meantime free space is really exhausted and we are entering nospace: label without bread()ing buffer, causing stale bp value to be brelse()d again. Tested by: pho (Producing a scenario to reliably reproduce the race appeared to be much harder then fixing the bug) MFC after: 1 week	2010-02-13 10:34:50 +00:00
mckusick	e7471d443b	One last pass to get all the unsigned comparisons correct.	2010-02-11 18:14:53 +00:00
mckusick	d533f2ac8c	This fix corrects a problem in the file system that treats large inode numbers as negative rather than unsigned. For a default (16K block) file system, this bug began to show up at a file system size above about 16Tb. To fully handle this problem, newfs must be updated to ensure that it will never create a filesystem with more than 2^32 inodes. That patch will be forthcoming soon. Reported by: Scott Burns, John Kilburg, Bruce Evans Followup by: Jeff Roberson PR: 133980 MFC after: 2 weeks	2010-02-10 20:10:35 +00:00
trasz	bf6995c4bb	Remove unused variable.	2010-02-10 18:56:49 +00:00
trasz	0383f8d8bb	Return proper error code. Found with: clang	2010-01-25 16:09:50 +00:00
trasz	f346ef85e4	Move out code that does POSIX.1e ACL inheritance into separate routines. Reviewed by: rwatson	2010-01-24 15:12:27 +00:00
mckusick	94b44c0969	Cast 64-bit quantity to intptr_t rather than int so as to work properly with 64-bit architectures (such as amd64). Reported by: bz	2010-01-11 22:42:06 +00:00
mckusick	0cddeb2cb4	Background: When renaming a directory it passes through several intermediate states. First its new name will be created causing it to have two names (from possibly different parents). Next, if it has different parents, its value of ".." will be changed from pointing to the old parent to pointing to the new parent. Concurrently, its old name will be removed bringing it back into a consistent state. When fsck encounters an extra name for a directory, it offers to remove the "extraneous hard link"; when it finds that the names have been changed but the update to ".." has not happened, it offers to rewrite ".." to point at the correct parent. Both of these changes were considered unexpected so would cause fsck in preen mode or fsck in background mode to fail with the need to run fsck manually to fix these problems. Fsck running in preen mode or background mode now corrects these expected inconsistencies that arise during directory rename. The functionality added with this update is used by fsck running in background mode to make these fixes. Solution: This update adds three new fsck sysctl commands to support background fsck in correcting expected inconsistencies that arise from incomplete directory rename operations. They are: setcwd(dirinode) - set the current directory to dirinode in the filesystem associated with the snapshot. setdotdot(oldvalue, newvalue) - Verify that the inode number for ".." in the current directory is oldvalue then change it to newvalue. unlink(nameptr, oldvalue) - Verify that the inode number associated with nameptr in the current directory is oldvalue then unlink it. As with all other fsck sysctls, these new ones may only be used by processes with appropriate priviledge. Reported by: jeff Security issues: rwatson	2010-01-11 20:44:05 +00:00
mbr	7450f52a57	Remove extraneous semicolons, no functional changes. Submitted by: Marc Balmer <marc@msys.ch> MFC after: 1 week	2010-01-07 21:01:37 +00:00
mckusick	3d4c810fbe	KASSERT that condition raised by Coverity cannot happen. Found by: Coverity Prevent (tm) KASSERT by: sam	2010-01-07 06:20:07 +00:00

1 2 3 4 5 ...

1684 Commits