freebsd-nq

Author	SHA1	Message	Date
John Baldwin	8587289fb8	The UFS dirhash code was attempting to update shared state in the dirhash from multiple threads while holding a shared lock during a lookup operation. This could result in incorrect ENOENT failures which could then be permanently stored in the name cache. Specifically, the dirhash code optimizes the case that a single thread is walking a directory sequentially opening (or stat'ing) each file. It uses state in the dirhash structure to determine if a given lookup is using the optimization. If the optimization fails, it disables it and restarts the lookup. The problem arises when two threads both attempt the optimization and fail. The first thread will restart the loop, but the second thread will incorrectly think that it did not try the optimization and will only examine a subset of the directory entries in its hash chain. As a result, it may fail to find its directory entry and incorrectly fail with ENOENT. To make this safe for use with shared locks, simplify the state stored in the dirhash and move some of the state (the part that determines if the current thread is trying the optimization) into a local variable. One result is that we will now try the optimization more often. We still update the value under the shared lock, but it is a single atomic store similar to i_diroff that is stored in UFS directory i-nodes for the non-dirhash lookup. Reviewed by: kib MFC after: 1 week	2011-03-07 18:33:29 +00:00
John Baldwin	96e1934a43	Use ffs() to locate free bits in the inode bitmap rather than a loop with bit shifts. Reviewed by: mckusick MFC after: 1 month	2011-03-04 22:26:41 +00:00
Konstantin Belousov	c30c6a2311	v_mountedhere is a member of the union. Check that the vnodes have proper type before using the member. Reported and tested by: Michael Butler <imb protected-networks net>	2011-02-19 07:47:25 +00:00
Konstantin Belousov	455a6e0ff3	Use the native sector size of the device backing the UFS volume for SU+J journal blocks, instead of hard coding 512 byte sector size. The journal needs to write the block atomically, which can only be guaranteed at the device sector size, not larger. An attempt to write less than the sector size results in driver errors. Note that this is the first structure in UFS that depends on the sector size. Other elements are written in the units of fragments. In collaboration with: pho Reviewed by: jeff Tested by: bz, pho	2011-02-12 12:52:12 +00:00
Alexander Leidinger	3502f01f20	Wrap long line. Noticed by: bz	2011-02-10 08:06:56 +00:00
Alexander Leidinger	3eb6e1317c	Add some FEATURE macros for some UFS features. SU+J is not included as a FEATURE macro: - it was not in the tree during the GSoC - I do not see an option to en-/disable it in NOTES Two minor changes were made during the review compared to what was developed during GSoC 2010. No FreeBSD version bump, the userland application to query the features will be committed last and can serve as an indication of the availability if needed. Sponsored by: Google Summer of Code 2010 Submitted by: kibab Reviewed by: kib X-MFC after: to be determined in last commit with code from this project	2011-02-09 15:33:13 +00:00
Matthew D Fleming	e7ceb1e99b	Based on discussions on the svn-src mailing list, rework r218195: - entirely eliminate some calls to uio_yield() as being unnecessary, such as in a sysctl handler. - move should_yield() and maybe_yield() to kern_synch.c and move the prototypes from sys/uio.h to sys/proc.h - add a slightly more generic kern_yield() that can replace the functionality of uio_yield(). - replace source uses of uio_yield() with the functional equivalent, or in some cases do not change the thread priority when switching. - fix a logic inversion bug in vlrureclaim(), pointed out by bde@. - instead of using the per-cpu last switched ticks, use a per thread variable for should_yield(). With PREEMPTION, the only reasonable use of this is to determine if a lock has been held a long time and relinquish it. Without PREEMPTION, this is essentially the same as the per-cpu variable.	2011-02-08 00:16:36 +00:00
Matthew D Fleming	08b163fa51	Put the general logic for being a CPU hog into a new function should_yield(). Use this in various places. Encapsulate the common case of check-and-yield into a new function maybe_yield(). Change several checks for a magic number of iterations to use should_yield() instead. MFC after: 1 week	2011-02-02 16:35:10 +00:00
Sergey Kandaurov	b04409f5fc	Embed a quota error message (C string) into uprintf() fmt. While here, fix whitespaces. Approved by: kib (mentor)	2011-01-13 16:29:27 +00:00
Matthew D Fleming	fbbb13f962	sysctl(9) cleanup checkpoint: amd64 GENERIC builds cleanly. Commit the kernel changes.	2011-01-12 19:54:19 +00:00
Konstantin Belousov	ac32f1176b	Instead of incrementing freework reference counter in indir_trunc(), do it at the allocation time for journaled fs and indirect blocks, when the allocated object is not accessible outside. Requested and reviewed by: jeff Tested by: pho	2011-01-04 10:25:55 +00:00
Konstantin Belousov	465e3ccdbb	Handle missing jremrefs when a directory is renamed overtop of another, deleting it. If the directory is removed, UFS always needs to remove the .. ref, even if the ultimate ref on the parent would not change. The new directory must have a new journal entry for that ref. Otherwise journal processing would not properly account for the parent's reference since it will belong to a removed directory entry. Change ufs_rename()'s dotdot rename section to always setup_dotdot_link(). In the tip != NULL case SUJ needs the newref dependency allocated via setup_dotdot_link(). Stop setting isrmdir to 2 for newdirrem() in softdep_setup_remove(). Remove the isdirrem > 1 checks from newdirrem(). Reported by: many Submitted by: jeff Tested by: pho	2010-12-30 10:52:07 +00:00
Konstantin Belousov	42a6fc4385	In indir_trunc(), when processing jnewblk entries that are not written to the disk, recurse to handle indirect blocks of next level that are hidden by the corresponding entry. In collaboration with: pho Reviewed by: jeff, mckusick Tested by: mckusick, pho	2010-12-30 10:41:17 +00:00
Konstantin Belousov	8c2a54de80	Add kernel side support for BIO_DELETE/TRIM on UFS. The FS_TRIM fs flag indicates that administrator requested issuing of TRIM commands for the volume. UFS will only send the command to disk if the disk reports GEOM::candelete attribute. Since disk queue is reordered, data block is marked as free in the bitmap only after TRIM command completed. Due to need to sleep waiting for i/o to finish, TRIM bio_done routine schedules taskqueue to set the bitmap bit. Based on the patch by: mckusick Reviewed by: mckusick, pjd Tested by: pho MFC after: 1 month	2010-12-29 12:25:28 +00:00
Konstantin Belousov	d2d6c59245	Move the definition of mkdirlisthd from header to C file. Reviewed by: mckusick Tested by: pho	2010-12-29 12:16:06 +00:00
Konstantin Belousov	abf6c181e4	Use a proper type for the variable holding the summary size of the inode data. Otherwise, on 32bit systems, an unlinked inode whose size is a multiple of 4GB was not truncated, causing corruption. Reported by: brucec Reviewed by: mckusick Tested by: pho	2010-12-29 11:19:39 +00:00
Kirk McKusick	84ad0a66d0	This patch fixes a soft update panic while running perl 5.12 tests which produced: panic: indir_trunc: Index out of range -148 parent -2061 lbn -305164 Reported by: Dimitry Andric Fixed by: Jeff Roberson	2010-12-23 00:38:57 +00:00
Konstantin Belousov	fddd463dc2	Journal start looks up .sujournal file by doing lookup on the root dvp. As result, failed softdep_mount() might leave up to two vnodes on the mp mountlist, preventing mnt_ref from going to zero. Call ffs_flushfiles() after failed softdep_mount() to clean mountlist. Initial report by: Garrett Cooper Reproduced and tested by: pho	2010-12-01 21:19:11 +00:00
Peter Holm	bcc5c95b6b	First step in fixing the handle_workitem_freeblocks panic. In collaboration with: kib	2010-11-27 20:27:07 +00:00
Kirk McKusick	18709a09ed	Delete /sys/ufs/ffs/README.snapshot as it is no longer relevant. Drop reference to it in mount(8). MFC: 3 days	2010-11-20 18:40:50 +00:00
Konstantin Belousov	730b63b0c2	Remove prtactive variable and related printf()s in the vop_inactive and vop_reclaim() methods. They seem to be unused, and the reported situation is normal for the forced unmount. MFC after: 1 week X-MFC-note: keep prtactive symbol in vfs_subr.c	2010-11-19 21:17:34 +00:00
Konstantin Belousov	be913821af	The softdep_setup_freeblocks() adds worklist items before deallocate_dependencies() is done. This opens a race between softdep thread and the thread that does the truncation: A write of the indirect block causes the freeblks to become ALLCOMPLETE while softdep_setup_freeblocks() dropped softdep lock. And then, softdep_disk_write_complete() would reassign the workitem to the mount point worklist, causing premature processing of the workitem, or journal write exhaust the fb_jfreeblkhd and handle_written_jfreeblk does the same reassign. indir_trunc() then would find the indirect block that is locked (with lock owned by kernel) but without any dependencies, causing it to hang in getblk() waiting for buffer lock. Do not mark freeblks as DEPCOMPLETE until deallocate_dependencies() finished. Analyzed, suggested and reviewed by: jeff Tested by: pho	2010-11-11 11:54:01 +00:00
Konstantin Belousov	496fd81362	Change #ifdef INVARIANTS panic into KASSERT, and print some useful information to diagnose the issue, in handle_complete_freeblocks(). Reviewed by: jeff Tested by: pho	2010-11-11 11:41:52 +00:00
Konstantin Belousov	d23c72cdb5	In journal_mount(), only set MNTK_SUJ flag after the jblocks are mapped. I believe there is a window otherwise where jblocks can be accessed without proper initialization. Reviewed by: jeff Tested by: pho	2010-11-11 11:38:57 +00:00
Konstantin Belousov	fae5c47dd4	Add function lbn_offset to calculate offset of the indirect block of given level. Reviewed by: jeff Tested by: pho	2010-11-11 11:35:42 +00:00
Konstantin Belousov	4e4ff01629	Fix typo. Function is called ffs_blkfree.	2010-11-11 11:26:59 +00:00
John Baldwin	b3e3402d3a	Remove unused includes of <sys/mutex.h> and <machine/mutex.h>.	2010-11-09 20:41:10 +00:00
Ivan Voras	8e431dd6f1	Bring vfs.ufs.dirhash_maxmem into the age of the fruitbat and make it autotuned. It is only an upper bound (the memory is not always allocated) and the system contains a vm_lowmem handler so nothing will crash and burn if it's tuned too high. Reviewed by: mckusick	2010-10-25 21:46:23 +00:00
Konstantin Belousov	d0cc54f3b4	The r184588 changed the layout of struct export_args, causing an ABI breakage for old mount(2) syscall, since most struct <filesystem>_args embed export_args. The mount(2) is supposed to provide ABI compatibility for pre-nmount mount(8) binaries, so restore ABI to pre-r184588. Requested and reviewed by: bde MFC after: 2 weeks	2010-10-10 07:05:47 +00:00
Alan Cox	a03e344a7f	M_USE_RESERVE has been deprecated for a decade. Eliminate any uses that have no run-time effect.	2010-10-02 17:58:57 +00:00
Kirk McKusick	e69bed360f	Since local variable 'i' is used only in a KASSERT, declare and initialize it only if INVARIANTS is defined to avoid a declared but unused warning. Suggested by: Brian Somers <brian@FreeBSD.org>	2010-09-29 14:46:57 +00:00
Konstantin Belousov	063045a555	Fix typo in comment.	2010-09-29 07:40:11 +00:00
David E. O'Brien	59b3a4ebb5	Correct some non-code typos.	2010-09-17 09:14:40 +00:00
Kirk McKusick	c0b2efce9e	Update comments in soft updates code to more fully describe the addition of journalling. Only functional change is to tighten a KASSERT. Reviewed by: jeff Roberson	2010-09-14 18:04:05 +00:00
John Baldwin	3634d5b241	Add dedicated routines to toggle lockmgr flags such as LK_NOSHARE and LK_CANRECURSE after a lock is created. Use them to implement macros that otherwise manipulated the flags directly. Assert that the associated lockmgr lock is exclusively locked by the current thread when manipulating these flags to ensure the flag updates are safe. This last change required some minor shuffling in a few filesystems to exclusively lock a brand new vnode slightly earlier. Reviewed by: kib MFC after: 3 days	2010-08-20 19:46:50 +00:00
Konstantin Belousov	691401eef8	Softdep_process_worklist() should unsuspend not only before processing the worklist (in softdep_process_journal), but also after flushing the workitems. Might be, we should even do this before bwillwrite() too, but this seems to be not needed for now. Fs might be suspended during processing the queue, and then there is nobody around to unsuspend. In collaboration with: pho Tested by: bz Reviewed by: jeff	2010-08-12 08:35:24 +00:00
John Baldwin	61e1c19319	Revert the previous commit. The race is not applicable to the lockmgr implementation in 8.0 and later as its flags field does not hold dynamic state such as waiters flags, but is only modified in lockinit() aside from VN_LOCK_*(). Discussed with: attilio	2010-07-16 19:52:03 +00:00
John Baldwin	dbfcf8cfea	When the MNTK_EXTENDED_SHARED mount option was added, some filesystems were changed to defer the setting of VN_LOCK_ASHARE() (which clears LK_NOSHARE in the vnode lock's flags) until after they had determined if the vnode was a FIFO. This occurs after the vnode has been inserted into a VFS hash or some similar table, so it is possible for another thread to find this vnode via vget() on an i-node number and block on the vnode lock. If the lockmgr interlock (vnode interlock for vnode locks) is not held when clearing the LK_NOSHARE flag, then the lk_flags field can be clobbered. As a result the thread blocked on the vnode lock may never get woken up. Fix this by holding the vnode interlock while modifying the lock flags in this case. MFC after: 3 days	2010-07-16 19:20:20 +00:00
Jeff Roberson	9f9c8c59ae	- Handle the truncation of an inode with an effective link count of 0 in the context of the process that reduced the effective count. Previously all truncation as a result of unlink happened in the softdep flush thread. This had the effect of being impossible to rate limit properly with the journal code. Now the process issuing unlinks is suspended when the journal fills. This has a side-effect of improving rm performance by allowing more concurrent work. - Handle two cases in inactive, one for effnlink == 0 and another when nlink finally reaches 0. - Eliminate the SPACECOUNTED related code since the truncation is no longer delayed. Discussed with: mckusick	2010-07-06 07:11:04 +00:00
Konstantin Belousov	427ef27ec7	Ensure that VOP_ACCESSX is called with exclusively locked vnode for the kernel compiled with QUOTA option. ufs_accessx() upgrades the vdp vnode lock from shared to exclusive to assign the dquot structure to the vnode, and ufs_delete_denied() is called when tvp is locked. Since upgrade drops shared lock when non-blocked upgrade failed, LOR is there. Reported and tested by: Dmitry Pryanishnikov <lynx.ripe gmail com> Tested by: pho PR: kern/147890 MFC after: 1 week	2010-06-20 13:35:16 +00:00
Andriy Gapon	d89c217f30	ffs_softdep: change K&R in function definitions to ANSI prototypes Apparently it's bad when we first have an ANSI prototype in function declaration, but then use K&R in its definition. Complaint from: clang MFC after: 2 weeks	2010-06-11 18:26:53 +00:00
Konstantin Belousov	db875cbd74	Extend the scope of the lock on the quota file vnode in quotaon() to cover the initial read by dqopen(). Assert that vnode is locked in dqopen(). Remove VFS_LOCK_GIANT() from dqopen(), since quotaon() keeps Giant locked if needed around the call.	2010-06-03 10:24:53 +00:00
Andriy Gapon	0b9626482b	ffs_mount: accept and drop userland-only options that can be passed from loader(8) In r193192 loader(8) has grown an ability to pass root mount options from fstab via vfs.root.mountfrom.options. Unfortunately, some options that can be present in fstab are for userland only and lead to root mounting failure when seen by kernel. Rather than teaching loader about FFS-specific options that should be filtered out, ffs_mount recognizes those options as valid, but ignores and deletes[1] them. [1] is suggested by jh. PR: kern/141050 Reported by: many Reviewed by: jh, bde MFC after: 4 days	2010-05-19 09:32:11 +00:00
Jeff Roberson	f0268739c7	- Don't immediately re-run softdepflush if we didn't make any progress on the last iteration. This can lead to a deadlock when we have worklist items that cannot be immediately satisfied. Reported by: uqs, Dimitry Andric <dimitry@andric.com> - Remove some unnecessary debugging code and place some other under SUJ_DEBUG. - Examine the journal state in softdep_slowdown(). - Re-format some comments so I may more easily add flag descriptions.	2010-05-19 06:18:01 +00:00
Jeff Roberson	8ef48de888	- Call softdep_prealloc() before any of the balloc routines in the snapshot code. - Don't fsync() vnodes in prealloc if copy on write is in progress. It is not safe to recurse back into the write path here. Reported by: Vladimir Grebenschikov <vova@fbsd.ru>	2010-05-07 08:45:21 +00:00
Jeff Roberson	2c3ae115b6	- Use the correct flag mask when determining whether an inode has successfully made it to the free list yet or not. This fixes a deadlock that can occur with unlinked but referenced files. Journal space and inodedeps were not correctly reclaimed because the inode block was not left dirty. Tested/Reported by: lwindschuh@googlemail.com	2010-05-07 08:20:56 +00:00
Kirk McKusick	e27ed89aef	Merger of the quota64 project into head. This joint work of Dag-Erling Smørgrav and myself updates the FFS quota system to support both traditional 32-bit and new 64-bit quotas (for those of you who want to put 2+Tb quotas on your users). By default quotas are not compiled into the kernel. To include them in your kernel configuration you need to specify: options QUOTA # Enable FFS quotas If you are already running with the current 32-bit quotas, they should continue to work just as they have in the past. If you wish to convert to using 64-bit quotas, use `quotacheck -c 64'; if you wish to revert from 64-bit quotas back to 32-bit quotas, use `quotacheck -c 32'. There is a new library of functions to simplify the use of the quota system, do `man quotafile' for details. If your application is currently using the quotactl(2), it is highly recommended that you convert your application to use the quotafile interface. Note that existing binaries will continue to work. Special thanks to John Kozubik of rsync.net for getting me interested in pursuing 64-bit quota support and for funding part of my development time on this project.	2010-05-07 00:41:12 +00:00
Alan Cox	eb00b276ab	Eliminate page queues locking around most calls to vm_page_free().	2010-05-06 18:58:32 +00:00
Kirk McKusick	945f418ab8	Final update to current version of head in preparation for reintegration.	2010-05-06 17:37:23 +00:00
Alan Cox	5ac59343be	Acquire the page lock around all remaining calls to vm_page_free() on managed pages that didn't already have that lock held. (Freeing an unmanaged page, such as the various pmaps use, doesn't require the page lock.) This allows a change in vm_page_remove()'s locking requirements. It now expects the page lock to be held instead of the page queues lock. Consequently, the page queues lock is no longer required at all by callers to vm_page_rename(). Discussed with: kib	2010-05-05 18:16:06 +00:00
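
Commit 96e1934a43 above replaces a shift-and-test loop over the inode bitmap with ffs(). A minimal userland sketch of that idea follows. It assumes a byte-wise bitmap in which a set bit means "in use"; the helper name first_free_bit and this layout are illustrative assumptions, not the actual FFS code.

```c
#include <assert.h>
#include <strings.h>	/* ffs() */

/*
 * Hypothetical helper: return the index of the first clear (free) bit
 * in a byte-wise bitmap, or -1 if every bit is set.
 */
static int
first_free_bit(const unsigned char *map, int nbytes)
{
	int i, bit;

	assert(nbytes >= 0);
	for (i = 0; i < nbytes; i++) {
		if (map[i] == 0xff)
			continue;	/* all eight bits in use, skip the byte */
		/*
		 * Invert the byte so free bits become set bits.  ffs()
		 * returns the 1-based position of the lowest set bit,
		 * so subtract one to get a 0-based bit number.
		 */
		bit = ffs((unsigned char)~map[i]) - 1;
		return (i * 8 + bit);
	}
	return (-1);
}
```

For a bitmap of { 0xff, 0x07, 0x00 } the helper returns 11: the first byte is full, and the lowest clear bit of 0x07 is bit 3 of the second byte.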

1705 Commits