freebsd-dev

Author	SHA1	Message	Date
Konstantin Belousov	8d4a7be84d	Reorganize the code in bdwrite() which handles move of dirtiness from the buffer pages to buffer. Combine the code to set buffer dirty range (previously in vfs_setdirty()) and to clean the pages (vfs_clean_pages()) into new function vfs_clean_pages_dirty_buf(). Now the vm object lock is acquired only once. Drain the VPO_BUSY bit of the buffer pages before setting valid and clean bits in vfs_clean_pages_dirty_buf() with new helper vfs_drain_busy_pages(). pmap_clear_modify() asserts that page is not busy. In vfs_busy_pages(), move the wait for draining of VPO_BUSY before the dirtiness handling, to follow the structure of vfs_clean_pages_dirty_buf(). Reported and tested by: pho Suggested and reviewed by: alc MFC after: 2 weeks	2010-06-08 17:54:28 +00:00
Alan Cox	c8fa870982	Minimize the use of the page queues lock for synchronizing access to the page's dirty field. With the exception of one case, access to this field is now synchronized by the object lock.	2010-06-02 15:46:37 +00:00
Alan Cox	e98d019d3c	Eliminate the acquisition and release of the page queues lock from vfs_busy_pages(). It is no longer needed. Submitted by: kib	2010-05-25 02:26:25 +00:00
Alan Cox	567e51e18c	Roughly half of a typical pmap_mincore() implementation is machine-independent code. Move this code into mincore(), and eliminate the page queues lock from pmap_mincore(). Push down the page queues lock into pmap_clear_modify(), pmap_clear_reference(), and pmap_is_modified(). Assert that these functions are never passed an unmanaged page. Eliminate an inaccurate comment from powerpc/powerpc/mmu_if.m: Contrary to what the comment says, pmap_mincore() is not simply an optimization. Without a complete pmap_mincore() implementation, mincore() cannot return either MINCORE_MODIFIED or MINCORE_REFERENCED because only the pmap can provide this information. Eliminate the page queues lock from vfs_setdirty_locked_object(), vm_pageout_clean(), vm_object_page_collect_flush(), and vm_object_page_clean(). Generally speaking, these are all accesses to the page's dirty field, which are synchronized by the containing vm object's lock. Reduce the scope of the page queues lock in vm_object_madvise() and vm_page_dontneed(). Reviewed by: kib (an earlier version)	2010-05-24 14:26:57 +00:00
Alan Cox	aa12e8b71d	The page queues lock is no longer required by vm_page_set_invalid(), so eliminate it. Assert that the object containing the page is locked in vm_page_test_dirty(). Perform some style clean up while I'm here. Reviewed by: kib	2010-05-18 16:40:29 +00:00
Alan Cox	3c4a24406b	Push down the page queues lock into vm_page_cache(), vm_page_try_to_cache(), and vm_page_try_to_free(). Consequently, push down the page queues lock into pmap_enter_quick(), pmap_page_wired_mapped(), pmap_remove_all(), and pmap_remove_write(). Push down the page queues lock into Xen's pmap_page_is_mapped(). (I overlooked the Xen pmap in r207702.) Switch to a per-processor counter for the total number of pages cached.	2010-05-08 20:34:01 +00:00
Alan Cox	e3ef0d2fcf	Push down the acquisition of the page queues lock into vm_page_unwire(). Update the comment describing which lock should be held on entry to vm_page_wire(). Reviewed by: kib	2010-05-05 03:45:46 +00:00
Alan Cox	a7283d3213	Add page locking to the vm_page_cow* functions. Push down the acquisition and release of the page queues lock into vm_page_wire(). Reviewed by: kib	2010-05-04 15:55:41 +00:00
Alan Cox	c5a648516e	Acquire the page lock around vm_page_unwire() and vm_page_wire(). Reviewed by: kib	2010-05-03 16:41:11 +00:00
Alan Cox	139a0de7f1	Properly synchronize access to the page's hold_count in vfs_vmio_release(). Reviewed by: kib	2010-05-02 19:10:27 +00:00
Alan Cox	b88b6c9d80	It makes no sense for vm_page_sleep_if_busy()'s helper, vm_page_sleep(), to unconditionally set PG_REFERENCED on a page before sleeping. In many cases, it's perfectly ok for the page to disappear, i.e., be reclaimed by the page daemon, before the caller to vm_page_sleep() is reawakened. Instead, we now explicitly set PG_REFERENCED in those cases where having the page persist until the caller is awakened is clearly desirable. Note, however, that setting PG_REFERENCED on the page is still only a hint, and not a guarantee that the page should persist.	2010-05-02 17:33:46 +00:00
Kip Macy	2965a45315	On Alan's advice, rather than do a wholesale conversion on a single architecture from page queue lock to a hashed array of page locks (based on a patch by Jeff Roberson), I've implemented page lock support in the MI code and have only moved vm_page's hold_count out from under page queue mutex to page lock. This changes pmap_extract_and_hold on all pmaps. Supported by: Bitgravity Inc. Discussed with: alc, jeffr, and kib	2010-04-30 00:46:43 +00:00
Jeff Roberson	113db2dddb	- Merge soft-updates journaling from projects/suj/head into head. This brings in support for an optional intent log which eliminates the need for background fsck on unclean shutdown. Sponsored by: iXsystems, Yahoo!, and Juniper. With help from: McKusick and Peter Holm	2010-04-24 07:05:35 +00:00
Andriy Gapon	1b4bc5f851	bo_bsize: revert r205860 and take an alternative approach in getblk. In r205860 I missed the fact that there is code that strongly assumes that devvp bo_bsize is equal to underlying provider's sectorsize. In those places it is hard to obtain the sectorsize in an alternative way if devvp bo_bsize is set to something else. So, I am reverting bo_bsize assignment in g_vfs_open. Instead, in getblk I use DEV_BSIZE block size for b_offset calculation if vp is a disk vp as reported by vn_isdisk. This should coincide with vp being a devvp. Reported by: Mykola Dzham <i@levsha.me> Tested by: Mykola Dzham <i@levsha.me> Pointyhat to: avg MFC after: 2 weeks X-ToDo: convert bread(devvp) in all fs to use bo_bsize-d blocks	2010-04-02 15:12:31 +00:00
Konstantin Belousov	7ac5806bfb	When a buffer write fails, it is wrong for brelse() to invalidate the portion of the page that was written. Among other problems, this page might be picked up by pagedaemon, with a failed assertion in vm_pageout_flush() about validity of the page. Reported and tested by: pho Approved by: re (kensmith) MFC after: 3 weeks	2009-07-19 20:25:59 +00:00
Alan Cox	0a276edef9	Eliminate an unused variable from allocbuf(). Eliminate the unnecessary setting of page valid bits from a non-VMIO buffer in vm_hold_load_pages().	2009-06-07 18:19:04 +00:00
Alan Cox	6864a18c41	Eliminate a comment describing code that was deleted over eight years ago. Move another comment to its proper place. Fix a typo in a third comment.	2009-06-01 06:12:08 +00:00
Alan Cox	1f17689408	nfs_write() can use the recently introduced vfs_bio_set_valid() instead of vfs_bio_set_validclean(), thereby avoiding the page queues lock. Garbage collect vfs_bio_set_validclean(). Nothing uses it any longer.	2009-05-31 20:18:02 +00:00
Alan Cox	623469c996	Modify vm_hold_load_pages() to allocate pages using VM_ALLOC_NOOBJ rather than using the kernel object. This allows the elimination of page queues locking from vm_hold_free_pages().	2009-05-29 18:35:51 +00:00
Zachary Loafman	cfeb7489c2	fail(9) support: Add support for kernel fault injection using KFAIL_POINT_* macros and fail_point_* infrastructure. Add example fail point in vfs_bio.c to simulate VM buf pressure. Approved by: dfr (mentor)	2009-05-27 16:36:54 +00:00
John Baldwin	d422da9a0a	Only use the ABI compat shim for vfs.bufspace if the old buffer is smaller than a long. PR: amd64/134786 Submitted by: Emil Mikulic emikulic| gmail MFC after: 3 days	2009-05-21 16:18:45 +00:00
Alan Cox	1be5269359	Several changes to vfs_bio_clrbuf(): Provide a more descriptive comment. Eliminate dead code. The page cannot possibly have PG_ZERO set. Eliminate unnecessary blank lines. Reviewed by: tegge	2009-05-17 23:25:53 +00:00
Alan Cox	6e5982caf7	Introduce vfs_bio_set_valid() and use it from ffs_realloccg(). This eliminates the misuse of vfs_bio_clrbuf() by ffs_realloccg(). In collaboration with: tegge	2009-05-17 20:26:00 +00:00
Alan Cox	1c1b26f276	Eliminate page queues locking from bufdone_finish() through the following changes: Rename vfs_page_set_valid() to vfs_page_set_validclean() to reflect what this function actually does. Suggested by: tegge Introduce a new version of vfs_page_set_valid() that does no more than what the function's name implies. Specifically, it does not update the page's dirty mask, and thus it does not require the page queues lock to be held. Update two of the three callers to the old vfs_page_set_valid() to call vfs_page_set_validclean() instead because they actually require the page's dirty mask to be cleared. Introduce vm_page_set_valid(). Reviewed by: tegge	2009-05-13 05:39:39 +00:00
Alan Cox	c3d3fe6314	Revert CVS revision 1.94 (svn r16840). Current pmap implementations don't suffer from the race condition that motivated revision 1.94. Consequently, the work-around that was implemented by revision 1.94 is no longer needed. Moreover, reverting this work-around eliminates the need for vfs_busy_pages() to acquire the page queues lock when preparing a buffer for read. Reviewed by: tegge	2009-05-11 05:16:57 +00:00
Alexander Kabaev	8aeb69d0f2	Undo private changes that should never have been committed.	2009-04-17 18:34:11 +00:00
Alexander Kabaev	348496ad39	More fallout from negative dotdot caching. Negative entries should be removed from and reinserted to proper ncneg list. Reported by: pho Submitted by: kib	2009-04-17 18:11:11 +00:00
Konstantin Belousov	28a1b4eb37	In flushbufqueues(), do not allocate sentinel buffer on the stack, struct buf is large. Use sleeping malloc(9) call, and zero the allocated buf as a debugging feature.	2009-04-16 09:37:48 +00:00
Konstantin Belousov	949af70942	Export the number of times bufdaemon got help from the normal threads.	2009-04-16 09:33:52 +00:00
John Baldwin	9b84ba1cbb	Improve the description of a few sysctls. Submitted by: bde (partially) MFC after: 3 days	2009-03-23 20:18:06 +00:00
Attilio Rao	76ed3c71f1	Fix a long-standing bug that crept in over several revisions: B_DELWRI cleanup and vnode disassociation should happen just before assigning the buffer to a queue. Reported by: miwi, Volker <volker at vwsoft dot com>, Ben Kaduk <minimarmot at gmail dot com>, Christopher Mallon <christoph dot mallon at gmx dot de> Tested by: lulf, miwi	2009-03-17 16:30:49 +00:00
Konstantin Belousov	c1d8b5e82c	Fix two issues with bufdaemon, often causing the processes to hang in the "nbufkv" sleep. First, ffs background cg group block write requests a new buffer for the shadow copy. When ffs_bufwrite() is called from the bufdaemon due to a buffer shortage, requesting the buffer deadlocks bufdaemon. Introduce a new flag for getnewbuf(), GB_NOWAIT_BD, to request getblk to not block while allocating the buffer, and return failure instead. Add a flag argument to the geteblk to allow passing the flags to getblk(). Do not repeat the getnewbuf() call from geteblk if buffer allocation failed and either GB_NOWAIT_BD is specified, or geteblk() is called from bufdaemon (or its helper, see below). In ffs_bufwrite(), fall back to synchronous cg block write if shadow block allocation failed. Since r107847, buffer write assumes that vnode owning the buffer is locked. The second problem is that buffer cache may accumulate many buffers belonging to limited number of vnodes. With such workload, quite often threads that own the mentioned vnodes locks are trying to read another block from the vnodes, and, due to buffer cache exhaustion, are asking bufdaemon for help. Bufdaemon is unable to make any substantial progress because the vnodes are locked. Allow the threads owning vnode locks to help the bufdaemon by doing the flush pass over the buffer cache before getnewbuf() is going to uninterruptible sleep. Move the flushing code from buf_daemon() to new helper function buf_do_flush(), that is called from getnewbuf(). The number of buffers flushed by single call to buf_do_flush() from getnewbuf() is limited by new sysctl vfs.flushbufqtarget. Prevent recursive calls to buf_do_flush() by marking the bufdaemon and threads that temporarily help bufdaemon by TDP_BUFNEED flag. In collaboration with: pho Reviewed by: tegge (previous version) Tested by: glebius, yandex ... MFC after: 3 weeks	2009-03-16 15:39:46 +00:00
John Baldwin	060e911cf4	In the ABI shim for vfs.bufspace, rather than truncating values larger than INT_MAX to INT_MAX, just go ahead and write out the full long to give an error of ENOMEM to the user process. Requested by: bde	2009-03-10 21:27:15 +00:00
John Baldwin	38cce81ab3	Add an ABI compat shim for the vfs.bufspace sysctl for sysctl requests that try to fetch it as an int rather than a long. If the current value is greater than INT_MAX it reports a value of INT_MAX.	2009-03-10 15:26:50 +00:00
John Baldwin	5bd65606f4	Adjust some variables (mostly related to the buffer cache) that hold address space sizes to be longs instead of ints. Specifically, the following values are now longs: runningbufspace, bufspace, maxbufspace, bufmallocspace, maxbufmallocspace, lobufspace, hibufspace, lorunningspace, hirunningspace, maxswzone, maxbcache, and maxpipekva. Previously, a relatively small number (~44000) of buffers set in kern.nbuf would result in integer overflows resulting either in hangs or bogus values of hidirtybuffers and lodirtybuffers. Now one has to overflow a long to see such problems. There was a check for a nbuf setting that would cause overflows in the auto-tuning of nbuf. I've changed it to always check and cap nbuf but warn if a user-supplied tunable would cause overflow. Note that this changes the ABI of several sysctls that are used by things like top(1), etc., so any MFC would probably require some gross shims to allow for that. MFC after: 1 month	2009-03-09 19:35:20 +00:00
John Baldwin	8941aad19b	Tweak the output of VOP_PRINT/vn_printf() some. - Align the fifo output in fifo_print() with other vn_printf() output. - Remove the leading space from lockmgr_printinfo() so its output lines up in vn_printf(). - lockmgr_printinfo() now ends with a newline, so remove an extra newline from vn_printf().	2009-02-06 20:06:48 +00:00
Attilio Rao	0d7935fd01	Remove the unneeded struct thread argument from the bufobj interface. In particular, the KPI of the following functions is modified: - bufobj_invalbuf() - bufsync() and BO_SYNC() "virtual method" of the buffer objects set. Main consumers of bufobj functions are affected by this change too and, in particular, functions which changed their KPI are: - vinvalbuf() - g_vfs_close() Due to the KPI breakage, __FreeBSD_version will be bumped in a later commit. As a side note, please consider the 'curthread' argument passed to VOP_SYNC() (in bufsync()) temporary, as it will be axed out ASAP. Reviewed by: kib Tested by: Giovanni Trematerra <giovanni dot trematerra at gmail dot com>	2008-10-10 21:23:50 +00:00
Konstantin Belousov	52dfc8d7da	Add the ffs structures introspection functions for ddb. Show the b_dep value for the buffer in the show buffer command. Add a command to dump the dirty/clean buffer list for vnode. Reviewed by: tegge Tested and used by: pho MFC after: 1 month	2008-09-16 11:19:38 +00:00
Konstantin Belousov	2bb4c6f922	In brelse, put the B_NEEDSGIANT buffer on the QUEUE_DIRTY_GIANT queue, instead of QUEUE_DIRTY. Tested by: pho Reviewed by: attilio MFC after: 3 days	2008-08-19 11:31:49 +00:00
Alan Cox	14e69e48b8	Eliminate dead code. (The commit message for revision 1.287 explains why this code is dead.)	2008-07-20 04:13:51 +00:00
Attilio Rao	71072af500	b_waiters cannot be adequately protected by the interlock because it is dropped after the call to lockmgr() so just revert this approach using something similar to the previous one: BUF_LOCKWAITERS() just checks if there are waiters (not the actual number of them) and it is based on newly introduced lockmgr_waiters() which returns whether the lockmgr has waiters. The name has been chosen differently from the old lockwaiters() in order to not confuse them. The KPI is extended by this commit so __FreeBSD_version bumping and manpage update will be happening soon. 'struct buf' also changes, so kernel ABI is disturbed. Bug found by: jeff Approved by: jeff, kib	2008-03-28 12:30:12 +00:00
Jeff Roberson	698b1a6643	- Complete part of the unfinished bufobj work by consistently using BO_LOCK/UNLOCK/MTX when manipulating the bufobj. - Create a new lock in the bufobj to lock bufobj fields independently. This leaves the vnode interlock as an 'identity' lock while the bufobj is an io lock. The bufobj lock is ordered before the vnode interlock and also before the mnt ilock. - Exploit this new lock order to simplify softdep_check_suspend(). - A few sync related functions are marked with a new XXX to note that we may not properly interlock against a non-zero bv_cnt when attempting to sync all vnodes on a mountlist. I do not believe this race is important. If I'm wrong this will make these locations easier to find. Reviewed by: kib (earlier diff) Tested by: kris, pho (earlier diff)	2008-03-22 09:15:16 +00:00
Konstantin Belousov	e7ffdf423a	Reduce contention on the vnode interlock by not acquiring the BO_LOCK around the check for the BV_BKGRDINPROG in the brelse() and bqrelse(). See the comment for the explanation why it is safe. Tested by: pho Submitted by: jeff	2008-03-21 12:38:44 +00:00
Jeff Roberson	0169d126a6	- Reduce contention on the global bdonelock and bpinlock by using a pool mutex to protect these sleep/wakeup/counter races. This still is preferable to bloating each bio with a mtx.	2008-03-21 10:00:05 +00:00
Robert Watson	237fdd787b	In keeping with style(9)'s recommendations on macros, use a ';' after each SYSINIT() macro invocation. This makes a number of lightweight C parsers much happier with the FreeBSD kernel source, including cflow's prcc and lxr. MFC after: 1 month Discussed with: imp, rink	2008-03-16 10:58:09 +00:00
Attilio Rao	7fbfba7bf8	- Handle buffer lock waiters count directly in the buffer cache instead of relying on the lockmgr support [1]: * bump the waiters only if the interlock is held * let brelvp() return the waiters count * rely on brelvp() instead of BUF_LOCKWAITERS() in order to check for the waiters number - Remove a namespace pollution introduced recently with lockmgr.h including lock.h by including lock.h directly in the consumers and making it mandatory for using lockmgr. - Modify flags accepted by lockinit(): * introduce LK_NOPROFILE which disables lock profiling for the specified lockmgr * introduce LK_QUIET which disables ktr tracing for the specified lockmgr [2] * disallow LK_SLEEPFAIL and LK_NOWAIT to be passed there so that it can only be used on a per-instance basis - Remove BUF_LOCKWAITERS() and lockwaiters() as they are no longer used This patch breaks KPI so __FreeBSD_version will be bumped and manpages updated by further commits. Additionally, the 'struct buf' changes also disturb the ABI. [2] Really, currently there is no ktr tracing in the lockmgr, but it will be added soon. [1] Submitted by: kib Tested by: pho, Andrea Barberio <insomniac at slackware dot it>	2008-03-01 19:47:50 +00:00
Attilio Rao	84887fa362	- Add real assertions to lockmgr locking primitives. A couple of notes for this: * WITNESS support, when enabled, is only used for shared locks in order to avoid problems with the "disowned" locks * KA_HELD and KA_UNHELD only exist in the lockmgr namespace in order to assert for a generic thread (not curthread) owning the lock or not. Really, this kind of check is bogus but it seems very widespread in the consumers' code. So, for the moment, we cater to this untrusted behaviour, until the consumers are fixed and the options can be removed (hopefully during 8.0-CURRENT lifecycle) * Implementing KA_HELD and KA_UNHELD (not supported natively by WITNESS) required the introduction of LA_MASKASSERT which specifies the range for default lock assertion flags * About other aspects, lockmgr_assert() follows exactly what other locking primitives offer about this operation. - Build real assertions for buffer cache locks on top of lockmgr_assert(). They can be used with the BUF_ASSERT_*(bp) paradigm. - Add checks at lock destruction time and use a cookie for verifying lock integrity at any operation. - Redefine BUF_LOCKFREE() in order to not use a direct assert but let it rely on the aforementioned destruction time check. The KPI is evidently broken, so a __FreeBSD_version bump and a manpage update are necessary and will be committed soon. Side note: lockmgr_assert() will be used soon in order to implement real assertions in the vnode namespace replacing the legacy and still bogus "VOP_ISLOCKED()" way. Tested by: kris (earlier version) Reviewed by: jhb	2008-02-13 20:44:19 +00:00
Attilio Rao	d638e093d6	- Introduce the function lockmgr_recursed() which returns true if the lockmgr lkp, when held in exclusive mode, is recursed - Introduce the function BUF_RECURSED() which does the same for bufobj locks, built on top of lockmgr_recursed() - Introduce the function BUF_ISLOCKED() which works like the counterpart VOP_ISLOCKED(9), showing the state of lockmgr linked with the bufobj BUF_RECURSED() and BUF_ISLOCKED() entirely replace the usage of bogus BUF_REFCNT() in a more explicative and SMP-compliant way. This allows us to axe out BUF_REFCNT() and leave the function lockcount() totally unused in our stock kernel. Further commits will axe lockcount() as well as part of lockmgr() cleanup. The KPI is, obviously, broken, so further commits will update manpages and the FreeBSD version. Tested by: kris (on UFS and NFS)	2008-01-19 17:36:23 +00:00
Attilio Rao	22db15c06f	VOP_LOCK1() (and so VOP_LOCK()) and VOP_UNLOCK() are only used in conjunction with 'thread' argument passing which is always curthread. Remove the unneeded extra argument and explicitly pass curthread to lower layer functions, when necessary. The KPI is broken by this change, which should affect several ports, so a version bump and manpage update will be committed later. Tested by: kris, pho, Diego Sardina <siarodx at gmail dot com>	2008-01-13 14:44:15 +00:00
Attilio Rao	cb05b60a89	vn_lock() is currently only used with the 'curthread' passed as argument. Remove this argument and pass curthread directly to underlying VOP_LOCK1() VFS method. This modification makes the code cleaner and in particular removes an annoying dependency, helping the next lockmgr() cleanup. The KPI is, obviously, changed. Manpage and FreeBSD_version will be updated through further commits. As a side note, the next commits will address a similar cleanup of VFS methods, in particular vop_lock1 and vop_unlock. Tested by: Diego Sardina <siarodx at gmail dot com>, Andrea Di Pasquale <whyx dot it at gmail dot com>	2008-01-10 01:10:58 +00:00
Warner Losh	c94a7cac1f	Rather than not redirtying the bp when we get ENXIO, only redirty it when the error is EIO. This catches a much larger class of errors that are unlikely to succeed if retried. Submitted by: bde	2007-12-30 05:53:45 +00:00
Warner Losh	b27aa20e8d	A partial solution to some of the 'pull the umass device with a mounted FS' problems. These are more along the lines of 'avoiding an avoidable panic' than a complete solution to removable devices. We now close the barn door after the horse has gotten loose and has been hit by a truck, as it were. The barn no longer catches fire in this case, but the horse is still dead :-). The vfs_bio.c fix causes us not to put a failed write back into the dirty pool if the error returned was ENXIO. In that case, the buffer is treated like any other clean buffer that's being returned. ENXIO means the device isn't there anymore and will never be there again in the future, so retrying is futile. The vfs_mount.c fix treats 'ENXIO' as success for unmounting a file system. If the device is gone, retrying later won't help and we'll never be able to unmount the device. These two are part of a larger patch set submitted by the author. The other patches will be forthcoming. I added comments to these two patches. Submitted by: Henrik Gulbrandsen Reviewed by: phk@ PR: usb/46176 (partial)	2007-12-27 16:38:28 +00:00
Alan Cox	30418ed31c	Eliminate vfs_page_set_valid()'s unused argument.	2007-12-02 01:28:35 +00:00
Julian Elischer	3745c395ec	Rename the kthread_xxx (e.g. kthread_create()) calls to kproc_xxx as they actually make whole processes. This makes way for us to add REAL kthread_create() and friends that actually make threads. It turns out that most of these calls actually end up being moved back to the thread version when it's added, but we need to make this cosmetic change first. I'd LOVE to do this rename in 7.0 so that we can eventually MFC the new kthread_xxx() calls.	2007-10-20 23:23:23 +00:00
Ruslan Ermilov	718a600b20	Fix the description of the formula used to autosize the number of buffers in the buffer cache. Approved by: re (kensmith)	2007-09-26 11:22:23 +00:00
Alan Cox	7bfda801a8	Change the management of cached pages (PQ_CACHE) in two fundamental ways: (1) Cached pages are no longer kept in the object's resident page splay tree and memq. Instead, they are kept in a separate per-object splay tree of cached pages. However, access to this new per-object splay tree is synchronized by the _free_ page queues lock, not to be confused with the heavily contended page queues lock. Consequently, a cached page can be reclaimed by vm_page_alloc(9) without acquiring the object's lock or the page queues lock. This solves a problem independently reported by tegge@ and Isilon. Specifically, they observed the page daemon consuming a great deal of CPU time because of pages bouncing back and forth between the cache queue (PQ_CACHE) and the inactive queue (PQ_INACTIVE). The source of this problem turned out to be a deadlock avoidance strategy employed when selecting a cached page to reclaim in vm_page_select_cache(). However, the root cause was really that reclaiming a cached page required the acquisition of an object lock while the page queues lock was already held. Thus, this change addresses the problem at its root, by eliminating the need to acquire the object's lock. Moreover, keeping cached pages in the object's primary splay tree and memq was, in effect, optimizing for the uncommon case. Cached pages are reclaimed far, far more often than they are reactivated. Instead, this change makes reclamation cheaper, especially in terms of synchronization overhead, and reactivation more expensive, because reactivated pages will have to be reentered into the object's primary splay tree and memq. (2) Cached pages are now stored alongside free pages in the physical memory allocator's buddy queues, increasing the likelihood that large allocations of contiguous physical memory (i.e., superpages) will succeed. Finally, as a result of this change long-standing restrictions on when and where a cached page can be reclaimed and returned by vm_page_alloc(9) are eliminated. Specifically, calls to vm_page_alloc(9) specifying VM_ALLOC_INTERRUPT can now reclaim and return a formerly cached page. Consequently, a call to malloc(9) specifying M_NOWAIT is less likely to fail. Discussed with: many over the course of the summer, including jeff@, Justin Husted @ Isilon, peter@, tegge@ Tested by: an earlier version by kris@ Approved by: re (kensmith)	2007-09-25 06:25:06 +00:00
Marcel Moolenaar	55b5660de4	Work around an integer overflow in expression `3 * maxbufspace / 4', when maxbufspace is larger than INT_MAX / 3. The overflow causes a hard hang on ia64 when physical memory is sufficiently large (8GB).	2007-06-09 23:41:14 +00:00
Xin LI	7b8c8b858c	In getblk(), before gbincore(), use BO_LOCK directly when locking the bufobj, rather than using VI_LOCK, like what was done with revision 1.453.	2007-06-08 07:05:08 +00:00
Jeff Roberson	1c4bcd050a	- Move rusage from being per-process in struct pstats to per-thread in td_ru. This removes the requirement for per-process synchronization in statclock() and mi_switch(). This was previously supported by sched_lock which is going away. All modifications to rusage are now done in the context of the owning thread. Reads proceed without locks. - Aggregate exiting threads' rusage in thread_exit() such that the exiting thread's rusage is not lost. - Provide a new routine, rufetch() to fetch an aggregate of all rusage structures from all threads in a process. This routine must be used in any place requiring a rusage from a process prior to its exit. The exited process's rusage is still available via p_ru. - Aggregate tick statistics only on demand via rufetch() or when a thread exits. Tick statistics are kept in the thread and protected by sched_lock until it exits. Initial patch by: attilio Reviewed by: attilio, bde (some objections), arch (mostly silent)	2007-06-01 01:12:45 +00:00
Attilio Rao	2feb50bf7d	Revert the introduction of VMCNT_* operations. Probably a general approach is not the best solution here, so we should solve the sched_lock protection problems separately. Requested by: alc Approved by: jeff (mentor)	2007-05-31 22:52:15 +00:00
Jeff Roberson	222d01951f	- define and use VMCNT_{GET,SET,ADD,SUB,PTR} macros for manipulating vmcnts. This can be used to abstract away pcpu details but also changes to use atomics for all counters now. This means sched lock is no longer responsible for protecting counts in the switch routines. Contributed by: Attilio Rao <attilio@FreeBSD.org>	2007-05-18 07:10:50 +00:00
Konstantin Belousov	8e68f804a7	Disable nesting of BOP_BDFLUSH(). VOP_FSYNC() call in bdwrite() could result in bdwrite() being reentered, thus causing infinite recursion. Reported and tested by: Peter Holm Reviewed by: tegge MFC after: 2 weeks	2007-04-24 10:59:21 +00:00
Wojciech A. Koszek	2404c938e6	vm_map_delete should be used only internally, by the VM subsystem. Replace it with vm_map_remove, which not only embeds additional check, but also takes care of locking. Reviewed by: alc Approved by: alc, cognet (mentor)	2007-03-29 13:26:13 +00:00
Kris Kennaway	0c9c08dd9c	Correct a comment typo	2007-03-25 10:07:23 +00:00
Julian Elischer	486a941418	Instead of doing comparisons using the pcpu area to see if a thread is an idle thread, just see if it has the IDLETD flag set. That flag will probably move to the pflags word as it's permanent and never changes for the life of the system so it doesn't need locking.	2007-03-08 06:44:34 +00:00
Xin LI	74f094f6a4	Use LIST_EMPTY() instead of unrolled version (LIST_FIRST() [!=]= NULL)	2007-02-22 14:52:59 +00:00
Konstantin Belousov	2cc7d26f7f	Cylinder group bitmaps and blocks containing inode for a snapshot file are after snaplock, while other ffs device buffers are before snaplock in global lock order. By itself, this could cause deadlock when bdwrite() tries to flush dirty buffers on snapshotted ffs. If, during the flush, COW activity for snapshot needs to allocate block and ffs_alloccg() selects the cylinder group that is being written by bdwrite(), then kernel would panic due to recursive buffer lock acquisition. Avoid dealing with buffers in bdwrite() that are from the other side of the snaplock divisor in the lock order than the buffer being written. Add new BOP, bop_bdwrite(), to do dirty buffer flushing for same vnode in the bdwrite(). Default implementation, bufbdflush(), refactors the code from bdwrite(). For ffs device buffers, specialized implementation is used. Reviewed by: tegge, jeff, Russell Cattelan (cattelan xfs org, xfs changes) Tested by: Peter Holm X-MFC after: 3 weeks (if ever: it changes ABI)	2007-01-23 10:01:19 +00:00
Konstantin Belousov	cc570216bb	In rev. 1.514, iodone on async buffer may happen before code checks the vnode v_flag. For cluster buffers this would result in dereferencing NULL b_vp. To prevent the panic, cache relevant vnode flag before calling bstrategy. Reported by: Peter Holm, kris Tested by: Peter Holm Reviewed by: tegge Pointy hat to: kib	2006-12-20 09:22:31 +00:00
Konstantin Belousov	3b7b5496a7	Resolve two deadlocks that could be caused by busy md device backed by vnode. Allow for md thread and the thread that owns lock on vnode backing the md device to do the write even when runningbufspace is exhausted. Tested by: Peter Holm Reviewed by: tegge MFC after: 2 weeks	2006-12-14 11:34:07 +00:00
Alan Cox	0c2b04b419	Refactor vfs_setdirty(), creating vfs_setdirty_locked_object(). Call vfs_setdirty_locked_object() from vfs_busy_pages() instead of vfs_setdirty(), thereby eliminating a second acquisition and release of the same vm object lock.	2006-10-29 00:04:39 +00:00
Alan Cox	20ed1b5b1b	In bufdone_finish() restrict the acquisition and release of the page queues lock to BIO_READ operations. Recent changes to the implementation of the per-page flags have eliminated the need for the page queues lock in the other cases.	2006-10-28 19:16:57 +00:00
Alan Cox	9af80719db	Replace PG_BUSY with VPO_BUSY. In other words, changes to the page's busy flag, i.e., VPO_BUSY, are now synchronized by the per-vm object lock instead of the global page queues lock.	2006-10-22 04:28:14 +00:00
Tor Egge	04aa807cb6	If the buffer lock has waiters after the buffer has changed identity then getnewbuf() needs to drop the buffer in order to wake waiters that might sleep on the buffer in the context of the old identity.	2006-10-02 02:06:27 +00:00
Alan Cox	5786be7cc7	Introduce a field to struct vm_page for storing flags that are synchronized by the lock on the object containing the page. Transition PG_WANTED and PG_SWAPINPROG to use the new field, eliminating the need for holding the page queues lock when setting or clearing these flags. Rename PG_WANTED and PG_SWAPINPROG to VPO_WANTED and VPO_SWAPINPROG, respectively. Eliminate the assertion that the page queues lock is held in vm_page_io_finish(). Eliminate the acquisition and release of the page queues lock around calls to vm_page_io_finish() in kern_sendfile() and vfs_unbusy_pages().	2006-08-09 17:43:27 +00:00
Alan Cox	ab83ac429d	Reduce the scope of the page queues lock in vfs_busy_pages() now that vm_page_sleep_if_busy() no longer requires the caller to hold the page queues lock.	2006-08-08 06:00:49 +00:00
Alan Cox	af51d7bf57	Eliminate OBJ_WRITEABLE. It hasn't been used in a long time.	2006-07-21 06:40:29 +00:00
Jeff Roberson	4b24e4210e	- Properly check against B_DELWRI and B_NEEDSGIANT. This check was incorrectly written and caused some !NEEDSGIANT buffers to be put in the NEEDSGIANT queue. Sponsored by: Isilon Systems, Inc.	2006-04-04 06:44:21 +00:00
Jeff Roberson	084d64ac21	- Add the B_NEEDSGIANT flag which is only set if the vnode that owns a buf requires Giant. It is set in bgetvp and cleared in brelvp. - Create QUEUE_DIRTY_GIANT for dirty buffers that require giant. - In the buf daemon, only grab giant when processing QUEUE_DIRTY_GIANT and only if we think there are buffers in that queue. Sponsored by: Isilon Systems, Inc.	2006-03-31 02:56:30 +00:00
Pawel Jakub Dawidek	96c0381f5c	Destroy "bip" bio in error case. Found by: Coverity Prevent analysis tool Coverity ID: 795 MFC after: 3 days	2006-03-22 00:42:41 +00:00
Tor Egge	c78226329a	For low memory situations, non-VMIO buffers didn't release pages back to the system when brelse() was called with B_RELBUF set on the buffer. This could be a problem when the system was low on memory, had many buffers on QUEUE_EMPTYKVA and started to traverse directories. For each getnewbuf(), pages were allocated from the system, driving the free reserve downwards. For each brelse(), the system put the buffer on QUEUE_CLEAN, with B_INVAL set. This commit changes the semantics of B_RELBUF to also free pages from non-VMIO buffers. Reviewed by: alc	2006-02-02 21:37:39 +00:00
Alan Cox	bb53e2bf27	Remove an unnecessary call to pmap_remove_all(). The given page is not mapped because its contents are invalid. Reviewed by: tegge	2006-01-23 00:00:45 +00:00
Tor Egge	dffaf91aa3	Set flag in needsbuffer while still holding bqlock to avoid lost wakeup.	2006-01-16 22:09:47 +00:00
Alexander Leidinger	ef39c05baa	MI changes: - provide an interface (macros) to the page coloring part of the VM system, this allows to try different coloring algorithms without the need to touch every file [1] - make the page queue tuning values readable: sysctl vm.stats.pagequeue - autotuning of the page coloring values based upon the cache size instead of options in the kernel config (disabling of the page coloring as a kernel option is still possible) MD changes: - detection of the cache size: only IA32 and AMD64 (untested) contains cache size detection code, every other arch just comes with a dummy function (this results in the use of default values like it was the case without the autotuning of the page coloring) - print some more info on Intel CPU's (like we do on AMD and Transmeta CPU's) Note to AMD owners (IA32 and AMD64): please run "sysctl vm.stats.pagequeue" and report if the cache* values are zero (= bug in the cache detection code) or not. Based upon work by: Chad David <davidc@acns.ab.ca> [1] Reviewed by: alc, arch (in 2004) Discussed with: alc, Chad David, arch (in 2004)	2005-12-31 14:39:20 +00:00
Craig Rodrigues	6951bea6c8	Changes imported from XFS for FreeBSD project: - add fields to struct buf (needed by XFS) - 3 private fields: b_fsprivate1, b_fsprivate2, b_fsprivate3 - b_pin_count, count of pinned buffer - add new B_MANAGED flag - add breada() function to initiate asynchronous I/O on read-ahead blocks. - add bufdone_finish(), bpin(), bunpin_wait() functions Patches provided by: kan Reviewed by: phk Silence on: arch@	2005-12-07 03:39:08 +00:00
Robert Watson	5bb84bc84b	Normalize a significant number of kernel malloc type names: - Prefer '_' to ' ', as it results in more easily parsed results in memory monitoring tools such as vmstat. - Remove punctuation that is incompatible with using memory type names as file names, such as '/' characters. - Disambiguate some collisions by adding subsystem prefixes to some memory types. - Generally prefer lower case to upper case. - If the same type is defined in multiple architecture directories, attempt to use the same name in additional cases. Not all instances were caught in this change, so more work is required to finish this conversion. Similar changes are required for UMA zone names.	2005-10-31 15:41:29 +00:00
Tor Egge	8272da3106	Release clean buffer with wrong size and no dependencies also for non-VMIO case.	2005-10-09 22:41:25 +00:00
Don Lewis	bd3c2d867d	Un-staticize waitrunningbufspace() and call it before returning from ffs_copyonwrite() if any async writes were launched. Restore the thread's previous TDP_NORUNNINGBUF state before returning from ffs_copyonwrite().	2005-09-30 18:07:41 +00:00
Don Lewis	6c8b634f1d	Un-staticize runningbufwakeup() and staticize updateproc. Add a new private thread flag to indicate that the thread should not sleep if runningbufspace is too large. Set this flag on the bufdaemon and syncer threads so that they skip the waitrunningbufspace() call in bufwrite() rather than checking the proc pointer vs. the known proc pointers for these two threads. A way of preventing these threads from being starved for I/O but still placing limits on their outstanding I/O would be desirable. Set this flag in ffs_copyonwrite() to prevent bufwrite() calls from blocking on the runningbufspace check while holding snaplk. This prevents snaplk from being held for an arbitrarily long period of time if runningbufspace is high and greatly reduces the contention for snaplk. The disadvantage is that ffs_copyonwrite() can start a large amount of I/O if there are a large number of snapshots, which could cause a deadlock in other parts of the code. Call runningbufwakeup() in ffs_copyonwrite() to decrement runningbufspace before attempting to grab snaplk so that I/O requests waiting on snaplk are not counted in runningbufspace as being in-progress. Increment runningbufspace again before actually launching the original I/O request. Prior to the above two changes, the system could deadlock if enough I/O requests were blocked by snaplk to prevent runningbufspace from falling below lorunningspace and one of the bawrite() calls in ffs_copyonwrite() blocked in waitrunningbufspace() while holding snaplk. See <http://www.holm.cc/stress/log/cons143.html>	2005-09-30 01:30:01 +00:00
Peter Edwards	d41c4674c2	Close a race in biodone(), whereby the bio_done field of the passed bio may have been freed and reassigned by the wakeup before being tested after releasing the bdonelock. There's a non-zero chance this is the cause of a few of the crashes knocking around with biodone() sitting in the stack backtrace. Reviewed By: phk@	2005-09-29 10:37:20 +00:00
Jeff Roberson	9e2aaec1e3	- Use lockmgr_printinfo rather than rolling our own. This introduces a slight problem by using printf instead of db_printf however 'show lockedvnods' does the same so I believe it is ok for now.	2005-08-03 05:02:08 +00:00
Alan Cox	ec9c9e7363	Eliminate inconsistency in the setting of the B_DONE flag. Specifically, make the b_iodone callback responsible for setting it if it is needed. Previously, it was set unconditionally by bufdone() without holding whichever lock is shared by the b_iodone callback and the corresponding top-half function. Consequently, in a race, the top-half function could conclude that operation was done before the b_iodone callback finished. See, for example, aio_physwakeup() and aio_fphysio(). Note: I don't believe that the other, more widely-used b_iodone callbacks are affected. Discussed with: jeff Reviewed by: phk MFC after: 2 weeks	2005-07-20 19:06:06 +00:00
Jeff Roberson	7a06fe49dc	- Add and enhance asserts related to the wrong bufobj panic. Sponsored by: Isilon Systems, Inc. Approved by: re (blanket vfs)	2005-06-14 20:32:27 +00:00
Jeff Roberson	748c92fbad	- Split one KASSERT in bremfree() into two to aid in debugging. Sponsored by: Isilon Systems, Inc.	2005-06-13 00:45:05 +00:00
Brian Feldman	cc3149b1ea	Fix a serious deadlock with the NFS client. Given a large enough atomic write request, it can fill the buffer cache with the entirety of that write in order to handle retries. However, it never drops the vnode lock, or else it wouldn't be atomic, so it ends up waiting indefinitely for more buf memory that cannot be gotten as it has it all, and it waits in an uncancellable state. To fix this, hibufspace is exported and scaled to a reasonable fraction. This is used as the limit of how much of an atomic write request by the NFS client will be handled asynchronously. If the request is larger than this, it will be turned into a synchronous request which won't deadlock the system. It's possible this value is far off from what is required by some, so it shall be tunable as soon as mount_nfs(8) learns of the new field. The slowdown between an asynchronous and a synchronous write on NFS appears to be on the order of 2x-4x. General nod by: gad MFC after: 2 weeks More testing: wes PR: kern/79208	2005-06-10 23:50:41 +00:00
Jeff Roberson	a3d239bc29	- My sub-par public school education has been exposed. s/sentinal/sentinel/ Noticed by: Emil Mikulic	2005-06-09 04:40:20 +00:00
Jeff Roberson	9e879a5ee0	- Under heavy IO load the buf daemon can run for many hundreds of milliseconds due to what is essentially n^2 algorithmic complexity. This change makes the algorithm N*2 instead. This heavy processing manifested itself as skipping in audio and video playback due to the long scheduling latencies and contention on giant by pcm. - flushbufqueues() is now responsible for flushing multiple buffers rather than one at a time. This allows us to save our progress in the list by using a sentinal. We must do the numdirtywakeup() and waitrunningbufspace() here now rather than in buf_daemon(). - Also add a uio_yield() after we have processed the list once for bufs without deps and again for bufs with deps. This is to release Giant and allow any other giant locked code to proceed. Tested by: Many users on current@ Revealed by: schedgraph traces sent by Emil Mikulic & Anthony Ginepro	2005-06-08 20:26:05 +00:00
Jeff Roberson	1f22a07afd	- Add bufobj_wrefl() to add a write ref to a bufobj that is already locked.	2005-05-30 07:01:18 +00:00
Jeff Roberson	4a723b3604	- Remove long dead splbio() calls and comments relating to the old synchronization mechanism.	2005-04-30 12:18:50 +00:00
Jeff Roberson	ba4f7c7023	- Don't acquire Giant before calling b_biodone, individual consumers are now required to do so themselves. Sponsored by: Isilon Systems, Inc.	2005-04-30 11:44:22 +00:00
Jeff Roberson	0d12524bbf	- Add two KASSERTs to prevent us from recycling a buf that is still on a bufobj list. Sponsored by: Isilon Systems, Inc.	2005-04-22 00:53:20 +00:00
Jeff Roberson	6c759f3558	- Add information about the buf lock to db_show_buffer. - Add a 'show lockedbufs' command that is similar to show lockedvnods. Sponsored by: Isilon Systems, Inc.	2005-03-25 00:20:37 +00:00
Jeff Roberson	ec346d1040	- Lock access to the buffer_map with the vm_map lock. In 4.x this was done with splbio, in 5.x this was done with Giant. Discussed with: alc Reported by: julian, pho	2005-03-08 09:34:54 +00:00
Poul-Henning Kamp	1ba212823f	Make various vnode related functions static	2005-02-10 12:28:58 +00:00
Jeff Roberson	5c18d18b1d	- Add more information to the getnewbuf() recycling KTR. Sponsored by: Isilon Systems, Inc.	2005-02-10 02:22:56 +00:00
Jeff Roberson	b56dc9a785	- Remove an invalid KASSERT added in recent background write reshuffling. Sponsored by: Isilon Systems, Inc.	2005-02-08 23:25:08 +00:00
Poul-Henning Kamp	dd19a799b8	Background writes are entirely an FFS/Softupdates thing. Give FFS vnodes a specific bufwrite method which contains all the background write stuff and then calls into the default bufwrite() for the rest of the job. Remove all the background write related stuff from the normal bufwrite. This drags the softdep_move_dependencies() back into FFS. Long term, it is worth looking at simply copying the data into allocated memory and issuing the bio directly and not create the "shadow buf" in the first place (just like copy-on-write is done in snapshots for instance). I don't think we really gain anything but complexity from doing this with a buf.	2005-02-08 20:29:10 +00:00
Jeff Roberson	8364446643	- Don't release BKGRDINPROG until after we've bufdone'd the copy. Sponsored by: Isilon Systems, Inc.	2005-02-05 01:26:14 +00:00
Jeff Roberson	bd8d684fd7	- Don't drop the wref on the bufobj until after bufdone() has completed. Without this, threads waiting in bufobj_wwait() may wakeup prior to bufdone() completing. Sponsored by: Isilon Systems, Inc.	2005-01-28 17:48:58 +00:00
Poul-Henning Kamp	8516dd18e1	Don't use VOP_GETVOBJECT, use vp->v_object directly.	2005-01-25 00:40:01 +00:00
Poul-Henning Kamp	35764be39e	Kill the VV_OBJBUF and test the v_object for NULL instead.	2005-01-24 13:13:57 +00:00
Jeff Roberson	71ddd673b1	- Add CTR calls to trace the lifecycle of a buffer. - Remove some KASSERTs which are invalid if the appropriate lock is not held. - Slightly restructure bremfree() so that it is more sane. - Change the flush code in bdwrite() to avoid acquiring a mutex whenever possible. - Change the flush code in bdwrite() to avoid holding the bufobj mutex while calling buf_countdeps(). This introduces a lock-order relationship with the softdep lock that can not otherwise be resolved. - Don't set B_DONE until bufdone() is complete, otherwise another processor may believe the buf is done before it is. - Only acquire Giant if the caller has set b_iodone. Don't grab giant around normal bufdone() calls. Sponsored By: Isilon Systems, Inc.	2005-01-24 10:47:04 +00:00
Poul-Henning Kamp	6ef8480a88	Add BO_SYNC() and add a default which uses the secret vnode pointer and VOP_FSYNC() for now.	2005-01-11 10:43:08 +00:00
Poul-Henning Kamp	8df6bac4c7	Remove the unused credential argument from VOP_FSYNC() and VFS_SYNC(). I'm not sure why a credential was added to these in the first place, it is not used anywhere and it doesn't make much sense: The credentials for syncing a file (ability to write to the file) should be checked at the system call level. Credentials for syncing one or more filesystems ("none") should be checked at the system call level as well. If the filesystem implementation needs a particular credential to carry out the syncing it would logically have to use the cached mount credential, or a credential cached along with any delayed write data. Discussed with: rwatson	2005-01-11 07:36:22 +00:00
Jeff Roberson	b646893f0f	- Eliminate the acquisition and release of the bqlock in bremfree() by setting the B_REMFREE flag in the buf. This is done to prevent lock order reversals with code that must call bremfree() with a local lock held. This also reduces overhead by removing two lock operations per buf for fsync() and similar. - Check for the B_REMFREE flag in brelse() and bqrelse() after the bqlock has been acquired so that we may remove ourself from the free-list. - Provide a bremfreef() function to immediately remove a buf from a free-list for use only by NFS. This is done because the nfsclient code overloads the b_freelist queue for its own async. io queue. - Simplify the numfreebuffers accounting by removing a switch statement that executed the same code in every possible case. - getnewbuf() can encounter locked bufs on free-lists once Giant is removed. Remove a panic associated with this condition and delay asserts that inspect the buf until after it is locked. Reviewed by: phk Sponsored by: Isilon Systems, Inc.	2004-11-18 08:44:09 +00:00
Poul-Henning Kamp	6e67e2a710	Retire b_magic now, we have the bufobj containing the same hint.	2004-11-04 09:48:18 +00:00
Poul-Henning Kamp	9f7a3028d5	Change buf->b_object to buf->b_bufobj->bo_object some whitespace fixes.	2004-11-04 09:06:54 +00:00
Poul-Henning Kamp	9bc4d9a495	whitespace	2004-11-04 08:25:52 +00:00
Poul-Henning Kamp	c569065139	Remove buf->b_dev field.	2004-11-04 07:59:57 +00:00
Alan Cox	d19ef81437	The synchronization provided by vm object locking has eliminated the need for most calls to vm_page_busy(). Specifically, most calls to vm_page_busy() occur immediately prior to a call to vm_page_remove(). In such cases, the containing vm object is locked across both calls. Consequently, the setting of the vm page's PG_BUSY flag is not even visible to other threads that are following the synchronization protocol. This change (1) eliminates the calls to vm_page_busy() that immediately precede a call to vm_page_remove() or functions, such as vm_page_free() and vm_page_rename(), that call it and (2) relaxes the requirement in vm_page_remove() that the vm page's PG_BUSY flag is set. Now, the vm page's PG_BUSY flag is set only when the vm object lock is released while the vm page is still in transition. Typically, this is when it is undergoing I/O.	2004-11-03 20:17:31 +00:00
Poul-Henning Kamp	0cbda9dfd5	Remove the last call in the system to VOP_SPECSTRATEGY(): We can no longer come through the VNODE layer to the disks since all the filesystems now go via geom_vfs to GEOM.	2004-10-29 10:52:31 +00:00
Poul-Henning Kamp	6afb3b1c37	Give dev_strategy() an explicit cdev argument in preparation for removing buf->b_dev. Put a bio between the buf passed to dev_strategy() and the device driver strategy routine in order to not clobber fields in the buf. Assert copyright on vfs_bio.c and update copyright message to canonical text. There is no legal difference between John Dyson's two-clause abbreviated BSD license and the canonical text.	2004-10-29 07:16:37 +00:00
Poul-Henning Kamp	c5995e45eb	Lock bp->b_bufobj->b_object instead of bp->b_object	2004-10-28 08:38:46 +00:00
Poul-Henning Kamp	6e77a04170	The island council met and voted buf_prewrite() home. Give ffs its own bufobj->bo_ops vector and create a private strategy routine, (currently misnamed for forwards compatibility), which is just a copy of the generic bufstrategy routine except we call softdep_disk_prewrite() directly instead of through the buf_prewrite() indirection. Teach UFS about the need for softdep_disk_prewrite() and call the function directly in FFS. Remove buf_prewrite() from the default bufstrategy() and from the global bio_ops method vector.	2004-10-26 10:44:10 +00:00
Poul-Henning Kamp	5d9d81e7ea	Put the I/O block size in bufobj->bo_bsize. We keep si_bsize_phys around for now as that is the simplest way to pull the number out of disk device drivers in devfs_open(). The correct solution would be to do an ioctl(DIOCGSECTORSIZE), but the point is probably moot when filesystems sit on GEOM, so don't bother for now.	2004-10-26 07:39:12 +00:00
Alan Cox	cd9c0da805	Hold the lock on the containing vm object when calling vm_page_sleep_if_busy().	2004-10-26 06:58:26 +00:00
Poul-Henning Kamp	ee1d0eb330	Remove vnode->v_bsize. This was a dead-end.	2004-10-25 07:50:59 +00:00
Alan Cox	a50b705403	Use VM_ALLOC_NOBUSY to eliminate vm_page_wakeup() calls and the acquisition and release of the global page queues lock required to make the call. Remove GIANT_REQUIRED from vm_hold_free_pages(). All of its VM operations are properly synchronized.	2004-10-25 06:34:14 +00:00
Poul-Henning Kamp	4dcd0ac4cf	Collapse vnode->v_object and buf->b_object into bufobj->bo_object.	2004-10-25 06:02:57 +00:00
Poul-Henning Kamp	b792bebeea	Move the buffer method vector (buf->b_op) to the bufobj. Extend it with a strategy method. Add bufstrategy() which does the usual VOP_SPECSTRATEGY/VOP_STRATEGY song and dance. Rename ibwrite to bufwrite(). Move the two NFS buf_ops to more sensible places, add bufstrategy to them. Add inlines for bwrite() and bstrategy() which call through buf->b_bufobj->b_ops->b_{write,strategy}(). Replace almost all VOP_STRATEGY()/VOP_SPECSTRATEGY() calls with bstrategy().	2004-10-24 20:03:41 +00:00
Poul-Henning Kamp	494eb176e7	Add b_bufobj to struct buf which eventually will eliminate the need for b_vp. Initialize b_bufobj for all buffers. Make incore() and gbincore() take a bufobj instead of a vnode. Make inmem() local to vfs_bio.c Change a lot of VI_[UN]LOCK(bp->b_vp) to BO_[UN]LOCK(bp->b_bufobj) also VI_MTX() to BO_MTX(), Make buf_vlist_add() take a bufobj instead of a vnode. Eliminate other uses of bp->b_vp where bp->b_bufobj will do. Various minor polishing: remove "register", turn panic into KASSERT, use new function declarations, TAILQ_FOREACH_SAFE() etc.	2004-10-22 08:47:20 +00:00
Poul-Henning Kamp	a76d8f4ec9	Move the VI_BWAIT flag into no bo_flag element of bufobj and call it BO_WWAIT Add bufobj_wref(), bufobj_wdrop() and bufobj_wwait() to handle the write count on a bufobj. Bufobj_wdrop() replaces vwakeup(). Use these functions all relevant places except in ffs_softdep.c where the use of interlocked_sleep() makes this impossible. Rename b_vnbufs to b_bobufs now that we touch all the relevant files anyway.	2004-10-21 15:53:54 +00:00
Poul-Henning Kamp	6230ce6aa9	use dev_re[fl]thread() rather than home rolled versions.	2004-09-24 05:55:03 +00:00
Poul-Henning Kamp	1a52a73d68	Eliminate DEV_STRATEGY() macro: call dev_strategy() directly. Make dev_strategy() handle errors and departing devices properly.	2004-09-23 14:45:04 +00:00
Poul-Henning Kamp	a0e78d2eb0	Do not refcount the cdevsw, but rather maintain a cdev->si_threadcount of the number of threads which are inside whatever is behind the cdevsw for this particular cdev. Make the device mutex visible through dev_lock() and dev_unlock(). We may want finer granularity later. Replace spechash_mtx use with dev_lock()/dev_unlock().	2004-09-23 07:17:41 +00:00
Poul-Henning Kamp	08dbd671ff	Remove unused B_WRITEINPROG flag	2004-09-15 21:49:22 +00:00
Poul-Henning Kamp	4095f485c8	undent some functions a bit.	2004-09-15 21:08:58 +00:00
Poul-Henning Kamp	ab19cad78e	stylistic polishing.	2004-09-15 20:54:23 +00:00
Poul-Henning Kamp	883d3c0c07	Remove the buffercache/vnode side of BIO_DELETE processing in preparation for integration of p4::phk_bufwork. In the future, local filesystems will talk to GEOM directly and they will consequently be able to issue BIO_DELETE directly. Since the removal of the fla driver, BIO_DELETE has effectively been a no-op anyway.	2004-09-13 06:50:42 +00:00
Poul-Henning Kamp	cf95b5c381	Eliminate unused second argument to reassignbuf() and simplify it accordingly.	2004-07-25 21:24:23 +00:00
Poul-Henning Kamp	a3d57cfbfd	Neuter this warning for now, I think I know the remaining issues.	2004-07-25 08:09:21 +00:00
Alan Cox	d8582da660	Remove GIANT_REQUIRED from vmapbuf().	2004-07-18 04:57:49 +00:00
Peter Edwards	0f01586867	Fix bug introduced in rev 1.434: When avoiding the zeroing of "bogus_page" when it appears in a buf, be sure to advance the pointers into the data for successive pages. The bug caused file corruption when read(2)ing from a "hole" in a file where a previous page of the read block had already been faulted in: fsx tripped up on this pretty quickly. The particular access pattern is probably pretty unusual, so other applications probably wouldn't have had problems, but you'd never know. Reviewed By: alc@	2004-07-06 23:40:40 +00:00
Poul-Henning Kamp	7f6599fec6	Make the last commit handle non-phk root devices better.	2004-07-04 19:42:25 +00:00
Stefan Farfeleder	5908d366fb	Consistently use __inline instead of __inline__ as the former is an empty macro in <sys/cdefs.h> for compilers without support for inline.	2004-07-04 16:11:03 +00:00
Poul-Henning Kamp	1cbb1e02c4	Blocksize for I/O should be a property of the vnode and not found by groping around in the vnodes surroundings when we allocate a block. Assign a blocksize when we create a vnode, and yell a warning (and ignore it) if we got the wrong size. Please email all such warnings to me.	2004-07-04 12:49:04 +00:00
Poul-Henning Kamp	cfa5e80af8	Remove stale comment	2004-07-03 19:37:06 +00:00
Poul-Henning Kamp	f3732fd15b	Second half of the dev_t cleanup. The big lines are: NODEV -> NULL NOUDEV -> NODEV udev_t -> dev_t udev2dev() -> findcdev() Various minor adjustments including handling of userland access to kernel space struct cdev etc.	2004-06-17 17:16:53 +00:00
Poul-Henning Kamp	89c9c53da0	Do the dreaded s/dev_t/struct cdev */ Bump __FreeBSD_version accordingly.	2004-06-16 09:47:26 +00:00
Alan Cox	ec1100fc6e	Avoid pointless zeroing of the bogus page in vfs_bio_clrbuf(). Suggested by: tegge@ (from October of last year)	2004-05-08 06:46:40 +00:00
Alan Cox	5a32489377	Make vm_page's PG_ZERO flag immutable between the time of the page's allocation and deallocation. This flag's principal use is shortly after allocation. For such cases, clearing the flag is pointless. The only unusual use of PG_ZERO is in vfs_bio_clrbuf(). However, allocbuf() never requests a prezeroed page. So, vfs_bio_clrbuf() never sees a prezeroed page. Reviewed by: tegge@	2004-05-06 05:03:23 +00:00
Dag-Erling Smørgrav	30a058027a	Replace a manual check of a VMIO candidate with vn_canvmio(). This silences an annoying warning in getblk() when VMIO'ing on a directory vnode, which can happen when vfs.vmiodirenable is 1. Bring the warning message in line with reality at the same time. Submitted by: hmp	2004-03-12 12:02:12 +00:00
Poul-Henning Kamp	ceb58ca58f	When I was a kid my work table was one cluttered mess and cleaning it up was a rather overwhelming task. I soon learned that if you don't know where you're going to store something, at least try to pile it next to something slightly related in the hope that a pattern emerges. Apply the same principle to the ffs/snapshot/softupdates code which has leaked into specfs: Add yet another buf-quasi-method and call it from the only two places I can see it can make a difference and implement the magic in ffs_softdep.c where it belongs. It's not pretty, but at least it's one less layer violated.	2004-03-11 18:50:33 +00:00
Poul-Henning Kamp	4d453ef101	Properly vector all bwrite() and BUF_WRITE() calls through the same path and s/BUF_WRITE()/bwrite()/ since it now does the same as bwrite().	2004-03-11 18:02:36 +00:00
Alan Cox	3eba15c12e	Remove GIANT_REQUIRED from vunmapbuf().	2004-03-07 00:37:18 +00:00
Poul-Henning Kamp	cd690b60de	Device megapatch 6/6: This is what we came here for: Hang dev_t's from their cdevsw, refcount cdevsw and dev_t and generally keep track of things a lot better than we used to: Hold a cdevsw reference around all entrances into the device driver, this will be necessary to safely determine when we can unload driver code. Hold a dev_t reference while the device is open. KASSERT that we do not enter the driver on a non-referenced dev_t. Remove old D_NAG code, anonymous dev_t's are not a problem now. When destroy_dev() is called on a referenced dev_t, move it to dead_cdevsw's list. When the refcount drops, free it. Check that cdevsw->d_version is correct. If not, set all methods to the dead_*() methods to prevent entrance into driver. Print warning on console to this effect. The device driver may still explode if it is also incompatible with newbus, but in that case we probably didn't get this far in the first place.	2004-02-21 21:57:26 +00:00
Alan Cox	c5aebf380c	swp_pager_async_iodone() no longer requires Giant. Modify bufdone() and swapgeom_done() to perform swp_pager_async_iodone() without Giant. Reviewed by: tegge	2004-02-07 08:54:50 +00:00
Alan Cox	96a7b42213	Remove a variable that has been initialized but otherwise unused since revision 1.315.	2003-12-20 19:46:21 +00:00
Poul-Henning Kamp	00cbe31bd8	Send B_PHYS out to pasture, it no longer serves any function.	2003-11-15 09:28:09 +00:00
Alan Cox	28c9416429	- Remove the remaining now unnecessary checks for the buf's b_object being NULL. See revision 1.421 for more detail. - Remove GIANT_REQUIRED from vfs_unbusy_pages(). Discussed with: jeff	2003-11-15 08:45:36 +00:00
Poul-Henning Kamp	1415a09d42	Replace B_PHYS conditional assignment to bio_offset with KASSERT check to see that the originating code already did it right.	2003-11-12 10:27:06 +00:00
Kirk McKusick	fde81c7d8e	Update the statfs structure with 64-bit fields to allow accurate reporting of multi-terabyte filesystem sizes. You should build and boot a new kernel BEFORE doing a `make world' as the new kernel will know about binaries using the old statfs structure, but an old kernel will not know about the new system calls that support the new statfs structure. Running an old kernel after a `make world' will cause programs such as `df' that do a statfs system call to fail with a bad system call. Reviewed by: Bruce Evans <bde@zeta.org.au> Reviewed by: Tim Robbins <tjr@freebsd.org> Reviewed by: Julian Elischer <julian@elischer.org> Reviewed by: the hordes of <arch@freebsd.org> Sponsored by: DARPA & NAI Labs.	2003-11-12 08:01:40 +00:00
Alan Cox	e35e0182c3	- Revision 1.469 of vfs_subr.c resulted in the buf's b_object field being consistently initialized. Consequently, a number of conditionals that checked the validity of b_object before passing it to VM_OBJECT_LOCK() and VM_OBJECT_UNLOCK() are no longer needed.	2003-11-11 04:45:37 +00:00
Kirk McKusick	15a93fcc31	Allow the bufdaemon and update daemon processes to skip the waitrunningbufspace() calls so that they are always able to proceed and clean up buffer space. Submitted by: Brian Fundakowski Feldman <green@freebsd.org>	2003-11-04 06:30:00 +00:00
John Baldwin	787f162df6	Move the P_COWINPROGRESS flag from being a per-process p_flag to being a per-thread td_pflag which doesn't require any locks to read or write as it is only read or written by curthread on itself. Glanced at by: mckusick	2003-10-23 21:14:08 +00:00
Poul-Henning Kamp	68b00bf648	Remove KASSERTS on B_PHYS for vmapbuf() and vunmapbuf(), B_PHYS is going away.	2003-10-21 06:53:10 +00:00
Alan Cox	48ae2dddac	- Add vm object locking to vfs_clean_pages() and vfs_bio_set_validclean(). This is to synchronize access to the vm page's valid field by vm_page_set_validclean().	2003-10-19 20:39:06 +00:00
Poul-Henning Kamp	2d6a9d0747	Initialize b_iooffset before calling VOP_[SPEC]STRATEGY	2003-10-18 19:49:46 +00:00
Poul-Henning Kamp	0efedd8864	Don't report b_pblkno, it is going away.	2003-10-18 17:59:02 +00:00
Poul-Henning Kamp	583b92e328	Convert some if(bla) panic("foo") to KASSERTS to improve grep-ability.	2003-10-18 09:32:39 +00:00
Poul-Henning Kamp	d986d4580c	The size and contents of the DEV_STRATEGY() macro have progressed to the point where its being a macro is no longer sensible, and it will only be more so in days to come. BIO_STRATEGY() is now only used from DEV_STRATEGY() and should not be used directly anymore. Put the contents of both in the new function dev_strategy() and make DEV_STRATEGY() call that function. In addition, this allows us to make the rather magic bufdonebio() helper function static. This also saves a hundred-and-some bytes of code in a typical kernel.	2003-10-18 09:03:15 +00:00
Jeff Roberson	85b9831dfa	- Add a missing vn_finished_write() Pointy hat: jeff Found by: robert Obtained from: kirk	2003-10-14 00:38:34 +00:00
Alan Cox	d58e70a08d	In vfs_bio_clrbuf(), ignore the state of the object lock if the page is the "bogus" page. Found by: tegge	2003-10-12 18:26:48 +00:00
Alan Cox	08814d66d5	- Synchronize access to a page's valid field in vfs_bio_clrbuf() by using the lock from its containing object. - Remove GIANT_REQUIRED from vm_hold_load_pages().	2003-10-10 07:26:21 +00:00
Jeff Roberson	d1cf0fc7fc	- Add a missing vn_start_write() to flushbufqueues(). This could have caused snapshot related problems. - The vp can not be NULL here or we would panic in vfs_bio_awrite(). Stop confusing the logic by checking for it in several places. Submitted by: kirk and then rototilled by me to remove vp == NULL checks.	2003-10-05 22:16:08 +00:00
Alan Cox	6ec2fca505	Eliminate some unnecessary uses of the vm page queues lock around the vm page's valid field. This field is being synchronized using the containing vm object's lock.	2003-10-04 22:47:20 +00:00
Alan Cox	bf0da100d6	- Extend the scope of the vm object lock to cover calls to vm_page_is_valid(). - Assert that the lock on the containing vm object is held in vm_page_is_valid().	2003-10-04 19:23:29 +00:00
Alan Cox	c76789caa6	- vm_hold_free_pages() should lock the kernel object. (The pages being freed belong to the kernel object.) - Increase the granularity of the vm object locking in vm_hold_load_pages() in order to reduce the number of times that we acquire and release the same lock.	2003-09-22 04:58:09 +00:00
Alan Cox	35b86dc8de	Correct a typo in the previous revision.	2003-09-15 02:56:48 +00:00
Alan Cox	58abfe0051	Convert vmapbuf() from using pmap_extract() to using pmap_extract_and_hold(). Note, however, that GIANT_REQUIRED should not be removed until all platforms fully implement the "prot" parameter to pmap_extract_and_hold(). Reviewed by: tegge	2003-09-13 04:29:55 +00:00
Jeff Roberson	d919a11d06	- Define a new flag for getblk(): GB_NOCREAT. This flag causes getblk() to bail out if the buffer is not already present. - The buffer returned by incore() is not locked and should not be sent to brelse(). Use getblk() with the new GB_NOCREAT flag to preserve the desired semantics.	2003-08-31 08:50:11 +00:00
Jeff Roberson	a7db559087	- If there is no vp assume that BKGRDINPROG is not set and set RELPBUF in brelse().	2003-08-31 01:07:45 +00:00
Jeff Roberson	b5c61abd82	- In some cases bp->b_vp can be NULL in brelse, don't try to lock the interlock in that case. Found by: alc	2003-08-31 00:06:07 +00:00
Marcel Moolenaar	9e8147f3af	In bufdone(), change the format specifier for m->valid and m->dirty to a long type and explicitly cast m->valid and m->dirty to unsigned long. When PAGE_SIZE is 32K, these fields are in fact unsigned long.	2003-08-28 19:58:11 +00:00
Alexander Kabaev	772a9659d9	Do not return with vnode interlock held. Reviewed by: rwatson	2003-08-28 15:48:15 +00:00
Jeff Roberson	9dbfeb0ae6	- Move BX_BKGRDWAIT and BX_BKGRDINPROG to BV_ and the b_vflags field. - Surround all accesses of the BKGRD{WAIT,INPROG} flags with the vnode interlock. - Don't use the B_LOCKED flag and QUEUE_LOCKED for background write buffers. Check for the BKGRDINPROG flag before recycling or throwing away a buffer. We do this instead because it is not safe for us to move the original buffer to a new queue from the callback on the background write buffer. - Remove the B_LOCKED flag and the locked buffer queue. They are no longer used. - The vnode interlock is used around checks for BKGRDINPROG where it may not be strictly necessary. If we hold the buf lock, a background write will not be started without our knowledge; one may only be completed while we're not looking. Rather than remove the code, document two of the places where this extra locking is done. A pass should be done to verify and minimize the locking later.	2003-08-28 06:55:18 +00:00
Alan Cox	b7ad744dc5	Hold the page queues lock when performing vm_page_clear_dirty() and vm_page_set_invalid().	2003-08-23 18:11:53 +00:00
Poul-Henning Kamp	4bfd22f25e	Grab Giant in bufdonebio() since drivers may not hold it. This only protects the "struct buf" consumers (ie: DEV_STRATEGY()), but does not protect BIO_STRATEGY() users.	2003-08-02 09:45:10 +00:00
Alan Cox	105660e8ba	Eliminate an abuse of kmem_alloc_pageable() in bufinit() by using VM_ALLOC_NOOBJ to allocate the bogus page. Reviewed by: tegge	2003-08-02 05:05:34 +00:00
Poul-Henning Kamp	568733688b	Initialize b_saveaddr when we hand out buffers	2003-06-20 08:26:38 +00:00
Alan Cox	f717a9d063	Lock the vm object when removing a page.	2003-06-11 16:37:33 +00:00
David E. O'Brien	677b542ea2	Use __FBSDID().	2003-06-11 00:56:59 +00:00
Poul-Henning Kamp	17a1391990	The IO_NOWDRAIN and B_NOWDRAIN hacks are no longer needed to prevent deadlocks with vnode backed md(4) devices because md now uses a kthread to run the bio requests instead of doing it directly from the bio down path.	2003-05-31 16:42:45 +00:00
Alan Cox	01dfc1deae	Finish the vm_object locking for this file, including holding the vm_object lock when accessing the vm_object's flags or calling vm_page_lookup().	2003-04-28 05:40:45 +00:00
Alan Cox	af3e0bb202	- Lock the vm_object when performing vm_page_alloc() in allocbuf().	2003-04-26 07:42:24 +00:00
Alan Cox	097d4338db	Lock the vm_object in vfs_busy_pages().	2003-04-20 00:17:05 +00:00
Alan Cox	0fa05eae77	- Lock the vm_object when performing vm_object_pip_subtract(). - Assert that the vm_object lock is held in vm_object_pip_subtract().	2003-04-19 22:11:41 +00:00
Alan Cox	0d420ad3e6	- Lock the vm_object when performing vm_object_pip_wakeupn(). - Assert that the vm_object lock is held in vm_object_pip_wakeupn(). - Add a new macro VM_OBJECT_LOCK_ASSERT().	2003-04-19 21:15:44 +00:00
Alan Cox	de5ef10142	Update locking on the kernel_object to use the new macros.	2003-04-14 00:36:53 +00:00
Alan Cox	0b556837a9	Remove an unnecessary trunc_page() from vmapbuf(). Reviewed by: tegge	2003-04-06 00:40:54 +00:00
Alan Cox	08468b6ad7	o Check the b_bufsize passed to vmapbuf() returning an error if it is invalid. o Remove a debugging printf() from vmapbuf(). Suggested by: tegge	2003-04-04 06:14:54 +00:00
Poul-Henning Kamp	d086f85ac4	Preparation commit before I start on the bioqueue lockdown: Collect all the bits of bioqueue handling in subr_disk.c; vfs_bio.c is big enough as it is and disksort already lives in subr_disk.c.	2003-03-30 08:51:23 +00:00
Tor Egge	5bbb806004	Add support for reading directly from file to userland buffer when the O_DIRECT descriptor status flag is set and both offset and length are multiples of the physical media sector size.	2003-03-26 23:40:42 +00:00
Jake Burkholder	227f9a1c58	- Add vm_paddr_t, a physical address type. This is required for systems where physical addresses are larger than virtual addresses, such as i386s with PAE. - Use this to represent physical addresses in the MI vm system and in the i386 pmap code. This also changes the paddr parameter to d_mmap_t. - Fix printf formats to handle physical addresses >4G in the i386 memory detection code, and due to kvtop returning vm_paddr_t instead of u_long. Note that this is a name change only; vm_paddr_t is still the same as vm_offset_t on all currently supported platforms. Sponsored by: DARPA, Network Associates Laboratories Discussed with: re, phk (cdevsw change)	2003-03-25 00:07:06 +00:00
Poul-Henning Kamp	b4b138c27f	Including <sys/stdint.h> is (almost?) universally only to be able to use %j in printfs, so put a nested include in <sys/systm.h> where the printf prototype lives and save everybody else the trouble.	2003-03-18 08:45:25 +00:00
Jeff Roberson	749ffa4ecd	- Add a lock for protecting against msleep(bp, ...) wakeup(bp) races. - Create a new function bdone() which sets B_DONE and calls wakeup(bp). This is suitable for use as b_iodone for buf consumers who are not going through the buf cache. - Create a new function bwait() which waits for the buf to be done at a set priority and with a specific wmesg. - Replace several cases where the above functionality was implemented without locking with the new functions.	2003-03-13 07:31:45 +00:00
Jeff Roberson	09f11da5a3	- Remove a race between fsync like functions and flushbufqueues() by requiring locked bufs in vfs_bio_awrite(). Previously the buf could have been written out by fsync before we acquired the buf lock if it weren't for giant. The cluster_wbuild() handles this race properly but the single write at the end of vfs_bio_awrite() would not. - Modify flushbufqueues() so there is only one copy of the loop. Pass a parameter in that says whether or not we should sync bufs with deps. - Call flushbufqueues() a second time and then break if we couldn't find any bufs without deps.	2003-03-13 07:19:23 +00:00
Jeff Roberson	7261f5f68e	- Add a new 'flags' parameter to getblk(). - Define one flag GB_LOCK_NOWAIT that tells getblk() to pass the LK_NOWAIT flag to the initial BUF_LOCK(). This will eventually be used in cases where we want to use a buffer only if it is not currently in use. - Convert all consumers of the getblk() api to use this extra parameter. Reviewed by: arch Not objected to by: mckusick	2003-03-04 00:04:44 +00:00
Jeff Roberson	491081fabf	- Hold the vnode interlock across calls to bgetvp instead of acquiring it internally. This is required to stop multiple bufs from being associated with a single lblkno.	2003-03-02 06:05:23 +00:00
Jeff Roberson	bff5362bf2	- gc USE_BUFHASH. The smp locking of the buf cache renders this useless.	2003-03-01 05:55:03 +00:00
Kirk McKusick	7e734c4149	When doing cleanup of excessive buffers in bdwrite (see kern/vfs_bio.c delta 1.371) we must ensure that we do not get ourselves into a recursive trap endlessly trying to clean up after ourselves. Reported by: Attila Nagy <bra@fsn.hu> Sponsored by: DARPA & NAI Labs.	2003-02-25 23:59:09 +00:00
Jeff Roberson	2e3981a70c	- Add the missing NULL interlock argument to a recently added BUF_LOCK.	2003-02-25 08:23:11 +00:00
Kirk McKusick	3a7053cb60	Prevent large files from monopolizing the system buffers. Keep track of the number of dirty buffers held by a vnode. When a bdwrite is done on a buffer, check the existing number of dirty buffers associated with its vnode. If the number rises above vfs.dirtybufthresh (currently 90% of vfs.hidirtybuffers), one of the other (hopefully older) dirty buffers associated with the vnode is written (using bawrite). In the event that this approach fails to curb the growth in the vnode's number of dirty buffers (due to soft updates rollback dependencies), the more drastic approach of doing a VOP_FSYNC on the vnode is used. This code primarily affects very large and actively written files such as snapshots. This change should eliminate hanging when taking snapshots or doing background fsck on very large filesystems. Hopefully, one day it will be possible to cache filesystem metadata in the VM cache as is done with file data. As it stands, only the buffer cache can be used which limits total metadata storage to about 20Mb no matter how much memory is available on the system. This rather small memory gets badly thrashed causing a lot of extra I/O. For example, taking a snapshot of a 1Tb filesystem minimally requires about 35,000 write operations, but because of the cache thrashing (we only have about 350 buffers at our disposal) ends up doing about 237,540 I/O's thus taking twenty-five minutes instead of four if it could run entirely in the cache. Reported by: Attila Nagy <bra@fsn.hu> Sponsored by: DARPA & NAI Labs.	2003-02-25 06:44:42 +00:00
Jeff Roberson	17661e5ac4	- Add an interlock argument to BUF_LOCK and BUF_TIMELOCK. - Remove the buftimelock mutex and acquire the buf's interlock to protect these fields instead. - Hold the vnode interlock while locking bufs on the clean/dirty queues. This reduces some cases from one BUF_LOCK with a LK_NOWAIT and another BUF_LOCK with a LK_TIMEFAIL to a single lock. Reviewed by: arch, mckusick	2003-02-25 03:37:48 +00:00
Warner Losh	a163d034fa	Back out M_* changes, per decision of the TRB. Approved by: trb	2003-02-19 05:47:46 +00:00
Jeff Roberson	71146186a1	- Introduce a new function bremfreel() that does a bremfree with the buf queue lock already held. - In getblk() and flushbufqueues() use bremfreel() while we still have the buf queue lock held to keep the lists consistent. - Add LK_NOWAIT to two cases where we're essentially asserting that the bufs are not locked while acquiring the locks. This will make sure that we get the appropriate panic() and not another one for sleeping with a lock held.	2003-02-16 10:43:06 +00:00
Jeff Roberson	25c4325446	- Add a comment about a race that will happen without Giant.	2003-02-10 22:47:34 +00:00
Jeff Roberson	c7b716cc2a	- Unlock the nblock after the loop in bwillwrite().	2003-02-10 22:33:59 +00:00
Jeff Roberson	7137d635ac	- In getnewbuf() unlock the bq lock prior to sleeping when we're out of buffers. Submitted by: tegge	2003-02-10 06:02:51 +00:00
Jeff Roberson	3306adcfcf	- Correct another atomic op. Spotted by: alc	2003-02-09 22:39:51 +00:00
Jeff Roberson	69953c8435	- Move some code out from #ifdef INVARIANTS.	2003-02-09 12:11:37 +00:00
Jeff Roberson	767b9a529d	- Cleanup unlocked accesses to buf flags by introducing a new b_vflag member that is protected by the vnode lock. - Move B_SCANNED into b_vflags and call it BV_SCANNED. - Create a vop_stdfsync() modeled after spec's sync. - Replace spec_fsync, msdos_fsync, and hpfs_fsync with the stdfsync and some fs specific processing. This gives all of these filesystems proper behavior wrt MNT_WAIT/NOWAIT and the use of the B_SCANNED flag. - Annotate the locking in buf.h	2003-02-09 11:28:35 +00:00
Jeff Roberson	15553af710	- Spell 'add' and not 'subtract' in an atomic op. Spotted by: alc Pointy hat to: jeff	2003-02-09 11:21:40 +00:00
Jeff Roberson	d85be48243	- Lock down the buffer cache's infrastructure code. This includes locks on buf lists, synchronization variables, and atomic ops for the counters. This change does not remove giant from any code although some pushdown may be possible. - In vfs_bio_awrite() don't access buf fields without the buf lock.	2003-02-09 09:47:31 +00:00
Alfred Perlstein	44956c9863	Remove M_TRYWAIT/M_WAITOK/M_WAIT. Callers should use 0. Merge M_NOWAIT/M_DONTWAIT into a single flag M_NOWAIT.	2003-01-21 08:56:16 +00:00
Matthew Dillon	2d5c7e4506	Close the remaining user address mapping races for physical I/O, CAM, and AIO. Still TODO: streamline useracc() checks. Reviewed by: alc, tegge MFC after: 7 days	2003-01-20 17:46:48 +00:00
Alan Cox	28ec30cd9f	- Hold the page queues lock around vm_page_hold(). - Assert that the page queues lock rather than Giant is held in vm_page_hold().	2003-01-20 09:24:03 +00:00
Alan Cox	6eb07b4ac2	Fix two long-standing, but likely harmless, errors in the use of vm_pageout_deficit: 1. Update vm_pageout_deficit before VM_WAIT. There is no sense in delaying the update; the sooner the pageout daemon receives this information the better. Reviewed by: tegge 2. Update vm_pageout_deficit according to the number of pages still needed to complete the allocation, not the original size of the allocation. Submitted by: tegge (These errors have existed since the introduction of vm_pageout_deficit in revision 1.144.)	2003-01-16 08:14:56 +00:00
Matthew Dillon	f597900329	Merge all the various copies of vmapbuf() and vunmapbuf() into a single portable copy. Note that pmap_extract() must be used instead of pmap_kextract(). This is precursor work to a reorganization of vmapbuf() to close remaining user/kernel races (which can lead to a panic).	2003-01-15 23:54:35 +00:00
Alan Cox	b0ef8c5fe4	- Update vm_pageout_deficit using atomic operations. It's a simple counter outside the scope of existing locks. - Eliminate a redundant clearing of vm_pageout_deficit.	2003-01-14 06:57:03 +00:00
Alan Cox	8febaa4df0	vm_hold_load_pages() needn't clear PG_ZERO because it didn't pass VM_ALLOC_ZERO to vm_page_alloc(). (PG_ZERO is clear by default.)	2003-01-12 06:30:15 +00:00
Alan Cox	1f17965656	Make bogus_offset local to bufinit().	2003-01-07 19:55:08 +00:00
Poul-Henning Kamp	ea4804130a	Fix cut&paste bug which would result in a panic because buffer was being biodone'ed multiple times.	2003-01-05 22:01:08 +00:00
Alan Cox	9ce904432a	Allocate bogus_page with VM_ALLOC_WIRED. (Previously, bogus_page's allocation incremented the global count of wired pages, but not the page's own wire count. This inconsistency was introduced in revision 1.230.)	2003-01-05 18:46:13 +00:00
Poul-Henning Kamp	f5b11b6e2d	Temporarily introduce a new VOP_SPECSTRATEGY operation while I try to sort out disk-io from file-io in the vm/buffer/filesystem space. The intent is to sort VOP_STRATEGY calls into those which operate on "real" vnodes and those which operate on VCHR vnodes. For the latter kind, the call will be changed to VOP_SPECSTRATEGY, possibly conditionally for those places where dual-use happens. Add a default VOP_SPECSTRATEGY method which will call the normal VOP_STRATEGY. First time it is called it will print debugging information. This will only happen if a normal vnode is passed to VOP_SPECSTRATEGY by mistake. Add a real VOP_SPECSTRATEGY in specfs, which does what VOP_STRATEGY does on a VCHR vnode today. Add a new VOP_STRATEGY method in specfs to catch instances where the conversion to VOP_SPECSTRATEGY has not yet happened. Handle the request just like we always did, but first time called print debugging information. Apart from up to two instances of console messages per boot, this amounts to a glorified no-op commit. If you get any of the messages on your console I would very much like a copy of them mailed to phk@freebsd.org	2003-01-04 22:10:36 +00:00
Poul-Henning Kamp	7b330b22b6	Don't call VOP_BMAP on VCHR vnodes when the logical and physical block numbers are identical: it cannot even hope to accomplish anything.	2003-01-04 09:37:42 +00:00
Poul-Henning Kamp	862702306b	Convert calls to BUF_STRATEGY to VOP_STRATEGY calls. This is a no-op since all BUF_STRATEGY did in the first place was call VOP_STRATEGY.	2003-01-03 06:32:15 +00:00
Jens Schweikhardt	9d5abbddbf	Correct typos, mostly s/ a / an / where appropriate. Some whitespace cleanup, especially in troff files.	2003-01-01 18:49:04 +00:00
Alan Cox	d746789347	Hold the page queues lock when calling vm_page_flag_clear().	2002-12-27 06:52:32 +00:00
Alan Cox	0cb6c00463	- Hold the kernel_object's lock around vm_page_alloc(kernel_object,...). - Hold the page queues lock around vm_page_wakeup().	2002-12-23 20:10:47 +00:00
Kirk McKusick	0f5f789c0d	The buffer daemon cannot skip over buffers owned by locked inodes as they may be the only viable ones to flush. Thus it will now wait for an inode lock if the other alternatives will result in rollbacks (and immediate redirtying of the buffer). If only buffers with rollbacks are available, one will be flushed, but then the buffer daemon will wait briefly before proceeding. Failing to wait briefly effectively deadlocks a uniprocessor since every other process writing to that filesystem will wait for the buffer daemon to clean up which takes close enough to forever to feel like a deadlock. Reported by: Archie Cobbs <archie@dellroad.org> Sponsored by: DARPA & NAI Labs. Approved by: re	2002-12-14 01:35:30 +00:00
Alan Cox	178949e021	Hold the page queues/flags lock when calling vm_page_set_validclean(). Approved by: re	2002-11-23 19:10:31 +00:00
Alan Cox	4fec79bef8	Now that pmap_remove_all() is exported by our pmap implementations use it directly.	2002-11-16 07:44:25 +00:00
Alan Cox	d154fb4fe6	When prot is VM_PROT_NONE, call pmap_page_protect() directly rather than indirectly through vm_page_protect(). The one remaining page flag that is updated by vm_page_protect() is already being updated by our various pmap implementations. Note: A later commit will similarly change the VM_PROT_READ case and eliminate vm_page_protect().	2002-11-10 07:12:04 +00:00
Kirk McKusick	bc7bdd50c1	When the number of dirty buffers rises too high, the buf_daemon runs to help clean up. After selecting a potential buffer to write, this patch has it acquire a lock on the vnode that owns the buffer before trying to write it. The vnode lock is necessary to avoid a race with some other process holding the vnode locked and trying to flush its dirty buffers. In particular, if the vnode in question is a snapshot file, then the race can lead to a deadlock. To avoid slowing down the buf_daemon, it does a non-blocking lock request when trying to lock the vnode. If it fails to get the lock it skips over the buffer and continues down its queue looking for buffers to flush. Sponsored by: DARPA & NAI Labs.	2002-10-18 01:29:59 +00:00
Poul-Henning Kamp	53cc479393	Remove unused includes. Clarify the intention of a while(); Move a local variable to avoid potential name-confusion.	2002-09-28 17:46:30 +00:00
Poul-Henning Kamp	37c841831f	Be consistent about "static" functions: if the function is marked static in its prototype, mark it static at the definition too. Inspired by: FlexeLint warning #512	2002-09-28 17:15:38 +00:00
Poul-Henning Kamp	54286a04c5	Correctly order VI_UNLOCK(), local variables and block comment.	2002-09-28 12:15:44 +00:00
Poul-Henning Kamp	089cf428da	Make biowait() check bio_error before the BIO_ERROR flag, to properly catch internal GEOM use of bio_error. Sponsored by: DARPA & NAI Labs.	2002-09-26 16:32:14 +00:00
Jeff Roberson	b7227b7712	- Lock accesses to v_numoutput. - Lock calls to gbincore.	2002-09-25 02:11:37 +00:00
Poul-Henning Kamp	f986355c0e	s/Danglish/English/ Some style issues. Change the timeout to be hz/10 instead of hz. Brucification by: bde.	2002-09-15 17:52:35 +00:00

782 Commits