freebsd-dev

Author	SHA1	Message	Date
Remko Lodder	248a0568e7	Correct a copy and paste'o in phys_pager.c, we are talking about phys here and not about devices. PR: 93755 Approved by: imp (mentor, implicit when re-assigning the ticket to me).	2007-10-30 14:48:13 +00:00
Alan Cox	21f7958604	Change vm_page_cache_transfer() such that it does not transfer pages that would have an offset beyond the end of the target object. Such pages should remain in the source object. MFC after: 3 days Diagnosed and reviewed by: Kostik Belousov Reported and tested by: Peter Holm	2007-10-27 00:09:30 +00:00
Robert Watson	30d239bc4c	Merge first in a series of TrustedBSD MAC Framework KPI changes from Mac OS X Leopard--rationalize naming for entry points to the following general forms: mac_<object>_<method/action> mac_<object>_check_<method/action> The previous naming scheme was inconsistent and mostly reversed from the new scheme. Also, make object types more consistent and remove spaces from object types that contain multiple parts ("posix_sem" -> "posixsem") to make mechanical parsing easier. Introduce a new "netinet" object type for certain IPv4/IPv6-related methods. Also simplify, slightly, some entry point names. All MAC policy modules will need to be recompiled, and modules not updated as part of this commit will need to be modified to conform to the new KPI. Sponsored by: SPARTA (original patches against Mac OS X) Obtained from: TrustedBSD Project, Apple Computer	2007-10-24 19:04:04 +00:00
Alan Cox	0ab3c7a594	Correct an error of omission in the reimplementation of the page cache: vnode_pager_setsize() must handle the case where a file is truncated to a non-page-size-aligned boundary and there is a cached page underlying the new end of file. Reported by: kris, tegge Tested by: kris MFC after: 3 days	2007-10-22 06:23:46 +00:00
Alan Cox	7b0e72d184	Correct an error in vm_map_sync(), nee vm_map_clean(), that has existed since revision 1.1. Specifically, neither traversal of the vm map checks whether the end of the vm map has been reached. Consequently, the first traversal can wrap around and bogusly return an error. This error has gone unnoticed for so long because no one had ever before tried msync(2)ing a region above the stack. Reported by: peter MFC after: 1 week	2007-10-22 05:21:05 +00:00
Julian Elischer	3745c395ec	Rename the kthread_xxx (e.g. kthread_create()) calls to kproc_xxx as they actually make whole processes. This makes way for us to add REAL kthread_create() and friends that actually make threads. It turns out that most of these calls actually end up being moved back to the thread version when it's added, but we need to make this cosmetic change first. I'd LOVE to do this rename in 7.0 so that we can eventually MFC the new kthread_xxx() calls.	2007-10-20 23:23:23 +00:00
Alan Cox	2573269111	The previous revision, updating vm_object_page_remove() for the new page cache, did not account for the case where the vm object has nothing but cached pages. Reported by: kris, tegge Reviewed by: tegge MFC after: 3 days	2007-10-18 23:02:18 +00:00
Peter Wemm	c899450b21	Fix cosmetic bug in stale copy of msync_args. 'len' is size_t, not int.	2007-10-18 22:47:39 +00:00
Ruslan Ermilov	8229241a90	Fix CTL_VM_NAMES.	2007-10-16 11:32:57 +00:00
John Baldwin	71eb44c7b1	Allow recursion on the 'zones' internal UMA zone. Submitted by: thompsa MFC after: 1 week Approved by: re (kensmith) Discussed with: jeff	2007-10-11 20:11:27 +00:00
Konstantin Belousov	4ab8ab9285	Do not dereference NULL pointer. Reported by: Peter Holm Reviewed by: alc Approved by: re (kensmith)	2007-10-08 20:09:53 +00:00
Alan Cox	b8c5048025	In the rare case that vm_page_cache() actually frees the given page, it must first ensure that the page is no longer mapped. This is trivially accomplished by calling pmap_remove_all() a little earlier in vm_page_cache(). While I'm in the neighborhood, make a related panic message a little more useful. Approved by: re (kensmith) Reported by: Peter Holm and Konstantin Belousov Reviewed by: Konstantin Belousov	2007-10-08 18:01:38 +00:00
Alan Cox	dc9250f55c	Correct a lock assertion failure in sparc64's pmap_page_is_mapped() that is a consequence of sparc64/sparc64/vm_machdep.c revision 1.76. It occurs when uma_small_free() frees a page. The solution has two parts: (1) Mark pages allocated with VM_ALLOC_NOOBJ as PG_UNMANAGED. (2) Defer the lock assertion in pmap_page_is_mapped() until after PG_UNMANAGED is tested. This is safe because both PG_UNMANAGED and PG_FICTITIOUS are immutable flags, i.e., they do not change state between the time that a page is allocated and freed. Approved by: re (kensmith) PR: 116794	2007-10-07 18:03:03 +00:00
Alan Cox	c944491426	Correct an error of omission in the reimplementation of the page cache: vm_object_page_remove() should convert any cached pages that fall within the specified range to free pages. Otherwise, there could be a problem if a file is first truncated and then regrown. Specifically, some old data from prior to the truncation might reappear. Generalize vm_page_cache_free() to support the conversion of either a subset or the entirety of an object's cached pages. Reported by: tegge Reviewed by: tegge Approved by: re (kensmith)	2007-09-27 04:21:59 +00:00
Alan Cox	f3a2ed4bd9	Correct an error in the previous revision, specifically, vm_object_madvise() should request that the reactivated, cached page not be busied. Reported by: Rink Springer Approved by: re (kensmith)	2007-09-25 21:01:10 +00:00
Alan Cox	7bfda801a8	Change the management of cached pages (PQ_CACHE) in two fundamental ways: (1) Cached pages are no longer kept in the object's resident page splay tree and memq. Instead, they are kept in a separate per-object splay tree of cached pages. However, access to this new per-object splay tree is synchronized by the _free_ page queues lock, not to be confused with the heavily contended page queues lock. Consequently, a cached page can be reclaimed by vm_page_alloc(9) without acquiring the object's lock or the page queues lock. This solves a problem independently reported by tegge@ and Isilon. Specifically, they observed the page daemon consuming a great deal of CPU time because of pages bouncing back and forth between the cache queue (PQ_CACHE) and the inactive queue (PQ_INACTIVE). The source of this problem turned out to be a deadlock avoidance strategy employed when selecting a cached page to reclaim in vm_page_select_cache(). However, the root cause was really that reclaiming a cached page required the acquisition of an object lock while the page queues lock was already held. Thus, this change addresses the problem at its root, by eliminating the need to acquire the object's lock. Moreover, keeping cached pages in the object's primary splay tree and memq was, in effect, optimizing for the uncommon case. Cached pages are reclaimed far, far more often than they are reactivated. Instead, this change makes reclamation cheaper, especially in terms of synchronization overhead, and reactivation more expensive, because reactivated pages will have to be reentered into the object's primary splay tree and memq. (2) Cached pages are now stored alongside free pages in the physical memory allocator's buddy queues, increasing the likelihood that large allocations of contiguous physical memory (i.e., superpages) will succeed. Finally, as a result of this change long-standing restrictions on when and where a cached page can be reclaimed and returned by vm_page_alloc(9) are eliminated. Specifically, calls to vm_page_alloc(9) specifying VM_ALLOC_INTERRUPT can now reclaim and return a formerly cached page. Consequently, a call to malloc(9) specifying M_NOWAIT is less likely to fail. Discussed with: many over the course of the summer, including jeff@, Justin Husted @ Isilon, peter@, tegge@ Tested by: an earlier version by kris@ Approved by: re (kensmith)	2007-09-25 06:25:06 +00:00
Jeff Roberson	258853ab1c	- Redefine p_swtime and td_slptime as p_swtick and td_slptick. This changes the units from seconds to the value of 'ticks' when swapped in/out. ULE does not have a periodic timer that scans all threads in the system and as such maintaining a per-second counter is difficult. - Change computations requiring the unit in seconds to subtract ticks and divide by hz. This does make the wraparound condition hz times more frequent but this is still in the range of several months to years and the adverse effects are minimal. Approved by: re	2007-09-21 05:07:07 +00:00
Jeff Roberson	b61ce5b0e6	- Move all of the PS_ flags into either p_flag or td_flags. - p_sflag was mostly protected by PROC_LOCK rather than the PROC_SLOCK or previously the sched_lock. These bugs have existed for some time. - Allow swapout to try each thread in a process individually and then swapin the whole process if any of these fail. This allows us to move most scheduler related swap flags into td_flags. - Keep ki_sflag for backwards compat but change all in source tools to use the new and more correct location of P_INMEM. Reported by: pho Reviewed by: attilio, kib Approved by: re (kensmith)	2007-09-17 05:31:39 +00:00
Alan Cox	4cd457233b	Correct an assertion in vm_pageout_flush(). Specifically, if a page's status after vm_pager_put_pages() is VM_PAGER_PEND, then it could have already been recycled, i.e., freed and reallocated to a new purpose; thus, asserting that such pages cannot be written is inappropriate. Reported by: kris Submitted by: tegge Approved by: re (kensmith) MFC after: 1 week	2007-09-15 18:30:28 +00:00
Konstantin Belousov	d239bd3ccc	Do not drop vm_map lock between doing vm_map_remove() and vm_map_insert(). For this, introduce vm_map_fixed() that does that for MAP_FIXED case. Dropping the lock allowed for parallel thread to occupy the freed space. Reported by: Tijl Coosemans <tijl ulyssis org> Reviewed by: alc Approved by: re (kensmith) MFC after: 2 weeks	2007-08-20 12:05:45 +00:00
Konstantin Belousov	daab56673e	Remove comment that is no longer quite true. Noted by: alc Approved by: re (kensmith)	2007-08-18 16:41:31 +00:00
Konstantin Belousov	efe7553ed7	Fix the phys_pager in the way similar to the rev. 1.83 of the sys/vm/device_pager.c: Protect the creation of the phys pager with non-NULL handle with the phys_pager_mtx. Lookup of phys pager in the pagers list by handle is now synchronized with its removal from the list, and phys_pager_mtx is put before vm object lock in lock order. Dispose the phys_pager_alloc_lock and tsleep calls, together with acquiring Giant, since phys_pager_mtx now covers the same block. Reviewed by: alc Approved by: re (kensmith)	2007-08-18 16:40:33 +00:00
Konstantin Belousov	deea654ebf	Protect the creation of the device pager with the dev_pager_mtx. Lookup of device pager in the pagers list by handle is now synchronized with its removal from the list, and dev_pager_mtx is put before vm object lock in lock order. Dispose the dev_pager_sx lock, since dev_pager_mtx now covers the same block. Noted by: kensmith Reviewed by: alc Approved by: re (kensmith)	2007-08-07 15:36:25 +00:00
Alan Cox	b5e8f167b9	Consider a scenario in which one processor, call it Pt, is performing vm_object_terminate() on a device-backed object at the same time that another processor, call it Pa, is performing dev_pager_alloc() on the same device. The problem is that vm_pager_object_lookup() should not be allowed to return a doomed object, i.e., an object with OBJ_DEAD set, but it does. In detail, the unfortunate sequence of events is: Pt in vm_object_terminate() holds the doomed object's lock and sets OBJ_DEAD on the object. Pa in dev_pager_alloc() holds dev_pager_sx and calls vm_pager_object_lookup(), which returns the doomed object. Next, Pa calls vm_object_reference(), which requires the doomed object's lock, so Pa waits for Pt to release the doomed object's lock. Pt proceeds to the point in vm_object_terminate() where it releases the doomed object's lock. Pa is now able to complete vm_object_reference() because it can now complete the acquisition of the doomed object's lock. So, now the doomed object has a reference count of one! Pa releases dev_pager_sx and returns the doomed object from dev_pager_alloc(). Pt now acquires dev_pager_mtx, removes the doomed object from dev_pager_object_list, releases dev_pager_mtx, and finally calls uma_zfree with the doomed object. However, the doomed object is still in use by Pa. Repeating my key point, vm_pager_object_lookup() must not return a doomed object. Moreover, the test for the object's state, i.e., doomed or not, and the increment of the object's reference count should be carried out atomically. Reviewed by: kib Approved by: re (kensmith) MFC after: 3 weeks	2007-08-05 21:04:32 +00:00
Konstantin Belousov	c6199d59e3	Do not acquire Giant unconditionally around the calls to the cdevsw d_mmap methods. prep_cdevsw() already installs the shims that acquire/drop Giant for the methods of a driver that specified the D_NEEDGIANT flag. Reviewed by: alc Approved by: re (kensmith)	2007-08-05 05:40:52 +00:00
Alan Cox	eaa29f1ce4	Add a counter for the total number of pages cached and support for reporting the value of this counter in the program "vmstat". Approved by: re (rwatson)	2007-07-27 20:01:22 +00:00
Pawel Jakub Dawidek	57fd3d5572	When we do open, we should lock the vnode exclusively. This fixes a few races: - fifo race, where two threads assign v_fifoinfo, - v_writecount modifications, - v_object modifications, - and probably more... Discussed with: kib, ups Approved by: re (rwatson)	2007-07-26 16:58:09 +00:00
Alan Cox	806453645a	Two changes to vm_fault_additional_pages(): 1. Rewrite the backward scan. Specifically, reverse the order in which pages are allocated so that upon failure it is never necessary to free pages that were just allocated. Moreover, any allocated pages can be put to use. This makes the backward scan behave just like the forward scan. 2. Eliminate an explicit, unsynchronized check for low memory before calling vm_page_alloc(). It serves no useful purpose. It is, in effect, optimizing the uncommon case at the expense of the common case. Approved by: re (hrs) MFC after: 3 weeks	2007-07-20 06:55:11 +00:00
Alan Cox	8941dc4471	Eliminate two unused functions: vm_phys_alloc_pages() and vm_phys_free_pages(). Rename vm_phys_alloc_pages_locked() to vm_phys_alloc_pages() and vm_phys_free_pages_locked() to vm_phys_free_pages(). Add comments regarding the need for the free page queues lock to be held by callers to these functions. No functional changes. Approved by: re (hrs)	2007-07-14 21:21:17 +00:00
Alan Cox	bd06ab2f60	Eliminate dead code, specifically, an unused sysctl: "vm.idlezero_maxrun". Approved by: re (hrs)	2007-07-14 19:00:44 +00:00
Alan Cox	0f752392c6	Update a comment describing the page queues. Approved by: re (hrs)	2007-07-13 04:42:20 +00:00
Alan Cox	e99a797492	Eliminate dead code. Approved by: re (hrs)	2007-07-12 22:23:28 +00:00
Alan Cox	20dd22a24e	Correct a problem in the ZERO_COPY_SOCKETS option, specifically, in vm_page_cowfault(). Initially, if vm_page_cowfault() sleeps, the given page is wired, preventing it from being recycled. However, when transmission of the page completes, the page is unwired and returned to the page queues. At that point, the page is not in any special state that prevents it from being recycled. Consequently, vm_page_cowfault() should verify that the page is still held by the same vm object before retrying the replacement of the page. Note: The containing object is, however, safe from being recycled by virtue of having a non-zero paging-in-progress count. While I'm here, add some assertions and comments. Approved by: re (rwatson) MFC After: 3 weeks	2007-07-10 18:41:34 +00:00
Alan Cox	d1974c0df1	Eliminate the special case handling of OBJT_DEVICE objects in vm_fault_additional_pages() that was introduced in revision 1.47. Then as now, it is unnecessary because dev_pager_haspage() returns zero for both the number of pages to read ahead and read behind, producing the same exact behavior by vm_fault_additional_pages() as the special case handling. Approved by: re (rwatson)	2007-07-08 19:42:52 +00:00
Alan Cox	65ea29a690	When a cached page is reactivated in vm_fault(), update the counter that tracks the total number of reactivated pages. (We have not been counting reactivations by vm_fault() since revision 1.46.) Correct a comment in vm_fault_additional_pages(). Approved by: re (kensmith) MFC after: 1 week	2007-07-06 21:25:21 +00:00
Peter Wemm	c2815ad564	Add freebsd6_ wrappers for mmap/lseek/pread/pwrite/truncate/ftruncate Approved by: re (kensmith)	2007-07-04 22:57:21 +00:00
Alan Cox	14137dc045	In the previous revision, when I replaced the unconditional acquisition of Giant in vm_pageout_scan() with VFS_LOCK_GIANT(), I had to eliminate the acquisition of the vnode interlock before releasing the vm object's lock because the vnode interlock cannot be held when VFS_LOCK_GIANT() is performed. Unfortunately, this allows the vnode to be recycled between the release of the vm object's lock and the vget() on the vnode. In this revision, I prevent the vnode from being recycled by acquiring another reference to the vm object and underlying vnode before releasing the vm object's lock. This change also addresses another preexisting but trivial problem. By acquiring another reference to the vm object, I also prevent the vm object from being recycled. Previously, the "vnodes skipped" counter could be wrong if it examined a recycled vm object. Reported by: kib Reviewed by: kib Approved by: re (kensmith) MFC after: 3 weeks	2007-07-02 06:56:37 +00:00
Alan Cox	97824da382	Eliminate the use of Giant from vm_daemon(). Replace the unconditional use of Giant in vm_pageout_scan() with VFS_LOCK_GIANT(). Approved by: re (kensmith) MFC after: 3 weeks	2007-06-26 18:24:05 +00:00
Alan Cox	fe8606ac9e	Eliminate GIANT_REQUIRED from swap_pager_putpages(). Approved by: re (mux) MFC after: 1 week	2007-06-24 18:40:30 +00:00
Alan Cox	9e897b1bc6	Eliminate unnecessary checks from vm_pageout_clean(): The page that is passed to vm_pageout_clean() cannot possibly be PG_UNMANAGED because it came from the inactive queue and PG_UNMANAGED pages are not in any page queue. Moreover, PG_UNMANAGED pages only exist in OBJT_PHYS objects, and all pages within a OBJT_PHYS object are PG_UNMANAGED. So, if the page that is passed to vm_pageout_clean() is not PG_UNMANAGED, then it cannot be from an OBJT_PHYS object and its neighbors from the same object cannot themselves be PG_UNMANAGED. Reviewed by: tegge	2007-06-18 02:04:38 +00:00
Matt Jacob	0a49733cb9	Don't declare inline a function which isn't.	2007-06-17 04:19:05 +00:00
Matt Jacob	6bda842d77	Make sure object is NULL- there is a possible case where you could fall through to it being used w/o being set. Put a break in the default case.	2007-06-17 04:17:48 +00:00
Matt Jacob	9dae729081	Initialize reqpage to zero.	2007-06-17 04:14:27 +00:00
Alan Cox	bcc231ecb6	If attempting to cache a "busy" page, panic instead of printing a diagnostic message and returning.	2007-06-16 21:07:51 +00:00
Alan Cox	2f9f48d623	Update a comment.	2007-06-16 05:25:53 +00:00
Alan Cox	2446e4f02c	Enable the new physical memory allocator. This allocator uses a binary buddy system with a twist. First and foremost, this allocator is required to support the implementation of superpages. As a side effect, it enables a more robust implementation of contigmalloc(9). Moreover, this reimplementation of contigmalloc(9) eliminates the acquisition of Giant by contigmalloc(..., M_NOWAIT, ...). The twist is that this allocator tries to reduce the number of TLB misses incurred by accesses through a direct map to small, UMA-managed objects and page table pages. Roughly speaking, the physical pages that are allocated for such purposes are clustered together in the physical address space. The performance benefits vary. In the most extreme case, a uniprocessor kernel running on an Opteron, I measured an 18% reduction in system time during a buildworld. This allocator does not implement page coloring. The reason is that superpages have much the same effect. The contiguous physical memory allocation necessary for a superpage is inherently colored. Finally, the one caveat is that this allocator does not effectively support prezeroed pages. I hope this is temporary. On i386, this is a slight pessimization. However, on amd64, the beneficial effects of the direct-map optimization outweigh the ill effects. I speculate that this is true in general of machines with a direct map. Approved by: re	2007-06-16 04:57:06 +00:00
Alan Cox	d076fbea58	Eliminate dead code: We have not performed pageouts on the kernel object in this millennium.	2007-06-13 06:10:10 +00:00
Alan Cox	ad7a4c3acd	Conditionally acquire Giant in vm_contig_launder_page().	2007-06-11 03:20:16 +00:00
Attilio Rao	393a081d42	Optimize vmmeter locking. In particular: - Add an explanatory table for locking of struct vmmeter members - Apply new rules for some of those members - Remove some useless comments Heavily reviewed by: alc, bde, jeff Approved by: jeff (mentor)	2007-06-10 21:59:14 +00:00
Alan Cox	11752d88a2	Add a new physical memory allocator. However, do not yet connect it to the build. This allocator uses a binary buddy system with a twist. First and foremost, this allocator is required to support the implementation of superpages. As a side effect, it enables a more robust implementation of contigmalloc(9). Moreover, this reimplementation of contigmalloc(9) eliminates the acquisition of Giant by contigmalloc(..., M_NOWAIT, ...). The twist is that this allocator tries to reduce the number of TLB misses incurred by accesses through a direct map to small, UMA-managed objects and page table pages. Roughly speaking, the physical pages that are allocated for such purposes are clustered together in the physical address space. The performance benefits vary. In the most extreme case, a uniprocessor kernel running on an Opteron, I measured an 18% reduction in system time during a buildworld. This allocator does not implement page coloring. The reason is that superpages have much the same effect. The contiguous physical memory allocation necessary for a superpage is inherently colored. Finally, the one caveat is that this allocator does not effectively support prezeroed pages. I hope this is temporary. On i386, this is a slight pessimization. However, on amd64, the beneficial effects of the direct-map optimization outweigh the ill effects. I speculate that this is true in general of machines with a direct map. Approved by: re	2007-06-10 00:49:16 +00:00
Jeff Roberson	982d11f836	Commit 14/14 of sched_lock decomposition. - Use thread_lock() rather than sched_lock for per-thread scheduling synchronization. - Use the per-process spinlock rather than the sched_lock for per-process scheduling synchronization. Tested by: kris, current@ Tested on: i386, amd64, ULE, 4BSD, libthr, libkse, PREEMPTION, etc. Discussed with: kris, attilio, kmacy, jhb, julian, bde (small parts each)	2007-06-05 00:00:57 +00:00
Attilio Rao	b4b7081961	Do proper "locking" for missing vmmeters part. Now, we assume no more sched_lock protection for some of them and use the distributed loads method for vmmeter (distributed through CPUs). Reviewed by: alc, bde Approved by: jeff (mentor)	2007-06-04 21:45:18 +00:00
Attilio Rao	6759608248	Rework the PCPU_* (MD) interface: - Rename PCPU_LAZY_INC into PCPU_INC - Add the PCPU_ADD interface which just does an add on the pcpu member given a specific value. Note that for most architectures PCPU_INC and PCPU_ADD are not safe. This is a point that needs some discussions/work in the next days. Reviewed by: alc, bde Approved by: jeff (mentor)	2007-06-04 21:38:48 +00:00
Jeff Roberson	1c4bcd050a	- Move rusage from being per-process in struct pstats to per-thread in td_ru. This removes the requirement for per-process synchronization in statclock() and mi_switch(). This was previously supported by sched_lock which is going away. All modifications to rusage are now done in the context of the owning thread. Reads proceed without locks. - Aggregate an exiting thread's rusage in thread_exit() such that the exiting thread's rusage is not lost. - Provide a new routine, rufetch() to fetch an aggregate of all rusage structures from all threads in a process. This routine must be used in any place requiring a rusage from a process prior to its exit. The exited process's rusage is still available via p_ru. - Aggregate tick statistics only on demand via rufetch() or when a thread exits. Tick statistics are kept in the thread and protected by sched_lock until it exits. Initial patch by: attilio Reviewed by: attilio, bde (some objections), arch (mostly silent)	2007-06-01 01:12:45 +00:00
Attilio Rao	2feb50bf7d	Revert VMCNT_* operations introduction. Probably, a general approach is not the best solution here, so we should solve the sched_lock protection problems separately. Requested by: alc Approved by: jeff (mentor)	2007-05-31 22:52:15 +00:00
Konstantin Belousov	9e223287c0	Revert UF_OPENING workaround for CURRENT. Change the VOP_OPEN(), vn_open() vnode operation and d_fdopen() cdev operation argument from being file descriptor index into the pointer to struct file. Proposed and reviewed by: jhb Reviewed by: daichi (unionfs) Approved by: re (kensmith)	2007-05-31 11:51:53 +00:00
Attilio Rao	f9819486e5	Add functions sx_xlock_sig() and sx_slock_sig(). These functions perform the same actions as sx_xlock() and sx_slock(), except that they perform an interruptible sleep, so that the sleep can be interrupted by external events. To support these new features, some code restructuring is needed, but the external API won't be affected at all. Note: use a "void" cast for "int"-returning functions to keep tools like Coverity Prevent from whining. Requested by: rwatson Tested by: rwatson Reviewed by: jhb Approved by: jeff (mentor)	2007-05-31 09:14:48 +00:00
Alan Cox	cf4682ae23	Eliminate the reactivation of cached pages in vm_fault_prefault() and vm_map_pmap_enter() unless the caller is madvise(MADV_WILLNEED). With the exception of calls to vm_map_pmap_enter() from madvise(MADV_WILLNEED), vm_fault_prefault() and vm_map_pmap_enter() are both used to create speculative mappings. Thus, always reactivating cached pages is a mistake. In principle, cached pages should only be reactivated by an actual access. Otherwise, the following misbehavior can occur. On a hard fault for a text page the clustering algorithm fetches not only the required page but also several of the adjacent pages. Now, suppose that one or more of the adjacent pages are never accessed. Ultimately, these unused pages become cached pages through the efforts of the page daemon. However, the next activation of the executable reactivates and maps these unused pages. Consequently, they are never replaced. In effect, they become pinned in memory.	2007-05-22 04:45:59 +00:00
Jeff Roberson	80b200da28	- rename VMCNT_DEC to VMCNT_SUB to reflect the count argument. Suggested by: julian@ Contributed by: attilio@	2007-05-20 22:33:42 +00:00
Jeff Roberson	222d01951f	- define and use VMCNT_{GET,SET,ADD,SUB,PTR} macros for manipulating vmcnts. This can be used to abstract away pcpu details but also changes to use atomics for all counters now. This means sched lock is no longer responsible for protecting counts in the switch routines. Contributed by: Attilio Rao <attilio@FreeBSD.org>	2007-05-18 07:10:50 +00:00
Robert Watson	6ab3b958fc	Update stale comment on protecting UMA per-CPU caches: we now use critical sections rather than mutexes.	2007-05-09 22:53:34 +00:00
Alan Cox	04a18977c8	Define every architecture as either VM_PHYSSEG_DENSE or VM_PHYSSEG_SPARSE depending on whether the physical address space is densely or sparsely populated with memory. The effect of this definition is to determine which of two implementations of vm_page_array and PHYS_TO_VM_PAGE() is used. The legacy implementation is obtained by defining VM_PHYSSEG_DENSE, and a new implementation that trades off time for space is obtained by defining VM_PHYSSEG_SPARSE. For now, all architectures except for ia64 and sparc64 define VM_PHYSSEG_DENSE. Defining VM_PHYSSEG_SPARSE on ia64 allows the entirety of my Itanium 2's memory to be used. Previously, only the first 1 GB could be used. Defining VM_PHYSSEG_SPARSE on sparc64 allows USIIIi-based systems to boot without crashing. This change is a combination of Nathan Whitehorn's patch and my own work in perforce. Discussed with: kmacy, marius, Nathan Whitehorn PR: 112194	2007-05-05 19:50:28 +00:00
Alan Cox	17afe8befe	Remove some code from vmspace_fork() that became redundant after revision 1.334 modified _vm_map_init() to initialize the new vm map's flags to zero.	2007-04-26 05:48:17 +00:00
Robert Watson	d9135e724e	Audit pathnames looked up in swapon(2) and swapoff(2). MFC after: 2 weeks Obtained from: TrustedBSD Project	2007-04-23 14:41:34 +00:00
Alan Cox	f40fd96d5b	Correct contigmalloc2()'s implementation of M_ZERO. Specifically, contigmalloc2() was always testing the first physical page for PG_ZERO, not the current page of interest. Submitted by: Michael Plass PR: 81301 MFC after: 1 week	2007-04-19 05:39:54 +00:00
Alan Cox	a96d395ba1	Correct two comments. Submitted by: Michael Plass	2007-04-19 04:52:47 +00:00
Giorgos Keramidas	a52da38f26	Minor typo fix, noticed while I was going through *_pager.c files.	2007-04-10 12:34:51 +00:00
Pawel Jakub Dawidek	0f2c2ce0a3	When KVA is exhausted, try the vm_lowmem event for the last time before panicking. This helps a lot in ZFS stability.	2007-04-05 20:52:51 +00:00
Pawel Jakub Dawidek	fcdd9721e4	Fix a problem for file systems that don't implement VOP_BMAP() operation. The problem is this: vm_fault_additional_pages() calls vm_pager_has_page(), which calls vnode_pager_haspage(). Now when VOP_BMAP() returns an error (e.g. EOPNOTSUPP), vnode_pager_haspage() returns TRUE without initializing 'before' and 'after' arguments, so we have some accidental values there. This basically was causing this condition to be met: if ((rahead + rbehind) > ((cnt.v_free_count + cnt.v_cache_count) - cnt.v_free_reserved)) { pagedaemon_wakeup(); [...] } (we have some random values in rahead and rbehind variables) I'm not entirely sure this is the right fix, maybe we should just return FALSE in vnode_pager_haspage() when VOP_BMAP() fails? alc@ knows about this problem, maybe he will be able to come up with a better fix if this is not the right one.	2007-04-05 20:49:46 +00:00
Alan Cox	19c244d064	Prevent a race between vm_object_collapse() and vm_object_split() from causing a crash. Suppose that we have two objects, obj and backing_obj, where backing_obj is obj's backing object. Further, suppose that backing_obj has a reference count of two. One being the reference held by obj and the other by a map entry. Now, suppose that the map entry is deallocated and its reference removed by vm_object_deallocate(). vm_object_deallocate() recognizes that the only remaining reference is from a shadow object, obj, and calls vm_object_collapse() on obj. vm_object_collapse() executes if (backing_object->ref_count == 1) { /* * If there is exactly one reference to the backing * object, we can collapse it into the parent. */ vm_object_backing_scan(object, OBSC_COLLAPSE_WAIT); vm_object_backing_scan(OBSC_COLLAPSE_WAIT) executes if (op & OBSC_COLLAPSE_WAIT) { vm_object_set_flag(backing_object, OBJ_DEAD); } Finally, suppose that either vm_object_backing_scan() or vm_object_collapse() sleeps releasing its locks. At this instant, another thread executes vm_object_split(). It crashes in vm_object_reference_locked() on the assertion that the object is not dead. If, however, assertions are not enabled, it crashes much later, after the object has been recycled, in vm_object_deallocate() because the shadow count and shadow list are inconsistent. Reviewed by: tegge Reported by: jhb MFC after: 1 week	2007-03-27 08:55:17 +00:00
Alan Cox	8fece8c367	Two small changes to vm_map_pmap_enter(): 1) Eliminate an unnecessary check for fictitious pages. Specifically, only device-backed objects contain fictitious pages and the object is not device-backed. 2) Change the types of "psize" and "tmpidx" to vm_pindex_t in order to prevent possible wrap around with extremely large maps and objects, respectively. Observed by: tegge (last summer)	2007-03-25 19:33:40 +00:00
Alan Cox	768131d293	vm_page_busy() no longer requires the page queues lock to be held. Reduce the scope of the page queues lock in vm_fault() accordingly.	2007-03-23 06:11:25 +00:00
Alan Cox	c5474b8f18	Change the order of lock reacquisition in vm_object_split() in order to simplify the code slightly. Add a comment concerning lock ordering.	2007-03-22 07:02:43 +00:00
Alan Cox	d8810d894d	Use PCPU_LAZY_INC() to update page fault statistics.	2007-03-05 18:55:14 +00:00
John Baldwin	8db5fc58ff	Use pause() in vm_object_deallocate() to yield the CPU to the lock holder rather than a tsleep() on &proc0. The only wakeup on &proc0 is intended to awaken the swapper, not random threads blocked in vm_object_deallocate().	2007-02-27 19:40:26 +00:00
John Baldwin	4d70511ac3	Use pause() rather than tsleep() on stack variables and function pointers.	2007-02-27 17:23:29 +00:00
Alan Cox	9f5c801b94	Change the way that unmanaged pages are created. Specifically, immediately flag any page that is allocated to an OBJT_PHYS object as unmanaged in vm_page_alloc() rather than waiting for a later call to vm_page_unmanage(). This allows for the elimination of some uses of the page queues lock. Change the type of the kernel and kmem objects from OBJT_DEFAULT to OBJT_PHYS. This allows us to take advantage of the above change to simplify the allocation of unmanaged pages in kmem_alloc() and kmem_malloc(). Remove vm_page_unmanage(). It is no longer used.	2007-02-25 06:14:58 +00:00
Alan Cox	0cd31a0d75	Change the page's CLEANCHK flag from being a page queue mutex synchronized flag to a vm object mutex synchronized flag.	2007-02-22 06:15:52 +00:00
Alan Cox	711585d087	Enable vm_page_free() and vm_page_free_zero() to be called on some pages without the page queues lock being held, specifically, pages that are not contained in a vm object and not a member of a page queue.	2007-02-18 05:54:42 +00:00
Alan Cox	ba000fb2c1	Remove a stale comment. Add punctuation to a nearby comment.	2007-02-17 19:37:00 +00:00
Alan Cox	d3d029bd62	Relax the page queue lock assertions in vm_page_remove() and vm_page_free_toq() to account for recent changes that allow vm_page_free_toq() to be called on some pages without the page queues lock being held, specifically, pages that are not contained in a vm object and not a member of a page queue. (Examples of such pages include page table pages, pv entry pages, and uma small alloc pages.)	2007-02-15 05:43:38 +00:00
Alan Cox	7d60988bad	Avoid the unnecessary acquisition of the free page queues lock when a page is actually being added to the hold queue, not the free queue. At the same time, avoid unnecessary tests to wake up threads waiting for free memory and the idle thread that zeroes free pages. (These tests will be performed later when the page finally moves from the hold queue to the free queue.)	2007-02-14 07:05:55 +00:00
Robert Watson	1e319f6db3	Add uma_set_align() interface, which will be called at most once during boot by MD code to indicate the detected alignment preference. Rather than cache alignment being encoded in UMA consumers by defining a global alignment value of (16 - 1) in UMA_ALIGN_CACHE, UMA_ALIGN_CACHE is now a special value (-1) that causes UMA to look at registered alignment. If no preferred alignment has been selected by MD code, a default alignment of (16 - 1) will be used. Currently, no hardware platforms specify alignment; architecture maintainers will need to modify MD startup code to specify an alignment if desired. This must occur before initialization of UMA so that all UMA zones pick up the requested alignment. Reviewed by: jeff, alc Submitted by: attilio	2007-02-11 20:13:52 +00:00
Alan Cox	5351a2488a	Use the free page queue mutex instead of the page queue mutex to synchronize sleeping and waking of the zero idle thread.	2007-02-11 05:18:40 +00:00
John Baldwin	e8865caffb	- Move 'struct swdevt' back into swap_pager.h and expose it to userland. - Restore support for fetching swap information from crash dumps via kvm_get_swapinfo(3) to fix pstat -T/-s on crash dumps. Reviewed by: arch@, phk MFC after: 1 week	2007-02-07 17:43:11 +00:00
Alan Cox	e9f995d824	Change the pagedaemon, vm_wait(), and vm_waitpfault() to sleep on the vm page queue free mutex instead of the vm page queue mutex.	2007-02-07 06:37:30 +00:00
Alan Cox	3ae3919d0b	Change the free page queue lock from a spin mutex to a default (blocking) mutex. With the demise of Alpha support, there is no longer a reason for it to be a spin mutex.	2007-02-05 06:02:55 +00:00
Mohan Srinivasan	6c125b8df6	Fix for problems that occur when all mbuf clusters migrate to the mbuf packet zone. Cluster allocations fail when this happens. Also processes that may have blocked on cluster allocations will never be woken up. Thanks to rwatson for an overview of the issue and pointers to the mbuma paper and his tool to dump out UMA zones. Reviewed by: andre@	2007-01-25 01:05:23 +00:00
Mohan Srinivasan	7738029183	Fix for a bug where only one process (of multiple) blocked on maxpages on a zone is woken up, with the rest never being woken up as a result of the ZFLAG_FULL flag being cleared. Wake up all such blocked processes instead. This change introduces a thundering herd, but since this should be relatively infrequent, optimizing this (by introducing a count of blocked processes, for example) may be premature. Reviewed by: ups@	2007-01-24 22:49:11 +00:00
Jeff Roberson	f0393f063a	- Remove setrunqueue and replace it with direct calls to sched_add(). setrunqueue() was mostly empty. The few asserts and thread state setting were moved to the individual schedulers. sched_add() was chosen to displace it for naming consistency reasons. - Remove adjustrunqueue, it was 4 lines of code that was ifdef'd to be different on all three schedulers where it was only called in one place each. - Remove the long ifdef'd out remrunqueue code. - Remove the now redundant ts_state. Inspect the thread state directly. - Don't set TSF_* flags from kern_switch.c, we were only doing this to support a feature in one scheduler. - Change sched_choose() to return a thread rather than a td_sched. Also, rely on the schedulers to return the idlethread. This simplifies the logic in choosethread(). Aside from the run queue links kern_switch.c mostly does not care about the contents of td_sched. Discussed with: julian - Move the idle thread loop into the per scheduler area. ULE wants to do something different from the other schedulers. Suggested by: jhb Tested on: x86/amd64 sched_{4BSD, ULE, CORE}.	2007-01-23 08:46:51 +00:00
Xin LI	f67af5c918	Use FOREACH_PROC_IN_SYSTEM instead of using its unrolled form.	2007-01-17 15:05:52 +00:00
Robert Watson	635fd50514	Remove uma_zalloc_arg() hack, which coerced M_WAITOK to M_NOWAIT when allocations were made using improper flags in interrupt context. Replace with a simple WITNESS warning call. This restores the invariant that M_WAITOK allocations will always succeed or die horribly trying, which is relied on by many UMA consumers. MFC after: 3 weeks Discussed with: jhb	2007-01-10 21:04:43 +00:00
Alan Cox	e6eaadba43	Declare the map entry created by kmem_init() for the range from VM_MIN_KERNEL_ADDRESS to the end of the kernel's bootstrap data as MAP_NOFAULT.	2007-01-07 07:32:04 +00:00
John Baldwin	663b416f16	- Add a new function uma_zone_exhausted() to see if a zone is full. - Add a printf in swp_pager_meta_build() to warn if the swapzone becomes exhausted so that there's at least a warning before a box that runs out of swapzone space before running out of swap space deadlocks. MFC after: 1 week Reviewed by: alc	2007-01-05 19:09:01 +00:00
Alan Cox	73000556e8	Optimize vm_object_split(). Specifically, make the number of iterations equal to the number of physical pages that are renamed to the new object rather than the new object's virtual size.	2006-12-17 20:14:43 +00:00
Alan Cox	95442adf05	Simplify the computation of the new object's size in vm_object_split().	2006-12-16 08:17:07 +00:00
Kip Macy	35d10226b7	Remove the requirement that phys_avail be sorted in ascending order by explicitly finding the lowest and highest addresses when calculating the size of the vm_pages array. Reviewed by: alc	2006-12-08 08:44:47 +00:00
Julian Elischer	ad1e7d285a	Threading cleanup.. part 2 of several. Make part of John Birrell's KSE patch permanent.. Specifically, remove: Any reference of the ksegrp structure. This feature was never fully utilised and made things overly complicated. All code in the scheduler that tried to make threaded programs fair to unthreaded programs. Libpthread processes will already do this to some extent and libthr processes already disable it. Also: Since this makes such a big change to the scheduler(s), take the opportunity to rename some structures and elements that had to be moved anyhow. This makes the code a lot more readable. The ULE scheduler compiles again but I have no idea if it works. The 4bsd scheduler still requires a little cleaning and some functions that now do ALMOST nothing will go away, but I thought I'd do that as a separate commit. Tested by David Xu, and Dan Eischen using libthr and libpthread.	2006-12-06 06:34:57 +00:00
Ruslan Ermilov	9bed18a493	The clean_map has been made local to vm_init.c long ago.	2006-11-20 16:23:34 +00:00
Ruslan Ermilov	ef1b7c4804	Remove a redundant pointer-type variable.	2006-11-20 08:33:55 +00:00

2434 Commits