freebsd-nq

Author	SHA1	Message	Date
Alan Cox	4cd457233b	Correct an assertion in vm_pageout_flush(). Specifically, if a page's status after vm_pager_put_pages() is VM_PAGER_PEND, then it could have already been recycled, i.e., freed and reallocated to a new purpose; thus, asserting that such pages cannot be written is inappropriate. Reported by: kris Submitted by: tegge Approved by: re (kensmith) MFC after: 1 week	2007-09-15 18:30:28 +00:00
Konstantin Belousov	d239bd3ccc	Do not drop vm_map lock between doing vm_map_remove() and vm_map_insert(). For this, introduce vm_map_fixed() that does that for MAP_FIXED case. Dropping the lock allowed for parallel thread to occupy the freed space. Reported by: Tijl Coosemans <tijl ulyssis org> Reviewed by: alc Approved by: re (kensmith) MFC after: 2 weeks	2007-08-20 12:05:45 +00:00
Konstantin Belousov	daab56673e	Remove comment that is no longer quite true. Noted by: alc Approved by: re (kensmith)	2007-08-18 16:41:31 +00:00
Konstantin Belousov	efe7553ed7	Fix the phys_pager in the way similar to the rev. 1.83 of the sys/vm/device_pager.c: Protect the creation of the phys pager with non-NULL handle with the phys_pager_mtx. Lookup of phys pager in the pagers list by handle is now synchronized with its removal from the list, and phys_pager_mtx is put before vm object lock in lock order. Dispose the phys_pager_alloc_lock and tsleep calls, together with acquiring Giant, since phys_pager_mtx now covers the same block. Reviewed by: alc Approved by: re (kensmith)	2007-08-18 16:40:33 +00:00
Konstantin Belousov	deea654ebf	Protect the creation of the device pager with the dev_pager_mtx. Lookup of device pager in the pagers list by handle is now synchronized with its removal from the list, and dev_pager_mtx is put before vm object lock in lock order. Dispose the dev_pager_sx lock, since dev_pager_mtx now covers the same block. Noted by: kensmith Reviewed by: alc Approved by: re (kensmith)	2007-08-07 15:36:25 +00:00
Alan Cox	b5e8f167b9	Consider a scenario in which one processor, call it Pt, is performing vm_object_terminate() on a device-backed object at the same time that another processor, call it Pa, is performing dev_pager_alloc() on the same device. The problem is that vm_pager_object_lookup() should not be allowed to return a doomed object, i.e., an object with OBJ_DEAD set, but it does. In detail, the unfortunate sequence of events is: Pt in vm_object_terminate() holds the doomed object's lock and sets OBJ_DEAD on the object. Pa in dev_pager_alloc() holds dev_pager_sx and calls vm_pager_object_lookup(), which returns the doomed object. Next, Pa calls vm_object_reference(), which requires the doomed object's lock, so Pa waits for Pt to release the doomed object's lock. Pt proceeds to the point in vm_object_terminate() where it releases the doomed object's lock. Pa is now able to complete vm_object_reference() because it can now complete the acquisition of the doomed object's lock. So, now the doomed object has a reference count of one! Pa releases dev_pager_sx and returns the doomed object from dev_pager_alloc(). Pt now acquires dev_pager_mtx, removes the doomed object from dev_pager_object_list, releases dev_pager_mtx, and finally calls uma_zfree with the doomed object. However, the doomed object is still in use by Pa. Repeating my key point, vm_pager_object_lookup() must not return a doomed object. Moreover, the test for the object's state, i.e., doomed or not, and the increment of the object's reference count should be carried out atomically. Reviewed by: kib Approved by: re (kensmith) MFC after: 3 weeks	2007-08-05 21:04:32 +00:00
Konstantin Belousov	c6199d59e3	Do not acquire Giant unconditionally around the calls to the cdevsw d_mmap methods. prep_cdevsw() already installs the shims that acquire/drop Giant for the methods of a driver that specified the D_NEEDGIANT flag. Reviewed by: alc Approved by: re (kensmith)	2007-08-05 05:40:52 +00:00
Alan Cox	eaa29f1ce4	Add a counter for the total number of pages cached and support for reporting the value of this counter in the program "vmstat". Approved by: re (rwatson)	2007-07-27 20:01:22 +00:00
Pawel Jakub Dawidek	57fd3d5572	When we do open, we should lock the vnode exclusively. This fixes few races: - fifo race, where two threads assign v_fifoinfo, - v_writecount modifications, - v_object modifications, - and probably more... Discussed with: kib, ups Approved by: re (rwatson)	2007-07-26 16:58:09 +00:00
Alan Cox	806453645a	Two changes to vm_fault_additional_pages(): 1. Rewrite the backward scan. Specifically, reverse the order in which pages are allocated so that upon failure it is never necessary to free pages that were just allocated. Moreover, any allocated pages can be put to use. This makes the backward scan behave just like the forward scan. 2. Eliminate an explicit, unsynchronized check for low memory before calling vm_page_alloc(). It serves no useful purpose. It is, in effect, optimizing the uncommon case at the expense of the common case. Approved by: re (hrs) MFC after: 3 weeks	2007-07-20 06:55:11 +00:00
Alan Cox	8941dc4471	Eliminate two unused functions: vm_phys_alloc_pages() and vm_phys_free_pages(). Rename vm_phys_alloc_pages_locked() to vm_phys_alloc_pages() and vm_phys_free_pages_locked() to vm_phys_free_pages(). Add comments regarding the need for the free page queues lock to be held by callers to these functions. No functional changes. Approved by: re (hrs)	2007-07-14 21:21:17 +00:00
Alan Cox	bd06ab2f60	Eliminate dead code, specifically, an unused sysctl: "vm.idlezero_maxrun". Approved by: re (hrs)	2007-07-14 19:00:44 +00:00
Alan Cox	0f752392c6	Update a comment describing the page queues. Approved by: re (hrs)	2007-07-13 04:42:20 +00:00
Alan Cox	e99a797492	Eliminate dead code. Approved by: re (hrs)	2007-07-12 22:23:28 +00:00
Alan Cox	20dd22a24e	Correct a problem in the ZERO_COPY_SOCKETS option, specifically, in vm_page_cowfault(). Initially, if vm_page_cowfault() sleeps, the given page is wired, preventing it from being recycled. However, when transmission of the page completes, the page is unwired and returned to the page queues. At that point, the page is not in any special state that prevents it from being recycled. Consequently, vm_page_cowfault() should verify that the page is still held by the same vm object before retrying the replacement of the page. Note: The containing object is, however, safe from being recycled by virtue of having a non-zero paging-in-progress count. While I'm here, add some assertions and comments. Approved by: re (rwatson) MFC After: 3 weeks	2007-07-10 18:41:34 +00:00
Alan Cox	d1974c0df1	Eliminate the special case handling of OBJT_DEVICE objects in vm_fault_additional_pages() that was introduced in revision 1.47. Then as now, it is unnecessary because dev_pager_haspage() returns zero for both the number of pages to read ahead and read behind, producing the same exact behavior by vm_fault_additional_pages() as the special case handling. Approved by: re (rwatson)	2007-07-08 19:42:52 +00:00
Alan Cox	65ea29a690	When a cached page is reactivated in vm_fault(), update the counter that tracks the total number of reactivated pages. (We have not been counting reactivations by vm_fault() since revision 1.46.) Correct a comment in vm_fault_additional_pages(). Approved by: re (kensmith) MFC after: 1 week	2007-07-06 21:25:21 +00:00
Peter Wemm	c2815ad564	Add freebsd6_ wrappers for mmap/lseek/pread/pwrite/truncate/ftruncate Approved by: re (kensmith)	2007-07-04 22:57:21 +00:00
Alan Cox	14137dc045	In the previous revision, when I replaced the unconditional acquisition of Giant in vm_pageout_scan() with VFS_LOCK_GIANT(), I had to eliminate the acquisition of the vnode interlock before releasing the vm object's lock because the vnode interlock cannot be held when VFS_LOCK_GIANT() is performed. Unfortunately, this allows the vnode to be recycled between the release of the vm object's lock and the vget() on the vnode. In this revision, I prevent the vnode from being recycled by acquiring another reference to the vm object and underlying vnode before releasing the vm object's lock. This change also addresses another preexisting but trivial problem. By acquiring another reference to the vm object, I also prevent the vm object from being recycled. Previously, the "vnodes skipped" counter could be wrong because if it examined a recycled vm object. Reported by: kib Reviewed by: kib Approved by: re (kensmith) MFC after: 3 weeks	2007-07-02 06:56:37 +00:00
Alan Cox	97824da382	Eliminate the use of Giant from vm_daemon(). Replace the unconditional use of Giant in vm_pageout_scan() with VFS_LOCK_GIANT(). Approved by: re (kensmith) MFC after: 3 weeks	2007-06-26 18:24:05 +00:00
Alan Cox	fe8606ac9e	Eliminate GIANT_REQUIRED from swap_pager_putpages(). Approved by: re (mux) MFC after: 1 week	2007-06-24 18:40:30 +00:00
Alan Cox	9e897b1bc6	Eliminate unnecessary checks from vm_pageout_clean(): The page that is passed to vm_pageout_clean() cannot possibly be PG_UNMANAGED because it came from the inactive queue and PG_UNMANAGED pages are not in any page queue. Moreover, PG_UNMANAGED pages only exist in OBJT_PHYS objects, and all pages within a OBJT_PHYS object are PG_UNMANAGED. So, if the page that is passed to vm_pageout_clean() is not PG_UNMANAGED, then it cannot be from an OBJT_PHYS object and its neighbors from the same object cannot themselves be PG_UNMANAGED. Reviewed by: tegge	2007-06-18 02:04:38 +00:00
Matt Jacob	0a49733cb9	Don't declare inline a function which isn't.	2007-06-17 04:19:05 +00:00
Matt Jacob	6bda842d77	Make sure object is NULL- there is a possible case where you could fall through to it being used w/o being set. Put a break in the default case.	2007-06-17 04:17:48 +00:00
Matt Jacob	9dae729081	Initialize reqpage to zero.	2007-06-17 04:14:27 +00:00
Alan Cox	bcc231ecb6	If attempting to cache a "busy", panic instead of printing a diagnostic message and returning.	2007-06-16 21:07:51 +00:00
Alan Cox	2f9f48d623	Update a comment.	2007-06-16 05:25:53 +00:00
Alan Cox	2446e4f02c	Enable the new physical memory allocator. This allocator uses a binary buddy system with a twist. First and foremost, this allocator is required to support the implementation of superpages. As a side effect, it enables a more robust implementation of contigmalloc(9). Moreover, this reimplementation of contigmalloc(9) eliminates the acquisition of Giant by contigmalloc(..., M_NOWAIT, ...). The twist is that this allocator tries to reduce the number of TLB misses incurred by accesses through a direct map to small, UMA-managed objects and page table pages. Roughly speaking, the physical pages that are allocated for such purposes are clustered together in the physical address space. The performance benefits vary. In the most extreme case, a uniprocessor kernel running on an Opteron, I measured an 18% reduction in system time during a buildworld. This allocator does not implement page coloring. The reason is that superpages have much the same effect. The contiguous physical memory allocation necessary for a superpage is inherently colored. Finally, the one caveat is that this allocator does not effectively support prezeroed pages. I hope this is temporary. On i386, this is a slight pessimization. However, on amd64, the beneficial effects of the direct-map optimization outweigh the ill effects. I speculate that this is true in general of machines with a direct map. Approved by: re	2007-06-16 04:57:06 +00:00
Alan Cox	d076fbea58	Eliminate dead code: We have not performed pageouts on the kernel object in this millenium.	2007-06-13 06:10:10 +00:00
Alan Cox	ad7a4c3acd	Conditionally acquire Giant in vm_contig_launder_page().	2007-06-11 03:20:16 +00:00
Attilio Rao	393a081d42	Optimize vmmeter locking. In particular: - Add an explicative table for locking of struct vmmeter members - Apply new rules for some of those members - Remove some unuseful comments Heavily reviewed by: alc, bde, jeff Approved by: jeff (mentor)	2007-06-10 21:59:14 +00:00
Alan Cox	11752d88a2	Add a new physical memory allocator. However, do not yet connect it to the build. This allocator uses a binary buddy system with a twist. First and foremost, this allocator is required to support the implementation of superpages. As a side effect, it enables a more robust implementation of contigmalloc(9). Moreover, this reimplementation of contigmalloc(9) eliminates the acquisition of Giant by contigmalloc(..., M_NOWAIT, ...). The twist is that this allocator tries to reduce the number of TLB misses incurred by accesses through a direct map to small, UMA-managed objects and page table pages. Roughly speaking, the physical pages that are allocated for such purposes are clustered together in the physical address space. The performance benefits vary. In the most extreme case, a uniprocessor kernel running on an Opteron, I measured an 18% reduction in system time during a buildworld. This allocator does not implement page coloring. The reason is that superpages have much the same effect. The contiguous physical memory allocation necessary for a superpage is inherently colored. Finally, the one caveat is that this allocator does not effectively support prezeroed pages. I hope this is temporary. On i386, this is a slight pessimization. However, on amd64, the beneficial effects of the direct-map optimization outweigh the ill effects. I speculate that this is true in general of machines with a direct map. Approved by: re	2007-06-10 00:49:16 +00:00
Jeff Roberson	982d11f836	Commit 14/14 of sched_lock decomposition. - Use thread_lock() rather than sched_lock for per-thread scheduling sychronization. - Use the per-process spinlock rather than the sched_lock for per-process scheduling synchronization. Tested by: kris, current@ Tested on: i386, amd64, ULE, 4BSD, libthr, libkse, PREEMPTION, etc. Discussed with: kris, attilio, kmacy, jhb, julian, bde (small parts each)	2007-06-05 00:00:57 +00:00
Attilio Rao	b4b7081961	Do proper "locking" for missing vmmeters part. Now, we assume no more sched_lock protection for some of them and use the distribuited loads method for vmmeter (distribuited through CPUs). Reviewed by: alc, bde Approved by: jeff (mentor)	2007-06-04 21:45:18 +00:00
Attilio Rao	6759608248	Rework the PCPU_* (MD) interface: - Rename PCPU_LAZY_INC into PCPU_INC - Add the PCPU_ADD interface which just does an add on the pcpu member given a specific value. Note that for most architectures PCPU_INC and PCPU_ADD are not safe. This is a point that needs some discussions/work in the next days. Reviewed by: alc, bde Approved by: jeff (mentor)	2007-06-04 21:38:48 +00:00
Jeff Roberson	1c4bcd050a	- Move rusage from being per-process in struct pstats to per-thread in td_ru. This removes the requirement for per-process synchronization in statclock() and mi_switch(). This was previously supported by sched_lock which is going away. All modifications to rusage are now done in the context of the owning thread. reads proceed without locks. - Aggregate exiting threads rusage in thread_exit() such that the exiting thread's rusage is not lost. - Provide a new routine, rufetch() to fetch an aggregate of all rusage structures from all threads in a process. This routine must be used in any place requiring a rusage from a process prior to it's exit. The exited process's rusage is still available via p_ru. - Aggregate tick statistics only on demand via rufetch() or when a thread exits. Tick statistics are kept in the thread and protected by sched_lock until it exits. Initial patch by: attilio Reviewed by: attilio, bde (some objections), arch (mostly silent)	2007-06-01 01:12:45 +00:00
Attilio Rao	2feb50bf7d	Revert VMCNT_* operations introduction. Probabilly, a general approach is not the better solution here, so we should solve the sched_lock protection problems separately. Requested by: alc Approved by: jeff (mentor)	2007-05-31 22:52:15 +00:00
Konstantin Belousov	9e223287c0	Revert UF_OPENING workaround for CURRENT. Change the VOP_OPEN(), vn_open() vnode operation and d_fdopen() cdev operation argument from being file descriptor index into the pointer to struct file. Proposed and reviewed by: jhb Reviewed by: daichi (unionfs) Approved by: re (kensmith)	2007-05-31 11:51:53 +00:00
Attilio Rao	f9819486e5	Add functions sx_xlock_sig() and sx_slock_sig(). These functions are intended to do the same actions of sx_xlock() and sx_slock() but with the difference to perform an interruptible sleep, so that sleep can be interrupted by external events. In order to support these new featueres, some code renstruction is needed, but external API won't be affected at all. Note: use "void" cast for "int" returning functions in order to avoid tools like Coverity prevents to whine. Requested by: rwatson Tested by: rwatson Reviewed by: jhb Approved by: jeff (mentor)	2007-05-31 09:14:48 +00:00
Alan Cox	cf4682ae23	Eliminate the reactivation of cached pages in vm_fault_prefault() and vm_map_pmap_enter() unless the caller is madvise(MADV_WILLNEED). With the exception of calls to vm_map_pmap_enter() from madvise(MADV_WILLNEED), vm_fault_prefault() and vm_map_pmap_enter() are both used to create speculative mappings. Thus, always reactivating cached pages is a mistake. In principle, cached pages should only be reactivated by an actual access. Otherwise, the following misbehavior can occur. On a hard fault for a text page the clustering algorithm fetches not only the required page but also several of the adjacent pages. Now, suppose that one or more of the adjacent pages are never accessed. Ultimately, these unused pages become cached pages through the efforts of the page daemon. However, the next activation of the executable reactivates and maps these unused pages. Consequently, they are never replaced. In effect, they become pinned in memory.	2007-05-22 04:45:59 +00:00
Jeff Roberson	80b200da28	- rename VMCNT_DEC to VMCNT_SUB to reflect the count argument. Suggested by: julian@ Contributed by: attilio@	2007-05-20 22:33:42 +00:00
Jeff Roberson	222d01951f	- define and use VMCNT_{GET,SET,ADD,SUB,PTR} macros for manipulating vmcnts. This can be used to abstract away pcpu details but also changes to use atomics for all counters now. This means sched lock is no longer responsible for protecting counts in the switch routines. Contributed by: Attilio Rao <attilio@FreeBSD.org>	2007-05-18 07:10:50 +00:00
Robert Watson	6ab3b958fc	Update stale comment on protecting UMA per-CPU caches: we now use critical sections rather than mutexes.	2007-05-09 22:53:34 +00:00
Alan Cox	04a18977c8	Define every architecture as either VM_PHYSSEG_DENSE or VM_PHYSSEG_SPARSE depending on whether the physical address space is densely or sparsely populated with memory. The effect of this definition is to determine which of two implementations of vm_page_array and PHYS_TO_VM_PAGE() is used. The legacy implementation is obtained by defining VM_PHYSSEG_DENSE, and a new implementation that trades off time for space is obtained by defining VM_PHYSSEG_SPARSE. For now, all architectures except for ia64 and sparc64 define VM_PHYSSEG_DENSE. Defining VM_PHYSSEG_SPARSE on ia64 allows the entirety of my Itanium 2's memory to be used. Previously, only the first 1 GB could be used. Defining VM_PHYSSEG_SPARSE on sparc64 allows USIIIi-based systems to boot without crashing. This change is a combination of Nathan Whitehorn's patch and my own work in perforce. Discussed with: kmacy, marius, Nathan Whitehorn PR: 112194	2007-05-05 19:50:28 +00:00
Alan Cox	17afe8befe	Remove some code from vmspace_fork() that became redundant after revision 1.334 modified _vm_map_init() to initialize the new vm map's flags to zero.	2007-04-26 05:48:17 +00:00
Robert Watson	d9135e724e	Audit pathnames looked up in swapon(2) and swapoff(2). MFC after: 2 weeks Obtained from: TrustedBSD Project	2007-04-23 14:41:34 +00:00
Alan Cox	f40fd96d5b	Correct contigmalloc2()'s implementation of M_ZERO. Specifically, contigmalloc2() was always testing the first physical page for PG_ZERO, not the current page of interest. Submitted by: Michael Plass PR: 81301 MFC after: 1 week	2007-04-19 05:39:54 +00:00
Alan Cox	a96d395ba1	Correct two comments. Submitted by: Michael Plass	2007-04-19 04:52:47 +00:00
Giorgos Keramidas	a52da38f26	Minor typo fix, noticed while I was going through *_pager.c files.	2007-04-10 12:34:51 +00:00
Pawel Jakub Dawidek	0f2c2ce0a3	When KVA is exhausted, try the vm_lowmem event for the last time before panicing. This helps a lot in ZFS stability.	2007-04-05 20:52:51 +00:00
Pawel Jakub Dawidek	fcdd9721e4	Fix a problem for file systems that don't implement VOP_BMAP() operation. The problem is this: vm_fault_additional_pages() calls vm_pager_has_page(), which calls vnode_pager_haspage(). Now when VOP_BMAP() returns an error (eg. EOPNOTSUPP), vnode_pager_haspage() returns TRUE without initializing 'before' and 'after' arguments, so we have some accidental values there. This bascially was causing this condition to be meet: if ((rahead + rbehind) > ((cnt.v_free_count + cnt.v_cache_count) - cnt.v_free_reserved)) { pagedaemon_wakeup(); [...] } (we have some random values in rahead and rbehind variables) I'm not entirely sure this is the right fix, maybe we should just return FALSE in vnode_pager_haspage() when VOP_BMAP() fails? alc@ knows about this problem, maybe he will be able to come up with a better fix if this is not the right one.	2007-04-05 20:49:46 +00:00
Alan Cox	19c244d064	Prevent a race between vm_object_collapse() and vm_object_split() from causing a crash. Suppose that we have two objects, obj and backing_obj, where backing_obj is obj's backing object. Further, suppose that backing_obj has a reference count of two. One being the reference held by obj and the other by a map entry. Now, suppose that the map entry is deallocated and its reference removed by vm_object_deallocate(). vm_object_deallocate() recognizes that the only remaining reference is from a shadow object, obj, and calls vm_object_collapse() on obj. vm_object_collapse() executes if (backing_object->ref_count == 1) { /* * If there is exactly one reference to the backing * object, we can collapse it into the parent. */ vm_object_backing_scan(object, OBSC_COLLAPSE_WAIT); vm_object_backing_scan(OBSC_COLLAPSE_WAIT) executes if (op & OBSC_COLLAPSE_WAIT) { vm_object_set_flag(backing_object, OBJ_DEAD); } Finally, suppose that either vm_object_backing_scan() or vm_object_collapse() sleeps releasing its locks. At this instant, another thread executes vm_object_split(). It crashes in vm_object_reference_locked() on the assertion that the object is not dead. If, however, assertions are not enabled, it crashes much later, after the object has been recycled, in vm_object_deallocate() because the shadow count and shadow list are inconsistent. Reviewed by: tegge Reported by: jhb MFC after: 1 week	2007-03-27 08:55:17 +00:00
Alan Cox	8fece8c367	Two small changes to vm_map_pmap_enter(): 1) Eliminate an unnecessary check for fictitious pages. Specifically, only device-backed objects contain fictitious pages and the object is not device-backed. 2) Change the types of "psize" and "tmpidx" to vm_pindex_t in order to prevent possible wrap around with extremely large maps and objects, respectively. Observed by: tegge (last summer)	2007-03-25 19:33:40 +00:00
Alan Cox	768131d293	vm_page_busy() no longer requires the page queues lock to be held. Reduce the scope of the page queues lock in vm_fault() accordingly.	2007-03-23 06:11:25 +00:00
Alan Cox	c5474b8f18	Change the order of lock reacquisition in vm_object_split() in order to simplify the code slightly. Add a comment concerning lock ordering.	2007-03-22 07:02:43 +00:00
Alan Cox	d8810d894d	Use PCPU_LAZY_INC() to update page fault statistics.	2007-03-05 18:55:14 +00:00
John Baldwin	8db5fc58ff	Use pause() in vm_object_deallocate() to yield the CPU to the lock holder rather than a tsleep() on &proc0. The only wakeup on &proc0 is intended to awaken the swapper, not random threads blocked in vm_object_deallocate().	2007-02-27 19:40:26 +00:00
John Baldwin	4d70511ac3	Use pause() rather than tsleep() on stack variables and function pointers.	2007-02-27 17:23:29 +00:00
Alan Cox	9f5c801b94	Change the way that unmanaged pages are created. Specifically, immediately flag any page that is allocated to a OBJT_PHYS object as unmanaged in vm_page_alloc() rather than waiting for a later call to vm_page_unmanage(). This allows for the elimination of some uses of the page queues lock. Change the type of the kernel and kmem objects from OBJT_DEFAULT to OBJT_PHYS. This allows us to take advantage of the above change to simplify the allocation of unmanaged pages in kmem_alloc() and kmem_malloc(). Remove vm_page_unmanage(). It is no longer used.	2007-02-25 06:14:58 +00:00
Alan Cox	0cd31a0d75	Change the page's CLEANCHK flag from being a page queue mutex synchronized flag to a vm object mutex synchronized flag.	2007-02-22 06:15:52 +00:00
Alan Cox	711585d087	Enable vm_page_free() and vm_page_free_zero() to be called on some pages without the page queues lock being held, specifically, pages that are not contained in a vm object and not a member of a page queue.	2007-02-18 05:54:42 +00:00
Alan Cox	ba000fb2c1	Remove a stale comment. Add punctuation to a nearby comment.	2007-02-17 19:37:00 +00:00
Alan Cox	d3d029bd62	Relax the page queue lock assertions in vm_page_remove() and vm_page_free_toq() to account for recent changes that allow vm_page_free_toq() to be called on some pages without the page queues lock being held, specifically, pages that are not contained in a vm object and not a member of a page queue. (Examples of such pages include page table pages, pv entry pages, and uma small alloc pages.)	2007-02-15 05:43:38 +00:00
Alan Cox	7d60988bad	Avoid the unnecessary acquisition of the free page queues lock when a page is actually being added to the hold queue, not the free queue. At the same time, avoid unnecessary tests to wake up threads waiting for free memory and the idle thread that zeroes free pages. (These tests will be performed later when the page finally moves from the hold queue to the free queue.)	2007-02-14 07:05:55 +00:00
Robert Watson	1e319f6db3	Add uma_set_align() interface, which will be called at most once during boot by MD code to indicated detected alignment preference. Rather than cache alignment being encoded in UMA consumers by defining a global alignment value of (16 - 1) in UMA_ALIGN_CACHE, UMA_ALIGN_CACHE is now a special value (-1) that causes UMA to look at registered alignment. If no preferred alignment has been selected by MD code, a default alignment of (16 - 1) will be used. Currently, no hardware platforms specify alignment; architecture maintainers will need to modify MD startup code to specify an alignment if desired. This must occur before initialization of UMA so that all UMA zones pick up the requested alignment. Reviewed by: jeff, alc Submitted by: attilio	2007-02-11 20:13:52 +00:00
Alan Cox	5351a2488a	Use the free page queue mutex instead of the page queue mutex to synchronize sleeping and waking of the zero idle thread.	2007-02-11 05:18:40 +00:00
John Baldwin	e8865caffb	- Move 'struct swdevt' back into swap_pager.h and expose it to userland. - Restore support for fetching swap information from crash dumps via kvm_get_swapinfo(3) to fix pstat -T/-s on crash dumps. Reviewed by: arch@, phk MFC after: 1 week	2007-02-07 17:43:11 +00:00
Alan Cox	e9f995d824	Change the pagedaemon, vm_wait(), and vm_waitpfault() to sleep on the vm page queue free mutex instead of the vm page queue mutex.	2007-02-07 06:37:30 +00:00
Alan Cox	3ae3919d0b	Change the free page queue lock from a spin mutex to a default (blocking) mutex. With the demise of Alpha support, there is no longer a reason for it to be a spin mutex.	2007-02-05 06:02:55 +00:00
Mohan Srinivasan	6c125b8df6	Fix for problems that occur when all mbuf clusters migrate to the mbuf packet zone. Cluster allocations fail when this happens. Also processes that may have blocked on cluster allocations will never be woken up. Thanks to rwatson for an overview of the issue and pointers to the mbuma paper and his tool to dump out UMA zones. Reviewed by: andre@	2007-01-25 01:05:23 +00:00
Mohan Srinivasan	7738029183	Fix for a bug where only one process (of multiple) blocked on maxpages on a zone is woken up, with the rest never being woken up as a result of the ZFLAG_FULL flag being cleared. Wakeup all such blocked procsses instead. This change introduces a thundering herd, but since this should be relatively infrequent, optimizing this (by introducing a count of blocked processes, for example) may be premature. Reviewd by: ups@	2007-01-24 22:49:11 +00:00
Jeff Roberson	f0393f063a	- Remove setrunqueue and replace it with direct calls to sched_add(). setrunqueue() was mostly empty. The few asserts and thread state setting were moved to the individual schedulers. sched_add() was chosen to displace it for naming consistency reasons. - Remove adjustrunqueue, it was 4 lines of code that was ifdef'd to be different on all three schedulers where it was only called in one place each. - Remove the long ifdef'd out remrunqueue code. - Remove the now redundant ts_state. Inspect the thread state directly. - Don't set TSF_* flags from kern_switch.c, we were only doing this to support a feature in one scheduler. - Change sched_choose() to return a thread rather than a td_sched. Also, rely on the schedulers to return the idlethread. This simplifies the logic in choosethread(). Aside from the run queue links kern_switch.c mostly does not care about the contents of td_sched. Discussed with: julian - Move the idle thread loop into the per scheduler area. ULE wants to do something different from the other schedulers. Suggested by: jhb Tested on: x86/amd64 sched_{4BSD, ULE, CORE}.	2007-01-23 08:46:51 +00:00
Xin LI	f67af5c918	Use FOREACH_PROC_IN_SYSTEM instead of using its unrolled form.	2007-01-17 15:05:52 +00:00
Robert Watson	635fd50514	Remove uma_zalloc_arg() hack, which coerced M_WAITOK to M_NOWAIT when allocations were made using improper flags in interrupt context. Replace with a simple WITNESS warning call. This restores the invariant that M_WAITOK allocations will always succeed or die horribly trying, which is relied on by many UMA consumers. MFC after: 3 weeks Discussed with: jhb	2007-01-10 21:04:43 +00:00
Alan Cox	e6eaadba43	Declare the map entry created by kmem_init() for the range from VM_MIN_KERNEL_ADDRESS to the end of the kernel's bootstrap data as MAP_NOFAULT.	2007-01-07 07:32:04 +00:00
John Baldwin	663b416f16	- Add a new function uma_zone_exhausted() to see if a zone is full. - Add a printf in swp_pager_meta_build() to warn if the swapzone becomes exhausted so that there's at least a warning before a box that runs out of swapzone space before running out of swap space deadlocks. MFC after: 1 week Reviwed by: alc	2007-01-05 19:09:01 +00:00
Alan Cox	73000556e8	Optimize vm_object_split(). Specifically, make the number of iterations equal to the number of physical pages that are renamed to the new object rather than the new object's virtual size.	2006-12-17 20:14:43 +00:00
Alan Cox	95442adf05	Simplify the computation of the new object's size in vm_object_split().	2006-12-16 08:17:07 +00:00
Kip Macy	35d10226b7	Remove the requirement that phys_avail be sorted in ascending order by explicitly finding the lowest and highest addresses when calculating the size of the vm_pages array Reviewed by :alc	2006-12-08 08:44:47 +00:00
Julian Elischer	ad1e7d285a	Threading cleanup.. part 2 of several. Make part of John Birrell's KSE patch permanent.. Specifically, remove: Any reference of the ksegrp structure. This feature was never fully utilised and made things overly complicated. All code in the scheduler that tried to make threaded programs fair to unthreaded programs. Libpthread processes will already do this to some extent and libthr processes already disable it. Also: Since this makes such a big change to the scheduler(s), take the opportunity to rename some structures and elements that had to be moved anyhow. This makes the code a lot more readable. The ULE scheduler compiles again but I have no idea if it works. The 4bsd scheduler still reqires a little cleaning and some functions that now do ALMOST nothing will go away, but I thought I'd do that as a separate commit. Tested by David Xu, and Dan Eischen using libthr and libpthread.	2006-12-06 06:34:57 +00:00
Ruslan Ermilov	9bed18a493	The clean_map has been made local to vm_init.c long ago.	2006-11-20 16:23:34 +00:00
Ruslan Ermilov	ef1b7c4804	Remove a redundant pointer-type variable.	2006-11-20 08:33:55 +00:00
Ruslan Ermilov	276096bb3e	When counting vm totals, skip unreferenced objects, including vnodes representing mounted file systems. Reviewed by: alc MFC after: 3 days	2006-11-20 00:16:00 +00:00
Alan Cox	0f3b612a06	There is no point in setting PG_REFERENCED on kmem_object pages because they are "unmanaged", i.e., non-pageable, pages. Remove a stale comment.	2006-11-13 00:27:02 +00:00
Alan Cox	44b8bd66f9	Make pmap_enter() responsible for setting PG_WRITEABLE instead of its caller. (As a beneficial side-effect, a high-contention acquisition of the page queues lock in vm_fault() is eliminated.)	2006-11-12 21:48:34 +00:00
Alan Cox	49c3b92531	I misplaced the assertion that was added to vm_page_startup() in the previous change. Correct its placement.	2006-11-08 19:11:54 +00:00
Alan Cox	9ad3296a25	Simplify the construction of the free queues in vm_page_startup(). Add an assertion to test a hypothesis concerning other redundant computation in vm_page_startup().	2006-11-08 18:43:47 +00:00
Alan Cox	815bc69fb0	Ensure that the page's oflags field is initialized by contigmalloc().	2006-11-08 06:23:29 +00:00
Robert Watson	acd3428b7d	Sweep kernel replacing suser(9) calls with priv(9) calls, assigning specific privilege names to a broad range of privileges. These may require some future tweaking. Sponsored by: nCircle Network Security, Inc. Obtained from: TrustedBSD Project Discussed on: arch@ Reviewed (at least in part) by: mlaier, jmg, pjd, bde, ceri, Alex Lyashkov <umka at sevcity dot net>, Skip Ford <skip dot ford at verizon dot net>, Antoine Brodin <antoine dot brodin at laposte dot net>	2006-11-06 13:42:10 +00:00
John Birrell	8460a577a4	Make KSE a kernel option, turned on by default in all GENERIC kernel configs except sun4v (which doesn't process signals properly with KSE). Reviewed by: davidxu@	2006-10-26 21:42:22 +00:00
Robert Watson	ae4e9636ac	Better align output of "show uma" by moving from displaying the basic counters of allocs/frees/use for each zone to the same statistics shown by userspace "vmstat -z". MFC after: 3 days	2006-10-26 12:55:32 +00:00
Alan Cox	66bdd5d619	The page queues lock is no longer required by vm_page_wakeup().	2006-10-23 05:27:31 +00:00
Alan Cox	2a53696fb8	The page queues lock is no longer required by vm_page_busy() or vm_page_wakeup(). Reduce or eliminate its use accordingly.	2006-10-22 21:18:48 +00:00
Robert Watson	aed5570872	Complete break-out of sys/sys/mac.h into sys/security/mac/mac_framework.h begun with a repo-copy of mac.h to mac_framework.h. sys/mac.h now contains the userspace and user<->kernel API and definitions, with all in-kernel interfaces moved to mac_framework.h, which is now included across most of the kernel instead. This change is the first step in a larger cleanup and sweep of MAC Framework interfaces in the kernel, and will not be MFC'd. Obtained from: TrustedBSD Project Sponsored by: SPARTA	2006-10-22 11:52:19 +00:00
Alan Cox	9af80719db	Replace PG_BUSY with VPO_BUSY. In other words, changes to the page's busy flag, i.e., VPO_BUSY, are now synchronized by the per-vm object lock instead of the global page queues lock.	2006-10-22 04:28:14 +00:00
Alan Cox	9fea8cad08	Eliminate unnecessary PG_BUSY tests. They originally served a purpose that is now handled by vm object locking.	2006-10-21 21:02:04 +00:00
Alan Cox	7e2393ff51	Long ago, revision 1.22 of vm/vm_pager.h introduced a bug. Specifically, it introduced a check after the call to file system's get pages method that assumes that the get pages method does not change the array of pages that is passed to it. In the case of vnode_pager_generic_getpages(), this assumption has been incorrect. The contents of the array of pages may be shifted by vnode_pager_generic_getpages(). Likely, the problem has been hidden by vnode_pager_haspage() limiting the set of pages that are passed to vnode_pager_generic_getpages() such that a shift never occurs. The fix implemented herein is to adjust the pointer to the array of pages rather than shifting the pages within the array. MFC after: 3 weeks Fix suggested by: tegge	2006-10-14 23:21:48 +00:00
Alan Cox	bff763439b	Change vnode_pager_addr() such that on returning it distinguishes between an error returned by VOP_BMAP() and a hole in the file. Change the callers to vnode_pager_addr() such that they return VM_PAGER_ERROR when VOP_BMAP fails instead of a zero-filled page. Reviewed by: tegge MFC after: 3 weeks	2006-10-14 22:09:03 +00:00
Kip Macy	600c53adf9	sun4v requires TSBs (translation storage buffers) to be contiguous and be size aligned requiring heavy usage of vm_page_alloc_contig This change makes vm_page_alloc_contig SMP safe Approved by: scottl (acting as backup for mentor rwatson)	2006-10-12 04:41:39 +00:00
Alan Cox	1de11f1af3	Distinguish between two distinct kinds of errors from VOP_BMAP() in vnode_pager_generic_getpages(): (1) that VOP_BMAP() is unsupported by the underlying file system and (2) an error in performing the VOP_BMAP(). Previously, vnode_pager_generic_getpages() assumed that all errors were of the first type. If, in fact, the error was of the second type, the likely outcome was for the process to become permanently blocked on a busy page. MFC after: 3 weeks Reviewed by: tegge	2006-10-10 18:26:18 +00:00

1 2 3 4 5 ...

2416 Commits