freebsd-skq

Author	SHA1	Message	Date
Alan Cox	9be60284a6	Giant is no longer required by vm_waitproc() and vmspace_exitfree(). Eliminate its acquisition and release around vm_waitproc() in kern_wait().	2004-07-30 20:31:02 +00:00
Doug Rabson	92bab635d3	Fix a memory leak in the device pager which is exposed by the NVIDIA OpenGL driver. Submitted by: nvidia (possibly also tegge)	2004-07-30 11:09:18 +00:00
Doug Rabson	874f013517	Fix handling of msync(2) for character special files. Submitted by: nvidia	2004-07-30 11:08:02 +00:00
Maxime Henrion	12c649749c	Get rid of another lockmgr(9) consumer by using sx locks for the user maps. We always acquire the sx lock exclusively here, but we can't use a mutex because we want to be able to sleep while holding the lock. This is completely equivalent to what we were doing with the lockmgr(9) locks before. Approved by: alc	2004-07-30 09:10:28 +00:00
Alan Cox	a087914310	Advance the state of pmap locking on alpha, amd64, and i386. - Enable recursion on the page queues lock. This allows calls to vm_page_alloc(VM_ALLOC_NORMAL) and UMA's obj_alloc() with the page queues lock held. Such calls are made to allocate page table pages and pv entries. - The previous change enables a partial reversion of vm/vm_page.c revision 1.216, i.e., the call to vm_page_alloc() by vm_page_cowfault() now specifies VM_ALLOC_NORMAL rather than VM_ALLOC_INTERRUPT. - Add partial locking to pmap_copy(). (As a side-effect, pmap_copy() should now be faster on i386 SMP because it no longer generates IPIs for TLB shootdown on the other processors.) - Complete the locking of pmap_enter() and pmap_enter_quick(). (As of now, all changes to a user-level pmap on alpha, amd64, and i386 are performed with appropriate locking.)	2004-07-29 18:56:31 +00:00
Bosko Milekic	244f45548a	Rework the way slab header storage space is calculated in UMA. - zone_large_init() stays pretty much the same. - zone_small_init() will try to stash the slab header in the slab page being allocated if the amount of calculated wasted space is less than UMA_MAX_WASTE (for both the UMA_ZONE_REFCNT case and regular case). If the amount of wasted space is >= UMA_MAX_WASTE, then UMA_ZONE_OFFPAGE will be set and the slab header will be allocated separately for better use of space. - uma_startup() calculates the maximum ipers required in offpage slabs (so that the offpage slab header zone(s) can be sized accordingly). The algorithm used to calculate this replaces the old calculation (which only happened to work coincidentally). We now iterate over possible object sizes, starting from the smallest one, until we determine that wastedspace calculated in zone_small_init() might end up being greater than UMA_MAX_WASTE, at which point we use the found object size to compute the maximum possible ipers. This works because: - wastedspace versus objectsize is a see-saw function with local minima all equal to zero and local maxima growing in direct proportion to objectsize. This implies that for objects up to or equal to a certain objectsize, the see-saw remains entirely below UMA_MAX_WASTE, so for those objectsizes it is impossible to ever go OFFPAGE for slab headers. - ipers (items-per-slab) versus objectsize is an inversely proportional function which falls off very quickly (very large for small objectsizes). - To determine the maximum ipers we'll ever need from OFFPAGE slab headers we first find the largest objectsize for which we are guaranteed not to go offpage and use it to compute ipers (as though we were offpage). 
Since the only objectsizes allowed to go offpage are bigger than the found objectsize, and since ipers vs objectsize is inversely proportional (and monotonically decreasing), then we are guaranteed that the ipers computed is always >= what we will ever need in offpage slab headers. - Define UMA_FRITM_SZ and UMA_FRITMREF_SZ to be the actual (possibly padded) size of each freelist index so that offset calculations are fixed. This might fix weird data corruption problems and certainly allows ARM to now boot to at least single-user (via simulator). Tested on i386 UP by me. Tested on sparc64 SMP by fenner. Tested on ARM simulator to single-user by cognet.	2004-07-29 15:25:40 +00:00
Alan Cox	56e0670fdc	Correct a very old error in both vm_object_madvise() (originating in vm/vm_object.c revision 1.88) and vm_object_sync() (originating in vm/vm_map.c revision 1.36): When descending a chain of backing objects, both use the wrong object's backing offset. Consequently, both may operate on the wrong pages. Quoting Matt, "This could be responsible for all of the sporatic madvise oddness that has been reported over the years." Reviewed by: Matt Dillon	2004-07-28 18:23:08 +00:00
Alan Cox	1a276a3f91	- Use atomic ops for updating the vmspace's refcnt and exitingcnt. - Push down Giant into shmexit(). (Giant is acquired only if the vmspace contains shm segments.) - Eliminate the acquisition of Giant from proc_rwmem(). - Reduce the scope of Giant in exit1(), uncovering the destruction of the address space.	2004-07-27 03:53:41 +00:00
Alan Cox	5122b74809	For years, kmem_alloc_pageable() has been misused. Now that the last of these misuses has been corrected, remove it before new ones appear, such as arm/arm/pmap.c revision 1.8.	2004-07-25 20:08:59 +00:00
Alan Cox	9b45f81502	Remove spl calls.	2004-07-25 19:28:10 +00:00
Alan Cox	57a21aba93	Make the code and comments for vm_object_coalesce() consistent.	2004-07-25 07:48:47 +00:00
Alan Cox	51ab6c2890	Simplify vmspace initialization. The bcopy() of fields from the old vmspace to the new vmspace in vmspace_exec() is mostly wasted effort. With one exception, vm_swrss, the copied fields are immediately overwritten. Instead, initialize these fields to zero in vmspace_alloc(), eliminating a bcopy() from vmspace_exec() and a bzero() from vmspace_fork().	2004-07-24 07:40:35 +00:00
Alan Cox	5285558ac2	- Change uma_zone_set_obj() to call kmem_alloc_nofault() instead of kmem_alloc_pageable(). The difference between these is that an errant memory access to the zone will be detected sooner with kmem_alloc_nofault(). The following changes serve to eliminate the following lock-order reversal reported by witness: 1st 0xc1a3c084 vm object (vm object) @ vm/swap_pager.c:1311 2nd 0xc07acb00 swap_pager swhash (swap_pager swhash) @ vm/swap_pager.c:1797 3rd 0xc1804bdc vm object (vm object) @ vm/uma_core.c:931 There is no potential deadlock in this case. However, witness is unable to recognize this because vm objects used by UMA have the same type as ordinary vm objects. To remedy this, we make the following changes: - Add a mutex type argument to VM_OBJECT_LOCK_INIT(). - Use the mutex type argument to assign distinct types to special vm objects such as the kernel object, kmem object, and UMA objects. - Define a static swap zone object for use by UMA. (Only static objects are assigned a special mutex type.)	2004-07-22 19:44:49 +00:00
Brian Feldman	d951b75210	Fix a race in vm_page_sleep_if_busy(). Due to vm_object locking being incomplete, it currently has to know how to drop and pick back up the vm_object's mutex if it has to sleep and drop the page queue mutex. The problem with this is that if the page is busy, while we are sleeping, the page can be freed and object disappear. When trying to lock m->object, we'd get a stale or NULL pointer and crash. The object is now cached, but this makes the assumption that the object is referenced in some manner and will not itself disappear while it is unlocked. Since this only happens if the object is locked, I had to remove an assumption earlier in contigmalloc() that reversed the order of locking the object and doing vm_page_sleep_if_busy(), not the normal order.	2004-07-21 23:56:09 +00:00
Peter Wemm	5476633aed	Semi-gratuitous change. Move two refcount operations to their own lines rather than be buried inside an if (expression). And now that the if expression is the same in both exit paths, use the same ordering.	2004-07-21 05:08:10 +00:00
Peter Wemm	3f25cbddc2	Move the initialization and teardown of pmaps to the vmspace zone's init and fini handlers. Our vm system removes all userland mappings at exit prior to calling pmap_release. It just so happens that we might as well reuse the pmap for the next process since the userland slate has already been wiped clean. However, there is a functional benefit to this as well. For platforms that share userland and kernel context in the same pmap, it means that the kernel portion of a pmap remains valid after the vmspace has been freed (process exit) and while it is in uma's cache. This is significant for i386 SMP systems with kernel context borrowing because it avoids a LOT of IPIs from the pmap_lazyfix() cleanup in the usual case. Tested on: amd64, i386, sparc64, alpha Glanced at by: alc	2004-07-21 00:29:21 +00:00
Brian Feldman	757cd67065	Remove extraneous locks on the VM free page queue mutex; it is not meant to be recursed upon, and could cause a deadlock inside the new contigmalloc (vm.old_contigmalloc=0) code. Submitted by: alc	2004-07-19 23:29:36 +00:00
Alan Cox	e832aafc51	- Eliminate the pte object from the pmap. Instead, page table pages are allocated as "no object" pages. Similar changes were made to the amd64 and i386 pmap last year. The primary reason being that maintaining a pte object leads to lock order violations. A secondary reason being that the pte object is redundant, i.e., the page table itself can be used to lookup page table pages. (Historical note: The pte object predates our ability to allocate "no object" pages. Thus, the pte object was a necessary evil.) - Unconditionally check the vm object lock's status in vm_page_remove(). Previously, this assertion could not be made on Alpha due to its use of a pte object.	2004-07-19 18:12:04 +00:00
Brian Feldman	0c3c862e21	Since breakage of malloc(9)/uma_zalloc(9) is totally non-optional in GENERIC/for WITNESS users, make sure the sysctl to disable the behavior is read-only and always enabled.	2004-07-19 15:05:24 +00:00
Brian Feldman	4362fada8f	Reimplement contigmalloc(9) with an algorithm which stands a greatly-improved chance of working despite pressure from running programs. Instead of trying to throw a bunch of pages out to swap and hope for the best, only a range that can potentially fulfill contigmalloc(9)'s request will have its contents paged out (potentially, not forcibly) at a time. The new contigmalloc operation still operates in three passes, but it could potentially be tuned to more or less. The first pass only looks at pages in the cache and free pages, so they would be thrown out without having to block. If this is not enough, the subsequent passes page out any unwired memory. To combat memory pressure refragmenting the section of memory being laundered, each page is removed from the system's free memory queue once it has been freed so that blocking later doesn't cause the memory laundered so far to get reallocated. The page-out operations are now blocking, as it would make little sense to try to push out a page, then get its status immediately afterward to remove it from the available free pages queue, if it's unlikely to have been freed. Another change is that if KVA allocation fails, the allocated memory segment will be freed and not leaked. There is a sysctl/tunable, defaulting to on, which causes the old contigmalloc() algorithm to be used. Nonetheless, I have been using vm.old_contigmalloc=0 for over a month. It is safe to switch at run-time to see the difference it makes. A new interface has been used which does not require mapping the allocated pages into KVA: vm_page.h functions vm_page_alloc_contig() and vm_page_release_contig(). These are what vm.old_contigmalloc=0 uses internally, so the sysctl/tunable does not affect their operation. When using the contigmalloc(9) and contigfree(9) interfaces, memory is now tracked with malloc(9) stats. Several functions have been exported from kern_malloc.c to allow other subsystems to use these statistics, as well. 
This invalidates the BUGS section of the contigmalloc(9) manpage.	2004-07-19 06:21:27 +00:00
Alan Cox	3e36afbe27	Remove the GIANT_REQUIRED preceding pmap_remove() in vm_pageout_map_deactivate_pages().	2004-07-18 04:38:11 +00:00
Alan Cox	3d2e54c317	Push down the acquisition and release of the page queues lock into pmap_protect() and pmap_remove(). In general, they require the lock in order to modify a page's pv list or flags. In some cases, however, pmap_protect() can avoid acquiring the lock.	2004-07-15 18:00:43 +00:00
Alan Cox	26354d4c08	Remove an unused and unimplemented sysctl. (For the record, it was marked as unimplemented in revision 1.129 nearly six years ago.)	2004-07-12 17:45:37 +00:00
Alan Cox	790bdd0f2e	Increase the scope of the page queues lock in vm_page_alloc() to cover a diagnostic check that accesses the cache queue count.	2004-07-10 22:12:49 +00:00
Alan Cox	fd2d354908	Micro-optimize vmspace for 64-bit architectures: Colocate vm_refcnt and vm_exitingcnt so that alignment does not result in wasted space.	2004-07-06 17:35:10 +00:00
Bruce M Simpson	9bd86a9861	Properly brucify a string by outdenting it.	2004-07-06 02:27:30 +00:00
Bosko Milekic	0d0837ee6d	Introduce debug.nosleepwithlocks sysctl, 0 by default. If set to 1 and WITNESS is not built, then force all M_WAITOK allocations to M_NOWAIT behavior (transparently). This is to be used temporarily if weird deadlocks are reported because we still have code paths that perform M_WAITOK allocations with lock(s) held, which can lead to deadlock. If WITNESS is compiled, then the sysctl is ignored and we ask witness to tell us whether we have locks held, converting to M_NOWAIT behavior only if it tells us that we do. Note this removes the previous mbuf.h inclusion as well (only needed by last revision), and cleans up unneeded [artificial] comparisons to just the mbuf zones. The problem described above has nothing to do with previous mbuf wait behavior; it is a general problem.	2004-07-04 16:07:44 +00:00
Brian Feldman	7a708c3626	Reextend the M_WAITOK-disabling-hack to all three of the mbuf-related zones, and do it by direct comparison of uma_zone_t instead of strcmp. The mbuf subsystem used to provide M_TRYWAIT/M_DONTWAIT semantics, but this is mostly no longer the case. M_WAITOK has taken over the spot M_TRYWAIT used to have, and for mbuf things, still may return NULL if the code path is incorrectly holding a mutex going into mbuf allocation functions. The M_WAITOK/M_NOWAIT semantics are absolute; though it may deadlock the system to try to malloc or uma_zalloc something with a mutex held and M_WAITOK specified, it is absolutely required to not return NULL and will result in instability and/or security breaches otherwise. There is still room to add the WITNESS_WARN() to all cases so that we are notified of the possibility of deadlocks, but it cannot change the value of the "badness" variable and allow allocation to actually fail except for the specialized cases which used to be M_TRYWAIT.	2004-07-04 15:59:25 +00:00
Brian Feldman	cf107c1d1a	Limit mbuma damage. Suddenly ALL allocations with M_WAITOK are subject to failing -- that is, allocations via malloc(M_WAITOK) that are required to never fail -- if WITNESS is not defined. While everyone should be running WITNESS, in any case, zone "Mbuf" allocations are really the only ones that should be screwed with by this hack. This hack is crashing people, and would continue to do so with or without WITNESS. Things shouldn't be allocating with M_WAITOK with locks held, but it's not okay just to always remove M_WAITOK when !WITNESS. Reported by: Bernd Walter <ticso@cicely5.cicely.de>	2004-07-03 18:11:41 +00:00
John Baldwin	0c0b25ae91	Implement preemption of kernel threads natively in the scheduler rather than as one-off hacks in various other parts of the kernel: - Add a function maybe_preempt() that is called from sched_add() to determine if a thread about to be added to a run queue should be preempted to directly. If it is not safe to preempt or if the new thread does not have a high enough priority, then the function returns false and sched_add() adds the thread to the run queue. If the thread should be preempted to but the current thread is in a nested critical section, then the flag TDF_OWEPREEMPT is set and the thread is added to the run queue. Otherwise, mi_switch() is called immediately and the thread is never added to the run queue since it is switched to directly. When exiting an outermost critical section, if TDF_OWEPREEMPT is set, then clear it and call mi_switch() to perform the deferred preemption. - Remove explicit preemption from ithread_schedule() as calling setrunqueue() now does all the correct work. This also removes the do_switch argument from ithread_schedule(). - Do not use the manual preemption code in mtx_unlock if the architecture supports native preemption. - Don't call mi_switch() in a loop during shutdown to give ithreads a chance to run if the architecture supports native preemption since the ithreads will just preempt DELAY(). - Don't call mi_switch() from the page zeroing idle thread for architectures that support native preemption as it is unnecessary. - Native preemption is enabled on the same archs that supported ithread preemption, namely alpha, i386, and amd64. This change should largely be a NOP for the default case as committed except that we will do fewer context switches in a few cases and will avoid the run queues completely when preempting. Approved by: scottl (with his re@ hat)	2004-07-02 20:21:44 +00:00
John Baldwin	bf0acc273a	- Change mi_switch() and sched_switch() to accept an optional thread to switch to. If a non-NULL thread pointer is passed in, then the CPU will switch to that thread directly rather than calling choosethread() to pick a thread to switch to. - Make sched_switch() aware of idle threads and know to do TD_SET_CAN_RUN() instead of sticking them on the run queue rather than requiring all callers of mi_switch() to know to do this if they can be called from an idlethread. - Move constants for arguments to mi_switch() and thread_single() out of the middle of the function prototypes and up above into their own section.	2004-07-02 19:09:50 +00:00
John Baldwin	d202e0cccc	- Don't use a variable to point to the user area that we only use once. Just use p2->p_uarea directly instead. - Remove an old and mostly bogus assertion regarding p2->p_sigacts. - Use RANGEOF macro ala fork1() to clean up bzero/bcopy of p_stats.	2004-07-02 03:45:07 +00:00
Tor Egge	9174ca7ba3	Initialize result->backing_object_offset before linking result onto the list of vm objects shadowing source in vm_object_shadow(). This closes a race where vm_object_collapse() could be called with a partially uninitialized object argument causing symptoms that looked like hardware problems, e.g. signal 6, 10, 11 or a /bin/sh busy-waiting for a nonexistent child process.	2004-06-28 20:26:35 +00:00
Andrew Gallatin	b351299ca3	Use MIN() macro rather than ulmin() inline, and fix stray tab that snuck in with my last commit. Submitted by: green	2004-06-28 19:58:39 +00:00
Andrew Gallatin	1dad8fe1ed	Fix alpha - the use of min() on longs was losing the high bits and returning wrong answers, leading to strange values in vm2->vm_{s,t,d}size.	2004-06-28 19:15:40 +00:00
David Schultz	17d9d0d049	Update a stale comment. The heuristic to swap processes out based on the number of pages already paged out was broken in rev 1.10 and removed in rev 1.11.	2004-06-27 01:58:12 +00:00
Alan Cox	d9dd6bfb56	Remove an unused field from the vmspace structure.	2004-06-26 19:16:35 +00:00
Brian Feldman	2a7be1b6d1	Correct the tracking of various bits of the process's vmspace and vm_map when not propagated on fork (due to minherit(2)). Consistency checks otherwise fail when the vm_map is freed and it appears to have not been emptied completely, causing an INVARIANTS panic in vm_map_zdtor(). PR: kern/68017 Submitted by: Mark W. Krentel <krentel@dreamscape.com> Reviewed by: alc	2004-06-24 22:43:46 +00:00
Alan Cox	5e609009de	Call vm_pageout_page_stats() with the page queues lock held.	2004-06-24 04:08:43 +00:00
Alan Cox	1aab16a6b6	Remove spl calls.	2004-06-24 03:13:30 +00:00
Bosko Milekic	cc822cb53e	Make uma_mtx MTX_RECURSE. Here's why: The general UMA lock is a recursion-allowed lock because there is a code path where, while we're still configured to use startup_alloc() for backend page allocations, we may end up in uma_reclaim() which calls zone_foreach(zone_drain), which grabs uma_mtx, only to later call into startup_alloc() because while freeing we needed to allocate a bucket. Since startup_alloc() also takes uma_mtx, we need to be able to recurse on it. This exact explanation also added as comment above mtx_init(). Trace showing recursion reported by: Peter Holm <peter-at-holm.cc>	2004-06-23 21:59:03 +00:00
Bruce M Simpson	0e3fe6e3e6	In swap_pager_getpages(), bp->b_dev can be NULL, particularly for the case of NFS mounted swap, so do not try to dereference it. While we're here, brucify the printf() call which happens when we time out on acquisition of vm_page_queue_mtx. PR: kern/67898 Submitted by: bde (style)	2004-06-23 15:15:07 +00:00
Alan Cox	0a2df4773c	Remove spl() calls. Update comments to reflect the removal of spl() calls. Remove '\n' from panic() format strings. Remove some blank lines.	2004-06-19 04:19:47 +00:00
Poul-Henning Kamp	f3732fd15b	Second half of the dev_t cleanup. The big lines are: NODEV -> NULL NOUDEV -> NODEV udev_t -> dev_t udev2dev() -> findcdev() Various minor adjustments including handling of userland access to kernel space struct cdev etc.	2004-06-17 17:16:53 +00:00
Alan Cox	d45f21f31a	Do not preset PG_BUSY on VM_ALLOC_NOOBJ pages. Such pages are not accessible through an object. Thus, PG_BUSY serves no purpose.	2004-06-17 06:16:58 +00:00
Poul-Henning Kamp	89c9c53da0	Do the dreaded s/dev_t/struct cdev */ Bump __FreeBSD_version accordingly.	2004-06-16 09:47:26 +00:00
Julian Elischer	fa88511615	Nice is a property of a process as a whole. I mistakenly moved it to the ksegroup when breaking up the process structure. Put it back in the proc structure.	2004-06-16 00:26:31 +00:00
Brian Feldman	408a38478a	Make contigmalloc() more reliable: 1. Remove a race whereby contigmalloc() would deadlock against the running processes in the system if they kept reinstantiating the memory on the active and inactive page queues that it was trying to flush out. The process doing the contigmalloc() would sit in "swwrt" forever and the swap pager would be going at full force, but never get anywhere. Instead of doing it until the queues are empty, launder for as many iterations as there are pages in the queue. 2. Do all laundering to swap synchronously; previously, the vnode laundering was synchronous and the swap laundering not. 3. Increase the number of launder-or-allocate passes to three, from two, while failing without bothering to do all the laundering on the third pass if allocation was not possible. This effectively gives exactly two chances to launder enough contiguous memory, helpful with high memory churn where a lot of memory from one pass to the next (and during a single laundering loop) becomes dirtied again. I can now reliably hot-plug hardware requiring a 256KB contigmalloc() without having the kldload/cbb ithread sit around failing to make progress, while running a busy X session. Previously, it took killing X to get contigmalloc() to get further (that is, quiescing the system), and even then contigmalloc() returned failure.	2004-06-15 01:02:00 +00:00
Poul-Henning Kamp	1930e303cf	Deorbit COMPAT_SUNOS. We inherited this from the sparc32 port of BSD4.4-Lite1. We have neither a sparc32 port nor a SunOS4.x compatibility desire these days.	2004-06-11 11:16:26 +00:00
Bosko Milekic	7fd8788213	Backout previous change, I think Julian has a better solution which does not require type-stable refcnts here.	2004-06-09 20:50:08 +00:00
Bosko Milekic	e66468ea7a	Make the slabrefzone, the zone from which we allocated slabs with internal reference counters, UMA_ZONE_NOFREE. This way, those slabs (with their ref counts) will be effectively type-stable, then using a trick like this on the refcount is no longer dangerous: MEXT_REM_REF(m); if (atomic_cmpset_int(m->m_ext.ref_cnt, 0, 1)) { if (m->m_ext.ext_type == EXT_PACKET) { uma_zfree(zone_pack, m); return; } else if (m->m_ext.ext_type == EXT_CLUSTER) { uma_zfree(zone_clust, m->m_ext.ext_buf); m->m_ext.ext_buf = NULL; } else { (*(m->m_ext.ext_free))(m->m_ext.ext_buf, m->m_ext.ext_args); if (m->m_ext.ext_type != EXT_EXTREF) free(m->m_ext.ref_cnt, M_MBUF); } } uma_zfree(zone_mbuf, m); Previously, a second thread hitting the above cmpset might actually read the refcnt AFTER it has already been freed. A very rare occurrence. Now we'll know that it won't be freed, though. Spotted by: julian, pjd	2004-06-09 19:18:50 +00:00
Alexander Leidinger	b1dabb2606	Remove references to L1 in the comments, according to Alan they are historical leftovers. Approved by: alc	2004-06-07 19:33:05 +00:00
Alan Cox	69c1a910e5	Update stale comments regarding page coloring.	2004-06-05 21:06:42 +00:00
Alan Cox	62326de742	Move the definitions of SWAPBLK_NONE and SWAPBLK_MASK from vm_page.h to blist.h, enabling the removal of numerous #includes from subr_blist.c. (subr_blist.c and swap_pager.c are the only users of these definitions.)	2004-06-04 04:03:26 +00:00
Bosko Milekic	b83e441b9f	Fix a comment above uma_zsecond_create(), describing its arguments. It doesn't take 'align' and 'flags' but 'master' instead, which is a reference to the Master Zone, containing the backing Keg. Pointed out by: Tim Robbins (tjr)	2004-06-01 01:36:26 +00:00
Bosko Milekic	099a0e588c	Bring in mbuma to replace mballoc. mbuma is an Mbuf & Cluster allocator built on top of a number of extensions to the UMA framework, all included herein. Extensions to UMA worth noting: - Better layering between slab <-> zone caches; introduce Keg structure which splits off slab cache away from the zone structure and allows multiple zones to be stacked on top of a single Keg (single type of slab cache); perhaps we should look into defining a subset API on top of the Keg for special use by malloc(9), for example. - UMA_ZONE_REFCNT zones can now be added, and reference counters automagically allocated for them within the end of the associated slab structures. uma_find_refcnt() does a kextract to fetch the slab struct reference from the underlying page, and lookup the corresponding refcnt. mbuma things worth noting: - integrates mbuf & cluster allocations with extended UMA and provides caches for commonly-allocated items; defines several zones (two primary, one secondary) and two kegs. - change up certain code paths that always used to do: m_get() + m_clget() to instead just use m_getcl() and try to take advantage of the newly defined secondary Packet zone. - netstat(1) and systat(1) quickly hacked up to do basic stat reporting but additional stats work needs to be done once some other details within UMA have been taken care of and it becomes clearer to how stats will work within the modified framework. From the user perspective, one implication is that the NMBCLUSTERS compile-time option is no longer used. The maximum number of clusters is still capped off according to maxusers, but it can be made unlimited by setting the kern.ipc.nmbclusters boot-time tunable to zero. Work should be done to write an appropriate sysctl handler allowing dynamic tuning of kern.ipc.nmbclusters at runtime. Additional things worth noting/known issues (READ): - One report of 'ips' (ServeRAID) driver acting really slow in conjunction with mbuma. Need more data. 
Latest report is that ips is equally sucking with and without mbuma. - Giant leak in NFS code sometimes occurs, can't reproduce but currently analyzing; brueffer is able to reproduce but THIS IS NOT an mbuma-specific problem and currently occurs even WITHOUT mbuma. - Issues in network locking: there is at least one code path in the rip code where one or more locks are acquired and we end up in m_prepend() with M_WAITOK, which causes WITNESS to whine from within UMA. Current temporary solution: force all UMA allocations to be M_NOWAIT from within UMA for now to avoid deadlocks unless WITNESS is defined and we can determine with certainty that we're not holding any locks when we're M_WAITOK. - I've seen at least one weird socketbuffer empty-but-mbuf-still-attached panic. I don't believe this to be related to mbuma but please keep your eyes open, turn on debugging, and capture crash dumps. This change removes more code than it adds. A paper is available detailing the change and considering various performance issues, it was presented at BSDCan2004: http://www.unixdaemons.com/~bmilekic/netbuf_bmilekic.pdf Please read the paper for Future Work and implementation details, as well as credits. Testing and Debugging: rwatson, brueffer, Ketrien I. Saihr-Kesenchedra, ... Reviewed by: Lots of people (for different parts)	2004-05-31 21:46:06 +00:00
Alan Cox	e363785643	Remove a stale comment: PG_DIRTY and PG_FILLED were removed in revisions 1.17 and 1.12 respectively.	2004-05-30 20:48:15 +00:00
Hiten Pandya	76ce6ff787	Correct typo, vm_page_list_find() has been called vm_pageq_find() for quite a long time, i.e., since the cleanup of the VM Page-queues code done two years ago. Reviewed by: Alan Cox <alc at freebsd.org>, Matthew Dillon <dillon at backplane.com>	2004-05-30 00:42:38 +00:00
Dag-Erling Smørgrav	c53f7ace3a	MFS: vm_map.c rev 1.187.2.27 through 1.187.2.29, fix MS_INVALIDATE semantics but provide a sysctl knob for reverting to old ones.	2004-05-25 18:40:53 +00:00
Dag-Erling Smørgrav	b103b94801	Back out previous commit; it went to the wrong file.	2004-05-25 18:28:52 +00:00
Dag-Erling Smørgrav	9507605f93	MFS: rev 1.187.2.27 through 1.187.2.29, fix MS_INVALIDATE semantics but provide a sysctl knob for reverting to old ones.	2004-05-25 16:31:49 +00:00
Alan Cox	3ffbc0cd8e	Correct two error cases in vm_map_unwire(): 1. Contrary to the Single Unix Specification our implementation of munlock(2) when performed on an unwired virtual address range has returned an error. Correct this. Note, however, that the behavior of "system" unwiring is unchanged, only "user" unwiring is changed. If "system" unwiring is performed on an unwired virtual address range, an error is still returned. 2. Performing an errant "system" unwiring on a virtual address range that was "user" (i.e., mlock(2)) but not "system" wired would incorrectly undo the "user" wiring instead of returning an error. Correct this. Discussed with: green@ Reviewed by: tegge@	2004-05-25 05:51:17 +00:00
Alan Cox	4be14af9cf	To date, unwiring a fictitious page has produced a panic. The reason being that PHYS_TO_VM_PAGE() returns the wrong vm_page for fictitious pages but unwiring uses PHYS_TO_VM_PAGE(). The resulting panic reported an unexpected wired count. Rather than attempting to fix PHYS_TO_VM_PAGE(), this fix takes advantage of the properties of fictitious pages. Specifically, fictitious pages will never be completely unwired. Therefore, we can keep a fictitious page's wired count forever set to one and thereby avoid the use of PHYS_TO_VM_PAGE() when we know that we're working with a fictitious page, just not which one. In collaboration with: green@, tegge@ PR: kern/29915	2004-05-22 04:53:51 +00:00
Alan Cox	1bb816d3d1	Restructure vm_page_select_cache() so that adding assertions is easy. Some of the conditions that caused vm_page_select_cache() to deactivate a page were wrong. For example, deactivating an unmanaged or wired page is a nop. Thus, if vm_page_select_cache() had ever encountered an unmanaged or wired page, it would have looped forever. Now, we assert that the page is neither unmanaged nor wired.	2004-05-12 04:27:18 +00:00
Alan Cox	f651b12907	Cache queue pages are not mapped. Thus, the pmap_remove_all() by vm_pageout_scan()'s loop for freeing cache queue pages is unnecessary.	2004-05-12 04:10:35 +00:00
Tim J. Robbins	8eec77b09e	To handle orphaned character device vnodes properly in mmap(), check that v_mount is non-null before dereferencing it. If it's null, behave as if MNT_NOEXEC was not set on the mount that originally contained it.	2004-05-11 10:26:37 +00:00
Alan Cox	3f39cca96b	Cache queue pages are not mapped. Thus, the pmap_remove_all() by vm_page_alloc() is unnecessary.	2004-05-09 01:00:15 +00:00
Brian Feldman	d9b2500eef	In r1.190, vslock() and vsunlock() were bogusly made to do a "user wire" and a "system unwire." Make this a "system wire" and "system unwire." Reviewed by: alc	2004-05-07 11:43:24 +00:00
Brian Feldman	af7cd0c521	Properly remove MAP_FUTUREWIRE when a vm_map_entry gets torn down. Previously, mlockall(2) usage would leak MAP_FUTUREWIRE of the process's vmspace::vm_map and subsequent processes would wire all of their memory. Coupled with a wired-page leak in vm_fault_unwire(), this would run the system out of free pages and cause programs to randomly SIGBUS when faulting in new pages. (Note that this is not the fix for the latter part; pages are still leaked when a wired area is unmapped in some cases.) Reviewed by: alc PR kern/62930	2004-05-07 00:17:07 +00:00
Alan Cox	5a32489377	Make vm_page's PG_ZERO flag immutable between the time of the page's allocation and deallocation. This flag's principal use is shortly after allocation. For such cases, clearing the flag is pointless. The only unusual use of PG_ZERO is in vfs_bio_clrbuf(). However, allocbuf() never requests a prezeroed page. So, vfs_bio_clrbuf() never sees a prezeroed page. Reviewed by: tegge@	2004-05-06 05:03:23 +00:00
Alan Cox	8a3ef85721	Zero the physical page only if it is invalid and not prezeroed.	2004-04-25 07:58:59 +00:00
Alan Cox	e265f05414	Add a VM_OBJECT_LOCK_ASSERT() call. Remove splvm() and splx() calls. Move a comment.	2004-04-24 23:23:36 +00:00
Alan Cox	2ec91846fd	Update the comment describing vm_page_grab() to reflect the previous revision and correct some of its style errors.	2004-04-24 21:36:23 +00:00
Alan Cox	7ef6ba5d27	Push down the responsibility for zeroing a physical page from the caller to vm_page_grab(). Although this gives VM_ALLOC_ZERO a different meaning for vm_page_grab() than for vm_page_alloc(), I feel such change is necessary to accomplish other goals. Specifically, I want to make the PG_ZERO flag immutable between the time it is allocated by vm_page_alloc() and freed by vm_page_free() or vm_page_free_zero() to avoid locking overheads. Once we gave up on the ability to automatically recognize a zeroed page upon entry to vm_page_free(), the ability to mutate the PG_ZERO flag became useless. Instead, I would like to say that "Once a page becomes valid, its PG_ZERO flag must be ignored."	2004-04-24 20:53:55 +00:00
Alan Cox	4da4d293df	In cases where a file was resident in memory mmap(..., PROT_NONE, ...) would actually map the file with read access enabled. According to http://www.opengroup.org/onlinepubs/007904975/functions/mmap.html this is an error. Similarly, an madvise(..., MADV_WILLNEED) would enable read access on a virtual address range that was PROT_NONE. The solution implemented herein is (1) to pass a vm_prot_t to vm_map_pmap_enter() describing the allowed access and (2) to make vm_map_pmap_enter() responsible for understanding the limitations of pmap_enter_quick(). Submitted by: "Mark W. Krentel" <krentel@dreamscape.com> PR: kern/64573	2004-04-24 03:46:44 +00:00
Alan Cox	87aefa499a	Push down Giant into vm_pager_get_pages(). The only get pages methods that require Giant are in the device and vnode pagers.	2004-04-23 06:10:58 +00:00
Alan Cox	b14d6acced	- pmap_kenter_temporary() is unused by machine-independent code. Therefore, move its declaration to the machine-dependent header file on those machines that use it. In principle, only i386 should have it. Alpha and AMD64 should use their direct virtual-to-physical mapping. - Remove pmap_kenter_temporary() from ia64. It is unused. Approved by: marcel@	2004-04-10 22:41:46 +00:00
Alan Cox	41f1b2c460	The demise of vm_pager_map_page() in revision 1.93 of vm/vm_pager.c permits the reduction of the pager map's size by 8M bytes. In other words, eight megabytes of largely wasted KVA are returned to the kernel map for use elsewhere.	2004-04-08 19:08:49 +00:00
Warner Losh	05eb3785e7	Remove advertising clause from University of California Regent's license, per letter dated July 22, 1999. Approved by: core	2004-04-06 20:15:37 +00:00
Alan Cox	9e0ddbd00a	Eliminate vm_pager_map_page() and vm_pager_unmap_page() and their uses. Use sf_buf_alloc() and sf_buf_free() instead.	2004-04-06 07:12:32 +00:00
Alexander Kabaev	ce7a036d02	Delay permission checks for VCHR vnodes until after vnode is locked in vm_mmap_vnode function, where we can safely check for a special /dev/zero case. Rev. 1.180 has reordered checks and introduced a regression. Submitted by: alc Was broken by: kan	2004-04-05 04:54:22 +00:00
Alan Cox	bdb93eb248	Remove unused arguments from pmap_init().	2004-04-05 00:37:50 +00:00
Alan Cox	889eb0fc62	Eliminate unused arguments from vm_page_startup().	2004-04-04 23:33:36 +00:00
Tim J. Robbins	ed0302e6a7	Do not copy vm_exitingcnt to the new vmspace in vmspace_exec(). Copying it led to impossibly high values in the new vmspace, causing it to never drop to 0 and be freed.	2004-03-23 08:37:34 +00:00
Guido van Rooij	b483c7f6e2	When mmap-ing a file from a noexec mount, be sure not to grant the right to mmap it PROT_EXEC. This also depends on the architecture, as some architectures (e.g. i386) do not distinguish between read and exec pages. Inspired by: http://linux.bkbits.net:8080/linux-2.4/cset@1.1267.1.85 Reviewed by: alc	2004-03-18 20:58:51 +00:00
Don Lewis	bb734798af	Make overflow/wraparound checking more robust and unbreak len=0 in vslock(), mlock(), and munlock(). Reviewed by: bde	2004-03-15 09:11:23 +00:00
Don Lewis	f0ea4612ef	Style(9) changes. Pointed out by: bde	2004-03-15 06:43:51 +00:00
Don Lewis	ce8660e395	Revert to the original vslock() and vsunlock() API with the following exceptions: Retain the recently added vslock() error return. The type of the len argument should be size_t, not u_int. Suggested by: bde	2004-03-15 06:42:40 +00:00
Don Lewis	be4c5ad025	Remove redundant suser() check.	2004-03-15 06:36:55 +00:00
Alan Cox	0fcfb99247	Remove GIANT_REQUIRED from contigfree().	2004-03-13 07:09:15 +00:00
Peter Wemm	2965c04576	Part 2 of rev 1.68. Update comment to match reality now that vm_endcopy exists and we no longer copy to the end of the struct. Forgotten by: alfred and green	2004-03-12 00:16:48 +00:00
Alan Cox	5d328ed44b	- Make the acquisition of Giant in vm_fault_unwire() conditional on the pmap. For the kernel pmap, Giant is not required. In general, for other pmaps, Giant is required by i386's pmap_pte() implementation. Specifically, the use of PMAP2/PADDR2 is synchronized by Giant. Note: In principle, updates to the kernel pmap's wired count could be lost without Giant. However, in practice, we never use the kernel pmap's wired count. This will be resolved when pmap locking appears. - With the above change, cpu_thread_clean() and uma_large_free() need not acquire Giant. (The first case is simply the revival of i386/i386/vm_machdep.c's revision 1.226 by peter.)	2004-03-10 04:44:43 +00:00
Alan Cox	a7d8612155	Implement a work around for the deadlock avoidance case in vm_object_deallocate() so that it doesn't spin forever either. Submitted by: bde	2004-03-08 03:54:36 +00:00
Alan Cox	fcffa790e9	Retire pmap_pinit2(). Alpha was the last platform that used it. However, ever since alpha/alpha/pmap.c revision 1.81 introduced the list allpmaps, there has been no reason for having this function on Alpha. Briefly, when pmap_growkernel() relied upon the list of all processes to find and update the various pmaps to reflect a growth in the kernel's valid address space, pmap_init2() served to avoid a race between pmap initialization and pmap_growkernel(). Specifically, pmap_pinit2() was responsible for initializing the kernel portions of the pmap and pmap_pinit2() was called after the process structure contained a pointer to the new pmap for use by pmap_growkernel(). Thus, an update to the kernel's address space might be applied to the new pmap unnecessarily, but an update would never be lost.	2004-03-07 21:06:48 +00:00
Robert Watson	a3c0761103	Mark uma_callout as CALLOUT_MPSAFE, as uma_timeout can run MPSAFE. Reviewed by: jeff	2004-03-07 07:00:46 +00:00
Don Lewis	169299398a	Undo the merger of mlock()/vslock and munlock()/vsunlock() and the introduction of kern_mlock() and kern_munlock() in src/sys/kern/kern_sysctl.c 1.150 src/sys/vm/vm_extern.h 1.69 src/sys/vm/vm_glue.c 1.190 src/sys/vm/vm_mmap.c 1.179 because different resource limits are appropriate for transient and "permanent" page wiring requests. Retain the kern_mlock() and kern_munlock() API in the revived vslock() and vsunlock() functions. Combine the best parts of each of the original sets of implementations with further code cleanup. Make the mlock() and vslock() implementations as similar as possible. Retain the RLIMIT_MEMLOCK check in mlock(). Move the most stringent test, which can return EAGAIN, last so that requests that have no hope of ever being satisfied will not be retried unnecessarily. Disable the test that can return EAGAIN in the vslock() implementation because it will cause the sysctl code to wedge. Tested by: Cy Schubert <Cy.Schubert AT komquats.com>	2004-03-05 22:03:11 +00:00
Alan Cox	3b383f0922	In the last revision, I introduced a physical contiguity check that is both unnecessary and wrong. While it is necessary to verify that the page is still free after dropping and reacquiring the free page queue lock, the physical contiguity of the page can not change, making this check unnecessary. This check was wrong in that it could cause an out-of-bounds array access. Tested by: rwatson	2004-03-05 04:46:32 +00:00
Bruce Evans	61ecb14af6	Record exactly where this file was copied from. It wasn't repo-copied so this is not very obvious. Fixed some style bugs (mainly missing parentheses around return values).	2004-03-04 10:18:17 +00:00
Bruce Evans	dcbcd518e0	Minor style fixes. In vm_daemon(), don't fetch the rss limit long before it is needed.	2004-03-04 09:36:46 +00:00
Alan Cox	45ad3d59ed	Remove some long unused definitions.	2004-03-04 04:26:14 +00:00
Alan Cox	ca3b447732	Modify contigmalloc1() so that the free page queues lock is not held when vm_page_free() is called. The problem with holding this lock is that it is a spin lock and vm_page_free() may attempt the acquisition of a different default-type lock.	2004-03-02 08:25:58 +00:00
Alexander Kabaev	30d4dd7ee9	Pick up a do {} while(0) cleanup by phk that was discarded accidentally in previous revision. Submitted by: alc	2004-03-01 02:44:33 +00:00
Alexander Kabaev	c8daea132f	Move the code dealing with vnode out of several functions into a single helper function vm_mmap_vnode. Discussed with: jeffr,alc (a while ago)	2004-02-27 22:02:15 +00:00
Don Lewis	47934cef8f	Split the mlock() kernel code into two parts, mlock(), which unpacks the syscall arguments and does the suser() permission check, and kern_mlock(), which does the resource limit checking and calls vm_map_wire(). Split munlock() in a similar way. Enable the RLIMIT_MEMLOCK checking code in kern_mlock(). Replace calls to vslock() and vsunlock() in the sysctl code with calls to kern_mlock() and kern_munlock() so that the sysctl code will obey the wired memory limits. Nuke the vslock() and vsunlock() implementations, which are no longer used. Add a member to struct sysctl_req to track the amount of memory that is wired to handle the request. Modify sysctl_wire_old_buffer() to return an error if its call to kern_mlock() fails. Only wire the minimum of the length specified in the sysctl request and the length specified in its argument list. It is recommended that sysctl handlers that use sysctl_wire_old_buffer() should specify reasonable estimates for the amount of data they want to return so that only the minimum amount of memory is wired no matter what length has been specified by the request. Modify the callers of sysctl_wire_old_buffer() to look for the error return. Modify sysctl_old_user to obey the wired buffer length and clean up its implementation. Reviewed by: bms	2004-02-26 00:27:04 +00:00
Alan Cox	2c840b1f65	- Substitute bdone() and bwait() from vfs_bio.c for swap_pager_putpages()'s buffer completion code. Note: the only difference between swp_pager_sync_iodone() and bdone(), aside from the locking in the latter, was the unnecessary clearing of B_ASYNC. - Remove an unnecessary pmap_page_protect() from swp_pager_async_iodone(). Reviewed by: tegge	2004-02-23 03:15:13 +00:00
Alan Cox	85b8d6b45b	Correct a long-standing race condition in vm_object_page_remove() that could result in a dirty page being unintentionally freed. Reviewed by: tegge MFC after: 7 days	2004-02-22 03:36:51 +00:00
Alan Cox	9ea8d1a67c	Eliminate the second, unnecessary call to pmap_page_protect() near the end of vm_pageout_flush(). Instead, assert that the page is still write protected. Discussed with: tegge	2004-02-21 23:32:00 +00:00
Alan Cox	0f75a97722	- Correct a long-standing race condition in vm_page_try_to_free() that could result in a dirty page being unintentionally freed. - Simplify the dirty page check in vm_page_dontneed(). Reviewed by: tegge MFC after: 7 days	2004-02-19 07:43:55 +00:00
Dag-Erling Smørgrav	497ddd5807	Back out previous commit due to objections.	2004-02-16 21:36:59 +00:00
Dag-Erling Smørgrav	cbea5fb98f	Don't panic if we fail to satisfy an M_WAITOK request; return 0 instead. The calling code will either handle that gracefully or cause a page fault.	2004-02-16 18:41:58 +00:00
Alan Cox	5850fa3e42	Correct a long-standing race condition in vm_contig_launder() that could result in a panic "vm_page_cache: caching a dirty page, ...": Access to the page must be restricted or removed before calling vm_page_cache(). This race condition is identical in nature to that which was addressed by vm_pageout.c's revision 1.251 and vm_page.c's revision 1.275. MFC after: 7 days	2004-02-16 03:43:57 +00:00
Alan Cox	c6d9ef2e1f	Correct a long-standing race condition in vm_fault() that could result in a panic "vm_page_cache: caching a dirty page, ...": Access to the page must be restricted or removed before calling vm_page_cache(). This race condition is identical in nature to that which was addressed by vm_pageout.c's revision 1.251 and vm_page.c's revision 1.275. Reviewed by: tegge MFC after: 7 days	2004-02-15 00:42:26 +00:00
Alan Cox	84d98bf699	- Correct a long-standing race condition in vm_page_try_to_cache() that could result in a panic "vm_page_cache: caching a dirty page, ...": Access to the page must be restricted or removed before calling vm_page_cache(). This race condition is identical in nature to that which was addressed by vm_pageout.c's revision 1.251. - Simplify the code surrounding the fix to this same race condition in vm_pageout.c's revision 1.251. There should be no behavioral change. Reviewed by: tegge MFC after: 7 days	2004-02-14 08:54:37 +00:00
Poul-Henning Kamp	d2bae332d6	Remove the absolute count g_access_abs() function since experience has shown that it is not useful. Rename the relative count g_access_rel() function to g_access(), only the name has changed. Change all g_access_rel() calls in our CVS tree to call g_access() instead. Add an #ifndef BURN_BRIDGES #define of g_access_rel() for source code compatibility.	2004-02-12 22:42:11 +00:00
Alan Cox	40448065e8	Further reduce the use of Giant in vm_map_delete(): Perform pmap_remove() on system maps, besides the kmem_map, without Giant. In collaboration with: tegge	2004-02-12 20:56:06 +00:00
Alan Cox	a3dfacb51c	Correct a long-standing race condition in the inactive queue scan. (See the added comment for low-level details.) The effect of this race condition is a panic "vm_page_cache: caching a dirty page, ..." Reviewed by: tegge MFC after: 7 days	2004-02-10 18:34:27 +00:00
Alan Cox	c5aebf380c	swp_pager_async_iodone() no longer requires Giant. Modify bufdone() and swapgeom_done() to perform swp_pager_async_iodone() without Giant. Reviewed by: tegge	2004-02-07 08:54:50 +00:00
Alan Cox	bfee999d6a	- Locking for the per-process resource limits structure has eliminated the need for Giant in vm_map_growstack(). - Use the proc * that is passed to vm_map_growstack() rather than curthread->td_proc.	2004-02-05 06:33:18 +00:00
John Baldwin	91d5354a2c	Locking for the per-process resource limits structure. - struct plimit includes a mutex to protect a reference count. The plimit structure is treated similarly to struct ucred in that it is always copy on write, so having a reference to a structure is sufficient to read from it without needing a further lock. - The proc lock protects the p_limit pointer and must be held while reading limits from a process to keep the limit structure from changing out from under you while reading from it. - Various global limits that are ints are not protected by a lock since int writes are atomic on all the archs we support and thus a lock wouldn't buy us anything. - All accesses to individual resource limits from a process are abstracted behind a simple lim_rlimit(), lim_max(), and lim_cur() API that return either an rlimit, or the current or max individual limit of the specified resource from a process. - dosetrlimit() was renamed to kern_setrlimit() to match existing style of other similar syscall helper functions. - The alpha OSF/1 compat layer no longer calls getrlimit() and setrlimit() (it didn't use the stackgap when it should have) but uses lim_rlimit() and kern_setrlimit() instead. - The svr4 compat no longer uses the stackgap for resource limits calls, but uses lim_rlimit() and kern_setrlimit() instead. - The ibcs2 compat no longer uses the stackgap for resource limits. It also no longer uses the stackgap for accessing sysctl's for the ibcs2_sysconf() syscall but uses kernel_sysctl() instead. As a result, ibcs2_sysconf() no longer needs Giant. - The p_rlimit macro no longer exists. Submitted by: mtm (mostly, I only did a few cleanups and catchups) Tested on: i386 Compiled on: alpha, amd64	2004-02-04 21:52:57 +00:00
John Baldwin	b56ef1c10d	Drop the reference count on the old vmspace after fully switching the current thread to the new vmspace. Suggested by: dillon	2004-02-02 23:23:48 +00:00
Poul-Henning Kamp	3e5b686160	Check error return from g_clone_bio(). (netchild@) Add XXX comment about why this is still not optimal. (phk@) Submitted by: netchild@	2004-02-02 13:08:03 +00:00
Jeff Roberson	7b09539ce2	- Use a separate startup function for the zeroidle kthread. Use this to set P_NOLOAD prior to running the thread.	2004-02-02 07:51:03 +00:00
Jeff Roberson	aaa8bb1604	- Fix a problem where we did not drain the cache of buckets in the zone when uma_reclaim() was called. This was introduced when the zone working-set algorithm was removed in favor of using the per cpu caches as the working set.	2004-02-01 06:15:17 +00:00
Dag-Erling Smørgrav	e726bc0e6c	Mechanical whitespace cleanup.	2004-01-30 16:26:29 +00:00
Bruce Evans	9a44a82b61	Fixed breakage of scheduling in rev.1.29 of subr_4bsd.c. The "scheduler" here has very little to do with scheduling. It is actually the swapper, and it really must be the last SYSINIT'ed item like its comment says, since proc0 metamorphoses into swapper by calling scheduler() last in mi_start(), and scheduler() never returns. Rev.1.29 of subr_4bsd.c broke this by adding another SI_ORDER_FIRST item (kproc_start() for schedcpu_thread()) onto the SI_SUB_RUN_SCHEDULER_LIST. The sorting of SYSINITs with identical orders (at all levels) is apparently nondeterministic, so this resulted in schedule() sometimes being called second last and schedcpu_thread() not being called at all. This quick fix just changes the code to almost match the comment (SI_ORDER_FIRST -> SI_ORDER_ANY). "LAST" is misspelled "ANY", and there is no way to ensure that there is only 1 very last SYSINIT. A more complete fix would remove the SYSINIT obfuscation.	2004-01-29 12:35:11 +00:00
Jeff Roberson	29bcc4514f	- Add a flags parameter to mi_switch. The value of flags may be SW_VOL or SW_INVOL. Assert that one of these is set in mi_switch() and properly adjust the rusage statistics. This is to simplify the large number of users of this interface which were previously all required to adjust the proper counter prior to calling mi_switch(). This also facilitates more switch and locking optimizations. - Change all callers of mi_switch() to pass the appropriate parameter and remove direct references to the process statistics.	2004-01-25 03:54:52 +00:00
Alan Cox	7dea2c2e3b	1. Statically initialize swap_pager_full and swap_pager_almost_full to the full state. (When swap is added their state will change appropriately.) 2. Set swap_pager_full and swap_pager_almost_full to the full state when the last swap device is removed. Combined, these changes eliminate nonsense messages from the kernel on swap-less machines. Item 2 submitted by: Divacky Roman <xdivac02@stud.fit.vutbr.cz> Prodding by: phk	2004-01-24 21:31:06 +00:00
Alan Cox	c19aa3402b	Increase UMA_BOOT_PAGES because of changes to pv entry initialization in revision 1.457 of i386/i386/pmap.c.	2004-01-18 05:51:06 +00:00
Alan Cox	23b186d324	Don't acquire Giant in vm_object_deallocate() unless the object is vnode-backed.	2004-01-18 03:44:14 +00:00
Alan Cox	f4c2663897	Remove vm_page_alloc_contig(). It's now unused.	2004-01-14 06:21:38 +00:00
Alan Cox	0e88a71798	Remove long dead code, specifically, code related to munmapfd(). (See also vm/vm_mmap.c revision 1.173.)	2004-01-11 06:59:21 +00:00
Alan Cox	baadec0711	- Unmanage pages allocated by contigmalloc1(). (There is no point in having PV entries for these pages.) - Remove splvm() and splx() calls.	2004-01-10 21:17:53 +00:00
Alan Cox	37d44833d5	Unmanage pages allocated by kmem_alloc(). (There is no point in having PV entries for these pages.)	2004-01-10 00:22:33 +00:00
Alan Cox	65bae14d77	- Enable recursive acquisition of the mutex synchronizing access to the free pages queue. This is presently needed by contigmalloc1(). - Move a sanity check against attempted double allocation of two pages to the same vm object offset from vm_page_alloc() to vm_page_insert(). This provides better protection because double allocation could occur through a direct call to vm_page_insert(), such as that by vm_page_rename(). - Modify contigmalloc1() to hold the mutex synchronizing access to the free pages queue while it scans vm_page_array in search of free pages. - Correct a potential leak of pages by contigmalloc1() that I introduced in revision 1.20: We must convert all cache queue pages to free pages before we begin removing free pages from the free queue. Otherwise, if we have to restart the scan because we are unable to acquire the vm object lock that is necessary to convert a cache queue page to a free page, we leak those free pages already removed from the free queue.	2004-01-08 20:48:26 +00:00
Alan Cox	c020e821c7	Don't bother clearing PG_ZERO in contigmalloc1(), kmem_alloc(), or kmem_malloc(). It serves no purpose.	2004-01-06 20:52:55 +00:00
Alan Cox	2f7af3db57	Simplify the various pager allocation routines by computing the desired object size once and assigning that value to a local variable.	2004-01-04 20:55:15 +00:00
Alan Cox	a67048571f	Eliminate the acquisition and release of Giant from vnode_pager_alloc(). The vm object and vnode locking should suffice. Discussed with: jeff	2004-01-04 03:18:24 +00:00
Alan Cox	e793b7797d	Reduce the scope of Giant in swap_pager_alloc().	2004-01-03 20:02:17 +00:00
Alan Cox	d0058957b5	Revision 1.74 of vm_meter.c ("Avoid lock-order reversal") makes the release and subsequent reacquisition of the same vm object lock in vm_object_collapse() unnecessary.	2004-01-02 19:57:45 +00:00
Alan Cox	e0ba75dd78	Avoid lock-order reversal between the vm object list mutex and the vm object mutex.	2004-01-02 19:38:25 +00:00
Alan Cox	ff5dcf2546	- Increase the scope of the kmem_object's lock in kmem_malloc(). Add a comment explaining why a further increase is not possible.	2004-01-01 19:48:56 +00:00
Alan Cox	4804edb44f	In vm_page_lookup() check the root of the vm object's splay tree for the desired page before calling vm_page_splay().	2003-12-31 19:02:01 +00:00
Alan Cox	bcdaad7fe7	Simplify vm_page_grab(): Don't bother with the generation check. If the vm object hasn't changed, the desired page will be at or near the root of the vm object's splay tree, making vm_page_lookup() cheap. (The only lock required for vm_page_lookup() is already held.) If, however, the vm object has changed and retry was requested, eliminating the generation check also eliminates a pointless acquisition and release of the page queues lock.	2003-12-31 01:44:45 +00:00
Alan Cox	4da9f125cc	- Modify vm_object_split() to expect a locked vm object on entry and return a locked vm object on exit. Remove GIANT_REQUIRED. - Eliminate some unnecessary local variables from vm_object_split().	2003-12-30 22:28:36 +00:00
Alan Cox	bd228075c7	Remove swap_pager_un_object_list; it is unused.	2003-12-29 04:21:44 +00:00
Alan Cox	53d0a98878	Remove GIANT_REQUIRED from kmem_suballoc().	2003-12-28 00:10:48 +00:00
Alan Cox	a976eb5e46	- Reduce Giant's scope in vm_fault(). - Use vm_object_reference_locked() instead of vm_object_reference() in vm_fault().	2003-12-26 23:33:37 +00:00
Alan Cox	75898105c0	Minor correction to revision 1.258: Use the proc pointer that is passed to vm_map_growstack() in the RLIMIT_VMEM check rather than curthread.	2003-12-26 21:54:45 +00:00
Alan Cox	9582cd94cb	- Create an unmapped guard page to trap access to vm_page_array[-1]. This guard page would have trapped the problems with the MFC of the PAE support to RELENG_4 at an earlier point in the sequence of events. Submitted by: tegge	2003-12-22 02:04:08 +00:00
Alan Cox	925692caa5	- Significantly reduce the number of preallocated pv entries in pmap_init(). Such a large preallocation is unnecessary and wastes nearly eight megabytes of kernel virtual address space per gigabyte of managed physical memory. - Increase UMA_BOOT_PAGES by two. This enables the removal of pmap_pv_allocf(). (Note: this function was only used during initialization, specifically, after pmap_init() but before pmap_init2(). During pmap_init2(), a new allocator is installed.)	2003-12-22 01:01:32 +00:00
Alan Cox	cafe836a56	- Correct an error in mincore(2) that has existed since its introduction: mincore(2) should check that the page is valid, not just allocated. Otherwise, it can return a false positive for a page that is not yet resident because it is being read from disk.	2003-12-21 06:03:40 +00:00
Alexander Kabaev	5e6dbda017	Remove trailing whitespace.	2003-12-08 02:45:45 +00:00
Alan Cox	c8123cb800	Addendum to revision 1.174: In the case where vm_pager_allocate() is called to create a vnode-backed object, the vnode lock must be held by the caller. Reported by: truckman Discussed with: kan	2003-12-08 00:47:33 +00:00
Alan Cox	20eec4bbdb	Fix a deadlock between vm_fault() and vm_mmap(): The expected lock ordering between vm_map and vnode locks is that vm_map locks are acquired first. In revision 1.150 mmap(2) was changed to pass a locked vnode into vm_mmap(). This creates a lock-order reversal when vm_mmap() calls one of the vm_map routines that acquires a vm_map lock. The solution implemented herein is to release the vnode lock in mmap() before calling vm_mmap() and reacquire this lock if necessary in vm_mmap(). Approved by: re (scottl) Reviewed by: jeff, kan, rwatson	2003-12-06 05:45:32 +00:00
John Baldwin	b6c71225a9	Fix all users of mp_maxid to use the same semantics, namely: 1) mp_maxid is a valid FreeBSD CPU ID in the range 0 .. MAXCPU - 1. 2) For all active CPUs in the system, PCPU_GET(cpuid) <= mp_maxid. Approved by: re (scottl) Tested on: i386, amd64, alpha	2003-12-03 14:57:26 +00:00
Jeff Roberson	e30b97c5f9	- Unbreak UP. mp_maxid is not defined on uni-processor machines, although I believe it and the other MP variables should be. For now, just define it here and wait for jhb to clean it up later. Approved by: re (rwatson)	2003-11-30 22:18:14 +00:00
Jeff Roberson	504d5de3a8	- Replace the local maxcpu with mp_maxid. Previously, if mp_maxid was equal to MAXCPU, we would overrun the pcpu_mtx array because maxcpu was calculated incorrectly. - Add some more debugging code so that memory leaks at the time of uma_zdestroy() are more easily diagnosed. Approved by: re (rwatson)	2003-11-30 08:04:01 +00:00
Alan Cox	1cd5fbd854	- Avoid a lock-order reversal between Giant and a system map mutex that occurs when kmem_malloc() fails to allocate a sufficient number of vm pages. Specifically, we avoid the lock-order reversal by not grabbing Giant around pmap_remove() if the map is the kmem_map. Approved by: re (jhb) Reported by: Eugene <eugene3@web.de>	2003-11-19 18:48:45 +00:00
Tim J. Robbins	167a9effa5	In vnode_pager_input_smlfs(), call VOP_STRATEGY instead of VOP_SPECSTRATEGY on non-VCHR vnodes. This fixes a panic when reading data from files on a filesystem with a small (less than a page) block size. PR: 59271 Reviewed by: alc	2003-11-15 09:54:11 +00:00
Alan Cox	d1f42ac2ee	- Remove use of Giant from uma_zone_set_obj().	2003-11-14 17:49:07 +00:00
Alan Cox	6f8b4fc03a	- Remove long dead code.	2003-11-14 08:22:38 +00:00
Alan Cox	b7b7cd4421	Changes to msync(2) - Return EBUSY if the region was wired by mlock(2) and MS_INVALIDATE is specified to msync(2). This is required by the Open Group Base Specifications Issue 6. - vm_map_sync() doesn't return KERN_FAILURE. Thus, msync(2) can't possibly return EIO. - The second major loop in vm_map_sync() handles sub maps. Thus, failing on sub maps in the first major loop isn't necessary.	2003-11-14 06:55:11 +00:00
Alan Cox	d88346020b	- The Open Group Base Specifications Issue 6 specifies that an munmap(2) must return EINVAL if size is zero. Submitted by: tegge - In order to avoid a race condition in multithreaded applications, the check and removal operations by munmap(2) must be in the same critical section. To accommodate this, vm_map_check_protection() is modified to require its caller to obtain at least a read lock on the map.	2003-11-10 01:37:40 +00:00
Jonathan Mini	8f101a2f31	NFC: Update stale comments. Reviewed by: alc	2003-11-10 00:44:00 +00:00
Alan Cox	637315ed9c	- Remove Giant from msync(2). Giant is still acquired by the lower layers if we drop into the pmap or vnode layers. - Migrate the handling of zero-length msync(2)s into vm_map_sync() so that multithread applications can't change the map between implementing the zero-length hack in msync(2) and reacquiring the map lock in vm_map_sync(). Reviewed by: tegge	2003-11-09 22:09:04 +00:00
Alan Cox	950f8459d4	- Rename vm_map_clean() to vm_map_sync(). This better reflects the fact that msync(2) is its only caller. - Migrate the parts of the old vm_map_clean() that examined the internals of a vm object to a new function vm_object_sync() that is implemented in vm_object.c. At the same time, introduce the necessary vm object locking so that vm_map_sync() and vm_object_sync() can be called without Giant. Reviewed by: tegge	2003-11-09 05:25:35 +00:00
Alan Cox	32a89c324e	- Move the implementation of OBJ_ONEMAPPING from vm_map_delete() to vm_map_entry_delete() so that all of the vm object manipulation is performed in one place.	2003-11-05 05:48:22 +00:00
Marcel Moolenaar	199c91ab79	Update avail_ssize for rstacks after growing them.	2003-11-04 06:48:58 +00:00
Dag-Erling Smørgrav	a86fa82659	Whitespace cleanup.	2003-11-03 16:14:45 +00:00
Alan Cox	a89c6258bb	- Increase the scope of the source object lock in vm_map_copy_entry().	2003-11-03 00:59:54 +00:00
Alan Cox	63f6cefcd5	- Increase the scope of two vm object locks in vm_object_split().	2003-11-02 22:52:42 +00:00
Alan Cox	b921a12b3b	- Introduce and use vm_object_reference_locked(). Unlike vm_object_reference(), this function must not be used to reanimate dead vm objects. This restriction simplifies locking. Reviewed by: tegge	2003-11-02 21:30:10 +00:00
Alan Cox	22ec553f77	- Increase the scope of two vm object locks in vm_object_collapse(). - Remove the acquisition and release of Giant from vm_object_coalesce().	2003-11-01 23:06:41 +00:00
Alan Cox	c7c8dd7e80	- Modify swap_pager_copy() and its callers such that the source and destination objects are locked on entry and exit. Add comments to the callers noting that the locks can be released by swap_pager_copy(). - Remove several instances of GIANT_REQUIRED.	2003-11-01 08:57:26 +00:00
Alan Cox	de33beddd5	- Additional vm object locking in vm_object_split() - New vm object locking assertions in vm_page_insert() and vm_object_set_writeable_dirty()	2003-11-01 04:54:23 +00:00
Alan Cox	3b9a4cb6a9	- Revert a part of revision 1.73: Make vm_object_set_flag() an inline function. This function is so trivial that inlining reduces the size of the kernel.	2003-10-31 20:17:00 +00:00
Alan Cox	dc6279b887	- Take advantage of the swap pager locking: Eliminate the use of Giant from vm_object_madvise(). - Remove excessive blank lines from vm_object_madvise().	2003-10-31 18:32:03 +00:00
Marcel Moolenaar	08667f6dc1	Fix two bugs introduced with the rstack functionality and specific to the rstack functionality: 1. Fix a KASSERT that tests for the address to be above the upward growable stack. Typically for rstack, the faulting address can be identical to the record end of the upward growable entry, and very likely is on ia64. The KASSERT tested for greater than, not greater than or equal, so whenever the register stack had to be grown the assertion fired. 2. When we grow the upward growable stack entry and adjust the underlying object, don't forget to adjust the size of the VM map. Not doing so would trigger an assert in vm_mapzdtor(). Pointy hat: marcel (for not testing with INVARIANTS).	2003-10-31 07:29:28 +00:00
Alan Cox	2928cef7e1	- Synchronize access to the swdevt's sw_flags with sw_dev_mtx. - Remove several instances of GIANT_REQUIRED.	2003-10-31 05:18:45 +00:00
Alan Cox	7645e88596	- Synchronize access to the swdevt's sw_blist with sw_dev_mtx. - Remove several instances of GIANT_REQUIRED.	2003-10-30 09:12:43 +00:00
Alan Cox	d05bc12976	- Synchronize access to swdevhd using sw_dev_mtx. - Use swp_sizecheck() rather than assignment to swap_pager_full in swaponsomething().	2003-10-30 07:11:06 +00:00
Alan Cox	0676a140b2	- Synchronize updates to nswapdev using sw_dev_mtx.	2003-10-29 07:51:41 +00:00
Alan Cox	2d9974c1e8	- Avoid a race in swaponsomething(): Calculate the new swdevt's first and end swblk and insert this new swdevt into the list of swap devices in the same critical section.	2003-10-29 05:42:28 +00:00
Alan Cox	d536c58f53	- Complete the synchronization of accesses to the swblock hash table.	2003-10-27 05:58:15 +00:00
Alan Cox	7827d9b0fe	- Introduce and use a mutex synchronizing access to the swblock hash table.	2003-10-26 19:55:35 +00:00
Alan Cox	43186e53ae	- Simplify vm_object_collapse()'s collapse case, reducing the number of lock acquires and releases performed. - Move an assertion from vm_object_collapse() to vm_object_zdtor() because it applies to all cases of object destruction.	2003-10-26 06:29:26 +00:00
Alan Cox	ee3dc7d7fe	- Add some of the required vm object locking, including assertions where the vm object lock is required and already held.	2003-10-25 23:42:17 +00:00
Alan Cox	93dbd07122	- Align a comment within struct vm_page. - Annotate the vm_page's valid field as synchronized by the containing vm object's lock.	2003-10-25 18:33:04 +00:00
Alan Cox	52051abcf1	- Call vnode_pager_input_old() with the vm object locked.	2003-10-25 05:21:16 +00:00
Alan Cox	2e3b314d3a	- Push down Giant from vm_pageout() to vm_pageout_scan(), freeing vm_pageout_page_stats() from Giant. - Modify vm_pager_put_pages() and vm_pager_page_unswapped() to expect the vm object to be locked on entry. (All of the pager routines now expect this.)	2003-10-24 06:43:04 +00:00
Alan Cox	ab42316c2f	- Retire vm_pageout_page_free(). Instead, use vm_page_select_cache() from vm_pageout_scan(). Rationale: I don't like leaving a busy page in the cache queue with neither the vm object nor the vm page queues lock held. - Assert that the page is active in vm_pageout_page_stats().	2003-10-22 18:41:32 +00:00
Alan Cox	d3c09dd7db	- Assert that every page found in the active queue is an active page.	2003-10-22 03:08:24 +00:00
Alan Cox	0d42c05ff4	- Assert that the containing vm object is locked in vm_page_set_validclean(). (This function reads and modifies the vm page's valid field, which is synchronized by the lock on the containing vm object.)	2003-10-21 19:36:51 +00:00
Alan Cox	fee181a696	- Remove some long unused code.	2003-10-20 18:57:01 +00:00
Alan Cox	3ad8097fd4	- Remove comments referring to functions that no longer exist.	2003-10-20 05:16:27 +00:00
Alan Cox	2bf43e4374	- Hold the vm object's lock around calls to vm_page_set_validclean().	2003-10-20 04:05:24 +00:00
Alan Cox	1b26eb10ff	- Synchronize access to a vm page's valid field using the containing vm object's lock. - Reduce the scope of the vm page queues lock in two places.	2003-10-19 00:01:56 +00:00
Alan Cox	8b575f6c28	- Synchronize access to the page's valid field in vnode_pager_generic_getpages() using the containing object's lock.	2003-10-18 21:30:29 +00:00
Alan Cox	7a93508274	- Increase the object lock's scope in vm_contig_launder() so that access to the object's type field and the call to vm_pageout_flush() are synchronized. - The above change allows for the elimination of the last parameter to vm_pageout_flush(). - Synchronize access to the page's valid field in vm_pageout_flush() using the containing object's lock.	2003-10-18 21:09:21 +00:00
Alan Cox	cbef13d877	Corrections to revision 1.305 - Specifying VM_MAP_WIRE_HOLESOK should not assume that the start address is the beginning of the map. Instead, move to the first entry after the start address. - The implementation of VM_MAP_WIRE_HOLESOK was incomplete. This caused the failure of mlockall(2) in some circumstances.	2003-10-18 18:48:17 +00:00

2118 Commits