Commit Graph

359 Commits

Author SHA1 Message Date
Alan Cox
3d2e54c317 Push down the acquisition and release of the page queues lock into
pmap_protect() and pmap_remove().  In general, they require the lock in
order to modify a page's pv list or flags.  In some cases, however,
pmap_protect() can avoid acquiring the lock.
2004-07-15 18:00:43 +00:00
Alan Cox
26354d4c08 Remove an unused and unimplemented sysctl. (For the record, it was marked
as unimplemented in revision 1.129 nearly six years ago.)
2004-07-12 17:45:37 +00:00
Alan Cox
5e609009de Call vm_pageout_page_stats() with the page queues lock held. 2004-06-24 04:08:43 +00:00
Alan Cox
1aab16a6b6 Remove spl calls. 2004-06-24 03:13:30 +00:00
Julian Elischer
fa88511615 Nice, is a property of a process as a whole..
I mistakenly moved it to the ksegroup when breaking up the process
structure. Put it back in the proc structure.
2004-06-16 00:26:31 +00:00
Alan Cox
f651b12907 Cache queue pages are not mapped. Thus, the pmap_remove_all() by
vm_pageout_scan()'s loop for freeing cache queue pages is unnecessary.
2004-05-12 04:10:35 +00:00
Bruce Evans
dcbcd518e0 Minor style fixes. In vm_daemon(), don't fetch the rss limit long before
it is needed.
2004-03-04 09:36:46 +00:00
Alan Cox
9ea8d1a67c Eliminate the second, unnecessary call to pmap_page_protect() near the end
of vm_pageout_flush().  Instead, assert that the page is still write
protected.

Discussed with:	tegge
2004-02-21 23:32:00 +00:00
Alan Cox
84d98bf699 - Correct a long-standing race condition in vm_page_try_to_cache() that
could result in a panic "vm_page_cache: caching a dirty page, ...":
   Access to the page must be restricted or removed before calling
   vm_page_cache().  This race condition is identical in nature to that
   which was addressed by vm_pageout.c's revision 1.251.
 - Simplify the code surrounding the fix to this same race condition
   in vm_pageout.c's revision 1.251.  There should be no behavioral
   change.  Reviewed by: tegge

MFC after:	7 days
2004-02-14 08:54:37 +00:00
Alan Cox
a3dfacb51c Correct a long-standing race condition in the inactive queue scan. (See
the added comment for low-level details.)  The effect of this race
condition is a panic "vm_page_cache: caching a dirty page, ..."

Reviewed by:	tegge
MFC after:	7 days
2004-02-10 18:34:27 +00:00
John Baldwin
91d5354a2c Locking for the per-process resource limits structure.
- struct plimit includes a mutex to protect a reference count.  The plimit
  structure is treated similarly to struct ucred in that is is always copy
  on write, so having a reference to a structure is sufficient to read from
  it without needing a further lock.
- The proc lock protects the p_limit pointer and must be held while reading
  limits from a process to keep the limit structure from changing out from
  under you while reading from it.
- Various global limits that are ints are not protected by a lock since
  int writes are atomic on all the archs we support and thus a lock
  wouldn't buy us anything.
- All accesses to individual resource limits from a process are abstracted
  behind a simple lim_rlimit(), lim_max(), and lim_cur() API that return
  either an rlimit, or the current or max individual limit of the specified
  resource from a process.
- dosetrlimit() was renamed to kern_setrlimit() to match existing style of
  other similar syscall helper functions.
- The alpha OSF/1 compat layer no longer calls getrlimit() and setrlimit()
  (it didn't used the stackgap when it should have) but uses lim_rlimit()
  and kern_setrlimit() instead.
- The svr4 compat no longer uses the stackgap for resource limits calls,
  but uses lim_rlimit() and kern_setrlimit() instead.
- The ibcs2 compat no longer uses the stackgap for resource limits.  It
  also no longer uses the stackgap for accessing sysctl's for the
  ibcs2_sysconf() syscall but uses kernel_sysctl() instead.  As a result,
  ibcs2_sysconf() no longer needs Giant.
- The p_rlimit macro no longer exists.

Submitted by:	mtm (mostly, I only did a few cleanups and catchups)
Tested on:	i386
Compiled on:	alpha, amd64
2004-02-04 21:52:57 +00:00
Alan Cox
2e3b314d3a - Push down Giant from vm_pageout() to vm_pageout_scan(), freeing
vm_pageout_page_stats() from Giant.
 - Modify vm_pager_put_pages() and vm_pager_page_unswapped() to expect the
   vm object to be locked on entry.  (All of the pager routines now expect
   this.)
2003-10-24 06:43:04 +00:00
Alan Cox
ab42316c2f - Retire vm_pageout_page_free(). Instead, use vm_page_select_cache() from
vm_pageout_scan().  Rationale: I don't like leaving a busy page in the
   cache queue with neither the vm object nor the vm page queues lock held.
 - Assert that the page is active in vm_pageout_page_stats().
2003-10-22 18:41:32 +00:00
Alan Cox
d3c09dd7db - Assert that every page found in the active queue is an active page. 2003-10-22 03:08:24 +00:00
Alan Cox
7a93508274 - Increase the object lock's scope in vm_contig_launder() so that access
to the object's type field and the call to vm_pageout_flush() are
   synchronized.
 - The above change allows for the eliminaton of the last parameter
   to vm_pageout_flush().
 - Synchronize access to the page's valid field in vm_pageout_flush()
   using the containing object's lock.
2003-10-18 21:09:21 +00:00
Alan Cox
6989c456b3 - Synchronize access to a vm page's valid field using the containing
vm object's lock.
 - Release the vm object and vm page queues locks around vput().
2003-10-17 05:07:17 +00:00
Alan Cox
45ae1d9147 Merge vm_pageout_free_page_calc() into vm_pageout(), eliminating some
unneeded code.
2003-09-19 05:03:45 +00:00
Alan Cox
4b5f553179 When calling vget() on a vnode-backed vm object, acquire the vnode
interlock before releasing the vm object's lock.
2003-09-17 06:55:42 +00:00
Alan Cox
3562af1215 - Add vm object locking to the part of vm_pageout_scan() that launders
dirty pages.
 - Remove some unused variables.
2003-08-31 00:00:46 +00:00
Alan Cox
3e1b578a28 Extend the scope of the page queues lock in vm_pageout_scan() to cover
the traversal of the PQ_INACTIVE queue.
2003-08-15 05:13:36 +00:00
Poul-Henning Kamp
8f60c087e6 Change the layout policy of the swap_pager from a hardcoded width
striping to a per device round-robin algorithm.

Because of the policy of not attempting to retain previous swap
allocation on page-out, this means that a newly added swap device
almost instantly takes its 1/N share of the I/O load but it takes
somewhat longer for it to assume it's 1/N share of the pages if there
is plenty of space on the other devices.

Change the 8G total swapspace limitation to 8G per device instead
by using a per device blist rather than one global blist.  This
reduces the memory footprint by 75% (typically a couple hundred
kilobytes) for the common case with one swapdevice but NSWAPDEV=4.

Remove the compile time constant limit of number of swap devices,
there is no limit now.  Instead of a fixed size array, store the
per swapdev structure in a TAILQ.

Total swap space is still addressed by a 32 bit page number and
therefore the upper limit is now 2^42 bytes = 16TB (for i386).

We still do not allocate the first page of each device in order to
give some amount of protection to any bsdlabel at the start of the
device.

A new device is appended after the existing devices in the swap space,
no attempt is made to fill in holes left behind by swapoff (this can
trivially be changed should it ever become a problem).

The sysctl vm.nswapdev now reflects the number of currently configured
swap devices.

Rename vm_swap_size to swap_pager_avail for consistency with other
exported names.

Change argument type for vm_proc_swapin_all() and swap_pager_isswapped()
to be a struct swdevt pointer rather than an index.

Not changed: we are still using blists to manage the free space,
but since the swapspace is no longer fragmented by the striping
different resource managers might fare better.
2003-08-03 13:35:31 +00:00
Alan Cox
ecf6279f00 - Complete the vm object locking in vm_pageout_object_deactivate_pages().
- Change vm_pageout_object_deactivate_pages()'s first parameter from a
   vm_map_t to a pmap_t.
 - Change vm_pageout_object_deactivate_pages()'s and
   vm_pageout_map_deactivate_pages()'s last parameter from a vm_pindex_t
   to a long.  Since the number of pages in an address space doesn't
   require 64 bits on an i386, vm_pindex_t is overkill.
2003-07-07 07:16:29 +00:00
Alan Cox
0774dfb376 Add vm object locking to vm_pageout_map_deactivate_pages(). 2003-06-29 19:51:24 +00:00
Alan Cox
5163584c7e - Add vm object locking to vm_pageout_clean(). 2003-06-28 20:07:54 +00:00
David E. O'Brien
874651b13c Use __FBSDID(). 2003-06-11 23:50:51 +00:00
David Schultz
e92686d065 If we seem to be out of VM, don't allow the pagedaemon to kill
processes in the first pass.  Among other things, this will give
us a chance to launder vnode-backed pages before concluding that
we need more swap.  This is particularly useful for systems that
have no swap.

While here, update a comment and remove some long-unused code.

Reported by:	Lucky Green <shamrock@cypherpunks.to>
Suggested by:	dillon
Approved by:	re (rwatson)
2003-05-19 00:51:07 +00:00
Alan Cox
c4a1d732a3 Avoid a lock-order reversal and implement vm_object locking
in vm_pageout_page_free().
2003-05-04 06:56:27 +00:00
Alan Cox
85b1dc89b6 Eliminate an unused parameter from vm_pageout_object_deactivate_pages(). 2003-04-30 03:08:16 +00:00
Alan Cox
b6e48e0372 - Acquire the vm_object's lock when performing vm_object_page_clean().
- Add a parameter to vm_pageout_flush() that tells vm_pageout_flush()
   whether its caller has locked the vm_object.  (This is a temporary
   measure to bootstrap vm_object locking.)
2003-04-24 04:31:25 +00:00
John Baldwin
897ecacd64 Lock the proc to check p_flag and several other related tests in
vm_daemon().  We don't need to hold sched_lock as long now as a result.
2003-04-22 20:03:08 +00:00
Alan Cox
72ba747d16 - Lock the vm_object when performing vm_object_pip_wakeup().
- Merge two identical cases in a switch statement.
2003-04-20 20:37:14 +00:00
Alan Cox
d22bc7101c - Lock the vm_object when performing vm_object_pip_add(). 2003-04-20 03:41:21 +00:00
Wes Peters
f4cf2141f6 Add a facility allowing processes to inform the VM subsystem they are
critical and should not be killed when pageout is looking for more
memory pages in all the wrong places.

Reviewed by:	arch@
Sponsored by:	St. Bernard Software
2003-03-31 21:09:57 +00:00
David Schultz
72d97679ff - When the VM daemon is out of swap space and looking for a
process to kill, don't block on a map lock while holding the
  process lock.  Instead, skip processes whose map locks are held
  and find something else to kill.
- Add vm_map_trylock_read() to support the above.

Reviewed by:	alc, mike (mentor)
2003-03-12 23:13:16 +00:00
Alan Cox
6b4b77ad34 Add a comment describing how pagedaemon_wakeup() should be used and
synchronized.

Suggested by:	tegge
2003-02-09 20:40:36 +00:00
Alan Cox
a1c0a78518 - It's more accurate to say that vm_paging_needed() returns TRUE
than a positive number.
 - In pagedaemon_wakeup(), set vm_pages_needed to 1 rather than
   incrementing it to accomplish the same.
2003-02-02 07:16:40 +00:00
Alan Cox
8e1d8de578 - Convert vm_pageout()'s tsleep()s to msleep()s with the page queue lock. 2003-02-02 01:11:21 +00:00
Alan Cox
8b24576748 - Remove (some) unnecessary explicit initializations to zero.
- Style changes to vm_pageout(): declarations and white-space.
2003-02-01 21:55:30 +00:00
Alan Cox
b0ef8c5fe4 - Update vm_pageout_deficit using atomic operations. It's a simple
counter outside the scope of existing locks.
 - Eliminate a redundant clearing of vm_pageout_deficit.
2003-01-14 06:57:03 +00:00
Alan Cox
ff2023a5df Make vm_pageout_page_free() static. 2003-01-14 02:28:39 +00:00
Poul-Henning Kamp
c410df597b Avoid extern decls in .c files by putting them in the vm/swap_pager.h
include file where they belong.
Share the dmmax_mask variable.
2003-01-03 14:30:46 +00:00
Matthew Dillon
40bb4f4bcf vm_pager_put_pages() takes VM_PAGER_* flags, not OBJPC_* flags. It just
so happens that OBJPC_SYNC has the same value as VM_PAGER_PUT_SYNC so no
harm done.  But fix it :-)

No operational changes.

MFC after:	1 day
2002-12-28 21:15:39 +00:00
Alan Cox
b365ea9e30 Hold the page queues lock when performing vm_page_flag_set(). 2002-12-18 04:02:02 +00:00
Alan Cox
38857e7f73 Hold the page queues lock when calling pmap_protect(); it updates fields
of the vm_page structure.  Nearby, remove an unnecessary semicolon and
return statement.

Approved by:	re (blanket)
2002-12-01 05:40:18 +00:00
Alan Cox
78f7187d01 Increase the scope of the page queue lock in vm_pageout_scan().
Approved by:	re (blanket)
2002-12-01 00:02:39 +00:00
Alan Cox
ba0208b945 Assert that the page queues lock rather than Giant is held in
vm_pageout_page_free().

Approved by:	re
2002-11-23 08:08:54 +00:00
Jeff Roberson
855a310fcb - Add an event that is triggered when the system is low on memory. This is
intended to be used by significant memory consumers so that they may drain
   some of their caches.

Inspired by:	phk
Approved by:	re
Tested on:	x86, alpha
2002-11-21 09:17:56 +00:00
Alan Cox
a12cc0e489 Remove vm_page_protect(). Instead, use pmap_page_protect() directly. 2002-11-18 04:05:22 +00:00
Alan Cox
4fec79bef8 Now that pmap_remove_all() is exported by our pmap implementations
use it directly.
2002-11-16 07:44:25 +00:00
Alan Cox
eea85e9bb6 Move pmap_collect() out of the machine-dependent code, rename it
to reflect its new location, and add page queue and flag locking.

Notes: (1) alpha, i386, and ia64 had identical implementations
of pmap_collect() in terms of machine-independent interfaces;
(2) sparc64 doesn't require it; (3) powerpc had it as a TODO.
2002-11-13 05:39:58 +00:00
Alan Cox
d154fb4fe6 When prot is VM_PROT_NONE, call pmap_page_protect() directly rather than
indirectly through vm_page_protect().  The one remaining page flag that
is updated by vm_page_protect() is already being updated by our various
pmap implementations.

Note: A later commit will similarly change the VM_PROT_READ case and
eliminate vm_page_protect().
2002-11-10 07:12:04 +00:00
Jeff Roberson
b43179fbe8 - Create a new scheduler api that is defined in sys/sched.h
- Begin moving scheduler specific functionality into sched_4bsd.c
 - Replace direct manipulation of scheduler data with hooks provided by the
   new api.
 - Remove KSE specific state modifications and single runq assumptions from
   kern_switch.c

Reviewed by:	-arch
2002-10-12 05:32:24 +00:00
Jeff Roberson
3ef3e7c42b - Get rid of the unused LK_NOOBJ. 2002-09-25 01:24:58 +00:00
Jake Burkholder
05ba50f522 Use the fields in the sysentvec and in the vm map header in place of the
constants VM_MIN_ADDRESS, VM_MAXUSER_ADDRESS, USRSTACK and PS_STRINGS.
This is mainly so that they can be variable even for the native abi, based
on different machine types.  Get stack protections from the sysentvec too.
This makes it trivial to map the stack non-executable for certain abis, on
machines that support it.
2002-09-21 22:07:17 +00:00
Julian Elischer
71fad9fdee Completely redo thread states.
Reviewed by:	davidxu@freebsd.org
2002-09-11 08:13:56 +00:00
Alan Cox
67ef391e00 o Lock page queue accesses by vm_page_activate(). 2002-08-10 23:53:59 +00:00
Alan Cox
299018d3b6 o Lock page queue accesses by vm_page_free().
o Increment cnt.v_dfree inside vm_pageout_page_free() rather than
   at each call.
2002-07-28 05:46:47 +00:00
Alan Cox
55df3298c6 o Require that the page queues lock is held on entry to vm_pageout_clean()
and vm_pageout_flush().
 o Acquire the page queues lock before calling vm_pageout_clean()
   or vm_pageout_flush().
2002-07-27 23:20:32 +00:00
Alan Cox
ce18aebde4 o Lock page queue accesses by vm_page_activate() and vm_page_deactivate()
in vm_pageout_object_deactivate_pages().
 o Apply some style fixes to vm_pageout_object_deactivate_pages().
2002-07-27 06:41:03 +00:00
Alan Cox
8ffc151979 o Extend the scope of the page queues lock in vm_pageout_scan()
to cover the traversal of the cache queue.
2002-07-23 02:42:25 +00:00
Alan Cox
40eab1e944 o Lock page queue accesses by vm_page_try_to_cache(). (The accesses
in kern/vfs_bio.c are already locked.)
 o Assert that the page queues lock is held in vm_page_try_to_cache().
2002-07-20 20:58:46 +00:00
Alan Cox
15a5d2108e o Lock page queue accesses by vm_page_cache() in vm_fault() and
vm_pageout_scan().  (The others are already locked.)
 o Assert that the page queues lock is held in vm_page_cache().
2002-07-20 19:34:21 +00:00
Alan Cox
48c0444c98 o Lock accesses to the active page queue in vm_pageout_scan() and
vm_pageout_page_stats().
2002-07-20 18:45:25 +00:00
Julian Elischer
e602ba25fd Part 1 of KSE-III
The ability to schedule multiple threads per process
(one one cpu) by making ALL system calls optionally asynchronous.
to come: ia64 and power-pc patches, patches for gdb, test program (in tools)

Reviewed by:	Almost everyone who counts
	(at various times, peter, jhb, matt, alfred, mini, bernd,
	and a cast of thousands)

	NOTE: this is still Beta code, and contains lots of debugging stuff.
	expect slight instability in signals..
2002-06-29 17:26:22 +00:00
Alan Cox
d974f03c69 o Introduce and use vm_map_trylock() to replace several direct uses
of lockmgr().
 o Add missing synchronization to vmspace_swap_count(): Obtain a read lock
   on the vm_map before traversing it.
2002-04-28 06:07:54 +00:00
Jeff Roberson
670d17b5c0 Remove references to vm_zone.h and switch over to the new uma API. 2002-03-20 04:02:59 +00:00
Alfred Perlstein
11caded34f Remove __P. 2002-03-19 22:20:14 +00:00
Jeff Roberson
8355f576a9 This is the first part of the new kernel memory allocator. This replaces
malloc(9) and vm_zone with a slab like allocator.

Reviewed by:	arch@
2002-03-19 09:11:49 +00:00
Brian Feldman
25adb370be Back out the modification of vm_map locks from lockmgr to sx locks. The
best path forward now is likely to change the lockmgr locks to simple
sleep mutexes, then see if any extra contention it generates is greater
than removed overhead of managing local locking state information,
cost of extra calls into lockmgr, etc.

Additionally, making the vm_map lock a mutex and respecting it properly
will put us much closer to not needing Giant magic in vm.
2002-03-18 15:08:09 +00:00
Brian Feldman
0e0af8ecda Rename SI_SUB_MUTEX to SI_SUB_MTX_POOL to make the name at all accurate.
While doing this, move it earlier in the sysinit boot process so that the
VM system can use it.

After that, the system is now able to use sx locks instead of lockmgr
locks in the VM system.  To accomplish this, some of the more
questionable uses of the locks (such as testing whether they are
owned or not, as well as allowing shared+exclusive recursion) are
removed, and simpler logic throughout is used so locks should also be
easier to understand.

This has been tested on my laptop for months, and has not shown any
problems on SMP systems, either, so appears quite safe.  One more
user of lockmgr down, many more to go :)
2002-03-13 23:48:08 +00:00
Eivind Eklund
a128794977 - Remove a number of extra newlines that do not belong here according to
style(9)
- Minor space adjustment in cases where we have "( ", " )", if(), return(),
  while(), for(), etc.
- Add /* SYMBOL */ after a few #endifs.

Reviewed by:	alc
2002-03-10 21:52:48 +00:00
Mike Silbersack
7f3a40933b Fix a horribly suboptimal algorithm in the vm_daemon.
In order to determine what to page out, the vm_daemon checks
reference bits on all pages belonging to all processes.  Unfortunately,
the algorithm used reacted badly with shared pages; each shared page
would be checked once per process sharing it; this caused an O(N^2)
growth of tlb invalidations.  The algorithm has been changed so that
each page will be checked only 16 times.

Prior to this change, a fork/sleepbomb of 1300 processes could cause
the vm_daemon to take over 60 seconds to complete, effectively
freezing the system for that time period.  With this change
in place, the vm_daemon completes in less than a second.  Any system
with hundreds of processes sharing pages should benefit from this change.

Note that the vm_daemon is only run when the system is under extreme
memory pressure.  It is likely that many people with loaded systems saw
no symptoms of this problem until they reached the point where swapping
began.

Special thanks go to dillon, peter, and Chuck Cranor, who helped me
get up to speed with vm internals.

PR:		33542, 20393
Reviewed by:	dillon
MFC after:	1 week
2002-02-27 18:03:02 +00:00
Mike Silbersack
ef6020d187 Changes to make the OOM killer much more effective:
- Allow the OOM killer to target processes currently locked in
  memory.  These very often are the ones doing the memory hogging.
- Drop the wakeup priority of processes currently sleeping while
  waiting for their page fault to complete.  In order for the OOM
  killer to work well, the killed process and other system processes
  waiting on memory must be allowed to wakeup first.

Reviewed by:	dillon
MFC after:	1 week
2002-02-19 18:34:02 +00:00
Matthew Dillon
027df6bdd7 GC P_BUFEXHAUST leftovers, we've had a new mechanism to avoid buffer
cache lockups for over a year now.

MFC after:		0 days
2002-01-31 18:39:44 +00:00
Matthew Dillon
23b590188f Fix a BUF_TIMELOCK race against BUF_LOCK and fix a deadlock in vget()
against VM_WAIT in the pageout code.  Both fixes involve adjusting
the lockmgr's timeout capability so locks obtained with timeouts do not
interfere with locks obtained without a timeout.

Hopefully MFC: before the 4.5 release
2001-12-20 22:42:27 +00:00
Matthew Dillon
57601bcb5d Syntax cleanup and documentation, no operational changes.
MFC after:	1 day
2001-10-21 06:12:06 +00:00
Tor Egge
30105b9ec4 Don't remove all mappings of a swapped out process if the vm map contained
wired entries.  vm_fault_unwire() depends on the mapping being intact.

Reviewed by:	dillon
2001-10-14 20:51:14 +00:00
Julian Elischer
b40ce4165d KSE Milestone 2
Note ALL MODULES MUST BE RECOMPILED
make the kernel aware that there are smaller units of scheduling than the
process. (but only allow one thread per process at this time).
This is functionally equivalent to teh previousl -current except
that there is a thread associated with each process.

Sorry john! (your next MFC will be a doosie!)

Reviewed by: peter@freebsd.org, dillon@freebsd.org

X-MFC after:    ha ha ha ha
2001-09-12 08:38:13 +00:00
Matthew Dillon
6d03d577a5 Reorg vm_page.c into vm_page.c, vm_pageq.c, and vm_contig.c (for contigmalloc).
Also removed some spl's and added some VM mutexes, but they are not actually
used yet, so this commit does not really make any operational changes
to the system.

vm_page.c relates to vm_page_t manipulation, including high level deactivation,
activation, etc...  vm_pageq.c relates to finding free pages and aquiring
exclusive access to a page queue (exclusivity part not yet implemented).
And the world still builds... :-)
2001-07-04 23:27:09 +00:00
Matthew Dillon
54d9214595 whitespace / register cleanup 2001-07-04 19:00:13 +00:00
Matthew Dillon
0cddd8f023 With Alfred's permission, remove vm_mtx in favor of a fine-grained approach
(this commit is just the first stage).  Also add various GIANT_ macros to
formalize the removal of Giant, making it easy to test in a more piecemeal
fashion. These macros will allow us to test fine-grained locks to a degree
before removing Giant, and also after, and to remove Giant in a piecemeal
fashion via sysctl's on those subsystems which the authors believe can
operate without Giant.
2001-07-04 16:20:28 +00:00
John Baldwin
ad6c5bbede Don't lock around swap_pager_swap_init() that is only called once during
the pagedaemon's startup code since it calls malloc which results in lock
order reversals.
2001-06-20 23:34:06 +00:00
John Baldwin
69a78d4666 Put the scheduler, vmdaemon, and pagedaemon kthreads back under Giant for
now.  The proc locking isn't actually safe yet and won't be until the proc
locking is finished.
2001-06-20 00:48:20 +00:00
Matthew Dillon
ff2b5645b5 Two fixes to the out-of-swap process termination code. First, start killing
processes a little earlier to avoid a deadlock.  Second, when calculating
the 'largest process' do not just count RSS.  Instead count the RSS + SWAP
used by the process.  Without this the code tended to kill small
inconsequential processes like, oh, sshd, rather then one of the many
'eatmem 200MB' I run on a whim :-).  This fix has been extensively tested on
-stable and somewhat tested on -current and will be MFCd in a few days.

Shamed into fixing this by: ps
2001-06-09 18:06:58 +00:00
John Baldwin
3614c6fcbb - Add in several asserts of vm_mtx.
- Assert Giant in vm_pageout_scan() for the vnode hacking that it does.
- Don't hold vm_mtx around vget() or vput().
- Lock Giant when calling vm_pageout_scan() from the pagedaemon.  Also,
  lock curproc while setting the P_BUFEXHAUST flag.
- For now we still hold Giant for all of the vm_daemon.  When process
  limits are locked we will be only need Giant for swapout_procs().
2001-05-23 22:48:28 +00:00
Alfred Perlstein
2395531439 Introduce a global lock for the vm subsystem (vm_mtx).
vm_mtx does not recurse and is required for most low level
vm operations.

faults can not be taken without holding Giant.

Memory subsystems can now call the base page allocators safely.

Almost all atomic ops were removed as they are covered under the
vm mutex.

Alpha and ia64 now need to catch up to i386's trap handlers.

FFS and NFS have been tested, other filesystems will need minor
changes (grabbing the vm lock when twiddling page properties).

Reviewed (partially) by: jake, jhb
2001-05-19 01:28:09 +00:00
John Baldwin
1c58e4e550 During the code to pick a process to kill when memory is exhausted, keep
the process in question locked as soon as we find it and determine it to
be eligible until we actually kill it.  To avoid deadlock, we don't block
on the process lock but skip any process that is already locked during our
search.
2001-05-17 22:49:03 +00:00
Mark Murray
fb919e4d5a Undo part of the tangle of having sys/lock.h and sys/mutex.h included in
other "system" header files.

Also help the deprecation of lockmgr.h by making it a sub-include of
sys/lock.h and removing sys/lockmgr.h form kernel .c files.

Sort sys/*.h includes where possible in affected files.

OK'ed by:	bde (with reservations)
2001-05-01 08:13:21 +00:00
John Baldwin
1005a129e5 Convert the allproc and proctree locks from lockmgr locks to sx locks. 2001-03-28 11:52:56 +00:00
Bosko Milekic
9ed346bab0 Change and clean the mutex lock interface.
mtx_enter(lock, type) becomes:

mtx_lock(lock) for sleep locks (MTX_DEF-initialized locks)
mtx_lock_spin(lock) for spin locks (MTX_SPIN-initialized)

similarily, for releasing a lock, we now have:

mtx_unlock(lock) for MTX_DEF and mtx_unlock_spin(lock) for MTX_SPIN.
We change the caller interface for the two different types of locks
because the semantics are entirely different for each case, and this
makes it explicitly clear and, at the same time, it rids us of the
extra `type' argument.

The enter->lock and exit->unlock change has been made with the idea
that we're "locking data" and not "entering locked code" in mind.

Further, remove all additional "flags" previously passed to the
lock acquire/release routines with the exception of two:

MTX_QUIET and MTX_NOSWITCH

The functionality of these flags is preserved and they can be passed
to the lock/unlock routines by calling the corresponding wrappers:

mtx_{lock, unlock}_flags(lock, flag(s)) and
mtx_{lock, unlock}_spin_flags(lock, flag(s)) for MTX_DEF and MTX_SPIN
locks, respectively.

Re-inline some lock acq/rel code; in the sleep lock case, we only
inline the _obtain_lock()s in order to ensure that the inlined code
fits into a cache line. In the spin lock case, we inline recursion and
actually only perform a function call if we need to spin. This change
has been made with the idea that we generally tend to avoid spin locks
and that also the spin locks that we do have and are heavily used
(i.e. sched_lock) do recurse, and therefore in an effort to reduce
function call overhead for some architectures (such as alpha), we
inline recursion for this case.

Create a new malloc type for the witness code and retire from using
the M_DEV type. The new type is called M_WITNESS and is only declared
if WITNESS is enabled.

Begin cleaning up some machdep/mutex.h code - specifically updated the
"optimized" inlined code in alpha/mutex.h and wrote MTX_LOCK_SPIN
and MTX_UNLOCK_SPIN asm macros for the i386/mutex.h as we presently
need those.

Finally, caught up to the interface changes in all sys code.

Contributors: jake, jhb, jasone (in no particular order)
2001-02-09 06:11:45 +00:00
Poul-Henning Kamp
fc2ffbe604 Mechanical change to use <sys/queue.h> macro API instead of
fondling implementation details.

Created with: sed(1)
Reviewed by: md5(1)
2001-02-04 13:13:25 +00:00
John Baldwin
8606d88043 - Catch up to proc flag changes.
- Minimal proc locking.
- Use queue macros.
2001-01-24 11:28:36 +00:00
Matthew Dillon
2b6b0df712 This implements a better launder limiting solution. There was a solution
in 4.2-REL which I ripped out in -stable and -current when implementing the
low-memory handling solution.  However, maxlaunder turns out to be the saving
grace in certain very heavily loaded systems (e.g. newsreader box).  The new
algorithm limits the number of pages laundered in the first pageout daemon
pass.  If that is not sufficient then suceessive will be run without any
limit.

Write I/O is now pipelined using two sysctls, vfs.lorunningspace and
vfs.hirunningspace.  This prevents excessive buffered writes in the
disk queues which cause long (multi-second) delays for reads.  It leads
to more stable (less jerky) and generally faster I/O streaming to disk
by allowing required read ops (e.g. for indirect blocks and such) to occur
without interrupting the write stream, amoung other things.

NOTE: eventually, filesystem write I/O pipelining needs to be done on a
per-device basis.  At the moment it is globalized.
2000-12-26 19:41:38 +00:00
Seigo Tanimura
21cd6e6232 - If swap metadata does not fit into the KVM, reduce the number of
struct swblock entries by dividing the number of the entries by 2
until the swap metadata fits.

- Reject swapon(2) upon failure of swap_zone allocation.

This is just a temporary fix. Better solutions include:
(suggested by:	dillon)

o reserving swap in SWAP_META_PAGES chunks, and
o swapping the swblock structures themselves.

Reviewed by:	alfred, dillon
2000-12-13 10:01:00 +00:00
Jake Burkholder
c0c2557090 - Change the allproc_lock to use a macro, ALLPROC_LOCK(how), instead
of explicit calls to lockmgr.  Also provides macros for the flags
  pased to specify shared, exclusive or release which map to the
  lockmgr flags.  This is so that the use of lockmgr can be easily
  replaced with optimized reader-writer locks.
- Add some locking that I missed the first time.
2000-12-13 00:17:05 +00:00
Matthew Dillon
02fa91d35e Be less conservative with a recently added KASSERT. Certain edge
cases with file fragments and read-write mmap's can lead to a situation
    where a VM page has odd dirty bits, e.g. 0xFC - due to being dirtied by
    an mmap and only the fragment (representing a non-page-aligned end of
    file) synced via a filesystem buffer.  A correct solution that
    guarentees consistent m->dirty for the file EOF case is being
    worked on.  In the mean time we can't be so conservative in the
    KASSERT.
2000-12-11 07:52:47 +00:00
John Baldwin
c5a44a6af6 Protect p_stat with sched_lock. 2000-12-02 06:09:44 +00:00
Jake Burkholder
553629ebc9 Protect the following with a lockmgr lock:
allproc
	zombproc
	pidhashtbl
	proc.p_list
	proc.p_hash
	nextpid

Reviewed by:	jhb
Obtained from:	BSD/OS and netbsd
2000-11-22 07:42:04 +00:00
Matthew Dillon
936524aa02 Implement a low-memory deadlock solution.
Removed most of the hacks that were trying to deal with low-memory
    situations prior to now.

    The new code is based on the concept that I/O must be able to function in
    a low memory situation.  All major modules related to I/O (except
    networking) have been adjusted to allow allocation out of the system
    reserve memory pool.  These modules now detect a low memory situation but
    rather then block they instead continue to operate, then return resources
    to the memory pool instead of cache them or leave them wired.

    Code has been added to stall in a low-memory situation prior to a vnode
    being locked.

    Thus situations where a process blocks in a low-memory condition while
    holding a locked vnode have been reduced to near nothing.  Not only will
    I/O continue to operate, but many prior deadlock conditions simply no
    longer exist.

Implement a number of VFS/BIO fixes

	(found by Ian): in biodone(), bogus-page replacement code, the loop
        was not properly incrementing loop variables prior to a continue
        statement.  We do not believe this code can be hit anyway but we
        aren't taking any chances.  We'll turn the whole section into a
        panic (as it already is in brelse()) after the release is rolled.

	In biodone(), the foff calculation was incorrectly
        clamped to the iosize, causing the wrong foff to be calculated
        for pages in the case of an I/O error or biodone() called without
        initiating I/O.  The problem always caused a panic before.  Now it
        doesn't.  The problem is mainly an issue with NFS.

	Fixed casts for ~PAGE_MASK.  This code worked properly before only
        because the calculations use signed arithmatic.  Better to properly
        extend PAGE_MASK first before inverting it for the 64 bit masking
        op.

	In brelse(), the bogus_page fixup code was improperly throwing
        away the original contents of 'm' when it did the j-loop to
        fix the bogus pages.  The result was that it would potentially
        invalidate parts of the *WRONG* page(!), leading to corruption.

	There may still be cases where a background bitmap write is
        being duplicated, causing potential corruption.  We have identified
        a potentially serious bug related to this but the fix is still TBD.
        So instead this patch contains a KASSERT to detect the problem
  	and panic the machine rather then continue to corrupt the filesystem.
	The problem does not occur very often..  it is very hard to
	reproduce, and it may or may not be the cause of the corruption
	people have reported.

Review by: (VFS/BIO: mckusick, Ian Dowse <iedowse@maths.tcd.ie>)
Testing by: (VM/Deadlock) Paul Saab <ps@yahoo-inc.com>
2000-11-18 23:06:26 +00:00
Jason Evans
0384fff8c5 Major update to the way synchronization is done in the kernel. Highlights
include:

* Mutual exclusion is used instead of spl*().  See mutex(9).  (Note: The
  alpha port is still in transition and currently uses both.)

* Per-CPU idle processes.

* Interrupts are run in their own separate kernel threads and can be
  preempted (i386 only).

Partially contributed by:	BSDi (BSD/OS)
Submissions by (at least):	cp, dfr, dillon, grog, jake, jhb, sheldonh
2000-09-07 01:33:02 +00:00
Kirk McKusick
f2a2857bb3 Add snapshots to the fast filesystem. Most of the changes support
the gating of system calls that cause modifications to the underlying
filesystem. The gating can be enabled by any filesystem that needs
to consistently suspend operations by adding the vop_stdgetwritemount
to their set of vnops. Once gating is enabled, the function
vfs_write_suspend stops all new write operations to a filesystem,
allows any filesystem modifying system calls already in progress
to complete, then sync's the filesystem to disk and returns. The
function vfs_write_resume allows the suspended write operations to
begin again. Gating is not added by default for all filesystems as
for SMP systems it adds two extra locks to such critical kernel
paths as the write system call. Thus, gating should only be added
as needed.

Details on the use and current status of snapshots in FFS can be
found in /sys/ufs/ffs/README.snapshot so for brevity and timelyness
is not included here. Unless and until you create a snapshot file,
these changes should have no effect on your system (famous last words).
2000-07-11 22:07:57 +00:00
Matthew Dillon
8b03c8ed5e This is a cleanup patch to Peter's new OBJT_PHYS VM object type
and sysv shared memory support for it.  It implements a new
    PG_UNMANAGED flag that has slightly different characteristics
    from PG_FICTICIOUS.

    A new sysctl, kern.ipc.shm_use_phys has been added to enable the
    use of physically-backed sysv shared memory rather then swap-backed.
    Physically backed shm segments are not tracked with PV entries,
    allowing programs which use a large shm segment as a rendezvous
    point to operate without eating an insane amount of KVM in the
    PV entry management.  Read: Oracle.

    Peter's OBJT_PHYS object will also allow us to eventually implement
    page-table sharing and/or 4MB physical page support for such segments.
    We're half way there.
2000-05-29 22:40:54 +00:00
Matthew Dillon
883f3caa13 Fix bug in vm_pageout_page_stats() that always resulted in a full
scan of the active queue.  This fix is not expected to have any
    noticeable impact on performance.

Noticed by: Rik van Riel <riel@conectiva.com.br>
2000-05-29 02:31:55 +00:00
Peter Wemm
249645144d Checkpoint of a new physical memory backed object type, that does not
have pv_entries.  This is intended for very special circumstances,
eg: a certain database that has a 1GB shm segment mapped into 300
processes.  That would consume 2GB of kvm just to hold the pv_entries
alone.  This would not be used on systems unless the physical ram was
available, as it's not pageable.

This is a work-in-progress, but is a useful and functional checkpoint.
Matt has got some more fixes for it that will be committed soon.

Reviewed by:	dillon
2000-05-21 13:41:29 +00:00
Peter Wemm
0385347c1a Implement an optimization of the VM<->pmap API. Pass vm_page_t's directly
to various pmap_*() functions instead of looking up the physical address
and passing that.  In many cases, the first thing the pmap code was doing
was going to a lot of trouble to get back the original vm_page_t, or
it's shadow pv_table entry.

Inspired by: John Dyson's 1998 patches.

Also:
Eliminate pv_table as a seperate thing and build it into a machine
dependent part of vm_page_t.  This eliminates having a seperate set of
structions that shadow each other in a 1:1 fashion that we often went to
a lot of trouble to translate from one to the other. (see above)
This happens to save 4 bytes of physical memory for each page in the
system.  (8 bytes on the Alpha).

Eliminate the use of the phys_avail[] array to determine if a page is
managed (ie: it has pv_entries etc).  Store this information in a flag.
Things like device_pager set it because they create vm_page_t's on the
fly that do not have pv_entries.  This makes it easier to "unmanage" a
page of physical memory (this will be taken advantage of in subsequent
commits).

Add a function to add a new page to the freelist.  This could be used
for reclaiming the previously wasted pages left over from preloaded
loader(8) files.

Reviewed by:	dillon
2000-05-21 12:50:18 +00:00
Matthew Dillon
25db2c5417 Add necessary spl protection for swapper. The problem was located by
Alfred while testing his SPLASSERT stuff.   This is not a complete fix,
    more protections are probably needed.
2000-03-27 21:33:32 +00:00
Philippe Charnier
5929bcfaba Revert spelling mistake I made in the previous commit
Requested by: Alan and Bruce
2000-03-27 20:41:17 +00:00
Philippe Charnier
956f31353c Spelling 2000-03-26 15:20:23 +00:00
Eivind Eklund
6bdfe06ad9 Lock reporting and assertion changes.
* lockstatus() and VOP_ISLOCKED() gets a new process argument and a new
  return value: LK_EXCLOTHER, when the lock is held exclusively by another
  process.
* The ASSERT_VOP_(UN)LOCKED family is extended to use what this gives them
* Extend the vnode_if.src format to allow more exact specification than
  locked/unlocked.

This commit should not do any semantic changes unless you are using
DEBUG_VFS_LOCKS.

Discussed with:	grog, mch, peter, phk
Reviewed by:	peter
1999-12-11 16:13:02 +00:00
Alan Cox
be72f78813 The core of this patch is to vm/vm_page.h. The effects are two-fold: (1) to
eliminate an extra (useless) level of indirection in half of the page
queue accesses and (2) to use a single name for each queue throughout,
instead of, e.g., "vm_page_queue_active" in some places and
"vm_page_queues[PQ_ACTIVE]" in others.

Reviewed by:	dillon
1999-10-30 07:37:14 +00:00
Poul-Henning Kamp
923502ff91 useracc() the prequel:
Merge the contents (less some trivial bordering the silly comments)
of <vm/vm_prot.h> and <vm/vm_inherit.h> into <vm/vm.h>.  This puts
the #defines for the vm_inherit_t and vm_prot_t types next to their
typedefs.

This paves the road for the commit to follow shortly: change
useracc() to use VM_PROT_{READ|WRITE} rather than B_{READ|WRITE}
as argument.
1999-10-29 18:09:36 +00:00
Matthew Dillon
90ecac61c0 Reviewed by: Alan Cox <alc@cs.rice.edu>, David Greenman <dg@root.com>
Replace various VM related page count calculations strewn over the
    VM code with inlines to aid in readability and to reduce fragility
    in the code where modules depend on the same test being performed
    to properly sleep and wakeup.

    Split out a portion of the page deactivation code into an inline
    in vm_page.c to support vm_page_dontneed().

    add vm_page_dontneed(), which handles the madvise MADV_DONTNEED
    feature in a related commit coming up for vm_map.c/vm_object.c.  This
    code prevents degenerate cases where an essentially active page may
    be rotated through a subset of the paging lists, resulting in premature
    disposal.
1999-09-17 04:56:40 +00:00
Peter Wemm
c3aac50f28 $Id$ -> $FreeBSD$ 1999-08-28 01:08:13 +00:00
Alan Cox
aeea9b3695 Remove two unused variable declarations. 1999-08-22 00:01:46 +00:00
Alan Cox
bfbacbd93f vm_pageout_clean:
Remove dead code.

Submitted by:	dillon
1999-08-17 00:07:35 +00:00
Kirk McKusick
e929c00d23 The buffer queue mechanism has been reformulated. Instead of having
QUEUE_AGE, QUEUE_LRU, and QUEUE_EMPTY we instead have QUEUE_CLEAN,
QUEUE_DIRTY, QUEUE_EMPTY, and QUEUE_EMPTYKVA.  With this patch clean
and dirty buffers have been separated.  Empty buffers with KVM
assignments have been separated from truely empty buffers.  getnewbuf()
has been rewritten and now operates in a 100% optimal fashion.  That is,
it is able to find precisely the right kind of buffer it needs to
allocate a new buffer, defragment KVM, or to free-up an existing buffer
when the buffer cache is full (which is a steady-state situation for
the buffer cache).

Buffer flushing has been reorganized.  Previously buffers were flushed
in the context of whatever process hit the conditions forcing buffer
flushing to occur.  This resulted in processes blocking on conditions
unrelated to what they were doing.  This also resulted in inappropriate
VFS stacking chains due to multiple processes getting stuck trying to
flush dirty buffers or due to a single process getting into a situation
where it might attempt to flush buffers recursively - a situation that
was only partially fixed in prior commits.  We have added a new daemon
called the buf_daemon which is responsible for flushing dirty buffers
when the number of dirty buffers exceeds the vfs.hidirtybuffers limit.
This daemon attempts to dynamically adjust the rate at which dirty buffers
are flushed such that getnewbuf() calls (almost) never block.

The number of nbufs and amount of buffer space is now scaled past the
8MB limit that was previously imposed for systems with over 64MB of
memory, and the vfs.{lo,hi}dirtybuffers limits have been relaxed
somewhat.  The number of physical buffers has been increased with the
intention that we will manage physical I/O differently in the future.

reassignbuf previously attempted to keep the dirtyblkhd list sorted which
could result in non-deterministic operation under certain conditions,
such as when a large number of dirty buffers are being managed.  This
algorithm has been changed.  reassignbuf now keeps buffers locally sorted
if it can do so cheaply, and otherwise gives up and adds buffers to
the head of the dirtyblkhd list.  The new algorithm is deterministic but
not perfect.  The new algorithm greatly reduces problems that previously
occured when write_behind was turned off in the system.

The P_FLSINPROG proc->p_flag bit has been replaced by the more descriptive
P_BUFEXHAUST bit.  This bit allows processes working with filesystem
buffers to use available emergency reserves.  Normal processes do not set
this bit and are not allowed to dig into emergency reserves.  The purpose
of this bit is to avoid low-memory deadlocks.

A small race condition was fixed in getpbuf() in vm/vm_pager.c.

Submitted by:	Matthew Dillon <dillon@apollo.backplane.com>
Reviewed by:	Kirk McKusick <mckusick@mckusick.com>
1999-07-04 00:25:38 +00:00
Peter Wemm
9c8b8baa38 Slight reorganization of kernel thread/process creation. Instead of using
SYSINIT_KT() etc (which is a static, compile-time procedure), use a
NetBSD-style kthread_create() interface.  kproc_start is still available
as a SYSINIT() hook.  This allowed simplification of chunks of the
sysinit code in the process.  This kthread_create() is our old kproc_start
internals, with the SYSINIT_KT fork hooks grafted in and tweaked to work
the same as the NetBSD one.

One thing I'd like to do shortly is get rid of nfsiod as a user initiated
process.  It makes sense for the nfs client code to create them on the
fly as needed up to a user settable limit.  This means that nfsiod
doesn't need to be in /sbin and is always "available".  This is a fair bit
easier to do outside of the SYSINIT_KT() framework.
1999-07-01 13:21:46 +00:00
Peter Wemm
d50c199430 There isn't much point waking up a daemon that hasn't existed since
softupdates came in.  Try calling speedup_syncer() instead..
1999-06-26 14:56:58 +00:00
Dmitrij Tejblum
11a9f83f80 Make pmap_collect() an official pmap interface. 1999-04-23 20:29:58 +00:00
Peter Wemm
c8da68e917 Don't forcibly kill processes that are locked in-core via PHOLD - it was
just checking P_NOSWAP before.
1999-04-06 03:14:56 +00:00
Julian Elischer
51df594922 Stop the mfs from trying to swap out crucial bits of the mfs
as this can lead to deadlock.
Submitted by: Mat dillon <dillon@freebsd.org>
1999-03-12 00:44:03 +00:00
Luoqi Chen
fe2144fd5a Eliminate a possible numerical overflow. 1999-02-19 19:14:48 +00:00
Luoqi Chen
b1028ad122 Hide access to vmspace:vm_pmap with inline function vmspace_pmap(). This
is the preparation step for moving pmap storage out of vmspace proper.

Reviewed by:	Alan Cox	<alc@cs.rice.edu>
		Matthew Dillion	<dillon@apollo.backplane.com>
1999-02-19 14:25:37 +00:00
Matthew Dillon
faa273d5c2 Rip out PQ_ZERO queue. PQ_ZERO functionality is now combined in with
PQ_FREE.  There is little operational difference other then the kernel
    being a few kilobytes smaller and the code being more readable.

    * vm_page_select_free() has been *greatly* simplified.
    * The PQ_ZERO page queue and supporting structures have been removed
    * vm_page_zero_idle() revamped (see below)

    PG_ZERO setting and clearing has been migrated from vm_page_alloc()
    to vm_page_free[_zero]() and will eventually be guarenteed to remain
    tracked throughout a page's life ( if it isn't already ).

    When a page is freed, PG_ZERO pages are appended to the appropriate
    tailq in the PQ_FREE queue while non-PG_ZERO pages are prepended.
    When locating a new free page, PG_ZERO selection operates from within
    vm_page_list_find() ( get page from end of queue instead of beginning
    of queue ) and then only occurs in the nominal critical path case.  If
    the nominal case misses, both normal and zero-page allocation devolves
    into the same _vm_page_list_find() select code without any specific
    zero-page optimizations.

    Additionally, vm_page_zero_idle() has been revamped.  Hysteresis has been
    added and zero-page tracking adjusted to conform with the other changes.
    Currently hysteresis is set at 1/3 (lo) and 1/2 (hi) the number of free
    pages.  We may wish to increase both parameters as time permits.  The
    hysteresis is designed to avoid silly zeroing in borderline allocation/free
    situations.
1999-02-08 00:37:36 +00:00
Matthew Dillon
9fdfe602fc Remove MAP_ENTRY_IS_A_MAP 'share' maps. These maps were once used to
attempt to optimize forks but were essentially given-up on due to
    problems and replaced with an explicit dup of the vm_map_entry structure.
    Prior to the removal, they were entirely unused.
1999-02-07 21:48:23 +00:00
Matthew Dillon
7dbf82dc13 Change all manual settings of vm_page_t->dirty = VM_PAGE_BITS_ALL
to use the vm_page_dirty() inline.

    The inline can thus do sanity checks ( or not ) over all cases.
1999-01-24 06:04:52 +00:00
Matthew Dillon
d044d7bfb6 Added warning printf ( needs INVARIANTS ) when busy cache page is found
while trying to free memory.
1999-01-24 01:33:22 +00:00
Matthew Dillon
aaba53da90 It is possible for a page in the cache to be busy. vm_pageout.c was not
checking for this condition while it tried to free cache pages.  Fixed.
1999-01-24 01:06:31 +00:00
Matthew Dillon
a489f7614b Reorganized some of the low memory testing code to make it more useful.
Removed call to vm_object_collapse(), which can block.  This was being
    called without the pageout code holding any sort of reference on the
    vm_object or vm_page_t structures being manipulated.  Since this code
    can block, it was possible for other kernel code to shred the state
    the pageout code was assuming remained intact.

    Fixed potential blocking condition in vm_pageout_page_free() ( which
    could cause a deadlock in a low-memory situation ).

    Currently there is a hack in-place to deal with clean filesystem meta-data
    polluting the inactive page queue.  John doesn't like the hack, and neither
    do I.

    Revamped and commented a portion of the pageout loop.

    Added protection against potential memory deadlocks with OBJT_VNODE
    when using VOP_ISLOCKED().  The problem is that vp->v_data can be NULL
    which causes VOP_ISLOCKED() to return a less informed answer.

    remove vm_pager_sync() -- none of the pagers use it any more ( the old
    swapper used to.  The new one does not ).
1999-01-21 10:12:54 +00:00
Matthew Dillon
1c7c3c6a86 This is a rather large commit that encompasses the new swapper,
changes to the VM system to support the new swapper, VM bug
    fixes, several VM optimizations, and some additional revamping of the
    VM code.  The specific bug fixes will be documented with additional
    forced commits.  This commit is somewhat rough in regards to code
    cleanup issues.

Reviewed by:	"John S. Dyson" <root@dyson.iquest.net>, "David Greenman" <dg@root.com>
1999-01-21 08:29:12 +00:00
Peter Wemm
b0359e2c11 Add John Dyson's SYSCTL descriptions, and an export of more stats to
a sysctl hierarchy (vm.stats.*).  SYSCTL descriptions are only present
in source, they do not get compiled into the binaries taking up memory.
1998-10-31 17:21:31 +00:00
Poul-Henning Kamp
f5ef029e92 Nitpicking and dusting performed on a train. Removes trivial warnings
about unused variables, labels and other lint.
1998-10-25 17:44:59 +00:00
Andrzej Bialecki
faa5f8d8da Make #define NO_SWAPPING a normal kernel config option.
Reviewed by:	jkh
1998-09-29 17:33:59 +00:00
Doug Rabson
e69763a315 Cosmetic changes to the PAGE_XXX macros to make them consistent with
the other objects in vm.
1998-09-04 08:06:57 +00:00
Doug Rabson
069e9bc1b4 Change various syscalls to use size_t arguments instead of u_int.
Add some overflow checks to read/write (from bde).

Change all modifications to vm_page::flags, vm_page::busy, vm_object::flags
and vm_object::paging_in_progress to use operations which are not
interruptable.

Reviewed by: Bruce Evans <bde@zeta.org.au>
1998-08-24 08:39:39 +00:00
Doug Rabson
d474eaaa5f Protect all modifications to paging_in_progress with splvm(). The i386
managed to avoid corruption of this variable by luck (the compiler used a
memory read-modify-write instruction which wasn't interruptable) but other
architectures cannot.

With this change, I am now able to 'make buildworld' on the alpha (sfx: the
crowd goes wild...)
1998-08-06 08:33:19 +00:00
Alexander Langer
427e99a0b8 Removed unnecessary test from if/else construct.
PR:		7233
Submitted by:	Stefan Eggers <seggers@semyam.dinoco.de>
1998-07-10 17:58:35 +00:00
John Dyson
e8f367853b Correct sleep priority. 1998-06-02 05:39:13 +00:00
Poul-Henning Kamp
227ee8a188 Eradicate the variable "time" from the kernel, using various measures.
"time" wasn't a atomic variable, so splfoo() protection were needed
around any access to it, unless you just wanted the seconds part.

Most uses of time.tv_sec now uses the new variable time_second instead.

gettime() changed to getmicrotime(0.

Remove a couple of unneeded splfoo() protections, the new getmicrotime()
is atomic, (until Bruce sets a breakpoint in it).

A couple of places needed random data, so use read_random() instead
of mucking about with time which isn't random.

Add a new nfs_curusec() function.

Mark a couple of bogosities involving the now disappeard time variable.

Update ffs_update() to avoid the weird "== &time" checks, by fixing the
one remaining call that passwd &time as args.

Change profiling in ncr.c to use ticks instead of time.  Resolution is
the same.

Add new function "tvtohz()" to avoid the bogus "splfoo(), add time, call
hzto() which subtracts time" sequences.

Reviewed by:	bde
1998-03-30 09:56:58 +00:00
John Dyson
bef608bd7e Some VM improvements, including elimination of alot of Sig-11
problems.  Tor Egge and others have helped with various VM bugs
lately, but don't blame him -- blame me!!!

pmap.c:
1)	Create an object for kernel page table allocations.  This
	fixes a bogus allocation method previously used for such, by
	grabbing pages from the kernel object, using bogus pindexes.
	(This was a code cleanup, and perhaps a minor system stability
	 issue.)

pmap.c:
2)	Pre-set the modify and accessed bits when prudent.  This will
	decrease bus traffic under certain circumstances.

vfs_bio.c, vfs_cluster.c:
3)	Rather than calculating the beginning virtual byte offset
	multiple times, stick the offset into the buffer header, so
	that the calculated offset can be reused.  (Long long multiplies
	are often expensive, and this is a probably unmeasurable performance
	improvement, and code cleanup.)

vfs_bio.c:
4)	Handle write recursion more intelligently (but not perfectly) so
	that it is less likely to cause a system panic, and is also
	much more robust.

vfs_bio.c:
5)	getblk incorrectly wrote out blocks that are incorrectly sized.
	The problem is fixed, and writes blocks out ONLY when B_DELWRI
	is true.

vfs_bio.c:
6)	Check that already constituted buffers have fully valid pages.  If
	not, then make sure that the B_CACHE bit is not set. (This was
	a major source of Sig-11 type problems.)

vfs_bio.c:
7)	Fix a potential system deadlock due to an incorrectly specified
	sleep priority while waiting for a buffer write operation.  The
	change that I made opens the system up to serious problems, and
	we need to examine the issue of process sleep priorities.

vfs_cluster.c, vfs_bio.c:
8)	Make clustered reads work more correctly (and more completely)
	when buffers are already constituted, but not fully valid.
	(This was another system reliability issue.)

vfs_subr.c, ffs_inode.c:
9)	Create a vtruncbuf function, which is used by filesystems that
	can truncate files.  The vinvalbuf forced a file sync type operation,
	while vtruncbuf only invalidates the buffers past the new end of file,
	and also invalidates the appropriate pages.  (This was a system reliabiliy
	and performance issue.)

10)	Modify FFS to use vtruncbuf.

vm_object.c:
11)	Make the object rundown mechanism for OBJT_VNODE type objects work
	more correctly.  Included in that fix, create pager entries for
	the OBJT_DEAD pager type, so that paging requests that might slip
	in during race conditions are properly handled.  (This was a system
	reliability issue.)

vm_page.c:
12)	Make some of the page validation routines be a little less picky
	about arguments passed to them.  Also, support page invalidation
	change the object generation count so that we handle generation
	counts a little more robustly.

vm_pageout.c:
13)	Further reduce pageout daemon activity when the system doesn't
	need help from it.  There should be no additional performance
	decrease even when the pageout daemon is running.  (This was
	a significant performance issue.)

vnode_pager.c:
14)	Teach the vnode pager to handle race conditions during vnode
	deallocations.
1998-03-16 01:56:03 +00:00
John Dyson
be01eafd5f Quell unneeded pageout daemon activity. 1998-03-08 18:19:17 +00:00
John Dyson
8f9110f6a1 This mega-commit is meant to fix numerous interrelated problems. There
has been some bitrot and incorrect assumptions in the vfs_bio code.  These
problems have manifest themselves worse on NFS type filesystems, but can
still affect local filesystems under certain circumstances.  Most of
the problems have involved mmap consistancy, and as a side-effect broke
the vfs.ioopt code.  This code might have been committed seperately, but
almost everything is interrelated.

1)	Allow (pmap_object_init_pt) prefaulting of buffer-busy pages that
	are fully valid.
2)	Rather than deactivating erroneously read initial (header) pages in
	kern_exec, we now free them.
3)	Fix the rundown of non-VMIO buffers that are in an inconsistent
	(missing vp) state.
4)	Fix the disassociation of pages from buffers in brelse.  The previous
	code had rotted and was faulty in a couple of important circumstances.
5)	Remove a gratuitious buffer wakeup in vfs_vmio_release.
6)	Remove a crufty and currently unused cluster mechanism for VBLK
	files in vfs_bio_awrite.  When the code is functional, I'll add back
	a cleaner version.
7)	The page busy count wakeups assocated with the buffer cache usage were
	incorrectly cleaned up in a previous commit by me.  Revert to the
	original, correct version, but with a cleaner implementation.
8)	The cluster read code now tries to keep data associated with buffers
	more aggressively (without breaking the heuristics) when it is presumed
	that the read data (buffers) will be soon needed.
9)	Change to filesystem lockmgr locks so that they use LK_NOPAUSE.  The
	delay loop waiting is not useful for filesystem locks, due to the
	length of the time intervals.
10)	Correct and clean-up spec_getpages.
11)	Implement a fully functional nfs_getpages, nfs_putpages.
12)	Fix nfs_write so that modifications are coherent with the NFS data on
	the server disk (at least as well as NFS seems to allow.)
13)	Properly support MS_INVALIDATE on NFS.
14)	Properly pass down MS_INVALIDATE to lower levels of the VM code from
	vm_map_clean.
15)	Better support the notion of pages being busy but valid, so that
	fewer in-transit waits occur.  (use p->busy more for pageouts instead
	of PG_BUSY.)  Since the page is fully valid, it is still usable for
	reads.
16)	It is possible (in error) for cached pages to be busy.  Make the
	page allocation code handle that case correctly.  (It should probably
	be a printf or panic, but I want the system to handle coding errors
	robustly.  I'll probably add a printf.)
17)	Correct the design and usage of vm_page_sleep.  It didn't handle
	consistancy problems very well, so make the design a little less
	lofty.  After vm_page_sleep, if it ever blocked, it is still important
	to relookup the page (if the object generation count changed), and
	verify it's status (always.)
18)	In vm_pageout.c, vm_pageout_clean had rotted, so clean that up.
19)	Push the page busy for writes and VM_PROT_READ into vm_pageout_flush.
20)	Fix vm_pager_put_pages and it's descendents to support an int flag
	instead of a boolean, so that we can pass down the invalidate bit.
1998-03-07 21:37:31 +00:00
John Dyson
ffc82b0a70 1) Use a more consistent page wait methodology.
2)	Do not unnecessarily force page blocking when paging
	pages out.
3)	Further improve swap pager performance and correctness,
	including fixing the paging in progress deadlock (except
	in severe I/O error conditions.)
4)	Enable vfs_ioopt=1 as a default.
5)	Fix and enable the page prezeroing in SMP mode.

All in all, SMP systems especially should show a significant
improvement in "snappyness."
1998-03-01 04:18:54 +00:00
John Dyson
a15403de7b Correct some severe VM tuning problems for small systems (<=16MB), and
improve tuning on larger systems.  (A couple of the VM tuning params for
small systems were so badly chosen that the system could hang under load.)

The broken tuning was originaly my fault.
1998-02-24 10:16:23 +00:00
John Dyson
e47ed70b0f Significantly improve the efficiency of the swap pager, which appears to
have declined due to code-rot over time.  The swap pager rundown code
has been clean-up, and unneeded wakeups removed.  Lots of splbio's
are changed to splvm's.  Also, set the dynamic tunables for the
pageout daemon to be more sane for larger systems (thereby decreasing
the daemon overheadla.)
1998-02-23 08:22:48 +00:00
Eivind Eklund
303b270b0a Staticize. 1998-02-09 06:11:36 +00:00
Eivind Eklund
0b08f5f737 Back out DIAGNOSTIC changes. 1998-02-06 12:14:30 +00:00
John Dyson
95461b450d 1) Start using a cleaner and more consistant page allocator instead
of the various ad-hoc schemes.
2)	When bringing in UPAGES, the pmap code needs to do another vm_page_lookup.
3)	When appropriate, set the PG_A or PG_M bits a-priori to both avoid some
	processor errata, and to minimize redundant processor updating of page
	tables.
4)	Modify pmap_protect so that it can only remove permissions (as it
	originally supported.)  The additional capability is not needed.
5)	Streamline read-only to read-write page mappings.
6)	For pmap_copy_page, don't enable write mapping for source page.
7)	Correct and clean-up pmap_incore.
8)	Cluster initial kern_exec pagin.
9)	Removal of some minor lint from kern_malloc.
10)	Correct some ioopt code.
11)	Remove some dead code from the MI swapout routine.
12)	Correct vm_object_deallocate (to remove backing_object ref.)
13)	Fix dead object handling, that had problems under heavy memory load.
14)	Add minor vm_page_lookup improvements.
15)	Some pages are not in objects, and make sure that the vm_page.c can
	properly support such pages.
16)	Add some more page deficit handling.
17)	Some minor code readability improvements.
1998-02-05 03:32:49 +00:00
Eivind Eklund
47cfdb166d Turn DIAGNOSTIC into a new-style option. 1998-02-04 22:34:03 +00:00
John Dyson
eaf13dd73a Change the busy page mgmt, so that when pages are freed, they
MUST be PG_BUSY.  It is bogus to free a page that isn't busy,
because it is in a state of being "unavailable" when being
freed.  The additional advantage is that the page_remove code
has a better cross-check that the page should be busy and
unavailable for other use.  There were some minor problems
with the collapse code, and this plugs those subtile "holes."

Also, the vfs_bio code wasn't checking correctly for PG_BUSY
pages.  I am going to develop a more consistant scheme for
grabbing pages, busy or otherwise.  For now, we are stuck
with the current morass.
1998-01-31 11:56:53 +00:00