Commit Graph

1125 Commits

Author SHA1 Message Date
John Baldwin
c86b6ff551 Change the preemption code for software interrupt thread schedules and
mutex releases to not require flags for the cases when preemption is
not allowed:

The purpose of the MTX_NOSWITCH and SWI_NOSWITCH flags is to prevent
switching to a higher priority thread on mutex releease and swi schedule,
respectively when that switch is not safe.  Now that the critical section
API maintains a per-thread nesting count, the kernel can easily check
whether or not it should switch without relying on flags from the
programmer.  This fixes a few bugs in that all current callers of
swi_sched() used SWI_NOSWITCH, when in fact, only the ones called from
fast interrupt handlers and the swi_sched of softclock needed this flag.
Note that to ensure that swi_sched()'s in clock and fast interrupt
handlers do not switch, these handlers have to be explicitly wrapped
in critical_enter/exit pairs.  Presently, just wrapping the handlers is
sufficient, but in the future with the fully preemptive kernel, the
interrupt must be EOI'd before critical_exit() is called.  (critical_exit()
can switch due to a deferred preemption in a fully preemptive kernel.)

I've tested the changes to the interrupt code on i386 and alpha.  I have
not tested ia64, but the interrupt code is almost identical to the alpha
code, so I expect it will work fine.  PowerPC and ARM do not yet have
interrupt code in the tree so they shouldn't be broken.  Sparc64 is
broken, but that's been ok'd by jake and tmm who will be fixing the
interrupt code for sparc64 shortly.

Reviewed by:	peter
Tested on:	i386, alpha
2002-01-05 08:47:13 +00:00
Matthew Dillon
23b590188f Fix a BUF_TIMELOCK race against BUF_LOCK and fix a deadlock in vget()
against VM_WAIT in the pageout code.  Both fixes involve adjusting
the lockmgr's timeout capability so locks obtained with timeouts do not
interfere with locks obtained without a timeout.

Hopefully MFC: before the 4.5 release
2001-12-20 22:42:27 +00:00
Matthew Dillon
3ebeaf5984 This fixes a large number of bugs in our NFS client side code. A recent
commit by Kirk also fixed a softupdates bug that could easily be triggered
by server side NFS.

	* An edge case with shared R+W mmap()'s and truncate whereby
	  the system would inappropriately clear the dirty bits on
	  still-dirty data.  (applicable to all filesystems)

	  THIS FIX TEMPORARILY DISABLED PENDING FURTHER TESTING.
	  see vm/vm_page.c line 1641

	* The straddle case for VM pages and buffer cache buffers when
	  truncating.  (applicable to NFS client side)

	* Possible SMP database corruption due to vm_pager_unmap_page()
	  not clearing the TLB for the other cpu's.  (applicable to NFS
	  client side but could effect all filesystems).  Note: not
	  considered serious since the corruption occurs beyond the file
	  EOF.

	* When flusing a dirty buffer due to B_CACHE getting cleared,
	  we were accidently setting B_CACHE again (that is, bwrite() sets
	  B_CACHE), when we really want it to stay clear after the write
	  is complete.  This resulted in a corrupt buffer.  (applicable
	  to all filesystems but probably only triggered by NFS)

	* We have to call vtruncbuf() when ftruncate()ing to remove
	  any buffer cache buffers.  This is still tentitive, I may
	  be able to remove it due to the second bug fix.  (applicable
	  to NFS client side)

	* vnode_pager_setsize() race against nfs_vinvalbuf()... we have
	  to set n_size before calling nfs_vinvalbuf or the NFS code
	  may recursively vnode_pager_setsize() to the original value
	  before the truncate.  This is what was causing the user mmap
	  bus faults in the nfs tester program.  (applicable to NFS
	  client side)

	* Fix to softupdates (see ufs/ffs/ffs_inode.c 1.73, commit made
	  by Kirk).

Testing program written by: Avadis Tevanian, Jr.
Testing program supplied by: jkh / Apple (see Dec2001 posting to freebsd-hackers with Subject 'NFS: How to make FreeBS fall on its face in one easy step')
MFC after:	1 week
2001-12-14 01:16:57 +00:00
Luigi Rizzo
60363fb9f7 vm/vm_kern.c: rate limit (to once per second) diagnostic printf when
you run out of mbuf address space.

kern/subr_mbuf.c: print a warning message when mb_alloc fails, again
	rate-limited to at most once per second. This covers other
	cases of mbuf allocation failures. Probably it also overlaps the
	one handled in vm/vm_kern.c, so maybe the latter should go away.

This warning will let us gradually remove the printf that are scattered
across most network drivers to report mbuf allocation failures.
Those are potentially dangerous, in that they are not rate-limited and
can easily cause systems to panic.

Unless there is disagreement (which does not seem to be the case
judging from the discussion on -net so far), and because this is
sort of a safety bugfix, I plan to commit a similar change to STABLE
during the weekend (it affects kern/uipc_mbuf.c there).

Discussed-with: jlemon, silby and -net
2001-12-01 00:21:30 +00:00
Jonathan Lemon
4584bbf555 When laying out objects in a ZONE_INTERRUPT zone, allow them to cross
a page boundary, since we've already allocated all our contiguous kva
space up front.  This eliminates some memory wastage, and allows us to
actually reach the # of objects were specified in the zinit() call.

Reviewed by: peter, dillon
2001-11-17 00:40:48 +00:00
Matthew Dillon
fe8e0238cc Fix deadlock introduced in 1.73 (Jan 1998). The paging-in-progress count
on a vnode-backed object must be incremented *after* obtaining the vnode
lock.  If it is bumped before obtaining the vnode lock we can deadlock
against vtruncbuf().

Submitted by:	peter, ps
MFC after:	3 days
2001-11-09 21:34:45 +00:00
Matthew Dillon
33c6774151 Adjust vnode_pager_input_smlfs() to not attempt to BMAP blocks beyond the
file EOF.  This works around a bug in the ISOFS (CDRom) BMAP code which
returns bogus values for requests beyond the file EOF rather then returning
an error, resulting in either corrupt data being mmap()'d beyond the file EOF
or resulting in a seg-fault on the last page of a mmap()'d file (mmap()s of
CDRom files).

Reported by: peter / Yahoo
MFC after:	3 days
2001-11-05 18:58:47 +00:00
Matthew Dillon
e302698320 Don't let pmap_object_init_pt() exhaust all available free pages
(allocating pv entries w/ zalloci) when called in a loop due to
an madvise().  It is possible to completely exhaust the free page list and
cause a system panic when an expected allocation fails.
2001-10-31 03:06:33 +00:00
Matthew Dillon
7a5a635273 Move recently added procedure which was incorrectly placed within an
#ifdef DDB block.
2001-10-26 16:27:54 +00:00
Matthew Dillon
245df27cee Implement kern.maxvnodes. adjusting kern.maxvnodes now actually has a
real effect.

Optimize vfs_msync().  Avoid having to continually drop and re-obtain
mutexes when scanning the vnode list.  Improves looping case by 500%.

Optimize ffs_sync().  Avoid having to continually drop and re-obtain
mutexes when scanning the vnode list.  This makes a couple of assumptions,
which I believe are ok, in regards to vnode stability when the mount list
mutex is held.  Improves looping case by 500%.

(more optimization work is needed on top of these fixes)

MFC after:	1 week
2001-10-26 00:08:05 +00:00
Matthew Dillon
57601bcb5d Syntax cleanup and documentation, no operational changes.
MFC after:	1 day
2001-10-21 06:12:06 +00:00
Ian Dowse
0eb6ce3169 Move the code that computes the system load average from vm_meter.c
to kern_synch.c in preparation for adding some jitter to the
inter-sample time.

Note that the "vm.loadavg" sysctl still lives in vm_meter.c which
isn't the right place, but it is appropriate for the current (bad)
name of that sysctl.

Suggested by:	jhb (some time ago)
Reviewed by:	bde
2001-10-20 13:10:43 +00:00
Matthew Dillon
b386828956 contigmalloc1() could cause the vm_page_zero_count to become incorrect.
Properly track the count.

Submitted by:	mark tinguely <tinguely@web.cs.ndsu.nodak.edu>
2001-10-17 17:34:34 +00:00
Tor Egge
d6844b6bf6 Don't use an uninitialized field reserved for callers in the bio structure
passed to swap_pager_strategy().  Instead, use a field reserved for drivers
and initialize it before usage.

Reviewed by:	dillon
2001-10-15 23:02:54 +00:00
Tor Egge
30105b9ec4 Don't remove all mappings of a swapped out process if the vm map contained
wired entries.  vm_fault_unwire() depends on the mapping being intact.

Reviewed by:	dillon
2001-10-14 20:51:14 +00:00
Tor Egge
e7673b8424 Fix locking violations during page wiring:
- vm map entries are not valid after the map has been unlocked.

 - An exclusive lock on the map is needed before calling
   vm_map_simplify_entry().

Fix cleanup after page wiring failure to unwire all pages that had been
successfully wired before the failure was detected.

Reviewed by:	dillon
2001-10-14 20:47:08 +00:00
Matthew Dillon
33bd457d91 Makes contigalloc[1]() create the vm_map / underlying wired pages in the
kernel map and object in a manner that contigfree() is actually able to
free.  Previously contigfree() freed up the KVA space but could not
unwire & free the underlying VM pages due to mismatched pageability between
the map entry and the VM pages.

Submitted by:	Thomas Moestl <tmoestl@gmx.net>
Testing by: mark tinguely <tinguely@web.cs.ndsu.nodak.edu>
MFC after:	3 days
2001-10-13 04:23:37 +00:00
Matthew Dillon
00a6f47f13 Finally fix the VM bug where a file whos EOF occurs in the middle of a page
would sometimes prevent a dirty page from being cleaned, even when synced,
resulting in the dirty page being re-flushed to disk every 30-60 seconds or
so, forever.  The problem is that when the filesystem flushes a page to
its backing file it typically does not clear dirty bits representing areas
of the page that are beyond the file EOF.  If the file is also mmap()'d and
a fault is taken, vm_fault (properly, is required to) set the vm_page_t->dirty
bits to VM_PAGE_BITS_ALL.  This combination could leave us with an uncleanable,
unfreeable page.

The solution is to have the vnode_pager detect the edge case and manually
clear the dirty bits representing areas beyond the file EOF.  The filesystem
does the rest and the page comes up clean after the write completes.

MFC after:	3 days
2001-10-12 18:17:34 +00:00
John Baldwin
bd78cece5d Change the kernel's ucred API as follows:
- crhold() returns a reference to the ucred whose refcount it bumps.
- crcopy() now simply copies the credentials from one credential to
  another and has no return value.
- a new crshared() primitive is added which returns true if a ucred's
  refcount is > 1 and false (0) otherwise.
2001-10-11 23:38:17 +00:00
John Baldwin
61d80e90a9 Add missing includes of sys/ktr.h. 2001-10-11 17:53:43 +00:00
Paul Saab
cbc89bfbfe Make MAXTSIZ, DFLDSIZ, MAXDSIZ, DFLSSIZ, MAXSSIZ, SGROWSIZ loader
tunable.

Reviewed by:	peter
MFC after:	2 weeks
2001-10-10 23:06:54 +00:00
Ian Dowse
564bfabecb Remove the SSLEEP case from the load average computation. This has
been a no-op for as long as our CVS history goes back. Processes in
state SSLEEP could only be counted if p_slptime == 0, but immediately
before loadav() is called, schedcpu() has just incremented p_slptime
on all SSLEEP processes.
2001-10-04 22:33:31 +00:00
Robert Watson
8c5d4fe829 o Modify access control checks in mmap() to use securelevel_gt() instead
of direct variable access.

Obtained from:	TrustedBSD Project
2001-09-26 20:29:39 +00:00
Julian Elischer
b40ce4165d KSE Milestone 2
Note ALL MODULES MUST BE RECOMPILED
make the kernel aware that there are smaller units of scheduling than the
process. (but only allow one thread per process at this time).
This is functionally equivalent to teh previousl -current except
that there is a thread associated with each process.

Sorry john! (your next MFC will be a doosie!)

Reviewed by: peter@freebsd.org, dillon@freebsd.org

X-MFC after:    ha ha ha ha
2001-09-12 08:38:13 +00:00
Peter Wemm
eb30c1c0b9 Rip some well duplicated code out of cpu_wait() and cpu_exit() and move
it to the MI area.  KSE touched cpu_wait() which had the same change
replicated five ways for each platform.  Now it can just do it once.
The only MD parts seemed to be dealing with fpu state cleanup and things
like vm86 cleanup on x86.  The rest was identical.

XXX: ia64 and powerpc did not have cpu_throw(), so I've put a functional
stub in place.

Reviewed by:	jake, tmm, dillon
2001-09-10 04:28:58 +00:00
John Baldwin
29fdb744d1 Process priority is locked by the sched_lock, not the proc lock. 2001-09-01 20:16:30 +00:00
Matthew Dillon
7feaf028be make swapon() MPSAFE (will adjust syscalls.master later) 2001-08-31 22:15:37 +00:00
Matthew Dillon
6a33d53c48 mark obreak() and ovadvise() as being MPSAFE 2001-08-31 22:10:03 +00:00
Matthew Dillon
d2c60af81a Cleanup 2001-08-31 01:26:30 +00:00
Peter Wemm
3516c025ff Implement idle zeroing of pages. I've been tinkering with this
on and off since John Dyson left his work-in-progress.

It is off by default for now.  sysctl vm.zeroidle_enable=1 to turn it on.

There are some hacks here to deal with the present lack of preemption - we
yield after doing a small number of pages since we wont preempt otherwise.

This is basically Matt's algorithm [with hysteresis] with an idle process
to call it in a similar way it used to be called from the idle loop.

I cleaned up the includes a fair bit here too.
2001-08-25 05:00:44 +00:00
Matthew Dillon
676274db9b Remove support for the badly broken MAP_INHERIT (from -current only). 2001-08-24 19:29:56 +00:00
Matthew Dillon
219d632c15 Move most of the kernel submap initialization code, including the
timeout callwheel and buffer cache, out of the platform specific areas
and into the machine independant area.  i386 and alpha adjusted here.
Other cpus can be fixed piecemeal.

Reviewed by:    freebsd-smp, jake
2001-08-22 04:07:27 +00:00
Matthew Dillon
0b76df7146 KASSERT if vm_page_t->wire_count overflows. 2001-08-22 04:01:56 +00:00
Matthew Dillon
2f9e4e8025 Limit the amount of KVM reserved for the buffer cache and for swap-meta
information.  The default limits only effect machines with > 1GB of ram
and can be overriden with two new kernel conf variables VM_SWZONE_SIZE_MAX
and VM_BCACHE_SIZE_MAX, or with loader variables kern.maxswzone and
kern.maxbcache.  This has the effect of leaving more KVM available for
sizing NMBCLUSTERS and 'maxusers' and should avoid tripups where a sysad
adds memory to a machine and then sees the kernel panic on boot due to
running out of KVM.

Also change the default swap-meta auto-sizing calculation to allocate half
of what it was previously allocating.  The prior defaults were way too high.
Note that we cannot afford to run out of swap-meta structures so we still
stay somewhat conservative here.
2001-08-20 00:41:12 +00:00
John Baldwin
02cd7c3cf2 - Remove asleep(), await(), and M_ASLEEP.
- Callers of asleep() and await() have been converted to calling tsleep().
  The only caller outside of M_ASLEEP was the ata driver, which called both
  asleep() and await() with spl-raised, so there was no need for the
  asleep() and await() pair.  M_ASLEEP was unused.

Reviewed by:	jasone, peter
2001-08-10 06:56:12 +00:00
John Baldwin
8ec48c6dbf - Remove asleep(), await(), and M_ASLEEP.
- Callers of asleep() and await() have been converted to calling tsleep().
  The only caller outside of M_ASLEEP was the ata driver, which called both
  asleep() and await() with spl-raised, so there was no need for the
  asleep() and await() pair.  M_ASLEEP was unused.

Reviewed by:	jasone, peter
2001-08-10 06:37:05 +00:00
Thomas Moestl
59fa485c3e Add a missing semicolon to unbreak the kernel build with INVARIANTS
(which was unfortunately turned off in the confguration I used for the
last test build).

Spotted by:	jake
Pointy hat to:	tmm
2001-08-05 03:55:02 +00:00
John Baldwin
bd8e0d5871 Whitespace fixes. 2001-08-04 20:49:29 +00:00
Thomas Moestl
b4c53a8111 Add a zdestroy() function to the zone allocator. This is needed for the
unload case of modules that use their own zones.
It has been tested with the nfs module.
2001-08-04 20:17:05 +00:00
Alfred Perlstein
61ce6eeee3 Fixups for the initial allocation by dillon:
1) allocate fewer buckets
  2) when failing to allocate swap zone, keep reducing the zone by
     a third rather than a half in order to reduce the chance of
     allocating way too little.

I also moved around some code for readability.

Suggested by: dillon
Reviewed by: dillon
2001-08-02 07:54:58 +00:00
Jake Burkholder
3a9b5daf48 Oops. Last commit to vm_object.c should have got these files too.
Remove the use of atomic ops to manipulate vm_object and vm_page flags.
Giant is required here, so they are superfluous.

Discussed with:	dillon
2001-07-31 04:09:52 +00:00
Jake Burkholder
b06805ad34 Remove the use of atomic ops to manipulate vm_object and vm_page flags.
Giant is required here, so they are superfluous.

Discussed with:	dillon
2001-07-31 04:03:53 +00:00
Ian Dowse
a4821e444e Permit direct swapping to NFS regular files using swapon(2). We
already allow this for NFS swap configured via BOOTP, so it is
known to work fine.

For many diskless configurations is is more flexible to have the
client set up swapping itself; it can recreate a sparse swap file
to save on server space for example, and it works with a non-NFS
root filesystem such as an in-kernel filesystem image.
2001-07-28 20:18:38 +00:00
Assar Westerlund
d3e5863fa9 make vm_page_select_cache static
Requested by:	bde
2001-07-23 12:34:31 +00:00
Assar Westerlund
0379d76358 (vm_page_select_cache): add prototype 2001-07-21 17:08:15 +00:00
Benno Rice
1f246456a5 The i386-specific includes in this file were "fixed" by bracketing them with
#ifndef __alpha__.  Fix this for the rest of the world by turning it into
#ifdef __i386__.

Reviewed by:	obrien
2001-07-15 04:11:51 +00:00
Dag-Erling Smørgrav
bf3009895e Fix missing newline and terminator at the end of the vm.zone sysctl. 2001-07-09 03:37:33 +00:00
Matt Jacob
f343cf2135 Apply field bandages to the includes so compiles happen on alpha. 2001-07-05 06:13:44 +00:00
Matthew Dillon
7197571105 Move vm_page_zero_idle() from machine-dependant sections to a
machine-independant source file, vm/vm_zeroidle.c.  It was exactly the
same for all platforms and updating them all was getting annoying.
2001-07-05 01:32:42 +00:00
Matthew Dillon
6d03d577a5 Reorg vm_page.c into vm_page.c, vm_pageq.c, and vm_contig.c (for contigmalloc).
Also removed some spl's and added some VM mutexes, but they are not actually
used yet, so this commit does not really make any operational changes
to the system.

vm_page.c relates to vm_page_t manipulation, including high level deactivation,
activation, etc...  vm_pageq.c relates to finding free pages and aquiring
exclusive access to a page queue (exclusivity part not yet implemented).
And the world still builds... :-)
2001-07-04 23:27:09 +00:00
Matthew Dillon
1b40f8c036 Change inlines back into mainline code in preparation for mutexing. Also,
most of these inlines had been bloated in -current far beyond their
original intent.  Normalize prototypes and function declarations to be ANSI
only (half already were).  And do some general cleanup.

(kernel size also reduced by 50-100K, but that isn't the prime intent)
2001-07-04 20:15:18 +00:00
Matthew Dillon
54d9214595 whitespace / register cleanup 2001-07-04 19:00:13 +00:00
Matthew Dillon
0cddd8f023 With Alfred's permission, remove vm_mtx in favor of a fine-grained approach
(this commit is just the first stage).  Also add various GIANT_ macros to
formalize the removal of Giant, making it easy to test in a more piecemeal
fashion. These macros will allow us to test fine-grained locks to a degree
before removing Giant, and also after, and to remove Giant in a piecemeal
fashion via sysctl's on those subsystems which the authors believe can
operate without Giant.
2001-07-04 16:20:28 +00:00
John Baldwin
b62b9b648b Fix a XXX comment by moving the initialization of the number of pbuf's
for the vnode pager to a new vnode pager init method instead of making it
a hack in getpages().
2001-07-03 07:35:56 +00:00
John Baldwin
6d541bf1ae - Protect all accesses to nsw_[rw]count{,_{,a}sync} with the pbuf mutex.
- Don't drop the vm mutex while grabbing the pbuf mutex to manipulate
  said variables.
2001-06-22 21:12:19 +00:00
Bosko Milekic
08442f8a82 Introduce numerous SMP friendly changes to the mbuf allocator. Namely,
introduce a modified allocation mechanism for mbufs and mbuf clusters; one
which can scale under SMP and which offers the possibility of resource
reclamation to be implemented in the future. Notable advantages:

 o Reduce contention for SMP by offering per-CPU pools and locks.
 o Better use of data cache due to per-CPU pools.
 o Much less code cache pollution due to excessively large allocation macros.
 o Framework for `grouping' objects from same page together so as to be able
   to possibly free wired-down pages back to the system if they are no longer
   needed by the network stacks.

 Additional things changed with this addition:

  - Moved some mbuf specific declarations and initializations from
    sys/conf/param.c into mbuf-specific code where they belong.
  - m_getclr() has been renamed to m_get_clrd() because the old name is really
    confusing. m_getclr() HAS been preserved though and is defined to the new
    name. No tree sweep has been done "to change the interface," as the old
    name will continue to be supported and is not depracated. The change was
    merely done because m_getclr() sounds too much like "m_get a cluster."
  - TEMPORARILY disabled mbtypes statistics displaying in netstat(1) and
    systat(1) (see TODO below).
  - Fixed systat(1) to display number of "free mbufs" based on new per-CPU
    stat structures.
  - Fixed netstat(1) to display new per-CPU stats based on sysctl-exported
    per-CPU stat structures. All infos are fetched via sysctl.

 TODO (in order of priority):

  - Re-enable mbtypes statistics in both netstat(1) and systat(1) after
    introducing an SMP friendly way to collect the mbtypes stats under the
    already introduced per-CPU locks (i.e. hopefully don't use atomic() - it
    seems too costly for a mere stat update, especially when other locks are
    already present).
  - Optionally have systat(1) display not only "total free mbufs" but also
    "total free mbufs per CPU pool."
  - Fix minor length-fetching issues in netstat(1) related to recently
    re-enabled option to read mbuf stats from a core file.
  - Move reference counters at least for mbuf clusters into an unused portion
    of the cluster itself, to save space and need to allocate a counter.
  - Look into introducing resource freeing possibly from a kproc.

Reviewed by (in parts): jlemon, jake, silby, terry
Tested by: jlemon (Intel & Alpha), mjacob (Intel & Alpha)
Preliminary performance measurements: jlemon (and me, obviously)
URL: http://people.freebsd.org/~bmilekic/mb_alloc/
2001-06-22 06:35:32 +00:00
John Baldwin
ad6c5bbede Don't lock around swap_pager_swap_init() that is only called once during
the pagedaemon's startup code since it calls malloc which results in lock
order reversals.
2001-06-20 23:34:06 +00:00
John Baldwin
69a78d4666 Put the scheduler, vmdaemon, and pagedaemon kthreads back under Giant for
now.  The proc locking isn't actually safe yet and won't be until the proc
locking is finished.
2001-06-20 00:48:20 +00:00
Matthew Dillon
ef6a93ef81 Cleanup the tabbing 2001-06-11 19:17:05 +00:00
Matthew Dillon
ff2b5645b5 Two fixes to the out-of-swap process termination code. First, start killing
processes a little earlier to avoid a deadlock.  Second, when calculating
the 'largest process' do not just count RSS.  Instead count the RSS + SWAP
used by the process.  Without this the code tended to kill small
inconsequential processes like, oh, sshd, rather then one of the many
'eatmem 200MB' I run on a whim :-).  This fix has been extensively tested on
-stable and somewhat tested on -current and will be MFCd in a few days.

Shamed into fixing this by: ps
2001-06-09 18:06:58 +00:00
Thomas Moestl
5c5c8fa826 Change the way information about swap devices is exported to be more
canonical: define a versioned struct xswdev, and add a sysctl node
handler that allows the user to get this structure for a certain device
index by specifying this index as last element of the MIB.
This new node handler, vm.swap_info, replaces the old vm.nswapdev
and vm.swapdevX.* (where X was the index) sysctls.
2001-06-01 22:53:10 +00:00
Thomas Moestl
d279178df7 Clean up the code exporting interrupt statistics via sysctl a bit:
- move the sysctl code to kern_intr.c
- do not use INTRCNT_COUNT, but rather eintrcnt - intrcnt to determine
  the length of the intrcnt array
- move the declarations of intrnames, eintrnames, intrcnt and eintrcnt
  from machine-dependent include files to sys/interrupt.h
- remove the hw.nintr sysctl, it is not needed.
- fix various style bugs

Requested by:	bde
Reviewed by:	bde (some time ago)
2001-06-01 13:23:28 +00:00
John Baldwin
342a1480aa Don't hold the VM lock across VOP's and other things that can sleep. 2001-05-29 16:58:25 +00:00
John Baldwin
190609dd48 Stick VM syscalls back under Giant if the BLEED option is not defined. 2001-05-24 18:04:29 +00:00
Matthew Dillon
ac8f990bde This patch implements O_DIRECT about 80% of the way. It takes a patchset
Tor created a while ago, removes the raw I/O piece (that has cache coherency
problems), and adds a buffer cache / VM freeing piece.

Essentially this patch causes O_DIRECT I/O to not be left in the cache, but
does not prevent it from going through the cache, hence the 80%.  For
the last 20% we need a method by which the I/O can be issued directly to
buffer supplied by the user process and bypass the buffer cache entirely,
but still maintain cache coherency.

I also have the code working under -stable but the changes made to sys/file.h
may not be MFCable, so an MFC is not on the table yet.

Submitted by:	tegge, dillon
2001-05-24 07:22:27 +00:00
John Baldwin
e6b961ffbd - Assert Giant is held in the vnode pager methods.
- Lock the VM while walking down a vm_object's backing_object list in
  vnode_pager_lock().
2001-05-23 22:51:23 +00:00
John Baldwin
3614c6fcbb - Add in several asserts of vm_mtx.
- Assert Giant in vm_pageout_scan() for the vnode hacking that it does.
- Don't hold vm_mtx around vget() or vput().
- Lock Giant when calling vm_pageout_scan() from the pagedaemon.  Also,
  lock curproc while setting the P_BUFEXHAUST flag.
- For now we still hold Giant for all of the vm_daemon.  When process
  limits are locked we will be only need Giant for swapout_procs().
2001-05-23 22:48:28 +00:00
John Baldwin
60517fd1f7 - Assert that the vm lock is held for all of _vm_object_allocate().
- Restore the previous order of setting up a new vm_object.  The previous
  had a small bug where we zero'd out the flags after we set the
  OBJ_ONEMAPPING flag.
- Add several asserts of vm_mtx.
- Assert Giant is held rather than locking and unlocking it in a few
  places.
- Add in some #ifdef objlocks code to lock individual vm objects when
  vm objects each have their own lock someday.
- Don't bother acquiring the allproc lock for a ddb command.  If DDB
  blocked on the lock, that would be worse than having an inconsistent
  allproc list.
2001-05-23 22:42:10 +00:00
John Baldwin
21c641b2a9 - Add lots of vm_mtx assertions.
- Add a few KTR tracepoints to track the addition and removal of
  vm_map_entry's and the creation adn free'ing of vmspace's.
- Adjust a few portions of code so that we update the process' vmspace
  pointer to its new vmspace before freeing the old vmspace.
2001-05-23 22:38:00 +00:00
John Baldwin
3a2189d451 - Lock the VM around the pmap_swapin_proc() call in faultin().
- Don't lock Giant in the scheduler() function except for when calling
  faultin().
- In swapout_procs(), lock the VM before the proccess to avoid a lock order
  violation.
- In swapout_procs(), release the allproc lock before calling swapout().
  We restart the process scan after swapping out a process.
- In swapout_procs(), un #if 0 the code to bump the vmspace reference count
  and lock the process' vm structures.  This bug was introduced by me and
  could result in the vmspace being free'd out from under a running
  process.
- Fix an old bug where the vmspace reference count was not free'd if we
  failed the swap_idle_threshold2 test.
2001-05-23 22:35:45 +00:00
John Baldwin
b608320d4a - Fix the sw_alloc_interlock to actually lock itself when the lock is
acquired.
- Assert Giant is held in the strategy, getpages, and putpages methods and
  the getchainbuf, flushchainbuf, and waitchainbuf functions.
- Always call flushchainbuf() w/o the VM lock.
2001-05-23 22:31:15 +00:00
John Baldwin
6d556da5c2 Assert Giant is held for the device pager alloc and getpages methods since
we call the mmap method of the cdevsw of the device we are mmap'ing.
2001-05-23 22:27:52 +00:00
John Baldwin
e4ca250d4b - Obtain Giant in mmap() syscall while messing with file descriptors and
vnodes.
- Fix an old bug that would leak a reference to a fd if the vnode being
  mmap'd wasn't of type VREG or VCHR.
- Lock Giant in vm_mmap() around calls into the VM that can call into
  pager routines that need Giant or into other VM routines that need
  Giant.
- Replace code that used a goto to jump around the else branch of a test
  to use an else branch instead.
2001-05-23 22:17:43 +00:00
John Baldwin
bb10bb4978 Acquire Giant around vm_map_remove() inside of the obreak() syscall for
vm_object_terminate().
2001-05-23 22:13:10 +00:00
John Baldwin
576f0c5fa4 Take a more conservative approach and still lock Giant around VM faults
for now.
2001-05-23 22:09:18 +00:00
John Baldwin
c52f090cfb Set the phys_pager_alloc_lock to 1 when it is acquired so that it is
actually locked.
2001-05-23 19:52:23 +00:00
Alfred Perlstein
c5e62505ad aquire Giant when playing with the buffercache and doing IO.
use msleep against the vm mutex while waiting for a page IO to complete.
2001-05-23 10:28:11 +00:00
Alfred Perlstein
240e0fdd93 aquire vm mutex in swp_pager_async_iodone. Don't call swp_pager_async_iodone
with the mutex held.
2001-05-22 19:01:26 +00:00
John Baldwin
86e92ee7e1 Remove duplicate include and sort includes. 2001-05-22 07:21:46 +00:00
John Baldwin
7d4ad42de5 Sort includes. 2001-05-22 07:01:11 +00:00
John Baldwin
12635f9c89 Unlock the VM lock at the end of munlock() instead of locking it again. 2001-05-22 06:07:36 +00:00
John Baldwin
874468957d Sort includes from previous commit. 2001-05-22 05:35:45 +00:00
John Baldwin
4edf4a58e6 Sort includes. 2001-05-22 00:56:25 +00:00
Alfred Perlstein
2395531439 Introduce a global lock for the vm subsystem (vm_mtx).
vm_mtx does not recurse and is required for most low level
vm operations.

faults can not be taken without holding Giant.

Memory subsystems can now call the base page allocators safely.

Almost all atomic ops were removed as they are covered under the
vm mutex.

Alpha and ia64 now need to catch up to i386's trap handlers.

FFS and NFS have been tested, other filesystems will need minor
changes (grabbing the vm lock when twiddling page properties).

Reviewed (partially) by: jake, jhb
2001-05-19 01:28:09 +00:00
John Baldwin
ea7549540f - Use a timeout for the tsleep in scheduler() instead of having vmmeter()
wakeup proc0 by hand to enforce the timeout.
- When swapping out a process, keep the process locked via the proc lock
  from the first checks up until we clear PS_INMEM and set PS_SWAPPING in
  swapout().  The swapout() function now must be called with the proc lock
  held and releases it before returning.
- Comment out the code to attempt to lock a process' VM structures before
  swapping out.  It is broken in that it releases the lock after obtaining
  it.  If it does grab the lock, it needs to hand it off to swapout()
  instead of releasing it.  This can be revisisted when the VM is locked
  as this is a valid test to perform.  It also causes a lock order reversal
  for the time being, which is the immediate cause for temporarily
  disabling it.
2001-05-18 00:08:38 +00:00
John Baldwin
1c58e4e550 During the code to pick a process to kill when memory is exhausted, keep
the process in question locked as soon as we find it and determine it to
be eligible until we actually kill it.  To avoid deadlock, we don't block
on the process lock but skip any process that is already locked during our
search.
2001-05-17 22:49:03 +00:00
John Baldwin
c96d52a913 - Use PROC_LOCK_ASSERT instead of a direct mtx_assert.
- Don't hold Giant in the swapper daemon while we walk the list of
  processes looking for a process to swap back in.
- Don't bother grabbing the sched_lock while checking a process' sleep
  time in swapout_procs() to ensure that a process has been idle for at
  least swap_idle_threshold2 before swapping it out.  If we lose the race
  we just let a process stay in memory until the next call of
  swapout_procs().
- Remove some unneeded spl's, sched_lock does all the locking needed in
  this case.
2001-05-15 22:20:44 +00:00
Poul-Henning Kamp
a468031ce8 Actually biofinish(struct bio *, struct devstat *, int error) is more general
than the bioerror().

Most of this patch is generated by scripts.
2001-05-06 20:00:03 +00:00
Mark Murray
559034b748 Putting sys/lockmgr.h in here allows us to depollute userland includes
a bit.
OK'ed by:	bde
2001-05-03 11:33:51 +00:00
Mark Murray
fb919e4d5a Undo part of the tangle of having sys/lock.h and sys/mutex.h included in
other "system" header files.

Also help the deprecation of lockmgr.h by making it a sub-include of
sys/lock.h and removing sys/lockmgr.h form kernel .c files.

Sort sys/*.h includes where possible in affected files.

OK'ed by:	bde (with reservations)
2001-05-01 08:13:21 +00:00
Greg Lehey
60fb0ce365 Revert consequences of changes to mount.h, part 2.
Requested by:	bde
2001-04-29 02:45:39 +00:00
Alfred Perlstein
93c7ba9f09 Address a number of problems with sysctl_vm_zone().
The zone allocator's locks should be leaflocks, meaning that they
should never be held when entering into another subsystem, however
the sysctl grabs the zone global mutex and individual zone mutexes
while holding the lock it calls SYSCTL_OUT which recurses into the
VM subsystem in order to wire user memory to do a safe copy.  This
can block and cause lock order reversals.

To fix this:
  lock zone global.
  get a count of the number of zones.
  unlock global.
  allocate temporary storage.
  format and SYSCTL_OUT the banner.
  lock global.
  traverse list.
    make sure we haven't looped more than the initial count taken
      to avoid overflowing the allocated buffer.
    lock each nodes.
    read values and format into buffer.
    unlock individual node.
  unlock global.
  format and SYSCTL_OUT the rest of the data.
  free storage.
  return.

Other problems included not checking for errors when doing sysctl out
of the column header.  Fixed.

Inconsistant termination of the copied string. Fixed.

Objected to by: des (for not using sbuf)

Since the output is not variable length and I'm actually over
allocating signifigantly and I'd like to get this fixed now, I'll
work on the sbuf convertion at a later date.  I would not object
to someone else taking it upon themselves to convert it to sbuf.
I hold no MAINTIANER rights to this code (for now).
2001-04-27 22:24:45 +00:00
Greg Lehey
d98dc34f52 Correct #includes to work with fixed sys/mount.h. 2001-04-23 09:05:15 +00:00
Alfred Perlstein
d8d5fa8805 vnode_pager_freepage() is really vm_page_free() in disguise,
nuke vnode_pager_freepage() and replace all calls to it with vm_page_free()
2001-04-19 06:18:23 +00:00
Alfred Perlstein
a9fa2c05fc Protect pager object creation with sx locks.
Protect pager object list manipulation with a mutex.

It doesn't look possible to combine them under a single sx lock because
creation may block and we can't have the object list manipulation block
on anything other than a mutex because of interrupt requests.
2001-04-18 20:24:16 +00:00
Alfred Perlstein
305dd591ee Fix the botched rev 1.59 where I made it such that without INVARIANTS
the map is never locked.

Submitted by: tegge
2001-04-18 05:30:24 +00:00
Poul-Henning Kamp
f84e29a06c This patch removes the VOP_BWRITE() vector.
VOP_BWRITE() was a hack which made it possible for NFS client
side to use struct buf with non-bio backing.

This patch takes a more general approach and adds a bp->b_op
vector where more methods can be added.

The success of this patch depends on bp->b_op being initialized
all relevant places for some value of "relevant" which is not
easy to determine.  For now the buffers have grown a b_magic
element which will make such issues a tiny bit easier to debug.
2001-04-17 08:56:39 +00:00
Alfred Perlstein
cc64b484dd use TAILQ_FOREACH, fix a comment's location 2001-04-15 10:22:04 +00:00
Alfred Perlstein
971dd34298 if/panic -> KASSERT 2001-04-13 11:15:40 +00:00
Alfred Perlstein
2a758ebe58 protect pbufs and associated counts with a mutex 2001-04-13 10:23:32 +00:00