Commit Graph

1192 Commits

Author SHA1 Message Date
Alan Cox
ff8f4ebe22 Add a comment documenting a race condition in vm_fault(): Specifically, a
modification is made to the vm_map while only a read lock is held.
2002-04-18 03:55:50 +00:00
Alan Cox
6139043b1f o Call vm_map_growstack() from vm_fault() if vm_map_lookup() has failed
due to conditions that suggest the possible need for stack growth.
   This has two beneficial effects: (1) we can
   now remove calls to vm_map_growstack() from the MD trap handlers and (2)
   simple page faults are faster because we no longer unnecessarily perform
   vm_map_growstack() on every page fault.
 o Remove vm_map_growstack() from the i386's trap_pfault().
 o Remove the acquisition and release of Giant from i386's trap_pfault().
   (vm_fault() still acquires it.)
2002-04-18 03:28:27 +00:00
Peter Wemm
334f706177 Do not free the vmspace until p->p_vmspace is set to null. Otherwise
statclock can access it in the tail end of statclock_process() at an
unfortunate time.  This bit me several times on an SMP alpha (UP2000)
and the problem went away with this change.  I'm not sure why it doesn't
break x86 as well.  Maybe it's because the clocks are much faster
on alpha (HZ=1024 by default).
2002-04-17 05:26:42 +00:00
Alan Cox
b208d0633f Remove an unused option, VM_FAULT_HOLD, to vm_fault(). 2002-04-17 02:23:57 +00:00
Peter Wemm
1a87a0da66 Pass vm_page_t instead of physical addresses to pmap_zero_page[_area]()
and pmap_copy_page().  This gets rid of a couple more physical addresses
in upper layers, with the eventual aim of supporting PAE and dealing with
the physical addressing mostly within pmap.  (We will need either 64 bit
physical addresses or page indexes, possibly both depending on the
circumstances.  Leaving this to pmap itself gives more flexibilitly.)

Reviewed by:	jake
Tested on:	i386, ia64 and (I believe) sparc64. (my alpha was hosed)
2002-04-15 16:00:03 +00:00
Jeff Roberson
5300d9dda2 Fix a witness warning when expanding a hash table. We were allocating the new
hash while holding the lock on a zone.  Fix this by doing the allocation
seperately from the actual hash expansion.

The lock is dropped before the allocation and reacquired before the expansion.
The expansion code checks to see if we lost the race and frees the new hash
if we do.  We really never will lose this race because the hash expansion is
single threaded via the timeout mechanism.
2002-04-14 13:47:10 +00:00
Jeff Roberson
0da47b2fc6 Protect the initial list traversal in sysctl_vm_zone() with the uma_mtx. 2002-04-14 12:39:38 +00:00
Jeff Roberson
af7f9b97b6 Fix the calculation that determines uz_maxpages. It was off for large zones.
Fortunately we have no large zones with maximums specified yet, so it wasn't
breaking anything.

Implement blocking when a zone exceeds the maximum and M_WAITOK is specified.
Previously this just failed like the old zone allocator did.  The old zone
allocator didn't support WAITOK/NOWAIT though so we should do what we
advertise.

While I was in there I cleaned up some more zalloc logic to further simplify
that code path and reduce redundant code.  This was needed to make the blocking
work properly anyway.
2002-04-14 01:56:25 +00:00
Jeff Roberson
bce9779110 Remember to unlock the zone if the fill count is too high.
Pointed out by:	pete, jake, jhb
2002-04-10 01:52:50 +00:00
Jeff Roberson
1d4cb54ba8 Quiet witness warnings about acquiring several zone locks. In the case that
this happens it is OK.
2002-04-08 21:08:17 +00:00
Jeff Roberson
86bbae32f4 Add a mechanism to disable buckets when the v_free_count drops below
v_free_min.  This should help performance in memory starved situations.
2002-04-08 06:20:34 +00:00
Jeff Roberson
605cbd6a08 Don't release the zone lock until after the dtor has been called. As far as I
can tell this could not have caused any problems yet because UMA is still
called with giant.

Pointy hat to:	jeff
Noticed by:	jake
2002-04-08 05:13:48 +00:00
Jeff Roberson
9c2cd7e5a9 Implement uma_zdestroy(). It's prototype changed slightly. I decided that I
didn't like the wait argument and that if you were removing a zone it had
better be empty.

Also, I broke out part of hash_expand and made a seperate hash_free() for use
in uma_zdestroy.
2002-04-08 04:48:58 +00:00
Jeff Roberson
a553d4b8eb Rework most of the bucket allocation and free code so that per cpu locks are
never held across blocking operations.  Also, fix two other lock order
reversals that were exposed by jhb's witness change.

The free path previously had a bug that would cause it to skip the free bucket
list in some cases and go straight to allocating a new bucket.  This has been
fixed as well.

These changes made the bucket handling code much cleaner and removed quite a
few lock operations.  This should be marginally faster now.

It is now possible to call malloc w/o Giant and avoid any witness warnings.
This still isn't entirely safe though because malloc_type statistics are not
protected by any lock.
2002-04-08 02:42:55 +00:00
Jeff Roberson
c235bfa551 Spelling correction; s/seperate/separate/g
Submitted by:	eric
2002-04-07 22:56:48 +00:00
Jeff Roberson
fedfeee018 There should be no remaining references to these two files in the tree. If
there are, it is an error.  vm_zone has been superseded by uma.
2002-04-07 22:51:18 +00:00
Jeff Roberson
d0b06acbe1 This fixes a bug where isitem never got set to 1 if a certain chain of events
relating to extreme low memory situations occured.  This was only ever seen on
the port build cluster, so many thanks to kris for helping me debug this.

Tested by:	kris
2002-04-07 22:47:36 +00:00
Alan Cox
aa4d062142 o Eliminate the use of grow_stack() and useracc() from sendsig(), osendsig(),
and osf1_sendsig().
 o Eliminate the prototype for the MD grow_stack() now that it has been removed
   from all platforms.
2002-04-05 00:52:15 +00:00
Matthew Dillon
80f5c8bf42 Embed a struct vmmeter in the per-cpu structure and add a macro,
PCPU_LAZY_INC() which increments elements in it for cases where we
can afford the occassional inaccuracy.  Use of per-cpu stats counters
avoids significant cache stalls in various critical paths that would
otherwise severely limit our cpu scaleability.

Adjust all sysctl's accessing cnt.* elements to now use a procedure
which aggregates the requested field for all cpus and for the global
vmmeter.

The global vmmeter is retained, since some stats counters, like v_free_min,
cannot be made per-cpu.  Also, this allows us to convert counters from
the global vmmeter to the per-cpu vmmeter in a piecemeal fashion, so
have at it!
2002-04-04 21:38:47 +00:00
John Baldwin
6008862bc2 Change callers of mtx_init() to pass in an appropriate lock type name. In
most cases NULL is passed, but in some cases such as network driver locks
(which use the MTX_NETWORK_LOCK macro) and UMA zone locks, a name is used.

Tested on:	i386, alpha, sparc64
2002-04-04 21:03:38 +00:00
Jake Burkholder
48f9a59443 Fix a long standing 32bit-ism. Don't assume that the size of a chunk of
memory in phys_avail will fit in 'int', use vm_size_t.  This fixes booting
on sparc64 machines with more than 2 gigs of ram.

Thanks to Jan Chrillesen for providing me with access to a 4 gig machine.
2002-04-03 06:57:52 +00:00
Alfred Perlstein
157d7b3538 fix comment typo, s/neccisary/necessary/g 2002-04-02 21:25:12 +00:00
John Baldwin
44731cab3b Change the suser() API to take advantage of td_ucred as well as do a
general cleanup of the API.  The entire API now consists of two functions
similar to the pre-KSE API.  The suser() function takes a thread pointer
as its only argument.  The td_ucred member of this thread must be valid
so the only valid thread pointers are curthread and a few kernel threads
such as thread0.  The suser_cred() function takes a pointer to a struct
ucred as its first argument and an integer flag as its second argument.
The flag is currently only used for the PRISON_ROOT flag.

Discussed on:	smp@
2002-04-01 21:31:13 +00:00
Jeff Roberson
f22a4b62f5 Add a new mtx_init option "MTX_DUPOK" which allows duplicate acquires of locks
with this flag.  Remove the dup_list and dup_ok code from subr_witness.  Now
we just check for the flag instead of doing string compares.

Also, switch the process lock, process group lock, and uma per cpu locks over
to this interface.  The original mechanism did not work well for uma because
per cpu lock names are unique to each zone.

Approved by:	jhb
2002-03-27 09:23:41 +00:00
Alan Cox
433b72aa12 Remove an unused prototype. 2002-03-26 05:30:59 +00:00
Jeff Roberson
f4af24d55d Reset the cachefree statistics after draining the cache. This fixes a bug
where a sysctl within 20 seconds of a cache_drain could yield negative "USED"
counts.

Also, grab the uma_mtx while in the sysctl handler.  This hadn't caused
problems yet because Giant is held all the time.

Reported by:	kkenn
2002-03-24 10:56:11 +00:00
Jeff Roberson
736ee5907f Add uma_zone_set_max() to add enforced limits to non vm obj backed zones. 2002-03-20 05:28:34 +00:00
Jeff Roberson
670d17b5c0 Remove references to vm_zone.h and switch over to the new uma API. 2002-03-20 04:02:59 +00:00
Alfred Perlstein
11caded34f Remove __P. 2002-03-19 22:20:14 +00:00
Jeff Roberson
9eb6e51923 Quit a warning introduced by UMA. This only occurs on machines where
vm_size_t != unsigned long.

Reviewed by:	phk
2002-03-19 11:49:10 +00:00
Peter Wemm
30171114b3 Fix a gcc-3.1+ warning.
warning: deprecated use of label at end of compound statement

ie: you cannot do this anymore:
switch(foo) {
....

default:
}
2002-03-19 11:02:06 +00:00
Jeff Roberson
8355f576a9 This is the first part of the new kernel memory allocator. This replaces
malloc(9) and vm_zone with a slab like allocator.

Reviewed by:	arch@
2002-03-19 09:11:49 +00:00
Brian Feldman
25adb370be Back out the modification of vm_map locks from lockmgr to sx locks. The
best path forward now is likely to change the lockmgr locks to simple
sleep mutexes, then see if any extra contention it generates is greater
than removed overhead of managing local locking state information,
cost of extra calls into lockmgr, etc.

Additionally, making the vm_map lock a mutex and respecting it properly
will put us much closer to not needing Giant magic in vm.
2002-03-18 15:08:09 +00:00
Alan Cox
9f0567f557 Remove vm_object_count: It's unused, incorrectly maintained and duplicates
information maintained by the zone allocator.
2002-03-17 18:37:37 +00:00
Alan Cox
5ee9fe6ba1 Undo part of revision 1.57: Now that (o)sendsig() doesn't call useracc(),
the motivation for saving and restoring the map->hint in useracc() is gone.
(The same tests that motivated this change in revision 1.57 now show that
there is no performance loss from removing it.)  This was really a hack and
some day we would have had to add new synchronization here on map->hint
to maintain it.
2002-03-17 07:01:42 +00:00
Alan Cox
2f6c16e1e8 Acquire a read lock on the map inside of vm_map_check_protection() rather
than expecting the caller to do so.  This (1) eliminates duplicated code in
kernacc() and useracc() and (2) fixes missing synchronization in munmap().
2002-03-17 03:19:31 +00:00
Jake Burkholder
ac59490b5e Convert all pmap_kenter/pmap_kremove pairs in MI code to use pmap_qenter/
pmap_qremove.  pmap_kenter is not safe to use in MI code because it is not
guaranteed to flush the mapping from the tlb on all cpus.  If the process
in question is preempted and migrates cpus between the call to pmap_kenter
and pmap_kremove, the original cpu will be left with stale mappings in its
tlb.  This is currently not a problem for i386 because we do not use PG_G on
SMP, and thus all mappings are flushed from the tlb on context switches, not
just user mappings.  This is not the case on all architectures, and if PG_G
is to be used with SMP on i386 it will be a problem.  This was committed by
peter earlier as part of his fine grained tlb shootdown work for i386, which
was backed out for other reasons.

Reviewed by:	peter
2002-03-17 00:56:41 +00:00
Kirk McKusick
0d2af52141 Introduce the new 64-bit size disk block, daddr64_t. Change
the bio and buffer structures to have daddr64_t bio_pblkno,
b_blkno, and b_lblkno fields which allows access to disks
larger than a Terabyte in size. This change also requires
that the VOP_BMAP vnode operation accept and return daddr64_t
blocks. This delta should not affect system operation in
any way. It merely sets up the necessary interfaces to allow
the development of disk drivers that work with these larger
disk block addresses. It also allows for the development of
UFS2 which will use 64-bit block addresses.
2002-03-15 18:49:47 +00:00
Brian Feldman
9cb574590e Document faultstate.lookup_still_valid more than none.
Requested by:	alfred
2002-03-14 02:10:14 +00:00
Brian Feldman
0e0af8ecda Rename SI_SUB_MUTEX to SI_SUB_MTX_POOL to make the name at all accurate.
While doing this, move it earlier in the sysinit boot process so that the
VM system can use it.

After that, the system is now able to use sx locks instead of lockmgr
locks in the VM system.  To accomplish this, some of the more
questionable uses of the locks (such as testing whether they are
owned or not, as well as allowing shared+exclusive recursion) are
removed, and simpler logic throughout is used so locks should also be
easier to understand.

This has been tested on my laptop for months, and has not shown any
problems on SMP systems, either, so appears quite safe.  One more
user of lockmgr down, many more to go :)
2002-03-13 23:48:08 +00:00
Eivind Eklund
a128794977 - Remove a number of extra newlines that do not belong here according to
style(9)
- Minor space adjustment in cases where we have "( ", " )", if(), return(),
  while(), for(), etc.
- Add /* SYMBOL */ after a few #endifs.

Reviewed by:	alc
2002-03-10 21:52:48 +00:00
Tor Egge
ff91d7800f Revert change in revision 1.53 and add a small comment to protect
the revived code.

vm pages newly allocated are marked busy (PG_BUSY), thus calling
vm_page_delete before the pages has been freed or unbusied will
cause a deadlock since vm_page_object_page_remove will wait for the
busy flag to be cleared.  This can be triggered by calling malloc
with size > PAGE_SIZE and the M_NOWAIT flag on systems low on
physical free memory.

A kernel module that reproduces the problem, written by Logan Gabriel
<logan@mail.2cactus.com>, can be found in the freebsd-hackers mail
archive (12 Apr 2001).  The problem was recently noticed again by
Archie Cobbs <archie@dellroad.org>.

Reviewed by:	dillon
2002-03-09 16:24:27 +00:00
Matthew Dillon
8c5dffe8ca Fix a bug in the vm_map_clean() procedure. msync()ing an area of memory
that has just been mapped MAP_ANON|MAP_NOSYNC and has not yet been accessed
will panic the machine.

MFC after:	1 day
2002-03-07 03:54:56 +00:00
Matthew Dillon
b9b7a4be90 Add a sequential iteration optimization to vm_object_page_clean(). This
moderately improves msync's and VM object flushing for objects containing
randomly dirtied pages (fsync(), msync(), filesystem update daemon),
and improves cpu use for small-ranged sequential msync()s in the face of
very large mmap()ings from O(N) to O(1) as might be performed by a database.

A sysctl, vm.msync_flush_flag, has been added and defaults to 3 (the two
committed optimizations are turned on by default).  0 will turn off both
optimizations.

This code has already been tested under stable and is one in a series of
memq / vp->v_dirtyblkhd / fsync optimizations to remove O(N^2) restart
conditions that will be coming down the pipe.

MFC after:	3 days
2002-03-06 02:42:56 +00:00
Eivind Eklund
f52bd684f3 * Move bswlist declaration and initialization from kern/vfs_bio.c to
vm/vm_pager.c, which is the only place it is used.
* Make the QUEUE_* definitions and bufqueues local to vfs_bio.c.
* constify buf_wmesg.
2002-03-05 18:20:58 +00:00
Alan Cox
2be21c5e68 o Create vm_pageq_enqueue() to encapsulate code that is duplicated time
and again in vm_page.c and vm_pageq.c.
 o Delete unusused prototypes.  (Mainly a result of the earlier renaming
   of various functions from vm_page_*() to vm_pageq_*().)
2002-03-04 18:55:26 +00:00
Alan Cox
64190c7a2f Call vm_pageq_remove_nowakeup() rather than duplicating it. 2002-03-03 22:36:14 +00:00
Alan Cox
5714577006 Remove some long dead code. 2002-03-02 22:21:42 +00:00
John Baldwin
fdcc1cc09f Use thread0.td_ucred instead of proc0.p_ucred. This change is cosmetic
and isn't strictly required.  However, it lowers the number of false
positives found when grep'ing the kernel sources for p_ucred to ensure
proper locking.
2002-02-27 19:18:10 +00:00
John Baldwin
a854ed9893 Simple p_ucred -> td_ucred changes to start using the per-thread ucred
reference.
2002-02-27 18:32:23 +00:00
Mike Silbersack
7f3a40933b Fix a horribly suboptimal algorithm in the vm_daemon.
In order to determine what to page out, the vm_daemon checks
reference bits on all pages belonging to all processes.  Unfortunately,
the algorithm used reacted badly with shared pages; each shared page
would be checked once per process sharing it; this caused an O(N^2)
growth of tlb invalidations.  The algorithm has been changed so that
each page will be checked only 16 times.

Prior to this change, a fork/sleepbomb of 1300 processes could cause
the vm_daemon to take over 60 seconds to complete, effectively
freezing the system for that time period.  With this change
in place, the vm_daemon completes in less than a second.  Any system
with hundreds of processes sharing pages should benefit from this change.

Note that the vm_daemon is only run when the system is under extreme
memory pressure.  It is likely that many people with loaded systems saw
no symptoms of this problem until they reached the point where swapping
began.

Special thanks go to dillon, peter, and Chuck Cranor, who helped me
get up to speed with vm internals.

PR:		33542, 20393
Reviewed by:	dillon
MFC after:	1 week
2002-02-27 18:03:02 +00:00
Peter Wemm
d1693e1701 Back out all the pmap related stuff I've touched over the last few days.
There is some unresolved badness that has been eluding me, particularly
affecting uniprocessor kernels.  Turning off PG_G helped (which is a bad
sign) but didn't solve it entirely.  Userland programs still crashed.
2002-02-27 09:51:33 +00:00
Peter Wemm
bd1e3a0f89 Jake further reduced IPI shootdowns on sparc64 in loops by using ranged
shootdowns in a couple of key places.  Do the same for i386.  This also
hides some physical addresses from higher levels and has it use the
generic vm_page_t's instead.  This will help for PAE down the road.

Obtained from:	jake (MI code, suggestions for MD part)
2002-02-27 02:14:58 +00:00
Peter Wemm
dd50331c0e Remove unused variable (td) 2002-02-26 01:01:37 +00:00
Poul-Henning Kamp
57c10583aa GC: BIO_ORDERED, various infrastructure dealing with BIO_ORDERED. 2002-02-22 09:26:35 +00:00
Tor Egge
d2760948fe Add a page queue, PQ_HOLD, that temporarily owns pages with nonzero hold
count that would otherwise be on one of the free queues.  This eliminates a
panic when broken programs unmap memory that still has pending IO from raw
devices.

Reviewed by:	dillon, alc
2002-02-19 23:19:30 +00:00
Mike Silbersack
0c9e47230a Add one more comment to the OOM changes so that future readers of
the code may better understand the code.

Suggested by:	dillon
MFC after:	1 week
2002-02-19 18:50:49 +00:00
Mike Silbersack
ef6020d187 Changes to make the OOM killer much more effective:
- Allow the OOM killer to target processes currently locked in
  memory.  These very often are the ones doing the memory hogging.
- Drop the wakeup priority of processes currently sleeping while
  waiting for their page fault to complete.  In order for the OOM
  killer to work well, the killed process and other system processes
  waiting on memory must be allowed to wakeup first.

Reviewed by:	dillon
MFC after:	1 week
2002-02-19 18:34:02 +00:00
Bruce Evans
1e92845e1b Garbage-collect options ACPI_NO_ENABLE_ON_BOOT, AML_DEBUG, BLEED,
DEVICE_SYSCTLS, KEY, LOUTB, NFS_MUIDHASHSIZ, NFS_UIDHASHSIZ, PCI_QUIET
and SIMPLELOCK_DEBUG.
2002-02-15 13:16:11 +00:00
Julian Elischer
2c1007663f In a threaded world, differnt priorirites become properties of
different entities.  Make it so.

Reviewed by:	jhb@freebsd.org (john baldwin)
2002-02-11 20:37:54 +00:00
Julian Elischer
079b7badea Pre-KSE/M3 commit.
this is a low-functionality change that changes the kernel to access the main
thread of a process via the linked list of threads rather than
assuming that it is embedded in the process. It IS still embeded there
but remove all teh code that assumes that in preparation for the next commit
which will actually move it out.

Reviewed by: peter@freebsd.org, gallatin@cs.duke.edu, benno rice,
2002-02-07 20:58:47 +00:00
Alfred Perlstein
582ec34cd8 Fix a race with free'ing vmspaces at process exit when vmspaces are
shared.

Also introduce vm_endcopy instead of using pointer tricks when
initializing new vmspaces.

The race occured because of how the reference was utilized:
  test vmspace reference,
  possibly block,
  decrement reference

When sharing a vmspace between multiple processes it was possible
for two processes exiting at the same time to test the reference
count, possibly block and neither one free because they wouldn't
see the other's update.

Submitted by: green
2002-02-05 21:23:05 +00:00
Matthew Dillon
027df6bdd7 GC P_BUFEXHAUST leftovers, we've had a new mechanism to avoid buffer
cache lockups for over a year now.

MFC after:		0 days
2002-01-31 18:39:44 +00:00
David Malone
d2979f90e7 Remove a parameter name from a prototype. 2002-01-25 21:33:10 +00:00
Bruce Evans
e50f5c2e8d Don't declare vm_swapout() in the NO_SWAPPING case when it is not defined.
Fixed some style bugs.
2002-01-17 16:46:26 +00:00
Alfred Perlstein
a4db49537b Replace ffind_* with fget calls.
Make fget MPsafe.

Make fgetvp and fgetsock use the fget subsystem to reduce code bloat.

Push giant down in fpathconf().
2002-01-14 00:13:45 +00:00
Alfred Perlstein
426da3bcfb SMP Lock struct file, filedesc and the global file list.
Seigo Tanimura (tanimura) posted the initial delta.

I've polished it quite a bit reducing the need for locking and
adapting it for KSE.

Locks:

1 mutex in each filedesc
   protects all the fields.
   protects "struct file" initialization, while a struct file
     is being changed from &badfileops -> &pipeops or something
     the filedesc should be locked.

1 mutex in each struct file
   protects the refcount fields.
   doesn't protect anything else.
   the flags used for garbage collection have been moved to
     f_gcflag which was the FILLER short, this doesn't need
     locking because the garbage collection is a single threaded
     container.
  could likely be made to use a pool mutex.

1 sx lock for the global filelist.

struct file *	fhold(struct file *fp);
        /* increments reference count on a file */

struct file *	fhold_locked(struct file *fp);
        /* like fhold but expects file to locked */

struct file *	ffind_hold(struct thread *, int fd);
        /* finds the struct file in thread, adds one reference and
                returns it unlocked */

struct file *	ffind_lock(struct thread *, int fd);
        /* ffind_hold, but returns file locked */

I still have to smp-safe the fget cruft, I'll get to that asap.
2002-01-13 11:58:06 +00:00
John Baldwin
c86b6ff551 Change the preemption code for software interrupt thread schedules and
mutex releases to not require flags for the cases when preemption is
not allowed:

The purpose of the MTX_NOSWITCH and SWI_NOSWITCH flags is to prevent
switching to a higher priority thread on mutex releease and swi schedule,
respectively when that switch is not safe.  Now that the critical section
API maintains a per-thread nesting count, the kernel can easily check
whether or not it should switch without relying on flags from the
programmer.  This fixes a few bugs in that all current callers of
swi_sched() used SWI_NOSWITCH, when in fact, only the ones called from
fast interrupt handlers and the swi_sched of softclock needed this flag.
Note that to ensure that swi_sched()'s in clock and fast interrupt
handlers do not switch, these handlers have to be explicitly wrapped
in critical_enter/exit pairs.  Presently, just wrapping the handlers is
sufficient, but in the future with the fully preemptive kernel, the
interrupt must be EOI'd before critical_exit() is called.  (critical_exit()
can switch due to a deferred preemption in a fully preemptive kernel.)

I've tested the changes to the interrupt code on i386 and alpha.  I have
not tested ia64, but the interrupt code is almost identical to the alpha
code, so I expect it will work fine.  PowerPC and ARM do not yet have
interrupt code in the tree so they shouldn't be broken.  Sparc64 is
broken, but that's been ok'd by jake and tmm who will be fixing the
interrupt code for sparc64 shortly.

Reviewed by:	peter
Tested on:	i386, alpha
2002-01-05 08:47:13 +00:00
Matthew Dillon
23b590188f Fix a BUF_TIMELOCK race against BUF_LOCK and fix a deadlock in vget()
against VM_WAIT in the pageout code.  Both fixes involve adjusting
the lockmgr's timeout capability so locks obtained with timeouts do not
interfere with locks obtained without a timeout.

Hopefully MFC: before the 4.5 release
2001-12-20 22:42:27 +00:00
Matthew Dillon
3ebeaf5984 This fixes a large number of bugs in our NFS client side code. A recent
commit by Kirk also fixed a softupdates bug that could easily be triggered
by server side NFS.

	* An edge case with shared R+W mmap()'s and truncate whereby
	  the system would inappropriately clear the dirty bits on
	  still-dirty data.  (applicable to all filesystems)

	  THIS FIX TEMPORARILY DISABLED PENDING FURTHER TESTING.
	  see vm/vm_page.c line 1641

	* The straddle case for VM pages and buffer cache buffers when
	  truncating.  (applicable to NFS client side)

	* Possible SMP database corruption due to vm_pager_unmap_page()
	  not clearing the TLB for the other cpu's.  (applicable to NFS
	  client side but could effect all filesystems).  Note: not
	  considered serious since the corruption occurs beyond the file
	  EOF.

	* When flusing a dirty buffer due to B_CACHE getting cleared,
	  we were accidently setting B_CACHE again (that is, bwrite() sets
	  B_CACHE), when we really want it to stay clear after the write
	  is complete.  This resulted in a corrupt buffer.  (applicable
	  to all filesystems but probably only triggered by NFS)

	* We have to call vtruncbuf() when ftruncate()ing to remove
	  any buffer cache buffers.  This is still tentitive, I may
	  be able to remove it due to the second bug fix.  (applicable
	  to NFS client side)

	* vnode_pager_setsize() race against nfs_vinvalbuf()... we have
	  to set n_size before calling nfs_vinvalbuf or the NFS code
	  may recursively vnode_pager_setsize() to the original value
	  before the truncate.  This is what was causing the user mmap
	  bus faults in the nfs tester program.  (applicable to NFS
	  client side)

	* Fix to softupdates (see ufs/ffs/ffs_inode.c 1.73, commit made
	  by Kirk).

Testing program written by: Avadis Tevanian, Jr.
Testing program supplied by: jkh / Apple (see Dec2001 posting to freebsd-hackers with Subject 'NFS: How to make FreeBS fall on its face in one easy step')
MFC after:	1 week
2001-12-14 01:16:57 +00:00
Luigi Rizzo
60363fb9f7 vm/vm_kern.c: rate limit (to once per second) diagnostic printf when
you run out of mbuf address space.

kern/subr_mbuf.c: print a warning message when mb_alloc fails, again
	rate-limited to at most once per second. This covers other
	cases of mbuf allocation failures. Probably it also overlaps the
	one handled in vm/vm_kern.c, so maybe the latter should go away.

This warning will let us gradually remove the printf that are scattered
across most network drivers to report mbuf allocation failures.
Those are potentially dangerous, in that they are not rate-limited and
can easily cause systems to panic.

Unless there is disagreement (which does not seem to be the case
judging from the discussion on -net so far), and because this is
sort of a safety bugfix, I plan to commit a similar change to STABLE
during the weekend (it affects kern/uipc_mbuf.c there).

Discussed-with: jlemon, silby and -net
2001-12-01 00:21:30 +00:00
Jonathan Lemon
4584bbf555 When laying out objects in a ZONE_INTERRUPT zone, allow them to cross
a page boundary, since we've already allocated all our contiguous kva
space up front.  This eliminates some memory wastage, and allows us to
actually reach the # of objects were specified in the zinit() call.

Reviewed by: peter, dillon
2001-11-17 00:40:48 +00:00
Matthew Dillon
fe8e0238cc Fix deadlock introduced in 1.73 (Jan 1998). The paging-in-progress count
on a vnode-backed object must be incremented *after* obtaining the vnode
lock.  If it is bumped before obtaining the vnode lock we can deadlock
against vtruncbuf().

Submitted by:	peter, ps
MFC after:	3 days
2001-11-09 21:34:45 +00:00
Matthew Dillon
33c6774151 Adjust vnode_pager_input_smlfs() to not attempt to BMAP blocks beyond the
file EOF.  This works around a bug in the ISOFS (CDRom) BMAP code which
returns bogus values for requests beyond the file EOF rather then returning
an error, resulting in either corrupt data being mmap()'d beyond the file EOF
or resulting in a seg-fault on the last page of a mmap()'d file (mmap()s of
CDRom files).

Reported by: peter / Yahoo
MFC after:	3 days
2001-11-05 18:58:47 +00:00
Matthew Dillon
e302698320 Don't let pmap_object_init_pt() exhaust all available free pages
(allocating pv entries w/ zalloci) when called in a loop due to
an madvise().  It is possible to completely exhaust the free page list and
cause a system panic when an expected allocation fails.
2001-10-31 03:06:33 +00:00
Matthew Dillon
7a5a635273 Move recently added procedure which was incorrectly placed within an
#ifdef DDB block.
2001-10-26 16:27:54 +00:00
Matthew Dillon
245df27cee Implement kern.maxvnodes. adjusting kern.maxvnodes now actually has a
real effect.

Optimize vfs_msync().  Avoid having to continually drop and re-obtain
mutexes when scanning the vnode list.  Improves looping case by 500%.

Optimize ffs_sync().  Avoid having to continually drop and re-obtain
mutexes when scanning the vnode list.  This makes a couple of assumptions,
which I believe are ok, in regards to vnode stability when the mount list
mutex is held.  Improves looping case by 500%.

(more optimization work is needed on top of these fixes)

MFC after:	1 week
2001-10-26 00:08:05 +00:00
Matthew Dillon
57601bcb5d Syntax cleanup and documentation, no operational changes.
MFC after:	1 day
2001-10-21 06:12:06 +00:00
Ian Dowse
0eb6ce3169 Move the code that computes the system load average from vm_meter.c
to kern_synch.c in preparation for adding some jitter to the
inter-sample time.

Note that the "vm.loadavg" sysctl still lives in vm_meter.c which
isn't the right place, but it is appropriate for the current (bad)
name of that sysctl.

Suggested by:	jhb (some time ago)
Reviewed by:	bde
2001-10-20 13:10:43 +00:00
Matthew Dillon
b386828956 contigmalloc1() could cause the vm_page_zero_count to become incorrect.
Properly track the count.

Submitted by:	mark tinguely <tinguely@web.cs.ndsu.nodak.edu>
2001-10-17 17:34:34 +00:00
Tor Egge
d6844b6bf6 Don't use an uninitialized field reserved for callers in the bio structure
passed to swap_pager_strategy().  Instead, use a field reserved for drivers
and initialize it before usage.

Reviewed by:	dillon
2001-10-15 23:02:54 +00:00
Tor Egge
30105b9ec4 Don't remove all mappings of a swapped out process if the vm map contained
wired entries.  vm_fault_unwire() depends on the mapping being intact.

Reviewed by:	dillon
2001-10-14 20:51:14 +00:00
Tor Egge
e7673b8424 Fix locking violations during page wiring:
- vm map entries are not valid after the map has been unlocked.

 - An exclusive lock on the map is needed before calling
   vm_map_simplify_entry().

Fix cleanup after page wiring failure to unwire all pages that had been
successfully wired before the failure was detected.

Reviewed by:	dillon
2001-10-14 20:47:08 +00:00
Matthew Dillon
33bd457d91 Makes contigalloc[1]() create the vm_map / underlying wired pages in the
kernel map and object in a manner that contigfree() is actually able to
free.  Previously contigfree() freed up the KVA space but could not
unwire & free the underlying VM pages due to mismatched pageability between
the map entry and the VM pages.

Submitted by:	Thomas Moestl <tmoestl@gmx.net>
Testing by: mark tinguely <tinguely@web.cs.ndsu.nodak.edu>
MFC after:	3 days
2001-10-13 04:23:37 +00:00
Matthew Dillon
00a6f47f13 Finally fix the VM bug where a file whos EOF occurs in the middle of a page
would sometimes prevent a dirty page from being cleaned, even when synced,
resulting in the dirty page being re-flushed to disk every 30-60 seconds or
so, forever.  The problem is that when the filesystem flushes a page to
its backing file it typically does not clear dirty bits representing areas
of the page that are beyond the file EOF.  If the file is also mmap()'d and
a fault is taken, vm_fault (properly, is required to) set the vm_page_t->dirty
bits to VM_PAGE_BITS_ALL.  This combination could leave us with an uncleanable,
unfreeable page.

The solution is to have the vnode_pager detect the edge case and manually
clear the dirty bits representing areas beyond the file EOF.  The filesystem
does the rest and the page comes up clean after the write completes.

MFC after:	3 days
2001-10-12 18:17:34 +00:00
John Baldwin
bd78cece5d Change the kernel's ucred API as follows:
- crhold() returns a reference to the ucred whose refcount it bumps.
- crcopy() now simply copies the credentials from one credential to
  another and has no return value.
- a new crshared() primitive is added which returns true if a ucred's
  refcount is > 1 and false (0) otherwise.
2001-10-11 23:38:17 +00:00
John Baldwin
61d80e90a9 Add missing includes of sys/ktr.h. 2001-10-11 17:53:43 +00:00
Paul Saab
cbc89bfbfe Make MAXTSIZ, DFLDSIZ, MAXDSIZ, DFLSSIZ, MAXSSIZ, SGROWSIZ loader
tunable.

Reviewed by:	peter
MFC after:	2 weeks
2001-10-10 23:06:54 +00:00
Ian Dowse
564bfabecb Remove the SSLEEP case from the load average computation. This has
been a no-op for as long as our CVS history goes back. Processes in
state SSLEEP could only be counted if p_slptime == 0, but immediately
before loadav() is called, schedcpu() has just incremented p_slptime
on all SSLEEP processes.
2001-10-04 22:33:31 +00:00
Robert Watson
8c5d4fe829 o Modify access control checks in mmap() to use securelevel_gt() instead
of direct variable access.

Obtained from:	TrustedBSD Project
2001-09-26 20:29:39 +00:00
Julian Elischer
b40ce4165d KSE Milestone 2
Note ALL MODULES MUST BE RECOMPILED
make the kernel aware that there are smaller units of scheduling than the
process. (but only allow one thread per process at this time).
This is functionally equivalent to teh previousl -current except
that there is a thread associated with each process.

Sorry john! (your next MFC will be a doosie!)

Reviewed by: peter@freebsd.org, dillon@freebsd.org

X-MFC after:    ha ha ha ha
2001-09-12 08:38:13 +00:00
Peter Wemm
eb30c1c0b9 Rip some well duplicated code out of cpu_wait() and cpu_exit() and move
it to the MI area.  KSE touched cpu_wait() which had the same change
replicated five ways for each platform.  Now it can just do it once.
The only MD parts seemed to be dealing with fpu state cleanup and things
like vm86 cleanup on x86.  The rest was identical.

XXX: ia64 and powerpc did not have cpu_throw(), so I've put a functional
stub in place.

Reviewed by:	jake, tmm, dillon
2001-09-10 04:28:58 +00:00
John Baldwin
29fdb744d1 Process priority is locked by the sched_lock, not the proc lock. 2001-09-01 20:16:30 +00:00
Matthew Dillon
7feaf028be make swapon() MPSAFE (will adjust syscalls.master later) 2001-08-31 22:15:37 +00:00
Matthew Dillon
6a33d53c48 mark obreak() and ovadvise() as being MPSAFE 2001-08-31 22:10:03 +00:00
Matthew Dillon
d2c60af81a Cleanup 2001-08-31 01:26:30 +00:00
Peter Wemm
3516c025ff Implement idle zeroing of pages. I've been tinkering with this
on and off since John Dyson left his work-in-progress.

It is off by default for now.  sysctl vm.zeroidle_enable=1 to turn it on.

There are some hacks here to deal with the present lack of preemption - we
yield after doing a small number of pages since we wont preempt otherwise.

This is basically Matt's algorithm [with hysteresis] with an idle process
to call it in a similar way it used to be called from the idle loop.

I cleaned up the includes a fair bit here too.
2001-08-25 05:00:44 +00:00
Matthew Dillon
676274db9b Remove support for the badly broken MAP_INHERIT (from -current only). 2001-08-24 19:29:56 +00:00
Matthew Dillon
219d632c15 Move most of the kernel submap initialization code, including the
timeout callwheel and buffer cache, out of the platform specific areas
and into the machine independant area.  i386 and alpha adjusted here.
Other cpus can be fixed piecemeal.

Reviewed by:    freebsd-smp, jake
2001-08-22 04:07:27 +00:00
Matthew Dillon
0b76df7146 KASSERT if vm_page_t->wire_count overflows. 2001-08-22 04:01:56 +00:00