117 Commits

Author SHA1 Message Date
dillon
cbc4469f38 whitespace / register cleanup 2001-07-04 19:00:13 +00:00
dillon
e028603b7e With Alfred's permission, remove vm_mtx in favor of a fine-grained approach
(this commit is just the first stage).  Also add various GIANT_ macros to
formalize the removal of Giant, making it easy to test in a more piecemeal
fashion. These macros will allow us to test fine-grained locks to a degree
before removing Giant, and also after, and to remove Giant in a piecemeal
fashion via sysctl's on those subsystems which the authors believe can
operate without Giant.
2001-07-04 16:20:28 +00:00
jhb
3910f65aa2 Put the scheduler, vmdaemon, and pagedaemon kthreads back under Giant for
now.  The proc locking isn't actually safe yet and won't be until the proc
locking is finished.
2001-06-20 00:48:20 +00:00
jhb
b6e2dc1111 - Lock the VM around the pmap_swapin_proc() call in faultin().
- Don't lock Giant in the scheduler() function except for when calling
  faultin().
- In swapout_procs(), lock the VM before the proccess to avoid a lock order
  violation.
- In swapout_procs(), release the allproc lock before calling swapout().
  We restart the process scan after swapping out a process.
- In swapout_procs(), un #if 0 the code to bump the vmspace reference count
  and lock the process' vm structures.  This bug was introduced by me and
  could result in the vmspace being free'd out from under a running
  process.
- Fix an old bug where the vmspace reference count was not free'd if we
  failed the swap_idle_threshold2 test.
2001-05-23 22:35:45 +00:00
alfred
a3f0842419 Introduce a global lock for the vm subsystem (vm_mtx).
vm_mtx does not recurse and is required for most low level
vm operations.

faults can not be taken without holding Giant.

Memory subsystems can now call the base page allocators safely.

Almost all atomic ops were removed as they are covered under the
vm mutex.

Alpha and ia64 now need to catch up to i386's trap handlers.

FFS and NFS have been tested, other filesystems will need minor
changes (grabbing the vm lock when twiddling page properties).

Reviewed (partially) by: jake, jhb
2001-05-19 01:28:09 +00:00
jhb
70b98f2d04 - Use a timeout for the tsleep in scheduler() instead of having vmmeter()
wakeup proc0 by hand to enforce the timeout.
- When swapping out a process, keep the process locked via the proc lock
  from the first checks up until we clear PS_INMEM and set PS_SWAPPING in
  swapout().  The swapout() function now must be called with the proc lock
  held and releases it before returning.
- Comment out the code to attempt to lock a process' VM structures before
  swapping out.  It is broken in that it releases the lock after obtaining
  it.  If it does grab the lock, it needs to hand it off to swapout()
  instead of releasing it.  This can be revisisted when the VM is locked
  as this is a valid test to perform.  It also causes a lock order reversal
  for the time being, which is the immediate cause for temporarily
  disabling it.
2001-05-18 00:08:38 +00:00
jhb
8865204035 - Use PROC_LOCK_ASSERT instead of a direct mtx_assert.
- Don't hold Giant in the swapper daemon while we walk the list of
  processes looking for a process to swap back in.
- Don't bother grabbing the sched_lock while checking a process' sleep
  time in swapout_procs() to ensure that a process has been idle for at
  least swap_idle_threshold2 before swapping it out.  If we lose the race
  we just let a process stay in memory until the next call of
  swapout_procs().
- Remove some unneeded spl's, sched_lock does all the locking needed in
  this case.
2001-05-15 22:20:44 +00:00
markm
bcca5847d5 Undo part of the tangle of having sys/lock.h and sys/mutex.h included in
other "system" header files.

Also help the deprecation of lockmgr.h by making it a sub-include of
sys/lock.h and removing sys/lockmgr.h form kernel .c files.

Sort sys/*.h includes where possible in affected files.

OK'ed by:	bde (with reservations)
2001-05-01 08:13:21 +00:00
jhb
79cf991a6b Convert the allproc and proctree locks from lockmgr locks to sx locks. 2001-03-28 11:52:56 +00:00
jake
55d5108ac5 Implement a unified run queue and adjust priority levels accordingly.
- All processes go into the same array of queues, with different
  scheduling classes using different portions of the array.  This
  allows user processes to have their priorities propogated up into
  interrupt thread range if need be.
- I chose 64 run queues as an arbitrary number that is greater than
  32.  We used to have 4 separate arrays of 32 queues each, so this
  may not be optimal.  The new run queue code was written with this
  in mind; changing the number of run queues only requires changing
  constants in runq.h and adjusting the priority levels.
- The new run queue code takes the run queue as a parameter.  This
  is intended to be used to create per-cpu run queues.  Implement
  wrappers for compatibility with the old interface which pass in
  the global run queue structure.
- Group the priority level, user priority, native priority (before
  propogation) and the scheduling class into a struct priority.
- Change any hard coded priority levels that I found to use
  symbolic constants (TTIPRI and TTOPRI).
- Remove the curpriority global variable and use that of curproc.
  This was used to detect when a process' priority had lowered and
  it should yield.  We now effectively yield on every interrupt.
- Activate propogate_priority().  It should now have the desired
  effect without needing to also propogate the scheduling class.
- Temporarily comment out the call to vm_page_zero_idle() in the
  idle loop.  It interfered with propogate_priority() because
  the idle process needed to do a non-blocking acquire of Giant
  and then other processes would try to propogate their priority
  onto it.  The idle process should not do anything except idle.
  vm_page_zero_idle() will return in the form of an idle priority
  kernel thread which is woken up at apprioriate times by the vm
  system.
- Update struct kinfo_proc to the new priority interface.  Deliberately
  change its size by adjusting the spare fields.  It remained the same
  size, but the layout has changed, so userland processes that use it
  would parse the data incorrectly.  The size constraint should really
  be changed to an arbitrary version number.  Also add a debug.sizeof
  sysctl node for struct kinfo_proc.
2001-02-12 00:20:08 +00:00
bmilekic
f364d4ac36 Change and clean the mutex lock interface.
mtx_enter(lock, type) becomes:

mtx_lock(lock) for sleep locks (MTX_DEF-initialized locks)
mtx_lock_spin(lock) for spin locks (MTX_SPIN-initialized)

similarily, for releasing a lock, we now have:

mtx_unlock(lock) for MTX_DEF and mtx_unlock_spin(lock) for MTX_SPIN.
We change the caller interface for the two different types of locks
because the semantics are entirely different for each case, and this
makes it explicitly clear and, at the same time, it rids us of the
extra `type' argument.

The enter->lock and exit->unlock change has been made with the idea
that we're "locking data" and not "entering locked code" in mind.

Further, remove all additional "flags" previously passed to the
lock acquire/release routines with the exception of two:

MTX_QUIET and MTX_NOSWITCH

The functionality of these flags is preserved and they can be passed
to the lock/unlock routines by calling the corresponding wrappers:

mtx_{lock, unlock}_flags(lock, flag(s)) and
mtx_{lock, unlock}_spin_flags(lock, flag(s)) for MTX_DEF and MTX_SPIN
locks, respectively.

Re-inline some lock acq/rel code; in the sleep lock case, we only
inline the _obtain_lock()s in order to ensure that the inlined code
fits into a cache line. In the spin lock case, we inline recursion and
actually only perform a function call if we need to spin. This change
has been made with the idea that we generally tend to avoid spin locks
and that also the spin locks that we do have and are heavily used
(i.e. sched_lock) do recurse, and therefore in an effort to reduce
function call overhead for some architectures (such as alpha), we
inline recursion for this case.

Create a new malloc type for the witness code and retire from using
the M_DEV type. The new type is called M_WITNESS and is only declared
if WITNESS is enabled.

Begin cleaning up some machdep/mutex.h code - specifically updated the
"optimized" inlined code in alpha/mutex.h and wrote MTX_LOCK_SPIN
and MTX_UNLOCK_SPIN asm macros for the i386/mutex.h as we presently
need those.

Finally, caught up to the interface changes in all sys code.

Contributors: jake, jhb, jasone (in no particular order)
2001-02-09 06:11:45 +00:00
jhb
ef07536d23 - Doh, lock faultin() with proc lock in scheduler().
- Lock p_swtime with sched_lock in scheduler() as well.
2001-01-25 01:38:09 +00:00
jhb
c5cc2f8e26 Argh, I didn't get this test right when I converted it. Break this up
into two separate if's instead of nested if's.  Also, reorder things
slightly to avoid unnecessary mutex operations.
2001-01-24 12:23:17 +00:00
jhb
9e1a5e2c5b - Catch up to proc flag changes.
- Proc locking in a few places.
- faultin() now must be called with the proc lock held.
- Split up swappable() into a couple of tests so that it can be locke in
  swapout_procs().
- Use queue macros.
2001-01-24 11:25:56 +00:00
jake
a4ad237eaa - Change the allproc_lock to use a macro, ALLPROC_LOCK(how), instead
of explicit calls to lockmgr.  Also provides macros for the flags
  pased to specify shared, exclusive or release which map to the
  lockmgr flags.  This is so that the use of lockmgr can be easily
  replaced with optimized reader-writer locks.
- Add some locking that I missed the first time.
2000-12-13 00:17:05 +00:00
jhb
b8b5877ce7 Protect p_stat with sched_lock. 2000-12-02 03:29:33 +00:00
jake
0c0be4e826 Protect the following with a lockmgr lock:
allproc
	zombproc
	pidhashtbl
	proc.p_list
	proc.p_hash
	nextpid

Reviewed by:	jhb
Obtained from:	BSD/OS and netbsd
2000-11-22 07:42:04 +00:00
jhb
9ae17765f4 - Catch a machine/mutex.h -> sys/mutex.h I somehow missed.
- Close a small race condition.  The sched_lock mutex protects
  p->p_stat as well as the run queues.  Another CPU could change p_stat
  of the process while we are waiting for the lock, and we would end up
  scheduling a process that isn't runnable.
2000-10-25 00:04:16 +00:00
jasone
769e0f974d Major update to the way synchronization is done in the kernel. Highlights
include:

* Mutual exclusion is used instead of spl*().  See mutex(9).  (Note: The
  alpha port is still in transition and currently uses both.)

* Per-CPU idle processes.

* Interrupts are run in their own separate kernel threads and can be
  preempted (i386 only).

Partially contributed by:	BSDi (BSD/OS)
Submissions by (at least):	cp, dfr, dillon, grog, jake, jhb, sheldonh
2000-09-07 01:33:02 +00:00
phk
75e82c815e Remove unneeded <sys/buf.h> includes.
Due to some interesting cpp tricks in lockmgr, the LINT kernel shrinks
by 924 bytes.
2000-04-18 15:15:39 +00:00
charnier
5c16c2a8f1 Revert spelling mistake I made in the previous commit
Requested by: Alan and Bruce
2000-03-27 20:41:17 +00:00
charnier
686df89909 Spelling 2000-03-26 15:20:23 +00:00
phk
bb6e8e0c2e Remove unused 3rd argument from vsunlock() which abused B_WRITE. 2000-03-13 10:47:24 +00:00
luoqi
5c9244cd12 User ldt sharing. 1999-12-06 04:53:08 +00:00
alc
9ef4299380 Reverse the sense of the test in the KASSERT's from the last commit. 1999-10-30 09:09:02 +00:00
phk
8d8f53dcdc Change useracc() and kernacc() to use VM_PROT_{READ|WRITE|EXECUTE} for the
"rw" argument, rather than hijacking B_{READ|WRITE}.

Fix two bugs (physio & cam) resulting by the confusion caused by this.

Submitted by:   Tor.Egge@fast.no
Reviewed by:    alc, ken (partly)
1999-10-30 06:32:05 +00:00
phk
8e3c3eafed useracc() the prequel:
Merge the contents (less some trivial bordering the silly comments)
of <vm/vm_prot.h> and <vm/vm_inherit.h> into <vm/vm.h>.  This puts
the #defines for the vm_inherit_t and vm_prot_t types next to their
typedefs.

This paves the road for the commit to follow shortly: change
useracc() to use VM_PROT_{READ|WRITE} rather than B_{READ|WRITE}
as argument.
1999-10-29 18:09:36 +00:00
dillon
4cb1921c9b Reviewed by: Alan Cox <alc@cs.rice.edu>, David Greenman <dg@root.com>
Replace various VM related page count calculations strewn over the
    VM code with inlines to aid in readability and to reduce fragility
    in the code where modules depend on the same test being performed
    to properly sleep and wakeup.

    Split out a portion of the page deactivation code into an inline
    in vm_page.c to support vm_page_dontneed().

    add vm_page_dontneed(), which handles the madvise MADV_DONTNEED
    feature in a related commit coming up for vm_map.c/vm_object.c.  This
    code prevents degenerate cases where an essentially active page may
    be rotated through a subset of the paging lists, resulting in premature
    disposal.
1999-09-17 04:56:40 +00:00
peter
3b842d34e8 $Id$ -> $FreeBSD$ 1999-08-28 01:08:13 +00:00
peter
9ad9e224d9 Update for run queue code. 1999-08-19 00:15:27 +00:00
alc
279eafd09f Fix the following problem:
When creating new processes (or performing exec), the new page
directory is initialized too early.  The kernel might grow before
p_vmspace is initialized for the new process.  Since pmap_growkernel
doesn't yet know about the new page directory, it isn't updated, and
subsequent use causes a failure.

The fix is (1) to clear p_vmspace early, to stop pmap_growkernel
from stomping on memory, and (2) to defer part of the initialization
of new page directories until p_vmspace is initialized.

PR:		kern/12378
Submitted by:	tegge
Reviewed by:	dfr
1999-07-21 18:02:27 +00:00
alc
c9ce3ad902 Remove some unused function and variable declarations. 1999-06-19 18:42:53 +00:00
peter
bc820937dc Only use p->p_lock (manage by PHOLD()/PRELE()) - P_NOSWAP/P_PHYSIO is no
longer set.
1999-04-06 03:11:34 +00:00
luoqi
082d37c1ac Hide access to vmspace:vm_pmap with inline function vmspace_pmap(). This
is the preparation step for moving pmap storage out of vmspace proper.

Reviewed by:	Alan Cox	<alc@cs.rice.edu>
		Matthew Dillion	<dillon@apollo.backplane.com>
1999-02-19 14:25:37 +00:00
julian
05a2232887 Enable Linux threads support by default.
This takes the conditionals out of the code that has been tested by
various people for a while.
ps and friends (libkvm) will need a recompile as some proc structure
changes are made.

Submitted by:	"Richard Seaman, Jr." <dick@tar.com>
1999-01-26 02:38:12 +00:00
dillon
f89474ca7b Removed low-memory blockages at fork. This is the wrong place to put
this sort of test.  We need to fix the low-memory handling in general.
1999-01-21 09:36:23 +00:00
dillon
df24433bbe This is a rather large commit that encompasses the new swapper,
changes to the VM system to support the new swapper, VM bug
    fixes, several VM optimizations, and some additional revamping of the
    VM code.  The specific bug fixes will be documented with additional
    forced commits.  This commit is somewhat rough in regards to code
    cleanup issues.

Reviewed by:	"John S. Dyson" <root@dyson.iquest.net>, "David Greenman" <dg@root.com>
1999-01-21 08:29:12 +00:00
julian
a7b385889e Changes to the LINUX_THREADS support to only allocate extra memory for
shared signal handling when there is shared signal handling being
used.

This removes the main objection to making the shared signal handling
a standard ability in rfork() and friends and 'unconditionalising'
this code. (i.e. the allocation of an extra 328 bytes per process).

Signal handling information remains in the U area until such a time as
it's reference count would be incremented to > 1. At that point a new
struct is malloc'd and maintained in KVM so that it can be shared between
the processes (threads) using it.

A function to check the reference count and move the struct back to the U
area when it drops back to 1 is also supplied. Signal information is
therefore now swapable for all processes that are not sharing that
information with other processes. THis should addres the concerns raised
by Garrett and others.

Submitted by:	"Richard Seaman, Jr." <dick@tar.com>
1999-01-07 21:23:50 +00:00
julian
d718e5c06d Fix two bogons created by 'patch(1)' in my last commit. 1998-12-19 08:23:31 +00:00
julian
61490236bc Reviewed by: Luoqi Chen, Jordan Hubbard
Submitted by:	 "Richard Seaman, Jr." <lists@tar.com>
Obtained from:	linux :-)

Code to allow Linux Threads to run under FreeBSD.

By default not enabled
This code is dependent on the conditional
COMPAT_LINUX_THREADS (suggested by Garret)
This is not yet a 'real' option but will be within some number of hours.
1998-12-19 02:55:34 +00:00
dg
3defb6d13f Fixed two potentially serious classes of bugs:
1) The vnode pager wasn't properly tracking the file size due to
   "size" being page rounded in some cases and not in others.
   This sometimes resulted in corrupted files. First noticed by
   Terry Lambert.
   Fixed by changing the "size" pager_alloc parameter to be a 64bit
   byte value (as opposed to a 32bit page index) and changing the
   pagers and their callers to deal with this properly.
2) Fixed a bogus type cast in round_page() and trunc_page() that
   caused some 64bit offsets and sizes to be scrambled. Removing
   the cast required adding casts at a few dozen callers.
   There may be problems with other bogus casts in close-by
   macros. A quick check seemed to indicate that those were okay,
   however.
1998-10-13 08:24:45 +00:00
abial
121218d024 Make #define NO_SWAPPING a normal kernel config option.
Reviewed by:	jkh
1998-09-29 17:33:59 +00:00
dufault
e28788f2a4 Reviewed by: msmith, bde long ago
POSIX.4 headers and sysctl variables.  Nothing should change
unless POSIX4 is defined or _POSIX_VERSION is set to 199309.
1998-03-04 10:27:00 +00:00
eivind
d7a6ab2803 Staticize. 1998-02-09 06:11:36 +00:00
eivind
4547a09753 Back out DIAGNOSTIC changes. 1998-02-06 12:14:30 +00:00
dyson
ebccbfc1ff 1) Start using a cleaner and more consistant page allocator instead
of the various ad-hoc schemes.
2)	When bringing in UPAGES, the pmap code needs to do another vm_page_lookup.
3)	When appropriate, set the PG_A or PG_M bits a-priori to both avoid some
	processor errata, and to minimize redundant processor updating of page
	tables.
4)	Modify pmap_protect so that it can only remove permissions (as it
	originally supported.)  The additional capability is not needed.
5)	Streamline read-only to read-write page mappings.
6)	For pmap_copy_page, don't enable write mapping for source page.
7)	Correct and clean-up pmap_incore.
8)	Cluster initial kern_exec pagin.
9)	Removal of some minor lint from kern_malloc.
10)	Correct some ioopt code.
11)	Remove some dead code from the MI swapout routine.
12)	Correct vm_object_deallocate (to remove backing_object ref.)
13)	Fix dead object handling, that had problems under heavy memory load.
14)	Add minor vm_page_lookup improvements.
15)	Some pages are not in objects, and make sure that the vm_page.c can
	properly support such pages.
16)	Add some more page deficit handling.
17)	Some minor code readability improvements.
1998-02-05 03:32:49 +00:00
eivind
c552a9a1c3 Turn DIAGNOSTIC into a new-style option. 1998-02-04 22:34:03 +00:00
dyson
197bd655c4 VM level code cleanups.
1)	Start using TSM.
	Struct procs continue to point to upages structure, after being freed.
	Struct vmspace continues to point to pte object and kva space for kstack.
	u_map is now superfluous.
2)	vm_map's don't need to be reference counted.  They always exist either
	in the kernel or in a vmspace.  The vmspaces are managed by reference
	counts.
3)	Remove the "wired" vm_map nonsense.
4)	No need to keep a cache of kernel stack kva's.
5)	Get rid of strange looking ++var, and change to var++.
6)	Change more data structures to use our "zone" allocator.  Added
	struct proc, struct vmspace and struct vnode.  This saves a significant
	amount of kva space and physical memory.  Additionally, this enables
	TSM for the zone managed memory.
7)	Keep ioopt disabled for now.
8)	Remove the now bogus "single use" map concept.
9)	Use generation counts or id's for data structures residing in TSM, where
	it allows us to avoid unneeded restart overhead during traversals, where
	blocking might occur.
10)	Account better for memory deficits, so the pageout daemon will be able
	to make enough memory available (experimental.)
11)	Fix some vnode locking problems. (From Tor, I think.)
12)	Add a check in ufs_lookup, to avoid lots of unneeded calls to bcmp.
	(experimental.)
13)	Significantly shrink, cleanup, and make slightly faster the vm_fault.c
	code.  Use generation counts, get rid of unneded collpase operations,
	and clean up the cluster code.
14)	Make vm_zone more suitable for TSM.

This commit is partially as a result of discussions and contributions from
other people, including DG, Tor Egge, PHK, and probably others that I
have forgotten to attribute (so let me know, if I forgot.)

This is not the infamous, final cleanup of the vnode stuff, but a necessary
step.  Vnode mgmt should be correct, but things might still change, and
there is still some missing stuff (like ioopt, and physical backing of
non-merged cache files, debugging of layering concepts.)
1998-01-22 17:30:44 +00:00
dyson
043ed4f1ba Fix the prototype for swapout_procs();
Submitted by:	dima@best.net
1997-12-11 02:10:55 +00:00
dyson
9821c09585 Support an optional, sysctl enabled feature of idle process swapout. This
is apparently useful for large shell systems, or systems  with long running
idle processes.  To enable the feature:

	sysctl -w vm.swap_idle_enabled=1

Please note that some of the other vm sysctl variables have been renamed
to be more accurate.
Submitted by:	Much of it from Matt Dillon <dillon@best.net>
1997-12-06 02:23:36 +00:00