- All processes go into the same array of queues, with different
scheduling classes using different portions of the array. This
allows user processes to have their priorities propogated up into
interrupt thread range if need be.
- I chose 64 run queues as an arbitrary number that is greater than
32. We used to have 4 separate arrays of 32 queues each, so this
may not be optimal. The new run queue code was written with this
in mind; changing the number of run queues only requires changing
constants in runq.h and adjusting the priority levels.
- The new run queue code takes the run queue as a parameter. This
is intended to be used to create per-cpu run queues. Implement
wrappers for compatibility with the old interface which pass in
the global run queue structure.
- Group the priority level, user priority, native priority (before
propogation) and the scheduling class into a struct priority.
- Change any hard coded priority levels that I found to use
symbolic constants (TTIPRI and TTOPRI).
- Remove the curpriority global variable and use that of curproc.
This was used to detect when a process' priority had lowered and
it should yield. We now effectively yield on every interrupt.
- Activate propogate_priority(). It should now have the desired
effect without needing to also propogate the scheduling class.
- Temporarily comment out the call to vm_page_zero_idle() in the
idle loop. It interfered with propogate_priority() because
the idle process needed to do a non-blocking acquire of Giant
and then other processes would try to propogate their priority
onto it. The idle process should not do anything except idle.
vm_page_zero_idle() will return in the form of an idle priority
kernel thread which is woken up at apprioriate times by the vm
system.
- Update struct kinfo_proc to the new priority interface. Deliberately
change its size by adjusting the spare fields. It remained the same
size, but the layout has changed, so userland processes that use it
would parse the data incorrectly. The size constraint should really
be changed to an arbitrary version number. Also add a debug.sizeof
sysctl node for struct kinfo_proc.
Some things needed bits of <i386/include/lock.h> - cy.c now has its
own (only) copy of the COM_(UN)LOCK() macros, and IMASK_(UN)LOCK()
has been moved to <i386/include/apic.h> (AKA <machine/apic.h>).
Reviewed by: jhb
attributes. This is needed for AST's to be properly posted in a preemptive
kernel. They are backed by two new flags in p_sflag: PS_ASTPENDING and
PS_NEEDRESCHED. They are still accesssed by their old macros:
aston(), astoff(), etc. For completeness, an astpending() macro has been
added to check for a pending AST, and clear_resched() has been added to
clear need_resched().
- Rename syscall2() on the x86 back to syscall() to be consistent with
other architectures.
- Use swi_* function names.
- Use void * to hold cookies to handlers instead of struct intrhand *.
- In sio.c, use 'driver_name' instead of "sio" as the name of the driver
lock to minimize diffs with cy(4).
mtx_enter(lock, type) becomes:
mtx_lock(lock) for sleep locks (MTX_DEF-initialized locks)
mtx_lock_spin(lock) for spin locks (MTX_SPIN-initialized)
similarily, for releasing a lock, we now have:
mtx_unlock(lock) for MTX_DEF and mtx_unlock_spin(lock) for MTX_SPIN.
We change the caller interface for the two different types of locks
because the semantics are entirely different for each case, and this
makes it explicitly clear and, at the same time, it rids us of the
extra `type' argument.
The enter->lock and exit->unlock change has been made with the idea
that we're "locking data" and not "entering locked code" in mind.
Further, remove all additional "flags" previously passed to the
lock acquire/release routines with the exception of two:
MTX_QUIET and MTX_NOSWITCH
The functionality of these flags is preserved and they can be passed
to the lock/unlock routines by calling the corresponding wrappers:
mtx_{lock, unlock}_flags(lock, flag(s)) and
mtx_{lock, unlock}_spin_flags(lock, flag(s)) for MTX_DEF and MTX_SPIN
locks, respectively.
Re-inline some lock acq/rel code; in the sleep lock case, we only
inline the _obtain_lock()s in order to ensure that the inlined code
fits into a cache line. In the spin lock case, we inline recursion and
actually only perform a function call if we need to spin. This change
has been made with the idea that we generally tend to avoid spin locks
and that also the spin locks that we do have and are heavily used
(i.e. sched_lock) do recurse, and therefore in an effort to reduce
function call overhead for some architectures (such as alpha), we
inline recursion for this case.
Create a new malloc type for the witness code and retire from using
the M_DEV type. The new type is called M_WITNESS and is only declared
if WITNESS is enabled.
Begin cleaning up some machdep/mutex.h code - specifically updated the
"optimized" inlined code in alpha/mutex.h and wrote MTX_LOCK_SPIN
and MTX_UNLOCK_SPIN asm macros for the i386/mutex.h as we presently
need those.
Finally, caught up to the interface changes in all sys code.
Contributors: jake, jhb, jasone (in no particular order)
by myself. It solves a serious vm_map corruption problem that can occur
with the buffer cache when block sizes > 64K are used. This code has been
heavily tested in -stable but only tested somewhat on -current. An MFC
will occur in a few days. My additions include the vm_map_simplify_entry()
and minor buffer cache boundry case fix.
Make the buffer cache use a system map for buffer cache KVM rather then a
normal map.
Ensure that VM objects are not allocated for system maps. There were cases
where a buffer map could wind up with a backing VM object -- normally
harmless, but this could also result in the buffer cache blocking in places
where it assumes no blocking will occur, possibly resulting in corrupted
maps.
Fix a minor boundry case in the buffer cache size limit is reached that
could result in non-optimal code.
Add vm_map_simplify_entry() calls to prevent 'creeping proliferation'
of vm_map_entry's in the buffer cache's vm_map. Previously only a simple
linear optimization was made. (The buffer vm_map typically has only a
handful of vm_map_entry's. This stabilizes it at that level permanently).
PR: 20609
Submitted by: (Tor Egge) tegge
- If possible, context switch to the thread directly in sched_ithd(),
rather than triggering a delayed ast reschedule.
- Disable interrupts while restoring fpu state in the trap handler,
in order to ensure that we are not preempted in the middle, which
could cause migration to another cpu.
Reviewed by: peter
Tested by: peter (alpha)
to 1GB. A box of mine is running with MAXDSIZ and DFLDSIZ increased
up to 1.5GB.
Wishlist: It would be nice to warn if MAXTSIZ + MAXDSIZ + MAXSSIZ
exceeds VM_MAXUSER_ADDRESS - VM_MINUSER_ADDRESS.
incompletely converting simplelocks to mutexes (COM_LOCK() is supposed
to hide the SMP locking internals, but it now depends on mutex interfaces
being visible).
problem is that a mutex lock, prior to this change, is acquired before
the curproc is set to idleproc, so we mess ourselves up by calling
the mutex lock routine with curproc == NULL.
Moving it up after the aps_ready spin-wait has us hopefully setting it
after idleproc is setup.
Solved by: jake (the allmighty) :-)
instead of a trapframe directly. (Requested by bde.)
- Convert the alpha switch_trampoline to call fork_exit() and use the MI
fork_return() instead of child_return().
- Axe child_return().
that name as a variable. Use mtx_owned(&Giant) where appropriate
instead.
- Proc locking.
- P_FOO -> PS_FOO.
- Update comments about enable interrupts during trap and why this may be
bad if we trap while holding a spin mutex.
- Don't bother resetting p to curproc in syscall() in case we are the child
returning from fork. The child hasn't returned from fork through syscall
in a while.
- Remove fork_return() as it has been superseded by the MI version.
the alpha mp_machdep.c.
- Proc locking.
- Catch up to the P_FOO -> PS_FOO proc flags changes.
- Stick ap_init()'s prototype with the other prototypes.
- Remove the Xforwardirq IPI.
- Remove unused simplelocks.
- Don't try to psignal() from forward_statclock(), but set the appropriate
signal pending flag in p_sflag instead.
- Add in KTR_SMP tracepoints for various SMP functions. (Brought over
from the alpha port)
- Setup proc0.p_heldmtx, proc0.contested, and curproc earlier so that we
can use mutexes.
- Initialize sched_lock and Giant earlier and enter Giant during init386.
- Use suser(9) instead of checking cr_uid directly.
inline functions non-inlined. Hide parts of the mutex implementation that
should not be exposed.
Make sure that WITNESS code is not executed during boot until the mutexes
are fully initialized by SI_SUB_MUTEX (the original motivation for this
commit).
Submitted by: peter
interrupt threads to run with it always >= 1, so that malloc can
detect M_WAITOK from "interrupt" context. This is also necessary
in order to context switch from sched_ithd() directly.
Reviewed By: peter
initialization until after malloc() is safe to call, then iterate through
all mutexes and complete their initialization.
This change is necessary in order to avoid some circular bootstrapping
dependencies.
for SMP; just use the same ones as UP. These weren't used without
holding Giant anyway, and the routines that use them would have to
be protected from pre-emption to avoid migrating cpus.
pre-emptable kernel. For variables of size 4 bytes or less they compile
to a single instruction, which does not allow a process to migrate cpus
in the middle, and get the value for the "wrong" cpu.
appropriate function, rather than doing a horse-and-buggy
acquire. They now take the mutex type as an arg and can be
used with sleep as well as spin mutexes.
All calls to mtx_init() for mutexes that recurse must now include
the MTX_RECURSE bit in the flag argument variable. This change is in
preparation for an upcoming (further) mutex API cleanup.
The witness code will call panic() if a lock is found to recurse but
the MTX_RECURSE bit was not set during the lock's initialization.
The old MTX_RECURSE "state" bit (in mtx_lock) has been renamed to
MTX_RECURSED, which is more appropriate given its meaning.
The following locks have been made "recursive," thus far:
eventhandler, Giant, callout, sched_lock, possibly some others declared
in the architecture-specific code, all of the network card driver locks
in pci/, as well as some other locks in dev/ stuff that I've found to
be recursive.
Reviewed by: jhb
non-386 atomic_load_acq(). %eax is an input since its value is used in
the cmpxchg instruction, but we don't care what value it is, so setting
it to a specific value is just wasteful. Thus, it is being used without
being initialized as the warning stated, but it is ok for it to be used
because its value isn't important. Thus, we are only sort of lying when
we say it is an output only operand.
- Add "cc" to the clobber list for atomic_load_acq() since the cmpxchgl
changes ZF.
slow enough as it is, without having to constantly check that it really
is an i386 still. It was possible to compile out the conditionals for
faster cpus by leaving out 'I386_CPU', but it was not possible to
unconditionally compile for the i386. You got the runtime checking whether
you wanted it or not. This makes I386_CPU mutually exclusive with the
other cpu types, and tidies things up a little in the process.
Reviewed by: alfred, markm, phk, benno, jlemon, jhb, jake, grog, msmith,
jasone, dcs, des (and a bunch more people who encouraged it)
compiling errors where gcc would run out of registers.
- Add "cc" to the list of clobbers for micro-ops where we perform
instructions that alter %eflags.
- Use xchgl instead of cmpxchgl to release a spin lock. This could allow
for more efficient register allocation as we no longer mandate that %eax
be used.
- Reenable the optimized mutex micro-ops in the non-i386 case.
that modules can call.
- Remove the old gcc <= 2.8 versions of the atomic ops.
- Resort the order of some things in the file so that there is only
one #ifdef for KLD_MODULE, and so that all WANT_FUNCTIONS stuff is
moved to the bottom of the file.
- Remove ATOMIC_ACQ_REL() and just use explicit macros instead.
time I tinkered around here. Since INTREN is called from the interrupt
critical path now, it should not be too expensive. In this case, we
look at the bits being changed to decide which 8 bit IO port to write to
rather than unconditionally writing to both. I could probably have gone
further and only done the write if the bits actually changed, but that
seemed overkill for the usual case in interrupt threads.
[an outb is rather expensive when it has to cross the ISA bus]
exactly the same functionality via a sysctl, making this feature
a run-time option.
The default is 1(ON), which means that /dev/random device will
NOT block at startup.
setting kern.random.sys.seeded to 0(OFF) will cause /dev/random
to block until the next reseed, at which stage the sysctl
will be changed back to 1(ON).
While I'm here, clean up the sysctls, and make them dynamic.
Reviewed by: des
Tested on Alpha by: obrien
implement memory fences for the 486+. The 386 still uses versions w/o
memory fences as all operations on the 386 are not program ordered.
The 386 versions are not MP safe.
declarations of a variable of the same name. The one in the outer block
was unused and probably just slipped in at one point or another. This
silences a compiler warning.
__FreeBSD_version 500015 can be used to detect their disappearance.
- Move the symbols for SMP_prvspace and lapic from globals.s to
locore.s.
- Remove globals.s with extreme prejudice.