Commit Graph

93 Commits

Author SHA1 Message Date
jake
961b97d434 Back out the previous change to the queue(3) interface.
It was not discussed and should probably not happen.

Requested by:		msmith and others
2000-05-26 02:09:24 +00:00
jake
d93fbc9916 Change the way that the queue(3) structures are declared; don't assume that
the type argument to *_HEAD and *_ENTRY is a struct.

Suggested by:	phk
Reviewed by:	phk
Approved by:	mdodd
2000-05-23 20:41:01 +00:00
grog
747ab40a69 Correct a couple of typos. 2000-05-07 05:09:45 +00:00
green
74f13b7793 Change the scheduler to actually respect the PUSER barrier. It's been
wrong for many years that negative niceness would lower the priority
of a process below PUSER, and once below PUSER, there were conditionals
in the code that are required to test for whether a process was in
the kernel which would break.

The breakage could (and did) cause lock-ups, basically nothing else
but the least nice program being able to run in some conditions.  The
algorithm which adjusts the priority now subtracts PRIO_MIN to do
things properly, and the ESTCPULIM() algorithm was updated to use
PRIO_TOTAL (PRIO_MAX - PRIO_MIN) to calculate the estcpu.

NICE_WEIGHT is now 1 to accomodate the full range of priorities better
(a -20 process with full CPU time has the priority of a +0 process with
no CPU time).  There are now 20 queues (exactly; 80 priorities) for
use in user processes' scheduling, and PUSER has been lowered to 48
to accomplish this.

This means, to the user, that things will be scheduled more correctly
(noticeable), there is no lock-up anymore WRT a niced -20 process
never releasing the CPU time for other processes.  In this fair system,
tsleep()ed < PUSER processes now will get the proper higher priority
than priority >= PUSER user processes.

The detective work of this was done by me, along with part of the
solution.  Luoqi Chen has provided most of the solution, and really
helped me understand what was happening better, to boot :)

Submitted by:   luoqi
Concept reviewed by:    bde
2000-04-30 18:33:43 +00:00
dillon
b852fcb160 The SMP cleanup commit broke UP compiles. Make UP compiles work again. 2000-03-28 18:06:49 +00:00
dillon
689641c1ea Commit major SMP cleanups and move the BGL (big giant lock) in the
syscall path inward.  A system call may select whether it needs the MP
    lock or not (the default being that it does need it).

    A great deal of conditional SMP code for various deadended experiments
    has been removed.  'cil' and 'cml' have been removed entirely, and the
    locking around the cpl has been removed.  The conditional
    separately-locked fast-interrupt code has been removed, meaning that
    interrupts must hold the CPL now (but they pretty much had to anyway).
    Another reason for doing this is that the original separate-lock for
    interrupts just doesn't apply to the interrupt thread mechanism being
    contemplated.

    Modifications to the cpl may now ONLY occur while holding the MP
    lock.  For example, if an otherwise MP safe syscall needs to mess with
    the cpl, it must hold the MP lock for the duration and must (as usual)
    save/restore the cpl in a nested fashion.

    This is precursor work for the real meat coming later: avoiding having
    to hold the MP lock for common syscalls and I/O's and interrupt threads.
    It is expected that the spl mechanisms and new interrupt threading
    mechanisms will be able to run in tandem, allowing a slow piecemeal
    transition to occur.

    This patch should result in a moderate performance improvement due to
    the considerable amount of code that has been removed from the critical
    path, especially the simplification of the spl*() calls.  The real
    performance gains will come later.

Approved by: jkh
Reviewed by: current, bde (exception.s)
Some work taken from: luoqi's patch
2000-03-28 07:16:37 +00:00
dufault
705c38904d I applied the wrong patch set. Back out anything associated
with the known bogus currtpriority.  This undoes the previous changes to
sys/i386/i386/trap.c, sys/alpha/alpha/trap.c, sys/sys/systm.h

Now we have the patch set approved by bde.

Approved by:	bde
2000-03-02 22:03:49 +00:00
dufault
0bdb67cb26 Patches that eliminate extra context switches in FIFO case.
Fixes p1003_1b regression test in the simple case of no RR and
FIFO processes competing.

Reviewed by:	jkh, bde
2000-03-02 16:20:07 +00:00
peter
031f01d30f Don't make the ktrace hook in tsleep() deref a null curproc after a panic.
PR:		15169
Submitted by:	David Gilbert <dgilbert@velocet.ca>
1999-11-30 09:01:46 +00:00
phk
8da3ba86dc Add a bit of sanity checking and problem avoidance in case the
timecounter hardware is bogus.

This will produce a new warning "microuptime() went backwards"
and try to not screw up the process resource accounting.
1999-11-29 11:29:04 +00:00
bde
1ad19bea4d Scheduler fixes equivalent to the ones logged in the following NetBSD
commit to kern_synch.c:

  ----------------------------
  revision 1.55
  date: 1999/02/23 02:56:03;  author: ross;  state: Exp;  lines: +39 -10
  Scheduler bug fixes and reorganization
  * fix the ancient nice(1) bug, where nice +20 processes incorrectly
    steal 10 - 20% of the CPU, (or even more depending on load average)
  * provide a new schedclk() mechanism at a new clock at schedhz, so high
    platform hz values don't cause nice +0 processes to look like they are
    niced
  * change the algorithm slightly, and reorganize the code a lot
  * fix percent-CPU calculation bugs, and eliminate some no-op code

  === nice bug === Correctly divide the scheduler queues between niced and
  compute-bound processes. The current nice weight of two (sort of, see
  `algorithm change' below) neatly divides the USRPRI queues in half; this
  should have been used to clip p_estcpu, instead of UCHAR_MAX.  Besides
  being the wrong amount, clipping an unsigned char to UCHAR_MAX is a no-op,
  and it was done after decay_cpu() which can only _reduce_ the value.  It
  has to be kept <= NICE_WEIGHT * PRIO_MAX - PPQ or processes can
  scheduler-penalize themselves onto the same queue as nice +20 processes.
  (Or even a higher one.)

  === New schedclk() mechansism === Some platforms should be cutting down
  stathz before hitting the scheduler, since the scheduler algorithm only
  works right in the vicinity of 64 Hz. Rather than prescale hz, then scale
  back and forth by 4 every time p_estcpu is touched (each occurance an
  abstraction violation), use p_estcpu without scaling and require schedhz
  to be generated directly at the right frequency. Use a default stathz (well,
  actually, profhz) / 4, so nothing changes unless a platform defines schedhz
  and a new clock.  Define these for alpha, where hz==1024, and nice was
  totally broke.

  === Algorithm change === The nice value used to be added to the
  exponentially-decayed scheduler history value p_estcpu, in _addition_ to
  be incorporated directly (with greater wieght) into the priority calculation.
  At first glance, it appears to be a pointless increase of 1/8 the nice
  effect (pri = p_estcpu/4 + nice*2), but it's actually at least 3x that
  because it will ramp up linearly but be decayed only exponentially, thus
  converging to an additional .75 nice for a loadaverage of one. I killed
  this, it makes the behavior hard to control, almost impossible to analyze,
  and the effect (~~nothing at for the first second, then somewhat increased
  niceness after three seconds or more, depending on load average) pointless.

  === Other bugs === hz -> profhz in the p_pctcpu = f(p_cpticks) calcuation.
  Collect scheduler functionality. Try to put each abstraction in just one
  place.
  ----------------------------

The details are a little different in FreeBSD:

=== nice bug ===   Fixing this is the main point of this commit.  We use
essentially the same clipping rule as NetBSD (our limit on p_estcpu
differs by a scale factor).  However, clipping at all is fundamentally
bad.  It gives free CPU the hoggiest hogs once they reach the limit, and
reaching the limit is normal for long-running hogs.  This will be fixed
later.

=== New schedclk() mechanism ===  We don't use the NetBSD schedclk()
(now schedclock()) mechanism.  We require (real)stathz to be about 128
and scale by an extra factor of 2 compared with NetBSD's statclock().
We scale p_estcpu instead of scaling the clock.  This is more accurate
and flexible.

=== Algorithm change ===  Same change.

=== Other bugs ===  The p_pctcpu bug was fixed long ago.  We don't try as
hard to abstract functionality yet.

Related changes: the new limit on p_estcpu must be exported to kern_exit.c
for clipping in wait1().

Agreed with by:		dufault
1999-11-28 12:12:13 +00:00
bde
0f795adedc Updated comments for the move in the previous commit. 1999-11-27 15:27:11 +00:00
bde
4955977bad Moved scheduling-related code to kern_synch.c so that it is easier to fix
and extend.  The new function containing the code is named schedclock()
as in NetBSD, but it has slightly different semantics (it already handles
incrementation of p->p_cpticks, and it should handle any calling frequency).

Agreed with in principle by:	dufault
1999-11-27 12:32:27 +00:00
phk
8fca18de89 This is a partial commit of the patch from PR 14914:
Alot of the code in sys/kern directly accesses the *Q_HEAD and *Q_ENTRY
   structures for list operations.  This patch makes all list operations
   in sys/kern use the queue(3) macros, rather than directly accessing the
   *Q_{HEAD,ENTRY} structures.

This batch of changes compile to the same object files.

Reviewed by:    phk
Submitted by:   Jake Burkholder <jake@checker.org>
PR:     14914
1999-11-16 10:56:05 +00:00
marcel
d5e8d714b9 sigset_t change (part 2 of 5)
-----------------------------

The core of the signalling code has been rewritten to operate
on the new sigset_t. No methodological changes have been made.
Most references to a sigset_t object are through macros (see
signalvar.h) to create a level of abstraction and to provide
a basis for further improvements.

The NSIG constant has not been changed to reflect the maximum
number of signals possible. The reason is that it breaks
programs (especially shells) which assume that all signals
have a non-null name in sys_signame. See src/bin/sh/trap.c
for an example. Instead _SIG_MAXSIG has been introduced to
hold the maximum signal possible with the new sigset_t.

struct sigprop has been moved from signalvar.h to kern_sig.c
because a) it is only used there, and b) access must be done
though function sigprop(). The latter because the table doesn't
holds properties for all signals, but only for the first NSIG
signals.

signal.h has been reorganized to make reading easier and to
add the new and/or modified structures. The "old" structures
are moved to signalvar.h to prevent namespace polution.

Especially the coda filesystem suffers from the change, because
it contained lines like (p->p_sigmask == SIGIO), which is easy
to do for integral types, but not for compound types.

NOTE: kdump (and port linux_kdump) must be recompiled.

Thanks to Garrett Wollman and Daniel Eischen for pressing the
importance of changing sigreturn as well.
1999-09-29 15:03:48 +00:00
peter
3b842d34e8 $Id$ -> $FreeBSD$ 1999-08-28 01:08:13 +00:00
peter
eac345f15d Don't initialize run queues here, do it all in one place. 1999-08-19 00:14:43 +00:00
bde
79ceaf95c2 The magic "no-cpu" cpu number is 0xff. Don't misrepresent cpu
numbers as chars or use bogus casts in an attempt to unmisrepresnt
them.  In top, don't assume that 0xff is the only negative cpu
number when cpu numbers are (mis)represented.
1999-03-05 16:38:13 +00:00
julian
ee38b91324 The tunable parameter for the scheduler quantum was inverted.
Higher numbers led to smaller quanta.
In discussion with BDE, change this parameter to be in uSecs
to make it machine independent,
and limit it to non zero multiples of 'tick' (rounding down).
Also make the variabel globally available so that the present function that
returns its value (used for posix scheduling I believe) can go away.

Submitted by: Bruce Evans <bde@freebsd.org>
1999-03-03 18:15:29 +00:00
bde
6d8d63664b Removed all traces of `p_switchtime'. The relevant timestamp is per-cpu,
not per-process.  Keep it in `switchtime' consistently.

It is now clear that the timestamp is always valid in fork_trampoline()
except when the child is running on a previously idle cpu, which
can only happen if there are multiple cpus, so don't check or set
the timestamp in fork_trampoline except in the (i386) SMP case.
Just remove the alpha code for setting it unconditionally, since
there is no SMP case for alpha and the code had rotted.

Parts reviewed by:	dfr, phk
1999-02-28 10:53:29 +00:00
bde
d51135c0c3 Improved scheduling in uiomove(), etc. resched_wanted() is true too
often for it to be a good criterion for switching kernel cpu hogs --
it is true after most wakeups.  Use the criterion "has been running
for >= 2 quanta" instead.
1999-02-22 16:57:48 +00:00
eivind
89e1199534 KNFize, by bde. 1999-01-10 01:58:29 +00:00
eivind
a8dc66f457 Split DIAGNOSTIC -> DIAGNOSTIC, INVARIANTS, and INVARIANT_SUPPORT as
discussed on -hackers.

Introduce 'KASSERT(assertion, ("panic message", args))' for simple
check + panic.

Reviewed by:	msmith
1999-01-08 17:31:30 +00:00
dillon
953800406c Add asleep() and await() support. Currently highly experimental. A
small support structure had to be added to the proc structure, and
    a few minor conditional panics no longer apply.
1998-12-21 07:41:51 +00:00
dg
b4bceb0b07 Compare p_cpulimit with RLIM_INFINITY before comparing it with the process
runtime. p_runtime is unsigned while p_cpulimit is not, so this avoids the
nasty side effect of the process getting killed when the runtime comes up
"negative" due to other bugs.
1998-11-27 11:44:22 +00:00
bde
8fdbb5fce3 Fixed the previous fix - stathz doesn't give the statclock frequency
when it is 0.

Submitted by:	mostly by Hidetoshi Shimokawa <simokawa@sat.t.u-tokyo.ac.jp>
1998-11-26 16:49:55 +00:00
bde
043d2a6202 Oops, yet again back out some local changes that shouldn't have been
in the previous commit.
1998-11-26 14:05:58 +00:00
bde
0d3ca540ea Fixed scaling of p_pctcpu. It was wrong by a factor of stathz/hz.
Until recently, this was half compensated for in at least ps and top
by multiplying by 100/stathz to get a better wrong factor of 100/hz.
1998-11-26 14:00:08 +00:00
bde
2fee4bebdc Oops, back out some local changes that shouldn't have been in the
previous commit.
1998-10-25 20:11:36 +00:00
bde
1dabb16c3c Fixed breakage of the !SMP case of roundrobin() in the previous commit. 1998-10-25 19:57:23 +00:00
phk
13c66194f4 Nitpicking and dusting performed on a train. Removes trivial warnings
about unused variables, labels and other lint.
1998-10-25 17:44:59 +00:00
dillon
d076acdc11 priority comparison in maybe_resched() didn't work properly if current
and chk process were on different scheduler queues.  Fixed.
1998-08-26 05:27:42 +00:00
bde
863d5c8b68 Cast pointers to uintptr_t/intptr_t instead of to u_long/long,
respectively.  Most of the longs should probably have been
u_longs, but this changes is just to prevent warnings about
casts between pointers and integers of different sizes, not
to fix poorly chosen types.
1998-07-15 02:32:35 +00:00
bde
0c05d7ae1f Moved definition of fscale from param.c to kern_synch.c where it
should always have been (it has no user-servicable parts even at
compile time) and staticized it.
1998-07-11 13:06:41 +00:00
phk
107b074e23 Add 3 sysctl variables for future use by ps)1_ 1998-06-30 21:25:58 +00:00
bde
9e868cbb1a Removed unused includes. 1998-06-21 18:02:50 +00:00
phk
d3d65c6b2e Some cleanups related to timecounters and weird ifdefs in <sys/time.h>.
Clean up (or if antipodic: down) some of the msgbuf stuff.

Use an inline function rather than a macro for timecounter delta.

Maintain process "on-cpu" time as 64 bits of microseconds to avoid
needless second rollover overhead.

Avoid calling microuptime the second time in mi_switch() if we do
not pass through _idle in cpu_switch()

This should reduce our context-switch overhead a bit, in particular
on pre-P5 and SMP systems.

WARNING:  Programs which muck about with struct proc in userland
will have to be fixed.

Reviewed, but found imperfect by:       bde
1998-05-28 09:30:28 +00:00
tegge
4347025be3 Add forwarding of roundrobin to other cpus. This gives a more regular
update of cpu usage as shown by top when one process is cpu bound
(no system calls) while the system is otherwise idle (except for top).

Don't attempt to switch to the BSP in boot().  If the system was idle when
an interrupt caused a panic, this won't work.  Instead, switch to the BSP
in cpu_reset.

Remove some spurious forward_statclock/forward_hardclock warnings.
1998-05-17 22:12:14 +00:00
phk
86337bf437 s/nanoruntime/nanouptime/g
s/microruntime/microuptime/g

Reviewed by:	bde
1998-05-17 11:53:46 +00:00
phk
5e9a131f20 Time changes mark 2:
* Figure out UTC relative to boottime.  Four new functions provide
      time relative to boottime.

    * move "runtime" into struct proc.  This helps fix the calcru()
      problem in SMP.

    * kill mono_time.

    * add timespec{add|sub|cmp} macros to time.h.  (XXX: These may change!)

    * nanosleep, select & poll takes long sleeps one day at a time

Reviewed by:    bde
Tested by:      ache and others
1998-04-04 13:26:20 +00:00
dufault
a9ef2eb2ee Remove duplicate comment 1998-03-28 18:16:29 +00:00
dufault
8ed0defc6e Finish _POSIX_PRIORITY_SCHEDULING. Needs P1003_1B and
_KPOSIX_PRIORITY_SCHEDULING options to work.  Changes:

Change all "posix4" to "p1003_1b".  Misnamed files are left
as "posix4" until I'm told if I can simply delete them and add
new ones;

Add _POSIX_PRIORITY_SCHEDULING system calls for FreeBSD and Linux;

Add man pages for _POSIX_PRIORITY_SCHEDULING system calls;

Add options to LINT;

Minor fixes to P1003_1B code during testing.
1998-03-28 11:51:01 +00:00
bde
cd450d6714 Moved some #includes from <sys/param.h> nearer to where they are actually
used.
1998-03-28 10:33:27 +00:00
dufault
06743e32eb idprio processes must be preempted as soon as anything is runnable. 1998-03-11 20:50:42 +00:00
julian
10c5ccc30a Reviewed by: dyson@freebsd.org (john Dyson), dg@root.com (david greenman)
Submitted by:	Kirk McKusick (mcKusick@mckusick.com)
Obtained from:  WHistle development tree
1998-03-08 09:59:44 +00:00
dufault
8893ec06df Reviewed by: msmith, bde long ago
Fix for RTPRIO scheduler to eliminate invalid context switches.

POSIX.4 headers and sysctl variables.  Nothing should change
unless POSIX4 is defined or _POSIX_VERSION is set to 199309.
1998-03-04 10:25:55 +00:00
bde
59d87ac480 Fixed a missing newline in a debugging printf.
Fixed punctuation in some comments.
1998-02-25 06:04:46 +00:00
eivind
4547a09753 Back out DIAGNOSTIC changes. 1998-02-06 12:14:30 +00:00
eivind
c552a9a1c3 Turn DIAGNOSTIC into a new-style option. 1998-02-04 22:34:03 +00:00
bde
85fbb446a9 Fixed style bugs in previous commit. 1997-12-29 08:54:52 +00:00