9698 Commits

Author SHA1 Message Date
jeff
47d8080afa - ftick was initialized to -1 for init and any of it's children. Fix this by
setting ftick = ltick = ticks in schedinit().
 - Update the priority when we are pulled off of the run queue and when we
   are inserted onto the run queue so that it more accurately reflects our
   present status.  This is important for efficient priority propagation
   functioning.
 - Move the frequency test into sched_pctcpu_update() so we don't repeat it
   each time we'd like to call it.
 - Put some temporary work-around code in sched_priority() in case the tick
   mechanism produces a bad priority.  Eventually this should revert to an
   assert again.
2007-01-05 08:50:38 +00:00
jeff
ae96850d62 - Only allow the tdq_idx to increase by one each tick rather than up to
the most recently chosen index.  This significantly improves nice
   behavior.  This allows a lower priority thread to run some multiple of
   times before the higher priority thread makes it to the front of
   the queue.  A nice +20 cpu hog now only gets ~5% of the cpu when running
   with a nice 0 cpu hog and about 1.5% with a nice -20 hog.  A nice
   difference of 1 makes a 4% difference in cpu usage between two hogs.
 - Track a seperate insert and removal index.  When the removal index is
   empty it is updated to point at the current insert index.
 - Don't remove and re-add a thread to the runq when it is being adjusted
   down in priority.
 - Pull some conditional code out of sched_tick().  It's looking a bit
   large now.
2007-01-04 12:16:19 +00:00
jeff
60a15a9f22 - Don't pass a pointer into runq_choose_from(). The caller can adjust the
index if it chooses to.
2007-01-04 12:10:58 +00:00
jeff
2c3282f28a ULE 2.0:
- Remove the double queue mechanism for timeshare threads.  It was slow
   due to excess cache lines in play, caused suboptimal scheduling behavior
   with niced and other non-interactive processes, complicated priority
   lending, etc.
 - Use a circular queue with a floating starting index for timeshare threads.
   Enforces fairness by moving the insertion point closer to threads with
   worse priorities over time.
 - Give interactive timeshare threads real-time user-space priorities and
   place them on the realtime/ithd queue.
 - Select non-interactive timeshare thread priorities based on their cpu
   utilization over the last 10 seconds combined with the nice value.  This
   gives us more sane priorities and behavior in a loaded system as
   compared to the old method of using the interactivity score.  The
   interactive score quickly hit a ceiling if threads were non-interactive
   and penalized new hog threads.
 - Use one slice size for all threads.  The slice is not currently
   dynamically set to adjust scheduling behavior of different threads.
 - Add some new sysctls for scheduling parameters.

Bug fixes/Clean up:
 - Fix zeroing of td_sched after initialization in sched_fork_thread() caused
   by recent ksegrp removal.
 - Fix KSE interactivity issues related to frequent forking and exiting of
   kse threads.  We simply disable the penalty for thread creation and exit
   for kse threads.
 - Cleanup the cpu estimator by using tickincr here as well.  Keep ticks and
   ltick/ftick in the same frequency.  Previously ticks were stathz and
   others were hz.
 - Lots of new and updated comments.
 - Many many others.

Tested on:	up x86/amd64, 8way amd64.
2007-01-04 08:56:25 +00:00
jeff
78c3275ce1 - Add three new functions to support circular run queues.
- runq_add_pri allows the caller to position the thread at any rqindex
   regardless of priority.
 - runq_choose_from() chooses the lowest priority thread starting from a given
   index.  The index is updated with the rqindex of the chosen thread.  This
   routine is used to pick the lowest priority relative to a given index.
 - runq_remove_idx() updates the index if the run queue that held the removed
   thread is now empty.
2007-01-04 08:39:58 +00:00
jeff
b5c5ce5407 - Fix schedgraph output with KSE threads. Call thread_switchout() after
calling CTR() so we don't confuse a new kse thread with a real preemption.
2007-01-03 02:38:41 +00:00
davidxu
eafca3f075 Fix compiling. 2007-01-02 04:14:01 +00:00
rwatson
2a3330b81a Prefer a more traditional spelling of inhibited in comments and panic
messages.
2006-12-31 15:56:04 +00:00
jeff
9c815f4892 - More search and replace prettying. 2006-12-29 12:55:32 +00:00
jeff
e74edb3876 - Clean up a bit after the most recent KSE restructuring. 2006-12-29 10:37:07 +00:00
rwatson
4a9f23955f Break contents of kern_mac.c out into two files following a repo-copy:
mac_framework.c   Contains basic MAC Framework functions, policy
                  registration, sysinits, etc.

mac_syscalls.c    Contains implementations of various MAC system calls,
                  including ENOSYS stubs when compiling without options
                  MAC.

Obtained from:	TrustedBSD Project
2006-12-28 20:52:02 +00:00
rwatson
e7f843dc94 Update MAC Framework general comments, referencing various interfaces it
consumes and implements, as well as the location of the framework and
policy modules.

Refactor MAC Framework versioning a bit so that the current ABI version can
be exported via a read-only sysctl.

Further update comments relating to locking/synchronization.

Update copyright to take into account these and other recent changes.

Obtained from:	TrustedBSD Project
2006-12-28 17:25:57 +00:00
davidxu
70875d94ab break loop early if we know that there are at least two signals. 2006-12-25 03:00:15 +00:00
davidxu
f51b738f57 Fix typo, p_slptime should be td_slptime. 2006-12-24 01:52:27 +00:00
bms
1a77168e4f Drop all received data mbufs from a socket's queue if the MT_SONAME
mbuf is dropped, to preserve the invariant in the PR_ADDR case.

Add a regression test to detect this condition, but do not hook it
up to the build for now.

PR:             kern/38495
Submitted by:   James Juran
Reviewed by:    sam, rwatson
Obtained from:  NetBSD
MFC after:      2 weeks
2006-12-23 21:07:07 +00:00
rwatson
a1911a8513 Update comments to reflect changes in the extattrctl() code.
Clean up comment formatting.

Obtained from:	TrustedBSD Project
2006-12-23 00:30:03 +00:00
rwatson
520b5875d2 Following a repo-copy of vfs_syscalls.c to vfs_extattr.c, remove
non-extattr functions from vfs_extattr.c, and extattr functions from
vfs_syscalls.c.

Change copyright/license on vfs_extattr.c to my copyright/license on
the extended attribute implementation (from extattr.h).

Clean up includes a bit.

Obtained from:	TrustedBSD Project
2006-12-23 00:10:36 +00:00
rwatson
ae9ef07995 Move src/sys/sys/mac_policy.h, the kernel interface between the MAC
Framework and security modules, to src/sys/security/mac/mac_policy.h,
completing the removal of kernel-only MAC Framework include files from
src/sys/sys.  Update the MAC Framework and MAC policy modules.  Delete
the old mac_policy.h.

Third party policy modules will need similar updating.

Obtained from:	TrustedBSD Project
2006-12-22 23:34:47 +00:00
rrs
c427816562 The prepend function did not handle non-pkthdr's correctly.
It always called MH_ALIGN for small lengths being
prepended (less than MHLEN). This meant that if you did
a prepend on a non M_PKTHDR the system would panic with
the KASSERT in MH_ALIGN. Instead we are not aware of
this and do a MH_ALIGN or M_ALIGN as appropriate.

Reviewed by:	andre
Approved by:	gnn
2006-12-21 19:58:04 +00:00
rwatson
6fa1425be4 Remove mac_enforce_subsystem debugging sysctls. Enforcement on
subsystems will be a property of policy modules, which may require
access control check entry points to be invoked even when not actively
enforcing (i.e., to track information flow without providing
protection).

Obtained from:	TrustedBSD Project
Suggested by:	Christopher dot Vance at sparta dot com
2006-12-21 09:51:34 +00:00
rwatson
5749ecccba Expand commenting on label slots, justification for the MAC Framework locking
model, interactions between locking and policy init/destroy methods.

Rewrap some comments to 77 character line wrap.

Obtained from:	TrustedBSD Project
2006-12-20 20:38:44 +00:00
jkim
0099defbac MFP4: (part of) 110058
copyin()/copyout() for message type is separated from msgsnd()/msgrcv() and
it is done from its wrapper functions to support 32-bit emulations.  After I
implemented this, I have briefly referenced NetBSD and Darwin.  NetBSD passes
copyin()/copyout() function pointers from wrappers.  Darwin passes size of
message type as an argument, which is actually similar to my first
implementation (P4 109706).  We may revisit these implementations later.
2006-12-20 19:26:30 +00:00
kib
9311fcbc5d In rev. 1.514, iodone on async buffer may happen before code checks the
vnode v_flag. For cluster buffers this would result in dereferencing NULL
b_vp. To prevent the panic, cache relevant vnode flag before calling
bstrategy.

Reported by:	Peter Holm, kris
Tested by:	Peter Holm
Reviewed by: tegge
Pointy hat to:	kib
2006-12-20 09:22:31 +00:00
davidxu
5a984630fa Add a lwpid field into per-cpu structure, the lwpid represents current
running thread's id on each cpu. This allow us to add in-kernel adaptive
spin for user level mutex. While spinning in user space is possible,
without correct thread running state exported from kernel, it hardly
can be implemented efficiently without wasting cpu cycles, however
exporting thread running state unlikely will be implemented soon as
it has to design and stablize interfaces. This implementation is
transparent to user space, it can be disabled dynamically. With this
change, mutex ping-pong program's performance is improved massively on
SMP machine. performance of mysql super-smack select benchmark is increased
about 7% on Intel dual dual-core2 Xeon machine, it indicates on systems
which have bunch of cpus and system-call overhead is low (athlon64, opteron,
and core-2 are known to be fast), the adaptive spin does help performance.

Added sysctls:
    kern.threads.umtx_dflt_spins
        if the sysctl value is non-zero, a zero umutex.m_spincount will
        cause the sysctl value to be used a spin cycle count.
    kern.threads.umtx_max_spins
        the sysctl sets upper limit of spin cycle count.

Tested on: Athlon64 X2 3800+, Dual Xeon 5130
2006-12-20 04:40:39 +00:00
mbr
a2c03bf6cb Back out rev. 1.266. The real cause for the recent panics has been fixed
in rev. 1.267 and there is no need to keep this test.
2006-12-20 02:49:59 +00:00
mbr
37965664cc Giant might have been temporarily dropped while waiting for proctree_lock, allowing for an
intervening tty_close() that cleared tp->t_session.

Submitted by:	tegge
MFC:		1 day
2006-12-19 22:34:32 +00:00
mbr
ccabbc6486 Add the tp->t_refcnt validity check back. There are still some race
conditions where tp->t_refcnt can go to zero.
2006-12-19 16:46:13 +00:00
davidxu
b0b74f9bd3 Remove unused sysctls. 2006-12-19 13:06:01 +00:00
pjd
6cc6a8d100 Use pipe_direct_write() optimization only if the data is in process' memory.
This fixes sending data through pipe from the kernel.

Fix suggested by:	rwatson
2006-12-19 12:52:22 +00:00
kmacy
af645e118f ktrace_cv is no longer used - remove
Submitted by: Attilio Rao
2006-12-17 00:16:09 +00:00
kmacy
bb69932355 Cleaner fix for handling declaration of loop variable under INVARIANTS
- in trying to avoid nested brackets and #ifdef INVARIANTS around i at the
  top, I broke booting for INVARIANTS all together :-(
- the cleanest fix is to simply assign to sq twice if INVARIANTS is enabled
- tested both with and without INVARIANTS :-/
2006-12-17 00:14:20 +00:00
ache
aebc61a22f Don't intermix assignments and variable declarations in prev. commit 2006-12-16 21:17:27 +00:00
ache
84d03f55f7 Fix NULL pointer reference for INVARIANTS case
Submitted by:   Yuriy Tsibizov <Yuriy.Tsibizov@gfk.ru>
2006-12-16 20:33:26 +00:00
rodrigc
10e4664552 In vfs_export(), if we specify MNT_DELEXPORT in the struct export_args,
after we perform the operations to delete the export,
call vfs_deleteopt() to delete the "export" mount option from
the linked list of mount options associated with that mount point.

This fixes one scenario:
- put a filesystem in /etc/exports to export it
- remove the filesystem from /etc/exports to delete the export and restart
  mountd
- try to do a "mount -u -o ro" or "mount -u -o rw" on that filesystem
  now that it is no  longer exported.
2006-12-16 15:50:36 +00:00
rodrigc
ccbffdda2c Add a function vfs_deleteopt() which searches through the vfsoptlist
linked list of mount options by name, and deletes the option if it finds it.
2006-12-16 15:44:03 +00:00
rodrigc
2aade96a59 Convert to ANSI-style function prototypes. 2006-12-16 12:06:59 +00:00
rwatson
4d1fe0d425 For now, back out sysv_ipc.c:1.30, which caused shmget() with odd mode
arguments to fail.  The mode field for shmget() appears to have undefined
meaning in the context of an already-present IPC object, but applications
appear to assume any arbitrary passed value will be ignored.  I had hoped
to revisit this more quickly, but am removing the change for now to
prevent toe-stubbing.

Reported by:	JAroslav Suchanek <jarda at grisoft dot cz>
PR:		kern/106078
2006-12-16 11:30:54 +00:00
kmacy
4e5f5353fb correct name of number of sleep queues 2006-12-16 07:50:39 +00:00
kmacy
7327d346fc Add second sleep queue so that sx and lockmgr can have separate sleep
queues for shared and exclusive acquisitions

Submitted by: Attilio Rao
Approved by: jhb
2006-12-16 06:54:09 +00:00
kmacy
ad3abace0c - Fix some gcc warnings in lock_profile.h
- add cnt_hold cnt_lock support for spin mutexes
- make sure contested is initialized to zero to only bump contested when appropriate
- move initialization function to kern_mutex.c to avoid cyclic dependency between
  mutex.h and lock_profile.h
2006-12-16 02:37:58 +00:00
n_hibma
c98f016084 Align the interfaces for the various watchdogs and make the interface
behave as expected.

Also:
- Return an error if WD_PASSIVE is passed in to the ioctl as only
  WD_ACTIVE is implemented at the moment. See sys/watchdog.h for an
  explanation of the difference between WD_ACTIVE and WD_PASSIVE.
- Remove the I_HAVE_TOTALLY_LOST_MY_SENSE_OF_HUMOR define. If you've
  lost your sense of humor, than don't add a define.

Specific changes:

i80321_wdog.c
  Don't roll your own passive watchdog tickle as this would defeat the
  purpose of an active (userland) watchdog tickle.

ichwd.c / ipmi.c:
  WD_ACTIVE means active patting of the watchdog by a userland process,
  not whether the watchdog is active. See sys/watchdog.h.

kern_clock.c:
  (software watchdog) Remove a check for WD_ACTIVE as this does not make
  sense here. This reverts r1.181.
2006-12-15 21:44:49 +00:00
kib
e7cdcb3240 Resolve two deadlocks that could be caused by busy md device backed
by vnode. Allow for md thread and the thread that owns lock on vnode
backing the md device to do the write even when runningbufspace is
exhausted.

Tested by:	Peter Holm
Reviewed by:	tegge
MFC after:	2 weeks
2006-12-14 11:34:07 +00:00
jhb
65d8bd30a0 Add a function to return the MD interrupt source cookie associated with
an interrupt event.  Use this in the x86 code to fixup the intrcnt names
when an interrupt handler is removed.
2006-12-12 19:20:19 +00:00
jhb
7106027433 Add a comment and fix a whitespace nit. 2006-12-12 19:19:22 +00:00
julian
541d02c2d4 Fix a potential point of confusion. Art Ironport we've seen this end up
with an infinite loop in and out of the kernel during process shutdown.
2006-12-12 08:01:55 +00:00
rodrigc
fbf224913d Use vfs_mount_error() to log mount errors in a few places with human
readable strings which can be retrieved if an "errmsg" parameter is
passed into nmount().
2006-12-07 02:57:00 +00:00
julian
948c671f4a Changes to try fix sched_ule.c courtesy of David Xu. 2006-12-06 06:55:59 +00:00
julian
396ed947f6 Threading cleanup.. part 2 of several.
Make part of John Birrell's KSE patch permanent..
Specifically, remove:
Any reference of the ksegrp structure. This feature was
never fully utilised and made things overly complicated.
All code in the scheduler that tried to make threaded programs
fair to unthreaded programs.  Libpthread processes will already
do this to some extent and libthr processes already disable it.

Also:
Since this makes such a big change to the scheduler(s), take the opportunity
to rename some structures and elements that had to be moved anyhow.
This makes the code a lot more readable.

The ULE scheduler compiles again but I have no idea if it works.

The 4bsd scheduler still reqires a little cleaning and some functions that now do
ALMOST nothing will go away, but I thought I'd do that as a separate commit.

Tested by David Xu, and Dan Eischen using libthr and libpthread.
2006-12-06 06:34:57 +00:00
kmacy
ff9c89ea11 Bug fix for obscenely large wait times on uncontested locks
if waittime was zero (the lock was uncontested) l->lpo_waittime
in the hash table would not get initialized.

Inspection prompted by questions from: Attilio Rao
2006-12-04 22:15:50 +00:00
jhb
12956f1daa Fix an edge case in rman_manage_region() where it didn't handle a resource
ending at ULONG_MAX properly.  While here, use TAILQ_FOREACH_SAFE().

Tested by:	"Stephane E. Potvin" <sepotvin at videotron-ca>
MFC after:	1 week
2006-12-04 16:45:23 +00:00