Commit Graph

353 Commits

Author SHA1 Message Date
Kip Macy
50d6e42434 - Forward port flush of page table updates on context switch or userret
- Forward port vfork XEN hack
2008-10-19 01:35:27 +00:00
Jeff Roberson
8df78c41d6 - Make SCHED_STATS more generic by adding a wrapper to create the
variables and sysctl nodes.
 - In reset walk the children of kern_sched_stats and reset the counters
   via the oid_arg1 pointer.  This allows us to add arbitrary counters to
   the tree and still reset them properly.
 - Define a set of switch types to be passed with flags to mi_switch().
   These types are named SWT_*.  These types correspond to SCHED_STATS
   counters and are automatically handled in this way.
 - Make the new SWT_ types more specific than the older switch stats.
   There are now stats for idle switches, remote idle wakeups, remote
   preemption ithreads idling, etc.
 - Add switch statistics for ULE's pickcpu algorithm.  These stats include
   how much migration there is, how often affinity was successful, how
   often threads were migrated to the local cpu on wakeup, etc.

Sponsored by:	Nokia
2008-04-17 04:20:10 +00:00
Jeff Roberson
b7edba7704 - Add a new td flag TDF_NEEDSUSPCHK that is set whenever a thread needs
to enter thread_suspend_check().
 - Set TDF_ASTPENDING along with TDF_NEEDSUSPCHK so we can move the
   thread_suspend_check() to ast() rather than userret().
 - Check TDF_NEEDSUSPCHK in the sleepq_catch_signals() optimization so
   that we don't miss a suspend request.  If this is set use the
   expensive signal path.
 - Set NEEDSUSPCHK when creating a new thread in thr in case the
   creating thread is due to be suspended as well but has not yet.

Reviewed by:	davidxu (Authored original patch)
2008-03-21 08:23:25 +00:00
Jeff Roberson
6617724c5f Remove kernel support for M:N threading.
While the KSE project was quite successful in bringing threading to
FreeBSD, the M:N approach taken by the kse library was never developed
to its full potential.  Backwards compatibility will be provided via
libmap.conf for dynamically linked binaries and static binaries will
be broken.
2008-03-12 10:12:01 +00:00
Joseph Koshy
d07f36b075 Kernel and hwpmc(4) support for callchain capture.
Sponsored by:	FreeBSD Foundation and Google Inc.
2007-12-07 08:20:17 +00:00
Julian Elischer
e01eafef2a A bunch more files that should probably print out a thread name
instead of a process name.
2007-11-14 06:51:33 +00:00
Jeff Roberson
b61ce5b0e6 - Move all of the PS_ flags into either p_flag or td_flags.
- p_sflag was mostly protected by PROC_LOCK rather than the PROC_SLOCK or
   previously the sched_lock.  These bugs have existed for some time.
 - Allow swapout to try each thread in a process individually and then
   swapin the whole process if any of these fail.  This allows us to move
   most scheduler related swap flags into td_flags.
 - Keep ki_sflag for backwards compat but change all in source tools to
   use the new and more correct location of P_INMEM.

Reported by:	pho
Reviewed by:	attilio, kib
Approved by:	re (kensmith)
2007-09-17 05:31:39 +00:00
Jeff Roberson
3036ab79e3 - Include opt_sched.h for SCHED_STATS. 2007-06-12 23:27:31 +00:00
Jeff Roberson
982d11f836 Commit 14/14 of sched_lock decomposition.
- Use thread_lock() rather than sched_lock for per-thread scheduling
   sychronization.
 - Use the per-process spinlock rather than the sched_lock for per-process
   scheduling synchronization.

Tested by:      kris, current@
Tested on:      i386, amd64, ULE, 4BSD, libthr, libkse, PREEMPTION, etc.
Discussed with: kris, attilio, kmacy, jhb, julian, bde (small parts each)
2007-06-05 00:00:57 +00:00
Attilio Rao
b4b7081961 Do proper "locking" for missing vmmeters part.
Now, we assume no more sched_lock protection for some of them and use the
distribuited loads method for vmmeter (distribuited through CPUs).

Reviewed by: alc, bde
Approved by: jeff (mentor)
2007-06-04 21:45:18 +00:00
Jeff Roberson
1c4bcd050a - Move rusage from being per-process in struct pstats to per-thread in
td_ru.  This removes the requirement for per-process synchronization in
   statclock() and mi_switch().  This was previously supported by
   sched_lock which is going away.  All modifications to rusage are now
   done in the context of the owning thread.  reads proceed without locks.
 - Aggregate exiting threads rusage in thread_exit() such that the exiting
   thread's rusage is not lost.
 - Provide a new routine, rufetch() to fetch an aggregate of all rusage
   structures from all threads in a process.  This routine must be used
   in any place requiring a rusage from a process prior to it's exit.  The
   exited process's rusage is still available via p_ru.
 - Aggregate tick statistics only on demand via rufetch() or when a thread
   exits.  Tick statistics are kept in the thread and protected by sched_lock
   until it exits.

Initial patch by:	attilio
Reviewed by:		attilio, bde (some objections), arch (mostly silent)
2007-06-01 01:12:45 +00:00
Attilio Rao
2feb50bf7d Revert VMCNT_* operations introduction.
Probabilly, a general approach is not the better solution here, so we should
solve the sched_lock protection problems separately.

Requested by: alc
Approved by: jeff (mentor)
2007-05-31 22:52:15 +00:00
Jeff Roberson
222d01951f - define and use VMCNT_{GET,SET,ADD,SUB,PTR} macros for manipulating
vmcnts.  This can be used to abstract away pcpu details but also changes
   to use atomics for all counters now.  This means sched lock is no longer
   responsible for protecting counts in the switch routines.

Contributed by:		Attilio Rao <attilio@FreeBSD.org>
2007-05-18 07:10:50 +00:00
Robert Watson
0c14ff0eb5 Remove 'MPSAFE' annotations from the comments above most system calls: all
system calls now enter without Giant held, and then in some cases, acquire
Giant explicitly.

Remove a number of other MPSAFE annotations in the credential code and
tweak one or two other adjacent comments.
2007-03-04 22:36:48 +00:00
Julian Elischer
ad1e7d285a Threading cleanup.. part 2 of several.
Make part of John Birrell's KSE patch permanent..
Specifically, remove:
Any reference of the ksegrp structure. This feature was
never fully utilised and made things overly complicated.
All code in the scheduler that tried to make threaded programs
fair to unthreaded programs.  Libpthread processes will already
do this to some extent and libthr processes already disable it.

Also:
Since this makes such a big change to the scheduler(s), take the opportunity
to rename some structures and elements that had to be moved anyhow.
This makes the code a lot more readable.

The ULE scheduler compiles again but I have no idea if it works.

The 4bsd scheduler still reqires a little cleaning and some functions that now do
ALMOST nothing will go away, but I thought I'd do that as a separate commit.

Tested by David Xu, and Dan Eischen using libthr and libpthread.
2006-12-06 06:34:57 +00:00
John Birrell
8460a577a4 Make KSE a kernel option, turned on by default in all GENERIC
kernel configs except sun4v (which doesn't process signals properly
with KSE).

Reviewed by:	davidxu@
2006-10-26 21:42:22 +00:00
Robert Watson
aed5570872 Complete break-out of sys/sys/mac.h into sys/security/mac/mac_framework.h
begun with a repo-copy of mac.h to mac_framework.h.  sys/mac.h now
contains the userspace and user<->kernel API and definitions, with all
in-kernel interfaces moved to mac_framework.h, which is now included
across most of the kernel instead.

This change is the first step in a larger cleanup and sweep of MAC
Framework interfaces in the kernel, and will not be MFC'd.

Obtained from:	TrustedBSD Project
Sponsored by:	SPARTA
2006-10-22 11:52:19 +00:00
Bruce Evans
1ca2c0183f kern_intr.c:
- Count (scheduling of) software interrupts (SWIs) as SWIs, not as
  hardware interrupts.
- Don't count (scheduling of) delayed SWIs as interrupts at all, since
  in the delayed case it is expected that there are many more scheduling
  calls than handling calls.  Perhaps all interrupts should be counted
  only when they are handled, but it is only counts of delayed SWIs that
  shouldn never be combined with the other counts.

subr_trap.c:
- Count (handling of) Asynchronous System Traps (ASTs) as traps, not as
  software interrupts.

Before these changes, the counter for SWIs only counted ASTs, and SWIs
weren't counted separately, but a subcounter for ASTs alone is less
needed than for most other exception sources.

4.4BSD-Lite uses the counters for similar things (actually matching
their names) on its main arches (hp300, ..., !i386) where more of the
exceptions are in hardware.
2006-10-18 04:48:09 +00:00
David Xu
42925630b6 Test before modifying p_sflag to avoid unconditionally cache line
ping-pong on SMP.
2006-02-10 14:59:16 +00:00
Poul-Henning Kamp
eb2da9a51f Simplify system time accounting for profiling.
Rename struct thread's td_sticks to td_pticks, we will need the
other name for more appropriately named use shortly.  Reduce it
from uint64_t to u_int.

Clear td_pticks whenever we enter the kernel instead of recording
its value as reference for userret().  Use the absolute value of
td->pticks in userret() and eliminate third argument.
2006-02-08 08:09:17 +00:00
Poul-Henning Kamp
5b1a8eb397 Modify the way we account for CPU time spent (step 1)
Keep track of time spent by the cpu in various contexts in units of
"cputicks" and scale to real-world microsec^H^H^H^H^H^H^H^Hclock_t
only when somebody wants to inspect the numbers.

For now "cputicks" are still derived from the current timecounter
and therefore things should by definition remain sensible also on
SMP machines.  (The main reason for this first milestone commit is
to verify that hypothesis.)

On slower machines, the avoided multiplications to normalize timestams
at every context switch, comes out as a 5-7% better score on the
unixbench/context1 microbenchmark.  On more modern hardware no change
in performance is seen.
2006-02-07 21:22:02 +00:00
Robert Watson
2c255e9df6 Moderate rewrite of kernel ktrace code to attempt to generally improve
reliability when tracing fast-moving processes or writing traces to
slow file systems by avoiding unbounded queueuing and dropped records.
Record loss was previously possible when the global pool of records
become depleted as a result of record generation outstripping record
commit, which occurred quickly in many common situations.

These changes partially restore the 4.x model of committing ktrace
records at the point of trace generation (synchronous), but maintain
the 5.x deferred record commit behavior (asynchronous) for situations
where entering VFS and sleeping is not possible (i.e., in the
scheduler).  Records are now queued per-process as opposed to
globally, with processes responsible for committing records from their
own context as required.

- Eliminate the ktrace worker thread and global record queue, as they
  are no longer used.  Keep the global free record list, as records
  are still used.

- Add a per-process record queue, which will hold any asynchronously
  generated records, such as from context switches.  This replaces the
  global queue as the place to submit asynchronous records to.

- When a record is committed asynchronously, simply queue it to the
  process.

- When a record is committed synchronously, first drain any pending
  per-process records in order to maintain ordering as best we can.
  Currently ordering between competing threads is provided via a global
  ktrace_sx, but a per-process flag or lock may be desirable in the
  future.

- When a process returns to user space following a system call, trap,
  signal delivery, etc, flush any pending records.

- When a process exits, flush any pending records.

- Assert on process tear-down that there are no pending records.

- Slightly abstract the notion of being "in ktrace", which is used to
  prevent the recursive generation of records, as well as generating
  traces for ktrace events.

Future work here might look at changing the set of events marked for
synchronous and asynchronous record generation, re-balancing queue
depth, timeliness of commit to disk, and so on.  I.e., performing a
drain every (n) records.

MFC after:	1 month
Discussed with:	jhb
Requested by:	Marc Olzheim <marcolz at stack dot nl>
2005-11-13 13:27:44 +00:00
David Xu
9104847f21 1. Change prototype of trapsignal and sendsig to use ksiginfo_t *, most
changes in MD code are trivial, before this change, trapsignal and
   sendsig use discrete parameters, now they uses member fields of
   ksiginfo_t structure. For sendsig, this change allows us to pass
   POSIX realtime signal value to user code.

2. Remove cpu_thread_siginfo, it is no longer needed because we now always
   generate ksiginfo_t data and feed it to libpthread.

3. Add p_sigqueue to proc structure to hold shared signals which were
   blocked by all threads in the proc.

4. Add td_sigqueue to thread structure to hold all signals delivered to
   thread.

5. i386 and amd64 now return POSIX standard si_code, other arches will
   be fixed.

6. In this sigqueue implementation, pending signal set is kept as before,
   an extra siginfo list holds additional siginfo_t data for signals.
   kernel code uses psignal() still behavior as before, it won't be failed
   even under memory pressure, only exception is when deleting a signal,
   we should call sigqueue_delete to remove signal from sigqueue but
   not SIGDELSET. Current there is no kernel code will deliver a signal
   with additional data, so kernel should be as stable as before,
   a ksiginfo can carry more information, for example, allow signal to
   be delivered but throw away siginfo data if memory is not enough.
   SIGKILL and SIGSTOP have fast path in sigqueue_add, because they can
   not be caught or masked.
   The sigqueue() syscall allows user code to queue a signal to target
   process, if resource is unavailable, EAGAIN will be returned as
   specification said.
   Just before thread exits, signal queue memory will be freed by
   sigqueue_flush.
   Current, all signals are allowed to be queued, not only realtime signals.

Earlier patch reviewed by: jhb, deischen
Tested on: i386, amd64
2005-10-14 12:43:47 +00:00
Jeff Roberson
9d65cdf6ff - Rev 1.83 of kern_lock.c fixes the td_locks assert, reenable it here.
Sponsored by:	Isilon Systems, Inc.
2005-03-28 12:52:46 +00:00
Jeff Roberson
eb8d0e01c0 - The td_locks check is currently broken with snapshots and possibly
some case in unmount.  Disable the KASSERT until these problems can
   be diagnosed.

Sponsored by:	Isilon Systems, Inc.
2005-03-25 09:56:56 +00:00
Jeff Roberson
61ef09d118 - Fail an assert if we attempt to return with any lockmgr locks held in
userret().

Sponsored by:	Isilon Systems, Inc.
2005-03-24 09:35:38 +00:00
John Baldwin
99b808f461 Whitespace fix. 2004-12-30 20:30:58 +00:00
Jeff Roberson
6a98702001 - Run sched_userret() after thread_userret(). Before, sched_userret() would
lower the priority of the returning thread to a user priority before
   calling into thread_userret() which would call wakeup() which in turn would
   cause the returning thread to eventually context switch rather than
   completing its slice.  Allowing this thread to complete its slice first
   yields a 15% performance improvement in super-smack on my dual opteron with
   4BSD.
2004-12-26 07:30:35 +00:00
Poul-Henning Kamp
9197ce2ee5 Add a new per-thread private flag: TDP_GEOM.
This flag gets set whenever the thread posts an event on the GEOM
event queue, and if the flag is set when the thread is prepared
to return to userland from the kernel, g_waitidle() will be called
to make sure that the posted events have completed.

This can replace an insufficient number of g_waitidle() calls in
various other places, and has the advantage of being failsafe:  Any
system call which does a VOP_OPEN()/VOP_CLOSE will now correctly
wait for any geom events it posted as part of spoils or tastes.

Assert that topology and Giant is not held in g_waitidle().
2004-10-23 20:49:17 +00:00
John Baldwin
78c85e8dfc Rework how we store process times in the kernel such that we always store
the raw values including for child process statistics and only compute the
system and user timevals on demand.

- Fix the various kern_wait() syscall wrappers to only pass in a rusage
  pointer if they are going to use the result.
- Add a kern_getrusage() function for the ABI syscalls to use so that they
  don't have to play stackgap games to call getrusage().
- Fix the svr4_sys_times() syscall to just call calcru() to calculate the
  times it needs rather than calling getrusage() twice with associated
  stackgap, etc.
- Add a new rusage_ext structure to store raw time stats such as tick counts
  for user, system, and interrupt time as well as a bintime of the total
  runtime.  A new p_rux field in struct proc replaces the same inline fields
  from struct proc (i.e. p_[isu]ticks, p_[isu]u, and p_runtime).  A new p_crux
  field in struct proc contains the "raw" child time usage statistics.
  ruadd() has been changed to handle adding the associated rusage_ext
  structures as well as the values in rusage.  Effectively, the values in
  rusage_ext replace the ru_utime and ru_stime values in struct rusage.  These
  two fields in struct rusage are no longer used in the kernel.
- calcru() has been split into a static worker function calcru1() that
  calculates appropriate timevals for user and system time as well as updating
  the rux_[isu]u fields of a passed in rusage_ext structure.  calcru() uses a
  copy of the process' p_rux structure to compute the timevals after updating
  the runtime appropriately if any of the threads in that process are
  currently executing.  It also now only locks sched_lock internally while
  doing the rux_runtime fixup.  calcru() now only requires the caller to
  hold the proc lock and calcru1() only requires the proc lock internally.
  calcru() also no longer allows callers to ask for an interrupt timeval
  since none of them actually did.
- calcru() now correctly handles threads executing on other CPUs.
- A new calccru() function computes the child system and user timevals by
  calling calcru1() on p_crux.  Note that this means that any code that wants
  child times must now call this function rather than reading from p_cru
  directly.  This function also requires the proc lock.
- This finishes the locking for rusage and friends so some of the Giant locks
  in exit1() and kern_wait() are now gone.
- The locking in ttyinfo() has been tweaked so that a shared lock of the
  proctree lock is used to protect the process group rather than the process
  group lock.  By holding this lock until the end of the function we now
  ensure that the process/thread that we pick to dump info about will no
  longer vanish while we are trying to output its info to the console.

Submitted by:	bde (mostly)
MFC after:	1 month
2004-10-05 18:51:11 +00:00
John Baldwin
ea73c1ea21 Don't try to protect td_sticks with sched_lock. It doesn't need it as it
is only accessed by curthread.
2004-09-23 21:03:58 +00:00
John Baldwin
7eaec467d8 Various small style fixes. 2004-09-22 15:24:33 +00:00
Julian Elischer
5995adc206 Remove an unneeded argument..
The removed argument could trivially be derived from the remaining one.
That in turn should be the same as curthread, but it is possible that curthread could be expensive to derive on some syste,s so leave it as an argument.
Having both proc and thread as an argumen tjust gives an opportunity for
them to get out sync.

MFC after:	3 days
2004-08-31 07:34:54 +00:00
Julian Elischer
99e9dcb817 Remove sched_free_thread() which was only used
in diagnostics. It has outlived its usefulness and has started
causing panics for people who turn on DIAGNOSTIC, in what is otherwise
good code.

MFC after:	2 days
2004-08-31 06:12:13 +00:00
David Xu
2b70a83aff Call thread_user_enter for M:N thread, ast() should be treated as another
entrance of kernel.
2004-08-08 22:28:33 +00:00
John Baldwin
52eb84641d - Move TDF_OWEPREEMPT, TDF_OWEUPC, and TDF_USTATCLOCK over to td_pflags
since they are only accessed by curthread and thus do not need any
  locking.
- Move pr_addr and pr_ticks out of struct uprof (which is per-process)
  and directly into struct thread as td_profil_addr and td_profil_ticks
  as these variables are really per-thread.  (They are used to defer an
  addupc_intr() that was too "hard" until ast()).
2004-07-16 21:04:55 +00:00
John Baldwin
bf0acc273a - Change mi_switch() and sched_switch() to accept an optional thread to
switch to.  If a non-NULL thread pointer is passed in, then the CPU will
  switch to that thread directly rather than calling choosethread() to pick
  a thread to choose to.
- Make sched_switch() aware of idle threads and know to do
  TD_SET_CAN_RUN() instead of sticking them on the run queue rather than
  requiring all callers of mi_switch() to know to do this if they can be
  called from an idlethread.
- Move constants for arguments to mi_switch() and thread_single() out of
  the middle of the function prototypes and up above into their own
  section.
2004-07-02 19:09:50 +00:00
John Baldwin
a3a7017895 Tidy up uprof locking. Mostly the fields are protected by both the proc
lock and sched_lock so they can be read with either lock held.  Document
the locking as well.  The one remaining bogosity is that pr_addr and
pr_ticks should be per-thread but profiling of multithreaded apps is
currently undefined.
2004-07-02 03:50:48 +00:00
Julian Elischer
4ccbe07e84 Remove unused variable. 2004-03-31 08:20:44 +00:00
Peter Wemm
37814395c1 Push Giant down a little further:
- no longer serialize on Giant for thread_single*() and family in fork,
  exit and exec
- thread_wait() is mpsafe, assert no Giant
- reduce scope of Giant in exit to not cover thread_wait and just do
  vm_waitproc().
- assert that thread_single() family are not called with Giant
- remove the DROP/PICKUP_GIANT macros from thread_single() family
- assert that thread_suspend_check() s not called with Giant
- remove manual drop_giant hack in thread_suspend_check since we know it
  isn't held.
- remove the DROP/PICKUP_GIANT macros from thread_suspend_check() family
- mark kse_create() mpsafe
2004-03-13 22:31:39 +00:00
Robert Watson
16df17d062 Put "failed to set signal flags properly for ast()" check under
DIAGNOSTIC instead of INVARIANTS.  INVARIANTS is intended for tests
that don't substantially change code flow or behavior (passive), but
this test required locking both the proc lock and scheduler lock
in order to execute.  It also appears to be a very advisory diagnostic
as opposed to an invariant violation.

Following discussion with:	bde
2004-03-05 17:35:28 +00:00
John Baldwin
91d5354a2c Locking for the per-process resource limits structure.
- struct plimit includes a mutex to protect a reference count.  The plimit
  structure is treated similarly to struct ucred in that is is always copy
  on write, so having a reference to a structure is sufficient to read from
  it without needing a further lock.
- The proc lock protects the p_limit pointer and must be held while reading
  limits from a process to keep the limit structure from changing out from
  under you while reading from it.
- Various global limits that are ints are not protected by a lock since
  int writes are atomic on all the archs we support and thus a lock
  wouldn't buy us anything.
- All accesses to individual resource limits from a process are abstracted
  behind a simple lim_rlimit(), lim_max(), and lim_cur() API that return
  either an rlimit, or the current or max individual limit of the specified
  resource from a process.
- dosetrlimit() was renamed to kern_setrlimit() to match existing style of
  other similar syscall helper functions.
- The alpha OSF/1 compat layer no longer calls getrlimit() and setrlimit()
  (it didn't used the stackgap when it should have) but uses lim_rlimit()
  and kern_setrlimit() instead.
- The svr4 compat no longer uses the stackgap for resource limits calls,
  but uses lim_rlimit() and kern_setrlimit() instead.
- The ibcs2 compat no longer uses the stackgap for resource limits.  It
  also no longer uses the stackgap for accessing sysctl's for the
  ibcs2_sysconf() syscall but uses kernel_sysctl() instead.  As a result,
  ibcs2_sysconf() no longer needs Giant.
- The p_rlimit macro no longer exists.

Submitted by:	mtm (mostly, I only did a few cleanups and catchups)
Tested on:	i386
Compiled on:	alpha, amd64
2004-02-04 21:52:57 +00:00
Jeff Roberson
29bcc4514f - Add a flags parameter to mi_switch. The value of flags may be SW_VOL or
SW_INVOL.  Assert that one of these is set in mi_switch() and propery
   adjust the rusage statistics.  This is to simplify the large number of
   users of this interface which were previously all required to adjust the
   proper counter prior to calling mi_switch().  This also facilitates more
   switch and locking optimizations.
 - Change all callers of mi_switch() to pass the appropriate paramter and
   remove direct references to the process statistics.
2004-01-25 03:54:52 +00:00
Peter Wemm
917cf8d2a3 Log involuntary context switches correctly. 2003-09-05 22:15:26 +00:00
David Xu
75ea65e3a2 kse.h is not needed for these files. 2003-08-05 12:08:49 +00:00
Peter Wemm
aeaead20b8 When ktracing context switches, make sure we record involuntary switches.
Otherwise, when we get a evicted from the cpu, there is no record of it.
This is not a default ktrace flag.
2003-07-31 01:36:24 +00:00
David Xu
9dde3bc999 o Change kse_thr_interrupt to allow send a signal to a specified thread,
or unblock a thread in kernel, and allow UTS to specify whether syscall
  should be restarted.
o Add ability for UTS to monitor signal comes in and removed from process,
  the flag PS_SIGEVENT is used to indicate the events.
o Add a KMF_WAITSIGEVENT for KSE mailbox flag, UTS call kse_release with
  this flag set to wait for above signal event.
o For SA based thread, kernel masks all signal in its signal mask, let
  UTS to use kse_thr_interrupt interrupt a thread, and install a signal
  frame in userland for the thread.
o Add a tm_syncsig in thread mailbox, when a hardware trap occurs,
  it is used to deliver synchronous signal to userland, and upcall
  is schedule, so UTS can process the synchronous signal for the thread.

Reviewed by: julian (mentor)
2003-06-28 08:29:05 +00:00
David Xu
cd4f6ebb13 1. Add code to support bound thread. when blocked, a bound thread never
schedules an upcall. Signal delivering to a bound thread is same as
   non-threaded process. This is intended to be used by libpthread to
   implement PTHREAD_SCOPE_SYSTEM thread.
2. Simplify kse_release() a bit, remove sleep loop.
2003-06-15 12:51:26 +00:00
David Xu
0e2a4d3aeb Rename P_THREADED to P_SA. P_SA means a process is using scheduler
activations.
2003-06-15 00:31:24 +00:00
David E. O'Brien
677b542ea2 Use __FBSDID(). 2003-06-11 00:56:59 +00:00
John Baldwin
90af4afacb - Merge struct procsig with struct sigacts.
- Move struct sigacts out of the u-area and malloc() it using the
  M_SUBPROC malloc bucket.
- Add a small sigacts_*() API for managing sigacts structures: sigacts_alloc(),
  sigacts_free(), sigacts_copy(), sigacts_share(), and sigacts_shared().
- Remove the p_sigignore, p_sigacts, and p_sigcatch macros.
- Add a mutex to struct sigacts that protects all the members of the struct.
- Add sigacts locking.
- Remove Giant from nosys(), kill(), killpg(), and kern_sigaction() now
  that sigacts is locked.
- Several in-kernel functions such as psignal(), tdsignal(), trapsignal(),
  and thread_stopped() are now MP safe.

Reviewed by:	arch@
Approved by:	re (rwatson)
2003-05-13 20:36:02 +00:00
John Baldwin
5eac9e2dcb The signotify() sanity check in userret() doesn't need Giant anymore. 2003-04-23 18:51:55 +00:00
John Baldwin
9752f794c7 - Move PS_PROFIL and its new cousin PS_STOPPROF back over to p_flag and
rename them appropriately.  Protect both flags with both the proc lock
  and the sched_lock.
- Protect p_profthreads with the proc lock.
- Remove Giant from profil(2).
2003-04-22 20:54:04 +00:00
John Baldwin
f5d5cb3c7c Tweak locking in the PS_XCPU handler to hold the sched_lock while reading
p_runtime.
2003-04-17 22:33:04 +00:00
Jeff Roberson
4093529dee - Move p->p_sigmask to td->td_sigmask. Signal masks will be per thread with
a follow on commit to kern_sig.c
 - signotify() now operates on a thread since unmasked pending signals are
   stored in the thread.
 - PS_NEEDSIGCHK moves to TDF_NEEDSIGCHK.
2003-03-31 22:49:17 +00:00
Jeff Roberson
1bf4700bff - Change trapsignal() to accept a thread and not a proc.
- Change all consumers to pass in a thread.

Right now this does not cause any functional changes but it will be important
later when signals can be delivered to specific threads.
2003-03-31 22:02:38 +00:00
David Xu
21e0492ab1 Fix signal delivering bug for threaded process. 2003-03-11 02:59:50 +00:00
John Baldwin
263067951a Replace calls to WITNESS_SLEEP() and witness_list() with equivalent calls
to WITNESS_WARN().
2003-03-04 21:03:05 +00:00
Julian Elischer
ac2e415327 Change the process flags P_KSES to be P_THREADED.
This is just a cosmetic change but I've been meaning to do it for about a year.
2003-02-27 02:05:19 +00:00
Jeff Roberson
58a3c27384 - Add a new function, thread_signal_add(), that is called from postsig to
add a signal to a mailbox's pending set.
 - Add a new function, thread_signal_upcall(), this causes the current thread
   to upcall so that we can deliver pending signals.

Reviewed by:	mini
2003-02-17 09:58:11 +00:00
Julian Elischer
4a338afd7a Move a bunch of flags from the KSE to the thread.
I was in two minds as to where to put them in the first case..
I should have listenned to the other mind.

Submitted by:	 parts by davidxu@
Reviewed by:	jeff@ mini@
2003-02-17 09:55:10 +00:00
Jeff Roberson
e4625663c9 - Move ke_sticks, ke_iticks, ke_uticks, ke_uu, ke_su, and ke_iu back into
the proc.  These counters are only examined through calcru.

Submitted by:	davidxu
Tested on:	x86, alpha, UP/SMP
2003-02-17 02:19:58 +00:00
Julian Elischer
6f8132a867 Reversion of commit by Davidxu plus fixes since applied.
I'm not convinced there is anything major wrong with the patch but
them's the rules..

I am using my "David's mentor" hat to revert this as he's
offline for a while.
2003-02-01 12:17:09 +00:00
Tim J. Robbins
48ed1432c5 Use a local variable to store the number of ticks that elapsed in
kernel mode instead of (unintentionally) using the global `ticks'.
This error completely broke profiling.
2003-01-31 11:22:31 +00:00
David Xu
0dbb100b9b Move UPCALL related data structure out of kse, introduce a new
data structure called kse_upcall to manage UPCALL. All KSE binding
and loaning code are gone.

A thread owns an upcall can collect all completed syscall contexts in
its ksegrp, turn itself into UPCALL mode, and takes those contexts back
to userland. Any thread without upcall structure has to export their
contexts and exit at user boundary.

Any thread running in user mode owns an upcall structure, when it enters
kernel, if the kse mailbox's current thread pointer is not NULL, then
when the thread is blocked in kernel, a new UPCALL thread is created and
the upcall structure is transfered to the new UPCALL thread. if the kse
mailbox's current thread pointer is NULL, then when a thread is blocked
in kernel, no UPCALL thread will be created.

Each upcall always has an owner thread. Userland can remove an upcall by
calling kse_exit, when all upcalls in ksegrp are removed, the group is
atomatically shutdown. An upcall owner thread also exits when process is
in exiting state. when an owner thread exits, the upcall it owns is also
removed.

KSE is a pure scheduler entity. it represents a virtual cpu. when a thread
is running, it always has a KSE associated with it. scheduler is free to
assign a KSE to thread according thread priority, if thread priority is changed,
KSE can be moved from one thread to another.

When a ksegrp is created, there is always N KSEs created in the group. the
N is the number of physical cpu in the current system. This makes it is
possible that even an userland UTS is single CPU safe, threads in kernel still
can execute on different cpu in parallel. Userland calls kse_create to add more
upcall structures into ksegrp to increase concurrent in userland itself, kernel
is not restricted by number of upcalls userland provides.

The code hasn't been tested under SMP by author due to lack of hardware.

Reviewed by: julian
2003-01-26 11:41:35 +00:00
Julian Elischer
93a7aa79d6 Add code to ddb to allow backtracing an arbitrary thread.
(show thread {address})

Remove the IDLE kse state and replace it with a change in
the way threads sahre KSEs. Every KSE now has a thread, which is
considered its "owner" however a KSE may also be lent to other
threads in the same group to allow completion of in-kernel work.
n this case the owner remains the same and the KSE will revert to the
owner when the other work has been completed.

All creations of upcalls etc. is now done from
kse_reassign() which in turn is called from mi_switch or
thread_exit(). This means that special code can be removed from
msleep() and cv_wait().

kse_release() does not leave a KSE with no thread any more but
converts the existing thread into teh KSE's owner, and sets it up
for doing an upcall. It is just inhibitted from being scheduled until
there is some reason to do an upcall.

Remove all trace of the kse_idle queue since it is no-longer needed.
"Idle" KSEs are now on the loanable queue.
2002-12-28 01:23:07 +00:00
Robert Watson
52378b8acd To reduce per-return overhead of userret(), call into
mac_thread_userret() only if PS_MACPEND is set in the process AST mask.
This avoids the cost of the entry point in the common case, but
requires policies interested in the userret event to set the flag
(protected by the scheduler lock) if they do want the event.  Since
all the policies that we're working with which use mac_thread_userret()
use the entry point only selectively to perform operations deferred
for locking reasons, this maintains the desired semantics.

Approved by:	re
Requested by:	bde
Obtained from:	TrustedBSD Project
Sponsored by:	DARPA, Network Associates Laboratories
2002-11-08 19:00:17 +00:00
Julian Elischer
053effc60e iBack out david's last commit. the suspension code needs to be called
for non KSE processes too.
2002-10-26 04:44:17 +00:00
David Xu
3139ada54c Move suspension checking code from userret() into thread_userret(). 2002-10-26 02:56:51 +00:00
Jeff Roberson
b43179fbe8 - Create a new scheduler api that is defined in sys/sched.h
- Begin moving scheduler specific functionality into sched_4bsd.c
 - Replace direct manipulation of scheduler data with hooks provided by the
   new api.
 - Remove KSE specific state modifications and single runq assumptions from
   kern_switch.c

Reviewed by:	-arch
2002-10-12 05:32:24 +00:00
John Baldwin
5715307f74 - Move p_cpulimit to struct proc from struct plimit and protect it with
sched_lock.  This means that we no longer access p_limit in mi_switch()
  and the p_limit pointer can be protected by the proc lock.
- Remove PRS_ZOMBIE check from CPU limit test in mi_switch().  PRS_ZOMBIE
  processes don't call mi_switch(), and even if they did there is no longer
  the danger of p_limit being NULL (which is what the original zombie check
  was added for).
- When we bump the current processes soft CPU limit in ast(), just bump the
  private p_cpulimit instead of the shared rlimit.  This fixes an XXX for
  some value of fix.  There is still a (probably benign) bug in that this
  code doesn't check that the new soft limit exceeds the hard limit.

Inspired by:	bde (2)
2002-10-09 17:17:24 +00:00
Juli Mallett
289e1e23d1 Access td->td_kse inside sched_lock.
Submitted by:	julian
2002-10-02 18:25:09 +00:00
Juli Mallett
bc7b9f1dba De-obfuscate local use of members of 'struct thread', for which we have
local variables, and group assignment.
2002-10-02 16:39:39 +00:00
Robert Watson
92dbb82a47 Add a new MAC entry point, mac_thread_userret(td), which permits policy
modules to perform MAC-related events when a thread returns to user
space.  This is required for policies that have floating process labels,
as it's not always possible to acquire the process lock at arbitrary
points in the stack during system call processing; process labels might
represent traditional authentication data, process history information,
or other data.

LOMAC will use this entry point to perform the process label update
prior to the thread returning to userspace, when plugged into the MAC
framework.

Obtained from:	TrustedBSD Project
Sponsored by:	DARPA, Network Associates Laboratories
2002-10-02 02:42:38 +00:00
Juli Mallett
1d9c56964d Back our kernel support for reliable signal queues.
Requested by:	rwatson, phk, and many others
2002-10-01 17:15:53 +00:00
John Baldwin
feb2449610 Minor style nits in a comment. 2002-10-01 15:49:32 +00:00
John Baldwin
6cae6dacd5 Various style fixups.
Submitted by:	bde (mostly)
2002-10-01 14:16:50 +00:00
John Baldwin
f6ccde8308 Actually clear PS_XCPU in ast() when we handle it.
Submitted by:	bde
Pointy hat to:	jhb
2002-10-01 14:13:13 +00:00
John Baldwin
dc183990ca - Add a new per-process flag PS_XCPU to indicate that at least one thread
has exceeded its CPU time limit.
- In mi_switch(), set PS_XCPU when the CPU time limit is exceeded.
- Perform actual CPU time limit exceeded work in ast() when PS_XCPU is set.

Requested by:	many
2002-09-30 21:13:54 +00:00
Juli Mallett
1226f694e6 First half of implementation of ksiginfo, signal queues, and such. This
gets signals operating based on a TailQ, and is good enough to run X11,
GNOME, and do job control.  There are some intricate parts which could be
more refined to match the sigset_t versions, but those require further
evaluation of directions in which our signal system can expand and contract
to fit our needs.

After this has been in the tree for a while, I will make in kernel API
changes, most notably to trapsignal(9) and sendsig(9), to use ksiginfo
more robustly, such that we can actually pass information with our
(queued) signals to the userland.  That will also result in using a
struct ksiginfo pointer, rather than a signal number, in a lot of
kern_sig.c, to refer to an individual pending signal queue member, but
right now there is no defined behaviour for such.

CODAFS is unfinished in this regard because the logic is unclear in
some places.

Sponsored by:	New Gold Technology
Reviewed by:	bde, tjr, jake [an older version, logic similar]
2002-09-30 20:20:22 +00:00
Julian Elischer
253fdd5ba9 slightly clean up the thread_userret() and thread_consider_upcall() calls.
also some slight changes for TDF_BOUND testing and small style changes
Should ONLY affect KSE programs

Submitted by:	davidxu
2002-09-23 06:14:30 +00:00
Robert Watson
1c39a77468 Spell proprly properly:
failed to set signal flags proprly for ast()
  failed to set signal flags proprly for ast()
  failed to set signal flags proprly for ast()
  failed to set signal flags proprly for ast()
2002-08-22 14:36:03 +00:00
Jonathan Mini
aaa1c7715b Revert removal of cred_free_thread(): It is used to ensure that a thread's
credentials are not improperly borrowed when the thread is not current in
the kernel.

Requested by:	jhb, alfred
2002-07-11 02:18:33 +00:00
Julian Elischer
ad22735e3f Don't slow every syscall and trap by doing locks and stuff if the
'stop' bits are not set. This is a temporary thing.. I think this code probably
needs to be rewritten anyhow.
2002-07-10 06:40:22 +00:00
Julian Elischer
e602ba25fd Part 1 of KSE-III
The ability to schedule multiple threads per process
(one one cpu) by making ALL system calls optionally asynchronous.
to come: ia64 and power-pc patches, patches for gdb, test program (in tools)

Reviewed by:	Almost everyone who counts
	(at various times, peter, jhb, matt, alfred, mini, bernd,
	and a cast of thousands)

	NOTE: this is still Beta code, and contains lots of debugging stuff.
	expect slight instability in signals..
2002-06-29 17:26:22 +00:00
Jonathan Mini
01ad8a53db Remove unused diagnostic function cread_free_thread().
Approved by:	alfred
2002-06-24 06:22:00 +00:00
John Baldwin
d0c149fce8 We no longer need to acqure Giant in ast() for ktrpsig() in postsig() now
that ktrace no longer needs Giant.
2002-06-07 05:43:40 +00:00
Julian Elischer
628855e758 CURSIG() is not a macro so rename it cursig().
Obtained from:	KSE tree
2002-05-29 23:44:32 +00:00
Bruce Evans
79065dba2a Moved signal handling and rescheduling from userret() to ast() so that
they aren't in the usual path of execution for syscalls and traps.
The main complication for this is that we have to set flags to control
ast() everywhere that changes the signal mask.

Avoid locking in userret() in most of the remaining cases.

Submitted by:	luoqi (first part only, long ago, reorganized by me)
Reminded by:	dillon
2002-04-04 17:49:48 +00:00
Jake Burkholder
b454c6dd29 Style fixes purposefully left out of last commit. I checked the kse tree
and didn't see any changes that this conflicts with.
2002-03-29 16:45:03 +00:00
Jake Burkholder
d0ce9a7e07 Remove abuse of intr_disable/restore in MI code by moving the loop in ast()
back into the calling MD code.  The MD code must ensure no races between
checking the astpening flag and returning to usermode.

Submitted by:	peter (ia64 bits)
Tested on:	alpha (peter, jeff), i386, ia64 (peter), sparc64
2002-03-29 16:35:26 +00:00
Warner Losh
cb9a238a8a Remove last two abuses of cpu_critical_{enter,exit} in the MI code.
Reviewed by: jake, jhb, rwatson
2002-03-21 06:11:09 +00:00
John Baldwin
01c04d2de9 Change the way we ensure td_ucred is NULL if DIAGNOSTIC is defined.
Instead of caching the ucred reference, just go ahead and eat the
decerement and increment of the refcount.  Now that Giant is pushed down
into crfree(), we no longer have to get Giant in the common case.  In the
case when we are actually free'ing the ucred, we would normally free it on
the next kernel entry, so the cost there is not new, just in a different
place.  This also removse td_cache_ucred from struct thread.  This is
still only done #ifdef DIAGNOSTIC.

[ missed this file in the previous commit ]

Tested on:	i386, alpha
2002-03-20 21:12:04 +00:00
Jake Burkholder
39dda4e363 Make this compile.
Pointy hat to:	julian
2002-02-23 01:42:13 +00:00
Julian Elischer
77c4066424 Add some DIAGNOSTIC code.
While in userland, keep the thread's ucred reference in a shadow
field so that the usual place to store it is NULL.
If DIAGNOSTIC is not set, the thread ucred is kept valid until the next
kernel entry, at which time it is checked against the process cred
and possibly corrected. Produces a BIG speedup in
kernels with INVARIANTS set. (A previous commit corrected it
for the non INVARIANTS case already)

Reviewed by:	dillon@freebsd.org
2002-02-22 23:58:22 +00:00
Julian Elischer
2eb927e2bb If the credential on an incoming thread is correct, don't bother
reaquiring it. In the same vein, don't bother dropping the thread cred
when goinf ot userland. We are guaranteed to nned it when we come back,
(which we are guaranteed to do).

Reviewed by:	jhb@freebsd.org, bde@freebsd.org (slightly different version)
2002-02-17 01:09:56 +00:00
Julian Elischer
2c1007663f In a threaded world, differnt priorirites become properties of
different entities.  Make it so.

Reviewed by:	jhb@freebsd.org (john baldwin)
2002-02-11 20:37:54 +00:00
Bruce Evans
e744f30933 Changed the type of pcb_flags from u_char to u_int and adjusted things.
This removes the only atomic operation on a char type in the entire
kernel.
2002-01-17 17:49:23 +00:00
John Baldwin
c86b6ff551 Change the preemption code for software interrupt thread schedules and
mutex releases to not require flags for the cases when preemption is
not allowed:

The purpose of the MTX_NOSWITCH and SWI_NOSWITCH flags is to prevent
switching to a higher priority thread on mutex releease and swi schedule,
respectively when that switch is not safe.  Now that the critical section
API maintains a per-thread nesting count, the kernel can easily check
whether or not it should switch without relying on flags from the
programmer.  This fixes a few bugs in that all current callers of
swi_sched() used SWI_NOSWITCH, when in fact, only the ones called from
fast interrupt handlers and the swi_sched of softclock needed this flag.
Note that to ensure that swi_sched()'s in clock and fast interrupt
handlers do not switch, these handlers have to be explicitly wrapped
in critical_enter/exit pairs.  Presently, just wrapping the handlers is
sufficient, but in the future with the fully preemptive kernel, the
interrupt must be EOI'd before critical_exit() is called.  (critical_exit()
can switch due to a deferred preemption in a fully preemptive kernel.)

I've tested the changes to the interrupt code on i386 and alpha.  I have
not tested ia64, but the interrupt code is almost identical to the alpha
code, so I expect it will work fine.  PowerPC and ARM do not yet have
interrupt code in the tree so they shouldn't be broken.  Sparc64 is
broken, but that's been ok'd by jake and tmm who will be fixing the
interrupt code for sparc64 shortly.

Reviewed by:	peter
Tested on:	i386, alpha
2002-01-05 08:47:13 +00:00
John Baldwin
9d234f99f7 Axe a stale comment. Holding sched_lock across both setrunqueue() and
mi_switch() is sufficient.
2002-01-04 10:55:51 +00:00