Commit Graph

267 Commits

Author SHA1 Message Date
jeff
acb93d599c Remove kernel support for M:N threading.
While the KSE project was quite successful in bringing threading to
FreeBSD, the M:N approach taken by the kse library was never developed
to its full potential.  Backwards compatibility will be provided via
libmap.conf for dynamically linked binaries and static binaries will
be broken.
2008-03-12 10:12:01 +00:00
jeff
3b1acbdce2 - Pass the priority argument from *sleep() into sleepq and down into
sched_sleep().  This removes extra thread_lock() acquisition and
   allows the scheduler to decide what to do with the static boost.
 - Change the priority arguments to cv_* to match sleepq/msleep/etc.
   where 0 means no priority change.  Catch -1 in cv_broadcastpri() and
   convert it to 0 for now.
 - Set a flag when sleeping in a way that is compatible with swapping
   since direct priority comparisons are meaningless now.
 - Add a sysctl to ule, kern.sched.static_boost, that defaults to on which
   controls the boost behavior.  Turning it off gives better performance
   in some workloads but needs more investigation.
 - While we're modifying sleepq, change signal and broadcast to both
   return with the lock held as the lock was held on enter.

Reviewed by:	jhb, peter
2008-03-12 06:31:06 +00:00
jeff
e30139dff5 - KSE may free a thread that was never actually forked. This will leave
td_cpuset NULL.  Check for this condition before dereferencing the
   cpuset.

Reported by:	david@catwhisker.org, miwi@freebsd.org
Sponsored by:	Nokia
2008-03-12 05:01:14 +00:00
jeff
694203dedd Add cpuset, an api for thread to cpu binding and cpu resource grouping
and assignment.
 - Add a reference to a struct cpuset in each thread that is inherited from
   the thread that created it.
 - Release the reference when the thread is destroyed.
 - Add prototypes for syscalls and macros for manipulating cpusets in
   sys/cpuset.h
 - Add syscalls to create, get, and set new numbered cpusets:
   cpuset(), cpuset_{get,set}id()
 - Add syscalls for getting and setting affinity masks for cpusets or
   individual threads: cpuid_{get,set}affinity()
 - Add types for the 'level' and 'which' parameters for the cpuset.  This
   will permit expansion of the api to cover cpu masks for other objects
   identifiable with an id_t integer.  For example, IRQs and Jails may be
   coming soon.
 - The root set 0 contains all valid cpus.  All thread initially belong to
   cpuset 1.  This permits migrating all threads off of certain cpus to
   reserve them for special applications.

Sponsored by:	Nokia
Discussed with:	arch, rwatson, brooks, davidxu, deischen
Reviewed by:	antoine
2008-03-02 07:39:22 +00:00
julian
265714a11e give thread0 the tid 100000 and bumpt the others to start at 100001
MFC after:	1 week
2007-12-22 04:56:48 +00:00
jeff
4ec9caf00c Refactor select to reduce contention and hide internal implementation
details from consumers.

 - Track individual selecters on a per-descriptor basis such that there
   are no longer collisions and after sleeping for events only those
   descriptors which triggered events must be rescaned.
 - Protect the selinfo (per descriptor) structure with a mtx pool mutex.
   mtx pool mutexes were chosen to preserve api compatibility with
   existing code which does nothing but bzero() to setup selinfo
   structures.
 - Use a per-thread wait channel rather than a global wait channel.
 - Hide select implementation details in a seltd structure which is
   opaque to the rest of the kernel.
 - Provide a 'selsocket' interface for those kernel consumers who wish to
   select on a socket when they have no fd so they no longer have to
   be aware of select implementation details.

Tested by:	kris
Reviewed on:	arch
2007-12-16 06:21:20 +00:00
jeff
12adc443d6 - Re-implement lock profiling in such a way that it no longer breaks
the ABI when enabled.  There is no longer an embedded lock_profile_object
   in each lock.  Instead a list of lock_profile_objects is kept per-thread
   for each lock it may own.  The cnt_hold statistic is now always 0 to
   facilitate this.
 - Support shared locking by tracking individual lock instances and
   statistics in the per-thread per-instance lock_profile_object.
 - Make the lock profiling hash table a per-cpu singly linked list with a
   per-cpu static lock_prof allocator.  This removes the need for an array
   of spinlocks and reduces cache contention between cores.
 - Use a seperate hash for spinlocks and other locks so that only a
   critical_enter() is required and not a spinlock_enter() to modify the
   per-cpu tables.
 - Count time spent spinning in the lock statistics.
 - Remove the LOCK_PROFILE_SHARED option as it is always supported now.
 - Specifically drop and release the scheduler locks in both schedulers
   since we track owners now.

In collaboration with:	Kip Macy
Sponsored by:	Nokia
2007-12-15 23:13:31 +00:00
rrs
303afeb279 - Adds event handlers for process_ctor,process_dtor, process_init,
process_fini, thread_ctor, thread_dtor, thread_init, thread_fini. This
  will allow us to extend dynamically areas in proc/thread for dtrace ;-)
Reviewed by:    rwatson
2007-11-15 14:20:07 +00:00
julian
303014d009 This time REALLY copy the name from the proc to the thread as a default. 2007-11-15 06:35:26 +00:00
marcel
1e7c4f0a3f o Rename cpu_thread_setup() to cpu_thread_alloc() to better
communicate that it relates to (is called by) thread_alloc()
o  Add cpu_thread_free() which is called from thread_free()
   to counter-act cpu_thread_alloc().

i386:	Have cpu_thread_free() call cpu_thread_clean() to
	preserve behaviour.
ia64:	Have cpu_thread_free() call mtx_destroy() for the
	mutex initialized in cpu_thread_alloc().

PR: ia64/118024
2007-11-14 20:21:54 +00:00
julian
7ee6259be7 A bunch more files that should probably print out a thread name
instead of a process name.
2007-11-14 06:51:33 +00:00
julian
b248158d8d Make sure there is a good default thread name for all threads. 2007-11-14 06:04:57 +00:00
kib
9ae733819b Fix for the panic("vm_thread_new: kstack allocation failed") and
silent NULL pointer dereference in the i386 and sparc64 pmap_pinit()
when the kmem_alloc_nofault() failed to allocate address space. Both
functions now return error instead of panicing or dereferencing NULL.

As consequence, vmspace_exec() and vmspace_unshare() returns the errno
int. struct vmspace arg was added to vm_forkproc() to avoid dealing
with failed allocation when most of the fork1() job is already done.

The kernel stack for the thread is now set up in the thread_alloc(),
that itself may return NULL. Also, allocation of the first process
thread is performed in the fork1() to properly deal with stack
allocation failure. proc_linkup() is separated into proc_linkup()
called from fork1(), and proc_linkup0(), that is used to set up the
kernel process (was known as swapper).

In collaboration with:	Peter Holm
Reviewed by:	jhb
2007-11-05 11:36:16 +00:00
julian
11e1aa0d18 Introduce a way to make pure kernal threads.
kthread_add() takes the same parameters as the old kthread_create()
plus a pointer to a process structure, and adds a kernel thread
to that process.

kproc_kthread_add() takes the parameters for kthread_add,
plus a process name and a pointer to a pointer to a process instead of just
a pointer, and if the proc * is NULL, it creates the process to the
specifications required, before adding the thread to it.

All other old kthread_xxx() calls return, but act on (struct thread *)
instead of (struct proc *). One reason to change the name is so that
any old kernel modules that are lying around and expect kthread_create()
to make a process will not just accidentally link.

fix top to show  kernel threads by their thread name in -SH mode
add a tdnam formatting option to ps to show thread names.

make all idle threads actual kthreads and put them into their own idled process.
make all interrupt threads kthreads and put them in an interd process
(mainly for aesthetic and accounting reasons)
rename proc 0 to be 'kernel' and it's swapper thread is now 'swapper'

man page fixes to follow.
2007-10-26 08:00:41 +00:00
jeff
13d3160ef5 - Call sched_sleep() before we suspend threads. sched_wakeup() is already
called via setrunnable().  This allows time slept while suspended to
   be accounted for swap.

Approved by:	re
2007-09-21 04:04:22 +00:00
attilio
e25b203061 Fix some entries in the locks static table of witness.
In particular:
- smp_tlb_mtx is no longer used, so it is axed.
- smp rendezvous lock isn't really a leaf spin-mutex. Its bad placement in
  the table, however, has been the source of a false positive LOR reporting
  with the dt_lock.  However, smp rendezvous lock would have had sched_lock
  there for older lock, so it wasn't still a leaf lock.
- allpmaps is only used in ia32 architecture, so it is inserted in the
  appropriate stub.

Addictionally:
- kse_zombie_lock is no longer present, so its definition is axed out.
- zombie_lock doesn't need to have an exported symbol, so just let's it be
  declared as static.

Tested by: kris
Approved by: jeff (mentor)
Approved by: re
2007-09-20 20:38:43 +00:00
jeff
3fc0f8b973 - Move all of the PS_ flags into either p_flag or td_flags.
- p_sflag was mostly protected by PROC_LOCK rather than the PROC_SLOCK or
   previously the sched_lock.  These bugs have existed for some time.
 - Allow swapout to try each thread in a process individually and then
   swapin the whole process if any of these fail.  This allows us to move
   most scheduler related swap flags into td_flags.
 - Keep ki_sflag for backwards compat but change all in source tools to
   use the new and more correct location of P_INMEM.

Reported by:	pho
Reviewed by:	attilio, kib
Approved by:	re (kensmith)
2007-09-17 05:31:39 +00:00
attilio
c2dedaa0a9 Actually, upcalls cannot be freed while destroying the thread because we
should call uma_zfree() with various spinlock helds.  Rearranging the
code would not help here because we cannot break atomicity respect
prcess spinlock, so the only one choice we have is to defer the operation.
In order to do this use a global queue synchronized through the kse_lock
spinlock which is freed at any thread_alloc() / thread_wait() through a
call to thread_reap().
Note that this approach is not ideal as we should want a per-process
list of zombie upcalls, but it follows initial guidelines of KSE authors.

Tested by: jkim, pav
Approved by: jeff, julian
Approved by: re
2007-07-27 09:21:18 +00:00
attilio
ad75d346f7 Actually, KSE kernel bits locking is broken and can lead likely to
dangerous races.
Fix this problems adding correct locking for the members of 'struct
kse_upcall' and other struct proc/struct thread related members.
For the moment, just leave ku_mflag and ku_flags "lazy" locked.
While here, cleanup the code removing the function kse_GC() (unused),
and merging upcall_link(), upcall_unlink(), upcall_stash() in their
respective callers (static functions, very short and only called in one
place).

Reported by: pav
Tested by: pav (on some pointyhat cluster nodes)
Approved by: jeff
Approved by: re
Sponsorized by: NGX Italy (http://www.ngx.it)
2007-07-23 14:52:22 +00:00
jeff
26422aea29 - Garbage collect unused concurrency functions.
- Remove unused kse fields from struct proc.
 - Group remaining fields and #ifdef KSE them.
 - Move some kern_kse.c only prototypes out of proc and into kern_kse.

Discussed with:	Julian
2007-06-12 19:49:39 +00:00
jeff
49712c9a60 Solve a complex exit race introduced with thread_lock:
- Add a count of exiting threads, p_exitthreads, to struct proc.
 - Increment p_exithreads when we set the deadthread in thread_exit().
 - When we thread_stash() a deadthread use an atomic to drop the count.
 - Spin until the p_exithreads count reaches 0 in thread_wait().
 - Lock the last exiting thread momentarily to be certain that it has
   exited cpu_throw().
 - Restructure thread_wait().  It does not need a loop as there will only
   ever be one thread.

Tested by:	moose@opera.com
Reported by:	kris, moose@opera.com
2007-06-12 07:24:46 +00:00
attilio
c105658c88 The current rusage code show peculiar problems:
- Unsafeness on ruadd() in thread_exit()
- Unatomicity of thread_exiit() in the exit1() operations

This patch addresses these problems allocating p_fd as part of the
process and modifying the way it is accessed.

A small chunk of this patch, resolves a race about p_state in kern_wait(),
since we have to be sure about the zombif-ing process.

Submitted by: jeff
Approved by: jeff (mentor)
2007-06-09 18:56:11 +00:00
jeff
8931208c40 Commit 4/14 of sched_lock decomposition.
- Use thread_lock() rather than sched_lock for per-thread scheduling
   sychronization.
 - Use the per-process spinlock rather than the sched_lock for per-process
   scheduling synchronization.
 - Move some common code into thread_suspend_switch() to handle the
   mechanics of suspending a thread.  The locking here is incredibly
   convoluted and should be simplified.

Tested by:      kris, current@
Tested on:      i386, amd64, ULE, 4BSD, libthr, libkse, PREEMPTION, etc.
Discussed with: kris, attilio, kmacy, jhb, julian, bde (small parts each)
2007-06-04 23:52:24 +00:00
attilio
9bd4fdf7ce Do proper "locking" for missing vmmeters part.
Now, we assume no more sched_lock protection for some of them and use the
distribuited loads method for vmmeter (distribuited through CPUs).

Reviewed by: alc, bde
Approved by: jeff (mentor)
2007-06-04 21:45:18 +00:00
jeff
a7a8bac81f - Move rusage from being per-process in struct pstats to per-thread in
td_ru.  This removes the requirement for per-process synchronization in
   statclock() and mi_switch().  This was previously supported by
   sched_lock which is going away.  All modifications to rusage are now
   done in the context of the owning thread.  reads proceed without locks.
 - Aggregate exiting threads rusage in thread_exit() such that the exiting
   thread's rusage is not lost.
 - Provide a new routine, rufetch() to fetch an aggregate of all rusage
   structures from all threads in a process.  This routine must be used
   in any place requiring a rusage from a process prior to it's exit.  The
   exited process's rusage is still available via p_ru.
 - Aggregate tick statistics only on demand via rufetch() or when a thread
   exits.  Tick statistics are kept in the thread and protected by sched_lock
   until it exits.

Initial patch by:	attilio
Reviewed by:		attilio, bde (some objections), arch (mostly silent)
2007-06-01 01:12:45 +00:00
attilio
7dd8ed88a9 Revert VMCNT_* operations introduction.
Probabilly, a general approach is not the better solution here, so we should
solve the sched_lock protection problems separately.

Requested by: alc
Approved by: jeff (mentor)
2007-05-31 22:52:15 +00:00
jeff
e1996cb960 - define and use VMCNT_{GET,SET,ADD,SUB,PTR} macros for manipulating
vmcnts.  This can be used to abstract away pcpu details but also changes
   to use atomics for all counters now.  This means sched lock is no longer
   responsible for protecting counts in the switch routines.

Contributed by:		Attilio Rao <attilio@FreeBSD.org>
2007-05-18 07:10:50 +00:00
jhb
38635fcc55 Align 'struct thread' on 16 byte boundaries so that the lower 4 bits are
always 0.  Previously we aligned threads on a minimum of 8-byte boundaries.

Note: This changes the uma zone to no longer cache align threads.  We
really want the uma zone to do align threads to MAX(16, cache line size)
but there currently isn't a good way to express that to uma.

Submitted by:	attilio
2007-03-27 16:51:34 +00:00
mohans
a332cb00d5 Over NFS, an open() call could result in multiple over-the-wire
GETATTRs being generated - one from lookup()/namei() and the other
from nfs_open() (for cto consistency). This change eliminates the
GETATTR in nfs_open() if an otw GETATTR was done from the namei()
path. Instead of extending the vop interface, we timestamp each attr
load, and use this to detect whether a GETATTR was done from namei()
for this syscall. Introduces a thread-local variable that counts the
syscalls made by the thread and uses <pid, tid, thread syscalls> as
the attrload timestamp. Thanks to jhb@ and peter@ for a discussion on
thread state that could be used as the timestamp with minimal overhead.
2007-03-09 04:02:38 +00:00
rwatson
2a3330b81a Prefer a more traditional spelling of inhibited in comments and panic
messages.
2006-12-31 15:56:04 +00:00
davidxu
b0b74f9bd3 Remove unused sysctls. 2006-12-19 13:06:01 +00:00
julian
396ed947f6 Threading cleanup.. part 2 of several.
Make part of John Birrell's KSE patch permanent..
Specifically, remove:
Any reference of the ksegrp structure. This feature was
never fully utilised and made things overly complicated.
All code in the scheduler that tried to make threaded programs
fair to unthreaded programs.  Libpthread processes will already
do this to some extent and libthr processes already disable it.

Also:
Since this makes such a big change to the scheduler(s), take the opportunity
to rename some structures and elements that had to be moved anyhow.
This makes the code a lot more readable.

The ULE scheduler compiles again but I have no idea if it works.

The 4bsd scheduler still reqires a little cleaning and some functions that now do
ALMOST nothing will go away, but I thought I'd do that as a separate commit.

Tested by David Xu, and Dan Eischen using libthr and libpthread.
2006-12-06 06:34:57 +00:00
davidxu
ce90069697 Remove member p_procscopegrp which is no longer used by libthr. 2006-10-27 05:45:44 +00:00
jb
f82c799735 Make KSE a kernel option, turned on by default in all GENERIC
kernel configs except sun4v (which doesn't process signals properly
with KSE).

Reviewed by:	davidxu@
2006-10-26 21:42:22 +00:00
davidxu
5a12667fcf This is initial version of POSIX priority mutex support, a new userland
mutex structure is added as following:
struct umutex {
        __lwpid_t       m_owner;
        uint32_t        m_flags;
        uint32_t        m_ceilings[2];
        uint32_t        m_spare[4];
};
The m_owner represents owner thread, it is a thread id, in non-contested
case, userland can simply use atomic_cmpset_int to lock the mutex, if the
mutex is contested, high order bit will be set, and userland should do locking
and unlocking via kernel syscall. Flag UMUTEX_PRIO_INHERIT represents
pthread's PTHREAD_PRIO_INHERIT mutex, which when contention happens, kernel
should do priority propagating. Flag UMUTEX_PRIO_PROTECT indicates it is
pthread's PTHREAD_PRIO_PROTECT mutex, userland should initialize m_owner
to contested state UMUTEX_CONTESTED, then atomic_cmpset_int will be failure
and kernel syscall should be invoked to do locking, this becauses
for such a mutex, kernel should always boost the thread's priority before
it can lock the mutex, m_ceilings is used by PTHREAD_PRIO_PROTECT mutex,
the first element is used to boost thread's priority when it locked the mutex,
second element is used when the mutex is unlocked, the PTHREAD_PRIO_PROTECT
mutex's link list is kept in userland, the m_ceiling[1] is managed by thread
library so kernel needn't allocate memory to keep the link list, when such
a mutex is unlocked, kernel reset m_owner to UMUTEX_CONTESTED.
Flag USYNC_PROCESS_SHARED indicate if the synchronization object is process
shared, if the flag is not set, it saves a vm_map_lookup() call.

The umtx chain is still used as a sleep queue, when a thread is blocked on
PTHREAD_PRIO_INHERIT mutex, a umtx_pi is allocated to support priority
propagating, it is dynamically allocated and reference count is used,
it is not optimized but works well in my tests, while the umtx chain has
its own locking protocol, the priority propagating protocol are all protected
by sched_lock because priority propagating function is called with sched_lock
held from scheduler.

No visible performance degradation is found which these changes. Some parameter
names in _umtx_op syscall are renamed.
2006-08-28 04:24:51 +00:00
maxim
d96c84ab9e o Fix typo in the comment.
PR:		kern/99632
Submitted by:	clsung
2006-06-30 08:10:55 +00:00
davidxu
d7a4692118 Rethink it a bit, if there is a STOP flag, don't bother to resume other
threads.
2006-03-21 10:05:15 +00:00
davidxu
8aed544b7c Because JOB control has higher priority than single threading in
thread_suspend_check(), call thread_stopped() to report SIGCHLD
if there is JOB control in progress.
2006-03-21 08:41:15 +00:00
davidxu
baf4d3f4f1 1. Count last time slice, this intends to fix
"calcru: runtime went backwards" bug for threaded process.
2. Add comment about possible logical problem with scheduler.

MFC after: 3 days
2006-03-14 04:00:21 +00:00
davidxu
df0e90beed Remove unused code. 2006-03-13 10:37:25 +00:00
davidxu
f1ce5c8660 Fix a long standing race between sleep queue and thread
suspension code. When a thread A is going to sleep, it calls
sleepq_catch_signals() to detect any pending signals or thread
suspension request, if nothing happens, it returns without
holding process lock or scheduler lock, this opens a race
window which allows thread B to come in and do process
suspension work, however since A is still at running state,
thread B can do nothing to A, thread A continues, and puts
itself into actually sleeping state, but B has never seen it,
and it sits there forever until B is woken up by other threads
sometimes later(this can be very long delay or never
happen). Fix this bug by forcing sleepq_catch_signals to
return with scheduler lock held.
Fix sleepq_abort() by passing it an interrupted code, previously,
it worked as wakeup_one(), and the interruption can not be
identified correctly by sleep queue code when the sleeping
thread is resumed.
Let thread_suspend_check() returns EINTR or ERESTART, so sleep
queue no longer has to use SIGSTOP as a hack to build a return
value.

Reviewed by:	jhb
MFC after:	1 week
2006-02-15 23:52:01 +00:00
davidxu
6afcb6595b In order to speed up process suspension on MP machine, send IPI to
remote CPU. While here, abstract thread suspension code into a function
called sig_suspend_threads, the function is called when a process received
a STOP signal.
2006-02-13 03:16:55 +00:00
rwatson
e145abd47f When exiting a thread, submit any pending record. Today, we don't
audit thread exit, but should that happen, this will prevent
unhappiness, as the thread exit system call will never return, and
hence not commit the record.

Pointed out by/with:	cognet
Obtained from:		TrustedBSD Project
2006-02-06 01:51:08 +00:00
rwatson
8b356bb2d7 When GC'ing a thread, assert that it has no active audit record.
This should not happen, but with this assert, brueffer and I would
not have spent 45 minutes trying to figure out why he wasn't
seeing audit records with the audit version in CVS.

Obtained from:	TrustedBSD Project
2006-02-05 21:06:09 +00:00
rwatson
36f0dbe4c4 Add new fields to process-related data structures:
- td_ar to struct thread, which holds the in-progress audit record during
  a system call.

- p_au to struct proc, which holds per-process audit state, such as the
  audit identifier, audit terminal, and process audit masks.

In the earlier implementation, td_ar was added to the zero'd section of
struct thread.  In order to facilitate merging to RELENG_6, it has been
moved to the end of the data structure, requiring explicit
initalization in the thread constructor.

Much help from:	wsalamon
Obtained from:	TrustedBSD Project
2006-02-02 00:37:05 +00:00
davidxu
c0c32b144f Now SIGCHLD is always queued. 2005-12-09 02:27:55 +00:00
davidxu
5d50adf57d Last step to make mq_notify conform to POSIX standard, If the process
has successfully attached a notification request to the message queue
via a queue descriptor, file closing should remove the attachment.
2005-11-30 05:12:03 +00:00
davidxu
37bb483679 Add support for queueing SIGCHLD same as other UNIX systems did.
For each child process whose status has been changed, a SIGCHLD instance
is queued, if the signal is stilling pending, and process changed status
several times, signal information is updated to reflect latest process
status. If wait() returns because the status of a child process is
available, pending SIGCHLD signal associated with the child process is
discarded. Any other pending SIGCHLD signals remain pending.

The signal information is allocated at the same time when proc structure
is allocated, if process signal queue is fully filled or there is a memory
shortage, it can still send the signal to process.

There is a booting time tunable kern.sigqueue.queue_sigchild which
can control the behavior, setting it to zero disables the SIGCHLD queueing
feature, the tunable will be removed if the function is proved that it is
stable enough.

Tested on: i386 (SMP and UP)
2005-11-08 09:09:26 +00:00
davidxu
ef1e34d5ce Add thread_find() function to search a thread by lwpid. 2005-11-03 01:34:08 +00:00
davidxu
7086514f41 Make p_itimers as a pointer, so file sys/proc.h does not need to include
sys/timers.h.
2005-10-23 12:19:08 +00:00