diff --git a/lib/libc/sys/kse.2 b/lib/libc/sys/kse.2 deleted file mode 100644 index 95e725624040..000000000000 --- a/lib/libc/sys/kse.2 +++ /dev/null @@ -1,679 +0,0 @@ -.\" Copyright (c) 2002 Packet Design, LLC. -.\" All rights reserved. -.\" -.\" Subject to the following obligations and disclaimer of warranty, -.\" use and redistribution of this software, in source or object code -.\" forms, with or without modifications are expressly permitted by -.\" Packet Design; provided, however, that: -.\" -.\" (i) Any and all reproductions of the source or object code -.\" must include the copyright notice above and the following -.\" disclaimer of warranties; and -.\" (ii) No rights are granted, in any manner or form, to use -.\" Packet Design trademarks, including the mark "PACKET DESIGN" -.\" on advertising, endorsements, or otherwise except as such -.\" appears in the above copyright notice or in the software. -.\" -.\" THIS SOFTWARE IS BEING PROVIDED BY PACKET DESIGN "AS IS", AND -.\" TO THE MAXIMUM EXTENT PERMITTED BY LAW, PACKET DESIGN MAKES NO -.\" REPRESENTATIONS OR WARRANTIES, EXPRESS OR IMPLIED, REGARDING -.\" THIS SOFTWARE, INCLUDING WITHOUT LIMITATION, ANY AND ALL IMPLIED -.\" WARRANTIES OF MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE, -.\" OR NON-INFRINGEMENT. PACKET DESIGN DOES NOT WARRANT, GUARANTEE, -.\" OR MAKE ANY REPRESENTATIONS REGARDING THE USE OF, OR THE RESULTS -.\" OF THE USE OF THIS SOFTWARE IN TERMS OF ITS CORRECTNESS, ACCURACY, -.\" RELIABILITY OR OTHERWISE. IN NO EVENT SHALL PACKET DESIGN BE -.\" LIABLE FOR ANY DAMAGES RESULTING FROM OR ARISING OUT OF ANY USE -.\" OF THIS SOFTWARE, INCLUDING WITHOUT LIMITATION, ANY DIRECT, -.\" INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, PUNITIVE, OR CONSEQUENTIAL -.\" DAMAGES, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES, LOSS OF -.\" USE, DATA OR PROFITS, HOWEVER CAUSED AND UNDER ANY THEORY OF -.\" LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT -.\" (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF -.\" THE USE OF THIS SOFTWARE, EVEN IF PACKET DESIGN IS ADVISED OF -.\" THE POSSIBILITY OF SUCH DAMAGE. -.\" -.\" $FreeBSD$ -.\" -.Dd February 13, 2007 -.Dt KSE 2 -.Os -.Sh NAME -.Nm kse -.Nd "kernel support for user threads" -.Sh LIBRARY -.Lb libc -.Sh SYNOPSIS -.In sys/types.h -.In sys/kse.h -.Ft int -.Fn kse_create "struct kse_mailbox *mbx" "int sys-scope" -.Ft int -.Fn kse_exit void -.Ft int -.Fn kse_release "struct timespec *timeout" -.Ft int -.Fn kse_switchin "struct kse_thr_mailbox *tmbx" "int flags" -.Ft int -.Fn kse_thr_interrupt "struct kse_thr_mailbox *tmbx" "int cmd" "long data" -.Ft int -.Fn kse_wakeup "struct kse_mailbox *mbx" -.Sh DESCRIPTION -These system calls implement kernel support for multi-threaded processes. -.\" -.Ss Overview -.\" -Traditionally, user threading has been implemented in one of two ways: -either all threads are managed in user space and the kernel is unaware -of any threading (also known as -.Dq "N to 1" ) , -or else separate processes sharing -a common memory space are created for each thread (also known as -.Dq "N to N" ) . -These approaches have advantages and disadvantages: -.Bl -column "- Cannot utilize multiple CPUs" "+ Can utilize multiple CPUs" -.It Sy "User threading Kernel threading" -.It "+ Lightweight - Heavyweight" -.It "+ User controls scheduling - Kernel controls scheduling" -.It "- Syscalls must be wrapped + No syscall wrapping required" -.It "- Cannot utilize multiple CPUs + Can utilize multiple CPUs" -.El -.Pp -The KSE system is a -hybrid approach that achieves the advantages of both the user and kernel -threading approaches. -The underlying philosophy of the KSE system is to give kernel support -for user threading without taking away any of the user threading library's -ability to make scheduling decisions. -A kernel-to-user upcall mechanism is used to pass control to the user -threading library whenever a scheduling decision needs to be made. -An arbitrarily number of user threads are multiplexed onto a fixed number of -virtual CPUs supplied by the kernel. -This can be thought of as an -.Dq "N to M" -threading scheme. -.Pp -Some general implications of this approach include: -.Bl -bullet -.It -The user process can run multiple threads simultaneously on multi-processor -machines. -The kernel grants the process virtual CPUs to schedule as it -wishes; these may run concurrently on real CPUs. -.It -All operations that block in the kernel become asynchronous, allowing -the user process to schedule another thread when any thread blocks. -.El -.\" -.Ss Definitions -.\" -KSE allows a user process to have multiple -.Sy threads -of execution in existence at the same time, some of which may be blocked -in the kernel while others may be executing or blocked in user space. -A -.Sy "kernel scheduling entity" -(KSE) is a -.Dq "virtual CPU" -granted to the process for the purpose of executing threads. -A thread that is currently executing is always associated with -exactly one KSE, whether executing in user space or in the kernel. -The KSE is said to be -.Sy assigned -to the thread. -KSEs (a user abstraction) are implemented on top -of kernel threads using an 'upcall' entity. -.Pp -The KSE becomes -.Sy unassigned , -and the associated thread is suspended, when the KSE has an associated -.Sy mailbox , -(see below) the thread has an associated -.Sy thread mailbox , -(also see below) and any of the following occurs: -.Bl -bullet -.It -The thread invokes a system call that blocks. -.It -The thread makes any other demand of the kernel that cannot be immediately -satisfied, e.g., touches a page of memory that needs to be fetched from disk, -causing a page fault. -.It -Another thread that was previously blocked in the kernel completes its -work in the kernel (or is -.Sy interrupted ) -and becomes ready to return to user space, and the current thread is returning -to user space. -.It -A signal is delivered to the process, and this KSE is chosen to deliver it. -.El -.Pp -In other words, as soon as there is a scheduling decision to be made, -the KSE becomes unassigned, because the kernel does not presume to know -how the process' other runnable threads should be scheduled. -Unassigned KSEs always return to user space as soon as possible via -the -.Sy upcall -mechanism (described below), allowing the user process to decide how -that KSE should be utilized next. -KSEs always complete as much work as possible in the kernel before -becoming unassigned. -.Pp -Individual KSEs within a process are effectively indistinguishable, -and any KSE in a process may be assigned by the kernel to any runnable -(in the kernel) thread associated with that process. -In practice, the kernel attempts to preserve the affinity between threads -and actual CPUs to optimize cache behavior, but this is invisible to the -user process. -(Affinity is not yet fully implemented.) -.Pp -Each KSE has a unique -.Sy "KSE mailbox" -supplied by the user process. -A mailbox consists of a control structure containing a pointer to an -.Sy "upcall function" -and a user stack. -The KSE invokes this function whenever it becomes unassigned. -The kernel updates this structure with information about threads that have -become runnable and signals that have been delivered before each upcall. -Upcalls may be temporarily blocked by the user thread scheduling code -during critical sections. -.Pp -Each user thread has a unique -.Sy "thread mailbox" -as well. -Threads are referred to using pointers to these mailboxes when communicating -between the kernel and the user thread scheduler. -Each KSE's mailbox contains a pointer to the mailbox of the user thread -that the KSE is currently executing. -This pointer is saved when the thread blocks in the kernel. -.Pp -Whenever a thread blocked in the kernel is ready to return to user space, -it is added to the process's list of -.Sy completed -threads. -This list is presented to the user code at the next upcall as a linked list -of thread mailboxes. -.Pp -There is a kernel-imposed limit on the number of threads in a process -that may be simultaneously blocked in the kernel (this number is not -currently visible to the user). -When this limit is reached, upcalls are blocked and no work is performed -for the process until one of the threads completes (or a signal is -received). -.\" -.Ss Managing KSEs -.\" -To become multi-threaded, a process must first invoke -.Fn kse_create . -The -.Fn kse_create -system call -creates a new KSE (except for the very first invocation; see below). -The KSE will be associated with the mailbox pointed to by -.Fa mbx . -If -.Fa sys_scope -is non-zero, then the new thread will be counted as a system scope -thread. Other things must be done as well to make a system scope thread -so this is not sufficient (yet). -System scope variables are not covered -in detail in this manual page yet, but briefly, they never perform -upcalls and do not return to the user thread scheduler. -Once launched they run autonomously. -The pthreads library knows how to make system -scope threads and users are encouraged to use the library interface. -.Pp -Each process initially has a single KSE executing a single user thread. -Since the KSE does not have an associated mailbox, it must remain assigned -to the thread and does not perform any upcalls. -(It is by definition a system scope thread). -The result is the traditional, unthreaded mode of operation. -Therefore, as a special case, the first call to -.Fn kse_create -by this initial thread with -.Fa sys_scope -equal to zero does not create a new KSE; instead, it simply associates the -current KSE with the supplied KSE mailbox, and no immediate upcall results. -However, an upcall will be triggered the next time the thread blocks and -the required conditions are met. -.Pp -The kernel does not allow more KSEs to exist in a process than the -number of physical CPUs in the system (this number is available as the -.Xr sysctl 3 -variable -.Va hw.ncpu ) . -Having more KSEs than CPUs would not add any value to the user process, -as the additional KSEs would just compete with each other for access to -the real CPUs. -Since the extra KSEs would always be side-lined, the result -to the application would be exactly the same as having fewer KSEs. -There may however be arbitrarily many user threads, and it is up to the -user thread scheduler to handle mapping the application's user threads -onto the available KSEs. -.Pp -The -.Fn kse_exit -system call -causes the KSE assigned to the currently running thread to be destroyed. -If this KSE is the last one in the process, there must be no remaining -threads associated with that process blocked in the kernel. -This system call does not return unless there is an error. -Calling -.Fn kse_exit -from the last thread is the same as calling -.Fn exit . -.Pp -The -.Fn kse_release -system call -is used to -.Dq park -the KSE assigned to the currently running thread when it is not needed, -e.g., when there are more available KSEs than runnable user threads. -The thread converts to an upcall but does not get scheduled until -there is a new reason to do so, e.g., a previously -blocked thread becomes runnable, or the timeout expires. -If successful, -.Fn kse_release -does not return to the caller. -.Pp -The -.Fn kse_switchin -system call can be used by the UTS, when it has selected a new thread, -to switch to the context of that thread. -The use of -.Fn kse_switchin -is machine dependent. -Some platforms do not need a system call to switch to a new context, -while others require its use in particular cases. -.Pp -The -.Fn kse_wakeup -system call -is the opposite of -.Fn kse_release . -It causes the (parked) KSE associated with the mailbox pointed to by -.Fa mbx -to be woken up, causing it to upcall. -If the KSE has already woken up for another reason, this system call has no -effect. -The -.Fa mbx -argument -may be -.Dv NULL -to specify -.Dq "any KSE in the current process" . -.Pp -The -.Fn kse_thr_interrupt -system call -is used to interrupt a currently blocked thread. -The thread must either be blocked in the kernel or assigned to a KSE -(i.e., executing). -The thread is then marked as interrupted. -As soon as the thread invokes an interruptible system call (or immediately -for threads already blocked in one), the thread will be made runnable again, -even though the kernel operation may not have completed. -The effect on the interrupted system call is the same as if it had been -interrupted by a signal; typically this means an error is returned with -.Va errno -set to -.Er EINTR . -.\" -.Ss Signals -.\" -The current implementation creates a special signal thread. -Kernel threads (KSEs) in a process mask all signals, and only the signal -thread waits for signals to be delivered to the process, the signal thread -is responsible -for dispatching signals to user threads. -.Pp -A downside of this is that if a multiplexed thread -calls the -.Fn execve -syscall, its signal mask and pending signals may not be -available in the kernel. -They are stored -in userland and the kernel does not know where to get them, however -.Tn POSIX -requires them to be restored and passed them to new process. -Just setting the mask for the thread before calling -.Fn execve -is only a -close approximation to the problem as it does not re-deliver back to the kernel -any pending signals that the old process may have blocked, and it allows a -window in which new signals may be delivered to the process between the setting -of the mask and the -.Fn execve . -.Pp -For now this problem has been solved by adding a special combined -.Fn kse_thr_interrupt Ns / Ns Fn execve -mode to the -.Fn kse_thr_interrupt -syscall. -The -.Fn kse_thr_interrupt -syscall has a sub command -.Dv KSE_INTR_EXECVE , -that allows it to accept a -.Vt kse_execv_args -structure, and allowing it to adjust the signals and then atomically -convert into an -.Fn execve -call. -Additional pending signals and the correct signal mask can be passed -to the kernel in this way. -The thread library overrides the -.Fn execve -syscall -and translates it into -.Fn kse_intr_interrupt -call, allowing a multiplexed thread -to restore pending signals and the correct signal mask before doing the -.Fn exec . -This solution to the problem may change. -.\" -.Ss KSE Mailboxes -.\" -Each KSE has a unique mailbox for user-kernel communication defined in -.In sys/kse.h . -Some of the fields there are: -.Pp -.Va km_version -describes the version of this structure and must be equal to -.Dv KSE_VER_0 . -.Va km_udata -is an opaque pointer ignored by the kernel. -.Pp -.Va km_func -points to the KSE's upcall function; -it will be invoked using -.Va km_stack , -which must remain valid for the lifetime of the KSE. -.Pp -.Va km_curthread -always points to the thread that is currently assigned to this KSE if any, -or -.Dv NULL -otherwise. -This field is modified by both the kernel and the user process as follows. -.Pp -When -.Va km_curthread -is not -.Dv NULL , -it is assumed to be pointing at the mailbox for the currently executing -thread, and the KSE may be unassigned, e.g., if the thread blocks in the -kernel. -The kernel will then save the contents of -.Va km_curthread -with the blocked thread, set -.Va km_curthread -to -.Dv NULL , -and upcall to invoke -.Fn km_func . -.Pp -When -.Va km_curthread -is -.Dv NULL , -the kernel will never perform any upcalls with this KSE; in other words, -the KSE remains assigned to the thread even if it blocks. -.Va km_curthread -must be -.Dv NULL -while the KSE is executing critical user thread scheduler -code that would be disrupted by an intervening upcall; -in particular, while -.Fn km_func -itself is executing. -.Pp -Before invoking -.Fn km_func -in any upcall, the kernel always sets -.Va km_curthread -to -.Dv NULL . -Once the user thread scheduler has chosen a new thread to run, -it should point -.Va km_curthread -at the thread's mailbox, re-enabling upcalls, and then resume the thread. -.Em Note : -modification of -.Va km_curthread -by the user thread scheduler must be atomic -with the loading of the context of the new thread, to avoid -the situation where the thread context area -may be modified by a blocking async operation, while there -is still valid information to be read out of it. -.Pp -.Va km_completed -points to a linked list of user threads that have completed their work -in the kernel since the last upcall. -The user thread scheduler should put these threads back into its -own runnable queue. -Each thread in a process that completes a kernel operation -(synchronous or asynchronous) that results in an upcall is guaranteed to be -linked into exactly one KSE's -.Va km_completed -list; which KSE in the group, however, is indeterminate. -Furthermore, the completion will be reported in only one upcall. -.Pp -.Va km_sigscaught -contains the list of signals caught by this process since the previous -upcall to any KSE in the process. -As long as there exists one or more KSEs with an associated mailbox in -the user process, signals are delivered this way rather than the -traditional way. -(This has not been implemented and may change.) -.Pp -.Va km_timeofday -is set by the kernel to the current system time before performing -each upcall. -.Pp -.Va km_flags -may contain any of the following bits OR'ed together: -.Bl -tag -width indent -.It Dv KMF_NOUPCALL -Block upcalls from happening. -The thread is in some critical section. -.It Dv KMF_NOCOMPLETED , KMF_DONE , KMF_BOUND -This thread should be considered to be permanently bound to -its KSE, and treated much like a non-threaded process would be. -It is a -.Dq "long term" -version of -.Dv KMF_NOUPCALL -in some ways. -.It Dv KMF_WAITSIGEVENT -Implement characteristics needed for the signal delivery thread. -.El -.\" -.Ss Thread Mailboxes -.\" -Each user thread must have associated with it a unique -.Vt "struct kse_thr_mailbox" -as defined in -.In sys/kse.h . -It includes the following fields. -.Pp -.Va tm_udata -is an opaque pointer ignored by the kernel. -.Pp -.Va tm_context -stores the context for the thread when the thread is blocked in user space. -This field is also updated by the kernel before a completed thread is returned -to the user thread scheduler via -.Va km_completed . -.Pp -.Va tm_next -links the -.Va km_completed -threads together when returned by the kernel with an upcall. -The end of the list is marked with a -.Dv NULL -pointer. -.Pp -.Va tm_uticks -and -.Va tm_sticks -are time counters for user mode and kernel mode execution, respectively. -These counters count ticks of the statistics clock (see -.Xr clocks 7 ) . -While any thread is actively executing in the kernel, the corresponding -.Va tm_sticks -counter is incremented. -While any KSE is executing in user space and that KSE's -.Va km_curthread -pointer is not equal to -.Dv NULL , -the corresponding -.Va tm_uticks -counter is incremented. -.Pp -.Va tm_flags -may contain any of the following bits OR'ed together: -.Bl -tag -width indent -.It Dv TMF_NOUPCALL -Similar to -.Dv KMF_NOUPCALL . -This flag inhibits upcalling for critical sections. -Some architectures require this to be in one place and some in the other. -.El -.Sh RETURN VALUES -The -.Fn kse_create , -.Fn kse_wakeup , -and -.Fn kse_thr_interrupt -system calls -return zero if successful. -The -.Fn kse_exit -and -.Fn kse_release -system calls -do not return if successful. -.Pp -All of these system calls return a non-zero error code in case of an error. -.Sh ERRORS -The -.Fn kse_create -system call -will fail if: -.Bl -tag -width Er -.It Bq Er ENXIO -There are already as many KSEs in the process as hardware processors. -.It Bq Er EAGAIN -The user is not the super user, and the soft resource limit corresponding -to the -.Fa resource -argument -.Dv RLIMIT_NPROC -would be exceeded (see -.Xr getrlimit 2 ) . -.It Bq Er EFAULT -The -.Fa mbx -argument -points to an address which is not a valid part of the process address space. -.El -.Pp -The -.Fn kse_exit -system call -will fail if: -.Bl -tag -width Er -.It Bq Er EDEADLK -The current KSE is the last in its process and there are still one or more -threads associated with the process blocked in the kernel. -.It Bq Er ESRCH -The current KSE has no associated mailbox, i.e., the process is operating -in traditional, unthreaded mode (in this case use -.Xr _exit 2 -to exit the process). -.El -.Pp -The -.Fn kse_release -system call -will fail if: -.Bl -tag -width Er -.It Bq Er ESRCH -The current KSE has no associated mailbox, i.e., the process is operating is -traditional, unthreaded mode. -.El -.Pp -The -.Fn kse_wakeup -system call -will fail if: -.Bl -tag -width Er -.It Bq Er ESRCH -The -.Fa mbx -argument -is not -.Dv NULL -and the mailbox pointed to by -.Fa mbx -is not associated with any KSE in the process. -.It Bq Er ESRCH -The -.Fa mbx -argument -is -.Dv NULL -and the current KSE has no associated mailbox, i.e., the process is operating -in traditional, unthreaded mode. -.El -.Pp -The -.Fn kse_thr_interrupt -system call -will fail if: -.Bl -tag -width Er -.It Bq Er ESRCH -The thread corresponding to -.Fa tmbx -is neither currently assigned to any KSE in the process nor blocked in the -kernel. -.El -.Sh SEE ALSO -.Xr rfork 2 , -.Xr pthread 3 , -.Xr ucontext 3 -.Rs -.%A "Thomas E. Anderson" -.%A "Brian N. Bershad" -.%A "Edward D. Lazowska" -.%A "Henry M. Levy" -.%J "ACM Transactions on Computer Systems" -.%N Issue 1 -.%V Volume 10 -.%D February 1992 -.%I ACM Press -.%P pp. 53-79 -.%T "Scheduler activations: effective kernel support for the user-level management of parallelism" -.Re -.Sh HISTORY -The KSE system calls first appeared in -.Fx 5.0 . -.Sh AUTHORS -KSE was originally implemented by -.An -nosplit -.An Julian Elischer Aq Mt julian@FreeBSD.org , -with additional contributions by -.An Jonathan Mini Aq Mt mini@FreeBSD.org , -.An Daniel Eischen Aq Mt deischen@FreeBSD.org , -and -.An David Xu Aq Mt davidxu@FreeBSD.org . -.Pp -This manual page was written by -.An Archie Cobbs Aq Mt archie@FreeBSD.org . -.Sh BUGS -The KSE code is -.Ud