mi_switch(9): update to current day

The function itself and much of the information in this page remain
relevant, but many details need to be fixed.
 - Update function signatures
 - Update the list of major uses of mi_switch() (it is not exhaustive)
 - Document 'flags' argument and its possible values
 - Document thread lock requirement for callers
 - Thread runtime limits are out of scope now, no need to describe them
 - Remove outdated information w.r.t. KSE, runqueue, non-preemptible
   kernel, etc
 - Update the description of cpu_switch() and its responsibilities

PR:		149574
Reviewed by:	kib
Discussed with:	markj
Sponsored by:	The FreeBSD Foundation
Differential Revision:	https://reviews.freebsd.org/D38185
Mitchell Horne 2023-02-09 11:41:14 -04:00
parent 1029dab634
commit 175db7b582

@ -2,10 +2,14 @@
.\"
.\" Copyright (c) 1996 The NetBSD Foundation, Inc.
.\" All rights reserved.
.\" Copyright (c) 2023 The FreeBSD Foundation
.\"
.\" This code is derived from software contributed to The NetBSD Foundation
.\" by Paul Kranenburg.
.\"
.\" Portions of this documentation were written by Mitchell Horne
.\" under sponsorship from the FreeBSD Foundation.
.\"
.\" Redistribution and use in source and binary forms, with or without
.\" modification, are permitted provided that the following conditions
.\" are met:
@ -29,7 +33,7 @@
.\"
.\" $FreeBSD$
.\"
.Dd November 24, 1996
.Dd January 9, 2023
.Dt MI_SWITCH 9
.Os
.Sh NAME
@ -41,96 +45,171 @@
.In sys/param.h
.In sys/proc.h
.Ft void
.Fn mi_switch "void"
.Fn mi_switch "int flags"
.Ft void
.Fn cpu_switch "void"
.Fn cpu_switch "struct thread *oldtd" "struct thread *newtd" "struct mtx *lock"
.Ft void
.Fn cpu_throw "void"
.Fn cpu_throw "struct thread *oldtd" "struct thread *newtd"
.Sh DESCRIPTION
The
.Fn mi_switch
function implements the machine independent prelude to a thread context
function implements the machine-independent prelude to a thread context
switch.
It is called from only a few distinguished places in the kernel
code as a result of the principle of non-preemptable kernel mode execution.
It is the single entry point for every context switch and is called from only
a few distinguished places in the kernel.
The context switch is, by necessity, always performed by the switched thread,
even when the switch is initiated from elsewhere; e.g. preemption requested via
Inter-Processor Interrupt (IPI).
.Pp
The various major uses of
.Nm
.Fn mi_switch
can be enumerated as follows:
.Bl -enum -offset indent
.It
From within a function such as
.Xr cv_wait 9 ,
.Xr mtx_lock 9 ,
.Xr sleepq_wait 9
or
.Xr tsleep 9
.Fn turnstile_wait
when the current thread
voluntarily relinquishes the CPU to wait for some resource or lock to become
available.
.It
After handling a trap
(e.g.\& a system call, device interrupt)
when the kernel prepares a return to user-mode execution.
This case is
typically handled by machine dependent trap-handling code after detection
of a change in the signal disposition of the current process, or when a
higher priority thread might be available to run.
The latter event is
communicated by the machine independent scheduling routines by calling
the machine defined
.Fn need_resched .
Involuntary preemption due to arrival of a higher-priority thread.
.It
At the tail end of
.Xr critical_exit 9 ,
if preemption was deferred due to the critical section.
.It
Within the TDA_SCHED AST handler, when rescheduling before the return to
usermode was requested.
There are several reasons for this, a notable one coming from
.Fn sched_clock
when the running thread has exceeded its time slice.
.It
In the signal handling code
(see
.Xr issignal 9 )
if a signal is delivered that causes a process to stop.
.It
When a thread dies in
.Xr thread_exit 9
and control of the processor can be passed to the next runnable thread.
.It
In
.Xr thread_suspend_check 9
.Fn thread_suspend_check
where a thread needs to stop execution due to the suspension state of
the process as a whole.
.It
In
.Xr kern_yield 9
when a thread wants to voluntarily relinquish the processor.
.El
.Pp
The
.Va flags
argument to
.Fn mi_switch
records the amount of time the current thread has been running in the
process structures and checks this value against the CPU time limits
allocated to the process
(see
.Xr getrlimit 2 ) .
Exceeding the soft limit results in a
.Dv SIGXCPU
signal to be posted to the process, while exceeding the hard limit will
cause a
.Dv SIGKILL .
indicates the context switch type.
One of the following must be passed:
.Bl -tag -offset indent -width "SWT_REMOTEWAKEIDLE"
.It Dv SWT_OWEPREEMPT
Switch due to delayed preemption after exiting a critical section.
.It Dv SWT_TURNSTILE
Switch after propagating scheduling priority to the owner of a resource.
.It Dv SWT_SLEEPQ
Begin waiting on a
.Xr sleepqueue 9 .
.It Dv SWT_RELINQUISH
Yield call.
.It Dv SWT_NEEDRESCHED
Rescheduling was requested.
.It Dv SWT_IDLE
Switch from the idle thread.
.It Dv SWT_IWAIT
A kernel thread which handles interrupts has finished work and must wait for
interrupts to schedule additional work.
.It Dv SWT_SUSPEND
Thread suspended.
.It Dv SWT_REMOTEPREEMPT
Preemption by a higher-priority thread, initiated by a remote processor.
.It Dv SWT_REMOTEWAKEIDLE
Idle thread preempted, initiated by a remote processor.
.It Dv SWT_BIND
The running thread has been bound to another processor and must be switched
out.
.El
.Pp
If the thread is still in the
.Dv TDS_RUNNING
state,
.Fn mi_switch
will put it back onto the run queue, assuming that
it will want to run again soon.
If it is in one of the other
states and KSE threading is enabled, the associated
.Em KSE
will be made available to any higher priority threads from the same
group, to allow them to be scheduled next.
In addition to the switch type, callers must specify the nature of the
switch by performing a bitwise OR with one of the
.Dv SW_VOL
or
.Dv SW_INVOL
flags, but not both.
Respectively, these flags denote whether the context switch is voluntary or
involuntary on the part of the current thread.
For an involuntary context switch in which the running thread is
being preempted, the caller should also pass the
.Dv SW_PREEMPT
flag.
.Pp
After these administrative tasks are done,
Upon entry to
.Fn mi_switch ,
the current thread must be holding its assigned thread lock.
It may be unlocked as part of the context switch.
After they have been rescheduled and execution resumes, threads will exit
.Fn mi_switch
hands over control to the machine dependent routine
.Fn cpu_switch ,
which will perform the actual thread context switch.
with their thread lock unlocked.
.Pp
.Fn mi_switch
records the amount of time the current thread has been running before handing
control over to the scheduler, via
.Fn sched_switch .
After selecting a new thread to run, the scheduler will call
.Fn cpu_switch
to perform the low-level context switch.
.Pp
.Fn cpu_switch
first saves the context of the current thread.
Next, it calls
.Fn choosethread
to determine which thread to run next.
Finally, it reads in the saved context of the new thread and starts to
execute the new thread.
is the machine-dependent function that performs the actual switch from the
running thread
.Fa oldtd
to the chosen thread
.Fa newtd .
First, it saves the context of
.Fa oldtd
to its Process Control Block,
.Po
PCB
.Vt struct pcb
.Pc ,
pointed at by
.Va oldtd->td_pcb .
The function then updates important per-CPU state such as the
.Dv curthread
variable, and activates
.Fa newtd\&'s
virtual address space using its associated
.Xr pmap 9
structure.
Finally, it reads in the saved context from
.Fa newtd\&'s
PCB.
CPU instruction flow continues in the new thread context, on
.Fa newtd\&'s
kernel stack.
The return from
.Fn cpu_switch
can be understood as a completion of the function call initiated by
.Fa newtd
when it was previously switched out, at some point in the distant (relative to
CPU time) past.
.Pp
The
.Fa mtx
argument to
.Fn cpu_switch
is used to pass the mutex which will be stored as
.Fa oldtd\&'s
thread lock at the moment that
.Fa oldtd
is completely switched out.
This is an implementation detail of
.Fn sched_switch .
.Pp
.Fn cpu_throw
is similar to
@ -140,19 +219,18 @@ This function is useful when the kernel does not have an old thread
context to save, such as when CPUs other than the boot CPU perform their
first task switch, or when the kernel does not care about the state of the
old thread, such as in
.Fn thread_exit
.Xr thread_exit 9
when the kernel terminates the current thread and switches into a new
thread.
.Pp
To protect the
.Xr runqueue 9 ,
all of these functions must be called with the
.Va sched_lock
mutex held.
thread,
.Fa newtd .
The
.Fa oldtd
argument is unused.
.Sh SEE ALSO
.Xr cv_wait 9 ,
.Xr critical_exit 9 ,
.Xr issignal 9 ,
.Xr kern_yield 9 ,
.Xr mutex 9 ,
.Xr runqueue 9 ,
.Xr tsleep 9 ,
.Xr wakeup 9
.Xr pmap 9 ,
.Xr sleepqueue 9 ,
.Xr thread_exit 9
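
The flags convention documented for mi_switch() in this change (one switch
type, OR'ed with exactly one of SW_VOL or SW_INVOL, plus SW_PREEMPT for
preemption) can be sketched in userland C. This is only an illustration
under stated assumptions: every EX_-prefixed name and every numeric value
below is a hypothetical stand-in, the real definitions live in sys/proc.h,
and the kernel-side requirement that the caller hold its thread lock is not
modeled here.

```c
#include <assert.h>
#include <stdbool.h>

/*
 * Hypothetical stand-ins for the kernel's SW_ and SWT_ constants.
 * The values are assumptions chosen for this sketch only.
 */
#define EX_SW_TYPE_MASK 0x00ff  /* low bits carry the switch type */
#define EX_SW_VOL       0x0100  /* voluntary switch */
#define EX_SW_INVOL     0x0200  /* involuntary switch */
#define EX_SW_PREEMPT   0x0400  /* running thread is being preempted */

/* Switch types, in the order the manual page lists them. */
enum ex_switch_type {
    EX_SWT_OWEPREEMPT = 1,
    EX_SWT_TURNSTILE,
    EX_SWT_SLEEPQ,
    EX_SWT_RELINQUISH,
    EX_SWT_NEEDRESCHED,
    EX_SWT_IDLE,
    EX_SWT_IWAIT,
    EX_SWT_SUSPEND,
    EX_SWT_REMOTEPREEMPT,
    EX_SWT_REMOTEWAKEIDLE,
    EX_SWT_BIND,
    EX_SWT_COUNT            /* one past the last valid type */
};

/*
 * Check the documented rule: a valid switch type must be present, and
 * exactly one of the voluntary/involuntary bits must be set.
 */
bool
ex_switch_flags_ok(int flags)
{
    int type = flags & EX_SW_TYPE_MASK;
    bool vol = (flags & EX_SW_VOL) != 0;
    bool invol = (flags & EX_SW_INVOL) != 0;

    if (type < EX_SWT_OWEPREEMPT || type >= EX_SWT_COUNT)
        return false;
    return vol != invol;    /* one or the other, never both */
}

/* Flags a yield-style caller would build: voluntary, type RELINQUISH. */
int
ex_yield_flags(void)
{
    int flags = EX_SW_VOL | EX_SWT_RELINQUISH;

    assert(ex_switch_flags_ok(flags));
    return flags;
}
```

In this model a preempting caller would pass something like
EX_SW_INVOL | EX_SW_PREEMPT | EX_SWT_REMOTEPREEMPT, while combining
EX_SW_VOL with EX_SW_INVOL, or omitting the switch type, is rejected.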