1993-06-12 14:58:17 +00:00
|
|
|
/*-
|
2017-11-18 14:26:50 +00:00
|
|
|
* SPDX-License-Identifier: BSD-4-Clause
|
|
|
|
*
|
1994-06-06 14:54:41 +00:00
|
|
|
* Copyright (C) 1994, David Greenman
|
|
|
|
* Copyright (c) 1990, 1993
|
|
|
|
* The Regents of the University of California. All rights reserved.
|
2007-12-07 08:20:17 +00:00
|
|
|
* Copyright (c) 2007 The FreeBSD Foundation
|
1993-06-12 14:58:17 +00:00
|
|
|
*
|
|
|
|
* This code is derived from software contributed to Berkeley by
|
|
|
|
* the University of Utah, and William Jolitz.
|
|
|
|
*
|
2007-12-07 08:20:17 +00:00
|
|
|
* Portions of this software were developed by A. Joseph Koshy under
|
|
|
|
* sponsorship from the FreeBSD Foundation and Google, Inc.
|
|
|
|
*
|
1993-06-12 14:58:17 +00:00
|
|
|
* Redistribution and use in source and binary forms, with or without
|
|
|
|
* modification, are permitted provided that the following conditions
|
|
|
|
* are met:
|
|
|
|
* 1. Redistributions of source code must retain the above copyright
|
|
|
|
* notice, this list of conditions and the following disclaimer.
|
|
|
|
* 2. Redistributions in binary form must reproduce the above copyright
|
|
|
|
* notice, this list of conditions and the following disclaimer in the
|
|
|
|
* documentation and/or other materials provided with the distribution.
|
|
|
|
* 3. All advertising materials mentioning features or use of this software
|
|
|
|
* must display the following acknowledgement:
|
|
|
|
* This product includes software developed by the University of
|
|
|
|
* California, Berkeley and its contributors.
|
|
|
|
* 4. Neither the name of the University nor the names of its contributors
|
|
|
|
* may be used to endorse or promote products derived from this software
|
|
|
|
* without specific prior written permission.
|
|
|
|
*
|
|
|
|
* THIS SOFTWARE IS PROVIDED BY THE REGENTS AND CONTRIBUTORS ``AS IS'' AND
|
|
|
|
* ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE
|
|
|
|
* IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE
|
|
|
|
* ARE DISCLAIMED. IN NO EVENT SHALL THE REGENTS OR CONTRIBUTORS BE LIABLE
|
|
|
|
* FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL
|
|
|
|
* DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS
|
|
|
|
* OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION)
|
|
|
|
* HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT
|
|
|
|
* LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY
|
|
|
|
* OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF
|
|
|
|
* SUCH DAMAGE.
|
|
|
|
*
|
genassym.c:
Remove NKMEMCLUSTERS, it is no longer define or used.
locores.s:
Fix comment on PTDpde and APTDpde to be pde instead of pte
Add new equation for calculating location of Sysmap
Remove Bill's old #ifdef garbage for counting up memory,
that stuff will never be made to work and was just cluttering
up the file.
Add code that places the PTD, page table pages, and kernel
stack below the 640k ISA hole if there is room for it, otherwise
put this stuff all at 1MB. This fixes the 28K bogusity in
the boot blocks, that can now go away!
Fix the caclulation of where first is to be dependent on
NKPDE so that we can skip over the above mentioned areas.
The 28K thing is now 44K in size due to the increase in
kernel virtual memory space, but since we no longer have
to worry about that this is no big deal.
Use if NNPX > 0 instead of ifdef NPX for floating point code.
machdep.c
Change the calculation of for the buffer cache to be
20% of all memory above 2MB and add back the upper limit
of 2/5's of the VM_KMEM_SIZE so that we do not eat ALL
of the kernel memory space on large memory machines, note
that this will not even come into effect unless you have
more than 32MB. The current buffer cache limit is 6.7MB
due to this caclulation.
It seems that we where erroniously allocating bufpages pages
for buffer_map. buffer_map is UNUSED in this implementation
of the buffer cache, but since the map is referenced in
several if statements a quick fix was to simply allocate
1 vm page (but no real memory) to it.
pmap.h
Remove rcsid, don't want them in the kernel files!
Removed some cruft inside an #ifdef DEBUGx that caused
compiler errors if you where compiling this for debug.
Use the #defines for PD_SHIFT and PG_SHIFT in place of
constants.
trap.c:
Remove patch kit header and rcsid, fix $Id$.
Now include "npx.h" and use NNPX for controlling the
floating point code.
Remove a now completly invalid check for a maximum virtual
address, the virtual address now ends at 0xFFFFFFFF so
there is no more MAX!! (Thanks David, I completly missed
that one!)
vm_machdep.c
Remove patch kit header and rcsid, fix $Id$.
Now include "npx.h" and use NNPX for controlling the
floating point code.
Replace several 0xFE00000 constants with KERNBASE
1993-10-15 10:34:29 +00:00
|
|
|
* from: @(#)trap.c 7.4 (Berkeley) 5/13/91
|
1993-06-12 14:58:17 +00:00
|
|
|
*/
|
|
|
|
|
2003-06-11 00:56:59 +00:00
|
|
|
#include <sys/cdefs.h>
|
|
|
|
__FBSDID("$FreeBSD$");
|
|
|
|
|
2012-03-28 20:58:30 +00:00
|
|
|
#include "opt_hwpmc_hooks.h"
|
2003-07-31 01:36:24 +00:00
|
|
|
#include "opt_ktrace.h"
|
2007-06-12 23:27:31 +00:00
|
|
|
#include "opt_sched.h"
|
1996-01-03 21:42:35 +00:00
|
|
|
|
1994-05-25 09:21:21 +00:00
|
|
|
#include <sys/param.h>
|
2000-09-07 01:33:02 +00:00
|
|
|
#include <sys/bus.h>
|
2014-03-16 10:55:57 +00:00
|
|
|
#include <sys/capsicum.h>
|
1994-05-25 09:21:21 +00:00
|
|
|
#include <sys/kernel.h>
|
2001-06-29 19:51:37 +00:00
|
|
|
#include <sys/lock.h>
|
2000-10-20 07:58:15 +00:00
|
|
|
#include <sys/mutex.h>
|
2007-12-07 08:20:17 +00:00
|
|
|
#include <sys/pmckern.h>
|
2001-06-29 19:51:37 +00:00
|
|
|
#include <sys/proc.h>
|
Part 1 of KSE-III
The ability to schedule multiple threads per process
(one one cpu) by making ALL system calls optionally asynchronous.
to come: ia64 and power-pc patches, patches for gdb, test program (in tools)
Reviewed by: Almost everyone who counts
(at various times, peter, jhb, matt, alfred, mini, bernd,
and a cast of thousands)
NOTE: this is still Beta code, and contains lots of debugging stuff.
expect slight instability in signals..
2002-06-29 17:26:22 +00:00
|
|
|
#include <sys/ktr.h>
|
Reorganize syscall entry and leave handling.
Extend struct sysvec with three new elements:
sv_fetch_syscall_args - the method to fetch syscall arguments from
usermode into struct syscall_args. The structure is machine-depended
(this might be reconsidered after all architectures are converted).
sv_set_syscall_retval - the method to set a return value for usermode
from the syscall. It is a generalization of
cpu_set_syscall_retval(9) to allow ABIs to override the way to set a
return value.
sv_syscallnames - the table of syscall names.
Use sv_set_syscall_retval in kern_sigsuspend() instead of hardcoding
the call to cpu_set_syscall_retval().
The new functions syscallenter(9) and syscallret(9) are provided that
use sv_*syscall* pointers and contain the common repeated code from
the syscall() implementations for the architecture-specific syscall
trap handlers.
Syscallenter() fetches arguments, calls syscall implementation from
ABI sysent table, and set up return frame. The end of syscall
bookkeeping is done by syscallret().
Take advantage of single place for MI syscall handling code and
implement ptrace_lwpinfo pl_flags PL_FLAG_SCE, PL_FLAG_SCX and
PL_FLAG_EXEC. The SCE and SCX flags notify the debugger that the
thread is stopped at syscall entry or return point respectively. The
EXEC flag augments SCX and notifies debugger that the process address
space was changed by one of exec(2)-family syscalls.
The i386, amd64, sparc64, sun4v, powerpc and ia64 syscall()s are
changed to use syscallenter()/syscallret(). MIPS and arm are not
converted and use the mostly unchanged syscall() implementation.
Reviewed by: jhb, marcel, marius, nwhitehorn, stas
Tested by: marcel (ia64), marius (sparc64), nwhitehorn (powerpc),
stas (mips)
MFC after: 1 month
2010-05-23 18:32:02 +00:00
|
|
|
#include <sys/ptrace.h>
|
2015-04-29 10:23:02 +00:00
|
|
|
#include <sys/racct.h>
|
1997-11-24 13:25:37 +00:00
|
|
|
#include <sys/resourcevar.h>
|
2002-10-12 05:32:24 +00:00
|
|
|
#include <sys/sched.h>
|
1997-11-24 13:25:37 +00:00
|
|
|
#include <sys/signalvar.h>
|
Reorganize syscall entry and leave handling.
Extend struct sysvec with three new elements:
sv_fetch_syscall_args - the method to fetch syscall arguments from
usermode into struct syscall_args. The structure is machine-depended
(this might be reconsidered after all architectures are converted).
sv_set_syscall_retval - the method to set a return value for usermode
from the syscall. It is a generalization of
cpu_set_syscall_retval(9) to allow ABIs to override the way to set a
return value.
sv_syscallnames - the table of syscall names.
Use sv_set_syscall_retval in kern_sigsuspend() instead of hardcoding
the call to cpu_set_syscall_retval().
The new functions syscallenter(9) and syscallret(9) are provided that
use sv_*syscall* pointers and contain the common repeated code from
the syscall() implementations for the architecture-specific syscall
trap handlers.
Syscallenter() fetches arguments, calls syscall implementation from
ABI sysent table, and set up return frame. The end of syscall
bookkeeping is done by syscallret().
Take advantage of single place for MI syscall handling code and
implement ptrace_lwpinfo pl_flags PL_FLAG_SCE, PL_FLAG_SCX and
PL_FLAG_EXEC. The SCE and SCX flags notify the debugger that the
thread is stopped at syscall entry or return point respectively. The
EXEC flag augments SCX and notifies debugger that the process address
space was changed by one of exec(2)-family syscalls.
The i386, amd64, sparc64, sun4v, powerpc and ia64 syscall()s are
changed to use syscallenter()/syscallret(). MIPS and arm are not
converted and use the mostly unchanged syscall() implementation.
Reviewed by: jhb, marcel, marius, nwhitehorn, stas
Tested by: marcel (ia64), marius (sparc64), nwhitehorn (powerpc),
stas (mips)
MFC after: 1 month
2010-05-23 18:32:02 +00:00
|
|
|
#include <sys/syscall.h>
|
2010-06-30 18:03:42 +00:00
|
|
|
#include <sys/syscallsubr.h>
|
Reorganize syscall entry and leave handling.
Extend struct sysvec with three new elements:
sv_fetch_syscall_args - the method to fetch syscall arguments from
usermode into struct syscall_args. The structure is machine-depended
(this might be reconsidered after all architectures are converted).
sv_set_syscall_retval - the method to set a return value for usermode
from the syscall. It is a generalization of
cpu_set_syscall_retval(9) to allow ABIs to override the way to set a
return value.
sv_syscallnames - the table of syscall names.
Use sv_set_syscall_retval in kern_sigsuspend() instead of hardcoding
the call to cpu_set_syscall_retval().
The new functions syscallenter(9) and syscallret(9) are provided that
use sv_*syscall* pointers and contain the common repeated code from
the syscall() implementations for the architecture-specific syscall
trap handlers.
Syscallenter() fetches arguments, calls syscall implementation from
ABI sysent table, and set up return frame. The end of syscall
bookkeeping is done by syscallret().
Take advantage of single place for MI syscall handling code and
implement ptrace_lwpinfo pl_flags PL_FLAG_SCE, PL_FLAG_SCX and
PL_FLAG_EXEC. The SCE and SCX flags notify the debugger that the
thread is stopped at syscall entry or return point respectively. The
EXEC flag augments SCX and notifies debugger that the process address
space was changed by one of exec(2)-family syscalls.
The i386, amd64, sparc64, sun4v, powerpc and ia64 syscall()s are
changed to use syscallenter()/syscallret(). MIPS and arm are not
converted and use the mostly unchanged syscall() implementation.
Reviewed by: jhb, marcel, marius, nwhitehorn, stas
Tested by: marcel (ia64), marius (sparc64), nwhitehorn (powerpc),
stas (mips)
MFC after: 1 month
2010-05-23 18:32:02 +00:00
|
|
|
#include <sys/sysent.h>
|
2001-06-29 19:51:37 +00:00
|
|
|
#include <sys/systm.h>
|
1995-12-07 12:48:31 +00:00
|
|
|
#include <sys/vmmeter.h>
|
2003-07-31 01:36:24 +00:00
|
|
|
#ifdef KTRACE
|
|
|
|
#include <sys/uio.h>
|
|
|
|
#include <sys/ktrace.h>
|
|
|
|
#endif
|
Reorganize syscall entry and leave handling.
Extend struct sysvec with three new elements:
sv_fetch_syscall_args - the method to fetch syscall arguments from
usermode into struct syscall_args. The structure is machine-depended
(this might be reconsidered after all architectures are converted).
sv_set_syscall_retval - the method to set a return value for usermode
from the syscall. It is a generalization of
cpu_set_syscall_retval(9) to allow ABIs to override the way to set a
return value.
sv_syscallnames - the table of syscall names.
Use sv_set_syscall_retval in kern_sigsuspend() instead of hardcoding
the call to cpu_set_syscall_retval().
The new functions syscallenter(9) and syscallret(9) are provided that
use sv_*syscall* pointers and contain the common repeated code from
the syscall() implementations for the architecture-specific syscall
trap handlers.
Syscallenter() fetches arguments, calls syscall implementation from
ABI sysent table, and set up return frame. The end of syscall
bookkeeping is done by syscallret().
Take advantage of single place for MI syscall handling code and
implement ptrace_lwpinfo pl_flags PL_FLAG_SCE, PL_FLAG_SCX and
PL_FLAG_EXEC. The SCE and SCX flags notify the debugger that the
thread is stopped at syscall entry or return point respectively. The
EXEC flag augments SCX and notifies debugger that the process address
space was changed by one of exec(2)-family syscalls.
The i386, amd64, sparc64, sun4v, powerpc and ia64 syscall()s are
changed to use syscallenter()/syscallret(). MIPS and arm are not
converted and use the mostly unchanged syscall() implementation.
Reviewed by: jhb, marcel, marius, nwhitehorn, stas
Tested by: marcel (ia64), marius (sparc64), nwhitehorn (powerpc),
stas (mips)
MFC after: 1 month
2010-05-23 18:32:02 +00:00
|
|
|
#include <security/audit/audit.h>
|
2003-07-31 01:36:24 +00:00
|
|
|
|
1994-05-25 09:21:21 +00:00
|
|
|
#include <machine/cpu.h>
|
2000-12-12 01:14:32 +00:00
|
|
|
|
2011-02-14 20:49:37 +00:00
|
|
|
#ifdef VIMAGE
|
|
|
|
#include <net/vnet.h>
|
|
|
|
#endif
|
|
|
|
|
2012-03-28 20:58:30 +00:00
|
|
|
#ifdef HWPMC_HOOKS
|
|
|
|
#include <sys/pmckern.h>
|
|
|
|
#endif
|
|
|
|
|
2006-10-22 11:52:19 +00:00
|
|
|
#include <security/mac/mac_framework.h>
|
|
|
|
|
2017-02-25 10:38:18 +00:00
|
|
|
void (*softdep_ast_cleanup)(struct thread *);
|
Currently, softupdate code detects overstepping on the workitems
limits in the code which is deep in the call stack, and owns several
critical system resources, like vnode locks. Attempt to wait while
the per-mount softupdate thread cleans up the backlog may deadlock,
because the thread might need to lock the same vnode which is owned by
the waiting thread.
Instead of synchronously waiting for the worker, perform the worker'
tickle and pause until the backlog is cleaned, at the safe point
during return from kernel to usermode. A new ast request to call
softdep_ast_cleanup() is created, the SU code now only checks the size
of queue and schedules ast.
There is no ast delivery for the kernel threads, so they are exempted
from the mechanism, except NFS daemon threads. NFS server loop
explicitely checks for the request, and informs the schedule_cleanup()
that it is capable of handling the requests by the process P2_AST_SU
flag. This is needed because nfsd may be the sole cause of the SU
workqueue overflow. But, to not cause nsfd to spawn additional
threads just because we slow down existing workers, only tickle su
threads, without waiting for the backlog cleanup.
Reviewed by: jhb, mckusick
Tested by: pho
Sponsored by: The FreeBSD Foundation
MFC after: 2 weeks
2015-05-27 09:20:42 +00:00
|
|
|
|
2001-06-29 19:51:37 +00:00
|
|
|
/*
|
2007-03-04 22:36:48 +00:00
|
|
|
* Define the code needed before returning to user mode, for trap and
|
|
|
|
* syscall.
|
2001-06-29 19:51:37 +00:00
|
|
|
*/
|
2001-01-24 09:53:49 +00:00
|
|
|
void
|
2006-02-08 08:09:17 +00:00
|
|
|
userret(struct thread *td, struct trapframe *frame)
|
1994-06-06 14:54:41 +00:00
|
|
|
{
|
2001-09-12 08:38:13 +00:00
|
|
|
struct proc *p = td->td_proc;
|
1994-06-06 14:54:41 +00:00
|
|
|
|
Part 1 of KSE-III
The ability to schedule multiple threads per process
(one one cpu) by making ALL system calls optionally asynchronous.
to come: ia64 and power-pc patches, patches for gdb, test program (in tools)
Reviewed by: Almost everyone who counts
(at various times, peter, jhb, matt, alfred, mini, bernd,
and a cast of thousands)
NOTE: this is still Beta code, and contains lots of debugging stuff.
expect slight instability in signals..
2002-06-29 17:26:22 +00:00
|
|
|
CTR3(KTR_SYSC, "userret: thread %p (pid %d, %s)", td, p->p_pid,
|
2007-11-14 06:51:33 +00:00
|
|
|
td->td_name);
|
2011-10-03 16:58:58 +00:00
|
|
|
KASSERT((p->p_flag & P_WEXIT) == 0,
|
|
|
|
("Exiting process returns to usermode"));
|
2004-03-05 17:35:28 +00:00
|
|
|
#ifdef DIAGNOSTIC
|
2016-07-12 03:53:15 +00:00
|
|
|
/*
|
|
|
|
* Check that we called signotify() enough. For
|
|
|
|
* multi-threaded processes, where signal distribution might
|
|
|
|
* change due to other threads changing sigmask, the check is
|
|
|
|
* racy and cannot be performed reliably.
|
2016-07-18 10:53:47 +00:00
|
|
|
* If current process is vfork child, indicated by P_PPWAIT, then
|
|
|
|
* issignal() ignores stops, so we block the check to avoid
|
|
|
|
* classifying pending signals.
|
2016-07-12 03:53:15 +00:00
|
|
|
*/
|
|
|
|
if (p->p_numthreads == 1) {
|
|
|
|
PROC_LOCK(p);
|
|
|
|
thread_lock(td);
|
Add a way to manage thread signal mask using shared word, instead of syscall.
A new syscall sigfastblock(2) is added which registers a uint32_t
variable as containing the count of blocks for signal delivery. Its
content is read by kernel on each syscall entry and on AST processing,
non-zero count of blocks is interpreted same as the signal mask
blocking all signals.
The biggest downside of the feature that I see is that memory
corruption that affects the registered fast sigblock location, would
cause quite strange application misbehavior. For instance, the process
would be immune to ^C (but killable by SIGKILL).
With consumers (rtld and libthr added), benchmarks do not show a
slow-down of the syscalls in micro-measurements, and macro benchmarks
like buildworld do not demonstrate a difference. Part of the reason is
that buildworld time is dominated by compiler, and clang already links
to libthr. On the other hand, small utilities typically used by shell
scripts have the total number of syscalls cut by half.
The syscall is not exported from the stable libc version namespace on
purpose. It is intended to be used only by our C runtime
implementation internals.
Tested by: pho
Disscussed with: cem, emaste, jilles
Sponsored by: The FreeBSD Foundation
Differential revision: https://reviews.freebsd.org/D12773
2020-02-09 11:53:12 +00:00
|
|
|
if ((p->p_flag & P_PPWAIT) == 0 &&
|
|
|
|
(td->td_pflags & TDP_SIGFASTBLOCK) == 0) {
|
|
|
|
if (SIGPENDING(td) && (td->td_flags &
|
|
|
|
(TDF_NEEDSIGCHK | TDF_ASTPENDING)) !=
|
|
|
|
(TDF_NEEDSIGCHK | TDF_ASTPENDING)) {
|
|
|
|
thread_unlock(td);
|
|
|
|
panic(
|
|
|
|
"failed to set signal flags for ast p %p td %p fl %x",
|
|
|
|
p, td, td->td_flags);
|
|
|
|
}
|
2016-07-18 10:53:47 +00:00
|
|
|
}
|
2016-07-12 03:53:15 +00:00
|
|
|
thread_unlock(td);
|
|
|
|
PROC_UNLOCK(p);
|
|
|
|
}
|
Currently, when signal is delivered to the process and there is a thread
not blocking the signal, signal is placed on the thread sigqueue. If
the selected thread is in kernel executing thr_exit() or sigprocmask()
syscalls, then signal might be not delivered to usermode for arbitrary
amount of time, and for exiting thread it is lost.
Put process-directed signals to the process queue unconditionally,
selecting the thread to deliver the signal only by the thread returning
to usermode, since only then the thread can handle delivery of signal
reliably. For exiting thread or thread that has blocked some signals,
check whether the newly blocked signal is queued for the process, and
try to find a thread to wakeup for delivery, in reschedule_signal(). For
exiting thread, assume that all signals are blocked.
Change cursig() and postsig() to look both into the thread and process
signal queues. When there is a signal that thread returning to usermode
could consume, TDF_NEEDSIGCHK flag is not neccessary set now. Do
unlocked read of p_siglist and p_pendingcnt to check for queued signals.
Note that thread that has a signal unblocked might get spurious wakeup
and EINTR from the interruptible system call now, due to the possibility
of being selected by reschedule_signals(), while other thread returned
to usermode earlier and removed the signal from process queue. This
should not cause compliance issues, since the thread has not blocked a
signal and thus should be ready to receive it anyway.
Reported by: Justin Teller <justin.teller gmail com>
Reviewed by: davidxu, jilles
MFC after: 1 month
2009-10-11 16:49:30 +00:00
|
|
|
#endif
|
Moderate rewrite of kernel ktrace code to attempt to generally improve
reliability when tracing fast-moving processes or writing traces to
slow file systems by avoiding unbounded queueuing and dropped records.
Record loss was previously possible when the global pool of records
become depleted as a result of record generation outstripping record
commit, which occurred quickly in many common situations.
These changes partially restore the 4.x model of committing ktrace
records at the point of trace generation (synchronous), but maintain
the 5.x deferred record commit behavior (asynchronous) for situations
where entering VFS and sleeping is not possible (i.e., in the
scheduler). Records are now queued per-process as opposed to
globally, with processes responsible for committing records from their
own context as required.
- Eliminate the ktrace worker thread and global record queue, as they
are no longer used. Keep the global free record list, as records
are still used.
- Add a per-process record queue, which will hold any asynchronously
generated records, such as from context switches. This replaces the
global queue as the place to submit asynchronous records to.
- When a record is committed asynchronously, simply queue it to the
process.
- When a record is committed synchronously, first drain any pending
per-process records in order to maintain ordering as best we can.
Currently ordering between competing threads is provided via a global
ktrace_sx, but a per-process flag or lock may be desirable in the
future.
- When a process returns to user space following a system call, trap,
signal delivery, etc, flush any pending records.
- When a process exits, flush any pending records.
- Assert on process tear-down that there are no pending records.
- Slightly abstract the notion of being "in ktrace", which is used to
prevent the recursive generation of records, as well as generating
traces for ktrace events.
Future work here might look at changing the set of events marked for
synchronous and asynchronous record generation, re-balancing queue
depth, timeliness of commit to disk, and so on. I.e., performing a
drain every (n) records.
MFC after: 1 month
Discussed with: jhb
Requested by: Marc Olzheim <marcolz at stack dot nl>
2005-11-13 13:27:44 +00:00
|
|
|
#ifdef KTRACE
|
|
|
|
KTRUSERRET(td);
|
|
|
|
#endif
|
2020-02-20 15:34:02 +00:00
|
|
|
|
2017-02-25 10:38:18 +00:00
|
|
|
td_softdep_cleanup(td);
|
|
|
|
MPASS(td->td_su == NULL);
|
Currently, softupdate code detects overstepping on the workitems
limits in the code which is deep in the call stack, and owns several
critical system resources, like vnode locks. Attempt to wait while
the per-mount softupdate thread cleans up the backlog may deadlock,
because the thread might need to lock the same vnode which is owned by
the waiting thread.
Instead of synchronously waiting for the worker, perform the worker'
tickle and pause until the backlog is cleaned, at the safe point
during return from kernel to usermode. A new ast request to call
softdep_ast_cleanup() is created, the SU code now only checks the size
of queue and schedules ast.
There is no ast delivery for the kernel threads, so they are exempted
from the mechanism, except NFS daemon threads. NFS server loop
explicitely checks for the request, and informs the schedule_cleanup()
that it is capable of handling the requests by the process P2_AST_SU
flag. This is needed because nfsd may be the sole cause of the SU
workqueue overflow. But, to not cause nsfd to spawn additional
threads just because we slow down existing workers, only tickle su
threads, without waiting for the backlog cleanup.
Reviewed by: jhb, mckusick
Tested by: pho
Sponsored by: The FreeBSD Foundation
MFC after: 2 weeks
2015-05-27 09:20:42 +00:00
|
|
|
|
2004-10-23 20:49:17 +00:00
|
|
|
/*
|
|
|
|
* If this thread tickled GEOM, we need to wait for the giggling to
|
|
|
|
* stop before we return to userland
|
|
|
|
*/
|
2020-02-14 13:08:46 +00:00
|
|
|
if (__predict_false(td->td_pflags & TDP_GEOM))
|
2004-10-23 20:49:17 +00:00
|
|
|
g_waitidle();
|
|
|
|
|
2003-02-01 12:17:09 +00:00
|
|
|
/*
|
|
|
|
* Charge system time if profiling.
|
|
|
|
*/
|
2020-02-14 13:08:46 +00:00
|
|
|
if (__predict_false(p->p_flag & P_PROFIL))
|
2006-02-08 08:09:17 +00:00
|
|
|
addupc_task(td, TRAPF_PC(frame), td->td_pticks * psratio);
|
2018-06-04 01:10:23 +00:00
|
|
|
|
|
|
|
#ifdef HWPMC_HOOKS
|
|
|
|
if (PMC_THREAD_HAS_SAMPLES(td))
|
|
|
|
PMC_CALL_HOOK(td, PMC_FN_THR_USERRET, NULL);
|
|
|
|
#endif
|
2004-12-26 07:30:35 +00:00
|
|
|
/*
|
|
|
|
* Let the scheduler adjust our priority etc.
|
|
|
|
*/
|
|
|
|
sched_userret(td);
|
2012-09-08 18:35:15 +00:00
|
|
|
|
|
|
|
/*
|
|
|
|
* Check for misbehavior.
|
2012-10-30 15:10:50 +00:00
|
|
|
*
|
|
|
|
* In case there is a callchain tracing ongoing because of
|
|
|
|
* hwpmc(4), skip the scheduler pinning check.
|
|
|
|
* hwpmc(4) subsystem, infact, will collect callchain informations
|
|
|
|
* at ast() checkpoint, which is past userret().
|
2012-09-08 18:35:15 +00:00
|
|
|
*/
|
|
|
|
WITNESS_WARN(WARN_PANIC, NULL, "userret: returning");
|
|
|
|
KASSERT(td->td_critnest == 0,
|
|
|
|
("userret: Returning in a critical section"));
|
2005-03-24 09:35:38 +00:00
|
|
|
KASSERT(td->td_locks == 0,
|
2012-09-08 18:35:15 +00:00
|
|
|
("userret: Returning with %d locks held", td->td_locks));
|
2013-12-17 13:37:02 +00:00
|
|
|
KASSERT(td->td_rw_rlocks == 0,
|
|
|
|
("userret: Returning with %d rwlocks held in read mode",
|
|
|
|
td->td_rw_rlocks));
|
2018-05-22 07:20:22 +00:00
|
|
|
KASSERT(td->td_sx_slocks == 0,
|
|
|
|
("userret: Returning with %d sx locks held in shared mode",
|
|
|
|
td->td_sx_slocks));
|
2019-08-19 11:18:36 +00:00
|
|
|
KASSERT(td->td_lk_slocks == 0,
|
|
|
|
("userret: Returning with %d lockmanager locks held in shared mode",
|
|
|
|
td->td_lk_slocks));
|
2012-09-08 18:35:15 +00:00
|
|
|
KASSERT((td->td_pflags & TDP_NOFAULTING) == 0,
|
|
|
|
("userret: Returning with pagefaults disabled"));
|
2019-10-29 17:28:25 +00:00
|
|
|
if (__predict_false(!THREAD_CAN_SLEEP())) {
|
|
|
|
#ifdef EPOCH_TRACE
|
|
|
|
epoch_trace_list(curthread);
|
|
|
|
#endif
|
|
|
|
KASSERT(1, ("userret: Returning with sleep disabled"));
|
|
|
|
}
|
2012-10-30 15:10:50 +00:00
|
|
|
KASSERT(td->td_pinned == 0 || (td->td_pflags & TDP_CALLCHAIN) != 0,
|
2012-09-08 18:35:15 +00:00
|
|
|
("userret: Returning with with pinned thread"));
|
2020-01-11 22:58:14 +00:00
|
|
|
KASSERT(td->td_vp_reserved == NULL,
|
|
|
|
("userret: Returning with preallocated vnode"));
|
2016-06-26 20:07:24 +00:00
|
|
|
KASSERT((td->td_flags & (TDF_SBDRY | TDF_SEINTR | TDF_SERESTART)) == 0,
|
2013-02-21 19:02:50 +00:00
|
|
|
("userret: Returning with stop signals deferred"));
|
Currently, softupdate code detects overstepping on the workitems
limits in the code which is deep in the call stack, and owns several
critical system resources, like vnode locks. Attempt to wait while
the per-mount softupdate thread cleans up the backlog may deadlock,
because the thread might need to lock the same vnode which is owned by
the waiting thread.
Instead of synchronously waiting for the worker, perform the worker'
tickle and pause until the backlog is cleaned, at the safe point
during return from kernel to usermode. A new ast request to call
softdep_ast_cleanup() is created, the SU code now only checks the size
of queue and schedules ast.
There is no ast delivery for the kernel threads, so they are exempted
from the mechanism, except NFS daemon threads. NFS server loop
explicitely checks for the request, and informs the schedule_cleanup()
that it is capable of handling the requests by the process P2_AST_SU
flag. This is needed because nfsd may be the sole cause of the SU
workqueue overflow. But, to not cause nsfd to spawn additional
threads just because we slow down existing workers, only tickle su
threads, without waiting for the backlog cleanup.
Reviewed by: jhb, mckusick
Tested by: pho
Sponsored by: The FreeBSD Foundation
MFC after: 2 weeks
2015-05-27 09:20:42 +00:00
|
|
|
KASSERT(td->td_su == NULL,
|
|
|
|
("userret: Returning with SU cleanup request not handled"));
|
2018-03-24 13:51:27 +00:00
|
|
|
KASSERT(td->td_vslock_sz == 0,
|
|
|
|
("userret: Returning with vslock-wired space"));
|
2011-02-14 20:49:37 +00:00
|
|
|
#ifdef VIMAGE
|
|
|
|
/* Unfortunately td_vnet_lpush needs VNET_DEBUG. */
|
|
|
|
VNET_ASSERT(curvnet == NULL,
|
|
|
|
("%s: Returning on td %p (pid %d, %s) with vnet %p set in %s",
|
|
|
|
__func__, td, p->p_pid, td->td_name, curvnet,
|
|
|
|
(td->td_vnet_lpush != NULL) ? td->td_vnet_lpush : "N/A"));
|
|
|
|
#endif
|
2015-04-29 10:23:02 +00:00
|
|
|
#ifdef RACCT
|
2018-11-29 05:08:46 +00:00
|
|
|
if (__predict_false(racct_enable && p->p_throttled != 0))
|
|
|
|
racct_proc_throttled(p);
|
2012-10-26 16:01:08 +00:00
|
|
|
#endif
|
1994-06-06 14:54:41 +00:00
|
|
|
}
|
1993-06-12 14:58:17 +00:00
|
|
|
|
|
|
|
/*
|
2001-06-29 19:51:37 +00:00
|
|
|
* Process an asynchronous software trap.
|
|
|
|
* This is relatively easy.
|
2001-08-10 22:53:32 +00:00
|
|
|
* This function will return with preemption disabled.
|
1993-06-12 14:58:17 +00:00
|
|
|
*/
|
2000-09-07 01:33:02 +00:00
|
|
|
void
|
Part 1 of KSE-III
The ability to schedule multiple threads per process
(one one cpu) by making ALL system calls optionally asynchronous.
to come: ia64 and power-pc patches, patches for gdb, test program (in tools)
Reviewed by: Almost everyone who counts
(at various times, peter, jhb, matt, alfred, mini, bernd,
and a cast of thousands)
NOTE: this is still Beta code, and contains lots of debugging stuff.
expect slight instability in signals..
2002-06-29 17:26:22 +00:00
|
|
|
ast(struct trapframe *framep)
|
2000-09-07 01:33:02 +00:00
|
|
|
{
|
2002-10-01 14:16:50 +00:00
|
|
|
struct thread *td;
|
|
|
|
struct proc *p;
|
2020-02-20 15:34:02 +00:00
|
|
|
int flags, sig;
|
2000-09-07 01:33:02 +00:00
|
|
|
|
2002-10-01 14:16:50 +00:00
|
|
|
td = curthread;
|
|
|
|
p = td->td_proc;
|
2002-10-02 16:39:39 +00:00
|
|
|
|
Part 1 of KSE-III
The ability to schedule multiple threads per process
(one one cpu) by making ALL system calls optionally asynchronous.
to come: ia64 and power-pc patches, patches for gdb, test program (in tools)
Reviewed by: Almost everyone who counts
(at various times, peter, jhb, matt, alfred, mini, bernd,
and a cast of thousands)
NOTE: this is still Beta code, and contains lots of debugging stuff.
expect slight instability in signals..
2002-06-29 17:26:22 +00:00
|
|
|
CTR3(KTR_SYSC, "ast: thread %p (pid %d, %s)", td, p->p_pid,
|
|
|
|
p->p_comm);
|
2001-02-22 18:05:15 +00:00
|
|
|
KASSERT(TRAPF_USERMODE(framep), ("ast in kernel mode"));
|
2003-03-04 21:03:05 +00:00
|
|
|
WITNESS_WARN(WARN_PANIC, NULL, "Returning to user mode");
|
2001-08-10 22:53:32 +00:00
|
|
|
mtx_assert(&Giant, MA_NOTOWNED);
|
Commit 14/14 of sched_lock decomposition.
- Use thread_lock() rather than sched_lock for per-thread scheduling
sychronization.
- Use the per-process spinlock rather than the sched_lock for per-process
scheduling synchronization.
Tested by: kris, current@
Tested on: i386, amd64, ULE, 4BSD, libthr, libkse, PREEMPTION, etc.
Discussed with: kris, attilio, kmacy, jhb, julian, bde (small parts each)
2007-06-05 00:00:57 +00:00
|
|
|
THREAD_LOCK_ASSERT(td, MA_NOTOWNED);
|
2002-03-29 16:45:03 +00:00
|
|
|
td->td_frame = framep;
|
2006-02-08 08:09:17 +00:00
|
|
|
td->td_pticks = 0;
|
2002-10-01 14:16:50 +00:00
|
|
|
|
2002-03-29 16:45:03 +00:00
|
|
|
/*
|
2007-09-17 05:31:39 +00:00
|
|
|
* This updates the td_flag's for the checks below in one
|
2002-03-29 16:45:03 +00:00
|
|
|
* "atomic" operation with turning off the astpending flag.
|
|
|
|
* If another AST is triggered while we are handling the
|
2007-09-17 05:31:39 +00:00
|
|
|
* AST's saved in flags, the astpending flag will be set and
|
2002-03-29 16:45:03 +00:00
|
|
|
* ast() will be called again.
|
|
|
|
*/
|
Commit 14/14 of sched_lock decomposition.
- Use thread_lock() rather than sched_lock for per-thread scheduling
sychronization.
- Use the per-process spinlock rather than the sched_lock for per-process
scheduling synchronization.
Tested by: kris, current@
Tested on: i386, amd64, ULE, 4BSD, libthr, libkse, PREEMPTION, etc.
Discussed with: kris, attilio, kmacy, jhb, julian, bde (small parts each)
2007-06-05 00:00:57 +00:00
|
|
|
thread_lock(td);
|
|
|
|
flags = td->td_flags;
|
2008-03-21 08:23:25 +00:00
|
|
|
td->td_flags &= ~(TDF_ASTPENDING | TDF_NEEDSIGCHK | TDF_NEEDSUSPCHK |
|
|
|
|
TDF_NEEDRESCHED | TDF_ALRMPEND | TDF_PROFPEND | TDF_MACPEND);
|
Commit 14/14 of sched_lock decomposition.
- Use thread_lock() rather than sched_lock for per-thread scheduling
sychronization.
- Use the per-process spinlock rather than the sched_lock for per-process
scheduling synchronization.
Tested by: kris, current@
Tested on: i386, amd64, ULE, 4BSD, libthr, libkse, PREEMPTION, etc.
Discussed with: kris, attilio, kmacy, jhb, julian, bde (small parts each)
2007-06-05 00:00:57 +00:00
|
|
|
thread_unlock(td);
|
- Remove 'struct vmmeter' from 'struct pcpu', leaving only global vmmeter
in place. To do per-cpu stats, convert all fields that previously were
maintained in the vmmeters that sit in pcpus to counter(9).
- Since some vmmeter stats may be touched at very early stages of boot,
before we have set up UMA and we can do counter_u64_alloc(), provide an
early counter mechanism:
o Leave one spare uint64_t in struct pcpu, named pc_early_dummy_counter.
o Point counter(9) fields of vmmeter to pcpu[0].pc_early_dummy_counter,
so that at early stages of boot, before counters are allocated we already
point to a counter that can be safely written to.
o For sparc64 that required a whole dummy pcpu[MAXCPU] array.
Further related changes:
- Don't include vmmeter.h into pcpu.h.
- vm.stats.vm.v_swappgsout and vm.stats.vm.v_swappgsin changed to 64-bit,
to match kernel representation.
- struct vmmeter hidden under _KERNEL, and only vmstat(1) is an exclusion.
This is based on benno@'s 4-year old patch:
https://lists.freebsd.org/pipermail/freebsd-arch/2013-July/014471.html
Reviewed by: kib, gallatin, marius, lidl
Differential Revision: https://reviews.freebsd.org/D10156
2017-04-17 17:34:47 +00:00
|
|
|
VM_CNT_INC(v_trap);
|
2004-09-22 15:24:33 +00:00
|
|
|
|
2015-06-10 10:43:59 +00:00
|
|
|
if (td->td_cowgen != p->p_cowgen)
|
|
|
|
thread_cow_update(td);
|
2004-07-16 21:04:55 +00:00
|
|
|
if (td->td_pflags & TDP_OWEUPC && p->p_flag & P_PROFIL) {
|
|
|
|
addupc_task(td, td->td_profil_addr, td->td_profil_ticks);
|
|
|
|
td->td_profil_ticks = 0;
|
|
|
|
td->td_pflags &= ~TDP_OWEUPC;
|
2004-07-02 03:50:48 +00:00
|
|
|
}
|
2012-03-28 20:58:30 +00:00
|
|
|
#ifdef HWPMC_HOOKS
|
|
|
|
/* Handle Software PMC callchain capture. */
|
|
|
|
if (PMC_IS_PENDING_CALLCHAIN(td))
|
|
|
|
PMC_CALL_HOOK_UNLOCKED(td, PMC_FN_USER_CALLCHAIN_SOFT, (void *) framep);
|
|
|
|
#endif
|
2007-09-17 05:31:39 +00:00
|
|
|
if (flags & TDF_ALRMPEND) {
|
2002-03-29 16:45:03 +00:00
|
|
|
PROC_LOCK(p);
|
2011-09-16 13:58:51 +00:00
|
|
|
kern_psignal(p, SIGVTALRM);
|
2002-03-29 16:45:03 +00:00
|
|
|
PROC_UNLOCK(p);
|
|
|
|
}
|
2007-09-17 05:31:39 +00:00
|
|
|
if (flags & TDF_PROFPEND) {
|
2002-03-29 16:45:03 +00:00
|
|
|
PROC_LOCK(p);
|
2011-09-16 13:58:51 +00:00
|
|
|
kern_psignal(p, SIGPROF);
|
2002-03-29 16:45:03 +00:00
|
|
|
PROC_UNLOCK(p);
|
|
|
|
}
|
2002-11-08 19:00:17 +00:00
|
|
|
#ifdef MAC
|
2007-09-17 05:31:39 +00:00
|
|
|
if (flags & TDF_MACPEND)
|
2002-11-08 19:00:17 +00:00
|
|
|
mac_thread_userret(td);
|
|
|
|
#endif
|
2003-02-17 09:55:10 +00:00
|
|
|
if (flags & TDF_NEEDRESCHED) {
|
2003-07-31 01:36:24 +00:00
|
|
|
#ifdef KTRACE
|
|
|
|
if (KTRPOINT(td, KTR_CSW))
|
2012-04-20 15:32:36 +00:00
|
|
|
ktrcsw(1, 1, __func__);
|
2003-07-31 01:36:24 +00:00
|
|
|
#endif
|
Commit 14/14 of sched_lock decomposition.
- Use thread_lock() rather than sched_lock for per-thread scheduling
sychronization.
- Use the per-process spinlock rather than the sched_lock for per-process
scheduling synchronization.
Tested by: kris, current@
Tested on: i386, amd64, ULE, 4BSD, libthr, libkse, PREEMPTION, etc.
Discussed with: kris, attilio, kmacy, jhb, julian, bde (small parts each)
2007-06-05 00:00:57 +00:00
|
|
|
thread_lock(td);
|
2006-10-26 21:42:22 +00:00
|
|
|
sched_prio(td, td->td_user_pri);
|
2019-12-15 21:26:50 +00:00
|
|
|
mi_switch(SW_INVOL | SWT_NEEDRESCHED);
|
2003-07-31 01:36:24 +00:00
|
|
|
#ifdef KTRACE
|
|
|
|
if (KTRPOINT(td, KTR_CSW))
|
2012-04-20 15:32:36 +00:00
|
|
|
ktrcsw(0, 1, __func__);
|
2003-07-31 01:36:24 +00:00
|
|
|
#endif
|
2002-04-04 17:49:48 +00:00
|
|
|
}
|
Currently, when signal is delivered to the process and there is a thread
not blocking the signal, signal is placed on the thread sigqueue. If
the selected thread is in kernel executing thr_exit() or sigprocmask()
syscalls, then signal might be not delivered to usermode for arbitrary
amount of time, and for exiting thread it is lost.
Put process-directed signals to the process queue unconditionally,
selecting the thread to deliver the signal only by the thread returning
to usermode, since only then the thread can handle delivery of signal
reliably. For exiting thread or thread that has blocked some signals,
check whether the newly blocked signal is queued for the process, and
try to find a thread to wakeup for delivery, in reschedule_signal(). For
exiting thread, assume that all signals are blocked.
Change cursig() and postsig() to look both into the thread and process
signal queues. When there is a signal that thread returning to usermode
could consume, TDF_NEEDSIGCHK flag is not neccessary set now. Do
unlocked read of p_siglist and p_pendingcnt to check for queued signals.
Note that thread that has a signal unblocked might get spurious wakeup
and EINTR from the interruptible system call now, due to the possibility
of being selected by reschedule_signals(), while other thread returned
to usermode earlier and removed the signal from process queue. This
should not cause compliance issues, since the thread has not blocked a
signal and thus should be ready to receive it anyway.
Reported by: Justin Teller <justin.teller gmail com>
Reviewed by: davidxu, jilles
MFC after: 1 month
2009-10-11 16:49:30 +00:00
|
|
|
|
2016-07-12 03:53:15 +00:00
|
|
|
#ifdef DIAGNOSTIC
|
|
|
|
if (p->p_numthreads == 1 && (flags & TDF_NEEDSIGCHK) == 0) {
|
|
|
|
PROC_LOCK(p);
|
|
|
|
thread_lock(td);
|
|
|
|
/*
|
|
|
|
* Note that TDF_NEEDSIGCHK should be re-read from
|
|
|
|
* td_flags, since signal might have been delivered
|
|
|
|
* after we cleared td_flags above. This is one of
|
|
|
|
* the reason for looping check for AST condition.
|
2016-07-18 10:53:47 +00:00
|
|
|
* See comment in userret() about P_PPWAIT.
|
2016-07-12 03:53:15 +00:00
|
|
|
*/
|
Add a way to manage thread signal mask using shared word, instead of syscall.
A new syscall sigfastblock(2) is added which registers a uint32_t
variable as containing the count of blocks for signal delivery. Its
content is read by kernel on each syscall entry and on AST processing,
non-zero count of blocks is interpreted same as the signal mask
blocking all signals.
The biggest downside of the feature that I see is that memory
corruption that affects the registered fast sigblock location, would
cause quite strange application misbehavior. For instance, the process
would be immune to ^C (but killable by SIGKILL).
With consumers (rtld and libthr added), benchmarks do not show a
slow-down of the syscalls in micro-measurements, and macro benchmarks
like buildworld do not demonstrate a difference. Part of the reason is
that buildworld time is dominated by compiler, and clang already links
to libthr. On the other hand, small utilities typically used by shell
scripts have the total number of syscalls cut by half.
The syscall is not exported from the stable libc version namespace on
purpose. It is intended to be used only by our C runtime
implementation internals.
Tested by: pho
Disscussed with: cem, emaste, jilles
Sponsored by: The FreeBSD Foundation
Differential revision: https://reviews.freebsd.org/D12773
2020-02-09 11:53:12 +00:00
|
|
|
if ((p->p_flag & P_PPWAIT) == 0 &&
|
|
|
|
(td->td_pflags & TDP_SIGFASTBLOCK) == 0) {
|
|
|
|
if (SIGPENDING(td) && (td->td_flags &
|
|
|
|
(TDF_NEEDSIGCHK | TDF_ASTPENDING)) !=
|
|
|
|
(TDF_NEEDSIGCHK | TDF_ASTPENDING)) {
|
|
|
|
thread_unlock(td); /* fix dumps */
|
|
|
|
panic(
|
|
|
|
"failed2 to set signal flags for ast p %p td %p fl %x %x",
|
|
|
|
p, td, flags, td->td_flags);
|
|
|
|
}
|
2016-07-18 10:53:47 +00:00
|
|
|
}
|
2016-07-12 03:53:15 +00:00
|
|
|
thread_unlock(td);
|
|
|
|
PROC_UNLOCK(p);
|
|
|
|
}
|
|
|
|
#endif
|
|
|
|
|
Currently, when signal is delivered to the process and there is a thread
not blocking the signal, signal is placed on the thread sigqueue. If
the selected thread is in kernel executing thr_exit() or sigprocmask()
syscalls, then signal might be not delivered to usermode for arbitrary
amount of time, and for exiting thread it is lost.
Put process-directed signals to the process queue unconditionally,
selecting the thread to deliver the signal only by the thread returning
to usermode, since only then the thread can handle delivery of signal
reliably. For exiting thread or thread that has blocked some signals,
check whether the newly blocked signal is queued for the process, and
try to find a thread to wakeup for delivery, in reschedule_signal(). For
exiting thread, assume that all signals are blocked.
Change cursig() and postsig() to look both into the thread and process
signal queues. When there is a signal that thread returning to usermode
could consume, TDF_NEEDSIGCHK flag is not neccessary set now. Do
unlocked read of p_siglist and p_pendingcnt to check for queued signals.
Note that thread that has a signal unblocked might get spurious wakeup
and EINTR from the interruptible system call now, due to the possibility
of being selected by reschedule_signals(), while other thread returned
to usermode earlier and removed the signal from process queue. This
should not cause compliance issues, since the thread has not blocked a
signal and thus should be ready to receive it anyway.
Reported by: Justin Teller <justin.teller gmail com>
Reviewed by: davidxu, jilles
MFC after: 1 month
2009-10-11 16:49:30 +00:00
|
|
|
/*
|
|
|
|
* Check for signals. Unlocked reads of p_pendingcnt or
|
|
|
|
* p_siglist might cause process-directed signal to be handled
|
|
|
|
* later.
|
|
|
|
*/
|
|
|
|
if (flags & TDF_NEEDSIGCHK || p->p_pendingcnt > 0 ||
|
|
|
|
!SIGISEMPTY(p->p_siglist)) {
|
2020-02-20 15:34:02 +00:00
|
|
|
sigfastblock_fetch(td);
|
Add a way to manage thread signal mask using shared word, instead of syscall.
A new syscall sigfastblock(2) is added which registers a uint32_t
variable as containing the count of blocks for signal delivery. Its
content is read by kernel on each syscall entry and on AST processing,
non-zero count of blocks is interpreted same as the signal mask
blocking all signals.
The biggest downside of the feature that I see is that memory
corruption that affects the registered fast sigblock location, would
cause quite strange application misbehavior. For instance, the process
would be immune to ^C (but killable by SIGKILL).
With consumers (rtld and libthr added), benchmarks do not show a
slow-down of the syscalls in micro-measurements, and macro benchmarks
like buildworld do not demonstrate a difference. Part of the reason is
that buildworld time is dominated by compiler, and clang already links
to libthr. On the other hand, small utilities typically used by shell
scripts have the total number of syscalls cut by half.
The syscall is not exported from the stable libc version namespace on
purpose. It is intended to be used only by our C runtime
implementation internals.
Tested by: pho
Disscussed with: cem, emaste, jilles
Sponsored by: The FreeBSD Foundation
Differential revision: https://reviews.freebsd.org/D12773
2020-02-09 11:53:12 +00:00
|
|
|
if ((td->td_pflags & TDP_SIGFASTBLOCK) != 0 &&
|
|
|
|
td->td_sigblock_val != 0) {
|
2020-03-10 20:04:38 +00:00
|
|
|
sigfastblock_setpend(td, true);
|
Add a way to manage thread signal mask using shared word, instead of syscall.
A new syscall sigfastblock(2) is added which registers a uint32_t
variable as containing the count of blocks for signal delivery. Its
content is read by kernel on each syscall entry and on AST processing,
non-zero count of blocks is interpreted same as the signal mask
blocking all signals.
The biggest downside of the feature that I see is that memory
corruption that affects the registered fast sigblock location, would
cause quite strange application misbehavior. For instance, the process
would be immune to ^C (but killable by SIGKILL).
With consumers (rtld and libthr added), benchmarks do not show a
slow-down of the syscalls in micro-measurements, and macro benchmarks
like buildworld do not demonstrate a difference. Part of the reason is
that buildworld time is dominated by compiler, and clang already links
to libthr. On the other hand, small utilities typically used by shell
scripts have the total number of syscalls cut by half.
The syscall is not exported from the stable libc version namespace on
purpose. It is intended to be used only by our C runtime
implementation internals.
Tested by: pho
Disscussed with: cem, emaste, jilles
Sponsored by: The FreeBSD Foundation
Differential revision: https://reviews.freebsd.org/D12773
2020-02-09 11:53:12 +00:00
|
|
|
} else {
|
2020-02-20 21:25:12 +00:00
|
|
|
PROC_LOCK(p);
|
|
|
|
mtx_lock(&p->p_sigacts->ps_mtx);
|
Add a way to manage thread signal mask using shared word, instead of syscall.
A new syscall sigfastblock(2) is added which registers a uint32_t
variable as containing the count of blocks for signal delivery. Its
content is read by kernel on each syscall entry and on AST processing,
non-zero count of blocks is interpreted same as the signal mask
blocking all signals.
The biggest downside of the feature that I see is that memory
corruption that affects the registered fast sigblock location, would
cause quite strange application misbehavior. For instance, the process
would be immune to ^C (but killable by SIGKILL).
With consumers (rtld and libthr added), benchmarks do not show a
slow-down of the syscalls in micro-measurements, and macro benchmarks
like buildworld do not demonstrate a difference. Part of the reason is
that buildworld time is dominated by compiler, and clang already links
to libthr. On the other hand, small utilities typically used by shell
scripts have the total number of syscalls cut by half.
The syscall is not exported from the stable libc version namespace on
purpose. It is intended to be used only by our C runtime
implementation internals.
Tested by: pho
Disscussed with: cem, emaste, jilles
Sponsored by: The FreeBSD Foundation
Differential revision: https://reviews.freebsd.org/D12773
2020-02-09 11:53:12 +00:00
|
|
|
while ((sig = cursig(td)) != 0) {
|
|
|
|
KASSERT(sig >= 0, ("sig %d", sig));
|
|
|
|
postsig(sig);
|
|
|
|
}
|
2020-02-20 21:25:12 +00:00
|
|
|
mtx_unlock(&p->p_sigacts->ps_mtx);
|
|
|
|
PROC_UNLOCK(p);
|
2016-07-12 03:52:05 +00:00
|
|
|
}
|
2002-04-04 17:49:48 +00:00
|
|
|
}
|
Add a way to manage thread signal mask using shared word, instead of syscall.
A new syscall sigfastblock(2) is added which registers a uint32_t
variable as containing the count of blocks for signal delivery. Its
content is read by kernel on each syscall entry and on AST processing,
non-zero count of blocks is interpreted same as the signal mask
blocking all signals.
The biggest downside of the feature that I see is that memory
corruption that affects the registered fast sigblock location, would
cause quite strange application misbehavior. For instance, the process
would be immune to ^C (but killable by SIGKILL).
With consumers (rtld and libthr added), benchmarks do not show a
slow-down of the syscalls in micro-measurements, and macro benchmarks
like buildworld do not demonstrate a difference. Part of the reason is
that buildworld time is dominated by compiler, and clang already links
to libthr. On the other hand, small utilities typically used by shell
scripts have the total number of syscalls cut by half.
The syscall is not exported from the stable libc version namespace on
purpose. It is intended to be used only by our C runtime
implementation internals.
Tested by: pho
Disscussed with: cem, emaste, jilles
Sponsored by: The FreeBSD Foundation
Differential revision: https://reviews.freebsd.org/D12773
2020-02-09 11:53:12 +00:00
|
|
|
|
|
|
|
/*
|
|
|
|
* Handle deferred update of the fast sigblock value, after
|
|
|
|
* the postsig() loop was performed.
|
|
|
|
*/
|
2020-02-20 15:34:02 +00:00
|
|
|
if (td->td_pflags & TDP_SIGFASTPENDING)
|
2020-03-10 20:04:38 +00:00
|
|
|
sigfastblock_setpend(td, false);
|
Add a way to manage thread signal mask using shared word, instead of syscall.
A new syscall sigfastblock(2) is added which registers a uint32_t
variable as containing the count of blocks for signal delivery. Its
content is read by kernel on each syscall entry and on AST processing,
non-zero count of blocks is interpreted same as the signal mask
blocking all signals.
The biggest downside of the feature that I see is that memory
corruption that affects the registered fast sigblock location, would
cause quite strange application misbehavior. For instance, the process
would be immune to ^C (but killable by SIGKILL).
With consumers (rtld and libthr added), benchmarks do not show a
slow-down of the syscalls in micro-measurements, and macro benchmarks
like buildworld do not demonstrate a difference. Part of the reason is
that buildworld time is dominated by compiler, and clang already links
to libthr. On the other hand, small utilities typically used by shell
scripts have the total number of syscalls cut by half.
The syscall is not exported from the stable libc version namespace on
purpose. It is intended to be used only by our C runtime
implementation internals.
Tested by: pho
Disscussed with: cem, emaste, jilles
Sponsored by: The FreeBSD Foundation
Differential revision: https://reviews.freebsd.org/D12773
2020-02-09 11:53:12 +00:00
|
|
|
|
2008-03-21 08:23:25 +00:00
|
|
|
/*
|
|
|
|
* We need to check to see if we have to exit or wait due to a
|
|
|
|
* single threading requirement or some other STOP condition.
|
|
|
|
*/
|
|
|
|
if (flags & TDF_NEEDSUSPCHK) {
|
|
|
|
PROC_LOCK(p);
|
|
|
|
thread_suspend_check(0);
|
|
|
|
PROC_UNLOCK(p);
|
|
|
|
}
|
1997-04-07 07:16:06 +00:00
|
|
|
|
2009-10-27 10:55:34 +00:00
|
|
|
if (td->td_pflags & TDP_OLDMASK) {
|
|
|
|
td->td_pflags &= ~TDP_OLDMASK;
|
|
|
|
kern_sigprocmask(td, SIG_SETMASK, &td->td_oldsigmask, NULL, 0);
|
|
|
|
}
|
|
|
|
|
2006-02-08 08:09:17 +00:00
|
|
|
userret(td, framep);
|
1997-04-07 07:16:06 +00:00
|
|
|
}
|
Reorganize syscall entry and leave handling.
Extend struct sysvec with three new elements:
sv_fetch_syscall_args - the method to fetch syscall arguments from
usermode into struct syscall_args. The structure is machine-depended
(this might be reconsidered after all architectures are converted).
sv_set_syscall_retval - the method to set a return value for usermode
from the syscall. It is a generalization of
cpu_set_syscall_retval(9) to allow ABIs to override the way to set a
return value.
sv_syscallnames - the table of syscall names.
Use sv_set_syscall_retval in kern_sigsuspend() instead of hardcoding
the call to cpu_set_syscall_retval().
The new functions syscallenter(9) and syscallret(9) are provided that
use sv_*syscall* pointers and contain the common repeated code from
the syscall() implementations for the architecture-specific syscall
trap handlers.
Syscallenter() fetches arguments, calls syscall implementation from
ABI sysent table, and set up return frame. The end of syscall
bookkeeping is done by syscallret().
Take advantage of single place for MI syscall handling code and
implement ptrace_lwpinfo pl_flags PL_FLAG_SCE, PL_FLAG_SCX and
PL_FLAG_EXEC. The SCE and SCX flags notify the debugger that the
thread is stopped at syscall entry or return point respectively. The
EXEC flag augments SCX and notifies debugger that the process address
space was changed by one of exec(2)-family syscalls.
The i386, amd64, sparc64, sun4v, powerpc and ia64 syscall()s are
changed to use syscallenter()/syscallret(). MIPS and arm are not
converted and use the mostly unchanged syscall() implementation.
Reviewed by: jhb, marcel, marius, nwhitehorn, stas
Tested by: marcel (ia64), marius (sparc64), nwhitehorn (powerpc),
stas (mips)
MFC after: 1 month
2010-05-23 18:32:02 +00:00
|
|
|
|
2010-05-26 15:39:43 +00:00
|
|
|
const char *
|
Reorganize syscall entry and leave handling.
Extend struct sysvec with three new elements:
sv_fetch_syscall_args - the method to fetch syscall arguments from
usermode into struct syscall_args. The structure is machine-depended
(this might be reconsidered after all architectures are converted).
sv_set_syscall_retval - the method to set a return value for usermode
from the syscall. It is a generalization of
cpu_set_syscall_retval(9) to allow ABIs to override the way to set a
return value.
sv_syscallnames - the table of syscall names.
Use sv_set_syscall_retval in kern_sigsuspend() instead of hardcoding
the call to cpu_set_syscall_retval().
The new functions syscallenter(9) and syscallret(9) are provided that
use sv_*syscall* pointers and contain the common repeated code from
the syscall() implementations for the architecture-specific syscall
trap handlers.
Syscallenter() fetches arguments, calls syscall implementation from
ABI sysent table, and set up return frame. The end of syscall
bookkeeping is done by syscallret().
Take advantage of single place for MI syscall handling code and
implement ptrace_lwpinfo pl_flags PL_FLAG_SCE, PL_FLAG_SCX and
PL_FLAG_EXEC. The SCE and SCX flags notify the debugger that the
thread is stopped at syscall entry or return point respectively. The
EXEC flag augments SCX and notifies debugger that the process address
space was changed by one of exec(2)-family syscalls.
The i386, amd64, sparc64, sun4v, powerpc and ia64 syscall()s are
changed to use syscallenter()/syscallret(). MIPS and arm are not
converted and use the mostly unchanged syscall() implementation.
Reviewed by: jhb, marcel, marius, nwhitehorn, stas
Tested by: marcel (ia64), marius (sparc64), nwhitehorn (powerpc),
stas (mips)
MFC after: 1 month
2010-05-23 18:32:02 +00:00
|
|
|
syscallname(struct proc *p, u_int code)
|
|
|
|
{
|
|
|
|
static const char unknown[] = "unknown";
|
2010-07-04 18:16:17 +00:00
|
|
|
struct sysentvec *sv;
|
Reorganize syscall entry and leave handling.
Extend struct sysvec with three new elements:
sv_fetch_syscall_args - the method to fetch syscall arguments from
usermode into struct syscall_args. The structure is machine-depended
(this might be reconsidered after all architectures are converted).
sv_set_syscall_retval - the method to set a return value for usermode
from the syscall. It is a generalization of
cpu_set_syscall_retval(9) to allow ABIs to override the way to set a
return value.
sv_syscallnames - the table of syscall names.
Use sv_set_syscall_retval in kern_sigsuspend() instead of hardcoding
the call to cpu_set_syscall_retval().
The new functions syscallenter(9) and syscallret(9) are provided that
use sv_*syscall* pointers and contain the common repeated code from
the syscall() implementations for the architecture-specific syscall
trap handlers.
Syscallenter() fetches arguments, calls syscall implementation from
ABI sysent table, and set up return frame. The end of syscall
bookkeeping is done by syscallret().
Take advantage of single place for MI syscall handling code and
implement ptrace_lwpinfo pl_flags PL_FLAG_SCE, PL_FLAG_SCX and
PL_FLAG_EXEC. The SCE and SCX flags notify the debugger that the
thread is stopped at syscall entry or return point respectively. The
EXEC flag augments SCX and notifies debugger that the process address
space was changed by one of exec(2)-family syscalls.
The i386, amd64, sparc64, sun4v, powerpc and ia64 syscall()s are
changed to use syscallenter()/syscallret(). MIPS and arm are not
converted and use the mostly unchanged syscall() implementation.
Reviewed by: jhb, marcel, marius, nwhitehorn, stas
Tested by: marcel (ia64), marius (sparc64), nwhitehorn (powerpc),
stas (mips)
MFC after: 1 month
2010-05-23 18:32:02 +00:00
|
|
|
|
2010-07-04 18:16:17 +00:00
|
|
|
sv = p->p_sysent;
|
|
|
|
if (sv->sv_syscallnames == NULL || code >= sv->sv_size)
|
Reorganize syscall entry and leave handling.
Extend struct sysvec with three new elements:
sv_fetch_syscall_args - the method to fetch syscall arguments from
usermode into struct syscall_args. The structure is machine-depended
(this might be reconsidered after all architectures are converted).
sv_set_syscall_retval - the method to set a return value for usermode
from the syscall. It is a generalization of
cpu_set_syscall_retval(9) to allow ABIs to override the way to set a
return value.
sv_syscallnames - the table of syscall names.
Use sv_set_syscall_retval in kern_sigsuspend() instead of hardcoding
the call to cpu_set_syscall_retval().
The new functions syscallenter(9) and syscallret(9) are provided that
use sv_*syscall* pointers and contain the common repeated code from
the syscall() implementations for the architecture-specific syscall
trap handlers.
Syscallenter() fetches arguments, calls syscall implementation from
ABI sysent table, and set up return frame. The end of syscall
bookkeeping is done by syscallret().
Take advantage of single place for MI syscall handling code and
implement ptrace_lwpinfo pl_flags PL_FLAG_SCE, PL_FLAG_SCX and
PL_FLAG_EXEC. The SCE and SCX flags notify the debugger that the
thread is stopped at syscall entry or return point respectively. The
EXEC flag augments SCX and notifies debugger that the process address
space was changed by one of exec(2)-family syscalls.
The i386, amd64, sparc64, sun4v, powerpc and ia64 syscall()s are
changed to use syscallenter()/syscallret(). MIPS and arm are not
converted and use the mostly unchanged syscall() implementation.
Reviewed by: jhb, marcel, marius, nwhitehorn, stas
Tested by: marcel (ia64), marius (sparc64), nwhitehorn (powerpc),
stas (mips)
MFC after: 1 month
2010-05-23 18:32:02 +00:00
|
|
|
return (unknown);
|
2010-07-04 18:16:17 +00:00
|
|
|
return (sv->sv_syscallnames[code]);
|
Reorganize syscall entry and leave handling.
Extend struct sysvec with three new elements:
sv_fetch_syscall_args - the method to fetch syscall arguments from
usermode into struct syscall_args. The structure is machine-depended
(this might be reconsidered after all architectures are converted).
sv_set_syscall_retval - the method to set a return value for usermode
from the syscall. It is a generalization of
cpu_set_syscall_retval(9) to allow ABIs to override the way to set a
return value.
sv_syscallnames - the table of syscall names.
Use sv_set_syscall_retval in kern_sigsuspend() instead of hardcoding
the call to cpu_set_syscall_retval().
The new functions syscallenter(9) and syscallret(9) are provided that
use sv_*syscall* pointers and contain the common repeated code from
the syscall() implementations for the architecture-specific syscall
trap handlers.
Syscallenter() fetches arguments, calls syscall implementation from
ABI sysent table, and set up return frame. The end of syscall
bookkeeping is done by syscallret().
Take advantage of single place for MI syscall handling code and
implement ptrace_lwpinfo pl_flags PL_FLAG_SCE, PL_FLAG_SCX and
PL_FLAG_EXEC. The SCE and SCX flags notify the debugger that the
thread is stopped at syscall entry or return point respectively. The
EXEC flag augments SCX and notifies debugger that the process address
space was changed by one of exec(2)-family syscalls.
The i386, amd64, sparc64, sun4v, powerpc and ia64 syscall()s are
changed to use syscallenter()/syscallret(). MIPS and arm are not
converted and use the mostly unchanged syscall() implementation.
Reviewed by: jhb, marcel, marius, nwhitehorn, stas
Tested by: marcel (ia64), marius (sparc64), nwhitehorn (powerpc),
stas (mips)
MFC after: 1 month
2010-05-23 18:32:02 +00:00
|
|
|
}
|