Commit Graph

557 Commits

Author SHA1 Message Date
Konstantin Belousov
396a0d4455 Do not wake up sleeping thread in reschedule_signals() if the signal
is blocked.  The spurious wakeup might result in spurious EINTR.

The reschedule_signals() function is called when the calling thread
has the signal mask changed.  For each newly blocked signal, we try to
find a thread which might have the signal not blocked.  If no such
thread exists, sigtd() returns random thread, which must not be waken
up.  I decided that re-checking, as suggested by PR submitter, is more
reasonable change than to change sigtd() interface, due to other uses
of sigtd().  signotify() already performs this check.

Submitted by:	Duane <parakleta@darkreality.org>
PR:	219228
Sponsored by:	The FreeBSD Foundation
MFC after:	1 week
2017-05-12 15:34:59 +00:00
Mark Johnston
0c46712ca1 Let ptracestop() suspend threads sleeping in an SBDRY section.
When a thread enters ptracestop(), for example because it had received
SIGSTOP from ptrace(PT_ATTACH), it attempts to suspend other threads in
the same process. In the case of a thread sleeping interruptibly in an
SBDRY section, sig_suspend_threads() must wake the thread and allow it to
reach the user-mode boundary. However, sig_suspend_threads() would
erroneously avoid waking up such threads, resulting in an apparent hang.

Reviewed by:	kib
Tested by:	pho
MFC after:	2 weeks
Sponsored by:	Dell EMC Isilon
2017-05-11 17:03:45 +00:00
Brooks Davis
f19351aad8 Provide a freebsd32 implementation of sigqueue()
The previous misuse of sys_sigqueue() was sending random register or
stack garbage to 64-bit targets.  The freebsd32 implementation preserves
the sival_int member of value when signaling a 64-bit process.

Document the mixed ABI implementation of union sigval and the
incompability of sival_ptr with pointer integrity schemes.

Reviewed by:	kib, wblock
MFC after:	1 week
Sponsored by:	DARPA, AFRL
Differential Revision:	https://reviews.freebsd.org/D10605
2017-05-05 18:49:39 +00:00
Tycho Nightingale
86be94fca3 Add support for capturing 'struct ptrace_lwpinfo' for signals
resulting in a process dumping core in the corefile.

Also extend procstat to view select members of 'struct ptrace_lwpinfo'
from the contents of the note.

Sponsored by:	Dell EMC Isilon
2017-03-30 18:21:36 +00:00
Konstantin Belousov
469ec1eb6a When clearing altsigstack settings on exec, do it to the right thread.
Diagnosed by:	smh
Sponsored by:	The FreeBSD Foundation
MFC after:	1 week
2017-03-17 13:37:37 +00:00
Eric Badger
b4d3325975 Don't clear p_ptevents on normal SIGKILL delivery
The ptrace() user has the option of discarding the signal. In such a
case, p_ptevents should not be modified. If the ptrace() user decides to
send a SIGKILL, ptevents will be cleared in ptracestop(). procfs events
do not have the capability to discard the signal, so continue to clear
the mask in that case.

Reviewed by:	jhb (initial revision)
MFC after:	1 week
Sponsored by:	Dell EMC
Differential Revision:	https://reviews.freebsd.org/D9939
2017-03-16 13:03:31 +00:00
Eric Badger
b38bd91f4f don't stop in issignal() if P_SINGLE_EXIT is set
Suppose a traced process is stopped in ptracestop() due to receipt of a
SIGSTOP signal, and is awaiting orders from the tracing process on how
to handle the signal. Before sending any such orders, the tracing
process exits. This should kill the traced process. But suppose a second
thread handles the SIGKILL and proceeds to exit1(), calling
thread_single(). The first thread will now awaken and will have a chance
to check once more if it should go to sleep due to the SIGSTOP.  It must
not sleep after P_SINGLE_EXIT has been set; this would prevent the
SIGKILL from taking effect, leaving a stopped orphan behind after the
tracing process dies.

Also add new tests for this condition.

Reviewed by:	kib
MFC after:	2 weeks
Sponsored by:	Dell EMC
Differential Revision:	https://reviews.freebsd.org/D9890
2017-03-07 13:41:01 +00:00
Ed Maste
e052a8b932 kern_sig.c: ANSIfy and remove archaic register keyword
Sponsored by:	The FreeBSD Foundation
2017-03-02 22:17:53 +00:00
Eric Badger
82a4538f31 Defer ptracestop() signals that cannot be delivered immediately
When a thread is stopped in ptracestop(), the ptrace(2) user may request
a signal be delivered upon resumption of the thread. Heretofore, those signals
were discarded unless ptracestop()'s caller was issignal(). Fix this by
modifying ptracestop() to queue up signals requested by the ptrace user that
will be delivered when possible. Take special care when the signal is SIGKILL
(usually generated from a PT_KILL request); no new stop events should be
triggered after a PT_KILL.

Add a number of tests for the new functionality. Several tests were authored
by jhb.

PR:		212607
Reviewed by:	kib
Approved by:	kib (mentor)
MFC after:	2 weeks
Sponsored by:	Dell EMC
In collaboration with:	jhb
Differential Revision:	https://reviews.freebsd.org/D9260
2017-02-20 15:53:16 +00:00
Ed Maste
69a2875821 Renumber license clauses in sys/kern to avoid skipping #3 2016-09-15 13:16:20 +00:00
Brooks Davis
ed6d876b19 Modernize the initalization of sigproptbl.
Use C99 designators to set the value of each slot and the nitems macro to
check for valid entries. In the process, switch to indexing by signal
number rather than signal-1 for improved clarity.

Obtained from:	CheriBSD (a6053c5abf)
Sponsored by:	DARPA, AFRL
Reviewed by:	kib
2016-09-06 22:03:53 +00:00
Brooks Davis
fd50a70770 Merge from CheriBSD:
Rename sigprop-table constants to SIGPROP_ from SA_ to reduce the
impression of a namespace collision.

Submitted by:	rwatson
Reviewed by:	jhb, kib (slightly different versions)
Obtained from:	CheriBSD (814ec5771c)
Sponsored by:	DARPA, AFRL
Differential Revision:	https://reviews.freebsd.org/D7616
2016-09-02 18:22:56 +00:00
Mark Johnston
7f649dda55 Correct a check for P2_PTRACE_FSTP in ptracestop().
MFC after:	1 day
2016-08-19 01:27:24 +00:00
Konstantin Belousov
b7a25e63b6 When a debugger attaches to the process, SIGSTOP is sent to the
target.  Due to a way issignal() selects the next signal to deliver
and report, if the simultaneous or already pending another signal
exists, that signal might be reported by the next waitpid(2) call.
This causes minor annoyance for debuggers, which must be prepared to
take any signal as the first event, then filter SIGSTOP later.

More importantly, for tools like gcore(1), which attach and then
detach without processing events, SIGSTOP might leak to be delivered
after PT_DETACH.  This results in the process being unintentionally
stopped after detach, which is fatal for automatic tools.

The solution is to force SIGSTOP to be the first signal reported after
the attach.  Attach code is modified to set P2_PTRACE_FSTP to indicate
that the attaching ritual was not yet finished, and issignal() prefers
SIGSTOP in that condition.  Also, the thread which handles
P2_PTRACE_FSTP is made to guarantee to own p_xthread during the first
waitpid(2).  All that ensures that SIGSTOP is consumed first.

Additionally, if P2_PTRACE_FSTP is still set on detach, which means
that waitpid(2) was not called at all, SIGSTOP is removed from the
queue, ensuring that the process is resumed on detach.

In issignal(), when acting on STOPing signals, remove the signal from
queue before suspending.  Otherwise parallel attach could result in
ptracestop() acting on that STOP as if it was the STOP signal from the
attach.  Then SIGSTOP from attach leaks again.

As a minor refactoring, some bits of the common attach code is moved
to new helper proc_set_traced().

Reported by:	markj
Reviewed by:	jhb, markj
Tested by:	pho
Sponsored by:	The FreeBSD Foundation
MFC after:	2 weeks
Differential revision:	https://reviews.freebsd.org/D7256
2016-07-28 08:41:13 +00:00
Stephen J. Kiernan
cc37baea09 Add the NUM_CORE_FILES kernel config option which specifies the limit for the
number of core files allowed by a particular process when using the %I core
file name pattern.

Sanity check at compile time to ensure the value is within the valid range of
0-10.

Reviewed by:	jtl, sjg
Approved by:	sjg (mentor)
Sponsored by:	Juniper Networks, Inc.
Differential Revision:	https://reviews.freebsd.org/D6812
2016-07-27 03:21:02 +00:00
John Baldwin
8d570f64aa Add a mask of optional ptrace() events.
ptrace() now stores a mask of optional events in p_ptevents.  Currently
this mask is a single integer, but it can be expanded into an array of
integers in the future.

Two new ptrace requests can be used to manipulate the event mask:
PT_GET_EVENT_MASK fetches the current event mask and PT_SET_EVENT_MASK
sets the current event mask.

The current set of events include:
- PTRACE_EXEC: trace calls to execve().
- PTRACE_SCE: trace system call entries.
- PTRACE_SCX: trace syscam call exits.
- PTRACE_FORK: trace forks and auto-attach to new child processes.
- PTRACE_LWP: trace LWP events.

The S_PT_SCX and S_PT_SCE events in the procfs p_stops flags have
been replaced by PTRACE_SCE and PTRACE_SCX.  PTRACE_FORK replaces
P_FOLLOW_FORK and PTRACE_LWP replaces P2_LWP_EVENTS.

The PT_FOLLOW_FORK and PT_LWP_EVENTS ptrace requests remain for
compatibility but now simply toggle corresponding flags in the
event mask.

While here, document that PT_SYSCALL, PT_TO_SCE, and PT_TO_SCX both
modify the event mask and continue the traced process.

Reviewed by:	kib
MFC after:	1 month
Differential Revision:	https://reviews.freebsd.org/D7044
2016-07-15 15:32:09 +00:00
Konstantin Belousov
46e47c4f8d Provide helper macros to detect 'non-silent SBDRY' state and to
calculate appropriate return value for stops.  Simplify the code by
using them.

Fix typo in sig_suspend_threads().  The thread which sleep must be
aborted is td2. (*)

In issignal(), when handling stopping signal for thread in
TD_SBDRY_INTR state, do not stop, this is wrong and fires assert.
This is yet another place where execution should be forced out of
SBDRY-protected region.  For such case, return -1 from issignal() and
translate it to corresponding error code in sleepq_catch_signals().
Assert that other consumers of cursig() are not affected by the new
return value. (*)

Micro-optimize, mostly VFS and VOP methods, by avoiding calling the
functions when SIGDEFERSTOP_NOP non-change is requested. (**)

Reported and tested by:	pho (*)
Requested by:	bde (**)
Sponsored by:	The FreeBSD Foundation
MFC after:	2 weeks
Approved by:	re (gjb)
2016-07-03 18:19:48 +00:00
Konstantin Belousov
6f56cb8dbf Complete r302215. TDF_SBDRY | TDF_SERESTART and TDF_SBDRY |
TDF_SEINTR flags values, unlike TDF_SBDRY, must be treated almost as
if TDF_SBDRY is not set for STOP signal delivery.  The only difference
is that sig_suspend_threads() should abort the sleep instead of doing
immediate suspension.

Reported by:	ngie
Sponsored by:	The FreeBSD Foundation
MFC after:	12 days
Approved by:	re (gjb)
2016-06-28 16:41:50 +00:00
Konstantin Belousov
9e590ff04b When filt_proc() removes event from the knlist due to the process
exiting (NOTE_EXIT->knlist_remove_inevent()), two things happen:
- knote kn_knlist pointer is reset
- INFLUX knote is removed from the process knlist.
And, there are two consequences:
- KN_LIST_UNLOCK() on such knote is nop
- there is nothing which would block exit1() from processing past the
  knlist_destroy() (and knlist_destroy() resets knlist lock pointers).
Both consequences result either in leaked process lock, or
dereferencing NULL function pointers for locking.

Handle this by stopping embedding the process knlist into struct proc.
Instead, the knlist is allocated together with struct proc, but marked
as autodestroy on the zombie reap, by knlist_detach() function.  The
knlist is freed when last kevent is removed from the list, in
particular, at the zombie reap time if the list is empty.  As result,
the knlist_remove_inevent() is no longer needed and removed.

Other changes:

In filt_procattach(), clear NOTE_EXEC and NOTE_FORK desired events
from kn_sfflags for knote registered by kernel to only get NOTE_CHILD
notifications.  The flags leak resulted in excessive
NOTE_EXEC/NOTE_FORK reports.

Fix immediate note activation in filt_procattach().  Condition should
be either the immediate CHILD_NOTE activation, or immediate NOTE_EXIT
report for the exiting process.

In knote_fork(), do not perform racy check for KN_INFLUX before kq
lock is taken.  Besides being racy, it did not accounted for notes
just added by scan (KN_SCAN).

Some minor and incomplete style fixes.

Analyzed and tested by:	Eric Badger <eric@badgerio.us>
Reviewed by:	jhb
Sponsored by:	The FreeBSD Foundation
MFC after:	2 weeks
Approved by:	re (gjb)
Differential revision:	https://reviews.freebsd.org/D6859
2016-06-27 21:52:17 +00:00
Konstantin Belousov
3a1e5dd8e6 Rewrite sigdeferstop(9) and sigallowstop(9) into more flexible
framework allowing to set the suspension policy for the dynamic block.
Extend the currently possible policies of stopping on interruptible
sleeps and ignoring such sleeps by two more: do not suspend at
interruptible sleeps, but interrupt them with either EINTR or ERESTART.

Reviewed by:	jilles
Tested by:	pho
Sponsored by:	The FreeBSD Foundation
MFC after:	2 weeks
Approved by:	re (gjb)
2016-06-26 20:07:24 +00:00
Mariusz Zaborski
fb4cdc96a8 Define tunable instead of using CTLFLAG_RWTUN flag with kern.corefile.
The allproc_lock lock used in the sysctl_kern_corefile function is initialized
in the procinit function which is called after setting sysctl values at boot.
That means if we set kern.corefile at boot we will be trying to use
lock with is uninitialized and machine will crash.

If we define kern.corefile as tunable instead of using CTFLAG_RWTUN we will
not call the sysctl_kern_corefile function and we will not use an uninitialized
lock. When machine will boot then we will start using function depending on
the lock.

Reviewed by:	pjd
2016-06-09 20:23:30 +00:00
John Baldwin
5fcfab6e32 Add ptrace(2) reporting for LWP events.
Add two new LWPINFO flags: PL_FLAG_BORN and PL_FLAG_EXITED for reporting
thread creation and destruction. Newly created threads will stop to report
PL_FLAG_BORN before returning to userland and exiting threads will stop to
report PL_FLAG_EXIT before exiting completely. Both of these events are
only enabled and reported if PT_LWP_EVENTS is enabled on a process.
2015-12-29 23:25:26 +00:00
Mark Johnston
3616095801 Fix style issues around existing SDT probes.
- Use SDT_PROBE<N>() instead of SDT_PROBE(). This has no functional effect
  at the moment, but will be needed for some future changes.
- Don't hardcode the module component of the probe identifier. This is
  set automatically by the SDT framework.

MFC after:	1 week
2015-12-16 23:39:27 +00:00
Andriy Gapon
2f2f522b5d save some bytes by using more concise SDT_PROBE<n> instead of SDT_PROBE
SDT_PROBE requires 5 parameters whereas SDT_PROBE<n> requires n parameters
where n is typically smaller than 5.

Perhaps SDT_PROBE should be made a private implementation detail.

MFC after:	20 days
2015-09-28 12:14:16 +00:00
Ed Schouten
f3fe76ecd8 Unignore signals when starting CloudABI processes.
As CloudABI processes cannot adjust their signal handlers, we need to
make sure that we start up CloudABI processes with consistent signal
masks. Though the POSIx standard signal behavior is all right, we do
need to make sure that we ignore SIGPIPE, as it would otherwise be
hard to interact with pipes and sockets.

Extend execsigs() to iterate over ps_sigignore and call sigdflt() for
each of the ignored signals.

Reviewed by:	kib
Obtained from:	https://github.com/NuxiNL/freebsd
Differential Revision:	https://reviews.freebsd.org/D3365
2015-08-12 11:30:31 +00:00
Konstantin Belousov
b4490c6e93 The si_status field of the siginfo_t, provided by the waitid(2) and
SIGCHLD signal, should keep full 32 bits of the status passed to the
_exit(2).

Split the combined p_xstat of the struct proc into the separate exit
status p_xexit for normal process exit, and signalled termination
information p_xsig.  Kernel-visible macro KW_EXITCODE() reconstructs
old p_xstat from p_xexit and p_xsig.  p_xexit contains complete status
and copied out into si_status.

Requested by:	Joerg Schilling
Reviewed by:	jilles (previous version), pho
Tested by:	pho
Sponsored by:	The FreeBSD Foundation
2015-07-18 09:02:50 +00:00
Ed Schouten
ea566832d7 Add missing const keyword to kern_sigaction()'s 'act' parameter.
This structure is not modified by the function. Also add const to
sigact_flag_test(), as it is called by kern_sigaction().
2015-07-10 14:39:46 +00:00
Mateusz Guzik
f6f6d24062 Implement lockless resource limits.
Use the same scheme implemented to manage credentials.

Code needing to look at process's credentials (as opposed to thred's) is
provided with *_proc variants of relevant functions.

Places which possibly had to take the proc lock anyway still use the proc
pointer to access limits.
2015-06-10 10:48:12 +00:00
Konstantin Belousov
aef68c961a When delivering a signal with default disposition to the thread,
tdsigwakeup() increases the priority of the low-priority threads, to
give them a chance to be terminated timely.  Also, kernel allows user
to signal kernel processes.  The combined effect is that signalling
idle process bump a priority of the selected delivery thread, which
starts eating CPU.

Check for the delivery thread be an idle thread and do not raise its
priority then.

The signal delivery to the kernel threads must be opt-in feature.
Kernel thread should explicitely declare the ability to handle signals
directed to it.  E.g., nfsd threads check for signal as an indication
of exit request.

Most threads do not handle signals at all, and queuing the signal to
them causes odd side-effects.  Most innocent consequence is the memory
leak due to queued ksiginfo, which is never deleted from the sigqueue.
Code to prevent even queuing signals to the kernel threads is trivial,
but it requires careful examination of each call to kproc/kthread
creation to decide should the signalling be allowed.  The commit is a
stop-gap measure which fixes the immediate case for now.

PR:	200493
Reported and tested by:	trasz
Discussed with:	trasz, emaste
Sponsored by:	The FreeBSD Foundation
MFC after:	1 week
2015-05-29 16:26:08 +00:00
John Baldwin
515b7a0b97 Add KTR tracing for some MI ptrace events.
Differential Revision:	https://reviews.freebsd.org/D2643
Reviewed by:	kib
2015-05-25 22:13:22 +00:00
Rui Paulo
0da9e11b7e Disable coredump_devctl because it could lead to leaking paths to
jails.
2015-03-24 02:17:17 +00:00
Mateusz Guzik
5bc0ff888a coredump: protect corefilename access with a lock
Previously format string traversal could happen while the string itself was
being modified.

Use allproc_lock as coredumping is a rare operation and as such we don't
have to create a dedicated lock.

Submitted by:	Tiwei Bie <btw mail.ustc.edu.cn>
Reviewed by:	kib
X-Additional:	JuniorJobs project
2015-03-21 04:39:33 +00:00
Mark Johnston
aa14e9b7c9 Reimplement support for userland core dump compression using a new interface
in kern_gzio.c. The old gzio interface was somewhat inflexible and has not
worked properly since r272535: currently, the gzio functions are called with
a range lock held on the output vnode, but kern_gzio.c does not pass the
IO_RANGELOCKED flag to vn_rdwr() calls, resulting in deadlock when vn_rdwr()
attempts to reacquire the range lock. Moreover, the new gzio interface can
be used to implement kernel core compression.

This change also modifies the kernel configuration options needed to enable
userland core dump compression support: gzio is now an option rather than a
device, and the COMPRESS_USER_CORES option is removed. Core dump compression
is enabled using the kern.compress_user_cores sysctl/tunable.

Differential Revision:	https://reviews.freebsd.org/D1832
Reviewed by:	rpaulo
Discussed with:	kib
2015-03-09 03:50:53 +00:00
Konstantin Belousov
dacbc9dbe7 Keep a reference on the coredump vnode for vn_fullpath() call. Do it
by moving vn_close() after the point where notification is sent.

Reported by:	sbruno
Tested by:	pho, sbruno
Sponsored by:	The FreeBSD Foundation
2015-02-24 13:07:31 +00:00
Rui Paulo
b5263b26db Remove check against NULL after M_WAITOK.
Submitted by:	Oliver Pinter
2015-02-11 19:07:05 +00:00
Rui Paulo
6fbc0f7d98 Restore the data array in coredump(), but use a different style to
calculate the length.

Requested by:	kib
2015-02-11 00:58:15 +00:00
Rui Paulo
624157bb5e Remove a printf and an strlen() from the coredump code. 2015-02-10 18:35:46 +00:00
Rui Paulo
eb6368d4f8 Sanitise the coredump file names sent to devd.
While there, add a sysctl to turn this feature off as requested by
kib@.
2015-02-10 04:34:39 +00:00
Rui Paulo
842ab62b05 Notify devd(8) when a process crashed.
This change implements a notification (via devctl) to userland when
the kernel produces coredumps after a process has crashed.
devd can then run a specific command to produce a human readable crash
report.  The command is most usually a helper that runs gdb/lldb
commands on the file/coredump pair.  It's possible to use this
functionality for implementing automatic generation of crash reports.

devd(8) will be notified of the full path of the binary that crashed and
the full path of the coredump file.
2015-02-09 23:13:50 +00:00
Konstantin Belousov
677258f7e7 Add procctl(2) PROC_TRACE_CTL command to enable or disable debugger
attachment to the process.  Note that the command is not intended to
be a security measure, rather it is an obfuscation feature,
implemented for parity with other operating systems.

Discussed with:	jilles, rwatson
Man page fixes by:	rwatson
Sponsored by:	The FreeBSD Foundation
MFC after:	1 week
2015-01-18 15:13:11 +00:00
Konstantin Belousov
e3612a4c1f Make SIGSTOP working for sleeps done while waiting for fifo readers or
writers in open(2), when the fifo is located on an NFS mount.

Reported by:	bde
Sponsored by:	The FreeBSD Foundation
MFC after:	1 week
2015-01-18 15:03:26 +00:00
Konstantin Belousov
271ab2406f For sigaction(2), ignore possible garbage in sa_flags for sa_handler
== SIG_DFL or SIG_IGN.  Sloppy code does not fully initialize struct
sigaction for such cases, and being too demanding in the case of
default handler does not catch anything.

Reported and tested by:	Alex Tutubalin <lexa@lexa.ru>
Sponsored by:	The FreeBSD Foundation
MFC after:	1 week
2015-01-16 07:06:58 +00:00
Konstantin Belousov
8ee9765a9d Add VN_OPEN_NAMECACHE flag for vn_open_cred(9), which requests that
the created file name was cached.  Use the flag for core dumps.

Requested by:	rpaulo
Tested by:	pho (previous version)
Sponsored by:	The FreeBSD Foundation
MFC after:	2 weeks
2014-12-21 13:32:07 +00:00
Konstantin Belousov
6ddcc23386 Add facility to stop all userspace processes. The supposed use of the
feature is to quisce the system before suspend.

Stop is implemented by reusing the thread_single(9) with the special
mode SINGLE_ALLPROC.  SINGLE_ALLPROC differs from the existing
single-threading modes by allowing (requiring) caller to operate on
other process.  Interruptible sleeps for !TDF_SBDRY threads are
suspended like SIGSTOP does it, instead of aborting the sleep, like
SINGLE_NO_EXIT, to avoid spurious EINTRs on resume.

Provide debugging sysctl debug.stop_all_proc, which causes total stop
and suspends syncer, while waiting for variable reset for resume.  It
is used for debugging; should be removed after the real use of the
interface is added.

In collaboration with:	pho
Discussed with:	avg
Sponsored by:	The FreeBSD Foundation
MFC after:	2 weeks
2014-12-13 16:18:29 +00:00
Konstantin Belousov
70778bba03 Assert the state of the process lock and sigact mutex in
kern_sigprocmask() and reschedule_signals().

Discussed with:	rea
Sponsored by:	The FreeBSD Foundation
MFC after:	1 week
2014-11-28 10:20:00 +00:00
Konstantin Belousov
e442f29f08 Fix SA_SIGINFO | SA_RESETHAND handling. The sysent' sv_sendsig()
method needs pre-reset state of the ps_siginfo to correctly construct
signal frame.

Move sigdflt() call after the sv_sendsig() invocation in postsig().
Simultaneously extract common code from trapsignal() and postsig()
into new helper postsig_done().

Submitted by:	rea
MFC after:	1 week
2014-11-26 14:09:04 +00:00
Konstantin Belousov
539c9eef12 Fixes for i/o during coredumping:
- Do not dump into system files.
- Do not acquire write reference to the mount point where img.core is
  written, in the coredump().  The vn_rdwr() calls from ELF imgact
  request the write ref from vn_rdwr().  Recursive acqusition of the
  write ref deadlocks with the unmount.
- Instead, take the range lock for the whole core file.  This prevents
  parallel dumping from two processes executing the same image,
  converting the useless interleaved dump into sequential dumping,
  with second core overwriting the first.

Tested by:	pho
Sponsored by:	The FreeBSD Foundation
MFC after:	2 weeks
2014-10-04 18:35:00 +00:00
Konstantin Belousov
c83655f334 Revert the handling of all siginfo sa_flags except SA_SIGINFO to the
pre-r270321.  Namely, the flags are preserved for SIG_DFL and SIG_IGN
dispositions.

Requested and reviewed by:	jilles
Sponsored by:	The FreeBSD Foundation
MFC after:	1 week
2014-08-24 16:37:50 +00:00
Mateusz Guzik
ce8daaadbd Use refcount_init in sigacts_alloc.
This change is a no-op, but fixes up an inconsistency introduced with
r268634.

MFC after:	3 days
2014-08-24 09:24:37 +00:00
Konstantin Belousov
350ae56373 Ensure that sigaction flags for signal, which disposition is reset to
ignored or default, are not leaking.  Apparently, there exists code
which relies on SA_SIGINFO not reported for SIG_DFL or SIG_IGN.

In kern_sigaction, ignore flags when resetting.  Encapsulate the flag
and disposition testing into helper sigact_flag_test().

On exec, and when delivering signal with SA_RESETHAND flag set,
signals are reset automatically.  Use new helper sigdflt(), which
removes duplicated code and corrects all flag bits for the signal.

For proc0, set sigintr bit for all ignored signals.  Ignored signals
are consumed in tdsendsignal() and not delivered to the victim thread
at all.

Reported and tested by:	royger
Sponsored by:	The FreeBSD Foundation
MFC after:	1 week
2014-08-22 08:19:08 +00:00
Konstantin Belousov
2d86417410 Check the validity of struct sigaction sa_flags value, reject unknown
flags.

Sponsored by:	The FreeBSD Foundation
MFC after:	1 week
2014-08-22 07:52:47 +00:00
Mateusz Guzik
c959c23740 Manage struct sigacts refcnt with atomics instead of a mutex.
MFC after:	1 week
2014-07-14 21:12:59 +00:00
Mateusz Guzik
d00c8ea429 Perform a lockless check in sigacts_shared.
It is used only during execve (i.e. singlethreaded), so there is no fear
of returning 'not shared' which soon becomes 'shared'.

While here reorganize the code a little to avoid proc lock/unlock in
shared case.

MFC after:	1 week
2014-07-01 06:29:15 +00:00
Hans Petter Selasky
af3b2549c4 Pull in r267961 and r267973 again. Fix for issues reported will follow. 2014-06-28 03:56:17 +00:00
Glen Barber
37a107a407 Revert r267961, r267973:
These changes prevent sysctl(8) from returning proper output,
such as:

 1) no output from sysctl(8)
 2) erroneously returning ENOMEM with tools like truss(1)
    or uname(1)
 truss: can not get etype: Cannot allocate memory
2014-06-27 22:05:21 +00:00
Hans Petter Selasky
3da1cf1e88 Extend the meaning of the CTLFLAG_TUN flag to automatically check if
there is an environment variable which shall initialize the SYSCTL
during early boot. This works for all SYSCTL types both statically and
dynamically created ones, except for the SYSCTL NODE type and SYSCTLs
which belong to VNETs. A new flag, CTLFLAG_NOFETCH, has been added to
be used in the case a tunable sysctl has a custom initialisation
function allowing the sysctl to still be marked as a tunable. The
kernel SYSCTL API is mostly the same, with a few exceptions for some
special operations like iterating childrens of a static/extern SYSCTL
node. This operation should probably be made into a factored out
common macro, hence some device drivers use this. The reason for
changing the SYSCTL API was the need for a SYSCTL parent OID pointer
and not only the SYSCTL parent OID list pointer in order to quickly
generate the sysctl path. The motivation behind this patch is to avoid
parameter loading cludges inside the OFED driver subsystem. Instead of
adding special code to the OFED driver subsystem to post-load tunables
into dynamically created sysctls, we generalize this in the kernel.

Other changes:
- Corrected a possibly incorrect sysctl name from "hw.cbb.intr_mask"
to "hw.pcic.intr_mask".
- Removed redundant TUNABLE statements throughout the kernel.
- Some minor code rewrites in connection to removing not needed
TUNABLE statements.
- Added a missing SYSCTL_DECL().
- Wrapped two very long lines.
- Avoid malloc()/free() inside sysctl string handling, in case it is
called to initialize a sysctl from a tunable, hence malloc()/free() is
not ready when sysctls from the sysctl dataset are registered.
- Bumped FreeBSD version to indicate SYSCTL API change.

MFC after:	2 weeks
Sponsored by:	Mellanox Technologies
2014-06-27 16:33:43 +00:00
Robert Watson
4a14441044 Update kernel inclusions of capability.h to use capsicum.h instead; some
further refinement is required as some device drivers intended to be
portable over FreeBSD versions rely on __FreeBSD_version to decide whether
to include capability.h.

MFC after:	3 weeks
2014-03-16 10:55:57 +00:00
Pawel Jakub Dawidek
f2b525e6b9 Make process descriptors standard part of the kernel. rwhod(8) already
requires process descriptors to work and having PROCDESC in GENERIC
seems not enough, especially that we hope to have more and more consumers
in the base.

MFC after:	3 days
2013-11-30 15:08:35 +00:00
Andriy Gapon
d9fae5ab88 dtrace sdt: remove the ugly sname parameter of SDT_PROBE_DEFINE
In its stead use the Solaris / illumos approach of emulating '-' (dash)
in probe names with '__' (two consecutive underscores).

Reviewed by:	markj
MFC after:	3 weeks
2013-11-26 08:46:27 +00:00
Attilio Rao
54366c0bd7 - For kernel compiled only with KDTRACE_HOOKS and not any lock debugging
option, unbreak the lock tracing release semantic by embedding
  calls to LOCKSTAT_PROFILE_RELEASE_LOCK() direclty in the inlined
  version of the releasing functions for mutex, rwlock and sxlock.
  Failing to do so skips the lockstat_probe_func invokation for
  unlocking.
- As part of the LOCKSTAT support is inlined in mutex operation, for
  kernel compiled without lock debugging options, potentially every
  consumer must be compiled including opt_kdtrace.h.
  Fix this by moving KDTRACE_HOOKS into opt_global.h and remove the
  dependency by opt_kdtrace.h for all files, as now only KDTRACE_FRAMES
  is linked there and it is only used as a compile-time stub [0].

[0] immediately shows some new bug as DTRACE-derived support for debug
in sfxge is broken and it was never really tested.  As it was not
including correctly opt_kdtrace.h before it was never enabled so it
was kept broken for a while.  Fix this by using a protection stub,
leaving sfxge driver authors the responsibility for fixing it
appropriately [1].

Sponsored by:	EMC / Isilon storage division
Discussed with:	rstone
[0] Reported by:	rstone
[1] Discussed with:	philip
2013-11-25 07:38:45 +00:00
Jilles Tjoelker
b20a9aa92a Fix siginfo_t.si_status for wait6/waitid/SIGCHLD.
Per POSIX, si_status should contain the value passed to exit() for
si_code==CLD_EXITED and the signal number for other si_code. This was
incorrect for CLD_EXITED and CLD_DUMPED.

This is still not fully POSIX-compliant (Austin group issue #594 says that
the full value passed to exit() shall be returned via si_status, not just
the low 8 bits) but is sufficient for a si_status-related test in libnih
(upstart, Debian/kFreeBSD).

PR:		kern/184002
Reported by:	Dmitrijs Ledkovs
Tested by:	Dmitrijs Ledkovs
2013-11-17 22:31:23 +00:00
Pawel Jakub Dawidek
7008be5bd7 Change the cap_rights_t type from uint64_t to a structure that we can extend
in the future in a backward compatible (API and ABI) way.

The cap_rights_t represents capability rights. We used to use one bit to
represent one right, but we are running out of spare bits. Currently the new
structure provides place for 114 rights (so 50 more than the previous
cap_rights_t), but it is possible to grow the structure to hold at least 285
rights, although we can make it even larger if 285 rights won't be enough.

The structure definition looks like this:

	struct cap_rights {
		uint64_t	cr_rights[CAP_RIGHTS_VERSION + 2];
	};

The initial CAP_RIGHTS_VERSION is 0.

The top two bits in the first element of the cr_rights[] array contain total
number of elements in the array - 2. This means if those two bits are equal to
0, we have 2 array elements.

The top two bits in all remaining array elements should be 0.
The next five bits in all array elements contain array index. Only one bit is
used and bit position in this five-bits range defines array index. This means
there can be at most five array elements in the future.

To define new right the CAPRIGHT() macro must be used. The macro takes two
arguments - an array index and a bit to set, eg.

	#define	CAP_PDKILL	CAPRIGHT(1, 0x0000000000000800ULL)

We still support aliases that combine few rights, but the rights have to belong
to the same array element, eg:

	#define	CAP_LOOKUP	CAPRIGHT(0, 0x0000000000000400ULL)
	#define	CAP_FCHMOD	CAPRIGHT(0, 0x0000000000002000ULL)

	#define	CAP_FCHMODAT	(CAP_FCHMOD | CAP_LOOKUP)

There is new API to manage the new cap_rights_t structure:

	cap_rights_t *cap_rights_init(cap_rights_t *rights, ...);
	void cap_rights_set(cap_rights_t *rights, ...);
	void cap_rights_clear(cap_rights_t *rights, ...);
	bool cap_rights_is_set(const cap_rights_t *rights, ...);

	bool cap_rights_is_valid(const cap_rights_t *rights);
	void cap_rights_merge(cap_rights_t *dst, const cap_rights_t *src);
	void cap_rights_remove(cap_rights_t *dst, const cap_rights_t *src);
	bool cap_rights_contains(const cap_rights_t *big, const cap_rights_t *little);

Capability rights to the cap_rights_init(), cap_rights_set(),
cap_rights_clear() and cap_rights_is_set() functions are provided by
separating them with commas, eg:

	cap_rights_t rights;

	cap_rights_init(&rights, CAP_READ, CAP_WRITE, CAP_FSTAT);

There is no need to terminate the list of rights, as those functions are
actually macros that take care of the termination, eg:

	#define	cap_rights_set(rights, ...)				\
		__cap_rights_set((rights), __VA_ARGS__, 0ULL)
	void __cap_rights_set(cap_rights_t *rights, ...);

Thanks to using one bit as an array index we can assert in those functions that
there are no two rights belonging to different array elements provided
together. For example this is illegal and will be detected, because CAP_LOOKUP
belongs to element 0 and CAP_PDKILL to element 1:

	cap_rights_init(&rights, CAP_LOOKUP | CAP_PDKILL);

Providing several rights that belongs to the same array's element this way is
correct, but is not advised. It should only be used for aliases definition.

This commit also breaks compatibility with some existing Capsicum system calls,
but I see no other way to do that. This should be fine as Capsicum is still
experimental and this change is not going to 9.x.

Sponsored by:	The FreeBSD Foundation
2013-09-05 00:09:56 +00:00
Mark Johnston
7b77e1fe0f Specify SDT probe argument types in the probe definition itself rather than
using SDT_PROBE_ARGTYPE(). This will make it easy to extend the SDT(9) API
to allow probes with dynamically-translated types.

There is no functional change.

MFC after:	2 weeks
2013-08-15 04:08:55 +00:00
Mateusz Guzik
462314b3f9 Remove duplicate assertion from tdsendsignal.
MFC after:	2 weeks
2013-07-22 00:44:37 +00:00
Gleb Smirnoff
b9ce4f67ae Fix memory leak in coredump().
Reviewed by:	kib
2013-04-05 20:24:51 +00:00
John Baldwin
1968f37bc9 Tweak some comments. 2013-03-18 18:04:09 +00:00
John Baldwin
3cf3b9f097 Partially revert r195702. Deferring stops is now implemented via a set of
calls to toggle TDF_SBDRY rather than passing PBDRY to individual sleep
calls.
- Remove the stop_allowed parameters from cursig() and issignal().
  issignal() checks TDF_SBDRY directly.
- Remove the PBDRY and SLEEPQ_STOP_ON_BDRY flags.
2013-03-18 17:23:58 +00:00
John Baldwin
593efaf9f7 Further refine the handling of stop signals in the NFS client. The
changes in r246417 were incomplete as they did not add explicit calls to
sigdeferstop() around all the places that previously passed SBDRY to
_sleep().  In addition, nfs_getcacheblk() could trigger a write RPC from
getblk() resulting in sigdeferstop() recursing.  Rather than manually
deferring stop signals in specific places, change the VFS_*() and VOP_*()
methods to defer stop signals for filesystems which request this behavior
via a new VFCF_SBDRY flag.  Note that this has to be a VFC flag rather than
a MNTK flag so that it works properly with VFS_MOUNT() when the mount is
not yet fully constructed.  For now, only the NFS clients are set this new
flag in VFS_SET().

A few other related changes:
- Add an assertion to ensure that TDF_SBDRY doesn't leak to userland.
- When a lookup request uses VOP_READLINK() to follow a symlink, mark
  the request as being on behalf of the thread performing the lookup
  (cnp_thread) rather than using a NULL thread pointer.  This causes
  NFS to properly handle signals during this VOP on an interruptible
  mount.

PR:		kern/176179
Reported by:	Russell Cattelan (sigdeferstop() recursion)
Reviewed by:	kib
MFC after:	1 month
2013-02-21 19:02:50 +00:00
Pawel Jakub Dawidek
6c08be2b88 Add break to the default case. 2013-02-17 11:47:58 +00:00
Konstantin Belousov
888d4d4f86 When vforked child is traced, the debugging events are not generated
until child performs exec().  The behaviour is reasonable when a
debugger is the real parent, because the parent is stopped until
exec(), and sending a debugging event to the debugger would deadlock
both parent and child.

On the other hand, when debugger is not the parent of the vforked
child, not sending debugging signals makes it impossible to debug
across vfork.

Fix the issue by declining generating debug signals only when vfork()
was done and child called ptrace(PT_TRACEME).  Set a new process flag
P_PPTRACE from the attach code for PT_TRACEME, if P_PPWAIT flag is
set, which indicates that the process was created with vfork() and
still did not execed. Check P_PPTRACE from issignal(), instead of
refusing the trace outright for the P_PPWAIT case.  The scope of
P_PPTRACE is exactly contained in the scope of P_PPWAIT.

Found and tested by:  zont
Reviewed by:	pluknet
MFC after:	2 weeks
2013-02-07 15:34:22 +00:00
John Baldwin
a120a7a3cd Rework the handling of stop signals in the NFS client. The changes in
195702, 195703, and 195821 prevented a thread from suspending while holding
locks inside of NFS by forcing the thread to fail sleeps with EINTR or
ERESTART but defer the thread suspension to the user boundary.  However,
this had the effect that stopping a process during an NFS request could
abort the request and trigger EINTR errors that were visible to userland
processes (previously the thread would have suspended and completed the
request once it was resumed).

This change instead effectively masks stop signals while in the NFS client.
It uses the existing TDF_SBDRY flag to effect this since SIGSTOP cannot
be masked directly.  Also, instead of setting PBDRY on individual sleeps,
the NFS client now sets the TDF_SBDRY flag around each NFS request and
stop signals are masked for all sleeps during that region (the previous
change missed sleeps in lockmgr locks).  The end result is that stop
signals sent to threads performing an NFS request are completely
ignored until after the NFS request has finished processing and the
thread prepares to return to userland.  This restores the behavior of
stop signals being transparent to userland processes while still
preventing threads from suspending while holding NFS locks.

Reviewed by:	kib
MFC after:	1 month
2013-02-06 17:06:51 +00:00
Pawel Jakub Dawidek
c345faea5a Replace expand_name() function with corefile_open() function, which not
only returns name, but also vnode of corefile to use.

This simplifies the code and closes few races, especially in %I handling.

Reviewed by:	kib
Obtained from:	WHEEL Systems
2012-12-19 23:59:48 +00:00
Pawel Jakub Dawidek
22a5d85aa9 Use correct file permissions when looking for available core file if
kern.corefile contains %I.

Obtained from:	WHEEL Systems
2012-12-19 23:40:02 +00:00
Pawel Jakub Dawidek
07a8e07896 The 'flags' argument can be modified in vn_open_cred(), so we need to
set it for every loop interation.

Pointed out by:	kib
2012-12-19 12:14:08 +00:00
Pawel Jakub Dawidek
cc58032c44 Do not audit paths we try when kern.corefile contains %I.
Obtained from:	WHEEL Systems
2012-12-19 12:12:53 +00:00
Pawel Jakub Dawidek
29146f1a7a Style cleanups. 2012-12-19 12:10:14 +00:00
Pawel Jakub Dawidek
086053a370 The expand_name() function isn't called with the process lock held anymore,
so we can safely use malloc(M_WAITOK) now.

Pointed out by:	kib
2012-12-19 12:00:09 +00:00
Pawel Jakub Dawidek
f06f465db7 Minor style tweaks.
Obtained from:	WHEEL Systems
2012-12-17 10:51:22 +00:00
Pawel Jakub Dawidek
c52ff61196 Better variables naming in expand_name() to be more consistent with coredump().
Obtained from:	WHEEL Systems
2012-12-17 10:48:10 +00:00
Pawel Jakub Dawidek
dd57ce87eb Move expand_name() after process lock is released.
This fixed panic where we hold mutex (process lock) and try to obtain sleepable
lock (vnode lock in expand_name()). The panic could occur when %I was used
in kern.corefile.

Additionally we avoid expand_name() overhead when coredumps are disabled.

Obtained from:	WHEEL Systems
2012-12-16 14:53:27 +00:00
Pawel Jakub Dawidek
2ce1b32df2 Don't add audit record when coredumps are disabled or name cannot be expanded.
Discussed with:	rwatson
Obtained from:	WHEEL Systems
2012-12-16 14:24:59 +00:00
Pawel Jakub Dawidek
7e73ee85ab Make the check easier to read.
Obtained from:	WHEEL Systems
2012-12-16 14:14:18 +00:00
Pawel Jakub Dawidek
b039f8c2aa Use 'cred' variable.
Obtained from:	WHEEL Systems
2012-12-16 13:56:38 +00:00
Pawel Jakub Dawidek
b0c9d4d70e Add kern.capmode_coredump sysctl/tunable to allow processes in capability mode
to dump core.

Reviewed by:	rwatson
Obtained from:	WHEEL Systems
MFC after:	2 weeks
2012-11-27 10:38:11 +00:00
Pawel Jakub Dawidek
8890f5d020 Allow to use kill(2) in capability mode, but process can send a signal only
to himself. For example abort(3) at first tries to do kill(getpid(), SIGABRT)
which was failing in capability mode, so the code was failing back to exit(1).

Reviewed by:	rwatson
Obtained from:	WHEEL Systems
MFC after:	2 weeks
2012-11-27 10:22:40 +00:00
Pawel Jakub Dawidek
b62d05fcf9 Allow to modify kern.sugid_coredump and kern.corefile from loader.conf.
Obtained from:	WHEEL Systems
2012-11-27 10:16:48 +00:00
Pawel Jakub Dawidek
c320984687 More style fixes. 2012-11-27 10:15:58 +00:00
Pawel Jakub Dawidek
23c6445a4b Style fixes (mostly whitespaces). 2012-11-27 10:11:54 +00:00
Konstantin Belousov
5050aa86cf Remove the support for using non-mpsafe filesystem modules.
In particular, do not lock Giant conditionally when calling into the
filesystem module, remove the VFS_LOCK_GIANT() and related
macros. Stop handling buffers belonging to non-mpsafe filesystems.

The VFS_VERSION is bumped to indicate the interface change which does
not result in the interface signatures changes.

Conducted and reviewed by:	attilio
Tested by:	pho
2012-10-22 17:50:54 +00:00
Eitan Adler
3d74f47b90 Correct the killpg(2) return values:
Return EPERM if processes were found but they
were unable to be signaled.

Return the first error from p_cansignal if no signal was successful.

Reviewed by:	jilles
Approved by:	cperciva
MFC after:	1 week
2012-10-22 03:43:02 +00:00
Eitan Adler
10950e4651 Colin acked the wrong diff originally. fixed version coming soon.
Approved by:	cperciva (implicit)
2012-10-22 03:36:44 +00:00
Eitan Adler
2a1c0e4d4e Correct the killpg(2) return values:
Return EPERM if processes were found but they
were unable to be signaled.

Return the first error from p_cansignal if no signal was successful.

Reviewed by:	jilles
Approved by:	cperciva
MFC after:	1 week
2012-10-22 03:34:43 +00:00
John Baldwin
0f14f15b62 Ignore stop and continue signals sent to an exiting process. Stop signals
set p_xstat to the signal that triggered the stop, but p_xstat is also
used to hold the exit status of an exiting process.  Without this change,
a stop signal that arrived after a process was marked P_WEXIT but before
it was marked a zombie would overwrite the exit status with the stop signal
number.

Reviewed by:	kib
MFC after:	1 week
2012-09-13 15:51:18 +00:00
Konstantin Belousov
888aefef89 Deliver SIGSYS to the guilty thread, not to the process.
MFC after:	1 week
2012-08-18 18:17:10 +00:00
David Xu
7ce60f6013 Always clear p_xthread if current thread no longer needs it, in theory, if
debugger exited without calling ptrace(PT_DETACH), there is a time window
that the p_xthread may be pointing to non-existing thread, in practical,
this is not a problem because child process soon will be killed by parent
process.
2012-07-10 05:45:13 +00:00
Konstantin Belousov
2dd9ea6f70 Add thread-private flag to indicate that error value is already placed
in td_errno. Flag is supposed to be used by syscalls returning
EJUSTRETURN because errno was already placed into the usermode frame
by a call to set_syscall_retval(9). Both ktrace and dtrace get errno
value from td_errno if the flag is set.

Use the flag to fix sigsuspend(2) error return ktrace records.

Requested by:	bde
MFC after:	1 week
2012-04-12 10:48:43 +00:00
Jilles Tjoelker
8a8be77610 Remove unused and wrong SA_PROC internal signal property.
The SA_PROC signal property indicated whether each signal number is directed
at a specific thread or at the process in general. However, that depends on
how the signal was generated and not on the signal number. SA_PROC was not
used.
2012-04-09 21:58:58 +00:00
Ed Schouten
6472ac3d8a Mark all SYSCTL_NODEs static that have no corresponding SYSCTL_DECLs.
The SYSCTL_NODE macro defines a list that stores all child-elements of
that node. If there's no SYSCTL_DECL macro anywhere else, there's no
reason why it shouldn't be static.
2011-11-07 15:43:11 +00:00
Sergey Kandaurov
c241c5e49a Fix arguments list for proc:::signal-discard DTrace probe.
Reported by:	Anton Yuzhaninov <citrin citrin ru>
MFC after:	1 week
2011-10-28 15:22:51 +00:00
Konstantin Belousov
8e9a54ee46 The sigwait(3) function shall not return EINTR, according to the
POSIX/SUSvN. The sigwait(2) syscall does return EINTR, and libc.so.7
contains the wrapper sigwait(3) which hides EINTR from callers.  The
EINTR return is used by libthr to handle required cancellation point
in the sigwait(3).

To help the binaries linked against pre-libc.so.7, i.e. RELENG_6 and
earlier, to have right ABI for sigwait(3), transform EINTR return from
sigwait(2) into ERESTART.

Discussed with:	davidxu
MFC after:	1 week
2011-10-01 10:18:55 +00:00