Commit Graph

536 Commits

Author SHA1 Message Date
John Baldwin
5fcfab6e32 Add ptrace(2) reporting for LWP events.
Add two new LWPINFO flags: PL_FLAG_BORN and PL_FLAG_EXITED for reporting
thread creation and destruction. Newly created threads will stop to report
PL_FLAG_BORN before returning to userland and exiting threads will stop to
report PL_FLAG_EXIT before exiting completely. Both of these events are
only enabled and reported if PT_LWP_EVENTS is enabled on a process.
2015-12-29 23:25:26 +00:00
Mark Johnston
3616095801 Fix style issues around existing SDT probes.
- Use SDT_PROBE<N>() instead of SDT_PROBE(). This has no functional effect
  at the moment, but will be needed for some future changes.
- Don't hardcode the module component of the probe identifier. This is
  set automatically by the SDT framework.

MFC after:	1 week
2015-12-16 23:39:27 +00:00
Andriy Gapon
2f2f522b5d save some bytes by using more concise SDT_PROBE<n> instead of SDT_PROBE
SDT_PROBE requires 5 parameters whereas SDT_PROBE<n> requires n parameters
where n is typically smaller than 5.

Perhaps SDT_PROBE should be made a private implementation detail.

MFC after:	20 days
2015-09-28 12:14:16 +00:00
Ed Schouten
f3fe76ecd8 Unignore signals when starting CloudABI processes.
As CloudABI processes cannot adjust their signal handlers, we need to
make sure that we start up CloudABI processes with consistent signal
masks. Though the POSIx standard signal behavior is all right, we do
need to make sure that we ignore SIGPIPE, as it would otherwise be
hard to interact with pipes and sockets.

Extend execsigs() to iterate over ps_sigignore and call sigdflt() for
each of the ignored signals.

Reviewed by:	kib
Obtained from:	https://github.com/NuxiNL/freebsd
Differential Revision:	https://reviews.freebsd.org/D3365
2015-08-12 11:30:31 +00:00
Konstantin Belousov
b4490c6e93 The si_status field of the siginfo_t, provided by the waitid(2) and
SIGCHLD signal, should keep full 32 bits of the status passed to the
_exit(2).

Split the combined p_xstat of the struct proc into the separate exit
status p_xexit for normal process exit, and signalled termination
information p_xsig.  Kernel-visible macro KW_EXITCODE() reconstructs
old p_xstat from p_xexit and p_xsig.  p_xexit contains complete status
and copied out into si_status.

Requested by:	Joerg Schilling
Reviewed by:	jilles (previous version), pho
Tested by:	pho
Sponsored by:	The FreeBSD Foundation
2015-07-18 09:02:50 +00:00
Ed Schouten
ea566832d7 Add missing const keyword to kern_sigaction()'s 'act' parameter.
This structure is not modified by the function. Also add const to
sigact_flag_test(), as it is called by kern_sigaction().
2015-07-10 14:39:46 +00:00
Mateusz Guzik
f6f6d24062 Implement lockless resource limits.
Use the same scheme implemented to manage credentials.

Code needing to look at process's credentials (as opposed to thred's) is
provided with *_proc variants of relevant functions.

Places which possibly had to take the proc lock anyway still use the proc
pointer to access limits.
2015-06-10 10:48:12 +00:00
Konstantin Belousov
aef68c961a When delivering a signal with default disposition to the thread,
tdsigwakeup() increases the priority of the low-priority threads, to
give them a chance to be terminated timely.  Also, kernel allows user
to signal kernel processes.  The combined effect is that signalling
idle process bump a priority of the selected delivery thread, which
starts eating CPU.

Check for the delivery thread be an idle thread and do not raise its
priority then.

The signal delivery to the kernel threads must be opt-in feature.
Kernel thread should explicitely declare the ability to handle signals
directed to it.  E.g., nfsd threads check for signal as an indication
of exit request.

Most threads do not handle signals at all, and queuing the signal to
them causes odd side-effects.  Most innocent consequence is the memory
leak due to queued ksiginfo, which is never deleted from the sigqueue.
Code to prevent even queuing signals to the kernel threads is trivial,
but it requires careful examination of each call to kproc/kthread
creation to decide should the signalling be allowed.  The commit is a
stop-gap measure which fixes the immediate case for now.

PR:	200493
Reported and tested by:	trasz
Discussed with:	trasz, emaste
Sponsored by:	The FreeBSD Foundation
MFC after:	1 week
2015-05-29 16:26:08 +00:00
John Baldwin
515b7a0b97 Add KTR tracing for some MI ptrace events.
Differential Revision:	https://reviews.freebsd.org/D2643
Reviewed by:	kib
2015-05-25 22:13:22 +00:00
Rui Paulo
0da9e11b7e Disable coredump_devctl because it could lead to leaking paths to
jails.
2015-03-24 02:17:17 +00:00
Mateusz Guzik
5bc0ff888a coredump: protect corefilename access with a lock
Previously format string traversal could happen while the string itself was
being modified.

Use allproc_lock as coredumping is a rare operation and as such we don't
have to create a dedicated lock.

Submitted by:	Tiwei Bie <btw mail.ustc.edu.cn>
Reviewed by:	kib
X-Additional:	JuniorJobs project
2015-03-21 04:39:33 +00:00
Mark Johnston
aa14e9b7c9 Reimplement support for userland core dump compression using a new interface
in kern_gzio.c. The old gzio interface was somewhat inflexible and has not
worked properly since r272535: currently, the gzio functions are called with
a range lock held on the output vnode, but kern_gzio.c does not pass the
IO_RANGELOCKED flag to vn_rdwr() calls, resulting in deadlock when vn_rdwr()
attempts to reacquire the range lock. Moreover, the new gzio interface can
be used to implement kernel core compression.

This change also modifies the kernel configuration options needed to enable
userland core dump compression support: gzio is now an option rather than a
device, and the COMPRESS_USER_CORES option is removed. Core dump compression
is enabled using the kern.compress_user_cores sysctl/tunable.

Differential Revision:	https://reviews.freebsd.org/D1832
Reviewed by:	rpaulo
Discussed with:	kib
2015-03-09 03:50:53 +00:00
Konstantin Belousov
dacbc9dbe7 Keep a reference on the coredump vnode for vn_fullpath() call. Do it
by moving vn_close() after the point where notification is sent.

Reported by:	sbruno
Tested by:	pho, sbruno
Sponsored by:	The FreeBSD Foundation
2015-02-24 13:07:31 +00:00
Rui Paulo
b5263b26db Remove check against NULL after M_WAITOK.
Submitted by:	Oliver Pinter
2015-02-11 19:07:05 +00:00
Rui Paulo
6fbc0f7d98 Restore the data array in coredump(), but use a different style to
calculate the length.

Requested by:	kib
2015-02-11 00:58:15 +00:00
Rui Paulo
624157bb5e Remove a printf and an strlen() from the coredump code. 2015-02-10 18:35:46 +00:00
Rui Paulo
eb6368d4f8 Sanitise the coredump file names sent to devd.
While there, add a sysctl to turn this feature off as requested by
kib@.
2015-02-10 04:34:39 +00:00
Rui Paulo
842ab62b05 Notify devd(8) when a process crashed.
This change implements a notification (via devctl) to userland when
the kernel produces coredumps after a process has crashed.
devd can then run a specific command to produce a human readable crash
report.  The command is most usually a helper that runs gdb/lldb
commands on the file/coredump pair.  It's possible to use this
functionality for implementing automatic generation of crash reports.

devd(8) will be notified of the full path of the binary that crashed and
the full path of the coredump file.
2015-02-09 23:13:50 +00:00
Konstantin Belousov
677258f7e7 Add procctl(2) PROC_TRACE_CTL command to enable or disable debugger
attachment to the process.  Note that the command is not intended to
be a security measure, rather it is an obfuscation feature,
implemented for parity with other operating systems.

Discussed with:	jilles, rwatson
Man page fixes by:	rwatson
Sponsored by:	The FreeBSD Foundation
MFC after:	1 week
2015-01-18 15:13:11 +00:00
Konstantin Belousov
e3612a4c1f Make SIGSTOP working for sleeps done while waiting for fifo readers or
writers in open(2), when the fifo is located on an NFS mount.

Reported by:	bde
Sponsored by:	The FreeBSD Foundation
MFC after:	1 week
2015-01-18 15:03:26 +00:00
Konstantin Belousov
271ab2406f For sigaction(2), ignore possible garbage in sa_flags for sa_handler
== SIG_DFL or SIG_IGN.  Sloppy code does not fully initialize struct
sigaction for such cases, and being too demanding in the case of
default handler does not catch anything.

Reported and tested by:	Alex Tutubalin <lexa@lexa.ru>
Sponsored by:	The FreeBSD Foundation
MFC after:	1 week
2015-01-16 07:06:58 +00:00
Konstantin Belousov
8ee9765a9d Add VN_OPEN_NAMECACHE flag for vn_open_cred(9), which requests that
the created file name was cached.  Use the flag for core dumps.

Requested by:	rpaulo
Tested by:	pho (previous version)
Sponsored by:	The FreeBSD Foundation
MFC after:	2 weeks
2014-12-21 13:32:07 +00:00
Konstantin Belousov
6ddcc23386 Add facility to stop all userspace processes. The supposed use of the
feature is to quisce the system before suspend.

Stop is implemented by reusing the thread_single(9) with the special
mode SINGLE_ALLPROC.  SINGLE_ALLPROC differs from the existing
single-threading modes by allowing (requiring) caller to operate on
other process.  Interruptible sleeps for !TDF_SBDRY threads are
suspended like SIGSTOP does it, instead of aborting the sleep, like
SINGLE_NO_EXIT, to avoid spurious EINTRs on resume.

Provide debugging sysctl debug.stop_all_proc, which causes total stop
and suspends syncer, while waiting for variable reset for resume.  It
is used for debugging; should be removed after the real use of the
interface is added.

In collaboration with:	pho
Discussed with:	avg
Sponsored by:	The FreeBSD Foundation
MFC after:	2 weeks
2014-12-13 16:18:29 +00:00
Konstantin Belousov
70778bba03 Assert the state of the process lock and sigact mutex in
kern_sigprocmask() and reschedule_signals().

Discussed with:	rea
Sponsored by:	The FreeBSD Foundation
MFC after:	1 week
2014-11-28 10:20:00 +00:00
Konstantin Belousov
e442f29f08 Fix SA_SIGINFO | SA_RESETHAND handling. The sysent' sv_sendsig()
method needs pre-reset state of the ps_siginfo to correctly construct
signal frame.

Move sigdflt() call after the sv_sendsig() invocation in postsig().
Simultaneously extract common code from trapsignal() and postsig()
into new helper postsig_done().

Submitted by:	rea
MFC after:	1 week
2014-11-26 14:09:04 +00:00
Konstantin Belousov
539c9eef12 Fixes for i/o during coredumping:
- Do not dump into system files.
- Do not acquire write reference to the mount point where img.core is
  written, in the coredump().  The vn_rdwr() calls from ELF imgact
  request the write ref from vn_rdwr().  Recursive acqusition of the
  write ref deadlocks with the unmount.
- Instead, take the range lock for the whole core file.  This prevents
  parallel dumping from two processes executing the same image,
  converting the useless interleaved dump into sequential dumping,
  with second core overwriting the first.

Tested by:	pho
Sponsored by:	The FreeBSD Foundation
MFC after:	2 weeks
2014-10-04 18:35:00 +00:00
Konstantin Belousov
c83655f334 Revert the handling of all siginfo sa_flags except SA_SIGINFO to the
pre-r270321.  Namely, the flags are preserved for SIG_DFL and SIG_IGN
dispositions.

Requested and reviewed by:	jilles
Sponsored by:	The FreeBSD Foundation
MFC after:	1 week
2014-08-24 16:37:50 +00:00
Mateusz Guzik
ce8daaadbd Use refcount_init in sigacts_alloc.
This change is a no-op, but fixes up an inconsistency introduced with
r268634.

MFC after:	3 days
2014-08-24 09:24:37 +00:00
Konstantin Belousov
350ae56373 Ensure that sigaction flags for signal, which disposition is reset to
ignored or default, are not leaking.  Apparently, there exists code
which relies on SA_SIGINFO not reported for SIG_DFL or SIG_IGN.

In kern_sigaction, ignore flags when resetting.  Encapsulate the flag
and disposition testing into helper sigact_flag_test().

On exec, and when delivering signal with SA_RESETHAND flag set,
signals are reset automatically.  Use new helper sigdflt(), which
removes duplicated code and corrects all flag bits for the signal.

For proc0, set sigintr bit for all ignored signals.  Ignored signals
are consumed in tdsendsignal() and not delivered to the victim thread
at all.

Reported and tested by:	royger
Sponsored by:	The FreeBSD Foundation
MFC after:	1 week
2014-08-22 08:19:08 +00:00
Konstantin Belousov
2d86417410 Check the validity of struct sigaction sa_flags value, reject unknown
flags.

Sponsored by:	The FreeBSD Foundation
MFC after:	1 week
2014-08-22 07:52:47 +00:00
Mateusz Guzik
c959c23740 Manage struct sigacts refcnt with atomics instead of a mutex.
MFC after:	1 week
2014-07-14 21:12:59 +00:00
Mateusz Guzik
d00c8ea429 Perform a lockless check in sigacts_shared.
It is used only during execve (i.e. singlethreaded), so there is no fear
of returning 'not shared' which soon becomes 'shared'.

While here reorganize the code a little to avoid proc lock/unlock in
shared case.

MFC after:	1 week
2014-07-01 06:29:15 +00:00
Hans Petter Selasky
af3b2549c4 Pull in r267961 and r267973 again. Fix for issues reported will follow. 2014-06-28 03:56:17 +00:00
Glen Barber
37a107a407 Revert r267961, r267973:
These changes prevent sysctl(8) from returning proper output,
such as:

 1) no output from sysctl(8)
 2) erroneously returning ENOMEM with tools like truss(1)
    or uname(1)
 truss: can not get etype: Cannot allocate memory
2014-06-27 22:05:21 +00:00
Hans Petter Selasky
3da1cf1e88 Extend the meaning of the CTLFLAG_TUN flag to automatically check if
there is an environment variable which shall initialize the SYSCTL
during early boot. This works for all SYSCTL types both statically and
dynamically created ones, except for the SYSCTL NODE type and SYSCTLs
which belong to VNETs. A new flag, CTLFLAG_NOFETCH, has been added to
be used in the case a tunable sysctl has a custom initialisation
function allowing the sysctl to still be marked as a tunable. The
kernel SYSCTL API is mostly the same, with a few exceptions for some
special operations like iterating childrens of a static/extern SYSCTL
node. This operation should probably be made into a factored out
common macro, hence some device drivers use this. The reason for
changing the SYSCTL API was the need for a SYSCTL parent OID pointer
and not only the SYSCTL parent OID list pointer in order to quickly
generate the sysctl path. The motivation behind this patch is to avoid
parameter loading cludges inside the OFED driver subsystem. Instead of
adding special code to the OFED driver subsystem to post-load tunables
into dynamically created sysctls, we generalize this in the kernel.

Other changes:
- Corrected a possibly incorrect sysctl name from "hw.cbb.intr_mask"
to "hw.pcic.intr_mask".
- Removed redundant TUNABLE statements throughout the kernel.
- Some minor code rewrites in connection to removing not needed
TUNABLE statements.
- Added a missing SYSCTL_DECL().
- Wrapped two very long lines.
- Avoid malloc()/free() inside sysctl string handling, in case it is
called to initialize a sysctl from a tunable, hence malloc()/free() is
not ready when sysctls from the sysctl dataset are registered.
- Bumped FreeBSD version to indicate SYSCTL API change.

MFC after:	2 weeks
Sponsored by:	Mellanox Technologies
2014-06-27 16:33:43 +00:00
Robert Watson
4a14441044 Update kernel inclusions of capability.h to use capsicum.h instead; some
further refinement is required as some device drivers intended to be
portable over FreeBSD versions rely on __FreeBSD_version to decide whether
to include capability.h.

MFC after:	3 weeks
2014-03-16 10:55:57 +00:00
Pawel Jakub Dawidek
f2b525e6b9 Make process descriptors standard part of the kernel. rwhod(8) already
requires process descriptors to work and having PROCDESC in GENERIC
seems not enough, especially that we hope to have more and more consumers
in the base.

MFC after:	3 days
2013-11-30 15:08:35 +00:00
Andriy Gapon
d9fae5ab88 dtrace sdt: remove the ugly sname parameter of SDT_PROBE_DEFINE
In its stead use the Solaris / illumos approach of emulating '-' (dash)
in probe names with '__' (two consecutive underscores).

Reviewed by:	markj
MFC after:	3 weeks
2013-11-26 08:46:27 +00:00
Attilio Rao
54366c0bd7 - For kernel compiled only with KDTRACE_HOOKS and not any lock debugging
option, unbreak the lock tracing release semantic by embedding
  calls to LOCKSTAT_PROFILE_RELEASE_LOCK() direclty in the inlined
  version of the releasing functions for mutex, rwlock and sxlock.
  Failing to do so skips the lockstat_probe_func invokation for
  unlocking.
- As part of the LOCKSTAT support is inlined in mutex operation, for
  kernel compiled without lock debugging options, potentially every
  consumer must be compiled including opt_kdtrace.h.
  Fix this by moving KDTRACE_HOOKS into opt_global.h and remove the
  dependency by opt_kdtrace.h for all files, as now only KDTRACE_FRAMES
  is linked there and it is only used as a compile-time stub [0].

[0] immediately shows some new bug as DTRACE-derived support for debug
in sfxge is broken and it was never really tested.  As it was not
including correctly opt_kdtrace.h before it was never enabled so it
was kept broken for a while.  Fix this by using a protection stub,
leaving sfxge driver authors the responsibility for fixing it
appropriately [1].

Sponsored by:	EMC / Isilon storage division
Discussed with:	rstone
[0] Reported by:	rstone
[1] Discussed with:	philip
2013-11-25 07:38:45 +00:00
Jilles Tjoelker
b20a9aa92a Fix siginfo_t.si_status for wait6/waitid/SIGCHLD.
Per POSIX, si_status should contain the value passed to exit() for
si_code==CLD_EXITED and the signal number for other si_code. This was
incorrect for CLD_EXITED and CLD_DUMPED.

This is still not fully POSIX-compliant (Austin group issue #594 says that
the full value passed to exit() shall be returned via si_status, not just
the low 8 bits) but is sufficient for a si_status-related test in libnih
(upstart, Debian/kFreeBSD).

PR:		kern/184002
Reported by:	Dmitrijs Ledkovs
Tested by:	Dmitrijs Ledkovs
2013-11-17 22:31:23 +00:00
Pawel Jakub Dawidek
7008be5bd7 Change the cap_rights_t type from uint64_t to a structure that we can extend
in the future in a backward compatible (API and ABI) way.

The cap_rights_t represents capability rights. We used to use one bit to
represent one right, but we are running out of spare bits. Currently the new
structure provides place for 114 rights (so 50 more than the previous
cap_rights_t), but it is possible to grow the structure to hold at least 285
rights, although we can make it even larger if 285 rights won't be enough.

The structure definition looks like this:

	struct cap_rights {
		uint64_t	cr_rights[CAP_RIGHTS_VERSION + 2];
	};

The initial CAP_RIGHTS_VERSION is 0.

The top two bits in the first element of the cr_rights[] array contain total
number of elements in the array - 2. This means if those two bits are equal to
0, we have 2 array elements.

The top two bits in all remaining array elements should be 0.
The next five bits in all array elements contain array index. Only one bit is
used and bit position in this five-bits range defines array index. This means
there can be at most five array elements in the future.

To define new right the CAPRIGHT() macro must be used. The macro takes two
arguments - an array index and a bit to set, eg.

	#define	CAP_PDKILL	CAPRIGHT(1, 0x0000000000000800ULL)

We still support aliases that combine few rights, but the rights have to belong
to the same array element, eg:

	#define	CAP_LOOKUP	CAPRIGHT(0, 0x0000000000000400ULL)
	#define	CAP_FCHMOD	CAPRIGHT(0, 0x0000000000002000ULL)

	#define	CAP_FCHMODAT	(CAP_FCHMOD | CAP_LOOKUP)

There is new API to manage the new cap_rights_t structure:

	cap_rights_t *cap_rights_init(cap_rights_t *rights, ...);
	void cap_rights_set(cap_rights_t *rights, ...);
	void cap_rights_clear(cap_rights_t *rights, ...);
	bool cap_rights_is_set(const cap_rights_t *rights, ...);

	bool cap_rights_is_valid(const cap_rights_t *rights);
	void cap_rights_merge(cap_rights_t *dst, const cap_rights_t *src);
	void cap_rights_remove(cap_rights_t *dst, const cap_rights_t *src);
	bool cap_rights_contains(const cap_rights_t *big, const cap_rights_t *little);

Capability rights to the cap_rights_init(), cap_rights_set(),
cap_rights_clear() and cap_rights_is_set() functions are provided by
separating them with commas, eg:

	cap_rights_t rights;

	cap_rights_init(&rights, CAP_READ, CAP_WRITE, CAP_FSTAT);

There is no need to terminate the list of rights, as those functions are
actually macros that take care of the termination, eg:

	#define	cap_rights_set(rights, ...)				\
		__cap_rights_set((rights), __VA_ARGS__, 0ULL)
	void __cap_rights_set(cap_rights_t *rights, ...);

Thanks to using one bit as an array index we can assert in those functions that
there are no two rights belonging to different array elements provided
together. For example this is illegal and will be detected, because CAP_LOOKUP
belongs to element 0 and CAP_PDKILL to element 1:

	cap_rights_init(&rights, CAP_LOOKUP | CAP_PDKILL);

Providing several rights that belongs to the same array's element this way is
correct, but is not advised. It should only be used for aliases definition.

This commit also breaks compatibility with some existing Capsicum system calls,
but I see no other way to do that. This should be fine as Capsicum is still
experimental and this change is not going to 9.x.

Sponsored by:	The FreeBSD Foundation
2013-09-05 00:09:56 +00:00
Mark Johnston
7b77e1fe0f Specify SDT probe argument types in the probe definition itself rather than
using SDT_PROBE_ARGTYPE(). This will make it easy to extend the SDT(9) API
to allow probes with dynamically-translated types.

There is no functional change.

MFC after:	2 weeks
2013-08-15 04:08:55 +00:00
Mateusz Guzik
462314b3f9 Remove duplicate assertion from tdsendsignal.
MFC after:	2 weeks
2013-07-22 00:44:37 +00:00
Gleb Smirnoff
b9ce4f67ae Fix memory leak in coredump().
Reviewed by:	kib
2013-04-05 20:24:51 +00:00
John Baldwin
1968f37bc9 Tweak some comments. 2013-03-18 18:04:09 +00:00
John Baldwin
3cf3b9f097 Partially revert r195702. Deferring stops is now implemented via a set of
calls to toggle TDF_SBDRY rather than passing PBDRY to individual sleep
calls.
- Remove the stop_allowed parameters from cursig() and issignal().
  issignal() checks TDF_SBDRY directly.
- Remove the PBDRY and SLEEPQ_STOP_ON_BDRY flags.
2013-03-18 17:23:58 +00:00
John Baldwin
593efaf9f7 Further refine the handling of stop signals in the NFS client. The
changes in r246417 were incomplete as they did not add explicit calls to
sigdeferstop() around all the places that previously passed SBDRY to
_sleep().  In addition, nfs_getcacheblk() could trigger a write RPC from
getblk() resulting in sigdeferstop() recursing.  Rather than manually
deferring stop signals in specific places, change the VFS_*() and VOP_*()
methods to defer stop signals for filesystems which request this behavior
via a new VFCF_SBDRY flag.  Note that this has to be a VFC flag rather than
a MNTK flag so that it works properly with VFS_MOUNT() when the mount is
not yet fully constructed.  For now, only the NFS clients are set this new
flag in VFS_SET().

A few other related changes:
- Add an assertion to ensure that TDF_SBDRY doesn't leak to userland.
- When a lookup request uses VOP_READLINK() to follow a symlink, mark
  the request as being on behalf of the thread performing the lookup
  (cnp_thread) rather than using a NULL thread pointer.  This causes
  NFS to properly handle signals during this VOP on an interruptible
  mount.

PR:		kern/176179
Reported by:	Russell Cattelan (sigdeferstop() recursion)
Reviewed by:	kib
MFC after:	1 month
2013-02-21 19:02:50 +00:00
Pawel Jakub Dawidek
6c08be2b88 Add break to the default case. 2013-02-17 11:47:58 +00:00
Konstantin Belousov
888d4d4f86 When vforked child is traced, the debugging events are not generated
until child performs exec().  The behaviour is reasonable when a
debugger is the real parent, because the parent is stopped until
exec(), and sending a debugging event to the debugger would deadlock
both parent and child.

On the other hand, when debugger is not the parent of the vforked
child, not sending debugging signals makes it impossible to debug
across vfork.

Fix the issue by declining generating debug signals only when vfork()
was done and child called ptrace(PT_TRACEME).  Set a new process flag
P_PPTRACE from the attach code for PT_TRACEME, if P_PPWAIT flag is
set, which indicates that the process was created with vfork() and
still did not execed. Check P_PPTRACE from issignal(), instead of
refusing the trace outright for the P_PPWAIT case.  The scope of
P_PPTRACE is exactly contained in the scope of P_PPWAIT.

Found and tested by:  zont
Reviewed by:	pluknet
MFC after:	2 weeks
2013-02-07 15:34:22 +00:00
John Baldwin
a120a7a3cd Rework the handling of stop signals in the NFS client. The changes in
195702, 195703, and 195821 prevented a thread from suspending while holding
locks inside of NFS by forcing the thread to fail sleeps with EINTR or
ERESTART but defer the thread suspension to the user boundary.  However,
this had the effect that stopping a process during an NFS request could
abort the request and trigger EINTR errors that were visible to userland
processes (previously the thread would have suspended and completed the
request once it was resumed).

This change instead effectively masks stop signals while in the NFS client.
It uses the existing TDF_SBDRY flag to effect this since SIGSTOP cannot
be masked directly.  Also, instead of setting PBDRY on individual sleeps,
the NFS client now sets the TDF_SBDRY flag around each NFS request and
stop signals are masked for all sleeps during that region (the previous
change missed sleeps in lockmgr locks).  The end result is that stop
signals sent to threads performing an NFS request are completely
ignored until after the NFS request has finished processing and the
thread prepares to return to userland.  This restores the behavior of
stop signals being transparent to userland processes while still
preventing threads from suspending while holding NFS locks.

Reviewed by:	kib
MFC after:	1 month
2013-02-06 17:06:51 +00:00
Pawel Jakub Dawidek
c345faea5a Replace expand_name() function with corefile_open() function, which not
only returns name, but also vnode of corefile to use.

This simplifies the code and closes few races, especially in %I handling.

Reviewed by:	kib
Obtained from:	WHEEL Systems
2012-12-19 23:59:48 +00:00
Pawel Jakub Dawidek
22a5d85aa9 Use correct file permissions when looking for available core file if
kern.corefile contains %I.

Obtained from:	WHEEL Systems
2012-12-19 23:40:02 +00:00
Pawel Jakub Dawidek
07a8e07896 The 'flags' argument can be modified in vn_open_cred(), so we need to
set it for every loop interation.

Pointed out by:	kib
2012-12-19 12:14:08 +00:00
Pawel Jakub Dawidek
cc58032c44 Do not audit paths we try when kern.corefile contains %I.
Obtained from:	WHEEL Systems
2012-12-19 12:12:53 +00:00
Pawel Jakub Dawidek
29146f1a7a Style cleanups. 2012-12-19 12:10:14 +00:00
Pawel Jakub Dawidek
086053a370 The expand_name() function isn't called with the process lock held anymore,
so we can safely use malloc(M_WAITOK) now.

Pointed out by:	kib
2012-12-19 12:00:09 +00:00
Pawel Jakub Dawidek
f06f465db7 Minor style tweaks.
Obtained from:	WHEEL Systems
2012-12-17 10:51:22 +00:00
Pawel Jakub Dawidek
c52ff61196 Better variables naming in expand_name() to be more consistent with coredump().
Obtained from:	WHEEL Systems
2012-12-17 10:48:10 +00:00
Pawel Jakub Dawidek
dd57ce87eb Move expand_name() after process lock is released.
This fixed panic where we hold mutex (process lock) and try to obtain sleepable
lock (vnode lock in expand_name()). The panic could occur when %I was used
in kern.corefile.

Additionally we avoid expand_name() overhead when coredumps are disabled.

Obtained from:	WHEEL Systems
2012-12-16 14:53:27 +00:00
Pawel Jakub Dawidek
2ce1b32df2 Don't add audit record when coredumps are disabled or name cannot be expanded.
Discussed with:	rwatson
Obtained from:	WHEEL Systems
2012-12-16 14:24:59 +00:00
Pawel Jakub Dawidek
7e73ee85ab Make the check easier to read.
Obtained from:	WHEEL Systems
2012-12-16 14:14:18 +00:00
Pawel Jakub Dawidek
b039f8c2aa Use 'cred' variable.
Obtained from:	WHEEL Systems
2012-12-16 13:56:38 +00:00
Pawel Jakub Dawidek
b0c9d4d70e Add kern.capmode_coredump sysctl/tunable to allow processes in capability mode
to dump core.

Reviewed by:	rwatson
Obtained from:	WHEEL Systems
MFC after:	2 weeks
2012-11-27 10:38:11 +00:00
Pawel Jakub Dawidek
8890f5d020 Allow to use kill(2) in capability mode, but process can send a signal only
to himself. For example abort(3) at first tries to do kill(getpid(), SIGABRT)
which was failing in capability mode, so the code was failing back to exit(1).

Reviewed by:	rwatson
Obtained from:	WHEEL Systems
MFC after:	2 weeks
2012-11-27 10:22:40 +00:00
Pawel Jakub Dawidek
b62d05fcf9 Allow to modify kern.sugid_coredump and kern.corefile from loader.conf.
Obtained from:	WHEEL Systems
2012-11-27 10:16:48 +00:00
Pawel Jakub Dawidek
c320984687 More style fixes. 2012-11-27 10:15:58 +00:00
Pawel Jakub Dawidek
23c6445a4b Style fixes (mostly whitespaces). 2012-11-27 10:11:54 +00:00
Konstantin Belousov
5050aa86cf Remove the support for using non-mpsafe filesystem modules.
In particular, do not lock Giant conditionally when calling into the
filesystem module, remove the VFS_LOCK_GIANT() and related
macros. Stop handling buffers belonging to non-mpsafe filesystems.

The VFS_VERSION is bumped to indicate the interface change which does
not result in the interface signatures changes.

Conducted and reviewed by:	attilio
Tested by:	pho
2012-10-22 17:50:54 +00:00
Eitan Adler
3d74f47b90 Correct the killpg(2) return values:
Return EPERM if processes were found but they
were unable to be signaled.

Return the first error from p_cansignal if no signal was successful.

Reviewed by:	jilles
Approved by:	cperciva
MFC after:	1 week
2012-10-22 03:43:02 +00:00
Eitan Adler
10950e4651 Colin acked the wrong diff originally. fixed version coming soon.
Approved by:	cperciva (implicit)
2012-10-22 03:36:44 +00:00
Eitan Adler
2a1c0e4d4e Correct the killpg(2) return values:
Return EPERM if processes were found but they
were unable to be signaled.

Return the first error from p_cansignal if no signal was successful.

Reviewed by:	jilles
Approved by:	cperciva
MFC after:	1 week
2012-10-22 03:34:43 +00:00
John Baldwin
0f14f15b62 Ignore stop and continue signals sent to an exiting process. Stop signals
set p_xstat to the signal that triggered the stop, but p_xstat is also
used to hold the exit status of an exiting process.  Without this change,
a stop signal that arrived after a process was marked P_WEXIT but before
it was marked a zombie would overwrite the exit status with the stop signal
number.

Reviewed by:	kib
MFC after:	1 week
2012-09-13 15:51:18 +00:00
Konstantin Belousov
888aefef89 Deliver SIGSYS to the guilty thread, not to the process.
MFC after:	1 week
2012-08-18 18:17:10 +00:00
David Xu
7ce60f6013 Always clear p_xthread if current thread no longer needs it, in theory, if
debugger exited without calling ptrace(PT_DETACH), there is a time window
that the p_xthread may be pointing to non-existing thread, in practical,
this is not a problem because child process soon will be killed by parent
process.
2012-07-10 05:45:13 +00:00
Konstantin Belousov
2dd9ea6f70 Add thread-private flag to indicate that error value is already placed
in td_errno. Flag is supposed to be used by syscalls returning
EJUSTRETURN because errno was already placed into the usermode frame
by a call to set_syscall_retval(9). Both ktrace and dtrace get errno
value from td_errno if the flag is set.

Use the flag to fix sigsuspend(2) error return ktrace records.

Requested by:	bde
MFC after:	1 week
2012-04-12 10:48:43 +00:00
Jilles Tjoelker
8a8be77610 Remove unused and wrong SA_PROC internal signal property.
The SA_PROC signal property indicated whether each signal number is directed
at a specific thread or at the process in general. However, that depends on
how the signal was generated and not on the signal number. SA_PROC was not
used.
2012-04-09 21:58:58 +00:00
Ed Schouten
6472ac3d8a Mark all SYSCTL_NODEs static that have no corresponding SYSCTL_DECLs.
The SYSCTL_NODE macro defines a list that stores all child-elements of
that node. If there's no SYSCTL_DECL macro anywhere else, there's no
reason why it shouldn't be static.
2011-11-07 15:43:11 +00:00
Sergey Kandaurov
c241c5e49a Fix arguments list for proc:::signal-discard DTrace probe.
Reported by:	Anton Yuzhaninov <citrin citrin ru>
MFC after:	1 week
2011-10-28 15:22:51 +00:00
Konstantin Belousov
8e9a54ee46 The sigwait(3) function shall not return EINTR, according to the
POSIX/SUSvN. The sigwait(2) syscall does return EINTR, and libc.so.7
contains the wrapper sigwait(3) which hides EINTR from callers.  The
EINTR return is used by libthr to handle required cancellation point
in the sigwait(3).

To help the binaries linked against pre-libc.so.7, i.e. RELENG_6 and
earlier, to have right ABI for sigwait(3), transform EINTR return from
sigwait(2) into ERESTART.

Discussed with:	davidxu
MFC after:	1 week
2011-10-01 10:18:55 +00:00
Kip Macy
8451d0dd78 In order to maximize the re-usability of kernel code in user space this
patch modifies makesyscalls.sh to prefix all of the non-compatibility
calls (e.g. not linux_, freebsd32_) with sys_ and updates the kernel
entry points and all places in the code that use them. It also
fixes an additional name space collision between the kernel function
psignal and the libc function of the same name by renaming the kernel
psignal kern_psignal(). By introducing this change now we will ease future
MFCs that change syscalls.

Reviewed by:	rwatson
Approved by:	re (bz)
2011-09-16 13:58:51 +00:00
Jonathan Anderson
cfb5f76865 Add experimental support for process descriptors
A "process descriptor" file descriptor is used to manage processes
without using the PID namespace. This is required for Capsicum's
Capability Mode, where the PID namespace is unavailable.

New system calls pdfork(2) and pdkill(2) offer the functional equivalents
of fork(2) and kill(2). pdgetpid(2) allows querying the PID of the remote
process for debugging purposes. The currently-unimplemented pdwait(2) will,
in the future, allow querying rusage/exit status. In the interim, poll(2)
may be used to check (and wait for) process termination.

When a process is referenced by a process descriptor, it does not issue
SIGCHLD to the parent, making it suitable for use in libraries---a common
scenario when using library compartmentalisation from within large
applications (such as web browsers). Some observers may note a similarity
to Mach task ports; process descriptors provide a subset of this behaviour,
but in a UNIX style.

This feature is enabled by "options PROCDESC", but as with several other
Capsicum kernel features, is not enabled by default in GENERIC 9.0.

Reviewed by: jhb, kib
Approved by: re (kib), mentor (rwatson)
Sponsored by: Google Inc
2011-08-18 22:51:30 +00:00
Edward Tomasz Napierala
b8fdb0d94d Fix support for RACCT_CORE by merging forgotten file. 2011-05-26 18:54:07 +00:00
Jilles Tjoelker
6100955206 ktrace: Log the code for all signals (PSIG events).
The code provides information on how the signal was generated.

Formerly, the code was only logged for traps, much like only signal handlers
for traps received a meaningful si_code before FreeBSD 7.0.

In rare cases, no information is available and 0 is still logged.

MFC after:	1 week
2011-04-17 14:38:11 +00:00
John Baldwin
e806d352d2 Fix several places to ignore processes that are not yet fully constructed.
MFC after:	1 week
2011-04-06 17:47:22 +00:00
John Baldwin
c3b127e022 Small style fix. 2011-03-23 13:44:32 +00:00
Konstantin Belousov
6fa39a7327 Allow debugger to specify that children of the traced process should be
automatically traced. Extend the ptrace(PL_LWPINFO) to report that child
just forked.

Reviewed by:	davidxu, jhb
MFC after:	2 weeks
2011-01-25 10:59:21 +00:00
David Xu
407af02b6e In kern_sigtimedwait(), move initialization code out of process lock,
instead of using SIGISMEMBER to test every interesting signal, just
unmask the signal set and let cursig() return one, get the signal
after it returns, call reschedule_signal() after signals are blocked
again.

In kern_sigprocmask(), don't call reschedule_signal() when it is
unnecessary.

In reschedule_signal(), replace SIGISEMPTY() + SIGISMEMBER() with
sig_ffs(), rename variable 'i' to sig.
2010-10-14 08:01:33 +00:00
David Xu
fc4ecc1d48 sigqueue_collect_set() is no longer needed because other functions
maintain pending set correctly.
2010-10-13 06:28:40 +00:00
David Xu
cf7d9a8ca8 Create a global thread hash table to speed up thread lookup, use
rwlock to protect the table. In old code, thread lookup is done with
process lock held, to find a thread, kernel has to iterate through
process and thread list, this is quite inefficient.
With this change, test shows in extreme case performance is
dramatically improved.

Earlier patch was reviewed by: jhb, julian
2010-10-09 02:50:23 +00:00
Matthew D Fleming
4d369413e1 Replace sbuf_overflowed() with sbuf_error(), which returns any error
code associated with overflow or with the drain function.  While this
function is not expected to be used often, it produces more information
in the form of an errno that sbuf_overflowed() did.
2010-09-10 16:42:16 +00:00
David Xu
137cf33d5e rescure comments from RELENG_4. 2010-09-01 01:26:07 +00:00
David Xu
83b718eb07 If a process is being debugged, skips job control caused by SIGSTOP/SIGCONT
signals, because it is managed by debugger, however a normal signal sent to
a interruptibly sleeping thread wakes up the thread so it will handle the
signal when the process leaves the stopped state.

PR:	150138
MFC after:	1 week
2010-08-31 07:15:50 +00:00
Rui Paulo
79856499bd Add an extra comment to the SDT probes definition. This allows us to get
use '-' in probe names, matching the probe names in Solaris.[1]

Add userland SDT probes definitions to sys/sdt.h.

Sponsored by:	The FreeBSD Foundation
Discussed with:	rwaston [1]
2010-08-22 11:18:57 +00:00
David Xu
212bc4b337 Fix function name in error messages. 2010-07-20 02:23:12 +00:00
John Baldwin
fc8cca02c7 - Various style and whitespace fixes.
- Make sugid_coredump and kern_logsigexit private to kern_sig.c.

Submitted by:	bde (partially)
MFC after:	1 month
2010-07-08 19:15:26 +00:00
Konstantin Belousov
8a26007903 Extend ptrace(PT_LWPINFO) to report siginfo for the signal that caused
debugee stop. The change should keep the ABI. Take care of compat32.

Discussed with:	davidxu, jhb
MFC after:	2 weeks
2010-07-04 11:48:30 +00:00
John Baldwin
ad6eec7b9e Tweak the in-kernel API for sending signals to threads:
- Rename tdsignal() to tdsendsignal() and make it private to kern_sig.c.
- Add tdsignal() and tdksignal() routines that mirror psignal() and
  pksignal() except that they accept a thread as an argument instead of
  a process.  They send a signal to a specific thread rather than to an
  individual process.

Reviewed by:	kib
2010-06-29 20:41:52 +00:00
Konstantin Belousov
c51050129f Do not report a stack garbage as the old value for debug.ncores sysctl.
Reported by:	brucec
2010-06-21 09:51:25 +00:00
Konstantin Belousov
afe1a68827 Reorganize syscall entry and leave handling.
Extend struct sysvec with three new elements:
sv_fetch_syscall_args - the method to fetch syscall arguments from
  usermode into struct syscall_args. The structure is machine-depended
  (this might be reconsidered after all architectures are converted).
sv_set_syscall_retval - the method to set a return value for usermode
  from the syscall. It is a generalization of
  cpu_set_syscall_retval(9) to allow ABIs to override the way to set a
  return value.
sv_syscallnames - the table of syscall names.

Use sv_set_syscall_retval in kern_sigsuspend() instead of hardcoding
the call to cpu_set_syscall_retval().

The new functions syscallenter(9) and syscallret(9) are provided that
use sv_*syscall* pointers and contain the common repeated code from
the syscall() implementations for the architecture-specific syscall
trap handlers.

Syscallenter() fetches arguments, calls syscall implementation from
ABI sysent table, and set up return frame. The end of syscall
bookkeeping is done by syscallret().

Take advantage of single place for MI syscall handling code and
implement ptrace_lwpinfo pl_flags PL_FLAG_SCE, PL_FLAG_SCX and
PL_FLAG_EXEC. The SCE and SCX flags notify the debugger that the
thread is stopped at syscall entry or return point respectively.  The
EXEC flag augments SCX and notifies debugger that the process address
space was changed by one of exec(2)-family syscalls.

The i386, amd64, sparc64, sun4v, powerpc and ia64 syscall()s are
changed to use syscallenter()/syscallret(). MIPS and arm are not
converted and use the mostly unchanged syscall() implementation.

Reviewed by:	jhb, marcel, marius, nwhitehorn, stas
Tested by:	marcel (ia64), marius (sparc64), nwhitehorn (powerpc),
	stas (mips)
MFC after:	1 month
2010-05-23 18:32:02 +00:00
Alfred Perlstein
b7402d8269 Avoid allocating MAXHOSTNAMELEN bytes on the stack in expand_name(),
use the heap instead.

Obtained from: Juniper Networks

Reviewed by:	jhb
2010-04-30 03:15:00 +00:00