freebsd-dev

Author	SHA1	Message	Date
Konstantin Belousov	aef68c961a	When delivering a signal with default disposition to the thread, tdsigwakeup() increases the priority of the low-priority threads, to give them a chance to be terminated timely. Also, kernel allows user to signal kernel processes. The combined effect is that signalling the idle process bumps the priority of the selected delivery thread, which starts eating CPU. Check whether the delivery thread is an idle thread and do not raise its priority then. The signal delivery to the kernel threads must be an opt-in feature. A kernel thread should explicitly declare the ability to handle signals directed to it. E.g., nfsd threads check for signal as an indication of exit request. Most threads do not handle signals at all, and queuing the signal to them causes odd side-effects. The most innocent consequence is the memory leak due to queued ksiginfo, which is never deleted from the sigqueue. Code to prevent even queuing signals to the kernel threads is trivial, but it requires careful examination of each call to kproc/kthread creation to decide whether the signalling should be allowed. The commit is a stop-gap measure which fixes the immediate case for now. PR: 200493 Reported and tested by: trasz Discussed with: trasz, emaste Sponsored by: The FreeBSD Foundation MFC after: 1 week	2015-05-29 16:26:08 +00:00
John Baldwin	515b7a0b97	Add KTR tracing for some MI ptrace events. Differential Revision: https://reviews.freebsd.org/D2643 Reviewed by: kib	2015-05-25 22:13:22 +00:00
Rui Paulo	0da9e11b7e	Disable coredump_devctl because it could lead to leaking paths to jails.	2015-03-24 02:17:17 +00:00
Mateusz Guzik	5bc0ff888a	coredump: protect corefilename access with a lock Previously format string traversal could happen while the string itself was being modified. Use allproc_lock as coredumping is a rare operation and as such we don't have to create a dedicated lock. Submitted by: Tiwei Bie <btw mail.ustc.edu.cn> Reviewed by: kib X-Additional: JuniorJobs project	2015-03-21 04:39:33 +00:00
Mark Johnston	aa14e9b7c9	Reimplement support for userland core dump compression using a new interface in kern_gzio.c. The old gzio interface was somewhat inflexible and has not worked properly since r272535: currently, the gzio functions are called with a range lock held on the output vnode, but kern_gzio.c does not pass the IO_RANGELOCKED flag to vn_rdwr() calls, resulting in deadlock when vn_rdwr() attempts to reacquire the range lock. Moreover, the new gzio interface can be used to implement kernel core compression. This change also modifies the kernel configuration options needed to enable userland core dump compression support: gzio is now an option rather than a device, and the COMPRESS_USER_CORES option is removed. Core dump compression is enabled using the kern.compress_user_cores sysctl/tunable. Differential Revision: https://reviews.freebsd.org/D1832 Reviewed by: rpaulo Discussed with: kib	2015-03-09 03:50:53 +00:00
Konstantin Belousov	dacbc9dbe7	Keep a reference on the coredump vnode for vn_fullpath() call. Do it by moving vn_close() after the point where notification is sent. Reported by: sbruno Tested by: pho, sbruno Sponsored by: The FreeBSD Foundation	2015-02-24 13:07:31 +00:00
Rui Paulo	b5263b26db	Remove check against NULL after M_WAITOK. Submitted by: Oliver Pinter	2015-02-11 19:07:05 +00:00
Rui Paulo	6fbc0f7d98	Restore the data array in coredump(), but use a different style to calculate the length. Requested by: kib	2015-02-11 00:58:15 +00:00
Rui Paulo	624157bb5e	Remove a printf and an strlen() from the coredump code.	2015-02-10 18:35:46 +00:00
Rui Paulo	eb6368d4f8	Sanitise the coredump file names sent to devd. While there, add a sysctl to turn this feature off as requested by kib@.	2015-02-10 04:34:39 +00:00
Rui Paulo	842ab62b05	Notify devd(8) when a process crashed. This change implements a notification (via devctl) to userland when the kernel produces coredumps after a process has crashed. devd can then run a specific command to produce a human readable crash report. The command is most usually a helper that runs gdb/lldb commands on the file/coredump pair. It's possible to use this functionality for implementing automatic generation of crash reports. devd(8) will be notified of the full path of the binary that crashed and the full path of the coredump file.	2015-02-09 23:13:50 +00:00
Konstantin Belousov	677258f7e7	Add procctl(2) PROC_TRACE_CTL command to enable or disable debugger attachment to the process. Note that the command is not intended to be a security measure, rather it is an obfuscation feature, implemented for parity with other operating systems. Discussed with: jilles, rwatson Man page fixes by: rwatson Sponsored by: The FreeBSD Foundation MFC after: 1 week	2015-01-18 15:13:11 +00:00
Konstantin Belousov	e3612a4c1f	Make SIGSTOP work for sleeps done while waiting for fifo readers or writers in open(2), when the fifo is located on an NFS mount. Reported by: bde Sponsored by: The FreeBSD Foundation MFC after: 1 week	2015-01-18 15:03:26 +00:00
Konstantin Belousov	271ab2406f	For sigaction(2), ignore possible garbage in sa_flags for sa_handler == SIG_DFL or SIG_IGN. Sloppy code does not fully initialize struct sigaction for such cases, and being too demanding in the case of default handler does not catch anything. Reported and tested by: Alex Tutubalin <lexa@lexa.ru> Sponsored by: The FreeBSD Foundation MFC after: 1 week	2015-01-16 07:06:58 +00:00
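The sigaction(2) entry above concerns callers that leave parts of struct sigaction uninitialised when selecting SIG_DFL or SIG_IGN. As a userland illustration only, the following minimal sketch shows one way to fill in the whole structure before the call. The helper names `ignore_signal` and `signal_is_ignored` are hypothetical and do not come from the commit.

```c
#define _POSIX_C_SOURCE 200809L
#include <signal.h>
#include <string.h>

/*
 * Hypothetical helper: set the disposition of `sig` to SIG_IGN using a
 * fully initialised struct sigaction, so that sa_flags and sa_mask never
 * carry stack garbage into the kernel.
 */
int
ignore_signal(int sig)
{
	struct sigaction sa;

	memset(&sa, 0, sizeof(sa));	/* clears sa_flags and any padding */
	sa.sa_handler = SIG_IGN;
	sigemptyset(&sa.sa_mask);
	return (sigaction(sig, &sa, NULL));
}

/* Hypothetical helper: 1 if `sig` is ignored, 0 if not, -1 on error. */
int
signal_is_ignored(int sig)
{
	struct sigaction cur;

	if (sigaction(sig, NULL, &cur) != 0)
		return (-1);
	return (cur.sa_handler == SIG_IGN);
}
```

Calling `ignore_signal(SIGUSR1)` and then `signal_is_ignored(SIGUSR1)` is enough to exercise both helpers.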
Konstantin Belousov	8ee9765a9d	Add VN_OPEN_NAMECACHE flag for vn_open_cred(9), which requests that the created file name be cached. Use the flag for core dumps. Requested by: rpaulo Tested by: pho (previous version) Sponsored by: The FreeBSD Foundation MFC after: 2 weeks	2014-12-21 13:32:07 +00:00
Konstantin Belousov	6ddcc23386	Add facility to stop all userspace processes. The intended use of the feature is to quiesce the system before suspend. Stop is implemented by reusing the thread_single(9) with the special mode SINGLE_ALLPROC. SINGLE_ALLPROC differs from the existing single-threading modes by allowing (requiring) caller to operate on other process. Interruptible sleeps for !TDF_SBDRY threads are suspended like SIGSTOP does it, instead of aborting the sleep, like SINGLE_NO_EXIT, to avoid spurious EINTRs on resume. Provide debugging sysctl debug.stop_all_proc, which causes total stop and suspends syncer, while waiting for variable reset for resume. It is used for debugging; should be removed after the real use of the interface is added. In collaboration with: pho Discussed with: avg Sponsored by: The FreeBSD Foundation MFC after: 2 weeks	2014-12-13 16:18:29 +00:00
Konstantin Belousov	70778bba03	Assert the state of the process lock and sigact mutex in kern_sigprocmask() and reschedule_signals(). Discussed with: rea Sponsored by: The FreeBSD Foundation MFC after: 1 week	2014-11-28 10:20:00 +00:00
Konstantin Belousov	e442f29f08	Fix SA_SIGINFO | SA_RESETHAND handling. The sysent's sv_sendsig() method needs pre-reset state of the ps_siginfo to correctly construct signal frame. Move sigdflt() call after the sv_sendsig() invocation in postsig(). Simultaneously extract common code from trapsignal() and postsig() into new helper postsig_done(). Submitted by: rea MFC after: 1 week	2014-11-26 14:09:04 +00:00
Konstantin Belousov	539c9eef12	Fixes for i/o during coredumping: - Do not dump into system files. - Do not acquire write reference to the mount point where img.core is written, in the coredump(). The vn_rdwr() calls from ELF imgact request the write ref from vn_rdwr(). Recursive acquisition of the write ref deadlocks with the unmount. - Instead, take the range lock for the whole core file. This prevents parallel dumping from two processes executing the same image, converting the useless interleaved dump into sequential dumping, with second core overwriting the first. Tested by: pho Sponsored by: The FreeBSD Foundation MFC after: 2 weeks	2014-10-04 18:35:00 +00:00
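The SA_SIGINFO | SA_RESETHAND combination from the entry above can be observed from userland. The sketch below is not the kernel change itself. It installs a one-shot three-argument handler, raises the signal once, and reports whether the handler saw a populated siginfo_t and whether the disposition went back to SIG_DFL afterwards. The names `on_signal` and `oneshot_siginfo_demo` are hypothetical, and the sketch assumes the chosen signal is not blocked in the calling thread.

```c
#define _POSIX_C_SOURCE 200809L
#include <signal.h>
#include <string.h>

static volatile sig_atomic_t seen_signo;

/* Three-argument handler: records si_signo from the siginfo_t. */
static void
on_signal(int sig, siginfo_t *info, void *ucontext)
{
	(void)sig;
	(void)ucontext;
	seen_signo = info->si_signo;
}

/*
 * Hypothetical helper: install a one-shot SA_SIGINFO handler for `sig`,
 * raise the signal once, then query the disposition again.
 * On success returns 0, stores the si_signo seen by the handler in
 * *signo_out, and stores 1 in *reset_out if the handler was reset to
 * SIG_DFL after delivery.
 */
int
oneshot_siginfo_demo(int sig, int *signo_out, int *reset_out)
{
	struct sigaction sa, cur;

	memset(&sa, 0, sizeof(sa));
	sa.sa_sigaction = on_signal;
	sa.sa_flags = SA_SIGINFO | SA_RESETHAND;
	sigemptyset(&sa.sa_mask);
	seen_signo = 0;

	if (sigaction(sig, &sa, NULL) != 0)
		return (-1);
	if (raise(sig) != 0)		/* handler runs before raise() returns */
		return (-1);
	if (sigaction(sig, NULL, &cur) != 0)
		return (-1);

	*signo_out = seen_signo;
	*reset_out = (cur.sa_handler == SIG_DFL);
	return (0);
}
```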
Konstantin Belousov	c83655f334	Revert the handling of all siginfo sa_flags except SA_SIGINFO to the pre-r270321. Namely, the flags are preserved for SIG_DFL and SIG_IGN dispositions. Requested and reviewed by: jilles Sponsored by: The FreeBSD Foundation MFC after: 1 week	2014-08-24 16:37:50 +00:00
Mateusz Guzik	ce8daaadbd	Use refcount_init in sigacts_alloc. This change is a no-op, but fixes up an inconsistency introduced with r268634. MFC after: 3 days	2014-08-24 09:24:37 +00:00
Konstantin Belousov	350ae56373	Ensure that sigaction flags for signal, which disposition is reset to ignored or default, are not leaking. Apparently, there exists code which relies on SA_SIGINFO not reported for SIG_DFL or SIG_IGN. In kern_sigaction, ignore flags when resetting. Encapsulate the flag and disposition testing into helper sigact_flag_test(). On exec, and when delivering signal with SA_RESETHAND flag set, signals are reset automatically. Use new helper sigdflt(), which removes duplicated code and corrects all flag bits for the signal. For proc0, set sigintr bit for all ignored signals. Ignored signals are consumed in tdsendsignal() and not delivered to the victim thread at all. Reported and tested by: royger Sponsored by: The FreeBSD Foundation MFC after: 1 week	2014-08-22 08:19:08 +00:00
Konstantin Belousov	2d86417410	Check the validity of struct sigaction sa_flags value, reject unknown flags. Sponsored by: The FreeBSD Foundation MFC after: 1 week	2014-08-22 07:52:47 +00:00
Mateusz Guzik	c959c23740	Manage struct sigacts refcnt with atomics instead of a mutex. MFC after: 1 week	2014-07-14 21:12:59 +00:00
Mateusz Guzik	d00c8ea429	Perform a lockless check in sigacts_shared. It is used only during execve (i.e. singlethreaded), so there is no fear of returning 'not shared' which soon becomes 'shared'. While here reorganize the code a little to avoid proc lock/unlock in shared case. MFC after: 1 week	2014-07-01 06:29:15 +00:00
Hans Petter Selasky	af3b2549c4	Pull in r267961 and r267973 again. Fix for issues reported will follow.	2014-06-28 03:56:17 +00:00
Glen Barber	37a107a407	Revert r267961, r267973: These changes prevent sysctl(8) from returning proper output, such as: 1) no output from sysctl(8) 2) erroneously returning ENOMEM with tools like truss(1) or uname(1) truss: can not get etype: Cannot allocate memory	2014-06-27 22:05:21 +00:00
Hans Petter Selasky	3da1cf1e88	Extend the meaning of the CTLFLAG_TUN flag to automatically check if there is an environment variable which shall initialize the SYSCTL during early boot. This works for all SYSCTL types both statically and dynamically created ones, except for the SYSCTL NODE type and SYSCTLs which belong to VNETs. A new flag, CTLFLAG_NOFETCH, has been added to be used in the case a tunable sysctl has a custom initialisation function allowing the sysctl to still be marked as a tunable. The kernel SYSCTL API is mostly the same, with a few exceptions for some special operations like iterating children of a static/extern SYSCTL node. This operation should probably be made into a factored out common macro, hence some device drivers use this. The reason for changing the SYSCTL API was the need for a SYSCTL parent OID pointer and not only the SYSCTL parent OID list pointer in order to quickly generate the sysctl path. The motivation behind this patch is to avoid parameter loading kludges inside the OFED driver subsystem. Instead of adding special code to the OFED driver subsystem to post-load tunables into dynamically created sysctls, we generalize this in the kernel. Other changes: - Corrected a possibly incorrect sysctl name from "hw.cbb.intr_mask" to "hw.pcic.intr_mask". - Removed redundant TUNABLE statements throughout the kernel. - Some minor code rewrites in connection to removing not needed TUNABLE statements. - Added a missing SYSCTL_DECL(). - Wrapped two very long lines. - Avoid malloc()/free() inside sysctl string handling, in case it is called to initialize a sysctl from a tunable, hence malloc()/free() is not ready when sysctls from the sysctl dataset are registered. - Bumped FreeBSD version to indicate SYSCTL API change. MFC after: 2 weeks Sponsored by: Mellanox Technologies	2014-06-27 16:33:43 +00:00
Robert Watson	4a14441044	Update kernel inclusions of capability.h to use capsicum.h instead; some further refinement is required as some device drivers intended to be portable over FreeBSD versions rely on __FreeBSD_version to decide whether to include capability.h. MFC after: 3 weeks	2014-03-16 10:55:57 +00:00
Pawel Jakub Dawidek	f2b525e6b9	Make process descriptors standard part of the kernel. rwhod(8) already requires process descriptors to work and having PROCDESC in GENERIC seems not enough, especially that we hope to have more and more consumers in the base. MFC after: 3 days	2013-11-30 15:08:35 +00:00
Andriy Gapon	d9fae5ab88	dtrace sdt: remove the ugly sname parameter of SDT_PROBE_DEFINE In its stead use the Solaris / illumos approach of emulating '-' (dash) in probe names with '__' (two consecutive underscores). Reviewed by: markj MFC after: 3 weeks	2013-11-26 08:46:27 +00:00
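The naming convention in the entry above (two consecutive underscores in the C identifier standing for a dash in the probe name) can be illustrated with a small string transformation. This is only a sketch of the convention, not the code that SDT or DTrace uses. The function name `demo_probe_name` and the sample input `signal__send` are chosen for illustration.

```c
#include <stddef.h>

/*
 * Hypothetical helper: copy `src` into `dst`, replacing every "__" pair
 * with a single '-'. Returns 0 on success, -1 if `dst` is too small.
 */
int
demo_probe_name(const char *src, char *dst, size_t dstlen)
{
	size_t o = 0;

	if (dstlen == 0)
		return (-1);
	while (*src != '\0') {
		if (o + 1 >= dstlen)	/* keep room for the terminator */
			return (-1);
		if (src[0] == '_' && src[1] == '_') {
			dst[o++] = '-';
			src += 2;
		} else {
			dst[o++] = *src++;
		}
	}
	dst[o] = '\0';
	return (0);
}
```

With this sketch, an identifier spelled `signal__send` maps to the probe name `signal-send`, while single underscores are left untouched.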
Attilio Rao	54366c0bd7	- For kernel compiled only with KDTRACE_HOOKS and not any lock debugging option, unbreak the lock tracing release semantic by embedding calls to LOCKSTAT_PROFILE_RELEASE_LOCK() directly in the inlined version of the releasing functions for mutex, rwlock and sxlock. Failing to do so skips the lockstat_probe_func invocation for unlocking. - As part of the LOCKSTAT support is inlined in mutex operation, for kernel compiled without lock debugging options, potentially every consumer must be compiled including opt_kdtrace.h. Fix this by moving KDTRACE_HOOKS into opt_global.h and remove the dependency by opt_kdtrace.h for all files, as now only KDTRACE_FRAMES is linked there and it is only used as a compile-time stub [0]. [0] immediately shows some new bug as DTRACE-derived support for debug in sfxge is broken and it was never really tested. As it was not including correctly opt_kdtrace.h before it was never enabled so it was kept broken for a while. Fix this by using a protection stub, leaving sfxge driver authors the responsibility for fixing it appropriately [1]. Sponsored by: EMC / Isilon storage division Discussed with: rstone [0] Reported by: rstone [1] Discussed with: philip	2013-11-25 07:38:45 +00:00
Jilles Tjoelker	b20a9aa92a	Fix siginfo_t.si_status for wait6/waitid/SIGCHLD. Per POSIX, si_status should contain the value passed to exit() for si_code==CLD_EXITED and the signal number for other si_code. This was incorrect for CLD_EXITED and CLD_DUMPED. This is still not fully POSIX-compliant (Austin group issue #594 says that the full value passed to exit() shall be returned via si_status, not just the low 8 bits) but is sufficient for a si_status-related test in libnih (upstart, Debian/kFreeBSD). PR: kern/184002 Reported by: Dmitrijs Ledkovs Tested by: Dmitrijs Ledkovs	2013-11-17 22:31:23 +00:00
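The si_status semantics described in the entry above can be checked from userland with waitid(2). The following minimal sketch forks a child that exits with a given code and returns what the parent sees in siginfo_t. The helper name `child_exit_status` is hypothetical, and the sketch assumes SIGCHLD keeps its default disposition and that the call is not interrupted by another signal.

```c
#define _POSIX_C_SOURCE 200809L
#include <sys/types.h>
#include <sys/wait.h>
#include <signal.h>
#include <string.h>
#include <unistd.h>

/*
 * Hypothetical helper: fork a child that calls _exit(code), reap it with
 * waitid(2), and hand back si_code and si_status from the siginfo_t.
 * Returns 0 on success, -1 on error.
 */
int
child_exit_status(int code, int *si_code_out, int *si_status_out)
{
	siginfo_t info;
	pid_t pid;

	pid = fork();
	if (pid < 0)
		return (-1);
	if (pid == 0)
		_exit(code);		/* child: exit immediately */

	memset(&info, 0, sizeof(info));
	if (waitid(P_PID, (id_t)pid, &info, WEXITED) != 0)
		return (-1);

	*si_code_out = info.si_code;	/* CLD_EXITED for a normal exit */
	*si_status_out = info.si_status;	/* the value passed to _exit() */
	return (0);
}
```

For a child terminated by a signal, si_code would instead be CLD_KILLED or CLD_DUMPED and si_status would carry the signal number, as the commit message states.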
Pawel Jakub Dawidek	7008be5bd7	Change the cap_rights_t type from uint64_t to a structure that we can extend in the future in a backward compatible (API and ABI) way. The cap_rights_t represents capability rights. We used to use one bit to represent one right, but we are running out of spare bits. Currently the new structure provides place for 114 rights (so 50 more than the previous cap_rights_t), but it is possible to grow the structure to hold at least 285 rights, although we can make it even larger if 285 rights won't be enough. The structure definition looks like this: struct cap_rights { uint64_t cr_rights[CAP_RIGHTS_VERSION + 2]; }; The initial CAP_RIGHTS_VERSION is 0. The top two bits in the first element of the cr_rights[] array contain total number of elements in the array - 2. This means if those two bits are equal to 0, we have 2 array elements. The top two bits in all remaining array elements should be 0. The next five bits in all array elements contain array index. Only one bit is used and bit position in this five-bits range defines array index. This means there can be at most five array elements in the future. To define new right the CAPRIGHT() macro must be used. The macro takes two arguments - an array index and a bit to set, eg. #define CAP_PDKILL CAPRIGHT(1, 0x0000000000000800ULL) We still support aliases that combine few rights, but the rights have to belong to the same array element, eg: #define CAP_LOOKUP CAPRIGHT(0, 0x0000000000000400ULL) #define CAP_FCHMOD CAPRIGHT(0, 0x0000000000002000ULL) #define CAP_FCHMODAT (CAP_FCHMOD | CAP_LOOKUP) There is new API to manage the new cap_rights_t structure: cap_rights_t *cap_rights_init(cap_rights_t *rights, ...); void cap_rights_set(cap_rights_t *rights, ...); void cap_rights_clear(cap_rights_t *rights, ...); bool cap_rights_is_set(const cap_rights_t *rights, ...); bool cap_rights_is_valid(const cap_rights_t *rights); void cap_rights_merge(cap_rights_t *dst, const cap_rights_t *src); void cap_rights_remove(cap_rights_t *dst, const cap_rights_t *src); bool cap_rights_contains(const cap_rights_t *big, const cap_rights_t *little); Capability rights to the cap_rights_init(), cap_rights_set(), cap_rights_clear() and cap_rights_is_set() functions are provided by separating them with commas, eg: cap_rights_t rights; cap_rights_init(&rights, CAP_READ, CAP_WRITE, CAP_FSTAT); There is no need to terminate the list of rights, as those functions are actually macros that take care of the termination, eg: #define cap_rights_set(rights, ...) \ __cap_rights_set((rights), __VA_ARGS__, 0ULL) void __cap_rights_set(cap_rights_t *rights, ...); Thanks to using one bit as an array index we can assert in those functions that there are no two rights belonging to different array elements provided together. For example this is illegal and will be detected, because CAP_LOOKUP belongs to element 0 and CAP_PDKILL to element 1: cap_rights_init(&rights, CAP_LOOKUP | CAP_PDKILL); Providing several rights that belong to the same array's element this way is correct, but is not advised. It should only be used for aliases definition. This commit also breaks compatibility with some existing Capsicum system calls, but I see no other way to do that. This should be fine as Capsicum is still experimental and this change is not going to 9.x. Sponsored by: The FreeBSD Foundation	2013-09-05 00:09:56 +00:00
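The bit layout described in the cap_rights_t entry above can be sketched in a few lines of C. This is an illustration built from the prose of the commit message, not the kernel header. The `DEMO_`/`demo_` names are hypothetical, and placing array index 0 at bit 57 is an assumption derived from "top two bits, then the next five bits" in a 64-bit word.

```c
#include <stdint.h>

/*
 * Assumed layout of one 64-bit array element:
 *   bits 63..62  number of array elements minus 2 (element 0 only)
 *   bits 61..57  one-hot array index (bit 57 + idx)
 *   bits 56..0   the rights themselves (57 per element)
 */
#define DEMO_CAPRIGHT(idx, bit)	((1ULL << (57 + (idx))) | (bit))

#define DEMO_CAP_LOOKUP	DEMO_CAPRIGHT(0, 0x0000000000000400ULL)
#define DEMO_CAP_FCHMOD	DEMO_CAPRIGHT(0, 0x0000000000002000ULL)
#define DEMO_CAP_PDKILL	DEMO_CAPRIGHT(1, 0x0000000000000800ULL)

/*
 * Return the array index encoded in `right`, or -1 if the five-bit index
 * field does not have exactly one bit set (for example when rights from
 * different array elements were OR-ed together).
 */
int
demo_right_index(uint64_t right)
{
	uint64_t field = (right >> 57) & 0x1f;
	int idx;

	if (field == 0 || (field & (field - 1)) != 0)
		return (-1);
	for (idx = 0; (field & 1) == 0; idx++)
		field >>= 1;
	return (idx);
}

/* Number of array elements encoded in the top two bits of element 0. */
int
demo_array_size(uint64_t element0)
{
	return ((int)(element0 >> 62) + 2);
}
```

Under these assumptions, two elements of 57 rights each give the 114 rights mentioned in the message, and five elements give 285. OR-ing `DEMO_CAP_LOOKUP` with `DEMO_CAP_PDKILL` sets two index bits, which is how mixing rights from different elements can be detected.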
Mark Johnston	7b77e1fe0f	Specify SDT probe argument types in the probe definition itself rather than using SDT_PROBE_ARGTYPE(). This will make it easy to extend the SDT(9) API to allow probes with dynamically-translated types. There is no functional change. MFC after: 2 weeks	2013-08-15 04:08:55 +00:00
Mateusz Guzik	462314b3f9	Remove duplicate assertion from tdsendsignal. MFC after: 2 weeks	2013-07-22 00:44:37 +00:00
Gleb Smirnoff	b9ce4f67ae	Fix memory leak in coredump(). Reviewed by: kib	2013-04-05 20:24:51 +00:00
John Baldwin	1968f37bc9	Tweak some comments.	2013-03-18 18:04:09 +00:00
John Baldwin	3cf3b9f097	Partially revert r195702. Deferring stops is now implemented via a set of calls to toggle TDF_SBDRY rather than passing PBDRY to individual sleep calls. - Remove the stop_allowed parameters from cursig() and issignal(). issignal() checks TDF_SBDRY directly. - Remove the PBDRY and SLEEPQ_STOP_ON_BDRY flags.	2013-03-18 17:23:58 +00:00
John Baldwin	593efaf9f7	Further refine the handling of stop signals in the NFS client. The changes in r246417 were incomplete as they did not add explicit calls to sigdeferstop() around all the places that previously passed SBDRY to _sleep(). In addition, nfs_getcacheblk() could trigger a write RPC from getblk() resulting in sigdeferstop() recursing. Rather than manually deferring stop signals in specific places, change the VFS_() and VOP_() methods to defer stop signals for filesystems which request this behavior via a new VFCF_SBDRY flag. Note that this has to be a VFC flag rather than a MNTK flag so that it works properly with VFS_MOUNT() when the mount is not yet fully constructed. For now, only the NFS clients are set this new flag in VFS_SET(). A few other related changes: - Add an assertion to ensure that TDF_SBDRY doesn't leak to userland. - When a lookup request uses VOP_READLINK() to follow a symlink, mark the request as being on behalf of the thread performing the lookup (cnp_thread) rather than using a NULL thread pointer. This causes NFS to properly handle signals during this VOP on an interruptible mount. PR: kern/176179 Reported by: Russell Cattelan (sigdeferstop() recursion) Reviewed by: kib MFC after: 1 month	2013-02-21 19:02:50 +00:00
Pawel Jakub Dawidek	6c08be2b88	Add break to the default case.	2013-02-17 11:47:58 +00:00
Konstantin Belousov	888d4d4f86	When vforked child is traced, the debugging events are not generated until child performs exec(). The behaviour is reasonable when a debugger is the real parent, because the parent is stopped until exec(), and sending a debugging event to the debugger would deadlock both parent and child. On the other hand, when debugger is not the parent of the vforked child, not sending debugging signals makes it impossible to debug across vfork. Fix the issue by declining generating debug signals only when vfork() was done and child called ptrace(PT_TRACEME). Set a new process flag P_PPTRACE from the attach code for PT_TRACEME, if P_PPWAIT flag is set, which indicates that the process was created with vfork() and still did not execed. Check P_PPTRACE from issignal(), instead of refusing the trace outright for the P_PPWAIT case. The scope of P_PPTRACE is exactly contained in the scope of P_PPWAIT. Found and tested by: zont Reviewed by: pluknet MFC after: 2 weeks	2013-02-07 15:34:22 +00:00
John Baldwin	a120a7a3cd	Rework the handling of stop signals in the NFS client. The changes in 195702, 195703, and 195821 prevented a thread from suspending while holding locks inside of NFS by forcing the thread to fail sleeps with EINTR or ERESTART but defer the thread suspension to the user boundary. However, this had the effect that stopping a process during an NFS request could abort the request and trigger EINTR errors that were visible to userland processes (previously the thread would have suspended and completed the request once it was resumed). This change instead effectively masks stop signals while in the NFS client. It uses the existing TDF_SBDRY flag to effect this since SIGSTOP cannot be masked directly. Also, instead of setting PBDRY on individual sleeps, the NFS client now sets the TDF_SBDRY flag around each NFS request and stop signals are masked for all sleeps during that region (the previous change missed sleeps in lockmgr locks). The end result is that stop signals sent to threads performing an NFS request are completely ignored until after the NFS request has finished processing and the thread prepares to return to userland. This restores the behavior of stop signals being transparent to userland processes while still preventing threads from suspending while holding NFS locks. Reviewed by: kib MFC after: 1 month	2013-02-06 17:06:51 +00:00
Pawel Jakub Dawidek	c345faea5a	Replace expand_name() function with corefile_open() function, which not only returns name, but also vnode of corefile to use. This simplifies the code and closes few races, especially in %I handling. Reviewed by: kib Obtained from: WHEEL Systems	2012-12-19 23:59:48 +00:00
Pawel Jakub Dawidek	22a5d85aa9	Use correct file permissions when looking for available core file if kern.corefile contains %I. Obtained from: WHEEL Systems	2012-12-19 23:40:02 +00:00
Pawel Jakub Dawidek	07a8e07896	The 'flags' argument can be modified in vn_open_cred(), so we need to set it for every loop iteration. Pointed out by: kib	2012-12-19 12:14:08 +00:00
Pawel Jakub Dawidek	cc58032c44	Do not audit paths we try when kern.corefile contains %I. Obtained from: WHEEL Systems	2012-12-19 12:12:53 +00:00
Pawel Jakub Dawidek	29146f1a7a	Style cleanups.	2012-12-19 12:10:14 +00:00
Pawel Jakub Dawidek	086053a370	The expand_name() function isn't called with the process lock held anymore, so we can safely use malloc(M_WAITOK) now. Pointed out by: kib	2012-12-19 12:00:09 +00:00
Pawel Jakub Dawidek	f06f465db7	Minor style tweaks. Obtained from: WHEEL Systems	2012-12-17 10:51:22 +00:00


479 Commits