freebsd-dev

Author	SHA1	Message	Date
Mateusz Guzik	d8bc2a17a5	fd: remove fd_lastfile It keeps recalculated way more often than it is needed. Provide a routine (fdlastfile) to get it if necessary. Consumers may be better off with a bitmap iterator instead.	2020-07-15 10:24:04 +00:00
Mateusz Guzik	1724c563e6	cred: distribute reference count per thread This avoids dirtying creds in the common case, see the comment in kern_prot.c for details. Reviewed by: kib Differential Revision: https://reviews.freebsd.org/D24007	2020-06-09 23:03:48 +00:00
Christian S.J. Peron	757a564248	Add BSM record conversion for a number of syscalls: - thr_kill(2) and thr_exit(2) generally (no argument auditing here. - A set of syscalls for the process descriptor family, specifically: pdfork(2), pdgetpid(2) and pdkill(2) For these syscalls, audit the file descriptor. In the case of pdfork(2) a pointer to an integer (file descriptor) is passed in as an argument. We audit the post initialized file descriptor (not the random garbage that would have been passed in). We will also audit the child process which was created from the fork operation (similar to what is done for the fork(2) syscall). pdkill(2) we audit the signal value and fd, and finally pdgetpid(2) just the file descriptor: - Following is a sample of the produced audit trails: header,111,11,pdfork(2),0,Sat May 16 03:07:50 2020, + 394 msec argument,0,0x39d,child PID argument,2,0x2,flags argument,1,0x8,fd subject,root,root,0,root,0,924,0,0,0.0.0.0 return,success,925 header,79,11,pdgetpid(2),0,Sat May 16 03:07:50 2020, + 394 msec argument,1,0x8,fd subject,root,root,0,root,0,924,0,0,0.0.0.0 return,success,0 trailer,79 header,135,11,pdkill(2),0,Sat May 16 03:07:50 2020, + 395 msec argument,1,0x8,fd argument,2,0xf,signal process_ex,root,root,0,root,0,925,0,0,0.0.0.0 subject,root,root,0,root,0,924,0,0,0.0.0.0 return,success,0 trailer,135 MFC after: 1 week	2020-05-16 03:45:15 +00:00
John Baldwin	59838c1a19	Retire procfs-based process debugging. Modern debuggers and process tracers use ptrace() rather than procfs for debugging. ptrace() has a supserset of functionality available via procfs and new debugging features are only added to ptrace(). While the two debugging services share some fields in struct proc, they each use dedicated fields and separate code. This results in extra complexity to support a feature that hasn't been enabled in the default install for several years. PR: 244939 (exp-run) Reviewed by: kib, mjg (earlier version) Relnotes: yes Differential Revision: https://reviews.freebsd.org/D23837	2020-04-01 19:22:09 +00:00
Pawel Biernacki	7029da5c36	Mark more nodes as CTLFLAG_MPSAFE or CTLFLAG_NEEDGIANT (17 of many) r357614 added CTLFLAG_NEEDGIANT to make it easier to find nodes that are still not MPSAFE (or already are but aren’t properly marked). Use it in preparation for a general review of all nodes. This is non-functional change that adds annotations to SYSCTL_NODE and SYSCTL_PROC nodes using one of the soon-to-be-required flags. Mark all obvious cases as MPSAFE. All entries that haven't been marked as MPSAFE before are by default marked as NEEDGIANT Approved by: kib (mentor, blanket) Commented by: kib, gallatin, melifaro Differential Revision: https://reviews.freebsd.org/D23718	2020-02-26 14:26:36 +00:00
Konstantin Belousov	146fc63fce	Add a way to manage thread signal mask using shared word, instead of syscall. A new syscall sigfastblock(2) is added which registers a uint32_t variable as containing the count of blocks for signal delivery. Its content is read by kernel on each syscall entry and on AST processing, non-zero count of blocks is interpreted same as the signal mask blocking all signals. The biggest downside of the feature that I see is that memory corruption that affects the registered fast sigblock location, would cause quite strange application misbehavior. For instance, the process would be immune to ^C (but killable by SIGKILL). With consumers (rtld and libthr added), benchmarks do not show a slow-down of the syscalls in micro-measurements, and macro benchmarks like buildworld do not demonstrate a difference. Part of the reason is that buildworld time is dominated by compiler, and clang already links to libthr. On the other hand, small utilities typically used by shell scripts have the total number of syscalls cut by half. The syscall is not exported from the stable libc version namespace on purpose. It is intended to be used only by our C runtime implementation internals. Tested by: pho Disscussed with: cem, emaste, jilles Sponsored by: The FreeBSD Foundation Differential revision: https://reviews.freebsd.org/D12773	2020-02-09 11:53:12 +00:00
Jeff Roberson	61a74c5ccd	schedlock 1/4 Eliminate recursion from most thread_lock consumers. Return from sched_add() without the thread_lock held. This eliminates unnecessary atomics and lock word loads as well as reducing the hold time for scheduler locks. This will eventually allow for lockless remote adds. Discussed with: kib Reviewed by: jhb Tested by: pho Differential Revision: https://reviews.freebsd.org/D22626	2019-12-15 21:11:15 +00:00
Kyle Evans	079c5b9ed8	rfork(2): add RFSPAWN flag When RFSPAWN is passed, rfork exhibits vfork(2) semantics but also resets signal handlers in the child during creation to avoid a point of corruption of parent state from the child. This flag will be used by posix_spawn(3) to handle potential signal issues. Reviewed by: jilles, kib Differential Revision: https://reviews.freebsd.org/D19058	2019-09-25 19:20:41 +00:00
Konstantin Belousov	fe69291ff4	Add procctl(PROC_STACKGAP_CTL) It allows a process to request that stack gap was not applied to its stacks, retroactively. Also it is possible to control the gaps in the process after exec. PR: 239894 Reviewed by: alc Sponsored by: The FreeBSD Foundation Differential revision: https://reviews.freebsd.org/D21352	2019-09-03 18:56:25 +00:00
Mateusz Guzik	50c7615fb0	fork: rework locking around do_fork - move allproc lock into the func, it is of no use prior to it - the code would lock p1 and p2 while holding allproc to partially construct it after it gets added to the list. instead we can do the work prior to adding anything. - protect lastpid with procid_lock As a side effect we do less work with allproc held. Sponsored by: The FreeBSD Foundation	2019-08-17 18:19:49 +00:00
Mateusz Guzik	60cdcb644d	fork: bump process count before checking for permission to cross the limit The limit is almost never reached. Do the check only on failure to see if we can override it. No change in user-visible behavior. Sponsored by: The FreeBSD Foundation	2019-08-17 17:56:43 +00:00
Mateusz Guzik	b05641b6bd	fork: stop skipping < 100 ids on wrap around Code doing this is commented with a claim that these IDs are occupied by daemons, but that's demonstrably false. To an extent the range is used by init and kernel processes (and on sufficiently big machines it indeed is fully populated). On a sample box 40-way box the highest id in the range is 63. On a different one it is 23. Just use the range. Sponsored by: The FreeBSD Foundation	2019-08-17 17:42:01 +00:00
Mark Johnston	2ffee5c1b2	Inherit P2_PROTMAX_{ENABLE,DISABLE} across fork(). Thus, when using proccontrol(1) to disable implicit application of PROT_MAX within a process, child processes will inherit this setting. Discussed with: kib MFC with: r349609 Sponsored by: The FreeBSD Foundation	2019-07-10 19:57:48 +00:00
Conrad Meyer	e2e050c8ef	Extract eventfilter declarations to sys/_eventfilter.h This allows replacing "sys/eventfilter.h" includes with "sys/_eventfilter.h" in other header files (e.g., sys/{bus,conf,cpu}.h) and reduces header pollution substantially. EVENTHANDLER_DECLARE and EVENTHANDLER_LIST_DECLAREs were moved out of .c files into appropriate headers (e.g., sys/proc.h, powernv/opal.h). As a side effect of reduced header pollution, many .c files and headers no longer contain needed definitions. The remainder of the patch addresses adding appropriate includes to fix those files. LOCK_DEBUG and LOCK_FILE_LINE_ARG are moved to sys/_lock.h, as required by sys/mutex.h since r326106 (but silently protected by header pollution prior to this change). No functional change (intended). Of course, any out of tree modules that relied on header pollution for sys/eventhandler.h, sys/lock.h, or sys/mutex.h inclusion need to be fixed. __FreeBSD_version has been bumped.	2019-05-20 00:38:23 +00:00
Mark Johnston	7d43b5c98e	Simplify the test against maxproc in fork1(). Previously nprocs_new would be tested against maxprocs twice when nprocs_new < maxprocs - 10. Eliminate the unnecessary comparison. Submitted by: Wuyang Chung <wuyang.chung1@gmail.com> GitHub PR: https://github.com/freebsd/freebsd/pull/397 MFC after: 1 week	2019-05-07 15:03:26 +00:00
Mateusz Guzik	37d2b1f3e5	Annotate nprocs with __exclusive_cache_line Sponsored by: The FreeBSD Foundation	2019-05-04 19:04:17 +00:00
Konstantin Belousov	fa50a3552d	Implement Address Space Layout Randomization (ASLR) With this change, randomization can be enabled for all non-fixed mappings. It means that the base address for the mapping is selected with a guaranteed amount of entropy (bits). If the mapping was requested to be superpage aligned, the randomization honours the superpage attributes. Although the value of ASLR is diminshing over time as exploit authors work out simple ASLR bypass techniques, it elimintates the trivial exploitation of certain vulnerabilities, at least in theory. This implementation is relatively small and happens at the correct architectural level. Also, it is not expected to introduce regressions in existing cases when turned off (default for now), or cause any significant maintaince burden. The randomization is done on a best-effort basis - that is, the allocator falls back to a first fit strategy if fragmentation prevents entropy injection. It is trivial to implement a strong mode where failure to guarantee the requested amount of entropy results in mapping request failure, but I do not consider that to be usable. I have not fine-tuned the amount of entropy injected right now. It is only a quantitive change that will not change the implementation. The current amount is controlled by aslr_pages_rnd. To not spoil coalescing optimizations, to reduce the page table fragmentation inherent to ASLR, and to keep the transient superpage promotion for the malloced memory, locality clustering is implemented for anonymous private mappings, which are automatically grouped until fragmentation kicks in. The initial location for the anon group range is, of course, randomized. This is controlled by vm.cluster_anon, enabled by default. The default mode keeps the sbrk area unpopulated by other mappings, but this can be turned off, which gives much more breathing bits on architectures with small address space, such as i386. This is tied with the question of following an application's hint about the mmap(2) base address. Testing shows that ignoring the hint does not affect the function of common applications, but I would expect more demanding code could break. By default sbrk is preserved and mmap hints are satisfied, which can be changed by using the kern.elf{32,64}.aslr.honor_sbrk sysctl. ASLR is enabled on per-ABI basis, and currently it is only allowed on FreeBSD native i386 and amd64 (including compat 32bit) ABIs. Support for additional architectures will be added after further testing. Both per-process and per-image controls are implemented: - procctl(2) adds PROC_ASLR_CTL/PROC_ASLR_STATUS; - NT_FREEBSD_FCTL_ASLR_DISABLE feature control note bit makes it possible to force ASLR off for the given binary. (A tool to edit the feature control note is in development.) Global controls are: - kern.elf{32,64}.aslr.enable - for non-fixed mappings done by mmap(2); - kern.elf{32,64}.aslr.pie_enable - for PIE image activation mappings; - kern.elf{32,64}.aslr.honor_sbrk - allow to use sbrk area for mmap(2); - vm.cluster_anon - enables anon mapping clustering. PR: 208580 (exp runs) Exp-runs done by: antoine Reviewed by: markj (previous version) Discussed with: emaste Tested by: pho MFC after: 1 month Sponsored by: The FreeBSD Foundation Differential revision: https://reviews.freebsd.org/D5603	2019-02-10 17:19:45 +00:00
Konstantin Belousov	be8dd1428e	Re-wrap long line after r341827. Sponsored by: The FreeBSD Foundation MFC after: 3 days	2019-01-17 04:51:05 +00:00
Mateusz Guzik	19b75ef59a	Microoptimize corner case of ID bitmap handling. Prior to the change we would avoidably test more possibly used IDs. While here update the comment: there is no pidchecked variable anymore.	2018-12-19 20:29:52 +00:00
Mateusz Guzik	7d065d876e	Deinline vfork handling out of the syscall return path. vfork is rarely called (comparatively to other syscalls) and it avoidably pollutes the fast path. Sponsored by: The FreeBSD Foundation	2018-12-19 20:27:26 +00:00
Mateusz Guzik	cc426dd319	Remove unused argument to priv_check_cred. Patch mostly generated with cocinnelle: @@ expression E1,E2; @@ - priv_check_cred(E1,E2,0) + priv_check_cred(E1,E2) Sponsored by: The FreeBSD Foundation	2018-12-11 19:32:16 +00:00
Mateusz Guzik	eab2132ad9	Fix a corner case in ID bitmap management. If all IDs from trypid to pid_max were used as pids, the code would enter a loop which would be infinite if none of the IDs could become free (e.g. they all belong to processes which did not transitioned to zombie). Fixes: r341684 ("Manage process-related IDs with bitmaps") Sponsored by: The FreeBSD Foundation	2018-12-08 10:22:12 +00:00
Mateusz Guzik	e52327e3c5	proc: postpone proc unlock until after reporting with kqueue kqueue would always relock immediately afterwards. While here drop the NULL check for list itself. The list is always allocated. Sponsored by: The FreeBSD Foundation	2018-12-08 06:34:12 +00:00
Mateusz Guzik	34ebdceac0	Manage process-related IDs with bitmaps Currently unique pid allocation on fork often requires a full walk of process, group, session lists to make sure it is not used by anything. This has a side effect of requiring proctree to be held along with allproc, which adds more contention in poudriere -j 128. The patch below implements trivial bitmaps which gets rid of the problem. Dedicated lock is introduced to manage IDs. While here a bug was discovered: all processes would inherit reap id from the first process spawned by init. This had a side effect of keeping the ID used and when allocation rolls over to the beginning it keeps being skipped. The patch is loosely based on initial work by mjoras@. Reviewed by: kib Sponsored by: The FreeBSD Foundation	2018-12-07 12:22:32 +00:00
Mateusz Guzik	1e9a1bf589	proc: create a dedicated lock for zombproc to ligthen the load on allproc_lock waitpid always takes proctree to evaluate the list, but only takes allproc if it can reap. With this patch allproc is no longer taken, which helps during poudriere -j 128. Discussed with: kib Sponsored by: The FreeBSD Foundation	2018-11-29 02:52:08 +00:00
Mateusz Guzik	e3d3e8289b	Revert "fork: fix use-after-free with vfork" This unreliably breaks libc handling of vfork where forking succeded, but execve did not. vfork code in libc performs waitpid with WNOHANG in case of failed exec. With the fix exit codepath was waking up the parent before the child fully transitioned to a zombie. Woken up parent would waitpid, which could find a not-yet-zombie child and fail to reap it due to the WNOHANG flag. While removing the flag fixes the problem, it is not an option due to older releases which would still suffer from the kernel change. Revert the fix until a solution can be worked out. Note that while use-after-free which gets back due to the revert is a real bug, it's side-effects are limited due to the fact that struct proc memory is never released by UMA.	2018-11-23 04:38:50 +00:00
Mateusz Guzik	a5ac8272c0	fork: remove avoidable proc lock/unlock pair We don't have to access the process after making it runnable, so there is no need to hold it either. Sponsored by: The FreeBSD Foundation	2018-11-22 21:29:36 +00:00
Mateusz Guzik	b00b27e925	fork: fix use-after-free with vfork The pointer to the child is stored without any reference held. Then it is blindly used to wait until P_PPWAIT is cleared. However, if the child is autoreaped it could have exited and get freed before the parent started waiting. Use the existing hold mechanism to mitigate the problem. Most common case of doing exec remains unchanged. The corner case of doing exit performs wake up before waiting for holds to clear. Reviewed by: kib Sponsored by: The FreeBSD Foundation Differential Revision: https://reviews.freebsd.org/D18295	2018-11-22 21:08:37 +00:00
Mateusz Guzik	3d3e6793f6	proc: implement pid hash locks and an iterator forks, exits and waits are frequently stalled during poudriere -j 128 runs due to killpg and process list exports performed for each package. Both uses take the allproc lock. The latter case can be modified to iterate over the hash with finer grained locking instead. Reviewed by: kib Sponsored by: The FreeBSD Foundation Differential Revision: https://reviews.freebsd.org/D17817	2018-11-21 18:56:15 +00:00
Mateusz Guzik	2c054ce924	proc: always store parent pid in p_oppid Doing so removes the dependency on proctree lock from sysctl process list export which further reduces contention during poudriere -j 128 runs. Reviewed by: kib (previous version) Sponsored by: The FreeBSD Foundation Differential Revision: https://reviews.freebsd.org/D17825	2018-11-16 17:07:54 +00:00
Konstantin Belousov	6e22bbf66e	fork: avoid endless wait with PTRACE_FORK and RFSTOPPED. An RFSTOPPED thread can't clean TDB_STOPATFORK, which is done in the fork_return() in its context, so parent is stuck forever. Triggered when trying to ptrace linux process. Instead of waiting for the new thread to clear TDB_STOPATFORK, tag it as traced and reparent to the debugger in do_fork(), and let it only notify the debugger when run. Submitted by: Yanko Yankulov <yanko.yankulov@gmail.com> Reviewed by: jhb MFC after: 1 week X-MFC-Note: keep p_dbgwait placeholder intact Differential revision: https://reviews.freebsd.org/D15857	2018-06-21 21:12:49 +00:00
Mateusz Guzik	81d68271d7	Reduce contention on the proctree lock during heavy package build. There is a proctree -> allproc ordering established. Most of the time it is either xlock -> xlock or slock -> slock. On fork however there is a slock -> xlock pair which results in pathological wait times due to threads keeping proctree held for reading and all waiting on allproc. Switch this to xlock -> xlock. Longer term fix would get rid of proctree in this place to begin with. Right now it is necessary to walk the session/process group lists to determine which id is free. The walk can be avoided e.g. with bitmaps. The exit path used to have one place which dealt with allproc and then with proctree. Move the allproc acquire into the section protected by proctree. This reduces contention against threads waiting on proctree in the fork codepath - the fork proctree holder does not have to wait for allproc as often. Finally, move tidhash manipulation outside of the area protected by either of these locks. The removal from the hash was already unprotected. There is no legitimate reason to look up thread ids for a process still under construction. This results in about 50% wait time reduction during -j 128 package build.	2018-02-20 02:18:30 +00:00
Mateusz Guzik	65f29b9caa	Postpone sx_sunlock(&proctree_lock) on fork until after allproc is dropped. There is a significant contention on the lock during -j 128 package build. This change drops total wait time on this lock by 60%.	2018-02-17 00:23:28 +00:00
Bjoern A. Zeeb	8e23158af7	Remove trailing whitespace. No functional change.	2018-01-14 15:01:25 +00:00
Jeff Roberson	3f289c3fcf	Implement 'domainset', a cpuset based NUMA policy mechanism. This allows userspace to control NUMA policy administratively and programmatically. Implement domainset based iterators in the page layer. Remove the now legacy numa_* syscalls. Cleanup some header polution created by having seq.h in proc.h. Reviewed by: markj, kib Discussed with: alc Tested by: pho Sponsored by: Netflix, Dell/EMC Isilon Differential Revision: https://reviews.freebsd.org/D13403	2018-01-12 22:48:23 +00:00
Pedro F. Giffuni	51369649b0	sys: further adoption of SPDX licensing ID tags. Mainly focus on files that use BSD 3-Clause license. The Software Package Data Exchange (SPDX) group provides a specification to make it easier for automated tools to detect and summarize well known opensource licenses. We are gradually adopting the specification, noting that the tags are considered only advisory and do not, in any way, superceed or replace the license texts. Special thanks to Wind River for providing access to "The Duke of Highlander" tool: an older (2014) run over FreeBSD tree was useful as a starting point.	2017-11-20 19:43:44 +00:00
Matt Joras	2ca45184dc	Introduce EVENTHANDLER_LIST and some users. This introduces a facility to EVENTHANDLER(9) for explicitly defining a reference to an event handler list. This is useful since previously all invokers of events had to do a locked traversal of the global list of event handler lists in order to find the appropriate event handler list. By keeping a pointer to the appropriate list an invoker can avoid this traversal completely. The pointer is initialized with SYSINIT(9) during the eventhandler stage. Users registering interest in events do not need to know if the event is backed by such a list, since the list is added to the global list of lists. As with lists that are not pre-defined it is safe to register for the events before the list has been created. This converts the process_* and thread_* events to using the new facility, as these are events whose locked traversals end up showing up significantly in ports build workflows (and presumably other workflows with many short lived threads/procs). It may be advantageous to convert other events to using the new facility. The el_flags field is now unused, but leave it be so that this revision can be MFC'd. Reviewed by: bdrewery, markj, mjg Approved by: rstone (mentor) In collaboration with: ian MFC after: 4 weeks Sponsored by: Dell EMC Isilon Differential Revision: https://reviews.freebsd.org/D12814	2017-11-09 22:51:48 +00:00
Dag-Erling Smørgrav	008a09355b	If the user tries to set kern.randompid to 1 (which is meaningless), set it to a random value between 100 and 1123, rather than 0 as before. Submitted by: Marie Helene Kvello-Aune <marieheleneka@gmail.com> MFC after: 1 week Differential Revision: https://reviews.freebsd.org/D5336	2017-09-10 15:01:29 +00:00
Konstantin Belousov	2d88da2f06	Move struct syscall_args syscall arguments parameters container into struct thread. For all architectures, the syscall trap handlers have to allocate the structure on the stack. The structure takes 88 bytes on 64bit arches which is not negligible. Also, it cannot be easily found by other code, which e.g. caused duplication of some members of the structure to struct thread already. The change removes td_dbg_sc_code and td_dbg_sc_nargs which were directly copied from syscall_args. The structure is put into the copied on fork part of the struct thread to make the syscall arguments information correct in the child after fork. This move will also allow several more uses shortly. Reviewed by: jhb (previous version) Sponsored by: The FreeBSD Foundation MFC after: 3 weeks X-Differential revision: https://reviews.freebsd.org/D11080	2017-06-12 21:03:23 +00:00
Gleb Smirnoff	83c9dea1ba	- Remove 'struct vmmeter' from 'struct pcpu', leaving only global vmmeter in place. To do per-cpu stats, convert all fields that previously were maintained in the vmmeters that sit in pcpus to counter(9). - Since some vmmeter stats may be touched at very early stages of boot, before we have set up UMA and we can do counter_u64_alloc(), provide an early counter mechanism: o Leave one spare uint64_t in struct pcpu, named pc_early_dummy_counter. o Point counter(9) fields of vmmeter to pcpu[0].pc_early_dummy_counter, so that at early stages of boot, before counters are allocated we already point to a counter that can be safely written to. o For sparc64 that required a whole dummy pcpu[MAXCPU] array. Further related changes: - Don't include vmmeter.h into pcpu.h. - vm.stats.vm.v_swappgsout and vm.stats.vm.v_swappgsin changed to 64-bit, to match kernel representation. - struct vmmeter hidden under _KERNEL, and only vmstat(1) is an exclusion. This is based on benno@'s 4-year old patch: https://lists.freebsd.org/pipermail/freebsd-arch/2013-July/014471.html Reviewed by: kib, gallatin, marius, lidl Differential Revision: https://reviews.freebsd.org/D10156	2017-04-17 17:34:47 +00:00
Eric Badger	82a4538f31	Defer ptracestop() signals that cannot be delivered immediately When a thread is stopped in ptracestop(), the ptrace(2) user may request a signal be delivered upon resumption of the thread. Heretofore, those signals were discarded unless ptracestop()'s caller was issignal(). Fix this by modifying ptracestop() to queue up signals requested by the ptrace user that will be delivered when possible. Take special care when the signal is SIGKILL (usually generated from a PT_KILL request); no new stop events should be triggered after a PT_KILL. Add a number of tests for the new functionality. Several tests were authored by jhb. PR: 212607 Reviewed by: kib Approved by: kib (mentor) MFC after: 2 weeks Sponsored by: Dell EMC In collaboration with: jhb Differential Revision: https://reviews.freebsd.org/D9260	2017-02-20 15:53:16 +00:00
Mateusz Guzik	5afb134c32	vfs: add vrefact, to be used when the vnode has to be already active This allows blind increment of relevant counters which under contention is cheaper than inc-not-zero loops at least on amd64. Use it in some of the places which are guaranteed to see already active vnodes. Reviewed by: kib (previous version)	2016-12-12 15:37:11 +00:00
Konstantin Belousov	643f6f47fd	Add PROC_TRAPCAP procctl(2) controls and global sysctl kern.trap_enocap. Both can be used to cause processes in capability mode to receive SIGTRAP when ENOTCAPABLE or ECAPMODE errors are returned from syscalls. Idea by: emaste Reviewed by: oshogbo (previous version), emaste Sponsored by: The FreeBSD Foundation MFC after: 1 week Differential revision: https://reviews.freebsd.org/D7965	2016-09-21 08:23:33 +00:00
Ed Maste	69a2875821	Renumber license clauses in sys/kern to avoid skipping #3	2016-09-15 13:16:20 +00:00
Mark Johnston	e5574e0966	Don't set P2_PTRACE_FSTP in a process that invokes ptrace(PT_TRACE_ME). Such processes are stopped synchronously by a direct call to ptracestop(SIGTRAP) upon exec. P2_PTRACE_FSTP causes the exec()ing thread to suspend itself while waiting for a SIGSTOP that never arrives. Reviewed by: kib MFC after: 3 days Differential Revision: https://reviews.freebsd.org/D7576	2016-08-19 17:57:14 +00:00
Konstantin Belousov	e69ba32f88	Remove mention of the Giant from the fork_return() description. Making emphasis on this lock in the core function comment is confusing for the modern kernel. Sponsored by: The FreeBSD Foundation MFC after: 3 days	2016-08-03 07:10:09 +00:00
Konstantin Belousov	b7a25e63b6	When a debugger attaches to the process, SIGSTOP is sent to the target. Due to a way issignal() selects the next signal to deliver and report, if the simultaneous or already pending another signal exists, that signal might be reported by the next waitpid(2) call. This causes minor annoyance for debuggers, which must be prepared to take any signal as the first event, then filter SIGSTOP later. More importantly, for tools like gcore(1), which attach and then detach without processing events, SIGSTOP might leak to be delivered after PT_DETACH. This results in the process being unintentionally stopped after detach, which is fatal for automatic tools. The solution is to force SIGSTOP to be the first signal reported after the attach. Attach code is modified to set P2_PTRACE_FSTP to indicate that the attaching ritual was not yet finished, and issignal() prefers SIGSTOP in that condition. Also, the thread which handles P2_PTRACE_FSTP is made to guarantee to own p_xthread during the first waitpid(2). All that ensures that SIGSTOP is consumed first. Additionally, if P2_PTRACE_FSTP is still set on detach, which means that waitpid(2) was not called at all, SIGSTOP is removed from the queue, ensuring that the process is resumed on detach. In issignal(), when acting on STOPing signals, remove the signal from queue before suspending. Otherwise parallel attach could result in ptracestop() acting on that STOP as if it was the STOP signal from the attach. Then SIGSTOP from attach leaks again. As a minor refactoring, some bits of the common attach code is moved to new helper proc_set_traced(). Reported by: markj Reviewed by: jhb, markj Tested by: pho Sponsored by: The FreeBSD Foundation MFC after: 2 weeks Differential revision: https://reviews.freebsd.org/D7256	2016-07-28 08:41:13 +00:00
John Baldwin	fc4f075a1a	Add PTRACE_VFORK to trace vfork events. First, PL_FLAG_FORKED events now also set a PL_FLAG_VFORKED flag when the new child was created via vfork() rather than fork(). Second, a new PL_FLAG_VFORK_DONE event can now be enabled via the PTRACE_VFORK event mask. This new stop is reported after the vfork parent resumes due to the child calling exit or exec. Debuggers can use this stop to reinsert breakpoints in the vfork parent process before it resumes. Reviewed by: kib MFC after: 1 month Differential Revision: https://reviews.freebsd.org/D7045	2016-07-18 14:53:55 +00:00
John Baldwin	8d570f64aa	Add a mask of optional ptrace() events. ptrace() now stores a mask of optional events in p_ptevents. Currently this mask is a single integer, but it can be expanded into an array of integers in the future. Two new ptrace requests can be used to manipulate the event mask: PT_GET_EVENT_MASK fetches the current event mask and PT_SET_EVENT_MASK sets the current event mask. The current set of events include: - PTRACE_EXEC: trace calls to execve(). - PTRACE_SCE: trace system call entries. - PTRACE_SCX: trace syscam call exits. - PTRACE_FORK: trace forks and auto-attach to new child processes. - PTRACE_LWP: trace LWP events. The S_PT_SCX and S_PT_SCE events in the procfs p_stops flags have been replaced by PTRACE_SCE and PTRACE_SCX. PTRACE_FORK replaces P_FOLLOW_FORK and PTRACE_LWP replaces P2_LWP_EVENTS. The PT_FOLLOW_FORK and PT_LWP_EVENTS ptrace requests remain for compatibility but now simply toggle corresponding flags in the event mask. While here, document that PT_SYSCALL, PT_TO_SCE, and PT_TO_SCX both modify the event mask and continue the traced process. Reviewed by: kib MFC after: 1 month Differential Revision: https://reviews.freebsd.org/D7044	2016-07-15 15:32:09 +00:00
Konstantin Belousov	9e590ff04b	When filt_proc() removes event from the knlist due to the process exiting (NOTE_EXIT->knlist_remove_inevent()), two things happen: - knote kn_knlist pointer is reset - INFLUX knote is removed from the process knlist. And, there are two consequences: - KN_LIST_UNLOCK() on such knote is nop - there is nothing which would block exit1() from processing past the knlist_destroy() (and knlist_destroy() resets knlist lock pointers). Both consequences result either in leaked process lock, or dereferencing NULL function pointers for locking. Handle this by stopping embedding the process knlist into struct proc. Instead, the knlist is allocated together with struct proc, but marked as autodestroy on the zombie reap, by knlist_detach() function. The knlist is freed when last kevent is removed from the list, in particular, at the zombie reap time if the list is empty. As result, the knlist_remove_inevent() is no longer needed and removed. Other changes: In filt_procattach(), clear NOTE_EXEC and NOTE_FORK desired events from kn_sfflags for knote registered by kernel to only get NOTE_CHILD notifications. The flags leak resulted in excessive NOTE_EXEC/NOTE_FORK reports. Fix immediate note activation in filt_procattach(). Condition should be either the immediate CHILD_NOTE activation, or immediate NOTE_EXIT report for the exiting process. In knote_fork(), do not perform racy check for KN_INFLUX before kq lock is taken. Besides being racy, it did not accounted for notes just added by scan (KN_SCAN). Some minor and incomplete style fixes. Analyzed and tested by: Eric Badger <eric@badgerio.us> Reviewed by: jhb Sponsored by: The FreeBSD Foundation MFC after: 2 weeks Approved by: re (gjb) Differential revision: https://reviews.freebsd.org/D6859	2016-06-27 21:52:17 +00:00

1 2 3 4 5 ...

437 Commits