freebsd-skq

Author	SHA1	Message	Date
rmacklem	9f795af55b	Remove the old NFS lock device driver that uses Giant. This NFS lock device driver was replaced by the kernel NLM around FreeBSD7 and has not normally been used since then. To use it, the kernel had to be built without "options NFSLOCKD" and the nfslockd.ko had to be deleted as well. Since it uses Giant and is no longer used, this patch removes it. With this device driver removed, there is now a lot of unused code in the userland rpc.lockd. That will be removed on a future commit. Reviewed by: kib Differential Revision: https://reviews.freebsd.org/D22933	2020-04-09 14:44:46 +00:00
jhb	efd93357ab	Retire procfs-based process debugging. Modern debuggers and process tracers use ptrace() rather than procfs for debugging. ptrace() has a supserset of functionality available via procfs and new debugging features are only added to ptrace(). While the two debugging services share some fields in struct proc, they each use dedicated fields and separate code. This results in extra complexity to support a feature that hasn't been enabled in the default install for several years. PR: 244939 (exp-run) Reviewed by: kib, mjg (earlier version) Relnotes: yes Differential Revision: https://reviews.freebsd.org/D23837	2020-04-01 19:22:09 +00:00
imp	48b94864c5	Remove sparc64 kernel support Remove all sparc64 specific files Remove all sparc64 ifdefs Removee indireeect sparc64 ifdefs	2020-02-03 17:35:11 +00:00
mjg	3f1b6975a2	Remove duplicated empty lines from kern/*.c No functional changes.	2020-01-30 20:05:05 +00:00
oshogbo	4ae67fb7ab	procdesc: allow to collect status through wait(1) if process is traced The debugger like truss(1) depends on the wait(2) syscall. This syscall waits for ALL children. When it is waiting for ALL child's the children created by process descriptors are not returned. This behavior was introduced because we want to implement libraries which may pdfork(1). The behavior of process descriptor brakes truss(1) because it will not be able to collect the status of processes with process descriptors. To address this problem the status is returned to parent when the child is traced. While the process is traced the debugger is the new parent. In case the original parent and debugger are the same process it means the debugger explicitly used pdfork() to create the child. In that case the debugger should be using kqueue()/pdwait() instead of wait(). Add test case to verify that. The test case was implemented by markj@. Reviewed by: kib, markj Discussed with: jhb MFC after: 1 month Differential Revision: https://reviews.freebsd.org/D20362	2019-11-25 18:33:21 +00:00
mjg	ad39028102	proc: clear pid bitmap entry after dropping proctree lock There is no correctness change here, but the procid lock is contended in the fork path and taking it while holding proctree avoidably extends its hold time. Note that there are other ids which can end up getting cleared with the lock. Sponsored by: The FreeBSD Foundation	2019-09-02 12:46:43 +00:00
mjg	02285c0ca1	proc: eliminate the zombproc list It is not needed by anything in the kernel and it slightly drives up contention on both proctree and allproc locks. Reviewed by: kib Sponsored by: The FreeBSD Foundation Differential Revision: https://reviews.freebsd.org/D21447	2019-08-28 16:18:23 +00:00
oshogbo	2de02c99a3	exit1: fix style nits MFC after: 1 month	2019-08-05 20:20:14 +00:00
oshogbo	1c70fdd895	proc: introduce the proc_add_orphan function This API allows adding the process to its parent orphan list. Reviewed by: kib, markj MFC after: 1 month	2019-08-05 20:11:57 +00:00
oshogbo	20c844416d	exit1: postpone clearing P_TRACED flag until the proctree lock is acquired In case of the process being debugged. The P_TRACED is cleared very early, which would make procdesc_close() not calling proc_clear_orphan(). That would result in the debugged process can not be able to collect status of the process with process descriptor. Reviewed by: markj, kib Tested by: pho MFC after: 1 month	2019-08-05 19:59:23 +00:00
oshogbo	7916abf576	proc: make clear_orphan an public API This will be useful for other patches with process descriptors. Change its name as well. Reviewed by: markj, kib	2019-07-29 21:42:57 +00:00
cem	3038f1af7b	Include ktr.h in more compilation units Similar to r348026, exhaustive search for uses of CTRn() and cross reference ktr.h includes. Where it was obvious that an OS compat header of some kind included ktr.h indirectly, .c files were left alone. Some of these files clearly got ktr.h via header pollution in some scenarios, or tinderbox would not be passing prior to this revision, but go ahead and explicitly include it in files using it anyway. Like r348026, these CUs did not show up in tinderbox as missing the include. Reported by: peterj (arm64/mp_machdep.c) X-MFC-With: r347984 Sponsored by: Dell EMC Isilon	2019-05-21 20:38:48 +00:00
cem	250e158ddf	Extract eventfilter declarations to sys/_eventfilter.h This allows replacing "sys/eventfilter.h" includes with "sys/_eventfilter.h" in other header files (e.g., sys/{bus,conf,cpu}.h) and reduces header pollution substantially. EVENTHANDLER_DECLARE and EVENTHANDLER_LIST_DECLAREs were moved out of .c files into appropriate headers (e.g., sys/proc.h, powernv/opal.h). As a side effect of reduced header pollution, many .c files and headers no longer contain needed definitions. The remainder of the patch addresses adding appropriate includes to fix those files. LOCK_DEBUG and LOCK_FILE_LINE_ARG are moved to sys/_lock.h, as required by sys/mutex.h since r326106 (but silently protected by header pollution prior to this change). No functional change (intended). Of course, any out of tree modules that relied on header pollution for sys/eventhandler.h, sys/lock.h, or sys/mutex.h inclusion need to be fixed. __FreeBSD_version has been bumped.	2019-05-20 00:38:23 +00:00
markj	6e75ac3373	Set the p_oppid field of orphans when exiting. Such processes will be reparented to the reaper when the current parent is done with them (i.e., ptrace detached), so p_oppid must be updated accordingly. Add a regression test to exercise this code path. Previously it would not be possible to reap an orphan with a stale oppid. Reviewed by: kib, mjg Tested by: pho MFC after: 2 weeks Sponsored by: The FreeBSD Foundation Differential Revision: https://reviews.freebsd.org/D19825	2019-04-07 14:26:14 +00:00
mjg	af8321b07f	proc: handle sdt exit probe before taking the proc lock Sponsored by: The FreeBSD Foundation	2018-12-08 06:31:43 +00:00
mjg	73f834bfbb	proc: when exiting move to zombproc before taking proctree The kernel was already doing this prior to r329615. It was changed to reduce contention on allproc. However, introduction of pidhash locks and removal of proctree -> allproc ordering from fork thanks to bitmaps fixed things enough to make this change pessimal. waitpid takes proctree on each call and this change (now) causes avoidable stalls if allproc is held. Sponsored by: The FreeBSD Foundation	2018-12-07 12:32:25 +00:00
mjg	6b8dd99954	Manage process-related IDs with bitmaps Currently unique pid allocation on fork often requires a full walk of process, group, session lists to make sure it is not used by anything. This has a side effect of requiring proctree to be held along with allproc, which adds more contention in poudriere -j 128. The patch below implements trivial bitmaps which gets rid of the problem. Dedicated lock is introduced to manage IDs. While here a bug was discovered: all processes would inherit reap id from the first process spawned by init. This had a side effect of keeping the ID used and when allocation rolls over to the beginning it keeps being skipped. The patch is loosely based on initial work by mjoras@. Reviewed by: kib Sponsored by: The FreeBSD Foundation	2018-12-07 12:22:32 +00:00
mjg	db4060a9d0	proc: create a dedicated lock for zombproc to ligthen the load on allproc_lock waitpid always takes proctree to evaluate the list, but only takes allproc if it can reap. With this patch allproc is no longer taken, which helps during poudriere -j 128. Discussed with: kib Sponsored by: The FreeBSD Foundation	2018-11-29 02:52:08 +00:00
mjg	2dadc8e0dd	Revert "fork: fix use-after-free with vfork" This unreliably breaks libc handling of vfork where forking succeded, but execve did not. vfork code in libc performs waitpid with WNOHANG in case of failed exec. With the fix exit codepath was waking up the parent before the child fully transitioned to a zombie. Woken up parent would waitpid, which could find a not-yet-zombie child and fail to reap it due to the WNOHANG flag. While removing the flag fixes the problem, it is not an option due to older releases which would still suffer from the kernel change. Revert the fix until a solution can be worked out. Note that while use-after-free which gets back due to the revert is a real bug, it's side-effects are limited due to the fact that struct proc memory is never released by UMA.	2018-11-23 04:38:50 +00:00
mjg	75deef51a7	fork: fix use-after-free with vfork The pointer to the child is stored without any reference held. Then it is blindly used to wait until P_PPWAIT is cleared. However, if the child is autoreaped it could have exited and get freed before the parent started waiting. Use the existing hold mechanism to mitigate the problem. Most common case of doing exec remains unchanged. The corner case of doing exit performs wake up before waiting for holds to clear. Reviewed by: kib Sponsored by: The FreeBSD Foundation Differential Revision: https://reviews.freebsd.org/D18295	2018-11-22 21:08:37 +00:00
mjg	b51585a153	proc: update list manipulation comment on process exit Processes stay in the hash until they get reaped. This code does not unlink the child from the parent, so remove the claim that it does. Sponsored by: The FreeBSD Foundation	2018-11-21 22:16:10 +00:00
mjg	71aabf21a1	proc: implement pid hash locks and an iterator forks, exits and waits are frequently stalled during poudriere -j 128 runs due to killpg and process list exports performed for each package. Both uses take the allproc lock. The latter case can be modified to iterate over the hash with finer grained locking instead. Reviewed by: kib Sponsored by: The FreeBSD Foundation Differential Revision: https://reviews.freebsd.org/D17817	2018-11-21 18:56:15 +00:00
mjg	4493b1d3a8	proc: always store parent pid in p_oppid Doing so removes the dependency on proctree lock from sysctl process list export which further reduces contention during poudriere -j 128 runs. Reviewed by: kib (previous version) Sponsored by: The FreeBSD Foundation Differential Revision: https://reviews.freebsd.org/D17825	2018-11-16 17:07:54 +00:00
kib	a8cb340144	Add PROC_PDEATHSIG_SET to procctl interface. Allow processes to request the delivery of a signal upon death of their parent process. Supposed consumer of the feature is PostgreSQL. Submitted by: Thomas Munro Reviewed by: jilles, mjg MFC after: 2 weeks Differential revision: https://reviews.freebsd.org/D15106	2018-04-18 21:31:13 +00:00
brooks	9d79658aab	Move most of the contents of opt_compat.h to opt_global.h. opt_compat.h is mentioned in nearly 180 files. In-progress network driver compabibility improvements may add over 100 more so this is closer to "just about everywhere" than "only some files" per the guidance in sys/conf/options. Keep COMPAT_LINUX32 in opt_compat.h as it is confined to a subset of sys/compat/linux/*.c. A fake _COMPAT_LINUX option ensure opt_compat.h is created on all architectures. Move COMPAT_LINUXKPI to opt_dontuse.h as it is only used to control the set of compiled files. Reviewed by: kib, cem, jhb, jtl Sponsored by: DARPA, AFRL Differential Revision: https://reviews.freebsd.org/D14941	2018-04-06 17:35:35 +00:00
mjg	25e9d331dc	Reduce contention on the proctree lock during heavy package build. There is a proctree -> allproc ordering established. Most of the time it is either xlock -> xlock or slock -> slock. On fork however there is a slock -> xlock pair which results in pathological wait times due to threads keeping proctree held for reading and all waiting on allproc. Switch this to xlock -> xlock. Longer term fix would get rid of proctree in this place to begin with. Right now it is necessary to walk the session/process group lists to determine which id is free. The walk can be avoided e.g. with bitmaps. The exit path used to have one place which dealt with allproc and then with proctree. Move the allproc acquire into the section protected by proctree. This reduces contention against threads waiting on proctree in the fork codepath - the fork proctree holder does not have to wait for allproc as often. Finally, move tidhash manipulation outside of the area protected by either of these locks. The removal from the hash was already unprotected. There is no legitimate reason to look up thread ids for a process still under construction. This results in about 50% wait time reduction during -j 128 package build.	2018-02-20 02:18:30 +00:00
mjg	7fcf37e646	Fix process exit vs reap race introduced in r329449 The race manifested itself mostly in terms of crashes with "spin lock held too long". Relevant parts of respective code paths: exit: reap: PROC_LOCK(p); PROC_SLOCK(p); p->p_state == PRS_ZOMBIE PROC_UNLOCK(p); PROC_LOCK(p); /* exit work / if (p->p_state == PRS_ZOMBIE) / true / proc_reap() free proc / more exit work */ PROC_SUNLOCK(p); Thus a still exiting process is reaped. Prior to the change the zombie check was followed by slock/sunlock trip which prevented the problem. Even code prior to this commit has a bug: the proc is still accessed for statistic collection purposes. However, the severity is rather small and the bug may be fixed in a future commit. Reported by: many Tested by: allanjude	2018-02-19 00:54:08 +00:00
mjg	7669ad8f23	exit: get rid of PROC_SLOCK when checking a process to report, take #2 The suspension counter needs synchronisation through slock, but we don't need it to check if inspecting the counter is necessary to begin with. In the common case it is not, thus avoid the lock if possible. Reviewed by: kib Tested by: pho	2018-02-18 21:07:15 +00:00
mjg	075146c37f	Revert r329448. Turns out is is actually racy, reproducible with stress2/misc/truss.sh Requested by: kib	2018-02-17 17:23:43 +00:00
mjg	018f299fa7	exit: stop doing PROC_SLOCK just to call proc_reap It immediately does PROC_SUNLOCK anyway and the lock plays no role.	2018-02-17 09:03:11 +00:00
mjg	fc7474d6fb	exit: get rid of PROC_SLOCK when checking a process to report All accessed fields are protected with already held process lock.	2018-02-17 08:48:45 +00:00
mjg	7616e6b9b0	On process exit signal the parent after dropping the proctree lock.	2018-02-17 00:24:50 +00:00
mjg	6b281480bb	Unref the prison after proctree is dropped.	2018-02-17 00:23:56 +00:00
mjg	c1f86f952d	Tidy up kern_wait6 - don't relock curproc in msleep - don't relock proctree if P_STATCHILD is spotted - reformat the proc_to_reap call in the main loop	2018-02-17 00:21:50 +00:00
jeff	94c7af8ca2	Implement 'domainset', a cpuset based NUMA policy mechanism. This allows userspace to control NUMA policy administratively and programmatically. Implement domainset based iterators in the page layer. Remove the now legacy numa_* syscalls. Cleanup some header polution created by having seq.h in proc.h. Reviewed by: markj, kib Discussed with: alc Tested by: pho Sponsored by: Netflix, Dell/EMC Isilon Differential Revision: https://reviews.freebsd.org/D13403	2018-01-12 22:48:23 +00:00
pfg	4736ccfd9c	sys: further adoption of SPDX licensing ID tags. Mainly focus on files that use BSD 3-Clause license. The Software Package Data Exchange (SPDX) group provides a specification to make it easier for automated tools to detect and summarize well known opensource licenses. We are gradually adopting the specification, noting that the tags are considered only advisory and do not, in any way, superceed or replace the license texts. Special thanks to Wind River for providing access to "The Duke of Highlander" tool: an older (2014) run over FreeBSD tree was useful as a starting point.	2017-11-20 19:43:44 +00:00
mjoras	9c18ca3bd2	Introduce EVENTHANDLER_LIST and some users. This introduces a facility to EVENTHANDLER(9) for explicitly defining a reference to an event handler list. This is useful since previously all invokers of events had to do a locked traversal of the global list of event handler lists in order to find the appropriate event handler list. By keeping a pointer to the appropriate list an invoker can avoid this traversal completely. The pointer is initialized with SYSINIT(9) during the eventhandler stage. Users registering interest in events do not need to know if the event is backed by such a list, since the list is added to the global list of lists. As with lists that are not pre-defined it is safe to register for the events before the list has been created. This converts the process_* and thread_* events to using the new facility, as these are events whose locked traversals end up showing up significantly in ports build workflows (and presumably other workflows with many short lived threads/procs). It may be advantageous to convert other events to using the new facility. The el_flags field is now unused, but leave it be so that this revision can be MFC'd. Reviewed by: bdrewery, markj, mjg Approved by: rstone (mentor) In collaboration with: ian MFC after: 4 weeks Sponsored by: Dell EMC Isilon Differential Revision: https://reviews.freebsd.org/D12814	2017-11-09 22:51:48 +00:00
kib	34f9433195	Avoid reusing p_ksi while it is on queue. When sending SIGCHLD informing reaper that a zombie was reparented to it, we might race with the situation where the previous parent still not finished delivering SIGCHLD and having its p_ksi structure on the signal queue. While on queue, the ksi should not be used for another send. Fix this by copying p_ksi into newly allocated ksi, which is directly put onto reaper sigqueue. The later ensures that siginfo for reaper SIGCHLD is always present, similar to guarantees for siginfo of child. Reported by: bdrewery Discussed with: jilles Sponsored by: The FreeBSD Foundation MFC after: 1 week	2017-03-12 13:58:51 +00:00
kib	9973ae075e	Do not leak mount references for dying threads. Thread might create a condition for delayed SU cleanup, which creates a reference to the mount point in td_su, but exit without returning through userret(), e.g. when terminating due to single-threading or process exit. In this case, td_su reference is not dropped and mount point cannot be freed. Handle the situation by clearing td_su also in the thread destructor and in exit1(). softdep_ast_cleanup() has to receive the thread as argument, since e.g. thread destructor is executed in different context. Reported and tested by: pho Sponsored by: The FreeBSD Foundation MFC after: 2 weeks	2017-02-25 10:38:18 +00:00
kib	16d32c998e	When a zombie gets reparented due to the parent exit, send SIGCHLD to the reaper. The traditional reaper init(8) is aware of zombies silently reparented to it after the parents exit, it loops around waitpid(2) to collect them. For other reapers, the silent reparenting is surprising and collecting zombies requires a thread blocking in waitpid(2) just for that purpose. It seems that sending second SIGCHLD is a better workaround than forcing all reapers to obey the setup. Reported by: Michael Zuo <muh.muhten@gmail.com>, jilles PR: 213928 Reviewed by: jilles (previous version) Sponsored by: The FreeBSD Foundation MFC after: 2 weeks	2016-12-12 11:11:50 +00:00
kib	aefd35d8c2	Restructure the code to handle reporting of non-exited processes from wait(2). - Do not acquire the process spinlock if neither WTRAPPED nor WUNTRACED options were passed [1]. - Extract the code to report alive process into a new helper report_alive_proc() and use it for trapped, stopped and continued childrens. Note that the process spinlock is required around the WTRAPPED and WUNTRACED tests, because P_STOPPED_TRACE and P_STOPPED_SIG flags are set before other threads are stopped at the suspension point, and that threads increment p_suspcount while owning only the process spinlock, the process lock is dropped by them. If the spinlock is not taken for tests, the syscall thread might miss both p_suspcount increment and wakeup in wakeup in thread_suspend_switch(). Based on the submission by: mjg [1] Tested by: pho Sponsored by: The FreeBSD Foundation MFC after: 1 week	2016-12-04 20:44:58 +00:00
mjg	cbc182c172	wait: avoid relocking the child if proc_to_reap returns 1 proc_to_reap would always unlock. However, if it returned 1, kern_wait6 would immediately lock it again. Save the dance. Reviewed by: kib	2016-11-24 18:21:48 +00:00
emaste	00b67b15b9	Renumber license clauses in sys/kern to avoid skipping #3	2016-09-15 13:16:20 +00:00
jilles	b8f715c74f	wait: Do not copyout uninitialized status/rusage/wrusage. If wait4() or wait6() return 0 because of WNOHANG, the status, rusage and wrusage information should not be returned. PR: 212048 Reported by: Casey Lucas MFC after: 2 weeks	2016-09-09 21:58:48 +00:00
kib	6ebb9a02fc	When a debugger attaches to the process, SIGSTOP is sent to the target. Due to a way issignal() selects the next signal to deliver and report, if the simultaneous or already pending another signal exists, that signal might be reported by the next waitpid(2) call. This causes minor annoyance for debuggers, which must be prepared to take any signal as the first event, then filter SIGSTOP later. More importantly, for tools like gcore(1), which attach and then detach without processing events, SIGSTOP might leak to be delivered after PT_DETACH. This results in the process being unintentionally stopped after detach, which is fatal for automatic tools. The solution is to force SIGSTOP to be the first signal reported after the attach. Attach code is modified to set P2_PTRACE_FSTP to indicate that the attaching ritual was not yet finished, and issignal() prefers SIGSTOP in that condition. Also, the thread which handles P2_PTRACE_FSTP is made to guarantee to own p_xthread during the first waitpid(2). All that ensures that SIGSTOP is consumed first. Additionally, if P2_PTRACE_FSTP is still set on detach, which means that waitpid(2) was not called at all, SIGSTOP is removed from the queue, ensuring that the process is resumed on detach. In issignal(), when acting on STOPing signals, remove the signal from queue before suspending. Otherwise parallel attach could result in ptracestop() acting on that STOP as if it was the STOP signal from the attach. Then SIGSTOP from attach leaks again. As a minor refactoring, some bits of the common attach code is moved to new helper proc_set_traced(). Reported by: markj Reviewed by: jhb, markj Tested by: pho Sponsored by: The FreeBSD Foundation MFC after: 2 weeks Differential revision: https://reviews.freebsd.org/D7256	2016-07-28 08:41:13 +00:00
kib	368a94767c	Fix another bug after r302350. Reported and tested by: pho PR: 210884 Sponsored by: The FreeBSD Foundation MFC after: 3 days	2016-07-18 04:30:34 +00:00
jhb	91d07047c4	Add a mask of optional ptrace() events. ptrace() now stores a mask of optional events in p_ptevents. Currently this mask is a single integer, but it can be expanded into an array of integers in the future. Two new ptrace requests can be used to manipulate the event mask: PT_GET_EVENT_MASK fetches the current event mask and PT_SET_EVENT_MASK sets the current event mask. The current set of events include: - PTRACE_EXEC: trace calls to execve(). - PTRACE_SCE: trace system call entries. - PTRACE_SCX: trace syscam call exits. - PTRACE_FORK: trace forks and auto-attach to new child processes. - PTRACE_LWP: trace LWP events. The S_PT_SCX and S_PT_SCE events in the procfs p_stops flags have been replaced by PTRACE_SCE and PTRACE_SCX. PTRACE_FORK replaces P_FOLLOW_FORK and PTRACE_LWP replaces P2_LWP_EVENTS. The PT_FOLLOW_FORK and PT_LWP_EVENTS ptrace requests remain for compatibility but now simply toggle corresponding flags in the event mask. While here, document that PT_SYSCALL, PT_TO_SCE, and PT_TO_SCX both modify the event mask and continue the traced process. Reviewed by: kib MFC after: 1 month Differential Revision: https://reviews.freebsd.org/D7044	2016-07-15 15:32:09 +00:00
kib	15841a7afe	When filt_proc() removes event from the knlist due to the process exiting (NOTE_EXIT->knlist_remove_inevent()), two things happen: - knote kn_knlist pointer is reset - INFLUX knote is removed from the process knlist. And, there are two consequences: - KN_LIST_UNLOCK() on such knote is nop - there is nothing which would block exit1() from processing past the knlist_destroy() (and knlist_destroy() resets knlist lock pointers). Both consequences result either in leaked process lock, or dereferencing NULL function pointers for locking. Handle this by stopping embedding the process knlist into struct proc. Instead, the knlist is allocated together with struct proc, but marked as autodestroy on the zombie reap, by knlist_detach() function. The knlist is freed when last kevent is removed from the list, in particular, at the zombie reap time if the list is empty. As result, the knlist_remove_inevent() is no longer needed and removed. Other changes: In filt_procattach(), clear NOTE_EXEC and NOTE_FORK desired events from kn_sfflags for knote registered by kernel to only get NOTE_CHILD notifications. The flags leak resulted in excessive NOTE_EXEC/NOTE_FORK reports. Fix immediate note activation in filt_procattach(). Condition should be either the immediate CHILD_NOTE activation, or immediate NOTE_EXIT report for the exiting process. In knote_fork(), do not perform racy check for KN_INFLUX before kq lock is taken. Besides being racy, it did not accounted for notes just added by scan (KN_SCAN). Some minor and incomplete style fixes. Analyzed and tested by: Eric Badger <eric@badgerio.us> Reviewed by: jhb Sponsored by: The FreeBSD Foundation MFC after: 2 weeks Approved by: re (gjb) Differential revision: https://reviews.freebsd.org/D6859	2016-06-27 21:52:17 +00:00
kib	8da898f26c	Add implementation of robust mutexes, hopefully close enough to the intention of the POSIX IEEE Std 1003.1TM-2008/Cor 1-2013. A robust mutex is guaranteed to be cleared by the system upon either thread or process owner termination while the mutex is held. The next mutex locker is then notified about inconsistent mutex state and can execute (or abandon) corrective actions. The patch mostly consists of small changes here and there, adding neccessary checks for the inconsistent and abandoned conditions into existing paths. Additionally, the thread exit handler was extended to iterate over the userspace-maintained list of owned robust mutexes, unlocking and marking as terminated each of them. The list of owned robust mutexes cannot be maintained atomically synchronous with the mutex lock state (it is possible in kernel, but is too expensive). Instead, for the duration of lock or unlock operation, the current mutex is remembered in a special slot that is also checked by the kernel at thread termination. Kernel must be aware about the per-thread location of the heads of robust mutex lists and the current active mutex slot. When a thread touches a robust mutex for the first time, a new umtx op syscall is issued which informs about location of lists heads. The umtx sleep queues for PP and PI mutexes are split between non-robust and robust. Somewhat unrelated changes in the patch: 1. Style. 2. The fix for proper tdfind() call use in umtxq_sleep_pi() for shared pi mutexes. 3. Removal of the userspace struct pthread_mutex m_owner field. 4. The sysctl kern.ipc.umtx_vnode_persistent is added, which controls the lifetime of the shared mutex associated with a vnode' page. Reviewed by: jilles (previous version, supposedly the objection was fixed) Discussed with: brooks, Martin Simmons <martin@lispworks.com> (some aspects) Tested by: pho Sponsored by: The FreeBSD Foundation	2016-05-17 09:56:22 +00:00
mjg	eabf748a18	session: avoid proctree lock on proc exit when possible We can get away with the common case with only proc lock held. Reviewed by: kib	2016-01-20 23:33:58 +00:00

1 2 3 4 5 ...

444 Commits