freebsd-skq

Author	SHA1	Message	Date
Konstantin Belousov	5cbdd18fd4	Make umtxq_check_susp() to correctly handle thread exit requests. The check for P_SINGLE_EXIT was shadowed by the (P_SHOULDSTOP \|\| traced) check. Reported by: bdrewery (might be) Reviewed by: markj Tested by: pho MFC after: 1 week Sponsored by: The FreeBSD Foundation Differential revision: https://reviews.freebsd.org/D21124	2019-08-01 14:34:27 +00:00
Konstantin Belousov	fd336e2ac0	Fix handling of transient casueword(9) failures in do_sem_wait(). In particular, restart should be only done when the failure is transient. For this, recheck the count1 value after the operation. Note that do_sem_wait() is older usem interface. Reported and tested by: bdrewery Sponsored by: The FreeBSD Foundation MFC after: 1 week	2019-07-31 19:16:49 +00:00
Konstantin Belousov	9f7b7da5bf	In do_sem2_wait(), balance umtx_key_get() with umtx_key_release() on retry. Reported by: ler Bisected and reviewed by: markj Sponsored by: The FreeBSD Foundation MFC after: 12 days	2019-07-15 19:18:25 +00:00
Konstantin Belousov	ad038195bd	In do_lock_pi(), do not return prematurely. If umtxq_check_susp() indicates an exit, we should clean the resources before returning. Do it by breaking out of the loop and relying on post-loop cleanup. Reviewed by: markj Tested by: pho Sponsored by: The FreeBSD Foundation MFC after: 12 days Differential revision: https://reviews.freebsd.org/D20949	2019-07-15 08:39:52 +00:00
Konstantin Belousov	40bd868ba7	Correctly check for casueword(9) success in do_set_ceiling(). After r349951, the return code must be checked instead of old == new comparision. Reviewed by: markj Tested by: pho Sponsored by: The FreeBSD Foundation MFC after: 12 days Differential revision: https://reviews.freebsd.org/D20949	2019-07-15 08:38:01 +00:00
Konstantin Belousov	30b3018d48	Provide protection against starvation of the ll/sc loops when accessing userpace. Casueword(9) on ll/sc architectures must be prepared for userspace constantly modifying the same cache line as containing the CAS word, and not loop infinitely. Otherwise, rogue userspace livelocks the kernel. To fix the issue, change casueword(9) interface to return new value 1 indicating that either comparision or store failed, instead of relying on the oldval == *oldvalp comparison. The primitive no longer retries the operation if it failed spuriously. Modify callers of casueword(9), all in kern_umtx.c, to handle retries, and react to stops and requests to terminate between retries. On x86, despite cmpxchg should not return spurious failures, we can take advantage of the new interface and just return PSL.ZF. Reviewed by: andrew (arm64, previous version), markj Tested by: pho Reported by: https://xenbits.xen.org/xsa/advisory-295.txt Sponsored by: The FreeBSD Foundation MFC after: 2 weeks Differential revision: https://reviews.freebsd.org/D20772	2019-07-12 18:43:24 +00:00
Konstantin Belousov	7fde3c6b28	More style. Re-wrap long lines, reformat comments, remove excessive blank line. Sponsored by: The FreeBSD Foundation MFC after: 3 days	2019-07-02 21:03:06 +00:00
Konstantin Belousov	4b8b28e130	Style. Sponsored by: The FreeBSD Foundation MFC after: 3 days	2019-07-02 19:32:48 +00:00
Konstantin Belousov	2d7a555294	Style. Sponsored by: The FreeBSD Foundation MFC after: 3 days	2019-06-28 20:40:54 +00:00
Mateusz Guzik	ac97da9ad8	Reduce umtx-related work on exec and exit - there is no need to take the process lock to iterate the thread list after single-threading is enforced - typically there are no mutexes to clean up (testable without taking the global umtx lock) - typically there is no need to adjust the priority (testable without taking thread lock) Reviewed by: kib Sponsored by: The FreeBSD Foundation Differential Revision: https://reviews.freebsd.org/D20160	2019-05-08 16:30:38 +00:00
Mateusz Guzik	6017827676	umtx: avoid umtxshm locking on object termination if possible Sample build world result on tmpfs: kern.ipc.umtx_terminate_notempty: 0 kern.ipc.umtx_terminate_empty: 2891815 Sponsored by: The FreeBSD Foundation	2018-12-08 14:04:57 +00:00
Brooks Davis	b34f4419fb	Make freebsd32_umtx_op follow the freebsd32_foo convention. Sponsored by: DARPA, AFRL	2018-11-09 00:46:10 +00:00
Alan Somers	6040822c4e	Make timespecadd(3) and friends public The timespecadd(3) family of macros were imported from NetBSD back in r35029. However, they were initially guarded by #ifdef _KERNEL. In the meantime, we have grown at least 28 syscalls that use timespecs in some way, leading many programs both inside and outside of the base system to redefine those macros. It's better just to make the definitions public. Our kernel currently defines two-argument versions of timespecadd and timespecsub. NetBSD, OpenBSD, and FreeDesktop.org's libbsd, however, define three-argument versions. Solaris also defines a three-argument version, but only in its kernel. This revision changes our definition to match the common three-argument version. Bump _FreeBSD_version due to the breaking KPI change. Discussed with: cem, jilles, ian, bde Differential Revision: https://reviews.freebsd.org/D14725	2018-07-30 15:46:40 +00:00
Matt Macy	e1a92f058f	umtx: don't call umtxq_getchain unless the value is needed	2018-05-19 05:09:10 +00:00
Brooks Davis	6469bdcdb6	Move most of the contents of opt_compat.h to opt_global.h. opt_compat.h is mentioned in nearly 180 files. In-progress network driver compabibility improvements may add over 100 more so this is closer to "just about everywhere" than "only some files" per the guidance in sys/conf/options. Keep COMPAT_LINUX32 in opt_compat.h as it is confined to a subset of sys/compat/linux/*.c. A fake _COMPAT_LINUX option ensure opt_compat.h is created on all architectures. Move COMPAT_LINUXKPI to opt_dontuse.h as it is only used to control the set of compiled files. Reviewed by: kib, cem, jhb, jtl Sponsored by: DARPA, AFRL Differential Revision: https://reviews.freebsd.org/D14941	2018-04-06 17:35:35 +00:00
Brooks Davis	91a743004c	Use umtx_copyin_umtx_time32() in __umtx_op_lock_umutex_compat32(). Non-NULL timeouts where copied in improperly and could produce failures due to incompatible data structures. Reviewed by: kib MFC after: 3 days Sponsored by: DARPA, AFRL Differential Revision: https://reviews.freebsd.org/D14587	2018-03-06 01:52:04 +00:00
Pedro F. Giffuni	8a36da99de	sys/kern: adoption of SPDX licensing ID tags. Mainly focus on files that use BSD 2-Clause license, however the tool I was using misidentified many licenses so this was mostly a manual - error prone - task. The Software Package Data Exchange (SPDX) group provides a specification to make it easier for automated tools to detect and summarize well known opensource licenses. We are gradually adopting the specification, noting that the tags are considered only advisory and do not, in any way, superceed or replace the license texts.	2017-11-27 15:20:12 +00:00
Konstantin Belousov	30c438723d	Convert explicit panic() call to assert. Based on github pull request: #113 Submitted by: pmarillo@github MFC after: 1 week	2017-11-04 10:49:34 +00:00
Eric van Gyzen	9dbdf2a169	When the RTC is adjusted, reevaluate absolute sleep times based on the RTC POSIX 2008 says this about clock_settime(2): If the value of the CLOCK_REALTIME clock is set via clock_settime(), the new value of the clock shall be used to determine the time of expiration for absolute time services based upon the CLOCK_REALTIME clock. This applies to the time at which armed absolute timers expire. If the absolute time requested at the invocation of such a time service is before the new value of the clock, the time service shall expire immediately as if the clock had reached the requested time normally. Setting the value of the CLOCK_REALTIME clock via clock_settime() shall have no effect on threads that are blocked waiting for a relative time service based upon this clock, including the nanosleep() function; nor on the expiration of relative timers based upon this clock. Consequently, these time services shall expire when the requested relative interval elapses, independently of the new or old value of the clock. When the real-time clock is adjusted, such as by clock_settime(3), wake any threads sleeping until an absolute real-clock time. Such a sleep is indicated by a non-zero td_rtcgen. The sleep functions will set that field to zero and return zero to tell the caller to reevaluate its sleep duration based on the new value of the clock. At present, this affects the following functions: pthread_cond_timedwait(3) pthread_mutex_timedlock(3) pthread_rwlock_timedrdlock(3) pthread_rwlock_timedwrlock(3) sem_timedwait(3) sem_clockwait_np(3) I'm working on adding clock_nanosleep(2), which will also be affected. Reported by: Sebastian Huber <sebastian.huber@embedded-brains.de> Reviewed by: jhb, kib MFC after: 2 weeks Relnotes: yes Sponsored by: Dell EMC Differential Revision: https://reviews.freebsd.org/D9791	2017-03-14 19:06:44 +00:00
Eric van Gyzen	b215ceaaec	Add sem_clockwait_np() This function allows the caller to specify the reference clock and choose between absolute and relative mode. In relative mode, the remaining time can be returned. The API is similar to clock_nanosleep(3). Thanks to Ed Schouten for that suggestion. While I'm here, reduce the sleep time in the semaphore "child" test to greatly reduce its runtime. Also add a reasonable timeout. Reviewed by: ed (userland) MFC after: 2 weeks Relnotes: yes Sponsored by: Dell EMC Differential Revision: https://reviews.freebsd.org/D9656	2017-02-23 19:36:38 +00:00
Adrian Chadd	0046bef85a	[mips] make UMTX_CHAINS configurable at compile time. The default (512) wastes quite a bit of space which doesn't really buy us much on highly embedded systems which don't take a lot of locks in parallel. This makes it at least build time configurable so people can experiment.	2016-11-15 01:34:38 +00:00
Konstantin Belousov	0f2d97838d	In both do_rw_wrlock() and do_rw_rdlock() after r304808, do not obliterate possible error from sleep with errors from umtxq_check_susp(), when looping to clear URWLOCK_{READ,WRITE}_WAITERS. Noted and reviewed by: vangyzen Sponsored by: The FreeBSD Foundation MFC after: 1 week	2016-08-25 19:15:02 +00:00
Konstantin Belousov	28e21133f3	Prevent leak of URWLOCK_READ_WAITERS flag for urwlocks. If there was some error, e.g. the sleep was interrupted, as in the referenced PR, do_rw_rdlock() did not cleared URWLOCK_READ_WAITERS. Since unlock only wakes up write waiters when there is no read waiters, for URWLOCK_PREFER_READER kind of locks, the result was missed wakeups for writers. In particular, the most visible victims are ld-elf.so locks in processes which loaded libthr, because rtld locks are urwlocks in prefer-reader mode. Normal rwlocks fall into prefer-reader mode only if thread already owns rw lock in read mode, which is not typical and correspondingly less visible. In the PR, unowned rtld bind lock was waited for in the process where only one thread was left alive. Note that do_rw_wrlock() correctly clears URWLOCK_WRITE_WAITERS in case of errors. Reported and tested by: longwitz@incore.de PR: 211947 Sponsored by: The FreeBSD Foundation MFC after: 1 week	2016-08-25 16:35:42 +00:00
Eric Badger	b0f2185bbe	sem_post(): wake up the sleeper only after adjusting has_waiters If the caller of sem_post() wakes up a thread sleeping via sem_wait() before it clears the has_waiters flag, the caller of sem_wait() has no way of knowing when it is safe to destroy the semaphore and reuse the memory. This is because the caller of sem_post() may be interrupted between the wake step and the clearing of has_waiters. It will then write into the has_waiters flag in userspace after being preempted for some unknown amount of time. Reviewed by: jhb, kib, vangyzen Approved by: kib (mentor), vangyzen (mentor) MFC after: 2 weeks Sponsored by: Dell Inc. Differential Revision: https://reviews.freebsd.org/D7505	2016-08-15 20:09:09 +00:00
Konstantin Belousov	2a339d9e3d	Add implementation of robust mutexes, hopefully close enough to the intention of the POSIX IEEE Std 1003.1TM-2008/Cor 1-2013. A robust mutex is guaranteed to be cleared by the system upon either thread or process owner termination while the mutex is held. The next mutex locker is then notified about inconsistent mutex state and can execute (or abandon) corrective actions. The patch mostly consists of small changes here and there, adding neccessary checks for the inconsistent and abandoned conditions into existing paths. Additionally, the thread exit handler was extended to iterate over the userspace-maintained list of owned robust mutexes, unlocking and marking as terminated each of them. The list of owned robust mutexes cannot be maintained atomically synchronous with the mutex lock state (it is possible in kernel, but is too expensive). Instead, for the duration of lock or unlock operation, the current mutex is remembered in a special slot that is also checked by the kernel at thread termination. Kernel must be aware about the per-thread location of the heads of robust mutex lists and the current active mutex slot. When a thread touches a robust mutex for the first time, a new umtx op syscall is issued which informs about location of lists heads. The umtx sleep queues for PP and PI mutexes are split between non-robust and robust. Somewhat unrelated changes in the patch: 1. Style. 2. The fix for proper tdfind() call use in umtxq_sleep_pi() for shared pi mutexes. 3. Removal of the userspace struct pthread_mutex m_owner field. 4. The sysctl kern.ipc.umtx_vnode_persistent is added, which controls the lifetime of the shared mutex associated with a vnode' page. Reviewed by: jilles (previous version, supposedly the objection was fixed) Discussed with: brooks, Martin Simmons <martin@lispworks.com> (some aspects) Tested by: pho Sponsored by: The FreeBSD Foundation	2016-05-17 09:56:22 +00:00
Konstantin Belousov	2cfddaa6ff	Fix umtx lock/trylock for compat32. Sponsored by: The FreeBSD Foundation MFC after: 3 days	2016-04-19 11:37:43 +00:00
Konstantin Belousov	1bdbd70599	Implement process-shared locks support for libthr.so.3, without breaking the ABI. Special value is stored in the lock pointer to indicate shared lock, and offline page in the shared memory is allocated to store the actual lock. Reviewed by: vangyzen (previous version) Discussed with: deischen, emaste, jhb, rwatson, Martin Simmons <martin@lispworks.com> Tested by: pho Sponsored by: The FreeBSD Foundation	2016-02-28 17:52:33 +00:00
Konstantin Belousov	50657fd342	Minor (and incomplete) style cleanup. Sponsored by: The FreeBSD Foundation MFC after: 1 week	2015-10-30 20:47:42 +00:00
Konstantin Belousov	2936e0013c	Also mark compat32 umtx op table as constant. Sponsored by: The FreeBSD Foundation MFC after: 1 week	2015-10-30 19:32:30 +00:00
Konstantin Belousov	c539e87014	Use C99 array initialization, which also makes the code self-documented, and eases addition of new ops. For the similar reasons, eliminate UMTX_OP_MAX. nitems() handles the only use of the symbol. Sponsored by: The FreeBSD Foundation MFC after: 1 week	2015-10-30 19:20:40 +00:00
Ed Schouten	dc4b532479	Fix bad arithmetic in umtx_key_get() to compute object offset. It looks like umtx_key_get() has the addition and subtraction the wrong way around, meaning that it fails to match in certain cases. This causes the cloudlibc unit tests to deadlock in certain cases. Reviewed by: kib Differential Revision: https://reviews.freebsd.org/D3287	2015-08-04 06:01:13 +00:00
Ed Schouten	52942c1eae	Add missing const keyword to function parameter. The umtx_key_get() function does not dereference the address off the userspace object. The pointer can safely be const.	2015-08-03 21:11:33 +00:00
Eric van Gyzen	e858b027db	Clean up some cosmetic nits in kern_umtx.c, found during recent work in this area and by the Clang static analyzer. Remove some dead assignments. Fix a typo in a panic string. Use umtx_pi_disown() instead of duplicate code. Use an existing variable instead of curthread. Approved by: kib (mentor) MFC after: 3 days Sponsored by: Dell Inc	2015-03-28 21:21:40 +00:00
Konstantin Belousov	13dad10871	The umtx_lock mutex is used by top-half of the kernel, but is currently a spin lock. Apparently, the only reason for this is that umtx_thread_exit() is called under the process spinlock, which put the requirement on the umtx_lock. Note that the witness static order list is wrong for the umtx_lock, umtx_lock is explicitely before any thread lock, so it is also before sleepq locks. Change umtx_lock to be the sleepable mutex. For the reason above, the calls to umtx_thread_exit() are moved from thread_exit() earlier in each caller, when the process spin lock is not yet taken. Discussed with: jhb Tested by: pho (previous version) Sponsored by: The FreeBSD Foundation MFC after: 3 weeks	2015-02-28 04:19:02 +00:00
Konstantin Belousov	84b736b268	When failing to claim ownership of a umtx_pi, restore the umutex owner to its previous, unowned state. This avoids compounding an existing problem of inconsistent ownership. Submitted by: Eric van Gyzen <eric_van_gyzen@dell.com> Obtained from: Dell Inc. PR: 198914 MFC after: 1 week	2015-02-25 16:17:16 +00:00
Konstantin Belousov	cc876d2c5c	When unlocking a contested PI pthread mutex, if the queue of waiters is empty, look up the umtx_pi and disown it if the current thread owns it. This can happen if a signal or timeout removed the last waiter from the queue, but there is still a thread in do_lock_pi() holding a reference on the umtx_pi. The unlocking thread might not own the umtx_pi in this case, but if it does, it must disown it to keep the ownership consistent between the umtx_pi and the umutex. Submitted by: Eric van Gyzen <eric_van_gyzen@dell.com> with advice from: Elliott Rabe and Jim Muchow, also at Dell Inc. Obtained from: Dell Inc. PR: 198914	2015-02-25 16:12:56 +00:00
Konstantin Belousov	6690381ef1	The dependency chain for priority-inheritance mutexes could be subverted by userspace into cycle. Both umtx_propagate_priority() and umtx_repropagate_priority() would then loop infinitely, owning the spinlock. Check for the cycle using standard Floyd' algorithm before doing the pass in the affected functions. Add simple check for condition of tricking the thread into a wait for itself, which could be easily simulated by usermode without race. Found by: Eric van Gyzen <eric@vangyzen.net> In collaboration with: Eric van Gyzen <eric@vangyzen.net> Tested by: pho MFC after: 1 week	2015-01-31 12:27:40 +00:00
Konstantin Belousov	416be7a1c6	Fix assertion, &uc->uc_busy is never zero, the intent is to test the uc_busy value, and not its address [1]. Remove the single use of the macro, write KASSERT() explicitely in the code of umtxq_sleep_pi(). Submitted by: Eric van Gyzen <eric@vangyzen.net> [1] MFC after: 1 week	2014-11-13 18:51:09 +00:00
Konstantin Belousov	2361c6d135	Add type qualifier volatile to the base (userspace) address argument of fuword(9) and suword(9). This makes the functions type-compatible with volatile objects and does not require devolatile force, e.g. in kern_umtx.c. Requested by: bde Reviewed by: jhb Sponsored by: The FreeBSD Foundation MFC after: 3 weeks	2014-10-31 17:43:21 +00:00
Konstantin Belousov	f7e91c288a	Convert kern_umtx.c to use fueword() and casueword(). Also fix some mishandling of suword(9) errors as errno, which resulted in spurious ERESTART. Sponsored by: The FreeBSD Foundation Tested by: pho MFC after: 3 weeks	2014-10-28 15:30:33 +00:00
John Baldwin	1bc9ea1caa	Use correct type in __DEVOLATILE().	2014-10-25 20:42:47 +00:00
Xin LI	6362e06b42	Fix build.	2014-10-25 00:16:36 +00:00
John Baldwin	53e1ffbbce	The current POSIX semaphore implementation stores the _has_waiters flag in a separate word from the _count. This does not permit both items to be updated atomically in a portable manner. As a result, sem_post() must always perform a system call to safely clear _has_waiters. This change removes the _has_waiters field and instead uses the high bit of _count as the _has_waiters flag. A new umtx object type (_usem2) and two new umtx operations are added (SEM_WAIT2 and SEM_WAKE2) to implement these semantics. The older operations are still supported under the COMPAT_FREEBSD9/10 options. The POSIX semaphore API in libc has been updated to use the new implementation. Note that the new implementation is not compatible with the previous implementation. However, this only affects static binaries (which cannot be helped by symbol versioning). Binaries using a dynamic libc will continue to work fine. SEM_MAGIC has been bumped so that mismatched binaries will error rather than corrupting a shared semaphore. In addition, a padding field has been added to sem_t so that it remains the same size. Differential Revision: https://reviews.freebsd.org/D961 Reported by: adrian Reviewed by: kib, jilles (earlier version) Sponsored by: Norse	2014-10-24 20:02:44 +00:00
Konstantin Belousov	fbb6eca60f	In do_lock_pi(), do not override error from umtxq_sleep_pi() when doing suspend check. This restores the pre-r251684 behaviour, to retry once after the signal is detected. PR: kern/192918 Submitted by: Elliott Rabe, Dell Inc., Eric van Gyzen <eric@vangyzen.net> Obtained from: Dell Inc. MFC after: 1 week	2014-08-22 18:42:14 +00:00
Attilio Rao	3198603edd	Fix comments. Sponsored by: EMC / Isilon Storage Division	2014-03-19 12:45:40 +00:00
Attilio Rao	ce42e79310	Remove dead code from umtx support: - Retire long time unused (basically always unused) sys__umtx_lock() and sys__umtx_unlock() syscalls - struct umtx and their supporting definitions - UMUTEX_ERROR_CHECK flag - Retire UMTX_OP_LOCK/UMTX_OP_UNLOCK from _umtx_op() syscall __FreeBSD_version is not bumped yet because it is expected that further breakages to the umtx interface will follow up in the next days. However there will be a final bump when necessary. Sponsored by: EMC / Isilon storage division Reviewed by: jhb	2014-03-18 21:32:03 +00:00
Konstantin Belousov	1d7466bca4	Fix two issues with the spin loops in the umtx(2) implementation. - When looping, check for the pending suspension. Otherwise, other usermode thread which races with the looping one, could try to prevent the process from stopping or exiting. - Add missed checks for the faults from casuword*(). The code is structured in a way which makes the loops exit if the specified address is invalid, since both fuword() and casuword() return -1 on the fault. But if the address is mapped readonly, the typical value read by fuword() is different from -1, while casuword() returns -1. Absent the checks for casuword() faults, this is interpreted as the race with other thread and causes non-interruptible spinning in the kernel. Reported and tested by: pho Sponsored by: The FreeBSD Foundation MFC after: 2 weeks	2013-06-13 09:33:22 +00:00
Jilles Tjoelker	1e367efa8b	sem: Restart the POSIX sem_* calls after signals with SA_RESTART set. Programs often do not expect an [EINTR] return from sem_wait() and POSIX only allows it if the signal was installed without SA_RESTART. The timeout in sem_timedwait() is absolute so it can be restarted normally. The umtx call can be invoked with a relative timeout and in that case [ERESTART] must be changed to [EINTR]. However, libc does not do this. The old POSIX semaphore implementation did this correctly (before r249566), unlike the new umtx one. It may be desirable to avoid [EINTR] completely, which matches the pthread functions and is explicitly permitted by POSIX. However, the kernel must return [EINTR] at least for signals with SA_RESTART clear, otherwise pthread cancellation will not abort a semaphore wait. In this commit, only restore the 8.x behaviour which is also permitted by POSIX. Discussed with: jhb MFC after: 1 week	2013-04-19 10:16:00 +00:00
Attilio Rao	d52d7aa871	Fix a bug in UMTX_PROFILING: UMTX_PROFILING should really analyze the distribution of locks as they index entries in the umtxq_chains hash-table. However, the current implementation does add/dec the length counters for every thread insert/removal, measuring at all really userland contention and not the hash distribution. Fix this by correctly add/dec the length counters in the points where it is really needed. Please note that this bug brought us questioning in the past the quality of the umtx hash table distribution. To date with all the benchmarks I could try I was not able to reproduce any issue about the hash distribution on umtx. Sponsored by: EMC / Isilon storage division Reviewed by: jeff, davide MFC after: 2 weeks	2013-03-21 19:58:25 +00:00
Attilio Rao	1fc8c346d5	Improve UMTX_PROFILING: - Use u_int values for length and max_length values - Add a way to reset the max_length heuristic in order to have the possibility to reuse the mechanism consecutively without rebooting the machine - Add a way to quick display top5 contented buckets in the system for the max_length value. This should give a quick overview on the quality of the hash table distribution. Sponsored by: EMC / Isilon storage division Reviewed by: jeff, davide	2013-03-09 15:31:19 +00:00

1 2 3 4

168 Commits