Commit Graph

199 Commits

Author SHA1 Message Date
Eric van Gyzen
b215ceaaec Add sem_clockwait_np()
This function allows the caller to specify the reference clock
and choose between absolute and relative mode.  In relative mode,
the remaining time can be returned.

The API is similar to clock_nanosleep(3).  Thanks to Ed Schouten
for that suggestion.

While I'm here, reduce the sleep time in the semaphore "child"
test to greatly reduce its runtime.  Also add a reasonable timeout.

Reviewed by:	ed (userland)
MFC after:	2 weeks
Relnotes:	yes
Sponsored by:	Dell EMC
Differential Revision:	https://reviews.freebsd.org/D9656
2017-02-23 19:36:38 +00:00
Adrian Chadd
0046bef85a [mips] make UMTX_CHAINS configurable at compile time.
The default (512) wastes quite a bit of space which doesn't really buy
us much on highly embedded systems which don't take a lot of locks in
parallel.

This makes it at least build time configurable so people can experiment.
2016-11-15 01:34:38 +00:00
Konstantin Belousov
0f2d97838d In both do_rw_wrlock() and do_rw_rdlock() after r304808, do not
obliterate possible error from sleep with errors from
umtxq_check_susp(), when looping to clear URWLOCK_{READ,WRITE}_WAITERS.

Noted and reviewed by:	vangyzen
Sponsored by:	The FreeBSD Foundation
MFC after:	1 week
2016-08-25 19:15:02 +00:00
Konstantin Belousov
28e21133f3 Prevent leak of URWLOCK_READ_WAITERS flag for urwlocks.
If there was some error, e.g. the sleep was interrupted, as in the
referenced PR, do_rw_rdlock() did not cleared URWLOCK_READ_WAITERS.
Since unlock only wakes up write waiters when there is no read
waiters, for URWLOCK_PREFER_READER kind of locks, the result was
missed wakeups for writers.

In particular, the most visible victims are ld-elf.so locks in
processes which loaded libthr, because rtld locks are urwlocks in
prefer-reader mode.  Normal rwlocks fall into prefer-reader mode only
if thread already owns rw lock in read mode, which is not typical and
correspondingly less visible.  In the PR, unowned rtld bind lock was
waited for in the process where only one thread was left alive.

Note that do_rw_wrlock() correctly clears URWLOCK_WRITE_WAITERS in
case of errors.

Reported and tested by:	longwitz@incore.de
PR:	211947
Sponsored by:	The FreeBSD Foundation
MFC after:	1 week
2016-08-25 16:35:42 +00:00
Eric Badger
b0f2185bbe sem_post(): wake up the sleeper only after adjusting has_waiters
If the caller of sem_post() wakes up a thread sleeping via sem_wait()
before it clears the has_waiters flag, the caller of sem_wait() has no way of
knowing when it is safe to destroy the semaphore and reuse the memory. This is
because the caller of sem_post() may be interrupted between the wake step and
the clearing of has_waiters. It will then write into the has_waiters flag in
userspace after being preempted for some unknown amount of time.

Reviewed by:	jhb, kib, vangyzen
Approved by:	kib (mentor), vangyzen (mentor)
MFC after:	2 weeks
Sponsored by:	Dell Inc.
Differential Revision:	https://reviews.freebsd.org/D7505
2016-08-15 20:09:09 +00:00
Konstantin Belousov
2a339d9e3d Add implementation of robust mutexes, hopefully close enough to the
intention of the POSIX IEEE Std 1003.1TM-2008/Cor 1-2013.

A robust mutex is guaranteed to be cleared by the system upon either
thread or process owner termination while the mutex is held.  The next
mutex locker is then notified about inconsistent mutex state and can
execute (or abandon) corrective actions.

The patch mostly consists of small changes here and there, adding
neccessary checks for the inconsistent and abandoned conditions into
existing paths.  Additionally, the thread exit handler was extended to
iterate over the userspace-maintained list of owned robust mutexes,
unlocking and marking as terminated each of them.

The list of owned robust mutexes cannot be maintained atomically
synchronous with the mutex lock state (it is possible in kernel, but
is too expensive).  Instead, for the duration of lock or unlock
operation, the current mutex is remembered in a special slot that is
also checked by the kernel at thread termination.

Kernel must be aware about the per-thread location of the heads of
robust mutex lists and the current active mutex slot.  When a thread
touches a robust mutex for the first time, a new umtx op syscall is
issued which informs about location of lists heads.

The umtx sleep queues for PP and PI mutexes are split between
non-robust and robust.

Somewhat unrelated changes in the patch:
1. Style.
2. The fix for proper tdfind() call use in umtxq_sleep_pi() for shared
   pi mutexes.
3. Removal of the userspace struct pthread_mutex m_owner field.
4. The sysctl kern.ipc.umtx_vnode_persistent is added, which controls
   the lifetime of the shared mutex associated with a vnode' page.

Reviewed by:	jilles (previous version, supposedly the objection was fixed)
Discussed with:	brooks, Martin Simmons <martin@lispworks.com> (some aspects)
Tested by:	pho
Sponsored by:	The FreeBSD Foundation
2016-05-17 09:56:22 +00:00
Konstantin Belousov
2cfddaa6ff Fix umtx lock/trylock for compat32.
Sponsored by:	The FreeBSD Foundation
MFC after:	3 days
2016-04-19 11:37:43 +00:00
Konstantin Belousov
1bdbd70599 Implement process-shared locks support for libthr.so.3, without
breaking the ABI.  Special value is stored in the lock pointer to
indicate shared lock, and offline page in the shared memory is
allocated to store the actual lock.

Reviewed by:	vangyzen (previous version)
Discussed with:	deischen, emaste, jhb, rwatson,
	Martin Simmons <martin@lispworks.com>
Tested by:	pho
Sponsored by:	The FreeBSD Foundation
2016-02-28 17:52:33 +00:00
Konstantin Belousov
50657fd342 Minor (and incomplete) style cleanup.
Sponsored by:	The FreeBSD Foundation
MFC after:	1 week
2015-10-30 20:47:42 +00:00
Konstantin Belousov
2936e0013c Also mark compat32 umtx op table as constant.
Sponsored by:	The FreeBSD Foundation
MFC after:	1 week
2015-10-30 19:32:30 +00:00
Konstantin Belousov
c539e87014 Use C99 array initialization, which also makes the code
self-documented, and eases addition of new ops.

For the similar reasons, eliminate UMTX_OP_MAX.  nitems() handles the
only use of the symbol.

Sponsored by:	The FreeBSD Foundation
MFC after:	1 week
2015-10-30 19:20:40 +00:00
Ed Schouten
dc4b532479 Fix bad arithmetic in umtx_key_get() to compute object offset.
It looks like umtx_key_get() has the addition and subtraction the wrong
way around, meaning that it fails to match in certain cases. This causes
the cloudlibc unit tests to deadlock in certain cases.

Reviewed by:	kib
Differential Revision:	https://reviews.freebsd.org/D3287
2015-08-04 06:01:13 +00:00
Ed Schouten
52942c1eae Add missing const keyword to function parameter.
The umtx_key_get() function does not dereference the address off the
userspace object. The pointer can safely be const.
2015-08-03 21:11:33 +00:00
Eric van Gyzen
e858b027db Clean up some cosmetic nits in kern_umtx.c, found during recent work
in this area and by the Clang static analyzer.

Remove some dead assignments.

Fix a typo in a panic string.

Use umtx_pi_disown() instead of duplicate code.

Use an existing variable instead of curthread.

Approved by:	kib (mentor)
MFC after:	3 days
Sponsored by:	Dell Inc
2015-03-28 21:21:40 +00:00
Konstantin Belousov
13dad10871 The umtx_lock mutex is used by top-half of the kernel, but is
currently a spin lock.  Apparently, the only reason for this is that
umtx_thread_exit() is called under the process spinlock, which put the
requirement on the umtx_lock.  Note that the witness static order list
is wrong for the umtx_lock, umtx_lock is explicitely before any thread
lock, so it is also before sleepq locks.

Change umtx_lock to be the sleepable mutex.  For the reason above, the
calls to umtx_thread_exit() are moved from thread_exit() earlier in
each caller, when the process spin lock is not yet taken.

Discussed with:	jhb
Tested by:	pho (previous version)
Sponsored by:	The FreeBSD Foundation
MFC after:	3 weeks
2015-02-28 04:19:02 +00:00
Konstantin Belousov
84b736b268 When failing to claim ownership of a umtx_pi, restore the umutex owner
to its previous, unowned state.  This avoids compounding an existing
problem of inconsistent ownership.

Submitted by:	Eric van Gyzen <eric_van_gyzen@dell.com>
Obtained from:	Dell Inc.
PR:	198914
MFC after:	1 week
2015-02-25 16:17:16 +00:00
Konstantin Belousov
cc876d2c5c When unlocking a contested PI pthread mutex, if the queue of waiters
is empty, look up the umtx_pi and disown it if the current thread owns it.
This can happen if a signal or timeout removed the last waiter from
the queue, but there is still a thread in do_lock_pi() holding a reference
on the umtx_pi.  The unlocking thread might not own the umtx_pi in this case,
but if it does, it must disown it to keep the ownership consistent between
the umtx_pi and the umutex.

Submitted by:	Eric van Gyzen <eric_van_gyzen@dell.com>
	with advice from: Elliott Rabe and Jim Muchow, also at Dell Inc.
Obtained from:	Dell Inc.
PR:	198914
2015-02-25 16:12:56 +00:00
Konstantin Belousov
6690381ef1 The dependency chain for priority-inheritance mutexes could be
subverted by userspace into cycle.  Both umtx_propagate_priority() and
umtx_repropagate_priority() would then loop infinitely, owning the
spinlock.

Check for the cycle using standard Floyd' algorithm before doing the
pass in the affected functions.  Add simple check for condition of
tricking the thread into a wait for itself, which could be easily
simulated by usermode without race.

Found by:	Eric van Gyzen <eric@vangyzen.net>
In collaboration with:	Eric van Gyzen <eric@vangyzen.net>
Tested by:	pho
MFC after:	1 week
2015-01-31 12:27:40 +00:00
Konstantin Belousov
416be7a1c6 Fix assertion, &uc->uc_busy is never zero, the intent is to test the
uc_busy value, and not its address [1].

Remove the single use of the macro, write KASSERT() explicitely in the
code of umtxq_sleep_pi().

Submitted by:	Eric van Gyzen <eric@vangyzen.net> [1]
MFC after:	1 week
2014-11-13 18:51:09 +00:00
Konstantin Belousov
2361c6d135 Add type qualifier volatile to the base (userspace) address argument
of fuword(9) and suword(9).  This makes the functions type-compatible
with volatile objects and does not require devolatile force, e.g. in
kern_umtx.c.

Requested by:	bde
Reviewed by:	jhb
Sponsored by:	The FreeBSD Foundation
MFC after:	3 weeks
2014-10-31 17:43:21 +00:00
Konstantin Belousov
f7e91c288a Convert kern_umtx.c to use fueword() and casueword().
Also fix some mishandling of suword(9) errors as errno, which resulted
in spurious ERESTART.

Sponsored by:	The FreeBSD Foundation
Tested by:	pho
MFC after:	3 weeks
2014-10-28 15:30:33 +00:00
John Baldwin
1bc9ea1caa Use correct type in __DEVOLATILE(). 2014-10-25 20:42:47 +00:00
Xin LI
6362e06b42 Fix build. 2014-10-25 00:16:36 +00:00
John Baldwin
53e1ffbbce The current POSIX semaphore implementation stores the _has_waiters flag
in a separate word from the _count.  This does not permit both items to
be updated atomically in a portable manner.  As a result, sem_post()
must always perform a system call to safely clear _has_waiters.

This change removes the _has_waiters field and instead uses the high bit
of _count as the _has_waiters flag.  A new umtx object type (_usem2) and
two new umtx operations are added (SEM_WAIT2 and SEM_WAKE2) to implement
these semantics.  The older operations are still supported under the
COMPAT_FREEBSD9/10 options.  The POSIX semaphore API in libc has
been updated to use the new implementation.  Note that the new
implementation is not compatible with the previous implementation.
However, this only affects static binaries (which cannot be helped by
symbol versioning).  Binaries using a dynamic libc will continue to work
fine.  SEM_MAGIC has been bumped so that mismatched binaries will error
rather than corrupting a shared semaphore.  In addition, a padding field
has been added to sem_t so that it remains the same size.

Differential Revision:	https://reviews.freebsd.org/D961
Reported by:	adrian
Reviewed by:	kib, jilles (earlier version)
Sponsored by:	Norse
2014-10-24 20:02:44 +00:00
Konstantin Belousov
fbb6eca60f In do_lock_pi(), do not override error from umtxq_sleep_pi() when
doing suspend check.  This restores the pre-r251684 behaviour, to
retry once after the signal is detected.

PR:	kern/192918
Submitted by:	Elliott Rabe, Dell Inc., Eric van Gyzen <eric@vangyzen.net>
Obtained from:	Dell Inc.
MFC after:	1 week
2014-08-22 18:42:14 +00:00
Attilio Rao
3198603edd Fix comments.
Sponsored by:	EMC / Isilon Storage Division
2014-03-19 12:45:40 +00:00
Attilio Rao
ce42e79310 Remove dead code from umtx support:
- Retire long time unused (basically always unused) sys__umtx_lock()
  and sys__umtx_unlock() syscalls
- struct umtx and their supporting definitions
- UMUTEX_ERROR_CHECK flag
- Retire UMTX_OP_LOCK/UMTX_OP_UNLOCK from _umtx_op() syscall

__FreeBSD_version is not bumped yet because it is expected that further
breakages to the umtx interface will follow up in the next days.
However there will be a final bump when necessary.

Sponsored by:	EMC / Isilon storage division
Reviewed by:	jhb
2014-03-18 21:32:03 +00:00
Konstantin Belousov
1d7466bca4 Fix two issues with the spin loops in the umtx(2) implementation.
- When looping, check for the pending suspension.  Otherwise, other
  usermode thread which races with the looping one, could try to
  prevent the process from stopping or exiting.

- Add missed checks for the faults from casuword*().  The code is
  structured in a way which makes the loops exit if the specified
  address is invalid, since both fuword() and casuword() return -1 on
  the fault.  But if the address is mapped readonly, the typical value
  read by fuword() is different from -1, while casuword() returns -1.
  Absent the checks for casuword() faults, this is interpreted as the
  race with other thread and causes non-interruptible spinning in the
  kernel.

Reported and tested by:	pho
Sponsored by:	The FreeBSD Foundation
MFC after:	2 weeks
2013-06-13 09:33:22 +00:00
Jilles Tjoelker
1e367efa8b sem: Restart the POSIX sem_* calls after signals with SA_RESTART set.
Programs often do not expect an [EINTR] return from sem_wait() and POSIX
only allows it if the signal was installed without SA_RESTART. The timeout
in sem_timedwait() is absolute so it can be restarted normally.

The umtx call can be invoked with a relative timeout and in that case
[ERESTART] must be changed to [EINTR]. However, libc does not do this.

The old POSIX semaphore implementation did this correctly (before r249566),
unlike the new umtx one.

It may be desirable to avoid [EINTR] completely, which matches the pthread
functions and is explicitly permitted by POSIX. However, the kernel must
return [EINTR] at least for signals with SA_RESTART clear, otherwise pthread
cancellation will not abort a semaphore wait. In this commit, only restore
the 8.x behaviour which is also permitted by POSIX.

Discussed with:	jhb
MFC after:	1 week
2013-04-19 10:16:00 +00:00
Attilio Rao
d52d7aa871 Fix a bug in UMTX_PROFILING:
UMTX_PROFILING should really analyze the distribution of locks as they
index entries in the umtxq_chains hash-table.
However, the current implementation does add/dec the length counters
for *every* thread insert/removal, measuring at all really userland
contention and not the hash distribution.

Fix this by correctly add/dec the length counters in the points where
it is really needed.

Please note that this bug brought us questioning in the past the quality
of the umtx hash table distribution.
To date with all the benchmarks I could try I was not able to reproduce
any issue about the hash distribution on umtx.

Sponsored by:	EMC / Isilon storage division
Reviewed by:	jeff, davide
MFC after:	2 weeks
2013-03-21 19:58:25 +00:00
Attilio Rao
1fc8c346d5 Improve UMTX_PROFILING:
- Use u_int values for length and max_length values
- Add a way to reset the max_length heuristic in order to have the
  possibility to reuse the mechanism consecutively without rebooting
  the machine
- Add a way to quick display top5 contented buckets in the system for
  the max_length value.
  This should give a quick overview on the quality of the hash table
  distribution.

Sponsored by:	EMC / Isilon storage division
Reviewed by:	jeff, davide
2013-03-09 15:31:19 +00:00
Davide Italiano
ba4be2110a The fields of struct timespec32 should be int32_t and not uint32_t.
Make this change.

Reviewed by:	bde, davidxu
Tested by:	pho
MFC after:	1 week
2012-10-27 23:42:41 +00:00
David Xu
d7f97db7bd Some style fixes inspired by @bde. 2012-08-11 23:48:39 +00:00
David Xu
e8afbca2bc tvtohz will print out an error message if a negative value is given
to it, avoid this problem by detecting timeout earlier.

Reported by: pho
2012-08-11 00:06:56 +00:00
Davide Italiano
331805a5d3 Fix some style bugs introduced in a previous commit (r233045)
Reported by:	glebius, jmallet
Reviewed by:	jmallet
Approved by:	gnn (mentor)
MFC after:	2 days
2012-04-14 23:53:31 +00:00
David Xu
8931e524bf In sem_post, the field _has_waiters is no longer used, because some
application destroys semaphore after sem_wait returns. Just enter
kernel to wake up sleeping threads, only update _has_waiters if
it is safe. While here, check if the value exceed SEM_VALUE_MAX and
return EOVERFLOW if this is true.
2012-04-05 03:05:02 +00:00
David Xu
17ce606321 umtx operation UMTX_OP_MUTEX_WAKE has a side-effect that it accesses
a mutex after a thread has unlocked it, it event writes data to the mutex
memory to clear contention bit, there is a race that other threads
can lock it and unlock it, then destroy it, so it should not write
data to the mutex memory if there isn't any waiter.
The new operation UMTX_OP_MUTEX_WAKE2 try to fix the problem. It
requires thread library to clear the lock word entirely, then
call the WAKE2 operation to check if there is any waiter in kernel,
and try to wake up a thread, if necessary, the contention bit is set again
by the operation. This also mitgates the chance that other threads find
the contention bit and try to enter kernel to compete with each other
to wake up sleeping thread, this is unnecessary. With this change, the
mutex owner is no longer holding the mutex until it reaches a point
where kernel umtx queue is locked, it releases the mutex as soon as
possible.
Performance is improved when the mutex is contensted heavily.  On Intel
i3-2310M, the runtime of a benchmark program is reduced from 26.87 seconds
to 2.39 seconds, it even is better than UMTX_OP_MUTEX_WAKE which is
deprecated now. http://people.freebsd.org/~davidxu/bench/mutex_perf.c
2012-04-05 02:24:08 +00:00
David Xu
8b1eafa723 Remove stale comments. 2012-03-31 06:48:41 +00:00
David Xu
b29d7d9b60 Remove trailing semicolon, it is a typo. 2012-03-30 12:57:14 +00:00
David Xu
0cf573e989 Fix COMPAT_FREEBSD32 build.
Submitted by: Andreas Tobler < andreast at fgznet dot ch >
2012-03-30 09:03:53 +00:00
David Xu
4ed8858df0 Remove trailing space. 2012-03-30 05:49:32 +00:00
David Xu
e05171d939 Merge umtxq_sleep and umtxq_nanosleep into a single function by using
an abs_timeout structure which describes timeout info.
2012-03-30 05:40:26 +00:00
David Xu
d31f470d15 Reduce code size by creating common timed sleeping function. 2012-03-29 02:46:43 +00:00
Davide Italiano
c6111de55d Add rudimentary profiling of the hash table used in the in the umtx code to
hold active lock queues.

Reviewed by:	attilio
Approved by:	davidxu, gnn (mentor)
MFC after:	3 weeks
2012-03-16 20:32:11 +00:00
David Xu
c9b01ed581 initialize clock ID and flags only when copying timespec, a _umtx_time
copy already contains these fields.
2012-02-29 02:01:48 +00:00
David Xu
24c209494a Follow changes made in revision 232144, pass absolute timeout to kernel,
this eliminates a clock_gettime() syscall.
2012-02-27 13:38:52 +00:00
David Xu
df1f1bae9e In revision 231989, we pass a 16-bit clock ID into kernel, however
according to POSIX document, the clock ID may be dynamically allocated,
it unlikely will be in 64K forever. To make it future compatible, we
pack all timeout information into a new structure called _umtx_time, and
use fourth argument as a size indication, a zero means it is old code
using timespec as timeout value, but the new structure also includes flags
and a clock ID, so the size argument is different than before, and it is
non-zero. With this change, it is possible that a thread can sleep
on any supported clock, though current kernel code does not have such a
POSIX clock driver system.
2012-02-25 02:12:17 +00:00
David Xu
f911d9fa4d Fix typo. 2012-02-22 07:34:23 +00:00
David Xu
b13a8fa78f Use unused fourth argument of umtx_op to pass flags to kernel for operation
UMTX_OP_WAIT. Upper 16bits is enough to hold a clock id, and lower
16bits is used to pass flags. The change saves a clock_gettime() syscall
from libthr.
2012-02-22 03:22:49 +00:00
David Xu
29a06690ca Eliminate branch and insert an explicit reader memory barrier to ensure
that waiter bit is set before reading semaphore count.
2012-01-16 04:39:10 +00:00
Peter Holm
662ebe9b53 Add umtx_copyin_timeout() and move parameter checks here.
In collaboration with:	kib
MFC after:	1 week
2011-12-03 12:30:58 +00:00
Peter Holm
ff77dfb0c1 Rename copyin_timeout32 to umtx_copyin_timeout32 and move parameter
check here. Include check for negative seconds value.

In collaboration with:	kib
MFC after:	1 week
2011-12-03 12:28:33 +00:00
Ed Schouten
6472ac3d8a Mark all SYSCTL_NODEs static that have no corresponding SYSCTL_DECLs.
The SYSCTL_NODE macro defines a list that stores all child-elements of
that node. If there's no SYSCTL_DECL macro anywhere else, there's no
reason why it shouldn't be static.
2011-11-07 15:43:11 +00:00
Kip Macy
8451d0dd78 In order to maximize the re-usability of kernel code in user space this
patch modifies makesyscalls.sh to prefix all of the non-compatibility
calls (e.g. not linux_, freebsd32_) with sys_ and updates the kernel
entry points and all places in the code that use them. It also
fixes an additional name space collision between the kernel function
psignal and the libc function of the same name by renaming the kernel
psignal kern_psignal(). By introducing this change now we will ease future
MFCs that change syscalls.

Reviewed by:	rwatson
Approved by:	re (bz)
2011-09-16 13:58:51 +00:00
John Baldwin
3379ac59af Expose the umtx_key structure and API to the rest of the kernel.
MFC after:	3 days
2011-02-23 13:19:14 +00:00
David Xu
c8e368a933 - Follow r216313, the sched_unlend_user_prio is no longer needed, always
use sched_lend_user_prio to set lent priority.
- Improve pthread priority-inherit mutex, when a contender's priority is
  lowered, repropagete priorities, this may cause mutex owner's priority
  to be lowerd, in old code, mutex owner's priority is rise-only.
2010-12-29 09:26:46 +00:00
David Xu
1c45127bd3 Enlarge hash table for new condition variable. 2010-12-23 03:12:03 +00:00
David Xu
d1078b0b03 MFp4:
- Add flags CVWAIT_ABSTIME and CVWAIT_CLOCKID for umtx kernel based
  condition variable, this should eliminate an extra system call to get
  current time.

- Add sub-function UMTX_OP_NWAKE_PRIVATE to wake up N channels in single
  system call. Create userland sleep queue for condition variable, in most
  cases, thread will wait in the queue, the pthread_cond_signal will defer
  thread wakeup until the mutex is unlocked, it tries to avoid an extra
  system call and a extra context switch in time window of pthread_cond_signal
  and pthread_mutex_unlock.

The changes are part of process-shared mutex project.
2010-12-22 05:01:52 +00:00
Matthew D Fleming
e0f389c8d3 One of the compat32 functions was copying in a raw timespec, instead of
a 32-bit one.  This can cause weird timeout issues, as the copying reads
garbage from the user.

Code by:     Deepak Veliath <deepak dot veliath at isilon dot com>
MFC after:   1 week
2010-12-15 19:30:44 +00:00
David Xu
acbe332a58 MFp4:
It is possible a lower priority thread lending priority to higher priority
thread, in old code, it is ignored, however the lending should always be
recorded, add field td_lend_user_pri to fix the problem, if a thread does
not have borrowed priority, its value is PRI_MAX.

MFC after: 1 week
2010-12-09 02:42:02 +00:00
David Xu
b169d0efa1 Use atomic instruction to set _has_writer, otherwise there is a race
causes userland to not wake up a thread sleeping in kernel.

MFC after: 3 days
2010-11-22 02:42:02 +00:00
David Xu
32c63db519 Only unlock process if a thread is found. 2010-11-15 07:33:54 +00:00
David Xu
cf7d9a8ca8 Create a global thread hash table to speed up thread lookup, use
rwlock to protect the table. In old code, thread lookup is done with
process lock held, to find a thread, kernel has to iterate through
process and thread list, this is quite inefficient.
With this change, test shows in extreme case performance is
dramatically improved.

Earlier patch was reviewed by: jhb, julian
2010-10-09 02:50:23 +00:00
David Xu
df7442533c If a thread is removed from umtxq while sleeping, reset error code
to zero, this gives userland a better indication that a thread needn't
to be cancelled.
2010-08-25 03:14:32 +00:00
Ed Schouten
60ae52f785 Use ISO C99 integer types in sys/kern where possible.
There are only about 100 occurences of the BSD-specific u_int*_t
datatypes in sys/kern. The ISO C99 integer types are used here more
often.
2010-06-21 09:55:56 +00:00
Nathan Whitehorn
841c0c7ec7 Provide groundwork for 32-bit binary compatibility on non-x86 platforms,
for upcoming 64-bit PowerPC and MIPS support. This renames the COMPAT_IA32
option to COMPAT_FREEBSD32, removes some IA32-specific code from MI parts
of the kernel and enhances the freebsd32 compatibility code to support
big-endian platforms.

Reviewed by:	kib, jhb
2010-03-11 14:49:06 +00:00
David Xu
8251549f27 In function umtxq_insert_queue, use parameter q (shared/exclusive queue)
instead of hard coded constant. This does not affect RELENG_8 and previous,
because the code only exists in the HEAD.
2010-02-10 05:47:34 +00:00
David Xu
93f0162799 Set waiters flag before checking semaphore's counter,
otherwise we might lose a wakeup. Tested on postgresql database server.
2010-02-08 07:31:05 +00:00
David Xu
43271eacad Fix comments in do_sem_wait(). 2010-02-03 07:21:20 +00:00
David Xu
676e6574a1 After busied the lock, re-read state word before checking waiters flag,
otherwise, the waiters bit may not be set and a wakeup is lost.

Submitted by:	justin.teller at gmail dot com
MFC after:	3 days
2010-02-03 03:56:32 +00:00
David Xu
a4b0b4b062 Make a chain be a list of queues, and make threads waiting
for same key coalesce to same queue, this makes searching
path shorter and improves performance.
Also fix comments about shared PI-mutex.
2010-01-10 09:31:57 +00:00
David Xu
73532aa78c Use enum to define key types.
Suggested by:	jmallett
2010-01-09 06:30:40 +00:00
David Xu
4904f91fe0 put semaphore waiter in long term list. 2010-01-09 06:12:44 +00:00
David Xu
2c3b3fef36 Add key type TYPE_SEM. 2010-01-09 06:05:31 +00:00
David Xu
79f8b61995 Add user-level semaphore synchronous type, this change allows multiple
processes to share semaphore by using shared memory area, in simplest case,
only one atomic operation is needed in userland, waiter flag is maintained by
kernel and userland only checks the flag, if the flag is set, user code enters
kernel and does a wakeup() call.
Move type definitions into file _umtx.h to minimize compiling time.
Also type names need to be prefixed with underline character, this would reduce
name conflict (still in progress).
2010-01-04 05:27:49 +00:00
David Xu
b101b127f3 In function do_rw_wrlock, when a writer got an error and before returning,
check if there are readers blocked by us via URWLOCK_WRITE_WAITERS flag,
and resume the readers. The error must be EAGAIN, otherwise there must
have memory problem, and nobody can rescue the buggy application.

The revision 197445 might be reverted.
2009-09-25 00:03:13 +00:00
David Xu
945488297b Make UMTX_OP_WAIT_UINT actually wait for an unsigned integer on 64-bits
machine.

MFC after: 1 week
2009-04-13 05:21:17 +00:00
David Xu
326bf9493d 1) Check NULL pointer before calling umtx_pi_adjust_locked(), this avoids
a PANIC.
2) Rework locking for POSIX priority-mutex, this fixes a
   race where a thread may wait there forever even if the mutex is unlocked.
2009-03-13 06:06:20 +00:00
David Xu
7de1ecef2d Add two commands to _umtx_op system call to allow a simple mutex to be
locked and unlocked completely in userland. by locking and unlocking mutex
in userland, it reduces the total time a mutex is locked by a thread,
in some application code, a mutex only protects a small piece of code, the
code's execution time is less than a simple system call, if a lock contention
happens, however in current implemenation, the lock holder has to extend its
locking time and enter kernel to unlock it, the change avoids this disadvantage,
it first sets mutex to free state and then enters kernel and wake one waiter
up. This improves performance dramatically in some sysbench mutex tests.

Tested by: kris
Sounds great: jeff
2008-06-24 07:32:12 +00:00
David Xu
6e24e61797 Use a seperated hash table for mutex and rwlock, avoid wasting some time
on walking through idle threads sleeping on condition variables.
2008-05-30 02:18:54 +00:00
David Xu
727158f6f6 Introduce command UMTX_OP_WAIT_UINT_PRIVATE and UMTX_OP_WAKE_PRIVATE
to allow userland to specify that an address is not shared by multiple
processes.
2008-04-29 03:48:48 +00:00
David Xu
44253336b6 let umtxq_busy() only spin on mp machine. make function name
do_rwlock_unlock to be consistent with others.
2008-04-03 11:49:20 +00:00
David Xu
fadd84c58f Fix compiling problem for amd64. 2008-04-02 05:54:41 +00:00
David Xu
11b1023b7d Er, don't restart a timeout version. 2008-04-02 04:26:59 +00:00
David Xu
1a30511c61 Introduce kernel based userland rwlock. Each umtx chain now has two lists,
one for readers and one for writers, other types of synchronization
object just use first list.

Asked by: jeff
2008-04-02 04:08:37 +00:00
David Xu
7fab871d8c Check NULL pointer. 2007-12-17 08:09:37 +00:00
David Xu
9514dcc041 Add missing changes for fixing LOR of umtx lock and thread lock, follow
the committing of files:
	kern_resource.c revision 1.181
	sched_4bsd.c	revision 1.111
	sched_ule.c	revision 1.218
2007-12-17 05:55:07 +00:00
David Xu
110de0cf17 Add function UMTX_OP_WAIT_UINT, the function causes thread to wait for
an integer to be changed.
2007-11-21 04:21:02 +00:00
David Xu
42ce445fed Backout experimental adaptive-spin umtx code. 2007-06-06 07:35:08 +00:00
Jeff Roberson
3c2e44364e Commit 8/14 of sched_lock decomposition.
- Use a global umtx spinlock to protect the sleep queues now that there
   is no global scheduler lock.
 - Use thread_lock() to protect thread state.

Tested by:      kris, current@
Tested on:      i386, amd64, ULE, 4BSD, libthr, libkse, PREEMPTION, etc.
Discussed with: kris, attilio, kmacy, jhb, julian, bde (small parts each)
2007-06-04 23:54:50 +00:00
Robert Watson
873fbcd776 Further system call comment cleanup:
- Remove also "MP SAFE" after prior "MPSAFE" pass. (suggested by bde)
- Remove extra blank lines in some cases.
- Add extra blank lines in some cases.
- Remove no-op comments consisting solely of the function name, the word
  "syscall", or the system call name.
- Add punctuation.
- Re-wrap some comments.
2007-03-05 13:10:58 +00:00
David Xu
4e32b7b3cc Add a lwpid field into per-cpu structure, the lwpid represents current
running thread's id on each cpu. This allow us to add in-kernel adaptive
spin for user level mutex. While spinning in user space is possible,
without correct thread running state exported from kernel, it hardly
can be implemented efficiently without wasting cpu cycles, however
exporting thread running state unlikely will be implemented soon as
it has to design and stablize interfaces. This implementation is
transparent to user space, it can be disabled dynamically. With this
change, mutex ping-pong program's performance is improved massively on
SMP machine. performance of mysql super-smack select benchmark is increased
about 7% on Intel dual dual-core2 Xeon machine, it indicates on systems
which have bunch of cpus and system-call overhead is low (athlon64, opteron,
and core-2 are known to be fast), the adaptive spin does help performance.

Added sysctls:
    kern.threads.umtx_dflt_spins
        if the sysctl value is non-zero, a zero umutex.m_spincount will
        cause the sysctl value to be used a spin cycle count.
    kern.threads.umtx_max_spins
        the sysctl sets upper limit of spin cycle count.

Tested on: Athlon64 X2 3800+, Dual Xeon 5130
2006-12-20 04:40:39 +00:00
Julian Elischer
ad1e7d285a Threading cleanup.. part 2 of several.
Make part of John Birrell's KSE patch permanent..
Specifically, remove:
Any reference of the ksegrp structure. This feature was
never fully utilised and made things overly complicated.
All code in the scheduler that tried to make threaded programs
fair to unthreaded programs.  Libpthread processes will already
do this to some extent and libthr processes already disable it.

Also:
Since this makes such a big change to the scheduler(s), take the opportunity
to rename some structures and elements that had to be moved anyhow.
This makes the code a lot more readable.

The ULE scheduler compiles again but I have no idea if it works.

The 4bsd scheduler still reqires a little cleaning and some functions that now do
ALMOST nothing will go away, but I thought I'd do that as a separate commit.

Tested by David Xu, and Dan Eischen using libthr and libpthread.
2006-12-06 06:34:57 +00:00
David Xu
745fbd3a72 if a thread blocked on userland condition variable is
pthread_cancel()ed, it is expected that the thread will not
consume a pthread_cond_signal(), therefor, we use thr_wake()
to mark a flag, the flag tells a thread calling do_cv_wait()
in umtx code to not block on a condition variable.
Thread library is expected that once a thread detected itself
is in pthread_cond_wait, it will call the thr_wake() for itself
in its SIGCANCEL handler.
2006-12-04 14:15:12 +00:00
David Xu
a6abdf322d Introduce userspace condition variable, since we have already POSIX
priority mutex implemented, it is the time to introduce this stuff,
now we can use umutex and ucond together to implement pthread's
condition wait/signal.
2006-12-03 01:49:22 +00:00
Robert Watson
acd3428b7d Sweep kernel replacing suser(9) calls with priv(9) calls, assigning
specific privilege names to a broad range of privileges.  These may
require some future tweaking.

Sponsored by:           nCircle Network Security, Inc.
Obtained from:          TrustedBSD Project
Discussed on:           arch@
Reviewed (at least in part) by: mlaier, jmg, pjd, bde, ceri,
                        Alex Lyashkov <umka at sevcity dot net>,
                        Skip Ford <skip dot ford at verizon dot net>,
                        Antoine Brodin <antoine dot brodin at laposte dot net>
2006-11-06 13:42:10 +00:00
John Birrell
8460a577a4 Make KSE a kernel option, turned on by default in all GENERIC
kernel configs except sun4v (which doesn't process signals properly
with KSE).

Reviewed by:	davidxu@
2006-10-26 21:42:22 +00:00
David Xu
4c9b02c253 Optimize umtx_lock_pi() a bit by moving some heavy code out of the loop,
make a fast path when a umtx_pi can be allocated without being blocked.
2006-10-26 09:33:34 +00:00
David Xu
7c24ae418a In order to eliminate a branch, convert opcode to unsigned integer. 2006-10-25 06:38:46 +00:00
David Xu
91d0b4d615 Eliminate an unnecessary `if' statement. 2006-10-25 06:28:23 +00:00