well-known race condition, which elimination was the reason for the
function appearance in first place. If sigmask supplied as argument to
pselect() enables a signal, the signal might be delivered before thread
called select(2), causing lost wakeup. Reimplement pselect() in kernel,
making change of sigmask and sleep atomic.
Since signal shall be delivered to the usermode, but sigmask restored,
set TDP_OLDMASK and save old mask in td_oldsigmask. The TDP_OLDMASK
should be cleared by ast() in case signal was not gelivered during
syscall execution.
Reviewed by: davidxu
Tested by: pho
MFC after: 1 month
query umtx also if the shared waiters bit is set on a shared lock.
The writer starvation avoidance technique, infact, can lead to shared
waiters on a shared lock which can bring to a missed wakeup and thus
to a deadlock if the right bit is not checked (a notable case is the
writers counterpart to be handled through expired timeouts).
Fix that by checking for the shared waiters bit also when unlocking the
shared locks.
That bug was causing a reported MySQL deadlock.
Many thanks go to Nick Esborn and his employer DesertNet which provided
time and machines to identify and fix this issue.
PR: thread/135673
Reported by: Nick Esborn <nick at desert dot net>
Tested by: Nick Esborn <nick at desert dot net>
Reviewed by: jeff
The most notable is that it is not bumped in rwlock_rdlock_common() when
the hard path (__thr_rwlock_rdlock()) returns successfully.
This can lead to deadlocks in libthr when rwlocks recursion in read mode
happens.
Fix the interested parts by correctly handling rdlock_count.
PR: threads/136345
Reported by: rink
Tested by: rink
Reviewed by: jeff
Approved by: re (kib)
MFC: 2 weeks
by temporary pretending that the process is still multithreaded.
Current malloc lock primitives do nothing for singlethreaded process.
Reviewed by: davidxu, deischen
in order to get the symbol binding state "just so". This is to allow
locking to be activated and not run into recursion problems later.
However, one of the magic bits involves an explicit call to _umtx_op()
to force symbol resolution. It does a wakeup operation on a fake,
uninitialized (ie: random contents) umtx. Since libthr isn't active, this
is harmless. Nothing can match the random wakeup.
However, valgrind finds this and is not amused. Normally I'd just
write a suppression record for it, but the idea of passing random
args to syscalls (on purpose) just doesn't feel right.
does not use any external symbols, thus avoiding possible recursion into
rtld to resolve symbols, when called.
Reviewed by: kan, davidxu
Tested by: rink
MFC after: 1 month
by switching into single-thread mode.
libthr ignores broken use of lock bitmaps used by default rtld locking
implementation, this in turn turns lock handoff in _rtld_thread_init
into NOP. This in turn makes child processes of forked multi-threaded
programs to run with _thr_signal_block still in effect, with most
signals blocked.
Reported by: phk, kib
us working malloc in the fork child of the multithreaded process.
Although POSIX requires that only async-signal safe functions shall be
operable after fork in multithreaded process, not having malloc lower
the quality of our implementation.
Tested by: rink
Discussed with: kan, davidxu
Reviewed by: kan
MFC after: 1 month
Threading library calls _pre before the fork, allowing the rtld to
lock itself to ensure that other threads of the process are out of
dynamic linker. _post releases the locks.
This allows the rtld to have consistent state in the child. Although
child may legitimately call only async-safe functions, the call may
need plt relocation resolution, and this requires working rtld.
Reported and debugging help by: rink
Reviewed by: kan, davidxu
MFC after: 1 month (anyway, not before 7.1 is out)
where critical. Some places still use ps_pread/ps_pwrite directly,
but only need changed when byte-order comes into the picture.
Also, change th_p in td_event_msg_t from a pointer type to
psaddr_t, so that events also work when psaddr_t is widened.
This caching allows for completely lock-free allocation/deallocation in the
steady state, at the expense of likely increased memory use and
fragmentation.
Reduce the default number of arenas to 2*ncpus, since thread-specific
caching typically reduces arena contention.
Modify size class spacing to include ranges of 2^n-spaced, quantum-spaced,
cacheline-spaced, and subpage-spaced size classes. The advantages are:
fewer size classes, reduced false cacheline sharing, and reduced internal
fragmentation for allocations that are slightly over 512, 1024, etc.
Increase RUN_MAX_SMALL, in order to limit fragmentation for the
subpage-spaced size classes.
Add a size-->bin lookup table for small sizes to simplify translating sizes
to size classes. Include a hard-coded constant table that is used unless
custom size class spacing is specified at run time.
Add the ability to disable tiny size classes at compile time via
MALLOC_TINY.
- It is opt-out for now so as to give it maximum testing, but it may be
turned opt-in for stable branches depending on the consensus. You
can turn it off with WITHOUT_SSP.
- WITHOUT_SSP was previously used to disable the build of GNU libssp.
It is harmless to steal the knob as SSP symbols have been provided
by libc for a long time, GNU libssp should not have been much used.
- SSP is disabled in a few corners such as system bootstrap programs
(sys/boot), process bootstrap code (rtld, csu) and SSP symbols themselves.
- It should be safe to use -fstack-protector-all to build world, however
libc will be automatically downgraded to -fstack-protector because it
breaks rtld otherwise.
- This option is unavailable on ia64.
Enable GCC stack protection (aka Propolice) for kernel:
- It is opt-out for now so as to give it maximum testing.
- Do not compile your kernel with -fstack-protector-all, it won't work.
Submitted by: Jeremie Le Hen <jeremie@le-hen.org>
locked and unlocked completely in userland. by locking and unlocking mutex
in userland, it reduces the total time a mutex is locked by a thread,
in some application code, a mutex only protects a small piece of code, the
code's execution time is less than a simple system call, if a lock contention
happens, however in current implemenation, the lock holder has to extend its
locking time and enter kernel to unlock it, the change avoids this disadvantage,
it first sets mutex to free state and then enters kernel and wake one waiter
up. This improves performance dramatically in some sysbench mutex tests.
Tested by: kris
Sounds great: jeff
returns errno, because errno can be mucked by user's signal handler and
most of pthread api heavily depends on errno to be correct, this change
should improve stability of the thread library.