1. fast simple type mutex.
2. __thread tls works.
3. asynchronous cancellation works ( using signal ).
4. thread synchronization is fully based on umtx, mainly, condition
variable and other synchronization objects were rewritten by using
umtx directly. those objects can be shared between processes via
shared memory, it has to change ABI which does not happen yet.
5. default stack size is increased to 1M on 32 bits platform, 2M for
64 bits platform.
As the result, some mysql super-smack benchmarks show performance is
improved massivly.
Okayed by: jeff, mtm, rwatson, scottl
pointer to the corresponding struct thread to the thread ID (lwpid_t)
assigned to that thread. The primary reason for this change is that
libthr now internally uses the same ID as the debugger and the kernel
when referencing to a kernel thread. This allows us to implement the
support for debugging without additional translations and/or mappings.
To preserve the ABI, the 1:1 threading syscalls, including the umtx
locking API have not been changed to work on a lwpid_t. Instead the
1:1 threading syscalls operate on long and the umtx locking API has
not been changed except for the contested bit. Previously this was
the least significant bit. Now it's the most significant bit. Since
the contested bit should not be tested by userland, this change is
not expected to be visible. Just to be sure, UMTX_CONTESTED has been
removed from <sys/umtx.h>.
Reviewed by: mtm@
ABI preservation tested on: i386, ia64
followed are: Only 3 functions (pthread_cancel, pthread_setcancelstate,
pthread_setcanceltype) are required to be async-signal-safe by POSIX. None of
the rest of the pthread api is required to be async-signal-safe. This means
that only the three mentioned functions are safe to use from inside
signal handlers.
However, there are certain system/libc calls that are
cancellation points that a caller may call from within a signal handler,
and since they are cancellation points calls have to be made into libthr
to test for cancellation and exit the thread if necessary. So, the
cancellation test and thread exit code paths must be async-signal-safe
as well. A summary of the changes follows:
o Almost all of the code paths that masked signals, as well as locking the
pthread structure now lock only the pthread structure.
o Signals are masked (and left that way) as soon as a thread enters
pthread_exit().
o The active and dead threads locks now explicitly require that signals
are masked.
o Access to the isdead field of the pthread structure is protected by both
the active and dead list locks for writing. Either one is sufficient for
reading.
o The thread state and type fields have been combined into one three-state
switch to make it easier to read without requiring a lock. It doesn't need
a lock for writing (and therefore for reading either) because only the
current thread can write to it and it is an integer value.
o The thread state field of the pthread structure has been eliminated. It
was an unnecessary field that mostly duplicated the flags field, but
required additional locking that would make a lot more code paths require
signal masking. Any truly unique values (such as PS_DEAD) have been
reborn as separate members of the pthread structure.
o Since the mutex and condvar pthread functions are not async-signal-safe
there is no need to muck about with the wait queues when handling
a signal ...
o ... which also removes the need for wrapping signal handlers and sigaction(2).
o The condvar and mutex async-cancellation code had to be revised as a result
of some of these changes, which resulted in semi-unrelated changes which
would have been difficult to work on as a separate commit, so they are
included as well.
The only part of the changes I am worried about is related to locking for
the pthread joining fields. But, I will take a closer look at them once this
mega-patch is committed.
o Fix mutex priority protocols. Keep separate counts of priority
inheritance and protection mutexes to make things easier.
This will not have much affect since this is only the
userland side, and the rest involves kernel scheduling.
a list in the thread structure to keep track of the locks and
how many times they have been locked. This list is checked
on every lock and unlock. The traversal through the list is
O(n). Most applications don't hold so many locks at once that
this will become a problem. However, if it does become a problem
it might be a good idea to review this once libthr is
off probation and in the optimization cycle.
This fixes:
o deadlock when a thread tries to recursively acquire a
read lock when a writer is waiting on the lock.
o a thread could previously successfully unlock a lock it did not own
o deadlock when a thread tries to acquire a write lock on
a lock it already owns for reading or writing [ this is admittedly
not required by POSIX, but is nice to have ]
waiting on a locked mutex. This involves passing a struct timespec
from the pthread mutex locking interfaces all the way down to the
function that suspends the thread until the mutex is released.
The timeout is assumed to be an absolute time (i.e. not relative to
the current time).
Also, in _thread_suspend() make the passed in timespec const.
o Remove some code duplication between _thread_init(), which is run once
to initialize libthr and the intitial thread, and pthread_create(), which
initializes newly created threads, into a new function called from both
places: init_td_common()
o Move initialization of certain parts of libthr into a separate
function. These include:
- Active threads list and it's lock
- Dead threads list and it's lock & condition variable
- Naming and insertion of the initial thread into the
active threads list.
not spinlock_t. Spinlock_t and the associated functions and macros may
require blocking signals in order for async-safe libc functions to behave
appropriately in libthr. This is undesriable for libthr internal locking.
So, this is the first step in completely separating libthr from libc's
locking primitives.
Three new macros should be used for internal libthr locking from now on:
THR_LOCK, THR_TRYLOCK, THR_UNLOCK.
and the disabling of signals. What we are really interested in is
keeping track of recursive disabling of signals. We should not
be recursively acquiring thread locks. Any such situations should
be reorganized to not require a recursive lock.
Separating the two out also allows us to block signals independent of
acquiring thread locks. This will be needed in libthr in the near future when
we put the pieces together to protect libc functions that use pthread mutexes
and low level locks.
implementation and the new improved one. We now precompute the
signal set passed to sigtimedwait, using an inverted set when
necessary for compatibility with older kernels.
exit function has invalidated the need for _spin[un]lock_pthread().
The _spin[un]lock() functions can now dereference curthread without
the danger that the ldtentry containing the pointer to the thread
has been cleared out from under them.
joined and then the joiner thread. There isn't an easy (sane?) way
to make it use the correct order without introducing races involving
the target thread and finding which (active or dead) list it is on. So,
after locking the canceled thread it will try to lock the joined thread
and if it fails release the first lock and try again from the top.
Introduce a new function, _spintrylock, which is simply a wrapper arround
umtx_trylock(), to help accomplish this.
Approved by: re/blanket libthr
list is protected by a spinlock_t, but the dead list uses a pthread_mutex
because it is necessary to synchronize other threads with the garbage
collector thread. Lock/Unlock macros are used so it's easier to make
changes to the locks in the future.
The 'dead thread list' lock is intended to replace the gc mutex.
This doesn't have any practical ramifications. It simply makes it
clearer what the purpose of the lock is. The gc will use this lock,
instead of the gc mutex, to synchronize access to the dead list with
other threads.
Modify _pthread_exit() to use these two new locks instead of GIANT_LOCK,
and also to properly lock and protect thread state changes,
especially with respect to a joining thread.
The gc thread was also re-arranged to be more organized and less nested.
_pthread_join() was also modified to use the thread list locks. However,
locking and unlocking here needs special care because a thread could find
itself in a position where it's joining an exiting thread that is
waiting on the dead list lock, which this thread (joiner) holds. If the
joiner doesn't take care to lock *and* unlock in the same order they
(the joiner and the joinee) could deadlock against each other.
Approved by: re/blanket libthr
pthread_cond_t) internaly in addition to the low-level spinlock_t. The
garbage collector mutex and condition variable are two such examples. This
might lead to critical sections nested within critical sections. Implement
a reference counting mechanism so that signals are masked only on the first
entry and unmasked on the last exit.
I'm not sure I like the idea of nested critical sections, but if
the library is going to use the pthread primitives it might be necessary.
Approved by: re/blanket libthr
that take the address of a struct pthread as their first argument.
_spin[un]lock() just become wrappers arround these two functions.
These new functions are for use in situations where curthread can't be
used. One example is _thread_retire(), where we invalidate the array index
curthread uses to get its pointer..
Approved by: re/blanket libthr
Prevent one thread from messing up another thread's saved signal
mask by saving it in struct pthread instead of leaving it as a
global variable. D'oh!
Approved by: re/blanket libthr
as curthread in the new context, so that it will be set automatically when
the thread is switched to. This fixes a race where we'd run for a little
while with curthread unset in _thread_start.
Reviewed by: jeff
_get_curthread(). This is similar to the kernel's curthread. Doing
this saves stack overhead and is more convenient to the programmer.
- Pass the pointer to the newly created thread to _thread_init().
- Remove _get_curthread_slow().
This was changed because originally we were blocking on the umtx and
allowing the kernel to do the queueing. It was decided that the
lib should queue and start the threads in the order it decides and the
umtx code would just be used like spinlocks.