81 Commits

Author SHA1 Message Date
kib
a40c8cb313 Fix _pthread_cancel_enter() and _pthread_cancel_leave() jmptable entries.
PR:	240022
Reported by:	Andrew Gierth <andrew@tao11.riddles.org.uk>
Sponsored by:	The FreeBSD Foundation
MFC after:	3 days
2019-08-21 19:53:50 +00:00
kib
60e92f3b78 Avoid conflicts with libc symbols in libthr jump table.
In some corner cases of static linking and unexpected libraries order
on the linker command line, libc symbol might preempt the same libthr
symbol, in which case libthr jump table points back to libc causing
either infinite recursion or loop.  Handle all of such symbols by
using private libthr names for them, ensuring that the right pointers
are installed into the table.

In collaboration with: arichardson
PR:	239475
Tested by:	pho
MFC after:	2 weeks
Sponsored by:	The FreeBSD Foundation
Differential revision:	https://reviews.freebsd.org/D21088
2019-07-31 19:27:20 +00:00
kib
451878136d Add libc stub for pthread_getthreadid_np(3).
Requested by:	jbeich
PR:	238650
Sponsored by:	The FreeBSD Foundation
MFC after:	1 week
2019-06-23 10:50:26 +00:00
kib
ed4ff10bf5 Untangle jemalloc and mutexes initialization.
The need to use libc malloc(3) from some places in libthr always
caused issues.  For instance, per-thread key allocation was switched to
use plain mmap(2) to get storage, because some third party mallocs
used keys for implementation of calloc(3).

Even more important, libthr calls calloc(3) during initialization of
pthread mutexes, and jemalloc uses pthread mutexes.  Jemalloc provides
some way to both postpone the initialization, and to make
initialization to use specialized allocator, but this is very fragile
and often breaks.  See the referenced PR for another example.

Add the small malloc implementation used by rtld, to libthr. Use it in
thr_spec.c and for mutexes initialization. This avoids the issues with
mutual dependencies between malloc and libthr in principle.  The
drawback is that some more allocations are not interceptable for
alternate malloc implementations.  There should be not too much memory
use from this allocator, and the alternative, direct use of mmap(2) is
obviously worse.

PR:	235211
MFC after:	2 weeks
Sponsored by:	The FreeBSD Foundation
Differential revision:	https://reviews.freebsd.org/D18988
2019-01-29 22:46:44 +00:00
trasz
62e8aea10c Make libthr(3) use sysconf(_SC_NPROCESSORS_CONF); this shaves off
two calls to sysctl(2) from the binary startup.

Reviewed by:	kib
MFC after:	2 weeks
Sponsored by:	DARPA, AFRL
Differential Revision:	https://reviews.freebsd.org/D18046
2018-11-19 18:24:08 +00:00
pfg
9da7bdde06 spdx: initial adoption of licensing ID tags.
The Software Package Data Exchange (SPDX) group provides a specification
to make it easier for automated tools to detect and summarize well known
opensource licenses. We are gradually adopting the specification, noting
that the tags are considered only advisory and do not, in any way,
superceed or replace the license texts.

Special thanks to Wind River for providing access to "The Duke of
Highlander" tool: an older (2014) run over FreeBSD tree was useful as a
starting point.

Initially, only tag files that use BSD 4-Clause "Original" license.

RelNotes:	yes
Differential Revision:	https://reviews.freebsd.org/D13133
2017-11-18 14:26:50 +00:00
vangyzen
df2338fa68 Remove old spinlock_debug code from libc
This no longer seems useful.  Remove it.

This was prompted by a "cast discards volatile qualifier" warning
in libthr when WARNS=6.

Reviewed by:	kib
MFC after:	3 days
Sponsored by:	Dell EMC
Differential Revision:	https://reviews.freebsd.org/D10832
2017-05-20 17:32:01 +00:00
kib
9a6f37a73a Remove empty initializer for the once facility. It was not needed
since r179417.

Sponsored by:	The FreeBSD Foundation
MFC after:	1 week
2016-07-27 15:14:11 +00:00
kib
8da898f26c Add implementation of robust mutexes, hopefully close enough to the
intention of the POSIX IEEE Std 1003.1TM-2008/Cor 1-2013.

A robust mutex is guaranteed to be cleared by the system upon either
thread or process owner termination while the mutex is held.  The next
mutex locker is then notified about inconsistent mutex state and can
execute (or abandon) corrective actions.

The patch mostly consists of small changes here and there, adding
neccessary checks for the inconsistent and abandoned conditions into
existing paths.  Additionally, the thread exit handler was extended to
iterate over the userspace-maintained list of owned robust mutexes,
unlocking and marking as terminated each of them.

The list of owned robust mutexes cannot be maintained atomically
synchronous with the mutex lock state (it is possible in kernel, but
is too expensive).  Instead, for the duration of lock or unlock
operation, the current mutex is remembered in a special slot that is
also checked by the kernel at thread termination.

Kernel must be aware about the per-thread location of the heads of
robust mutex lists and the current active mutex slot.  When a thread
touches a robust mutex for the first time, a new umtx op syscall is
issued which informs about location of lists heads.

The umtx sleep queues for PP and PI mutexes are split between
non-robust and robust.

Somewhat unrelated changes in the patch:
1. Style.
2. The fix for proper tdfind() call use in umtxq_sleep_pi() for shared
   pi mutexes.
3. Removal of the userspace struct pthread_mutex m_owner field.
4. The sysctl kern.ipc.umtx_vnode_persistent is added, which controls
   the lifetime of the shared mutex associated with a vnode' page.

Reviewed by:	jilles (previous version, supposedly the objection was fixed)
Discussed with:	brooks, Martin Simmons <martin@lispworks.com> (some aspects)
Tested by:	pho
Sponsored by:	The FreeBSD Foundation
2016-05-17 09:56:22 +00:00
kib
fe92c9d1a4 Use __FBSDID() for .c files from lib/libthr/thread.
Sponsored by:	The FreeBSD Foundation
2016-04-08 11:15:26 +00:00
kib
725da9cdbc Remove unused variable. It was write-only before r297139.
Sponsored by:	The FreeBSD Foundation
MFC after:	1 week
2016-04-04 06:58:59 +00:00
kib
bdf946f966 Lock pshared_lock shared around fork, to ensure that the COW snapshot
of the pshared hash in child is consistent and can be safely used.

Reported and tested by:	"Oleg V. Nauman" <oleg@opentransfer.com>
Sponsored by:	The FreeBSD Foundation
2016-03-21 06:52:35 +00:00
kib
1e240521d2 From libthr, remove special and strange code to set up session and
control terminal, activated when running with pid 1.  It is
application duty to handle this, and unsuspecting init replacements
which are linked with libthr would be broken by this.

The pre-resolving of getpid() is restored, just in case.

Reviewed by:	jilles
Sponsored by:	The FreeBSD Foundation
MFC after:	2 weeks
2016-03-21 06:46:16 +00:00
kib
e76eb4255b Implement process-shared locks support for libthr.so.3, without
breaking the ABI.  Special value is stored in the lock pointer to
indicate shared lock, and offline page in the shared memory is
allocated to store the actual lock.

Reviewed by:	vangyzen (previous version)
Discussed with:	deischen, emaste, jhb, rwatson,
	Martin Simmons <martin@lispworks.com>
Tested by:	pho
Sponsored by:	The FreeBSD Foundation
2016-02-28 17:52:33 +00:00
jilles
507e66a915 libthr: Don't use both __sys_open() and __sys_openat(). 2015-12-20 16:33:56 +00:00
kib
8d1dfb4106 Fix known issues which blow up the process after dlopen("libthr.so")
(or loading a dso linked to libthr.so into process which was not
linked against threading library).

- Remove libthr interposers of the libc functions, including
  __error(). Instead, functions calls are indirected through the
  interposing table, similar to how pthread stubs in libc are already
  done.  Libc by default points either to syscall trampolines or to
  existing libc implementations.  On libthr load, libthr rewrites the
  pointers to the cancellable implementations already in libthr.  The
  interposition table is separate from pthreads stubs indirection
  table to not pull pthreads stubs into static binaries.

- Postpone the malloc(3) internal mutexes initialization until libthr
  is loaded.  This avoids recursion between calloc(3) and static
  pthread_mutex_t initialization.

- Reinstall signal handlers with wrapper on libthr load.  The
  _rtld_is_dlopened(3) is used to avoid useless calls to sigaction(2)
  when libthr is statically referenced from the main binary.

In the process, fix openat(2), swapcontext(2) and setcontext(2)
interposing.  The libc symbols were exported at different versions
than libthr interposers.  Export both libc and libthr versions from
libc now, with default set to the higher version from libthr.

Remove unused and disconnected swapcontext(3) userspace implementation
from libc/gen.

No objections from:	deischen
Tested by:	pho, antoine (exp-run) (previous versions)
Sponsored by:	The FreeBSD Foundation
MFC after:	1 week
2015-01-03 18:38:46 +00:00
kib
a14de80022 Switch the defaults to not split the RLIMIT_STACK-sized initial thread
stack into the stacks of the created threads.  Add knob
LIBPTHREAD_SPLITSTACK_MAIN to restore the older behaviour.

Sponsored by:	The FreeBSD Foundation
MFC after:	3 weeks
2014-09-24 12:39:12 +00:00
kib
96b731501d Add a knob LIBPTHREAD_BIGSTACK_MAIN, which instructs libthr to leave
the whole RLIMIT_STACK-sized region of the kernel-allocated stack as
the stack of main thread.

By default, the main thread stack is clamped at 2MB (4MB on 64bit
ABIs) and the rest is used for other threads stack allocation.  Since
there is no programmatic way to adjust the size of the main thread
stack, pthread_attr_setstacksize() is too late, the knob allows user
to manage the main stack size both for single-threaded and
multi-threaded processes with the rlimit.

Reported by:	"Ivan A. Kosarev" <ivan@ivan-labs.com>
Tested by:	dim
Sponsored by:	The FreeBSD Foundation
MFC after:	3 days
2014-08-13 05:53:41 +00:00
jilles
3ade1f6acb libthr: Always use the threaded rtld lock implementation.
The threaded rtld lock implementation is faster even in the single-threaded
case because it postpones signal handlers via THR_CRITICAL_ENTER and
THR_CRITICAL_LEAVE instead of calling sigprocmask(2).

As a result, exception handling becomes faster in single-threaded
applications linked with libthr.

Reviewed by:	kib
2013-01-18 23:08:40 +00:00
davidxu
f83ff5dc95 In suspend_common(), don't wait for a thread which is in creation, because
pthread_suspend_all_np() may have already suspended its parent thread.
Add locking code in pthread_suspend_all_np() to only allow one thread
to suspend other threads, this eliminates a deadlock where two or more
threads try to suspend each others.
2012-08-27 03:09:39 +00:00
davidxu
520dba8f28 MFp4:
Enqueue thread in LIFO, this can cause starvation, but it gives better
performance. Use _thr_queuefifo to control the frequency of FIFO vs LIFO,
you can use environment string LIBPTHREAD_QUEUE_FIFO to configure the
variable.
2012-05-03 09:17:31 +00:00
kan
7427a39f73 Do not set thread name to less than informative 'initial thread'. 2011-06-19 13:35:36 +00:00
davidxu
437ad27f9c MFp4:
- Add flags CVWAIT_ABSTIME and CVWAIT_CLOCKID for umtx kernel based
  condition variable, this should eliminate an extra system call to get
  current time.

- Add sub-function UMTX_OP_NWAKE_PRIVATE to wake up N channels in single
  system call. Create userland sleep queue for condition variable, in most
  cases, thread will wait in the queue, the pthread_cond_signal will defer
  thread wakeup until the mutex is unlocked, it tries to avoid an extra
  system call and a extra context switch in time window of pthread_cond_signal
  and pthread_mutex_unlock.

The changes are part of process-shared mutex project.
2010-12-22 05:01:52 +00:00
davidxu
f329bc965c In current code, statically initialized and destroyed object have
same null value, the code can not distinguish between them, to
fix the problem, now a destroyed object is assigned to a non-null
value, and it will be rejected by some pthread functions.
PTHREAD_ADAPTIVE_MUTEX_INITIALIZER_NP is changed to number 1, so that
adaptive mutex can be statically initialized correctly.
2010-09-28 04:57:56 +00:00
davidxu
74604ed9c4 To support stack unwinding for cancellation points, add -fexceptions flag
for them, two functions _pthread_cancel_enter and _pthread_cancel_leave
are added to let thread enter and leave a cancellation point, it also
makes it possible that other functions can be cancellation points in
libraries without having to be rewritten in libthr.
2010-09-25 01:57:47 +00:00
davidxu
b00fcaa22c add code to support stack unwinding when thread exits. note that only
defer-mode cancellation works, asynchrnous mode does not work because
it lacks of libuwind's support. stack unwinding is not enabled unless
LIBTHR_UNWIND_STACK is defined in Makefile.
2010-09-15 02:56:32 +00:00
davidxu
e87e922f31 Convert thread list lock from mutex to rwlock. 2010-09-13 07:03:01 +00:00
davidxu
5f00b957ae Change atfork lock from mutex to rwlock, also make mutexes used by malloc()
module private type, when private type mutex is locked/unlocked, thread
critical region is entered or leaved. These changes makes fork()
async-signal safe which required by POSIX. Note that user's atfork handler
still needs to be async-signal safe, but it is not problem of libthr, it
is user's responsiblity.
2010-09-01 03:11:21 +00:00
davidxu
4dcb50723a Add signal handler wrapper, the reason to add it becauses there are
some cases we want to improve:
  1) if a thread signal got a signal while in cancellation point,
     it is possible the TDP_WAKEUP may be eaten by signal handler
     if the handler called some interruptibly system calls.
  2) In signal handler, we want to disable cancellation.
  3) When thread holding some low level locks, it is better to
     disable signal, those code need not to worry reentrancy,
     sigprocmask system call is avoided because it is a bit expensive.
The signal handler wrapper works in this way:
  1) libthr installs its signal handler if user code invokes sigaction
     to install its handler, the user handler is recorded in internal
     array.
  2) when a signal is delivered, libthr's signal handler is invoke,
     libthr checks if thread holds some low level lock or is in critical
     region, if it is true, the signal is buffered, and all signals are
     masked, once the thread leaves critical region, correct signal
     mask is restored and buffered signal is processed.
  3) before user signal handler is invoked, cancellation is temporarily
     disabled, after user signal handler is returned, cancellation state
     is restored, and pending cancellation is rescheduled.
2010-09-01 02:18:33 +00:00
davidxu
87c8a1faf2 Use umtx to implement process sharable semaphore, to make this work,
now type sema_t is a structure which can be put in a shared memory area,
and multiple processes can operate it concurrently.
User can either use mmap(MAP_SHARED) + sem_init(pshared=1) or use sem_open()
to initialize a shared semaphore.
Named semaphore uses file system and is located in /tmp directory, and its
file name is prefixed with 'SEMD', so now it is chroot or jail friendly.
In simplist cases, both for named and un-named semaphore, userland code
does not have to enter kernel to reduce/increase semaphore's count.
The semaphore is designed to be crash-safe, it means even if an application
is crashed in the middle of operating semaphore, the semaphore state is
still safely recovered by later use, there is no waiter counter maintained
by userland code.
The main semaphore code is in libc and libthr only has some necessary stubs,
this makes it possible that a non-threaded application can use semaphore
without linking to thread library.
Old semaphore implementation is kept libc to maintain binary compatibility.
The kernel ksem API is no longer used in the new implemenation.

Discussed on: threads@
2010-01-05 02:37:59 +00:00
davidxu
c0f6b35a3a - Reduce function call overhead for uncontended case.
- Remove unused flags MUTEX_FLAGS_* and their code.
- Check validity of the timeout parameter in mutex_self_lock().
2008-05-29 07:57:33 +00:00
davidxu
6a349c6771 _vfork is not in libthr, remove the reference. 2008-04-16 03:19:11 +00:00
davidxu
d6c532fa79 Use cpuset defined in pthread_attr for newly created thread, for now,
we set scheduling parameters and cpu binding fully in userland, and
because default scheduling policy is SCHED_RR (time-sharing), we set
default sched_inherit to PTHREAD_SCHED_INHERIT, this saves a system
call.
2008-03-05 07:01:20 +00:00
davidxu
7046a9b037 implement pthread_attr_getaffinity_np and pthread_attr_setaffinity_np. 2008-03-04 03:03:24 +00:00
davidxu
97a20b1db7 Add my recent work of adaptive spin mutex code. Use two environments variable
to tune pthread mutex performance:
1. LIBPTHREAD_SPINLOOPS
	If a pthread mutex is being locked by another thread, this environment
	variable sets total number of spin loops before the current thread
	sleeps in kernel, this saves a syscall overhead if the mutex will be
	unlocked very soon (well written application code).
2. LIBPTHREAD_YIELDLOOPS
	If a pthread mutex is being locked by other threads, this environment
	variable sets total number of sched_yield() loops before the currrent
	thread sleeps in kernel. if a pthread mutex is locked, the current thread
	gives up cpu, but will not sleep in kernel, this means, current thread
	does not set contention bit in mutex, but let lock owner to run again
	if the owner is on kernel's run queue, and when lock owner unlocks the
	mutex, it does not need to enter kernel and do lots of work to resume
	mutex waiters, in some cases, this saves lots of syscall overheads for
	mutex owner.

In my practice, sometimes LIBPTHREAD_YIELDLOOPS can massively improve performance
than LIBPTHREAD_SPINLOOPS, this depends on application. These two environments
are global to all pthread mutex, there is no interface to set them for each
pthread mutex, the default values are zero, this means spinning is turned off
by default.
2007-10-30 05:57:37 +00:00
davidxu
d97c4f1e52 backout experimental adaptive spinning mutex for product use. 2007-05-09 08:39:33 +00:00
davidxu
d305995d66 get LIBPTHREAD_ADAPTIVE_SPIN early, so it can be used for some global
mutexes.
2006-12-20 05:05:44 +00:00
davidxu
e034ab54f2 Check environment variable PTHREAD_ADAPTIVE_SPIN, if it is set, use
it as a default spin cycle count.
2006-12-20 04:43:34 +00:00
davidxu
ca27871833 - Remove variable _thr_scope_system, all threads are system scope.
- Rename _thr_smp_cpus to boolean variable _thr_is_smp.
- Define CPU_SPINWAIT macro for each arch, only X86 supports it.
2006-12-15 11:52:01 +00:00
davidxu
add205129c Eliminate atomic operations in thread cancellation functions, it should
reduce overheads of cancellation points.
2006-11-24 09:57:38 +00:00
davidxu
d2c57b7fad use rtprio_thread system call to get or set thread priority. 2006-09-21 04:21:30 +00:00
davidxu
21e4536026 Replace internal usage of struct umtx with umutex which can supports
real-time if we want, no functionality is changed.
2006-09-06 04:04:10 +00:00
davidxu
58fc7458af Use umutex APIs to implement pthread_mutex, member pp_mutexq is added
into pthread structure to keep track of locked PTHREAD_PRIO_PROTECT mutex,
no real mutex code is changed, the mutex locking and unlocking code should
has same performance as before.
2006-08-28 04:52:50 +00:00
davidxu
cfa46376c7 Get number of CPUs and ignore spin count on single processor machine. 2006-08-08 04:42:41 +00:00
davidxu
5193e44c40 1. Don't override underscore version of aio_suspend(), system(),
wait(), waitpid() and usleep(), they are internal versions and
   should not be cancellation points.
2. Make wait3() as a cancellation point.
3. Move raise() and pause() into file thr_sig.c.
4. Add functions _sigsuspend, _sigwait, _sigtimedwait and _sigwaitinfo,
   remove SIGCANCEL bit in wait-set for those functions, the signal is
   used internally to implement thread cancellation.
2006-07-25 12:50:05 +00:00
davidxu
2b1dbc0acb Caching scheduling policy and priority in userland, a critical but baddly
written application is frequently changing thread priority for SCHED_OTHER
policy.
2006-07-13 22:45:19 +00:00
davidxu
ecacf536b0 Use kernel facilities to support real-time scheduling. 2006-07-12 06:13:18 +00:00
davidxu
66d0fee031 - Use same priority range returned by kernel's sched_get_priority_min()
and sched_get_priority_max() syscalls.
- Remove unused fields from structure pthread_attr.
2006-04-27 08:18:23 +00:00
davidxu
31f2b819c6 WARNS level 4 cleanup. 2006-04-04 02:57:49 +00:00
davidxu
255936645e Remove priority mutex code because it does not work correctly,
to make it work, turnstile like mechanism to support priority
propagating and other realtime scheduling options in kernel
should be available to userland mutex, for the moment, I just
want to make libthr be simple and efficient thread library.

Discussed with: deischen, julian
2006-03-27 23:50:21 +00:00