libthr(3): explain some internals of the locks implementation

Describe internal allocations, mention problems with the use of global
malloc(3), and give the reasons why the internal allocator exists.

Document the shared objects implementation and describe the shortcomings
of the chosen approach, as well as the rationale for it.

Reviewed by:	markj
Discussed with:	jilles
Sponsored by:	The FreeBSD Foundation
MFC after:	3 days
Differential revision:	https://reviews.freebsd.org/D32243
Konstantin Belousov 2021-10-01 04:17:02 +03:00
parent 6bda192013
commit f5b9747075


@@ -1,5 +1,5 @@
.\" Copyright (c) 2005 Robert N. M. Watson
.\" Copyright (c) 2014,2015 The FreeBSD Foundation, Inc.
.\" Copyright (c) 2014,2015,2021 The FreeBSD Foundation, Inc.
.\" All rights reserved.
.\"
.\" Part of this documentation was written by
@@ -29,7 +29,7 @@
.\"
.\" $FreeBSD$
.\"
.Dd May 5, 2020
.Dd October 1, 2021
.Dt LIBTHR 3
.Os
.Sh NAME
@@ -259,6 +259,65 @@ the critical section.
This should be taken into account when interpreting
.Xr ktrace 1
logs.
.Sh PROCESS-SHARED SYNCHRONIZATION OBJECTS
In the
.Li libthr
implementation,
user-visible types for all synchronization objects (e.g. pthread_mutex_t)
are pointers to internal structures, allocated either by the corresponding
.Fn pthread_<objtype>_init
method call, or implicitly on first use when a static initializer
was specified.
The initial implementation of process-private locking objects used this
model with internal allocation, and the addition of process-shared objects
was done in a way that did not break the application binary interface.
.Pp
For process-private objects, the internal structure is allocated using
either
.Xr malloc 3
or, for
.Xr pthread_mutex_init 3 ,
an internal memory allocator implemented in
.Nm .
The internal allocator for mutexes is used to avoid bootstrap issues
with many
.Xr malloc 3
implementations which need working mutexes to function.
The same allocator is used for thread-specific data, see
.Xr pthread_setspecific 3 ,
for the same reason.
.Pp
For process-shared objects, the internal structure is created by first
allocating a shared memory segment using
.Xr _umtx_op 2
operation
.Dv UMTX_OP_SHM ,
and then mapping it into the process address space with
.Xr mmap 2
with the
.Dv MAP_SHARED
flag.
The POSIX standard requires that:
.Bd -literal
only the process-shared synchronization object itself can be used for
performing synchronization. It need not be referenced at the address
used to initialize it (that is, another mapping of the same object can
be used).
.Ed
.Pp
With the
.Fx
implementation, process-shared objects require initialization
in each process that uses them.
In particular, if you map the shared memory containing the user portion of
a process-shared object already initialized in a different process, locking
functions do not work on it.
.Pp
Another broken case is a forked child creating the object in memory shared
with the parent; such an object cannot be used from the parent.
Note that processes should not use non-async-signal-safe functions after
.Xr fork 2
anyway.
.Sh SEE ALSO
.Xr ktrace 1 ,
.Xr ld-elf.so.1 1 ,