This allows for rtld to not issue two sigprocmask(2) syscalls for each
symbol binding operation in single-threaded processes. Rtld needs to
block signals as part of locking to ensure signal safety of the bind
process, because signal handlers might need to lazily resolve symbol
references.
As result, number of syscalls issued on startup by simple programs not
using libthr, is typically reduced 2x. For instance, for hello world,
I see:
non-sigfastblock
# (truss ./hello > /dev/null) |& wc -l
63
sigfastblock
# (truss ./hello > /dev/null) |& wc -l
37
Tested by: pho
Disscussed with: cem, emaste, jilles
Sponsored by: The FreeBSD Foundation
Differential revision: https://reviews.freebsd.org/D12773
Currently RTLD is linked against libc_nossp_pic which means that any libc
symbol used in rtld can pull in a lot of depedencies. This was causing
symbol such as __libc_interposing and all the pthread stubs to be included
in RTLD even though they are not required. It turns out most of these
dependencies can easily be avoided by providing overrides inside of rtld.
This change is motivated by CHERI, where we have an experimental ABI that
requires additional relocation processing to allow the use of function
pointers inside of rtld. Instead of adding this self-relocation code to
RTLD I attempted to remove most function pointers from RTLD and discovered
that most of them came from the libc dependencies instead of being actually
used inside rtld.
A nice side-effect of this change is that rtld is now 22% smaller on amd64.
text data bss dec hex filename
0x21eb6 0xce0 0xe60 145910 239f6 /home/alr48/ld-elf-x86.before.so.1
0x1a6ed 0x728 0xdd8 113645 1bbed /home/alr48/ld-elf-x86.after.so.1
The number of R_X86_64_RELATIVE relocations that need to be processed on
startup has also gone down from 368 to 187 (almost 50% less).
Reviewed By: kib
Differential Revision: https://reviews.freebsd.org/D20663
Mainly focus on files that use BSD 2-Clause license, however the tool I
was using misidentified many licenses so this was mostly a manual - error
prone - task.
The Software Package Data Exchange (SPDX) group provides a specification
to make it easier for automated tools to detect and summarize well known
opensource licenses. We are gradually adopting the specification, noting
that the tags are considered only advisory and do not, in any way,
superceed or replace the license texts.
No functional change intended.
Obtaining compat rtld lock in write mode sets process signal mask to
block all signals. Previous mask is stored in the global variable
oldsigmask. If a lock is write-locked while another lock is already
write-locked, oldsigmask is overwritten by the total mask and on the
last unlock, all signals except traps appear to be blocked.
Fix this by counting the write-lock nested level, and only storing to
oldsigmask/restoring from it at the outermost level.
Masking signals disables involuntary preemption for libc_r, and there
could be no voluntary context switches in the locked code
(dl_iterate_phdr(3) keeps a lock around user callback, but it was
added long after libc_r was renounced). Due to this, remembering the
level in the global variable after the lock is obtained should be
safe, because no two libc_r threads can acquire different write locks
in parallel.
PR: 215826
Reported by: kami
Tested by: yamagi@yamagi.org (previous version)
To be reviewed by: kan
Sponsored by: The FreeBSD Foundation
MFC after: 2 weeks
rtld on x86 to be hidden. This is a micro-optimization, which allows
intrinsic references inside rtld to be handled without indirection
through PLT. The visibility of rtld symbols for other objects in the
symbol namespace is controlled by a version script.
Reviewed by: kan, jilles
Sponsored by: The FreeBSD Foundation
MFC after: 2 weeks
mode. This allows the binder to be functional in the child after the
fork (assuming no lazy loading of a filter is needed), but other rtld
services which require write lock on rtld_bind_lock cause deadlock, if
called by child.
Change the _rtld_atfork() to lock the bind lock in write mode, making
the rtld fully functional after the fork.
Pre-resolve the symbols which are called by the libthr' fork()
interposer, since dynamic resolution causes deadlock due to the
rtld_bind_lock already owned in the write mode.
Reported and tested by: pho
Sponsored by: The FreeBSD Foundation
MFC after: 2 weeks
C runtime services, like printf(). Unfortunately, the multithread-safeness
measures in the libc do not work in rtld environment.
Rip the kernel printf() implementation and use it in the rtld instead of
libc version. This printf does not require any shared global data and thus
is mt-safe. Systematically use rtld_printf() and related functions, remove
the calls to err(3).
Note that stdio is still pulled from libc due to libmap implementaion using
fopen(). This is safe but unoptimal, and can be changed later.
Reported and tested by: pgj
Diagnosed and reviewed by: kan (previous version)
Approved by: re (bz)
filters are implemented.
Filtees are loaded on demand, unless LD_LOADFLTR environment variable
is set or -z loadfltr was specified during the linking. This forces
rtld to upgrade read-locked rtld_bind_lock to write lock when it
encounters an object with filter during symbol lookup.
Consolidate common arguments of the symbol lookup functions in the
SymLook structure. Track the state of the rtld locks in the
RtldLockState structure. Pass local RtldLockState through the rtld
symbol lookup calls to allow lock upgrades.
Reviewed by: kan
Tested by: Mykola Dzham <i levsha me>, nwhitehorn (powerpc)
Threading library calls _pre before the fork, allowing the rtld to
lock itself to ensure that other threads of the process are out of
dynamic linker. _post releases the locks.
This allows the rtld to have consistent state in the child. Although
child may legitimately call only async-safe functions, the call may
need plt relocation resolution, and this requires working rtld.
Reported and debugging help by: rink
Reviewed by: kan, davidxu
MFC after: 1 month (anyway, not before 7.1 is out)
bit flag, otherwise if a thread acquired a lock, another thread
or the current thread itself can no longer acquire another lock
because thread_mask_set() return whole flag word, this results
bit leaking in the word and misbehavior in later locking and
unlocking.
programs.
From the PR description:
The gcc runtime's _Unwind_Find_FDE function, invoked during exception
handling's stack unwinding, is not safe to execute from within multiple
threads. FreeBSD' s dl_iterate_phdr() however permits multiple threads
to pass through it though. The result is surprisingly reliable infinite
looping of one or more threads if they just happen to be unwinding at
the same time.
Introduce the new lock that is write locked around the dl_iterate_pdr,
thus providing required exclusion for the stack unwinders.
PR: threads/123062
Submitted by: Andy Newman <an at atrn org>
Reviewed by: kan
MFC after: 2 weeks
implementation in case default one provided by rtld is
not suitable.
Consolidate various identical MD lock implementation into
a single file using appropriate machine/atomic.h.
Approved by: re (scottl)