They provided little benefit (if any) and they caused some problems
in OpenOffice, at least in post-KSE -current and perhaps in other
environments too. The nanosleep calls prevented the profiling timer
from advancing during the spinloops, thereby preventing the thread
scheduler from ever pre-empting the spinning thread. Alexander
Kabaev diagnosed this problem, Martin Blapp helped with testing,
and Matt Dillon provided some helpful suggestions.
This is a short-term fix for a larger problem. The use of spinlocking
isn't guaranteed to work in all cases. For example, if the spinning
thread has higher priority than all other threads, it may never be
pre-empted, and the thread holding the lock may never progress far
enough to release the lock. On the other hand, spinlocking is the
only locking that can work with an arbitrary unknown threads package.
I have some ideas for a much better fix in the longer term. It
would eliminate all locking inside the dynamic linker by making it
safe for symbol lookups and lazy binding to proceed in parallel
with a call to dlopen or dlclose. This means that the only mutual
exclusion needed would be to prevent multiple simultaneous calls
to dlopen and/or dlclose. That mutual exclusion could be put into
the native pthreads library. Applications using foreign threads
packages would have to make their own arrangements to ensure that
they did not have multiple threads in dlopen and/or dlclose -- a
reasonable requirement in my opinion.
MFC after: 3 days
matching constraints where appropriate. This makes the dynamic
linker buildable at -O0 again.
Thanks to Bruce Evans for identifying the cause of the build
problem.
MFC after: 1 week
goto target was so the cache could be freed. So free the cache after
done: rather then before done: (!)
Submitted by: Gavin Atkinson <gavin@ury.york.ac.uk>
Martin Blapp determined that the elf dynamic loader was at fault. In
particular, the loader uses alloca() to allocate a symbol cache on the
stack. Normally this would work just fine, but if the loader is called
from a threaded program and the object being loaded is fairly large the
alloca() can blow away the thread stack and effect other nearby thread
stacks as well. My testing showed that the symbol cache can be as large
as 250KBytes during the openoffice port build and install sequence. Martin
was able to work around the problem by disabling the symbol cache
(cache = NULL;). However, this solution is not adequate for commit because
it can cause an enormous cpu burden for applications which do a lot of
dynamic loading (e.g. like konqueror).
The solution is to use anonymous mmap() to temporarily allocate space to
hold the symbol cache. In testing I found that replacing the alloca()
with mmap() has no observable degredation in performance.
It should be noted that this bug does not necessarily cause an immediate
crash but can instead result in long term corruption and instability in
applications that load modules from threads. The bug is almost certainly
responsible for some of the instabilities found in konqueror, for example,
and possibly netscape too.
Sleuthing work by: Martin Blapp <mb@imp.ch>
X-MFC after: Before or after the 4.6 release depending on the release engineers
DT_INIT and DT_FINI tags pointed to fptr records. In 2.11.2, it points
to the actuall address of the function. On IA64 you cannot just take
an address of a function, store it in a function pointer variable and
call it.. the function pointers point to a fptr data block that has the
target gp and address in it. This is absolutely necessary for using
the in-tree binutils toolchain, but (unfortunately) will not work with
old shared libraries. Save your old ld-elf.so.1 if you want to use
old ones still. Do not mix-and-match.
This is a no-op change for i386 and alpha.
Reviewed by: dfr
particularly help programs which load many shared libraries with
a lot of relocations. Large C++ programs such as are found in KDE
are a prime example.
While relocating a shared object, maintain a vector of symbols
which have already been looked up, directly indexed by symbol
number. Typically, symbols which are referenced by a relocation
entry are referenced by many of them. This is the same optimization
I made to the a.out dynamic linker in 1995 (rtld.c revision 1.30).
Also, compare the first character of a sought-after symbol with its
symbol table entry before calling strcmp().
On a PII/400 these changes reduce the start-up time of a typical
KDE program from 833 msec (elapsed) to 370 msec.
MFC after: 5 days
lock against themselves, causing infinite spinning. Brian Feldman
found this problem when testing with Mozilla and supplied the fix,
which I have revised slightly.
Here is the failure scenario. A thread calls dlopen() and acquires
the writer lock. While the thread still holds the lock, a signal
is delivered and caught. The signal handler tries to call a function
which hasn't been bound yet. It thus enters the dynamic linker
and tries to acquire the reader lock. Since the writer lock is
already held, it will spin forever in the signal handler. The
thread holding the lock won't be able to progress and release the
lock.
The solution is to block almost all signals while holding the
exclusive lock.
A similar problem could conceivably occur in the opposite order.
Namely, a thread is holding the reader lock and then a signal
handler calls dlopen() or dlclose() and spins waiting for the writer
lock. We deal with this administratively by proclaiming that signal
handlers aren't allowed to call dlopen() or dlclose(). Actually
we don't have to proclaim a thing, since signal handlers aren't
allowed to call any system functions except those which are explicitly
permitted.
Submitted by: Brian Fundakowski Feldman <green>
and for all (I hope). Packages such as wine, JDK, and linuxthreads
should no longer have any problems with re-entering the dynamic
linker.
This commit replaces the locking used in the dynamic linker with a
new spinlock-based reader/writer lock implementation. Brian
Fundakowski Feldman <green> argued for this from the very beginning,
but it took me a long time to come around to his point of view.
Spinlocks are the only kinds of locks that work with all thread
packages. But on uniprocessor systems they can be inefficient,
because while a contender for the lock is spinning the holder of the
lock cannot make any progress toward releasing it. To alleviate
this disadvantage I have borrowed a trick from Sleepycat's Berkeley
DB implementation. When spinning for a lock, the requester does a
nanosleep() call for 1 usec. each time around the loop. This will
generally yield the CPU to other threads, allowing the lock holder
to finish its business and release the lock. I chose 1 usec. as the
minimum sleep which would with reasonable certainty not be rounded
down to 0.
The formerly machine-independent file "lockdflt.c" has been moved
into the architecture-specific subdirectories by repository copy.
It now contains the machine-dependent spinlocking code. For the
spinlocks I used the very nifty "simple, non-scalable reader-preference
lock" which I found at
<http://www.cs.rochester.edu/u/scott/synchronization/pseudocode/rw.html>
on all CPUs except the 80386 (the specific CPU model, not the
architecture). The 80386 CPU doesn't support the necessary "cmpxchg"
instruction, so on that CPU a simple exclusive test-and-set lock
is used instead. 80386 CPUs are detected at initialization time by
trying to execute "cmpxchg" and catching the resulting SIGILL
signal.
To reduce contention for the locks, I have revamped a couple of
key data structures, permitting all common operations to be done
under non-exclusive (reader) locking. The only operations that
require exclusive locking now are the rare intrusive operations
such as dlopen() and dlclose().
The dllockinit() interface is now deprecated. It still exists,
but only as a do-nothing stub. I plan to remove it as soon as is
reasonably possible. (From the very beginning it was clearly
labeled as experimental and subject to change.) As far as I know,
only the linuxthreads port uses dllockinit(). This interface turned
out to have several problems. As one example, when the dynamic
linker called a client-supplied locking function, that function
sometimes needed lazy binding, causing re-entry into the dynamic
linker and a big looping mess. And in any case, it turned out to be
too burdensome to require threads packages to register themselves
with the dynamic linker.
figure out which shared object(s) contain the the locking methods
and fully bind those objects as if they had been loaded with
LD_BIND_NOW=1. The goal is to keep the locking methods from
requiring any lazy binding. Otherwise infinite recursion occurs
in _rtld_bind.
This fixes the infinite recursion problem in the linuxthreads port.
just a few of them. This looks like it solves the recent
ld-elf.so.1: assert failed: /usr/src/libexec/rtld-elf/lockdflt.c:55
failures seen by some applications such as JDK.
init and fini functions. Now the code is very careful to hold no
locks when calling these functions. Thus the dynamic linker cannot
be re-entered with a lock already held.
Remove the tolerance for recursive locking that I added in revision
1.2 of dllockinit.c. Recursive locking shouldn't happen any more.
Mozilla and JDK users: I'd appreciate confirmation that things still
work right (or at least the same) with these changes.
locking functions. If an application loads a shared object with
dlopen() and the shared object has an init function which requires
lazy binding, then _rtld_bind is called when the thread is already
inside the dynamic linker. This leads to a recursive acquisition
of the lock, which I was not expecting -- hence the assert failure.
This work-around makes the default locking functions handle recursive
locking. It is NOT the correct fix -- that should be implemented
at the generic locking level rather than in the default locking
functions. I will implement the correct fix in a future commit.
Since the dllockinit() interface will likely need to change, warn
about that in both the man page and the header file.
functions to be used by the dynamic linker. This can be called by
threads packages at start-up time. I will add the call to libc_r
soon.
Also add a default locking method that is used up until dllockinit()
is called. The default method works by blocking SIGVTALRM, SIGPROF,
and SIGALRM in critical sections. It is based on the observation
that most user-space threads packages implement thread preemption
with one of these signals (usually SIGVTALRM).
The dynamic linker has never been reentrant, but it became less
reentrant in revision 1.34 of "src/libexec/rtld-elf/rtld.c".
Starting with that revision, multiple threads each doing lazy
binding could interfere with each other. The usual symptom was
that a symbol was falsely reported as undefined at start-up time.
It was rare but not unseen. This commit fixes it.
discovered by Hidetoshi Shimokawa. Large programs need multiple
GOTs. The lazy binding stub in the PLT can be reached from any of
these GOTs, but the dynamic linker only has enough information to
fix up the first GOT entry. Thus calls through the other GOTs went
through the time-consuming lazy binding process on every call.
This fix rewrites the PLT entries themselves to bypass the lazy
binding.
Tested by Hidetoshi Shimokawa and Steve Price.
Reviewed by: Doug Rabson <dfr@freebsd.org>
the Makefile, and move it down into the architecture-specific
subdirectories.
Eliminate an asm() statement for the i386.
Make the dynamic linker work if it is built as an executable instead
of as a shared library. See i386/Makefile.inc to find out how to
do it. Note, this change is not enabled and it might never be
enabled. But it might be useful in the future. Building the
dynamic linker as an executable should make it start up faster,
because it won't have any relocations. But in practice I suspect
the difference is negligible.
quite a few enhancements and bug fixes. There are still some known
deficiencies, but it should be adequate to get us started with ELF.
Submitted by: John Polstra <jdp@polstra.com>