lock against themselves, causing infinite spinning. Brian Feldman
found this problem when testing with Mozilla and supplied the fix,
which I have revised slightly.
Here is the failure scenario. A thread calls dlopen() and acquires
the writer lock. While the thread still holds the lock, a signal
is delivered and caught. The signal handler tries to call a function
which hasn't been bound yet. It thus enters the dynamic linker
and tries to acquire the reader lock. Since the writer lock is
already held, it will spin forever in the signal handler. The
thread holding the lock won't be able to progress and release the
lock.
The solution is to block almost all signals while holding the
exclusive lock.
A similar problem could conceivably occur in the opposite order.
Namely, a thread is holding the reader lock and then a signal
handler calls dlopen() or dlclose() and spins waiting for the writer
lock. We deal with this administratively by proclaiming that signal
handlers aren't allowed to call dlopen() or dlclose(). Actually
we don't have to proclaim a thing, since signal handlers aren't
allowed to call any system functions except those which are explicitly
permitted.
Submitted by: Brian Fundakowski Feldman <green>
and for all (I hope). Packages such as wine, JDK, and linuxthreads
should no longer have any problems with re-entering the dynamic
linker.
This commit replaces the locking used in the dynamic linker with a
new spinlock-based reader/writer lock implementation. Brian
Fundakowski Feldman <green> argued for this from the very beginning,
but it took me a long time to come around to his point of view.
Spinlocks are the only kinds of locks that work with all thread
packages. But on uniprocessor systems they can be inefficient,
because while a contender for the lock is spinning the holder of the
lock cannot make any progress toward releasing it. To alleviate
this disadvantage I have borrowed a trick from Sleepycat's Berkeley
DB implementation. When spinning for a lock, the requester does a
nanosleep() call for 1 usec. each time around the loop. This will
generally yield the CPU to other threads, allowing the lock holder
to finish its business and release the lock. I chose 1 usec. as the
minimum sleep which would with reasonable certainty not be rounded
down to 0.
The formerly machine-independent file "lockdflt.c" has been moved
into the architecture-specific subdirectories by repository copy.
It now contains the machine-dependent spinlocking code. For the
spinlocks I used the very nifty "simple, non-scalable reader-preference
lock" which I found at
<http://www.cs.rochester.edu/u/scott/synchronization/pseudocode/rw.html>
on all CPUs except the 80386 (the specific CPU model, not the
architecture). The 80386 CPU doesn't support the necessary "cmpxchg"
instruction, so on that CPU a simple exclusive test-and-set lock
is used instead. 80386 CPUs are detected at initialization time by
trying to execute "cmpxchg" and catching the resulting SIGILL
signal.
To reduce contention for the locks, I have revamped a couple of
key data structures, permitting all common operations to be done
under non-exclusive (reader) locking. The only operations that
require exclusive locking now are the rare intrusive operations
such as dlopen() and dlclose().
The dllockinit() interface is now deprecated. It still exists,
but only as a do-nothing stub. I plan to remove it as soon as is
reasonably possible. (From the very beginning it was clearly
labeled as experimental and subject to change.) As far as I know,
only the linuxthreads port uses dllockinit(). This interface turned
out to have several problems. As one example, when the dynamic
linker called a client-supplied locking function, that function
sometimes needed lazy binding, causing re-entry into the dynamic
linker and a big looping mess. And in any case, it turned out to be
too burdensome to require threads packages to register themselves
with the dynamic linker.
"ld-elf.so.1.old". The dynamic linker is a critical component of
the system, and it is difficult to recover if it is damaged and
there isn't a working backup available. For instance, parts of
the toolchain such as the assembler are dynamically linked, making
it impossible to build a new dynamic linker if the installed one
doesn't work.
DWARF2 exception tables emitted by the compiler for C++ sources.
These tables are tightly packed, and they contain some relocated
addresses which are not well-aligned.
really used in bsd.man.mk).
Don't uselessly set MANSRC ("." is in the path by default, and there are
no ordering problems).
Fixed some other style bugs.
interface, and statically link them to the programs using them.
These functions, upon reflection and discussion, are too generically
named for a library interface with such specific functionality.
Also the api that they use, whilst ok for private use, isn't good
enough for a libc function.
Additionally there were complications with the build/install-world
process. It depends heavily upon xinstall, which got broken by
the change in api, and caused bootstrap problems and general mayhem.
There is work in progress to address future problems that may be
caused by changes in install-chain tools, and better names for
{g|s}etflags can be derived when some future program requires them.
For now the code has been left in src/lib/libc/gen (it started off
in src/bin/ls).
It's important to provide library functions for manipulating file
flag strings if we ever want this interface to be adopted outside
of the source tree, but now isn't necessarily the right moment
with 4.0-release just around the corner.
Approved: jkh
When hostname is not set, ftpd core dumps, because there is no
NULL check for freeing name resolving information for its own
hostname.
So the check is added.
Approved by: jkh
Some of rcmd related function is need to be updated to
support IPv6. Some of them are already updated as standard
document. But there is also several de-facto functions and
they are not listed in standard documents.
They are,
iruserok() (used by rlogind, rshd)
ruserok() (used by kerberos, etc)
KAME package updated those functions in original way.
iruserok_af()
ruserok_af()
But recently there was discussion on IETF IPng mailing
list about how to sync those API, and it is decided,
-Those function is not standard and not documented.
-But let BSDs sync their API as de-facto.
And after some discussion, it is announced that
-add update to iruserok() as iruserok_sa()
-no ruserok() API change(it is only updated internaly)
So I sync those API before 4.0 is released.
The changes are,
-prototype changes
-ruserok() internal update (use iruserok_sa() inside)
-removal of ruserok_af()
-change iruserok_af() as static functioin, and also prefix the name with __.
-add iruserok_sa() (Just call __iruserok_af() inside)
-adding flag AI_ALL to getipnodebyaddr() called from __icheckhost().
This is necessary to support IPv4 communication via AF_INET6 socket
could be correctly authenticated via iruserok_sa()
-irusreok_af() call is replaced to iruserok_sa() call
in rlogind, and rshd.
Approved by: jkh
figure out which shared object(s) contain the the locking methods
and fully bind those objects as if they had been loaded with
LD_BIND_NOW=1. The goal is to keep the locking methods from
requiring any lazy binding. Otherwise infinite recursion occurs
in _rtld_bind.
This fixes the infinite recursion problem in the linuxthreads port.
just a few of them. This looks like it solves the recent
ld-elf.so.1: assert failed: /usr/src/libexec/rtld-elf/lockdflt.c:55
failures seen by some applications such as JDK.
init and fini functions. Now the code is very careful to hold no
locks when calling these functions. Thus the dynamic linker cannot
be re-entered with a lock already held.
Remove the tolerance for recursive locking that I added in revision
1.2 of dllockinit.c. Recursive locking shouldn't happen any more.
Mozilla and JDK users: I'd appreciate confirmation that things still
work right (or at least the same) with these changes.
. add Xrs to hosts.equiv(5), auth.conf(5), services(5) to some pages
. sort Xrs in SEE ALSO sections
Patches based on PR: docs/15680
Submitted by: Christian Weisgerber <naddy@mips.rhein-neckar.de>
locking functions. If an application loads a shared object with
dlopen() and the shared object has an init function which requires
lazy binding, then _rtld_bind is called when the thread is already
inside the dynamic linker. This leads to a recursive acquisition
of the lock, which I was not expecting -- hence the assert failure.
This work-around makes the default locking functions handle recursive
locking. It is NOT the correct fix -- that should be implemented
at the generic locking level rather than in the default locking
functions. I will implement the correct fix in a future commit.
Since the dllockinit() interface will likely need to change, warn
about that in both the man page and the header file.
functions to be used by the dynamic linker. This can be called by
threads packages at start-up time. I will add the call to libc_r
soon.
Also add a default locking method that is used up until dllockinit()
is called. The default method works by blocking SIGVTALRM, SIGPROF,
and SIGALRM in critical sections. It is based on the observation
that most user-space threads packages implement thread preemption
with one of these signals (usually SIGVTALRM).
The dynamic linker has never been reentrant, but it became less
reentrant in revision 1.34 of "src/libexec/rtld-elf/rtld.c".
Starting with that revision, multiple threads each doing lazy
binding could interfere with each other. The usual symptom was
that a symbol was falsely reported as undefined at start-up time.
It was rare but not unseen. This commit fixes it.
happened as it was working around problems elsewhere (ie: binutils/ld
not doing the right thing according to the ELF design). libcrypt has
been adjusted to not need the runtime -lmd. It's still not quite right
(ld is supposed to work damnit) but at least it doesn't impact all the
users of libcrypt in Marcel's cross-build model.