first calls mmap() with the arguments PROT_NONE and MAP_ANON to reserve a
single, contiguous range of virtual addresses for the entire shared library.
Later, rtld calls mmap() with the the shared library's file descriptor
and the argument MAP_FIXED to place the text and data sections within the
reserved range. The rationale for mapping shared libraries in this way is
explained in the commit message for Revision 190885. However, this approach
does have an unintended, negative consequence. Since the first call to
mmap() specifies MAP_ANON and not the shared library's file descriptor, the
kernel has no idea what alignment the vm object backing the file prefers.
As a result, the reserved range's alignment is unlikely to be the same as
the vm object's, and so mapping with superpages becomes impossible. To
address this problem, this revision adds the argument MAP_ALIGNED_SUPER to
the first call to mmap() if the text section is larger than the smallest
superpage size.
To determine if the text section is larger than the smallest superpage
size, rtld must always fetch the page size information. As a result, the
private code for fetching the base page size in rtld's builtin malloc is
redundant. Eliminate it. Requested by: kib
Tested by: zbb (on arm)
Reviewed by: kib (an earlier version)
Discussed with: jhb
by John Marino <draco@marino.st>, with the following (edited) commit
message
Date: Sat, 24 Mar 2012 06:40:50 +0100
Subject: [PATCH 1/1] rtld: Implement DT_RUNPATH and -z nodefaultlib
DT_RUNPATH is incorrectly being considered as an alias of DT_RPATH. The
purpose of DT_RUNPATH is to have two different types of rpath: one that
can be overridden by the environment variable LD_LIBRARY_PATH and one that
can't. With the currently implementation, LD_LIBRARY_PATH will always
trump any embedded rpath or runpath tags.
Current path search order by rtld:
==================================
LD_LIBRARY_PATH
DT_RPATH / DT_RUNPATH (always the same)
ldconfig hints file (default: /var/run/ld-elf.so.hints)
/usr/lib
New path search order by rtld:
==============================
DT_RPATH of the calling object if no DT_RUNPATH
DT_RPATH of the main binary if no DT_RUNPATH and binary isn't calling obj
LD_LIBRARY_PATH
DT_RUNPATH
ldconfig hints file
/usr/lib
The new path search matches how the linux runtime loader works. The other
major added feature is support for linker flag "-z nodefaultlib". When
this flag is passed to the linker, rtld will skip all references to the
standard library search path ("/usr/lib" in this case but it could handle
more color delimited paths) except in DT_RPATH and DT_RUNPATH.
New path search order by rtld with -z nodefaultlib flag set:
============================================================
DT_RPATH of the calling object if no DT_RUNPATH
DT_RPATH of the main binary if no DT_RUNPATH and binary isn't calling obj
LD_LIBRARY_PATH
DT_RUNPATH
ldconfig hints file (skips all references to /usr/lib)
FreeBSD notes:
- we fixed some bugs which were submitted to DragonFly and merged there
as commit 1ff8a2bd3eb6e5587174c6a983303ea3a79e0002;
- we added LD_LIBRARY_PATH_RPATH environment variable to switch to
the previous behaviour of considering DT_RPATH a synonym for DT_RUNPATH;
- the FreeBSD default search path is /lib:/usr/lib and not /usr/lib.
Reviewed by: kan
MFC after: 1 month
MFC note: flip the ld_library_path_rpath default value for stable/9
hash elements, and a helper matched_symbol() which match the given hash
entry and request, performing needed type and version checks.
Based on dragonflybsd support for GNU hash by John Marino <draco marino st>
Reviewed by: kan
Tested by: bapt
MFC after: 2 weeks
for the same object. This can happen when object is a dependency of the
dlopen()ed dso. When called several times, we waste time due to unneeded
processing, and memory, because obj->vertab is allocated anew on each
iteration.
Reviewed by: kan
MFC after: 2 weeks
are assumed to not fail.
Make the xcalloc() calling conventions follow the calloc(3) calling
conventions and replace unchecked calls to calloc() with calls to
xcalloc().
Remove redundand declarations from xmalloc.c, which are already
present in rtld.h.
Reviewed by: kan
Discussed with: bde
MFC after: 2 weeks
Do not relocate twice an object which happens to be needed by loaded
binary (or dso) and some filtee opened due to symbol resolution when
relocating need objects. Record the state of the relocation
processing in Obj_Entry and short-circuit relocate_objects() if
current object already processed.
Do not call constructors for filtees loaded during the early
relocation processing before image is initialized enough to run
user-provided code. Filtees are loaded using dlopen_object(), which
normally performs relocation and initialization. If filtee is
lazy-loaded during the relocation of dso needed by the main object,
dlopen_object() runs too earlier, when most runtime services are not
yet ready.
Postpone the constructors call to the time when main binary and
depended libraries constructors are run, passing the new flag
RTLD_LO_EARLY to dlopen_object(). Symbol lookups callers inform
symlook_* functions about early stage of initialization with
SYMLOOK_EARLY. Pass flags through all functions participating in
object relocation.
Use the opportunity and fix flags argument to find_symdef() in
arch-specific reloc.c to use proper name SYMLOOK_IN_PLT instead of
true, which happen to have the same numeric value.
Reported and tested by: theraven
Reviewed by: kan
MFC after: 2 weeks
Stop using strerror(3) in rtld, which brings in msgcat and stdio.
Directly access sys_errlist array of errno messages with private
rtld_strerror() function.
Now,
$ size /libexec/ld-elf.so.1
text data bss dec hex filename
96983 2480 8744 108207 1a6af /libexec/ld-elf.so.1
Reviewed by: dim, kan
MFC after: 2 weeks
particular on ARM, do require working init arrays.
Traditional FreeBSD crt1 calls _init and _fini of the binary, instead
of allowing runtime linker to arrange the calls. This was probably
done to have the same crt code serve both statically and dynamically
linked binaries. Since ABI mandates that first is called preinit
array functions, then init, and then init array functions, the init
have to be called from rtld now.
To provide binary compatibility to old FreeBSD crt1, which calls _init
itself, rtld only calls intializers and finalizers for main binary if
binary has a note indicating that new crt was used for linking. Add
parsing of ELF notes to rtld, and cache p_osrel value since we parsed
it anyway.
The patch is inspired by init_array support for DragonflyBSD, written
by John Marino.
Reviewed by: kan
Tested by: andrew (arm, previous version), flo (sparc64, previous version)
MFC after: 3 weeks
rtld on 386 and amd64. This adds runtime bits neccessary for the use
of the dispatch functions from the dynamically-linked executables and
shared libraries.
To allow use of external references from the dispatch function, resolution
of the R_MACHINE_IRESOLVE relocations in PLT is postponed until GOT entries
for PLT are prepared, and normal resolution of the GOT entries is finished.
Similar to how it is done by GNU, IRELATIVE relocations are resolved in
advance, instead of normal lazy handling for PLT.
Move the init_pltgot() call before the relocations for the object are
processed.
MFC after: 3 weeks
C runtime services, like printf(). Unfortunately, the multithread-safeness
measures in the libc do not work in rtld environment.
Rip the kernel printf() implementation and use it in the rtld instead of
libc version. This printf does not require any shared global data and thus
is mt-safe. Systematically use rtld_printf() and related functions, remove
the calls to err(3).
Note that stdio is still pulled from libc due to libmap implementaion using
fopen(). This is safe but unoptimal, and can be changed later.
Reported and tested by: pgj
Diagnosed and reviewed by: kan (previous version)
Approved by: re (bz)
by kernel, and parse PT_GNU_STACK phdr from linked and loaded dsos.
If the loaded dso requires executable stack, as specified by PF_X bit
of p_flags of PT_GNU_STACK phdr, but current stack protection does not
permit execution, the __pthread_map_stacks_exec symbol is looked up
and called. It should be implemented in libc or threading library and
change the protection mode of all thread stacks to be executable.
Provide a private interface _rtld_get_stack_prot() to export the stack
access mode as calculated by rtld.
Reviewed by: kan
filters are implemented.
Filtees are loaded on demand, unless LD_LOADFLTR environment variable
is set or -z loadfltr was specified during the linking. This forces
rtld to upgrade read-locked rtld_bind_lock to write lock when it
encounters an object with filter during symbol lookup.
Consolidate common arguments of the symbol lookup functions in the
SymLook structure. Track the state of the rtld locks in the
RtldLockState structure. Pass local RtldLockState through the rtld
symbol lookup calls to allow lock upgrades.
Reviewed by: kan
Tested by: Mykola Dzham <i levsha me>, nwhitehorn (powerpc)
dependency, then the dso never has its DAG initialized. Empty DAG
makes ref_dag() call in dlopen() a nop, and the dso refcount is off
by one.
Initialize the DAG on the first dlopen() call, using a boolean flag
to prevent double initialization.
From the PR (edited):
Assume we have a library liba.so, containing a function a(), and a
library libb.so, containing function b(). liba.so needs functionality
from libb.so, so liba.so links in libb.so.
An application doesn't know about the relation between these libraries,
but needs to call a() and b(). It dlopen()s liba.so and obtains a
pointer to a(), then it dlopen()s libb.so and obtains a pointer to b().
As soon as the application doesn't need a() anymore, it dlclose()s liba.so.
Expected result: the pointer to b() is still valid and can be called
Actual result: the pointer to b() has become invalid, even though the
application did not dlclose() the handle to libb.so. On calling b(), the
application crashes with a segmentation fault.
PR: misc/151861
Based on patch by: jh
Reviewed by: kan
Tested by: Arjan van Leeuwen <freebsd-maintainer opera com>
MFC after: 1 week
altered through their .init code. This might happen if init
vector calls dlopen on its own and that dlopen causes some not
yet initialized object to be initialized earlier as part of that
dlopened DAG.
Do not reset module reference counts to zero on final fini vector
run when process is exiting. Just add an additional parameter to
force fini vector invocation regardless of current reference count
value if object was not destructed yet. This allows dlclose called
from fini vector to proceed normally instead of failing with handle
validation error.
Reviewed by: kib
Reported by: venki kaps
soneeded pathes. The $ORIGIN, $OSNAME, $OSREL and $PLATFORM tokens
are supported. Enabling the substitution requires DF_ORIGIN flag in
DT_FLAGS or DF_1_ORIGIN if DF_FLAGS_1, that may be set with -z origin
gnu ld flag. Translation is unconditionally disabled for setuid/setgid
processes.
The $ORIGIN translation relies on the AT_EXECPATH auxinfo supplied
by kernel.
Requested by: maho
Tested by: maho, pho
Reviewed by: kan
This code came from the merged mips2 and Juniper mips repositories.
Warner Losh, Randall Seager, Oleksandr Tymoshenko and Olivier Houchard
worked to merge, debug and integrate this code. This code may also
contain code derived from NetBSD.
to be compatible with symbol versioning support as implemented by
GNU libc and documented by http://people.redhat.com/~drepper/symbol-versioning
and LSB 3.0.
Implement dlvsym() function to allow lookups for a specific version of
a given symbol.
means:
o Remove Elf64_Quarter,
o Redefine Elf64_Half to be 16-bit,
o Redefine Elf64_Word to be 32-bit,
o Add Elf64_Xword and Elf64_Sxword for 64-bit entities,
o Use Elf_Size in MI code to abstract the difference between
Elf32_Word and Elf64_Word.
o Add Elf_Ssize as the signed counterpart of Elf_Size.
MFC after: 2 weeks
Setting the LD_DUMP_REL_PRE or LD_DUMP_REL_POST environment variables
cause rtld-elf to output a table of all relocations.
This is useful for debugging.
implementation in case default one provided by rtld is
not suitable.
Consolidate various identical MD lock implementation into
a single file using appropriate machine/atomic.h.
Approved by: re (scottl)
Introdice RTLD_SELF special handle and properly process it within
dlsym() and dlinfo() functions.
The intention is to improve our compatibility with Solaris and
to make a Java port easier.
Partially submitted by: phantom
DT_INIT and DT_FINI tags pointed to fptr records. In 2.11.2, it points
to the actuall address of the function. On IA64 you cannot just take
an address of a function, store it in a function pointer variable and
call it.. the function pointers point to a fptr data block that has the
target gp and address in it. This is absolutely necessary for using
the in-tree binutils toolchain, but (unfortunately) will not work with
old shared libraries. Save your old ld-elf.so.1 if you want to use
old ones still. Do not mix-and-match.
This is a no-op change for i386 and alpha.
Reviewed by: dfr
particularly help programs which load many shared libraries with
a lot of relocations. Large C++ programs such as are found in KDE
are a prime example.
While relocating a shared object, maintain a vector of symbols
which have already been looked up, directly indexed by symbol
number. Typically, symbols which are referenced by a relocation
entry are referenced by many of them. This is the same optimization
I made to the a.out dynamic linker in 1995 (rtld.c revision 1.30).
Also, compare the first character of a sought-after symbol with its
symbol table entry before calling strcmp().
On a PII/400 these changes reduce the start-up time of a typical
KDE program from 833 msec (elapsed) to 370 msec.
MFC after: 5 days
longer includes machine/elf.h.
* consumers of elf.h now use the minimalist elf header possible.
This change is motivated by Binutils 2.11.0 and too much clashing over
our base elf headers and the Binutils elf headers.
Formerly the init functions were called in the opposite of the
order in which libraries were loaded, and libraries were loaded
according to a breadth-first traversal of the dependency graph.
That ordering came from SVR4.0, and it was easy to implement but
not always sensible.
Now we do a depth-first walk over the dependency graph and call
the init functions in an order such that each shared object's needed
objects are initialized before the shared object itself. At the
same time we build a list of finalization (fini) functions in the
opposite order, to guarantee correct C++ destructor ordering whenever
possible. (It may not be possible if dlopen and dlclose are used
in strange ways, but we come as close as one can come.)
The need for this renovation has become apparent as more programs
have started using multithreading. The multithreaded C library
libc_r requires initialization, whereas the standard libc does not.
Since virtually every other object depends on the C library, it is
important that it get initialized first.
and for all (I hope). Packages such as wine, JDK, and linuxthreads
should no longer have any problems with re-entering the dynamic
linker.
This commit replaces the locking used in the dynamic linker with a
new spinlock-based reader/writer lock implementation. Brian
Fundakowski Feldman <green> argued for this from the very beginning,
but it took me a long time to come around to his point of view.
Spinlocks are the only kinds of locks that work with all thread
packages. But on uniprocessor systems they can be inefficient,
because while a contender for the lock is spinning the holder of the
lock cannot make any progress toward releasing it. To alleviate
this disadvantage I have borrowed a trick from Sleepycat's Berkeley
DB implementation. When spinning for a lock, the requester does a
nanosleep() call for 1 usec. each time around the loop. This will
generally yield the CPU to other threads, allowing the lock holder
to finish its business and release the lock. I chose 1 usec. as the
minimum sleep which would with reasonable certainty not be rounded
down to 0.
The formerly machine-independent file "lockdflt.c" has been moved
into the architecture-specific subdirectories by repository copy.
It now contains the machine-dependent spinlocking code. For the
spinlocks I used the very nifty "simple, non-scalable reader-preference
lock" which I found at
<http://www.cs.rochester.edu/u/scott/synchronization/pseudocode/rw.html>
on all CPUs except the 80386 (the specific CPU model, not the
architecture). The 80386 CPU doesn't support the necessary "cmpxchg"
instruction, so on that CPU a simple exclusive test-and-set lock
is used instead. 80386 CPUs are detected at initialization time by
trying to execute "cmpxchg" and catching the resulting SIGILL
signal.
To reduce contention for the locks, I have revamped a couple of
key data structures, permitting all common operations to be done
under non-exclusive (reader) locking. The only operations that
require exclusive locking now are the rare intrusive operations
such as dlopen() and dlclose().
The dllockinit() interface is now deprecated. It still exists,
but only as a do-nothing stub. I plan to remove it as soon as is
reasonably possible. (From the very beginning it was clearly
labeled as experimental and subject to change.) As far as I know,
only the linuxthreads port uses dllockinit(). This interface turned
out to have several problems. As one example, when the dynamic
linker called a client-supplied locking function, that function
sometimes needed lazy binding, causing re-entry into the dynamic
linker and a big looping mess. And in any case, it turned out to be
too burdensome to require threads packages to register themselves
with the dynamic linker.