59 Commits

Author SHA1 Message Date
Konstantin Belousov
9fee0541f2 Do not call callbacks for dl_iterate_phdr(3) with the rtld bind and
phdr locks locked.  This allows the callback to call rtld services,
which is only reasonable for dlopen(path, RTLD_NOLOAD) to test for the
existence of the library in the image, and for dlsym().  The latter
might still not be quite safe, due to the lazy resolution of filters.

To allow dropping the locks around iteration in dl_iterate_phdr(3), we
insert markers to track the current position between relocks.  The global
objects list is converted to a tailq and all iterators skip markers;
globallist_next() and globallist_curr() helpers are added.
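
As a rough illustration of the marker technique described above, the following C sketch walks a tailq while pretending to drop a lock around each callback.  It is not the rtld code: struct obj, skip_markers(), list_next(), iterate_unlocked() and the demo helpers are hypothetical names, and the locking is only indicated by comments.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <sys/queue.h>

/* Hypothetical stand-in for an rtld object entry; only what the sketch needs. */
struct obj {
    TAILQ_ENTRY(obj) link;
    bool marker;            /* true for a position placeholder, not a real object */
    int id;
};
TAILQ_HEAD(obj_list, obj);

/* Return the first real (non-marker) element at or after o, or NULL. */
static struct obj *skip_markers(struct obj *o)
{
    while (o != NULL && o->marker)
        o = TAILQ_NEXT(o, link);
    return o;
}

/* Next real element after o, skipping markers left by other iterators. */
static struct obj *list_next(struct obj *o)
{
    return skip_markers(TAILQ_NEXT(o, link));
}

/*
 * Visit every real object.  A marker placed after the current element
 * remembers the position, so the callback may modify the list while the
 * (imaginary) lock is dropped.  Returns the number of objects visited.
 */
static int iterate_unlocked(struct obj_list *list,
    int (*cb)(struct obj *, void *), void *arg)
{
    struct obj marker = { .marker = true };
    struct obj *o = skip_markers(TAILQ_FIRST(list));
    int visited = 0;

    while (o != NULL) {
        TAILQ_INSERT_AFTER(list, o, &marker, link);
        /* the lock would be released here */
        int stop = cb(o, arg);
        /* the lock would be re-acquired here */
        o = list_next(&marker);
        TAILQ_REMOVE(list, &marker, link);
        visited++;
        if (stop)
            break;
    }
    return visited;
}

/* Demo support: a callback that appends one extra object on the first visit. */
struct demo_ctx {
    struct obj_list *list;
    struct obj *extra;
};

static int append_once(struct obj *o, void *arg)
{
    struct demo_ctx *ctx = arg;

    if (o->id == 1)
        TAILQ_INSERT_TAIL(ctx->list, ctx->extra, link);
    return 0;
}

/* Builds a list of three objects and iterates while the list grows. */
static int demo_iterate(void)
{
    struct obj_list list;
    struct obj objs[4] = { { .id = 1 }, { .id = 2 }, { .id = 3 }, { .id = 4 } };
    struct demo_ctx ctx = { &list, &objs[3] };

    TAILQ_INIT(&list);
    for (int i = 0; i < 3; i++)
        TAILQ_INSERT_TAIL(&list, &objs[i], link);
    return iterate_unlocked(&list, append_once, &ctx);
}
```

In this sketch the object appended during the first callback is still reached, because the iterator resumes from its marker rather than from a saved next pointer.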

Reported and tested by:	davide
Reviewed by:	kan
Sponsored by:	The FreeBSD Foundation
MFC after:	3 weeks
2016-01-20 07:21:33 +00:00
Ed Maste
d88def534e rtld: wrap a comment to 80 columns 2016-01-05 02:21:57 +00:00
Warner Losh
8fd53f4577 Create a generalized exec hook that different architectures can hook
into if they need to, but default to no action.

Differential Review: https://reviews.freebsd.org/D2718
2016-01-03 04:32:02 +00:00
Eric van Gyzen
ddab052725 Disable SSE in libthr
Clang emits SSE instructions on amd64 in the common path of
pthread_mutex_unlock.  If the thread does not otherwise use SSE,
this usage incurs a context-switch of the FPU/SSE state, which
reduces the performance of multiple real-world applications by a
non-trivial amount (3-5% in one application).

Instead of this change, I experimented with eagerly switching the
FPU state at context-switch time.  This did not help.  Most of the
cost seems to be in the read/write of memory--as kib@ stated--and
not in the #NM handling.  I tested on machines with and without
XSAVEOPT.

One counter-argument to this change is that most applications already
use SIMD, and the number of applications and amount of SIMD usage
are only increasing.  This is absolutely true.  I agree that--in
general and in principle--this change is in the wrong direction.
However, there are applications that do not use enough SSE to offset
the extra context-switch cost.  SSE does not provide a clear benefit
in the current libthr code with the current compiler, but it does
provide a clear loss in some cases.  Therefore, disabling SSE in
libthr is a non-loss for most, and a gain for some.

I refrained from disabling SSE in libc--as was suggested--because
I can't make the above argument for libc.  It provides a wide variety
of code; each case should be analyzed separately.

https://lists.freebsd.org/pipermail/freebsd-current/2015-March/055193.html

Suggestions from:	dim, jmg, rpaulo
Approved by:	kib (mentor)
MFC after:	2 weeks
Sponsored by:	Dell Inc.
2015-08-05 12:53:55 +00:00
Konstantin Belousov
0c4f9ecde3 Change compiler setting to make default visibility of the symbols for
rtld on x86 to be hidden.  This is a micro-optimization, which allows
intrinsic references inside rtld to be handled without indirection
through PLT.  The visibility of rtld symbols for other objects in the
symbol namespace is controlled by a version script.

Reviewed by:	kan, jilles
Sponsored by:	The FreeBSD Foundation
MFC after:	2 weeks
2015-03-29 18:53:21 +00:00
Konstantin Belousov
74b0daf4f9 Optimize r270798, only do the second pass over non-plt relocations
when the first pass found IFUNCs.

Sponsored by:	The FreeBSD Foundation
MFC after:	2 weeks
2014-08-29 10:43:56 +00:00
Konstantin Belousov
14c3564759 IFUNC symbol type shall be processed for non-PLT relocations,
e.g. when a global variable is initialized with a pointer to ifunc.
Add symbol type check and call resolver for STT_GNU_IFUNC symbol types
when processing non-PLT relocations, but only after non-IFUNC
relocations are done.  The two-phase processing is required since
resolvers may reference other symbols, which must be ready to use when
resolver calls are done.

Restructure reloc_non_plt() on x86 to call find_symdef() and handle
IFUNC in single place.

For non-x86 reloc_non_plt(), check whether the call is for IFUNC
relocation and, if so, do nothing, to avoid processing relocs twice.

PR:	193048
Sponsored by:	The FreeBSD Foundation
MFC after:	2 weeks
2014-08-29 09:29:10 +00:00
Konstantin Belousov
8cc5663495 Add dwarf annotations to the amd64 _rtld_bind_start to allow debuggers
to unwind around the calls from PLT to binder.

Sponsored by:	The FreeBSD Foundation
MFC after:	1 week
2014-04-14 22:44:50 +00:00
Konstantin Belousov
f62651920d Add GNU hash support for rtld.
Based on dragonflybsd support for GNU hash by John Marino <draco marino st>
Reviewed by:	kan
Tested by:	bapt
MFC after:	2 weeks
2012-04-30 13:31:10 +00:00
Konstantin Belousov
082f959ac8 Fix several problems with our ELF filters implementation.
Do not relocate an object twice when it happens to be needed both by
the loaded binary (or dso) and by some filtee opened due to symbol
resolution when relocating needed objects.  Record the state of the
relocation processing in Obj_Entry and short-circuit relocate_objects()
if the current object is already processed.

Do not call constructors for filtees loaded during the early
relocation processing before image is initialized enough to run
user-provided code.  Filtees are loaded using dlopen_object(), which
normally performs relocation and initialization.  If filtee is
lazy-loaded during the relocation of dso needed by the main object,
dlopen_object() runs too early, when most runtime services are not
yet ready.

Postpone the constructor calls to the time when the constructors of the
main binary and dependent libraries are run, passing the new flag
RTLD_LO_EARLY to dlopen_object().  Symbol lookup callers inform
symlook_* functions about the early stage of initialization with
SYMLOOK_EARLY.  Pass flags through all functions participating in
object relocation.

Use the opportunity to fix the flags argument to find_symdef() in
arch-specific reloc.c to use the proper name SYMLOOK_IN_PLT instead of
true, which happens to have the same numeric value.

Reported and tested by:	theraven
Reviewed by:	kan
MFC after:	2 weeks
2012-03-20 13:20:49 +00:00
Konstantin Belousov
83aa9cc00c Add support for preinit, init and fini arrays. Some ABIs, in
particular on ARM, do require working init arrays.

Traditional FreeBSD crt1 calls _init and _fini of the binary, instead
of allowing runtime linker to arrange the calls.  This was probably
done to have the same crt code serve both statically and dynamically
linked binaries.  Since the ABI mandates that preinit array functions
are called first, then init, and then init array functions, init
has to be called from rtld now.

To provide binary compatibility to old FreeBSD crt1, which calls _init
itself, rtld only calls initializers and finalizers for main binary if
binary has a note indicating that new crt was used for linking.  Add
parsing of ELF notes to rtld, and cache p_osrel value since we parsed
it anyway.

The patch is inspired by init_array support for DragonflyBSD, written
by John Marino.

Reviewed by:	kan
Tested by:	andrew (arm, previous version), flo (sparc64, previous version)
MFC after:	3 weeks
2012-03-11 20:03:09 +00:00
Ed Schouten
581f58e7a3 Remove unneeded dtv variable.
It is only assigned and not used at all. The object files stay identical
when the variables are removed.

Approved by:	kib
2012-01-17 21:55:20 +00:00
Konstantin Belousov
5734c46c68 _rtld_bind() read-locks the bind lock, and possible plt resolution
from the dispatcher would also acquire bind lock in read mode, which
is the supported operation. plt is explicitly designed to allow safe
multithreaded updates, so the shared lock does not cause problems.

The error in r228435 is that it allows read lock acquisition after the
write lock for the bind lock.  If we dlopened the shared object that
contains IRELATIVE or a jump slot whose target is STT_GNU_IFUNC, then
possible recursive plt resolve from the dispatcher would cause it.

Postpone the resolution for irelative/ifunc right before initializers
are called, and drop bind lock around calls to dispatcher.  Use
initlist to iterate over the objects instead of the ->next, due to
drop of the bind lock in iteration.

For i386/reloc.c:reloc_iresolve(), fix calculation of the dispatch
function address for dso, by taking into account possible non-zero
relocbase.

MFC after:	3 weeks
2011-12-14 16:47:53 +00:00
Konstantin Belousov
6be4b69715 Add support for STT_GNU_IFUNC and R_MACHINE_IRELATIVE GNU extensions to
rtld on 386 and amd64. This adds runtime bits necessary for the use
of the dispatch functions from the dynamically-linked executables and
shared libraries.

To allow use of external references from the dispatch function, resolution
of the R_MACHINE_IRELATIVE relocations in PLT is postponed until GOT entries
for PLT are prepared, and normal resolution of the GOT entries is finished.
Similar to how it is done by GNU, IRELATIVE relocations are resolved in
advance, instead of normal lazy handling for PLT.

Move the init_pltgot() call before the relocations for the object are
processed.

MFC after:	3 weeks
2011-12-12 11:03:14 +00:00
Eitan Adler
36daf0495a - change "is is" to "is" or "it is"
- change "the the" to "the"

Approved by:	lstewart
Approved by:	sahil (mentor)
MFC after:	3 days
2011-10-16 14:30:28 +00:00
Konstantin Belousov
cb38d4941c When loading dso without PT_GNU_STACK phdr, only call
__pthread_map_stacks_exec() on architectures that allow executable
stacks.

Reported and tested by:	marcel (ia64)
2011-01-25 21:12:31 +00:00
Konstantin Belousov
3ad6376e56 Add section .note.GNU-stack for assembly files used by 386 and amd64. 2011-01-07 16:07:05 +00:00
Dimitry Andric
9a17b89ccf Sort -mno-(mmx|3dnow|sse|sse2|sse3) options consistently throughout the
tree.

Submitted by:	arundel
2011-01-05 21:23:26 +00:00
Dimitry Andric
e172464728 On amd64 and i386, tell the compiler to refrain from generating SSE,
3DNow, MMX and floating point instructions in rtld-elf.

Otherwise, _rtld_bind() (and whatever it calls) could possibly clobber
function arguments that are passed in SSE/3DNow/MMX/FP registers,
usually floating point values.  This can happen, for example, when clang
generates SSE code for memset() or memcpy() calls.

One symptom of this is sshd dying early on amd64 with "PRNG not seeded",
which is ultimately caused by libcrypto.so.6 calling RAND_add() with a
double parameter.  That parameter is passed via %xmm0, which gets wiped
out by an SSE memset() in _rtld_bind().

Reviewed by:	kib, kan
2011-01-04 20:51:28 +00:00
Dimitry Andric
7606ddab28 Remove '-elf' from build flags for libexec/rtld-elf for amd64 and i386.
ELF has been the default format for almost 12 years now.
2011-01-04 20:26:41 +00:00
Konstantin Belousov
8569deaf1c Implement support for ELF filters in rtld. Both normal and auxiliary
filters are implemented.

Filtees are loaded on demand, unless LD_LOADFLTR environment variable
is set or -z loadfltr was specified during the linking. This forces
rtld to upgrade read-locked rtld_bind_lock to write lock when it
encounters an object with filter during symbol lookup.

Consolidate common arguments of the symbol lookup functions in the
SymLook structure.  Track the state of the rtld locks in the
RtldLockState structure. Pass local RtldLockState through the rtld
symbol lookup calls to allow lock upgrades.

Reviewed by:	kan
Tested by:	Mykola Dzham <i levsha me>, nwhitehorn (powerpc)
2010-12-25 08:51:20 +00:00
Warner Losh
25faff346c MFtbemd:
Prefer MACHINE_CPUARCH to MACHINE_ARCH in most contexts where you want
to test if all the CPUs of a given family conform.
2010-08-23 22:24:11 +00:00
Roman Divacky
1dfdc15bb0 Only use the cache after the early stage of loading. This is
because calling mmap() etc. may use GOT which is not set up
yet. Use calloc() instead of mmap() in cases where this
was the case before (sparc64, powerpc, arm).

Submitted by:	Dimitry Andric (dimitry andric com)
Reviewed by:	kan
Approved by:	ed (mentor)
2010-05-18 08:55:23 +00:00
Robert Watson
d1f2f1c3f3 Now that the kernel defines CACHE_LINE_SIZE in machine/param.h, use
that definition in the custom locking code for the run-time linker
rather than local definitions.

Pointed out by:	tinderbox
MFC after:	2 weeks
2009-04-19 23:02:50 +00:00
Dag-Erling Smørgrav
4421d895a9 *thwack*! all the world's not i386.
Pointy hat to:	des
2006-03-29 12:29:01 +00:00
David Xu
c0d2338cdd Allocate space for the thread pointer.  This allows the thread library to
access its pointer from the beginning, and simplifies _get_curthread() in libthr.
2006-03-28 06:09:24 +00:00
Alexander Kabaev
0eb88f2029 Implement ELF symbol versioning using GNU semantics. This code aims
to be compatible with symbol versioning support as implemented by
GNU libc and documented by http://people.redhat.com/~drepper/symbol-versioning
and LSB 3.0.

Implement dlvsym() function to allow lookups for a specific version of
a given symbol.
2005-12-18 19:43:33 +00:00
Marcel Moolenaar
55dfaa9163 Explicitly cast ELF_R_TYPE() to the right type. 2005-12-18 01:38:26 +00:00
John Baldwin
2939195e46 Remove these unused files before any other archs include the same bogus
file.
2004-11-12 18:05:30 +00:00
Doug Rabson
017246d02f Add support for Thread Local Storage. 2004-08-03 08:51:00 +00:00
Peter Wemm
c707fea10b More stack alignment fixes. Arrange so we call _rtld() in ld-elf.so.1
with the correct alignment.  This is important because calls to
library static constructors are made from here.  The bug in the old crt*.s
files hid this because in this case, two wrongs do indeed make a right.
Also, call _rtld_bind() with the correct alignment, because it calls back
into the pthread library locking functions.  If things happen just
the wrong way, we get a SIG10 due to the broken stack alignment.
2004-03-21 01:43:39 +00:00
Peter Wemm
6143d8ba5f Fix dynamic linking a bit more.. enough that mozilla-firebird works if you
dig up the patches for amd64 support for it.

Note to self: do not put a 64 bit value in a 32 bit space.
2003-12-12 01:12:41 +00:00
Peter Wemm
080f5381b7 Revert last change. ../rtld.c uses CACHE_LINE_SIZE too.
Change it to 64 while here.

Reported by:  ps
2003-12-11 18:42:51 +00:00
Peter Wemm
165d50f626 Only define CACHE_LINE_SIZE in one place.. 2003-12-11 04:49:37 +00:00
Peter Wemm
40a7c81112 CACHE_LINE_SIZE is 64 on athlon and amd64 chips, not 32. This should
probably be 128 since that is what the hardware prefetch fill size is
on the p3, p4 and athlon* cpus.
2003-12-11 04:47:53 +00:00
Alexander Kabaev
6d5d786f80 Allow threading libraries to register their own locking
implementation in case default one provided by rtld is
not suitable.

Consolidate various identical MD lock implementation into
a single file using appropriate machine/atomic.h.

Approved by:	re (scottl)
2003-05-29 22:58:26 +00:00
Peter Wemm
9783a12b34 Initial pass at supporting shared libraries on amd64. There are still
a few missing relocation types in amd64/reloc.c, but I have not found
any of them in use yet. :-)

Approved by:  re (amd64/* blanket)
2003-05-24 17:37:51 +00:00
Peter Wemm
7c1622ff28 Remove 80386 bandaids from code repocopied from i386. rtld_start.S still
todo.
2003-04-30 21:09:06 +00:00
Alexander Kabaev
605f36fc1e No need to zero fill memory, mmapped anonymously. Kernel will
return pre-zeroed pages itself.

Noticed by:     jake
2003-03-14 21:10:13 +00:00
Thomas Moestl
a42a42e9b9 Fix the handling of high PLT entries (> 32764) on sparc64. This requires
additional arguments to reloc_jmpslot(), which is why MI code and MD code
of other platforms had to be changed.

Reviewed by:	jake
Approved by:	re
2002-11-18 22:08:50 +00:00
John Polstra
e6f0183bff Remove the nanosleep calls from the spin loops in the locking code.
They provided little benefit (if any) and they caused some problems
in OpenOffice, at least in post-KSE -current and perhaps in other
environments too.  The nanosleep calls prevented the profiling timer
from advancing during the spinloops, thereby preventing the thread
scheduler from ever pre-empting the spinning thread.  Alexander
Kabaev diagnosed this problem, Martin Blapp helped with testing,
and Matt Dillon provided some helpful suggestions.

This is a short-term fix for a larger problem.  The use of spinlocking
isn't guaranteed to work in all cases.  For example, if the spinning
thread has higher priority than all other threads, it may never be
pre-empted, and the thread holding the lock may never progress far
enough to release the lock.  On the other hand, spinlocking is the
only locking that can work with an arbitrary unknown threads package.

I have some ideas for a much better fix in the longer term.  It
would eliminate all locking inside the dynamic linker by making it
safe for symbol lookups and lazy binding to proceed in parallel
with a call to dlopen or dlclose.  This means that the only mutual
exclusion needed would be to prevent multiple simultaneous calls
to dlopen and/or dlclose.  That mutual exclusion could be put into
the native pthreads library.  Applications using foreign threads
packages would have to make their own arrangements to ensure that
they did not have multiple threads in dlopen and/or dlclose -- a
reasonable requirement in my opinion.

MFC after:	3 days
2002-07-06 20:25:56 +00:00
John Polstra
d1c02bccdc Update the asm statements to use the "+" modifier instead of
matching constraints where appropriate.  This makes the dynamic
linker buildable at -O0 again.

Thanks to Bruce Evans for identifying the cause of the build
problem.

MFC after:	1 week
2002-06-24 23:19:18 +00:00
Matthew Dillon
b08440e568 Correct a bug in the last commit. The whole point of creating a 'done:'
goto target was so the cache could be freed.  So free the cache after
done: rather than before done: (!)

Submitted by:	Gavin Atkinson <gavin@ury.york.ac.uk>
2002-06-10 21:15:50 +00:00
Matthew Dillon
b603db3019 In tracking down an installation seg fault with the openoffice port
Martin Blapp determined that the elf dynamic loader was at fault.  In
particular, the loader uses alloca() to allocate a symbol cache on the
stack.  Normally this would work just fine, but if the loader is called
from a threaded program and the object being loaded is fairly large the
alloca() can blow away the thread stack and affect other nearby thread
stacks as well.  My testing showed that the symbol cache can be as large
as 250KBytes during the openoffice port build and install sequence.  Martin
was able to work around the problem by disabling the symbol cache
(cache = NULL;).  However, this solution is not adequate for commit because
it can cause an enormous cpu burden for applications which do a lot of
dynamic loading (e.g. konqueror).

The solution is to use anonymous mmap() to temporarily allocate space to
hold the symbol cache.  In testing I found that replacing the alloca()
with mmap() causes no observable degradation in performance.
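
A minimal sketch of the approach, assuming a POSIX system with anonymous mappings: the scratch cache is taken from mmap() rather than the thread stack and released with munmap().  The helper names cache_alloc(), cache_free() and demo_cache() are hypothetical and are not taken from rtld; the 250 KByte size only echoes the figure quoted above.

```c
#ifndef _DEFAULT_SOURCE
#define _DEFAULT_SOURCE 1   /* asks glibc to expose MAP_ANON/MAP_ANONYMOUS */
#endif
#include <assert.h>
#include <stddef.h>
#include <sys/mman.h>

#ifndef MAP_ANON
#define MAP_ANON MAP_ANONYMOUS
#endif

/* Map nbytes of private anonymous memory; returns NULL on failure. */
static void *cache_alloc(size_t nbytes)
{
    void *p = mmap(NULL, nbytes, PROT_READ | PROT_WRITE,
        MAP_PRIVATE | MAP_ANON, -1, 0);

    return p == MAP_FAILED ? NULL : p;
}

/* Unmap a cache obtained from cache_alloc(); returns 0 on success. */
static int cache_free(void *p, size_t nbytes)
{
    return p == NULL ? 0 : munmap(p, nbytes);
}

/* Returns 1 if a 250 KByte cache arrives zero-filled and is freed cleanly. */
static int demo_cache(void)
{
    size_t nbytes = 250 * 1024;
    unsigned char *cache = cache_alloc(nbytes);
    int zeroed = 1;

    if (cache == NULL)
        return 0;
    for (size_t i = 0; i < nbytes; i++) {
        if (cache[i] != 0) {
            zeroed = 0;
            break;
        }
    }
    cache[0] = 1;           /* the mapping is writable */
    return zeroed && cache_free(cache, nbytes) == 0;
}
```

Anonymous mappings are handed out zero-filled, so no explicit clearing pass is needed before the cache is used.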

It should be noted that this bug does not necessarily cause an immediate
crash but can instead result in long term corruption and instability in
applications that load modules from threads.  The bug is almost certainly
responsible for some of the instabilities found in konqueror, for example,
and possibly netscape too.

Sleuthing work by: Martin Blapp <mb@imp.ch>
X-MFC after:	Before or after the 4.6 release depending on the release engineers
2002-06-10 18:52:31 +00:00
Peter Wemm
14a55adf36 Update rtld for the "new" ia64 ABI. In the old toolchain, the
DT_INIT and DT_FINI tags pointed to fptr records.  In 2.11.2, they point
to the actual address of the function.  On IA64 you cannot just take
an address of a function, store it in a function pointer variable and
call it.. the function pointers point to a fptr data block that has the
target gp and address in it.  This is absolutely necessary for using
the in-tree binutils toolchain, but (unfortunately) will not work with
old shared libraries.  Save your old ld-elf.so.1 if you want to use
old ones still.  Do not mix-and-match.

This is a no-op change for i386 and alpha.

Reviewed by:	dfr
2001-10-29 10:10:10 +00:00
Doug Rabson
b5393d9f78 Add ia64 support. Various adjustments were made to existing targets to
cope with a few interface changes required by the ia64. In particular,
function pointers on ia64 need special treatment in rtld.
2001-10-15 18:48:42 +00:00
John Polstra
c15e7faad5 Performance improvements for the ELF dynamic linker. These
particularly help programs which load many shared libraries with
a lot of relocations.  Large C++ programs such as are found in KDE
are a prime example.

While relocating a shared object, maintain a vector of symbols
which have already been looked up, directly indexed by symbol
number.  Typically, symbols which are referenced by a relocation
entry are referenced by many of them.  This is the same optimization
I made to the a.out dynamic linker in 1995 (rtld.c revision 1.30).

Also, compare the first character of a sought-after symbol with its
symbol table entry before calling strcmp().
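
The two optimizations can be sketched roughly as follows.  This is only an illustration under simplifying assumptions: the full lookup here is a linear search rather than rtld's hashed lookup, and struct sym, lookup(), lookup_cached() and demo_symcache() are hypothetical names.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical symbol definition: a name and some value. */
struct sym {
    const char *name;
    int value;
};

/* Counts how many times the expensive lookup path is taken. */
static unsigned long slow_lookups;

/* Full lookup.  The first characters are compared before calling strcmp(). */
static const struct sym *lookup(const struct sym *tab, size_t n, const char *name)
{
    slow_lookups++;
    for (size_t i = 0; i < n; i++) {
        if (tab[i].name[0] == name[0] && strcmp(tab[i].name, name) == 0)
            return &tab[i];
    }
    return NULL;
}

/*
 * Cached lookup.  cache[] has one slot per symbol number of the object
 * being relocated; a slot is filled the first time that symbol is needed.
 * Failed lookups are not cached in this sketch.
 */
static const struct sym *lookup_cached(const struct sym **cache, size_t symnum,
    const char *name, const struct sym *tab, size_t n)
{
    if (cache[symnum] == NULL)
        cache[symnum] = lookup(tab, n, name);
    return cache[symnum];
}

/* Processes five relocations that reference only two distinct symbols. */
static unsigned long demo_symcache(void)
{
    static const struct sym defs[] = {
        { "free", 1 }, { "malloc", 2 }, { "memcpy", 3 }
    };
    static const char *const refnames[] = { "malloc", "free" };
    static const size_t relocs[] = { 0, 1, 0, 0, 1 };
    const struct sym *cache[2] = { NULL, NULL };

    slow_lookups = 0;
    for (size_t i = 0; i < sizeof(relocs) / sizeof(relocs[0]); i++) {
        size_t symnum = relocs[i];

        if (lookup_cached(cache, symnum, refnames[symnum], defs, 3) == NULL)
            return 0;
    }
    return slow_lookups;    /* number of full lookups actually performed */
}
```

With the cache in place, the number of full lookups is bounded by the number of distinct symbols referenced rather than by the number of relocation entries.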

On a PII/400 these changes reduce the start-up time of a typical
KDE program from 833 msec (elapsed) to 370 msec.

MFC after:	5 days
2001-05-05 23:21:05 +00:00
John Polstra
cf98e66403 Fix a bug which could cause programs with user threads packages to
lock against themselves, causing infinite spinning.  Brian Feldman
found this problem when testing with Mozilla and supplied the fix,
which I have revised slightly.

Here is the failure scenario.  A thread calls dlopen() and acquires
the writer lock.  While the thread still holds the lock, a signal
is delivered and caught.  The signal handler tries to call a function
which hasn't been bound yet.  It thus enters the dynamic linker
and tries to acquire the reader lock.  Since the writer lock is
already held, it will spin forever in the signal handler.  The
thread holding the lock won't be able to progress and release the
lock.

The solution is to block almost all signals while holding the
exclusive lock.
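
A hedged sketch of the idea, using only POSIX signal-mask calls: block signals, run the critical section, then restore the previous mask.  Which signals are left unblocked is not stated here, so leaving SIGTRAP deliverable is purely an assumption for illustration.  The names with_signals_blocked(), sig_is_blocked() and the demo helpers are hypothetical, and a multithreaded program would use pthread_sigmask() instead of sigprocmask().

```c
#ifndef _POSIX_C_SOURCE
#define _POSIX_C_SOURCE 200809L
#endif
#include <assert.h>
#include <signal.h>
#include <stddef.h>

/*
 * Run fn(arg) with almost all signals blocked, then restore the old mask.
 * Returns fn's result, or -1 if the mask could not be changed.
 */
static int with_signals_blocked(int (*fn)(void *), void *arg)
{
    sigset_t all, old;
    int r;

    sigfillset(&all);
    sigdelset(&all, SIGTRAP);   /* assumption: keep debugger traps deliverable */
    if (sigprocmask(SIG_BLOCK, &all, &old) != 0)
        return -1;
    /* the exclusive lock would be held only inside this window */
    r = fn(arg);
    sigprocmask(SIG_SETMASK, &old, NULL);
    return r;
}

/* Returns 1 if sig is currently blocked, 0 if not, -1 on error. */
static int sig_is_blocked(int sig)
{
    sigset_t cur;

    sigemptyset(&cur);
    if (sigprocmask(SIG_BLOCK, NULL, &cur) != 0)
        return -1;
    return sigismember(&cur, sig);
}

/* Critical-section stand-in: records whether SIGUSR1 is blocked right now. */
static int record_usr1(void *arg)
{
    *(int *)arg = sig_is_blocked(SIGUSR1);
    return 0;
}

/* Returns 1 if SIGUSR1 was blocked inside the window and the mask was restored. */
static int demo_signal_block(void)
{
    int before = sig_is_blocked(SIGUSR1);
    int inside = -1;
    int after;

    if (with_signals_blocked(record_usr1, &inside) != 0)
        return 0;
    after = sig_is_blocked(SIGUSR1);
    return inside == 1 && after == before;
}
```

A signal that arrives inside the window stays pending until the old mask is restored, so its handler cannot re-enter the linker while the exclusive lock is held.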

A similar problem could conceivably occur in the opposite order.
Namely, a thread is holding the reader lock and then a signal
handler calls dlopen() or dlclose() and spins waiting for the writer
lock.  We deal with this administratively by proclaiming that signal
handlers aren't allowed to call dlopen() or dlclose().  Actually
we don't have to proclaim a thing, since signal handlers aren't
allowed to call any system functions except those which are explicitly
permitted.

Submitted by:	Brian Fundakowski Feldman <green>
2000-07-17 17:18:13 +00:00
John Polstra
630df077ab Solve the dynamic linker's problems with multithreaded programs once
and for all (I hope).  Packages such as wine, JDK, and linuxthreads
should no longer have any problems with re-entering the dynamic
linker.

This commit replaces the locking used in the dynamic linker with a
new spinlock-based reader/writer lock implementation.  Brian
Fundakowski Feldman <green> argued for this from the very beginning,
but it took me a long time to come around to his point of view.
Spinlocks are the only kinds of locks that work with all thread
packages.  But on uniprocessor systems they can be inefficient,
because while a contender for the lock is spinning the holder of the
lock cannot make any progress toward releasing it.  To alleviate
this disadvantage I have borrowed a trick from Sleepycat's Berkeley
DB implementation.  When spinning for a lock, the requester does a
nanosleep() call for 1 usec. each time around the loop.  This will
generally yield the CPU to other threads, allowing the lock holder
to finish its business and release the lock.  I chose 1 usec. as the
minimum sleep which would with reasonable certainty not be rounded
down to 0.

The formerly machine-independent file "lockdflt.c" has been moved
into the architecture-specific subdirectories by repository copy.
It now contains the machine-dependent spinlocking code.  For the
spinlocks I used the very nifty "simple, non-scalable reader-preference
lock" which I found at

  <http://www.cs.rochester.edu/u/scott/synchronization/pseudocode/rw.html>

on all CPUs except the 80386 (the specific CPU model, not the
architecture).  The 80386 CPU doesn't support the necessary "cmpxchg"
instruction, so on that CPU a simple exclusive test-and-set lock
is used instead.  80386 CPUs are detected at initialization time by
trying to execute "cmpxchg" and catching the resulting SIGILL
signal.
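
For readers unfamiliar with this kind of lock, here is a small sketch of a reader-preference reader/writer spinlock built on compare-and-swap, written with C11 atomics.  It is not the lockdflt.c code: the bit layout (low bit for an active writer, remaining bits counting readers) and all names are assumptions made for illustration, and they may differ from the pseudocode at the URL above.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

#define WA_FLAG 1u   /* assumed layout: bit 0 set while a writer is active */
#define RC_INCR 2u   /* assumed layout: each reader adds 2 to the lock word */

typedef struct {
    atomic_uint word;
} rwspin;

/* A writer may enter only when there are no readers and no writer. */
static bool rw_try_write(rwspin *l)
{
    unsigned expected = 0;

    return atomic_compare_exchange_strong(&l->word, &expected, WA_FLAG);
}

static void rw_write_lock(rwspin *l)
{
    while (!rw_try_write(l))
        ;   /* spin */
}

static void rw_write_unlock(rwspin *l)
{
    atomic_fetch_sub(&l->word, WA_FLAG);
}

/* Readers announce themselves first, then wait out any active writer. */
static void rw_read_lock(rwspin *l)
{
    atomic_fetch_add(&l->word, RC_INCR);
    while (atomic_load(&l->word) & WA_FLAG)
        ;   /* spin */
}

static void rw_read_unlock(rwspin *l)
{
    atomic_fetch_sub(&l->word, RC_INCR);
}

/* Single-threaded walk through the lock states; returns 1 if all checks hold. */
static int demo_rwspin(void)
{
    rwspin l;
    int ok = 1;

    atomic_init(&l.word, 0);
    rw_read_lock(&l);
    rw_read_lock(&l);                       /* two readers may share the lock */
    ok = ok && atomic_load(&l.word) == 2 * RC_INCR;
    ok = ok && !rw_try_write(&l);           /* a writer must wait for readers */
    rw_read_unlock(&l);
    rw_read_unlock(&l);
    rw_write_lock(&l);                      /* free lock: writer enters at once */
    ok = ok && atomic_load(&l.word) == WA_FLAG;
    rw_write_unlock(&l);
    ok = ok && atomic_load(&l.word) == 0;
    return ok;
}
```

Because readers register before checking for a writer, a steady stream of readers can keep a writer waiting; that is the reader preference, and it fits a workload where exclusive operations are rare.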

To reduce contention for the locks, I have revamped a couple of
key data structures, permitting all common operations to be done
under non-exclusive (reader) locking.  The only operations that
require exclusive locking now are the rare intrusive operations
such as dlopen() and dlclose().

The dllockinit() interface is now deprecated.  It still exists,
but only as a do-nothing stub.  I plan to remove it as soon as is
reasonably possible.  (From the very beginning it was clearly
labeled as experimental and subject to change.)  As far as I know,
only the linuxthreads port uses dllockinit().  This interface turned
out to have several problems.  As one example, when the dynamic
linker called a client-supplied locking function, that function
sometimes needed lazy binding, causing re-entry into the dynamic
linker and a big looping mess.  And in any case, it turned out to be
too burdensome to require threads packages to register themselves
with the dynamic linker.
2000-07-08 04:10:38 +00:00
John Polstra
7dbe16fbee When a threads package registers locking methods with dllockinit(),
figure out which shared object(s) contain the locking methods
and fully bind those objects as if they had been loaded with
LD_BIND_NOW=1.  The goal is to keep the locking methods from
requiring any lazy binding.  Otherwise infinite recursion occurs
in _rtld_bind.

This fixes the infinite recursion problem in the linuxthreads port.
2000-01-29 01:27:04 +00:00