Clang emits SSE instructions on amd64 in the common path of
pthread_mutex_unlock. If the thread does not otherwise use SSE,
this usage incurs a context-switch of the FPU/SSE state, which
reduces the performance of multiple real-world applications by a
non-trivial amount (3-5% in one application).
Instead of this change, I experimented with eagerly switching the
FPU state at context-switch time. This did not help. Most of the
cost seems to be in the read/write of memory--as kib@ stated--and
not in the #NM handling. I tested on machines with and without
XSAVEOPT.
One counter-argument to this change is that most applications already
use SIMD, and the number of applications and amount of SIMD usage
are only increasing. This is absolutely true. I agree that--in
general and in principle--this change is in the wrong direction.
However, there are applications that do not use enough SSE to offset
the extra context-switch cost. SSE does not provide a clear benefit
in the current libthr code with the current compiler, but it does
provide a clear loss in some cases. Therefore, disabling SSE in
libthr is a non-loss for most, and a gain for some.
I refrained from disabling SSE in libc--as was suggested--because
I can't make the above argument for libc. It provides a wide variety
of code; each case should be analyzed separately.
https://lists.freebsd.org/pipermail/freebsd-current/2015-March/055193.html
Suggestions from: dim, jmg, rpaulo
Sponsored by: Dell Inc.
Ensure we use calculate_first_tls_offset, even if the main program doesn't
have TLS program header. This is needed on architectures with Variant I
tls, that is arm, arm64, mips, and powerpc. These place the thread control
block at the start of the buffer and, without this, this data may be
trashed.
This appears to not be an issue on mips or powerpc as they include a second
adjustment to move the thread local data, however this is on arm64 (with a
future change to fix placing this data), and should be on arm. I am unable
to trigger this on arm, even after changing the code to move the data
around to make it more likely to be hit. This is most likely because my
tests didn't use the variable in offset 0.
Reviewed by: kib
MFC after: 1 week
Sponsored by: ABT Systems Ltd
The requirement is for a GCC-compatible compiler and not necessarily
GCC itself. However, we currently expect any compiler used for building
the whole of FreeBSD to be GCC-compatible and many things will break if
not; there's no longer a need to have an explicit test for this in rtld.
The runtime linker needs to include a path to itself in the link map
it exports to the debugger. It currently has two choices: it can use
a compiled-in path (/libexec/ld-elf.so.1) or it can use the path stored
in the interpreter path in the binary being executed. The runtime linker
currently prefers the second. However, this is usually wrong for compat32
binaries since the binary specifies the path of rtld on a 32-bit system
(/libexec/ld-elf.so.1) instead of the actual path (/libexec/ld-elf32.so.1).
For now, always assume the compiled in path (/libexec/ld-elf32.so.1) as
the rtld path and ignore the path in the binary for the 32-bit runtime
linker.
Linux LD_ITERATE_PHDR(3):
The dlpi_name field is a null-terminated string giving the
pathname from which the shared object was loaded.
That functionality is much more useful than returning just the short
name.
Update dl_iterate_phdr(3) to follow r272842
MFC of r272842 and r272848
Process STT_GNU_IFUNC when doing non-plt relocations.
MFC r270802:
Only do the second pass over non-plt relocations when the first pass
found IFUNCs.
Approved by: re (gjb)
Add a postinit debugger hook to rtld. This will be used by dtrace(1) to halt
the victim process before its entry point is called, at which point probes
and DOF data are registered with the kernel. The r_debug_state hook cannot
be used for this purpose, as it is called before the program's init routines
are invoked and in particular before DOF data is registered (via drti.o).
malloc_aligned() may not leave enough space for pointer to allocated memory,
saving the pointer will overwrite bytes belongs to another memory block
unexpectly, to fix the problem, use (allocated address + sizeof(void *)) as
initial value, and slip to next aligned address, so maximum extra bytes is
sizeof(void *) + align - 1.
Tested by: Andre Albsmeier < mail at ma17 dot ata dot myota dot orgndre >
MFC r262334:
Increase alignment to size of pointer if the alignment is too small.
Some modules do not align data at least to size of pointer, they uses a
smaller alignment, but our pointer should be aligned to its native
boundary, otherwise on some platforms, hardware alignment checking
will cause bus error.
On MIPS the .dynamic section is read-only, so the pointer to rtld
information for debuggers cannot be stored there (in DT_DEBUG).
Instead, a special section .rld_map is used.
Sponsored by: DARPA, AFRL
Approved by: re (delphij)
available in 32-bit compatibility mode, unconditional.
Overhaul the man page, which had evolved more by accretion than by design.
Approved by: re (gjb)
MFC after: 3 weeks
PATH_MAX after the token substitution. This is wrong, because
origin_subst_one() performs the substitution on the whole rpath and
similar strings, which contain several pathes separated by colon. As
result, long (but correct) rpath consisting of many path elements is
rejected by the function.
Correct the problem by rewriting the origin_subst_one() to perform two
passes, first pass to calculate the number of substitutions to be
performed, and second pass to generate the resulting string. Second
pass allocates the memory for the result based on the count from the
first pass, without enforcing a limit.
Reported and tested by: pgj
Sponsored by: The FreeBSD Foundation
MFC after: 2 weeks
Normal libraries have base address 0 and are unaffected by this change.
PR: 176216
Submitted by: Damjan Jovanovic <damjan.jov@gmail.com>
Reviewed by: kib
MFC after: 1 week
instead of /libexec/ld-elf.so.1. Below in the Makefile we execute
'chflags noschg ${DESTDIR}/usr/libexec/ld-elf.so.1', which follows
symlink and removes 'schg' flag from system's /libexec/ld-elf.so.1
instead of the one in DESTDIR. It is also more friendly to use
replative paths in symlink in case of jail/chroot environments.
Obtained from: WHEEL Systems
MFC after: 2 weeks
Rtld did not set FD_CLOEXEC on its internal file descriptors; therefore,
such a file descriptor may be passed to a process created by another thread
running in parallel to dlopen() or fdlopen().
No other threads are expected to be running during parsing of the hints
and libmap files but the file descriptors need not be passed to child
processes so add O_CLOEXEC there as well.
This change will break fdlopen() (as used by OpenPAM) on kernels without
F_DUPFD_CLOEXEC (added in July). Note that running new userland on old
kernels is not supported.
Reviewed by: kib
The place where the function is called can be reached if object loading
and relocation fails too, in which case obj pointer will be NULL. Do not
call process_nodelete then, or crash will follow.
Pointy hat to: kan
Trying to up the reference from the load loop risks missing dependencies
that have not been loaded yet.
MFC afer: 1 week
Reported by: nox
Reviewd by: kib
This is not strictly required with the current ABI but will be when we
switch to the ARM EABI. The aapcs requires the stack to be 4 byte aligned
at all times and 8 byte aligned when calling a public subroutine where the
current ABI only requires sp to be a multiple of 4.
by John Marino <draco@marino.st>, with the following (edited) commit
message
Date: Sat, 24 Mar 2012 06:40:50 +0100
Subject: [PATCH 1/1] rtld: Implement DT_RUNPATH and -z nodefaultlib
DT_RUNPATH is incorrectly being considered as an alias of DT_RPATH. The
purpose of DT_RUNPATH is to have two different types of rpath: one that
can be overridden by the environment variable LD_LIBRARY_PATH and one that
can't. With the currently implementation, LD_LIBRARY_PATH will always
trump any embedded rpath or runpath tags.
Current path search order by rtld:
==================================
LD_LIBRARY_PATH
DT_RPATH / DT_RUNPATH (always the same)
ldconfig hints file (default: /var/run/ld-elf.so.hints)
/usr/lib
New path search order by rtld:
==============================
DT_RPATH of the calling object if no DT_RUNPATH
DT_RPATH of the main binary if no DT_RUNPATH and binary isn't calling obj
LD_LIBRARY_PATH
DT_RUNPATH
ldconfig hints file
/usr/lib
The new path search matches how the linux runtime loader works. The other
major added feature is support for linker flag "-z nodefaultlib". When
this flag is passed to the linker, rtld will skip all references to the
standard library search path ("/usr/lib" in this case but it could handle
more color delimited paths) except in DT_RPATH and DT_RUNPATH.
New path search order by rtld with -z nodefaultlib flag set:
============================================================
DT_RPATH of the calling object if no DT_RUNPATH
DT_RPATH of the main binary if no DT_RUNPATH and binary isn't calling obj
LD_LIBRARY_PATH
DT_RUNPATH
ldconfig hints file (skips all references to /usr/lib)
FreeBSD notes:
- we fixed some bugs which were submitted to DragonFly and merged there
as commit 1ff8a2bd3eb6e5587174c6a983303ea3a79e0002;
- we added LD_LIBRARY_PATH_RPATH environment variable to switch to
the previous behaviour of considering DT_RPATH a synonym for DT_RUNPATH;
- the FreeBSD default search path is /lib:/usr/lib and not /usr/lib.
Reviewed by: kan
MFC after: 1 month
MFC note: flip the ld_library_path_rpath default value for stable/9
relocations are performed before the object's initializer is called.
When dlopen()ing an object, relocate the whole DAG rooted in the
object instead of only relocating the object itself and list of newly
loaded dependencies.
Reversed sequence currently can occur if the same object is a
dependency for both filtee and filter, since filtees are loaded
typically during the relocation processing, when some filter
dependencies might be already loaded but not relocated yet.
Reported and tested by: swills
Reviewed by: kan
MFC after: 1 week