Compiling with clang gives a loss-of-alignment error due the cast to
uint8_t *. Since the TLS is always tcb aligned and TP_OFFSET is defined
as sizeof(struct tcb) we can guarantee there is no misalignment. Silence
the error by moving the offset into the inline assembly.
Reviewed by: br
MFC after: 1 week
Differential Revision: https://reviews.freebsd.org/D21926
As per r177853, we need to avoid using errno inside user mutex code, since
signal handlers can interfere with it and mess up libthr internal state.
So, implement _umtx_op_err() instead, which makes a raw syscall and
returns the error value directly instead of using errno.
Approved by: jhibbits (mentor)
Differential Revision: https://reviews.freebsd.org/D20946
If dso uses initial exec TLS mode, rtld tries to allocate TLS in
static space. If there is no space left, the dlopen(3) fails. If space
if allocated, initial content from PT_TLS segment is distributed to
all threads' pcbs, which was missed and caused un-initialized TLS
segment for such dso after dlopen(3).
The mode is auto-detected either due to the relocation used, or if the
DF_STATIC_TLS dynamic flag is set. In the later case, the TLS segment
is tried to allocate earlier, which increases chance of the dlopen(3)
to succeed. LLD was recently fixed to properly emit the flag, ld.bdf
did it always.
Initial test by: dumbbell
Tested by: emaste (amd64), ian (arm)
Tested by: Gerald Aryeetey <aryeeteygerald_rogers.com> (arm64)
Sponsored by: The FreeBSD Foundation
MFC after: 2 weeks
Differential revision: https://reviews.freebsd.org/D19072
Mainly focus on files that use BSD 2-Clause license, however the tool I
was using mis-identified many licenses so this was mostly a manual - error
prone - task.
The Software Package Data Exchange (SPDX) group provides a specification
to make it easier for automated tools to detect and summarize well known
opensource licenses. We are gradually adopting the specification, noting
that the tags are considered only advisory and do not, in any way,
superceed or replace the license texts.
Later gcc and clang have deprecated =v (which maps to a specific temp
register) and instead we should just use =r to have the assembler
(hopefully!) save/restore things appropriately after choosing
a register.
Tested:
* AR9344 SoC, with userreg support
* AR9331 SoC, with no userreg support
Sponsored by: Sponsored by: DARPA, AFRL (MIPS TLS user register work)
This work, originally from Stacey Son, uses the MIPS UserReg for
reading the TLS data, and will fall back to the normal syscall path
when it isn't supported.
This code dynamically patches cpu_switch() to bypass the UserReg
instruction so to avoid generating a machine exception.
Thanks to sson for the original work, and to Dan Nelson for
bringing it to date and testing it on MIPS32 with me.
Tested:
* mips64 (sson)
* mips74k (dnelson_1901@yahoo.com) - AR9344 SoC, UserReg support
* mips24k (adrian) - AR9331 SoC, no UserReg support
Obtained from: sson, dnelson_1901@yahoo.com
RISC-V is a new ISA designed to support computer research and education, and
is now become a standard open architecture for industry implementations.
This is a minimal set of changes required to run 'make kernel-toolchain'
using external (GNU) toolchain.
The FreeBSD/RISC-V project home: https://wiki.freebsd.org/riscv.
Reviewed by: andrew, bdrewery, emaste, imp
Sponsored by: DARPA, AFRL
Sponsored by: HEIF5
Differential Revision: https://reviews.freebsd.org/D4445
Clang emits SSE instructions on amd64 in the common path of
pthread_mutex_unlock. If the thread does not otherwise use SSE,
this usage incurs a context-switch of the FPU/SSE state, which
reduces the performance of multiple real-world applications by a
non-trivial amount (3-5% in one application).
Instead of this change, I experimented with eagerly switching the
FPU state at context-switch time. This did not help. Most of the
cost seems to be in the read/write of memory--as kib@ stated--and
not in the #NM handling. I tested on machines with and without
XSAVEOPT.
One counter-argument to this change is that most applications already
use SIMD, and the number of applications and amount of SIMD usage
are only increasing. This is absolutely true. I agree that--in
general and in principle--this change is in the wrong direction.
However, there are applications that do not use enough SSE to offset
the extra context-switch cost. SSE does not provide a clear benefit
in the current libthr code with the current compiler, but it does
provide a clear loss in some cases. Therefore, disabling SSE in
libthr is a non-loss for most, and a gain for some.
I refrained from disabling SSE in libc--as was suggested--because
I can't make the above argument for libc. It provides a wide variety
of code; each case should be analyzed separately.
https://lists.freebsd.org/pipermail/freebsd-current/2015-March/055193.html
Suggestions from: dim, jmg, rpaulo
Approved by: kib (mentor)
MFC after: 2 weeks
Sponsored by: Dell Inc.
only adds support for kernel-toolchain, however it is expected further
changes to add kernel and userland support will be committed as they are
reviewed.
As our copy of binutils is too old the devel/aarch64-binutils port needs
to be installed to pull in a linker.
To build either TARGET needs to be set to arm64, or TARGET_ARCH set to
aarch64. The latter is set so uname -p will return aarch64 as existing
third party software expects this.
Differential Revision: https://reviews.freebsd.org/D2005
Relnotes: Yes
Sponsored by: The FreeBSD Foundation
The amd64, i386, and sparc64 versions were identical, with the one
difference where the former two used inline asm instead of _tcb_get. I
have compared the function before and after replacing the asm with _tcb_get
and found the object files to be identical.
The arm, mips, and powerpc versions were almost identical. The only
difference was the powerpc version used an alignment of 1 where arm and
mips used 16. As this is an increase in alignment is will be safe.
Along with this arm, mips, and powerpc all passed, when initial was true,
the value returned from _tcb_get as the first argument to
_rtld_allocate_tls. This would then return this pointer back to the caller.
We can remove these extra calls by checking if initial is set and setting
the thread control block directly. As this is what the sparc64 code does
we can use it directly.
As after these observations all the architectures can now have identical
code we can merge them into a common file.
Differential Revision: https://reviews.freebsd.org/D1556
Reviewed by: kib
Sponsored by: The FreeBSD Foundation
This includes:
o All directories named *ia64*
o All files named *ia64*
o All ia64-specific code guarded by __ia64__
o All ia64-specific makefile logic
o Mention of ia64 in comments and documentation
This excludes:
o Everything under contrib/
o Everything under crypto/
o sys/xen/interface
o sys/sys/elf_common.h
Discussed at: BSDcan
versions of pthread_md.h have a special case of dereferencing a null
pointer. Clang warns about this with:
In file included from lib/libthr/arch/i386/i386/pthread_md.c:36:
lib/libthr/arch/i386/include/pthread_md.h:96:10: error: indirection of non-volatile null pointer will be deleted, not trap [-Werror,-Wnull-dereference]
return (TCB_GET32(tcb_self));
^~~~~~~~~~~~~~~~~~~
lib/libthr/arch/i386/include/pthread_md.h:73:13: note: expanded from:
: "m" (*(u_int *)(__tcb_offset(name)))); \
^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
lib/libthr/arch/i386/include/pthread_md.h:96:10: note: consider using __builtin_trap() or qualifying pointer with 'volatile'
Since this indirection is done relative to the fs or gs segment, to
retrieve thread-specific data, it is an exception to the rule.
Therefore, add a volatile qualifier to tell the compiler we really want
to dereference a zero address.
MFC after: 1 week
o Set TP using inline assembly to avoid dead code elimination.
o Eliminate _tcb.
Merge from r161840:
Stylize: avoid using a global register variable.
Merge from r157461:
Simplify _get_curthread() and _tcb_ctor because libc and rtld now
already allocate thread pointer space in tls block for initial thread.
Merge from r177853:
Replace function _umtx_op with _umtx_op_err, the later function directly
returns errno, because errno can be mucked by user's signal handler and
most of pthread api heavily depends on errno to be correct, this change
should improve stability of the thread library.
MFC after: 1 week
For all libthr contexts, use ${MACHINE_CPUARCH}
for all libc contexts, use ${MACHINE_ARCH} if it exists, otherwise use
${MACHINE_CPUARCH}
Move some common code up a layer (the .PATH statement was the same in
all the arch submakefiles).
# Hope she hasn't busted powerpc64 with this...
returns errno, because errno can be mucked by user's signal handler and
most of pthread api heavily depends on errno to be correct, this change
should improve stability of the thread library.
o The TLS pointer (r2) points 0x7000 after the *end* of the TCB.
o _rtld_allocate_tls() gets a pointer to the current TCB, not the
current TLS pointer.
o _rtld_free_tls() gets the size of the TCB structure.