Commit Graph

37 Commits

Author SHA1 Message Date
Eric van Gyzen
ddab052725 Disable SSE in libthr
Clang emits SSE instructions on amd64 in the common path of
pthread_mutex_unlock.  If the thread does not otherwise use SSE,
this usage incurs a context-switch of the FPU/SSE state, which
reduces the performance of multiple real-world applications by a
non-trivial amount (3-5% in one application).

Instead of this change, I experimented with eagerly switching the
FPU state at context-switch time.  This did not help.  Most of the
cost seems to be in the read/write of memory--as kib@ stated--and
not in the #NM handling.  I tested on machines with and without
XSAVEOPT.

One counter-argument to this change is that most applications already
use SIMD, and the number of applications and amount of SIMD usage
are only increasing.  This is absolutely true.  I agree that--in
general and in principle--this change is in the wrong direction.
However, there are applications that do not use enough SSE to offset
the extra context-switch cost.  SSE does not provide a clear benefit
in the current libthr code with the current compiler, but it does
provide a clear loss in some cases.  Therefore, disabling SSE in
libthr is a non-loss for most, and a gain for some.

I refrained from disabling SSE in libc--as was suggested--because
I can't make the above argument for libc.  It provides a wide variety
of code; each case should be analyzed separately.

https://lists.freebsd.org/pipermail/freebsd-current/2015-March/055193.html

Suggestions from:	dim, jmg, rpaulo
Approved by:	kib (mentor)
MFC after:	2 weeks
Sponsored by:	Dell Inc.
2015-08-05 12:53:55 +00:00
Andrew Turner
20fe2c9465 Merge all the copies of _tcb_ctor and _tcb_dtor.
The amd64, i386, and sparc64 versions were identical, with the one
difference where the former two used inline asm instead of _tcb_get. I
have compared the function before and after replacing the asm with _tcb_get
and found the object files to be identical.

The arm, mips, and powerpc versions were almost identical. The only
difference was the powerpc version used an alignment of 1 where arm and
mips used 16. As this is an increase in alignment is will be safe.

Along with this arm, mips, and powerpc all passed, when initial was true,
the value returned from _tcb_get as the first argument to
_rtld_allocate_tls. This would then return this pointer back to the caller.
We can remove these extra calls by checking if initial is set and setting
the thread control block directly. As this is what the sparc64 code does
we can use it directly.

As after these observations all the architectures can now have identical
code we can merge them into a common file.

Differential Revision:	https://reviews.freebsd.org/D1556
Reviewed by:	kib
Sponsored by:	The FreeBSD Foundation
2015-01-21 16:41:05 +00:00
Dimitry Andric
b34d83a709 The TCB_GET32() and TCB_GET64() macros in the i386 and amd64-specific
versions of pthread_md.h have a special case of dereferencing a null
pointer.  Clang warns about this with:

In file included from lib/libthr/arch/i386/i386/pthread_md.c:36:
lib/libthr/arch/i386/include/pthread_md.h:96:10: error: indirection of non-volatile null pointer will be deleted, not trap [-Werror,-Wnull-dereference]
        return (TCB_GET32(tcb_self));
                ^~~~~~~~~~~~~~~~~~~
lib/libthr/arch/i386/include/pthread_md.h:73:13: note: expanded from:
            : "m" (*(u_int *)(__tcb_offset(name))));            \
                   ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
lib/libthr/arch/i386/include/pthread_md.h:96:10: note: consider using __builtin_trap() or qualifying pointer with 'volatile'

Since this indirection is done relative to the fs or gs segment, to
retrieve thread-specific data, it is an exception to the rule.

Therefore, add a volatile qualifier to tell the compiler we really want
to dereference a zero address.

MFC after:	1 week
2011-12-15 19:42:25 +00:00
Konstantin Belousov
6c69d05232 Add section .note.GNU-stack for assembly files used by 386 and amd64. 2011-01-07 16:09:33 +00:00
Warner Losh
10f3ae5899 Merge from tbemd, with a small amount of rework:
For all libthr contexts, use ${MACHINE_CPUARCH}
for all libc contexts, use ${MACHINE_ARCH} if it exists, otherwise use
${MACHINE_CPUARCH}
Move some common code up a layer (the .PATH statement was the same in
all the arch submakefiles).

# Hope she hasn't busted powerpc64 with this...
2010-09-13 01:43:10 +00:00
David Xu
d6e0eb0a48 Replace function _umtx_op with _umtx_op_err, the later function directly
returns errno, because errno can be mucked by user's signal handler and
most of pthread api heavily depends on errno to be correct, this change
should improve stability of the thread library.
2008-04-02 07:41:25 +00:00
David E. O'Brien
a2c4cd4549 style.Makefile(5) 2008-02-13 05:25:43 +00:00
David Xu
d99f6dac14 - Remove variable _thr_scope_system, all threads are system scope.
- Rename _thr_smp_cpus to boolean variable _thr_is_smp.
- Define CPU_SPINWAIT macro for each arch, only X86 supports it.
2006-12-15 11:52:01 +00:00
David Xu
b7e118b1e0 Remove declaration of _thr_initial from MD header file, it is no longer
needed.
2006-04-04 03:35:26 +00:00
David Xu
7bd761788d Simplify _get_curthread() and _tcb_ctor because libc and rtld now
already allocate thread pointer space in tls block for initial thread.
Only i386 and amd64 have been done, others still have to be tested.
2006-04-04 03:26:06 +00:00
David Xu
d827aa478d Remove functions i386_get_gsbase and i386_set_gsbase, they were already
in libc.
2006-01-07 06:01:43 +00:00
David Xu
babdcc8d78 Kill unused variable declaration. 2005-10-29 03:08:43 +00:00
David Xu
920d31ef8d Remove COMPAT_32BIT, it is no longer needed. 2005-04-27 01:29:03 +00:00
David Xu
ff87e1a6ba Remove unused variable. 2005-04-23 03:34:43 +00:00
David Xu
a364e127e3 Now libthr only uses GDT based tls on i386. using LDT can only increase
clock cycles and has 8191 threads limitation.
2005-04-23 03:31:59 +00:00
David Xu
3466f35a77 Add i386_get_gsbase, i386_set_gsbase since old libc doesn't have the
functions, otherwise user ports have to be rebuilt.
2005-04-23 02:14:38 +00:00
Peter Wemm
c050415d18 Adapt the libpthread patch for using i386_set_gsbase() to libthr. 2005-04-14 00:44:07 +00:00
David Xu
a091d823ad Import my recent 1:1 threading working. some features improved includes:
1. fast simple type mutex.
 2. __thread tls works.
 3. asynchronous cancellation works ( using signal ).
 4. thread synchronization is fully based on umtx, mainly, condition
    variable and other synchronization objects were rewritten by using
    umtx directly. those objects can be shared between processes via
    shared memory, it has to change ABI which does not happen yet.
 5. default stack size is increased to 1M on 32 bits platform, 2M for
    64 bits platform.
As the result, some mysql super-smack benchmarks show performance is
improved massivly.

Okayed by: jeff, mtm, rwatson, scottl
2005-04-02 01:20:00 +00:00
Peter Wemm
37771efde6 Fix inverted #ifdef that I added. Who had the pointy hat last?
Submitted by:  kan
2004-12-06 20:41:09 +00:00
Peter Wemm
c5bfff3bab Use the recently exposed fs/gs set functions when compiling libthr to
run as a 32 bit support library for an amd64 kernel.  32 bit consumers of
libthr have zero chance of running on an amd64 kernel since we don't
implement the i386_set_ldt() family of functions.  Note that this commit
doesn't make it actually work, it just removes one more obstacle.
2004-11-06 03:30:53 +00:00
David Xu
9027ac471c Adjust code to support AMD64, on AMD64, thread needs to set fsbase by
itself before it can execute any other code, so new thread should be
created with all signals are masked until after fsbase is set.
2004-08-19 23:49:04 +00:00
Doug Rabson
adbadf41a9 Add TLS support for libthr on i386. 2004-08-15 16:21:30 +00:00
Mark Murray
16fc3635f7 Make NULL a (void*)0 whereever possible, and fix the warnings(-Werror)
that this provokes. "Wherever possible" means "In the kernel OR NOT
C++" (implying C).

There are places where (void *) pointers are not valid, such as for
function pointers, but in the special case of (void *)0, agreement
settles on it being OK.

Most of the fixes were NULL where an integer zero was needed; many
of the fixes were NULL where ascii <nul> ('\0') was needed, and a
few were just "other".

Tested on: i386 sparc64
2004-03-05 08:10:19 +00:00
Mike Makonnen
4e3c587d23 Bump up the maximum number concurrent threads on x86. 2004-02-01 15:33:01 +00:00
Mike Makonnen
a4be5b10a2 Use dynamic instead of static LDT allocation.
Approved by: re (scottl)
2003-12-02 16:00:26 +00:00
Mike Makonnen
7e2160688c The move to _retire() a thread in the GC instead of in the thread's
exit function has invalidated the need for _spin[un]lock_pthread().
The _spin[un]lock() functions can now dereference curthread without
the danger that the ldtentry containing the pointer to the thread
has been cleared out from under them.
2003-06-29 00:12:40 +00:00
David E. O'Brien
a23e5f4d43 .S comments must be C comments, not ASM ones. 2003-06-02 02:32:56 +00:00
Mike Makonnen
0e335eaeb5 Missing unlock.
Approved by:	re/jhb
2003-05-29 20:49:17 +00:00
Mike Makonnen
12c407a424 Return gracefully, rather than aborting, when the maximum concurrent
threads per process has been reached. Return EAGAIN, as per spec.

Approved by:	re/blanket libthr
2003-05-25 22:40:57 +00:00
Mike Makonnen
59a47b31d0 Add two functions: _spinlock_pthread() and _spinunlock_pthread()
that take the address of a struct pthread as their first argument.
_spin[un]lock() just become wrappers arround these two functions.
These new functions are for use in situations where curthread can't be
used. One example is _thread_retire(), where we invalidate the array index
curthread uses to get its pointer..

Approved by:	re/blanket libthr
2003-05-23 23:39:31 +00:00
Mike Makonnen
7d9d7ca2ed Make WARNS2 clean. The fixes mostly included:
o removed unused variables
	o explicit inclusion of header files
	o prototypes for externally defined functions

Approved by:    re/blanket libthr
2003-05-23 09:48:20 +00:00
Mike Makonnen
509d72c4b9 o Make the defenition of _set_curthread() match its declaration
in thr_private.h

o Lock down the ldt_entries array and ldt_free, which points to
  the next free slot. As noted in the comments, it's necessary
  to special case the initial_thread because %gs is not setup
  for it yet. This is ok because that early in the program there
  won't be any reentrancy issues anyways.

Approved by:	re/blanket libthr
2003-05-21 08:21:24 +00:00
Mike Makonnen
a260623c5f Fix a null dereference leading to a core dump when
the number of threads exceeds the number of open slots
in ldt_entries[].

Approved by:	markm (mentor)(implicit)
Reviewed by:	jeff
2003-05-06 02:33:49 +00:00
Jake Burkholder
55ad402a8f - Pass a ucontext_t to _set_curthread. If non-NULL the new thread is set
as curthread in the new context, so that it will be set automatically when
  the thread is switched to.  This fixes a race where we'd run for a little
  while with curthread unset in _thread_start.

Reviewed by:	jeff
2003-04-03 03:34:50 +00:00
Jeff Roberson
996a395d37 - Don't overrun the ldt buffer.
Submitted by:	gordan@freebsd.org
2003-04-02 22:53:52 +00:00
Jeff Roberson
7a57e9abdd - Adjust the makefiles so we have a per architecture makefile. 2003-04-01 07:07:38 +00:00
Jeff Roberson
bb535300dd - Add libthr but don't hook it up to the regular build yet. This is an
adaptation of libc_r for the thr system call interface.  This is beta
   quality code.
2003-04-01 03:46:29 +00:00