297 Commits

Author SHA1 Message Date
Konstantin Belousov
06d8a116bd libc: add _get_tp() private function
which returns pointer to tcb

Sponsored by:	The FreeBSD Foundation
MFC after:	1 week
Differential revision:	https://reviews.freebsd.org/D29623
2021-04-09 23:46:24 +03:00
Konstantin Belousov
4c2e9c35fb libc/<arch>/sys/cerror.S: fix typo
Sponsored by:	The FreeBSD Foundation
MFC after:	3 days
2021-04-04 01:00:57 +03:00
Ed Maste
ef36db58da remove obsolete i386 MD memchr implementation
bde reports (in a reply to r351700 commit mail):
    This uses scasb, which was last optimal on the 8086, or perhaps the
    original i386.  On freefall, it is several times slower than the
    naive translation of the naive C code.

Reported by:	bde
Reviewed by:	kib, markj
Sponsored by:	The FreeBSD Foundation
Differential Revision:	https://reviews.freebsd.org/D21785
2019-09-25 16:49:22 +00:00
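For context on ef36db58da, the "naive C code" that bde compares against scasb can be pictured as a plain byte-by-byte loop. The sketch below is only an illustration under that assumption: the name naive_memchr is hypothetical and this is not the implementation FreeBSD ships.

```c
#include <assert.h>
#include <stddef.h>

/*
 * Hypothetical illustration of a naive memchr: scan n bytes of s for the
 * byte value c and return a pointer to the first match, or NULL.
 */
void *naive_memchr(const void *s, int c, size_t n)
{
    const unsigned char *p = s;
    unsigned char target = (unsigned char)c;

    assert(s != NULL || n == 0);
    for (; n != 0; n--, p++) {
        if (*p == target)
            return (void *)p;
    }
    return NULL;
}
```

A compiler is free to turn this loop into whatever instruction sequence suits the target CPU, which is the point of dropping the hand-written scasb version.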
Konstantin Belousov
5d00c5a657 Fix initial exec TLS mode for dynamically loaded shared objects.
If dso uses initial exec TLS mode, rtld tries to allocate TLS in
static space. If there is no space left, the dlopen(3) fails. If space
is allocated, initial content from PT_TLS segment is distributed to
all threads' pcbs, which was missed and caused un-initialized TLS
segment for such dso after dlopen(3).

The mode is auto-detected either due to the relocation used, or if the
DF_STATIC_TLS dynamic flag is set.  In the latter case, allocation of the
TLS segment is attempted earlier, which increases the chance that dlopen(3)
succeeds.  LLD was recently fixed to properly emit the flag; ld.bfd
always did.

Initial test by:	dumbbell
Tested by:	emaste (amd64), ian (arm)
Tested by:	Gerald Aryeetey <aryeeteygerald_rogers.com> (arm64)
Sponsored by:	The FreeBSD Foundation
MFC after:	2 weeks
Differential revision:	https://reviews.freebsd.org/D19072
2019-03-29 17:52:57 +00:00
Konstantin Belousov
a2d95495ee Add usermode helpers for Intel userspace protection keys feature.
Reviewed by:	markj
Tested by:	pho
Sponsored by:	The FreeBSD Foundation
MFC after:	2 weeks
Differential revision:	https://reviews.freebsd.org/D18893
2019-02-20 09:56:23 +00:00
Konstantin Belousov
071bca67ee Unify i386 and amd64 getcontextx.c, and use ifuncs while there.
In particular, use ifuncs for __getcontextx_size(), also calculate the
size of the extended save area in resolver.  Same for __fillcontextx2().

Sponsored by:	The FreeBSD Foundation
MFC after:	1 week
2019-02-14 14:02:33 +00:00
Brooks Davis
db19a093bb Remove MD __sys_* private symbols.
No references to any of these exist in the tree. The list was also
erratic with different architectures exporting different things
(arm64 and riscv exported none).

Reviewed by:	kib
Sponsored by:	DARPA, AFRL
Differential Revision:	https://reviews.freebsd.org/D18425
2018-12-05 00:46:09 +00:00
Mateusz Guzik
23ec0d58bf amd64: depessimize userspace memcpy/memmove/bcopy
The change resembles what was done in r334537 for kernel routines.
While here take care of i386 variants. Note that primitives remain
suboptimal.

Reviewed by:	kib (previous version)
Approved by:	re (gjb)
Differential Revision:	https://reviews.freebsd.org/D17167
2018-09-17 15:49:35 +00:00
Mark Johnston
9f9c9b22ec Reimplement brk() and sbrk() to avoid the use of _end.
Previously, libc.so would initialize its notion of the break address
using _end, a special symbol emitted by the static linker following
the bss section.  Compatibility issues between lld and ld.bfd could
cause the wrong definition of _end (libc.so's definition rather than
that of the executable) to be used, breaking the brk()/sbrk()
interface.

Avoid this problem and future interoperability issues by simply not
relying on _end.  Instead, modify the break() system call to return
the kernel's view of the current break address, and have libc
initialize its state using an extra syscall upon the first use of the
interface.  As a side effect, this appears to fix brk()/sbrk() usage
in executables run with rtld direct exec, since the kernel and libc.so
no longer maintain separate views of the process' break address.

PR:		228574
Reviewed by:	kib (previous version)
MFC after:	2 months
Differential Revision:	https://reviews.freebsd.org/D15663
2018-06-04 19:35:15 +00:00
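The lazy initialization described in 9f9c9b22ec can be sketched as follows. Everything here is an assumption made for illustration: the kernel is replaced by a mock variable, mock_break_syscall() is a hypothetical stand-in (treating a zero argument as "just report the current break" is a convention of this sketch, not a statement about the real break() system call), and sketch_sbrk() is not the libc code.

```c
#include <assert.h>
#include <stdint.h>

/* Mock kernel state standing in for the process break address. */
static uintptr_t mock_kernel_break = 0x10000;

/*
 * Hypothetical stand-in for the system call: optionally moves the break
 * and always returns the kernel's current view of it.
 */
static uintptr_t mock_break_syscall(uintptr_t addr)
{
    if (addr != 0)
        mock_kernel_break = addr;
    return mock_kernel_break;
}

static uintptr_t cached_break;   /* library-side view of the break */
static int break_initialized;    /* set on first use of the interface */

/* Returns the previous break, in the style of sbrk(). */
uintptr_t sketch_sbrk(intptr_t incr)
{
    uintptr_t old;

    if (!break_initialized) {
        /* One extra call on first use instead of relying on _end. */
        cached_break = mock_break_syscall(0);
        break_initialized = 1;
    }
    assert(cached_break != 0);
    old = cached_break;
    if (incr != 0)
        cached_break = mock_break_syscall(old + (uintptr_t)incr);
    return old;
}
```

Because the initial value comes from the (mock) kernel rather than from a linker-provided symbol, the library and the kernel cannot start out with different ideas of where the break is.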
Brooks Davis
0141ef6c07 Remove support for SYS_sys_exit in favor of SYS_exit.
SYS_exit has been defined in the repo since 1994 except for a brief
window when SYS_sys_exit was defined in 2000.
2018-06-01 22:09:27 +00:00
Brooks Davis
87385baff6 Replace MD assembly exect() with a portable version.
Originally, on the VAX exect() enabled tracing once the new executable
image was loaded.  This was possible because tracing was controllable
through user space code by setting the PSL_T flag.  The following
instruction is a system call that activated tracing (as all
instructions do) by copying PSL_T to PSL_TP (trace pending).  The
first instruction of the new executable image would trigger a trace
fault.

This is not portable to all platforms and the behavior was replaced with
ptrace(PT_TRACE_ME, ...) since FreeBSD forked off of the CSRG repository.
Platforms either incorrectly call execve(), trigger trace faults inside
the original executable, or do contain an implementation of this
function.

The exect() interface is deprecated or removed on NetBSD and OpenBSD.

Submitted by:	Ali Mashtizadeh <ali@mashtizadeh.com>
Reviewed by:	kib
Differential Revision:	https://reviews.freebsd.org/D14989
2018-04-12 18:23:14 +00:00
Brooks Davis
047a2ef697 Remove caching from getlogin(2).
This caching has existed since the CSRG import, but serves no obvious
purpose. Sure, setlogin() is called rarely, but calls to getlogin()
should also be infrequent. The required invalidation was not
implemented on aarch64, arm, mips, and riscv so updates would never
occur if getlogin() was called before setlogin().

Reported by:	Ali Mashtizadeh <ali@mashtizadeh.com>
Reviewed by:	kib
Sponsored by:	DARPA, AFRL
Differential Revision:	https://reviews.freebsd.org/D14965
2018-04-06 17:17:34 +00:00
Brooks Davis
7dd87e9a82 Remove architecture specific sigreturn.S files.
All of these files are identical (modulo license blocks and VCS IDs) to
the files generated by lib/libc/sys/Makefile.inc and serve no purpose.

Reported by:	Ali Mashtizadeh <ali@mashtizadeh.com>
Reviewed by:	kib
Sponsored by:	DARPA, AFRL
Differential Revision:	https://reviews.freebsd.org/D14953
2018-04-04 22:45:08 +00:00
John Baldwin
80996ef878 Remove bogus checks against NCARGS.
NCARGS isn't a limit on the number of arguments to pass to a function,
but the number of bytes that can be consumed by arguments to exec.  As
such, it is not suitable for a limit on the count of arguments passed
to makecontext().

Sponsored by:	DARPA / AFRL
2018-01-31 17:57:59 +00:00
Ed Maste
0d18946c9a revert r322589: force use of ld.bfd for linking i386 libc
As of r326897 ld.lld can link a working i386 libc.so, so we no longer
need to force use of ld.bfd.

Sponsored by:	The FreeBSD Foundation
2017-12-16 15:17:54 +00:00
Pedro F. Giffuni
d915a14ef0 libc: further adoption of SPDX licensing ID tags.
Mainly focus on files that use BSD 2-Clause license, however the tool I
was using mis-identified many licenses so this was mostly a manual - error
prone - task.

The Software Package Data Exchange (SPDX) group provides a specification
to make it easier for automated tools to detect and summarize well known
open source licenses. We are gradually adopting the specification, noting
that the tags are considered only advisory and do not, in any way,
supersede or replace the license texts.
2017-11-25 17:12:48 +00:00
Pedro F. Giffuni
8a16b7a18f General further adoption of SPDX licensing ID tags.
Mainly focus on files that use BSD 3-Clause license.

The Software Package Data Exchange (SPDX) group provides a specification
to make it easier for automated tools to detect and summarize well known
open source licenses. We are gradually adopting the specification, noting
that the tags are considered only advisory and do not, in any way,
supersede or replace the license texts.

Special thanks to Wind River for providing access to "The Duke of
Highlander" tool: an older (2014) run over FreeBSD tree was useful as a
starting point.
2017-11-20 19:49:47 +00:00
Ed Maste
f8de17a11f force use of ld.bfd for linking i386 libc, even when using lld
lld can successfully link most of a working i386 userland and kernel,
but produces a broken libc. For now if we're otherwise using lld, and
ld.bfd is available, explicitly use it for libc.

Sponsored by:	The FreeBSD Foundation
2017-08-16 18:55:39 +00:00
Brooks Davis
13f2393362 Correct a misunderstanding of MDSRCS.
MDSRCS is intended to allow assembly versions of functions with C
implementations listed in MISRCS. The selection of the correct
machdep_ldis?.c for a given architecture does not follow this pattern
and the file should be added to SRCS directly.

Reviewed by:	emaste, imp, jhb
MFC after:	1 week
Sponsored by:	DARPA, AFRL
Differential Revision:	https://reviews.freebsd.org/D9841
2017-03-02 17:07:28 +00:00
Brooks Davis
9fe44df287 Correct MDSRCS use in <arch>/string/Makefile.inc.
- Remove .c files which duplicate entries in MISRCS.
- Use the same, less merge conflict prone style in all cases.
- Use MDSRCS for mips (.c and .S files both ended up in SRCS).
- Remove pointless sparc64 Makefile.inc.
- Remove uninformative foreign VCS ID entries.

Reviewed by:	emaste, imp, jhb
MFC after:	1 week
Sponsored by:	DARPA, AFRL
Differential Revision:	https://reviews.freebsd.org/D9841
2017-03-02 17:05:52 +00:00
Warner Losh
fbbd9655e5 Renumber copyright clause 4
Renumber clause 4 to 3, per what everybody else did when BSD granted
them permission to remove clause 3. My insistence on keeping the same
numbering for legal reasons is too pedantic, so give up on that point.

Submitted by:	Jan Schaumann <jschauma@stevens.edu>
Pull Request:	https://github.com/freebsd/freebsd/pull/96
2017-02-28 23:42:47 +00:00
Brooks Davis
aec2fba60f Reduce duplicate NOASM and PSEUDO definitions
The initial value of NOASM is nearly the same in all cases and the
initial value of PSEUDO is the same in all cases so reduce duplication
(and hopefully, future merge conflicts) by machine independent defaults.

Also document the PSEUDO variable.

Reviewed by:	jhb, kib
Obtained from:	CheriBSD
Sponsored by:	DARPA, AFRL
Differential Revision:	https://reviews.freebsd.org/D7820
2016-09-08 22:38:20 +00:00
Konstantin Belousov
afd3e268d2 Rewrite ptrace(2) wrappers in C.
Besides removing hand-translation to assembler, this also adds missing
wrappers for arm64 and risc-v.

Reviewed by:	emaste, jhb
Sponsored by:	The FreeBSD Foundation
MFC after:	1 week
Differential revision:	https://reviews.freebsd.org/D7694
2016-08-29 18:47:51 +00:00
Konstantin Belousov
da6e468936 Do not obliterate errno value in the main thread during ptrace(2) call on x86.
Since ptrace(2) syscall can return -1 for non-error situations, libc
wrappers set errno to 0 before performing the syscall, as the service
to the caller.  On both i386 and amd64, the errno symbol was directly
referenced, which only works correctly in single-threaded process.

Change assembler wrappers for ptrace(2) to get current thread errno
location by calling __error().  Allow __error interposing, as
currently allowed in cerror().

Sponsored by:	The FreeBSD Foundation
MFC after:	1 week
2016-08-27 23:03:23 +00:00
George V. Neville-Neil
5cba398b0c Remove unused and obsolete openbsd_poll system call. (Phase 1)
Reported by:	brooks
Reviewed by:	brooks,jhb
Differential Revision:	https://reviews.freebsd.org/D7548
2016-08-18 10:50:40 +00:00
Konstantin Belousov
1680854946 Implement userspace gettimeofday(2) with HPET timecounter.
Right now, userspace (fast) gettimeofday(2) on x86 only works for
RDTSC.  For older machines, like Core2, where RDTSC is not C2/C3
invariant, and which fall to HPET hardware, this means that the call
has both the penalty of the syscall and of the uncached hw behind the
QPI or PCIe connection to the south bridge.  Nothing can be done
against the access latency, but the syscall overhead can be removed.
The system already provides mappable /dev/hpetX devices, which give
straight access to the HPET registers page.

Add yet another algorithm to the x86 'vdso' timehands. Libc is updated
to handle both RDTSC and HPET.  For HPET, the index of the hpet device
to mmap is passed from kernel to userspace; the index might change, and
libc invalidates its mapping as needed.

Remove cpu_fill_vdso_timehands() KPI; instead, require timecounters
that can be used from userspace to provide
tc_fill_vdso_timehands{,32}() methods.  Merge i386 and amd64
libc/<arch>/sys/__vdso_gettc.c into one source file in the new
libc/x86/sys location.  __vdso_gettc() internal interface is changed
to move timecounter algorithm detection into the MD code.

Measurements show that RDTSC even with the syscall overhead is faster
than userspace HPET access.  But still, userspace HPET is three to four
times faster than syscall HPET on several Core2 and SandyBridge
machines.

Tested by:	Howard Su <howard0su@gmail.com>
Sponsored by:	The FreeBSD Foundation
MFC after:	1 month
Differential revision:	https://reviews.freebsd.org/D7473
2016-08-17 09:52:09 +00:00
Brooks Davis
b60998c633 Replace use of the pipe(2) system call with pipe2(2) with a zero flags
value.

This eliminates the need for machine-dependent assembly wrappers for
pipe(2).

It also makes passing an invalid address to pipe(2) return EFAULT rather
than triggering a segfault.  Document this behavior (which was already
true for pipe2(2), but undocumented).

Reviewed by:	andrew
Approved by:	re (gjb)
Sponsored by:	DARPA, AFRL
Differential Revision:	https://reviews.freebsd.org/D6815
2016-06-22 21:11:27 +00:00
Ed Maste
dae2d550d6 libc: stop exporting curbrk and minbrk in the private namespace
They are not used anywhere else in the base system and are an internal
implementation detail that does not need to be exposed.

Reviewed by:	kib
Sponsored by:	The FreeBSD Foundation
Differential Revision:	https://reviews.freebsd.org/D5728
2016-03-24 18:47:19 +00:00
Ed Maste
142a37f3ad libc/{i386,amd64}: Do not export .cerror when building WITHOUT_SYMVER
Further to r240152 (i386) and r240178 (amd64), hide the .cerror symbol
so that it is not exported if symbol versioning is not in use.  Without
this change WITHOUT_SYMVER libc contains .text relocations for .cerror,
as described in LLVM PR 26813 (http://llvm.org/pr26813).

This is a no-op for the regular build as the symbol version script
already controls .cerror visibility.

PR:		207712
Submitted by:	Rafael Espíndola
Reviewed by:	jilles, kib
Differential Revision:	https://reviews.freebsd.org/D5571
2016-03-08 00:09:34 +00:00
Konstantin Belousov
bd6060a1c6 Switch libc from using _sig{procmask,action,suspend} symbols, which
are aliases for the syscall stubs and are plt-interposed, to the
libc-private aliases of internally interposed sigprocmask() etc.

Since e.g. _sigaction is not interposed by libthr, calling signal()
removes thr_sighandler() from the handler slot etc.  The result was
broken signal semantics and rtld locking.

The added __libc_sigprocmask and other symbols are hidden, they are
not exported and cannot be called through PLT.  The setjmp/longjmp
functions for x86 were changed to use direct calls, and since
PIC_PROLOGUE is only needed for functional PLT indirection on i386, it is
removed as well.

The PowerPC bug of calling the syscall directly in the setjmp/longjmp
implementation is kept as is.

Reported by:	Pete French <petefrench@ingresso.co.uk>
Tested by:	Michiel Boland <boland37@xs4all.nl>
Reviewed by:	jilles (previous version)
Sponsored by:	The FreeBSD Foundation
MFC after:	1 week
2015-08-29 14:25:01 +00:00
Konstantin Belousov
35dfc644f5 Copy the fencing of the algorithm to do lock-less update and reading
of the timehands, from the kern_tc.c implementation to vdso.  Add
comments giving hints where to look for the algorithm explanation.

To compensate for the removal of rmb() in userspace binuptime(), add
explicit lfence instruction before rdtsc.  On i386, add usual
complications to detect SSE2 presence; assume that old CPUs which do
not implement SSE2 also execute rdtsc almost in order.

Reviewed by:	alc, bde (previous version)
Sponsored by:	The FreeBSD Foundation
MFC after:	2 weeks
2015-08-04 12:33:51 +00:00
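The lock-less update and read scheme that 35dfc644f5 copies from kern_tc.c follows the general generation-counter pattern, which can be sketched with C11 atomics. This is a simplified illustration, not the kernel or vdso code: struct sketch_timehands, its fields, and both functions are hypothetical, a single writer is assumed, and a generation of 0 is taken to mean "update in progress".

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical, reduced stand-in for a timehands structure. */
struct sketch_timehands {
    atomic_uint gen;            /* 0 while an update is in progress */
    _Atomic uint64_t scale;
    _Atomic uint64_t offset;
};

/* Single writer: invalidate, update the fields, publish a new generation. */
void sketch_publish(struct sketch_timehands *th, uint64_t scale,
    uint64_t offset)
{
    unsigned next;

    assert(th != NULL);
    next = atomic_load_explicit(&th->gen, memory_order_relaxed) + 1;
    if (next == 0)
        next = 1;               /* never publish the reserved value 0 */
    atomic_store_explicit(&th->gen, 0, memory_order_relaxed);
    atomic_thread_fence(memory_order_release);
    atomic_store_explicit(&th->scale, scale, memory_order_relaxed);
    atomic_store_explicit(&th->offset, offset, memory_order_relaxed);
    atomic_store_explicit(&th->gen, next, memory_order_release);
}

/*
 * Reader: retry until a stable, non-zero generation brackets the copy.
 * Note that this spins forever if nothing has ever been published.
 */
void sketch_read(struct sketch_timehands *th, uint64_t *scale,
    uint64_t *offset)
{
    unsigned gen;

    assert(th != NULL && scale != NULL && offset != NULL);
    do {
        gen = atomic_load_explicit(&th->gen, memory_order_acquire);
        *scale = atomic_load_explicit(&th->scale, memory_order_relaxed);
        *offset = atomic_load_explicit(&th->offset, memory_order_relaxed);
        atomic_thread_fence(memory_order_acquire);
    } while (gen == 0 ||
        gen != atomic_load_explicit(&th->gen, memory_order_relaxed));
}
```

The fences play the role the commit message assigns to the fencing in kern_tc.c: they keep the field accesses from being reordered across the generation checks.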
Edward Tomasz Napierala
c886a05c13 Remove reboot.S (part of libc). It's not needed and was actually
broken - returning 0 from reboot(2) resulted in SIGBUS.

MFC after:	1 month
Sponsored by:	The FreeBSD Foundation
2015-07-07 09:25:51 +00:00
Konstantin Belousov
0538aafc41 The lseek(2), mmap(2), truncate(2), ftruncate(2), pread(2), and
pwrite(2) syscalls are wrapped to provide compatibility with pre-7.x
kernels which required padding before the off_t parameter.  The
fcntl(2) contains compatibility code to handle kernels before the
struct flock was changed during the 8.x CURRENT development.  The
shims were reasonable to allow easier revert to the older kernel at
that time.

Now, two or three major releases later, shims do not serve any
purpose.  Such old kernels cannot handle current libc, so revert the
compatibility code.

Make padded syscalls support conditional under the COMPAT6 config
option.  For COMPAT32, the syscalls were under COMPAT6 already.

Remove WITHOUT_SYSCALL_COMPAT build option, whose only purpose was to
(partially) disable the removed shims.

Reviewed by:	jhb, imp (previous versions)
Discussed with:	peter
Sponsored by:	The FreeBSD Foundation
MFC after:	1 week
2015-04-18 21:50:13 +00:00
Jilles Tjoelker
078cb49070 siglongjmp(): Preserve floating point exception flags on i386 and amd64.
Per POSIX, siglongjmp() shall be equivalent to longjmp() except that it must
match sigsetjmp() instead of setjmp() and except for the effect on the
signal mask. Therefore, it should preserve the floating point exception
flags.

This was fixed for longjmp() and _longjmp() in r180080 and r180081 for amd64
and i386 respectively.
2014-06-09 21:35:36 +00:00
Warner Losh
a5fc5b6223 Convert from WITHOUT_SYSCALL_COMPAT to MK_SYSCALL_COMPAT.
2014-04-05 17:54:43 +00:00
Marcel Moolenaar
8876613dc5 Replace use of ${.CURDIR} by ${LIBC_SRCTOP} and define ${LIBC_SRCTOP}
if not already defined. This allows building libc from outside of
lib/libc using a reach-over makefile.

A typical use-case is to build a standard ILP32 version and a COMPAT32
version in a single iteration by building the COMPAT32 version using a
reach-over makefile.

Obtained from:	Juniper Networks, Inc.
2014-03-04 02:19:39 +00:00
Andreas Tobler
646c5b4840 Replace the WEAK_ALIAS() alias with the WEAK_REFERENCE() alias. Use it and
get rid of the __CONCAT and CNAME macros.

Reviewed by:	bde, kib
2013-11-21 22:31:18 +00:00
Jilles Tjoelker
0f3a4d8051 libc: Access _logname_valid more efficiently.
The variable _logname_valid is not exported via the version script;
therefore, change C and i386/amd64 assembler code to remove indirection
(which allowed interposition). This makes the code slightly smaller and
faster.

Also, remove #define PIC_GOT from i386/amd64 in !PIC mode. Without PIC,
there is no place containing the address of each variable, so there is no
possible definition for PIC_GOT.
2013-08-17 19:24:58 +00:00
Konstantin Belousov
55a1911ef2 The getcontext() from the __fillcontextx() call in the
check_deferred_signal() returns twice, since handle_signal() emulates
the return from the normal signal handler by sigreturn(2)ing the
passed context.  Second return is performed on the destroyed stack
frame, because __fillcontextx() has already returned.  This causes
undefined and bad behaviour, usually the victim thread gets SIGSEGV.

Avoid nested frame and the need to return from it by doing direct call
to getcontext() in the check_deferred_signal() and using a new private
libc helper __fillcontextx2() to complement the context with the
extended CPU state if the deferred signal is still present.

The __fillcontextx() is now unused, but is kept to allow older
libthr.so to be used with the new libc.

Mark __fillcontextx() as returning twice [1].

Reported by:	pgj
Pointy hat to:	kib
Discussed with:	dim
Tested by:	pgj, dim
Suggested by:	jilles [1]
MFC after:	1 week
2013-05-28 04:54:16 +00:00
Gabor Kovesdan
ab3f6b347e - Correct misspellings of the word occurrence
Submitted by:	Christoph Mallon <christoph.mallon@gmx.de> (via private mail)
2013-04-17 11:40:10 +00:00
Konstantin Belousov
150facd256 Rework the __vdso_* symbols attributes to only make the symbols weak,
but use normal references instead of weak.  This makes the statically
linked binaries use fast gettimeofday(2) by forcing the linker to
resolve references and providing the necessary functions.

Reported by:	bde
Tested by:	marius (sparc64)
MFC after:	2 weeks
2013-01-30 12:48:16 +00:00
Jilles Tjoelker
a8599e090f libc/i386: Do not export .cerror.
For some reason, libc exports the symbol .cerror (HIDENAME(cerror)), albeit
in the FBSDprivate_1.0 version. It looks like there is no reason for this
since it is not used from other libraries. Given that it cannot be accessed
from C and its strange calling convention, it is rather unlikely that other
things rely on it. Perhaps it is from a time when symbols could not be
hidden.

Not exporting .cerror causes it to be jumped to directly instead of via the
PLT.

This change also takes advantage of .cerror's new status by not saving and
loading %ebx before jumping to it. (Therefore, .cerror now saves and loads
%ebx itself.) Where there was a conditional jump to a jump to .cerror, the
conditional jump has been changed to jump to .cerror directly (many modern
CPUs don't do static prediction and in any case it is not much of a benefit
anyway).

This change makes libc.so.7 a few kilobytes smaller.

Reviewed by:	kib
2012-09-05 21:41:05 +00:00
David Xu
b409107773 Executing CPUID with EAX set to 1 to actually get feature flags.
PR:	169730
2012-07-10 01:47:11 +00:00
Konstantin Belousov
869fd80fd4 Use struct vdso_timehands data to implement fast gettimeofday(2) and
clock_gettime(2) functions if supported. The speedup seen in
microbenchmarks is in range 4x-7x depending on the hardware.

Only amd64 and i386 architectures are supported. Libc uses rdtsc and
kernel data to calculate current time, if enabled by kernel.

Hopefully, this code is going to migrate into vdso in the future.

Discussed with:	bde
Reviewed by:	jhb
Tested by:	flo
MFC after:	1 month
2012-06-22 07:13:30 +00:00
Joel Dahl
bea977e7f6 mdoc: sort prologue macros.
2012-03-26 19:23:57 +00:00
Konstantin Belousov
754f1c1e63 Make the sys/ucontext.h self-contained by changing the return type
of __getcontextx_size(3) from size_t to int.

PR:	ports/164654
MFC after:	1 month
2012-02-01 13:33:53 +00:00
Konstantin Belousov
2b1de0afd1 Add API for obtaining extended machine context states that cannot
fit into existing mcontext_t.

On i386 and amd64 do return the extended FPU states using
getcontextx(3). For other architectures, getcontextx(3) returns the
same information as getcontext(2).

Tested by:  pho
MFC after:  1 month
2012-01-21 18:00:28 +00:00
Ed Schouten
19c262fe87 Change index() and rindex() to a weak alias.
This allows people to still write statically linked applications that
call strchr() or strrchr() and have a local variable or function called
index.

Discussed with:	bde@
2012-01-05 10:32:53 +00:00
Ed Schouten
46632c18bd Merge index() and strchr() together.
As I looked through the C library, I noticed the FreeBSD MIPS port has a
hand-written version of index(). This is nice, if it weren't for the
fact that most applications call strchr() instead.

Also, on the other architectures index() and strchr() are identical,
meaning we have two identical pieces of code in the C library and
statically linked applications.

Solve this by naming the actual file strchr.[cS] and let it use
__strong_reference()/STRONG_ALIAS() to provide the index() routine. Do
the same for rindex()/strrchr().

This seems to make the C libraries and static binaries slightly smaller,
but this reduction in size seems negligible.
2012-01-03 07:14:01 +00:00
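The approach in 46632c18bd, one implementation exported under two names, can be sketched with the alias attribute that GCC and Clang provide on ELF targets. This is an assumption-laden illustration: sketch_strchr and sketch_index are hypothetical names, and the attribute is used directly here in place of FreeBSD's __strong_reference() macro.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical illustration: the single real implementation. */
char *sketch_strchr(const char *s, int c)
{
    assert(s != NULL);
    for (;; s++) {
        if (*s == (char)c)
            return (char *)s;
        if (*s == '\0')
            return NULL;
    }
}

/*
 * Second name for the same code; no extra function body is emitted.
 * Requires a toolchain and object format that support symbol aliases.
 */
char *sketch_index(const char *s, int c)
    __attribute__((alias("sketch_strchr")));
```

Both names resolve to the same code, so the library carries one copy of the routine instead of two identical ones.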
Konstantin Belousov
1a9879c32a Although the official i386 ABI does not mandate any stack alignment besides
the word alignment, some versions of gcc do require 16-byte alignment.
Make sure the stack is 16-byte aligned before calling a subroutine.

Inspired by:	PR amd64/162214
MFC after:	1 week
2011-11-02 18:08:30 +00:00