Commit Graph

279 Commits

Author SHA1 Message Date
Warner Losh
d0b2dbfa0e Remove $FreeBSD$: one-line sh pattern
Remove /^\s*#[#!]?\s*\$FreeBSD\$.*$\n/
2023-08-16 11:55:03 -06:00
Warner Losh
1d386b48a5 Remove $FreeBSD$: one-line .c pattern
Remove /^[\s*]*__FBSDID\("\$FreeBSD\$"\);?\s*\n/
2023-08-16 11:54:42 -06:00
Warner Losh
2a63c3be15 Remove $FreeBSD$: one-line .c comment pattern
Remove /^/[*/]\s*\$FreeBSD\$.*\n/
2023-08-16 11:54:29 -06:00
Warner Losh
42b388439b Remove $FreeBSD$: one-line .h pattern
Remove /^\s*\*+\s*\$FreeBSD\$.*$\n/
2023-08-16 11:54:23 -06:00
Warner Losh
b3e7694832 Remove $FreeBSD$: two-line .h pattern
Remove /^\s*\*\n \*\s+\$FreeBSD\$$\n/
2023-08-16 11:54:16 -06:00
Robert Clausecker
d7302cabc0 lib/libc/amd64/string/strchrnul.S: fix wrong indentation
Uses spaces instead of tabs for this line by accident.

Reported by:	jrtc27, kib
Approved by:	kib
2023-08-07 14:03:28 +02:00
Robert Clausecker
61f4c4d3dd lib/libc/amd64/string: add strchrnul implementations (scalar, baseline)
A lot better than the generic (pre) implementaion.  We do not beat glibc
for long strings, likely due to glibc switching to AVX once the input is
sufficiently long.  X86-64-v3 and v4 implementations may be added at a
future time.

os: FreeBSD
arch: amd64
cpu: 11th Gen Intel(R) Core(TM) i7-1165G7 @ 2.80GHz
        │ strchrnul_pre.out │         strchrnul_scalar.out         │       strchrnul_baseline.out        │
        │      sec/op       │    sec/op     vs base                │   sec/op     vs base                │
Short          129.68µ ± 3%    59.91µ ± 1%  -53.80% (p=0.000 n=20)   44.37µ ± 1%  -65.79% (p=0.000 n=20)
Mid             21.15µ ± 0%    19.30µ ± 0%   -8.76% (p=0.000 n=20)   12.30µ ± 0%  -41.85% (p=0.000 n=20)
Long           13.772µ ± 0%   11.028µ ± 0%  -19.92% (p=0.000 n=20)   3.285µ ± 0%  -76.15% (p=0.000 n=20)
geomean         33.55µ         23.36µ       -30.37%                  12.15µ       -63.80%

        │ strchrnul_pre.out │          strchrnul_scalar.out          │         strchrnul_baseline.out         │
        │        B/s        │      B/s       vs base                 │      B/s       vs base                 │
Short          919.3Mi ± 3%   1989.7Mi ± 1%  +116.45% (p=0.000 n=20)   2686.8Mi ± 1%  +192.28% (p=0.000 n=20)
Mid            5.505Gi ± 0%    6.033Gi ± 0%    +9.60% (p=0.000 n=20)    9.466Gi ± 0%   +71.97% (p=0.000 n=20)
Long           8.453Gi ± 0%   10.557Gi ± 0%   +24.88% (p=0.000 n=20)   35.441Gi ± 0%  +319.26% (p=0.000 n=20)
geomean        3.470Gi         4.983Gi        +43.62%                   9.584Gi       +176.22%

For comparison, glibc on the same machine:

        │ strchrnul_glibc.out │
        │       sec/op        │
Short             49.73µ ± 0%
Mid               14.60µ ± 0%
Long              1.237µ ± 0%
geomean           9.646µ

        │ strchrnul_glibc.out │
        │         B/s         │
Short            2.341Gi ± 0%
Mid              7.976Gi ± 0%
Long             94.14Gi ± 0%
geomean          12.07Gi

Sponsored by:	The FreeBSD Foundation
Approved by:	mjg
Differential Revision: https://reviews.freebsd.org/D41333
2023-08-06 15:58:27 +02:00
Robert Clausecker
d8385768fb lib/libc/amd64/string/strlen.S: add amd64 baseline kernel
This performs very well.  x86-64-v3 and x86-64-v4 kernels were written,
too, but performed worse than the baseline kernel on short strings.
These may be added at a future point in time if the performance issues
can be fixed.

os: FreeBSD
arch: amd64
cpu: 11th Gen Intel(R) Core(TM) i7-1165G7 @ 2.80GHz
        │ strlen_scalar.out │          strlen_baseline.out          │
        │        B/s        │     B/s       vs base                 │
Short          1.667Gi ± 1%   2.676Gi ± 1%   +60.55% (p=0.000 n=20)
Mid            5.459Gi ± 1%   8.756Gi ± 1%   +60.39% (p=0.000 n=20)
Long           15.34Gi ± 0%   52.27Gi ± 0%  +240.64% (p=0.000 n=20)
geomean        5.188Gi        10.70Gi       +106.24%

Sponsored by:	The FreeBSD Foundation
Approved by:	kib
Reviewed by:	mjg jrtc27
Differential Revision:	https://reviews.freebsd.org/D40693
2023-08-04 01:54:23 +03:00
Robert Clausecker
ad2fac552c lib/libc/amd64: add archlevel-based simd dispatch framework
Add a framework for selecting from one of multiple implementations
of a function based on amd64 architecture level (cf. amd64 SysV
ABI supplement).

Sponsored by:	The FreeBSD Foundation
Approved by:	kib
Reviewed by:	jrtc27
Differential Revision:	https://reviews.freebsd.org/D40693
2023-08-04 01:53:43 +03:00
Dmitry Chagin
acfd261524 libc: Improve setjmp comments
Reviewed by:		imp, kib
Differential Revision:	https://reviews.freebsd.org/D40879
2023-07-06 20:21:54 +03:00
Warner Losh
4d846d260e spdx: The BSD-2-Clause-FreeBSD identifier is obsolete, drop -FreeBSD
The SPDX folks have obsoleted the BSD-2-Clause-FreeBSD identifier. Catch
up to that fact and revert to their recommended match of BSD-2-Clause.

Discussed with:		pfg
MFC After:		3 days
Sponsored by:		Netflix
2023-05-12 10:44:03 -06:00
Konstantin Belousov
5942b4b6fd sys/param.h: Add _WANT_P_OSREL
Use it instead of defining IN_RTLD by base sources that want P_OSREL_
defines in userspace, but are not rtld.
This allows to remove abuse of IN_RTLD from userspace.

Reviewed by:	dchagin, markj, imp
Sponsored by:	The FreeBSD Foundation
MFC after:	1 week
Differential revision:	https://reviews.freebsd.org/D38585
2023-02-15 02:43:18 +02:00
Konstantin Belousov
ae507c25de amd64 libc: add missed GNU-stack annotation to memmove/memcpy
Sponsored by:	The FreeBSD Foundation
MFC after:	1 week
2022-11-18 15:31:38 +02:00
Alexander Motin
f22068d91b amd64: Stop using REP MOVSB for backward memmove()s.
Enhanced REP MOVSB feature of CPUs starting from Ivy Bridge makes
REP MOVSB the fastest way to copy memory in most of cases. However
Intel Optimization Reference Manual says: "setting the DF to force
REP MOVSB to copy bytes from high towards low addresses will expe-
rience significant performance degradation". Measurements on Intel
Cascade Lake and Alder Lake, same as on AMD Zen3 show that it can
drop throughput to as low as 2.5-3.5GB/s, comparing to ~10-30GB/s
of REP MOVSQ or hand-rolled loop, used for non-ERMS CPUs.

This patch keeps ERMS use for forward ordered memory copies, but
removes it for backward overlapped moves where it does not work.

This is just a cosmetic sync with kernel, since libc does not use
ERMS at this time.

Reviewed by:    mjg
MFC after:	2 weeks
2022-06-16 14:51:50 -04:00
Mateusz Guzik
fbc002cb72 amd64: bring back asm bcmp, shared with memcmp
Turns out clang converts "memcmp(foo, bar, len) == 0" and similar to
bcmp calls.

Reviewed by:	emaste (previous version), jhb (previous version)
Differential Revision:	https://reviews.freebsd.org/D34673
2022-03-26 09:10:03 +00:00
Mateusz Guzik
f0f0f2abf3 amd64: remove bcmp.S
Fixes:  5fc3cc2713 ("amd64: make bcmp in libc just call memcmp")
2022-03-25 14:57:51 +00:00
Mateusz Guzik
5fc3cc2713 amd64: make bcmp in libc just call memcmp
Preferably bcmp would just alias memcmp but there is build magic which
makes this problematic.

Reviewed by:	jhb
Differential Revision:		https://reviews.freebsd.org/D28846
2022-03-12 14:59:14 +00:00
John Baldwin
ae67737a4c libc: Remove _get_tp() and _set_tp().
Their uses have been replaced by _tcb_get() and _tcb_set() from
<machine/tls.h>.

Reviewed by:	kib, jrtc27
Sponsored by:	The University of Cambridge, Google Inc.
Differential Revision:	https://reviews.freebsd.org/D33354
2021-12-09 13:23:26 -08:00
Konstantin Belousov
06d8a116bd libc: add _get_tp() private function
which returns pointer to tcb

Sponsored by:	The FreeBSD Foundation
MFC after:	1 week
Differential revision:	https://reviews.freebsd.org/D29623
2021-04-09 23:46:24 +03:00
Konstantin Belousov
d218c6f6af amd64 fabs.S: use '.section .rodata' instead of '.rodata'
Seems to be an issue with older gnu as

Reported by:	rscheff
Sponsored by:	The FreeBSD Foundation
MFC after:	6 days
2021-04-04 22:33:22 +03:00
Konstantin Belousov
6d3f54fd09 amd64 fabs.S: put signbit into rodata instead of text
Noted by:	jrtc27
MFC after:	1 week
Sponsored by:	The FreeBSD Foundation
2021-04-04 04:49:22 +03:00
Konstantin Belousov
4c2e9c35fb libc/<arch>/sys/cerror.S: fix typo
Sponsored by:	The FreeBSD Foundation
MFC after:	3 days
2021-04-04 01:00:57 +03:00
Konstantin Belousov
f548033818 amd64 fabs(3): move signbit to .text
There is no reason for signbit quad to be writeable.

Sponsored by:	The FreeBSD Foundation
MFC after:	1 week
2021-04-04 01:00:57 +03:00
Mateusz Guzik
7f06b217c5 amd64: import asm strlen into libc
Reviewed by:	kib
Differential Revision:	https://reviews.freebsd.org/D28845
2021-02-23 00:09:55 +00:00
Mateusz Guzik
f1be262ec1 amd64: move memcmp checks upfront
This is a tradeoff which saves jumps for smaller sizes while making
the 8-16 range slower (roughly in line with the other cases).

Tested with glibc test suite.

For example size 3 (most common with vfs namecache) (ops/s):
before:	407086026
after:	461391995

The regressed range of 8-16 (with 8 as example):
before:	540850489
after:	461671032
2021-01-31 16:07:20 +00:00
Mateusz Guzik
0db6aef407 amd64: add a note about simd to libc memset, memmove and memcmp 2021-01-31 16:07:19 +00:00
Mateusz Guzik
164c3b8184 amd64: add missing ALIGN_TEXT to loops in memset and memmove 2021-01-30 00:01:44 +00:00
Mateusz Guzik
8291e88748 amd64: sync up libc memcmp with the kernel version (r357309) 2020-01-30 19:57:05 +00:00
Mateusz Guzik
4846152a08 amd64: sync up libc memcmp with the kernel version (r357208) 2020-01-29 01:57:07 +00:00
Konstantin Belousov
7c5a46a1bc Remove resolver_qual from DEFINE_IFUNC/DEFINE_UIFUNC macros.
In all practical situations, the resolver visibility is static.

Requested by:	markj
Sponsored by:	The FreeBSD Foundation
MFC after:	1 week
Approved by:	so (emaste)
Differential revision:	https://reviews.freebsd.org/D20281
2019-05-16 22:20:54 +00:00
Konstantin Belousov
5d00c5a657 Fix initial exec TLS mode for dynamically loaded shared objects.
If dso uses initial exec TLS mode, rtld tries to allocate TLS in
static space. If there is no space left, the dlopen(3) fails. If space
if allocated, initial content from PT_TLS segment is distributed to
all threads' pcbs, which was missed and caused un-initialized TLS
segment for such dso after dlopen(3).

The mode is auto-detected either due to the relocation used, or if the
DF_STATIC_TLS dynamic flag is set.  In the later case, the TLS segment
is tried to allocate earlier, which increases chance of the dlopen(3)
to succeed.  LLD was recently fixed to properly emit the flag, ld.bdf
did it always.

Initial test by:	dumbbell
Tested by:	emaste (amd64), ian (arm)
Tested by:	Gerald Aryeetey <aryeeteygerald_rogers.com> (arm64)
Sponsored by:	The FreeBSD Foundation
MFC after:	2 weeks
Differential revision:	https://reviews.freebsd.org/D19072
2019-03-29 17:52:57 +00:00
Konstantin Belousov
a2d95495ee Add usermode helpers for for Intel userspace protection keys feature.
Reviewed by:	markj
Tested by:	pho
Sponsored by:	The FreeBSD Foundation
MFC after:	2 weeks
Differential revision:	https://reviews.freebsd.org/D18893
2019-02-20 09:56:23 +00:00
Konstantin Belousov
071bca67ee Unify i386 and amd64 getcontextx.c, and use ifuncs while there.
In particular, use ifuncs for __getcontextx_size(), also calculate the
size of the extended save area in resolver.  Same for __fillcontextx2().

Sponsored by:	The FreeBSD Foundation
MFC after:	1 week
2019-02-14 14:02:33 +00:00
Brooks Davis
db19a093bb Remove MD __sys_* private symbols.
No references to any of these exist in the tree. The list was also
erratic with different architectures exporting different things
(arm64 and riscv exported none).

Reviewed by:	kib
Sponsored by:	DARPA, AFRL
Differential Revision:	https://reviews.freebsd.org/D18425
2018-12-05 00:46:09 +00:00
Mateusz Guzik
ddf6571230 amd64: align target memmove buffer to 16 bytes before using rep movs
See the review for sample test results.

Reviewed by:	kib (kernel part)
Sponsored by:	The FreeBSD Foundation
Differential Revision:	https://reviews.freebsd.org/D18401
2018-12-01 14:20:32 +00:00
Mateusz Guzik
94243af2da amd64: handle small memmove buffers with overlapping stores
Handling sizes of > 32 backwards will be updated later.

Reviewed by:	kib (kernel part)
Sponsored by:	The FreeBSD Foundation
Differential Revision:	https://reviews.freebsd.org/D18387
2018-11-30 20:58:08 +00:00
Mateusz Guzik
2847cfce54 amd64: remove stale attribution for memmove work
While the routine started as expanded bcopy, it is now entirely rewritten.

Sponsored by:	The FreeBSD Foundation
2018-11-30 00:47:36 +00:00
Mateusz Guzik
dd219e5ea5 amd64: tidy up copying backwards in memmove
For non-ERMS case the code used handle possible trailing bytes with
movsb first and then followed it up with movsq. This also happened
to alter how calculations were done for other cases.

Handle the tail with regular movs, just like when copying forward.
Use leaq to calculate the right offset from the get go, instead of
doing separate add and sub.

This adjusts the offset for non-rep cases so that they can be used
to handle the tail.

The routine is still a work in progress.

Sponsored by:	The FreeBSD Foundation
2018-11-30 00:45:10 +00:00
Mateusz Guzik
088ac3ef4b amd64: handle small memset buffers with overlapping stores
Instead of jumping to locations which store the exact number of bytes,
use displacement to move the destination.

In particular the following clears an area between 8-16 (inclusive)
branch-free:

movq    %r10,(%rdi)
movq    %r10,-8(%rdi,%rcx)

For instance for rcx of 10 the second line is rdi + 10 - 8 = rdi + 2.
Writing 8 bytes starting at that offset overlaps with 6 bytes written
previously and writes 2 new, giving 10 in total.

Provides a nice win for smaller stores. Other ones are erratic depending
on the microarchitecture.

General idea taken from NetBSD (restricted use of the trick) and bionic
string functions (use for various ranges like in this patch).

Reviewed by:	kib (previous version)
Sponsored by:	The FreeBSD Foundation
Differential Revision:	https://reviews.freebsd.org/D17660
2018-11-16 00:44:22 +00:00
Mateusz Guzik
ad2ff705a4 amd64: sync up libc memset with the kernel version
- tidy up memset to have rax set earlier for small sizes
- finish the tail in memset with an overlapping store
- align memset buffers to 16 bytes before using rep stos

Sponsored by:	The FreeBSD Foundation
2018-11-15 20:28:35 +00:00
Mateusz Guzik
6fff634455 amd64: convert libc bzero to a C func to avoid future bloat
Reviewed by:	kib (previous version)
Sponsored by:	The FreeBSD Foundation
Differential Revision:	https://reviews.freebsd.org/D17549
2018-11-15 20:20:39 +00:00
Konstantin Belousov
1d160e1e9c Convert amd64_get/set_fs/gsbase to ifunc.
Note that this is the first use of ifuncs in our userspace.

Sponsored by:	The FreeBSD Foundation
MFC after:	1 month
2018-10-30 00:11:30 +00:00
Mateusz Guzik
9c7d70ee7d amd64: convert libc bcopy to a C func to avoid future bloat
The function is of limited use and is an almost a direct clone of
memmove/memcpy (with arguments swapped). Introduction of ERMS variants
of string routines would mean avoidable growth of libc.

bcopy will get redefined to a __builtin_memmove later on with this
symbol only left for compatibility.

Reviewed by:	kib
Approved by:	re (gjb)
Sponsored by:	The FreeBSD Foundation
Differential Revision:	https://reviews.freebsd.org/D17539
2018-10-13 21:17:28 +00:00
Mateusz Guzik
1e52ba8c62 amd64: import updated kernel memmove to libc
bcopy is left alone as it is expected to be converted to a C func.

Due to header mess ALIGN_TEXT is temporarily defined explicitly in memmove.S

Reviewed by:	kib
Approved by:	re (gjb)
Sponsored by:	The FreeBSD Foundation
Differential Revision:	https://reviews.freebsd.org/D17538
2018-10-13 21:15:47 +00:00
Mateusz Guzik
167374a162 amd64: import updated kernel memset to libc
See r339205 for details.

An unused ERMS support is retained in the macro. It will be activated
after ifunc support lands.

Reviewed by:    kib
Approved by:    re (gjb)
Sponsored by:   The FreeBSD Foundation
Differential Revision:	https://reviews.freebsd.org/D17405
2018-10-05 19:27:42 +00:00
Mateusz Guzik
7e02ad0769 amd64: reimplement libc memset and bzero with kernel memset
This is a depessimization, see r334537 for an explanation. Routines
remain significantly slower than they have to be.

bzero was removed from the kernel but remains in libc. Macroify to
accommodate differences to memset (no return value, always setting to 0).

The bzero.S file is left in place due to libc build magic which pulls in
a C variant if a matching .S file is missing.

Reviewed by:	kib
Approved by:	re (gjb)
Differential Revision:	https://reviews.freebsd.org/D17355
2018-10-01 20:39:17 +00:00
Mateusz Guzik
275c893dab amd64: remove unnecessary cld from libc memcpy/bcopy
The ABI specifies the direction forward on function call, making
the cld instruction redundant.

Approved by:	re (kib)
2018-09-29 07:40:52 +00:00
Mateusz Guzik
5bbde333cd amd64: reimplement libc memcmp and bcmp with kernel memcmp
Both are significantly slower than hand-coded loops. See r338963 for
kernel commit.

bcmp differs from memcmp by always returning 1 when a difference is
found, as opposed to going for a value bigger or lower than 0
depending on what it is. This means it can do less work. For now the
code is duplicated and modified. This will get deduplicated after
another round of optimization when memcmp will get a longer-term form.

Both tested with the glibc suite. While the suite does not have a test
for bcmp, I created a wrapper routine which verified that values match
(0 vs 0, 1 vs non-zero).

Reviewed by:	kib
Approved by:	re (gjb)
Sponsored by:	The FreeBSD Foundation
Differential Revision:	https://reviews.freebsd.org/D17336
2018-09-27 17:08:29 +00:00
Mateusz Guzik
23ec0d58bf amd64: depessimize userspace memcpy/memmove/bcopy
The change resembles what was done in r334537 for kernel routines.
While here take care of i386 variants. Note that primitives remain
suboptimal.

Reviewed by:	kib (previous version)
Approved by:	re (gjb)
Differential Revision:	https://reviews.freebsd.org/D17167
2018-09-17 15:49:35 +00:00
Mark Johnston
9f9c9b22ec Reimplement brk() and sbrk() to avoid the use of _end.
Previously, libc.so would initialize its notion of the break address
using _end, a special symbol emitted by the static linker following
the bss section.  Compatibility issues between lld and ld.bfd could
cause the wrong definition of _end (libc.so's definition rather than
that of the executable) to be used, breaking the brk()/sbrk()
interface.

Avoid this problem and future interoperability issues by simply not
relying on _end.  Instead, modify the break() system call to return
the kernel's view of the current break address, and have libc
initialize its state using an extra syscall upon the first use of the
interface.  As a side effect, this appears to fix brk()/sbrk() usage
in executables run with rtld direct exec, since the kernel and libc.so
no longer maintain separate views of the process' break address.

PR:		228574
Reviewed by:	kib (previous version)
MFC after:	2 months
Differential Revision:	https://reviews.freebsd.org/D15663
2018-06-04 19:35:15 +00:00