This performs very well. x86-64-v3 and x86-64-v4 kernels were written,
too, but performed worse than the baseline kernel on short strings.
They may be added in the future if the performance issues can be fixed.
os: FreeBSD
arch: amd64
cpu: 11th Gen Intel(R) Core(TM) i7-1165G7 @ 2.80GHz
          │ strlen_scalar.out │   strlen_baseline.out   │
          │        B/s        │  B/s        vs base     │
Short        1.667Gi ± 1%        2.676Gi ± 1%   +60.55% (p=0.000 n=20)
Mid          5.459Gi ± 1%        8.756Gi ± 1%   +60.39% (p=0.000 n=20)
Long         15.34Gi ± 0%        52.27Gi ± 0%  +240.64% (p=0.000 n=20)
geomean      5.188Gi             10.70Gi       +106.24%
Sponsored by: The FreeBSD Foundation
Approved by: kib
Reviewed by: mjg jrtc27
Differential Revision: https://reviews.freebsd.org/D40693
Add a framework for selecting from one of multiple implementations
of a function based on amd64 architecture level (cf. amd64 SysV
ABI supplement).
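For illustration, the following is a minimal C sketch of the same idea
using the generic GNU ifunc mechanism; the names func_baseline/func_v3
and the single feature check are placeholders, not the framework's
actual interface:
/* Hypothetical implementations; the real kernels are in assembly. */
static int func_baseline(int x) { return (x); }	/* plain amd64 */
static int func_v3(int x) { return (x); }	/* needs x86-64-v3 */
/* The resolver runs once at dynamic-link time and returns the widest
 * implementation the CPU supports. */
static int (*resolve_func(void))(int)
{
	__builtin_cpu_init();
	if (__builtin_cpu_supports("avx2"))
		return (func_v3);
	return (func_baseline);
}
int func(int) __attribute__((ifunc("resolve_func")));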
Sponsored by: The FreeBSD Foundation
Approved by: kib
Reviewed by: jrtc27
Differential Revision: https://reviews.freebsd.org/D40693
The SPDX folks have obsoleted the BSD-2-Clause-FreeBSD identifier. Catch
up to that fact and revert to their recommended match of BSD-2-Clause.
Discussed with: pfg
MFC After: 3 days
Sponsored by: Netflix
Use it instead of defining IN_RTLD in base sources that want the P_OSREL_
defines in userspace but are not rtld.
This allows removing the abuse of IN_RTLD from userspace.
Reviewed by: dchagin, markj, imp
Sponsored by: The FreeBSD Foundation
MFC after: 1 week
Differential revision: https://reviews.freebsd.org/D38585
The Enhanced REP MOVSB (ERMS) feature of CPUs starting from Ivy Bridge
makes REP MOVSB the fastest way to copy memory in most cases. However,
the Intel Optimization Reference Manual says: "setting the DF to force
REP MOVSB to copy bytes from high towards low addresses will experience
significant performance degradation". Measurements on Intel Cascade Lake
and Alder Lake, as well as on AMD Zen3, show that it can drop throughput
to as low as 2.5-3.5GB/s, compared to the ~10-30GB/s of REP MOVSQ or the
hand-rolled loop used for non-ERMS CPUs.
This patch keeps ERMS use for forward-ordered memory copies, but
removes it for backward overlapped moves, where it performs poorly.
This is just a cosmetic sync with the kernel, since libc does not use
ERMS at this time.
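The resulting policy looks roughly like this C sketch (illustrative
only, not the actual routine):
#include <stddef.h>
static void *
copy_sketch(void *dst, const void *src, size_t len)
{
	unsigned char *d = dst;
	const unsigned char *s = src;

	if (d <= s || d >= s + len) {
		/* Forward or non-overlapping copy: ERMS makes a simple
		 * rep movsb the fast path. */
		__asm__ volatile("rep movsb"
		    : "+D"(d), "+S"(s), "+c"(len)
		    :
		    : "memory");
	} else {
		/* Overlapping backward copy: avoid std + rep movsb, the
		 * case the Optimization Manual warns about.  A byte loop
		 * keeps the sketch simple; the real routine falls back
		 * to its non-ERMS path here. */
		while (len-- != 0)
			d[len] = s[len];
	}
	return (dst);
}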
Reviewed by: mjg
MFC after: 2 weeks
It turns out clang converts "memcmp(foo, bar, len) == 0" and similar
expressions into bcmp calls.
Reviewed by: emaste (previous version), jhb (previous version)
Differential Revision: https://reviews.freebsd.org/D34673
Preferably, bcmp would just alias memcmp, but there is build magic which
makes this problematic.
Reviewed by: jhb
Differential Revision: https://reviews.freebsd.org/D28846
Their uses have been replaced by _tcb_get() and _tcb_set() from
<machine/tls.h>.
Reviewed by: kib, jrtc27
Sponsored by: The University of Cambridge, Google Inc.
Differential Revision: https://reviews.freebsd.org/D33354
This is a tradeoff which saves jumps for smaller sizes while making
the 8-16 range slower (roughly in line with the other cases).
Tested with glibc test suite.
For example, size 3 (the most common size with the vfs namecache), in ops/s:
before: 407086026
after: 461391995
The regressed range of 8-16 (with 8 as an example):
before: 540850489
after: 461671032
In all practical situations, the resolver visibility is static.
Requested by: markj
Sponsored by: The FreeBSD Foundation
MFC after: 1 week
Approved by: so (emaste)
Differential revision: https://reviews.freebsd.org/D20281
If a dso uses the initial-exec TLS mode, rtld tries to allocate its TLS
in the static space. If there is no space left, the dlopen(3) fails. If
space is allocated, the initial content from the PT_TLS segment is
distributed to all threads' pcbs; this step was missed, which left the
TLS segment of such a dso uninitialized after dlopen(3).
The mode is auto-detected either from the relocations used or from the
DF_STATIC_TLS dynamic flag. In the latter case, the TLS segment
allocation is attempted earlier, which increases the chance of the
dlopen(3) succeeding. LLD was recently fixed to properly emit the flag;
ld.bfd has always done so.
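For reference, a dso ends up on this path when its code is built with
the initial-exec model, e.g. (illustrative):
/* Compiled into a shared library (with -ftls-model=initial-exec or the
 * per-variable attribute below), accesses to this variable use IE
 * relocations and the linker sets DF_STATIC_TLS, so dlopen(3) of the
 * library needs static TLS space as described above. */
static __thread int counter __attribute__((tls_model("initial-exec")));
int
bump(void)
{
	return (++counter);
}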
Initial test by: dumbbell
Tested by: emaste (amd64), ian (arm)
Tested by: Gerald Aryeetey <aryeeteygerald_rogers.com> (arm64)
Sponsored by: The FreeBSD Foundation
MFC after: 2 weeks
Differential revision: https://reviews.freebsd.org/D19072
In particular, use ifuncs for __getcontextx_size(), and calculate the
size of the extended save area in the resolver. The same is done for
__fillcontextx2().
Sponsored by: The FreeBSD Foundation
MFC after: 1 week
No references to any of these exist in the tree. The list was also
erratic, with different architectures exporting different things
(arm64 and riscv exported none).
Reviewed by: kib
Sponsored by: DARPA, AFRL
Differential Revision: https://reviews.freebsd.org/D18425
See the review for sample test results.
Reviewed by: kib (kernel part)
Sponsored by: The FreeBSD Foundation
Differential Revision: https://reviews.freebsd.org/D18401
Handling of sizes > 32 for backward copies will be updated later.
Reviewed by: kib (kernel part)
Sponsored by: The FreeBSD Foundation
Differential Revision: https://reviews.freebsd.org/D18387
For the non-ERMS case, the code used to handle possible trailing bytes
with movsb first and then followed it up with movsq. This also happened
to alter how calculations were done for other cases.
Handle the tail with regular movs, just like when copying forward.
Use leaq to calculate the right offset from the start, instead of doing
a separate add and sub.
This adjusts the offset for non-rep cases so that they can be used
to handle the tail.
The routine is still a work in progress.
Sponsored by: The FreeBSD Foundation
Instead of jumping to locations which store the exact number of bytes,
use a displacement to move the destination.
In particular, the following clears an area of 8-16 bytes (inclusive)
branch-free:
movq %r10,(%rdi)
movq %r10,-8(%rdi,%rcx)
For instance, with an rcx of 10, the second line stores at
rdi + 10 - 8 = rdi + 2. Writing 8 bytes starting at that offset overlaps
with the 6 bytes written previously and writes 2 new ones, giving 10 in
total.
This provides a nice win for smaller stores. Other sizes are erratic,
depending on the microarchitecture.
The general idea is taken from NetBSD (restricted use of the trick) and
the bionic string functions (used for various ranges, as in this patch).
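The same trick rendered in C, as a sketch only (the real code stays in
assembly):
#include <stdint.h>
#include <string.h>
/* Clear any length in the 8-16 range with two possibly overlapping
 * 8-byte stores and no branch on the exact length. */
static void
zero_8_to_16(char *p, size_t len)
{
	uint64_t zero = 0;

	memcpy(p, &zero, sizeof(zero));		/* bytes 0..7 */
	/* Last 8 bytes; overlaps the first store when len < 16. */
	memcpy(p + len - sizeof(zero), &zero, sizeof(zero));
}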
Reviewed by: kib (previous version)
Sponsored by: The FreeBSD Foundation
Differential Revision: https://reviews.freebsd.org/D17660
- tidy up memset to have rax set earlier for small sizes
- finish the tail in memset with an overlapping store
- align memset buffers to 16 bytes before using rep stos
Sponsored by: The FreeBSD Foundation
The function is of limited use and is almost a direct clone of
memmove/memcpy (with the arguments swapped). Introduction of ERMS
variants of the string routines would mean avoidable growth of libc.
bcopy will get redefined to __builtin_memmove later on, with this
symbol only left for compatibility.
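That is, something along these lines in the headers (illustrative; note
the swapped argument order):
#define	bcopy(from, to, len)	__builtin_memmove((to), (from), (len))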
Reviewed by: kib
Approved by: re (gjb)
Sponsored by: The FreeBSD Foundation
Differential Revision: https://reviews.freebsd.org/D17539
bcopy is left alone as it is expected to be converted to a C function.
Due to the header mess, ALIGN_TEXT is temporarily defined explicitly in
memmove.S.
Reviewed by: kib
Approved by: re (gjb)
Sponsored by: The FreeBSD Foundation
Differential Revision: https://reviews.freebsd.org/D17538
See r339205 for details.
Unused ERMS support is retained in the macro. It will be activated
after ifunc support lands.
Reviewed by: kib
Approved by: re (gjb)
Sponsored by: The FreeBSD Foundation
Differential Revision: https://reviews.freebsd.org/D17405
This is a depessimization; see r334537 for an explanation. The routines
remain significantly slower than they have to be.
bzero was removed from the kernel but remains in libc. Macroify it to
accommodate the differences from memset (no return value, always
setting to 0).
The bzero.S file is left in place due to libc build magic which pulls in
a C variant if a matching .S file is missing.
Reviewed by: kib
Approved by: re (gjb)
Differential Revision: https://reviews.freebsd.org/D17355
Both are significantly slower than hand-coded loops. See r338963 for
the kernel commit.
bcmp differs from memcmp by always returning 1 when a difference is
found, as opposed to returning a value greater or less than 0 depending
on the difference. This means it can do less work. For now the code is
duplicated and modified. It will get deduplicated after another round
of optimization, when memcmp gets its longer-term form.
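In C terms, the simplified contract is equivalent to this reference
sketch (not the assembly):
#include <stddef.h>
static int
bcmp_ref(const void *a, const void *b, size_t len)
{
	const unsigned char *p = a, *q = b;

	while (len-- != 0)
		if (*p++ != *q++)
			return (1);	/* any difference; no ordering */
	return (0);
}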
Both were tested with the glibc suite. While the suite does not have a
test for bcmp, I created a wrapper routine which verified that the
values match (0 vs 0, 1 vs non-zero).
Reviewed by: kib
Approved by: re (gjb)
Sponsored by: The FreeBSD Foundation
Differential Revision: https://reviews.freebsd.org/D17336
The change resembles what was done in r334537 for the kernel routines.
While here, take care of the i386 variants. Note that the primitives
remain suboptimal.
Reviewed by: kib (previous version)
Approved by: re (gjb)
Differential Revision: https://reviews.freebsd.org/D17167
Previously, libc.so would initialize its notion of the break address
using _end, a special symbol emitted by the static linker following
the bss section. Compatibility issues between lld and ld.bfd could
cause the wrong definition of _end (libc.so's definition rather than
that of the executable) to be used, breaking the brk()/sbrk()
interface.
Avoid this problem and future interoperability issues by simply not
relying on _end. Instead, modify the break() system call to return
the kernel's view of the current break address, and have libc
initialize its state using an extra syscall upon the first use of the
interface. As a side effect, this appears to fix brk()/sbrk() usage
in executables run with rtld direct exec, since the kernel and libc.so
no longer maintain separate views of the process' break address.
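Roughly, the userspace side now behaves like the following sketch;
__query_break(), __set_break() and curbrk are illustrative names for
the lazy-initialization idea, not the actual libc internals:
#include <stdint.h>
void	*__query_break(void);	/* hypothetical: kernel's current break */
int	__set_break(void *);	/* hypothetical: move the break */
static char *curbrk;
void *
sbrk(intptr_t incr)
{
	char *old;

	if (curbrk == NULL)
		/* First use: ask the kernel instead of trusting _end,
		 * whose definition may come from the wrong object. */
		curbrk = __query_break();

	old = curbrk;
	if (__set_break(old + incr) != 0)
		return ((void *)-1);
	curbrk = old + incr;
	return (old);
}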
PR: 228574
Reviewed by: kib (previous version)
MFC after: 2 months
Differential Revision: https://reviews.freebsd.org/D15663