37 Commits

Author SHA1 Message Date
Mateusz Guzik
7f06b217c5 amd64: import asm strlen into libc
Reviewed by:	kib
Differential Revision:	https://reviews.freebsd.org/D28845
2021-02-23 00:09:55 +00:00
Mateusz Guzik
f1be262ec1 amd64: move memcmp checks upfront
This is a tradeoff which saves jumps for smaller sizes while making
the 8-16 range slower (roughly in line with the other cases).

Tested with glibc test suite.

For example, size 3 (the most common size with the vfs namecache), in ops/s:
before:	407086026
after:	461391995

The regressed 8-16 range (with 8 as an example):
before:	540850489
after:	461671032
2021-01-31 16:07:20 +00:00
Mateusz Guzik
0db6aef407 amd64: add a note about simd to libc memset, memmove and memcmp
2021-01-31 16:07:19 +00:00
Mateusz Guzik
164c3b8184 amd64: add missing ALIGN_TEXT to loops in memset and memmove
2021-01-30 00:01:44 +00:00
Mateusz Guzik
8291e88748 amd64: sync up libc memcmp with the kernel version (r357309)
2020-01-30 19:57:05 +00:00
Mateusz Guzik
4846152a08 amd64: sync up libc memcmp with the kernel version (r357208)
2020-01-29 01:57:07 +00:00
Mateusz Guzik
ddf6571230 amd64: align target memmove buffer to 16 bytes before using rep movs
See the review for sample test results.

Reviewed by:	kib (kernel part)
Sponsored by:	The FreeBSD Foundation
Differential Revision:	https://reviews.freebsd.org/D18401
2018-12-01 14:20:32 +00:00
Mateusz Guzik
94243af2da amd64: handle small memmove buffers with overlapping stores
Handling of sizes > 32 when copying backwards will be updated later.

Reviewed by:	kib (kernel part)
Sponsored by:	The FreeBSD Foundation
Differential Revision:	https://reviews.freebsd.org/D18387
2018-11-30 20:58:08 +00:00
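The overlapping-store idea from 94243af2da can be sketched in C for the 8 to 16 byte range. This is an illustration under assumptions, not the libc code: `memmove_8_16` is a hypothetical helper, and `memcpy` into a local `uint64_t` stands in for an unaligned 8-byte `movq`. Both loads happen before either store, which is what keeps the trick safe when the source and destination overlap.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/*
 * Hypothetical helper: move n bytes (8 <= n <= 16) from src to dst.
 * Loads the first and last 8 bytes of the source up front, then stores
 * them at the start and end of the destination. The two stores overlap
 * when n < 16. Because nothing is stored until both loads are done,
 * overlapping src/dst ranges work in either direction.
 */
static void
memmove_8_16(void *dst, const void *src, size_t n)
{
	const unsigned char *s = src;
	unsigned char *d = dst;
	uint64_t head, tail;

	memcpy(&head, s, 8);            /* source bytes [0, 8) */
	memcpy(&tail, s + n - 8, 8);    /* source bytes [n - 8, n) */
	memcpy(d, &head, 8);            /* destination bytes [0, 8) */
	memcpy(d + n - 8, &tail, 8);    /* destination bytes [n - 8, n) */
}
```

The same shape extends to 16 to 32 bytes with two 16-byte loads and stores; only the word size changes.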
Mateusz Guzik
2847cfce54 amd64: remove stale attribution for memmove work
While the routine started as an expanded bcopy, it is now entirely rewritten.

Sponsored by:	The FreeBSD Foundation
2018-11-30 00:47:36 +00:00
Mateusz Guzik
dd219e5ea5 amd64: tidy up copying backwards in memmove
For the non-ERMS case the code used to handle possible trailing bytes with
movsb first and then followed it up with movsq. This also happened
to alter how calculations were done for other cases.

Handle the tail with regular movs, just like when copying forward.
Use leaq to calculate the right offset from the get go, instead of
doing separate add and sub.

This adjusts the offset for non-rep cases so that they can be used
to handle the tail.

The routine is still a work in progress.

Sponsored by:	The FreeBSD Foundation
2018-11-30 00:45:10 +00:00
Mateusz Guzik
088ac3ef4b amd64: handle small memset buffers with overlapping stores
Instead of jumping to locations which store the exact number of bytes,
use displacement to move the destination.

In particular the following clears an area between 8-16 (inclusive)
branch-free:

movq    %r10,(%rdi)
movq    %r10,-8(%rdi,%rcx)

For instance, for an rcx of 10, the second store goes to rdi + 10 - 8 = rdi + 2.
Writing 8 bytes starting at that offset overlaps with 6 bytes written
previously and writes 2 new, giving 10 in total.

Provides a nice win for smaller stores. Results for larger ones are erratic,
depending on the microarchitecture.

General idea taken from NetBSD (restricted use of the trick) and bionic
string functions (use for various ranges like in this patch).

Reviewed by:	kib (previous version)
Sponsored by:	The FreeBSD Foundation
Differential Revision:	https://reviews.freebsd.org/D17660
2018-11-16 00:44:22 +00:00
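The two-instruction sequence in 088ac3ef4b can be rendered in C as a minimal sketch. The helper name `memset_8_16` is hypothetical, the caller is assumed to guarantee 8 <= n <= 16, and `memcpy` of a replicated 64-bit pattern stands in for `movq %r10`.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/*
 * Hypothetical helper: set n bytes (8 <= n <= 16) at dst to c without
 * branching on n. Mirrors:
 *   movq %r10,(%rdi)          -> bytes [0, 8)
 *   movq %r10,-8(%rdi,%rcx)   -> bytes [n - 8, n)
 */
static void
memset_8_16(void *dst, int c, size_t n)
{
	unsigned char *d = dst;
	/* Replicate the low byte of c into all 8 bytes (the role of %r10). */
	uint64_t v = (uint64_t)(unsigned char)c * 0x0101010101010101ULL;

	memcpy(d, &v, 8);
	memcpy(d + n - 8, &v, 8);   /* overlaps the first store when n < 16 */
}
```

For n = 10 the second store covers bytes 2 to 9, so bytes 2 to 7 are written twice and bytes 8 and 9 once, matching the arithmetic in the commit message.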
Mateusz Guzik
ad2ff705a4 amd64: sync up libc memset with the kernel version
- tidy up memset to have rax set earlier for small sizes
- finish the tail in memset with an overlapping store
- align memset buffers to 16 bytes before using rep stos

Sponsored by:	The FreeBSD Foundation
2018-11-15 20:28:35 +00:00
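The last two bullets of ad2ff705a4 (aligning to 16 bytes, finishing the tail with an overlapping store) can be illustrated with a C sketch of the control flow. This is not the kernel or libc routine: `memset_large_sketch` is a hypothetical name, it assumes n >= 32, and plain byte loops stand in for the wide stores and `rep stos` an assembly version would use.

```c
#include <stddef.h>
#include <stdint.h>

/*
 * Hypothetical sketch of "align, bulk fill, overlapping tail" for larger
 * memset sizes. Assumes n >= 32, so at least 17 bytes remain after the
 * alignment step. Byte loops stand in for wide stores and rep stos.
 */
static void
memset_large_sketch(void *dst, int c, size_t n)
{
	unsigned char *d = dst;
	unsigned char b = (unsigned char)c;
	/* Distance to the next 16-byte boundary (0 if already aligned). */
	size_t head = (16 - ((uintptr_t)d & 15)) & 15;

	/* One 16-byte-wide fill covers the unaligned head. */
	for (size_t i = 0; i < 16; i++)
		d[i] = b;
	d += head;
	n -= head;

	/* Bulk fill of whole 16-byte blocks starting at an aligned address. */
	size_t bulk = n & ~(size_t)15;
	for (size_t i = 0; i < bulk; i++)
		d[i] = b;

	/* Remainder: one 16-byte fill ending at the last byte, overlapping bulk. */
	if (n & 15) {
		for (size_t i = n - 16; i < n; i++)
			d[i] = b;
	}
}
```

Overlapping the head and tail fills with the bulk region trades a few redundant byte writes for fewer branches on the exact size.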
Mateusz Guzik
6fff634455 amd64: convert libc bzero to a C func to avoid future bloat
Reviewed by:	kib (previous version)
Sponsored by:	The FreeBSD Foundation
Differential Revision:	https://reviews.freebsd.org/D17549
2018-11-15 20:20:39 +00:00
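As an illustration of what converting bzero "to a C func" amounts to, here is a sketch that is not claimed to be the committed file (`bzero_sketch` is a hypothetical name). bzero returns nothing and always writes zeros, so it reduces to a single memset call, and any future memset variants are reused rather than duplicated in assembly.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical C bzero: no return value, always fills with zero bytes. */
static void
bzero_sketch(void *b, size_t len)
{
	memset(b, 0, len);
}
```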
Mateusz Guzik
9c7d70ee7d amd64: convert libc bcopy to a C func to avoid future bloat
The function is of limited use and is almost a direct clone of
memmove/memcpy (with arguments swapped). Introduction of ERMS variants
of string routines would mean avoidable growth of libc.

bcopy will get redefined to a __builtin_memmove later on with this
symbol only left for compatibility.

Reviewed by:	kib
Approved by:	re (gjb)
Sponsored by:	The FreeBSD Foundation
Differential Revision:	https://reviews.freebsd.org/D17539
2018-10-13 21:17:28 +00:00
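The "memmove with arguments swapped" relationship described in 9c7d70ee7d can be sketched as a thin C wrapper. The name `bcopy_sketch` is hypothetical and this is not presented as the actual libc source. memmove is used rather than memcpy because bcopy must handle overlapping buffers.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical C bcopy: source first, destination second, no return value. */
static void
bcopy_sketch(const void *src, void *dst, size_t len)
{
	memmove(dst, src, len);   /* overlap-safe, like bcopy */
}
```

For example, copying the first 6 bytes of "abcdefgh" two bytes forward within the same buffer yields "ababcdef".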
Mateusz Guzik
1e52ba8c62 amd64: import updated kernel memmove to libc
bcopy is left alone as it is expected to be converted to a C func.

Due to the header mess, ALIGN_TEXT is temporarily defined explicitly in memmove.S.

Reviewed by:	kib
Approved by:	re (gjb)
Sponsored by:	The FreeBSD Foundation
Differential Revision:	https://reviews.freebsd.org/D17538
2018-10-13 21:15:47 +00:00
Mateusz Guzik
167374a162 amd64: import updated kernel memset to libc
See r339205 for details.

Unused ERMS support is retained in the macro. It will be activated
once ifunc support lands.

Reviewed by:    kib
Approved by:    re (gjb)
Sponsored by:   The FreeBSD Foundation
Differential Revision:	https://reviews.freebsd.org/D17405
2018-10-05 19:27:42 +00:00
Mateusz Guzik
7e02ad0769 amd64: reimplement libc memset and bzero with kernel memset
This is a depessimization, see r334537 for an explanation. Routines
remain significantly slower than they have to be.

bzero was removed from the kernel but remains in libc. Macroify to
accommodate differences from memset (no return value, always setting to 0).

The bzero.S file is left in place due to libc build magic which pulls in
a C variant if a matching .S file is missing.

Reviewed by:	kib
Approved by:	re (gjb)
Differential Revision:	https://reviews.freebsd.org/D17355
2018-10-01 20:39:17 +00:00
Mateusz Guzik
275c893dab amd64: remove unnecessary cld from libc memcpy/bcopy
The ABI requires the direction flag to be clear (forward) on function
entry, making the cld instruction redundant.

Approved by:	re (kib)
2018-09-29 07:40:52 +00:00
Mateusz Guzik
5bbde333cd amd64: reimplement libc memcmp and bcmp with kernel memcmp
Both are significantly slower than hand-coded loops. See r338963 for
kernel commit.

bcmp differs from memcmp by always returning 1 when a difference is
found, as opposed to returning a value greater or less than 0 depending
on which of the differing bytes is larger. This means it can do less work.
For now the code is duplicated and modified. This will get deduplicated
after another round of optimization, once memcmp takes its longer-term form.

Both were tested with the glibc suite. Since the suite does not have a test
for bcmp, I created a wrapper routine which verified that the results match
(0 vs 0, 1 vs non-zero).

Reviewed by:	kib
Approved by:	re (gjb)
Sponsored by:	The FreeBSD Foundation
Differential Revision:	https://reviews.freebsd.org/D17336
2018-09-27 17:08:29 +00:00
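The semantic difference and the wrapper-style check described in 5bbde333cd can be sketched in C. Both functions are hypothetical illustrations, not the assembly routines or the author's actual wrapper: `bcmp_sketch` stops at the first mismatch without computing an ordering, and `bcmp_agrees_with_memcmp` applies the "0 vs 0, 1 vs non-zero" rule against the C library memcmp.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical bcmp-style routine: 0 if equal, 1 at the first difference. */
static int
bcmp_sketch(const void *a, const void *b, size_t len)
{
	const unsigned char *p = a, *q = b;

	for (size_t i = 0; i < len; i++) {
		if (p[i] != q[i])
			return 1;   /* no need to work out which byte is larger */
	}
	return 0;
}

/* Check: results agree if both are zero, or bcmp is 1 and memcmp non-zero. */
static int
bcmp_agrees_with_memcmp(const void *a, const void *b, size_t len)
{
	int r = bcmp_sketch(a, b, len);
	int m = memcmp(a, b, len);

	return (r == 0 && m == 0) || (r == 1 && m != 0);
}
```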
Mateusz Guzik
23ec0d58bf amd64: depessimize userspace memcpy/memmove/bcopy
The change resembles what was done in r334537 for kernel routines.
While here take care of i386 variants. Note that primitives remain
suboptimal.

Reviewed by:	kib (previous version)
Approved by:	re (gjb)
Differential Revision:	https://reviews.freebsd.org/D17167
2018-09-17 15:49:35 +00:00
Pedro F. Giffuni
d915a14ef0 libc: further adoption of SPDX licensing ID tags.
Mainly focus on files that use the BSD 2-Clause license. However, the tool I
was using misidentified many licenses, so this was mostly a manual,
error-prone task.

The Software Package Data Exchange (SPDX) group provides a specification
to make it easier for automated tools to detect and summarize well known
open source licenses. We are gradually adopting the specification, noting
that the tags are considered only advisory and do not, in any way,
supersede or replace the license texts.
2017-11-25 17:12:48 +00:00
Brooks Davis
9fe44df287 Correct MDSRCS use in <arch>/string/Makefile.inc.
- Remove .c files which duplicate entries in MISRCS.
- Use the same, less merge conflict prone style in all cases.
- Use MDSRCS for mips (.c and .S files both ended up in SRCS).
- Remove pointless sparc64 Makefile.inc.
- Remove uninformative foreign VCS ID entries.

Reviewed by:	emaste, imp, jhb
MFC after:	1 week
Sponsored by:	DARPA, AFRL
Differential Revision:	https://reviews.freebsd.org/D9841
2017-03-02 17:05:52 +00:00
Pedro F. Giffuni
32223c1b7d libc: spelling fixes.
Mostly on comments.
2016-04-30 01:24:24 +00:00
George V. Neville-Neil
a388fa6823 Remove incorrect attribution.
Approved by:	re (kib)
Pointed out by: brueffer
Pointy hat to: gnn
2011-07-21 20:06:14 +00:00
George V. Neville-Neil
c03b5ad6a9 Make both stpcpy and strcpy be assembly language implementations
on amd64.

Submitted by:	Guillaume Morin (guillaume at morinfr.org)
Reviewed by:	kib, jhb
Approved by:	re (bz)
MFC after:	1 month
2011-07-21 16:32:13 +00:00
Konstantin Belousov
adc6846785 Remove duplicate .note.GNU-stack section declaration. bcopy already
made the neccessary provisions.

Reported by:	arundel
2011-02-04 21:04:00 +00:00
Konstantin Belousov
93ab758670 Add section .note.GNU-stack for assembly files used by 386 and amd64.
2011-01-07 16:08:40 +00:00
Peter Wemm
5d053f461c We've been lax about matching END() macros in asm code for some time. This
is used to set the ELF size attribute for functions.  It isn't normally
critical but some things can make use of it (gdb for stack traces).
Valgrind needs it so I'm adding it in.  The problem is present on all
branches and on both i386 and amd64.
2008-11-02 01:10:54 +00:00
Alan Cox
97cd6892ba Optimize the instruction alignment.
2005-04-23 18:45:36 +00:00
Alan Cox
7e266fcd1f Add a machine-specific, optimized implementation of strcat.
PR: 73111
Submitted by: Ville-Pertti Keinonen <will@iki.fi> (taken from NetBSD)
MFC after: 3 weeks
2005-04-10 18:58:49 +00:00
Alan Cox
fb41e04787 Eliminate a conditional branch and as a side-effect eliminate a branch to
a return instruction.  (The latter is discouraged by the Opteron
optimization manual because it disables branch prediction for the return
instruction.)

Reviewed by: bde
2005-04-10 18:12:07 +00:00
Alan Cox
6524eb94a1 Add a machine-specific, optimized implementation of strcpy.
PR: 73111
Submitted by: Ville-Pertti Keinonen <will@iki.fi> (taken from NetBSD)
MFC after: 3 weeks
2005-04-10 05:11:06 +00:00
Alan Cox
e5dd4df84c Add a machine-specific, optimized implementation of strcmp.
PR: 73111
Submitted by: Ville-Pertti Keinonen <will@iki.fi> (taken from NetBSD)
MFC after: 3 weeks
2005-04-09 20:47:08 +00:00
Alan Cox
26f6218be9 Add machine-specific, optimized implementations of bcmp and memcmp.
PR: 73111
Submitted by: Ville-Pertti Keinonen <will@iki.fi> (taken from NetBSD)
MFC after: 3 weeks
2005-04-08 05:15:55 +00:00
Alan Cox
b5c9ad687a Eliminate unneeded instructions that are a vestige of mechanical
translation from i386.
2005-04-08 05:10:18 +00:00
Alan Cox
0417d4e3e9 Eliminate an unneeded instruction that is a vestige of mechanical
translation from i386.
2005-04-07 05:46:46 +00:00
Alan Cox
91c09a383a Add machine-specific, optimized implementations of bcopy, bzero, memcpy,
memmove, and memset.

PR: 73111
Submitted by: Ville-Pertti Keinonen <will@iki.fi> (taken from NetBSD)
MFC after: 3 weeks
2005-04-07 03:56:03 +00:00