Commit Graph

13 Commits

Author SHA1 Message Date
Leandro Lupori
9f50aa45be [PowerPC64] Port optimized strcpy to PPC64LE
Submitted by:           Bruno Larsen <bruno.larsen@eldorado.org.br>
Reviewed by:            luporl, bdragon (IRC)
MFC after:              1 week
Sponsored by:           Eldorado Research Institute (eldorado.org.br)
Differential Revision:  https://reviews.freebsd.org/D29067
2021-03-25 13:20:12 -03:00
Leandro Lupori
2f56128403 [PowerPC64] Enforce natural alignment in bcopy
POWER architecture CPUs (Book-S) require natural alignment for
cache-inhibited storage accesses. Since we can't know the caching model
for a page ahead of time, always enforce natural alignment in bcopy.
This fixes a SIGBUS when calling the function with misaligned pointers
on POWER7.

Submitted by:		Bruno Larsen <bruno.larsen@eldorado.org.br>
Reviewed by:		luporl, bdragon (IRC)
MFC after:		1 week
Sponsored by:		Eldorado Research Institute (eldorado.org.br)
Differential Revision:	https://reviews.freebsd.org/D28776
2021-03-25 13:07:01 -03:00
Brandon Bergren
24faccc241 [PowerPC64LE] Use a shared LIBC_ARCH for powerpc64le.
Given that we have converted to ELFv2 for BE already, endianness is the only
difference between the two ARCHs.

As such, there is no need to differentiate LIBC_ARCH between the two.

Combining them like this lets us avoid needing to have two copies of several
bits for no good reason.

Sponsored by:	Tag1 Consulting, Inc.
2020-09-23 00:21:51 +00:00
Brandon Bergren
db7f521db7 Fix r358688 -- Remember to actually save r3 before processing.
Crash was noticed by pkubaj building gcc9.

Apparently non dword-aligned char pointers are somewhat rare in the wild.

Reported by:	pkubaj
Sponsored by:	Tag1 Consulting, Inc.
2020-03-11 23:34:44 +00:00
Justin Hibbits
dcfc676147 powerpc/memcpy: Don't predict the src and dst will be misaligned
Predicting misalignment will pessimize the expected common case.  Don't
predict true or false in thise case.
2020-03-06 03:46:48 +00:00
Justin Hibbits
021abafa5c Finish revert of r358672, missed in r358688.
Manual reverts never succeed correctly.

Reported by:	luporl
2020-03-06 02:30:04 +00:00
Justin Hibbits
00797360b5 powerpc/powerpc64: Enforce natural alignment in memcpy
Summary:
POWER architecture CPUs (Book-S) require natural alignment for
cache-inhibited storage accesses.  Since we can't know the caching model
for a page ahead of time, always enforce natural alignment in memcpy.
This fixes a SIGBUS in X with acceleration enabled on POWER9.

As part of this, revert r358672, it's no longer necessary with this fix.

Regression tested by alfredo.

Reviewed by: alfredo
Differential Revision: https://reviews.freebsd.org/D23969
2020-03-06 01:45:03 +00:00
Alfredo Dal'Ava Junior
2b37373c48 [PowerPC64] restrict memcpy/bcopy optimization to POWER ISA >=V2.07
VSX instructions were added in POWER ISA V2.06 (POWER7), but it
requires data to be word-aligned. Such requirement was removed in
ISA V2.07B (POWER8).

Since current memcpy/bcopy optimization relies on VSX instructions
handling misalignment transparently, and kernel doesn't currently
implement an alignment error handler, this optimzation should be
restrict to ISA V2.07 onwards.

SIGBUS on stxvd2x instruction was reproduced in POWER7+ CPU.

Reviewed by:	luporl, jhibbits, bdragon
Approved by:	jhibbits (mentor)
Differential Revision:	https://reviews.freebsd.org/D23958
2020-03-05 14:13:22 +00:00
Leandro Lupori
e16c18650c [PPC64] memcpy/memmove/bcopy optimization
For copies shorter than 512 bytes, the data is copied using plain
ld/std instructions.
For 512 bytes or more, the copy is done in 3 phases:

Phase 1: copy from the src buffer until it's aligned at a 16-byte boundary
Phase 2: copy as many aligned 64-byte blocks from the src buffer as possible
Phase 3: copy the remaining data, if any

In phase 2, this code uses VSX instructions when available. Otherwise,
it uses ldx/stdx.

Submitted by:	Luis Pires <lffpires_ruabrasil.org> (original version)
Reviewed by:	jhibbits
Differential Revision:	https://reviews.freebsd.org/D15118
2020-01-15 20:25:52 +00:00
Leandro Lupori
181e35008c [PPC64] strncpy optimization
Assembly optimization of strncpy for PowerPC64, using double words
instead of bytes to copy strings.

Submitted by:	Leonardo Bianconi <leonardo.bianconi_eldorado.org.br> (original version)
Reviewed by:	jhibbits
Differential Revision:	https://reviews.freebsd.org/D15369
2020-01-15 19:53:03 +00:00
Leandro Lupori
075fb85f09 [PPC64] strcpy optimization
Assembly optimization of strcpy for PowerPC64, using double words
instead of bytes to copy strings.

Submitted by:	Leonardo Bianconi <leonardo.bianconi_eldorado.org.br> (original version)
Reviewed by:	jhibbits
Differential Revision:	https://reviews.freebsd.org/D15368
2020-01-15 19:46:01 +00:00
Justin Hibbits
2f420a7c7f revert r346588 for now
The rewrite of strcmp in assembly uses an instruction added in PowerISA
2.05, making it SIGILL on CPUs older than the POWER6, such as the PPC970 in
the PowerMac G5.  Revert this until we get clang+lld, or retire the in-tree
binutils in favor of newer binutils with IFUNC support, whichever comes
first.
2019-05-11 15:17:42 +00:00
Justin Hibbits
89a0487c97 powerpc64: Rewrite strcmp in asm to take advantage of word size
Summary:
Optimize strcmp for powerpc64.
Data is loaded by double words and cmpb intruction is used to find '\0'.

Some performance gain rates between the current and the optimized solution:

String size (bytes)		Gain rate
	<=8			0.59%
	<=16			1.92%
	32			3.02%
	64			5.60%
	128			10.16%
	256			18.05%
	512			30.18%
	1024			42.82%

Submitted by:	alexandre.yamashita_eldorado.org.br,
		leonardo.bianconi_eldorado.org.br
Differential Revision: https://reviews.freebsd.org/D15220
2019-04-23 02:53:53 +00:00