POWER architecture CPUs (Book-S) require natural alignment for
cache-inhibited storage accesses. Since we can't know the caching model
for a page ahead of time, always enforce natural alignment in bcopy.
This fixes a SIGBUS when calling the function with misaligned pointers
on POWER7.
Submitted by: Bruno Larsen <bruno.larsen@eldorado.org.br>
Reviewed by: luporl, bdragon (IRC)
MFC after: 1 week
Sponsored by: Eldorado Research Institute (eldorado.org.br)
Differential Revision: https://reviews.freebsd.org/D28776
Given that we have converted to ELFv2 for BE already, endianness is the only
difference between the two ARCHs.
As such, there is no need to differentiate LIBC_ARCH between the two.
Combining them like this lets us avoid needing to have two copies of several
bits for no good reason.
Sponsored by: Tag1 Consulting, Inc.
Crash was noticed by pkubaj building gcc9.
Apparently non dword-aligned char pointers are somewhat rare in the wild.
Reported by: pkubaj
Sponsored by: Tag1 Consulting, Inc.
Summary:
POWER architecture CPUs (Book-S) require natural alignment for
cache-inhibited storage accesses. Since we can't know the caching model
for a page ahead of time, always enforce natural alignment in memcpy.
This fixes a SIGBUS in X with acceleration enabled on POWER9.
As part of this, revert r358672, it's no longer necessary with this fix.
Regression tested by alfredo.
Reviewed by: alfredo
Differential Revision: https://reviews.freebsd.org/D23969
VSX instructions were added in POWER ISA V2.06 (POWER7), but it
requires data to be word-aligned. Such requirement was removed in
ISA V2.07B (POWER8).
Since current memcpy/bcopy optimization relies on VSX instructions
handling misalignment transparently, and kernel doesn't currently
implement an alignment error handler, this optimzation should be
restrict to ISA V2.07 onwards.
SIGBUS on stxvd2x instruction was reproduced in POWER7+ CPU.
Reviewed by: luporl, jhibbits, bdragon
Approved by: jhibbits (mentor)
Differential Revision: https://reviews.freebsd.org/D23958
For copies shorter than 512 bytes, the data is copied using plain
ld/std instructions.
For 512 bytes or more, the copy is done in 3 phases:
Phase 1: copy from the src buffer until it's aligned at a 16-byte boundary
Phase 2: copy as many aligned 64-byte blocks from the src buffer as possible
Phase 3: copy the remaining data, if any
In phase 2, this code uses VSX instructions when available. Otherwise,
it uses ldx/stdx.
Submitted by: Luis Pires <lffpires_ruabrasil.org> (original version)
Reviewed by: jhibbits
Differential Revision: https://reviews.freebsd.org/D15118
Assembly optimization of strncpy for PowerPC64, using double words
instead of bytes to copy strings.
Submitted by: Leonardo Bianconi <leonardo.bianconi_eldorado.org.br> (original version)
Reviewed by: jhibbits
Differential Revision: https://reviews.freebsd.org/D15369
Assembly optimization of strcpy for PowerPC64, using double words
instead of bytes to copy strings.
Submitted by: Leonardo Bianconi <leonardo.bianconi_eldorado.org.br> (original version)
Reviewed by: jhibbits
Differential Revision: https://reviews.freebsd.org/D15368
The rewrite of strcmp in assembly uses an instruction added in PowerISA
2.05, making it SIGILL on CPUs older than the POWER6, such as the PPC970 in
the PowerMac G5. Revert this until we get clang+lld, or retire the in-tree
binutils in favor of newer binutils with IFUNC support, whichever comes
first.
Summary:
Optimize strcmp for powerpc64.
Data is loaded by double words and cmpb intruction is used to find '\0'.
Some performance gain rates between the current and the optimized solution:
String size (bytes) Gain rate
<=8 0.59%
<=16 1.92%
32 3.02%
64 5.60%
128 10.16%
256 18.05%
512 30.18%
1024 42.82%
Submitted by: alexandre.yamashita_eldorado.org.br,
leonardo.bianconi_eldorado.org.br
Differential Revision: https://reviews.freebsd.org/D15220