27 lines
1.1 KiB
Plaintext
27 lines
1.1 KiB
Plaintext
|
This directory contains mpn functions optimized for Intel Pentium
|
||
|
processors.
|
||
|
|
||
|
RELEVANT OPTIMIZATION ISSUES
|
||
|
|
||
|
1. Pentium doesn't allocate cache lines on writes, unlike most other modern
|
||
|
processors. Since the functions in the mpn class do array writes, we have to
|
||
|
handle allocating the destination cache lines by reading a word from it in the
|
||
|
loops, to achieve the best performance.
|
||
|
|
||
|
2. Pairing of memory operations requires that the two issued operations refer
|
||
|
to different cache banks. The simplest way to insure this is to read/write
|
||
|
two words from the same object. If we make operations on different objects,
|
||
|
they might or might not be to the same cache bank.
|
||
|
|
||
|
STATUS
|
||
|
|
||
|
1. mpn_lshift and mpn_rshift run at about 6 cycles/limb, but the Pentium
|
||
|
documentation indicates that they should take only 43/8 = 5.375 cycles/limb,
|
||
|
or 5 cycles/limb asymptotically.
|
||
|
|
||
|
2. mpn_add_n and mpn_sub_n run at asymptotically 2 cycles/limb. Due to loop
|
||
|
overhead and other delays (cache refill?), they run at or near 2.5 cycles/limb.
|
||
|
|
||
|
3. mpn_mul_1, mpn_addmul_1, mpn_submul_1 all run 1 cycle faster than they
|
||
|
should...
|