157 lines
7.9 KiB
Plaintext
157 lines
7.9 KiB
Plaintext
|
==============================================================================
|
||
|
Cycle counts and throughput for low-level routines in GNU MP as currently
|
||
|
implemented.
|
||
|
|
||
|
A range means that the timing is data-dependent. The slower number of such
|
||
|
an interval is usually the best performance estimate.
|
||
|
|
||
|
The throughput value, measured in Gb/s (gigabits per second) has a meaning
|
||
|
only for comparison between CPUs.
|
||
|
|
||
|
A star before a line means that all values on that line are estimates. A
|
||
|
star before a number means that that number is an estimate. A `p' before a
|
||
|
number means that the code is not complete, but the timing is believed to be
|
||
|
accurate.
|
||
|
|
||
|
| mpn_lshift mpn_add_n mpn_mul_1 mpn_addmul_1
|
||
|
| mpn_rshift mpn_sub_n mpn_submul_1
|
||
|
------------+-----------------------------------------------------------------
|
||
|
DEC/Alpha |
|
||
|
EV4 | 4.75 cycles/64b 7.75 cycles/64b 42 cycles/64b 42 cycles/64b
|
||
|
200MHz | 2.7 Gb/s 1.65 Gb/s 20 Gb/s 20 Gb/s
|
||
|
EV5 old code| 4.0 cycles/64b 5.5 cycles/64b 18 cycles/64b 18 cycles/64b
|
||
|
267MHz | 4.27 Gb/s 3.10 Gb/s 61 Gb/s 61 Gb/s
|
||
|
417MHz | 6.67 Gb/s 4.85 Gb/s 95 Gb/s 95 Gb/s
|
||
|
EV5 tuned | 3.25 cycles/64b 4.75 cycles/64b
|
||
|
267MHz | 5.25 Gb/s 3.59 Gb/s as above
|
||
|
417MHz | 8.21 Gb/s 5.61 Gb/s
|
||
|
------------+-----------------------------------------------------------------
|
||
|
Sun/SPARC |
|
||
|
SPARC v7 | 14.0 cycles/32b 8.5 cycles/32b 37-54 cycl/32b 37-54 cycl/32b
|
||
|
SuperSPARC | 3 cycles/32b 2.5 cycles/32b 8.2 cycles/32b 10.8 cycles/32b
|
||
|
50MHz | 0.53 Gb/s 0.64 Gb/s 6.2 Gb/s 4.7 Gb/s
|
||
|
**SuperSPARC| tuned addmul and submul will take: 9.25 cycles/32b
|
||
|
MicroSPARC2 | ? 6.65 cycles/32b 30 cycles/32b 31.5 cycles/32b
|
||
|
110MHz | ? 0.53 Gb/s 3.75 Gb/s 3.58 Gb/s
|
||
|
SuperSPARC2 | ? ? ? ?
|
||
|
Ultra/32 (4)| 2.5 cycles/32b 6.5 cycles/32b 13-27 cyc/32b 16-30 cyc/32b
|
||
|
182MHz | 2.33 Gb/s 0.896 Gb/s 14.3-6.9 Gb/s
|
||
|
Ultra/64 (5)| 2.5 cycles/64b 10 cycles/64b 40-70 cyc/64b 46-76 cyc/64b
|
||
|
182MHz | 4.66 Gb/s 1.16 Gb/s 18.6-11 Gb/s
|
||
|
HalSPARC64 | ? ? ? ?
|
||
|
------------+-----------------------------------------------------------------
|
||
|
SGI/MIPS |
|
||
|
R3000 | 6 cycles/32b 9.25 cycles/32b 16 cycles/32b 16 cycles/32b
|
||
|
40MHz | 0.21 Gb/s 0.14 Gb/s 2.56 Gb/s 2.56 Gb/s
|
||
|
R4400/32 | 8.6 cycles/32b 10 cycles/32b 16-18 19-21
|
||
|
200MHz | 0.74 Gb/s 0.64 Gb/s 13-11 Gb/s 11-9.6 Gb/s
|
||
|
*R4400/64 | 8.6 cycles/64b 10 cycles/64b 22 cycles/64b 22 cycles/64b
|
||
|
*200MHz | 1.48 Gb/s 1.28 Gb/s 37 Gb/s 37 Gb/s
|
||
|
R4600/32 | 6 cycles/64b 9.25 cycles/32b 15 cycles/32b 19 cycles/32b
|
||
|
134MHz | 0.71 Gb/s 0.46 Gb/s 9.1 Gb/s 7.2 Gb/s
|
||
|
R4600/64 | 6 cycles/64b 9.25 cycles/64b ? ?
|
||
|
134MHz | 1.4 Gb/s 0.93 Gb/s ? ?
|
||
|
R8000/64 | 3 cycles/64b 4.6 cycles/64b 8 cycles/64b 8 cycles/64b
|
||
|
75MHz | 1.6 Gb/s 1.0 Gb/s 38 Gb/s 38 Gb/s
|
||
|
*R10000/64 | 2 cycles/64b 3 cycles/64b 11 cycles/64b 11 cycles/64b
|
||
|
*200MHz | 6.4 Gb/s 4.27 Gb/s 74 Gb/s 74 Gb/s
|
||
|
*250MHz | 8.0 Gb/s 5.33 Gb/s 93 Gb/s 93 Gb/s
|
||
|
------------+-----------------------------------------------------------------
|
||
|
Motorola |
|
||
|
MC68020 | ? 24 cycles/32b 62 cycles/32b 70 cycles/32b
|
||
|
MC68040 | ? 6 cycles/32b 24 cycles/32b 25 cycles/32b
|
||
|
MC88100 | >5 cycles/32b 4.6 cycles/32b 16/21 cyc/32b p 18/23 cyc/32b
|
||
|
MC88110 wt | ? 3.75 cycles/32b 6 cycles/32b 8.5 cyc/32b
|
||
|
*MC88110 wb | ? 2.25 cycles/32b 4 cycles/32b 5 cycles/32b
|
||
|
------------+-----------------------------------------------------------------
|
||
|
HP/PA-RISC |
|
||
|
PA7000 | 4 cycles/32b 5 cycles/32b 9 cycles/32b 11 cycles/32b
|
||
|
67MHz | 0.53 Gb/s 0.43 Gb/s 7.6 Gb/s 6.2 Gb/s
|
||
|
PA7100 | 3.25 cycles/32b 4.25 cycles/32b 7 cycles/32b 8 cycles/32b
|
||
|
99MHz | 0.97 Gb/s 0.75 Gb/s 14 Gb/s 12.8 Gb/s
|
||
|
PA7100LC | ? ? ? ?
|
||
|
PA7200 (3) | 3 cycles/32b 4 cycles/32b 7 cycles/32b 6.5 cycles/32b
|
||
|
100MHz | 1.07 Gb/s 0.80 14 Gb/s 15.8 Gb/s
|
||
|
PA7300LC | ? ? ? ?
|
||
|
*PA8000 | 3 cycles/64b 4 cycles/64b 7 cycles/64b 6.5 cycles/64b
|
||
|
180MHz | 3.84 Gb/s 2.88 Gb/s 105 Gb/s 113 Gb/s
|
||
|
------------+-----------------------------------------------------------------
|
||
|
Intel/x86 |
|
||
|
386DX | 20 cycles/32b 17 cycles/32b 41-70 cycl/32b 50-79 cycl/32b
|
||
|
16.7MHz | 0.027 Gb/s 0.031 Gb/s 0.42-0.24 Gb/s 0.34-0.22 Gb/s
|
||
|
486DX | ? ? ? ?
|
||
|
486DX4 | 9.5 cycles/32b 9.25 cycles/32b 17-23 cycl/32b 20-26 cycl/32b
|
||
|
100MHz | 0.34 Gb/s 0.35 Gb/s 6.0-4.5 Gb/s 5.1-3.9 Gb/s
|
||
|
Pentium | 2/6 cycles/32b 2.5 cycles/32b 13 cycles/32b 14 cycles/32b
|
||
|
167MHz | 2.7/0.89 Gb/s 2.1 Gb/s 13.1 Gb/s 12.2 Gb/s
|
||
|
Pentium Pro | 2.5 cycles/32b 3.5 cycles/32b 6 cycles/32b 9 cycles/32b
|
||
|
200MHz | 2.6 Gb/s 1.8 Gb/s 34 Gb/s 23 Gb/s
|
||
|
------------+-----------------------------------------------------------------
|
||
|
IBM/POWER |
|
||
|
RIOS 1 | 3 cycles/32b 4 cycles/32b 11.5-12.5 c/32b 14.5/15.5 c/32b
|
||
|
RIOS 2 | 2 cycles/32b 2 cycles/32b 7 cycles/32b 8.5 cycles/32b
|
||
|
------------+-----------------------------------------------------------------
|
||
|
PowerPC |
|
||
|
PPC601 (1) | 3 cycles/32b 6 cycles/32b 11-16 cycl/32b 14-19 cycl/32b
|
||
|
PPC601 (2) | 5 cycles/32b 6 cycles/32b 13-22 cycl/32b 16-25 cycl/32b
|
||
|
67MHz (2) | 0.43 Gb/s 0.36 Gb/s 5.3-3.0 Gb/s 4.3-2.7 Gb/s
|
||
|
PPC603 | ? ? ? ?
|
||
|
*PPC604 | 2 3 2 3
|
||
|
*167MHz | 57 Gb/s
|
||
|
PPC620 | ? ? ? ?
|
||
|
------------+-----------------------------------------------------------------
|
||
|
Tege |
|
||
|
Model 1 | 2 cycles/64b 3 cycles/64b 2 cycles/64b 3 cycles/64b
|
||
|
250MHz | 8 Gb/s 5.3 Gb/s 500 Gb/s 340 Gb/s
|
||
|
500MHz | 16 Gb/s 11 Gb/s 1000 Gb/s 680 Gb/s
|
||
|
____________|_________________________________________________________________
|
||
|
(1) Using POWER and PowerPC instructions
|
||
|
(2) Using only PowerPC instructions
|
||
|
(3) Actual timing for shift/add/sub depends on code alignment. PA7000 code
|
||
|
is smaller and therefore often faster on this CPU.
|
||
|
(4) Multiplication routines modified for bogus UltraSPARC early-out
|
||
|
optimization. Smaller operand is put in rs1, not rs2 as it should
|
||
|
according to the SPARC architecture manuals.
|
||
|
(5) Preliminary timings, since there is no stable 64-bit environment.
|
||
|
(6) Use mulu.d at least for mpn_lshift. With mak/extu/or, we can only get
|
||
|
to 2 cycles/32b.
|
||
|
|
||
|
=============================================================================
|
||
|
Estimated theoretical asymptotic cycle counts for low-level routines:
|
||
|
|
||
|
| mpn_lshift mpn_add_n mpn_mul_1 mpn_addmul_1
|
||
|
| mpn_rshift mpn_sub_n mpn_submul_1
|
||
|
------------+-----------------------------------------------------------------
|
||
|
DEC/Alpha |
|
||
|
EV4 | 3 cycles/64b 5 cycles/64b 42 cycles/64b 42 cycles/64b
|
||
|
EV5 | 3 cycles/64b 4 cycles/64b 18 cycles/64b 18 cycles/64b
|
||
|
------------+-----------------------------------------------------------------
|
||
|
Sun/SPARC |
|
||
|
SuperSPARC | 2.5 cycles/32b 2 cycles/32b 8 cycles/32b 9 cycles/32b
|
||
|
------------+-----------------------------------------------------------------
|
||
|
SGI/MIPS |
|
||
|
R4400/32 | 5 cycles/64b 8 cycles/64b 16 cycles/64b 16 cycles/64b
|
||
|
R4400/64 | 5 cycles/64b 8 cycles/64b 22 cycles/64b 22 cycles/64b
|
||
|
R4600 |
|
||
|
------------+-----------------------------------------------------------------
|
||
|
HP/PA-RISC |
|
||
|
PA7100 | 3 cycles/32b 4 cycles/32b 6.5 cycles/32b 7.5 cycles/32b
|
||
|
PA7100LC |
|
||
|
------------+-----------------------------------------------------------------
|
||
|
Motorola |
|
||
|
MC88110 | 1.5 cyc/32b (6) 1.5 cycle/32b 1.5 cycles/32b 2.25 cycles/32b
|
||
|
------------+-----------------------------------------------------------------
|
||
|
Intel/x86 |
|
||
|
486DX4 |
|
||
|
Pentium P5x | 5 cycles/32b 2 cycles/32b 11.5 cycles/32b 13 cycles/32b
|
||
|
Pentium Pro | 2 cycles/32b 3 cycles/32b 4 cycles/32b 6 cycles/32b
|
||
|
------------+-----------------------------------------------------------------
|
||
|
IBM/POWER |
|
||
|
RIOS 1 | 3 cycles/32b 4 cycles/32b
|
||
|
RIOS 2 | 1.5 cycles/32b 2 cycles/32b 4.5 cycles/32b 5.5 cycles/32b
|
||
|
------------+-----------------------------------------------------------------
|
||
|
PowerPC |
|
||
|
PPC601 (1) | 3 cycles/32b ?4 cycles/32b
|
||
|
PPC601 (2) | 4 cycles/32b ?4 cycles/32b
|
||
|
____________|_________________________________________________________________
|