chstone_latency_results

Original from http://lists.legup.org/pipermail/legup-dev/2010-March/000011.html:

adpcm 41934
aes 10571
blowfish 345519
dfadd 1898
dfdiv 1140
dfmul 680
dfsin 57372
gsm 10784
jpeg 8810967
mips 11374
motion 10652
sha 387458

My computer with llvm-gcc:

adpcm 41934 (same)
aes 10571 (same)
blowfish 345649 (.04% increase)
dfadd 1898 (same)
dfdiv 1140 (same)
dfmul 680 (same)
dfsin 57372 (same)
gsm 10713 (.66% decrease due to llvm.memset align 2 being optimized)
mips 11374 (same)
motion 10655 (.03% increase)
sha fail

My computer with clang:

adpcm fail
aes 10571 (same)
blowfish 345649 (.04% increase)
dfadd 1846 (2.82% decrease)
dfdiv 1139 (.09% decrease)
dfmul 676 (.90% decrease)
dfsin 57431 (.10% increase)
gsm fail
mips 11407 (.29% increase)
motion 10667 (.11% increase)
sha fail

clang seems to do better on the 64-bit double-precision floating-point benchmarks (dfadd, dfdiv, dfmul), but is otherwise slightly slower than llvm-gcc.

Ahmed's computer with ModelSim 6.6a:

adpcm fail
aes 9841 (7.42% decrease)
blowfish 344924 (.17% decrease)
dfadd 1878 (5.91% decrease)
dfdiv 1132 (.71% decrease) 
dfmul 680 (same)
dfsin 56390 (1.74% decrease)
gsm 10784 (same)
mips 11374 (same)
motion 10605 (.44% decrease)
sha 375122 (3.29% decrease)

Definitely faster than ModelSim 6.6, but not by a great deal

Some big differences in performance (cycles on Andrew's machine):

LLVM       2.6svn            2.7        Diff   
adpcm      41934             43284      +1350       
aes        10571             10571      0
blowfish   345519            345587     +68
dfadd      1898              1804       -94
dfdiv      1140              1490       +350
dfmul      680               703        +23
dfsin      57372             66144      +8772
gsm        10784             10761      -23
jpeg       8810967           8810969    +2
mips       11374             11374      0
motion     10652             10651      -1
sha        387458            387456     -2

This should be slower since I'm no longer inlining, but I have modified scheduling to chain fast operations “freely”, with no latency. The only fast operations for now are bit shifts by a constant and casts (zext, sext). This is using LLVM 2.7:

        chained non-inlined % diff (non-optimized / optimized - 1)
adpcm     46566       48121    3.3
aes       11432       12181    6.6
blowfish 323527      348131    7.6
dfadd      2028        2186    7.8
dfdiv      1222        1361   11.4
dfmul       771         817    6.0
dfsin     57442       64145   11.7
gsm       15085       17852   18.3
jpeg        n/a         n/a
mips      10233       11647   13.8
motion    10696       10733    0.3
sha      365166      377712    3.4
dhrystone 12818       12864    0.4
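
The "% diff" column can be recomputed from the two cycle counts. A minimal sketch, assuming the convention in the table header (non-optimized / optimized - 1, expressed as a percentage); the function name pct_diff is made up here for illustration, and the cycle counts are copied from three rows of the table above:

```python
def pct_diff(non_optimized: int, optimized: int) -> float:
    """Percent difference using the table's convention:
    (non-optimized / optimized - 1) * 100."""
    return (non_optimized / optimized - 1) * 100


# (chained, non-inlined) cycle counts copied from the table above.
rows = {
    "adpcm": (46566, 48121),
    "gsm": (15085, 17852),
    "mips": (10233, 11647),
}

for name, (chained, non_inlined) in rows.items():
    # "non-inlined" is the non-optimized run, "chained" the optimized one.
    print(f"{name:10s}{pct_diff(non_inlined, chained):6.1f}")
```

The same convention (old / new - 1) also reproduces most of the percentages quoted in the ModelSim 6.6a list, e.g. aes: 10571 / 9841 - 1 = 7.42%.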

Adding chaining after loads (var = memory_controller_out[7:0] is very fast, so operations can be chained after it, but loads and stores themselves can't be chained until binding is complete):

        chained chained (2) % diff (non-optimized / optimized - 1)
adpcm     46566       43797    6.3
aes       11432       10758    6.3
blowfish 323527      293219   10.3
dfadd      2028        1906    6.4
dfdiv      1222        1161    5.3
dfmul       771         729    5.8
dfsin     57442       55369    3.7
gsm       15085       14719    2.5
jpeg        n/a         n/a
mips      10233        9743    5.0
motion    10696       10680    0.1
sha      365166      348713    4.7
dhrystone 12818       11569   10.8
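
To illustrate why chaining reduces the cycle count, here is a toy model, not LegUp's actual scheduler: every operation takes one cycle unless it is "fast", in which case it is chained into the same cycle as its inputs at zero latency. The op kinds, the CHAINABLE set, and schedule_length are all hypothetical names, and the model ignores resource constraints, binding, and real operator latencies:

```python
# Hypothetical op kinds treated as "fast" (chainable at zero latency).
CHAINABLE = {"shl_const", "lshr_const", "zext", "sext"}


def schedule_length(ops, chain=True):
    """ops: list of (name, kind, deps) in topological order.
    Returns the number of cycles on the longest dependency path."""
    finish = {}
    for name, kind, deps in ops:
        # An op can start once all of its inputs have finished.
        start = max((finish[d] for d in deps), default=0)
        latency = 0 if (chain and kind in CHAINABLE) else 1
        finish[name] = start + latency
    return max(finish.values(), default=0)


# A small dependency chain: load -> zext -> shift -> add -> store.
ops = [
    ("a", "load", []),
    ("b", "zext", ["a"]),
    ("c", "shl_const", ["b"]),
    ("d", "add", ["c"]),
    ("e", "store", ["d"]),
]

print(schedule_length(ops, chain=False))  # 5 cycles
print(schedule_length(ops, chain=True))   # 3 cycles
```

In this sketch the zext and the constant shift are absorbed into the cycle of the load that feeds them, which is the effect described for chaining after loads.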

These cycle counts won't give an accurate comparison without also comparing fmax, but the difference in fmax should be minimal.

dhrystone (does not have any shifts to chain):

non-chained: fmax = 167.00 MHz

chained: fmax = 177.49 MHz (critical path changes)

chained again: fmax = 172.47 MHz (2.9% slower than chained, same critical path)
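
Cycle count and fmax can be combined into an estimated execution time (cycles / fmax). A rough sketch for dhrystone, assuming that the "non-chained", "chained" and "chained again" fmax figures correspond to the non-inlined, chained and chained (2) cycle counts in the tables above; that mapping is an assumption, and exec_time_us is a name made up for this example:

```python
def exec_time_us(cycles: int, fmax_mhz: float) -> float:
    """Execution time in microseconds: cycles divided by clock rate in MHz."""
    return cycles / fmax_mhz


# (cycles, fmax in MHz) for dhrystone; the pairing of rows is assumed.
configs = {
    "non-chained": (12864, 167.00),
    "chained": (12818, 177.49),
    "chained again": (11569, 172.47),
}

for name, (cycles, fmax) in configs.items():
    print(f"{name:14s}{exec_time_us(cycles, fmax):8.2f} us")
```

Under that assumption, "chained again" still comes out ahead overall: its lower cycle count outweighs the drop in fmax relative to "chained".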

chstone_latency_results.txt · Last modified: 2010/12/15 15:53 (external edit)