andrew_s_log

andrew_s_log [2011/05/30 15:12]
acanis
andrew_s_log [2011/10/18 17:07] (current)
acanis
 +Upgrading DokuWiki is very easy: just extract the new tarball over the
 +existing installation.
 +
 + --- //​[[andrew.canis@utoronto.ca|Andrew Canis]] 2009/10/18 16:00//
 +
 +There is a bug in the "make hybrid"​ flow:
 +<​code>​
 +dfadd.o: In function `main':​
 +(_main_section+0x4c):​ undefined reference to `float64_add'​
 +dfadd.o: In function `main':​
 +(_main_section+0x68):​ undefined reference to `float64_add'​
 +make: *** [hybrid] Error 1
 +</​code>​
 +
 +Looking in dfadd.sw.ll:​
 +<​code>​
 +  %4 = call i64 bitcast (i64 (i64, i64)* @legup_wrap_float64_add to i64 (i64, i64 (i64, i64)*)*)(i64 %3, i64 (i64, i64)* @float64_add)
 +</​code>​
 +There is a reference to float64_add that shouldn'​t be there.
 +Breaking down this function call:
 +<​code>​
 +  %4 = call i64 
 +        bitcast (i64 (i64, i64)* @legup_wrap_float64_add to i64 (i64, i64 (i64, i64)*)*)
 +        (i64 %3, i64 (i64, i64)* @float64_add)
 +</​code>​
 +What is that strange bitcast?
 +Before the llvm-ld this was:
 +<​code>​
 +  %4 = call i64 @legup_wrap_float64_add(i64 %3, i64 (i64, i64)* @float64_add)
 +</​code>​
 +
 +Before the sw pass it was:
 +<​code>​
 +  %4 = tail call i64 @float64_add(i64 %2, i64 %3)
 +</​code>​
 +
 +So something is going wrong in the sw pass.
 +It's a bug in ReplaceCallWith() in utils.cpp. The call
 +<​code>​
 +%4 = tail call i64 @float64_add(i64 %2, i64 %3)
 +</​code>​
 +Becomes:
 +<​code>​
 +%4 = call i64 @legup_wrap_float64_add(i64 %3, i64 (i64, i64)* @float64_add)
 +</​code>​
 +
 +Okay fixed it.
 +
 +Seeing another problem with aes when accelerating aes_main:
 +<​code>​
 +acanis@acanis-desktop:​~/​git/​legup/​examples/​chstone_hybrid/​aes$ export LEGUP_ACCELERATOR_FILENAME=aes; ​        ​../​../​../​llvm/​Release+Asserts/​bin/​opt -legup-config=config.tcl -load=../​../​../​cloog/​install/​lib/​libcloog-isl.so -load=../​../​../​cloog/​install/​lib/​libisl.so -load=../​../​../​llvm/​tools/​polly/​Release+Asserts/​lib/​LLVMPolly.so ​ -load=../​../​../​llvm/​Release+Asserts/​lib//​LLVMLegUp.so -legup-sw-only < aes.prelto.bc > aes.prelto.sw.bc
 +opt: SwOnly.cpp:​205:​ virtual bool legup::​SwOnly::​runOnModule(llvm::​Module&​):​ Assertion `0 && "​Accelerated function is never called or optimized away!\n"'​ failed.
 +</​code>​
 +
 +
 + --- //​[[andrew.canis@utoronto.ca|Andrew Canis]] 2011/10/05 12:15//
 +
 +<​code>​
 +/​*-----------------------------------------------*
 +*           CLooG configuration is OK           *
 +*-----------------------------------------------*/​
 +It appears that your system is OK to start CLooG compilation. You need
 +now to type "​make"​. After compilation,​ you should check CLooG by typing
 +"make check"​. If no problem occur, you can type "make uninstall"​ if
 +you are upgrading an old version. Lastly type "make install"​ to install
 +CLooG on your system (log as root if necessary).
 +make -C cloog
 +make[1]: Entering directory `/​home/​acanis/​git/​new/​legup/​cloog'​
 +CDPATH="​${ZSH_VERSION+.}:"​ && cd . && /bin/bash /​home/​acanis/​git/​new/​legup/​cloog/​autoconf/​missing --run aclocal-1.11 -I m4
 +/​home/​acanis/​git/​new/​legup/​cloog/​autoconf/​missing:​ line 54: aclocal-1.11:​ command not found
 +WARNING: `aclocal-1.11'​ is missing on your system. ​ You should only need it if
 +         you modified `acinclude.m4'​ or `configure.ac'​. ​ You might want
 +         to install the `Automake'​ and `Perl' packages. ​ Grab them from
 +         any GNU archive site.
 +CDPATH="​${ZSH_VERSION+.}:"​ && cd . && /bin/bash /​home/​acanis/​git/​new/​legup/​cloog/​autoconf/​missing --run autoconf
 + cd . && /bin/bash /​home/​acanis/​git/​new/​legup/​cloog/​autoconf/​missing --run automake-1.11 --foreign
 +/​home/​acanis/​git/​new/​legup/​cloog/​autoconf/​missing:​ line 54: automake-1.11:​ command not found
 +WARNING: `automake-1.11'​ is missing on your system. ​ You should only need it if
 +         you modified `Makefile.am',​ `acinclude.m4'​ or `configure.ac'​.
 +         You might want to install the `Automake'​ and `Perl' packages.
 +         Grab them from any GNU archive site.
 +configure.ac:​59:​ error: possibly undefined macro: AM_INIT_AUTOMAKE
 +      If this token and others are legitimate, please use m4_pattern_allow.
 +      See the Autoconf documentation.
 +configure.ac:​73:​ error: possibly undefined macro: AC_PROG_LIBTOOL
 +configure.ac:​75:​ error: possibly undefined macro: AM_CONDITIONAL
 +make[1]: *** [configure] Error 1
 +</​code>​
 +
 +To fix this I added the ./autogen.sh call back in the cloog directory.
 +
 + --- //​[[andrew.canis@utoronto.ca|Andrew Canis]] 2011/10/04 12:15//
 +
 +Looking into why the xor-xor pattern in blowfish is taking up more registers.
 +<​code>​
 +        strict (no sharing) -> strict off
 +reg:    6853                -> 7318
 +aluts: ​ 6795                -> 6575
 +</​code>​
 +
 +Just sharing the 5 patterns:
 +<​code>​
 + Pattern Size: 5 (contents: addi32, xori32, addi32, xori32, xori32, )
 + Frequency: 15
 + Number of Pairs: 7
 +</​code>​
 +I get an improvement from:
 +<​code>​
 +;     ​Combinational ALUTs           ; 8,579 / 58,080 ( 15 % )                        ;
 +; Total registers ​                  ; 8389                                           ;
 +; Logic utilization ​                                                                ; 10,935 / 58,080 ( 19 % )     ;
 +</​code>​
 +To:
 +<​code>​
 +;     ​Combinational ALUTs           ; 6,945 / 58,080 ( 12 % )                        ;
 +; Total registers ​                  ; 7494                                           ;
 +; Logic utilization ​                                                                ; 9,586 / 58,080 ( 17 % )      ;
 +</​code>​
 +So a reduction of 895 registers, 1,634 ALUTs, and 1,349 in logic utilization.
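The deltas quoted above can be rechecked with a few lines of Python. This is only a sketch: the numbers are copied from the two Quartus summaries, and the dictionary keys are labels made up for this example.

```python
# Values copied from the two Quartus fit summaries above
# (before and after sharing the size 5 patterns).
before = {"aluts": 8579, "registers": 8389, "logic_utilization": 10935}
after = {"aluts": 6945, "registers": 7494, "logic_utilization": 9586}


def reduction(before, after):
    """Return {resource: (absolute saving, saving as a percentage of 'before')}."""
    result = {}
    for name, old in before.items():
        saved = old - after[name]
        result[name] = (saved, 100.0 * saved / old)
    return result


savings = reduction(before, after)
for name, (saved, pct) in savings.items():
    print(f"{name}: -{saved} ({pct:.1f}%)")
```

Reporting the percentage next to the absolute count makes it easier to compare this run with the size 3 and size 1 experiments that follow.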
 +
 +Just sharing size 3 patterns:
 +<​code>​
 +Function: BF_encrypt
 + Pattern Size: 3 (contents: addi32, xori32, addi32, )
 + Frequency: 16
 + Number of Pairs: 7
 +
 +
 +Function: BF_cfb64_encrypt
 + Pattern Size: 3 (contents: ori32, ori32, ori32, )
 + Frequency: 2
 + Number of Pairs: 1
 +</​code>​
 +
 +I see:
 +<​code>​
 +;     ​Combinational ALUTs           ; 7,796 / 58,080 ( 13 % )                        ;
 +; Total registers ​                  ; 7446                                           ;
 +; Logic utilization ​                ; 9,706 / 58,080 ( 17 % )           ;
 +</​code>​
 +
 +Only sharing size 1 patterns:
 +<​code>​
 +Function: BF_encrypt
 + Pattern Size: 1 (contents: xori32, )
 + Frequency: 50
 + Number of Pairs: 23
 +
 + Pattern Size: 1 (contents: addi32, )
 + Frequency: 32
 + Number of Pairs: 15
 +Function: BF_cfb64_encrypt
 + Pattern Size: 1 (contents: ori32, )
 + Frequency: 6
 + Number of Pairs: 3
 +Function: main
 + Pattern Size: 1 (contents: ori32, )
 + Frequency: 3
 + Number of Pairs: 1
 +
 + Pattern Size: 1 (contents: addi32, )
 + Frequency: 3
 + Number of Pairs: 1
 +</​code>​
 +
 +<​code>​
 +;     ​Combinational ALUTs           ; 7,168 / 58,080 ( 12 % )                        ;
 +; Total registers ​                  ; 7336                                           ;
 +; Logic utilization ​                                                                ; 9,711 / 58,080 ( 17 % )           ;
 +</​code>​
 +
 + --- //​[[andrew.canis@utoronto.ca|Andrew Canis]] 2011/09/22 12:15//
 +
 +Look into the __restrict__ keyword for pointers
 +
 +TODO: 
 + - make sure srem/sdiv share together.
 + - srem is only a problem with aes
 + - function inlining would save 2 dividers in jpeg, 1 in sha, 1 in aes (assuming div/rem sharing)
 +Damn. sdiv and srem are used in the same state.
 + - binding aware scheduling would be crucial here
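Why the same-state case hurts can be sketched in Python. This is a simplified model, not LegUp's binding code: it assumes each divide/remainder occupies a shared unit only in the state where it is scheduled (real dividers are multi-cycle), and the schedule below is hypothetical.

```python
from collections import Counter

# Opcodes assumed to be able to share one divider unit (div/rem sharing).
DIVIDER_OPS = {"sdiv", "srem", "udiv", "urem"}


def min_divider_units(schedule):
    """Lower bound on divider units for a list of (opcode, state) pairs.

    Operations scheduled in the same state run concurrently, so each needs
    its own unit; the busiest state sets the bound.
    """
    per_state = Counter(state for opcode, state in schedule
                        if opcode in DIVIDER_OPS)
    return max(per_state.values(), default=0)


# Hypothetical schedule: sdiv and srem land in the same state.
clash = [("sdiv", 3), ("srem", 3), ("add", 3), ("sdiv", 7)]
# A binding-aware scheduler could push the srem one state later.
staggered = [("sdiv", 3), ("srem", 4), ("add", 3), ("sdiv", 7)]

print(min_divider_units(clash), min_divider_units(staggered))
```

In this model, moving one operation by a single state halves the number of dividers, which is the motivation for making the scheduler aware of binding.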
 +
 +Side note: C2H has some useful benchmarks
 +
 + --- //​[[andrew.canis@utoronto.ca|Andrew Canis]] 2011/09/15 12:15//
 +
 +Stefan just noticed a large drop in LEs on the quality of results page.
 +It occurs for these commits (which doesn't make sense):
 +<​code>​
 +Stefan Hadjis [Thu, 25 Aug 2011 16:17:22 +0000]
 +    Fixed compilation error
 +Stefan Hadjis [Thu, 25 Aug 2011 15:54:48 +0000]
 +    Removed include Signals.h
 +Stefan Hadjis [Thu, 25 Aug 2011 15:40:26 +0000]
 +    Merge branch '​master'​ of legup.org:​legup
 +Stefan Hadjis [Thu, 25 Aug 2011 14:50:44 +0000]
 +    Binding changes for new LLVM version
 +    Made small changes to be compatible with the new version of LLVM.
 +</​code>​
 +
 +Maybe that's when I changed the quartus version?
 +Build before: Version 9.1 Build 350 03/​24/​2010 ​
 +<​code>​
 +Cycle geomean: 14576.0837656035
 +Fmax geomean: 80.3956562368825
 +Latency geomean: 181.702363878993
 +cat benchmark.csv
 +name time cycles Fmax LEs regs comb mults membits ​
 +chstone/​adpcm 407 31523 77.38 24284 10585 21786 300 27072 
 +chstone/aes 209 15716 75.18 21590 11386 18041 0 36800 
 +chstone/​blowfish 2811 197978 70.43 15967 8368 14198 0 150240 ​
 +chstone/​dfadd 6 804 124.98 10113 3911 9564 0 17056 
 +chstone/​dfdiv 29 2256 78.27 18079 12521 12256 48 12416 
 +chstone/​dfmul 3 291 107.33 5095 2382 4545 32 12032 
 +chstone/​dfsin 1010 64433 63.80 33363 18077 26105 86 12832 
 +chstone/gsm 84 5358 63.54 19058 5813 17477 70 10144 
 +chstone/​jpeg 33949 1323338 38.98 46485 19051 42071 240 468784 ​
 +chstone/​mips 52 5118 98.11 5042 2044 4492 16 4480 
 +chstone/​motion 54 6379 117.52 5449 2406 5000 0 33312 
 +chstone/sha 2843 233875 82.25 17015 8563 14004 0 134368 ​
 +dhrystone 82 7424 90.93 6893 3737 5611 4 2256 
 +program finished with exit code 0
 +elapsedTime=25113.470937
 +</​code>​
 +
 +Build after: Version 9.1 Build 350 03/24/2010
 +<​code>​
 +Cycle geomean: 14576.0837656035
 +Fmax geomean: 84.5261619961444
 +Latency geomean: 173.020465101761
 +cat benchmark.csv
 +name time cycles Fmax LEs regs comb mults membits ​
 +chstone/​adpcm 410 31523 76.91 24100 10173 21358 172 27072 
 +chstone/aes 190 15716 82.52 19730 10508 16113 0 36800 
 +chstone/​blowfish 2724 197978 72.69 13367 7684 11544 0 150240 ​
 +chstone/​dfadd 6 804 134.68 8531 3879 7965 0 17056 
 +chstone/​dfdiv 25 2256 89.90 14962 10736 9430 48 12416 
 +chstone/​dfmul 3 291 101.49 4451 2147 3965 32 12032 
 +chstone/​dfsin 943 64433 68.32 30048 16602 23622 86 12832 
 +chstone/gsm 69 5358 77.93 11112 5864 9598 52 10144 
 +chstone/​jpeg 33026 1323338 40.07 40614 19051 36075 172 468784 ​
 +chstone/​mips 53 5118 95.83 4244 1718 3871 16 4480 
 +chstone/​motion 51 6379 125.57 4726 2322 4273 0 33312 
 +chstone/sha 2738 233875 85.43 15692 8657 12623 0 134368 ​
 +dhrystone 82 7424 90.43 6566 3673 5338 4 2256 
 +program finished with exit code 0
 +elapsedTime=21144.868643
 +</​code>​
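The Fmax geomean lines can be reproduced from the Fmax column of the two listings. A minimal sketch, assuming the reported value is the plain geometric mean over all 13 rows (the script that printed it is not shown here); the variable names are made up for the example.

```python
import math


def geomean(values):
    """Geometric mean computed as exp(mean(log(x))) to avoid overflow."""
    return math.exp(sum(math.log(v) for v in values) / len(values))


# Fmax column (MHz) of the two benchmark.csv listings above, in row order.
fmax_before = [77.38, 75.18, 70.43, 124.98, 78.27, 107.33, 63.80,
               63.54, 38.98, 98.11, 117.52, 82.25, 90.93]
fmax_after = [76.91, 82.52, 72.69, 134.68, 89.90, 101.49, 68.32,
              77.93, 40.07, 95.83, 125.57, 85.43, 90.43]

g_before = geomean(fmax_before)
g_after = geomean(fmax_after)
print(f"Fmax geomean: {g_before:.2f} -> {g_after:.2f} MHz "
      f"({100 * (g_after / g_before - 1):+.1f}%)")
```

The same helper works for the cycles and LEs columns, which is a quick way to quantify the drop Stefan noticed instead of eyeballing the rows.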
 +
 +No. It's caused by a combination of:
 +1) the VerilogWriter fix
 +2) a few more dividers possibly being shared.
 +
 +Just installed quartus 10.1 sp1.  Took 40 minutes to compile dfsin with no_dsps.
 +So the new version seems to be working.
 +
 +Adding stratix4 to the buildbot:
 +<​code>​
 +buildmaster@acanis-desktop:​~/​buildbot/​public_html/​perf$ generate_perf.py ​
 +</​code>​
 +
 +And modifying dashboard/​overview.html and dashboard/​perf.html. ​
 +Also need to modify process_log.py. Then restart the buildbot.
 +
 +Actually very easy!
 +Wow. Just noticed I wasn't backing up my buildmaster stuff. Just added it
 +to the backup system.
 +Updating the quartus version on buildbot up to 10.1sp1.
 +Do I have to do something with sdc files?
 +Also I have to fix benchmark.pl to actually work properly.
 +Do I need sdc files? Yes, otherwise Quartus reports a critical warning.
 +
 + --- //​[[andrew.canis@utoronto.ca|Andrew Canis]] 2011/09/14 12:15//
 +
 +TODO: 
 + - make sure srem/sdiv share together.
 + - why are sdiv/srem with constant inputs being instantiated?​
 +
 +Turns out sharing between functions is actually more complicated
 +than I originally thought. You need to instantiate the bound functional
 +unit in the main module and then set up a mux between each instantiated module.
 +
 +Just set up a branch for this half-done function inlining code (in ~/git/legup):
 +<​code>​
 +git checkout -b inlining
 +git commit -a
 +</​code>​
 +
 +Cases where there are two sext/zext operations feeding an adder occur in:
 +dfdiv, dfmul, dfsin, gsm, mips, sha, dhrystone
 +
 + --- //​[[andrew.canis@utoronto.ca|Andrew Canis]] 2011/09/13 12:15//
 +
 +I'm trying to turn LegupPass into a ModulePass so we can do binding across
 +function boundaries.
 +
 +Very strange. It seems like LegupConfig is getting constructed twice...
 +
 +So basically there are two versions. One is created by llc and I'm not sure about
 +the other one. I think one of them is for function passes?
 +
 +<​code>​
 +acanis@acanis-desktop:​~/​git/​legup/​examples/​loop$ ../​../​llvm/​Release+Asserts/​bin/​llc -legup-config=../​../​hwtest/​CycloneII.tcl -march=v loop.bc -o loop.v --debug-pass=Details
 +Adding from llc
 +Constructing LegupConfig 0x9f13d40
 +Constructing LegupConfig 0x9f4b230
 +Pass Arguments: ​ -targetdata -legupconfig
 +Target Data Layout
 +Legup Configuration
 +  ModulePass Manager
 +    LegupPass backend
 +      Unnamed pass: implement Pass::​getPassName()
 +Pass Arguments: ​ -no-aa -legupconfig -legup-LiveVariableAnalysis -memdep -legup scheduler DAG -sdc-sched -simple asap -meta-asap
 +No Alias Analysis (always returns '​may'​ alias)
 +Legup Configuration
 +  FunctionPass Manager
 +    LVA
 +    Memory Dependence Analysis
 +    Legup directed acyclic graph with dependency and other information
 +    SDC Scheduler -- use linear programming for scheduling
 +    ASAP scheduler without resource constraints
 +    Complete ASAP Scheduling
 +0x9f13c60 ​  ​Executing Pass '​LegupPass backend'​ on Module '​loop.bc'​...
 +0x9f47fc0 ​    ​Required Analyses: LVA, Complete ASAP Scheduling, Legup Configuration
 +Starting doInitialization
 +op_name: signed_comp_lt_32 count: 155 this 0x9f13d40
 +op_name: signed_comp_lt_32 count: 155 this 0x9f13d40
 +Starting function: main
 +op_name: signed_comp_lt_32 count: 155 this 0x9f13d40
 +op_name: signed_comp_lt_32 count: 155 this 0x9f13d40
 +0x9f4a2d8 ​  ​Executing Pass '​LVA'​ on Function '​main'​...
 +0x9f4a2d8 ​  Made Modification '​LVA'​ on Function '​main'​...
 +0x9f4a2d8 ​  ​Executing Pass '​Memory Dependence Analysis'​ on Function '​main'​...
 +0x9f4ad90 ​    ​Required Analyses: No Alias Analysis (always returns '​may'​ alias)
 +0x9f4a2d8 ​  ​Executing Pass 'Legup directed acyclic graph with dependency and other information'​ on Function '​main'​...
 +0x9f4aba8 ​    ​Required Analyses: Memory Dependence Analysis, Legup Configuration
 +op_name: signed_comp_lt_32 count: 0 this 0x9f4b230
 +llc: /​home/​acanis/​git/​legup/​llvm/​include/​llvm/​LegupConfig.h:​356:​ legup::​Operation* legup::​LegupConfig::​getOperationRef(std::​string):​ Assertion `Operations.find(op_name) != Operations.end()'​ failed.
 +</​code>​
 +
 +Okay. So this doesn'​t happen if I make LegupPass a function pass. Strange.
 +But isn't TargetData an immutable pass?
 +Okay. So one example is MergeFunctions,​ which is a ModulePass which also
 +uses the TargetData info.
 +Okay - it never actually adds TargetData as a required analysis pass.
 +And actually, _none_ of the passes ever add TargetData as a required pass.
 +
 +Strange. So I can't even get the TargetData analysis from legupschedulerDAG.
 +I'll have to just make LegupConfig a global variable for now.
 +
 +So let's get divider sharing working. Aes has 11 dividers/remainders, which reduces to 4 after binding.
 +I see at least one case where they aren't being shared across function boundaries.
 +
 +TODO: make sure srem/sdiv share together.
 +
 +Very strange. If I call the LVA pass twice
 +
 + --- //​[[andrew.canis@utoronto.ca|Andrew Canis]] 2011/09/12 12:15//
 +
 +I'm going to install the latest version of ubuntu to see if the roccc binaries work.
 +
 +Error compiling gcc in the roccc installation. ​ I had to install gcc-multilib.
 +Okay. roccc works in the latest version of ubuntu!
 +
 + --- //​[[andrew.canis@utoronto.ca|Andrew Canis]] 2011/09/08 12:15//
 +
 +Installed roccc:
 +<​code>​
 +acanis@acanis-desktop:​~/​roccc/​roccc-0.6-distribution$ ./​rocccInstall.sh -t ~/​roccc/​roccc-0.6-install/​
 +
 +ROCCC INSTALLER
 +This process will install ROCCC 2.0 onto your system. ​ Warnings will
 +be recorded in the file warning.log
 +
 +Some steps may take a while
 +
 +The GUI requires Eclipse 3.5 or higher. Please download from www.eclipse.org.
 +
 +Installing modified gcc 4.0.2 for Hi-CIRRF
 +
 +
 +
 +Installing llvm-gcc for Lo-CIRRF
 +Compiling the roccc-compiler proper
 +ROCCC already installed
 +Floating point cores added to the database
 +All of ROCCC is set up!
 +When prompted by the GUI, please enter: /​home/​acanis/​roccc/​roccc-0.6-distribution ​ as the ROCCC distribution directory
 +All of ROCCC has been set up.
 +The binaries are located in  /​home/​acanis/​roccc/​roccc-0.6-distribution/​Install
 +</​code>​
 +
 +Installed eclipse in ~/eclipse.
 +
 +Damn when I try to build in roccc I get the error:
 +<​code>​
 +/​home/​acanis/​roccc/​roccc-0.6-distribution//​Install/​roccc-compiler/​src/​../​bin/​parser:​
 +symbol lookup error:
 +/​home/​acanis/​roccc/​roccc-0.6-distribution//​Install/​roccc-compiler//​solib/​libstdc++.so.6:​
 +undefined symbol:
 +_ZNSt7num_getIcSt19istreambuf_iteratorIcSt11char_traitsIcEEE2idE,​ version
 +GLIBCXX_3.4
 +Compilation of FFT.c failed.
 +</​code>​
 +
 +So I moved the c++ lib into a tmp directory:
 +<​code>​
 +acanis@acanis-desktop:​~/​roccc/​roccc-0.6-distribution/​Install/​roccc-compiler/​solib$ mv libstdc++.so.6* tmp/
 +</​code>​
 +
 +Now the gui opens okay but I get a new error:
 +<​code>​
 +/​home/​acanis/​roccc/​roccc-0.6-distribution/​Install/​roccc-compiler/​src/​llvm-2.3//​Release/​bin/​opt:​
 +/​usr/​lib/​libstdc++.so.6:​ version `GLIBCXX_3.4.14'​ not found (required by
 +/​home/​acanis/​roccc/​roccc-0.6-distribution/​Install/​roccc-compiler/​src/​llvm-2.3//​Release/​bin/​opt)
 +/​home/​acanis/​roccc/​roccc-0.6-distribution/​Install/​roccc-compiler/​src/​llvm-2.3//​Release/​bin/​opt:​
 +/​usr/​lib/​libstdc++.so.6:​ version `GLIBCXX_3.4.11'​ not found (required by
 +/​home/​acanis/​roccc/​roccc-0.6-distribution/​Install/​roccc-compiler/​src/​llvm-2.3//​Release/​bin/​opt)
 +Compilation of FFT.c failed.
 +</​code>​
 +
 +They'​re using quite an old version of llvm (2.3)
 +Do I need to downgrade to libstdc++.so.5?​
 +Looking at an ldd:
 +<​code>​
 + ldd /​home/​acanis/​roccc/​roccc-0.6-distribution/​Install/​roccc-compiler/​src/​llvm-2.3//​Release/​bin/​opt
 +/​home/​acanis/​roccc/​roccc-0.6-distribution/​Install/​roccc-compiler/​src/​llvm-2.3//​Release/​bin/​opt:​ /​usr/​lib/​libstdc++.so.6:​ version `GLIBCXX_3.4.14'​ not found (required by /​home/​acanis/​roccc/​roccc-0.6-distribution/​Install/​roccc-compiler/​src/​llvm-2.3//​Release/​bin/​opt)
 +/​home/​acanis/​roccc/​roccc-0.6-distribution/​Install/​roccc-compiler/​src/​llvm-2.3//​Release/​bin/​opt:​ /​usr/​lib/​libstdc++.so.6:​ version `GLIBCXX_3.4.11'​ not found (required by /​home/​acanis/​roccc/​roccc-0.6-distribution/​Install/​roccc-compiler/​src/​llvm-2.3//​Release/​bin/​opt)
 + linux-gate.so.1 =>  (0xb772c000)
 + libsqlite3.so.0 => /​usr/​lib/​libsqlite3.so.0 (0xb7696000)
 + libpthread.so.0 => /​lib/​tls/​i686/​cmov/​libpthread.so.0 (0xb767c000)
 + libdl.so.2 => /​lib/​tls/​i686/​cmov/​libdl.so.2 (0xb7678000)
 + libstdc++.so.6 => /​usr/​lib/​libstdc++.so.6 (0xb7589000)
 + libm.so.6 => /​lib/​tls/​i686/​cmov/​libm.so.6 (0xb7563000)
 + libgcc_s.so.1 => /​lib/​libgcc_s.so.1 (0xb7554000)
 + libc.so.6 => /​lib/​tls/​i686/​cmov/​libc.so.6 (0xb73f1000)
 + /​lib/​ld-linux.so.2 (0xb772d000)
 +</​code>​
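The error means the bundled opt binary wants GLIBCXX symbol versions that the system libstdc++ does not provide. A rough Python sketch for listing what a given library offers, similar in spirit to running strings on it and grepping for GLIBCXX. It scans raw bytes instead of parsing the ELF version tables, so treat it as a heuristic; the function names are made up for this example.

```python
import re


def glibcxx_versions(blob):
    """Return the sorted GLIBCXX_x.y[.z] version tags found in raw library bytes."""
    tags = set(re.findall(rb"GLIBCXX_(\d+(?:\.\d+)*)", blob))
    return sorted(tuple(int(part) for part in tag.decode().split("."))
                  for tag in tags)


def provides(blob, wanted):
    """True if the library bytes contain the tag GLIBCXX_<wanted>."""
    return tuple(int(part) for part in wanted.split(".")) in glibcxx_versions(blob)


if __name__ == "__main__":
    import sys
    # Example: python glibcxx.py /usr/lib/libstdc++.so.6 3.4.14
    if len(sys.argv) == 3:
        with open(sys.argv[1], "rb") as handle:
            data = handle.read()
        print(glibcxx_versions(data)[-1], provides(data, sys.argv[2]))
```

If the highest tag in /usr/lib/libstdc++.so.6 is older than 3.4.14, the binary was built against a newer libstdc++ than the one installed, so downgrading to libstdc++.so.5 would not help.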
 +
 +Damn. I'm not sure what to do here...
 +
 + --- //​[[andrew.canis@utoronto.ca|Andrew Canis]] 2011/09/07 12:15//
 +
 +I need to look into roccc. ​ Try to compile the chstone benchmarks.
 +
 +Todo:
 +1) The interface for the fsm needs to be changed. How do you know how many cycles an instruction takes?
 +2) Makefile needs to have an option for debugging mode
 +3) Gui - cleanup APIs
 +4) forum for mailing list
 +
 +Looking at the gantt chart for popcount is very interesting. So much has been inlined that the gantt
 +chart is huge. There are 53 states.
 +
 +Looking into the multi fmax bug in benchmark.pl. jpeg seems to have the bug:
 +<​code>​
 +Type           : Clock Setup: '​pll50MHz:​pll50|altpll:​altpll_component|_clk0'​
 +Slack          : 1.303 ns
 +Required Time  : 50.00 MHz ( period = 20.000 ns )
 +Actual Time    : 57.49 MHz ( period = 17.394 ns )
 +From           : tiger:​tiger_sopc|data_cache_0:​the_data_cache_0|Cache:​data_cache_0|dcacheMem:​dcacheMemIns|altsyncram:​altsyncram_component|altsyncram_9hd2:​auto_generated|ram_block1a0~porta_address_reg8
 +To             : tiger:​tiger_sopc|tiger_top_0:​the_tiger_top_0|tiger_top:​tiger_top_0|tiger_tiger:​core|tiger_decode:​de|always0~1_Duplicate_OTERM447_OTERM459
 +From Clock     : pll50MHz:​pll50|altpll:​altpll_component|_clk0
 +To Clock       : pll50MHz:​pll50|altpll:​altpll_component|_clk0
 +Failed Paths   : 0
 +
 +Type           : Clock Setup: '​altera_internal_jtag~TCKUTAP'​
 +Slack          : N/A
 +Required Time  : None
 +Actual Time    : 48.41 MHz ( period = 20.658 ns )
 +From           : tiger:​tiger_sopc|tigers_jtag_uart_1:​the_tigers_jtag_uart_1|vJTAGUart:​tigers_jtag_uart_1|FIFO:​DataOut|dcfifo:​dcfifo_component|dcfifo_4sp1:​auto_generated|altsyncram_vu11:​fifo_ram|altsyncram_rd91:​altsyncram14|ram_block15a0~porta_address_reg7
 +To             : sld_hub:​auto_hub|tdo
 +From Clock     : altera_internal_jtag~TCKUTAP
 +To Clock       : altera_internal_jtag~TCKUTAP
 +Failed Paths   : 0
 +</​code>​
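The report above contains one "Clock Setup" entry per clock, so a parser that assumes a single Fmax can pick up the wrong one. A minimal Python sketch of one way to handle this; it is not how benchmark.pl works (that script is not shown here), and skipping the JTAG clock is an assumption made for this example.

```python
import re


def parse_clock_fmax(report):
    """Map each clock named in a 'Clock Setup' entry to its 'Actual Time' in MHz."""
    fmax = {}
    clock = None
    for line in report.splitlines():
        match = re.match(r"\s*Type\s*:\s*Clock Setup:\s*'(.+)'\s*$", line)
        if match:
            clock = match.group(1)
            continue
        match = re.match(r"\s*Actual Time\s*:\s*([\d.]+)\s*MHz", line)
        if match and clock is not None:
            fmax[clock] = float(match.group(1))
            clock = None
    return fmax


def design_fmax(report, ignore=("altera_internal_jtag",)):
    """Fmax of the slowest clock, skipping clocks whose name starts with an ignored prefix."""
    candidates = {name: mhz for name, mhz in parse_clock_fmax(report).items()
                  if not name.startswith(tuple(ignore))}
    return min(candidates.values()) if candidates else None


# Shortened copy of the two entries in the jpeg report above.
sample = """\
Type           : Clock Setup: 'pll50MHz:pll50|altpll:altpll_component|_clk0'
Actual Time    : 57.49 MHz ( period = 17.394 ns )
Type           : Clock Setup: 'altera_internal_jtag~TCKUTAP'
Actual Time    : 48.41 MHz ( period = 20.658 ns )
"""

print(parse_clock_fmax(sample))
print(design_fmax(sample))
```

Keying the result by clock name keeps the decision about which clock counts as the design Fmax separate from the parsing.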
 +
 +I'm recompiling jpeg in quartus to double check this.
 +Strange. jpeg doesn't compile for me.
 +Very strange. I have a blank function that gets called 3 times:
 +<​code>​
 +declare void @mexit_spin(i32) noreturn
 +</​code>​
 +
 +How did buildbot not catch this? Okay, never mind: it was due to some new changes I've been making.
 +Retesting with a fresh copy of the repository. Remember: to compile with Quartus, use "make p" to set up
 +the project, then "make f".
 +
 +
 +
 + --- //​[[andrew.canis@utoronto.ca|Andrew Canis]] 2011/08/23 12:15//
 +
 +I should probably add a forum to the legup website.
 +Actually what I really need to do is turn the mailing list into more of a forum.
 +Like the nabble forum for llvm.
 +
 +
 +I think writing a gui is actually very useful, because it will force me to clean up the APIs.
 +
 +How to inline everything?
 +What does inline-threshold do?
 +From the code:
 +<​code>​
 +InlineLimit("​inline-threshold",​ cl::Hidden, cl::​init(225),​ cl::​ZeroOrMore,​
 +        cl::​desc("​Control the amount of inlining to perform (default = 225)"​));​
 +</​code>​
 +
 +I'm going to try to run opt on adpcm:
 +<​code>​
 +acanis@acanis-desktop:​~/​work/​legup/​examples/​chstone/​adpcm$ ../​../​../​llvm/​Debug+Asserts/​bin/​opt -debug -inline -inline-threshold=0 < adpcm.bc > adpcm.new.bc;​ ../​../​../​llvm/​Debug+Asserts/​bin/​llvm-dis adpcm.new.bc
 +Args: ../​../​../​llvm/​Debug+Asserts/​bin/​opt -debug -inline -inline-threshold=0 ​
 +Inliner visiting SCC: upzero: 0 call sites.
 +Inliner visiting SCC: INDIRECTNODE:​ 0 call sites.
 +Inliner visiting SCC: printf: 0 call sites.
 +Inliner visiting SCC: main: 4 call sites.
 +    NOT Inlining: cost=370, thres=0, Call:   tail call fastcc void @upzero(i32 %76, i32* getelementptr inbounds ([6 x i32]* @delay_dltx,​ i32 0, i32 0), i32* getelementptr inbounds ([6 x i32]* @delay_bpl, i32 0, i32 0)) nounwind
 +    NOT Inlining: cost=370, thres=0, Call:   tail call fastcc void @upzero(i32 %148, i32* getelementptr inbounds ([6 x i32]* @delay_dhx, i32 0, i32 0), i32* getelementptr inbounds ([6 x i32]* @delay_bph, i32 0, i32 0)) nounwind
 +    NOT Inlining: cost=370, thres=0, Call:   tail call fastcc void @upzero(i32 %258, i32* getelementptr inbounds ([6 x i32]* @dec_del_dltx,​ i32 0, i32 0), i32* getelementptr inbounds ([6 x i32]* @dec_del_bpl,​ i32 0, i32 0)) nounwind
 +    NOT Inlining: cost=370, thres=0, Call:   tail call fastcc void @upzero(i32 %333, i32* getelementptr inbounds ([6 x i32]* @dec_del_dhx,​ i32 0, i32 0), i32* getelementptr inbounds ([6 x i32]* @dec_del_bph,​ i32 0, i32 0)) nounwind
 +Inliner visiting SCC: INDIRECTNODE:​ 0 call sites.
 +</​code>​
 +
 +So even though I've set the inline-threshold, nothing gets inlined. Oh wait, you have to set the threshold high, not low:
 +a call site is only inlined when its cost is below the threshold.
 +To inline all functions run:
 +<​code>​
 +acanis@acanis-desktop:​~/​work/​legup/​examples/​chstone/​adpcm$ ../​../​../​llvm/​Debug+Asserts/​bin/​opt -debug -inline-threshold=100000 -inline < adpcm.bc > adpcm.new.bc;​ ../​../​../​llvm/​Debug+Asserts/​bin/​llvm-dis adpcm.new.bc
 +Args: ../​../​../​llvm/​Debug+Asserts/​bin/​opt -debug -inline-threshold=100000 -inline ​
 +Inliner visiting SCC: upzero: 0 call sites.
 +Inliner visiting SCC: INDIRECTNODE:​ 0 call sites.
 +Inliner visiting SCC: printf: 0 call sites.
 +Inliner visiting SCC: main: 4 call sites.
 +    Inlining: cost=370, thres=100000,​ Call:   tail call fastcc void @upzero(i32 %76, i32* getelementptr inbounds ([6 x i32]* @delay_dltx,​ i32 0, i32 0), i32* getelementptr inbounds ([6 x i32]* @delay_bpl, i32 0, i32 0)) nounwind
 +    Inlining: cost=370, thres=100000,​ Call:   tail call fastcc void @upzero(i32 %410, i32* getelementptr inbounds ([6 x i32]* @dec_del_dhx,​ i32 0, i32 0), i32* getelementptr inbounds ([6 x i32]* @dec_del_bph,​ i32 0, i32 0)) nounwind
 +    Inlining: cost=370, thres=100000,​ Call:   tail call fastcc void @upzero(i32 %335, i32* getelementptr inbounds ([6 x i32]* @dec_del_dltx,​ i32 0, i32 0), i32* getelementptr inbounds ([6 x i32]* @dec_del_bpl,​ i32 0, i32 0)) nounwind
 +    Inlining: cost=-14630,​ thres=100000,​ Call:   tail call fastcc void @upzero(i32 %225, i32* getelementptr inbounds ([6 x i32]* @delay_dhx, i32 0, i32 0), i32* getelementptr inbounds ([6 x i32]* @delay_bph, i32 0, i32 0)) nounwind
 +    -> Deleting dead function: upzero
 +CGSCCPASSMGR:​ Refreshing SCC with 1 nodes:
 +Call graph node for function: '​main'<<​0x9d0da18>> ​ #uses=1
 +  CS<​0x9d297a4>​ calls function '​printf'​
 +
 +CGSCCPASSMGR:​ SCC Refresh didn't change call graph.
 +Inliner visiting SCC: INDIRECTNODE:​ 0 call sites.
 +</​code>​
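The two opt runs above are consistent with a simple rule: a call site is inlined only when its cost is below the threshold. A simplified Python model of that rule, checked against the numbers in the log. The real inliner has more inputs (function attributes, for instance), and the 15000 bonus below is inferred from the log (370 - 15000 = -14630 for the last remaining call to upzero), not read from the LLVM sources.

```python
# Inferred from the log output above, not taken from LLVM headers.
LAST_CALL_BONUS = 15000


def call_cost(base_cost, last_call_to_internal=False):
    """Cost of a call site; the last call to an internal function gets a bonus."""
    return base_cost - LAST_CALL_BONUS if last_call_to_internal else base_cost


def should_inline(cost, threshold):
    """Simplified decision: inline only when the cost is strictly below the threshold."""
    return cost < threshold


# (cost, threshold) pairs: the two runs above plus the default threshold of 225.
observed = [(370, 0), (370, 225), (370, 100000),
            (call_cost(370, last_call_to_internal=True), 100000)]
for cost, threshold in observed:
    verdict = "Inlining" if should_inline(cost, threshold) else "NOT Inlining"
    print(f"{verdict}: cost={cost}, thres={threshold}")
```

Under this model the default threshold of 225 also rejects upzero (cost 370), which would explain why it was not inlined without the flag.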
 +
 + --- //​[[andrew.canis@utoronto.ca|Andrew Canis]] 2011/08/23 12:15//
 +
 +The linker seems to be optimizing away everything in the sw/hw partitioning case.
 +
 +To remove untracked files in git (be careful, this deletes them permanently).
 +The -d flag also removes untracked directories and -x also removes ignored files:
 +<​code>​
 +git clean -fdx 
 +</​code>​
 +
 +Makefile dependencies are annoying. Dry run can help you see what's going on:
 +<​code>​
 +acanis@acanis-desktop:​~/​git/​legup$ make -n
 +mkdir -p cloog/​install
 +cd cloog && ./configure --prefix=/​home/​acanis/​git/​legup/​cloog/​install
 +make -C cloog
 +make install -C cloog
 +cd llvm && ./configure --with-cloog=/​home/​acanis/​git/​legup/​cloog/​install --with-isl=/​home/​acanis/​git/​legup/​cloog/​install
 +make -C mips-binutils
 +make -C llvm
 +make -C tiger/​hybrid/​processor
 +make -C tiger/​processor
 +make clean -C tiger/​linux_tools
 +make -C tiger/​linux_tools
 +make clean -C examples/​lib/​llvm
 +make -C examples/​lib/​llvm
 +</​code>​
 +
 +Okay I figured out why -j doesn'​t propagate to recursive calls of make. I need
 +to call $(MAKE) instead of '​make'​. So make -j4 works fine now.
 +The only problem is this screws up the nice clean "make -n" shown above.
 +
 +There'​s a slight dependency problem with the Transforms/​LegUp makefile:
 +<​code>​
 +LDFLAGS = $(LLVM_OBJ_ROOT)/​lib/​CodeGen/​$(BuildMode)/​IntrinsicLowering.o
 +</​code>​
 +Sometimes CodeGen isn't built before this line is used, so the link fails.
 +
 +To reproduce run:
 +<​code>​
 +rm -rf llvm/​lib/​Transforms/​LegUp/​Release+Asserts/​ llvm/​lib/​CodeGen/​Release+Asserts/​
 +</​code>​
 +And you'll see:
 +<​code>​
 +llvm[4]: Linking Release+Asserts Loadable Module LLVMLegUp.so
 +g++: /​home/​acanis/​git/​legup/​llvm/​lib/​CodeGen/​Release+Asserts/​IntrinsicLowering.o:​ No such file or directory
 +</​code>​
 +
 +
 +
 + --- //​[[andrew.canis@utoronto.ca|Andrew Canis]] 2011/08/23 12:15//
 +
 +Trying to move IterativeModuloScheduling into the Target/​Verilog directory.
 +Running into the same errors as before:
 +<​code>​
 +IterativeModuloScheduling.cpp:​18:​33:​ error: polly/​LinkAllPasses.h:​ No such file or directory
 +IterativeModuloScheduling.cpp:​19:​37:​ error: polly/​Support/​GICHelper.h:​ No such file or directory
 +IterativeModuloScheduling.cpp:​20:​38:​ error: polly/​Support/​ScopHelper.h:​ No such file or directory
 +IterativeModuloScheduling.cpp:​21:​25:​ error: polly/​Cloog.h:​ No such file or directory
 +IterativeModuloScheduling.cpp:​22:​31:​ error: polly/​Dependences.h:​ No such file or directory
 +IterativeModuloScheduling.cpp:​23:​28:​ error: polly/​ScopInfo.h:​ No such file or directory
 +IterativeModuloScheduling.cpp:​24:​32:​ error: polly/​TempScopInfo.h:​ No such file or directory
 +IterativeModuloScheduling.cpp:​39:​25:​ error: cloog/​cloog.h:​ No such file or directory
 +IterativeModuloScheduling.cpp:​40:​29:​ error: cloog/​isl/​cloog.h:​ No such file or directory
 +</​code>​
 +
 +In tools/​polly/​lib/​Makefile there is the line:
 +<​code>​
 +CPP.Flags += $(POLLY_INC)
 +</​code>​
 +
 +Where POLLY_INC is defined in the polly Makefile.config file.
 +I need to add this include path in the base makefile.
 +Okay I can just add this to the Target/​Verilog makefile:
 +<​code>​
 +CPP.Flags += -I$(LLVM_SRC_ROOT)/​../​cloog/​install/​include \
 + -I$(LLVM_SRC_ROOT)/​tools/​polly/​include
 +</​code>​
 +
 +Actually I'm going to move this to the Transforms/​LegUp directory so I can
 +run this as a prepass.
 +
 +This is annoying:
 +<​code>​
 +../​../​llvm/​Debug+Asserts/​bin/​opt -load=../​../​llvm/​Debug+Asserts/​lib/​LLVMLegUp.so -legup-prelto < pipeline.prelto.linked.bc > pipeline.prelto.bc
 +Error opening '​../​../​llvm/​Debug+Asserts/​lib/​LLVMLegUp.so':​ ../​../​llvm/​Debug+Asserts/​lib/​LLVMLegUp.so:​ undefined symbol: _ZNK5polly8ScopPass5printERN4llvm11raw_ostreamEPKNS1_6ModuleE
 +</​code>​
 +
 +The polly shared library is stored in the tools directory now:
 +<​code>​
 +./​tools/​polly/​Debug+Asserts/​lib/​LLVMPolly.so
 +</​code>​
 +
 +This problem again:
 +<​code>​
 +Error opening '​../​../​llvm/​tools/​polly/​Debug+Asserts/​lib/​LLVMPolly.so':​ libisl.so.7:​ cannot open shared object file: No such file or directory
 +</​code>​
 +Okay, I can fix this by loading these shared libraries manually.
 +
 +Missing the SchedulerDAG from the Target/​Verilog:​
 +<​code>​
 +Error opening '​../​../​llvm/​Debug+Asserts/​lib/​LLVMLegUp.so':​ ../​../​llvm/​Debug+Asserts/​lib/​LLVMLegUp.so:​ undefined symbol: _ZN5legup17LegupSchedulerDAG2IDE ​
 +</​code>​
 +
 +And SchedulerPass:​
 +<​code>​
 +../​../​llvm/​Debug+Asserts/​bin/​opt:​ symbol lookup error: ../​../​llvm/​Debug+Asserts/​lib/​LLVMLegUp.so:​ undefined symbol: _ZN5legup13SchedulerPass14canChainBeforeEPN4llvm11InstructionE
 +</​code>​
 +
 +Had to add the .o files to the makefile:
 +<​code>​
 +          $(LLVM_OBJ_ROOT)/​lib/​Target/​Verilog/​$(BuildMode)/​LegupSchedulerDAG.o \
 +          $(LLVM_OBJ_ROOT)/​lib/​Target/​Verilog/​$(BuildMode)/​SchedulerPass.o
 +</​code>​
 +
 +Okay. Seems to be working now. Let's just confirm I get the same results as before
 +and I can commit what I have so far. 
 +I'm seeing 407 cycles for examples/​pipeline:​
 +<​code>​
 +# run 7000000000000000ns
 +# At t=              815000 cycles= ​                407 clk=1 finish=1 return_val= ​        0
 +# ** Note: $finish ​   : pipeline.v(1390)
 +</​code>​
 +Interesting. Before the update I was seeing ~500 cycles.
 +This might be due to the new sdc scheduler?
 +Anyway. It still should be ~300 cycles so I need to fix the cross basic block
 +latency issue.
 +Oh wait. It's because I've modified pipeline.c to have a loop carried dependence.
 +Nope. I changed it back and the cycle count doesn't change.
 +Luckily I saved everything in examples/​pipeline/​ece1754/​
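 +
 +To compare cycle counts across runs without eyeballing the transcript, something like this could pull the number out of the status line. A minimal sketch, assuming the testbench always prints a line of the form shown above; parse_cycles is a hypothetical helper, not part of the LegUp scripts:
 +
```python
import re

# Matches the status line printed at the end of simulation, e.g.
# "# At t= 815000 cycles= 407 clk=1 finish=1 return_val= 0".
_STATUS = re.compile(r"cycles=\s*(\d+).*?return_val=\s*(-?\d+)")


def parse_cycles(transcript):
    """Return (cycles, return_val) from the last status line, or None."""
    result = None
    for line in transcript.splitlines():
        match = _STATUS.search(line)
        if match:
            result = (int(match.group(1)), int(match.group(2)))
    return result


sample = """# run 7000000000000000ns
# At t=              815000 cycles=                 407 clk=1 finish=1 return_val=         0
# ** Note: $finish    : pipeline.v(1390)
"""

if __name__ == "__main__":
    print(parse_cycles(sample))
```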
 +
 +I need to actually distribute the gantt.sty latex style.
 +
 +Trying to debug this tiger issue.
 +I'm running "make tigersim"​. Here is what I see after a while:
 +The runtest.log is stalled at:
 +<​code>​
 +Running ./​../​dejagnu/​tiger_sim/​dfadd.exp ...
 +</​code>​
 +
 +Looking at htop:
 +<​code>​
 + 1899 acanis ​   20   ​0 ​ 3336  1008   784 S  0.0  0.0  0:​00.00 ​ |       ​| ​  `- make test_tiger_sim
 + 1901 acanis ​   20   ​0 ​ 8360  4792  1560 S  0.0  0.2  0:​01.22 ​ |       ​| ​      `- /​usr/​bin/​expect -- /​usr/​share/​dejagnu/​runtest.exp -v -v -v -v --all --status=1 ../​dejagnu/​tiger_sim/​adpcm.exp ../​dejagnu/​tiger_sim/​aes.exp ../d
 + 2164 acanis ​   20   ​0 ​ 3468  1056   804 S  0.0  0.0  0:​00.00 ​ |       ​| ​          `- make tigersim
 + 2189 acanis ​   20   ​0 ​ 3944  1244  1044 S  0.0  0.0  0:​00.00 ​ |       ​| ​          ​| ​  `- /bin/bash -e -c cd /​home/​acanis/​work/​legup/​examples/​chstone/​dfadd/​../​../​../​tiger/​processor/​tiger_DE2/​tiger_sim && ./simulate
 + 2190 acanis ​   20   ​0 ​ 4460  1068   600 S  0.0  0.0  0:​00.00 ​ |       ​| ​          ​| ​      `- /bin/bash -e -c cd /​home/​acanis/​work/​legup/​examples/​chstone/​dfadd/​../​../​../​tiger/​processor/​tiger_DE2/​tiger_sim && ./simulate
 + 2197 acanis ​   20   ​0 ​ 3068   ​628 ​  532 S  0.0  0.0  0:​00.00 ​ |       ​| ​          ​| ​          `- tee transcript.txt
 + 2196 acanis ​   20   0 17016  8416  3004 S  0.0  0.3  0:​00.29 ​ |       ​| ​          ​| ​          `- vish -- -vsim -c -do ../​run_sim_nowave.tcl
 + 2212 acanis ​   20   0 71084 18480  3964 R 96.0  0.7  9:​24.91 ​ |       ​| ​          ​| ​              `- /​opt/​modelsim/​install/​modeltech/​linux/​vsimk -port 56352 -stdoutfilename /​tmp/​VSOUTwrXtYr -c -do ../​run_sim_nowave.tcl
 + 2213 acanis ​   20   0 71084 18480  3964 S  0.0  0.7  0:​00.00 ​ |       ​| ​          ​| ​              ​| ​  `- /​opt/​modelsim/​install/​modeltech/​linux/​vsimk -port 56352 -stdoutfilename /​tmp/​VSOUTwrXtYr -c -do ../​run_sim_nowave.tcl
 + 2205 acanis ​   20   ​0 ​ 3172   ​960 ​  784 S  0.0  0.0  0:​00.00 ​ |       ​| ​          ​| ​              `- /​opt/​modelsim/​install/​modeltech/​linux/​vlm 1598714592 1226522872
 + 2206 acanis ​   20   ​0 ​ 4320  2784  1388 S  0.0  0.1  0:​00.14 ​ |       ​| ​          ​| ​                  `- /​opt/​modelsim/​install/​modeltech/​linux/​mgls/​lib/​mgls_asynch ​ -f6,10
 + 1925 acanis ​   20   ​0 ​ 8360  4792  1560 S  0.0  0.2  0:​00.00 ​ |       ​| ​          `- /​usr/​bin/​expect -- /​usr/​share/​dejagnu/​runtest.exp -v -v -v -v --all --status=1 ../​dejagnu/​tiger_sim/​adpcm.exp ../​dejagnu/​tiger_sim/​aes.exp
 + 2236 acanis ​   20   ​0 ​ 6812  4056  1448 S  0.0  0.1  0:​00.12 ​ |   
 +</​code>​
 +
 +Looking in /​tmp/​VSOUTwrXtYr all I see is:
 +<​code>​
 +...
 +Tap Controller State machine output error
 +Time: 0  Instance: test_bench.DUT.the_tiger_top_0.tiger_top_0.debug_controller.VJTInst.sld_virtual_jtag_component.jtag.output_logic
 +a_input=
 +</​code>​
 +
 + --- //​[[andrew.canis@utoronto.ca|Andrew Canis]] 2011/08/23 12:15//
 +
 +I need to add something to shrink the integer sizes down.
 +There is a presentation here:
 +<​code>​
 +    llvm.org/​pubs/​2007-07-25-LLVM-2.0-and-Beyond.pdf
 +</​code>​
 +
 +The llvm 2.0 release added arbitrary precision integers:
 +<​code>​
 +Primarily useful to EDA / hardware synthesis business:
 +  * An 11-bit multiplier is significantly cheaper/​smaller than a 16-bit one
 +  * Can use LLVM analysis/​optimization framework to shrink variable widths
 +  * Patch available that adds an attribute in llvm-gcc to get this
 +Implementation impact of arbitrary width integers:
 +  * Immediates, constant folding, intermediate arithmetic simplifications
 +  * New APInt class used internally to represent/​manipulate these
 +  * Makes LLVM more portable, not using uint64_t everywhere for arithmetic
 +</​code>​
 +
 +I need to get my hands on that patch, but I can't seem to find it. I'll have
 +to implement this myself.
 +
 +For instance, I think this was the case Stefan was looking at in mips:
 +<​code>​
 +  %6 = phi i32 [ %227, %226 ], [ 0, %.preheader ]
 +  %7 = lshr i32 %pc.0, 2
 +  %8 = and i32 %7, 63
 +</​code>​
 +
 +63 is all zeros and then 6 ones. So the above code can be turned into:
 +<​code>​
 +  %pc.0 = phi i32 [ %pc.1, %226 ], [ 4194304, %.preheader ]
 +  %7 = lshr i32 %pc.0, 2
 +  %8 = trunc i32 %7 to i6
 +  %9 = and i6 %8, 63
 +  %10 = zext i6 %9 to i32
 +</​code>​
 +
 +Need to run this pass after link time optimization.
 +What is the impact of this change? Probably won't affect area because
 +quartus would have already made this optimization. Let's double-check.
 +No change in cycles. Yep, no impact on area.
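 +
 +To convince myself the narrowing is safe, here is a small model of the two versions. This is only a sketch in Python, not the pass itself; bits_needed, wide_and and narrowed_and are made-up names, and it only covers the lshr/and pattern from the mips example:
 +
```python
def bits_needed(mask):
    """Smallest unsigned width that can hold every value of (x & mask)."""
    return max(1, mask.bit_length())


def wide_and(x, shift, mask):
    """Original form: lshr i32 followed by and i32."""
    return ((x & 0xFFFFFFFF) >> shift) & mask


def narrowed_and(x, shift, mask):
    """Narrowed form: lshr i32, trunc to iN, and iN, zext back to i32."""
    width = bits_needed(mask)
    shifted = (x & 0xFFFFFFFF) >> shift          # lshr i32
    truncated = shifted & ((1 << width) - 1)     # trunc i32 -> iN
    return truncated & mask                      # and iN (zext adds only zeros)


if __name__ == "__main__":
    print(bits_needed(63))
    for pc in (0, 4, 252, 256, 4194304, 0xFFFFFFFF):
        print(pc, wide_and(pc, 2, 63), narrowed_and(pc, 2, 63))
```
 +
 +Both forms agree for every input because the mask already clears everything above bit 5, so the trunc never throws away a bit that the and would have kept.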
 +
 + --- //​[[andrew.canis@utoronto.ca|Andrew Canis]] 2011/08/22 12:15//
 +
 +Getting rid of the array initialization takes us down to:
 +<​code>​
 +73735 / 2 = 36867 cycles
 +</​code>​
 +So that saves exactly 1024 cycles.
 +Just noticed that there are actually no stores happening in the code right now.
 +So that's actually cheating. Where are all these cycles coming from?
 +Is it roughly 32 * 1024 = 32768?
 +Whereas fully pipelined you can do it in 4 * 1024 = 4096.
 +I forgot about unrolling. Would that fix this?
 +
 +Interesting. Running this command (note the -debug option):
 +<​code>​
 +opt -mem2reg -loops -loop-simplify -loop-unroll -unroll-threshold=192 -debug  ​
 +</​code>​
 +
 +This fully unrolls the 32-iteration inner loop but leaves the bigger outer loop.
 +<​code>​
 +Loop Unroll: F[main] Loop %
 +  Loop Size = 105
 +  Too large to fully unroll with count: 1024 because size: 107520>​192
 +  will not try to unroll partially because -unroll-allow-partial not given
 +</​code>​
 +
 +Now we finish in 26631 / 2 = 13315 cycles.
 +So quite a bit better, but still worse than c-to-verilog.
 +Try to partially unroll the outer loop? Doesn't work. How about fully unrolling the outer loop
 +by setting the threshold to 107520? Wow, that produces a lot of code. Turning off the -debug flag.
 +Now llc is taking forever. Oh, this is stupid. llvm just optimizes everything away; all that's left is printf statements.
 +
 +Interesting. ​ -unroll-allow-partial works if I increase unroll-threshold to 512:
 +<​code>​
 +Loop Unroll: F[main] Loop %
 +  Loop Size = 105
 +  Too large to fully unroll with count: 1024 because size: 107520>​512
 +  partially unrolling with count: 4
 +  Trip Count = 1024
 +UNROLLING loop % by 4 with a breakout at trip 0!
 +</​code>​
 +
 +I guess this is because 4 copies of the loop body (4 * 105 = 420) is the most that fits under the 512 threshold, and 4 divides the trip count of 1024.
 +The cycle count doesn't change at all though.
 +So we can still get a 3x improvement by pipelining, which is expected because I think the inner
 +loop has about 3 dependent operations.
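 +
 +The unroll counts in the debug output can be reproduced with a small model. This is a sketch of how I assume the heuristic works (largest count under the threshold that divides the trip count), inferred from the log messages rather than copied from the LLVM source; partial_unroll_count is a made-up name:
 +
```python
def partial_unroll_count(loop_size, trip_count, threshold):
    """Return the unroll count, or 0 if the loop is left alone.

    Assumed heuristic: fully unroll when the whole loop fits under the
    threshold, otherwise take the largest count that fits and divides
    the trip count evenly.
    """
    if loop_size * trip_count <= threshold:
        return trip_count                     # full unroll
    count = threshold // loop_size            # most copies that fit
    while count > 0 and trip_count % count != 0:
        count -= 1                            # must divide the trip count
    return count if count >= 2 else 0         # a count of 1 is no unrolling


if __name__ == "__main__":
    for threshold in (192, 512, 107520):
        print(threshold, partial_unroll_count(105, 1024, threshold))
```
 +
 +With a loop size of 105 and a trip count of 1024 this gives no unrolling at 192, a count of 4 at 512, and a full unroll at 107520, which matches what opt reported.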
 +
 +There'​s a bug with the new polly:
 +<​code>​
 +acanis@acanis-desktop:​~/​work/​legup/​examples/​popcount$ ~/​work/​legup/​llvm/​Debug+Asserts/​bin/​opt -load /​home/​acanis/​work/​legup/​llvm/​tools/​polly/​Debug+Asserts/​lib/​LLVMPolly.so ​
 +Error opening '/​home/​acanis/​work/​legup/​llvm/​tools/​polly/​Debug+Asserts/​lib/​LLVMPolly.so':​ libisl.so.7:​ cannot open shared object file: No such file or directory
 +  -load request ignored.
 +</​code>​
 +Damn. Can I statically link it in? Well for now I'll just do:
 +<​code>​
 +export LD_LIBRARY_PATH=/​home/​acanis/​git/​legup/​cloog/​install/​lib/:​$LD_LIBRARY_PATH
 +</​code>​
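 +
 +A quick way to check which directory would satisfy a missing library before running opt. This is only a sketch: it models the LD_LIBRARY_PATH lookup and nothing else (the real loader also consults rpath and the ld.so cache), and find_shared_object is a hypothetical helper:
 +
```python
import os


def find_shared_object(name, search_path, isfile=os.path.isfile):
    """Return the first directory in search_path containing name, or None.

    search_path uses the same separator as LD_LIBRARY_PATH. The isfile
    argument exists only so the lookup can be exercised without touching
    the real filesystem.
    """
    for directory in search_path.split(os.pathsep):
        if directory and isfile(os.path.join(directory, name)):
            return directory
    return None


if __name__ == "__main__":
    path = os.environ.get("LD_LIBRARY_PATH", "")
    print(find_shared_object("libisl.so.7", path))
```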
 +
 +Getting an error with pollycc:
 +<​code>​
 +acanis@acanis-desktop:​~/​work/​legup/​examples/​popcount$ ~/​work/​legup/​llvm/​tools/​polly/​utils/​pollycc ​ popcount.c ​
 +Polly support not available in opt
 +</​code>​
 +
 +Looks like the python script parses the output of the opt help:
 +<​code>​
 +['​opt',​ '​-load',​ '/​home/​acanis/​work/​legup/​llvm/​tools/​polly/​Debug+Asserts/​lib/​LLVMPolly.so',​ '​-help'​]
 +</​code>​
 +Wrong opt, updating my PATH
 +<​code>​
 +export PATH=/​home/​acanis/​work/​legup/​llvm//​Debug+Asserts/​bin/:​$PATH
 +</​code>​
 +Seems to be working now. pollycc produces an a.out file
 +
 +Okay, my old code was in:
 +<​code>​
 +/​home/​acanis/​work/​legup/​llvm/​tools/​polly_old/​lib/​IterativeModuloScheduling.cpp
 +</​code>​
 +
 +Note: runOnScop() immediately returns false right now.
 +Okay, I need to figure out how to move this file out of the polly directory...
 +
 + --- //​[[andrew.canis@utoronto.ca|Andrew Canis]] 2011/08/19 12:15//
 +
 +Getting a strange error from quartus. "Word too long". Okay turns out my PATH is longer than 1024 characters.
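 +
 +Most of the length came from repeated entries, so dropping duplicates is the easy fix. A minimal sketch (dedupe_path is a made-up helper; the 1024 character limit is just what I observed above):
 +
```python
import os


def dedupe_path(path, sep=os.pathsep):
    """Drop empty and repeated entries, keeping the first occurrence of each."""
    seen = set()
    kept = []
    for entry in path.split(sep):
        if entry and entry not in seen:
            seen.add(entry)
            kept.append(entry)
    return sep.join(kept)


if __name__ == "__main__":
    current = os.environ.get("PATH", "")
    shorter = dedupe_path(current)
    print(len(current), "->", len(shorter))
```
 +
 +Keeping the first occurrence preserves lookup order, so the same binaries still win.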
 +
 +Looking at the c-to-verilog example. I think Nadav had pipelining implemented.
 +There is a testbench inside the code. It looks like the two array parameters come from two dual-ported BRAMs.
 +There is some initialization of the arrays in the testbench:
 +<​code>​
 +   ​integer i;
 +   ​initial begin
 +       for (i = 0; i < (1<<​(ADDRESS_WIDTH-1));​ i = i + 1) begin
 +       ​mem[i] <= i;
 +     end
 +   end
 +</​code>​
 +Looks like the mem is just initialized to 0, 1, 2, ...
 +
 +Is it typical to pass arrays into the main module like this in c-to-verilog?​
 +
 +We take significantly longer: 75783/2 = 37891 cycles
 +vs ctoverilog: 41050ns / 10 = 4105 cycles.
 +So about 10x slower. As expected, because we don't have pipelining.
 +
 +There are memory accesses every 40 / 10 = 4 cycles
 +
 +<​code>​
 +#                40975w mem[ 1021] ==       1021; in=         9
 +#                41015w mem[ 1022] ==       1022; in=         9
 +#                41055w mem[ 1023] ==       1023; in=        10
 +</​code>​
 +
 +Having enough memory ports is crucial. Here there are actually 4 ports available.
 +When we pipeline this we will only have 1...
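 +
 +The arithmetic above, plus a lower bound showing why the port count matters. A rough sketch: the 4 accesses per iteration is my assumption for illustration, not something measured, and both function names are made up:
 +
```python
def cycles(sim_time, time_per_cycle):
    """Convert a simulation end time into whole clock cycles."""
    return sim_time // time_per_cycle


def min_cycles_for_memory(accesses, ports):
    """Lower bound on cycles when each port serves one access per cycle."""
    return -(-accesses // ports)   # ceiling division


if __name__ == "__main__":
    legup = cycles(75783, 2)        # our design
    ctov = cycles(41050, 10)        # c-to-verilog
    print(legup, ctov, round(legup / ctov, 1))

    # Hypothetical: 4 memory accesses per iteration, 1024 iterations.
    accesses = 4 * 1024
    print(min_cycles_for_memory(accesses, 4), min_cycles_for_memory(accesses, 1))
```
 +
 +Under that assumption a single port caps a pipelined loop at 4096 cycles, while 4 ports would allow 1024.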
 +
 + --- //​[[andrew.canis@utoronto.ca|Andrew Canis]] 2011/08/18 12:15//
 +
 +Okay. Still a few modelsim warnings on mips, fir, memset.
 +Fixed.
 +
 + --- //​[[andrew.canis@utoronto.ca|Andrew Canis]] 2011/08/15 12:15//
 +
 +Recompiling llvm-gcc 2.8 on the eecg machines. ​
 +First you need to compile llvm 2.8
 +<​code>​
 +acanis@navy:​~/​llvm-2.8$ ./configure
 +acanis@navy:​~/​llvm-2.8$ make -j 2 ENABLE_OPTIMIZED=1
 +</​code>​
 +Then compile llvm-gcc:
 +<​code>​
 +acanis@navy:​~/​llvm-gcc-4.2-2.8.source/​obj$ ../​configure --target=i686-pc-linux-gnu --with-tune=generic --with-arch=pentium4 --prefix=`pwd`/​../​install --program-prefix=llvm- --enable-llvm=/​home/​acanis/​llvm-2.8/​ --enable-languages=c,​c++
 +acanis@navy:​~/​llvm-gcc-4.2-2.8.source/​obj$ make -j2 LLVM_VERSION_INFO=2.8
 +</​code>​
 +
 +Received the error:
 +<​code>​
 +/​brown/​r/​r0/​acanis/​llvm-gcc-4.2-2.8.source/​obj/​./​gcc/​xgcc -B/​brown/​r/​r0/​acanis/​llvm-gcc-4.2-2.8.source/​obj/​./​gcc/​ -B/​brown/​r/​r0/​acanis/​llvm-gcc-4.2-2.8.sourc
 +e/​obj/​../​install/​i686-pc-linux-gnu/​bin/​ -B/​brown/​r/​r0/​acanis/​llvm-gcc-4.2-2.8.source/​obj/​../​install/​i686-pc-linux-gnu/​lib/​ -isystem /​brown/​r/​r0/​acanis/​llvm-g
 +cc-4.2-2.8.source/​obj/​../​install/​i686-pc-linux-gnu/​include -isystem /​brown/​r/​r0/​acanis/​llvm-gcc-4.2-2.8.source/​obj/​../​install/​i686-pc-linux-gnu/​sys-include  ​
 +-O2  -O2 -g -O2  -DIN_GCC -DCROSS_DIRECTORY_STRUCTURE ​  -W -Wall -Wwrite-strings -Wstrict-prototypes -Wmissing-prototypes -Wold-style-definition ​ -isystem ./
 +include ​ -fPIC -g -DHAVE_GTHR_DEFAULT -DIN_LIBGCC2 -D__GCC_FLOAT_NOT_NEEDED -Dinhibit_libc -msse -c \
 +                ../​../​gcc/​config/​i386/​crtfastmath.c \
 +                -o crtfastmath.o
 +/​brown/​r/​r0/​acanis/​llvm-gcc-4.2-2.8.source/​obj/​./​gcc/​as:​ line 2: exec: -Q: invalid option
 +</​code>​
 +
 +I'm going to try again but get rid of x86 specific target stuff:
 +<​code>​
 +acanis@navy:​~/​llvm-gcc-4.2-2.8.source/​obj$ ../​configure --prefix=`pwd`/​../​install --program-prefix=llvm- --enable-llvm=/​home/​acanis/​llvm-2.8/​ --enable-languages=c,​c++
 +acanis@navy:​~/​llvm-gcc-4.2-2.8.source/​obj$ make -j2 LLVM_VERSION_INFO=2.8
 +acanis@navy:​~/​llvm-gcc-4.2-2.8.source/​obj$ make install
 +</​code>​
 +
 +Okay that worked. The new version of llvm-gcc is in:
 +<​code>​
 +~/​llvm-gcc-4.2-2.8.source/​install/​bin
 +</​code>​
 +
 +Okay. So the mips bug was related to the fact we're still using llvm-gcc. I think we should move to clang. clang 2.9 works fine on mint.
 +
 +Basic blocks don't have names in clang. This will make debugging more difficult.
 +Clang has some warnings on the benchmarks: jpeg, malloc
 +
 +memset fails with:
 +<​code>​
 +FAIL: memset
 +Dest Pointer: i8* %arr
 +Unknown pointer destination in intrinsic argument
 +UNREACHABLE executed at PreLTO.cpp:​159!
 +0  opt          0x088d7379
 +1  opt          0x088d7a41
 +2               ​0x4001e400 __kernel_sigreturn + 0
 +3  libc.so.6 ​   0x402ae098 abort + 392
 +4  opt          0x088c4788 llvm::​report_fatal_error(llvm::​Twine const&) + 0
 +5  LLVMLegUp.so 0x404175a5 legup::​LegUp::​getIntrinsicMemoryAlignment(llvm::​CallInst*) + 255
 +6  LLVMLegUp.so 0x40417846 legup::​LegUp::​lowerLegupInstrinsic(llvm::​CallInst*,​ llvm::​Function*) + 278
 +7  LLVMLegUp.so 0x40417b54 legup::​LegUp::​lowerIfIntrinsic(llvm::​CallInst*,​ llvm::​Function*) + 286
 +8  LLVMLegUp.so 0x4041aa07 legup::​LegUp::​runOnFunction(llvm::​Function&​) + 207
 +9  opt          0x0885e70d llvm::​FPPassManager::​runOnFunction(llvm::​Function&​) + 343
 +10 opt          0x0885e8f6 llvm::​FPPassManager::​runOnModule(llvm::​Module&​) + 114
 +11 opt          0x0885e3cc llvm::​MPPassManager::​runOnModule(llvm::​Module&​) + 398
 +12 opt          0x0885fc15 llvm::​PassManagerImpl::​run(llvm::​Module&​) + 129
 +13 opt          0x0885fc7b llvm::​PassManager::​run(llvm::​Module&​) + 39
 +14 opt          0x083f9d19 main + 4778
 +15 libc.so.6 ​   0x40297775 __libc_start_main + 229
 +16 opt          0x083ea0b1
 +</​code>​
 +
 +struct fails with:
 +<​code>​
 +llc: Ram.cpp:​155:​ void legup::​RAM::​visitConstant(const llvm::​Constant*,​ uint64_t*, std::​stack<​const llvm::​Constant*,​ std::​deque<​const llvm::​Constant*,​ std::​allocator<​const llvm::​Constant*>​ > >&, std::​stack<​unsigned int, std::​deque<​unsigned int, std::​allocator<​unsigned int> > >&, std::​stack<​unsigned int, std::​deque<​unsigned int, std::​allocator<​unsigned int> > >&, unsigned int&, unsigned int&): Assertion `isa<​ConstantAggregateZero>​(c) || isa<​ConstantPointerNull>​(c)'​ failed.
 +0  llc       ​0x09144595
 +1  llc       ​0x09144c5d
 +2            0x4001e400 __kernel_sigreturn + 0
 +3  libc.so.6 0x402ae098 abort + 392
 +4  libc.so.6 0x402a55ce __assert_fail + 238
 +5  llc       ​0x086a4755 legup::​RAM::​visitConstant(llvm::​Constant const*, unsigned long long*, std::​stack<​llvm::​Constant const*, std::​deque<​llvm::​Constant const*, std::​allocator<​llvm::​Constant const*> > >&, std::​stack<​unsigned int, std::​deque<​unsigned int, std::​allocator<​unsigned int> > >&, std::​stack<​unsigned int, std::​deque<​unsigned int, std::​allocator<​unsigned int> > >&, unsigned int&, unsigned int&) + 353
 +6  llc       ​0x086a4f05 legup::​RAM::​initializeStruct() + 595
 +7  llc       ​0x086a5077 legup::​RAM::​buildInitializer() + 111
 +8  llc       ​0x086a50fd legup::​RAM::​generateMIF() + 47
 +9  llc       ​0x08671518 legup::​VerilogWriter::​printMemoryController() + 128
 +10 llc       ​0x086741e2 legup::​VerilogWriter::​print() + 214
 +11 llc       ​0x08661480 legup::​LegupPass::​printVerilog(std::​set<​llvm::​Function*,​ std::​less<​llvm::​Function*>,​ std::​allocator<​llvm::​Function*>​ >) + 112
 +12 llc       ​0x086616a2 legup::​LegupPass::​doFinalization(llvm::​Module&​) + 220
 +13 llc       ​0x09072031 llvm::​FPPassManager::​doFinalization(llvm::​Module&​) + 75
 +14 llc       ​0x09076902 llvm::​FPPassManager::​runOnModule(llvm::​Module&​) + 178
 +15 llc       ​0x09076398 llvm::​MPPassManager::​runOnModule(llvm::​Module&​) + 398
 +16 llc       ​0x09077be1 llvm::​PassManagerImpl::​run(llvm::​Module&​) + 129
 +17 llc       ​0x09077c47 llvm::​PassManager::​run(llvm::​Module&​) + 39
 +18 llc       ​0x085fd21f main + 2887
 +19 libc.so.6 0x40297775 __libc_start_main + 229
 +20 llc       ​0x085fb511
 +</​code>​
 +
 +
 +
 + --- //​[[andrew.canis@utoronto.ca|Andrew Canis]] 2011/08/12 12:15//
 +
 +Added cloog/isl into the repository.
 +
 +git clone git://​repo.or.cz/​cloog.git
 +cd cloog
 +./​get_submodules.sh
 +./​autogen.sh
 +./configure --prefix=~/​work/​polly/​cloog/​install
 +make
 +make install
 +
 +I needed to copy .gitmodules into the base legup folder and modify the path:
 +<​code>​
 +-path = isl
 ++path = cloog/isl
 +</​code>​
 +
 +Then run:
 +<​code>​
 +acanis@acanis-desktop:​~/​work/​legup$ cloog/​get_submodules.sh ​
 +Submodule '​isl'​ (git://​repo.or.cz/​isl.git) registered for path '​cloog/​isl'​
 +Cloning into cloog/​isl...
 +warning: templates not found /​usr/​local/​share/​git-core/​templates
 +remote: Counting objects: 9585, done.
 +remote: Compressing objects: 100% (2180/​2180),​ done.
 +remote: Total 9585 (delta 7127), reused 9585 (delta 7127)
 +Receiving objects: 100% (9585/​9585),​ 2.05 MiB | 328 KiB/s, done.
 +Resolving deltas: 100% (7127/​7127),​ done.
 +Submodule path '​cloog/​isl':​ checked out '​24e309472a53920bdf19130a12c9ccec320c1867'​
 +</​code>​
 +
 +Now I added the new folder:
 +<​code>​
 +git add cloog/isl
 +</​code>​
 +
 +Whoops. That didn't work. Okay I don't think I can use submodules here.
 +I just need to check out both paths.
 +
 +Looking in cloog/​.gitmodules the repo for isl is git://​repo.or.cz/​isl.git
 +
 +cloog revision:
 +<​code>​
 +commit 225c2ed62fe37a4db22bf4b95c3731dab1a50dde
 +Author: Sven Verdoolaege <​skimo@kotnet.org>​
 +Date:   Sun Jul 10 09:27:24 2011 +0200
 +</​code>​
 +
 +isl revision:
 +<​code>​
 +commit e536653cbc99d7349eafa5e1a9cba873db3135eb
 +Author: Sven Verdoolaege <​skimo@kotnet.org>​
 +Date:   Sat Aug 6 22:30:40 2011 +0200
 +</​code>​
 +
 +Wait. This revision is different than the submodule one listed above...
 +
 +Doesn'​t matter. ​
 +
 +Seeing an error for the hybrids:
 +<​code>​
 +acanis@acanis-desktop:​~/​work/​legup/​examples/​chstone_hybrid/​adpcm$ ./​sim_all_functions ​
 +...
 +export LEGUP_ACCELERATOR_FILENAME=adpcm;​ \
 +        ../​../​../​llvm/​Debug+Asserts/​bin/​opt -legup-config=config.tcl -load=../​../​../​llvm/​Debug+Asserts/​lib//​LLVMLegUp.so -legup-sw-only < adpcm.prelto.bc > adpcm.prelto.sw.bc
 +LLVM ERROR: IO failure on output stream.
 +</​code>​
 +
 +This error can't be debugged with gdb. Looking in raw_ostream.cpp
 +<​code>​
 +  // If there are any pending errors, report them now. Clients wishing
 +  // to avoid report_fatal_error calls should check for errors with
 +  // has_error() and clear the error flag with clear_error() before
 +  // destructing raw_ostream objects which may have errors.
 +  if (has_error())
 +    report_fatal_error("​IO failure on output stream."​);​
 +</​code>​
 +
 + --- //​[[andrew.canis@utoronto.ca|Andrew Canis]] 2011/08/10 12:15//
 +
 +llc goes into an infinite loop on gsm. Adding -debug makes it segfault.
 +I'm going to have to recompile in debug mode.
 +<​code>​
 +./configure --disable-optimized --with-cloog=/​home/​acanis/​work/​polly/​cloog/​install/​ --with-isl=/​home/​acanis/​work/​polly/​cloog/​install/​
 +</​code>​
 +
 +Seems to be something to do with the new SDC scheduler.
 +Or could it just be taking a long time? I doubt it, gsm never took this long before.
 +There are 30 recursive calls to the function:
 +<​code>​
 +(gdb) bt
 +#0  0xb744341d in memmove () from /​lib/​tls/​i686/​cmov/​libc.so.6
 +#1  0x0916e8dd in mat_appendrow ()
 +#2  0x09161c57 in add_constraintex ()
 +#3  0x086a90db in legup::​SDCScheduler::​addTimingConstraints (this=0x97754b8,​ Root=0x978d6f8,​ Curr=0x978ac78, ​
 +    PartialPathDelay=108.549011) at SDCScheduler.cpp:​227
 +#4  0x086a9117 in legup::​SDCScheduler::​addTimingConstraints (this=0x97754b8,​ Root=0x978d6f8,​ Curr=0x978ad18, ​
 +    PartialPathDelay=104.96801) at SDCScheduler.cpp:​232
 +#5  0x086a9117 in legup::​SDCScheduler::​addTimingConstraints (this=0x97754b8,​ Root=0x978d6f8,​ Curr=0x978af98, ​
 +    PartialPathDelay=101.168007) at SDCScheduler.cpp:​232
 +...
 +#29 0x086a9117 in legup::​SDCScheduler::​addTimingConstraints (this=0x97754b8,​ Root=0x978d6f8,​ Curr=0x978d6f8, ​
 +    PartialPathDelay=4.29199982) at SDCScheduler.cpp:​232
 +---Type <​return>​ to continue, or q <​return>​ to quit---
 +#30 0x086a91f5 in legup::​SDCScheduler::​addTimingConstraints (this=0x97754b8,​ F=@0x970f0d8)
 +    at SDCScheduler.cpp:​246
 +#31 0x086a98c6 in legup::​SDCScheduler::​runOnFunction (this=0x97754b8,​ F=@0x970f0d8) at SDCScheduler.cpp:​430
 +#32 0x09065971 in llvm::​FPPassManager::​runOnFunction (this=0x9774d00,​ F=@0x970f0d8) at PassManager.cpp:​1513
 +#33 0x09065b55 in llvm::​FPPassManager::​runOnModule (this=0x9774d00,​ M=@0x970dee0) at PassManager.cpp:​1535
 +#34 0x09065630 in llvm::​MPPassManager::​runOnModule (this=0x970e378,​ M=@0x970dee0) at PassManager.cpp:​1589
 +#35 0x09066e65 in llvm::​PassManagerImpl::​run (this=0x9713f00,​ M=@0x970dee0) at PassManager.cpp:​1671
 +#36 0x09066ecb in llvm::​PassManager::​run (this=0xbfd39164,​ M=@0x970dee0) at PassManager.cpp:​1715
 +#37 0x085f8f0f in main (argc=6, argv=0xbfd392a4) at llc.cpp:396
 +</​code>​
 +
 +There are about 300 recursive calls to addTimingConstraints() for some reason.
 +There'​s actually a huge basic block in gsm with about 100 instructions.
 +<​code>​
 +bb.nph.i.i.i: ​                                    ; preds = %bb17.i.i
 +</​code>​
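 +
 +A guess at why a big basic block hurts so much: if the recursion walks every dependence path from the root and only stops once the accumulated delay exceeds the clock period, the number of calls grows with the number of paths, not the number of instructions. The sketch below is a hypothetical model loosely shaped like the (Root, Curr, PartialPathDelay) signature in the backtrace; it is not the SDCScheduler code:
 +
```python
def add_timing_constraints(succs, delay, root, curr, partial_delay, period,
                           constraints, stats):
    """Walk dependence paths from root, recording where the period is exceeded."""
    stats["calls"] += 1
    for nxt in succs.get(curr, ()):
        path_delay = partial_delay + delay[nxt]
        if path_delay > period:
            constraints.add((root, nxt))   # nxt must start in a later cycle
        else:
            add_timing_constraints(succs, delay, root, nxt, path_delay,
                                   period, constraints, stats)


def layered_dag(layers):
    """Root feeding `layers` layers of 2 nodes, fully connected layer to layer."""
    succs = {"root": [(1, 0), (1, 1)]}
    for i in range(1, layers):
        for j in (0, 1):
            succs[(i, j)] = [(i + 1, 0), (i + 1, 1)]
    return succs


def count_calls(layers, node_delay=0.0, period=10.0):
    """Return (number of recursive calls, constraints found)."""
    succs = layered_dag(layers)
    delay = {(i, j): node_delay
             for i in range(1, layers + 1) for j in (0, 1)}
    delay["root"] = node_delay
    stats = {"calls": 0}
    constraints = set()
    add_timing_constraints(succs, delay, "root", "root", delay["root"],
                           period, constraints, stats)
    return stats["calls"], constraints


if __name__ == "__main__":
    for layers in (5, 10, 15):
        print(layers, count_calls(layers)[0])
```
 +
 +In this model only 21 nodes (10 layers) already cost 2047 calls when nothing cuts the paths short, so memoizing per (root, node) or pruning on delay would be the obvious thing to try.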
 +
 +Seeing one last problem with make tiger
 +<​code>​
 +../​../​mips-binutils/​bin/​mipsel-elf-ld -T ../​../​tiger/​linux_tools/​lib/​prog_link.ld -e main struct.o ../​../​tiger/​tool_source/​lib/​altera_avalon_performance_counter.o -o struct.elf -EL -L ../​../​tiger/​linux_tools/​lib -lgcc -lfloat -luart
 +struct.o: In function `main':​
 +(_main_section+0x2c):​ undefined reference to `memcpy'​
 +struct.o: In function `main':​
 +(_main_section+0x6c):​ undefined reference to `memcpy'​
 +make: *** [tiger] Error 1
 +</​code>​
 +
 +I don't get this. Why didn't I run into this before? Don't we lower memcpys into legup instructions?​
 +I see the memcpy in the .s file:
 +<​code>​
 +main:
 +...
 +# BB#0:                                 # %entry
 +...
 + jal memcpy
 +</​code>​
 +I don't see a memcpy in the .ll (there is a legup_memcpy_4 though). Damn. This must be created in the MIPS backend?
 +I might have to write a memcpy manually. Just like Mark had to write a printf.
 +
 +Tiger libraries are stored in:
 +<​code>​
 +../​../​tiger/​linux_tools/​lib
 +</​code>​
 +Sources are in:
 +<​code>​
 +../​../​tiger/​tool_source/​lib
 +</​code>​
 +
 +I can find memcpy inside libgcc.a. Which should be included.
 +I see a mem.c file in the source directory. Compiles fine if I add:
 +<​code>​
 +../​../​tiger/​tool_source/​lib/​mem.o
 +</​code>​
 +
 +So this compiles okay. But now make emulwatch doesn'​t match:
 +<​code>​
 +acanis@acanis-desktop:​~/​work/​legup/​examples/​struct$ diff -u lli.txt sim.txt
 +--- lli.txt ​    ​2011-08-04 16:​04:​13.000000000 -0400
 ++++ sim.txt ​    ​2011-08-04 16:​04:​15.000000000 -0400
 +@@ -69,7 +69,7 @@
 +   ​%exitcond=0
 + ​legup_memcpy_4:​bb
 +   ​%indvar=d
 +-  %3=cdcd1514
 ++  %3=1514
 +   ​%indvar.next=e
 +   ​%exitcond=1
 + ​legup_memcpy_4:​return
 +</​code>​
 +
 +The code looks like:
 +<​code>​
 +void legup_memcpy_4(uint32_t * d, const uint32_t * s, size_t n)
 +{
 +    uint32_t * dt = d;
 +    const uint32_t * st = s;
 +    n >>= 2;
 +    while (n--)
 +        *dt++ = *st++;
 +}
 +</​code>​
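 +
 +For reference, a word-level model of what that loop does. This is a Python sketch over lists of 32-bit words, not the C code itself, and legup_memcpy_4_model is a made-up name. Note that n >>= 2 rounds down, so a byte count that is not a multiple of 4 silently drops the tail:
 +
```python
def legup_memcpy_4_model(dst, src, n_bytes):
    """Copy n_bytes // 4 words from src to dst; return the number of words copied."""
    words = n_bytes >> 2                      # n >>= 2 in the C version
    for i in range(words):
        dst[i] = src[i] & 0xFFFFFFFF          # *dt++ = *st++ on uint32_t
    return words


if __name__ == "__main__":
    dst = [0, 0, 0, 0]
    copied = legup_memcpy_4_model(dst, [0xCDCD1514, 2, 3, 4], 10)
    print(copied, [hex(word) for word in dst])
```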
 +
 +The .ll:
 +<​code>​
 +bb:                                               ; preds = %bb, %bb.nph
 +  %indvar = phi i32 [ 0, %bb.nph ], [ %indvar.next,​ %bb ]
 +  %st.04 = getelementptr i32* %s, i32 %indvar
 +  %dt.03 = getelementptr i32* %d, i32 %indvar
 +  %3 = load i32* %st.04, align 4
 +  store i32 %3, i32* %dt.03, align 4
 +  %indvar.next = add i32 %indvar, 1
 +  %exitcond = icmp eq i32 %indvar.next,​ %tmp
 +  %4 = call i32 (i8*, ...)* @printf(i8* getelementptr inbounds ([111 x i8]* @11, i32 0, i32 0), i32 %indvar, i32 %3, i32 %indvar.next,​ i1 %exitcond)
 +  br i1 %exitcond, label %return, label %bb
 +</​code>​
 +
 +So it looks like the load doesn'​t match. cdcd = 1100 1101 1100 1101
 +For some reason this is zeroed out in gxemul. Indvar = d = 13.
 +Actually this even happens with "make watch" but the final result is still correct.
 +I completely removed the legup_memcpy_4 code (disabling the prelto pass) and it still fails.
 +Very strange. Now make emulwatch simulation just stops right after:
 +<​code>​
 +pointSum:​return
 +  %retval1=11
 +</​code>​
 +
 +Why would nothing else get printed? There are three calls to pointSum. gxemul only calls the
 +function once.
 +
 +Interesting. So it looks like the breakpoint never triggers at the return address of main.
 +Instead it triggers at the end of the code:
 +<​code>​
 +../​lib/​gxemul.exp -E testmips -e R3000 struct.elf -p `../​../​tiger/​linux_tools/​lib/​../​find_ra struct.emul.src` -p 0xffffffff80000180 -q
 +exit at: pc = 0xffffffff80000180
 +reg: v0 = 0x0000000000000011
 +</​code>​
 +
 +The return address of main is:
 +<​code>​
 +acanis@acanis-desktop:​~/​work/​legup/​examples/​struct$ ../​../​tiger/​linux_tools/​lib/​../​find_ra struct.emul.src
 +0xffffffff800319f8
 +</​code>​
 +
 +So the return address of pointSum is probably incorrect...
 +If I comment out the pointSum calls everything works fine.
 +Let's see what the return address is:
 +<​code>​
 +acanis@acanis-desktop:​~/​work/​legup/​examples/​struct$ gxemul -E testmips -e R3000 struct.elf -p `../​../​tiger/​linux_tools/​lib/​../​find_ra struct.emul.src` -p 0xffffffff80000180 -q -p  0xffffffff8003002c
 +BREAKPOINT: pc = 0xffffffff8003002c
 +(The instruction has not yet executed.)
 +GXemul> s
 +ffffffff8003002c:​ 27bd0008 ​     addiu   ​sp,​sp,​8
 +GXemul>
 +ffffffff80030030:​ 03e00008 ​     jr      ra      <​sum+0x110>​
 +ffffffff80030034:​ 00000000 (d)  nop
 +GXemul>
 +ffffffff80030148:​ 00028021 ​     addu    s0,zr,v0
 +</​code>​
 +
 +Looks fine. Jumps back right after pointSum call.
 +Wait what's this. I step a few more times and:
 +<​code>​
 +GXemul> ​
 +ffffffff80030150:​ 8fa4003e ​     lw      a0,​62(sp) ​      ​[0xffffffffa0007e6e]
 +[ exception ADEL vaddr=0xffffffffa0007e6e pc=0xffffffff80030150 <​sum+0x118>​ ]
 +GXemul> ​
 +ffffffff80000180:​ 00000000 ​     nop
 +BREAKPOINT: pc = 0xffffffff80000180
 +(The instruction has not yet executed.)
 +</​code>​
 +
 +There should really be another check in the test suite that we never
 +break on the second breakpoint
 +
 +Looking up this exception:
 +<​code>​
 +4.8.9 Address Error Exception — Instruction Fetch/Data Access
 +An address error exception occurs on an instruction or data access
 +when an attempt is made to execute one of the following:
 +       • Fetch an instruction,​ load a word, or store a word that is
 +not aligned on a word boundary
 +       • Load or store a halfword that is not aligned on a halfword boundary
 +       • Reference the kernel address space from user mode
 +
 +Note that in the case of an instruction fetch that is not aligned on a
 +word boundary, PC is updated before the condition
 +is detected. Therefore, both EPC and BadVAddr point to the unaligned
 +instruction address. In the case of a data
 +access the exception is taken if either an unaligned address or an
 +address that was inaccessible in the current processor
 +mode was referenced by a load or store instruction.
 +Cause Register ExcCode Value:
 +    ADEL: Reference was a load or an instruction fetch
 +    ADES: Reference was a store
 +</​code>​
 +
 +The lw was trying to load a 32-bit word from address sp + 62 into reg a0.
 +62 = 0x3e added to the sp is 0xffffffffa0007e6e (as shown by vaddr above).
 +The last 4 bits are 1110, but the last two bits must be 0 for the address to be
 +32-bit aligned. I'm going to file a bug.
 +<​code>​
 +GXemul> reg
 +cpu0:    pc = 0xffffffff80000180 ​   < no symbol >
 +...
 +cpu0:    a0 = 0x000000000b0a0908 ​   s4 = 0x0000000000000004
 +...
 +cpu0:    t5 = 0x0000000000000000 ​   sp = 0xffffffffa0007e30
 +</​code>​
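 +
 +The alignment check from the register dump, written out. Just a worked example using the sp value and the offset from the lw above; is_aligned is a made-up helper:
 +
```python
def is_aligned(addr, size):
    """True when addr is a multiple of the access size in bytes."""
    return addr % size == 0


SP = 0xFFFFFFFFA0007E30     # sp from the register dump
OFFSET = 62                 # immediate in "lw a0,62(sp)"

if __name__ == "__main__":
    addr = SP + OFFSET
    print(hex(addr), bin(addr & 0xF))
    print("word aligned:", is_aligned(addr, 4))
    print("halfword aligned:", is_aligned(addr, 2))
```
 +
 +The address is halfword aligned but not word aligned, which is why a lw traps with ADEL where a lh would not.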
 +
 +I'm just going to file a bug report.
 +Okay no. It seems to be something to do with this line:
 +<​code>​
 +../​../​mips-binutils/​bin/​mipsel-elf-ld -T ../​../​tiger/​linux_tools/​lib/​prog_link_sim.ld -e main struct.o -o struct.elf -EL -L ../​../​tiger/​linux_tools/​lib -lgcc -lfloat -luart_el_sim -lmem_el_sim
 +</​code>​
 +
 +That's because nothing gets run unless we link in those libraries.
 +
 +
 +You can run the Mips test Victor submitted manually like this:
 +<​code>​
 +acanis@acanis-desktop:​~/​work/​legup/​llvm/​test$ ../​Debug+Asserts/​bin/​llvm-lit -v CodeGen/​Mips/​2010-07-20-Switch.ll ​
 +-- Testing: 1 tests, 4 threads --
 +PASS: LLVM :: CodeGen/​Mips/​2010-07-20-Switch.ll (1 of 1)
 +Testing Time: 0.03s
 +  Expected Passes ​   : 1
 +</​code>​
 +
 +Submitted an LLVM bug report: http://​llvm.org/​bugs/​show_bug.cgi?​id=10634
 +
 +So let's just wait and see what Bruno Lopes has to say.
 +Could it be an issue with the llvm-gcc?
 +
 +The following tests don't run gxemul:
 +<​code>​
 +./​div_const/​dg.exp
 +./​overflow_intrinsic/​dg.exp
 +./​signeddiv/​dg.exp
 +./​phi/​dg.exp
 +./​unaligned/​dg.exp
 +./​cpp/​dg.exp
 +</​code>​
 +
 +
 +
 + --- //​[[andrew.canis@utoronto.ca|Andrew Canis]] 2011/08/04 12:15//
 +
 +New LLVM version is almost working but I'm seeing the error:
 +<​code>​
 +[ 93%] Built target LLVMPolly
 +make -f tools/​llvm-config/​CMakeFiles/​llvm-config.target.dir/​build.make tools/​llvm-config/​CMakeFiles/​llvm-config.target.dir/​depend
 +make[2]: Entering directory `/​home/​acanis/​work/​legup/​build'​
 +cd /​home/​acanis/​work/​legup/​build && /​home/​acanis/​cmake-2.8.4-Linux-i386/​bin/​cmake -E cmake_depends "Unix Makefiles"​ /​home/​acanis/​work/​legup/​llvm /​home/​acanis/​work/​legup/​llvm/​tools/​llvm-config /​home/​acanis/​work/​legup/​build /​home/​acanis/​work/​legup/​build/​tools/​llvm-config /​home/​acanis/​work/​legup/​build/​tools/​llvm-config/​CMakeFiles/​llvm-config.target.dir/​DependInfo.cmake --color=
 +make[2]: Leaving directory `/​home/​acanis/​work/​legup/​build'​
 +make -f tools/​llvm-config/​CMakeFiles/​llvm-config.target.dir/​build.make tools/​llvm-config/​CMakeFiles/​llvm-config.target.dir/​build
 +make[2]: Entering directory `/​home/​acanis/​work/​legup/​build'​
 +/​home/​acanis/​cmake-2.8.4-Linux-i386/​bin/​cmake -E cmake_progress_report /​home/​acanis/​work/​legup/​build/​CMakeFiles
 +[ 93%] Updating LibDeps.txt if necessary...
 +cd /​home/​acanis/​work/​legup/​build/​tools/​llvm-config && /​home/​acanis/​cmake-2.8.4-Linux-i386/​bin/​cmake -E copy_if_different LibDeps.txt.tmp LibDeps.txt
 +Error copying file (if different) from "​LibDeps.txt.tmp"​ to "​LibDeps.txt"​.
 +make[2]: *** [tools/​llvm-config/​LibDeps.txt] Error 1
 +</​code>​
 +
 +I had to fix the llvm-config/​CMakeLists.txt file as mentioned in:
 +<​code>​
 +    http://​comments.gmane.org/​gmane.comp.compilers.llvm.cvs/​89287
 +</​code>​
 +
 +Now I see:
 +<​code>​
 +CMakeFiles/​llvm-mc.dir/​llvm-mc.cpp.o:​ In function `llvm::​InitializeAllTargetMCs()':​
 +/​home/​acanis/​work/​legup/​build/​include/​llvm/​Config/​Targets.def:​41:​ undefined reference to `LLVMInitializeVerilogTargetMC'​
 +collect2: ld returned 1 exit status
 +</​code>​
 +Okay, slight interface change in LLVM.
 +
 +Linker error for llc:
 +<​code>​
 +Linking CXX executable ../​../​bin/​llc
 +../​../​lib/​libLLVMVerilog.a(SDCScheduler.cpp.o):​ In function `legup::​SDCScheduler::​scheduleAXAP(bool)':​
 +/​home/​acanis/​work/​legup/​llvm/​lib/​Target/​Verilog/​SDCScheduler.cpp:​379:​ undefined reference to `set_obj_fnex'​
 +</​code>​
 +
 +The cmake cache is really annoying. Every time you modify the cmake files you have to run:
 +<​code>​
 +rm CMakeCache.txt
 +</​code>​
 +
 +To figure out what's happening when you're making:
 +<​code>​
 +make VERBOSE=1
 +</​code>​
 +
 +Seems like tcl is being added properly here but the lpsolve library isn't being added:
 +<​code>​
 +/​usr/​bin/​c++ ​   -fPIC -fno-rtti -g   ​CMakeFiles/​llc.dir/​llc.cpp.o ​ -o
 +../​../​bin/​llc -rdynamic -ltcl8.5 ../​../​lib/​libLLVMVerilog.a (SKIPPED) -ltcl8.5 -ldl
 +-lpthread ​
 +</​code>​
 +If I manually rerun this with "-L/usr/lib/lp_solve -llpsolve55" added it works fine.
 +Okay. This was a problem with the llvm/CMakeLists.txt file.
 +Great. Everything compiles with cmake. Now let's try autoconf.
 +
 +Compiler warnings:
 +<​code>​
 +LegupTcl.cpp:​ In function ‘int legup::​set_accelerator_function(void*,​ Tcl_Interp*,​ int, const char**)’:
 +LegupTcl.cpp:​23:​ warning: cast from type ‘const char*’ to type ‘char*’ casts away constness
 +LegupTcl.cpp:​ In function ‘int legup::​set_operation_attributes(void*,​ Tcl_Interp*,​ int, const char**)’:
 +LegupTcl.cpp:​35:​ warning: cast from type ‘const char*’ to type ‘char*’ casts away constness
 +...
 +LegupTcl.cpp:​97:​ warning: cast from type ‘const char*’ to type ‘char*’ casts away constness
 +LegupTcl.cpp:​ In function ‘int legup::​set_device_specs(void*,​ Tcl_Interp*,​ int, const char**)’:
 +LegupTcl.cpp:​108:​ warning: cast from type ‘const char*’ to type ‘char*’ casts away constness
 +...
 +LegupTcl.cpp:​136:​ warning: cast from type ‘const char*’ to type ‘char*’ casts away constness
 +</​code>​
 +
 +Strange error:
 +<​code>​
 +make[2]: Entering directory `/​home/​acanis/​work/​legup/​llvm/​unittests/​VMCore'​
 +llvm[2]: Compiling DerivedTypesTest.cpp for Release+Asserts build
 +DerivedTypesTest.cpp:​ In function ‘void<​unnamed>::​PR7658()’:​
 +DerivedTypesTest.cpp:​24:​ error: ‘PATypeHolder’ was not declared in this scope
 +</​code>​
 +I think the file has been deleted. Yep.
 +
 +It seems like our llvm-gcc is too old:
 +<​code>​
 +llvm-gcc array.c -emit-llvm -c -fno-builtin -m32 -malign-double -I ../​lib/​include/​ -O0 -fno-inline-functions -o array.prelto.1.bc
 +# linking may produce llvm mem-family intrinsics
 +../​../​llvm/​Release+Asserts/​bin/​llvm-ld -disable-inlining -disable-opt array.prelto.1.bc -b=array.prelto.linked.bc
 +llvm-ld: error: Cannot load file '​array.prelto.1.bc':​ Bitcode file '​array.prelto.1.bc'​ could not be loaded: Invalid ALLOCA record
 +</​code>​
 +
 +Clang doesn'​t have this error. But there are warnings:
 +<​code>​
 +clang array.c -emit-llvm -c -fno-builtin -m32 -malign-double -I ../​lib/​include/​ -O0 -fno-inline-functions -o array.prelto.1.bc
 +clang: warning: argument unused during compilation:​ '​-malign-double'​
 +</​code>​
 +And also verilog errors:
 +<​code>​
 +-- Compiling module fct
 +** Error: array.v(550):​ '​LEGUP1_F_fct_BB_'​ already declared in this scope.
 +...
 +** Error: array.v(566):​ '​LEGUP3_F_fct_BB_'​ already declared in this scope.
 +-- Compiling module main
 +** Error: array.v(1338):​ '​LEGUP1_F_main_BB_'​ already declared in this scope.
 +...
 +** Error: array.v(1377):​ '​LEGUP8_F_main_BB_'​ already declared in this scope.
 +</​code>​
 +
 +
 +Damn. I can't run llvm-gcc 2.9:
 +<​code>​
 +llvm-gcc: /​lib/​tls/​i686/​cmov/​libc.so.6:​ version `GLIBC_2.11'​ not found (required by llvm-gcc)
 +</​code>​
 +Okay great. llvm-gcc 2.8 works.
 +
 +Seeing some bugs:
 +<​code>​
 +../​../​../​llvm/​Release+Asserts/​bin/​opt:​ symbol lookup error: ../​../​../​llvm/​Release+Asserts/​lib/​LLVMLegUp.so:​ undefined symbol: _ZN4llvm17IntrinsicLowering18LowerIntrinsicCallEPNS_8CallInstE
 +</​code>​
 +
 +I thought I already fixed this...
 +Looks fine. I included the dependency in lib/​Transforms/​LegUp/​Makefile:​
 +<​code>​
 +USED_LIBS = LLVMCodeGen
 +</​code>​
 +Tried adding back in:
 +<​code>​
 +LDFLAGS = $(LLVM_OBJ_ROOT)/​lib/​CodeGen/​$(BuildMode)/​IntrinsicLowering.o
 +</​code>​
 +Strange. So that worked.
 +Wow. dfmul is suddenly fixed! I'm guessing this was caused by the newer llvm-gcc version?
 +
 +llc seems to be running into an infinite loop on gsm...
 +
 +
 +
 + --- //​[[andrew.canis@utoronto.ca|Andrew Canis]] 2011/08/03 12:15//
 +
 +Autoconf doesn'​t work for the latest git llvm and polly?
 +<​code>​
 +llvm[0]: Compiling ScheduleOptimizer.cpp for Debug+Asserts build (PIC)
 +ScheduleOptimizer.cpp:​30:​26:​ error: isl/​schedule.h:​ No such file or directory
 +</​code>​
 +
 +Trying make -n to show makefile commands.
 +The schedule.h file doesn'​t actually exist...
 +Is this a new header file that has been added in the past few months?
 +Yep. Needed to update my cloog version.
 +Okay this works now. So I actually need to distribute these
 +header files and .so manually.
 +Damn. This also means I need to update LLVM.
 +
 +Updating to commit:
 +<​code>​
 +commit b4f4cbd199318901d12737ded05ebebd8cb21336
 +Author: David Greene <​greened@obbligato.org>​
 +Date:   Fri Jul 29 20:50:18 2011 +0000
 +</​code>​
 +
 +Damn. The merge totally fails. ​ I see a lot of "both added" conflicts. ​
 +<​code>​
 +git checkout --theirs -- Transforms/
 +git add -u Transforms/
 +</​code>​
 +
 +Actually, you can usually just manually merge the makefiles and the
 +files we changed (found by looking at git log), then checkout/add the whole directory.
 +
 +Testing the Mips backend again. I'm going to submit some bug reports.
 +
 +
 + --- //​[[andrew.canis@utoronto.ca|Andrew Canis]] 2011/07/29 12:15//
 +
 +Adding polly to the repo. Cmake was working before. Trying to get autoconf working.
 +In llvm running ./configure --with-cloog=~/​work/​polly/​cloog/​install/​ --with-isl=~/​work/​polly/​cloog/​install/​ gives:
 +<​code>​
 +=== configuring in tools/polly (/​home/​acanis/​work/​legup/​llvm/​tools/​polly)
 +...
 +checking for isl in inc_not_give_isl,​ lib_not_give_isl... configure: error: isl required but not found
 +configure: error: ./configure failed for tools/polly
 +</​code>​
 +
 +Wow. You can't use ~ in the path! So annoying.
 +<​code>​
 +./configure --with-cloog=/​home/​acanis/​work/​polly/​cloog/​install/​ --with-isl=/​home/​acanis/​work/​polly/​cloog/​install/​
 +</​code>​
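The likely reason: bash only expands ~ at the start of a word or after the = of something that looks like a variable assignment, and --with-cloog=~/... is neither, so configure receives the literal ~. A minimal sketch of building the arguments with the path expanded up front. The helper name configure_args is hypothetical and not part of the LLVM or polly build:

```python
import os


def configure_args(cloog_prefix, isl_prefix):
    """Build ./configure arguments with a leading '~' expanded to $HOME."""
    cloog = os.path.expanduser(cloog_prefix)
    isl = os.path.expanduser(isl_prefix)
    return ["./configure", "--with-cloog=" + cloog, "--with-isl=" + isl]


if __name__ == "__main__":
    prefix = "~/work/polly/cloog/install/"
    print(" ".join(configure_args(prefix, prefix)))
```

Writing $HOME instead of ~ on the command line should have the same effect, since variable expansion happens anywhere in a word.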
 +
 +
 +
 +
 +
 +
 +How to do live variable analysis in SSA?
 +I see from LiveVariables.cpp:​
 +<​code>​
 +It uses the dominance properties of SSA form to efficiently compute live
 +variables for virtual registers
 +</​code>​
 +What does this mean?
 +
 +<​code>​
 +  // Calculate live variable information in depth first order on the CFG of the
 +  // function. ​ This guarantees that we will see the definition of a virtual
 +  // register before its uses due to dominance properties of SSA (except for PHI
 +  // nodes, which are treated as a special case).
 +</​code>​
 +Oh you can just do a depth first traversal of the CFG. 
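A small sketch of why the depth-first order works, on a made-up diamond CFG (block and variable names are hypothetical, this is not the LiveVariables.cpp code). Any block that dominates B lies on every path from the entry to B, so a depth-first walk from the entry reaches it first. PHI operands are the exception, because they come in along an edge from a block that need not dominate the PHI's block:

```python
def dfs_preorder(cfg, entry):
    """Return basic blocks in depth-first preorder starting at entry."""
    order, seen, stack = [], set(), [entry]
    while stack:
        block = stack.pop()
        if block in seen:
            continue
        seen.add(block)
        order.append(block)
        # Push successors reversed so the first successor is visited first.
        stack.extend(reversed(cfg[block]))
    return order


def defs_seen_before_uses(cfg, entry, defs, uses):
    """True if every use sits in a block visited no earlier than its def."""
    position = {b: i for i, b in enumerate(dfs_preorder(cfg, entry))}
    def_block = {v: b for b, values in defs.items() for v in values}
    return all(position[def_block[v]] <= position[b]
               for b, values in uses.items() for v in values)


# Hypothetical diamond: entry -> then/else -> exit
CFG = {"entry": ["then", "else"], "then": ["exit"], "else": ["exit"], "exit": []}
DEFS = {"entry": ["x"], "then": ["y"], "else": ["z"], "exit": []}
USES = {"then": ["x"], "else": ["x"], "exit": ["x"]}  # ordinary uses only
PHI_USES = {"exit": ["z"]}  # operand of a PHI in exit, defined in else

if __name__ == "__main__":
    print(dfs_preorder(CFG, "entry"))
    print(defs_seen_before_uses(CFG, "entry", DEFS, USES))
    print(defs_seen_before_uses(CFG, "entry", DEFS, PHI_USES))
```

The last call returns False because exit is visited before else in this walk, which is the PHI special case the comment mentions.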
 +
 +
 + --- //​[[andrew.canis@utoronto.ca|Andrew Canis]] 2011/07/26 12:15//
 +
 +Testing the git subtree method locally.
 +Wow. Ran into a really annoying bug with git merge subtree. Turns out you need to specify the directory location otherwise
 +the merge won't work properly:
 +    * http://​stackoverflow.com/​questions/​5904256/​git-subtree-merge-into-a-deeply-nested-subdirectory
 +
 +I see a lot of "both added" conflicts. I'm just going to take LLVM's version and then manually merge the
 +autoconf changes:
 +<​code>​
 +git checkout --theirs -- .
 +git add -u .
 +</​code>​
 +
 + --- //​[[andrew.canis@utoronto.ca|Andrew Canis]] 2011/07/25 12:15//
 +
 +What's the status on the LLVM update and loop pipelining integration?​
 +
 +There'​s still a bug with gxemul with the new LLVM mips backend:
 +<​code>​
 +Running ./​chstone/​dfmul/​dg.exp ...
 +FAIL: gxemul simulation. Expected: reg: v0 = 0x0000000000000000
 +</​code>​
 +
 +make emulwatch gives:
 +<​code>​
 +acanis@acanis-desktop:​~/​work/​legup/​examples/​chstone/​dfmul$ diff -u sim.txt lli.txt ​
 +--- sim.txt 2011-07-07 14:​55:​20.000000000 -0400
 ++++ lli.txt 2011-07-07 14:​55:​18.000000000 -0400
 +@@ -160,12 +160,14 @@
 +   ​%87=ffff000000000000
 +   %88=1
 + ​main:​bb3.i26.i
 +-  %90=0
 ++  %90=1
 ++main:​bb5.i27.i
 ++  %retval.i.i=ffff000000000000
 + ​main:​float64_mul.exit
 +-  %181=3ff8000000000000
 ++  %181=ffff000000000000
 +   ​%183=ffff000000000000
 +-  %184=1
 +-  %186=1
 ++  %184=0
 ++  %186=0
 +   ​%189=4
 +   ​%exitcond=1
 + ​main:​bb2
 +</​code>​
 +
 +The value of %90 is wrong:
 +<​code>​
 +float64_is_signaling_nan.exit.i.i: ​               ​
 +  %84 = phi i32 [ %80, %bb.i.i.i ], [ %retval.i11.i.i,​ %float64_is_signaling_nan.exit14.i.i ], [ 0, %bb16.i.float64_is_signaling_nan.exit14.i.i_crit_edge ]
 +
 +bb3.i26.i: ​                                       ; preds = %float64_is_signaling_nan.exit.i.i
 +  %90 = icmp eq i32 %84, 0
 +</​code>​
 +
 +In both cases we're coming from main:​bb16.i.float64_is_signaling_nan.exit14.i.i_crit_edge.
 +So %84 = 0. Which is also correct in both traces in basic block main:​float64_is_signaling_nan.exit.i.i
 +
 +Where is this in the assembly?
 +Looking at dfmul.s:
 +<​code>​
 +$BB0_40: ​                               # %bb3.i26.i
 +                                        #   in Loop: Header=BB0_1 Depth=1
 + addiu $19, $zero, 0
 + lui $16, %hi(__unnamed_24)
 + xor $19, $16, $19
 + addiu $4, $16, %lo(__unnamed_24)
 + sltu $5, $19, 1
 + jal mprintf
 +</​code>​
 +
 +From below. I already looked at this.  If $19 < 1 then $5 = 1 else $5 = 0.
 +If $19 represents %84 and $5 represents %90 then when $19=0 then $5=1. 
 +The sim says $5 (%90) is 0 when it should be 1.
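To check this on paper, here is a sketch that models the first instructions of $BB0_40 exactly as listed. It is only a reading of the listing, not a confirmed diagnosis. The value 0x8003 for %hi(__unnamed_24) is an assumption, taken from the 0x8003xxxx addresses in the disassembly elsewhere in this log. Read literally, $19 is zeroed and then xor'ed with the lui result before the sltu, so it may not hold %84 at that point:

```python
MASK32 = 0xFFFFFFFF


def sltu(rs, rt_or_imm):
    """Unsigned set-on-less-than: 1 if rs < operand, else 0 (32-bit)."""
    return 1 if (rs & MASK32) < (rt_or_imm & MASK32) else 0


def icmp_eq_zero(value):
    """What 'icmp eq i32 value, 0' should yield when lowered to sltu x, 1."""
    return sltu(value, 1)


def trace_bb0_40(hi16):
    """Run the listed instructions; hi16 stands in for %hi(__unnamed_24)."""
    r19 = 0                       # addiu $19, $zero, 0
    r16 = (hi16 << 16) & MASK32   # lui   $16, %hi(__unnamed_24)
    r19 = r16 ^ r19               # xor   $19, $16, $19
    r5 = sltu(r19, 1)             # sltu  $5, $19, 1
    return r19, r5


if __name__ == "__main__":
    print(icmp_eq_zero(0))                 # the intended result for %84 = 0
    r19, r5 = trace_bb0_40(0x8003)         # assumed upper half of the address
    print(hex(r19), r5)
```

Under that assumption the sequence gives $5 = 0, which would match the sim trace, while the intended comparison of %84 = 0 against zero gives 1.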
 +I would like to step through this code in gxemul.
 +"make emul" runs the following commands:
 +<​code>​
 +../​../​../​mips-binutils/​bin/​mipsel-elf-ld -T ../​../​../​tiger/​linux_tools/​lib/​prog_link_sim.ld -e main dfmul.o -o dfmul.elf -EL -L ../​../​../​tiger/​linux_tools/​lib -lgcc -lfloat -luart_el_sim
 +../​../​../​mips-binutils/​bin/​mipsel-elf-objdump -d dfmul.elf > dfmul.emul.src
 +gxemul -E testmips -e R3000 dfmul.elf -p `../​../​../​tiger/​linux_tools/​lib/​../​find_ra dfmul.emul.src` -p 0xffffffff80000180 -q
 +</​code>​
 +
 +Before running "make emul" you need to compile the .s file with:
 +<​code>​
 +../​../​../​mips-binutils/​bin/​mipsel-elf-as dfmul.s -mips1 -mabi=32 -o dfmul.o -EL
 +../​../​../​mips-binutils/​bin/​mipsel-elf-ld -T ../​../​../​tiger/​linux_tools/​lib/​prog_link.ld -e main dfmul.o ../​../​../​tiger/​tool_source/​lib/​altera_avalon_performance_counter.o -o dfmul.elf -EL -L ../​../​../​tiger/​linux_tools/​lib -lgcc -lfloat -luart
 +../​../​../​mips-binutils/​bin/​mipsel-elf-objdump -D dfmul.elf > dfmul.src
 +../​../​../​tiger/​linux_tools/​lib/​../​elf2sdram dfmul.elf sdram.dat
 +</​code>​
 +
 +
 +Doing an instruction trace with -i. The %90 is printed at line 305279.
 +
 +
 +Fixing lpsolve dependency.
 +Makefile.config is generated by configure from Makefile.config.in
 +Useful guide: http://​llvm.org/​docs/​MakefileGuide.html#​Makefile.config
 +
 +I had to add a new macro called AX_EXT_HAVE_LIB() because the lpsolve
 +library isn't installed in /usr/lib but in /​usr/​lib/​lp_solve/​.
 +The new macro adds the appropriate -L/​usr/​lib/​lp_solve/​ flag.
 +
 +Note that both AX_EXT_HAVE_LIB and AC_SEARCH_LIBS modify the Makefile.config
 +LIBS variable.
 +
 +For some reason the makefile is broken. The LDFLAGS aren'​t ​
 +being added properly:
 +<​code>​
 +/​home/​acanis/​git/​legup/​llvm/​Release/​bin/​tblgen:​ error while loading shared libraries: liblpsolve55.so:​ cannot open shared object file: No such file or directory
 +</​code>​
 +
 +Okay I just changed this to use liblpsolve_pic.a which is compiled with -fPIC to allow shared linkage.
 +
 +
 + --- //​[[andrew.canis@utoronto.ca|Andrew Canis]] 2011/07/07 12:15//
 +
 +Okay. "make test_tiger_sim"​ seems to be failing with my new changes to libuart.
 +Looks like it's a problem with mprintf not working. "make tigersim"​ doesn'​t produce the expected output.
 +
 +Alright so adding this back into uart.h (included from stdio.h):
 +<​code>​
 +#define printf mprintf
 +</​code>​
 +
 +But I still don't get the right output from tiger modelsim:
 +For mips:
 +<​code>​
 +# 1008533759
 +</​code>​
 +For aes:
 +<​code>​
 +# 1008533759
 +</​code>​
 +
 +What does this number mean? gxemul is working fine...
 +Looks like an uninitialized value. Strange, when I explicitly add:
 +<​code>​
 +      main_result = 0;
 +      printf ("​%d\n",​ main_result);​
 +</​code>​
 +I still get the same thing.
 +mprintf() seems to be totally broken:
 +<​code>​
 +      printf ("​---->​%d %d %d %d\n", 0, 1, -1, main_result);​
 +</​code>​
 +Gives:
 +<​code>​
 +# ---->​1008533759 935190524 201326600 0
 +</​code>​
 +Where are these numbers coming from??
 +Is it some sort of bug in the llvm mips backend maybe?
 +I should test mprintf with gxemul.
 +For some reason printf isn't working with gxemul. Strange, because make emulwatch uses printf.
 +I do see a little bit of magic in the emulwatch target:
 +<​code>​
 +sed -i "​s/​\tprintf/​\tmprintf/​g"​ mips.s
 +</​code>​
 +
 +Oh shit. I need to run "make tiger" first _before_ running "make emul"
 +Okay, I think something is broken with mprintf.
 +This:
 +<​code>​
 +      printf ("​Start\n"​);​
 +      printf ("​---->'​%d'​ '​%d'​ '​%d'​ '​%d'​\n",​ 0, 1, -1, main_result);​
 +      printf ("​End\n"​);​
 +</​code>​
 +Doesn'​t simulate properly in gxemul:
 +<​code>​
 +$ make tiger;make emul
 +...
 +Start
 +---->'​ffffffff80000180:​ 00000000 ​       nop
 +BREAKPOINT: pc = 0xffffffff80000180
 +(The instruction has not yet executed.)
 +</​code>​
 +
 +The code dies in the middle of mprintf().
 +In particular, the variable arguments seem to be failing:
 +<​code>​
 +va_arg(arg, int)
 +</​code>​
 +Seems to crash the whole program. I bet the mips backend doesn'​t
 +support variable arguments...
 +I see in the release notes for a newer LLVM version something about
 +improved support for variable arguments in the mips backend.
 +
 +I'm going to have to install the mips-gcc after all.
 +Installing from the site: http://​crosstool-ng.org/​
 +I'm putting the mips gcc in ~/​crosstool/​gcc
 +gcc gets installed in ~/x-tools/
 +
 +I need to recompile with hardware-float to avoid this warning:
 +<​code>​
 +../​../​../​mips-binutils/​bin/​mipsel-elf-ld:​ Warning: mips.elf uses hard float, ../​../​../​tiger/​linux_tools/​lib/​libuart.a(uart.o) uses soft float
 +</​code>​
 +
 +
 +
 + --- //​[[andrew.canis@utoronto.ca|Andrew Canis]] 2011/07/06 12:15//
 +
 +After Mark's push function_pointer seems to be failing:
 +<​code>​
 +make[1]: Leaving directory `/​home/​acanis/​git/​legup/​examples/​function_pointer'​
 +function_pointer.c:​ In function ‘a’:
 +function_pointer.c:​3:​ warning: ‘return’ with a value, in function returning void
 +function_pointer.c:​ In function ‘b’:
 +function_pointer.c:​4:​ warning: ‘return’ with a value, in function returning void
 +llc: utils.cpp:​48:​ llvm::​Function* legup::​getCalledFunction(llvm::​CallInst*):​ Assertion `called'​ failed.
 +</​code>​
 +
 +Of course. Because function pointers aren't supported by LegUp. I should make
 +this error more user friendly. Added new test suite files for this.
 +
 +llist is failing because NULL is undeclared. NULL is normally declared in stdio.h
 +
 +
 + --- //​[[andrew.canis@utoronto.ca|Andrew Canis]] 2011/07/05 12:15//
 +
 +I just pulled Victor'​s fix to mprintf().
 +So it seems to work. The make emulwatch now doesn'​t show any differences.
 +
 +Interesting. So when I run: 
 +<​code>​
 +make emulwatch
 +make emultest ​
 +</​code>​
 +The result is correct. But running make tiger; make emultest fails:
 +<​code>​
 +exit at: pc = 0xffffffff80031d4c
 +reg: v0 = 0x0000000000000002
 +</​code>​
 +
 +To see the output from the dfmul printf, run "make emul".
 +I see two discrepancies:​
 +<​code>​
 +a_input=7ff0000000000000 b_input=ffffffffffffffff expected=ffffffffffffffff output=7ff8000000000000
 +a_input=3ff0000000000000 b_input=ffff000000000000 expected=ffff000000000000 output=3ff8000000000000
 +</​code>​
 +
 +But make emulwatch doesn't show any difference for these errors. How do I track down the problem?
 +Well this is because "make emulwatch"​ has the correct results.
 +So by adding printfs everywhere the bug is removed.
 +
 +Strange. When I diff the .src file between the watch version and the original
 +the watch has a slightly different _i2h function:
 +<​code>​
 +--- dfmul.emul.src ​     2011-06-06 23:​16:​59.000000000 -0400
 ++++ watch.src ​  ​2011-06-06 23:​16:​53.000000000 -0400
 +@@ -125,7 +125,7 @@
 + ​800301ac: ​     00000000 ​       nop
 + ​800301b0: ​     00021880 ​       sll     ​v1,​v0,​0x2
 + ​800301b4: ​     3c028003 ​       lui     ​v0,​0x8003
 +-800301b8: ​     244223a0 ​       addiu   ​v0,​v0,​9120
 ++800301b8: ​     24423060 ​       addiu   ​v0,​v0,​12384
 + ​800301bc: ​     00621021 ​       addu    v0,v1,v0
 + ​800301c0: ​     8c420000 ​       lw      v0,0(v0)
 + ​800301c4: ​     00000000 ​       nop
 +</​code>​
 +
 +There are massive differences in the main function.
 +Very strange. When I shrink the array size to 2 (the two errors) the results are correct.
 +If I remove the bottom 10 elements I still see the error.
 +If I reduce it to 5 elements, the first error goes away.
 +What could cause this? Some kind of bug with the stack when calling a function?
 +When I change N to be 10 when there are only 5 array elements I get the same bug.
 +If I call float64_mul on the first element 10 times I don't see the problem.
 +Is this function call related?
 +Weird, when I comment out the printf I don't get the error. But can't be the printf because
 +"make emulwatch"​ worked.
 +
 +Specifically it appears to be the printing of a_input:
 +<​code>​
 +      // error:
 +   printf ("​a_input=%016llx\n",​ a_input[i]);​
 +
 +      // no error:
 +   //printf ("​z_output=%016llx\n",​ z_output[i]);​
 +      ​
 +      // no error:
 +   //printf ("​results=%016llx\n",​ result);
 +
 +      // no error
 +   //printf ("​a_input=%016llx\n",​ b_input[i]);​
 +</​code>​
 +
 +It's some kind of bug inside: propagateFloat64NaN().
 +But when I add a printf after each statement llc dies:
 +<​code>​
 +../​../​../​build/​bin/​llc dfmul.bc -march=mipsel -relocation-model=static -mips-ssection-threshold=0 -mcpu=mips1 -o dfmul.s
 +llc: /​home/​acanis/​work/​legup/​llvm/​include/​llvm/​CodeGen/​LiveInterval.h:​355:​ llvm::​SlotIndex llvm::​LiveInterval::​beginIndex() const: Assertion `!empty() && "Call to beginIndex() on empty interval."'​ failed.
 +Stack dump:
 +0.      Program arguments: ../​../​../​build/​bin/​llc dfmul.bc -march=mipsel -relocation-model=static -mips-ssection-threshold=0 -mcpu=mips1 -o dfmul.s ​
 +1.      Running pass '​Function Pass Manager'​ on module '​dfmul.bc'​.
 +2.      Running pass '​Linear Scan Register Allocator'​ on function '​@propagateFloat64NaN'​
 +</​code>​
 +I'm not sure if this is related.
 +
 +Something about printing "​a"​ seems to fix the final errors.
 +Actually if I add a printf for bIsNaN (which is 1) I also fix one of the errors.
 +<​code>​
 +  printf ("3: %d\n", bIsNaN);
 +</​code>​
 +Why would a printf fix anything?
 +
 +It's like the if statement isn't working properly...
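A sketch of the NaN propagation logic to compare against the two failing rows. It follows the SoftFloat routines as I recall them (propagateFloat64NaN, float64_is_nan, float64_is_signaling_nan), so treat the exact definitions as an assumption and check them against the dfmul source:

```python
MASK64 = 0xFFFFFFFFFFFFFFFF
QUIET_BIT = 0x0008000000000000


def float64_is_nan(a):
    """NaN: exponent all ones and a non-zero fraction (sign ignored)."""
    return 0xFFE0000000000000 < ((a << 1) & MASK64)


def float64_is_signaling_nan(a):
    """Signaling NaN: NaN with the quiet bit (bit 51) clear."""
    return ((a >> 51) & 0xFFF) == 0xFFE and (a & 0x0007FFFFFFFFFFFF) != 0


def propagate_float64_nan(a, b):
    """Pick which operand to return, with its quiet bit set."""
    a_is_snan = float64_is_signaling_nan(a)
    b_is_nan = float64_is_nan(b)
    b_is_snan = float64_is_signaling_nan(b)
    a |= QUIET_BIT
    b |= QUIET_BIT
    if b_is_snan:
        return b
    if a_is_snan:
        return a
    return b if b_is_nan else a


if __name__ == "__main__":
    for a, b in [(0x7FF0000000000000, 0xFFFFFFFFFFFFFFFF),
                 (0x3FF0000000000000, 0xFFFF000000000000)]:
        print("%016x %016x -> %016x, a|quiet = %016x"
              % (a, b, propagate_float64_nan(a, b), a | QUIET_BIT))
```

With these definitions both rows return b, matching the expected column. The two wrong outputs (7ff8000000000000 and 3ff8000000000000) equal a with the quiet bit set, which is what the final fall-through returns when bIsNaN is treated as false. That would be consistent with the branch on bIsNaN going the wrong way, though it does not prove it.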
 +
 +Okay. With my new modified code "make emulwatch"​ is now giving me this:
 +<​code>​
 +--- lli.txt ​    ​2011-06-07 02:​02:​10.000000000 -0400
 ++++ sim.txt ​    ​2011-06-07 02:​02:​12.000000000 -0400
 +@@ -160,14 +160,12 @@
 +   ​%87=ffff000000000000
 +   %88=1
 + ​main:​bb3.i26.i
 +-  %90=1
 +-main:​bb5.i27.i
 +-  %retval.i.i=ffff000000000000
 ++  %90=0
 + ​main:​float64_mul.exit
 +-  %181=ffff000000000000
 ++  %181=3ff8000000000000
 +   ​%183=ffff000000000000
 +-  %184=0
 +-  %186=0
 ++  %184=1
 ++  %186=1
 +   ​%189=4
 +   ​%exitcond=1
 + ​main:​bb2
 +</​code>​
 +
 +Looks like the sim.txt is missing a basic block: main:​bb5.i27.i
 +
 +The first difference is:
 +<​code>​
 +  %90 = icmp eq i32 %84, 0
 +</​code>​
 +Maybe the icmp is invalid in the mips assembly?
 +
 +
 +<​code>​
 +bb3.i26.i: ​                                       ; preds = %float64_is_signaling_nan.exit.i.i
 +  %90 = icmp eq i32 %84, 0
 +  %91 = call i32 (i8*, ...)* @printf(i8* getelementptr inbounds ([42 x i8]* @23, i32 0, i32 0), i1 %90)
 +  br i1 %90, label %bb5.i27.i, label %float64_mul.exit
 +</​code>​
 +
 +<​code>​
 +$BB0_40: ​                               # %bb3.i26.i
 +                                        #   in Loop: Header=BB0_1 Depth=1
 + addiu $19, $zero, 0
 + lui $16, %hi(__unnamed_24)
 + xor $19, $16, $19
 + addiu $4, $16, %lo(__unnamed_24)
 + sltu $5, $19, 1
 + jal mprintf
 + nop
 + beq $16, $zero, $BB0_42
 + nop
 +# BB#​41: ​                               #   in Loop: Header=BB0_1 Depth=1
 + lw $19, 296($sp)
 + nop
 + j $BB0_76
 + nop
 +$BB0_42: ​                               # %bb5.i27.i
 +                                        #   in Loop: Header=BB0_1 Depth=1
 + lui $2, %hi(__unnamed_25)
 + lw $19, 296($sp)
 + nop
 + beq $17, $zero, $BB0_44
 + nop
 +# BB#​43: ​                               # %bb5.i27.i
 +                                        #   in Loop: Header=BB0_1 Depth=1
 + lw $19, 292($sp)
 + nop
 +$BB0_44: ​                               # %bb5.i27.i
 +                                        #   in Loop: Header=BB0_1 Depth=1
 + beq $17, $zero, $BB0_46
 + nop
 +# BB#​45: ​                               # %bb5.i27.i
 +                                        #   in Loop: Header=BB0_1 Depth=1
 + addu $18, $zero, $21
 +$BB0_46: ​                               # %bb5.i27.i
 +                                        #   in Loop: Header=BB0_1 Depth=1
 + addiu $4, $2, %lo(__unnamed_25)
 + addu $5, $zero, $19
 + addu $6, $zero, $18
 + jal mprintf
 + nop
 + j $BB0_76
 + nop
 +...
 +$BB0_76: ​                               # %float64_mul.exit
 +                                        #   in Loop: Header=BB0_1 Depth=1
 +</​code>​
 +
 +Why is bb5.i27.i split into so many different basic blocks?
 +
 +Does this represent the icmp? yes. If $19 < 1 then $5 = 1 else $5 = 0.
 +If $19 represents %84 then when $19=0 then $5=1. 
 +<​code>​
 + sltu $5, $19, 1
 +</​code>​
 +In make watch. %84 seems to be correct. When does %90=0 in gxemul?
 +
 +It's very hard to correlate the .s file to the final disassembled .src file.
 +
 +Can I use bugpoint to make this bug smaller?
 +
 +
 +
 + --- //​[[andrew.canis@utoronto.ca|Andrew Canis]] 2011/06/06 12:15//
 +
 +Cool tool for calculating lines of code: sloccount
 +
 +Okay. There is a very strange bug:
 +<​code>​
 +int main () {
 +    volatile unsigned long long testing = 0x7FFFFFFFFFFFFFFFULL;​
 +    printf ("​testing=%016llx\n",​ testing);
 +}
 +</​code>​
 +
 +When I run make emulwatch:
 +<​code>​
 +acanis@acanis-desktop:​~/​work/​legup/​examples/​mips_bug$ make emulwatch
 +acanis@acanis-desktop:​~/​work/​legup/​examples/​mips_bug$ diff -u lli.txt sim.txt ​
 +--- lli.txt ​    ​2011-06-03 15:​51:​39.000000000 -0400
 ++++ sim.txt ​    ​2011-06-03 15:​51:​40.000000000 -0400
 +@@ -1,2 +1,2 @@
 + ​main:​entry
 +-  %0=7fffffffffffffff
 ++  %0=ffffffffffffffff
 +</​code>​
 +
 +The gxemul emulator seems to sign extend the unsigned long long. 
 +What about if it's just a normal 32-bit long?
 +Okay. That matches fine. No sign extend problem.
 +Must be an issue with 64-bit integers. ​
 +Let's compare the .s assembly with the old version of LLVM.
 +Same problem... Wow. So this is a bug that hasn't been filed yet.
 +The issue must be with something else.
 +I'll file this bug right now.
 +
 +I think it's just treating an unsigned number as a signed number.
 +Victor mentioned a problem with the ldu instruction.
 +
 +This works fine:
 +<​code>​
 +    volatile unsigned long long testing = 0x7FFFFFFFULL;​
 +</​code>​
 +But this has the sign extend problem:
 +<​code>​
 +    volatile unsigned long long testing = 0xFFFFFFFFULL;​
 +</​code>​
 +The diff between the above two snippets:
 +<​code>​
 +acanis@acanis-desktop:​~/​work/​legup/​examples/​mips_bug$ diff -u bad.s good.s
 +--- bad.s       ​2011-06-03 16:​44:​36.000000000 -0400
 ++++ good.s ​     2011-06-03 16:​45:​03.000000000 -0400
 +@@ -18,8 +18,9 @@
 +        sw      $16, 20($sp)
 +        sw      $17, 16($sp)
 +        addiu   $2, $sp, 24
 ++       ​lui ​    $3, 32767
 +        ori     $2, $2, 4
 +-       ​addiu ​  $3, $zero, -1
 ++       ​ori ​    $3, $3, 65535
 +        sw      $3, 24($sp)
 +        sw      $zero, 0($2)
 +        lui     $3, %hi($.str)
 +</​code>​
 +
 +MIPS reference:
 +<​code>​
 +LUI -- Load upper immediate
 +    Description:​ The immediate value is shifted left 16 bits and stored in the register. The lower 16 bits are zeroes.
 +    Operation: $t = (imm << 16); advance_pc (4);
 +ADDIU -- Add immediate unsigned (no overflow)
 +    Description:​ Adds a register and a sign-extended immediate value and stores the result in a register
 +    Operation: $t = $s + imm; advance_pc (4);
 +</​code>​
 +
 +In the good case: (32767 << 16) + 65535 = 2147483647 (0x7FFFFFFF). Which is right.
 +But in the bad case: -1 sign extended is all ones. But if $3 is actually 64 bits wide
 +this will be wrong.
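A sketch that models the three instructions involved, with the register width as a parameter. The semantics follow the MIPS reference quoted above plus the usual rule that 64-bit MIPS sign-extends 32-bit results. The helper names are my own:

```python
def to_width(value, width):
    """Truncate a Python int to an unsigned register of the given width."""
    return value & ((1 << width) - 1)


def sign_extend(value, bits):
    """Interpret the low `bits` bits of value as a two's complement number."""
    value &= (1 << bits) - 1
    return value - (1 << bits) if value & (1 << (bits - 1)) else value


def lui(imm, width=32):
    """Load upper immediate: imm << 16, low 16 bits zero."""
    return to_width(sign_extend((imm & 0xFFFF) << 16, 32), width)


def ori(rs, imm):
    """OR with a zero-extended 16-bit immediate."""
    return rs | (imm & 0xFFFF)


def addiu(rs, imm, width=32):
    """Add a sign-extended 16-bit immediate, 32-bit result, no overflow trap."""
    return to_width(sign_extend(rs + sign_extend(imm, 16), 32), width)


if __name__ == "__main__":
    print(hex(ori(lui(32767), 65535)))   # good.s: lui + ori
    print(hex(addiu(0, -1, 32)))         # bad.s on a 32-bit register
    print(hex(addiu(0, -1, 64)))         # bad.s on a 64-bit register
```

On a 32-bit register the addiu form produces 0xffffffff, the intended low word. Only a 64-bit register ends up with ones in the upper half.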
 +
 +Could it be a problem with the emulator not supporting 64-bit integers?
 +I doubt it.
 +
 +Looking on the LLVM release notes for 3.0:
 +<​code>​
 +Known problems with the MIPS back-end
 +     * 64-bit MIPS targets are not supported yet.
 +</​code>​
 +But shouldn'​t matter because Tiger MIPS is a 32-bit processor.
 +
 +When I run 'make tigerwatch'​ I get:
 +<​code>​
 +acanis@acanis-desktop:​~/​work/​legup/​examples/​mips_bug$ diff -u lli.txt sim.txt ​
 +--- lli.txt ​    ​2011-06-03 16:​50:​21.000000000 -0400
 ++++ sim.txt ​    ​2011-06-03 16:​50:​27.000000000 -0400
 +@@ -1,2 +1,2 @@
 + ​main:​entry
 +-  %0=ffffffff
 ++  %0=0
 +</​code>​
 +
 +How are 64-bit integers treated in a MIPS1 ISA? MIPS1 is a 32-bit ISA.
 +So actually "​addiu ​  $3, $zero, -1" should be correct. ​
 +
 +I'm going to try to step through the code in gxemul. The normal 'make emul' command:
 +<​code>​
 +gxemul -E testmips -e R3000 mips_bug.elf -p `../​../​tiger/​linux_tools/​lib/​../​find_ra mips_bug.emul.src`
 +</​code>​
 +find_ra finds the return address of the main() function so you know when to break the gxemul simulation.
 +In this case it returns: 0xffffffff80031400
 +When I look in mips_bug.emul.src I see:
 +<​code>​
 +80031400:​ 03e00008 jr ra
 +</​code>​
 +So you must have to pad the breakpoint address with 1's.
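The padding looks like plain sign extension of the 32-bit address to 64 bits. A minimal sketch, assuming that is all find_ra does to the address (the function name gxemul_breakpoint is hypothetical):

```python
def gxemul_breakpoint(addr32):
    """Sign-extend a 32-bit address to the 64-bit hex form shown by find_ra."""
    addr = addr32 & 0xFFFFFFFF
    if addr & 0x80000000:
        # Bit 31 set: fill the upper 32 bits with ones.
        addr |= 0xFFFFFFFF00000000
    return "0x%016x" % addr


if __name__ == "__main__":
    print(gxemul_breakpoint(0x80031400))
```

For the jr ra at 80031400 this gives 0xffffffff80031400, the same value find_ra returned.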
 +
 +There is a slight difference between the mips_bug.emul.src and the mips_bug.s file:
 +mips_bug.s file:
 +<​code>​
 + sw $17, 16($sp)
 + addiu $2, $sp, 24
 + ori $2, $2, 4
 + addiu $3, $zero, -1
 + sw $3, 24($sp)
 +</​code>​
 +The mips_bug.emul.src file:
 +<​code>​
 +8003138c:​ afb10010 sw s1,​16(sp)
 +80031390:​ 27a20018 addiu v0,​sp,​24
 +80031394:​ 34420004 ori v0,​v0,​0x4
 +80031398:​ 2403ffff li v1,-1
 +8003139c:​ afa30018 sw v1,​24(sp)
 +</​code>​
 +
 +The addiu became an li.
 +Let's step through:
 +<​code>​
 + ​gxemul -E testmips -e R3000 mips_bug.elf -p 0x80031394
 +</​code>​
 +
 +Seems like the registers are actually 64-bit in this machine...
 +<​code>​
 +GXemul> s
 +ffffffff80031394:​ 34420004 ​     ori     ​v0,​v0,​0x0004
 +GXemul> s
 +ffffffff80031398:​ 2403ffff ​     addiu   ​v1,​zr,​-1
 +GXemul> reg
 +cpu0:    pc = 0xffffffff8003139c ​   <​main+0x1c>​
 +...
 +cpu0:    v1 = 0xffffffffffffffff ​   s3 = 0x0000000000000000
 +</​code>​
 +
 +It seems like gxemul is simulating a 64-bit little-endian machine:
 +<​code>​
 +GXemul> machine
 +serial nr: 1  (nr of NICs: 1)
 +memory: 32 MB
 +cpu0: 5KE, running
 +    64-bit Little-endian (MIPS64, revision 2), 48 TLB entries
 +    L1 I-cache: 32 KB, 32 bytes per line, 2-way
 +    L1 D-cache: 32 KB, 32 bytes per line, 2-way
 +</​code>​
 +
 +
 + --- //​[[andrew.canis@utoronto.ca|Andrew Canis]] 2011/06/03 12:15//
 +
 +Did I even apply Victor's changes to lib/Target/Mips/MipsRegisterInfo.cpp?
 +
 +No I didn'​t. The MIPS backend code exactly matches the git version.
 +So I need to apply Victor'​s patches manually.
 +
 +Okay, a bunch of the stack code has been moved into a new file:
 +<​code>​
 +MipsFrameLowering.cpp
 +</​code>​
 +
 +Okay I've tried to reapply the patch.
 +Only dfmul is failing now.
 +
 +Seems to be some kind of sign extension problem?
 +In most cases gxemul seems to be sign extending while lli isn't.
 +
 + --- //​[[andrew.canis@utoronto.ca|Andrew Canis]] 2011/06/02 12:15//
 +
 +There is a CallInst function called getArgOperand() which I should be using.
 +
 +They finally fixed the llvm.vim syntax file.
 +
 +Okay. I fixed the uadd.overload.* intrinsic problem.
 +
 +Now I'm down to some gxemul errors for dfmul, llist, loopbug,
 +memset.
 +
 +Could this be caused by Victor'​s MIPS changes?
 +Maybe the MIPS backend has been fixed/​broken?​
 +
 +Looking into loopbug benchmark, from git commit:
 +<​code>​
 +commit 8cdf9e016927d9361144260ccaf87d74e58ebaa8
 +Author: Andrew Canis <​andrew.canis@gmail.com>​
 +Date:   Tue Aug 24 22:20:04 2010 -0400
 +
 +    Test case for LLVM MIPS backend bug.
 +    ​
 +    Expected:
 +    $ make
 +    $ lli loopbug.bc
 +    ​
 +    On MIPS (using gxemul emulator):
 +    $ make tiger
 +    $ make emul
 +</​code>​
 +
 +Just double checked that simple backup is working again. Looks good.
 +
 +Can I try this with the unmodified llvm version?
 +Here's the command that produces the mips assembly:
 +<​code>​
 +../​../​build/​bin/​llc loopbug.bc -march=mipsel -relocation-model=static -mips-ssection-threshold=0 -mcpu=mips1 -o loopbug.s
 +</​code>​
 +
 +I just installed the 2.9 binaries in ~/​downloads/​llvm-2.9-mingw32-i386
 +Same error with the newer version of llc.
 +
 +I probably incorporated the mips backend changes incorrectly.
 +
 + --- //​[[andrew.canis@utoronto.ca|Andrew Canis]] 2011/06/01 12:15//
 +
 +Whoops, noticed that simple backup wasn't working (isis has gone down).
 +
 +Mips, gsm fail:
 +<​code>​
 +# ** Error: gsm.v(1692):​ Module '​memset'​ is not defined.
 +# ** Error: mips.v(740):​ Module '​memset'​ is not defined.
 +</​code>​
 +This also causes gxemul to fail:
 +<​code>​
 +../​../​../​mips-binutils/​bin/​mipsel-elf-ld -T ../​../​../​tiger/​linux_tools/​lib/​prog_link.ld -e main gsm.o ../​../​../​tiger/​tool_source/​lib/​altera_avalon_performance_counter.o -o gsm.elf -EL -L ../​../​../​tiger/​linux_tools/​lib -lgcc -lfloat -luart
 +make[1]: Leaving directory `/​home/​acanis/​work/​legup/​examples/​chstone/​gsm'​
 +gsm.o: In function `main':​
 +(_main_section+0x118):​ undefined reference to `memset'​
 +</​code>​
 +
 +Looking at the memset test. It looks like the legup versions aren't being linked properly.
 +
 +<​code>​
 +../​../​build/​bin/​llvm-ld ​ memset.prelto.bc ../​lib/​llvm/​liblegup.a -b=memset.premodulo.bc
 +</​code>​
 +
 +First of all it seems like the intrinsic lowering pass is no longer working properly:
 +<​code>​
 +acanis@acanis-desktop:​~/​work/​legup/​examples/​memset$ diff -u memset.prelto.ll memset.premodulo.ll ​
 +...
 +-declare void @llvm.memcpy.p0i8.p0i8.i32(i8* nocapture, i8* nocapture, i32, i32, i1) nounwind
 +-
 +-declare void @llvm.memset.p0i8.i64(i8* nocapture, i8, i64, i32, i1) nounwind
 +-
 + ​declare i8* @memcpy(i8*,​ i8*, i32)
 + 
 + ​declare i8* @memset(i8*,​ i32, i32)
 +</​code>​
 +There are still two intrinsic calls in there.
 +Rebuilding ../​lib/​llvm/​liblegup.a ​ doesn'​t help.
 +There is some sort of problem. Basically memset() is not being linked into memset.premodulo.ll
 +
 +So previously the prelto pass replaces:
 +<​code>​
 +  call void @llvm.memset.i64(i8* %arr_addr.04.1.i31,​ i8 0, i64 11, i32 1) nounwind
 +</​code>​
 +With:
 +<​code>​
 +  %16 = call i8* @legup_memset_1(i8* %arr_addr.04.1.i31,​ i8 0, i64 11) ; <i8*> [#uses=0]
 +</​code>​
 +The postfix "​_1"​ indicates a 1 byte alignment.
 +The type of the 3rd argument (length) is i64.
 +
 +I see in the LLVM manual for the SVN head (http://​llvm.org/​docs/​) that the function name has changed:
 +<​code>​
 +  declare void @llvm.memset.p0i8.i32(i8* <​dest>,​ i8 <​val>,​ i32 <​len>,​ i32 <​align>,​ i1 <​isvolatile>​)
 +  declare void @llvm.memset.p0i8.i64(i8* <​dest>,​ i8 <​val>,​ i64 <​len>,​ i32 <​align>,​ i1 <​isvolatile>​)
 +</​code>​
 +
 +It's pretty amazing how fast LLVM changes. We're at the 2.7 release and 3.0 is coming out soon.
 +The old 2.7 syntax (from http://​llvm.org/​releases/​2.7/​docs/​LangRef.html#​int_memset)
 +<​code>​
 +  declare void @llvm.memset.i8(i8 * <​dest>,​ i8 <​val>,​ i8 <​len>,​ i32 <​align>​)
 +  declare void @llvm.memset.i16(i8 * <​dest>,​ i8 <​val>,​ i16 <​len>,​ i32 <​align>​)
 +  declare void @llvm.memset.i32(i8 * <​dest>,​ i8 <​val>,​ i32 <​len>,​ i32 <​align>​)
 +  declare void @llvm.memset.i64(i8 * <​dest>,​ i8 <​val>,​ i64 <​len>,​ i32 <​align>​)
 +</​code>​
 +In release notes for 2.8:
 +<​code>​
 + The memcpy, memmove, and memset intrinsics now take address space qualified
 + ​pointers and a bit to indicate whether the transfer is "​volatile"​ or not.
 +</​code>​
 +
 +Our PreLTO pass seems to be failing. The intrinsics are successfully turned into memcpy calls,
 +but those should then be turned into legup_memset_* calls.
 +
 +It's strange. The old version of the code doesn'​t lower anything. While the newer
 +version prints (in -debug mode):
 +<​code>​
 +Lowering: ​  call void @llvm.memcpy.p0i8.p0i8.i32(i8* %carray1, i8* getelementptr inbounds ([12 x i8]* @C.17.1564, i32 0, i32 0), i32 12, i32 1, i1 false)
 +New instruction: ​  %0 = call i8* @memcpy(i8* %carray1, i8* getelementptr inbounds ([12 x i8]* @C.17.1564, i32 0, i32 0), i32 12)
 +</​code>​
 +
 +Wow really strange. After touching the file PreLTO.cpp I now get this error:
 +unknown instruction on intrinsic argument
 +<​code>​
 +UNREACHABLE executed at /​home/​acanis/​work/​legup/​llvm/​lib/​Transforms/​LegUp/​PreLTO.cpp:​164!
 +Stack dump:
 +0.      Program arguments: ../​../​build/​bin/​opt -load=../​../​build/​lib/​LLVMLegUp.so -legup-prelto
 +1.      Running pass '​Function Pass Manager'​ on module '<​stdin>'​.
 +2.      Running pass '​Pre-Link Time Optimization Pass to lower intrinsics'​ on function '​@main'​
 +/bin/bash: line 1: 15114 Aborted ​                ​../​../​build/​bin/​opt -load=../​../​build/​lib/​LLVMLegUp.so -legup-prelto < memset.prelto.linked.bc > memset.prelto.bc
 +</​code>​
 +Did the makefile not build this properly before?
 +
 +Okay, so the code isn't handling getelementptr'​s properly.
 +Actually I'm a little bit confused by the code. The legup_* prefix is determined by the destination pointer...
 +
 +<​code>​
 +  call void @llvm.memcpy.i32(i8* %carray1, i8* getelementptr inbounds ([12 x i8]* @C.17.1564, i32 0, i32 0), i32 12, i32 1)
 +  call void @llvm.memcpy.i32(i8* %sarray2, i8* bitcast ([12 x i16]* @C.18.1565 to i8*), i32 24, i32 2)
 +  call void @llvm.memcpy.i32(i8* %array3, i8* bitcast ([12 x i32]* @C.19.1566 to i8*), i32 48, i32 4)
 +  call void @llvm.memcpy.i32(i8* %larray4, i8* bitcast ([12 x i64]* @C.20.1567 to i8*), i32 96, i32 8)
 +</​code>​
 +
 +Turns into:
 +<​code>​
 +  %0 = call i8* @legup_memcpy_1(i8* %carray1, i8* getelementptr inbounds ([12 x i8]* @C.17.1564, i32 0, i32 0), i32 12) ; <i8*> [#uses=0]
 +  %1 = call i8* @legup_memcpy_2(i8* %sarray2, i8* bitcast ([12 x i16]* @C.18.1565 to i8*), i32 24) ; <i8*> [#uses=0]
 +  %2 = call i8* @legup_memcpy_4(i8* %array3, i8* bitcast ([12 x i32]* @C.19.1566 to i8*), i32 48) ; <i8*> [#uses=0]
 +  %3 = call i8* @legup_memcpy_8(i8* %larray4, i8* bitcast ([12 x i64]* @C.20.1567 to i8*), i32 96) ; <i8*> [#uses=0]
 +</​code>​
 +Why can't you just use the alignment parameter?
 +For instance:
 +<​code>​
 +Lowering for LegUp: ​  call void @llvm.memset.p0i8.i64(i8* %16, i8 0, i64 96, i32 8, i1 false)
 +</​code>​
 +The destination is:   %16 = bitcast [12 x i64]* %larray to i8*
 +Which points to an array of i64's so the alignment is calculated to be 8 (64/8).
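
A minimal sketch of the alignment rule described above, assuming the alignment is simply the element bit width divided by 8 and that it becomes the suffix of the legup_* name. Both helper names are hypothetical and are not taken from PreLTO.cpp.

```cpp
#include <cassert>
#include <string>

// Hypothetical helper (not from PreLTO.cpp): byte alignment implied by the
// bit width of the element type that the destination pointer refers to.
unsigned alignmentFromElementBits(unsigned elementBits) {
    return elementBits / 8;  // e.g. i64 -> 64 / 8 = 8
}

// Hypothetical helper: builds the alignment-specific replacement name,
// e.g. base "memset" with i64 elements gives "legup_memset_8".
std::string legupCallName(const std::string &base, unsigned elementBits) {
    return "legup_" + base + "_" +
           std::to_string(alignmentFromElementBits(elementBits));
}
```

For the four arrays in the memcpy example (i8, i16, i32, i64) this yields the suffixes _1, _2, _4 and _8, matching the lowered calls shown earlier.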
 +  ​
 +Damn, I just got hit by the new API change again: the CallInst operand order
 +changed. The called function is now stored as the last operand instead of the
 +first.
 +
 +Okay that worked. Down to 16 failures. I have a couple of unexplained gxemul simulation errors...
 +
 +dfdiv, dfmul, dfsin, sha:
 +<​code>​
 +LLVM ERROR: Code generator does not support intrinsic function '​llvm.uadd.with.overflow.i64'​!
 +</​code>​
 +
 +The actual error comes from:
 +<​code>​
 +lib/​CodeGen/​IntrinsicLowering.cpp:​353: ​   report_fatal_error("​Code generator does not support intrinsic function '"​+
 +</​code>​
 +
 +From the LLVM docs:
 +<​code>​
 +The '​llvm.uadd.with.overflow'​ family of intrinsic functions perform an unsigned
 +addition of the two arguments, and indicate whether a carry occurred during the
 +unsigned summation.
 +</​code>​
 +
 +So I get code looking like:
 +<​code>​
 +  %uadd.i = call %0 @llvm.uadd.with.overflow.i64(i64 %105, i64 %106) nounwind
 +  %108 = extractvalue %0 %uadd.i, 0
 +  %109 = extractvalue %0 %uadd.i, 1
 +</​code>​
 +
 +Which could easily be converted to Verilog, with a as the 1-bit carry and b as the 64-bit sum:
 +{a, b} = c + d;
 +
 +But what's the best way of handling this?
 +I think the easiest way is to turn this into an i65 addition. And shift out the carry bit.
 +Quartus should easily optimize this to the correct hardware.
 +I'll just add this to the PreLTO pass.
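
A rough illustration of the widening idea, not the actual PreLTO change. C++ has no 65-bit integer, so this sketch uses the 32-bit analogue: widen both operands to 64 bits, add, then split the result into the sum and the carry bit. The struct and function names are made up for this example.

```cpp
#include <cassert>
#include <cstdint>

// Mirrors the {sum, carry} pair returned by llvm.uadd.with.overflow.
struct AddResult {
    uint32_t sum;
    bool carry;
};

// 32-bit analogue of the i64 -> i65 approach: zero-extend both operands,
// add in the wider type, then split the result.
AddResult uaddWithOverflow32(uint32_t a, uint32_t b) {
    uint64_t wide = static_cast<uint64_t>(a) + static_cast<uint64_t>(b);
    AddResult r;
    r.sum = static_cast<uint32_t>(wide);  // low 32 bits (extractvalue ..., 0)
    r.carry = ((wide >> 32) & 1u) != 0;   // bit 32 is the carry (extractvalue ..., 1)
    return r;
}
```

The i64 case would follow the same pattern with a zext to i65, an add, a trunc for the sum and a shift plus trunc for the carry.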
 +
 +
 +
 + --- //​[[andrew.canis@utoronto.ca|Andrew Canis]] 2011/05/31 12:15//
 +
 +mips intrinsic error with new llvm version:
 +<​code>​
 +../​../​../​build/​bin/​opt:​ symbol lookup error: ../​../​../​build/​lib/​LLVMLegUp.so:​
 +undefined symbol: _ZN4llvm17IntrinsicLowering18LowerIntrinsicCallEPNS_8CallInst
 +</​code>​
 +This is a linker error I experienced previously. I fixed the autoconf makefile
 +flow, now I need to fix cmake.
 +
 +I need to include:
 +LDFLAGS = $(LLVM_OBJ_ROOT)/​lib/​CodeGen/​$(BuildMode)/​IntrinsicLowering.o ​
 +
 +How do I handle this in cmake?
 +
 +What does this mean?
 +<​code>​
 +add_llvm_loadable_module( LLVMLegUp
 +..
 +</​code>​
 +This creates the build/​lib/​LLVMLegUp.so shared library.
 +There are only a few other examples of this in the code.
 +This doesn'​t fix it:
 +<​code>​
 +add_dependencies(LLVMLegUp LLVMCodeGen)
 +</​code>​
 +I need to actually link the LLVMCodeGen library into the LLVMLegUp.so library:
 +<​code>​
 +target_link_libraries(LLVMLegUp LLVMCodeGen)
 +</​code>​
 +Okay adding this to /​llvm/​lib/​Transforms/​LegUp/​CMakeLists.txt works.
 +
 + --- //​[[andrew.canis@utoronto.ca|Andrew Canis]] 2011/05/30 15:05//
 +
 +There'​s an interesting discussion on the CBackend on the LLVM mailing list:
 +http://​lists.cs.uiuc.edu/​pipermail/​llvmdev/​2010-November/​036278.html
 +
 +Chris suggests a full rewrite if anyone wants to work on the CBackend:
 +<​code>​
 +If anyone was really interested in this, I'd strongly suggest a complete
 +rewrite of the C backend: make use the existing target independent code
 +generator code (for legalization etc) and then just put out a weird ".s file"
 +at the end.
 +-Chris
 +</​code>​
 +
 +
 +So I've finished iterative modulo scheduling for a simple example
 +with no recurrences:​
 +<​code>​
 +    int a[N], b[N], c[N];
 +    for (i = 0; i < N; i++) {
 +        a[i] = b[i] + c[i];
 +    }
 +    return a[N-1];
 +</​code>​
 +But it takes 989ns = ~500 cycles. The II=3 so it should take 300 cycles.
 +The prologue and epilogue both require 2 basic blocks.
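
A back-of-the-envelope sketch of the cycle estimate, using the usual software-pipelining approximation that a new iteration starts every II cycles and the last iteration needs one full body latency to drain. N = 100 is an assumption inferred from the 300-cycle figure, and the body latency of 6 cycles is only a placeholder; the function names are hypothetical.

```cpp
#include <cassert>

// Cycles when iterations run strictly one after another.
unsigned sequentialCycles(unsigned n, unsigned bodyCycles) {
    return n * bodyCycles;
}

// Cycles for a modulo-scheduled loop: iterations start every `ii` cycles,
// and the final iteration takes `bodyCycles` to complete.
unsigned pipelinedCycles(unsigned n, unsigned ii, unsigned bodyCycles) {
    if (n == 0) return 0;
    return (n - 1) * ii + bodyCycles;
}
```

With these assumptions, pipelinedCycles(100, 3, 6) gives 303, close to the expected 300, while sequentialCycles(100, 5) gives 500, close to the measured figure. That would be consistent with the loop not actually being pipelined yet.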
 +
 +I need to fix the prologue to branch to the epilogue depending on the loop bound.
 +For instance if N=1 then the kernel should be skipped.
 +
 +Is there an easy way to generate a gantt chart for the reservation table?
 +psTricks seems to have gantt chart generation for latex.
 +Okay found a good sty here: http://​www.martin-kumm.de/​tex_gantt_package.php
 +
 +Added a debug macro for legup. Use the option '​-debug-only=legup'​ to only show
 +debugging from LegUp.
 +
 +I don't understand how this is executing in legup in 1000ns/​2=~500 cycles.
 +This means the loop body only takes 5 cycles when it should take 6.
 +Seems like the getelementptr has been chained.
 +It's weird though, because I see it gets scheduled in separate states
 +at one point. Then gets chained in later. What is going on here?
 +The chaining happens somewhere between SchedulerASAP::​scheduleBasicBlock() and
 +SchedulerPass::​createFSMforBB()
 +
 +I'm noticing that the scheduler needs to be completely revamped. There
 +is tons of copy-pasted code all over the place. For instance, the
 +SchedulerMapping::createFSM() function looks like an exact
 +copy of the ASAP scheduler code. And SchedulerPass
 +has the exact same copied code too. Why does the DAG need its own custom
 +ASAP code? And then this code is repeated again in SimpleASAPScheduler.
 +
 +Okay so I think the bug was this code in SimpleASAPScheduler::​getSoonestStateRegUses():​
 +<​code>​
 +if (depIn->​getAsapDelay() + in->​getDelay() > InstructionNode::​getMaxDelay()) {
 +</​code>​
 +Should be this:
 +<​code>​
 +if (depIn->​getAsapDelay() >= InstructionNode::​getMaxDelay()
 +        || depIn->​getAsapDelay() + in->​getDelay() > InstructionNode::​getMaxDelay()) {
 +</​code>​
 +Basically, you look at the predecessors of the current instruction (depIn). If
 +they have an asapDelay that's equal or greater than the getMaxDelay then you
 +_must_ be in the next state. ​ Otherwise, _only_ if the asapDelay of the
 +predecessor + the delay of the current instruction is _greater_ than the maxDelay
 +would you need to be moved to the next state (there isn't enough room for you
 +to be in the current state with the predecessor).
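
The rule above can be sketched as a standalone predicate. This is a simplified free-function version for illustration only: the real code reads the values from depIn->getAsapDelay(), in->getDelay() and InstructionNode::getMaxDelay(), and the use of double for delays is an assumption.

```cpp
#include <cassert>

// Original check: only the combined delay is compared against the limit.
bool oldMustMoveToNextState(double predAsapDelay, double delay, double maxDelay) {
    return predAsapDelay + delay > maxDelay;
}

// Corrected check: a predecessor that already fills the state forces the
// current instruction into the next state, whatever its own delay is.
bool mustMoveToNextState(double predAsapDelay, double delay, double maxDelay) {
    if (predAsapDelay >= maxDelay) return true;
    return predAsapDelay + delay > maxDelay;
}
```

The two versions differ only when the predecessor sits exactly at the limit and the current instruction has zero delay, for example (10, 0, 10): the old check lets it chain into the same state, while the corrected one moves it to the next state.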
 +
 +
 + --- //​[[andrew.canis@utoronto.ca|Andrew Canis]] 2011/04/25 15:05//
 There'​s an interesting discussion on the CBackend on the LLVM mailing list: There'​s an interesting discussion on the CBackend on the LLVM mailing list:
 http://​lists.cs.uiuc.edu/​pipermail/​llvmdev/​2010-November/​036278.html http://​lists.cs.uiuc.edu/​pipermail/​llvmdev/​2010-November/​036278.html
Line 1297: Line 3699:
  --- //​[[andrew.canis@utoronto.ca|Andrew Canis]] 2010/08/19 8:00//  --- //​[[andrew.canis@utoronto.ca|Andrew Canis]] 2010/08/19 8:00//
  
-To pretty-print cpp:+To pretty print cpp:
 <​code>​ <​code>​
 a2ps -o print.ps MetaScheduler.h MetaScheduler.cpp ConstraintScheduling.* Simple* LegUpSchedulerDAG.* Scheduler* a2ps -o print.ps MetaScheduler.h MetaScheduler.cpp ConstraintScheduling.* Simple* LegUpSchedulerDAG.* Scheduler*
andrew_s_log.txt · Last modified: 2011/10/18 17:07 by acanis