====== June 3, 2011 ======

June 3 meeting minutes. [BB] = put on the back burner.

**Topics:**

  * Throughput-driven scheduling
      * Optimize latency × period
  * Profiling-driven scheduling
  * Loop pipelining / clever function inlining
  * SDC implementation (see the scheduling sketch after this list)
  * User-constrained scheduling (pragmas)
  * [BB] Partitioning the program into software vs. hardware
  * Binding
      * Sharing functional units is good only for large units; possibly also for smaller units chained together?
      * 4-LUT vs. 6-LUT
      * Collapse the mux into the functional unit
  * [BB] Other HW architectures
  * HLS <-> memory architecture interactions
      * Pipelining
      * Number of memory ports vs. latencies
  * [BB] Auto-parallelizing function calls
  * LLVM compiler passes
  * Clang
      * Investigate pragmas
  * Speculative scheduling
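
The SDC item above refers to scheduling via a system of difference constraints. As a rough illustration only (not the implementation discussed in the meeting), the sketch below handles just data-dependence constraints of the form start[v] >= start[u] + latency[u], whose minimum-latency solution is an ASAP schedule; a full SDC formulation additionally encodes resource, chaining, and pipelining constraints and solves the whole system as an LP.

<code cpp>
// Toy illustration of the dependence constraints behind SDC scheduling.
// Every constraint is a difference over start cycles (s_u - s_v <= c);
// with only data dependences, the minimum-latency solution is the ASAP
// schedule computed below. A real SDC scheduler adds resource, chaining,
// and pipelining constraints and hands the whole system to an LP solver.
#include <cstddef>
#include <cstdio>
#include <vector>

struct Op {
    const char *name;
    int latency;             // cycles the operation occupies
    std::vector<int> preds;  // indices of operations it depends on
};

int main() {
    // Small data-flow graph, already in topological order:
    // two multiplies feed an add, which feeds a store.
    std::vector<Op> ops = {
        {"mul1",  2, {}},
        {"mul2",  2, {}},
        {"add",   1, {0, 1}},
        {"store", 1, {2}},
    };

    std::vector<int> start(ops.size(), 0);
    for (std::size_t v = 0; v < ops.size(); ++v)
        for (int u : ops[v].preds)
            // dependence constraint: start[v] >= start[u] + latency[u]
            if (start[u] + ops[u].latency > start[v])
                start[v] = start[u] + ops[u].latency;

    for (std::size_t v = 0; v < ops.size(); ++v)
        std::printf("%-5s starts in cycle %d\n", ops[v].name, start[v]);
    return 0;
}
</code>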

**Outcomes:**

Andrew:
  * Loop pipelining
  * Loop unrolling
  * Port Clang
  * HLS <-> memory architecture interactions
Jason:
  * SDC implementation
Stefan:
  * Binding utility study
Kevin:
  * Memory profiling

Leave LLVM passes until later

Post the GUI/debugging framework as an ECE design project (two-person team)


====== June 1, 2011 ======

__Research areas:__
  - µP/Accelerator Interface
  - Parallel µP/Accelerators

**1. µP/Accelerator Interface**

DE4 Port
  * Contact Steve to set up a meeting

Ways of talking to RAM (multiport and multipump)

Multiport:
  * Currently, if two accelerators access the single-port cache they block each other, so 2 ports should give an improvement. This was not observed because off-chip memory is slow, hence the switch to the DE4, which uses DDR2.
  * The extreme cases are 1 port (Avalon arbitration) vs. one port per accelerator. In between, we could have e.g. 2 accelerators per port, saving area since # RAMs = # input ports × # output ports (worked example below).
  * We could give priority to accelerators that are computation-bound; this requires memory access profiling.
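
As an illustration of that rule of thumb (the numbers are hypothetical, not measurements): exposing 4 input and 4 output ports, one pair per accelerator, would need 4 × 4 = 16 RAMs, while sharing each port between two accelerators exposes only 2 input and 2 output ports and needs 2 × 2 = 4.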

Multipump / Multi-Clock Domain:
  * Less area, but the system must be clocked slower than the memory. Only practical to clock the system 2x (maybe 4x) slower.

Parallelization Schemes
  * James has implemented polling (see the sketch below)
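
A rough sketch of what a polling-style µP/accelerator handshake looks like from the software side. The base address and register offsets below are made up for illustration; they are not the real memory map.

<code cpp>
// Minimal polling handshake between the processor and one accelerator.
// ACCEL_BASE and the register offsets are hypothetical placeholders.
#include <cstdint>

volatile uint32_t *const ACCEL_BASE =
    reinterpret_cast<volatile uint32_t *>(0x80000000u);
enum { REG_ARG = 0, REG_START = 1, REG_DONE = 2, REG_RESULT = 3 };

uint32_t run_accelerator(uint32_t arg) {
    ACCEL_BASE[REG_ARG]   = arg;   // pass the argument
    ACCEL_BASE[REG_START] = 1;     // kick off the accelerator
    while (ACCEL_BASE[REG_DONE] == 0) {
        // busy-wait (poll); the processor could instead do useful
        // work here and check the done flag later
    }
    return ACCEL_BASE[REG_RESULT]; // read back the result
}
</code>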

Partition Program
  * Allow the programmer to partition the program data

Partition Data
  * The DE4 has multiple banks, so each accelerator can have its data in a different bank

Pre-Fetching
  * One FSM can fetch data while the other performs computation (see the sketch below)
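
A software analogue of this two-FSM idea is double buffering: while one chunk is being fetched, the previously fetched chunk is processed. The sketch below is only illustrative; fetch_chunk() and process_chunk() are placeholder functions, not part of any existing code.

<code cpp>
// Double-buffered prefetching sketch: the fetch of chunk i+1 overlaps with
// computation on chunk i. fetch_chunk() and process_chunk() are placeholders.
#include <algorithm>
#include <cstddef>
#include <future>
#include <vector>

static const std::size_t CHUNK = 256;

// Placeholder "fetch": copy the next chunk out of a large source array.
std::vector<int> fetch_chunk(const std::vector<int> &src, std::size_t i) {
    std::size_t begin = i * CHUNK;
    std::size_t end = std::min(begin + CHUNK, src.size());
    return std::vector<int>(src.begin() + begin, src.begin() + end);
}

// Placeholder "compute": sum the chunk.
long process_chunk(const std::vector<int> &chunk) {
    long sum = 0;
    for (int v : chunk) sum += v;
    return sum;
}

long prefetched_sum(const std::vector<int> &src) {
    std::size_t chunks = (src.size() + CHUNK - 1) / CHUNK;
    if (chunks == 0) return 0;
    long total = 0;
    // Start fetching chunk 0 before entering the loop.
    std::future<std::vector<int> > pending =
        std::async(std::launch::async, fetch_chunk, std::cref(src),
                   std::size_t(0));
    for (std::size_t i = 0; i < chunks; ++i) {
        std::vector<int> current = pending.get();   // wait for chunk i
        if (i + 1 < chunks)                         // overlap: start fetching i+1
            pending = std::async(std::launch::async, fetch_chunk,
                                 std::cref(src), i + 1);
        total += process_chunk(current);            // compute on chunk i
    }
    return total;
}

int main() {
    std::vector<int> data(10000, 1);
    return prefetched_sum(data) == 10000 ? 0 : 1;
}
</code>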

Multiple Caches / customization of cache parameters
  * E.g. Intel has 2 L1 caches instead of a large multi-port L1 cache
  * When are dual L1 caches beneficial over a single large L1 cache?

Cache Size
  * Easy to change on the DE4

Memory Access Profiling
  * LLVM pass (see the sketch below)
  * Analyze parallelism
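
As a starting point, a memory-access profiling pass could begin as an analysis-only pass that walks each function and counts its load and store instructions; runtime address traces would need instrumentation on top of this. The sketch below is just such a skeleton, not the pass discussed in the meeting, and header paths and the (legacy) pass registration vary across LLVM versions.

<code cpp>
// Analysis-only LLVM FunctionPass that counts static loads and stores
// per function. Header paths and registration depend on the LLVM version.
#include "llvm/IR/Function.h"
#include "llvm/IR/Instructions.h"
#include "llvm/Pass.h"
#include "llvm/Support/raw_ostream.h"

using namespace llvm;

namespace {
struct MemAccessCount : public FunctionPass {
  static char ID;
  MemAccessCount() : FunctionPass(ID) {}

  virtual bool runOnFunction(Function &F) {
    unsigned loads = 0, stores = 0;
    for (BasicBlock &BB : F)
      for (Instruction &I : BB) {
        if (isa<LoadInst>(&I))  ++loads;
        if (isa<StoreInst>(&I)) ++stores;
      }
    errs() << F.getName() << ": " << loads << " loads, "
           << stores << " stores\n";
    return false;  // analysis only; the IR is not modified
  }
};
} // end anonymous namespace

char MemAccessCount::ID = 0;
static RegisterPass<MemAccessCount>
    X("mem-access-count", "Count static loads/stores per function",
      false /*CFG only*/, true /*analysis pass*/);
</code>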

Priorities
  * DE4 port
  * Combining multi-port and multi-pump with the DE4. If this is not successful, examine multiple memories / cache size.

Projects for Stefan and Kevin
  * Cache simulator for memory access profiling: simulate the CHStone designs in ModelSim and save the memory accesses (the address being accessed and where it is placed in the cache) to a text file, then build a cache simulator (see the sketch below)
  * Benchmarks: analyze the rest of the CHStone benchmarks and also e.g. tiled matrix multiply
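
A minimal sketch of what the trace-driven cache simulator could look like, assuming the ModelSim run dumps one hexadecimal byte address per line to a text file. The trace format, cache geometry, and usage are illustrative assumptions, not the agreed design.

<code cpp>
// Trace-driven direct-mapped cache simulator sketch.
// Usage (hypothetical): cachesim trace.txt, one hex address per line.
#include <cstdint>
#include <cstdio>
#include <fstream>
#include <iostream>
#include <string>
#include <vector>

struct Line { bool valid = false; uint64_t tag = 0; };

int main(int argc, char **argv) {
    if (argc != 2) { std::cerr << "usage: cachesim <trace file>\n"; return 1; }

    // Example geometry: 2 KB direct-mapped cache with 16-byte lines.
    const uint64_t kLineBytes = 16, kNumLines = 128;
    std::vector<Line> cache(kNumLines);

    std::ifstream trace(argv[1]);
    std::string token;
    uint64_t hits = 0, misses = 0;
    while (trace >> token) {
        uint64_t addr  = std::stoull(token, 0, 16);
        uint64_t block = addr / kLineBytes;
        uint64_t index = block % kNumLines;   // which cache line it maps to
        uint64_t tag   = block / kNumLines;   // identifies the block in that line
        if (cache[index].valid && cache[index].tag == tag) {
            ++hits;
        } else {
            ++misses;                          // miss: fill the line
            cache[index].valid = true;
            cache[index].tag = tag;
        }
    }
    std::printf("hits: %llu  misses: %llu  miss rate: %.2f%%\n",
                (unsigned long long)hits, (unsigned long long)misses,
                hits + misses ? 100.0 * misses / (hits + misses) : 0.0);
    return 0;
}
</code>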

====== August 10, 2010 ======