victor_s_pcie_task_list

Differences

This shows you the differences between two versions of the page.

victor_s_pcie_task_list [2012/10/30 18:01]
bryce.long created
victor_s_pcie_task_list [2013/01/03 21:54] (current)
zhangvi1
Line 8: Line 8:
  
 This will compile the original C file and legup_pcie_wrappers.c (which you can modify yourself).
 +
 +====== Mandelbrot speedup beyond multiple accelerators ======
 +
 +Mandelbrot should exhibit a nearly linear speedup as the number of accelerators increases. The number of accelerators is limited by the number of DSPs on the FPGA. We probably cannot keep every DSP busy on the majority of clock cycles, but we can try to get close without running into memory bottlenecks.
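
The DSP limit above can be sketched as a small calculation. This is only a rough model, not a measurement: the `dsp_budget` struct, both function names, and the idea that every accelerator uses the same number of DSPs are assumptions made for illustration. It also ignores memory stalls entirely.

```c
#include <assert.h>

/* Hypothetical resource model: every accelerator needs the same number of DSPs. */
typedef struct {
    int total_dsps;     /* DSP blocks available on the FPGA (assumed value) */
    int dsps_per_accel; /* DSP blocks one accelerator consumes (assumed value) */
} dsp_budget;

/* Largest number of accelerators that fits in the DSP budget. */
static int max_accelerators(dsp_budget b) {
    assert(b.total_dsps >= 0 && b.dsps_per_accel > 0);
    return b.total_dsps / b.dsps_per_accel;
}

/* Ideal (linear) speedup over a single accelerator, capped by the DSP budget.
 * Assumes the work splits evenly and memory never stalls an accelerator. */
static int ideal_speedup(dsp_budget b, int requested_accels) {
    assert(requested_accels >= 0);
    int cap = max_accelerators(b);
    return requested_accels < cap ? requested_accels : cap;
}
```

For example, with a made-up budget of 256 DSPs and 12 DSPs per accelerator, `max_accelerators` returns 21, so asking for 64 accelerators still gives an ideal speedup of at most 21.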
 +
 +==== DSP usage ====
 +
 +We can try turning on resource constraints for DSP usage, possibly combined with multi-pumping.
 +
 +==== Memory bottleneck ====
 +
 +With ~50-200 accelerators, memory access will start to become a bottleneck. Here are some ideas:
 +
 +  * Turn on loop pipelining to decrease the number of accelerators needed, at the cost of using multiple DSPs per accelerator
 +  * Split accelerators between both memory ports
 +  * Try LVT and/or multi-pumping (James should know this well)
 +  * The optimal arrangement would be for every two accelerators to share one dual-ported memory. This won't be flexible enough to extend to other benchmarks, but it is valid for Mandelbrot with a fixed input size
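
The last idea in the list (two accelerators per dual-ported memory) can be sketched as an index mapping. This is an illustrative sketch, not the LegUp implementation: the `accel_slot` struct, the `assign_slot` function, and the choice to give each accelerator a contiguous band of image rows are all assumptions. Contiguous bands are the simplest split, though Mandelbrot rows do not all cost the same, so the load may be uneven.

```c
#include <assert.h>

/* Hypothetical placement of one accelerator: which dual-ported memory it
 * uses, which port of that memory, and which image rows it computes. */
typedef struct {
    int memory;    /* index of the shared dual-ported memory */
    int port;      /* 0 or 1: each accelerator of a pair owns one port */
    int first_row; /* first image row handled (inclusive) */
    int end_row;   /* one past the last image row handled */
} accel_slot;

/* Accelerators 2k and 2k+1 share memory k. Rows are split into contiguous
 * bands; the first (height % num_accels) bands get one extra row. */
static accel_slot assign_slot(int accel, int num_accels, int height) {
    assert(num_accels > 0 && accel >= 0 && accel < num_accels && height >= 0);
    accel_slot s;
    int base = height / num_accels;
    int extra = height % num_accels;
    s.memory = accel / 2;
    s.port = accel % 2;
    s.first_row = accel * base + (accel < extra ? accel : extra);
    s.end_row = s.first_row + base + (accel < extra ? 1 : 0);
    return s;
}
```

Because the input size is fixed, a mapping like this could be computed once at build time, which is why the scheme suits Mandelbrot but does not generalize to other benchmarks.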
victor_s_pcie_task_list.txt · Last modified: 2013/01/03 21:54 by zhangvi1