multi-cycle_enhancements

Differences

multi-cycle_enhancements [2014/07/18 22:21]
stefan
multi-cycle_enhancements [2014/07/18 22:30] (current)
stefan
Line 34: Line 34:
  
 The cycle counts from the EUC paper:
 +
 http://www.legup.org:9100/builders/linux_x86_stratix4/builds/968/steps/shell_11/logs/stdio
  
 The cycle counts with all timing constraints turned off, and with multi-cycling + SW profiling enabled (SW profiling raises the cycle count, but cycles are still way down. Note that most circuits get slower because I just turned off all constraints, but some sped up):
 +
 http://www.legup.org:9100/builders/linux_x86_stratix4/builds/1018/steps/shell_11/logs/stdio
  
Line 42: Line 44:
 ====== Load Registers ======
  
-See also
+From [[Profiling-Driven Multi-Cycling]]:
 + 
 +//''MULTI_CYCLE_DUPLICATE_LOAD_REG'' will force each load from memory (local and global) to have a unique load register, so that it can hold the loaded value and feed multi-cycle paths. While this is not necessary for multi-cycling, not setting this will reduce the opportunities for multi-cycling and I have not recently tested without it.//
 + 
 +Right now, without this option we get fewer multi-cycle paths (and multi-cycling might not even work). But we don't really need a dedicated load register for every load, only for the loads that are scheduled close to other loads. If no other load is scheduled right after a load, e.g. the next load is 3 cycles away, then that load can still feed multi-cycle paths of latency up to 3, or maybe 4 (since loads take 2 cycles to complete). This would save a lot of registers. It's also worth simply trying with ''MULTI_CYCLE_DUPLICATE_LOAD_REG'' turned off, since that might fix the problem without making things worse.
 + 
 +Also, as long as ''MULTI_CYCLE_DUPLICATE_LOAD_REG'' is on, local RAMs must have latency 2. That boosts fmax over latency 1 but of course hurts the cycle count.
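
The selection rule described above (only give a load its own register when another load follows too closely) can be sketched roughly as follows. This is an illustration, not LegUp code: the ''Load'' record, its fields and the function name are hypothetical, it assumes all loads share one memory port so their issue states are distinct, and it uses the conservative bound (gap to the next load) rather than gap + 1.

```python
import math
from dataclasses import dataclass


@dataclass
class Load:
    name: str
    state: int             # hypothetical: FSM state in which the load is issued
    max_path_latency: int  # longest multi-cycle path (in cycles) fed by this load


def loads_needing_own_register(loads):
    """Return the names of loads whose value would be overwritten too early.

    Assumption: the shared load register keeps a value until the next load
    on the same port arrives, so a load can feed paths of latency up to the
    gap (in states) to the next load.
    """
    ordered = sorted(loads, key=lambda ld: ld.state)
    needs = []
    for i, ld in enumerate(ordered):
        if i + 1 < len(ordered):
            gap = ordered[i + 1].state - ld.state
        else:
            gap = math.inf  # no later load: the value is never overwritten
        if ld.max_path_latency > gap:
            needs.append(ld.name)
    return needs


example = [
    Load("a", state=0, max_path_latency=3),  # next load is 1 state away
    Load("b", state=1, max_path_latency=1),  # next load is 3 states away
    Load("c", state=4, max_path_latency=2),  # last load in the block
]
print(loads_needing_own_register(example))
```

In this made-up example only ''a'' would get a dedicated register, which is the register saving the paragraph above is after.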
 + 
 +====== SW Profiling Enhancements ====== 
 + 
 +The SW profiling only pushes multi-cycle destinations in a BB (stores, instructions used across BBs, function calls), but not PHIs or loads. But loads can also be destinations of multi-cycle (MC) paths on their address port, so maybe we should push those too. However, any "downstream" instruction from that load then needs to be pushed twice as much.
  
 +For example, if we load a value, do some computation, and then store the result, then pushing the load (to dilate the schedule of whatever path terminates at the load) means we would need to push the store by twice as much in order to also dilate the path from the load to the store. One good idea Jason had for this: detect when we push a load, and then push everything scheduled after the load by 1 right away. I.e. do an incremental re-scheduling in which pushing the load also pushes everything after it, and then, when it's time for those later instructions to be pushed, push them by an additional cycle, etc.
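
The incremental re-scheduling idea above might look roughly like this. It is a toy model, not the actual scheduler: the schedule is a plain dict from hypothetical instruction names to states, every push is by exactly one cycle, and "scheduled after the load" is taken to mean "in a strictly later state" rather than a true data dependence.

```python
def push_with_loads(schedule, destinations, loads):
    """Push each destination by one cycle, shifting everything after a pushed load.

    schedule:     dict mapping instruction name -> scheduled state
    destinations: names of multi-cycle destinations to push
    loads:        set of names that are load instructions
    Returns a new schedule; the input is left unchanged.
    """
    new = dict(schedule)
    # Visit destinations in original schedule order so that a pushed load
    # shifts later instructions before their own push is applied.
    for name in sorted(destinations, key=lambda n: schedule[n]):
        old_state = new[name]
        new[name] = old_state + 1
        if name in loads:
            for other, state in new.items():
                if other != name and state > old_state:
                    new[other] = state + 1  # pushed "right away" with the load
    return new


before = {"ld": 1, "add": 3, "st": 4}
after = push_with_loads(before, destinations=["ld", "st"], loads={"ld"})
print(after)
```

Here the store ends up two states later than it started (once because the load moved, once for its own push), which is the "twice as much" behaviour described in the example.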
  
-====== Pushing Loads to Later States ======
+We might also benefit from "pushing" PHIs. If a PHI is in an infrequent BB, then pushing it by 1 (or making its BB "start" 1 cycle later) would dilate the schedule of every incoming multi-cycle path.
  
 ====== Other ======
multi-cycle_enhancements.txt · Last modified: 2014/07/18 22:30 by stefan