Introduction
LegUp supports multi-cycling, which replaces pipelined computations with multi-cycled computations. Additionally, software profiling (llvm-prof) can be run to obtain execution information for each basic block and add more multi-cycling for infrequently executed basic blocks.
LegUp 4.0 uses a newer version of LLVM that does not support llvm-prof, the profiling tool used for this work. To reproduce the multi-cycle results, therefore, follow these steps:
clone legup.git
git checkout aa959fd9d1099b7f7b4545726511ff63ce26513d
Turning on Multi-Cycling
To turn on multi-cycling, there are a few options which need to be set:
In Makefile.config, enable:
MULTICYCLE_CONSTRAINTS = 1
In legup.tcl, set the following:
set_parameter MULTI_CYCLE_REMOVE_REG 1
set_parameter MULTI_CYCLE_DUPLICATE_LOAD_REG 1
set_parameter MULTI_CYCLE_DISABLE_REG_MERGING 1
set_parameter MULTI_CYCLE_REMOVE_CMP_REG 1
Description:
MULTI_CYCLE_REMOVE_REG will de-pipeline data paths and instead write the multi-cycle constraints to a file. This is done by make. The constraints are then added to the project's .sdc file by make p, if MULTICYCLE_CONSTRAINTS was enabled in Makefile.config. This is the only one of the four parameters that must be set.
MULTI_CYCLE_DUPLICATE_LOAD_REG will force each load from memory (local and global) to have a unique load register, so that it can hold the loaded value and feed multi-cycle paths. See Multi-Cycle Enhancements for more information. While this is not necessary for multi-cycling, not setting it will reduce the opportunities for multi-cycling, and I have not recently tested without it. Note that it may give an error if RAM latency is set to < 2 (e.g. if it's set to 1 for local RAMs), so make sure to set latencies to 2 instead of 1 (this will improve fmax but make latency worse).
MULTI_CYCLE_DISABLE_REG_MERGING was added because I once noticed that a register with a multi-cycle .sdc constraint was merged with another register by synthesis, and the constraint was lost. However, I recently ran an experiment with this turned off and it had no negative side effects (one or two circuits saved ~50 registers, but results were mostly the same).
MULTI_CYCLE_REMOVE_CMP_REG was added to remove registers from icmp instructions during de-pipelining. The de-pipelining is usually handled without any “hacks” for every instruction but I couldn't figure out how to do it for compare instructions, so this is kind of a hack. I then also did the same thing for function arguments to remove drivers for their registers, so MULTI_CYCLE_REMOVE_CMP_REG now controls both these cases.
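As a quick sanity check of the settings described above, the following Python sketch parses the text of a legup.tcl file and reports which multi-cycling parameters are not enabled. It is not part of LegUp: the helper names parse_parameters and check_multicycle are hypothetical, and the only thing assumed about the file is the "set_parameter NAME VALUE" line format shown above.

```python
import re

# Parameters from the list above. Per the description, only
# MULTI_CYCLE_REMOVE_REG is strictly required; the rest are recommended.
REQUIRED = ["MULTI_CYCLE_REMOVE_REG"]
RECOMMENDED = [
    "MULTI_CYCLE_DUPLICATE_LOAD_REG",
    "MULTI_CYCLE_DISABLE_REG_MERGING",
    "MULTI_CYCLE_REMOVE_CMP_REG",
]

# Matches lines of the form: set_parameter NAME VALUE
# Commented-out lines (starting with '#') do not match.
_PARAM_RE = re.compile(r"^\s*set_parameter\s+(\w+)\s+(\S+)")


def parse_parameters(tcl_text):
    """Return {name: value} for every set_parameter line (later lines win)."""
    params = {}
    for line in tcl_text.splitlines():
        match = _PARAM_RE.match(line)
        if match:
            params[match.group(1)] = match.group(2)
    return params


def check_multicycle(tcl_text):
    """Return (missing_required, missing_recommended) parameter names."""
    params = parse_parameters(tcl_text)
    missing_required = [p for p in REQUIRED if params.get(p) != "1"]
    missing_recommended = [p for p in RECOMMENDED if params.get(p) != "1"]
    return missing_required, missing_recommended
```

An empty first list means the one mandatory parameter is set; anything in the second list only reduces the opportunities for multi-cycling.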
Extra options (do not turn on by default)
set_parameter MULTI_CYCLE_DEBUG 1
set_parameter MULTI_CYCLE_ADD_THROUGH_CONSTRAINTS 1
Description:
Turning on Software Profiling
Software Profiling-Driven Multi-Cycling is enabled by setting the following:
In Makefile.config, enable:
LLVM_PROFILE = 1
This will run the profiler as part of make.
In legup.tcl, set the following:
set_parameter LLVM_PROFILE 1
set_parameter LLVM_PROFILE_MAX_BB_FREQ_TO_ALTER 1
set_parameter LLVM_PROFILE_EXTRA_CYCLES 1
Description:
LLVM_PROFILE will take the profiling information generated by make and fill a data structure in Allocation. It will then enable a re-scheduling phase to delay operators which terminate multi-cycle paths in all infrequently executed basic blocks.
LLVM_PROFILE_MAX_BB_FREQ_TO_ALTER sets the cutoff for what we consider to be an infrequently executed basic block. llvm-prof assigns each basic block a percentage of total executions in the overall program, and any block below this maximum is considered infrequent and will have its schedule modified.
LLVM_PROFILE_EXTRA_CYCLES determines how many extra cycles will be added to each path in an infrequent basic block.
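To make the interaction of the cutoff and the extra cycles concrete, here is a small Python sketch. It is not LegUp code: the function apply_profile, the block names, the percentages, and the path lengths are all made up for illustration. It assumes the cutoff is expressed in percent of total executions and that only blocks strictly below it are altered.

```python
def apply_profile(blocks, max_bb_freq, extra_cycles):
    """Model the effect of the two profiling parameters.

    blocks maps a basic block name to a tuple of
    (percent_of_total_executions, [cycles of each multi-cycle path]).
    Returns a dict mapping each block name to its adjusted path lengths.
    """
    adjusted = {}
    for name, (percent, paths) in blocks.items():
        if percent < max_bb_freq:
            # Infrequent block: every path gets the extra cycles.
            adjusted[name] = [cycles + extra_cycles for cycles in paths]
        else:
            # Frequent block: schedule is left unchanged.
            adjusted[name] = list(paths)
    return adjusted


# Hypothetical profile: a hot loop body and a rarely taken error exit.
blocks = {
    "loop_body": (97.5, [2, 3]),
    "error_exit": (0.5, [2, 4]),
}

# Same values as the example settings above (cutoff 1, one extra cycle).
adjusted = apply_profile(blocks, max_bb_freq=1, extra_cycles=1)
```

With these made-up numbers only error_exit falls below the cutoff, so its paths are lengthened while loop_body keeps its original schedule.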
Enhancements