pattern_sharing_optimizations

These are the current pattern sharing conditions. Many of them have not been varied and further experimentation may produce better results.

- Sharing is done for adders (and subtractors), bitwise operations (AND, OR, XOR) and barrel shifts
- Instructions are not shared if either of their predecessors is a constant
- All instructions in a pattern (except for the root) cannot have fanouts outside the pattern
- Root instructions of computational patterns do not need to be connected to a register
- Sharing is disabled for instructions with bit-widths or predecessor bit-widths of size 1
- Two instructions are shared only if the difference in their bit widths is < = 15 or if the difference in bit width of either of their corresponding predecessors < = 30 (round numbers chosen)
- Two instructions whose live intervals overlap are never shared
- Two instructions will only be shared if both or neither have output registers

All of these can be changed by modifying Binding.cpp.

These settings were chosen because they are general and apply to all circuits, but a given circuit may show a further reduction in area with different settings.

The sections below explain some of these settings and how they can be modified. In particular, it may be worthwhile to investigate different bit-width thresholds (points 5 and 6) and sharing instructions even if their live intervals do not overlap (only if they are in different states).

While in Stratix IV sharing is beneficial for adders (and subtractors), Cyclone II single operator data showed that sharing adders may not improve area. This can be modified using the following two macros at the top of Binding.cpp:

#define SHARE_ADD 1 // nonzero shares, 0 disables sharing #define SHARE_SUB 1

The MinimizeBitwidth pass calculates minimum bit widths for each instruction. Two instructions are shared if the absolute value of the difference of their bit widths is less than a threshold defined using the BIT_DIFF_THRESHOLD macro at the top of Binding.cpp:

#define BIT_DIFF_THRESHOLD 10

Similarly this threshold can be set for predecessor bit widths and the minimum required width to share.

Currently, instructions with constant operands are not being shared. This can be disabled by removing the following lines from the function checkValidityForSharing() :

ConstantInt * ci = dyn_cast<ConstantInt>(I->getOperand(0)); if(ci) return false; ci = dyn_cast<ConstantInt>(I->getOperand(1)); if(ci) return false;

Certain optimizations can be performed on instructions which are not accounted for in the MinimizeBitwidth pass. For example, a 32-bit adder with only one fanout which feeds into a left-shift by 20 requires less logic than a 32-bit adder with all 32-bit output bits needed. But the bit width is not reduced by a left shift (output is still 32-bits, only the 20 least significant bits are 0) so the pass does not detect this.

Other examples than the shift left include AND, OR and XOR by a constant value.

Another interesting case occurs in blowfish, where there are many XOR instructions which have the following five successors:

- Three shifts with constand inputs
- An AND by a constant input, feeding an OR with constant input
- An XOR with non-constant input

With Stratix IV SDC scheduling and 7ns period constraint, the first 4 successors are in the same state as the XOR, while the successor XOR is in a different state. Therefore, the bit width of the predecessor XOR cannot be reduced because at least one of its successors requires all 32 bits. Yet it is observed that sharing these predecessor XORs greatly increases the number of registers in the synthesized circuit, perhaps because Quartus is able to perform optimizations on these instructions which are not being detected by the bitwidth minimization pass.

Such examples occur in other circuits as well and are difficult to detect.

One possible solution (implemented in Binding.cpp and commented out) is to ignore instructions whose successors or predecessors are bitwise operations or left shifts with constant operands, but this is very restrictive and reduces overall sharing.

Currently sharing is performed within functions only, however sharing across functions would increase the solution space for finding graphs and may further reduce overall area.

See http://en.wikipedia.org/wiki/Edmonds%27s_matching_algorithm.

Currently, a greedy algorithm and cost function are used to pair graphs of operations together (see Binding.cpp). However it is possible to change this problem into the problem of pairing nodes together in a weighted graph to minimize total cost, which is solved optimally using the Jack Edmond's Algorithm.

pattern_sharing_optimizations.txt · Last modified: 2011/09/25 21:49 by stefan