These are the current pattern sharing conditions. Many of them have not been varied and further experimentation may produce better results.
All of these can be changed by modifying Binding.cpp.
These settings were chosen because they are general and apply to all circuits, but a given circuit may show a further reduction in area with different settings.
The sections below explain some of these settings and how they can be modified. In particular, it may be worthwhile to investigate different bit-width thresholds (points 5 and 6) and sharing instructions even if their live intervals do not overlap (only if they are in different states).
While in Stratix IV sharing is beneficial for adders (and subtractors), Cyclone II single operator data showed that sharing adders may not improve area. This can be modified using the following two macros at the top of Binding.cpp:
#define SHARE_ADD 1 // nonzero shares, 0 disables sharing #define SHARE_SUB 1
The MinimizeBitwidth pass calculates minimum bit widths for each instruction. Two instructions are shared if the absolute value of the difference of their bit widths is less than a threshold defined using the BIT_DIFF_THRESHOLD macro at the top of Binding.cpp:
#define BIT_DIFF_THRESHOLD 10
Similarly this threshold can be set for predecessor bit widths and the minimum required width to share.
Currently, instructions with constant operands are not being shared. This can be disabled by removing the following lines from the function checkValidityForSharing() :
ConstantInt * ci = dyn_cast<ConstantInt>(I->getOperand(0)); if(ci) return false; ci = dyn_cast<ConstantInt>(I->getOperand(1)); if(ci) return false;
Certain optimizations can be performed on instructions which are not accounted for in the MinimizeBitwidth pass. For example, a 32-bit adder with only one fanout which feeds into a left-shift by 20 requires less logic than a 32-bit adder with all 32-bit output bits needed. But the bit width is not reduced by a left shift (output is still 32-bits, only the 20 least significant bits are 0) so the pass does not detect this.
Other examples than the shift left include AND, OR and XOR by a constant value.
Another interesting case occurs in blowfish, where there are many XOR instructions which have the following five successors:
With Stratix IV SDC scheduling and 7ns period constraint, the first 4 successors are in the same state as the XOR, while the successor XOR is in a different state. Therefore, the bit width of the predecessor XOR cannot be reduced because at least one of its successors requires all 32 bits. Yet it is observed that sharing these predecessor XORs greatly increases the number of registers in the synthesized circuit, perhaps because Quartus is able to perform optimizations on these instructions which are not being detected by the bitwidth minimization pass.
Such examples occur in other circuits as well and are difficult to detect.
One possible solution (implemented in Binding.cpp and commented out) is to ignore instructions whose successors or predecessors are bitwise operations or left shifts with constant operands, but this is very restrictive and reduces overall sharing.
Currently sharing is performed within functions only, however sharing across functions would increase the solution space for finding graphs and may further reduce overall area.
See http://en.wikipedia.org/wiki/Edmonds%27s_matching_algorithm.
Currently, a greedy algorithm and cost function are used to pair graphs of operations together (see Binding.cpp). However it is possible to change this problem into the problem of pairing nodes together in a weighted graph to minimize total cost, which is solved optimally using the Jack Edmond's Algorithm.