
LegUp HLS implementation of SUBLEQ and SUBLEQREV computers

janders; 2 November 2015; 24 November 2015; 24 December 2015

Experimental settings:

  • Quartus 15
  • LegUp head branch pulled November 2015 (4.0+ with improvements to bitwidth minimization made by Julie)
  • No false-path constraints in the .sdc file
  • Cyclone V (28 nm FPGA); logic elements are ALMs (adaptive logic modules, built around fracturable 6-input LUTs)
  • Loop pipelining ON for the SUBLEQ/SUBLEQREV processors
  • Area reports below are for the SUBLEQ/SUBLEQREV machines ONLY. We report ALMs NEEDED (the Altera tools, abbreviated ALTR below, also report ALMs used in the final placement, which is a larger number).
  • Performance reports below are for the entire system (including memory)
  • Power measurements reflect the SUBLEQ/SUBLEQREV machines ONLY (no memory); 15% toggle rate for all signals; 50 MHz clock rate
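The per-cycle energy figures below follow from the measured power and the fixed 50 MHz clock used for the power analysis: E_cycle = P / f. A minimal sketch of that unit conversion (the function name is illustrative only, not part of the Quartus or LegUp flow):

```python
def energy_pj_per_cycle(power_mw: float, clock_mhz: float = 50.0) -> float:
    """Energy per clock cycle in picojoules, given power in mW and clock in MHz.

    E = P / f. With P in mW and f in MHz, (1e-3 W) / (1e6 Hz) = 1e-9 J = 1000 pJ,
    so the result in pJ is 1000 * power_mw / clock_mhz.
    """
    return 1000.0 * power_mw / clock_mhz


print(round(energy_pj_per_cycle(1.433), 2))  # SUBLEQ, 1-cycle memory: 28.66
```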

Scenarios considered:

  • Single-cycle memory access
  • Dual-cycle memory access

Key findings:

  • II (initiation interval) = 3 with single-cycle memory access (a new subleq(rev) instruction can start every 3 cycles)
  • II = 5 with dual-cycle memory access (a new subleq(rev) instruction can start every 5 cycles)
  • The above hold for both SUBLEQ and SUBLEQREV
  • NO latency difference (in cycles) between the SUBLEQ and SUBLEQREV computers
  • Minor code changes were needed in the SUBLEQREV implementation to make loop pipelining work
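To make "one subleq instruction every II cycles" concrete, here is a minimal Python reference model of standard SUBLEQ semantics. It is an illustrative sketch, not the LegUp C source used for these experiments (which is not shown on this page), and it does not model SUBLEQREV. The halt convention (stop on a negative branch target or when the next instruction would run past memory) and the example program are assumptions. The last line turns the executed-instruction count into an approximate steady-state cycle count using the single-cycle-memory II, ignoring pipeline fill and drain.

```python
def run_subleq(mem, pc=0, max_steps=1_000_000):
    """Execute a SUBLEQ program in place and return the number of instructions run.

    Each instruction is three words (a, b, c):
        mem[b] -= mem[a]; branch to c if the result is <= 0, else fall through to pc + 3.
    Halt convention (assumed for this sketch): stop when pc is negative or the
    next three-word instruction would run past the end of memory.
    """
    executed = 0
    while pc >= 0 and pc + 2 < len(mem):
        if executed >= max_steps:
            raise RuntimeError("step limit reached; the program may not halt")
        a, b, c = mem[pc], mem[pc + 1], mem[pc + 2]
        if not (0 <= a < len(mem) and 0 <= b < len(mem)):
            raise IndexError(f"operand address out of range at pc={pc}")
        mem[b] -= mem[a]
        pc = c if mem[b] <= 0 else pc + 3
        executed += 1
    return executed


# Hypothetical program: add X (address 9) into Y (address 10) via scratch Z (address 11).
program = [
    9, 11, 3,    # Z -= X  -> Z = -X, continue at 3
    11, 10, 6,   # Y -= Z  -> Y = Y + X, continue at 6
    11, 11, -1,  # Z -= Z  -> Z = 0, branch to -1 (halt)
    5, 7, 0,     # data: X, Y, Z
]
n = run_subleq(program)
ii = 3  # initiation interval reported above for single-cycle memory
print(n, program[10], n * ii)  # 3 12 9
```

Even a simple addition takes three SUBLEQ instructions in this construction, which is worth keeping in mind when reading the per-instruction figures below.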

SUBLEQ

1 cycle memory

  • 80.71 MHz (120.61 MHz if paths to memory are ignored)
  • 156.8 ALMs needed (from ALTR report)
  • 1.433 mW
  • 28.66 pJ / cycle = ~86 pJ / instruction

2 cycle memory

  • 181.06 MHz (223.76 MHz if paths to memory are ignored)
  • 148.8 ALMs needed (from ALTR report)
  • 1.352 mW
  • 27.04 pJ / cycle = ~135 pJ / instruction

SUBLEQREV

1 cycle memory

  • 73.86 MHz (110.61 MHz if paths to memory are ignored)
  • 175.9 ALMs needed (from ALTR report)
  • 1.529 mW
  • 30.58 pJ / cycle = ~92 pJ / instruction

2 cycle memory

  • 172.06 MHz (175.84 MHz if paths to memory are ignored)
  • 157.5 ALMs needed (from ALTR report)
  • 1.331 mW
  • 26.62 pJ / cycle = ~133 pJ / instruction
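Two derived metrics help compare the four configurations: peak instruction rate, f_max / II (a steady-state upper bound that assumes the pipeline never stalls), and energy per instruction, II × P / f at the 50 MHz power-analysis clock. The sketch below recomputes both from the figures reported above; the dictionary and function names are illustrative. Note that f_max here is the whole-system figure (including memory), while power excludes memory.

```python
# Figures copied from the results above: (fmax_mhz, ii, power_mw)
RESULTS = {
    ("SUBLEQ", "1-cycle mem"): (80.71, 3, 1.433),
    ("SUBLEQ", "2-cycle mem"): (181.06, 5, 1.352),
    ("SUBLEQREV", "1-cycle mem"): (73.86, 3, 1.529),
    ("SUBLEQREV", "2-cycle mem"): (172.06, 5, 1.331),
}
POWER_CLOCK_MHZ = 50.0  # clock frequency used for the power analysis


def derived_metrics(fmax_mhz, ii, power_mw, power_clock_mhz=POWER_CLOCK_MHZ):
    """Return (peak million instructions/s, pJ per cycle, pJ per instruction)."""
    mips = fmax_mhz / ii                            # one instruction starts every II cycles
    pj_cycle = 1000.0 * power_mw / power_clock_mhz  # mW / MHz = nJ; x1000 -> pJ
    return mips, pj_cycle, pj_cycle * ii


for (machine, mem), vals in RESULTS.items():
    mips, pj_cycle, pj_instr = derived_metrics(*vals)
    print(f"{machine:9s} {mem}: {mips:6.2f} M instr/s, "
          f"{pj_cycle:5.2f} pJ/cycle, {pj_instr:6.1f} pJ/instr")
```

By this measure, dual-cycle memory raises the peak instruction rate for both machines (about 36.2 vs 26.9 M instr/s for SUBLEQ) despite the larger II, at the cost of roughly 1.5x the energy per instruction.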

Implementation results for the Tiger MIPS

Experimental settings:

  • Same Quartus version and device settings as above.
  • Tiger MIPS is implemented WITHOUT the divider units (janders: should I add them back? We had removed them for the EUC paper because, at that time, we didn't support the MIPS division instruction.)
  • Area/power figures cover only the Tiger MIPS core (not the system or cache)
  • Performance measurements (MHz) are for JUST the MIPS core (no memory)
  • Power measurements assume a 15% toggle rate and a 50 MHz clock frequency

Tiger MIPS:

  • Area: 1737 ALMs needed, 6 DSP blocks (see Wong, Rose, and Betz for the area ratio between DSP tiles and LAB tiles in Altera devices)
  • 112.03 MHz
  • 23.99 mW
  • ~479.8 pJ / cycle. We would expect IPC close to 1 for the MIPS architecture, which would put the energy at roughly 480 pJ / instruction.
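For a rough comparison with SUBLEQ (single-cycle memory), the ratios below use only numbers reported on this page. Caveats: the ALM ratio ignores Tiger's 6 DSP blocks; Tiger's IPC of 1 is the expectation stated above, not a measurement; and a SUBLEQ program generally needs several SUBLEQ instructions to do the work of one MIPS instruction, so energy per instruction is not energy per unit of work. Names in the sketch are illustrative.

```python
# Figures reported on this page; Tiger IPC = 1 is the stated expectation, not a measurement.
TIGER = {"alms": 1737, "power_mw": 23.99, "cycles_per_instr": 1.0}
SUBLEQ_1CYC = {"alms": 156.8, "power_mw": 1.433, "cycles_per_instr": 3}  # II = 3
POWER_CLOCK_MHZ = 50.0  # both power figures were taken at 50 MHz


def pj_per_instr(power_mw, cycles_per_instr, clock_mhz=POWER_CLOCK_MHZ):
    """Energy per instruction in pJ: (power / clock) * cycles per instruction."""
    return 1000.0 * power_mw / clock_mhz * cycles_per_instr


area_ratio = TIGER["alms"] / SUBLEQ_1CYC["alms"]  # ALMs only; DSP blocks excluded
energy_ratio = (pj_per_instr(TIGER["power_mw"], TIGER["cycles_per_instr"])
                / pj_per_instr(SUBLEQ_1CYC["power_mw"], SUBLEQ_1CYC["cycles_per_instr"]))

print(f"Tiger / SUBLEQ area (ALMs): {area_ratio:.1f}x")
print(f"Tiger / SUBLEQ energy per instruction: {energy_ratio:.1f}x")
```

The energy ratio can be read as a break-even point: under these assumptions, SUBLEQ uses less energy than Tiger only for code that needs fewer than about 5.6 SUBLEQ instructions per equivalent MIPS instruction.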
subleq_and_subleqrev_on_cyclone_v.txt · Last modified: 2015/12/24 19:59 by janders