High-Level Synthesis Survey

AutoESL/Xilinx

People

  • Jason Cong - UCLA
  • Juanjo Noguera - Xilinx research
  • Stephen Neuendorffer - Xilinx research
  • Kees Vissers - Xilinx research
  • Zhiru Zhang - AutoESL/Xilinx
  • Bin Liu - PhD UCLA, AutoESL
  • Sven van Haastregt - PhD at Leiden University, Xilinx
  • Jesus Barba - PhD graduate of UCLM, Spain
  • Chris Dick - DSP chief architect, Xilinx

Papers

  • 2011 - High-Level Synthesis for FPGAs: From Prototyping to Deployment. J Cong, B Liu, S Neuendorffer… TCAD 2011
  • 2011 - Implementation of Sphere Decoder for MIMO-OFDM on FPGAs Using High-Level Synthesis Tools. J Noguera, S Neuendorffer, S Van Haastregt. Integrated Circuits and …, 2011 - Springer

Implemented a DSP application (a sphere decoder) using AutoPilot (version 2010.07.ft) and compared it to a hand-written RTL version. Target: Virtex-5 at 225 MHz. Sphere decoding is used in WiMAX mobile wireless networks. Both versions started from a MATLAB reference implementation (from Dick 2010):

  1. Used Xilinx System Generator to create Verilog
  2. Converted MATLAB to C++ and used HLS

Uses a systolic array. Input data rate is 1 input sample per clock; clock cycles per channel = 64. C++ template classes provide arbitrary-precision integer types, and template functions provide parameterized blocks. HLS constraints set the target FPGA family and target clock frequency. Pragmas control loop unrolling and specify which FPGA resource implements an array. SystemC adaptors are generated so that the C++ testbenches can be reused to test the final RTL. Reference C++ code (derived from MATLAB): ~2000 lines, fixed point. Refactoring steps:

  1. Macro-architecture: split the code into functions representing hardware blocks that communicate through arrays. Each function represents a pipeline stage. Arrays are translated into ping-pong buffers to allow parallel execution.
  2. Parameterization: use C++ template parameters. The initiation interval can have a big impact on resource sharing (they present an example of this at the end of section 6).
  3. Time division multiplexing: in pipelines with feedback loops, registers cannot be inserted freely without introducing pipeline stalls, so these recurrences (feedback loops) limit throughput. The inner loop had a 15-cycle recurrence, so C-slowing (or time division multiplexing) over 15 separate datasets accommodated the recurrence without any pipeline stalls. HLS reports the recurrences to the designer.
  4. FPGA optimizations: bit-width optimization (18-bit fixed point using C++ template classes). Efficient use of DSP48 blocks: create a template-parameterized function for a multiplication followed by a subtraction (can be mapped into a single DSP48).
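
The C-slowing idea in step 3 can be sketched as a small cycle-count model in plain C++. This is only an illustration under stated assumptions: the loop body `step`, the `Run` struct, and both function names are hypothetical, and the model counts issue slots only (pipeline fill and drain are ignored). It is not the paper's code.

```cpp
#include <array>
#include <cstddef>
#include <cstdint>
#include <vector>

// Assumed recurrence length: a result needs 15 cycles before it can feed
// the next iteration of the same dataset (the figure quoted in the notes).
constexpr int kRecurrence = 15;

using Dataset = std::vector<std::uint32_t>;

// Hypothetical loop body with a loop-carried dependence on `state`.
// Unsigned arithmetic wraps, so there is no overflow UB.
inline std::uint32_t step(std::uint32_t state, std::uint32_t x) {
    return state * 3u + x;
}

struct Run {
    std::array<std::uint32_t, kRecurrence> final_state{};
    std::uint64_t issue_cycles = 0;  // cycles spent issuing iterations
};

// One dataset at a time: every iteration waits kRecurrence cycles
// for its predecessor, so the pipeline is mostly idle.
Run run_stalled(const std::array<Dataset, kRecurrence>& in) {
    Run r;
    for (std::size_t d = 0; d < in.size(); ++d) {
        for (std::uint32_t x : in[d]) {
            r.final_state[d] = step(r.final_state[d], x);
            r.issue_cycles += kRecurrence;
        }
    }
    return r;
}

// C-slowed: cycle t serves dataset t % kRecurrence, so each dataset is
// revisited exactly kRecurrence cycles later, when its result is ready.
// Assumes all datasets have the same length.
Run run_c_slowed(const std::array<Dataset, kRecurrence>& in) {
    Run r;
    const std::size_t n = in[0].size();
    for (std::size_t t = 0; t < n * kRecurrence; ++t) {
        const std::size_t d = t % kRecurrence;  // which dataset this cycle
        const std::size_t i = t / kRecurrence;  // which sample of it
        r.final_state[d] = step(r.final_state[d], in[d][i]);
        r.issue_cycles += 1;
    }
    return r;
}
```

Both functions compute the same per-dataset results; in this model the interleaved version issues one iteration per cycle, while the one-dataset-at-a-time version needs 15 cycles per iteration.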

Pragmas:

  • ARRAY_STREAM: implement the array as a stream for dataflow computation instead of as BRAMs.
  • ARRAY_PARTITION:
  • PIPELINE II = MM_II: pipeline the loop with initiation interval set by template parameter MM_II
  • LATENCY max=2: use a maximum of two cycles to schedule this function
  • INTERFACE ap_none port=return register: use a register for the return value
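
To make the PIPELINE entry concrete, here is a minimal plain-C++ sketch of a block whose intended initiation interval is a template parameter, together with the usual first-order cycle estimate for a pipelined loop. The `dot` function, the `pipelined_cycles` helper, and the pipeline depth used in the usage note are assumptions for illustration; the directive is shown only as a comment because the exact pragma prefix depends on the tool version.

```cpp
#include <array>
#include <cstddef>
#include <cstdint>

// First-order estimate for a pipelined loop: the first iteration takes
// `depth` cycles, and each later iteration starts `ii` cycles after
// the previous one.
constexpr std::uint64_t pipelined_cycles(std::uint64_t trips,
                                         std::uint64_t depth,
                                         std::uint64_t ii) {
    return trips == 0 ? 0 : depth + (trips - 1) * ii;
}

// Hypothetical block: a dot product whose intended initiation interval
// is carried as the template parameter MM_II (name taken from the notes).
template <typename T, std::size_t N, int MM_II>
T dot(const std::array<T, N>& a, const std::array<T, N>& b) {
    static_assert(MM_II >= 1, "initiation interval must be at least 1");
    T acc = 0;
    for (std::size_t i = 0; i < N; ++i) {
        // An HLS tool would read a directive here, written in the notes
        // as "PIPELINE II = MM_II". A standard compiler ignores MM_II;
        // it only documents the intended schedule in this sketch.
        acc += a[i] * b[i];
    }
    return acc;
}
```

For example, with an assumed depth of 10 cycles, 64 iterations take about 73 cycles at II = 1 and about 262 cycles at II = 4. The larger II lowers throughput but gives the scheduler more room to share one functional unit among several operations, which is the resource-sharing effect the notes mention.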

Fixed-point datatypes are used: ap_int<18>. All verification can be done in C, which is much faster than RTL simulation. Final QoR, AutoESL vs. RTL: LUTs +4%, registers -26%, DSP48s -15%, 18K BRAMs -28%. Same Fmax and throughput.
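
The 18-bit types and the multiply-subtract block can be mimicked in standard C++ for experimentation without the vendor headers. The sketch below is an assumption-laden stand-in: `SInt<W>` is a hypothetical wrapper that only imitates W-bit signed wrap-around (it is not the tool's ap_int), and the operand order `c - a * b` in `mul_sub` is a guess, since the notes only say "a multiplication followed by a subtraction".

```cpp
#include <cstdint>

// Hypothetical stand-in for a W-bit signed integer type.
// It only mimics wrap-around to W bits; it is not the vendor class.
template <int W>
struct SInt {
    static_assert(W >= 2 && W <= 32, "sketch supports 2..32 bits");
    std::int64_t v;  // always kept in [-2^(W-1), 2^(W-1) - 1]

    SInt(std::int64_t x = 0) : v(wrap(x)) {}

    // Keep the low W bits, then sign-extend from bit W-1.
    static std::int64_t wrap(std::int64_t x) {
        const std::uint64_t mask = (std::uint64_t{1} << W) - 1;
        const std::uint64_t sign = std::uint64_t{1} << (W - 1);
        const std::uint64_t low = static_cast<std::uint64_t>(x) & mask;
        return static_cast<std::int64_t>(low ^ sign) -
               static_cast<std::int64_t>(sign);
    }
};

// Width-parameterized multiply-subtract block. Operand order is an
// assumption; the full-precision result is wrapped back to W bits.
template <int W>
SInt<W> mul_sub(SInt<W> a, SInt<W> b, SInt<W> c) {
    return SInt<W>(c.v - a.v * b.v);
}
```

Writing the operation as one small function keeps the multiply and the subtract adjacent, which is the property the notes rely on for mapping the pair into a single DSP48; whether a given tool actually does so is not something this sketch can show.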

CAIRN research group at IRISA, Rennes, France

People

  • Steven Derrien: assistant professor at University of Rennes 1, France. Former student of P. Quinton
  • Sanjay Rajopadhye: now an associate professor at Colorado State University. Previously at IRISA
  • Patrice Quinton: professor at University of Rennes 1
  • Tanguy Risset: professor at INSA Lyon, France

Papers

  • High-Level Synthesis of Loops Using the Polyhedral Model. S Derrien, S Rajopadhye, P Quinton… - High-Level Synthesis, Springer 2008

PICO uses partial loop unrolling with software pipelining. A way is needed to specify loop transformations (tiling, fusion, interchange) in the source code. Programming is done in the ALPHA language, which is based on the polyhedral model, using the MMAlpha software. Mentions the WRaP-IT project. Loop nests are written as recurrence equations. Future work: scheduling with a multi-dimensional time function to support loop tiling.
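
As an illustration of one of the loop transformations named above, the sketch below applies tiling by hand to a matrix-multiply loop nest, whose accumulation can be read as the recurrence C(i,j,k) = C(i,j,k-1) + A(i,k) * B(k,j). It is hand-written C++ chosen for this note, not ALPHA code or MMAlpha output, and the function names and tile size are assumptions.

```cpp
#include <algorithm>
#include <cstddef>
#include <vector>

using Matrix = std::vector<std::vector<long>>;

// Reference loop nest. Assumes non-empty matrices with matching shapes.
Matrix matmul(const Matrix& a, const Matrix& b) {
    const std::size_t n = a.size(), m = b.size(), p = b[0].size();
    Matrix c(n, std::vector<long>(p, 0));
    for (std::size_t i = 0; i < n; ++i)
        for (std::size_t j = 0; j < p; ++j)
            for (std::size_t k = 0; k < m; ++k)
                c[i][j] += a[i][k] * b[k][j];
    return c;
}

// Same computation with i, j and k tiled by `tile` (assumed >= 1).
// Integer addition is associative, so the reordered sums are identical.
Matrix matmul_tiled(const Matrix& a, const Matrix& b, std::size_t tile) {
    const std::size_t n = a.size(), m = b.size(), p = b[0].size();
    Matrix c(n, std::vector<long>(p, 0));
    for (std::size_t ii = 0; ii < n; ii += tile)
        for (std::size_t jj = 0; jj < p; jj += tile)
            for (std::size_t kk = 0; kk < m; kk += tile)
                // Inner loops visit one tile of the iteration space.
                for (std::size_t i = ii; i < std::min(ii + tile, n); ++i)
                    for (std::size_t j = jj; j < std::min(jj + tile, p); ++j)
                        for (std::size_t k = kk; k < std::min(kk + tile, m); ++k)
                            c[i][j] += a[i][k] * b[k][j];
    return c;
}
```

The tiled version visits the same iteration points in a different order. Doing this rewrite by hand in the source is the kind of effort that a way to specify loop transformations is meant to avoid.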

survey.txt · Last modified: 2012/01/05 18:21 by acanis