  • In the supplied tutorial, the demo has the host processor write to the FPGA's on-chip memory and read it back using the DMA controller. (The FPGA has no access to the processor's memory or to a DRAM.)
  • The commands for the transfer are written to the DMA controller
    • The DMA controller then reads from the Host's memory into FPGA on-chip memory
  • Find out whether the data is sent in full to the DMA controller at alt_up_pci_dma_go() – Yup
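
The command flow in the bullets above can be modelled entirely in software, which is handy for checking the write-then-read-back logic without a board attached. The sketch below is only a model under stated assumptions: `struct dma_cmd`, `model_dma_go()`, `write_and_verify()` and the 4096-byte memory size are hypothetical names and values made up for illustration, and none of it reflects the real signature of alt_up_pci_dma_go() or the tutorial driver's descriptor layout.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define ONCHIP_BYTES 4096u  /* assumed on-chip memory size, for the model only */

/* Hypothetical transfer command; not the real driver's descriptor layout. */
struct dma_cmd {
    const uint8_t *host_addr;   /* source buffer in host memory */
    uint32_t       onchip_off;  /* destination offset in on-chip memory */
    uint32_t       length;      /* number of bytes to move */
};

/* Software stand-in for the FPGA on-chip memory. */
uint8_t onchip_mem[ONCHIP_BYTES];

/* Model of the controller executing one command: host memory -> on-chip.
 * Returns 0 on success, -1 if the command would overrun on-chip memory. */
int model_dma_go(const struct dma_cmd *cmd)
{
    assert(cmd != NULL && cmd->host_addr != NULL);
    if (cmd->onchip_off > ONCHIP_BYTES ||
        cmd->length > ONCHIP_BYTES - cmd->onchip_off)
        return -1;
    memcpy(&onchip_mem[cmd->onchip_off], cmd->host_addr, cmd->length);
    return 0;
}

/* Write a buffer, then read it back and compare, as the demo does.
 * Returns 1 if every byte arrived intact, 0 otherwise. */
int write_and_verify(const uint8_t *buf, uint32_t off, uint32_t len)
{
    struct dma_cmd cmd = { buf, off, len };
    if (model_dma_go(&cmd) != 0)
        return 0;
    return memcmp(&onchip_mem[off], buf, len) == 0;
}
```

With the real driver, the read-back comparison would stay the same; only the model function would be replaced by the actual driver calls.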

PCIe Speed & Correctness

  • Try the burst transfers with the SGDMA controller
  • Close timing at 250MHz (physical requirement)
  • Remove the old DMA controller since we're not using it (this should also get us past the timing issues)
  • Get new component called modular SGDMA dispatcher, from Nios forums
    • Better performance (fmax) since it splits the command queues into read/write components (3 components total)
      • Command
      • Read Module
      • Write Module
  • Look at timing reports (make sure it closes timing)
    • Make sure the clocks are actually running at 250MHz
    • Add pipeline stages for QSYS interconnect (set it to 4 / MAX)
    • Width conversion might lower fmax (the PCIe core is 64 bits wide, so make sure the DMA is 64 bits wide as well)
  • SGDMA gives us guaranteed ordering on read and write queues
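
As a rough sanity check on the 250MHz and width-matching points above, the arithmetic can be written down directly. This is a back-of-the-envelope sketch: it assumes one data beat per clock cycle and ignores all PCIe protocol overhead, so the result is an upper bound rather than an expected throughput. The function names are made up for illustration.

```c
#include <assert.h>
#include <stdint.h>

/* Peak bytes per second for a bus that moves width_bits every clock cycle.
 * Assumes one beat per cycle and no protocol overhead (upper bound only). */
uint64_t peak_bytes_per_sec(uint64_t clock_hz, unsigned width_bits)
{
    assert(width_bits > 0 && width_bits % 8 == 0);
    return clock_hz * (width_bits / 8u);
}

/* Clock cycles (beats) needed to move nbytes over a width_bits bus,
 * rounding up for a partial final beat. */
uint64_t cycles_for_transfer(uint64_t nbytes, unsigned width_bits)
{
    assert(width_bits > 0 && width_bits % 8 == 0);
    uint64_t bytes_per_beat = width_bits / 8u;
    return (nbytes + bytes_per_beat - 1) / bytes_per_beat;
}
```

At 250MHz a 64-bit path tops out at 2,000,000,000 bytes per second, and a 4096-byte transfer takes 512 beats at 64 bits versus 1024 beats at 32 bits, which is one reason to keep the DMA at the same width as the PCIe core.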

PCIe Driver Improvements

  • Minimize copying of data by the CPU (Alex has sample driver code, sample projects)
    • Read LDD3 (device driver book Alex refers to)

Misc Things To Do

  • Need to add support for off-chip memory that the accelerators use?

Milestone 1: non-shared memory

  • We have manually gotten this to work for two cases: a simple design where hardware adds two numbers and returns the sum, and a one-pixel Mandelbrot calculation
  • “make pcie” generates the software and hardware needed. The manual steps are to:
    • Send the function arguments by calling PCI driver commands
    • Add the LegUp-generated Avalon slave to our QSYS system
    • The Avalon master (DMA) starts the hardware accelerator by passing arguments and a start bit
    • Need to poll hardware accelerator status to ensure correctness (TODO)
    • We can also completely remove the master and use the host processor – Done
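
The argument-passing, start-bit and status-polling steps above might look roughly like the following on the host side. This is a sketch under assumptions: the register layout in `struct accel_regs` is hypothetical (the real LegUp-generated Avalon slave map may differ), and `model_step()` is a software stand-in for the adder so the code runs without hardware.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical register map of the accelerator's Avalon slave. */
struct accel_regs {
    volatile uint32_t arg0;
    volatile uint32_t arg1;
    volatile uint32_t start;   /* write 1 to start the accelerator */
    volatile uint32_t done;    /* reads 1 once the result is valid */
    volatile uint32_t result;
};

/* Software model of the two-number adder. It stands in for the hardware
 * and would not exist in a real driver. */
void model_step(struct accel_regs *r)
{
    if (r->start) {
        r->result = r->arg0 + r->arg1;
        r->done = 1;
        r->start = 0;
    }
}

/* Pass arguments, set the start bit, then poll the status register.
 * Returns 0 and stores the sum on success, -1 if max_polls is exhausted. */
int accel_add(struct accel_regs *r, uint32_t a, uint32_t b,
              unsigned max_polls, uint32_t *sum)
{
    assert(r != NULL && sum != NULL);
    r->done = 0;          /* model only; real status is likely read-only */
    r->arg0 = a;
    r->arg1 = b;
    r->start = 1;
    for (unsigned i = 0; i < max_polls; i++) {
        model_step(r);    /* real hardware advances on its own clock */
        if (r->done) {
            *sum = r->result;
            return 0;
        }
    }
    return -1;            /* never read a result that was not flagged done */
}
```

Bounding the poll loop means a hung accelerator shows up as an error code rather than a stale result or a host-side hang.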

Milestone 2: shared memory

  • Initially, we thought about using a DMA into the host processor to read and write data
    • From some Googling, this requires writing a DMA driver for PCIe, and probably modifying the kernel to reserve more memory at boot (to verify with Alex)
    • May not be feasible?
  • We may need to pass read/write messages to get the host processor to send data
    • How much to send in one transfer?
      • Need to make sure this doesn't go past the host program's stack frame, or this will cause a segfault…
      • Does LLVM know the array size? If so, use that, capped at a maximum
      • Bare pointers should probably send at most 1 word (i.e. a pointer to a global variable)
    • Writing data back will be easier because the host processor has DMA access to the FPGA
    • Lots of interesting trade-offs can be made here…
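
The transfer-size rule discussed above can be sketched as a small helper. The encoding is an assumption made for illustration: a `known_words` value of 0 means the compiler could not determine an array size (a bare pointer), and the cap is a parameter because no concrete maximum has been chosen yet.

```c
#include <assert.h>
#include <stddef.h>

/* Decide how many words to send for one pointer argument.
 * known_words: array length in words if it is known, 0 for a bare pointer
 *              (hypothetical encoding, not an existing LegUp interface).
 * max_words:   upper bound on a single transfer. */
size_t words_to_send(size_t known_words, size_t max_words)
{
    assert(max_words >= 1);
    if (known_words == 0)
        return 1;                      /* bare pointer: a single word only */
    return known_words < max_words ? known_words : max_words;
}
```

Never sending more than the known size keeps the host from reading beyond the object, and the cap means larger arrays would need more than one transfer.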
pcie_notes.txt · Last modified: 2012/11/04 20:53 by kammoona