pcie_notes
Observations
In the supplied tutorial, the demo works by having the host processor write to FPGA on-chip memory and read back from it using the DMA controller. (The FPGA does not have access to the processor's memory or to DRAM.)
The commands for the transfer are written to the DMA controller
Find out whether the data is being sent completely to the DMA controller at alt_up_pci_dma_go() – Yup
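A minimal sketch of the write-then-read-back check the demo performs. It is simulated entirely on the host so it can run without the board: onchip_mem, sim_dma_write, sim_dma_read and loopback_check are hypothetical names, and the memcpy calls stand in for real DMA transfers issued through the driver.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define ONCHIP_WORDS 256  /* assumed size of the simulated on-chip memory */

/* Stand-in for FPGA on-chip memory; real code would go through the driver. */
static uint32_t onchip_mem[ONCHIP_WORDS];

/* Hypothetical host -> FPGA transfer, simulated with memcpy. */
static int sim_dma_write(size_t word_off, const uint32_t *src, size_t nwords)
{
    if (word_off > ONCHIP_WORDS || nwords > ONCHIP_WORDS - word_off)
        return -1;  /* transfer would run past the end of on-chip memory */
    memcpy(&onchip_mem[word_off], src, nwords * sizeof *src);
    return 0;
}

/* Hypothetical FPGA -> host transfer, simulated with memcpy. */
static int sim_dma_read(size_t word_off, uint32_t *dst, size_t nwords)
{
    if (word_off > ONCHIP_WORDS || nwords > ONCHIP_WORDS - word_off)
        return -1;
    memcpy(dst, &onchip_mem[word_off], nwords * sizeof *dst);
    return 0;
}

/* Writes a known pattern, reads it back, and returns the number of
 * mismatching words (0 means the round trip was clean), or -1 if the
 * requested range does not fit in on-chip memory. */
static long loopback_check(size_t word_off, size_t nwords)
{
    uint32_t tx[ONCHIP_WORDS];
    uint32_t rx[ONCHIP_WORDS];
    long mismatches = 0;

    if (nwords > ONCHIP_WORDS)
        return -1;
    for (size_t i = 0; i < nwords; i++)
        tx[i] = 0xA5A50000u ^ (uint32_t)i;  /* per-word unique pattern */
    memset(rx, 0, sizeof rx);

    if (sim_dma_write(word_off, tx, nwords) != 0)
        return -1;
    if (sim_dma_read(word_off, rx, nwords) != 0)
        return -1;

    for (size_t i = 0; i < nwords; i++)
        if (tx[i] != rx[i])
            mismatches++;
    return mismatches;
}
```

On real hardware the two sim_ helpers would be replaced by the driver's transfer calls; the pattern generation and comparison would stay the same.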
PCIe Speed & Correctness
Try the burst transfers with the SGDMA controller
Close timing at 250MHz (physical requirement)
Get rid of the DMA controller we're not using (this should help get past the timing issues)
Get the new component called the modular SGDMA dispatcher, from the Nios forums
Look at timing reports (make sure it closes timing)
Make sure the clocks are actually running at 250MHz
Add pipeline stages for QSYS interconnect (set it to 4 / MAX)
Width conversion might lower FMax (the PCIe core has a width of 64 bits; make sure the DMA is 64 bits wide as well)
SGDMA gives us guaranteed ordering on read and write queues
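As a rough sanity check on the numbers above, the raw ceiling of a 64-bit datapath clocked at 250 MHz can be worked out directly. This is only the width-times-clock upper bound; it ignores PCIe protocol overhead and interconnect stalls, so measured throughput will be lower. The helper name peak_bytes_per_sec is made up for this sketch.

```c
#include <assert.h>
#include <stdint.h>

/* Upper bound on datapath throughput: one full-width beat per clock cycle.
 * width_bits is expected to be a multiple of 8. */
static uint64_t peak_bytes_per_sec(unsigned width_bits, uint64_t clock_hz)
{
    assert(width_bits % 8 == 0);
    return (uint64_t)(width_bits / 8) * clock_hz;
}
```

For example, peak_bytes_per_sec(64, 250000000) gives 2,000,000,000 bytes/s, while a 32-bit DMA at the same clock would give half of that, which is one more reason to keep the DMA at the PCIe core's width.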
PCIe Driver Improvements
Misc Things To Do
Milestone 1: non-shared memory
We have gotten this to work manually for two cases: a simple design where the hardware adds two numbers and returns the sum, and a one-pixel Mandelbrot calculation
“make pcie” generates the software and hardware needed. The manual steps are to:
Send the function arguments by calling PCI driver commands
Add the LegUp-generated Avalon slave to our QSYS system
Avalon master (DMA) initiates the hardware accelerator by passing arguments and a start bit
Need to poll the hardware accelerator's status to ensure correctness (TODO)
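The start-then-poll sequence in the steps above could look roughly like the sketch below. The register layout (accel_regs_t), the bit positions and the function names are all assumptions made for illustration; the real layout is whatever the LegUp-generated Avalon slave exposes. The hardware is simulated by sim_accel_step so the sketch runs on the host alone.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical register layout for the accelerator's Avalon slave. */
typedef struct {
    uint32_t arg0;
    uint32_t arg1;
    uint32_t control;  /* bit 0: start */
    uint32_t status;   /* bit 0: done  */
    uint32_t result;
} accel_regs_t;

#define ACCEL_START 0x1u
#define ACCEL_DONE  0x1u

/* Simulation stand-in for the hardware: when start is set, compute the
 * sum, raise done, and clear start. */
static void sim_accel_step(volatile accel_regs_t *r)
{
    if (r->control & ACCEL_START) {
        r->result = r->arg0 + r->arg1;
        r->status |= ACCEL_DONE;
        r->control &= ~ACCEL_START;
    }
}

/* Passes the arguments, sets the start bit, then polls the done bit at
 * most max_polls times. Returns true and stores the sum on success,
 * false if the accelerator never reported done. */
static bool accel_add(volatile accel_regs_t *r, uint32_t a, uint32_t b,
                      unsigned max_polls, uint32_t *sum)
{
    assert(sum != 0);
    r->status = 0;           /* simulation only: clear a stale done flag */
    r->arg0 = a;
    r->arg1 = b;
    r->control = ACCEL_START;

    for (unsigned i = 0; i < max_polls; i++) {
        sim_accel_step(r);   /* real hardware advances on its own */
        if (r->status & ACCEL_DONE) {
            *sum = r->result;
            return true;
        }
    }
    return false;            /* timed out: result must not be trusted */
}
```

Bounding the poll loop means a hung accelerator shows up as an error return instead of a host program that never finishes.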
IDEAS
Milestone 2: shared memory
Initially, we thought about using a DMA into the host processor to read and write data
We may need to pass read/write messages to get the host processor to send data
How much to send in one transfer?
Need to make sure this doesn't go past the host program's stack frame, or this will cause a segfault…
Does LLVM know the array size? If so, use that, capped at a maximum
Bare pointers should probably send at most 1 word (i.e. a pointer to a global variable)
Writing data back will be easier because the host processor has DMA access to the FPGA
Lots of interesting trade-offs can be made here…
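One possible shape for the "how much to send" rule discussed above, as a minimal sketch: send the statically known array size capped at a maximum, and fall back to a single word for bare pointers. The cap value, the ptr_kind_t classification and the function name are assumptions for illustration and are not part of the existing flow.

```c
#include <assert.h>
#include <stddef.h>

#define MAX_TRANSFER_WORDS 1024u  /* assumed cap, chosen arbitrarily */

/* Hypothetical classification of a pointer argument, decided at compile time. */
typedef enum { PTR_BARE, PTR_ARRAY } ptr_kind_t;

/* Returns how many words to send for one pointer argument:
 * - bare pointers (or arrays of unknown size) send 1 word,
 * - arrays send their known element count, capped at the maximum. */
static size_t transfer_words(ptr_kind_t kind, size_t known_elems)
{
    if (kind == PTR_BARE || known_elems == 0)
        return 1;
    return known_elems < MAX_TRANSFER_WORDS ? known_elems : MAX_TRANSFER_WORDS;
}
```

Capping at the known size keeps the transfer inside the object, which is what avoids reading past the host program's stack frame; the fixed maximum only bounds the cost of a single transfer.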
pcie_notes.txt · Last modified: 2012/11/04 20:53 by kammoona