Synthesizing Mandelbrot over PCIe
Benchmark constraints for both pure SW and SW + Accelerator configurations:
Without polling the FPGA, we can compute the Mandelbrot set correctly at 100 iterations (the FPGA finishes before it takes too long).
With polling, we can compute the Mandelbrot set at 1000 iterations and more.
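The polling mentioned above can be pictured as a bounded spin on a completion flag. This is only a sketch under stated assumptions: the accelerator's real register layout is not given in these notes, so ACCEL_DONE, wait_for_done and the idea of a single memory-mapped status word are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical "done" bit; the real status register layout is an assumption. */
#define ACCEL_DONE 0x1u

/* Spin on a status word until the done bit is set or max_spins reads
 * have been made. Returns 0 on completion and -1 on timeout. */
static int wait_for_done(const volatile uint32_t *status, uint32_t max_spins)
{
    assert(status != NULL);
    for (uint32_t i = 0; i < max_spins; i++) {
        if (*status & ACCEL_DONE)
            return 0;
    }
    return -1;
}
```

Bounding the spin count keeps the host from hanging if the accelerator never reports completion. The caller can then decide whether to retry or fall back to the software path.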
Profiling of LegUp generated hardware
Naive implementation:
generate_set_sw took 3 microseconds
generate_set_hw took 96 microseconds
generate_set_sw took 4 microseconds
generate_set_hw took 84 microseconds
generate_set_sw took 3 microseconds
generate_set_hw took 76 microseconds
generate_set_sw took 4 microseconds
generate_set_hw took 68 microseconds
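Timings in the form shown above could be collected with a small wrapper like the one below. It is a minimal sketch, not the harness used for these measurements. It assumes a POSIX system with clock_gettime, and now_us, time_call and busy_work are hypothetical names.

```c
#define _POSIX_C_SOURCE 199309L
#include <assert.h>
#include <stdint.h>
#include <stdio.h>
#include <time.h>

/* Monotonic clock reading in microseconds. */
static int64_t now_us(void)
{
    struct timespec ts;
    int rc = clock_gettime(CLOCK_MONOTONIC, &ts);
    assert(rc == 0);
    (void)rc;
    return (int64_t)ts.tv_sec * 1000000 + ts.tv_nsec / 1000;
}

/* Time one call of fn(arg), print it in the log style used here,
 * and return the elapsed time in microseconds. */
static int64_t time_call(const char *name, void (*fn)(void *), void *arg)
{
    int64_t start = now_us();
    fn(arg);
    int64_t elapsed = now_us() - start;
    printf("%s took %lld microseconds\n", name, (long long)elapsed);
    return elapsed;
}

/* Stand-in workload (hypothetical). In practice this would wrap a call
 * to generate_set_sw or generate_set_hw. */
static void busy_work(void *arg)
{
    volatile int *counter = arg;
    for (int i = 0; i < 1000; i++)
        (*counter)++;
}
```

With readings of only a few microseconds, timer overhead and resolution can be a noticeable part of the result, so averaging over many calls is usually more reliable than a single measurement.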
With the constant multiplication moved out of generate_set:
generate_set_sw took 1 microseconds
generate_set_hw took 72 microseconds
generate_set_sw took 1 microseconds
generate_set_hw took 64 microseconds
generate_set_sw took 1 microseconds
generate_set_hw took 66 microseconds
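The kind of change described above, moving a loop-invariant multiplication out of the per-pixel work, can be sketched as follows. The notes do not show which multiplication was moved, so this is an illustration rather than the actual generate_set code. The Q16.16 fixed-point format, the zoom and base_step parameters, and the names row_naive and row_hoisted are all assumptions.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define FRAC_BITS 16  /* Q16.16 fixed point (an assumption) */
#define FIX(x) ((int32_t)((x) * (1 << FRAC_BITS)))

/* Fixed-point multiply using a 64-bit intermediate product. */
static int32_t fix_mul(int32_t a, int32_t b)
{
    return (int32_t)(((int64_t)a * b) / (1 << FRAC_BITS));
}

/* Number of iterations before |z|^2 exceeds 4, capped at max_iter. */
static int escape_iters(int32_t cr, int32_t ci, int max_iter)
{
    int32_t zr = 0, zi = 0;
    int i;
    for (i = 0; i < max_iter; i++) {
        int32_t zr2 = fix_mul(zr, zr);
        int32_t zi2 = fix_mul(zi, zi);
        if (zr2 + zi2 > FIX(4))
            break;
        int32_t t = zr2 - zi2 + cr;
        zi = 2 * fix_mul(zr, zi) + ci;
        zr = t;
    }
    return i;
}

/* Naive: the loop-invariant product zoom * base_step is recomputed
 * for every pixel. */
static void row_naive(int *out, int width, int32_t x_min, int32_t ci,
                      int32_t zoom, int32_t base_step, int max_iter)
{
    assert(out != NULL && width > 0);
    for (int px = 0; px < width; px++) {
        int32_t step = fix_mul(zoom, base_step);
        out[px] = escape_iters(x_min + px * step, ci, max_iter);
    }
}

/* Hoisted: the caller computes step once and passes it in. */
static void row_hoisted(int *out, int width, int32_t x_min, int32_t ci,
                        int32_t step, int max_iter)
{
    assert(out != NULL && width > 0);
    for (int px = 0; px < width; px++)
        out[px] = escape_iters(x_min + px * step, ci, max_iter);
}
```

Both versions produce the same iteration counts. The hoisted one removes one multiplication per pixel from the loop body.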
These results are for a single 1k x 1k image at 2k iterations, using the psychedelic colour scheme.
Original: 2s
Multi-threaded (4+ threads): 1s
DMA interrupts
Original: 77s
16-threaded: 72s
DMA polling
Original: 48s
16-threaded: 48s