This page describes how to set up the MMU, L1 caches, and L2 cache on the Cortex-A9 MPCore processor found in the Cyclone V.
The following documents are useful references:
The ARM processor in the Cyclone V has both L1 and L2 caches. The L1 cache is split into separate instruction and data caches and is controlled directly by the processor. The L2 cache is a unified cache and is controlled by the L2C-310 cache controller.
The L1 instruction cache can be enabled by setting a single bit in the System Control Register (SCTLR), which is read and written with MRC/MCR instructions. The L1 data cache can only be used when the memory management unit (MMU) is on. The L2 cache is enabled by programming the L2C-310 controller through its memory-mapped registers.
To see how the processor performs in various configurations, see the benchmarking results at ARM Benchmark Results.
The MMU translates virtual addresses used by the processor into physical addresses that correspond to actual memory locations. It also controls the caching behaviour of and access to different sections of the memory space.
Several steps are involved when turning on the MMU:
The caches, TLBs, and related structures must be invalidated before the MMU is turned on, because once address translation begins, any entries they already hold are no longer valid.
The following steps should be taken:
The Cortex-A9 MPCore processor allows for two levels of translation tables. For simplicity, only Level 1 translation tables are used. A flat one-to-one mapping is used, in which each virtual address maps to the same physical address. This is done using 1 MB 'sections'. Since the address space is 4 GB, this requires 4096 translation table entries. The L1 translation table must be 16 kB aligned in memory.
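The flat mapping described above can be sketched in C as follows. This is a minimal illustration, not the contents of arm_cache.s: the names `l1_table`, `section_desc`, `build_flat_table`, and the attribute presets are hypothetical, and the split between cacheable memory and device memory (`first_device_mb`) is a placeholder that must be chosen from the actual Cyclone V memory map. The bit positions follow the ARMv7-A short-descriptor section format, and every entry uses domain 0, which assumes the domain access control register grants access to that domain.

```c
#include <stdint.h>

#define L1_ENTRIES     4096u            /* 4 GB address space / 1 MB sections */
#define SECTION_SHIFT  20u              /* section base address lives in bits [31:20] */

/* ARMv7-A short-descriptor section entry fields */
#define DESC_SECTION   (2u << 0)        /* bits [1:0] = 0b10 marks a section entry */
#define DESC_B         (1u << 2)        /* bufferable */
#define DESC_C         (1u << 3)        /* cacheable */
#define DESC_AP_RW     (3u << 10)       /* AP[1:0] = 0b11, AP[2] = 0: full access */
#define DESC_TEX(x)    ((uint32_t)(x) << 12)

/* Hypothetical presets (TEX remap disabled) */
#define ATTR_NORMAL_WBWA (DESC_TEX(1) | DESC_C | DESC_B) /* write-back, write-allocate */
#define ATTR_DEVICE      (DESC_B)                        /* shareable device memory */

/* The L1 table must sit on a 16 kB boundary. */
static _Alignas(16384) uint32_t l1_table[L1_ENTRIES];

/* Build one section entry that maps megabyte `index` onto itself. */
static uint32_t section_desc(uint32_t index, uint32_t attrs)
{
    return (index << SECTION_SHIFT) | attrs | DESC_AP_RW | DESC_SECTION;
}

/* Fill the table with a flat map: megabytes below first_device_mb are
 * treated as cacheable normal memory, the rest as device memory.
 * The split point is an assumption supplied by the caller. */
static void build_flat_table(uint32_t *table, uint32_t first_device_mb)
{
    for (uint32_t i = 0; i < L1_ENTRIES; i++) {
        uint32_t attrs = (i < first_device_mb) ? ATTR_NORMAL_WBWA : ATTR_DEVICE;
        table[i] = section_desc(i, attrs);
    }
}
```

On the target, the address of `l1_table` would then be written to the translation table base register before the MMU is enabled.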
Once these steps are complete, the L1 caches and the branch predictor can be turned on by setting the I, C, and Z bits in the SCTLR.
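As a rough sketch of that last step, the C fragment below computes the new SCTLR value and, when built for a 32-bit ARM target with a GCC-compatible compiler, applies it with MRC/MCR. The function names are hypothetical and are not taken from arm_cache.h. The bit positions (I = bit 12, Z = bit 11, C = bit 2) are those defined for the ARMv7-A SCTLR. The `isb` instruction assumes an ARMv7 build target, and the code must run in a privileged mode.

```c
#include <stdint.h>

#define SCTLR_M  (1u << 0)    /* MMU enable */
#define SCTLR_C  (1u << 2)    /* data and unified cache enable */
#define SCTLR_Z  (1u << 11)   /* branch prediction enable */
#define SCTLR_I  (1u << 12)   /* instruction cache enable */

/* Return the given SCTLR value with the I, C, and Z bits set. */
static uint32_t sctlr_with_caches(uint32_t sctlr)
{
    return sctlr | SCTLR_I | SCTLR_C | SCTLR_Z;
}

#if defined(__arm__) && defined(__GNUC__)
/* Read SCTLR: CP15 c1, c0, opcode 0. */
static inline uint32_t read_sctlr(void)
{
    uint32_t value;
    __asm__ volatile("mrc p15, 0, %0, c1, c0, 0" : "=r"(value));
    return value;
}

/* Write SCTLR, then synchronize so the change takes effect
 * before the next instruction is fetched. */
static inline void write_sctlr(uint32_t value)
{
    __asm__ volatile("mcr p15, 0, %0, c1, c0, 0\n\tisb" : : "r"(value) : "memory");
}

/* Read-modify-write so that unrelated SCTLR bits are preserved. */
static inline void enable_l1_caches(void)
{
    write_sctlr(sctlr_with_caches(read_sctlr()));
}
#endif
```

Keeping the bit arithmetic in a separate pure function makes it possible to check the mask on a host machine, while the coprocessor access stays behind the target-only guard.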
The L2C-310 cache controller is controlled using memory-mapped registers. For the Cyclone V SoC, the base address of these registers is 0xFFFEF000. The register descriptions can be found in the L2C-310 TRM, linked above.
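To illustrate register access through that base address, here is a small hedged sketch in C. The helper names are hypothetical. The offsets used (Control at 0x100, Invalidate by Way at 0x77C) are taken from the L2C-310 register map and should be checked against the TRM. The ordering shown, invalidating all ways while the controller is still disabled and only then setting the enable bit, follows the initialization sequence in the TRM; configuration of the auxiliary control and latency registers is omitted. The way mask is an assumption supplied by the caller and depends on the cache associativity.

```c
#include <stdbool.h>
#include <stdint.h>

#define L2C_BASE_CYCLONE_V  0xFFFEF000u  /* base address given in the text */

#define L2C_CONTROL         0x100u       /* Control register */
#define L2C_INV_WAY         0x77Cu       /* Invalidate by Way register */
#define L2C_CTRL_ENABLE     (1u << 0)    /* Control register enable bit */

/* Word-wide register accessors; `base` points at the first register. */
static inline void l2c_write(volatile uint32_t *base, uint32_t offset, uint32_t value)
{
    base[offset / 4u] = value;
}

static inline uint32_t l2c_read(volatile uint32_t *base, uint32_t offset)
{
    return base[offset / 4u];
}

/* Start invalidating the ways selected by way_mask (one bit per way). */
static void l2c_invalidate_start(volatile uint32_t *base, uint32_t way_mask)
{
    l2c_write(base, L2C_INV_WAY, way_mask);
}

/* The hardware clears each way bit when that way has been invalidated. */
static bool l2c_invalidate_done(volatile uint32_t *base)
{
    return l2c_read(base, L2C_INV_WAY) == 0u;
}

/* Set the enable bit; call only after invalidation has completed. */
static void l2c_enable(volatile uint32_t *base)
{
    l2c_write(base, L2C_CONTROL, l2c_read(base, L2C_CONTROL) | L2C_CTRL_ENABLE);
}
```

On the target, `base` would be `(volatile uint32_t *)(uintptr_t)L2C_BASE_CYCLONE_V`, and the caller would poll `l2c_invalidate_done` before calling `l2c_enable`. Passing the base as a parameter also lets the helpers be exercised against an ordinary array on a host machine.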
The following steps are taken to enable the L2 cache controller:
The L2C-310 also includes event counting registers that can be used to monitor hit and miss rates, and events related to speculative reads and prefetching.
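As an illustration of how the event counters might be used to estimate a hit rate, the sketch below encodes a counter configuration value and computes the ratio of data read hits to data read requests. The function names are hypothetical. The register offsets, the event source codes (DRHIT = 0x2, DRREQ = 0x3), and the position of the source field in bits [5:2] reflect one reading of the L2C-310 TRM and should be verified there before use.

```c
#include <stdint.h>

/* Event counter registers (offsets from the L2C-310 base) */
#define L2C_EV_COUNTER_CTRL  0x200u   /* bit 0 enables counting */
#define L2C_EV_COUNTER1_CFG  0x204u
#define L2C_EV_COUNTER0_CFG  0x208u
#define L2C_EV_COUNTER1      0x20Cu   /* counter 1 value */
#define L2C_EV_COUNTER0      0x210u   /* counter 0 value */

/* Event source codes (to be verified against the TRM) */
#define L2C_EV_SRC_DRHIT     0x2u     /* data read hit */
#define L2C_EV_SRC_DRREQ     0x3u     /* data read request */

/* Place an event source code in bits [5:2] of a counter configuration
 * register; interrupt generation (bits [1:0]) is left disabled. */
static uint32_t l2c_ev_cfg(uint32_t source)
{
    return (source & 0xFu) << 2;
}

/* Fraction of requests that hit, or 0.0 if nothing was counted. */
static double l2c_hit_rate(uint32_t hits, uint32_t requests)
{
    return requests ? (double)hits / (double)requests : 0.0;
}
```

With counter 0 configured for DRHIT and counter 1 for DRREQ, the two counter values read back after a benchmark run would be passed to `l2c_hit_rate`.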
Note: The I and C bits in the System Control Register (SCTLR) control caching at all levels. If the L2 cache is enabled, but the I and C bits are cleared, the processor cannot take advantage of the L2 cache.
The following additional settings greatly enhance memory performance:
Several other options are available as well. Both the Cortex-A9 TRM and L2C-310 TRM (linked above) outline several optimizations for L2 memory accesses.
The following features are available when using the L2C-310 cache controller with a Cortex-A9 MPCore processor:
These features were all tested during benchmarking; however, none of them seemed to offer any performance gains. Further investigation may be required to make the best use of these options.
There are other options available that may further boost memory performance. These options have not yet been investigated.
Of these options, L2 cache preloading may offer the greatest benefits.
This investigation resulted in two files: arm_cache.h and arm_cache.s that together provide functions to turn on the MMU and caches on the ARM Cortex-A9 MPCore in the Altera Cyclone V SoC.