using_arm_caches [2014/05/31 19:25]
bain created
using_arm_caches [2014/06/11 12:40] (current)
bain
 ====== Using ARM Caches ======
 +This page describes how to set up the MMU, L1 caches, and L2 cache on the Cortex-A9 MPCore processor found in the Cyclone V.
 +
 +
 +
 +===== Introduction =====
 +The following documents are useful references:
 +
 +  * ARMv7-A Architecture Reference Manual [[http://infocenter.arm.com/help/index.jsp?topic=/com.arm.doc.ddi0406c/index.html|ARMv7 ARM]]
 +  * ARM Cortex-A9 Technical Reference Manual [[http://infocenter.arm.com/help/topic/com.arm.doc.ddi0388i/index.html|Cortex-A9 TRM]]
 +  * ARM Cortex-A9 MPCore Technical Reference Manual [[http://infocenter.arm.com/help/topic/com.arm.doc.ddi0407i/index.html|Cortex-A9 MPCore TRM]]
 +  * Level 2 Cache Controller (L2C-310) Technical Reference Manual [[http://infocenter.arm.com/help/topic/com.arm.doc.ddi0246h/index.html|L2C-310 TRM]]
 +
 +The ARM processor in the Cyclone V has both L1 and L2 caches.
 +The L1 cache is split into separate instruction and data caches and is controlled directly by the processor.
 +The L2 cache is a unified cache and is controlled by the L2C-310 cache controller.
 +
 +The L1 instruction cache is enabled by setting a single bit (the I bit) in the System Control Register (SCTLR), which is read and written with MRC/MCR instructions.
 +The L1 data cache can only be used when the memory management unit (MMU) is on.
 +The L2 cache can be enabled by programming the L2C-310 controller using memory-mapped registers.
 +
 +To see how the processor performs in various configurations, see the benchmarking results at [[arm_chstone_benchmark_results|ARM Benchmark Results]].
 +
 +
 +
 +===== Enabling the MMU =====
 +The MMU translates virtual addresses used by the processor into physical addresses that correspond to actual memory locations.
 +It also controls the caching behaviour of and access to different sections of the memory space.
 +
 +Several steps are involved when turning on the MMU:
 +== 1. Disable caches and branch predictor ==
 +  * Clear I, C, and Z bits in SCTLR using MRC/MCR instructions.
 +
 +== 2. Invalidate Everything ==
 +Invalidate the caches, TLBs, and branch predictor before turning on the MMU. Their contents are not guaranteed to be valid after reset, and any stale entries can no longer be trusted once address translation begins.
 +
 +The following steps should be taken:
 +  * Invalidate instruction, data, and unified TLBs.
 +  * Invalidate L1 instruction and data caches.
 +  * Invalidate branch predictor array.
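The TLBs, instruction cache, and branch predictor can each be invalidated with a single CP15 write (TLBIALL, ICIALLU, BPIALL), but the L1 data cache has to be invalidated one line at a time by set/way (DCISW). The following is a minimal sketch of how the DCISW operand could be built. It assumes a 32 KB, 4-way L1 data cache with 32-byte lines; on real hardware the geometry should be read from CCSIDR. The names `dcisw_operand` and `l1d_invalidate_all` are hypothetical, and the callback stands in for the actual `MCR` instruction so the logic can be checked off-target.

```c
#include <assert.h>
#include <stdint.h>

/* Assumed L1 data cache geometry: 32 KB, 4-way, 32-byte lines.
 * On real hardware, read these values from CCSIDR instead. */
#define L1D_WAYS       4u
#define L1D_SETS       256u   /* 32 KB / (4 ways * 32 bytes) */
#define L1D_WAY_SHIFT  30u    /* 32 - log2(L1D_WAYS) */
#define L1D_SET_SHIFT  5u     /* log2(line size in bytes) */

/* Build the operand for DCISW (invalidate data cache line by set/way).
 * The cache level field (bits [3:1]) is 0 for L1, so it adds nothing. */
uint32_t dcisw_operand(uint32_t way, uint32_t set)
{
    assert(way < L1D_WAYS && set < L1D_SETS);
    return (way << L1D_WAY_SHIFT) | (set << L1D_SET_SHIFT);
}

/* Visit every set/way pair. The callback stands in for the CP15 write
 * "MCR p15, 0, <Rt>, c7, c6, 2" that performs DCISW on the target. */
void l1d_invalidate_all(void (*dcisw)(uint32_t operand))
{
    for (uint32_t way = 0; way < L1D_WAYS; way++)
        for (uint32_t set = 0; set < L1D_SETS; set++)
            dcisw(dcisw_operand(way, set));
}
```

On the target, the callback would wrap the inline-assembly `MCR`, followed by a DSB once the loop completes.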
 +
 +== 3. Set up translation table entries ==
 +The Cortex-A9 MPCore processor allows for two levels of translation tables.
 +For simplicity only Level 1 translation tables are used.
 +A flat one-to-one mapping is used where virtual addresses are mapped to the same physical address.
 +This is done using 1MB 'sections'.
 +Since the address space is 4GB, this requires 4096 translation table entries.
 +Each entry is 4 bytes, so the table occupies 16kB, and it must be aligned to a 16kB boundary in memory.
 +
 +  * Use 1MB sections
 +  * Set TEX[2:0], C, and B bits to use outer and inner write-back for normal memory (SDRAM)
 +  * Set TEX[2:0], C, and B bits to use non-shareable device memory for memory-mapped peripherals
 +  * Set AP[1:0] bits to allow read and write access
 +  * Set the domain to 0 (any of the 16 domains works, as long as it matches the DACR setting in step 5)
 +  * Set the section base address of each table entry to point to the appropriate 1MB section of physical memory
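The entries described above can be sketched in C as follows. This is an illustration rather than the contents of arm_cache.s: the helper names are hypothetical, the split between SDRAM and peripherals (`sdram_mb` megabytes of SDRAM starting at address 0, everything else treated as device memory) is an assumption, and TEX=0b001, C=1, B=1 is only one of the write-back encodings (it selects write-back, write-allocate; TEX=0b000, C=1, B=1 selects write-back without write-allocate).

```c
#include <assert.h>
#include <stdint.h>

/* Short-descriptor section entry fields (ARMv7-A). */
#define SECTION_TYPE      0x2u                    /* bits [1:0] = 0b10: 1MB section */
#define SECTION_B         (1u << 2)
#define SECTION_C         (1u << 3)
#define SECTION_DOMAIN(d) ((uint32_t)(d) << 5)    /* bits [8:5] */
#define SECTION_AP_RW     (3u << 10)              /* AP[1:0] = 0b11, AP[2] = 0 */
#define SECTION_TEX(t)    ((uint32_t)(t) << 12)   /* bits [14:12] */

/* Normal memory, outer and inner write-back, write-allocate. */
#define ATTR_NORMAL_WB    (SECTION_TEX(1) | SECTION_C | SECTION_B)
/* Non-shareable device memory. */
#define ATTR_DEVICE_NS    (SECTION_TEX(2))

/* Entry that maps virtual section `index` onto the same physical section. */
uint32_t section_entry(uint32_t index, uint32_t attrs, uint32_t domain)
{
    assert(index < 4096u && domain < 16u);
    return (index << 20) | attrs | SECTION_AP_RW |
           SECTION_DOMAIN(domain) | SECTION_TYPE;
}

/* Fill a flat one-to-one table: the first `sdram_mb` sections are cacheable
 * normal memory, the remainder is device memory. All entries use domain 0. */
void build_flat_table(uint32_t table[4096], uint32_t sdram_mb)
{
    for (uint32_t i = 0; i < 4096u; i++)
        table[i] = section_entry(i, i < sdram_mb ? ATTR_NORMAL_WB
                                                 : ATTR_DEVICE_NS, 0u);
}
```

With GCC, the table itself could be declared as `static uint32_t table[4096] __attribute__((aligned(16384)));` to satisfy the alignment requirement.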
 +
 +== 4. Set translation table control registers ==
 +  * Set Translation Table Base Control Register (TTBCR) to 0 so that TTBR0 is used
 +  * Set TTBR0 to point to the L1 translation table, and to use inner and outer write-back, write-allocate cacheable memory for translation table walks
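As a rough sketch, the TTBR0 value for this step could be computed as below. The function name is hypothetical, and the bit positions assume the TTBR0 layout used with the Multiprocessing Extensions (IRGN split across bits 6 and 0, RGN in bits [4:3]); they should be checked against the ARMv7 ARM.

```c
#include <assert.h>
#include <stdint.h>

/* Inner region: IRGN[1:0] = 0b01 (write-back, write-allocate).
 * IRGN[0] lives in bit 6 and IRGN[1] in bit 0, so only bit 6 is set. */
#define TTBR_IRGN_WBWA  (1u << 6)
/* Outer region: RGN[1:0] in bits [4:3] = 0b01 (write-back, write-allocate). */
#define TTBR_RGN_WBWA   (1u << 3)

/* Value for TTBR0 given the physical address of the L1 translation table.
 * With TTBCR = 0 the table must be 16kB aligned. */
uint32_t ttbr0_value(uint32_t table_phys_addr)
{
    assert((table_phys_addr & 0x3FFFu) == 0u);
    return table_phys_addr | TTBR_IRGN_WBWA | TTBR_RGN_WBWA;
}
```

On the target, TTBCR is written with `MCR p15, 0, <Rt>, c2, c0, 2` and TTBR0 with `MCR p15, 0, <Rt>, c2, c0, 0`.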
 +
 +== 5. Set domain access control register ==
 +  * Set DACR to client or manager mode for the domain(s) you used in the translation table entries. Client mode checks the AP bits of each entry; manager mode skips the permission checks.
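The DACR holds one 2-bit field per domain, so the value for this step can be sketched as a small helper. The names here are hypothetical; the field encodings (0b01 for client, 0b11 for manager, which is sometimes called master) follow the ARMv7 ARM.

```c
#include <assert.h>
#include <stdint.h>

enum domain_access {
    DOMAIN_NO_ACCESS = 0u,  /* any access generates a domain fault */
    DOMAIN_CLIENT    = 1u,  /* accesses are checked against the AP bits */
    DOMAIN_MANAGER   = 3u   /* accesses are not checked */
};

/* Return `dacr` with the 2-bit field for `domain` replaced by `mode`. */
uint32_t dacr_set(uint32_t dacr, unsigned domain, enum domain_access mode)
{
    assert(domain < 16u);
    unsigned shift = 2u * domain;
    return (dacr & ~(3u << shift)) | ((uint32_t)mode << shift);
}
```

For the flat mapping with domain 0, `dacr_set(0, 0, DOMAIN_CLIENT)` gives the value to write with `MCR p15, 0, <Rt>, c3, c0, 0`.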
 +
 +== 6. Enable the MMU ==
 +  * Set M bit in SCTLR using MCR/MRC instructions.
 +
 +Once these steps are complete, the L1 caches and branch predictor can be turned on by setting the I, C, and Z bits in the SCTLR.
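The three SCTLR updates in this sequence (step 1, step 6, and the final cache enable) only differ in which bits they touch. A minimal sketch of the read-modify-write values follows; the function names are hypothetical, and the actual register access (`MRC`/`MCR p15, 0, <Rt>, c1, c0, 0`) is left out so the bit manipulation can be checked on any machine.

```c
#include <assert.h>
#include <stdint.h>

#define SCTLR_M  (1u << 0)   /* MMU enable */
#define SCTLR_C  (1u << 2)   /* data and unified cache enable */
#define SCTLR_Z  (1u << 11)  /* branch prediction enable */
#define SCTLR_I  (1u << 12)  /* instruction cache enable */

/* Step 1: clear I, C, and Z before touching the MMU. */
uint32_t sctlr_caches_off(uint32_t sctlr)
{
    return sctlr & ~(SCTLR_I | SCTLR_C | SCTLR_Z);
}

/* Step 6: set M to turn on address translation. */
uint32_t sctlr_mmu_on(uint32_t sctlr)
{
    return sctlr | SCTLR_M;
}

/* Final step: set I, C, and Z. The data cache needs the MMU, so M must
 * already be set. */
uint32_t sctlr_caches_on(uint32_t sctlr)
{
    assert(sctlr & SCTLR_M);
    return sctlr | SCTLR_I | SCTLR_C | SCTLR_Z;
}
```

Each write to SCTLR on the target would normally be followed by an ISB so that later instructions see the new setting.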
 +
 +
 +
 +===== Programming the L2 Cache Controller =====
 +The L2C-310 cache controller is controlled using memory-mapped registers.
 +For the Cyclone V SoC the base address of these registers is 0xFFFEF000.
 +The register descriptions can be found in the L2C-310 TRM, linked above.
 +
 +The following steps are taken to enable the L2 cache controller:
 +  - Set the way size
 +  - Set the read, write, and hold delays for Tag RAM
 +  - Set the read, write, and hold delays for Data RAM
 +  - Set the prefetching behaviour
 +  - Invalidate the cache
 +  - Enable the L2C-310 cache controller
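The steps above can be sketched as follows. This is not the code from arm_cache.s: the function names are hypothetical, the register offsets and field layouts (latency fields stored as cycles minus one, with the third field named "setup" here; way size in bits [19:17] of the auxiliary control register) should be checked against the L2C-310 TRM, and the 8-way mask is an assumption about the Cyclone V configuration. The base pointer is passed in so the same code works with 0xFFFEF000 on the target.

```c
#include <assert.h>
#include <stdint.h>

/* L2C-310 register offsets from the controller base address. */
#define L2C_CTRL           0x100u
#define L2C_AUX_CTRL       0x104u
#define L2C_TAG_LATENCY    0x108u
#define L2C_DATA_LATENCY   0x10Cu
#define L2C_INV_WAY        0x77Cu
#define L2C_PREFETCH_CTRL  0xF60u

#define L2C_WAY_MASK       0xFFu   /* assumed 8-way cache: one bit per way */

/* Encode Tag/Data RAM latencies given in cycles (1..8 each). */
uint32_t l2c_latency(unsigned setup, unsigned read, unsigned write)
{
    assert(setup >= 1u && setup <= 8u);
    assert(read >= 1u && read <= 8u);
    assert(write >= 1u && write <= 8u);
    return (setup - 1u) | ((read - 1u) << 4) | ((write - 1u) << 8);
}

/* Set the way size field: 16kB -> 1, 32kB -> 2, ... 512kB -> 6. */
uint32_t l2c_aux_set_way_size(uint32_t aux, unsigned way_kb)
{
    uint32_t code = 1u;
    for (unsigned kb = 16u; kb < way_kb && code < 7u; kb <<= 1)
        code++;
    assert(code <= 6u && (16u << (code - 1u)) == way_kb);
    return (aux & ~(7u << 17)) | (code << 17);
}

static void l2c_write(volatile uint32_t *base, uint32_t off, uint32_t value)
{
    base[off / 4u] = value;
}

/* Configure, invalidate, and enable the controller (must be disabled on entry). */
void l2c_enable(volatile uint32_t *base, unsigned way_kb,
                uint32_t tag_latency, uint32_t data_latency, uint32_t prefetch)
{
    uint32_t aux = base[L2C_AUX_CTRL / 4u];
    l2c_write(base, L2C_AUX_CTRL, l2c_aux_set_way_size(aux, way_kb));
    l2c_write(base, L2C_TAG_LATENCY, tag_latency);
    l2c_write(base, L2C_DATA_LATENCY, data_latency);
    l2c_write(base, L2C_PREFETCH_CTRL, prefetch);
    l2c_write(base, L2C_INV_WAY, L2C_WAY_MASK);       /* invalidate all ways */
    while (base[L2C_INV_WAY / 4u] & L2C_WAY_MASK) { } /* wait for completion */
    l2c_write(base, L2C_CTRL, 1u);                    /* enable the cache */
}
```

A call such as `l2c_enable((volatile uint32_t *)0xFFFEF000, 64, l2c_latency(1, 2, 1), l2c_latency(1, 3, 1), 0)` shows the intended usage; the latency values in it are placeholders, not tested settings.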
 +
 +The L2C-310 also includes event counting registers that can be used to monitor hit and miss rates, and events related to speculative reads and prefetching.
 +
 +Note: The I and C bits in the System Control Register (SCTLR) control caching at all levels.
 +If the L2 cache is enabled, but the I and C bits are cleared, the processor cannot take advantage of the L2 cache.
 +
 +
 +
 +===== Memory Performance Optimizations =====
 +The following additional settings greatly enhance memory performance:
 +  * Set memory region attributes in translation table entries to use inner write-back
 +  * Use minimum stable L2C-310 read, write, and hold delays
 +  * Enable L1 Data-side prefetch
 +
 +
 +
 +===== Special L2C-310 + Cortex-A9 MPCore Options =====
 +Several other options are available as well.
 +Both the Cortex-A9 TRM and L2C-310 TRM (linked above) outline several optimizations for L2 memory accesses; see [[http://infocenter.arm.com/help/topic/com.arm.doc.ddi0246h/CJACBHHB.html|the relevant section of the L2C-310 TRM]].
 +
 +The following features are available when using the L2C-310 cache controller with a Cortex-A9 MPCore processor:
 +  * Exclusive caching
 +  * L2 prefetch hints
 +  * Early BRESP
 +  * Allow L2C-310 to write a full line of zeros
 +  * L2 cache speculative linefill (requires Snoop Control Unit to be enabled)
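Most of these features are switched on by setting matching bits in the Cortex-A9 Auxiliary Control Register (ACTLR) and the L2C-310 auxiliary control register. The sketch below covers the first four features only. The struct and function names are hypothetical, and the bit positions are assumptions taken from a reading of the two TRMs that should be verified before use. The assertion encodes the ordering the TRMs describe for paired features: the L2C-310 side is set up first, then the processor side.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Cortex-A9 ACTLR bits (assumed positions). */
#define ACTLR_L2_PREFETCH_HINT  (1u << 1)
#define ACTLR_FULL_LINE_ZERO    (1u << 3)
#define ACTLR_EXCL              (1u << 7)

/* L2C-310 auxiliary control register bits (assumed positions). */
#define L2C_AUX_FULL_LINE_ZERO  (1u << 0)
#define L2C_AUX_EXCLUSIVE       (1u << 12)
#define L2C_AUX_EARLY_BRESP     (1u << 30)

struct l2_options {
    bool exclusive;       /* exclusive caching (both sides) */
    bool prefetch_hints;  /* L2 prefetch hints (processor side) */
    bool early_bresp;     /* early BRESP (L2C-310 side) */
    bool full_line_zero;  /* write full line of zeros (both sides) */
};

static uint32_t set_bits(uint32_t value, uint32_t mask, bool on)
{
    return on ? (value | mask) : (value & ~mask);
}

/* New L2C-310 auxiliary control value for the requested options. */
uint32_t l2c_aux_apply(uint32_t aux, struct l2_options o)
{
    aux = set_bits(aux, L2C_AUX_EXCLUSIVE, o.exclusive);
    aux = set_bits(aux, L2C_AUX_EARLY_BRESP, o.early_bresp);
    aux = set_bits(aux, L2C_AUX_FULL_LINE_ZERO, o.full_line_zero);
    return aux;
}

/* New ACTLR value. `l2c_aux` is the value already programmed into the
 * L2C-310, so paired features are never enabled on the processor alone. */
uint32_t actlr_apply(uint32_t actlr, uint32_t l2c_aux, struct l2_options o)
{
    assert(!o.full_line_zero || (l2c_aux & L2C_AUX_FULL_LINE_ZERO));
    assert(!o.exclusive || (l2c_aux & L2C_AUX_EXCLUSIVE));
    actlr = set_bits(actlr, ACTLR_EXCL, o.exclusive);
    actlr = set_bits(actlr, ACTLR_L2_PREFETCH_HINT, o.prefetch_hints);
    actlr = set_bits(actlr, ACTLR_FULL_LINE_ZERO, o.full_line_zero);
    return actlr;
}
```

Keeping the options in one struct makes it easy to benchmark each feature on its own by flipping a single field.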
 +
 +These features were all tested during benchmarking; however, none of them seemed to offer any performance gains.
 +Further investigation may be required to make the best use of these options.
 +
 +
 +===== Other Possible Optimizations =====
 +There are other options available that may further boost memory performance.
 +These options have not yet been investigated.
 +
 +  * L2 cache entry lockdown
 +  * Cache replacement policy
 +  * L2 cache preloading
 +
 +Of these options, L2 cache preloading may offer the greatest benefits.
 +
 +
 +====== Source Code ======
 +This investigation resulted in two files, arm_cache.h and arm_cache.s, that together provide functions to turn on the MMU and caches on the ARM Cortex-A9 MPCore in the Altera Cyclone V SoC.
 +
 +   * [[https://www.dropbox.com/s/l2mzqzfq3089emj/arm_cache.h | arm_cache.h]]
 +   * [[https://www.dropbox.com/s/tq6y2yod3p26yui/arm_cache.s | arm_cache.s]]
  
  
using_arm_caches.1401578742.txt.gz · Last modified: 2014/05/31 19:25 by bain