using_arm_caches [2014/05/31 19:36]
bain
using_arm_caches [2014/06/11 12:40] (current)
bain
Line 1:
 ====== Using ARM Caches ======
 +This page describes how to set up the MMU, L1 caches, and L2 cache on the Cortex-A9 MPCore processor found in the Cyclone V.
 +
  
-This page describes how to set up caching on the Cortex-A9 MPCore processor found in the Cyclone V. 
  
 ===== Introduction =====
 The following documents are useful references:
  
-  * ARMv7-A Architecture Reference Manual [[http://infocenter.arm.com/help/index.jsp?topic=/com.arm.doc.ddi0406c/index.html]]
-  * ARM Cortex-A9 Technical Reference Manual [[http://infocenter.arm.com/help/topic/com.arm.doc.ddi0388i/index.html]]
-  * ARM Cortex-A9 MPCore Technical Reference Manual [[http://infocenter.arm.com/help/topic/com.arm.doc.ddi0407i/index.html]]
-  * Level 2 Cache Controller (L2C-310) Technical Reference Manual [[http://infocenter.arm.com/help/index.jsp?topic=/com.arm.doc.ddi0246e/index.html]]
 +  * ARMv7-A Architecture Reference Manual [[http://infocenter.arm.com/help/index.jsp?topic=/com.arm.doc.ddi0406c/index.html|ARMv7 ARM]]
 +  * ARM Cortex-A9 Technical Reference Manual [[http://infocenter.arm.com/help/topic/com.arm.doc.ddi0388i/index.html|Cortex-A9 TRM]]
 +  * ARM Cortex-A9 MPCore Technical Reference Manual [[http://infocenter.arm.com/help/topic/com.arm.doc.ddi0407i/index.html|Cortex-A9 MPCore TRM]]
 +  * Level 2 Cache Controller (L2C-310) Technical Reference Manual [[http://infocenter.arm.com/help/topic/com.arm.doc.ddi0246h/index.html|L2C-310 TRM]]
 + 
 +The ARM processor in the Cyclone V has both L1 and L2 caches. 
 +The L1 cache is split into separate instruction and data caches and is controlled directly by the processor. 
 +The L2 cache is a unified cache and is controlled by the L2C-310 cache controller. 
 + 
 +The L1 instruction cache can be enabled by setting a single bit (the I bit) in the SCTLR with MRC/MCR instructions.
 +The L1 data cache can only be used when the memory management unit (MMU) is on. 
 +The L2 cache can be enabled by programming the L2C-310 controller using memory-mapped registers. 
 + 
 +To see how the processor performs in various configurations,​ see the benchmarking results at [[arm_chstone_benchmark_results|ARM Benchmark Results]]. 
 + 
 + 
 + 
 +===== Enabling the MMU =====
 +The MMU translates virtual addresses used by the processor into physical addresses that correspond to actual memory locations. 
 +It also controls the caching behaviour of, and access permissions for, different regions of the memory space.
 + 
 +Several steps are involved when turning on the MMU: 
 +== 1. Disable caches and branch predictor == 
 +  * Clear I, C, and Z bits in SCTLR using MRC/MCR instructions. 
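
As a rough illustration of this step, the sketch below reads SCTLR with MRC, clears the three bits, and writes it back with MCR. It is not the code from arm_cache.s; the function names are made up for this example, and the host fallback (a plain variable standing in for SCTLR) exists only so the logic can be compiled and checked off-target.

```c
#include <assert.h>
#include <stdint.h>

/* SCTLR bit positions (ARMv7-A). */
#define SCTLR_M (1u << 0)   /* MMU enable */
#define SCTLR_C (1u << 2)   /* data and unified cache enable */
#define SCTLR_Z (1u << 11)  /* branch prediction enable */
#define SCTLR_I (1u << 12)  /* instruction cache enable */

#if defined(__arm__) && defined(__GNUC__)
/* On the target: SCTLR is CP15 register c1, c0, 0. */
static inline uint32_t sctlr_read(void)
{
    uint32_t v;
    __asm__ volatile("mrc p15, 0, %0, c1, c0, 0" : "=r"(v));
    return v;
}
static inline void sctlr_write(uint32_t v)
{
    __asm__ volatile("mcr p15, 0, %0, c1, c0, 0\n\tisb" : : "r"(v) : "memory");
}
#else
/* Host fallback (illustration only): a variable stands in for SCTLR. */
static uint32_t fake_sctlr = SCTLR_I | SCTLR_C | SCTLR_Z;
static inline uint32_t sctlr_read(void) { return fake_sctlr; }
static inline void sctlr_write(uint32_t v) { fake_sctlr = v; }
#endif

/* Step 1: turn off the L1 caches and the branch predictor. */
static void caches_and_bp_disable(void)
{
    uint32_t v = sctlr_read();
    v &= ~(SCTLR_I | SCTLR_C | SCTLR_Z);
    sctlr_write(v);
    assert((sctlr_read() & (SCTLR_I | SCTLR_C | SCTLR_Z)) == 0);
}
```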
 + 
 +== 2. Invalidate Everything == 
 +Invalidate the caches, TLBs, and branch predictor before turning on the MMU: once address translation begins, any entries they already hold can no longer be assumed valid.
 + 
 +The following steps should be taken: 
 +  * Invalidate instruction,​ data, and unified TLBs. 
 +  * Invalidate L1 instruction and data caches. 
 +  * Invalidate branch predictor array. 
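
The TLB, instruction cache, and branch predictor invalidations are each a single CP15 write, while the L1 data cache is typically invalidated line by line using set/way operations. The sketch below lists the CP15 encodings in a comment and shows how the set/way operands could be generated. The cache geometry (32 KB, 4-way, 32-byte lines) is an assumption for this example; robust code would read it from CCSIDR. The callback type and function names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/*
 * CP15 invalidate operations (ARMv7-A); the value written is ignored
 * except for DCISW:
 *   TLBIALL  mcr p15, 0, Rt, c8, c7, 0   unified TLB (c8,c5,0 / c8,c6,0 for I / D)
 *   ICIALLU  mcr p15, 0, Rt, c7, c5, 0   entire instruction cache
 *   BPIALL   mcr p15, 0, Rt, c7, c5, 6   branch predictor array
 *   DCISW    mcr p15, 0, Rt, c7, c6, 2   one data cache line by set/way
 */

/* Assumed L1 data cache geometry for this sketch. */
#define L1D_SIZE  (32u * 1024u)
#define L1D_WAYS  4u
#define L1D_LINE  32u
#define L1D_SETS  (L1D_SIZE / (L1D_WAYS * L1D_LINE))

/* DCISW operand for cache level 1: way in bits [31:30] (4 ways),
 * set index starting at bit 5 (32-byte lines), level field left at 0. */
static uint32_t dcisw_operand(uint32_t way, uint32_t set)
{
    assert(way < L1D_WAYS && set < L1D_SETS);
    return (way << 30) | (set << 5);
}

/* Hypothetical hook that performs the actual DCISW write on the target. */
typedef void (*cp15_write_fn)(uint32_t operand);

/* Walk every set/way pair; returns the number of lines visited. */
static unsigned l1d_invalidate_all(cp15_write_fn dcisw)
{
    unsigned count = 0;
    for (uint32_t way = 0; way < L1D_WAYS; way++) {
        for (uint32_t set = 0; set < L1D_SETS; set++) {
            uint32_t op = dcisw_operand(way, set);
            if (dcisw != NULL)
                dcisw(op);
            count++;
        }
    }
    return count;
}
```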
 + 
 +== 3. Set up translation table entries == 
 +The Cortex-A9 MPCore processor allows for two levels of translation tables. 
 +For simplicity only Level 1 translation tables are used. 
 +A flat one-to-one mapping is used where virtual addresses are mapped to the same physical address. 
 +This is done using 1MB '​sections'​. 
 +Since the address space is 4GB, this requires 4096 translation table entries. 
 +The L1 translation table must be 16kB aligned in memory.
 + 
 +  * Use 1MB sections 
 +  * Set TEX[2:0], C, and B bits to use outer and inner write-back for normal memory (SDRAM) 
 +  * Set TEX[2:0], C, and B bits to use non-shareable device memory for memory-mapped peripherals 
 +  * Set AP[1:0] bits to allow read and write access 
 +  * Set the domain to 0 (any of the 16 domains works, provided the DACR setting in step 5 grants access to it)
 +  * Set the section base address of each table entry to point to the appropriate 1MB section of physical memory 
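
A minimal sketch of how such a flat table could be built is shown below. The field positions follow the ARMv7-A short-descriptor section format. The choice of the write-allocate flavour of write-back (TEX=001, C=1, B=1), the boundary between SDRAM and peripherals passed as first_device_section, and all identifier names are assumptions for this example, not taken from arm_cache.s.

```c
#include <assert.h>
#include <stdint.h>

#define L1_ENTRIES      4096u            /* 4 GB / 1 MB sections */

/* Short-descriptor section entry fields. */
#define SEC_TYPE        0x2u             /* bits [1:0] = 0b10: section */
#define SEC_B           (1u << 2)
#define SEC_C           (1u << 3)
#define SEC_DOMAIN(d)   ((uint32_t)(d) << 5)
#define SEC_AP_RW       (3u << 10)       /* AP[1:0] = 0b11, AP[2] = 0: read/write */
#define SEC_TEX(t)      ((uint32_t)(t) << 12)

/* Normal memory, outer and inner write-back, write-allocate. */
#define ATTR_NORMAL_WB  (SEC_TEX(1) | SEC_C | SEC_B)
/* Non-shareable device memory. */
#define ATTR_DEVICE     SEC_TEX(2)

/* The table itself, 16 kB aligned as required when TTBCR is 0. */
static _Alignas(16384) uint32_t l1_table[L1_ENTRIES];

/* One entry mapping virtual section 'index' onto the same physical section. */
static uint32_t section_entry(uint32_t index, uint32_t attrs, unsigned domain)
{
    assert(index < L1_ENTRIES && domain < 16u);
    return (index << 20) | attrs | SEC_AP_RW | SEC_DOMAIN(domain) | SEC_TYPE;
}

/* Sections below first_device_section are treated as SDRAM, the rest as
 * memory-mapped peripherals (the boundary is a parameter, not a known value). */
static void build_flat_map(uint32_t *table, uint32_t first_device_section)
{
    for (uint32_t i = 0; i < L1_ENTRIES; i++) {
        uint32_t attrs = (i < first_device_section) ? ATTR_NORMAL_WB : ATTR_DEVICE;
        table[i] = section_entry(i, attrs, 0);
    }
}
```

For example, build_flat_map(l1_table, 0xC00) would treat everything below 0xC0000000 as cacheable normal memory and everything above it as device memory.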
 + 
 +== 4. Set translation table control registers == 
 +  * Set Translation Table Base Control Register (TTBCR) to 0 so that TTBR0 is used 
 +  * Set TTBR0 to point to the L1 translation table, and to use inner and outer write-back, write-allocate cacheable memory for translation table walks 
 + 
 +== 5. Set domain access control register == 
 +  * Set DACR to client or manager mode for the domain(s) you used in the translation table entries. In client mode the AP bits of each entry are checked; in manager mode they are ignored.
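
The DACR holds a two-bit field for each of the 16 domains. A small helper such as the following (the name is illustrative) shows how the field for one domain could be updated; on the target the result would be written to CP15 c3, c0, 0.

```c
#include <assert.h>
#include <stdint.h>

/* Two-bit access field values in the DACR. */
#define DACR_NO_ACCESS  0u   /* any access generates a domain fault */
#define DACR_CLIENT     1u   /* accesses are checked against the AP bits */
#define DACR_MANAGER    3u   /* accesses are not checked */

/* Return 'dacr' with the field for 'domain' replaced by 'mode'. */
static uint32_t dacr_set(uint32_t dacr, unsigned domain, uint32_t mode)
{
    assert(domain < 16u && mode <= 3u && mode != 2u);  /* 0b10 is reserved */
    unsigned shift = domain * 2u;
    return (dacr & ~(3u << shift)) | (mode << shift);
}
```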
 + 
 +== 6. Enable the MMU == 
 +  * Set M bit in SCTLR using MCR/MRC instructions. 
 + 
 +Once these steps are complete, the L1 caches and branch predictor can be turned on by setting the I, C, and Z bits in the SCTLR.
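
The ordering matters: the MMU is enabled first, and only then the caches and branch predictor. The sketch below expresses that as two pure helper functions (hypothetical names) that compute the successive SCTLR values; the assertion encodes the requirement that the data cache is only enabled with the MMU already on. Each value would be written to SCTLR with MCR, followed by a barrier.

```c
#include <assert.h>
#include <stdint.h>

#define SCTLR_M (1u << 0)   /* MMU enable */
#define SCTLR_C (1u << 2)   /* data and unified cache enable */
#define SCTLR_Z (1u << 11)  /* branch prediction enable */
#define SCTLR_I (1u << 12)  /* instruction cache enable */

/* Step 6: SCTLR value with the MMU switched on. */
static uint32_t sctlr_with_mmu(uint32_t sctlr)
{
    return sctlr | SCTLR_M;
}

/* Afterwards: SCTLR value with L1 caches and branch predictor on. */
static uint32_t sctlr_with_caches(uint32_t sctlr)
{
    assert(sctlr & SCTLR_M);   /* the MMU must already be enabled */
    return sctlr | SCTLR_I | SCTLR_C | SCTLR_Z;
}
```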
 + 
 + 
 + 
 +===== Programming the L2 Cache Controller ===== 
 +The L2C-310 cache controller is controlled using memory-mapped registers.
 +For the Cyclone V SoC the base address of these registers is 0xFFFEF000. 
 +The register descriptions can be found in the L2C-310 TRM, linked above. 
 + 
 +The following steps are taken to enable the L2 cache controller:​ 
 +  - Set the way size 
 +  - Set the read, write, and setup latencies for Tag RAM
 +  - Set the read, write, and setup latencies for Data RAM
 +  - Set the prefetching behaviour 
 +  - Invalidate the cache 
 +  - Enable the L2C-310 cache controller 
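
The steps above can be sketched roughly as follows. Register offsets and field positions are as given in the L2C-310 TRM; the function names, the latency values a caller would pass, and the 64 kB way size (consistent with a 512 kB, 8-way L2) are assumptions for this example and should be checked against the device documentation. The base pointer is a parameter so the register writes can be exercised against an ordinary array; on the Cyclone V it would be (volatile uint32_t *)0xFFFEF000.

```c
#include <assert.h>
#include <stdint.h>

/* L2C-310 register offsets from the controller base address. */
#define L2C_CONTROL           0x100u
#define L2C_AUX_CONTROL       0x104u
#define L2C_TAG_RAM_CONTROL   0x108u
#define L2C_DATA_RAM_CONTROL  0x10Cu
#define L2C_INV_WAY           0x77Cu
#define L2C_PREFETCH_CONTROL  0xF60u

#define L2C_WAY_SIZE_SHIFT    17
#define L2C_WAY_SIZE_MASK     (7u << L2C_WAY_SIZE_SHIFT)
#define L2C_WAY_SIZE_64KB     3u            /* encoding 0b011 */
#define L2C_PREFETCH_DATA     (1u << 28)
#define L2C_PREFETCH_INSTR    (1u << 29)

/* Latency register value; each field holds (cycles - 1). */
static uint32_t l2c_ram_latency(unsigned read, unsigned write, unsigned setup)
{
    assert(read >= 1 && read <= 8 && write >= 1 && write <= 8 &&
           setup >= 1 && setup <= 8);
    return ((write - 1u) << 8) | ((read - 1u) << 4) | (setup - 1u);
}

/* Steps 1-4: must run while the controller is still disabled. */
static void l2c310_configure(volatile uint32_t *base, uint32_t way_size,
                             uint32_t tag_lat, uint32_t data_lat,
                             uint32_t prefetch)
{
    uint32_t aux = base[L2C_AUX_CONTROL / 4];
    aux = (aux & ~L2C_WAY_SIZE_MASK) | (way_size << L2C_WAY_SIZE_SHIFT);
    base[L2C_AUX_CONTROL / 4] = aux;
    base[L2C_TAG_RAM_CONTROL / 4] = tag_lat;
    base[L2C_DATA_RAM_CONTROL / 4] = data_lat;
    base[L2C_PREFETCH_CONTROL / 4] |= prefetch;
}

/* Step 5: invalidate by way; hardware clears each bit when that way is done. */
static void l2c310_invalidate_all(volatile uint32_t *base, uint32_t way_mask)
{
    base[L2C_INV_WAY / 4] = way_mask;       /* e.g. 0xFF for 8 ways */
    while (base[L2C_INV_WAY / 4] & way_mask) {
        /* wait for completion */
    }
}

/* Step 6: set the enable bit in the control register. */
static void l2c310_enable(volatile uint32_t *base)
{
    base[L2C_CONTROL / 4] = 1u;
}
```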
 + 
 +The L2C-310 also includes event counting registers that can be used to monitor hit and miss rates, and events related to speculative reads and prefetching. 
 + 
 +Note: The I and C bits in the System Control Register (SCTLR) control caching at all levels. 
 +If the L2 cache is enabled, but the I and C bits are cleared, the processor cannot take advantage of the L2 cache. 
 + 
 + 
 + 
 +===== Memory Performance Optimizations ===== 
 +The following additional settings greatly enhance memory performance:​ 
 +  * Set memory region attributes in TTB entries to use inner write-back 
 +  * Use the minimum stable L2C-310 read, write, and setup latencies
 +  * Enable L1 Data-side prefetch 
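
On the Cortex-A9, L1 data-side prefetch is controlled by a bit in the Auxiliary Control Register (ACTLR, CP15 c1, c0, 1). A minimal sketch of the bit manipulation is shown below; the helper name is made up for this example, and the read and write of ACTLR itself (MRC/MCR) are left out so the snippet stays host-compilable.

```c
#include <assert.h>
#include <stdint.h>

/* Cortex-A9 ACTLR: bit 2 enables L1 data-side prefetch. */
#define ACTLR_L1_PREFETCH  (1u << 2)

/* Return the ACTLR value with L1 prefetch enabled, other bits untouched. */
static uint32_t actlr_enable_l1_prefetch(uint32_t actlr)
{
    uint32_t v = actlr | ACTLR_L1_PREFETCH;
    assert((v & ~ACTLR_L1_PREFETCH) == (actlr & ~ACTLR_L1_PREFETCH));
    return v;
}
```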
 + 
 + 
 + 
 +===== Special L2C-310 + Cortex-A9 MPCore Options ===== 
 +Several other options are available as well.
 +Both the Cortex-A9 TRM and the L2C-310 TRM (linked above) outline several optimizations for L2 memory accesses; see [[http://infocenter.arm.com/help/topic/com.arm.doc.ddi0246h/CJACBHHB.html|this section of the L2C-310 TRM]].
 + 
 +The following features are available when using the L2C-310 cache controller with a Cortex-A9 MPCore processor:​ 
 +  * Exclusive caching 
 +  * L2 prefetch hints 
 +  * Early BRESP 
 +  * Allow L2C-310 to write a full line of zeros 
 +  * L2 cache speculative linefill (requires Snoop Control Unit to be enabled) 
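
Most of these features are switched on by bits spread across three registers: the Cortex-A9 ACTLR, the L2C-310 Auxiliary Control Register, and the SCU Control Register. The sketch below collects the bit positions as understood from the TRMs and computes the resulting register values. The struct and function names are hypothetical, and the TRMs should be consulted for the required ordering (for example, when enabling full-line-of-zeros writes on both sides).

```c
#include <assert.h>
#include <stdint.h>

/* Cortex-A9 ACTLR bits. */
#define ACTLR_L2_PREFETCH_HINT   (1u << 1)
#define ACTLR_FULL_LINE_ZEROS    (1u << 3)
#define ACTLR_EXCL               (1u << 7)

/* L2C-310 Auxiliary Control Register bits. */
#define L2C_AUX_FULL_LINE_ZEROS  (1u << 0)
#define L2C_AUX_EXCLUSIVE        (1u << 12)
#define L2C_AUX_EARLY_BRESP      (1u << 30)

/* SCU Control Register bits. */
#define SCU_CTRL_ENABLE          (1u << 0)
#define SCU_CTRL_SPEC_LINEFILL   (1u << 3)

typedef struct {
    uint32_t actlr;     /* Cortex-A9 auxiliary control */
    uint32_t l2c_aux;   /* L2C-310 auxiliary control */
    uint32_t scu_ctrl;  /* SCU control */
} a9_l2_regs;

typedef struct {
    int exclusive, prefetch_hints, early_bresp, full_line_zeros, spec_linefill;
} a9_l2_features;

/* Return the register values with the requested features switched on. */
static a9_l2_regs a9_l2_apply(a9_l2_regs r, a9_l2_features f)
{
    if (f.exclusive) {          /* must be set on both sides */
        r.actlr   |= ACTLR_EXCL;
        r.l2c_aux |= L2C_AUX_EXCLUSIVE;
    }
    if (f.prefetch_hints)
        r.actlr |= ACTLR_L2_PREFETCH_HINT;
    if (f.early_bresp)
        r.l2c_aux |= L2C_AUX_EARLY_BRESP;
    if (f.full_line_zeros) {    /* also a two-sided setting */
        r.l2c_aux |= L2C_AUX_FULL_LINE_ZEROS;
        r.actlr   |= ACTLR_FULL_LINE_ZEROS;
    }
    if (f.spec_linefill) {
        assert(r.scu_ctrl & SCU_CTRL_ENABLE);   /* requires the SCU to be on */
        r.scu_ctrl |= SCU_CTRL_SPEC_LINEFILL;
    }
    return r;
}
```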
 + 
 +These features were all tested during benchmarking;​ however, none of them seemed to offer any performance gains. 
 +Further investigation may be required to make the best use of these options.
 + 
 + 
 +===== Other Possible Optimizations ===== 
 +There are other options available that may further boost memory performance. 
 +These options have not yet been investigated. 
 + 
 +  * L2 cache entry lockdown 
 +  * Cache replacement policy 
 +  * L2 cache preloading 
 + 
 +Of these options, L2 cache preloading may offer the greatest benefits. 
 + 
 + 
 +====== Source Code ====== 
 +This investigation resulted in two files, arm_cache.h and arm_cache.s, which together provide functions to turn on the MMU and caches on the ARM Cortex-A9 MPCore in the Altera Cyclone V SoC.
 + 
 +   * [[https://​www.dropbox.com/​s/​l2mzqzfq3089emj/​arm_cache.h | arm_cache.h]] 
 +   * [[https://​www.dropbox.com/​s/​tq6y2yod3p26yui/​arm_cache.s | arm_cache.s]]
  
  
using_arm_caches.1401579390.txt.gz · Last modified: 2014/05/31 19:36 by bain