using_arm_caches

Differences

This shows you the differences between two versions of the page.

using_arm_caches [2014/05/31 20:39]
bain
using_arm_caches [2014/06/11 12:40] (current)
bain
Line 1: Line 1:
 ====== Using ARM Caches ====== ====== Using ARM Caches ======
 +This page describes how to set up the MMU, L1 caches, and L2 cache on the Cortex-A9 MPCore processor found in the Cyclone V.
 +
  
-This page describes how to set up caching on the Cortex-A9 MPCore processor found in the Cyclone V. 
  
 ===== Introduction ===== ===== Introduction =====
 The following documents are useful references: The following documents are useful references:
  
-  * ARMv7-A Architecture Reference Manual [[http://​infocenter.arm.com/​help/​index.jsp?​topic=/​com.arm.doc.ddi0406c/​index.html]] +  * ARMv7-A Architecture Reference Manual [[http://​infocenter.arm.com/​help/​index.jsp?​topic=/​com.arm.doc.ddi0406c/​index.html|ARMv7 ARM]] 
-  * ARM Cortex-A9 Technical Reference Manual [[http://​infocenter.arm.com/​help/​topic/​com.arm.doc.ddi0388i/​index.html]] +  * ARM Cortex-A9 Technical Reference Manual [[http://​infocenter.arm.com/​help/​topic/​com.arm.doc.ddi0388i/​index.html|Cortex-A9 TRM]] 
-  * ARM Cortex-A9 MPCore Technical Reference Manual [[http://​infocenter.arm.com/​help/​topic/​com.arm.doc.ddi0407i/​index.html]] +  * ARM Cortex-A9 MPCore Technical Reference Manual [[http://​infocenter.arm.com/​help/​topic/​com.arm.doc.ddi0407i/​index.html|Cortex-A9 MPCore TRM]] 
-  * Level 2 Cache Controller (L2C-310) Technical Reference Manual [[http://​infocenter.arm.com/​help/​index.jsp?topic=/​com.arm.doc.ddi0246e/​index.html]]+  * Level 2 Cache Controller (L2C-310) Technical Reference Manual [[http://​infocenter.arm.com/​help/​topic/​com.arm.doc.ddi0246h/index.html|L2C-310 TRM]]
  
 The ARM processor in the Cyclone V has both L1 and L2 caches. The ARM processor in the Cyclone V has both L1 and L2 caches.
Line 21: Line 22:
 To see how the processor performs in various configurations,​ see the benchmarking results at [[arm_chstone_benchmark_results|ARM Benchmark Results]]. To see how the processor performs in various configurations,​ see the benchmarking results at [[arm_chstone_benchmark_results|ARM Benchmark Results]].
  
-=====  Enabling the MMU ===== 
-The MMU serves to translate virtual addresses used by the processor into physical addresses that correspond to actual memory locations. 
-It also controls the caching behaviour of different sections of the memory space. 
  
 +
 +=====  Enabling the MMU =====
 +The MMU translates virtual addresses used by the processor into physical addresses that correspond to actual memory locations.
 +It also controls the caching behaviour of, and access to, different sections of the memory space.
  
 Several steps are involved when turning on the MMU: Several steps are involved when turning on the MMU:
-  * Disable caches and branch predictor +== 1. Disable caches and branch predictor ​== 
-  * Invalidate instruction,​ data, and unified TLBsL1 instruction and data caches, and branch predictor array +  * Clear I, C, and Z bits in SCTLR using MRC/MCR instructions. 
-  ​* ​Set up translation table entries + 
-  * Set translation table control registers +== 2. Invalidate Everything == 
-  * Set domain access control register +It is important to invalidate the caches, TLBs, and branch predictor because once the MMU is turned on and address translation begins, their existing entries will no longer be valid.
-  * Enable the MMU+ 
 +The following steps should be taken: 
 +  * Invalidate instruction,​ data, and unified TLBs
 +  * Invalidate ​L1 instruction and data caches
 +  * Invalidate ​branch predictor array. 
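
The L1 data cache is invalidated one line at a time by set and way. The following is a minimal sketch of how the operand for each line could be built, not the code from arm_cache.s. It assumes a 32 KB, 4-way L1 data cache with 32-byte lines (the geometry is an assumption here and can be confirmed by reading CCSIDR), and the helper names are hypothetical. The TLBs, instruction cache, and branch predictor each need only a single CP15 write (TLBIALL, ICIALLU, and BPIALL in the ARMv7 ARM).

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Assumed L1 data cache geometry: 32 KB, 4 ways, 32-byte lines. */
enum {
    L1D_WAYS       = 4,
    L1D_LINE_BYTES = 32,
    L1D_SETS       = (32 * 1024) / (L1D_WAYS * L1D_LINE_BYTES)  /* 256 */
};

/* log2 of an exact power of two. */
static unsigned log2_pow2(uint32_t v)
{
    unsigned n = 0;
    while (v > 1u) {
        v >>= 1;
        n++;
    }
    return n;
}

/* Build the set/way operand for a level-1 cache line.
 * The way goes in the top log2(ways) bits, the set sits just above
 * the line-offset bits, and the level field [3:1] is 0 for L1. */
static uint32_t dcisw_operand(unsigned way, unsigned set)
{
    assert(way < L1D_WAYS && set < L1D_SETS);
    return ((uint32_t)way << (32u - log2_pow2(L1D_WAYS))) |
           ((uint32_t)set << log2_pow2(L1D_LINE_BYTES));
}

/* Call op once per cache line. On the target, op would issue
 * "MCR p15, 0, <Rt>, c7, c6, 2" (DCISW) with the operand.
 * Returns the number of lines visited. */
static unsigned l1d_for_each_line(void (*op)(uint32_t operand))
{
    unsigned count = 0;
    for (unsigned way = 0; way < L1D_WAYS; way++) {
        for (unsigned set = 0; set < L1D_SETS; set++) {
            if (op != NULL)
                op(dcisw_operand(way, set));
            count++;
        }
    }
    return count;
}
```

Keeping the operand calculation separate from the CP15 write makes it possible to check the bit layout on a host machine before running on the board.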
 + 
 +== 3. Set up translation table entries ​== 
 +The Cortex-A9 MPCore processor allows for two levels of translation tables. 
 +For simplicity only Level 1 translation tables are used. 
 +A flat one-to-one mapping is used where virtual addresses are mapped to the same physical address. 
 +This is done using 1MB '​sections'​. 
 +Since the address space is 4GB, this requires 4096 translation table entries. 
 +The L1 translation table (4096 entries of 4 bytes each, 16kB in total) must be 16kB-aligned in memory.
 + 
 +  * Use 1MB sections 
 +  * Set TEX[2:0], C, and B bits to use outer and inner write-back for normal memory (SDRAM) 
 +  * Set TEX[2:0], C, and B bits to use non-shareable device memory for memory-mapped peripherals 
 +  * Set AP[1:0] bits to allow read and write access 
 +  * Set the domain to 0 (any of the 16 domains works, as long as the DACR setting in step 5 matches)
 +  * Set the section base address of each table entry to point to the appropriate 1MB section of physical memory 
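
The bullets above can be sketched in C as follows. This is an illustration under stated assumptions rather than the contents of arm_cache.s: the function and macro names are hypothetical, write-back memory is encoded as TEX=001, C=1, B=1 (write-back, write-allocate), and the caller decides how many 1MB sections from address 0 are treated as normal memory. Everything above that boundary is mapped as non-shareable device memory.

```c
#include <assert.h>
#include <stdalign.h>
#include <stdint.h>

/* Short-descriptor section entry fields (ARMv7-A). */
#define SECTION_TYPE       (1u << 1)              /* bits [1:0] = 0b10 */
#define SECTION_B          (1u << 2)
#define SECTION_C          (1u << 3)
#define SECTION_DOMAIN(d)  ((uint32_t)(d) << 5)   /* bits [8:5] */
#define SECTION_AP_RW      (3u << 10)             /* AP[1:0] = 0b11, AP[2] = 0 */
#define SECTION_TEX(t)     ((uint32_t)(t) << 12)  /* bits [14:12] */

/* Normal memory, inner and outer write-back, write-allocate. */
#define ATTR_NORMAL_WB     (SECTION_TEX(1) | SECTION_C | SECTION_B)
/* Non-shareable device memory: TEX=010, C=0, B=0. */
#define ATTR_DEVICE_NS     (SECTION_TEX(2))

#define NUM_SECTIONS       4096u                  /* 4GB / 1MB */

/* 4096 entries * 4 bytes = 16kB, aligned to 16kB. */
static alignas(16384) uint32_t l1_table[NUM_SECTIONS];

/* One flat-mapped entry: virtual section mb_index -> same physical section. */
static uint32_t section_entry(uint32_t mb_index, uint32_t attrs, unsigned domain)
{
    assert(mb_index < NUM_SECTIONS && domain < 16u);
    return (mb_index << 20) | attrs | SECTION_DOMAIN(domain) |
           SECTION_AP_RW | SECTION_TYPE;
}

/* Fill the table: the first normal_sections MB are cacheable normal
 * memory, the rest is device memory. The split is the caller's choice. */
static void build_flat_table(uint32_t *table, uint32_t normal_sections)
{
    for (uint32_t i = 0; i < NUM_SECTIONS; i++) {
        uint32_t attrs = (i < normal_sections) ? ATTR_NORMAL_WB : ATTR_DEVICE_NS;
        table[i] = section_entry(i, attrs, 0);
    }
}
```

For example, `build_flat_table(l1_table, 3072)` would treat the lowest 3GB as normal memory. That split is only an assumed example and should be checked against the Cyclone V memory map for the design in use.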
 + 
 +== 4. Set translation table control registers ​== 
 +  * Set Translation Table Base Control Register (TTBCR) to 0 so that TTBR0 is used 
 +  * Set TTBR0 to point to the L1 translation table, and to use inner and outer write-back, write-allocate cacheable memory for translation table walks 
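
As a sketch of the TTBR0 value described above, assuming the Multiprocessing Extensions layout of TTBR0 (IRGN split across bits 6 and 0, RGN in bits [4:3]) and hypothetical names:

```c
#include <assert.h>
#include <stdint.h>

#define TTBCR_TTBR0_ONLY  0u         /* N = 0: every address is walked via TTBR0 */
#define TTBR_IRGN_WBWA    (1u << 6)  /* IRGN = 0b01; IRGN[0] is bit 6, IRGN[1] is bit 0 */
#define TTBR_RGN_WBWA     (1u << 3)  /* RGN  = 0b01 in bits [4:3] */

/* Combine the 16kB-aligned table address with inner and outer
 * write-back, write-allocate attributes for table walks. */
static uint32_t ttbr0_value(uint32_t table_phys_addr)
{
    assert((table_phys_addr & 0x3FFFu) == 0u);
    return table_phys_addr | TTBR_IRGN_WBWA | TTBR_RGN_WBWA;
}
```

On the target, the results would be written with `MCR p15, 0, <Rt>, c2, c0, 2` (TTBCR) and `MCR p15, 0, <Rt>, c2, c0, 0` (TTBR0).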
 + 
 +== 5. Set domain access control register ​== 
 +  * Set DACR to client or manager mode for the domain(s) you used in the translation table entries.
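
The DACR holds a 2-bit field for each of the 16 domains. A minimal sketch of updating one field, with hypothetical names:

```c
#include <assert.h>
#include <stdint.h>

/* 2-bit access field values defined by the ARMv7-A architecture. */
enum dacr_mode {
    DACR_NO_ACCESS = 0,  /* every access faults */
    DACR_CLIENT    = 1,  /* AP bits in the table entries are checked */
    DACR_MANAGER   = 3   /* AP bits are ignored */
};

/* Return dacr with the field for one domain replaced. */
static uint32_t dacr_set(uint32_t dacr, unsigned domain, enum dacr_mode mode)
{
    assert(domain < 16u);
    unsigned shift = domain * 2u;
    return (dacr & ~(3u << shift)) | ((uint32_t)mode << shift);
}
```

With domain 0 used in every table entry, `dacr_set(0, 0, DACR_CLIENT)` gives the value to write with `MCR p15, 0, <Rt>, c3, c0, 0`. Client mode is the choice that makes the AP[1:0] bits set in step 3 take effect.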
 + 
 +== 6. Enable the MMU == 
 +  * Set M bit in SCTLR using MCR/MRC instructions. 
 + 
 +Once these steps are complete, the L1 caches and branch predictor can be turned on by setting the I, C, and Z bits in the SCTLR.
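
The SCTLR bit manipulation used in steps 1 and 6, and in the final cache enable, can be sketched as below. The helper names are hypothetical. On the target, the register is read with `MRC p15, 0, <Rt>, c1, c0, 0` and written back with the matching MCR.

```c
#include <assert.h>
#include <stdint.h>

#define SCTLR_M  (1u << 0)   /* MMU enable */
#define SCTLR_C  (1u << 2)   /* data and unified cache enable */
#define SCTLR_Z  (1u << 11)  /* branch prediction enable */
#define SCTLR_I  (1u << 12)  /* instruction cache enable */

/* Step 1: caches and branch predictor off. */
static uint32_t sctlr_caches_off(uint32_t sctlr)
{
    return sctlr & ~(SCTLR_I | SCTLR_C | SCTLR_Z);
}

/* Step 6: MMU on. */
static uint32_t sctlr_mmu_on(uint32_t sctlr)
{
    return sctlr | SCTLR_M;
}

/* Final step: caches and branch predictor on, only after the MMU. */
static uint32_t sctlr_caches_on(uint32_t sctlr)
{
    assert(sctlr & SCTLR_M);  /* follows the ordering given in the text */
    return sctlr | SCTLR_I | SCTLR_C | SCTLR_Z;
}
```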
  
  
Line 38: Line 71:
 ===== Programming the L2 Cache Controller ===== ===== Programming the L2 Cache Controller =====
 The L2C-310 cache controller is controlled using memory mapped registers. The L2C-310 cache controller is controlled using memory mapped registers.
-For the Cyclone V SoC the base address for these registers is 0xFFFEF000. +For the Cyclone V SoC, the base address of these registers is 0xFFFEF000.
 +The register descriptions can be found in the L2C-310 TRM, linked above.
  
-The following steps are to be taken to enable the L2 cache controller:​ +The following steps are taken to enable the L2 cache controller:​ 
-  ​Set the Way size +  ​Set the way size 
-  ​Set the read, write, and hold delays for Tag RAM +  ​Set the read, write, and hold delays for Tag RAM 
-  ​Set the read, write, and hold delays for Data RAM +  ​Set the read, write, and hold delays for Data RAM 
-  ​Set the prefetching behaviour +  ​Set the prefetching behaviour 
-  ​Invalidate the cache +  ​Invalidate the cache 
-  ​Enable the L2C-310 cache controller+  ​Enable the L2C-310 cache controller
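
The sequence above could look roughly like the following sketch. The register offsets and field positions are taken from the L2C-310 TRM register summary as best understood here and should be checked against it. The structure, function names, and any latency values passed in are hypothetical. The lowest latency field (bits [2:0]) is called setup latency in the TRM, and each field stores the cycle count minus one. The register block is passed in as a pointer so the configuration logic can be exercised on a host against a plain array.

```c
#include <assert.h>
#include <stdint.h>

#define L2C310_BASE  0xFFFEF000u  /* Cyclone V base address given in the text */

/* Byte offsets from the controller base. */
enum {
    L2C_CONTROL      = 0x100,
    L2C_AUX_CONTROL  = 0x104,
    L2C_TAG_LATENCY  = 0x108,
    L2C_DATA_LATENCY = 0x10C,
    L2C_INV_WAY      = 0x77C,
    L2C_PREFETCH     = 0xF60
};

struct l2c_config {
    unsigned way_size_code;                      /* Aux Control [19:17] */
    unsigned tag_setup, tag_read, tag_write;     /* cycles, 1..8 */
    unsigned data_setup, data_read, data_write;  /* cycles, 1..8 */
    int prefetch_data, prefetch_instr;           /* non-zero to enable */
};

/* Encode a latency register: write [10:8], read [6:4], setup [2:0]. */
static uint32_t l2c_latency(unsigned setup, unsigned read, unsigned write)
{
    assert(setup >= 1 && setup <= 8 && read >= 1 && read <= 8 &&
           write >= 1 && write <= 8);
    return ((write - 1u) << 8) | ((read - 1u) << 4) | (setup - 1u);
}

/* Program way size, RAM latencies, and prefetching with the cache disabled. */
static void l2c_configure(volatile uint32_t *regs, const struct l2c_config *cfg)
{
    regs[L2C_CONTROL / 4] = 0u;

    uint32_t aux = regs[L2C_AUX_CONTROL / 4] & ~(7u << 17);
    regs[L2C_AUX_CONTROL / 4] = aux | ((uint32_t)(cfg->way_size_code & 7u) << 17);

    regs[L2C_TAG_LATENCY / 4] =
        l2c_latency(cfg->tag_setup, cfg->tag_read, cfg->tag_write);
    regs[L2C_DATA_LATENCY / 4] =
        l2c_latency(cfg->data_setup, cfg->data_read, cfg->data_write);

    uint32_t pf = regs[L2C_PREFETCH / 4] & ~(3u << 28);
    if (cfg->prefetch_data)
        pf |= 1u << 28;
    if (cfg->prefetch_instr)
        pf |= 1u << 29;
    regs[L2C_PREFETCH / 4] = pf;
}

/* Invalidate all ways, wait for the hardware to clear the bits, then enable.
 * Only meaningful on real hardware: a plain array never clears the bits. */
static void l2c_invalidate_and_enable(volatile uint32_t *regs, uint32_t way_mask)
{
    regs[L2C_INV_WAY / 4] = way_mask;
    while (regs[L2C_INV_WAY / 4] & way_mask) {
        /* busy-wait until every way has been invalidated */
    }
    regs[L2C_CONTROL / 4] = 1u;
}
```

On the board, `regs` would be `(volatile uint32_t *)L2C310_BASE` and `way_mask` would be 0xFF for an 8-way configuration. A `way_size_code` of 3 selects 64 KB ways, which matches a 512 KB, 8-way cache; that geometry is an assumption about the Cyclone V and is worth confirming.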
  
 +The L2C-310 also includes event counting registers that can be used to monitor hit and miss rates, and events related to speculative reads and prefetching.
 +
 +Note: The I and C bits in the System Control Register (SCTLR) control caching at all levels.
 +If the L2 cache is enabled, but the I and C bits are cleared, the processor cannot take advantage of the L2 cache.
  
  
  
 ===== Memory Performance Optimizations ===== ===== Memory Performance Optimizations =====
-The following settings ​can greatly enhance memory performance:​ +The following ​additional ​settings greatly enhance memory performance:​
-  * Enable L1 Data-side prefetch+
   * Set memory region attributes in TTB entries to use inner write-back   * Set memory region attributes in TTB entries to use inner write-back
   * Use minimum stable L2C-310 read, write, and hold delays   * Use minimum stable L2C-310 read, write, and hold delays
 +  * Enable L1 Data-side prefetch
  
  
-===== Special L2C-310 Options =====+ 
 +===== Special L2C-310 ​+ Cortex-A9 MPCore ​Options =====
 Several other options are available as well. Several other options are available as well.
 +Both the Cortex-A9 TRM and L2C-310 TRM (linked above) outline several optimizations for L2 memory accesses.
 +[[http://​infocenter.arm.com/​help/​topic/​com.arm.doc.ddi0246h/​CJACBHHB.html|Link]].
  
 The following features are available when using the L2C-310 cache controller with a Cortex-A9 MPCore processor: The following features are available when using the L2C-310 cache controller with a Cortex-A9 MPCore processor:
Line 66: Line 107:
   * Early BRESP   * Early BRESP
   * Allow L2C-310 to write a full line of zeros   * Allow L2C-310 to write a full line of zeros
-  * L2 cache speculative linefill (requires Snoop Control Unit)+  * L2 cache speculative linefill (requires Snoop Control Unit to be enabled)
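
For reference, the three options listed above map to control bits roughly as in this sketch. The bit positions are recalled from the L2C-310 TRM (Auxiliary Control Register), the Cortex-A9 TRM (ACTLR), and the Cortex-A9 MPCore TRM (SCU Control Register) and should be verified there. The function names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

#define L2C_AUX_FULL_LINE_OF_ZERO  (1u << 0)   /* L2C-310 Auxiliary Control */
#define L2C_AUX_EARLY_BRESP        (1u << 30)  /* L2C-310 Auxiliary Control */
#define ACTLR_FULL_LINE_OF_ZEROS   (1u << 3)   /* Cortex-A9 ACTLR */
#define SCU_CTRL_ENABLE            (1u << 0)   /* SCU Control Register */
#define SCU_CTRL_SPEC_LINEFILL     (1u << 3)   /* SCU Control Register */

/* L2C-310 side: early BRESP and acceptance of full-line-of-zero writes. */
static uint32_t l2c_aux_enable_options(uint32_t aux)
{
    return aux | L2C_AUX_EARLY_BRESP | L2C_AUX_FULL_LINE_OF_ZERO;
}

/* Processor side of the full-line-of-zeros feature. */
static uint32_t actlr_enable_zero_lines(uint32_t actlr)
{
    return actlr | ACTLR_FULL_LINE_OF_ZEROS;
}

/* Speculative linefills are requested through the SCU, which must be on. */
static uint32_t scu_enable_spec_linefill(uint32_t scu_ctrl)
{
    assert(scu_ctrl & SCU_CTRL_ENABLE);
    return scu_ctrl | SCU_CTRL_SPEC_LINEFILL;
}
```

The Auxiliary Control Register is written while the L2C-310 is disabled. For the full-line-of-zeros feature, the TRMs describe enabling the L2C-310 side before the processor side.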
  
 These features were all tested during benchmarking;​ however, none of them seemed to offer any performance gains. These features were all tested during benchmarking;​ however, none of them seemed to offer any performance gains.
-Further investigation may prove that +Further investigation may be required to make the best use of these options.
  
 ===== Other Possible Optimizations ===== ===== Other Possible Optimizations =====
Line 82: Line 124:
  
  
 +====== Source Code ======
 +This investigation resulted in two files, arm_cache.h and arm_cache.s, that together provide functions to turn on the MMU and caches on the ARM Cortex-A9 MPCore in the Altera Cyclone V SoC.
 +
 +   * [[https://​www.dropbox.com/​s/​l2mzqzfq3089emj/​arm_cache.h | arm_cache.h]]
 +   * [[https://​www.dropbox.com/​s/​tq6y2yod3p26yui/​arm_cache.s | arm_cache.s]]
  
  
using_arm_caches.1401583165.txt.gz · Last modified: 2014/05/31 20:39 by bain