using_arm_caches

Differences

This shows you the differences between two versions of the page.

using_arm_caches [2014/05/31 20:51]
bain
using_arm_caches [2014/06/11 12:40] (current)
bain
Line 7: Line 7:
 The following documents are useful references: The following documents are useful references:
  
-  * ARMv7-A Architecture Reference Manual [[http://infocenter.arm.com/help/index.jsp?topic=/com.arm.doc.ddi0406c/index.html]] +  * ARMv7-A Architecture Reference Manual [[http://infocenter.arm.com/help/index.jsp?topic=/com.arm.doc.ddi0406c/index.html|ARMv7 ARM]] 
-  * ARM Cortex-A9 Technical Reference Manual [[http://infocenter.arm.com/help/topic/com.arm.doc.ddi0388i/index.html]] +  * ARM Cortex-A9 Technical Reference Manual [[http://infocenter.arm.com/help/topic/com.arm.doc.ddi0388i/index.html|Cortex-A9 TRM]] 
-  * ARM Cortex-A9 MPCore Technical Reference Manual [[http://infocenter.arm.com/help/topic/com.arm.doc.ddi0407i/index.html]] +  * ARM Cortex-A9 MPCore Technical Reference Manual [[http://infocenter.arm.com/help/topic/com.arm.doc.ddi0407i/index.html|Cortex-A9 MPCore TRM]] 
-  * Level 2 Cache Controller (L2C-310) Technical Reference Manual [[http://infocenter.arm.com/help/index.jsp?topic=/com.arm.doc.ddi0246e/index.html]] +  * Level 2 Cache Controller (L2C-310) Technical Reference Manual [[http://infocenter.arm.com/help/topic/com.arm.doc.ddi0246h/index.html|L2C-310 TRM]]
  
 The ARM processor in the Cyclone V has both L1 and L2 caches. The ARM processor in the Cyclone V has both L1 and L2 caches.
Line 25: Line 25:
  
 =====  Enabling the MMU ===== =====  Enabling the MMU =====
-The MMU serves to translate ​virtual addresses used by the processor into physical addresses that correspond to actual memory locations. +The MMU translates ​virtual addresses used by the processor into physical addresses that correspond to actual memory locations. 
-It also controls the caching behaviour of different sections of the memory space.+It also controls the caching behaviour of and access to different sections of the memory space.
  
 Several steps are involved when turning on the MMU: Several steps are involved when turning on the MMU:
-  * Disable caches and branch predictor +== 1. Disable caches and branch predictor ​== 
-  * Invalidate instruction, data, and unified TLBs, L1 instruction and data caches, and branch predictor array +  * Clear I, C, and Z bits in SCTLR using MRC/MCR instructions. 
-  ​* ​Set up translation table entries + 
-  * Set translation table control registers +== 2. Invalidate Everything == 
-  * Set domain access control register +Invalidate the caches, TLBs, and branch predictor before turning on the MMU: once address translation begins, any entries they still hold are no longer valid. 
-  * Enable the MMU, L1 caches, and branch predictor + 
 +The following steps should be taken: 
 +  * Invalidate instruction,​ data, and unified TLBs
 +  * Invalidate ​L1 instruction and data caches
 +  * Invalidate ​branch predictor array. 
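The invalidation step can be sketched in C as follows. This is a minimal illustration, not the code from arm_cache.s. It assumes the Cyclone V configuration of the Cortex-A9 L1 data cache (32 KB, 4-way, 32-byte lines); the names `dcisw_operand`, `invalidate_l1_dcache`, and `record_op` are hypothetical. The set/way operand computation runs on any host, while the actual CP15 writes appear only as comments because they need privileged ARM code.

```c
#include <assert.h>
#include <stdint.h>

/* Assumed L1 data cache geometry: 32 KB, 4 ways, 32-byte lines. */
#define L1D_WAYS      4u
#define L1D_SETS      256u  /* 32 KB / (4 ways * 32 bytes) */
#define L1D_WAY_SHIFT 30u   /* 32 - log2(ways) */
#define L1D_SET_SHIFT 5u    /* log2(line size in bytes) */

/* Operand for DCISW (invalidate data cache line by set/way).
 * The cache level field in bits [3:1] stays 0 to select L1. */
static uint32_t dcisw_operand(uint32_t way, uint32_t set)
{
    assert(way < L1D_WAYS && set < L1D_SETS);
    return (way << L1D_WAY_SHIFT) | (set << L1D_SET_SHIFT);
}

/* Callback that performs one DCISW write. On the target it would wrap
 *   mcr p15, 0, <Rt>, c7, c6, 2 */
typedef void (*cp15_write_fn)(uint32_t operand);

static void invalidate_l1_dcache(cp15_write_fn dcisw)
{
    for (uint32_t way = 0; way < L1D_WAYS; way++)
        for (uint32_t set = 0; set < L1D_SETS; set++)
            dcisw(dcisw_operand(way, set));
}

/* The remaining invalidations are single CP15 writes (register value ignored):
 *   mcr p15, 0, r0, c8, c7, 0   ; TLBIALL: invalidate unified TLB
 *   mcr p15, 0, r0, c7, c5, 0   ; ICIALLU: invalidate instruction cache
 *   mcr p15, 0, r0, c7, c5, 6   ; BPIALL:  invalidate branch predictor */

/* Host-side stand-in for the DCISW write, used only to check the loop. */
static unsigned ops_issued;
static uint32_t last_operand;
static void record_op(uint32_t operand)
{
    ops_issued++;
    last_operand = operand;
}
```

Passing the CP15 write as a callback keeps the loop testable off-target; on the board the callback would be a one-line inline assembly wrapper.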
 + 
 +== 3. Set up translation table entries ​== 
 +The Cortex-A9 MPCore processor allows for two levels of translation tables. 
 +For simplicity only Level 1 translation tables are used. 
 +A flat one-to-one mapping is used where virtual addresses are mapped to the same physical address. 
 +This is done using 1MB '​sections'​. 
 +Since the address space is 4GB, this requires 4096 translation table entries. 
 +The L1 translation table must be 16kB aligned in memory. 
 + 
 +  * Use 1MB sections 
 +  * Set TEX[2:0], C, and B bits to use outer and inner write-back for normal memory (SDRAM) 
 +  * Set TEX[2:0], C, and B bits to use non-shareable device memory for memory-mapped peripherals 
 +  * Set AP[1:0] bits to allow read and write access 
 +  * Set the domain to 0 (any domain works, as long as it matches the DACR setting in step 5) 
 +  * Set the section base address of each table entry to point to the appropriate 1MB section of physical memory 
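A flat table built from these settings might look like the sketch below. It is an illustration under stated assumptions rather than the contents of arm_cache.s: the names (`build_flat_table`, `make_section`, `l1_table`) are hypothetical, and the split between normal memory and device memory is expressed as a single parameter `normal_mb` (the number of 1MB sections, starting at address 0, treated as SDRAM). The bit positions follow the ARMv7-A short-descriptor section format.

```c
#include <assert.h>
#include <stdint.h>

#define L1_ENTRIES     4096u   /* 4GB address space / 1MB sections */
#define SECTION_SHIFT  20u

#define DESC_SECTION   (1u << 1)                      /* bits [1:0] = 0b10 */
#define DESC_B         (1u << 2)
#define DESC_C         (1u << 3)
#define DESC_DOMAIN(d) (((uint32_t)(d) & 0xFu) << 5)  /* bits [8:5] */
#define DESC_AP_RW     (3u << 10)                     /* AP[1:0] = 0b11, AP[2] = 0 */
#define DESC_TEX(t)    (((uint32_t)(t) & 0x7u) << 12) /* bits [14:12] */

/* TEX=000, C=1, B=1: outer and inner write-back normal memory. */
#define ATTR_NORMAL_WB (DESC_TEX(0) | DESC_C | DESC_B)
/* TEX=010, C=0, B=0: non-shareable device memory. */
#define ATTR_DEVICE_NS (DESC_TEX(2))

/* 4096 entries * 4 bytes = 16kB, aligned to 16kB as required for TTBR0. */
static _Alignas(16384) uint32_t l1_table[L1_ENTRIES];

static uint32_t make_section(uint32_t mb_index, uint32_t attr, unsigned domain)
{
    return (mb_index << SECTION_SHIFT) | attr | DESC_AP_RW |
           DESC_DOMAIN(domain) | DESC_SECTION;
}

/* Flat map: virtual section i points at physical section i. */
static void build_flat_table(uint32_t *table, uint32_t normal_mb, unsigned domain)
{
    assert(normal_mb <= L1_ENTRIES);
    for (uint32_t i = 0; i < L1_ENTRIES; i++) {
        uint32_t attr = (i < normal_mb) ? ATTR_NORMAL_WB : ATTR_DEVICE_NS;
        table[i] = make_section(i, attr, domain);
    }
}
```

For example, `build_flat_table(l1_table, 1024, 0)` marks the first 1GB as cacheable normal memory and everything above it as device memory; the right boundary depends on how much SDRAM the board has.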
 + 
 +== 4. Set translation table control registers ​== 
 +  * Set Translation Table Base Control Register (TTBCR) to 0 so that TTBR0 is used 
 +  * Set TTBR0 to point to the L1 translation table, and to use inner and outer write-back, write-allocate cacheable memory for translation table walks 
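The TTBR0 value can be computed as in the following sketch. The helper name `make_ttbr0` is hypothetical, and the cacheability encoding assumes the TTBR0 format used with the ARMv7 Multiprocessing Extensions (which the Cortex-A9 MPCore implements), where IRGN[0] sits in bit 6 and IRGN[1] in bit 0.

```c
#include <assert.h>
#include <stdint.h>

#define TTBCR_USE_TTBR0 0u         /* N = 0: TTBR0 covers the full 4GB */
#define TTBR_IRGN_WBWA  (1u << 6)  /* IRGN = 0b01: inner write-back, write-allocate */
#define TTBR_RGN_WBWA   (1u << 3)  /* RGN  = 0b01: outer write-back, write-allocate */

/* Combine the 16kB-aligned table address with the table-walk attributes. */
static uint32_t make_ttbr0(uint32_t table_phys)
{
    assert((table_phys & 0x3FFFu) == 0u);
    return table_phys | TTBR_IRGN_WBWA | TTBR_RGN_WBWA;
}

/* On the target the two values would be written with:
 *   mcr p15, 0, <Rt>, c2, c0, 2   ; TTBCR = TTBCR_USE_TTBR0
 *   mcr p15, 0, <Rt>, c2, c0, 0   ; TTBR0 = make_ttbr0(table address) */
```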
 + 
 +== 5. Set domain access control register ​== 
 +  * Set DACR to client or manager mode for the domain(s) you used in the translation table entries. 
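The DACR holds a 2-bit field per domain, so the value for step 5 can be sketched as below. The helper `dacr_set` is a hypothetical name. In client mode (0b01) the AP bits of the translation table entry are checked; in manager mode (0b11) they are ignored.

```c
#include <assert.h>
#include <stdint.h>

#define DACR_NO_ACCESS 0u  /* 0b00: any access faults */
#define DACR_CLIENT    1u  /* 0b01: access checked against the AP bits */
#define DACR_MANAGER   3u  /* 0b11: access never checked */

/* Return dacr with the 2-bit field for `domain` replaced by `mode`. */
static uint32_t dacr_set(uint32_t dacr, unsigned domain, uint32_t mode)
{
    assert(domain < 16u);
    unsigned shift = domain * 2u;
    return (dacr & ~(3u << shift)) | ((mode & 3u) << shift);
}

/* On the target the result would be written with:
 *   mcr p15, 0, <Rt>, c3, c0, 0   ; DACR */
```

With every section assigned to domain 0, `dacr_set(0, 0, DACR_CLIENT)` is enough; the other domains stay at no access.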
 + 
 +== 6. Enable the MMU == 
 +  * Set M bit in SCTLR using MCR/MRC instructions. 
 + 
 +Once these steps are complete, the L1 caches and branch predictor can be turned on by setting the I, C, and Z bits in the SCTLR.
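The SCTLR updates in steps 1 and 6 are read-modify-write operations on four bits. The sketch below shows only the bit manipulation, which runs on any host; the function names are hypothetical, and the MRC/MCR instructions that move the value in and out of the register are given as comments.

```c
#include <stdint.h>

#define SCTLR_M (1u << 0)   /* MMU enable */
#define SCTLR_C (1u << 2)   /* data and unified cache enable */
#define SCTLR_Z (1u << 11)  /* branch prediction enable */
#define SCTLR_I (1u << 12)  /* instruction cache enable */

/* Step 1: clear I, C, and Z. */
static uint32_t sctlr_disable_caches(uint32_t sctlr)
{
    return sctlr & ~(SCTLR_I | SCTLR_C | SCTLR_Z);
}

/* Step 6: set M. */
static uint32_t sctlr_enable_mmu(uint32_t sctlr)
{
    return sctlr | SCTLR_M;
}

/* After step 6: set I, C, and Z. */
static uint32_t sctlr_enable_caches(uint32_t sctlr)
{
    return sctlr | SCTLR_I | SCTLR_C | SCTLR_Z;
}

/* On the target each update is wrapped as:
 *   mrc p15, 0, <Rt>, c1, c0, 0   ; read SCTLR
 *   ...modify <Rt>...
 *   mcr p15, 0, <Rt>, c1, c0, 0   ; write SCTLR
 *   isb                           ; make the change visible to later instructions */
```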
  
  
Line 40: Line 71:
 ===== Programming the L2 Cache Controller ===== ===== Programming the L2 Cache Controller =====
 The L2C-310 cache controller is controlled using memory mapped registers. The L2C-310 cache controller is controlled using memory mapped registers.
-For the Cyclone V SoC the base address ​for these registers is 0xFFFEF000.+For the Cyclone V SoC the base address ​of these registers is 0xFFFEF000.
 The register descriptions can be found in the L2C-310 TRM, linked above. The register descriptions can be found in the L2C-310 TRM, linked above.
  
-The following steps are to be taken to enable the L2 cache controller:​ +The following steps are taken to enable the L2 cache controller:​ 
-  ​Set the way size +  ​Set the way size 
-  ​Set the read, write, and hold delays for Tag RAM +  ​Set the read, write, and hold delays for Tag RAM 
-  ​Set the read, write, and hold delays for Data RAM +  ​Set the read, write, and hold delays for Data RAM 
-  ​Set the prefetching behaviour +  ​Set the prefetching behaviour 
-  ​Invalidate the cache +  ​Invalidate the cache 
-  ​Enable the L2C-310 cache controller+  ​Enable the L2C-310 cache controller 
 + 
 +The L2C-310 also includes event counting registers that can be used to monitor hit and miss rates, and events related to speculative reads and prefetching. 
 + 
 +Note: The I and C bits in the System Control Register (SCTLR) control caching at all levels. 
 +If the L2 cache is enabled, but the I and C bits are cleared, the processor cannot take advantage of the L2 cache.
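The register values for the enable sequence can be sketched as follows. The offsets and field positions are my reading of the L2C-310 TRM and should be checked against it; the helper names are hypothetical, and the latency numbers in the example are placeholders, not recommended Cyclone V settings. Note that the TRM labels the third latency field "setup".

```c
#include <assert.h>
#include <stdint.h>

#define L2C310_BASE          0xFFFEF000u  /* Cyclone V SoC base address */
#define L2C310_CONTROL       0x100u       /* bit 0: L2 cache enable */
#define L2C310_AUX_CONTROL   0x104u       /* bits [19:17]: way size */
#define L2C310_TAG_LATENCY   0x108u
#define L2C310_DATA_LATENCY  0x10Cu
#define L2C310_INV_WAY       0x77Cu       /* invalidate by way */
#define L2C310_PREFETCH_CTRL 0xF60u

/* Way-size field: 16KB -> 1, 32KB -> 2, ... 512KB -> 6, placed at bits [19:17]. */
static uint32_t l2_way_size_field(uint32_t way_kb)
{
    uint32_t code = 1u;
    uint32_t kb = 16u;
    while (kb < way_kb) {
        kb <<= 1;
        code++;
    }
    assert(kb == way_kb && code <= 6u);
    return code << 17;
}

/* Latency register value; each field stores (cycles - 1). */
static uint32_t l2_latency_value(unsigned setup, unsigned read, unsigned write)
{
    assert(setup >= 1u && setup <= 8u);
    assert(read >= 1u && read <= 8u);
    assert(write >= 1u && write <= 8u);
    return (setup - 1u) | ((read - 1u) << 4) | ((write - 1u) << 8);
}

/* Mask written to the invalidate-by-way register: one bit per way. */
static uint32_t l2_way_mask(unsigned ways)
{
    assert(ways == 8u || ways == 16u);
    return (1u << ways) - 1u;
}

/* Sequence on the target, with the controller still disabled:
 *   1. write way size into AUX_CONTROL (read-modify-write)
 *   2. write TAG_LATENCY and DATA_LATENCY
 *   3. write PREFETCH_CTRL
 *   4. write l2_way_mask(ways) to INV_WAY and poll until it reads 0
 *   5. set bit 0 of CONTROL */
```

For an 8-way, 512KB cache the way size is 64KB, so `l2_way_size_field(64)` and `l2_way_mask(8)` give the values for steps 1 and 4.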
  
  
Line 64: Line 100:
 Several other options are available as well. Several other options are available as well.
 Both the Cortex-A9 TRM and L2C-310 TRM (linked above) outline several optimizations for L2 memory accesses. Both the Cortex-A9 TRM and L2C-310 TRM (linked above) outline several optimizations for L2 memory accesses.
 +[[http://​infocenter.arm.com/​help/​topic/​com.arm.doc.ddi0246h/​CJACBHHB.html|Link]].
  
 The following features are available when using the L2C-310 cache controller with a Cortex-A9 MPCore processor: The following features are available when using the L2C-310 cache controller with a Cortex-A9 MPCore processor:
Line 74: Line 111:
 These features were all tested during benchmarking;​ however, none of them seemed to offer any performance gains. These features were all tested during benchmarking;​ however, none of them seemed to offer any performance gains.
 Further investigation may be required to make the best use of these options. Further investigation may be required to make the best use of these options.
- 
  
  
Line 88: Line 124:
  
  
 +====== Source Code ======
 +This investigation resulted in two files: arm_cache.h and arm_cache.s that together provide functions to turn on the MMU and caches on the ARM Cortex-A9 MPCore in the Altera Cyclone V SoC.
 +
 +   * [[https://​www.dropbox.com/​s/​l2mzqzfq3089emj/​arm_cache.h | arm_cache.h]]
 +   * [[https://​www.dropbox.com/​s/​tq6y2yod3p26yui/​arm_cache.s | arm_cache.s]]
  
  
using_arm_caches.1401583897.txt.gz · Last modified: 2014/05/31 20:51 by bain