cache_simulator_readme: created 2011/08/26 16:40 by kev.nam; current revision 2011/08/26 16:44.
Cache Simulator README
By Kevin Nam
This program was written for the LegUp HLS Suite to simulate the cache hit/miss rates of different cache configurations, given a program's memory access stream.
## COMPILING ##

Just run "make".
## USAGE ##

An input file containing the stream of accessed addresses is required. Each line of this file must contain one 32-bit address in hexadecimal.
Example of an input file:
<code>
00801ee5
</code>
-Then simply make.+The program has the following commandline arguments
-## USAGE ##+    specify file with memory access addresses:​ 
 +    -file <​filename>​
-First get the instruction trace using GXEMULDue to problems with how gxemul'​s instruction trace was outputted (printfs),​ +    specify cache sizeDefault = 8KB
-GXEMUL code was modified to write the output to a file rather than print to terminalReplace "​"​ with the modified +    ​-cachesize <​kilobytes>​
-one, in /​src/​old_main/​. Then build GXEMUL. Now the instruction trace will be saved to a file called "​gxemuldump"​+
-Once you have this file, you can run the profiler:+    specify line size. Default = 16 bytes 
 +    -linesize <​bytes>​
-./​mem_access_profiler <​instruction trace file> <access threshhold ​(integer)<function name>+    specify ways/setDefault = 1 (direct mapped)
 +    -ways <num>
-argv[1] ​is the instruction trace file+    specify replacement policy. Default = LRU. 
 + for direct mapped caches, this argument ​is irrelavent. 
 +    -replacementpolicy <​policy>​ 
 +        options: LRU, NMRU (random but not MRU), random
-argv[2] is the number ​of accesses above which the address will be printed out. +    specify how many lines of cache ahead to prefetchDefault = 0Prefetches on missesONLY WORKS FOR LRU POLICY. 
-ex if you choose 10, then all the addresses where more than 10 different functions access it will be printed outThis is useful if you want to find what variable(s) might be causing the memory dependenciesOnce you have the address, you can check the .src file of the program ​to find the variable name.+    -prefetch <​num>​ 
 +    append ​the result ​to a csv file 
 +    -savecsv <csv filename>​
-argv[3] is the name of the function you want to analyze+    turn on quiet mode (no warning messages) 
 +    -q
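The C++ sketch below is a minimal set-associative LRU cache showing how -cachesize, -linesize, -ways and -prefetch interact. It is not taken from cache_sim.cpp, and the class name LruCache is hypothetical. NMRU and random replacement are left out. The prefetch behaviour is an assumption: on a miss, the cache also loads the next <num> lines and refreshes any that are already cached.

```cpp
#include <algorithm>
#include <cstdint>
#include <list>
#include <vector>

// Hypothetical sketch of a set-associative LRU cache with optional
// next-N-line prefetching on misses (not the code in cache_sim.cpp).
class LruCache {
public:
    LruCache(unsigned cacheSizeKB, unsigned lineSize, unsigned ways, unsigned prefetch = 0)
        : lineSize_(lineSize), ways_(ways), prefetch_(prefetch),
          numSets_(cacheSizeKB * 1024 / (lineSize * ways)), sets_(numSets_) {}

    // Simulates one demand access; returns true on a hit.
    bool access(std::uint32_t address) {
        std::uint32_t lineAddr = address / lineSize_;  // which memory line is touched
        if (touch(lineAddr)) { ++hits_; return true; }
        ++misses_;
        // Assumed prefetch behaviour: on a miss, also bring in the next prefetch_ lines.
        for (unsigned i = 1; i <= prefetch_; ++i) touch(lineAddr + i);
        return false;
    }

    unsigned long hits() const { return hits_; }
    unsigned long misses() const { return misses_; }

private:
    // Looks up a line in its set. Each set's list is ordered from most to
    // least recently used; on a miss the LRU line is evicted if the set is full.
    bool touch(std::uint32_t lineAddr) {
        std::list<std::uint32_t>& set = sets_[lineAddr % numSets_];
        auto it = std::find(set.begin(), set.end(), lineAddr);
        if (it != set.end()) {
            set.splice(set.begin(), set, it);  // move the hit line to the MRU position
            return true;
        }
        if (set.size() == ways_) set.pop_back();  // evict the LRU line
        set.push_front(lineAddr);
        return false;
    }

    unsigned lineSize_, ways_, prefetch_, numSets_;
    std::vector<std::list<std::uint32_t>> sets_;
    unsigned long hits_ = 0, misses_ = 0;
};
```

To match the defaults above, construct `LruCache(8, 16, 1)` and call access() for each address. Then divide misses() by the total number of accesses, as a floating-point value, to get the miss rate.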
An example of valid usage:
<code>
./cache_sim -file example_access_stream -cachesize 8 -ways 1 -linesize 16 -replacementpolicy LRU -prefetch 0 -savecsv results.csv
</code>
The above simulates the accesses contained in the file "example_access_stream" on an 8KB cache with a 16B line size.
The cache is direct mapped (ways = 1), so the replacement policy is ignored. No prefetching is used, and the results
are saved to "results.csv".
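For this configuration, 8KB / 16B gives 512 lines, and with one way that is 512 sets. An address therefore splits into 4 offset bits, 9 index bits and a 19-bit tag. The C++ helpers below compute this split. They are hypothetical and not part of cache_sim.cpp, and they assume all sizes are powers of two.

```cpp
#include <cstdint>

// Hypothetical helpers (not from cache_sim.cpp) for splitting an address
// into tag / set index / byte offset. Assumes power-of-two sizes.
struct CacheGeometry {
    std::uint32_t numSets;
    unsigned offsetBits;  // log2(line size)
    unsigned indexBits;   // log2(number of sets)
};

// Integer log2 for powers of two.
unsigned log2u(std::uint32_t x) {
    unsigned bits = 0;
    while (x > 1) { x >>= 1; ++bits; }
    return bits;
}

CacheGeometry makeGeometry(std::uint32_t cacheSizeKB, std::uint32_t lineSize, std::uint32_t ways) {
    std::uint32_t numSets = cacheSizeKB * 1024 / (lineSize * ways);
    return {numSets, log2u(lineSize), log2u(numSets)};
}

std::uint32_t offsetOf(const CacheGeometry& g, std::uint32_t addr) {
    return addr & ((1u << g.offsetBits) - 1);
}
std::uint32_t indexOf(const CacheGeometry& g, std::uint32_t addr) {
    return (addr >> g.offsetBits) & (g.numSets - 1);
}
std::uint32_t tagOf(const CacheGeometry& g, std::uint32_t addr) {
    return addr >> (g.offsetBits + g.indexBits);
}
```

For the example address 00801ee5, `makeGeometry(8, 16, 1)` gives offset 0x5, set index 0x1ee and tag 0x400.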
## INTERPRETING THE RESULTS ##

In the csv file, the columns are as follows:
Fewer misses do not necessarily mean better performance, because larger line sizes make each miss cost more cycles.
For a better estimate of performance, the simulator also estimates the total fetch cycles. This is roughly how many cycles are spent fetching data from memory as a result of all the misses.
The estimate starts from roughly 25 cycles per fetch for a 16B line size. From this, the cycle counts for other line sizes are estimated using the fact that loading each additional word takes 2 more cycles.
For the DE4 DDR2 RAM, each additional word takes 1 cycle and the default fetch cycles are lower.
Tweak the #defines at the top of cache_sim.cpp as needed for more accurate estimates.
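Here is a rough C++ sketch of this model. The struct and function names are hypothetical and do not correspond to the actual #defines in cache_sim.cpp. The sketch assumes 32-bit words and a cost that scales linearly with the number of words per line.

```cpp
#include <cstdint>

// Hypothetical parameters for the fetch-cycle model described above.
struct FetchModel {
    int baseCycles;     // cycles per fetch for the reference line size (~25 for 16B)
    int baseLineBytes;  // reference line size in bytes (16)
    int cyclesPerWord;  // extra cycles per additional 32-bit word (2; 1 on DE4 DDR2)
};

// Cycles to fetch one line, scaled linearly in words from the reference size.
std::int64_t cyclesPerMiss(const FetchModel& m, int lineBytes) {
    const int wordBytes = 4;  // assumed 32-bit words
    std::int64_t extraWords = (lineBytes - m.baseLineBytes) / wordBytes;
    return m.baseCycles + extraWords * m.cyclesPerWord;
}

// Estimated cycles spent fetching lines for all misses.
std::int64_t totalFetchCycles(const FetchModel& m, int lineBytes, std::int64_t misses) {
    return misses * cyclesPerMiss(m, lineBytes);
}
```

With `FetchModel{25, 16, 2}`, a hypothetical run with 1000 misses at 16B lines costs 25000 cycles. A run with 800 misses at 64B lines costs 39200 cycles, so fewer misses do not automatically mean better performance. For the DE4 DDR2 RAM, set cyclesPerWord to 1 and lower baseCycles accordingly.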
## GETTING THE ACCESS STREAM ##

To get the access stream, you need to modify the Tiger processor to print out the accessed addresses. To do this, do the following.
Add the following to the data cache Verilog:
<code>
always @(memAddress) begin
    // Print each data cache read address as [d<hex address>],
    // skipping address 0 and addresses with bit 31 set
    if (memAddress > 32'd0 && avs_dCacheADDR_read == 1'b1 && memAddress[31] != 1'b1) begin
        $display("[d%h]", memAddress);
    end
end
</code>
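Add the following to the instruction cache Verilog:
<code>
always @(address) begin
    // Print each instruction cache read address as [i<hex address>],
    // skipping address 0 and addresses with bit 31 set
    if (address > 32'd0 && memRead == 1'b1 && address[31] != 1'b1) begin
        $display("[i%h]", address);
    end
end
</code>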
Then run "make tigersim". Parse the ModelSim output transcript with the included parser program
to get the instruction cache access stream and the data cache access stream.
## PARSER ##
<code>
./parser <input transcript file> <output data cache access stream file> <output instruction cache access stream file>
</code>
The output files are then used as input files for cache_sim.
cache_simulator_readme.txt · Last modified: 2011/08/26 16:44 by kev.nam