CS61C -- Summer 2005 Lab 12: Caches GOALS In this lab you will experimentally determine the parameters of a computer's cache. Good luck... BACKGROUND The program ~cs61c/labs/lab12/cache.c produces timing data for accesses of elements in an array. Here is its main loop. for (index = 0; index < limit; index = index + stride) { x[index] = x[index] + 1; } The surrounding code coordinates the setting of the variables LIMIT and STRIDE and the timing of the loop. Each loop/stride test is repeated many times and the results are averaged. The same loops are run without the memory access to measure the time taken with loop overhead. The program subtracts the overhead time from the loop time so the results should show the time spent on memory access only. Unfortunately, some of our compilers don t compile the real loop and the dummy loop in the same way, so your times may be exaggerated. You don't have to fix this. The program prints three values for each test, labeled as SIZE, STRIDE, and READ+WRITE. * The SIZE value refers to the size, in bytes, of the array you are testing with, not the size of the cache. (You'll have to figure out the cache size for yourself.) * The STRIDE value says how many bytes are being skipped at each access. For example, if the stride is 4, bytes 0, 4, 8, 12, ... in the array are being accessed with bytes 1, 2, 3, 5, 6, 7, 9, 10, 11, ... being skipped. * The READ+WRITE value is the amount of time needed to make the access to memory, in nanoseconds. You'll have to figure out for yourself whether these times denote hits or misses. As explained above, the calculated times may be exaggerated if the compiler doesn t make equivalent machine code for the two loops that process the array. EXERCISE 1 Start by gathering some data. Log in to a machine of your choice. Note the machine name and any information you can find about its processor. "uname -a" or "uname -X" may work on some of the instructional machines. Copy ~cs61c/labs/lab12/cache.c and compile it without any optimizations. A simple "gcc -o cache cache.c" should work. Then run the program, which tests reading and writing with arrays from 4K to 1M at strides varying from 4 to 512 bytes. You will want to capture the output into a file, so use a command like "./cache > out" (which puts the result in a file named "out"). Then graph the data by running gnuplot. If it doesn't work on the machine you connected to, run it from your local lab machine. Give gnuplot this command: plot "out" using 0:8 with linespoints That command means "plot the lines in the file 'out' as points using the line number (index 0) as the x-coordinate and the read+write time (eighth word; index 8). gnuplot breaks the graph line where there's a blank line in the input. You should see the results of each array size experiment as separate graph segments. Each segment shows various strides, increasing from 4 bytes at its leftmost point to 512 bytes at its rightmost. (To learn more about gnuplot, try its excellent help system: "help plot", "help plot with", "help set", "help", etc.) Before you start number-crunching, try to understand the qualitative features of the graph. You should be able to explain what each of the jumps in access time means, and why some are much bigger than others. Do all your graph segments start at the same access time? EXERCISE 2 From what you see in the graph and the numbers in your table, you should be able to determine most of the parameters of the test computer's cache. Here are the questions you should answer: * Approximately how fast is a cache hit? * Approximately how fast is a cache miss? * What is the size in bytes of the first-level cache? Why? * What is the block size in bytes of the first-level cache? Why? EXERCISE 3 Answer one more question: * What is the set associativity on the cache you're analyzing? You may not be able to figure this out from the data given by the original cache.c. Changing the STRIDE_MIN, STRIDE_MAX, CACHE_MIN, and CACHE_MAX settings at the top of the file will help. When you're calculating the set associativity, you ll want a large stride and large cache sizes to eliminate the effects of the block size and cache size. With a sufficiently large stride length along with arrays big enough to contain 1, 2, 4, ... elements to access, you should be able to observe the effects of putting several elements into the same cache set (row). Normally, retrieving elements into the same set would cause repeated misses and no benefit from the cache. But if the cache is associative at all, you ll see benefits even when multiple blocks are loaded into the same set. cache.c's testing style of increasing the array size should give you results that show how many blocks can be loaded into one set, and that's the cache's associativity. A piece of information not required for this lab but worth exploring is the following: * Does the machine have more than one level of cache? Why or why not? You will probably not be able to conclude very much about multiple levels of cache because the smaller caches tend to hide the effects of the larger ones. Do not expect to be able to calculate all the parameters of non-primary caches. WHAT YOU *SHOULD* HAVE LEARNED As a result of this lab, you ought to be able to specify what data and features of the cache program let you determine a given cache parameter, and to answer questions like the following (drawn from an old exam): Running the cache program on the Wazcog Mark IV computer, you get the following results: stride -> 4 8 16 32 64 128 256 size 64 109 126 122 121 101 128 120 118 116 121 117 105 256 119 116 102 117 113 113 110 512 466 966 949 949 983 308 311 1024 458 949 983 966 949 983 307 2048 449 992 954 966 979 979 979 (Stride and size are in bytes, times in nanoseconds. Yes, this is an artificially small example, to make the numbers fit easily.) * What is the cache size in bytes? * What is the block size in bytes? * What is the allocation policy? (Direct mapped, fully associative, or set associative? If N-way set associative, what is N?)