CS61C -- Summer 2005
Lab 12:  Caches

GOALS

In this lab you will experimentally determine the parameters of a computer's
cache.  Good luck...

BACKGROUND

The program ~cs61c/labs/lab12/cache.c produces timing data for accesses of
elements in an array. Here is its main loop.

    for (index = 0; index < limit; index = index + stride) {
        x[index] = x[index] + 1;
    }

The surrounding code coordinates the setting of the variables LIMIT and STRIDE
and the timing of the loop.

Each loop/stride test is repeated many times and the results are averaged. The
same loops are run without the memory access to measure the time taken with
loop overhead. The program subtracts the overhead time from the loop time so
the results should show the time spent on memory access only. Unfortunately,
some of our compilers don t compile the real loop and the dummy loop in the
same way, so your times may be exaggerated. You don't have to fix this.

The program prints three values for each test, labeled as SIZE, STRIDE, and
READ+WRITE.

* The SIZE value refers to the size, in bytes, of the array you are testing
with, not the size of the cache. (You'll have to figure out the cache size for
yourself.)

* The STRIDE value says how many bytes are being skipped at each access. For
example, if the stride is 4, bytes 0, 4, 8, 12, ... in the array are being
accessed with bytes 1, 2, 3, 5, 6, 7, 9, 10, 11, ... being skipped.

* The READ+WRITE value is the amount of time needed to make the access to
memory, in nanoseconds. You'll have to figure out for yourself whether these
times denote hits or misses. As explained above, the calculated times may be
exaggerated if the compiler doesn t make equivalent machine code for the two
loops that process the array.

EXERCISE 1

Start by gathering some data. Log in to a machine of your choice. Note the
machine name and any information you can find about its processor. "uname -a"
or "uname -X" may work on some of the instructional machines. Copy
~cs61c/labs/lab12/cache.c and compile it without any optimizations. A simple
"gcc -o cache cache.c" should work. Then run the program, which tests reading
and writing with arrays from 4K to 1M at strides varying from 4 to 512 bytes.
You will want to capture the output into a file, so use a command like 
"./cache > out" (which puts the result in a file named "out").

Then graph the data by running gnuplot. If it doesn't work on the machine you
connected to, run it from your local lab machine. Give gnuplot this command:

	plot "out" using 0:8 with linespoints

That command means "plot the lines in the file 'out' as points using the line
number (index 0) as the x-coordinate and the read+write time (eighth word;
index 8).  gnuplot breaks the graph line where there's a blank line in the
input. You should see the results of each array size experiment as separate
graph segments. Each segment shows various strides, increasing from 4 bytes at
its leftmost point to 512 bytes at its rightmost. (To learn more about
gnuplot, try its excellent help system: "help plot", "help plot with",
"help set", "help", etc.)

Before you start number-crunching, try to understand the qualitative features
of the graph. You should be able to explain what each of the jumps in access
time means, and why some are much bigger than others. Do all your graph
segments start at the same access time?

EXERCISE 2

From what you see in the graph and the numbers in your table, you should be
able to determine most of the parameters of the test computer's cache. Here
are the questions you should answer:

* Approximately how fast is a cache hit? 

* Approximately how fast is a cache miss? 

* What is the size in bytes of the first-level cache? Why? 

* What is the block size in bytes of the first-level cache? Why?

EXERCISE 3

Answer one more question:

* What is the set associativity on the cache you're analyzing?

You may not be able to figure this out from the data given by the original
cache.c. Changing the STRIDE_MIN, STRIDE_MAX, CACHE_MIN, and CACHE_MAX
settings at the top of the file will help. When you're calculating the set
associativity, you ll want a large stride and large cache sizes to eliminate
the effects of the block size and cache size. With a sufficiently large stride
length along with arrays big enough to contain 1, 2, 4, ... elements to access,
you should be able to observe the effects of putting several elements into the
same cache set (row). Normally, retrieving elements into the same set would
cause repeated misses and no benefit from the cache. But if the cache is
associative at all, you ll see benefits even when multiple blocks are loaded
into the same set. cache.c's testing style of increasing the array size should
give you results that show how many blocks can be loaded into one set, and
that's the cache's associativity.

A piece of information not required for this lab but worth exploring is the
following:

* Does the machine have more than one level of cache? Why or why not?

You will probably not be able to conclude very much about multiple levels of
cache because the smaller caches tend to hide the effects of the larger ones.
Do not expect to be able to calculate all the parameters of non-primary
caches.

WHAT YOU *SHOULD* HAVE LEARNED

As a result of this lab, you ought to be able to specify what data and
features of the cache program let you determine a given cache parameter, and
to answer questions like the following (drawn from an old exam):

Running the cache program on the Wazcog Mark IV computer, you get the
following results:

stride ->    4   8  16  32  64 128 256

size
64	   109 126 122 121 101
128	   120 118 116 121 117 105
256	   119 116 102 117 113 113 110
512	   466 966 949 949 983 308 311
1024	   458 949 983 966 949 983 307
2048	   449 992 954 966 979 979 979

(Stride and size are in bytes, times in nanoseconds. Yes, this is an
artificially small example, to make the numbers fit easily.)

* What is the cache size in bytes?
* What is the block size in bytes?
* What is the allocation policy? (Direct mapped, fully associative, or
  set associative? If N-way set associative, what is N?)