## University of California, Berkeley - College of Engineering Department of Electrical Engineering and Computer Science Spring 2004 Instructor: Dan Garcia 2004-05-22 | Last Name | | | | | | |-------------------------------------------------------------------------------------------------------------------------------------------------------------------|--------|-------|--------|------|-----| | First Name | | | | | | | Student ID Number | | | | | | | Login | cs61c- | | | | | | The name of your TA (please circle) | Alex | Chema | Jeremy | Paul | Roy | | Name of the person to your Left | | | | | | | Name of the person to your Right | | | | | | | All the work is my own. I had no prior knowledge of the exam contents nor will I share the contents with others in CS61C who have not taken it yet. (please sign) | | | | | | #### Instructions You are allowed to use two 8.5" x 11" double-sided handwritten pages of notes. No calculators are allowed. This booklet contains questions M0-M4 and F1-F5 on 10 numbered pages (including the cover page) and 3 duplicated pages from P&H and K&R. Write all your answers on this exam; **show all work and do not hand in other pieces of paper**. Question M0 (+1 points if correct) involves filling in the front of this page and putting your name & login on every *sheet* of paper. You have 3 hours to complete the exam. Good skill! | Problem | MO | M1 | M2 | М3 | M4 | Total | |------------|----|----|----|----|----|-------| | Minutes | 0 | 15 | 15 | 15 | 15 | 60 | | Max Score | 1 | 11 | 11 | 11 | 11 | 45 | | Your Score | | | | | | | | Problem | F1 | F2 | F3 | F4 | F5 | Total | |------------|----|----|----|----|----|-------| | Minutes | 24 | 24 | 24 | 24 | 24 | 120 | | Max Score | 18 | 18 | 18 | 18 | 18 | 90 | | Your Score | | | | | | | | Overall Final Exam Score: | |---------------------------| |---------------------------| ## **Question M1:** Numbers (11 Points – 15 minutes) a) In *two's complement* addition, how many of the 16 possible combinations of inputs to a 2-bit adder result in *overflow*? Assume no carry in. (3 points) b) We're going to use a nibble (4 bits) to represent a floating point number as follows: SEEM (1 bit for $\underline{S}$ ign, 2 for $\underline{E}$ xponent, 1 for $\underline{M}$ antissa), with bias=1. We'll follow the IEEE standard to reserve the smallest exponent for 0 & *denorms* (assume implicit exponent = 0), and the largest for NANs and $\pm \infty$ . Fill in the table below: (8 points) | Description | Number that the floating point notation on the right represents (in Decimal) | Binary representation of the floating point notation (in Hex) | |----------------------------------|------------------------------------------------------------------------------|---------------------------------------------------------------| | Most negative # (that is not -∞) | | | | Zero | 0 | 0x0 | | Smallest positive # | | | | Next-smallest positive # | | | | Name: | Login: cs61c- | |-------|---------------| |-------|---------------| #### Question M2: Memory & C (11 Points – 15 minutes) a) How many *bytes* are used in the *static area*, *stack* and *heap* as a result of line 8? For the heap, we're asking for the *requested* amount. Assume the arguments are passed in registers, as usual. (4 points) ``` 0 struct foo { 1 int iarray[2]; 2 char carray[4]; 3 int* parray[2]; 4 struct foo *next; 5 }; 6 int main() { 7 8 struct foo *myFoo = (struct foo *) malloc (2 * sizeof(struct foo)); 9 ``` | Static area | Stack | Неар | |-------------|-------|------| | | | | | | | | b) For a lab you wrote something similar to the following code: (s1,s2 are non-empty strings) ``` char *c1ptr; char *c2ptr; clptr = (char *) malloc ( sizeof(char) * (strlen(s1) + 1) ); c2ptr = (char *) malloc ( sizeof(char) * (strlen(s2) + 1) ); ``` Your friend wants to combine those malloc calls into one: ``` char *c1ptr; char *c2ptr; clptr = (char *) malloc ( sizeof(char) * (strlen(s1) + strlen(s2) + 2) ); c2ptr = (char *) c1ptr + (sizeof(char) * (strlen(s1) + 1)); ``` Assume there is *no heap overhead* used for the memory chunk header. Aside from your friend's poor style, in one sentence each provide a single *advantage* and a single *disadvantage* to the approach. If there is none, write NONE. (2 points each) | Advantage of friend's approach | Disadvantage of friend's approach | |--------------------------------|-----------------------------------| | | | | | | - c) What is the veracity of the following *memory management* statements? Circle **T** or **F**: (1 pt each) - **T F** We discussed 3 schemes: *K&R*, *slab allocator*, and the *buddy system*. It's possible to write code to make any of these the best and any of these the worst performer (i.e., one will never always dominate another, performance-wise). - **T F** Mark and sweep garbage collection *does not* work for circular data structures. - **T F** If you wrote code that had no calls to free and we only garbage collect when we have to, *reference counting* will start collecting before *copying*. ### Question M3: C (11 Points – 15 minutes) You are called upon to find the bugs and predict the output of the following program that was typed into a PC word processor, like notepad or textedit, so we can't trust the indenting. Specify actual or potential errors (not warnings) at *compile-time* (e.g., syntax) or *run-time* (e.g., out-of-bounds access) for lines (a)-(d). Concisely state your reasons (and suggest a fix) for all errors you find. Finally, give the output of line (e), assuming the compiler and the system *ignores all the errors it finds*. (2 pts each) ``` struct dlist { int value; struct dlist *next; }; int main() { struct dlist **p; int i,j; for (i=0; i<10; i++) *(p+i) = (struct dlist *) malloc (20*sizeof(struct dlist)); /* line (b) */ for (i=0;i<10; i++) { for(j=0; j<20; j++) { if(j<19) /* line (d) */ (*(p+i)+j+1)->value = i*j; } } printf("%d",p[5][15].value); /* line (e) */ return 0; } ``` | Line | Error? Circle "NO" or<br>"CT" (for compile-time) or<br>"RT" (for run-time) | | | If it <i>is</i> an error, briefly state the reason (and suggest a fix) | |------|----------------------------------------------------------------------------|----|----|------------------------------------------------------------------------| | (a) | NO | СТ | RT | | | (b) | NO | СТ | RT | | | (c) | NO | СТ | RT | | | (d) | NO | СТ | RT | | What is printed by line (e) (assume the compiler & system ignores all the errors it finds)? | Name: l | _ <b>ogin</b> : cs61c | |---------|-----------------------| |---------|-----------------------| ### **Question M4:** MIPS (11 Points – 15 minutes) The MIPS instruction set architecture (ISA) is going to be updated, and the Recording Industry Association of America (RIAA) has asked the designers to add an anti-piracy feature. The new instruction, riaa, will search for copyrighted material in your hard disk. In order to make space for the new instruction, they decided to remove variable logical shifts from the language. Thus, silv will disappear from MIPS and its (R-type) opcode & function will be used for riaa. Recall the format: silv \$rd, \$rt, \$rs which shifts the value in register rt to the left by the # of bits specified by register rs and puts the result in destination register rd. We wish to maintain MAL backward-compatibility, so we devise a clever solution in which we consider sllv to be a MAL *pseudoinstruction* & translate it into a *set* of TAL instructions that do the same thing. The key idea is that we're going to use self-modifying code! We'll stuff the low-order few bits from the contents of the rs register into the shamt field of a vanilla sll instruction (#11 below). We're telling you how to do it, and your job is to figure out where to put the temporary values (and a few other details). Here are the sll and sllv instructions, and the field widths: | | 6 | 5 | 5 | 5 | 5 | 6 | |------|---|----|----|----|-------|---| | sll | 0 | rs | rt | rd | shamt | 0 | | sllv | 0 | rs | rt | rd | 0 | 4 | Thus, the single sllv instruction (\$dst, \$src and \$shamtreg are abstractions for the actual registers) sllv \$dst, \$src, \$shamtreg ...would translate to the following *set* of TAL instructions: (fill in the blanks, we've shown comments) ``` 01 add ____, $0, $shamtreg # copy $shamtreg so we don't alter it andi _____, ___ # The shamt has a maximum size! 02 # shift the shamt to the right location 03 lui $at, shiftLby0(upper) # This lui and the following ori serve to ... 04 ori , $at, shiftLby0(lower) # "point" to the shiftLby0 instruction 05 lw ____, 0(____) 06 # reg now contains the shiftLby0 inst # "paste" shamt into instruction 07 80 lui $at, shiftLby0(upper) # Again, lui and the following ori serve to ... ori , $at, shiftLby0(lower) # "point" to the shiftLby0 instruction 09 # Self-modify our code! 10 ____, 0(____) 11 shiftLby0: sll $dst, $src, 0 # The shiftLby0 instruction ``` This concludes the Midterm-level questions. You're 2/3<sup>rds</sup> of the way through! ## Question F1: Verilog and Logic (18 Points [6 each] – 24 minutes) a) We've seen 6-input, 1-output logic gates like ANDs, ORs, NORs, NANDs, XORs, XNORs, etc. Radio Shack needs to have one copy of *ALL* the possible 6-input, 1-output logic gates there are possible (even stupid degenerate ones, like the one that ignores all the inputs and always outputs 0), and each costs \$1. How much does Radio Shack need to spend? Don't write your answer as an expression, work it out and write it in computer-ese, ala 1K\$, 64M\$, 2G\$, etc. b) Given the following sum-of-products expression for Foo, simplify this expression to a sum of products of (at most) 3 two-variable terms (e.g., AB + CD + AD): What's a good name for Foo? Foo = $$A*B*C + A*B*C + A*B*C + A*B*C$$ foo = \_\_\_\_\_\_\_, and is better named "\_\_\_\_\_\_" c) Given the following simplified sum-of-products expression for Bar, given A, B and C: ``` Bar = A*B + A*C + B*C ``` Draw the circuit diagram for the Bar function. You are required to instantiate *one 1-bit multiplexor*, and plug c into its select line (label its 0 and 1 inputs). You may only use basic gates AND, OR & NOT. Full credit will only be given to solutions with a mux and 3 basic gates. | Α | В | С | Scratch | space | |---|---|---|---------|-------| | 0 | 0 | 0 | | | | 0 | 0 | 1 | | | | 0 | 1 | 0 | | | | 0 | 1 | 1 | | | | 1 | 0 | 0 | | | | 1 | 0 | 1 | | | | 1 | 1 | 0 | | | | 1 | 1 | 1 | | | | Name: | Login: cs61c- | |-------|---------------| |-------|---------------| ## Question F2: Control and Datapath (18 Points – 24 Minutes) Modify the following single cycle MIPS datapath diagram to accommodate a new instruction <code>swai</code> (store word then auto-increment). The operation performs the regular <code>sw</code> operation, then post-increments the <code>rs</code> register by 1. Your modification may use simple adders, mux chips, wires, and new control signals. You may replace original labels where necessary. Recall the RTL for <code>sw</code> is: Mem[ R[rs] + SignExt[imm16] ] = R[rt]; PC=PC+4, & that sw (and swai) has the following fields: a) **Modify the picture above** and list your changes below. You may not need all the boxes. Please write them in "pipeline stage order" (i.e., changes affecting IF first, MEM next, etc) | (A) | | |-----|--| | (B) | | | (C) | | | (D) | | | (E) | | | (F) | | b) We also wish to do the same thing with lw, namely create lwai. Will this work? Circle YES or NO and argue your point in one sentence. (3 points) | е | | | | |---|--|--|--| |---|--|--|--| #### **Question F3:** Pipelining (18 points, 24 minutes) Given the following MIPS code snippet (note that instruction #6 could be anything): ``` loop: 1 addi $t0, $t0, 4 2 lw $v0, 0($t0) 3 sw $v0, 20($t0) 4 lw $s0, 60($t0) 5 bne $s0, $0, loop 6 ## \leftarrow The following instruction could be anything! ``` a) Detect hazards and insert no-ops to insure correct operation. Assume no delayed branch, no forwarding units and no interlocked pipeline stages. Your answer on the right should take the form of pair(s) of numbers: num@location - indicating num no-ops should be placed at location. E.g., if you wanted to place 6 noops between lines 2 and 3 (i.e., location=2.5) and 8 noops between lines 5 and 6 (i.e., location=5.5), you would write: "6@2.5, 8@5.5". (6 points) Scratch space b) Now, reorder/rewrite the program to maximize performance. Assume delayed branch and forwarding units, but no interlocked pipeline stages. For unknown reasons, the first instruction after the loop label must be the addi. Feel free to insert no-ops where needed. You should be able to do it using 6 instructions per loop (easier, half credit) or only 5 (hard, full credit). (12 pts) | | ## | Extra | instructions | before | the | 100p | if | necessary | |-------------------------------|----|-------|--------------|--------|-----|------|----|-----------| | | ## | Extra | instructions | before | the | loop | if | necessary | | loop:<br>1 addi \$t0, \$t0, 4 | | | | | | | | | | 2 | | | | | | | | | | 3 | | | | | | | | | | 1 | | | | | | | | | | 5 | | | | | | | | | | 5 | | | | | | | | | ## The following instruction could be anything! | Que | stion F4 | Cache | s & VM (′ | 18 points, 24 minute | es) | | | |---------|--------------------------------|------------------------|-----------------------------|------------------------------------------------------------------------------------------------------|--------------------------------------|-----------------------------|---------------------------| | This is | s for question<br>sical memory | s (a) and<br>y. Assume | (b). The firs | et-level data cache for a coord size is 32 bits, the bloom is 4-way set associative | —–<br>ertain proces<br>ck size is 64 | | | | a) | How many b | oits are ne | eded for the | e following? Show your w | ork on the rio | ght. (3 points | <b>(</b> ) | | | Tag | Index | Offset | | | | | | b) | the width of | these fiel | lds) shift aro | s to the initial conditions a<br>und. E.g., if a bit field stay<br>field <i>decreases</i> by 1, writ | ys the same, | write "0", if a | | | | | | Change | <u> </u> | Tag | Index | Offset | | | Double the | cache siz | e (from 64 k | (B to 128 KB) | | | | | | Double the v | word size | (from 32 bit | ts to 64 bits) | | | | | | Change the | associati | ivity to fully a | associative | | | | | c) | their 32-bit N | MIPS syst | tem. They th | 61C decides to splurge a<br>nink to themselves: "Why some sentence max) for turn | should I turn | on Virtual M | | | | | | | | | | | | d) | Suppose a copage size. B | computer<br>lased on | has a 32-bit | t virtual address space, 64<br>tion, answer the following | 4 MB physica<br>questions. S | al memory a | nd a 4 KB<br>ork. (6 pts) | | d) | Suppose a copage size. B | Based on | this informat | tion, answer the following | 4 MB physica<br>questions. S | al memory ai<br>Show your w | nd a 4 KB<br>ork. (6 pts) | | d) | page size. B | ased on virtual pag | this informatiges are there | tion, answer the following | 4 MB physica<br>questions. S | al memory ai<br>Show your w | nd a 4 KB<br>ork. (6 pts) | # Question F5: Performance & I/O (18 points, 24 minutes) | a) Given the following instruction mix: | ALU | Load/Store | Branch | |-----------------------------------------|-----|------------|--------| | | 20% | 30% | 50% | ...and CPI<sub>i</sub> for each instruction i: | Machine | Clock speed | ALU | Load/Store | Branch | |---------|-------------|-----|------------|--------| | Α | 3 GHz | 3 | | 3 | | В | 1 GHz | 1 | 1 | 2 | What should the CPI of Load/Store for machine A be so that A and B have the same execution performance for this particular instruction mix? Show your work below and put your answer directly in the table above. (5 points) b) We want to send a message between two machines. We're using Gigabit ethernet & the message (including header & trailer) is 1,000 bytes long. Fill the table; show work! (5 pts) | What's the network transmission time? | | |-----------------------------------------------------------------------------------------------------|--| | What (send+receive) overhead causes effective bandwidth to drop to 10Mb/s? We want a single number. | | c) Answer the following short-answer questions in at most three words (8 pts): | Some critics say RAID 0 is a misnomer! What do they say is a more appropriate acronym for RAID 0? (Hint: what is it missing?) | | |-------------------------------------------------------------------------------------------------------------------------------------------------------------------|--| | In retrospect, the inventors of RAID bemoaned they should have added a <i>RAID 6</i> level. What would it have gracefully allowed (that RAID 0-5 doesn't)? | | | Which I/O device we discussed had the highest data rate? | | | Dave Patterson and his ROC team argue that we've been focusing on <i>performance</i> almost to excess. What does he believe should be <i>the new benchmarks</i> ? | |