This exam is closed-book and mostly closed-notes. You may, however, use a single $8\frac{1}{2}$ by 11 sheet of paper with hand-written notes for reference. (Both sides of the sheet are OK.)

Please write your name only on this page. Do not turn the page until instructed, in order that everyone may have the same time. Then, be sure to look at all problems before deciding which one to do first. Some problems are easier than others, so plan your time accordingly. You have 120 minutes to work.

Write the answer to each problem on the page on which that problem appears. You may also request additional paper, which should be labeled with your test number and the problem number.

Printed name:

| Problem | Page | Possible | Score |
|---------|------|----------|-------|
| 1 | 2 | 15 | |
| 2 | 3 | 12 | |
| 3 | 5 | 13 | |
| 4 | 6 | 12 | |
| 5 | 7 | 12 | |
| 6 | 8 | 12 | |
| 7 | 9 | 12 | |
| 8 | 10 | 12 | |
| Total | | 100 | |

## 1. [ 15 Points ]

- (a) Fill in each blank with the appropriate four-bit binary numeral:

  On the twelfth day of Christmas,
  my true love sent to me
  ____ drummers drumming,
  ____ pipers piping,
  ____ lords a-leaping,
  ____ ladies dancing,
  ____ maids a-milking,
  ____ swans a-swimming,
  ____ geese a-laying,
  ____ golden rings,
  ____ calling birds,
  ____ French hens,
  ____ turtle doves,
  And a partridge in a pear tree!
- (b) Suppose the number of days in Christmas (12) were instead expressed as a single-precision floating point numeral, with 1 bit for the sign, 8 bits for the exponent, and 23 bits for the fraction. What would the 23 bits of the fraction be?
- (c) You filled in the blanks with four-bit numerals interpreted as unsigned integers. Suppose that the four-bit numeral 1111 were instead interpreted as a signed integer; what would its value be? How about 1100?

MCS-284 -2- December 19, 2009

## 2. [ 12 Points ]

As mentioned on the prior test, the designers of the next-generation MIPS architecture are considering adding an instruction to the instruction set, sws. The name of this new instruction stands for Store Word Stepping. It behaves just like the normal sw instruction, except that it also writes a new value into the base address register, found by adding the offset. For example, the following two instructions:

```
sw $t0, 16($t1)
addiu $t1, $t1, 16
```

could be replaced with one:

```
sws $t0, 16($t1)
```

The machine-language format of the new instruction uses the Rs, Rt, and Imm fields in the same way as for sw.

The datapath and control table of the single-cycle processor are reproduced on the next page. Make any modifications necessary to add the sws instruction. If you add any new control signals to the datapath, add columns for them to the table and show the values in those new columns in all rows, old as well as new. Similarly, if you widened any control signal, show the added bit(s) in the old rows as well as the new one.
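The intended behavior of sws, as described above, can be sketched in a few lines of Python. This is only an illustrative model, not part of the MIPS specification or of the expected answer: the register and memory dictionaries, the function names, and the sample values 7 and 1000 are all assumptions made for the example.

```python
# Illustrative model only: registers and memory are plain dicts, and the
# helper names below are hypothetical, chosen to mirror the mnemonics.

MASK32 = 0xFFFFFFFF  # keep addresses within 32 bits


def sw(regs, mem, rt, offset, rs):
    """Store word: write regs[rt] to the address regs[rs] + offset."""
    addr = (regs[rs] + offset) & MASK32
    mem[addr] = regs[rt]


def addiu(regs, rt, rs, imm):
    """Add immediate (no overflow trap): regs[rt] = regs[rs] + imm."""
    regs[rt] = (regs[rs] + imm) & MASK32


def sws(regs, mem, rt, offset, rs):
    """Store Word Stepping: same store as sw, then the base register
    is overwritten with the effective address (base + offset)."""
    addr = (regs[rs] + offset) & MASK32
    mem[addr] = regs[rt]   # the store uses the old base value
    regs[rs] = addr        # then the base register steps forward


def demo():
    # Two-instruction sequence from the problem statement.
    a_regs, a_mem = {"$t0": 7, "$t1": 1000}, {}
    sw(a_regs, a_mem, "$t0", 16, "$t1")
    addiu(a_regs, "$t1", "$t1", 16)

    # Single sws instruction starting from the same state.
    b_regs, b_mem = {"$t0": 7, "$t1": 1000}, {}
    sws(b_regs, b_mem, "$t0", 16, "$t1")

    return (a_regs, a_mem), (b_regs, b_mem)
```

Under these assumptions, calling `demo()` leaves both machines in the same state, which is the equivalence the problem statement claims for the two code fragments.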
| Instruction | RegDst | ALUSrc | MemtoReg | RegWrite | MemRead | MemWrite | Branch | ALUOp1 | ALUOp0 |
|-------------|--------|--------|----------|----------|---------|----------|--------|--------|--------|
| R-format | 1 | 0 | 0 | 1 | 0 | 0 | 0 | 1 | 0 |
| lw | 0 | 1 | 1 | 1 | 1 | 0 | 0 | 0 | 0 |
| sw | X | 1 | X | 0 | 0 | 1 | 0 | 0 | 0 |
| beq | X | 0 | X | 0 | 0 | 0 | 1 | 0 | 1 |
| sws | | | | | | | | | |

## 3. [ 13 Points ]

Suppose the following registers contain the corresponding initial values:

| register | initial value | answer (a) | answer (b) |
|----------|---------------|------------|------------|
| 1 | 100 | | |
| 2 | 200 | | |
| 3 | 300 | | |
| 4 | 400 | | |
| 5 | 500 | | |
| 6 | 600 | | |

and the following memory locations contain the corresponding initial values:

| address | value |
|---------|-------|
| 300 | 30 |
| 400 | 40 |
| 500 | 50 |
| 600 | 60 |
| 700 | 70 |
| 800 | 80 |

Further suppose that the following instructions are executed:

    add $1, $2, $1
    add $4, $1, $1
    lw $5, 200($1)
    add $6, $5, $1
    lw $2, 100($4)
    add $3, $6, $6

Answer the following questions. For parts (a) and (b), put your answers into the table above.

- (a) If the instructions are correctly executed, what are the new values of each register?
- (b) If the instructions were instead incorrectly executed by using the initial, incomplete version of the pipelined processor, which ignores data hazards and so is missing forwarding and stalling, what register values would result? (Assume that as usual the register file is written during the first half of each cycle and read during the second half.)
- (c) Suppose the instructions were correctly executed by the final version of the pipelined processor, which handles data hazards using forwarding where possible, and stalls only where essential. How many stall cycles (or bubbles) would be introduced?

## 4. [ 12 Points ]

All of the following questions concern a system with 32-bit addresses and data words and that uses byte addressing.

- (a) If an 8 KB direct-mapped cache with one-word blocks is replaced by an 8 KB direct-mapped cache with two-word blocks, will the total number of bits needed for the cache (not just to hold the data) increase, decrease, or stay the same?
- (b) If an 8 KB direct-mapped cache with one-word blocks is replaced by an 8 KB two-way set associative cache with one-word blocks, will the total number of bits needed for the cache (not just to hold the data) increase, decrease, or stay the same?
- (c) If an 8 KB direct-mapped cache with one-word blocks is replaced by an 8 KB direct-mapped cache with two-word blocks, will the number of compulsory misses increase, decrease, or stay the same?
- (d) If an 8 KB direct-mapped cache with one-word blocks is replaced by an 8 KB two-way set associative cache with one-word blocks, will the total number of compulsory misses increase, decrease, or stay the same?
- (e) If an 8 KB direct-mapped cache with one-word blocks is replaced by an 8 KB direct-mapped cache with two-word blocks, will the number of conflict misses increase, decrease, or stay the same?
- (f) If an 8 KB direct-mapped cache with one-word blocks is replaced by an 8 KB two-way set associative cache with one-word blocks, will the total number of conflict misses increase, decrease, or stay the same?

## 5. [ 12 Points ]

- (a) A program needs to count the number of positive elements in a large one-dimensional array. In order to fully utilize a Core 2 Duo, the program runs two threads of execution, one on each core, with each thread counting the positive values from half the array locations; the two counts are added together later. There are two alternatives for how the work could be partitioned.
  (1) One thread could process the first half of the array and the other the second half.
  (2) One thread could process the even numbered elements and the other the odd numbered ones.
  Which alternative is likely to be significantly faster and why?
- (b) Consider a similar situation, but now the program wants to make each array element positive (by replacing it with its absolute value), rather than counting how many already are positive. Why might the difference between options (1) and (2) be even more pronounced for this program?

## 6. [ 12 Points ]

- (a) When a GPU displays a three-dimensional scene, it must do many computations, including transforming each vertex's coordinates in three-dimensional space into its apparent two-dimensional location on the screen and shading each pixel of the display with its appropriate color. Explain how this yields both task-level and data-level parallelism.
- (b) Why are the cores on recent NVIDIA GPUs multithreaded? For full credit, your answer should make reference to temporal locality.
- (c) Suppose the design of a GPU is altered by replacing the floating point units with faster versions, each of which can execute twice as many floating point operations per second as the old versions. The performance of the old and new GPUs is compared using a sample program. Give two different reasons why the speedup might be substantially less than 2.

## 7. [ 12 Points ]

A disk drive has an average seek time of 8 ms, a rotational rate of 6000 RPM, and a transfer rate of 80 MB/s.

- (a) How many 8 KB requests per second can the drive handle if they are sequential?
- (b) How many similar drives would be required to handle the same number of requests per second if they are random?
- (c) If the purpose for reading the data from disk is in order to send it over a 100 Mbit/s Ethernet, will the disk access be the bottleneck? Why or why not?

## 8. [ 12 Points ]

- (a) Draw lines to match each of the following protocol names with the layer to which it belongs:

  | Protocol | Layer |
  |----------|-------------|
  | Ethernet | application |
  | IP | link |
  | HTTP | network |
  | TCP | transport |

- (b) At which of these layers would a hexadecimal address like 00:19:e3:46:94:ed be used to designate a destination on the same network?
- (c) The following three kinds of devices are all used to join networks together, but they differ regarding the protocol layer they are at. Put them into order from the one at the lowest layer up to the one at the highest: Ethernet switch, Ethernet hub, router.