When turning in a homework problem, be sure to indicate the exercise number. These will be the reference numbers I use in reporting back your standing on the homework.
Exercise 7.x1: Suppose in a Java program the variable a refers to a very large array containing millions of ints. Then the following code might add up every nth element of the array:

    int total = 0;
    for (int i = 0; i < a.length; i += n) {
        total += a[i];
    }
Give a value of n that would cause this loop to exhibit very little temporal and spatial locality with regard to data accesses. Give another value of n that would cause the loop to exhibit very high temporal locality but very little spatial locality. Finally, give a third value of n that would cause the loop to exhibit very little temporal locality but very high spatial locality. (This exercise is a variant of exercises 7.2-7.4.)
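If you want to experiment with the loop, it can be wrapped in a complete program. This is only a sketch: the array size and the stride n below are arbitrary placeholders, not answers to the exercise.

```java
// Sketch of the Exercise 7.x1 loop as a runnable program.
// The array size and the stride n are placeholders; choosing the n
// values that produce each locality pattern is the exercise itself.
public class StrideSum {
    public static void main(String[] args) {
        int[] a = new int[1_000_000]; // stands in for the "very large array"
        for (int i = 0; i < a.length; i++) {
            a[i] = 1;
        }
        int n = 4; // placeholder stride
        int total = 0;
        for (int i = 0; i < a.length; i += n) {
            total += a[i];
        }
        // With n = 4 this sums every 4th of 1,000,000 ones: prints 250000.
        System.out.println(total);
    }
}
```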
Do exercise 7.9 on page 556.
Do exercise 7.10 on page 556.
Exercise 7.x2: As described in Exercise 7.16 on page 557, "Cache C1 is direct-mapped with 16 one-word blocks. Cache C2 is direct-mapped with 4 four-word blocks. Assume that the miss penalty for C1 is 8 memory bus clock cycles and the miss penalty for C2 is 11 memory bus clock cycles." Also as in that exercise, you are to assume the caches start out empty and use word addresses. In that context, answer each of the following questions:
Give a reference string (that is, a list of addresses being referenced) that would cause C2 to have a substantially lower miss rate, such that it winds up spending fewer memory bus clock cycles on misses than C1 would.
This part is Exercise 7.16: give another reference string that again causes C2 to have the lower miss rate, but this time by a small enough margin that C2 winds up spending more cycles on misses than C1 does.
Now give a third reference string: one for which C1 has a lower miss rate than C2.
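One way to check a candidate reference string is with a small simulator. The sketch below (the class and method names are ours, not from the text) models each cache as direct-mapped with the stated geometry, counts misses for a word-address trace, and multiplies by each cache's miss penalty:

```java
// Sketch: count misses for a word-address reference string in a
// direct-mapped cache with a given number of blocks and words per block.
// C1 is 16 one-word blocks; C2 is 4 four-word blocks (Exercise 7.16).
import java.util.Arrays;

public class DirectMappedSim {
    // Returns the number of misses for the trace; the cache starts empty.
    static int countMisses(int[] trace, int numBlocks, int wordsPerBlock) {
        long[] tags = new long[numBlocks];
        Arrays.fill(tags, -1); // -1 marks an empty (invalid) block
        int misses = 0;
        for (int addr : trace) {
            int blockAddr = addr / wordsPerBlock; // which memory block
            int index = blockAddr % numBlocks;    // which cache slot it maps to
            long tag = blockAddr / numBlocks;     // the rest of the address
            if (tags[index] != tag) {
                misses++;
                tags[index] = tag;
            }
        }
        return misses;
    }

    public static void main(String[] args) {
        // Example trace: sequential words 0..7 favor C2's four-word blocks.
        int[] trace = {0, 1, 2, 3, 4, 5, 6, 7};
        int missesC1 = countMisses(trace, 16, 1); // 8 misses: every word is new
        int missesC2 = countMisses(trace, 4, 4);  // 2 misses: one per 4-word block
        System.out.println("C1: " + missesC1 + " misses, " + missesC1 * 8 + " cycles");
        System.out.println("C2: " + missesC2 + " misses, " + missesC2 * 11 + " cycles");
    }
}
```

Multiplying each cache's miss count by its miss penalty (8 cycles for C1, 11 for C2) gives exactly the comparison the exercise asks you to engineer.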
Do exercise 7.25 from the "For More Practice" portion of the CD materials.
Exercise 7.x3: Page 547 shows the AMD Opteron processor as having four TLBs: separate L1 (level 1) TLBs for instruction and data accesses, each of which is a 40-entry fully associative TLB, and separate L2 TLBs for instruction and data accesses, each of which is a 512-entry set-associative TLB. By analogy with material earlier in the chapter regarding caches, answer the following questions:
What is the advantage of using separate TLBs for instruction and data accesses?
What is the advantage of including the 512-entry L2 TLBs, rather than moving directly to the page table after the 40-entry L1 TLBs?
What is the advantage of including the 40-entry L1 TLBs, rather than moving directly to a 512-entry (or even 552-entry) TLB?
What is the advantage of having the L1 TLBs be fully associative rather than four-way set associative?
What is the advantage of having the L2 TLBs be set associative rather than fully associative? Why is this different than for the L1 TLBs?
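To make the associativity trade-off concrete, here is a hedged sketch of a TLB whose associativity is a parameter. The entry counts follow the text (40-entry L1, 512-entry L2), but the 4-way associativity for the L2 and the crude replacement policy are our illustrative assumptions, not details from page 547:

```java
// Sketch contrasting fully associative and set-associative TLBs.
// With ways == entries there is one set and the TLB is fully associative.
import java.util.Arrays;

public class TlbSketch {
    final long[] tags;  // stored virtual page numbers; -1 = invalid
    final int numSets, ways;

    TlbSketch(int entries, int ways) {
        this.ways = ways;
        this.numSets = entries / ways;
        this.tags = new long[entries];
        Arrays.fill(tags, -1);
    }

    // Index one set, then compare only that set's tags. On a miss, fill
    // an empty way, or evict the set's first way (crude stand-in for LRU).
    boolean access(long virtualPage) {
        int set = (int) (virtualPage % numSets);
        int base = set * ways;
        for (int w = 0; w < ways; w++) {
            if (tags[base + w] == virtualPage) return true; // hit
        }
        for (int w = 0; w < ways; w++) {
            if (tags[base + w] == -1) { tags[base + w] = virtualPage; return false; }
        }
        tags[base] = virtualPage; // evict and miss
        return false;
    }

    public static void main(String[] args) {
        TlbSketch l1 = new TlbSketch(40, 40); // fully associative: one set
        TlbSketch l2 = new TlbSketch(512, 4); // assumed 4-way set associative
        // In l1, any 40 pages can coexist regardless of their addresses;
        // in l2, pages 0, 128, 256, ... all compete for the 4 ways of set 0.
        System.out.println(l1.numSets + " set(s) in L1, " + l2.numSets + " sets in L2");
    }
}
```

Note how a fully associative lookup must compare against every entry (cheap enough for 40 entries, done in parallel in hardware), while the set-associative lookup compares only one set's tags, which is what makes the larger L2 size practical.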
Instructor: Max Hailperin