When turning in a homework problem, be sure to indicate the exercise number. I will use these exercise numbers when reporting your standing on the homework.
Exercise 7.5 on pages 691-692. Because the recipe specifies only the baking time, you will need to estimate how long each of the other steps takes. (Alternatively, you could try out the recipe and time yourself. I can supply a kitchen and ingredients if you're interested.)
Exercise 7.x1:
A program needs to count the number of positive elements in a large one-dimensional array. To fully utilize a Core 2 Duo, the program runs two threads of execution, one on each core, with each thread counting the positive values in half of the array locations; the two counts are added together afterward. There are two alternatives for how the work could be partitioned: (1) one thread could process the first half of the array and the other the second half, or (2) one thread could process the even-numbered elements and the other the odd-numbered ones. Which alternative is likely to be significantly faster, and why?
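To make the two partitioning schemes concrete, here is a minimal Python sketch of which indices each thread is assigned under option (1) and option (2). The function names (`blocked_indices`, `interleaved_indices`, `count_positive_parallel`) are hypothetical. It only shows the index assignment and that both schemes produce the same total; CPython's global interpreter lock prevents the threads from truly running in parallel, so timing this code will not reproduce the hardware behavior the question asks about.

```python
import threading


def blocked_indices(n, thread_id, num_threads=2):
    """Option (1): each thread gets one contiguous block of the array."""
    chunk = (n + num_threads - 1) // num_threads  # ceiling division
    start = thread_id * chunk
    return range(start, min(start + chunk, n))


def interleaved_indices(n, thread_id, num_threads=2):
    """Option (2): with two threads, thread 0 gets even indices, thread 1 odd."""
    return range(thread_id, n, num_threads)


def count_positive_parallel(values, partition, num_threads=2):
    """Count positive elements, one worker thread per partition."""
    counts = [0] * num_threads  # one slot per thread, summed after join()

    def worker(tid):
        local = 0
        for i in partition(len(values), tid, num_threads):
            if values[i] > 0:
                local += 1
        counts[tid] = local

    threads = [threading.Thread(target=worker, args=(t,)) for t in range(num_threads)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return sum(counts)  # combine the per-thread counts
```

For an 8-element array, option (1) gives thread 0 indices 0 to 3 and thread 1 indices 4 to 7, while option (2) gives thread 0 indices 0, 2, 4, 6 and thread 1 indices 1, 3, 5, 7. Which of these layouts is friendlier to the caches of two separate cores is the point of the exercise.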
Now consider a similar situation in which the program makes each array element nonnegative (by replacing it with its absolute value), rather than counting how many elements are already positive. Why might the difference between options (1) and (2) be even more pronounced for this program?
Exercise 7.x2: Refer to Figure 7.14 on page 670. Give an example of an arithmetic intensity (FLOP/byte ratio) for which you would expect the Opteron X2 and Opteron X4 to provide identical performance. What is the computation rate in GFLOP/s that you would expect both to have? Is this limited (on both processors) by memory bandwidth or by computation? Now give an example of an arithmetic intensity where the two processors would have different limiting bandwidths (one memory, one computation). For each of the two processors, which bandwidth is limiting and what would you expect the computation rate to be? Finally, give a third example arithmetic intensity, this one where both processors would be limited by the same kind of bandwidth but would have different performance. At this intensity, which kind of bandwidth is limiting for both processors? And for each processor, what would the predicted rate of computation be?
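The exercise's framing (arithmetic intensity, with performance limited either by memory bandwidth or by computation) is that of the roofline model, in which the attainable rate is min(peak compute rate, peak memory bandwidth × arithmetic intensity). The Python sketch below encodes that formula. The function names and the peak values used in the example are hypothetical placeholders, not the Opteron numbers; read the actual peaks off Figure 7.14.

```python
def attainable_gflops(intensity, peak_gflops, peak_bw_gbs):
    """Roofline estimate: GB/s * FLOP/byte = GFLOP/s, capped by the compute peak."""
    return min(peak_gflops, peak_bw_gbs * intensity)


def ridge_point(peak_gflops, peak_bw_gbs):
    """Arithmetic intensity (FLOP/byte) where the memory and compute limits meet."""
    return peak_gflops / peak_bw_gbs


def limiting_factor(intensity, peak_gflops, peak_bw_gbs):
    """Which roof applies; exactly at the ridge point both coincide,
    and this sketch reports 'computation' in that case."""
    if peak_bw_gbs * intensity < peak_gflops:
        return "memory"
    return "computation"


# Hypothetical processor (NOT from Figure 7.14): 16 GFLOP/s peak, 8 GB/s bandwidth.
PEAK, BW = 16.0, 8.0
for ai in (0.5, 2.0, 4.0):
    print(ai, attainable_gflops(ai, PEAK, BW), limiting_factor(ai, PEAK, BW))
```

For this made-up processor the ridge point is 2 FLOP/byte: at 0.5 FLOP/byte it is memory-limited at 4 GFLOP/s, and at 4 FLOP/byte it hits the 16 GFLOP/s compute ceiling. Comparing two processors amounts to evaluating the same formula with each one's peaks at the same intensity.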
Instructor: Max Hailperin