MC48 Homework 1 (Fall 1996)

Due: September 17, 1996

Our textbook remarks on page 24 that the cost per die goes up more than linearly with die size because there are not only fewer dies per wafer (at a roughly fixed cost per wafer) but also a lower yield (fraction of the dies that work), since there is a roughly fixed density of defects per square centimeter and so a larger die is more likely to contain a defect. (A die that contains even a single defect is considered non-working; testing at the semiconductor factory is designed to discover these dies, and they are simply thrown away.) In this problem, you will examine this effect.
Estimate the ratio of the cost per die of the Pentium Pro processor to the cost per die of the Pentium processor. The captions of figures 1.15 and 1.16 on pages 23 and 25 specify the number of dies per wafer at 100% yield and the die areas. One equation on page 48 relates the cost per die to the number of dies per wafer and the yield (in the obvious way), while another equation on that page relates the yield to the density of defects and the die area. (You will not need to use the approximate equation for dies per wafer given on that page, since we know the actual number of dies per wafer.) I don't know Intel's current defect density, but it is probably in the ballpark of 1 defect per square centimeter, so use that in your calculations. How does the ratio of costs compare with the ratio of die areas for these two processors?
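The shape of this calculation can be sketched in Python. This is only a sketch under stated assumptions: the yield model used here, yield = 1 / (1 + defect density × die area / 2)², is an assumed form and should be checked against the equation on page 48. The function names are hypothetical, and the dies-per-wafer counts and die areas in the example are placeholders, not the values from the captions of figures 1.15 and 1.16.

```python
def die_yield(defects_per_cm2, die_area_cm2):
    """Assumed yield model: 1 / (1 + defects_per_cm2 * die_area_cm2 / 2) ** 2."""
    return 1.0 / (1.0 + defects_per_cm2 * die_area_cm2 / 2.0) ** 2


def cost_per_die(wafer_cost, dies_per_wafer, yield_fraction):
    """Wafer cost spread over the dies that actually work."""
    return wafer_cost / (dies_per_wafer * yield_fraction)


def cost_ratio(dies_a, area_a, dies_b, area_b, defects_per_cm2=1.0):
    """Cost per die of chip A divided by cost per die of chip B.

    The wafer cost cancels in the ratio, so it is set to 1.0 here.
    """
    cost_a = cost_per_die(1.0, dies_a, die_yield(defects_per_cm2, area_a))
    cost_b = cost_per_die(1.0, dies_b, die_yield(defects_per_cm2, area_b))
    return cost_a / cost_b


# Placeholder inputs only (NOT the values from figures 1.15 and 1.16).
example = cost_ratio(dies_a=100, area_a=2.0, dies_b=200, area_b=1.0)
print(f"cost ratio: {example:.2f}, area ratio: {2.0 / 1.0:.2f}")
```

Because the wafer cost cancels, only the dies per wafer, the die areas, and the defect density are needed to form the ratio.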
Consider two different implementations, M1 and M2, of the same instruction set. There are three classes of instructions (A, B, and C) in the instruction set. M1 has a clock rate of 400 MHz and M2 has a clock rate of 200 MHz. The average number of cycles for each instruction class on M1 and M2 is given in the following table:
```
Class   CPI on M1   CPI on M2   C1 usage   C2 usage   3rd party usage
  A         4           2          30%        30%          50%
  B         6           4          50%        20%          30%
  C         8           3          20%        50%          20%
```
The table also summarizes how three different compilers use the instruction set. C1 is a compiler produced by the makers of M1, C2 is a compiler produced by the makers of M2, and the third compiler is a third-party product. Assume that each compiler uses the same number of instructions for a given program but that the instruction mix is as described in the table. Using C1, how much faster can the makers of M1 claim M1 is compared to M2? Using C2, how much faster can the makers of M2 claim M2 is compared to M1? If you purchase M1, which compiler would you use? If you purchase M2, which compiler would you use? Which machine would you purchase, assuming all other criteria, including cost, were identical?
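One way to organize the comparison is sketched below in Python. It uses the CPI values, clock rates, and instruction mixes from the table. The dictionary and function names are hypothetical, and the sketch relies on the problem's stated assumption that every compiler emits the same number of instructions, so time per instruction is enough to compare machines.

```python
# CPI per instruction class on each machine, taken from the table.
CPI = {"M1": {"A": 4, "B": 6, "C": 8}, "M2": {"A": 2, "B": 4, "C": 3}}
CLOCK_MHZ = {"M1": 400, "M2": 200}
# Instruction mix produced by each compiler (fractions of all instructions).
MIX = {
    "C1": {"A": 0.30, "B": 0.50, "C": 0.20},
    "C2": {"A": 0.30, "B": 0.20, "C": 0.50},
    "third party": {"A": 0.50, "B": 0.30, "C": 0.20},
}


def average_cpi(machine, compiler):
    """Average CPI: each class's CPI weighted by its share of the mix."""
    return sum(MIX[compiler][c] * CPI[machine][c] for c in "ABC")


def time_per_instruction_us(machine, compiler):
    """Average microseconds per instruction: CPI / (cycles per microsecond)."""
    return average_cpi(machine, compiler) / CLOCK_MHZ[machine]


for compiler in MIX:
    t1 = time_per_instruction_us("M1", compiler)
    t2 = time_per_instruction_us("M2", compiler)
    print(f"{compiler}: M1 {t1:.4f} us, M2 {t2:.4f} us, M2/M1 time ratio {t2 / t1:.3f}")
```

A time ratio above 1 means M1 is faster with that compiler; below 1 means M2 is faster.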
For the following set of variables, identify all of the subsets which can be used to calculate execution time. Each subset should be minimal, i.e., not contain any variable which is not needed.
{CPI, clock rate, cycle time, MIPS, number of instructions in program, number of cycles in program}
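As a reminder of how these six quantities are defined in terms of one another, here is a small Python sketch. The numbers are hypothetical and chosen only for illustration; the sketch derives the remaining quantities from three of them and does not enumerate the subsets asked for.

```python
# Hypothetical measurements for one program run (illustrative values only).
instructions = 2_000_000
cpi = 1.5
clock_rate_hz = 100e6

cycle_time_s = 1.0 / clock_rate_hz              # cycle time is the reciprocal of clock rate
cycles = instructions * cpi                     # total cycles in the program
execution_time_s = cycles * cycle_time_s        # execution time in seconds
mips = instructions / (execution_time_s * 1e6)  # millions of instructions per second

print(f"execution time: {execution_time_s} s, MIPS: {mips:.2f}")
```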
Last year's MC48 students discovered in one of their labs the following facts about a particular program (TeX) processing a particular input file on a particular processor (the R3000):

Instruction class    Fraction of instructions    CPI for class
loads                .337                        1.20
other                .663                        1.15
Using techniques that we'll study in chapter 6, it would be possible to design a new processor (let's call it the S3000) otherwise like the R3000 but such that each load instruction would be replaced with one or two instructions with a CPI of approximately 1.00. The data from last year's lab showed that the average load instruction would be replaced by 1.67 of these new CPI 1.00 instructions. This would increase the total number of instructions needed to execute the program, but reduce the average CPI.
1. What is the average CPI for this program execution on the R3000? How about on the hypothetical S3000?
2. Let's use the variable I to designate the number of instructions it takes to execute TeX on the R3000. What is the number of instructions that will be necessary on the S3000?
3. Suppose the clock rates of the S3000 and R3000 are identical. If the performance of TeX on the two processors is stated in MIPS, which processor has the higher MIPS rate? How much faster (as measured in MIPS) is whichever machine is faster?
4. Suppose that the performance of TeX on the two processors is instead measured by the total execution time for the program. Now which processor is faster, and by how much?
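The bookkeeping for these four questions can be sketched in Python as follows. All quantities are expressed per original R3000 instruction, that is, in units of I. The variable names are hypothetical, and the sketch assumes that "otherwise like the R3000" means the non-load instructions keep their CPI of 1.15 on the S3000.

```python
LOAD_FRACTION, LOAD_CPI = 0.337, 1.20
OTHER_FRACTION, OTHER_CPI = 0.663, 1.15
REPLACEMENTS_PER_LOAD = 1.67  # new instructions per original load
REPLACEMENT_CPI = 1.00

# R3000: one instruction per unit of I; cycles are the mix-weighted CPI.
r3000_instructions = 1.0
r3000_cycles = LOAD_FRACTION * LOAD_CPI + OTHER_FRACTION * OTHER_CPI
r3000_cpi = r3000_cycles / r3000_instructions

# S3000: each load becomes 1.67 instructions on average, each at CPI 1.00.
# Assumption: the "other" instructions are unchanged (CPI 1.15).
s3000_instructions = OTHER_FRACTION + LOAD_FRACTION * REPLACEMENTS_PER_LOAD
s3000_cycles = (OTHER_FRACTION * OTHER_CPI
                + LOAD_FRACTION * REPLACEMENTS_PER_LOAD * REPLACEMENT_CPI)
s3000_cpi = s3000_cycles / s3000_instructions

# With identical clock rates, MIPS is proportional to 1 / CPI,
# and execution time is proportional to the total cycle count.
mips_ratio_s_over_r = r3000_cpi / s3000_cpi
time_ratio_s_over_r = s3000_cycles / r3000_cycles

print(f"CPI: R3000 {r3000_cpi:.4f}, S3000 {s3000_cpi:.4f}")
print(f"S3000 instruction count: {s3000_instructions:.4f} * I")
print(f"MIPS ratio (S3000 / R3000): {mips_ratio_s_over_r:.4f}")
print(f"Time ratio (S3000 / R3000): {time_ratio_s_over_r:.4f}")
```

Comparing the two ratios shows how a MIPS rating and total execution time can rank the same pair of processors differently.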


Instructor: Max Hailperin