Because there are an odd number of students in the class, one group will need to have three students in it. If you would like to nominate your group to be the group of three, please let me know. Also, if you have trouble finding a partner, let me know.
Because of this, the CPI of the R2000/R3000 is very nearly 1 even in
programs that do a lot of branching, so long as the program doesn't
use floating point instructions (which we haven't talked about) and so
long as we ignore memory-system effects (which we will until the next
chapter and the next lab). Of course, some of the instructions
executed at that CPI of 1 may be nop
instructions, if
nothing better can be found to do in the delay slots after control
transfers.
The newer R4000 version of the MIPS processor has a different pipeline structure. Rather than a five-stage pipeline, with branching handled in the second stage, the R4000 has an eight-stage pipeline, with branching handled in the fourth stage. (The reason for this is that each stage does less, so that the clock rate can be higher.)
In order to allow programs from the R2000/R3000 to run on the R4000 without change, a single instruction following each branch or jump is still executed unconditionally. However, this is no longer enough to take care of the full control hazard. Rather than conditionally flushing extra instructions out of the pipeline, the R4000 designers chose a simpler approach: the pipeline is simply stalled (after the one "delay slot" instruction) until it is known where instructions should be fetched from. Thus each branch (or jump) causes extra stall cycles, even if the branch winds up falling through rather than being taken. As a result, the CPI of the R4000 is increased. How much worse the CPI is depends on how much branching or jumping the program does.
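As a rough back-of-the-envelope model (a sketch only; the per-branch stall count below is an illustrative assumption, not a figure from this lab), the effect of these stalls on CPI can be expressed as:

```python
# Hedged sketch: how branch stalls raise CPI on an R4000-style pipeline.
# The per-branch stall count here is an illustrative assumption, not a
# number taken from this lab handout.

def estimated_cpi(branch_fraction, stalls_per_branch):
    """Base CPI of 1, plus the stall cycles each control transfer adds."""
    return 1.0 + branch_fraction * stalls_per_branch

# If 25% of instructions are branches or jumps, and each one
# (hypothetically) costs 2 extra stall cycles, CPI rises from 1 to 1.5.
print(estimated_cpi(0.25, 2))  # -> 1.5
```

The point of the lab's measurements is to replace the made-up branch fraction with real per-procedure data.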
Rather than test several separate programs, we can do a slight
variation on this experiment more easily. Namely, we can run one
largish program, composed of multiple procedures, and compare data
from each of several procedures. Since procedures differ from one
another, we will still get a variety of densities of control transfer
instructions, but with fewer logistical hassles, since only one
program needs to be run. We'll take this approach, running the troff program (a text formatter) on a sample text input
file, and gathering statistics from it that we can analyze separately
for each procedure.
You will need to do the experimentation on one of the Silicon Graphics
Indy machines, or at least remotely logged into one over the network.
The first phase of the experiment consists of making an instrumented
version of the troff
program and running it on the sample
input to produce statistical output about how often each portion of the
program is executed. Because the instrumented version of the program
is rather large, and you won't need it after the
experiment is over, I suggest you do this work in a temporary
directory (on the Indy's own disk drive) rather than in your home
directory (on the network file server). You can do this as follows
(in a terminal/shell window):
    mkdir /tmp/troff$$
    cd /tmp/troff$$
    ssrun -ideal /usr/local/bin/troff ~max/MC48/sample.text >/dev/null

The first line makes the temporary directory and the second changes your current directory to it. The third line does the actual work of instrumenting the program and running it on the sample text.
At this point you have a file called something like troff.ideal.2324. The number on the end will likely be
different. In order to analyze this data, you will use the
prof
program. To get an overall analysis of the whole
program, with the R3000's pipeline assumed, you can do
    prof -r3000 -op troff.ideal.* >troff-r3000

This puts the output into the file troff-r3000. You can
do the same thing with 4000 substituted for 3000 (in both places) to
get a file describing how the program would run on the R4000.
If you look in these files, you will see that one major feature is a
list of the procedures, starting with the ones that contribute the
most to the program's total number of cycles and working down to the
ones that are very minor. Pick a handful of procedures to focus on,
drawing from at or near the top of the list, since these are the ones
that run for a substantial number of cycles. For each one, rerun prof, telling it to focus on just that one procedure. For example, if you pick token::next, you would use
    prof -r3000 -O token::next -op troff.ideal.* >troff-r3000-token::next

in order to get a file, troff-r3000-token::next,
analyzing the behavior of that one procedure on the R3000 pipeline.
You should do this for each of the procedures you selected, on each of
the two processors.
At this point, you will have a bunch of files of analysis output from prof, each with a name starting with troff-r. These files would be worth hanging onto as the
basis from which you are going to draw conclusions, including testing
your pre-lab prediction. However, the other files in the temporary
directory, in particular the instrumented version of the troff program and its libraries, are large and now useless. Therefore, you should copy the output files from prof into your own directory and delete the temporary
directory with its contents. You can do this as follows:
    cd
    cp /tmp/troff$$/troff-r* .
    rm -rf /tmp/troff$$

The first line changes directory back to your home directory. (You could instead change to some subdirectory of your home directory.) The second line copies the output files from the temporary directory to the current (home) directory (be sure to include the space and period at the end of this command), and the final line removes the temporary directory and its contents (including the instrumented program).
Finally, it is worth considering whether some of the R4000's penalty
for its greater pipelining could be avoided. Rather than
unconditionally stalling the pipeline, the instructions after a branch
could be tentatively allowed to proceed, and then later flushed out
(except the first one) if the branch turns out to be taken. (This
approach is described in section 6.6.) Your data files indicate how
many of the branches are taken and how many are untaken. (You'll need
to combine the forward and backward branches.) Using this data, how
much better might the overall troff
program's CPI be on a
modified R4000 that used the flush-if-taken approach?
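One way to organize that calculation (a sketch only: every count below is a made-up placeholder, and the 2-cycle stall figure is an assumption; substitute the taken and untaken branch totals, forward and backward combined, from your own prof output files):

```python
# Hedged sketch of the flush-if-taken comparison. All counts below are
# made-up placeholders; plug in the taken/untaken branch totals
# (forward plus backward combined) from your own prof output. The
# per-branch stall count is likewise an assumption, not a lab figure.

STALLS_PER_BRANCH = 2  # assumed stall cycles per stalling branch

def cpi(instructions, stall_cycles):
    """CPI assuming a base of one cycle per instruction, plus stalls."""
    return (instructions + stall_cycles) / instructions

instructions = 1_000_000
taken = 120_000      # placeholder: taken branches
untaken = 80_000     # placeholder: untaken (fall-through) branches

# Plain R4000: every branch stalls, taken or not.
plain = cpi(instructions, (taken + untaken) * STALLS_PER_BRANCH)

# Modified R4000: only taken branches pay the flush penalty.
modified = cpi(instructions, taken * STALLS_PER_BRANCH)

print(plain, modified)  # with these placeholders: 1.4 1.24
```

The difference between the two results, computed from your real counts, is the CPI improvement the flush-if-taken design would buy.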
Instructor: Max Hailperin