In this lab, you will produce the core of a compiler by defining classes for an object-oriented Abstract Syntax Tree (AST) that can generate assembly language code. (In earlier courses you worked with ASTs, but they provided methods for evaluating or executing themselves, not for translating themselves into assembly language. That is, you developed interpreters rather than a compiler.)
You will also gain some insight into what other elements are needed for a full compiler by observing the limitations of your first compiler. You will have the chance to fill in some of those missing pieces in later labs.
Rather than starting from a completely blank slate, download c1.zip
and unzip it. You will get a directory
called c1
that contains two things: a build.xml
file and a src
directory that contains various Java source files. The build.xml
file
provides rules that the ant
tool can use to
automate the building and testing of your compiler. I briefly
describe this in the next section and will be explaining it more (with
a demonstration) at the beginning of the first lab period, for which
we will start in the normal classroom.
One point about the src
directory is very crucial.
When you use the ant
tool to build your compiler, it will
create a second directory called java
and copy all the
Java source files from src
to java
. Thus, you
will wind up with two copies of each of these files. It is
essential that you make any changes to files in the
src
directory, not the ones in the java
directory that might get lost the next time you run ant
.
If you care, I can explain the reason for this strange setup; the
simple version is that it will make a whole lot more sense starting
with the next lab, when we use tools that write some of the Java files
for us.
You should read through the code in the src
directory
and then try it out as described in the next section. There isn't
very much of it, so even though it is almost entirely non-commented, I
hope it will be reasonably understandable. If not, ask me questions;
we've got a nice small class. The Main.java
file
contains the main program for this very primitive compiler; the other
files provide supporting classes. If you look at the main program,
you will see that one way in which this compiler is very primitive is
that it has no way to read in the program it is compiling. Instead,
it has hardwired into it a single program to compile, created by
invocations of the constructors for AST classes. In your testing,
you will want to make changes to what program gets compiled by
changing what constructors are called. Currently the program consists
of the following: print out the integer constant 42. You'll see that
the relevant classes for this are defined, but no others. For
example, there are no AST classes for arithmetic expressions such as
sums and products, just for integer constants.
In order to use the tools we will be working with in this course,
including ant
, you will need to do some initial setting
up. This just needs doing the first time, not every time you work on
the labs.
If you are working on your own computer, you will
need Java 1.5 installed. Then download the
Compiler
Construction Kit, which contains ant
and the other
tools we will be using. To install it, use the command
java -jar CCKSetup-20050516.jar
You may well need additional assistance; let me know. The documentation is mostly in German, which I can read.
If you are working on an MCS lab machine, you
will need to set up some environment variables. Edit your
.cshrc
file, if you use tcsh
as your shell,
or your .profile
file, if you use bash
.
After editing the file, you will need to start up a new shell in order
for the changes to take effect. For
.cshrc
, the lines to add are as follows:
setenv CCK_HOME /Net/gac/home2/m/a/max/CompilerConstructionKit setenv PATH ${PATH}:$CCK_HOME/bin setenv JAVA_HOME /usr/java/current
For .profile
, change the setenv
at the
beginning of each line to export
and change the second
space on each line to an equal sign.
Once you have made it through the above setup, it should be very
easy to build and run the initial compiler. In a shell, change
directory into the c1
directory and give the command
ant
. You should see output describing the whole
multi-step process; near the end, the value 42 should be printed out
by the spim
simulator running the generated assembly
code.
When testing out a compiler, it is never enough just to run the
generated code and see that it behaves correctly. You should also
manually inspect the code and make sure it looks right; sometimes you
will catch a bug in your compiler that didn't wind up causing a
misbehavior in testing. In this case, after running ant
,
you should look at the assembly language output in
output.s
. If you care to, you could also load this
file into xspim
, which you should be familiar with from
MCS-284, in order to more visibly execute it.
If you run ant
a second time, a lot less should
happen, because this tool is smart enough to know that it doesn't need
to recopy and recompile Java files that haven't changed. If you edit
one of the source files (such as Main.java
) and rerun
ant
, you will see that the modified file is copied and
compiled, but the others aren't. (If you are familiar with the tool
called make
, this behavior should come as no
surprise.)
You should add five new AST classes for sum, difference, product,
quotient, and remainder expressions. The constructor for each of
these should take two Expr
arguments, the left and right
operands. Each should be able to generate the appropriate assembly
code; this process will involve recursively generating the assembly
code for the two operands. Note that your generated code should be very
plodding: it should contain the assembly code for one operand, the
code for the other operand, and then an arithmetic operation. You
might recognize that the arithmetic could all be done at compile time
and a single constant emitted. That is an important optimization,
known as constant folding. We'll get to it later. The
straightforward code is a better model for what is needed when you add
variables.
If you write these five new classes directly as subclasses of
Expr
, you will find they are very similar. To avoid
replicated code, factor out the common essence into a new class that they
all have as a superclass, and which itself is a subclass of
Expr
.
Test your additions out by making suitable changes in
Main.java
. You should try programs that contain nested
operations, such as a sum of two products. You will not be able to
try out all five operations in a single test program, however, because
this very primitive compiler would run out of $t
registers. (This is discussed in the limitations section below.)
It would also be good to add another variety of statement
(i.e., subclass of Stmt
) that can be used to string
together a sequence of statements. That way you will be able to test
out programs that print out more than one expression's value. If you
don't do this now, you will need to take care of it in the next lab,
in addition to the other objectives for that lab.
There are two approaches to this goal. One approach is to create a
kind of statement that is made from a list of other statements,
making use of some class that implements java.util.List
,
such as java.util.ArrayList
or java.util.Vector
. The other approach is to
create a kind of statement that is made from exactly two other
statements. Since each of them can itself be a composite, you can
string together an arbitrarily long sequence of statements. In this
case, it would desirable to also create an AST class for an empty
statement, which does nothing. This is similar to the way lists are
built in Scheme, using cons
pairs with an empty list at
the end. As usual, ask if you need this explained better.
This compiler has two obvious limitations. One is that the programming language is very small, lacking variables, loops, conditionals, recursive procedures, and arrays, for example. The other is that the compiler has no way of reading in an input program. Instead, the AST is directly constructed. We will be remedying both these limitations in subsequent labs. In the next lab, you will add a scanner and a parser so that the compiler can read in a source program and build the AST from it, before asking the AST to generate its code as in this lab. In subsequent labs, you will add variables, control flow structures, procedures, and other features.
Beyond these obvious limitations, another very serious limitation
is that the output code makes use of a $t
register for
each value and never reuses any of them. As such, only programs that
use 10 or fewer values can be compiled. (The MIPS architecture
contains registers $t0
through $t9
.) If you
try compiling a larger program, the compiler won't complain, but the
output file will contain such bogus registers as $t10
,
which will cause SPIM to choke when it reads in the assembly
language. One very important job for a compiler, which we will talk
about later in the semester, is allowing programs to use more values
than the number of available registers. In part this is achieved by
making use of memory locations, but the main technique is to reuse
registers. Just because a register held one value early in the
program's execution doesn't mean it can't hold another value later.
To get some feel for this, compile a large enough test program so that you
wind up with illegally large register numbers, which SPIM won't
handle. Then manually edit the output.s
file to reuse a
no-longer-needed register number instead. If you do this correctly,
you should get a program that works in SPIM, and you will now
understand what a compiler's register allocator needs to do.
Finally, by looking at the assembly language code, you can see that many improvements could be made in its efficiency. This gives a small preview of another topic we will address later in the course, optimization
Please turn in a printout of any files you modified or added. You should also turn in evidence of your testing, showing that all the new AST classes (i.e., new kinds of expressions and statements) work properly. Your testing should include an example where you needed to manually do the job of a register allocator, as described in the limitations section.