In this lab, you will produce the core of a compiler by defining classes for an object-oriented Abstract Syntax Tree (AST) that can generate assembly language code. (In earlier courses you worked with ASTs, but they provided methods for evaluating or executing themselves, not for translating themselves into assembly language. That is, you developed interpreters rather than a compiler.)
Rather than starting from a completely blank slate, download c1.zip
and unzip it. You will get a directory
called c1
that contains three things:
a build.xml
file,
a lib
directory containing a supporting library in LLVM assembly language,
and a src
directory that contains various source files, mostly written in Java. The build.xml
file
provides rules that the ant
tool can use to
automate the building and testing of your compiler. I briefly
describe this in the next section and will be explaining it more (with
a demonstration) at the beginning of the first lab period.
One point about the src
directory is very crucial.
When you use the ant
tool to build your compiler, it will
create a second directory called java
and copy all the
Java source files from src
to java
. Thus, you
will wind up with two copies of each of these files. It is
essential that you make any changes to files in the
src
directory, not the ones in the java
directory that might get lost the next time you run ant
.
This strange setup will make more sense starting
with the next lab, when we use tools that write some of the Java files
for us.
The Java code in the src
directory is documented using
the javadoc tool, which produces HTML web pages of documentation
from the Java code, taking into account specially formatted
comments. I've got the documentation linked here
as a convenience, though the next section describes how you can
generate your own copy. (This will be important once you start
modifying the compiler. My copy of the documentation won't include
your changes, whereas yours will.) You should read through this
documenation as well as the source code and then try it out as
described in the next section. If any of the code is mysterious to
you, ask me questions; we've got a nice small class.
The TestAST.java
file contains the main program for
this very primitive compiler; the other files provide supporting
classes. If you look at the main program, you will see that one way
in which this compiler is very primitive is that it has no way to
read in the program it is compiling. Instead, it has hardwired into
it a single program to compile, created by invocations of the
constructors for AST classes. In your testing, you will want to
make changes to what program gets compiled by changing what
constructors are called. Currently the program consists of the
following: print out a line containing the constant 42. You'll see
that the relevant classes for this are defined, but no others. For
example, there are no AST classes for arithmetic expressions such as
sums and products, just for constants and procedure calls.
In order to use the tools we will be working with in this course,
including ant
, you will need to do some initial setting
up. This just needs doing the first time, not every time you work on
the labs.
If you are working on your own computer, you will
need a current version of Java installed. Then download the
Compiler
Construction Kit, which contains ant
and the other
tools we will be using. To install it, use the command
java -jar CCKSetup-20050516.jar
You may well need additional assistance; let me know. The documentation is mostly in German, which I can read.
You will also need to download the LLVM system (which we are using
as our backend) from llvm.org and
install it. You only need the basic LLVM package, not
the llvm-gcc
C compiler, unless you want to modify the
small runtime library I am providing.
If you are working on an MCS lab machine, you
will need to set up some environment variables. Edit your
.cshrc
file, if you use tcsh
as your shell,
or your .profile
file, if you use bash
.
After editing the file, you will need to start up a new shell in order
for the changes to take effect. For
.cshrc
, the lines to add are as follows:
setenv MAX_HOME ~max setenv CCK_HOME $MAX_HOME/CompilerConstructionKit setenv PATH ${PATH}:$CCK_HOME/bin:$MAX_HOME/usr/local/bin
For .profile
, change the setenv
at the
beginning of each line to export
and change the second
space on each line to an equal sign.
Once you have made it through the above setup, it should be very
easy to build and run the initial compiler. In a shell, change
directory into the c1
directory and give the command
ant
. You should see output describing the whole
multi-step process; near the end, the value 42 should be printed out
by running the generated code.
When testing out a compiler, it is never enough just to run the
generated code and see that it behaves correctly. You should also
manually inspect the code and make sure it looks right; sometimes you
will catch a bug in your compiler that didn't wind up causing a
misbehavior in testing. In this case, after running ant
,
you should look at the LLVM assembly language output in
testProgram.ll
.
If you run ant
a second time, a lot less should
happen, because this tool is smart enough to know that it doesn't need
to recopy and recompile Java files that haven't changed. If you edit
one of the source files (such as TestAST.java
) and rerun
ant
, you will see that the modified file is copied and
compiled, but the others aren't. (If you are familiar with the tool
called make
, this behavior should come as no
surprise.)
You can also use ant
to perform other tasks. In
particular, the command ant doc
will produce
a doc
directory containing javadoc files generated from
the Java source code. To look at this documentation, open the
file doc/index.html
in a web browser.
You should add five new AST classes for sum, difference, product,
quotient, and remainder expressions, called
SumExpr
, DifferenceExpr
, ProductExpr
,
QuotientExpr
, and RemainderExpr
. The constructor for each of
these should take two Expr
arguments, the left and right
operands. Each should be able to generate the appropriate assembly
code; this process will involve recursively generating the assembly
code for the two operands. Note that your generated code should be very
plodding: it should contain the assembly code for one operand, the
code for the other operand, and then an arithmetic operation. You
might recognize that the arithmetic could all be done at compile time
and a single constant emitted. That is an important optimization,
known as constant folding. We'll get to it later. The
straightforward code is a better model for what is needed when you add
variables.
A typical LLVM instruction to add two values together would be as follows:
%t3 = add i32 %t1, %t2
The instructions for the other arithmetic operations are similar,
using the LLVM
opcodes sub
, mul
, sdiv
, and srem
.
If you write these five new classes directly as subclasses of
Expr
, you will find they are very similar. To avoid
replicated code, factor out the common essence into a new class that they
all have as a superclass, and which itself is a subclass of
Expr
.
Test your additions out by making suitable changes in
TestAST.java
. You should try programs that contain nested
operations, such as a sum of two products. You can even
try out all five operations in a single test program.
Please turn in a printout of any files you modified or added. You should also turn in evidence of your testing, showing that all the new AST classes (i.e., new kinds of expressions) work properly. It would be good if your new classes included javadoc comments, similar to those in the files I provided.