MCS-388 Lab 1: Generating Code from ASTs (Spring 2009)

Due: March 3, 2009


In this lab, you will produce the core of a compiler by defining classes for an object-oriented Abstract Syntax Tree (AST) that can generate assembly language code. (In earlier courses you worked with ASTs, but they provided methods for evaluating or executing themselves, not for translating themselves into assembly language. That is, you developed interpreters rather than a compiler.)

Starting point

Rather than starting from a completely blank slate, download and unzip it. You will get a directory called c1 that contains three things: a build.xml file, a lib directory containing a supporting library in LLVM assembly language, and a src directory that contains various source files, mostly written in Java. The build.xml file provides rules that the ant tool can use to automate the building and testing of your compiler. I briefly describe this in the next section and will be explaining it more (with a demonstration) at the beginning of the first lab period.

One point about the src directory is very crucial. When you use the ant tool to build your compiler, it will create a second directory called java and copy all the Java source files from src to java. Thus, you will wind up with two copies of each of these files. It is essential that you make any changes to files in the src directory, not the ones in the java directory that might get lost the next time you run ant. This strange setup will make more sense starting with the next lab, when we use tools that write some of the Java files for us.

The Java code in the src directory is documented using the javadoc tool, which produces HTML web pages of documentation from the Java code, taking into account specially formatted comments. I've got the documentation linked here as a convenience, though the next section describes how you can generate your own copy. (This will be important once you start modifying the compiler. My copy of the documentation won't include your changes, whereas yours will.) You should read through this documenation as well as the source code and then try it out as described in the next section. If any of the code is mysterious to you, ask me questions; we've got a nice small class. The file contains the main program for this very primitive compiler; the other files provide supporting classes. If you look at the main program, you will see that one way in which this compiler is very primitive is that it has no way to read in the program it is compiling. Instead, it has hardwired into it a single program to compile, created by invocations of the constructors for AST classes. In your testing, you will want to make changes to what program gets compiled by changing what constructors are called. Currently the program consists of the following: print out a line containing the constant 42. You'll see that the relevant classes for this are defined, but no others. For example, there are no AST classes for arithmetic expressions such as sums and products, just for constants and procedure calls.

Building and running the initial compiler

In order to use the tools we will be working with in this course, including ant, you will need to do some initial setting up. This just needs doing the first time, not every time you work on the labs.

Once you have made it through the above setup, it should be very easy to build and run the initial compiler. In a shell, change directory into the c1 directory and give the command ant. You should see output describing the whole multi-step process; near the end, the value 42 should be printed out by running the generated code.

When testing out a compiler, it is never enough just to run the generated code and see that it behaves correctly. You should also manually inspect the code and make sure it looks right; sometimes you will catch a bug in your compiler that didn't wind up causing a misbehavior in testing. In this case, after running ant, you should look at the LLVM assembly language output in testProgram.ll.

If you run ant a second time, a lot less should happen, because this tool is smart enough to know that it doesn't need to recopy and recompile Java files that haven't changed. If you edit one of the source files (such as and rerun ant, you will see that the modified file is copied and compiled, but the others aren't. (If you are familiar with the tool called make, this behavior should come as no surprise.)

You can also use ant to perform other tasks. In particular, the command ant doc will produce a doc directory containing javadoc files generated from the Java source code. To look at this documentation, open the file doc/index.html in a web browser.

Changes you need to make

You should add five new AST classes for sum, difference, product, quotient, and remainder expressions, called SumExpr, DifferenceExpr, ProductExpr, QuotientExpr, and RemainderExpr. The constructor for each of these should take two Expr arguments, the left and right operands. Each should be able to generate the appropriate assembly code; this process will involve recursively generating the assembly code for the two operands. Note that your generated code should be very plodding: it should contain the assembly code for one operand, the code for the other operand, and then an arithmetic operation. You might recognize that the arithmetic could all be done at compile time and a single constant emitted. That is an important optimization, known as constant folding. We'll get to it later. The straightforward code is a better model for what is needed when you add variables.

A typical LLVM instruction to add two values together would be as follows:

    %t3 = add i32 %t1, %t2

The instructions for the other arithmetic operations are similar, using the LLVM opcodes sub, mul, sdiv, and srem.

If you write these five new classes directly as subclasses of Expr, you will find they are very similar. To avoid replicated code, factor out the common essence into a new class that they all have as a superclass, and which itself is a subclass of Expr.

Test your additions out by making suitable changes in You should try programs that contain nested operations, such as a sum of two products. You can even try out all five operations in a single test program.

What you should turn in

Please turn in a printout of any files you modified or added. You should also turn in evidence of your testing, showing that all the new AST classes (i.e., new kinds of expressions) work properly. It would be good if your new classes included javadoc comments, similar to those in the files I provided.

Course web site:
Instructor: Max Hailperin <>