MCS-388 Lab 1: Generating Code from ASTs (Spring 2006)

Due: February 24, 2006

Objective

In this lab, you will produce the core of a compiler by defining classes for an object-oriented Abstract Syntax Tree (AST) that can generate assembly language code. (In earlier courses you worked with ASTs, but they provided methods for evaluating or executing themselves, not for translating themselves into assembly language. That is, you developed interpreters rather than a compiler.)

You will also gain some insight into what other elements are needed for a full compiler by observing the limitations of your first compiler. You will have the chance to fill in some of those missing pieces in later labs.

Starting point

Rather than starting from a completely blank slate, download c1.zip and unzip it. You will get a directory called c1 that contains two things: a build.xml file and a src directory that contains various Java source files. The build.xml file provides rules that the ant tool can use to automate the building and testing of your compiler. I briefly describe this in the next section and will be explaining it more (with a demonstration) at the beginning of the first lab period, for which we will start in the normal classroom.

One point about the src directory is very crucial. When you use the ant tool to build your compiler, it will create a second directory called java and copy all the Java source files from src to java. Thus, you will wind up with two copies of each of these files. It is essential that you make any changes to files in the src directory, not the ones in the java directory that might get lost the next time you run ant. If you care, I can explain the reason for this strange setup; the simple version is that it will make a whole lot more sense starting with the next lab, when we use tools that write some of the Java files for us.

You should read through the code in the src directory and then try it out as described in the next section. There isn't very much of it, so even though it is almost entirely non-commented, I hope it will be reasonably understandable. If not, ask me questions; we've got a nice small class. The Main.java file contains the main program for this very primitive compiler; the other files provide supporting classes. If you look at the main program, you will see that one way in which this compiler is very primitive is that it has no way to read in the program it is compiling. Instead, it has hardwired into it a single program to compile, created by invocations of the constructors for AST classes. In your testing, you will want to make changes to what program gets compiled by changing what constructors are called. Currently the program consists of the following: print out the integer constant 42. You'll see that the relevant classes for this are defined, but no others. For example, there are no AST classes for arithmetic expressions such as sums and products, just for integer constants.

Building and running the initial compiler

In order to use the tools we will be working with in this course, including ant, you will need to do some initial setting up. This just needs doing the first time, not every time you work on the labs.

Once you have made it through the above setup, it should be very easy to build and run the initial compiler. In a shell, change directory into the c1 directory and give the command ant. You should see output describing the whole multi-step process; near the end, the value 42 should be printed out by the spim simulator running the generated assembly code.

When testing out a compiler, it is never enough just to run the generated code and see that it behaves correctly. You should also manually inspect the code and make sure it looks right; sometimes you will catch a bug in your compiler that didn't wind up causing a misbehavior in testing. In this case, after running ant, you should look at the assembly language output in output.s. If you care to, you could also load this file into xspim, which you should be familiar with from MCS-284, in order to more visibly execute it.

If you run ant a second time, a lot less should happen, because this tool is smart enough to know that it doesn't need to recopy and recompile Java files that haven't changed. If you edit one of the source files (such as Main.java) and rerun ant, you will see that the modified file is copied and compiled, but the others aren't. (If you are familiar with the tool called make, this behavior should come as no surprise.)

Changes you need to make

You should add five new AST classes for sum, difference, product, quotient, and remainder expressions. The constructor for each of these should take two Expr arguments, the left and right operands. Each should be able to generate the appropriate assembly code; this process will involve recursively generating the assembly code for the two operands. Note that your generated code should be very plodding: it should contain the assembly code for one operand, the code for the other operand, and then an arithmetic operation. You might recognize that the arithmetic could all be done at compile time and a single constant emitted. That is an important optimization, known as constant folding. We'll get to it later. The straightforward code is a better model for what is needed when you add variables.

If you write these five new classes directly as subclasses of Expr, you will find they are very similar. To avoid replicated code, factor out the common essence into a new class that they all have as a superclass, and which itself is a subclass of Expr.

Test your additions out by making suitable changes in Main.java. You should try programs that contain nested operations, such as a sum of two products. You will not be able to try out all five operations in a single test program, however, because this very primitive compiler would run out of $t registers. (This is discussed in the limitations section below.)

Changes it would be good to make

It would also be good to add another variety of statement (i.e., subclass of Stmt) that can be used to string together a sequence of statements. That way you will be able to test out programs that print out more than one expression's value. If you don't do this now, you will need to take care of it in the next lab, in addition to the other objectives for that lab.

There are two approaches to this goal. One approach is to create a kind of statement that is made from a list of other statements, making use of some class that implements java.util.List, such as java.util.ArrayList or java.util.Vector. The other approach is to create a kind of statement that is made from exactly two other statements. Since each of them can itself be a composite, you can string together an arbitrarily long sequence of statements. In this case, it would desirable to also create an AST class for an empty statement, which does nothing. This is similar to the way lists are built in Scheme, using cons pairs with an empty list at the end. As usual, ask if you need this explained better.

Limitations

This compiler has two obvious limitations. One is that the programming language is very small, lacking variables, loops, conditionals, recursive procedures, and arrays, for example. The other is that the compiler has no way of reading in an input program. Instead, the AST is directly constructed. We will be remedying both these limitations in subsequent labs. In the next lab, you will add a scanner and a parser so that the compiler can read in a source program and build the AST from it, before asking the AST to generate its code as in this lab. In subsequent labs, you will add variables, control flow structures, procedures, and other features.

Beyond these obvious limitations, another very serious limitation is that the output code makes use of a $t register for each value and never reuses any of them. As such, only programs that use 10 or fewer values can be compiled. (The MIPS architecture contains registers $t0 through $t9.) If you try compiling a larger program, the compiler won't complain, but the output file will contain such bogus registers as $t10, which will cause SPIM to choke when it reads in the assembly language. One very important job for a compiler, which we will talk about later in the semester, is allowing programs to use more values than the number of available registers. In part this is achieved by making use of memory locations, but the main technique is to reuse registers. Just because a register held one value early in the program's execution doesn't mean it can't hold another value later. To get some feel for this, compile a large enough test program so that you wind up with illegally large register numbers, which SPIM won't handle. Then manually edit the output.s file to reuse a no-longer-needed register number instead. If you do this correctly, you should get a program that works in SPIM, and you will now understand what a compiler's register allocator needs to do.

Finally, by looking at the assembly language code, you can see that many improvements could be made in its efficiency. This gives a small preview of another topic we will address later in the course, optimization

What you should turn in

Please turn in a printout of any files you modified or added. You should also turn in evidence of your testing, showing that all the new AST classes (i.e., new kinds of expressions and statements) work properly. Your testing should include an example where you needed to manually do the job of a register allocator, as described in the limitations section.


Course web site: http://www.gustavus.edu/+max/courses/S2006/MCS-388/
Instructor: Max Hailperin <max@gustavus.edu>