MCS-388 Lab 3: Generating Better Code from ASTs (Spring 2001)

Due: March 23, 2001

Objective

This time the objective isn't to expand the source language: c3 will compile the exact same programs as c2. Instead, the goal will be to generate higher quality assembly language output, by using a more sophisticated approach to code generation. Rather than generating the assembly language code as the parsing is being done, an Abstract Syntax Tree (AST) data structure will be built during the parsing, and then the AST object will be asked to generate the code. The code it generates should have the following quality improvements:

Code skeleton

In ~max/JavaLib/edu/gac/max/mcs388/s2001 I have provided the following files as a starting point; you can start from your own c2 files if you'd rather.

The basic idea embodied in these files is that the parser builds an AST, and then the Main simply asks that AST to generate its assembly language code. As you can see from reading the parser, the AST is represented using three abstract classes: Expr (for expressions), Stmt (for statements), and Proc (for procedures). Each of these has a number of concrete subclasses. For example, when an assignment statement is parsed, a new AssignStmt is created, where AssignStmt is a subclass of Stmt. Similarly, when a variable is used as an expression, a VarExpr is created, which is a kind of Expr. There is no reason why these subclasses need to be direct subclasses: if there is some commonality you would like to factor out (for example, among the various kinds of arithmetic expressions, or among those arithmetic expressions with commutative operators), you might want to introduce one or more intermediate levels of class hierarchy. Also be aware that the abstract classes (e.g., Expr) needn't be purely abstract: if there is some functionality you can implement at that level of hierarchy, go right ahead.

From the Main.java file you can see that the Proc class (or its concrete subclass, MainProc), needs to support a gen method that produces a String of assembly language code. Presumably this will be generated by calling a similar gen method on the body statement. This gen method of the Stmt class should not only return the assembly language code, but also information about what $s or $ra registers are set by that code, so that the MainProc class can surround the statement code with suitable prolog and epilog code that does the right saving and restoring. One reasonable way to do this is to have the Stmt class gen method take a Hashtable argument, into which it stores all the registers that need saving/restoring for its sake. Then the prolog and epilog can be produced by iterating through the contents of that table.

The gen method of the AssignStmt and PrintIntStmt classes will need to in turn call upon an Expr class gen method to generate code for the expression. In order to achieve targeting, this Expr class gen method should take a destination argument. However, because of nested subexpressions, not all expressions will have a particular target destination. One reasonable solution is to always pass in an instance of a RegisterDestination class, which is a container that can hold a Register (when targeting is desired) or null (when not). If the RegisterDestination object holds null, then the Expr is responsible for choosing a suitable destination register on its own, and then storing it into the RegisterDestination holder. That way the caller gets not only the assembly language code (as the String returned by the gen method) but also the destination register used by that code, as the new contents of the RegisterDestination.

For some expressions, when no specific target register is specified, an existing register can be used as the destination, such as the $zero register for the constant 0, or a $s register for variables. (In this case, the generated code string returned would be the empty string.) For other expressions, a temporary register will need to be chosen for the destination. Subexpressions may also need temporary registers. In order to support this, one reasonable design is to have the Expr class's gen method take a second argument (in addition to the RegisterDestination), namely the RegisterAllocator from which to allocate temporaries. Any temporaries that it allocates it should also deallocate before returning, except the one it stores into the RegisterDestination.

In order to support constant folding and the use of immediates, it would probably help for the Expr class to also have methods for determining whether an Expr has a constant value and if so, what that value is. (Note that Exprs that aren't explicitly ConstExprs may well have constant values. For example, a SumExpr has a constant value if both of its operands do. The alternative of ensuring that a ConstExpr is produced whenever the value is known to be a constant is probably more trouble than it is worth.)

Some Design Flaws

The above design assumes that the only type will ever be int, and that the only place variables will ever be stored is in registers. I did this to keep this lab more manageable. If you want to build in some additional generality in these regards, talk with me.

Postlab

Write a lab report showing your c3 modules. You should also describe the tests you ran, and any problems uncovered. Be sure to be thorough with your testing, because there is lots of potential in this lab for special cases that need individual testing. You shouldn't do excessively many tests or excessively complex ones, but you should carefully choose the tests to cover all your code's cases.


Course web site: http://www.gac.edu/~max/courses/S2001/MCS-388/
Instructor: Max Hailperin <max@gac.edu>