MCS-388 Lab 3: Generating Better Code from ASTs (Spring 2003)

Due: April 11, 2003

Objective

This time the objective isn't to expand the source language: c3 will compile the exact same programs as c2. Instead, the goal will be to generate higher quality assembly language output, by using a more sophisticated approach to code generation. Rather than generating the assembly language code as the parsing is being done, an Abstract Syntax Tree (AST) data structure will be built during the parsing, and then the AST object will be asked to generate the code. The code it generates should have the following quality improvements:

If the operands of an arithmetic operator are constants, the operation should be done at compile time rather than run-time. For example, a statement to assign 2+3*4 to a variable should generate assembly output that simply loads 14 into the register. (This is known as constant folding.)
If only the right operand of an arithmetic operator is constant, then the immediate form of the MIPS assembly language instruction should be used, in which the right operand of the assembly language instruction is a constant rather than a register. (You can just generate normal instructions like add, without the i on the end, and the MIPS assembler (or xspim) is smart enough to figure out that with the right operand a number, instead of a register, you mean an immediate instruction. What it actually generates at the machine language level will vary.)
If only the left operand of an arithmetic operator is constant, and the operator is a commutative one (addition or multiplication, currently), then the operands should be reversed and an immediate instruction used.
If a constant needs to be made available in a register (for example, to be the left operand of a non-commutative operation), and the constant happens to be 0, then rather than loading 0 into a temporary register, the special zero register should be used.
When computing an expression for assignment to a variable or for printing, the value shouldn't be computed into a temporary register and then copied from there to the variable's register or the $a0 argument register. Instead, it should be computed directly into the variable's register or the argument register. (This is known as targeting.)
Rather than always saving and restoring all $s registers and $ra, only those of these registers that are actually stored into by the procedure should be saved and restored. Similarly, the stack frame size should be reduced from 40 to whatever is actually needed, and if no stack frame is needed at all (because none of these registers are stored into), the subtraction from and addition to $sp should be completely eliminated.

Code skeleton

In ~max/JavaLib/edu/gac/max/mcs388/s2003 I am providing the following files as a starting point. You might want start from your own c2 files instead, or do some mixture of the two.

The basic idea embodied in these files is that the parser builds an AST, and then the Main simply asks that AST to generate its assembly language code. As you can see from reading the parser, the AST is represented using three abstract classes: Expr (for expressions), Stmt (for statements), and Proc (for procedures). Each of these has a number of concrete subclasses. For example, when an assignment statement is parsed, a new AssignStmt is created, where AssignStmt is a subclass of Stmt. Similarly, when a variable is used as an expression, a VarExpr is created, which is a kind of Expr. There is no reason why these subclasses need to be direct subclasses: if there is some commonality you would like to factor out (for example, among the various kinds of arithmetic expressions, or among those arithmetic expressions with commutative operators), you might want to introduce one or more intermediate levels of class hierarchy. Also be aware that the abstract classes (e.g., Expr) needn't be purely abstract: if there is some functionality you can implement at that level of hierarchy, go right ahead.

From the Main.java file you can see that the Proc class (or its concrete subclass, MainProc), needs to support a gen method that produces a String of assembly language code. Presumably this will be generated by calling a similar gen method on the body statement. This gen method of the Stmt class should not only return the assembly language code, but also information about what $s or $ra registers are set by that code, so that the MainProc class can surround the statement code with suitable prolog and epilog code that does the right saving and restoring. One reasonable way to do this is to have the Stmt class gen method take a java.util.HashSet argument, into which it stores all the Registers that need saving/restoring for its sake. Then the prolog and epilog can be produced by iterating through the contents of that set.

The gen method of the AssignStmt and PrintIntStmt classes will need to in turn call upon an Expr class gen method to generate code for the expression. In order to achieve targeting, this Expr class gen method should take a destination argument. However, because of nested subexpressions, not all expressions will have a particular target destination. One reasonable solution is to always pass in an instance of a RegisterDestination class, which is a container that can hold a Register (when targeting is desired) or null (when not). If the RegisterDestination object holds null, then the Expr is responsible for choosing a suitable destination register on its own, and then storing it into the RegisterDestination holder. That way the caller gets not only the assembly language code (as the String returned by the gen method) but also the destination register used by that code, as the new contents of the RegisterDestination.

For some expressions, when no specific target register is specified, an existing register can be used as the destination, such as the $zero register for the constant 0, or a $s register for variables. (In this case, the generated code string returned would be the empty string.) For other expressions, a temporary register will need to be chosen for the destination. Subexpressions may also need temporary registers. In order to support this, one reasonable design is to have the RegisterDestination class contain as a static variable the RegisterAllocator from which to allocate temporaries. That way an Expr with no preference as to destination can ask the RegisterDestination to choose a temporary. Before doing so, the Expr should ask the subexpression's RegisterDestinations to deallocate their Registers, if they came from the RegisterAllocator of temporaries. (Even if the Expr doesn't request an allocation, it should ask for the deallocation of the subexpressions.)

In summary, the RegisterDestination class could support the following operations, or the equivalent, where rd is a RegisterDestination:

new RegisterDestination(): create with no Register specified
new RegisterDestination(someRegister): create with a Register specified
rd.specify(someRegister): specify a register if none yet
rd.getRegister(): gets the Register that was specified
rd.toString(): returns the Register's name
rd.ensureSpecified(): if not yet specified, allocate a temp and use it
rd.deallocateIfTemp(): if ensureSpecified allocated a temp, return it to the RegisterAllocator

Actually, if you look at how the specify operation would be used, a convenient variant might be

rd.moveFrom(someRegister): returns a String of assembly code: if a Register had not previously been specified for rd, sets rd to someRegister and returns an empty string; otherwise, returns a move instruction to move from someRegister to rd's register.

In order to support constant folding and the use of immediates, it would probably help for the Expr class to also have methods for determining whether an Expr has a constant value and if so, what that value is. (Note that Exprs that aren't explicitly ConstExprs may well have constant values. For example, a SumExpr has a constant value if both of its operands do. The alternative of ensuring that a ConstExpr is produced whenever the value is known to be a constant is probably more trouble than it is worth.)

Some Design Flaws

The above design assumes that the only type will ever be int, and that the only place variables will ever be stored is in registers. I did this to keep this lab more manageable. If you want to build in some additional generality in these regards, talk with me. Also, the suggested moveFrom operation in the RegisterDestination class is of questionable coherence; you could think some more about that sort of design issue if you felt so inclined.

Postlab

Write a lab report showing your c3 modules. You should also describe the tests you ran, and any problems uncovered. Be sure to be thorough with your testing, because there is lots of potential in this lab for special cases that need individual testing. You shouldn't do excessively many tests or excessively complex ones, but you should carefully choose the tests to cover all your code's cases.

Course web site: http://www.gac.edu/~max/courses/S2003/MCS-388/
Instructor: Max Hailperin <max@gac.edu>