MCS-388 Lab 3: Adding Variables (Spring 2006)

Due: April 4, 2006

Objective

In this lab, you will add variables to your compiler's source programming language, with the ability to declare variables, assign values to them, and use them in expressions.

Starting point

Copy your c2 directory from the previous lab as c3, because your new compiler will build on your work from the prior lab. If you had difficulty with the previous lab, you should talk with me to make sure you have a solid foundation to build on.

Basic declaration and assignment statements

Initially you should add just simple declaration statements and assignment statements to your language. A declaration statement at this point should contain only a single variable and not have any initializer. An example would be

int exampleVariable1;

An assignment statement assigns the value of an expression to a variable. An example would be

fellerNumber = 10 + 7;

In order to add these features to the source language, you will need to make additions to the lexical analyzer and parser, as well as adding two new subclasses of Stmt for declaration statements and assignment statements.

The generate method for a declaration statement should return an empty string of assembly code, or possibly a string that just consists of a comment, if you want to have something visible to humans. (A comment in MIPS assembly language starts with # and extends through the end of the line.)

Although the generate method for a declaration statement returns empty assembly code, that doesn't mean that this method has no effect. The method should insert the declared name into a symbol table along with a register allocated to hold that variable. (For now, you can allocate a $t register in the same way as in other parts of the compiler.)

Because the symbol table is used to communicate between declarations and uses, such as in assignment statements, it needs to be accessible to all these AST nodes. The simplest reasonable approach to the symbol table would to use a separate class, SymbolTable, which you define with just static methods (for declaring a variable and for looking up its register) and a static variable that holds the table itself. Using the modern Java generics, a declaration for the table might show that it maps strings (names of variables) to strings (names of registers):

private static Map<String,String> table = new HashMap<String,String>();

(This presumes you have imported the names Map and HashMap from java.util.)

You can temporarily ignore the possibilities that a program might declare the same variable twice or might use a variable (in an assignment statement) without a previous declaration.

You should test your compiler at this stage in its evolution. Because you have not yet added the ability to use variables in expressions, you will need to look at the assembly language output (in output.s) to see if assignment statements are correctly moving their expressions' values into the variables' registers.

The remaining portions of this lab are independent of each other and so can be done in any order. You should test your work as you complete each section.

Allowing variables to serve as expressions

Having paused for testing at the first point where the language was at all usable, you can now proceed to make the language actually useful. Modify the grammar to allow a variable to serve as an expression and create a corresponding new subclass of Expr. You should now be able to test using programs that declare variables, assign values to them, and then make use of the variables in further computations.

Handling errors

Now you need to cope with the possibility that a program being compiled might contain repeated declarations for the same variable and might contain uses of undeclared variables. You should report an error message for each of these situations.

Error messages should be specific, which means they should include the name of the variable in question and also the position within the input at which the variable was redeclared or used undeclared. To get the position, you will need to pass additional information from the parser into the AST node constructor. In the parser, if a grammar production refers to a terminal symbol using something like

IDENTIFIER:name

then in the accompanying Java action code, you can not only make use of name to refer to the attribute of the IDENTIFIER token, you can also use nameleft to refer to the character position at which that token begins.

Reporting up to five error messages per run of your compiler is reasonable. After the fifth error message, the compilation should be terminated without checking for any further errors. If any errors at all are found (even less than five), the compilation should terminate without producing any assembly output and with a non-zero exit code, which will cause ant to stop without running spim. These features will be easiest to add if you provide a central error-reporting method, perhaps a static method in the CompilerMain class.

If a variable is reported as undeclared, the user will generally not be interested in seeing further error messages regarding that same variable. The easiest way to suppress further messages for the same variable is to pretend the variable is declared.

Making the syntax more flexible

Rather than only allowing variables to be declared one at a time, and rather than forcing initialization to take place in separate assignment statements, it would be nice to support a syntax like

int x, fellerNumber = 10 + 7, answer = fellerNumber + 25;

This change can be made purely in your lexical analyzer and parser; you can generate the same AST as if the declaration statement had been broken apart into

int x;
int fellerNumber;
fellerNumber = 10 + 7;
int answer;
answer = fellerNumber + 25;

Using saved registers

Instead of using the $t registers for variables as well as other purposes, you can use $s registers for the variables. At its core, this change is an easy matter of using a different name allocator. However, you should also change the generate method in the Procedure class so that it saves into the stack frame all the $s registers that are used and at the end restores them all. The amount of stack space allocated and deallocated should also be adjusted accordingly.

What you should turn in

Please turn in a printout of the final version of any files you modified or added. You should also turn in evidence of your testing of the final version.


Course web site: http://www.gustavus.edu/+max/courses/S2006/MCS-388/
Instructor: Max Hailperin <max@gustavus.edu>