MCS-388 Lab 3: Adding Variables (Spring 2013)

Due: April 16, 2013

Objective

In this lab, you will add variables to your compiler's source programming language, with the ability to declare variables, assign values to them, and use them in expressions.

Starting point

Copy your c2 project from the previous lab as c3, because your new compiler will build on your work from the prior lab. If you had difficulty with the previous lab, you should talk with me to make sure you have a solid foundation to build on.

Basic declaration and assignment statements

Initially you should add just simple declaration statements and assignment statements to your language. A declaration statement at this point should contain only a single variable and not have any initializer. An example would be

int exampleVariable1;

An assignment statement assigns the value of an expression to a variable. An example would be

fellerNumber = 10 + 7;

In order to add these features to the source language, you will need to make additions to the lexical analyzer and parser, as well as adding two new subclasses of Stmt for declaration statements and assignment statements.

The toLLVM method for a declaration statement should return a string of LLVM assembly code that allocates stack memory for the variable and makes an LLVM name point to that memory space, as in this example:

%t0 = alloca i32

Human readers of the LLVM assembly code (such as yourself) might benefit if you added a comment containing the source variable name. A comment in LLVM assembly language starts with ; and extends through the end of the line.

The toLLVM method for a declaration statement should have one additional effect. The method should insert the declared name into a symbol table along with the LLVM name that was allocated to point to that variable's storage. (So, for example, the symbol table might indicate that exampleVariable1 has been allocated %t0 as its LLVM pointer.)

Because the symbol table is used to communicate between declarations and uses, such as in assignment statements, it needs to be accessible to all these AST nodes. The simplest reasonable approach to the symbol table would to use a separate class, SymbolTable, which you define with just static methods (for declaring a variable and for looking up its LLVM counterpart) and a static variable that holds the table itself. Using Java generics, a declaration for the table might show that it maps strings (names of source-code variables) to strings (names of LLVM pointers):

private static Map<String,String> table = new HashMap<String,String>();

(This presumes you have imported the names Map and HashMap from java.util.)

You can temporarily ignore the possibilities that a program might declare the same variable twice or might use a variable (in an assignment statement) without a previous declaration.

For each assignment statement, you will want to generate an LLVM store instruction. For example,

store i32 %t3, i32* %t0

would store the value named %t3 into the memory location pointed to by the pointer %t0.

You should test your compiler at this stage in its evolution. (In fact, you could even have tested with just declarations, before adding assignment statements.) Because you have not yet added the ability to use variables in expressions, you will need to look at the assembly language output (in testProgram.ll) to see if assignment statements are correctly storing their expressions' values into the locations pointed to by the variables' pointers.

The remaining portions of this lab are independent of each other and so can be done in any order. You should test your work as you complete each section.

Allowing variables to serve as expressions

Having paused for testing at the first point where the language was at all usable, you can now proceed to make the language actually useful. Modify the grammar to allow a variable to serve as an expression and create a corresponding new subclass of Expr. The LLVM code will make use of the load instruction, as shown here:

%t4 = load i32* %t0

You should now be able to test using programs that declare variables, assign values to them, and then make use of the variables in further computations.

Handling errors

Now you need to cope with the possibility that a program being compiled might contain repeated declarations for the same variable and might contain uses of undeclared variables. You should report an error message for each of these situations.

Error messages should be specific, which means they should include the name of the variable in question and also the position within the input at which the variable was redeclared or used undeclared. To get the position, you will need to pass additional information from the parser into the AST node constructor. In the parser, if a grammar production refers to a terminal symbol using something like

IDENTIFIER:name

then in the accompanying Java action code, you can not only make use of name to refer to the attribute of the IDENTIFIER token, you can also use nameleft to refer to the character position at which that token begins.

Reporting up to five error messages per run of your compiler is reasonable. After the fifth error message, the compilation should be terminated without checking for any further errors. If any errors at all are found (even less than five), the compilation should terminate without producing any assembly output and with a non-zero exit code, which will cause ant to stop without running subsequent steps. These features will be easiest to add if you provide a central error-reporting method, perhaps a static method in the Compiler class.

If a variable is reported as undeclared, the user will generally not be interested in seeing further error messages regarding that same variable. The easiest way to suppress further messages for the same variable is to pretend the variable is declared.

Making the syntax more flexible

Rather than only allowing variables to be declared one at a time, and rather than forcing initialization to take place in separate assignment statements, it would be nice to support a syntax like

int x, fellerNumber = 10 + 7, answer = fellerNumber + 25;

This change can be made purely in your lexical analyzer and parser; you can generate the same AST as if the declaration statement had been broken apart into

int x;
int fellerNumber;
fellerNumber = 10 + 7;
int answer;
answer = fellerNumber + 25;

What you should turn in

Please turn in a zip file of the final version of your project.

Course web site: http://gustavus.edu/+max/courses/S2013/MCS-388/
Instructor: Max Hailperin <max@gustavus.edu>