In this lab, you will add variables to your compiler's source programming language, with the ability to declare variables, assign values to them, and use them in expressions.
Start by copying your c2 project from the previous lab as c3, because your new compiler will build on your work from the prior lab. If you had difficulty with the previous lab, you should talk with me to make sure you have a solid foundation to build on.
Initially you should add just simple declaration statements and assignment statements to your language. A declaration statement at this point should contain only a single variable and not have any initializer. An example would be
int fellerNumber;
An assignment statement assigns the value of an expression to a variable. An example would be
fellerNumber = 10 + 7;
To add these features to the source language, you will need to extend the lexical analyzer and parser, as well as add two new subclasses of Stmt: one for declaration statements and one for assignment statements.
The toLLVM method for a declaration statement should return a string of LLVM assembly code that allocates stack memory for the variable and makes an LLVM name point to that memory space, as in
%t0 = alloca i32
Human readers of the LLVM assembly code (such as yourself) will find it easier to follow if you add a comment containing the source variable name. A comment in LLVM assembly language starts with a semicolon (;) and extends through the end of the line.
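As one possible shape for this, here is a minimal sketch of a declaration node whose toLLVM emits an alloca followed by a comment naming the source variable. The Temps helper (a counter producing %t0, %t1, ...) and the argument-free toLLVM signature are assumptions for illustration; your compiler from the previous lab probably has its own way of generating fresh LLVM names, and it would also record the name in the symbol table at this point.

```java
// Hypothetical helper (an assumption, not part of the lab): hands out
// fresh LLVM names %t0, %t1, ... Your previous lab likely has its own.
class Temps {
    private static int counter = 0;

    static String fresh() {
        return "%t" + counter++;
    }
}

// Minimal stand-in for the Stmt base class from your compiler.
abstract class Stmt {
    abstract String toLLVM();
}

// Declaration statement with a single variable and no initializer,
// such as "int fellerNumber;".
class DeclStmt extends Stmt {
    private final String name; // source-language variable name

    DeclStmt(String name) {
        this.name = name;
    }

    @Override
    String toLLVM() {
        // Fresh LLVM name that will point to this variable's stack slot.
        String ptr = Temps.fresh();
        // Allocate one i32 on the stack; the ";" comment records the source name.
        return ptr + " = alloca i32 ; " + name + "\n";
    }
}
```

With a fresh counter, new DeclStmt("fellerNumber").toLLVM() yields the line %t0 = alloca i32 ; fellerNumber.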
The toLLVM method for a declaration statement should have one additional effect: it should insert the declared name into a symbol table along with the LLVM name that was allocated to point to that variable's storage. (So, for example, the symbol table might record that exampleVariable1 has been declared with %t0 as its LLVM pointer.)
Because the symbol table is used to communicate between declarations and uses (such as in assignment statements), it needs to be accessible to all these AST nodes. The simplest reasonable approach is to use a class, SymbolTable, which you define with just static methods (for declaring a variable and for looking up its LLVM counterpart) and a static variable that holds the table itself. Using Java generics, a declaration for the table might show that it maps strings (names of source-code variables) to strings (names of LLVM pointers):
private static Map<String,String> table = new HashMap<String,String>();
(This presumes you have imported the names java.util.Map and java.util.HashMap.)
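A minimal sketch of such a class follows. The method names declare and lookup are illustrative choices, not required by the lab; the essential point is that everything is static, so any AST node can call SymbolTable.declare or SymbolTable.lookup without holding a reference to a table object.

```java
import java.util.HashMap;
import java.util.Map;

// Symbol table built only from static members, so every AST node can reach it.
// The method names are illustrative; pick whatever fits your code.
class SymbolTable {
    // Maps source-code variable names to the LLVM pointer names for their storage.
    private static Map<String, String> table = new HashMap<String, String>();

    // Record that sourceName's storage is pointed to by llvmPointer (e.g. "%t0").
    static void declare(String sourceName, String llvmPointer) {
        table.put(sourceName, llvmPointer);
    }

    // Return the LLVM pointer for sourceName, or null if it was never declared.
    static String lookup(String sourceName) {
        return table.get(sourceName);
    }
}
```

A declaration's toLLVM would call SymbolTable.declare right after choosing its alloca name; assignments and variable uses would call SymbolTable.lookup.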
You can temporarily ignore the possibilities that a program might declare the same variable twice or might use a variable (in an assignment statement) without a previous declaration.
For each assignment statement, you will want to generate an LLVM store instruction, using the symbol table to find the pointer for the variable being assigned. For example,
store i32 %t3, i32* %t0
would store the value named %t3 into the memory location pointed to by the pointer %t0.
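The sketch below shows one way an assignment node could produce that store. It assumes a hypothetical Expr convention in which toLLVM appends any needed instructions to a StringBuilder and returns the LLVM value holding the result; your Expr classes from the previous lab may use a different interface, and the Symbols map is a stand-in for your symbol table.

```java
import java.util.HashMap;
import java.util.Map;

// Stand-in (assumption): an Expr appends any instructions it needs to `code`
// and returns the LLVM value (name or constant) holding its result.
abstract class Expr {
    abstract String toLLVM(StringBuilder code);
}

// Integer constant: needs no instructions; the constant itself is the value.
class IntLit extends Expr {
    private final int value;

    IntLit(int value) {
        this.value = value;
    }

    @Override
    String toLLVM(StringBuilder code) {
        return Integer.toString(value);
    }
}

// Stand-in symbol table: source variable name -> LLVM pointer name.
class Symbols {
    static final Map<String, String> table = new HashMap<>();
}

// Assignment statement such as "fellerNumber = 10 + 7;".
class AssignStmt {
    private final String name;
    private final Expr value;

    AssignStmt(String name, Expr value) {
        this.name = name;
        this.value = value;
    }

    String toLLVM() {
        StringBuilder code = new StringBuilder();
        // First emit code computing the right-hand side.
        String result = value.toLLVM(code);
        // Pointer recorded when the variable was declared.
        String ptr = Symbols.table.get(name);
        // Store the computed i32 value through that pointer.
        code.append("store i32 ").append(result).append(", i32* ").append(ptr).append('\n');
        return code.toString();
    }
}
```

If fellerNumber was declared with pointer %t0, assigning the constant 17 produces store i32 17, i32* %t0; a more complex right-hand side would first emit its own instructions and then store the named result, as in the %t3 example.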
You should test your compiler at this stage in its evolution. (In fact, you could even have tested with just declarations, before adding assignment statements.) Because you have not yet added the ability to use variables in expressions, you will need to look at the assembly language output (in testProgram.ll) to see if assignment statements are correctly storing their expressions' values into the locations pointed to by the variables' pointers.
The remaining portions of this lab are independent of each other and so can be done in any order. You should test your work as you complete each section.
Having paused for testing at the first point where the language was at all usable, you can now proceed to make the language actually useful. Modify the grammar to allow a variable to serve as an expression and create a corresponding new subclass of Expr. The LLVM code will make use of an LLVM load instruction, as shown here:
%t4 = load i32* %t0
(This is the syntax of older LLVM releases. Starting with LLVM 3.7, the type of the loaded value is written explicitly, as in %t4 = load i32, i32* %t0.)
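Here is a minimal sketch of such an Expr subclass. It assumes the same hypothetical convention as before (toLLVM appends instructions to a StringBuilder and returns the result name), plus stand-in Temps and Symbols classes; none of these names come from the lab itself. It emits the older load syntax used in this lab.

```java
import java.util.HashMap;
import java.util.Map;

// Stand-in (assumption): fresh LLVM names %t0, %t1, ...
class Temps {
    private static int counter = 0;

    static String fresh() {
        return "%t" + counter++;
    }
}

// Stand-in symbol table: source variable name -> LLVM pointer name.
class Symbols {
    static final Map<String, String> table = new HashMap<>();
}

// Hypothetical convention: append instructions to code, return the result name.
abstract class Expr {
    abstract String toLLVM(StringBuilder code);
}

// A variable used as an expression, e.g. fellerNumber in "fellerNumber + 25".
class VarExpr extends Expr {
    private final String name;

    VarExpr(String name) {
        this.name = name;
    }

    @Override
    String toLLVM(StringBuilder code) {
        String ptr = Symbols.table.get(name); // pointer from the declaration
        String result = Temps.fresh();        // new name for the loaded value
        // Older LLVM syntax, as in this lab; LLVM 3.7+ would need
        // result + " = load i32, i32* " + ptr instead.
        code.append(result).append(" = load i32* ").append(ptr).append('\n');
        return result;
    }
}
```

The returned name (for example %t1) is what an enclosing addition or assignment would then use as its operand.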
You should now be able to test using programs that declare variables, assign values to them, and then make use of the variables in further computations.
Now you need to cope with the possibility that a program being compiled might contain repeated declarations for the same variable and might contain uses of undeclared variables. You should report an error message for each of these situations.
Error messages should be specific, which means they should include the name of the variable in question and also the position within the input at which the variable was redeclared or used undeclared. To get the position, you will need to pass additional information from the parser into the AST node constructor. In the parser, if a grammar production refers to a terminal symbol using something like
IDENTIFIER:name
then in the accompanying Java action code you can not only use name to refer to the attribute of the IDENTIFIER token but also use nameleft to refer to the character position at which that token begins.
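For instance, assuming a CUP-style action such as {: RESULT = new DeclStmt(name, nameleft); :}, the node can keep the position and use it when it detects a redeclaration. The sketch below is illustrative: the two-argument constructor, the declare method, and the message wording are assumptions, and it returns the message rather than reporting it so that it stays self-contained.

```java
import java.util.HashMap;
import java.util.Map;

// Sketch (assumption): a declaration node that remembers where its identifier appeared.
class DeclStmt {
    // Stand-in symbol table: source name -> LLVM pointer name.
    private static final Map<String, String> table = new HashMap<>();

    private final String name;
    private final int pos; // character position passed in by the parser, e.g. nameleft

    DeclStmt(String name, int pos) {
        this.name = name;
        this.pos = pos;
    }

    // Records the declaration and returns null, or returns a specific error
    // message (variable name plus position) if the name was already declared.
    String declare(String llvmPointer) {
        if (table.containsKey(name)) {
            return "variable " + name + " redeclared at character position " + pos;
        }
        table.put(name, llvmPointer);
        return null;
    }
}
```

In a real compiler, the non-null message would go to the central error-reporting method rather than back to the caller.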
Reporting up to five error messages per run of your compiler is reasonable. After the fifth error message, compilation should terminate without checking for any further errors. If any errors at all are found (even fewer than five), compilation should terminate without producing any assembly output and with a non-zero exit code, which will cause ant to stop without running subsequent steps. These features will be easiest to add if you provide a central error-reporting method, perhaps as a static method.
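A minimal sketch of such a central reporter follows. The class name ErrorReporter, its methods, and the exit code 1 are assumptions for illustration. Each call prints a message; the fifth call ends the run immediately, and exitIfErrors is meant to be called after parsing and checking but before any assembly is written.

```java
// Hypothetical central error reporter; names and exit code are illustrative.
class ErrorReporter {
    static final int MAX_ERRORS = 5;
    private static int count = 0;

    // Report one error; terminates compilation once MAX_ERRORS have been reported.
    static void error(String message) {
        count++;
        System.err.println("error: " + message);
        if (count >= MAX_ERRORS) {
            System.err.println("too many errors; stopping");
            System.exit(1); // non-zero exit code makes ant stop
        }
    }

    static int errorCount() {
        return count;
    }

    // Call after parsing and checking, before producing testProgram.ll.
    static void exitIfErrors() {
        if (count > 0) {
            System.exit(1); // no assembly output, non-zero exit code
        }
    }
}
```

Because the counter and limit live in one place, every AST node reports errors the same way, and the driver needs only a single ErrorReporter.exitIfErrors() call before writing output.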
If a variable is reported as undeclared, the user will generally not be interested in seeing further error messages regarding that same variable. The easiest way to suppress further messages for the same variable is to pretend the variable is declared.
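The "pretend it is declared" trick might look like the following sketch. On a failed lookup it records one error and then inserts a placeholder entry, so later uses of the same name find it and stay quiet. The lookup signature, the message wording, the placeholder pointer, and collecting messages in a list (instead of calling a central reporter) are all assumptions made to keep the example self-contained; since no assembly is produced when errors exist, the placeholder is never emitted.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

class SymbolTable {
    private static final Map<String, String> table = new HashMap<>();
    // Collected here for illustration; a real compiler would call its central reporter.
    static final List<String> errors = new ArrayList<>();

    // Returns the LLVM pointer for name; reports an undeclared name only once.
    static String lookup(String name, int pos) {
        String ptr = table.get(name);
        if (ptr == null) {
            errors.add("variable " + name + " used undeclared at character position " + pos);
            ptr = "%undeclared." + name; // placeholder; never emitted because errors exist
            table.put(name, ptr);        // pretend it is declared so later uses stay quiet
        }
        return ptr;
    }
}
```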
Rather than only allowing variables to be declared one at a time, and rather than forcing initialization to take place in separate assignment statements, it would be nice to support a syntax like
int x, fellerNumber = 10 + 7, answer = fellerNumber + 25;
This change can be made purely in your lexical analyzer and parser; you can generate the same AST as if the declaration statement had been broken apart into
int x; int fellerNumber; fellerNumber = 10 + 7; int answer; answer = fellerNumber + 25;
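One way to do this expansion is a small helper that a parser action calls once it has collected the comma-separated items. The sketch below uses minimal stand-in AST classes (with toString showing source form) and a hypothetical Declarator holder for each "name" or "name = init" item; none of these names come from the lab, and your grammar would need to accept a list of statements where a declaration appears.

```java
import java.util.ArrayList;
import java.util.List;

// Minimal stand-ins (assumptions) for your AST classes; toString shows source form.
interface Stmt {}
interface Expr {}

class Num implements Expr {
    final int value;
    Num(int value) { this.value = value; }
    @Override public String toString() { return Integer.toString(value); }
}

class DeclStmt implements Stmt {
    final String name;
    DeclStmt(String name) { this.name = name; }
    @Override public String toString() { return "int " + name + ";"; }
}

class AssignStmt implements Stmt {
    final String name;
    final Expr value;
    AssignStmt(String name, Expr value) { this.name = name; this.value = value; }
    @Override public String toString() { return name + " = " + value + ";"; }
}

// One item of a declaration list: "name" or "name = init" (init is null if absent).
class Declarator {
    final String name;
    final Expr init;
    Declarator(String name, Expr init) { this.name = name; this.init = init; }
}

class DeclLists {
    // Expands "int a, b = e1, ..." into single declarations and assignments,
    // preserving left-to-right order.
    static List<Stmt> expand(List<Declarator> items) {
        List<Stmt> out = new ArrayList<>();
        for (Declarator d : items) {
            out.add(new DeclStmt(d.name));
            if (d.init != null) {
                out.add(new AssignStmt(d.name, d.init));
            }
        }
        return out;
    }
}
```

Preserving the left-to-right order matters: it is what lets answer = fellerNumber + 25 refer to fellerNumber, which was declared and assigned by the earlier items in the same declaration.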
Please turn in a zip file of the final version of your project.