In this lab, you will turn your AST classes from the previous lab into a true compiler by equipping them with a scanner and parser. That way, your compiler will be able to read in any program (within its limited language) and generate the translation for that program, rather than always generating the translation for one fixed program as in the prior lab.
Download and unzip the file c2.zip. This will
produce a directory c2
structured
similarly to the previous lab, for use with ant
.
Copy all the files from the previous lab's
c1/src
into c2/src
, because your new
compiler will build on your work from the prior lab.
The scanner and parser are provided by two new source files,
src/scanner.jflex
and src/Parser.cup
.
One you have copied the files, you can try building and testing the initial
version of the compiler. In a shell with your current directory being
c2
, give the command ant
. With any luck,
the output you see should end with a successful execution of the LLVM
code generated by running your compiler on
the testProgram.c
source file.
If you look at the testProgram.c
used in the testing,
you will see that despite the .c
ending on the filename,
the syntax isn't that of the C programming language. That's going to
change in the course of this lab. For now, the test program consists
of a single arithmetic expression. A look at the grammar
in src/Parser.cup
should confirm that as the compiler
currently stands, the syntax of a program contains nothing but a
single expression. Moreover, if you look
at src/scanner.jflex
, you'll see another limitation of
the current compiler: although it skips whitespace, it has no
provision for skipping comments. (You could try putting a comment
into testProgram.c
and see that it trips up the compiler.)
These observations set the stage for the work you need to do in order
to expand the compiler to a somewhat more realistic source language.
In subsequent labs, you will further expand the language.
Before getting to work on your extensions to the compiler, it would
be worth noting that the build.xml
file has additional
features beyond the ability to run the whole compiler-creation and
test compilation process. For example, you can test just the scanner by running
ant test_scanner
to see what tokens it identifies. I
would be happy to explain any of the other testing features that might
interest you.
First, modify the scanner so that it will skip comments extending
from //
to the end of the line
and test that this
works.
Second, modify the syntax of expressions so as to allow procedure call
expressions. A procedure call expression consists of a name followed
by a parenthesized expression. This will require changes
in both the scanner and the parser. Again, you should stop and test
that your compiler works with examples of the modified language.
Once this is working, you can remove the implicit call
to println
provided in the parser's action for
the stmt
nonterminal. Instead, the call
to println
can be made explicit in the source code
file, testProgram.c
.
Third, modify the syntax of statements so that a statement is no longer just an unadorned expression, but rather has a semicolon after the expression, as in the following example:
println(2+2);
This will again require modifying both the scanner and the parser. Even though this is a minor change, you should test your compiler. One of the secrets to building a large program successfully is to be meticulous about testing each individual change.
Fourth, modify the syntax of programs so that a program is no longer just an unadorned statement, but rather has some extra symbols at the beginning and end, as in the following example:
int main(){ println(2+2); }
Once more, this will require changes in both the scanner and the
parser. The first two lexemes, int
and main
,
play different roles. The first should be a special keyword token,
which must be int
, whereas
the second should be an arbitrary name; whatever name is given should
be used for the compiled procedure, whether or not it
is main
. Again, test your compiler.
Finally, modify the syntax of programs further so that between the curly braces the programmer can write any list of zero or more statements, rather than only one. At this point, the language is as complete as it will be for this lab, unless you chose to do one or more of the optional changes. You should give your compiler one last testing.
It would be nice if the scanner also skipped comments that start
with /*
and extend through the next instance of
*/
.
If you look at the example main
procedure above,
you'll see that it is defined as returning an int
but
doesn't in fact contain any return
statement; the code
just falls off the end of the procedure body. This
works out OK because the parser is automatically building an
implicit return of 0 into the AST at the end of the procedure.
However, it would be nice if you could include an
explicit return
statement, which could return the value
of any expression, not necessarily 0.
Your test program can include calls to procedures other than
just println
; in particular, you can
use putchar
to output individual characters. However,
when you do so, the argument needs to be the code number of the
character being output, and it is a bit inelegant to have to use
magic numbers such as 43 for a plus sign or 61 for an equal sign.
It would be so much nicer if you could just write '+'
or '='
. By having the scanner produce the same tokens
for the character constants as it would for the decimal numerals,
you can leave the parser alone. If you want to get fancy, you can
also handle backslashes specially.
Please turn in a printout of the final version of any files you modified or added. You should also turn in evidence of your testing of the final version.