MCS-388 Lab 1: A First Compiler (Spring 2005)

Due: February 23, 2005

Objective

In this lab, you'll familiarize yourself with the tools we'll be using.

First you'll run a simple compiler I provide, called c0, and you'll re-build it from its source files.

Then you'll evolve c0 into a marginally more capable compiler, c1, which can compile larger programs than c0, and which has an input syntax compatible with C/C++. This is just a first step in a progression that will continue all semester long, ending with c6 in May.

Setting your CLASSPATH

The Java system we'll be using (on the Linux PCs) uses an environment variable called CLASSPATH to tell it where to look for Java classes. If you don't set it, only the main system-wide class library is searched. In order to be able to access the packages I'm providing, you'll need to put ~max/JavaLib into your CLASSPATH. Furthermore, it will be most convenient if you make a directory for your own packages and put that in your CLASSPATH as well. Assuming you don't have anything else you want in the CLASSPATH other than these two (e.g., from other courses), that you call your directory JavaLib, and that you are using tcsh or csh as your shell, you could set your CLASSPATH as follows:
setenv CLASSPATH ~max/JavaLib
setenv CLASSPATH ~/JavaLib:$CLASSPATH
You can do this in a shell window, but you'll find it convenient to put it in your .login file so it is done automatically every time. (Be careful if you already have something like this in your .login or .cshrc.) If you are using bash as your shell, you would use
export CLASSPATH=~/JavaLib:~max/JavaLib
and would put it in your .bash_profile, .bash_login, or .profile file.

Trying out c0

Assuming you've got your CLASSPATH set, you can run my c0 as follows. Put an arithmetic expression in a file (let's say foo). The expression can use numbers, parentheses, and the operators +, -, *, /, and %. There can be white space, but no comments. There shouldn't be anything but the expression (for example, no semicolon). Now compile foo into MIPS assembly language, with the result going into foo.s, by executing
java edu.gac.max.mcs388.s2005.c0.Main <foo >foo.s
Now you can look at foo.s, and you can load it into xspim and run it.

Rebuilding c0

Copy the c0 package into your own JavaLib (or whatever you've called it). The easiest way is with a shell command like
(cd ~max/JavaLib; tar cf - edu/gac/max/mcs388/s2005/c0) \
 | (cd ~/JavaLib; tar xf -)

Now change directory to your edu/gac/max/mcs388/s2005/c0 and rebuild the classes from the source files as follows:

  1. Run the java_cup parser generator on the parser.cup source file, producing the machine-generated source files parser.java and sym.java:
    java java_cup.Main <parser.cup
    
  2. Run the JLex lexical analyzer generator on the scanner.lex source file, producing the machine-generated source file scanner.lex.java:
    java JLex.Main scanner.lex
    
  3. Finally, compile the three .java files resulting from the above two steps, as well as the handwritten Main.java file:
    javac *.java
    
Assuming you put your JavaLib directory ahead of mine in your CLASSPATH, as I suggested, you should now wind up running your own rebuilt c0 if you run c0 again. To verify that this is what is really happening, you could make some minor change to your version, such as to output a comment (starting with the # character), and rebuild and run it.

Evolving c0 into c1

Now that you have taken all the tools out for a test drive, it is time for you to make a c1 package, copy the three source files into it, change the package lines at their tops to reflect the move, and then evolve the program.

The language c1 accepts should allow programs of the following form:

main(){
   print_int(2+(3+4)*5);
   print_int(7+-2*2);  // a comment could be here
   print_int(          // or at the end of any line
             3+4
            );
   .
   .
   .
   }
(Note that the gcc C compiler will also handle these same programs. For that reason, I suggest you put a .c extension on the filenames of your test input files, to make it easy to compare results. Of course, you'll need to link the C version with a file that provides print_int.)

The difference from c0 are as follows:

Evolving into c1 will require changes in both the parser (parser.cup) and the lexical analyzer (scanner.lex). In understanding what parser.cup does, you'll need some awareness of the documented interfaces of two support classes, Register and RegisterAllocator, but you needn't look at their implementations. (The implementations may well change, and even the interfaces are likely to change in the course of the semester -- hopefully in upwardly compatible ways.) If you want to look at the implementations in Register.java and RegisterAllocator.java, you can, of course.

Note that if you make any mistakes in the parser.cup or scanner.lex file, you might get an error report when you run java_cup or JLex, but then again, you might find that they just happily produce illegal Java code as their output, leading to error messages when you run javac.

Postlab

Write a lab report showing your c1 parser and scanner, highlighting the differences from the code for c0. You should also describe the tests you ran, and any problems uncovered.


Course web site: http://www.gac.edu/+max/courses/S2005/MCS-388/
Instructor: Max Hailperin <max@gac.edu>