MCS-287 Lab 1: Lexical Analysis (Spring 2006)

Due: February 27, 2006

Objective

In this lab, you will complete a lexical analyzer for Jay, starting from a skeleton. You will also test your analyzer by running it on various input files and seeing the individual tokens printed out one by one, with their types and values.

Along the way, you will not only learn about the manual construction of a lexical analyzer, you will also improve your ability to read and write Java. For example, you will work with loops that test for exit conditions in the middle or at the end, rather than at the beginning, and you will need to use several classes from the Java API, which presumably will entail consulting the on-line documentation.

Plan of work

Start by downloading and reading the skeleton version of TokenStream.java. This is derived from Figure 2.5 on page 26 of the textbook and from the version on the textbook's supporting web site, but contains several design improvements and modernizations, which I will discuss in class. (I later provided a further revised version; see the class notes for 2006-02-13.)

Your goal is to replace each instance of ... with the missing portion of Java code. However, you should not try doing all the changes at once. Instead, plan to incrementally alternate between programming and testing. What are the minimal changes that it would take to have something testable? Make those changes first, commenting out the other missing portions of code, and test. You might, for example, have an initial version that classifies every input character as a token of type OTHER. This is of course incorrect (in general), but suffices to show that you have some of the basic machinery working.

To compile and test the lexical analyzer, you would give the following two commands in a shell, assuming that you have your testing input in a file called testInput1:

javac TokenStream.java
java TokenStream testInput1

The result of this test should be one line of output for each token, displaying the type and value of the token.

Once the basic machinery is working, add features one by one, testing each as you add it. Keep in mind that you may want to add features from simplest to most difficult, rather than in the order they appear in the file. That way you will have a chance to build up your skills before tackling the harder problems. For example, analyzing two-character operators, which appears quite early in the file, is one of the most challenging features, and hence might best be left for later. You could comment out that whole brace-enclosed block of code until you are ready for that challenge.

Your goal in testing is to show that each feature works correctly, but not to distract yourself with lots of useless repetitions of the same basic test. For example, the Jay program in Figure 2.19 on page 43 would make a poor test input. In contains several repetitions of essentially similar tokens, but has a very small variety of different situations. (For example, notice that all five integer literals in this program are a single digit in length.)

A little glitch

Note that the isSeparator method in the provided file, taken from the textbook's web site, has a few extra options beyond those listed in the textbook's specification of Jay. Apparently this is so it can be used also later in the book when additional features are added to Jay for arrays and object-oriented programming. You are welcome to either leave this as-is or correct it to match the published specification of Jay.

What you should turn in

Please turn in a printout of your final version of TokenStream.java. You should also turn in evidence of your testing of this final version. Remember that your goal in the tests is to show that each case works, but not to test the same case repeatedly.


Course web site: http://www.gustavus.edu/+max/courses/S2006/MCS-287/
Instructor: Max Hailperin <max@gustavus.edu>