MCS-287 Notes for 2006-02-14

Comments on the Tucker and Noonan book:

On page 30, the claim is made that testing for uses of variables that have not been assigned values "cannot be addressed until runtime." This is true if a perfect test is desired. However, that approach is not very practical; mainstream language systems do not generally include runtime tests for unassigned variables. By contrast, a compile-time approach is in common use, even if it is not perfect. The Java Language Specification says that for a program to be legal, the compiler must be able to prove (using some simple rules) that all variables are assigned before being used. If it fails to prove that variables are definitely assigned, it rejects the program, even if the program might in fact work correctly at runtime. For example, you cannot compile the following Java program:
```
public class Foo {
    public static void main(String[] args){
        int x;
        if(args.length > 0){
            x = args.length;
        } else {
            System.exit(1);
        }
        System.out.println(x);
    }
}
```
The compiler will complain that the variable x might not have been initialized where its value is printed. Yet in fact there is no way to get to that statement without assigning a value to x, because the branch that doesn't assign to x exits the program.
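For contrast, here is a minimal sketch of a variant that the compiler does accept. The class and method names are made up for illustration. It relies on the rule that a throw statement cannot complete normally, so the definite assignment analysis does not need x to be assigned on that path, whereas a call to System.exit looks to the compiler like any other method call that might return.

```java
class DefiniteAssignmentDemo {
    // Hypothetical helper: returns the number of arguments, or throws if there are none.
    static int countOrFail(String[] args) {
        int x;
        if (args.length > 0) {
            x = args.length;
        } else {
            // A throw cannot complete normally, so the compiler can prove that
            // control never reaches the return below without x being assigned.
            throw new IllegalArgumentException("no arguments given");
        }
        return x;
    }

    public static void main(String[] args) {
        System.out.println(countOrFail(args));
    }
}
```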
On pages 32 and 34, the BNF nonterminal symbol Conditional should be IfStatement.
Page 33 incorrectly claims that Java's StatementNoShortIf category includes IfThenElseStatement. Actually it includes IfThenElseStatementNoShortIf, which differs from IfThenElseStatement in that the part after the keyword else is another StatementNoShortIf, rather than a general Statement. Without this additional syntactic category, the ambiguity would not be eliminated.
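The practical effect of this grammar is that an else always pairs with the nearest unmatched if. The following sketch (class and method names are invented for illustration) uses deliberately misleading indentation to show which if the else really belongs to.

```java
class DanglingElseDemo {
    // Hypothetical helper: reports which branch of the nested ifs ran.
    static String classify(boolean a, boolean b) {
        String result = "neither branch ran";
        // The indentation suggests the else goes with "if (a)",
        // but the grammar attaches it to the inner "if (b)".
        if (a)
            if (b)
                result = "inner then";
        else
            result = "else";
        return result;
    }

    public static void main(String[] args) {
        System.out.println(classify(true, false));
        System.out.println(classify(false, false));
    }
}
```

If the else belonged to the outer if, the call classify(false, false) would return "else"; because it belongs to the inner if, no branch runs in that case.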
Page 35 says that Java has 47 keywords. The number 47 was correct for the original version of Java, but the list has meanwhile grown to 50. Of the three additions, we've already seen one, enum. For the trivia-minded, the other two are assert and strictfp. An interesting consequence is that an old Java program that uses assert as an identifier won't compile under a new compiler unless the compiler is told to turn off the new features.
The same paragraph on page 35 also says that the Java keywords include all those shown for Jay in Figure 2.3. Actually, there is one Jay keyword that is not a Java keyword, namely main.
On page 39, footnote 8 seems to imply that the parsing algorithms encountered in compiler design courses backtrack when use of a grammatical rule leads to a dead end. In fact, compiler design courses (and practical compiler design) generally stick to parsing algorithms that avoid backtracking: they commit to a grammatical rule only when that choice can lead to a dead end solely because the input contains a syntax error. As mentioned on the next page, the LL(1) recursive descent parsing that Tucker and Noonan describe is one example of such an algorithm.
On page 40, Figure 2.17 contains a main procedure that will accept not only syntactically correct assignment statements, but also syntactically incorrect ones that start out with a correct assignment, but then have some extra tokens added to the end. For a parser to accept only correct input, it should check at the end that the TokenStream has reached end of file.
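The missing check can be illustrated with a toy recognizer. This is only a sketch under simplifying assumptions: it is not the code from Figure 2.17, the token stream is replaced by a plain list of token-kind strings, and the only statement form is "id = id ;". The requireEof flag is a hypothetical switch added to show the difference the final check makes.

```java
import java.util.Arrays;
import java.util.Iterator;
import java.util.List;

class EofCheckDemo {
    // Toy grammar (an assumption, not Jay's): Assignment -> id = id ;
    static boolean accepts(List<String> tokens, boolean requireEof) {
        Iterator<String> it = tokens.iterator();
        String[] expected = {"id", "=", "id", ";"};
        for (String kind : expected) {
            if (!it.hasNext() || !it.next().equals(kind)) {
                return false; // syntax error inside the assignment
            }
        }
        // Without this check, trailing junk after a valid assignment is accepted.
        return !requireEof || !it.hasNext();
    }

    public static void main(String[] args) {
        List<String> junk = Arrays.asList("id", "=", "id", ";", "=", "=");
        System.out.println(accepts(junk, false)); // accepted without the check
        System.out.println(accepts(junk, true));  // rejected with the check
    }
}
```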
On page 41, Figure 2.18's algorithm for writing recursive descent parsers doesn't take into account the EBNF feature of optional elements (those with the subscript opt). The algorithm also talks about creating an object of class A, where A is the nonterminal symbol on the left of the concrete syntax rule. However, earlier we were told the classes will correspond with the abstract syntax categories. Finally, the "appropriate" while loop and if-else statements are left unspecified. My best advice is to wait and see how this parser construction plays out in specific examples, both here and (especially) in lab 2.
Perhaps the most important thing to take away from the section on parsing is an understanding of how the parser makes choices. We should look at the EBNF grammar for Jay and come to an understanding of how we can decide
- Whether to take another trip through the while loop for a repeated element, such as in the production for Declarations.
- Whether to include an optional element or not, such as in the production for Negation or (more interestingly) IfStatement.
- Which alternative production to use for a nonterminal that has several, such as Statement.
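All three decisions can be made by peeking at the next token. The sketch below shows this on a toy grammar that is much smaller than Jay's; the grammar, the class name, and the representation of tokens as strings are all assumptions made for illustration, and the recognizer only accepts or rejects rather than building abstract syntax.

```java
import java.util.List;

class LookaheadDemo {
    private final List<String> tokens;
    private int pos = 0;

    LookaheadDemo(List<String> tokens) { this.tokens = tokens; }

    // One token of lookahead; "<eof>" stands for end of input.
    private String peek() { return pos < tokens.size() ? tokens.get(pos) : "<eof>"; }

    private void match(String expected) {
        if (!peek().equals(expected)) {
            throw new IllegalStateException("expected " + expected + " but saw " + peek());
        }
        pos++;
    }

    // Declarations -> { int id ; }   Repeat while the lookahead can start a declaration.
    void declarations() {
        while (peek().equals("int")) { match("int"); match("id"); match(";"); }
    }

    // Statement -> ; | id = id ; | IfStatement   Pick the alternative from the lookahead.
    void statement() {
        switch (peek()) {
            case ";":  match(";"); break;
            case "id": match("id"); match("="); match("id"); match(";"); break;
            case "if": ifStatement(); break;
            default:   throw new IllegalStateException("unexpected " + peek());
        }
    }

    // IfStatement -> if ( id ) Statement [ else Statement ]   Take the option only on "else".
    void ifStatement() {
        match("if"); match("("); match("id"); match(")");
        statement();
        if (peek().equals("else")) { match("else"); statement(); }
    }

    // Program -> Declarations Statement, followed by end of input.
    static boolean accepts(List<String> tokens) {
        LookaheadDemo parser = new LookaheadDemo(tokens);
        try {
            parser.declarations();
            parser.statement();
            parser.match("<eof>");
            return true;
        } catch (IllegalStateException e) {
            return false;
        }
    }
}
```

Note that taking the else whenever it appears next is what makes a nested if claim the nearest else.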
On pages 41-42, the parsing code directly accesses token.type and token.value, in keeping with Tucker and Noonan's version of the Token class. If you are instead using my modified version of the class, you will need to change these accesses to token.getType() and token.getValue(). Similarly, the code at the top of page 42 tests a token type for equality with the string "Identifier", rather than with Token.Type.IDENTIFIER. (In switching to Token.Type.IDENTIFIER, the equality test can also be changed to ==.)
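As a rough sketch of what the changed test might look like, here is a stand-in Token class. Only the names getType, getValue, and Token.Type.IDENTIFIER come from these notes; the other enum constants, the constructor, and the helper method are assumptions, and the real modified class may differ.

```java
class TokenComparisonDemo {
    // Hypothetical stand-in for the modified Token class.
    static class Token {
        enum Type { IDENTIFIER, LITERAL, KEYWORD } // LITERAL and KEYWORD are assumed

        private final Type type;
        private final String value;

        Token(Type type, String value) {
            this.type = type;
            this.value = value;
        }

        Type getType() { return type; }
        String getValue() { return value; }
    }

    // Enum constants are unique objects, so == is a safe and idiomatic comparison,
    // unlike strings, which need equals.
    static boolean isIdentifier(Token token) {
        return token.getType() == Token.Type.IDENTIFIER;
    }

    public static void main(String[] args) {
        Token token = new Token(Token.Type.IDENTIFIER, "x");
        System.out.println(isIdentifier(token) + " " + token.getValue());
    }
}
```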
On page 43, footnote 12 contains three specifications of the Fibonacci sequence:
1. The English statement that "the first two numbers are 1 and each successive number is the sum of the two preceding ones."
2. The example of the early numbers in the sequence: "0, 1, 1, 2, 3, 5, 8, 13, 21, 34, ...."
3. The algebraic statement that fib(n) = 0 if n = 0 or 1, and fib(n) = fib(n-1) + fib(n-2) otherwise.
No two of these three specifications agree with each other. The first two simply differ on the starting point, whereas the third describes an entirely different (and much less interesting) sequence.
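To see the disagreement concretely, the three specifications can be transcribed into code. This is a sketch with invented method names; the indexing from n = 0 for the first two specifications is my assumption, since the footnote does not say where counting starts.

```java
class FibSpecs {
    // Iterative sequence starting from the two given values (index 0 is a).
    static long fibFrom(long a, long b, int n) {
        for (int i = 0; i < n; i++) {
            long next = a + b;
            a = b;
            b = next;
        }
        return a;
    }

    // Specification 1: the first two numbers are 1.
    static long spec1(int n) { return fibFrom(1, 1, n); }

    // Specification 2: the listed sequence starts 0, 1.
    static long spec2(int n) { return fibFrom(0, 1, n); }

    // Specification 3, transcribed literally: fib(0) = fib(1) = 0.
    static long spec3(int n) {
        return (n == 0 || n == 1) ? 0 : spec3(n - 1) + spec3(n - 2);
    }

    public static void main(String[] args) {
        for (int n = 0; n < 10; n++) {
            System.out.println(n + ": " + spec1(n) + " " + spec2(n) + " " + spec3(n));
        }
    }
}
```

Because the third specification starts from two zeros, every sum is zero, so the whole sequence is constantly 0.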

Course web site: http://www.gustavus.edu/+max/courses/S2006/MCS-287/
Instructor: Max Hailperin <max@gustavus.edu>