Comments on the Tucker and Noonan book:
On page 30, the claim is made that testing for uses of variables that have not been assigned values "cannot be addressed until runtime." This is true if a perfect test is desired. However, that approach is not very practical; mainstream language systems do not generally include runtime tests for unassigned variables. By contrast, a compile-time approach is in common use, even if it is not perfect. The Java Language Specification says that for a program to be legal, the compiler must be able to prove (using some simple rules) that all variables are assigned before being used. If it fails to prove that variables are definitely assigned, it rejects the program, even if the program might in fact work correctly at runtime. For example, you cannot compile the following Java program:
    public class Foo {
        public static void main(String[] args) {
            int x;
            if (args.length > 0) {
                x = args.length;
            } else {
                System.exit(1);
            }
            System.out.println(x);
        }
    }
The compiler will complain that the variable x might not have been initialized where its value is printed. Yet in fact there is no way to get to that statement without assigning a value to x, because the branch that doesn't assign to x exits the program.
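One way to satisfy the compiler without changing the program's behavior is to follow the call to System.exit with a statement that the definite assignment rules recognize as never completing normally, such as a throw. The following is only a sketch: the class name DefiniteAssignment and the helper method countOrExit are hypothetical, introduced so that the logic can be called separately from main.

```java
class DefiniteAssignment {
    // Hypothetical helper: same logic as the Foo example, moved out of main.
    static int countOrExit(String[] args) {
        int x;
        if (args.length > 0) {
            x = args.length;
        } else {
            System.exit(1);
            // Never reached at runtime, but the compiler does not know that
            // System.exit never returns. A throw cannot complete normally,
            // so x counts as definitely assigned after the if-else.
            throw new AssertionError("unreachable");
        }
        return x;
    }

    public static void main(String[] args) {
        System.out.println(countOrExit(args));
    }
}
```

Initializing x at its declaration (int x = 0;) would also compile, at the cost of hiding genuine missing-assignment mistakes from the compiler.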
On pages 32 and 34, the BNF nonterminal symbol Conditional should be IfStatement.
Page 33 incorrectly claims that Java's StatementNoShortIf category includes IfThenElseStatement. Actually it includes IfThenElseStatementNoShortIf, which differs from IfThenElseStatement in that the part after the keyword else is another StatementNoShortIf, rather than a general Statement. Without this additional syntactic category, the ambiguity would not be eliminated.
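The practical effect of this grammar is that an else always pairs with the nearest if that does not yet have one. A small illustration follows; the class and method names are made up for this example, and the indentation deliberately mirrors how the compiler groups the statements.

```java
class DanglingElse {
    // Hypothetical demonstration method: reports which branch ran.
    static String classify(boolean a, boolean b) {
        String result = "neither branch";
        if (a)
            if (b)
                result = "inner then";
            else // pairs with "if (b)", not with "if (a)"
                result = "inner else";
        return result;
    }

    public static void main(String[] args) {
        System.out.println(classify(true, false));
        System.out.println(classify(false, false));
    }
}
```

If the else belonged to the outer if instead, calling classify(false, false) would run the else branch; under Java's rules it runs neither branch.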
Page 35 says that Java has 47 keywords. The number 47 was correct for the original version of Java, but the list has since grown to 50. Of the three additions, we've already seen one, enum. For the trivia-minded, the other two are assert and strictfp. An interesting consequence is that an old Java program that uses assert as an identifier won't compile under a new compiler unless the compiler is told to turn off the new features.
The same paragraph on page 35 also says that the Java keywords include all those shown for Jay in Figure 2.3. Actually, there is one Jay keyword that is not a Java keyword, namely main.
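These keyword claims can be checked against the JDK itself. The sketch below uses the standard library method javax.lang.model.SourceVersion.isKeyword; the wrapper class KeywordCheck and its method isReserved are names invented for this example. Note that isKeyword also answers true for the literals true, false, and null, and that it reflects the latest language version the JDK supports, so it is not a direct way to reproduce the count of 50.

```java
import javax.lang.model.SourceVersion;

class KeywordCheck {
    // Hypothetical wrapper around the standard SourceVersion.isKeyword call.
    static boolean isReserved(String word) {
        return SourceVersion.isKeyword(word);
    }

    public static void main(String[] args) {
        // The three later additions, plus the Jay keyword that Java lacks.
        for (String word : new String[] {"enum", "assert", "strictfp", "main"}) {
            System.out.println(word + ": " + isReserved(word));
        }
    }
}
```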
On page 39, footnote 8 seems to imply that parsing algorithms encountered in compiler design courses backtrack when use of a grammatical rule leads to a dead end. In point of fact, compiler design courses (and practical compiler design) generally stick to those parsing algorithms that avoid backtracking by only choosing a grammatical rule when the choice can only lead to a dead end if the input contains a syntax error. As mentioned on the next page, the LL(1) recursive descent parsing that Tucker and Noonan describe is one example of such an algorithm.
On page 40, Figure 2.17 contains a main procedure that will accept not only syntactically correct assignment statements, but also syntactically incorrect ones that start out with a correct assignment, but then have some extra tokens added to the end. For a parser to accept only correct input, it should check at the end that the TokenStream has reached end of file.
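The end-of-input check can be sketched as follows. This is not the code from Figure 2.17: the class AssignmentParser, its use of a plain list of strings in place of the TokenStream, and the toy rule Assignment -> Identifier = Identifier ; are all assumptions made to keep the example self-contained.

```java
import java.util.Arrays;
import java.util.List;

class AssignmentParser {
    private final List<String> tokens; // hypothetical stand-in for TokenStream
    private int pos = 0;

    AssignmentParser(List<String> tokens) {
        this.tokens = tokens;
    }

    private String next() {
        if (pos >= tokens.size()) {
            throw new IllegalArgumentException("unexpected end of input");
        }
        return tokens.get(pos++);
    }

    private void match(String expected) {
        String found = next();
        if (!found.equals(expected)) {
            throw new IllegalArgumentException("expected " + expected + ", found " + found);
        }
    }

    private void identifier() {
        String found = next();
        if (found.isEmpty() || !Character.isLetter(found.charAt(0))) {
            throw new IllegalArgumentException("expected identifier, found " + found);
        }
    }

    // Toy rule: Assignment -> Identifier = Identifier ;
    private void assignment() {
        identifier();
        match("=");
        identifier();
        match(";");
    }

    // The step the figure's main procedure omits: reject leftover tokens.
    private void expectEndOfInput() {
        if (pos != tokens.size()) {
            throw new IllegalArgumentException("extra tokens starting at " + tokens.get(pos));
        }
    }

    static boolean accepts(String... tokens) {
        AssignmentParser parser = new AssignmentParser(Arrays.asList(tokens));
        try {
            parser.assignment();
            parser.expectEndOfInput();
            return true;
        } catch (IllegalArgumentException e) {
            return false;
        }
    }
}
```

With the call to expectEndOfInput removed, accepts("x", "=", "y", ";", "z") would return true, which is the behavior being criticized.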
On page 41, Figure 2.18's algorithm for writing recursive descent parsers doesn't take into account the EBNF feature of optional elements (those with the subscript opt). The algorithm also talks about creating an object of class A, where A is the nonterminal symbol on the left of the concrete syntax rule. However, earlier we were told the classes will correspond to the abstract syntax categories. Finally, the "appropriate" while loop and if-else statements are left unspecified. My best advice is to wait and see how this parser construction plays out in specific examples, both here and (especially) in lab 2.
Perhaps the most important thing to take away from the section on parsing is an understanding of how the parser makes choices. We should look at the EBNF grammar for Jay and come to an understanding of how we can decide:
Whether to take another trip through the while loop for a repeated element, such as in the production for Declarations.
Whether to include an optional element or not, such as in the production for Negation or (more interestingly) IfStatement.
Which alternative production to use for a nonterminal that has several, such as Statement.
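All three kinds of decision come down to looking at the next token. The sketch below shows one way each might look in code. It uses a toy grammar that is only loosely modeled on Jay (the rules are given in the comments), any token is accepted where an identifier is expected, and the class ChoiceParser and its string output format are inventions for this example rather than anything from the book.

```java
import java.util.Arrays;
import java.util.List;

class ChoiceParser {
    private final List<String> tokens;
    private int pos = 0;

    ChoiceParser(List<String> tokens) {
        this.tokens = tokens;
    }

    // One token of lookahead; the empty string stands for end of input.
    private String peek() {
        return pos < tokens.size() ? tokens.get(pos) : "";
    }

    private String next() {
        if (pos >= tokens.size()) {
            throw new IllegalArgumentException("unexpected end of input");
        }
        return tokens.get(pos++);
    }

    private void match(String expected) {
        String found = next();
        if (!found.equals(expected)) {
            throw new IllegalArgumentException("expected " + expected + ", found " + found);
        }
    }

    // Program -> { Declaration } Statement
    String program() {
        StringBuilder out = new StringBuilder();
        // Repetition: loop again only while the lookahead can start a Declaration.
        while (peek().equals("int")) {
            out.append(declaration()).append(' ');
        }
        out.append(statement());
        if (pos != tokens.size()) {
            throw new IllegalArgumentException("extra tokens");
        }
        return out.toString();
    }

    // Declaration -> int Identifier ;
    private String declaration() {
        match("int");
        String name = next();
        match(";");
        return "decl(" + name + ")";
    }

    // Statement -> ; | IfStatement | Assignment
    private String statement() {
        // Alternatives: the lookahead token selects the production.
        switch (peek()) {
            case ";":
                next();
                return "skip";
            case "if":
                return ifStatement();
            default:
                return assignment();
        }
    }

    // IfStatement -> if ( Identifier ) Statement [ else Statement ]
    private String ifStatement() {
        match("if");
        match("(");
        String test = next();
        match(")");
        String thenPart = statement();
        // Optional element: include it only if the lookahead is "else".
        if (peek().equals("else")) {
            next();
            return "if(" + test + "," + thenPart + "," + statement() + ")";
        }
        return "if(" + test + "," + thenPart + ")";
    }

    // Assignment -> Identifier = Identifier ;
    private String assignment() {
        String target = next();
        match("=");
        String source = next();
        match(";");
        return "assign(" + target + "," + source + ")";
    }

    static String parse(String... tokens) {
        return new ChoiceParser(Arrays.asList(tokens)).program();
    }
}
```

Because the optional else is consumed as soon as it appears in the lookahead, a nested if claims the else before the enclosing if gets a chance, which is why the IfStatement case is the more interesting one.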
On pages 41-42, the parsing code directly accesses token.type and token.value, in keeping with Tucker and Noonan's version of the Token class. If you are instead using my modified version of the class, you will need to change these accesses to token.getType() and token.getValue(). Similarly, the code at the top of page 42 tests a token type for equality with the string "Identifier", rather than with Token.Type.IDENTIFIER. (In switching to Token.Type.IDENTIFIER, the equality test can also be changed to ==.)
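The reason == becomes safe is that each enum constant is a single shared object, whereas two strings with the same characters may be distinct objects. The sketch below uses a hypothetical stand-in for the modified Token class: only getType(), getValue(), and Token.Type.IDENTIFIER come from the description above, while the other enum constants and the constructor are assumptions.

```java
class TokenDemo {
    // Hypothetical stand-in; the real modified Token class may differ in detail.
    static final class Token {
        enum Type { IDENTIFIER, KEYWORD, LITERAL } // KEYWORD and LITERAL are assumed

        private final Type type;
        private final String value;

        Token(Type type, String value) {
            this.type = type;
            this.value = value;
        }

        Type getType() {
            return type;
        }

        String getValue() {
            return value;
        }
    }

    static boolean isIdentifier(Token token) {
        // Enum constants are unique objects, so identity comparison is correct.
        return token.getType() == Token.Type.IDENTIFIER;
    }

    public static void main(String[] args) {
        Token token = new Token(Token.Type.IDENTIFIER, "x");
        System.out.println(token.getValue() + " is identifier: " + isIdentifier(token));
    }
}
```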
On page 43, footnote 12 contains three specifications of the Fibonacci sequence:
The English statement that "the first two numbers are 1 and each successive number is the sum of the two preceding ones."
The example of the early numbers in the sequence: "0, 1, 1, 2, 3, 5, 8, 13, 21, 34, ...."
The algebraic statement that fib(n) = 0 if n = 0 or 1, and fib(n) = fib(n-1) + fib(n-2) otherwise.
No two of these three agree with each other. The first two differ only in their starting point, whereas the third describes an entirely different (and much less interesting) sequence.
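Computing the first few values under each specification makes the disagreement concrete. In this sketch the class FibSpecs and its method names are made up for the illustration; each method simply applies the same sum-of-the-two-preceding rule with the starting values that the corresponding specification implies.

```java
import java.util.Arrays;

class FibSpecs {
    // Shared rule: each value after the first two is the sum of the two before it.
    static long[] sequence(long first, long second, int count) {
        long[] out = new long[count];
        for (int i = 0; i < count; i++) {
            if (i == 0) {
                out[i] = first;
            } else if (i == 1) {
                out[i] = second;
            } else {
                out[i] = out[i - 1] + out[i - 2];
            }
        }
        return out;
    }

    // English statement: the first two numbers are 1.
    static long[] english(int count) {
        return sequence(1, 1, count);
    }

    // Listed example: starts 0, 1.
    static long[] listed(int count) {
        return sequence(0, 1, count);
    }

    // Algebraic statement: fib(0) = fib(1) = 0.
    static long[] algebraic(int count) {
        return sequence(0, 0, count);
    }

    public static void main(String[] args) {
        System.out.println("English:   " + Arrays.toString(english(6)));
        System.out.println("Listed:    " + Arrays.toString(listed(6)));
        System.out.println("Algebraic: " + Arrays.toString(algebraic(6)));
    }
}
```

Since the algebraic version starts from two zeros, every later sum is also zero, which is what makes that sequence so uninteresting.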