Comments on the Tucker and Noonan book:
p. 23: The claim made on this page that "C programs begin with a line that has the Keyword main followed by the three Separators (, ), and {" is incorrect, for three different reasons:
In C, unlike in Jay, main is not a keyword; it is just a normal identifier, no different in kind from foo. The execution of a program just happens to start by invoking the procedure named main.
In C, the procedure name need not be followed by those three separators, at least not immediately and with nothing else between them. In particular, there should generally be parameter declarations between the parentheses.
Even in Jay, the specified elements need not appear on the first line. (They might be preceded by comment lines, for example.) In C, there is an even greater number of things that might precede main in the same file.
p. 24: The token types Keyword, Separator, and Operator are of doubtful utility, because it is not the case that all keywords play the same grammatical role as one another, and likewise for separators and operators. Therefore, there is nowhere one would actually make use of these categories as such. We can discuss this more in class, and you will also see this point playing out as we get into the higher-level syntactic analysis, such as in Section 2.2.
p. 26: The skeleton lexical analysis method shown in Figure 2.5 and in greater detail on the textbook's web site has several issues. (As a result, lab 1 contains an alternate version.) The difficulties with the book's version include at least the following:
The code "first check[s] for white space" and skips it, "then check[s] for a comment and skip[s] it." This structure ignores the possibility that a comment may be followed by more whitespace, more comments, or both.
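One way to avoid this problem is to skip whitespace and comments in a single loop that keeps going until neither is found. The following is only a minimal sketch, not the book's code and not necessarily what lab 1 does: the class name SkipDemo and the method skipBlanksAndComments are hypothetical, the input is assumed to be held in a String, and comments are assumed to run from // to the end of the line.

```java
// Hypothetical helper for illustration; the names are not from the book.
final class SkipDemo {
    /**
     * Returns the index of the first character at or after pos that is
     * neither whitespace nor part of a // comment, or text.length() if
     * nothing but whitespace and comments remains.
     */
    static int skipBlanksAndComments(String text, int pos) {
        while (pos < text.length()) {
            char c = text.charAt(pos);
            if (Character.isWhitespace(c)) {
                pos++;                               // skip one whitespace character
            } else if (text.startsWith("//", pos)) {
                int eol = text.indexOf('\n', pos);   // assumed: comment ends at end of line
                pos = (eol < 0) ? text.length() : eol + 1;
            } else {
                break;                               // start of a real token
            }
        }
        return pos;
    }
}
```

Because the loop re-tests both conditions after every skip, any interleaving of blank lines and comment lines is consumed before the next token is read.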
The modularity of the handling of operators is poor. The code for comments needs to know that the isOperator procedure will return true for '/'. Moreover, isOperator is a misnomer, as it needs to be true for '|' and '&', which can be the first character of two-character operators, but are not themselves operators. The way the code is structured, handling the case where one of those characters is followed by a different character, and hence turns out not to be the start of an operator, will be tricky.
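A sketch of one alternative is to decide what operator, if any, starts at a given position by looking at the current character and the one after it together. Everything here is an assumption made for illustration: the class OperatorDemo, the method operatorAt, the String-based input, and the particular operator set are hypothetical and are not taken from the book's code.

```java
// Hypothetical sketch; the operator set below is assumed for illustration.
final class OperatorDemo {
    /** Returns the operator that starts at pos, or null if none does. */
    static String operatorAt(String text, int pos) {
        if (pos >= text.length()) {
            return null;
        }
        char c = text.charAt(pos);
        // One character of lookahead; '\0' stands in for "no next character".
        char next = (pos + 1 < text.length()) ? text.charAt(pos + 1) : '\0';
        switch (c) {
            case '&': case '|':
                // Only the doubled form is an operator; a lone '&' or '|' is not.
                return (next == c) ? "" + c + c : null;
            case '=': case '!': case '<': case '>':
                // An operator alone, or with '=' as a two-character operator.
                return (next == '=') ? "" + c + '=' : String.valueOf(c);
            case '+': case '-': case '*': case '/':
                return String.valueOf(c);
            default:
                return null;
        }
    }
}
```

With this arrangement, a '|' followed by a different character simply yields null, so the caller can report an error without having already committed to an operator token. The sketch assumes comments have been skipped before it is called, so a '/' seen here really is a division operator.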
The Token class does not exemplify good object-oriented programming style. Notice on page 26 that no arguments are passed to the constructor and that values are then assigned to the type and value instance variables subsequently. It would be better style to pass the type and value into the constructor. Using strings for the types is also inefficient, because testing string equality requires traversing the strings, and error prone, because the compiler will not detect a typo in one of the strings.
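As a rough illustration of both suggestions, the sketch below passes the type and value to the constructor and represents the type with an enum rather than a string. The enum name TokenType, its constants, and the accessor names are assumptions chosen for this example, not the book's definitions.

```java
// Hypothetical token types; the constant names are assumed for illustration.
enum TokenType { IDENTIFIER, KEYWORD, LITERAL, SEPARATOR, OPERATOR }

final class Token {
    private final TokenType type;
    private final String value;

    // Both fields are supplied at construction, so a Token is never half-built.
    Token(TokenType type, String value) {
        this.type = type;
        this.value = value;
    }

    TokenType getType() { return type; }
    String getValue() { return value; }
}
```

A test such as token.getType() == TokenType.KEYWORD is a single reference comparison, and a misspelling such as TokenType.KEYWROD is rejected by the compiler rather than silently never matching.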
The approach to detecting the end of file (EOF) is poorly designed, although this is visible only in the code on the web, not in the book. The design separates the method for getting the next token from the method for detecting EOF. We can discuss in class some of the problems this causes. A better design is to have a single method that returns the next token, if there is one, or an indication that EOF has been reached.
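The single-method design can be sketched as follows. This is a deliberately simplified stand-in, not the book's lexer or the lab 1 version: the class WordScanner is hypothetical, it reads from a String rather than a file, it treats any run of non-whitespace characters as a token, and it uses an empty Optional as the indication of EOF.

```java
import java.util.Optional;

// Hypothetical, simplified scanner: tokens are whitespace-delimited words.
final class WordScanner {
    private final String text;
    private int pos = 0;

    WordScanner(String text) {
        this.text = text;
    }

    /** Returns the next token, or Optional.empty() once the input is exhausted. */
    Optional<String> next() {
        // Skip whitespace first, so trailing blanks are not mistaken for more input.
        while (pos < text.length() && Character.isWhitespace(text.charAt(pos))) {
            pos++;
        }
        if (pos == text.length()) {
            return Optional.empty();          // the EOF indication
        }
        int start = pos;
        while (pos < text.length() && !Character.isWhitespace(text.charAt(pos))) {
            pos++;
        }
        return Optional.of(text.substring(start, pos));
    }
}
```

A caller loops until next() returns an empty Optional. Because the decision about whether another token exists is made in the same place that skips whitespace, the caller never has to ask a separate EOF question whose answer might be wrong when only whitespace remains.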