MCS-287 Notes for 2006-02-07

Comments on the Tucker and Noonan book:

p. 23: The claim on this page that "C programs begin with a line that has the Keyword main followed by the three Separators (, ), and {" is incorrect, for three different reasons:
1. In C, unlike in Jay, main is not a keyword; it is just an ordinary identifier, no different in kind from foo. Execution of a program simply starts by invoking the procedure named main.
2. In C, the procedure name need not be followed immediately by those three separators with nothing else between them. In particular, there should generally be parameter declarations (or the keyword void) between the parentheses.
3. Even in Jay, the specified elements need not appear on the first line; they might be preceded by comment lines, for example. In C, an even greater variety of things might precede main in the same file, such as #include directives, declarations, and the definitions of other procedures.
p. 24: The token types Keyword, Separator, and Operator are of doubtful utility, because keywords do not all play the same grammatical role as one another, and likewise for separators and operators. (For example, a parser needs to know whether a keyword is while or if, not merely that it is some keyword.) Therefore, there is nowhere one would actually make use of these categories as such. We can discuss this more in class, and you will also see this point playing out as we get into higher-level syntactic analysis, such as in Section 2.2.
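As a rough illustration, here is a minimal Java sketch assuming a hypothetical enum TokenType with one constant per keyword, separator, and operator, and a hypothetical parser method expect. Parsing the header of a while loop requires checking for WHILE and LEFT_PAREN specifically; knowing only that a token is "some Keyword" or "some Separator" would not be enough.

```java
import java.util.List;

// Hypothetical sketch; TokenType and expect are illustrative names, not the book's.
public class TokenKindsDemo {
    // One constant per keyword, separator, and operator, rather than
    // coarse categories such as Keyword, Separator, and Operator.
    enum TokenType { WHILE, IF, ELSE, LEFT_PAREN, RIGHT_PAREN, LEFT_BRACE, RIGHT_BRACE, AND, OR, IDENTIFIER, EOF }

    private final List<TokenType> tokens;
    private int pos = 0;

    TokenKindsDemo(List<TokenType> tokens) {
        this.tokens = tokens;
    }

    // The parser always checks for one particular kind of token.
    private void expect(TokenType kind) {
        TokenType actual = tokens.get(pos);
        if (actual != kind) {
            throw new IllegalStateException("expected " + kind + " but found " + actual);
        }
        pos++;
    }

    // Hypothetical grammar fragment: the header "while ( identifier )".
    void parseWhileHeader() {
        expect(TokenType.WHILE);
        expect(TokenType.LEFT_PAREN);
        expect(TokenType.IDENTIFIER);
        expect(TokenType.RIGHT_PAREN);
    }

    public static void main(String[] args) {
        new TokenKindsDemo(List.of(TokenType.WHILE, TokenType.LEFT_PAREN,
                TokenType.IDENTIFIER, TokenType.RIGHT_PAREN, TokenType.EOF)).parseWhileHeader();
        System.out.println("parsed while header");
        try {
            // IF is a keyword too, but it is not the keyword this rule needs.
            new TokenKindsDemo(List.of(TokenType.IF, TokenType.LEFT_PAREN)).parseWhileHeader();
        } catch (IllegalStateException e) {
            System.out.println(e.getMessage());
        }
    }
}
```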
p. 26: The skeleton lexical analysis method shown in Figure 2.5 and in greater detail on the textbook's web site has several issues. (As a result, lab 1 contains an alternate version.) The difficulties with the book's version include at least the following:
- The code "first check[s] for white space" and skips it, "then check[s] for a comment and skip[s] it." This structure ignores the possibility that a comment may be followed by more whitespace and/or more comments; the skipping needs to repeat until neither is present.
- The modularity of the handling of operators is poor. The code for comments needs to know that the isOperator procedure will return true for '/'. Moreover, isOperator is a misnomer, because it needs to be true for '|' and '&', which can be the first characters of two-character operators but are not themselves operators. Given the way the code is structured, it will be tricky to handle the case where one of those characters is followed by a different character and hence turns out not to be the start of an operator.
- The Token class does not exemplify good object-oriented programming style. Notice on page 26 that no arguments are passed to the constructor; instead, values are assigned to the type and value instance variables afterward. It would be better style to pass the type and value into the constructor. Using strings for the types is also inefficient, because testing string equality requires traversing the strings, and error-prone, because the compiler will not detect a typo in one of the strings.
- Although this is visible only in the code on the web site, not in the book, the approach to detecting the end of file (EOF) is poorly designed. The design separates the method for getting the next token from the method for detecting EOF. We can discuss in class some of the problems this causes. A better design is to have a single method that returns either the next token, if there is one, or an indication that EOF has been reached.
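For comparison, here is a minimal Java sketch of a lexer structured to avoid the four problems above. It handles only a hypothetical subset of tokens (identifiers, integer literals, parentheses, /, &&, and ||) and assumes // line comments. All names (LexerSketch, TokenType, nextToken, doubled) are illustrative assumptions; this is neither the book's code nor the lab 1 version.

```java
// Hypothetical lexer sketch; names and token set are illustrative assumptions.
public class LexerSketch {
    // Token kinds as an enum: a misspelled kind is a compile-time error,
    // and comparing kinds is a cheap reference comparison, not a string scan.
    enum TokenType { IDENTIFIER, INT_LITERAL, LEFT_PAREN, RIGHT_PAREN, DIVIDE, AND, OR, EOF }

    // Immutable token whose type and value are supplied to the constructor.
    static final class Token {
        final TokenType type;
        final String value;

        Token(TokenType type, String value) {
            this.type = type;
            this.value = value;
        }

        @Override
        public String toString() {
            return type + "(" + value + ")";
        }
    }

    private final String input;
    private int pos = 0;

    LexerSketch(String input) {
        this.input = input;
    }

    // Loop until neither whitespace nor a "//" comment is next, because a
    // comment may be followed by more whitespace and more comments.
    private void skipWhitespaceAndComments() {
        while (pos < input.length()) {
            char c = input.charAt(pos);
            if (Character.isWhitespace(c)) {
                pos++;
            } else if (c == '/' && pos + 1 < input.length() && input.charAt(pos + 1) == '/') {
                while (pos < input.length() && input.charAt(pos) != '\n') {
                    pos++; // skip to end of line; the '\n' is then skipped as whitespace
                }
            } else {
                return;
            }
        }
    }

    // One method: returns the next token, or an EOF token once input is exhausted.
    Token nextToken() {
        skipWhitespaceAndComments();
        if (pos >= input.length()) {
            return new Token(TokenType.EOF, "");
        }
        int start = pos;
        char c = input.charAt(pos);
        if (Character.isLetter(c)) {
            while (pos < input.length() && Character.isLetterOrDigit(input.charAt(pos))) pos++;
            return new Token(TokenType.IDENTIFIER, input.substring(start, pos));
        }
        if (Character.isDigit(c)) {
            while (pos < input.length() && Character.isDigit(input.charAt(pos))) pos++;
            return new Token(TokenType.INT_LITERAL, input.substring(start, pos));
        }
        pos++; // consume c
        switch (c) {
            case '(': return new Token(TokenType.LEFT_PAREN, "(");
            case ')': return new Token(TokenType.RIGHT_PAREN, ")");
            case '/': return new Token(TokenType.DIVIDE, "/"); // "//" was already skipped
            case '&': return doubled('&', TokenType.AND);
            case '|': return doubled('|', TokenType.OR);
            default: throw new IllegalArgumentException("unexpected character '" + c + "' at " + start);
        }
    }

    // '&' and '|' are not operators on their own; they must be doubled.
    private Token doubled(char c, TokenType type) {
        if (pos < input.length() && input.charAt(pos) == c) {
            pos++;
            return new Token(type, "" + c + c);
        }
        throw new IllegalArgumentException("lone '" + c + "' at " + (pos - 1));
    }

    public static void main(String[] args) {
        LexerSketch lexer = new LexerSketch("a && b // note\n  // another\n (x / 2)");
        for (Token t = lexer.nextToken(); t.type != TokenType.EOF; t = lexer.nextToken()) {
            System.out.println(t);
        }
    }
}
```

Because nextToken returns an EOF token instead of relying on a separate end-of-file test, a caller can simply loop until it sees TokenType.EOF, as main does. Keeping the handling of '/', '&', and '|' inside a single switch also means the comment-skipping code does not depend on any separate isOperator classification.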

Course web site: http://www.gustavus.edu/+max/courses/S2006/MCS-287/
Instructor: Max Hailperin <max@gustavus.edu>