MCS-287 Notes for 2006-02-27

Tucker and Noonan show Abstract Syntax Tree classes that are just structures that can hold data, without any methods operating on that data. In order to perform operations such as checking validity, determining types, or (later in the book) evaluating expressions, they use separate procedures that operate on the data. This is a very old-fashioned style of programming that was already well supported by languages such as Pascal and C. We'll look at a simpler example of this style and consider some of its shortcomings. Then we'll look at alternative rewrites of this example into more modern styles, which are well supported by languages like Java. In subsequent days we'll continue this examination and also briefly consider some yet-more-modern styles that other languages beyond Java support.

We can call Tucker and Noonan's style a "structures + procedures" style, or just "procedural" for short. Using this procedural style, we can define a simple expression type with three subtypes: sums (of two subexpressions), products (of two subexpressions), and integer constants; see procedural/Expr.java. Two separate collections of procedures can operate on this same collection of data structures: one to evaluate expressions (procedural/Evaluator.java) and one to convert them into Scheme notation (procedural/Converter.java). A test program, procedural/Test.java, shows how an AST could be built up and then both converted and evaluated.

In the procedural style, the dispatching procedures such as evaluate need to explicitly test which kind of Expr is being operated on. In Java, this can be done using instanceof, as shown in the example code. In earlier languages, such as C and Pascal, it can be accomplished by tagging each structure with an explicit type tag. Because it is possible for the dispatching procedure to distinguish among the various structures, with the main Expr type being the union of all of them, this kind of structure is called a discriminated union.
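
As a rough illustration of the instanceof dispatch just described, here is a minimal sketch, not the actual contents of the procedural/ files. The names Sum, Product, Constant, left, right, value, sample, and the enclosing ProceduralSketch class are assumptions made for this sketch; everything is nested in one class only so that it fits in a single file.

```java
// A sketch of the "structures + procedures" style. Names other than
// Expr, Evaluator, and evaluate are assumptions, not the course's code.
class ProceduralSketch {
    // Plain structures: data only, no methods operating on that data.
    static abstract class Expr { }
    static class Sum extends Expr { Expr left, right; }
    static class Product extends Expr { Expr left, right; }
    static class Constant extends Expr { int value; }

    static class Evaluator {
        // Dispatching procedure: explicitly tests which kind of Expr it got.
        static int evaluate(Expr e) {
            if (e instanceof Sum) {
                Sum s = (Sum) e;
                return evaluate(s.left) + evaluate(s.right);
            } else if (e instanceof Product) {
                Product p = (Product) e;
                return evaluate(p.left) * evaluate(p.right);
            } else if (e instanceof Constant) {
                return ((Constant) e).value;
            } else {
                // Only detected at run time, never by the compiler.
                throw new IllegalArgumentException("Unknown kind of Expr");
            }
        }
    }

    // Builds 2 + 3 * 4 by creating "empty" structures and filling them in.
    static Expr sample() {
        Constant two = new Constant();   two.value = 2;
        Constant three = new Constant(); three.value = 3;
        Constant four = new Constant();  four.value = 4;
        Product product = new Product(); product.left = three; product.right = four;
        Sum sum = new Sum();             sum.left = two; sum.right = product;
        return sum;
    }

    public static void main(String[] args) {
        System.out.println(Evaluator.evaluate(sample())); // prints 14
    }
}
```

A converter written in this style would presumably repeat the same chain of instanceof tests, which is one of the shortcomings to keep in mind.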

One alternative approach would be an object-oriented style, embodying the so-called "composite pattern". In this style, the AST classes (shown in composite/Expr.java) directly embody evaluate and convert methods. The test program, composite/Test.java, invokes those methods.

I've taken the opportunity to clean up how the AST is constructed: instead of creating the structures "empty" and then assigning values to the instance variables, the structures are constructed in a meaningful state using constructor procedures. (The instance variables can also now be private.) This change could have been made on its own, without fundamentally deviating from the procedural style. (If the instance variables were private, accessor methods would need to be provided for use by the external evaluation and conversion procedures.)

The more fundamental change is the switch from external procedures acting on the structures to methods within the objects. Note in particular that the chains of ifs with instanceof tests are gone, and that Java's static type checking now ensures that there is a way to evaluate and convert each kind of expression: the code to generate runtime error messages saying "Unknown kind of Expr" is gone.
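
To make the contrast concrete, here is a minimal sketch of what the composite style might look like; it is not the actual composite/Expr.java. The subclass and field names, the sample helper, and the enclosing CompositeSketch class are assumptions for illustration.

```java
// A sketch of the object-oriented "composite" style. Names other than
// Expr, evaluate, and convert are assumptions, not the course's code.
class CompositeSketch {
    static abstract class Expr {
        // Every concrete subclass must supply both operations, or the
        // compiler rejects it; no run-time "unknown kind" check is needed.
        abstract int evaluate();
        abstract String convert();
    }

    static class Sum extends Expr {
        private final Expr left, right;
        Sum(Expr left, Expr right) { this.left = left; this.right = right; }
        int evaluate() { return left.evaluate() + right.evaluate(); }
        String convert() { return "(+ " + left.convert() + " " + right.convert() + ")"; }
    }

    static class Product extends Expr {
        private final Expr left, right;
        Product(Expr left, Expr right) { this.left = left; this.right = right; }
        int evaluate() { return left.evaluate() * right.evaluate(); }
        String convert() { return "(* " + left.convert() + " " + right.convert() + ")"; }
    }

    static class Constant extends Expr {
        private final int value;
        Constant(int value) { this.value = value; }
        int evaluate() { return value; }
        String convert() { return Integer.toString(value); }
    }

    // Builds 2 + 3 * 4; constructors leave each node in a meaningful state.
    static Expr sample() {
        return new Sum(new Constant(2),
                       new Product(new Constant(3), new Constant(4)));
    }

    public static void main(String[] args) {
        Expr e = sample();
        System.out.println(e.convert());  // prints (+ 2 (* 3 4))
        System.out.println(e.evaluate()); // prints 14
    }
}
```

Adding a new kind of expression here means adding one class; adding a new operation means touching every class.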

This design is a very suitable one when the number of operations (such as evaluation and conversion) remains small and fixed, whereas the number of kinds of data (kinds of expressions) is large and subject to growth. Unfortunately, that doesn't characterize programming language processing very well. The abstract syntax of a programming language is generally quite stable, whereas new analysis, optimization, and translation procedures can be invented more readily. So as not to have to keep adding new methods to all the classes (analogous to the evaluate and convert methods), it would be nice if we could group all the evaluation methods together in one separate Evaluator class, as in the original procedural approach. Similarly, we would group all the conversion methods together in a separate Converter class, and likewise for any other operations we wanted to add. Yet we still want to retain the advantages of the object-oriented approach. This leads to the so-called "visitor" pattern, which we can examine next.

We start by generalizing from the notion of an evaluator or a converter to a visitor, defined with the interface visitor/Visitor.java. Notice that this is a generic interface, where the type parameterization is used to allow different visitors to return different types of results as they visit each AST node in turn. (An evaluator returns integers whereas a converter returns strings.) The two specific visitors can be defined as classes implementing that interface, namely visitor/Evaluator.java and visitor/Converter.java. The AST classes in visitor/Expr.java are freed from any knowledge of these specific visitors (or any others that might be added); instead, they just have a general method to accept any visitor. This method is used (among other places) in the main test program, visitor/Test.java.
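
The following is a minimal sketch of how these pieces might fit together; it is not the actual contents of the visitor/ files. The overloaded visit methods, the subclass and field names, the sample helper, and the enclosing VisitorSketch class are assumptions. The fields are left non-private here for brevity, where accessor methods could be used instead.

```java
// A sketch of the visitor pattern. The names visit, Sum, Product, Constant,
// left, right, value, and sample are assumptions, not the course's code.
class VisitorSketch {
    // Generic interface: T is the type of result a particular visitor returns.
    interface Visitor<T> {
        T visit(Sum s);
        T visit(Product p);
        T visit(Constant c);
    }

    static abstract class Expr {
        // The AST classes know only this general entry point.
        abstract <T> T accept(Visitor<T> v);
    }

    static class Sum extends Expr {
        final Expr left, right;
        Sum(Expr left, Expr right) { this.left = left; this.right = right; }
        <T> T accept(Visitor<T> v) { return v.visit(this); }
    }

    static class Product extends Expr {
        final Expr left, right;
        Product(Expr left, Expr right) { this.left = left; this.right = right; }
        <T> T accept(Visitor<T> v) { return v.visit(this); }
    }

    static class Constant extends Expr {
        final int value;
        Constant(int value) { this.value = value; }
        <T> T accept(Visitor<T> v) { return v.visit(this); }
    }

    // All the evaluation code is grouped in one class, as in the procedural style.
    static class Evaluator implements Visitor<Integer> {
        public Integer visit(Sum s) { return s.left.accept(this) + s.right.accept(this); }
        public Integer visit(Product p) { return p.left.accept(this) * p.right.accept(this); }
        public Integer visit(Constant c) { return c.value; }
    }

    // Likewise for conversion to Scheme notation.
    static class Converter implements Visitor<String> {
        public String visit(Sum s) {
            return "(+ " + s.left.accept(this) + " " + s.right.accept(this) + ")";
        }
        public String visit(Product p) {
            return "(* " + p.left.accept(this) + " " + p.right.accept(this) + ")";
        }
        public String visit(Constant c) { return Integer.toString(c.value); }
    }

    // Builds 2 + 3 * 4.
    static Expr sample() {
        return new Sum(new Constant(2),
                       new Product(new Constant(3), new Constant(4)));
    }

    public static void main(String[] args) {
        Expr e = sample();
        System.out.println(e.accept(new Converter())); // prints (+ 2 (* 3 4))
        System.out.println(e.accept(new Evaluator())); // prints 14
    }
}
```

In this sketch a new operation is just another class implementing Visitor, with no change to the AST classes, and the compiler still insists that each visitor handle every kind of expression.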

Course web site: http://www.gustavus.edu/+max/courses/S2006/MCS-287/
Instructor: Max Hailperin <max@gustavus.edu>