MCS-388 Lab 4: Control Flow and Scoping (Spring 2003)

Due: April 23, 2003

Objective

This time you'll keep your compiler's general form intact as you evolve it from c3 to c4, but you'll provide for some important additions to the source language:

Several new kinds of statements:
- while statements
- if statements, both with and without else parts
- brace-enclosed compound statements
Nested variable scopes (arising from declarations in compound statements)
The six arithmetic comparison operators (<, >, <=, >=, ==, !=)

Of these, while and if statements introduce interesting control flow, for the first time allowing programs that do something other than continue in a straight-line path. Compound statements are important in conjunction with while and if statements, in order to allow a loop body (for example) to perform more than one action. Moreover, introducing compound statements gives us a natural context for variable scoping. The comparison operators are the only addition of no real substance: they are fundamentally similar to the arithmetic operators, and could equally well have been in the compiler the whole time. The only reason to add them now is that they become particularly handy to have now that we have loops and conditional statements.

Some notes on the new language features

We will follow the lead of C, rather than Java, in using integers to represent truth values, rather than having a separate boolean type. If the condition of a while or if evaluates to a non-zero value it counts as true, while a zero value counts as false. The comparison operations should all produce 1 for true and 0 for false. The comparison operators should be usable anywhere any other operator (e.g., +) could be used. So, for example, there is nothing wrong with

x = y > z;

which assigns x the value 1 or 0, or

print_int((x > y) + (z > w));

which prints one of 0, 1, or 2.

The syntax of while and if statements should be as in C (and C++ and Java). For example, the controlling expression in each case must be parenthesized, and there is no keyword (like then or do) between the parenthesized expression and the following statement. Remember that any statement can be used as the body of a loop, or as an alternative in an if statement: there is no requirement of curly braces. They have nothing to do with the while or if statement, and are only present if a compound statement is used, as part of the compound statement. For example, the following three while loops are all legal:

while(x >= 10)
  x = x / 10;     // body is an assignment statement

while(x >= 10){   // body is a compound statement, which
  x = x / 10;     // happens to only have one statement in it
  }

while(x >= 10){   // body is a more general compound statement
  int d = x % 10;
  x = x / 10;
  print_int(d);
  }

Each brace-enclosed compound statement constitutes a new scope, nested inside the surrounding scope. It is legal to redeclare a variable that was declared in some outer, surrounding scope, but not legal to redeclare a variable in the exact same scope. Each variable reference refers to a declaration that textually precedes the reference and is in a scope that has not yet ended at the point of the reference. If there is no such declaration, the variable reference is in error. If there is more than one such declaration, the one in the innermost scope is used. To determine whether a declaration textually precedes a reference, consider any declaration that has an initializer to take place at the end of the initializer. For example, in the program:

main(){
  int x;          // declaration 1
  x = 3;          // reference (a)
  while(x){       // reference (b)
    x = x - 1;    // references (c) and (d)
    int x = x + 5;// reference (e) and declaration 2
    print_int(x); // reference (f)
    }
  {
    print_int(x); // reference (g)
  }
  {
    int x = 2;    // declaration 3
  }
  print_int(x);   // reference (h)
  }

The x declared by declaration 1 is used by references (a), (b), (c), (d), (e), (g), and (h), while the x declared by declaration 2 is used only by reference (f), and the x declared by declaration 3 isn't used at all.

Quality expectations for generated code

Ideally, your compiler should include the optimizations from the prior lab. However, if you were unsuccesful with these in the prior lab, adding them now should be a lower priority than adding the new features.

If you do have the new features and the prior optimizations working, it would be desirable to also turn relational operators around when appropriate to be able to take advantage of immediate operands. For example, if the source code contains 3>x, it would be better to turn this around to x<3 so that you can generate an instruction like slt $t0, $s0, 3.

For while loops, it is preferable to generate code like this:

   jump to label1
label2:
   code for the body goes here
label1:
   code for the test goes here
   branch if true to label2

rather than:

label1:
   code for the test goes here
   branch if false to label2
   code for the body goes here
   jump to label1
label2:

Normally if statements translate into something like

   code for the test goes here
   branch if false to label1
   code for the first alternative goes here
   jump to label2
label1:
   code for the "else" part (2nd alternative) goes here
label2:

However, when the if statement has no else part, or where the else part is sufficiently vacuous that it translates to an empty string of assembly code, it would be preferable not to generate the jump to label2, since it is a waste of time to jump to the next instruction.

Don't worry if there are unusual circumstances under which your code will jump (or branch) to a jump instruction, or jump (or branch) to the next instruction, so long as you've taken care to not generate routinely stupid code. (Routinely stupid means something like jumping around a non-existent else part.)

Code to start from

Code from prior lab

I hope most students will have at least the general AST infrastructure from the prior lab more or less working, if not the optimizations. This should provide a sufficient base for the new lab since you can just omit the optimizations. For most of you it will therefore be preferable to continue building on that base, rather than throwing it out and starting over from code I supply. It would be nice to fix the problems I identified in my grading comments, though the new features are sufficiently de-coupled from many typical problem areas that failure to fix all the old problems won't necessarily cause any difficulty in doing the new stuff.

If any of you thinks your difficulties with the prior lab would put you at a disadvantage for doing the new lab, you are welcome to ask me for my code from the prior lab, to use as a starting point.

Sample ScopedIdentifier class

To help get you off on a good start with the scoping, I'm providing (linked onto the web version of this lab assignment) the interface documentation for my scoped version of the Identifier class. This class is available to you as compiled code in my JavaLib, if you choose not to build your own. It uses the "dictionary of stacks" approach we discussed in class. Note that this class is nearly, but not quite, compatible with the lab3 Identifier class, so I called it ScopedIdentifier rather than Identifier. This means that you will have to alter the other files to use ScopedIdentifier rather than Identifier. (Or you could write and use your own class.) The incompatibility is that the declare method now no longer allows you to register an arbitrary Object with the ScopedIdentifier, but instead only ones that implement the Declaration interface. This ensures that there is the leavingScope callback. This in turn implies that you will no longer be able to ask the ScopedIdentifier to remember a bare Register as its declaration, since the Register class doesn't implement the Declaration interface. Instead, you can make a new class that implements Declaration, and where each instance of that new class holds a Register.

My intention is that you call all the ScopedIdentifier class's method's during the parsing process, not during the AST's later code generation. This includes enterScope, exitScope, get, declare, getDeclaration and (when errors need reporting) toString. Of course, not all of these need to be literally called directly from parser.cup. It is OK if you want to call some of them indirectly, from something called from the parser. (As one example, you may well be calling the get method from the scanner rather than the parser -- but the parser calls the scanner.)

If you are looking for another learning opportunity, this would be a great class to write yourself.

Shift/reduce conflicts

Our textbook mentions that YACC will resolve a shift/reduce conflict in favor of shifting. The same is true for java_cup, but only if you give a command-line option telling it how many conflicts to expect. That is, rather than just

   java java_cup.Main <parser.cup

you may need to say

   java java_cup.Main -expect 1 <parser.cup

if your grammar has one shift/reduce conflict in it that can be correctly resolved in favor of shifting. This is likely to be an issue with the addition of if statements.

A constant-folding subtlety

Now that a program can contain code that is never executed, some subtleties arise with regard to division by zero. These issues shouldn't distract you from the main goals of the lab, but the perfectionists among you should care.

It is important for your compiler not to die if the program contains divisions by zero, because they might be in a non-executed part of the program. Thus, you'll have to not constant-fold any divisions where the divisor is zero.

Worse yet, it turns out that SPIM (our MIPS simulator) doesn't recognize that some instructions aren't executed, and so it won't load in an assembly language program that has a division instruction with an immediate 0 operand. Thus you'll need to be careful not to generate such instructions. (You can use register $zero instead.)

Postlab

Write a lab report showing your c4 modules, highlighting the differences from the code for c3. You should also describe the tests you ran, and any problems uncovered.

Course web site: http://www.gac.edu/~max/courses/S2003/MCS-388/
Instructor: Max Hailperin <max@gac.edu>