MCS-388 Lab 4: Control Flow and Scoping (Spring 2005)

Due: April 19, 2005

Objective

This time you'll keep your compiler's general form intact as you evolve it from c3 to c4, but you'll provide for some important additions to the source language:

Of these, while and if statements introduce interesting control flow, for the first time allowing programs that do something other than continue in a straight-line path. Compound statements are important in conjunction with while and if statements, in order to allow a loop body (for example) to perform more than one action. Moreover, introducing compound statements gives us a natural context for variable scoping. The comparison operators are the only addition of no real substance: they are fundamentally similar to the arithmetic operators, and could equally well have been in the compiler the whole time. The only reason to add them now is that they become particularly handy to have now that we have loops and conditional statements.

Some notes on the new language features

We will follow the lead of C, rather than Java, in using integers to represent truth values, rather than having a separate boolean type. If the condition of a while or if evaluates to a non-zero value it counts as true, while a zero value counts as false. The comparison operations should all produce 1 for true and 0 for false. The comparison operators should be usable anywhere any other operator (e.g., +) could be used. So, for example, there is nothing wrong with
x = y > z;
which assigns x the value 1 or 0, or
print_int((x > y) + (z > w));
which prints one of 0, 1, or 2.

The syntax of while and if statements should be as in C (and C++ and Java). For example, the controlling expression in each case must be parenthesized, and there is no keyword (like then or do) between the parenthesized expression and the following statement. Remember that any statement can be used as the body of a loop, or as an alternative in an if statement: there is no requirement of curly braces. They have nothing to do with the while or if statement, and are only present if a compound statement is used, as part of the compound statement. For example, the following three while loops are all legal:

while(x >= 10)
  x = x / 10;     // body is an assignment statement

while(x >= 10){   // body is a compound statement, which
  x = x / 10;     // happens to only have one statement in it
  }

while(x >= 10){   // body is a more general compound statement
  int d = x % 10;
  x = x / 10;
  print_int(d);
  }

Each brace-enclosed compound statement constitutes a new scope, nested inside the surrounding scope. It is legal to redeclare a variable that was declared in some outer, surrounding scope, but not legal to redeclare a variable in the exact same scope. Each variable reference refers to a declaration that textually precedes the reference and is in a scope that has not yet ended at the point of the reference. If there is no such declaration, the variable reference is in error. If there is more than one such declaration, the one in the innermost scope is used. To determine whether a declaration textually precedes a reference, consider any declaration that has an initializer to take place at the end of the initializer. For example, in the program:

main(){
  int x;          // declaration 1
  x = 3;          // reference (a)
  while(x){       // reference (b)
    x = x - 1;    // references (c) and (d)
    int x = x + 5;// reference (e) and declaration 2
    print_int(x); // reference (f)
    }
  {
    print_int(x); // reference (g)
  }
  {
    int x = 2;    // declaration 3
  }
  print_int(x);   // reference (h)
  }
The x declared by declaration 1 is used by references (a), (b), (c), (d), (e), (g), and (h), while the x declared by declaration 2 is used only by reference (f), and the x declared by declaration 3 isn't used at all.

Quality expectations for generated code

Ideally, your compiler should include the optimizations from the prior lab. However, if you were unsuccesful with these in the prior lab, adding them now should be a lower priority than adding the new features.

If you do have the new features and the prior optimizations working, it would be desirable to also turn relational operators around when appropriate to be able to take advantage of immediate operands. For example, if the source code contains 3>x, it would be better to turn this around to x<3 so that you can generate an instruction like slt $t0, $s0, 3.

For while loops, it is preferable to generate code like this:

   jump to label1
label2:
   code for the body goes here
label1:
   code for the test goes here
   branch if true to label2
rather than:
label1:
   code for the test goes here
   branch if false to label2
   code for the body goes here
   jump to label1
label2:

Normally if statements translate into something like

   code for the test goes here
   branch if false to label1
   code for the first alternative goes here
   jump to label2
label1:
   code for the "else" part (2nd alternative) goes here
label2:
However, when the if statement has no else part, or where the else part is sufficiently vacuous that it translates to an empty string of assembly code, it would be preferable not to generate the jump to label2, since it is a waste of time to jump to the next instruction.

Don't worry if there are unusual circumstances under which your code will jump (or branch) to a jump instruction, or jump (or branch) to the next instruction, so long as you've taken care to not generate routinely stupid code. (Routinely stupid means something like jumping around a non-existent else part.)

Code to start from

Code from prior lab

I hope most students will have at least the general AST infrastructure from the prior lab more or less working, if not the optimizations. This should provide a sufficient base for the new lab since you can just omit the optimizations. For most of you it will therefore be preferable to continue building on that base, rather than throwing it out and starting over from code I supply. It would be nice to fix the problems I identified in my grading comments, though the new features are sufficiently de-coupled from many typical problem areas that failure to fix all the old problems won't necessarily cause any difficulty in doing the new stuff.

If any of you thinks your difficulties with the prior lab would put you at a disadvantage for doing the new lab, you are welcome to ask me for my code from the prior lab, to use as a starting point.

Sample SymbolTable class

To help get you off on a good start with the scoping, I'm providing (linked onto the web version of this lab assignment) the interface documentation for my SymbolTable class, which I demonstrated for you. This class is available to you as compiled code in my JavaLib, if you choose not to build your own. It uses the "map of stacks" approach we discussed in class.

My intention is that you call all the SymbolTable's methods during the parsing process, not during the AST's later code generation. You probably understand how get and put will be used, because they are so similar to the corresponding HashMap methods in the previous lab; the one difference is in exception handling. The enterScope and exitScope methods are also not a major challenge. Thus, the most confusing question for you is likely to be how to use localValues. The point of the localValues method is that you can call it immediately before exitScope, to find out the Registers that need deallocation.

If you are looking for another learning opportunity, SymbolTable would be a great class to write yourself.

Shift/reduce conflicts

Our textbook mentions that YACC will resolve a shift/reduce conflict in favor of shifting. The same is true for java_cup, but only if you give a command-line option telling it how many conflicts to expect. That is, rather than just
   java java_cup.Main <parser.cup
you may need to say
   java java_cup.Main -expect 1 <parser.cup
if your grammar has one shift/reduce conflict in it that can be correctly resolved in favor of shifting. This is likely to be an issue with the addition of if statements.

A constant-folding subtlety

Now that a program can contain code that is never executed, some subtleties arise with regard to division by zero. These issues shouldn't distract you from the main goals of the lab, but the perfectionists among you should care, once your compiler is working in more fundamental regards.

It is important for your compiler not to die if the program contains divisions by zero, because they might be in a non-executed part of the program. Thus, you'll have to not constant-fold any divisions where the divisor is zero.

Worse yet, it turns out that SPIM (our MIPS simulator) doesn't recognize that some instructions aren't executed, and so it won't load in an assembly language program that has a division instruction with an immediate 0 operand. Thus you'll need to be careful not to generate such instructions. (You can use register $zero instead.)

Postlab

Write a lab report showing your c4 modules, highlighting the differences from the code for c3. You should also describe the tests you ran, and any problems uncovered.


Course web site: http://www.gac.edu/~max/courses/S2005/MCS-388/
Instructor: Max Hailperin <max@gustavus.edu>