MCS-388 Lab 5: Procedures (Spring 2005)

Due: May 11, 2005

Objective

You will expand the language accepted by your compiler to include multiple procedures (rather than just the main procedure and the predefined print_int procedure), and to include procedure calls and returns.

Most of the features you introduced in your previous c4 compiler are independent from the new material on procedures. Thus, if you had problems with the control flow statements or comparison operations in c4, you can build c5 from c3 rather than c4. (Of course, procedures are less interesting without conditionals, since you can't do non-infinite recursion without the ability to test for a base case.) The only c4 feature you need is scoping. (Even that only needs to work for two levels, rather than fully generally.)

Warning/Invitation

A couple years ago, one of my students wrote in his course evaluation: "I noticeably failed the procedure lab. In retrospect, I should have gone to you for more help. I don't blame you for that failing." Don't put yourself in that position; I would welcome the opportunity to work with you.

Levels at which this lab can be done

I have identified a minimal set of functionality that I fully expect everyone to achieve, a more complete set that I hope everyone will achieve (but which it would be wise to put at a lower priority than the minimal set), and a final most-complete set that I expect some of you will achieve.

Minimal functionality

The minimal functionality is to allow the declaration and definition of procedures that have the ``return type'' void, i.e., that don't return any value, together with new statement types for calling such procedures and returning from them. The details are explained below, but first let's look at an example program:

void foo(int x);

void bar(){
  foo(3);
}

void main(){
  bar();
}

void print_int(int val);

void foo(int count){
  if(count == 0)
    return;
  foo(count - 1);
  print_int(count);
}

The assembly language code corresponding to this example program is given in the later section on generated code.

The overall structure of a program should now be a list of zero or more items, each of which is either a procedure declaration or a procedure definition. (Note that main is no longer special; its definition is just like any other procedure definition. The lexical analyzer should stop recognizing main, and similarly print_int, as a keyword, and instead treat it as just a normal identifier.)

A procedure declaration specifies the name, return type, and parameters of a procedure, but doesn't provide any body code for it. Here is an example procedure declaration:

void foo(int one, int another);

There are two reasons for procedure declarations. One is to provide type information for library procedures not defined in the source file. For example, you'll probably want to use the declaration

void print_int(int value);

The other reason for a declaration is provide early information about a procedure that will be defined later in the source file. The reason why this is important is that we will impose a rule that no procedure can be called before it has been either declared or defined. Thus if we want to have two mutually-recursive procedures (only one of which can be defined before the other, but each of which calls the other), we will need to use a procedure declaration.

A procedure definition looks just like a procedure declaration, except that in place of the semicolon it has a brace-enclosed block of statements. It serves as a declaration, but additionally provides for the actual code. We will have a rule that each procedure may only be defined once, but it can be declared as many times as you want, so long as all the declarations are consistent with one another and with the definition (if any). We'll treat procedure definitions as declarations for the procedure before the body of the procedure; that way self-recursive procedures are legal without needing a separate declaration of the procedure first.

You should eliminate one kind of statement, the special print_int statement, and in its place introduce a general procedure call statement. A procedure call is written as the name of the procedure, followed by a parenthesized list of comma-separated argument expressions, and then a semicolon. (Thus the previous print_int statements become legal procedure-call statements.) A procedure call is only legal if the procedure has been declared or defined and the number and type of arguments matches the parameters in the declaration or definition. (Currently the only possible type is int.) Additionally, for procedure calls used as statements, the declaration or definition must show a return type of void.

You should also add one other kind of statement, a return statement, with the syntax (for now) being that it is just the keyword return followed by a semicolon. Execution of this statement causes the procedure to immediately return.

For simplicity, we will restrict procedures to take at most four arguments, so they can be passed in the $a registers. You may also simplify your compiler, at this first, minimal level of functionality, by leaving out some of the error checking. (However, this is just an expedient for getting something working quickly; I do want you to eventually add in the error checking, and some of it may be easier to design in from the beginning.)

Less-minimal functionality

Once you have the above minimal functionality working, you should add in any error checking you left out, and expand your language to include procedures with return type int as well as void. This will be accompanied by needing a way to call and return from this new kind of procedure.

To call these procedures, you'll need a procedure call expression, rather than just a procedure call statement. However, for this level of functionality (and difficulty), you should restrict the language syntax in such a way that procedure call expressions can only occur as top-level expressions, not nested inside any other expression or inside a procedure call statement. That way you can be sure that none of the caller-saves registers will be live at the time of the procedure call, which simplifies matters. Procedure calls appearing as expressions look just like those appearing as statements; the error checking is also the same except that now the procedure must be one that returns an int.

To return a value from these procedures, you'll add a second form of return statement, in which an expression appears between the return keyword and the semicolon. Ideally you should check that this form of return statement is only used in procedures that are declared as returning int, and that the expressionless form is used only in procedures that are declared as returning void. Don't worry about checking for procedures that are declared as returning int but might fall off the end without executing a return statement; that requires more sophisticated technology to check for, which we'll discuss in class.

Below is a second example program, using the features described above. Again, the assembly language translation is given in the section on generated code.

int id(int x){
  return x;
}

void print_int(int x);

int main(){
  if(id(0))
    return 1;
  int three = id(3);
  print_int(three);
  return 0;
}

The ultimate functionality

If you can get the above working, and still are looking for further challenge, it would be desirable to allow procedure call expressions to also be nested within other expressions. In order to do this, you'll need to keep track of what caller-saves registers (temporary registers and argument registers) are in use across the call, and arrange to save them to the stack and restore them from the stack. You'll also need to move a procedure call's return value out of the value register and into a temporary register if the call appears as a sub-expression and another call occurs in the same overall expression. The easiest way to deal with this is to always move the value to another register, namely the target register if one is specified, or a temporary if not.

Generated code

Translation of first example program

For brevity, I won't show the standard print_int library code, which always appears in the assembly output no matter what the program is.

bar:
        subu $sp, $sp, 4
        sw $ra, 0($sp)
        li $a0, 3
        jal foo
bar_epilog:
        lw $ra, 0($sp)
        addu $sp, $sp, 4
        jr $ra

main:
        subu $sp, $sp, 4
        sw $ra, 0($sp)
        jal bar
main_epilog:
        lw $ra, 0($sp)
        addu $sp, $sp, 4
        jr $ra

foo:
        subu $sp, $sp, 8
        sw $ra, 0($sp)
        sw $s0, 4($sp)
        move $s0, $a0
        seq $t0, $s0, 0
        beqz $t0, if_0
        j foo_epilog
if_0:
        sub $a0, $s0, 1
        jal foo
        move $a0, $s0
        jal print_int
foo_epilog:
        lw $ra, 0($sp)
        lw $s0, 4($sp)
        addu $sp, $sp, 8
        jr $ra

Procedures

No assembly code needs to be generated for a procedure declaration, it serves only for the compiler's own bookkeeping. (However, the library procedure print_int should still be emitted at the beginning.)

For a procedure definition, you should emit the procedure's name as a label, then the procedure's prolog, body, and labelled epilog. The general structure is like that of the main procedure in prior labs. Upon entry to a procedure, the procedure should move its arguments from the $a registers they arrive in to $s registers, and thereafter treat them as normal local variables living in the $s registers. The epilog should have a label at the beginning for return statements to jump to. (Remember that each procedure will need to use a different label for this, so that multiple procedures can co-exist.) Another novelty is that the placing of 0 into the $v0 register should be omitted from the epilog. Instead, if someone wants to return 0, they will need to use an explicit return 0; statement, which places the 0 into $v0 and then jumps to the epilog.

Return statements

As indicated above, a return statement just needs to evaluate the expression (if any) into $v0 and then jump to the epilog label. There are two ways the return statement could find its procedure's epilog label:

Add an argument to Stmt gen methods for the epilog label and pass it all the way down from the procedure definition through SeqStmt, WhileStmt, etc. until it reaches ReturnStmt.
Use a static variable (preferably private, with access through a public static method) to record the epilog label from the procedure definition and then retrieve it in the ReturnStmt gen method.

Procedure call statements

A procedure call statement can be compiled simply by generating the code for all the argument expressions, each targeted into the appropriate argument register ($a0, $a1, etc.), and then a jal instruction with the called procedure's name. Remember that the jal instruction implicitly writes into the $ra register. Thus you will need to add this register to the set of registers that needs saving. (The saving/restoring will get done in the calling procedure's prolog and epilog.)

Procedure call expressions that aren't subexpressions

If you allow procedure calls to be expressions, but only as top level expressions, then you can generate the same code for them as for a procedure call statement, followed by a move of the value from $v0 to the specified target register, if any. (If no target register was specified, the value can be provided in $v0. This might be relevant if the procedure call were serving as the test expression for a while loop or if statement.) For example, here is the translation of the second example program given earlier (again, without print_int):

id:
        subu $sp, $sp, 4
        sw $s0, 0($sp)
        move $s0, $a0
        move $v0, $s0
        j id_epilog
id_epilog:
        lw $s0, 0($sp)
        addu $sp, $sp, 4
        jr $ra

main:
        subu $sp, $sp, 8
        sw $ra, 0($sp)
        sw $s0, 4($sp)
        li $a0, 0
        jal id
        beqz $v0, if_0
        li $v0, 1
        j main_epilog
if_0:
        li $a0, 3
        jal id
        move $s0, $v0
        move $a0, $s0
        jal print_int
        li $v0, 0
        j main_epilog
main_epilog:
        lw $ra, 0($sp)
        lw $s0, 4($sp)
        addu $sp, $sp, 8
        jr $ra

Procedure call subexpressions

If you go all the way and allow procedure calls to also serve as subexpressions and as arguments in procedure call statements, then you will need to precede the above-mentioned code with sw instructions to save to the stack frame any $t or $a registers that are currently in use. (The latter could arise if this procedure call were nested inside another one, e.g., f(x, g(y)).) After the procedure call, you will need to generate corresponding lw instructions. Additionally, somewhere you will need to allocate space in the stack frame. The simplest approach is to subtract from the stack pointer before the stores and add back to it after the loads. (Another approach is pre-allocate space in the procedure prolog, so that the only adjusting of the stack pointer is in the prolog and epilog.)

Postlab

Write a lab report showing your c5 modules, highlighting the differences from the code for c4 (or c3). You should also describe the tests you ran, and any problems uncovered.

Course web site: http://www.gac.edu/~max/courses/S2005/MCS-388/
Instructor: Max Hailperin <max@gustavus.edu>