MC97 Lab 5: Procedures (Spring 1997)

Due: May 7, 1997

Objective

You will expand the language accepted by your compiler to include multiple procedures (rather than just the main procedure and the predefined print_int procedure), and to include procedure calls and returns.

Most of the features you introduced in your previous c4 compiler are independent from the new material on procedures. Thus, if you had problems with the control flow statements or comparison operations in c4, you can build c5 from c3 rather than c4. (Of course, procedures are less interesting without conditionals, since you can't do non-infinite recursion without the ability to test for a base case.) The only c4 feature you need is scoping. (Even that only needs to work for two levels, rather than fully generally.)

Levels at which this lab can be done

I have identified a minimal set of functionality that I fully expect everyone to achieve, a more complete set that I hope everyone will achieve (but which it would be wise to put at a lower priority than the minimal set), and a final most-complete set that I expect some of you will achieve.

Minimal functionality

The minimal functionality is to allow the declaration and definition of procedures that have the ``return type'' void, i.e., that don't return any value, together with new statement types for calling such procedures and returning from them.

The overall structure of a program should now be a list of zero or more items, each of which is either a procedure declaration or a procedure definition. (Note that main is no longer special; its definition is just like any other procedure definition. The lexical analyzer should stop recognizing main, and similarly print_int, as a keyword, and instead treat it as just a normal identifier.)

A procedure declaration specifies the name, return type, and parameters of a procedure, but doesn't provide any body code for it. Here is an example procedure declaration:

void foo(int one, int another);

There are two reasons for procedure declarations. One is to provide type information for library procedures not defined in the source file. For example, you'll probably want to use the declaration

void print_int(int value);

The other reason for a declaration is provide early information about a procedure that will be defined later in the source file. The reason why this is important is that we will impose a rule that no procedure can be called before it has been either declared or defined. Thus if we want to have two mutually-recursive procedures (only one of which can be defined before the other, but each of which calls the other), we will need to use a procedure declaration.

A procedure definition looks just like a procedure declaration, except that in place of the semicolon it has a brace-enclosed block of statements. It serves as a declaration, but additionally provides for the actual code. We will have a rule that each procedure may only be defined once, but it can be declared as many times as you want, so long as all the declarations are consistent with one another and with the definition (if any). We'll treat procedure definitions as declarations for the procedure before the body of the procedure; that way self-recursive procedures are legal without needing a separate declaration of the procedure first.

You should eliminate one kind of statement, the special print_int statement, and in its place introduce a general procedure call statement. A procedure call is written as the name of the procedure, followed by a parenthesized list of comma-separated argument expressions, and then a semicolon. (Thus the previous print_int statements become legal procedure-call statements.) A procedure call is only legal if the procedure has been declared or defined and the number and type of arguments matches the parameters in the declaration or definition. (Currently the only possible type is int.) Additionally, for procedure calls used as statements, the declaration or definition must show a return type of void.

You should also add one other kind of statement, a return statement, with the syntax (for now) being that it is just the keyword return followed by a semicolon. Execution of this statement causes the procedure to immediately return.

For simplicity, we will restrict procedures to take at most four arguments, so they can be passed in the $a registers. You may also simplify your compiler, at this first, minimal level of functionality, by leaving out some of the error checking. (However, this is just an expedient for getting something working quickly; I do want you to eventually add in the error checking, and some of it may be easier to design in from the beginning.)

Less-minimal functionality

Once you have the above minimal functionality working, you should add in any error checking you left out, and expand your language to include procedures with return type int as well as void. This will be accompanied by needing a way to call and return from this new kind of procedure.

To call these procedures, you'll need a procedure call expression, rather than just a procedure call statement. However, for this level of functionality (and difficulty), you should restrict the language syntax in such a way that procedure call expressions can only occur as top-level expressions, not nested inside any other expression or inside a procedure call statement. That way you can be sure that none of the caller-saves registers will be live at the time of the procedure call, which simplifies matters. Procedure calls appearing as expressions look just like those appearing as statements; the error checking is also the same except that now the procedure must be one that returns an int.

To return a value from these procedures, you'll add a second form of return statement, in which an expression appears between the return keyword and the semicolon. Ideally you should check that this form of return statement is only used in procedures that are declared as returning int, and that the expressionless form is used only in procedures that are declared as returning void. Don't worry about checking for procedures that are declared as returning int but might fall off the end without executing a return statement; that requires more sophisticated technology to check for, which we'll discuss in class.

The ultimate functionality

If you can get the above working, and still are looking for further challenge, it would be desirable to allow procedure call expressions to also be nested within other expressions. In order to do this, you'll need to keep track of what caller-saves registers (temporary registers and argument registers) are in use across the call, and arrange to save them to the stack and restore them from the stack. You'll also need to move a procedure call's return value out of the value register and into a temporary register if the call appears as a sub-expression and another call occurs in the same overall expression. The easiest way to deal with this is to always move the value to another register, namely the target register if one is specified, or a temporary if not.

Generated code

Procedures

No assembly code needs to be generated for a procedure declaration, it serves only for the compiler's own bookkeeping. (However, the library procedure print_int should still be emitted at the beginning.)

For a procedure definition, you should emit the procedure's name as a label, then the procedure's prolog, body, and labelled epilog. The general structure is like that of the main procedure in prior labs. Upon entry to a procedure, the procedure should move its arguments from the $a registers they arrive in to $s registers, from there on treating them as normal local variables living in the $s registers. The epilog should have a label at the beginning for return statements to jump to. (Remember that each procedure will need to use a different label for this, so that multiple procedures can co-exist.) Another novelty is that the placing of 0 into the $v0 register should be omitted from the epilog. Instead, if someone wants to return 0, they will need to use an explicit return 0; statement, which places the 0 into $v0 and then jumps to the epilog.

Return statements

As indicated above, a return statement just needs to evaluate the expression (if any) into $v0 and then jump to the epilog label.

Procedure call statements

A procedure call statement can be compiled simply by generating the code for all the argument expressions, each targeted into the appropriate argument register ($a0, $a1, etc.), and then a jal instruction with the called procedure's name. Remember that the jal instruction implicitly writes into the $ra register. Thus you will need to add this register to the set of registers that needs saving. (The saving/restoring will get done in the calling procedure's prolog and epilog.)

Procedure call expressions that aren't subexpressions

If you allow procedure calls to be expressions, but only as top level expressions, then you can generate the same code for them as for a procedure call statement, followed by a move of the value from $v0 to the specified target register, if any. (If no target register was specified, the value can be provided in $v0. This might be relevant if the procedure call were serving as the test expression for a while loop or if statement.)

Procedure call subexpressions

If you go all the way and allow procedure calls to also serve as subexpressions and as arguments in procedure call statements, then you will need to precede the above-mentioned code with sw instructions to save to the stack frame any $t or $a registers that are currently in use. (The latter could arise if this procedure call were nested inside another one, e.g., f(x, g(y)).) After the procedure call, you will need to generate corresponding lw instructions. Additionally, somewhere you will need to allocate space in the stack frame. The simplest approach is to subtract from the stack pointer before the stores and add back to it after the loads. (Another approach is pre-allocate space in the procedure prolog, so that the only adjusting of the stack pointer is in the prolog and epilog.)

Postlab

Write a lab report showing your c5 modules, highlighting the differences from the code for c4 (or c3). You should also describe the tests you ran, and any problems uncovered.

Course web site: http://www.gac.edu/~max/courses/S97/MC97/
Instructor: Max Hailperin <max@gac.edu>