while
statements
if
statements, both with and without else
parts
<
,
>
, <=
, >=
,
==
, !=
)
Of these, while
and if
statements introduce interesting control
flow, for the first time allowing programs that do something other
than continue in a straight-line path. Compound statements are
important in conjunction with while
and if
statements, in order to
allow a loop body (for example) to perform more than one action.
Moreover, introducing compound statements gives us a natural context
for variable scoping. The comparison operators are the only addition
of no real substance: they are fundamentally similar to the
arithmetic operators, except for a minor quirk of LLVM that we will
need to work around. The comparison operators could equally well have been in the compiler
the whole time. The only reason to add them now is that they become
particularly handy to have now that we have loops and conditional
statements.
We will follow the lead of C, rather than Java, in using integers
to represent truth values, rather than having a separate boolean
type. If the condition of a while
or if
evaluates to a non-zero value
it counts as true, while a zero value counts as false. The comparison
operations should all produce 1 for true and 0 for false.
The comparison operators should be usable
anywhere any other operator (e.g., +) could be used. So, for example,
there is nothing wrong with
x = y > z;
which assigns x the value 1 or 0, or
println((x > y) + (z > w));
which prints one of 0, 1, or 2.
The syntax of while
and if
statements should be as in C (and C++
and Java). For example, the controlling expression in each case must
be parenthesized. Remember that
any statement can be used as the body of a loop, or as an alternative
in an if
statement: there is no requirement of curly
braces. They have nothing to do with the while
or
if
statement and are only present if a compound
statement is used; they form part of the compound statement. For
example, the following three while
loops are all legal:
while(x >= 10) x = x / 10; // body is an assignment statement while(x >= 10){ // body is a compound statement, which x = x / 10; // happens to only have one statement in it } while(x >= 10){ // body is a more general compound statement int d = x % 10; x = x / 10; println(d); } { // here is is a compound statement int x = 3; { // with another compound statement inside int y = 4; // just to emphasize that they don't println(x+y); // need to show up with while or if } println(x); }
Each brace-enclosed compound statement constitutes a new scope, nested inside the surrounding scope. It is legal to redeclare a variable that was declared in some outer, surrounding scope, but not legal to redeclare a variable in the exact same scope.
Each comparison operation needs to translate into two LLVM
instructions because LLVM is a strongly typed language, in which
32-bit values (such as we are using for integers) are distinct
from 1-bit values (such as the comparison instruction produces).
As such, we need to first use one instruction to do the comparison
(producing a 1-bit result) and then a second instruction to widen
it to 32 bits. This second instruction is known as "zero
extension," because the 31 additional bits are always zero. Here
is an example of testing whether the value named %t0
is less than the value named %t1
:
%t2 = icmp slt i32 %t0, %t1 %t3 = zext i1 %t2 to i32
At this point, the value named %t3
is a normal
32-bit integer value, which will be either 0 or 1. The
code slt
indicates that a signed less-than
comparison was desired. Other comparisons are sgt
(signed greater than), sle
(signed less than or equal
to), sge
(signed greater than or equal
to), eq
(equal to), and ne
(not equal to).
In order to translate while
and if
statements, you'll need to know how to code branch instructions in
LLVM assembly language. To mark a position within the assembly
code that is a possible target for branching to, use a label
followed by a colon, as in this example:
label_0:
In order to unconditionally branch to that label, you would use the following LLVM instruction:
br label %label_0
In order to conditionally branch to label_1
if %t17
is nonzero and to label_2
if %t17
is zero, use this pair of instructions:
%t18 = icmp ne i32 %t17, 0 br i1 %t18, label %label_1, label %label_2
Note that the conditional branch instruction includes two labels.
Unlike in most assembly languages, there is no circumstance under
which a branch instruction falls through to the next instruction: it
always branches somewhere. (The branch target label could happen to
be immediately afterward.) Another point worth noting is that the
conditional branch needs to be based on a single-bit value, that is,
a value of type i1
. For this reason, given that we are
primarily using normal 32-bit values, the conditional branch needed
to be prefaced by an instruction that tests whether %t17
is not
equal to 0.
A related way in which LLVM differs from normal assembly languages is that a label cannot be immediately preceded by a normal instruction, such as a load, store, or arithmetic computation. Instead of following one of those instructions immediately by a label, you need to insert an unconditional branch instruction in between, branching to the label.
In the preceding lab you used a HashMap
; now you'll want to use some
class of your own creation that supports not only the put
and get
methods (or close analogs of them), but also
methods for entering and exiting scopes. I expect that we'll spend
some class time on how this can be represented, but you would also be
well advised to look back over your solution to last year's MCS-287 Lab 3.
Our textbook mentions (on page 293) that YACC will resolve a
shift/reduce conflict in favor of shifting and (on page 282) that this correctly
resolves the dangling-else
ambiguity that you are likely to encounter
when you incorporate the two forms of if
statements. The same is true for
CUP, but
only if you give an option telling it how many conflicts
to expect. That is, in the build.xml
file, you can
change the action
<cup srcfile="${src}/Parser.cup" destdir="${java}" parser="Parser" />to
<cup srcfile="${src}/Parser.cup" destdir="${java}" parser="Parser" expect="1" />
if your grammar has one shift/reduce conflict in it that can be
correctly resolved in favor of shifting. You should make the
corresponding change in the cup_dump
target of the
build.xml
file as well if you decide to go this route.
There are at least two other ways you could deal with the
dangling-else
ambiguity. The easiest one is just to declare a
precedence for the else
terminal at the top of your Parser.cup
file. Then you won't need to modify the build.xml
and
yet CUP will still resolve the shift/reduce conflict in favor of
shifting, much like it would for an expression such as
3+4*5
. The other approach, conceptually clean but messy
in practice, would be to write an unambiguous grammar in the first
place, as shown on page 212 of the textbook.
From the previous lab, you are emitting an LLVM instruction such as the following each time a variable is declared:
%t0 = alloca i32
This instruction allocates stack space to hold the variable. That creates an issue now that programs can have loops in them. Consider a program such as the following:
int main(){ int n = 100; while(n){ int x = n; println(x); n = n - 1; } }
This program will allocate stack space for the
variable x
every time through the loop, which is 100
times in this case. By replacing the 100 with a larger number, you
could make the amount of allocated memory even larger. (None of
the memory is deallocated until the procedure returns.) This
isn't good. To improve your compiler's handling of such situations,
you should move the alloca
instruction to the beginning
of the procedure, rather than positioning it where the declaration
statement occurs. There are a couple different strategies for how
you might accomplish that. I'd be happy to talk with you.
Write a lab report highlighting the differences from the code for your previous compiler. You should also describe the tests you ran and any problems uncovered.