MCS-287 Lab 4: Parsing and Analysis in Prolog (Spring 2009)

Due May 18, 2009

In this lab assignment, you will gain some experience using Prolog. Specifically, you will use the SWI-Prolog interpreter, which is available on the Linux systems in the MCS lab by executing the pl command in a terminal window. You can also download SWI-Prolog for your own computer.

You will be working with the same grammar as in Lab 3:

<blck> ::= begin <stmts> end

<stmts> ::= <empty>
          | <stmt> <stmts>

<stmt> ::= declare <name>
         | use <name>
         | <blck>

You might notice that I am now using <blck> for the nonterminal that was <block> in Lab 3. This is because you will be defining a Prolog predicate to correspond to each of the grammar's nonterminals, and SWI-Prolog already has a predefined predicate named block.

Also, to make your life easier, let's change the specification of <name> so that any Prolog atom is a <name>. You can test whether X is bound to an atom using atom(X). (However, because I was slow in conveying this information, I also announced that I would accept solutions that omit this check.)
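To see how atom(X) behaves, here is a small sketch. The predicate name_ok is a hypothetical wrapper introduced only for illustration; it is not part of the assignment, and you may prefer to call atom directly.

```prolog
% name_ok(+X): hypothetical helper, succeeds exactly when X is bound to an atom.
name_ok(X) :- atom(X).

% ?- name_ok(x).         succeeds: x is an atom
% ?- name_ok(foo_bar).   succeeds: foo_bar is an atom
% ?- name_ok(42).        fails: 42 is a number, not an atom
% ?- name_ok(f(a)).      fails: f(a) is a compound term
% ?- name_ok(X).         fails: X is an unbound variable
```

Note that the check fails rather than raising an error when its argument is unbound.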

This lab is subdivided into four stages so that you will have less to work on at any one time and a greater chance of success. When you have succeeded at a particular stage, be sure to save your program under a different filename before starting work on the next stage. That way, if you aren't successful at the new stage, you will still have your working program from the previous stage. If you get the final stage to work, just turn in your code from that stage. If not, turn in the code from the most advanced stage that you got to work and, if you have an interesting but non-working attempt at the next stage, you can turn that in as well. Be sure to label what you turn in so that I know how to interpret it.

Stage 1: Checking whether the input can be parsed

Define a two-argument predicate blck such that blck(List1, List2) is provable if List1 starts with atoms that match the <blck> nonterminal and then continues with the atoms in List2. For example, the following query should produce "Yes":

blck([begin,declare,x,begin,use,x,end,end,something,else],[something,else]).

In this stage, you are not to worry about issues such as undeclared names and illegal redeclarations. The string of atoms should just be checked for conformity with the grammar.

To help define the blck predicate, you should make use of two other analogous predicates, one for each of the other nonterminals in the grammar. The definitions should be mutually recursive in the same way that the grammar productions are.
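The two-list convention can be illustrated on an unrelated toy grammar, so as not to give away the lab. The nonterminals <greeting> and <who> and their predicates are hypothetical; only the style of passing "the input" and "what is left over" carries over to blck, stmts, and stmt. Unlike the lab grammar, this toy grammar is not recursive.

```prolog
% Toy grammar (hypothetical, not the lab grammar):
%   <greeting> ::= hello <who>
%   <who>      ::= world | there

% greeting(List1, List2): List1 starts with a <greeting> and continues with List2.
greeting([hello|Rest0], Rest) :-
    who(Rest0, Rest).       % whatever <who> leaves over is what <greeting> leaves over

% who(List1, List2): List1 starts with a <who> and continues with List2.
who([world|Rest], Rest).
who([there|Rest], Rest).

% ?- greeting([hello,world,something,else], R).
% R = [something, else]
%
% ?- greeting([hello,you], R).
% fails: you does not match <who>
```

Each predicate consumes the atoms for its own nonterminal from the front of the first list and hands the remainder on through the second argument.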

If you want to test each predicate as you define it, you will need to temporarily eliminate the mutual recursion. You could do this by first defining a simplified version of the stmt predicate that only takes into account the first two productions for the <stmt> nonterminal. Once you have tested that and verified that it works, you could define stmts and test it with sequences of simple statements. Then you can add blck and test it (with no nested blocks). Once all three predicates seem to be working, then you can modify the stmt predicate to take into account the third production, that is, the possibility that a statement is a nested block.

Using the blck predicate, it is easy to write a predicate that can test whether a list of atoms is legal, that is, whether it is in the language described by the grammar. Namely, a legal list of atoms consists of a <blck> followed by nothing else:

legal(L) :- blck(L,[]).

Stage 2: Parsing the input into a syntax tree

One of the goals of parsing a language is to tell legal input strings from illegal ones, as in stage 1. However, there is another goal as well, which is to uncover the hierarchical structure that is implicit in the input. For that reason, modify all your predicates to take one more argument, which is to be used as in the following example:

?- legal([begin,declare,x,begin,use,x,declare,y,end,use,y,end],P).

P = [declare(x), [use(x), declare(y)], use(y)] 

The syntax tree for a blck is the same as the syntax tree for an instance of the nonterminal stmts: a list containing the syntax trees for each individual stmt. The syntax tree for a stmt has three possible forms, corresponding to the three productions for this nonterminal. In the first two cases, the syntax tree will be something like declare(x) or use(x). In the case that the stmt is a blck, the syntax tree will be the same list of syntax trees as is used for the blck (and for the blck's underlying stmts).
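Because the syntax tree is an ordinary Prolog term, it can be taken apart by unification. The following sketch classifies the top-level entries of the example tree; the predicates example_tree and node_kind are hypothetical names used only for illustration and are not required by the lab.

```prolog
% The syntax tree from the example query, stored as a fact for convenience.
example_tree([declare(x), [use(x), declare(y)], use(y)]).

% node_kind(+Node, -Kind): which of the three <stmt> forms a tree node has.
node_kind(declare(_), declaration).
node_kind(use(_), use).
node_kind(Node, block) :- is_list(Node).   % a nested block is a list of trees

% ?- example_tree(T), maplist(node_kind, T, Kinds).
% Kinds = [declaration, block, use]
```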

Note that the string of atoms used in this example is treated as legal even though the use of the name y appears outside the scope of that name's declaration; just as in stage 1, we are ignoring this issue for the moment. That is about to change in stage 3.

Stage 3: Checking for undeclared names

Your goal for this stage is to make the preceding example produce "No" because the given list of atoms contains a use of an undeclared name. For lists without any uses of undeclared names, you should produce the same result as in the previous stage.

To do this, you should add two more arguments to each of the predicates that correspond to a nonterminal (that is, blck, stmts, and stmt). Each of these new arguments is a list of names. The first one is a list of the names that can legally be used in this portion of the input and the second one is a list of the names that can legally be used after it. For example, consider the following query:

?- stmt([use,y,moreinput],Tail,P,[x,y,z],Vars).

Tail = [moreinput],
P = use(y),
Vars = [x, y, z] 

This query is parsing a statement, which from the P variable's binding we can see is a use of the name y. This is legal because y is one of the three names that the query specified could legally be used. (The other two were x and z.) The fact that Vars is bound to the exact same list of three names reflects the fact that the same names are legal for use after this statement as were legal for use before it. Here are two further examples:

?- stmt([use,w,moreinput],Tail,P,[x,y,z],Vars).

No

?- stmt([declare,w,moreinput],Tail,P,[x,y,z],Vars).

Tail = [moreinput],
P = declare(w),
Vars = [w, x, y, z] 

The first of these two queries returns "No" because the list of atoms now does not start with a legal <stmt>. Although the first two atoms are legal so far as the grammar goes, the use of w is now recognized as illegal because this name is not in the list of names that may legally be used. In the second example, you can see how the list of legal names is built up; one more name is legal after the declaration of w.
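The two list operations these examples rely on can be tried in isolation. This is only a sketch of one possible approach: can_use and add_name are hypothetical helper names, and how (or whether) you fold such checks into stmt is up to you. The built-in memberchk/2 is used for the membership test.

```prolog
% can_use(+Name, +Usable): Name is in the list of names that may legally be used.
can_use(Name, Usable) :-
    memberchk(Name, Usable).

% add_name(+Name, +Usable0, -Usable): Usable is Usable0 with Name added at the front.
add_name(Name, Usable0, [Name|Usable0]).

% ?- can_use(y, [x,y,z]).          succeeds
% ?- can_use(w, [x,y,z]).          fails
% ?- add_name(w, [x,y,z], Vars).
% Vars = [w, x, y, z]
```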

After modifying the three predicates for the three nonterminals, you will also have to make a minor modification to the legal predicate:

legal(L,P) :- blck(L,[],P,[],_).

With this definition, you should now be able to execute queries such as these:

?- legal([begin,declare,x,begin,use,x,declare,y,end,use,y,end],P).

No

?- legal([begin,declare,x,begin,use,x,declare,y,end,use,x,end],P).

P = [declare(x), [use(x), declare(y)], use(x)] 

Stage 4: Checking for illegal redeclarations

Finally, you should also reject any input that contains a redeclaration of a name that has already been declared at the same nesting level, while allowing redeclaration of names defined in outer levels.

You can do this by adding two more arguments to each of the nonterminal predicates. As in the previous stage, each is a list of names, one for the situation before the current portion of the input and one for the situation afterward. However, this time the lists should contain those names that would be illegal to declare at the current nesting level because they are already declared at this level.
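The redeclaration test itself is the mirror image of the stage 3 membership test. A minimal sketch, assuming the names already declared at the current level are kept in a list; may_declare is a hypothetical helper name, and the way the lists are threaded through blck, stmts, and stmt is left to you.

```prolog
% may_declare(+Name, +DeclaredHere): Name has not yet been declared at this nesting level.
may_declare(Name, DeclaredHere) :-
    \+ memberchk(Name, DeclaredHere).

% ?- may_declare(x, [y]).    succeeds: x is new at this level
% ?- may_declare(y, [y]).    fails: y is already declared at this level
% ?- may_declare(x, []).     succeeds: nothing has been declared at this level yet
```

Under the rule stated above, a list such as [begin,declare,x,declare,x,end] should be rejected, while [begin,declare,x,begin,declare,x,end,end] should be accepted because the second declaration of x is at an inner nesting level.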