MCS-287 Lab 4: Parsing and Analysis in Prolog (Spring 2012)

Due May 14, 2012

In this lab assignment, you will gain some experience using Prolog. Specifically, you will use the SWI-Prolog interpreter, which is available on the Mac OS X systems in the MCS lab by executing the swipl command in a terminal window. You can also download SWI-Prolog for your own computer.

You will be working with the same grammar as in Lab 3:

<blck> ::= begin <stmts> end

<stmts> ::= <empty>
          | <stmt> <stmts>

<stmt> ::= declare <name>
         | use <name>
         | <blck>

One minor change is the specification of <name>; now any Prolog atom is a <name>. You can test whether X is bound to an atom using atom(X).

This lab is subdivided into four stages so that you will have less to work on at any one time and have a greater chance of success. When you have succeeded at a particular stage, be sure to save your program out under a different filename before starting work on the next stage. That way, if you aren't successful at the new stage, you will still have your working program from the previous stage. If you get the final stage to work, just turn in your code from that stage. If not, turn in the code from the most advanced stage that you got to work and, if you have an interesting but non-working attempt at the next stage, you can turn that in as well. Be sure to label what you turn in so that I know how to interpret it.

Stage 1: Checking whether the input can be parsed

You will define a predicate for each of the three nonterminals (<stmt>, <stmts>, and <blck>) similar to what I demonstrated in class with <even>, <odd>, and <foo>.

In order that you can work on one predicate at a time and test each predicate before you move to the next one, we need to initially break the cycle of mutual recursion between them. Therefore, you will start with the following simplified grammar for <stmt>:

<stmt> ::= declare <name>
         | use <name>

Define a two-argument predicate stmt such that stmt(List1, List2) is provable if List1 starts with atoms that match this simplified version of the <stmt> nonterminal and then continues with the atoms in List2. For example, the following query should produce true:

stmt([use,x,something,else],[something,else]).

In this stage, you are not to worry about issues such as undeclared names and illegal redeclarations. The string of atoms should just be checked for conformity with the grammar.

Once you are satsified with your test results on this predicate, use it to define a stmts predicate such that stmts(List1, List2) is provable if List1 starts with atoms that match the <stmts> nonterminal and then continues with the atoms in List2. For example, the following query should produce true:

stmts([declare,x,use,x,more,stuff],[more,stuff]).

After you've tested that sequences of declare and use statements work, define a blck predicate such that blck(List1, List2) is provable if List1 starts with atoms that match the <blck> nonterminal and then continues with the atoms in List2. When testing this, remember that nested blcks are still not possible because you are still using the simplified version of stmt. For example, the following query should produce true:

blck([begin,declare,x,use,x,end,tail],[tail]).

Finally, you can modify the stmt predicate to take into account the third production, that is, the possibility that a statement is a nested block. That should allow queries such as the following one to produce true:

blck([begin,declare,x,begin,use,x,end,end,something,else],[something,else]).

Using the blck predicate, it is easy to write a predicate that can test whether a list of atoms is legal, that is, whether it is in the language described by the grammar. Namely, a legal list of atoms consists of a <blck> followed by nothing else:

legal(L) :- blck(L,[]).

Stage 2: Parsing the input into a syntax tree

One of the goals of parsing a language is to tell legal input strings from illegal ones, as in stage 1. However, there is another goal as well, which is to uncover the hierarchical structure that is implicit in the input. For that reason, modify all your predicates to take one more argument, which is to be used as in the following examples:

?- stmt([declare,x,stuff],[stuff],P).
P = declare(x) 

?- stmts([declare,x,use,x,stuff],[stuff],P).
P = [declare(x), use(x)] 

?- blck([begin,declare,x,use,x,end,stuff],[stuff],P).
P = [declare(x), use(x)] 

?- legal([begin,declare,x,begin,use,x,declare,y,end,use,y,end],P).
P = [declare(x), [use(x), declare(y)], use(y)]

Note that the string of atoms used in the final example is treated as legal even though the use of the name y appears outside the scope of that name's declaration; just as in stage 1, we are ignoring this issue for the moment. That is about to change in stage 3.

Stage 3: Checking for undeclared names

Your goal for this stage is to make the preceding example produce false because the given list of atoms contains a use of an undeclared name. For lists without any uses of undeclared names, you should produce the same result as in the previous stage.

To do this, you should add two more arguments to each of the predicates that correspond to a nonterminal (that is, blck, stmts, and stmt). Each of these new arguments is a list of names. The first one is a list of the names that can legally be used in this portion of the input and the second one is a list of the names that can legally be used after it. For example, consider the following query:

?- stmt([use,y,moreinput],Tail,P,[x,y,z],Vars).
Tail = [moreinput],
P = use(y),
Vars = [x, y, z]

This query is parsing a statement, which from the P variable's binding we can see is a use of the name y. This is legal because y is one of the three names that the query specified could legally have used. (The other two were x and z.) The fact that Vars is bound to the exact same list of three names reflects the fact that the same names are legal for use after this statement as were legal for use before it. Here are two further examples:

?- stmt([use,w,moreinput],Tail,P,[x,y,z],Vars).
false.

?- stmt([declare,w,moreinput],Tail,P,[x,y,z],Vars).
Tail = [moreinput],
P = declare(w),
Vars = [w, x, y, z]

The first of these two queries returns false because the list of atoms now does not start with a legal <stmt>. Although the first two atoms are legal so far as the grammar goes, the use of w is now recognized as illegal because this name is not in the list of names that may legally be used. In the second example, you can see how the list of legal names is built up; one more name is legal after the declaration of w.

Note that I am suggesting that the the stmts and blck predicates have the same two new arguments as I illustrated above with stmt. In the case of stmts, the first would be a list of the names already legally usable at the start of the sequence of statements and the second would be the list of names legally usable after the full sequence. In the case of blck, the second new argument will always be the same as the first, because after the block has ended, no new names are usable, even if some were declared within the block. The only reason to add two arguments rather than one is for the sake of consistency.

After modifying the three predicates for the three nonterminals, you will also have to make a minor modification to the legal predicate:

legal(L,P) :- blck(L,[],P,[],[]).

With this definition, you should now be able to execute queries such as these:

?- legal([begin,declare,x,begin,use,x,declare,y,end,use,y,end],P).
false.

?- legal([begin,declare,x,begin,use,x,declare,y,end,use,x,end],P).
P = [declare(x), [use(x), declare(y)], use(x)]

Stage 4: Checking for illegal redeclarations

Finally, you should also reject any input that contains a redeclaration of a name that has already been declared at the same nesting level, while allowing redeclaration of names defined in outer levels.

You can do this by adding two more arguments to each of the nonterminal predicates. As in the previous stage, each is a list of names, one for the situation before the current portion of the input and one for the situation afterward. However, this time the lists should contain those names that would be illegal to declare at the current nesting level because they are already declared at this level.