MCS-287 Lab Project 4: Transparent and Opaque Types (Spring 2000)

Due: May 16, 2000

This project focuses on adding type names to section 13.1's type checker. There are two different kinds of type names. Transparent type names, also known as type abbreviations, are simply a convenient shorthand for a type that could be spelled out in full. Opaque type names, on the other hand, serve as a protective barrier between the procedures that define an ADT and the code that makes use of that ADT. Outside code is prevented from making any use of the underlying representation of the ADT. This should become more clear in the course of this project handout and doing the project. I strongly suggest you read through this whole handout before you start.

You may do this project with a partner of your choice, or alone if you prefer. If you work with a partner, the two of you should turn in a single jointly authored project report. Your project report should reflect your final product, rather than focusing on each incremental step along the way. Don't go into the details in English: your audience can read Scheme. You may assume your audience is familiar with the EOPL book, the extra chapter 13, this project handout, and the files I provide.

Trying out the type checker

I am providing you with a working version of the basic type checker from section 13.1. This code is linked from the web version of this project handout. Recall that the basic type checker is the procedure type-of-exp. However, it is helpful to have a user interface procedure. The user interface I am providing is called type. It is analogous to run in the prior project. That is, you would apply it to a string that is a Ted expression. For example,

(type "+(2, 3)")

This will return the type of the expression (here int), and will also print out a trace of the recursive steps in the type checking, unless you set trace-type-of-exp to #f. The definition of type is quite simple:

(define type
  (lambda (s)
    (unparse-type (type-of-exp (character-string-parser s) init-tenv))))

Be sure to try out the type checker on a variety of expressions. A more complicated one, from Figure 13.1.1, would be

let decrement = assert (-> (int) int) : proc (n) +(n, -1);
    compose = assert (-> ((-> (int) bool), (-> (int) int))
                         (-> (int) bool)) :
                proc (f, g)
                  assert (-> (int) bool) :
                    proc (n) f(g(n))
  in let isone = compose(zero, decrement)
      in isone(2)

Note also that the character-string-parser I gave you for this project is extended from the one in the prior project, so as to support the extra syntax used in chapter 13 and this project. You can try parsing all sorts of forms with character-string-parser, even forms that you can't yet process with type, to see what kind of ASTs result. The ASTs are based on the following define-records (which are in chap13-parser-etc.ss):

(define-record assert (type exp))
(define-record tcons (name types))
(define-record definerec (decls))
(define-record definetypeabbreviation (name type))
(define-record namedtype (name))
(define-record definetype (name rep-type decls))
(define-record tuple (exps))
(define-record select (index exp))
(define-record definesumtype (name variants))
(define-record variant (name fields))
(define-record field (name type))
(define-record case (exp clauses))
(define-record clause (name formals exp))
(define-record elseclause (exp))

Only the first six of these are relevant to the assigned part of the lab project; the last eight are for section 13.2, which is purely optional. Of the essential first six, assert and tcons are mentioned in chapter 13. To see examples of definetypeabbreviation and namedtype, you can use character-string-parser on the definitions of intproc and compose that are given near the bottom of chapter 13's page 13. I'll be giving examples later in this project handout of the concrete syntax for definerec and definetype. When I do, you can feed them to character-string-parser to see the corresponding AST records.

Some introductory additions to the checker

Add define to type-of-exp. Here is an example of how this would work. If you first apply type to the string

define five = +(2, 3)

it should return int as its value. Now, if you evaluate (type "five"), you should get back int. In other words, type checking the definition has the side effect of adding a new binding to init-tenv, which influences later type checking.

Now add definerec. Here is an example of the concrete syntax I've chosen for this form:

definerec
  even = assert (-> (int) bool) : 
           proc(n) if zero(n) then true else odd(-(n, 1));
  odd = assert (-> (int) bool) :
          proc(n) if zero(n) then false else even(-(n, 1))
end

This parses into a record of type definerec containing a list of decls. These should be processed much as for letrec. One new binding for each decl should be added to init-tenv showing what type the variable name is being declared as. As with letrec, the individual decls' assert expressions should also be processed with type-of-exp, even though we already have explicit information on each of their types. This is to make sure there is no type error in any of them. Because of the additions to init-tenv, after you run type on the above string, you could evaluate (type "even(3)") and get back bool. Recall that type checking the definerec expression is done primarily for its side effect on init-tenv. However, we still need to get back some type for the definerec expression itself. My suggestion would be to use make-tcons to make a type construction with name product and with the list of types being the types of the various decls' expressions. That is, when you run type on the above example definerec, you'll get back

(product (-> (int) bool) (-> (int) bool))

Transparent type abbreviations

Now add definetypeabbreviation as described on chapter 13's page 13. Running type on the definition of intproc should return (-> (int) int) as its value, and have some side effect. That side effect should be such that if you now run type on

(assert intproc : proc(x) x)(5)

you'll get back int. Also, if you run type on the definition of compose, you'll get back

(-> (intproc, intproc) intproc)

How you do this is up to you. Presumably you'll need some sort of data structure for holding abbreviations. You'll also need to make sure that procedures such as proc-type? work when given a namedtype. (A namedtype should be replaced by the underlying type it is a name for before making any determination about it, such as whether it is a procedure type, whether it matches some other type, etc.)

Where are transparent types suboptimal?

To understand the motivation for opaque types, we need to see where transparent type abbreviations fall short. When used as a convenient shorthand, like the above intproc, there is nothing wrong with transparent type abbreviations: they are the right tool for that job.

But now consider another possible reason we might name a type: to distinguish a new ADT we are programming from its underlying representation. For an example, let's consider a finite function (or ff) ADT. Because of our Ted language's limitations, we'll work with finite functions from integers to integers. Our goal is to be able to use the ADT in ways such as the following (assuming we have a Ted read-eval-print loop):

--> define ffOne = extendFf(10, 100, emptyFf);

--> applyFf(ffOne, 10);
100

--> isBoundInFf(ffOne, 10);
true

--> isBoundInFf(emptyFf, 10);
false

As you can see, the ADT provides four names: emptyFf, extendFf, applyFf, and isBoundInFf. These four constitute the interface of the ADT, and are all that external code should know about. Of course, there must be some underlying representation: some concrete kind of value that emptyFf is and that extendFf produces, and that the other two procedures expect. We could define the type abbreviation name ff for that representation type, whatever it is. Then if we ask for the type of ffOne, we'll find it is ff. Or if we ask for the type of extendFf, we'll find it is

(-> (int int ff) ff)

Here is one possible definition of the type abbreviation and the interface procedures:

definetypeabbreviation ff (-> (int, bool) int)

definerec emptyFf = assert ff : proc(arg, applying) 0;
          extendFf = assert (-> (int, int, ff) ff) :
                       proc(key, val, baseFf)
                         assert ff :
                           proc(arg, applying)
                             if eq(arg, key) then
                               if applying then
                                 val
                               else
                                 1
                             else
                               baseFf(arg, applying);
          applyFf = assert (-> (ff, int) int) :
                      proc(anFf, arg) anFf(arg, true);
          isBoundInFf = assert (-> (ff, int) bool) :
                          proc(anFf, arg) eq(anFf(arg, false), 1)
   end

You should be able to try this out with your type checker from the prior section: run type on each of the definitions, and then on expressions such as those given above.

All is well, right? Well, not quite. There is nothing to stop a user from treating ffOne as a procedure, rather than as an ff. Instead of using one of the official interface operations of the ff ADT, the user could do something like ffOne(10, false) If you check this with your type, you'll find it is not a type error. Instead, it is has the type int. What has happened is that the user has reached down into the underlying representation of the ADT. That is bad for the long-term survivability of his code, since ADT representations tend to change. In the next section, we'll see how that can be avoided with an opaque type.

Opaque types

We will define opaque types using a new language feature, definetype, which you need to add to type-of-exp. Here is how it would be used to define the ADT from the prior section:

definetype ff = (-> (int, bool) int)
  within emptyFf = assert ff : proc(arg, applying) 0;
         extendFf = assert (-> (int, int, ff) ff) :
                      proc(key, val, baseFf)
                        assert ff :
                          proc(arg, applying)
                            if eq(arg, key) then
                              if applying then
                                val
                              else
                                1
                            else
                              baseFf(arg, applying);
         applyFf = assert (-> (ff, int) int) :
                     proc(anFf, arg) anFf(arg, true);
         isBoundInFf = assert (-> (ff, int) bool) :
                         proc(anFf, arg) eq(anFf(arg, false), 1)
  end

Type checking this form involves three steps:

The name ff is made an abbreviation for (-> (int, bool) int), just as with definetypeabbreviation.
Now the four procedure definitions are processed, just as with definerec.
The name ff is then changed to be a type abbreviation for a different type, namely a tcons with the name ff and an empty list of types. (This is just like int or bool, but with the name ff.)

If you try this out, you should be able to make the previous examples of legitimate ADT usage still work. For example, evaluating

(type
"applyFf(ffOne, 10)")

should produce int. However, if you try type checking ffOne(10, false) you should now get an error message, telling you that ffOne is of type ff, which isn't a procedure type. That is opacity: we are hiding the fact that ff really is a procedure type. Only the four ADT procedures are allowed to see that.

Optional extensions

The above implementation of transparent and opaque types is all you need to do for this project. But if you want to play around more, here are some suggestions. Probably the simplest of these is the one I list third, but you can decide for yourself which you find most interesting.

You could build an alternative user interface that provides a read-typecheck-print loop, which prints out the type of each expression entered.
You could do Exercise 13.1.4, which suggests you write an interpreter (version of eval-exp) for this chapter's version of Ted and integrate it along with type checking into a single read-eval-print loop that displays both values and types. This is kind of nice in that you can see that the values are indeed of the type indicated. (For example, with applyFf(ffOne, 10) you'll get back the value 100 and the type int and be able to see that yes, 100 is an integer.)
You could add the tuple expressions, product types, and select expressions from section 13.2. See the top of page 17 for an example. (I unparse the product type differently, as (product int bool).)
You could add sum types as in 13.2. I've added to the parser the ability to parse case expressions such as I showed in class. An example would be
```
case lst of
 emptylist() -> 0;
 intcons(a, d) -> +(1, length(d))
end
```
or, using an else clause,
```
case lst of
 intcons(a, d) -> +(1, length(d));
 else 0
end
```

This handout is partially derived from a draft of chapter 7 of the forthcoming second edition of EOPL, which will replace the current chapter 13.

Instructor: Max Hailperin