MCS-287 Notes for EOPL Section 6.1 (Spring 2000)

Indirect and direct arrays

The distinction between direct and indirect arrays only makes sense in a language with assignable variables, i.e., variables as cells, as in Section 5.5. In a language where variables directly name values (Sections 5.1-5.4), the only model of arrays that makes sense is the direct model: a variable can be the name for (denote) a number, a procedure, or an array. Given that a variable can't name a cell containing a number, it is implausible that it could name a cell containing a reference to an array. Once we have names for cells though (Section 5.5), we get the indirect/direct choice.

One possibility is that a variable always names a single cell, which sometimes contains a reference to an array object (sequence of additional cells), rather than a number. See the following figure.

Indirect model: x is a variable containing 5 and a is a variable containing a reference to an array object containing 3 and 1. Note that when a cell (i.e., a variable or array element) contains a number, I've shown an arrow referring to the number for consistency, while EOPL puts the number in the cell for conciseness. This is an irrelevant distinction, because it doesn't matter whether two cells refer to the same number 5 or each have their own 5: one 5 is the same as any other.

This seems simple and uniform, because variables always name the same thing: a cell. In this indirect model, we can change what array object is stored in a variable's cell, just as we can change what number is stored in a variable's cell. We can also change what is stored in the array object's constituent cells, unlike numbers, which we can't change. Thus, although uniformity brings one form of simplicity, we pay with complexity of a different kind: there are two kinds of array change.

Changing which array a variable refers to and changing the contents of the array are conceptually quite different, but sometimes they can seem confusingly similar. Suppose we observe that the variable a refers to a two-element array containing 3 and 1. A little later, we observe that a refers to a two-element array containing 2 and 7. What happened? There are two possibilities. One is that a still refers to the same array object as before, but the array's cells have had 2 and 7 stored into them. The other possibility is that a now refers to a different array object, which might have contained 2 and 7 the whole time.
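Since Scheme uses the indirect model, the two possibilities can be told apart there by keeping a second reference to the original array object and comparing with eq?. The following is only a rough sketch; the variable names (a, old-a, b, old-b) are made up for illustration.

```scheme
;; Possibility 1: same array object, new contents.
(define a (vector 3 1))
(define old-a a)             ; a second reference to the same array object
(vector-set! a 0 2)
(vector-set! a 1 7)
(display (eq? a old-a))      ; true: a still refers to the original object
(newline)
(display old-a)              ; the new contents are visible through old-a too
(newline)

;; Possibility 2: the variable is made to refer to a different array object.
(define b (vector 3 1))
(define old-b b)
(set! b (vector 2 7))
(display (eq? b old-b))      ; false: b now refers to a different object
(newline)
(display old-b)              ; the original object still contains 3 and 1
(newline)
```

In both cases the variable ends up referring to an array containing 2 and 7; only the second reference reveals which kind of change took place.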

Notice also that with the indirect model we can quite easily have anonymous arrays. (Consider the Scheme expression (vector-length (make-vector 5)), which evaluates to 5 without ever naming the vector.) Although anonymous arrays may fit naturally into this model, some people find them confusing, providing a secondary reason to consider an alternative model.

In order to avoid having two changeable layers (with the need to keep straight which is changed), we can instead use a direct model, where a variable is always the name for some collection of cells, but that collection can be either a single cell (a "scalar variable," capable of holding a single number) or a sequence of cells of some declared length (an "array variable," capable of holding a sequence of numbers). See the following figure.

Direct model: x is a scalar variable containing 5 and a is an array variable containing 3 and 1. Note that a is a name for the entire array, even though it happens to be positioned over the first cell of the array. In the notes for Section 6.2 we will need to name individual cells within arrays, and will introduce a separate notation for doing so.

Note that it is tempting in this model to consider scalar variables and one-element array variables to be the same thing, which wouldn't make sense in the indirect model. Some languages succumb to this temptation and others don't.

Also, notice that this direct model, using array variables rather than array objects, does not naturally accommodate anonymous arrays: an array comes into existence as part of a variable declaration. This explains why EOPL's language contains the letarray (and definearray) construct, rather than something analogous to Scheme's make-vector. For consistency, they use this approach even with the indirect model, where letarray a[2] in body can be thought of as essentially an abbreviation for let a = makearray(2) in body. With the direct model, letarray is more fundamental, since the variable a is the array, rather than just containing a reference to the array.

Notice that an assignment like x := y always means to copy what is in the variable called y into the variable called x. In the indirect model, the variables called y and x are always single cells, and so the assignment always means copying the content of a single cell into another single cell. The cell content that is copied may be a reference to an array object, in which case x comes to refer to the same array object as y, as shown below:

Indirect array assignment

With the direct model, x and y may be array variables rather than scalar variables - they may be sequences of several cells. The assignment x := y still means to copy what is in the variable y into the variable x, but this now means copying the contents of each of the group of cells named y into the corresponding cell in the group named x, as shown below:

Direct array assignment

The difference isn't what assignment means so much as what a variable is; in both cases we copied y's contents into x.
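The two kinds of assignment can be imitated in Scheme. The sketch below is only an approximation: Scheme itself uses the indirect model, so the direct model is simulated by copying element by element with a helper, copy-cells!, which is a hypothetical name introduced here and not something from EOPL.

```scheme
;; Indirect model: a variable holds a reference to an array object,
;; so x := y copies the reference.
(define y (vector 3 1))
(define x (vector 0 0))
(set! x y)                     ; x and y now refer to the same array object
(vector-set! x 0 9)
(display (vector-ref y 0))     ; 9: the change made through x shows up in y
(newline)

;; Direct model, simulated: each variable is its own group of cells,
;; so x2 := y2 copies cell by cell.
(define copy-cells!            ; hypothetical helper
  (lambda (dest source)
    (let loop ((i 0))
      (if (< i (vector-length source))
          (begin
            (vector-set! dest i (vector-ref source i))
            (loop (+ i 1)))
          'done))))

(define y2 (vector 3 1))
(define x2 (vector 0 0))
(copy-cells! x2 y2)            ; x2 receives copies of y2's contents
(vector-set! x2 0 9)
(display (vector-ref y2 0))    ; 3: y2 is unaffected by the change to x2
(newline)
```

After the indirect assignment there is one array object with two names; after the direct assignment there are still two separate groups of cells that merely started out with equal contents.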

Both array models show up in real programming languages. Scheme and Java both use the indirect model. Pascal uses the direct model, but has an explicit "pointer" data type that can be used to simulate the indirect model, at the expense of extra notational machinery. C and C++ also use the direct model and have explicit pointers that can be used to simulate the indirect model. However, they also provide automatic conversions between arrays and pointers, which allow much of the extra notational machinery to be avoided, so that much of the time it looks as though you were using an indirect array model. Also, C and C++ do not allow assignment of direct arrays, only of the pointers, though other kinds of aggregate variables can be assigned, which results in copying of each element.

Call-by-value

Call-by-value is a parameter passing mechanism wherein each of a procedure's parameters names a new variable. When the procedure is applied to arguments, the arguments are copied into the new parameter variables, just as if by assignments.

With this one simple statement, and the preceding material on the two array models and what assignment means, you should be able to figure out what call-by-value means for each of the two array models. In particular, Figure 6.16 (page 185) can be deduced by considering the following analogue of Figure 6.15:

letarray u[3]; v[2]
in begin
     u[0] := 5; u[1] := 6; u[2] := 4; v[0] := 3; v[1] := 8;
     letarray x[3]
     in begin
          x := u;
          x[1] := 7; x := v; x[1] := 9
        end
   end

Note that the assignment x := v is a bit odd (even illegal in some languages) because the two arrays involved have different sizes. (EOPL's version of the direct model calls for copying v's elements into the corresponding part of x, leaving the rest of x alone.) The essential aspects of this example would be unchanged, however, if we extended v to a three-element array so as to avoid this issue. And in any case, the above code, which involves no procedure calling at all, is completely equivalent to what happens when Figure 6.15 is evaluated using call-by-value. This is true with either the indirect model (variables containing references to array objects) or the direct model (array variables).
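To trace the letarray example under each model, it can be transcribed into Scheme. This is a sketch under stated assumptions: Scheme vectors stand in for the arrays, set! plays the role of indirect-model assignment, and the direct model is simulated with a hypothetical helper, copy-elements!, that follows the rule described above (copy the source's elements into the corresponding part of the destination and leave the rest alone).

```scheme
;; Hypothetical helper: copy source's elements into the corresponding
;; positions of dest, leaving any remaining elements of dest alone.
(define copy-elements!
  (lambda (dest source)
    (let loop ((i 0))
      (if (< i (vector-length source))
          (begin
            (vector-set! dest i (vector-ref source i))
            (loop (+ i 1)))
          'done))))

;; Indirect model: x holds a reference, and x := u copies the reference.
(define u (vector 5 6 4))
(define v (vector 3 8))
(define x (make-vector 3 0))
(set! x u)
(vector-set! x 1 7)        ; stores into the array object u refers to
(set! x v)
(vector-set! x 1 9)        ; stores into the array object v refers to
(display (list u v))       ; u holds 5 7 4 and v holds 3 9
(newline)

;; Direct model, simulated: x2 is its own three cells, and x2 := u2 copies.
(define u2 (vector 5 6 4))
(define v2 (vector 3 8))
(define x2 (make-vector 3 0))
(copy-elements! x2 u2)     ; x2 holds 5 6 4
(vector-set! x2 1 7)       ; x2 holds 5 7 4; u2 is untouched
(copy-elements! x2 v2)     ; x2 holds 3 8 4
(vector-set! x2 1 9)       ; x2 holds 3 9 4; v2 is untouched
(display (list u2 v2 x2))
(newline)
```

Under the indirect model both u and v end up changed; under the direct model only x changes.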

Call-by-value is used in many real programming languages, including Scheme and Java. Pascal and Ada allow the programmer to specify call-by-value, as well as other options. (The parameter passing method can be individually chosen for each parameter.) C uses call-by-value, and so does C++ by default, but the same notes apply here as in the previous subsection. Since direct assignment of arrays is not permitted in C and C++, arrays can't actually be passed as arguments. However, there are automatic conversions to pointers, so without doing anything extra, you can get an effect much like indirect-model call-by-value, even though what is actually happening is the passing (by value) of an explicit pointer to a direct-model array. Also, non-array aggregate variables can directly be passed by value, copying each element.
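Scheme's own behavior illustrates indirect-model call-by-value. In the rough sketch below, the procedure p is hypothetical; its body simply mirrors the assignments made to x in the letarray example above.

```scheme
(define u (vector 5 6 4))
(define v (vector 3 8))

(define p
  (lambda (x)               ; x is a new variable; it receives a copy of
                            ; the argument, which is a reference
    (vector-set! x 1 7)     ; changes the array object the caller passed
    (set! x v)              ; changes only the parameter x, not the caller's u
    (vector-set! x 1 9)))   ; changes the array object v refers to

(define u-before u)
(p u)
(display (eq? u u-before))  ; true: the caller's variable u was not reassigned
(newline)
(display (list u v))        ; u holds 5 7 4 and v holds 3 9
(newline)
```

The procedure can change the contents of the array object it was handed, but it cannot change which array object the caller's variable refers to.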

The Array ADT

Moving from the conceptual level to the details of implementing interpreters in Scheme, I would like to substitute a different definition of the Array ADT for the one given in EOPL. First, two notes on the code:

Some of it is linked from this web page rather than appearing directly here. Those using hardcopy should be aware of this. (However, the most crucial parts to understand are directly here.)
In a couple of places I've used the from-to-do procedure from Concrete Abstractions. I've got a definition of it linked here.
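For readers without access to the linked definition, here is a plausible sketch of from-to-do. It is an assumption about the procedure's behavior, inferred from how it is called in array-whole-set! later in these notes: it applies body to each integer from start through end, inclusive, in increasing order. The return value 'done is arbitrary.

```scheme
;; Assumed behavior: call (body i) for i = start, start+1, ..., end.
;; If start > end, body is never called.
(define from-to-do
  (lambda (start end body)
    (if (> start end)
        'done
        (begin
          (body start)
          (from-to-do (+ start 1) end body)))))

;; Example use: display the integers 0 through 2, one per line.
(from-to-do 0 2
            (lambda (i)
              (display i)
              (newline)))
```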

I prefer to divide the Array ADT into two layers: a core layer, providing the basic functionality in terms of some underlying representation, and a layer of "extra" procedures, written in terms of the core layer and providing only additional convenience.

Recall that an array is a collection of cells. I consider the Array ADT's core layer to consist of the following four procedures:

(make-array length): This makes a new array, containing length cells, each holding some unspecified value.
(array? object): This returns true if and only if the object is an array.
(array-length array): This returns the number of cells in the array.
(array-cell array i): This returns cell number i from the array, where the cells are numbered starting with 0.

I have provided two different implementations of this core layer, making different design choices:

One version treats the definition of an array as a collection of cells rather literally. It works with any definition of the Cell ADT, such as the one you have been using in Chapter 5. This array representation uses a Scheme vector to hold its collection of cells. Note in particular that the procedure vector-set! is never used in this representation: the vector is serving just as an immutable "glue" to hold the cells together, with the only potential for mutation being within each cell.
The other version takes advantage of the fact that individual vector elements can be set with vector-set!, and hence essentially act like cells. Thus, we can avoid having individual cells separate from the vector, by switching to a new representation of the Cell ADT, where a cell is a position within a vector.
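To give a feel for the first version, here is a minimal sketch of how it might look. It is not the linked implementation: the pair-based Cell ADT (make-cell, cell-ref, cell-set!), the *array* tag, and the *unspecified* placeholder are all assumptions made for this illustration. Note that vector-set! never appears; the vector is built once and only the cells inside it are mutated.

```scheme
;; Assumed stand-in for the Chapter 5 Cell ADT: a tagged pair.
(define make-cell (lambda (value) (cons '*cell* value)))
(define cell-ref (lambda (cell) (cdr cell)))
(define cell-set! (lambda (cell value) (set-cdr! cell value)))

;; Core Array ADT sketch: a tag paired with a vector of distinct cells.
(define make-array
  (lambda (len)
    (let loop ((i 0) (cells '()))
      (if (= i len)
          (cons '*array* (list->vector cells))
          ;; Each element gets its own fresh cell.
          (loop (+ i 1) (cons (make-cell '*unspecified*) cells))))))

(define array?
  (lambda (object)
    (and (pair? object) (eq? (car object) '*array*))))

(define array-length
  (lambda (array) (vector-length (cdr array))))

(define array-cell
  (lambda (array i) (vector-ref (cdr array) i)))

;; Example use: store 3 and 1 into a two-element array.
(define a (make-array 2))
(cell-set! (array-cell a 0) 3)
(cell-set! (array-cell a 1) 1)
(display (cell-ref (array-cell a 0)))
(newline)
```

One detail worth noticing is that the cells are created one at a time; filling the vector with a single shared cell would make every element change together.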

Remember that the above two choices are just low-level implementation details: conceptually, you should always think of an array as a single sequence of cells, as shown in the diagrams earlier in these notes. This is the view supported by the four interface procedures of the core layer, listed above.

Using the four core procedures, we can write the "extras":

(define array-ref
  (lambda (array index)
    (cell-ref (array-cell array index))))

(define array-set!
  (lambda (array index value)
    (cell-set! (array-cell array index)
               value)))

(define array-whole-set!
  (lambda (dest-array source-array)
    (let ((dest-len (array-length dest-array))
          (source-len (array-length source-array)))
      ;; A source longer than the destination is an error; a shorter one
      ;; overwrites only the first source-len cells of the destination.
      (if (> source-len dest-len)
          (error "Array too long for assignment:" source-array)
          (from-to-do
           0 (- source-len 1)
           (lambda (i)
             (array-set! dest-array i
                         (array-ref source-array i))))))))

(define array-copy
  (lambda (array)
    (let ((new-array (make-array (array-length array))))
      (array-whole-set! new-array array)
      new-array)))

These provide the same functionality as the like-named procedures in EOPL, but do so in a cleaner way, since they are defined in terms of the four abstract interface procedures of the core Array ADT, rather than in terms of a particular representation. Writing directly in terms of the representation is an optimization, one that is possible with my two representations as well. However, it seems out of character for EOPL to emphasize optimization of low-level details over clarity.

All the above code (core and extras) replaces Figure 6.1.1 on page 181. Figures 6.1.2 and 6.1.3 can remain unchanged. Figure 6.1.4 on page 184 should be altered by fixing one bug in denoted-value-assign!: it shouldn't be possible to store an array into a scalar variable:

(define denoted-value-assign!
  (lambda (den-val exp-val)
    (cond
     ((and (not (array? den-val))
           (not (array? exp-val)))
      (cell-set! den-val exp-val))
     ((and (array? den-val)
           (array? exp-val))
      (array-whole-set! den-val exp-val))
     (else
      (error "Incompatible assignment:" den-val exp-val)))))

One final change is at the bottom of page 184. This is a version of array-set! that prevents an array from being stored into one of the elements of an array:

(define array-set!
  (lambda (array index value)
    (if (array? value)
        (error "Cannot assign array to array element:" value)
        (cell-set! (array-cell array index)
                   value))))
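To see the pieces working together, the following self-contained sketch puts the extras and denoted-value-assign! on top of one possible core layer. The core and Cell ADT shown here are assumptions in the spirit of the second representation described earlier (a cell is a position within a vector); they are not the linked code, and the *array* tag, the *unspecified* placeholder, and the from-to-do definition are likewise made up for this illustration.

```scheme
;; Assumed Cell ADT: a cell is a position in a vector, (vector . index).
(define make-cell (lambda (value) (cons (vector value) 0)))
(define cell-ref (lambda (cell) (vector-ref (car cell) (cdr cell))))
(define cell-set!
  (lambda (cell value) (vector-set! (car cell) (cdr cell) value)))

;; Assumed core Array ADT: a tagged Scheme vector.
(define make-array
  (lambda (len) (cons '*array* (make-vector len '*unspecified*))))
(define array?
  (lambda (object) (and (pair? object) (eq? (car object) '*array*))))
(define array-length (lambda (array) (vector-length (cdr array))))
(define array-cell (lambda (array i) (cons (cdr array) i)))

;; Assumed definition of from-to-do (inclusive range).
(define from-to-do
  (lambda (start end body)
    (if (> start end)
        'done
        (begin (body start) (from-to-do (+ start 1) end body)))))

;; Extras and assignment, as given in these notes.
(define array-ref
  (lambda (array index) (cell-ref (array-cell array index))))

(define array-set!
  (lambda (array index value)
    (if (array? value)
        (error "Cannot assign array to array element:" value)
        (cell-set! (array-cell array index) value))))

(define array-whole-set!
  (lambda (dest-array source-array)
    (let ((dest-len (array-length dest-array))
          (source-len (array-length source-array)))
      (if (> source-len dest-len)
          (error "Array too long for assignment:" source-array)
          (from-to-do 0 (- source-len 1)
                      (lambda (i)
                        (array-set! dest-array i
                                    (array-ref source-array i))))))))

(define denoted-value-assign!
  (lambda (den-val exp-val)
    (cond
     ((and (not (array? den-val)) (not (array? exp-val)))
      (cell-set! den-val exp-val))
     ((and (array? den-val) (array? exp-val))
      (array-whole-set! den-val exp-val))
     (else (error "Incompatible assignment:" den-val exp-val)))))

;; Example use: a scalar variable x, and array variables a[3] and b[2].
(define x (make-cell 0))
(define a (make-array 3))
(define b (make-array 2))
(array-set! a 0 5) (array-set! a 1 6) (array-set! a 2 4)
(array-set! b 0 3) (array-set! b 1 8)
(denoted-value-assign! x 5)   ; scalar := number
(denoted-value-assign! a b)   ; array := array; a's last cell is left alone
(display (cell-ref x))
(newline)
(display (list (array-ref a 0) (array-ref a 1) (array-ref a 2)))
(newline)
```

With these definitions, x ends up holding 5 and a ends up holding 3, 8, and 4, while mixed assignments such as (denoted-value-assign! x a) would reach the "Incompatible assignment:" error.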

Instructor: Max Hailperin