CS 3500 Fall 2013

Lecture 7: Data Abstractions - Abstraction Function, Representation Invariant
-------------------------------------------------------

Topics:
-------

Designing mutable data type: Queue

Abstraction function

Representation invariant

Understanding the clone method

----------------------------


Designing mutable data type: Queue:
-----------------------------------


Designing algebraic specification:
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

To better understand the algebraic specification of data types, we will go through an exercise to define the algebraic specification for an immutable (and later mutable) data type that represents a queue.

We start with the signature:

    Public static methods (of the Queue class):
        empty   :              -->  Queue
        add     : Queue x int  -->  Queue

        append  : Queue x int  -->  Queue
        first   : Queue        -->  int
        remove  : Queue        -->  Queue
        isEmpty : Queue        -->  boolean
        size    : Queue        -->  int

The first two are the 'basic creators'. The remaining ones manipulate the queue.


It helps to start with concrete examples and describe what each method will actually do.

In the informal discussion we can describe a queue of integers as q4 = (2 4 5 3) -- this represents a queue where 2 was added first, then 4, then 5, then 3.

So, we got to this example by the following sequence of operations:

q1 = empty.append(2)
q2 = q1.append(4)
q3 = q2.append(5)
q4 = q3.append(3)

Then we want to make sure that:

q4.first()  is 2
q4.remove() is same as empty.append(4).append(5).append(3)
q4.size() is 4
q4.isEmpty() is 'false'


We now need to provide the algebraic specification for these methods:
                           -----------------------

Queue.append(Queue.empty(), k) = Queue.add(Queue.empty(), k)
Queue.append(Queue.add(q, n), k) = Queue.add(Queue.add(q, n), k)


Queue.first(Queue.add(Queue.empty(), k)) = k 
Queue.first(Queue.add(q, n)) = Queue.first(q)               if q is not empty


Queue.remove(Queue.add(Queue.empty(), k)) = Queue.empty()
Queue.remove(Queue.add(q, k)) = Queue.add(Queue.remove(q), k)    if q is not empty


Queue.size(Queue.empty()) = 0
Queue.size(Queue.add(q, n)) = 1 + Queue.size(q)


Queue.isEmpty(Queue.empty()) = true
Queue.isEmpty(Queue.add(q, n)) = false


Notice that some of these methods ('first' and 'remove') are not defined for the empty queue.

How did we get this?

Let us look at Queue.first: 

If the queue contains only one element, then that element is returned. That gives us the first equation:

Queue.first(Queue.add(Queue.empty(), k)) = k 

If the queue is longer, then the element to return is the first of the queue to which we have last appended something, and that leads to the second equation:

Queue.first(Queue.add(q, n)) = Queue.first(q)               


It works similarly for Queue.remove:

If there is only one element left in the queue, and we remove one element, we get the empty queue:

Queue.remove(Queue.add(Queue.empty(), k)) = Queue.empty()

If we want to remove from a longer queue, then the last element of the queue is still the last element; the removal needs to happen in the queue before it:

Queue.remove(Queue.add(q, k)) = Queue.add(Queue.remove(q), k)

We can follow the design recipe to implement a queue as given here.
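
As an illustration, here is a minimal sketch of such an implementation, with one subclass for each basic creator. The class names IQueue, Empty, and Add are assumptions made for this example (they are not part of the specification), and throwing a RuntimeException for the undefined cases is just one possible choice.

```java
// Immutable queue of ints: a sketch that follows the algebraic specification.
// The names IQueue, Empty, and Add are hypothetical.
abstract class IQueue {
  // basic creators
  static IQueue empty() { return new Empty(); }
  static IQueue add(IQueue q, int k) { return new Add(q, k); }

  // the remaining operations, as static wrappers around instance methods
  static IQueue append(IQueue q, int k) { return add(q, k); }
  static int first(IQueue q) { return q.first(); }
  static IQueue remove(IQueue q) { return q.remove(); }
  static boolean isEmpty(IQueue q) { return q.isEmpty(); }
  static int size(IQueue q) { return q.size(); }

  abstract int first();
  abstract IQueue remove();
  abstract boolean isEmpty();
  abstract int size();
}

// Represents Queue.empty()
class Empty extends IQueue {
  // first and remove are not defined for the empty queue
  int first() { throw new RuntimeException("No first in an empty queue"); }
  IQueue remove() { throw new RuntimeException("Nothing to remove from an empty queue"); }
  boolean isEmpty() { return true; }
  int size() { return 0; }
}

// Represents Queue.add(rest, last)
class Add extends IQueue {
  private final IQueue rest; // the queue before the last element was added
  private final int last;    // the most recently added element

  Add(IQueue rest, int last) {
    this.rest = rest;
    this.last = last;
  }

  // first(add(empty, k)) = k;  first(add(q, n)) = first(q) for non-empty q
  int first() {
    return this.rest.isEmpty() ? this.last : this.rest.first();
  }

  // remove(add(empty, k)) = empty;  remove(add(q, k)) = add(remove(q), k)
  IQueue remove() {
    return this.rest.isEmpty()
        ? IQueue.empty()
        : IQueue.add(this.rest.remove(), this.last);
  }

  boolean isEmpty() { return false; }

  // size(add(q, n)) = 1 + size(q)
  int size() { return 1 + this.rest.size(); }
}
```

With this sketch, building q4 by appending 2, 4, 5, 3 to IQueue.empty() gives IQueue.first(q4) equal to 2 and IQueue.size(q4) equal to 4, and IQueue.remove(q4) is a queue of size 3 whose first element is 4.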


Mutable queue - Specification:
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

To make this into a mutable queue, we need to change the specification for the methods 'append' and 'remove' -- neither of them generates a new queue; each modifies the existing one. The method signatures will be:


        append  : Queue x int  -->  void
        remove  : Queue        -->  void

We will also change the methods to 'dynamic' methods - so there is no need for the 'static' method wrappers.

We only need a constructor for the empty queue, as any other queue will be created by adding new elements to an existing queue.

The 'dynamic' methods we need to implement are:

        append  : Queue x int  -->  void
        first   : Queue        -->  int
        remove  : Queue        -->  void
        isEmpty : Queue        -->  boolean
        size    : Queue        -->  int

and the specifications are now given as comments for the methods:

/**
 * append the given Integer at the end of this queue
 * Requires: any valid queue and a valid integer
 * Modifies: the current queue is one element bigger, 
 * the new element is added at the logical end of the queue
 */
void Queue.append(Integer k)

/**
 * produce the Integer at the front of this queue
 * Requires: any valid queue 
 * Effect: returns the Integer at the logical front of the queue
 */
Integer Queue.first()             

/**
 * remove the Integer at the front of this queue
 * Requires: any valid queue 
 * Modifies: the current queue is one element shorter, 
 * the Integer at the logical front of the queue is removed
 */
void Queue.remove()

/**
 * produce the size of this queue
 * Requires: any valid queue 
 * Effect: returns the size of the queue
 */
int Queue.size()


Mutable queue - Implementation:
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

We can implement the given specification in several different ways -- using an 'ArrayList' to record the elements of the queue, using a 'Vector', or an 'Array', or, even a data structure similar to the one used for the immutable version.

We will choose the 'ArrayList' and implement the queue in two different ways. The first implementation will have the first element of the queue at index 0, the second one will have the most recently added element at index 0.

Omitting all comments, here is the first version:

import java.util.ArrayList;

public class Queue {
  ArrayList<Integer> q;
  
  public Queue() {
    this.q = new ArrayList<Integer>();
  }
  
  public void append(Integer k){
    this.q.add(k);
  }
  
  public Integer first(){
    if (this.q.isEmpty()) {
      throw new RuntimeException("No first in an empty queue");
    }
    else {
      return this.q.get(0);
    }
  }
  
  public void remove(){
    if (this.q.isEmpty()) {
      throw new RuntimeException("Nothing to remove from an empty queue");
    }
    else {
      this.q.remove(0);
    }
  }
  
  public boolean isEmpty() {
    return this.q.isEmpty();
  }
  
  public int size() {
    return this.q.size();
  }
}

This is great - but the programmer who will work on further enhancements of our implementation has no idea which of the two options we have chosen, without reading the code very carefully. The code is not much different from the other implementation:

import java.util.ArrayList;

public class Queue {
  ArrayList<Integer> q;
  
  public Queue() {
    this.q = new ArrayList<Integer>();
  }
  
  public void append(Integer k){
    this.q.add(0, k);
  }
  
  public Integer first(){
    if (this.q.isEmpty()) {
      throw new RuntimeException("No first in an empty queue");
    }
    else {
      return this.q.get(this.q.size() - 1);
    }
  }
  
  public void remove(){
    if (this.q.isEmpty()) {
      throw new RuntimeException("Nothing to remove from an empty queue");
    }
    else {
      this.q.remove(this.q.size() - 1);
    }
  }
  
  public boolean isEmpty() {
    return this.q.isEmpty();
  }
  
  public int size() {
    return this.q.size();
  }
}

So, it is the job of the implementer to provide additional information to the programmer who may want to verify that the implementation is correct, or who may work on further enhancements to the implementation.


Abstraction function:
---------------------

The first task is to describe how the implementation represents the desired data type. This description is an integral part of detailed comments provided with the implementation.

For the first implementation it may be written as:

// A typical Queue of integers is {k0, k1, ..., kn} 
// with k0 as the first element added, and kn as the last element added
//
// The abstraction function is
// AF(queue) = 
//   { queue.q[0] = k0, ..., queue.q[queue.q.size() - 1] = kn | for queue = {k0, k1, ..., kn}}

(We use the notation queue.q[i] to describe the i-th element of the 'ArrayList' q defined in the class Queue.)

Another way to write this would be:

// For a queue created by adding elements k0, k1, k2, … in this order
// the abstraction function is 
//   AF(queue) = {k0, k1, k2, …}
//   where queue.q[i] = ki for 0 <= i < queue.q.size()

This makes it clear to the programmer reading the code which of the two possible representations has been chosen.

The example given here is a simple one, and defining the abstraction function seems to be too much work for little gain. In real life programming, the abstraction definition will typically be more complicated, less obvious, and the abstraction function will provide the needed explanation of the implementor's intent.
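
One way to make the abstraction function concrete is to write it as a method that maps the representation to the abstract sequence of elements, listed from the first element added to the last. The following is only a sketch: the class names FrontAtZeroQueue and FrontAtEndQueue and the method name toAbstract are hypothetical, chosen here to contrast the two representations; only 'append' is included.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

// First representation: q[0] is the first element added.
class FrontAtZeroQueue {
  ArrayList<Integer> q = new ArrayList<Integer>();

  void append(Integer k) {
    this.q.add(k);
  }

  // AF: q[i] = ki, so the abstract queue is the list exactly as stored
  List<Integer> toAbstract() {
    return new ArrayList<Integer>(this.q);
  }
}

// Second representation: q[0] is the most recently added element.
class FrontAtEndQueue {
  ArrayList<Integer> q = new ArrayList<Integer>();

  void append(Integer k) {
    this.q.add(0, k);
  }

  // AF: q[q.size() - 1 - i] = ki, so the abstract queue is the stored list reversed
  List<Integer> toAbstract() {
    List<Integer> result = new ArrayList<Integer>(this.q);
    Collections.reverse(result);
    return result;
  }
}
```

After appending 2, 4, 5, 3 in this order, both versions of toAbstract produce the same sequence 2, 4, 5, 3, although the second class stores the elements as 3, 5, 4, 2.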


Representation invariant:
-------------------------

The 'representation invariant' augments the abstraction function by providing a way to test the properties of the implementation. The 'representation invariant' is a predicate function that describes the properties of all legitimate objects in the implementation.

// The rep invariant is
//   queue.q != null &&
//   queue.q.size() = m if m items were added to the queue &&
//   queue.q[i] = ki where ki is the i-th item added to the queue


Test cases that would verify that the given information is indeed represented as specified can then be designed from the individual clauses of the 'representation invariant'. So in our case we could test that after adding three integers to our queue, the element of queue.q at index 0 is the first element added, the element at index 1 is the second element added, and the element at index 2 is the third element added.

The programmer should include the methods that verify the 'representation invariant' as an additional part of the code - to be used to test the implementation after every new modification has been made.

So, we would add the following method:

// Effects: return true if the rep invariant holds for this;
//          otherwise false
public boolean repOK() {
  Queue queue = new Queue();
  if (queue.q == null || !queue.q.isEmpty())
    return false;
  queue.append(5);
  queue.append(3);
  queue.append(8);
  return
    queue.q.size() == 3 &&
    queue.q.get(0).equals(5) &&
    queue.q.get(1).equals(3) &&
    queue.q.get(2).equals(8);
}


We still need to work on the representation invariants for the methods we have designed.  

The append method was used in the tests for the rep invariant of the basic data representation and no additional test is possible. We have also tested the 'isEmpty' method, but it should be tested again after we remove all items we have added to the queue. The 'size' method needs to be tested after some removals as well. And, of course, we need to test the 'first' and 'remove' methods as well.

Here are the additional tests for representation invariants:

// Effects: return true if the rep invariant holds for this;
//          otherwise false
public boolean repOK2() {
  Queue queue = new Queue();
  if (queue.q == null || !queue.q.isEmpty())
    return false;
  queue.append(5);
  queue.append(3);
  queue.append(8);

  // tests for first(), remove(), and size():
  if (!queue.first().equals(5))
    return false;
  
  queue.remove();
  if (queue.q.size() != 2 || !queue.first().equals(3))
    return false;
  
  queue.remove();
  if (queue.q.size() != 1 || !queue.first().equals(8))
    return false;
    return false;
  
  queue.remove();
  if (queue.q.size() != 0 || !queue.isEmpty())
    return false;
  return true;
}
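
The methods above build a fresh queue and exercise it. A complementary sketch is a check that inspects the fields of the current object and is called at the end of every method that modifies the queue. Only the clauses that can be observed on a single object are checked here. The class name CheckedQueue, the helper checkRep, and the clause that the list contains no null elements are assumptions added for this example; they are not part of the invariant stated above.

```java
import java.util.ArrayList;

// A sketch of a queue that checks its own representation after each mutation.
// CheckedQueue and checkRep are hypothetical names.
class CheckedQueue {
  ArrayList<Integer> q = new ArrayList<Integer>();

  // Effects: return true if the observable clauses of the rep invariant
  //          hold for this; otherwise false
  boolean repOK() {
    if (this.q == null) {
      return false;
    }
    // Assumed extra clause: a valid queue holds no null elements
    for (Integer k : this.q) {
      if (k == null) {
        return false;
      }
    }
    return true;
  }

  void append(Integer k) {
    if (k == null) {
      throw new IllegalArgumentException("null is not a valid element");
    }
    this.q.add(k);
    this.checkRep();
  }

  void remove() {
    if (this.q.isEmpty()) {
      throw new RuntimeException("Nothing to remove from an empty queue");
    }
    this.q.remove(0);
    this.checkRep();
  }

  // Fails loudly as soon as a mutation leaves the object in an invalid state
  private void checkRep() {
    if (!this.repOK()) {
      throw new IllegalStateException("rep invariant violated");
    }
  }
}
```

Because the field q is not private in this sketch, code outside the class can still break the invariant, for example by adding null directly to q; repOK then returns false.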


Understanding the clone method:
-------------------------------

For immutable collections of data and other immutable data structures there is little need for a 'clone' method. Any instance of the class will remain unchanged throughout the duration of the program's execution.

However, for mutable data, and especially for mutable collections of data, designing the clone method poses a challenge. First of all, when a class declares that it implements the 'Cloneable' interface, the 'clone' method it inherits from 'Object' generates a 'shallow' copy of the instance of this class. It creates a new instance of the class and initializes every field with the value of the corresponding field in the original object. So, while we get a new object, the fields in the two objects refer to the identical objects, if the field values are not of the primitive types.
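
The sharing can be seen in a small sketch. The class name ShallowQueue is hypothetical, and only the methods needed for the demonstration are included.

```java
import java.util.ArrayList;

// A sketch of the default (shallow) clone for a mutable queue.
class ShallowQueue implements Cloneable {
  ArrayList<Integer> q = new ArrayList<Integer>();

  void append(Integer k) {
    this.q.add(k);
  }

  int size() {
    return this.q.size();
  }

  @Override
  public ShallowQueue clone() {
    try {
      // Object.clone() copies the field values: the reference stored in q
      // is copied, the ArrayList it refers to is not
      return (ShallowQueue) super.clone();
    } catch (CloneNotSupportedException e) {
      // cannot happen, because this class implements Cloneable
      throw new AssertionError("clone not supported", e);
    }
  }
}
```

If we append 1 to a ShallowQueue named original, call original.clone(), and then append 2 to the copy, original.size() returns 2: the two queue objects are different, but they share one ArrayList.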

We should try to define our own 'clone' method whenever possible, making new copies of the values of all fields, and carrying out this task recursively for the fields within our field objects, all the way down to the primitive types. But this is not as simple as it seems. Mutable data may involve circularity, and resolving circularity is not a trivial task.

So, in practical situations we may extend the shallow copy one or two levels into our data structure, and stop there.
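
For the queue, extending the copy by one level could look like the following sketch. The class name CopiedQueue is hypothetical. Stopping after one level is safe here only because the elements are Integer objects, which are immutable.

```java
import java.util.ArrayList;

// A sketch of a clone method that copies one level below the queue object.
class CopiedQueue implements Cloneable {
  ArrayList<Integer> q = new ArrayList<Integer>();

  void append(Integer k) {
    this.q.add(k);
  }

  int size() {
    return this.q.size();
  }

  @Override
  public CopiedQueue clone() {
    try {
      CopiedQueue copy = (CopiedQueue) super.clone();
      // Replace the shared list with a new list holding the same elements;
      // the Integer elements themselves are still shared
      copy.q = new ArrayList<Integer>(this.q);
      return copy;
    } catch (CloneNotSupportedException e) {
      // cannot happen, because this class implements Cloneable
      throw new AssertionError("clone not supported", e);
    }
  }
}
```

With this version, appending to the copy leaves the size of the original unchanged.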

One danger of the 'shallow copy' is that the client who uses it may gain access to information the implementer did not want to reveal, even to the extent that the client may modify the representation.