Objects, OOP

Object oriented programming (OOP) gets a lot of hype. This lecture explores what OOP is, why it generates so much excitement, where it works well, and where it does not.

What are Objects? What is Object Oriented Programming?

Practitioners associate the term Object Oriented Programming with a wide variety of concepts. Part of the reason that OOP gets so much hype is that some developers think that you need object-oriented programming to reap certain benefits. Here are some concepts associated with OOP:

Encapsulation: Cluster procedures with the data that they process, and ensure that only procedures in the cluster have access to the data.
Dispatch: Tie a group of procedures up into a bundle (potentially with the data that the procedures process as well), and pass around this bundle as a first class value. The bundle of procedures should adhere to some well-defined interface; then client code can invoke the procedures by treating them as black-box implementations of the interface, and developers are free to add new variant implementations, as long as the variants adhere to the specification. (Related term: Message Passing)
Delegation: This is an extension of dispatch: in addition to tying the procedures into a bundle, every procedure in the bundle adheres to a protocol where the bundle is itself passed as an argument. The particular formal parameter that corresponds to the bundle is often referred to as the self or this parameter (and is often associated with special syntax in an Object Oriented programming language, e.g. this in Java or C#). Then each procedure definition can choose whether it is appropriate to use methods from the bundle to accomplish subtasks.
Hierarchical Design, Take 1: Implementation Inheritance: Explicitly name the classes of data (within the language, not in comments). Relate the classes according to an explicitly indicated subclass relationship. Group common code patterns high in the class hierarchy tree, delegating to another method where appropriate. (Related term: Subclassing)
Hierarchical Design, Take 2: Type Inheritance: Explicitly name the classes of data (within the language, not in comments). Relate the classes according to a subtype relation. Clients write their code to the interface of classes high in the tree, so that it can be used with all of the classes that conform to that interface. (Related term: Subtyping)

I will try address all of the ideas above during this lecture.

Lets set the stage by reviewing how we have been doing development in this course so far; then I will discuss how that contrasts with development in an object-oriented language.

Applicative Programming (see also Functional Programming)

A fixed data representation, potentially with invariants and an interpretation.
A set of functions that manipulate the data. Developers are free to extend this set by defining new functions. The structure of these functions can often "follow the grammar" used to define the data.

Throughout this course, we have been using an applicative programming style: we first determine what the data definition is, and then we write functions that manipulate that data.

This makes sense, because we usually have been writing functions that process abstract-syntax trees for a particular language: interpreters, type-checkers, and various program transformers. Often programming language developers will be working with a fixed grammar for the language in question; the language is fixed, but the set of operations we want to perform on terms in the language is growing over time.

Note that this strategy makes it difficult to change the data definitions later; every change to the data definition may require review of all the functions that process instances of that data.

Data Abstraction

A data type consists of two parts:

interface
implementation

For an abstract data type, the interface consists of a set of operations that clients are allowed to use when manipulating values of the abstract data type. With abstract data types, clients do not need to know how the data is represented. That means the representation can be changed without breaking any client code. In other words, client code is representation-independent.

(This section was taken from Lecture 4; you should review it and Chapter 2 of EoPL for more details.)

A concrete example: a Queue abstraction

A queue represents a sequence of values that are delivered in "first-in first-out" (FIFO) order; the first element enqueued will be the first element removed, the second element enqueued will be the second element removed, and so on.

We will write a sequence of values mathematically using the notation [a, b, c, …, x, y, z]; thus we will denote an implementation's representation of such a sequence using the notation ⌈[a, b, c, …, x, y, z]⌉. (Note that the ellipses notation is somewhat informal; one important detail is that we use [a, …, k] to denote a potentially empty sequence, but [a, b, …, k] and [a, …, k, l] both denote non-empty sequences.)

empty : → Queue
snoc : Queue × Val → Queue
isEmpty : Queue → Boolean
head : Queue → Val
tail : Queue → Queue

(empty) = ⌈ [] ⌉

(snoc ⌈[a, …, w]⌉ x) = ⌈ [a, …, w, x] ⌉

(isEmpty ⌈[]⌉) = #t

(isEmpty ⌈[a, b, …, w]⌉) = #f

(head ⌈[a, b, …, w]⌉) = a

(tail ⌈[a, b, …, w]⌉) = ⌈[b, …, w]⌉

First applicative queue implementation

Here is a one queue implementation: queue1 silly. What are its features? What are its drawbacks?

Features:

Every sequence has a uniquely defined representation.
Implementation is very simple; "obviously correct."

Drawbacks:

Slow! Every snoc takes O(n) time!

Second applicative queue implementation

Here is another queue implementation: queue2 fast. What are its features? What are its drawbacks?

Features:

Implementation is still reasonably simple
(note however that some sequences are not uniquely represented).
Worst case O(1) time for snoc and head.
Amortized O(1) time for all operations, if the queue is used linearly (in Okasaki's terms, the O(1) bound holds only if the queue is used ephemerally, in a single-threaded fashion).

Drawbacks:

Worst-case O(n) time for tail.
If the queue is used persistently, where we can continue to refer to old queues even after dequeuing elements via tail, then this queue representation does not have amortized O(1) time for all operations.

Taking a step back

There do exist (even more complex) queue implementations that can achieve:

amortized O(1) time for all operations, even when the queue is used persistently, or
worst-case O(1) time for all operations.

But, should the client care how queues are implemented?

Can we write our client in a manner so that it does not care how queues are implemented? (See for example queue-tests.scm.)

Here is the output from that test suite on the first and second queue implementations.

% mzscheme
Welcome to MzScheme version 352, Copyright (c) 2004-2006 PLT Scheme Inc.
> (load "queue1.scm")
> (load "queue-tests.scm")
success
success
success
success
success
success
> (exit)
% mzscheme
Welcome to MzScheme version 352, Copyright (c) 2004-2006 PLT Scheme Inc.
> (load "queue2.scm")
> (load "queue-tests.scm")
success
success
success
success
success
success
> (exit)
%

Encapsulating a Queue implementation

I have structured the code above in a manner such that I can load either queue implementation and then run the test suite.

However, I took special care when writing my test suite code to make sure that it never attempted to look at the internal representation of queues.

Here is another test suite that was not written with such care. In particular, it assumes that queues have been implemented using the first representation. Compare a run using the first implementation:

% mzscheme
Welcome to MzScheme version 352, Copyright (c) 2004-2006 PLT Scheme Inc.
> (load "queue1.scm")
> (load "queue-tests.scm")
success
success
success
success
success
success
> (load "queue-tests-rep-exposed.scm")
success
success
success
success
success
success
>

but with a run using the second implementation:

% mzscheme
Welcome to MzScheme version 352, Copyright (c) 2004-2006 PLT Scheme Inc.
> (load "queue2.scm")
> (load "queue-tests.scm")
success
success
success
success
success
success
> (load "queue-tests-rep-exposed.scm")
FAILURE (empty) should be (); got (() ())
FAILURE (snoc (empty) (quote a)) should be (a); got ((a) ())
success
FAILURE (snoc (snoc (empty) (quote a)) (quote b)) should be (a b); got ((a) (b))
FAILURE (tail (snoc (snoc (empty) (quote a)) (quote b))) should be (b); got ((b) ())
FAILURE (tail (tail (tail (snoc (snoc (snoc (snoc (snoc (empty) (quote a)) (quote b)) (quote c)) (quote d)) (quote e))))) should be (d e); got ((d e) ())
>

Note that these failures are not particularly illuminating; it is not obvious why the actual and expected values do not match.

The problem is that the client (in this case, the test suite) has violated the Queue abstraction; it has relied on the particular representation used for queues, but a proper client should only interact with the data by using the appropriate procedures defined in the abstraction.

One can enforce an abstraction by properly modularizing the code. Most modern languages provide facilities for defining modules as collections of related procedures, and rules for what procedures have access to the internal representation of an abstraction.

Here is a revision of the first (slow but simple) queue implementation that illustrates one way to control access to a representation, and thus enforces modularity.

This is only meant to illustrate one way to achieve this effect; as stated above, most modern languages provide more convenient facilities for defining access rules.

% mzscheme
Welcome to MzScheme version 352, Copyright (c) 2004-2006 PLT Scheme Inc.
> (load "queue1-encapsulated.scm")
> (load "queue-tests.scm")
success
success
success
success
success
success
> (load "queue-tests-rep-exposed.scm")
FAILURE (empty) should be (); got #<procedure:an-abstract-queue>
FAILURE (snoc (empty) (quote a)) should be (a); got #<procedure:an-abstract-queue>
success
FAILURE (snoc (snoc (empty) (quote a)) (quote b)) should be (a b); got #<procedure:an-abstract-queue>
FAILURE (tail (snoc (snoc (empty) (quote a)) (quote b))) should be (b); got #<procedure:an-abstract-queue>
FAILURE (tail (tail (tail (snoc (snoc (snoc (snoc (snoc (empty) (quote a)) (quote b)) (quote c)) (quote d)) (quote e))))) should be (d e); got #<procedure:an-abstract-queue>
>

Now the failure messages are a bit clearer: the tests are failing because the client (the test writer) wrote down that the code expected values such as the list (a) but the actual values we receive are abstract queues.

Even if the client wanted to break the abstraction by trying to apply the returned procedure, they would be foiled, as illustrated below.

% mzscheme
Welcome to MzScheme version 352, Copyright (c) 2004-2006 PLT Scheme Inc.
> (load "queue1-encapsulated.scm")
> (define my-queue (snoc (snoc (empty) 'a) 'b))
> my-queue
#<procedure:an-abstract-queue>
> (my-queue 'attempting-to-get-inside!)
error: unauthorized-attempt-to-access-queue-internals!

 === context ===
repl-loop

> (my-queue (cons 'unique 'and-hidden))
error: unauthorized-attempt-to-access-queue-internals!

 === context ===
repl-loop

>

This encapsulation of the internal representation, exposing only operations that know how to properly manipulate the data and preserve invariants of the representation, is often considered a key benefit of object-oriented programming.

Furthermore, with a sufficiently expressive language, the most common attempts to violate encapsulation can be detected statically; the program can be rejected before you run it. (The technique illustrated above is detecting the encapsulation violation dynamically, so the system does not signal an error until we run the code for the test suite that violates the encapsulation.)

However, encapsulation is not a benefit that is provided by only object-oriented languages. Many non-object-oriented languages do support modular program definitions where representations are hidden; after all, I just demonstrated one way to accomplish the task in Scheme!

On top of that, you can write code in Java or C# where the internal representation is exposed as a set of public fields; it is up to the programmer to decide how to use the features of the language to achieve their desired system design.

Still, many practitioners will list encapsulation as a reason that they use object-oriented languages for their systems development.

Dispatch in a Queue implementation

In the above demonstrations, I had to load up the queue1 and queue2 implementations whole-sale; it would not be legal with the code above to mix-and-match queue implementations.

That is, the client code can be ignorant of which queue abstraction it is using, but it still needs to be linked against a single queue implementation, or else there are likely to be serious consequences.

We can see some of the bad effects of trying to link against several of the above queue implementations at once below:

% mzscheme
Welcome to MzScheme version 352, Copyright (c) 2004-2006 PLT Scheme Inc.
> (load "queue1-encapsulated.scm")
> (load "queue-tests.scm")
success
success
success
success
success
success
> (define an-important-queue-to-remember (snoc (snoc (snoc (empty) 'a) 'b) 'c))
> (head an-important-queue-to-remember)
a
> (head (tail an-important-queue-to-remember))
b
> (load "queue2.scm")
> (load "queue-tests.scm")
success
success
success
success
success
success
> (head an-important-queue-to-remember)
car: expects argument of type <pair>; given #<procedure:an-abstract-queue>

 === context ===
/Users/pnkfelix/Dev/projects/csg111/2008/lectures/queue2.scm:23:13: head
repl-loop

>

If you want to be able to use different implementations of the same interface at the same time, you need a more sophisticated way of mapping the operations you want to perform with the actual code that performs those operations.

By representing our data in a different way, we can interact with it by passing a message asking it to perform a particular operation.

Here are translations of the first and second queue implementations into a message-passing style.

Now we can load one version, use it, load another version, use that as well, and the values we constructing with the first version continue to work.

% mzscheme
Welcome to MzScheme version 352, Copyright (c) 2004-2006 PLT Scheme Inc.
> (load "queue1-dispatch.scm")
> (load "queue-tests.scm")
success
success
success
success
success
success
> (define an-important-queue-to-remember (snoc (snoc (snoc (empty) 'a) 'b) 'c))
> (head an-important-queue-to-remember)
a
> (head (tail an-important-queue-to-remember))
b
> (load "queue2-dispatch.scm")
> (load "queue-tests.scm")
success
success
success
success
success
success
> (head an-important-queue-to-remember)
a
> (head (tail an-important-queue-to-remember))
b
>

The code for queue1-dispatch and queue2-dispatch illustrates dynamic dispatch (or sometimes single dispatch, or sometimes just dispatch).

Dispatch can provide a strong separation between interface and implementation, because one typically defines the interface when one is developing the set of messages that will be passed around. The actual code that implements the desired behavior associated with the messages can be developed long after the interface has been conceived.

But there is a piece missing, and understanding what the missing piece is requires that we take another step back.

Queues in a larger context: classes and classifications

A queue abstraction can be useful for certain algorithms that require a FIFO order on element delivery and do not require any other sort of order of fast access to enqueued elements.

But some clients may not require a strict FIFO order; some clients will be happy to consume an element from anywhere in the queue.

And other clients may require fast access to both the most-recently inserted and least-recently inserted elements.

It would be useful to categorize our implementations according to what capabilities they have; that is, what operations they can support. If we were to classify our various interfaces into a hierarchy, then our clients could clearly state their requirements by choosing the interface appropriate to their needs.

For my examples in this lecture, I will use a simple hierarchy. Here is its interface (not implementation):

Collection supports methods
  isEmpty : Collection -> Bool
  addElem : Collection * Value -> Collection
  anyElem : Collection -> (list Value Collection) ;; only non-empty works!
  addAll  : Collection * Collection -> Collection
  toList  : Collection -> Listof[Value]

Queue extends Collection and adds methods
  snoc    : Queue * Value -> Queue
  head    : Queue -> Value                        ;; only non-empty works!
  tail    : Queue -> Queue                        ;; only non-empty works!

Tree extends Collection and adds methods
  isLeaf    : Tree -> Bool
  nodeValue : Tree -> Value                       ;; only non-leaf works!
  left      : Tree -> Tree                        ;; only non-leaf works!
  right     : Tree -> Tree                        ;; only non-leaf works!

The base Queue constructor is 
  empty : -> Queue

The base Tree constructors are
  leaf  : -> Tree
  node  : Tree * Tree * Value -> Tree

Implementation Inheritance

There may be a lot of potential code sharing amongst the different implementations. A Collection class might implement a method, addAll, where one collection c1 consumes another collection c2 of elements and produces the union of the two by iteratively invoking the c1.add method on each element it can get out of c2.

This is a common code pattern that we would like to put into one place, into the Collection class. Extensions of the Collection will be responsible for properly implementing its add method, but once that is in place, then all of the extensions immediately support the addAll method. (At least in principle; there are caveats here.)

Delegation in a Queue implementation

The crucial idea to support implementation inheritance: pass yourself around as another parameter! Then, when you need to invoke a method, pass a message to yourself!

(This may sound absurd; why would this accomplish anything?)

This is the heart of delegation; pass a special self parameter around; for any method that you want to allow your subclasses to handle in their implementation, you perform the invocation by passing a message to self.

(Felix finds it fascinating that this notion of self-reference, the key to delegation, is also the key for implementing recursive functions in languages like PROC. But that is just an aside.)

Scheme code illustrating Delegation

Here is the relevant code that illustrates delegation, using Scheme as the implementation language so that the self parameter is explicit in the code.

% mzscheme
Welcome to MzScheme version 352, Copyright (c) 2004-2006 PLT Scheme Inc.
> (load "queue1-delegate.scm")
> (define q1 (empty))
> (toList q1)
()
> (define q2 (addElem (addElem (addElem q1 'a) 'b) 'c))
> (toList q2)
(a b c)
> (toList (tail q2))
(b c)
> (load "tree-delegate.scm")
> (define t1 (leaf))
> (toList t1)
()
> (define t2 (node (node (leaf) (leaf) 'p)
                   (node (node (leaf) (leaf) 'q)
                         (node (leaf) (leaf) 'r)
                         's)
                   't))
> (toList t2)
(p t q s r)
> (isLeaf t2)
#f
> (isLeaf (right t2))
#f
> (isLeaf (left t2))
#f
> (nodeValue (left t2))
p
> (toList (right t2))
(q s r)
> (toList (addAll q2 t2))
(a b c p t q s r)

Java code illustrating Delegation

And here is the big punch-line: the above bits of Scheme code, but encoded as Java classes! (plus a Main program that tests them).

% javac Collection.java Queue.java Tree.java Main.java
% java Main
> q1.toString():
( )
> q2.toString():
( a b c )
> q2.tail().toString():
( b c )
> t1.toString():
( )
> t2.toString():
( p t q s r )
> t2.isLeaf():
false
> t2.right().isLeaf():
false
> t2.left().isLeaf():
false
> t2.left().nodeValue():
p
> t2.right().toString():
( q s r )
> q2.addAll(t2).toString():
( a b c p t q s r )

(Does the output of Main look familiar?)

Type Inheritance

Subclassing versus Subtyping.

When subclassing is not subtyping, part 1: a contrived example.

When subclassing is not subtyping, part 2: a real-world example.

Last updated 16 April 2008.