### 3` `Benefits and Limits of Objects

In the language course, we have been programming by defining a data type and its variants and then defining all the "services" over these structures using procedures that work by case analysis. This style of programming is sometimes called "procedural paradigm" or "functional design" (note that "functional" here does not refer to "side-effect free"!).

In PLAI, we have done this with define-type to introduce the data type and its variants, and using type-case in case-analyzing procedures. This procedural approach is common in other languages like C (unions), Pascal (variants), ML and Haskell’s algebraic data types, or plain Scheme’s tagged data.

So, what does object-oriented programming really bring us? What are its weaknesses? As it turns out, using an object-oriented language does not mean that programs are "object oriented". Many Java programs are not, or at least sacrifice some of the fundamental benefits of objects.

This chapter is based on the article On Understanding Data Abstraction, Revisited by William R. Cook (2009).

#### 3.1` `Abstract Data Types

Let us first look at abstract data types (ADTs). An ADT is a data type that hides its representation and only supplies operations to manipulate its values.

adt Set is empty : Set insert : Set x Int -> Set isEmpty? : Set -> Bool contains? : Set x Int -> Bool

(define empty '()) (define (insert set val) (if (not (contains? set val)) (cons val set) set)) (define (isEmpty? set) (null? set)) (define (contains? set val) (if (null? set) #f (if (eq? (car set) val) #t (contains? (cdr set) val))))

> (define x empty) > (define y (insert x 3)) > (define z (insert y 5)) > (contains? z 2) #f

> (contains? z 5) #t

We could as well implement the set ADT with another representation, such as using PLAI’s define-type mechanism to create a variant type to encode a set as a linked list.

(define-type Set [mtSet] [aSet (val number?) (next Set?)]) (define empty (mtSet)) (define (insert set val) (if (not (contains? set val)) (aSet val set) set)) (define (isEmpty? set) (equal? set empty)) (define (contains? set val) (type-case Set set [mtSet () #f] [aSet (v next) (if (eq? v val) #t (contains? next val))]))

> (define x empty) > (define y (insert x 3)) > (define z (insert y 5)) > (contains? z 2) #f

> (contains? z 5) #t

#### 3.2` `Procedural Representations

We can as well consider sets as being defined by their characteristic function: a function that, given a number, tells us whether or not this number is part of the set. In that case, a set is simply a function Int -> Bool. (In PLAI, we saw that in Chapter 11, when studying the procedural representation of environments.)

(define empty (λ (n) #f)) (define (insert set val) (λ (n) (or (eq? n val) (contains? set n)))) (define (contains? set val) (set val))

> (define x empty) > (define y (insert x 3)) > (define z (insert y 5)) > (contains? z 2) #f

> (contains? z 5) #t

(define even (λ (n) (even? n)))

(define random (λ (n) (> (random) 0.5)))

> (define a (insert even 3)) > (define b (insert a 5)) > (contains? b 12) #t

> (contains? b 5) #t

#### 3.3` `Objects

In essence, sets as functions are objects! Note that objects do not abstract type: the type of a set-as-function is very concrete: it is a function Int -> Bool. Of course, as we have seen in the first chapters, an object is a generalization of a function in that it can have multiple methods.

##### 3.3.1` `Object Interfaces

interface Set is contains? : Int -> Bool isEmpty? : Bool

(define empty (OBJECT () ([method contains? (n) #f] [method isEmpty? () #t]))) (define (insert s val) (OBJECT () ([method contains? (n) (or (eq? val n) (-> s contains? n))] [method isEmpty? () #f])))

> (define x empty) > (define y (insert x 3)) > (define z (insert y 5)) > (-> z contains? 2) #f

> (-> z contains? 5) #t

Note that object interfaces are essentially higher-order types: methods are functions, so passing objects around means passing groups of functions around. This is a generalization of higher-order functional programming. Object-oriented programs are inherently higher-order.

##### 3.3.2` `Principles of Object-Oriented Programming

Principle: An object can only access other objects through their public interfaces

Once we create an object, like the one bound to z above, the only thing we can do with it is to interact by sending messages. We cannot "open it". No attribute of the object is visible, only its interface. In other words:

Principle: An object can only have detailed knowledge about itself

This is fundamentally different from the way we program with ADT values: in a type-case analysis (recall the implementation of contains? in the ADT implementation with define-type), one is opening the value and gaining direct access to its attributes. ADTs provide encapsulation, but for the clients of the ADT; not for its implementation. Objects go further in this regard. Even the implementation of the methods of an object cannot access attributes of objects other than itself.

From this we can derive another fundamental principle:

Principle: An object is the set of observations that can be made upon it, as defined by its object interface.

This is a strong principle, that says that if two objects behave the same for a certain experiment (ie., a number of observations), then they should be undistinguishable otherwise. This means that the use of identity-related operations (like pointer equality) are violating this principle of OOP. With == in Java, we can distinguish two objects that are different even though they behave in the same way.

##### 3.3.3` `Extensibility

The above principles can be considered the characteristic feature of OOP. As Cook puts it: "Any programming model that allows inspection of the representation of more than one abstraction at a time is NOT object oriented"

The Component Object Model (COM) is one of the purest OO programming model in practice. COM enforces all these principles: there is no built-in equality, there is no way to determine if an object is an instance of a given class. COM programs are therefore highly extensible.

Note that the extensibility of objects is in fact completely independent from inheritance! (We don’t even have classes in our language) It instead comes from the use of interfaces.

##### 3.3.4` `What about Java?

Java is not a pure object-oriented language, not importantly because it has primitive types, but because it supports many operations that violate the principles we have described above. Java has primitive equality ==, instanceof, casts to class types, that make it possible to distinguish two objects even though they behave the same. Java makes it possible to declare a method that accepts objects based on their classes, not their interfaces (in Java, a class name is also a type). And of course, Java allows objects to access the internals of other objects (public fields, of course, but even private fields are accessible by objects of the same class!).

This means that Java also supports ADT-style programming. There is no nothing wrong with that! But it is important to understand the design tradeoffs involved, to make an informed choice. For instance, in the JDK, certain classes respect OO principles on the surface (allowing extensibility), but in their implementation using ADT techniques (not extensible, but more efficient). If you’re interested, look at the List interface, and the LinkedList implementation.

Programming in "pure OO" in Java basically means not using class names as types (ie. use class names only after new), and never use primitive equality (==).

#### 3.4` `The Extensibility Problem

Object-oriented programming is often presented as the panacea in terms of extensible software. But what exactly is meant with "extensible"?

The extensibility problem is concerned with defining a data type (structure + operations) in a way that two kinds of extension are properly supported: adding new representational variants, or adding new operations.

Here, we use the term ADT in line with Cook’s usage. It is important to clarify however that the discussion of the extensibility problem here actually contrasts objects with variant types (aka. algebraic data types). We are concerned with how to have an extensible implementation. The interface abstraction is not relevant here.

##### 3.4.1` `ADT

We first consider the ADT approach. We define a data type for expressions with three variants:

(define-type Expr [num (n number?)] [bool (b boolean?)] [add (l Expr?) (r Expr?)])

Now we can define the interpreter as a function that type-cases on the abstract syntax tree:

(define (interp expr) (type-case Expr expr [num (n) n] [bool (b) b] [add (l r) (+ (interp l) (interp r))]))

This is good-old PLAI practice. With a little example:

> (define prog (add (num 1) (add (num 2) (num 3)))) > (interp prog) 6

Extension: New Operation

Let us now consider that we want to add a new operation on expressions. In addition to interpret an expression, we want to typecheck an expression, that is, to determine the type of value it will produce (here, either number or boolean). This is fairly trivial in our case, but still makes it possible to detect without interpretation that a program is bound to fail because it adds things that are not both numbers:

(define (typeof expr) (type-case Expr expr [num (n) 'number] [bool (b) 'boolean] [add (l r) (if (and (equal? 'number (typeof l)) (equal? 'number (typeof r))) 'number (error "Type error: not a number"))]))

> (typeof prog) 'number

> (typeof (add (num 1) (bool #f))) Type error: not a number

If we reflect on this extension case, we see that it all went smoothly. We wanted a new operation, and just had to define a new function. This extension is modular, because it is defined in a single place.

Extension: New Data

(define-type Expr [num (n number?)] [bool (b boolean?)] [add (l Expr?) (r Expr?)] [ifc (c Expr?) (t Expr?) (f Expr?)])

(define (interp expr) (type-case Expr expr [num (n) n] [bool (b) b] [add (l r) (+ (interp l) (interp r))] [ifc (c t f) (if (interp c) (interp t) (interp f))])) (define (typeof expr) (type-case Expr expr [num (n) 'number] [bool (b) 'boolean] [add (l r) (if (and (equal? 'number (typeof l)) (equal? 'number (typeof r))) 'number (error "Type error: not a number"))] [ifc (c t f) (if (equal? 'boolean (typeof c)) (let ((type-t (typeof t)) (type-f (typeof f))) (if (equal? type-t type-f) type-t (error "Type error: both branches should have same type"))) (error "Type error: not a boolean"))]))

> (define prog (ifc (bool false) (add (num 1) (add (num 2) (num 3))) (num 5))) > (interp prog) 5

This extensibility scenario was much less favorable. We had to modify the datatype definition and all the functions.

To summarize, with ADTs, adding new operations (eg. typeof) is easy and modular, but adding new data variants (eg. ifc) is cumbersome and non-modular.

##### 3.4.2` `OOP

How do objects perform in these extensibility scenarios?

First, we start with the object-oriented version of our interpreter:

(define (bool b) (OBJECT () ([method interp () b]))) (define (num n) (OBJECT () ([method interp () n]))) (define (add l r) (OBJECT () ([method interp () (+ (-> l interp) (-> r interp))])))

Note that, in line with OO design principles, each expression objects knows how to interpret itself. There is no more a centralized interpreter that deals with all expressions. Interpreting a program is done by sending the interp message to the program:

> (define prog (add (num 1) (add (num 2) (num 3)))) > (-> prog interp) 6

Extension: New Data

Adding a new kind of data like a conditional ifc object can be done by simply defining a new object factory, with the definition of how these new objects handle the interp message:

(define (ifc c t f) (OBJECT () ([method interp () (if (-> c interp) (-> t interp) (-> f interp))])))

> (-> (ifc (bool #f) (num 1) (add (num 1) (num 3))) interp) 4

This case shows that, conversely to ADTs, adding new data variants with OOP is direct and modular: just create a new (kind of) object(s). This is a clear advantage of objects over ADTs.

Extension: New Operation

But before concluding that OOP is the panacea for extensible software, we have to consider the other extension scenario: adding an operation. Suppose we now want to typecheck our programs, just as we did before. This means that expression objects should now also understand the "typeof" message. To do that, we actually have to modify all object definitions:

(define (bool b) (OBJECT () ([method interp () b] [method typeof () 'boolean]))) (define (num n) (OBJECT () ([method interp () n] [method typeof () 'number]))) (define (add l r) (OBJECT () ([method interp () (+ (-> l interp) (-> r interp))] [method typeof () (if (and (equal? 'number (-> l typeof)) (equal? 'number (-> r typeof))) 'number (error "Type error: not a number"))]))) (define (ifc c t f) (OBJECT () ([method interp () (if (-> c interp) (-> t interp) (-> f interp))] [method typeof () (if (equal? 'boolean (-> c typeof)) (let ((type-t (-> t typeof)) (type-f (-> f typeof))) (if (equal? type-t type-f) type-t (error "Type error: both branches should have same type"))) (error "Type error: not a boolean"))])))

> (-> (ifc (bool #f) (num 1) (num 3)) typeof) 'number

> (-> (ifc (num 1) (bool #f) (num 3)) typeof) Type error: not a boolean

This extensibility scenario forced us to modify all our code to add the new methods.

To summarize, with objects, adding new data variants (eg. ifc) is easy and modular, but adding new operations (eg. typeof) is cumbersome and non-modular.

Note that this is just the dual situation of ADTs!

#### 3.5` `Different Forms of Data Abstraction

Cook’s paper goes more in depth in the comparison of these forms of data abstraction. It is definitely a must-read!

ADTs have a private representation type that prohibits tampering and extension. This is good for reasoning (analysis) and optimization. But it only permits one representation at a time.

Objects have behavioral interfaces, which allow definition of new implementations at any time. This is good for flexibility and extensibility. But it makes it hard to analyze code, and makes certain optimizations impossible.

Both forms of abstraction also support different forms of modular extensions. It is possible to modularly add new operations on an ADT, but supporting new data variants is cumbersome. It is possible to modularly add new representations to an object-oriented system, but adding new operations implies crosscutting modifications.

There are ways to navigate this tradeoff. For instance, one can expose certain implementation details in the interface of an object. That sacrifices some extensibility, but recovers the possibility to do some optimizations. The fundamental question is therefore a design question: what do I really need?

Now you understand why many languages support both kinds of data abstraction.