On this page:
6.1 Class Hierarchy
6.2 Method Lookup
6.3 Fields and Inheritance
6.3.1 Inheriting Fields
6.3.2 Binding of Fields
6.3.3 Field Shadowing
6.4 Cleaning up the Class Protocol
6.5 Super Sends
6.6 Inheritance and Initialization

6 Inheritance

Éric Tanter

In the presence of classes, we may want a mechanism similar to Delegation in order to be able to reuse and selectively refine existing classes. We therefore extend our object system with support for class inheritance. As we will see, many issues have to be dealt with. As usual, we will proceed in a gradual manner.

6.1 Class Hierarchy

We introduce the capability for a class to extend another class (called its superclass). We focus on single inheritance, where a class only extends a single class.

Multiple inheritance. C++

As a result, classes are organized in a hierarchy. The superclasses of a class (transitively) are called its ancestors; dually, the set of transitive subclasses of a class are called its descendants.

For example:
(define Point
  (CLASS extends Root
           ([field x 0])
           ([method x? () (? x)]
            [method x! (new-x) (! x new-x)]
            [method move (n) (-> self x! (+ (-> self x?) n))])))
 
(define ColorPoint
  (CLASS extends Point
           ([field color 'black])
           ([method color? () (? color)]
            [method color! (clr) (! color clr)])))

6.2 Method Lookup

When we send a message to an object, we look in its class for a method that implements the message, and then we apply it. This is reflected in our CLASS macro definition as:

[(invoke)
 (if (assoc (second vals) methods)
     (apply ((cdr (assoc (second vals) methods)) (first vals)) (cddr vals))
     (error "message not understood"))]

With inheritance, if an object is sent a message for which a method is not found in its class, we can look for a method in the superclass, and so on. We will first refine the invoke protocol and cut it in two steps: first, a lookup step, which includes going to lookup in superclasses if no method is found in the current class, and then the actual invoke step.

(defmac (CLASS extends superclass
               ([field f init] ...)
               ([method m params body] ...))
  #:keywords field method extends
  #:captures self ? !
  (let ([scls superclass]
        (methods
         (local [(defmac (? fd) #:captures self
                   ((obj-class self) 'read self 'fd))
                 (defmac (! fd v) #:captures self
                   ((obj-class self) 'write self 'fd v))]
           (list (cons 'm (λ (self)
                            (λ params body))) ...))))
    (letrec ([class (λ (msg . vals)
                      (case msg
                        ....
                        [(invoke)
                         (let ((method (class 'lookup (second vals))))
                           (apply (method (first vals)) (cddr vals)))]
                        [(lookup)
                         (let ([found (assoc (first vals) methods)])
                           (if found
                               (cdr found)
                               (scls 'lookup (first vals))))]))])
      class)))

The CLASS syntactic abstraction is extended with an extends clause (a new keyword in a class definition). Before we can try this out, we need to define a root class at the top of the tree, in order to put an end to the method lookup process. The Root class below does just that:
(define Root
  (λ (msg . vals)
    (case msg
      [(lookup) (error "message not understood:" (first vals))]
      [else     (error "root class: should not happen: " msg)])))
Root is implemented directly as a function, without using the CLASS form, so that we don’t need to specify its superclass (it has none). In case it is sent a lookup message, it gives an error that the message is not understood. Note that in this system, it is an error to send any message that is not lookup to the root class.

Let us see an example of class inheritance at work, first with a very simple example:

(define A
  (CLASS extends Root ()
         ([method foo () "foo"]
          [method bar () "bar"])))
(define B
  (CLASS extends A ()
         ([method bar () "B bar"])))

 

> (define b (new B))
> (-> b foo)

"foo"

> (-> b bar)

"B bar"

This looks just fine: sending a message that is unknown in B works as expected, and sending bar also results in B’s refinement to be executed instead of A’s method. That is, method invocation is properly late bound. We say that method bar in B overrides the method of the same name defined in A.

Let us look at a slightly more complex example:

> (define p (new Point))
> (-> p move 10)
> (-> p x?)

10

and now with a ColorPoint:
> (define cp (new ColorPoint))
> (-> cp color! 'red)
> (-> cp color?)

'red

> (-> cp move 5)

hash-ref: no value found for key

  key: 'x

What happened? It seems that we are not able to use field x of a colored point. Fair enough, we haven’t dealt at all about how fields have to be handled in the presence of inheritance.

6.3 Fields and Inheritance

Let us look again at how we handle object creation at the moment:
[(create)
 (make-obj class
           (make-hash (list (cons 'f init) ...)))]
That is it: we are only initializing the values dictionary for the fields that are declared in the current class! We ought to be initializing the dictionary with values for the fields of the ancestor classes as well.

6.3.1 Inheriting Fields

An object should have values for all the fields declared in any of its ancestors. Therefore, when we create a class, we should determine all the fields of its instances. To do that, we have to extend classes such that they keep a list of all their fields, and are able to provide that information to any subclass that requires it.

(defmac (CLASS extends superclass
               ([field f init] ...)
               ([method m params body] ...))
  #:keywords field method extends
  #:captures self ? !
  (let* ([scls superclass]
         [methods ....]
         [fields (append (scls 'all-fields)
                         (list (cons 'f init) ...))])
    (letrec
        ([class (λ (msg . vals)
                  (case msg
                    [(all-fields) fields]
                    [(create) (make-obj class
                                        (make-hash fields))]
                    ....))]))))
We introduce a new fields identifier in the lexical environment of the class. This identifier is bound to the complete list of fields that the class’ instances should have. All fields of the superclass are obtained by sending it the all-fields message (whose implementation simply returns the list bound to fields). When creating an object, we just make a fresh dictionary with all the fields.

Because we have added a new message to the vocabulary of classes, we need to wonder what happens if this message is sent to the Root class: what are all the fields of that class? Well, it has to be the empty list, since we are using append blindly:
(define Root
  (λ (msg . vals)
    (case msg
      [(lookup)     (error "message not understood:" (first vals))]
      [(all-fields) '()]
      [else (error "root class: should not happen: " msg)])))

Let us see if this works:

> (define cp (new ColorPoint))
> (-> cp color! 'red)
> (-> cp color?)

'red

> (-> cp move 5)
> (-> cp x?)

5

Good!

6.3.2 Binding of Fields

Actually, there is still one more issue that we haven’t considered: what happens if a subclass defines a field with a name that already exists in one of its ancestors?

(define A
 (CLASS extends Root
        ([field x 1]
         [field y 0])
        ([method ax () (? x)])))
(define B
  (CLASS extends A
         ([field x 2])
         ([method bx () (? x)])))

 

> (define b (new B))
> (-> b ax)

2

> (-> b bx)

2

In both cases, we get the value bound to the x field of B. In other words, we have late binding of fields, exactly as we do for methods. Is that reasonable?

strong encapsulation

Let us see: an object is meant to encapsulate some (possibly mutable) state behind a proper procedural interface (methods). It is clear that late binding of methods is a desirable property, because methods are what makes an object’s external interface. What about fields? Fields are supposed to be hidden, internal state of the object—in other words, implementation details, not public interface. Actually, notice that in our language so far, it is not even possible to access a field of another object other than self! So at the very least, late binding of fields is doubtful.

should private methods be late bound? are they?

Let us look at what happened with Delegation. How are fields handled there? Well, fields are just free variables of a function, so they are lexically scoped. This is a much more reasonable semantics for fields. When methods are defined in a class, they are defined in terms of the fields that are directly defined in that class, or one of its superclass. This makes sense, because all that is information known at the time one writes a class definition. Having late binding of fields means reintroducing dynamic scoping for all free variables of a method: an interesting source of errors and headaches! (Think of examples of subclasses mixing up their superclasses by accidentally introducing fields with existing names.)

6.3.3 Field Shadowing

We now see how to define the semantics known as field shadowing, in which a field in a class shadows a field of the same name in a superclass, but a method always accesses a field as declared in the class or one of its ancestors.

Concretely, this means that an object can hold different values for fields of the same name; which one to use depends on the class in which the executing method is defined (this is know as the host class of the method). Because of this multiplicity, it is not possible to use a hash table anymore. Instead, we will keep in a class the list of the field names, and in the object, a vector of values, accessed by position. A field access will be done in two steps: first determining the position of the field according to the list of names, and then accessing the value in the vector held by the object.

For instance, for class A above, the list of names is '(x y) and the vector of values in a new instance of A is #(1 0). For class B, the list of names is '(x y x) and the vector of values in a new instance is #(1 0 1). The advantage of keeping the fields ordered this way is that, if not shadowed, a field is always at the same position within an object.

To respect the semantics of shadowing, we have (at least) two options. We can rename shadowed fields to a mangled name, for instance '(x0 y x) in B so that methods hosted in B and its descendants only see the latest definition of x, that is, the field introduced in B. An other alternative is to keep the field names unchanged, but to perform lookup starting from the end: in other words, we will want to find the last position of a field name in the list of names. We will go for the latter.

We can update our definition of CLASS so as to introduce the vector and the field lookup strategy:
....
[(create)
 (let ([values (list->vector (map cdr fields))])
   (make-obj class values))]
[(read)
 (vector-ref (obj-values (first vals))
             (find-last (second vals) fields))]
[(write)
 (vector-set! (obj-values (first vals))
              (find-last (second vals) fields)
              (third vals))]
....
We obtain a vector (with the initial field values) and construct the object with it. Then, for field access, we access the vector at the appropriate position, as returned by find-last. However, if we try this out, we soon realize that it does not work! We still have the same broken semantics as before.

Why is it? Let us refresh memories and see how we desugar a field read ?:
(defmac (? fd) #:captures self
  ((obj-class self) 'read self 'fd))
We see that we generate an expression that asks self for its class, and then send the read message to that class. Hmmm, but self is dynamically bound to the receiver, so we will always ask the same class to read the field! This is wrong. We should not send the read message to the class of the receiver, but rather to the host class of the method! How do we do that? We need a way to refer, from a method body, to its host class or, even better, to directly access the list of fields of the host class.

We could put the list of fields in the lexical environment of the method, as we do for self, but that would mean that programmers could accidentally interfere with the binding (in contrast, self is a typical keyword in object-oriented languages). The list of fields (and the name used to bind it) should rather remain local to our implementation. Since we locally define ? and ! in a class, we can simply declare the list of fields, fields, in the scope of these syntax definitions; the hygiene of macro expansion ensures that user code cannot accidentally interfere with fields.

....
(let* ([scls superclass]
       [fields (append (scls 'all-fields)
                       (list (cons 'fd val) ...))]
       [methods
        (local [(defmac (? fd) #:captures self
                  (vector-ref (obj-values self)
                              (find-last 'fd fields)))
                (defmac (! fd v) #:captures self
                  (vector-set! (obj-values self)
                               (find-last 'fd fields)
                               v))]
          ....)]))

This implementation is not so satisfactory because we call find-last (expensive/linear) for each and every field access. Can this be avoided? How?

Note that we are now directly accessing the fields list, so we do not need to send field access messages to the class anymore. And similarly for writing to a field.

Let us see if all this works as expected:

(define A
 (CLASS extends Root
        ([field x 1]
         [field y 0])
        ([method ax () (? x)])))
(define B
  (CLASS extends A
         ([field x 2])
         ([method bx () (? x)])))

 

> (define b (new B))
> (-> b ax)

1

> (-> b bx)

2

6.4 Cleaning up the Class Protocol

Since we first introduced Classes, we have made several changes to the protocol of classes:
  • We have split the invoke protocol into two, by introducing lookup whose purpose is solely to lookup the appropriate method definition in the class hierarchy.

  • We have added all-fields in order to be able to retrieve the fields of a class. This is used at class construction time to obtain the fields of the superclass and append them to the ones being defined.

  • We got rid of the read/write protocol for field accesses, in order to properly scope field names in methods.

It is a good time to reflect upon the class protocol and see whether or not what we have here is a minimal protocol, or if we could get rid of some of it. What is a good criteria for deciding? well, since we are talking about the protocol of a class, it may better be that it actually relies on the class processing the message. For instance, since we first introduce the read/write protocol, we could have gotten rid of it. Remember:
....
[(read)  (dict-ref (obj-values (first vals)) (second vals))]
[(write) (dict-set! (obj-values (first vals)) (second vals)
                    (third vals))]
....

Is there anything here that depends upon free variables of the class function? (in other words, that depends on the state of the class object) No, the only input needed is the current object, the name of the field being accessed, and possibly the value being written to it. We could therefore have placed this code directly in the expansion of ? and !, thereby effectively "compiling away" an unnecessary layer of interpretation.

What about invoke? Well, it’s just sending out a message to itself, which we could do directly when expanding->, and then the application per se is independent of the class:
(defmac (-> o m arg ...)
  (let ([obj o])
    ((((obj-class obj) 'lookup 'm) obj) arg ...)))

What about the other parts of the class protocol? all-fields, create, and lookup do access the internal state of the class: for all-fields we access fields; for create we access both fields and class itself; and for lookup we access both methods and superclass. So, our classes only need to understand these three messages.

6.5 Super Sends

When a method overrides a method in a superclass, it is sometimes useful to be able to invoke that definition. This allows many typical refinement patterns, such as adding something to do before or after a method is executed, like additional processing on its arguments and return values, among many others. This is achieved by doing what is called a super send. We use --> as the syntax for a super send.

Let us look at a first example:

(define Point
 (CLASS extends Root
          ([field x 0])
          ([method x? () (? x)]
           [method x! (new-x) (! x new-x)]
           [method as-string ()
                   (string-append "Point("
                                  (number->string (? x)) ")")])))
 
(define ColorPoint
 (CLASS extends Point
          ([field color 'black])
          ([method color? () (? color)]
           [method color! (clr) (! color clr)]
           [method as-string ()
                   (string-append (--> as-string) "-"
                                  (symbol->string (? color)))])))

 

> (define cp (new ColorPoint))
> (-> cp as-string)

"Point(0)-black"

Note how a super send allows us to reuse and extend the definition of as-string in Point in order to define the method in ColorPoint. In Java, this is done by invoking a method on super, but what exactly is super? what is the semantics of a super send?

A first thing to clarify is: what is the receiver of a super send? In the example above, to which object is as-string sent when using -->? self! In fact, super only really affects method lookup. A common misunderstanding is that when doing a super send, method lookup starts with the superclass of the receiver, instead of its class. Let us see why this is incorrect in a small, artificial example:
(define A
  (CLASS extends Root ()
         ([method m () "A"])))
 
(define B
  (CLASS extends A ()
         ([method m () (string-append "B" (--> m) "B")])))
 
(define C
  (CLASS extends B () ()))
 
(define c (new C))
(-> c m)
What does this program return? Let’s see. -> expands into a send of lookup to c’s class, which is C. There is no methods defined for m in C, so it sends lookup to its superclass, B. B finds a method for m, and returns it. It is then applied to the current self (c) and then passed arguments, none in this case. The evaluation of the method implies the evaluation of the three arguments of string-append, the second of which is a super send. With the above definition of a super send, this means that m is looked up not in C (the actual class of the receiver) but in B (its superclass). Is there a method for m in B? Yes, and we are actually executing it.... In other words, with this understanding of super the above program does NOT terminate.

Some dynamic languages, like Ruby, allow the class inheritance relation to be changed at runtime. This is common in prototype-based languages, like Self and JavaScript.

What is the mistake? To consider that a self send implies looking up the method in the superclass of the receiver. In the example above, we should actually lookup m in A, not in B. To this end, we need to know the superclass of the host class of the method in which the super send is performed. Is this a value that should be bound statically in method bodies, or dynamically? Well, we’ve just said it: it is the superclass of the host class of the method, and that is not likely to change dynamically (at least in our language). Luckily, we already have a binding in the lexical context of methods that refers to the superclass, scls. So we just need to introduce a new local macro for -->, whose expansion asks the superclass scls to lookup the message. --> can be used by user code, so it is added to the list of #:captures identifiers:
(defmac (CLASS extends superclass
               ([field f init] ...)
               ([method m params body] ...))
  #:keywords field method extends
  #:captures self ? ! -->
  (let* ([scls superclass]
         [fields (append (scls 'all-fields)
                         (list (cons 'f init) ...))]
         [methods
          (local [(defmac (? fd) ....)
                  (defmac (! fd v) ....)
                  (defmac (--> md . args) #:captures self
                    (((scls 'lookup 'md) self) . args))]
            ....)])))
Note how lookup is now sent to the superclass of the host class of the currently-executing method, scls, instead of to the actual class of the current object.

> (define c (new C))
> (-> c m)

"BAB"

6.6 Inheritance and Initialization

We have previously seen how to address object initialization (Initialization), by introducing special methods called initializers. Once an object is created, and before it is returned to the creator, its initializer is called.

In the presence of inheritance, the process is a bit more subtle, because if initializers override each other, some necessary initialization work can be missed. The work of an initializer can be quite specific, and we want to avoid subclasses to have to deal with all the details. One could simply assume the normal semantics of method dispatch, whereby initialize in a subclass can call the super initializer if needed. The problem with this freedom is that the initializer in the subclass may start to do things with the object when inherited fields are not yet consistently initialized. To avoid this issue, in Java, the first thing a constructor must do is to call a super constructor (it may have to compute arguments for that call, but that’s all it is allowed to do). Even if the call is not in the source code, the compiler adds it. Actually, this restriction is also enforced at the VM level by the bytecode verifier: low-level bytecode surgery can therefore not be used to avoid the super constructor call.