On this page:
2.1 Execution profiles
2.2 Tail recursion and tail calls
2.3 Languages
2.4 Trampoline

2 Recursion: Efficiency Considerations

We now turn to crucial efficiency considerations related to recursion. This section is just a brief and incomplete overview, but it should give you the necessary concepts to learn more on your own if you want to.

2.1 Execution profiles

The following functions gcd and fact, while similar in structure at first sight, have actually very different execution profiles:

(define (gcd a b)
  (if (zero? b)
      a
      (gcd b (modulo a b))))
 
(define (fact n)
    (if (zero? n)
        1
        (* n (fact (- n 1)))))

Let us focus on the recursive calls that occur when applying these functions. First, gcd:
(gcd 14 21)
 (gcd 21 14)
 (gcd 14 7)
 (gcd 7 0)
 7
And now fact:
(fact 4)
 (* 4 (fact 3))
 (* 4 (* 3 (fact 2)))
 (* 4 (* 3 (* 2 (fact 1))))
 (* 4 (* 3 (* 2 (* 1 (fact 0)))))
 (* 4 (* 3 (* 2 (* 1 1))))
 (* 4 (* 3 (* 2 1)))
 (* 4 (* 3 2))
 (* 4 6)
 24
As you can see, applying fact requires a growing control context to remember what is next to do once the recursive call returns. On the contrary, when applying gcd, this context growth does not occur because the recursive call is the final result. In practice, the control context is realized through stack allocation, which means a function that executes like fact can easily exhaust the available stack space.

2.2 Tail recursion and tail calls

The main difference between both functions above is that the recursive call to gcd in its body occurs in tail position, ie. as the last thing to do. In the body of fact, the recursive call occurs in argument position of the multiplication. A call in tail position is called a tail call. If the tail call is a recursive call, then it is a tail recursive call. A recursive function where all recursive calls are in tail position is tail recursive.

Exercise: Look at the interpreter. Is it a tail recursive function?

While common knowledge is that any function call requires allocating a new stack frame, if a call is in tail position, we can reuse the same stack frame by just overriding the local variables. This observation yields two important optimizations.

Tail Recursion Optimization (TRO) is the optimization that consists in recognizing recursive calls that are in tail position, and replacing them with jumps. For gcd, this means that we can just compile the recursive call to a jump to the beginning of the body, provided we override a and b with the new values. TRO is usually simple to do because it is a local optimization: one can analyze a function in isolation and perform the optimization.

Tail Call Optimization (TCO) is a more general optimization that consists in optimizing not only direct recursive calls, but any call that happens to be in tail position. This is important because some recursive executions occur indirectly, through a chain of function applications. This happens naturally when coding state machines (with cycles that involve several states), or even simple design patterns. Think of the composite design pattern in object-oriented programming. TCO is more complex to implement because it deals with different calls, hence different sizes of stack frames. It is not a simple local transformation.

2.3 Languages

Supporting TCO or not is typically not a property of a language, but of a particular language implementation. One notable exception to this is Scheme, for which the language specification imposes TCO. That is, to be a legal Scheme implementation, TCO must be supported.

Most object-oriented language implementations do not support tail call optimization. That is, the major implementations of Java, Python, Smalltalk, Ruby, JavaScript do not perform TCO. This means that simple recursive patterns (either direct or indirect) trigger stack overflow errors. For instance, a recursive method/pattern over a linked list of 20,000 elements will blow away your Java VM (in its default configuration)!

ECMAScript 6 does offer TCO – which means that recent JavaScript implementations will support the optimization. See this table to check support for TCO in your favorite browser.

One of the main arguments against TCO is that it blurs the call stack, making certain forms of debugging and reflection not applicable. For preserving debugging, the simple solution (which Racket does as well) is simply to not perform TCO when the program is run in debug mode. For reflection, that is, the ability to observe the execution stack at runtime, there are alternative options, like continuation marks (developed by John Clements in his PhD thesis, also supported in Racket), which permit a tradeoff between no TCO and full TCO, preserving only specific information across tail calls.

There exist libraries to introduce support for TRO and/or TCO in several of these OO languages, for instance TCO for Python. Rubby also seems to have a compiler flag to activate TCO. Note that the Racket class system fully supports TCO out-of-the-box, while being an impressively expressive and principled object-oriented language with first-class classes.

Scala supports TRO as a local compiler optimization out-of-the-box, and even provides an annotation tailrec so that programmers can specify that they intend their function to be tail recursive. If they make a mistake and write a recursive function that is not tail recursive, the compiler reports an error identifying the recursive call that is not in tail position. See tail recursion optimization in Scala vs Java for some details and illustrations.

2.4 Trampoline

Because the Java Virtual Machine on which Scala programs run does not support TCO, Scala is limited to TRO. However, to support TCO, Scala uses a well-known technique called a trampoline. See TailCalls in the standard library). Note that some C compilers do some form of TCO or TRO under certain circumstances, with the appropriate optimization flags. But for ensuring general TCO, one also needs to resort to a trampoline. Trampolines are a simple pattern that can be applied in any language; see for instance Trampoline in C++ and Trampolines in JavaScript.

This blog post has nice little drawings to illustrate what a trampoline is. But you can appeal to your imagination: when jumping on a trampoline, you can virtually "go up" as long a distance as you want, without ever going beyond the atmosphere. It might just take you a lot of time and a lot of little jumps. Similarly, the trampoline pattern cuts down a (possibly infinite) chain of calls in a sequence of small steps, each time either returning a function that encapsulates the rest of the work to be done, or a marker to say that one is done.

Here is a very simple trampoline implementation, in Racket (of course, that’s useless in Racket since TCO is already supported ☺).

(deftype TailCall
  (call rest)
  (done val))
 
(define (result tc)
  (match tc
    [(done val) val]
    [(call rest) (result (rest))]))
A tail call is a structure that is either a call (with one attribute which is a thunk encapsulating what next to do) or a done (with the final value). Given a tail call structure, result is a recursive function that checks if the computation is done, and if not, applies one step and calls itself recursively on the returned value. Note that the recursive call to result is a tail call, so a local tail call optimization (as supported by Scala) will apply. If you implement a tramponline in a language without TRO, result should use a while loop instead.

The following shows how to modify mutually-recursive functions to use the trampoline framework:

(define (even n)
  (match n
    [0 (done #t)]
    [1 (done #f)]
    [else (call (λ () (odd (sub1 n))))]))
 
(define (odd n)
  (match n
    [0 (done #f)]
    [1 (done #t)]
    [else (call (λ () (even (sub1 n))))]))

Calling (even 1000102278) only produces a promise to do the job. Executing (result (even 1000102278)) triggers the recursive processing, and eventually produces the final value, while consuming a fixed stack space.