On this page:
2.1 Execution profiles
2.2 Tail recursion and tail calls
2.3 Making a function tail recursive
2.4 Languages
2.5 Trampoline

2 Recursion: Efficiency Considerations

We now turn to crucial efficiency considerations related to recursion. This section is just a brief and incomplete overview, but it should give you the necessary concepts to learn more on your own if you want to.

2.1 Execution profiles

The following functions gcd and fact, while similar in structure at first sight, have actually very different execution profiles:

(define (gcd a b)
  (if (zero? b)
      a
      (gcd b (modulo a b))))
 
(define (fact n)
    (if (zero? n)
        1
        (* n (fact (- n 1)))))

Let us focus on the recursive calls that occur when applying these functions. First, gcd:
(gcd 14 21)
 (gcd 21 14)
 (gcd 14 7)
 (gcd 7 0)
 7
And now fact:
(fact 4)
 (* 4 (fact 3))
 (* 4 (* 3 (fact 2)))
 (* 4 (* 3 (* 2 (fact 1))))
 (* 4 (* 3 (* 2 (* 1 (fact 0)))))
 (* 4 (* 3 (* 2 (* 1 1))))
 (* 4 (* 3 (* 2 1)))
 (* 4 (* 3 2))
 (* 4 6)
 24
As you can see, applying fact requires a growing control context to remember what is next to do once the recursive call returns. On the contrary, when applying gcd, this context growth does not occur because the recursive call is the final result. In practice, the control context is realized through stack allocation, which means a function that executes like fact can easily exhaust the available stack space.

2.2 Tail recursion and tail calls

The main difference between both functions above is that the recursive call to gcd in its body occurs in tail position, ie. as the last thing to do. In the body of fact, the recursive call occurs in argument position of the multiplication. A call in tail position is called a tail call. If the tail call is a recursive call, then it is a tail recursive call. A recursive function where all recursive calls are in tail position is tail recursive.

Exercise: Look at the interpreter. Is it a tail recursive function?

While common knowledge is that any function call requires allocating a new stack frame, if a call is in tail position, we can reuse the same stack frame by just overriding the local variables. This observation yields two important optimizations.

Tail Recursion Optimization (TRO) is the optimization that consists in recognizing recursive calls that are in tail position, and replacing them with jumps. For gcd, this means that we can just compile the recursive call to a jump to the beginning of the body, provided we override a and b with the new values. TRO is usually simple to do because it is a local optimization: one can analyze a function in isolation and perform the optimization.

Tail Call Optimization (TCO) is a more general optimization that consists in optimizing not only direct recursive calls, but any call that happens to be in tail position. This is important because some recursive executions occur indirectly, through a chain of function applications. This happens naturally when coding state machines (with cycles that involve several states), or even simple design patterns. Think of the composite design pattern in object-oriented programming. TCO is more complex to implement because it deals with different calls, hence different sizes of stack frames. It is not a simple local transformation.

2.3 Making a function tail recursive

Going back to fact, the question is whether we can rewrite it so that it so that it is tail recursive. This way, TRO would allow fact to be compiled to an efficient version, which does not consume stack space proportional to the depth of the recursion.

The good news is that, yes, one can transform any recursive function into a tail recursive function. For now, let us focus on fact. The reason we accumulate control context is that the recursive call must return before we do the multiplication. We can simply calculate the pending multiplication result (exploiting associativity of multiplication) and pass around an accumulator that represents the value computed so far. Here is the definition:

(define (fact n)
  (define (fact/acc n acc)
    (if (zero? n)
        acc
        (fact/acc (sub1 n) (* n acc))))
  (fact/acc n 0))

This new version of fact uses an auxiliary function fact/acc that takes as extra argument the accumulator, initially 1, since this is the base case. Note that fact/acc is tail recursive, because in the else branch, the recursive call is the last thing to do. The pending multiplication is now moved to the accumulator. Here is the evaluation profile:

(fact 4)
 (fact/acc 4 1)
 (fact/acc 3 4)
 (fact/acc 2 12)
 (fact/acc 1 24)
 (fact/acc 0 24)
 24

Notice how the control context now remains constant.

Of course, the transformation we have applied is fairly specific to the factorial case. But the general spirit remain. The general transformation that turns any recursive function into a tail recursive one consists in passing the control context as an extra argument. With factorial, it just happens that we were able to specialize the control context to a number. In the general case, we instead need to pass a stack of "things-to-do-next". Such a "thing-to-do-next" is called a continuation. Hence, the general transformation is called a continuation-passing style (CPS) transformation. In fact, the CPS transformation is not specific to recursive functions, it can be applied to any program, turning it into a program where all calls are in tail position.

Continuations are a central concept in programming languages. They can be used to implement all sorts of control operators, like exceptions and coroutines. To learn more about CPS and continuations, see these excellent posts by Matt Might on CPS for JavaScript, how to compile with continuations, and implementing exceptions. Also, PLAI has a whole section on continuations for Web programming.

2.4 Languages

Supporting TCO or not is typically not a property of a language, but of a particular language implementation. One notable exception to this is Scheme, for which the language specification imposes TCO. That is, to be a legal Scheme implementation, TCO must be supported.

Most object-oriented language implementations do not support tail call optimization. That is, the major implementations of Java, Python, Smalltalk, Ruby, JavaScript do not perform TCO. This means that simple recursive patterns (either direct or indirect) trigger stack overflow errors. For instance, a recursive method/pattern over a linked list of 20,000 elements will blow away your Java VM (in its default configuration)!

ECMAScript 6 does offer TCO – which means that recent JavaScript implementations will support the optimization. See this table to check support for TCO in your favorite browser.

One of the main arguments against TCO is that it blurs the call stack, making certain forms of debugging and reflection not applicable. For preserving debugging, the simple solution (which Racket does as well) is simply to not perform TCO when the program is run in debug mode. For reflection, that is, the ability to observe the execution stack at runtime, there are alternative options, like continuation marks (developed by John Clements in his PhD thesis, also supported in Racket), which permit a tradeoff between no TCO and full TCO, preserving only specific information across tail calls.

There exist libraries to introduce support for TRO and/or TCO in several of these OO languages, for instance TCO for Python. Rubby also seems to have a compiler flag to activate TCO. Note that the Racket class system fully supports TCO out-of-the-box, while being an impressively expressive and principled object-oriented language with first-class classes.

Scala supports TRO as a local compiler optimization out-of-the-box. You can use the online Compiler Explorer to observe how Scala optimizes tail recursive calls, while Java does not.

See for instance this snippet that shows how two recursive functions (one non-tail recursive, the other tail recursive) are compiled to Java bytecode. Note how the compilation of the first function uses the invokevirtual instruction, while the second simply uses goto.

Scala also provides an annotation tailrec so that programmers can specify that they intend their function to be tail recursive. If they make a mistake and write a recursive function that is not tail recursive, the compiler reports an error identifying the recursive call that is not in tail position.

2.5 Trampoline

Because the Java Virtual Machine on which Scala programs run does not support TCO, Scala is limited to TRO. However, to support TCO, Scala uses a well-known technique called a trampoline. See TailCalls in the standard library). Note that some C compilers do some form of TCO or TRO under certain circumstances, with the appropriate optimization flags. But for ensuring general TCO, one also needs to resort to a trampoline. Trampolines are a simple pattern that can be applied in any language; see for instance Trampoline in C++ and Trampolines in JavaScript.

This blog post has nice little drawings to illustrate what a trampoline is. But you can appeal to your imagination: when jumping on a trampoline, you can virtually "go up" as long a distance as you want, without ever going beyond the atmosphere. It might just take you a lot of time and a lot of little jumps. Similarly, the trampoline pattern cuts down a (possibly infinite) chain of calls in a sequence of small steps, each time either returning a function that encapsulates the rest of the work to be done, or a marker to say that one is done.

Here is a very simple trampoline implementation, in Racket (of course, that’s useless in Racket since TCO is already supported ☺).

(deftype TailCall
  (call rest)
  (done val))
 
(define (result tc)
  (match tc
    [(done val) val]
    [(call rest) (result (rest))]))
A tail call is a structure that is either a call (with one attribute which is a thunk encapsulating what next to do) or a done (with the final value). Given a tail call structure, result is a recursive function that checks if the computation is done, and if not, applies one step and calls itself recursively on the returned value. Note that the recursive call to result is a tail call, so a local tail call optimization (as supported by Scala) will apply. If you implement a tramponline in a language without TRO, result should use a while loop instead.

The following shows how to modify mutually-recursive functions to use the trampoline framework:

(define (even n)
  (match n
    [0 (done #t)]
    [1 (done #f)]
    [else (call (λ () (odd (sub1 n))))]))
 
(define (odd n)
  (match n
    [0 (done #f)]
    [1 (done #t)]
    [else (call (λ () (even (sub1 n))))]))

Calling (even 1000102278) only produces a promise to do the job. Executing (result (even 1000102278)) triggers the recursive processing, and eventually produces the final value, while consuming a fixed stack space.