We now turn to crucial efficiency considerations related to recursion. This section is just a brief and incomplete overview, but it should give you the necessary concepts to learn more on your own if you want to.
The following functions gcd and fact, while similar in structure at first sight, actually have very different execution profiles:
(define (gcd a b)
  (if (zero? b)
      a
      (gcd b (modulo a b))))

(define (fact n)
  (if (zero? n)
      1
      (* n (fact (- n 1)))))
(gcd 14 21) ↦ (gcd 21 14) ↦ (gcd 14 7) ↦ (gcd 7 0) ↦ 7
(fact 4)
↦ (* 4 (fact 3))
↦ (* 4 (* 3 (fact 2)))
↦ (* 4 (* 3 (* 2 (fact 1))))
↦ (* 4 (* 3 (* 2 (* 1 (fact 0)))))
↦ (* 4 (* 3 (* 2 (* 1 1))))
↦ (* 4 (* 3 (* 2 1)))
↦ (* 4 (* 3 2))
↦ (* 4 6)
↦ 24
The main difference between the two functions above is that the recursive call to gcd in its body occurs in tail position, i.e., as the last thing to do. In the body of fact, the recursive call occurs in argument position of the multiplication: the multiplication by n remains to be done after the call returns. A call in tail position is called a tail call. If a tail call is a recursive call, then it is a tail recursive call. A recursive function in which all recursive calls are in tail position is tail recursive.
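To make the distinction concrete, fact can be rewritten so that its recursive call is in tail position, by threading the pending multiplications through an accumulator. Here is a sketch in Python for illustration (the name fact_acc is ours, not from the text above):

```python
def fact(n):
    # Recursive call in argument position: the multiplication by n
    # remains to be done after the call returns.
    if n == 0:
        return 1
    return n * fact(n - 1)

def fact_acc(n, acc=1):
    # Recursive call in tail position: nothing remains to be done
    # after the call, so an implementation with TCO could reuse the
    # current stack frame.
    if n == 0:
        return acc
    return fact_acc(n - 1, acc * n)
```

Both compute the same result; note however that Python does not perform TCO, so fact_acc still grows the stack at each call.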
Exercise: Look at the interpreter. Is it a tail recursive function?
Common knowledge says that any function call requires allocating a new stack frame. However, if a call is in tail position, we can reuse the current stack frame by simply overwriting the local variables: since nothing remains to be done in the caller, its frame will never be needed again. This observation yields two important optimizations.
Tail Recursion Optimization (TRO) is the optimization that consists in recognizing recursive calls that are in tail position and replacing them with jumps. For gcd, this means that we can compile the recursive call to a jump to the beginning of the body, provided we overwrite a and b with the new values. TRO is usually simple to implement because it is a local optimization: one can analyze a function in isolation and perform the optimization.
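What TRO produces can be pictured by hand-compiling gcd into a loop. A sketch in Python (gcd_loop is our name): the tail call becomes an overwrite of the local variables followed by a jump back to the top of the body.

```python
def gcd_loop(a, b):
    # The tail call (gcd b (modulo a b)) compiles to: overwrite the
    # locals with the new argument values, then jump to the loop top.
    while b != 0:
        a, b = b, a % b
    return a
```

This runs in constant stack space, exactly as the tail-recursive Racket version does under TRO.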
Tail Call Optimization (TCO) is a more general optimization that consists in optimizing not only direct recursive calls, but any call that happens to be in tail position. This is important because some recursive executions occur indirectly, through a chain of function applications. This happens naturally when coding state machines (with cycles that involve several states), or even simple design patterns. Think of the composite design pattern in object-oriented programming. TCO is more complex to implement because it deals with different calls, hence different sizes of stack frames. It is not a simple local transformation.
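For instance, consider two mutually recursive functions in which each call is a tail call to the other function. TRO cannot help here, since neither function calls itself directly; only full TCO would keep the stack flat. A sketch in Python:

```python
def even(n):
    # Tail call to a *different* function: a local TRO cannot turn
    # this into a jump within even's own body.
    if n == 0:
        return True
    return odd(n - 1)

def odd(n):
    if n == 0:
        return False
    return even(n - 1)
```

Without TCO, each of these tail calls still pushes a new frame, so a large enough n exhausts the stack (in Python, a RecursionError).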
Supporting TCO or not is typically not a property of a language, but of a particular language implementation. One notable exception is Scheme, whose language specification mandates TCO: to be a conforming Scheme implementation, TCO must be supported.
One of the main arguments against TCO is that it blurs the call stack, making certain forms of debugging and reflection inapplicable. For debugging, the simple solution (which Racket adopts) is simply not to perform TCO when the program is run in debug mode. For reflection, that is, the ability to observe the execution stack at runtime, there are alternative options, such as continuation marks (developed by John Clements in his PhD thesis, and supported in Racket), which permit a tradeoff between no TCO and full TCO, preserving only specific information across tail calls.
There exist libraries that introduce support for TRO and/or TCO in several object-oriented languages, for instance TCO for Python. Ruby also seems to have a compiler flag to activate TCO. Note that the Racket class system fully supports TCO out of the box, while being an impressively expressive and principled object-oriented language with first-class classes.
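As a rough idea of how such libraries tend to work, here is a sketch in Python (all names are ours, not the API of any actual library): the decorated function signals its tail call by raising an exception, which unwinds its own frame, and a small driver loop replays the call with the new arguments, so the stack never grows.

```python
class TailCall(Exception):
    """Signals 'call the function again with these arguments'."""
    def __init__(self, *call_args):
        self.call_args = call_args

def with_tco(f):
    # Driver loop: each "recursive call" unwinds f's frame via the
    # exception, then f is re-entered with the new arguments.
    def wrapper(*args):
        while True:
            try:
                return f(*args)
            except TailCall as tc:
                args = tc.call_args
    return wrapper

@with_tco
def fact_tc(n, acc=1):
    if n == 0:
        return acc
    raise TailCall(n - 1, acc * n)  # the "tail call"
```

With this trick, fact_tc runs for inputs far beyond Python's default recursion limit.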
Scala supports TRO as a local compiler optimization out of the box, and even provides a tailrec annotation so that programmers can specify that they intend their function to be tail recursive. If they make a mistake and write a recursive function that is not tail recursive, the compiler reports an error identifying the recursive call that is not in tail position. See tail recursion optimization in Scala vs Java for some details and illustrations.
This blog post has nice little drawings to illustrate what a trampoline is. But you can also appeal to your imagination: when jumping on a trampoline, you can virtually "go up" as long a distance as you want, without ever going beyond the atmosphere. It might just take you a lot of time and a lot of little jumps. Similarly, the trampoline pattern cuts down a (possibly infinite) chain of calls into a sequence of small steps, each step either returning a function that encapsulates the rest of the work to be done, or a marker to signal that the work is done.
Here is a very simple trampoline implementation, in Racket (of course, that’s useless in Racket since TCO is already supported ☺).
(deftype TailCall
  (call rest)
  (done val))

(define (result tc)
  (match tc
    [(done val) val]
    [(call rest) (result (rest))]))
The following shows how to modify mutually-recursive functions to use the trampoline framework:
(define (even n)
  (match n
    [0 (done #t)]
    [1 (done #f)]
    [else (call (λ () (odd (sub1 n))))]))

(define (odd n)
  (match n
    [0 (done #f)]
    [1 (done #t)]
    [else (call (λ () (even (sub1 n))))]))
Calling (even 1000102278) only produces a promise to do the job. Executing (result (even 1000102278)) triggers the recursive processing and eventually produces the final value, while consuming only constant stack space: result itself is tail recursive, and each bounce returns to the trampoline before the next thunk is forced.
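The same trampoline transcribed to Python (a sketch; the Racket deftype becomes two small classes, and the driver loop plays the role of result) shows how the pattern brings constant-stack mutual recursion to a language without TCO:

```python
from dataclasses import dataclass
from typing import Any, Callable

@dataclass
class Done:
    val: Any           # the final value

@dataclass
class Call:
    rest: Callable     # thunk encapsulating the rest of the work

def result(tc):
    # Driver loop: keep bouncing on the trampoline until Done.
    while isinstance(tc, Call):
        tc = tc.rest()
    return tc.val

def even(n):
    if n == 0: return Done(True)
    if n == 1: return Done(False)
    return Call(lambda: odd(n - 1))

def odd(n):
    if n == 0: return Done(False)
    if n == 1: return Done(True)
    return Call(lambda: even(n - 1))
```

As in the Racket version, even(100000) merely builds one Call object; result(even(100000)) performs the hundred thousand bounces one at a time, in constant stack space.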