Thursday, August 05, 2004

Last-Call-Optimisation

First lets consider the old hoary chestnut, the factorial.

; Scheme
(define (fact n)
  (if (zero? n)
      1
      (* n (fact (- n 1)))))

# Python
define fact(n):
  if not n:
    return 1
  else:
    return n * fact(n - 1)
Note: I am going to have to use scheme later, so for the next few examples I'll be using scheme and python together to give non-schemers a basis for comparason.

Now this is a very straight forward recursive definition of the function. The only trouble is that we are going to grow the stack proportional to n. So we end up with the following (abbreviated) call-tree:

fact 4
  fact 3
    fact 2
      fact 1
        fact 0
        * 1 1
      * 2 1
    * 3 2
  * 6 4
-> 24
Note that for (fact 4) we end up with five stack frames on our call stack, and unfortunately there is no alternative as the code is written. We can however eliminate this efficiency by converting to an iterative version:
; Scheme
(define (fact-iter n accum)
  (if (zero? n)
      accum
      (fact-iter (- n 1) (* n accum))))

(define (fact n)
  (fact-iter n 1)

# Python
def factiter(n, accum):
  if n == 0:
    return accum
  else:
    return factiter(n - 1, n * accum)

def fact(n):
  return factiter(n, 1)
Now the call stack looks like:
fact 4
  fact-iter 4 1
    fact-iter 3 4
      fact-iter 2 12
        fact-iter 1 24
          fact-iter 0 24
-> 24
Which dosn't look very useful until you consider that as each call to fact-iter returns, it's caller simply passes the resulting value up the call-chain. In otherwords, the caller's stack-frame is never used after it calls the next fact-iter in the chain; and what is never used dosn't need to be kept. This is called last-call-optimisation, and is a very common optimisation used by many languages to make the above code as efficient as the while loop solution traditionally taught in imperative languages:
/* C */
int fact(n) {
  int accum = 1;
  while (n) {
    accum = accum * n;
    n--;
  }
  return accum;
}
In fact the traditional code generated by a LCO supporting compiler looks alot like this:
/* MIML ;) */
int fact(n) {
  int accum = 1;
factiter:
  if (n == 0)
    return accum;
  else
    accum = n * accum;
    n = n - 1;
    goto factiter;
}
Which is pretty much exactly what the while loop would compile to. This still leaves the question: "Why?", what's wrong with the while loop?

The answer to this is that factorial is defined recursively:

       / 0 -> 1
  n! = |
       \ n -> n * (n - 1)
and it is trivial to compare this definition to the first recursive example and check it is correct. The tail-recursive expression in the second example isn't as easy to confirm, but it is at least in a form amenable to a trivial inductive proof by inspection. Because the while loop dosn't abstract away the inductive step but incorperates it into the loop pre/post-conditions and invariants, it is less amenible to proof. In fact the three examples can be seen as reducing levels of abstraction: moving away from the problem domain (a maths definition) towards the raw metal; and almost a tautology, the closer we can stay to the problem domain the better.

Now this is great for linear algorithms, or algorithms that process linearly recursive datastructures (fancy name for linked-lists ;). However consider a pre-order traversal of a binary tree? This can be made tail-recusive, but it is a fair bit more involved, and the subject of my next post.

2 comments:

Quoll said...

To be pedantic, I wouldn't use the bare assertion that "factorial is defined recursively", as it often isn't.

More often it is defined iteratively as:
n! = \Pi_{i=1}^n i

Sorry, my LaTeX is rusty...

Anonymous said...

I think your LCO is more commonly called tail-call optimization. This is different from the optimization you've described for factorial, which is an accumulator transformation. Together, they turn a recursive fact function into an efficient loop. Scheme does tail-calls, but not accumulator transformations. Python does neither, but most people don't write recursive code so that's OK.