Deriving the Fix-Point Y-Combinator
Mark, July 27, 2023 [Programming]
Why would you want to know how to derive the Y-Combinator? Well, mostly for interest's own sake.
- History: It's a common misconception that electrical computers begat — or at least motivated — study into computation, complexity analysis, universal decidability, and so on. There's a storied history of logicians delving into these topics long before anyone imagined we would be teaching silicon how to think. Curry's presentation of the fixed-point combinator for λ-Calculus (sometime in the late 1930's, I can't find a concrete source) is one of many early cornerstone results that then went on to shape the world of computing.
- Curry's Paradox: This famous paradox shows where logic fails to model some basic properties of speech and mathematics. The Y-Combinator may be used to surface this paradox in computational systems. Its close relation to other well-known topics like the Liar's paradox, decidability, and Gödel's incompleteness theorems makes this a fun brain teaser to interact with, and it poses serious challenges to researchers involved in the work of trying to formalize or understand the systems being modelled. On top of the emergent complexity inherent to the millions of interacting parts of a large software system, writing bug-proof software is systemically difficult too.
- Recursion: Curry's Y-Combinator is an elegant formal definition of recursion. Describing the semantics of a function call is made more complex if your language allows recursion. While the process is tedious, function calls can generally be described as an equivalent flat sequence of steps (for example, where you assign some variables and copy in the body), so you can easily preserve the meaning of a program with functions within a language that doesn't have them. In the face of recursion this simple idea doesn't terminate, so you need some mechanism that lets you reify functions (see the sketch right after this list).
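To make that last point concrete, here's a rough sketch of the flattening idea. The function add_one is just an illustrative name for this sketch; nothing later depends on it.

# A call to a non-recursive function can be flattened away by hand:
add_one = lambda x: x + 1
add_one(41)   # the call...
(41) + 1      # ...and the same call with the body copied in; both give 42

# Trying the same substitution on a recursive definition never bottoms out:
# factorial(3)
# -> 3 * factorial(2)
# -> 3 * (2 * factorial(1))
# -> ... every substitution re-introduces another call to factorial.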
Prelude
This blog entry attempts to take a series of small exploratory steps, the net result of which uncovers something very much like the Y-Combinator (a variadic extension, perhaps? It's not that important what we call it). We'll jump right in with Python. We're going to start with a very simple function and a contrived restriction, then keep taking little steps to abstract a bit at a time.
What we're about to do is feed some of the ideas from λ-calc into a simple Python program. I don't believe writing Python as λ-calc is either a recipe for success or a very Pythonic thing to do. We're treating Python more as a means to actively play around with pseudo-code than making any attempt to be abidingly Pythonic. Python's dynamic nature and relatively unobtrusive syntax make it a good language to use for this. On the off chance you want to copy some of the code and play around with it until it makes sense, this should be a fairly straightforward task.
I also think Python's runtime semantics are intuitive to a lot of programmers. So let’s jump in!
An anonymous version of factorial
Our first step is going to be to try to puzzle out how to write an anonymous recursive function.
Without recursion, this task is generally quite simple. Here is a function named double. To apply some value to the function, we first reference it by name and apply the value. (Instead of a standard Python function def, I've assigned a lambda expression to a variable. It just makes the transformation into an anonymous function very simple.)

double = lambda x: x + x
double(5)
# 10
Of course, if I want to define the same function without naming it, I can do so. However, if I want to call it, I must call it in-line, like so:

(lambda x: x + x)(5)
# 10
When you do this, at first blush it looks like the re-usability of functions is lost to you. For instance, what if I want to know what happens when I compute double(5) + double(6)? It looks like I have to copy my anonymous function a second time.

(lambda x: x + x)(5) + (lambda x: x + x)(6)
# 22
Why even create a function if we can't re-use it? Fortunately, we can rescue this situation with a little bit of work. We can parameterize the function itself, which lets us call the parameter twice.

(lambda d: d(5) + d(6))(lambda x: x + x)
# 22

This is a way in which we can make sense of an expression like double(5) + double(6).
Why restrict ourselves to anonymous functions? Well, nearly every example of recursion you'll see performs the self-referencing step by name. By untangling this single restriction we can both arrive at a deeper understanding of recursion and derive the Y-Combinator. Rather neat, no?
There is a lot of content out there that starts by introducing λ-calc basics and goes from there. I chose this sort of sideways approach because, as far as I know, it is a relatively novel way of understanding this cool bit of computing history.
As a formal definition of recursion λ-calc's Fix-Point Combinator doesn’t much care about how other mathematical objects like numbers or propositions are represented. I think we can get a reasonable distance toward understanding a version of the Fix-Point Combinator via the semantics of Python without worrying about λ-terms, how variables are introduced, what makes a legal α-conversion or β-reduction, etc. I contend we don’t need any of that for what's described here.
What we do need, however, is the single restriction that stops us from using Python's built-in call-site resolution to perform recursion on our behalf. Once we've shown that it can be done, we'll start using names again to keep the code going forward a little cleaner and easier to understand.
Here is the function I want to write anonymously:
factorial = lambda n: 1 if n==0 else n * factorial(n-1)
factorial(5)
# 120
Assuming factorial is just like double, we could just try dropping the name and calling it in-line:

(lambda n: 1 if n==0 else n * factorial(n-1))(5)
# NameError: name 'factorial' is not defined
Of course, this doesn't work. Since factorial referred to itself, removing the name messes with the internals of the implementation. So this is the first puzzle: how can we define factorial using this simple recursive definition without relying on some external state where factorial is already defined?
If you’re like me, thinking this through should break your brain a little bit. Understanding the relationship between recursion and looping is a bit more intuitive. If you write a stack to push and pop some state, you can re-write anything recursive as a loop. If you don’t have any loops and you don’t have any built-in recursion, how can you do recursion by just passing functions around?
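As an aside, here's a rough sketch of that stack trick applied to factorial. The name factorial_loop and the details are just for illustration; nothing later depends on it.

# Factorial as a loop, with an explicit stack standing in for the call stack.
def factorial_loop(n):
    stack = []
    while n > 0:            # "recurse": push each pending multiplication
        stack.append(n)
        n -= 1
    result = 1
    while stack:            # "return": pop and apply the pending work
        result *= stack.pop()
    return result

factorial_loop(5)
# 120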
What we can do is rely on a common programming trick. If you're ever in a position where you've removed some global value (as is sometimes good coding practice), but you want to keep a function that depends on that value, you can pass that value into the function as a parameter instead.
prefix = ">> "

def select_user_repr(user):
    return f"{prefix}{user}"

# Becomes

def select_user_repr(prefix, user):
    return f"{prefix}{user}"
Of course, you'll still need to define prefix somewhere before calling select_user_repr, but how you architect your software to accomplish that doesn't matter to select_user_repr anymore (which makes it a bit more readily reusable).
So if factorial isn't in the global scope, maybe we can pass it in somehow instead.

(lambda f, n: 1 if n==0 else n * f(n-1))(????????, 5)
So now we have an anonymous function, but it takes a parameter that we're not sure how to fulfill. This next bit is admittedly a bit of a leap. Let's think through what goes where the ???????? is sitting above. Remember that this is a recursive call: we want to pass in the same function that we're calling! Can we do that?
If we could use names, we might try something like this:

r = lambda f, n: 1 if n==0 else n * f(n-1)
r(r, 5)
# TypeError: <lambda>() missing 1 required positional argument: 'n'
Oh! Oh yeah, if we pass in r as f here, then we'll be required to meet the API of r. Remember that we updated r to take two arguments. We know what they are and they're now both in scope, so we can just include the new first argument.
r = lambda f, n: 1 if n==0 else n * f(f, n-1)
#_____________________________________^______
r(r, 5)
# 120
That seems to work. If you can mess around with the code so far and figure out why this worked, you’ll have managed the most difficult part of figuring out the Fix-Point Combinator.
Of course, we've named r here, which wasn't supposed to be allowed.

However, we're now in a better place than we were before, because r doesn't depend on itself explicitly. This means the in-lining trick we used with double will work with r. We just need to copy the definition of r twice.
# 1. r = lambda f, n: 1 if n==0 else n * f(f, n-1)
# 2. r(r, 5)
# Substitute r from line 1 in place of both r's in line 2
(lambda f, n: 1 if n==0 else n * f(f, n-1))(
    lambda f, n: 1 if n==0 else n * f(f, n-1), 5)
# 120
It's a bit of an eyesore, but it works! We're going to keep going with r for now, but just remember we can expand r as shown above.
The parametrization above doesn’t require self-reference, just a function with the same behaviour. For example:
r1 = lambda f, n: 1 if n==0 else n * f(f, n-1)
r2 = lambda f, n: 1 if n==0 else n * f(f, n-1)
r1(r2, 5)
# 120
If you trace what's happening here, r1 is actually only called once. r2 is then called n-1 times until an answer is found. In a way, r1's primary purpose here is to bootstrap r2. The part of r1 that does the bootstrapping is this: f(f, n-1). You can see how it's calling f with itself, which we know ends up being r2 with itself. What if r1 doesn't do any of the rest and just acts to bootstrap r2? Something like r1 = lambda f, n: f(f, n) should work? Well, yes, but there's another way to understand this.
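To sanity-check that, we can reuse the names from above and let r1 do nothing but the bootstrapping:

r1 = lambda f, n: f(f, n)                          # only bootstraps
r2 = lambda f, n: 1 if n==0 else n * f(f, n-1)     # does the actual work
r1(r2, 5)
# 120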
Factorial, auto-doubling
When we look at where we've come so far, we have an anonymous function and it works but it's rather annoying to always need to pass in a second identical version of the function each time we want to call it. Instead, we'd like to have a function where we can just pass it a number and get the factorial of that number back as a result.
This is pretty easy, as it turns out. We'll parameterize the 5 in the snippet above into a new function and then call that with 5.

r = lambda f, n: 1 if n==0 else n * f(f, n-1)
# r(r, 5) <-- pull out the 5 as a param 'n' for a new function
(lambda n: r(r, n))(5)
# 120
So now we have an anonymous function we can call in-line with only a number, and it will evaluate to the factorial of that number. The function calls r with the correct parameters and we're off to the races again.
We've accomplished the first task we set out to do: we've created an anonymous version of factorial. We can call it a few times with different values and compare it against the standard non-anonymous version, just to convince ourselves this really works.
# Recursive starting function
factorial = lambda n: 1 if n==0 else n * factorial(n-1)

# Anonymous rewrite
anonymous_factorial = lambda n: (lambda f, m: 1 if m==0 else m * f(f, m-1))(
    lambda f, m: 1 if m==0 else m * f(f, m-1), n)

def compare(n):
    return factorial(n) == anonymous_factorial(n)

[compare(n) for n in range(5)]
# [True, True, True, True, True]
We're just mucking about, so seeing this output is good enough for now.
We could stop here; this really isn't too bad. Of note, though, is that right now we've hand-written a wrapper for r. If we write a new recursive function, then we'll need to write another wrapper as well.
For example:
# Greatest common divisor
gcd = lambda f, a, b: a if b==0 else f(f, b, a%b)

# Anonymous rewrite
gcd_wrapper = lambda a, b: gcd(gcd, a, b)
gcd_wrapper(42, 56)
# 14
This wrapper function isn't really doing anything special: all it does is create a new function that's lacking the first parameter of the wrapped function and then calls the wrapped function properly. The only things that change from wrapper to wrapper are how many parameters there are and which function is being wrapped. Instead of writing a wrapper for factorial and another wrapper for gcd, we can generate wrappers automatically for any similar recursive function expression.
First, let's describe what it means to be a "similar recursive function expression". There are two criteria:
- You must take the recursive call of the function as the first parameter.
- You must pass forward the first parameter as the first parameter again on any recursive calls.
Loosely, your function must look like this:
# Notice that f is the first arg and that f calls itself with f as the first arg
def your_function(f, n):
    # Any code you want, probably a base case
    # ...
    # Recursive call
    result = f(f, n-1)
    # Any code you want
    # ...
    return result
We're going to write a function mock (as a reference to the "To Mock a Mockingbird" puzzle book) that generates these wrappers automatically for us. This is a fairly common pattern in computer science. You might hear terms like Adapter, Decorator, or Proxy used in the Object Oriented software dev world. At its simplest, you take some functionality as input and return some functionality that wraps the input with some extra behaviour.
Consider no_op_wrapper below. It returns a new function that acts the same as (is extensionally equal to) the given function. There's a level of indirection but no change in behaviour.

def no_op_wrapper(func):
    def wrapper(*a, **kw):
        return func(*a, **kw)
    return wrapper

no_op_wrapper(double)(5)
# 10
mock is just like no_op_wrapper, only it adds some observable behaviour. Now the given function is applied to itself first before the call is made.

def mock(func):
    def wrapper(*a, **kw):
        return func(func, *a, **kw)
    return wrapper

You can write this a bit more concisely using Python's lambda expressions:

mock = lambda func: lambda *a, **kw: func(func, *a, **kw)
This lets us do the recursive wrapping trick for any recursive function expression as discussed above. Let's try it with factorial and gcd:

factorial = lambda f, n: 1 if n==0 else n * f(f, n-1)
mock(factorial)(5)
# 120

gcd = lambda f, a, b: a if b==0 else f(f, b, a%b)
mock(gcd)(42, 56)
# 14
We got the right answers, so mock seems to be working for us.
Pre-applied Recursion
In every one of our recursive function expressions, the second requirement is that we're always going to need to call f with itself:

# Factorial
lambda f, n: 1 if n==0 else n * f(f, n-1)
#_______________________________^^^______
# GCD
lambda f, a, b: a if b==0 else f(f, b, a%b)
#______________________________^^^_________
This is a small but repetitive step. It's possible to get this step wrong and create some pretty bizarre control flow.
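For example, here's one way to get it wrong (the name broken is just for this snippet): dropping the first argument from the recursive call shifts every argument over by one.

broken = lambda f, n: 1 if n==0 else n * f(n-1)   # forgot to pass f forward
mock(broken)(5)
# TypeError: <lambda>() missing 1 required positional argument: 'n'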
It's also a rule that should seem a bit suspicious. The first rule parametrizes f and is what lets us create a recursive call. The second rule is just an extra bit of bookkeeping. It just re-aligns the API to this new form of a function that takes itself as a first parameter. We should think about whether this is accidental or essential complexity. Consider that we've already created a function called mock that removes our need to call r with itself. There doesn't seem to be any reason (in principle) why we couldn't use mock to call f with itself too.
Really, as a point of pride, we want this step gone. The puzzle here is: how?
Consider the following: mock(f)(n-1) is extensionally equal to f(f, n-1).

r = lambda f, n: 1 if n==0 else n * mock(f)(n-1)
r(r, 5)
# 120
Given that this works, the recursive expression for factorial doesn't call f directly anymore. If we could give the expression mock(f) in place of f, then we could drop the requirement that f needs to call itself as the first argument.
We can use the same trick where we wrap the input and return a modified version. First, let's create a wrapper around our recursive expression r that doesn't do anything yet:
r = lambda f, n: 1 if n==0 else n * mock(f)(n-1)
wrapped_r = lambda f, *a, **kw: r(f, *a, **kw)

wrapped_r(r, 5) \
    == \
r(r, 5) \
    == \
120
# True
Next, we add a modification to our wrapper. Of course, a wrapper that does nothing isn't useful, but we can modify this slightly by altering the wrapper's f before passing it to our recursive inner function, which will mean that factorial no longer needs to perform that step for itself. Instead of calling mock(f) as part of the implementation of r, we call mock(f) before calling r.
wrapped_r = lambda f, *a, **kw: r(mock(f), *a, **kw)

# Update r so it isn't passing f forward anymore
r = lambda f, n: 1 if n==0 else n * f(n-1)

mock(wrapped_r)(5)
# 120
In doing this, we've crested the hill.
The expression we're wrapping no longer requires f to pass itself forward on each subsequent call. We're now in a similar position as before we wrote mock. We need to create this custom wrapper around r where we call mock(f). The solution is the same as well. We'll create a new function called fix that wraps our new type of recursive function expression.
Here are the updated rules for our recursive function expressions:
- You must take the recursive call of the function as the first parameter.
- ~~You must pass forward the first parameter as the first parameter again on any recursive calls.~~
That feels nice.
Here's what fix looks like.

mock = lambda func: lambda *a, **kw: func(func, *a, **kw)
fix = lambda func: mock(lambda f, *a, **kw: func(mock(f), *a, **kw))
Let's try it out!

factorial = lambda f, n: 1 if n==0 else n * f(n-1)
fix(factorial)(5)
# 120

gcd = lambda f, a, b: a if b==0 else f(b, a%b)
fix(gcd)(42, 56)
# 14
This gets us to a really nice place. Using fix, we can create any recursive function anonymously if we just include f as the first parameter and call it exactly the same way we'd usually make our recursive calls.
factorial = lambda n: 1 if n==0 else n * factorial(n-1)
# becomes
fix(lambda f, n: 1 if n==0 else n * f(n-1))
#--------------------------------------------------------------
gcd = lambda a, b: a if b==0 else gcd(b, a%b)
# becomes
fix(lambda f, a, b: a if b==0 else f(b, a%b))
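And since fix doesn't care which recursive expression it's handed, the same move works for functions we haven't written yet in this article; fib below is just an extra example for illustration.

# Fibonacci, written anonymously with fix
fib = fix(lambda f, n: n if n < 2 else f(n-1) + f(n-2))
[fib(n) for n in range(8)]
# [0, 1, 1, 2, 3, 5, 8, 13]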
fix is the Y-Combinator
fix is actually the Y-Combinator with some *a and **kw thrown in to allow us to define our recursive functions in a more Python-familiar, variadic style.
So, if the steps we took along the way didn’t throw you for a loop, you can now derive the Y-Combinator for yourself! Exciting!
Appendix
Let's take our definitions, drop the *a and **kw stuff, and see what it looks like. In λ-calc, doing this is very loosely akin to something called an η-reduction. In Python, this reduction isn't as straightforward, so while the code below is Python-like, don't take it too seriously. We play a bit fast and loose with the syntax here so that if you see this stuff in the wild, you may recognize it.
This really is the least important part of the article :)
fix = lambda func: mock(lambda f, *a, **kw: func(mock(f), *a, **kw))

# η-reduction (Drop extra arguments that are just passed through)
fix = lambda func: mock(lambda f: func(mock(f)))

# α-conversion (Rename variables: mock becomes M, fix becomes Y)
M = lambda func: lambda *a, **kw: func(func, *a, **kw)
Y = lambda func: M(lambda f: func(M(f)))

# η-reduction (Drop extra arguments)
M = lambda f: f(f)

# expand the first and second M
Y = lambda func: (lambda f: f(f))(lambda f: func((lambda f: f(f))(f)))

# β reduction (Apply the second M to f)
Y = lambda func: (lambda f: f(f))(lambda f: func(f(f)))

# β reduction (Apply expression on the right to the first M)
Y = lambda func: (lambda f: func(f(f)))(lambda f: func(f(f)))

# α-conversion (Rename variables)
# This is Y as given by Curry
Y = lambda f: (lambda x: f(x(x)))(lambda x: f(x(x)))

# λ-Calc notation
Y = λf.(λx.f(x x))(λx.f(x x))

# Or the version we need in eager evaluation languages like Python to
# force lazy execution. So this is a bit closer to the fix function we
# derived in this article (Think of z like "*a, **kw")
Z = λf.(λx.f(λz.x x z))(λx.f(λz.x x z))
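If you want to see that eager-friendly version actually run, it translates back into real Python almost verbatim. Note that the recursive expression here is written in curried form, taking rec and then n, rather than the (f, n) style we used above; the names Z and fact are just for this quick check.

# The Z combinator written directly in Python
Z = lambda f: (lambda x: f(lambda z: x(x)(z)))(lambda x: f(lambda z: x(x)(z)))
fact = Z(lambda rec: lambda n: 1 if n == 0 else n * rec(n - 1))
fact(5)
# 120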