Efficient implementation of closures?

A somewhat random question: I'm wondering if there's any kind of trick in LLVM that would allow me to implement closures efficiently.

We can assume that a closure function has a hidden parameter which points to its environment, that is, the values of the variables which were in scope at the point where the closure function was defined.

The problem comes when mixing closure and non-closure functions, because closure functions have a different calling protocol. Suppose I have a function pointer. This function pointer can either point to a regular function, which does not take the hidden parameter; or it can point to a closure function, which does. The problem is that the call site cannot support two incompatible calling conventions.
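To make the mismatch concrete, here is a minimal C++ sketch (all names hypothetical) of the two signatures that a call through a single function-pointer type would have to accommodate:

  struct Env { int captured; };            // environment captured at the closure's definition site

  int plain_fn(int x) { return x + 1; }    // regular function: no hidden parameter
  int closure_fn(Env *env, int x) {        // closure function: hidden environment parameter
    return x + env->captured;
  }

  // A single call site cannot use one pointer type for both:
  //   int (*p)(int)        matches plain_fn but not closure_fn
  //   int (*q)(Env*, int)  matches closure_fn but not plain_fn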

I can think of two solutions to this problem, but both entail a considerable loss of efficiency. The first is to have every function, closure or not, take the hidden parameter; for non-closure functions the value of the parameter is simply ignored. The downside is that every function now has to pay the cost of an extra parameter.
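A sketch of the first solution, again with hypothetical names: every function takes the environment pointer, and non-closure functions simply ignore it.

  struct Env { int captured; };

  int closure_add(Env *env, int x) { return x + env->captured; }
  int plain_inc(Env * /*ignored*/, int x) { return x + 1; }   // non-closure: env goes unused

  typedef int (*fn_t)(Env *, int);

  int call_any(fn_t f, Env *env, int x) {
    return f(env, x);   // one convention everywhere; non-closures just waste the slot
  }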

The second solution is that when calling via a pointer, we always use the closure protocol, i.e. we include the hidden parameter. However, when taking the address of a non-closure function, we actually point to a stub function which strips off the hidden parameter before calling the real function. This is better in that the extra cost is only paid when a non-closure function is called via a pointer, but it means we essentially pay the cost of marshalling the call arguments twice.
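And a sketch of the second solution, with hypothetical names: calls through pointers always pass the environment, and the address of a non-closure function really points at a forwarding stub.

  struct Env { int captured; };
  typedef int (*fn_t)(Env *, int);

  int plain_inc(int x) { return x + 1; }           // the real non-closure function

  int plain_inc_stub(Env * /*unused*/, int x) {    // stub: drop the env, re-marshal the rest
    return plain_inc(x);
  }

  // Taking "the address of plain_inc" from closure-aware code yields the stub:
  fn_t plain_inc_ptr = &plain_inc_stub;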

-- Talin

There is a third option. Let me explain using C++ terminology.

I assume that all escaping variables are in a heap-allocated
activation record. Represent a closure as a class which overloads the
parenthesis operator. When you construct that class, you give a
reference to the activation record as one of the parameters to the
constructor, and the class holds that reference as a field. The
closure can be rewritten to access escaping variables indirectly
through that reference.

Calling a closure is then no different than a virtual method call, and
there is no need to change the calling convention, or to mix-and-match
calling conventions.
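
A minimal C++ sketch of this scheme (all names hypothetical): escaping variables live in a heap-allocated activation record, the closure object holds a reference to that record, and every call goes through a virtual operator().

  #include <memory>
  #include <utility>

  struct ActivationRecord {
    int counter;          // an escaping local of the enclosing function, now on the heap
  };

  struct Callable {       // common interface: calling any closure is a virtual call
    virtual int operator()(int arg) = 0;
    virtual ~Callable() = default;
  };

  struct IncrementBy : Callable {
    std::shared_ptr<ActivationRecord> frame;   // reference to the activation record
    explicit IncrementBy(std::shared_ptr<ActivationRecord> f) : frame(std::move(f)) {}
    int operator()(int arg) override {
      return arg + frame->counter;             // escaping variable accessed indirectly
    }
  };

  int apply(Callable &f, int x) {
    return f(x);          // ordinary virtual dispatch; no special calling convention
  }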

Nick

I have been thinking about this myself recently, and there is another
specialization that can completely remove the performance hit of functional
abstraction in some important cases; in particular, it is made feasible
by JIT compilation.

It may be instructive to consider a concrete example such as summing the
elements of a float array by passing the add function to the higher-order
fold function:

  let sum xs = Array.fold_left ( + ) 0.0 xs

Previous generation languages like OCaml compile this into a completely
generic representation where polymorphism is handled at run-time.
Consequently, even though the polymorphism is buried in the fold function
(not visible at all in our "sum" function), it still incurs massive
performance degradation.

JIT compilation makes it easy to avoid the cost of polymorphism by generating
code for each combination of concrete types that a definition is used with.
That immediately removes the cost of polymorphism because we get a version of
fold specialized for float arrays.
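
As a rough static analogue in C++ (a sketch, not the OCaml compiler's actual output): a generic fold, plus the monomorphic float version that a JIT could generate once the element type is known.

  #include <vector>

  // A generic fold, and the monomorphic version a JIT would effectively
  // produce once it knows the element type is float.
  template <typename Acc, typename T, typename F>
  Acc fold_left(F f, Acc acc, const std::vector<T> &xs) {
    for (const T &x : xs) acc = f(acc, x);
    return acc;
  }

  // Specialized for float arrays: no boxing and no run-time handling of types.
  float sum(const std::vector<float> &xs) {
    return fold_left([](float a, float b) { return a + b; }, 0.0f, xs);
  }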

However, there is still the overhead of the call to "+", which is obviously
gratuitous in this case even if the function call is statically optimized
from a closure into a function pointer. Inlining the higher-order fold
function completely removes this cost as well and recovers the performance of
hand-written C code.
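
For comparison, a sketch of what remains once the specialized fold and the "+" callback have both been inlined: a plain loop over floats, essentially the hand-written version.

  #include <vector>

  // After inlining, only a plain accumulation loop remains.
  float sum_inlined(const std::vector<float> &xs) {
    float acc = 0.0f;
    for (float x : xs) acc += x;
    return acc;
  }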

JIT compilation makes a lot of interesting optimizations feasible...

I'm fairly certain that MLton does this during static compilation.

Yes, but it requires whole-program analysis, which obviates dynamic loading and
incremental compilation, and that, in turn, greatly limits the utility of the
compiler.