Closures, newbie question

So I read the Kaleidoscope tutorial, big thanks to Chris Latter. Good pace, still excellent coverage.

Just at the end it mentions closures and I was wondering how those are done in llvm.
The link was to wikipedia, and i do know what closures/blocks/continuations are, (i think) but maybe someone could point me to where to read about how to do them in llvm.

Thanks
Torsten

[...]

Just at the end it mentions closures and I was wondering how those are done in llvm.
The link was to wikipedia, and i do know what closures/blocks/continuations are, (i think) but maybe someone could point me to where to read about how to do them in llvm.

I've implemented closures in the past --- it's fiddly and a surprising
amount of work, but not actually hard. Basically:

- do static analysis of your code to determine what functions import
what upvalues from where and what functions export upvalues to where.

- any function exporting upvalues needs to allocate its stack frame (or
at least, enough of its stack frame to contain the upvalues) from the heap.

- any function which imports upvalues must take an extra parameter
pointing to the imported stack frame.

- any call to a function which imports upvalues must pass in the stack
frame in the call.

And of course it all has to be recursive, because you may need to pass
said stack frame through another function.

e.g.:

function level1() /* exports i, imports nothing */
{
  var i = 0;
  function level2() /* exports i, imports i */
  {
    function level3() /* exports i & j, imports i */
    {
      var j = 0;
      function level4() /* exports nothing, imports i & j */
      {
        return i + j;
      }

      level4();
    }

    level3();
  }

  level2();
}

There are numerous ways to pass the stack frames into the inner
functions. The naive approach is to simply have each stack frame contain
a pointer to the enclosing frame, but that means that level4 has to
evaluate f.level3.level2.level1.i. Optimising this is easy, and that's
what I've done in the past, but I'm actually wondering whether it may
not be easiest just to pass each frame in as a separate parameter and
then let LLVM take care of the details:

function level1()
{
  var level1frame = { i = 0; };
  level2(level1frame)
}

function level2(level1frame)
{
  var level2frame = { };
  level3(level1frame, level2frame)
}

function level3(level1frame, level2frame)
{
  var level3frame = { j = 0 };
  level4(level1frame, level2frame, level3frame)
}

function level4(level1frame, level2frame, level3frame)
{
  return level1frame.i + level3frame.j;
}

My current project is a very small pure functional language. It doesn't
support closures, but it does have upvalues. I'm implementing them in
this by just passing the variables in as parameters directly --- as it's
pure, I don't have to worry about nested functions changing an upvalue.
(It's at http://cowlark.com/calculon/dir?ci=tip if you're interested,
but be warned, upvalues are currently broken.)

[...]

There are numerous ways to pass the stack frames into the inner
functions. The naive approach is to simply have each stack frame contain
a pointer to the enclosing frame, but that means that level4 has to
evaluate f.level3.level2.level1.i. Optimising this is easy, and that's
what I've done in the past, but I'm actually wondering whether it may
not be easiest just to pass each frame in as a separate parameter and
then let LLVM take care of the details:

I am reminded that in an implementation such as this, closures are *not*
first-class values --- you can't pass them around and they can only be
called statically. D'oh.

In order to produce first-class closures you need to bake all the
information described above statically into an object, and then pass
that object around.

My original implementation of this did this by having a special
pointer-to-closure representation which was two pointers; one to the
code, and one to the stack frame needed to call the code. An equally
valid alternative is to allocate a heap object with the same information
in it. In this case you're basically creating a object with a virtual
method; calling the method calls the closure.

There is another aspect to be wary of: depending on your implementation,
it's possible for two different closures of compatible types have
different calling conventions. Therefore anyone who has a simple
pointer-to-closure object doesn't know how to call the object. e.g.:

function level1()
{
  var i;
  function level2()
  {
    return i;
  }
}

From the point of view of the language, level1 and level2 have the same
type signature --- but as level2 is taking a hidden pointer to level1's
stack frame, and level1 doesn't, it's possible that the implementations
won't.

The easiest way around this is to arrange your implementation so that
the two functions above *do* have the same calling convention which can
be determined from the type signature. This requires a fixed number of
hidden parameters (usually one). The
multiple-stack-frames-passed-as-parameters implementation I described
above isn't suitable here, so I can't recommend it.

(My compiler could tell statically whether you were calling the closure
directly or via a pointer, and so would generate different code. This
made static calls much more efficient but it made calls via a pointer
less efficient, and I'm not sure whether it was worth it or not. It was
certainly much more complex.)