"Blocks" in Clang (aka closures)

Hi All,

Steve has started working on an implementation of a language feature named 'Blocks'. The back story on this was that it was prototyped in a private Clang fork (because it is much easier to experiment with clang than with GCC), then implemented in GCC (where it evolved a lot), and now we're re-implementing it in Clang. The language feature is already supported by mainline llvm-gcc, but we don't have up-to-date documentation for it. When that documentation is updated, it will definitely be checked into the main clang repo (in clang/docs). Note that llvm-gcc supports a bunch of deprecated syntax from the evolution of Blocks, but we don't plan to support that old stuff in Clang.

Until there is more real documentation, here is the basic idea of Blocks: they are closures for C. They let you pass around units of computation that can be executed later. For example:

void call_a_block(void (^blockptr)(int)) {
   blockptr(4);
}

void test() {
   int X = ...
   call_a_block(^(int y){ print(X+y); }); // references stack var snapshot

   call_a_block(^(int y){ print(y*y); });
}

In this example, when the first block is formed, it snapshots the value of X into the block and builds a small structure on the stack. Passing the block pointer down to call_a_block passes a pointer to this stack object. Invoking a block (with function call syntax) loads the relevant info out of the struct and calls it. call_a_block can obviously be passed different blocks as long as they have the same type.
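As a rough illustration of that paragraph, here is a minimal sketch in plain C of what the first block could be lowered to. Everything in it is an assumption made for illustration: the struct name add_block, its two fields, and the helper test_snapshot are hypothetical, the real layout generated by the compiler carries more fields than this, and the block returns X+y instead of printing it so the result can be checked.

```c
#include <assert.h>

/* Hypothetical lowering of a block that captures one int.
   This is NOT the real compiler-generated layout. */
struct add_block {
    int (*invoke)(const struct add_block *self, int y); /* compiled block body */
    int X;                                              /* snapshot of X */
};

/* The body of ^(int y){ ... X+y ... }: it reads the snapshot stored in
   the struct, not the live variable in the enclosing function. */
int add_block_body(const struct add_block *self, int y) {
    return self->X + y;
}

/* Counterpart of call_a_block: invoking the block loads the function
   pointer out of the struct and calls it, passing the struct along. */
int call_a_block(const struct add_block *blockptr) {
    return blockptr->invoke(blockptr, 4);
}

/* Counterpart of test(): the "block" is a small struct on the stack,
   and a pointer to that stack object is what gets passed down. */
int test_snapshot(int X) {
    struct add_block b = { add_block_body, X }; /* X is copied here */
    X = 100;                                    /* later change is not seen by b */
    return call_a_block(&b);
}
```

With this sketch, test_snapshot(3) yields 7: the struct holds the value X had when the block was formed, so the later assignment to X has no effect on the result.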

From a technical perspective, blocks fit into C in a few places: 1) a new declaration type (the caret), which works very much like a magic kind of pointer that can only point to function types; 2) block literals, which capture the computation; 3) a new storage class, __block; and 4) a really tiny runtime library.

The new storage class comes into play when you want to get mutable access to variables on the stack. Basically you can mark an otherwise-auto variable with __block (which is currently a macro that expands to an attribute), for example:

void test() {
   int X = ...
   __block int Y = ...
   ^{ X = 4; }; // error, can't modify a const snapshot.
   ^{ Y = 4; }; // ok!
}

From an implementation standpoint, the block captures (roughly) the address of a __block object instead of its value.
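The difference between the two kinds of capture can be sketched in plain C. This is only an approximation under stated assumptions: the struct assign_block and the function names are hypothetical, and a plain pointer stands in for whatever indirection the real implementation uses for __block variables.

```c
#include <assert.h>

/* Hypothetical lowering: X is captured by value as a const snapshot,
   Y (the __block variable) is captured by address. */
struct assign_block {
    void (*invoke)(const struct assign_block *self);
    const int X; /* snapshot; assigning to it in the body would not compile */
    int *Y;      /* stands in for the captured __block variable */
};

/* Corresponds to ^{ Y = 4; }: the write goes through the pointer and
   is therefore visible in the enclosing function. */
void assign_block_body(const struct assign_block *self) {
    *self->Y = 4;
}

int test_block_storage(void) {
    int X = 1;
    int Y = 2; /* plays the role of: __block int Y = 2; */
    struct assign_block b = { assign_block_body, X, &Y };
    b.invoke(&b);
    return Y;  /* the block's assignment is observed here */
}
```

Here test_block_storage() returns 4, because the block body wrote through the captured address rather than into a private copy.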

This is tricky, though, because blocks are on the stack, and you may want to refer to some computation (and its __block captured variables) after the function returns. To do this, we have a simple form of reference counting to manage the lifetimes of these. For example, in this case:

void (^P)(int); // global var

void gets_a_block(void (^blockptr)(int)) {
   P = blockptr;
}

void called_sometime_later() {
   P(4);
}

if gets_a_block is called with a block on the stack, and called_sometime_later is called after that stack frame is popped, badness happens (yay for C!). Instead, we use:

void (^P)(int); // global var

void gets_a_block(void (^blockptr)(int)) {
   P = _Block_copy(blockptr); // copies to heap if on the stack with refcount +1, otherwise increments refcount.
}

void called_sometime_later() {
   P(4);
   _Block_release(P); // decrements refcount.
   P = 0;
}

The semantics of this is that it copies the block off the stack *as well as any __block variables it references*, and the shared __block variables are themselves freed when all referencing blocks go away. The really tiny runtime library implements things like _Block_copy and friends.
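The copy-then-release pattern above can be sketched in plain C. This is a simplified model, not the runtime: sketch_block_copy and sketch_block_release are hypothetical stand-ins for _Block_copy and _Block_release, the struct rc_block and its "refcount 0 means on the stack" convention are assumptions, and the handling of shared __block variables is left out entirely.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical block object with an explicit reference count.
   In this sketch, refcount == 0 marks a block still on the stack. */
struct rc_block {
    int refcount;
    int (*invoke)(const struct rc_block *self, int y);
    int X; /* captured snapshot */
};

int rc_body(const struct rc_block *self, int y) { return self->X + y; }

/* Stand-in for _Block_copy: a stack block is copied to the heap with
   refcount 1; a heap block just has its refcount incremented. */
struct rc_block *sketch_block_copy(struct rc_block *b) {
    if (b->refcount == 0) {
        struct rc_block *heap = malloc(sizeof *heap);
        if (heap == NULL) return NULL;
        *heap = *b;
        heap->refcount = 1;
        return heap;
    }
    b->refcount++;
    return b;
}

/* Stand-in for _Block_release: frees the heap copy at refcount 0. */
void sketch_block_release(struct rc_block *b) {
    if (b != NULL && b->refcount > 0 && --b->refcount == 0) free(b);
}

struct rc_block *P; /* global var, as in the example above */

void gets_a_block(struct rc_block *blockptr) {
    P = sketch_block_copy(blockptr);
}

int called_sometime_later(void) {
    if (P == NULL) return -1; /* copy failed or nothing registered */
    int r = P->invoke(P, 4);
    sketch_block_release(P);
    P = NULL;
    return r;
}

/* Forms a stack block and registers it; the frame is gone on return. */
void make_and_register(int X) {
    struct rc_block b = { 0, rc_body, X };
    gets_a_block(&b);
}

int run_copy_example(void) {
    make_and_register(3);
    return called_sometime_later(); /* uses the heap copy, not the dead frame */
}
```

In this model run_copy_example() returns 7 even though the frame that formed the block has been popped, because P refers to the heap copy.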

Other interesting things are that the blocks themselves do limited/optional type inference of the result type:

   foo(^int(){ return 4; }); // takes nothing, returns int.
   foo(^(){ return 4; }); // same thing, inferred to return int.

If you're interested in some more low-level details, it looks like gcc/testsuite/gcc.apple/block-blocks-test-8.c in the llvm-gcc testsuite has some of the underlying layout info, though I have no idea if it is up-to-date.

To head off the obvious question: this syntax and implementation have nothing to do with C++ lambdas. Blocks are designed to work well with C and Objective-C, and unfortunately C++ lambdas really require a language with templates to be very useful. The syntaxes of blocks and C++ lambdas are completely different, so we expect to eventually support both in the same compiler.

In any case, more detailed documentation will be forthcoming, but I would be happy to answer specific questions (before Friday, at which point I disappear for two weeks on vacation, woo!)

-Chris

Hi Chris,

A couple of questions about this:

1) In Seaside (Smalltalk web-app framework) blocks are used as a way of implementing continuation passing. This requires support for re-binding variables in the closure. Smalltalk does this via the BlockContext object. Is there an equivalent of it here?

2) I know of two existing implementations of blocks in Objective-C, one by Brad Cox in 1991 and one by David Stes in 1998. Both used similar syntax (David Stes simply made untyped arguments default to id). What was the rationale for designing a new syntax for block literals? Was this to allow Objective-C syntax blocks which are objects encapsulating C blocks?

3) Why is the __block storage class required as an explicit type tag? Can this not be inferred by the compiler (i.e. any variable you assign to in the block is promoted, others are copied)?

3a) It isn't clear from your example what happens if I create a block referencing a (non-__block) variable, assign it to a global variable (or an instance variable in another object somewhere), modify the referenced variable on the stack, and then invoke the block function. Does the block see the old or the new value? If it's the old value, then I think this answers my previous question but sounds like it will confuse programmers. It seems that it would be less confusing if every block-reference variable implicitly received the __block storage class, and an explicit tag was made available for variables which should be bound-by-copy.

Exciting news, anyway. I look forward to being able to replace my Objective-C BlockClosure class used in Smalltalk with something better supported. Being able to pass blocks to Objective-C code from Smalltalk has reduced code complexity in a lot of cases for us, and being able to do the same thing (and the converse) from Objective-C is likely to make a lot of people very happy.

David

1) In Seaside (Smalltalk web-app framework) blocks are used as a way
of implementing continuation passing. This requires support for
re-binding variables in the closure. Smalltalk does this via the
BlockContext object. Is there an equivalent of it here?

I'm not familiar with Seaside, so generically...

All local variables are const-copied into the block (if used, of course -- if not, nothing happens) unless marked with __block.

int x = 3;
__block int y = 4;

^{
    // effectively 'const int x;' here
    x = 4; // error
    y = 5; // works fine
};

Global variables work just like they do everywhere else.

... more below ...

2) I know of two existing implementations of blocks in Objective-C,
one by Brad Cox in 1991 and one by David Stes in 1998. Both used
similar syntax (David Stes simply made untyped arguments default to
id). What was the rationale for designing a new syntax for block
literals? Was this to allow Objective-C syntax blocks which are
objects encapsulating C blocks?

The goal is to realize a syntax and feature that is:

- compatible with C, C++, and Objective-C
- a relatively familiar syntax
- fast (hence, Blocks start out as stack based)
- complete

3) Why is the __block storage class required as an explicit type
tag? Can this not be inferred by the compiler (i.e. any variable you
assign to in the block is promoted, others are copied)?

We initially pursued such a syntax and then abandoned it.

Specifically, a variable used by-reference is different; it is accessed via a level of indirection and applying & to grab the address can produce surprising results. Automatic promotion would be surprising and a move away from the precision of C.

Like auto, register, and static, __block is a different class of storage.

3a) It isn't clear from your example what happens if I create a block
referencing a (non-__block) variable, assign it to a global variable
(or an instance variable in another object somewhere), modify the
referenced variable on the stack, and then invoke the block
function. Does the block see the old or the new value? If it's the
old value, then I think this answers my previous question but sounds
like it will confuse programmers. It seems that it would be less
confusing if every block-reference variable implicitly received the
__block storage class, and an explicit tag was made available for
variables which should be bound-by-copy.

The values of non-__block variables are bound at the time execution passes over the block declaration.

That is:

int x = 5;

int (^bX)() = ^{ return x; };

x = 6;

bX(); // returns 5

The argument could be made either way about default behavior. In the end, we decided on a default behavior that impacts variable use and behavior *outside* of the block as little as possible.

b.bum

David,

I’m unfamiliar with the Brad Cox implementation of Blocks (though I know he’s been a strong advocate for the feature). A few years ago, Brad and I worked on an ObjC “history” article for http://en.wikipedia.org/wiki/HOPL (and blocks weren’t mentioned). Can you point me to any references? (I’m confused).

Thanks,

snaroff

Hi Steve,

I could be wrong, and he only proposed the feature rather than implementing it. His 1991 Taskmaster position paper at ECOOP described them. His blocks were only downward funargs (intended for exception handling), but the POC implementation using the same syntax, I believe, allows them to be full closures. The Taskmaster paper is well worth reading.

David

Brad wrote about blocks in the ECOOP paper - they were never implemented in the Stepstone translator or GCC.

I remember the Taskmaster paper (and agree it's well worth reading).

snaroff

Thanks Bill,

1) In Seaside (Smalltalk web-app framework) blocks are used as a way
of implementing continuation passing. This requires support for
re-binding variables in the closure. Smalltalk does this via the
BlockContext object. Is there an equivalent of it here?

I'm not familiar with Seaside, so generically...

All local variables are const-copied into the block (if used, of course -- if not, nothing happens) unless marked with __block.

int x = 3;
__block int y = 4;

^{
   // effectively 'const int x;' here
   x = 4; // error
   y = 5; // works fine
};

Global variables work just like they do everywhere else.

I don't think I explained this one properly. Say I have the block you defined here, and I store it somewhere. Later, I want to reuse the block, but have Y point to a different __block-scope variable. This pattern is very important for a lot of the stuff that Seaside uses to turn a sequence of HTTP requests into something that looks like a normal GUI application to the programmer. The general question here is: are the blocks reflective, like Smalltalk blocks, or are they opaque, like GCC nested functions?

David

Opaque.

Chris Lattner wrote:

Hi All,

Steve has started working on an implementation of a language feature named 'Blocks'. The back story on this was that it was prototyped in a private Clang fork (because it is much easier to experiment with clang than with GCC), then implemented in GCC (where it evolved a lot), and now we're re-implementing it in Clang. The language feature is already supported by mainline llvm-gcc, but we don't have up-to-date documentation for it. When that documentation is updated, it will definitely be checked into the main clang repo (in clang/docs).

I noticed some commits for "blocks" support in Clang but the documentation hasn't been committed yet. Are blocks supported in Clang?

Thanks,
JP

If I recall correctly, the parsing and semantic analysis are basically
complete, and there's the -rewrite-blocks option to clang to transform
C code with blocks into equivalent code without them, but there isn't
any native CodeGen support yet.

-Eli

Are the examples supposed to run using llvm-gcc on OSX? I'm getting
undefined symbols e.g. __NSConcreteStackBlock using the llvm-gcc
testsuite. I'm assuming that's part of the "small runtime"
and it's not out yet.

Your assumption is correct.

snaroff