Objective-C Code Generation

Hi,

I'm interested in working on Objective-C code generation for the GNU and Étoilé Objective-C runtimes (I've worked on the former and wrote the latter). I imagine the easiest way of getting this working is to transform the Objective-C AST into a pure-C AST with calls to the relevant runtime libraries.

Presumably someone at Apple will want to add support for their runtime libraries as well at some point, so having a clean interface to allow easily switching between the three is going to be important (rather than the 10,000 lines of runtime-specific, unreadable, code in GCC). Are there any existing hooks for inserting this abstraction layer? If so, can someone point me in the right direction, and if not can someone suggest a good place to put them?

David

I'm interested in working on Objective-C code generation for the GNU
and Étoilé Objective-C runtimes (I've worked on the former and wrote
the latter).

Great!

I imagine the easiest way of getting this working is to
transform the Objective-C AST into a pure-C AST with calls to the
relevant runtime libraries.

This is the way that GCC works, but I don't think it is really the best way. I'd much rather have the clang codegen module directly produce the LLVM IR for the constructs it needs. I don't anticipate that this will be a problem for metadata or other objc constructs.

This does require an understanding of the LLVM IR, but it is well documented, and there are lots of examples. If you get stuck, please ask on this list; we'd be happy to help.

Presumably someone at Apple will want to add support for their runtime
libraries as well at some point, so having a clean interface to allow
easily switching between the three is going to be important (rather
than the 10,000 lines of runtime-specific, unreadable, code in GCC).

Yes, absolutely :-)

Are there any existing hooks for inserting this abstraction layer? If
so, can someone point me in the right direction, and if not can
someone suggest a good place to put them?

At this point, I'd suggest starting with the simple constructs (e.g. the standalone ObjC expressions like @"foo", adding pointers to interfaces, etc.) and then moving on to the metadata. If you do each patch cleanly and incrementally, I don't expect a big problem. When we start on codegen support for the next runtime, we can generalize the code and figure out what abstractions are best on a case-by-case basis.

-Chris

Hi Chris,

I imagine the easiest way of getting this working is to
transform the Objective-C AST into a pure-C AST with calls to the
relevant runtime libraries.

This is the way that GCC works, but I don't think it is really the best way. I'd much rather have the clang codegen module directly produce the LLVM IR for the constructs it needs. I don't anticipate that this will be a problem for metadata or other objc constructs.

I am not completely sure what would be gained by doing this. Unlike C++, every Objective-C construct maps cleanly to a C construct. Objective-C objects are C structures, Objective-C message sends are C function calls (and methods are just C functions with their pointers added to the class structure), and so on. If work has already been done to transform these C constructs into LLVM IR then it seems like there would be some significant code duplication to do the same for Objective-C.

I don't mind generating LLVM IR (I've already written code which generates code targeting the GNU runtime from GNU Lightning), but I'm not sure what the benefit would be.
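To make the claimed mapping concrete, here is a minimal C sketch of how a message send such as [obj value] could be lowered to a method lookup followed by an ordinary function call. Everything prefixed toy_ is a hypothetical stand-in, not the API of the GNU, Étoilé, or Apple runtimes, and selectors are plain C strings here purely for brevity.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

typedef struct toy_object *id;
typedef const char *SEL;                /* toy selector: just a C string */
typedef int (*IMP)(id self, SEL _cmd);  /* toy method type: always returns int */

struct toy_method { SEL name; IMP imp; };

struct toy_class {
    const char *name;
    const struct toy_method *methods;
    size_t method_count;
};

/* An object is a plain structure whose first field points at its class. */
struct toy_object {
    const struct toy_class *isa;
    int value;
};

/* Hypothetical lookup: linear search of the receiver's method table. */
static IMP toy_msg_lookup(id receiver, SEL sel) {
    const struct toy_class *cls = receiver->isa;
    for (size_t i = 0; i < cls->method_count; i++)
        if (strcmp(cls->methods[i].name, sel) == 0)
            return cls->methods[i].imp;
    return NULL;
}

/* Body of a hypothetical -[Counter value] method, written as a C function. */
static int Counter_value(id self, SEL _cmd) {
    (void)_cmd;
    return self->value;
}

static const struct toy_method counter_methods[] = { { "value", Counter_value } };
static const struct toy_class Counter = { "Counter", counter_methods, 1 };

/* What `[obj value]` might lower to: look up the function, then call it. */
static int send_value(id obj) {
    IMP imp = toy_msg_lookup(obj, "value");
    return imp ? imp(obj, "value") : 0;
}
```

Calling send_value on a struct toy_object whose isa points at Counter returns that object's value field. A real runtime would use interned selectors and a faster dispatch table, but the shape of the generated call is the same.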

Are there any existing hooks for inserting this abstraction layer? If
so, can someone point me in the right direction, and if not can
someone suggest a good place to put them?

At this point, I'd suggest starting with the simple constructs (e.g. the standalone ObjC expressions like @"foo", adding pointers to interfaces, etc.) and then moving on to the metadata. If you do each patch cleanly and incrementally, I don't expect a big problem. When we start on codegen support for the next runtime, we can generalize the code and figure out what abstractions are best on a case-by-case basis.

@"foo" is actually decidedly non-trivial from a code-generation perspective. It needs to be expanded to an object, which is a structure of the form:

{
  Class isa;
  char * c_string;
  unsigned int length;
}

Here the class pointed to by isa is determined at compile time and resolved at runtime (both GNUstep and Cocoa define this as their own constant string representation, an NSString subclass). It's hard to get something like this right until you have classes being generated correctly, which is probably where I'll start.
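As a rough illustration of what the compiler has to emit for @"foo", the structure above can be written out as a statically initialized C object. The names toy_class, toy_constant_string, and toy_constant_string_class are assumptions made for this sketch; the real class is whichever constant string class the framework designates, and its reference is fixed up by the runtime.

```c
#include <assert.h>
#include <string.h>

/* Stand-in for the runtime's class structure; real runtimes define far more. */
struct toy_class { const char *name; };
typedef struct toy_class *Class;

/* Layout from the message above: class pointer, C string, length. */
struct toy_constant_string {
    Class isa;
    char *c_string;
    unsigned int length;
};

/* Hypothetical constant string class (an NSString subclass in practice). */
static struct toy_class toy_constant_string_class = { "ToyConstantString" };

/* What might be emitted for @"foo": the bytes plus an object pointing at them. */
static char foo_bytes[] = "foo";
static struct toy_constant_string foo_literal = {
    &toy_constant_string_class,  /* isa: resolved at runtime in a real system */
    foo_bytes,                   /* c_string */
    sizeof(foo_bytes) - 1        /* length, excluding the terminating NUL */
};
```

The dependency described above shows up in the first field: the literal cannot be initialized until the compiler knows how to reference a class, which is why class generation has to come first.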

David

Hi Chris,

I imagine the easiest way of getting this working is to
transform the Objective-C AST into a pure-C AST with calls to the
relevant runtime libraries.

This is the way that GCC works, but I don't think it is really the
best way. I'd much rather have the clang codegen module directly
produce the LLVM IR for the constructs it needs. I don't anticipate
that this will be a problem for metadata or other objc constructs.

I am not completely sure what would be gained by doing this. Unlike C++,
every Objective-C construct maps cleanly to a C construct.
Objective-C objects are C structures, Objective-C message sends are C
function calls (and methods are just C functions with their pointers
added to the class structure), and so on. If work has already been
done to transform these C constructs into LLVM IR then it seems like
there would be some significant code duplication to do the same for
Objective-C.

I don't mind generating LLVM IR (I've already written code which
generates code targeting the GNU runtime from GNU Lightning), but I'm
not sure what the benefit would be.

clang has several consumers that rely on the integrity of the ASTs to reflect the source language; the pretty printer, dumper, and rewriter are among them, and there will be more in the future. Lowering the AST to its C equivalent will kill the abstraction needed by these consumers. It also makes Objective-C code generation runtime-dependent, as the Darwin and GNU runtimes have different requirements in many cases. Of course, you can always write an AST-to-AST translator for Objective-C code generation only. But this would be just another consumer, which fits well with clang's architecture.

- Fariborz

I imagine the easiest way of getting this working is to
transform the Objective-C AST into a pure-C AST with calls to the
relevant runtime libraries.

This is the way that GCC works, but I don't think it is really the
best way. I'd much rather have the clang codegen module directly
produce the LLVM IR for the constructs it needs. I don't anticipate
that this will be a problem for metadata or other objc constructs.

I am not completely sure what would be gained by doing this. Unlike C++,
every Objective-C construct maps cleanly to a C construct.
Objective-C objects are C structures, Objective-C message sends are C
function calls (and methods are just C functions with their pointers
added to the class structure), and so on.

And all of them map onto LLVM IR :-)

If work has already been
done to transform these C constructs into LLVM IR then it seems like
there would be some significant code duplication to do the same for
Objective-C.

I don't mind generating LLVM IR (I've already written code which
generates code targeting the GNU runtime from GNU Lightning), but I'm
not sure what the benefit would be.

Please see Fariborz's response; it is exactly right. If nothing else, multiple levels of translation slow down compile time.

@"foo" is actually decidedly non-trivial from a code-generation
perspective. It needs to be expanded to an object, which is a
structure of the form:

{
  Class isa;
  char * c_string;
  unsigned int length;
}

Ok, but it isn't hard to generate any of this in LLVM IR. It's probably less code than setting up all the C AST types etc.

Here the class pointed to by isa is determined at compile time and
resolved at runtime (both GNUstep and Cocoa define this as their own
constant string representation, an NSString subclass). It's hard to
get something like this right until you have classes being generated
correctly, which is probably where I'll start.

Ok! If it's easier to start with defining simple interfaces, go for that; starting with @"foo" was just a suggestion.

-Chris

Hi Chris,

I imagine the easiest way of getting this working is to
transform the Objective-C AST into a pure-C AST with calls to the
relevant runtime libraries.

This is the way that GCC works, but I don't think it is really the
best way. I'd much rather have the clang codegen module directly
produce the LLVM IR for the constructs it needs. I don't anticipate
that this will be a problem for metadata or other objc constructs.

I am not completely sure what would be gained by doing this.
Unlike C++, every Objective-C construct maps cleanly to a C construct.
Objective-C objects are C structures, Objective-C message sends are C
function calls (and methods are just C functions with their pointers
added to the class structure), and so on. If work has already been
done to transform these C constructs into LLVM IR then it seems like
there would be some significant code duplication to do the same for
Objective-C.

I don't mind generating LLVM IR (I've already written code which
generates code targeting the GNU runtime from GNU Lightning), but I'm
not sure what the benefit would be.

clang has several consumers that rely on integrity of ASTs to reflect
the source language; pretty printer, dumper, rewriter are among them.
In future there will be more. Lowering AST to its C equivalent will
kill the abstraction needed by these consumers.

I don't believe any abstraction is lost. You just have to be able to distinguish ASTs that reflect the user's source code from ASTs that reflect the underlying ObjC object/runtime model.

That doesn't imply I think we should necessarily implement ObjC using AST transforms. If Chris would like to avoid the ObjC->C AST transforms (and go directly to LLVM IR), I trust his judgement. Since we've never done this, some experimentation is necessary. I'm sure David will keep us posted...

snaroff

Yes, I see. In a lot of cases, this will be equivalently easy and probably cleaner. After poking the code a bit I can see one initial stumbling block. Generating code for an Objective-C method involves two steps:

1) Emitting the function that implements the method.
2) Setting up the class structure to point to it (by modifying the structure directly with the GNU runtime, or via functions on the Étoilé and Apple runtimes).

The second step is best done via some custom LLVM IR code for each runtime. The first step ought to be able to reuse the code in CodeGenFunction::GenerateCode. Looking at the ObjCMethodDecl class, it seems that this implements most of the methods used by the code generator, but since they do not have a common superclass specifying these functions, the code cannot easily be reused. Can anyone suggest a good way of factoring this out so it can be used in both places? I notice a few TODO and FIXME comments there, so I am hesitant about just copying and pasting code that is going to be changed later.
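The two steps can be sketched in plain C as follows. This is only an illustration under assumed names (toy_class, toy_class_add_method, toy_find); it shows the two styles of the second step, a class structure emitted with the pointer already in place versus registration through a runtime function, and does not reproduce any real runtime's data layout or API.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

typedef struct toy_object *id;          /* opaque receiver type for this sketch */
typedef const char *SEL;                /* toy selector: just a C string */
typedef int (*IMP)(id self, SEL _cmd);

struct toy_method { SEL name; IMP imp; };

#define TOY_MAX_METHODS 8
struct toy_class {
    const char *name;
    struct toy_method methods[TOY_MAX_METHODS];
    size_t method_count;
};

/* Step 1: the function that implements a hypothetical -[Foo answer] method. */
static int Foo_answer(id self, SEL _cmd) {
    (void)self;
    (void)_cmd;
    return 42;
}

/* Step 2, direct style: the class structure is emitted pointing at the function. */
static struct toy_class FooStatic = { "Foo", { { "answer", Foo_answer } }, 1 };

/* Step 2, function style: the pointer is installed by calling into the runtime. */
static int toy_class_add_method(struct toy_class *cls, SEL name, IMP imp) {
    if (cls->method_count >= TOY_MAX_METHODS)
        return 0;                       /* table full */
    cls->methods[cls->method_count].name = name;
    cls->methods[cls->method_count].imp = imp;
    cls->method_count++;
    return 1;
}

/* Helper used to check that both styles end up with the same function installed. */
static IMP toy_find(const struct toy_class *cls, SEL name) {
    for (size_t i = 0; i < cls->method_count; i++)
        if (strcmp(cls->methods[i].name, name) == 0)
            return cls->methods[i].imp;
    return NULL;
}
```

In both styles step 1 is identical, which is the argument for reusing the existing function codegen; only the installation differs per runtime.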

David

Generating code for an Objective-C method involves
two steps:

1) Emitting the function that implements the method.
2) Setting up the class structure to point to it (by modifying the structure
directly with the GNU runtime, or via functions on the Étoilé and Apple
runtimes).

Ok.

The second step is best done via some custom LLVM IR code for each
runtime. The first step ought to be able to reuse the code in
CodeGenFunction::GenerateCode. Looking at the ObjCMethodDecl class,
it seems that this implements most of the methods used by the code
generator, but since they do not have a common superclass specifying
these functions the code can not easily be used. Can anyone suggest a
good way of factoring this out so it can be used in both places? I
notice a few TODO and FIXME comments there, so I am hesitant about
just copying and pasting code that is going to be changed later.

What specifically do you need to do? In the NeXT runtime, message expressions are all pretty simple: they are LLVM Functions with internal linkage and names that contain the selector. I assume you would like to factor out the argument lowering code? Is there anything else?

-Chris

I'm talking about method definitions, rather than message sends. These methods are implemented as functions with two hidden arguments: id self and SEL _cmd on the GNU and NeXT runtimes, id self and struct objc_call _call on the Étoilé runtime. So, at some point, factoring out the argument lowering code will be useful. A pointer to this function is then installed in the relevant objc_class structure. Obviously this last step will require some new code and will be completely runtime-specific (I think Apple has at least two ways of doing it for their various runtimes), but generating the function body ought to be possible with existing code, since everything that's valid as an Objective-C method is also valid as a function.

The hidden parameters and the function name (I don't think any existing runtimes rely on name mangling for lookup, but it's possible a future one will) should probably be specified in some runtime-specific code. After this, it can be generated exactly as a function would.
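One possible shape for that runtime-specific piece is a small table of hooks that the shared function codegen consults. The sketch below is purely illustrative: the toy_runtime_hooks structure, the symbol naming scheme, and the id return type are assumptions made for the example, and only the hidden parameter lists come from the description above.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical per-runtime description used when emitting a method as a function. */
struct toy_runtime_hooks {
    const char *runtime_name;
    const char *hidden_param_types[2];
    const char *hidden_param_names[2];
    /* Writes the function's symbol name into buf (snprintf-style return value). */
    int (*method_symbol)(char *buf, size_t size, const char *cls,
                         const char *sel, int is_class_method);
};

/* Illustrative naming scheme only: _i_Class_selector, with ':' mapped to '_'. */
static int toy_symbol(char *buf, size_t size, const char *cls,
                      const char *sel, int is_class_method) {
    if (size == 0)
        return 0;
    int n = snprintf(buf, size, "_%c_%s_%s", is_class_method ? 'c' : 'i', cls, sel);
    for (char *p = buf; *p; p++)
        if (*p == ':')
            *p = '_';
    return n;
}

/* Hidden arguments as described above for each runtime. */
static const struct toy_runtime_hooks toy_gnu = {
    "GNU", { "id", "SEL" }, { "self", "_cmd" }, toy_symbol
};
static const struct toy_runtime_hooks toy_etoile = {
    "Etoile", { "id", "struct objc_call" }, { "self", "_call" }, toy_symbol
};

/* Runtime-independent step: render the prototype of an instance method
   that takes no explicit arguments. */
static int toy_method_prototype(const struct toy_runtime_hooks *rt, char *out,
                                size_t size, const char *cls, const char *sel) {
    char symbol[128];
    rt->method_symbol(symbol, sizeof symbol, cls, sel, 0);
    return snprintf(out, size, "id %s(%s %s, %s %s)", symbol,
                    rt->hidden_param_types[0], rt->hidden_param_names[0],
                    rt->hidden_param_types[1], rt->hidden_param_names[1]);
}
```

With this split, toy_method_prototype never mentions a particular runtime; passing &toy_gnu or &toy_etoile changes only the hidden parameters, and a runtime that did rely on name mangling would supply its own method_symbol hook.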

David

Ok, it sounds like factoring out the relevant functionality is the right way to go. We'll eventually need very similar code for C++ methods as well, which get a this pointer.

-Chris