New diff

<moving this to cfe-dev, which should have been cc'd the whole time>

Fixed a bug where I was returning the wrong thing while generating metaclass structures (almost the same as the last diff).

This patch doesn't apply.

Here's a new version. It pulls out most of the cases where the value from an llvm::Constant is taken in CodeGen and passes in the initialisers instead.

Now you're copying strings around all over the place for no reason. Converting things to a vector<std::string> does a bunch of string copies. Why do we want this?

I've also changed it so that the Selector passed to the the methods for generating message sends is a selector, rather than the name of a selector. This should be previously obtained by GetSelector. There are now two versions of this, one taking a pair of char*s and one taking a pair of llvm::Value*s as arguments.

I don't understand where you're going with this. You've taken all the nice simple APIs that take Selectors and pass Value*'s around. You are inverting the logic of the code generator to avoid passing in clang data structures.

Further, I don't see why the ObjC codegen stuff shouldn't pull in CodeGenModule. It seems that you're again inverting control flow here to eliminate a very reasonable dependency. Because of this, you've reinvented things (like CGObjCGNU::MakeConstantString), and introduced problems (e.g. MakeConstantString doesn't unique the strings like CGM's does).

There are now two versions of this, one taking a pair of char*s and one taking a pair of llvm::Value*s as arguments. The first returns a statically-bound selector, the latter returns a dynamically-bound selector. This seems cleaner and also makes it easier to support languages where message dispatch can be performed with an extra layer of indirection (I'm not doing this at the moment but I know people who want to use this code for JavaScript and Io implementations and it will help them). Currently everything in clang uses untyped selectors, but I plan on using typed selectors when available in the future (i.e. generate a selector type string from the types passed to the selector). This will allow the Étoilé runtime (which uses selector types for dispatch, where available) to either throw an exception or perform auto [un]boxing at message-send time, rather than just corrupt the stack (as the GNU and NeXT/Apple runtimes do).

I understand (now) that you are trying to reuse the clang codegen in the context of your smalltalk implementation. While this is an interesting goal, this is *not* an acceptable reason to make the clang ObjC codegen code inefficient and weird. I'm somewhat ok with you adding codepaths that are not used by clang (and are thus dead) as long as they are not huge, but making codepaths clang *does* use inefficient and inelegant is not acceptable.

    return Builder.CreateCall(imp, Args.begin(), Args.end());
  }

objc.diff (47.2 KB)

<moving this to cfe-dev, which should have been cc'd the whole time>

Fixed a bug where I was returning the wrong thing while generating metaclass structures (almost the same as the last diff).

This patch doesn't apply.

Here's a new version. It pulls out most of the cases where the value from an llvm::Constant is taken in CodeGen and passes in the initialisers instead.

Now you're copying strings around all over the place for no reason. Converting things to a vector<std::string> does a bunch of string copies. Why do we want this?

Ideally I wouldn't. I'd want it to just take references, but since Selector returns a string, rather than a pointer to a string, a copy is required, and I made the others strings for consistency. Selector objects are not an adequate abstraction here because they do not contain type information, which we can extract from the AST, and they are not really the right thing to pass in semantically since it is the method name, not the selector, that we want (the two are often used interchangeably, but are not the same).

I've also changed it so that the Selector passed to the the methods for generating message sends is a selector, rather than the name of a selector. This should be previously obtained by GetSelector. There are now two versions of this, one taking a pair of char*s and one taking a pair of llvm::Value*s as arguments.

I don't understand where you're going with this. You've taken all the nice simple APIs that take Selectors and pass Value*'s around. You are inverting the logic of the code generator to avoid passing in clang data structures.

This part of the change is definitely cleaner. The runtime interface now more closely mirrors how message dispatch works (as a two stage process, getting the selector and sending the message). It also allows as much type information to be provided to the runtime as is available. Since clang knows the types of the arguments being passed, it can pass this when looking up the selector (it doesn't yet), without changing the message send method. This will aid debugging Objective-C code in the future, since a runtime exception can be thrown if a method is called with arguments of the wrong type: this is not always possible to detect at compile time, and currently causes a corrupt stack and is a colossal pain to trace.

Further, I don't see why the ObjC codegen stuff shouldn't pull in CodeGenModule. It seems that you're again inverting control flow here to eliminate a very reasonable dependency. Because of this, you've reinvented things (like CGObjCGNU::MakeConstantString), and introduced problems (e.g. MakeConstantString doesn't unique the strings like CGM's does).

I would be more than happy to refactor this to reuse existing code if it can be done in a way that didn't introduce direct dependencies on clang. I would propose creating a delegate object that can be passed in, which exposes any methods in CGM that are needed and can be replaced by a different delegate in other implementations, if this would be acceptable.

There are now two versions of this, one taking a pair of char*s and one taking a pair of llvm::Value*s as arguments. The first returns a statically-bound selector, the latter returns a dynamically-bound selector. This seems cleaner and also makes it easier to support languages where message dispatch can be performed with an extra layer of indirection (I'm not doing this at the moment but I know people who want to use this code for JavaScript and Io implementations and it will help them). Currently everything in clang uses untyped selectors, but I plan on using typed selectors when available in the future (i.e. generate a selector type string from the types passed to the selector). This will allow the Étoilé runtime (which uses selector types for dispatch, where available) to either throw an exception or perform auto [un]boxing at message-send time, rather than just corrupt the stack (as the GNU and NeXT/Apple runtimes do).

I understand (now) that you are trying to reuse the clang codegen in the context of your smalltalk implementation.

And in other compilers. The Nu team recently approached me to discuss using the code for their object model (currently they target the Apple runtime directly), and others are planning on using it for JavaScript and Io ports.

I have found bugs in the clang code directly as a result of using it in other work, and I consider reuse to be an important way of testing the code. I would rather not have to maintain a portable and a clang version of this code, since it would mean that bug fixes would take much longer to make their way into clang.

While this is an interesting goal, this is *not* an acceptable reason to make the clang ObjC codegen code inefficient and weird.

Weird is highly subjective - I consider that the code now corresponds closely to the object model semantics, and so don't find it weird. Inefficient is probably true, and I would welcome any constructive suggestions for optimisation.

I'm somewhat ok with you adding codepaths that are not used by clang (and are thus dead) as long as they are not huge, but making codepaths clang *does* use inefficient and inelegant is not acceptable.

I have not written anything in this file which is not intended for use in compiling Objective-C. Some parts are not yet connected to implementations, however anything intended for other languages I have kept in separate files and not submitted. The only code paths that are dead are ones where I have not yet written the calling code.

  return Builder.CreateCall(imp, Args.begin(), Args.end());
}
-
+#include "llvm/ValueSymbolTable.h"
/// Generates a MethodList. Used in construction of a objc_class and

Please don't include things in the middle of .cpp files.

Ooops, this was left in from debugging - it can be removed.

David

<moving this to cfe-dev, which should have been cc'd the whole time>

Here's a new version. It pulls out most of the cases where the value from an llvm::Constant is taken in CodeGen and passes in the initialisers instead.

Now you're copying strings around all over the place for no reason. Converting things to a vector<std::string> does a bunch of string copies. Why do we want this?

Ideally I wouldn't. I'd want it to just take references, but since Selector returns a string, rather than a pointer to a string, a copy is required, and I made the others strings for consistency.

Why not pass the selector around? It is exactly one word and is cheap to copy. It is perfect for this sort of interface.

Selector objects are not an adequate abstraction here because they do not contain type information,

Neither do strings.

which we can extract from the AST, and they are not really the right thing to pass in semantically since it is the method name, not the selector, that we want (the two are often used interchangeably, but are not the same).

Why not?

I've also changed it so that the Selector passed to the the methods for generating message sends is a selector, rather than the name of a selector. This should be previously obtained by GetSelector. There are now two versions of this, one taking a pair of char*s and one taking a pair of llvm::Value*s as arguments.

I don't understand where you're going with this. You've taken all the nice simple APIs that take Selectors and pass Value*'s around. You are inverting the logic of the code generator to avoid passing in clang data structures.

This part of the change is definitely cleaner. The runtime interface now more closely mirrors how message dispatch works (as a two stage process, getting the selector and sending the message). It also allows as much type information to be provided to the runtime as is available. Since clang knows the types of the arguments being passed, it can pass this when looking up the selector (it doesn't yet), without changing the message send method. This will aid debugging Objective-C code in the future, since a runtime exception can be thrown if a method is called with arguments of the wrong type: this is not always possible to detect at compile time, and currently causes a corrupt stack and is a colossal pain to trace.

Is this something supported by existing ObjC runtimes or some new feature you are prototyping? I don't see how this relates to the Selector vs string issue, since neither provides type info.

Further, I don't see why the ObjC codegen stuff shouldn't pull in CodeGenModule. It seems that you're again inverting control flow here to eliminate a very reasonable dependency. Because of this, you've reinvented things (like CGObjCGNU::MakeConstantString), and introduced problems (e.g. MakeConstantString doesn't unique the strings like CGM's does).

I would be more than happy to refactor this to reuse existing code if it can be done in a way that didn't introduce direct dependencies on clang. I would propose creating a delegate object that can be passed in, which exposes any methods in CGM that are needed and can be replaced by a different delegate in other implementations, if this would be acceptable.

Not acceptable. Why not just have your other implementations provide a corresponding version of CGM and Selector that they can pass around?

I understand (now) that you are trying to reuse the clang codegen in the context of your smalltalk implementation.

And in other compilers. The Nu team recently approached me to discuss using the code for their object model (currently they target the Apple runtime directly), and others are planning on using it for JavaScript and Io ports.

I have found bugs in the clang code directly as a result of using it in other work, and I consider reuse to be an important way of testing the code. I would rather not have to maintain a portable and a clang version of this code, since it would mean that bug fixes would take much longer to make their way into clang.

That is very cool and I don't want to lose that. However, clang maintainability and performance is my #1 prio, so it trumps reusability of the code in other contexts. I think that we really just need to find the right way to factor this, I think we can come to a design that fits both purposes reasonably well.

While this is an interesting goal, this is *not* an acceptable reason to make the clang ObjC codegen code inefficient and weird.

Weird is highly subjective - I consider that the code now corresponds closely to the object model semantics, and so don't find it weird. Inefficient is probably true, and I would welcome any constructive suggestions for optimisation.

You're right and I was being overly critical here. The thing I really didn't like about the old design is that you emitted dead strings to LLVM IR and read them back out on the "other side" of the boundary. Passing std::strings around *is* a step up from that, but it still isn't as efficient as passing around Selectors.

I'm somewhat ok with you adding codepaths that are not used by clang (and are thus dead) as long as they are not huge, but making codepaths clang *does* use inefficient and inelegant is not acceptable.

I have not written anything in this file which is not intended for use in compiling Objective-C. Some parts are not yet connected to implementations, however anything intended for other languages I have kept in separate files and not submitted. The only code paths that are dead are ones where I have not yet written the calling code.

There was code in the file for handling "typed selectors" which was dead.

-Chris

Okay, I think this all makes sense. I'll add back in the Selector and CGM dependency, and resubmit the diff. It will take me a little while to refactor and test it with both codebases, but I hope to have something for you this week (the code for typed selectors isn't dead, it's just resting. Both the GNU and Étoilé runtimes use these and can easily - or do already - support errors on lookups with incorrectly typed selectors. The GNUstep implementation of Distributed Objects is also faster if typed selectors are used, since it avoids the extra call to -methodSignatureForSelector: which can trigger a dozen or so message sends to look up information which can be provided by the compiler).

David

Okay, I think this all makes sense. I'll add back in the Selector and CGM dependency, and resubmit the diff. It will take me a little while to refactor and test it with both codebases, but I hope to have something for you this week

Sounds great! Again, I'm sorry for pulling you around in multiple directions like this. I really do appreciate the work you are doing, and I highly value your contributions to clang!

(the code for typed selectors isn't dead, it's just resting. Both the GNU and Étoilé runtimes use these and can easily - or do already - support errors on lookups with incorrectly typed selectors. The GNUstep implementation of Distributed Objects is also faster if typed selectors are used, since it avoids the extra call to -methodSignatureForSelector: which can trigger a dozen or so message sends to look up information which can be provided by the compiler).

Ok! I wasn't aware of that feature, sound nice.

-Chris