Fix for PR4701

Hi Everyone,

This patch provides an actual fix (as opposed to my previous attempt, which just moved the bugs around) for PR4701, where code accessing fields declared on Objective-C pseudo-builtin types id and Class broke.

With this patch applied, all test pass and programs that include the GNU runtime's <obj/objc-api.h> compile correctly.

I am not totally convinced by the approach here. If a redefinition of id or Class is encountered, this is stored any any attempt to access a field on one of these builtins is then implicitly cast to the real type. This is quite ugly, but seems less ugly than doing it the other way around (adding special cases to all of the Objective-C stuff to let struct objc_object* act as a receiver, and so on).

David

clang.diff (8.08 KB)

Hi David,

If you can include a test case that demonstrate what this patch fixes, I believe I can give you a better approach...

Thanks,

snaroff

Hi Steve,

there's a test case attached to PR4701 (http://llvm.org/bugs/show_bug.cgi?id=4701). I've just discovered that this still breaks assigning Class variables to struct objc_class* variables and vice versa. I think I can fix this by adding another hack in Sema::CheckPointerTypesForAssignment(), but I'd still prefer a better approach if you can suggest one...

When I try to compile GNUstep, I discover a more subtle problem which my patch now fixes, which is that MetaClass is defined like this:

typedef struct objc_class *MetaClass;

This means that clang fails to notice that MetaClass and Class (which are both typedefs for the same underlying type) are compatible when using the builtin definition of Class, which breaks any assignment of a MetaClass value to a Class-typed variable and vice versa.

David

Hi Steve,

there's a test case attached to PR4701 (http://llvm.org/bugs/show_bug.cgi?id=4701). I've just discovered that this still breaks assigning Class variables to struct objc_class* variables and vice versa. I think I can fix this by adding another hack in Sema::CheckPointerTypesForAssignment(), but I'd still prefer a better approach if you can suggest one...

Why not replace 'ObjCIsaExpr' with 'ObjCBuiltinFieldExpr' (used for both 'id' and 'Class'...)? You would need to add an explicit 'IdentifierInfo' slot...

Would this do the trick?

snaroff

Hi Steve,

there's a test case attached to PR4701 (http://llvm.org/bugs/show_bug.cgi?id=4701). I've just discovered that this still breaks assigning Class variables to struct objc_class* variables and vice versa. I think I can fix this by adding another hack in Sema::CheckPointerTypesForAssignment(), but I'd still prefer a better approach if you can suggest one...

Why not replace 'ObjCIsaExpr' with 'ObjCBuiltinFieldExpr' (used for both 'id' and 'Class'...)? You would need to add an explicit 'IdentifierInfo' slot...

I'm not sure what this would achieve. You'd then need to hard-code all of the fields of Class into the compiler. For future OS X compatibility, we are slowly moving towards using a replacement header which defines Class as an opaque type, and this approach would make these fields visible even when the header is not included, which is not what we want. It would also complicate Sema a lot because we'd have to add a set of ObjCBuiltinFieldExpr nodes for each field in each runtime we support (currently the GNU runtime and the Apple legacy runtime would need a set of these).

Would this do the trick?

I also don't see how this would help with casts from Class to MetaClass and vice versa either.

David

Hi Steve,

there's a test case attached to PR4701 (http://llvm.org/bugs/show_bug.cgi?id=4701). I've just discovered that this still breaks assigning Class variables to struct objc_class* variables and vice versa. I think I can fix this by adding another hack in Sema::CheckPointerTypesForAssignment(), but I'd still prefer a better approach if you can suggest one...

Why not replace 'ObjCIsaExpr' with 'ObjCBuiltinFieldExpr' (used for both 'id' and 'Class'...)? You would need to add an explicit 'IdentifierInfo' slot...

I'm not sure what this would achieve. You'd then need to hard-code all of the fields of Class into the compiler. For future OS X compatibility, we are slowly moving towards using a replacement header which defines Class as an opaque type, and this approach would make these fields visible even when the header is not included, which is not what we want. It would also complicate Sema a lot because we'd have to add a set of ObjCBuiltinFieldExpr nodes for each field in each runtime we support (currently the GNU runtime and the Apple legacy runtime would need a set of these).

big hack. When Objective-C was born, it wasn't such a big deal (since there was only one runtime).

Now that id/Class are more 'builtin', Chris/I decided that ObjCIsaExpr worked nicely as a compatibility bridge. I just figured the same approach could be used for your situation. At Apple, we've totally moved away from allowing user code to directly access the meta-data. In our non-fragile runtime, all the same meta-data can be accessed using C functions (so we don't have this problem for Class).

Unless you can deprecate some of this stuff (by replacing it with C function accessors), I'm afraid anything we do is a hack.

snaroff

From my perspective, depending on the C headers in this fashion is a big hack. When Objective-C was born, it wasn't such a big deal (since there was only one runtime).

I completely agree. It's a horrible hack, but that doesn't alter the fact that a large body of existing code that depends on this old behaviour has accumulated over the last two decades.

Now that id/Class are more 'builtin', Chris/I decided that ObjCIsaExpr worked nicely as a compatibility bridge.

It's fine for id (well, almost; the GNU runtime calls this field class_pointer, which is horrible but, again, something lots of existing code depends on). For Class / MetaClass, it's a much bigger problem.

I just figured the same approach could be used for your situation. At Apple, we've totally moved away from allowing user code to directly access the meta-data. In our non-fragile runtime, all the same meta-data can be accessed using C functions (so we don't have this problem for Class).

Unless you can deprecate some of this stuff (by replacing it with C function accessors), I'm afraid anything we do is a hack.

For new code, that's fine. I've written a compatibility framework that sits on top of the GNU runtime and provides exactly the same functionality with the same interfaces, but there is still a fair body of legacy code for both the GNU runtime and the legacy Apple runtime that directly manipulates the class structures and currently clang can't compile this code, so some form of hack is required. We also need to maintain compatibility with GCC in a lot of this code, which does require these headers for accessing fields on id / Class / MetaClass. Deprecating it is fine - and I'd be really happy with a compiler switch that let us error on code that attempted to access fields other than isa on these types even with when their definitions are available - but deprecating and removing are two different things.

I've attached an updated version of the diff that fixes all of the corner cases found by compiling GNUstep's Foundation. It's a hack, but it's a less-invasive hack than adding a load of new expression types. I think the ideal solution would be to have a flag on typedef types that indicated that they can be used as Objective-C message receivers. I assumed __attribute__((NSObject)) would do this, but it seems not to. If we had such a flag then we could just use id and Class typedefs if they were provided with that flag set, which would provide an even less-invasive hack.

David

clang.diff (10 KB)

From my perspective, depending on the C headers in this fashion is a big hack. When Objective-C was born, it wasn't such a big deal (since there was only one runtime).

I completely agree. It's a horrible hack, but that doesn't alter the fact that a large body of existing code that depends on this old behaviour has accumulated over the last two decades.

Now that id/Class are more 'builtin', Chris/I decided that ObjCIsaExpr worked nicely as a compatibility bridge.

It's fine for id (well, almost; the GNU runtime calls this field class_pointer, which is horrible but, again, something lots of existing code depends on). For Class / MetaClass, it's a much bigger problem.

I just figured the same approach could be used for your situation. At Apple, we've totally moved away from allowing user code to directly access the meta-data. In our non-fragile runtime, all the same meta-data can be accessed using C functions (so we don't have this problem for Class).

Unless you can deprecate some of this stuff (by replacing it with C function accessors), I'm afraid anything we do is a hack.

For new code, that's fine. I've written a compatibility framework that sits on top of the GNU runtime and provides exactly the same functionality with the same interfaces, but there is still a fair body of legacy code for both the GNU runtime and the legacy Apple runtime that directly manipulates the class structures and currently clang can't compile this code, so some form of hack is required. We also need to maintain compatibility with GCC in a lot of this code, which does require these headers for accessing fields on id / Class / MetaClass. Deprecating it is fine - and I'd be really happy with a compiler switch that let us error on code that attempted to access fields other than isa on these types even with when their definitions are available - but deprecating and removing are two different things.

I've attached an updated version of the diff that fixes all of the corner cases found by compiling GNUstep's Foundation. It's a hack, but it's a less-invasive hack than adding a load of new expression types. I think the ideal solution would be to have a flag on typedef types that indicated that they can be used as Objective-C message receivers. I assumed __attribute__((NSObject)) would do this, but it seems not to. If we had such a flag then we could just use id and Class typedefs if they were provided with that flag set, which would provide an even less-invasive hack.

This diff seems too invasive. I think my suggestion would be much more localized.

I'm still unclear why my suggestion doesn't fly (since it only involved renaming/repurposing the ObjCIsaExpr class...).

How many fields are we talking about?

snaroff

For id, only one, although we need both ->isa and ->class_pointer because we are slowly moving from using the old GNU headers (which define it as ->class_pointer) to an Apple-compatible one. For Class, there are 13 in the GNU runtime and 10 for the legacy Apple runtime. Note that many of these have quite complex types (structures containing arrays and so on). If you can define them all in a less-invasive change than the one I've proposed then please commit it.

Note that your proposal will leave us using two separate code paths for accessing fields on struct objc_class* and fields on Class, in spite of the two types being equivalent.

Your proposal, however, still does not fix the MetaClass problem, which requires more changes to allow implicit casts between Class (which is now a builtin) and MetaClass (which is a typedef to the same C structure that the headers define Class to be). You will also need to support all of the related cases, for example assigning a struct objc_object* value to an id variable, assigning a Class to a struct objc_class* variable. Note that these are both permitted even with the modern Apple runtime. My diff handles this case correctly.

I believe once you have implemented equivalent functionality with your proposal you will end up with a significantly more complex set of changes than the ones I have written, but I would be more than happy to be proven wrong, as long as something that compiles existing Objective-C code is committed.

David

From my perspective, depending on the C headers in this fashion is a big hack. When Objective-C was born, it wasn't such a big deal (since there was only one runtime).

I completely agree. It's a horrible hack, but that doesn't alter the fact that a large body of existing code that depends on this old behaviour has accumulated over the last two decades.

Now that id/Class are more 'builtin', Chris/I decided that ObjCIsaExpr worked nicely as a compatibility bridge.

It's fine for id (well, almost; the GNU runtime calls this field class_pointer, which is horrible but, again, something lots of existing code depends on). For Class / MetaClass, it's a much bigger problem.

I just figured the same approach could be used for your situation. At Apple, we've totally moved away from allowing user code to directly access the meta-data. In our non-fragile runtime, all the same meta-data can be accessed using C functions (so we don't have this problem for Class).

Unless you can deprecate some of this stuff (by replacing it with C function accessors), I'm afraid anything we do is a hack.

For new code, that's fine. I've written a compatibility framework that sits on top of the GNU runtime and provides exactly the same functionality with the same interfaces, but there is still a fair body of legacy code for both the GNU runtime and the legacy Apple runtime that directly manipulates the class structures and currently clang can't compile this code, so some form of hack is required. We also need to maintain compatibility with GCC in a lot of this code, which does require these headers for accessing fields on id / Class / MetaClass. Deprecating it is fine - and I'd be really happy with a compiler switch that let us error on code that attempted to access fields other than isa on these types even with when their definitions are available - but deprecating and removing are two different things.

I've attached an updated version of the diff that fixes all of the corner cases found by compiling GNUstep's Foundation. It's a hack, but it's a less-invasive hack than adding a load of new expression types. I think the ideal solution would be to have a flag on typedef types that indicated that they can be used as Objective-C message receivers. I assumed __attribute__((NSObject)) would do this, but it seems not to. If we had such a flag then we could just use id and Class typedefs if they were provided with that flag set, which would provide an even less-invasive hack.

This diff seems too invasive. I think my suggestion would be much more localized.

I'm still unclear why my suggestion doesn't fly (since it only involved renaming/repurposing the ObjCIsaExpr class...).

How many fields are we talking about?

For id, only one, although we need both ->isa and ->class_pointer because we are slowly moving from using the old GNU headers (which define it as ->class_pointer) to an Apple-compatible one. For Class, there are 13 in the GNU runtime and 10 for the legacy Apple runtime. Note that many of these have quite complex types (structures containing arrays and so on). If you can define them all in a less-invasive change than the one I've proposed then please commit it.

I see. I didn't realize the scope...

Note that your proposal will leave us using two separate code paths for accessing fields on struct objc_class* and fields on Class, in spite of the two types being equivalent.

Your proposal, however, still does not fix the MetaClass problem, which requires more changes to allow implicit casts between Class (which is now a builtin) and MetaClass (which is a typedef to the same C structure that the headers define Class to be). You will also need to support all of the related cases, for example assigning a struct objc_object* value to an id variable, assigning a Class to a struct objc_class* variable. Note that these are both permitted even with the modern Apple runtime. My diff handles this case correctly.

I believe once you have implemented equivalent functionality with your proposal you will end up with a significantly more complex set of changes than the ones I have written, but I would be more than happy to be proven wrong, as long as something that compiles existing Objective-C code is committed.

Given the number of fields, it sounds like we should go with your least disruptive solution to this. You've convinced me that my original suggestion is unlikely to scale to 23 fields...

Oh well. Given the amount of time I spent on this "cleanup", I hate to add cruft (but we need to support the legacy idioms I suppose...).

Thanks for your patience,

snaroff

Steve,

So, if we go with the approach accepting the redefinition, should we
kill off ObjCIsaExpr?

- Daniel

I'd like to keep it. Long-term, I agree with Steve - having explicit typedefs for what should be builtin types is an ugly hack. If we leave it in then users can slowly drop the includes of the headers that define these types and fall back to using the compiler-supplied definitions. In a little while we can add a warning for files that define these types.

David

I'd like to keep it. Long-term, I agree with Steve - having explicit
typedefs for what should be builtin types is an ugly hack. If we leave it
in then users can slowly drop the includes of the headers that define these
types and fall back to using the compiler-supplied definitions. In a little
while we can add a warning for files that define these types.

To clarify, so your position is that long term clients should drop
uses of the other fields, but direct assignment / reading of isa is
ok?

One downside of that approach (aside from being speculative) is that
it encourages users to write non-portable (to gcc) code. Since it is
also speculative it seems to me to unnecessarily complicate clients,
but I'm happy to defer to others opinions on this.

- Daniel

I'd like to keep it. Long-term, I agree with Steve - having explicit
typedefs for what should be builtin types is an ugly hack. If we leave it
in then users can slowly drop the includes of the headers that define these
types and fall back to using the compiler-supplied definitions. In a little
while we can add a warning for files that define these types.

To clarify, so your position is that long term clients should drop
uses of the other fields, but direct assignment / reading of isa is
ok?

Yes. The Apple runtime.h already marks the fields other than isa as deprecated with the legacy runtime and unavailable on the modern runtime.

One downside of that approach (aside from being speculative) is that
it encourages users to write non-portable (to gcc) code. Since it is
also speculative it seems to me to unnecessarily complicate clients,
but I'm happy to defer to others opinions on this.

Not really a problem for the GNU runtime - any code that uses properties or fast enumeration will compile with clang but not GCC already because no one seems to be working on Objective-C in the FSF's tree and the Apple tree doesn't support the GNU runtime. People using the Apple runtime almost always get these definitions via Foundation.h (often indirectly via Cocoa.h), so there's little chance of them accidentally writing code that doesn't include them (although a future version of the Apple headers may choose to wrap their definitions in #ifndef __clang__).

David

Why the exemption for isa?

Isa is certainly more popular to access than other fields, but there’s no reason not to go through runtime functions.

It’s rather a waste of space to use a full pointer for the class id. It would be nice if we could steal bits for refcounting and such.

-Ken

Actually, isa is a protected ivar of NSObject. Excepted if this field visibility is changed to private, I don't think you can change the isa type without breaking compatibility.

Why the exemption for isa?

No especially good reason, other than the fact that it is mentioned explicitly in all of the Objective-C (and Smalltalk) documentaiton I've seen. The other fields are implementation details, while isa is part of the language.

Isa is certainly more popular to access than other fields, but there's no reason not to go through runtime functions.

Well, inside an instance you'll access it as just another ivar. It's only when you're accessing it from outside, on an id rather than on something like an NSObject*, that you need this special handling.

Steve wanted to remove the isa pointer from the ivar lists of base classes too, which is a stronger motivation for having it treated separately. This wouldn't be an incredibly invasive change, as there are very few Objective-C base classes in the wild (at least, in comparison with non-base classes), but it would break GCC compatibility quite severely, so it's something we'd have to introduce gradually.

I'm not particularly fussed either way. I don't mind removing it now, with a view to potentially reintroducing it later, or leaving it in.

David

Is it okay if I commit this patch? Then we can start ironing out any issues that arise.

David

Sounds good.

Thanks for working on this,

snaroff

Hi David,

I'd like to sort out what we are doing with IsaExpr. I still don't see
a reason to complicate codegen and other clients with this expression.
This could just be because I don't want to implement support for it
(*grin*), but I think that if we really care about being future
looking we would move people to runtime functions; a custom expression
node for syntax which is identical to other general language syntax is
pretty gross...

Also, I noticed a regression with this patch, the problem stems from this: