Proposal to simplify ObjC type ASTs

Motivation: Simplify clang’s representation of ObjC types.

Quick review: clang’s Objective-C support currently implements four distinct ways to refer to instances:

(1) “id”: No static type information (essentially an object-oriented “void *”).
(2) “Interface *”: Refers to an instance of type Interface.
(3) “id <p1, p2>”: Refers to any instance that adopts the ‘p1 & p2’ protocols.
(4) “Interface <p1, p2> *”: Refers to an instance of type Interface that adopts the ‘p1 & p2’ protocols.

Some brief historical notes relevant to this discussion:

(1) The original implementation of ObjC only supported “id” (which was implemented by a C typedef in “objc.h”).
(2) Interface-based typing was added later. Unlike “id” (where the pointer was implicit), interface-based typing still involves an explicit C pointer declaration. This is because we didn’t want to close the door on supporting value-based objects (circa 1988). In the 20 years since, we’ve never seriously considered adding value objects to ObjC (since they don’t support our release-to-release binary compatibility goals). In hindsight, it’s too bad interface-based typing didn’t have an implicit pointer (like Java). Oh well.
(3) Protocol-based typing was later added to both (1) and (2).
(4) Lastly, GCC supports “Class <p1, p2>”. Chris and I decided to defer supporting this until we simplified the ObjC types under discussion.

This very brief history lesson might help explain the current set of ObjC type ASTs. For example:

(1) We have no ObjC Type AST for “id” (it is currently a magic TypedefType AST installed by Sema and accessed through ASTContext).
(2) We have ObjCInterfaceType, which, unlike “id”, doesn’t imply a pointer.
(3) Lastly, we have an ObjCQualifiedIdType (with no relation to the magic typedef) and ObjCQualifiedInterfaceType (which is related to ObjCInterfaceType).

So, reasoning about ObjC object pointer types involves knowing about all the subtle historical differences noted above. ASTContext::isObjCObjectPointerType() helps mask some of the complexity; however, there are many other places in clang where the differences are explicit and cumbersome. To help simplify things, I’d like to consider moving to the following single AST class that would represent all ObjC object pointer types. Here is some pseudo code:

class ObjCObjectPointerType : public Type, public llvm::FoldingSetNode {
  // We could use the low-order bits to encode id/Class (which are built-in, not user-defined).
  // Alternatively, we could synthesize built-in ObjCInterfaceDecls that correspond to id/Class.
  ObjCInterfaceDecl *Decl;

  // List of protocols, sorted on protocol name. No protocol is entered more than once.
  llvm::SmallVector<ObjCProtocolDecl*, 8> Protocols;

public:
  bool isObjCIdType();
  bool isObjCInterfaceType();
  bool isObjCQualifiedIdType();
  bool isObjCQualifiedInterfaceType();
};
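For readers who want to experiment with the idea, here is a minimal, self-contained C++ sketch of the single-class design. It assumes plain strings stand in for ObjCInterfaceDecl and ObjCProtocolDecl, and that an empty interface name encodes “id” (one of the two encodings floated in the comments above; “Class” is omitted). The name ObjCObjectPointerTypeSketch and the mutually exclusive reading of the four predicates are assumptions of this sketch, not part of clang.

```cpp
#include <algorithm>
#include <cassert>
#include <string>
#include <utility>
#include <vector>

// Hypothetical stand-in for the proposed ObjCObjectPointerType.
// Plain strings replace ObjCInterfaceDecl/ObjCProtocolDecl, and an empty
// interface name encodes the built-in "id" (an assumption of this sketch).
class ObjCObjectPointerTypeSketch {
  std::string Interface;               // "" means id
  std::vector<std::string> Protocols;  // sorted on name, no duplicates

public:
  ObjCObjectPointerTypeSketch(std::string Iface, std::vector<std::string> Protos)
      : Interface(std::move(Iface)), Protocols(std::move(Protos)) {
    // Canonicalize: sort on protocol name, then drop repeated entries.
    std::sort(Protocols.begin(), Protocols.end());
    Protocols.erase(std::unique(Protocols.begin(), Protocols.end()),
                    Protocols.end());
  }

  // In this sketch the four predicates are mutually exclusive.
  bool isObjCIdType() const { return Interface.empty() && Protocols.empty(); }
  bool isObjCQualifiedIdType() const { return Interface.empty() && !Protocols.empty(); }
  bool isObjCInterfaceType() const { return !Interface.empty() && Protocols.empty(); }
  bool isObjCQualifiedInterfaceType() const {
    return !Interface.empty() && !Protocols.empty();
  }

  const std::vector<std::string> &protocols() const { return Protocols; }
};
```

Because the constructor canonicalizes the protocol list, “id <p2, p1>” and “id <p1, p2, p1>” end up with identical contents, which is what makes uniquing by value cheap.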

The following classes would be deprecated: ObjCQualifiedIdType, ObjCQualifiedInterfaceType. ObjCInterfaceType will still exist, but its usage will be minimal (since you can’t declare a variable/field of ObjCInterfaceType). You can, however, specify an ObjCInterfaceType as an argument to @encode(Interface).

Note that the implicit pointer is a little odd (since it doesn’t directly reflect what the user typed). For example, the type for “Interface *” will contain 0 pointer types and the type for “Interface **” will contain 1 pointer type. While this may seem odd, it is more useful and reflects the common idiom. The fact that the common idiom doesn’t reflect the language syntax is more of an historical artifact (as mentioned above).
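The pointer-counting rule can be stated as a tiny function. This is a hypothetical helper (explicitPointerLayers and ObjCBase are made-up names), assuming the first written “*” after an interface name is folded into the object pointer type while “id” already carries its pointer.

```cpp
#include <cassert>
#include <stdexcept>

// Hypothetical: which base spelling the declarator starts from.
enum class ObjCBase { Id, Interface };

// How many explicit PointerType layers wrap the ObjCObjectPointerType for a
// declarator with `Stars` written '*' characters:
//   "Interface *" -> 0, "Interface **" -> 1, "id" -> 0, "id *" -> 1.
int explicitPointerLayers(ObjCBase Base, int Stars) {
  if (Base == ObjCBase::Id)
    return Stars;  // the pointer in "id" is already implicit
  if (Stars < 1)
    // A bare "Interface" is the ObjCInterfaceType itself (e.g. in
    // @encode(Interface)), not an object pointer.
    throw std::invalid_argument("bare interface is not an object pointer");
  return Stars - 1;  // the first '*' is folded into the object pointer type
}
```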

Since a lot of code depends on the ObjC type classes, I wanted to get some feedback before I start tearing things up:-) The actual conversion will be done in several phases (assuming everyone likes the general direction).

Thanks in advance for any feedback!

snaroff

Hi Steve,

I personally like the idea of keeping these classes around and having them subclass ObjCObjectPointerType. This allows one to use cast<> when querying for a specific Objective-C object reference. This allows clients that care about the differences between these references to access this information using static typing.

It also seems to me that specific object references may need information that isn’t necessary for the others, (e.g., having the ‘Protocols’ field). Collapsing all object references into a single class that contains the information for all of them seems somewhat retrograde to me and a little inefficient. We also don’t know what new kind of object reference types may come down the line in future revisions of the language, so keeping the modularity (with a common ancestor class) seems cleaner to me.

One thing that isn’t clear in this proposal is how the implicit typedef definition for ‘id’ will work. Will ‘id x’ resolve to a declaration for ‘x’ that has an ObjCObjectPointerType? What is the type of ‘id’ when we aren’t compiling for Objective-C? (i.e., is it a pointer to a struct, as it is right now).

Ted

Few comments; please see below.

class ObjCObjectPointerType : public Type, public llvm::FoldingSetNode {

// We could use the low-order bits to encode id/Class (which are built-in, not user-defined).
// Alternatively, we could synthesize built-in ObjCInterfaceDecls that correspond to id/Class.
ObjCInterfaceDecl *Decl;

// List of protocols, sorted on protocol name. No protocol is entered more than once.
llvm::SmallVector<ObjCProtocolDecl*, 8> Protocols;

public:
bool isObjCIdType();

Is there going to be isObjCClassType() in the future?

bool isObjCInterfaceType();

This name is confusing to me. Since this AST is for pointers only, shouldn’t it be named something like isObjCInterfacePointerType()?

bool isObjCQualifiedIdType();
bool isObjCQualifiedInterfaceType();

Same as my last question.


};

The following classes would be deprecated: ObjCQualifiedIdType, ObjCQualifiedInterfaceType. ObjCInterfaceType will still exist, but its usage will be minimal (since you can’t declare a variable/field of ObjCInterfaceType). You can, however, specify an ObjCInterfaceType as an argument to @encode(Interface).

You are keeping ObjCInterfaceType for stand-alone interfaces (and presumably a variant for qualified interface types).
If so, it is not clear to me how you do a type conversion from ObjCObjectPointerType to ObjCInterfaceType when the user asks for it.
I know that it is rare, but it can happen, as in the following test case:

@interface I @end
I *pi;
int main()
{
  return sizeof (*pi);
}

I guess a more general question is: does a pointer to ObjCInterfaceType conform to ObjCObjectPointerType?
- Fariborz

I personally like the idea of keeping these classes around and having them subclass ObjCObjectPointerType. This allows one to use cast<> when querying for a specific Objective-C object reference. This allows clients that care about the differences between these references to access this information using static typing.

We can certainly have more classes; however, the original classes don’t fit well with having one class that represents ids, Interfaces, and (when the time comes) Class.

Here are two alternatives:

// Two classes. In this case, ObjCObjectPointerType is concrete (and can represent id, Interface, and Class).

class ObjCObjectPointerType : public Type { … };

// Represents “id<…>, Interface<…>, and Class<…>”.
class ObjCQualifiedObjectPointerType : public ObjCObjectPointerType, public llvm::FoldingSetNode { … };

// Seven classes. In this case, ObjCObjectPointerType is abstract.

class ObjCObjectPointerType : public Type { … };

class ObjCIdType : public ObjCObjectPointerType { … };
class ObjCInterfacePointerType : public ObjCObjectPointerType { … };
class ObjCClassType : public ObjCObjectPointerType { … };

class ObjCQualifiedIdType : public ObjCIdType, public llvm::FoldingSetNode { … };

class ObjCQualifiedInterfacePointerType : public ObjCInterfacePointerType, public llvm::FoldingSetNode { … };

class ObjCQualifiedClassType : public ObjCClassType, public llvm::FoldingSetNode { … };

I think having a common base class (abstract or not) makes either of these more appealing than what we have now.
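A compressed sketch of the seven-class alternative, assuming LLVM-style RTTI: each class has a static classof() driven by a kind tag, so a client can ask for exactly the category it cares about. Only four of the seven classes are shown (the Class/qualified-Class pair would follow the same pattern), the isa/dyn_cast helpers are minimal stand-ins rather than LLVM's own templates, and the class names are the illustrative ones from this thread.

```cpp
#include <cassert>
#include <string>
#include <utility>
#include <vector>

// Kind tag used for LLVM-style RTTI (classof) in this sketch.
enum class ObjCKind { Id, QualifiedId, InterfacePtr, QualifiedInterfacePtr };

// Common base: cannot be instantiated directly (protected constructor).
class ObjCObjectPointerType {
  ObjCKind Kind;

protected:
  explicit ObjCObjectPointerType(ObjCKind K) : Kind(K) {}

public:
  virtual ~ObjCObjectPointerType() = default;
  ObjCKind getKind() const { return Kind; }
  static bool classof(const ObjCObjectPointerType *) { return true; }
};

class ObjCIdType : public ObjCObjectPointerType {
protected:
  explicit ObjCIdType(ObjCKind K) : ObjCObjectPointerType(K) {}

public:
  ObjCIdType() : ObjCObjectPointerType(ObjCKind::Id) {}
  static bool classof(const ObjCObjectPointerType *T) {
    return T->getKind() == ObjCKind::Id || T->getKind() == ObjCKind::QualifiedId;
  }
};

class ObjCQualifiedIdType : public ObjCIdType {
  std::vector<std::string> Protocols;  // only qualified types carry protocols

public:
  explicit ObjCQualifiedIdType(std::vector<std::string> P)
      : ObjCIdType(ObjCKind::QualifiedId), Protocols(std::move(P)) {}
  const std::vector<std::string> &protocols() const { return Protocols; }
  static bool classof(const ObjCObjectPointerType *T) {
    return T->getKind() == ObjCKind::QualifiedId;
  }
};

class ObjCInterfacePointerType : public ObjCObjectPointerType {
  std::string Interface;

protected:
  ObjCInterfacePointerType(ObjCKind K, std::string I)
      : ObjCObjectPointerType(K), Interface(std::move(I)) {}

public:
  explicit ObjCInterfacePointerType(std::string I)
      : ObjCInterfacePointerType(ObjCKind::InterfacePtr, std::move(I)) {}
  const std::string &getInterface() const { return Interface; }
  static bool classof(const ObjCObjectPointerType *T) {
    return T->getKind() == ObjCKind::InterfacePtr ||
           T->getKind() == ObjCKind::QualifiedInterfacePtr;
  }
};

class ObjCQualifiedInterfacePointerType : public ObjCInterfacePointerType {
  std::vector<std::string> Protocols;

public:
  ObjCQualifiedInterfacePointerType(std::string I, std::vector<std::string> P)
      : ObjCInterfacePointerType(ObjCKind::QualifiedInterfacePtr, std::move(I)),
        Protocols(std::move(P)) {}
  const std::vector<std::string> &protocols() const { return Protocols; }
  static bool classof(const ObjCObjectPointerType *T) {
    return T->getKind() == ObjCKind::QualifiedInterfacePtr;
  }
};

// Minimal stand-ins for llvm::isa / llvm::dyn_cast (not LLVM's implementations).
template <typename To, typename From> bool isa(const From *V) {
  return To::classof(V);
}
template <typename To, typename From> const To *dyn_cast(const From *V) {
  return isa<To>(V) ? static_cast<const To *>(V) : nullptr;
}
```

Note that Protocols lives only on the qualified subclasses, which is the modularity point Ted raised; the price is one extra class (and kind value) per category.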

Based on your feedback, it seems like you prefer having 7 classes that model the various ObjC types. True?

btw…the names are for illustrative purposes.

It also seems to me that specific object references may need information that isn’t necessary for the others, (e.g., having the ‘Protocols’ field). Collapsing all object references into a single class that contains the information for all of them seems somewhat retrograde to me and a little inefficient. We also don’t know what new kind of object reference types may come down the line in future revisions of the language, so keeping the modularity (with a common ancestor class) seems cleaner to me.

Since we unique types, the space efficiency didn’t concern me. Nevertheless, I understand your point on modularity. Consider this though…at the moment, we have one class that represents all the built-in C types. By analogy, we could have decided to have many subclasses (e.g. BuiltinCharType, BuiltinIntType, BuiltinFloatType, BuiltinBoolType, etc.). Instead of having a boatload of classes, we have a boatload of predicates on Type. That said, I think having is/getAs hooks on Type is an effective way to reduce the complexity of the class hierarchy (especially when the differences don’t matter in many places).
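To illustrate the uniquing argument, here is a sketch of a context that hands out exactly one node per distinct (interface, protocol set) combination, so the per-node size is paid once per spelling rather than once per use. It uses a std::map keyed on the canonical pair in place of llvm::FoldingSet; TypeContextSketch and ObjCObjPtrNode are hypothetical names.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <map>
#include <memory>
#include <string>
#include <utility>
#include <vector>

// Hypothetical uniqued type node: one instance per distinct spelling.
struct ObjCObjPtrNode {
  std::string Interface;               // "" stands for id in this sketch
  std::vector<std::string> Protocols;  // canonical: sorted, no duplicates
};

// Stand-in for the FoldingSet-based uniquing done by ASTContext; here a
// std::map keyed on the canonical (interface, protocols) pair.
class TypeContextSketch {
  using Key = std::pair<std::string, std::vector<std::string>>;
  std::map<Key, std::unique_ptr<ObjCObjPtrNode>> Uniqued;

public:
  const ObjCObjPtrNode *getObjCObjectPointerType(std::string Iface,
                                                 std::vector<std::string> Protos) {
    std::sort(Protos.begin(), Protos.end());
    Protos.erase(std::unique(Protos.begin(), Protos.end()), Protos.end());
    Key K(Iface, Protos);
    auto It = Uniqued.find(K);
    if (It != Uniqued.end())
      return It->second.get();  // already built: share the existing node
    auto Node = std::make_unique<ObjCObjPtrNode>(
        ObjCObjPtrNode{std::move(Iface), std::move(Protos)});
    const ObjCObjPtrNode *Result = Node.get();
    Uniqued.emplace(std::move(K), std::move(Node));
    return Result;
  }

  std::size_t size() const { return Uniqued.size(); }
};
```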

One thing that isn’t clear in this proposal is how the implicit typedef definition for ‘id’ will work. Will ‘id x’ resolve to a declaration for ‘x’ that has an ObjCObjectPointerType? What is the type of ‘id’ when we aren’t compiling for Objective-C? (i.e., is it a pointer to a struct, as it is right now).

Good question. The current scheme of modeling ‘id’ as a typedef was largely done to unify it with how C code works (i.e. the lowest common denominator). This isn’t necessary though. In the new proposal, ‘id x’ will resolve to an ObjCObjectPointerType (as you suggest). The type of ‘id’ when compiling for C code will be a typedef (whose definition is in <objc.h>). The ObjCObjectPointerType would abstract you from understanding the details of the ‘id’ typedef.

snaroff

Based on your feedback, it seems like you prefer having 7 classes that model the various ObjC types. True?

Interesting. The 7 class approach is nice in that it clearly represents the different categories of Objective-C object reference types. It also seems like a lot of classes, but it is conceptually clean. Most of the time clients would just use ObjCObjectPointerType, fewer clients would use ObjCQualifiedInterfacePointerType, and fewer would use the rest of the classes.

Are there cases in Sema that would be easier to write using the 7 class approach than the 2 class approach?

Since we unique types, the space efficiency didn’t concern me. Nevertheless, I understand your point on modularity. Consider this though…at the moment, we have one class that represents all the built-in C types. By analogy, we could have decided to have many subclasses (e.g. BuiltinCharType, BuiltinIntType, BuiltinFloatType, BuiltinBoolType, etc.).

True, but the built-in types for C are bounded by the language standard. The Objective-C class hierarchy is defined by frameworks and headers. The latter seems a couple of orders of magnitude larger.

Also, the built-in types aren’t really parametric, like id<…> and Class<…>, which represent a family of types rather than a specific type. I also don’t consider 7 classes to be a boatload of classes.

Instead of having a boatload of classes, we have a boatload of predicates on Type. That said, I think having is/getAs hooks on Type is an effective way to reduce the complexity of the class hierarchy (especially when the differences don’t matter in many places).

It’s also conceptually muddy to me to throw all the bits that any object pointer type would want to use into a single class. It can create cases where the meaning of different instance variables becomes conflated depending on the kind of object pointer type one is trying to represent.

Good question. The current scheme of modeling ‘id’ as a typedef was largely done to unify it with how C code works (i.e. the lowest common denominator). This isn’t necessary though. In the new proposal, ‘id x’ will resolve to an ObjCObjectPointerType (as you suggest). The type of ‘id’ when compiling for C code will be a typedef (whose definition is in <objc.h>). The ObjCObjectPointerType would abstract you from understanding the details of the ‘id’ typedef.

Right. So in the case of -x objective-c, the frontend will just have to magically handle the typedef definition for ‘id’ if it encounters it?

Are there cases in Sema that would be easier to write using the 7 class approach than the 2 class approach?

I’ll have to look. I’m warming up to the 7 classes we’ve sketched. The common base gives us the simplicity we are shooting for (without putting too much into one class, which may have been too simplistic).

Also, the built-in types aren’t really parametric, like id<…> and Class<…>, which represent a family of types rather than a specific type. I also don’t consider 7 classes to be a boatload of classes.

I agree…didn’t mean to dramatize:-)

It’s also conceptually muddy to me to throw all the bits that any object pointer type would want to use into a single class. It can create cases where the meaning of different instance variables becomes conflated depending on the kind of object pointer type one is trying to represent.

I agree, the cleaner solution is to have an explicit interface for both ‘id’ and ‘Class’.

Right. So in the case of -x objective-c, the frontend will just have to magically handle the typedef definition for ‘id’ if it encounters it?

Yep.

snaroff

Is there going to be isObjCClassType() in the future?

Sure…

bool isObjCInterfaceType();

This name is confusing to me. Since this AST is for pointers only, shouldn’t it be named something like isObjCInterfacePointerType()?

That’s fine with me…

You are keeping ObjCInterfaceType for stand-alone interfaces (and presumably a variant for qualified interface types).
If so, it is not clear to me how you do a type conversion from ObjCObjectPointerType to ObjCInterfaceType when the user asks for it.
I know that it is rare, but it can happen, as in the following test case:

@interface I @end
I *pi;
int main()
{
  return sizeof (*pi);
}

I guess a more general question is: does a pointer to ObjCInterfaceType conform to ObjCObjectPointerType?

Great example. I think ObjCObjectPointerType would implement getPointeeType(), which would return an ObjCInterfaceType. This is what BlockPointerType and MemberPointerType do.

Sema::CheckIndirectionOperand() will need to allow for this, since the “*” will be implicit.
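A sketch of how that might fit together, using a hypothetical miniature type model (TypeNode and the other *Node classes are made up for this example, and checkIndirectionOperand only mimics the role of Sema::CheckIndirectionOperand). For “I *pi”, dereferencing goes through getPointeeType() and yields the interface type, which is what sizeof (*pi) needs. Treating “*” on a plain “id” as an error is an assumption of the sketch, not something decided in this thread.

```cpp
#include <cassert>
#include <string>
#include <utility>

// Hypothetical miniature type model (not clang's classes).
struct TypeNode {
  virtual ~TypeNode() = default;
};

// The bare interface type "I", as used by sizeof(*pi) or @encode(I).
struct InterfaceTypeNode : TypeNode {
  std::string Name;
  explicit InterfaceTypeNode(std::string N) : Name(std::move(N)) {}
};

// An ordinary C pointer type.
struct PointerTypeNode : TypeNode {
  const TypeNode *Pointee;
  explicit PointerTypeNode(const TypeNode *P) : Pointee(P) {}
  const TypeNode *getPointeeType() const { return Pointee; }
};

// "I *" with the '*' folded in; a null interface encodes id here.
struct ObjCObjectPointerNode : TypeNode {
  const InterfaceTypeNode *Iface;
  explicit ObjCObjectPointerNode(const InterfaceTypeNode *I) : Iface(I) {}
  // As with block and member pointers, the pointee is exposed directly:
  // for "I *" it is the interface type, even though no PointerType exists.
  const TypeNode *getPointeeType() const { return Iface; }
};

// Stand-in for the indirection check: because the '*' is implicit, an ObjC
// object pointer is accepted alongside ordinary pointers. Returns the type
// of '*E', or nullptr to signal an error.
const TypeNode *checkIndirectionOperand(const TypeNode *OpTy) {
  if (auto *PT = dynamic_cast<const PointerTypeNode *>(OpTy))
    return PT->getPointeeType();
  if (auto *OPT = dynamic_cast<const ObjCObjectPointerNode *>(OpTy))
    return OPT->getPointeeType();  // null for id: rejected in this sketch
  return nullptr;
}
```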

Make sense?

snaroff

Yes. There is still the question of what to do when the declarations of ‘id’/‘Class’ are seen in objc.h, since you are going to
treat them as first-class types. One possible solution is to #ifdef them out for clang, as they have outlived their
usefulness.

- Fariborz

We can just continue to treat them magically; we already have code for
doing that (see Sema::MergeTypeDefDecl).

-Eli

In the current scheme of things, both the old and new 'id's are typedefs. I am not sure
how we want to treat a built-in type being redeclared as a typedef, other than
ignoring one in favor of the other.

- Fariborz

As far as I can tell, that doesn't change in this scheme: typedefs are
our standard way of introducing names of builtin types into the
translation unit. We'd just be typedef'ing "id" to an ObjCIdType
rather than a pointer to a struct objc_object.

The only tricky thing to get right is if we allow code to reference
"struct objc_object" directly; I think we can take care of that with a
check in ASTContext::getPointerType, though (essentially, the idea is
to make it impossible to construct a "struct objc_object*").
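A rough sketch of both pieces, with types modeled as spellings to keep it short: the “id” typedef resolves to a built-in ObjCIdType (shown as the placeholder "<ObjCIdType>"), and a getPointerType-style factory refuses to form “struct objc_object *”. ASTContextSketch is a hypothetical class, and returning an empty std::optional is just one way to make that construction impossible.

```cpp
#include <cassert>
#include <map>
#include <optional>
#include <string>

// Hypothetical context; types are modeled as spellings to keep it short, and
// "<ObjCIdType>" is a placeholder for the built-in id type node.
class ASTContextSketch {
  std::map<std::string, std::string> Typedefs;

public:
  ASTContextSketch() {
    // "id" names the built-in ObjCIdType instead of "struct objc_object *".
    Typedefs["id"] = "<ObjCIdType>";
  }

  std::optional<std::string> lookupTypedef(const std::string &Name) const {
    auto It = Typedefs.find(Name);
    if (It == Typedefs.end())
      return std::nullopt;
    return It->second;
  }

  // getPointerType-style factory that refuses to build "struct objc_object *",
  // so that type can never arise alongside the built-in id.
  std::optional<std::string> getPointerType(const std::string &Pointee) const {
    if (Pointee == "struct objc_object")
      return std::nullopt;
    return Pointee + " *";
  }
};
```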

-Eli

I am not sure anyone would want to do this, but the following code is legal:
#include <objc/objc.h>
id pi;
Class min()
{
   return pi->isa;
}

- Fariborz

That doesn't compile on my computer ("error: ‘struct objc_object’ has
no member named ‘isa’").

In any case, we can support this sort of construct with explicit
checks in Sema::ActOnMemberReferenceExpr and
Sema::CheckIndirectionOperand.

-Eli

As far as I can tell, that doesn't change in this scheme: typedefs are
our standard way of introducing names of builtin types into the
translation unit. We'd just be typedef'ing "id" to an ObjCIdType
rather than a pointer to a struct objc_object.

The only tricky thing to get right is if we allow code to reference
"struct objc_object" directly; I think we can take care of that with a
check in ASTContext::getPointerType, though (essentially, the idea is
to make it impossible to construct a "struct objc_object*").

The default definition of id in ASTContext is fall-back code which is only important if the program does not include headers that define this type. The only Objective-C code I've ever seen where this is the case is the clang testsuite. In the real world it never happens; you always include the header provided by whichever runtime you happen to be using (most commonly indirectly via Foundation.h) which defines id, SEL, Class, and IMP.

Constructing a struct objc_object* does happen in real code, but only if you have included the header, created a struct objc_object and taken its address. GCC does accept this code:

#import <objc/Object.h>

int main(void)
{
     struct objc_object a;
     a.isa = objc_getClass("Object");
     [&a init];
     return 0;
}

This is incredibly ugly, however, and I think it would be marginally less bad if it required an explicit cast of the receiver to an object type to work. If this breaks any existing code, it's code that deserves to have been broken a long time ago. GCC actually accepts some even worse things, like this:

int main(void)
{
     struct {void* isa;} a;
     a.isa = objc_getClass("Object");
     [&a init];
     return 0;
}

This does raise a warning that a is not a valid receiver type (and that init is not known as a selector), but still compiles and runs.

If you are compiling bits of an Objective-C runtime or a similar supporting library then you will come across this kind of thing, but in this code you will already have a lot of explicit casts.

I am not sure if anyone wants to do this, but the following code is legal:
#include <objc/objc.h>
id pi;
Class min()
{
  return pi->isa;
}

This is a much more common idiom. Ideally, however, we would be treating this as an ivar access with a known offset of 0, rather than as a structure field access. This probably ought to generate the same code irrespective of whether pi is an id, an NSObject* or an Object* (although it should give a warning / error on NSObject where isa is declared @protected). Perhaps we should be declaring an implicit Class isa as an ivar on any ObjCIdType, so ObjCIdType defines a pointer to an object with a single ivar which accepts all messages.

David
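A minimal sketch of the isa-as-ivar idea, using hypothetical Interface, Ivar and lookupIvar names. A member lookup of isa on id resolves to an implicit public ivar at offset 0. The same lookup on an interface finds its declared ivar, which is also at offset 0, and flags an access error when that ivar is @protected, as with NSObject. Access checking is simplified by assuming the lookup happens outside any @implementation.

```cpp
#include <cassert>
#include <optional>
#include <string>
#include <vector>

enum class Access { Public, Protected, Private };

// Hypothetical ivar record: name, byte offset in the object, visibility.
struct Ivar {
  std::string Name;
  unsigned Offset;
  Access Acc;
};

// Hypothetical interface: just a name and its declared ivars.
struct Interface {
  std::string Name;
  std::vector<Ivar> Ivars;
};

// The implicit ivar every 'id' would carry: public, always at offset 0.
const Ivar ImplicitIdIsa{"isa", 0, Access::Public};

struct MemberLookup {
  std::optional<Ivar> Found;  // empty if no such ivar
  bool AccessError = false;   // true if the ivar is not @public
};

// Resolve "base->Name". Iface == nullptr means base has static type 'id';
// otherwise base has type 'Iface *'. The access is assumed to occur outside
// any @implementation, so @protected and @private ivars are both errors.
MemberLookup lookupIvar(const Interface *Iface, const std::string &Name) {
  MemberLookup R;
  if (!Iface) {
    if (Name == ImplicitIdIsa.Name)
      R.Found = ImplicitIdIsa;
    return R;
  }
  for (const Ivar &I : Iface->Ivars) {
    if (I.Name == Name) {
      R.Found = I;
      R.AccessError = I.Acc != Access::Public;
      break;
    }
  }
  return R;
}
```

Because the offset is 0 in every case, code generation for pi->isa can be identical whether pi is an id or a pointer to a concrete root class. Only the access diagnostic differs.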


This sounds good to me.

Consider the following pseudo code (for discussion…not meant to be complete):

@class Class, Protocol;

@interface id // ObjCIdType: The implicit root of all ObjC objects.
{
  Class isa;
}
// All ObjC roots will inherit the following class methods.
// This will greatly simplify defining an ObjC root class.

+ (Class)class;
+ (Class)superclass;
+ (BOOL)instancesRespondToSelector:(SEL)aSelector;
+ (BOOL)conformsToProtocol:(Protocol *)protocol;
+ (IMP)instanceMethodForSelector:(SEL)aSelector;

- (BOOL)isKindOfClass:(Class)aClass;
- (BOOL)isMemberOfClass:(Class)aClass;
- (BOOL)conformsToProtocol:(Protocol *)aProtocol;
- (BOOL)respondsToSelector:(SEL)aSelector;
@end

@interface Class // ObjCClassType: Will implicitly inherit from ‘id’

// No explicit ivar declarations, the structure of ‘Class’ is intentionally opaque (since the details are runtime specific).

- (Class)class;
- (Class)superclass;
- (BOOL)instancesRespondToSelector:(SEL)aSelector;
- (BOOL)conformsToProtocol:(Protocol *)protocol;
- (IMP)instanceMethodForSelector:(SEL)aSelector;

@end

Conceptually, I like this…it makes some of the current legacy C ‘magic’ much more explicit.

It does, however, require another bit of magic. Since the current definitions of ‘id’ and ‘Class’ include a pointer, we will have to treat them specially (and not require an explicit ‘*’ as we do for user-defined classes). Shouldn’t be a big deal, but I wanted to mention it. One day, maybe we can make the ‘*’ implicit for user-defined classes :-)
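A toy sketch of that special case, assuming the base name is already known to name an Objective-C class type and ignoring protocol qualifiers. 'id' and 'Class' contribute one implicit pointer level, while user-defined classes still need the explicit '*'. The enum and function names are hypothetical.

```cpp
#include <cassert>
#include <string>

// Hypothetical result of resolving an object type written as Base followed
// by Stars explicit '*' tokens.
enum class Resolved {
  ObjectByValue,          // "NSView": rejected, objects are not values
  ObjectPointer,          // "id", "Class", "NSView *"
  PointerToObjectPointer  // "id *", "NSView **", or deeper
};

// Assumes Base already names an Objective-C class type.
Resolved resolveObjectType(const std::string &Base, unsigned Stars) {
  // 'id' and 'Class' carry their pointer implicitly; user classes do not.
  bool ImplicitPointer = Base == "id" || Base == "Class";
  unsigned Depth = Stars + (ImplicitPointer ? 1u : 0u);
  if (Depth == 0)
    return Resolved::ObjectByValue;
  if (Depth == 1)
    return Resolved::ObjectPointer;
  return Resolved::PointerToObjectPointer;
}
```

Making the '*' implicit for all classes would amount to treating every class name like 'id' here, which is exactly why existing "NSView *" declarations would change meaning.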

The other issue is source compatibility with existing root classes (that explicitly declare an ‘isa’, for example). This could be handled in many ways…not worth talking about until we agree on the concept/direction.

What do you think?

snaroff
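A rough sketch of what "all ObjC roots will inherit the following methods" could mean for compile-time method lookup. It assumes a hypothetical InterfaceDecl holding a flat set of selector names. Once the superclass chain is exhausted, lookup falls back to the implicit 'id' root. Only the instance-side selectors from the pseudo-code are modeled.

```cpp
#include <cassert>
#include <set>
#include <string>

// Hypothetical interface declaration: its superclass (nullptr for a root
// class) and the instance selectors it declares.
struct InterfaceDecl {
  std::string Name;
  const InterfaceDecl *Super;
  std::set<std::string> Methods;
};

// Instance-side methods the pseudo-code places on the implicit root 'id'.
const std::set<std::string> ImplicitRootMethods = {
    "isKindOfClass:", "isMemberOfClass:", "conformsToProtocol:",
    "respondsToSelector:"};

// Walk the superclass chain; past the declared root, consult the implicit
// root, so every root class behaves as if it inherited from 'id'.
bool hasInstanceMethod(const InterfaceDecl *I, const std::string &Sel) {
  for (; I; I = I->Super)
    if (I->Methods.count(Sel))
      return true;
  return ImplicitRootMethods.count(Sel) != 0;
}
```

A root class that declares none of these methods would still answer respondsToSelector: at compile time, which is the simplification the pseudo-code aims for.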


Consider the following pseudo code (for discussion...not meant to be complete):

@class Class, Protocol;

@interface id // ObjCIdType: The implicit root of all ObjC objects.
{
  Class isa;
}
// All ObjC roots will inherit the following class methods.
// This will greatly simplify defining an ObjC root class.

Can you clarify what you mean by this? Not all root classes implement these methods (e.g. Object does not; it implements equivalent ones with different names). Are you proposing that we synthesize these methods? I'm not totally opposed to this, although it will require a lot of runtime-specific code.

@interface Class // ObjCClassType: Will implicitly inherit from 'id'

// No explicit ivar declarations, the structure of 'Class' is intentionally opaque (since the details are runtime specific).

- (Class)class;
- (Class)superclass;
- (BOOL)instancesRespondToSelector:(SEL)aSelector;
- (BOOL)conformsToProtocol:(Protocol *)protocol;
- (IMP)instanceMethodForSelector:(SEL)aSelector;

@end

Conceptually, I like this...it makes some of the current legacy C 'magic' much more explicit.

It's definitely a nicer model conceptually.

It does, however, require another bit of magic. Since the current definitions of 'id' and 'Class' include a pointer, we will have to treat them specially (and not require an explicit '*' as we do for user-defined classes). Shouldn't be a big deal, but I wanted to mention it. One day, maybe we can make the '*' implicit for user-defined classes :-)

Well, it would break almost all existing code, but definitely be nicer syntax. I'm not sure if there's a way of gradually introducing it; even if you make it per-compilation-unit it's going to be a bit confusing for people reading the code.

If only we had a nice rewriter that could take Objective-C code and emit slightly-modified code...

The other issue is source compatibility with existing root classes (that explicitly declare an 'isa', for example). This could be handled in many ways...not worth talking about until we agree on the concept/direction.

This is my biggest concern. Ignoring the first ivar in a root class if it's an id and is called isa is pretty trivial, but I wonder how we will cope with existing Objective-C headers.

The GNU headers, for example, have a silly definition of id, which is (roughly):

typedef struct objc_object { Class class_pointer; } *id;

What happens with this model when someone does:

id a = whatever;
a->class_pointer = something;

If we are treating id as an object with a public isa pointer, then this code will break, and including the GNU header is likely to cause conflicts too.
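A sketch of the "pretty trivial" rule mentioned above, assuming it applies only to the first ivar of a root class and only when that ivar is object-typed (id, Class, or another object pointer). It also shows the limitation raised here: a class pointer declared under any other name (the hypothetical test data borrows the GNU field name class_pointer) is not recognized and stays in the explicit layout.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical ivar declaration as written in a class's @interface.
struct IvarDecl {
  std::string Name;
  bool IsObjectType;  // declared as id, Class, or another object pointer
};

// Assumed rule: in a root class, a leading object-typed ivar named "isa" is
// taken to be the implicit isa inherited from 'id' and dropped from the
// explicit layout. Everything else is returned unchanged.
std::vector<IvarDecl> explicitIvars(bool IsRootClass,
                                    const std::vector<IvarDecl> &Ivars) {
  if (IsRootClass && !Ivars.empty() && Ivars.front().Name == "isa" &&
      Ivars.front().IsObjectType)
    return std::vector<IvarDecl>(Ivars.begin() + 1, Ivars.end());
  return Ivars;
}
```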

What do you think?

Very nice in theory, but I'd like to know how we deal with the irritating corner-cases.

David


How about eight classes?

// Eight classes. In this case, both ObjCObjectPointerType and ObjCProtocolQualifierNode are abstract.

class ObjCObjectPointerType : public Type { … };

class ObjCIdType : public ObjCObjectPointerType { … };
class ObjCInterfacePointerType : public ObjCObjectPointerType { … };
class ObjCClassType : public ObjCObjectPointerType { … };

class ObjCProtocolQualifierNode : public llvm::FoldingSetNode { … };

class ObjCQualifiedIdType : public ObjCIdType, public ObjCProtocolQualifierNode { … };
class ObjCQualifiedInterfacePointerType : public ObjCInterfacePointerType, public ObjCProtocolQualifierNode { … };
class ObjCQualifiedClassType : public ObjCClassType, public ObjCProtocolQualifierNode { … };

This composition enables the “protocol-qualified” concept to be abstract too.

(I don’t know the clang codebase well enough to give ObjCProtocolQualifierNode a good name.)

– Chris
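A compilable approximation of the eight-class hierarchy above. It replaces llvm::FoldingSetNode with an empty stub and gives the classes just enough hypothetical state (an interface name, a protocol list) to demonstrate the composition point: one query recognizes all three protocol-qualified kinds without naming any of them. The class bodies are illustrative guesses, not clang's code. The sketch uses C++ dynamic_cast for brevity, whereas clang itself relies on LLVM-style isa<>/dyn_cast<> rather than C++ RTTI.

```cpp
#include <cassert>
#include <string>
#include <utility>
#include <vector>

// Empty stand-in for llvm::FoldingSetNode so this compiles without LLVM.
struct FoldingSetNodeStub {};

struct ObjCProtocolDecl {
  std::string Name;
};
using ProtocolList = std::vector<const ObjCProtocolDecl *>;

class Type {
public:
  virtual ~Type() = default;
};

// Common base of all object pointer types; the protected constructor keeps
// it from being instantiated directly.
class ObjCObjectPointerType : public Type {
protected:
  ObjCObjectPointerType() = default;
};

class ObjCIdType : public ObjCObjectPointerType {};
class ObjCClassType : public ObjCObjectPointerType {};

class ObjCInterfacePointerType : public ObjCObjectPointerType {
  std::string InterfaceName;
public:
  explicit ObjCInterfacePointerType(std::string N)
      : InterfaceName(std::move(N)) {}
  const std::string &getInterfaceName() const { return InterfaceName; }
};

// Mixin holding the <p1, p2> list; likewise only usable as a base.
class ObjCProtocolQualifierNode : public FoldingSetNodeStub {
  ProtocolList Protocols;
protected:
  explicit ObjCProtocolQualifierNode(ProtocolList Ps)
      : Protocols(std::move(Ps)) {}
public:
  const ProtocolList &getProtocols() const { return Protocols; }
};

class ObjCQualifiedIdType : public ObjCIdType,
                            public ObjCProtocolQualifierNode {
public:
  explicit ObjCQualifiedIdType(ProtocolList Ps)
      : ObjCProtocolQualifierNode(std::move(Ps)) {}
};

class ObjCQualifiedInterfacePointerType : public ObjCInterfacePointerType,
                                          public ObjCProtocolQualifierNode {
public:
  ObjCQualifiedInterfacePointerType(std::string N, ProtocolList Ps)
      : ObjCInterfacePointerType(std::move(N)),
        ObjCProtocolQualifierNode(std::move(Ps)) {}
};

class ObjCQualifiedClassType : public ObjCClassType,
                               public ObjCProtocolQualifierNode {
public:
  explicit ObjCQualifiedClassType(ProtocolList Ps)
      : ObjCProtocolQualifierNode(std::move(Ps)) {}
};

// One query covers all three qualified kinds; nullptr means "unqualified".
const ObjCProtocolQualifierNode *getProtocolQualifiers(const Type *T) {
  return dynamic_cast<const ObjCProtocolQualifierNode *>(T);
}
```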

Hi David,

Some responses below...


Consider the following pseudo code (for discussion...not meant to be complete):

@class Class, Protocol;

@interface id // ObjCIdType: The implicit root of all ObjC objects.
{
Class isa;
}
// All ObjC roots will inherit the following class methods.
// This will greatly simplify defining an ObjC root class.

Can you clarify what you mean by this? Not all root classes implement these methods (e.g. Object does not; it implements equivalent ones with different names). Are you proposing that we synthesize these methods? I'm not totally opposed to this, although it will require a lot of runtime-specific code.

All I'm saying is we try and codify/standardize the root implied by the ObjC language (without adding any additional runtime requirements). For example, the ObjC language requires dynamic typing, implemented by having an 'isa' and methods to access the class/metaclass objects synthesized by the compiler. From my perspective, the "core" methods should be standardized.

Since we didn't do this originally, we are (unfortunately) left with both NSObject and Object (with lots of conceptual overlap). As compiler writers, I'm not saying we are entirely responsible for defining this API (it would have to be done in collaboration with runtime/library folks). I've spoken with Apple's runtime lead and he is supportive of the general direction.

@interface Class // ObjCClassType: Will implicitly inherit from 'id'

// No explicit ivar declarations, the structure of 'Class' is intentionally opaque (since the details are runtime specific).

- (Class)class;
- (Class)superclass;
- (BOOL)instancesRespondToSelector:(SEL)aSelector;
- (BOOL)conformsToProtocol:(Protocol *)protocol;
- (IMP)instanceMethodForSelector:(SEL)aSelector;

@end

Conceptually, I like this...it makes some of the current legacy C 'magic' much more explicit.

It's definitely a nicer model conceptually.

Glad you agree.

It does, however, require another bit of magic. Since the current definitions of 'id' and 'Class' include a pointer, we will have to treat them specially (and not require an explicit '*' as we do for user-defined classes). Shouldn't be a big deal, but I wanted to mention it. One day, maybe we can make the '*' implicit for user-defined classes :-)

Well, it would break almost all existing code, but definitely be nicer syntax. I'm not sure if there's a way of gradually introducing it; even if you make it per-compilation-unit it's going to be a bit confusing for people reading the code.

If only we had a nice rewriter that could take Objective-C code and emit slightly-modified code...

I could write the rewriter. Since types are uniqued, it's not entirely trivial (but I've dealt with this before in the ObjC->C rewriter).

The other issue is source compatibility with existing root classes (that explicitly declare an 'isa', for example). This could be handled in many ways...not worth talking about until we agree on the concept/direction.

This is my biggest concern. Ignoring the first ivar in a root class if it's an id and is called isa is pretty trivial, but I wonder how we will cope with existing Objective-C headers.

The GNU headers, for example, have a silly definition of id, which is (roughly):

typedef struct objc_object { Class class_pointer; } *id;

What happens with this model when someone does:

id a = whatever;
a->class_pointer = something;

If we are treating id as an object with a public isa pointer, then this code will break, and including the GNU header is likely to cause conflicts too.

I'm not sure about this, but it's actually a good example of why this issue is worth tackling. This kind of source incompatibility between the Apple/GNU runtimes is kind of silly.

What do you think?

Very nice in theory, but I'd like to know how we deal with the irritating corner-cases.

I'll powwow with Chris and see what he thinks (he was out last week). this level of the ObjC type system without addressing some of the issues outlined in this thread.

snaroff

This sounds good to me.

Consider the following pseudo code (for discussion...not meant to be complete):

@class Class, Protocol;

@interface id // ObjCIdType: The implicit root of all ObjC objects.
{
Class isa;
}
// All ObjC roots will inherit the following class methods.
// This will greatly simplify defining an ObjC root class.

+ (Class)class;

+ (Class)superclass;
+ (BOOL)instancesRespondToSelector:(SEL)aSelector;
+ (BOOL)conformsToProtocol:(Protocol *)protocol;
+ (IMP)instanceMethodForSelector:(SEL)aSelector;

- (BOOL)isKindOfClass:(Class)aClass;
- (BOOL)isMemberOfClass:(Class)aClass;
- (BOOL)conformsToProtocol:(Protocol *)aProtocol;
- (BOOL)respondsToSelector:(SEL)aSelector;
@end

@interface Class // ObjCClassType: Will implicitly inherit from 'id'

// No explicit ivar declarations, the structure of 'Class' is intentionally opaque (since the details are runtime specific).

- (Class)class;
- (Class)superclass;
- (BOOL)instancesRespondToSelector:(SEL)aSelector;
- (BOOL)conformsToProtocol:(Protocol *)protocol;
- (IMP)instanceMethodForSelector:(SEL)aSelector;

@end

Putting aside the question of mixing framework conventions with the language, should instanceMethodForSelector: be here? It’s not part of the <NSObject> protocol, for a very good reason: it’s not meaningful for most proxies.

Hi Jens,

The code was for discussion purposes (not a spec). Just because it isn't meaningful for proxies doesn't mean it shouldn't be part of the core meta-object protocol.

The ability to associate a selector with a pointer to a method is kind of fundamental (and from my perspective makes sense).

snaroff
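A minimal sketch of the selector-to-method-pointer association, in the spirit of instanceMethodForSelector:. ClassRec, Object and the IMP signature are simplified, hypothetical stand-ins for runtime structures. Methods are keyed by selector name, and the lookup walks the superclass chain, returning nullptr when nothing matches.

```cpp
#include <cassert>
#include <string>
#include <unordered_map>

// Hypothetical, simplified stand-ins: an object is just its class pointer,
// and an IMP is a plain function taking the receiver.
struct ClassRec;
struct Object {
  const ClassRec *isa;
};
using IMP = int (*)(Object *self);

struct ClassRec {
  const ClassRec *superclass;                    // nullptr for a root class
  std::unordered_map<std::string, IMP> methods;  // selector name -> IMP
};

// Find the IMP an instance of Cls would run for Sel, searching Cls first and
// then each superclass in turn.
IMP instanceMethodForSelector(const ClassRec *Cls, const std::string &Sel) {
  for (; Cls; Cls = Cls->superclass) {
    auto It = Cls->methods.find(Sel);
    if (It != Cls->methods.end())
      return It->second;
  }
  return nullptr;
}
```

Caching the returned IMP lets a caller invoke the method repeatedly without repeating the lookup, which is the usual reason for asking a class for a method pointer.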