As I said in a previous email, every Objective-C program redefines the built-in types from a header found in the system. Currently, the types for these are hard-coded in clang, for the NeXT runtime, causing compilation to abort with any other runtime (specifically the GNU one - there will be more problems with the Étoilé runtime later since clang assumes selectors to be pointers in quite a few places).
The attached patch allows the definition in the headers to replace the hard-coded definitions, allowing Objective-C programs to compile for the GNU runtime again.
I see. The problem was that now the new id type is being used, but ASTContext doesn't recognise it as the ObjC type. I've tweaked it to set it in ASTContext and removed the assert()s here that check it isn't already set. Note that this will now leak, but I'm not sure what the best way of fixing this is (can we safely delete the old version? Should we lazily create them somewhere if they're referenced before first use (and if so where)?).
The attached version passes all of the tests that it was passing before the patch.
David
P.S. I also see a failure in typedef-redef.c, which I didn't cause:
Errors seen but not expected:
Line 5: typedef redefinition with different types ('unsigned long' vs '__size_t')
Line 55: previous definition is here
I not aware of any objc file that does not include at least an header with the definition of the objc runtime base types (id, SEL, BOOL, …).
Why where these types hard-coded in clang first ?
That was the question I asked last week. Having received no answer, I presumed that there was some good reason. If there isn't, then we can delete them from Sema.cpp and reinstate the asserts.
This causes several failure in the test suite
(Parser/objc-forcollection-*) if I apply. Can
you investigate?
I see. The problem was that now the new id type is being used, but
ASTContext doesn't recognise it as the ObjC type. I've tweaked it
to set it in ASTContext and removed the assert()s here that check it
isn't already set. Note that this will now leak, but I'm not sure
what the best way of fixing this is (can we safely delete the old
version? Should we lazily create them somewhere if they're
referenced before first use (and if so where)?).
The attached version passes all of the tests that it was passing
before the patch.
I not aware of any objc file that does not include at least an header
with the definition of the objc runtime base types (id, SEL, BOOL, …).
Why where these types hard-coded in clang first ?
That was the question I asked last week. Having received no answer, I
presumed that there was some good reason. If there isn't, then we can
delete them from Sema.cpp and reinstate the asserts.
I looked at your patch and need to dig a bit deeper before I approve it.
The "good reason" clang predefines or hard codes id/SEL/Class/Protocol is gcc does. My original GCC (or clang) implementation(s) didn't do this. Nevertheless, I believe GCC changed this within the past 5 years or so (so I changed clang). Off hand, I don't know why they changed it. It complicates the dance between the compiler and the headers (which is weird). Hacking MergeTypeDefDecl() is unfortunate.
Nevertheless, I don't see how this patch completely solves your problem. When the FIXME to verify the underlying types is implemented, I don't see how this is going to work. Are you going to hack Sema::ActOnTranslationUnitScope() to install different types? (that correspond to headers from your runtime?).
I agree this is a problem, but I'm not sure of the solution yet and
haven't had time to think about it much.
To clarify the issues:
(1) We need to have predefined types for all of id/Class/SEL/Protocol
because they occur in the language (self, _cmd, @protocol, etc).
(2) If these types are subsequently defined in headers, then they need
to bind to the internal types.
(3) At CodeGen, we need provide actual definitions for these types. In
theory we could do a lot of bitcasting to avoid this, but that makes very
ugly IR.
From a design perspective it seems like the predefined types should start
as opaque types in clang. This I believe solves the original issue David
presented, but I think it also means a number of changes to Sema. This
should handle (1) and (2).
For CodeGen, I think we just look and see if the type has been refined. If
so we make sure it is compatible with the real type. If it hasn't been refined,
we provide the definition.
The attached diff fixes a problem with generating code which passes @selector() values as arguments. This is due to the fact that selector type from Sema is a packed structure, while CodeGen is using an unpacked one.
I agree this is a problem, but I'm not sure of the solution yet and
haven't had time to think about it much.
To clarify the issues:
(1) We need to have predefined types for all of id/Class/SEL/Protocol
because they occur in the language (self, _cmd, @protocol, etc).
(2) If these types are subsequently defined in headers, then they need
to bind to the internal types.
(3) At CodeGen, we need provide actual definitions for these types. In
theory we could do a lot of bitcasting to avoid this, but that makes very
ugly IR.
From a design perspective it seems like the predefined types should start
as opaque types in clang. This I believe solves the original issue David
presented, but I think it also means a number of changes to Sema. This
should handle (1) and (2).
Just to be clear, are you talking about truly opaque types or C style opaque types (i.e. "void *").
I'd prefer truly opaque first class types that are directly modeled in the type system. For example, it would make sense if we had Type::TypeClass:{..., ObjCIdType, ObjCSELType, ObjCClassType, ...} when compiling for ObjC. When compiling C code, the types would be obtained from the headers for the particular runtime.
If the above change is too radical:-), then I guess mplementing id/Class/SEL as an opaque type could work (since they are all pointers). Protocols are a bit trickier (since they aren't implemented in terms of pointers). Here is a snippit from /usr/include/objc/runtime.h...
I would prefer the bitcast be in the runtime interface if it is necessary. It
is nicer for the runtime interface functions to simply be specified as returning
Value's of the correct type. In fact, GetSelector already has this in
its comment.
I agree this is a problem, but I'm not sure of the solution yet and
haven't had time to think about it much.
To clarify the issues:
(1) We need to have predefined types for all of id/Class/SEL/Protocol
because they occur in the language (self, _cmd, @protocol, etc).
Yup.
(2) If these types are subsequently defined in headers, then they need
to bind to the internal types.
Agreed.
(3) At CodeGen, we need provide actual definitions for these types. In
theory we could do a lot of bitcasting to avoid this, but that makes very
ugly IR.
Yes. The second patch I sent will actually break this with the Étoilé runtime, since this uses i32 for SEL (they're just an index in a hash table).
From a design perspective it seems like the predefined types should start
as opaque types in clang. This I believe solves the original issue David
presented, but I think it also means a number of changes to Sema. This
should handle (1) and (2).
Yup. I would prefer totally opaque types with no assumptions about whether they're typedefs, pointers, or whatever.
For CodeGen, I think we just look and see if the type has been refined. If
so we make sure it is compatible with the real type. If it hasn't been refined,
we provide the definition.
This makes sense. Ideally we should be calling the methods in ASTContext from the runtime back ends to get these types, and printing some friendly error if they are defined as a type which is not structurally equivalent to what the runtime back end expects.