type-system-rewrite branch near landing

An update on the type-system-rewrite branch (http://llvm.org/viewvc/llvm-project/llvm/branches/type-system-rewrite/):

It's now to the point where it passes all regression tests all of single source (and most of externals/multisource) when using an LLVM 2.9 version of clang to compile programs to a rbc file. I have what looks like one more subtle type mapping bug to track down, which will hopefully resolve the remaining linker failures on multisource/externals. Then I'll update the branch to ToT LLVM again.

Once I'm done with that, I'm getting eager to land the branch on mainline. To do this, I need to update the various projects with buildbots that are going to break, specifically:

1. Clang - Jay, do you have a patch for this? Can you create a branch of the clang repo or send an updated version of the patch to the list?

2. llvm-gcc - While I'd like for it to be dead, :slight_smile: I've been asked to prolong its life a little longer. I'll look into this.

3. dragonegg - I don't have any way to build or test this, so I can't update it. Is anyone willing to tackle this? The updates should be relatively straight-forward and I'm happy to help answer any questions.

I'd really like get this done and landed by this weekend. I have a very large collection of follow-on cleanups that can also happen once the main branch lands.

-Chris

1. Clang - Jay, do you have a patch for this?

Yes. It's good enough to build most of LLVM+Clang, except for a couple
of files. But I'm running out of time and expertise to be able to fix
the remaining bits. Some specific concerns:

1. Many Objective-C(++) tests fail, because they use implicitly
defined structs for various ObjC runtime data structures; the
ASTConsumer HandleTagDeclDefinition callback is never called for these
structs, which means that I don't bother to lay them out, so they only
ever get an opaque LLVM type. Then we get assertion failures when
generating code to access fields in these structs.

2. Even some simple C cases fail, e.g.:

struct S;
extern struct T {
  struct S (*p)(void);
} t;
struct S { int i; };
void g(void) {
  t.p();
}

The problem here is that T.p is codegenned as i8*, so we need to
bitcast it to a proper function pointer type before calling it.
Shouldn't be too hard to implement.

3. There are other CodeGen tests that fail with assertion failures,
which I haven't investigated. (I'm starting to wonder if I went too
far when I ripped out all the PointersToResolve /
HandleLateResolvedPointers() stuff from CodeGenTypes.cpp.)

4. A bunch of the CodeGen tests need adjusting because we're
generating slightly different IR from what they expect.

Can you create a branch of the clang repo or send an updated version of the patch to the list?

I'd be very surprised if I have the authority to create a branch,
given things like:

  ~/svn/llvm-project/llvm/branches$ svn up
  svn: Server sent unexpected return value (403 Forbidden) in response
to REPORT request for '/svn/llvm-project/!svn/vcc/default'

... so I've attached the patch.

Thanks,
Jay.

clang-type-system-rewrite.diff (138 KB)

1. Clang - Jay, do you have a patch for this?

Yes. It's good enough to build most of LLVM+Clang, except for a couple
of files. But I'm running out of time and expertise to be able to fix
the remaining bits. Some specific concerns:

1. Many Objective-C(++) tests fail, because they use implicitly
defined structs for various ObjC runtime data structures; the
ASTConsumer HandleTagDeclDefinition callback is never called for these
structs, which means that I don't bother to lay them out, so they only
ever get an opaque LLVM type. Then we get assertion failures when
generating code to access fields in these structs.

Is there some reason we can't just layout types lazily when requested,
like the old code does? Should save some work in common cases as well
as implicitly solve any issues here.

2. Even some simple C cases fail, e.g.:

struct S;
extern struct T {
struct S (*p)(void);
} t;
struct S { int i; };
void g(void) {
t.p();
}

The problem here is that T.p is codegenned as i8*, so we need to
bitcast it to a proper function pointer type before calling it.
Shouldn't be too hard to implement.

I don't really like the idea that function types, and especially enum
types, will permanently be associated with an incomplete type.
There's too much room for something to screw up, particularly for the
enum case (which probably won't get tested very well).

I think if you keep track of all the types based on an incomplete tag
type, and just wipe them all out when the type is completed,
everything should just work; a type can't be completed in the middle
of the CodeGen for a function, and we bitcast all globals before use,
so there's nothing else to change.

3. There are other CodeGen tests that fail with assertion failures,
which I haven't investigated. (I'm starting to wonder if I went too
far when I ripped out all the PointersToResolve /
HandleLateResolvedPointers() stuff from CodeGenTypes.cpp.)

Hmm... there shouldn't be anything you need to handle there, I think.

4. A bunch of the CodeGen tests need adjusting because we're
generating slightly different IR from what they expect.

Can you create a branch of the clang repo or send an updated version of the patch to the list?

I'd be very surprised if I have the authority to create a branch,
given things like:

~/svn/llvm-project/llvm/branches$ svn up
svn: Server sent unexpected return value (403 Forbidden) in response
to REPORT request for '/svn/llvm-project/!svn/vcc/default'

I think that explicitly returns Forbidden just to avoid the strain on
the server; if you have any write access, you should have write access
to everything.

-Eli

FWIW, noone is allowed to check out "all" branches. You have to pick one.

-Chris

1. Many Objective-C(++) tests fail, because they use implicitly
defined structs for various ObjC runtime data structures; the
ASTConsumer HandleTagDeclDefinition callback is never called for these
structs, which means that I don't bother to lay them out, so they only
ever get an opaque LLVM type. Then we get assertion failures when
generating code to access fields in these structs.

Is there some reason we can't just layout types lazily when requested,
like the old code does? Should save some work in common cases as well
as implicitly solve any issues here.

I couldn't get it working that way, but I can't remember exactly what
the problems were. I'll have another go. It would certainly be less
disruptive if it still worked like that.

struct S;
extern struct T {
struct S (*p)(void);
} t;
struct S { int i; };
void g(void) {
t.p();
}

I don't really like the idea that function types, and especially enum
types, will permanently be associated with an incomplete type.
There's too much room for something to screw up, particularly for the
enum case (which probably won't get tested very well).

I think if you keep track of all the types based on an incomplete tag
type, and just wipe them all out when the type is completed,
everything should just work; a type can't be completed in the middle
of the CodeGen for a function, and we bitcast all globals before use,
so there's nothing else to change.

So &t would get an LLVM type like {i8*}*; but it would be bitcast to
{%struct.S ()*}* before loading p from it? I'll try that out.

Thanks,
Jay.

Hi Jay,

FYI, I merged mainline up to r134683 into the branch, and fixed the last llvm-test failures. I'm good to land it now, except for the small matter of clang/llvm-gcc/dragonegg. :slight_smile:

I'm going to spend tomorrow tackling llvm-gcc. Once I'm done with it I'll take a look at Clang if you haven't beaten me to it.

If no one is interested in updating dragonegg, then I'll have to land the branch without updating it. It should be relatively easy to update dragonegg once llvm-gcc is updated. Worst case, its build can be fixed once Duncan gets back from vacation.

-Chris

~/svn/llvm-project/llvm/branches$ svn up
svn: Server sent unexpected return value (403 Forbidden) in response
to REPORT request for '/svn/llvm-project/!svn/vcc/default'

I think that explicitly returns Forbidden just to avoid the strain on
the server; if you have any write access, you should have write access
to everything.

Off topic, but what really annoys me is:

~/svn/llvm-project/llvm$ svn up trunk/
svn: Server sent unexpected return value (403 Forbidden) in response
to REPORT request for '/svn/llvm-project/!svn/vcc/default'
~/svn/llvm-project/llvm$ ( cd trunk/ ; svn up )
At revision 134691.

It's a pain having to cd around just to do svn updates.

Jay.

1. Clang - Jay, do you have a patch for this? Can you create a branch of the clang repo or send an updated version of the patch to the list?

I've created a clang type-system-rewrite branch and committed all my
purely mechanical changes: de-constifying llvm::Types as necessary,
and using the new llvm::StructType::createNamed/setBody to create the
"implicit" structs used by things like the Objective-C runtime.

I haven't committed anything to do with updating the general Clang ->
LLVM type conversion machinery, because I don't know when or whether
I'll be able to get it finished. If anyone else wants to work on it
please feel free! I've attached my current work-in-progress, as a diff
from the new clang type-system-rewrite branch.

Thanks,
Jay.

clang-type-system-rewrite.diff (42.3 KB)

When I thought about this very hard, it made sense. Thanks for the insight!

It sounds a bit hard to keep track of which types are based (directly
or indirectly) on incomplete types, so for the time being I'm clearing
the whole of the TypeCache and the FunctionInfos cache every time any
type is completed. I realise this is a bit brutal.

Now I'm running into another problem with the implementation:
EmitCXXConstructor() and EmitCXXDestructor() don't do most of the
stuff that's done for normal functions in
EmitGlobalFunctionDefinition. In particular, they're not prepared to
find an existing LLVM declaration of the function with the wrong LLVM
type, and replace it with a definition having the right type.

Jay.

make update?

Well this works for llvm+clang for me, though I must admit I have a non-portable
top level make mod that allows svn update to cross soft links--allows me the ability
to delete said soft link and build only llvm. Anyway without such soft links, llvm +
clang update via "make update". Not sure this covers all your issues.

Garrison

Sorry, I don't know what causes that. We were having problems with people accidentally checking out the entire repo instead of just a single branch, perhaps the fix for that is over-aggressive.

-Chris

2. Even some simple C cases fail, e.g.:

struct S;
extern struct T {
struct S (*p)(void);
} t;
struct S { int i; };
void g(void) {
t.p();
}

The problem here is that T.p is codegenned as i8*, so we need to
bitcast it to a proper function pointer type before calling it.
Shouldn't be too hard to implement.

I don't really like the idea that function types, and especially enum
types, will permanently be associated with an incomplete type.
There's too much room for something to screw up, particularly for the
enum case (which probably won't get tested very well).

I think if you keep track of all the types based on an incomplete tag
type, and just wipe them all out when the type is completed,
everything should just work; a type can't be completed in the middle
of the CodeGen for a function, and we bitcast all globals before use,
so there's nothing else to change.

When I thought about this very hard, it made sense. Thanks for the insight!

It sounds a bit hard to keep track of which types are based (directly
or indirectly) on incomplete types, so for the time being I'm clearing
the whole of the TypeCache and the FunctionInfos cache every time any
type is completed. I realise this is a bit brutal.

Hmm... I don't have anything simple to suggest here.

Now I'm running into another problem with the implementation:
EmitCXXConstructor() and EmitCXXDestructor() don't do most of the
stuff that's done for normal functions in
EmitGlobalFunctionDefinition. In particular, they're not prepared to
find an existing LLVM declaration of the function with the wrong LLVM
type, and replace it with a definition having the right type.

Hacking in that logic in shouldn't be too hard... I hate to suggest
copy-paste, but that's probably the easiest thing to do. It wasn't
necessary before in these cases because the directly relevant types
for a constructor/destructor are guaranteed to be complete before any
use of the declarations.

-Eli

Thanks Jay, I'll take the baton on this. If anyone is interested, here's the patch converting llvmg-cc over. It seems to work.

p.patch.gz (8.76 KB)