inefficiencies in ConstantUniqueMap ?

Hi,

Consider ConstantUniqueMap::getOrCreate() (in lib/VMCore/ConstantsContext.h):

  /// getOrCreate - Return the specified constant from the map, creating it if
  /// necessary.
  ConstantClass *getOrCreate(const TypeClass *Ty, ValRefType V) {
    MapKey Lookup(Ty, V);
    ConstantClass* Result = 0;
    ...

For array (or struct or vector) constants, typically:

ValType = vector<Constant*>
ValRefType = ArrayRef<Constant*>
MapKey = pair<ArrayType, vector<Constant*>>

So am I right in thinking that the line:

    MapKey Lookup(Ty, V);

is expensive because it has to copy a whole vector? It seems like this
should not be necessary, just to look something up in the map.

Further, if we don't find it in the map, we call Create(), which
constructs *another* copy of MapKey(Ty, V), which will be expensive
all over again.

Thanks,
Jay.

Yes, this is all grossly inefficient. I'm currently working on revamping the LLVM IR type system (see the type-system-rewrite branch):
http://llvm.org/viewvc/llvm-project/llvm/branches/type-system-rewrite/

I'm hoping to land it in the next week or two. The remaining pieces todo on the branch are:
1. the IR linker needs to resolve types correctly.
2. The bitcode reader needs support for LLVM 2.9 bitcode files.
3. Clang/dragonegg need to adapt to the new API (help appreciated!)
4. LangRef and Programmer's manual need to be updated.

Once that lands, the constant uniquing is greatly simplified because AbstractType stuff all vaporizes. When the branch lands, the constant uniquing maps should all be replaced with folding sets.

-Chris

Hi Chris,

3. Clang/dragonegg need to adapt to the new API (help appreciated!)

what needs to be done exactly?

Ciao, Duncan.

Background info: http://www.nondot.org/sabre/LLVMNotes/TypeSystemRewrite.txt

As I understand it, PATypeHolder, OpaqueType and the Module's
TypeSymbolTable are gone. Instead, StructTypes can optionally be
named, and if they are then:

- they use name equivalence instead of structural equivalence.
- you can create them without any fields, and then add the fields
later when the struct is complete.

I've played with the Clang bits of this. The biggest problem I've
found is that Clang uses LLVM's type resolution not just for
forward-declared structs/classes/unions, which convert
straightforwardly to the new system, but also for forward-declared
enums, which don't.

Also, varargs forms of StructType::createNamed() and
StructType::setBody() would save a lot of mindless churn in the Clang
source code.

Jay.

3. Clang/dragonegg need to adapt to the new API (help appreciated!)

what needs to be done exactly?

Background info: http://www.nondot.org/sabre/LLVMNotes/TypeSystemRewrite.txt

As I understand it, PATypeHolder, OpaqueType and the Module's
TypeSymbolTable are gone. Instead, StructTypes can optionally be
named, and if they are then:

- they use name equivalence instead of structural equivalence.
- you can create them without any fields, and then add the fields
later when the struct is complete.

Right, exactly. The major impact is that the various AbstractTypeUser-related classes (including OpaqueType) are gone, and the StructType interface is richer:
http://llvm.org/viewvc/llvm-project/llvm/branches/type-system-rewrite/include/llvm/DerivedTypes.h?revision=133420&view=markup

I've played with the Clang bits of this. The biggest problem I've
found is that Clang uses LLVM's type resolution not just for
forward-declared structs/classes/unions, which convert
straightforwardly to the new system, but also for forward-declared
enums, which don't.

I haven't started looking at this at all yet, but it seems that we could use a similar example for forward declared structs whose bodies are needed. Clang currently codegen's this:

struct S;
struct S f(void);
struct T { struct S (*f)(void); };
struct T t = {f};

into:

%0 = type opaque
%struct.T = type { %0* }

@t = global %struct.T { %0* bitcast (void ()* @f to %0*) }, align 8

for example. Instead of doing that, we can/should just codegen it to:

%struct.T = type { void ()* }

@t = global %struct.T { void ()* @f }, align 8

directly. Basically, if we "need" a type and don't have it, just lower it directly to void instead of 'opaque'. This will lead to some more bitcasts downstream, but we already have that, they get resolved if f is ever implemented.

Also, varargs forms of StructType::createNamed() and
StructType::setBody() would save a lot of mindless churn in the Clang
source code.

Please feel free to add it to the branch!

-Chris

I've played with the Clang bits of this. The biggest problem I've
found is that Clang uses LLVM's type resolution not just for
forward-declared structs/classes/unions, which convert
straightforwardly to the new system, but also for forward-declared
enums, which don't.

In case anyone's interested, here's my work-in-progress patch for
clang. (Note that it's against a slightly old clang tree, because the
llvm type-system-rewrite branch hasn't had a merge from trunk
recently; and you also need the attached llvm patchlet to make it all
build.)

I'm not 100% satisfied with it but it is good enough to build sqlite3
from the test suite.

I haven't started looking at this at all yet, but it seems that we could use a similar example for forward declared structs whose bodies are needed. Clang currently codegen's this:

struct S;
struct S f(void);
struct T { struct S (*f)(void); };
struct T t = {f};

into:

%0 = type opaque
%struct.T = type { %0* }

@t = global %struct.T { %0* bitcast (void ()* @f to %0*) }, align 8

for example. Instead of doing that, we can/should just codegen it to:

%struct.T = type { void ()* }

@t = global %struct.T { void ()* @f }, align 8

directly.

I now get:

%struct.T = type { i8* }
@t = global %struct.T { i8* bitcast (void ()* @f to i8*) }, align 8
declare void @f()

(I lowered the incomplete function type all the way to void, instead
of to void(), only because that made it simpler for
CodeGenModule::GetOrCreateLLVMFunction() to tell the difference
between a proper function type and a placeholder type. There's
probably a better way of doing this.)

Basically, if we "need" a type and don't have it, just lower it directly to void instead of 'opaque'.

I'm doing this for:

- incomplete non-fixed enum types
- functions types whose argument or return types are incomplete
(because in that case the ABI stuff can't work out how parameters are
going to be passed/returned, so you can't get a proper LLVM type for
the function)

This will lead to some more bitcasts downstream, but we already have that, they get resolved if f is ever implemented.

I don't have a good understanding of which bits of Clang's codegen are
prepared to insert bitcasts like this; but it seems to work well
enough to build sqlite3!

Jay.

clang-type-system-rewrite.diff (89.4 KB)

llvm-constantarray-get-arrayref.diff (702 Bytes)

Hi Jay,

As I understand it, PATypeHolder, OpaqueType and the Module's
TypeSymbolTable are gone. Instead, StructTypes can optionally be
named, and if they are then:

- they use name equivalence instead of structural equivalence.
- you can create them without any fields, and then add the fields
later when the struct is complete.

I find this distinction between structs with names and structs without
names strange. Why is a name relevant to anything? If you can add fields
to a struct with a name why not to a struct without a name?

Ciao, Duncan.

It's Chris's design and I don't feel qualified to speak on his behalf.
But... isn't this just the way C works? Structs either have a tag, or
some fields, or both, but definitely not neither? And structs with the
same tag are the same type in C, which is why the name is relevant.

Jay.

Hi Jay,

As I understand it, PATypeHolder, OpaqueType and the Module's
TypeSymbolTable are gone. Instead, StructTypes can optionally be
named, and if they are then:

- they use name equivalence instead of structural equivalence.
- you can create them without any fields, and then add the fields
later when the struct is complete.

I find this distinction between structs with names and structs without
names strange. Why is a name relevant to anything? If you can add fields
to a struct with a name why not to a struct without a name?

It's Chris's design and I don't feel qualified to speak on his behalf.
But... isn't this just the way C works? Structs either have a tag, or
some fields, or both, but definitely not neither? And structs with the
same tag are the same type in C, which is why the name is relevant.

thanks for the explanation (which I didn't understand :frowning: ). Hopefully Chris
will chime in.

Ciao, Duncan.

I'll have another go then!

I find this distinction between structs with names and structs without
names strange. Why is a name relevant to anything? If you can add fields
to a struct with a name [...]

This corresponds to the following C constructs:

struct S; // create struct with name but no fields
...
struct S { int i, j; } // add fields to named struct

[...] why not to a struct without a name?

This would imply that you were starting with a struct with no name and
no fields, which is not something you can create in C:

struct; // this isn't a C type!

Maybe "adding fields" is bad terminology -- we're really talking about
setting the struct's list of fields, which you can only do once. (You
can't add yet more fields later.)

Jay.

Hi Jay,

I'll have another go then!

thanks!

I find this distinction between structs with names and structs without
names strange. Why is a name relevant to anything? If you can add fields
to a struct with a name [...]

This corresponds to the following C constructs:

struct S; // create struct with name but no fields
...
struct S { int i, j; } // add fields to named struct

In Ada you can have a forward declaration with fields (known as discriminants),
for example:

    type X (Z : Integer); -- Z is the discriminant
    type XP is access X;
    type X (Z : Integer) is record
       P : XP; -- Add an additional field, now there are two
    end record;

I don't think this will cause any problems since GCC has resolved the type by
the time it hits dragonegg (i.e. dragonegg doesn't see the incomplete type).
In llvm-gcc we would see the incomplete type, but I don't care about llvm-gcc
any more :slight_smile:

[...] why not to a struct without a name?

This would imply that you were starting with a struct with no name and
no fields, which is not something you can create in C:

struct; // this isn't a C type!

I see structs with fields but without names come flying out of gcc for C++ code
all the time. I'm not sure where they come from - perhaps they are utility
types created by the front-end. These are probably harmless since then can just
be given an invented name. Or maybe it is harmless to let them have no name.

Maybe "adding fields" is bad terminology -- we're really talking about
setting the struct's list of fields, which you can only do once. (You
can't add yet more fields later.)

The name-noname distinction still seems artificial to me. I assume it is being
driven by efficiency considerations: i.e. not allowing named types to be refined
simplifies LLVM's internals considerably.

Ciao, Duncan.

[...] why not to a struct without a name?

This would imply that you were starting with a struct with no name and
no fields, which is not something you can create in C:

struct; // this isn't a C type!

I see structs with fields but without names come flying out of gcc for C++
code
all the time.

That's fine; in C this is like the type of s in:

struct { int i, j; } s;

But in this case, you know what the fields are straight away; you
don't need to create a tag-less field-less struct and then add the
fields *later*.

To put it another way, when a C struct is first declared, it has
either a name (tag) or a list of fields or both, but not neither!

The name-noname distinction still seems artificial to me. I assume it is
being
driven by efficiency considerations: i.e. not allowing named types to be
refined
simplifies LLVM's internals considerably.

I see your point. Maybe Chris's implementation *does* allow you to
refine unnamed types. I hadn't really considered it before.

Jay.

The issue is that we have two cases here:

1. anonymous structs which are just used for aggregation, but that are uniqued by-structure. This is mostly useful for simple things like complex numbers, which you want the type to expand inline in .ll files etc.

2. named structs which are uniqued by (obviously their name).

The subcase of #2 is that we want a "named struct without a name" because we want the -strip pass to still be useful. Because there is a semantic difference between named and unnamed types, I chose to call them "anonymous" structs, which don't have an identity, and "named" types whose name is allowed to be empty.

I plan to rename StructType::get to StructType::getAnon after the branch lands.

-Chris

There is no refinement in the new system: once a type is created, it doesn't change. The only "refinement" allowed is to specify a body for a named type.

The implication of this is that the only way to get a recursive type is to use a 'named' struct.

If we were to simplify anything away, it would be anonymous structs. I thought about this quite a bit, but I think that having the ability to allow simple (non-recursive) aggregation is really useful for frontends.

-Chris

In case anyone's interested, here's my work-in-progress patch for
clang. (Note that it's against a slightly old clang tree, because the
llvm type-system-rewrite branch hasn't had a merge from trunk
recently; and you also need the attached llvm patchlet to make it all
build.)
I'm not 100% satisfied with it but it is good enough to build sqlite3
from the test suite.

Wow, that's great! I'm using my few spare cycles to try to finish up the linker and bitcode reader, I really appreciate you working on this. I landed your Constants.h helper function. I'll update the branch to mainline if I can figure out how to get SVN to do what I want :slight_smile:

struct S;
struct S f(void);
struct T { struct S (*f)(void); };
struct T t = {f};

into:

%0 = type opaque
%struct.T = type { %0* }

@t = global %struct.T { %0* bitcast (void ()* @f to %0*) }, align 8

for example. Instead of doing that, we can/should just codegen it to:

%struct.T = type { void ()* }

@t = global %struct.T { void ()* @f }, align 8

directly.

I now get:

%struct.T = type { i8* }
@t = global %struct.T { i8* bitcast (void ()* @f to i8*) }, align 8
declare void @f()

(I lowered the incomplete function type all the way to void, instead
of to void(), only because that made it simpler for
CodeGenModule::GetOrCreateLLVMFunction() to tell the difference
between a proper function type and a placeholder type. There's
probably a better way of doing this.)

This makes perfect sense to me.

Basically, if we "need" a type and don't have it, just lower it directly to void instead of 'opaque'.

I'm doing this for:

- incomplete non-fixed enum types
- functions types whose argument or return types are incomplete
(because in that case the ABI stuff can't work out how parameters are
going to be passed/returned, so you can't get a proper LLVM type for
the function)

Yep, makes sense.

This will lead to some more bitcasts downstream, but we already have that, they get resolved if f is ever implemented.

I don't have a good understanding of which bits of Clang's codegen are
prepared to insert bitcasts like this; but it seems to work well
enough to build sqlite3!

My bet is that this is really really close.

-Chris

I landed your Constants.h helper function. I'll update the branch to mainline if I can figure out how to get SVN to do what I want :slight_smile:

Thanks, a merge from trunk would be great! My Constants.h patch was
just a stop-gap; the merge will bring that function in properly.

Thanks,
Jay.