Associating types directly with debug metadata?

We would need to access the LLVM debug metadata type information directly from LLVM types. It looks like the current clang and llvm-gcc don't support such an association, nor does LLVM itself appear to. (We are tracking TOT, but only once a month or so.)

In terms of LLVM IR, apparently this would simply mean optionally adding another metadata value to each type. For example, let us have a C type

  struct T { int a; };

In LLVM IR (with no name conflicts), this is represented as

  %struct.T = type { i32 }

Now, we would like to annotate this with a piece of metadata, something like this:

  %struct.T = type { i32 }, !dbg !11

with the associated metadata being something like the following:

!1 = metadata !{i32 524329, metadata !"foo.c", metadata !"/tmp", metadata !2} ; [ DW_TAG_file_type ]
!2 = metadata !{i32 524305, i32 0, i32 12, metadata !"foo.c", metadata !"/tmp", metadata !"clang version 2.9 ($URL$)", i1 true, i1 false, metadata !"", i32 0} ; [ DW_TAG_compile_unit ]
!5 = metadata !{i32 524324, metadata !1, metadata !"int", metadata !1, i32 0, i64 32, i64 32, i64 0, i32 0, i32 5} ; [ DW_TAG_base_type ]
!11 = metadata !{i32 524307, metadata !1, metadata !"T", metadata !1, i32 2, i64 32, i64 32, i64 0, i32 0, null, metadata !12, i32 0, null} ; [ DW_TAG_structure_type ]
!12 = metadata !{metadata !13}
!13 = metadata !{i32 524301, metadata !1, metadata !"a", metadata !1, i32 2, i64 32, i64 32, i64 0, i32 0, metadata !5} ; [ DW_TAG_member ]

Then the type would probably need to be added to a new named metadata tag as well, maybe !llvm.dbg.ty?

Would the right starting point be to simply add an MDNode pointer to the Type class? That should be then convertible to a DIType?

The next step apparently would be to add support to LLParser for reading in such associations, and to AsmPrinter for printing them out. But how about the bitcode? Would this require a new LLVMBitCode?

With that, would LLVM be able to deal with the type-associated metadata, or is there something in the LLVM side that I'm missing?

Once that works, I guess I need to ask how to extend clang to emit the information, but the right place for that would be cfe-dev, wouldn't it?

--Pekka Nikander

We would need to access the LLVM debug metadata type information directly from LLVM types. It looks like the current clang and llvm-gcc don't support such an association, nor does LLVM itself appear to.

True. I am curious: how do you want to use this association?

Then the type would probably need to be added to a new named metadata tag as well, maybe !llvm.dbg.ty?

This is the preferred way to do it. BTW, I used the llvm.dbg.ty name, so it is a good idea to pick some name that also conveys your use of this association.

Would the right starting point be to simply add an MDNode pointer to the Type class? That should be then convertible to a DIType?

We want to avoid any Type class modification. Instead, you can use a pair in named metadata to match metadata with a type.

!11 = metadata !{i32 524307, metadata !1, metadata !"T", metadata !1, i32 2, i64 32, i64 32, i64 0, i32 0, null, metadata !12, i32 0, null} ; [ DW_TAG_structure_type ]

!21 = metadata !{ metadata !11, %struct.T %z}

!llvm.my_special_type_info = !{!21}
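
As a rough illustration of the pairing idea, here is a toy model in Python. It is only a sketch: StructType, DebugNode and NamedMetadata are hypothetical stand-ins invented for this example, not LLVM classes. It shows the association living in a side list of (metadata, type) pairs, with the type itself left untouched.

```python
# Toy model only: these classes are hypothetical stand-ins, not the LLVM API.
from dataclasses import dataclass


@dataclass(frozen=True)
class StructType:
    """A structurally identified type, e.g. { i32 }."""
    fields: tuple


@dataclass(frozen=True)
class DebugNode:
    """Stand-in for a debug descriptor such as !11 (DW_TAG_structure_type)."""
    tag: str
    name: str


class NamedMetadata:
    """Stand-in for a named metadata list such as !llvm.my_special_type_info."""

    def __init__(self, name):
        self.name = name
        self.pairs = []  # each entry mirrors one !{ metadata !N, <value of type T> } node

    def add(self, debug_node, ty):
        self.pairs.append((debug_node, ty))

    def debug_info_for(self, ty):
        """Return the debug node paired with ty, or None if there is no pair."""
        for node, paired_ty in self.pairs:
            if paired_ty == ty:
                return node
        return None


struct_t = StructType(fields=("i32",))
node_11 = DebugNode(tag="DW_TAG_structure_type", name="T")
type_info = NamedMetadata("my_special_type_info")
type_info.add(node_11, struct_t)
print(type_info.debug_info_for(struct_t).name)  # prints "T"
```

The lookup is a linear scan here for brevity; the point is only that nothing has to be added to the type class to make the association work.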

The next step apparently would be to add support to LLParser for reading in such associations, and to AsmPrinter for printing them out.

Yes.

But how about the bitcode? Would this require a new LLVMBitCode?

You need to modify bitcode writer/reader.

With that, would LLVM be able to deal with the type-associated metadata, or is there something in the LLVM side that I'm missing?

Yes. BTW, Dan Gohman is working on TBAA. He is also working on the necessary support to associate a Type with metadata.

Once that works, I guess I need to ask how to extend clang to emit the information, but the right place for that would be cfe-dev, wouldn't it?

Yes.

Be aware that types are uniqued and that the IR name is completely meaningless.

-Chris

We would need to access the LLVM debug metadata type information directly from LLVM types. It looks like the current clang and llvm-gcc don't support such an association, nor does LLVM itself appear to.

True. I am curious: how do you want to use this association?

Thanks for your advice below, Devang.

We are using LLVM type definitions + debug metadata to interpret binary data, read from memory, and dump it out as LLVM IR constants. A similar kind of thing to what GNU Emacs does when it dumps its memory image when bootstrapping, or what some OSes do to speed up their boot times on a known hardware configuration. However, our approach is different in that we want to have an LLVM IR dump, not a binary dump as emacs and the OSes do.
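
The dumping step described above can be sketched roughly as follows in Python. This is a simplified illustration, not the actual tool: the LAYOUTS table and dump_constant are hypothetical, only integer fields are handled, and the layout is written by hand where the real scheme would derive it from the debug metadata.

```python
import struct

# Hypothetical layout table; in the scheme described above this would be
# derived from the debug metadata (field names, sizes and offsets in bits).
LAYOUTS = {
    "struct.T": [("a", "i32", 0, 32)],  # (field name, IR type, bit offset, bit size)
}


def dump_constant(type_name, raw, byteorder="little"):
    """Render raw bytes as an LLVM IR style constant for a struct of integers."""
    parts = []
    for _field, ir_type, bit_offset, bit_size in LAYOUTS[type_name]:
        start, size = bit_offset // 8, bit_size // 8
        value = int.from_bytes(raw[start:start + size], byteorder, signed=True)
        parts.append(ir_type + " " + str(value))
    return "%" + type_name + " { " + ", ".join(parts) + " }"


image = struct.pack("<i", 42)  # pretend these bytes were read from process memory
print(dump_constant("struct.T", image))  # prints "%struct.T { i32 42 }"
```

The sketch also shows why the type-to-metadata association matters: without a reliable way to get from %struct.T to its field descriptions, the bytes cannot be interpreted.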

In our case, the data structures we are dumping are ones that are configured when the original program is initialised, and then not changed afterwards; we are essentially producing frozen configurations. Then we optimize the performance critical part of the original program with a frozen configuration (instead of using a dynamically assembled configuration). The results from our prototyping so far are pretty good; in some cases we are able to repeatedly perform loop peeling and constant propagation so that the original loop disappears altogether. (We didn't get constant propagation to go through the PHIs our loops initially have. Peeling the loop helped.)

Using @llvm.invariant.start doesn't give as good results, since from our dump we know the exact values of the data fields, in addition to the fact that they don't change. Hence, that allows us to optimise the code much more aggressively.

And yes, (to Chris in the other e-mail), we are aware that the IR type names are "meaningless". That's the main reason why we want to use the debug metadata, under the assumption that it would have better coherence with the source code.

Then the type would probably need to be added to a new named metadata tag as well, maybe !llvm.dbg.ty?

This is the preferred way to do it. BTW, I used the llvm.dbg.ty name, so it is a good idea to pick some name that also conveys your use of this association.

Would !llvm.dbg.types be appropriate? Are there naming conventions here?

Would the right starting point be to simply add an MDNode pointer to the Type class? That should be then convertible to a DIType?

We want to avoid any Type class modification. Instead, you can use a pair in named metadata to match metadata with a type.

!11 = metadata !{i32 524307, metadata !1, metadata !"T", metadata !1, i32 2, i64 32, i64 32, i64 0, i32 0, null, metadata !12, i32 0, null} ; [ DW_TAG_structure_type ]

!21 = metadata !{ metadata !11, %struct.T %z}

!llvm.my_special_type_info = !{!21}

Ok, that approach should work. Though we want to associate the type, not some variable with the type, i.e. something like

!21 = metadata !{ %struct.T, metadata !11 }

But I don't know if that would be valid syntax...

Yes. BTW, Dan Gohman is working on TBAA. He is also working on the necessary support to associate a Type with metadata.

Thanks for the hint. Our goal seems to be orthogonal, though, I think.

--Pekka

Would the right starting point be to simply add an MDNode pointer to the Type class? That should be then convertible to a DIType?

We want to avoid any Type class modification. Instead, you can use a pair in named metadata to match metadata with a type.

!11 = metadata !{i32 524307, metadata !1, metadata !"T", metadata !1, i32 2, i64 32, i64 32, i64 0, i32 0, null, metadata !12, i32 0, null} ; [ DW_TAG_structure_type ]

!21 = metadata !{ metadata !11, %struct.T %z}

!llvm.my_special_type_info = !{!21}

Ok, that approach should work. Though we want to associate the type, not some variable with the type, i.e. something like

!21 = metadata !{ %struct.T, metadata !11 }

But I don't know if that would be valid syntax...

I thought about that more, and I think the "right" way would be to have a syntax like

!21 = metadata !{ type %struct.T, metadata !11 }

or perhaps

!21 = metadata !{ typeval %struct.T, metadata !11 }

to avoid the problem with the keyword 'type'.

But to be able to do that, it should apparently be possible to represent types as first-class values. That in turn would apparently require a new Type::TypeID?

Would that be worth doing? Apparently we can fake it for our needs and use a null pointer instead. Technically, that would be wrong, but it would work with the current code without any modifications:

!21 = metadata !{ %struct.T *null, metadata !11 }

Any opinions? (I guess I'll just try the number of changes needed for a 'typeval' and report back.)

--Pekka

I would recommend this approach.

There are no strict conventions. We generally use llvm.dbg.<blah> for debug info needs. The "llvm." prefix should be reserved for uses that are actually in mainline LLVM sources.

We want to avoid any Type class modification. Instead, you can use a pair in named metadata to match metadata with a type.

I thought about that more, and I think the "right" way would be to have a syntax like

!21 = metadata !{ type %struct.T, metadata !11 }

or perhaps

!21 = metadata !{ typeval %struct.T, metadata !11 }

to avoid the problem with the keyword 'type'.

Would that be worth doing? Apparently we can fake it for our needs and use a null pointer instead. Technically, that would be wrong, but it would work with the current code without any modifications:

!21 = metadata !{ %struct.T *null, metadata !11 }

I would recommend this approach.

But that would have a hacky feeling :-)

Anyway, I already implemented an early patch for making Types representable as Values. For that I created a new class, TypeValue, which represents a type as a value. An early patch for that is enclosed. It works when reading or writing textual representations; there is no BC representation yet, and no test case yet.

This works well enough for our purposes; I don't have any incentive to go further unless someone else needs a similar kind of functionality.

--Pekka

TypeValue.diff (11 KB)

I thought about that more, and I think the "right" way would be to have a syntax like

!21 = metadata !{ typeval %struct.T, metadata !11 }

to avoid the problem with the keyword 'type'.

Anyway, I already implemented an early patch for making Types representable as Values. For that I created a new class, TypeValue, which represents a type as a value. An early patch for that enclosed. Works when reading or writing textual representations; no BC representation yet. And no test case yet.

Here is another version of the patch. This one also includes a small patch to llvm-gcc so that it generates the type metadata for structures and classes. This one also correctly generates and parses the metadata for .ll files. No .bc support yet. The biggest problem with this version is that it breaks when the compiler/linker performs type reductions, and I don't understand why.

--Pekka

TypeValue.diff (16.4 KB)

Pekka Nikander wrote:

I thought about that more, and I think the "right" way would be to have a syntax like

!21 = metadata !{ typeval %struct.T, metadata !11 }

to avoid the problem with the keyword 'type'.

Anyway, I already implemented an early patch for making Types representable as Values. For that I created a new class, TypeValue, which represents a type as a value. An early patch for that enclosed. Works when reading or writing textual representations; no BC representation yet. And no test case yet.

Here is another version of the patch. This one includes also a small patch to llvm-gcc so that it generates the type metadata for structures and classes. This one also generates and parses correctly the metadata for .ll files. No .bc support yet. The biggest problem with this version is that it breaks when the compiler/linker performs type reductions, and I don't understand why.

Please don't do this. My objection to the type value is on the grounds that it is not a value. It does not match the conceptual meaning of a value, that is, something you could put in a machine register on some machine.

If you need to name an LLVM type in your MDNode, just create an undef value of that type.

Nick

I agree completely.

-Chris

Here is another version of the patch. This one includes also a small patch to llvm-gcc so that it generates the type metadata for structures and classes. This one also generates and parses correctly the metadata for .ll files. No .bc support yet. The biggest problem with this version is that it breaks when the compiler/linker performs type reductions, and I don't understand why.

Please don't do this. My objection to the type value is on the grounds
that it is not a value. It does not match the conceptual meaning of a
value, that is, something you could put in a machine register on some
machine.

If you need to name an LLVM type in your MDNode, just create an undef
value of that type.

I agree completely.

Ok, I will try using undefs, but do they work for aggregate types? Well, I'll see.

However, I don't understand what is so different in my design from MDNode and MDString being values. Sure, I could make the TypeValue a subclass of MDNode, name it something like MDType, and use a tag different from "typeval", something like "metatype". Would that be better? Or should it still be a direct subclass of Value, like both MDNode and MDString are?

I tried to carefully model my code after MDNode and MDString, wherever possible. My intention is to use these from metadata only, after all.

--Pekka

Pekka Nikander wrote:

Here is another version of the patch. This one includes also a small patch to llvm-gcc so that it generates the type metadata for structures and classes. This one also generates and parses correctly the metadata for .ll files. No .bc support yet. The biggest problem with this version is that it breaks when the compiler/linker performs type reductions, and I don't understand why.

Please don't do this. My objection to the type value is on the grounds
that it is not a value. It does not match the conceptual meaning of a
value, that is, something you could put in a machine register on some
machine.

If you need to name an LLVM type in your MDNode, just create an undef
value of that type.

I agree completely.

Ok, I will try using undefs, but do they work for aggregate types? Well, I'll see.

However, I don't understand what is so different in my design from MDNode and MDString being values. Sure, I could make the TypeValue a subclass of MDNode, name it something like MDType, and use a tag different from "typeval", something like "metatype". Would that be better? Or should it still be a direct subclass of Value, like both MDNode and MDString are?

I tried to carefully model my code after MDNode and MDString, wherever possible. My intention is to use these from metadata only, after all.

Yes, you make a good point that metadata can't go into a machine register; that's a unique property imbued by the 'metadata' type. There are still certain axioms that apply to values, such as:

  - all values have a type and a value id that is assigned at creation and immutable
  - you can create an undef value of any type (incl. aggregates as you asked earlier, but there's also undef metadata)
  - all values have a use-list you can traverse through

It's not clear to me how you would meet those, in particular the second one.
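
The three properties can be pictured with a small toy model in Python. This is only a sketch: Value, Undef and MetadataNode here are hypothetical stand-ins written for this example, not the LLVM classes of the same flavour, and types are represented as plain strings.

```python
# Toy model of the three properties listed above; hypothetical, not the LLVM API.
class Value:
    def __init__(self, ty):
        self._ty = ty    # fixed at creation
        self.uses = []   # users that reference this value

    @property
    def type(self):
        return self._ty  # read-only: no setter is defined


class Undef(Value):
    _cache = {}

    @classmethod
    def get(cls, ty):
        """Return the single undef value of the given type."""
        if ty not in cls._cache:
            cls._cache[ty] = cls(ty)
        return cls._cache[ty]


class MetadataNode(Value):
    def __init__(self, operands):
        super().__init__("metadata")
        self.operands = list(operands)
        for op in self.operands:
            op.uses.append(self)  # register this node in each operand's use-list


undef_t = Undef.get("%struct.T")
pair = MetadataNode([undef_t])
print(undef_t.type, len(undef_t.uses))  # prints "%struct.T 1"
```

In this model an undef of %struct.T names the type while still behaving like every other value. A type-as-value would additionally need some type of its own, and an undef of that type, which is the second property singled out above.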

As you proved by example, you could create a TypeValue and LLVM wouldn't collapse, but it goes against the grain of the design. If undef will work for you, please use it.

Nick

I’d really be interested in getting full support for this.
We’re doing abstract interpretation on C++ code (lowered to C code via LLVM).
I refrained from adding a direct mapping from type to type name because it appears to be somewhat
involved (having to change the IR etc.), but it would be very helpful for us to have this.

Alex

P.S.: To Devang : I’ll submit the other patch for debug info asap…I just didn’t have the time yet…

If you need to name an LLVM type in your MDNode, just create an undef
value of that type.

I agree completely.

Ok, I will try using undefs...

As you proved by example, you could create a TypeValue and LLVM wouldn't collapse, but it goes against the grain of the design. If undef will work for you, please use it.

I finally had time to try the undef. Generating the undefs in llvm-gcc is easy -- I guess it will be as easy in clang, too. Here is some example output:

  !ahir.sema.types = !{!2079, !2080, !2081, .... }

  !2079 = metadata !{%struct.IPAddress undef, metadata !21}
  !2080 = metadata !{%"struct.String::memo_t" undef, metadata !89}
  !2081 = metadata !{%"struct.String::rep_t" undef, metadata !80}

That looks ok.
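
For inspecting such output, for instance to count how many pairs are present in a given .ll file, a small text scraper along these lines may be enough. It is a sketch in Python that assumes exactly the textual form printed above; PAIR and type_to_debug_node are hypothetical names, and since it works on IR type names it is suitable for diagnostics only.

```python
import re

# Matches lines of the form shown above, e.g.
#   !2079 = metadata !{%struct.IPAddress undef, metadata !21}
PAIR = re.compile(
    r'^\s*!(\d+)\s*=\s*metadata\s*!\{\s*%("[^"]+"|[\w.]+)\s+undef\s*,'
    r'\s*metadata\s*!(\d+)\s*\}'
)


def type_to_debug_node(ir_text):
    """Map each type name to the number of its paired debug metadata node."""
    mapping = {}
    for line in ir_text.splitlines():
        m = PAIR.match(line)
        if m:
            mapping[m.group(2).strip('"')] = int(m.group(3))
    return mapping


sample = '''
  !2079 = metadata !{%struct.IPAddress undef, metadata !21}
  !2080 = metadata !{%"struct.String::memo_t" undef, metadata !89}
  !2081 = metadata !{%"struct.String::rep_t" undef, metadata !80}
'''
pairs = type_to_debug_node(sample)
print(len(pairs), pairs["struct.IPAddress"])
```

Running the same function over the output before and after linking gives a quick count of how many pairs are present at each stage.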

But when linking (and apparently with some optimisers), I have almost the same problem as before. The information mostly disappears. I should have several hundred undefs after linking -- but only some 25 of those get to the linked version.

The difference between using my TypeValue hack and your recommended usage of undef is that in the former case the typeval-containing metadata nodes were still there, but the type was null, while now with undefs the undef-containing metadata nodes disappear. That is, with my previous version the !ahir.sema.types named metadata had hundreds of MDNodes pointed to by it, but most of those MDNodes were wrong, containing a null instead of a typeval. Now, with undefs, the !ahir.sema.types named metadata has only 25 MDNodes pointed to by it, but the ones remaining are correct.

I have no idea what is going on, i.e. what earlier changed the types so that my TypeValues became invalid, and what is now apparently simply destroying the MDNodes.

I'd be willing to fix this if I just knew where to start.

--Pekka

undef.patch (2.18 KB)