Find uses of Metadata / DITypes

I’m looking for a way to efficiently traverse the Metadata structure in reverse, or otherwise find DITypes that refer to another DIType. As an example, say I have a DICompositeType describing a struct. Now I want to find other DITypes containing this type, e.g. a DIDerivedType describing its pointer type, or another struct containing this struct as one of its fields. Is this possible in LLVM today? Would it be unreasonable to save Metadata to Metadata uses, like what is done for Value to Value uses?

Any help would be greatly appreciated,
Henrik

It might help if you could explain what you are *actually* trying to do, since there are often other solutions for the higher-level problem that fit better into the design and architecture.

-- adrian

Ah yes, of course! For our thesis we’re trying to reconstruct C-syntax names for Values in the IR, to make optimisation remarks clearer. To do this for something like a getelementptr (GEP) we first have to find the name of the pointer operand, and then we try to name the offsets. Naming array offsets is relatively straightforward, but for structs we need the DICompositeType, which contains the struct field names. So we make the recursive call that names the pointer operand also return the operand’s DIType, and from this we get the base type of the pointer. However, we’re struggling to handle bitcasts properly at the moment. When the pointer operand of a GEP is a bitcast, say from a struct pointer type to a pointer to the struct’s first field, we can figure that out by diffing the Value types and then traversing the DIType accordingly. When the cast is from a smaller to a wider type, on the other hand, we cannot just traverse the DIType structure to the wider type, as the link only goes in one direction. We recognise that there may be several candidate wider types matching the Value type, but this is a best-effort matching.
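To make the asymmetry concrete, here is a minimal sketch of the narrowing case on a toy model. ToyDebugType, ToyMember and findAtOffsetZero are hypothetical names for illustration only, not LLVM classes or APIs, and matching the target by pointer identity is an assumption standing in for the comparison against the IR Value type.

```cpp
#include <cassert>
#include <cstdint>
#include <string>
#include <vector>

struct ToyDebugType; // Hypothetical stand-in for a debug type node.

// One struct field: its name, bit offset, and the type it refers to.
struct ToyMember {
  std::string Name;
  uint64_t OffsetInBits;
  const ToyDebugType *Type;
};

// Links only point "down": a struct knows its members, but a member's
// type does not know which structs contain it.
struct ToyDebugType {
  std::string Name;
  std::vector<ToyMember> Members;
};

// Follow the first member at offset 0 repeatedly, starting at From, until
// Target is reached. On success Path holds the field names along the way.
// On failure Path is left empty and false is returned.
bool findAtOffsetZero(const ToyDebugType *From, const ToyDebugType *Target,
                      std::vector<std::string> &Path) {
  assert(From && Target && "types must be non-null");
  Path.clear();
  const ToyDebugType *Cur = From;
  while (Cur != Target) {
    if (Cur->Members.empty() || Cur->Members.front().OffsetInBits != 0) {
      Path.clear();
      return false; // Nothing left at offset 0 to descend into.
    }
    Path.push_back(Cur->Members.front().Name);
    Cur = Cur->Members.front().Type;
  }
  return true;
}
```

Going from the wider type to the narrower one only needs the forward links. The reverse query (which structs start with this type?) has nothing to follow, which is the gap described above.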

We have discussed traversing all the DITypes in the module to find uses of the smaller type, and we have also thought about changing clang to emit debug variable intrinsics for bitcasts, so that we can find the correct DIType from the bitcast instruction itself. We’re not sure how feasible these are in the grander scheme of things. Would emitting debug variable intrinsics for all bitcasts cause a semantic mismatch in later passes, when DWARF data is generated? Is it feasible to traverse all of the DITypes without hurting performance too much? Would saving the uses in the reverse direction, as is done for Values, have too large a memory footprint? Obviously we’ll make whatever changes we need to make things work for our thesis, but if possible we would like to avoid changes so invasive that they can’t be merged back into mainline LLVM.

/Henrik

Thanks, that helps. At a very high level, I would be very careful about adding additional metadata. The DIType hierarchy in particular can get quite large and has in the past been a memory and performance bottleneck. If you are planning to upstream your work, you’ll need to measure the impact on a full-LTO build of, e.g., clang itself and prove that the memory usage doesn’t explode. However, it sounds like what you are trying to do is more narrowly aimed at improving optimization remarks, so it might be feasible to just scan the information on demand, looking only at what is currently visible (like the IR Verifier is doing, for example), without blowing up the footprint for everything else.

> Would it be unreasonable to save Metadata to Metadata uses, like what is done for Value to Value uses?

If you look at DICompositeType in C++, we don’t even store the uses in the other direction as metadata pointers, but instead refer to types by their unique name, to support type uniquing in LTO.

Admittedly, I haven’t thought about this deeply, but the way I would approach it would be to enumerate what types are visible from within each (inlined) lexical scope by walking only the llvm.dbg.* intrinsics in that scope, and build up a dictionary on the side to capture the reverse links.
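As a rough illustration of such a side dictionary, here is a minimal sketch on a toy type graph. ToyType, ReverseIndex and buildReverseIndex are hypothetical names and not part of LLVM; the roots are assumed to be whatever types were collected from the debug intrinsics in the scope of interest.

```cpp
#include <cassert>
#include <string>
#include <unordered_map>
#include <unordered_set>
#include <vector>

// Toy stand-in for a debug type node (hypothetical; not llvm::DIType).
struct ToyType {
  std::string Name;
  // Forward links only: the base type of a pointer or typedef, or the
  // types of a struct's members.
  std::vector<const ToyType *> Refs;
};

// Side table: for each type, the types that refer to it directly.
using ReverseIndex =
    std::unordered_map<const ToyType *, std::vector<const ToyType *>>;

// Walk forward from the root types and record every edge in the reverse
// direction. A user appears once per forward link it has to a type.
ReverseIndex buildReverseIndex(const std::vector<const ToyType *> &Roots) {
  ReverseIndex Users;
  std::unordered_set<const ToyType *> Visited;
  std::vector<const ToyType *> Worklist(Roots.begin(), Roots.end());
  while (!Worklist.empty()) {
    const ToyType *T = Worklist.back();
    Worklist.pop_back();
    assert(T && "null type in worklist");
    if (!Visited.insert(T).second)
      continue; // Already expanded; this also guards against cycles.
    for (const ToyType *Ref : T->Refs) {
      Users[Ref].push_back(T);
      Worklist.push_back(Ref);
    }
  }
  return Users;
}
```

The table lives outside the metadata graph and only covers types reachable from the chosen roots, so nothing is added to the footprint of the debug info itself. It can be built when a remark needs it and thrown away afterwards.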