I started looking into different solutions for handling cyclic debug metadata, and I am currently unsure how to move forward. Above, we discussed two main approaches:
1) Attach the type debug information to operations embedded in a global metadata operation and reference these operations using symbols to break the cycles.
2) Use mutable attributes to represent the cyclic dependencies.
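To make approach 1) a bit more concrete, here is a plain C++ analogy rather than MLIR code. Type descriptions live in a module-level table keyed by symbol name, and references between them are names instead of direct pointers. A struct and its fields can therefore mention each other without any entry having to contain another one. `TypeEntry`, `SymbolTable`, `lookup`, and `buildExample` are hypothetical names chosen for illustration; this is not MLIR's `SymbolTable` API.

```cpp
#include <map>
#include <string>
#include <vector>

// Hypothetical model of approach 1): each type description is an entry in a
// module-level table keyed by its symbol name. References between entries
// are symbol names, so cyclic references are just names mentioning each other.
struct TypeEntry {
  std::string kind;                  // e.g. "composite" or "derived"
  std::string scope;                 // symbol of the enclosing type, or ""
  std::vector<std::string> elements; // symbols of the member types
};

using SymbolTable = std::map<std::string, TypeEntry>;

// Resolve a symbol reference on demand. The order in which entries were
// inserted does not matter. Throws std::out_of_range on a dangling reference.
const TypeEntry &lookup(const SymbolTable &table, const std::string &symbol) {
  return table.at(symbol);
}

// Builds struct "A" whose two fields point back to it through "scope".
SymbolTable buildExample() {
  SymbolTable table;
  table["A"] = {"composite", "", {"A.field1", "A.field2"}};
  table["A.field1"] = {"derived", "A", {}};
  table["A.field2"] = {"derived", "A", {}};
  return table;
}
```

Following `A` to its first element and back through that element's scope lands on `A` again, without any value having to be constructed before the other.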
I prototyped solution 2) since it is closer to the current debug metadata handling. AFAIK, TableGen does not support mutable attributes, so the change would require a significant amount of handwritten C++ code for every attribute that has to be mutable. Another issue is that the printer does not replace nested attributes with aliases, which means that nested attributes may be printed multiple times (presumably due to AsmPrinter.cpp at commit d94b069a89ec6c54030540c031a1032845bdbac0 in llvm/llvm-project on GitHub).
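For comparison, the core idea behind solution 2) can be sketched without MLIR as a two-phase construction: a node is uniqued by its name only, so it can be created and referenced before its body exists, and the body is attached later by mutating the node. This is only an analogy under that assumption, not MLIR's mutable attribute storage API; `Context`, `getComposite`, and `setElements` are hypothetical.

```cpp
#include <cassert>
#include <map>
#include <memory>
#include <string>
#include <utility>
#include <vector>

struct CompositeNode;

// A field points back to its enclosing composite, like the scope of a
// derived type.
struct FieldNode {
  std::string name;
  const CompositeNode *scope;
};

// Uniqued by name only; the elements can be filled in after creation.
struct CompositeNode {
  std::string name;
  std::vector<FieldNode> elements;
  bool bodySet = false;
};

class Context {
public:
  // Phase 1: get or create the node. Since the key is only the name, the
  // node can be referenced before its fields exist.
  CompositeNode *getComposite(const std::string &name) {
    std::unique_ptr<CompositeNode> &slot = composites[name];
    if (!slot) {
      slot = std::make_unique<CompositeNode>();
      slot->name = name;
    }
    return slot.get();
  }

  // Phase 2: mutate the node once to attach its fields, which may point
  // back to the node itself and thereby close the cycle.
  void setElements(CompositeNode *node, std::vector<FieldNode> elements) {
    assert(!node->bodySet && "elements may only be set once");
    node->elements = std::move(elements);
    node->bodySet = true;
  }

private:
  std::map<std::string, std::unique_ptr<CompositeNode>> composites;
};
```

Every attribute kind that can take part in such a cycle needs this kind of split between an immutable key and a mutable body, which is where the per-attribute C++ code comes from.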
Solution 1) works with module-level operations but seems more heavyweight and less scalable than the current solution. For example, unused module-level type information is no longer dropped automatically and has to be removed by a module pass. I may be wrong, though, and the additional cost of the operation/symbol handling may be negligible.
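The module pass mentioned above could look roughly like the following sketch, assuming each type-describing operation is reduced to its symbol name plus the symbols it references. Everything transitively reachable from a live use is kept and the rest is erased; the visited set also makes the walk terminate on cyclic references. `SymbolRefs` and `dropUnusedTypeSymbols` are hypothetical, and a real pass would walk MLIR operations and symbol uses instead of a `std::map`.

```cpp
#include <map>
#include <set>
#include <string>
#include <vector>

// Hypothetical reduced view of the module: symbol -> symbols it references.
using SymbolRefs = std::map<std::string, std::vector<std::string>>;

// Keep every type symbol reachable from a live use (e.g. a variable's type
// reference) and erase all others.
void dropUnusedTypeSymbols(SymbolRefs &types,
                           const std::vector<std::string> &liveUses) {
  std::set<std::string> reachable;
  std::vector<std::string> worklist(liveUses.begin(), liveUses.end());
  while (!worklist.empty()) {
    std::string symbol = worklist.back();
    worklist.pop_back();
    if (!reachable.insert(symbol).second)
      continue; // already visited; this also stops the walk on cycles
    auto it = types.find(symbol);
    if (it == types.end())
      continue;
    for (const std::string &ref : it->second)
      worklist.push_back(ref);
  }
  // Erase the unreachable entries; map::erase returns the next iterator.
  for (auto it = types.begin(); it != types.end();) {
    if (reachable.count(it->first))
      ++it;
    else
      it = types.erase(it);
  }
}
```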
I have a slight preference for solution 1) since it follows the design of the alias scope and access group metadata. I believe it is also more flexible if we have to model additional cyclic dependencies, and it does not require handwritten C++ attribute definitions.
I also considered other possible solutions:
3) AFAIK, MLIR core currently has no way to represent structs and other composite types. It may thus be OK if the debug information does not support these concepts either. We could then stick with the current implementation and simply make sure the import from LLVM IR drops cyclic debug info.
4) We could change the representation of the debug info to avoid the cyclic dependencies without losing any information. Intuitively, the type debug info should be representable as a tree in which a type is fully defined by the root and the leaves of the tree (cf. the example below). Such a representation deviates quite a bit from the LLVM IR debug metadata representation and makes import and export more complex.
At the moment, LLVM IR represents a composite type using a root node that contains a list of elements pointing to the fields of the struct/class. The fields are modeled as derived types that each have a back reference (their scope) to the root node. All uses of a type, for example all variables of that type, then only reference the root node:
#root = di_composite_type<name = "A", elements = #leaf1, #leaf2>
#leaf1 = di_derived_type<name = "field1", scope = #root>
#leaf2 = di_derived_type<name = "field2", scope = #root>
#variable = di_variable<type = #root>
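These back references are a problem for immutable attributes because such an attribute can only be created once all attributes it refers to exist, which requires a bottom-up construction order. The sketch below, plain C++ with hypothetical names (`DepGraph`, `nodesWithoutConstructionOrder`), computes that order with Kahn's algorithm and returns the nodes for which none exists: the nodes on a cycle plus everything that depends on them.

```cpp
#include <map>
#include <string>
#include <vector>

// node -> nodes it refers to (its attribute parameters)
using DepGraph = std::map<std::string, std::vector<std::string>>;

// Returns the nodes that cannot be built bottom-up as immutable values.
std::vector<std::string> nodesWithoutConstructionOrder(const DepGraph &graph) {
  std::map<std::string, int> pending; // dependencies not yet built
  std::map<std::string, std::vector<std::string>> users; // reverse edges
  for (const auto &[node, deps] : graph) {
    pending[node] += static_cast<int>(deps.size());
    for (const auto &dep : deps) {
      users[dep].push_back(node);
      pending.try_emplace(dep, 0); // dependencies without own entry
    }
  }
  // Start with the nodes that refer to nothing.
  std::vector<std::string> ready;
  for (const auto &[node, count] : pending)
    if (count == 0)
      ready.push_back(node);
  // "Build" ready nodes and release their users.
  while (!ready.empty()) {
    std::string node = ready.back();
    ready.pop_back();
    for (const auto &user : users[node])
      if (--pending[user] == 0)
        ready.push_back(user);
  }
  // Anything still waiting is on a cycle or depends on one.
  std::vector<std::string> stuck;
  for (const auto &[node, count] : pending)
    if (count > 0)
      stuck.push_back(node);
  return stuck;
}
```

For the example above, all four nodes are reported, including `#variable`, which only depends on the cycle. The same graph without the root's elements edges, extended with a node that groups the root and the leaves as in the cycle-free variant described next, has a valid construction order.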
As we cannot model the cycle created by the back edges from the leaf nodes to the root node, we may simply drop the “elements” parameter of the composite type and reconstruct it when translating to LLVM IR. Dropping these edges does not lose any information, since the leaf nodes still have a scope parameter that points to the composite type, from which the dropped elements parameter can be reconstructed. There is one caveat, though: we need to make sure the export can find the leaf nodes to reconstruct the dropped edges. This could be achieved by introducing a new node kind that contains the root and the leaf nodes needed to reconstruct the debug information of the type. The cycle-free representation could thus look as follows:
#root = di_composite_type<name = "A">
#leaf1 = di_derived_type<name = "field1", scope = #root>
#leaf2 = di_derived_type<name = "field2", scope = #root>
#struct_type = di_type<root = #root, leaf_nodes = #leaf1, #leaf2>
#variable = di_variable<type = #struct_type>
The export to LLVM IR would then walk back from the leaf nodes to the root and reintroduce the dropped edges.
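A minimal sketch of that reconstruction step, assuming each leaf stores the name of its scope and each bundle stands in for the proposed `di_type` node. The export appends every leaf to the elements list of the composite its scope names, keeps the leaves in the order they appear, and skips leaves it has already added, since several bundles may carry the same leaf. `Leaf`, `TypeBundle`, and `reconstructElements` are hypothetical; the actual export would create LLVM `DICompositeType` nodes rather than string lists.

```cpp
#include <map>
#include <set>
#include <string>
#include <utility>
#include <vector>

// A leaf only knows its own name and the composite it belongs to.
struct Leaf {
  std::string name;
  std::string scope; // name of the enclosing composite type
};

// Stands in for the proposed di_type<root = ..., leaf_nodes = ...>.
struct TypeBundle {
  std::string root;
  std::vector<Leaf> leafNodes;
};

// Rebuild the dropped "elements" lists: composite name -> member names,
// in the order in which the leaves are encountered.
std::map<std::string, std::vector<std::string>>
reconstructElements(const std::vector<TypeBundle> &bundles) {
  std::map<std::string, std::vector<std::string>> elements;
  std::set<std::pair<std::string, std::string>> seen; // (scope, leaf)
  for (const TypeBundle &bundle : bundles) {
    elements[bundle.root]; // a root without leaves still gets an entry
    for (const Leaf &leaf : bundle.leafNodes)
      if (seen.insert({leaf.scope, leaf.name}).second)
        elements[leaf.scope].push_back(leaf.name);
  }
  return elements;
}
```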
While solution 4) seems attractive for the simple example, it is more complex for real-world examples that involve nested and dependent structs. For example, if two structs have pointers to each other (A->B and A<-B), then both of them need to know the union of A’s and B’s leaf nodes. The advantage of this solution is that it is cycle-free (unless I overlooked something) and that it could be modeled with non-mutable attributes.
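The extra bookkeeping for dependent structs can be sketched as a reachability problem: the bundle for a type has to carry the fields of every composite reachable from its root through field types, so A and B in the example above end up with the same leaf set. The sketch is plain C++; `Field`, `Composites`, and `requiredLeafNodes` are hypothetical, and leaves are identified by a `Type.field` string purely for illustration.

```cpp
#include <map>
#include <set>
#include <string>
#include <vector>

// A field and the composite its type refers to (e.g. through a pointer).
struct Field {
  std::string name;
  std::string refersTo; // referenced composite, or "" if none
};

// composite name -> its fields
using Composites = std::map<std::string, std::vector<Field>>;

// Leaf nodes a bundle rooted at `root` must carry: the fields of all
// composites transitively reachable from `root` through its fields.
std::set<std::string> requiredLeafNodes(const Composites &types,
                                        const std::string &root) {
  std::set<std::string> visited;
  std::set<std::string> leaves;
  std::vector<std::string> worklist{root};
  while (!worklist.empty()) {
    std::string current = worklist.back();
    worklist.pop_back();
    if (!visited.insert(current).second)
      continue; // handles A->B->A without looping forever
    auto it = types.find(current);
    if (it == types.end())
      continue;
    for (const Field &field : it->second) {
      leaves.insert(current + "." + field.name);
      if (!field.refersTo.empty())
        worklist.push_back(field.refersTo);
    }
  }
  return leaves;
}
```

This also illustrates the scalability concern: when many structs are connected through pointers, each bundle may end up carrying a large fraction of all leaf nodes.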
Any thoughts on a preferred or better solution? At the moment, I favor 1) or 4).