Handling Cyclic Dependencies in Debug Info

We recently started with importing non-toy programs from LLVMIR into MLIR’s LLVM dialect and came to the realization that LLVM IR’s debug metadata can have cyclic dependencies.

An example of such a cyclic dependencies is a struct that has a pointer to itself, such as a linked list node:

struct node {
 int val;
 struct node *next;
};

This structure results in the following LLVM IR debug info metadata (simplified):

!16 = distinct !DICompositeType(tag: DW_TAG_structure_type, name: "node", elements: !17, identifier: "_ZTS4node")
!17 = !{!18, !19}
!19 = !DIDerivedType(tag: DW_TAG_member, name: "next", baseType: !20)
!20 = !DIDerivedType(tag: DW_TAG_pointer_type, baseType: !16)

which has two back references from the derived types !19 and !20, which represent the next member variable and its type, to the composite type !16, which represents the node. In particular, 20! describes the node pointer type resulting in the cycle !16 → !17 → !19 → !20 → !16.

At the moment, MLIR LLVM dialect represents metadata using attributes that reference each other. If we remove the next member from the struct, the import produces the following LLVM dialect attributes (simplified):

#val_type = #llvm.di_derived_type<
  tag = DW_TAG_member, name = "val", baseType = #di_basic_type
>
#node_type = #llvm.di_composite_type<
  tag = DW_TAG_structure_type, name = "node", elements = #val_type
>

Unfortunately, things stop working once we try to import the full struct due to the cyclic dependencies discussed above. Essentially, there is no topological sort / sequential construction order for the debug info attributes anymore.

We thus need a way to break the cyclic dependencies in the debug info representation. At the moment, our working assumption is these cycles mostly show up for composite types, but there may be more such instances (let us know if you are aware of more such constructs!). We see currently two ways forward:

  1. Use symbols to break cyclic dependencies. Other MLIR representations of LLVM IR metadata - such as access groups, alias scopes, and soon TBAA metadata - use operations that are referenced by symbols to represent metadata. We could introduce such operations to represent composite types or even more debug metadata constructs.

  2. A less intrusive approach could be to use the fact that composite types have an identifier that is supposed to be unique in the compilation unit. We could use this fact to represent back edges to the composite type using this identifier. While clearly a smaller change, this approach is also much more specific to the problem at hand and it may break if there are other kinds of cyclic dependencies.

Let me give you a bit more context on how these solutions could look like. A symbol based approach (solution 1) could use the existing llvm.metadata operation to specify the debug info related information:

llvm.metadata @__debug_info {
  llvm.di_derived_type @next_type {
    tag = DW_TAG_member, name = "next", baseType = @ptr_type
  }
  llvm.di_composite_type @node_type {
    tag = DW_TAG_structure_type, name = "node", elements = @next_type
  }
  llvm.di_derived_type @ptr_type {
    tag = DW_TAG_pointer_type, base_type = @node_type
  }
}

This solution breaks the cyclic dependencies using symbols. One disadvantage I see over the attribute based approach is that it may be harder to remove stale debug information.

An alternative (solution 2) may be to replace the back reference with a special terminal node that references the composite type using the unique identifier:

#next_type_ref = #llvm.di_composite_type_ref<
  identifier = "_ZTS4node"
>
#ptr_type = #llvm.di_derived_type<
  tag = DW_TAG_pointer_type, base_type = #next_type_ref
>
#node_type = #llvm.di_composite_type<
  tag = DW_TAG_structure_type, name = "node", elements = #next_type, identifier = "_ZTS4node"
>
#next_type = #llvm.di_derived_type<
  tag = DW_TAG_member, name = "next", baseType = #ptr_type
>

This solution breaks the cyclic dependencies using a string reference. It assumes the composite type has a unique identifier, which seemingly is the case for ODR source languages (https://llvm.org/docs/LangRef.html#dicompositetype). During the export the composite type reference would then be replaced by a real back edge pointing to the exported composite type metadata node.

Any thoughts and ideas on these two or a possible third solution are very welcome.

There are two other ways to model cyclic dependencies in MLIR, not mentioned here.

  • Make #llvm.di_composite_type a mutable attribute, similar to how !llvm.struct type currently works, that is identified by its identifier. You could then instantiate it with just the identifier name and the later on set its elements, forming a cycle. This is relatively close to your solution 2, although arguably a better treaded path.
  • Similar to solution 1, have you considered a graph region? MLIRs regions/blocks don’t have to follow strict SSA dominance rules. Similar to llvm.metadata, there could be a top level llvm.debug_info op with a graph region. This could eg. then look like:
llvm.debug_info {
  %next_type = llvm.di_derived_type {
    tag = DW_TAG_member, name = "next", baseType = %ptr_type
  }
  %node_type = llvm.di_composite_type {
    tag = DW_TAG_structure_type, name = "node", elements = %next_type
  }
  %ptr_type = llvm.di_derived_type {
    tag = DW_TAG_pointer_type, base_type = %node_type
  }
}

That said, this solution would require the same kind of revamping.

1 Like

@zero9178 Thanks for these ideas!

I did not know that mutable attributes exist and it sounds like they are a strictly better alternative than encoding the back edge using a string. I think this insight makes solution 2 much more attractive!

The example I posted above does only show a subset of the debug metadata usage. One additional aspect, not shown previously, is that we need to reference the debug metadata from the actual program / ir:

// debug metadata definition
llvm.debug_info @ __debug_info {
 llvm.di_derived_type @var_type { ... }
}

// somewhere in the program
llvm.intr.dbg.value llvm.di_local_variable<name="my_var", type = @ __debug_info::var_type> = %arg : i64

The example shows how a debug intrinsic reference a type definition inside the metadata region using a symbol. While a graph region is a great solution for the local dependencies, it does not solve the task of relating debug info to the program / ir? We would thus presumably need a second mechanism to link metadata to the program, which makes me believe using symbols is a better solution.

I started looking into different solutions for the cyclic debug metadata handling and I am currently unsure how to move forward. Above we discussed two main approaches:

  1. Attach the type debug information to an operation embedded in a global metadata operation and reference these operations using symbols to break the cycles.
  2. Use mutable attributes to represent the cyclic dependencies.

I did prototype solution 2) since it is closer to the current debug metadata handling. AFAIK tablegen does not support mutable attributes. The change would therefore produce significant amounts of C++ code for every attribute that has to be mutable. Another issue is that the printer does not replace nested attributes with aliases, which means that nested attributes may be printed multiple times (supposedly due to llvm-project/AsmPrinter.cpp at d94b069a89ec6c54030540c031a1032845bdbac0 · llvm/llvm-project · GitHub).

Solution 1) works with module-level operations but seems more heavy-weight and less scalable than the current solution. For example, dropping unused module-level type information does not happen automatically anymore and has to be done by a module pass. I may be wrong though and the additional cost due to the operation / symbol handling is negligible.

I have a slight preference for solution 1) since it follows the design of the alias scope and access group metadata. I believe it is also more flexible if we have to model additional cyclic dependencies and it does not require C++ attribute definitions.

I also considered other possible solutions:

  1. AFAIK MLIR core currently does not have a way to represent structs and other composite types. It may thus be ok if we do not support these concepts in the debug information as well. We could then stick with the current implementation and simply make sure the import from LLVM IR drops cyclic debug info.
  2. We could change the representation of the debug info to avoid the cyclic dependencies without losing any information. Intuitively, the type debug info should be representable by a tree where a type is then fully defined by the leafs and the root of the tree (c.f., example below). Such a representation deviates quite a bit from the LLVM IR debug metadata representation and makes import and export more complex.

At the moment, LLVM IR represents a composite type using a root node that contains a list of elements that point to the fields of the struct/class composite type. The fields are modeled as derived types that have a back reference to the root node using them. All uses of a type, for example, all variables of that type, then have a reference to the root node only:

#root = di_composite_type<name = "A", elements = #leaf1, #leaf2>

#leaf1 = di_derived_type<name = "field1", scope = #root>

#leaf2 = di_derived_type<name = "field2", scope = #root>

#variable = di_variable<type = #root>

As we cannot model the back edges from the leaf nodes to the root node, we may simply drop the “elements” parameter of the composite type and reconstruct it when translating to LLVM IR. Dropping these edges does not cause an information loss since the leaf nodes still have a scope parameter that points to the composite type, which means the dropped elements parameter can be reconstructed. There is one caveat though, we need to make sure the export can find the leaf nodes to reconstruct the dropped edges. This could be achieved by introducing a new node kind that contains the root and the leaf nodes needed to reconstruct debug information of the type. The cycle free representation thus could look as follows:

#root = di_composite_type<name = "A">

#leaf1 = di_derived_type<name = "field1", scope = #root>

#leaf2 = di_derived_type<name = "field2", scope = #root>

#struct_type = di_type<root = #root, leaf_nodes = #leaf1, #leaf2>

#variable = di_variable<type = #struct_type>

The export to LLVM IR would then walk back from the leaf nodes to the root and reintroduce the dropped edges.

While solution 4) seems attractive for the simple example, it is more complex for real-world examples that have to deal with nested and dependent structs. For example, if two structs have pointers to each other (A->B and A<-B), then both of them need to know the union of A’s and B’s leaf nodes. The advantage of the solution is that it is cycle free (modulo overlooks from my side) and that it could be modeled with non-mutable attributes.

Any thoughts on a preferred or better solution? At the movement, I favor 1) or 4).

@River707 @ftynse

Mutable types were specifically introduced to support structs with cycling references in LLVM dialect struct types - [RFC] First-Party Support for LLVM Types.

We also have a mechanism to print without causing infinite recursion, llvm-project/LLVMTypeSyntax.cpp at ae0c232e46c8d8b4fc3d0e007094db216d7a82f6 · llvm/llvm-project · GitHub. It should be almost directly reusable for attributes. I don’t know how aliases are involved here.

Mutable attributes are not supported by ODS because there is one upstream client for them. The amount of code (and related testing) necessary for supporting these outweighs the amount of code in direct implementation of the only client (LLVMStructType) in C++. Depending on how many mutable attributes you will need to add, it may be worth supporting them in ODS.

As you might have guessed, my preference specifically for debug info related to self-referencing types is to follow the design of self-referencing types, that is, (2). This simplifies the conceptual mapping between the type and the related debug information.

That being said, I don’t have a complete understanding of how debug info is represented in LLVM IR, so symbols or graph region approach may also work I’m not in favor of a significantly diverging modeling (4) as we would like to keep the translation process simple. This modeling would require us to perform heavy graph manipulations in the translation that will be difficult to test and debug. I’d rather debug boilerplate op definitions.

Thanks for the pointers to the printer!

Mutable types were specifically introduced to support structs with cycling references in LLVM dialect struct types - [RFC] First-Party Support for LLVM Types.

I understand the argument of using identical solutions for similar problems. I was assuming LLVM structs with cyclic dependencies would disappear once there are only opaque pointers. I may be wrong though and cyclic structs are still a thing past the switch to opaque pointers only?

I’m not in favor of a significantly diverging modeling (4)

That is a good point. Then I will go for a solution that tries to import/export the debug metadata as is (most likely using mutable attributes). Luckily, it seems like it is sufficient to make the composite type attribute mutable.

I assume they will, but I’m waiting for those changes to definitively land before reflecting them in MLIR to avoid churn. That being said, I don’t know how this affects debug information. FWIW, debug information would be the place to store the underlying pointee type even if we make it opaque in the IR.

I assume they will, but I’m waiting for those changes to definitively land before reflecting them in MLIR to avoid churn. That being said, I don’t know how this affects debug information. FWIW, debug information would be the place to store the underlying pointee type even if we make it opaque in the IR.

Yes the opaque pointers do not solve the debug info problem. Front-end languages will stick to their own cyclic type representations. I was more thinking that there is less of a need to follow the LLVMStructType design given that it may not be super long lived. That said, I think mutable attributes are the better solution in theory. The scary part was my prototype implementation…

Sorry, I’ve been very out for the past month. Just now really getting back into work mode.

This is a great write up @gysit thanks for doing this!

One of the big factors I think we need to take into consideration here is what the intended use cases are for the metadata. If we need to mutate/update/etc. this metadata, different representations are significantly more annoying to deal with in certain scenarios (the symbol approach could potentially break any non-module level passes that want to change/update metadata).

@gysit Is your prototype using mutable attributes public? If it’s the disgustingness of having to rewrite in C++, we should honestly just update the Attr/Type gen to support that. It was always a TODO eventually, might be worth just adding that now. Happy to look into that if you need it.

– River

2 Likes

I uploaded a draft revision (⚙ D142189 WIP mutable composite type attribute.). The “prototype” is very hacky and misses some functionality (e.g., the attribute is incomplete and printing is suboptimal). Despite the incompleteness the amount of C++ is large compared to the existing tablegen implementation. Especially, since testing of the error handling during parsing will be necessary etc. I thus think it would be very nice to have support for mutable attributes. The question is if DICompositeTypeAttr and LLVMStructType justify the effort. Especially, since LLVMStructType is quite complex and probably hard to fully implement in tablegen.

When doing the prototype, I saw the following pain points:

  1. It is definitely quite a lot of C++ code compared to the few lines of tablegen.
  2. When printing any nested attribute is fully printed even if an alias would exist. I am worried that this bloats the size of the textual representation by quite a bit.
  3. The elements field of DICompositeType is not part of the uniquing anymore. That means I most likely need an extra string identifier (the prototype uses an integer identifier) to ensure composite types that differ only in terms of elements can be represented. This identifier may also be used to print a short form if the attribute references itself. Unfortunately, it seems like I cannot use the name of the DICompositeType since there may be different composite types with the same name.

Let me illustrate some of these points. As mentioned in 2), if I parse

#int0 = #llvm.di_basic_type<
  tag = DW_TAG_base_type, name = "int"
>
#comp1 = #llvm.di_composite_type_mut<
  id = 0, name = "test1", elements = #int0
>

it re-prints as

#int = #llvm.di_basic_type<
 tag = DW_TAG_base_type, name = "int"
>

#comp = #llvm.di_composite_type_mut<
  id = 0, name = "test1", 
  elements = #llvm.di_basic_type<tag = DW_TAG_base_type, name = "int0">
>

with the integer type being printed twice (since it is used in another place in the file). This may be a shortcoming of my implementation since I do not yet implement SubElementAttrInterface. Or, it could be a limitation of the AsmPrinter (llvm-project/AsmPrinter.cpp at d94b069a89ec6c54030540c031a1032845bdbac0 · llvm/llvm-project · GitHub)?

The printing gets more complex if the attribute has self references (point 3)).

#comp = #llvm.di_composite_type_mut<
  id = 0, name = "test0",
  elements = #llvm.di_composite_type_mut<id = 0, name = "test1">
>

The custom print method here needs to detect the type is printed recursively and drop the mutable parts when printing the inner instance of the type. I think it would be much nicer if the alias could be printed instead.

#comp = #llvm.di_composite_type_mut<
  id = 0, name = "test0",
  elements = #comp
>

It can also happen that the alias is unknown if two composite types mutually reference each other. I thus believe the solution of LLVMStructType with a unique identifer StringAttr is the best solution for the problem (c.f. llvm-project/llvmir-types.mlir at 87e345b1bdb76867cc6e9ae59b6dd2633a480d38 · llvm/llvm-project · GitHub) and working with aliases only cannot work.

Admittedly, points 2) and 3) are cosmetic problems and not actual implementation problems.