We recently started importing non-toy programs from LLVM IR into MLIR’s LLVM dialect and realized that LLVM IR’s debug metadata can have cyclic dependencies.
An example of such a cyclic dependency is a struct that contains a pointer to itself, such as a linked list node:
struct node {
int val;
struct node *next;
};
This structure results in the following LLVM IR debug info metadata (simplified):
!16 = distinct !DICompositeType(tag: DW_TAG_structure_type, name: "node", elements: !17, identifier: "_ZTS4node")
!17 = !{!18, !19}
!19 = !DIDerivedType(tag: DW_TAG_member, name: "next", baseType: !20)
!20 = !DIDerivedType(tag: DW_TAG_pointer_type, baseType: !16)
which has two back references from the derived types !19 and !20, which represent the next member variable and its type, to the composite type !16, which represents the node. In particular, !20 describes the node pointer type, resulting in the cycle !16 → !17 → !19 → !20 → !16.
At the moment, the MLIR LLVM dialect represents metadata using attributes that reference each other. If we remove the next member from the struct, the import produces the following LLVM dialect attributes (simplified):
#val_type = #llvm.di_derived_type<
tag = DW_TAG_member, name = "val", baseType = #di_basic_type
>
#node_type = #llvm.di_composite_type<
tag = DW_TAG_structure_type, name = "node", elements = #val_type
>
Unfortunately, things stop working once we try to import the full struct due to the cyclic dependencies discussed above. An attribute can only be built after the attributes it references, so once there is a cycle, there is no topological sort / sequential construction order for the debug info attributes anymore.
We thus need a way to break the cyclic dependencies in the debug info representation. At the moment, our working assumption is that these cycles mostly show up for composite types, but there may be more such instances (let us know if you are aware of more such constructs!). We currently see two ways forward:
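To make the ordering problem concrete, here is a minimal Python sketch (not part of the importer) that models the simplified metadata nodes above as a dependency graph and asks the standard library for a construction order. The dictionaries and the `construction_order` helper are illustrative assumptions only.

```python
from graphlib import TopologicalSorter, CycleError

# Each node maps to the nodes it references, following the simplified
# metadata listing above (!18 stands for the "val" member, not shown there).
full_struct = {
    "!16": ["!17"],
    "!17": ["!18", "!19"],
    "!18": [],
    "!19": ["!20"],
    "!20": ["!16"],  # back edge to the composite type
}

# The same struct without the "next" member has no back edge.
without_next = {"!16": ["!17"], "!17": ["!18"], "!18": []}


def construction_order(deps):
    """Return an order in which every node follows its references, or None."""
    try:
        return list(TopologicalSorter(deps).static_order())
    except CycleError:
        return None


print(construction_order(without_next))
print(construction_order(full_struct))
```

Without the next member the nodes can be built leaves first; with it, no such order exists and the helper returns None.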
- Use symbols to break cyclic dependencies. Other MLIR representations of LLVM IR metadata - such as access groups, alias scopes, and soon TBAA metadata - use operations that are referenced by symbols to represent metadata. We could introduce such operations to represent composite types or even more debug metadata constructs.
- A less intrusive approach could be to use the fact that composite types have an identifier that is supposed to be unique in the compilation unit. We could use this fact to represent back edges to the composite type using this identifier. While clearly a smaller change, this approach is also much more specific to the problem at hand and it may break if there are other kinds of cyclic dependencies.
Let me give you a bit more context on what these solutions could look like. A symbol-based approach (solution 1) could use the existing llvm.metadata operation to specify the debug info related information:
llvm.metadata @__debug_info {
llvm.di_derived_type @next_type {
tag = DW_TAG_member, name = "next", baseType = @ptr_type
}
llvm.di_composite_type @node_type {
tag = DW_TAG_structure_type, name = "node", elements = @next_type
}
llvm.di_derived_type @ptr_type {
tag = DW_TAG_pointer_type, baseType = @node_type
}
}
This solution breaks the cyclic dependencies using symbols. One disadvantage I see compared to the attribute-based approach is that it may be harder to remove stale debug information, since unused metadata operations stay in the module until something explicitly erases them.
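As a rough illustration of what removing stale symbol-based debug info could involve, the following Python sketch keeps only the symbols reachable from a set of roots. The symbol table, the `unused_type` entry, and the `prune_unused` helper are hypothetical; this is not an existing MLIR pass.

```python
def prune_unused(symbols, roots):
    """Return only the symbols reachable from `roots`.

    `symbols` maps a symbol name to the names it references.
    """
    live, worklist = set(), list(roots)
    while worklist:
        name = worklist.pop()
        if name in live:
            continue  # already visited; this also terminates cycles
        live.add(name)
        worklist.extend(symbols[name])
    return {name: refs for name, refs in symbols.items() if name in live}


# Symbol references from the llvm.metadata example above, plus one
# hypothetical stale entry that nothing refers to.
symbols = {
    "next_type": ["ptr_type"],
    "node_type": ["next_type"],
    "ptr_type": ["node_type"],
    "unused_type": [],
}

print(sorted(prune_unused(symbols, ["node_type"])))
```

Note that the three types in the cycle keep each other referenced, so a simple "has no uses" check would never remove them; a reachability walk from the actual users is needed.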
An alternative (solution 2) may be to replace the back reference with a special terminal node that references the composite type using the unique identifier:
#next_type_ref = #llvm.di_composite_type_ref<
identifier = "_ZTS4node"
>
#ptr_type = #llvm.di_derived_type<
tag = DW_TAG_pointer_type, baseType = #next_type_ref
>
#next_type = #llvm.di_derived_type<
tag = DW_TAG_member, name = "next", baseType = #ptr_type
>
#node_type = #llvm.di_composite_type<
tag = DW_TAG_structure_type, name = "node", elements = #next_type, identifier = "_ZTS4node"
>
This solution breaks the cyclic dependencies using a string reference. It assumes the composite type has a unique identifier, which seemingly is the case for ODR source languages (https://llvm.org/docs/LangRef.html#dicompositetype). During export, the composite type reference would then be replaced by a real back edge pointing to the exported composite type metadata node.
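The export step could work roughly as in the following Python sketch: convert the acyclic attributes into mutable nodes, remember each composite type by its identifier, and patch the references afterwards. The `MDNode` class, the tuple encoding of attributes, and the `export` function are assumptions for illustration and do not mirror the actual MLIR or LLVM APIs.

```python
class MDNode:
    """Hypothetical stand-in for an exported, mutable metadata node."""

    def __init__(self, kind):
        self.kind = kind
        self.fields = {}


# Acyclic attribute description as (kind, fields) tuples, mirroring solution 2.
node_ref = ("composite_type_ref", {"identifier": "_ZTS4node"})
ptr_type = ("derived_type", {"tag": "DW_TAG_pointer_type", "baseType": node_ref})
next_type = ("derived_type",
             {"tag": "DW_TAG_member", "name": "next", "baseType": ptr_type})
node_type = ("composite_type",
             {"tag": "DW_TAG_structure_type", "name": "node",
              "elements": [next_type], "identifier": "_ZTS4node"})


def export(attr):
    by_identifier = {}  # identifier -> exported composite type node
    pending = []        # (node, field name, identifier) to patch later

    def convert(a):
        kind, fields = a
        node = MDNode(kind)
        if kind == "composite_type" and "identifier" in fields:
            by_identifier[fields["identifier"]] = node
        for key, value in fields.items():
            if isinstance(value, tuple) and value[0] == "composite_type_ref":
                # Defer: the referenced composite type may not be complete yet.
                pending.append((node, key, value[1]["identifier"]))
            elif isinstance(value, tuple):
                node.fields[key] = convert(value)
            elif isinstance(value, list):
                # Simplification: references nested directly in lists are
                # not patched in this sketch.
                node.fields[key] = [convert(v) for v in value]
            else:
                node.fields[key] = value
        return node

    root = convert(attr)
    # Second phase: replace each string reference with a real back edge.
    for node, key, identifier in pending:
        node.fields[key] = by_identifier[identifier]
    return root


root = export(node_type)
pointer = root.fields["elements"][0].fields["baseType"]
print(pointer.fields["baseType"] is root)
```

The lookup by identifier only works if the identifier really is unique among the exported composite types, which is the assumption this solution rests on.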
Any thoughts and ideas on these two or a possible third solution are very welcome.