TL;DR
Our current way of printing LLVMStructType in the LLVM dialect doesn’t scale well, in terms of memory consumption and printing speed, for complicated structs (multiple layers of nested structs). We propose a new way to print LLVMStructType that combines type aliasing with the shorthand syntax.
Note that this proposal not only changes the LLVM dialect but also affects how type aliases are implemented.
Problem definition
Here is what struct types look like in the LLVM dialect:
!llvm.struct<"simple", (i32, i32)>
!llvm.struct<"nested", (ptr<struct<"simple", (i32, i32)>>, i16, i8)>
!llvm.struct<"self_reference", (ptr<struct<"self_reference">>, i16, i32)>
!llvm.struct<"nested_sibling", (ptr<struct<"simple", (i32, i32)>>,
ptr<struct<"simple", (i32, i32)>>)>
As “nested” shows, the LLVM dialect always spells out the body of a nested struct type (here, the inner “simple”) unless the struct references itself, as “self_reference” does above. In that case it uses a shorthand syntax, !llvm.struct<"struct_name">.
Note that the shorthand syntax only applies to types that have already appeared in the parent hierarchy, so “nested_sibling” still spells out “simple” twice, once in each sibling element type.
Imagine a struct type that is both wide (it has a large number of element types) and deep (it has a pointer to another struct, which in turn has a pointer to yet another struct, and so on), and that is used by many operations (every reference to a struct type in an operation also spells out the struct body). It is then easy to spend a significant amount of memory on the spelled-out struct body strings, and printing them takes a long time as well. In our earlier trials on real-world code (SQLite3), printing the MLIR ran for more than 5 minutes before the process was killed due to OOM (on a machine with 32GB of memory).
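To make the blow-up concrete, here is a minimal Python sketch of the behaviour described above. It is only a model, not the actual MLIR printer: the `Struct` and `Ptr` classes, `print_current`, and `chain` are hypothetical names made up for illustration, and the `!llvm.` prefix is omitted. The sketch assumes that shorthand is used only for structs found in the parent chain, as in the examples above.

```python
from dataclasses import dataclass, field


@dataclass
class Ptr:
    pointee: object  # a Struct or a scalar type name such as "i32"


@dataclass
class Struct:
    name: str
    body: list = field(default_factory=list)  # scalar names, Ptr, or Struct


def print_current(ty, parents=()):
    """Model of today's printer: shorthand only for structs in the parent chain."""
    if isinstance(ty, str):
        return ty
    if isinstance(ty, Ptr):
        return f"ptr<{print_current(ty.pointee, parents)}>"
    if ty.name in parents:
        return f'struct<"{ty.name}">'
    inner = ", ".join(print_current(e, parents + (ty.name,)) for e in ty.body)
    return f'struct<"{ty.name}", ({inner})>'


def chain(depth):
    """Build a struct whose two fields both point to the next level down."""
    ty = Struct("s0", ["i32", "i32"])
    for level in range(1, depth + 1):
        ty = Struct(f"s{level}", [Ptr(ty), Ptr(ty)])
    return ty


simple = Struct("simple", ["i32", "i32"])
sibling = Struct("nested_sibling", [Ptr(simple), Ptr(simple)])
print(print_current(sibling))  # "simple" is spelled out twice

# The printed length more than doubles with every extra nesting level.
for depth in (4, 8, 12):
    print(depth, len(print_current(chain(depth))))
```

In this model a struct with only two pointer fields per level already grows exponentially with depth, and that is for a single reference to the type; every operation that mentions the type pays the same cost again.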
Proposed solution
The solution consists of two parts. First, the printer always prints the shorthand syntax for struct types whose body has already been printed earlier in the same module. In other words, we memoize the struct bodies we have printed globally and avoid printing them again.
llvm.func @foo(%arg0: !llvm.ptr<i8>) {
// The body of "hello" is spelled out here
%0 = llvm.bitcast %arg0: !llvm.ptr<i8> to !llvm.ptr<struct<"hello", (i32, i32)>>
// "hello" is printed in shorthand syntax here
%1 = llvm.load %0: !llvm.ptr<struct<"hello">>
...
}
// "hello" is printed in shorthand syntax here
llvm.func @bar(%arg0: !llvm.ptr<struct<"hello">>) {
...
}
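The memoization in the first part can be sketched in a few lines of Python. This is an illustrative model under stated assumptions, not the proposed C++ change: `Struct`, `Ptr`, and `MemoPrinter` are hypothetical names, the `!llvm.` prefix is omitted, and one printer instance is assumed to live for the whole module.

```python
from dataclasses import dataclass, field


@dataclass
class Ptr:
    pointee: object  # a Struct or a scalar type name such as "i32"


@dataclass
class Struct:
    name: str
    body: list = field(default_factory=list)  # scalar names, Ptr, or Struct


class MemoPrinter:
    """Prints each struct body once per module, then falls back to shorthand."""

    def __init__(self):
        self.printed = set()  # names of structs whose body was already emitted

    def show(self, ty):
        if isinstance(ty, str):
            return ty
        if isinstance(ty, Ptr):
            return f"ptr<{self.show(ty.pointee)}>"
        if ty.name in self.printed:
            return f'struct<"{ty.name}">'
        # Record the name before recursing so self-references use shorthand too.
        self.printed.add(ty.name)
        inner = ", ".join(self.show(e) for e in ty.body)
        return f'struct<"{ty.name}", ({inner})>'


hello = Struct("hello", ["i32", "i32"])
printer = MemoPrinter()
print(printer.show(Ptr(hello)))  # first use: body spelled out
print(printer.show(Ptr(hello)))  # later uses: shorthand only
```

With this scheme the total output is bounded by one body per distinct struct plus one short reference per use, regardless of nesting depth.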
However, it might be hard for developers to trace back to the body of a struct when it is printed in shorthand syntax, because the body might have been printed at an arbitrary position earlier in the code.
To address this, the second part of this proposal creates type aliases for struct types:
// The body of "hello" is always spelled out at top level
!llvmStruct_hello = !llvm.struct<"hello", (i32, i32)>
llvm.func @foo(%arg0: !llvm.ptr<i8>) {
%0 = llvm.bitcast %arg0: !llvm.ptr<i8> to !llvm.ptr<!llvmStruct_hello>
%1 = llvm.load %0: !llvm.ptr<!llvmStruct_hello>
...
}
llvm.func @bar(%arg0: !llvm.ptr<!llvmStruct_hello>) {
...
}
Whether we should create aliases only for expensive struct types or for every struct type is open for discussion. The point is that developers can quickly navigate to the struct body, since type alias definitions always appear at the top.
Let’s look at an example of these type aliases:
!alias_a = !llvm.struct<"nested_sibling", (ptr<struct<"simple", (i32, i32)>>,
ptr<struct<"simple">>)>
!alias_b = ...
!alias_c = ...
...
!alias_z = !llvm.struct<"simple">
Now, when we trace a type that uses !alias_z, a problem similar to the one discussed above emerges again: we would like to know the body of “simple”, but that information is located somewhere in the earlier type alias definitions, possibly buried in another large struct body.
A simple re-ordering can solve this issue though:
!alias_z = !llvm.struct<"simple", (i32, i32)>
!alias_a = !llvm.struct<"nested_sibling", (ptr<struct<"simple">>,
ptr<struct<"simple">>)>
Basically, when we encounter a nested struct, it is better to hoist its type alias definition before the current one. This looks more natural, and it is also easier for developers to track down the struct body from a type alias reference.
This ordering policy is the last change we would like to propose here.
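One way to obtain this ordering is a post-order walk over the struct nesting, so that every nested struct is emitted before the struct that contains it. The Python sketch below is an illustration only: `Struct`, `Ptr`, `alias_order`, `shallow`, and `emit_aliases` are hypothetical names, and the `!llvmStruct_<name>` alias naming simply follows the earlier example. How mutually referencing structs should be ordered is not settled by this proposal; the sketch just visits each struct once.

```python
from dataclasses import dataclass, field


@dataclass
class Ptr:
    pointee: object  # a Struct or a scalar type name such as "i32"


@dataclass
class Struct:
    name: str
    body: list = field(default_factory=list)  # scalar names, Ptr, or Struct


def alias_order(roots):
    """Post-order walk: nested structs come before the structs that use them."""
    order, seen = [], set()

    def visit(ty):
        if isinstance(ty, str):
            return
        if isinstance(ty, Ptr):
            visit(ty.pointee)
            return
        if ty.name in seen:
            return
        seen.add(ty.name)
        for element in ty.body:
            visit(element)
        order.append(ty)

    for root in roots:
        visit(root)
    return order


def shallow(ty, top=True):
    """Spell out only the outermost struct body; nested structs use shorthand."""
    if isinstance(ty, str):
        return ty
    if isinstance(ty, Ptr):
        return f"ptr<{shallow(ty.pointee, False)}>"
    if not top:
        return f'struct<"{ty.name}">'
    inner = ", ".join(shallow(e, False) for e in ty.body)
    return f'struct<"{ty.name}", ({inner})>'


def emit_aliases(roots):
    return [f"!llvmStruct_{s.name} = !llvm.{shallow(s)}" for s in alias_order(roots)]


simple = Struct("simple", ["i32", "i32"])
sibling = Struct("nested_sibling", [Ptr(simple), Ptr(simple)])
for line in emit_aliases([sibling]):
    print(line)
```

For the “nested_sibling” example this emits the definition of “simple” first and then “nested_sibling” with both element types in shorthand, matching the re-ordered listing above.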
Your comments on this proposal are much appreciated!
Q & A
Q: Are type alias definitions ordered now?
A: Yes, they’re sorted in lexicographic order by alias name (I don’t know why).
Q: Can a struct type alias definition that appears later use previously defined struct type aliases instead of the shorthand syntax?
A: I tried this before, but an MLIR type alias always needs to appear before any reference to it, so this is not applicable to struct types that have mutual references. We can mix the two strategies though (using type aliases most of the time and the shorthand syntax only for mutually referencing structs). This is open for discussion.