We don’t have a presentation tomorrow; instead, we’ll discuss the topic of partial lowering, type conversion, and the role of “cast” operations.
Below are some thoughts to prime the discussion, but I’d love to hear about everyone’s experience!
Composability of the lowering strategies in MLIR involves partial lowering, where different parts of a program are lowered through different paths in stages. The lowering of operations composes naturally and does not pose any particular friction. However, type conversions are more challenging and require materializing explicit conversions in the IR to cast back and forth between partially converted types.
For example, look at this piece of IR:
```
%0 = read_array_from_file(%filename) : (!toy<"string">) -> (!toy<"array<?x?xf32>">)
%1 = read_array_from_file(%filename) : (!toy<"string">) -> (!toy<"array<?x?xf32>">)
%2 = "toy.mul"(%0, %1) : (!toy<"array<?x?xf32>">, !toy<"array<?x?xf32>">) -> (!toy<"array<?x?xf32>">)
```
Starting from this IR, which operates on the toy array type, a partial lowering going through a hypothetical tensor type could look like:
```
%0 = read_array_from_file(%filename) : (!toy<"string">) -> (!toy<"array<?x?xf32>">)
%1 = read_array_from_file(%filename) : (!toy<"string">) -> (!toy<"array<?x?xf32>">)
%2 = "std.mulf"(%0, %1) : (!toy<"array<?x?xf32>">, !toy<"array<?x?xf32>">) -> (tensor<?x?xf32>)
```
This would break the verifier for the std.mulf operation, which expects tensor operands but receives toy array inputs. Fixing this requires, in general, introducing some sort of cast operation to repair the type system during partial lowering:
```
%0 = read_array_from_file(%filename) : (!toy<"string">) -> (!toy<"array<?x?xf32>">)
%1 = read_array_from_file(%filename) : (!toy<"string">) -> (!toy<"array<?x?xf32>">)
%0_cast = cast(%0) : (!toy<"array<?x?xf32>">) -> tensor<?x?xf32>
%1_cast = cast(%1) : (!toy<"array<?x?xf32>">) -> tensor<?x?xf32>
%2 = "std.mulf"(%0_cast, %1_cast) : (tensor<?x?xf32>, tensor<?x?xf32>) -> (tensor<?x?xf32>)
```
This kind of cast can be seen as a “promise” that the two sides of the cast will ultimately converge to the same type after all partial conversions have finished, and then the cast can be eliminated. Such an operation could be interpreted opaquely in the most conservative way (all possible effects, escaping pointers, etc.).
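As a toy model of this “promise” (a Python sketch, deliberately not MLIR’s actual infrastructure; the tuple-based op representation is invented for illustration), cast elimination can be thought of as folding away mirrored cast pairs once all conversions have converged, with any leftover cast signaling a failed lowering:

```python
# Toy model of cast elimination (illustrative only; not MLIR's infrastructure).
# An op is a tuple ("cast", src_type, dst_type); adjacent mirrored casts cancel.

def fold_casts(ops):
    """Fold away adjacent inverse cast pairs: cast B->A of cast A->B is identity."""
    changed = True
    while changed:
        changed = False
        for i in range(len(ops) - 1):
            a, b = ops[i], ops[i + 1]
            if a[0] == b[0] == "cast" and a[1] == b[2] and a[2] == b[1]:
                del ops[i:i + 2]
                changed = True
                break
    return ops

# The "promise" is kept: two partial lowerings materialized mirrored casts.
assert fold_casts([("cast", '!toy<"array<?x?xf32>">', "tensor<?x?xf32>"),
                   ("cast", "tensor<?x?xf32>", '!toy<"array<?x?xf32>">')]) == []

# The promise is broken: an unresolved cast survives to the end of the pipeline.
assert fold_casts([("cast", '!toy<"array<?x?xf32>">', "tensor<?x?xf32>")]) != []
```

In the conservative interpretation above, a surviving cast at the end of the pipeline is the signal that some partial conversion never converged.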
In the current state, the only such operation is llvm.mlir.cast, which allows for casting between an LLVM type and a builtin type in both directions. This operation isn’t helpful, however, when types other than the builtins are involved.
The TypeConverter class exposes hooks that can be extended by users to materialize such conversions. Unfortunately, configuring these hooks coherently across a multi-pass lowering pipeline isn’t very convenient at the moment. (This issue is somewhat separable, but still relevant to the discussion.)
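As a rough illustration of the shape of such hooks (a Python sketch with invented names like add_conversion and add_materialization; this is not MLIR’s actual C++ API):

```python
# Toy sketch of a type converter with user-registered hooks.
# All names here are illustrative, not MLIR's API.

class ToyTypeConverter:
    def __init__(self):
        self._conversions = []   # type -> type rules, tried newest-first
        self._materialize = None # builds a cast when producer/consumer types disagree

    def add_conversion(self, fn):
        self._conversions.append(fn)

    def add_materialization(self, fn):
        self._materialize = fn

    def convert_type(self, ty):
        for fn in reversed(self._conversions):  # later registrations win
            out = fn(ty)
            if out is not None:
                return out
        return ty  # unconverted types pass through unchanged in this sketch

    def materialize(self, value, target_ty):
        # Emit an explicit cast op bridging the partially converted types.
        return self._materialize(value, target_ty)

converter = ToyTypeConverter()
converter.add_conversion(
    lambda ty: "tensor<?x?xf32>" if ty == '!toy<"array<?x?xf32>">' else None)
converter.add_materialization(lambda v, ty: ("cast", v, ty))

assert converter.convert_type('!toy<"array<?x?xf32>">') == "tensor<?x?xf32>"
assert converter.materialize("%0", "tensor<?x?xf32>") == ("cast", "%0", "tensor<?x?xf32>")
```

The pain point in the text is that each lowering pass builds and configures its own converter like this, so the rules and materializations are not shared across the pipeline.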
It also does not necessarily compose well when a single SSA value involves multiple types, such as a container: !my.list<!other.type>, where the container type, the inner type, or both can require a conversion.
Additionally, we want this conversion to be configurable on a per-module basis, so that multi-module MLIR programs (such as those for heterogeneous compilation) work cleanly.
So far, the lack of a general opaque cast means that users must provide these in an ad-hoc fashion. This is cumbersome: in some situations the cast operation is only required because of the way dialect conversion works, and it isn’t always clear in which dialect such ops should be added. For example, the llvm.mlir.cast operation only handles builtin types, so other dialects can’t directly reuse it. The ad-hoc approach does have the advantage of providing immediate verification, though, as each user-provided conversion operation can enforce that the conversion happens only between two supported and “compatible” types. A generic cast would “delay” any lowering errors to the end of the pipeline, where a lack of convergence leaves behind unresolved casts.
Finally, one last issue is that of arity, where 1->N or N->N conversions are required; however, this isn’t a common case, and the conversion framework already has limited support for making such cases Just Work.
Below is a summary of some of the issues related to type conversions and cast operations that have been encountered by real users:
- The cast operation is often only required by the nature of dialect conversion.
- In some situations, a cast operation is only temporarily necessary during a single conversion pass and isn’t intended to outlive the pass. Unfortunately, users must still define a custom cast operation just for this step, a burden imposed only because they used the conversion infra.
- TypeConverters are currently configured individually by each lowering pass, meaning that conversions and materializations are not shared.
- This prevents partial lowering from being a viable lowering strategy in many cases, because operands/block arguments/etc. can’t be properly converted.
- It isn’t clear where each cast operation should live.
- The location in which the cast operation is placed is often decided arbitrarily. In some cases, it is added to the dialect that can accept a dependency on both input and output types. In other cases, it is placed in the dialect that is “less important”, i.e. the dialect that is “more willing” to take an additional operation/dependency.
- Cast operations today either have low reusability or just “accept the world”.
- The llvm.mlir.cast operation takes a slightly more principled approach in that it only handles builtin types. The cost of this is that the cast is only viable in certain scenarios. To get around this, several dialects simply define “any_to_any” casts to satisfy the constraints of the framework.
- It seems useful to have a concept of a cast-like op that “needs to be eliminated” by matching/folding with other cast ops but doesn’t have an independent executable semantics.
- For example, folding cast_a_to_b(cast_b_to_a(x)) -> x would be the only valid way to lower those ops. This pairing of cast ops with other ops that undo them is always naturally satisfied when using the dialect conversion infrastructure.
- See the discussion in D91967 ([mlir][std] Mark tensor_to_memref as no side-effect).
- Having a unified cast op formalizes these requirements.
A previous discussion on Discourse can be found in [RFC] Dialect type cast op.