[RFC] A New "One-Shot" Dialect Conversion Driver

ftynse · May 22, 2024, 7:25pm

Fair point. I am coming from the implicit assumption that the “proper” way of using the conversion infra is to collect all patterns associated with a single type system change and do one sweep, which will make this situation significantly less likely.

gysit · May 23, 2024, 6:10am

Was it ever considered to do the cloning approach on a per operation basis? A partial conversion could theoretically clone only the operations that need to be converted and insert them right after the original operation (in the same module). A separate map data structure could then track original and converted operations. This map could be used to provide follow up pattern applications with original and converted operands. The conversion pattern driver would then be responsible to replace/erase the original operations after all operations have been converted.

The difficulty is probably the region handling (I presume the regions would have to be moved to the newly created operations and temporary unrealized conversion casts would need to be introduced to provide versions of the block arguments with the original type) and maybe also the step that replaces the uses of the original operations. Rollback sounds also difficult with regards to the region handling, but not impossible.

matthias-springer · May 23, 2024, 8:05am

Cloning works best on isolated-from-above regions. Otherwise, cloned ops may be visible when iterating over the users of an SSA value.

gysit · May 23, 2024, 8:22am

True, during the conversion the IR would be in a transient state with old and new operations interleaved. That may not be desired.

ingomueller-net · May 23, 2024, 1:45pm

I would like to understand a bit better how type conversion would be handled in the various proposals.

The original proposal says this:

and a later addition says this:

I could understand the latter as: the pattern has to do the type conversion itself it it wants to change types. This would be possible if the pattern itself creates (source-to-target) unrealized_conversion_casts from the original input values and (target-to-source) before replacing the results. Note that in that case, each pattern would actually preserve the (source) types. Also note that, f the same type conversions are used everywhere, most casts would cancel each other out and disappear during pattern application.

Is this what you had in mind, @matthias-springer?

A similar approach is used in the 1:N type conversion pass. The logic of adding the forward and backward casts is encapsulated in OneToNConversionPattern (here and here).

This sort of works and the infrastructure does not need to know about these case unless we want to support user-provided source and target materializations: if some of the unrealized casts that don’t fold away should be replaced by a specific logic, then we need to track which are the casts that where added during conversion and replace those. (Examples for a 1:N conversion are tuples being decomposed into their individual fields or complex numbers being decomposed into their real and imaginary parts, where the source-to-target materialization could be tuple.to_values %i : tuple<!a, !b, !c> -> (!a, !b, !c) or complex.re+complex.im and target-to-source materialization doing the reverse, respectively.)

The 1:N conversion solves this in kind of an ugly way: it annotates all casts that it inserts with a particular attribute (here) and then, after applying all patterns, walks over the entire IR and replaces all ops with that attribute with the user-provided logic (here). I guess the reason for doing so was to be able to re-use the greedy pattern driver.

If, instead, the type conversion is handled by the infrastructure, I think it could more easily track the casts that it inserts (assuming that that’s what it ends up doing to bridge source and target types) and then clean up remaining casts with user materializations after pattern application.

I guess that would be another argument for having the casts being inserted by the infrastructure?

BTW, since the 1:N type conversion follows the approach of eagerly inserting casts, it may be worth trying to get an idea of how bad that problem actually is by experimenting with that for a bit…

Is this still the case? Can you provide an example for this, @ftynse?

To be sure we agree on why this is important: if we were to provide a pattern the original op with unrealized casts from the target to the source type system, then patterns can (1) look at the new types through an adaptor and (2) look at the old type by inspecting the root op of the pattern. (This is done in the 1:N type conversion.) However, because the inputs of both may now be produced by a cast, patterns can’t look beyond that. I doubt that that’s common and I think we need good reasons to support it.

mehdi_amini · May 23, 2024, 2:41pm

I am confused: how does I differ from what the current conversion driver does? Is it just an extra step of cloning before invoking the patterns?

matthias-springer · May 23, 2024, 3:21pm

Yes, that’s what I had i mind. This should actually happen automatically in OneShotConversionPatternRewriter::replaceOp, which will insert the unrealized_conversion_cast if the types do not match. So it’s ultimately the driver that inserts the cast.

Yes, I expect many unrealized_conversion_cast ops to appear and disappear throughout the conversion process. (Thus the “pooling” idea that was mentioned above.)

Oh, I didn’t think of these. We could either keep track of these ourselves or delegate that to the user. Via the listener mechanism that allows the user to listen to op creations, etc. We probably didn’t have that listener support in the pattern driver yet when you implemented the 1:N conversion.

gysit · May 23, 2024, 3:51pm

I am confused: how does I differ from what the current conversion driver does? Is it just an extra step of cloning before invoking the patterns?

Maybe it is!

I thought the conversion driver only materializes IR changes at the end of the conversion (but maybe certain changes are materialized immediately during pattern execution?). The “cloning” approach would not do any book keeping except for storing a pair of original and converted operation for every converted op (plus for the block arguments). The conversion patterns themselves would also be more restrictive in the sense that they match and operation and return a converted clone. So “cloning”, is up to the pattern. Changing an operation in-place would not be possible in this model.

I think Matthias’ point is right tough. Having original and converted operations side-by-side during the conversion is confusing when debugging etc.

To illustrate what I meant. Let’s say we have the following input and want to convert from index to i64:

func.func(%arg0 : index) -> index {
  %res = arith.add %arg0, %arg0 : index
  func.return %res : index
}

Step one would be converting the function itself (someone needs to move the region and introduce casts for the block argument here…):

func.func(%arg0 : index) -> index 
func.func(%conv_arg0 : i64) -> i64 {
  %arg0 = unrealized_conversion_cast %conv_arg0 : (i64) -> index
  %res = arith.add %arg0, %arg0 : index
  func.return %res : index
}

Step two would be converting/cloning the ops in the function body (original and converted operations are side-by-side):

func.func(%arg0 : index) -> index 
func.func(%conv_arg0 : i64) -> i64 {
  %arg0 = unrealized_conversion_cast %conv_arg0 : (i64) -> index
  %res = arith.add %arg0, %arg0 : index
  %conv_res = arith.add %conv_arg0, %conv_arg0 : i64
  func.return %res : index
  func.return %conv_res : i64
}

Step three would be deleting the original operations:

func.func(%conv_arg0 : i64) -> i64 {
  %conv_res = arith.add %conv_arg0, %conv_arg0 : index
  func.return %conv_res : i64
}

Anyways I don’t want to (further) derail the discussion. I just thought this could help avoiding cloning the full module when doing partial conversion.

mehdi_amini · May 23, 2024, 5:22pm

When you call rewriter.create<Op>(...) it is immediately created.
The delayed part is “replaceAllUsesWith” which only populates a mapping. But it’s not even that the driver will materialize it, it is the future conversion patterns of the users that will do so.
So what you describe is not far from how the dialect conversion operates today, this what in the RFC is described as:

It is difficult to use and to debug. Some examples:

IR dumps during a dialect conversion (before/during/after a pattern application) are broken: ops that were already scheduled for erasure are still shown, value replacements have not manifested yet (they are tracked in an IRMapping instead).

Here is an example of a trace today (without type conversion):

func.func @func(%in1 : i32, %in2 : i32) -> i32 {
  %add = arith.addi %in1, %in2 : i32
  %mul = arith.muli %add, %in2 : i32
  %add2 = arith.addi %mul, %add : i32
  return %add2 : i32
}

Which you can try running with mlir-opt --convert-to-llvm -debug and you’ll get this trace (pruned here):

  * Pattern : 'arith.addi -> ()' {
    ** Insert  : 'llvm.add'(0x563d1f02cb20)
    ** Replace : 'arith.addi'(0x563d1eff7340)

"llvm.func"() <{CConv = #llvm.cconv<ccc>, function_type = !llvm.func<i32 (i32, i32)>, linkage = #llvm.linkage<external>, sym_name = "func", visibility_ = 0 : i64}> ({
^bb0(%arg0: i32, %arg1: i32):
  %0 = "llvm.add"(%arg0, %arg1) <{overflowFlags = #llvm.overflow<none>}> : (i32, i32) -> i32
  %1 = "arith.addi"(%arg0, %arg1) <{overflowFlags = #arith.overflow<none>}> : (i32, i32) -> i32
  %2 = "arith.muli"(%1, %arg1) <{overflowFlags = #arith.overflow<none>}> : (i32, i32) -> i32
  %3 = "arith.addi"(%2, %1) <{overflowFlags = #arith.overflow<none>}> : (i32, i32) -> i32
  "func.return"(%3) : (i32) -> ()
}) : () -> ()

Here the first add is converted to llvm.add, which is inserted right here immediately. The replaceAllUsesWith isn’t materialized: %1 = "arith.addi"(%arg0, %arg1) is still used in %2 = "arith.muli"(%1, %arg1) .
Then:

  * Pattern : 'arith.muli -> ()' {
    ** Insert  : 'llvm.mul'(0x563d1f02f540)
    ** Replace : 'arith.muli'(0x563d1eff7d50)
"llvm.func"() <{CConv = #llvm.cconv<ccc>, function_type = !llvm.func<i32 (i32, i32)>, linkage = #llvm.linkage<external>, sym_name = "func", visibility_ = 0 : i64}> ({
^bb0(%arg0: i32, %arg1: i32):
  %0 = "llvm.add"(%arg0, %arg1) <{overflowFlags = #llvm.overflow<none>}> : (i32, i32) -> i32
  %1 = "arith.addi"(%arg0, %arg1) <{overflowFlags = #arith.overflow<none>}> : (i32, i32) -> i32
  %2 = "llvm.mul"(%0, %arg1) <{overflowFlags = #llvm.overflow<none>}> : (i32, i32) -> i32
  %3 = "arith.muli"(%1, %arg1) <{overflowFlags = #arith.overflow<none>}> : (i32, i32) -> i32
  %4 = "arith.addi"(%3, %1) <{overflowFlags = #arith.overflow<none>}> : (i32, i32) -> i32
  "func.return"(%4) : (i32) -> ()
}) : () -> ()

Now we converted the mul: the llvm.mul is using the result of the llvm.add, but it’s not used anywhere else. The original ops are still connected as in the original IR: if we delete the llvm ops we have still the original IR.

At the end the IR looks like:

"llvm.func"() <{CConv = #llvm.cconv<ccc>, function_type = !llvm.func<i32 (i32, i32)>, linkage = #llvm.linkage<external>, sym_name = "func", visibility_ = 0 : i64}> ({
^bb0(%arg0: i32, %arg1: i32):
  %0 = "llvm.add"(%arg0, %arg1) <{overflowFlags = #llvm.overflow<none>}> : (i32, i32) -> i32
  %1 = "arith.addi"(%arg0, %arg1) <{overflowFlags = #arith.overflow<none>}> : (i32, i32) -> i32
  %2 = "llvm.mul"(%0, %arg1) <{overflowFlags = #llvm.overflow<none>}> : (i32, i32) -> i32
  %3 = "arith.muli"(%1, %arg1) <{overflowFlags = #arith.overflow<none>}> : (i32, i32) -> i32
  %4 = "llvm.add"(%2, %0) <{overflowFlags = #llvm.overflow<none>}> : (i32, i32) -> i32
  %5 = "arith.addi"(%3, %1) <{overflowFlags = #arith.overflow<none>}> : (i32, i32) -> i32
  "llvm.return"(%4) : (i32) -> ()
  "func.return"(%5) : (i32) -> ()
}) : () -> ()

Notice the two returns: we have the entire IR duplicated in the function. If you were to slice from the llvm.return you get the newly converted IR, similarly from the func.return you get the old one untouched.

The final stage of the driver is now to delete all the old operations that were “replacedAllUsesWith”. If any still has users which can’t be removed (haven’t been converted), an unrealized_cast is inserted to use the result of the new op (only when the type changes, if I remember correctly).

gysit · May 23, 2024, 5:40pm

Thanks for the explanation! I thought the operation creation is also delayed.

matthias-springer · May 26, 2024, 1:23pm

I built a proof-of-concept: [draft] Dialect Conversion without Rollback by matthias-springer · Pull Request #93412 · llvm/llvm-project · GitHub

Note, this implementation is not polished yet. Just to illustrate what I had in mind.

The new driver is compatible with existing dialect conversion patterns. I updated two test cases (nvgpu-to-nvvm, complex-to-standard). Minimal changes are needed. I did not want to build a completely separate dialect conversion, but something that can be gradually migrated to. (With the hope that the old driver can be removed at some point.)

The main implementation is in GreedyPatternRewriteDriver.cpp. I was able to reuse most of the worklist-based infrastructure. The implementation is around 250 LoC (partial conversion only, some TODOs left). The old driver is around 2000 LoC.

matthias-springer · May 27, 2024, 2:57pm

Yes, that would be possible. But it would increase the complexity of the existing driver further (additional “if” checks).

There are also some fundamental parts of the existing driver that we should change: e.g., it is based on recursion (to legalize newly created ops).

ftynse · May 28, 2024, 3:19pm

I don’t know if such a case currently exists upstream and don’t necessarily have time to look it up. Generally, I expect the burden of proof to be on whoever proposes a change.

If you suggest that patterns should not look beyond the root of the pattern, that sounds extremely restrictive for some cases. For example, if one wanted to rewrite select(cmpi) into a min/max, or basically anything that looks at getDefiningOp.

Can’t we just keep unrealized_conversion_cast, maybe with some additional annotation, and let the user rewrite those separately? FWIW, unrealized_conversion_cast did not exist when the materializations were first added, so one had to configure materializations.

One random comment. I remember several discussions around an operation-aware TypeConverter (as opposed to the current one that always returns the same converted type given original types). What’s your take on this?

nicolasvasilache · May 28, 2024, 3:45pm

+1 getting a context-aware type conversion is to me one of the biggest benefit of getting a new toy in this space.

matthias-springer · May 28, 2024, 6:58pm

This sounds like a good strategy. We can add a SmallVector<UnrealizedConversionCastOp> * field to ConversionConfig that will be populated with newly-inserted casts.

I think had uses cases for that in the past. So I would say generally a good idea. But also somewhat orthogonal to this driver discussion. (I think it could be implemented even in the current driver with not too much effort.)

AlexanderViand-Intel · June 24, 2024, 9:11pm

Sorry to take this further off-topic, but I’d also love to see an op/context-aware type converter as I just ran into the limitations of the current converter twice in a row.

On the actual topic: a simpler dialect converter, especially if it’s easier to debug, would be very much appreciated

matthias-springer · June 25, 2024, 7:21am

Can you describe a bit why an op-aware type converter would be useful? In theory, a dialect conversion could even be used without a type converter. Are you calling TypeConverter::convertType from within a conversion pattern?

AlexanderViand-Intel · July 8, 2024, 3:18pm

For example, in HEIR we have a situation where we translate from the abstract notion of “computation on secrets” to actual cryptographic operations over ciphertexts. There, we do a type conversion from, say i32 to lwe.ciphertext<...stuff..., i32>. Unfortunately, one of the things in “stuff” is a dimension attribute, where some operations have output-dim = input-dim, but others (like multiplication) actually output a ciphertext with input-dim + 1 dimensions. With the current type converter, a simple multiplication conversion pattern would fail as it does not produce the expected type. (We work around this at the moment by also emitting a “reduce ctxt dimensions” operation as part of the pattern, but then we have to do extra work to get rid of those later).

matthias-springer · August 17, 2024, 6:11pm

Status update: Over the last months, I have done a bunch of refactorings, code cleanups and bug fixes in the the dialect conversion code base. Thanks for the reviews, everyone. We had a few breakages in downstream projects. Some of these were due to bad test coverage of the dialect conversion framework in upstream MLIR. But with every breakage, the test coverage is getting better, as we are extending our test suite.

With [mlir][Transforms] Dialect conversion: Make materializations optional by matthias-springer · Pull Request #104668 · llvm/llvm-project · GitHub, automatic argument/source/target materializations through the type converter will become optional. (Commit in review now. Feel free to review or leave comments!)

If enabled (default), argument/source/target materializations are built after the “finalize” phase of the dialect conversion, when all other IR changes have already been materialized. There used to be a quite complex analysis to predict not-yet-materialized IR changes (in particular: future uses of unresolved materializations), that is now no longer needed. See commit message for more details.

Users can now turn off automatic materializations when they are not needed. I’ve seen this in some downstream projects: users had to register materialization functions on the type converter that built unrealized_conversion_cast ops. That will no longer be necessary.

Turning off automatic materializations is also a way to debug failed to legalize unresolved materialization failures. Unresolved materializations will then show up as unrealized_conversion_cast ops in the resulting IR. (An IR that is valid and where all IR changes have been materialized, in contrast to -debug output.)

My current One-Shot Dialect Conversion driver prototype does not have automatic materializations, so this commit was also an important milestone towards compatibility between the two drivers.

j2kun · December 5, 2024, 5:57pm

To add to this, the generic situation we find ourselves in is that we want to do some analysis over an IR, and lower the same type differently based on the result.

Imagine for simplicity each SSA value is assigned an integer, and that materializes to a type attribute in a lower level dialect during the conversion pass.

The way we would have to do this now is by having a pass run the analysis, populate attributes on the ops, and then manually type convert in the pattern.

The downside is that many of the helpers (FunctionOpInterfaceSignatureConversion and similar for interface-backed ops) break because the types of region/block arguments are not able to take into account the op.

Topic		Replies	Views
[RFC] Merging 1:1 and 1:N Dialect Conversions MLIR	9	1274	January 4, 2025
Dialect conversion: type materialization MLIR	19	2659	March 10, 2021
Conversions with Multiple Target Dialects MLIR	13	1013	August 28, 2020
Partial lowering with type conversions MLIR	11	1301	June 9, 2020
Chains of unrealized casts MLIR	7	906	July 29, 2022

[RFC] A New "One-Shot" Dialect Conversion Driver

Related topics