PSA: run -reconcile-unrealized-casts after all -convert-*-to-llvm from now on

I feel that if the need for a PSA was felt then it should have been sent before landing the change.

I also find the current state is a bit unfortunate: I want a conversion to happen (and a conversion that claims to be std to llvm say) but for that I have to somehow know to add a pass that seems completely unrelated (which I think is main reason for this PSA as it would have been difficult to understand why the breakage is occuring as these things). Looking back at when op was added I see: “is generally only intended to be
used to satisfy the temporary intermixing of type systems during the conversion
of one type system to another.” - but this change raises it to outside the conversions for me, it makes it something whose lifetime exceeds a type conversion.

1 Like

I can understand this argument, but past practice was exactly the inverse.

Let me emphasize the part of “during the conversion of one type system to another”. Dialects are not equivalent to type systems so, as long as we have conversions between dialects, the casts will appear as results of those conversions. This pass does not make the op exceed a type conversion, it actually marks the end of said conversion. If we want to never see the casts, the conversion pass must be able to handle all ops that can possibly operate in the original type system, as explained above.

I agree that it might not be obvious that this pass needs to be run, but I argue that it is even less obvious that one needs to run convert-std-to-llvm for the same reason, and do so even if they have no standard operations in the input IR and to convert builtin types! This is also something fixable with documentation, but I haven’t yet got the time to push it out given that I have to repeat the justification stated in the commit message. documents the overall conversion process.

1 Like

Some food for thought: this is an instance of Combining analysis, combining optimizations in disguise.

Looking a bit deeper at the evolution of dialects and conversions, one would realize that unrealized_conversion_casts op is an abstraction that leaks across pass boundaries. IIRC it used to be a tmp custom op with a special named hidden deep inside the conversion infra (at the time it was hard to print IIRC).

The various conversions can be interpreted are “somewhat artificially split along dialect lines”: convertXtoY instead of convertWorldToY. The reasons for those splits are themselves valid (independent dialects, do not include the world etc).

Still the state of the world is that we have a somewhat leaky abstraction today and IMO a sink cleanup pass is significantly better than adding yet more stuff to each conversion.

Note that we have something that is philosophically very similar with all the bufferization passes and similar questions often pop up each time one wants to pass state across pass or pattern boundaries.

I’d love to see some proposals / thoughts about this more general and recurrent problem and how MLIR can help mitigate it.

1 Like

This might just be a matter of naming/discoverability/framing. I believe that everything @ftynse is doing here is consistent with the principles outlined in Type Conversions the Not So Hard Way (slides, recording). If folks aren’t familiar with the best practices for ecosystem-scalable type conversion, I really recommend watching that talk.

Although it is implemented in a slightly different way mechanically, -reconcile-unrealized-casts is exactly the same as FinalizingBufferizePass. In general, all type conversions that happen over the course of multiple passes (usually for layering/incrementality/scalability reasons) will need something like this, and it results in great diagnostics when things go wrong :slight_smile: (no ops/types “slipping through”, etc.)

1 Like

Good point!
That said, since the unrealized_cast are really a part of the dialect conversion infrastructure, could they be handled more specifically by this infra? Could the dialect conversion framework be taught to recognize them and apply special treatment? I haven’t thought about this in as much details as you do, just throwing superficial ideas here :slight_smile:

We’re a bit off-topic here, but past practices have gone both ways (example: PSA: change in pass registration (please update your APIs!) and PSA: breaking change for C++ custom printer/parser and PSA: Change for FuncOp syntax ), we just don’t have discussed the expectations so it is left up to the judgement of the people involved.
It is always easy to address post-commit, even sometimes just revert for the time being to have more discussion when someone has strong concerns with something.

Thanks for pointing this out! I haven’t pushed my thinking that far. The only immediate comment is I wouldn’t necessarily consider unrealized_conversion_cast a leak here, we intend for it to be in the IR until it is cleaned up. Certainly, the op itself has no meaningful runtime semantics.

This reminds me of the slow-running discussions we have on phase ordering and pass granularity in context of (codegen) transformations. We were previously interested in pass granularity that is smaller than one pass per function/module. Here, we can go into another direction: there is a transformation of “type conversion” that consists of multiple passes but should be conceptually treated as one.

Indeed, I had the FinalizingBufferizePass in mind as a mechanism to declare “the type conversion should be complete”.

And thanks for posting the presentation, the partial+finalizing conversion is also the approach taken here. Maybe we should consider turning those slides into a document / tutorial.

It isn’t part of the infrastructure strictly speaking. We configure the type converter to emit unrealized_conversion_casts - I believe the infrastructure would just fail otherwise, at least for source and target materialization. I have also seen direct emission of unrealiezd_conversion_cast in conversion patterns in the wild, although I would argue those patterns weren’t doing the conversion right.

If we want to keep things extensible, we probably don’t want to hardcode the cast operation into the infrastructure itself, there may be more “natural” casts that can be inserted by users. What we could do is to provide some sort of finalization clean-up hook that would run after the normal legalization process and after materialization hooks. This starts resembling the staged pattern approach some Linalg passes use -

One thing that is unclear to me here is how to indicate the place in the pass pipeline where we are no longer expecting to have unrealized casts. Each pass could have an option for this that gets forwarded to the conversion infra, but this felt more complex and intrusive (out-of-tree passes would need to be adapted too, and pass infra doesn’t know that we run dialect conversion) than just having a pass for that purpose.

Slightly orthogonally, I also toyed with the idea of having “verification passes” in context of verifier/analysis dependency discussions. Such passes are only there to ensure that the IR can be consumed by following stages and cleaner error reporting and, e.g., can be omitted in no-asserts builds. In this case, we could have a “verification pass” that checks if the previous passes actually folded away the casts, but this still makes it a pass.

Please throw more ideas if you have them! :slight_smile:

1 Like

I haven’t read the article Nicolas linked (paywall), but I was thinking about this the other day in the context of lowering MHLO->Linalg. IREE has a linalg extension dialect where we have a few ops that are linalg-like, but don’t quite fit into linalg (yet, maybe :stuck_out_tongue:). We currently lower from MHLO->LinalgExt and MHLO->Linalg in two separate passes. As part of handling detensorization and signedness, we need type conversion and in this case only some of the ops get converted, but we’re left with some casts (e.g. casting an mhlo.constant of tensor<i32> to i32, but the constant is lowered later). So I marked unrealized_conversion_cast legal for this lowering. But unrealized_conversion_cast is also legal in MHLO->Linalg (which I hadn’t realized, actually). I like that the patch under discussion makes explicit something that we were already doing implicitly (for the same reason I was pro-unrealized_conversion_cast in the first place). In general, I find it much easier to reason about unconverted IR than about errors coming from the dialect conversion framework (which I was fighting with quite a bit when setting this up). I think the thing we need is better formalization on exactly how far these casts are allowed to persist and an explicit cleanup pass seems like a step in that direction. This is my recent PR working on the aforementioned example:

I also have a patch I sent a while ago to add a generic way of creating verification passes, as Alex mentions, but it got stalled It’s also not quite as nice as I would have liked because it can’t be just a pass you create in your pipeline (as I had originally envisioned), as constructing a ConversionTarget requires a context. I will probably take another look at it, but if someone else is interested in picking that up, or has a better idea, then I’d be happy to cede the effort. Actually looking at it now, I see that ConversionTarget only needs a context in order to construct OperationNames. I wonder if it could switch to using strings and then you could construct it without a context (we did something similar with pass pipelines at some point, IIRC). Not sure if that’s worth it.

This is incorrect. You don’t need to run the -canonicalize pass to have the folding hooks to run. Any greedy rewriter call or the conversion driver will call the folding hooks irrespective of whether any canonicalization patterns are added. For eg. even if you call the greedy rewrite driver with zero patterns (empty pattern list) or any of your patterns, folding will still happen for all registered ops.

With dialect conversion, I thought it was only for ops marked illegal. And type conversions use dialect conversion. So typically only a small set of specifically chosen ops would be folded? Am I missing something here?

I really have strong concerns with this change, the thinking here, and the explanations provided. I would just strongly recommending reverting this patch.

I’m of course pro reverting, if someone has significant concerns, but I actually don’t really understand how your concerns here apply differently to the current state vs the previous state. Previously everything had to run the std-to-llvm and now it instead needs to run this pass. In neither case was this happening within each pass. I certainly understand concerns about where unrealized casts should persist and cross pass boundaries, but that was the case before and after this patch (and something we should continue to discuss). Also AFAIU Sean is correct that dialect conversion doesn’t attempt to fold ops that are already legal. Anyway, I’m signing off for the weekend, but just my 2¢

Strong +1 here. I was as well exactly making these points.

This is a really strange argument. If someone wants to convert std to llvm or whatever else to the llvm dialect, it’s what they would run. Requiring an extra pass for what should have been handled as part of the conversion driver naturally and running such a pass isn’t a change in the right direction I fee.

Folding is always considered a good thing to do — as early as possible by design. This is why the greedy rewrite driver from the very start has always applied the folding hook for every op in its inner worklist iteration irrespective of which patterns are in the list or on which ops the supplied patterns are. If the dialect conversion infra isn’t running the folding hook on all ops, that would be inconsistent with the general thinking and one should consider improving that infra and have the folding hook run so that this naturally happens as part of the conversion infra. We shouldn’t need to run an extra cleanup pass for what should have been logically and naturally handled inside of -convert-std-to-llvm or similar passes like @jpienaar and @nicolasvasilache also strongly echo.

Just to make it clear: it is fine that the maintainer here initially assumed that a post-commit review was sufficient – so this is in line with the LLVM policy. However, I think a major change like this one without exploring the right direction first is out of line and I would request a revert here in line with the policy. We shouldn’t be trying to fix this going further in the forward direction.

I’m just going to answer this in a general way by saying: fixing things using the right approach or discussing the approach first is better than fixing it with some approach. That’s because the changes may not necessarily go in the right direction and cause unnecessary churn – the author/maintainer could have missed other approaches.

Reg. folding, please see my previous response as to why the behavior of the dialect conversion driver should be the same as that of applyPatternsAndFoldGreedily on the folding part. In a worklist-driven rewrite or conversion driver that iterates until a fixed point, if something can be folded away as a result of IR around you changing, there is no point delaying it – it is a good thing to always do that and we already do that pervasively (you already know the amount of transformation in upstream and downstream MLIR that is based on applyPatternsAndFoldGreedily). The point I’ve been making (mostly asking) is that a cycle of unrealized conversion casts can always be folded away using local folding hooks and I’m not sure you are considering that before supporting the approach this revision takes. If all unrealized conversion casts can’t be optimized away, one should strive to drop how many ever possible as soon as possible – having the folding hooks run properly allows that. My concerns are in no way different from those of @jpienaar and @nicolasvasilache – it’s just that I consider them strong enough to warrant a revert.

If we keep going in the right direction, we should never have to run an unrealized-conversion-cast pass separately in the pipeline; it can just be a test pass for testing sake! Progressive lowering doesn’t mean you spill over close to internal temporary abstractions from your pass when you simply didn’t need to – this would be an overdoing and an abuse of the principle. In fact, I feel pass pipeline developers would and should always strive to (and they already do) to not be left with any unrealized conversions cast at the terminus of any correct and logically complete pipeline. So the casts would disappear in some conversion/transformation pass or another.

Why would someone interested in targeting llvm dialect from a set of dialect that does not include std should run -std-to-llvm? (I think this is the argument you’re answering here)

I’m pro reverting and discussing this in more detail as per developer guidelines, since there is a strong objection.

The one thing this pass does that is useful, is that if for whatever reason it is not possible to fully eliminate the unrealized_conversion_cast ops, then it fails compilation. It thus provides a guarantee to later passes that they are gone. In my mind, it is the property that this pass establishes, that the unrealized_conversion_cast ops are gone, which is the really useful thing. And none of the individual partial conversion passes can enforce that.

I think we can debate whether or not all the partial conversions should naturally run such a cleanup, which if possible, should leave no unrealized_conversion_cast ops if possible. Though of course there are many natural times where the cast ops must live between passes, such as a def from dialect a and use from dialect b being converted in a-to-llvm and then b-to-llvm – there is fundamentally a cast op that must live between those passes while only one of the two ops has been converted. And also when discussing the scope of this feature, what is under discussion is really folding away arbitrary type conversion materializations, which might involve arbitrary other ops (such as the ones that happen during bufferization) – so knowing which folds to run “automatically” to eliminate them might be tricky. I think if that’s not fairly trivial to implement, then having the type conversion “finalization” pass such as FinalizingBufferize / reconcile-unrealized-casts additionally have the responsibility to do the final folding is totally reasonable.

One of the great virtues of the dialect conversion framework is that it does exactly what you tell it to based on the legality you specify. Having it randomly fold arbitrary other ops in the program just because it can would be something I extremely strongly oppose. That has numerous issues related to debuggabilty and random “spooky action at a distance”. In fact, I have even filed a feature request to be able to turn off the “fold everything” behavior in applyPatternsAndFoldGreedily! PR49502

1 Like

Partial conversions will consider ops marked illegal. Full conversions will consider ops not marked legal, i.e., they will also process ops that are marked neither way.

It is exactly this. I’m not sure @bondhugula understands the current state of the conversion nor the direction taken with the std splitting.

Let’s say we are converting from a set of dialects A,B,C to the LLVM dialect with multiple passes. The pipeline will contain A->LLVM, B->LLVM and C->LLVM. Even the first pass will create casts. The last pass may or may not remove them. If C happens to be “std”, the corresponding pass will remove remaining casts because it was historically set up to do so because of it being monolithic. If the original set of dialects did not contain “std”, the caller still has to run std-to-llvm because that’s where the current clean-up pattern lives! This only makes sense if we think “std” is somehow privileged, which we explicitly decided against with the std splitting.

We don’t do these conversions with a single pass because of scalability reasons. We have 9 dialects in-tree that convert to LLVM. Do we really want to have passes converting all possible combinations of those?

I would kindly ask you not to imply that I submitted the patch without review. This is wrong. It was reviewed by @nicolasvasilache. I said that I would address post-commit reviews should they appear, as usual.

I agree with Sean here. This folding behavior is desirable often but not always. Given that we are building an infrastructure that intends to be open and adaptable, it should be a choice for the user.

You seem to ignore the details of how the dialect conversion infrastructure operates, and my previous three explanations, and base your argument on how the greedy driver works, which is irrelevant for conversions. Let’s try by example. Given the following.

%0 = "b.op"() : () -> !type
"a.op"(%0) : !type -> ()

We apply a conversion from A to LLVM. The type converter from LLVMCommon is used with, in addition, !type being converted to !llvm.i32. Before applying the pattern to rewrite “a.op”, the infrastructure looks up the operands to be supplied to the pattern (note that ConversionPattern takes a separate list of operands from the operation, unlike the regular RewritePattern, specifically for type conversion purposes). Since the type converter is converting away from !type, the infrastructure interprets it as !type not being legal. Therefore, it calls the target materialization hook to produce an operand of a legal type. In LLVMTypeConverter, the hook produces the cast operation.

/// Incomplete IR state
%0 = "b.op"() : () -> !type
%1 = unrealized_conversion_cast %0 : !type to !llvm.i32
/// ...

As with all operations produced in the conversion process, the infrastructure checks whether the newly created operation is legal on the conversion target. In this case, we must have declared it legal (or dynamically legal with some condition). If it is not legal, the infrastructure will attempt to apply foldings and patterns to make it legal, and fail entirely if it can’t.

The conversion pattern is then applied and the resulting IR is as follows.

%0 = "b.op"() : () -> !type
%1 = unrealized_conversion_cast %0 : !type to !llvm.i32
"llvm.op"(%1) : !llvm.i32 -> ()

The conversion succeeds and the pass terminates.

We now apply a conversion from B to LLVM, with the same type converter as before. There are no operands for “b.op”, so the pattern can be applied directly. The temporary IR state is as follows.

%0_new = "llvm.op"() : () -> !llvm.i32
%0 = "b.op"() : () -> !type
%1 = unrealized_conversion_cast %0 : !type to !llvm.i32
"llvm.op"(%1) : !llvm.i32 -> ()

The infrastructure detects that the conversion pattern changed the type of the result: %0_new and %0 have different types. And it is likely that the users of %0 may become invalid if they are given a value of !llvm.i32 type instead of !type. The infrastructure has no way of knowing whether they accept !llvm.i32 or not. (Incidentally, the previous setup in std-to-llvm would force the operand type change, some leading to incorrect code). Therefore, it calls the source materialization hook. In LLVMTypeConverter, the hook produces another cast operation.

%0_new = "llvm.op"() : () -> !llvm.i32
%0_new_casted = unrealized_conversion_cast %0_new : !llvm.i32 to !type
%0 = "b.op"() : () -> !type
%1 = unrealized_conversion_cast %0 : !type to !llvm.i32
"llvm.op"(%1) : !llvm.i32 -> ()

Now %0_new_casted can be used in place of %0 if necessary. As before, the infrastructure needs to make sure all produced operations are legal. The operation produced in this pass is the one defining %0_new_casted. There are three options, described in the commit message and my post above, but let me repeat again.

  1. unrealized_conversion_cast is legal - nothing happens, legal operations are not modified, the operations persist in the IR.
  2. unrealized_conevrsion_cast is illegal - the infrastructure tries to fold it, the folding does not apply on %0_new_casted, there are no patterns specified for it, the operation cannot be removed or legalized, the conversion process fails entirely.
  3. unrealized_conversion_cast is dynamically legal - it is unclear what condition it should be. When the operation it produced, it may not have any users yet! So the condition of “all users are backward casts” may apply spuriously. We would need a way for the dynamic legality check to look into the guts of the conversion infrastructure to see pending value replacements. I doubt this is desirable, and the complexity of the check will be huge. Alternatively, unrealized_conversion_cast may be conferred a special status so that is “understood” and handled by the infrastructure itself, but that would go against the “as little built-in as possible” principle.

Furthermore, if one decides to run these two passes in the opposite order, the folding will apply! Relying only on folding would create an extremely fragile pass ordering requirement that needs a profoundly deep understanding of how the conversion infrastructure works, which, as my repeated explanations can suggest, shouldn’t be necessary to just run a pass.

It is likely possible to construct an example with three dialects where the folding would never apply.

Upd: when looking for a circular example, I realized that the folding will not apply in the example above if we swap the pass order. The IR between two passes is as follows.

%0 = "llvm.op"() : () -> !llvm.i32
%1 = unrealized_conversion_cast %0 : !llvm.i32 -> !type
"a.op"(%0) : !type -> ()

Since the cast operation is already there in the IR, the infrastructure will try processing it. Same three options are available.

  • unrealized_conversion_cast is legal, nothing will happen, the folding will not be called when the second operation is created because the created operation itself is legal.
  • unrealized_convresion_cast is illegal, the conversion will fail immediately on %1.
  • unrealized_conversion_cast is dynamically legal. The legality check will have to be based on whether the operation is present in the input IR (legal) or is being added by a materialization hook (illegal and should be folded). This is currently impossible.

I can see some merit of @bondhugula’s suggestion here: we may want fold newly created operations regardless of them being legal. This wouldn’t help with the original pass order though.

Given that I myself seem to not understand all the details of conversion when reasoning “on paper” rather than running code, I would claim that we absolutely don’t want pass ordering requirements to be dependent on this. And giving the users a “get rid of these undesired ops” pass is the least complex way.

An alternative way to think about it is so see all -*-to-llvm passes as -*-to-llvm-and-builtin, which is technically true. The -reconcile-unrealized-casts is then merely -builtin-to-llvm, which is how it originally started before I realized that it was not specific to LLVM.