I’m just going to answer this in a general way by saying: fixing things using the right approach or discussing the approach first is better than fixing it with some approach. That’s because the changes may not necessarily go in the right direction and cause unnecessary churn – the author/maintainer could have missed other approaches.
Reg. folding, please see my previous response as to why the behavior of the dialect conversion driver should be the same as that of applyPatternsAndFoldGreedily on the folding part. In a worklist-driven rewrite or conversion driver that iterates until a fixed point, if something can be folded away as a result of IR around you changing, there is no point delaying it – it is a good thing to always do that and we already do that pervasively (you already know the amount of transformation in upstream and downstream MLIR that is based on applyPatternsAndFoldGreedily). The point I’ve been making (mostly asking) is that a cycle of unrealized conversion casts can always be folded away using local folding hooks and I’m not sure you are considering that before supporting the approach this revision takes. If all unrealized conversion casts can’t be optimized away, one should strive to drop how many ever possible as soon as possible – having the folding hooks run properly allows that. My concerns are in no way different from those of @jpienaar and @nicolasvasilache – it’s just that I consider them strong enough to warrant a revert.
If we keep going in the right direction, we should never have to run an unrealized-conversion-cast pass separately in the pipeline; it can just be a test pass for testing sake! Progressive lowering doesn’t mean you spill over close to internal temporary abstractions from your pass when you simply didn’t need to – this would be an overdoing and an abuse of the principle. In fact, I feel pass pipeline developers would and should always strive to (and they already do) to not be left with any unrealized conversions cast at the terminus of any correct and logically complete pipeline. So the casts would disappear in some conversion/transformation pass or another.
Why would someone interested in targeting llvm dialect from a set of dialect that does not include std should run -std-to-llvm? (I think this is the argument you’re answering here)
I’m pro reverting and discussing this in more detail as per developer guidelines, since there is a strong objection.
The one thing this pass does that is useful, is that if for whatever reason it is not possible to fully eliminate the unrealized_conversion_cast ops, then it fails compilation. It thus provides a guarantee to later passes that they are gone. In my mind, it is the property that this pass establishes, that the unrealized_conversion_cast ops are gone, which is the really useful thing. And none of the individual partial conversion passes can enforce that.
I think we can debate whether or not all the partial conversions should naturally run such a cleanup, which if possible, should leave no unrealized_conversion_cast ops if possible. Though of course there are many natural times where the cast ops must live between passes, such as a def from dialect a and use from dialect b being converted in a-to-llvm and then b-to-llvm – there is fundamentally a cast op that must live between those passes while only one of the two ops has been converted. And also when discussing the scope of this feature, what is under discussion is really folding away arbitrary type conversion materializations, which might involve arbitrary other ops (such as the ones that happen during bufferization) – so knowing which folds to run “automatically” to eliminate them might be tricky. I think if that’s not fairly trivial to implement, then having the type conversion “finalization” pass such as FinalizingBufferize / reconcile-unrealized-casts additionally have the responsibility to do the final folding is totally reasonable.
One of the great virtues of the dialect conversion framework is that it does exactly what you tell it to based on the legality you specify. Having it randomly fold arbitrary other ops in the program just because it can would be something I extremely strongly oppose. That has numerous issues related to debuggabilty and random “spooky action at a distance”. In fact, I have even filed a feature request to be able to turn off the “fold everything” behavior in applyPatternsAndFoldGreedily! PR49502
Partial conversions will consider ops marked illegal. Full conversions will consider ops not marked legal, i.e., they will also process ops that are marked neither way.
It is exactly this. I’m not sure @bondhugula understands the current state of the conversion nor the direction taken with the std splitting.
Let’s say we are converting from a set of dialects A,B,C to the LLVM dialect with multiple passes. The pipeline will contain A->LLVM, B->LLVM and C->LLVM. Even the first pass will create casts. The last pass may or may not remove them. If C happens to be “std”, the corresponding pass will remove remaining casts because it was historically set up to do so because of it being monolithic. If the original set of dialects did not contain “std”, the caller still has to run std-to-llvm because that’s where the current clean-up pattern lives! This only makes sense if we think “std” is somehow privileged, which we explicitly decided against with the std splitting.
We don’t do these conversions with a single pass because of scalability reasons. We have 9 dialects in-tree that convert to LLVM. Do we really want to have passes converting all possible combinations of those?
I would kindly ask you not to imply that I submitted the patch without review. This is wrong. It was reviewed by @nicolasvasilache. I said that I would address post-commit reviews should they appear, as usual.
I agree with Sean here. This folding behavior is desirable often but not always. Given that we are building an infrastructure that intends to be open and adaptable, it should be a choice for the user.
You seem to ignore the details of how the dialect conversion infrastructure operates, and my previous three explanations, and base your argument on how the greedy driver works, which is irrelevant for conversions. Let’s try by example. Given the following.
We apply a conversion from A to LLVM. The type converter from LLVMCommon is used with, in addition, !type being converted to !llvm.i32. Before applying the pattern to rewrite “a.op”, the infrastructure looks up the operands to be supplied to the pattern (note that ConversionPattern takes a separate list of operands from the operation, unlike the regular RewritePattern, specifically for type conversion purposes). Since the type converter is converting away from !type, the infrastructure interprets it as !type not being legal. Therefore, it calls the target materialization hook to produce an operand of a legal type. In LLVMTypeConverter, the hook produces the cast operation.
/// Incomplete IR state
%0 = "b.op"() : () -> !type
%1 = unrealized_conversion_cast %0 : !type to !llvm.i32
/// ...
As with all operations produced in the conversion process, the infrastructure checks whether the newly created operation is legal on the conversion target. In this case, we must have declared it legal (or dynamically legal with some condition). If it is not legal, the infrastructure will attempt to apply foldings and patterns to make it legal, and fail entirely if it can’t.
The conversion pattern is then applied and the resulting IR is as follows.
We now apply a conversion from B to LLVM, with the same type converter as before. There are no operands for “b.op”, so the pattern can be applied directly. The temporary IR state is as follows.
The infrastructure detects that the conversion pattern changed the type of the result: %0_new and %0 have different types. And it is likely that the users of %0 may become invalid if they are given a value of !llvm.i32 type instead of !type. The infrastructure has no way of knowing whether they accept !llvm.i32 or not. (Incidentally, the previous setup in std-to-llvm would force the operand type change, some leading to incorrect code). Therefore, it calls the source materialization hook. In LLVMTypeConverter, the hook produces another cast operation.
Now %0_new_casted can be used in place of %0 if necessary. As before, the infrastructure needs to make sure all produced operations are legal. The operation produced in this pass is the one defining %0_new_casted. There are three options, described in the commit message and my post above, but let me repeat again.
unrealized_conversion_cast is legal - nothing happens, legal operations are not modified, the operations persist in the IR.
unrealized_conevrsion_cast is illegal - the infrastructure tries to fold it, the folding does not apply on %0_new_casted, there are no patterns specified for it, the operation cannot be removed or legalized, the conversion process fails entirely.
unrealized_conversion_cast is dynamically legal - it is unclear what condition it should be. When the operation it produced, it may not have any users yet! So the condition of “all users are backward casts” may apply spuriously. We would need a way for the dynamic legality check to look into the guts of the conversion infrastructure to see pending value replacements. I doubt this is desirable, and the complexity of the check will be huge. Alternatively, unrealized_conversion_cast may be conferred a special status so that is “understood” and handled by the infrastructure itself, but that would go against the “as little built-in as possible” principle.
Furthermore, if one decides to run these two passes in the opposite order, the folding will apply! Relying only on folding would create an extremely fragile pass ordering requirement that needs a profoundly deep understanding of how the conversion infrastructure works, which, as my repeated explanations can suggest, shouldn’t be necessary to just run a pass.
It is likely possible to construct an example with three dialects where the folding would never apply.
Upd: when looking for a circular example, I realized that the folding will not apply in the example above if we swap the pass order. The IR between two passes is as follows.
Since the cast operation is already there in the IR, the infrastructure will try processing it. Same three options are available.
unrealized_conversion_cast is legal, nothing will happen, the folding will not be called when the second operation is created because the created operation itself is legal.
unrealized_convresion_cast is illegal, the conversion will fail immediately on %1.
unrealized_conversion_cast is dynamically legal. The legality check will have to be based on whether the operation is present in the input IR (legal) or is being added by a materialization hook (illegal and should be folded). This is currently impossible.
I can see some merit of @bondhugula’s suggestion here: we may want fold newly created operations regardless of them being legal. This wouldn’t help with the original pass order though.
Given that I myself seem to not understand all the details of conversion when reasoning “on paper” rather than running code, I would claim that we absolutely don’t want pass ordering requirements to be dependent on this. And giving the users a “get rid of these undesired ops” pass is the least complex way.
An alternative way to think about it is so see all -*-to-llvm passes as -*-to-llvm-and-builtin, which is technically true. The -reconcile-unrealized-casts is then merely -builtin-to-llvm, which is how it originally started before I realized that it was not specific to LLVM.
I don’t mind reverting if given specific concerns applicable to the patch in question. Otherwise, reverting it will not address those concerns and will incur even more churn on everybody.
In particular, I have seen the following:
We don’t want unrealized_conversion_cast to “leak” across passes. This was already happening before this patch. The objection is to either splitting the std or unrealized cast itself. In the worst case, it could apply to the first split-out of std-to-llvm, probably memref-to-llvm (which incidentally also changed the same tests), but it landed a couple of months ago with no reaction whatsoever.
The conversion should run the folding hook differently that it does. The patch in question does not change how the conversion infrastructure works.
There is another way achieve the same behavior. I am yet so see a concrete proposal how, preferably with code. I have repeatedly demonstrated that “it should be possible with local folding” is incorrect beyond a purely theoretical argument and is based on a misunderstanding of how conversions work.
The only concern that applies to the patch in question is that it landed too fast for somebody’s taste. I would actually be happy if all reviews were faster.
There is another way to reason about this change in context of dialect layering if we drop the assumption that unrealized_conversion_cast is special. It is just an op in the “builtin” dialect. All -convert-X-to-llvm passes are, in fact, -convert-X-to-llvm-and-builtin because they produce operations from llvm and builtin dialects. We need to further lower those builtin operations to the LLVM dialect. This is can be done by an appropriate pass, similarly to any other conversion. This pass is just, perhaps misleadingly, called -reconcile-unrealized-casts whereas it should have been called -convert-builtin-to-llvm. Ironically, this was my initial take on this as River noticed.
Therefore, I propose the following:
create a -convert-builtin-to-llvm that removes the cast chains specifically between built-in and LLVM dialect types, and leaves the other casts alone if they are present;
turn -reconcile-conversion-casts into a test pass for the pattern + folding and use that as testbed to further improve the conversion infrastructure if necessary.
Should we have a poll? I find “all or nothing” conversion passes to be impossibly hard to work with, and this change is a huge quality of life improvement!
I’d like to nuance this: there is a limit to the hyper configurability of what we offer. The more your add, the harded it becomes to keep something as a consistent ensemble and get a system that is understandable, debuggable, and just usable.
In particular, to make something configurable I’d rather have have a strong justification. Sean wrote “Having it randomly fold arbitrary other ops in the program just because it can would be something I extremely strongly oppose” ; which is fair but we’re not talking about arbitrary folds here, we’re talking about the conversion infrastructure introducing cast and reconciling them as it goes: this is very specific.
So, while I’d like to see the infrastructure improve on the handling of casts, I’d also like to mention that I agree with @ftynse that this is all unrelated to this patch, which really is just splitting a feature outside of -std-to-llvm, which makes conceptually sense to me considering how the system works today.
I totally agree with the nuanced view here!
There are two details that look important to me in context of the conversion infra (it’s us not the infra who add the casts, the infra would probably need to keep state across passes), but I think this discussion deserves a separate thread or even an ODM for better visibility.
Yep. Totally agreed. If it is possible for the conversion infra to magically do the right folding just for inserted type materializations, that sounds interesting. However, it seems that is nontrivial (highly so, actually) because:
the materializations can insert arbitrary ops, so knowning “what is a materialization” that should be a candidate for special folding is hard (impossible?). I don’t think we want to special case unrealized_conversion_cast here – that op was simply intended as a convenience if a more specific ops wasn’t present, not as “the one true dialect conversion cast op everybody must use, or else the infra doesn’t work properly”.
As @ftynse mentioned, given how the conversion infra works, it seems difficult simply from a mechanical perspective to know when / to what ops to apply this special folding.
As @ftynse mentioned, the way that this all works is actually very complex, and I’m not seeing a simple, easy-for-users-to-understand way to trigger the folding. Saying “always add -reconcile-unrealized-casts to the end of your LLVM conversion pipeline” is easy and in line with what users already expect from bufferization and best practices for type conversions in general.
I like this reasoning. I think it even more clearly argues for the direction in the patch.
Are there any specific concerns for the direction in this patch? I think we have sufficiently established that there is no straightforward way for these folds to happen within the individual -*-to-llvm passes themselves. Without a concrete suggestion for how to do that (ideally a patch), the whole conversation about removing -reconcile-unrealized-casts seems to be a non-starter. (I consider splitting -reconcile-unrealized-casts out from -std-to-llvm to be clearly desirable – we shouldn’t privilege -std-to-llvm and longer-term there is no -std-to-llvm pass at all ).
+1 I am much more concerned about reverting the patch than keeping it: this is a net improvement given the status of dialect splitting and my previous argument.
One desirable evolution I’d see coming out of this would be to:
rewire -reconcile-unrealized-casts into a new convert-to-X pass that adds the unrealized casts by default.
dynamically traverse all the registered dialects Y and populate conversion patterns with the X-to-Y
keep the convert-X-to-Y for testing and one-off purposes but document that they are not the preferred way to connect things e2e.
This could work via a dialect interface that connects to populateXToLLVMPatterns with two caveats:
Either the conversion must be final/full, i.e. there are no type casts possible as a result, with the expectation that no further conversions to LLVM happen, or the pass cannot guarantee that the conversion is complete. Otherwise we run into the same issue of specifying the legality condition for the cast.
Only the last stage of the conversion will happen. That is, if the input contains linalg.generic or affine.for, we will have to first convert those to SCF, then convert SCF to branches and only then we get something that is convertible to LLVM.
(1) doesn’t look very bad given that conversions to LLVM usually are final and the mechanism you propose would also work for out-of-tree dialects. (2) is slightly more complex. We can say that dialects implementing the hypothetical “ConvertToLLVMInterface” should also provide patterns for lowering to and from all the intermediate stages, but this puts us back in the land of pattern sets subsuming each other with the potential of having duplicated or, worse, conflicting patterns.
I think such a super-conversion pass can be provided as opt-in functionality if desired but I wouldn’t go as far as making “convert-X-to-LLVM” a test pass only.
Let me make things more concrete. You don’t need to run folding on all unrelated ops.
At the end of applyPartialConversion or applyFullConversion, collect all “unrealized conversion cast” ops.
Run the following method with an empty patterns list.
/// Applies the specified rewrite patterns on `ops` alone while also trying to
/// fold these ops as well any other ops that were in turn created due to such
/// rewrites. Furthermore, any pre-existing ops in the IR outside of `ops`
/// remain completely untouched if `strict` is set to true. If `strict` is
/// false, other operations that use results of rewritten ops or supply operands
/// to such ops are in turn simplified; any other ops still remain untouched
/// (i.e., regardless of `strict`). Note that ops in `ops` could be erased as a
/// result of folding, becoming dead, or via pattern rewrites. If more far
/// reaching simplification is desired, applyPatternsAndFoldGreedily should be
/// used. Returns true if at all any IR was rewritten.
bool applyOpPatternsAndFold(ArrayRef<Operation *> ops,
const FrozenRewritePatternSet &patterns,
bool strict);``
This will fold only the unrealized conversion cast ops (until fixed point) and only the unrealized conversion cast ops. This is just a naive solution but it does all the necessary folding and cleanup right away towards the end of the conversion.
@_sean_silva But there is a whole playground of solutions you are overlooking here – we don’t have to fold unrelated ops. In fact, I prefer having the option of not running folding on unrelated ops (even with the usual rewrite driver) and this is already possible with two of the rewriter driver API methods. The greedy rewrite driver is not the same asapplyPatternsAndFoldGreedily – there are two more methods that provide very controlled behavior (see the second one below that actually uses a worklist and iterates until fixed point). So, if you want to do folding on ops of a specific derived OpTy, that’s possible today – although it may not be the most efficient way if it has to be done in a fully integrated manner in the conversion infra. Because you’ll have to make a pass to collect ops of interest.
/// Applies the specified patterns on `op` alone while also trying to fold it,
/// by selecting the highest benefits patterns in a greedy manner. Returns
/// success if no more patterns can be matched. `erased` is set to true if `op`
/// was folded away or erased as a result of becoming dead. Note: This does not
/// apply any patterns recursively to the regions of `op`.
LogicalResult applyOpPatternsAndFold(Operation *op,
const FrozenRewritePatternSet &patterns,
bool *erased = nullptr);
/// Applies the specified rewrite patterns on `ops` alone while also trying to
/// fold these ops as well any other ops that were in turn created due to such
/// rewrites. Furthermore, any pre-existing ops in the IR outside of `ops`
/// remain completely untouched if `strict` is set to true. If `strict` is
/// false, other operations that use results of rewritten ops or supply operands
/// to such ops are in turn simplified; any other ops still remain untouched
/// (i.e., regardless of `strict`). Note that ops in `ops` could be erased as a
/// result of folding, becoming dead, or via pattern rewrites. If more far
/// reaching simplification is desired, applyPatternsAndFoldGreedily should be
/// used. Returns true if at all any IR was rewritten.
bool applyOpPatternsAndFold(ArrayRef<Operation *> ops,
const FrozenRewritePatternSet &patterns,
bool strict);
Just run the second method with an empty patterns list if all you need to do is in-place updates or fold’s on a given list of ops.
I’m not sure anyone is arguing in favor of “all or nothing”. If there are conversion casts that will spill out given the conversions that are possible, it’s good for the developer to be able to see those. (Some passes for eg. even use an option: like -xla-legalize-tf='allow-partial-conversion' to allow both.) The debate here is about where and how early you reconcile those unrealized conversion cast ops. I was arguing that we should not be delaying whatever is reconcilable and explore performing those as part of the conversion pass transparently. The extra pass added would be a “no op” then.
This is a good idea! There are several conceptual issues with the naive implementation though:
It would break the contracts that the dialect conversion currently offers: operations declared legal will not be modified; if a full conversion succeeded, the resulting IR only contains legal operations (it is possible, although unlikely in practice, to have an op that is dynamically legal only if its operands are produced by a cast, removing casts makes it illegal).
The conversion infra will have to be made aware of unrealized_conversion_cast to be able to treat it specially. This is the kind of coupling we try to avoid.
There is still no mechanism to fail the conversion unless all relevant casts were removed. We can traverse the IR one more time to check that, but it sounds like too much overhead.
There is a cost to having this feature. We don’t want all conversion passes, including those that don’t convert types, to do one or two more sweeps of the IR. We could have a flag for this behavior, but there are concerns about having too much configurability.
The first two issues boil down to whether we make unrealized_conversion_cast a special op known to the infrastructure that is processed differently from everything else. If we do, we should also make the infrastructure produce it automatically rather than require its users to set up materialization hooks to produce specifically this operation. I like the usage simplification this might bring as long as the infrastructure can operate smoothly with it, exposing the complexity of special handling for a specific op on top of the already significant complexity of the dialect conversion is worrying. There are also users of conversion, e.g., bufferization, that use more semantics-bearing casts in a similar pass structure. An argument can be made that those users can pay the complexity costs for the feature they need, but not everybody should.
I assume Uday’s proposal was to add such a call to individual passes: after doing dialect conversion, each pass could clean up after itself by folding away unrealized_conversion_cast ops that may have been created. I think it would still be valuable to have a pass at the end that of any lowering that has a well defined target that should be reached that reports errors if illegal ops still exist (see the patch I linked earlier and could dust off again if there’s interest). Whether each pass should do its best to clean up after itself is something I could go either way on. It seems like it could have bad performance implications (since it would need to walk the whole IR again), which may not be worth it.
No. My proposal was of making it transparent to the individual passes. So it happens inside of apply*Conversion (and outside only if one has custom type casts or perhaps other interactions - even there, there are other solutions with interfaces if needed). I think it’s pretty premature to think of this approach as adding an inefficiency without thinking through further starting from the naive approach of controlled folding on the “desired ops”. There’s really a large playground of solutions that I think goes in the right direction and I’d claim would be more efficient and directed – it’s only doing the necessary folding and that too early. To repeat, we don’t need to be handling this folding of the conversion casts in a separate pass because: (A) they logically fit into the conversion, and (B) trying to separate out such things leads eventually to the risk of introducing a subtle phase ordering issue in the event the folding of the unrealized conversion casts has itself an interaction with the conversion pass that it should have been part of.
It does look like everyone here missed the existence of the applyOp* rewriter API methods and prematurely concluded the folds couldn’t be done in a controlled way on only the desired ops or equated the rewrite driver to “either fold everything everywhere or nothing”.