Request for advice: extending type conversion to 1:N conversion

I am in a situation where I think a generic 1:N type conversion utility would be an elegant solution, so I have looked into extending upstream for these cases. Before I spend more time, I’d like to gather input.

Problem

I have a composed type in my dialect that I want to decompose into its constituent parts. For the purpose of this discussion, we can pretend that this type is tuple. Then “decomposing” a tuple<i32, i32> consists of replacing it with two i32s.

Let’s say we have undef, insertvalue, and extractvalue ops for tuple. The “decomposition” of those then essentially consists of forwarding SSA values:

func.func @testUndefInsertExtract(%value: i32) -> i32 {
  %undef = tuple.undef : tuple<i32>
  %inserted = tuple.insertvalue %value into %undef[0] : tuple<i32>
  %extracted = tuple.extractvalue %inserted[0] : tuple<i32>
  return %extracted : i32
}

Turns into:

func.func @testUndefInsertExtract(%value: i32) -> i32 {
  return %value : i32
}

The above example requires patterns specific to the tuple type and its undef, insertvalue, and extractvalue ops. Further examples include “decompositions” across ops from func and scf. For example, the following passes a tuple into and out of a function call:

func.func @testFuncCall(%arg : i32) -> (i32, i32) {
  %undef_outer = tuple.undef : tuple<i32, tuple<i32>>
  %undef_inner = tuple.undef : tuple<i32>
  %inserted_inner = tuple.insertvalue %arg into %undef_inner[0] : tuple<i32>
  %inserted_outer_one = tuple.insertvalue %arg into %undef_outer[0] : tuple<i32, tuple<i32>>
  %inserted_outer_two = tuple.insertvalue %inserted_inner into %inserted_outer_one[1] : tuple<i32, tuple<i32>>
  %call_result = func.call @testFuncFunc(%inserted_outer_two) : (tuple<i32, tuple<i32>>) -> tuple<i32, tuple<i32>>
  %extracted_value_one = tuple.extractvalue %call_result[0] : tuple<i32, tuple<i32>>
  %extracted_inner = tuple.extractvalue %call_result[1] : tuple<i32, tuple<i32>>
  %extracted_value_two = tuple.extractvalue %extracted_inner[0] : tuple<i32>
  return %extracted_value_one, %extracted_value_two : i32, i32
}

Through “decomposition” (which here also includes converting the enclosing func.func as well as the func.call and func.return ops), this turns into:

func.func @testFuncCall(%arg0: i32) -> (i32, i32) {
  %0:2 = call @testFuncFunc(%arg0, %arg0) : (i32, i32) -> (i32, i32)
  return %0#0, %0#1 : i32, i32
}

Note that populateFunctionOpInterfaceTypeConversionPattern, populateCallOpTypeConversionPattern, populateSCFStructuralTypeConversionsAndLegality, and similar helpers exist to apply type conversions to some of the ops in func, scf, etc., but AFAIK only the last of these handles 1:N conversions (and only since very recently).
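
For context, such a pass today wires these helpers up roughly as follows. This is only a sketch of the existing 1:1 flow, assuming a module and a typeConverter that are set up elsewhere; runDecomposition is just an illustrative name, and include paths may vary across MLIR versions:

#include "mlir/Dialect/Func/IR/FuncOps.h"
#include "mlir/Dialect/Func/Transforms/FuncConversions.h"
#include "mlir/Dialect/SCF/Transforms/Patterns.h"
#include "mlir/IR/BuiltinOps.h"
#include "mlir/Transforms/DialectConversion.h"

using namespace mlir;

// Sketch of a conversion driver using the existing (1:1) helpers.
static LogicalResult runDecomposition(ModuleOp module,
                                      TypeConverter &typeConverter) {
  MLIRContext *ctx = module.getContext();
  RewritePatternSet patterns(ctx);
  ConversionTarget target(*ctx);

  populateFunctionOpInterfaceTypeConversionPattern<func::FuncOp>(patterns,
                                                                 typeConverter);
  populateCallOpTypeConversionPattern(patterns, typeConverter);
  scf::populateSCFStructuralTypeConversionsAndLegality(typeConverter, patterns,
                                                       target);
  // Legality rules for func.func/func.call/func.return omitted for brevity.

  return applyPartialConversion(module, target, std::move(patterns));
}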

Current support

While implementing a pass that does the above manually, I formed the following impression of the support for 1:N conversions in the current API:

Existing support

  1. TypeConverter::addConversion with a conversion callback of the form std::optional<LogicalResult>(T, SmallVectorImpl<Type> &) supports 1:N conversions (see the sketch below).
  2. materialize(Argument|Source|Target)Conversion(OpBuilder &builder, Location loc, Type resultType, ValueRange inputs) takes N input values for a single result type and thus covers the N:1 materializations that 1:N conversions need (but that isn’t enough, see below).
  3. SignatureConversion allows expressing a series of 1:N conversions; these can be created with convertSignatureArg[s] and similar and then applied to blocks and regions with applySignatureConversion and similar (though not in all cases, as a comment indicates).
  4. populateSCFStructuralTypeConversionsAndLegality supports 1:N type conversions for SCF ops since very recently (though with hand-rolled workarounds for the missing features listed below).

NB: The only place (apart from a few unit tests) where the above is used seems to be the SparseTensor dialect. I have added asserts in addConversion that fail on 1:N conversions, and only those tests start breaking (under check-mlir).
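
To make item 1 above concrete, here is a minimal sketch of registering such a 1:N conversion for the tuple type from the examples. It uses the builtin TupleType as a stand-in for whatever composed type the dialect actually defines, and setupTupleDecomposition is just an illustrative name:

#include <optional>

#include "mlir/IR/BuiltinTypes.h"
#include "mlir/Transforms/DialectConversion.h"

using namespace mlir;

// Minimal sketch: a 1:N rule that replaces tuple<T1, ..., Tn> with T1, ..., Tn.
// Nested tuples would need recursive handling, which is omitted here.
static void setupTupleDecomposition(TypeConverter &typeConverter) {
  // Fallback: leave all other types unchanged.
  typeConverter.addConversion([](Type type) { return type; });
  // The 1:N conversion callback from item 1 above.
  typeConverter.addConversion(
      [](TupleType type, SmallVectorImpl<Type> &results)
          -> std::optional<LogicalResult> {
        for (Type element : type.getTypes())
          results.push_back(element);
        return success();
      });
}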

Missing support

  1. TypeConverter::materializeTargetConversion lacks an overload for N:M (or at least N:1) conversions, i.e., SmallVector<Value> materializeTargetConversion(OpBuilder &builder, Location loc, TypeRange resultTypes, ValueRange inputs) (see the sketch after this list). I believe that this is required to implement the target materialization of a 1:N conversion, where the “target” types are the N types. I think of it as the “reverse” of the corresponding source materialization, so if the latter is 1:N, the former has to be N:1.
  2. (Op)?ConversionPattern::matchAndRewrite(Operation *op, ArrayRef<Value> operands, ConversionPatternRewriter &rewriter) hard-codes 1:1 type conversions in its signature: each entry in operands corresponds to the single value resulting from the conversion of the original operand type at the same index.
  3. OpConversionPattern::(rewrite|matchAndRewrite)(SourceOp op, OpAdaptor adaptor, ConversionPatternRewriter &rewriter) hard-codes 1:1 type conversions via the generated adaptors, whose accessors each return exactly one (converted) value.
  4. ConversionPatternRewriterImpl::remapValues simply does not handle 1:N conversions (because existing patterns rely on that behavior, as a comment indicates).
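
For illustration, the additions implied by items 1 and 2 might look roughly like the following. None of this exists upstream, and the exact shapes are up for discussion; the declarations below are purely hypothetical sketches of what could be added to TypeConverter and ConversionPattern, respectively:

// Hypothetical addition to TypeConverter (item 1): an N:M target
// materialization, i.e., N input values materialized into M result types.
SmallVector<Value> materializeTargetConversion(OpBuilder &builder, Location loc,
                                               TypeRange resultTypes,
                                               ValueRange inputs);

// Hypothetical addition to ConversionPattern (item 2): a matchAndRewrite
// overload in which each original operand maps to the N values produced by
// its type conversion (one range of values per original operand).
LogicalResult matchAndRewrite(Operation *op, ArrayRef<ValueRange> operands,
                              ConversionPatternRewriter &rewriter);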

Questions

  1. Is the above analysis of the current state accurate and complete? If not, what is wrong or missing?
  2. What is the current plan for extending the support for 1:N type conversions? Have there been any recent discussions on the topic?
  3. Is anybody actively working on this topic or planning to do so?
  4. What are possible/preferred solutions to the missing features above?

I could spend some time working on this topic if I get some guidance. After collecting some ideas and advice here, I can work on an RFC.

@mogball @river707 @ftynse @mehdi_amini @nicolasvasilache @PeimingLiu @aartbik

3 Likes

I have recently been in your exact situation. I have a handful of structural types that break apart into individual SSA values when they are lowered. Implementing this lowering was a lot of incredibly manual work. For most operations that naturally take variadic operands and results (like scf.for, func.func, scf.if, etc.), it is easy to implement the in-place expansion as a single rewrite pattern. I can generalize that pattern and upstream it, if you want.

The more interesting challenge is to be able to generically “transpose” operations that don’t naturally take or return variadic operands or results. For instance, suppose you have a value of type struct<tuple<i32, i64>> and you insert and extract values from it. Ideally, the transformation is:

// before
%tup = extractvalue %struct[0] : struct<tuple<i32, i64>>

// after
%tup0 = extractvalue %struct[0] : struct<i32, i64>
%tup1 = extractvalue %struct[1] : struct<i32, i64>
%tup = unrealized_conversion_cast %tup0, %tup1 : (i32, i64) to tuple<i32, i64>

Or, for example, ptr<tuple<i32, i64>>:

// before
%tup = load %ptr : ptr<tuple<i32, i64>>

// after
%tup0Ptr = getelementptr %ptr[0, 0] : ptr<struct<i32, i64>>
%tup0 = load %tup0Ptr : ptr<i32>
%tup1Ptr = getelementptr %ptr[0, 1] : ptr<struct<i32, i64>>
%tup1 = load %tup1Ptr : ptr<i64>
%tup = unrealized_conversion_cast %tup0, %tup1 : (i32, i64) to tuple<i32, i64>

The problem is that I don’t think this is something that can be implemented generically for all operations, at least not without those operations defining an interface. Changing the conversion patterns, which, as you said, hard-code 1:1 conversions, would pervasively require all conversion patterns to implement a “transpose” where necessary. Unless we intend to make this a core MLIR concept, I think this is quite onerous. In my case, I have a closed set of operations, so I can hardcode the transpose transforms for all of them. It’s not pretty, but I would love to discuss this more and toss ideas around.

Overall, my conversion setup for this is:

  • A set of patterns that explicitly handle operations on certain aggregate/composed types (pointers, structs, arrays, etc.), with pattern benefit 2.
  • A single catch-all pattern that hopefully handles all other operations, with benefit 1 (see the sketch after this list).
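
Concretely, the registration for that setup might look roughly like this. The pattern class names are made-up placeholders for the dialect-specific patterns (assumed to be OpConversionPatterns), not existing code:

// Sketch of the two-tier registration described above.
void populateAggregateDecompositionPatterns(mlir::TypeConverter &typeConverter,
                                            mlir::RewritePatternSet &patterns) {
  // Op-specific patterns for the aggregate/composed types, tried first.
  patterns.add<DecomposeExtractValuePattern, DecomposeInsertValuePattern,
               DecomposeLoadPattern>(typeConverter, patterns.getContext(),
                                     /*benefit=*/2);
  // Generic catch-all pattern for every other operation.
  patterns.add<GenericDecompositionPattern>(typeConverter,
                                            patterns.getContext(),
                                            /*benefit=*/1);
}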

I would love to be able to do this generically/automatically too.

2 Likes

Ah, interesting! Your situation indeed sounds quite similar.

Indeed, if there were patterns for func.func and related ops upstream, that would already help me quite a bit. I guess those would live in populateFunctionOpInterfaceTypeConversionPattern and populateCallOpTypeConversionPattern? (Note that the scf ops now have patterns available through populateSCFStructuralTypeConversionsAndLegality.)

I have not thought about a full generalization of the “transposition” of ops yet. It sounds like this requires quite a bit of op-specific logic, which seems difficult to generalize.

I think that the infrastructure for writing such patterns in the presence of 1:N type conversions could be improved, though. I was thinking about implementing some of the “missing features” above. For example, there could be another overload of ConversionPattern::matchAndRewrite for 1:N conversions, maybe with an ArrayRef<ArrayRef<Value>> operands argument instead of the flat ArrayRef<Value> operands, or with an additional SignatureConversion &conversion argument. That would then be populated with the N converted values for each original operand, which is something that I am currently doing manually (and I guess you are, too?).

Yes, I am manually materializing 1:N (and the opposite N:1) conversions. I think we can add overloads of matchAndRewrite or of the pattern itself, as you suggested, although I’d be curious to know exactly what the missing pieces are. I can try to get a patch up for the general pattern today.
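
For concreteness, that manual materialization typically boils down to inserting unrealized_conversion_cast ops by hand. A sketch of the two directions follows; the helper names are made up, and nothing like this exists as an upstream utility:

#include "mlir/IR/BuiltinOps.h"
#include "mlir/Transforms/DialectConversion.h"

using namespace mlir;

// Unpack: cast one value of the original (composed) type into the N values of
// its decomposed types.
static ValueRange unpackValue(ConversionPatternRewriter &rewriter, Location loc,
                              TypeRange decomposedTypes, Value packedValue) {
  return rewriter
      .create<UnrealizedConversionCastOp>(loc, decomposedTypes, packedValue)
      .getResults();
}

// Pack: cast the N decomposed values back into one value of the original type.
static Value packValues(ConversionPatternRewriter &rewriter, Location loc,
                        Type originalType, ValueRange decomposedValues) {
  SmallVector<Type, 1> resultTypes = {originalType};
  return rewriter
      .create<UnrealizedConversionCastOp>(loc, resultTypes, decomposedValues)
      .getResult(0);
}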

I thought a lot about this about 1.5 years ago but never acted on it (there was no need at the time, among other factors). Realistically, we can’t uproot the underlying 1:1 API of the rewriter; there isn’t a good modeling I’ve seen that doesn’t get really awkward/weird. What should happen, though, and never did, is adding sugar on top of packing/unpacking 1:N conversions. The rewriter could easily manage the insertion and legalization of these, but it just never grew to handle it. IMO we can handle this quite succinctly like the other conversions, i.e., by inserting/unwrapping unrealized_conversion_casts, which can get legalized at the end as necessary. All 1:N patterns I’ve seen (or written) follow the same pack/unpack flow, and this is something that should be taken care of either automatically or explicitly (but trivially) by the user. These patterns are, IMO, a different class from normal patterns, and I don’t see a reason not to treat them differently in the API.

– River

For completeness, the facilities around populateDecomposeCallGraphTypesPatterns should also be mentioned. They seem to implement 1:N decompositions for func.call, func.func, and func.return using a “ValueDecomposer”.

Another small bit of information: the TensorFlow runtime implements its own mechanism for 1:N type conversions here.

@River: Can you elaborate on what is awkward/weird about extending the rewriter to 1:N conversions, and on why 1:N conversion patterns are fundamentally different from other conversion patterns?

TL;DR: I think that there is a natural extension of the existing type and dialect conversion to 1:N type conversions, in which 1:N patterns essentially work the same way as 1:1 patterns. The lack of support for 1:N conversions currently forces pattern writers to add unpack/pack logic, which I think really amounts to source/target/argument materializations that the dialect conversion could add automatically where necessary, the same way it does for 1:1 conversions.

To illustrate my view, let me add another example to the discussion, as I think the initial tuple example doesn’t cover all interesting aspects. Let’s say we want to decompose complex<T> into two Ts, one for the real part and one for the imaginary part, and convert math from the complex dialect into arith-based math on those parts. (That is basically what the ComplexToStandard transform does.)

I find it natural to think of this as a conversion. For example:

%0 = ...
%1 = complex.neg %0: complex<f32>
// uses of %1

turns into

%0 = ... // re
%1 = ... // im
%2 = arith.negf %0 : f32
%3 = arith.negf %1 : f32
// uses of %2 and %3

Currently, the patterns in ComplexToStandard indeed consist of unpack/pack logic (namely via complex.re + complex.im and complex.create). In a sequence of complex ops, this results in repeated packing and re-unpacking. For example:

%0 = ...
%1 = complex.neg %0: complex<f32>
%2 = complex.add %1, %0: complex<f32>
%3 = complex.neg %2 : complex<f32>
// uses of %3

turns into

%0 = ...
%1 = complex.re %0 : complex<f32>            // unpack
%2 = complex.im %0 : complex<f32>            // unpack
%3 = arith.negf %1 : f32
%4 = arith.negf %2 : f32
%5 = complex.create %3, %4 : complex<f32>    // pack
%6 = complex.re %5 : complex<f32>            // unpack
%7 = complex.re %0 : complex<f32>            // unpack
%8 = arith.addf %6, %7 : f32
%9 = complex.im %5 : complex<f32>            // unpack
%10 = complex.im %0 : complex<f32>           // unpack
%11 = arith.addf %9, %10 : f32
%12 = complex.create %8, %11 : complex<f32>  // pack
%13 = complex.re %12 : complex<f32>          // unpack
%14 = complex.im %12 : complex<f32>          // unpack
%15 = arith.negf %13 : f32
%16 = arith.negf %14 : f32
%17 = complex.create %15, %16 : complex<f32> // pack
// uses of %17
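
For reference, the pack/unpack structure is visible in the shape of the patterns themselves. The following is a simplified sketch in the spirit of the ComplexToStandard lowering of complex.neg, not the exact upstream code; NegOpLowering is just an illustrative name:

#include "mlir/Dialect/Arith/IR/Arith.h"
#include "mlir/Dialect/Complex/IR/Complex.h"
#include "mlir/IR/BuiltinTypes.h"
#include "mlir/Transforms/DialectConversion.h"

using namespace mlir;

// Simplified sketch: unpack the operand, compute on the parts, pack the result.
struct NegOpLowering : public OpConversionPattern<complex::NegOp> {
  using OpConversionPattern::OpConversionPattern;

  LogicalResult
  matchAndRewrite(complex::NegOp op, OpAdaptor adaptor,
                  ConversionPatternRewriter &rewriter) const override {
    Location loc = op.getLoc();
    Type elementType =
        cast<ComplexType>(adaptor.getComplex().getType()).getElementType();
    // Unpack the operand into its real and imaginary parts.
    Value real =
        rewriter.create<complex::ReOp>(loc, elementType, adaptor.getComplex());
    Value imag =
        rewriter.create<complex::ImOp>(loc, elementType, adaptor.getComplex());
    // Compute on the parts.
    Value negReal = rewriter.create<arith::NegFOp>(loc, real);
    Value negImag = rewriter.create<arith::NegFOp>(loc, imag);
    // Pack the parts back into a complex value.
    rewriter.replaceOpWithNewOp<complex::CreateOp>(op, op.getType(), negReal,
                                                   negImag);
    return success();
  }
};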

Arguably, what we really want is this:

%0 = ... // re
%1 = ... // im
%2 = arith.negf %0 : f32
%3 = arith.negf %1 : f32
%4 = arith.addf %2, %0 : f32
%5 = arith.addf %3, %1 : f32
%6 = arith.negf %4 : f32
%7 = arith.negf %5 : f32
// uses of %6 and %7

We can, of course, get there by adding -canonicalize -cse, but I am wondering: why don’t we produce this directly? To me, it looks like we are forcing (manual!) materializations between every two ops. Why? After all, we could do 1:1 type conversions with source and target materializations between every two ops as well, but we don’t!

Note that materializations also make sense with 1:N type conversions, and I see them taking the same role as the one I understand they have for 1:1 type conversions. Say we have the following function:

func.func @complex_neg(%arg: complex<f32>) -> complex<f32> {
  %neg = complex.neg %arg: complex<f32>
  return %neg : complex<f32>
}

If we only change the types in the func ops, we get:

func.func @complex_neg(%arg0: f32, %arg1: f32) -> (f32, f32) {
  %0 = complex.create %arg0, %arg1 : complex<f32>  // "argument materialization"
  %1 = complex.neg %0: complex<f32>
  %2 = complex.re %1 : complex<f32>                // "target materialization"
  %3 = complex.im %1 : complex<f32>                // "target materialization"
  return %2, %3 : f32, f32
}

If we only convert the complex ops, we get:

func.func @complex_neg(%arg0: complex<f32>) -> complex<f32> {
  %0 = complex.re %arg0 : complex<f32>         // "target materialization"
  %1 = complex.im %arg0 : complex<f32>         // "target materialization"
  %2 = arith.negf %0 : f32
  %3 = arith.negf %1 : f32
  %4 = complex.create %2, %3 : complex<f32>    // "source materialization"
  return %4 : complex<f32>
}

If we convert both, we get:

func.func @complex_neg(%arg0: f32, %arg1: f32) -> (f32, f32) {
  %0 = arith.negf %arg0 : f32
  %1 = arith.negf %arg1 : f32
  return %0, %1 : f32, f32
}

Note that no conversion or materialization whatsoever is required in the last case.

If conversion patterns were extended to 1:N type conversions, they’d be relieved of the responsibility to pack and unpack; this would remove the need for canonicalizations and foldings to get rid of the packs/unpacks, and it might even remove the need to define pack/unpack ops altogether.

2 Likes

Another instance of 1:N type conversion: IREE used to use a TypeExpander in their signature expansion pass.

1 Like

For posterity: I have implemented much of the above in D144469 and D147225.

1 Like

I believe that this or something similar is happening here.