Request for advice: extending type conversion to 1:N conversion

I am in a situation where I think a generic 1:N type conversion utility would be an elegant solution, so I have looked into extending the upstream infrastructure to cover these cases. Before I spend more time on it, I’d like to gather input.

Problem

I have a composed type in my dialect that I want to decompose into its constituent parts. For the purposes of this discussion, we can pretend that this type is tuple. “Decomposing” a tuple<i32,i32> then consists of replacing it with two i32s.

Let’s say we have undef, insertvalue, and extractvalue ops for tuple; the “decomposition” of those then essentially consists of forwarding SSA values:

func.func @testUndefInsertExtract(%value: i32) -> i32 {
  %undef = tuple.undef : tuple<i32>
  %inserted = tuple.insertvalue %value into %undef[0] : tuple<i32>
  %extracted = tuple.extractvalue %inserted[0] : tuple<i32>
  return %extracted : i32
}

Turns into:

func.func @testUndefInsertExtract(%value: i32) -> i32 {
  return %value : i32
}

The above example requires some patterns specific to tuple and its undef, insertvalue, and extractvalue ops. Further examples include “decompositions” across ops of func and scf. For example, the following passes a tuple into and out of a function call:

func.func @testFuncCall(%arg : i32) -> (i32, i32) {
  %undef_outer = tuple.undef : tuple<i32, tuple<i32>>
  %undef_inner = tuple.undef : tuple<i32>
  %inserted_inner = tuple.insertvalue %arg into %undef_inner[0] : tuple<i32>
  %inserted_outer_one = tuple.insertvalue %arg into %undef_outer[0] : tuple<i32, tuple<i32>>
  %inserted_outer_two = tuple.insertvalue %inserted_inner into %inserted_outer_one[1] : tuple<i32, tuple<i32>>
  %call_result = func.call @testFuncFunc(%inserted_outer_two) : (tuple<i32, tuple<i32>>) -> tuple<i32, tuple<i32>>
  %extracted_value_one = tuple.extractvalue %call_result[0] : tuple<i32, tuple<i32>>
  %extracted_inner = tuple.extractvalue %call_result[1] : tuple<i32, tuple<i32>>
  %extracted_value_two = tuple.extractvalue %extracted_inner[0] : tuple<i32>
  return %extracted_value_one, %extracted_value_two : i32, i32
}

Through “decomposition” (including the corresponding func.func and func.return ops) this turns into:

func.func @testFuncCall(%arg0: i32) -> (i32, i32) {
  %0:2 = call @testFuncFunc(%arg0, %arg0) : (i32, i32) -> (i32, i32)
  return %0#0, %0#1 : i32, i32
}

Note that populateFunctionOpInterfaceTypeConversionPattern, populateCallOpTypeConversionPattern, populateSCFStructuralTypeConversionsAndLegality, and friends exist to apply type conversions to some of the ops in func, scf, etc., but AFAIK only the last of these handles 1:N conversions (and only since very recently).
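
For reference, here is a hedged sketch of how these entry points are typically wired up inside a pass’s runOnOperation (signatures as I recall them; includes and the legality setup on the target are omitted):

TypeConverter typeConverter; // 1:N rules registered elsewhere
ConversionTarget target(getContext());
RewritePatternSet patterns(&getContext());
// Convert function signatures, calls, and SCF ops under `typeConverter`.
populateFunctionOpInterfaceTypeConversionPattern<func::FuncOp>(patterns,
                                                               typeConverter);
populateCallOpTypeConversionPattern(patterns, typeConverter);
scf::populateSCFStructuralTypeConversionsAndLegality(typeConverter, patterns,
                                                     target);
if (failed(applyPartialConversion(getOperation(), target,
                                  std::move(patterns))))
  signalPassFailure();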

Current support

While implementing a pass that does the above manually, I formed the following impression of the current API’s support for 1:N conversions:

Existing support

  1. TypeConverter::addConversion with arguments of the form std::optional<LogicalResult>(T, SmallVectorImpl<Type> &) supports 1:N conversions (see the sketch after this list).
  2. materialize(Argument|Source|Target)Conversion(OpBuilder &builder, Location loc, Type resultType, ValueRange inputs) supports 1:N conversions (but that isn’t enough, see below).
  3. SignatureConversion makes it possible to express a series of 1:N conversions; these can be created with convertSignatureArg[s] and similar, and then applied to blocks and regions with applySignatureConversion and similar (though not in all cases, as a comment indicates).
  4. populateSCFStructuralTypeConversionsAndLegality supports 1:N type conversions for SCF ops since very recently (though with hand-rolled workarounds for the missing features listed below).
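
To make items 1 and 3 concrete, below is a minimal sketch of registering a 1:N conversion for the builtin tuple type. The flattenTupleType helper is my own assumption, not existing upstream code:

#include "mlir/IR/BuiltinTypes.h"
#include "mlir/Transforms/DialectConversion.h"
using namespace mlir;

// Hypothetical helper: recursively append the leaf element types of `type`
// (flattening nested tuples) to `results`.
static void flattenTupleType(Type type, SmallVectorImpl<Type> &results) {
  if (auto tupleType = type.dyn_cast<TupleType>()) {
    for (Type elementType : tupleType.getTypes())
      flattenTupleType(elementType, results);
    return;
  }
  results.push_back(type);
}

static void addTupleDecomposition(TypeConverter &typeConverter) {
  // Register the fallback first: conversion callbacks are tried
  // most-recently-added first.
  typeConverter.addConversion([](Type type) { return type; });
  // Item 1: a 1:N rule mapping one tuple type to its N leaf types.
  typeConverter.addConversion(
      [](TupleType tupleType, SmallVectorImpl<Type> &results)
          -> std::optional<LogicalResult> {
        flattenTupleType(tupleType, results);
        return success();
      });
}

Item 3 then lets a pattern record the same 1:N mapping for the arguments of a func-like op and apply it to the op’s body, roughly like this (a fragment, assuming `funcOp` and `rewriter` are in scope inside a conversion pattern):

TypeConverter::SignatureConversion conversion(funcOp.getNumArguments());
for (unsigned i = 0, e = funcOp.getNumArguments(); i != e; ++i)
  if (failed(typeConverter.convertSignatureArg(
          i, funcOp.getArgumentTypes()[i], conversion)))
    return failure();
rewriter.applySignatureConversion(&funcOp.getBody(), conversion,
                                  &typeConverter);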

NB: The only place (apart from a few unit tests) where the above is used seems to be the SparseTensor dialect. I have added asserts in addConversion that fail on 1:N conversions, and only those tests start breaking (under check-mlir).

Missing support

  1. TypeConverter::materializeTargetConversion lacks an overload for N:M (or at least N:1) conversions, i.e., SmallVector<Value> materializeTargetConversion(OpBuilder &builder, Location loc, TypeRange resultTypes, ValueRange inputs). I believe that this is required to implement the target materialization of a 1:N conversion, where the “target” types are the N types. I think of it as the “reverse” of the corresponding source materialization, so if the latter is 1:N, the former has to be N:1.
  2. (Op)?ConversionPattern::matchAndRewrite(Operation *op, ArrayRef<Value> operands, ConversionPatternRewriter &rewriter) hard-codes 1:1 type conversions in its signature: each entry in operands corresponds to the single type resulting from the conversion of the original type at the same index (see the workaround sketch after this list).
  3. OpConversionPattern::(rewrite|matchAndRewrite)(SourceOp op, OpAdaptor adaptor, ConversionPatternRewriter &rewriter) hard-codes 1:1 type conversions via the generated adaptors, whose accessors each return exactly one (converted) value.
  4. ConversionPatternRewriterImpl::remapValues simply does not handle 1:N conversions (existing patterns rely on the 1:1 behavior, as a comment indicates).
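
To illustrate items 2–4, here is a hedged sketch of what a pattern currently has to do by hand. The tuple dialect, its ExtractValueOp, and the accessors getTuple()/getIndex() are hypothetical stand-ins, and the index is assumed to be a flat, static integer:

// Includes omitted; UnrealizedConversionCastOp lives in the builtin dialect.
struct DecomposeExtractValue : OpConversionPattern<tuple::ExtractValueOp> {
  using OpConversionPattern::OpConversionPattern;

  LogicalResult
  matchAndRewrite(tuple::ExtractValueOp op, OpAdaptor adaptor,
                  ConversionPatternRewriter &rewriter) const override {
    // Items 2-3: the adaptor hands back exactly one remapped value per
    // original operand, so the pattern computes the N decomposed types...
    SmallVector<Type> decomposedTypes;
    if (failed(getTypeConverter()->convertType(op.getTuple().getType(),
                                               decomposedTypes)))
      return failure();
    // ...and unpacks the single value itself via a cast (item 4 is why the
    // rewriter cannot provide the N values directly).
    auto unpacked = rewriter.create<UnrealizedConversionCastOp>(
        op.getLoc(), decomposedTypes, adaptor.getTuple());
    rewriter.replaceOp(op, unpacked->getResult(op.getIndex()));
    return success();
  }
};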

Questions

  1. Is the above analysis of the current state accurate and complete? If not, what is wrong or missing?
  2. What is the current plan for extending the support for 1:N type conversions? Have there been any recent discussions on the topic?
  3. Is anybody actively working on this topic or planning to do so?
  4. What are possible/preferred solutions to the missing features above?

I could spend some time working on this topic if I get some guidance. After collecting some ideas and advice here, I can work on an RFC.

@mogball @river707 @ftynse @mehdi_amini @nicolasvasilache @PeimingLiu @aartbik

I have recently been in your exact situation. I have a handful of structural types that break apart into individual SSA values when they lower. Implementing this lowering was a lot of incredibly manual work. For most operations that naturally take variadic operands and results (like scf.for, func.func, scf.if, etc.), it is easy to implement the in-place expansion as a single rewrite pattern. I can generalize that pattern and upstream it, if you want.

The more interesting challenge is generically “transposing” operations that don’t naturally take or return variadic operands or results. For instance, suppose you have a value of type struct<tuple<i32, i64>> and you insert and extract values from it. Ideally, the transformation is:

// before
%tup = extractvalue %struct[0] : struct<tuple<i32, i64>>

// after
%tup0 = extractvalue %struct[0] : struct<i32, i64>
%tup1 = extractvalue %struct[1] : struct<i32, i64>
%tup = unrealized_conversion_cast %tup0, %tup1 : (i32, i64) to tuple<i32, i64>

Or, for example, ptr<tuple<i32, i64>>:

// before
%tup = load %ptr : ptr<tuple<i32, i64>>

// after
%tup0Ptr = getelementptr %ptr[0, 0] : ptr<struct<i32, i64>>
%tup0 = load %tup0Ptr : ptr<i32>
%tup1Ptr = getelementptr %ptr[0, 1] : ptr<struct<i32, i64>>
%tup1 = load %tup1Ptr : ptr<i64>
%tup = unrealized_conversion_cast %tup0, %tup1 : (i32, i64) to tuple<i32, i64>

The problem is that I don’t think this can be implemented generically for all operations, at least not without those operations defining an interface. Changing the conversion patterns, which as you said hard-code 1:1 conversions, would pervasively require all conversion patterns to implement a “transpose” where necessary. Unless we intend to make this a core MLIR concept, I think this is quite onerous. In my case, I have a closed set of operations, and I can hard-code the transpose transforms for all of them. It’s not pretty, but I would love to discuss this more and toss ideas around.

Overall, my conversion setup for this is:

  • A set of patterns that explicitly handle operations on certain aggregate/composed types, like pointers, structs, arrays, etc., with pattern benefit 2.
  • A single pattern with benefit 1 that, hopefully, handles all other operations (sketched below).
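
For what it’s worth, the registration for such a setup looks roughly like this (the pattern names are placeholders):

RewritePatternSet patterns(context);
// Type-specific aggregate patterns win over the fallback via their benefit.
patterns.add<LowerPointerOps, LowerStructOps, LowerArrayOps>(
    typeConverter, context, /*benefit=*/2);
// Catch-all pattern that "transposes" any remaining op.
patterns.add<TransposeGenericOp>(typeConverter, context, /*benefit=*/1);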

I would love to be able to do this generically/automatically too.

Ah, interesting! Your situation does indeed sound quite similar.

Indeed, if there were upstream patterns for func.func and related ops, that would already help me quite a bit. I suppose those would go into populateFunctionOpInterfaceTypeConversionPattern and populateCallOpTypeConversionPattern? (Note that the scf ops now have patterns available through populateSCFStructuralTypeConversionsAndLegality.)

I have not yet thought about a full generalization of the “transposition” of ops. It sounds like that requires quite a bit of op-specific logic, which seems difficult to generalize.

I think that the infrastructure for writing such patterns in the presence of 1:N type conversions could be improved, though. I was thinking about implementing some of the “missing features” above. For example, there could be another overload of ConversionPattern::matchAndRewrite for 1:N conversions, maybe with an ArrayRef<ArrayRef<Value>> operands argument instead of the flat ArrayRef<Value> operands, or with an additional SignatureConversion &conversion argument. Either would be populated with the N converted values for each original operand – which is something that I am currently doing manually (and I guess you are, too?).
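
A rough sketch of the first option (purely hypothetical, not an existing upstream API):

class OneToNConversionPattern : public ConversionPattern {
public:
  using ConversionPattern::ConversionPattern;
  // Hypothetical 1:N hook: operands[i] would hold the N converted values for
  // the i-th original operand, instead of exactly one value per operand.
  virtual LogicalResult
  matchAndRewrite(Operation *op, ArrayRef<ArrayRef<Value>> operands,
                  ConversionPatternRewriter &rewriter) const = 0;
};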

Yes, I am manually materializing 1:N (and the opposite N:1) conversions. I think we can add overloads of matchAndRewrite, or of the pattern class itself, as you suggested, although I’d be curious to know exactly what the missing pieces are. I can try to get a patch up for the general pattern today.

I thought a lot about this about 1.5 years ago, but never followed through (there was no need at the time, among other factors). Realistically, we can’t uproot the underlying 1:1 API of the rewriter; there isn’t a good modeling that I’ve seen that doesn’t get really awkward/weird. The thing that should happen, though, and never did, is adding sugar on top for packing/unpacking 1->N conversions. The rewriter could easily manage the insertion and legalization of these; it just never grew to handle it.

IMO we can handle this quite succinctly, like the other conversions, i.e., by inserting/unwrapping unrealized_conversion_casts, which can get legalized at the end as necessary. All 1->N patterns I’ve seen (or written) follow the same pack/unpack flow, and this is something that should be taken care of either automatically or explicitly (but trivially) by the user. These patterns are IMO a different class from normal patterns, and I don’t see a reason not to treat them differently in the API.
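
Concretely, the “pack” direction already fits the existing materialization hooks; a minimal sketch, reusing the builtin TupleType as the stand-in for the 1->N type:

// Pack: rebuild one tuple-typed value from its N component values; the cast
// gets legalized/erased at the end as necessary.
typeConverter.addSourceMaterialization(
    [](OpBuilder &builder, TupleType resultType, ValueRange inputs,
       Location loc) -> std::optional<Value> {
      return builder
          .create<UnrealizedConversionCastOp>(loc, resultType, inputs)
          .getResult(0);
    });
// The unpack direction (one value -> N values) is what lacks sugar today: a
// pattern has to create the 1:N unrealized_conversion_cast itself, as in the
// workaround sketch earlier in the thread.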

– River

For completeness, the facilities around populateDecomposeCallGraphTypesPatterns should also be mentioned. They seem to implement 1:N decompositions for func.func, func.call, and func.return using a “ValueDecomposer” (sketched below).
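
A hedged sketch of that mechanism, modeled loosely after the upstream test pass for decomposing call graph types (the exact callback signature may differ, and tuple.extractvalue again stands in for whatever op extracts an element):

ValueDecomposer decomposer;
decomposer.addDecomposeValueConversion(
    [](OpBuilder &builder, Location loc, TupleType resultType, Value value,
       SmallVectorImpl<Value> &values) {
      // Decompose one tuple-typed value into its element values via a
      // hypothetical extraction op.
      for (unsigned i = 0, e = resultType.size(); i != e; ++i)
        values.push_back(builder.create<tuple::ExtractValueOp>(
            loc, resultType.getType(i), value, i));
      return success();
    });
RewritePatternSet patterns(context);
populateDecomposeCallGraphTypesPatterns(context, typeConverter, decomposer,
                                        patterns);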

Another small bit of information: the TensorFlow runtime implements its own mechanism for 1:N type conversions here.