Failed to run tensor.insert in scf.for loop

I’m using tensor.insert (InsertOp) to update tensor values, but lowering fails with "failed to legalize operation 'builtin.unrealized_conversion_cast' that was explicitly marked illegal".

Here’s the code snippet, which updates three elements of the initial tensor:

func.func @example() -> (tensor<4xf64>) {
%c0 = arith.constant 0 : index
%c1 = arith.constant 1 : index
%c3 = arith.constant 3 : index
%f0 = arith.constant 0.000000e+00 : f64

%num = arith.constant dense<[37.62, 15.43, -2.401]> : tensor<3xf64>
%index = arith.constant dense<[0, 2, 4]> : tensor<3xindex>

%splat = tensor.splat %f0 : tensor<4xf64>
%newTensor = scf.for %iv = %c0 to %c3 step %c1 
    iter_args(%tmpTensor = %splat) -> (tensor<4xf64>){
  %scalar = tensor.extract %num[%iv] : tensor<3xf64>
  %idx = tensor.extract %index[%iv] : tensor<3xindex>
  %nextTensor = tensor.insert %scalar into %tmpTensor[%idx] : tensor<4xf64>
  scf.yield %nextTensor : tensor<4xf64>
}
return %newTensor : tensor<4xf64>

}

The lowering passes I use:

-tensor-bufferize -arith-expand -linalg-bufferize -tensor-bufferize -convert-linalg-to-loops -func-bufferize -arith-bufferize -convert-scf-to-cf -expand-strided-metadata -memref-expand -arith-expand -convert-cf-to-llvm -convert-arith-to-llvm -finalize-memref-to-llvm -convert-func-to-llvm -reconcile-unrealized-casts

The complete error message:

example.mlir:2:11: error: failed to legalize operation 'builtin.unrealized_conversion_cast' that was explicitly marked illegal
%c0 = arith.constant 0 : index

When I remove the scf.for loop (and the code inside it), everything works fine with the above passes. So the problem seems to come from the tensor.insert operation, even though the error message points at arith.constant. I’m open to any guidance or suggestions you might have, and I’m eager to learn from your expertise.

Unrealized conversion cast errors tend not to be the most informative.
But overall, your snippet looks correct at first glance. The fact that it lowered all the way to reconcile-unrealized-casts suggests that the problem might be with one of the lowerings or with the pipeline itself.

So, let’s start by reproducing the error. Running the provided lowering passes on the code snippet, I get a similar failed to legalize operation 'builtin.unrealized_conversion_cast' error.
In such cases, it is often useful to step back one pass and look at the IR state just before the reconcile-unrealized-casts pass.
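As a rough sketch of that step, assuming mlir-opt is on PATH and the snippet is saved as example.mlir (the output file name before-reconcile.mlir is made up for illustration), the pipeline can be rerun without its final pass and the result searched for ops that should no longer exist:

```shell
# Pipeline from the question, minus the trailing -reconcile-unrealized-casts.
PASSES="-tensor-bufferize -arith-expand -linalg-bufferize -tensor-bufferize \
-convert-linalg-to-loops -func-bufferize -arith-bufferize -convert-scf-to-cf \
-expand-strided-metadata -memref-expand -arith-expand -convert-cf-to-llvm \
-convert-arith-to-llvm -finalize-memref-to-llvm -convert-func-to-llvm"

if command -v mlir-opt >/dev/null 2>&1; then
  # $PASSES is intentionally unquoted so each pass becomes its own argument.
  mlir-opt example.mlir $PASSES -o before-reconcile.mlir
  # Anything matched here was not lowered to the LLVM dialect.
  grep -n -e 'bufferization\.' -e 'unrealized_conversion_cast' before-reconcile.mlir \
    || echo "no leftover bufferization ops or casts found"
else
  echo "mlir-opt not found on PATH" >&2
fi
```

Searching for the bufferization. prefix is only a heuristic; any op outside the llvm dialect at this point is worth a closer look.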

Rerunning the same passes without the last one, I’d expect all the ops to be in the LLVM dialect. However, a quick glance over the IR brings up some bufferization leftovers:

%75 = bufferization.to_memref %71 : memref<4xf64>
%76 = builtin.unrealized_conversion_cast %75 : memref<4xf64> to !llvm.struct<(ptr, ptr, i64, array<1 x i64>, array<1 x i64>)>
%77 = llvm.extractvalue %37[1] : !llvm.struct<(ptr, ptr, i64, array<1 x i64>, array<1 x i64>)> 

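This suggests that something might be wrong with bufferization, or rather with how it is invoked: a bufferization.to_memref that survives this far means some tensor value was never fully bufferized, so the cast consuming it cannot be reconciled. These days it is much better to rely on One-Shot Bufferize.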

Let’s replace all the individual bufferization passes with -one-shot-bufferize and also enable its bufferize-function-boundaries option, so that tensors in the function signature (here, the returned tensor) are converted to memrefs as well.
The new pipeline will be:

-one-shot-bufferize=bufferize-function-boundaries -arith-expand -convert-linalg-to-loops -convert-scf-to-cf -expand-strided-metadata -memref-expand -arith-expand -convert-cf-to-llvm -convert-arith-to-llvm -finalize-memref-to-llvm -convert-func-to-llvm -reconcile-unrealized-casts

This successfully produces IR with all operations lowered to the LLVM dialect.
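For reference, a minimal sketch of running the new pipeline and checking the result, again assuming mlir-opt is on PATH and the input is saved as example.mlir (lowered.mlir is a made-up output name):

```shell
# New pipeline: one-shot bufferization replaces the individual *-bufferize passes.
PASSES="-one-shot-bufferize=bufferize-function-boundaries -arith-expand \
-convert-linalg-to-loops -convert-scf-to-cf -expand-strided-metadata \
-memref-expand -arith-expand -convert-cf-to-llvm -convert-arith-to-llvm \
-finalize-memref-to-llvm -convert-func-to-llvm -reconcile-unrealized-casts"

if command -v mlir-opt >/dev/null 2>&1; then
  # $PASSES is intentionally unquoted so each pass becomes its own argument.
  mlir-opt example.mlir $PASSES -o lowered.mlir
  if grep -q -e 'bufferization\.' -e 'unrealized_conversion_cast' lowered.mlir; then
    echo "leftover bufferization ops or casts remain"
  else
    echo "no leftover bufferization ops or casts"
  fi
else
  echo "mlir-opt not found on PATH" >&2
fi
```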

If more unrealized conversion cast errors occur, the above process can be repeated to identify other missing passes and/or an incorrect pass order.
When a pipeline produces unexpected or invalid results, it is often also helpful to examine the IR after each individual pass, or at least at some key points such as right before and after bufferization.
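One way to inspect the IR after every pass is mlir-opt's -mlir-print-ir-after-all flag. The sketch below assumes mlir-opt is on PATH and the input is saved as example.mlir; the file names lowered.mlir and ir-after-each-pass.log are made up for illustration:

```shell
PASSES="-one-shot-bufferize=bufferize-function-boundaries -arith-expand \
-convert-linalg-to-loops -convert-scf-to-cf -expand-strided-metadata \
-memref-expand -arith-expand -convert-cf-to-llvm -convert-arith-to-llvm \
-finalize-memref-to-llvm -convert-func-to-llvm -reconcile-unrealized-casts"

if command -v mlir-opt >/dev/null 2>&1; then
  # The per-pass IR dumps go to stderr, so redirect stderr into a log file.
  mlir-opt example.mlir $PASSES -mlir-print-ir-after-all \
    -o lowered.mlir 2> ir-after-each-pass.log
  # Each dump begins with an "IR Dump After <pass>" header; list them with line numbers.
  grep -n 'IR Dump After' ir-after-each-pass.log
else
  echo "mlir-opt not found on PATH" >&2
fi
```

Jumping to the header of the bufferization pass in the log, and to the one just before it, gives the before/after view mentioned above.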

Hopefully this small debugging walkthrough gives you some insight into how to handle similar issues in the future :)

Thank you very much for your help. The ‘one-shot-bufferize’ pass is indeed useful and works especially well with the tensor dialect. My previous idea was to convert tensors into memrefs outside the ‘scf.for’ loop and then process the memrefs within the loop. This works, but in an ugly way: all other parts of the program operate at the tensor level, and then a memref suddenly shows up here. Once again, thank you for the detailed debugging process. I think I can handle other similar problems now (rather than simply trying different passes).

Generally, it is better to avoid mixing tensor and memref abstractions unless there is a strong reason to do so. But occasionally such an approach can have its uses.

And if mixing these abstractions is unavoidable, the bufferization dialect can help bridge the two worlds, e.g., with bufferization.to_memref, bufferization.to_tensor, and bufferization.materialize_in_destination.