[mlir][vector distribution] WarpOpScfForOp fails when scf.for has results that are unused

charitha22 · May 17, 2025, 12:07am

I have the following code example that I would like to distribute to threads using the vector-warp-distribute pass.

func.func @warp_scf_for(%arg0: index) {
  %c128 = arith.constant 128 : index
  %c1 = arith.constant 1 : index
  %c0 = arith.constant 0 : index
  %0 = gpu.warp_execute_on_lane_0(%arg0)[32] -> (vector<4xf32>) {
    %ini = "some_def"() : () -> (vector<128xf32>)
    %k = arith.constant 0 : index
    %3:2 = scf.for %arg3 = %c0 to %c128 step %c1 iter_args(%arg4 = %ini, %arg5 = %k) -> (vector<128xf32>, index) {
      %add = arith.addi %arg3, %c1 : index
      %t = arith.addi %k, %c1 : index
      %acc = "some_def"(%add, %arg4, %t) : (index, vector<128xf32>, index) -> (vector<128xf32>)
      scf.yield %acc, %t : vector<128xf32>, index
    }
    gpu.yield %3#0 : vector<128xf32>
  }
  "some_use"(%0) : (vector<4xf32>) -> ()
  return
}

Currently, this example fails with the following trace:

llvm-project/mlir/lib/Dialect/GPU/IR/GPUDialect.cpp:2301: static void mlir::gpu::WarpExecuteOnLane0Op::build(mlir::OpBuilder&, mlir::OperationState&, mlir::TypeRange, mlir::Value, int64_t, mlir::ValueRange, mlir::TypeRange): Assertion `args.size() == blockArgTypes.size()' failed.
PLEASE submit a bug report to https://github.com/llvm/llvm-project/issues/ and include the crash backtrace.
Stack dump:

It looks like WarpOpScfForOp implicitly requires that every result yielded by the scf.for has a user outside the loop. Here, %t is used only inside the loop body; the corresponding loop result (%3#1) has no users outside. Is this a bug in the implementation, or is there a way to avoid this issue?

ftynse · May 22, 2025, 9:51am

In any case, this is a bug because a compiler pass must not crash on valid IR.

Looking more specifically, this seems to come from calling WarpExecuteOnLane0Op::build with incorrectly sized arguments, which means its caller is doing something wrong.
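To make the failing invariant concrete, here is a toy, MLIR-free C++ model of the check `args.size() == blockArgTypes.size()`. Everything in it (`LoopResult`, `WarpArgs`, `collectMismatched`, `collectInLockstep`, `sizesMatch`) is hypothetical and is not the actual WarpOpScfForOp code or the fix in the linked PR. It only sketches one way such a mismatch can arise: the operand list and the block-argument type list are built with different filters, for example one skips loop results that have no users and the other does not.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical stand-in for one scf.for result (not an MLIR type).
struct LoopResult {
  std::string type;  // e.g. "vector<128xf32>" or "index"
  bool hasUsers;     // whether anything outside the loop uses this result
};

// Hypothetical operands and block-argument types for a new warp op.
struct WarpArgs {
  std::vector<std::size_t> operands;    // indices of forwarded loop results
  std::vector<std::string> blockTypes;  // should hold one type per operand
};

// Inconsistent collection: operands skip results without users,
// but a type is recorded for every result, so the lists can diverge.
WarpArgs collectMismatched(const std::vector<LoopResult> &results) {
  WarpArgs out;
  for (std::size_t i = 0; i < results.size(); ++i) {
    if (results[i].hasUsers)
      out.operands.push_back(i);
    out.blockTypes.push_back(results[i].type);  // added even when unused
  }
  return out;
}

// Consistent collection: both lists are filled under the same condition,
// so their sizes always agree whichever filter is chosen.
WarpArgs collectInLockstep(const std::vector<LoopResult> &results) {
  WarpArgs out;
  for (std::size_t i = 0; i < results.size(); ++i) {
    if (!results[i].hasUsers)
      continue;
    out.operands.push_back(i);
    out.blockTypes.push_back(results[i].type);
  }
  return out;
}

// Toy equivalent of the size check asserted in the builder.
bool sizesMatch(const WarpArgs &a) {
  return a.operands.size() == a.blockTypes.size();
}
```

With results modeled as `{{"vector<128xf32>", true}, {"index", false}}` (mirroring %3#0 used and %3#1 unused), `collectMismatched` yields one operand but two types, while `collectInLockstep` yields one of each. Which filter the real pattern should use is a separate design question; the model only shows that both lists must be derived from the same one.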

charitha22 · May 28, 2025, 8:46pm

Hi Alex,

Thanks for the response.

I have created a fix for this issue here: "[mlir][vector] Fix for WarpOpScfForOp failure when scf.for has results that are unused." (llvm/llvm-project Pull Request #141853 by charithaintc on GitHub)

If possible, please help tag reviewers who are actively working on or familiar with this code.

Thanks
Charitha (Intel MLIR team)
