Failure on lowering scf.parallel and scf.for

Jinjie · June 1, 2024, 10:25am

Hi, I’m working on lowering the Linalg dialect to the LLVM dialect. I’m trying to parallelize a matrix multiplication with OpenMP but have run into some difficulties. My code is below.

func.func @matmul(%input: memref<2x2xf32>, %output: memref<2x2xf32>) {
  %init = arith.constant dense<[[1.0, 2.0], [3.0, 4.0]]>: tensor<2x2xf32>
  %init_buf = bufferization.to_memref %init: memref<2x2xf32>
  linalg.matmul ins(%input, %init_buf: memref<2x2xf32>, memref<2x2xf32>) outs(%output: memref<2x2xf32>)
  func.return
}

This is the command I run:

mlir-opt-19 -convert-linalg-to-parallel-loops -convert-scf-to-openmp -convert-scf-to-cf matmul.mlir

It reports the following error:

matmul.mlir:4:3: error: 'memref.alloca_scope' op expects region #0 to have 0 or 1 blocks
  linalg.matmul ins(%input, %init_buf: memref<2x2xf32>, memref<2x2xf32>) outs(%output: memref<2x2xf32>)
  ^
matmul.mlir:4:3: note: see current operation: 
"memref.alloca_scope"() ({
  "cf.br"(%2)[^bb1] : (index) -> ()
^bb1(%6: index):  // 2 preds: ^bb0, ^bb2
  %7 = "arith.cmpi"(%6, %1) <{predicate = 2 : i64}> : (index, index) -> i1
  "cf.cond_br"(%7)[^bb2, ^bb3] <{operandSegmentSizes = array<i32: 1, 0, 0>}> : (i1) -> ()
^bb2:  // pred: ^bb1
  %8 = "memref.load"(%arg0, %arg2, %6) <{nontemporal = false}> : (memref<2x2xf32>, index, index) -> f32
  %9 = "memref.load"(%4, %6, %arg3) <{nontemporal = false}> : (memref<2x2xf32>, index, index) -> f32
  %10 = "memref.load"(%arg1, %arg2, %arg3) <{nontemporal = false}> : (memref<2x2xf32>, index, index) -> f32
  %11 = "arith.mulf"(%8, %9) <{fastmath = #arith.fastmath<none>}> : (f32, f32) -> f32
  %12 = "arith.addf"(%10, %11) <{fastmath = #arith.fastmath<none>}> : (f32, f32) -> f32
  "memref.store"(%12, %arg1, %arg2, %arg3) <{nontemporal = false}> : (f32, memref<2x2xf32>, index, index) -> ()
  %13 = "arith.addi"(%6, %0) <{overflowFlags = #arith.overflow<none>}> : (index, index) -> index
  "cf.br"(%13)[^bb1] : (index) -> ()
^bb3:  // pred: ^bb1
  "memref.alloca_scope.return"() : () -> ()
}) : () -> ()

I noticed that this always happens when an “scf.for” is nested inside an “scf.parallel”. Is there any way to make it work?

Thank you for reading, and for your patience!

kiranchandramohan · June 2, 2024, 9:41pm

Going through the conversation in “Help lowering affine loop to OpenMP” might help with the memref.alloca_scope issue.

Jinjie · June 2, 2024, 11:39pm

Thank you for your reply. The link was very helpful. I adapted the approach from that thread, and the following command worked:

mlir-opt-19 -lower-affine -convert-scf-to-openmp -convert-func-to-llvm -arith-bufferize -finalize-memref-to-llvm -convert-scf-to-cf -convert-openmp-to-llvm -canonicalize -convert-to-llvm matmul.mlir
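
For anyone adapting this, the pass order can be kept in a small script so individual passes are easier to reorder or drop while debugging. This is only a sketch: the pass list is copied verbatim from the command above, while the `MLIR_OPT`, `INPUT` and `RUN` variables and the `build_cmd` helper are names made up for this illustration. By default it only prints the command; it calls the tool only when `RUN=1` is set.

```shell
#!/bin/sh
# Sketch: keep the pass order from the working command in one place.
# MLIR_OPT, INPUT, RUN and build_cmd are hypothetical names for this example.
MLIR_OPT="${MLIR_OPT:-mlir-opt-19}"
INPUT="${INPUT:-matmul.mlir}"

# Passes exactly as in the command above; mlir-opt applies them
# in the order they appear on the command line.
PASSES="-lower-affine -convert-scf-to-openmp -convert-func-to-llvm -arith-bufferize -finalize-memref-to-llvm -convert-scf-to-cf -convert-openmp-to-llvm -canonicalize -convert-to-llvm"

# Print the full command line without executing it.
build_cmd() {
  echo "$MLIR_OPT $PASSES $INPUT"
}

if [ "${RUN:-0}" = "1" ]; then
  # PASSES is left unquoted on purpose so each flag becomes its own argument.
  $MLIR_OPT $PASSES "$INPUT"
else
  build_cmd
fi
```

Running the script with no variables set prints the same command line as above, which makes it easy to confirm the pass order before invoking the tool with `RUN=1`.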
