Failure when lowering scf.parallel and scf.for

Hi, I’m working on lowering the Linalg dialect to the LLVM dialect. I’m trying to parallelize a matrix multiplication with OpenMP but have run into some difficulties. My code is below.

func.func @matmul(%input: memref<2x2xf32>, %output: memref<2x2xf32>) {
  %init = arith.constant dense<[[1.0, 2.0], [3.0, 4.0]]>: tensor<2x2xf32>
  %init_buf = bufferization.to_memref %init: memref<2x2xf32>
  linalg.matmul ins(%input, %init_buf: memref<2x2xf32>, memref<2x2xf32>) outs(%output: memref<2x2xf32>)
  func.return
}

I run it with the following command.

mlir-opt-19 -convert-linalg-to-parallel-loops -convert-scf-to-openmp -convert-scf-to-cf matmul.mlir

It reports the following error.

matmul.mlir:4:3: error: 'memref.alloca_scope' op expects region #0 to have 0 or 1 blocks
  linalg.matmul ins(%input, %init_buf: memref<2x2xf32>, memref<2x2xf32>) outs(%output: memref<2x2xf32>)
  ^
matmul.mlir:4:3: note: see current operation: 
"memref.alloca_scope"() ({
  "cf.br"(%2)[^bb1] : (index) -> ()
^bb1(%6: index):  // 2 preds: ^bb0, ^bb2
  %7 = "arith.cmpi"(%6, %1) <{predicate = 2 : i64}> : (index, index) -> i1
  "cf.cond_br"(%7)[^bb2, ^bb3] <{operandSegmentSizes = array<i32: 1, 0, 0>}> : (i1) -> ()
^bb2:  // pred: ^bb1
  %8 = "memref.load"(%arg0, %arg2, %6) <{nontemporal = false}> : (memref<2x2xf32>, index, index) -> f32
  %9 = "memref.load"(%4, %6, %arg3) <{nontemporal = false}> : (memref<2x2xf32>, index, index) -> f32
  %10 = "memref.load"(%arg1, %arg2, %arg3) <{nontemporal = false}> : (memref<2x2xf32>, index, index) -> f32
  %11 = "arith.mulf"(%8, %9) <{fastmath = #arith.fastmath<none>}> : (f32, f32) -> f32
  %12 = "arith.addf"(%10, %11) <{fastmath = #arith.fastmath<none>}> : (f32, f32) -> f32
  "memref.store"(%12, %arg1, %arg2, %arg3) <{nontemporal = false}> : (f32, memref<2x2xf32>, index, index) -> ()
  %13 = "arith.addi"(%6, %0) <{overflowFlags = #arith.overflow<none>}> : (index, index) -> index
  "cf.br"(%13)[^bb1] : (index) -> ()
^bb3:  // pred: ^bb1
  "memref.alloca_scope.return"() : () -> ()
}) : () -> ()

I noticed that it always happens when an “scf.for” is nested in an “scf.parallel”. Is there any way to make it work?

Thank you for reading and for your patience!

Going through the conversation in “Help lowering affine loop to OpenMP” might help with the memref.alloca_scope issue.

Thank you for your reply. Your link was very helpful. I made some modifications to the approach in that thread, and it worked.

mlir-opt-19 -lower-affine -convert-scf-to-openmp -convert-func-to-llvm -arith-bufferize -finalize-memref-to-llvm -convert-scf-to-cf -convert-openmp-to-llvm -canonicalize -convert-to-llvm matmul.mlir
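
For readability, the same pass list can be written one pass per line. This is only a sketch: the MLIR_OPT and INPUT variables are names made up for illustration, and the script only prints the command when the tool or the input file is not found. Judging from the error dump above, a plausible reason the order matters is that -convert-scf-to-cf turned the inner loop into several blocks while it was still inside memref.alloca_scope; here -finalize-memref-to-llvm runs before -convert-scf-to-cf, so the scope should already be lowered by then. I have not checked whether other orderings also work.

```shell
#!/bin/sh
# Pass list copied from the working command above, in the same order.
# Note that -finalize-memref-to-llvm comes before -convert-scf-to-cf.
PASSES="-lower-affine \
-convert-scf-to-openmp \
-convert-func-to-llvm \
-arith-bufferize \
-finalize-memref-to-llvm \
-convert-scf-to-cf \
-convert-openmp-to-llvm \
-canonicalize \
-convert-to-llvm"

# Hypothetical knobs (not from the thread): override the binary or the input file.
MLIR_OPT="${MLIR_OPT:-mlir-opt-19}"
INPUT="${1:-matmul.mlir}"

if command -v "$MLIR_OPT" >/dev/null 2>&1 && [ -f "$INPUT" ]; then
  # $PASSES is intentionally unquoted so each pass becomes its own argument.
  "$MLIR_OPT" $PASSES "$INPUT"
else
  # Tool or input missing: show the command that would have been run.
  echo "$MLIR_OPT $PASSES $INPUT"
fi
```

Saved as, say, lower.sh, it can be run as `sh lower.sh matmul.mlir`.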