Hi, I’m working on lowering the Linalg dialect to the LLVM dialect, and I’m trying to parallelize a matrix multiplication with OpenMP, but I’ve run into some difficulties. Here is my code:
```mlir
func.func @matmul(%input: memref<2x2xf32>, %output: memref<2x2xf32>) {
  %init = arith.constant dense<[[1.0, 2.0], [3.0, 4.0]]>: tensor<2x2xf32>
  %init_buf = bufferization.to_memref %init: memref<2x2xf32>
  linalg.matmul ins(%input, %init_buf: memref<2x2xf32>, memref<2x2xf32>) outs(%output: memref<2x2xf32>)
  func.return
}
```
I run it with the following command:
```shell
mlir-opt-19 -convert-linalg-to-parallel-loops -convert-scf-to-openmp -convert-scf-to-cf matmul.mlir
```
It fails with the following error:
```
matmul.mlir:4:3: error: 'memref.alloca_scope' op expects region #0 to have 0 or 1 blocks
  linalg.matmul ins(%input, %init_buf: memref<2x2xf32>, memref<2x2xf32>) outs(%output: memref<2x2xf32>)
  ^
matmul.mlir:4:3: note: see current operation:
"memref.alloca_scope"() ({
  "cf.br"(%2)[^bb1] : (index) -> ()
^bb1(%6: index): // 2 preds: ^bb0, ^bb2
  %7 = "arith.cmpi"(%6, %1) <{predicate = 2 : i64}> : (index, index) -> i1
  "cf.cond_br"(%7)[^bb2, ^bb3] <{operandSegmentSizes = array<i32: 1, 0, 0>}> : (i1) -> ()
^bb2: // pred: ^bb1
  %8 = "memref.load"(%arg0, %arg2, %6) <{nontemporal = false}> : (memref<2x2xf32>, index, index) -> f32
  %9 = "memref.load"(%4, %6, %arg3) <{nontemporal = false}> : (memref<2x2xf32>, index, index) -> f32
  %10 = "memref.load"(%arg1, %arg2, %arg3) <{nontemporal = false}> : (memref<2x2xf32>, index, index) -> f32
  %11 = "arith.mulf"(%8, %9) <{fastmath = #arith.fastmath<none>}> : (f32, f32) -> f32
  %12 = "arith.addf"(%10, %11) <{fastmath = #arith.fastmath<none>}> : (f32, f32) -> f32
  "memref.store"(%12, %arg1, %arg2, %arg3) <{nontemporal = false}> : (f32, memref<2x2xf32>, index, index) -> ()
  %13 = "arith.addi"(%6, %0) <{overflowFlags = #arith.overflow<none>}> : (index, index) -> index
  "cf.br"(%13)[^bb1] : (index) -> ()
^bb3: // pred: ^bb1
  "memref.alloca_scope.return"() : () -> ()
}) : () -> ()
```
I noticed that it always happens when an “scf.for” is nested inside an “scf.parallel”. Is there any way to make this work?
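To make the pattern concrete, here is a hand-written sketch of roughly the loop nest that `-convert-linalg-to-parallel-loops` produces for this matmul. It is not the pass's verbatim output: the function name `@matmul_loops` and the SSA names (`%i`, `%j`, `%k`, ...) are my own, and `%init_buf` is passed in as an argument to keep the example small. The `i` and `j` dimensions become an `scf.parallel`, while the reduction dimension `k` stays a sequential `scf.for` inside it. Judging from the dump above, `-convert-scf-to-openmp` then wraps the parallel body in a `memref.alloca_scope`, and `-convert-scf-to-cf` turns the inner `scf.for` into several blocks inside that scope's region, which is what the verifier rejects.

```mlir
// Hand-written sketch, not exact pass output; @matmul_loops and all SSA
// names are illustrative. Computes output[i, j] += input[i, k] * init_buf[k, j].
func.func @matmul_loops(%input: memref<2x2xf32>, %init_buf: memref<2x2xf32>,
                        %output: memref<2x2xf32>) {
  %c0 = arith.constant 0 : index
  %c1 = arith.constant 1 : index
  %c2 = arith.constant 2 : index
  // i and j are independent, so they form the parallel dimensions.
  scf.parallel (%i, %j) = (%c0, %c0) to (%c2, %c2) step (%c1, %c1) {
    // k is the reduction dimension: every iteration updates the same
    // output[i, j] element, so it stays a sequential scf.for.
    scf.for %k = %c0 to %c2 step %c1 {
      %a = memref.load %input[%i, %k] : memref<2x2xf32>
      %b = memref.load %init_buf[%k, %j] : memref<2x2xf32>
      %acc = memref.load %output[%i, %j] : memref<2x2xf32>
      %prod = arith.mulf %a, %b : f32
      %sum = arith.addf %acc, %prod : f32
      memref.store %sum, %output[%i, %j] : memref<2x2xf32>
    }
  }
  func.return
}
```

The load/multiply/add/store sequence mirrors the body of `^bb2` in the error dump, where `%arg2`/`%arg3` play the role of `%i`/`%j` and `%6` the role of `%k`.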
Thank you for reading and for your patience!