Lowering Affine Loops to LLVM

Hey all,

I am trying to lower the following affine IR to LLVM:

// affine.mlir :-

module attributes {tf.versions = {bad_consumers = [], min_consumer = 0 : i32, producer = 1459 : i32}} {
  func.func @__inference_conv2d_34(%arg0: memref<1x224x224x3xf32> {tf._user_specified_name = "input"}, %arg1: memref<5x5x3x3xf32> {tf._user_specified_name = "filters"}, %arg2: memref<1x112x112x3xf32>) attributes {tf.entry_function = {control_outputs = "", inputs = "input,filters", outputs = "identity_RetVal"}} {
    %cst = arith.constant 0.000000e+00 : f32
    %0 = bufferization.to_tensor %arg0 : memref<1x224x224x3xf32>
    %alloc = memref.alloc() {alignment = 64 : i64} : memref<1x112x112x3xf32>
    affine.for %arg3 = 0 to 1 {
      affine.for %arg4 = 0 to 112 {
        affine.for %arg5 = 0 to 112 {
          affine.for %arg6 = 0 to 3 {
            affine.store %cst, %alloc[%arg3, %arg4, %arg5, %arg6] : memref<1x112x112x3xf32>
          }
        }
      }
    }
    %padded = tensor.pad %0 low[0, 1, 1, 0] high[0, 2, 2, 0] {
    ^bb0(%arg3: index, %arg4: index, %arg5: index, %arg6: index):
      tensor.yield %cst : f32
    } : tensor<1x224x224x3xf32> to tensor<1x227x227x3xf32>
    %1 = bufferization.to_memref %padded : memref<1x227x227x3xf32>
    %alloc_0 = memref.alloc() {alignment = 64 : i64} : memref<1x112x112x3xf32>
    memref.copy %alloc, %alloc_0 : memref<1x112x112x3xf32> to memref<1x112x112x3xf32>
    affine.for %arg3 = 0 to 1 {
      affine.for %arg4 = 0 to 112 {
        affine.for %arg5 = 0 to 112 {
          affine.for %arg6 = 0 to 3 {
            affine.for %arg7 = 0 to 5 {
              affine.for %arg8 = 0 to 5 {
                affine.for %arg9 = 0 to 3 {
                  %2 = affine.load %1[%arg3, %arg4 * 2 + %arg7, %arg5 * 2 + %arg8, %arg9] : memref<1x227x227x3xf32>
                  %3 = affine.load %arg1[%arg7, %arg8, %arg9, %arg6] : memref<5x5x3x3xf32>
                  %4 = affine.load %alloc_0[%arg3, %arg4, %arg5, %arg6] : memref<1x112x112x3xf32>
                  %5 = arith.mulf %2, %3 : f32
                  %6 = arith.addf %4, %5 : f32
                  affine.store %6, %alloc_0[%arg3, %arg4, %arg5, %arg6] : memref<1x112x112x3xf32>
                }
              }
            }
          }
        }
      }
    }
    memref.copy %alloc_0, %arg2 : memref<1x112x112x3xf32> to memref<1x112x112x3xf32>
    return
  }
}

using

mlir-opt affine.mlir --lower-affine --convert-scf-to-cf --finalize-memref-to-llvm --convert-cf-to-llvm --convert-func-to-llvm --convert-arith-to-llvm --reconcile-unrealized-casts

But I am getting the following error:

affine.mlir:2:36: note: see current operation: %12 = "builtin.unrealized_conversion_cast"(%11) : (!llvm.struct<(ptr, ptr, i64, array<4 x i64>, array<4 x i64>)>) -> memref<1x224x224x3xf32>

After some debugging, I think the issue is with the bufferization.to_tensor operation. According to [MLIR] lower bufferization.to_memref to LLVM - #2 by matthias-springer, bufferization.to_tensor needs the restrict attribute, which is not generated when the IR is lowered. How do I deal with this? My goal is to lower it to LLVM IR in the end.

Thank you.

Your input IR is mixed tensor/memref IR. Can you write the IR in tensor-only or memref-only form? That will be easier to deal with. If you start with tensor IR, the -one-shot-bufferize pass can turn it into memref IR. (But it won't be able to handle certain mixed tensor/memref IR, such as bufferization.to_tensor without the restrict keyword.)
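For illustration, here is a hedged sketch of what the op in your IR would need to look like for one-shot bufferization to accept it. The restrict keyword asserts that no other op or buffer aliases this memref, so the pass can treat the resulting tensor as the sole view of that buffer:

%0 = bufferization.to_tensor %arg0 restrict : memref<1x224x224x3xf32>

Without that guarantee, bufferization has to conservatively reject the mixed IR, since it cannot prove that writes through other aliases won't be observed through the tensor.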

I am not writing the IR manually. I am making use of tf-opt on tensorflow function (using some bufferization passes) to generate it.

tf-opt -xla-legalize-tf -hlo-legalize-to-linalg -linalg-bufferize -func-bufferize -buffer-results-to-out-params -canonicalize -convert-linalg-to-affine-loops -canonicalize func.mlir

I would try removing all bufferization-related passes and instead run -one-shot-bufferize="bufferize-function-boundaries" as one of the last passes (before lowering to LLVM).

It’s not working.

Below is the func.mlir :-

module attributes {tf.versions = {bad_consumers = [], min_consumer = 0 : i32, producer = 1459 : i32}} {
  func.func @__inference_conv2d_34(%arg0: tensor<1x224x224x3xf32> {tf._user_specified_name = "input"}, %arg1: tensor<5x5x3x3xf32> {tf._user_specified_name = "filters"}) -> tensor<1x112x112x3xf32> attributes {tf.entry_function = {control_outputs = "", inputs = "input,filters", outputs = "identity_RetVal"}} {
    %0 = "tf.Conv2D"(%arg0, %arg1) {data_format = "NHWC", device = "", dilations = [1, 1, 1, 1], explicit_paddings = [], padding = "SAME", strides = [1, 2, 2, 1], use_cudnn_on_gpu = true} : (tensor<1x224x224x3xf32>, tensor<5x5x3x3xf32>) -> tensor<1x112x112x3xf32>
    %1 = "tf.Identity"(%0) {device = ""} : (tensor<1x112x112x3xf32>) -> tensor<1x112x112x3xf32>
    return %1 : tensor<1x112x112x3xf32>
  }
}

If I remove all the bufferization-related passes, -convert-linalg-to-affine-loops does not create any affine loops (which are what I am trying to optimize), as seen below.

Output of tf-opt func.mlir -xla-legalize-tf -hlo-legalize-to-linalg -convert-linalg-to-affine-loops -canonicalize -o affine.mlir

module attributes {tf.versions = {bad_consumers = [], min_consumer = 0 : i32, producer = 1459 : i32}} {
  func.func @__inference_conv2d_34(%arg0: tensor<1x224x224x3xf32> {tf._user_specified_name = "input"}, %arg1: tensor<5x5x3x3xf32> {tf._user_specified_name = "filters"}) -> tensor<1x112x112x3xf32> attributes {tf.entry_function = {control_outputs = "", inputs = "input,filters", outputs = "identity_RetVal"}} {
    %cst = arith.constant 0.000000e+00 : f32
    %0 = tensor.empty() : tensor<1x112x112x3xf32>
    %1 = linalg.fill ins(%cst : f32) outs(%0 : tensor<1x112x112x3xf32>) -> tensor<1x112x112x3xf32>
    %padded = tensor.pad %arg0 low[0, 1, 1, 0] high[0, 2, 2, 0] {
    ^bb0(%arg2: index, %arg3: index, %arg4: index, %arg5: index):
      tensor.yield %cst : f32
    } : tensor<1x224x224x3xf32> to tensor<1x227x227x3xf32>
    %2 = linalg.conv_2d_nhwc_hwcf {dilations = dense<1> : tensor<2xi64>, strides = dense<2> : tensor<2xi64>} ins(%padded, %arg1 : tensor<1x227x227x3xf32>, tensor<5x5x3x3xf32>) outs(%1 : tensor<1x112x112x3xf32>) -> tensor<1x112x112x3xf32>
    return %2 : tensor<1x112x112x3xf32>
  }
}

I am a beginner in MLIR, so I am not sure if I am missing something obvious here.

How about this?

tf-opt func.mlir -xla-legalize-tf -hlo-legalize-to-linalg  -canonicalize -one-shot-bufferize="bufferize-function-boundaries" -convert-linalg-to-affine-loops -o affine.mlir

There are some non-MLIR passes in here (e.g., xla-legalize-tf) that I am not familiar with. Also, passes often must be applied in the correct order; e.g., it sounds like -convert-linalg-to-affine-loops must run after bufferization. (But I'm not familiar with that pass either.)

I am trying to convert ML operations like convolution (which uses 4 nested loops) to High Level Operations (HLO) using xla-legalize-tf, then convert HLO to linalg and linalg to affine loops, as seen in the diagram above.

I already tried it; unfortunately, it doesn't work. The following is the error dump:

tf.mlir:5:5: error: operand #0 may return/yield a new buffer allocation
    return %1 : tensor<1x112x112x3xf32>
    ^
tf.mlir:5:5: note: diagnostic emitted with trace:
Stack dump without symbol names (ensure you have llvm-symbolizer in your PATH or set the environment var `LLVM_SYMBOLIZER_PATH` to point to it):
0  libtensorflow_framework.so.2 0x00007fa29c57c972 llvm::sys::PrintStackTrace(llvm::raw_ostream&, int) + 50
1  tf-opt                       0x0000556a2def1fe9
2  tf-opt                       0x0000556a2def21ee
3  tf-opt                       0x0000556a2df1f584
4  tf-opt                       0x0000556a2ce892bc
5  tf-opt                       0x0000556a1fe64ad9
6  tf-opt                       0x0000556a2ce91214
7  tf-opt                       0x0000556a2ce9ae17
8  tf-opt                       0x0000556a2ce9d989
9  tf-opt                       0x0000556a2ce9a77f
10 tf-opt                       0x0000556a2ce71ca2
11 tf-opt                       0x0000556a2ddf99c0
12 tf-opt                       0x0000556a2ddf9fd5
13 tf-opt                       0x0000556a2ddfb2d5
14 tf-opt                       0x0000556a2db7cfed
15 tf-opt                       0x0000556a2db7d60d
16 tf-opt                       0x0000556a2db7d776
17 tf-opt                       0x0000556a2df55aff
18 tf-opt                       0x0000556a2db78ad6
19 tf-opt                       0x0000556a2db7bf1c
20 tf-opt                       0x0000556a1fc8669f
21 libc.so.6                    0x00007fa29a913d90
22 libc.so.6                    0x00007fa29a913e40 __libc_start_main + 128
23 tf-opt                       0x0000556a1fc74de5

tf.mlir:5:5: note: see current operation: "func.return"(%6) : (tensor<1x112x112x3xf32>) -> ()

Is it possible that you run an old version of MLIR?

I thought we removed that error message:

tf.mlir:5:5: error: operand #0 may return/yield a new buffer allocation

I can’t find it when grepping through the MLIR code base.

You can try this: -one-shot-bufferize="bufferize-function-boundaries allow-return-allocs" (Or if that was the wrong flag, find the place in the code base where this error is thrown. There should be an option to allow buffers to be returned.)
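For context, a hedged sketch of why the error fires: after bufferization with function boundaries, a function that returns a freshly computed tensor ends up returning a new allocation as a memref, roughly like this (a hypothetical minimal example, not your actual IR):

func.func @returns_alloc() -> memref<1x112x112x3xf32> {
  %alloc = memref.alloc() : memref<1x112x112x3xf32>
  return %alloc : memref<1x112x112x3xf32>
}

Some versions of one-shot bufferization reject this pattern (since the caller becomes responsible for freeing the buffer) unless returning allocations is explicitly permitted, which is what the allow-return-allocs option controlled.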
