Lowering Affine Loops to LLVM

Hey all,

I am trying to lower the following affine IR to LLVM:

// affine.mlir :-

module attributes {tf.versions = {bad_consumers = [], min_consumer = 0 : i32, producer = 1459 : i32}} {
  func.func @__inference_conv2d_34(%arg0: memref<1x224x224x3xf32> {tf._user_specified_name = "input"}, %arg1: memref<5x5x3x3xf32> {tf._user_specified_name = "filters"}, %arg2: memref<1x112x112x3xf32>) attributes {tf.entry_function = {control_outputs = "", inputs = "input,filters", outputs = "identity_RetVal"}} {
    %cst = arith.constant 0.000000e+00 : f32
    %0 = bufferization.to_tensor %arg0 : memref<1x224x224x3xf32>
    %alloc = memref.alloc() {alignment = 64 : i64} : memref<1x112x112x3xf32>
    affine.for %arg3 = 0 to 1 {
      affine.for %arg4 = 0 to 112 {
        affine.for %arg5 = 0 to 112 {
          affine.for %arg6 = 0 to 3 {
            affine.store %cst, %alloc[%arg3, %arg4, %arg5, %arg6] : memref<1x112x112x3xf32>
          }
        }
      }
    }
    %padded = tensor.pad %0 low[0, 1, 1, 0] high[0, 2, 2, 0] {
    ^bb0(%arg3: index, %arg4: index, %arg5: index, %arg6: index):
      tensor.yield %cst : f32
    } : tensor<1x224x224x3xf32> to tensor<1x227x227x3xf32>
    %1 = bufferization.to_memref %padded : memref<1x227x227x3xf32>
    %alloc_0 = memref.alloc() {alignment = 64 : i64} : memref<1x112x112x3xf32>
    memref.copy %alloc, %alloc_0 : memref<1x112x112x3xf32> to memref<1x112x112x3xf32>
    affine.for %arg3 = 0 to 1 {
      affine.for %arg4 = 0 to 112 {
        affine.for %arg5 = 0 to 112 {
          affine.for %arg6 = 0 to 3 {
            affine.for %arg7 = 0 to 5 {
              affine.for %arg8 = 0 to 5 {
                affine.for %arg9 = 0 to 3 {
                  %2 = affine.load %1[%arg3, %arg4 * 2 + %arg7, %arg5 * 2 + %arg8, %arg9] : memref<1x227x227x3xf32>
                  %3 = affine.load %arg1[%arg7, %arg8, %arg9, %arg6] : memref<5x5x3x3xf32>
                  %4 = affine.load %alloc_0[%arg3, %arg4, %arg5, %arg6] : memref<1x112x112x3xf32>
                  %5 = arith.mulf %2, %3 : f32
                  %6 = arith.addf %4, %5 : f32
                  affine.store %6, %alloc_0[%arg3, %arg4, %arg5, %arg6] : memref<1x112x112x3xf32>
                }
              }
            }
          }
        }
      }
    }
    memref.copy %alloc_0, %arg2 : memref<1x112x112x3xf32> to memref<1x112x112x3xf32>
    return
  }
}

using

mlir-opt affine.mlir --lower-affine --convert-scf-to-cf --finalize-memref-to-llvm --convert-cf-to-llvm --convert-func-to-llvm --convert-arith-to-llvm --reconcile-unrealized-casts

But I am getting the following error:

affine.mlir:2:36: note: see current operation: %12 = "builtin.unrealized_conversion_cast"(%11) : (!llvm.struct<(ptr, ptr, i64, array<4 x i64>, array<4 x i64>)>) -> memref<1x224x224x3xf32>

After some debugging, I think the issue is with the bufferization.to_tensor operation. According to [MLIR] lower bufferization.to_memref to LLVM - #2 by matthias-springer, bufferization.to_tensor needs the restrict attribute, which is not generated when the IR is lowered. How do I deal with this? My goal is to lower it to LLVM IR in the end.

Thank you.

Your input IR is mixed tensor/memref IR. Can you write the IR in tensor-only or memref-only form? That will be easier to deal with. If you start with tensor IR, the -one-shot-bufferize pass can turn it into memref IR. (But it won't be able to handle certain mixed tensor/memref IR, such as bufferization.to_tensor without the restrict keyword.)
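For illustration, here is a hedged sketch of what the op in your IR would need to look like for one-shot bufferization to accept it. The restrict keyword asserts that no other op or buffer aliases this memref, so the pass can treat the resulting tensor as the sole view of that buffer:

%0 = bufferization.to_tensor %arg0 restrict : memref<1x224x224x3xf32>

Without that guarantee, bufferization has to conservatively reject the mixed IR, since it cannot prove that writes through other aliases won't be observed through the tensor.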

I am not writing the IR manually. I am making use of tf-opt on tensorflow function (using some bufferization passes) to generate it.

tf-opt -xla-legalize-tf -hlo-legalize-to-linalg -linalg-bufferize -func-bufferize -buffer-results-to-out-params -canonicalize -convert-linalg-to-affine-loops -canonicalize func.mlir

I would try removing all bufferization-related passes and instead run -one-shot-bufferize="bufferize-function-boundaries" as one of the last passes (before lowering to LLVM).

It’s not working.

Below is the func.mlir :-

module attributes {tf.versions = {bad_consumers = [], min_consumer = 0 : i32, producer = 1459 : i32}} {
  func.func @__inference_conv2d_34(%arg0: tensor<1x224x224x3xf32> {tf._user_specified_name = "input"}, %arg1: tensor<5x5x3x3xf32> {tf._user_specified_name = "filters"}) -> tensor<1x112x112x3xf32> attributes {tf.entry_function = {control_outputs = "", inputs = "input,filters", outputs = "identity_RetVal"}} {
    %0 = "tf.Conv2D"(%arg0, %arg1) {data_format = "NHWC", device = "", dilations = [1, 1, 1, 1], explicit_paddings = [], padding = "SAME", strides = [1, 2, 2, 1], use_cudnn_on_gpu = true} : (tensor<1x224x224x3xf32>, tensor<5x5x3x3xf32>) -> tensor<1x112x112x3xf32>
    %1 = "tf.Identity"(%0) {device = ""} : (tensor<1x112x112x3xf32>) -> tensor<1x112x112x3xf32>
    return %1 : tensor<1x112x112x3xf32>
  }
}

If I remove all the bufferization-related passes, -convert-linalg-to-affine-loops does not create any affine loops (which are what I am trying to optimize), as seen below.

Output of tf-opt func.mlir -xla-legalize-tf -hlo-legalize-to-linalg -convert-linalg-to-affine-loops -canonicalize -o affine.mlir

module attributes {tf.versions = {bad_consumers = [], min_consumer = 0 : i32, producer = 1459 : i32}} {
  func.func @__inference_conv2d_34(%arg0: tensor<1x224x224x3xf32> {tf._user_specified_name = "input"}, %arg1: tensor<5x5x3x3xf32> {tf._user_specified_name = "filters"}) -> tensor<1x112x112x3xf32> attributes {tf.entry_function = {control_outputs = "", inputs = "input,filters", outputs = "identity_RetVal"}} {
    %cst = arith.constant 0.000000e+00 : f32
    %0 = tensor.empty() : tensor<1x112x112x3xf32>
    %1 = linalg.fill ins(%cst : f32) outs(%0 : tensor<1x112x112x3xf32>) -> tensor<1x112x112x3xf32>
    %padded = tensor.pad %arg0 low[0, 1, 1, 0] high[0, 2, 2, 0] {
    ^bb0(%arg2: index, %arg3: index, %arg4: index, %arg5: index):
      tensor.yield %cst : f32
    } : tensor<1x224x224x3xf32> to tensor<1x227x227x3xf32>
    %2 = linalg.conv_2d_nhwc_hwcf {dilations = dense<1> : tensor<2xi64>, strides = dense<2> : tensor<2xi64>} ins(%padded, %arg1 : tensor<1x227x227x3xf32>, tensor<5x5x3x3xf32>) outs(%1 : tensor<1x112x112x3xf32>) -> tensor<1x112x112x3xf32>
    return %2 : tensor<1x112x112x3xf32>
  }
}

I am a beginner in MLIR, so I am not sure if I am missing something obvious here.

How about this?

tf-opt func.mlir -xla-legalize-tf -hlo-legalize-to-linalg  -canonicalize -one-shot-bufferize="bufferize-function-boundaries" -convert-linalg-to-affine-loops -o affine.mlir

There are some non-MLIR passes in here (e.g., xla-legalize-tf) that I am not familiar with. Also, passes often must be applied in the correct order; e.g., it sounds like -convert-linalg-to-affine-loops must run after bufferization. (But I'm not familiar with that pass either.)

I am trying to convert ML operations like convolution (which uses 4 nested loops) to High Level Operations (HLO) using xla-legalize-tf, then convert HLO to linalg and linalg to affine loops, as seen in the diagram above.

I already tried it; unfortunately, it doesn't work. The following is the error dump:

tf.mlir:5:5: error: operand #0 may return/yield a new buffer allocation
    return %1 : tensor<1x112x112x3xf32>
    ^
tf.mlir:5:5: note: diagnostic emitted with trace:
Stack dump without symbol names (ensure you have llvm-symbolizer in your PATH or set the environment var `LLVM_SYMBOLIZER_PATH` to point to it):
0  libtensorflow_framework.so.2 0x00007fa29c57c972 llvm::sys::PrintStackTrace(llvm::raw_ostream&, int) + 50
1  tf-opt                       0x0000556a2def1fe9
2  tf-opt                       0x0000556a2def21ee
3  tf-opt                       0x0000556a2df1f584
4  tf-opt                       0x0000556a2ce892bc
5  tf-opt                       0x0000556a1fe64ad9
6  tf-opt                       0x0000556a2ce91214
7  tf-opt                       0x0000556a2ce9ae17
8  tf-opt                       0x0000556a2ce9d989
9  tf-opt                       0x0000556a2ce9a77f
10 tf-opt                       0x0000556a2ce71ca2
11 tf-opt                       0x0000556a2ddf99c0
12 tf-opt                       0x0000556a2ddf9fd5
13 tf-opt                       0x0000556a2ddfb2d5
14 tf-opt                       0x0000556a2db7cfed
15 tf-opt                       0x0000556a2db7d60d
16 tf-opt                       0x0000556a2db7d776
17 tf-opt                       0x0000556a2df55aff
18 tf-opt                       0x0000556a2db78ad6
19 tf-opt                       0x0000556a2db7bf1c
20 tf-opt                       0x0000556a1fc8669f
21 libc.so.6                    0x00007fa29a913d90
22 libc.so.6                    0x00007fa29a913e40 __libc_start_main + 128
23 tf-opt                       0x0000556a1fc74de5

tf.mlir:5:5: note: see current operation: "func.return"(%6) : (tensor<1x112x112x3xf32>) -> ()

Is it possible that you run an old version of MLIR?

I thought we removed that error message:

tf.mlir:5:5: error: operand #0 may return/yield a new buffer allocation

I can’t find it when grepping through the MLIR code base.

You can try this: -one-shot-bufferize="bufferize-function-boundaries allow-return-allocs" (Or if that was the wrong flag, find the place in the code base where this error is thrown. There should be an option to allow buffers to be returned.)
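For context, a hedged sketch of why the error fires: after bufferization with function boundaries, a function that returns a freshly computed tensor ends up returning a new allocation as a memref, roughly like this (a hypothetical minimal example, not your actual IR):

func.func @returns_alloc() -> memref<1x112x112x3xf32> {
  %alloc = memref.alloc() : memref<1x112x112x3xf32>
  return %alloc : memref<1x112x112x3xf32>
}

Some versions of one-shot bufferization reject this pattern (since the caller becomes responsible for freeing the buffer) unless returning allocations is explicitly permitted, which is what the allow-return-allocs option controlled.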
