Do not erase identity generic op with library_call field

Hi, community!

Recently, I have been working on calling external library functions and user-defined functions from Linalg IR, and I found the library_call field on Linalg's generic operation. Code like the following is generated automatically:

module attributes {torch.debug_module_name = "ExternCallModule"}  {
  func @forward(%arg0: tensor<?x?xf32>, %arg1: tensor<?x?xf32>) -> tensor<?x?xf32> {
    %c1 = arith.constant 1 : index
    %c0 = arith.constant 0 : index
    %0 = tensor.dim %arg0, %c0 : tensor<?x?xf32>
    %1 = tensor.dim %arg0, %c1 : tensor<?x?xf32>
    %2 = linalg.init_tensor [%0, %1] : tensor<?x?xf32>
    %3 = linalg.generic {indexing_maps = [affine_map<(d0, d1) -> (d0, d1)>, affine_map<(d0, d1) -> (d0, d1)>, affine_map<(d0, d1) -> (d0, d1)>], iterator_types = ["parallel", "parallel"], library_call = "external_function_2xf32"} ins(%arg0, %arg1 : tensor<?x?xf32>, tensor<?x?xf32>) outs(%2 : tensor<?x?xf32>) {
    ^bb0(%arg2: f32, %arg3: f32, %arg4: f32):  // no predecessors
      linalg.yield %arg4 : f32
    } -> tensor<?x?xf32>
    return %3 : tensor<?x?xf32>
  }
}

As you can see, there is only a yield in the linalg.generic body, so the op is erased during canonicalization:

module attributes {torch.debug_module_name = "ExternCallModule"}  {
  func @forward(%arg0: tensor<?x?xf32>, %arg1: tensor<?x?xf32>) -> tensor<?x?xf32> {
    %c1 = arith.constant 1 : index
    %c0 = arith.constant 0 : index
    %0 = tensor.dim %arg0, %c0 : tensor<?x?xf32>
    %1 = tensor.dim %arg0, %c1 : tensor<?x?xf32>
    %2 = linalg.init_tensor [%0, %1] : tensor<?x?xf32>
    return %2 : tensor<?x?xf32>
  }
}

I addressed this by editing the EraseIdentityGenericOp pattern in LinalgOps.cpp; the patch is: ⚙ D115871 Do not erase identity generic op with library_call field.
By the way, thank you in advance for your review, @gysit.
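
For reference, the guard might look roughly like this (a sketch only, not the literal D115871 diff; the getLibraryCallName accessor is assumed from the structured-op interface of that time):

```cpp
// Sketch: bail out of the identity-erasure canonicalization when the
// generic op advertises an external implementation via library_call.
LogicalResult matchAndRewrite(GenericOp genericOp,
                              PatternRewriter &rewriter) const override {
  // The trivial body is an intentional placeholder for the external
  // function, so the op must survive canonicalization.
  if (!genericOp.getLibraryCallName().empty())
    return failure();
  // ... existing identity-erasure logic ...
}
```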

Does this modification make sense? Any advice would be greatly appreciated!

Hi MuPei,

Can you explain what external_function_2xf32 is doing? Is it an empty function?

Usually, the body of a linalg.generic should always implement the same logic as the associated library_call. If the body is empty it should be safe to erase the operation even if the library_call attribute is set.

Hi gysit,

Can you explain what external_function_2xf32 is doing? Is it an empty function?
Usually, the body of a linalg.generic should always implement the same logic as the associated library_call. If the body is empty it should be safe to erase the operation even if the library_call attribute is set.

The problem I want to solve is letting MLIR provide an external interface without knowing the specific implementation of a function, so that the function can be implemented by users themselves (for example, in C).

This is a practical problem I encountered in my work. The specific situation is:
When the front end (such as torch-mlir) cannot support an operator, users should not need to modify the torch-mlir code to add support for it; instead, torch-mlir should automatically identify the unsupported operator and strip it out. MLIR can then provide an external function declaration so that users can supply the implementation themselves.

The whole plan is a fairly long story, and I would be happy to submit an RFC to share my solution with the community once it is complete.


I also think there is an alternative solution for this case: adding a new op to LinalgOps.td to represent the unknown implementation, which we could name UnknowImplOp. The Linalg IR could then be:

module attributes {torch.debug_module_name = "ExternCallModule"}  {
  func @forward(%arg0: tensor<?x?xf32>, %arg1: tensor<?x?xf32>) -> tensor<?x?xf32> {
    %c1 = arith.constant 1 : index
    %c0 = arith.constant 0 : index
    %0 = tensor.dim %arg0, %c0 : tensor<?x?xf32>
    %1 = tensor.dim %arg0, %c1 : tensor<?x?xf32>
    %2 = linalg.init_tensor [%0, %1] : tensor<?x?xf32>
    %3 = linalg.generic {indexing_maps = [affine_map<(d0, d1) -> (d0, d1)>, affine_map<(d0, d1) -> (d0, d1)>, affine_map<(d0, d1) -> (d0, d1)>], iterator_types = ["parallel", "parallel"], library_call = "external_function_2xf32"} ins(%arg0, %arg1 : tensor<?x?xf32>, tensor<?x?xf32>) outs(%2 : tensor<?x?xf32>) {
    ^bb0(%arg2: f32, %arg3: f32, %arg4: f32):  // no predecessors
      %4 = linalg.unknow_impl(%arg2, %arg3)
      linalg.yield %4 : f32
    } -> tensor<?x?xf32>
    return %3 : tensor<?x?xf32>
  }
}

What do you think about it? :thinking:

Thanks for the explanation,

I have a strong preference for your alternative solution, but instead of defining a new operation I would suggest using AssertOp. I think it should be a good fit here and prevent the canonicalization.

Also, did you consider generating a function call directly?


I have a strong preference for your alternative solution, but instead of defining a new operation I would suggest using AssertOp. I think it should be a good fit here and prevent the canonicalization.

Thanks for your advice! I guess I could assert that the inputs and outputs are not empty, or something like that.

Also, did you consider generating a function call directly?

Yes! I used the library_call field together with -convert-std-to-llvm='emit-c-wrappers=1' to generate a function call, so that users can implement their own functions.
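
For context, with emit-c-wrappers=1 the generated code calls a wrapper named _mlir_ciface_<library_call> that receives pointers to memref descriptors. A user-side C implementation might then look like the sketch below (the elementwise-add body is purely illustrative, and the rank-2 descriptor layout assumes MLIR's default convention):

```c
#include <stdint.h>

/* Rank-2 memref descriptor, assuming MLIR's default C ABI:
   allocated/aligned pointers, offset, then sizes and strides per dim. */
typedef struct {
  float *allocated;
  float *aligned;
  intptr_t offset;
  intptr_t sizes[2];
  intptr_t strides[2];
} MemRef2DF32;

/* Hypothetical user-side implementation of external_function_2xf32.
   The elementwise addition is an assumption for illustration; the real
   semantics are whatever the user wants the op to compute. */
void _mlir_ciface_external_function_2xf32(MemRef2DF32 *in0,
                                          MemRef2DF32 *in1,
                                          MemRef2DF32 *out) {
  for (intptr_t i = 0; i < out->sizes[0]; ++i)
    for (intptr_t j = 0; j < out->sizes[1]; ++j) {
      intptr_t a = in0->offset + i * in0->strides[0] + j * in0->strides[1];
      intptr_t b = in1->offset + i * in1->strides[0] + j * in1->strides[1];
      intptr_t c = out->offset + i * out->strides[0] + j * out->strides[1];
      out->aligned[c] = in0->aligned[a] + in1->aligned[b];
    }
}
```

After bufferization and lowering, linking an object file like this against the compiled module should resolve the symbol.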

Thanks for your advice! I guess I could assert that the inputs and outputs are not empty, or something like that.

Yes, I would probably just start with an assert false that is triggered when lowering to vectorized code or loops, which is the standard use case for Linalg.
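
Concretely, that placeholder body might look like this (a sketch; the exact spelling of the assert op depends on the MLIR version in use):

```mlir
%3 = linalg.generic {indexing_maps = [...], iterator_types = ["parallel", "parallel"],
                     library_call = "external_function_2xf32"}
    ins(%arg0, %arg1 : tensor<?x?xf32>, tensor<?x?xf32>)
    outs(%2 : tensor<?x?xf32>) {
^bb0(%arg2: f32, %arg3: f32, %arg4: f32):
  %false = arith.constant false
  // Fires if anything tries to lower this body instead of using the library call.
  assert %false, "external_function_2xf32 must be lowered to a library call"
  linalg.yield %arg4 : f32
} -> tensor<?x?xf32>
```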

Yes! I used the library_call field together with -convert-std-to-llvm='emit-c-wrappers=1' to generate a function call, so that users can implement their own functions.

Theoretically you could also generate the result of -convert-std-to-llvm directly by emitting an empty function in the first place. That way you can avoid Linalg completely.

Yes, I would probably just start with an assert false that is triggered when lowering to vectorized code or loops, which is the standard use case for Linalg.

Got it, thank you!

Theoretically you could also generate the result of -convert-std-to-llvm directly by emitting an empty function in the first place. That way you can avoid Linalg completely.

Yeah, that sounds cool. But I'm using the torch-mlir project to convert PyTorch code to Linalg IR automatically, so currently I find that Linalg might be necessary in this case.

Anyway, thanks for showing me the direct conversion from std to LLVM. I'd like to try how it works. :grin:

+1, it depends on how much codegen behavior you want on these ops; basically:

  • if you just want to call an external library, a simple call op should suffice.
  • if you want to have an op that can both codegen and escape to a library in different scenarios then we may want to consider a specific opaque library op.
  • if you want to specify specific tiling behaviors with an interface and be able to tile, fuse, pad etc and then call a library, then linalg.generic makes the most sense.
  • if you want to codegen an op that has special internal behavior (e.g. internally call some other operation that manipulates some global state and uses atomic operations), you could have a library call within the linalg.generic body (although this latter use case has not yet been pushed on seriously).
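
For the last bullet, combining both mechanisms might look roughly like this (a sketch; the scalar helper @external_function_elem is hypothetical):

```mlir
%3 = linalg.generic {indexing_maps = [...], iterator_types = ["parallel", "parallel"],
                     library_call = "external_function_2xf32"}
    ins(%arg0, %arg1 : tensor<?x?xf32>, tensor<?x?xf32>)
    outs(%2 : tensor<?x?xf32>) {
^bb0(%arg2: f32, %arg3: f32, %arg4: f32):
  // Per-element call into a user-provided scalar function (hypothetical).
  %4 = call @external_function_elem(%arg2, %arg3) : (f32, f32) -> f32
  linalg.yield %4 : f32
} -> tensor<?x?xf32>
```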

+1 on this; it seems much cleaner than an assert: an assert can be leveraged by the optimizer to simplify the code path (in a block that ends with an assert(false), any non-side-effecting operation leading to the assert could be removed).

Thanks for your advice.

  • if you want to have an op that can both codegen and escape to a library in different scenarios then we may want to consider a specific opaque library op.

Sorry, I didn't quite get that. Could you please give me an example?

you could have a library call within the linalg.generic body

Did you mean having a std.call in the linalg.generic body region, or just using the library_call field?

Both at the same time actually :slight_smile:


Yeah, I got it! Thank you~

I’ve tried two solutions for this problem:

  1. create an assert op before the linalg.yield op, which works quite well
  2. create a call op like:
     b.create<mlir::CallOp>(loc, libraryCallAttr.getValue(), TypeRange({}), inputs);
    
    but I got this error: error: 'std.call' op 'external_function_2xf32' does not reference a valid function.
    The documentation says "The call operation represents a direct call to a function that is within the same symbol scope as the call.", but external_function_2xf32 needs to be defined by users (I mean, people who use our compiler).

So could you please give me some advice about the second solution?

You probably need to declare the function in your module (a declaration without a body is enough).

The following example mlir file defines and calls print_memref_f32:
https://github.com/llvm/llvm-project/blob/d20249fde649a3a490618232bb48a1c701d35f03/mlir/test/mlir-cpu-runner/memref-reshape.mlir#L7

One thing that may be tricky is bufferization (the conversion from tensor to memref types). I expect you have to define external_function_2xf32 using tensors, and bufferization then converts the function to take memref parameters.
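
Applied to the example above, the declaration plus call might look like this (a sketch on tensors, before bufferization):

```mlir
// Body-less private declaration; the symbol is provided by the user at link time.
func private @external_function_2xf32(tensor<?x?xf32>, tensor<?x?xf32>, tensor<?x?xf32>) -> tensor<?x?xf32>

func @forward(%arg0: tensor<?x?xf32>, %arg1: tensor<?x?xf32>) -> tensor<?x?xf32> {
  %c0 = arith.constant 0 : index
  %c1 = arith.constant 1 : index
  %0 = tensor.dim %arg0, %c0 : tensor<?x?xf32>
  %1 = tensor.dim %arg0, %c1 : tensor<?x?xf32>
  %2 = linalg.init_tensor [%0, %1] : tensor<?x?xf32>
  %3 = call @external_function_2xf32(%arg0, %arg1, %2)
      : (tensor<?x?xf32>, tensor<?x?xf32>, tensor<?x?xf32>) -> tensor<?x?xf32>
  return %3 : tensor<?x?xf32>
}
```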


Brilliant! Thank you!

Sorry for the slow response, I am in half-vacation mode.
You may have found that the lowering of a generic op to std introduces such declarations automatically
if needed: https://github.com/llvm/llvm-project/blob/main/mlir/lib/Conversion/LinalgToStandard/LinalgToStandard.cpp#L43

It would be useful to turn the pass I pointed at into a rank-erased implementation.

Thank you!