Can't create op with replaced op in dialect conversion

I’m working on packing and emulating i1 in SPIR-V. I have a patch to work, it is not landed yet. ⚙ D99724 [mlir][StandardToSPIRV] Handle i1 case for lowering memref.load/store op

It basically treats i1 like i8. With the patch.

I can lower both i1 and i8 cases, and the IRs are identical.

func @load_i1(%arg0: memref<i1>) {
  %0 = memref.load %arg0[] : memref<i1>
  return
}

func @load_i8(%arg0: memref<i8>) {
  %0 = memref.load %arg0[] : memref<i8>
  return
}

to

module attributes {spv.target_env = #spv.target_env<#spv.vce<v1.0, [Int16, StorageBuffer16BitAccess, Shader], [SPV_KHR_storage_buffer_storage_class, SPV_KHR_16bit_storage]>, {}>}  {
  spv.func @load_i8(%arg0: !spv.ptr<!spv.struct<(!spv.array<1 x i32, stride=4> [0])>, StorageBuffer>) "None" {
    %0 = spv.Constant 0 : i32
    %1 = spv.Constant 4 : i32
    %2 = spv.SDiv %0, %1 : i32
    %3 = spv.AccessChain %arg0[%0, %2] : !spv.ptr<!spv.struct<(!spv.array<1 x i32, stride=4> [0])>, StorageBuffer>, i32, i32
    %4 = spv.Load "StorageBuffer" %3 : i32
    %5 = spv.Constant 4 : i32
    %6 = spv.Constant 8 : i32
    %7 = spv.UMod %0, %5 : i32
    %8 = spv.IMul %7, %6 : i32
    %9 = spv.ShiftRightArithmetic %4, %8 : i32, i32
    %10 = spv.Constant 255 : i32
    %11 = spv.BitwiseAnd %9, %10 : i32
    %12 = spv.Constant 24 : i32
    %13 = spv.ShiftLeftLogical %11, %12 : i32, i32
    %14 = spv.ShiftRightArithmetic %13, %12 : i32, i32
    spv.Return
  }
}

They generate the identical IRs. However, I hit an issue when I returned the value.

module attributes {
  spv.target_env = #spv.target_env<
    #spv.vce<v1.0, [Int16, StorageBuffer16BitAccess, Shader],
    [SPV_KHR_storage_buffer_storage_class, SPV_KHR_16bit_storage]>, {}>
} {

func @load_i8(%arg0: memref<i8>) -> i8 {
  %0 = memref.load %arg0[] : memref<i8>
  return %0 : i8
}

}

The above one can be lowered to SPIR-V, but the below one can not

module attributes {
  spv.target_env = #spv.target_env<
    #spv.vce<v1.0, [Int16, StorageBuffer16BitAccess, Shader],
    [SPV_KHR_storage_buffer_storage_class, SPV_KHR_16bit_storage]>, {}>
} {

func @load_i1(%arg0: memref<i1>) -> i1 {
  %0 = memref.load %arg0[] : memref<i1>
  return %0 : i1
}

}

The i1 case only works if it does not have users.

I dumped operands[0] in ReturnOpPattern. i8 case showed the replaced op %15 = spv.ShiftRightArithmetic %14, %13 : i32, i32, and i1 case showed the old one %16 = memref.load <<UNKNOWN SSA VALUE>>[] : memref<i1>. It is weird because it said that the pattern applied successfully, but the users still get the old operand.

I also verified that i1 case and i8 case generate the same log with -debug and they are all marked legal and success.

It looks like a memref.load was created because the variable name is %16. Are there more debugging tips? Is it a bug in dialect conversion?

To repro, run mlir-opt -convert-std-to-spirv a.mlir

full log for i1: gist:8a4494f337629299959b6e7d59a882f5 · GitHub
full log for i8: gist:56a8fcdd6a8eb25a743a71667754e6e4 · GitHub

You don’t mention what you mean by “only works” - what output/behavior do you observe when it doesn’t? And what is the output for the i8 case? You likely have a bug/memory error in both cases. Did you use a IR manipulation method outside of the conversion rewriter API? This looks like an issue with BlockArgument replacements - there was also a bug deep inside dialect conversion for one of these scenarios (over six months ago), and I hope you aren’t using an older version.

Here’s how your IR looks like before deleting the ops

i8:

module attributes {spv.target_env = #spv.target_env<#spv.vce<v1.0, [Int16, StorageBuffer16BitAccess, Shader], [SPV_KHR_storage_buffer_storage_class, SPV_KHR_16bit_storage]>, {}>}  {
  spv.func @load_i8(%arg0: !spv.ptr<!spv.struct<(!spv.array<1 x i32, stride=4> [0])>, StorageBuffer>) -> i32 "None" {
    %0 = spv.Constant 0 : i32
    %1 = spv.AccessChain %arg0[%0, %0] : !spv.ptr<!spv.struct<(!spv.array<1 x i32, stride=4> [0])>, StorageBuffer>, i32, i32
    %2 = spv.Constant 4 : i32
    %3 = spv.SDiv %0, %2 : i32
    %4 = spv.AccessChain %arg0[%0, %3] : !spv.ptr<!spv.struct<(!spv.array<1 x i32, stride=4> [0])>, StorageBuffer>, i32, i32
    %5 = spv.Load "StorageBuffer" %4 : i32
    %6 = spv.Constant 4 : i32
    %7 = spv.Constant 8 : i32
    %8 = spv.UMod %0, %6 : i32
    %9 = spv.IMul %8, %7 : i32
    %10 = spv.ShiftRightArithmetic %5, %9 : i32, i32
    %11 = spv.Constant 255 : i32
    %12 = spv.BitwiseAnd %10, %11 : i32
    %13 = spv.Constant 24 : i32
    %14 = spv.ShiftLeftLogical %12, %13 : i32, i32
    %15 = spv.ShiftRightArithmetic %14, %13 : i32, i32
    %16 = memref.load <<UNKNOWN SSA VALUE>>[] : memref<i8>  // deleted
    spv.ReturnValue %15 : i32
    return %16 : i8  // deleted
  }
  func @load_i8(memref<i8>) -> i8
}

i1:

module attributes {spv.target_env = #spv.target_env<#spv.vce<v1.0, [Int16, StorageBuffer16BitAccess, Shader], [SPV_KHR_storage_buffer_storage_class, SPV_KHR_16bit_storage]>, {}>}  {
  spv.func @load_i1(%arg0: !spv.ptr<!spv.struct<(!spv.array<1 x i32, stride=4> [0])>, StorageBuffer>) -> i1 "None" {
    %0 = spv.Constant 0 : i32
    %1 = spv.AccessChain %arg0[%0, %0] : !spv.ptr<!spv.struct<(!spv.array<1 x i32, stride=4> [0])>, StorageBuffer>, i32, i32
    %2 = spv.Constant 4 : i32
    %3 = spv.SDiv %0, %2 : i32
    %4 = spv.AccessChain %arg0[%0, %3] : !spv.ptr<!spv.struct<(!spv.array<1 x i32, stride=4> [0])>, StorageBuffer>, i32, i32
    %5 = spv.Load "StorageBuffer" %4 : i32
    %6 = spv.Constant 4 : i32
    %7 = spv.Constant 8 : i32
    %8 = spv.UMod %0, %6 : i32
    %9 = spv.IMul %8, %7 : i32
    %10 = spv.ShiftRightArithmetic %5, %9 : i32, i32
    %11 = spv.Constant 255 : i32
    %12 = spv.BitwiseAnd %10, %11 : i32
    %13 = spv.Constant 24 : i32
    %14 = spv.ShiftLeftLogical %12, %13 : i32, i32
    %15 = spv.ShiftRightArithmetic %14, %13 : i32, i32
    %16 = memref.load <<UNKNOWN SSA VALUE>>[] : memref<i1> // deleted
    spv.ReturnValue %16 : i1 
    return %16 : i1  // deleted
  }
  func @load_i1(memref<i1>) -> i1
}

Notice the difference in arguments of spv.ReturnValue, the i1 case uses %16 instead of %15, which makes it a live user of the memref.load. That triggers source materialization, which fails and thus fails the entire conversion process with the message you see (I find it generally useful to work back from the error message that can be found somewhere in the source code).

I tracked this further down to operand selection code here llvm-project/DialectConversion.cpp at b7ef804807855e607da3eba221c1fc59e27f778e · llvm/llvm-project · GitHub. In the case of i8, it looks for a value of type i32 in the remapping chain and finds the new value. In the case of i1 however, it looks for a value of type i1 (sic!), so the new value is not acceptable and it chooses the original value instead.

Digging deeper, the type converter associated with the pattern does not convert i1 to i32 but it does convert i8 to i32. You can also observe this is in the logs you provided: there is i8 converted to 32-bit for SPIR-V but no sign of an equivalent message for i1.

I suppose this is because i1 is considered valid by the converted given SPIR-V capabilities, but you need to figure out how to fix that.

2 Likes

This is really helpful, thank you so much! We suspected that this is SPIR-V conversion specific issue when I talked to @MaheshRavishankar and @ThomasRaoux . I planned to dig into it more deeply, but I did not get a chance. I think i1 is tricky in SPIR-V, and I did not handle it well. I will talk to @antiagainst and figure out how to support it correctly in SPIR-V. Thanks again for helping on this!

1 Like

Sorry I was out of office for a few days. Thanks @hanchung for looking into this and thanks @ftynse for the great analysis!

Yes as the code quoted by @hanchung shows, boolean types is just quite special in SPIR-V: there is no defined bit width for it. Normally they should just be internal to the kernels and not be exposed at the interface level. Otherwise they need to be emulated. On graphics side there are layout specifiers like std430/etc. to control how they should be packed but here we are coming from high level ML programming models we don’t have that. The emulation haven’t been done for boolean types properly in the type converter so that’s why we are seeing the strange behavior. In general this emulation should also be controlled explicitly given they matters for ABI. Right now we just unconditionally turns them on. This should be changed. I’ll do a pass to fix these issues.

I think something is hidden here: llvm-project/SPIRVConversion.cpp at e81b3401177a67481605447ea5250d8345cb4341 · llvm/llvm-project · GitHub

There are no capability and extension for i1, so it will always return i1.

I think we only want the emulation (like use i32 type instead) when storing/loading i1. They should be i1 type in kernels, e.g., if there is a branch which takes i1 as operand, we can not unconditionally convert it to non-i1 types. I already hit issues in SCFToSPIRV tests: error: 'spv.BranchConditional' op operand #0 must be bool, but got 'i32'.

In this case, we sometimes want i1 and sometimes not. It depends on the use case in operations, and this is painful. It also looks impossible to me to implement such thing in dialect conversion.

IMO, we should not propagate i1 issues to this low level. It’s better and easier to address it maybe in std dialect.

This was coming from TOSA tests in IREE. On IREE side, I think we should avoid load from or store to i1 buffers atm.

Yup. There is no capabilities for i1 in SPIR-V. It’s actually not expected to appear in interface storage classes where we need to have ABI match between the kernel and the runtime. For compute it’s available everywhere so no need for special capabilities.

Generally in SPIR-V, for a primitive type, there can be two categories of capabilities involved: one category for storage and one category for compute. And for the first class, depending on the storage class, the capability can be different. For example, for i8, if it appears in the PushConstant/StorageBuffer/Uniform/etc. storage class it will need the StoragePushConstant8/StorageBuffer8BitAccess/UniformAndStorageBuffer8BitAccess/etc. capability. For compute we just need Int8. Similarly for other non-32-bit primitive types. All the details can be found here.

Yeah we shouldn’t covert all i1 unconditionally. I think what we want here is just to treat memref<*xi1, N> specially in the type converter, with proper options to control, and the load/store ops for it. The rest should be kept untouched. This is basically a data layout issue. @ftynse recently landed a bunch of infrastructure support for it. I’ll look into it to hook up with SPIR-V dialect. For now you can add options to the type converter where suitable.

Thanks for the help! I forgot to truncate the bits so I still hit the issues. Now it’s fixed.