InstCombine question on combineLoadToOperationType

Hello,

Context: We have a backend where v32i1 is a Legal type, but the storage for v32i1 is not 32-bits/uses a different instruction sequence.

We ran into an issue because combineLoadToOperationType changed v32i1 loads into i32 loads, so a sequence like:

define void @bits(<32 x i1>* %A, <32 x i1>* %B) {
  %a = load <32 x i1>, <32 x i1>* %A
  store <32 x i1> %a, <32 x i1>* %B
  ret void
}

is transformed to:

define void @bits(<32 x i1>* %A, <32 x i1>* %B) {
  %1 = bitcast <32 x i1>* %A to i32*
  %a1 = load i32, i32* %1, align 4
  %2 = bitcast <32 x i1>* %B to i32*
  store i32 %a1, i32* %2, align 4
  ret void
}

This looks to be intentional.

Is there a way to specify in the data-layout that v32i1 storage is not 32-bits?

Absent that, is there any other reliable way to retain the original vector loads/stores without just disabling this part of InstCombine?

Or is it the backend’s responsibility to try and work with this?

Thanks!

Pete

No, not at the moment. You could propose something, but you'd probably have a hard time convincing anyone it's necessary; nobody has cared about this for a very long time.

No, and you'll run into other problems (e.g. alias analysis) if the data layout lies about the size of a load or store.

Where are these loads coming from? x86 without AVX512 doesn't have any convenient way to generate code for a <32 x i1> store, but it doesn't matter because frontends don't generate <N x i1> loads and stores.

If you have a frontend which is generating loads and stores like this, you could probably change it to use some other sequence (like a platform-specific intrinsic, or some sequence involving sext/trunc).

-Eli

Hello,

Context: We have a backend where v32i1 is a Legal type, but the storage for v32i1 is not 32-bits/uses a different instruction sequence.
We ran into an issue because combineLoadToOperationType changed v32i1 loads into i32 loads, so a sequence like:
define void @bits(<32 x i1>* %A, <32 x i1>* %B) {
  %a = load <32 x i1>, <32 x i1>* %A
  store <32 x i1> %a, <32 x i1>* %B
  ret void
}

is transformed to:
define void @bits(<32 x i1>* %A, <32 x i1>* %B) {
  %1 = bitcast <32 x i1>* %A to i32*
  %a1 = load i32, i32* %1, align 4
  %2 = bitcast <32 x i1>* %B to i32*
  store i32 %a1, i32* %2, align 4
  ret void
}

This looks to be intentional.
Is there a way to specify in the data-layout that v32i1 storage is not 32-bits?

No, not at the moment. You could propose something, but you'd probably have a hard time convincing anyone it's necessary; nobody has cared about this for a very long time.

Absent that, is there any other reliable way to retain the original vector loads/stores without just disabling this part of InstCombine?

No, and you'll run into other problems (e.g. alias analysis) if the data layout lies about the size of a load or store.

Or is it the backend’s responsibility to try and work with this?

Where are these loads coming from? x86 without AVX512 doesn't have any convenient way to generate code for a <32 x i1> store, but it doesn't matter because frontends don't generate <N x i1> loads and stores.

If you have a frontend which is generating loads and stores like this, you could probably change it to use some other sequence (like a platform-specific intrinsic, or some sequence involving sext/trunc).
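
For illustration, the intrinsic flavor of this suggestion could look roughly like the sketch below. The @mytarget_* declarations are placeholders for whatever the target actually provides; they are not existing LLVM intrinsics.

; Placeholder declarations standing in for target-specific mask load/store
; operations; the real names and signatures would come from the backend.
declare <32 x i1> @mytarget_load_v32i1(<32 x i1>*)
declare void @mytarget_store_v32i1(<32 x i1>, <32 x i1>*)

define void @bits_intrin(<32 x i1>* %A, <32 x i1>* %B) {
  %a = call <32 x i1> @mytarget_load_v32i1(<32 x i1>* %A)
  call void @mytarget_store_v32i1(<32 x i1> %a, <32 x i1>* %B)
  ret void
}

Since InstCombine no longer sees a plain load/store of <32 x i1>, there is nothing for combineLoadToOperationType to rewrite; the trade-off, raised further down the thread, is that the rest of the optimizer can no longer treat these as ordinary memory accesses.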

We do have a frontend that can generate <32 x i1> loads/stores, though it is rare for these to be inst-combined into i32 loads/stores as above (those were only illustrative examples).
I'm trying to decide on the best way to remedy this, and this info and these suggestions help.
Thanks!

Pete

Why not just generate the code with the proper storage? If <32 x i1> values are used where the storage is <32 x i8> (for example), it seems like a bad idea to lie to the IR and hide it behind a platform-specific intrinsic, right? I fear this would cause other problems down the line in the optimizer.
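
For completeness, a minimal sketch of what "proper storage" might look like, assuming the in-memory form is chosen to be <32 x i8>; the widened type and the function name are only illustrative:

; Illustrative only: memory holds <32 x i8>; the <32 x i1> mask exists
; only in registers, recovered with trunc and widened again with sext.
define void @bits_widened(<32 x i8>* %A, <32 x i8>* %B) {
  %w = load <32 x i8>, <32 x i8>* %A
  %a = trunc <32 x i8> %w to <32 x i1>   ; recover the mask (low bit of each lane)
  %s = sext <32 x i1> %a to <32 x i8>    ; widen before storing (0 or -1 per lane)
  store <32 x i8> %s, <32 x i8>* %B
  ret void
}

Because the loaded and stored type matches what is actually in memory, the data layout, alias analysis, and InstCombine all see an accurate picture.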