I am trying to improve the codegen for `@llvm.vector.reduce.{and,or,xor}` on AArch64 and have modified the lowerings for `ISD::VECREDUCE_{AND,OR,XOR}` to do so. Unfortunately, this broke an existing special case for `<N x i1>` vectors, which replaces `vecreduce_or <N x i1>` with `vecreduce_max <N x i1>`; the latter lowers to a single instruction on AArch64. If I were able to check in my custom lowering whether the element type was `i1`, I could recreate this, but it seems that the vecreduce lowerings receive the types after they have already been legalized, so `getVectorElementType()` returns `i8`. Is there any way to recover that the vector element type was originally `i1`, so I can recreate the existing optimization in my custom lowerings?
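As a quick sanity check of the special case I am trying to recreate (plain Python, not AArch64 code): for `i1` lanes, whose values are only 0 or 1, an OR reduction and a max reduction always agree, which is why the `vecreduce_or <N x i1>` can be rewritten as the single-instruction max reduction in the first place.

```python
# Exhaustively check that OR-reduction == max-reduction over 0/1 lanes,
# i.e. the identity behind rewriting vecreduce_or <N x i1> as a max.
from itertools import product

for lanes in product((0, 1), repeat=8):  # all 256 possible <8 x i1> vectors
    or_red = 0
    for v in lanes:
        or_red |= v
    assert or_red == max(lanes)
print("OR reduction matches max reduction for all i1 vectors")
```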
Never mind, I think the issue is something else. I believe LLVM is trying to optimize the code by replacing this:
```
%b = trunc <16 x i8> %a to <16 x i1>
%c = vecreduce_or <16 x i1> %b
%d = zext i1 %c to i16
```
with this:
```
%b = vecreduce_or <16 x i8> %a
%c = and i8 %b, 1
%d = zext i8 %c to i16
```
At least, that is what I gather from reading the debug logs. Is there any way to tell LLVM not to promote the operand of this node, or otherwise keep the `i1` reduction intact?
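For what it is worth, the combine itself appears to be value-preserving, which is presumably why LLVM feels free to do it: OR is bitwise, so bit 0 of the OR over the full `i8` lanes equals the OR of each lane's bit 0 (i.e. of each lane truncated to `i1`). A small plain-Python check of that identity (not DAG code):

```python
# Check that OR-reducing the trunc-to-i1 lanes equals OR-reducing the
# full i8 lanes and then masking with `and i8 %b, 1`.
import random

def or_of_truncated_lanes(lanes):
    # models: trunc <16 x i8> to <16 x i1>, then vecreduce_or
    acc = 0
    for v in lanes:
        acc |= v & 1
    return acc

def masked_or_reduction(lanes):
    # models: vecreduce_or over the i8 lanes, then `and i8 %b, 1`
    acc = 0
    for v in lanes:
        acc |= v
    return acc & 1

random.seed(0)
for _ in range(1000):
    lanes = [random.randrange(256) for _ in range(16)]
    assert or_of_truncated_lanes(lanes) == masked_or_reduction(lanes)
print("combine is value-preserving")
```

So the transform is correct as a scalar rewrite; the problem is only that it destroys the `<N x i1>` shape that the single-instruction AArch64 lowering relies on.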