Status of llvm.experimental.vector.reduce.* intrinsics

Hi,

I was wondering about the status of the
llvm.experimental.vector.reduce.* intrinsics. Do all back-ends
support those intrinsics, or are they still in a very "experimental"
state?

Thanks,
Michael

Hi Michael,

The intrinsics are still technically in an experimental state, as we need to have a further discussion to build consensus before marking them as fully supported.

The AArch64 backend has been using them for all natively supported vector reductions for a few months now, with no issues as far as I’m aware. There are some rough edges which need some further work. For example, we’re currently relying on a TTI hook to determine, based on the reduction type, whether we create an intrinsic call or degenerate into a shufflevector sequence. This was intended as a transitional stage. To mark the intrinsics as being first class operations we probably need to add support in codegen to expand the VECREDUCE_* nodes into the shufflevector reduction pattern, so that targets can generate the intrinsics in all cases without having to rely on TTI.

Amara

Hi Amara,

thank you for the clarification. I tested the intrinsics on x86_64 and
they seemed to work pretty well. Looking forward to trying these
intrinsics with the AArch64 backend. Maybe I will find the time to look
into codegen to get these intrinsics out of the experimental stage. They
seem pretty useful.

Cheers,
Michael

In addition to Amara's point, it'd be good to have it working and
default for other architectures before we can move out of experimental
if we indeed intend to make it non-arch-specific (which we do).

So, if you could share your code for the x86 port, that'd be great.
But if you could help with the final touches on the code-gen part,
that'd be awesome.

cheers,
--renato

Hi Renato,

just to make it clear, I didn't implement reductions on x86_64; they just
worked when I tried to lower an
llvm.experimental.vector.reduce.or.i1.v8i1 intrinsic. A shuffle pattern
is generated for the intrinsic.

  vpshufd $78, %xmm0, %xmm1 # xmm1 = xmm0[2,3,0,1]
  vpor %xmm1, %xmm0, %xmm0
  vpshufd $229, %xmm0, %xmm1 # xmm1 = xmm0[1,1,2,3]
  vpor %xmm1, %xmm0, %xmm0
  vpsrld $16, %xmm0, %xmm1
  vpor %xmm1, %xmm0, %xmm0
  vpextrb $0, %xmm0, %eax
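
As a rough illustration of what the sequence above computes, here is a minimal C++ sketch of an OR reduction done by repeatedly folding the upper half of the vector onto the lower half. It is a scalar model only, not LLVM code: the function name reduceOrByHalving is hypothetical, and modelling the eight i1 lanes as 16-bit values is an assumption based on the 128-bit register and the 16-bit shift in the listing.

```cpp
#include <array>
#include <cstddef>
#include <cstdint>

// Hypothetical scalar model of a shuffle-based OR reduction.
// Assumption: the eight i1 lanes have been widened to 16-bit lanes.
inline bool reduceOrByHalving(std::array<std::uint16_t, 8> v) {
  // Each step ORs the upper half of the live lanes into the lower half,
  // mirroring one shuffle + or pair: 8 -> 4 -> 2 -> 1 live lanes.
  for (std::size_t width = v.size() / 2; width >= 1; width /= 2) {
    for (std::size_t i = 0; i < width; ++i)
      v[i] |= v[i + width];
  }
  // Lane 0 now holds the OR of all lanes; only bit 0 matters for i1.
  return (v[0] & 1u) != 0;
}
```

Three folding steps are enough for eight lanes, which matches the three vpor instructions before the final extract.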

However, on AArch64 I encountered an unreachable where codegen does not
know how to promote the i1 type. Since I am more familiar with the
mid-level, I will have to start digging into codegen. Any hints on where
to start would be awesome.

Cheers,
Michael

Can you tell us what you’re looking to do with the intrinsics?

On all non-AArch64 targets the ExpandReductions pass will convert them to the shuffle pattern, as you’re seeing. That pass was written to allow experimentation with the effects of using reduction intrinsics at the IR level only, hence we convert into the shuffles very late in the pass pipeline.

Since we haven’t seen any adverse effects of representing the reductions as intrinsics at the IR level, I think in that respect the intrinsics have probably proven themselves to be stable. However, the error you’re seeing is because the AArch64 backend still expects to deal only with intrinsics it can natively support, and i1 is not a natively supported type for reductions. See the code in AArch64TargetTransformInfo.cpp:useReductionIntrinsic() for where we decide which reduction types we can support.

For these cases, we need to implement more generic legalization support in order to either promote to a legal type, or in cases where the target cannot support it as a native operation at all, to expand it to a shuffle pattern as a fallback. Once we have all that in place, I think we’re in a strong position to move to the intrinsic form as the canonical representation.
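
The three outcomes described here (use the native operation, promote to a legal type, or expand to shuffles) can be pictured with a small standalone model. This is only a sketch under stated assumptions: ReductionAction, TargetCaps and chooseAction are hypothetical names invented for illustration and do not correspond to any LLVM API, and a real legalizer would consider far more than the element width.

```cpp
// Hypothetical model of the legalization choice; not LLVM API.
enum class ReductionAction { Native, Promote, Expand };

// Hypothetical description of what a target can reduce natively.
struct TargetCaps {
  bool hasNativeReduction;     // target has any reduction instruction
  unsigned minNativeElemBits;  // narrowest natively reducible element
  unsigned maxNativeElemBits;  // widest natively reducible element
};

inline ReductionAction chooseAction(unsigned elemBits, const TargetCaps &caps) {
  // No native support at all: fall back to the shuffle pattern.
  if (!caps.hasNativeReduction)
    return ReductionAction::Expand;
  // Element type is directly supported.
  if (elemBits >= caps.minNativeElemBits && elemBits <= caps.maxNativeElemBits)
    return ReductionAction::Native;
  // Too narrow (for example i1): widen to a legal element type first.
  if (elemBits < caps.minNativeElemBits)
    return ReductionAction::Promote;
  // Too wide: this simplified model just expands.
  return ReductionAction::Expand;
}
```

In this model an i1 reduction on a target with native 8 to 64 bit reductions would be promoted rather than hitting an unreachable.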

FYI, one of the motivating reasons for introducing these was to allow non-power-of-2 vector architectures like SVE to express reduction operations.

Amara