Scalable vector support for shuffles

Hi everyone!

I have been interested in extending support for scalable vectors to improve code generation. LLVM currently supports scalable vectorization for the ARM SVE and RVV backends. One area of particular interest to me is shuffle instructions. LLVM represents a shuffle mask as an array of integers and leaves it to the backend to choose the optimal instructions for it. Since the vector length of a scalable vector is unknown at compile time, the current solution for representing shuffles is to add special intrinsics for specific patterns. One of the earlier proposals on this topic: [llvm-dev] [RFC] Extending shufflevector for vscale vectors (SVE etc.).
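
To make the limitation concrete, here is a small Python model of fixed-length shuffle semantics. It is only an illustration, not LLVM code: the function name `shufflevector` mirrors the IR instruction, but the helper itself is hypothetical. Each result lane picks one element from the concatenation of the two inputs, so the mask needs exactly one integer per lane.

```python
def shufflevector(a, b, mask):
    """Toy model of a fixed-length shuffle: lane i of the result is
    element mask[i] of the concatenation a + b."""
    assert len(a) == len(b), "both inputs must have the same length"
    concat = list(a) + list(b)
    return [concat[m] for m in mask]


a = [10, 11, 12, 13]
b = [20, 21, 22, 23]

# Splice-like pattern: the last three lanes of a, then the first lane of b.
splice = shufflevector(a, b, [1, 2, 3, 4])

# Reverse pattern: the lanes of a in reverse order.
rev = shufflevector(a, b, [3, 2, 1, 0])
```

Writing `[1, 2, 3, 4]` is only possible because the lane count (4) is known up front. For a scalable vector the lane count is a multiple of a runtime value, so a mask of this kind cannot be enumerated at compile time.
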
I was interested in finding an alternative representation that would be more LLVM IR-like: a pattern-independent representation. For initial prototyping, I tried representing some of the patterns that appear in the auto-vectorization pipeline using “variable” masks, and added a backend pass that “unravels” the mask to find backend-specific patterns and their corresponding instructions. To summarize some of my instruction selection efforts, all of which follow the same overarching theme of pattern recognition in the backend for scalable vectors:

  1. Extended strided load detection in gather-scatter instructions for scalable vectors in the RISC-V backend.
  2. Used variable masks in shuffle instructions, detecting splice-like patterns to emit vslide{down/up} in RISC-V and detecting special cases with fixed “immediate” values for instructions like vslide1{down/up}.
  3. Represented reduction expansions (those that cannot be directly lowered by the backend) for scalable vectors using similar variable shuffle masks.
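
As a rough sketch of the idea behind these items, the Python below models a “variable” mask as a function of the lane index rather than as a list of integers, and classifies it without knowing the vector length. Everything here is an assumption made for illustration: `AffineMask`, `classify`, and the returned labels are hypothetical names, not part of LLVM or of the prototype, and restricting masks to the affine form `stride * i + offset` is a simplification.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class AffineMask:
    """Hypothetical variable mask: lane i reads element
    stride * i + offset of the concatenated inputs."""
    stride: int
    offset: int

    def expand(self, num_lanes):
        """Materialize the mask for one concrete vector length."""
        return [self.stride * i + self.offset for i in range(num_lanes)]


def classify(mask):
    """Pick an illustrative lowering label from the mask's shape alone.

    The vector length is never consulted, so the same decision holds
    for every runtime vector length.
    """
    if mask.stride == 0:
        return "splat"            # every lane reads the same element
    if mask.stride == 1 and mask.offset == 0:
        return "identity"         # result is the first input unchanged
    if mask.stride == 1 and mask.offset == 1:
        return "slide-by-one"     # special case with a fixed offset of 1
    if mask.stride == 1 and mask.offset > 1:
        return "slide"            # splice-like pattern
    return "generic-gather"       # no cheaper pattern recognized


splice = AffineMask(stride=1, offset=3)
for num_lanes in (4, 8):
    print(num_lanes, splice.expand(num_lanes), classify(splice))
```

The expanded masks differ for 4 and 8 lanes, but the classification does not, which is the property a backend pass would rely on when mapping such a mask to a slide-style instruction.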

Are folks interested in discussing such a representation, which would allow LLVM IR to be more flexible with optimizations and place the onus of specific instruction selection on the backend? I’d be very excited to discuss any of this in a future meeting.