[SVE][AArch64] Codegen for a scalable vector splat

Hi,

During the discussion on introducing scalable vectors we established that we could use the canonical IR form for splats of scalable vector types (insert element into lane 0 of an undef vector, shuffle that with another undef vector of the same type and a zeroinitializer mask).

We do run into a problem for lowering to SelectionDAG however, since the canonical form there is a BUILD_VECTOR with all elements being the same. This obviously doesn't work if we don't know how many elements there are. We have a couple of solutions and would like to know which the community prefers.

1) Add a new SPLAT_VECTOR ISD node. This was part of our overall RFC from 2016 and is the solution that we're currently using downstream. It just accepts a single scalar value. This has worked well with just the SVE codegen using it, but I don't know if we would run into problems if we try to make this the canonical splat form for SDAG.

2) Extend BUILD_VECTOR to accept an initial boolean indicating whether it is a splat, and if true the first element can be assumed to be the same as all others. The splat form would be the only valid use of BUILD_VECTOR for scalable vector types. For fixed length vectors we could either change existing checks for splats to only look at the flag and would only need one extra argument for the splat value, or use the flag as a shortcut and fall back to checking all the elements if there's the possibility of a fold generating a splat and it not being recognized.

Given the existence of MVTs with >1000 elements per vector, the SPLAT_VECTOR or BUILD_VECTOR with single element approach may also be beneficial for some fixed length backends.

Any thoughts?

-Graham

Hi Graham,

I'm not a big fan of boolean flags on related concepts because the API
becomes brittle. In this case, BUILD_VECTOR-splat would need an assert
to make sure there's only one element.

Can we reuse the SPLAT_VECTOR node for predicated fixed width vectors?
This would simplify a bit the tail of the loop in AVX, for example,
but would also make sure we treat similar patterns in the same way,
and add to the list of reasons why to add a new node.

cheers,
--renato

Hi Renato,

Hi Graham,

I'm not a big fan of boolean flags on related concepts because the API
becomes brittle. In this case, BUILD_VECTOR-splat would need an assert
to make sure there's only one element.

Ok.

Can we reuse the SPLAT_VECTOR node for predicated fixed width vectors?
This would simplify a bit the tail of the loop in AVX, for example,
but would also make sure we treat similar patterns in the same way,
and add to the list of reasons why to add a new node.

If we introduce the new node, I certainly expect it to be used for
fixed width vectors as well (though possibly not right away). Is there
anything special about predication in this use case, or is a simple
splat sufficient?

-Graham

There are many patterns that predication can do to constant splats,
but I was just thinking of implementing the tail loop in the same for
predicated fixed width as we do with SVE. So a simple splat would work
for now.

--renato

Agreed. #1 is better.

The Complex proposal requires special uses of broadcasts. Hopefully someone from that project can comment here. It would be nice to design Complex support into SPLAT_VECTOR, so that special lowerings aren’t needed.

Hi,

During the discussion on introducing scalable vectors we established that we could use the canonical IR form for splats of scalable vector types (insert element into lane 0 of an undef vector, shuffle that with another undef vector of the same type and a zeroinitializer mask).

We do run into a problem for lowering to SelectionDAG however, since the canonical form there is a BUILD_VECTOR with all elements being the same. This obviously doesn't work if we don't know how many elements there are. We have a couple of solutions and would like to know which the community prefers.

1) Add a new SPLAT_VECTOR ISD node. This was part of our overall RFC from 2016 and is the solution that we're currently using downstream. It just accepts a single scalar value. This has worked well with just the SVE codegen using it, but I don't know if we would run into problems if we try to make this the canonical splat form for SDAG.

2) Extend BUILD_VECTOR to accept an initial boolean indicating whether it is a splat, and if true the first element can be assumed to be the same as all others. The splat form would be the only valid use of BUILD_VECTOR for scalable vector types. For fixed length vectors we could either change existing checks for splats to only look at the flag and would only need one extra argument for the splat value, or use the flag as a shortcut and fall back to checking all the elements if there's the possibility of a fold generating a splat and it not being recognized.

Given the existence of MVTs with >1000 elements per vector, the SPLAT_VECTOR or BUILD_VECTOR with single element approach may also be beneficial for some fixed length backends.

I believe that the new ISD node is the better design. It can default to
Expand and the default expansion can fall back to the existing
BUILD_VECTOR node (for fixed-length vectors). Thus, we can maintain
backwards compatibility. Having a special flag on on BUILD_VECTOR that
one needs to remember to check in order to know how to interpret the
other node arguments seems prone to mistakes. In other words, we do
already have a getSplatValue member on BuildVectorSDNode, and a number
of other utility functions, but transforming those from utility
functions to semantically-mandatory checks seems suboptimal.

-Hal

+1 to a new node, we’d very likely do the same thing for GlobalISel and move to a canonical spat representation for all targets.

Amara

Just spitballing… why not have a splat construct straight through LLVM? It would make the IR more readable, opposed to the insert+shuffle method.

Good question. Years back the original SVE proposal was to have a new instruction, “seriesvector”, to represent arithmetic series in vector elements. That instruction could also be used to represent splats if the step value was 0, but the instruction was deemed unnecessarily powerful during community feedback. I believe the proposed replacement is “stepvector” (https://reviews.llvm.org/D47774) which doesn’t have a variable step so can’t do splats anymore.

IIRC the overall feeling is the same as the other attempts to canonicalize representations. We don’t introduce new representations unless absolutely necessary, and insert+shufflevector is technically sufficient to achieve a splat, even though it looks pretty horrible, bloats the IR etc.

Amara

This is a recurrent enough question that perhaps we need to have a
blog post about it. :slight_smile:

Mainly, the gist is flexibility and maintainability. Optimisations can
be done on standard IR, but new constructs need to be taught to all
passes before they become really useful. By introducing a new node
type for every language/machine concept, we'd have a combinatorial
explosion on the number of conversions to do, as well as lose the
ability to efficiently pattern match to find optimisations.

The side effect is having to be careful (and thus conservative) on
code transformation. You end up with longer and more complicated
def-use chains, which are hard to match, transform, move around,
simplify. But at least, simpler patterns work, and improving a pattern
becomes an incremental change.

The main benefit is being able to have a common infrastructure to do
all of those changes in the right place, and hopefully only once per
stage, at non-combinatorial time complexity.

Hope this helps.

--renato

PS: In this particular case, the ISD node would be temporary and
localised, and at this level, we really don't want code to change
anyway. Very different from IR.