Overview
Currently we have built-in C types for RISCV vector tuples, e.g. vint32m1x2_t. In LLVM IR, however, such a type is represented as a struct of scalable vector types, i.e. {<vscale x 2 x i32>, <vscale x 2 x i32>}. Because the struct is flattened during SelectionDAG construction, the num_fields (NF) information is lost. This makes it impossible to handle inline assembly with vector tuple operands. It also makes calling-convention handling for vector tuple types less straightforward and the allocation code, i.e. RVVArgDispatcher, harder to implement.
This RFC proposes new LLVM types for RISCV vector tuples, represented as a TargetExtType that carries both the LMUL and the NF (num_fields) information and preserves it all the way down to SelectionDAG, where it is matched to the corresponding MVT (support is added in a following patch). The LLVM IR type for the example above then becomes target("riscv_vec_tuple", <vscale x 8 x i8>, 2), in which the first type parameter is a scalable vector of i8 elements with the same size as one field, and the integer parameter that follows is the NF of the tuple.
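As a rough illustration of the mapping described above, the following Python sketch derives the proposed IR spelling from the shape of one tuple field. The helper name tuple_ir_type and its parameters are hypothetical and exist only for this example; the only rule it encodes is that the first type parameter is an i8 scalable vector with the same total size as one field.

```python
def tuple_ir_type(num_elts: int, elt_bits: int, nf: int) -> str:
    """Spell the proposed tuple type for NF fields of <vscale x num_elts x i{elt_bits}>.

    Hypothetical helper for illustration only; it is not part of LLVM.
    """
    field_bits = num_elts * elt_bits
    if field_bits % 8 != 0:
        raise ValueError("field size must be a whole number of bytes")
    if not 2 <= nf <= 8:
        raise ValueError("NF must be between 2 and 8")
    # The first parameter is the i8 vector with the same size as one field.
    i8_elts = field_bits // 8
    return f'target("riscv_vec_tuple", <vscale x {i8_elts} x i8>, {nf})'


# vint32m1x2_t: two fields of <vscale x 2 x i32>
print(tuple_ir_type(2, 32, 2))
# prints: target("riscv_vec_tuple", <vscale x 8 x i8>, 2)
```

Note that the element type of the field does not survive in the tuple type itself: a field of <vscale x 2 x i32> and a field of <vscale x 8 x i8> map to the same spelling.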
New RISCV-specific vector insert/extract intrinsics, llvm.riscv.vector.insert and llvm.riscv.vector.extract, are also added to handle subvector insertion into and extraction from tuple types, since the generic ones only operate on VectorType, not on TargetExtType.
A total of 32 LLVM types are added, one for each combination of LMUL and NF (with NF >= 2) that satisfies VREGS * NF <= 8, where VREGS is the number of vector registers needed for each LMUL and NF is num_fields.
The type names are:
target("riscv_vec_tuple", <vscale x 1 x i8>, 2) // LMUL = mf8, NF = 2
target("riscv_vec_tuple", <vscale x 1 x i8>, 3) // LMUL = mf8, NF = 3
target("riscv_vec_tuple", <vscale x 1 x i8>, 4) // LMUL = mf8, NF = 4
target("riscv_vec_tuple", <vscale x 1 x i8>, 5) // LMUL = mf8, NF = 5
target("riscv_vec_tuple", <vscale x 1 x i8>, 6) // LMUL = mf8, NF = 6
target("riscv_vec_tuple", <vscale x 1 x i8>, 7) // LMUL = mf8, NF = 7
target("riscv_vec_tuple", <vscale x 1 x i8>, 8) // LMUL = mf8, NF = 8
target("riscv_vec_tuple", <vscale x 2 x i8>, 2) // LMUL = mf4, NF = 2
target("riscv_vec_tuple", <vscale x 2 x i8>, 3) // LMUL = mf4, NF = 3
target("riscv_vec_tuple", <vscale x 2 x i8>, 4) // LMUL = mf4, NF = 4
target("riscv_vec_tuple", <vscale x 2 x i8>, 5) // LMUL = mf4, NF = 5
target("riscv_vec_tuple", <vscale x 2 x i8>, 6) // LMUL = mf4, NF = 6
target("riscv_vec_tuple", <vscale x 2 x i8>, 7) // LMUL = mf4, NF = 7
target("riscv_vec_tuple", <vscale x 2 x i8>, 8) // LMUL = mf4, NF = 8
target("riscv_vec_tuple", <vscale x 4 x i8>, 2) // LMUL = mf2, NF = 2
target("riscv_vec_tuple", <vscale x 4 x i8>, 3) // LMUL = mf2, NF = 3
target("riscv_vec_tuple", <vscale x 4 x i8>, 4) // LMUL = mf2, NF = 4
target("riscv_vec_tuple", <vscale x 4 x i8>, 5) // LMUL = mf2, NF = 5
target("riscv_vec_tuple", <vscale x 4 x i8>, 6) // LMUL = mf2, NF = 6
target("riscv_vec_tuple", <vscale x 4 x i8>, 7) // LMUL = mf2, NF = 7
target("riscv_vec_tuple", <vscale x 4 x i8>, 8) // LMUL = mf2, NF = 8
target("riscv_vec_tuple", <vscale x 8 x i8>, 2) // LMUL = m1, NF = 2
target("riscv_vec_tuple", <vscale x 8 x i8>, 3) // LMUL = m1, NF = 3
target("riscv_vec_tuple", <vscale x 8 x i8>, 4) // LMUL = m1, NF = 4
target("riscv_vec_tuple", <vscale x 8 x i8>, 5) // LMUL = m1, NF = 5
target("riscv_vec_tuple", <vscale x 8 x i8>, 6) // LMUL = m1, NF = 6
target("riscv_vec_tuple", <vscale x 8 x i8>, 7) // LMUL = m1, NF = 7
target("riscv_vec_tuple", <vscale x 8 x i8>, 8) // LMUL = m1, NF = 8
target("riscv_vec_tuple", <vscale x 16 x i8>, 2) // LMUL = m2, NF = 2
target("riscv_vec_tuple", <vscale x 16 x i8>, 3) // LMUL = m2, NF = 3
target("riscv_vec_tuple", <vscale x 16 x i8>, 4) // LMUL = m2, NF = 4
target("riscv_vec_tuple", <vscale x 32 x i8>, 2) // LMUL = m4, NF = 2
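The list above can be reproduced mechanically from the VREGS * NF <= 8 rule. The sketch below is only an illustration: the LMULS table and the function name are assumptions made for this example, with VREGS taken as 1 for fractional LMULs, as the list implies.

```python
# (LMUL name, VREGS, number of i8 elements per vscale) for each LMUL.
# Fractional LMULs are assumed to occupy one vector register each.
LMULS = [
    ("mf8", 1, 1),
    ("mf4", 1, 2),
    ("mf2", 1, 4),
    ("m1", 1, 8),
    ("m2", 2, 16),
    ("m4", 4, 32),
    ("m8", 8, 64),
]


def enumerate_tuple_types():
    """Return (lmul, nf, ir_type) for every pair with VREGS * NF <= 8."""
    result = []
    for lmul, vregs, i8_elts in LMULS:
        for nf in range(2, 9):  # a tuple has at least 2 and at most 8 fields
            if vregs * nf <= 8:
                ir = f'target("riscv_vec_tuple", <vscale x {i8_elts} x i8>, {nf})'
                result.append((lmul, nf, ir))
    return result


types = enumerate_tuple_types()
print(len(types))  # prints: 32
for lmul, nf, ir in types:
    print(f"{ir} // LMUL = {lmul}, NF = {nf}")
```

LMUL = m8 contributes no tuple type, because even NF = 2 would need 16 registers.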
Background
The RISCV vector tuple type is a Clang built-in type that models the NFIELDS vector register groups used by the Vector Load/Store Segment Instructions. It is written as, for example, vint32m1x2_t, and in the current LLVM implementation it is lowered to the corresponding LLVM type "{<vscale x 2 x i32> %0, <vscale x 2 x i32> %1}". During register assignment, whether by the calling convention or by register allocation, it should be placed in the VRN2M1 register class.
Problem statement
The constraint on a vector tuple type is that its fields must be allocated to consecutive vector registers. For example, %0 → v2, %1 → v3 is legal, but %0 → v2, %1 → v5 is not.
However, the current approach, which uses a struct of scalable vectors as illustrated in Background, flattens the tuple into multiple primitive types, i.e. nxv2i32, nxv2i32, and loses the grouping information during SelectionDAG construction. As a result, we cannot guarantee that the fields are placed in adjacent vector registers.
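The adjacency constraint can be stated as a small check. The following sketch is not taken from the LLVM code base; the function name and the vregs_per_field parameter are hypothetical, and the check deliberately ignores other rules such as register group alignment and the v0 to v31 bound.

```python
def is_legal_tuple_assignment(base_regs, vregs_per_field=1):
    """Check that the fields of a tuple sit in consecutive vector registers.

    base_regs holds the first register number of each field, in field order.
    vregs_per_field is the number of registers each field occupies (VREGS).
    Illustrative only: alignment and register-file bounds are not checked.
    """
    return all(
        nxt == cur + vregs_per_field
        for cur, nxt in zip(base_regs, base_regs[1:])
    )


print(is_legal_tuple_assignment([2, 3]))  # %0 -> v2, %1 -> v3, prints: True
print(is_legal_tuple_assignment([2, 5]))  # %0 -> v2, %1 -> v5, prints: False
```

Once the struct is flattened into independent nxv2i32 values, nothing ties the two register choices together, so a check like this cannot be enforced on them as a group.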
Related work
There are multiple patches that add initial support for RISCV vector tuple type:
- Permit load/store/alloca for struct of the same scalable vector type
- Define RVV tuple types
- Add typedef of the tuple type and define tuple type variant of vlseg2e32
- Define tuple type variant of vsseg2e32
- Define tuple type variant of vlseg2e32ff
- Define tuple type variant of vlsseg2e32
- Define tuple type variant of vssseg2e32
- Define tuple type variant of vloxseg2ei32 vluxseg2ei32
- Define tuple type variant of vsoxseg2ei32 vsuxseg2ei32
- Define vget for tuple type
- Define vset for tuple type
- RISCV vector calling convention
Pull requests
- Support RISCV vector tuple type in llvm IR
- Add RISCV vector tuple type to value types(MVT)
- Support RISCV vector tuple CodeGen and Calling Convention
CC: @rofirrim @asb @wangpc-pp @nikic @lukel @preames @topperc @kito-cheng