[RFC][IR] Permit load/store/alloca for struct of the same scalable vector type

Motivation

The RVV intrinsics currently have segment load/store intrinsics like:

void __riscv_vlseg2e32_v_i32m1 (vint32m1_t *v0, vint32m1_t *v1, const int32_t *base, size_t vl);
void __riscv_vsseg3e32_v_i32m1 (int32_t *base, vint32m1_t v0, vint32m1_t v1, vint32m1_t v2, size_t vl);

Multiple issues [0, 1, 2, 3, 4] has suggested the RVV intrinsics to develop more convenient types like tuple types, or take the limit even further, to allow user-defined structures. This proposal is motivated by feedback of unpleasant user experience and wants to define built-in vector tuple types for the intrinsic users. The tuple types will allow the RVV segment load/store intrinsics to have an aligned interface with the non-segment load/store (vle, vse) intrinsics.

vint32m1x2_t __riscv_vlseg2e32_v_i32m1(const int8_t *base, size_t vl);
void __riscv_vsseg3e32_v_i32m1 (int32_t *base, vint32m1x3_t v_tuple, size_t vl);

First thing to be clear here is that we don’t want to discuss anything related to a user-defined structure yet, as Richard Sandiford has made a significant effort [5] back in the developments of SVE but the language standard hasn’t favored such proposal.

This RFC cares and only cares about introducing the RVV vector tuple type right now and hope to relax current restrictions of the LLVM IR to achieve this.

Background

The scalable vector tuple type was proposed during the developments of the RVV intrinsics back in 2021, but the patch [6] and RFC in the mailing list [7] of HsiangKai’s didn’t receive a positive feedback. The development schedule of the RVV intrinsics was tight so the feature was dropped and eventually we ended up with the current interface we have (as mentioned above). Right now we have the time and are revisiting issues again to give the users a ratified specification of the intrinsics that is improved through our whole year of user experience upon it.

Problem

The most natural way to define a tuple of scalable vector type is to see it as an aggregate type of scalable vectors, but currently LLVM does not allow this to be done because the allocation of such aggregate type does not satisfy the current definition for alloca [8].

//// Edit (2023/04/02): Add the paragraph
To explain in further detail, the definition in alloca:

The ‘alloca ’ instruction allocates sizeof(<type>)*NumElements bytes of memory on the runtime stack, returning a pointer of the appropriate type to the program.

type ’ may be any sized type.

corresponds to the check in bool Type::isSized, where StructType::isSized currently don’t consider any structure with scalable vector type as sized.

LLVM Language Reference also mentions the restriction of alloca/load/store any un:

Scalable vectors cannot be global variables or members of arrays because their size is unknown at compile time. They are allowed in structs to facilitate intrinsics returning multiple values. Structs containing scalable vectors cannot be used in loads, stores, allocas, or GEPs.

//// End of Edit (2023/04/02)

Bypassing this restriction is not possible because eventually the problem is that type used in LLVM IR has to be allocated in the stack and has to map to an address. So expanding the tuple type in C into separate scalable vector types in the LLVM IR will not be possible.

Proposal

//// Edit (2023/04/02): After the discussion in RISC-V LLVM bi-weekly sync-up, the previous option of having the tuple type as an explicit special case and not considering as “sized” is not a clean approach to the problem. Eliminated the second option here.

We can:

  • Consider the tuple types, which will be aggregate types with homogenous scalable vector types, as a sized type. The description of alloca " The ‘alloca ’ instruction allocates sizeof(<type>)*NumElements bytes of memory on the runtime stack" would need to be adjusted. The sizeof notation need to be explained further that during compile time the minimum size of the memory is known and the actual size of the memory allocated will be known during runtime.
  • Consider the tuple type as a special case here, since the tuple type will only be exposed and used to the RVV intrinsics users.

//// Edit (2023/04/02)
With this direction, places in the Language reference manual will need to be modified, the next update of D146872 should include this.
//// End of Edit (2023/04/02)


About GEP instructions, for now we may forbid the type to be a target of GEP as the “getting underlying vector” and “seting vector into the tuple” will be handled through call of RVV intrinsic.

Eventually we will have LLVM IR like:

// C
__rvv_int32m1x2_t bar() {
  __rvv_int32m1x2_t v_tuple;
  return v_tuple;
}
void baz(__rvv_int32m1x2_t v_tuple) {
}
// LLVM IR
define { <vscale x 2 x i32>, <vscale x 2 x i32> } @bar() {
entry:
  %v_tuple = alloca { <vscale x 2 x i32>, <vscale x 2 x i32> }, align 4
  %0 = load { <vscale x 2 x i32>, <vscale x 2 x i32> }, ptr %v_tuple, align 4
  ret { <vscale x 2 x i32>, <vscale x 2 x i32> } %0
}
define void @baz(<vscale x 2 x i32> %v_tuple.coerce0, <vscale x 2 x i32> %v_tuple.coerce1) {
entry:
  %v_tuple = alloca { <vscale x 2 x i32>, <vscale x 2 x i32> }, align 4
  %v_tuple.addr = alloca { <vscale x 2 x i32>, <vscale x 2 x i32> }, align 4
  %0 = load { <vscale x 2 x i32>, <vscale x 2 x i32> }, ptr %v_tuple, align 4
  %1 = insertvalue { <vscale x 2 x i32>, <vscale x 2 x i32> } %0, <vscale x 2 x i32> %v_tuple.coerce0, 0
  %2 = insertvalue { <vscale x 2 x i32>, <vscale x 2 x i32> } %1, <vscale x 2 x i32> %v_tuple.coerce1, 1
  store { <vscale x 2 x i32>, <vscale x 2 x i32> } %2, ptr %v_tuple, align 4
  %v_tuple1 = load { <vscale x 2 x i32>, <vscale x 2 x i32> }, ptr %v_tuple, align 4
  store { <vscale x 2 x i32>, <vscale x 2 x i32> } %v_tuple1, ptr %v_tuple.addr, align 4
  ret void
}

Proposal to the RVV intrinsic specification is

Current proof-of-concept is under

References

[0] Vector tuple type · Issue #17 · riscv-non-isa/rvv-intrinsic-doc · GitHub
[1] RISC-V Vector Intrinsics types missing casts between same size vectors and groups of vectors · Issue #83 · riscv-non-isa/rvv-intrinsic-doc · GitHub
[2] Is it possible to construct "sizeless" structs with register types as members? · Issue #110 · riscv-non-isa/rvv-intrinsic-doc · GitHub
[3] Relax constaints on sizeless types · Issue #154 · riscv-non-isa/rvv-intrinsic-doc · GitHub
[4] Make the tuple type API for segment load store as compiler optional feature · Issue #139 · riscv-non-isa/rvv-intrinsic-doc · GitHub
[5] Richard Sandiford - [00/10][RFC] Splitting the C and C++ concept of "complete type"
[6] ⚙ D98169 [IR] Permit load/store/alloca for struct with the same scalable vectors.
[7] https://groups.google.com/g/llvm-dev/c/6ZK2eS4-8t0/m/PG6H1NNDBAAJ
[8] LLVM Language Reference Manual — LLVM 18.0.0git documentation

CC-ing: @topperc @kito-cheng @efriedma-quic @sdesmalen-arm @rofirrim @frasercrmck @LebedevRI @jdoerfert @preames @nikic

Hi @eopXD,

Thanks for putting this together.

Consider the tuple types, which will be aggregate types with homogenous scalable vector types, as a sized type. The description of alloca " The ‘alloca ’ instruction allocates sizeof(<type>)*NumElements bytes of memory on the runtime stack" would need to be adjusted. The sizeof notation need to be explained further that during compile time the minimum size of the memory is known and the actual size of the memory allocated will be known during runtime.

An additional clarification may be useful here but my understanding is that this is already happening with allocas of scalable vectors, right?

I would be in favour of only homogeneous structs of scalable vector be considered sized. I assume TypeSize can be used to represent the size by scaling the scalable vector type size to the number of elements of the tuple, so this should not be an obstacle. Maybe there is some other far-reaching consequence when making them sized?

I’m also OK with forbidding GEPs on those types. Not doing so would probably break the expectation held in many places that offsets are constant values.

Couple points I want to highlight.

We can’t use wide vectors like SVE does because we already use the KnownMinSize of the scalable type to pick different register sizes. RVV has the ability to join registers into pairs, quadruples, or octuples. There is some overlap between tuples and this pairing/quadrupling/octupling so at first it may seem like they can share the same types.

Unfortunately, we need to support tuples that contain halves, quarters, or eights of multiple registers. If we use a single vector type, it would become difficult to disambiguate these cases.
Using a struct allows us to nicely distinquish the cases.

We also hope that by using structs in a direct way like this, passes like SROA/mem2reg will be able break them down or remove the load/store/alloca all together.

I think this is a reasonable extension. As the struct size is still representable by TypeSize, this doesn’t change anything too fundamental. The primary change is that struct offsets are now no longer guaranteed to be fixed, so all users of relevant StructLayout users need to be reviewed to ensure they handle that correctly.

Omitting GEP support is fine by me, with the understanding that vscale support in GEPs should be removed in general, so there’s no need to add more special cases for it at this point.