RFC: Allowing scalable vectors in structs to support multiple return values from intrinsics


RISC-V vector intrinsics need to return multiple scalable vector values. We plan to use this to return multiple results from segment load intrinsics, which conceptually load multiple separate registers. I believe SVE has similar load intrinsics, but they return wide scalable vectors that get split during type legalization. We are struggling to do the same because we already use the size of the fixed part of a scalable vector to determine LMUL, which means we have legal scalable vector types with different fixed part sizes. That makes it very difficult for type legalization to split a wide vector type for a segment load into the correct number of pieces. We feel that returning multiple results, and thereby avoiding the type legalizer, is an easier path.

Unfortunately, supporting multiple scalable vectors being returned from an intrinsic requires scalable vectors to be allowed in structs. We would disallow such structs from being used by loads, stores, allocas, and GEPs, which avoids ever needing to compute the offset of any field in the struct.
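To make the proposal concrete, here is a rough IR sketch of what this could look like. The intrinsic name and signature below are purely illustrative, not the final intrinsics:

```llvm
; Hypothetical two-field segment load returning a struct of scalable vectors.
declare { <vscale x 4 x i32>, <vscale x 4 x i32> }
    @llvm.riscv.vlseg2.nxv4i32(i32*, i64)

define void @use(i32* %p, i64 %vl) {
  %seg = call { <vscale x 4 x i32>, <vscale x 4 x i32> }
      @llvm.riscv.vlseg2.nxv4i32(i32* %p, i64 %vl)
  ; The fields are pulled out with extractvalue; the struct value itself
  ; never touches memory, so no load/store/alloca/GEP of the struct type
  ; is ever needed and no field offsets have to be computed.
  %v0 = extractvalue { <vscale x 4 x i32>, <vscale x 4 x i32> } %seg, 0
  %v1 = extractvalue { <vscale x 4 x i32>, <vscale x 4 x i32> } %seg, 1
  ret void
}
```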

I’ve posted an initial patch for the IR and verifier portions of this here: https://reviews.llvm.org/D94142

We would appreciate any feedback on this direction.


Ugh — I agree this is probably the least bad direction to go :-).

The real issue here is that LLVM IR doesn’t support multiple result values, instead forcing the use of first class aggregates. Given that design decision (which I regret btw) I agree with you that we have to do something like this, since scalable vectors are registers. Things shouldn’t be pinned into memory just because we get multiple results back.

I think that restricting these structs from being loaded and stored, unlike other FCAs, is a good way to go.


I think your approach is probably the best, but I think it’s worth sketching out an alternative approach using token that wouldn’t require verifier changes.

If you want to avoid verifier changes, the token type already gives you what you want: it is opaque, cannot be loaded, stored, or phi’d, and can serve as a “value multiplexer”. You could invent new intrinsics to extract the values you need. The source of a token cannot be obscured, so lowering can always look through a token operand, find the source, and link up the correct SDValue in codegen. The downside is that this solution would be opaque to the mid-level optimizers, which already understand insertvalue and extractvalue. So I don’t think this is a good way to go, but it’s worth considering.
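For comparison, the token-based alternative might look something like this. The intrinsic names here are invented for illustration:

```llvm
; A "start" intrinsic produces a token; extract intrinsics pull out values.
declare token @llvm.riscv.vlseg2.start.nxv4i32(i32*, i64)
declare <vscale x 4 x i32> @llvm.riscv.seg.extract.nxv4i32(token, i32)

define void @use(i32* %p, i64 %vl) {
  %tok = call token @llvm.riscv.vlseg2.start.nxv4i32(i32* %p, i64 %vl)
  ; The token cannot be loaded, stored, or phi'd, so codegen can always
  ; walk from each extract back to the defining call.
  %v0 = call <vscale x 4 x i32>
      @llvm.riscv.seg.extract.nxv4i32(token %tok, i32 0)
  %v1 = call <vscale x 4 x i32>
      @llvm.riscv.seg.extract.nxv4i32(token %tok, i32 1)
  ret void
}
```

The cost is visible here: ordinary optimizations that understand extractvalue would see only opaque calls.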

I think, if we all agree that first class aggregates that can be loaded, stored, and phi’d were a historical design mistake, then your original proposed solution is a step in the direction of removing FCA support.

I wonder if, in the future, it would be possible to auto-upgrade IR that contains these FCA operations by splitting them into scalar operations, and we could make the verifier reject all of these undesired operations.
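A sketch of what such an auto-upgrade might do (illustrative only, for a simple struct):

```llvm
; Before: a load of a whole first class aggregate...
;   %agg = load { i32, float }, { i32, float }* %p

; ...after: per-field GEPs and scalar loads, rebuilt with insertvalue.
%f0.p = getelementptr { i32, float }, { i32, float }* %p, i32 0, i32 0
%f0   = load i32, i32* %f0.p
%f1.p = getelementptr { i32, float }, { i32, float }* %p, i32 0, i32 1
%f1   = load float, float* %f1.p
%agg0 = insertvalue { i32, float } undef, i32 %f0, 0
%agg  = insertvalue { i32, float } %agg0, float %f1, 1
```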

+1 to this goal. Getting there might be tricky, and take a while, but definite +1 to the direction.

Yeah, I agree with Phillip; this would be a great direction.