Understanding an issue with indirect split CCVals (RISC-V)


I’ve found a bug when passing fixed-length vectors to or from functions on RISC-V. I’m a bit rusty on calling-convention (CC) support in general, let alone the RISC-V specifics, so I was hoping for some pointers on how to fix it.

Take this program:

declare i32 @foo(<4 x i8>)

define i32 @bar(<4 x i8> %v) {
  %r = call i32 @foo(<4 x i8> %v)
  ret i32 %r
}

And compile it for RV64, stopping after instruction selection:

$> llc -mtriple riscv64 bug.ll -o - -stop-after finalize-isel


  - { id: 0, name: '', type: default, offset: 0, size: 4, alignment: 4,
      stack-id: default, callee-saved-register: '', callee-saved-restored: true,
      debug-info-variable: '', debug-info-expression: '', debug-info-location: '' }

SD killed %4, %stack.0, 24 :: (store 8 into %stack.0)
SD killed %3, %stack.0, 16 :: (store 8 into %stack.0)
SD killed %2, %stack.0, 8 :: (store 8 into %stack.0)
SD killed %1, %stack.0, 0 :: (store 8 into %stack.0)
%5:gpr = ADDI %stack.0, 0
$x10 = COPY %5
PseudoCALL target-flags(riscv-plt) @foo, csr_ilp32_lp64, implicit-def dead $x1, implicit $x10, implicit-def $x2, implicit-def $x10

You’ll notice that the v4i8 parameter has been split into four 8-byte registers and passed indirectly through the stack as a 32-byte object. However, the temporary stack slot has been created according to v4i8, whose store size and alignment are both 4 bytes. The four SD stores therefore write 28 bytes past the end of the slot, clobbering the stack, and since the slot is only guaranteed 4-byte alignment, the 8-byte stores can be misaligned.
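To make the arithmetic concrete, here is a rough standalone sketch. It is not LLVM code: SplitArg and the helper functions are names I made up for illustration, and I’m assuming each part needs natural alignment equal to its store size.

```cpp
#include <cstdint>

// Hypothetical stand-in for illustration only; none of these names are LLVM APIs.
struct SplitArg {
  std::uint64_t valueStoreSize; // store size of the original type (4 for v4i8)
  std::uint64_t slotAlign;      // alignment of the stack slot that was created
  std::uint64_t numParts;       // number of register-sized parts
  std::uint64_t partStoreSize;  // store size of each part (8 for an SD on RV64)
};

// Total bytes written if every part is stored at full register width.
constexpr std::uint64_t bytesWritten(const SplitArg &a) {
  return a.numParts * a.partStoreSize;
}

// Bytes written past the end of a slot sized for the original type.
constexpr std::uint64_t overrun(const SplitArg &a) {
  return bytesWritten(a) > a.valueStoreSize
             ? bytesWritten(a) - a.valueStoreSize
             : 0;
}

// Byte offset of the i-th part within the slot (0, 8, 16, 24 in the MIR above).
constexpr std::uint64_t partOffset(const SplitArg &a, std::uint64_t i) {
  return i * a.partStoreSize;
}

// Assumption: a part-sized store wants alignment equal to its store size.
constexpr bool slotAlignmentSufficient(const SplitArg &a) {
  return a.slotAlign % a.partStoreSize == 0;
}
```

With the values from the example, SplitArg{4, 4, 4, 8}, this gives 32 bytes written into a 4-byte slot (an overrun of 28 bytes), and the slot’s 4-byte alignment is not enough for 8-byte stores.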

Should we be storing these with four SDs, but to a 32-byte, 8-byte-aligned location, or should they be truncating SB stores? I couldn’t find the answer in the RISC-V ABI docs.

Regardless of what we should be doing, as far as I can see LLVM doesn’t currently handle either option particularly well. For truncating stores, I couldn’t find a way for the backend to get at what would be the “IntermediateVT” (e.g. i8) through ISD::InputArg and ISD::OutputArg. As for correcting the size and alignment of the stack location, I’m not convinced the issue is confined to the RISC-V backend: in SelectionDAGISel::LowerArguments it looks as though PartBase can get out of sync when NumValues != 1 && VT.getStoreSize() != NumRegs * RegisterVT.getStoreSize(). Is that documented?
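For reference, here is the condition I’m worried about, restated as a standalone predicate over plain integers. This is only a sketch: sizesDisagree is a hypothetical helper, not something in LLVM, and the inputs in the usage note are made up rather than taken from a real lowering.

```cpp
#include <cstdint>

// Hypothetical helper, not an LLVM API. It restates the condition quoted above
// with plain integers instead of EVT/MVT queries.
constexpr bool sizesDisagree(std::uint64_t numValues,
                             std::uint64_t valueStoreSize,
                             std::uint64_t numRegs,
                             std::uint64_t registerStoreSize) {
  // True when an argument has several values and the value's own store size
  // differs from the total store size of the registers it is split into.
  return numValues != 1 && valueStoreSize != numRegs * registerStoreSize;
}
```

For example, sizesDisagree(2, 4, 4, 8) is true (a 4-byte value split into four 8-byte registers, in an argument with two values), while sizesDisagree(1, 4, 4, 8) is false because the NumValues != 1 check excludes the single-value case.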

Any pointers on where to take this would be appreciated.