Clarify llvm.vector.insert / llvm.vector.extract intrinsic semantics

I have a loop that fails to vectorize for AArch64 SVE due to an llvm.vector.extract intrinsic call that is both conditional and loop invariant. I expected LICM to speculatively hoist the call, and a godbolt example shows that is sufficient to trigger vectorization. LICM fails to hoist the call, because these intrinsics are not currently marked IntrSpeculatable. My question is, should they be?

The text in the language reference is ambiguous enough that the answer is not obvious. Specifically: “idx must be a constant multiple of the known-minimum vector length of the result type. If the result type is a scalable vector, idx is first scaled by the result type’s runtime scaling factor. Elements idx through (idx + num_elements(result_type) - 1) must be valid vector indices. If this condition cannot be determined statically but is false at runtime, then the result vector is undefined.”

I suspect the intent is for “result vector is undefined” to mean one of two things:

  • The result vector is a poison value, similar to how extractelement and insertelement are defined.
  • The behavior is undefined. This seems to have been the conclusion for a similar issue with the vector predication intrinsics, which suffer from the same ambiguity in the language reference. See D125296.

If it’s the former, we should be able to mark these intrinsics IntrSpeculatable, right? Targets will need to generate code that doesn’t crash or otherwise misbehave when idx is out of range, but if I am reading the code correctly, the target-independent SelectionDAG legalization of these intrinsics already does that. Here is another godbolt example showing the generated code for AArch64.

Any guidance on how to proceed would be appreciated. At a minimum, I would like to tighten up the specification in the language ref.

Dave Kreitzer

We prefer poison over UB in the semantics.
Whether poison can be used boils down to how the intrinsic is lowered in all targets. It can be poison only if it doesn’t trigger any crash/exception/fault in any target.

If you know the answer to the previous question, please submit a patch to LangRef to fix the ambiguity.

I did some further investigation, and for in-tree targets, I believe these intrinsics are lowered safely such that we can define their semantics using poison rather than undefined behavior. D129656 has the suggested change.

The cases where we need to make the poison vs. UB distinction are insertion/extraction of a fixed-width vector into/from a scalable vector. For all other cases, the IR verifier statically determines and enforces the rule that "Elements idx through (idx + num_elements(result_type) - 1) must be valid vector indices."
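
To make the fixed-from-scalable case concrete, here is a minimal C++ sketch of the range condition from the language reference. The VecType struct and both function names are hypothetical and only model the rule; they are not LLVM APIs.

```cpp
#include <cstdint>

// Hypothetical model of an IR vector type:
// <minElts x T> (fixed) or <vscale x minElts x T> (scalable).
struct VecType {
  uint64_t minElts;
  bool scalable;
};

// Runtime element count for a given vscale.
uint64_t numElements(VecType t, uint64_t vscale) {
  return t.scalable ? t.minElts * vscale : t.minElts;
}

// True when elements idx .. idx + num_elements(result) - 1 are all valid
// indices into the source vector. idx is scaled by vscale only when the
// result type is scalable, following the LangRef wording.
bool extractInRange(VecType src, VecType result, uint64_t idx,
                    uint64_t vscale) {
  uint64_t first = result.scalable ? idx * vscale : idx;
  return first + numElements(result, vscale) <= numElements(src, vscale);
}
```

For a <2 x i64> extracted from a <vscale x 2 x i64> at idx 2, extractInRange is false when vscale is 1 and true when vscale is 2, so the condition cannot be decided statically. When both types are scalable, vscale scales both sides equally and the answer is the same for every vscale.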

All unit tests that exercise these cases generate safe code when idx is out-of-range at runtime. I determined this by manual inspection. Here are two examples for AArch64 & RISCV, the two affected targets.

In the AArch64 code below, the value in x8 is clamped to prevent ldr q0, [x9, x8] from loading beyond the runtime length of the stored vector.

; Goes through memory currently; idx != 0.
define <2 x i64> @extract_v2i64_nxv2i64_idx2(<vscale x 2 x i64> %vec) nounwind {
; CHECK-LABEL: extract_v2i64_nxv2i64_idx2:
; CHECK:       // %bb.0:
; CHECK-NEXT:    str x29, [sp, #-16]! // 8-byte Folded Spill
; CHECK-NEXT:    addvl sp, sp, #-1
; CHECK-NEXT:    cntd x8
; CHECK-NEXT:    mov w9, #2
; CHECK-NEXT:    sub x8, x8, #2
; CHECK-NEXT:    ptrue p0.d
; CHECK-NEXT:    cmp x8, #2
; CHECK-NEXT:    st1d { z0.d }, p0, [sp]
; CHECK-NEXT:    csel x8, x8, x9, lo
; CHECK-NEXT:    mov x9, sp
; CHECK-NEXT:    lsl x8, x8, #3
; CHECK-NEXT:    ldr q0, [x9, x8]
; CHECK-NEXT:    addvl sp, sp, #1
; CHECK-NEXT:    ldr x29, [sp], #16 // 8-byte Folded Reload
; CHECK-NEXT:    ret
  %retval = call <2 x i64> @llvm.experimental.vector.extract.v2i64.nxv2i64(<vscale x 2 x i64> %vec, i64 2)
  ret <2 x i64> %retval
}
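
The index arithmetic in the AArch64 sequence above can be paraphrased in C++ as a rough sketch. The function name clampedByteOffset is hypothetical, and the parameter stands for the runtime count of 64-bit lanes that cntd returns, assumed to be at least 2.

```cpp
#include <algorithm>
#include <cstdint>

// Byte offset used by the final 16-byte load, given the runtime number of
// 64-bit lanes in the SVE register (the value produced by cntd).
uint64_t clampedByteOffset(uint64_t lanes) {
  uint64_t lastValidStart = lanes - 2;  // sub x8, x8, #2
  // cmp x8, #2 / csel x8, x8, x9, lo: keep the smaller of the two values.
  uint64_t start = std::min<uint64_t>(lastValidStart, 2);
  return start * 8;                     // lsl x8, x8, #3
}
```

With 2 lanes (a 128-bit register) the requested start index 2 is out of range and the offset is clamped to 0, so the load stays inside the 16-byte stack slot. With 4 or more lanes the offset is 16, which is the index that was requested.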

The semantics of the RISCV vslidedown instruction accommodate an out-of-range index by using 0 for any out-of-range elements.

define void @extract_v2i8_nxv2i8_2(<vscale x 2 x i8> %x, <2 x i8>* %y) {
; CHECK-LABEL: extract_v2i8_nxv2i8_2:
; CHECK:       # %bb.0:
; CHECK-NEXT:    vsetivli zero, 2, e8, mf4, ta, mu
; CHECK-NEXT:    vslidedown.vi v8, v8, 2
; CHECK-NEXT:    vsetivli zero, 2, e8, mf8, ta, mu
; CHECK-NEXT:    vse8.v v8, (a0)
; CHECK-NEXT:    ret
  %c = call <2 x i8> @llvm.experimental.vector.extract.v2i8.nxv2i8(<vscale x 2 x i8> %x, i64 2)
  store <2 x i8> %c, <2 x i8>* %y
  ret void
}
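
As a rough illustration of why the slide-down is safe for an out-of-range index, here is a small C++ model of the behavior described above. The function slideDown is hypothetical; it assumes src holds the VLMAX elements of the source register group and ignores masking and tail policy.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Model of a slide-down by `offset` producing `vl` result elements.
// Source positions at or beyond src.size() read as 0 instead of faulting.
std::vector<uint8_t> slideDown(const std::vector<uint8_t>& src,
                               std::size_t offset, std::size_t vl) {
  std::vector<uint8_t> dst(vl);
  for (std::size_t i = 0; i < vl; ++i) {
    std::size_t j = i + offset;
    dst[i] = j < src.size() ? src[j] : 0;
  }
  return dst;
}
```

With a four-element source {10, 20, 30, 40}, sliding down by 2 with vl = 2 yields {30, 40}, and sliding down by 3 yields {40, 0}: the out-of-range element becomes 0 and nothing traps.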