## TL;DR

The current semantics of `llvm.experimental.get.vector.length` are too permissive. They can make it hard for optimizers to determine the number of iterations in a VP-vectorized loop. This RFC proposes new semantics similar to those of RVV's VSETVLI instruction, but generic enough for other (future) targets that use VP intrinsics for loop vectorization.

## Background & Problem Statements

VP intrinsics require an explicit vector length (EVL) argument, which describes the actual number of elements they process. In a loop vectorized with VP intrinsics, the EVL can be obtained by calling `llvm.experimental.get.vector.length`. The arguments are the number of remaining elements (`%cnt`) and, effectively, the vector type being operated on (described by `%vf` and `%scalable`).

```
declare i32 @llvm.experimental.get.vector.length.i32(i32 %cnt, i32 immarg %vf, i1 immarg %scalable)
```

At the end of an iteration, we subtract the returned EVL from `%cnt` to get the new `%cnt` for the next iteration. In this discussion, we also let `%max_lanes` be the number of lanes in the type described by `%vf` and `%scalable`.
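The following minimal Python model makes the loop structure concrete. It is only a sketch. `run_evl_loop` and `greedy_vl` are hypothetical helpers, and `get_vl(cnt, max_lanes)` merely stands in for `llvm.experimental.get.vector.length`; it is not how the intrinsic is implemented. `greedy_vl` assumes a target that always processes `min(%cnt, %max_lanes)` elements.

```python
from typing import Callable, List


def run_evl_loop(tc: int, max_lanes: int,
                 get_vl: Callable[[int, int], int]) -> List[int]:
    """Model an EVL-based vectorized loop and return the EVL of each iteration.

    `get_vl(cnt, max_lanes)` is a hypothetical stand-in for
    llvm.experimental.get.vector.length, not LLVM code.
    """
    evls: List[int] = []
    cnt = tc  # %cnt starts at the original trip count
    while cnt > 0:
        evl = get_vl(cnt, max_lanes)
        # Reject choices that would never terminate or would overshoot %cnt.
        if not 0 < evl <= cnt:
            raise ValueError(f"invalid EVL {evl} for %cnt = {cnt}")
        evls.append(evl)
        cnt -= evl  # the new %cnt for the next iteration
    return evls


def greedy_vl(cnt: int, max_lanes: int) -> int:
    """Assumed target behavior: fill the whole vector whenever possible."""
    return min(cnt, max_lanes)


evls = run_evl_loop(38, 4, greedy_vl)
print(len(evls), evls[-1])  # 10 2
```

With 38 elements and 4 lanes, this greedy choice yields 10 iterations, and the last one processes 2 elements.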

While working on #91796, which converts VP-vectorized loops to use induction variables (IVs) derived from EVL, we discovered that the liberal semantics of `experimental.get.vector.length` might cause correctness issues. For a transformation that swaps the original IV for another to be correct, the new vectorized loop must have the same number of iterations as the original one.

Take a vectorized loop with VF = 4 as an example (i.e., each vector operation inside the loop processes at most 4 elements). If there are 38 elements in total to process (the original trip count, `TC`), the original loop runs 10 iterations, and the last iteration processes only 2 elements. When switching to an EVL-based IV, the number of iterations is determined by how many times EVL must be subtracted from the remaining element count, starting from `TC`, until it reaches zero.

With the current `experimental.get.vector.length` semantics, it can *technically* return 1 as the EVL in every iteration. That would create a new vectorized loop with 38 iterations in total, which is obviously wrong. It can return 1 because the current semantics impose only three constraints:

- If `%cnt` equals 0, it has to return 0.
- The returned value is less than or equal to `%max_lanes`.
- The returned values decrease monotonically (i.e., each later value is less than or equal to the current one) across the iterations of a vectorized loop.
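The counterexample can be checked mechanically. The sketch below uses two hypothetical helpers written for this illustration, `trace` and `satisfies_old_rules`. It records the `%cnt`/EVL pair of each iteration and tests the sequence against the three current constraints. A target that always returns 1 passes all of them.

```python
from typing import Callable, List, Tuple


def trace(tc: int, choose_evl: Callable[[int], int]) -> Tuple[List[int], List[int]]:
    """Record the (%cnt, EVL) pair of every iteration of an EVL-based loop."""
    cnts: List[int] = []
    evls: List[int] = []
    cnt = tc
    while cnt > 0:
        evl = choose_evl(cnt)
        if evl <= 0:
            raise ValueError("EVL must be positive while %cnt > 0")
        cnts.append(cnt)
        evls.append(evl)
        cnt -= evl
    return cnts, evls


def satisfies_old_rules(cnts: List[int], evls: List[int], max_lanes: int) -> bool:
    """Check a trace against the three current constraints."""
    for cnt, evl in zip(cnts, evls):
        if cnt == 0 and evl != 0:  # a zero count must give a zero EVL
            return False
        if evl > max_lanes:        # never more than %max_lanes
            return False
    # Monotonically non-increasing across iterations.
    return all(a >= b for a, b in zip(evls, evls[1:]))


cnts, evls = trace(38, lambda cnt: 1)  # a target that always returns 1
print(satisfies_old_rules(cnts, evls, 4), len(evls))  # True 38
```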

## Proposal

Here are the new constraints on the return value of `experimental.get.vector.length` that I would like to propose:

- If `%cnt` equals 0, it returns 0.
- The returned value is always less than or equal to `%max_lanes`.
- (Credit: @topperc) The returned value is always greater than or equal to `ceil(%cnt / ceil(%cnt / %max_lanes))`.
    - This implies that if `%cnt` is non-zero, the result is non-zero as well.
    - This also implies that if `%cnt` is less than `%max_lanes`, it has to return `%cnt`.
- The returned values decrease monotonically in each loop iteration. That is, the returned value of an iteration is at least as large as that of any later iteration.
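Under the proposed rules, the same kind of check rejects the EVL = 1 sequence. The helpers in this sketch are again hypothetical. `min_evl` encodes the `ceil(%cnt / ceil(%cnt / %max_lanes))` bound using integer ceiling division. The sketch also shows that returning exactly the lower bound every time is legal and splits the tail evenly.

```python
from typing import Callable, List, Tuple


def ceil_div(a: int, b: int) -> int:
    """Integer ceiling division for non-negative a and positive b."""
    return -(-a // b)


def min_evl(cnt: int, max_lanes: int) -> int:
    """Proposed lower bound: ceil(%cnt / ceil(%cnt / %max_lanes))."""
    if cnt == 0:
        return 0
    return ceil_div(cnt, ceil_div(cnt, max_lanes))


def trace(tc: int, choose_evl: Callable[[int], int]) -> Tuple[List[int], List[int]]:
    """Record the (%cnt, EVL) pair of every iteration of an EVL-based loop."""
    cnts: List[int] = []
    evls: List[int] = []
    cnt = tc
    while cnt > 0:
        evl = choose_evl(cnt)
        if evl <= 0:
            raise ValueError("EVL must be positive while %cnt > 0")
        cnts.append(cnt)
        evls.append(evl)
        cnt -= evl
    return cnts, evls


def satisfies_new_rules(cnts: List[int], evls: List[int], max_lanes: int) -> bool:
    """Check a trace against the proposed constraints."""
    for cnt, evl in zip(cnts, evls):
        if cnt == 0 and evl != 0:
            return False
        if evl > max_lanes:
            return False
        if evl < min_evl(cnt, max_lanes):  # the new lower bound
            return False
    return all(a >= b for a, b in zip(evls, evls[1:]))


ones = trace(38, lambda cnt: 1)
greedy = trace(38, lambda cnt: min(cnt, 4))
minimal = trace(38, lambda cnt: min_evl(cnt, 4))
print(satisfies_new_rules(*ones, 4))     # False
print(satisfies_new_rules(*greedy, 4))   # True
print(satisfies_new_rules(*minimal, 4))  # True
print(minimal[1])  # [4, 4, 4, 4, 4, 4, 4, 4, 3, 3]
```

For 38 elements and 4 lanes, the minimal choice produces eight 4-element iterations followed by two 3-element ones. That is still 10 iterations, the same as the greedy split that ends in 2.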

These new semantics are similar to the behavior of the VSETVLI instruction in the RISC-V Vector Extension (RVV). However, they are not RISC-V specific, which leaves room for other targets to adopt this intrinsic in the future.
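The sketch below is a sanity check of the VSETVLI comparison. It encodes the constraints that the RISC-V "V" specification places on the `vl` set by `vsetvli`:

- `vl = AVL` when `AVL <= VLMAX`.
- `ceil(AVL / 2) <= vl <= VLMAX` when `AVL < 2 * VLMAX`.
- `vl = VLMAX` otherwise.

It then verifies, over a range of `AVL` values, that every allowed `vl` also satisfies the proposed bounds. The function names are hypothetical. Treating `VLMAX` as `%max_lanes` and capping EVL at `%cnt` are assumptions of this model. Only the per-call bounds are checked; monotonicity across iterations is not modeled.

```python
from typing import Tuple


def ceil_div(a: int, b: int) -> int:
    """Integer ceiling division for non-negative a and positive b."""
    return -(-a // b)


def rvv_vl_range(avl: int, vlmax: int) -> Tuple[int, int]:
    """Inclusive range of vl that vsetvli may produce, per the RVV constraints."""
    if avl <= vlmax:
        return avl, avl                 # vl = AVL
    if avl < 2 * vlmax:
        return ceil_div(avl, 2), vlmax  # ceil(AVL / 2) <= vl <= VLMAX
    return vlmax, vlmax                 # vl = VLMAX


def proposed_range(cnt: int, max_lanes: int) -> Tuple[int, int]:
    """Proposed bounds; the upper end assumes EVL never exceeds %cnt."""
    if cnt == 0:
        return 0, 0
    return ceil_div(cnt, ceil_div(cnt, max_lanes)), min(cnt, max_lanes)


def rvv_fits_proposal(vlmax: int, max_avl: int) -> bool:
    """True if every vl RVV allows for AVL in [0, max_avl] is also allowed here."""
    for avl in range(max_avl + 1):
        rvv_lo, rvv_hi = rvv_vl_range(avl, vlmax)
        lo, hi = proposed_range(avl, vlmax)
        if rvv_lo < lo or rvv_hi > hi:
            return False
    return True


print(rvv_fits_proposal(vlmax=4, max_avl=100))  # True
```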

These new rules effectively put a *lower bound* on the value returned by `experimental.get.vector.length` in each iteration. Cases like the EVL = 1 example shown earlier are therefore no longer allowed. This makes it easier to create a new EVL-based IV loop whose number of iterations matches that of the original loop.
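The lower bound pins down the trip count exactly. The following short derivation assumes, as the loop already does, that EVL never exceeds `%cnt`, and it relies only on the proposed lower and upper bounds. Let M = `%max_lanes`. Each iteration that starts with c = `%cnt` > 0 elements lowers ⌈c / M⌉ by exactly one. An EVL-based loop starting from `TC` therefore runs ⌈TC / M⌉ iterations, matching the original loop.

```latex
\begin{aligned}
&M = \texttt{\%max\_lanes},\quad c = \texttt{\%cnt} > 0,\quad
  k = \lceil c / M \rceil \;\Rightarrow\; (k-1)\,M < c \le k\,M \\
&\lceil c / k \rceil \le \mathrm{EVL} \le M,\qquad c' = c - \mathrm{EVL} \ge 0 \\
&c' \le c - \frac{c}{k} = \frac{(k-1)\,c}{k} \le (k-1)\,M
  \;\Rightarrow\; \lceil c' / M \rceil \le k - 1 \\
&c' \ge c - M > (k-2)\,M
  \;\Rightarrow\; \lceil c' / M \rceil \ge k - 1 \\
&\therefore\ \lceil c' / M \rceil = k - 1
  \;\Rightarrow\; \text{iterations} = \lceil TC / M \rceil
\end{aligned}
```

For the earlier example, this gives ⌈38 / 4⌉ = 10 iterations, whatever legal EVL values the target picks.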

Feedback is welcome!