[SVE] Is it reasonable to disable #pragma unroll for SVE?

void testWhileWR(int *data1, int *data2, int size) {
  // #pragma unroll 2
  #pragma GCC unroll 2
  for (int i = 0; i < size; i++) {
    data2[i] = i;
  }
}

Assembly output from LLVM:

.LBB0_2:                                // =>This Inner Loop Header: Depth=1
        mov     x11, x8                                // 1st loop body
        st1w    { z1.s }, p0, [x1, x8, lsl #2]
        incw    x11
        whilelo p1.s, x11, x9
        b.pl    .LBB0_4
        add     z1.s, z1.s, z0.s                     // start 2nd loop body
        st1w    { z1.s }, p1, [x10, x8, lsl #2]
        inch    x8
        add     z1.s, z1.s, z0.s
        whilelo p0.s, x8, x9
        b.mi    .LBB0_2

LLVM generates two similar loop bodies for #pragma unroll 2, whereas GCC doesn't unroll when the +sve option is enabled (GCC does unroll without +sve).
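For reference, here is a rough scalar sketch of what "unroll by 2" means at the source level. It is not what either compiler emits for SVE (the listing above uses predicates instead of a remainder step). The names fill_rolled, fill_unrolled2 and same_output are made up for illustration, and the unused data1 parameter is dropped.

```c
#include <assert.h>
#include <stdlib.h>

/* Loop shape from the post (hypothetical name; data1 omitted as unused). */
static void fill_rolled(int *data2, int size) {
  for (int i = 0; i < size; i++) {
    data2[i] = i;
  }
}

/* Hand-unrolled by 2: two loop bodies per trip, then a remainder step
   for odd sizes. This is an illustration, not compiler output. */
static void fill_unrolled2(int *data2, int size) {
  int i = 0;
  for (; i + 1 < size; i += 2) {
    data2[i] = i;         /* 1st loop body */
    data2[i + 1] = i + 1; /* 2nd loop body */
  }
  if (i < size) {
    data2[i] = i;         /* leftover element when size is odd */
  }
}

/* Returns 1 if both versions write identical contents for this size. */
static int same_output(int size) {
  size_t n = (size_t)(size > 0 ? size : 1);
  int *a = malloc(n * sizeof(int));
  int *b = malloc(n * sizeof(int));
  assert(a != NULL && b != NULL);
  fill_rolled(a, size);
  fill_unrolled2(b, size);
  int ok = 1;
  for (int i = 0; i < size; i++) {
    if (a[i] != b[i]) {
      ok = 0;
    }
  }
  free(a);
  free(b);
  return ok;
}
```

The two stores in the unrolled body don't depend on each other, which is the kind of independence that unrolling is meant to expose. Whether that helps on a given SVE core is exactly the question here.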

Since SVE already makes full use of the vector register lanes, unrolling doesn't look like it brings any performance gain.

I believe there is a gain when the processor has more than one add execution unit.

Thanks @Lysias
Why consider only the add execution unit?
More precisely, do you mean that unrolling only gives a performance gain when multiple functional units are available for all of the instructions in the loop?