[ARM SVE] *_ZI instructions are not selected in loop bodies: SVEAllActive cannot see through BB boundaries

Hi,

I hope this is the right place to ask this question. I am working on an academic project extending ARM SVE.

I find that the *_ZI variants of instructions (e.g. ADD_ZI, MUL_ZI) are not selected inside a loop.

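#include <arm_sve.h>
#include <stdint.h>

const int VB = svcntb();   // SVE vector length in bytes
// fixvec() and mt() are helpers that are not shown in this post.
void test(int n, int8_t* pdata) {
  for (int i = 0; i < n; i++, pdata += VB) {
    svint64_t v = fixvec(mt(), mt());   // generate random vector via dup instructions
    v = svmul_n_s64_m(svptrue_b64(), v, 3);   // expected MUL_ZI here
    v = svadd_n_s64_m(svptrue_b64(), v, 1);   // expected ADD_ZI here
    svst1(svptrue_b64(), (int64_t*)pdata, v);
  }
}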

I am no LLVM expert, but judging from the llc debug log, it seems that:

  • The all-true predicates are hoisted out of the loop body by the LICM pass.
  • The instructions inside the loop body then take a CopyFromReg node as their predicate argument.
  • SVEAllActive, which guards the selection of *_ZI instructions, cannot tell that the hoisted predicate is all true. It only looks through REINTERPRET_CAST, not CopyFromReg.
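To make the three points above concrete, here is a toy model of the check. This is only a sketch and not LLVM code: `Node`, `Kind`, and `isAllActiveBlockLocal` are names made up for illustration, and the real SVEAllActive logic may differ in detail. It only shows why a check that looks through casts still fails once the predicate is defined in another block.

```cpp
#include <cassert>

// Toy node kinds. The names echo the ones in the post, but this is an
// illustrative model, not the SelectionDAG data structure.
enum class Kind { PTrueAll, ReinterpretCast, CopyFromReg, Other };

struct Node {
  Kind kind;
  const Node *operand; // single operand is enough for this sketch
};

// Block-local "is this predicate all active?" check (hypothetical helper).
// It looks through casts, but stops at CopyFromReg: the node that defines
// the value lives in another basic block and is not part of this DAG.
bool isAllActiveBlockLocal(const Node *n) {
  assert(n && "predicate node must not be null");
  while (n->kind == Kind::ReinterpretCast && n->operand)
    n = n->operand;
  return n->kind == Kind::PTrueAll;
}
```

In this model, a ptrue (or a cast of one) in the same block passes the check, while the CopyFromReg left behind after hoisting does not, even though the value is the same at run time.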

I am not sure if this is the intended behavior. It increases the instruction count, because the *_ZI variants are not selected.

Maybe some pass has to run before LICM to retain the AllActive predicate information for instruction selection.

Regards

SelectionDAG ISel can’t see values across basic blocks, so it’s impossible to write a pattern to generate the optimal instruction; any value defined in a different block is going to be an opaque CopyFromReg.

We normally don’t try to block LICM to deal with this sort of issue. A few possibilities:

  • We move stuff around before isel (AArch64TargetLowering::shouldSinkOperands).
  • We pattern-match after isel (AArch64MIPeepholeOpt).
  • We transform the intrinsic to a native IR instruction (AArch64TTIImpl::instCombineIntrinsic).
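As a rough illustration of the second option, here is a toy post-isel peephole. It is a sketch under assumptions and not how AArch64MIPeepholeOpt is actually written: `MInstr`, `foldAllActivePredicates`, and the opcode strings `PTRUE_ALL`, `ADD_PRED_IMM`, and `ADD_IMM` are all placeholders invented for this example. The point is only that a pass running after isel sees the whole function, so it can look up the definition of a predicate register even when that definition sits in a different block.

```cpp
#include <cassert>
#include <map>
#include <string>
#include <vector>

// Toy machine instruction. Opcode names are placeholders, not real
// AArch64 opcodes.
struct MInstr {
  std::string opcode;     // e.g. "PTRUE_ALL", "ADD_PRED_IMM", "ADD_IMM"
  int def;                // virtual register defined, or -1 for none
  std::vector<int> uses;  // virtual registers read; predicate comes first
};

using Block = std::vector<MInstr>;
using Function = std::vector<Block>;

// Rewrites predicated adds whose predicate is defined by PTRUE_ALL anywhere
// in the function into the unpredicated form. Returns the rewrite count.
int foldAllActivePredicates(Function &f) {
  // Assuming SSA form, each virtual register has exactly one definition,
  // so a single map covers every block of the function.
  std::map<int, std::string> defOpcode;
  for (const Block &b : f)
    for (const MInstr &mi : b)
      if (mi.def >= 0)
        defOpcode[mi.def] = mi.opcode;

  int rewrites = 0;
  for (Block &b : f) {
    for (MInstr &mi : b) {
      if (mi.opcode != "ADD_PRED_IMM")
        continue;
      assert(!mi.uses.empty() && "predicate operand expected first");
      auto it = defOpcode.find(mi.uses[0]);
      if (it == defOpcode.end() || it->second != "PTRUE_ALL")
        continue;                       // unknown or partial predicate
      mi.opcode = "ADD_IMM";            // switch to the unpredicated form
      mi.uses.erase(mi.uses.begin());   // drop the predicate operand
      ++rewrites;
    }
  }
  return rewrites;
}
```

With a `PTRUE_ALL` in a preheader block and an `ADD_PRED_IMM` using it in the loop block, the function rewrites that add and leaves adds with any other predicate untouched.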

(If you’re seeing a missed optimization with unmodified LLVM, please file a bug on the llvm/llvm-project issue tracker on GitHub so we don’t lose track of it.)

Thanks a lot for the reply. I am still trying to digest the possibilities you mentioned (I am not an LLVM expert).

I have also opened an issue in llvm-project on GitHub.