Crash in SLP for vector data type as function argument.

Hi Shahid,

Thanks for the reply.

Yes, you are right: emitReduction() takes the vectorized value, which is a leaf of the tree.
I was confused by the name of the argument passed in the call to emitReduction():

Value *ReducedSubTree = emitReduction(VectorizedRoot, Builder);

Anyway, that hardly matters.

This is the test case I mentioned:

int foo(uint32x4_t a) {
  return a[0] + a[1] + a[2] + a[3];
}

LLVM IR :

define i32 @hadd(<4 x i32> %a) {
entry:
   %vecext = extractelement <4 x i32> %a, i32 0
   %vecext1 = extractelement <4 x i32> %a, i32 1
   %add = add i32 %vecext, %vecext1
   %vecext2 = extractelement <4 x i32> %a, i32 2
   %add3 = add i32 %add, %vecext2
   %vecext4 = extractelement <4 x i32> %a, i32 3
   %add5 = add i32 %add3, %vecext4
   ret i32 %add5
}

When the leaf %vecext is reached, vectorizeTree() sets VectorizedValue to operand 0 of the extractelement instruction:

case Instruction::ExtractElement: {
  if (CanReuseExtract(E->Scalars)) {
    Value *V = VL0->getOperand(0);
    E->VectorizedValue = V;
    return V;
  }
  return Gather(E->Scalars, VecTy);
}

Then, in emitReduction(), VectorizedValue is dyn_cast to Instruction.
In the IR above, %a is a function argument, not an instruction, so the cast returns null,
and the crash occurs when that null pointer is used.

Instruction *ValToReduce = dyn_cast<Instruction>(VectorizedValue);
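
The failure mode can be reproduced outside LLVM with a small model. This is only a sketch: the Value, Instruction and Argument structs below are hypothetical stand-ins that mirror the inheritance relation, not the LLVM classes, and standard dynamic_cast is used in place of llvm::dyn_cast. The helper names asInstruction and castIsSafe are also made up for illustration.

```cpp
#include <cassert>
#include <string>
#include <utility>

// Hypothetical stand-ins for llvm::Value, llvm::Instruction and
// llvm::Argument. They only mirror the class hierarchy.
struct Value {
  std::string Name;
  explicit Value(std::string N) : Name(std::move(N)) {}
  virtual ~Value() = default;
};
struct Instruction : Value { using Value::Value; };
struct Argument : Value { using Value::Value; };

// Models the cast in emitReduction(): dynamic_cast plays the role of
// dyn_cast and yields nullptr when V is not an Instruction.
const Instruction *asInstruction(const Value *V) {
  return dynamic_cast<const Instruction *>(V);
}

// Returns true only when the result of the cast may be dereferenced.
bool castIsSafe(const Value *VectorizedValue) {
  assert(VectorizedValue && "Need to have a vectorized tree node");
  return asInstruction(VectorizedValue) != nullptr;
}
```

In this model an Argument named "a" makes castIsSafe return false, which corresponds to the null ValToReduce above, while an Instruction such as a load makes it return true.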

Note: the test case above won't crash with the current svn version, since the code for parsing the tree for this IR
has not been committed to svn yet. The initial patch was submitted at http://reviews.llvm.org/D6818.
I am refining it; however, my parsing patch does not touch the code flow described above.
You can try to reproduce the problem by applying that patch to a local checkout.

When the vector 'a' is at global scope, a 'load' instruction is generated in the function's basic block:

Test case 2:

uint32x4_t a;
int foo() {
  return a[0] + a[1] + a[2] + a[3];
}

IR for the test case above:

@a = common global <4 x i32> zeroinitializer, align 16

define i32 @hadd() #0 {
entry:
   %0 = load <4 x i32>* @a, align 16, !tbaa !1
   %vecext = extractelement <4 x i32> %0, i32 0
   %vecext1 = extractelement <4 x i32> %0, i32 1
   %add = add i32 %vecext, %vecext1
   %vecext2 = extractelement <4 x i32> %0, i32 2
   %add3 = add i32 %add, %vecext2
   %vecext4 = extractelement <4 x i32> %0, i32 3
   %add5 = add i32 %add3, %vecext4
   ret i32 %add5
}

Here, operand 0 of the leaf %vecext is a load instruction,
so the dyn_cast to Instruction succeeds and the reduction is emitted correctly.

How can we solve this problem? What type should a function argument be cast to?

Regards,
Suyog

Sender : Shahid, Asghar-ahmad<Asghar-ahmad.Shahid@amd.com>
Title : RE: [LLVMdev] Crash in SLP for vector data type as function argument.

Hi Suyog,

IMO emitReduction() takes a vectorized value, which represents the leaves of the matched pattern/tree.
So what you are thinking of as the root is actually a leaf of the tree.
The root should be the value that is fed to the "return" statement.

It would be a great help if you could share the sample test.

Regards,
Shahid

Hi Suyog,

Since CanReuseExtract(E->Scalars) properly checks whether operand zero of
"extractelement" can be reused, using the code below in emitReduction may help resolve this issue.
emitReduction(...) {
...
Instruction *ValToReduce = dyn_cast<Instruction>(VectorizedValue);

The code in emitReduction has to be fixed. As your example shows, it is not safe to assume that we will always have an instruction as the result of vectorizeTree(). It seems to me that we can just remove the line that performs the cast. All subsequent uses of the value "ValToReduce" are actually uses of "Value *TmpVec". The IRBuilder in the variable "Builder" carries the insertion point for all operations in this function (inserting after the instruction "ValToReduce" would have been a reason to need an "Instruction").

  /// \brief Emit a horizontal reduction of the vectorized value.
  Value *emitReduction(Value *VectorizedValue, IRBuilder<> &Builder) {
    assert(VectorizedValue && "Need to have a vectorized tree node");
    Instruction *ValToReduce = dyn_cast<Instruction>(VectorizedValue);
    assert(isPowerOf2_32(ReduxWidth) &&
           "We only handle power-of-two reductions for now");

    Value *TmpVec = ValToReduce;
    for (unsigned i = ReduxWidth / 2; i != 0; i >>= 1) {
      if (IsPairwiseReduction) {
        Value *LeftMask =
          createRdxShuffleMask(ReduxWidth, i, true, true, Builder);
        Value *RightMask =
          createRdxShuffleMask(ReduxWidth, i, true, false, Builder);

        Value *LeftShuf = Builder.CreateShuffleVector(
          TmpVec, UndefValue::get(TmpVec->getType()), LeftMask, "rdx.shuf.l");
        Value *RightShuf = Builder.CreateShuffleVector(
          TmpVec, UndefValue::get(TmpVec->getType()), (RightMask),
          "rdx.shuf.r");
        TmpVec = createBinOp(Builder, ReductionOpcode, LeftShuf, RightShuf,
                             "bin.rdx");
      } else {
        Value *UpperHalf =
          createRdxShuffleMask(ReduxWidth, i, false, false, Builder);
        Value *Shuf = Builder.CreateShuffleVector(
          TmpVec, UndefValue::get(TmpVec->getType()), UpperHalf, "rdx.shuf");
        TmpVec = createBinOp(Builder, ReductionOpcode, TmpVec, Shuf, "bin.rdx");
      }
    }
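
To see why the loop above needs only a value and never an Instruction, here is a scalar model of its non-pairwise branch. It is a sketch under stated assumptions: a std::vector<int> stands in for the vector Value, the reduction opcode is fixed to add, the "rdx.shuf" mask is assumed to move the upper half of the active lanes into the lower lanes (as the name UpperHalf suggests), and the result is assumed to end up in lane 0. The function name reduceAdd is hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical scalar model of the non-pairwise reduction loop.
// No insertion point or Instruction is involved, only the value itself.
int reduceAdd(std::vector<int> TmpVec) {
  const std::size_t ReduxWidth = TmpVec.size();
  assert(ReduxWidth != 0 && (ReduxWidth & (ReduxWidth - 1)) == 0 &&
         "We only handle power-of-two reductions for now");

  for (std::size_t i = ReduxWidth / 2; i != 0; i >>= 1) {
    // Model of "rdx.shuf": lane j reads lane j + i. The remaining lanes
    // are undefined in the IR; the model fills them with 0.
    std::vector<int> Shuf(ReduxWidth, 0);
    for (std::size_t j = 0; j != i; ++j)
      Shuf[j] = TmpVec[j + i];

    // Model of "bin.rdx": lane-wise add of TmpVec and Shuf.
    for (std::size_t j = 0; j != ReduxWidth; ++j)
      TmpVec[j] += Shuf[j];
  }
  // Assumption: lane 0 holds the reduced value after the last step.
  return TmpVec[0];
}
```

For a four-lane input such as {1, 2, 3, 4}, the model takes two steps (i = 2, then i = 1), matching the log2(ReduxWidth) shuffle/add pairs the loop emits.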

Thanks,
Arnold