Hi all,
One of the reasons the Livermore Loops couldn’t be vectorized is that they use global structures to hold the arrays. Today, I’m investigating why that is so and how to fix it.
My investigation brought me to LoopVectorizationLegality::canVectorizeMemory():
if (WriteObjects.count(*it)) {
DEBUG(dbgs() << "LV: Found a possible read/write reorder:"
<< **it <<"\n");
return false;
}
In the first pass, it registers the underlying objects of all writes; then it does the same for reads, and if an object is already in the write set, it’s treated as a conflict.
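To make the check concrete, here is a minimal standalone sketch of the two passes. It is not the LLVM code: `Access` and `noReadWriteConflict` are hypothetical names, and a string stands in for the underlying-object pointer.

```cpp
#include <cassert>
#include <set>
#include <string>
#include <vector>

// Hypothetical stand-in for a memory access; only the underlying object matters.
struct Access {
    std::string underlyingObject; // e.g. "Foo" for any GEP into @Foo
    bool isWrite;
};

// Returns true when no read shares an underlying object with a write.
bool noReadWriteConflict(const std::vector<Access> &accesses) {
    std::set<std::string> writeObjects;
    // First pass: register the underlying object of every write.
    for (const Access &a : accesses) {
        assert(!a.underlyingObject.empty() && "access must name an object");
        if (a.isWrite)
            writeObjects.insert(a.underlyingObject);
    }
    // Second pass: a read whose object is already registered is a conflict.
    for (const Access &a : accesses)
        if (!a.isWrite && writeObjects.count(a.underlyingObject))
            return false;
    return true;
}
```

If all three accesses report "Foo" as their object, the function returns false, which is the situation described here. If each access reported its own field instead, it would return true.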
However, the read is from Foo.bl / Foo.cl and the write to Foo.al, so why is GetUnderlyingObjects() returning the same objects/pointers?
A quick look at it revealed the problem:
llvm::GetUnderlyingObject(Value *V, const DataLayout *TD, unsigned MaxLookup) yields:
→ GEPOperator *GEP = dyn_cast<GEPOperator>(V)
→ V = GEP->getPointerOperand();
→ GlobalAlias *GA = dyn_cast<GlobalAlias>(V)
→ V = GA->getAliasee();
return V;
In this case, V is a reference to the structure, not the element. It seems to me that assigning the pointer operand from GEP is too simplistic. Either GetUnderlyingObject() should store the indices to return the correct object, or GetUnderlyingObjects() should create a special case for it (as it does with selects and phi nodes).
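As a rough illustration of the "store the indices" idea, the sketch below keys an access on its base plus the leading run of constant GEP indices, rather than on the base alone. Everything in it is hypothetical (`Index`, `GepModel`, `ObjectKey`, `refinedUnderlyingObject`); it models a GEP with plain structs and does not use the LLVM API. It also assumes the trailing variable index stays inside the selected field, which a real fix would have to prove.

```cpp
#include <cassert>
#include <string>
#include <utility>
#include <vector>

// Hypothetical model of one GEP index: a known constant or an unknown value.
struct Index {
    bool isConstant;
    long long value; // meaningful only when isConstant is true
};

// Hypothetical model of a GEP: base pointer name plus its index list.
struct GepModel {
    std::string base; // e.g. "@Foo"
    std::vector<Index> indices;
};

// Base plus the constant index prefix, e.g. ("@Foo", {0, 1}).
typedef std::pair<std::string, std::vector<long long> > ObjectKey;

ObjectKey refinedUnderlyingObject(const GepModel &gep) {
    assert(!gep.base.empty() && "GEP must have a base pointer");
    ObjectKey key(gep.base, std::vector<long long>());
    for (const Index &i : gep.indices) {
        if (!i.isConstant)
            break; // stop at the first non-constant index
        key.second.push_back(i.value);
    }
    return key;
}
```

With the IR from the PS, the load through indices (0, 1, %idxprom) and the store through (0, 0, %idxprom) get different keys, even though both share the base @Foo.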
Does that make sense?
cheers,
--renato
PS:
A simplified version of the IR:
%struct.anon = type { [256 x i64], [256 x i64], [256 x i64] }
@Foo = common global %struct.anon zeroinitializer, align 8
…
%arrayidx = getelementptr inbounds %struct.anon* @Foo, i32 0, i32 1, i32 %idxprom
%0 = load i64* %arrayidx, align 8
%arrayidx2 = getelementptr inbounds %struct.anon* @Foo, i32 0, i32 2, i32 %idxprom
%1 = load i64* %arrayidx2, align 8
%mul = mul nsw i64 %1, %0
%arrayidx4 = getelementptr inbounds %struct.anon* @Foo, i32 0, i32 0, i32 %idxprom
store i64 %mul, i64* %arrayidx4, align 8