I am interested in how IRTranslator and LegalizerHelper handle alloca, specifically when
the alloca's size is dynamic and its alignment is stricter than what the target requires for its stack.
When both hold, translateAlloca rounds the size of the alloca up to the target's stack alignment here:
bool IRTranslator::translateAlloca(const User &U,
                                   MachineIRBuilder &MIRBuilder) {
  ...
  // Round the size of the allocation up to the stack alignment size
  // by add SA-1 to the size. This doesn't overflow because we're computing
  // an address inside an alloca.
  unsigned StackAlign =
      MF->getSubtarget().getFrameLowering()->getStackAlignment();
  auto SAMinusOne = MIRBuilder.buildConstant(IntPtrTy, StackAlign - 1);
  auto AllocAdd = MIRBuilder.buildAdd(IntPtrTy, AllocSize, SAMinusOne,
                                      MachineInstr::NoUWrap);
  auto AlignCst =
      MIRBuilder.buildConstant(IntPtrTy, ~(uint64_t)(StackAlign - 1));
  auto AlignedAlloc = MIRBuilder.buildAnd(IntPtrTy, AllocAdd, AlignCst);
  Align Alignment = std::max(AI.getAlign(), DL->getPrefTypeAlign(Ty));
  if (Alignment <= StackAlign)
    Alignment = Align(1);
  MIRBuilder.buildDynStackAlloc(getOrCreateVReg(AI), AlignedAlloc, Alignment);
  ...
Then, LegalizerHelper aligns the pointer result of G_DYN_STACKALLOC here:
Register LegalizerHelper::getDynStackAllocTargetPtr(Register SPReg,
                                                    Register AllocSize,
                                                    Align Alignment,
                                                    LLT PtrTy) {
  ...
  if (Alignment > Align(1)) {
    APInt AlignMask(IntPtrTy.getSizeInBits(), Alignment.value(), true);
    AlignMask.negate();
    auto AlignCst = MIRBuilder.buildConstant(IntPtrTy, AlignMask);
    Alloc = MIRBuilder.buildAnd(IntPtrTy, Alloc, AlignCst);
  }
  return MIRBuilder.buildCast(PtrTy, Alloc).getReg(0);
}
In my case Alignment > Align(1) is true because the alloca's alignment is stricter than StackAlign. Doesn't rounding up the size of the alloca in IRTranslator become redundant if the legalizer aligns the resulting pointer either way? If so, I think codegen would improve if translateAlloca skipped padding the size whenever the alloca's alignment is stricter than the stack alignment.
Otherwise, is there a reason why the size of the alloca must be aligned, and not just the pointer result?
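To make the question concrete, here is a minimal standalone sketch of the arithmetic as I understand it. It is not LLVM code: roundSizeUp, allocPtr and unpaddedIsSafe are hypothetical helper names, the stack is modeled as a plain integer that grows downward, and I am assuming the elided part of getDynStackAllocTargetPtr subtracts AllocSize from SP before the masking step. It also assumes SP starts out aligned to StackAlign and that both alignments are powers of two.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical model of the size rounding quoted from translateAlloca:
// (Size + SA - 1) & ~(SA - 1). StackAlign must be a power of two.
uint64_t roundSizeUp(uint64_t Size, uint64_t StackAlign) {
  return (Size + StackAlign - 1) & ~(StackAlign - 1);
}

// Hypothetical model of getDynStackAllocTargetPtr. Assumption: the elided
// code computes SP - AllocSize; the mask step matches the quoted snippet
// (AND with -Alignment, i.e. clear the low bits).
uint64_t allocPtr(uint64_t SP, uint64_t Size, uint64_t Alignment) {
  uint64_t Alloc = SP - Size;
  if (Alignment > 1)
    Alloc &= ~(Alignment - 1);
  return Alloc;
}

// Checks, for one set of inputs, that skipping the size padding still gives
// a pointer aligned to Alignment and to StackAlign, with at least Size bytes
// between the new pointer and the old SP.
bool unpaddedIsSafe(uint64_t SP, uint64_t Size, uint64_t StackAlign,
                    uint64_t Alignment) {
  uint64_t P = allocPtr(SP, Size, Alignment);
  return P % Alignment == 0 && P % StackAlign == 0 && SP - P >= Size;
}
```

For example, with SP = 0x1000, a 20-byte request, StackAlign = 16 and Alignment = 64, allocPtr returns 0xFC0 whether or not the size is first rounded up to 32. Under the assumptions above that seems to hold in general, since every multiple of Alignment is also a multiple of StackAlign, but the model says nothing about target-specific reasons for keeping the padding.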
I traced this back to the implementation reviewed in D66678.
Thanks.