optimisation issue in an llvm IR pass

Hello,

I have an optimisation issue in an llvm IR pass - the issue being that
unnecessary instructions are generated in the final assembly (with -O3).

I want to create the following assembly snippet:

  mov dl,BYTE PTR [rsi+rdi*1]
  add dl,0x1
  adc dl,0x0
  mov BYTE PTR [rsi+rdi*1],dl

however what is created is (variant #1):

  mov dl,BYTE PTR [rsi+rdx*1]
  add dl,0x1
  cmp dl,0x1 // <- not needed
  adc dl,0x0
  mov BYTE PTR [rsi+rdi*1],dl

in variant #2 the following is created:

mov dl,BYTE PTR [rsi+rdi*1]
mov ecx,edx // <- not needed
add cl,0x1 // <- should be done to dl instead
adc dl,0x1
mov BYTE PTR [rsi+rdi*1],dl

Far below are both variants with the full code around it, however the
difference in both variants is this:

//variant1
auto cf = IRB.CreateICmpEQ(Incr, ConstantInt::get(Int8Ty, 0));
Incr = IRB.CreateAdd(Incr, cf);

//variant2
auto cf = IRB.CreateICmpULT(Incr, ConstantInt::get(Int8Ty, 1));
Incr = IRB.CreateAdd(Incr, cf);

//interestingly this totally different approach creates the same
instructions as variant2
CallInst *AddOv = IRB.CreateBinaryIntrinsic(Intrinsic::uadd_with_over
flow, Counter, ConstantInt::get(Int8Ty, 1));
AddOv->setMetadata(M.getMDKindID("nosanitize"), MDNode::get(C, None));
Value *SumWithOverflowBit = AddOv;
Incr = IRB.CreateAdd(IRB.CreateExtractValue(SumWithOverflowBit, 0),
IRB.CreateZExt(IRB.CreateExtractValue(SumWithOverflowBit, 1), Int8Ty));

So - How do I have to write the code that the target code has a chance
of being generated?
For me its the same result on LLVM 6.0 and 7.

Alternatively
  add BYTE PTR [rsi+rdi*1],0x1
  adc BYTE PTR [rsi+rdi*1],0x0
would work as well.

Thank you very much!

Regards,
Marc

-> Below now is the full llvm IR pass code

// vvv same code before both variants
LoadInst *PrevLoc = IRB.CreateLoad(AFLPrevLoc);
PrevLoc->setMetadata(M.getMDKindID("nosanitize"), MDNode::get(C, None));
Value *PrevLocCasted = IRB.CreateZExt(PrevLoc, IRB.getInt32Ty());

LoadInst *MapPtr = IRB.CreateLoad(AFLMapPtr);
MapPtr->setMetadata(M.getMDKindID("nosanitize"), MDNode::get(C, None));
Value *MapPtrIdx = IRB.CreateGEP(MapPtr, IRB.CreateXor(PrevLocCasted,
CurLoc));

LoadInst *Counter = IRB.CreateLoad(MapPtrIdx);
Counter->setMetadata(M.getMDKindID("nosanitize"), MDNode::get(C, None));

Value *Incr = IRB.CreateAdd(Counter, ConstantInt::get(Int8Ty, 1));
// ^^^ same code before both variants

// CODE FOR VARIANT #1
auto cf = IRB.CreateICmpEQ(Incr, ConstantInt::get(Int8Ty, 0));
Incr = IRB.CreateAdd(Incr, cf);

// CODE FOR VARIANT #2
auto cf = IRB.CreateICmpULT(Incr, ConstantInt::get(Int8Ty, 1));
Incr = IRB.CreateAdd(Incr, cf);

// vvv same code after both variants
IRB.CreateStore(Incr,
MapPtrIdx)->setMetadata(M.getMDKindID("nosanitize"), MDNode::get(C, None));
StoreInst *Store = IRB.CreateStore(ConstantInt::get(Int32Ty, cur_loc >>
1), AFLPrevLoc);
Store->setMetadata(M.getMDKindID("nosanitize"), MDNode::get(C, None));
// ^^^ same code after both variants

Don’t the CreateICmp calls return a Value* with an i1 type? But then they are added to an i8 type? Not sure that works.

Have you tried using the llvm.uadd.with.overflow.i8 intrinsic?

Hi Craig,

Don't the CreateICmp calls return a Value* with an i1 type? But then
they are added to an i8 type? Not sure that works.

I had that initially:
auto cf = IRB.CreateICmpULT(Incr, ConstantInt::get(Int8Ty, 1));
auto carry = IRB.CreateZExt(cf, Int8Ty);
Incr = IRB.CreateAdd(Incr, carry);
it makes no difference to the generated assembly

Have you tried using the llvm.uadd.with.overflow.i8 intrinsic?

we have tried this:

CallInst *AddOv =
IRB.CreateBinaryIntrinsic(Intrinsic::uadd_with_overflow, Counter,
ConstantInt::get(Int8Ty, 1));
AddOv->setMetadata(M.getMDKindID("nosanitize"), MDNode::get(C, None));
Value *SumWithOverflowBit = AddOv;
Incr = IRB.CreateAdd(IRB.CreateExtractValue(SumWithOverflowBit, 0),
IRB.CreateZExt(IRB.CreateExtractValue(SumWithOverflowBit, 1), Int8Ty));

but that generated exactly the same as

auto cf = IRB.CreateICmpULT(Incr, ConstantInt::get(Int8Ty, 1));
Incr = IRB.CreateAdd(Incr, cf);

All the above create (with llvm 6.0, 7 and 8):

mov dl,BYTE PTR [rsi+rdi*1]
mov ecx,edx // <- not needed
add cl,0x1 // <- should be done to dl instead
adc dl,0x1
mov BYTE PTR [rsi+rdi*1],dl

Thanks!

Regards,
Marc

This patch fixes the issue. I haven’t checked our lit tests yet to see what effect this has and if we can commit it.

lib/CodeGen/SelectionDAG/DAGCombiner.cpp

@@ -2967,7 +2967,8 @@ SDValue DAGCombiner::visitADDCARRYLike(SDValue N0, SDValue N1, SDValue CarryIn,

// Iff the flag result is dead:

// (addcarry (add|uaddo X, Y), 0, Carry) → (addcarry X, Y, Carry)

if ((N0.getOpcode() == ISD::ADD ||

  • (N0.getOpcode() == ISD::UADDO && N0.getResNo() == 0)) &&
  • (N0.getOpcode() == ISD::UADDO && N0.getResNo() == 0 &&

  • N0.getValue(1) != CarryIn)) &&

isNullConstant(N1) && !N->hasAnyUseOfValue(1))

return DAG.getNode(ISD::ADDCARRY, SDLoc(N), N->getVTList(),

N0.getOperand(0), N0.getOperand(1), CarryIn);

Patch posted for review here https://reviews.llvm.org/D64190

~Craig

Hello Craig,

thanks for filing a patch!
I would have hoped that the issue is on my side and not an issue in llvm
though :wink:

Is that patch for llvm 9 (only)?

Regards,
Marc

Yeah it will only be for 9.0

Looks like something like this will work on older versions, but it uses an X86 specific intrinsic so isn’t portable to other targets.

define void @foo(i32* %x) {
%a = load i32, i32* %x
%b = tail call { i8, i32 } @llvm.x86.addcarry.32(i8 0, i32 %a, i32 1)
%c = extractvalue { i8, i32 } %b, 0
%d = extractvalue { i8, i32 } %b, 1
%e = zext i8 %c to i32
%f = add i32 %d, %e
store i32 %f, i32* %x
ret void
}
declare { i8, i32 } @llvm.x86.addcarry.32(i8 , i32, i32)