[AArch64] Missed FCCMP opportunity

Hi everyone! I am trying to work on this issue: [AArch64] Missed FCCMP opportunity (Issue #60819, llvm/llvm-project on GitHub). The goal is to improve the machine code that LLVM generates so that it matches what GCC produces. I am new to LLVM, so could anyone give me some guidance on how to approach this issue?

I added some notes on the bug. “-debug-only=isel” is generally useful for looking at issues related to SelectionDAG.

I think that to optimize the assembly I need to write a separate function, something like this:

static SDValue performFloatOpt(SDNode *N, SelectionDAG &DAG) {
  EVT VT = N->getValueType(0);
  SDValue Cmp0 = N->getOperand(0);
  SDValue Cmp1 = N->getOperand(1);
  SDLoc DL(N);
  SDValue Cmp, Condition;
  unsigned NZCV;

  if (!Cmp0.getValueType().isFloatingPoint() || !Cmp1.getValueType().isFloatingPoint())
    return SDValue();

  if (!Cmp0->hasOneUse() || !Cmp1->hasOneUse())
    return SDValue();

  Condition = DAG.getConstant(AArch64CC::VS, DL, MVT::i32);
  NZCV = AArch64CC::getNZCVToSatisfyCondCode(AArch64CC::VS);
  Cmp = Cmp1;

  SDValue NZCVOp = DAG.getConstant(NZCV, DL, MVT::i32);
  SDValue CCmp = DAG.getNode(AArch64ISD::FCCMP, DL, MVT::i32, Cmp.getOperand(0), Cmp.getOperand(1), NZCVOp, Condition);

  SDValue CSel = DAG.getNode(AArch64ISD::CSINC, DL, MVT::i32, DAG.getConstant(0, DL, MVT::i32), CCmp, Condition);
  return DAG.getNode(ISD::AND, DL, MVT::i32, Cmp0.getOperand(0), Cmp1.getOperand(0));
}

I am calling this function inside performORCombine() and performANDCombine(),
but I don’t know why it’s not working. Could you please share your insights?

Not sure what “not working” means. If you mean the optimization isn’t triggering, maybe add some debug prints to check that your code is getting called?

I’m not sure the FCCMP node you’re creating is correct; I think there are normally five operands?
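To make the "five operands" point concrete, here is a minimal sketch of what an FCMP followed by an FCCMP computes, written as plain C++ rather than LLVM code. Everything in it (`Flags`, `fcmp`, `fccmp`, `bothNaN`) is a hypothetical stand-in for illustration. The five inputs of `fccmp` are the two values to compare, the fallback NZCV immediate, the condition, and the incoming flags; I believe that mirrors what the SelectionDAG node expects, but treat the mapping as an assumption and check it against emitConditionalComparison.

```cpp
#include <cassert>
#include <cmath>
#include <limits>

// Toy model of the NZCV flags; a stand-in for illustration, not an LLVM type.
struct Flags {
  bool N, Z, C, V;
};

// Flags left behind by a floating-point compare.
Flags fcmp(float lhs, float rhs) {
  if (std::isnan(lhs) || std::isnan(rhs))
    return {false, false, true, true};   // unordered: 0011
  if (lhs == rhs)
    return {false, true, true, false};   // equal: 0110
  if (lhs < rhs)
    return {true, false, false, false};  // less than: 1000
  return {false, false, true, false};    // greater than: 0010
}

// Decode a 4-bit NZCV immediate (N is bit 3, V is bit 0).
Flags flagsFromImm(unsigned nzcv) {
  return {(nzcv & 8) != 0, (nzcv & 4) != 0, (nzcv & 2) != 0, (nzcv & 1) != 0};
}

// Only the two conditions that matter for this example are modelled.
enum class Cond { VS, VC };

bool holds(Cond cond, Flags f) { return cond == Cond::VS ? f.V : !f.V; }

// Conditional compare: if `cond` holds on the incoming flags, do the real
// compare; otherwise load the flags from the immediate.
Flags fccmp(float lhs, float rhs, unsigned nzcvImm, Cond cond, Flags in) {
  return holds(cond, in) ? fcmp(lhs, rhs) : flagsFromImm(nzcvImm);
}

// isnan(a) && isnan(b) expressed as FCMP + FCCMP + a final VS test.
bool bothNaN(float a, float b, unsigned fallbackImm = 0) {
  Flags f = fcmp(a, a);                       // V set iff a is NaN
  f = fccmp(b, b, fallbackImm, Cond::VS, f);  // compare b only if a was NaN
  return holds(Cond::VS, f);
}
```

Note the fallback immediate: for an AND it has to make the final VS test fail, so the sketch defaults to 0. Passing 1 (V set) would make `bothNaN(1.0f, NaN, 1)` return true, which is wrong, so the NZCV value chosen in the combine may be worth double-checking.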

Yes, the optimization is not triggering. I tried adding debug messages, and I think the function is not getting called because the debug message is never printed.

I don’t know for sure whether there should be five operands. But if five operands were required, the build should fail with an error, right?

The way the getNode() API works, the API itself doesn’t know the right number of operands for target-specific nodes, so getting it wrong would probably show up as an error at runtime during the final isel step.

Looking again, I’m not sure Cmp0.getValueType().isFloatingPoint() is checking what you want it to (Cmp0.getValueType() is the type of the result of the compare, not of its operands).
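As a rough illustration of that distinction, here is a toy model in plain C++. `Ty`, `Node`, and `isFPCompare` are hypothetical stand-ins, not LLVM's EVT or SDNode; the sketch only shows that a compare node carries an integer result type while its operands carry the floating-point type. In the real combine, the analogous check would presumably look at something like Cmp0.getOperand(0).getValueType() rather than Cmp0.getValueType().

```cpp
#include <cassert>
#include <vector>

// Hypothetical stand-ins for illustration; these are not LLVM classes.
enum class Ty { I1, I32, F32 };

bool isFloatingPoint(Ty t) { return t == Ty::F32; }

struct Node {
  Ty result;                      // type of the value this node produces
  std::vector<const Node *> ops;  // operands feeding this node
};

// A floating-point compare has an integer result and floating-point operands,
// so the operand types are what need to be inspected.
bool isFPCompare(const Node &n) {
  return !isFloatingPoint(n.result) && n.ops.size() >= 2 &&
         isFloatingPoint(n.ops[0]->result) &&
         isFloatingPoint(n.ops[1]->result);
}
```

With this model, a check on the compare's own result type is always false for a compare, which would match the early return in the function above firing every time.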

=== foo2
Initial selection DAG: %bb.0 'foo2:entry'
SelectionDAG has 15 nodes:
  t0: ch,glue = EntryToken
      t2: f32,ch = CopyFromReg t0, Register:f32 %0
    t7: i1 = setcc t2, ConstantFP:f32<0.000000e+00>, setuo:ch
      t4: f32,ch = CopyFromReg t0, Register:f32 %1
    t8: i1 = setcc t4, ConstantFP:f32<0.000000e+00>, setuo:ch
  t9: i1 = and t7, t8
  t10: i32 = any_extend t9
    t11: i32 = zero_extend t9
  t13: ch,glue = CopyToReg t0, Register:i32 $w0, t11
  t14: ch = AArch64ISD::RET_FLAG t13, Register:i32 $w0, t13:1


Optimized lowered selection DAG: %bb.0 'foo2:entry'
SelectionDAG has 13 nodes:
  t0: ch,glue = EntryToken
  t2: f32,ch = CopyFromReg t0, Register:f32 %0
  t4: f32,ch = CopyFromReg t0, Register:f32 %1
        t16: i1 = setcc t2, t2, setuo:ch
        t15: i1 = setcc t4, t4, setuo:ch
      t9: i1 = and t16, t15
    t11: i32 = zero_extend t9
  t13: ch,glue = CopyToReg t0, Register:i32 $w0, t11
  t14: ch = AArch64ISD::RET_FLAG t13, Register:i32 $w0, t13:1


Type-legalized selection DAG: %bb.0 'foo2:entry'
SelectionDAG has 14 nodes:
  t0: ch,glue = EntryToken
  t2: f32,ch = CopyFromReg t0, Register:f32 %0
  t4: f32,ch = CopyFromReg t0, Register:f32 %1
        t17: i32 = setcc t2, t2, setuo:ch
        t18: i32 = setcc t4, t4, setuo:ch
      t19: i32 = and t17, t18
    t21: i32 = and t19, Constant:i32<1>
  t13: ch,glue = CopyToReg t0, Register:i32 $w0, t21
  t14: ch = AArch64ISD::RET_FLAG t13, Register:i32 $w0, t13:1


Optimized type-legalized selection DAG: %bb.0 'foo2:entry'
SelectionDAG has 12 nodes:
  t0: ch,glue = EntryToken
  t2: f32,ch = CopyFromReg t0, Register:f32 %0
  t4: f32,ch = CopyFromReg t0, Register:f32 %1
      t17: i32 = setcc t2, t2, setuo:ch
      t18: i32 = setcc t4, t4, setuo:ch
    t19: i32 = and t17, t18
  t13: ch,glue = CopyToReg t0, Register:i32 $w0, t19
  t14: ch = AArch64ISD::RET_FLAG t13, Register:i32 $w0, t13:1


Legalized selection DAG: %bb.0 'foo2:entry'
SelectionDAG has 16 nodes:
  t0: ch,glue = EntryToken
  t2: f32,ch = CopyFromReg t0, Register:f32 %0
  t4: f32,ch = CopyFromReg t0, Register:f32 %1
        t27: f32 = AArch64ISD::FCMP t2, t2
      t28: i32 = AArch64ISD::CSEL Constant:i32<0>, Constant:i32<1>, Constant:i32<7>, t27
        t24: f32 = AArch64ISD::FCMP t4, t4
      t26: i32 = AArch64ISD::CSEL Constant:i32<0>, Constant:i32<1>, Constant:i32<7>, t24
    t19: i32 = and t28, t26
  t13: ch,glue = CopyToReg t0, Register:i32 $w0, t19
  t14: ch = AArch64ISD::RET_FLAG t13, Register:i32 $w0, t13:1


Optimized legalized selection DAG: %bb.0 'foo2:entry'
SelectionDAG has 16 nodes:
  t0: ch,glue = EntryToken
  t2: f32,ch = CopyFromReg t0, Register:f32 %0
  t4: f32,ch = CopyFromReg t0, Register:f32 %1
        t27: f32 = AArch64ISD::FCMP t2, t2
      t28: i32 = AArch64ISD::CSEL Constant:i32<0>, Constant:i32<1>, Constant:i32<7>, t27
        t24: f32 = AArch64ISD::FCMP t4, t4
      t26: i32 = AArch64ISD::CSEL Constant:i32<0>, Constant:i32<1>, Constant:i32<7>, t24
    t19: i32 = and t28, t26
  t13: ch,glue = CopyToReg t0, Register:i32 $w0, t19
  t14: ch = AArch64ISD::RET_FLAG t13, Register:i32 $w0, t13:1


===== Instruction selection begins: %bb.0 'entry'

ISEL: Starting selection on root node: t14: ch = AArch64ISD::RET_FLAG t13, Register:i32 $w0, t13:1
ISEL: Starting pattern match
  Morphed node: t14: ch = RET_ReallyLR Register:i32 $w0, t13, t13:1
ISEL: Match complete!

ISEL: Starting selection on root node: t13: ch,glue = CopyToReg t0, Register:i32 $w0, t19

ISEL: Starting selection on root node: t19: i32 = and t28, t26
ISEL: Starting pattern match
  Initial Opcode index to 316493
  Match failed at index 316497
  Continuing at 316863
  Match failed at index 316866
  Continuing at 316911
  Match failed at index 316913
  Continuing at 316959
  Match failed at index 316965
  Continuing at 317002
  Morphed node: t19: i32 = CSELWr Register:i32 $wzr, t28, TargetConstant:i32<7>, t32:1
ISEL: Match complete!

ISEL: Starting selection on root node: t28: i32 = AArch64ISD::CSEL Constant:i32<0>, Constant:i32<1>, Constant:i32<7>, t27
ISEL: Starting pattern match
  Initial Opcode index to 358582
  TypeSwitch[i32] from 358599 to 358602
  Morphed node: t28: i32 = CSINCWr Register:i32 $wzr, Register:i32 $wzr, TargetConstant:i32<7>, t33:1
ISEL: Match complete!

ISEL: Starting selection on root node: t24: f32 = AArch64ISD::FCMP t4, t4
ISEL: Starting pattern match
  Initial Opcode index to 368891
  Skipped scope entry (due to false predicate) at index 368894, continuing at 368927
  Match failed at index 368933
  Continuing at 368948
  Morphed node: t24: i32 = FCMPSrr nofpexcept t4, t4
ISEL: Match complete!

ISEL: Starting selection on root node: t27: f32 = AArch64ISD::FCMP t2, t2
ISEL: Starting pattern match
  Initial Opcode index to 368891
  Skipped scope entry (due to false predicate) at index 368894, continuing at 368927
  Match failed at index 368933
  Continuing at 368948
  Morphed node: t27: i32 = FCMPSrr nofpexcept t2, t2
ISEL: Match complete!

ISEL: Starting selection on root node: t4: f32,ch = CopyFromReg t0, Register:f32 %1

ISEL: Starting selection on root node: t2: f32,ch = CopyFromReg t0, Register:f32 %0

ISEL: Starting selection on root node: t12: i32 = Register $w0

ISEL: Starting selection on root node: t3: f32 = Register %1

ISEL: Starting selection on root node: t1: f32 = Register %0

ISEL: Starting selection on root node: t0: ch,glue = EntryToken

===== Instruction selection ends:
Selected selection DAG: %bb.0 'foo2:entry'
SelectionDAG has 17 nodes:
  t0: ch,glue = EntryToken
  t2: f32,ch = CopyFromReg t0, Register:f32 %0
  t4: f32,ch = CopyFromReg t0, Register:f32 %1
          t27: i32 = FCMPSrr nofpexcept t2, t2
        t33: ch,glue = CopyToReg t0, Register:f32 $nzcv, t27
      t28: i32 = CSINCWr Register:i32 $wzr, Register:i32 $wzr, TargetConstant:i32<7>, t33:1
        t24: i32 = FCMPSrr nofpexcept t4, t4
      t32: ch,glue = CopyToReg t0, Register:f32 $nzcv, t24
    t19: i32 = CSELWr Register:i32 $wzr, t28, TargetConstant:i32<7>, t32:1
  t13: ch,glue = CopyToReg t0, Register:i32 $w0, t19
  t14: ch = RET_ReallyLR Register:i32 $w0, t13, t13:1


Total amount of phi nodes to update: 0
*** MachineFunction at end of ISel ***
# Machine code for function foo2: IsSSA, TracksLiveness
Function Live Ins: $s0 in %0, $s1 in %1

bb.0.entry:
  liveins: $s0, $s1
  %1:fpr32 = COPY $s1
  %0:fpr32 = COPY $s0
  nofpexcept FCMPSrr %0:fpr32, %0:fpr32, implicit-def $nzcv, implicit $fpcr
  %2:gpr32 = CSINCWr $wzr, $wzr, 7, implicit $nzcv
  nofpexcept FCMPSrr %1:fpr32, %1:fpr32, implicit-def $nzcv, implicit $fpcr
  %3:gpr32 = CSELWr $wzr, killed %2:gpr32, 7, implicit $nzcv
  $w0 = COPY %3:gpr32
  RET_ReallyLR implicit $w0

# End machine code for function foo2.




=== main
Initial selection DAG: %bb.0 'main:entry'
SelectionDAG has 5 nodes:
    t0: ch,glue = EntryToken
  t3: ch,glue = CopyToReg t0, Register:i32 $w0, Constant:i32<0>
  t4: ch = AArch64ISD::RET_FLAG t3, Register:i32 $w0, t3:1


Optimized lowered selection DAG: %bb.0 'main:entry'
SelectionDAG has 5 nodes:
    t0: ch,glue = EntryToken
  t3: ch,glue = CopyToReg t0, Register:i32 $w0, Constant:i32<0>
  t4: ch = AArch64ISD::RET_FLAG t3, Register:i32 $w0, t3:1


Type-legalized selection DAG: %bb.0 'main:entry'
SelectionDAG has 5 nodes:
    t0: ch,glue = EntryToken
  t3: ch,glue = CopyToReg t0, Register:i32 $w0, Constant:i32<0>
  t4: ch = AArch64ISD::RET_FLAG t3, Register:i32 $w0, t3:1


Legalized selection DAG: %bb.0 'main:entry'
SelectionDAG has 5 nodes:
    t0: ch,glue = EntryToken
  t3: ch,glue = CopyToReg t0, Register:i32 $w0, Constant:i32<0>
  t4: ch = AArch64ISD::RET_FLAG t3, Register:i32 $w0, t3:1


Optimized legalized selection DAG: %bb.0 'main:entry'
SelectionDAG has 5 nodes:
    t0: ch,glue = EntryToken
  t3: ch,glue = CopyToReg t0, Register:i32 $w0, Constant:i32<0>
  t4: ch = AArch64ISD::RET_FLAG t3, Register:i32 $w0, t3:1


===== Instruction selection begins: %bb.0 'entry'

ISEL: Starting selection on root node: t4: ch = AArch64ISD::RET_FLAG t3, Register:i32 $w0, t3:1
ISEL: Starting pattern match
  Initial Opcode index to 381846
  Morphed node: t4: ch = RET_ReallyLR Register:i32 $w0, t3, t3:1
ISEL: Match complete!

ISEL: Starting selection on root node: t3: ch,glue = CopyToReg t0, Register:i32 $w0, Constant:i32<0>

ISEL: Starting selection on root node: t2: i32 = Register $w0

ISEL: Starting selection on root node: t1: i32 = Constant<0>

ISEL: Starting selection on root node: t0: ch,glue = EntryToken

===== Instruction selection ends:
Selected selection DAG: %bb.0 'main:entry'
SelectionDAG has 6 nodes:
  t0: ch,glue = EntryToken
    t6: i32,ch = CopyFromReg t0, Register:i32 $wzr
  t3: ch,glue = CopyToReg t0, Register:i32 $w0, t6
  t4: ch = RET_ReallyLR Register:i32 $w0, t3, t3:1


Total amount of phi nodes to update: 0
*** MachineFunction at end of ISel ***
# Machine code for function main: IsSSA, TracksLiveness

bb.0.entry:
  %0:gpr32all = COPY $wzr
  $w0 = COPY %0:gpr32all
  RET_ReallyLR implicit $w0

# End machine code for function main.

This is the output generated when running with the isel debug option.
Okay! If isFloatingPoint() might be causing the issue, then I can also use the dump() function, but I’m not sure about its syntax.

But isFloatingPoint() is also used in emitConditionalComparison to check for floating point, right? So I guess it should not cause a problem, but if you can help me with the dump() syntax, I can try that too.

N->dump() or Cmp0->dump() should just work, I think? Dumps the node to stderr in a similar format to the debug dumps.

You can also write text to stderr using something like errs() << "ENTERING CODE SECTION\n";. That can be helpful to see where control flow goes.

A floating-point compare has floating-point operands, but the result is an integer.

Okay! Let me try it.