[CodeGen] PeepholeOptimizer: optimizing condition dependent instrunctions

Hi,

I find it’s quite strange how condition dependent instructions are processed in PeepholeOptimizer::runOnMachineFunction:

01577 if ((isUncoalescableCopy(*MI) &&

01578 optimizeUncoalescableCopy(MI, LocalMIs)) ||

01579 (MI->isCompare() && optimizeCmpInstr(MI, &MBB)) ||

01580 (MI->isSelect() && optimizeSelect(MI, LocalMIs))) {

01581 // MI is deleted.

01582 LocalMIs.erase(MI);

01583 Changed = true;

01584 continue;

01585 }

01586

01587 if (MI->isConditionalBranch() && optimizeCondBranch(MI)) {

01588 Changed = true;

01589 continue;

01590 }

CmpInstr, SelectInstr and CondBranch are processed separately. It’s assumed that CmpInstr and SelectInstr are deleted but CondBranch is not.

In fact CmpInstr is always connected to SelectInstr or CondBranch or both of them. So if such connection exists it should be processed as a whole.

For example, there are cases when CMP+BRC can be replaced by BR. The same is true for CMP+SEL.

The main problem I have is that I have to find corresponding CmpInstr and to repeat analysis of it in optimizeSelect and in optimizeCondBranch.

Any thoughts why it’s implemented in such way.

Kind regards,

Evgeny Astigeevich

Hi Evgeny,

Hi,

I find it’s quite strange how condition dependent instructions are processed in PeepholeOptimizer::runOnMachineFunction:

01577 if ((isUncoalescableCopy(*MI) &&
01578 optimizeUncoalescableCopy(MI, LocalMIs)) ||
01579 (MI->isCompare() && optimizeCmpInstr(MI, &MBB)) ||
01580 (MI->isSelect() && optimizeSelect(MI, LocalMIs))) {
01581 // MI is deleted.
01582 LocalMIs.erase(MI);
01583 Changed = true;
01584 continue;
01585 }
01586
01587 if (MI->isConditionalBranch() && optimizeCondBranch(MI)) {
01588 Changed = true;
01589 continue;
01590 }

CmpInstr, SelectInstr and CondBranch are processed separately. It’s assumed that CmpInstr and SelectInstr are deleted but CondBranch is not.
In fact CmpInstr is always connected to SelectInstr or CondBranch or both of them. So if such connection exists it should be processed as a whole.

This code allows you to do that, unless I am mistaken.

For example, there are cases when CMP+BRC can be replaced by BR. The same is true for CMP+SEL.

I believe this should be done in respectively optimizeCondBranch and optimizeSelect.

The main problem I have is that I have to find corresponding CmpInstr and to repeat analysis of it in optimizeSelect and in optimizeCondBranch.

Ok, so basically your concern is that we may call twice analyzeCompare. Is that it?
This function is probably cheap so I wouldn’t be too concerned about that. If I turn out to be wrong, then yes we can think of a better mechanism.

Any thoughts why it’s implemented in such way.

The idea of the peephole optimizer is top-down approach and greedily applied optimization.

I have to admit I don’t see the concern with the instruction being condition dependent; we don’t want to call optimizeCondBranch :).
I believe I missed your point.

Cheers,
-Quentin

Hi Quentin,

Yes, the code allows to process connected instructions. Although it should be taken into account that the instruction next to the current processed instruction must never be erased because this invalidates iterator.

I’ve been fixing a bug in AArch64InstrInfo::optimizeCompareInstr: instructions are converted into S form but it’s not checked that they produce the same flags as CMP. The bug exists upstream as well.

Together with the fix I want to add some peephole rules for combinations CMP+BRC and CMP+SEL. In the context of optimizeCmpInstr I have all information about CmpInstr. I simply go down and check all instructions which use AArch64::NZCV whether they can be substituted with the simpler version. After all I delete CmpInstr. This approach contradicts with PeepholeOptimizer design because BRC and SEL must be processed in corresponding functions. Yes, ‘analyzeCompare’ is cheap but in optimizeCondBranch and in optimizeSelect we need to go up to find the instruction defining condition flags. In case of BRC CMP should not be far from it but I am not sure about SEL. Also when BRC is replaced with BR CMP can be removed (BTW processing of instructions below BRC can be stopped). I don’t know if there any restrictions on instructions below BRC. Anyway I don’t expect many of them. In case of CMP+SEL we can not remove CMP after simplifying SEL because there can be other SEL instructions using flags from CMP.

I have to admit I don’t see the concern with the instruction being condition dependent; we don’t want to call optimizeCondBranch :).

I believe I missed your point.

I missed your point too J I think it’s always good to get rid of CondBranch.

We have cases like:

SUBS Wd, Wn, 0

B.LO

As SUBS sets C to 1 B.LO will fall through. So we can substitute them with an unconditional branch.

Thanks,

Evgeny

Hi Evgeny,

Hi Quentin,

Yes, the code allows to process connected instructions. Although it should be taken into account that the instruction next to the current processed instruction must never be erased because this invalidates iterator.

Indeed.

I’ve been fixing a bug in AArch64InstrInfo::optimizeCompareInstr: instructions are converted into S form but it’s not checked that they produce the same flags as CMP. The bug exists upstream as well.

Could you file a PR or just push the patch :).

Together with the fix I want to add some peephole rules for combinations CMP+BRC and CMP+SEL. In the context of optimizeCmpInstr I have all information about CmpInstr. I simply go down and check all instructions which use AArch64::NZCV whether they can be substituted with the simpler version. After all I delete CmpInstr. This approach contradicts with PeepholeOptimizer design because BRC and SEL must be processed in corresponding functions.

Ok I got your concern: basically you want to do the CMP+BRC or CMP+SEL inside optimizeCmpInstr instead of having them into optimizeSelect and optimizeBranch so that you don’t do the analysis twice.

Historically the peephole optimizer is processing patterns bottom-up (use to def). The rationale is we only have one def but we may have several uses. In other words, it is easy to replace a use after you prove it is correct, but what you want is top down (def->use) and in that case, you need some extra checks (the potential other uses) to prove that the def can optimized.

The bottom line, I believe this is not done this way because it is not peephole-ish in terms of complexity.

Yes, ‘analyzeCompare’ is cheap but in optimizeCondBranch and in optimizeSelect we need to go up to find the instruction defining condition flags.

Going up is generally cheap, we just ask for the unique definition of the vreg. I believe in your case it is not cheap because you are tracking a physical reg and not a vreg.
Is that the problem?

In case of BRC CMP should not be far from it but I am not sure about SEL. Also when BRC is replaced with BR CMP can be removed (BTW processing of instructions below BRC can be stopped). I don’t know if there any restrictions on instructions below BRC.

You should have only terminators at the end of the BB.
You may have another branch though.

Anyway I don’t expect many of them. In case of CMP+SEL we can not remove CMP after simplifying SEL because there can be other SEL instructions using flags from CMP.

This is what I explained with the defs need more checks. That’s why optimizeSelect seems a good fit for that.

> I have to admit I don’t see the concern with the instruction being condition dependent; we don’t want to call optimizeCondBranch :).
> I believe I missed your point.

I missed your point too J I think it’s always good to get rid of CondBranch.

I was talking about the code in the peephole optimizer :).
Like:
if isCondBranch then optimizeCondBranch
We don’t want unconditional call to optimizeCondBranch. I.e., optimizeCondBranch expects a condbranch as argument.

Cheers,
-Quentin

Hi Quentin,

Yes, in case of physical regs SSA rules are broken and getUniqueVRegDef can not be used.

Thank you for clarifying things.

Now I have more understanding how it works.

I need to split code among these functions and cover it by tests. After that I will submit it for review.

Thanks,

Evgeny