Question on Machine Combiner Pass


In lib/CodeGen/MachineCombiner.cpp, the function MachineCombiner::preservesCriticalPathLen
tries to determine whether the new, combined instruction lengthens the critical path.

To do this, we compute the depth and latency of the current instruction sequence (MUL+ADD) and of the alternative instruction (MADD).

But we call two different sets of APIs for the current and the new instructions.

For the new instruction we use:

unsigned NewRootDepth = getDepth(InsInstrs, InstrIdxForVirtReg, BlockTrace);

unsigned NewRootLatency = getLatency(Root, NewRoot, BlockTrace);

While for the current instruction we use:

unsigned RootDepth = BlockTrace.getInstrCycles(Root).Depth;

unsigned RootLatency = TSchedModel.computeInstrLatency(Root);
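
As far as I understand, these four values feed a comparison of completion cycles (depth plus latency) between the old and the new root. The standalone sketch below shows that comparison as I read it. It is not the LLVM code: PathCost, preservesCriticalPath, and the Slack parameter are hypothetical names, and the exact form of the comparison is my assumption.

```cpp
// Hypothetical stand-in for the per-instruction numbers discussed above.
// This is not an LLVM type.
struct PathCost {
  unsigned Depth;    // cycle at which the instruction can issue
  unsigned Latency;  // cycles until its result becomes available
};

// Assumed form of the check: the replacement is acceptable when it
// completes no later than the original root, allowing for any slack.
bool preservesCriticalPath(PathCost OldRoot, PathCost NewRoot,
                           unsigned Slack) {
  unsigned OldCycles = OldRoot.Depth + OldRoot.Latency + Slack;
  unsigned NewCycles = NewRoot.Depth + NewRoot.Latency;
  return NewCycles <= OldCycles;
}
```

With purely illustrative numbers, an old root at depth 4 with latency 1 completes at cycle 5. A new root at depth 0 with latency 4 would pass this check, while one at depth 4 with latency 4 would not. This is why the way depth and latency are computed for each side matters for the outcome.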

These calls in preservesCriticalPathLen were introduced in the following commit:

commit e4fa341dde3c9521b7f11bd53ecdcbeb3f8fcbda

Author: Gerolf Hoflehner

MachineCombiner Pass for selecting faster instruction sequence on AArch64

For this example code sequence:

%mul = mul nuw nsw i32 %conv2, %conv

%mul7 = mul nuw nsw i32 %conv6, %conv4

%add = add nuw nsw i32 %mul7, %mul

ret i32 %add

We generate the following assembly:

mul w8, w0, w1

mul w9, w2, w3

add w0, w9, w8


I expected the second MUL and the ADD to be combined into a MADD (for example, madd w0, w2, w3, w8). Without that combination, I see degraded performance in several of my tests.

Could someone please explain why we use two different APIs to compute depth and latency for the two instructions?