Using a MachineInstruction Address

Hi,
Is there a straightforward way to obtain the address of an arbitrary MachineInstruction and keep it up to date across the backend optimizations, even if it is in the middle of a MachineBasicBlock?
I have an instruction that takes a relative address. E.g.:

BB0:
myInstruction BBN + 4*x

BBN:
... x instructions ...
target_instruction <<=== it points to here.

The issues I have are:
1) I only see the ability to obtain basic-block addresses/labels, and these are only updated in terminator instructions. So if by chance that basic block is renamed or merged, the value in myInstruction gets corrupted.
2) Even if I set up the instruction at the top of a basic block, eventually the BB gets merged, so just pointing to the BB label is not enough.
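
To make the two issues above concrete, here is a toy model in C++. It is only a sketch: ToyBlock, addressOf and mergeIntoPredecessor are hypothetical names, not LLVM classes or functions, and the fixed 4-byte instruction size is an assumption taken from the "4*x" in the example.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Toy model only: these types are hypothetical and are not LLVM classes.
struct ToyBlock {
  std::string Name;
  std::vector<std::string> Insts; // each instruction is 4 bytes in this model
};

// Byte address of instruction `Index` inside block `Name`, or -1 if the
// block (or that index) no longer exists in the layout.
long addressOf(const std::vector<ToyBlock> &Layout, const std::string &Name,
               std::size_t Index) {
  long Addr = 0;
  for (const ToyBlock &B : Layout) {
    if (B.Name == Name)
      return Index < B.Insts.size() ? Addr + 4 * static_cast<long>(Index) : -1;
    Addr += 4 * static_cast<long>(B.Insts.size());
  }
  return -1;
}

// Merge block `From` into its layout predecessor, roughly what a
// block-merging optimization does. The name of `From` disappears.
void mergeIntoPredecessor(std::vector<ToyBlock> &Layout, std::size_t From) {
  assert(From > 0 && From < Layout.size());
  ToyBlock &Pred = Layout[From - 1];
  Pred.Insts.insert(Pred.Insts.end(), Layout[From].Insts.begin(),
                    Layout[From].Insts.end());
  Layout.erase(Layout.begin() + static_cast<std::ptrdiff_t>(From));
}
```

In this model, a reference recorded as ("BBN", 2) resolves before the merge, but after mergeIntoPredecessor the name "BBN" is gone and the same instruction is only reachable as ("BB0", 3). Nothing updates the recorded pair automatically.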

I do see MCSymbol, but I don't get it. It seems to be resolved during asm printing.

Thanks,

Diogo Sampaio
Senior Compiler Engineer • Kalray
dsampaio@kalrayinc.com • www.kalrayinc.com


I have a somewhat similar scenario in a downstream back-end. You might try #2 and then call one of:

/// Set this block to reflect that it potentially is the target of an indirect branch.
void setHasAddressTaken() { AddressTaken = true; }

/// Test whether this block must have its label emitted.
bool hasLabelMustBeEmitted() const { return LabelMustBeEmitted; }

I added a similar feature to AArch64 recently to handle jump-tables. I
think tracking both BB-start and offset is probably a non-starter, so
to take vocabulary from your example I implemented something like:

        myInstruction Ltmp0
        [...]
    BBN:
        ... x instructions ...
    Ltmp0:
        target_instruction

In this situation target_instruction is a pseudo-instruction that
gets expanded at the AsmPrinter stage into a label followed by the
real instruction. Both myInstruction and target_instruction would
share some kind of immediate operand saying which instance they are,
and the symbol generated would be coordinated by XYZFunctionInfo
(first user asks for a temporary symbol and records it there).

If target_instruction could actually be lots and lots of different
alternatives that you don't want to create pseudos for then you may be
able to arrange a bundle with a label-pseudo and the real instruction.
I just mention this so you don't abandon the idea entirely, I can give
more details if needed.
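
The coordination described above can be sketched with a toy model in C++. This is an illustration under stated assumptions, not the AArch64 implementation: ToyFunctionInfo, ToyInst and printFunction are hypothetical stand-ins for the target's function info, its machine instructions and its AsmPrinter, and the "Ltmp" naming is only borrowed from the example.

```cpp
#include <map>
#include <string>
#include <vector>

// Toy model only: hypothetical stand-ins, not LLVM's MachineFunctionInfo,
// MCSymbol, or AsmPrinter.
class ToyFunctionInfo {
  std::map<unsigned, std::string> Symbols; // instance id -> temporary label
  unsigned NextTemp = 0;

public:
  // The first user of an id creates the label; later users get the same one.
  const std::string &getOrCreateSymbol(unsigned Id) {
    auto It = Symbols.find(Id);
    if (It == Symbols.end())
      It = Symbols.emplace(Id, "Ltmp" + std::to_string(NextTemp++)).first;
    return It->second;
  }
};

struct ToyInst {
  enum Kind { Plain, UsesLabel, LabeledPseudo } K;
  std::string Text; // mnemonic to print
  unsigned Id;      // instance id shared by the user and the pseudo
};

// Print-time expansion: the pseudo becomes "label:" followed by the real
// instruction, and the user prints the same label as its operand.
std::vector<std::string> printFunction(const std::vector<ToyInst> &Insts,
                                       ToyFunctionInfo &FI) {
  std::vector<std::string> Out;
  for (const ToyInst &I : Insts) {
    switch (I.K) {
    case ToyInst::Plain:
      Out.push_back("\t" + I.Text);
      break;
    case ToyInst::UsesLabel:
      Out.push_back("\t" + I.Text + " " + FI.getOrCreateSymbol(I.Id));
      break;
    case ToyInst::LabeledPseudo:
      Out.push_back(FI.getOrCreateSymbol(I.Id) + ":"); // label at column 0
      Out.push_back("\t" + I.Text);
      break;
    }
  }
  return Out;
}
```

Because the label is created at print time from a shared id, it does not matter which of the two instructions is printed first or how far the pseudo has moved. The label always lands directly in front of the real instruction.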

Cheers.

Tim.

Thanks for the replies Tim and Jason,

So I went for the idea of using a pseudo-instruction that is expanded to a label.
In my particular case, it is just the label delimiting the end of a hardware loop, and I have another one
for the loop start. I’m simply lowering the IR-generated intrinsics into pseudo-instructions
(actually replacing the conditional branch that forms the loop latch).

After doing that I had issues with branchFolding making some undesired changes, as it no longer sees the loop structure.
I managed to get it to work by setting hasAddressTaken / labelMustBeEmitted on the loop latch block, and by making analyzeBranch
report that it can’t analyze the branches when the basic block holds one of the pseudo-instructions.
However, that seems to over-constrain branchFolding, and the resulting code is not optimal, but it works.

But I’m still having issues with instructions moving across the loop boundaries.
I found out that marking the pseudo-instructions as scheduling boundaries (“isSchedulingBoundary”) helps with the schedulers, but still,
when the register allocator lowers phi nodes into instructions, some end up on the wrong side of the loop boundary.
So I have to search for where to insert the instructions and sometimes move some instructions around. But that’s not
trivial in some cases.

If I define the pseudo-instructions as branch instructions, I guess that should be enough to force the loop structure to be maintained, right?
Is there anything special I need to do other than placing them at the end of an MBB and making analyzeBranch understand them?

Alternatively, judging from changes to the HardwareLoops pass and some current upstream diffs, it seems I’m not the only one having this sort of issue.
Could that be solved in a more generic manner? Perhaps by teaching the compiler about Hardware[Machine]Loops (as subclasses of [Machine]Loops).
At the IR level, the loop intrinsics would delimit the loop start/end. At the MIR level, it would query the TargetTransformInfo for the loop boundary instructions
(a start and a latch instruction).
That could guide the backend optimizations (scheduler, branch folding, and so on, including the register allocator, so that phi nodes are placed correctly). Does that seem like a reasonable idea?

One last question, more of an aesthetic thing that I haven’t had time to look into: when the pseudo-instruction is expanded
to a label, it still gets indented. Is there any special instruction type or flag to mark it as a label so that it is not indented?
Or is there a special way to print labels?
(I’m simply setting the instruction with isCodegenOnly=1 and using the asm string “$label:”, where $label is one of the operands.)

Cheers.

Diogo.

Everything you just described sounds like expected behavior. To make your system more reliable, you will probably need some kind of precisely specified LLVM intrinsic and MachineInstr pair with precise semantics that the rest of the optimizers can reason about. I am reminded of the convergent attribute redesign and the challenges in the AMDGPU backend.

However, nobody has yet pointed out MachineInstr::setPostInstrSymbol, so let me mention it:
https://llvm.org/doxygen/classllvm_1_1MachineInstr.html#ac8ce95857a66b3706a84d1fd5072f0dd

This API is a bit dangerous, because unless you are confident that optimizers will not delete or duplicate your MachineInstr, you can end up with zero or two or more label definitions. However, it works reasonably well for tracking function call return addresses in debug info, or in late stage passes after branch folding.
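
The hazard can be illustrated with a toy model in C++. It is only a sketch: LabeledInst, countLabelDefinitions and isWellDefined are hypothetical names, not LLVM's MachineInstr or MCSymbol, and the model assumes that the attached label is printed once for every copy of the instruction that survives.

```cpp
#include <map>
#include <string>
#include <vector>

// Toy model only: hypothetical types, not LLVM's MachineInstr or MCSymbol.
struct LabeledInst {
  std::string Text;
  std::string PostSymbol; // label emitted right after the instruction, or ""
};

// Count how many times each attached label would be defined when printing.
std::map<std::string, int>
countLabelDefinitions(const std::vector<LabeledInst> &Insts) {
  std::map<std::string, int> Defs;
  for (const LabeledInst &I : Insts)
    if (!I.PostSymbol.empty())
      ++Defs[I.PostSymbol];
  return Defs;
}

// A label is usable only if it is defined exactly once.
bool isWellDefined(const std::vector<LabeledInst> &Insts,
                   const std::string &Label) {
  std::map<std::string, int> Defs = countLabelDefinitions(Insts);
  auto It = Defs.find(Label);
  return It != Defs.end() && It->second == 1;
}
```

In this model, duplicating the labeled instruction (as tail duplication might) gives two definitions of the same label, and deleting it leaves any remaining reference dangling. A check like isWellDefined fails in both cases.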