How to handle ISD::STORE when both operands are FrameIndex?

Hello.

After “Initial selection DAG” stage I get a DAG with node

t14: ch = store<(store 4 into %ir.p45, align 8, addrspace 1)> t10, FrameIndex:i32<2>, FrameIndex:i32<3>, undef:i32

  1. Where does it come from? Can I do anything to make it not appear?
  2. If not, how do I change it so that the operand being stored would be first loaded into a register, and that register would be used instead? Like

ch = StoreStackF<Mem:(store 4 into %ir.p45, align 8, addrspace 1)> Register:%1, TargetFrameIndex:i32<3>, t10

Thanks in advance.

1. Where does it come from? Can I do anything to make it not appear?

It comes from something like:

    %ptr = alloca i8
   %var = alloca i8*
   store i8* %ptr, i8** %var

and in general it's not possible to eliminate the combination.

2. If not, how do I change it so that the operand being stored would be first loaded into a register, and that register would be used instead?

While the store is being selected LLVM will just treat the value being
stored as a generic pointer-width integer unless you have written a
specific pattern for storing a FrameIndex somewhere. That's probably
what you want so no need to change anything.

After that, LLVM will try to select the FrameIndex itself (which
you'll need to support anyway so that you can pass a pointer to a
local variable between functions).

Generally this seems to be handled by XYZISelDAGToDAG.cpp, which
converts an ISD::FrameIndex into an appropriate arithmetic instruction
that can offset from SP. That gets lowered to actual known offsets
later on, in exactly the same way loads and stores are.

It's probably done in C++ rather than TableGen because the operands of
these resulting instructions often look pretty weird (for example a
TargetFrameIndex instead of a register). That's mostly speculation
though, I haven't tried to write a bare frameindex pattern.

Cheers.

Tim.

FrameIndex values come from objects that are still allocated on the stack at the time of the SDAG construction. In your case it seems that the address and the value to store are both on the stack. You don’t need to do anything in particular with it. If you have a selection pattern of form “(store RegClassA:$Addr, RegClassB:$Val), (STORE $Addr, $Val)”, then it will force loading both, Addr and Val into registers. I think someone has added a pattern to match frame index in patterns as well (it didn’t use to be possible and you had to use ComplexPattern), so your pattern can handle that directly too.

  1. Where does it come from? Can I do anything to make it not appear?

It comes from something like:

%ptr = alloca i8
%var = alloca i8*
store i8* %ptr, i8** %var

and in general it’s not possible to eliminate the combination.

Aha, thanks.

  1. If not, how do I change it so that the operand being stored would be first loaded into a register, and that register would be used instead?

While the store is being selected LLVM will just treat the value being
stored as a generic pointer-width integer unless you have written a
specific pattern for storing a FrameIndex somewhere.

Actually, this is exactly my case. I have a pattern that looks like

(store_stack IntRegs:$reg, FIOperand:$i)

It worked for me when first operand was a register and now I stumbled upon two FrameIndicies.

That’s probably
what you want so no need to change anything.

After that, LLVM will try to select the FrameIndex itself (which
you’ll need to support anyway so that you can pass a pointer to a
local variable between functions).

Generally this seems to be handled by XYZISelDAGToDAG.cpp, which
converts an ISD::FrameIndex into an appropriate arithmetic instruction
that can offset from SP. That gets lowered to actual known offsets
later on, in exactly the same way loads and stores are.

It’s probably done in C++ rather than TableGen because the operands of
these resulting instructions often look pretty weird (for example a
TargetFrameIndex instead of a register). That’s mostly speculation
though, I haven’t tried to write a bare frameindex pattern.

Cheers.

Tim.

FrameIndex values come from objects that are still allocated on the stack at the time of the SDAG construction. In your case it seems that the address and the value to store are both on the stack. You don’t need to do anything in particular with it. If you have a selection pattern of form “(store RegClassA:$Addr, RegClassB:$Val), (STORE $Addr, $Val)”, then it will force loading both, Addr and Val into registers.

You mean, LLVM will automagically generate loads? This isn’t happening for me.

And what’s “STORE”? Is it somehow different from “store”?

While the store is being selected LLVM will just treat the value being
stored as a generic pointer-width integer unless you have written a
specific pattern for storing a FrameIndex somewhere.

Actually, this is exactly my case. I have a pattern that looks like

(store_stack IntRegs:$reg, FIOperand:$i)

It worked for me when first operand was a register and now I stumbled upon two FrameIndicies.

That looks more like it's storing *to* a FrameIndex than storing a
FrameIndex, but it actually shouldn't matter. The same sequence of
events I described would happen except the selected bare FrameIndex
would feed into the pointer operand of the store rather than the value
operand.

If you had both kinds of pattern (storing an FI and storing to an FI)
and wanted one to have priority you can add a "let AddedComplexity =
10" or similar to one of them so that LLVM favours it when both would
apply.

FrameIndex values come from objects that are still allocated on the stack at the time of the SDAG construction. In your case it seems that the address and the value to store are both on the stack. You don’t need to do anything in particular with it. If you have a selection pattern of form “(store RegClassA:$Addr, RegClassB:$Val), (STORE $Addr, $Val)”, then it will force loading both, Addr and Val into registers.

You mean, LLVM will automagically generate loads? This isn't happening for me.

"Materializing" would probably have been a less ambiguous choice of
words. I wouldn't expect a load instruction on most architectures.

And what's "STORE"? Is it somehow different from "store"?

On most targets in LLVM the implemented instruction names are written
in capital letters (mostly): ADDrm, MULrr, STRXrs. The STORE there was
meant to represent a target-specific store instruction (like you'd get
on the RHS of a Pat instantiation) without committing to any
particular architecture.

Could you describe what *is* happening for you, BTW? Maybe with an
"llc -debug" log on a simple example. We might be able to give more
specific advice with the actual error.

Cheers.

Tim.

While the store is being selected LLVM will just treat the value being
stored as a generic pointer-width integer unless you have written a
specific pattern for storing a FrameIndex somewhere.

Actually, this is exactly my case. I have a pattern that looks like

(store_stack IntRegs:$reg, FIOperand:$i)

It worked for me when first operand was a register and now I stumbled upon two FrameIndicies.

That looks more like it’s storing to a FrameIndex than storing a
FrameIndex, but it actually shouldn’t matter. The same sequence of
events I described would happen except the selected bare FrameIndex
would feed into the pointer operand of the store rather than the value
operand.

If you had both kinds of pattern (storing an FI and storing to an FI)
and wanted one to have priority you can add a “let AddedComplexity =
10” or similar to one of them so that LLVM favours it when both would
apply.

FrameIndex values come from objects that are still allocated on the stack at the time of the SDAG construction. In your case it seems that the address and the value to store are both on the stack. You don’t need to do anything in particular with it. If you have a selection pattern of form “(store RegClassA:$Addr, RegClassB:$Val), (STORE $Addr, $Val)”, then it will force loading both, Addr and Val into registers.

You mean, LLVM will automagically generate loads? This isn’t happening for me.

“Materializing” would probably have been a less ambiguous choice of
words. I wouldn’t expect a load instruction on most architectures.

And what’s “STORE”? Is it somehow different from “store”?

On most targets in LLVM the implemented instruction names are written
in capital letters (mostly): ADDrm, MULrr, STRXrs. The STORE there was
meant to represent a target-specific store instruction (like you’d get
on the RHS of a Pat instantiation) without committing to any
particular architecture.

Could you describe what is happening for you, BTW? Maybe with an
“llc -debug” log on a simple example. We might be able to give more
specific advice with the actual error.

Sigh. I’m completely confused and lost. Thanks for bearing with me.

Here’s my current definition:

def AddrFI: ComplexPattern<i32, 1, “SelectAddrFI”, [frameindex], []>;

class StackAddress : CodePatPred<[{
return cast(N)->getAddressSpace() == 1;
}]>;

class StoreFrag : PatFrag <
(ops node:$value, node:$ptr), (op node:$value, node:$ptr)

;

class StackStore : StoreFrag , StackAddress;

def store_stack : StackStore;

def StoreStackF : InstRI<2, (outs), (ins IntRegs:$reg, i32imm:$i),
“storestackf $reg, [$i]”, [(store_stack i32:$reg, AddrFI:$i)]>;

I’m puzzled why despite having “IntRegs:$reg” in ins list, it still matches FrameIndex:

SEL: Starting selection on root node: t14: ch = store<(store 4 into %ir.p45, align 8, addrspace 1)> t10, FrameIndex:i32<2>, FrameIndex:i32<3>, undef:i32
ISEL: Starting pattern match
Initial Opcode index to 4
Morphed node: t14: ch = StoreStackF<Mem:(store 4 into %ir.p45, align 8, addrspace 1)> FrameIndex:i32<2>, TargetFrameIndex:i32<3>, t10
ISEL: Match complete!

ISEL: Starting selection on root node: t11: i32 = FrameIndex<2>
ISEL: Starting pattern match
Initial Opcode index to 0
Match failed at index 0
LLVM ERROR: Cannot select: t11: i32 = FrameIndex<2>

Hi Gleb,

def StoreStackF : InstRI<2, (outs), (ins IntRegs:$reg, i32imm:$i),
                    "storestackf $reg, [$i]", [(store_stack i32:$reg, AddrFI:$i)]>;

I'm puzzled why despite having "IntRegs:$reg" in ins list, it still matches FrameIndex:

A register-class will match anything with the appropriate type (i32
here). The idea is that any value of that type can be put into a
register somehow, and matching whatever it happens to be (a FrameIndex
in this case) is exactly what's needed to do that.

ISEL: Starting selection on root node: t11: i32 = FrameIndex<2>
ISEL: Starting pattern match
  Initial Opcode index to 0
  Match failed at index 0
LLVM ERROR: Cannot select: t11: i32 = FrameIndex<2>

Ah good, this is what I thought was probably happening so it's nothing
exotic. You need to be able to match a FrameIndex somehow. Not just
for this, but for something like:

    void foo() {
      int arr[5];
      bar(&arr[0]);
    }

It's pretty simple to implement matching, you generally just need to
convert it to some add instruction that can do SP+imm. See for example
lib/Target/AArch64/AArch64ISelDAGToDAG.cpp:2916.

Cheers.

Tim.

Hi Gleb,

def StoreStackF : InstRI<2, (outs), (ins IntRegs:$reg, i32imm:$i),
“storestackf $reg, [$i]”, [(store_stack i32:$reg, AddrFI:$i)]>;

I’m puzzled why despite having “IntRegs:$reg” in ins list, it still matches FrameIndex:

A register-class will match anything with the appropriate type (i32
here). The idea is that any value of that type can be put into a
register somehow, and matching whatever it happens to be (a FrameIndex
in this case) is exactly what’s needed to do that.

That’s a useful explanation, thanks.

ISEL: Starting selection on root node: t11: i32 = FrameIndex<2>
ISEL: Starting pattern match
Initial Opcode index to 0
Match failed at index 0
LLVM ERROR: Cannot select: t11: i32 = FrameIndex<2>

Ah good, this is what I thought was probably happening so it’s nothing
exotic. You need to be able to match a FrameIndex somehow. Not just
for this, but for something like:

void foo() {
int arr[5];
bar(&arr[0]);
}

It’s pretty simple to implement matching, you generally just need to
convert it to some add instruction that can do SP+imm.

For the sake of simplicity I decided not to bother with stack register yet and instead I just emit FrameIndex as immediate:

bool MyBackendDAGToDAGISel::SelectAddrFI(SDValue &N, SDValue &R) {
if (N.getOpcode() != ISD::FrameIndex)
return false;
MachineFrameInfo &MFI = MF->getFrameInfo();
int FX = cast(N)->getIndex();
R = CurDAG->getTargetFrameIndex(FX, MVT::i32);
return true;
}

This way I end up with

store %r1, [1]

and handle it in my CPU emulator accordingly.

So, instead of matching that FrameIndex in store, I really want to emit a load first and then use a register in the store instruction.
Can you, please, advise me how to implement that?

As you say, SelectAddrFI will only be used when looking into a store,
and I think that's fine: it's a specialized addressing-mode
implemented in just the right place. You need a completely separate
"case" in MyBackendDAGToDAGISel::Select to handle a bare
ISD::FrameIndex.

Since you don't use SP, your implementation of FrameIndex will
probably just be a MOV-immediate instruction. Final output would be
something like:

    mov %r1, 2 // Produced by matching the ISD::FrameIndex
    store %r1, [1]

I don't think load is going to be involved anywhere unless you meant
it in the broader sense where (e.g.) Z80 referred to its move
instructions as loads.

Cheers.

Tim.

For the sake of simplicity I decided not to bother with stack register yet and instead I just emit FrameIndex as immediate:

bool MyBackendDAGToDAGISel::SelectAddrFI(SDValue &N, SDValue &R) {
if (N.getOpcode() != ISD::FrameIndex)
return false;
MachineFrameInfo &MFI = MF->getFrameInfo();
int FX = cast(N)->getIndex();
R = CurDAG->getTargetFrameIndex(FX, MVT::i32);
return true;
}

This way I end up with

store %r1, [1]

and handle it in my CPU emulator accordingly.

So, instead of matching that FrameIndex in store, I really want to emit a load first and then use a register in the store instruction.

As you say, SelectAddrFI will only be used when looking into a store,
and I think that’s fine: it’s a specialized addressing-mode
implemented in just the right place. You need a completely separate
“case” in MyBackendDAGToDAGISel::Select to handle a bare
ISD::FrameIndex.

Thanks for clearing this up! I was thinking the same, but wasn’t sure.

Since you don’t use SP, your implementation of FrameIndex will
probably just be a MOV-immediate instruction. Final output would be
something like:

mov %r1, 2 // Produced by matching the ISD::FrameIndex
store %r1, [1]

I don’t think load is going to be involved anywhere unless you meant
it in the broader sense where (e.g.) Z80 referred to its move
instructions as loads.

Silly me -_
Why was I talking about a load all this time? You’re right, I needed a simple mov there. I have added the following code into MyBackendDAGToDAGISel::Select() and it worked:

if(FrameIndexSDNode *FIN = dyn_cast(N))
{
auto Addr = CurDAG->getTargetFrameIndex(
FIN->getIndex(), TLI->getPointerTy(CurDAG->getDataLayout()));
SDNode *MOV = CurDAG->getMachineNode(MYBACK::MovRI, dl, MVT::i32, Addr);
CurDAG->ReplaceAllUsesWith(N, MOV);
return;
}

I’m eternally grateful for helping me with this, thank you again.

Eventually you will need to replace the frame index with actual code that calculates the address based on the stack pointer/frame pointer. This is done via a target hook TargetRegisterInfo::eliminateFrameIndex. It takes the instruction with the FI to be replaced, and the operand number that contains the FI:

  virtual void eliminateFrameIndex(MachineBasicBlock::iterator MI,
                                   int SPAdj, unsigned FIOperandNum,
                                   RegScavenger *RS = nullptr) const = 0;

It makes the replacement a lot easier to have the frame index always in a context where it's added to an immediate, for example a load from fi#0 may use it in a place of a base register, followed by an immediate offset operand. During the FI elimination, when you know that fi#0 is SP+32, you put SP in the base register operand and add the 32 to the offset field. In the same way it's convenient to have the transfer of a FI into a register be an add-immediate of 0, so instead of "mov reg, fi#0" you'd have "addi reg, fi#0, 0". Now the replacement can simply replace the operands with "SP" and "32" respectively (for the case of fi#0 being SP+32), i.e. you'd end up with "addi reg, SP, 32". When you have a simple "mov reg, fi#0", the replacement will be harder, since you can't have "mov reg, SP+32".