Frame index arithmetic

I’ve developed a working back-end for a custom architecture, based on LLVM 2.6. I’m now trying to cover more of the unique features of this architecture.

To make use of one such feature, I’m trying something cunning/crazy with the stack - implementing it in a type of memory that can only be addressed via immediates.

I’ve got this mostly working. However, I came across a problem which I’ve been unable to work around: lowering the IR (even without any optimisations enabled) often requires the pattern:

i32 = FrameIndex

For normal memory, I was using the following instruction to match this pattern:

// Get the address in memory corresponding to the given frame index, saving the address
// in a register.
def MOV_FI : PseudoInstr<(outs GPR:$dst), (ins frameIndex:$addr),
“// $dst := frame index $addr”,
[(set GPR:$dst, frameIndex:$addr)]>;

Which is later replaced by a MOV (output register = stack pointer + constant offset) in eliminateFrameIndex().

However, it isn’t appropriate to do this with the proposed stack memory - it doesn’t make sense to move the address into a register (where arithmetic can be performed on it), as it isn’t possible to move that back to the domain of an immediate. So I conditionally disabled this instruction. But that leads to most programs failing to select the above pattern.

The issue is that this pattern is required even in code that doesn’t conceptually seem to need it (see the example below). I couldn’t figure out how to avoid this during DAG legalisation. Most often, the resulting machine assembly when the above pattern is enabled, simply stores a particular stack slot in a register, for later use in the same basic block, e.g.:

MOV out=r4 in=SP+4
LOAD out=r4 addr=r4

despite patterns existing for LOAD with a constant offset (which is successfully used by other stack slots in the same basic block), e.g.:

LOAD out=r3 addr=SP off=8

Am I missing some other patterns that would avoid this? For example, is it possible to write patterns that allow for arithmetic involving only immediates, with the result being another immediate?

If all else fails, I was thinking of writing a custom pass to identify and remove these. But that could be a lot of work.

Thanks,

  • Mark

Example:

int result;

int foo(int cond, int a, int b)
{
return cond? a : b;
}

int main()
{
return result = foo(1, 2, 3);
// Expected: result = 2.
}

Resulting IR:

@result = common global i32 0, align 4 ; <i32*> [#uses=2]

define i32 @foo(i32 %cond, i32 %a, i32 %b) nounwind {
entry:
%retval = alloca i32 ; <i32*> [#uses=2]
%cond.addr = alloca i32 ; <i32*> [#uses=2]
%a.addr = alloca i32 ; <i32*> [#uses=2]
%b.addr = alloca i32 ; <i32*> [#uses=2]
store i32 %cond, i32* %cond.addr
store i32 %a, i32* %a.addr
store i32 %b, i32* %b.addr
%tmp = load i32* %cond.addr ; [#uses=1]
%tobool = icmp ne i32 %tmp, 0 ; [#uses=1]
%tmp1 = load i32* %a.addr ; [#uses=1]
%tmp2 = load i32* %b.addr ; [#uses=1]
%cond3 = select i1 %tobool, i32 %tmp1, i32 %tmp2 ; [#uses=1]
store i32 %cond3, i32* %retval
%0 = load i32* %retval ; [#uses=1]
ret i32 %0
}

define i32 @main() nounwind {
entry:
%retval = alloca i32 ; <i32*> [#uses=3]
store i32 0, i32* %retval
%call = call i32 @foo(i32 1, i32 2, i32 3) ; [#uses=1]
store i32 %call, i32* @result
%tmp = load i32* @result ; [#uses=1]
store i32 %tmp, i32* %retval
%0 = load i32* %retval ; [#uses=1]
ret i32 %0
}

(Note: for simplicity, the calling convention in use here places all arguments on the stack)

I’ve developed a working back-end for a custom architecture, based on LLVM 2.6. I’m now trying to cover more of the unique features of this architecture.

To make use of one such feature, I’m trying something cunning/crazy with the stack - implementing it in a type of memory that can only be addressed via immediates.

I’ve got this mostly working. However, I came across a problem which I’ve been unable to work around: lowering the IR (even without any optimisations enabled) often requires the pattern:

i32 = FrameIndex

For normal memory, I was using the following instruction to match this pattern:

// Get the address in memory corresponding to the given frame index, saving the address
// in a register.
def MOV_FI : PseudoInstr<(outs GPR:$dst), (ins frameIndex:$addr),
“// $dst := frame index $addr”,
[(set GPR:$dst, frameIndex:$addr)]>;

Which is later replaced by a MOV (output register = stack pointer + constant offset) in eliminateFrameIndex().

However, it isn’t appropriate to do this with the proposed stack memory - it doesn’t make sense to move the address into a register (where arithmetic can be performed on it), as it isn’t possible to move that back to the domain of an immediate. So I conditionally disabled this instruction. But that leads to most programs failing to select the above pattern.

The issue is that this pattern is required even in code that doesn’t conceptually seem to need it (see the example below). I couldn’t figure out how to avoid this during DAG legalisation. Most often, the resulting machine assembly when the above pattern is enabled, simply stores a particular stack slot in a register, for later use in the same basic block, e.g.:

MOV out=r4 in=SP+4
LOAD out=r4 addr=r4

despite patterns existing for LOAD with a constant offset (which is successfully used by other stack slots in the same basic block), e.g.:

LOAD out=r3 addr=SP off=8

Am I missing some other patterns that would avoid this? For example, is it possible to write patterns that allow for arithmetic involving only immediates, with the result being another immediate?

Sounds like your load / store address selection routine isn’t working like what you expected.

Evan

I’m trying something cunning/crazy with the stack - implementing it in a type of memory that can only be addressed via immediates.

I’ve got this mostly working. However, I came across a problem which I’ve been unable to work around: lowering the IR (even without any optimisations enabled) often requires the pattern:

i32 = FrameIndex

It isn’t appropriate to do this with the proposed stack memory - it doesn’t make sense to move the address into a register, as it isn’t possible to move that back to the domain of an immediate. So I conditionally disabled this instruction. But that leads to most programs failing to select the above pattern.

Sounds like your load / store address selection routine isn’t working like what you expected.

Thanks for the reply. Unfortunately, this doesn’t seem to be the problem.

I have the following definition for the frameIndex:

def frameIndex : Operand,
ComplexPattern<i32 /valueType/, 1 /numOperands/,
“SelectFrameIndex” /selectFunction/,
[frameindex] /rootNodes/> {
let PrintMethod = “printFrameIndexOperand”;
let MIOperandInfo = (ops GPR);
}

And the following selection code:

bool
MyDAGToDAGISel::
SelectFrameIndex(SDValue Op, SDValue N, SDValue& Address) {
if (FrameIndexSDNode* FIN = dyn_cast(N)) {
Address = CurDAG->getTargetFrameIndex(FIN->getIndex(), MVT::i32);
return true;
}
return false;
}

In light of your comment, I tried extending this method to only allow cases where Op is ISD::LOAD or ISD::STORE. I found this made no difference to the behaviour. That was surprising, so I added code to print out each instruction seen by that method. And it turns out that all the operations were loads or stores anyway. So it must be later on that the conversion happens, which turns the operation into some form of indirect addressing.

To further explore the example I gave in my original email, I have an instruction matching the pattern:

[(set GPR:$dst, (select GPR:$sel, immOrGPR:$a, immOrGPR:$b))]

The target-independent IR (as shown in the original message):

define i32 @foo(i32 %cond, i32 %a, i32 %b) nounwind {
entry:
%retval = alloca i32 ; <i32*> [#uses=2]
%cond.addr = alloca i32 ; <i32*> [#uses=2]
%a.addr = alloca i32 ; <i32*> [#uses=2]
%b.addr = alloca i32 ; <i32*> [#uses=2]
store i32 %cond, i32* %cond.addr
store i32 %a, i32* %a.addr
store i32 %b, i32* %b.addr
%tmp = load i32* %cond.addr ; [#uses=1]
%tobool = icmp ne i32 %tmp, 0 ; [#uses=1]
%tmp1 = load i32* %a.addr ; [#uses=1]
%tmp2 = load i32* %b.addr ; [#uses=1]
%cond3 = select i1 %tobool, i32 %tmp1, i32 %tmp2 ; [#uses=1]

To me seems to be doing what I want - i.e. storing arguments ‘a’ and ‘b’ into local stack slots, then selecting between the values stored in those stack slots. This appears to be what is seen during selection - SelectFrameIndex(), as indicated above.

The normal debug output from the back-end shows that the target-independent IR gets lowered to a select between the addresses of the two argument stack slots, rather than their values. That’s a nice optimisation in general, but isn’t allowed here, hence leading to the:

LLVM ERROR: Cannot yet select: 0x1811798: i32 = FrameIndex <3>

What transforms are performed during selection? I think this is where I should be looking, but I’m a bit lost.

Any help would be greatly appreciated.

  • Mark

Hi Mark,

Sounds like your load / store address selection routine isn't working like what you expected.

Thanks for the reply. Unfortunately, this doesn't seem to be the problem.

do you handle truncating stores and extending loads?

Ciao,

Duncan.

Sounds like your load / store address selection routine isn't working like what you expected.

Thanks for the reply. Unfortunately, this doesn't seem to be the problem.

do you handle truncating stores and extending loads?

I have instructions matching patterns for zero- and sign-extending loads (8-bit to 32-bit, 16-bit to 32-bit), and truncating stores (32-bit to 16-bit, 32-bit to 8-bit). And I've also defined patterns to map 'extload*' to 'zextload*'.

A bit more info as to where the problem occurs (in case that helps), the debug output shows that at the 'Initial selection DAG' phase, the loads, stores, and select are all as I want them to be. However, the very next dump 'Optimized lowered selection DAG' shows the select having been altered to operate on the addresses instead of the values. I've listed these two dumps below.

Regards,

- Mark

=== foo
Initial selection DAG:
SelectionDAG has 35 nodes:
  0x1403e98: ch = EntryToken
  0x1810e90: i32 = undef
    0x1403e98: <multiple use>
    0x1810e08: i32 = FrameIndex <-1>
    0x1810e90: <multiple use>
  0x1810f18: i32,ch = load 0x1403e98, 0x1810e08, 0x1810e90 <null:0> alignment=4
    0x1403e98: <multiple use>
    0x1810fa0: i32 = FrameIndex <-2>
    0x1810e90: <multiple use>
  0x1811028: i32,ch = load 0x1403e98, 0x1810fa0, 0x1810e90 <null:0> alignment=4
    0x1403e98: <multiple use>
    0x18110b0: i32 = FrameIndex <-3>
    0x1810e90: <multiple use>
  0x1811138: i32,ch = load 0x1403e98, 0x18110b0, 0x1810e90 <null:0> alignment=4
  0x18114f0: i32 = FrameIndex <1>
  0x1811688: i32 = FrameIndex <2>
  0x1811798: i32 = FrameIndex <3>
        0x1403e98: <multiple use>
        0x1810f18: <multiple use>
        0x18114f0: <multiple use>
        0x1810e90: <multiple use>
      0x1811600: ch = store 0x1403e98, 0x1810f18, 0x18114f0, 0x1810e90 <0x14021fc:0> alignment=4
      0x1811028: <multiple use>
      0x1811688: <multiple use>
      0x1810e90: <multiple use>
    0x1811710: ch = store 0x1811600, 0x1811028, 0x1811688, 0x1810e90 <0x140226c:0> alignment=4
    0x1811138: <multiple use>
    0x1811798: <multiple use>
    0x1810e90: <multiple use>
  0x1811820: ch = store 0x1811710, 0x1811138, 0x1811798, 0x1810e90 <0x14022ac:0> alignment=4
    0x1811820: <multiple use>
    0x18114f0: <multiple use>
    0x1810e90: <multiple use>
  0x18118a8: i32,ch = load 0x1811820, 0x18114f0, 0x1810e90 <0x14021fc:0> alignment=4
    0x1811820: <multiple use>
    0x1811688: <multiple use>
    0x1810e90: <multiple use>
  0x1811a40: i32,ch = load 0x1811820, 0x1811688, 0x1810e90 <0x140226c:0> alignment=4
    0x1811820: <multiple use>
    0x1811798: <multiple use>
    0x1810e90: <multiple use>
  0x1811ac8: i32,ch = load 0x1811820, 0x1811798, 0x1810e90 <0x14022ac:0> alignment=4
  0x1811bd8: i32 = FrameIndex <0>
      0x18118a8: <multiple use>
      0x1811a40: <multiple use>
      0x1811ac8: <multiple use>
    0x1811c60: ch = TokenFactor 0x18118a8:1, 0x1811a40:1, 0x1811ac8:1
        0x18118a8: <multiple use>
        0x1811578: i32 = Constant <0>
        0x1811930: ch = setne
      0x18119b8: i1 = setcc 0x18118a8, 0x1811578, 0x1811930
      0x1811a40: <multiple use>
      0x1811ac8: <multiple use>
    0x1811b50: i32 = select 0x18119b8, 0x1811a40, 0x1811ac8
    0x1811bd8: <multiple use>
    0x1810e90: <multiple use>
  0x1811ce8: ch = store 0x1811c60, 0x1811b50, 0x1811bd8, 0x1810e90 <0x14020fc:0> alignment=4
        0x1403e98: <multiple use>
        0x18111c0: i32 = Register #1024
        0x1810f18: <multiple use>
      0x1811248: ch = CopyToReg 0x1403e98, 0x18111c0, 0x1810f18
        0x1403e98: <multiple use>
        0x18112d0: i32 = Register #1025
        0x1811028: <multiple use>
      0x1811358: ch = CopyToReg 0x1403e98, 0x18112d0, 0x1811028
        0x1403e98: <multiple use>
        0x18113e0: i32 = Register #1026
        0x1811138: <multiple use>
      0x1811468: ch = CopyToReg 0x1403e98, 0x18113e0, 0x1811138
      0x1811ce8: <multiple use>
    0x1822808: ch = TokenFactor 0x1811248, 0x1811358, 0x1811468, 0x1811ce8
    0x1822890: i32 = Register r3
      0x1811ce8: <multiple use>
      0x1811bd8: <multiple use>
      0x1810e90: <multiple use>
    0x1811d70: i32,ch = load 0x1811ce8, 0x1811bd8, 0x1810e90 <0x14020fc:0> alignment=4
  0x1822918: ch,flag = CopyToReg 0x1822808, 0x1822890, 0x1811d70
    0x1822918: <multiple use>
    0x1822918: <multiple use>
  0x18229a0: ch = MYISD::RET_FLAG 0x1822918, 0x1822918:1

Replacing.1 0x1811d70: i32,ch = load 0x1811ce8, 0x1811bd8, 0x1810e90 <0x14020fc:0> alignment=4
With: 0x1811b50: i32 = select 0x18119b8, 0x1811a40, 0x1811ac8 and 1 other values

Replacing.1 0x1811b50: i32 = select 0x18119b8, 0x1811a40, 0x1811ac8
With: 0x1822a28: i32,ch = load 0x1811820, 0x1811d70, 0x1810e90 <0x140226c:0> alignment=4 and 0 other values

Replacing.1 0x1811a40: i32,ch = load 0x1811820, 0x1811688, 0x1810e90 <0x140226c:0> alignment=4
With: 0x1822a28: i32,ch = load 0x1811820, 0x1811d70, 0x1810e90 <0x140226c:0> alignment=4 and 1 other values

Replacing.1 0x1811ac8: i32,ch = load 0x1811820, 0x1811798, 0x1810e90 <0x14022ac:0> alignment=4
With: 0x1822a28: i32,ch = load 0x1811820, 0x1811d70, 0x1810e90 <0x140226c:0> alignment=4 and 1 other values

Replacing.1 0x1811c60: ch = TokenFactor 0x18118a8:1, 0x1822a28:1, 0x1822a28:1
With: 0x1811ac8: ch = TokenFactor 0x18118a8:1, 0x1822a28:1 and 0 other values
Optimized lowered selection DAG:
SelectionDAG has 33 nodes:
  0x1403e98: ch = EntryToken
  0x1810e90: i32 = undef
    0x1403e98: <multiple use>
    0x1810e08: i32 = FrameIndex <-1>
    0x1810e90: <multiple use>
  0x1810f18: i32,ch = load 0x1403e98, 0x1810e08, 0x1810e90 <null:0> alignment=4
    0x1403e98: <multiple use>
    0x1810fa0: i32 = FrameIndex <-2>
    0x1810e90: <multiple use>
  0x1811028: i32,ch = load 0x1403e98, 0x1810fa0, 0x1810e90 <null:0> alignment=4
    0x1403e98: <multiple use>
    0x18110b0: i32 = FrameIndex <-3>
    0x1810e90: <multiple use>
  0x1811138: i32,ch = load 0x1403e98, 0x18110b0, 0x1810e90 <null:0> alignment=4
  0x18114f0: i32 = FrameIndex <1>
  0x1811688: i32 = FrameIndex <2>
  0x1811798: i32 = FrameIndex <3>
        0x1403e98: <multiple use>
        0x1810f18: <multiple use>
        0x18114f0: <multiple use>
        0x1810e90: <multiple use>
      0x1811600: ch = store 0x1403e98, 0x1810f18, 0x18114f0, 0x1810e90 <0x14021fc:0> alignment=4
      0x1811028: <multiple use>
      0x1811688: <multiple use>
      0x1810e90: <multiple use>
    0x1811710: ch = store 0x1811600, 0x1811028, 0x1811688, 0x1810e90 <0x140226c:0> alignment=4
    0x1811138: <multiple use>
    0x1811798: <multiple use>
    0x1810e90: <multiple use>
  0x1811820: ch = store 0x1811710, 0x1811138, 0x1811798, 0x1810e90 <0x14022ac:0> alignment=4
    0x1811820: <multiple use>
    0x18114f0: <multiple use>
    0x1810e90: <multiple use>
  0x18118a8: i32,ch = load 0x1811820, 0x18114f0, 0x1810e90 <0x14021fc:0> alignment=4
        0x1403e98: <multiple use>
        0x18111c0: i32 = Register #1024
        0x1810f18: <multiple use>
      0x1811248: ch = CopyToReg 0x1403e98, 0x18111c0, 0x1810f18
        0x1403e98: <multiple use>
        0x18112d0: i32 = Register #1025
        0x1811028: <multiple use>
      0x1811358: ch = CopyToReg 0x1403e98, 0x18112d0, 0x1811028
        0x1403e98: <multiple use>
        0x18113e0: i32 = Register #1026
        0x1811138: <multiple use>
      0x1811468: ch = CopyToReg 0x1403e98, 0x18113e0, 0x1811138
          0x18118a8: <multiple use>
          0x1822a28: <multiple use>
        0x1811ac8: ch = TokenFactor 0x18118a8:1, 0x1822a28:1
        0x1822a28: <multiple use>
        0x1811bd8: i32 = FrameIndex <0>
        0x1810e90: <multiple use>
      0x1811ce8: ch = store 0x1811ac8, 0x1822a28, 0x1811bd8, 0x1810e90 <0x14020fc:0> alignment=4
    0x1822808: ch = TokenFactor 0x1811248, 0x1811358, 0x1811468, 0x1811ce8
    0x1822890: i32 = Register r3
    0x1822a28: <multiple use>
  0x1822918: ch,flag = CopyToReg 0x1822808, 0x1822890, 0x1822a28
    0x1811820: <multiple use>
        0x18118a8: <multiple use>
        0x1811578: i32 = Constant <0>
        0x1811930: ch = setne
      0x18119b8: i1 = setcc 0x18118a8, 0x1811578, 0x1811930
      0x1811688: <multiple use>
      0x1811798: <multiple use>
    0x1811d70: i32 = select 0x18119b8, 0x1811688, 0x1811798
    0x1810e90: <multiple use>
  0x1822a28: i32,ch = load 0x1811820, 0x1811d70, 0x1810e90 <0x140226c:0> alignment=4
    0x1822918: <multiple use>
    0x1822918: <multiple use>
  0x18229a0: ch = MYISD::RET_FLAG 0x1822918, 0x1822918:1

Also, just after that, the dump includes:

Legally typed node: 0x1811798: i32 = FrameIndex <3>
Legally typed node: 0x1811688: i32 = FrameIndex <2>

I'm not sure how to indicate that these aren't legally typed. Or at least, aren't legally typed when storing that i32 into a register.