Glue to connect two nodes in LLVM backend

Hello everyone,

I wanted to attach a node without affecting the present nodes in any way. I tried to use MVT::Glue for that but I think I’m missing something as I could not achieve the below state.

LUI LUI

ADDI ----GLUE---- ADDI

store

I’ve few question about this and Glue node in general, I’ll be happy to get some help on this :slight_smile:

  1. Is Gluing the right approach or there is something better for such use case?
  2. If I do something like DAG.getNode(ISD::ADDI, DL, MVT::Glue, LUINode, OtherADDINode). How will it know that the other ADDI node should be glued and not LUI. I seriously feel I’m missing something.

Many thanks,
Reshabh

Hi Reshabh

I was able to implement a custom backend for a particular architecture but I'm unsure if I understand your question. What you mean by “LUI”?. Also I think there’s no "ISD:ADDI” (?).

The third parameter of the DAG.getNode is the result type. If the node has a single result you can specify a simple MVT, otherwise you must create a SDVTList by calling DAG.getVTList. In your example you set the result of the DAG.getNode to be a MVT:Glue, but there’s no other result that you can use. I assume this is not what you want. The input types are given by the nodes that you pass as operands. Nodes with a glue only result are useful to implement things such as compare instructions that do not produce regular values. However, even in this case it is generally better to just add the flags as the instruction output and forget about the glue.

You can connect nodes in any way you want. Nodes can have input glues and output glues. Gluing nodes means that they will be executed in a defined order, but you should not need to do so if you already have regular values that you can connect in the way that will do what you want.

John

1. Is Gluing the right approach or there is something better for such use case?

I would expect the store to take an i64 operand as its address
(otherwise, in just what sense are your pointers 64-bit?), and
wouldn't expect to need glue at all.

There are two key legalization steps here:

1. Taking an i64:GlobalAddress and converting it into an
ISD::BUILD_PAIR of two nodes computing the low & high 32-bits of the
address.
2. Whatever replaces the store splitting its 64-bit pointer operand
into 2 32-bit values (via the inverse, ISD::EXTRACT_ELEMENT).

The combiner will eliminate the extra nodes from these steps, leading
to a single customized store taking two well-defined 32-bit operands.
All values would be tracked directly, and not need glue to convey a
side-channel of information. I think trying to use glue is misguided.

2. If I do something like DAG.getNode(ISD::ADDI, DL, MVT::Glue, LUINode, OtherADDINode). How will it know that the other ADDI node should be glued and not LUI. I seriously feel I'm missing something.

Again, I don't think Glue applies here, but I will try to describe how
it *is* used elsewhere a little...

If two nodes need glueing, that information is conveyed via a special
extra operand of type MVT::Glue. Some nodes produce this in addition
to their real outputs, and others consume it in addition to their real
inputs. Implementation-wise in your case you'd:

1. Create the OtherADDI node with an extra[*] output type MVT::Glue
beyond what would be expected (it should be the last output type).
2. Add that particular edge as an extra input (aside from the normal
inputs) to the ADDI you're creating now. If it's a duplicate, that
probably means you didn't need the glue in the first place because
normal dependency tracking would handle the situation. Otherwise, your
node now has an extra MVT::Glue edge that tracks this hidden
dependency. The glue should also be the last operand.

You also have to declare this kind of relationship in TableGen if
you're planning to use these nodes there, via SDNPInGlue and
SDNPOutGlue.

Cheers.

Tim.

[*] I wouldn't be surprised if this meant you needed a new
ISD::ADDIWithGlue instead. I haven't done it in years, but you should
be prepared in case a single node with two different output profiles
is more than LLVM can cope with.

Hi Reshabh

I was able to implement a custom backend for a particular architecture but I’m unsure if I understand your question. What you mean by “LUI”?. Also I think there’s no "ISD:ADDI” (?).

Sorry, my bad. LUI and ADDI are present in RISCV ISA.

The third parameter of the DAG.getNode is the result type. If the node has a single result you can specify a simple MVT, otherwise you must create a SDVTList by calling DAG.getVTList. In your example you set the result of the DAG.getNode to be a MVT:Glue, but there’s no other result that you can use. I assume this is not what you want. The input types are given by the nodes that you pass as operands. Nodes with a glue only result are useful to implement things such as compare instructions that do not produce regular values. However, even in this case it is generally better to just add the flags as the instruction output and forget about the glue.

You can connect nodes in any way you want. Nodes can have input glues and output glues. Gluing nodes means that they will be executed in a defined order, but you should not need to do so if you already have regular values that you can connect in the way that will do what you want.

Many thanks Joan, I was not aware about the SDVTList. After knowing more about MVT::Glue, I guess we might not need it and either a wrapper or as you said connecting them will just work.

Sorry Tim, I clicked on reply instead of reply all.

  1. Is Gluing the right approach or there is something better for such use case?

I would expect the store to take an i64 operand as its address
(otherwise, in just what sense are your pointers 64-bit?), and
wouldn’t expect to need glue at all.

I think we might not need a i64 operand at all (won’t they be pain during the legalization as we are working on RV32), as we plan to introduce new load/store like NEW_STORE rd rs1 rs2 (where rs1 has lower 32 bits of 64 bit address and rs2 have higher 32 bits).

There are two key legalization steps here:

  1. Taking an i64:GlobalAddress and converting it into an
    ISD::BUILD_PAIR of two nodes computing the low & high 32-bits of the
    address.

The use of BUILD_PAIR sounds really promising (I was considering to use a custom wrapper node to do the same thing). I took some time to try to make it work but I guess the legalizer fails after seeing a node with result type i64. I tried to be sure about it but it does not fail when it tries to legalize the BUILD_PAIR node with return type i64 instead it fails when it try to legalize the store node having this as an operand.

When I tried to replace BUILD_PAIR with a WRAPPER node which does the same thing but had a return type MVT::Glue (I was not sure so just used this, MVT::Other might be a better option?). Something like this,
t16: i32 = ADDI t15, …

t20: i32 = ADDI t19, …

t21: glue = RISCVISD::WRAPPER t16, t20

This passed the legalizer and failed at the instruction selection (which it should).

What do you think about it? I think the reason for failure (using BUILD_PAIR) is the i64 result which is not legal so the codegen tries to expand it over store, in which it fails as it does not know how to do that.

  1. Whatever replaces the store splitting its 64-bit pointer operand
    into 2 32-bit values (via the inverse, ISD::EXTRACT_ELEMENT).

The combiner will eliminate the extra nodes from these steps, leading
to a single customized store taking two well-defined 32-bit operands.
All values would be tracked directly, and not need glue to convey a
side-channel of information. I think trying to use glue is misguided.

Yes, now I agree that this is a far better way to deal with it.

  1. If I do something like DAG.getNode(ISD::ADDI, DL, MVT::Glue, LUINode, OtherADDINode). How will it know that the other ADDI node should be glued and not LUI. I seriously feel I’m missing something.

Again, I don’t think Glue applies here, but I will try to describe how
it is used elsewhere a little…

If two nodes need glueing, that information is conveyed via a special
extra operand of type MVT::Glue. Some nodes produce this in addition
to their real outputs, and others consume it in addition to their real
inputs. Implementation-wise in your case you’d:

  1. Create the OtherADDI node with an extra[*] output type MVT::Glue
    beyond what would be expected (it should be the last output type).
  2. Add that particular edge as an extra input (aside from the normal
    inputs) to the ADDI you’re creating now. If it’s a duplicate, that
    probably means you didn’t need the glue in the first place because
    normal dependency tracking would handle the situation. Otherwise, your
    node now has an extra MVT::Glue edge that tracks this hidden
    dependency. The glue should also be the last operand.

You also have to declare this kind of relationship in TableGen if
you’re planning to use these nodes there, via SDNPInGlue and
SDNPOutGlue.

Million thanks Tim, I was not aware about quite a many things. I’m happy I got to know new things. The BUILD_PAIR approach looks a lot cleaner than the Glue one.

I would expect the store to take an i64 operand as its address
(otherwise, in just what sense are your pointers 64-bit?), and
wouldn't expect to need glue at all.

I think we might not need a i64 operand at all (won't they be pain during the legalization as we are working on RV32), as we plan to introduce new load/store like NEW_STORE rd rs1 rs2 (where rs1 has lower 32 bits of 64 bit address and rs2 have higher 32 bits).

That makes sense, and should interact well with the pair/extract idiom
(with some minor annoyance to create those stores taking two sources).

The use of BUILD_PAIR sounds really promising (I was considering to use a custom wrapper node to do the same thing). I took some time to try to make it work but I guess the legalizer fails after seeing a node with result type i64. I tried to be sure about it but it does not fail when it tries to legalize the BUILD_PAIR node with return type i64 instead it fails when it try to legalize the store node having this as an operand.

What do you think about it? I think the reason for failure (using BUILD_PAIR) is the i64 result which is not legal so the codegen tries to expand it over store, in which it fails as it does not know how to do that.

If it's not feeding directly into an EXTRACT of the usual i64 ->
(i32,i32) type legalization[*] kind (I think EXTRACT_ELEMENT) then I
would expect failure. So you need to replace your store with the
custom RS1, RS2 variant at the type legalization phase. That's very
early, and might not even have hooks. If you're finding that then the
first DAG-combine phase runs before type legalization, and definitely
can be hooked to replace all stores with your custom node.

Cheers.

Tim.

[*] In case this term isn't familiar, the sequence of DAG phases goes
something like:

Combine 1 -> Type legalization -> Combine 2 -> Legalization -> Selection

After type legalization (for you) *nothing* should have type i64, so
if you need to replace things you have to do it before then. I believe
a part of type legalization will fold BUILD_PAIR/EXTRACT_ELEMENT pairs
to eliminate incidental/trivial i64s.

Million thanks Tim, your replies always take us to a better and much cleaner approach.

I would expect the store to take an i64 operand as its address
(otherwise, in just what sense are your pointers 64-bit?), and
wouldn’t expect to need glue at all.

I think we might not need a i64 operand at all (won’t they be pain during the legalization as we are working on RV32), as we plan to introduce new load/store like NEW_STORE rd rs1 rs2 (where rs1 has lower 32 bits of 64 bit address and rs2 have higher 32 bits).

That makes sense, and should interact well with the pair/extract idiom
(with some minor annoyance to create those stores taking two sources).

The use of BUILD_PAIR sounds really promising (I was considering to use a custom wrapper node to do the same thing). I took some time to try to make it work but I guess the legalizer fails after seeing a node with result type i64. I tried to be sure about it but it does not fail when it tries to legalize the BUILD_PAIR node with return type i64 instead it fails when it try to legalize the store node having this as an operand.

What do you think about it? I think the reason for failure (using BUILD_PAIR) is the i64 result which is not legal so the codegen tries to expand it over store, in which it fails as it does not know how to do that.

If it’s not feeding directly into an EXTRACT of the usual i64 →
(i32,i32) type legalization[*] kind (I think EXTRACT_ELEMENT) then I
would expect failure. So you need to replace your store with the
custom RS1, RS2 variant at the type legalization phase. That’s very
early, and might not even have hooks. If you’re finding that then the
first DAG-combine phase runs before type legalization, and definitely
can be hooked to replace all stores with your custom node.

Yeah this seems far better, we should be doing this already. Last week (after reading your reply) I tried to do this at the type legalization phase. I could successfully replace store node with the custom store (but things didn’t go well).
I am replacing the following store node:

ch = store<(store 4 into i32 addrspace(1)* getelementptr inbounds ([1 x i32], [1 x i32] addrspace(1)* @foo, i32 0, i32 0), addrspace 1)> t4, Constant:i32<87>, GlobalAddress:i64<[1 x i32] addrspace(1)* @foo> 0, undef:i64

where,
t4 : ch = store<(store 4 into %ir.retval)> t0, Constant:i32<0>, FrameIndex:i32<0>, undef:i32
t0 : ch = EntryToken

into:

ch = CUSTOM_STORE Constant:i32<87>, tx, ty, t4

where tx and ty are ADDI for Hi and Lo.

If I don’t keep t4 in the operand list, it fails at second DAG combiner phase, saying entry node not found. I found your reply on a thread about chaining loads and stores(http://lists.llvm.org/pipermail/llvm-dev/2017-June/114783.html) and it makes some sense to me that we need to have the custom store chained as it was before(?) and when I tried that, it failed an assertion at the end of scheduler, saying “#operands for dag node doesn’t match .td file!” (This makes some sense to me, as per .td file it should have three operand not four(?)). Can you please suggest something for this, I don’t have any backup plan to make this work (I’m stuck).

SelectionDAG after instruction selection (Line 11 has the CUSTOM_STORE)

  1. t0: ch = EntryToken

  2. t5: i32 = ADDI Register:i32 $x0, TargetConstant:i32<4>

  3. t34: i32,ch = CopyFromReg t0, Register:i32 $x0

  4. t4: ch = SW<Mem:(store 4 into %ir.retval)> t34, TargetFrameIndex:i32<0>, TargetConstant:i32<0>, t0

  5. t7: ch = SW<Mem:(store 4 into %ir.a)> t5, TargetFrameIndex:i32<1>, TargetConstant:i32<0>, t4

  6. t8: i32 = ADDI Register:i32 $x0, TargetConstant:i32<87>

  7. t19: i32 = LUI TargetGlobalAddress:i32<[1 x i32] addrspace(1)* @foo> 0 [TF=2]

  8. t20: i32 = ADDI t19, TargetGlobalAddress:i32<[1 x i32] addrspace(1)* @foo> 0 [TF=3]

  9. t23: i32 = LUI TargetGlobalAddress:i32<[1 x i32] addrspace(1)* @foo> 0 [TF=4]

  10. t24: i32 = ADDI t23, TargetGlobalAddress:i32<[1 x i32] addrspace(1)* @foo> 0 [TF=1]

  11. t27: ch = CUSTOM_STORE t8, t20, t24, t7

  12. t13: i32,ch = LW<Mem:(dereferenceable load 4 from %ir.a)> TargetFrameIndex:i32<1>, TargetConstant:i32<0>, t27

  13. t15: ch,glue = CopyToReg t27, Register:i32 $x10, t13

  14. t16: ch = PseudoRET Register:i32 $x10, t15, t15:1

Cheers.

Tim.

[*] In case this term isn’t familiar, the sequence of DAG phases goes
something like:

Combine 1 → Type legalization → Combine 2 → Legalization → Selection

After type legalization (for you) nothing should have type i64, so
if you need to replace things you have to do it before then. I believe
a part of type legalization will fold BUILD_PAIR/EXTRACT_ELEMENT pairs
to eliminate incidental/trivial i64s.

Many thanks
Reshabh

Million thanks Tim, your replies always take us to a better and much cleaner approach.

I would expect the store to take an i64 operand as its address
(otherwise, in just what sense are your pointers 64-bit?), and
wouldn’t expect to need glue at all.

I think we might not need a i64 operand at all (won’t they be pain during the legalization as we are working on RV32), as we plan to introduce new load/store like NEW_STORE rd rs1 rs2 (where rs1 has lower 32 bits of 64 bit address and rs2 have higher 32 bits).

That makes sense, and should interact well with the pair/extract idiom
(with some minor annoyance to create those stores taking two sources).

The use of BUILD_PAIR sounds really promising (I was considering to use a custom wrapper node to do the same thing). I took some time to try to make it work but I guess the legalizer fails after seeing a node with result type i64. I tried to be sure about it but it does not fail when it tries to legalize the BUILD_PAIR node with return type i64 instead it fails when it try to legalize the store node having this as an operand.

What do you think about it? I think the reason for failure (using BUILD_PAIR) is the i64 result which is not legal so the codegen tries to expand it over store, in which it fails as it does not know how to do that.

If it’s not feeding directly into an EXTRACT of the usual i64 →
(i32,i32) type legalization[*] kind (I think EXTRACT_ELEMENT) then I
would expect failure. So you need to replace your store with the
custom RS1, RS2 variant at the type legalization phase. That’s very
early, and might not even have hooks. If you’re finding that then the
first DAG-combine phase runs before type legalization, and definitely
can be hooked to replace all stores with your custom node.

Yeah this seems far better, we should be doing this already. Last week (after reading your reply) I tried to do this at the type legalization phase. I could successfully replace store node with the custom store (but things didn’t go well).
I am replacing the following store node:

ch = store<(store 4 into i32 addrspace(1)* getelementptr inbounds ([1 x i32], [1 x i32] addrspace(1)* @foo, i32 0, i32 0), addrspace 1)> t4, Constant:i32<87>, GlobalAddress:i64<[1 x i32] addrspace(1)* @foo> 0, undef:i64

where,
t4 : ch = store<(store 4 into %ir.retval)> t0, Constant:i32<0>, FrameIndex:i32<0>, undef:i32
t0 : ch = EntryToken

into:

ch = CUSTOM_STORE Constant:i32<87>, tx, ty, t4

where tx and ty are ADDI for Hi and Lo.

If I don’t keep t4 in the operand list, it fails at second DAG combiner phase, saying entry node not found. I found your reply on a thread about chaining loads and stores(http://lists.llvm.org/pipermail/llvm-dev/2017-June/114783.html) and it makes some sense to me that we need to have the custom store chained as it was before(?) and when I tried that, it failed an assertion at the end of scheduler, saying “#operands for dag node doesn’t match .td file!” (This makes some sense to me, as per .td file it should have three operand not four(?)). Can you please suggest something for this, I don’t have any backup plan to make this work (I’m stuck).

SelectionDAG after instruction selection (Line 11 has the CUSTOM_STORE)

  1. t0: ch = EntryToken

  2. t5: i32 = ADDI Register:i32 $x0, TargetConstant:i32<4>

  3. t34: i32,ch = CopyFromReg t0, Register:i32 $x0

  4. t4: ch = SW<Mem:(store 4 into %ir.retval)> t34, TargetFrameIndex:i32<0>, TargetConstant:i32<0>, t0

  5. t7: ch = SW<Mem:(store 4 into %ir.a)> t5, TargetFrameIndex:i32<1>, TargetConstant:i32<0>, t4

  6. t8: i32 = ADDI Register:i32 $x0, TargetConstant:i32<87>

  7. t19: i32 = LUI TargetGlobalAddress:i32<[1 x i32] addrspace(1)* @foo> 0 [TF=2]

  8. t20: i32 = ADDI t19, TargetGlobalAddress:i32<[1 x i32] addrspace(1)* @foo> 0 [TF=3]

  9. t23: i32 = LUI TargetGlobalAddress:i32<[1 x i32] addrspace(1)* @foo> 0 [TF=4]

  10. t24: i32 = ADDI t23, TargetGlobalAddress:i32<[1 x i32] addrspace(1)* @foo> 0 [TF=1]

  11. t27: ch = CUSTOM_STORE t8, t20, t24, t7

  12. t13: i32,ch = LW<Mem:(dereferenceable load 4 from %ir.a)> TargetFrameIndex:i32<1>, TargetConstant:i32<0>, t27

  13. t15: ch,glue = CopyToReg t27, Register:i32 $x10, t13

  14. t16: ch = PseudoRET Register:i32 $x10, t15, t15:1

We were using a wrong input as chain input, I’ve fixed it and we could pass this. We are now failing at instruction printer and I think the reason is the missing value operand in our custom store after the conversion to machine instruction (More description: http://lists.llvm.org/pipermail/llvm-dev/2019-July/134119.html)

Many thanks,
Reshabh