Million thanks Tim, your replies always take us to a better and much cleaner approach.
I would expect the store to take an i64 operand as its address
(otherwise, in just what sense are your pointers 64-bit?), and
wouldn’t expect to need glue at all.
I think we might not need a i64 operand at all (won’t they be pain during the legalization as we are working on RV32), as we plan to introduce new load/store like NEW_STORE rd rs1 rs2 (where rs1 has lower 32 bits of 64 bit address and rs2 have higher 32 bits).
That makes sense, and should interact well with the pair/extract idiom
(with some minor annoyance to create those stores taking two sources).
The use of BUILD_PAIR sounds really promising (I was considering to use a custom wrapper node to do the same thing). I took some time to try to make it work but I guess the legalizer fails after seeing a node with result type i64. I tried to be sure about it but it does not fail when it tries to legalize the BUILD_PAIR node with return type i64 instead it fails when it try to legalize the store node having this as an operand.
What do you think about it? I think the reason for failure (using BUILD_PAIR) is the i64 result which is not legal so the codegen tries to expand it over store, in which it fails as it does not know how to do that.
If it’s not feeding directly into an EXTRACT of the usual i64 →
(i32,i32) type legalization[*] kind (I think EXTRACT_ELEMENT) then I
would expect failure. So you need to replace your store with the
custom RS1, RS2 variant at the type legalization phase. That’s very
early, and might not even have hooks. If you’re finding that then the
first DAG-combine phase runs before type legalization, and definitely
can be hooked to replace all stores with your custom node.
Yeah this seems far better, we should be doing this already. Last week (after reading your reply) I tried to do this at the type legalization phase. I could successfully replace store node with the custom store (but things didn’t go well).
I am replacing the following store node:
ch = store<(store 4 into i32 addrspace(1)* getelementptr inbounds ([1 x i32], [1 x i32] addrspace(1)* @foo, i32 0, i32 0)
, addrspace 1)> t4, Constant:i32<87>, GlobalAddress:i64<[1 x i32] addrspace(1)* @foo> 0, undef:i64
where,
t4 : ch = store<(store 4 into %ir.retval)> t0, Constant:i32<0>, FrameIndex:i32<0>, undef:i32
t0 : ch = EntryToken
into:
ch = CUSTOM_STORE Constant:i32<87>, tx, ty, t4
where tx and ty are ADDI for Hi and Lo.
If I don’t keep t4 in the operand list, it fails at second DAG combiner phase, saying entry node not found. I found your reply on a thread about chaining loads and stores(http://lists.llvm.org/pipermail/llvm-dev/2017-June/114783.html) and it makes some sense to me that we need to have the custom store chained as it was before(?) and when I tried that, it failed an assertion at the end of scheduler, saying “#operands for dag node doesn’t match .td file!” (This makes some sense to me, as per .td file it should have three operand not four(?)). Can you please suggest something for this, I don’t have any backup plan to make this work (I’m stuck).
SelectionDAG after instruction selection (Line 11 has the CUSTOM_STORE)
-
t0: ch = EntryToken
-
t5: i32 = ADDI Register:i32 $x0, TargetConstant:i32<4>
-
t34: i32,ch = CopyFromReg t0, Register:i32 $x0
-
t4: ch = SW<Mem:(store 4 into %ir.retval)> t34, TargetFrameIndex:i32<0>, TargetConstant:i32<0>, t0
-
t7: ch = SW<Mem:(store 4 into %ir.a)> t5, TargetFrameIndex:i32<1>, TargetConstant:i32<0>, t4
-
t8: i32 = ADDI Register:i32 $x0, TargetConstant:i32<87>
-
t19: i32 = LUI TargetGlobalAddress:i32<[1 x i32] addrspace(1)* @foo> 0 [TF=2]
-
t20: i32 = ADDI t19, TargetGlobalAddress:i32<[1 x i32] addrspace(1)* @foo> 0 [TF=3]
-
t23: i32 = LUI TargetGlobalAddress:i32<[1 x i32] addrspace(1)* @foo> 0 [TF=4]
-
t24: i32 = ADDI t23, TargetGlobalAddress:i32<[1 x i32] addrspace(1)* @foo> 0 [TF=1]
-
t27: ch = CUSTOM_STORE t8, t20, t24, t7
-
t13: i32,ch = LW<Mem:(dereferenceable load 4 from %ir.a)> TargetFrameIndex:i32<1>, TargetConstant:i32<0>, t27
-
t15: ch,glue = CopyToReg t27, Register:i32 $x10, t13
-
t16: ch = PseudoRET Register:i32 $x10, t15, t15:1
Cheers.
Tim.
[*] In case this term isn’t familiar, the sequence of DAG phases goes
something like:
Combine 1 → Type legalization → Combine 2 → Legalization → Selection
After type legalization (for you) nothing should have type i64, so
if you need to replace things you have to do it before then. I believe
a part of type legalization will fold BUILD_PAIR/EXTRACT_ELEMENT pairs
to eliminate incidental/trivial i64s.
Many thanks
Reshabh