Hi,
I compiled the same IR for the same custom target with two different compilers.
The LLVM IR is:
define i32 @_Z9test_mathv() #0 {
%a = alloca i32, align 4
%1 = load i32, i32* %a, align 4
ret i32 %1
}
Before instruction selection, the Selection DAGs are the same:
Optimized legalized selection DAG: %bb.0 '_Z9test_mathv:'
SelectionDAG has 7 nodes:
t0: ch = EntryToken
t4: i32,ch = load<(dereferenceable load 4 from %ir.a)> t0, FrameIndex:i32<0>, undef:i32
t6: ch,glue = CopyToReg t0, Register:i32 $r4, t4
t7: ch = UISD::Ret t6, Register:i32 $r4, t6:1
After instruction selection, however, the DAG from compiler 1 has one more node than the DAG from compiler 2.
Compiler 1:
===== Instruction selection ends:
Selected selection DAG: %bb.0 '_Z9test_mathv:'
SelectionDAG has 8 nodes:
t0: ch = EntryToken
t1: i32 = add TargetFrameIndex:i32<0>, TargetConstant:i32<0>
t4: i32,ch = LDWI<Mem:(dereferenceable load 4 from %ir.a)> t1, t0
t6: ch,glue = CopyToReg t0, Register:i32 $r4, t4
t7: ch = JLR Register:i32 $r4, t6, t6:1
Compiler 2:
===== Instruction selection ends:
Selected selection DAG: BB#0 '_Z9test_mathv:'
SelectionDAG has 7 nodes:
t0: ch = EntryToken
t4: i32,ch = LDWI<Mem:LD4[%a](dereferenceable)> TargetFrameIndex:i32<0>, TargetConstant:i32<0>, t0
t6: ch,glue = CopyToReg t0, Register:i32 %$r4, t4
t7: ch = JLR Register:i32 %$r4, t6, t6:1
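In the first case, the address computation is a separate node (t1) that feeds t4, whereas in the second case the frame index and the constant offset appear directly as operands of t4 and there is no t1. What difference in implementation could explain this difference in behavior? Where in the code should I look?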
(Note that “LDWI” is an instruction that adds a register and an immediate, then loads the memory content at the resulting address into a register.)
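To make the semantics I mean concrete, here is a minimal C++ sketch of what LDWI computes. The function name `ldwi`, the byte-vector memory model, the 4-byte width (taken from the "load 4" in the dumps) and the use of host byte order are all assumptions for illustration only; none of this is code from the actual backend.

```cpp
#include <cstdint>
#include <cstring>
#include <stdexcept>
#include <vector>

// Hypothetical model of LDWI: result = 32-bit value at mem[base + imm].
// "base" stands for the register operand and "imm" for the immediate operand.
std::uint32_t ldwi(const std::vector<std::uint8_t>& mem,
                   std::uint32_t base, std::int32_t imm) {
    // Effective address is the sum of the register and the immediate.
    const std::uint32_t addr = base + static_cast<std::uint32_t>(imm);

    // Reject accesses that would read past the end of the modeled memory.
    if (addr > mem.size() || mem.size() - addr < sizeof(std::uint32_t)) {
        throw std::out_of_range("ldwi: address out of range");
    }

    // Read 4 bytes in host byte order (an assumption, see above).
    std::uint32_t value = 0;
    std::memcpy(&value, mem.data() + addr, sizeof(value));
    return value;
}
```

In these terms, both selected DAGs describe the same load of frame slot 0 at offset 0; they differ only in whether the base-plus-offset sum shows up as its own node or as two operands of LDWI.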
Thanks.