Hi @niwinanto. Looking at this pattern, I think it is generated by LowerCall. The underlying issue is that, under the ABI, the return and argument registers overlap (we have the same problem on ARM, and it disappears with the pre-RA scheduler turned off). I can reproduce this behavior with:
void init_var(int *v);
int chain(int c, int n);
void start() {
  int a, b, c;
  init_var(&a);
  init_var(&b);
  init_var(&c);
  int r = chain(b, a);
  r = chain(c, r);
}
In this case, we have the following code:
start: # @start
# %bb.0: # %entry
addi sp, sp, -16
sw ra, 12(sp) # 4-byte Folded Spill
addi a0, sp, 8
call init_var
addi a0, sp, 4
call init_var
mv a0, sp
call init_var
lw a0, 4(sp)
lw a1, 8(sp)
call chain
lw a1, 0(sp)
mv a2, a0
mv a0, a1
mv a1, a2
call chain
lw ra, 12(sp) # 4-byte Folded Reload
addi sp, sp, 16
ret
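In this code, a0 is both used (it holds the return value of the first call to chain) and defined (it is the first argument of the second call). So, if we copy to a0 first and then to a1, we need an extra temporary. But if we reverse the order of the copies, we make the work of machine-cp easier.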
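To illustrate why the order matters, here is a small standalone sketch. It is not LLVM code: `Copy` and `countClobbers` are names I made up for this example, and I am assuming the value of c sits in a2, as in the output after the patch. It only counts how many copies overwrite a register that a later copy still has to read.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <utility>
#include <vector>

// Hypothetical model of one register copy: first = destination, second = source.
using Copy = std::pair<std::string, std::string>;

// Count the copies that, emitted in the given order, overwrite a register
// which a later copy still needs to read. Each such clobber would force the
// old value to be saved in an extra temporary first.
std::size_t countClobbers(const std::vector<Copy> &copies) {
  std::size_t clobbers = 0;
  for (std::size_t i = 0; i < copies.size(); ++i) {
    // Assumption of this sketch: no copy has the same source and destination.
    assert(copies[i].first != copies[i].second);
    for (std::size_t j = i + 1; j < copies.size(); ++j) {
      if (copies[j].second == copies[i].first) {
        ++clobbers;
        break; // one temporary is enough for this destination
      }
    }
  }
  return clobbers;
}
```

With r in a0 and c in a2, the order "a0 <- a2, then a1 <- a0" gives one clobber (a0 is overwritten while r is still needed), while the reversed order "a1 <- a0, then a0 <- a2" gives none.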
If we use this:
diff --git a/llvm/lib/Target/RISCV/RISCVISelLowering.cpp b/llvm/lib/Target/RISCV/RISCVISelLowering.cpp
index 26475d94aef0..5713654d9ad6 100644
--- a/llvm/lib/Target/RISCV/RISCVISelLowering.cpp
+++ b/llvm/lib/Target/RISCV/RISCVISelLowering.cpp
@@ -17769,7 +17769,7 @@ SDValue RISCVTargetLowering::LowerCall(CallLoweringInfo &CLI,
SDValue Glue;
// Build a sequence of copy-to-reg nodes, chained and glued together.
- for (auto &Reg : RegsToPass) {
+ for (auto &Reg : llvm::reverse(RegsToPass)) {
Chain = DAG.getCopyToReg(Chain, DL, Reg.first, Reg.second, Glue);
Glue = Chain.getValue(1);
}
We can reduce the problem a bit, to:
start: # @start
# %bb.0: # %entry
addi sp, sp, -16
sw ra, 12(sp) # 4-byte Folded Spill
addi a0, sp, 8
call init_var
addi a0, sp, 4
call init_var
mv a0, sp
call init_var
lw a0, 4(sp)
lw a1, 8(sp)
call chain
lw a2, 0(sp)
mv a1, a0
mv a0, a2
call chain
lw ra, 12(sp) # 4-byte Folded Reload
addi sp, sp, 16
ret
As this problem also exists in other targets, it would be nice to fix it in a target-independent way.
I also took a look and saw that GCC does generate the code you pointed out.
As for the scheduler, I think this could trigger an interesting discussion, because disabling the pre-RA scheduler solves this problem and also produces slightly more compact code for this target.
Regards,
Andreu