Question about Pre-RA-schedule in LLVM3.3

Hi,
I compile a test case (test.c) into an object file (test.o) with clang as follows:
“clang -target arm -integrated-as -c test.c -o test.o”
My clang is version 3.3, a debug build.

//test.c
int a[6] = {1, 2, 3, 4, 5, 6};
int main() {
a[0] = a[5];
a[1] = a[4];
a[2] = a[5];
}
//end test.c
Then I generate test.dump with the objdump tool.
//test.dump
ldr r1, [r0, #20]
str r1, [r0]
ldr r1, [r0, #16]
str r1, [r0, #4]
ldr r1, [r0, #12]
str r1, [r0, #8]
bx lr
//end test.dump
From test.dump, we can see that the first and second instructions both use register “r1”, as do the 3rd and 4th, and the 5th and 6th.
That is to say, all six instructions use the same register.
However, I think the 3rd and 4th instructions should be allocated a different register from the second instruction.
So I set a breakpoint in the BuildSchedGraph function in ScheduleDAGSDNodes.cpp to debug the source code.
Then I get the schedule graph of this basic block:

As the graph above shows, the pre-RA scheduler (ScheduleDAGRRList.cpp) is unable to insert the 3rd SDNode (the load2 instruction) between the first SDNode (the load1 instruction) and the second SDNode (store1).
Then, in the register allocation step, each load/store pair is allocated the same register.
However, if we build a schedule graph like the following:

I think the pre-RA scheduler then has a chance to schedule load1 and store1 apart, and likewise load2 and store2.
Has anyone considered building such a schedule graph?
Thank you very much for any suggestions.
-Haishan

Haishan wrote on 15.12.2013 17:47:


Try -mllvm -pre-RA-sched=list-burr

> From: llvmdev-bounces@cs.uiuc.edu [mailto:llvmdev-bounces@cs.uiuc.edu]
> On Behalf Of Haishan
> Subject: [LLVMdev] Question about Pre-RA-schedule in LLVM3.3

> My clang version is 3.3 and debug build.

> //test.c
> int a[6] = {1, 2, 3, 4, 5, 6}
> int main() {
> a[0] = a[5];
> a[1] = a[4];
> a[2] = a[5];
> }
> //end test.c
> Then test.dump is generated by using the objdump tool.
> //test.dump
> ldr r1, [r0, #20]
> str r1, [r0]
> ldr r1, [r0, #16]
> str r1, [r0, #4]
> ldr r1, [r0, #12]
> str r1, [r0, #8]
> bx lr
> //end test.dump

It appears you have a typo in the above, since the generated array reference offsets do not correspond to the code in test.c. Presumably, the last array reference in test.c was really from a[3], not a[5].

> However, for 3th and 4th instructions, they should be allocated different
> register from the second instruction.

Why?

- Chuck

>> From: llvmdev-bounces@cs.uiuc.edu [mailto:llvmdev-bounces@cs.uiuc.edu]
>> On Behalf Of Haishan
>> Subject: [LLVMdev] Question about Pre-RA-schedule in LLVM3.3
>
>It appears you have a typo in the above, since the generated array reference offsets do not correspond to the code in test.c.  Presumably, the last array reference in test.c was really from a[3], not a[5].
I'm sorry for the mistake in test.c above.
Your presumption is right.
>
>> However, for 3th and 4th instructions, they should be allocated different 
>> register from the second instruction.
>
>Why?
>
> - Chuck
>

Thank you for your answer.
If the 3rd and 4th instructions were allocated a different register from the second instruction,
the dependence on the same machine register would disappear,
and this instruction sequence would execute with fewer stalls and cycles.
However, in the latest version of LLVM, the pre-RA scheduler builds the scheduling graph (original graph) shown below.
//original graph
----> data flow
====> control flow
load1 ----> store1 ====> load2 ----> store2 ====> load3 ----> store3
//end original graph
So the pre-RA scheduler is unable to schedule the two instructions of a load/store pair apart.
Because the live range of each loaded value then ends at its store, the live ranges never overlap, and the register allocator assigns the same register to every load/store pair.

If we change the control-flow edges in the original graph above, we get the modified graph shown below.
//modified graph
----> data flow
====> control flow
load1 ----> store1 ====> store2 ====> store3
load2 ----> store2
load3 ----> store3
//end modified graph

I think the pre-RA scheduler would then be able to schedule the load/store pairs apart,
so that each pair uses a different register.
The scheduled order for test.c might then be load1, load2, load3, store1, store2, store3.
Best Wishes
- Haishan

The flag -enable-aa-sched-mi should do what you want in the MachineScheduler pass.

If you want to do it in the selection DAG, there is a subtarget hook that might do it:

TargetSubtargetInfo::useAA()

LLVM won’t generate the schedule you want anyway for Intel core processors, but the alias analysis can be useful in general.

-Andy

From: "Andrew Trick" <atrick@apple.com>
To: "Haishan" <hndxvon@163.com>
Cc: "llvmdev" <llvmdev@cs.uiuc.edu>
Sent: Friday, December 20, 2013 8:41:40 PM
Subject: Re: [LLVMdev] Question about Pre-RA-schedule in LLVM3.3

> The flag -enable-aa-sched-mi should do what you want in the
> MachineScheduler pass.
>
> If you want to do it in the selection DAG, there is a subtarget hook
> that might do it:
>
> TargetSubtargetInfo::useAA()

To be clear, returning true from useAA enables the use of AA both in the selection DAG and also in the MachineScheduler pass.

-Hal