Register allocation of stack slots

Hi all,
for my target, the register allocator tends to use very few registers (apparently just one) when allocating stack slots to registers. This happens only for function locals (allocas); allocation for e.g. function arguments passed on the stack works fine.
For example, the debug output of the initialization of several stack slots is the following:

1 : entry:
2 : %reg1074<def> = movC 0
3 : Store: store <fi#18>, 0, %R0<kill>
4 : Remembering SS#18 in physreg R0
5 : store <fi#18>, 0, %R0<kill>
6 : Reusing SS#18 from physreg R0 for vreg1075 instead of reloading into physreg R0
7 : store <fi#9>, 0, %R0, Mem:ST(2,2) [sig5069_nl + 0]
8 : Reusing SS#18 from physreg R0 for vreg1076 instead of reloading into physreg R0
9 : store <fi#8>, 0, %R0, Mem:ST(2,2) [sig5069_nc + 0]
10: %R0<def> = movC 16384
11: PhysReg R0 clobbered, invalidating SS#18
12: store <fi#7>, 0, %R0<kill>, Mem:ST(2,2) [sig5069_re + 0]
13: Remembering SS#18 in physreg R0
14: %R0<def> = load <fi#18>, 0

If I interpret the log correctly:
-R0 is used to initialize the fi#18
-R0 is used to initialize fi #8 and #9 as well
-in lines 10 and 12, another value has to be used to init frame objects. R0 is allocated again
-in line 14, R0 has to be reloaded
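
To make the reading of the log above concrete, here is a minimal toy model of the "remember / reuse / invalidate" bookkeeping the debug output suggests. It is only a sketch of my interpretation, not the allocator's actual implementation: the class `SlotCache`, the function `replay`, and the choice to count reloads are all hypothetical names introduced for illustration.

```python
class SlotCache:
    """Toy model: tracks which physical register currently mirrors a stack slot."""

    def __init__(self):
        self.slot_in_reg = {}  # physreg name -> stack slot number
        self.reloads = 0

    def remember(self, reg, slot):
        # "Remembering SS#n in physreg R"
        self.slot_in_reg[reg] = slot

    def clobber(self, reg):
        # "PhysReg R clobbered, invalidating SS#n"
        self.slot_in_reg.pop(reg, None)

    def use_slot(self, slot, reload_reg):
        """Reuse the slot if some register still mirrors it, else count a reload."""
        if slot in self.slot_in_reg.values():
            return True  # "Reusing SS#n from physreg R"
        self.reloads += 1
        self.remember(reload_reg, slot)
        return False


def replay(const_reg):
    """Replay the log, placing the constant 16384 in const_reg (an assumption)."""
    cache = SlotCache()
    cache.remember("R0", 18)   # log line 4
    cache.use_slot(18, "R0")   # log line 6
    cache.use_slot(18, "R0")   # log line 8
    cache.clobber(const_reg)   # log line 10: movC 16384 defines const_reg
    cache.use_slot(18, "R0")   # log line 14
    return cache.reloads


reloads_same_reg = replay("R0")   # constant overwrites R0, as in the log
reloads_other_reg = replay("R1")  # constant placed in a different register
```

In this model, putting the constant in R0 forces one reload of SS#18, while putting it in any other register forces none, which is the behaviour I would have expected from the allocator.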

The target has 16 registers, so why is no other register used at line 10?
Interestingly, the post-RA-scheduler pass, which I turn on by default, reassigns R1 for the constant 16384 at line 10. However, the reload of fi#18 stays in the code.

BTW, if someone could advise about the risks of using the apparently unfinished post-RA-scheduler - we really need it especially for the hazard recognizer, so it would be nice to be aware of known problems...

Thank you,
Christian

There's not quite enough information here for us to see
what's going on. Could you post more of the code?

Interestingly, the post-RA-scheduler pass, which I turn on by default, reassigns R1 for the constant 16384 at line 10. However, the reload of fi#18 stays in the code.

BTW, if someone could advise about the risks of using the apparently unfinished post-RA-scheduler - we really need it especially for the hazard recognizer, so it would be nice to be aware of known problems...

There are currently no known functionality problems. It isn't
enabled by default because it doesn't make enough of a
difference on ordinary code, while it does take compile time.

It is unfinished, in the sense that it doesn't have all the
features one might want in a post-RA scheduler, and because
the way it tracks register liveness information has a lot of
room for improvement. However, it is usable.

Dan

Dan,

thanks for your hints about the post-RA scheduler.

However, I have difficulties creating a reasonably small and concise
testcase for the problem I described, and it's on my own target anyway.
In principle, I see the following:

%theStruct = type { %theStruct2, %theStruct3, i16, ... }

define void @f1(%theStruct* %tmp) {

; initialize some locals
%a= alloca i16, align 2
store i16 1, i16* %a ;similar b ...
.
.

; GEP some pointers to elements of tmp
%x = getelementptr %theStruct* %tmp, i64 0, i32 13 ; similar y, z...
.
.

call void @f2(%theStruct2* %x, %theStruct2* %y, %theStruct3* %z, i16* %a, i16* %b, ...)

%var = getelementptr %theStruct* %tmp, i64 0, i32 7
store i16 0, i16* %var

ret void

}

Without the last two instructions before the ret (the getelementptr and the
store into the struct), registers for initializing locals like a and b are
allocated as I expect, i.e. one for each constant (the architecture needs to do
a mov and a store to initialize memory). With these two instructions, the
initialization of all the locals is done using only one register, which results
in reloads of locals that were initialized just before.

Well I don't know if it is possible to say anything about that, but maybe someone
has a clue?

thank you,

Christian

Are you generating code with the "fast" option, or the equivalent? This
can cause a variety of code pessimizations.

Otherwise, it's hard to say. The final store and its getelementptr are
keeping %tmp live across the call; without them no values are live
across the call, but I don't know what effect that has on your target.
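
The liveness point can be checked with a small sketch. This is a simplified straight-line model of the posted function, not anything LLVM runs: the tuple encoding of instructions and the helper `live_across` are assumptions made for illustration, and only %a and %x stand in for the elided locals and pointers.

```python
def live_across(instrs, index):
    """Values defined before instrs[index] and still used after it.

    Each instruction is (defined_value_or_None, [used_values]).
    Valid only for straight-line code where every value is defined once.
    """
    defined_before = {dest for dest, _ in instrs[:index] if dest is not None}
    used_after = set()
    for _, uses in instrs[index + 1:]:
        used_after.update(uses)
    return defined_before & used_after


with_store = [
    ("%tmp", []),          # function argument
    ("%a", []),            # alloca
    (None, ["%a"]),        # store i16 1, i16* %a
    ("%x", ["%tmp"]),      # getelementptr before the call
    (None, ["%x", "%a"]),  # call to f2
    ("%var", ["%tmp"]),    # getelementptr after the call
    (None, ["%var"]),      # store i16 0, i16* %var
]
CALL_INDEX = 4

# Same function with the final getelementptr and store dropped.
without_store = with_store[:CALL_INDEX + 1]

live_with = live_across(with_store, CALL_INDEX)
live_without = live_across(without_store, CALL_INDEX)
```

With the trailing getelementptr and store, `live_with` contains only `%tmp`; without them, `live_without` is empty, so nothing has to survive the call.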

Dan