Heads up: Local register allocator going away

I just changed the default register allocator for -O0 builds to the fast allocator.

This means that the local register allocator is not used anymore, and since it does more or less the same as the fast allocator, there is no reason to keep it around.

I am going to delete it in a week or two.

If you are using the local register allocator, please try switching to the fast allocator and report any bugs you find.

Thanks,
/jakob

Tried it, and it seems to break quite a big chunk of our tests on SPU :slight_smile:

Before r103488 ("Mostly rewrite RegAllocFast") there was no problem.

But with r103488, I get a:
llvm/lib/CodeGen/RegisterScavenging.cpp:196: void
llvm::RegScavenger::forward(): Assertion `SubUsed && "Using an undefined
register!"' failed.

In r103685 ("More asserts around physreg uses") the error changed to:
llvm/lib/CodeGen/RegAllocFast.cpp:629:
void<unnamed>::RAFast::AllocateBasicBlock(llvm::MachineBasicBlock&):
Assertion `PhysRegState[Reg] <= regReserved && "Using clobbered
physreg"' failed.

And with latest it is now:
Instruction uses an allocated register
UNREACHABLE executed
at /home/kraiskil/llvm/svn-clean/llvm/lib/CodeGen/RegAllocFast.cpp:302!

This is probably in the SPU backend, as all (most) other backends
compile the example just fine? Where do I start to look if I want to fix
this? I can file a PR if this is not in the SPU backend.

kalle

P.s. This is a simplification of programs that crash:

declare [8 x [8 x float]]* @extFunc()
define void @testFunc() {
  %sl8_5 = tail call [8 x [8 x float]]* @extFunc()
  br label %Entry
Entry:
  %idx = phi i64 [ 0, %0 ], [ %next, %Entry ]
  %scevgep = getelementptr [8 x [8 x float]]* %sl8_5, i64 0, i64 %idx,
i64 0
  %next = add i64 %idx, 1
  %exitcond = icmp eq i64 %next, 8
  br i1 %exitcond, label %Exit, label %Entry
Exit:
  ret void
}
(Sorry for the cluttered simpification: removing any of the call,
getelementpointer or loop removes the llc crash)

In my target the CALL instruction change the link Register %LR
In the target InstrInfo.td I have "Defs=[LR]" on the CALL instruction
definition to handle that.

So your CALL instructions are clobbering your callee-saved registers, eh? :wink:

It works well with others registers allocators: when there is a call
they mark LR used in the function with MRI->addPhysRegsUsed(UsedInInstr);
and LR is callee saved correctly during prolog/epilog insertion.

But with fast register allocator the following lines ignore the Def on
LR in the call instruction: (line 747)

if (TID.isCall()) {
// Spill all virtregs before a call. This serves two purposes: 1. If an
// exception is thrown, the landing pad is going to expect to find registers
// in their spill slots, and 2. we don't have to wade through all the
// <imp-def> operands on the call instruction.
DefOpEnd = VirtOpEnd;
DEBUG(dbgs() << " Spilling remaining registers before call.\n");
spillAll(MI);
}

if I remove this block this fix the bug: LR is marked used with
MRI->addPhysRegsUsed(UsedInInstr)
and LR is correctly saved and restored in the function prolog/epilog.

Is this code really usefull ? :smiley:

Yes, calls can have ridiculously long lists of clobbered registers that we really don't want to go through in a 'fast' allocator.

But you are right, those registers still need to be marked as used by the function.

I'll fix it, thanks!

/jakob

If you are using the local register allocator, please try switching to the fast allocator and report any bugs you find.

Tried it, and it seems to break quite a big chunk of our tests on SPU :slight_smile:

Thanks for testing it!

[...]

And with latest it is now:
Instruction uses an allocated register
UNREACHABLE executed
at /home/kraiskil/llvm/svn-clean/llvm/lib/CodeGen/RegAllocFast.cpp:302!

This is RegAllocFast's way of saying "Oops, I clobbered the return value from your CALL. Didn't think you would need it."

The problem is this code:

BB#0: derived from LLVM BB %0
  BRASL <ga:@extFunc>, %R0<imp-def>, %R1<imp-def>, %R3<imp-def>, %R0<imp-use>, ...
  %reg1028<def> = ILv4i32 0
  %reg1027<def> = ORi64_v2i64 %reg1028
  ADJCALLSTACKUP 0, %R1<imp-def>, %R1<imp-use>
  %reg1029<def> = LRr32 %R3

The return value from the call is in %R3, but %reg1027 and %reg1028 are also allocated to %R3 before it is copied to a safe place (%reg1029).

RegAllocFast does not distinguish between call-clobbered registers and return value registers. They are all considered 'free' after the call. It expects return value physregs to be copied to virtregs immediately after the call instruction, so they are not clobbered.

This is a bit risky, but it works when the return value CopyFromReg is scheduled immediately following the call.

So the question is: What are those ILv4i32 and ORi64_v2i64 doing inside the call sequence? Can you get rid of them?

If you look at the scheduler DAG for this BB (llc -view-sched-dags), you can see that the return value CopyFromReg is tied to ADJCALLSTACKUP with a flag, but there is no flag from ADJCALLSTACKUP to BRASL. This allows the scheduler to put other instructions in the gap, and you don't want that.

You should fix SPUTargetLowering::LowerCall to make sure there is an unbroken chain of flag ties between CopyFromReg and BRASL. At least ARM, MBlaze, and Blackfin are doing this, if you need example code.

This is probably in the SPU backend, as all (most) other backends
compile the example just fine? Where do I start to look if I want to fix
this? I can file a PR if this is not in the SPU backend.

In this case it was RegAllocFast making special assumptions, but in general -verify-machineinstrs and -debug-only=regalloc might help.

And feel free to file PRs for the SPU backend as well.

(Sorry for the cluttered simpification: removing any of the call,
getelementpointer or loop removes the llc crash)

Do you know about bugpoint?

"bugpoint test.ll -run-llc -tool-args -O0 -march=cellspu" will reduce the test case for you. It is awesome.

/jakob

Jakob Stoklund Olesen wrote:

In my target the CALL instruction change the link Register %LR
In the target InstrInfo.td I have "Defs=[LR]" on the CALL instruction
definition to handle that.

So your CALL instructions are clobbering your callee-saved registers, eh? :wink:

Yes, exactly !

It works well with others registers allocators: when there is a call
they mark LR used in the function with MRI->addPhysRegsUsed(UsedInInstr);
and LR is callee saved correctly during prolog/epilog insertion.

But with fast register allocator the following lines ignore the Def on
LR in the call instruction: (line 747)

if (TID.isCall()) {
// Spill all virtregs before a call. This serves two purposes: 1. If an
// exception is thrown, the landing pad is going to expect to find registers
// in their spill slots, and 2. we don't have to wade through all the
// <imp-def> operands on the call instruction.
DefOpEnd = VirtOpEnd;
DEBUG(dbgs() << " Spilling remaining registers before call.\n");
spillAll(MI);
}

if I remove this block this fix the bug: LR is marked used with
MRI->addPhysRegsUsed(UsedInInstr)
and LR is correctly saved and restored in the function prolog/epilog.

Is this code really usefull ? :smiley:

Yes, calls can have ridiculously long lists of clobbered registers that we really don't want to go through in a 'fast' allocator.

But you are right, those registers still need to be marked as used by the function.

I'll fix it, thanks!

Your fix works, thanks !

Thanks for the tip. This got fixed in 105601.

And with that, half of the problematic tests appearing with
--regalloc=fast flag started working. Unfortunately the second half
started to miscompile :frowning:

Also, I now see some rather unoptimal code, e.g:
  ...
  brasl $lr, extFunc
  lr $3, $3
  lr $3, $3
  ...
('lr rt, ra' moves ra->rt). I guess the miscompilations are due to the
same problem as this sort of stuff gets generated. I'm looking into it,
but do you have any more of those useful tips? :wink: The code I used is the
same test case as earlier in this chain. Compiled with:
   llc --march=cellspu --regalloc=fast test.ll -o -

thanks,
kalle

You should fix SPUTargetLowering::LowerCall to make sure there is an unbroken chain of flag ties between CopyFromReg and BRASL. At least ARM, MBlaze, and Blackfin are doing this, if you need example code.

Thanks for the tip. This got fixed in 105601.

Great!

And with that, half of the problematic tests appearing with
--regalloc=fast flag started working. Unfortunately the second half
started to miscompile :frowning:

Do you know how they are being miscompiled? Is the return value from a call being clobbered? Did they crash the compiler before your fix?

Also, I now see some rather unoptimal code, e.g:
  ...
  brasl $lr, extFunc
  lr $3, $3
  lr $3, $3

That's odd, identity copies should get coalesced by RegAllocFast. Is isMoveInstr working correctly for these instructions? Is it setting DstReg and SrcReg correctly?

  ...
('lr rt, ra' moves ra->rt). I guess the miscompilations are due to the
same problem as this sort of stuff gets generated. I'm looking into it,
but do you have any more of those useful tips? :wink: The code I used is the
same test case as earlier in this chain. Compiled with:
  llc --march=cellspu --regalloc=fast test.ll -o -

Look into the identity copy issue first. It could be related.

Does it make a difference if you run llc with -O0?

/jakob