Possible bug in the ARM backend?

Hi,

I'm working on an iterated register coalescing graph coloring
allocator and am trying to test it with all backends currently available in
LLVM.
Initial tests with most of the backends are successful.

It turned out that my allocator triggers a specific assertion in the
RegScavenger and only for the ARM target. It looks like the LR
register is used for frame pointer related things,
but it is STILL available for register allocation according to the
ARMRegisterInfo.td:

def GPR : RegisterClass<"ARM", [i32], 32, [R0, R1, R2, R3, R4, R5, R6,
                                           R7, R8, R9, R10, R12, R11,
                                           LR, SP, PC]>

Let me now explain the problem step-by-step:

1) Here is the function's machine code before register allocation
(this code is produced by bugpoint from a bigger test-case):
    If you need the BC file, it is attached:

# Machine code for Insert():
Live Ins: R0 in VR#1025 R1 in VR#1026

entry: 0x8fdac90, LLVM BB @0x8fc2c48, ID#0:
Live Ins: %R0 %R1
        %reg1026<def,dead> = MOVr %R1<kill>, 14, %reg0, %reg0
        %reg1025<def> = MOVr %R0<kill>, 14, %reg0, %reg0
        %reg1024<def> = MOVr %reg1025, 14, %reg0, %reg0
        CMPri %reg1025<kill>, 0, 14, %reg0, %CPSR<imp-def>
        Bcc mbb<UnifiedReturnBlock,0x8fdad70>, 10, %CPSR<kill>
    Successors according to CFG: 0x8fdad00 (#1) 0x8fdad70 (#2)

bb368: 0x8fdad00, LLVM BB @0x8fc2c98, ID#1:
    Predecessors according to CFG: 0x8fdac90 (#0)
        %reg1027<def> = MOVi 0, 14, %reg0, %reg0
        STR %reg1024<kill>, %reg1027<kill>, %reg0, 0, 14, %reg0,
Mem:ST(4,4) [0x8fc2d68 + 0]
        BX_RET 14, %reg0

UnifiedReturnBlock: 0x8fdad70, LLVM BB @0x8fc2cc0, ID#2:
    Predecessors according to CFG: 0x8fdac90 (#0)
        BX_RET 14, %reg0

# End machine code for Insert().

2) My register allocator produces the following allocation:

********** REGISTER MAP **********
[reg1024 -> LR]
[reg1025 -> R0]
[reg1026 -> R1]
[reg1027 -> R0]

The interesting bit is that this allocation:
- is different from the linearscan result
- assigns LR to reg1024, even though LR is not the
first register in the allocation order for the GPR register class.
  Although it ignores the preferred allocation order, this is not a
bug and is quite legal.

  BTW, I obtain the set of allocatable registers using the following
code at the beginning of the runOnMachineFunction() of my register
allocator.
  Is anything wrong with it?

  mri = tm->getRegisterInfo();

  // Prepare regClass2AllowedSet for each register class
  // This should be done on a per function basis, because
  // some registers may get included/excluded on a per
  // function basis (e.g. frame pointer on X86)

  regClass2AllowedSet.clear();
  regClass2AllowedSet.resize(mri->getNumRegClasses() + 1);

  for (TargetRegisterInfo::regclass_iterator
       RCI = mri->regclass_begin(), RCE = mri->regclass_end();
       RCI != RCE; ++RCI) {

    int regClassId = (*RCI)->getID();
    regClass2AllowedSet[regClassId].resize(mri->getNumRegs() + 1);

    for (TargetRegisterClass::iterator I = (*RCI)->allocation_order_begin(*mf),
                                       E = (*RCI)->allocation_order_end(*mf);
         I != E; ++I)
      regClass2AllowedSet[regClassId].set(*I);
  }

3) This register allocation results in the following machine code,
after replacement of virtual regs by the assigned physical regs:

**** Post Machine Instrs ****
# Machine code for Insert():
Live Ins: R0 in VR#1025 R1 in VR#1026

entry: 0x8fdac90, LLVM BB @0x8fc2c48, ID#0:
Live Ins: %R0 %R1
        %LR<def> = MOVr %R0, 14, %reg0, %reg0
        CMPri %R0<kill>, 0, 14, %reg0, %CPSR<imp-def>
        Bcc mbb<UnifiedReturnBlock,0x8fdad70>, 10, %CPSR<kill>
    Successors according to CFG: 0x8fdad00 (#1) 0x8fdad70 (#2)

bb368: 0x8fdad00, LLVM BB @0x8fc2c98, ID#1:
    Predecessors according to CFG: 0x8fdac90 (#0)
        %R0<def> = MOVi 0, 14, %reg0, %reg0
        STR %LR<kill>, %R0<kill>, %reg0, 0, 14, %reg0, Mem:ST(4,4)
[0x8fc2d68 + 0]
        BX_RET 14, %reg0

UnifiedReturnBlock: 0x8fdad70, LLVM BB @0x8fc2cc0, ID#2:
    Predecessors according to CFG: 0x8fdac90 (#0)
        BX_RET 14, %reg0

# End machine code for Insert().

4) Then I get the following assertion:
     llc: /opt/llvm/lib/CodeGen/RegisterScavenging.cpp:223: void
llvm::RegScavenger::forward(): Assertion `isUsed(Reg) && "Using an
undefined register!"' failed.

    It is triggered by PrologEpilogInserter::replaceFrameIndices()
function. The undefined register is the LR register.

    If I dump the function at this point I see the following (the
instruction triggering the assertion is marked by ***):

# Machine code for Insert():
  <fi#0>: size is 4 bytes, alignment is 4 bytes, at location [SP-4]
Live Ins: R0 in VR#1025 R1 in VR#1026

entry: 0x8fdac90, LLVM BB @0x8fc2c48, ID#0:
Live Ins: %R0 %R1 %LR
        %SP<def> = SUBri %SP<kill>, 4, 14, %reg0, %reg0
        STR %LR<kill>, %SP, %reg0, 0, 14, %reg0
        %LR<def> = MOVr %R0, 14, %reg0, %reg0
        CMPri %R0<kill>, 0, 14, %reg0, %CPSR<imp-def>
        Bcc mbb<UnifiedReturnBlock,0x8fdad70>, 10, %CPSR<kill>
    Successors according to CFG: 0x8fdad00 (#1) 0x8fdad70 (#2)

bb368: 0x8fdad00, LLVM BB @0x8fc2c98, ID#1:
    Predecessors according to CFG: 0x8fdac90 (#0)
        %R0<def> = MOVi 0, 14, %reg0, %reg0
*** STR %LR<kill>, %R0<kill>, %reg0, 0, 14, %reg0, Mem:ST(4,4)
[0x8fc2d68 + 0]
        %LR<def> = LDR <fi#0>, %reg0, 0, 14, %reg0
        %SP<def> = ADDri %SP<kill>, 4, 14, %reg0, %reg0
        BX_RET 14, %reg0

UnifiedReturnBlock: 0x8fdad70, LLVM BB @0x8fc2cc0, ID#2:
    Predecessors according to CFG: 0x8fdac90 (#0)
        %LR<def> = LDR <fi#0>, %reg0, 0, 14, %reg0
        %SP<def> = ADDri %SP<kill>, 4, 14, %reg0, %reg0
        BX_RET 14, %reg0

# End machine code for Insert().

  As you can see, PrologEpilogInserter has inserted at the beginning
of the function some code for manipulation of the frame pointer and
this inserted code uses the LR register.
  As far as I understand, ARMRegisterInfo.td should exclude the LR
register from the set of allocatable registers for functions that
require frame pointer manipulation.
  But currently it is not the case, or?

  I hope that I provided enough information to explain my problem. I
also provided my initial analysis, but maybe I'm wrong.

  Can someone more knowledgeable in ARM backend and LLVM's register
allocation framework have a look at it?
  If it is a bug in the ARM backend, could it be fixed?

  Thanks,
   Roman

bugpoint-reduced-simplified.bc (536 Bytes)

As you can see, PrologEpilogInserter has inserted at the beginning
of the function some code for manipulation of the frame pointer and
this inserted code uses the LR register.
As far as I understand, ARMRegisterInfo.td should exclude the LR
register from the set of allocatable registers for functions that
require frame pointer manipulation.
But currently it is not the case, or?

No, LR is not the frame pointer. It’s the link register (caller address). It should be available as a general purpose register. The bug is elsewhere. It has to do with kill / dead markers.

%LR = LDR <fi#0>, %reg0, 0, 14, %reg0
%SP = ADDri %SP, 4, 14, %reg0, %reg0
BX_RET 14, %reg0

LR is restored here but it’s not killed before the end of the block is reached. Should BX_RET use it?

Evan

Hi Evan,

Thanks for your feedback!

As you can see, PrologEpilogInserter has inserted at the beginning
of the function some code for manipulation of the frame pointer and
this inserted code uses the LR register.
As far as I understand, ARMRegisterInfo.td should exclude the LR
register from the set of allocatable registers for functions that
require frame pointer manipulation.
But currently it is not the case, or?

No, LR is not the frame pointer. It's the link register (caller address). It
should be available as a general purpose register.

OK.

The bug is elsewhere. It has to do with kill / dead markers.
       %LR<def> = LDR <fi#0>, %reg0, 0, 14, %reg0
       %SP<def> = ADDri %SP<kill>, 4, 14, %reg0, %reg0
       BX_RET 14, %reg0
LR is restored here but it's not killed before the end of the block is
reached.

Hmm. I have no idea what the ARM backend does. My register allocator
just assigns the registers as I explained in my original mail.
Then it lets VirtRegMap.cpp do its job, i.e. it lets it rewrite the
code and replace virtual registers by the assigned physical registers.
You can see the result in step (3) of my original mail. In my
opinion, it still looks correct. Maybe this rewriting process does
something wrong?

Then PrologEpilogInserter and some other standard post RA passes are
invoked for the ARM backend. But I have not changed anything there, so
I have no idea what happens.

And, BTW, the instructions you mentioned above are after the
instruction triggering the assertion, which is:
STR %LR<kill>, %R0<kill>, %reg0, 0, 14, %reg0, Mem:ST(4,4) [0x8fc2d68 + 0]

Should BX_RET use it?

I don't know the semantics of BX_RET on the ARM platform. Maybe it
uses LR somehow.

BTW, an idea: Maybe it is easy to trigger exactly the same behaviour
with the linear scan if one does the following:
  - comment out dependency on the coalescer, so that it is not invoked
  - change the allocation order of the GPR register class for ARM, so
that it starts with the LR register.

This looks like a bug in ARMInstrInfo.td:

BX_RET should be marked with Uses = [LR] since it uses LR. However, this won't work if there is a call BL before the BX_RET. BL is marked as if it implicitly defines LR. So we'll end up with this (hello world example):

Live Ins: %LR %R7
         %SP<def> = SUBri %SP<kill>, 8, 14, %reg0, %reg0
         STR %LR<kill>, %SP, %reg0, 4, 14, %reg0
         STR %R7<kill>, %SP, %reg0, 0, 14, %reg0
         %R7<def> = MOVr %SP, 14, %reg0, %reg0
         %R0<def> = LDR <cp#0>, %reg0, 0, 14, %reg0, Mem:LD(4,4) [<unknown> + 0]
         BL <ga:puts>, %R0<kill>, %R0<imp-def,dead>, %R1<imp-def,dead>, %R2<imp-def,dead>, %R3<imp-def,dead>, %R12<imp-def,dead>, %LR<imp-def>, %D0<imp-def,dead>, %D1<imp-def,dead>, %D2<imp-def,dead>, %D3<imp-def,dead>, %D4<imp-def,dead>, %D5<imp-def,dead>, %D6<imp-def,dead>, %D7<imp-def,dead>, %CPSR<imp-def,dead>
         %R0<def> = MOVi 0, 14, %reg0, %reg0
         %R7<def> = LDR %SP, %reg0, 0, 14, %reg0
         %LR<def> = LDR %SP, %reg0, 4, 14, %reg0
         %SP<def> = ADDri %SP<kill>, 8, 14, %reg0, %reg0
         BX_RET 14, %reg0, %R0<imp-use,kill>, %LR<imp-use,kill>

The LR defined by BL is not killed before the PEI inserted LR restore. The register scavenger doesn't like this.

The issue is that while BL does modify LR, it doesn't actually define LR in a way that later instructions can use. I'll think about how to fix this. It's not obvious to me at this point.

Evan

This looks like a bug in ARMInstrInfo.td:

BX_RET should be marked with Uses = [LR] since it uses LR. However,
this won't work if there is a call BL before the BX_RET. BL is marked
as if it implicitly defines LR. So we'll end up with this (hello world
example):

PPC has the call (BL) marked with Defs=LR and the return (BLR)
marked with Uses=LR, and works AFAIK. Let me figure out what's different...

OK, it's got special MFLR/MTLR instructions to do the save/restore of LR (actually
they go through a GR first) which are marked as use/def of LR. You could simulate
something like that on ARM I guess.

Ok, ignore my earlier email about BX_RET. The issue is LR should be added to livein of BB #1.

**** Post Machine Instrs ****
# Machine code for Insert():
Live Ins: R0 in VR#1025 R1 in VR#1026

entry: 0x8fdac90, LLVM BB @0x8fc2c48, ID#0:
Live Ins: %R0 %R1
        %LR<def> = MOVr %R0, 14, %reg0, %reg0
        CMPri %R0<kill>, 0, 14, %reg0, %CPSR<imp-def>
        Bcc mbb<UnifiedReturnBlock,0x8fdad70>, 10, %CPSR<kill>
    Successors according to CFG: 0x8fdad00 (#1) 0x8fdad70 (#2)

bb368: 0x8fdad00, LLVM BB @0x8fc2c98, ID#1:
    Predecessors according to CFG: 0x8fdac90 (#0)
        %R0<def> = MOVi 0, 14, %reg0, %reg0
        STR %LR<kill>, %R0<kill>, %reg0, 0, 14, %reg0, Mem:ST(4,4)
[0x8fc2d68 + 0]
        BX_RET 14, %reg0

Here the STR is using LR, but there isn't a def earlier.
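
The block-local view behind that remark can be illustrated with a small toy model. This is only a sketch under stated assumptions, not the actual RegScavenger code: the `Instr` and `Block` types and the `firstUndefinedUse` and `makeBB368` helpers are hypothetical names made up for illustration, and kill flags are not modeled. It assumes the checker starts each block knowing only that block's live-ins, which is why a def in MBB#0 does not help bb368 unless LR is listed as live-in there.

```cpp
#include <cassert>
#include <set>
#include <string>
#include <vector>

// Toy model only: these types are hypothetical and are not LLVM classes.
struct Instr {
  std::vector<std::string> uses;  // registers read by the instruction
  std::vector<std::string> defs;  // registers written by the instruction
};

struct Block {
  std::set<std::string> liveIns;  // physical registers live on entry
  std::vector<Instr> instrs;
};

// Walks one block forward, starting from its live-ins only.
// Returns the first register that is read without being known to hold
// a value, or an empty string if every use is covered.
std::string firstUndefinedUse(const Block &bb) {
  std::set<std::string> used = bb.liveIns;
  for (const Instr &mi : bb.instrs) {
    for (const std::string &r : mi.uses)
      if (used.count(r) == 0)
        return r;
    for (const std::string &r : mi.defs)
      used.insert(r);
  }
  return "";
}

// Builds a simplified bb368 from the thread: a MOVi defining R0,
// followed by the STR that reads LR and R0.
Block makeBB368(bool withLRLiveIn) {
  Block bb;
  if (withLRLiveIn)
    bb.liveIns.insert("LR");

  Instr movi;                 // %R0<def> = MOVi 0
  movi.defs.push_back("R0");
  bb.instrs.push_back(movi);

  Instr str;                  // STR %LR<kill>, %R0<kill>
  str.uses.push_back("LR");
  str.uses.push_back("R0");
  bb.instrs.push_back(str);
  return bb;
}
```

In this model, `firstUndefinedUse(makeBB368(false))` reports LR at the STR, while `makeBB368(true)` passes the check.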

Evan

bb368: 0x8fdad00, LLVM BB @0x8fc2c98, ID#1:
  Predecessors according to CFG: 0x8fdac90 (#0)
      %R0<def> = MOVi 0, 14, %reg0, %reg0
*** STR %LR<kill>, %R0<kill>, %reg0, 0, 14, %reg0, Mem:ST(4,4)
[0x8fc2d68 + 0]
      %LR<def> = LDR <fi#0>, %reg0, 0, 14, %reg0
      %SP<def> = ADDri %SP<kill>, 4, 14, %reg0, %reg0
      BX_RET 14, %reg0

Ok, ignore my earlier email about BX_RET. The issue is LR should be added to
livein of BB #1.

Who should do it?
Do you mean that ARM backend/LiveIntervalsAnalysis/LiveVariables
should do it or do you mean that my regalloc should do it?

**** Post Machine Instrs ****
# Machine code for Insert():
Live Ins: R0 in VR#1025 R1 in VR#1026

entry: 0x8fdac90, LLVM BB @0x8fc2c48, ID#0:
Live Ins: %R0 %R1
      %LR<def> = MOVr %R0, 14, %reg0, %reg0
      CMPri %R0<kill>, 0, 14, %reg0, %CPSR<imp-def>
      Bcc mbb<UnifiedReturnBlock,0x8fdad70>, 10, %CPSR<kill>
  Successors according to CFG: 0x8fdad00 (#1) 0x8fdad70 (#2)

bb368: 0x8fdad00, LLVM BB @0x8fc2c98, ID#1:
  Predecessors according to CFG: 0x8fdac90 (#0)
      %R0<def> = MOVi 0, 14, %reg0, %reg0
      STR %LR<kill>, %R0<kill>, %reg0, 0, 14, %reg0, Mem:ST(4,4)
[0x8fc2d68 + 0]
      BX_RET 14, %reg0

Here the STR is using LR, but there isn't a def earlier.

Maybe I'm overlooking something, but doesn't
%LR<def> = MOVr %R0, 14, %reg0, %reg0
in MBB#0 define the LR? That should be enough, shouldn't it?

-Roman

bb368: 0x8fdad00, LLVM BB @0x8fc2c98, ID#1:
Predecessors according to CFG: 0x8fdac90 (#0)
     %R0<def> = MOVi 0, 14, %reg0, %reg0
*** STR %LR<kill>, %R0<kill>, %reg0, 0, 14, %reg0, Mem:ST(4,4)
[0x8fc2d68 + 0]
     %LR<def> = LDR <fi#0>, %reg0, 0, 14, %reg0
     %SP<def> = ADDri %SP<kill>, 4, 14, %reg0, %reg0
     BX_RET 14, %reg0

Ok, ignore my earlier email about BX_RET. The issue is LR should be added to
livein of BB #1.

Who should do it?
Do you mean that ARM backend/LiveIntervalsAnalysis/LiveVariables
should do it or do you mean that my regalloc should do it?

Register allocator should update mbb Livein info.

**** Post Machine Instrs ****
# Machine code for Insert():
Live Ins: R0 in VR#1025 R1 in VR#1026

entry: 0x8fdac90, LLVM BB @0x8fc2c48, ID#0:
Live Ins: %R0 %R1
     %LR<def> = MOVr %R0, 14, %reg0, %reg0
     CMPri %R0<kill>, 0, 14, %reg0, %CPSR<imp-def>
     Bcc mbb<UnifiedReturnBlock,0x8fdad70>, 10, %CPSR<kill>
Successors according to CFG: 0x8fdad00 (#1) 0x8fdad70 (#2)

bb368: 0x8fdad00, LLVM BB @0x8fc2c98, ID#1:
Predecessors according to CFG: 0x8fdac90 (#0)
     %R0<def> = MOVi 0, 14, %reg0, %reg0
     STR %LR<kill>, %R0<kill>, %reg0, 0, 14, %reg0, Mem:ST(4,4)
[0x8fc2d68 + 0]
     BX_RET 14, %reg0

Here the STR is using LR, but there isn't a def earlier.

Maybe I'm overlooking something, but doesn't
%LR<def> = MOVr %R0, 14, %reg0, %reg0
in MBB#0 define the LR? That should be enough, shouldn't it?

Every machine basic block must list physical register livein's.

Evan

Hi again,

bb368: 0x8fdad00, LLVM BB @0x8fc2c98, ID#1:
Predecessors according to CFG: 0x8fdac90 (#0)
    %R0<def> = MOVi 0, 14, %reg0, %reg0
*** STR %LR<kill>, %R0<kill>, %reg0, 0, 14, %reg0, Mem:ST(4,4)
[0x8fc2d68 + 0]
    %LR<def> = LDR <fi#0>, %reg0, 0, 14, %reg0
    %SP<def> = ADDri %SP<kill>, 4, 14, %reg0, %reg0
    BX_RET 14, %reg0

Ok, ignore my earlier email about BX_RET. The issue is LR should be added
to
livein of BB #1.

Who should do it?
Do you mean that ARM backend/LiveIntervalsAnalysis/LiveVariables
should do it or do you mean that my regalloc should do it?

Register allocator should update mbb Livein info.

OK.

**** Post Machine Instrs ****
# Machine code for Insert():
Live Ins: R0 in VR#1025 R1 in VR#1026

entry: 0x8fdac90, LLVM BB @0x8fc2c48, ID#0:
Live Ins: %R0 %R1
    %LR<def> = MOVr %R0, 14, %reg0, %reg0
    CMPri %R0<kill>, 0, 14, %reg0, %CPSR<imp-def>
    Bcc mbb<UnifiedReturnBlock,0x8fdad70>, 10, %CPSR<kill>
Successors according to CFG: 0x8fdad00 (#1) 0x8fdad70 (#2)

bb368: 0x8fdad00, LLVM BB @0x8fc2c98, ID#1:
Predecessors according to CFG: 0x8fdac90 (#0)
    %R0<def> = MOVi 0, 14, %reg0, %reg0
    STR %LR<kill>, %R0<kill>, %reg0, 0, 14, %reg0, Mem:ST(4,4)
[0x8fc2d68 + 0]
    BX_RET 14, %reg0

Here the STR is using LR, but there isn't a def earlier.

Maybe I'm overlooking something, but doesn't
%LR<def> = MOVr %R0, 14, %reg0, %reg0
in MBB#0 define the LR? That should be enough, shouldn't it?

Every machine basic block must list physical register livein's.

One question, just to be sure I understand you correctly. You mean
that after the RegAlloc has assigned physical registers to
LiveIntervals and before it calls the VRM to rewrite the function, it
should explicitly add live-ins for each MBB, just like LinearScan
does at the end of the RALinScan::linearScan() function?

E.g. like this:

  // Add live-ins to every BB except for entry. Also perform trivial coalescing.
  MachineFunction::iterator EntryMBB = mf_->begin();
  SmallVector<MachineBasicBlock*, 8> LiveInMBBs;
  for (LiveIntervals::iterator i = li_->begin(), e = li_->end(); i != e; ++i) {
    LiveInterval &cur = *i->second;
    unsigned Reg = 0;
    bool isPhys = TargetRegisterInfo::isPhysicalRegister(cur.reg);
    if (isPhys)
      Reg = cur.reg;
    else if (vrm_->isAssignedReg(cur.reg))
      Reg = attemptTrivialCoalescing(cur, vrm_->getPhys(cur.reg));
    if (!Reg)
      continue;
    // Ignore splited live intervals.
    if (!isPhys && vrm_->getPreSplitReg(cur.reg))
      continue;
    for (LiveInterval::Ranges::const_iterator I = cur.begin(), E = cur.end();
         I != E; ++I) {
      const LiveRange &LR = *I;
      if (li_->findLiveInMBBs(LR.start, LR.end, LiveInMBBs)) {
        for (unsigned i = 0, e = LiveInMBBs.size(); i != e; ++i)
          if (LiveInMBBs[i] != EntryMBB)
            LiveInMBBs[i]->addLiveIn(Reg);
        LiveInMBBs.clear();
      }
    }
  }

If that is the case, it is OK. It was not clear to me that one has to
do it in the regalloc. My assumption was that VRM would do it.

Thanks a lot,
  Roman

Hi Evan,

OK, I got it. Your advice about LiveIn regs saved my day! Now ARM
failures are in the past.

Thanks a lot!
-Roman

One question, just to be sure I understand you correctly. You mean
that after the RegAlloc has assigned physical registers to
LiveIntervals and before it calls the VRM to rewrite the function, it
should explicitly add live-ins for each MBB, just like LinearScan
does at the end of the RALinScan::linearScan() function?

E.g. like this:

// Add live-ins to every BB except for entry. Also perform trivial coalescing.
MachineFunction::iterator EntryMBB = mf_->begin();
SmallVector<MachineBasicBlock*, 8> LiveInMBBs;
for (LiveIntervals::iterator i = li_->begin(), e = li_->end(); i != e; ++i) {
   LiveInterval &cur = *i->second;
   unsigned Reg = 0;
   bool isPhys = TargetRegisterInfo::isPhysicalRegister(cur.reg);
   if (isPhys)
     Reg = cur.reg;
   else if (vrm_->isAssignedReg(cur.reg))
     Reg = attemptTrivialCoalescing(cur, vrm_->getPhys(cur.reg));
   if (!Reg)
     continue;
   // Ignore splited live intervals.
   if (!isPhys && vrm_->getPreSplitReg(cur.reg))
     continue;
   for (LiveInterval::Ranges::const_iterator I = cur.begin(), E = cur.end();
        I != E; ++I) {
     const LiveRange &LR = *I;
     if (li_->findLiveInMBBs(LR.start, LR.end, LiveInMBBs)) {
       for (unsigned i = 0, e = LiveInMBBs.size(); i != e; ++i)
         if (LiveInMBBs[i] != EntryMBB)
           LiveInMBBs[i]->addLiveIn(Reg);
       LiveInMBBs.clear();
     }
   }
}

Only the last part.
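
Reading "the last part" as the inner loop that calls addLiveIn (without the trivial coalescing), the live-in step on its own can be sketched in a self-contained toy form. Everything here is an assumption made for illustration: `ToyBlock`, `ToyInterval`, `Range`, `addLiveIns` and the two `threadExample...` helpers are hypothetical names, not LLVM APIs, and the instruction indices are invented. A block is treated as live-in for a register when the block's first instruction index falls strictly inside one of the register's live ranges.

```cpp
#include <cassert>
#include <cstddef>
#include <set>
#include <string>
#include <vector>

// Toy model only: none of these types exist in LLVM.
struct Range {
  unsigned start;  // index of the defining instruction
  unsigned end;    // one past the index of the last use (half-open range)
};

struct ToyBlock {
  unsigned firstIndex;            // index of the block's first instruction
  std::set<std::string> liveIns;  // physical registers live on entry
};

struct ToyInterval {
  std::string physReg;        // physical register assigned by the allocator
  std::vector<Range> ranges;  // where the value is live
};

// Adds physReg as a live-in to every non-entry block whose start lies
// strictly inside one of the interval's ranges. blocks[0] is the entry
// block and is skipped, as in the quoted LinearScan code.
void addLiveIns(std::vector<ToyBlock> &blocks,
                const std::vector<ToyInterval> &intervals) {
  for (const ToyInterval &li : intervals)
    for (const Range &r : li.ranges)
      for (std::size_t b = 1; b < blocks.size(); ++b)
        if (r.start < blocks[b].firstIndex && blocks[b].firstIndex < r.end)
          blocks[b].liveIns.insert(li.physReg);
}

// Invented layout loosely following the thread: entry at index 0,
// bb368 at index 10, UnifiedReturnBlock at index 20.
std::vector<ToyBlock> threadExampleBlocks() {
  std::vector<ToyBlock> blocks(3);
  blocks[0].firstIndex = 0;
  blocks[1].firstIndex = 10;
  blocks[2].firstIndex = 20;
  return blocks;
}

// reg1024 -> LR: defined in the entry block, last used by the STR in bb368.
std::vector<ToyInterval> threadExampleIntervals() {
  ToyInterval lr;
  lr.physReg = "LR";
  Range r = {4, 14};
  lr.ranges.push_back(r);
  return std::vector<ToyInterval>(1, lr);
}
```

With these invented indices, calling `addLiveIns(blocks, threadExampleIntervals())` on the blocks from `threadExampleBlocks()` marks LR as live-in for bb368 only, which is the information the scavenger was missing.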

If that is the case, it is OK. It was not clear to me that one has to
do it in the regalloc. My assumption was that VRM would do it.

One day I'd like to factor this code out into an analysis pass. Unfortunately, it's done by the allocators for now. It might be OK to move it to VRM. If you want to make that change, I am OK with it.

Evan