Mapping bytecode to X86

Dear guys,

    I am in need of more of your help. I'm implementing a register
allocator, and I am having problems to make it produce correct code.
Consider this program here:

int main(int argc, char ** argv) {
    int i, j, sum;
    i = argv[0][0];
    j = argv[0][1];
    sum = (i + j) * j;
    printf("Sum = %d\n", sum);
}

that maps to this llvm bytecode:

entry (0xa785590, LLVM BB @0xa77ebf8):
        FNSTCW16m <fi#0>, 1, %NOREG, 0
        MOV8mi <fi#0>, 1, %NOREG, 1, 2
        FLDCW16m <fi#0>, 1, %NOREG, 0
        %reg1024 = MOV32rm <fi#-2>, 1, %NOREG, 0
        %reg1025 = MOV32rm %reg1024, 1, %NOREG, 0
        %reg1026 = MOVSX32rm8 %reg1025, 1, %NOREG, 0
        %reg1027 = MOVSX32rm8 %reg1025, 1, %NOREG, 1
        ADJCALLSTACKDOWN 8
        %reg1028 = ADD32rr %reg1026, %reg1027
        %reg1029 = IMUL32rr %reg1028, %reg1027
        MOV32mr %ESP, 1, %NOREG, 4, %reg1029
        MOV32mi %ESP, 1, %NOREG, 0, <ga:.str_1>
        CALLpcrel32 <ga:printf>
        ADJCALLSTACKUP 8, 0
        %reg1030 = MOV32rr %EAX
        %reg1031 = IMPLICIT_DEF_GR32
        %EAX = MOV32rr %reg1031
        RET

My allocator produces this mapping:

FNSTCW16m :=
MOV8mi :=
FLDCW16m :=
MOV32rm EAX :=
MOV32rm EAX := EAX
MOVSX32rm8 ECX := EAX
MOVSX32rm8 EAX := EAX
ADJCALLSTACKDOWN :=
ADD32rr ECX := ECX EAX
IMUL32rr EAX := ECX EAX
MOV32mr := ESP EAX
MOV32mi := ESP
CALLpcrel32 :=
ADJCALLSTACKUP :=
IMPLICIT_DEF_GR32 EAX :=
RET :=

This mapping looks correct, but the final code generated by llvm is not
correct. Check the assembly code below: the add instruction should be
addl %eax, %ecs

main:
        subl $12, %esp
        fnstcw 10(%esp)
        movb $2, 11(%esp)
        fldcw 10(%esp)
        movl 20(%esp), %eax
        movl (%eax), %eax
        movsbl (%eax), %ecx
        movsbl 1(%eax), %eax
        addl %ecx, %ecx
        imull %ecx, %eax
        movl %eax, 4(%esp)
        movl $.str_1, (%esp)
        call printf
        #IMPLICIT_DEF %eax
        addl $12, %esp
        ret
        .size main, .-main

On the other hand, if I use the TwoAddressInstructionPass, then llvm
produces this code here:

FNSTCW16m :=
MOV8mi :=
FLDCW16m :=
MOV32rm EAX :=
MOV32rm EAX := EAX
MOVSX32rm8 ECX := EAX
MOVSX32rm8 EAX := EAX
ADJCALLSTACKDOWN :=
ADD32rr ECX := EAX
IMUL32rr ECX := EAX
MOV32mr := ESP ECX
MOV32mi := ESP
CALLpcrel32 :=
ADJCALLSTACKUP :=
IMPLICIT_DEF_GR32 EAX :=
RET :=

which is correctly mapped by llvm to assembly code. Check the assembly
below:

main:
        subl $12, %esp
        fnstcw 10(%esp)
        movb $2, 11(%esp)
        fldcw 10(%esp)
        movl 20(%esp), %eax
        movl (%eax), %eax
        movsbl (%eax), %ecx
        movsbl 1(%eax), %eax
        addl %eax, %ecx
        imull %eax, %ecx
        movl %ecx, 4(%esp)
        movl $.str_1, (%esp)
        call printf
        #IMPLICIT_DEF %eax
        addl $12, %esp
        ret
        .size main, .-main

The problem is that, after the TwoAddressInstructionPass is used, the
code is no longer in SSA form, and my register allocator rely on
some SSA properties. I am using the Spiller in VirtRegMap.* to generate
the code, but the incorrect mapping still happens when I invoke the
setReg() method directly on machine operands. Any of you guys has some
hints to help me to produce correct mapping without using the Two
address pass?

Thanks a lot,

Fernando

You're missing one critical piece. Currently, the register allocator is in charge of removing one of the operands of a two address instruction. Whether you choose to use the two-address pass or not (your choice), you should make sure that the first two operands of the instructions are assigned to the same register, and then you delete one. For example, before RA you might have:

vreg1024 = ADDxxx vreg1025, 123

after, you should have:

ADDxxx EAX, 123

Where the "EAX" operand is marked as a def&use operand.

-Chris

Thank you Chris. I will try to implement the TwoAddress pass to run on
machine code. Why it has not been originally implemented to run on
machine code? Is there anything that makes it troublesome after RA
has been performed? Could you tell me if the transformations below
are correct?

1) a := b op c --> a := b --> a := b

2) a := a op c --> a := c

3) a := b op a --> a := a op b --> a := b (???)

What if the operation in (3) is non-commutative?

Thanks a lot,

Fernando

Thank you Chris. I will try to implement the TwoAddress pass to run on
machine code. Why it has not been originally implemented to run on
machine code?

I'm not sure what you mean. It definitely does run on machine code.

Is there anything that makes it troublesome after RA
has been performed?

Do you specifically mean removing the extra operand? One of my eventual goals is to eliminate this behavior of the register allocator. I'd much rather have it leave the instructions as:

EAX = OP EAX, EDX

instead of forcing it to:

OP EAX [D&U] EDX

Could you tell me if the transformations below
are correct?

1) a := b op c --> a := b --> a := b
                         a := a op c a := c

Yes.

2) a := a op c --> a := c

Yes

3) a := b op a --> a := a op b --> a := b (???)

No.

What if the operation in (3) is non-commutative?

2-address instructions are only two-addressy in their first two operands. You can't join the first and third operand.

Note that the 2-addr pass does all of the above for you, but it does destroy SSA.

-Chris

Thanks a lot,

Fernando

The problem is that, after the TwoAddressInstructionPass is used, the
code is no longer in SSA form, and my register allocator rely on
some SSA properties. I am using the Spiller in VirtRegMap.* to generate
the code, but the incorrect mapping still happens when I invoke the
setReg() method directly on machine operands. Any of you guys has some
hints to help me to produce correct mapping without using the Two
address pass?

You're missing one critical piece. Currently, the register allocator is
in charge of removing one of the operands of a two address instruction.
Whether you choose to use the two-address pass or not (your choice), you
should make sure that the first two operands of the instructions are
assigned to the same register, and then you delete one. For example,
before RA you might have:

vreg1024 = ADDxxx vreg1025, 123

after, you should have:

ADDxxx EAX, 123

Where the "EAX" operand is marked as a def&use operand.

-Chris

--
Chris Lattner's Homepage
http://llvm.org/
_______________________________________________
LLVM Developers mailing list
LLVMdev@cs.uiuc.edu http://llvm.cs.uiuc.edu
http://lists.cs.uiuc.edu/mailman/listinfo/llvmdev

_______________________________________________
LLVM Developers mailing list
LLVMdev@cs.uiuc.edu http://llvm.cs.uiuc.edu
http://lists.cs.uiuc.edu/mailman/listinfo/llvmdev

-Chris

> Thank you Chris. I will try to implement the TwoAddress pass to run on
> machine code. Why it has not been originally implemented to run on
> machine code?

I'm not sure what you mean. It definitely does run on machine code.

I was thinking that it only transformed instructions with virtual
registers because of this code in the TwoAddressInstructionPass.cpp:

        unsigned regA = mi->getOperand(0).getReg();
        unsigned regB = mi->getOperand(1).getReg();

        assert(MRegisterInfo::isVirtualRegister(regA) &&
               MRegisterInfo::isVirtualRegister(regB) &&
               "cannot update physical register live information");

By machine code I meant code with virtual registers, instead of machine
(physical) registers; not the passes on machine functions/machine basic
blocks.

Fernando

Right. Two address instructions can only use virtual registers, not physical registers.

-Chris

Dear llvmers,

    I am trying to insert a move instruction where both source and
destination registers are physical registers. How is the code for this?
I tried this one here:

void PhiDeconstruction_Fer::add_move (
                                       MachineFunction & mf,
                                       MachineBasicBlock & mbb,
                                       unsigned src,
                                       unsigned dst
                                     )
{
    MachineBasicBlock::iterator iter = mbb.getFirstTerminator();

    const TargetRegisterClass *rc = mf.getSSARegMap()->getRegClass(dst);

    const MRegisterInfo * reg_info = mf.getTarget().getRegisterInfo();

    reg_info->copyRegToReg(mbb, iter, dst, src, rc);
}

But the getRegClass method seems to expect a virtual register. Could
someone fix this code for me? I could not find an example in the source of
LLVM.

Thank you very much,

Fernando

You can't do it with this information. In some higher context you should have information about what register class the physreg is to be interpreted as. Physregs can be in multiple register classes.

All of the register allocators call this method, so there are plenty of examples.

-Chris

Thank you, chris. But I still do not understand how to insert this move
instruction :slight_smile: I have the machine function, the basic block, and the
unsigned descriptors of the source and destiny register. With this
information is possible to insert a move, considering that the source
and destiny are physical registers? The example that I could find, in
the phi-elimination pass, expects virtual registers. But I am eliminating
phi functions after register allocation has been performed. I would like
to know if there is a simple way to discover the class of a physical
register given MachineFunction and MachineBasicBlock. Could you point me
an example in the code of one of the register allocators?

Thanks a lot,

Fernando

Hi, again,

    I think I got around this problem of discovering the class of a
physical register. I am using this code here:

void PhiDeconstruction_Fer::add_move
                        (MachineBasicBlock & mbb, unsigned src, unsigned
dst) {
    MachineBasicBlock::iterator iter = mbb.getFirstTerminator();
    const MRegisterInfo * reg_info =
                       this->machine_function->getTarget().getRegisterInfo();

    // TODO: verify if does not causes incorrect allocation:
    for(MRegisterInfo::regclass_iterator rcii = reg_info->regclass_begin(),
                       rcie = reg_info->regclass_end(); rcii != rcie; ++rcii) {
        if( (*rcii)->contains(dst) ) {
            rc = * rcii;
        }
    }

    reg_info->copyRegToReg(mbb, iter, dst, src, rc);
}

Fernando

Thank you, chris. But I still do not understand how to insert this move
instruction :slight_smile:

You call copyRegToReg, like you are already doing. What you really aren't understanding is how to pick a regclass, which is a different issue.

I have the machine function, the basic block, and the
unsigned descriptors of the source and destiny register. With this
information is possible to insert a move, considering that the source
and destiny are physical registers?

No, it isn't. What you're basically saying here is that, in the process of your register allocator, you have discarded information about what register classes the physical registers used to be in. If you retained this information, you would have no problem. I suggest you modify your allocator to retain this information, just like the other extant allocators do.

   I think I got around this problem of discovering the class of a
physical register. I am using this code here:

As I mentioned in my previous email, this won't work because physregs can be in multiple reg classes. For example, if you are trying to copy between R1 and R2, it's quite possible that R1 is in RC1 and RC2 and that R2 is in RC2 only. "Picking" RC1 won't work. The safe thing to do would be to pick something out of the intersection of the register classes the registers are in, but computing this on the fly would be horribly inefficient and building big tables of these would take a bunch of space.

You'd be far better off by not throwing away the information you need earlier.

-Chris

void PhiDeconstruction_Fer::add_move
                       (MachineBasicBlock & mbb, unsigned src, unsigned
dst) {
   MachineBasicBlock::iterator iter = mbb.getFirstTerminator();
   const MRegisterInfo * reg_info =
                      this->machine_function->getTarget().getRegisterInfo();

   // TODO: verify if does not causes incorrect allocation:
   for(MRegisterInfo::regclass_iterator rcii = reg_info->regclass_begin(),
                      rcie = reg_info->regclass_end(); rcii != rcie; ++rcii) {
       if( (*rcii)->contains(dst) ) {
           rc = * rcii;
       }
   }

   reg_info->copyRegToReg(mbb, iter, dst, src, rc);
}

Fernando

You can't do it with this information. In some higher context you should
have information about what register class the physreg is to be
interpreted as. Physregs can be in multiple register classes.

All of the register allocators call this method, so there are plenty of
examples.

-Chris

Thank you, chris. But I still do not understand how to insert this move
instruction :slight_smile: I have the machine function, the basic block, and the
unsigned descriptors of the source and destiny register. With this
information is possible to insert a move, considering that the source
and destiny are physical registers? The example that I could find, in
the phi-elimination pass, expects virtual registers. But I am eliminating
phi functions after register allocation has been performed. I would like
to know if there is a simple way to discover the class of a physical
register given MachineFunction and MachineBasicBlock. Could you point me
an example in the code of one of the register allocators?

Thanks a lot,

Fernando
_______________________________________________
LLVM Developers mailing list
LLVMdev@cs.uiuc.edu http://llvm.cs.uiuc.edu
http://lists.cs.uiuc.edu/mailman/listinfo/llvmdev

_______________________________________________
LLVM Developers mailing list
LLVMdev@cs.uiuc.edu http://llvm.cs.uiuc.edu
http://lists.cs.uiuc.edu/mailman/listinfo/llvmdev

-Chris

Dear guys,

    when the phi-elimination pass is called (after local
register allocation, for example), it seems to me that the control flow
graph may have critical edges. Is it true?

If it is true, I would like to know more about the
algorithm that you guys use to destroy phi-functions. Is there any
reference? Why did not implement a machine function pass to
remove critical edges? I mean, I would like to implement something
like this. Is there any problem that would make it hard?

Thank you very much,

Fernando

   when the phi-elimination pass is called (after local
register allocation, for example), it seems to me that the control flow
graph may have critical edges. Is it true?

Yes.

If it is true, I would like to know more about the
algorithm that you guys use to destroy phi-functions. Is there any
reference? Why did not implement a machine function pass to

We use a trivial PHI elimination algorithm. The code is extensively commented in CodeGen/PHIElimination.cpp, but basically it emits one copy in the def block and one copy for each input in the pred blocks.

remove critical edges? I mean, I would like to implement something
like this. Is there any problem that would make it hard?

This wouldn't be hard to implement, patches welcome!

-Chris

Dear guys,

    I've adapted the pass in BreakCriticalEdges.cpp so I can use it
before register allocation. But the pass is not changing the control
flow graph of the machine function. I think it is because I am inserting
the pass after the control flow graph of the machine function is built.
I am inserting the pass as required by the register allocator, and I
can see that the pass is splitting critical edges. Question is: where
do I insert this pass so its modifications reflects in the control
flow graph of the machine function? The code of the pass is the code
of BreakCriticalEdges.cpp, and it is a runOnFunction pass, not a
runOnMachineFunction.

Thank you very much,

Fernando

Hi,

    I am able to remove the critical edges now. I only had to insert the
line below in PPCTargetmachine.cpp.

  PM.add(createBreakCriticalEdgesPass());

However, it does not remove all the critical edges. I am getting a very
weird dataflow graph (even without the Break Critical edges pass). The
dataflow generated by MachineFunction::dump() for the program below is
given here:
http://compilers.cs.ucla.edu/fernando/projects/soc/images/loop_no_crit2.pdf

int main(int argc, char **argv) {
    int result = 0;
    int counter = argc;
    while(counter > 0) {
        if(counter % 3 == 0) {
            result--;
        } else {
            result++;
        }
        counter = counter - 1;
    }
    printf("Result = %d\n", result);
}

The problem is the no_exit block. I think it was changed by one of the
optimization passes, and was split into three basic blocks. But now there
is a phi function where both the parameters are defined in the same basic
block. Any of you guys now which pass I should cut off if I want to avoid
this optimization?

Thanks,

Fernando

   I've adapted the pass in BreakCriticalEdges.cpp so I can use it
before register allocation. But the pass is not changing the control

BreakCriticalEdges modifies the CFG of the LLVM IR, so you'd have to run it before any machine code is generated: i.e. before instruction selection.

-Chris

flow graph of the machine function. I think it is because I am inserting
the pass after the control flow graph of the machine function is built.
I am inserting the pass as required by the register allocator, and I
can see that the pass is splitting critical edges. Question is: where
do I insert this pass so its modifications reflects in the control
flow graph of the machine function? The code of the pass is the code
of BreakCriticalEdges.cpp, and it is a runOnFunction pass, not a
runOnMachineFunction.

Thank you very much,

Fernando
_______________________________________________
LLVM Developers mailing list
LLVMdev@cs.uiuc.edu http://llvm.cs.uiuc.edu
http://lists.cs.uiuc.edu/mailman/listinfo/llvmdev

-Chris

However, it does not remove all the critical edges. I am getting a very
weird dataflow graph (even without the Break Critical edges pass). The
dataflow generated by MachineFunction::dump() for the program below is
given here:
http://compilers.cs.ucla.edu/fernando/projects/soc/images/loop_no_crit2.pdf

...

The problem is the no_exit block. I think it was changed by one of the
optimization passes, and was split into three basic blocks. But now there
is a phi function where both the parameters are defined in the same basic
block. Any of you guys now which pass I should cut off if I want to avoid
this optimization?

This is due to instruction selection. The llvm code turns your testcase into something like this:

   X += cond ? 1 : -1;

This is a 'select' instruction in LLVM.

Generic PowerPC chips doesn't support integer select instructions, so it must be lowered to a conditional branch. This lowering is what you're seeing, there is no way to disable it.

If you don't want critical edges in the machine code CFG, you're going to have to write a machine code CFG critical edge splitting pass: LLVM doesn't currently have one.

-Chris

If you don't want critical edges in the machine code CFG, you're going to
have to write a machine code CFG critical edge splitting pass: LLVM
doesn't currently have one.

-Chris

Hey guys,

    I've coded a pass to break the critical edges of the machine
control flow graph. The program works fine, but I am sure it is not
the right way of implementing it. There are two problems:

error for sure: I am not inserting a branch instruction at the
end of the newly created block, nor updating the branch in the
end of the old source block. How do I do this?

error almost for sure: I cannot create a new BasicBlock and
add it to the function, because it is a MachineFunctionPass. But it
seems to work in the way that I am doing: I get the basic block of
the origin of the critical edge, and use it to create the new
MachineBasicBlock. I know this seems wrong, but how to get around this?
But, before running my pass, I am using LLVM function pass to break
critical edges; thus, I only have to deal with critical edges inserted
during the instruction selection phase, and both, origin and destiny of
these critical edges contain the same BasicBlock.

The code is small. Could someone read it, and tell me if it is likely
to generate incorrect code (besides the problem of not inserting
branch instructions)?

Thanks a lot,

Fernando

Code ---------------------------------------------------------------

void CriticalEdgeRemoval_Fer::getAnalysisUsage
(AnalysisUsage & AU) const {
    AU.addPreserved<ETForest>();
    AU.addPreserved<DominatorSet>();
    AU.addPreserved<ImmediateDominators>();
    AU.addPreserved<DominatorTree>();
    AU.addPreserved<DominanceFrontier>();
    AU.addPreserved<LoopInfo>();
    AU.addPreservedID(LoopSimplifyID);
}

bool CriticalEdgeRemoval_Fer::runOnMachineFunction
(MachineFunction & mf) {
    std::vector<MachineBasicBlock *> src_blocks;
    std::vector<MachineBasicBlock *> dst_blocks;

    // first, only find the critical edges:
    for(unsigned u = 0; u < mf.size(); u++) {
        MachineBasicBlock * mbb = mf.getBlockNumbered(u);
        for(MachineBasicBlock::succ_iterator succ = mbb->succ_begin();
                                      succ != mbb->succ_end(); succ++) {
            MachineBasicBlock * mbb_succ = *succ;
            if(is_critical_edge(*mbb, *mbb_succ)) {
                src_blocks.push_back(mbb);
                dst_blocks.push_back(mbb_succ);
            }
        }
    }

    // second, break the critical edges:
    for(unsigned u = 0; u < src_blocks.size() > 0; u++) {
        MachineBasicBlock * src = src_blocks[u];
        MachineBasicBlock * dst = dst_blocks[u];
        split_critical_edge(*src, *dst, mf);
    }
    return src_blocks.size() > 0;
}

bool CriticalEdgeRemoval_Fer::is_critical_edge
(const MachineBasicBlock & src, const MachineBasicBlock & dst) {

    unsigned num_succ = 0;
    unsigned num_pred = 0;
    bool src_has_many_succ = false;
    bool dst_has_many_pred = false;

    for(MachineBasicBlock::const_succ_iterator succ = src.succ_begin();
                                      succ != src.succ_end(); succ++) {
        num_succ++;
        if(num_succ > 1) {
            src_has_many_succ = true;
            break;
        }
    }
    for(MachineBasicBlock::const_pred_iterator pred = dst.pred_begin();
                                      pred != dst.pred_end(); pred++) {
        num_pred++;
        if(num_pred > 1) {
            dst_has_many_pred = true;
            break;
        }
    }
    return src_has_many_succ && dst_has_many_pred;
}

void CriticalEdgeRemoval_Fer::split_critical_edge
(MachineBasicBlock & src, MachineBasicBlock & dst, MachineFunction & mf) {

    const BasicBlock * src_bb = src.getBasicBlock();
    const BasicBlock * dst_bb = dst.getBasicBlock();

    MachineBasicBlock * crit_mbb = new MachineBasicBlock(src_bb);

    src.addSuccessor(crit_mbb);
    crit_mbb->addSuccessor(& dst);
    src.removeSuccessor(& dst);

    ilist<MachineBasicBlock>::iterator mbb_it = mf.getLastBlock();
    ++mbb_it;
    mf.getBasicBlockList().insert(mbb_it, crit_mbb);

}

Dear guys,

    I am having problem to split edges correctly. Mostly because the new
basic blocks are creating infinite loops. Could someone help me fixing the
code below? It is creating assembly like this one below. Block LBB1_9 was
inserted to break the critical edge between blocks LBB1_3 and LBB1_8. But
it changes the semantics of the original program, because, before, LBB1_8
was falling through LBB1_4, and now it is falling on LBB1_9.

LBB1_3: ;no_exit
        lis r4, 21845
        ori r4, r4, 21846
        mulhw r4, r2, r4
        addi r5, r2, -1
        li r6, -1
        srwi r6, r4, 31
        add r4, r4, r6
        mulli r4, r4, 3
        li r6, 1
        subf r2, r4, r2
        cmpwi cr0, r2, 0
        beq cr0, LBB1_9 ;no_exit
LBB1_7: ;no_exit
        mr r2, r6
LBB1_8: ;no_exit
        cmpwi cr0, r5, 0
        add r2, r2, r3
        bgt cr0, LBB1_5 ;no_exit.no_exit_llvm_crit_edge
LBB1_9: ;no_exit
        mr r2, r6
        b LBB1_8 ;no_exit
LBB1_4: ;no_exit.loopexit_llvm_crit_edge

The code is this one below:

void CriticalEdgeRemoval_Fer::split_critical_edge
     (MachineBasicBlock & src, MachineBasicBlock & dst, MachineFunction &
mf) {

    const BasicBlock * src_bb = src.getBasicBlock();
    MachineBasicBlock * crit_mbb = new MachineBasicBlock(src_bb);

    /// modify the llvm control flow graph
    src.addSuccessor(crit_mbb);
    crit_mbb->addSuccessor(& dst);
    src.removeSuccessor(& dst);

    /// insert the new block into the machine function.
    ilist<MachineBasicBlock>::iterator mbb_it = mf.getLastBlock();
    ++mbb_it;
    mf.getBasicBlockList().insert(mbb_it, crit_mbb);

    /// insert a unconditional branch linking the new block to dst
    const TargetMachine & tm = mf.getTarget();
    const TargetInstrInfo * tii = tm.getInstrInfo();
    tii->insertGoto(*crit_mbb, dst);

    /// modify every branch in src that points to dst to point to the new
    /// machine basic block instead:
    MachineBasicBlock::iterator mii = src.end();
    while (mii != src.begin()) {
        mii--;
        // if there is not more branches, finish the loop
        if (!tii->isTerminatorInstr(mii->getOpcode())) {
            break;
        }
        // Scan the operands of this machine instruction, replacing any
        // use of dst with crit_mbb.
        for (unsigned i = 0, e = mii->getNumOperands(); i != e; ++i) {
            if (mii->getOperand(i).isMachineBasicBlock() &&
                   mii->getOperand(i).getMachineBasicBlock() == & dst) {
                mii->getOperand(i).setMachineBasicBlock(crit_mbb);
            }
        }
    }

}

The problem is that you are inserting block 9 in the wrong spot. mf.getLastBlock() returns the block with the greatest number which may have nothing to do with the ordering. Why not use the end iterator (mf.end) to insert?

-Tanya