Loop simplification

I have a (non-entry) basic block that contains only PHI nodes and an unconditional branch (that does not branch to itself). Is it always possible to merge this block with it's successor and produce a semantically equivalent program? I'm trying to undo some of the loop optimizations that LLVM has applied to my program to reduce a pair of nested loops to a single loop.

llvm::MergeBlockIntoPredecessor does not do what I want since it requires that the the block have a unique predecessor.

Andrew

I didn't notice anything that will do what you want out-of-box, but it should not be hard to write. llvm::FoldSingleEntryPHINodes is an example of phi node replacement. But in this case, you'll need to do one in-place operand replacement for each successor phi use and call PhiNode::addIncoming for the rest. Note that multiple successor phis may use the same predecessor phi, so you should be careful of mutating the phis while iterating their uses. If you cover the trivial case first with llvm::MergeBlockIntoPredecessor, then the predecessor phis should have no uses other than successor phis. That would violate strict SSA (the CFG edge you described is a dominance frontier).

-Andy

I forgot to ask why you're doing this. If the goal is to remove a branch, that would typically be handled by BranchFolder during codegen after phis have been removed. I don't see a problem forcing the CFG to be more canonical earlier, but if the successor is in a deeper loop, then you could be eliminating a preheader and forcing compensation code into the loop. In fact, I wouldn't be surprised if some loop passes might want a single preheader.

-Andy

Oops. I just realized you are intentionally doing loop combining. If the only other predecessor edges are back edges, then my statement above is untrue. You would have to replace all other uses of the predecessor phis (that are not successor phis) with a potentially new phi that uses itself on the backedge!

-Andy

Here's what I've got so far - it seems to work, aside from the fact that DeleteDeadPHIs is not removing at least one dead PHI in my test program.

The goal is to make the LoopInfo pass return a single loop where I expect a single loop (based on an input program that has a loop containing an if statement - which seems to generate an outer loop and an inner loop). The particular block that seems to cause this is a "loop.outer" that presumably gets added by one of the common optimization passes.

Andrew

PHI nodes don't seem to be cleaned up correctly since in llvm::RecursivelyDeleteDeadPHINode the algorithm can only handle single-use PHIs while my dead PHI has 2 uses (both self-references).

Andrew

Here's what I've got so far - it seems to work, aside from the fact that DeleteDeadPHIs is not removing at least one dead PHI in my test program.

The hasOneUse check may be failing in your case. Do you need to call SimplifyInstruction first? I'm not sure that will help though.

Your design looks mostly adequate for the simple nested loop case. I won't be able to spot all the issues upon inspection. I haven't done anything like this in LLVM either. I can add a couple comments:

---------------------

static bool
mergeBlockIntoSuccessor(BasicBlock *pred, BasicBlock *succ)
{
   if (succ == pred)
       return false;

   if (pred->getFirstNonPHI() != pred->getTerminator())
       return false;

   // Delete the terminator in the predecessor block
   pred->getTerminator()->eraseFromParent();

Is that sufficient to check for a side-effect free uncondition branch?

   // Update predecessor PHIs
   for (BasicBlock::iterator it = pred->begin();
           it != pred->end(); ++it)
   {
       PHINode *phi = dyn_cast<PHINode>(it);

       UT_ASSERT(phi);

       // Adjust the PHI to have the correct incoming block set
       for (pred_iterator pi = pred_begin(succ);
               pi != pred_end(succ); ++pi)
       {
           // We're a different predecessor than the predecessor block
           if (*pi != pred)
           {
               phi->addIncoming(phi, *pi);

You need to guarantee pi->succ is a backedge of course. Handling other cases will be more involved.

           }
       }
   }

   // Update successor PHIs
   for (BasicBlock::iterator it = succ->begin();
           succ->getFirstNonPHI() != it; ++it)
   {
       PHINode *phi = dyn_cast<PHINode>(it);

       UT_ASSERT(phi);
       UT_ASSERT(phi->getBasicBlockIndex(pred) >= 0);

       Value *val = phi->getIncomingValueForBlock(pred);
       PHINode *predphi = dyn_cast<PHINode>(val);

       if (predphi && predphi->getParent() != pred)
           predphi = 0;

       phi->removeIncomingValue(pred, false);

       for (pred_iterator pi = pred_begin(pred);
               pi != pred_end(pred); ++pi)
       {
           // We're a new predecessor
           if (phi->getBasicBlockIndex(*pi) < 0)
           {
               if (predphi)
               {
                   UT_ASSERT(predphi->getBasicBlockIndex(*pi) >= 0);
                   phi->addIncoming(
                           predphi->getIncomingValueForBlock(*pi), *pi);
               }
               else
                   phi->addIncoming(val, *pi);
           }
       }
   }

   // Move the PHIs into the successor
   succ->getInstList().splice(succ->begin(), pred->getInstList());

   // Remove the predecessor block
   pred->replaceAllUsesWith(succ);

   // Simplify conditional branches
   for (Value::use_iterator ui = succ->use_begin();
           ui != succ->use_end(); ++ui)
   {
       Instruction *inst = dyn_cast<Instruction>(*ui);
       if (inst)
           ConstantFoldTerminator(inst->getParent());
   }

   // Clean out dead PHI nodes
   DeleteDeadPHIs(succ);

   return true;
}

Do you need to worry about updating AliasAnalysis/MemoryDepAnalysis, DominatorTree, LoopInfo...?

-Andy

That's probably because LoopSimplify split the loop to give it a single backedge. It's a feature. Beyond that I'm not the person to ask about it though.

-Andy

I don't think that's always possible.
For example: if a phi in the successor has a phi in the predecessor as
one of it's incoming values, then the predecessor cases have to be
merged into the successor phi. However, if they share a predecessor
but have different incoming values for it that can't be done.

You're right. I think it's possible to avoid this (and all other?) failing cases by ensuring that the predecessor dominates the successor before trying to merge them. If the dominance relationship is not true, it may not be possible to merge them.

Andrew

That simplifies your implementation, but it's really unnecessary to handle this case. The CFG+phi transformation is still valid. As long as you avoid creating new critical edges, then the phis will always be representable (leave a single predecessor block on the branch targets).
-Andy