Oddity w/MachineBlockPlacement and Loops

I'm getting some odd behavior out of MBP and was hoping someone with knowledge of the code might be able to give some guidance. Fair warning, I'm trying to describe a problem in code I don't really understand, so if something doesn't make sense, assume I misunderstood something.

The problematic case I'm seeing is that cold blocks are being placed between the preheader and header of a hot loop. The result is a bunch of cold code spread throughout the function rather than grouped together at the end of the function.

From what I can tell tracing through the code, the critical decision that goes wrong is when we're visiting the preheader after forming a (correct) chain for the loop itself. When selecting a successor to merge with, we appear to not be considering the loop even though the loop hasn't been rotated and the header would be ideal for fallthrough.

In particular, we're printing the "(prob) (non-cold CFG conflict)" debug output message for the successor of the preheader which is the header. If I'm reading this code correctly, it's identifying the fact that there's a globally more important predecessor for the header (i.e. the latch block), but it doesn't seem to account for the fact that the latch block has already been combined into the header's chain. Unless I'm misreading something, we *should* be able to merge the loop chain with the preheader chain in its entirety, right?

On at least one example I've looked at, adding an early exit from the BadCFGConflict loop when Pred and Succ are part of the same chain does appear to give the expected result, but I don't understand the code well enough to reason about whether that is generally a correct thing to do or not.
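
To make the suspected issue concrete, here is a toy Python model of the check as I understand it. It is not the LLVM code: the function name, the block and chain names, the frequencies, and the 0.2 scaling factor are all made up for illustration, and the real pass works on MachineBasicBlocks and block frequencies rather than dictionaries. The skip_same_chain switch stands in for the early exit described above.

```python
# Toy model of the successor check described above. Not LLVM code:
# all names, frequencies and the 0.2 factor are hypothetical.

def has_bad_cfg_conflict(bb, succ, preds, edge_freq, chain_of,
                         skip_same_chain):
    """Return True if another predecessor of `succ` looks more important
    than the candidate fallthrough edge bb -> succ."""
    # Assumed threshold: a rival edge only has to reach a fraction of the
    # candidate edge's frequency to be treated as a conflict.
    candidate = edge_freq[(bb, succ)] * 0.2
    for pred in preds[succ]:
        # Ignore self-loops and blocks already in bb's own chain.
        if pred == succ or chain_of[pred] == chain_of[bb]:
            continue
        # The early exit under discussion: pred is already part of succ's
        # chain, so its edge into succ is not treated as a rival here.
        # Whether this is correct in general is the open question.
        if skip_same_chain and chain_of[pred] == chain_of[succ]:
            continue
        if edge_freq[(pred, succ)] >= candidate:
            return True
    return False


# A hot loop: the latch -> header backedge is far hotter than the entry edge.
preds = {"header": ["preheader", "latch"]}
edge_freq = {("preheader", "header"): 1.0, ("latch", "header"): 99.0}
chain_of = {
    "preheader": "entry_chain",
    "header": "loop_chain",
    "latch": "loop_chain",
}

# Without the early exit the latch blocks the merge; with it, the
# preheader is free to fall through into the loop chain.
print(has_bad_cfg_conflict("preheader", "header", preds, edge_freq,
                           chain_of, skip_same_chain=False))
print(has_bad_cfg_conflict("preheader", "header", preds, edge_freq,
                           chain_of, skip_same_chain=True))
```

In this model the latch always wins against the preheader unless predecessors in the header's own chain are skipped, which matches the behavior I'm seeing.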

Philip

Hi Philip,
http://reviews.llvm.org/D10825 tries to fix similar issues. Looks like there are missing cases. Can you create a small reproducer?

Cong has also improved loop rotation based on a better cost model – it is not yet enabled (-mllvm -precise-rotation-cost=true to turn it on). If possible, can you also give it a try?

thanks,

David

Dear Philip,

On a related note, Rahman Lavaee has been working on optimizing code layout with LLVM. His approach uses a trace of the program, so it may not be applicable to what you're doing, but I thought I should let you know since it works on a similar problem.

I believe his code is at https://github.com/rlavaee/code-layout-optimizer.

Rahman, does this code do the function-level code layout, the basic block layout optimization, or both?

Regards,

John Criswell

This doesn’t quite look the same. That review is about continuing the loop chain itself into the most profitable successor. My case is about continuing the preheader’s chain into the (previously formed) loop chain. Loop rotation is not the problem here. The layout of the loop (header->latch->header cycle) is exactly what it should be.

Hi Philip,
http://reviews.llvm.org/D10825 tries to fix similar issues. Looks like
there are missing cases. Can you create a small reproducer?

This doesn't quite look the same. That review is about continuing the
loop chain itself into the most profitable successor. My case is about
continuing the preheader's chain into the (previously formed) loop chain.

yes -- but it is in the same category -- the layout algorithm needs to
ignore blocks that are already in a chain (not just the current chain),
and there needs to be a way to detect trivial chains.

Cong has also improved loop rotation based on a better cost model -- it is
not yet enabled (-mllvm -precise-rotation-cost=true to turn it on). If
possible, can you also give it a try?

Loop rotation is not the problem here. The layout of the loop
(header->latch->header cycle) is exactly what it should be.

yes -- I am just advertising the new layout strategy here :)

thanks,

David

I've posted a patch which addresses this:
http://reviews.llvm.org/D17830