Structurizing multi-exit regions

Hi,

I'm trying to solve a problem from StructurizeCFG not actually handling regions with multiple exits. Sample IR attached.

StructurizeCFG doesn't touch this function, exiting early on the isTopLevelRegion check. SIAnnotateControlFlow then gets confused and ends up inserting an if into one of the blocks, and the matching end.cf into one of the return/unreachable blocks. The input to the end.cf is then not dominated by the condition which fails the verifier.

I'm not sure exactly about how to go about fixing this. I see a few options:

- Try to make the annotator aware of multi exit regions and insert the necessary phis for the input mask values for the end.cf calls. This seems undesirable and I'm not sure works in all cases.

- Make StructurizeCFG duplicate blocks to get simple regions. Is there already code to do this somewhere? CodeExtractor seems to do something similar, but not quite the same. Can this be done in the region pass, or does StructurizeCFG need to be converted to a function pass? RegionInfo mentions support for "extended" regions with multiple exits, but I don't think this helps any here.

-Matt

multi-divergent-region-exit.ll (1.96 KB)

Hi,

I'm trying to solve a problem from StructurizeCFG not actually handling
regions with multiple exits.

SEME or MEME or both?

Sample IR attached.

This is an example that just exhibits undefined behavior.
Is there one that doesn't?
IE if i change the unreachables to ret something, is that still an example
of something you care about?

StructurizeCFG doesn't touch this function, exiting early on the
isTopLevelRegion check.

FWIW: I'm not sure it should care whether it's a top level region, only
whether it's SESE, SEME, etc.

SIAnnotateControlFlow then gets confused and ends up inserting an if into
one of the blocks, and the matching end.cf into one of the
return/unreachable blocks. The input to the end.cf is then not dominated
by the condition which fails the verifier.

I'm not sure exactly about how to go about fixing this. I see a few
options:

- Try to make the annotator aware of multi exit regions and insert the
necessary phis for the input mask values for the end.cf calls. This seems
undesirable and I'm not sure works in all cases.

It's possible to always find a unique entering/exiting condition for any of
the multiple-entrance/exit, and to transform it into a bunch of SESE
regions.
see, e.g,

- Make StructurizeCFG duplicate blocks to get simple regions. Is there
already code to do this somewhere? CodeExtractor seems to do something
similar, but not quite the same. Can this be done in the region pass, or
does StructurizeCFG need to be converted to a function pass? RegionInfo
mentions support for "extended" regions with multiple exits, but I don't
think this helps any here.

This also would work.

SEME. I don’t think I need to worry about MEME.

Yes, changing to ret also has the same problem. It just happens bugpoint found the unreachable case in the original problem

-Matt

Hi Matt,

one prerequisite for the algorithm is that each function has exactly one entry and one exit node.

If I remember correctly there was a LLVM pass called mergereturn or something like that which made sure that you have at most one ret instruction for each function.

Using that used to be a prerequisite for running the transformation, but that obviously won't work for unreachable instructions.

The only doable approach I can see is to make the annotator able to handle those. Otherwise you could also have a BB existing a function using ret and another one running into an unreachable.

Regards,
Christian.

FWIW, now that postdom is fixed, all the possible “returns/exits” of the function should be direct children of the virtual exit node in the post-dom tree.

Thus, the following would work to detect whether each block is a real return or not:

for (auto &PDTChild : children<DomTreeNode *>(PDT.getRootNode())) {
auto *BB = PDTChild->getBlock();
auto &Info = BlockInfo[BB];
// Real function return
if (isa(Info.Terminator)) {
DEBUG(dbgs() << "post-dom root child is a return: " << BB->getName()
<< ‘\n’;);
continue;
}
// PDTChild is something else, like a no-return or infinite loop
}

You could also just insert a br i1 false , to have a single exit.
Until you simplify the cfg, it would work.

Thanks, I didn't know about mergereturn. Running that first seems to practically solve the problem. It does also merge unreachables, so that's OK too. If one branch is unreachable, and the other is return I still see the same problem like you mentioned.

-Matt