CFG entry and exit blocks

Hi,

I'm confused about the entry and exit blocks of an LLVM CFG. I understand that every CFG has one and only one entry block, and this is confirmed by the existence of the getEntryBlock function in the Function class.

But what about exit (a.k.a. return) blocks? At first I assumed that LLVM CFGs have one and only one exit block, too, but the following code is a counter-example:

int simple(int j) {
     try {
         if (j > 50) throw 100;
     } catch (int status) {
         j++;
     }

     return j;
}

This produces the CFG shown in the attachment, which appears to have 3 exit blocks. At least, there are 3 blocks with no successors. (I don't yet understand the semantics of the "invcont" and "Unwind" blocks, so perhaps I'm misunderstanding something here.)

Anyway, given this code, how would one identify the "real" exit block (the one labeled "return")? Obviously I can't just search for the block without successors because there are 3 of them.

Thanks,

Trevor

cfg._Z6simplei.dot (4.56 KB)

Dear Trevor,

I'm too lazy to convert your .dot file into a graph file, but I'll make some comments anyway.
:slight_smile:

First, LLVM does not guarantee that a function has a single exit block. Multiple basic blocks can have a return instruction as their terminator instruction. There is a pass (Unify Function Exit nodes i.e., -mergereturn <http://llvm.org/docs/Passes.html#mergereturn&gt;\) that transform a function to have only 1 return instruction.

Now, I haven't looked at LLVM's new exception handling facilities recently, but there used to be an unwind instruction that would unwind the stack to the nearest dynamically executed invoke instruction (see the LLVM Langauge Reference Manual for more details). Functions that use unwind would naturally have multiple exits. I don't know of a transform that would change it to have a single exit (or one return and one unwind).

-- John T.

Trevor Harmon wrote:

I'm too lazy to convert your .dot file into a graph file

What format should I have posted? (I'm not sure what you mean by "graph file".) I had thought that .dot was the preferred format here, since that's what LLVM generates (e.g., "opt -dot-cfg ...").

First, LLVM does not guarantee that a function has a single exit block.
Multiple basic blocks can have a return instruction as their terminator
instruction.

Any suggestions on what kind of source code pattern would cause this to happen? I've tried feeding C++ code with multiple return statements to LLVM, but it always spits out a CFG with a single return block. All of the blocks for the return statements simply branch to this final return block.

There is a pass (Unify Function Exit nodes i.e.,
-mergereturn <LLVM’s Analysis and Transform Passes — LLVM 18.0.0git documentation) that
transform a function to have only 1 return instruction.

That sounds like a good approach, except that this pass does not appear to produce a CFG with exactly one return block, but rather one or zero return blocks. The distinction is important because if the pass's getReturnBlock function returns null (see UnifyFunctionExitNodes.h), I'm back to square one of not having a single exit block.

I'm wondering what would cause a CFG not to have a return block. The comments in UnifyFunctionExitNodes.cpp say: "If there are no return stmts in the Function, a null pointer is returned." But this doesn't make sense; even an empty void function has a return block with a ret instruction inside it. Is there some other kind of source code pattern that would cause an absence of a return block?

Thanks,

Trevor

Yes, an infinite loop. The return block would be found unreachable and deleted.

Reid

Other possibilities are functions which always throw an exception, and
functions which call "exit".

Ciao,

Duncan.