Function finalization section

If you have a function with multiple returns, does LLVM have some way for a section of code to always be run, regardless of which return was reached first? Not knowing much, I’m thinking you would basically use something like a goto that is executed in place of each return and jumps to the tail of the function body.

Is that correct, and if so, does LLVM have such a concept?

Thanks.

I don’t think LLVM itself has such a concept. It would be up to the frontend to produce IR that did this. The example that comes to mind would be in C++, if a function has local variables that need to have destructors called, the frontend will probably generate that code in an exit block and all the “return” statements will branch to it.

It is often a good idea to ask Clang what kind of IR it creates for small C-ish examples:

clang -S -emit-llvm foo.cpp

struct RAII { ~RAII(); };  // any type with a non-trivial destructor

void foo() {
  RAII raii;

  // several for loops and if statements with returns spread over the function
}

Clang will invoke the destructor of raii before every return statement.

That makes sense to have the returns jump to the exit block. I haven’t gotten this far but do LLVM functions even support more than 1 “ret” instruction or do you always need to jump to a single exit block if your function returns in multiple places?

According to the IR definition of ‘ret’ it terminates a block, but it doesn’t say anything about having only one per function.

That said, I’ve been playing around with some simple cases and in fact clang seems to emit IR to branch to a single ‘ret’ even when I’d think it would make sense not to. So perhaps I am wrong about this.

The abstraction level of LLVM IR is C with vectors. If you have several return statements in your C program, then you will have several ret instructions in LLVM IR (per function).

Clang’s optimization passes were probably already too aggressive. Try disabling them:

-Xclang -disable-llvm-passes

Do you have an actual example? I’ve been unable to come up with one, even though it’s the result I would expect.

@tschuett Sorry, how is this relevant? I asked if you had an example C/C++ function that produced IR with multiple ‘ret’ instructions. Telling me about -emit-llvm is not helpful.
Even a function that has multiple RAII variables and a return in the middle of a for loop still generates a single exit point.

Odd. Clang somehow normalises the IR to precisely one ret?!?

More likely, clang organizes its AST to a single exit point, or perhaps clang’s CodeGen just works that way.

Anyway, to the OP’s point: While clang seems to always produce a single exit point, I don’t know that it’s a property you can depend on, and LLVM does not require it. Hand-written IR with multiple ‘ret’ instructions passes the verifier just fine.

The AST models the source program as closely as possible. The single ret must be a codegen feature.

void maint() {
  for(unsigned i = 0; i < 10; ++i) {
    if (i > 100)
      return;
    else if (i > 1000)
      return;
  }
  return;
}

TranslationUnitDecl 0x7f84ca02f608 <<invalid sloc>> <invalid sloc>
|-TypedefDecl 0x7f84ca02fe30 <<invalid sloc>> <invalid sloc> implicit __int128_t '__int128'
| `-BuiltinType 0x7f84ca02fbd0 '__int128'
|-TypedefDecl 0x7f84ca02fea0 <<invalid sloc>> <invalid sloc> implicit __uint128_t 'unsigned __int128'
| `-BuiltinType 0x7f84ca02fbf0 'unsigned __int128'
|-TypedefDecl 0x7f84ca0301b0 <<invalid sloc>> <invalid sloc> implicit __NSConstantString 'struct __NSConstantString_tag'
| `-RecordType 0x7f84ca02ff80 'struct __NSConstantString_tag'
|   `-Record 0x7f84ca02fef8 '__NSConstantString_tag'
|-TypedefDecl 0x7f84ca030258 <<invalid sloc>> <invalid sloc> implicit __builtin_ms_va_list 'char *'
| `-PointerType 0x7f84ca030210 'char *'
|   `-BuiltinType 0x7f84ca02f6b0 'char'
|-TypedefDecl 0x7f84ca030548 <<invalid sloc>> <invalid sloc> implicit __builtin_va_list 'struct __va_list_tag[1]'
| `-ConstantArrayType 0x7f84ca0304f0 'struct __va_list_tag[1]' 1
|   `-RecordType 0x7f84ca030330 'struct __va_list_tag'
|     `-Record 0x7f84ca0302b0 '__va_list_tag'
`-FunctionDecl 0x7f84ca077600 <multi.c:1:1, line:9:1> line:1:6 maint 'void ()'
  `-CompoundStmt 0x7f84ca077a70 <col:14, line:9:1>
    |-ForStmt 0x7f84ca077a28 <line:2:3, line:7:3>
    | |-DeclStmt 0x7f84ca0777a0 <line:2:7, col:21>
    | | `-VarDecl 0x7f84ca077700 <col:7, col:20> col:16 used i 'unsigned int' cinit
    | |   `-ImplicitCastExpr 0x7f84ca077788 <col:20> 'unsigned int' <IntegralCast>
    | |     `-IntegerLiteral 0x7f84ca077768 <col:20> 'int' 0
    | |-<<<NULL>>>
    | |-BinaryOperator 0x7f84ca077828 <col:23, col:27> 'int' '<'
    | | |-ImplicitCastExpr 0x7f84ca0777f8 <col:23> 'unsigned int' <LValueToRValue>
    | | | `-DeclRefExpr 0x7f84ca0777b8 <col:23> 'unsigned int' lvalue Var 0x7f84ca077700 'i' 'unsigned int'
    | | `-ImplicitCastExpr 0x7f84ca077810 <col:27> 'unsigned int' <IntegralCast>
    | |   `-IntegerLiteral 0x7f84ca0777d8 <col:27> 'int' 10
    | |-UnaryOperator 0x7f84ca077868 <col:31, col:33> 'unsigned int' prefix '++'
    | | `-DeclRefExpr 0x7f84ca077848 <col:33> 'unsigned int' lvalue Var 0x7f84ca077700 'i' 'unsigned int'
    | `-CompoundStmt 0x7f84ca077a10 <col:36, line:7:3>
    |   `-IfStmt 0x7f84ca0779e0 <line:3:5, line:6:7> has_else
    |     |-BinaryOperator 0x7f84ca0778f0 <line:3:9, col:13> 'int' '>'
    |     | |-ImplicitCastExpr 0x7f84ca0778c0 <col:9> 'unsigned int' <LValueToRValue>
    |     | | `-DeclRefExpr 0x7f84ca077880 <col:9> 'unsigned int' lvalue Var 0x7f84ca077700 'i' 'unsigned int'
    |     | `-ImplicitCastExpr 0x7f84ca0778d8 <col:13> 'unsigned int' <IntegralCast>
    |     |   `-IntegerLiteral 0x7f84ca0778a0 <col:13> 'int' 100
    |     |-ReturnStmt 0x7f84ca077910 <line:4:7>
    |     `-IfStmt 0x7f84ca0779c0 <line:5:10, line:6:7>
    |       |-BinaryOperator 0x7f84ca077990 <line:5:14, col:18> 'int' '>'
    |       | |-ImplicitCastExpr 0x7f84ca077960 <col:14> 'unsigned int' <LValueToRValue>
    |       | | `-DeclRefExpr 0x7f84ca077920 <col:14> 'unsigned int' lvalue Var 0x7f84ca077700 'i' 'unsigned int'
    |       | `-ImplicitCastExpr 0x7f84ca077978 <col:18> 'unsigned int' <IntegralCast>
    |       |   `-IntegerLiteral 0x7f84ca077940 <col:18> 'int' 1000
    |       `-ReturnStmt 0x7f84ca0779b0 <line:6:7>
    `-ReturnStmt 0x7f84ca077a60 <line:8:3>

Notice the fleet of ReturnStmts.

I’m finally trying this now and I see it doesn’t like multiple ret instructions. Does this look right?

define void @MultipleExit(i64 %0) {
entry:
  %i = alloca i64, align 8
  store i64 %0, ptr %i, align 4
  %i1 = load i64, ptr %i, align 4
  %int_slt_tmp = icmp slt i64 %i1, 1
  br i1 %int_slt_tmp, label %then, label %outer

then:                                             ; preds = %entry
  %const_tmp = alloca [255 x i8], align 1
  store [12 x i8] c"less than 1\00", ptr %const_tmp, align 1
  %1 = call i32 (ptr, ...) @printf(ptr %const_tmp)
  ret void
  br label %outer

else:                                             ; No predecessors!
  %const_tmp2 = alloca [255 x i8], align 1
  store [12 x i8] c"more than 1\00", ptr %const_tmp2, align 1
  %2 = call i32 (ptr, ...) @printf(ptr %const_tmp2)
  ret void
  br label %outer

outer:                                            ; preds = %else, %then, %entry
  ret void
}

Gives the error:

Terminator found in the middle of a basic block!
label %then
Terminator found in the middle of a basic block!
label %else
LLVM ERROR: Broken function found, compilation aborted!

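A terminator is the last instruction of a basic block, and both ret and br are terminators, so each may appear only at the very end of a block. You have a br after a ret in both %then and %else, which is illegal. Those br instructions could never be reached anyway, because control leaves the function at the ret. Remove the two br instructions and this verifier error goes away; having several ret instructions in one function is not the problem.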
