LLVM Block is not the basic block

Hi

I am using an LLVM function pass to do code analysis. However, I found that the blocks LLVM identifies do not end at function calls.

For example, I think the IR below should not be a single basic block.

%call17 = call i32* @__errno_location() #14, !dbg !1384
%18 = load i32, i32* %call17, align 4, !dbg !1384
%19 = load i8*, i8** %dest_dirname, align 4, !dbg !1386
call void (i32, i32, i8*, ...) @error(i32 1, i32 %18, i8* getelementptr inbounds ([3 x i8], [3 x i8]* @.str.389, i32 0, i32 0), i8* %19), !dbg !1387
br label %if.end18, !dbg !1388

The corresponding binary is below

.text:0001A530 BL __errno_location
.text:0001A534 LDR R1, [R0] ; errnum
.text:0001A538 LDR R3, [SP,#0x100+var_100]
.text:0001A53C LDR R2, =aS_1 ; "%s"
.text:0001A540 MOV R0, #1 ; status
.text:0001A544 BL error
.text:0001A548 B loc_1A54C

Here you can see that it obviously should not be a single basic block, because it calls two functions! So the control flow graph LLVM generates is not the real control flow graph either, right? Does anyone know why, or can anyone give me some suggestions?

Regards
Muhui

In LLVM, basic blocks can contain function calls. This allows partial and/or full code inlining.

The control flow graph (CFG) referred to in LLVM passes only includes the LLVM basic blocks inside a function. In LLVM, only terminator instructions (for example br, switch, ret, and invoke) mark basic block boundaries; an ordinary call, including a tail call, is not a terminator.

If you want to see function-call graphs, you may want to look at CallGraphSCCPass instead of FunctionPass.

http://llvm.org/docs/WritingAnLLVMPass.html#the-callgraphsccpass-class

Cheers

-- Dean

Hi Dean

Thank you very much for your very quick reply. I am still a little bit confused, and below are some of my questions.

In LLVM, basic blocks can contain function calls. This allows partial and/or full code inlining.

Not really. Basic blocks are units of code in the context of a specific function. When execution reaches a function call, it will go outside of the current function for some time, but then it will return to the instruction following the call. From the perspective of the function containing the call, the call is simply another (potentially complicated and long-running) instruction.
Generally speaking, a basic block ends at instruction A, if the instruction following A is not guaranteed to execute after A has executed. From the point of view of a given function, a call to another function will return back to the caller, so logically there is no reason to terminate the basic block at a call. There are of course complications, like calls that can throw exceptions, or calls that don't return, but the general idea is that calls do return.
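
To make the rule above concrete, here is a rough Python sketch. It is only a toy model that works on textual IR lines (with the debug metadata dropped), not on the real LLVM API, and the helper names opcode and split_into_blocks are made up for this illustration. It ends a block only at a terminator opcode, so the five instructions from the original question stay in one block even though two of them are calls.

```python
# Toy model over textual IR lines; this is NOT the LLVM API.
# Subset of LLVM's terminator opcodes; note that "call" is not among them.
TERMINATORS = {"ret", "br", "switch", "indirectbr", "invoke",
               "callbr", "resume", "unreachable"}

def opcode(line):
    """Return the opcode of a textual IR instruction, skipping a leading '%x = '."""
    text = line.strip()
    if text.startswith("%") and " = " in text:
        text = text.split(" = ", 1)[1]
    return text.split()[0]

def split_into_blocks(lines):
    """Group instructions into blocks, closing a block only at a terminator."""
    blocks, current = [], []
    for line in lines:
        current.append(line)
        if opcode(line) in TERMINATORS:
            blocks.append(current)
            current = []
    if current:  # trailing instructions without a terminator
        blocks.append(current)
    return blocks

# The instructions from the question, with !dbg metadata and attributes removed.
IR = [
    "%call17 = call i32* @__errno_location()",
    "%18 = load i32, i32* %call17, align 4",
    "%19 = load i8*, i8** %dest_dirname, align 4",
    "call void (i32, i32, i8*, ...) @error(i32 1, i32 %18, i8* getelementptr "
    "inbounds ([3 x i8], [3 x i8]* @.str.389, i32 0, i32 0), i8* %19)",
    "br label %if.end18",
]

blocks = split_into_blocks(IR)
print(len(blocks), len(blocks[0]))  # 1 5
```

Only the final br closes the block; both calls are treated like any other instruction.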

-Krzysztof

Hi Krzysztof

I see, and I agree with your explanation.

However, some state-of-the-art binary analysis tools like angr will accept this kind of LLVM design. There are also some non-returning functions. Does LLVM take them into account? Do you have any ideas on how I could create blocks that contain no function calls in LLVM IR?

Regards
Muhui

It doesn't seem that LLVM treats non-returning calls in any special way in the LLVM IR.

You could probably write your own pass that splits basic blocks at each call, but there are optimizations that would merge the pieces back together. If you do the splitting at the right moment, it may work, but it really depends on your specific application.
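
As a rough illustration of what such a splitting pass would compute, here is a Python sketch. It is a toy model over textual IR lines, not an LLVM pass, and the helper names is_call and split_after_calls are made up for this example. In real IR every block must end in a terminator, so an actual pass would use something like BasicBlock::splitBasicBlock, which adds an unconditional branch between the two halves; the sketch skips that step and only shows where the cuts would fall.

```python
# Toy model over textual IR lines; this is NOT an LLVM pass.
CALL_PREFIXES = {"tail", "musttail", "notail"}  # optional markers before "call"

def is_call(line):
    """True if the textual IR instruction is a call (optionally tail-marked)."""
    text = line.strip()
    if text.startswith("%") and " = " in text:
        text = text.split(" = ", 1)[1]
    tokens = text.split()
    if tokens[0] == "call":
        return True
    return tokens[0] in CALL_PREFIXES and len(tokens) > 1 and tokens[1] == "call"

def split_after_calls(block):
    """Cut a block after every call, so each piece ends at a call or at the terminator."""
    pieces, current = [], []
    for line in block:
        current.append(line)
        if is_call(line):
            pieces.append(current)
            current = []
    if current:
        pieces.append(current)
    return pieces

# The block from the original question, with !dbg metadata and attributes removed.
BLOCK = [
    "%call17 = call i32* @__errno_location()",
    "%18 = load i32, i32* %call17, align 4",
    "%19 = load i8*, i8** %dest_dirname, align 4",
    "call void (i32, i32, i8*, ...) @error(i32 1, i32 %18, i8* getelementptr "
    "inbounds ([3 x i8], [3 x i8]* @.str.389, i32 0, i32 0), i8* %19)",
    "br label %if.end18",
]

pieces = split_after_calls(BLOCK)
print([len(p) for p in pieces])  # [1, 3, 1]
```

The three pieces end at the call to __errno_location, at the call to error, and at the final br.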

-Krzysztof

The graph is designed to model all issues relevant to the compiler.
Mostly this revolves around making sure values are still live when
needed and that phi instructions work properly to merge different
values in from different paths.

In the case of function calls though, execution always returns to
precisely the next instruction with some registers defined (the return
value), some clobbered, and some memory affected. There's no real
reason to distinguish that from any similar instruction as far as the
compiler is concerned.

The case where a call can affect control-flow is when it might throw
an exception, and that's represented in LLVM IR by a different
"invoke" instruction that does describe the possible return locations.

Not to pile issues on your plate, but you should also be aware that
CodeGen can synthesize calls in some cases. For example accessing a
"thread_local" variable on Apple platforms will generate a call to
__tlv_get_address that is completely invisible in the IR. Memcpy calls
may also appear out of nowhere on most platforms.

To get something close to the CPU-level control flow graph you'd
probably have to run a very late MachineFunction pass that looked not
only at the presented basic blocks, but also checked individual
instructions for isCall[*].
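
A rough Python sketch of that idea, applied to the disassembly from the original question rather than to real MachineInstrs. The mnemonic sets and the helper name machine_blocks are assumptions made for this illustration; an actual MachineFunction pass would query MachineInstr::isCall() on each instruction instead of matching text.

```python
# Toy model over a disassembly listing; this is NOT a MachineFunction pass.
LISTING = """\
.text:0001A530 BL __errno_location
.text:0001A534 LDR R1, [R0]
.text:0001A538 LDR R3, [SP,#0x100+var_100]
.text:0001A53C LDR R2, =aS_1
.text:0001A540 MOV R0, #1
.text:0001A544 BL error
.text:0001A548 B loc_1A54C
"""

CALLS = {"BL", "BLX"}    # assumption: treat these ARM mnemonics as calls
BRANCHES = {"B", "BX"}   # assumption: unconditional branches only, enough for this listing

def machine_blocks(listing):
    """Group instruction addresses into blocks that end at a call or a branch."""
    blocks, current = [], []
    for line in listing.splitlines():
        if not line.strip():
            continue
        address, mnemonic = line.split()[:2]
        current.append(address)
        if mnemonic in CALLS or mnemonic in BRANCHES:
            blocks.append(current)
            current = []
    if current:
        blocks.append(current)
    return blocks

for block in machine_blocks(LISTING):
    print(block[0], "..", block[-1])
```

On this listing the sketch yields three blocks: the first BL on its own, the five instructions up to and including the second BL, and the final B.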

Cheers.

Tim.

[*] And even that would be an abstraction. IMO there's a pretty broad,
grey spectrum when it comes to "real" control flow. Depending on what
you're doing you may or may not include segfaults, floating-point
exceptions, and even asynchronous interrupts as control flow.