Is BlockAddress always correct?


I use BlockAddress to get the address of a BasicBlock, and I use GlobalVariable's getInitializer() to pass that address to a global variable in my own program, which then prints it out.

But I found that BlockAddress is not always correct.

For example, for some functions the rsp (stack pointer) or another register is maintained by the caller, so the code looks like this:
0x42c37a: e8 c1 7a 00 00 call 433e40 <retrieve_url>
0x42c37f: 48 83 c4 20 add rsp,0x20
0x42c383: eb 00 jmp 42c385 <main+0x16b5>

What I want is the address that comes exactly after the function call, 0x42c37f, and I want BlockAddress to give me that.

But the address my program actually prints is 0x42c383.

I guess "add rsp,0x20" is treated as part of the basic block that contains the function call. Maybe resetting rsp (the stack pointer) is considered part of the call.

Can I say there is a bug in BlockAddress?

Or is there a bug in LLVM's backend?

How can I solve this problem?

Should I force clang/LLVM not to use the caller-saved convention, or something like that?


In general, no: there is no way in LLVM IR to get the return address of a single function call, which appears to be what you want. The compiler is free to insert instructions at the end of a basic block and at the beginning of the next one. So yes, BlockAddress is always exact, in that it is the address of the block, but that does not seem to be quite what you want. Another thing that would break your invariant, for example, is the register allocator deciding to spill the return value in RAX right after the call, which is pretty typical.
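
To illustrate what a block address is good for: Clang lowers the GNU "labels as values" extension (`&&label`) to a blockaddress constant, and the resulting address is meant to be used as the target of an indirect branch, not as a call's return address. A minimal sketch, assuming a GCC or Clang compiler; the function names are hypothetical and not from this thread:

```cpp
// Sketch only: relies on the GNU "labels as values" extension (&&label and
// goto *ptr), which GCC and Clang support. Names are hypothetical.
#include <cassert>
#include <cstdio>

// Jumps to one of two blocks through a table of label addresses.
int dispatch(int which) {
  // Clang lowers each &&label to a blockaddress constant in LLVM IR.
  static void *const targets[] = {&&first, &&second};
  goto *targets[which & 1];  // indirect branch to the chosen block
first:
  return 10;
second:
  return 20;
}

// Small self-check showing that each label address reaches its own block.
void demo_dispatch() {
  assert(dispatch(0) == 10);
  assert(dispatch(1) == 20);
  std::printf("dispatch(0)=%d dispatch(1)=%d\n", dispatch(0), dispatch(1));
}
```

Jumping to the address always reaches the right block, but nothing here promises that the block starts at the instruction immediately after a preceding call.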

There are several existing LLVM features that record function return addresses, but this cannot be implemented in LLVM IR itself. For example, CodeView debug info records heap allocation call sites. You can see this in the assembly for the example below; look for the labels Ltmp3, Ltmp5, etc. at the return addresses:

$ cat t.cpp
struct Foo {
  int x, y;
};
__declspec(allocator) Foo *newFoo();
void bar(Foo **foos) {
  foos[0] = newFoo();
  foos[1] = newFoo();
  foos[2] = newFoo();
  foos[3] = newFoo();
}

$ clang -cc1 -gcodeview -masm-verbose -debug-info-kind=limited -triple x86_64-windows-msvc -fms-extensions -S t.cpp -o - | grep -A4 'S_HEAP\|call.*newFoo'
callq "?newFoo@@YAPEAUFoo@@XZ"
movq 32(%rsp), %rcx
movq %rax, (%rcx)
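
If the goal is only to learn a call's return address at run time, one option outside LLVM IR is the GCC/Clang builtin `__builtin_return_address`, called from inside the callee. This is a suggestion that assumes the callee can be modified; it is not something the debug-info feature above provides. A minimal sketch, with hypothetical function and variable names:

```cpp
// Sketch only: __builtin_return_address and __attribute__((noinline)) are
// GCC/Clang extensions. All names below are hypothetical.
#include <cassert>
#include <cstdio>

// Last return address seen by traced_callee().
static void *g_last_return_address = nullptr;

// noinline keeps this a real call, so there is a return address to capture.
__attribute__((noinline)) int traced_callee(int x) {
  // Level 0 is the address this invocation will return to in its caller.
  g_last_return_address = __builtin_return_address(0);
  return x + 1;
}

// Calls the callee once and prints the address it returned to.
void demo_return_address() {
  int r = traced_callee(41);
  assert(r == 42);
  assert(g_last_return_address != nullptr);
  std::printf("returned to %p\n", g_last_return_address);
}
```

For the disassembly earlier in the thread, this would be expected to report the address of the instruction right after the call (0x42c37f there), wherever the next basic block happens to begin.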

Thanks for your explanation!
I got it!

Besides caller-saved registers, I also found that some compiler optimizations break BlockAddress in the LLVM backend: some test cases are correct with -O0 but wrong with -O2.

Now I know that BlockAddress is not reliable for this purpose.

Thank you again~

On Sat, Feb 29, 2020 at 2:54 AM, Reid Kleckner <> wrote: