For a C++ program compiled with Clang/LLVM on Ubuntu (x86_64), I want
to recover the addresses of all call instructions (C++ object
dispatches), or directly the return addresses, which are just the
addresses immediately following each call instruction.
I think that this information is not obtainable at link time, since at
that point we have only IR code. Please correct me if I am wrong.
So my assumption is that the addresses of the call instructions, and
the addresses following them, only become available in the compiler
back end, after the IR code is lowered to machine code.
Does anybody have a suggestion as to where in the compiler I should
look?
Since I am new to this topic, suggestions or solutions are highly welcome.
Is it enough to compute the set of all possible return addresses, or do you need to limit the set to only C++ method calls? If you just need the full set of return addresses for a given DSO, I’d recommend disassembling the object after linking, scraping the output for “callq” instructions, and taking the address of the next instruction. This will give you the return address “VA” (I think, in ELF parlance), which is the address of the instruction assuming the ELF binary is loaded at the address listed in its program headers. You can compute the possible return addresses at runtime by adding the difference between the on-disk p_vaddr values and the actual addresses that the loader used at runtime. You can probably discover the load addresses with dl_iterate_phdr.
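To make the scraping approach concrete, here is a rough sketch rather than a finished tool. It assumes the text comes from `objdump -d` with the raw instruction bytes shown (the default), and it computes each return address as the call's address plus its encoded length, which is the same as the address of the next instruction. The names `collectReturnAddresses` and `mainProgramLoadBias` are made up for this example. The second function assumes glibc's `dl_iterate_phdr`, which reports the main program first, and uses `dlpi_addr` as the difference between runtime addresses and on-disk `p_vaddr` values.

```cpp
#include <link.h>  // dl_iterate_phdr, dl_phdr_info (glibc/Linux)

#include <cctype>
#include <cstddef>
#include <cstdint>
#include <sstream>
#include <string>
#include <vector>

// True if s is a non-empty string of hexadecimal digits.
static bool isHex(const std::string &s) {
  if (s.empty()) return false;
  for (unsigned char c : s)
    if (!std::isxdigit(c)) return false;
  return true;
}

// Hypothetical helper: scans `objdump -d` text and returns, for every call
// instruction, the link-time address (VA) of the instruction that follows it.
std::vector<std::uint64_t> collectReturnAddresses(const std::string &disassembly) {
  std::vector<std::uint64_t> result;
  std::istringstream lines(disassembly);
  std::string line;
  bool pending = false;       // a call was seen; its end address is not final yet
  std::uint64_t callEnd = 0;  // address just past the call bytes seen so far

  while (std::getline(lines, line)) {
    std::uint64_t addr = 0, nbytes = 0;
    std::string mnemonic;
    const std::size_t colon = line.find(':');
    if (colon != std::string::npos) {
      std::istringstream head(line.substr(0, colon));
      std::string addrText, extra;
      head >> addrText;
      // Instruction lines start with a bare hex address; symbol lines such as
      // "0000000000401126 <main>:" have a second token and are skipped.
      if (isHex(addrText) && addrText.size() <= 16 && !(head >> extra)) {
        addr = std::stoull(addrText, nullptr, 16);
        std::istringstream tail(line.substr(colon + 1));
        for (std::string tok; tail >> tok;) {
          if (tok.size() == 2 && isHex(tok)) { ++nbytes; continue; }  // raw byte
          if (tok == "bnd" || tok == "notrack") continue;  // prefixes before call
          mnemonic = tok;
          break;
        }
      }
    }
    if (pending) {
      // objdump wraps long encodings onto a line with bytes but no mnemonic.
      if (nbytes > 0 && mnemonic.empty() && addr == callEnd) {
        callEnd += nbytes;
        continue;
      }
      result.push_back(callEnd);
      pending = false;
    }
    // Matches both "call" (newer binutils) and "callq" (older, llvm-objdump).
    if (nbytes > 0 && mnemonic.rfind("call", 0) == 0) {
      pending = true;
      callEnd = addr + nbytes;
    }
  }
  if (pending) result.push_back(callEnd);
  return result;
}

// Callback for dl_iterate_phdr: record the first object's base and stop.
static int firstObjectCallback(dl_phdr_info *info, std::size_t, void *data) {
  *static_cast<std::uint64_t *>(data) = info->dlpi_addr;
  return 1;  // nonzero stops the iteration
}

// Load bias of the main executable (the first object glibc reports).
std::uint64_t mainProgramLoadBias() {
  std::uint64_t bias = 0;
  dl_iterate_phdr(firstObjectCallback, &bias);
  return bias;
}
```

To use it, read the output of `objdump -d` on the linked binary into a string, pass it to `collectReturnAddresses`, and, inside the running process, add `mainProgramLoadBias()` to each result. For a non-PIE executable the bias is typically 0. For a shared library you would instead pick the entry whose `dlpi_name` matches the library rather than taking the first one.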
If you need only a specific, annotated list of return addresses, you will probably have to make invasive changes to LLVM that insert labels after the relevant CALL instructions and emit an object file section with relocations against those labels. This is doable but complicated. You can follow the EH label machinery to see how to insert labels into the instruction stream and create relocations against them from read-only data sections.