How to add custom instrumentation?

Hi everyone,

I run some functions using ORC JIT, and now I need to add custom instrumentation.
I want to add two callbacks to each function: 'enterFunction' at the beginning and 'leaveFunction' at the end.
Intuition says that I could 'just' insert CallInsts into the first and the last basic blocks of the function.

Am I correct? Is there any other/better way to do this? Is there anything special I need to be aware of?

Thank you.

The '-finstrument-functions' option may already be sufficient for your needs.

When selected, this inserts the following two calls on entry to and exit from a function:

  void __cyg_profile_func_enter(void *this_fn, void *call_site);
  void __cyg_profile_func_exit(void *this_fn, void *call_site);

You can then provide a custom implementation of these calls to perform the analysis tasks that you require.

The two parameters are the address of the function into which the instrumentation is inserted, and the address of the call site to the instrumentation function.
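
As a rough illustration, a custom implementation of the two hooks could look something like the sketch below. The trace buffer, its capacity and the 'record' helper are made-up names for this example, not part of any library; only the two hook names and the 'no_instrument_function' attribute come from the compiler.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical in-memory trace buffer; name and size are illustrative. */
#define TRACE_CAPACITY 1024

struct trace_event {
    void *this_fn;    /* address of the instrumented function */
    void *call_site;  /* address passed by the compiler as call_site */
    int   is_enter;   /* 1 for function entry, 0 for function exit */
};

static struct trace_event trace_log[TRACE_CAPACITY];
static size_t trace_count = 0;

/* The hooks and their helpers must not be instrumented themselves,
 * otherwise every hook call would trigger another hook call. */
__attribute__((no_instrument_function))
static void record(void *this_fn, void *call_site, int is_enter)
{
    assert(this_fn != NULL);
    if (trace_count < TRACE_CAPACITY) {   /* silently drop events when full */
        trace_log[trace_count].this_fn = this_fn;
        trace_log[trace_count].call_site = call_site;
        trace_log[trace_count].is_enter = is_enter;
        ++trace_count;
    }
}

__attribute__((no_instrument_function))
void __cyg_profile_func_enter(void *this_fn, void *call_site)
{
    record(this_fn, call_site, 1);
}

__attribute__((no_instrument_function))
void __cyg_profile_func_exit(void *this_fn, void *call_site)
{
    record(this_fn, call_site, 0);
}
```

Linking a file like this into a program built with '-finstrument-functions' should be enough for the hooks to be called; what to do with the recorded events afterwards is up to the analysis.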

There is an "Execution Trace" implementation of these instrumentation hooks on GitHub called 'eTrace' that may guide you, though I can't recall the link.

All the best,

  MartinO

Thanks for the hint, I didn’t know about this option. That’s a great reference!

However, I am trying to be compiler/language agnostic.
Also (for whatever reasons) I need a numeric ID of a function rather than its address.
So the question is still open.

May I assume that the following always holds: the first basic block in a function is an entry point and the last basic block in a function is an exit point?

A numeric ID is difficult (or impossible) to assign at compile time, because at the time of compilation nothing is known about the functions compiled in other translation units. So the usual trick is to use a map from the address of the function (which is unique after linking) to other metadata, the most common being the DWARF debug information. I would usually correlate the function's address (the instruction pointer, IP, value) with the information in the DWARF data to give me the whole picture.

The other address - the "call site" - is a bit misleading, as it is usually the address to which the function will "return" rather than the address at which it was called (the return address is easily found). So for example, given the following:

  void foo () { ... }
  void bar () { ... foo(); ... }

The instrumentation for 'foo' will show the address to which it will return in 'bar' and not the exact call site of 'foo'. Again, by cross-correlating this IP information against the address data in the DWARF information, you can easily determine where the call was made with reasonable accuracy. A target-aware tool (e.g. a debugger) might even be able to refine this further and determine the actual call address using knowledge of the ABI.

I would usually use '-finstrument-functions' with '-g' to get the corresponding address/location information. Since the IP address of the function is unique, the mapping to a unique ordinal (i.e. ID) is pretty straightforward. After all, why invent a custom map when there is already a standard definition for it in DWARF?
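
To show how simple the address-to-ordinal mapping can be, here is a minimal sketch that hands out IDs in order of first appearance. The table, its size and the 'function_id' name are assumptions made for this example; a real tool would more likely build the table from the DWARF data or the symbol table.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical fixed-size table; a real tool would size it from the
 * symbol table or the debug information. */
#define MAX_FUNCTIONS 256

static uintptr_t known_addrs[MAX_FUNCTIONS];
static size_t known_count = 0;

/* Returns a stable ordinal for a function address: the same address
 * always yields the same ID. Returns -1 when the table is full. */
int function_id(uintptr_t addr)
{
    assert(addr != 0);  /* 0 is never a valid function address here */

    /* Linear search keeps the sketch short; a hash map would scale better. */
    for (size_t i = 0; i < known_count; ++i) {
        if (known_addrs[i] == addr)
            return (int)i;
    }
    if (known_count == MAX_FUNCTIONS)
        return -1;

    known_addrs[known_count] = addr;
    return (int)known_count++;
}
```

Inside an instrumentation hook this could be called as 'function_id((uintptr_t)this_fn)' to turn the address into the ordinal.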

Regarding the BBs: not necessarily. The BB ordering may be changed by the scheduler. For example, in our target we prioritise the BBs in a "trace" based on the most probable execution path. I am not 100% sure where the '__cyg_*' hooks are added in the IR, though I suspect that they are added in the front-end (Clang). Somebody else can answer this more accurately. I don't in principle see why the "exit" cannot occur more than once though, so there is no reason why it would be tied to a particular BB, though the "enter" probably is.

In any event, the semantics are the same, the instrumentation 'exit' hook will still tell you when the function exits, what function it is exiting, and where it is returning to regardless of whether it is consolidated to only one BB or replicated across many BBs.

  MartinO

Numeric IDs are trivial in my case since I have the whole program in IR form.
I want to avoid Clang-specific features like this one, since I want to be able to use my tool with IR coming from any language.

Anyways, because of your hint (-finstrument-functions), I learned more about bitcode. Namely, I learned that each basic block has a terminator, which could be a return instruction.

So the algorithm seems to be trivial now:
- insert enterFunction into the very first basic block
- insert leaveFunction into each basic block whose terminator is a return instruction
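
The two steps above can be sketched on a toy model of a function. This is not the LLVM API: the 'basic_block' struct, the 'terminator' enum and 'instrument_function' are invented here purely to show the shape of the algorithm, and the "insertion" is just a flag on the block.

```c
#include <assert.h>
#include <stddef.h>

/* Toy stand-ins for IR concepts; not LLVM types. */
enum terminator { TERM_BRANCH, TERM_RETURN, TERM_UNREACHABLE };

struct basic_block {
    enum terminator term;  /* kind of instruction that ends the block */
    int calls_enter;       /* 1 once an enter call is placed at block start */
    int calls_leave;       /* 1 once a leave call is placed before the terminator */
};

/* Marks the entry block with an enter call and every block that ends in
 * a return with a leave call. blocks[0] is taken to be the entry block.
 * Returns the number of leave calls inserted. */
size_t instrument_function(struct basic_block *blocks, size_t n)
{
    assert(blocks != NULL || n == 0);
    if (n == 0)
        return 0;  /* nothing to do for a function without a body */

    blocks[0].calls_enter = 1;

    size_t leaves = 0;
    for (size_t i = 0; i < n; ++i) {
        if (blocks[i].term == TERM_RETURN) {
            blocks[i].calls_leave = 1;
            ++leaves;
        }
    }
    return leaves;
}
```

Note that in this sketch blocks ending in anything other than a return get no leave call, so a function with several returns gets several leave calls and a block that never returns gets none.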

Thanks a lot for your help, Martin, I appreciate it!