Reining in profile instrumentation

When either ‘-pg’ or ‘-finstrument-functions’ is used, the compiler inserts the appropriate profiling hooks. This happens before inlining, so the hooks remain in place even after a function has been inlined.

Normally this is fine, but with C++ and its heavy use of inline functions and templates, there can be a vast number of trivial functions that would normally be optimised away. With the instrumentation hooks present this does not happen, so the code becomes much larger and more expensive to execute. Worse, the program being profiled no longer even approximately resembles the normal program without profiling hooks, so the data gathered is of little use.

My question is whether there are any mechanisms in LLVM to control what functions get instrumented; for instance ‘#pragma’s that can be added to the code, especially headers, that can be used to disable the instrumentation of large groups of functions. Or an option to remove the instrumentation during inlining?

But I really do need a simple way of preventing the instrumentation of large numbers of functions.

Thanks,

MartinO

> When either ‘-pg’ or ‘-finstrument-functions’ is used, the compiler inserts the appropriate profiling hooks. This happens before inlining, so the hooks remain in place even after a function has been inlined.

Have you tried compiling with -fprofile-generate? It enables IR-based profiling
instrumentation, which has supported pre-inlining since r275588. That should
mitigate the issue you're seeing with excessive instrumentation.

> Normally this is fine, but with C++ and its heavy use of inline functions and templates, there can be a vast number of trivial functions that would normally be optimised away. With the instrumentation hooks present this does not happen, so the code becomes much larger and more expensive to execute. Worse, the program being profiled no longer even approximately resembles the normal program without profiling hooks, so the data gathered is of little use.

The pre-inlining should address this issue. E.g., if A calls B, B calls C, and
B and C are inlined into A, then the profile you'd get back for {A, B, C} is
{1, 0, 0}. Without pre-inlining, you'd get back {1, 1, 1}.
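To make those numbers concrete, here is a toy C++ model (not LLVM code; 'EntryCounts', 'run_a_once' and 'print_counts' are hypothetical names) of one entry counter per function for the A -> B -> C example. It assumes, as the example does, that a function inlined into its caller before instrumentation is never entered out of line, so its own counter stays at zero.

```cpp
#include <cstdio>

// Toy model only: one entry counter per function for the A -> B -> C example.
// A function inlined into its caller before instrumentation is never entered
// out of line, so its own counter stays at zero.
struct EntryCounts {
    int a = 0;
    int b = 0;
    int c = 0;
};

// Simulates a single execution of A.
EntryCounts run_a_once(bool preinline_b_and_c) {
    EntryCounts counts;
    ++counts.a;                  // A is always entered once
    if (!preinline_b_and_c) {
        ++counts.b;              // out-of-line call A -> B
        ++counts.c;              // out-of-line call B -> C
    }
    return counts;
}

void print_counts(const EntryCounts& p) {
    std::printf("{%d, %d, %d}\n", p.a, p.b, p.c);
}
```

Calling 'print_counts(run_a_once(true))' prints {1, 0, 0} and 'print_counts(run_a_once(false))' prints {1, 1, 1}, matching the two profiles above.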

That said, I don't know what kinds of issues this would cause in practice. I'd
really like to hear about how the performance of your optimized application
changes when you turn pre-inlining on during the instrumentation step.

You can experiment with this by passing -mllvm -disable-preinline.

> My question is whether there are any mechanisms in LLVM to control what functions get instrumented; for instance ‘#pragma’s that can be added to the code, especially headers, that can be used to disable the instrumentation of large groups of functions. Or an option to remove the instrumentation during inlining?

Not that I'm aware of. One option is to not pass -fprofile-blah into
translation units you don't want instrumented.

best,
vedant

Thanks Vedant, and my apologies for the delay getting back to you - work got "busy".

I wasn't aware of the '-fprofile-generate' option, so thanks for pointing this out. I have tried it and I can see the instrumentation hooks that it generates. I assume that there is a library I have to implement to support this; can you let me know where the source for this library is?

This approach uses the C++ constructor initialisation support, which is generally fine. However, on our embedded targets programmers often forbid static objects so that they can eliminate the start-up overhead of initialising them; but that's another issue.

So the reason we cannot simply exclude source files from instrumentation is that most of the real code involved tends to live in the headers, in the source of heavily inlined template classes, and it is the instrumentation of these that creates the real problem. We do still want to profile our own functions in the source file itself. For instance, consider a simple accessor function such as:

  // From 'header.h'
  struct X {
    int k;
    int getK() const { return k; }
    // ...
  };

  // In 'source.cpp'
  #include "header.h"
  // ...
  X anX;
  // ...
  int check = anX.getK();

Now the tiny accessor function, which is usually eliminated trivially by inlining, is unnecessarily instrumented with the '__cyg_profile_func_enter' and '__cyg_profile_func_exit' calls, as is the calling function.
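For illustration, here is a minimal sketch of a call-counting runtime for '-finstrument-functions', together with a hand-expanded version of the accessor. The hook signatures follow GCC's documented form 'void __cyg_profile_func_enter(void *this_fn, void *call_site)'; the counters and 'getK_as_instrumented' are hypothetical names. The point is that the hook calls are opaque to the optimiser, so even when the accessor body is inlined into its caller, both calls stay behind.

```cpp
#include <cstddef>

// Minimal call-counting runtime for -finstrument-functions (a sketch; the
// counter names are hypothetical). The hooks are marked no_instrument_function
// so that they are not themselves instrumented and cannot recurse.
namespace {
std::size_t g_enter_count = 0;
std::size_t g_exit_count = 0;
}

extern "C" {
__attribute__((no_instrument_function))
void __cyg_profile_func_enter(void* this_fn, void* call_site) {
    (void)this_fn;    // address of the function being entered
    (void)call_site;  // address of the call site
    ++g_enter_count;
}

__attribute__((no_instrument_function))
void __cyg_profile_func_exit(void* this_fn, void* call_site) {
    (void)this_fn;
    (void)call_site;
    ++g_exit_count;
}
}

struct X {
    int k;
};

// Hand-written approximation of what the instrumented 'X::getK' amounts to.
// Inlining this body into a caller still leaves both opaque hook calls behind.
// The real hooks receive the function's address and the call site; nullptr is
// passed here to keep the sketch portable. The attribute stops the compiler
// from adding a second pair of hooks if this file is built with
// -finstrument-functions.
__attribute__((no_instrument_function))
int getK_as_instrumented(const X& x) {
    __cyg_profile_func_enter(nullptr, nullptr);
    int result = x.k;
    __cyg_profile_func_exit(nullptr, nullptr);
    return result;
}
```

A one-line accessor thus turns into two external calls plus the load, per call.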

Multiply this by the expansion and inlining of many hundreds of such functions, and the overhead becomes very large. Unfortunately, it also hides the true cost of the component that the programmer actually wants to measure. This is why I was wondering whether there was a '#pragma' that might allow me to write (contrived '#pragma' syntax):

  // In 'source.cpp'
  #pragma push profile instrumentation
  #pragma disable profile instrumentation
  #include "header.h"
  #pragma pop profile instrumentation
  ...
  X anX;
  ...
  int check = anX.getK();

or an alternative mechanism. GCC has the options '-finstrument-functions-exclude-file-list' and '-finstrument-functions-exclude-function-list' for this purpose, but these are not available in Clang/LLVM.
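For comparison, GCC and Clang both accept a per-function 'no_instrument_function' attribute that excludes one function from '-finstrument-functions'. The sketch below applies it to the accessor; the 'Wrapper' template is a hypothetical illustration. Because the attribute has to be written on every function individually, it does not scale to large template headers, which is why a file- or region-level control is what is being asked for here.

```cpp
// Variant of 'header.h' (a sketch): each function must be annotated by hand.
struct X {
    int k;
    // Excluded from -finstrument-functions; no enter/exit hooks are emitted.
    __attribute__((no_instrument_function)) int getK() const { return k; }
};

// Hypothetical template: every member of every such template would need the
// same annotation, which is impractical for third-party or library headers.
template <typename T>
struct Wrapper {
    T value;
    __attribute__((no_instrument_function)) const T& get() const { return value; }
};
```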

I will experiment with the '-mllvm -disable-preinline' option, thanks for telling me about this too.

All the best,

    MartinO

> Thanks Vedant, and my apologies for the delay getting back to you - work got "busy".

No problem :).

> I wasn't aware of the '-fprofile-generate' option, so thanks for pointing this out. I have tried it and I can see the instrumentation hooks that it generates. I assume that there is a library I have to implement to support this; can you let me know where the source for this library is?

It's compiler-rt's libclang_rt.profile_$platform.a.

(See compiler-rt/lib/profile.)

> This approach uses the C++ constructor initialisation support, which is generally fine. However, on our embedded targets programmers often forbid static objects so that they can eliminate the start-up overhead of initialising them; but that's another issue.

It's possible to use the profiling runtime without static initializers:

http://clang.llvm.org/docs/SourceBasedCodeCoverage.html#using-the-profiling-runtime-without-static-initializers
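A minimal sketch of the approach described on that page, assuming the runtime entry points it documents: defining 'int __llvm_profile_runtime' so the linker skips the runtime object holding the static initializer, then calling '__llvm_profile_initialize_file()' once and '__llvm_profile_write_file()' (which returns 0 on success) to write the profile. Declaring the two functions weak, so the file still links when built without '-fprofile-generate', is an assumption of this sketch (it relies on ELF-style weak undefined symbols) rather than something the page prescribes; 'profile_begin' and 'profile_flush' are hypothetical helper names.

```cpp
extern "C" {
// Defining this symbol tells the linker not to pull in the profiling runtime's
// static initializer; initialisation is then done manually below.
int __llvm_profile_runtime;

// Runtime entry points documented for this use case. Declared weak here (an
// assumption of this sketch) so that the calls are skipped when the profiling
// runtime is not linked in.
__attribute__((weak)) void __llvm_profile_initialize_file(void);
__attribute__((weak)) int __llvm_profile_write_file(void);
}

// Call once, early, e.g. at the start of main or the firmware entry point.
// Parses LLVM_PROFILE_FILE and sets up the output path.
void profile_begin() {
    if (__llvm_profile_initialize_file) {
        __llvm_profile_initialize_file();
    }
}

// Call wherever the profile should be written out. Returns 0 on success, or
// 0 when the runtime is absent and there is nothing to write.
int profile_flush() {
    return __llvm_profile_write_file ? __llvm_profile_write_file() : 0;
}
```

On a target without a filesystem the write step would still need runtime support for the output path, so this only removes the static-initializer dependency.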

> [...] This is why I was wondering whether there was a '#pragma' that might allow me to write (contrived '#pragma' syntax):
>
> [example snipped]
>
> or an alternative mechanism. GCC has the options '-finstrument-functions-exclude-file-list' and '-finstrument-functions-exclude-function-list' for this purpose, but these are not available in Clang/LLVM.

There isn't such a pragma right now. To implement this, I think we'd need the
frontend to attach a 'no_instrument' attribute to functions that need to be
skipped. Next, we'd need to update the frontend and IR-based instrumentation
logic to respect that attribute.
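To make the proposed split concrete, here is a toy model (plain C++, not Clang or LLVM code; every name in it is hypothetical) in which a push/disable/pop pragma stack in the "frontend" tags functions with a 'no_instrument' flag, and the "instrumentation" step skips the tagged ones.

```cpp
#include <string>
#include <vector>

// Toy model only: 'no_instrument' stands in for the proposed attribute.
struct Function {
    std::string name;
    bool no_instrument = false;
    bool instrumented = false;
};

// "Frontend": tracks push/disable/pop pragmas and tags every function
// defined while instrumentation is disabled.
class Frontend {
public:
    void pragma_push() { saved_.push_back(enabled_); }
    void pragma_disable() { enabled_ = false; }
    void pragma_pop() {
        if (!saved_.empty()) {
            enabled_ = saved_.back();  // restore the enclosing state
            saved_.pop_back();
        }
    }
    void define(const std::string& name) {
        Function f;
        f.name = name;
        f.no_instrument = !enabled_;
        functions_.push_back(f);
    }
    std::vector<Function>& functions() { return functions_; }

private:
    bool enabled_ = true;
    std::vector<bool> saved_;
    std::vector<Function> functions_;
};

// "Instrumentation": respects the tag set by the frontend.
void instrument(std::vector<Function>& functions) {
    for (Function& f : functions) {
        f.instrumented = !f.no_instrument;
    }
}
```

Using a stack rather than a single flag means nested headers restore whatever state was in effect when they were included, as in the contrived push/pop syntax.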

best,
vedant