[PGO] Are the `__llvm_profile_` functions stable C APIs across LLVM releases?

We have numerous __llvm_profile_ functions declared in compiler-rt/lib/profile/InstrProfiling.h. It seems at least some of these functions (e.g. __llvm_profile_dump()) can be used in a user application for fine-grained control of profile collection. See comment at llvm-project/compiler-rt/lib/profile/InstrProfiling.h at 892862246e7d976251e34029baa013e1b175076a · llvm/llvm-project · GitHub.

Do we expect these __llvm_profile_ functions to be stable across LLVM releases in terms of interface and semantics? Is it safe to assume that if a user calls such functions in their code, when they upgrade to a later version of LLVM, their program behaviour is unchanged when they compile with the newer compiler?

FYI @w2yehia .

There are quite a few __llvm_profile_ functions, as you noted; strictly speaking, not all of them remain stable (e.g., the signature of __llvm_profile_get_padding_sizes_for_counters has changed over time, as the commit history shows).

Relatedly, with new LLVM releases, the data format of raw profiles (which might be part of the semantics of __llvm_profile_dump, depending on how the function is used) could be updated, and the raw profile format has no backward or forward compatibility guarantees [1]. For instance, if the raw profile version is updated in a new LLVM release, raw profile data generated by the old compiler cannot be parsed by that release's llvm-profdata, the LLVM command-line tool that converts raw profiles to the indexed format (to be used by the compiler).

  • Note that the indexed format used by the compiler does have a backward compatibility guarantee.

Could you elaborate more on how the user code calls these APIs, and what kind of program behavior change or API change is not desired for user code?

P.S. In case the PGO profile documentation helps, [docs][IRPGO]Document two binary formats for IRPGO profiles by minglotus-6 · Pull Request #76105 · llvm/llvm-project · GitHub is a work-in-progress document.

[1] Source-based Code Coverage — Clang 18.0.0git documentation

The APIs in the __llvm namespace (prefix) are public and are intended to be stable in signature and semantics (at least the primary dumping APIs). Internal APIs are named lprofXXX. If a public signature has changed, the change was probably unintentional.

Those public APIs are intended to be called by user programs to explicitly control profile dumping, merging, etc. For instance, a user can start profile collection after the startup/initialization phase.

Thanks so much for the replies @mingmingl-llvm and @davidxl !

As @davidxl noted in his reply, the user program can call such APIs to control the exact timing of profile dumping to collect profile data only for a part of the program execution. For example, the user program can have

// Introduce the API names.
void __llvm_profile_reset_counters(void);
int __llvm_profile_dump(void);

int main(...) {
  initialization(...);

  // Reset PGO counters so that profile collection starts after initialization.
#ifdef RESET_COUNTER
  __llvm_profile_reset_counters();
#endif

  // Profile is collected during kernel execution.
  kernel(...);

  // Dump the profile right after kernel execution.
#ifdef DUMP_BEFORE_CLEANUP
  __llvm_profile_dump();
#endif

  // No profile is collected during program cleanup.
  cleanup(...);
}

Ah this is awesome! Thanks!

Thanks for the confirmation! I see that currently, an LLVM installation does not include an InstrProfiling.h file that introduces these API names to a user program. This makes sense to me because InstrProfiling.h contains a lot of PGO implementation details in addition to the public APIs. Is it correct that at this time, the user has to know these names in order to use them? Is the example above how people typically use these APIs in their programs (introducing the names by declaring them, and using macros to guard the calls)?

Does it sound reasonable to introduce an installed header that contains a list of such “primary dumping APIs”? Maybe we can introduce some macro mechanism so that such API calls are guarded automatically and the user does not have to write the guards around the calls. See the example below.

// New header: InstrProfileControl.h.
// List of function names.
int __llvm_profile_dump(void);

// Macro wrapper so that the user can avoid the guards in their programs.
// -fprofile-generate can define PROFILE_GENERATE_ON.
// Users can call __llvm_pgo_profile_dump instead of calling __llvm_profile_dump
// directly.
#ifdef PROFILE_GENERATE_ON
// static inline avoids multiple-definition errors when several files include this header.
static inline void __llvm_pgo_profile_dump(void) { __llvm_profile_dump(); }
#else
#define __llvm_pgo_profile_dump()
#endif
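To make the caller's side of this proposal concrete, here is a minimal sketch of user code built on such a wrapper. PROFILE_GENERATE_ON and __llvm_pgo_profile_dump are the hypothetical names from the proposal above, not an existing interface, and kernel and run_kernel_then_dump are made up for illustration. The wrapper is inlined so that the snippet is self-contained.

```c
/* Hypothetical wrapper, following the proposal in this thread.
   PROFILE_GENERATE_ON is assumed to be defined only in instrumented builds. */
#ifdef PROFILE_GENERATE_ON
int __llvm_profile_dump(void);
static inline void __llvm_pgo_profile_dump(void) { (void)__llvm_profile_dump(); }
#else
#define __llvm_pgo_profile_dump() ((void)0)
#endif

/* Stand-in for the region of interest: sums the integers 1..n. */
static int kernel(int n) {
  int sum = 0;
  for (int i = 1; i <= n; ++i)
    sum += i;
  return sum;
}

/* The call site needs no #ifdef: in a non-instrumented build the
   wrapper expands to a no-op and nothing from the runtime is referenced. */
int run_kernel_then_dump(int n) {
  int result = kernel(n);
  __llvm_pgo_profile_dump();
  return result;
}
```

The same source then builds with or without instrumentation, and only the header needs to know which build it is in.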

Since the API signatures of these functions are stable, the common way of calling them from user programs is to declare them as extern with a weak reference, like so:

extern "C" __attribute__((weak)) int __llvm_profile_dump(void);

Then to call it, you can do the following:

// Check whether this build was linked against the profiling runtime
if (__llvm_profile_dump)
    __llvm_profile_dump();

This avoids the need for a header and avoids the dependency on LLVM (or the need for macros) in the non-instrumented build.
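Put together in plain C (so without the extern "C"), the weak-reference pattern might look like the sketch below. It assumes a toolchain and platform where an unresolved weak function reference links successfully and has a null address, as with GCC or Clang on ELF targets. try_dump_profile is a made-up helper name.

```c
/* Weak declaration: if the profiling runtime is not linked in, the symbol
   stays undefined and, on platforms that support this, its address is null. */
__attribute__((weak)) int __llvm_profile_dump(void);

/* Hypothetical helper. Returns 1 if the profiling runtime was present and
   __llvm_profile_dump() was called, 0 if the runtime is absent. */
int try_dump_profile(void) {
  if (__llvm_profile_dump) {
    (void)__llvm_profile_dump();
    return 1;
  }
  return 0;
}
```

In a build without -fprofile-generate the helper returns 0 and the program carries no link-time dependency on the profiling runtime.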

Testing a weak function's address against NULL is not supported everywhere (at least, it is not supported on AIX).

Specifically, having the linker replace an undefined weak function address with null is not supported on AIX.

Since the number of the most useful APIs is very limited, I am leaning towards introducing a header like you suggested, to reduce the number of files to be maintained.

The PGO_GEN and PGO_USE macros themselves are a useful feature that may find other uses. The caveat is to avoid introducing code (control flow) divergence between prof-gen and prof-use with these macros, especially in hot code regions.
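One way to read this caveat is sketched below. PGO_GEN is only the macro name floated in this thread, assumed here to be defined in prof-gen builds alone, and hot_kernel and run_and_dump are made-up names. The hot loop is byte-for-byte the same in both builds, and the macro guards only a cold, one-off call outside it.

```c
/* PGO_GEN: hypothetical macro, assumed to be defined only in prof-gen builds. */
#ifdef PGO_GEN
int __llvm_profile_dump(void);
#endif

/* Hot code: identical in prof-gen and prof-use builds, so the collected
   counters describe the same control flow that is later optimized. */
static long hot_kernel(int n) {
  long sum = 0;
  for (int i = 0; i < n; ++i) {
    /* Avoid placing #ifdef PGO_GEN branches here: they would make the
       instrumented control flow differ from the optimized one. */
    if (i % 2 == 1)
      sum += i;
  }
  return sum;
}

/* Sums the odd integers below n, then dumps the profile in prof-gen builds. */
long run_and_dump(int n) {
  long sum = hot_kernel(n);
#ifdef PGO_GEN
  (void)__llvm_profile_dump(); /* cold call, outside the hot loop */
#endif
  return sum;
}
```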

Thanks for your comments @snehasish and @davidxl !

Sounds good! I will look into adding the header file.

PR posted at [PGO] Exposing PGO's Counter Reset and File Dumping APIs by qiongsiwu · Pull Request #76471 · llvm/llvm-project · GitHub.