Profiling data structure

Hello,

I've been working on implementing some basic functionality in order to
use the llvm profiling functionality inside of a kernel (the Xen
hypervisor). The only functionality I'm interested in is being able to
reset the counters, get the size of the data, and dump the data into a
memory buffer.
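
As a rough illustration of the three operations listed above, here is a minimal C sketch. It assumes the counters are 64-bit values laid out contiguously between two pointers. The function names are hypothetical and not part of compiler-rt, and a complete dump would also need the header, the data records and the names, which this sketch leaves out.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical helper: zero every counter in [begin, end). */
void profile_reset_counters(uint64_t *begin, uint64_t *end)
{
    assert(begin <= end);
    memset(begin, 0, (size_t)(end - begin) * sizeof(*begin));
}

/* Hypothetical helper: size in bytes of the counter region. */
size_t profile_counters_size(const uint64_t *begin, const uint64_t *end)
{
    assert(begin <= end);
    return (size_t)(end - begin) * sizeof(*begin);
}

/*
 * Hypothetical helper: copy the counters into a caller-provided buffer.
 * Returns the number of bytes written, or 0 if the buffer is too small.
 */
size_t profile_dump_counters(const uint64_t *begin, const uint64_t *end,
                             void *out, size_t out_len)
{
    size_t size = profile_counters_size(begin, end);

    if (out_len < size)
        return 0;
    memcpy(out, begin, size);
    return size;
}
```

In a kernel the two pointers would typically come from the bounds of the counters section; how those bounds are obtained is platform specific and is not shown here.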

I have to admit I haven't been able to find much documentation
about how this data is stored inside of the different sections, but
with the code in compiler-rt/lib/profile I've been able to craft
something that seems to behave sensibly.

I have however a couple of questions:

- The "Values" field in __llvm_profile_data always seems to be NULL.
   Is this expected? What/why could cause this?

- The fields in the NumValueSites array inside of __llvm_profile_data
   also seem to always be 0.

- Since what I'm coding is a decoupled replacement for the profiling
   runtime inside of compiler_rt, is there any way that at compile or
   run time I can fetch the version of the profiling data?
   I'm mostly worried that at some point llvm will bump the version
   and change the layout, and then I will have to update my runtime
   accordingly, but without llvm reporting the version used by the
   compiler it's going to be very hard to keep backwards
   compatibility or to detect version bumps.

Thanks, Roger.

Adding the people I think are the maintainers of the profile-related llvm
bits; sorry, I should have done that earlier.

+David Li and Vedant Kumar who are active in this area

> - The "Values" field in __llvm_profile_data always seems to be NULL.
>    Is this expected? What/why could cause this?

This is for value profiling. For now there are only two kinds: indirect
call targets and memcpy/memset size. If the function body does not have any
value sites, the field will be null.

> - The fields in the NumValueSites array inside of __llvm_profile_data
>    also seem to always be 0.

See above.

> - Since what I'm coding is a decoupled replacement for the profiling
>    runtime inside of compiler_rt, is there any way that at compile or
>    run time I can fetch the version of the profiling data?

At compile time, it is the macro: INST_PROF_RAW_VERSION. At runtime, it is
the second field of the raw profile header.
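
A minimal sketch of the runtime check described above, in C. It assumes the raw profile header starts with two 64-bit fields in native byte order and that only the low 32 bits of the second field carry the version number, with any upper bits treated as flags. The helper name is hypothetical and not part of compiler-rt.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/*
 * Hypothetical helper: return the raw format version found in a raw
 * profile image. Assumption: the header begins with two 64-bit fields
 * and the version lives in the low 32 bits of the second one.
 */
uint32_t raw_profile_version(const void *buf, size_t len)
{
    uint64_t fields[2];

    assert(buf != NULL);
    assert(len >= sizeof(fields));
    /* memcpy avoids alignment problems with arbitrary buffers. */
    memcpy(fields, buf, sizeof(fields));
    return (uint32_t)(fields[1] & 0xffffffffu);
}
```

A decoupled runtime could compare the returned value against the version it was written for and refuse to process a profile it does not understand.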

> I'm mostly worried that at some point llvm will bump the version
> and change the layout, and then I will have to update my runtime
> accordingly, but without llvm reporting the version used by the
> compiler it's going to be very hard to keep backwards
> compatibility or to detect version bumps.

Yes, the raw profile format can change at any time. We only try to keep
backward compatibility for the indexed format.

At runtime with in-process profile merging, if the source raw profile
data's format version is different from the current runtime version, an
error will be emitted.

David

> Hello,
>
> I've been working on implementing some basic functionality in order to
> use the llvm profiling functionality inside of a kernel (the Xen
> hypervisor). The only functionality I'm interested in is being able to
> reset the counters, get the size of the data, and dump the data into a
> memory buffer.
>
> I have to admit I haven't been able to find a lot of documentation
> about how this data is stored inside of the different sections, but
> with the code in compiler_rt/libs/profile I've been able to craft
> something that seems to do something sensible.
>
> I have however a couple of questions:
>
> - The "Values" field in __llvm_profile_data always seems to be NULL.
> Is this expected? What/why could cause this?
>

> This is for value profiling. For now there are only two kinds: indirect
> call targets and memcpy/memset size. If the function body does not have any
> value sites, the field will be null.

You will have to bear with me because my knowledge of compiler
internals is very limited. What exactly is a "value site"?

Can you give me some code examples that trigger this in C?

>
> - Since what I'm coding is a decoupled replacement for the profiling
> runtime inside of compiler_rt, is there any way that at compile or
> run time I can fetch the version of the profiling data?
>

> At compile time, it is the macro: INST_PROF_RAW_VERSION. At runtime, it is
> the second field of the raw profile header.

I'm not able to use INST_PROF_RAW_VERSION at compile time. Are you
sure this is exported? If I do:

cc -fprofile-instr-generate -fcoverage-mapping -dM -E - < /dev/null

I don't see INST_PROF_RAW_VERSION or any similar defines.

> I'm mostly worried that at some point llvm will bump the version
> and change the layout, and then I will have to update my runtime
> accordingly, but without llvm reporting the version used by the
> compiler it's going to be very hard to keep backwards
> compatibility or to detect version bumps.
>
> yes, the raw profile format can change anytime. We only try to keep
> backward compatibility for indexed format.
>
> At runtime with in-process profile merging, if the source raw profile
> data's format version is different from the current runtime version, an
> error will be emitted.

Keep in mind this is a kernel, so the source is compiled with
"-fprofile-instr-generate -fcoverage-mapping", but the profiling
runtime in compiler_rt is not linked against the kernel.

I would like to have a reliable way that I could use to detect version
bumps in the internal coverage data, so that I can implement the
required changes in my in-kernel coverage code.

I have a series ready for Xen that implements this; I will send
the patch with the in-kernel profiling implementation to this list for
review.

Thanks, Roger.

>
> > Hello,
> >
> > I've been working on implementing some basic functionality in order to
> > use the llvm profiling functionality inside of a kernel (the Xen
> > hypervisor). The only functionality I'm interested in is being able to
> > reset the counters, get the size of the data, and dump the data into a
> > memory buffer.
> >
> > I have to admit I haven't been able to find a lot of documentation
> > about how this data is stored inside of the different sections, but
> > with the code in compiler_rt/libs/profile I've been able to craft
> > something that seems to do something sensible.
> >
> > I have however a couple of questions:
> >
> > - The "Values" field in __llvm_profile_data always seems to be NULL.
> > Is this expected? What/why could cause this?
> >
>
> This is for value profiling. For now there are only two kinds: indirect
> call targets and memcpy/memset size. If the function body does not have any
> value sites, the field will be null.

> You will have to bear with me because my knowledge of compiler
> internals is very limited. What is exactly a "value site"?

It refers to a location in a function where the compiler inserts the value
profiling hook.

> Can you give me some code examples that trigger this in C?

typedef void (*FP)(void);

void test(FP fp)
{
    fp(); /* a value site */
}

There are two ways to turn on value profiling:

1) using instrumentation for PGO (we call it IR-PGO):

   -fprofile-generate

2) using front-end based instrumentation, which is used for coverage testing:

   -fprofile-instr-generate -mllvm -enable-value-profiling=true

> >
> > - Since what I'm coding is a decoupled replacement for the profiling
> > runtime inside of compiler_rt, is there any way that at compile or
> > run time I can fetch the version of the profiling data?
> >
>
> At compile time, it is the macro: INST_PROF_RAW_VERSION. At runtime, it is
> the second field of the raw profile header.

> I'm not able to use INST_PROF_RAW_VERSION at compile time. Are you
> sure this is exported? If I do:
>
> cc -fprofile-instr-generate -fcoverage-mapping -dM -E - < /dev/null

Ok. The macro is defined for building the compiler itself, but not passed
down to the user program when compiling it.

If you use IR-PGO (turned on with -fprofile-generate), the information is
stored in a symbol in rodata: __llvm_profile_raw_version. Its least
significant 32 bits hold the value of the raw version. Unfortunately, the
front-end based instrumentation currently does not emit such a symbol.

> I don't see INST_PROF_RAW_VERSION or any similar defines.

> I'm mostly worried that at some point llvm will bump the version
> > and change the layout, and then I will have to update my runtime
> > accordingly, but without llvm reporting the version used by the
> > compiler it's going to be very hard to keep backwards
> > compatibility or to detect version bumps.
> >
> >
> yes, the raw profile format can change anytime. We only try to keep
> backward compatibility for indexed format.
>
> At runtime with in-process profile merging, if the source raw profile
> data's format version is different from the current runtime version, an
> error will be emitted.

> Keep in mind this is a kernel, so the source is compiled with
> "-fprofile-instr-generate -fcoverage-mapping", but the profiling
> runtime in compiler_rt is not linked against the kernel.
>
> I would like to have a reliable way that I could use to detect version
> bumps in the internal coverage data, so that I can implement the
> required changes in my in-kernel coverage code.
>
> I have a series ready for Xen in order to implement this, I will send
> the patch with the in-kernel profiling implementation to this list for
> review.

The right way to do this is to define the __llvm_profile_raw_version variable
with FE instrumentation, as is done by IR-PGO. I have cc'ed Vedant, who may
help with this.

thanks,

David

> > On Wed, Oct 25, 2017 at 12:26 AM, Roger Pau Monné via llvm-dev <
> > > - The "Values" field in __llvm_profile_data always seems to be NULL.
> > > Is this expected? What/why could cause this?
> > >
> >
> > This is for value profiling. For now there are only two kinds: indirect
> > call targets and memcpy/memset size. If the function body does not have any
> > value sites, the field will be null.
>
> You will have to bear with me because my knowledge of compiler
> internals is very limited. What is exactly a "value site"?
>
>
> It refers to a location in a function where the compiler inserts the value
> profiling hook.

> Can you give me some code examples that trigger this in C?
>

> typedef void (*FP)(void);
>
> void test (FP fp) {
>
>      fp(); /* a value site */
> }
>
> There are two ways to turn on value profiling:
>
> 1) using instrumentation for PGO (we call it IR-PGO):
>
>    -fprofile-generate
>
> 2) using Front-end based instrumentation which is used for coverage testing:
>
>    -fprofile-instr-generate -mllvm -enable-value-profiling=true

I'm currently using -fprofile-instr-generate -fcoverage-mapping which
are the options specified in [0]. ATM I'm only interested in getting
coverage data.

I've created the following simple example:

static void foo(void)
{
  int foo = 1;
}

static void bar(void)
{
  int bar = 1;
}

static void exec(void (*fp)(void))
{
  fp();
}

int main(int argc, char **argv)
{
  int count = 0;

  while ( count++ < 10000 )
    exec(rand() % 2 ? foo : bar);

  return 0;
}

Then I've used my custom made runtime and checked that, inside of
__llvm_profile_data, Values is still NULL and NumValueSites is still all 0.

I've also dumped the coverage data using my own runtime (which ignores
Values and NumValueSites), and got the following profile, which looks
right:

     >static void foo(void)
4.95k|{
4.95k| int foo = 1;
4.95k|}
     >
     >static void bar(void)
5.04k|{
5.04k| int bar = 1;
5.04k|}
     >
     >static void exec(void (*fp)(void))
10.0k|{
10.0k| fp();
10.0k|}
     >
     >int main(int argc, char **argv)
    1|{
    1| int count = 0;
    1|
10.0k| while ( count++ < 10000 )
10.0k| exec(rand() % 2 ? foo : bar);
    1|
    1| return 0;
    1|}

So can Values and NumValueSites be safely ignored in order to obtain
the coverage data?

>
> > >
> > > - Since what I'm coding is a decoupled replacement for the profiling
> > > runtime inside of compiler_rt, is there any way that at compile or
> > > run time I can fetch the version of the profiling data?
> > >
> >
> > At compile time, it is the macro: INST_PROF_RAW_VERSION. At runtime, it is
> > the second field of the raw profile header.
>
> I'm not able to use INST_PROF_RAW_VERSION at compile time. Are you
> sure this is exported? If I do:
>
> cc -fprofile-instr-generate -fcoverage-mapping -dM -E - < /dev/null
>
>
> Ok. The macro is defined for building the compiler itself, but not passed
> down to the user program when compiling it.
>
> If you use IR-PGO (turned on with -fprofile-generate), the information is
> stored in a symbol in rodata: __llvm_profile_raw_version -- the least
> significant 32bits has the value of the raw version. Unfortunately, the
> front-end based instrumentation currently does not emit such a symbol.

That's right, __llvm_profile_raw_version cannot be used in my case
because it's exported by the runtime, and here I'm not using the
compiler_rt runtime at all.

Would it be possible to export this as a compile time define when
-fprofile-instr-generate -fcoverage-mapping is used?

Does that sound sensible?

> I don't see INST_PROF_RAW_VERSION neither any similar defines.
>
> > I'm mostly worried that at some point llvm will bump the version
> > > and change the layout, and then I will have to update my runtime
> > > accordingly, but without llvm reporting the version used by the
> > > compiler it's going to be very hard to keep backwards
> > > compatibility or to detect version bumps.
> > >
> > >
> > yes, the raw profile format can change anytime. We only try to keep
> > backward compatibility for indexed format.
> >
> > At runtime with in-process profile merging, if the source raw profile
> > data's format version is different from the current runtime version, an
> > error will be emitted.
>
> Keep in mind this is a kernel, so the source is compiled with
> "-fprofile-instr-generate -fcoverage-mapping", but the profiling
> runtime in compiler_rt is not linked against the kernel.
>
> I would like to have a reliable way that I could use to detect version
> bumps in the internal coverage data, so that I can implement the
> required changes in my in-kernel coverage code.
>
> I have a series ready for Xen in order to implement this, I will send
> the patch with the in-kernel profiling implementation to this list for
> review.
>

> The right way for this is to define __llvm_profile_raw_version variable
> with FE instrumentation as is done by IR-PGO. I have cc'ed Vedant who may
> help with this.

Wouldn't it be better to export an internal compiler define rather
than creating a symbol with the profile version?

Thanks, Roger.

[0] Source-based Code Coverage — Clang 18.0.0git documentation

>
> > > On Wed, Oct 25, 2017 at 12:26 AM, Roger Pau Monné via llvm-dev <
> > > > - The "Values" field in __llvm_profile_data always seems to be NULL.
> > > > Is this expected? What/why could cause this?
> > > >
> > >
> > > This is for value profiling. For now there are only two kinds: indirect
> > > call targets and memcpy/memset size. If the function body does not have any
> > > value sites, the field will be null.
> >
> > You will have to bear with me because my knowledge of compiler
> > internals is very limited. What is exactly a "value site"?
> >
> >
> It refers to a location in a function where the compiler inserts the value
> profiling hook.
>
>
>
> > Can you give me some code examples that trigger this in C?
> >
>
>
> typedef void (*FP)(void);
>
> void test (FP fp) {
>
> fp(); /* a value site */
> }
>
>
> There are two ways to turn on value profiling:
>
> 1) using instrumentation for PGO (we call it IR-PGO):
>
> -fprofile-generate
>
>
> 2) using Front-end based instrumentation which is used for coverage testing:
>
> -fprofile-instr-generate -mllvm -enable-value-profiling=true

> Then I've used my custom made runtime and checked that both Values and
> NumValueSites inside of __llvm_profile_data are still NULL.

It should not. The IR dump shows:

@__profd_ic.c_exec = private global { i64, i64, i64*, i8*, i8*, i32, [2 x
i16] } { i64 3252020653354712924, i64 0, i64* getelementptr inbounds ([1 x
i64], [1 x i64]* @__profc_ic.c_exec, i32 0, i32 0), i8* null, i8* bitcast
([1 x i64]* @__profvp_ic.c_exec to i8*), i32 1, [2 x i16] [i16 1, i16 0] },
section "__llvm_prf_data", align 8

The Values field is i8* bitcast ([1 x i64]* @__profvp_ic.c_exec to i8*),
and the number of value sites for that kind is 1.
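
To make the record layout in the IR dump above easier to follow, here is a C sketch of a struct with the same shape, { i64, i64, i64*, i8*, i8*, i32, [2 x i16] }. Only Values and NumValueSites are named in this thread; the other field names are illustrative assumptions, and the layout is only valid for the format version that produced this dump.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Assumed mirror of one per-function data record, based solely on the
 * type shown in the IR dump. Field names other than "values" and
 * "num_value_sites" are guesses made for illustration.
 */
struct profile_data {
    uint64_t name_ref;           /* i64 */
    uint64_t func_hash;          /* i64 */
    uint64_t *counters;          /* i64*, points at the counters */
    void *function;              /* i8*, null in the dump */
    void *values;                /* i8*, the "Values" field */
    uint32_t num_counters;       /* i32 */
    uint16_t num_value_sites[2]; /* [2 x i16], one entry per value kind */
};

/* Hypothetical helper: total number of value sites across both kinds. */
unsigned total_value_sites(const struct profile_data *d)
{
    assert(d != NULL);
    return (unsigned)d->num_value_sites[0] + d->num_value_sites[1];
}
```

A coverage-only consumer would read the counters through `counters` and `num_counters` and could skip records' value data entirely, whether or not `total_value_sites` is zero.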

> So can Values and NumValueSites be safely ignored in order to obtain
> the coverage data?

Yes, value profiles are not used for coverage testing; they are for PGO only.

> >
> > > >
> > > > - Since what I'm coding is a decoupled replacement for the profiling
> > > > runtime inside of compiler_rt, is there any way that at compile or
> > > > run time I can fetch the version of the profiling data?
> > > >
> > >
> > > At compile time, it is the macro: INST_PROF_RAW_VERSION. At runtime, it is
> > > the second field of the raw profile header.
> >
> > I'm not able to use INST_PROF_RAW_VERSION at compile time. Are you
> > sure this is exported? If I do:
> >
> > cc -fprofile-instr-generate -fcoverage-mapping -dM -E - < /dev/null
> >
> >
> Ok. The macro is defined for building the compiler itself, but not passed
> down to the user program when compiling it.
>
> If you use IR-PGO (turned on with -fprofile-generate), the information is
> stored in a symbol in rodata: __llvm_profile_raw_version -- the least
> significant 32bits has the value of the raw version. Unfortunately, the
> front-end based instrumentation currently does not emit such a symbol.

> That's right, __llvm_profile_raw_version cannot be used in my case
> because it's exported by the runtime, and here I'm not using the
> compiler_rt runtime at all.

Not really. compiler-rt's version is a weak symbol; the one defined by
the compiler is the strong definition. IR-PGO always defines it, so there is
no dependency on the runtime. However, the current problem is that front-end
instrumentation does not define this symbol.

> Would it be possible to export this as a compile time define when
> -fprofile-instr-generate -fcoverage-mapping is used?

It can, but the use case seems too narrow to be generally useful.

> Does that sound sensible?

> > I don't see INST_PROF_RAW_VERSION neither any similar defines.
> >
> > > I'm mostly worried that at some point llvm will bump the version
> > > > and change the layout, and then I will have to update my runtime
> > > > accordingly, but without llvm reporting the version used by the
> > > > compiler it's going to be very hard to keep backwards
> > > > compatibility or to detect version bumps.
> > > >
> > > >
> > > yes, the raw profile format can change anytime. We only try to keep
> > > backward compatibility for indexed format.
> > >
> > > At runtime with in-process profile merging, if the source raw profile
> > > data's format version is different from the current runtime version, an
> > > error will be emitted.
> >
> > Keep in mind this is a kernel, so the source is compiled with
> > "-fprofile-instr-generate -fcoverage-mapping", but the profiling
> > runtime in compiler_rt is not linked against the kernel.
> >
> > I would like to have a reliable way that I could use to detect version
> > bumps in the internal coverage data, so that I can implement the
> > required changes in my in-kernel coverage code.
> >
> > I have a series ready for Xen in order to implement this, I will send
> > the patch with the in-kernel profiling implementation to this list for
> > review.
> >
>
>
> The right way for this is to define __llvm_profile_raw_version variable
> with FE instrumentation as is done by IR-PGO. I have cc'ed Vedant who may
> help with this.

> Wouldn't it be better to export an internal compiler define rather
> than creating a symbol with the profile version?

It is certainly doable, but I don't see any existing macros that are
similar to this use case. You may want to bring this up to cfe-dev.

thanks,

David