Profiling with LLVM.

Dear Duncan,

Thank you very much for your feedback. I have a problem, though: the branch weight counters overflow in some files, so I get incorrect numbers.
Is there a workaround for that? Is it a known bug, or is it something that needs configuration on my part?

Again, thank you a lot for your reply.

Best Regards,
Georgios Zacharopoulos

It's up to the frontend (or whatever is generating the branch weights metadata) to scale the branch weights down appropriately. You can have a look at how clang does it for an example (IIRC, it's in clang's lib/CodeGen/CodeGenPGO.cpp, but possibly just the caller is there and the scaling logic is somewhere in LLVM).
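As a rough illustration of what "scaling down" could look like, here is a minimal Python sketch. It is not clang's actual code; the function name scale_weights and the sample counts are made up for the example. It assumes raw 64-bit counts that have to fit into unsigned 32-bit weights and divides all of them by one common divisor so that their ratios are roughly preserved.

```python
UINT32_MAX = 2**32 - 1

def scale_weights(counts):
    """Scale raw counts so that each one fits in an unsigned 32-bit weight.

    Hypothetical helper for illustration only, not taken from clang or LLVM.
    """
    max_count = max(counts, default=0)
    # A divisor of 1 leaves small counts untouched; otherwise use the smallest
    # integer divisor that brings the largest count to UINT32_MAX or below.
    scale = 1 if max_count <= UINT32_MAX else max_count // UINT32_MAX + 1
    return [c // scale for c in counts]

print(scale_weights([6_000_000_000, 3_000_000_000]))  # [3000000000, 1500000000]
print(scale_weights([10, 0]))                         # [10, 0]
```

One caveat of this simple scheme: a small non-zero count can be rounded down to zero when the divisor is large, so a real implementation may want to treat that case separately.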

When are your counters overflowing? If they overflow during an optimization pass, that sounds like a bug.

Dear Duncan,

I am generating branch-weights annotated IR files as described in the documentation of LLVM, using profiling with instrumentation.
http://clang.llvm.org/docs/UsersManual.html#profiling-with-instrumentation

e.g.

llvm-profdata merge -output=$(BENCH).profdata default.profraw
clang -S -emit-llvm -O3 -fprofile-instr-use=$(BENCH).profdata -o bench.prof.ll bench.c

The issue is that in some benchmarks I get crazy numbers in the annotated metadata inside the generated *.ll files.

e.g.

!16 = !{!"branch_weights", i32 -2147483648, i32 0}
!155 = !{!"branch_weights", i32 1075807200, i32 -1501637297}
!181 = !{!"branch_weights", i32 -965299388, i32 218980800}

This looks like a counter overflow.

Now the interesting thing is that when I use these annotated files as input for the BasicBlockFrequency analysis pass, the output seems to give correct numbers for the execution frequency of each basic block, even though a few of the counters have overflowed.

This seems like a bug, unless I need to do specific configurations while running the profiling part before the analysis.
From your experience, would you say that the BasicBlockFrequency analysis pass output is to be trusted? Is it known to be stable or do I need to be really cautious and always inspect the output? Are there any common cases of not having accurate profiling?

As I mentioned earlier, the analysis pass seems fine to me, but I have only tested it for a number of benchmarks.

Best Regards.
Georgios Zacharopoulos

Dear Duncan,

I am generating branch-weights annotated IR files as described in the
documentation of LLVM, using profiling with instrumentation.
http://clang.llvm.org/docs/UsersManual.html#profiling-with-instrumentation

e.g.

llvm-profdata merge -output=$(BENCH).profdata default.profraw

clang -S -emit-llvm -O3 -fprofile-instr-use=$(BENCH).profdata -o bench.prof.ll bench.c

The issue is that in some benchmarks I get crazy numbers in the annotated
metadata inside the generated *.ll files.

e.g.

!16 = !{!"branch_weights", i32 -2147483648, i32 0}
!155 = !{!"branch_weights", i32 1075807200, i32 -1501637297}
!181 = !{!"branch_weights", i32 -965299388, i32 218980800}

This looks like a counter overflow.

It is not a counter overflow. Branch weights are not the same as branch
profile counts. Branch weights are intended to represent branch probability,
and the absolute value of a 'weight' does not mean anything. Branch weights
that come from real profile data may look like real profile counts if they
are not scaled. The negative values are a problem in dumping: the weights
should be printed as uint32.
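To make the point about dumping concrete, here is a small Python sketch that reinterprets the signed i32 values quoted above as unsigned 32-bit weights and turns a pair of weights into probabilities. The helper names as_uint32 and branch_probabilities are invented for this example, and the even split for an all-zero pair is an assumption of the sketch, not a statement about what LLVM does.

```python
def as_uint32(value):
    """Reinterpret a signed 32-bit integer as its unsigned 32-bit bit pattern."""
    return value & 0xFFFFFFFF

def branch_probabilities(weights):
    """Turn printed i32 branch weights into probabilities that sum to 1."""
    unsigned = [as_uint32(w) for w in weights]
    total = sum(unsigned)
    if total == 0:
        # Assumption of this sketch: treat all-zero weights as an even split.
        return [1.0 / len(unsigned)] * len(unsigned)
    return [w / total for w in unsigned]

print(as_uint32(-2147483648))                  # 2147483648
print(as_uint32(-1501637297))                  # 2793329999
print(as_uint32(-965299388))                   # 3329667908
print(branch_probabilities([-2147483648, 0]))  # [1.0, 0.0]
```

Read this way, !16 simply says that the first successor is always taken, which is a perfectly sensible weight pair.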

In fact, BPI and MBPI no longer have weight-based interfaces (since the
concept of weight is confusing). However, 'weight' remains in the metadata
representation.

Now the interesting thing is that when I use these annotated files as input
for the BasicBlockFrequency analysis pass, the output seems to give correct
numbers for the execution frequency of each basic block, even though a few
of the counters have overflowed.

Correct frequency information is expected, except in a couple of known
cases where block frequency propagation does not work well, for instance
irreducible loops and infinite loops (in general, branches with zero
weights).

To get the real block and edge/branch profile counts, you should look at the
computed frequency data and combine it with the function's
'function_entry_count' metadata. The latter is the real profile count of
the entry block.
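As a sketch of that combination: the block's frequency relative to the entry block's frequency, multiplied by the entry count, gives an estimated profile count. The function name block_profile_count and the numbers below are hypothetical and only illustrate the arithmetic.

```python
def block_profile_count(block_freq, entry_freq, function_entry_count):
    """Estimate a block's profile count from relative frequencies.

    block_freq and entry_freq are the computed frequencies of the block and
    of the function's entry block; function_entry_count is the value stored
    in the function's 'function_entry_count' metadata.
    """
    return function_entry_count * block_freq / entry_freq

# Hypothetical numbers: the function was entered 100 times, and a loop body
# has 8 times the frequency of the entry block.
print(block_profile_count(8.0, 1.0, 100))  # 800.0
```

If the frequencies are already normalised so that the entry block has frequency 1.0, this reduces to the block frequency times the entry count.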

This seems like a bug, unless I need to do specific configurations while
running the profiling part before the analysis.
From your experience, would you say that the BasicBlockFrequency analysis
pass output is to be trusted? Is it known to be stable or do I need to be
really cautious and always inspect the output? Are there any common cases
of not having accurate profiling?

For common cases, it should be trusted. If you see problems, please file
bugs.

thanks,

David

Dear David,

Thank you very much for clarifying all these points. Yes, I was also thinking that the final block profile count should equal the computed frequency (a floating-point number) of each basic block multiplied by the entry count of the respective function.

In case I come across any unexpected behaviour in the future I will notify the llvm-bugs list.

Best Regards,
Georgios Zacharopoulos