counting branch frequencies

Thanks Alastair.

Is it possible to associate the branch frequency counts with the basic blocks
in the intermediate representation? (e.g. Can I access basic block
frequencies in runOnFunction()?)

Also, I was able to produce a 'llvmprof.out' file. What is the format of this file? How can I parse it?

  Thanks.
-Apala

Hi Apala,

Is it possible to associate the branch frequency counts with the basic blocks
in the intermediate representation? (e.g. Can I access basic block
frequencies in runOnFunction()?)

Profile data really needs to be loaded at a module level, but once this has been done it can be accessed at any level (including function).

In LLVM 3.1, ProfileInfo stores block execution frequencies (load them with -profile-loader).

For LLVM svn you can look at BlockFrequencyInfo, which I believe generates its data from BranchProbabilityInfo, which in turn uses the branch weight metadata (set by -profile-metadata-loader). I haven't actually tried this though, so I'm not sure how accurately the block frequencies are maintained.
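
To make that dependency chain concrete, here is a toy sketch of how relative block frequencies can be derived from branch weights. It is not LLVM's implementation: it assumes an acyclic CFG whose blocks are already numbered in topological order, it ignores loops entirely, and the names WeightedEdge and relativeBlockFrequencies are made up for illustration.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// One outgoing CFG edge together with its branch weight.
// Hypothetical type, used only for this illustration.
struct WeightedEdge {
    std::size_t target;    // index of the successor block
    std::uint32_t weight;  // relative weight of taking this edge
};

// Toy model: successors[b] lists the outgoing edges of block b.
// ASSUMPTION: block 0 is the entry and every edge goes from a lower
// index to a higher one (acyclic CFG in topological order).
// Returns each block's frequency relative to one entry into the function.
std::vector<double> relativeBlockFrequencies(
    const std::vector<std::vector<WeightedEdge>>& successors) {
    std::vector<double> freq(successors.size(), 0.0);
    if (freq.empty()) return freq;
    freq[0] = 1.0;  // the entry block runs once per invocation
    for (std::size_t block = 0; block < successors.size(); ++block) {
        std::uint64_t total = 0;
        for (const WeightedEdge& e : successors[block]) total += e.weight;
        if (total == 0) continue;  // no successors, or no usable weights
        for (const WeightedEdge& e : successors[block]) {
            assert(e.target > block && "edges must go forward in this toy model");
            assert(e.target < freq.size());
            // Split this block's frequency among successors by weight.
            freq[e.target] += freq[block] * static_cast<double>(e.weight) / total;
        }
    }
    return freq;
}
```

For a diamond where the entry branches with weights 3 and 1 and both arms rejoin, the two arms get 0.75 and 0.25 and the join block gets 1.0 again.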

Also, I was able to produce a 'llvmprof.out' file. What is the format of
this file? How can I parse it?

Very roughly, the format of the file is lots of unsigned integers. -profile-loader or -profile-metadata-loader will parse it for you. Parsing outside of LLVM is tricky as it relies on the exact ordering of basic blocks.
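
If you only want to eyeball the raw contents, a minimal sketch along these lines dumps the file as a flat sequence of unsigned integers. The 32-bit width and host byte order are assumptions, readUnsignedWords is a made-up helper name, and the sketch makes no attempt to interpret which integer belongs to which record, function, or edge.

```cpp
#include <cstdint>
#include <fstream>
#include <stdexcept>
#include <string>
#include <vector>

// Reads a file as a flat sequence of unsigned integers.
// ASSUMPTION: each integer is 32 bits wide and stored in the host's byte
// order. The meaning of each word (record headers, counters, ...) is NOT
// decoded here.
std::vector<std::uint32_t> readUnsignedWords(const std::string& path) {
    std::ifstream in(path, std::ios::binary);
    if (!in) throw std::runtime_error("cannot open " + path);
    std::vector<std::uint32_t> words;
    std::uint32_t word = 0;
    while (in.read(reinterpret_cast<char*>(&word), sizeof word)) {
        words.push_back(word);
    }
    return words;  // trailing bytes that do not fill a whole word are ignored
}
```

Calling readUnsignedWords("llvmprof.out") and printing the result gives the raw numbers, but attaching them to edges or blocks still needs the exact block ordering that LLVM used when instrumenting.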

Regards,
Alastair.

I tried getting profile data from LLVM 3.1, using the method mentioned below. I tried it out on a simple matrix multiplication program. However, I noticed the following problems:

1. There is a warning message: "WARNING: profile information is inconsistent with the current program!"
2. The basic block counts (obtained from ProfileInfo::getExecutionCount(const BasicBlock*)) are correct only if I have compiled with "-disable-opt" or "-O0". When compiled with "-O3", the basic block counts are bogus values.
3. Some of the function counts (obtained from ProfileInfo::getExecutionCount(const Function*)) are incorrect, i.e. they do not equal the number of times the function was invoked.

Can someone please explain why I am experiencing the above problems? Thanks in advance!

-Apala

Apala-

At which stage are you doing -insert-edge-profiling or similar? My guess is that if you do that early, the opts/inlining that come later at -O3 will mutate the CFG(s), causing errors. You should use -insert-edge-profiling without opts and call opt -O3 during the -profile-metadata-loader step.

-dibyendu

Hi Apala,

Dibyendu is correct that this is likely due to pass order, but things get a bit complicated with -O[1-9] or -std-compile-opts as they insert early passes *before* the profiling code.

I recommend that you use identical optimizations to insert instrumentation and to load the profiling data.

E.g.:
opt -insert-edge-profiling -O3 foo.bc -o foo.2.bc
opt -profile-loader -O3 foo.bc -o foo.opt.bc

(The same applies to -profile-metadata-loader.) The -O3 on the first line seems pointless, but without it the CFG will be different (due to the early passes it inserts, which run before any user-specified passes).

I've been thinking about submitting a patch to also include profile passes in the early passes (if they are turned on), but that seems quite limiting (a user may want to run them after other passes).

This should fix all your problems (I hope!). Problem 1 means the program CFG does not match and thus the profiling data is basically gibberish (hence problems 2 and 3). Perhaps that is another patch: fail on this condition rather than just warn.

Hope this helps,
Alastair.

Can we not run the -insert-edge-profiling and -profile-loader passes at the beginning of the opt? Orthogonal point is, is it worth doing any optimizations when -insert-edge-profiling is specified on command line?

-Prashantha

Thanks everyone for the replies. After some experimentation, I found that the order in which the passes are specified matters:

opt -O3 -profile-loader matmult.bc -o matmult.opt.bc (works)
opt -profile-loader -O3 matmult.bc -o matmult.opt.bc (does not work)

Also, I am able to avoid the inconsistency warning only for optimization levels -O3 and -O2. I get that warning when using -O1 and -disable-opt.

Anyone else have this experience? Or, any ideas why the above might happen?

Thanks.
-Apala

Another issue is with ProfileInfo::getExecutionCount(Function* F). Looking at the source code and results, I am seeing that it always returns the execution count of the entry basic block of the function. If the entry basic block is part of a loop, its execution count does not match the function invocation count.

Is my assumption wrong, that ProfileInfo::getExecutionCount(Function* F) is supposed to return the function invocation count?
If my assumption is not wrong, is there any way to fix the results?

Thanks.
-Apala

Hi Apala,

Another issue is with ProfileInfo::getExecutionCount(Function* F).
Looking at the source code and results, I am seeing that it always
returns the execution count of the entry basic block of the function. If
the entry basic block is part of a loop, its execution count does not
match the function invocation count.

Is my assumption wrong, that ProfileInfo::getExecutionCount(Function* F)
is supposed to return the function invocation count?
If my assumption is not wrong, is there any way to fix the results?

Your assumption is correct. It should be returning the execution count of the 0->entry edge (it is edges that are being profiled, but obviously the counters have to exist in basic blocks).
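
To illustrate the difference, here is a toy model (not the ProfileInfo code) in which a block's count is the sum of all edges entering it, while the invocation count is only the edge from a virtual root into the entry block. The names kRoot, EdgeCounts, blockExecutionCount and functionInvocationCount are hypothetical.

```cpp
#include <cassert>
#include <cstdint>
#include <map>
#include <utility>

// Blocks are plain integers in this toy model. kRoot is a hypothetical
// stand-in for the virtual "0" node in front of the entry block.
const int kRoot = -1;

// Maps an edge (source block, destination block) to its execution count.
using EdgeCounts = std::map<std::pair<int, int>, std::uint64_t>;

// Execution count of a block: the sum over all edges entering it,
// including loop back-edges.
std::uint64_t blockExecutionCount(const EdgeCounts& edges, int block) {
    std::uint64_t total = 0;
    for (const auto& entry : edges) {
        if (entry.first.second == block) total += entry.second;
    }
    return total;
}

// Invocation count of a function: only the root->entry edge.
// Returns 0 if that edge was never recorded.
std::uint64_t functionInvocationCount(const EdgeCounts& edges, int entryBlock) {
    auto it = edges.find(std::make_pair(kRoot, entryBlock));
    return it == edges.end() ? 0 : it->second;
}
```

With a root->entry count of 3 and a back-edge into entry taken 27 times, the entry block executes 30 times while the function was invoked only 3 times, which is the kind of discrepancy described above.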

If entry is also part of a loop then that edge should be getting split. It is possible that you have found a bug. I will look into it, but not until the weekend at the earliest. (Note: I'm not especially interested in fixing -profile-loader, but -profile-metadata-loader works in a similar manner, so it could also be affected.)

It is also possible this is still just a pass order issue. I.e. the CFG has the same number of basic blocks so no issue is warned about, but the block layout is different so the counters do not match. The profiling code is extremely fragile with respect to the CFG shape.
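
A toy sketch of why this goes unnoticed, assuming counters are matched to edges purely by position: the i-th counter in the profile is attached to the i-th edge as enumerated at load time. This is a simplified model, not the loader's code, and attachCounters and the edge names are made up for illustration.

```cpp
#include <cstddef>
#include <cstdint>
#include <map>
#include <string>
#include <vector>

// Attaches counters[i] to edgesAtLoadTime[i].
// Returns false when the sizes differ, which is the only mismatch this
// model can detect. A reordered CFG of the same size is accepted.
// ASSUMPTION: edge names are unique.
bool attachCounters(const std::vector<std::string>& edgesAtLoadTime,
                    const std::vector<std::uint64_t>& counters,
                    std::map<std::string, std::uint64_t>& out) {
    if (edgesAtLoadTime.size() != counters.size()) return false;
    out.clear();
    for (std::size_t i = 0; i < counters.size(); ++i) {
        out[edgesAtLoadTime[i]] = counters[i];
    }
    return true;
}
```

If the counters {1, 99, 1} were recorded for the order entry->loop, loop->loop, loop->exit, but the loader enumerates entry->loop, loop->exit, loop->loop, the call succeeds and the exit edge silently receives the count 99.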

Regards,
Alastair.

Hi Prashantha,

Can we not run the -insert-edge-profiling and -profile-loader passes
at the beginning of the opt? Orthogonal point is, is it worth doing
any optimizations when -insert-edge-profiling is specified on
command line?

Yes, we could, and if they were tied into the early pass code it would prevent a lot of confusion (though the early pass stuff only happens for -O[1-9] or -std-compile-opts, but I'm sure something could be figured out).

Though for Apala's case (using LLVM 3.1) it would be disastrous as nothing preserves ProfileInfo. Branch weight metadata (LLVM 3.2 -profile-metadata-loader) would do much better as a lot of effort (not me, others!) has gone into making sure that the metadata is preserved.

Regards,
Alastair.