Count how many basic blocks are executed

Linhai · January 26, 2018, 6:04am

Hello everyone,

I am writing a pass that instruments a program to count how many basic blocks are executed. So far, I add a local counter to each function, increment it by 1 in each basic block, and save the local counter's value to a global counter. The current runtime overhead is around 25%. Is there any way to lower the overhead, such as keeping the local counter in a register or applying the path profiling algorithm?

Thanks a lot!

Best,

Linhai
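For readers following along, the scheme described in the post above can be pictured at the source level. This is only a hand-written sketch, not the poster's pass: the names `g_block_count`, `local_count`, and `abs_diff` are hypothetical, and a real pass would insert the equivalent instructions into LLVM IR rather than into C++ source.

```cpp
#include <cstdint>

// Hypothetical global counter that the instrumentation would emit.
static std::uint64_t g_block_count = 0;

// A function with four basic blocks: entry, "then", "else", and exit.
// The increments are written by hand to mirror what the pass would insert.
int abs_diff(int a, int b) {
    std::uint64_t local_count = 0;  // per-function local counter
    ++local_count;                  // entry block
    int result;
    if (a > b) {
        ++local_count;              // "then" block
        result = a - b;
    } else {
        ++local_count;              // "else" block
        result = b - a;
    }
    ++local_count;                  // exit block
    g_block_count += local_count;   // flush to the global counter once
    return result;
}
```

Each call runs three of the four blocks, so calling `abs_diff(7, 3)` and then `abs_diff(2, 9)` adds 6 to `g_block_count`.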

John_Criswell4 · January 27, 2018, 9:11pm

By “local counter,” I assume you mean that you created an alloca instruction that allocates memory and that you increment the value in this alloca’ed memory using a load, add, and store instruction. Is that correct?

If so, have you tried using the mem2reg pass to convert the local counter into an SSA virtual register? That may speed it up a bit. After that, other LLVM optimizations may be able to remove redundant instructions or combine additions.

If that isn’t enough, then you’ll probably need to make your instrumentation smarter. LLVM has passes that you can use to locate loops; if the loop has the right structure, you can increment the count at the end of the loop. Likewise, if you can find control-equivalent basic blocks, you only need to increment the counter in one of them.

Regards,

John Criswell
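The last two suggestions in the reply above (counting once at the end of a loop, and counting once for control-equivalent blocks) can be illustrated at the source level. This is a sketch under stated assumptions, not LLVM pass code: the function names are hypothetical, the loop is modeled as a header block plus a body block, and the trip count is assumed to be known after the loop. In a real pass, LLVM analyses such as LoopInfo and ScalarEvolution would be the usual places to look for that information, and the transformation only applies when the loop has a suitable structure.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Naive instrumentation: one increment per executed basic block.
long long sum_naive(const std::vector<int>& v, std::uint64_t& blocks) {
    ++blocks;                        // entry block
    long long sum = 0;
    std::size_t i = 0;
    while (true) {
        ++blocks;                    // loop header (condition check)
        if (i >= v.size()) break;
        ++blocks;                    // loop body
        sum += v[i];
        ++i;
    }
    ++blocks;                        // exit block
    return sum;
}

// Smarter placement: a single update after the loop.
long long sum_hoisted(const std::vector<int>& v, std::uint64_t& blocks) {
    long long sum = 0;
    for (std::size_t i = 0; i < v.size(); ++i) sum += v[i];
    const std::uint64_t n = v.size();  // trip count of the loop
    // Entry and exit are control equivalent (each runs exactly once),
    // so they contribute 2 together. The header runs n + 1 times and
    // the body n times.
    blocks += 2 + (n + 1) + n;
    return sum;
}
```

Both versions report the same block count (11 for a four-element vector, 3 for an empty one), but the second performs one addition per call instead of one per block executed.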

Linhai · January 28, 2018, 5:06am

Hi John,

Thanks a lot for the reply! I tried the mem2reg pass and also implemented the algorithm proposed in “Efficiently Counting Program Events with Support for On-line Queries” to place the local counters more intelligently. If I build the executable with -O0, the overhead is 20%-30%. But if I build the executable with -O2, the overhead is more than 3X. I suspect that the instrumented counters disable some optimizations. Are there any other suggestions I could try?

Thanks a lot!

Best,

Linhai

John_Criswell4 · January 29, 2018, 3:37pm

The overhead is probably getting worse because the baseline is getting better (the program before instrumentation runs faster at -O2 than -O0). I assume you’re running the -O2 optimizations, then instrumenting the code, and then running -O2 again to optimize your instrumentation. If you’re not doing that, try it first. Otherwise, you’ll need to take a look at the bitcode that you’re generating after instrumentation and optimization to see what is not getting optimized and develop some ideas as to why the code is not being optimized.

Regards,

John Criswell
