Static Profiling Algorithms in LLVM

Hi all,

Does LLVM infrastructure contain implementation of any static profiling algorithm apart from “Spill-Weight” calculation present in Live Intervals class? The future work page does suggest implementation of some “static profiling” algorithms to make an educated guesses about the relative execution frequencies of various parts of the code.

Thanks

–Kapil

Hi all,

Does LLVM infrastructure contain implementation of any static profiling
algorithm apart from "Spill-Weight" calculation present in Live Intervals
class? The future work page does suggest implementation of some "static
profiling" algorithms to make an educated guesses about the relative
execution frequencies of various parts of the code.

If you look at old releases, there was a profiling library and
transforms, including infrastructure to do any of the profiling with
random sampling. Those were unmaintained and removed not that long
ago. What you could profile was somewhat limited.

Andrew

Hello Kapil,

  I have implemented a static profiler for LLVM as a google summer of
code project in 2009. I wrote it for the 2.4 branch, but the
implementation never made into the tree. I have recently ported it to
LLVM 2.8, but I haven't tested it. You can take a look at the code

  The implementation is based on Wu's [1994] paper and provides a
branch predictor that calculates probabilities. The implementation
also covers an intraprocedural and interprocedural frequency
calculator for edges and functions.

  Reference:
Youfeng Wu and James R. Larus. Static branch frequency and program
profile analysis. In MICRO 27: Proceedings of the 27th annual
international symposium on Microarchitecture. IEEE, 1994.

  Regards,
    Andrei

Thanks Andrei!

I haven’t read the paper. I would see whether this fulfills my requirements or whether I need to make any changes.

–Kapil

My god! I would love a branch predictor! It would simplify many aspects of my register allocator.

Second, I am surprised it did not make it into the tree. Since more is being done with register allocation as a while “RegAllocBasic” was just put in, I hope this is looked at again.

Do you have a working svn copy?

Also, could you send me a copy/link to that '94 paper off the list please?

-Thanks
-Jeff Kunkel

Hello Jeff,

My god! I would love a branch predictor! It would simplify many aspects of
my register allocator.

The branch predictor of the implementation is not as accurate as the
one from the paper, but it is close enough. Unfortunately, the branch
predictor is a very expensive pass, because it relies on post
domination information for some heuristics which is costly to
calculate.

Second, I am surprised it did not make it into the tree. Since more is being
done with register allocation as a while "RegAllocBasic" was just put in, I
hope this is looked at again.
Do you have a working svn copy?

I created the patch based on the official 2.8 branch release, but as I
said before I didn't get the time to fully tested it on this release.

Also, could you send me a copy/link to that '94 paper off the list please?

You can get it from the following link:
  http://ftp.cs.wisc.edu/pub/techreports/1994/TR1248.pdf

Regards,
  Andrei

You said it was expensive, but if you had to put a big-o estimate on it, what would it be?

-Thanks
Jeff Kunkel

Hi Jeff,

  There is an algorithm to build the dominator tree that is O(n2),
where n is the number of nodes on the control flow graph. I believe
exists another that is linear, but I don't which one of them is
implemented in LLVM.

  The problem is that the branch predictor requires post dominance
information. None of the LLVM basic passes require post dominance
information (AFAIK), hence it is not available for us to use for free.
That's why there is a hidden cost involved with our pass.

  z, Andrei

The MachineLoopInfo pass uses machine dominator tree information. So another pass using it should not cost much. I am going to patch it in and try it out.

My allocator is running about O(2km + coalescing/phi elimination) speed and O(m+k) space where m is the number of instructions and k is the number of registers.

  • Thanks
  • Jeff Kunkel