GSoC pre-proposal (instrumentation pass)

Hi,

I’d like to propose a GSoC project with the goal of implementing a library for profiling instrumentation of LLVM IR.

Currently, my idea is to make the library general enough to insert arbitrary code, or a call to a void(*)(void) function, before or after reads/writes of a specified variable, or in the prologue/epilogue of a specified function.
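To make the idea concrete, here is a rough source-level sketch of the effect that prologue/epilogue instrumentation would have on a single function. It is plain C++ rather than LLVM IR or pass code, and every name in it (`Hook`, `on_entry`, `on_exit`, `square_instrumented`, `g_events`) is hypothetical and made up for illustration; a real pass would insert the equivalent calls into the IR instead of wrapping the function by hand.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical hook type, matching the void(*)(void) mentioned above.
using Hook = void (*)();

// Event log so the effect of the hooks can be observed (illustration only).
static std::vector<std::string> g_events;

static void on_entry() { g_events.push_back("enter"); }
static void on_exit() { g_events.push_back("exit"); }

// The original, uninstrumented function.
static int square(int x) { return x * x; }

// Hand-written equivalent of the instrumented function: one hook call in
// the prologue, one in the epilogue, and the result is left unchanged.
static int square_instrumented(int x, Hook prologue, Hook epilogue) {
    assert(prologue != nullptr && epilogue != nullptr);
    prologue();              // would be inserted at function entry
    int result = square(x);  // original body
    epilogue();              // would be inserted before each return
    return result;
}
```

Calling `square_instrumented(3, on_entry, on_exit)` returns the same value as `square(3)` and leaves one "enter" and one "exit" entry in `g_events`.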

I would like to build more than this on top of the general interface, but a lot of profiling-specific code sits above LLVM’s level, such as getting timestamps to measure time spent in a function or writing results to a file (like gcc -pg).

As a compromise, I could implement the higher-level profiling instrumentation in clang. The goal of this part of the project would be to build on top of the general library to the point that clang could support gprof profiling.

The instrumentation would be inserted by an LLVM pass.

I’m posting here to get a feel for what you all think before I go further in the process, and I’m open to any and all suggestions. I’d also like to see if there are any mentors willing to guide me in this.

Thanks for reading,
Tyler Hardin

Hi,

Note: I’m not in any way associated with the GSoC infrastructure in the LLVM community; these are purely personal thoughts.

I’d like to propose a GSoC project with the goal of implementing a library for profiling instrumentation of LLVM IR.

Currently, my idea is to make the library general enough to insert arbitrary code, or a call to a void(*)(void) function, before or after reads/writes of a specified variable, or in the prologue/epilogue of a specified function.

I would like to build more than this on top of the general interface, but a lot of profiling-specific code sits above LLVM’s level, such as getting timestamps to measure time spent in a function or writing results to a file (like gcc -pg).

One thing that is obviously present when putting in many timing points is a “Heisenberg-type effect”: if you check too frequently you perturb the program, by inhibiting optimizations, by changing run-time behaviour, or both. For example, when trying to determine which formulation of some code has the best cache behaviour, adding lots of timing points may provide enough extra instructions that a smart out-of-order processor can hide the issues that would show up in the non-instrumented code.

So one thing that I think would be very interesting would be a way to specify useful positions at which to put timing points (not everywhere, but not just at function-level granularity). For example, it might be interesting to put timing points before and after loops (or perhaps just outermost loops). Of course this is a very vague notion which would need firming up significantly before beginning. This sort of stuff would, for me, be most interesting if it were done at the LLVM rather than the clang level, since not every interesting program is written in a dialect of C.
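As a rough illustration of the granularity trade-off, the sketch below times the same nested loop in two ways: with a timing point around every inner loop, and with timing points only before and after the outermost loop. It is ordinary C++ using `std::chrono::steady_clock`, not LLVM code, and all the names (`TimingLog`, `read_clock`, `sum_fine`, `sum_coarse`, `clock_read_ratio`) are hypothetical. Counting clock reads is used here only as a crude stand-in for instrumentation overhead; it says nothing about the optimization or cache effects mentioned above.

```cpp
#include <cassert>
#include <chrono>
#include <cstddef>
#include <vector>

using Clock = std::chrono::steady_clock;

// Collected measurements plus a count of how often the clock was read.
struct TimingLog {
    std::vector<Clock::duration> samples;
    std::size_t clock_reads = 0;
};

// A single "timing point": read the clock and record that we did so.
static Clock::time_point read_clock(TimingLog &log) {
    ++log.clock_reads;
    return Clock::now();
}

// Fine-grained: timing points before and after every inner loop.
static long sum_fine(const std::vector<std::vector<int>> &rows, TimingLog &log) {
    long total = 0;
    for (const auto &row : rows) {
        auto start = read_clock(log);
        for (int v : row) total += v;
        log.samples.push_back(read_clock(log) - start);
    }
    return total;
}

// Coarse: timing points only before and after the outermost loop.
static long sum_coarse(const std::vector<std::vector<int>> &rows, TimingLog &log) {
    auto start = read_clock(log);
    long total = 0;
    for (const auto &row : rows)
        for (int v : row) total += v;
    log.samples.push_back(read_clock(log) - start);
    return total;
}

// How many more clock reads the fine-grained variant needed.
static double clock_read_ratio(const TimingLog &fine, const TimingLog &coarse) {
    assert(coarse.clock_reads > 0 && "coarse log must contain a reading");
    return static_cast<double>(fine.clock_reads) /
           static_cast<double>(coarse.clock_reads);
}
```

For an input with N rows, the fine-grained variant reads the clock 2N times while the coarse variant reads it twice, and both return the same sum.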

Now, I don’t know whether this is of interest to other members of the LLVM community, or whether this sort of exploratory work, with its correspondingly greater uncertainty, is suitable for a GSoC project; these are more my thoughts about profiling instrumentation in general.

Cheers,

Dave