valgrind for BitCode

Hi All,

I’m currently learning LLVM, for later use in a research project. I thought a good way to learn would be to use it in a small to medium-sized project. A valgrind-like tool for BitCode would work quite nicely.

Working on IR will make some things easier and some things harder.

Good things:

  • valgrind is very tightly tied to the underlying architecture; bcgrind can be totally platform independent.
  • Memory allocation is done with intrinsics and will be very easy to keep track of, which suits a memory use analyser (memcheck).

Bad things:

  • A cache profiler will be tricky, because we are quite abstracted from the hardware. If a cache emulator were written, it could only give rough estimates of the cache miss rate.
  • When the bc code calls into native libraries, it will be a black box. There are workarounds we can use to catch things like memory allocation, but they become icky and platform dependent.

The basic idea will be to follow the LLVM style and create a new interpreter, based on the old one, that has a plug-in architecture and allows analysis tools to be plugged in.

The question is, will this tool be useful to anyone? Does anyone have insights into a good plug-in architecture? (I was thinking call-backs could be registered for each IR operation, along with some state information.) And does anyone want to have a hand in cutting the code :0) ?
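
To make the call-back idea concrete, here is a minimal sketch of a registry that maps IR operations to call-backs. Everything in it is an assumption made for illustration: the `Op` enum, the `OpEvent` state record and the `CallbackRegistry` class are hypothetical names, not part of LLVM or its interpreter.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <functional>
#include <map>
#include <utility>
#include <vector>

// Hypothetical opcode set; a real tool would use the interpreter's own
// instruction kinds instead.
enum class Op { Load, Store, Malloc, Free, Call };

// State handed to each call-back (an assumption about what tools need).
struct OpEvent {
    Op op;
    std::uint64_t address;  // memory address involved, if any
    std::uint64_t size;     // number of bytes touched or allocated
};

class CallbackRegistry {
public:
    using Callback = std::function<void(const OpEvent&)>;

    // Register a call-back to run whenever `op` is interpreted.
    void on(Op op, Callback cb) {
        assert(cb && "call-back must be callable");
        callbacks_[op].push_back(std::move(cb));
    }

    // Called from the interpreter loop; returns how many call-backs ran.
    std::size_t dispatch(const OpEvent& ev) const {
        auto it = callbacks_.find(ev.op);
        if (it == callbacks_.end()) return 0;
        for (const auto& cb : it->second) cb(ev);
        return it->second.size();
    }

private:
    std::map<Op, std::vector<Callback>> callbacks_;
};
```

An analysis tool would call `on(Op::Malloc, ...)` once at start-up, and the interpreter would call `dispatch` for each instruction it executes.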

PS.
Also, I know nothing about OpenMP, but perhaps we could do multithreaded memory access analysis, which is something that hasn’t really been done much, or well, before. The great thing about valgrind is that it can tell you, per bit, whether you have allocated the memory and whether it is initialised. This could be done for threads as well (i.e. how many different threads access this memory, where was this memory allocated, is there an associated lock?). If people can associate memory regions with locks, we can make sure that no thread ever accesses synchronised memory without holding the lock. Although now that I think of it, you could do this with valgrind too.

Hi Andy,

Hi All,

I'm currently learning LLVM, for later use in a research project. I
thought a good way to learn would be to use it in a small to medium-sized
project. A valgrind-like tool for BitCode would work quite
nicely.

Interesting idea :)

Working on IR will make some things easier and some things harder.

Good things:
   * valgrind is very tightly tied to the underlying architecture,
bcgrind can be totally platform independent.

That might be overstating the case a bit. Bitcode *can* be platform
independent, if the front end generating it desires. However, all the
functional front ends we have today are generating bitcode that is very
platform dependent. This is of necessity because the source language is
also platform dependent. However, I don't think this affects bcgrind too
much. You just have to emulate the details correctly.

   * Memory allocation is done with intrinsics and will be very easy
to keep track of, which suits a memory use analyser (memcheck).

Yes, although malloc and alloca are LLVM instructions, not intrinsics :)

Bad things:
   * A cache profiler will be tricky, because we are quite abstracted
from the hardware. If a cache emulator were written, it could only give
rough estimates of the cache miss rate.

Yup. Maybe do this for version two.

   * When the bc code calls into native libraries it will be a black
box. There are work arounds we can use, to catch things like memory
allocation, but they become icky and platform dependent.

Yup.

The basic idea, will be to follow the llvm style, and create a new
interpreter, based on the old one that will have a plug-in
architecture, and allow analysis tools to be plugged in.

That also is a good idea.

The question is, will this tool be useful to anyone?

There are lots of researchers that would find this handy. Susan at NASA
does similar analysis things with the interpreter and separating her
analysis code from the interpreter via plugins would probably be quite
welcome.

does anyone have insights into a good
plug-in architecture? (I was thinking call-backs can be registered
with each IR operation and some state information)

I would like to see a class that derives from some abstract base class.
A plugin consists of an implementation of that class which can be
runtime loaded. The abstract base class contains virtual methods for the
various things of interest. The plugin class overrides only those things
it is interested in.
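
A minimal sketch of that design, under stated assumptions: `AnalysisPlugin`, `AllocCounter` and `PluginHost` are hypothetical names, the set of hooks is only a guess at "the various things of interest", and the runtime-loading step (e.g. via a shared library) is left out so the example stays self-contained.

```cpp
#include <cstdint>
#include <map>
#include <vector>

// Abstract base class: every hook has an empty default, so a plugin
// overrides only the events it cares about.
class AnalysisPlugin {
public:
    virtual ~AnalysisPlugin() = default;
    virtual void onAlloc(std::uint64_t /*addr*/, std::uint64_t /*size*/) {}
    virtual void onFree(std::uint64_t /*addr*/) {}
    virtual void onLoad(std::uint64_t /*addr*/, std::uint64_t /*size*/) {}
    virtual void onStore(std::uint64_t /*addr*/, std::uint64_t /*size*/) {}
};

// Example plugin: tracks live heap bytes and ignores loads and stores.
class AllocCounter : public AnalysisPlugin {
public:
    void onAlloc(std::uint64_t addr, std::uint64_t size) override {
        live_[addr] = size;
        ++totalAllocs_;
    }
    void onFree(std::uint64_t addr) override { live_.erase(addr); }

    std::uint64_t liveBytes() const {
        std::uint64_t sum = 0;
        for (const auto& entry : live_) sum += entry.second;
        return sum;
    }
    std::uint64_t totalAllocs() const { return totalAllocs_; }

private:
    std::map<std::uint64_t, std::uint64_t> live_;  // address -> block size
    std::uint64_t totalAllocs_ = 0;
};

// Stand-in for the interpreter side: forwards events to every plugin.
// It does not own the plugins; the caller keeps them alive.
class PluginHost {
public:
    void add(AnalysisPlugin* p) { plugins_.push_back(p); }
    void notifyAlloc(std::uint64_t a, std::uint64_t s) { for (auto* p : plugins_) p->onAlloc(a, s); }
    void notifyFree(std::uint64_t a) { for (auto* p : plugins_) p->onFree(a); }
    void notifyLoad(std::uint64_t a, std::uint64_t s) { for (auto* p : plugins_) p->onLoad(a, s); }
    void notifyStore(std::uint64_t a, std::uint64_t s) { for (auto* p : plugins_) p->onStore(a, s); }

private:
    std::vector<AnalysisPlugin*> plugins_;
};
```

Compared with per-operation call-backs, this keeps each tool's state in one object and lets the compiler check the hook signatures.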

and, does anyone want to have a hand in cutting the code :0) ?

Got enough on my plate, sorry.

PS.
Also, I know nothing about OpenMP, but perhaps we could do
multithreaded memory access analysis which is something that hasn't
really been done much/well before.

Chandler Carruth is currently working on synchronization primitives for
LLVM. They could be used to support things like OpenMP. Have a chat with
Chandler about it sometime. Should be done by the end of August.

The great thing about valgrind is that it can tell you, per bit, whether
you have allocated the memory and whether it is initialised. This could be
done for threads as well (i.e. how many different threads access this
memory, where was this memory allocated, is there an associated
lock?).

Yes, that would be extremely handy for debugging synchronization issues
(e.g. deadlock).
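
As a rough illustration of the bookkeeping involved, here is a sketch of a shadow memory that records, for each allocated byte, which bits are initialised and which threads have touched it. The names (`ShadowByte`, `ShadowMemory`) and the integer thread ids are assumptions for the example; this is not how valgrind itself stores its shadow state.

```cpp
#include <cstddef>
#include <cstdint>
#include <set>
#include <unordered_map>

// Shadow state for one byte of program memory.
struct ShadowByte {
    std::uint8_t initMask = 0;  // bit i set => bit i of the byte is initialised
    std::set<int> threads;      // ids of threads that read or wrote the byte
};

class ShadowMemory {
public:
    // A byte is "allocated" exactly when it has a shadow entry.
    void markAllocated(std::uint64_t addr, std::uint64_t size) {
        for (std::uint64_t i = 0; i < size; ++i) shadow_[addr + i] = ShadowByte{};
    }
    void markFreed(std::uint64_t addr, std::uint64_t size) {
        for (std::uint64_t i = 0; i < size; ++i) shadow_.erase(addr + i);
    }

    // Whole-byte write: returns false if any byte is unallocated.
    bool recordWrite(int thread, std::uint64_t addr, std::uint64_t size) {
        bool ok = true;
        for (std::uint64_t i = 0; i < size; ++i) {
            auto it = shadow_.find(addr + i);
            if (it == shadow_.end()) { ok = false; continue; }
            it->second.initMask = 0xFF;  // all eight bits now defined
            it->second.threads.insert(thread);
        }
        return ok;
    }

    // Returns true only if every byte is allocated and fully initialised.
    bool recordRead(int thread, std::uint64_t addr, std::uint64_t size) {
        bool ok = true;
        for (std::uint64_t i = 0; i < size; ++i) {
            auto it = shadow_.find(addr + i);
            if (it == shadow_.end()) { ok = false; continue; }
            it->second.threads.insert(thread);
            if (it->second.initMask != 0xFF) ok = false;
        }
        return ok;
    }

    // Number of distinct threads that have accessed the byte at addr.
    std::size_t threadCount(std::uint64_t addr) const {
        auto it = shadow_.find(addr);
        return it == shadow_.end() ? 0 : it->second.threads.size();
    }

private:
    std::unordered_map<std::uint64_t, ShadowByte> shadow_;
};
```

The per-bit mask is only ever set to all ones here; a fuller version would update individual bits for bitfield stores and also record the allocation site.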

If people can associate memory regions with locks, we can make sure
that no thread _ever_ accesses synchronised memory without a lock.
Although now that I think of it, you could do this with valgrind too.

Okay.
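
The region-to-lock check discussed above could look something like the following sketch. `LockChecker`, the integer thread and lock ids, and the `protect`/`acquire`/`release` calls are all hypothetical; in a real tool the interpreter would have to recognise the program's locking operations and feed them in.

```cpp
#include <cstdint>
#include <map>
#include <set>
#include <vector>

class LockChecker {
public:
    // Declare that [addr, addr + size) may only be touched while holding lockId.
    void protect(std::uint64_t addr, std::uint64_t size, int lockId) {
        regions_.push_back(Region{addr, size, lockId});
    }

    void acquire(int thread, int lockId) { held_[thread].insert(lockId); }
    void release(int thread, int lockId) { held_[thread].erase(lockId); }

    // Returns true if the access is allowed: either the address is not in
    // any protected region, or the thread holds every lock guarding it.
    bool checkAccess(int thread, std::uint64_t addr) const {
        for (const Region& r : regions_) {
            bool inside = addr >= r.start && addr - r.start < r.size;
            if (!inside) continue;
            auto it = held_.find(thread);
            if (it == held_.end() || it->second.count(r.lockId) == 0) return false;
        }
        return true;
    }

private:
    struct Region {
        std::uint64_t start;
        std::uint64_t size;
        int lockId;
    };
    std::vector<Region> regions_;
    std::map<int, std::set<int>> held_;  // thread id -> locks currently held
};
```

The interpreter would call `checkAccess` on every load and store and report a violation whenever it returns false.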

Reid.

Andy,

Oops. I meant Sarah at NASA, not Susan. That is, Sarah Thompson. Sorry,
answered pre-coffee this morning.

Reid.