GSoC'09 question - previous mail w/o html

Hello llvm,

I am a first year PhD student at "Politehnica" University of Timisoara, Romania. I am a part of the LOOSE Research group (http://www.loose.upt.ro) and until now I have been involved in several research projects in reverse-engineering object oriented software.

Currently I am using llvm to statically check C/C++ programs for memory leaks. What I have until now is a llvm FunctionPass that tries to find a CFG path from each malloc or new to a corresponding free or delete. This is achieved by first doing a quick intra-procedural search for leaking paths and then inter-procedurally double checking the results. Further improvements would include a path viability check with a solver and some shape analysis.

This summer however, I plan to create an "optimization" that automatically fixes memory leaks in programs - obviously only those that can be fixed with the available information, for example:

int addAndFree(int *x, int y) {
  y += *x; free(x);
  return y;
}

int functionCalled10MTimesInSomeLoop(int index, int data) {
  int *x = (int*)malloc(sizeof(int));
  if(index < 3) { // do stuff with x, index and data
    *x = data[index];
    return addAndFree(x,index);
  } else
    return -1;
}

the function would not leak if that free instruction was copied in bb2: before ret (see bellow: -O4 version of above)

define i32 @_Z32functionCalled10MTimesInSomeLoopiPi(i32 %index, i32* nocapture %data) nounwind {
entry:
  %0 = malloc i32 ; <i32*> [#uses=1]
  %1 = icmp sgt i32 %index, 2 ; <i1> [#uses=1]
  br i1 %1, label %bb2, label %bb

bb: ; preds = %entry
  %2 = getelementptr i32* %data, i32 %index ; <i32*> [#uses=1]
  %3 = load i32* %2, align 4 ; <i32> [#uses=1]
  %4 = add i32 %3, %index ; <i32> [#uses=1]
  free i32* %0
  ret i32 %4

bb2: ; preds = %entry
  ; would be nice to insert: free i32* %0 ; here
  ret i32 -1
}

So seeing as this is not on the open projects list, is it even worth applying for SummerOfCode?

best wishes,

Mihai.

PS: sorry for including html in the previous mail

Hello, Mihai

So seeing as this is not on the open projects list, is it even worth
applying for SummerOfCode?

You're free to suggest your own idea for the project (if you feel that
LLVM would benefit from it, surely). However, for such sort of thing
I'd suggest you to prepare something like project proposal - to allow
others understand what are your goals, etc.

Hello,

This doesn't sound advisable. A memory leak in a normal C/C++ program
is a bug, so this wouldn't be an optimization -- it'd be a transformation that
unreliably hides bugs.

Dan

It doesn't sound correct either. I, for one, have a multithreaded code where all "created" threads allocate "request" objects, put them in a list and wake up the main thread. The main thread on the other hand sleeps, consumes requests, frees the objects it consumed and goes back to sleep in a loop. There is no path between the malloc() and the free(), but that's how it is supposed to be, there is no leak.

Anthony

You are right it's not an optimization, it is a bug-fixing transformation and I wasn't really planning on being totally silent and covert about anything I find (fixable or not).

Regarding precision, without a minimum of real-world case studies not much can be said, but I will insert fix code only for allocations that do not escape (via returns, arguments, member fields or global variables) and are deleted on an viable alternate CFG path. Granted, I will loose some leaks but the ones I do find are likely fixable.

I'll prepare a short project proposal with a summary of the approach and some expected goals, however this feels like research material and I don't know if it is eligible for GSoC funding....

best wishes,

Mihai.

This summer however, I plan to create an "optimization" that
automatically fixes memory leaks in programs - obviously only those
that can be fixed with the available information, for example:

Hello,

This doesn't sound advisable. A memory leak in a normal C/C++
program
is a bug, so this wouldn't be an optimization -- it'd be a
transformation that
unreliably hides bugs.

It doesn't sound correct either. I, for one, have a multithreaded
code where all "created" threads allocate "request" objects, put them
in a list and wake up the main thread. The main thread on the other
hand sleeps, consumes requests, frees the objects it consumed and goes
back to sleep in a loop. There is no path between the malloc() and
the free(), but that's how it is supposed to be, there is no leak.

You are right it's not an optimization, it is a bug-fixing
transformation and I wasn't really planning on being totally silent
and covert about anything I find (fixable or not).

This sounds more like something that could be added to the clang static analyzer ( http://clang.llvm.org/StaticAnalysis.html ). I'm not a big fan of the compiler changing the semantics of code that a programmer wrote. It's full of pitfalls. If, on the other hand, you can tell the programmer that there's the potential for a leak and let her determine if it's a "real" leak, that would be much better for all concerned.

Regarding precision, without a minimum of real-world case studies not
much can be said, but I will insert fix code only for allocations that
do not escape (via returns, arguments, member fields or global
variables) and are deleted on an viable alternate CFG path. Granted, I
will loose some leaks but the ones I do find are likely fixable.

I'll prepare a short project proposal with a summary of the approach
and some expected goals, however this feels like research material and
I don't know if it is eligible for GSoC funding....

Sounds good!

-bw

This summer however, I plan to create an "optimization" that
automatically fixes memory leaks in programs - obviously only those
that can be fixed with the available information, for example:

Hello,

This doesn't sound advisable. A memory leak in a normal C/C++
program
is a bug, so this wouldn't be an optimization -- it'd be a
transformation that
unreliably hides bugs.

It doesn't sound correct either. I, for one, have a multithreaded
code where all "created" threads allocate "request" objects, put them
in a list and wake up the main thread. The main thread on the other
hand sleeps, consumes requests, frees the objects it consumed and
goes
back to sleep in a loop. There is no path between the malloc() and
the free(), but that's how it is supposed to be, there is no leak.

You are right it's not an optimization, it is a bug-fixing
transformation and I wasn't really planning on being totally silent
and covert about anything I find (fixable or not).

This sounds more like something that could be added to the clang
static analyzer ( http://clang.llvm.org/StaticAnalysis.html ).

I just skimmed through clang's static analyzer web-pages and couldn't find an example analysis, I will dig deeper later. One of the reasons why llvm is preferred is because of the SSA form representation which makes analysis implementation easier - but then again I haven't seen a clang static analysis yet...

I'm not
a big fan of the compiler changing the semantics of code that a
programmer wrote. It's full of pitfalls. If, on the other hand, you
can tell the programmer that there's the potential for a leak and let
her determine if it's a "real" leak, that would be much better for all
concerned.

As it turns out there have been some experiments with a static, compile-time garbage collector for Java, ( http://www.cs.cornell.edu/projects/frex/index.php ) and in one of the papers they say they were able to statically deallocate 50%-85% of objects - but for Java where GC is assumed and there's no address-of operator.

best wishes.

Mihai.