Summer of Code 2007 project proposal

Dear LLVM developers:

I would like to propose a project for Google's Summer of Code 2007.

I have been working on binary program matching for the detection of
open source license violations. Basically, by using slices generated
from the SSA form, it is possible to match binaries generated by
different compilers and transformed by different obfuscators.

So far, my work has been related only to Java; however, it is very
important to protect other languages as well. LLVM can read object
files from GCC and can therefore already process a very important set of

Even though Java can be processed via GCC, detecting open source
violations requires the ability to input bytecode directly: the
"pirate" program will not come with source code. Java is a widely
used programming language for open source projects, and I believe it
would be nice to have it integrated with LLVM. By using a unified
framework that slices binary programs, human resources can be

My proposal is to add a bytecode "frontend" to LLVM. If there is
extra time, I can help fix bugs, as I want to get acquainted with
the tool. I would also like to ask a question:
If you open one object file A, do you need to open all the object
files that A depends on? Can you open one method full of unresolved
references and slice it with the slicing process that you guys are
Thank you very much! I am looking forward to hearing your comments.


Arnoldo Muller