Programmatic compilation of C++ file into bitcode

I'm building a static analysis tool on top of LLVM. It needs to take in a C++ source file and have LLVM translate it into bitcode. In other words, it basically needs to do this:

   llvmc hello.cpp -emit-llvm -O0 -S -g

Except that instead of writing the bitcode to a file, it needs to load it into memory (presumably as an instance of Module) for further processing and analysis. So my goal is to do essentially what llvmc does, but programmatically by invoking the LLVM API directly.

I thought I could use lib/CompilerDriver/Main.cpp as a guide, but after studying the code and associated docs, I'm stumped. Much of the logic is woven through TableGen'd drivers, so I can't even figure out how the command line options are fed into the LLVM API.

Any suggestions? Thanks,

Trevor

Hi Trevor,

I'm building a static analysis tool on top of LLVM. It needs to take
in a C++ source file and have LLVM translate it into bitcode. In other
words, it basically needs to do this:

    llvmc hello.cpp -emit-llvm -O0 -S -g

Behind the scenes, llvmc simply runs llvm-gcc, and it is llvm-gcc that
actually generates the bitcode.

> Except that instead of writing the bitcode to a file, it needs to load
> it into memory (presumably as an instance of Module) for further
> processing and analysis.

You could just pipe it to your program:

llvm-gcc hello.cpp -emit-llvm -c -o - | analysis_program
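On the receiving end of that pipe, the analysis program first has to read stdin into memory. Below is a minimal sketch in standard C++. It assumes the data arrives as binary bitcode, which `-emit-llvm -c` produces (`-S` would produce textual IR instead). The helper names are hypothetical. The actual parse into a Module would be handed to LLVM's bitcode reader (in current LLVM releases, `llvm::parseBitcodeFile` from `llvm/Bitcode/BitcodeReader.h`), and that step is not shown here.

```cpp
#include <cassert>
#include <cstddef>
#include <istream>
#include <iterator>
#include <string>

// Reads an entire stream (e.g. std::cin at the end of the pipe) into a byte
// buffer. POSIX streams are already binary; on Windows, stdin would also have
// to be switched to binary mode before reading bitcode this way.
std::string readAllBytes(std::istream& in) {
    return std::string(std::istreambuf_iterator<char>(in),
                       std::istreambuf_iterator<char>());
}

// True if the buffer starts with an LLVM bitcode signature: either raw
// bitcode ('B', 'C', 0xC0, 0xDE) or the bitcode wrapper header, whose magic
// 0x0B17C0DE is stored little-endian (0xDE, 0xC0, 0x17, 0x0B).
bool looksLikeBitcode(const std::string& buf) {
    if (buf.size() < 4) return false;
    auto byte = [&buf](std::size_t i) { return static_cast<unsigned char>(buf[i]); };
    bool raw = byte(0) == 'B' && byte(1) == 'C' && byte(2) == 0xC0 && byte(3) == 0xDE;
    bool wrapper = byte(0) == 0xDE && byte(1) == 0xC0 && byte(2) == 0x17 && byte(3) == 0x0B;
    return raw || wrapper;
}
```

A caller would do `std::string buf = readAllBytes(std::cin);` and reject anything that fails `looksLikeBitcode(buf)` before passing the buffer to LLVM. That check catches, for example, textual `.ll` output piped in by mistake.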

> So my goal is to do essentially what llvmc does, but programmatically
> by invoking the LLVM API directly.

You can add your static analysis to llvm-gcc as an LLVM pass. If you
write it as an LLVM pass, you can also run it from "opt", which would
be convenient.

> I thought I could use lib/CompilerDriver/Main.cpp as a guide, but
> after studying the code and associated docs, I'm stumped. Much of the
> logic is woven through TableGen'd drivers, so I can't even figure out
> how the command line options are fed into the LLVM API.
>
> Any suggestions? Thanks,

I don't think you should bother with llvmc: it's a compiler driver
that launches the real compiler.

Did you take a look at the Clang static analyzer?

Ciao,

Duncan.

Thanks, that sounds like a good approach. It appears I can get a Module instance simply by deriving my pass from ModulePass, whose runOnModule() method receives the Module.

There's one problem, however. I will at some point want to integrate this analysis tool with other tools. For example, an Eclipse plugin might run the analysis tool and then display the analysis results in an Eclipse window. I suppose the plugin could execute "opt" as a subprocess and then parse the output, but that seems brittle. I'd prefer to define an API in my analysis tool that other tools could then call. That's why I was trying to build upon projects/sample/* instead of lib/Transforms/Hello/*.

Is there perhaps some structured mechanism for retrieving the output of an LLVM pass? That is, something better than just parsing the output of "opt"...
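Parsing free-form text is indeed brittle. Inside a single process, LLVM's own mechanism for sharing results is an analysis pass that other passes query with `getAnalysis<>()` (in the legacy pass manager). Across a process boundary, such as an Eclipse plugin running opt as a subprocess, one common alternative is a small, versioned, machine-readable format that both sides agree on. The sketch below is not an LLVM facility. It assumes a hypothetical `CfgEdge` record and a hypothetical tab-separated format with a version header.

```cpp
#include <cassert>
#include <cstddef>
#include <sstream>
#include <string>
#include <vector>

// Hypothetical result record: one entry per CFG edge the analysis reports.
struct CfgEdge {
    std::string function;  // enclosing function
    std::string from;      // source basic block label
    std::string to;        // successor basic block label
    bool operator==(const CfgEdge& o) const {
        return function == o.function && from == o.from && to == o.to;
    }
};

// Header line naming the format and its version, so a consumer can reject
// output it does not understand instead of misparsing it.
const std::string kHeader = "cfg-edges\t1";

// One tab-separated record per line; assumes names contain no tabs or newlines.
std::string serializeEdges(const std::vector<CfgEdge>& edges) {
    std::ostringstream out;
    out << kHeader << '\n';
    for (const CfgEdge& e : edges)
        out << e.function << '\t' << e.from << '\t' << e.to << '\n';
    return out.str();
}

// Inverse of serializeEdges; returns false on an unknown header or a malformed line.
bool parseEdges(const std::string& text, std::vector<CfgEdge>& edges) {
    std::istringstream in(text);
    std::string line;
    if (!std::getline(in, line) || line != kHeader) return false;
    edges.clear();
    while (std::getline(in, line)) {
        std::size_t t1 = line.find('\t');
        if (t1 == std::string::npos) return false;
        std::size_t t2 = line.find('\t', t1 + 1);
        if (t2 == std::string::npos || line.find('\t', t2 + 1) != std::string::npos)
            return false;
        edges.push_back({line.substr(0, t1), line.substr(t1 + 1, t2 - t1 - 1),
                         line.substr(t2 + 1)});
    }
    return true;
}
```

The same `CfgEdge` vector could also be returned directly from a library function for in-process callers. In that case the text format only matters when results have to leave the process.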

Thanks,

Trevor

On rereading the opt documentation, I see:

"In a few cases, it will ... generate a file with the analysis output, which is usually done when the output is meant for another program."

I suppose the format of this file is totally dependent on the analyzer, but what about the location of the file? Is there some convention on where analyzers send their output? Otherwise I'm not sure how the other program can find the output, unless of course the analyzer simply dumps the file to a hard-coded location (/tmp?).

Trevor

Replying to myself again...

After sifting through many of the existing transforms, I discovered that new command-line parameters can be added to opt simply by declaring them in the transform code, as in this example from Internalize.cpp:

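   // A file-scope cl::opt registers itself with LLVM's command-line parser, so any
   // tool that links in this pass (such as opt) accepts -internalize-public-api-file=<filename>.
   static cl::opt<std::string>
     APIFile("internalize-public-api-file", cl::value_desc("filename"),
       cl::desc("A file containing list of symbol names to preserve"));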

So, the calling program can simply pass the name of a file to opt as a parameter, and it will then know exactly where the analyzer will send its output.
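Passing the path explicitly also settles the earlier /tmp question. The caller chooses the location, and the analyzer writes there and nowhere else. Below is a sketch of the writing side in standard C++17. It assumes the path has already been read from a `cl::opt<std::string>` declared like the Internalize.cpp one above. The function name and the `.partial` suffix are hypothetical. The sketch writes to a sibling temporary file and renames it into place, so the calling program never reads a half-written report.

```cpp
#include <cassert>
#include <filesystem>
#include <fstream>
#include <string>
#include <system_error>

namespace fs = std::filesystem;

// Writes `contents` to exactly the path the caller supplied. The data first goes
// to a sibling ".partial" file, which is then renamed over the target; on POSIX
// file systems a rename within one directory is atomic, so a reader never sees a
// half-written report. Fails instead of falling back to a hard-coded location.
bool writeReport(const fs::path& target, const std::string& contents) {
    if (target.empty()) return false;
    fs::path tmp = target;
    tmp += ".partial";
    {
        std::ofstream out(tmp, std::ios::binary | std::ios::trunc);
        if (!out) return false;
        out << contents;
        if (!out.flush()) return false;
    }  // the temporary file is closed here, before the rename
    std::error_code ec;
    fs::rename(tmp, target, ec);
    if (ec) {
        fs::remove(tmp, ec);  // best-effort cleanup
        return false;
    }
    return true;
}
```

Returning `false` instead of guessing a directory keeps the contract simple: if the file is not at the path the caller passed, the run failed.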

Trevor

Of course the option will only affect Passes that actually examine it.
I missed the earlier part of this thread. What kind of output do you
need? I'm guessing some sort of dataflow information.

-Dave

I'm not entirely sure at this point, but it will most likely be some representation of the control flow graph that's been processed and filtered in some way. The output won't contain data flow information, at least not yet.
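If the output is a filtered CFG, one plausible representation is Graphviz DOT, the format opt's `-dot-cfg` pass already uses for CFGs. The sketch below operates on a hypothetical stand-in CFG (a map from block label to successor labels) rather than on LLVM's `BasicBlock` graph. It drops uninteresting blocks but keeps an edge A -> B whenever B is reachable from A through dropped blocks only, so filtering does not disconnect the graph.

```cpp
#include <cassert>
#include <functional>
#include <map>
#include <set>
#include <sstream>
#include <string>
#include <vector>

// Hypothetical stand-in for one function's CFG: block label -> successor labels.
using Cfg = std::map<std::string, std::vector<std::string>>;

// Keeps only blocks for which `keep` returns true. For each kept block, walks
// forward through dropped blocks and records the first kept blocks it reaches,
// so control flow between kept blocks is preserved.
Cfg filterCfg(const Cfg& cfg, const std::function<bool(const std::string&)>& keep) {
    Cfg result;
    for (const auto& [block, succs] : cfg) {
        if (!keep(block)) continue;
        std::set<std::string> targets, visited;
        std::vector<std::string> work(succs.begin(), succs.end());
        while (!work.empty()) {
            std::string b = work.back();
            work.pop_back();
            if (!visited.insert(b).second) continue;      // already explored
            if (keep(b)) { targets.insert(b); continue; }  // stop at kept blocks
            auto it = cfg.find(b);
            if (it != cfg.end()) work.insert(work.end(), it->second.begin(), it->second.end());
        }
        result[block] = std::vector<std::string>(targets.begin(), targets.end());
    }
    return result;
}

// Renders the graph as Graphviz DOT; assumes labels contain no double quotes.
std::string toDot(const std::string& name, const Cfg& cfg) {
    std::ostringstream out;
    out << "digraph \"" << name << "\" {\n";
    for (const auto& [block, succs] : cfg) {
        out << "  \"" << block << "\";\n";
        for (const auto& s : succs) out << "  \"" << block << "\" -> \"" << s << "\";\n";
    }
    out << "}\n";
    return out.str();
}
```

In a real pass, the map would be filled from each basic block's successors (e.g. via `llvm::successors()` in `llvm/IR/CFG.h`), and the `keep` predicate would encode whatever filtering the analysis needs.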

Trevor