How to do some dataflow analysis via IR on a large project?

I am learning to do some basic dataflow analysis on C projects such as
Apache httpd. I am new to LLVM IR and have run into a problem.
The official manual on writing an LLVM pass only shows how to generate
IR for a single ".c" source file. In practice, a function is often
defined in one ".c" file and used in another, which makes it harder to
analyze the dataflow of such functions.
So I would like to know: is there any method or strategy to generate
IR across many related source files?
Thanks a lot!

Dear Shulin,

Sounds like you’ll need an inter-procedural analysis across different source files. One
option I can think of is to manually update the Makefiles so that the compiler emits bitcode
files, merge them with llvm-link, and run the result through your optimization pass as one
big bitcode file. This could involve a huge amount of tedious work for a large project like
Apache, and may be error-prone.
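
A rough sketch of that first option, written as a small Python helper that only builds the
command lines (it does not run them). The file names, the plugin path "./MyDataflowPass.so"
and the pass name "my-dataflow" are placeholders I made up, and the opt flags assume a pass
built as a plugin for the new pass manager. A real project also needs the per-file -I/-D
flags from its Makefiles, which are left out here.

```python
import shlex
from pathlib import Path


def bitcode_commands(sources, merged="whole.bc",
                     plugin="./MyDataflowPass.so", pass_name="my-dataflow"):
    """Build the commands for: compile each file to bitcode, merge, run a pass.

    `plugin` and `pass_name` are hypothetical; substitute your own pass.
    """
    cmds = []
    bitcode_files = []
    for src in sources:
        # a.c -> a.bc, keeping the directory part of the path
        bc = str(Path(src).with_suffix(".bc"))
        # -emit-llvm with -c makes clang write LLVM bitcode instead of an object file
        cmds.append(["clang", "-c", "-emit-llvm", src, "-o", bc])
        bitcode_files.append(bc)
    # llvm-link merges the per-file modules into a single module
    cmds.append(["llvm-link", *bitcode_files, "-o", merged])
    # run the (hypothetical) analysis pass over the merged module
    cmds.append(["opt", f"-load-pass-plugin={plugin}",
                 f"-passes={pass_name}", merged, "-disable-output"])
    return cmds


if __name__ == "__main__":
    for cmd in bitcode_commands(["a.c", "b.c"]):
        print(shlex.join(cmd))
```

Once the modules are merged, a call in a.c to a function defined in b.c resolves to a
definition in the same module, so the pass can follow it.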

Another option I can think of is libLTO. It should link all the bitcode files and run the
optimization passes for you.

http://llvm.org/docs/LinkTimeOptimization.html#phase-3-optimize-bitcode-files

I’ve never dealt with it myself, and I can’t find many tutorials or examples online. You
will probably need to dig into the source code (in lib/LTO?).
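
One thing that might help here, as an untested sketch: when clang compiles with -flto, the
".o" files it writes contain LLVM bitcode rather than machine code. So if the project’s
build accepts CC=clang and CFLAGS=-flto (an assumption, I have not tried it on httpd), the
Makefiles may not need editing at all. The Python helper below walks a build directory,
keeps the ".o" files that start with the raw bitcode magic bytes, and builds an llvm-link
command for them. The function names and the output name "whole.bc" are my own.

```python
from pathlib import Path

# First four bytes of a raw LLVM bitcode file: 'B', 'C', 0xC0, 0xDE
BITCODE_MAGIC = b"BC\xc0\xde"


def is_bitcode(path):
    """Return True if the file starts with the raw LLVM bitcode magic."""
    with open(path, "rb") as f:
        return f.read(4) == BITCODE_MAGIC


def link_command(build_dir, merged="whole.bc"):
    """Build an llvm-link command over every bitcode .o file under build_dir."""
    objs = sorted(str(p) for p in Path(build_dir).rglob("*.o") if is_bitcode(p))
    return ["llvm-link", *objs, "-o", merged]
```

Native object files (for example anything compiled without -flto) are skipped, so they
would be missing from the merged module.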

Please correct me if I’m wrong, or if there are other good approaches.

Regards,

Kevin

Dear Kevin,

The first method seems a little difficult for me, since I would have to
deal with the Makefiles of several projects. So LTO might be the only
choice. Thank you very much.