[analyzer] Whole Program Analysis - Full Call Graph

Hello Phil,

I ran into this problem while implementing an inter-unit analysis proof of concept for CSA, so I'll describe my solution here. As far as I understand, this problem cannot be solved inside CSA itself, so I used a multi-step approach.

First, for every translation unit we collect a list of 'required' functions and a list of 'exported' functions. To build the list of required declarations, we iterate over the AST to find CallExprs whose callee declaration has no body. To build the list of exported declarations, we simply dump all the functions that are externally visible.

For dumping, I used a simple text signature of the form mangled_name@target_triple_arch. The additional arch mark distinguishes between functions in multi-arch builds.
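
To make the signature format concrete, here is a minimal Python sketch. The helper names make_signature and split_signature are hypothetical, and the mangled name and arch values are made up for illustration; the actual tool may build the string differently.

```python
from typing import Tuple


def make_signature(mangled_name: str, arch: str) -> str:
    """Build a 'mangled_name@arch' text signature (hypothetical helper)."""
    return f"{mangled_name}@{arch}"


def split_signature(signature: str) -> Tuple[str, str]:
    """Recover (mangled_name, arch) from a signature."""
    # Split on the last '@' so a mangled name that itself contains '@'
    # is kept intact (an assumption about how such names are handled).
    mangled_name, arch = signature.rsplit("@", 1)
    return mangled_name, arch


# The same function built for two architectures gets two distinct keys.
sig_x86 = make_signature("_Z3fooi", "x86_64")
sig_arm = make_signature("_Z3fooi", "arm")
```

With this scheme, the same mangled name compiled for two architectures never collides in the mapping.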

Note that some functions may be defined in headers. They should not be merged, because they may have different bodies due to macro expansion. Calls to such functions are local, and the approach below should handle them.

After this, we have two lists. The list of required functions is just a list of signatures; the list of exported functions is a list of map-like entries of the form function_signature::file_name@target_triple_arch. The final export map contains the entries from the exported list whose key appears in the required list. The resulting mapping is an inter-unit call graph.
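
The filtering step can be sketched in a few lines of Python. This is only an illustration under stated assumptions: build_export_map is a hypothetical name, exported entries are represented as already-parsed (signature, file_name) pairs rather than the on-disk text format, and the first definition seen wins when two files export the same signature.

```python
from typing import Dict, Iterable, Set, Tuple


def build_export_map(required: Iterable[str],
                     exported: Iterable[Tuple[str, str]]) -> Dict[str, str]:
    """Keep only the exported definitions that some unit requires."""
    required_set: Set[str] = set(required)
    export_map: Dict[str, str] = {}
    for signature, file_name in exported:
        if signature in required_set:
            # Assumption: on a collision, the first definition seen wins.
            export_map.setdefault(signature, file_name)
    return export_map


# Made-up sample data: signatures are 'mangled_name@arch' strings.
required = ["_Z3fooi@x86_64", "_Z3barv@x86_64"]
exported = [
    ("_Z3fooi@x86_64", "a.cpp"),
    ("_Z3bazv@x86_64", "b.cpp"),  # exported but never required: dropped
    ("_Z3fooi@arm", "a.cpp"),     # different arch, so a different key
]
export_map = build_export_map(required, exported)
```

In this sample, _Z3barv@x86_64 is required but has no exported definition in any unit, so it stays unresolved (for example, a library function).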

You may also need to use C++-style mangling for C if you build multiple projects that may contain functions with the same name. However, if you need to do this, you may need a more complicated approach.

You can take a look at the code. It also dumps local calls but marks them explicitly as local, so it can be used to build a whole-program call graph. Our GitHub repo is at https://github.com/haoNoQ/clang/tree/summary-ipa-draft. See tools/clang-func-mapping (ClangFnMapGen.cpp) and tools/scan-build/xtu-analyze.py for the relevant code.

Hope it will help.

You might also want to track linker calls and then model static/dynamic linking when matching symbols to improve precision and avoid collisions.

Hello Aleksei.

Thank you very much for the reply and details of your proof-of-concept approach. I will certainly look at the implementation you detailed.

My requirements may be somewhat simpler, as I'm mainly concerned with analysis of calls to a well-defined SDK API. As long as I can reliably find those in the call graph, it may be OK to ignore some of the more edge-case problems.

Thanks again.

Phil

I use a proprietary linker, so I will certainly be looking into hooks into the link process as well. Thank you for your reply.

Phil