Cross Translational Unit Analysis in Clang Static Analyzer

Hi All,

at the EuroLLVM’17 conference we presented our results on a new analysis mode in the Clang Static Analyzer: Cross Translational Unit (CTU) analysis.

See the patch at https://reviews.llvm.org/D30691, which is based on the work of A. Sidorin et al. (http://lists.llvm.org/pipermail/cfe-dev/2015-October/045730.html), but without function summaries and updated to the newest Clang.

The CTU mode allows the analyzer to “inline” calls to functions that are defined in a TU other than the one currently being analyzed.

This makes it possible to find bugs that span multiple source files.

Without this patch, when the static analyzer engine meets a call to an external function, it cannot reason about the function's return value (it is treated as unknown), and the values reachable through pointers or references passed to the function as parameters are invalidated.
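
As a minimal sketch of the kind of bug this affects (all names below are hypothetical and not taken from the patch): the two functions are shown in one listing so that it compiles on its own, but parse_divisor() is assumed to live in a separate source file. In single TU mode the analyzer would then see only its declaration.

```c
#include <stddef.h>

/* ---- assumed to live in parser.c (a different translation unit) ---- */

/* Returns the divisor encoded in the first character of `text`,
 * or 0 when the string is missing or empty. Hypothetical helper. */
int parse_divisor(const char *text)
{
    if (text == NULL || text[0] == '\0')
        return 0;               /* the value the caller does not expect */
    return text[0] - '0';
}

/* ---- assumed to live in main.c (the TU being analyzed) ---- */

/* Without the body of parse_divisor(), its return value is unknown to
 * the analyzer, so the division is not flagged. With the body available
 * (CTU mode), the path on which parse_divisor() returns 0 can be
 * followed into the division below. */
int scale(int value, const char *text)
{
    int d = parse_divisor(text);
    return value / d;           /* division by zero when text is "" */
}
```

Calling scale(10, "5") is fine, while scale(10, "") divides by zero; only an analysis that looks into parse_divisor() can be expected to see the second case.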

You can find a fully patched Clang 4.0 here (use it with LLVM commit 01609a325b5f85d88e3ab5c7d470409092436cb2):

https://github.com/dkrupp/clang/tree/ctu-master

We have run the analysis on some reasonably-sized (ffmpeg, curl, vim, openssl, postgresql) open source C projects and found many additional true positive reports compared to the traditional single TU mode in all projects.

This suggests that the feature would give many new results on other projects as well.

We measured the heap usage, the analysis time and the number of new findings.

You can find the detailed comparison results here:

http://cc.elte.hu/clang-ctu/

In summary, the number of reported bugs is ~1.5-5x that of the original single TU analysis, at the cost of 1.5-5x higher analysis time and 1.5-5x higher maximum heap usage (roughly in proportion to the increase in the number of reported faults).

The design concept is described briefly in this document: http://cc.elte.hu/clang-ctu/eurollvm17/abstract.pdf

If you would like to try this analysis mode on your project, please find the description of the two new analyzer scripts here:

https://github.com/dkrupp/clang/blob/ctu-master/tools/xtu-build-new/readme.md

We would be happy to hear your opinions and experiences with this feature, and would appreciate your help in reviewing the patch.

Thanks & Regards,

Daniel

Hello Daniel & Gabor. Thank you very much for your work!

I looked at the patch and found it mostly familiar. Unfortunately, I cannot find enough time to review it right now (my solutions that were implemented 2 years ago need some revisiting too).

I can try to do this review incrementally, in small chunks, if you are OK with that. But it will still take time. Sorry for the inconvenience.

On 31.03.2017 18:28, Dániel Krupp via cfe-dev wrote:

Hi Aleksei,

your review would be very much appreciated, at any pace and in any form. :)

Thanks,

Daniel

Hi all,

thanks again to Dániel, Gábor, Aleksei and everyone for this work. I'm
going to brain-dump my experiences with the CTU analysis here for the
record. I'm also going to try and review patches when I get time.

At Google, we were trying to perform CTU analysis on Magenta, a
microkernel for the new Fuchsia operating system [0,1]. I wrote a
document describing these efforts; most of the document has been
published here [2]. This might help folks who want to get started
running CTU analysis, although it was last updated in December 2016.

We found several bugs that could not be caught by single-translation
unit analysis. There were bugs both on the *caller* side, where a value
set in a called function (in a different TU) caused an error in the
caller, and on the *callee* side, where a value passed in as a
parameter from a different TU caused a bug in the called function.
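
A minimal sketch of the callee-side pattern, with hypothetical names that are not taken from Magenta. Both functions are in one listing so that it compiles standalone; the assumption is that buffer_len() and current_len() live in different source files.

```c
#include <stddef.h>

struct buffer {
    size_t len;
    char data[16];
};

/* ---- assumed to live in util.c ---- */

/* Callee: trusts its argument and dereferences it without a check. */
size_t buffer_len(const struct buffer *buf)
{
    return buf->len;            /* null dereference if buf == NULL */
}

/* ---- assumed to live in client.c ---- */

/* Caller: passes NULL on the `use_default` path. Seen on its own,
 * neither file looks wrong; the bug only appears when the NULL is
 * followed across the TU boundary into buffer_len(). */
size_t current_len(const struct buffer *buf, int use_default)
{
    const struct buffer *b = use_default ? NULL : buf;
    return buffer_len(b);
}
```

Here current_len(&b, 0) is safe, while current_len(&b, 1) crashes inside the callee. The caller-side case is the mirror image: the surprising value comes back out of the called function instead of going into it.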

During EuroLLVM, it was mentioned that the analysis is truncated at a
call depth of 4. In practice, I found that the analyser read on average
between 15 and 20 ASTs from disk for each function that it analysed, and
never more than about 100. (Note that if every function calls 2 functions
defined in other TUs, then for each analysed function we must load
2 + 4 + 8 + 16 = 30 ASTs from disk.) It might be possible to find more bugs by increasing
the call depth, though I didn't experiment with this.
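
The estimate in the note above can be written as a small helper. This is only a sketch under the same simplifying assumptions (a uniform number of cross-TU calls per function, and one AST load per call with no reuse); the function name and parameters are hypothetical and not part of the analyzer.

```c
/* Upper bound on the number of ASTs loaded while analysing one
 * function, assuming every function calls `fanout` functions defined
 * in other TUs and calls are followed down to `max_depth` levels:
 * fanout + fanout^2 + ... + fanout^max_depth. */
unsigned long asts_loaded(unsigned fanout, unsigned max_depth)
{
    unsigned long total = 0;
    unsigned long level = 1;    /* calls at the current depth */

    for (unsigned depth = 1; depth <= max_depth; ++depth) {
        level *= fanout;
        total += level;
    }
    return total;
}
```

With a fan-out of 2 and a depth of 4 this gives the 30 from the note; one more level of depth would already give 62, so under these assumptions the cost of raising the call depth grows geometrically.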

The main problem I ran into was the incomplete implementation in
ASTImporter.cpp. In particular, whenever the analyser tries to load an
AST node from disk that is not yet supported by the ASTImporter, the
analyser crashes. So for us, most of the work involved
adding support for the AST nodes that were present in our codebase, but
which were not in the Importer. These were mostly obscure C++
constructs. Note that in some cases, support for those already exists in
Aleksei's patch but not in Gábor's; so it's always worth looking at
Aleksei's patch too.

Note, I'm no longer affiliated with Google (I was just interning there),
but I'm happy to answer whatever questions I can.

[0] https://fuchsia.googlesource.com
[1] Fuchsia: a new operating system [LWN.net]
[2] Cross Translation Unit Static Analysis in Magenta

thanks!

Hi Daniel,

Thank you for sending the patch! While I think that doing whole project analysis via “inlining” is not a scalable solution, this prototype could be useful for the community to experiment with. It can also serve as a basis for other two-stage analyses, where we collect some data about functions in the first pass and use it in the second pass.

A side benefit is that this direction exercises the ASTImporter and would benefit other uses of it such as lldb.

I am sure there will be a few comments about the patch itself, and it’s important to have the workflow integrated into scan-build, which is our user-facing tool.

For those interested in the topic, I recommend watching Gabor’s talk at LLVM Euro 2017 once the video is available:
http://llvm.org/devmtg/2017-03//2017/02/20/accepted-sessions.html#7

Thank you!
Anna

Hi Anna,

thanks for the positive feedback. I am sure this analysis option will be useful for many.

We will consult with Laszlo Nagy on how to integrate 2-stage analysis into scan-build-py. I assume you did not mean the perl version of scan-build.

For those interested in the latest version of the patch, it has been moved to this repo:

https://github.com/Ericsson/clang/tree/ctu-os

An extended version of the patch including coverage measurement is available on this branch:

https://github.com/Ericsson/clang/tree/ctu-master

Thanks in advance for the review,

Daniel