Clang Static Analyzer supporting Cross Translation Unit

Hi all,

As far as I know, the Clang Static Analyzer supports interprocedural analysis within a single translation unit very well, but whole-program interprocedural analysis not nearly as well.

I found some preliminary information about cross translation unit analysis online, like this one: But it seems that it is still experimental work, and no further materials are available.

My work depends heavily on interprocedural analysis. I am struggling to choose between the Clang analyzer and an LLVM pass as the tool for writing checkers. LLVM passes support interprocedural analysis very well, but they don't have as many checkers available as the Clang analyzer. I wonder whether cross translation unit analysis will be solidly supported in the Clang analyzer in the future. Is it a promising project that you would be interested in putting effort into?

I know it might be very expensive to support both path-sensitive and interprocedural analysis, especially for large systems; the analysis may run out of memory. So I am curious whether anyone is working on cross translation unit analysis.

Thank you.

Hi Yingtong,

The work on integrating CTU into the upstream Clang Static Analyzer is still ongoing. There were some experimental prototypes and now, as far as I know, Ericsson CodeChecker contains the version of CTU that is closest to production quality.
You should note that it is still experimental and has a number of known bugs and unimplemented functionality; however, we're working on fixing them.

(+ Gabor).

15.02.2018 03:31, Yingtong Liu via cfe-dev wrote:

The current CTU effort erases the boundaries we have between a single translation unit and the whole program, but it isn't going to be powerful enough to be described as a "whole-program" analysis, similarly to how our existing inter-procedural analysis isn't quite "whole translation unit" analysis.

With our static symbolic execution-based approach, we do not ever attempt to understand any significant module of the program "as a whole". Instead, we try to model specific individual functions, and sometimes, occasionally, depending on numerous unobvious circumstances, when we encounter calls of other functions during such modeling, we allow ourselves to descend into the callee function to explore the consequences of the function call in the current context. It opens up execution paths that traverse multiple functions, but we always keep in mind that we're still analyzing the program by focusing on a very small part of the code at a time, conducting multiple independent analyses even within a single translation unit, and never assuming that we understand the program as a whole.
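To make the "descend into the callee" idea concrete, here is a minimal sketch in C. All names (`lookup`, `read_entry_unchecked`, and so on) are made up for illustration, and whether the analyzer actually inlines a given call depends on its heuristics, so this only shows the kind of path that becomes visible when it does.

```c
#include <assert.h>
#include <stddef.h>

static int table[4] = {10, 20, 30, 40};

/* Hypothetical callee: returns NULL for an out-of-range key. */
static int *lookup(int key) {
    if (key < 0 || key >= 4)
        return NULL;
    return &table[key];
}

/* Caller modeled on its own. Only when the analysis descends into
 * lookup() does the path "key out of range -> NULL -> dereference"
 * become visible. */
int read_entry_unchecked(int key) {
    int *p = lookup(key);
    return *p; /* null dereference when key is out of range */
}

/* Same caller with the NULL path handled explicitly. */
int read_entry_checked(int key, int fallback) {
    int *p = lookup(key);
    return p ? *p : fallback;
}

/* Same caller with the precondition stated as an assertion. */
int read_entry_asserted(int key) {
    int *p = lookup(key);
    assert(p != NULL);
    return *p;
}
```

Without looking into `lookup()`, nothing in `read_entry_unchecked()` alone says that `p` can be NULL; the bug only exists on a path that spans both functions.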

CTU allows us, sometimes, occasionally, depending on numerous unobvious circumstances, to do the same when we encounter calls of functions that have their bodies defined in a different translation unit, therefore erasing the boundaries and allowing us to focus on more promising execution paths. The current effort is for now focused on that first step - erasing the boundaries. As far as I know, not much effort has been put into tweaking our heuristics for determining the promising execution paths, but the existing heuristics work pretty well in the new circumstances, and a significant improvement of the bugs-per-second metric is observed, together with a considerable skew from finding deeper bugs within the current translation unit towards finding shallower bugs that require understanding of multiple translation units. But still, and probably even more so, CTU is not whole-program analysis - it's only an effort to erase the artificial boundaries of translation units; our static symbolic execution approach would never scale enough to understand the program as a whole. Even if that is at all possible, it requires far more significant effort and more advanced techniques.
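A small sketch of the kind of bug that sits on a translation unit boundary. The file names `config.c` and `main.c` and all function names are hypothetical; the two parts are shown in one listing only so that it compiles on its own. In a real project they would be separate files, and a per-file analysis of `main.c` would see just the declaration of `get_divisor()`.

```c
#include <assert.h>

/* ---- would live in config.c (hypothetical file name) ---- */
int get_divisor(int mode) {
    assert(mode >= 0); /* hypothetical precondition: mode is a selector */
    return mode == 0 ? 0 : 4;
}

/* ---- would live in main.c (hypothetical file name) ---- */
int get_divisor(int mode); /* all that main.c would see of the callee */

int scale(int value, int mode) {
    int d = get_divisor(mode);
    return value / d; /* division by zero on the path where mode == 0 */
}

int scale_guarded(int value, int mode) {
    int d = get_divisor(mode);
    if (d == 0)
        return 0; /* the zero path is handled before dividing */
    return value / d;
}
```

With the body of `get_divisor()` out of sight, the analysis of `scale()` has no particular reason to believe that `d` is zero on some path. Once the boundary is erased, the path through `mode == 0` can be followed the same way as in the single-file case.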

So the real question here is - what kind of analysis do you want to perform? Is symbolic execution the right tool for your work? Like, for ~1/2 of problems, symbolic execution is not even the right tool: if, for instance, you're trying to find a problem that can be identified by an invariant that holds on all paths (dead code, expression always has the same value, various check-after-use), then the analyzer wouldn't be of much help, because it never guarantees to explore all paths through the program; it's only good for finding specific paths on which a certain invariant is violated (use-after-failed-check, null dereference, memory leak). And also symbolic execution of the whole program's source code doesn't scale, but another analysis method may scale well.
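The two classes of problems can be illustrated with a rough sketch in C. The function names are invented, and which tool reports what in practice depends on the checker; the point is only the difference between a property of all paths and a property of one path.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* All-paths property: the expression has the same value (0) on every
 * path, so proving it means reasoning about all paths at once. */
int always_zero(int flags) {
    return flags & 0;
}

/* All-paths property: check-after-use. The pointer is dereferenced
 * before the NULL test on every path, so the test cannot help. */
int first_char(const char *s) {
    char c = s[0];
    if (s == NULL)
        return -1;
    return c;
}

/* Single-path property: use after a failed check. Only the path on
 * which s == NULL reaches strlen() with a null pointer. */
size_t length_after_failed_check(const char *s, int *was_null) {
    assert(was_null != NULL);
    if (s == NULL)
        *was_null = 1; /* the check fails, but execution continues */
    return strlen(s);  /* null dereference on the s == NULL path only */
}
```

The first two functions are misbehaving on every path, which is the territory of flow-sensitive or dataflow-style analyses. The third one is fine on most paths and broken on one, which is the kind of thing path-by-path exploration is good at finding.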

Hi Aleksei and Artem,

Thank you so much for the detailed answers. I decided to write my own checkers from scratch using LLVM because I need the checkers to be more specific and they do rely on interprocedural analysis a lot.