Two pass analysis framework: AST merging approach

Hi!

This e-mail is a proposal based on the work done by Yury Gibrov et al.: http://lists.llvm.org/pipermail/cfe-dev/2015-December/046299.html

They implemented a two-pass analysis: the first pass serializes the AST of every translation unit and creates an index of functions; the second pass does the real analysis and can load the ASTs of function bodies on demand.
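Concretely, the function index produced by the first pass can be thought of as a map from a function's (mangled) name to the AST dump that defines it. A minimal Python sketch, with invented file names purely for illustration (the prototype's actual on-disk format may differ):

```python
def build_index(tu_definitions):
    """Pass 1: tu_definitions maps an AST dump path to the list of
    externally visible functions defined in that translation unit."""
    index = {}
    for ast_path, functions in tu_definitions.items():
        for name in functions:
            # First definition wins; under the ODR, duplicates should be identical.
            index.setdefault(name, ast_path)
    return index

def lookup(index, name):
    """Pass 2: find which AST dump to deserialize for a given callee."""
    return index.get(name)

# Hypothetical project layout:
index = build_index({
    "parser.c.ast": ["parse_expr", "parse_stmt"],
    "lexer.c.ast": ["next_token"],
})
print(lookup(index, "next_token"))  # -> lexer.c.ast
```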

This approach can be used to achieve cross translation unit analysis for the Clang Static Analyzer to some extent, but a similar approach could also be applied to Clang Tidy and other Clang based tools.

While this method is not likely to be a silver bullet for the Static Analyzer, I did some benchmarks to see how feasible this approach is. The baseline was running the Static Analyzer without the two-pass analysis; the other configuration was running with the framework linked above.

For a 150k LOC C project I got the following results:

The size of the serialized ASTs was: 140MB

The size of the indexes (textual representation): 4.4MB

The time of the analysis was below 4X

The amount of memory consumed was below 2X

All in all it looks like a feasible approach for some use cases.

I also tried to do a benchmark on the LLVM+Clang codebase. Unfortunately I was not able to run the analysis due to some missing features in the AST Importer. But I was able to serialize the ASTs and generate the indices:

The size of the serialized ASTs: 45.4 GB

The size of the function index: 1.6 GB

While these numbers are less promising, I think there are some opportunities to reduce them significantly.

I propose the introduction of an analysis mode for exporting ASTs. In analysis mode the AST exporter would not emit the body of a function in several cases:

  • In case a function is defined in a header, do not emit the body.

  • In case the function was defined in an implicit template specialisation, do not emit the body.
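The two cases above can be summarized as a predicate that the exporter would evaluate per function. A sketch, with made-up field names standing in for queries against the real Clang AST:

```python
from dataclasses import dataclass

@dataclass
class FunctionInfo:
    # Hypothetical flags; the real exporter would query the AST instead.
    defined_in_header: bool
    implicit_instantiation: bool

def should_emit_body(fn: FunctionInfo) -> bool:
    # A body visible in a header is re-parsed by every importing TU,
    # and an implicit instantiation can be regenerated from the template,
    # so neither needs to be duplicated in the serialized AST.
    if fn.defined_in_header:
        return False
    if fn.implicit_instantiation:
        return False
    return True
```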

I think that after similar optimizations it might be feasible to use this approach on LLVM-scale projects as well, and it would be much easier to implement Clang based tools that can utilize cross translation unit capabilities.

In case the analyzer gets a new interprocedural analysis method that increases performance, the users of this framework would profit from it immediately.

Is a framework like this worth mainlining and working on? What do you think?

(Note that AST Importer related improvements are already being mainlined by Yury et al. My question is about the “analysis mode” for exporting ASTs, and a general framework to consume those exported ASTs.)

Regards,

Gábor


I agree that we want cross translation unit analysis to be simpler to implement, but I think that parallelization of single steps will still be key for usability. Thus, I’m not convinced the “one large glob” approach is going to play out (but I might well be wrong).


Sorry for omitting this information: the serialized ASTs are actually not one large glob, but one per translation unit. Are you comfortable with this level of granularity?

Ah, sorry, I misread. That’s definitely more interesting. Any ideas on whether we can also shard this?
For example, can we produce an AST of all directly called functions for each translation unit? (that will contain some denormalization, but on the other hand would enable effectively sharding the computation across machines)
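The sharding idea above amounts to computing, per translation unit, the set of externally defined functions it calls directly, and bundling (denormalized) copies of those into the shard. A sketch with an invented call graph and TU assignment:

```python
def shard_contents(tu_functions, calls):
    """tu_functions: {tu: set of function names defined in it}
    calls: {caller: set of callees}
    Returns {tu: set of external functions whose bodies must be
    bundled into that TU's shard}."""
    defined_in = {f: tu for tu, fns in tu_functions.items() for f in fns}
    shards = {}
    for tu, fns in tu_functions.items():
        external = set()
        for f in fns:
            for callee in calls.get(f, ()):
                if defined_in.get(callee) not in (None, tu):
                    # Denormalized copy of the callee body goes into this shard.
                    external.add(callee)
        shards[tu] = external
    return shards
```

With each shard self-contained, a machine could analyze one TU without touching shared storage, at the cost of storing some function bodies more than once.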


I think right now the serialized ASTs are just binary AST dumps. As far as I remember, during the import there is an assumption that for each function body all the required type declarations/definitions are available. I think it is possible to do sharding, but implementing this is non-trivial and would require a significant amount of engineering. Right now I think the best way (in terms of effort needed to get it done) to distribute the analysis across machines is to export as little redundant information as possible, so that it is feasible to store a copy of the serialized ASTs on each of the machines.


Can you explain what the “two pass analysis” does? Ex: Does it loop through each of the TU second time and “inline” every call from other TUs? In which order are the other TUs loaded? In which order the call sites are processed? Do you repeat until no change? Did you measure coverage in some way? Did you perform path-sensitive checks? (The time of analysis of 4X seems much lower than what I would expect, given that we now explore much deeper paths and the analyzer has exponential running time.)


On first pass we get a bunch of -emit-ast dumps, on second pass we go ahead and analyze each translation unit old-style, but whenever we find an inter-unit CallEvent during path-sensitive analysis, we import the section of the AST dump containing the function body and all dependent sections, and inline the call. The inlined call may later trigger more imports if there are inter-unit calls we'd end up wanting to model.
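The on-demand import described above can be sketched as a cache in front of the function index: the first inter-unit call to a function triggers a deserialization, and later calls reuse the imported copy. `load_body` below is a stand-in for the real AST deserialization:

```python
class LazyBodyLoader:
    """Illustrative sketch of pass 2's lazy cross-TU import (not Clang API)."""

    def __init__(self, index, load_body):
        self.index = index        # function name -> AST dump path
        self.cache = {}           # function name -> imported body
        self.loads = 0            # how many deserializations actually happened
        self.load_body = load_body

    def get_body(self, name):
        if name not in self.cache:
            path = self.index.get(name)
            if path is None:
                # No definition known: the call must be modeled conservatively.
                return None
            self.cache[name] = self.load_body(path, name)
            self.loads += 1
        return self.cache[name]
```

An imported body may itself contain inter-unit calls, which would trigger further `get_body` lookups during path exploration.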

Yeah, benchmarking is a bit more difficult than that. I think Alexey has some complicated numbers. I guess the slowdown is only-4x on path-sensitive checks simply because there are too many drops due to -analyzer-config max-nodes= limit. The most practical measurement would probably be to increase limits until the number of reports stops growing. It's also possible to count number of limit drops, number of exploded nodes constructed, number of bug reports with and without unification, we did some of this but not all.


Yes, path-sensitive checks were running. I forgot to mention: the number of reports was about 3X. I did not have time to evaluate them yet, though.


It is not just the node limit. This solution also maintains a list of which functions were already analyzed (during a call from another TU). This implies that fewer functions are analyzed as top-level functions.

________

My best idea on reducing AST loads is to relax "typedness" requirements on
SVal hierarchy. For example, if an inter-unit function references an
inter-unit static global variable, this variable can probably be
represented as some kind of "untyped VarRegion" (let's call this class
"XTUVarRegion", and inherit it from SubRegion directly, rather than from
TypedValueRegion), and then its type (which may be a complicated class or
template-instantiation declaration) doesn't need to be imported. The
XTUVarRegion should still be uniquely determined by the variable - we need
to know that two different functions imported from that translation unit
refer to the same variable. Not sure - maybe some MemSpace magic may be
employed to control invalidation, maybe we could use separate memory spaces
for different translation units.

I do not like the idea of ignoring type information. First, it would be worth measuring what portion of the ASTs is actually type-related nodes rather than function bodies.


I did some experiments, and unfortunately it looks like less than 10% of the size of the serialized ASTs can be accounted for by function bodies that are in headers.

Hi!

I did some additional benchmarks.

statistics.json (33.4 KB)

Hi!

(As a ping), I would like to summarize the measurements I have done since the original e-mail:

The approach is to first serialize all the translation units to storage and create an index of the functions, and then load them lazily on demand to achieve cross translation unit support. This does not modify the inter-procedural analysis of the Static Analyzer, and it could be used for Clang Tidy as well. Once a new inter-procedural analysis is introduced for the Static Analyzer, the cross-TU support would profit from it immediately.

Benchmarks:

rAthena, a 150k LOC C project:

The size of the serialized ASTs was: 140MB

The size of the indexes: 4.4MB

The time of the analysis was below 4X

The amount of memory consumed was below 2X

The number of reports was about 3 times higher

Xerces, 150k LOC C++ project:
The size of the serialized ASTs was: 800MB
The size of the indexes: 90MB

The analysis time using CTU was half of that without CTU

LLVM + Clang + Clang tools extra:
The size of the serialized ASTs was: 45.4 GB
The size of the indexes: 1.6 GB

Some optimization efforts to reduce the size of the serialized ASTs:
TU ASTs after omitting function bodies from headers: 42.7 GB
TU ASTs after omitting STL: 34.0 GB
TU ASTs after skipping implicit instantiations: 21.5 GB
TU ASTs after omitting STL and implicit instantiations: 16.0 GB

Considering that the build directory of a debug build is also about 40 GB on my platform, I do not consider the size of the serialized ASTs a blocking issue. However, in case it is a concern, according to the attached statistics about the serialized AST dumps, there are some optimization possibilities in the binary format.

This solution also works in a multi-process build/analysis environment. Some of the necessary framework, for example the ASTImporter code, is being accepted into mainline Clang right now.

All in all, this approach:

  • Can discover new bug reports as is.

  • Is feasible to implement; it does not require severe modifications to the Static Analyzer or Clang Tidy.

  • Has acceptable performance for many real-world projects.

I think this would be a useful addition to the Clang source tree. Do you agree?

Regards,

Gábor

statistics.json (33.4 KB)


I definitely think this is interesting - I’d be curious if we could design the interfaces in a way that we can
a) fully split out indexing from analysis
b) allow storing the indexing data in a distributed storage system


> a) fully split out indexing from analysis

In the current implementation, the analysis and the dumping/indexing of ASTs are two separate passes, so they can be used completely independently, which is a very good design choice in my opinion.

> b) allow storing the indexing data in a distributed storage system

I do not have much experience with distributed storage systems, but I think
that should work. The only problem might be the size of the individual AST
dumps.

Yeah, I think my main point is that I’d want the interface on the Clang side not to be coupled to things being on a file system (basically, have an interface for loading the indexed files).


> All in all, this approach:
> • Can discover new bug reports as is.

Hi Gabor,

I’d like to get more data on this point. We know that inlining-style inter-procedural analysis does not scale. That is the main issue with adding the cross-translation unit support to the clang static analyzer.

I suspect that the only way the suggested approach works is that we time out more often when the cross-TU support is enabled. Since the static analyzer performs DFS of the paths, this means that we trade catching bugs localized within a single TU for bugs that require deep cross-TU analysis. I suspect those bugs are much harder to understand, but more importantly, we trade away coverage in a lot of cases instead of gaining it.

Looks like you claim that the number of bugs reported increased for one of the projects, but there is no data for all the other projects. Also, had that project ever been analyzed with the clang static analyzer before, and had some of the intra-procedural bugs been fixed before you made the measurement? I’d like to see the absolute number of bugs reported in both cases. Also, it would be good to measure coverage and the number of times the analyzer times out in both cases. (Those statistics should be easy to generate as the tracking is already implemented by the analyzer and we used it when bringing up interprocedural analysis.)

Thank you,
Anna.

Hi!

(As a ping), I would like to summarize the measurements I done since the
original e-mail:

The approach is to first serialize all the translation units to the
storage, create an index of the functions, and then load them lazily on
demand to achieve cross translation unit support. This does not modify the
inter-procedural analysis of the Static Analyzer and could be used for
Clang Tidy as well. Once a new inter-procedural analysis is introduced for
the Static Analyzer, the cross tu support would profit from it immediately.

Benchmarks:
rAthena, a 150k LOC C project:
The size of the serialized ASTs was: 140MB
The size of the indexes: 4.4MB
The time of the analysis was bellow 4X
The amount of memory consumed was bellow 2X
The number of reports is 3 times more

Xerces, 150k LOC C++ project:
The size of the serialized ASTs was: 800MB
The size of the indexes: 90MB
The analysis time using CTU was the half of the one without CTU

LLVM + Clang + Clang tools extra:
The size of the serialized ASTs was: 45.4 GB
The size of the indexes: 1,6GB

Some optimization effort to reduce te size of the CFG:
TU ASTs after omitting function bodies from headers: 42.7 GB
TU ASTs after omitting STL: 34.0 GB
TU ASTs after skipping implicit instantiations: 21.5 GB
TU ASTs after omitting STL and implicit instantiations: 16.0 GB

Considering that the build directory of a debug build is also about 40 GB
on my platform, I do not consider the size of the serialized ASTs a
blocking issue. However, in case it is a concern, according to the attached
statistics about the serialized AST dumps, there are some optimization
possibilities in the binary format.

This solution also works in a multi process build/analysis environment.
Some of the necessary framework, for example ASTImporter code is being
accepted into mainline clang right now.

All in all, this approach:
- Can discover new bug reports as is.

Hi Gabor,

I’d like to get more data on this point. We know that inlining-style
inter-procedural analysis does not scale. That is the main issue with
adding the cross-translation unit support to the clang static analyzer.

I suspect that the only way the suggested approach works is that we time
out more often when the TU support is enabled. Since the static analyzer
performs DFS of the paths, this means that we trade catching bugs localized
within a single TU to bugs that require deep cross-TU analysis. I suspect
those bugs are much harder to understand, but more importantly, we trade
coverage in a lot of cases instead of gaining it.

Hi Anna,

Good point. Using the cross-TU analysis, the analyzer has a very different
coverage pattern. I did not have time to look into the reports yet, but
will do so soon.

Looks like you claim that the number of bugs reported increased for one of
the projects, but there is no data for all the other projects. Also, had
that project ever been analyzed with the clang static analyzer before and
had some of the intra-procedural bugs been fixed before you made the
measurement?

There is no trace of fixed static analyzer warnings in the commit logs of
the rAthena project. But even if issues found without cross-TU analysis had
been fixed, it can still be useful to analyze the project with a different
coverage pattern.

I’d like to see the absolute number of bugs reported in both cases. Also,
it would be good to measure coverage and the number of times the analyzer
times out in both cases. (Those statistics should be easy to generate as
the tracking is already implemented by the analyzer and we used it when
bringing up inter procedural analysis.)

I did some measurements, and here are the results for the rAthena project:

Coverage results without cross translation unit:

105499 AnalysisConsumer - The # of basic blocks in the analyzed functions.
4167 AnalysisConsumer - The # of functions and blocks analyzed (as top level with inlining turned on).
10084 AnalysisConsumer - The # of functions at top level.
60.0244651798 AnalysisConsumer - The % of reachable basic blocks.
3360 AnalysisConsumer - The maximum number of basic blocks in a function.
960440 CoreEngine - The # of paths explored by the analyzer.
140425349 CoreEngine - The # of steps executed.
646201 ExprEngine - The # of aborted paths due to reaching the maximum block count in a top level function.
24317519 ExprEngine - The # of times RemoveDeadBindings is called.
489956 ExprEngine - The # of times we inlined a call.
20300 ExprEngine - The # of times we re-evaluated a call without inlining.

Coverage results with cross translation unit:

150406 AnalysisConsumer - The # of basic blocks in the analyzed functions.
3308 AnalysisConsumer - The # of functions and blocks analyzed (as top level with inlining turned on).
9829 AnalysisConsumer - The # of functions at top level.
48.9545495971 AnalysisConsumer - The % of reachable basic blocks.
3360 AnalysisConsumer - The maximum number of basic blocks in a function.
2088751 CoreEngine - The # of paths explored by the analyzer.
345875017 CoreEngine - The # of steps executed.
464409 ExprEngine - The # of aborted paths due to reaching the maximum block count in a top level function.
64756893 ExprEngine - The # of times RemoveDeadBindings is called.
2176922 ExprEngine - The # of times we inlined a call.
44215 ExprEngine - The # of times we re-evaluated a call without inlining.

Note that the results are not fully precise, for the following reasons:
* I simply collected and merged the results for each translation unit. In
the cross translation unit case a function might be analyzed from multiple
translation units, which may introduce duplicates in the number of analyzed
basic blocks. Inline functions in headers might cause the same effect even
when cross translation unit analysis is turned off.
* The coverage % is also imprecise. Basic blocks not reached in one TU
might be reached from another TU during the analysis.

Even though these numbers are just hints, they show about 50% increase in
the number of analyzed blocks. This makes me think that the overall
coverage is increased (however as you pointed out, the pattern is
different). It is also worth noting that the number of functions analyzed
as top level decreased.

In case the issues reported this way are too hard to understand, the
heuristics could be tuned for when this feature is turned on. This is no
silver bullet, of course, but once a new interprocedural analysis is
introduced, this framework would provide an easy way to evaluate it in a
cross-TU setting.

What do you think?

Regards,
Gábor

Hi!

A side note: it might be worth considering IDDFS instead of DFS (https://en.wikipedia.org/wiki/Iterative_deepening_depth-first_search) to get more optimal coverage patterns regardless of the "shape" of the available code.

What do you think?

Regards,

Gábor

Tweaking the current search algorithm is a good thing to try, but it's clearly a matter of experimentation much more than rational reasoning. In fact, I have no idea why exactly we are doing depth-first nowadays, but as far as I remember it was much worse when we tried breadth-first.

It seems that because most path-sensitive bugs manifest themselves on a huge number of different paths, the approach that tries to take at least one of those paths [before hitting some limit] turns out to be superior.