Full-program analysis with Clang

Hi all,
I am interested in using Clang to write a checker that reads custom annotations (we are considering using attributes rather than pragmas) to guarantee the absence of certain kinds of bugs. Once that is done, the next step would be to infer as many of the annotations as possible, but this will require whole-program analysis. I know that Clang works one translation unit (TU) at a time, so I was wondering if there is any advice on how to go about it. One option might be to serialize the ASTs of the different TUs, merge them, and analyze the whole program offline (after compilation) to prove the desired safety guarantees. It may actually be possible to perform the inference per TU, as long as any cyclic call-graph dependencies stay within a single TU.
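To make the last point concrete: "cyclic call-graph dependencies stay within a single TU" is the same as saying that every strongly connected component (SCC) of the call graph is defined in one TU. Below is a minimal sketch of that check using Tarjan's SCC algorithm. It assumes the call edges and the defining TU of each function have already been collected somehow (this still needs cross-TU call edges, though not full ASTs). The `CallGraph` struct and the function names are hypothetical.

```cpp
#include <algorithm>
#include <map>
#include <set>
#include <string>
#include <vector>

// Hypothetical whole-program call graph: caller -> direct callees, plus the
// TU that defines each function. Callees with no entry in definingTU are
// treated as external declarations (e.g. library functions).
struct CallGraph {
  std::map<std::string, std::vector<std::string>> callees;
  std::map<std::string, std::string> definingTU;
};

// Tarjan's algorithm. SCCs come out callees-first (reverse topological order).
class SCCFinder {
 public:
  explicit SCCFinder(const CallGraph& g) : g_(g) {}

  std::vector<std::vector<std::string>> run() {
    for (const auto& entry : g_.definingTU)
      if (!index_.count(entry.first)) visit(entry.first);
    return sccs_;
  }

 private:
  void visit(const std::string& fn) {
    index_[fn] = low_[fn] = counter_++;
    stack_.push_back(fn);
    onStack_.insert(fn);
    auto it = g_.callees.find(fn);
    if (it != g_.callees.end()) {
      for (const auto& callee : it->second) {
        if (!index_.count(callee)) {
          visit(callee);
          low_[fn] = std::min(low_[fn], low_[callee]);
        } else if (onStack_.count(callee)) {
          low_[fn] = std::min(low_[fn], index_[callee]);
        }
      }
    }
    if (low_[fn] == index_[fn]) {  // fn is the root of an SCC: pop the SCC
      std::vector<std::string> scc;
      std::string top;
      do {
        top = stack_.back();
        stack_.pop_back();
        onStack_.erase(top);
        scc.push_back(top);
      } while (top != fn);
      sccs_.push_back(scc);
    }
  }

  const CallGraph& g_;
  std::map<std::string, int> index_, low_;
  std::vector<std::string> stack_;
  std::set<std::string> onStack_;
  std::vector<std::vector<std::string>> sccs_;
  int counter_ = 0;
};

// True if every SCC (every group of mutually recursive functions) is
// defined in a single TU, i.e. inference could proceed TU by TU.
bool cyclesAreTULocal(const CallGraph& g) {
  for (const auto& scc : SCCFinder(g).run()) {
    std::set<std::string> tus;
    for (const auto& fn : scc) {
      auto it = g.definingTU.find(fn);
      if (it != g.definingTU.end()) tus.insert(it->second);  // skip externals
    }
    if (tus.size() > 1) return false;
  }
  return true;
}
```

If the check passes, the SCCs could be processed callees-first, each inside the TU that defines it. Tarjan's algorithm happens to emit them in exactly that order.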

Any feedback/brainstorming will be greatly appreciated!

Cheers!
Alex

Are you planning to write a static analyzer checker? In general, we are interested in adding whole-program analyses to the analyzer, but it's an ambitious project.

Scalability-wise, developing summary-based analyses is better than serializing the AST. (Very simply put, you would analyze each function separately, store the function summaries, and then re-analyze all the functions, taking the generated information into account. If you have cyclic dependencies, you may need to repeat the process more than once.)
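The summary-based scheme outlined above amounts to a fixpoint iteration over per-function summaries. Below is a minimal sketch. It assumes a made-up boolean property, "may block": a function may block if it directly calls a known blocking primitive, or if it calls anything that may block. `FunctionFacts`, `Summaries` and the property itself are illustrative and not part of any Clang API.

```cpp
#include <cassert>
#include <map>
#include <string>
#include <vector>

// Hypothetical facts a local, single-function pass could extract.
struct FunctionFacts {
  bool directlyBlocks = false;       // calls a known blocking primitive itself
  std::vector<std::string> callees;  // direct calls found in the body
};

// Function name -> summary ("may this function block?").
using Summaries = std::map<std::string, bool>;

// Analyze one function from its own facts plus the current callee summaries.
// Callees without a summary (e.g. external functions) are assumed not to block.
bool analyzeFunction(const FunctionFacts& f, const Summaries& s) {
  if (f.directlyBlocks) return true;
  for (const auto& callee : f.callees) {
    auto it = s.find(callee);
    if (it != s.end() && it->second) return true;
  }
  return false;
}

// Re-analyze every function until no summary changes (a fixpoint).
Summaries computeSummaries(const std::map<std::string, FunctionFacts>& program) {
  Summaries s;
  for (const auto& entry : program) s[entry.first] = false;
  bool changed = true;
  while (changed) {
    changed = false;
    for (const auto& entry : program) {
      bool result = analyzeFunction(entry.second, s);
      // Monotone: a summary can go from false to true, never back.
      assert(result || !s[entry.first]);
      if (result && !s[entry.first]) {
        s[entry.first] = true;
        changed = true;
      }
    }
  }
  return s;
}
```

Because a summary can only change from `false` to `true`, the loop always terminates. Mutually recursive functions just take an extra round to settle. A real implementation would probably visit functions callees-first (for example, SCC by SCC) to cut down the number of rounds.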

This thread also discusses options for working with multiple TUs:

http://clang-developers.42468.n3.nabble.com/C-analysis-with-Clang-td4024252.html

Hi Anna,

Thanks for the link.

We are planning to write a static analyzer that performs its analysis with the aid of limited user annotations. It would be good to fit it into the static analyzer infrastructure so that we have access to the bug reporter, the CFG, etc. The missing piece for us is whole-program analysis support in the static analyzer (which, as you pointed out, is a big project). At least from an infrastructure point of view, the current static analyzer is invoked per file, so each analysis knows nothing about the other TUs.

In the beginning our analysis will be quite simple. For now it is neither flow- nor path-sensitive, and it would only need access to the AST to construct the call graph (Alex, please correct me if I am wrong). So it looks like, even without all the nice features of the static analyzer, we can still build the tool using LibTooling and AST serialization, and hopefully have the whole-program image ready to consume at analysis time.
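Since the first version only needs the call graph, the checking half of the tool can be small once a whole-program call graph exists. Below is a minimal sketch of a flow-insensitive check. It assumes a hypothetical annotation `no_alloc` and the rule that annotated functions may only call other annotated functions. In source, the marker might be spelled with Clang's generic `annotate` attribute, e.g. `__attribute__((annotate("no_alloc")))`. `CallMap`, `Violation` and `checkNoAlloc` are illustrative names.

```cpp
#include <map>
#include <set>
#include <string>
#include <utility>
#include <vector>

// Whole-program call graph (caller -> direct callees), e.g. merged from
// per-TU passes. Names are illustrative.
using CallMap = std::map<std::string, std::vector<std::string>>;
using Violation = std::pair<std::string, std::string>;  // (caller, callee)

// Flow-insensitive check: a function annotated "no_alloc" may only call
// functions that are themselves annotated "no_alloc".
std::vector<Violation> checkNoAlloc(const CallMap& graph,
                                    const std::set<std::string>& annotated) {
  std::vector<Violation> violations;
  for (const auto& entry : graph) {
    if (!annotated.count(entry.first)) continue;  // only annotated callers
    for (const auto& callee : entry.second)
      if (!annotated.count(callee)) violations.emplace_back(entry.first, callee);
  }
  return violations;
}
```

Each reported pair would become one diagnostic. Unannotated callers are not checked at all, which keeps the annotation burden limited to the code that opts in.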

Cheers

Michael

We do (very simple) full-program (well, full-code-base) analysis by outputting the locally determinable aspects of entities in the code and then reducing that information in a subsequent pass. Getting the full AST for multiple TUs in C++ seems to be pretty much impossible in general, and even if it works, it doesn't sound like it would scale well.
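This is essentially a map/reduce: each TU emits only what it can determine locally, and a later pass combines the results. Below is a minimal sketch of the reduce step. It assumes a hypothetical plain-text fact format emitted per TU, one fact per line. A real tool would more likely key entities by something unambiguous, such as mangled names or Clang USRs, rather than plain function names.

```cpp
#include <map>
#include <set>
#include <sstream>
#include <string>
#include <vector>

// Merged view produced by the reduce step.
struct ProgramFacts {
  std::map<std::string, std::set<std::string>> callees;      // caller -> callees
  std::map<std::string, std::set<std::string>> annotations;  // fn -> annotations
};

// Reduce step: fold the fact lines emitted by each TU into one table.
// Hypothetical line format:
//   call <caller> <callee>
//   attr <function> <annotation>
// Sets de-duplicate facts reported by several TUs (e.g. inline functions
// defined in a shared header).
ProgramFacts reduceFacts(const std::vector<std::string>& perTUOutputs) {
  ProgramFacts merged;
  for (const auto& output : perTUOutputs) {
    std::istringstream lines(output);
    std::string line;
    while (std::getline(lines, line)) {
      std::istringstream fields(line);
      std::string kind, a, b;
      if (!(fields >> kind >> a >> b)) continue;  // skip blank/malformed lines
      if (kind == "call") merged.callees[a].insert(b);
      else if (kind == "attr") merged.annotations[a].insert(b);
    }
  }
  return merged;
}
```

Because each TU's output is independent, the per-TU step can run in parallel with the normal build, and only this comparatively cheap merge needs to see the whole code base.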

Cheers,
/Manuel