Idea for better invoking static analysis via command line

As mentioned by Aaron Ballman, Philip Reames, and me in replies to “Proposal: Integrate static analysis test suites”, the fact that static analysis generates a totally different set of warnings than compilation (not a superset) is surprising to some.

One possibility, in order to preserve the current behavior for any tools that rely on this, is to add an option to clang, something like “-enable-analyze-pass” that the user can specify to run analysis AND compilation.

Thoughts?

I’m all for this idea. There is precedent for this in other tools (Visual Studio’s /analyze). I think it also greatly reduces the need for build interposition via scan-build.

I would ask that you think carefully about the output format of the detailed analysis for -enable-analyze-pass. If people are using -enable-analyze-pass on most of their builds, then plist and html reports are likely to go unread for the most part. Consider making “no detailed analysis” an option for -enable-analyze-pass to help with these use cases.

Another possible (not mutually exclusive) extension point to add static analysis into would be through the Tooling interface and compilation databases - this would allow just the analysis to be run, without making it part of a build (no one would make static analysis part of an interactive build, it’s too slow, right? the only reason it was integrated into the build with scan-build was because it was the best way to discover the build commands - but the Tooling/Compilation Database system allows us to separate build discovery from tool execution)

The judgment of “how much slower is too much slower” is pretty subjective, and depends on the project. Increasing a project’s build from 30 seconds to 5 minutes is probably too much for that project. Increasing a project’s build from 45 minutes to 55 minutes is probably fine though.

Adding a flag to a build is also a much lower barrier to entry to get started. There are a lot of build systems out there, and scan-build and compilation databases only work easily with a few of those systems. Adding a build flag is pretty easy on every build system that I’ve ever worked with.

If the work is done right, then the combined compile+analyze execution could be faster than a compile action followed by an analyze action. If you’re willing to give up “#ifndef __clang_analyzer__”, then the AST from the compile can be reused by the analyzer. Even if you aren’t willing to give up that feature, doing a compile of a file immediately followed by an analysis of the same file is probably going to be faster than doing all the compiles and then all the analyses, due to disk caching effects.

The static analyzer has a “shallow” mode designed to be fast enough to be included as part of normal builds. This can be enabled by passing '-Xclang -analyzer-config -Xclang mode=shallow' to clang --analyze. Apple exposes this as a setting in Xcode and people do use it.

Devin

I’m still not quite used to the archaic mailing list format, but I thought the proper thing to do when replying to a message on a mailing list is to do a reply all? I totally missed these three messages, because they were part of the silly “batched” messages.

Consider making “no detailed analysis” an option for -enable-analyze-pass to help with these use cases

Eh? Do you mean less detailed output or less detailed analysis done by clang?

Another possible (not mutually exclusive) extension point to add static
analysis into would be through the Tooling interface and compilation
databases - this would allow just the analysis to be run, without making it
part of a build (no one would make static analysis part of an interactive
build, it’s too slow, right? the only reason it was integrated into the
build with scan-build was because it was the best way to discover the build
commands - but the Tooling/Compilation Database system allows us to
separate build discovery from tool execution)

I’m not (yet) terribly familiar with the tooling interface. What exactly do you mean? Also: what do you mean by “interactive build”?

Adding a flag to a build is also a much lower barrier to entry to get
started.

Example #1: I don’t have Perl. Not many machines have Perl. That makes scan-build problematic.

“Not many” is relative to Python. Python ate the world.

If the work is done right, then the combined compile+analyze execution
could be faster than a compile action followed by an analyze action. If
you’re willing to give up “#ifndef __clang_analyzer__”, then the AST
from the compile can be reused by the analyzer.

Not just faster, but also better. Currently the analyzer makes some inlining decisions on its own; currently the analyzer can’t make use of any kind of analysis or folding done by the optimization passes.

I’ll CC some other static analysis people (more knowledgeable & important than me).

Neither Python nor Perl is in the FreeBSD base system. Additionally, we can very easily add flags to our build system, but we can’t reliably run it to completion with something that interposes on CC, because various things define a different CC for different things.

Having a well-documented set of flags for static analysis would also make it much easier to integrate with systems such as CMake. I’d love to be able to kick off a build in our CI systems for things that use CMake that would do the build and analysis in parallel, with neither blocking the other, and be able to start running tests once the build has finished, even if the static analysis is still ongoing. All of the dependency metadata for doing this exists in our build system, none of it is easily exploited by scan-build (for things that use CMake, all of it could be extracted from the generated JSON, but it would be nicer to just have some separate targets that ninja knew were independent top-level things).

David

I mean analysis that just emits diagnostics on the console, but doesn’t emit a .plist or .html report.

Adding a flag to a build is also a much lower barrier to entry to get
started.

Example #1: I don’t have Perl. Not many machines have Perl. That makes scan-build problematic.

“Not many” is relative to Python. Python ate the world.

Neither Python nor Perl is in the FreeBSD base system. Additionally, we can very easily add flags to our build system, but we can’t reliably run it to completion with something that interposes on CC, because various things define a different CC for different things.

The new scan-build rewrite can interpose on the build system without interposing on CC. It produces a compilation database as output.
http://llvm.org/viewvc/llvm-project?view=revision&revision=257533
http://reviews.llvm.org/D9600

As I’ve explained in the other thread (http://clang-developers.42468.n3.nabble.com/Proposal-Integrate-static-analysis-test-suites-td4048967.html), there are reasons to discourage usage of the static analyzer directly from the command line:

"Most importantly, end users should never invoke the analyzer by calling “clang --analyze” since “clang --analyze” is an implementation detail of the static analyzer. The only advertised user-facing clang static analysis tool is scan-build (see http://clang-analyzer.llvm.org). Here are some reasons for that. For one, it is almost impossible to understand why the static analyzer warns without examining the error paths. Second, the analyzer could be extended to perform whole project analysis in the future and “clang --analyze” works with a single TU at a time.

I agree that the best user experience is to report all warnings in one place, while still differentiating which warning was reported by which tool. It would be awesome if the results from all bug finding tools such as the clang static analyzer, the compiler, and clang-tidy would be reported through the same interface.

The CodeChecker team is working on a solution for that and I hope we can incorporate their technology in LLVM/clang.
"

Having a well-documented set of flags for static analysis would also make it much easier to integrate with systems such as CMake. I’d love to be able to kick off a build in our CI systems for things that use CMake that would do the build and analysis in parallel, with neither blocking the other, and be able to start running tests once the build has finished, even if the static analysis is still ongoing. All of the dependency metadata for doing this exists in our build system, none of it is easily exploited by scan-build (for things that use CMake, all of it could be extracted from the generated JSON, but it would be nicer to just have some separate targets that ninja knew were independent top-level things).

As others mentioned in that thread, even though we do not encourage using 'clang --analyze', the options are documented in clang help, so you could integrate it into your build system. The main issues would be hard-to-understand results and the possibility that the integration is going to break in the future.

How user-friendly is the text output? Has anyone analyzed a large codebase, triaged the results, and fixed the reported bugs relying only on the text output?

If you have a large, “dirty” code base, then the text output on its own isn’t going to be terribly useful.

However, you can get useful information from the text output if you start with a code base that already runs cleanly through the static analyzer. The warning is likely pointing at some code that you just modified. An example from one of the tests…

 long *lp1 = malloc(sizeof(short)); // expected-warning {{Result of 'malloc' is converted to a pointer of type 'long', which is incompatible with sizeof operand type 'short'}}

That warning text on the command line is useful by itself, especially if it is fresh code.
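
For fresh code like that, the fix usually follows directly from the message. Below is a minimal sketch of one common way to avoid the mismatch, by tying the allocation size to the pointer itself. The function name `make_long` is invented for this illustration and does not come from the analyzer's test suite.

```c
#include <stdlib.h>

/* The line from the quoted test allocates sizeof(short) bytes but stores
 * the result in a long pointer:
 *
 *     long *lp1 = malloc(sizeof(short));
 *
 * Writing the size as sizeof *p keeps the allocation in step with the
 * pointee type, so the two cannot drift apart. */

/* Hypothetical helper: returns a heap-allocated long holding value,
 * or NULL if the allocation fails. The caller frees the result. */
long *make_long(long value)
{
    long *p = malloc(sizeof *p);
    if (p != NULL) {
        *p = value;
    }
    return p;
}
```

The `sizeof *p` form is a style choice, not something the thread prescribes; writing `sizeof(long)` would also make the operand type agree with the pointer type.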

Some checkers may produce less useful output. That’s fine, and I can understand why we may not want to make stdout-only the default as a result. I do think it’s useful as an option though.

The new scan-build rewrite can interpose on the build system without interposing on CC. It produces a compilation database as output.
http://llvm.org/viewvc/llvm-project?view=revision&revision=257533
http://reviews.llvm.org/D9600

Do you foresee the Python version of scan-build being suitable for use in Xcode? Would you expect the Python version of scan-build to be suitable for a Visual Studio build that uses the clang front end?

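As I’ve explained in the other thread (http://clang-developers.42468.n3.nabble.com/Proposal-Integrate-static-analysis-test-suites-td4048967.html), there are reasons to discourage usage of the static analyzer directly from the command line: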

"Most importantly, end users should never invoke the analyzer by calling “clang --analyze” since “clang --analyze” is an implementation detail of the static analyzer. The only advertised user-facing clang static analysis tool is scan-build (see http://clang-analyzer.llvm.org). Here are some reasons for that. For one, it is almost impossible to understand why the static analyzer warns without examining the error paths. Second, the analyzer could be extended to perform whole project analysis in the future and “clang --analyze” works with a single TU at a time.

Yes, --analyze is currently an implementation detail. I would prefer that not be the case. It is my opinion that --analyze (or something like it) should be the supported user interface, and that scan-build should be a client of that interface. The same HTML and/or plist (or neither!) could be generated from the command line. With the current state of support, we have high-profile customers (Xcode) of static analysis using unsupported flags.

As for post-processing and whole program analysis, I think the best user experience here would be to embed some meta-data in .o files, and let the clang-driver orchestrate the post-processing / WPA during the linking step. This closely parallels how link-time-optimization works, and still makes for a work-flow with a low barrier to entry. Just put --enable-analyze-pass in the compile line and the link line, and you can still get the nice index.html and whole program analysis without any extra tools.

As for post-processing and whole program analysis, I think the best user experience here would be to embed some meta-data in .o files, and let the clang-driver orchestrate the post-processing / WPA during the linking step. This closely parallels how link-time-optimization works, and still makes for a work-flow with a low barrier to entry. Just put --enable-analyze-pass in the compile line and the link line, and you can still get the nice index.html and whole program analysis without any extra tools.

That’s exactly what I’m thinking.

Somewhat related, although slightly out of date as we’re actually getting modules in C++17: “Large Code Base Change Ripple Management in C++: My thoughts on how a new Boost C++ Library could help” (Niall Douglas)

The new scan-build rewrite can interpose on the build system without interposing on CC. It produces a compilation database as output.
http://llvm.org/viewvc/llvm-project?view=revision&revision=257533
http://reviews.llvm.org/D9600

Do you foresee the Python version of scan-build being suitable for use in Xcode? Would you expect the Python version of scan-build to be suitable for a Visual Studio build that uses the clang front end?

The Xcode build system (xcodebuild) directly integrates the static analyzer. It knowingly relies on the “implementation detail” and we’ll be ready to update our integration if WPA is added. The reason for this is that we are not afraid to break Xcode, which we control. Xcode IDE directly consumes the plist files. The new scan-build will have support for xcodebuild.

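As I’ve explained in the other thread (http://clang-developers.42468.n3.nabble.com/Proposal-Integrate-static-analysis-test-suites-td4048967.html), there are reasons to discourage usage of the static analyzer directly from the command line: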

"Most importantly, end users should never invoke the analyzer by calling “clang --analyze” since “clang --analyze” is an implementation detail of the static analyzer. The only advertised user-facing clang static analysis tool is scan-build (see http://clang-analyzer.llvm.org). Here are some reasons for that. For one, it is almost impossible to understand why the static analyzer warns without examining the error paths. Second, the analyzer could be extended to perform whole project analysis in the future and “clang --analyze” works with a single TU at a time.

Yes, --analyze is currently an implementation detail. I would prefer that not be the case. It is my opinion that --analyze (or something like it) should be the supported user interface, and that scan-build should be a client of that interface. The same HTML and/or plist (or neither!) could be generated from the command line. With the current state of support, we have high-profile customers (Xcode) of static analysis using unsupported flags.

Ok. I agree that it would be beneficial to officially support 'clang --analyze' as an option to analyze a single translation unit. We’ve essentially been doing that for years and many do rely on it whether we want it or not. I think the default output should be either plist or some other format that preserves the same level of information (JSON?) because all the other formats are less precise.

As for post-processing and whole program analysis, I think the best user experience here would be to embed some meta-data in .o files, and let the clang-driver orchestrate the post-processing / WPA during the linking step. This closely parallels how link-time-optimization works, and still makes for a work-flow with a low barrier to entry. Just put --enable-analyze-pass in the compile line and the link line, and you can still get the nice index.html and whole program analysis without any extra tools.

Performing WPA at the link step is not necessarily the right design option and I would not want to commit to it before the feature is designed. Moreover, most build systems call the linker directly, so the driver option might not buy us much anyway. This means that users may need to revisit their integration once WPA is added (given that they’d want to take advantage of that feature).

Consider making “no detailed analysis” an option for -enable-analyze-pass to help with these use cases

Eh? Do you mean less detailed output or less detailed analysis done by clang?

I mean analysis that just emits diagnostics on the console, but doesn’t emit a .plist or .html report.

How user-friendly is the text output? Has anyone analyzed a large codebase, triaged the results, and fixed the reported bugs relying only on the text output?

If you have a large, “dirty” code base, then the text output on its own isn’t going to be terribly useful.

However, you can get useful information from the text output if you start with a code base that already runs cleanly through the static analyzer. The warning is likely pointing at some code that you just modified. An example from one of the tests…

long *lp1 = malloc(sizeof(short)); // expected-warning {{Result of 'malloc' is converted to a pointer of type 'long', which is incompatible with sizeof operand type 'short'}}

This is not a path-sensitive check, whereas most of the analyzer checks are path-sensitive. It might be hard for people to understand the warnings by just looking at the text output, which has mainly been designed for lit testing. This is also a problem with advertising ‘clang --analyze’ to end users, who are not likely to implement their own plist viewers. Scan-build is a much better “batteries included” option.
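
To illustrate why a path-sensitive report is harder to read from one line of text, here is a small sketch. The function `copy_or_null` is hypothetical and written only for this example. The code as shown is correct; the comment marks the spot where removing the null check would leave a defect that depends on which branches were taken.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper: returns a heap copy of src when want_copy is
 * nonzero, otherwise NULL. The caller frees the result. */
char *copy_or_null(const char *src, int want_copy)
{
    char *buf = NULL;

    if (want_copy) {
        buf = malloc(strlen(src) + 1);
    }

    /* If this check were removed, buf could be null at the strcpy below
     * for two different reasons: want_copy was zero, or malloc failed.
     * Explaining such a defect means describing the branch decisions
     * that lead to it, which is what the error path in a plist or HTML
     * report carries and a single line of console text does not. */
    if (buf == NULL) {
        return NULL;
    }

    strcpy(buf, src);
    return buf;
}
```

By contrast, the sizeof mismatch quoted earlier is visible on a single line, which is why its one-line message is enough on its own.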

If the work is done right, then the combined compile+analyze execution
could be faster than a compile action followed by an analyze action. If
you’re willing to give up “#ifndef __clang_analyzer__”, then the AST
from the compile can be reused by the analyzer.

We cannot drop support for “#ifndef __clang_analyzer__”, which many users rely on for false positive suppression.
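
For readers who have not used the macro, here is a minimal sketch of that kind of suppression. The function `count_nonzero` is invented for the example, and the premise that the loop triggers a false positive is an assumption made purely for illustration.

```c
#include <stddef.h>

/* Hypothetical function: counts the nonzero entries in values[0..n). */
size_t count_nonzero(const int *values, size_t n)
{
    size_t count = 0;

#ifndef __clang_analyzer__
    /* Compiled in a normal build, but skipped when the analyzer
     * preprocesses the file, because the analyzer defines
     * __clang_analyzer__. Assume, for this sketch only, that the loop
     * produces a report the project has triaged as a false positive. */
    for (size_t i = 0; i < n; ++i) {
        if (values[i] != 0) {
            ++count;
        }
    }
#endif

    return count;
}
```

Because the guarded block is present in the compile but absent in the analysis, the two runs see different source after preprocessing. That is why reusing the compile's AST for analysis would require giving up this macro, as noted in the quoted text.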

Not just faster, but also better. Currently the analyzer makes some inlining decisions on its own; currently the analyzer can’t make use of any kind of analysis or folding done by the optimization passes.

Since the analyzer works on the AST level, it will not be able to utilize the llvm analysis.