Proposal: Integrate static analysis test suites

First-time Clang contributor here,

I'd like to add the "C Test Suite for Source Code Analyzer v2", a
relatively small test suite (102 cases/flaws), some of which Clang
doesn't yet detect*. See link at bottom.

Immediate questions:
0. Does the Clang community/project like the idea?
1. What's the procedure for including new tests? (Not the technical
procedure, but the community/project one.)
2. How do I include failing tests without breaking things? Some of
these tests will fail - that's why I'm proposing their inclusion - but
they shouldn't yet cause the regression testing system to complain.
3. How does Clang handle licensing of third party code? Some of these
tests are clearly in the public domain (developed at NIST, says "in
the public domain"), but others are less clearly licensed.

Should the community accept this test suite, and should I successfully
add it, I'd then like to step it up a bit and include the "Juliet Test
Suite for C/C++". "Juliet" is a huge test suite by the NSA Center for
Assured Software & NIST's Software Assurance Metrics And Tool
Evaluation project, with 25,477 test cases (!!) covering 118 CWEs. I
don't think any other open source compiler could compete with Clang
after this. There's a ton of literature on the "Juliet" suite, so
there's no need to list it here.

This project would be my first Clang contribution :slight_smile:

Personally, I'm interested in static analysis, and this is the first
step in understanding & improving Clang's static analysis
capabilities.

I have some ideas on how to detect the currently undetected bugs, and
I'm curious to see where things lead.

Secondary questions:
1. How should I break the new tests up into patches? Should I just
whack the whole 102-case suite into a single patch, or split it into a
bunch of smaller ones?
2. How does the Clang/LLVM static analysis testing infrastructure
work? I'm going to have to figure this out myself anyways, but where
should I start? Any tips on adding new tests?

*If I remember correctly,
Test Case 149055 - NIST Software Assurance Reference Dataset passes
analysis without complaint. I manually spot-checked a very small
number of tests.

"C Test Suite for Source Code Analyzer v2" (valid code):

"C Test Suite for Source Code Analyzer v2" (invalid code):

"Juliet Test Suite for C/C++" (files):
https://samate.nist.gov/SRD/testsuites/juliet/Juliet_Test_Suite_v1.2_for_C_Cpp.zip
"Juliet Test Suite for C/C++" (docs):
https://samate.nist.gov/SRD/resources/Juliet_Test_Suite_v1.2_for_C_Cpp_-_User_Guide.pdf

Sincerely,
Alexander Riccio

First-time Clang contributor here,

I'd like to add the "C Test Suite for Source Code Analyzer v2", a
relatively small test suite (102 cases/flaws), some of which Clang
doesn't yet detect*. See link at bottom.

Immediate questions:
0. Does the Clang community/project like the idea?

I've included a few other devs (CCed) to get further opinions.

I like the idea of being able to diagnose the issues covered by the
test suite, but I don't think including the test suite by itself is
particularly useful without that goal in mind. Also, one question I
would have has to do with the licensing of the tests themselves and
whether we would need to do anything special there.

1. What's the procedure for including new tests? (Not the technical
procedure, but the community/project one.)

Getting the discussion going about the desired goal (as you are doing)
is the right first step.

2. How do I include failing tests without breaking things? Some of
these tests will fail - that's why I'm proposing their inclusion - but
they shouldn't yet cause the regression testing system to complain.

Agreed, any test cases that are failing would have to fail gracefully.
I assume that by failure, you mean "should diagnose in some way, but
currently does not". I would probably split the tests into two types:
one set of tests that properly diagnose the issue (can be checked with
FileCheck or -verify, depending on the kind of tests we're talking
about), and one set of tests where we do not diagnose, but want to see
them someday (which can be tested with expected-no-diagnostics, for
example). This way, we can ensure test cases continue to diagnose when
we want them to, and we can be alerted when new diagnostics start to
catch previously uncaught tests. This all assumes it makes sense to
include every test at once, which may not hold in practice.
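
To make that split concrete, here's a rough sketch of the two kinds of
tests (the RUN lines, checker name, and warning text are illustrative
guesses on my part, not lines from the actual suite):

  // Kind 1, in its own file: the analyzer already diagnoses the flaw,
  // and -verify pins the diagnostic down.
  // RUN: %clang_cc1 -analyze -analyzer-checker=core -verify %s
  int diagnosed(int x) {
    if (x == 0)
      return 1 / x; // expected-warning {{Division by zero}}
    return 0;
  }

  // Kind 2, in a separate file: a real flaw we don't catch yet. This
  // file passes -verify today, and starts failing (telling us to move
  // it into kind 1) as soon as a new checker catches the bug.
  // RUN: %clang_cc1 -analyze -analyzer-checker=core -verify %s
  // expected-no-diagnostics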

3. How does Clang handle licensing of third party code? Some of these
tests are clearly in the public domain (developed at NIST, says "in
the public domain"), but others are less clearly licensed.

Oh look, you asked the same question I asked. :wink: If the tests are in
the public domain and clearly state as such, I think we can go ahead
and include them. If the other tests are not clearly licensed, we
should try to get NIST to clarify the license of them before
inclusion. Depending on the license, we may be able to include them
under their original license. If we cannot clarify the license, I
would guess that we simply should not include those tests as part of
our test suite. Note: I could be totally wrong, IANAL. :slight_smile:

Should the community accept this test suite, and should I successfully
add it, I'd then like to step it up a bit and include the "Juliet Test
Suite for C/C++". "Juliet" is a huge test suite by the NSA Center for
Assured Software & NIST's Software Assurance Metrics And Tool
Evaluation project, with 25,477 test cases (!!) covering 118 CWEs. I
don't think any other open source compiler could compete with Clang
after this. There's a ton of literature on the "Juliet" suite, so
there's no need to list it here.

This project would be my first Clang contribution :slight_smile:

Personally, I'm interested in static analysis, and this is the first
step in understanding & improving Clang's static analysis
capabilities.

I have some ideas on how to detect the currently undetected bugs, and
I'm curious to see where things lead.

Adding the tests by themselves is not necessarily interesting to the
project unless they exercise the compiler in ways it's not currently
being exercised. So just having tests for the sake of having the tests
is not too useful (IMO). However, if the goal is to have the tests
because you would like to make efforts to have the compiler diagnose
their cases properly, that's far more interesting and a good reason to
bring in the tests.

One possible approach if you are interested in having the compiler
diagnose the cases is to bring the tests in one at a time. Start with
the initial batch of "these are diagnosed properly", then move on to
"this test is diagnosed properly because of this patch." Eventually
we'll get to the stage where all of the tests are diagnosed properly.

Secondary questions:
1. How should I break the new tests up into patches? Should I just
whack the whole 102-case suite into a single patch, or split it into a
bunch of smaller ones?

See comments above.

2. How does the Clang/LLVM static analysis testing infrastructure
work? I'm going to have to figure this out myself anyways, but where
should I start? Any tips on adding new tests?

http://clang-analyzer.llvm.org/checker_dev_manual.html

Another good place for some of these checkers may be clang-tidy, or
the compiler frontend itself; it will likely depend on the code
patterns, case by case.

http://clang.llvm.org/extra/clang-tidy/

Thank you for looking into this!

~Aaron

However, if the goal is to have the tests
because you would like to make efforts to have the compiler diagnose
their cases properly, that’s far more interesting and a good reason to
bring in the tests.

That's exactly my intention. Improving the static analyzer to detect these cases is what will be interesting.

If the other tests are not clearly licensed, we
should try to get NIST to clarify the license of them before
inclusion.

That sounds like the best idea; as a government agency, they almost certainly have lawyers.

I think the next step is to integrate the working tests (those where the error is correctly diagnosed), limited to the ones that are obviously in the public domain, and propose them as one big batched patch. This shouldn't itself be controversial.

How exactly do I submit a patch? I see that the LLVM developer policy says to send it to the mailing list (cfe-commits), but I also see that Phabricator comes into this somewhere?


Using Phabricator is the preferred approach these days. Either way,
you make your patch file (svn format), pick some reviewers (the people
listed in this email are all good choices), and submit the patch to
the mailing list. With Phabricator, this is done by putting cfe-commits
on the subscribers line and the individual reviewers on the reviewers
line. With email, it's just the To: and CC: fields.

~Aaron


Devin has started writing scripts for running additional analyzer tests as described in this thread:
http://clang-developers.42468.n3.nabble.com/analyzer-Adding-build-bot-for-static-analyzer-reference-results-td4047770.html

The idea was to check out the tests/projects from the existing repos instead of copying them. Would it be possible to do the same with these tests?

Sorry for not replying sooner!
Anna.

Devin has started writing scripts for running additional analyzer tests as described in this thread:

A buildbot sounds like the perfect idea!

The idea was to check out the tests/projects from the existing repos instead of copying them. Would it be possible to do the same with these tests?

Eh? What do you mean? Would that stop someone from running them in the clang unit test infrastructure?

I believe that these tests WILL need to be modified to run in the Clang testing infrastructure.

Is there any way to treat static analyzer warnings as plain old warnings/errors? Dumping them to a plist file from a command-line compilation is a bit annoying, and I think it's incompatible with the clang unit testing infrastructure?

Hi Alexander,

This sounds like an exciting project.

Devin has started writing scripts for running additional analyzer tests as described in this thread:

A buildbot sounds like the perfect idea!

The idea was to check out the tests/projects from the existing repos instead of copying them. Would it be possible to do the same with these tests?

Eh? What do you mean? Would that stop someone from running them in the clang unit test infrastructure?

Yes. There are separate build bot scripts to detect regressions on real-world projects. These scripts (in clang/utils/analyzer/) run scan-build on projects and compare the analysis results to expected reference results, causing an internal build bot to fail when there is a difference. We use these scripts to maintain coverage on real-world code, where analysis time is often much too long to run as part of clang’s normal hand-crafted, minimized regression tests. As Anna alluded to above, these scripts can also be used to avoid checking in benchmark source code to the reference results repository. Apple uses these scripts internally to detect analyzer regressions — and we will be adding a public-facing build bot to Green Dragon <http://lab.llvm.org:8080/green/> in the relatively near future with reference results checked into a public llvm repository.

As to whether these tests should be run as part of clang’s regular regression tests or as a separate build bot, I think there are two key questions:

  1. Are the tests licensed under the UIUC license? Any code contributed to clang needs to be under the UIUC license. If these tests are not, we can use the download-and-patch strategy that Anna mentioned. The scripts in clang/utils/analyzer/ would probably be useful here — although you might have to write a harness to build the tests if one does not already exist.
  2. How long does it take to run these tests? If it takes minutes, they are probably better suited to running on a separate build bot.

Is there any way to treat static analyzer warnings as plain old warnings/errors? Dumping them to a plist file from a command-line compilation is a bit annoying, and I think it's incompatible with the clang unit testing infrastructure?

The tests in clang/test/Analysis use the same lit.py-based infrastructure as the rest of clang and can use the same "// expected-warning {{…}}" annotations — so in general, there is no need to dump to a plist. We do dump plists in some cases to test that proper path diagnostics are being generated (see, for example, test/Analysis/null-deref-path-notes.m), but for most tests these aren't needed.
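
Roughly, a -verify test with an optional plist check might look like
this; the exact flags and CHECK lines are a from-memory sketch rather
than a copy of a real test:

  // RUN: %clang_cc1 -analyze -analyzer-checker=core -verify %s
  // RUN: %clang_cc1 -analyze -analyzer-checker=core -analyzer-output=plist -o %t.plist %s
  // RUN: FileCheck --input-file=%t.plist %s
  int deref(int *p) {
    if (p)
      return 0;
    return *p; // expected-warning {{Dereference of null pointer}}
  }
  // The plist is XML, so the FileCheck pattern matches its markup:
  // CHECK: <string>Dereference of null pointer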

Devin

The idea was to check out the tests/projects from the existing repos instead of copying them. Would it be possible to do the same with these tests?

Eh? What do you mean? Would that stop someone from running them in the clang unit test infrastructure?

I believe that these tests WILL need to be modified to run in the Clang testing infrastructure.

Currently, the analyzer is only tested with the regression tests. However, those need to be fast (since they affect all clang developers) and they have limited coverage. Internally, we've been testing the analyzer with the test scripts Devin described in the email I referenced. We use that testing method to analyze whole projects and long-running tests. Those tests can and should be executed separately, as they take more than an hour to complete. The plan is to set up an external buildbot running those tests.

How long would it take to analyze the tests you are planning to add? Depending on the answer to that question, adding your tests to the new buildbot might be a better fit than adding them to the regression tests.

I also prefer not to modify the externally written tests, since keeping them unmodified would allow us to update them more easily, for example, when a new version of the tests comes out.

Is there any way to treat static analyzer warnings as plain old warnings/errors? Dumping them to a plist file from a command-line compilation is a bit annoying, and I think it's incompatible with the clang unit testing infrastructure?

Plist output is one of the output formats that the clang static analyzer supports. It is a much richer format than the textual warning, since it contains information about the path on which the error occurred. We did have some lit tests checking plist output as well.

A quick update on this project:

I’ve been slowed by a technical issue, and I lost ~2 weeks as two family members were in the hospital (sorry!).

Since I develop on Windows, I quickly hit a testcase that clang didn’t detect, as I discussed in “Clang on Windows fails to detect trivial double free in static analysis”.

That resulted in D16245, which (when accepted) fixes that issue. I want to ensure that a novice can simply pass "--analyze" and have clang "just work", so I've intentionally put off further testing work. Otherwise, I could hack around it and subsequently forget about the workaround. Once that's dealt with, I can resume work at a faster pace.

Since that patch landed, I’ve manually run ~30 of the SAMATE/SARD tests, and so far, Clang has missed 5 stack buffer overruns, 4 heap buffer overruns, and a couple of format string issues. Clang seems a bit better with double-free/use-after-free issues, and leak issues.

So it looks like there’s some good stuff here, and we’ll have a pretty specific set of things to work on!

Pretty cool, eh?

Since that patch landed, I’ve manually run ~30 of the SAMATE/SARD tests, and so far, Clang has missed 5 stack buffer overruns, 4 heap buffer overruns,

This is not surprising; the static analyzer does not catch buffer overflows. We do have an experimental checker for it, but it is not very strong. One of the main issues is that the solver we use does not reason about relational constraints that involve two symbols (e.g., i < n).
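
To illustrate with a made-up example (not one from your suite): proving the write below safe or unsafe requires relating the two symbols i and n to each other and to the extent of buf, which is exactly what the solver does not do.

  void fill(unsigned n) {
    char buf[16];
    for (unsigned i = 0; i < n; ++i)
      buf[i] = 0; // overflows whenever n > 16; deciding that means
                  // relating the symbols i and n to sizeof(buf)
  }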

and a couple of format string issues.

Would those be caught with compiler warnings? (Try running clang on them with -Weverything.)

Clang seems a bit better with double-free/use-after-free issues, and leak issues.

So it looks like there’s some good stuff here, and we’ll have a pretty specific set of things to work on!

Thanks!
Anna.

This is not surprising; the static analyzer does not catch buffer overflows. We do have an experimental checker for it, but it is not very strong.

Personally, I think detecting stack overruns is a very valuable capability of a static analysis tool. Getting Clang to detect this issue with the default options should be a high priority.

One of the main issues is that the solver we use does not reason about relational constraints that involve two symbols (e.g., i < n).

Stepping through the checker code over the past couple of days, I can see this: it appears to "brute force" array accesses, evaluating loop conditions every single time. When checking the attached minimized case, with only "-analyzer-checker=alpha.security.ArrayBoundV2" enabled, Clang seemed to bail out at around the third access. That makes sense, as the default value for analyzer-max-loop is 4. Indeed, if I bump "analyzer-max-loop" up to 11 (no pun intended), then Clang catches the issue. Console command line attached.
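
For readers of the archive, the attached case is roughly this shape (my reconstruction, not the attachment itself), with the cc1 options passed through the driver:

  // Invocation sketch, matching the flags discussed above:
  //   clang --analyze -Xclang -analyzer-checker=alpha.security.ArrayBoundV2 \
  //     -Xclang -analyzer-max-loop -Xclang 11 minimized-stack_overflow-bad.c
  int main(void) {
    char buf[10];
    for (int i = 0; i <= 10; ++i) // 11 iterations; the last write lands
      buf[i] = 'x';               // one element past the end of buf
    return 0;
  }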

As I think you alluded to, this sort of checking is best done “algebraically” instead of with brute force. I’m not really sure how to implement that sort of “algebraic” checker, but I’ve been curious about the problem for several years now. By discovering Clang’s weaknesses in static analysis, and subsequently fixing them, I’ll learn exactly that. I usually learn best when I learn “the hard way”, and this seems like the perfect opportunity.

Side note: do any of the Clang/LLVM optimization passes perform some kind of loop bound access analysis? Perhaps we could use that info to evaluate relational constraints that govern array accesses?

Would those be caught with compiler warnings? (Try running clang on them with -Weverything.)

Actually, they do seem to be caught when compiler warnings are turned on… but they ONLY warn when NOT passing --analyze? Huh?

I think diagnosing format string misuse during normal compilation is a fantastic idea - MSVC until 2015 required that you run /analyze, which very few people actually do* - but I’m not used to the idea that I can’t run analysis at the same time…
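
Concretely, this is the kind of misuse the compiler flags during normal compilation via -Wformat (on by default); the snippet is illustrative:

  #include <stdio.h>

  void greet(const char *name) {
    // clang warns during plain compilation:
    //   format specifies type 'int' but the argument has type
    //   'const char *' [-Wformat]
    printf("Hello, %d\n", name);
  }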

In the short term:

Now that I'm a bit familiar with the codebase, I expect to finish manually running the SAMATE tests in the next couple of days, and start work on getting them to run under lit after that.

*back then it was only for Xbox 360 devs, now it’s in all editions of Visual Studio

minimized-stack_overflow-bad_commandline.txt (1.7 KB)

minimized-stack_overflow-bad.c (128 Bytes)


Would those be caught with compiler warnings? (Try running clang on them with -Weverything.)

Actually, they do seem to be caught when compiler warnings are turned on… but they ONLY warn when NOT passing --analyze? Huh?

This is by design. Many more people have the compiler as part of their daily flow, so it's best to have such errors reported by the compiler.
Having the analyzer produce all of the compiler warnings is likely to be too noisy for the users.

I think diagnosing format string misuse during normal compilation is a fantastic idea - MSVC until 2015 required that you run /analyze, which very few people actually do* - but I’m not used to the idea that I can’t run analysis at the same time…

In the short term:

Now that I'm a bit familiar with the codebase, I expect to finish manually running the SAMATE tests in the next couple of days, and start work on getting them to run under lit after that.

Have you considered running these tests on the additional analyzer build bot (http://lab.llvm.org:8080/green/job/StaticAnalyzerBenchmarks/) instead of adding them to lit? (I suggested this earlier.)

This is by design. Many more people have the compiler as part of their daily flow, so it's best to have such errors reported by the compiler.
Having the analyzer produce all of the compiler warnings is likely to be too noisy for the users.


Ahh, makes sense. 'Twas a quirk of my workflow.

Have you considered running these tests on the additional analyzer build bot (http://lab.llvm.org:8080/green/job/StaticAnalyzerBenchmarks/) instead of adding them to lit? (I suggested this earlier.)

I did, but I naively assumed that the buildbot ran some form of lit under the hood. The README for the buildbot test suite looks helpful :slight_smile:

<snip..>

This is by design. Many more people have the compiler as part of their
daily flow, so it's best to have such errors reported by the compiler.
Having the analyzer produce all of the compiler warnings is likely to be
too noisy for the users.

Personally, I find that design leads to a confusing user experience.
When I run the analyzer, my mental model is that I am running the
compiler plus some additional analyses. When I don't get compiler
warnings that I would otherwise get, it feels like I (as the user)
have configured things improperly and done something wrong. Put
another way: the point of running a static analyzer is to find out
what's wrong with some code, so it's surprising that we would disable
some of those notices of what's wrong that would otherwise be enabled
by default.

Perhaps my mental model is in the minority, but it's another anecdote
to remember if this design is ever reconsidered.

~Aaron

I'd also find the current design slightly confusing. I generally don't expect to see *fewer* warnings when I tell the compiler to work harder unless the original warning really was a false positive.

Philip

<snip…>

This is by design. Many more people have the compiler as part of their
daily flow, so it's best to have such errors reported by the compiler.
Having the analyzer produce all of the compiler warnings is likely to be
too noisy for the users.

Personally, I find that design leads to a confusing user experience.
When I run the analyzer, my mental model is that I am running the
compiler plus some additional analyses. When I don't get compiler
warnings that I would otherwise get, it feels like I (as the user)
have configured things improperly and done something wrong. Put
another way: the point of running a static analyzer is to find out
what's wrong with some code, so it's surprising that we would disable
some of those notices of what's wrong that would otherwise be enabled
by default.

Perhaps my mental model is in the minority, but it's another anecdote
to remember if this design is ever reconsidered.

I’d also find the current design slightly confusing. I generally don’t expect to see fewer warnings when I tell the compiler to work harder unless the original warning really was a false positive.

By calling "$clang —analyze” you are not calling the compiler and asking it to work harder. You are calling another tool that is not going to compile for you but rather provide deep static code analysis. Calling “clang —analyze” could call the compiler behind the scenes and report the compiler warnings in addition to the static analyzer issues. However, when warnings from both tools are merged in a straightforward way on command line, the user experience could be confusing. For example, both tools report some issues such as warning on code like this:
int j = 5/0; // warning: Division by zero
// warning: division by zero is undefined [-Wdivision-by-zero]

Most importantly, end users should never invoke the analyzer by calling “clang —analyze” since “clang —analyze” is an implementation detail of the static analyzer. The only documented user facing clang static analysis tool is scan-build (see http://clang-analyzer.llvm.org). Here are some reasons for that. For one, it is almost impossible to understand why the static analyzer warns without examining the error paths. Second, the analyzer could be extended to perform whole project analysis in the future and “clang —analyze” works with a single TU at a time.

I agree that the best user experience is to report all warnings in one place, while still differentiating which warning was reported by which tool. It would be awesome if the results from all bug finding tools such as the clang static analyzer, the compiler, and clang-tidy would be reported through the same interface.

The CodeChecker team is working on a solution for that and I hope we can incorporate their technology in LLVM/clang.



Most importantly, end users should never invoke the analyzer by calling
"clang --analyze", since "clang --analyze" is an implementation detail of
the static analyzer. The only documented *user-facing* clang static
analysis tool is scan-build (see http://clang-analyzer.llvm.org).

--analyze is in `clang -help`. Also, clang-check advertises a `-analyze`
option which was clearly intentionally added. So it seems spurious to say
that the only user-facing way to invoke the analyzer is scan-build.

In fact, anecdotally I seem to remember --analyze as being considered a
user-facing option. I'm pretty sure that if I go digging back through old
devmtg slides or whatnot I'll find a presentation recommending its use.

-- Sean Silva