scan-build in Python

hi there,

I've started to write a Python script around Clang's static analyzer,
mainly because I found the `scan-build` script difficult to use. I wanted
to run it against a compilation database instead of intercepting the
compiler calls, but I would like to preserve the current usage as
well. At the same time, I saw this page
<http://clang-analyzer.llvm.org/open_projects.html>, which suggests the
rewrite as well. So it might be a perfect match! :wink:

As I tried to copy the functionality of the `ccc-analyzer` and
`scan-build` scripts, I discovered a few minor bugs (or they are not
bugs, and I'm the one who can't read Perl well).

Now I'm looking for someone I can ask about these scripts.

regards,
Laszlo

This is the right list for analyzer questions. Ted Kremenek, Anna Zaks, and I are the primary maintainers. What in particular is broken in ccc-analyzer / scan-build?

Also, being able to run the analyzer against a compilation database seems like a good feature. I think the auxiliary "clang-check" executable can already do this with its -analyze option, though I haven't tried it, but it doesn't allow for much customization of the output like scan-build does. Still, it could give you a place to start.

  For example, to run clang-check on all files in a subtree of the
  source tree, use:

    find path/in/subtree -name '*.cpp'|xargs clang-check

  or using a specific build path:

    find path/in/subtree -name '*.cpp'|xargs clang-check -p build/path
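As a rough illustration of the compilation-database approach, here is a minimal Python sketch that reads a `compile_commands.json` and turns each entry into a `clang --analyze` invocation. It relies only on the standard JSON Compilation Database fields (`directory`, `file`, and either `arguments` or `command`); the function name and the choice of which flags to drop are assumptions for the example, and it only builds the commands without running them.

```python
import json
import shlex


def analyzer_commands(database_json, clang="clang"):
    """Yield (directory, argv) for running the analyzer on each entry.

    Hypothetical helper: keeps the original compiler flags, drops the
    compiler name, "-c" and "-o <file>", and adds "--analyze".
    """
    for entry in json.loads(database_json):
        if "arguments" in entry:
            args = list(entry["arguments"])
        else:
            args = shlex.split(entry["command"])
        flags = []
        skip_next = False
        for arg in args[1:]:  # args[0] is the original compiler
            if skip_next:
                skip_next = False
                continue
            if arg == "-o":
                skip_next = True  # also drop the output file name
                continue
            if arg == "-c":
                continue
            flags.append(arg)
        yield entry["directory"], [clang, "--analyze"] + flags
```

Each command would then be run with its `directory` as the working directory, for example via `subprocess.run(argv, cwd=directory)`; running them in parallel is a separate concern.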

Jordan

Thanks for the reply, Jordan.

> This is the right list for analyzer questions. Ted Kremenek, Anna Zaks, and I are the primary maintainers.
> What in particular is broken in ccc-analyzer / scan-build?

A small thing I found: when `ccc-analyzer` processes the arguments, it
takes '-m.*' and puts it into CompileOpts. Then, a few lines below, there is
a CompilerLinkerOptionMap lookup which has 5 parameters starting with
'-m', which implies they are never going to end up in CompileOpts _and_
LinkOpts.
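To make the ordering problem concrete, here is a hypothetical Python sketch of the argument classification (not the actual Perl code; the option names in the map are illustrative). If the generic `-m` test ran before the map lookup, the map entries starting with `-m` could never be routed to both lists; consulting the map first avoids that.

```python
# Illustrative subset only; the real CompilerLinkerOptionMap in
# ccc-analyzer may list different options. Values are the number of
# arguments that follow the option (an assumption for this sketch).
COMPILER_LINKER_OPTIONS = {"-m32": 0, "-m64": 0, "-isysroot": 1}


def classify(args):
    """Split args into (compile_opts, link_opts), consulting the map first.

    Arguments that match neither rule are ignored in this sketch.
    """
    compile_opts, link_opts = [], []
    i = 0
    while i < len(args):
        arg = args[i]
        if arg in COMPILER_LINKER_OPTIONS:
            # Shared options (with their values) go to both lists.
            chunk = args[i:i + 1 + COMPILER_LINKER_OPTIONS[arg]]
            compile_opts.extend(chunk)
            link_opts.extend(chunk)
            i += len(chunk)
            continue
        if arg.startswith("-m"):
            # Generic machine options are compile-only. If this test came
            # first, "-m32" and "-m64" would never reach link_opts.
            compile_opts.append(arg)
        i += 1
    return compile_opts, link_opts
```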

I also wondered many times where these flags come from, like
'-arch'. clang/gcc does not have it. (I found a few others which are
also not in clang/gcc.) Which compiler is 'ccc-analyzer' trying to
simulate?

> Also, being able to run the analyzer against a compilation database seems like a good feature. I think the auxiliary "clang-check" executable can already do this with its -analyze option, though I haven't tried it, but it doesn't allow for much customization of the output like scan-build does. Still, it could give you a place to start.

I was not aware of 'clang-check'. Although I find its usage a little bit
ugly :slight_smile:, I guess it's the tooling library which makes
it like this. I will look at it to see how it works. Thanks.

regards,
Laszlo

> Thanks for the reply, Jordan.
>
>> This is the right list for analyzer questions. Ted Kremenek, Anna Zaks, and I are the primary maintainers.
>> What in particular is broken in ccc-analyzer / scan-build?
>
> A small thing I found: when `ccc-analyzer` processes the arguments, it
> takes '-m.*' and puts it into CompileOpts. Then, a few lines below, there is
> a CompilerLinkerOptionMap lookup which has 5 parameters starting with
> '-m', which implies they are never going to end up in CompileOpts _and_
> LinkOpts.

Ah, good catch! Fixed in r193184.

> I also wondered many times where these flags come from, like
> '-arch'. clang/gcc does not have it. (I found a few others which are
> also not in clang/gcc.) Which compiler is 'ccc-analyzer' trying to
> simulate?

-arch is a Darwin-specific option that provides a shorthand for -target on a few common Apple architectures. It's only enabled if the default target triple is a Darwin OS.

Jordan

hi Jordan,
hi everyone,

I'm still working on the Python rewrite of `scan-build` and found things
which do not seem logical to me. Could you help me understand
these issues?

ccc-analyzer: Analyze method, around line 197. It appends
`-analyzer-display-progress` to a list with `-Xclang`, and next to it
there is a for loop which injects `-Xclang` in front of the arguments.
That would end up with `-Xclang` three times, if I read it correctly.

ccc-analyzer: Analyze method, around line 174. It checks whether the
language matches `header`. I think it will never have that value, because
the caller of that method takes the language from the given parameters (it
can't be detected by file name extension, since `%LangMap` does not have a
'header' value) and then calls the `Analyze` method only if the
language is one of those declared in `%LangsAccepted`.

And I have an extra question :slight_smile: Why is there a lookup for the default
compiler (the value of `$Compiler`)? To me it does not seem logical
to forward the original arguments to `gcc` when we try to run
`clang` afterwards. If the sources compile only with `gcc`,
then Clang's static analyzer will report a problem (what kind of report
will that be?). If the sources compile only with Clang, then
`scan-build` breaks the build on `gcc`. Wouldn't it be more
consistent and less error-prone to always call `clang` on every platform?

regards,
Laszlo

> hi Jordan,
> hi everyone,
>
> I'm still working on the Python rewrite of scan-build and found things
> which do not seem logical to me. Could you help me understand
> these issues?

Hi, Laszlo. Thanks for doing this.

> ccc-analyzer: Analyze method, around line 197. It appends
> -analyzer-display-progress to a list with -Xclang, and next to it
> there is a for loop which injects -Xclang in front of the arguments.
> That would end up with -Xclang three times, if I read it correctly.

The loop iterates over @AnalyzeArgs rather than @Args, which is coming in from outside the function. We could push onto @AnalyzeArgs within the Analyze routine and then have a single “-Xclang”-adding pass, but this way is not incorrect. Feel free to restructure the function in your rewrite, though.
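The restructuring suggested here (collect every analyzer-only option in one list, then add `-Xclang` in a single pass) might look like this in Python. The function and parameter names are hypothetical; only `-analyzer-display-progress` comes from the discussion, and `-analyzer-checker=core` is just an example frontend option.

```python
def xclang_wrap(analyzer_args):
    """Prefix every frontend-only argument with -Xclang, exactly once."""
    wrapped = []
    for arg in analyzer_args:
        wrapped.extend(["-Xclang", arg])
    return wrapped


def build_analyze_command(clang, compile_args, extra_analyzer_args,
                          display_progress=False):
    """Assemble a hypothetical analyzer command line."""
    analyzer_args = list(extra_analyzer_args)
    if display_progress:
        # Added to the same list as everything else instead of being
        # wrapped separately, so it receives exactly one -Xclang.
        analyzer_args.append("-analyzer-display-progress")
    return [clang, "--analyze"] + list(compile_args) + xclang_wrap(analyzer_args)
```

With a single wrapping pass, no argument can be prefixed twice, whatever order the options are collected in.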

> ccc-analyzer: Analyze method, around line 174. It checks whether the
> language matches header. I think it will never have that value, because
> the caller of that method takes the language from the given parameters (it
> can't be detected by file name extension, since %LangMap does not have a
> 'header' value) and then calls the Analyze method only if the
> language is one of those declared in %LangsAccepted.

I would guess this is a holdover from when Xcode would invoke the compiler to process PCH files (notice the reference to “gch”). I agree with your diagnosis that this is now dead code.
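A Python sketch of the language lookup as described in this exchange: an explicit `-x` wins, otherwise the file extension is mapped, and only accepted languages reach the analysis step. The two tables are hypothetical stand-ins for `%LangMap` and `%LangsAccepted`, not copies of them. Because the extension table has no 'header' entry and the accepted set is checked before analysis, a `header` branch inside the analysis step has nothing to handle, so the rewrite can drop it.

```python
import os

# Hypothetical stand-ins for %LangMap and %LangsAccepted.
LANG_MAP = {".c": "c", ".cc": "c++", ".cpp": "c++",
            ".m": "objective-c", ".mm": "objective-c++"}
LANGS_ACCEPTED = {"c", "c++", "objective-c", "objective-c++"}


def detect_language(args, source):
    """Return the last -x value, else the language implied by the extension.

    Simplified: ignores where -x appears relative to the source file.
    """
    language = None
    for i, arg in enumerate(args):
        if arg == "-x" and i + 1 < len(args):
            language = args[i + 1]
    if language is None:
        language = LANG_MAP.get(os.path.splitext(source)[1])
    return language


def should_analyze(args, source):
    """Only languages in the accepted set are passed on to the analyzer."""
    return detect_language(args, source) in LANGS_ACCEPTED
```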

> And I have an extra question :slight_smile: Why is there a lookup for the default
> compiler (the value of $Compiler)? To me it does not seem logical
> to forward the original arguments to gcc when we try to run
> clang afterwards. If the sources compile only with gcc,
> then Clang's static analyzer will report a problem (what kind of report
> will that be?). If the sources compile only with Clang, then
> scan-build breaks the build on gcc. Wouldn't it be more
> consistent and less error-prone to always call clang on every platform?

Clang and GCC have largely compatible interfaces, but at the time scan-build was written, Clang wasn’t up to par with GCC in a lot of ways. (It is an old program.) Even today, it’s still possible to have programs that compile with GCC but not with Clang. Since scan-build is interposing on the build process, we still need to build those files, and therefore we should choose the compiler most likely to compile them, even if it means we can’t analyze a particular file. Always choosing Clang would mean that some projects would fail to build, which could cause downstream issues and keep us from analyzing every file after this one.

That said, Clang is now the default compiler on OS X, so we use that as the default when running scan-build there. But mostly it’s just a guess: what’s most likely the best option on this platform? As much as we like Clang, the answer is probably still GCC.
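A minimal Python sketch of that default-compiler policy. It assumes the user can override the build compiler through the `CCC_CC` / `CCC_CXX` environment variables, which, as far as I know, is how the Perl ccc-analyzer lets users choose; the function name and the platform check are this sketch's own.

```python
import sys


def pick_build_compiler(environ, platform=sys.platform, cxx=False):
    """Choose the compiler that actually builds the file (not the analyzer).

    An explicit override wins; otherwise guess Clang on OS X ("darwin")
    and GCC everywhere else.
    """
    override = environ.get("CCC_CXX" if cxx else "CCC_CC")
    if override:
        return override
    if platform == "darwin":
        return "clang++" if cxx else "clang"
    return "g++" if cxx else "gcc"
```

Passing the environment in as a parameter (instead of reading `os.environ` directly) keeps the policy easy to unit-test.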

That said, it would remove complexity to just say “we can only analyze your project if it builds with Clang”. Ted, Anna, and I have talked about that idea before. But there’s no urgent need to switch.

Jordan

hi Jordan,
hi All,

To continue with the Python rewrite of 'scan-build': I've reached
the point where 'ccc-analyzer' and 'c++-analyzer' are re-implemented.
Now I would like to do some regression testing (to compare the output
of the Perl and Python implementations). My question is: is there
any test suite for scan-build which is suitable for such a test?

regards,
Laszlo

Excellent news. Could you share the code somewhere?

FYI, Alexey (cc'd) will work on improving scan-build as part of GSoC.
The main idea is to have a database behind it to keep track of the bugs,
tag them, keep some history information, etc.
Hacking on a scan-build in Python would be great (especially with tools
like SQLAlchemy).
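Sylvestre mentions SQLAlchemy; as a dependency-free illustration of the idea (bugs stored once, with tags and a per-run history), here is a sketch using Python's built-in `sqlite3` module instead. The schema and all table, column and function names are hypothetical.

```python
import sqlite3

SCHEMA = """
CREATE TABLE IF NOT EXISTS bug (
    id INTEGER PRIMARY KEY,
    file TEXT NOT NULL,
    line INTEGER NOT NULL,
    checker TEXT NOT NULL,
    UNIQUE (file, line, checker)
);
CREATE TABLE IF NOT EXISTS tag (bug_id INTEGER REFERENCES bug(id), name TEXT);
CREATE TABLE IF NOT EXISTS sighting (bug_id INTEGER REFERENCES bug(id), run TEXT);
"""


def record(conn, run, file, line, checker):
    """Insert the bug if it is new and note that it was seen in this run."""
    conn.execute(
        "INSERT OR IGNORE INTO bug (file, line, checker) VALUES (?, ?, ?)",
        (file, line, checker))
    (bug_id,) = conn.execute(
        "SELECT id FROM bug WHERE file = ? AND line = ? AND checker = ?",
        (file, line, checker)).fetchone()
    conn.execute("INSERT INTO sighting (bug_id, run) VALUES (?, ?)", (bug_id, run))
    return bug_id


def tag(conn, bug_id, name):
    conn.execute("INSERT INTO tag (bug_id, name) VALUES (?, ?)", (bug_id, name))


def tags(conn, bug_id):
    rows = conn.execute("SELECT name FROM tag WHERE bug_id = ? ORDER BY name", (bug_id,))
    return [name for (name,) in rows]


def history(conn, bug_id):
    """Runs in which the bug was seen, in insertion order."""
    rows = conn.execute("SELECT run FROM sighting WHERE bug_id = ? ORDER BY rowid", (bug_id,))
    return [run for (run,) in rows]
```

Keying bugs on (file, line, checker) is a simplification: real reports shift lines between revisions, so a production tracker would need a more robust identity.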

Thanks,
Sylvestre

hi Sylvestre,
hi Alexey,

you can find my rewrite attempt here: <https://github.com/rizsotto/Beye>

My initial focus was only to use a compilation database as input and to run fewer processes, in parallel, during the check. Then I realized how big and error-prone that work is :slight_smile:, so the re-implementation would be easier to verify if I kept the build wrappers. Therefore I targeted ‘ccc-analyzer’ for the rewrite first… and now I’m here, looking for a nice test subject.

regards,
Laszlo

We have a test suite, but it has internal Apple projects as well as open-source projects. I can either run the suite locally with your changed ccc-analyzer or try to strip out the internal parts. What would you prefer?

Jordan

Thanks, Jordan, for your reply. I would be interested in the stripped version of the test set. It would give me more insight into how the current implementation works (I do not trust my Perl reading skills :))

> This is the right list for analyzer questions. Ted Kremenek, Anna Zaks,
> and I are the primary maintainers. What in particular is broken in
> ccc-analyzer / scan-build?
>
> Also, being able to run the analyzer against a compilation database seems
> like a good feature. I think the auxiliary "clang-check" executable can
> already do this with its -analyze option, though I haven't tried it, but it
> doesn't allow for much customization of the output like scan-build does.
> Still, it could give you a place to start.

Just an additional FYI: clang-tidy can also run static analyzer
checks and supports running via a compilation database.

Here it is: https://attache.apple.com/AttacheWeb/dl?id=ATCf722062eedd34dba96a56bde1310ed41&ek=ATChtmkeWLieQe1gA7V22FwhA%3D%3D

(no promises about the lifetime of that link to anyone discovering this e-mail in the cfe-dev archives in the future)

It’s just a snapshot of a few open-source projects, along with reference results (which I regenerated before making the archive). Unfortunately, we haven’t gone through all the results on a recent analyzer build to verify that they’re all desired or have bugs filed, but for testing output identity it should be good enough. The tests are meant to be run with utils/analyzer/SATestBuild.py, with scan-build and clang in your path.

It’s a small number of projects, unfortunately, so it’s not really exercising scan-build that much, but it’s something.
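For output-identity testing between the Perl and Python wrappers, one option (an alternative idea, not a description of what SATestBuild.py does) is to compare the analyzer's plist reports by a normalized key instead of byte-for-byte. The sketch uses the standard library `plistlib` and assumes each report is a dict with a `files` list and a `diagnostics` list whose entries carry a `description` and a `location` with `line`, `col` and a `file` index; that matches the analyzer's plist output as far as I know, but treat the field names as assumptions.

```python
import plistlib


def diagnostic_keys(report):
    """Reduce one parsed plist report to a set of comparable keys."""
    files = report.get("files", [])
    keys = set()
    for diag in report.get("diagnostics", []):
        loc = diag["location"]
        keys.add((files[loc["file"]], loc["line"], loc["col"],
                  diag.get("description", "")))
    return keys


def compare_reports(old_bytes, new_bytes):
    """Return (only_in_old, only_in_new) for two serialized plist reports."""
    old = diagnostic_keys(plistlib.loads(old_bytes))
    new = diagnostic_keys(plistlib.loads(new_bytes))
    return old - new, new - old
```

Two empty sets mean both implementations produced the same findings for that file, even if the plist files differ in ordering or incidental fields.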

Jordan

hi Jordan,
hi all,

I have a question related to ccc-analyzer, about the Analyze method (line 272). It checks the $IncludeParserRejects variable, which is initialized to zero once and has the comment “Set this to 1 if we want to include ‘parser rejects’ files.” My question is: who will set that to 1? Should an environment variable turn this on? Can I leave it out of the rewritten version?

thanks,
Laszlo

PS: Today I was able to fetch the attached file. Thanks for that!

I'm not sure. Ted, do you remember what this was for?

Laszlo, you can just leave it out for now. It's certainly not something most people are using if it's off by default and not exposed in scan-build.

Jordan

hi all,

Jordan, thanks for your reply. I left this option out for now. :slight_smile:

And I have another question, about the ‘-isysroot’ flag uniqueness check (ccc-analyzer: line 520). The current behavior inserts only the first usage of this flag (although it does not check ‘--sysroot’ uniqueness). I’m wondering: should this wrapper change or correct incorrect invocations?

I ran a test against GCC 4.9 on Linux (which shows that actually the last ‘-isysroot’ flag wins). Here it is:

    $ gcc -c functional_test/divide_zero.cpp
    $ gcc -c functional_test/divide_zero.cpp -isysroot /

    $ gcc -c functional_test/divide_zero.cpp -isysroot /tmp -isysroot /

    $ gcc -c functional_test/divide_zero.cpp -isysroot / -isysroot /tmp
    In file included from functional_test/divide_zero.cpp:1:0:
    /usr/include/c++/4.9.0/cassert:43:20: fatal error: assert.h: No such file or directory
    #include <assert.h>
                       ^
    compilation terminated.

    $

Another test against Clang 3.4 on Linux (which shows that it ignores ‘-isysroot’ on this platform). Here it is:

    $ clang -c functional_test/divide_zero.cpp -isysroot /tmp -isysroot /

    $ clang -c functional_test/divide_zero.cpp -isysroot / -isysroot /tmp
    $ clang -c functional_test/divide_zero.cpp -isysroot /tmp
    $ clang -c functional_test/divide_zero.cpp
    $

What do you think? Or am I reading the Perl code wrong about ignoring multiple ‘-isysroot’ flags?

regards,
Laszlo

hi there,

Wow, nice work!

I’m a bit concerned about this note:

IIRC this happens when the analyzer crashes for some reason. I can’t remember what happens if the file merely fails to compile, but you can test the crash case with “#pragma clang __debug crash”.
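To exercise the crash path from a test, one could generate a source file containing `#pragma clang __debug crash` and classify the analyzer's exit status. The sketch assumes, in line with the 'crash' vs. 'other_error' split Laszlo describes later in the thread, that death by a signal counts as a crash and any other non-zero exit as an ordinary error. With Python's `subprocess`, a negative return code means the child was killed by a signal (POSIX). The function names are hypothetical, and whether the clang driver surfaces a frontend crash as a signal or as a plain error code is not something this sketch verifies.

```python
import os
import subprocess
import tempfile

CRASH_SOURCE = "#pragma clang __debug crash\nint main(void) { return 0; }\n"


def classify_exit(returncode):
    """Map a process return code to a failure kind, or None on success.

    Assumption: a negative code (killed by a signal) is a 'crash',
    any other non-zero code is 'other_error'.
    """
    if returncode == 0:
        return None
    if returncode < 0:
        return "crash"
    return "other_error"


def analyze_crash_file(clang="clang"):
    """Run the analyzer on a file that asks Clang to crash (needs clang on PATH)."""
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "crash.c")
        with open(path, "w") as f:
            f.write(CRASH_SOURCE)
        proc = subprocess.run([clang, "--analyze", path], cwd=tmp,
                              stdout=subprocess.PIPE, stderr=subprocess.PIPE)
        return classify_exit(proc.returncode)
```

If the driver catches the frontend crash and exits with a positive code, this classifier reports 'other_error'; a real test would need to check the driver's output to tell the two apart.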

I would guess we collect link options because $CC or $CXX is often used for linking as well as building—remember that scan-build actually builds a project in addition to analyzing it. I’m not sure how we detect that, though, or if it’s still necessary.

And sorry for completely missing the -isysroot question. I agree, checking that explicitly seems unnecessary.
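Following that conclusion, the rewrite can pass `-isysroot` through untouched and let the compiler apply its own rule. If a test needs to know which sysroot GCC would pick, a hypothetical helper mirroring the "last one wins" behavior from the GCC 4.9 experiment above could look like this. It handles only the separate `-isysroot DIR` form and ignores `--sysroot`.

```python
def effective_isysroot(args):
    """Return the value of the last -isysroot flag, or None if absent.

    Mirrors the 'last one wins' behavior seen with GCC 4.9. It does not
    rewrite or deduplicate args, which go to the compiler unchanged.
    """
    result = None
    for i, arg in enumerate(args):
        if arg == "-isysroot" and i + 1 < len(args):
            result = args[i + 1]
    return result
```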

I’m on vacation for the rest of the week, so I wouldn’t be able to get back to you right away. Anna and Ted probably need to weigh in on if we want to take this into Clang proper (and we’d have to test it on OS X, including Xcode integration). If so, then yes, we would want to use a non-external unit test format. I’ve never worked with them, but it looks like we have tests for the Python bindings to libclang in bindings/python/tests.

One thing I am a bit concerned about is complexity—not that the old code was simple, but some of analyzer.py seems rather opaque. (Does the CPS model really help that much?) The style also doesn’t always seem consistent throughout the code (spaces around assignment operators was the main thing that jumped out at me).
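For readers unfamiliar with the continuation-passing style being discussed: each step receives the current options plus a continuation (the rest of the pipeline) and decides whether and how to call it. The sketch below is a generic illustration with hypothetical step names, not code from Beye's analyzer.py; it shows why each step can be unit-tested on its own by passing a stub continuation.

```python
def require_source(opts, continuation):
    """Stop the chain early when there is no source file to analyze."""
    if not opts.get("file"):
        return {"status": "skipped"}
    return continuation(opts)


def add_analyzer_flags(opts, continuation):
    """Extend the argument list, then hand over to the next step."""
    new_opts = dict(opts, args=list(opts.get("args", [])) + ["--analyze"])
    return continuation(new_opts)


def chain(*steps):
    """Compose steps so that each one receives the rest as its continuation."""
    def run(opts, final):
        if not steps:
            return final(opts)
        head, rest = steps[0], chain(*steps[1:])
        return head(opts, lambda o: rest(o, final))
    return run
```

For example, `add_analyzer_flags({"args": []}, lambda o: o)` tests one step in isolation, while `chain(require_source, add_analyzer_flags)` runs them in sequence. The trade-off Jordan raises is real: the control flow is spread across closures, which is harder to follow than a plain sequence of calls.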

But overall, nice work, and I hope I or Anna gets the chance to try it out soon.
Jordan

Ah, I see it’s not actually building the project, just using a compilation database. I don’t think that covers all the existing uses of scan-build, though, so I’m not sure that’s good enough. (Consider cases where some of the source files are generated. The Xcode project thing is another issue entirely, but that uses a separate code path with modern Xcodes anyway.)

Jordan

hi Jordan,

Thanks for your answer.

  • I might have been confusing in the README file. As you recognized, ‘beye’ does not build the project. That’s right. But I am not proposing to replace ‘scan-build’ with ‘beye’ now :slight_smile:. What I am offering here is ‘ccc-analyzer’ and ‘c++-analyzer’. (As far as I remember, somebody else is also working on rewriting ‘scan-build’ as part of GSoC. That’s why I haven’t invested much in that part.)

  • I will look into the style issue (although the code complies with PEP8; the check runs on every build).

  • About CPS (continuation-passing style): I think it really makes the parts easier to test. If you tell me which part you find difficult to read, I will try to simplify it further.

  • About the tests: I will rewrite them based on ‘unittest’ instead of ‘nose’, so testing won’t need external dependencies either.

  • About ‘parser_reject’ reports: a report is generated when Clang exits with a non-zero exit code. (The original scripts report ‘crash’, ‘other_error’, ‘parser_reject’ and ‘attribute_ignored’. The ‘attribute_ignored’ report is guarded by an if condition which is never true, and ‘parser_reject’ was controlled by a variable which is set to false and never changed.) So reports for broken compilations are generated; they are either ‘crash’ or ‘other_error’, depending on the exit code.

  • About the link options: if I run something like ‘clang test.c -o a.out -lmylib -L…’, this is linking, not just compilation, and ‘-lmylib -L…’ go to the link options. But when I then run the analyzer as ‘clang --analyze test.c …’ without the link options, it works just fine, doesn’t it? The analyzer does not execute the link phase, does it?
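A sketch of the point in the last bullet: when building the analyzer invocation, link-only options can simply be dropped, because `clang --analyze` does not run the link step. Which options count as link-only here (`-l`, `-L` and `-o`, joined or separate) is an assumption for the example, and the helper name is hypothetical.

```python
def analyzer_args(args):
    """Build a hypothetical analyzer command, dropping link-only options.

    Assumed link-only: -l<lib>, -L<dir> (joined or separate) and -o <out>.
    """
    kept = []
    it = iter(args)
    for arg in it:
        if arg in ("-l", "-L", "-o"):
            next(it, None)  # also skip the separate value
            continue
        if arg.startswith(("-l", "-L")):
            continue  # joined form, e.g. -lmylib
        kept.append(arg)
    return ["clang", "--analyze"] + kept
```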

Thanks again for your feedback. I will fix these issues and come back. Enjoy your vacation!

regards,
Laszlo