ccc-analyzer progression/regression

Hi.

I really love the concept of static analysis and would like to
contribute somehow. However, I am not a programmer myself, so I was
wondering what I could do to contribute to the ccc-analyzer?

I have the following idea:

1.) Let the clang/ccc-analyzer devs pick a handful of reasonably
well-known/widely used open source projects,
    preferably code bases that the ccc-analyzer devs are at least
somewhat familiar with
2.) Let me run the analyzer on these code bases with the latest trunk
clang/ccc-analyzer
3.) Let me post the results on a website
4.) A while later (some months?) I could run the latest rev of
clang/ccc-analyzer on the same versions of the chosen code bases again

Then perhaps the differences between the results of running analyzer rev x
on code base y and running analyzer rev x+1 on the same code base y
could provide some insight into how well the analyzer is progressing
on real-world code bases?
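
I am no programmer, but I picture the mechanics being roughly something
like this (just a sketch; the paths, project name and revision labels are
placeholders, and I am assuming scan-build's -o option for choosing the
report directory):

# analyze the same release tarball once with each analyzer revision,
# keeping the HTML reports in separate directories
tar xzf project-1.0.tar.gz
cd project-1.0
/opt/clang-revA/bin/scan-build -o ../reports/project-1.0/revA ./configure
/opt/clang-revA/bin/scan-build -o ../reports/project-1.0/revA make
make distclean
/opt/clang-revB/bin/scan-build -o ../reports/project-1.0/revB ./configure
/opt/clang-revB/bin/scan-build -o ../reports/project-1.0/revB make
# afterwards, publish both report directories and compare them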

The core idea here is not to find issues in the latest version of any
particular code base, but rather to discover progression/regression in
the static analyzer. Of course, this would probably work best if the
code bases being analyzed are 'reasonably well/widely used' real-world
code bases, and the clang/ccc-analyzer devs are fairly familiar with
them, so that they can relatively easily interpret the (differences in)
results.

Just my 2 cents... Could this be useful?

Let me know what you think.

Regards,

John Smith.

Hi John,

I think that tracking the progress of the analyzer is very important.

Actually, we are in the process of setting up an internal buildbot which will regularly analyze several open source projects (Python, openssl, postgresql, Adium, …). The goal is to be able to spot regressions as quickly as possible, as well as to measure progress. There are several progress metrics one could use, for example:

  • reduction of analyzer failures/crashes
  • number of new bugs reported / number of false positives removed
  • increase in speed
  • memory consumption reduction

Going back to your proposal, I think it would be helpful to:

  1. Run the analyzer on a bunch of projects and report failures, if any. (Ideally, these would be projects which are not being tested by our buildbot; see the sketch after this list.)
  2. With regard to the analyzer reports, it would be VERY helpful to categorize (a subset of) the reported bugs into real bugs and false positives. An increase in the number of bugs reported could mean either an improvement or a regression in the analyzer. (A regression might be due to the analyzer reporting more false positives.) Since you are not a programmer, I am not sure if you are interested in doing this.
  3. A much more ambitious project would be setting up an external buildbot, which would test the open source projects and, possibly, allow others to categorize the reports. It could be something similar to Coverity’s open source scan results (http://scan.coverity.com/rung2.html).
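
For (1), something along these lines would already be a start (only a sketch; the project list, versions, and directory layout are made up):

#!/bin/sh
# run scan-build over a set of release tarballs and record which ones fail
mkdir -p reports logs
for p in bind-9.8.1 ntp-4.2.6 openldap-2.4.26; do
    tar xzf $p.tar.gz
    ( cd $p &&
      scan-build -o ../reports/$p ./configure &&
      scan-build -o ../reports/$p make
    ) > logs/$p.log 2>&1 || echo "$p: scan-build failed" >> failures.txt
done

The failures.txt list, together with the corresponding logs, would be exactly the kind of thing worth filing bugs about.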

Cheers,
Anna.

Hi Anna,

Thank you for your kind reply.

1.)
Doing this automatically using a build-bot would of course be way more
convenient and effective than doing things manually. I was unaware
that you are currently setting this up. I could of course still do
something similar manually for projects that aren't covered by the
build-bot, but maybe it would be a better idea to just add those to
the automatic process?
:wink:
I ran the analyzer on a few projects a while back, but as a non-dev I
didn't immediately know how to turn that data into something useful (until
I came up with the idea to compare results from different versions of
the analyzer). The open source projects I tried were: bind, dhcp, gcc,
gdb, glib, ntp, openldap, openssl, postfix. Some of those might be
good candidates for the build-bot?

2.)
It's not that I'm not interested in determining which results are
indeed bugs, which are false positives, and maybe even where the false
negatives are... It's just that I don't have the required skills to do
so. Which in turn made me wonder in what ways I could help out as a
non-programmer, and resulted in the email below. Doing this
automatically would of course be preferable.

3.)
Doing something like Coverity's open source scan results would indeed
be the holy grail. :slight_smile:
But as a start, it might be a lot easier, and still helpful, if the
reports were simply published on some publicly accessible web server.
There would be no fancy stuff like people
categorizing/reporting/collaborating on the results, but interested people
would be able to easily compare results from different versions
manually for a particular project. That might be a good start?
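
Even something as simple as this would do for publishing, I suppose (a sketch; the host and paths are made up):

# copy the scan-build HTML report directories to a public web server
rsync -a reports/ user@example.org:/var/www/ccc-analyzer-reports/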

Anyway, thank you again for your time and reply,

Regards,

John Smith.

Hi Anna,

Thank you for your kind reply.

1.)
Doing this automatically using a build-bot would of course be way more
convenient and effective than doing things manually. I was unaware
that you are currently setting this up.

We just started working on this a week ago, and it has not been publicly discussed. (Perfect timing for your email! :))

I could of course still do
something similar manually for projects that aren't covered by the
build-bot, but maybe it would be a better idea to just add those to
the automatic process?

The immediate goal of the bot is to detect regressions we introduce during development. It will be run daily, so adding too many projects to it might not be possible. In addition, it would be great if we could run the analyzer on a larger number of projects each month or so, to catch issues the daily bot misses. (You could still use the same scripts as our buildbot to automate the process.) It would also be useful to have someone file bugs reporting the failures.

:wink:
I ran the analyzer on a few projects a while back, but as a non-dev I
didn't immediately know how to turn that data into something useful (until
I came up with the idea to compare results from different versions of
the analyzer). The open source projects I tried were: bind, dhcp, gcc,
gdb, glib, ntp, openldap, openssl, postfix. Some of those might be
good candidates for the build-bot?

Definitely.

2.)
It's not that I'm not interested in determining which results are
indeed bugs, which are false positives, and maybe even where the false
negatives are... It's just that I don't have the required skills to do
so. Which in turn made me wonder in what ways I could help out as a
non-programmer, and resulted in the email below.

I understand. Thanks for the interest!

Doing this
automatically would of course be preferable.

Automating this is difficult (impossible in some cases). Think about it this way: the analyzer is an automated process which cannot determine if these are real bugs or not. If we knew how to automate this, we would add the logic to the analyzer.

3.)
Doing something like Coverity's open source scan results would indeed
be the holy grail. :slight_smile:
But as a start, it might be a lot easier, and still helpful, if the
reports were simply published on some publicly accessible web server.

What would these reports look like? I think it would be valuable to list which projects have been successfully analyzed and how many bugs have been found in each one. As for the HTML reports themselves, we could list them, but I am not 100% sure how they would be used by others. One scenario is that a project developer would try to investigate a reported bug and either fix it or mark it as a false positive. (Again, it would be great to provide an interface for feedback.)

Developing an interface to display analysis results would be useful even for providing feedback about the projects we analyze on our buildbot.

A crontab to do some projects daily/weekly, and others monthly?
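
Something along these lines is what I have in mind (only a sketch; the script names and times are made up):

# append two scan jobs to the current user's crontab:
# a small daily set at 02:00, and the full list on the 1st of each month
( crontab -l 2>/dev/null
  echo '0 2 * * * /home/scan/scan-daily.sh'
  echo '0 6 1 * * /home/scan/scan-monthly.sh'
) | crontab -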

Some early diffs between trunk rev. 131083 and trunk rev. 139148:

http://lbalbalba.freezoka.net/ccc-analyzer/

I thought you were interested in doing something similar to the monthly test (and making the results public), so my point was that it would be a good way to split the work.

Uploading results as we speak:

Want me to keep doing this monthly?
Got it.
:lol:

Alright, got some comparative results.

The baseline is not as good as one would like it to be, though.

All the earliest results are from 'clang v3.0 trunk rev. 131083' or earlier.

I didn't have the foresight to do all of these with the
exact same version back then...
:frowning:

All the '2nd' tests are using 'clang v3.0 trunk rev. 139148', so the
exact same version for all the builds this time.

Exciting observation from a non-expert: the 'total issues found' counts
differ and even decrease in most projects. Hoping this means fewer
false positives.
:slight_smile:

Uploading stuff here:
http://lbalbalba.freezoka.net/ccc-analyzer/

Well, I ran scan-build on a few open source codebases again. This time
using trunk r140859 (compared to r139148).

The most obvious diffs: far fewer 'Dead store/Dead assignment' reports
on some codebases (bind, dhcp). So either someone has done very good
work [eliminating false positives] or messed up [increasing false
negatives]...
:wink:

It's pretty much the same results between revs for the rest of the
codebases though...

http://lbalbalba.freezoka.net/ccc-analyzer/clang%20v3.0%20trunk%20rev.%20140859/

- John Smith

Did you modify the gdb makefiles in any way before running "scan-build"? I have implemented a checker and I want to test its effectiveness against open source code. For that, I am trying to run scan-build on gdb. Nothing happens: the analysis never gets run during the gdb make process. I dug through the scan-build files to see what's going on and found that unless the makefile uses $(CC) and $(CXX), the analyzer is not run. So I wanted to find out whether you modified anything for this experiment. If yes, can you please give me the details?
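
(To illustrate my understanding, a quick sketch: scan-build exports CC/CXX pointing at its ccc-analyzer/c++-analyzer wrappers for whatever command it launches, so a make rule that spells out 'gcc' directly, instead of going through $(CC), silently bypasses the analyzer.)

# show what scan-build injects into the environment of the build it runs
scan-build sh -c 'echo "CC=$CC"; echo "CXX=$CXX"'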

Hi,

I don't remember doing anything special for gdb. I just downloaded the
source tarball for gdb-7.2, unzipped/untarred it, and ran

scan-build ./configure
scan-build make

Don't know if it makes any difference, but after doing a './configure
&& make && make install' on llvm/clang, I did manually copy scan-build
to a directory that's included in my PATH:

cp llvm/tools/clang/tools/scan-build/* /usr/local/bin

For glib, I did have to make a small modification to the generated
build files, though. For some reason, glib won't compile unless you
specify that you want optimizations (perhaps it depends on
optimizations done by gcc for correct code generation?). And
scan-build won't work if you specify optimizations. So after doing

scan-build ./configure --disable-sanity-checks

for glib, I had to edit config.h and comment out the following section:

/* #if !defined __ASSEMBLER__ && !defined _ISOMAC && !defined __OPTIMIZE__
# error "glibc cannot be compiled without optimization"
#endif */

After that, the scan worked without problems. I doubt that this
generates working binaries, though.
:slight_smile:

Hope this helps.

- John Smith.