codechecker into clang/LLVM?

Hello All,

Scan-build, the current front-end and bug viewer for the Clang Static Analyzer, has some scalability issues and limitations.

For example, scan-build creates static HTML reports, storing whole source files as many times as they are included in a report.

Incremental bug reporting (show only new bugs compared to a baseline) and false positive suppression are not supported either.

To address these issues, back in July we published CodeChecker on GitHub ( https://github.com/Ericsson/codechecker ), a new defect storage and management infrastructure for Clang Static Analyzer (written in Python). We also gave a talk about this at EuroLLVM 2015 (http://llvm.org/devmtg/2015-04/).

The most important features are the following:

  • scalable dynamic web based defect viewer (instead of static html)

  • a new command line tool for analyzing projects which is usable in CI scripts

  • a PostgreSQL based defect storage & management

  • incremental bug reporting (show only new bugs compared to a baseline)

  • suppression of false positives

  • better integration with build systems (through the LD_PRELOAD mechanism)

  • Apache Thrift API based server-client model for storing bugs and viewing results.

  • It is possible to connect multiple bug viewers. Currently a web-based viewer and a command line viewer are provided.

Since its publication we have fixed many errors and addressed user feedback, and I think it is now mature enough.

We could release the tool under LLVM license.

If you agree, this tool could be part of the llvm/clang source tree, possibly alongside scan-build (or in a separate llvm repository?).

I am not sure about the official process.

Can anyone help with this?

Regards,

Daniel

Hi Daniel,

Sorry for taking so long to reply!

The clang static analyzer is definitely missing a bug tracking system and I believe this project has good potential to fill that need. Here are a couple of concerns that immediately come to mind:

- What would it take for this to replace scan-build? Can scan-build be used instead of the interposition module you use? For example, can we control the build interposition method by some option and the bug tracking would be an add-on on top of that? I suspect that your solution does not work on all platforms that scan-build currently supports (Mac and Windows come to mind). That is the main concern here. There are also projects that might not build with the type of interposition you use. I am not sure if you are aware of the scan-build rewrite (in Python) effort, where all these issues were raised as well.

- Is licensing compatible? The llvm codebase tries to stay clear of any dependencies on GPL or LGPL licenses because there are companies who are involved with the project and cannot use software tainted with those licenses.

- The list of dependencies is large, which is a concern if this were to replace scan-build.

Anna.

Hi Anna,

First, thanks for looking into this.

We are open to any suggestions that you feel necessary to get this accepted to LLVM/Clang as a bug-tracking solution…

>- What would it take for this to replace scan-build?..

We’ve been mainly targeting (that is, testing on) Linux, but Mac and Windows support can be easily added too.

Since the whole thing is in python, the only issue here could be the “build interposition”, as you pointed out. Other than that, it’s pretty much platform independent.

Regarding the “build interposition”: CodeChecker uses the standard clang JSON compilation database format as input, which you can pass like this: CodeChecker check -l <build_log.json>.

You can also generate this log with other tools, for example with Bear on Mac.

Currently the built-in logger (based on LD_PRELOAD) supports compilation logging on Linux smoothly. We could also add a bash-script-based logger, which would be platform independent.
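As a rough illustration of the input format discussed here, the sketch below writes a one-entry JSON compilation database and reads it back. The `directory`, `command` and `file` keys come from the Clang JSON compilation database format; the project path, the `make_entry` and `load_sources` helpers and the `build_log.json` file name are made up for the example and are not part of CodeChecker.

```python
import json
import os
import tempfile


def make_entry(directory, source, command):
    """Return one compilation database entry (one compiler invocation)."""
    return {"directory": directory, "file": source, "command": command}


def load_sources(path):
    """Return the source files listed in a compilation database."""
    with open(path) as handle:
        return [entry["file"] for entry in json.load(handle)]


# Hypothetical project layout, for illustration only.
entries = [make_entry("/tmp/project", "main.c", "gcc -c main.c -o main.o")]

# Write the database to a temporary directory so the sketch is self-contained.
log_path = os.path.join(tempfile.mkdtemp(), "build_log.json")
with open(log_path, "w") as handle:
    json.dump(entries, handle, indent=2)

sources = load_sources(log_path)
print(sources)
```

Any logger that can emit this file (LD_PRELOAD based, Bear, or a script) should be interchangeable from the point of view of a tool that only consumes the database.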

We can do some testing on windows, but any help testing on Mac is welcome (as we don’t use Macs).

> Is licensing compatible?

It should be. Except for psycopg (the postgres database connector) we are not relying on any GPL or LGPL stuff.

If psycopg is a problem, it could be replaced with another postgres connector, such as pg8000, which has a BSD license.

I collected here the licenses of the dependencies. All dependencies are run-time dependencies (except for the thrift compiler) and are used without modification.

Javascript dependencies

*codemirror (https://codemirror.net/ MIT licence)

*jsplumb (community edition, MIT https://jsplumbtoolkit.com/license#community)

*marked (BSD like https://github.com/chjj/marked/blob/master/LICENSE)

*dojotoolkit (new BSD license https://dojotoolkit.org/license.html)

Python dependencies

· Python2 (> 2.7) (Python Software Foundation https://www.python.org/download/releases/2.7/license/)

· Alembic (>=0.8.2) (MIT)

· SQLAlchemy (> 1.0.2) (MIT)

· psycopg2 (> 2.5.4) (LGPL http://initd.org/psycopg/license/)

Other dependencies

· Clang Static analyzer

· Thrift (compilation dependency) (Apache v2.0 https://thrift.apache.org/)

· Bzip2 is used for test project only (can be removed) (BSD like http://www.bzip.org/)

Could you suggest which dependencies are problematic? We will investigate how to replace those.

Could you help in testing and/or making it Mac compatible? I suggest first running it on a standard JSON build db using the -l option.

Regards,

Daniel

One important use case that scan-build/ccc-analyzer supports but that the LD_PRELOAD + compilation database approach doesn’t is running the analyzer as the code is being built rather than replaying the build commands later. This is important for build systems that move or modify build system intermediates. I think it would be good to continue to have a way to support such projects.

Devin

Regarding the “build interposition”: CodeChecker uses the standard clang JSON compilation database <http://clang.llvm.org/docs/JSONCompilationDatabase.html> format as input, which you can pass like this: CodeChecker check -l <build_log.json>.

Do we have to use the JSON compilation database or can this be made to work with the existing scan-build?

You can also generate this log with other tools, for example with Bear <https://github.com/rizsotto/Bear> on Mac.

CC-ing Laszlo who is working on a scan-build rewrite in Python. This is the rewrite I've mentioned in one of the previous emails. It is flexible so that it could use either build interposition (bear) or ccc-analyzer style build.

Ideally, we would have a component that would have the parity in capabilities with scan-build (or better). Laszlo has made a lot of progress on this. We could use that component for interposition and have a bug management system on top of it.

Currently the built-in logger (based on LD_PRELOAD) supports compilation logging on Linux smoothly. We could also add a bash-script-based logger, which would be platform independent.

How would the bash script logger work?

We can do some testing on windows, but any help testing on Mac is welcome (as we don’t use Macs).

> Is licensing compatible?
It should be. Except for psycopg <http://initd.org/psycopg/license/> (the postgres database connector) we are not relying on any GPL or LGPL stuff.

If psycopg is a problem, it could be replaced with another postgres connector, such as pg8000 <https://pypi.python.org/pypi/pg8000>, which has a BSD license.

It is a problem. (I cannot test this until it's free of LGPL.) Good to hear that we can switch to using an alternate method.

I collected here the licenses of the dependencies. All dependencies are run-time dependencies (except for the thrift compiler) and are used without modification.

Javascript dependencies
*codemirror (https://codemirror.net/ MIT licence)
*jsplumb (community edition, MIT https://jsplumbtoolkit.com/license#community)
*marked (BSD like https://github.com/chjj/marked/blob/master/LICENSE)
*dojotoolkit (new BSD license https://dojotoolkit.org/license.html)

Python dependencies
· Python2 <https://www.python.org/> (> 2.7) (Python Software Foundation https://www.python.org/download/releases/2.7/license/)
· Alembic <https://pypi.python.org/pypi/alembic> (>=0.8.2) (MIT)
· SQLAlchemy <http://www.sqlalchemy.org/> (> 1.0.2) (MIT)
· psycopg2 <http://initd.org/psycopg/> (> 2.5.4) (LGPL http://initd.org/psycopg/license/)

Other dependencies
· Clang Static analyzer <http://clang-analyzer.llvm.org/>
· Postgresql <http://www.postgresql.org/> (> 9.3.5) (BSD like http://www.postgresql.org/about/licence/)
· Thrift (compilation dependency) (Apache v2.0 https://thrift.apache.org/)
· Bzip2 is used for test project only (can be removed) (BSD like http://www.bzip.org/)

Could you suggest which dependencies are problematic? We will investigate how to replace those.

I do not know if there are other licensing issues. They all look OK to me but I am not an expert.

Ideally, we'd also want to smooth out the installation process as much as possible.

hey guys,

it took me some time to catch up with the topic (i checked the sources on github and looked at the slides). first, i think your tool does a very nice job of identifying the barriers in the existing tools, and makes a good effort to integrate these tools for a better user experience… here are the thoughts i had during the exploration.

  • there will be demand for the functionality that the current scan-build provides (run against a build command and get static html files).
  • there is a need to extend the current scan-build functionality (add bug tracking capabilities, suppression, support for independent viewers, etc…).

and i think the problems/requirements can be partitioned into independent tools, or existing implementations could be reused to save effort… (i think Anna said the same, but in better english :))

talking about code… there is an ongoing rewrite of scan-build… from this implementation you could reuse the build command interception (to create a compilation database), which uses ld_preload on linux and osx and has compiler wrappers for windows… it also has the code to run the static analyzer and generate .plist files (it supports interposition too)… and i would be glad to implement clang-tidy config file support, if that makes any sense for the static analyzer too… or we can also come up with a more traditional approach: a compiler-like script which runs only the SA or clang-tidy and could be used from makefiles directly (as many lint-like tools still do)… have you thought about using sqlite instead of postgresql? it would definitely lower the bar for inexperienced users (and the python api is part of the standard library).

regards,
Laszlo

Hi Devin,

If I understand you correctly, you mean that in certain cases the build cannot be “replayed” if some source files are deleted during the build.

I can think of the following possible solutions:

a) Even with the LD_PRELOAD technique we could solve the interposition by executing the clang analyzer command (instead of logging) before calling the real compiler (gcc). This might not work on Windows.

b) Another possibility, similar to scan-build, is to override the CC and CXX environment variables to point to an “interposition” script which calls the clang analyzer and then the original compiler. The drawback of this solution is that not all build systems use these environment variables.

c) What we also tried earlier is to put interposition scripts named gcc and g++ at the beginning of the PATH, so the build system finds those instead of the real compilers. This relies on the build system calling gcc through the PATH.

Since build systems differ so much, all three options could be useful in certain situations.
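To make options b) and c) a little more concrete, here is a minimal sketch of what such an interposition script could compute for each intercepted compiler call: one analyzer command and the unchanged original compiler command. The `plan_commands` helper is hypothetical, and the option handling is a deliberate simplification (only a separate `-c` and `-o <file>` are stripped for the analyzer run); it is not how scan-build's ccc-analyzer or CodeChecker actually do it.

```python
def plan_commands(argv, real_compiler="gcc", analyzer="clang"):
    """Return (analyzer_cmd, compiler_cmd) for one intercepted compiler call.

    Simplification: '-c' and '-o <file>' are dropped for the analyzer run.
    A real wrapper would have to handle many more driver options
    (for example the joined form '-ofile', or link-only invocations).
    """
    analyzer_args = []
    skip_next = False
    for arg in argv:
        if skip_next:
            skip_next = False  # this argument was the output file name
            continue
        if arg == "-o":
            skip_next = True
            continue
        if arg == "-c":
            continue
        analyzer_args.append(arg)
    analyzer_cmd = [analyzer, "--analyze"] + analyzer_args
    compiler_cmd = [real_compiler] + list(argv)
    return analyzer_cmd, compiler_cmd


analyze_cmd, compile_cmd = plan_commands(
    ["-c", "main.c", "-o", "main.o", "-DNDEBUG"])
print(" ".join(analyze_cmd))
print(" ".join(compile_cmd))
```

A script installed as CC/CXX, or as `gcc` at the front of the PATH, would run the first command and then the second, so the analysis happens while the sources and intermediates still exist.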

I created a “paste” thread for this in phabricator to continue this discussion in a more structured way.

http://reviews.llvm.org/P367

Regards,

Daniel

Hello Laszlo,

Good points.

Regarding the code, I think we could reuse your implementation of libear for logging and replaying the build. Your library is tested on OSX, while ours is not. An issue has been added ( https://github.com/Ericsson/codechecker/issues/149 ) so it will not be forgotten.

We have created a task for supporting clang-tidy: the work consists of calling it, parsing its output and feeding it into the DB. See https://github.com/Ericsson/codechecker/issues/58

Supporting sqlite besides pgsql is also a good idea as it has lower configuration costs. I created an issue for this: https://github.com/Ericsson/codechecker/issues/148 so it will be remembered.

I suggest continuing this discussion on this thread

http://reviews.llvm.org/P367

Thanks,

Daniel

Hi Daniel,

I’ve looked at the project in a bit more detail and here are some other general comments.

It is very desirable to allow CodeChecker to work with scan-build (the current one and/or the python rewrite). This would ensure that all projects that we can analyze now (on all platforms such as OS X and Windows) will be supported. We agree that certain parts of current scan-build could be improved; however, getting a modular design will allow us to stage that process. Keeping CodeChecker's build interposition separate from the current scan-build would cause fragmentation between the users and developers. Is it possible to integrate CodeChecker with scan-build?

Is it possible to have something like "quick check" mode but with the enhanced HTML viewer? The database is needed for bug tracking, but requiring users to set it up to view bugs does not seem necessary. The easier it is to run/setup the tool for basic usage the more people will use it!

We’d need to integrate the automated tests into lit and add documentation.

Suppression in code should be handled by the compiler/analyzer, hopefully, using a more familiar syntax. (We can discuss this separately later.)

All command line options should be compatible/make sense in the potential future world of whole project analysis. (This might already be the case.)

Thank you,
Anna.

Hi Anna,

Please find my comments below in red.

We will start integrating the automated tests into lit soon.

Regards,

Daniel

From: Anna Zaks [mailto:ganna@apple.com]
Sent: 2015. december 8. 20:32
To: Dániel Krupp
Cc: cfe-dev@lists.llvm.org <mailto:cfe-dev@lists.llvm.org>
Subject: Re: [cfe-dev] codechecker into clang/LLVM?

Hi Daniel,

I’ve looked at the project in a bit more detail and here are some other general comments.

It is very desirable to allow CodeChecker to work with scan-build (the current one and/or the python rewrite). This would ensure that all projects that we can analyze now (on all platforms such as OS X and Windows) will be supported. We agree that certain parts of current scan-build could be improved; however, getting a modular design will allow us to stage that process. Keeping CodeChecker's build interposition separate from the current scan-build would cause fragmentation between the users and developers. Is it possible to integrate CodeChecker with scan-build?
> Yes it is. I think the best would be to add an option to CodeChecker to be able to parse the new scan-build output directory.

Just to clarify, it would only work with the "new scan-build" since CodeChecker takes the compilation database as input, correct?

Later, we can replace CodeChecker’s build logger with the new scan-build once we have made sure that scan-build supports everything we need (for example executing clang-tidy).
But for the short term, parsing scan-build’s output should be good enough.

Is it possible to have something like "quick check" mode but with the enhanced HTML viewer? The database is needed for bug tracking, but requiring users to set it up to view bugs does not seem necessary. The easier it is to run/setup the tool for basic usage the more people will use it!
> It is possible already using sqlite db.

Could you explain why the database is needed at all? Why does the overall design require it?

Setting up postgresql for a quick analysis is cumbersome,
therefore we added sqlite support (based on earlier suggestion from Laszlo Nagy).

You can run the check and the server like this
CodeChecker check -n test --sqlite -w /tmp/workspace -b "make"
CodeChecker server --sqlite -w /tmp/workspace -v 6060 --not-host-only

So you can run this quickly without postgres config, suitable for basic usage.

We’d need to integrate the automated tests into lit and add documentation.
> Sure, we will integrate auto-tests into lit. Documentation (user-guide) is already available in markdown: https://github.com/Ericsson/codechecker/blob/master/docs/user_guide.md

Suppression in code should be handled by the compiler/analyzer, hopefully, using a more familiar syntax. (We can discuss this separately later.)
> Would be nice to use the same suppression syntax also in clang-tidy.

I think we should discuss the design on a separate thread. I do not think it's a critical feature, so we could do this later. The main issue here is that I am not comfortable with recommending users a way for suppressing issues in code until the greater community agrees upon the design for issue suppression and compiler support is implemented. Once these recommendations are made, the users assume that the format will be supported going forward.

Hi,

Answers in blue.

Regards,

Daniel

From: Anna Zaks [mailto:ganna@apple.com]
Sent: 2015. december 11. 23:52
To: Dániel Krupp
Cc: cfe-dev@lists.llvm.org <mailto:cfe-dev@lists.llvm.org>
Subject: Re: [cfe-dev] codechecker into clang/LLVM?

From: Anna Zaks [mailto:ganna@apple.com]
Sent: 2015. december 8. 20:32
To: Dániel Krupp
Cc: cfe-dev@lists.llvm.org <mailto:cfe-dev@lists.llvm.org>
Subject: Re: [cfe-dev] codechecker into clang/LLVM?

Hi Daniel,

I’ve looked at the project in a bit more detail and here are some other general comments.

It is very desirable to allow CodeChecker to work with scan-build (the current one and/or the python rewrite). This would ensure that all projects that we can analyze now (on all platforms such as OS X and Windows) will be supported. We agree that certain parts of current scan-build could be improved; however, getting a modular design will allow us to stage that process. Keeping CodeChecker's build interposition separate from the current scan-build would cause fragmentation between the users and developers. Is it possible to integrate CodeChecker with scan-build?
> Yes it is. I think the best would be to add an option to CodeChecker to be able to parse the new scan-build output directory.

Just to clarify, it would only work with the "new scan-build" since CodeChecker takes the compilation database as input, correct?
> CodeChecker can take the compilation db as input and invoke clang even today (CodeChecker check -l compilation_db …), so in this sense it works well together with scan-build.
However, we could implement an option to take the output directory of the new scan-build, with the plist files, as the input. This would work better for projects that remove source files during the build.
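A minimal sketch of what reading such plist output might look like is shown below, using Python 3's standard `plistlib`. The key names (`files`, `diagnostics`, `description`, `location` with `file`/`line`/`col`) reflect my understanding of the analyzer's plist reports and should be checked against real output; the `summarize` helper and the sample report are invented for the example, and a real importer would read the `.plist` files from the output directory instead.

```python
import plistlib


def summarize(plist_bytes):
    """Return (file, line, description) tuples for one plist report."""
    report = plistlib.loads(plist_bytes)
    files = report.get("files", [])
    rows = []
    for diag in report.get("diagnostics", []):
        loc = diag["location"]
        # 'file' is an index into the top-level 'files' list.
        rows.append((files[loc["file"]], loc["line"], diag["description"]))
    return rows


# Invented sample standing in for one analyzer-generated .plist file.
sample = plistlib.dumps({
    "files": ["main.c"],
    "diagnostics": [{
        "description": "Dereference of null pointer",
        "location": {"file": 0, "line": 12, "col": 5},
    }],
})

rows = summarize(sample)
print(rows)
```

Since the reports are produced during the build, this path does not need the sources to be re-analyzed afterwards, which is the point for projects that delete files as they go.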

So this would work with the existing scan-build, correct? I would really like to have this option supported; this would also allow us to test the part of the tool that does not deal with interposition even now on OS X.

Later, we can replace CodeChecker’s build logger with the new scan-build once we have made sure that scan-build supports everything we need (for example executing clang-tidy).
But for the short term, parsing scan-build’s output should be good enough.

Is it possible to have something like "quick check" mode but with the enhanced HTML viewer? The database is needed for bug tracking, but requiring users to set it up to view bugs does not seem necessary. The easier it is to run/setup the tool for basic usage the more people will use it!
> It is possible already using sqlite db.

Could you explain why the database is needed at all? Why does the overall design require it?
> For analyzing a project with a few files you would not need a db. However, we designed it for the use case where you analyze a project with a million lines of code and potentially thousands of analyzer reports. Imagine that you analyze and store many versions of this project, each with, say, 5000 reports (with bug paths). Then let’s say you would like to see the new bugs and resolved bugs between 2 such versions (which has now become possible with the bug hashes), or you want to filter/order the results based on fault priority/file path. For these tasks you need a DB backend with indexing. Does that make sense?
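The "new bugs / resolved bugs between two versions" query described above can be sketched with an indexed table keyed by bug hash. This uses the standard-library `sqlite3` module and an in-memory database; the `reports` schema, the run names and the hash values are invented for illustration and are not CodeChecker's actual schema.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical schema: one row per report, identified by its bug hash.
conn.execute("CREATE TABLE reports (run TEXT, bug_hash TEXT, checker TEXT)")
conn.execute("CREATE INDEX idx_run_hash ON reports (run, bug_hash)")
conn.executemany("INSERT INTO reports VALUES (?, ?, ?)", [
    ("v1", "aaa", "core.NullDereference"),
    ("v1", "bbb", "unix.Malloc"),
    ("v2", "bbb", "unix.Malloc"),
    ("v2", "ccc", "core.DivideZero"),
])


def only_in(conn, run, other):
    """Return bug hashes stored for `run` that do not appear in `other`."""
    query = (
        "SELECT bug_hash FROM reports WHERE run = ? "
        "AND bug_hash NOT IN (SELECT bug_hash FROM reports WHERE run = ?) "
        "ORDER BY bug_hash"
    )
    return [row[0] for row in conn.execute(query, (run, other))]


new_bugs = only_in(conn, "v2", "v1")       # appeared since the baseline
resolved_bugs = only_in(conn, "v1", "v2")  # present in baseline, gone now
print(new_bugs, resolved_bugs)
```

With thousands of reports per stored version, the index on `(run, bug_hash)` is what keeps this kind of comparison and filtering cheap.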

Yes, I understand why the database is needed in some workflows. My question is about whether it is possible to use the tool without a database. For example, a user may want to scan the project and see the results as a way of learning more about the static analyzer. Or a user might keep their project clean of static analyzer issues and scan it from time to time. Basically, any workflow that scan-build supports right now would benefit from having a better report viewer, which CodeChecker provides.