Proposal: Integrate CodeChecker analyzer infrastructure

Hi,

We would like to add CodeChecker (https://github.com/Ericsson/codechecker) analyzer infrastructure.

This is an alternative tool to scan-build with extended functionality.
Some of the main features are: tracking issues over time, suppressing false positives, detecting new issues by comparing the results of multiple analyzer runs,
and viewing and comparing results in a web browser or on the command line. A more detailed feature list can be found below (*).
The analyzer infrastructure is built so that a new analyzer can be integrated easily.
We are developing a tool that developers and automated continuous integration tools can use easily, and that presents the results from multiple analyzers in a common way.
We think it would serve as a good base for displaying and tracking bugs detected by other clang tools such as clang-tidy, which is already supported.

For example, you can find the analysis results for LLVM 3.6.2 and 3.7.1 here: http://modelserver.inf.elte.hu:5000

Main questions to the community:
0. Does the Clang community like the idea?
1. CodeChecker has some 3rd party dependencies (see (**) below). Are they acceptable?
2. Is the community satisfied with the CodeChecker name?

Integration plan:
  0. CodeChecker should use scan-build.py (OSX support) to generate the compilation database instead of the current LD_PRELOAD technique
  1. Migrate CodeChecker testing infrastructure to the current LLVM testing infrastructure
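
For readers unfamiliar with the term: a compilation database is a JSON list with one entry per translation unit, recording the working directory, the compiler command, and the source file. Below is a minimal sketch of what a build interceptor has to produce. The helper name `make_entry` and the example paths are hypothetical, and this is not the actual CodeChecker or scan-build.py code.

```python
import json
import shlex


def make_entry(directory, argv):
    """Build one compilation database entry from an intercepted compiler call.

    Hypothetical helper: assumes the last argument with a C/C++ source
    suffix is the translation unit being compiled.
    """
    sources = [a for a in argv if a.endswith((".c", ".cc", ".cpp", ".cxx"))]
    if not sources:
        return None  # e.g. a link-only invocation; nothing to analyze
    return {
        "directory": directory,
        "command": " ".join(shlex.quote(a) for a in argv),
        "file": sources[-1],
    }


# Made-up example: two intercepted invocations, one of which only links.
calls = [
    ("/home/user/proj", ["gcc", "-c", "-O2", "-Iinclude", "src/main.c", "-o", "main.o"]),
    ("/home/user/proj", ["gcc", "main.o", "-o", "app"]),
]
entries = [e for e in (make_entry(d, a) for d, a in calls) if e is not None]
compile_commands = json.dumps(entries, indent=2)
print(compile_commands)
```

How the compiler calls are captured (LD_PRELOAD, a compiler wrapper, or something else) is independent of this output format, which is why the interception technique can be swapped later.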

(*) Most notably it extends the current tool set with the following features:
  - stores the results of multiple large analysis runs efficiently (as opposed to the static HTML files of scan-build/scan-view)
  - run multiple analyzers, currently Clang Static Analyzer and Clang-Tidy are supported
  - dynamic web based defect viewer (instead of static html)
  - a SQLite/PostgreSQL based defect storage & management (both are optional, results can be shown on standard output in quickcheck mode)
  - update analyzer results only for modified files (depends on the build system)
  - compare analysis results (new/resolved/unresolved bugs compared to a baseline)
  - filter analysis results (checker name, severity, source file name ...)
  - skip analysis in specific source directories if required
  - suppression of false positives (in config file or in the source)
  - Thrift API based server-client model for storing bugs and viewing results.
  - It is possible to connect multiple bug viewers. Currently a web-based viewer and a command line viewer are provided.
    (command line client is the recommended way to connect into Continuous Integration loops)
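
To illustrate the baseline comparison in the list above, here is a minimal sketch that classifies reports as new, resolved, or unresolved. It assumes every report carries an identifier that stays stable between runs; the message does not say how CodeChecker identifies a report, so the identifiers, the function name `compare_runs`, and the sample data are all hypothetical.

```python
def compare_runs(baseline, current):
    """Classify reports by comparing two analysis runs.

    Each run is a dict mapping a (hypothetical) stable report identifier
    to a human-readable description.
    """
    base_ids, cur_ids = set(baseline), set(current)
    return {
        "new": sorted(cur_ids - base_ids),          # only in the current run
        "resolved": sorted(base_ids - cur_ids),     # only in the baseline
        "unresolved": sorted(base_ids & cur_ids),   # present in both runs
    }


# Made-up sample data for illustration only.
baseline = {
    "a1f3": "core.NullDereference in parser.c",
    "b7c9": "unix.Malloc in cache.c",
}
current = {
    "b7c9": "unix.Malloc in cache.c",
    "d4e2": "core.DivideZero in stats.c",
}
result = compare_runs(baseline, current)
print(result)
```

In a continuous integration loop, a non-empty "new" list would be the natural condition for failing a build, while "unresolved" reports stay visible without blocking.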

Command line examples of usage can be found here: https://github.com/Ericsson/codechecker/blob/master/docs/usage.md

CodeChecker supports multiple use cases:
  - Small projects/several source files (quick feedback)
      No database is used, analysis results are shown on the command line only
  - Medium size projects (~500 files)
      Results are stored in SQLite/PostgreSQL database and can be viewed from command line or web viewer clients
  - Large size projects (>500 files)
      Results are stored in PostgreSQL database and can be viewed from command line or web viewer clients

There are currently discussions about analyzer tool support in multiple email threads:

http://clang-developers.42468.n3.nabble.com/Idea-for-better-invoking-static-analysis-via-command-line-td4049670.html
http://clang-developers.42468.n3.nabble.com/Proposal-Integrate-static-analysis-test-suites-td4048967.html

CodeChecker provides solutions for many problems discussed there:

  - Problem: Different analyzers provide different output formats (Clang Static Analyzer provides plist/html/command line, Clang-tidy provides command line output only)
    Solution: With CodeChecker, results from multiple analyzers can be viewed in a common way by developers, or passed to other tools for further processing.

  - Problem: Overwriting the CC environment variable, as the previous scan-build version (written in Perl) does, is not always a good solution.
    Solution: Compilation database is generated by CodeChecker (currently using the LD_PRELOAD technique, later with scan-build.py for OSX support).

  - Problem: The analyzer has many command line arguments which may change over time; end users should not be affected.
    Solution: CodeChecker hides the clang analyzer specific options from the user. Many options are preconfigured, but forwarding options to the analyzers without modification is also supported.

  - Problem: Understanding analyzer results might be harder if only command line results are available (the currently generated static HTML sites do not scale and are hard to manage).
    Solution: Analysis steps can be viewed in command line with quickcheck or in the web viewer (dynamically generated based on the database), which can help to understand the analysis results.
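
As a rough illustration of the first problem above, the sketch below turns clang-tidy style command line warnings into a common record that a viewer or another tool could consume. The `Report` record, the function name `parse_diagnostics`, and the sample output are assumptions made for this example and do not reflect CodeChecker's internal data model.

```python
import re
from collections import namedtuple

# Hypothetical common record shared by all analyzers.
Report = namedtuple("Report", "analyzer file line column message checker")

# Matches diagnostics such as:
#   src/a.cpp:12:5: warning: use nullptr [modernize-use-nullptr]
# Paths containing a colon (e.g. Windows drive letters) are not handled.
DIAG_RE = re.compile(
    r"^(?P<file>[^:\n]+):(?P<line>\d+):(?P<col>\d+): "
    r"warning: (?P<msg>.*) \[(?P<checker>[^\]]+)\]$"
)


def parse_diagnostics(text, analyzer):
    """Turn warning lines into Report records; other lines are ignored."""
    reports = []
    for raw in text.splitlines():
        m = DIAG_RE.match(raw.strip())
        if m:
            reports.append(Report(analyzer, m.group("file"),
                                  int(m.group("line")), int(m.group("col")),
                                  m.group("msg"), m.group("checker")))
    return reports


# Illustrative sample output, not taken from a real run.
tidy_output = """\
src/a.cpp:12:5: warning: use nullptr [modernize-use-nullptr]
    int *p = 0;
             ^
src/b.cpp:3:1: warning: parameter 'x' is unused [misc-unused-parameters]
"""
reports = parse_diagnostics(tidy_output, "clang-tidy")
for r in reports:
    print(r)
```

Reports from the Clang Static Analyzer plist output would need their own parser, but could be mapped onto the same record so that storage and viewers stay analyzer-agnostic.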

(**) 3rd party dependencies for various features:
  - Python 2.7.5 (Python Software Foundation) - required to run CodeChecker
  - SQLAlchemy (MIT) - Python SQL toolkit and Object Relational Mapper, for supporting multiple database backends
  - Alembic (MIT) - required for database migration support, which is only available for the PostgreSQL database
  - pg8000 (BSD) or psycopg2 (LGPL) - at least one database connector is required for PostgreSQL database support (both are supported)
  - Thrift (Apache v2.0) - cross-language service building framework to handle data transfer for report storage and result viewer clients
  - Codemirror (MIT) - view source code in the browser
  - Jsplumb (community edition, MIT) - draw bug paths
  - Marked (BSD) - view documentation for checkers written in markdown (generated dynamically)
  - Dojotoolkit (BSD) - main framework for the web UI
  - Highlightjs (BSD) - required for highlighting the source code

For further information check out our GitHub (https://github.com/Ericsson/codechecker) page.

Best Regards,
Gyorgy Orban

This is very cool. So when you ask:

  0. Does the Clang community like the idea?

My answer is a yes.

I have much more to say about this, and I will do so tomorrow.

Nit: Your Python installation is a few years out of date:

Python 2.7.5

…the latest version of Python 2.7 is 2.7.11, and has many fixes.

Hi Alex,

The infrastructure that the Ericsson team has built can and should be used for clang-tidy reported issues as well as for the static analyzer. It would be good if you or someone else working on clang-tidy could chime in on this!

(I’ve been talking with them offline and asked them to send out this email to gather feedback from the community.)
Thanks,
Anna.

Adding CodeChecker infrastructure would be very valuable to those who use clang for bug finding. It provides a single place to view the bugs reported by different tools such as the static analyzer and clang-tidy. The ability to track bugs over time and to cut a baseline so that only new bugs are reported is important for large projects that cannot address all of the issues at once.

Let’s proceed with merging it in. Please split commits into incremental logical chunks. (http://llvm.org/docs/DeveloperPolicy.html#incremental-development)

2. Is the community satisfied with the CodeChecker name?

Unless the name is a blocker on your side, I’d like to discuss it later once we see what the interface looks like. Frankly, I am not a fan of this name since it’s very ambiguous.

- pg8000 (BSD) or psycopg2 (LGPL) - at least one database connector is required for PostgreSQL database support (both are supported)

We should NOT include dependencies on LGPL!

Hi,

We have started to restructure and clean up our source code for better integration with the current lit testing environment in llvm/clang.
Once that is done, we can start merging the source in.

2. Is the community satisfied with the CodeChecker name?

Unless the name is a blocker on your side, I’d like to discuss it later once we see what the interface looks like. Frankly, I am not a fan of this name since it’s very ambiguous.

I do not think it is a blocker, we can discuss it later.

Integration plan:
0. CodeChecker should use scan-build.py (OSX support) to generate the compilation database instead of the current LD_PRELOAD technique

Should we implement this feature (with scan-build.py intercept) before we merge our code base or after?

- pg8000 (BSD) or psycopg2 (LGPL) - at least one database connector is required for PostgreSQL database support (both are supported)

We should NOT include dependencies on LGPL!

This is an optional runtime dependency and is not included. We do not require it; we only support it at runtime if it is available on the host machine.
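
The optional-dependency approach described here can be sketched as follows: try the supported connectors in turn and use whichever one the host provides. The helper name `load_first_available` and the preference order are assumptions made for illustration, not CodeChecker's actual logic.

```python
import importlib


def load_first_available(candidates):
    """Return the first importable module from candidates, or None."""
    for name in candidates:
        try:
            return importlib.import_module(name)
        except ImportError:
            continue  # not installed on this host; try the next one
    return None


# Neither connector is bundled; whichever the host provides is used.
# The order below is an arbitrary choice for this sketch.
driver = load_first_available(["psycopg2", "pg8000"])
if driver is None:
    print("No PostgreSQL connector found; PostgreSQL storage is unavailable.")
else:
    print("Using PostgreSQL connector:", driver.__name__)
```

Because the import happens only at runtime, nothing LGPL-licensed has to be shipped with the tool, and a host with only pg8000 installed still gets PostgreSQL support.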

_______________________________________________
cfe-dev mailing list
cfe-dev@lists.llvm.org
http://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-dev

Best Regards,
Gyorgy Orban

(having dug the e-mail from the bottom of my inbox)

I certainly like the idea of having an open-source web-based results browser for clang-tidy and clang static analyzer results. I like the features of CodeChecker (issue browsing, suppression, diffs). And I can suggest more potentially useful features like:

  • code-centric browsing of the issues (with directory view showing aggregate numbers of issues in each file/subdirectory and a file view showing all issues in the file in a compact form - without execution paths);
  • an easy way to apply fixes for a subset of issues in a file / directory.
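
The directory view suggested in the first bullet could be computed roughly as below: count issues per file, then add each file's count to every enclosing directory. The function name `aggregate_by_directory` and the sample issue list are made up for this sketch.

```python
from collections import Counter
from pathlib import PurePosixPath


def aggregate_by_directory(report_files):
    """Count issues per file and per enclosing directory."""
    per_file = Counter(report_files)
    per_dir = Counter()
    for path, count in per_file.items():
        for parent in PurePosixPath(path).parents:
            if str(parent) != ".":  # skip the implicit top-level entry
                per_dir[str(parent)] += count
    return per_file, per_dir


# One entry per reported issue; the data is invented for illustration.
issue_files = [
    "lib/Sema/SemaDecl.cpp",
    "lib/Sema/SemaDecl.cpp",
    "lib/Sema/SemaExpr.cpp",
    "lib/AST/Decl.cpp",
]
per_file, per_dir = aggregate_by_directory(issue_files)
for directory, count in sorted(per_dir.items()):
    print(directory, count)
```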

I’m not sure, though, whether integrating the CodeChecker source code into the LLVM project brings a lot of benefit to the CodeChecker developers and/or users. I don’t have any objections; I just don’t understand at this point what you expect to achieve by moving the code to LLVM.

The benefit to the CodeChecker team is that they will gain more visibility (both in terms of users and fellow developers).
The benefit to LLVM is that we could gain a much better issue viewing and triaging tool than what we have now. For example, my hope is that CodeChecker would replace scan-view.

Anna.

Fair enough. That makes sense to me. Thank you for the explanation! I’m glad the tool is being developed whether in its own repository or as a part of the llvm project.

Hi,

The first codechecker patch including the core modules, documentation
and unit tests is available here: https://reviews.llvm.org/D24040
After the review is done, the second patch will contain the functional tests.
The infrastructure dependencies for testing should be discussed. Right
now Travis CI is triggered for every commit and pull request to run the
unit and functional tests on Linux and OSX.
The third patch will contain the web UI.

Any feedback is appreciated!

Br,
Gyorgy

Hi,

During the review, the question was raised of where the CodeChecker
source code should live.
It has no strong revision lock to clang or clang-tidy.
I would like to ask the community: where should we put the source code?
Should it go under clang-tools-extra, be kept in a separate repository
under the LLVM umbrella, or do you have another idea?

Br,
Gyorgy

Hi!

I think one of the long-term goals of the Static Analyzer is to remove the current HTML output. The recommended way to view path-sensitive reports, however, is to use an IDE or a viewer that supports highlighting execution paths. It would be strange if, once the HTML output is removed, the static analyzer had no recommended viewer inside the clang repository. I don’t know whether this is a concern, but it is something worth considering.

Regards,

Gábor