Proposal: Integrate CodeChecker analyzer infrastructure

Gyorgy_Orban · February 23, 2016, 10:10am

Hi,

We would like to add CodeChecker (https://github.com/Ericsson/codechecker) analyzer infrastructure.

This is an alternative tool to scan-build with extended functionality.
Some of the main features are: track issues over time, suppress false positives, detect new issues by comparing multiple analyzer run results,
view and compare results in a web browser or in the command line. A more detailed feature list can be found below (*).
The analyzer infrastructure is built so that a new analyzer can be integrated easily.
We are developing a tool that developers and automated continuous integration tools can use easily, and that presents the results from multiple analyzers in a common way.
We think it would serve as a good base for displaying and tracking bugs detected by other clang tools, such as clang-tidy, which is already supported.

For example, you can find the analysis results of the LLVM code 3.6.2 and 3.7.1 here: http://modelserver.inf.elte.hu:5000

Main questions to the community:
0. Does the Clang community like the idea?
1. CodeChecker has some 3rd party dependencies (see (**) below). Are they acceptable?
2. Is the community satisfied with the CodeChecker name?

Integration plan:
0. CodeChecker should use scan-build.py (OSX support) to generate the compilation database instead of the current LD_PRELOAD technique
1. Migrate CodeChecker testing infrastructure to the current LLVM testing infrastructure
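
For readers unfamiliar with the compilation database mentioned in step 0: it is a JSON array in which each entry records a working directory, a source file, and the compile command used for it. The sketch below only shows how a tool might read such a file. The sample content and the helper name load_compile_commands are made up for illustration and are not taken from CodeChecker or scan-build.py.

```python
import json
import shlex

# Hypothetical sample of a compile_commands.json file (JSON Compilation
# Database format: "directory", "file", and "command" or "arguments").
SAMPLE = """
[
  {"directory": "/src/proj", "file": "a.c",
   "command": "cc -Iinclude -c a.c -o a.o"},
  {"directory": "/src/proj", "file": "b.c",
   "arguments": ["cc", "-c", "b.c", "-o", "b.o"]}
]
"""


def load_compile_commands(text):
    """Return (directory, file, argv) tuples from compilation database text."""
    entries = []
    for entry in json.loads(text):
        # An entry carries either a pre-split "arguments" list or a single
        # shell-quoted "command" string.
        argv = entry.get("arguments") or shlex.split(entry["command"])
        entries.append((entry["directory"], entry["file"], argv))
    return entries


entries = load_compile_commands(SAMPLE)
for directory, source, argv in entries:
    print(f"{directory}/{source}: {' '.join(argv)}")
```

An analyzer driver can walk such a list and re-run each compile command with analysis enabled, which is why the way the database gets generated (LD_PRELOAD or scan-build.py) matters for platform support.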

(*) Most notably it extends the current tool set with the following features:
  - stores the results of multiple large analysis runs efficiently (as opposed to scan-build/scan-view static HTML)
  - runs multiple analyzers; currently Clang Static Analyzer and Clang-Tidy are supported
  - dynamic web based defect viewer (instead of static html)
  - a SQLite/PostgreSQL based defect storage & management (both are optional, results can be shown on standard output in quickcheck mode)
  - update analyzer results only for modified files (depends on the build system)
  - compare analysis results (new/resolved/unresolved bugs compared to a baseline)
  - filter analysis results (checker name, severity, source file name ...)
  - skip analysis in specific source directories if required
  - suppression of false positives (in config file or in the source)
  - Thrift API based server-client model for storing bugs and viewing results.
  - It is possible to connect multiple bug viewers. Currently a web-based viewer and a command line viewer are provided.
    (command line client is the recommended way to connect into Continuous Integration loops)
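
To make the "compare analysis results" item above concrete, here is a minimal sketch of how new/resolved/unresolved findings can be derived from two runs with plain set operations. The Report type and its fields are hypothetical and are not CodeChecker's data model. Line numbers are deliberately left out of the identity here, on the assumption that they shift between runs even when the bug stays the same.

```python
from typing import NamedTuple


class Report(NamedTuple):
    """Hypothetical identity of one finding (not CodeChecker's schema)."""
    checker: str
    file: str
    message: str


def compare_runs(baseline, current):
    """Split findings into new, resolved and unresolved relative to a baseline."""
    base, cur = set(baseline), set(current)
    return {
        "new": cur - base,         # only in the current run
        "resolved": base - cur,    # only in the baseline
        "unresolved": base & cur,  # present in both runs
    }


baseline = [
    Report("core.NullDereference", "a.c", "Dereference of null pointer"),
    Report("deadcode.DeadStores", "b.c", "Value stored to 'x' is never read"),
]
current = [
    Report("deadcode.DeadStores", "b.c", "Value stored to 'x' is never read"),
    Report("core.DivideZero", "c.c", "Division by zero"),
]

diff = compare_runs(baseline, current)
for kind in ("new", "resolved", "unresolved"):
    for report in sorted(diff[kind]):
        print(f"{kind}: {report.file}: {report.message} [{report.checker}]")
```

In a CI loop, a non-empty "new" set is the natural signal to fail the build, while "unresolved" findings can be tolerated until someone triages them.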

Command line examples of usage can be found here: https://github.com/Ericsson/codechecker/blob/master/docs/usage.md

CodeChecker supports multiple use cases:
  - Small projects/several source files (quick feedback)
      No database is used, analysis results are shown on the command line only
  - Medium size projects (~500 files)
      Results are stored in SQLite/PostgreSQL database and can be viewed from command line or web viewer clients
  - Large size projects (>500 files)
      Results are stored in PostgreSQL database and can be viewed from command line or web viewer clients
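
As a rough illustration of the database-backed use cases, the sketch below stores findings from two runs in SQLite and filters them by run and checker name. The table layout, the run names, and the helper reports_for are assumptions made up for this example and do not reflect CodeChecker's actual schema.

```python
import sqlite3

# In-memory database for the example; a real setup would use a file
# or a PostgreSQL server.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE reports (run TEXT, checker TEXT, file TEXT, message TEXT)"
)
rows = [
    ("run-1", "core.NullDereference", "a.c", "Dereference of null pointer"),
    ("run-1", "deadcode.DeadStores", "b.c", "Value stored to 'x' is never read"),
    ("run-2", "deadcode.DeadStores", "b.c", "Value stored to 'x' is never read"),
]
conn.executemany("INSERT INTO reports VALUES (?, ?, ?, ?)", rows)
conn.commit()


def reports_for(run, checker_prefix=""):
    """Return (file, checker, message) rows of one run, filtered by checker prefix."""
    cursor = conn.execute(
        "SELECT file, checker, message FROM reports "
        "WHERE run = ? AND checker LIKE ? ORDER BY file",
        (run, checker_prefix + "%"),
    )
    return cursor.fetchall()


print(reports_for("run-1", "core."))
print(len(reports_for("run-2")))
```

Keeping every run in one store is what allows the same query interface to serve both a command line client and a web viewer.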

There are currently discussions about analyzer tool support in multiple email threads:

http://clang-developers.42468.n3.nabble.com/Idea-for-better-invoking-static-analysis-via-command-line-td4049670.html
http://clang-developers.42468.n3.nabble.com/Proposal-Integrate-static-analysis-test-suites-td4048967.html

CodeChecker provides solutions for many problems discussed there:

- Problem: Different analyzers provide different output formats (Clang Static Analyzer provides plist/html/command line, Clang-tidy provides command line output only)
Solution: With CodeChecker, results from multiple analyzers can be viewed in a common way by developers, or passed to other tools for further processing.

- Problem: CC environment variable overwriting by previous scan-build version (written in perl) is not always a good solution.
Solution: Compilation database is generated by CodeChecker (currently using the LD_PRELOAD technique, later with scan-build.py for OSX support).

- Problem: The analyzer has multiple command line arguments which may change over time; end users should not be affected.
Solution: CodeChecker hides the clang analyzer specific options from the user. Many options are preconfigured, but forwarding options unmodified to the analyzers is also supported.

- Problem: Understanding analyzer results can be harder if only command line results are available (the currently generated static HTML sites do not scale and are hard to manage).
Solution: Analysis steps can be viewed in command line with quickcheck or in the web viewer (dynamically generated based on the database), which can help to understand the analysis results.
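
The first problem above (different output formats) can be illustrated with a small sketch that maps Clang Static Analyzer plist output and clang-tidy command line output onto one common record shape. This is not how CodeChecker implements it. The record fields, the function names, and the sample inputs are assumptions for illustration, and the regular expression only covers simple single-line "file:line:col: warning: message [check]" diagnostics.

```python
import plistlib
import re

# clang-tidy prints diagnostics as "file:line:col: severity: message [check-name]".
TIDY_LINE = re.compile(
    r"^(?P<file>[^:\n]+):(?P<line>\d+):(?P<col>\d+): "
    r"(?:warning|error): (?P<message>.*) \[(?P<checker>[^\]]+)\]$"
)


def from_plist(data):
    """Yield common records from Clang Static Analyzer plist bytes."""
    doc = plistlib.loads(data)
    files = doc.get("files", [])
    for diag in doc.get("diagnostics", []):
        location = diag["location"]
        yield {
            "analyzer": "clangsa",
            "checker": diag.get("check_name", "unknown"),
            "file": files[location["file"]],  # "file" is an index into "files"
            "line": location["line"],
            "message": diag["description"],
        }


def from_tidy(output):
    """Yield common records from clang-tidy command line output."""
    for line in output.splitlines():
        match = TIDY_LINE.match(line)
        if match:
            yield {
                "analyzer": "clang-tidy",
                "checker": match["checker"],
                "file": match["file"],
                "line": int(match["line"]),
                "message": match["message"],
            }


# Hypothetical sample inputs, built here so the example is self-contained.
plist_bytes = plistlib.dumps({
    "files": ["a.c"],
    "diagnostics": [{
        "description": "Dereference of null pointer",
        "check_name": "core.NullDereference",
        "location": {"line": 3, "col": 5, "file": 0},
    }],
})
tidy_text = "b.cpp:10:3: warning: use nullptr [modernize-use-nullptr]\n"

records = list(from_plist(plist_bytes)) + list(from_tidy(tidy_text))
for record in records:
    print(record)
```

Once both analyzers produce the same record shape, storage, filtering and viewing code no longer needs to know which tool reported a finding.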

(**) 3rd party dependencies for various features:
  - Python 2.7.5 (Python Software Foundation) - required to run CodeChecker
  - SQLAlchemy (MIT) - Python SQL toolkit and Object Relational Mapper, for supporting multiple database backends
  - Alembic (MIT) - required for database migration support which is only available for PostgreSQL database
  - pg8000 (BSD) or psycopg2 (LGPL) - at least one database connector is required for PostgreSQL database support (both are supported)
  - Thrift (Apache v2.0) - cross-language service building framework to handle data transfer for report storage and result viewer clients
  - Codemirror (MIT) - view source code in the browser
  - Jsplumb (community edition, MIT) - draw bug paths
  - Marked (BSD) - view documentation for checkers written in markdown (generated dynamically)
  - Dojotoolkit (BSD) - main framework for the web UI
  - Highlightjs (BSD) - required for highlighting the source code

For further information check out our GitHub (https://github.com/Ericsson/codechecker) page.

Best Regards,
Gyorgy Orban

_Alexander_G_Riccio1 · February 24, 2016, 8:55am

This is very cool. So when you ask:

Does the Clang community like the idea?

My answer is a yes.

I have much more to say about this, and I will do so tomorrow.

Nit: Your Python installation is a few years out of date:

Python 2.7.5

…the latest version of Python 2.7 is 2.7.11, and has many fixes.

AnnaZaks · February 27, 2016, 6:05pm

Hi Alex,

The infrastructure that the Ericsson team has built can and should be used for clang tidy reported issues as well as the static analyzer. Would be good if you or someone else working on clang tidy could chime in on this!

(I’ve been talking with them off line and asked them to send out this email to gather the feedback from the community.)
Thanks,
Anna.

AnnaZaks · March 2, 2016, 5:42pm

Adding CodeChecker infrastructure would be very valuable to those who use clang for bug finding. It provides a single place to view the bugs reported by different tools such as the static analyzer and clang-tidy. The ability to track bugs over time and to cut a baseline, so that only new bugs are reported, is important for large projects that cannot address all of the issues at once.

Let’s proceed with merging it in. Please, split commits into incremental logical chunks. (LLVM Developer Policy — LLVM 18.0.0git documentation)

2. Is the community satisfied with the CodeChecker name?

Unless the name is a blocker on your side, I’d like to discuss it later once we see what the interface looks like. Frankly, I am not a fan of this name since it’s very ambiguous.

- pg8000 (BSD) or psycopg2 (LGPL) - at least one database connector is required for PostgreSQL database support (both are supported)

We should NOT include dependencies on LGPL!

Gyorgy_Orban · March 4, 2016, 12:10pm

Hi,

We have started to restructure and clean up our source code for better integration with the current lit testing environment in llvm/clang.
After we are done we can start to merge the source in.

2. Is the community satisfied with the CodeChecker name?

Unless the name is a blocker on your side, I’d like to discuss it later once we see what the interface looks like. Frankly, I am not a fan of this name since it’s very ambiguous.

I do not think it is a blocker, we can discuss it later.

Integration plan:
0. CodeChecker should use scan-build.py (OSX support) to generate the compilation database instead of the current LD_PRELOAD technique

Should we implement this feature (with scan-build.py intercept) before we merge our code base or after?

- pg8000 (BSD) or psycopg2 (LGPL) - at least one database connector is required for PostgreSQL database support (both are supported)

We should NOT include dependencies on LGPL!

This is an optional runtime dependency (it is not included), we do not require it, we just support it at runtime if available at the host machine.

Best Regards,
Gyorgy Orban

alexfh · April 25, 2016, 3:30pm

(having dug the e-mail from the bottom of my inbox)

I certainly like the idea of having an open-source web-based results browser for clang-tidy and clang static analyzer results. I like the features of CodeChecker (issue browsing, suppression, diffs). And I can suggest more potentially useful features like:

  - code-centric browsing of the issues (with a directory view showing aggregate numbers of issues in each file/subdirectory and a file view showing all issues in the file in a compact form, without execution paths);
  - an easy way to apply fixes for a subset of issues in a file / directory.

I’m not sure though, if integrating CodeChecker source code to the LLVM project brings a lot of benefits to the CodeChecker developers and/or users. I don’t have any objections, I just don’t understand at this point, what are you expecting to achieve by moving the code to LLVM.

AnnaZaks · April 25, 2016, 4:22pm

I’m not sure though, if integrating CodeChecker source code to the LLVM project brings a lot of benefits to the CodeChecker developers and/or users. I don’t have any objections, I just don’t understand at this point, what are you expecting to achieve by moving the code to LLVM.

The benefit to the CodeChecker team is that they will gain more visibility (both in terms of users and fellow developers).
The benefit to LLVM is that we could gain a much better issue viewing and triaging tool than what we have now. Ex: my hope is that CodeChecker would replace scan-view.

Anna.

alexfh · April 25, 2016, 4:29pm

Fair enough. That makes sense to me. Thank you for the explanation! I’m glad the tool is being developed whether in its own repository or as a part of the llvm project.

Orban_Gyorgy · August 31, 2016, 9:32am

Hi,

The first codechecker patch including the core modules, documentation
and unit tests is available here: https://reviews.llvm.org/D24040
After the review is done, the second patch will contain the functional tests.
The infrastructure dependencies for testing should be discussed. Right
now Travis CI is triggered
for every commit and pull request to run the unit and functional tests
on Linux and OSX.
The third patch will contain the web UI.

Any feedback is appreciated!

Br,
Gyorgy

Orban_Gyorgy · September 15, 2016, 9:38am

Hi,

During the review, a question was raised about where we should put the
CodeChecker source code.
It has no strong revision lock to clang or clang-tidy.
I would like to ask the community: where should the source code go?
Should it go under clang-tools-extra, be kept in a separate
repository under the LLVM umbrella, or do you have another idea?

Br,
Gyorgy

Xazax-hun · September 15, 2016, 3:45pm

Hi!

I think one of the long term goals of the Static Analyzer is to remove the current HTML output. The recommended way to view path sensitive reports, however, is to use an IDE or a viewer that supports highlighting execution paths. It would be strange if, once the HTML output is removed, the static analyzer had no recommended viewing method inside the clang repository. I don’t know whether this is a concern, but it is something worth considering.

Regards,

Gábor
