[CrossTU] [CTU] Status of Cross Translation Unit Static Analysis, and a new build server

Dear Clang Community,

TL;DR: I’d like to encourage you to experiment with CTU static analysis because it has evolved a lot this year! Usage documentation is available online.

This year, we have landed several ASTImporter patches that made it possible to analyze even C++11/14 projects with reasonable stability. I can confidently assure you that the upstream master of llvm/llvm-project is as stable as our downstream fork. We have put a great deal of effort into implementing proper error handling in ASTImporter, which dramatically improved the stability of CTU. We still have a few more patches that we want to land, but they are not related to error handling. Starting with Clang 10, we no longer plan to maintain our downstream CTU fork.

We have set up a publicly available Jenkins build server that continuously analyzes the following C and C++ projects:

  • Tmux (C)

  • Curl (C)

  • Redis (C)

  • Xerces (C++14)

  • Bitcoin (C++11)

  • Protobuf (C++11/C++14)

CTU analysis results are compared to non-CTU results for both the C and the C++ projects. CTU always results in more findings, and the false-positive/true-positive ratio remains roughly the same [1]. We are monitoring the analysis job, and if an assertion failure or crash happens, we will get in touch with the author of the commit that plausibly caused it. We also have a buildbot patch in Phabricator that is meant to analyze only one simple C project (Tmux). We decided to install Jenkins for reasons of ownership and flexibility: the buildbot code is pretty convoluted, and review and communication are very slow. The buildbot also uses CodeChecker as a dependency, so changes to the CodeChecker workflow require changes in the buildbot configuration as well.

Notes:

Please note that our primary target for CTU is Linux, and we encourage everybody to use CodeChecker for CTU (scan-build is no longer supported by CTU developers). Projects that use templates heavily can expect an increase in analysis time. Also note that CTU can be very memory-intensive: when analyzing the LLVM code base, we have seen 10 GB of resident memory usage for a single process. Thus, it may be useful to set a maximum limit for the loaded ASTUnits (e.g. -analyzer-config ctu-import-threshold=8; with CodeChecker you need to edit a saargs file).
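
As a rough illustration of the saargs approach mentioned above, the following Python sketch writes such a file and assembles a matching CodeChecker command line without running it. The helper names (`write_saargs`, `build_analyze_command`) and the file names are hypothetical. The exact file format is an assumption: it presumes that the file's contents are forwarded verbatim to the clang driver, so each analyzer option is passed through `-Xclang`. Please check the CodeChecker documentation for your version.

```python
from pathlib import Path


def write_saargs(path, import_threshold=8):
    """Write a saargs file that caps the CTU import threshold.

    Assumption: the file's contents are forwarded verbatim to the clang
    driver, so each frontend/analyzer option is prefixed with -Xclang.
    """
    args = [
        "-Xclang", "-analyzer-config",
        "-Xclang", f"ctu-import-threshold={import_threshold}",
    ]
    Path(path).write_text(" ".join(args) + "\n")
    return args


def build_analyze_command(compile_db, output_dir, saargs_path):
    """Assemble (but do not run) a CTU analysis command line."""
    return [
        "CodeChecker", "analyze", "--ctu", str(compile_db),
        "-o", str(output_dir),
        "--saargs", str(saargs_path),
    ]
```

The returned list could be passed to `subprocess.run` on a machine where CodeChecker is installed. A lower threshold trades some cross-TU coverage for lower memory usage.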

[1] 2017 EuroLLVM Developers’ Meeting: G. Horvath “Cross Translational Unit Analysis in Clang …”

Cheers,

Gabor Marton

I was initially going to write a mail that CodeChecker trunk + clang
trunk isn't sufficient, since it complained for me:
$ clang --version
clang version 10.0.0-+20191116100608+584704c725a-1~exp1~20191116211218.2875
Target: x86_64-pc-linux-gnu
Thread model: posix
InstalledDir: /usr/local/bin
$ CodeChecker analyze --ctu build-Clang-SANITIZE/compile_commands.json -o clang-sa-reports
usage: CodeChecker [-h] {analyze} ...
CodeChecker: error: unrecognized arguments: --ctu

But apparently the detection of which clang to use is simply broken.
Because if I follow
https://github.com/Ericsson/codechecker#configuring-clang-version
and specify clang-10, it appears to start working. I guess I should file a bug.
Also, https://github.com/Ericsson/codechecker/issues/1841 is *really*
inconvenient :confused:

Roman.

Hi,

First, we try to find clang-extdef-mapping next to the clang binary being used. Then we search the PATH for clang-extdef-mapping.
We suspect that when you do not specify clang-10, clang-extdef-mapping is not found in the directory of the clang that your PATH selects (provided by the Debian package?).
Anyway, thanks for the issue report. We are considering giving better guidance when clang-extdef-mapping is not found. Currently we just bail out with a simple “unrecognized arguments” error in this case, which is indeed not very user-friendly.
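
The lookup order described above can be sketched roughly as follows. This is a minimal illustration and not CodeChecker's actual code: the function name `find_extdef_mapping` and its parameters are hypothetical, and it assumes symlinks are not resolved before looking next to the clang binary.

```python
import os
import shutil


def find_extdef_mapping(clang="clang", search_path=None):
    """Return the path of clang-extdef-mapping, or None if it is not found.

    Step 1: look in the directory of the clang binary that would be used.
    Step 2: fall back to a plain PATH search.
    search_path overrides PATH (useful for testing); None means use PATH.
    """
    tool = "clang-extdef-mapping"

    # Step 1: locate clang itself, then check for the tool beside it.
    clang_path = shutil.which(clang, path=search_path)
    if clang_path is not None:
        candidate = os.path.join(os.path.dirname(clang_path), tool)
        if os.path.isfile(candidate) and os.access(candidate, os.X_OK):
            return candidate

    # Step 2: search every PATH directory for the tool.
    return shutil.which(tool, path=search_path)
```

A result of `None` corresponds to the case where CTU support cannot be enabled, which is where a clearer diagnostic than “unrecognized arguments” would help.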

Thanks,
Gabor

Hi Alexander,

Thank you for your feedback.

Can you please configure HTTPS for the buildbot? :slight_smile: I know that’s not urgent for now but… you know, browsers “don’t like” insecure connections :slight_smile:
Yes, we are working on it.

Do you plan to extend the set of C/C++ projects for continuous analysis? I think it could help catch more errors earlier. But I’m not sure how much build power you have.
Actually, we are not against that, but we are already at the limit of our Azure budget. Perhaps we could drop the protobuf analysis (it takes the longest) and add other valuable C++ projects instead.
Currently we rent an 8-core machine with 64 GB of memory (Standard_E8s_v3).

If it’s not a secret, could you publish more details about your Jenkins configuration for continuous CSA testing somewhere? I am asking because I have some free hardware, and I thought it could be used for this kind of work. It would be helpful even for development.

Yes. Here it is: https://github.com/Ericsson/clang-jenkins
We have two dependencies for the build: 1) CodeChecker 2) csa_testbanch. They are pre-installed into the directory ctu_pipeline_aux, i.e. the job does not clone/configure/install them.
The pipeline script is simply copied into the Jenkins configuration of the pipeline job.

Gabor

Any chance this could be wired up to the existing buildbot or jenkins infrastructure, rather than having another one?

(I’m not 100% sure it’s better for it to all be together, but figured I’d ask/perhaps there are reasons that’ve already been considered/articulated)

Any chance this could be wired up to the existing buildbot or jenkins infrastructure, rather than having another one?
Parallel to the Jenkins job, we started working on a simplified buildbot instance, see https://reviews.llvm.org/D61848. However, the process of integrating it seems to be stuck. Maybe Galina and Endre could revive that process? Also, the Jenkins job is a long-running job: it takes 8 hours to complete (8 cores/64 GB Azure VM), but it analyzes 6 C/C++ projects. The buildbot is quick, taking roughly 1-2 hours, but it analyzes only one simple C project.

I have several infrastructure questions about Jenkins in LLVM:
Is there a way to integrate our Jenkins job into any LLVM Jenkins cluster? I know about http://green.lab.llvm.org/green/, but that cluster seems to be OSX only. We need Linux. Is there a Jenkins cluster of Linux executors we could use? Should we somehow connect our Azure VM to that cluster as an additional executor?

Gabor

Hey Adrian - is there any likelihood/interest in having Green Dragon include non-apple-managed builders/scenarios in part to avoid proliferation of distinct CI systems?

+Azhar manages Green Dragon and is in a better position to answer that question.

– adrian