[analyzer] Adding build bot for static analyzer reference results

Hi all,

We’re planning to add a public Apple build bot for the static analyzer to Green Dragon (http://lab.llvm.org:8080/green/). I’d like to get your feedback on our proposed approach.

The goal of this bot is to catch unexpected analyzer regressions, crashes, and coverage loss by periodically running the analyzer on a suite of open-source benchmarks. The bot will compare the produced path diagnostics to reference results. If these do not match, we will e-mail the committers and a small set of interested people. (Let us know if you want to be notified on every failure.) We’d like to make it easy for the community to respond to analyzer regressions and update the reference results.

We currently have an Apple-internal static analyzer build bot and have found it helpful for catching mistakes that slip past the normal tests. The main downside is that the reference results need to be updated whenever new checks are added or the analyzer output changes.

We propose taking a “curl + cache” approach to benchmarks. That is, we won’t store the benchmarks themselves in a repository. Instead, the bots will download them from the projects’ websites and cache them locally. If we need to change the benchmarks (to get them to compile with newer versions of clang, for example), we will represent these changes as patch sets that are applied to the downloaded sources. Both the patch sets and the reference results will be checked into the llvm.org/zorg repository, so anyone with commit access will be able to update them. The bot will use the CmpRuns.py script (in clang’s utils/analyzer/) to compare the path diagnostic plists produced by each run to the reference results.
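To make the flow concrete, here is a rough sketch of what the bot would do for a single benchmark. The tarball URL, patch path, output layout, and exact CmpRuns.py invocation below are placeholders rather than the final design:

#!/usr/bin/env python
"""Rough sketch of the "curl + cache" flow for one benchmark.

Illustrative only: the tarball URL, patch path, output layout, and the
exact CmpRuns.py invocation are placeholders, not the final design.
"""
import os
import subprocess

CACHE_DIR = os.path.expanduser("~/analyzer-bench-cache")             # local download cache
BENCH_URL = "https://www.sqlite.org/sqlite-autoconf-3081101.tar.gz"  # example pinned tarball
SRC_DIR = "sqlite-autoconf-3081101"                                  # directory the tarball unpacks to
PATCH = "zorg/benchmarks/sqlite/build-fixes.patch"                   # hypothetical patch set in llvm.org/zorg
REFERENCE = "zorg/benchmarks/sqlite/reference-results"               # checked-in reference plists
OUTPUT = os.path.abspath("analyzer-output/sqlite")                   # plists produced by this run


def fetch_and_cache(url, cache_dir):
    """Download the benchmark tarball only if it is not already cached."""
    if not os.path.isdir(cache_dir):
        os.makedirs(cache_dir)
    dest = os.path.join(cache_dir, os.path.basename(url))
    if not os.path.exists(dest):
        subprocess.check_call(["curl", "-L", "-o", dest, url])
    return dest


def main():
    tarball = fetch_and_cache(BENCH_URL, CACHE_DIR)
    subprocess.check_call(["tar", "xzf", tarball])

    # Apply the checked-in patch set so the benchmark builds with ToT clang.
    subprocess.check_call(["patch", "-p1", "-i", os.path.abspath(PATCH)], cwd=SRC_DIR)

    # Run the analyzer over the benchmark build, emitting plists for comparison.
    subprocess.check_call(["scan-build", "-plist", "-o", OUTPUT, "./configure"], cwd=SRC_DIR)
    subprocess.check_call(["scan-build", "-plist", "-o", OUTPUT, "make"], cwd=SRC_DIR)

    # Compare the new plists against the checked-in reference results.  The
    # bot would inspect the diff and mail committers when they do not match.
    subprocess.call(["clang/utils/analyzer/CmpRuns.py", REFERENCE, OUTPUT])


if __name__ == "__main__":
    main()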

We’d very much appreciate feedback on this proposed approach. We’d also like to solicit suggestions for benchmarks; we hope to grow the suite over time. We think sqlite, postgresql, openssl, and Adium (for Objective-C coverage) are good initial benchmarks, but we’d like to add C++ benchmarks as well (perhaps LLVM itself?).

Devin Coughlin
Apple Program Analysis Team

If we're going to be downloading things from external sources, those sources could be changing, no? Or will we pin to a specific version? If we're pinning to a specific version, what's the benefit of taking an external dependency like that (untrusted, may be down when we need it, etc.) compared to copying the files permanently and checking them in to clang-tests (or clang-tests-external), as I did for GDB?

If we're interested in catching regressions in both the external code and our code (which I'm interested in doing for GDB, but haven't had time), I can see why it'd make sense to track ToT of both projects. But that's a bit of a different goal, and we'd probably want someone to triage those failures before mailing the developers who committed the changes, because many regressions will be due to the external project changing rather than to the LLVM developer's change.

We will pin the benchmarks to specific versions, although we may want to update to newer versions periodically (yearly or even less frequently) to make sure we get good coverage of newer language features and to keep the benchmarks compiling with ToT clang.
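Concretely, each pinned benchmark could be described by its exact download URL, a checksum, and the patch set to apply, all checked into zorg. This is only a sketch of one possible layout; the names and fields below are hypothetical:

# Hypothetical pinning scheme: each benchmark records the exact tarball URL,
# a checksum so the bot fails loudly if the upstream file ever changes, and
# the patch set to apply after download.  Names and fields are placeholders.
BENCHMARKS = {
    "sqlite": {
        "url": "https://www.sqlite.org/sqlite-autoconf-3081101.tar.gz",
        "sha256": "<checksum of the pinned tarball>",
        "patches": ["sqlite/build-fixes.patch"],
    },
    "openssl": {
        "url": "https://www.openssl.org/source/openssl-1.0.2d.tar.gz",
        "sha256": "<checksum of the pinned tarball>",
        "patches": [],
    },
}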

The goal here is to avoid unnecessary commingling of benchmarks with different licenses in the llvm.org repositories, but as you point out, it comes with significant downsides. What has your experience with gdb and clang-tests-external been like? Has dealing with the differing licensing with respect to clang been a challenge?

Devin

I don't believe so. We just put it in a separate repository from clang-tests, so that those who might have reasons for not wanting to view code under such a license wouldn't accidentally run across it.

But as usual, legal/licensing issues should be directed at lawyers, of
which I am not one.

- David

Sending emails to people who change the results of the static analyzer seems fine. I’m concerned that catching performance regressions in the analyzer might produce some false positives, though. The static analyzer is fairly isolated, so maybe there won’t be many, but if it becomes a problem we should probably just disable that part of the reporting and simply track performance over time.

We’re not proposing to report performance regressions with this bot — only regressions in analyzer diagnostics. I think it would be very useful to track performance over time, but that is not something we plan to do initially with this bot.

Devin