Future directions for the analyzer

Hi, Alex, hi, cfe-dev. As I mentioned last night, here are some of the things Ted, Anna, and I have talked about as long-term directions for the static analyzer, particularly those that might relate to clang-tidy.

One of the things the recent CheckName patch brought up is how to silence warnings. Today’s check names (apart from those in the alpha.* packages) haven’t ever been promised to be stable, but we know people are relying on them, so we try not to change them often. Our last pass for cleaning up these names was prompted by the “macosx” → “osx” rename, but even there we’re sort of sidestepping iOS, which hasn’t been called “OS X” since 2008. The more interesting thing might be to break down existing checkers further and to come up with a good notion of what constitutes a single “check”. Which checkers need to be turned on to implement a particular “check” should be an implementation detail (and mostly already is).

Going a bit further, how do we mark issues as false positives? We have deliberately avoided just copying the diagnostic pragmas, because those aren’t necessarily the best interface for the analyzer. Instead, we’d like to come up with some kind of issue tracking, a way to identify an issue across rebuilds. Currently we have a basic implementation of such a thing based on the function the issue was reported in, its position within that function (actually just a relative line number), and the text of the diagnostic. This is clearly less than ideal, but even with that we’ve had a bit of internal success in comparing analysis results from different revisions of a project—or of the analyzer. (This is where the CmpRuns.py tool comes from, and why it still has that name even though it’s become a sort of general-purpose access to the analyzer plists.)
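
To make that identification scheme concrete, here is a rough Python sketch. It is not the CmpRuns.py code; IssueKey, issue_key, and diff_runs are names made up for illustration, and the only thing taken from the description above is the three-part identity (function, relative line, diagnostic text).

```python
from typing import NamedTuple


class IssueKey(NamedTuple):
    """Hypothetical identity of a report, meant to survive rebuilds."""
    function: str        # function the issue was reported in
    relative_line: int   # report line minus the function's first line
    message: str         # text of the diagnostic


def issue_key(function, function_start_line, report_line, message):
    # Using a line number relative to the function start means edits
    # elsewhere in the file do not change the key.
    return IssueKey(function, report_line - function_start_line, message)


def diff_runs(old_keys, new_keys):
    """Return (fixed, introduced, unchanged) sets of issue keys."""
    old, new = set(old_keys), set(new_keys)
    return old - new, new - old, old & new


# The same report before and after 40 unrelated lines were added above
# the function: the absolute lines differ, but the keys compare equal.
before = issue_key("parseHeader", 100, 112, "Potential leak of memory")
after = issue_key("parseHeader", 140, 152, "Potential leak of memory")
assert before == after
```

The weakness is easy to see from the sketch: any edit inside the function above the report line, or any rewording of the diagnostic, changes the key, so one issue shows up as one "fixed" and one "introduced".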

As you know, currently the only way to silence an unwanted analyzer report is to disable a checker, which can be done on a per-translation-unit basis. To make a long story short, we haven’t designed anything else yet, but if/when we do we’d prefer it to be more in line with “issue tracking” than with “silencing”. (“We” here refers to “the LLVM community interested in the analyzer”, not just “Apple analyzer folks”.)

In the long term, we’ve thought about adding a true full-program analysis mode to the analyzer. In this mode, scan-build might really be the primary interface, with “clang --analyze” limited to a single translation unit. The single-TU mode probably isn’t going away, and even today most of the checkers operate on a per-function level (most analyzer state is reset between top-level entry points), but there are certainly things that would require cross-TU reasoning, such as Vassil’s proposed project-wide copy-paste analysis. Of course, we would be able to expose this functionality in a library (pass a compilation database, analyze everything in it).
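
Purely as a sketch of the shape such a library entry point might take: the Python below reads a JSON compilation database (the compile_commands.json format, where each entry has "directory", "file", and either "command" or "arguments") and hands every entry to a callback. The names translation_units, analyze_everything, and analyze_tu are hypothetical, and this is not an existing analyzer API.

```python
import json
import shlex


def translation_units(database_text):
    """Yield (directory, file, argv) for each compilation database entry."""
    for entry in json.loads(database_text):
        # An entry carries either an "arguments" list or a "command" string.
        argv = entry.get("arguments") or shlex.split(entry["command"])
        yield entry["directory"], entry["file"], argv


def analyze_everything(database_text, analyze_tu):
    """Call the (hypothetical) analyze_tu callback once per entry."""
    return [analyze_tu(directory, file, argv)
            for directory, file, argv in translation_units(database_text)]


# A made-up two-entry database, one entry in each of the two forms.
DATABASE = """[
  {"directory": "/src", "file": "a.c", "command": "clang -c a.c -o a.o"},
  {"directory": "/src", "file": "b.c", "arguments": ["clang", "-c", "b.c"]}
]"""

visited = analyze_everything(DATABASE, lambda d, f, argv: (f, len(argv)))
```

Note that this only visits translation units one at a time. A real full-program mode would need the callback to share state across entries, which is exactly the part that has not been designed.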

All of these things could be interesting for clang-tidy in the future, but likely not in any sort of incompatible way. (More proximally, I’m still wondering how you intend to expose path diagnostics in a meaningful way.) Regardless, it’s probably good to write this all down, and yes, I should probably include some of it on the analyzer website.

For now, I think Ted said it well in our offline chat yesterday: getting more people to use the analyzer can only be good for the project. Please continue to send requests and patches our way, and thanks for your patience and understanding of my concerns thus far. Let’s make this a good one!

Jordan

Thanks for the summary, Jordan. I wanted to comment on the point about silencing warnings in particular, because it might not be apparent to everyone why we want to do things a bit differently for the analyzer than we do for ordinary compiler diagnostics.

With the static analyzer, two factors influence how a particular issue gets reported:

(1) The path the analyzer discovered that led to the issue.

(2) Where along the path the issue should be reported.

Both of these have changed over time, and as we have added richer analysis (e.g., interprocedural analysis within a translation unit) we’ve had a great deal of leeway, and done a fair amount of design work, in figuring out the best locations for diagnostics to appear. We’d like to continue to explore new ways to improve analyzer diagnostics over time, while fundamentally not losing track of a user’s decision to suppress an issue (regardless of whether it is a true positive or a false positive).

Besides desiring flexibility in how issues are reported, we want issue triaging to persist (as much as possible) over code changes. If a user suppresses an issue and modifies their code, it is possible that the analyzer finds the same problem but along a different code path. In such cases, the report might even appear in a different location. Having a flexible triaging system besides pragmas allows us to potentially design solutions that make the analyzer far more tolerant of an evolving codebase.

Finally, users often use the static analyzer with projects that contain a mixture of code that they own and code borrowed from somebody else. For example, this could be as simple as using templates from Boost. An issue from the analyzer may easily span such code boundaries, and users may feel reluctant to sprinkle pragmas (or other source annotations) in code they don’t directly own. This same problem can exist with compiler warnings, but it doesn’t usually manifest in quite the same way. Today we already employ a fair number of heuristics in the analyzer to suppress warnings in contexts where they aren’t that interesting, but ideally we’d want a solution in the long term that is more flexible. Such flexibility will become more essential as the analyzer gets more sophisticated (e.g., global code analysis).
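
One way to picture triage that lives outside the source is the Python sketch below. It is an illustration, not a design anyone has committed to: the JSON layout, the field names, and the functions load_triage and untriaged are all assumptions, as is the choice to match on function and bug type while ignoring the report location.

```python
import json


def load_triage(text):
    """Parse a sidecar triage file: a JSON list of user decisions."""
    return {(d["function"], d["bug_type"]): d["status"]
            for d in json.loads(text)}


def untriaged(reports, triage):
    """Keep only the reports the user has not already decided on."""
    # The file and line are deliberately not part of the match, so a
    # decision survives when the report moves to a different location.
    return [r for r in reports
            if (r["function"], r["bug_type"]) not in triage]


# Made-up decision about borrowed code, stored next to the project
# rather than as an annotation inside the borrowed header.
TRIAGE_FILE = """[
  {"function": "vendor::parse", "bug_type": "Use-after-free",
   "status": "false positive"}
]"""

reports = [
    {"function": "vendor::parse", "bug_type": "Use-after-free",
     "file": "vendor/parser.hpp", "line": 120},
    {"function": "main", "bug_type": "Memory leak",
     "file": "main.cpp", "line": 12},
]

remaining = untriaged(reports, load_triage(TRIAGE_FILE))
```

The trade-off in this particular sketch is over-suppression: a second, distinct issue of the same bug type in the same function would be hidden too, so a real scheme would need a finer notion of issue identity.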