Static analysis tool development

Ben Laurie of Google has up to $50K to spend on a pilot project to
improve the state of static analysis of C code for open source
projects. Among other things Ben is involved with the OpenSSL project
and has tried some static analyzers such as Deputy, and Cyclone (which
is also a language extension of C), and has noted various problems and
limitations with these tools.

The goal of this pilot project is to get a static analyzer tool
developed/modified so that it is truly useful to the open source
community and can become a standard part of the development process.
The ability to customize the analysis is strongly desired. For
instance, after a security exploit is reported people might want to
review the rest of the code for the same problem. An analyzer that
helped do that would be quite beneficial.

If the pilot project goes well then additional funding is possible.

From the research I have done, clang seems to be the best front end
for this type of project. I have some questions:

1) What is the state of the static analyzer? Where do I start
learning what needs done on it? Is there a long term plan for it?

2) The ability to add custom logic to the analyzer is quite desirable.
Perhaps this could be made easier by integrating with a scripting
language like Python. To me, even the ability to write a script to
pattern match against the AST or other intermediate forms could be
quite useful.

3) Simply managing the volume of warnings can be difficult. I would
like to integrate some method of tracking warnings from build to build
to see what's new and perhaps to be able to prioritize what should be
investigated first. This would probably be separate from the
analyzer, but a useful front end will help the tool get adopted more
readily.

4) Annotations can be helpful to guide an analyzer. How difficult
would it be to extend the parser to accept a simple annotation syntax?

I am open to collaborating on this project if anyone is available.

Also, if you would like to learn more about this project or submit
your own proposal, feel free to contact "Ben Laurie"
<benl@google.com>.

Thanks for your help.

Monty

1) What is the state of the static analyzer? Where do I start
learning what needs done on it? Is there a long term plan for it?

To answer one of my own questions, I just found the slides and video of
the presentation "Finding Bugs with the Clang Static Analyzer":
http://llvm.org/devmtg/2008-08/

Monty

Hi all,

Monty Zukowski wrote:

From the research I have done, clang seems to be the best front end
for this type of project. I have some questions:

1) What is the state of the static analyzer? Where do I start
learning what needs done on it? Is there a long term plan for it?

I would strongly suggest starting with:

Textbooks and introductory courses

Ben Laurie of Google has up to $50K to spend on a pilot project to
improve the state of static analysis of C code for open source
projects. Among other things Ben is involved with the OpenSSL project
and has tried some static analyzers such as Deputy, and Cyclone (which
is also a language extension of C), and has noted various problems and
limitations with these tools.

Hi Monty, Ben,

It’s great to hear about your interest! Since some of your questions are fairly broad, I have tried to respond with varying levels of detail. Please do not hesitate to follow up with specific questions.

The goal of this pilot project is to get a static analyzer tool
developed/modified so that it is truly useful to the open source
community and can become a standard part of the development process.
The ability to customize the analysis is strongly desired. For
instance, after a security exploit is reported people might want to
review the rest of the code for the same problem. An analyzer that
helped do that would be quite beneficial.

If the pilot project goes well then additional funding is possible.

[1] The analyzer core can be modified or extended with new program analysis algorithms to meet the needs of different clients (e.g., provide the flexibility to tradeoff between accuracy and computational complexity, etc.).

[2] The analyzer core can be reused in a variety of contexts, such as in a standalone command-line tool or an IDE.

[3] The analyzer can be extended with new sets of “checks” without invasively modifying the analyzer internals. Such extensibility can be provided at different layers of abstraction from the core analyzer API, with some checks possibly being written in a very high-level (e.g., declarative) manner while others require more direct access to the core analyzer APIs. Both ends of the spectrum are important because some checks will require sophisticated algorithms with deep reasoning about the program while others might be much simpler and just clue off of common interfaces and APIs.

The analyzer core in Clang is intended to (eventually) meet all three goals. So far efforts have focused on [1] and [2], although there has been some work on [3] with regards to checks that directly use the analyzer’s C++ APIs. Providing higher-level interfaces to declaring checks is a great long term goal, but my feeling is that it makes more sense to mature the core technology first.

As a high-level design strategy, the analyzer core is written as a library (libAnalysis). This makes it inherently much easier to reuse in different settings, which directly addresses goal [2]. The structure of libAnalysis is fairly modular, with a set of key abstract interfaces put in place to allow us to swap in and out different key components of the analyzer. For example, the engine that reasons about feasible paths within a function uses a “ConstraintManager” interface that is responsible for reasoning about the possible values of symbols along a path. Different implementations of ConstraintManager can be implemented to provide different levels of precision depending on the needs of clients.

Another high-level goal of the analyzer is to support the relaying of rich diagnostics to end-users about how a bug manifests in their program. The diagnostic reporting mechanism in the analyzer also uses a set of abstract interfaces so that bug reports can be rendered in a variety of ways (e.g., to the console, to an HTML page, within an IDE, etc.). Providing rich diagnostics is an important goal because without them the results of a static analysis algorithm are only useful to graduate students studying program analysis techniques rather than programmers who want to fix bugs.

From the research I have done, clang seems to be the best front end
for this type of project. I have some questions:

  1. What is the state of the static analyzer? Where do I start
    learning what needs done on it? Is there a long term plan for it?

The talk I gave last August provides a broad overview of the project and its state:

Finding Bugs with the Clang Static Analyzer
http://llvm.org/devmtg/2008-08/

More information on using the analyzer can be found at:

http://clang.llvm.org/StaticAnalysis.html

There is currently no real “internals” documentation on the analyzer, which of course is something that needs to be written. Perhaps the easiest way to look at the state of the analyzer is to look at the implementation of specific checks in the lib/Analysis directory of the Clang sources. Another good starting point is the file Driver/AnalysisConsumer.cpp; this file corresponds to the code in the driver that invokes the static analysis algorithms on a specific source file.

At a high level the analyzer consists of two analysis “engines”:

[a] A flow-sensitive, intra-procedural dataflow engine.
[b] A path-sensitive, intra-procedural (and eventually inter-procedural) dataflow engine.

FLOW-SENSITIVE ENGINE

Engine [a] mirrors the logic that typically goes on for flow-sensitive compiler checks, e.g.: live variables analysis, basic uninitialized variable checking, etc. The analyzer also has a “dead stores” checker that is based on [a]. The benefit of [a] is that the running time of an analysis is linear. The downside of [a] is that information can be lost where paths merge (e.g., at confluence points after branches).

Information on the theory and algorithms of [a] can be found in fairly standard compiler texts such as:

Advanced Compiler Design and Implementation
author: Muchnick

Compilers: Principles, Techniques and Tools Second Edition
authors: Aho, Lam, Sethi, Ullman

In libAnalysis, two analyses use [a]: LiveVariables.cpp and UninitializedValues.cpp.
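To make the fixed-point iteration behind engine [a] concrete, here is a minimal, self-contained sketch of a liveness analysis over a toy CFG. All of the types and names are invented for illustration; this is not the libAnalysis API.

```cpp
#include <cassert>
#include <set>
#include <string>
#include <vector>

// A basic block: which variables it reads before writing (use), which
// it writes (def), and its successor blocks in the CFG.
struct Block {
    std::set<std::string> use;
    std::set<std::string> def;
    std::vector<int> succs;
};

// Iterate backward to a fixed point:
//   live-out(B) = union of live-in(S) over successors S
//   live-in(B)  = use(B) + (live-out(B) - def(B))
std::vector<std::set<std::string>> liveIn(const std::vector<Block>& cfg) {
    std::vector<std::set<std::string>> in(cfg.size()), out(cfg.size());
    bool changed = true;
    while (changed) {
        changed = false;
        for (int b = (int)cfg.size() - 1; b >= 0; --b) {
            std::set<std::string> newOut;
            for (int s : cfg[b].succs)
                newOut.insert(in[s].begin(), in[s].end());
            std::set<std::string> newIn = cfg[b].use;
            for (const std::string& v : newOut)
                if (!cfg[b].def.count(v))
                    newIn.insert(v);
            if (newIn != in[b] || newOut != out[b]) {
                in[b] = newIn;
                out[b] = newOut;
                changed = true;
            }
        }
    }
    return in;
}
```

Note that the merge at a confluence point is a plain set union, which is exactly where the path-specific information mentioned above gets lost.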

PATH-SENSITIVE ENGINE

Engine [b] is a “path-sensitive” dataflow analysis engine which essentially is a symbolic simulator. This engine reasons about path-specific bugs such as null-dereferences, memory leaks, etc. It’s basically the core technology of what we generally equate with an “advanced” static analysis tool.

At a high level, [b] reasons about the “reachable states” of a program by exploring an abstract state-space graph (represented as an “exploded graph”; ExplodedGraph.[h,cpp]). A “bug” is essentially a path that reaches a state that is considered to be “bad” (e.g., a null pointer was dereferenced). States are simply collections of symbolic values representing an abstraction of a program’s state at a particular location in the program.

Operationally, the analyzer essentially interprets each expression in a program’s source code and reasons about the “state transitions” by considering the effect of an expression on a given state. From dataflow theory, the theoretical parlance for this operation is called a “transfer function”, but from an engineering standpoint it is essentially a visitor function that takes as input a state and an expression and returns a new state.
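As a heavily simplified illustration of that idea, a transfer function can be sketched as a function from a state and a statement to a new state. Every name here is invented; Clang's actual C++ interfaces differ.

```cpp
#include <cassert>
#include <map>
#include <string>

// An abstract state: what we know about each pointer variable.
enum class Val { Unknown, Null, NonNull };
using State = std::map<std::string, Val>;

// A toy statement language, just enough to show the shape of the idea.
struct Stmt {
    enum Kind { AssignNull, AssignNew, Deref } kind;
    std::string var;
};

// The "transfer function": given a pre-state and a statement, produce
// the post-state. 'ok' is cleared when a bug (null dereference) is found.
State transfer(State s, const Stmt& st, bool& ok) {
    ok = true;
    switch (st.kind) {
    case Stmt::AssignNull: s[st.var] = Val::Null;    break;
    case Stmt::AssignNew:  s[st.var] = Val::NonNull; break;
    case Stmt::Deref:
        if (s.count(st.var) && s[st.var] == Val::Null)
            ok = false;  // dereference of a known-null pointer
        break;
    }
    return s;
}
```

A real engine would apply such a function along every path of the exploded graph, with the checker-specific pieces layered on top of the basic symbolic interpretation.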

The analyzer design is inspired and similar (but in many ways quite different) to the algorithms and ideas in several well-known papers:

Precise interprocedural dataflow analysis via graph reachability

http://doi.acm.org/10.1145/199448.199462

Checking system rules using system-specific, programmer-written compiler extensions
http://portal.acm.org/citation.cfm?id=1251229.1251230

Symbolic path simulation in path-sensitive dataflow analysis
http://doi.acm.org/10.1145/1108792.1108808

(and several others)

From an engineering perspective, the trick is making the transfer function modular in construction, so that some aspects handle basic symbolic interpretation such as the manipulation of basic values (i.e., determining the result of “i + 10”) while other aspects are “checker specific” and reason about things such as the state of a lock, a file handle, whether or not a piece of data is considered “unsafe”, and so forth. The modular design of the transfer function interface is one important aspect of the analyzer that needs to be matured, although what’s there isn’t too bad of a starting point. This component is so important because it represents the core axis of flexibility for writing a diversity of checks, and thus maturing it is necessary in order to think about how to write checks using different high-level interfaces.

Other important components of [b] include:

(1) ConstraintManager - the module that reasons about constraints on symbolic values
(2) StoreManager - the module that reasons about the bindings of variables to values. This is essentially the symbolic representation of the heap/stack.

The current implementation of ConstraintManager is “BasicConstraintManager”, which tracks only basic equality and inequality relationships (i.e., “a != 10”, “b == 4”). This is suitable for most predicates in system code (and for catching things like null dereferences) but not all. One goal is to implement another ConstraintManager that does basic range analysis (i.e., track “a > b”). This would be especially useful for buffer overflow detection.
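The kind of reasoning a BasicConstraintManager-style module performs can be sketched as follows. This is an invented, minimal illustration of the concept, not the actual ConstraintManager interface.

```cpp
#include <cassert>
#include <map>
#include <set>
#include <string>

// Records equality/inequality facts about symbols and answers whether a
// new assumption is consistent with them (i.e., whether a path through
// a branch is feasible).
class SimpleConstraints {
    std::map<std::string, long> eq;            // sym == constant
    std::map<std::string, std::set<long>> ne;  // sym != these constants
public:
    // Try to assume sym == v; returns false if the path is infeasible.
    bool assumeEq(const std::string& sym, long v) {
        if (eq.count(sym))
            return eq[sym] == v;
        if (ne.count(sym) && ne[sym].count(v))
            return false;
        eq[sym] = v;
        return true;
    }
    // Try to assume sym != v.
    bool assumeNe(const std::string& sym, long v) {
        if (eq.count(sym))
            return eq[sym] != v;
        ne[sym].insert(v);
        return true;
    }
};
```

A range-tracking variant would additionally record lower/upper bounds per symbol, which is what makes relational facts like “a > b” (and hence buffer-bound reasoning) expressible.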

For StoreManager, we are actively using BasicStoreManager, which just tracks bindings for variables on the stack. Zhongxing Xu has put a fair amount of work into developing a new StoreManager called “RegionStoreManager” that will extend the analyzer’s abilities to reason about the values of fields, arrays, etc. We certainly invite anyone who is interested in helping complete this feature to get involved.

Probably the other biggest point worth mentioning is that the path-sensitive engine is currently restricted to function-at-a-time analysis. Doing inter-procedural analysis is something that is definitely planned, but requires different levels of infrastructure depending on the scope of the analysis. In the most general case we need to be able to perform whole-program analysis across source files. This requires being able to build an entire image of the program (including information about linkage so we know the exact corresponding definitions for called functions). There are a variety of ways to tackle this task, and this is a major piece of infrastructure that would certainly benefit from anyone wishing to get involved.

  1. The ability to add custom logic to the analyzer is quite desirable.
    Perhaps this could be made easier by integrating with a scripting
    language like Python. To me, even the ability to write a script to
    pattern match against the AST or other intermediate forms could be
    quite useful.

This is a great long term goal. My feeling is that the core analyzer logic needs to converge a little further first so that checks can be readily written using the C++ API. Afterwards using a higher-level language like Python or Javascript seems like an excellent project.

  1. Simply managing the volume of warnings can be difficult. I would
    like to integrate some method of tracking warnings from build to build
    to see what’s new and perhaps to be able to prioritize what should be
    investigated first. This would probably be separate from the
    analyzer, but a useful front end will help the tool get adopted more
    readily.

“Issue tracking” would be useful for a wide range of clients. To do this well, however, I don’t think it can be done completely separately from the analyzer. Issue tracking requires rich diagnostics that describe a bug but also must be insensitive to code churn that has no bearing on a particular bug or false positive. In some cases this may require issuing queries to the analyzer to extract more information. As a first cut, however, the output could be based entirely on the diagnostics.

As an interesting side note, the developers of Adium (http://www.adiumx.com) have actually been regularly scanning their sources using Clang's static analyzer. As part of their process, they wrote some scripts that inspect the emitted diagnostics (in this case HTML files) and did some munging to determine if a report emitted on a given scan was the same as in a prior scan. Their goal is to avoid re-inspecting previously diagnosed bugs and false positives. Having a general mechanism in the analyzer library would be quite useful, and would be a good first order project for anyone who is interested in getting involved.
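A minimal sketch of what such run-to-run matching could look like: fingerprint each report by fields that are stable across unrelated code churn (file, function, check, message) and ignore churn-sensitive ones like the line number. The structure below is invented for illustration and is far simpler than what a robust tracker would need.

```cpp
#include <cassert>
#include <functional>
#include <set>
#include <string>

// A bug report reduced to the fields we fingerprint on. The line number
// is deliberately excluded: it churns whenever unrelated code moves.
struct Report {
    std::string file, function, check, message;
    int line;
};

std::size_t fingerprint(const Report& r) {
    return std::hash<std::string>()(r.file + "|" + r.function + "|" +
                                    r.check + "|" + r.message);
}

// A report is "new" if its fingerprint was absent from the prior scan.
bool isNew(const Report& r, const std::set<std::size_t>& previousScan) {
    return previousScan.count(fingerprint(r)) == 0;
}
```

Richer matching (e.g., when a message itself changes slightly) is exactly where queries back into the analyzer would help.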

  1. Annotations can be helpful to guide an analyzer. How difficult
    would it be to extend the parser to accept a simple annotation syntax?

C and its derivatives are complex languages, so it really depends on the annotation syntax. For example, it is really easy to add GCC-style attributes to Clang (whose syntax could then be wrapped by macros):

http://gcc.gnu.org/onlinedocs/gcc/Attribute-Syntax.html

The analyzer already exploits a variety of attributes such as “nonnull”, “unused”, “IBOutlet” (the latter being added for an Objective-C check the analyzer performs), and new ones could readily be added as well. It all depends on what you want to express.
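For instance, a function can carry the (real) GCC/Clang “nonnull” attribute, wrapped in a macro so that other compilers still accept the code. The macro name here is invented; the attribute is not.

```cpp
#include <cassert>
#include <cstring>

// NONNULL is an invented macro name; the underlying 'nonnull' attribute
// is a real GCC/Clang attribute that an analyzer can exploit.
#if defined(__GNUC__)
#define NONNULL(...) __attribute__((nonnull(__VA_ARGS__)))
#else
#define NONNULL(...)
#endif

// Declares that parameters 1 and 2 must never be null; an analyzer can
// then flag call sites that may pass a null pointer.
char* copyString(char* dst, const char* src) NONNULL(1, 2);

char* copyString(char* dst, const char* src) {
    return std::strcpy(dst, src);
}
```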

I am open to collaborating on this project if anyone is available.

That would be wonderful. My suggestion is to start with a simple set of annotations for which the corresponding checks are not too complicated. The design can then iterate from there. There has been a lot of work on annotation systems, and probably the most traction is to start with annotations that would have the widest practical benefit (i.e., low cost of adoption, easy implementation, etc.).

Also, if you would like to learn more about this project or submit
your own proposal, feel free to contact “Ben Laurie”
<benl@google.com>.

If you wish to discuss more aspects of the analyzer I highly recommend you subscribe (if you haven’t already) to cfe-dev. For those who wish to get directly involved in development, we discuss patches on cfe-commits.

I mentioned a fair number of points in this email regarding directions for the static analyzer. Two particularly important action points (which are fairly accessible to anyone wishing to get involved) are:

  1. improving the web app to manage false positives
  2. run-to-run tracking of analysis results

Both points go right to the issue of user workflow and the overall usability of the analyzer, and don’t require (at least initially) having an intimate knowledge of the analyzer’s internals or its algorithms.

If you have any specific questions, please do not hesitate to ask.

Best,
Ted

Hi Monty, Ben,

Ted has given a detailed overview and long-term plan for the static analyzer of clang. I’ll list some short-term goals of mine.

  1. Consolidate the current implementation by running the analyzer on various code. There is a script called scan-build that can help integrate clang into the build process of target projects.

  2. I am also writing a libc wrapper that enables us to capture the build process better. Basically we could wrap various exec() system calls and record the whole build process. Then we could replay it in various ways.

  3. Finish AST serialization. This is the first step toward the whole program analysis.

Also, there are some aspects of the analyzer that can be worked on independently.

  1. Develop a new ConstraintManager. This could be based on basic range analysis, a SAT solver, or an SMT solver.

  2. Add specific checkers to the engine.

-Zhongxing

From: Ted Kremenek <kremenek@apple.com>
Date: January 16, 2009 3:42:34 PM PST
To: Monty Zukowski <monty@codetransform.com>
Cc: Ben Laurie <benl@google.com>, cfe-dev@cs.uiuc.edu
Subject: Re: [cfe-dev] Static analysis tool development
Reply-To: kremenek@apple.com

...

Another high-level goal of the analyzer is to support the relaying of rich diagnostics to end-users about how a bug manifests in their program. The diagnostic reporting mechanism in the analyzer also uses a set of abstract interfaces so that bug reports can be rendered in a variety of ways (e.g., to the console, to an HTML page, within an IDE, etc.). Providing rich diagnostics is an important goal because without them the results of a static analysis algorithm are only useful to graduate students studying program analysis techniques rather than programmers who want to fix bugs.

...

"Issue tracking" would be useful for a wide range of clients. To do this well, however, I don't think it can be done completely separately from the analyzer. Issue tracking requires rich diagnostics that describe a bug but also must be insensitive to code churn that has no bearing on a particular bug or false positive. In some cases this may require issuing queries to the analyzer to extract more information. As a first cut, however, the output could be based entirely on the diagnostics.

As an interesting side note, the developers of Adium (http://www.adiumx.com) have actually been regularly scanning their sources using Clang's static analyzer. As part of their process, they wrote some scripts that inspect the emitted diagnostics (in this case HTML files) and did some munging to determine if a report emitted on a given scan was the same as in a prior scan. Their goal is to avoid re-inspecting previously diagnosed bugs and false positives. Having a general mechanism in the analyzer library would be quite useful, and would be a good first order project for anyone who is interested in getting involved.

I thought I'd chime in here; I wrote the scripts we currently use in Adium for this purpose. Sources (which do not work with the latest version of Clang, to my knowledge) are at: http://trac.adiumx.com/browser/trunk/Utilities/Analyzer

In response to the above paragraphs of Ted's e-mail, I thought I'd announce here that I've been working on a Trac plugin for viewing Clang reports. Recent versions of the analyzer will output to plist files, or HTML files, but not both (though this would presumably be a trivial change). The Trac plugin parses plist files to generate a file view, which is annotated with the appropriate diagnostics.

Using a dynamic view (i.e., the one generated by Trac) to view the reports has some benefits over the static files. For one, we can strip parts of the code that are irrelevant to the report, instead showing only a few context lines above and below each diagnostic. Another thing we can do is add links to other pages, such as a report generated on a previous revision of the file.

More important, however, is the possibility of automatically filing tickets based on new bugs. The system can track a single bug longitudinally across several commits, using simple heuristics to decide when a report has been resolved, or merely changed. (This would integrate well with a system like Buildbot for performing scan builds and uploading new plists.)

Is there any interest in a plugin like this? I'd be glad to share it on a repository if anyone would like to help flesh out the idea.

I'm glad to hear all the details of your analyzer. I'll have more
questions later, I'm sure, but for now this is what interests me most:

Another high-level goal of the analyzer is to support the relaying of rich
diagnostics to end-users about how a bug manifests in their program. The
diagnostic reporting mechanism in the analyzer also uses a set of abstract
interfaces so that bug reports can be rendered in a variety of ways (e.g.,
to the console, to an HTML page, within an IDE, etc.). Providing rich
diagnostics is an important goal because without them the results of a
static analysis algorithm are only useful to graduate students studying
program analysis techniques rather than programmers who want to fix bugs.

As you mentioned later, issue tracking is very important and the
analyzer can be designed to help with that. It seems to me that that
could be the best use of Google's money to get this tool into its most
useful state. I can see you've put a lot of thought into the other
analysis which can be added to the tool later. I'm not an expert in
that area so I'll probably leave that area untouched.

In any event, you've described a tool which seems to have been
designed to be both extensible and useful and for that I'm very
excited. It seems like such an obvious need, doesn't it?

Monty

P.S. I'll be offline on holiday and probably won't answer any other
emails until Monday night or Tuesday.

I'm glad to hear all the details of your analyzer. I'll have more
questions later, I'm sure, but for now this is what interests me most:

Another high-level goal of the analyzer is to support the relaying of rich
diagnostics to end-users about how a bug manifests in their program. The
diagnostic reporting mechanism in the analyzer also uses a set of abstract
interfaces so that bug reports can be rendered in a variety of ways (e.g.,
to the console, to an HTML page, within an IDE, etc.). Providing rich
diagnostics is an important goal because without them the results of a
static analysis algorithm are only useful to graduate students studying
program analysis techniques rather than programmers who want to fix bugs.

As you mentioned later, issue tracking is very important and the
analyzer can be designed to help with that.

I do wonder if suppression of false positives is better done by
annotation than by tracking...

a) The annotation can be reused by other analysers.

b) The annotation works for developers who start from scratch.

I thought I could comment on that a little. As Ted said, it's possible to extend the analyzer with new sets of checks without large modifications. At our company we have combined this possibility with manual code reviews - when a bug is found during a code review, we try to implement a static analyzer check which would a) automatically check the rest of the code for the same problem and b) prevent this problem in the future. I have been able to implement some basic checks fairly easily into clang itself, without an extensive compiler or C++ background, by using the AnalysisManager API. The results have been very positive.

If/when the clang static analyzer allows easy extensibility as Ted described in option [3], it would be very interesting to see if the clang user community could come up with some collaborative way of sharing various custom checks as pluggable & configurable components. A wiki, perhaps? I think that sharing programming experience and knowledge as clang analyzer checks for common programming errors and best practices would be useful for the open source community.

What would be wrong with just integrating them into the source and
having them individually enableable?

There's nothing wrong with integrating additional checks into clang, of course. However, as a Mac developer and clang user I would like to be able to download and install additional checks simply by dropping the downloaded binary into some directory, without recompiling clang each time I'd like to try a new check which some other developer has written. I also suspect that some checks could exist which aren't completely in line with clang's goals, e.g. which generate too many false positives for the average project, but which would be beneficial in projects of a specific type. For example, I have written some specific coding convention checks which have far too high a rate of false positives to be included in the official clang, but which are useful for developers who follow the same conventions.

What's wrong with:

   #include <http://llvm.org/static_analysis/effective_cxx>

? :-)

I agree.

As clang is backed by Apple, it contains built-in checks for specific Apple libraries like Cocoa/CoreFoundation.
But other open source libraries may also want to be able to write library-specific checks.

This feature would be very useful for distributing a package that contains a library along with a clang plugin that checks good practices for that library.
For example, it's probably possible to write checks for glib or Qt memory management. Being able to distribute them as a plugin with the library would be great.

I am totally a fan of modular plugin designs (I wrote the Apache 2
plugin management system, for example), so very much agreed.

How well does the current architecture support dynamic plugins?
Perhaps support for that is somewhere I could be instantly useful! :-)

Does that work? If so, awesome!

If not, well, it should!

LLVM supports dynamically loadable optimizations, code generators, etc. It would not be hard to dlopen("mycoolanalysis.so").

However, I think there are two separate things being discussed here: 1) checks written in C++, and 2) checks written in a higher-level (e.g. scripting) language. Right now, we have none of #2, but we definitely will in the future. I think most checks will end up being built off a shared analysis core written in C++, but whose high-level logic is implemented in a scripting language. We're just not quite there yet, and getting the design of the core solid and fully featured is more important than getting the bindings going (to the current contributors).

-Chris

As an interesting side note, the developers of Adium (http://www.adiumx.com) have actually been regularly scanning their sources using Clang’s static analyzer. As part of their process, they wrote some scripts that inspect the emitted diagnostics (in this case HTML files) and did some munging to determine if a report emitted on a given scan was the same as in a prior scan. Their goal is to avoid re-inspecting previously diagnosed bugs and false positives. Having a general mechanism in the analyzer library would be quite useful, and would be a good first order project for anyone who is interested in getting involved.

I thought I’d chime in here; I wrote the scripts we currently use in Adium for this purpose. Sources (which do not work with the latest version of Clang, to my knowledge) are at: http://trac.adiumx.com/browser/trunk/Utilities/Analyzer

In response to the above paragraphs of Ted’s e-mail, I thought I’d announce here that I’ve been working on a Trac plugin for viewing Clang reports. Recent versions of the analyzer will output to plist files, or HTML files, but not both (though this would presumably be a trivial change). The Trac plugin parses plist files to generate a file view, which is annotated with the appropriate diagnostics.

That’s awesome!

Using a dynamic view (i.e., the one generated by Trac) to view the reports has some benefits over the static files. For one, we can strip parts of the code that are irrelevant to the report, instead showing only a few context lines above and below each diagnostic. Another thing we can do is add links to other pages, such as a report generated on a previous revision of the file.

I honestly would like to go to a completely dynamic view model. This was the reason for introducing ‘scan-view’ to view error reports so that we could eventually move to this model completely. The idea would be (more or less) to generate all error reports as plist files and then use clang on the side to generate syntax-highlighted HTML. This would be far more scalable; it would take less disk space (syntax-highlighted HTML files can get big) and it allows a diversity of rendering of the reports. Of course we still want to support the model where we can get self-contained report files that users can attach to emails, etc., but a dynamic tracking system inherently allows us to do more things: track false positives and bugs across runs, measure interesting statistics, etc.

I personally think a Trac plugin is a great idea. I would like to see some general infrastructure for building different dynamic interfaces (e.g., the Trac plugin, scan-view) to facilitate the construction of different interfaces that tailor to different tastes/workflows/development processes.

More important, however, is the possibility of automatically filing tickets based on new bugs. The system can track a single bug longitudinally across several commits, using simple heuristics to decide when a report has been resolved, or merely changed. (This would integrate well with a system like Buildbot for performing scan builds and uploading new plists.)

Is there any interest in a plugin like this? I’d be glad to share it on a repository if anyone would like to help flesh out the idea.

I personally wouldn’t have much time to work on a Trac plugin, but I would love to see open development on this feature. We can also link to it from the Clang web page. Per my earlier statement, I would like to see some of the logic for tracking issues across runs put into a common library that could be used by multiple interfaces. Some of this logic could be specific to a particular interface (e.g., Trac) while the rest of it would be more generic and reusable.

I'm glad to hear all the details of your analyzer. I'll have more
questions later, I'm sure, but for now this is what interests me most:

Another high-level goal of the analyzer is to support the relaying of rich
diagnostics to end-users about how a bug manifests in their program. The
diagnostic reporting mechanism in the analyzer also uses a set of abstract
interfaces so that bug reports can be rendered in a variety of ways (e.g.,
to the console, to an HTML page, within an IDE, etc.). Providing rich
diagnostics is an important goal because without them the results of a
static analysis algorithm are only useful to graduate students studying
program analysis techniques rather than programmers who want to fix bugs.

As you mentioned later, issue tracking is very important and the
analyzer can be designed to help with that. It seems to me that that
could be the best use of Google's money to get this tool into its most
useful state.

I think having good issue tracking and improving the general infrastructure for the analyzer would expand its usefulness to more users. Improving the UI and workflow would greatly improve its usability to more developers.

One issue I didn't mention is the need for a better way of "intercepting the build" so that the analyzer scans every file that the compiler does (and with the same flags, include paths, etc.). Currently 'scan-build' just overrides CC to be a "fake compiler" that forwards its arguments onto gcc and clang. This solution doesn't work in many cases and could be greatly improved.

I can see you've put a lot of thought into the other
analysis which can be added to the tool later. I'm not an expert in
that area so I'll probably leave that area untouched.

In any event, you've described a tool which seems to have been
designed to be both extensible and useful and for that I'm very
excited. It seems like such an obvious need, doesn't it?

I certainly think so. ;-)

I personally like annotations. They represent “actionable documentation” that can be leveraged by multiple tools.

That said, in my own conversations with different developers I have seen cases where annotations directly in the source code are completely taboo. Probably the best example is source code that is under a tight “change control” process where only critical changes can be made to the source code. Being able to “annotate” code, be it for issue tracking or enhancing a tool’s knowledge of the code, without actually modifying the code is really useful in such circumstances. This allows code to be actively scanned and “annotated” without source modifications. Alternatively, one may wish to back-scan previous versions of a code base to determine when certain issues appeared. Earlier versions of a code base would certainly lack any annotations embedded in the code itself.

That said, source-level annotations are the easiest to support (from an engineering perspective) and probably the most accurate form of documentation.

This would be really useful, especially for people wishing to write both custom checks for their own APIs/software or wishing to contribute back to the general community.