[Patch] Update to Checker Development Manual

Hello all,

Several months ago, I posted a tutorial about developing a plugin for the Analyzer. After some discussion, it was decided that parts of the tutorial should be integrated into the existing Checker Development Manual.

I’ve finally had time to complete the majority of this work. Attached is a patch that contains the changes to date. This adds the following information:

  • Custom Program States
  • Checker Skeleton
  • Bug Reports
  • Additional sources of information

I also cleaned up the table of contents a bit, and fixed a incorrect tag.

There are still some other parts that I am looking at adding, most notably a description of how the Analyzer processes through a function, building the program state (I had an example of this in the original tutorial, but I now think it will need to be completely rewritten). Since I’m not sure how long these additional sections will take, I want to get the ball rolling on getting these changes reviewed and merged.

–Sam

checker_dev_manual.diff (13.3 KB)

+cc Anna, Jordan

Has someone had a chance to review this yet?

–Sam

Whoops, sorry. Short answer: no.

Slightly longer answer: I glanced through it and it’s actually looking quite good, but I’d/we’d like a chance to look at it more carefully before landing it…which probably won’t happen until next week. Sorry!

Jordan

Hello all,

Several months ago, I posted a tutorial about developing a plugin for the Analyzer. After some discussion, it was decided that parts of the tutorial should be integrated into the existing Checker Development Manual.

I’ve finally had time to complete the majority of this work. Attached is a patch that contains the changes to date. This adds the following information:

  • Custom Program States
  • Checker Skeleton
  • Bug Reports
  • Additional sources of information

I also cleaned up the table of contents a bit, and fixed a incorrect tag.

There are still some other parts that I am looking at adding, most notably a description of how the Analyzer processes through a function, building the program state (I had an example of this in the original tutorial, but I now think it will need to be completely rewritten). Since I’m not sure how long these additional sections will take, I want to get the ball rolling on getting these changes reviewed and merged.

–Sam

All opinions expressed in posts from this account are entirely my own, and not
those of any present or former employer. Furthermore, I assert that all works
contributed to the Clang project (1) were developed using no equipment,
supplies, facility or trade secrets of any such employer, (2) were developed
entirely on my own time, and (3) do not result from any work performed for any
such employer.

+cc Anna, Jordan

Has someone had a chance to review this yet?

Hi Sam,

Thanks for preparing the patch!

I’ve started reviewing this.

Anna.

Sam, thank you so much for working on this!

Here are some comments. However, this generally looks good.

Anna.

---- Custom Program states
+


+Checkers often need to extend the program state information to include their own
+custom information. The preferred way to do so is to use one of several macros
+designed for this purpose. They are:

Anna:

+<p>These macros will cover a majority of use cases; however, they still
have a
+few limitations. They cannot be used inside namespaces (since they expand
to
+contain top-level namespace references), and the data types that they
define
+cannot be referenced from more than one file.

I would just say "Note that the macros cannot be used inside namespaces
(since they expand to contain top-level namespace references)." The macros
can be referred to from multiple files if they are defined in a header that
is included from all of them. It's just that each checker is usually
contained in a single file.

The comment above REGISTER_TRAIT_WITH_PROGRAMSTATE says that should not be
used "for traits that must be accessible from more than one translation
unit." I assume this is the case because the GDMIndex function that is
generated is returning the address of a static variable, which would be
different in different translation units.

Also, addTransition always has arguments.

I've changed it to suggest CheckerContext::getState instead of
addTransition in cases where there are no state updates and the updated
state (as returned by addTransition) in cases where there are state updates.

----
I think the current flow does not make much sense anymore. For example,
"Idea for a Checker" should go before the sections you've added. I think
the following order would be best:

Getting Started
Static Analyzer Overview
  Interaction with Checkers
Idea for a Checker
Checker Registration
Checker Skeleton
Custom Program States
Representing Values // With added introduction saying that we are going to
store info about the values we are observing into the state..
Bug Reports
..

I was also wondering about the order, but reordering the document seemed a
bit too ambitious for a first step. The order I was thinking of was:

- Getting Started
- Static Analyzer overview // Without the details of how states are
represented
- Idea for a checker // Also merge the current "AST Visitors" section
into this one.
- Checker Registration
- Checker Skeleton
- Program State Example // Not yet written. An example showing how the
ExplodedGraph is constructed for a simple function, including both value
constraints and custom program state, without describing the specific data
structures used.
- Program State Details // Builds on the previous section; describe the
specifics of ExplodedGraph/ExplodedNode, components of ProgramState.
- Representing Values
- Custom Program States
- Bug Reporting
- Testing
- Hints

All of your other comments should be addressed; updated patch attached.

Thanks once again for reviewing this.

--Sam

checker_dev_manual.diff (16.7 KB)

I see. My bad!

That does not seem right. A node consists of a ProgramState AND a ProgramPoint.

However I’ve realized that, here, my previous review comment is not correct. You CAN call addTransition with no arguments:
ExplodedNode *addTransition(ProgramStateRef State = 0,
const ProgramPointTag *Tag = 0) {
return addTransitionImpl(State ? State : getState(), false, 0, Tag);
}

In that case, the current ProgramState will be used. However, a new node will be created for the bug report. The node will be different from the previous node - it will be tagged to identify that it has been produced by the current checker. Sorry for the confusion!

Let’s leave “AST Visitors” in the separate section. They are very different from the regular checkers. Someone learning about path sensitive checkers does not need to know about how to write an AST Visitor.

Maybe rename into Checker Callbacks? That’s what the new content seems to be.

This roadmap looks good. I think it’s better to start restructuring as we are adding new content. This goes live as soon as you commit it, so we want the flow to ALWAYS make sense.

Here are a couple of comments about the new patch. Not sure if Jordan wants to review this pre-commit as well.

+

All of these macros take as parameters the name to be used for the custom
“All of these macros take two parameters: the name to be used for the custom”

+category of state information, and the data type(s) to be used for storage. The
No comma before “and”

  • Create a new checker implementation file, for example ./lib/StaticAnalyzer/Checkers/SimpleStreamChecker.cpp.
  • Include the following code (replacing SimpleStreamChecker with the name of the new checker).

This is a bit confusing. We probably do not want people to create “SimpleStreamChecker.cpp” as such a file already exists. Maybe we should just change the name to something else throughout the manual. For example, we could use MySimpleStreamChecker. In that case, we would also not need to tell the reader to change the name, as in bullet #2 above. What do you think?
We could also mention that the sample code used in the manual is based on the SimpleStreamChecker from the presentation.

  • There are two main decisions you need to make:
    • Which events the checker should be tracking.
    • See CheckerDocumentation
    • for the list of available checker callbacks.
    • What data you want to store as part of the checker-specific program
    • state. Try to minimize the checker state as much as possible.

I think it’s very important to give people this high level roadmap, before diving into the example. Maybe this could be moved into a separate section in the very beginning. You can then link the first bullet to Checker Skeleton and the second one to Custom Program States. Maybe we could move this into the Idea for a Checker?

  • it to the Analyzer so that it can be displayed. This uses two classes:
    “This uses two classes” → “The following two classes are used to construct a bug report:”

+the bug in the code and the program’s state when this location is reached. This
+is in the form of a node in the ExplodedGraph.

"the bug in the code " → "the bug in the source code "

“and the program’s state when this location is reached. This is in the form of a node” → “and the ExplodedNode which encompasses the state of the program at the time the bug is encountered”

(Note the reason for this change is that ProgramState is a part of ExplodedNode.)

+

The node representing this context can be obtained one of two ways:

“The node” → “The ExplodedNode, which is a node in the ExplodedGraph”

+

  • If the state has not been updated, it can be obtained by calling the <a
    +href=“http://clang.llvm.org/doxygen/classclang_1_1ento_1_1CheckerContext.html#a81bd66f80b18117a9a64a8d0daa62825”>CheckerContext::getState
    +function.
    This is wrong. See the comment above.

    +would prevent the program could continuing. For example, leaking of a resource
    The wording is off (“the program could continuing”).

    +

    Additional Sources of Information


    Note that I’ve just updated the first paragraph of the page to include some of these links. I think it’s OK to keep both, especially if we were to extend the section you’ve added.

  • Also, addTransition always has arguments.

    I've changed it to suggest CheckerContext::getState instead of
    addTransition in cases where there are no state updates and the updated
    state (as returned by addTransition) in cases where there are state updates.

    That does not seem right. A node consists of a ProgramState AND a
    ProgramPoint.

    However I've realized that, here, my previous review comment is not
    correct. You CAN call addTransition with no arguments:
      ExplodedNode *addTransition(ProgramStateRef State = 0,
                                  const ProgramPointTag *Tag = 0) {
        return addTransitionImpl(State ? State : getState(), false, 0, Tag);
      }

    In that case, the current ProgramState will be used. However, a new node
    will be created for the bug report. The node will be different from the
    previous node - it will be tagged to identify that it has been produced by
    the current checker. Sorry for the confusion!

    A quick survey of the existing checkers indicates that, in cases where a
    sink node or state transition is not being generated, the most popular way
    to get an ExplodedNode to pass to BugReport is addTransition with no
    arguments. I've changed my text to recommend this.

    - Getting Started
    - Static Analyzer overview // Without the details of how states are
    represented
    - Idea for a checker // Also merge the current "AST Visitors" section
    into this one.

    Let's leave "AST Visitors" in the separate section. They are very
    different from the regular checkers. Someone learning about path sensitive
    checkers does not need to know about how to write an AST Visitor.

    Ah, I misinterpreted that section. I read "AST Visitor" as Clang's
    ASTVisitor (which could be used to implement "checks" that do not need path
    analysis), not as a separate item specific to the analyzer. I now agree
    that that section should be left in place (and, of course, should be
    expanded at some point).

      - Checker Registration
    - Checker Skeleton

    Maybe rename into Checker Callbacks? That's what the new content seems to
    be.

    Changed it to "Events, Callbacks, and Checker Class Structure". All three
    are discussed, and are closely related to each other.

    +<p>All of these macros take as parameters the name to be used for the

    custom
    "All of these macros take two parameters: the name to be used for the
    custom"

    REGISTER_MAP_WITH_PROGRAMSTATE takes three parameters, hence the "type(s)".

    + <li>Create a new checker implementation file, for example
    <tt>./lib/StaticAnalyzer/Checkers/SimpleStreamChecker.cpp</tt>.
    + <li>Include the following code (replacing SimpleStreamChecker with the
    name of the new checker).
    This is a bit confusing. We probably do not want people to create
    "SimpleStreamChecker.cpp" as such a file already exists. Maybe we should
    just change the name to something else throughout the manual. For example,
    we could use MySimpleStreamChecker. In that case, we would also not need to
    tell the reader to change the name, as in bullet #2 above. What do you
    think?
    We could also mention that the sample code used in the manual is based on
    the SimpleStreamChecker from the presentation.

    I've changed the tone of this section from "this is what you do for a new
    checker" to "this is what was done for SimpleStreamChecker", which matches
    the tone of the other sections better. I don't think anyone will have
    trouble determining what needs to be changed for their own new checkers.
    However, I'm not sure how well this section reads after this change, please
    review and give your opinion.

    Also, would it be a good idea to have a section giving an overview of the
    SimpleStreamChecker and declaring that it will be the example used
    throughout the document? This would mean that we could simply say "for
    SimpleStreamChecker" instead of having phrases like "for
    SimpleStreamChecker, a checker that warns about improper use of file stream
    APIs" throughout the document.

    - There are two main decisions you need to make:
    - <ul>
    - <li> Which events the checker should be tracking.
    - See <a href="
    http://clang.llvm.org/doxygen/classento_1_1CheckerDocumentation.html
    ">CheckerDocumentation</a>
    - for the list of available checker callbacks.</li>
    - <li> What data you want to store as part of the checker-specific
    program
    - state. Try to minimize the checker state as much as possible. </li>
    - </ul>
    I think it's very important to give people this high level roadmap, before
    diving into the example. Maybe this could be moved into a separate section
    in the very beginning. You can then link the first bullet to Checker
    Skeleton and the second one to Custom Program States. Maybe we could move
    this into the Idea for a Checker?

    I've reinserted this material (with minor adjustment) in the "Idea for a
    Checker" section.

    +the bug in the code and the program's state when this location is
    reached. This
    +is in the form of a node in the <tt>ExplodedGraph</tt>.

    "the bug in the code " -> "the bug in the source code "
    "and the program's state when this location is reached. This is in the
    form of a node" -> "and the <tt>ExplodedNode</tt> which encompasses the
    state of the program at the time the bug is encountered"

    I've rewritten it to emphasize that both the program location and the
    program state are contained in the ExplodedNode.

    Updated and freshly rebased patch attached.

    --Sam

    checker_dev_manual.diff (18.5 KB)

    Sam,

    This looks very good! I have a couple of minor comments below. I would also like to have the CustomProgram States section move before the commit.

    — Custom Program States
    -The data type(s) specified will also become the parameter and/or return type

    +The data type(s) specified will also become the parameter types and/or return type
    It would be better to present either a map or a set in the example of register with program state macros as these are the most common (in particular, a map that has a SymbolRef key)
    Also, “Custom Program States” section should come after “Events Callbacks,…”.

    — Idea for a Checker
    We could briefly introduce the SimpleStreamChecker and what it does at the end of Idea for a Checker section. What do you think?

    — Events, Callbacks… section
    -For each event type requested, a corresponding callback function with the name check<event> must be defined
    +For each event type requested, a corresponding callback function must be defined
    Since some functions don’t start with check and “” does not follow the format…

    — Bug Reports section
    -This two classes used to construct this report are
    +The two classes used to construct this report are

    1. If the state has been updated, then the CheckerContext::addTransition function will return the ExplodedNode with the updated state.
    2. If the state has not been updated, the ExplodedNode can be obtained by calling the CheckerContext::addTransition function with no arguments.
      These two cases are more or less the same. I’d just first, describe the difference between sink and regular node(the paragraph below) and after that give the two options: addTransition and generateSink, which are essentially the same except for generating a transition to sink or a regular node. these would be my 1 & 2 option bullets. (The API names are not consistent :frowning: )

    Thanks!
    Anna.

    All of your other comments should be addressed. If you don’t see any other issues, please feel free to commit for me (as I don’t have commit access).

    –Sam

    checker_dev_manual.diff (19.1 KB)