[RFC] Tutorial for Clang Analyzer Plugins

All,

I have written a tutorial about how to create a plugin for the static analyzer. The target audience is developers who are reasonably familiar with using Clang but are unsure how they would go about extending Clang for custom analysis cases.

The tutorial describes a complete example of a plugin that checks for proper use of a trivial locking construct. This was the simplest example I could think of that requires the use of stored program state.

The tutorial covers in detail everything needed to write a plugin, including: the interface between Clang and the plugin, how to subclass Checker, defining and using custom program state data, reporting found bugs, and compilation. As such, it is relatively self-contained.

Any feedback would be greatly appreciated. I hope to get this integrated into the Clang documentation, but I am unsure about what the best way to do that would be; any suggestions about that would also be welcome.

AnalyzerPlugin.rst (29 KB)

This is fantastic! Thanks for contributing it :)

I'm CC'ing a couple reviewers who are familiar with the static
analyzer so that they can comment on the content of your tutorial.
After you have incorporated their review comments I would love to
commit this for you.

Also, in the section "Additional sources of information", you may want
to link to the slides and video of the dev meeting talk "Building a
Checker in 24 hours" at <http://llvm.org/devmtg/2012-11/> as a next
step for a person new to writing checkers.

-- Sean Silva

Hi Sam,

Thanks for writing this up!

We currently have the following as documentation for people writing their own checkers:

Your tutorial is somewhere in between. It provides in-depth info on some aspects of writing a checker as well as being a tutorial. For example, the sections on bug reporting and adding checker state would be a good addition to the manual. Registering a checker as a plugin could also be a section in the manual.

I am not 100% sure where your tutorial should go and how it complements what’s already there. (I am not sure if you were aware of the existence of the other two documents.) What do you think?

Cheers,
Anna.

I have seen both of those. I actually started work on the tutorial before the talk was given. I now see the tutorial as being similar to the talk, but a bit more thorough (by virtue of not being confined to the format of a presentation).

I really like the idea of having a motivating example that ties all the information together and “drives” the concepts that are explained. At the same time, such an example limits what can be included. For example, the tutorial contains no discussion of value evaluation (SVals, etc.), because it is not relevant to the example at hand. This limitation will always be true of a tutorial-like presentation.

I see three options:

  1. Put up the tutorial as is, and work on filling in the Checker Writer Manual (perhaps using some of the same content) as a separate effort.
  2. Move most of the more technical content to the Checker Writer Manual, and have the tutorial as a separate page, referencing it as appropriate. I think that this would be too confusing for readers (too much back-and-forth between the two documents).
  3. Integrate the two documents into one, with the tutorial (sans heavy technical details) at the end as an example of the concepts presented.

I think option 3 would likely be the best long-term option, but am curious to hear what others think. Is having a completely self-contained tutorial useful?

On a related note: are there plans to convert the analyzer documentation (and other items in www/) from HTML to RST format? I put the tutorial in RST format because I saw that that is the format that Clang documentation is moving towards. If such a conversion is imminent, I would probably want to wait until it is complete before attempting to merge the two documents.

> On a related note: are there plans to convert the analyzer documentation
> (and other items in www/) from HTML to RST format? I put the tutorial in RST
> format because I saw that that is the format that Clang documentation is
> moving towards. If such a conversion is imminent, I would probably want to
> wait until it is complete before attempting to merge the two documents.

There are some issues with moving the www/ analyzer stuff. See the
thread started by the commit email for r170260. Basically the analyzer
is considered a separate project although it lives inside the clang
tree. Arranging things to enable the analyzer to transition to rst
would require some work on the llvm.org server. I'm not exactly sure
what would need to change, but I'm not currently empowered to do
anything there so no forward progress is being made in that regard.

Until that is sorted out, I think that it would make sense to just put
this in clang/docs/ so that people can read it and learn from it.

> I have seen both of those. I actually started work on the tutorial before the talk was given. I now see the tutorial as being similar to the talk, but a bit more thorough (by virtue of not being confined to the format of a presentation).
>
> I really like the idea of having a motivating example that ties all the information together and “drives” the concepts that are explained. At the same time, such an example limits what can be included. For example, the tutorial contains no discussion of value evaluation (SVals, etc.), because it is not relevant to the example at hand. This limitation will always be true of a tutorial-like presentation.
>
> I see three options:
>
>   1. Put up the tutorial as is, and work on filling in the Checker Writer Manual (perhaps using some of the same content) as a separate effort.
>   2. Move most of the more technical content to the Checker Writer Manual, and have the tutorial as a separate page, referencing it as appropriate. I think that this would be too confusing for readers (too much back-and-forth between the two documents).
>   3. Integrate the two documents into one, with the tutorial (sans heavy technical details) at the end as an example of the concepts presented.

I think this option is the best. Your tutorial explains many concepts in detail, the kind of info you would expect to see in the manual. We should not repeat the same info in two places (tutorial and manual), so the best option is to combine them. The current manual does use code snippets to explain some APIs; it would be good if those came from a self-contained checker presented at the end.

> I think option 3 would likely be the best long-term option, but am curious to hear what others think. Is having a completely self-contained tutorial useful?

I view the presentation as a self-contained tutorial.

However, I am also curious to see what others think.

> On a related note: are there plans to convert the analyzer documentation (and other items in www/) from HTML to RST format?

Would be great to have the analyzer documentation converted, but I don’t know if anyone has committed to do the work.

> I put the tutorial in RST format because I saw that that is the format that Clang documentation is moving towards. If such a conversion is imminent, I would probably want to wait until it is complete before attempting to merge the two documents.

Thanks for working on this!
Anna.

In r171424 and r171425 I have set up a basic Sphinx setup for the
analyzer in docs/analyzer/ (segregated from the rest of clang's docs).
There is no server-side support though so it doesn't currently affect
the website.

-- Sean Silva

Hearing no other opinions…

I will assume that the best way forward is to combine my tutorial with the existing Checker Developer Manual.

I will look at producing this document in RST format. If the web site is ready to use RST for the analyzer docs by the time I finish, great. If not, it can be converted to an HTML page until such a transition is complete.

Hi ddunbar, in transitioning the analyzer to Sphinx, I'm going to need
some setup on the llvm.org server as with the other Sphinx
transitions. As background, clang/docs/analyzer is excluded from the
Sphinx build in clang/docs/, and has its own self-contained Sphinx
build.

So what is needed is to run `make` inside `clang/docs/analyzer/` (no
`-f Makefile.sphinx` since there was no conflicting Makefile in this
directory), then copy `clang/docs/analyzer/_build/html/` to appear at
<http://clang-analyzer.llvm.org/docs/>.

Thanks,

-- Sean Silva

Hi, Sean. Anna and I discussed this a bit and we decided that what’s in docs/analyzer/ really shouldn’t appear on the analyzer website at all. It’s internals documentation that is only relevant to people working on the static analyzer, and we don’t yet know how we want to expose that on the site.

What does seem interesting is transitioning the entire analyzer site over to Sphinx, for increasing consistency among the LLVM projects. It’s the pages people actually use that benefit most from transitioning to Sphinx, rather than random docs. Then there’s no reason to have a docs/ subdirectory there, either.

We were then wondering why the same isn’t happening for LLVM and Clang (and LLDB?). Transitioning the docs is great, but it’s very jarring to have half of each site in Sphinx and half in HTML…and all with different themes. I think we’d prefer that the analyzer site not transition to Sphinx at all unless we can do the entire site at once.

Ted may want to chime in here too.
Jordan

Hi Sean,

Thanks, I’ll start on this now.

- Daniel

Ok, done.

I just realized I didn’t read Jordan’s mail thoroughly and I converted one of the internal docs to Sphinx format, which was against his post. So I will weigh in on that now:

There is a distinction between the “docs” and the “website” for each project. For LLVM, the website isn’t even open source. For Clang and the Analyzer, it is in a separate directory from the docs. All the websites are currently plain HTML; almost all the code documentation is now Sphinx. It’s different, and it might be worth considering making the websites Sphinx too, but a priori they serve different goals, so having the website and the programming docs in different formats doesn’t seem terrible.

It makes total sense to me to have the internal documentation browsable on the websites somewhere, and this is what we do for LLVM and Clang and all the other projects, so I see no reason for the analyzer to be different. Note that it’s only “exposed” as much as there is a URL for it. You can decide if and when you want to link to it from the website.

- Daniel

> Ok, done.
>
> I just realized I didn’t read Jordan’s mail thoroughly and I converted one of the internal docs to Sphinx format, which was against his post. So I will weigh in on that now:
>
> There is a distinction between the “docs” and the “website” for each project. For LLVM, the website isn’t even open source. For Clang and the Analyzer, it is in a separate directory from the docs. All the websites are currently plain HTML; almost all the code documentation is now Sphinx. It’s different, and it might be worth considering making the websites Sphinx too, but a priori they serve different goals, so having the website and the programming docs in different formats doesn’t seem terrible.

We were just not clear on what the immediate goals of the Clang and LLVM conversion were, and whether the complete websites (including plain HTML) are being converted as well as everything in the docs folder. That seems best for consistency; however, it is not clear how much effort is required.

Currently, most of the analyzer documentation is in plain HTML, so unless there are immediate plans on transitioning HTML to Sphinx/RST, we would continue committing enhancements in HTML.

> It makes total sense to me to have the internal documentation browsable on the websites somewhere, and this is what we do for LLVM and Clang and all the other projects, so I see no reason for the analyzer to be different. Note that it’s only “exposed” as much as there is a URL for it. You can decide if and when you want to link to it from the website.

Makes sense.

> > Ok, done.
> >
> > I just realized I didn't read Jordan's mail thoroughly and I converted one
> > of the internal docs to Sphinx format, which was against his post. So I
> > will weigh in on that now:
> >
> > There is a distinction between the "docs" and the "website" for each
> > project. For LLVM, the website isn't even open source. For Clang and the
> > Analyzer, it is in a separate directory from the docs. All the websites are
> > currently plain HTML; almost all the code documentation is now Sphinx. It's
> > different, and it might be worth considering making the websites Sphinx
> > too, but a priori they serve different goals, so having the website and
> > the programming docs in different formats doesn't seem terrible.
>
> We were just not clear on what the immediate goals of the Clang and LLVM
> conversion were, and whether the complete websites (including plain HTML)
> are being converted as well as everything in the docs folder. That seems
> best for consistency; however, it is not clear how much effort is required.

I'm not sure how relevant consistency is here; I think of the website as
being the "nice, pretty, easy to navigate" thing. I think of the
documentation as being the "book or guide I would be comfortable printing,
and can also read online". Sphinx is nice for the latter. For the former,
the best thing is a good designer who knows HTML + CSS intimately.

In my mind, it makes total sense for the analyzer website to stay as it is
now (HTML), but all the pages under the user manual to be Sphinx based. My
predicate is "what parts do I want in the printed manual".

- Daniel

> Hi, Sean. Anna and I discussed this a bit and we decided that what's in
> docs/analyzer/ really shouldn't appear on the analyzer website at all. It's
> internals documentation that is only relevant to people working on the
> static analyzer, and we don't yet know how we want to expose that on the
> site.

Ah, ok, so clang-analyzer.llvm.org is primarily a user-facing site,
rather than a site for developers "working on the static analyzer"; I
wasn't quite grokking that. That makes sense then, although then it
seems like <http://clang-analyzer.llvm.org/checker_dev_manual.html> is
misplaced (or maybe I misunderstand what "working on the static
analyzer" means).

> What does seem interesting is transitioning the entire analyzer site over to
> Sphinx, for increasing consistency among the LLVM projects. It's the pages
> people actually use that benefit most from transitioning to Sphinx, rather
> than random docs. Then there's no reason to have a docs/ subdirectory there,
> either.

This is certainly a possibility. One of the primary advantages of
Sphinx is simply making it easier to write content, and it is much
more suitable for a "website hosted in an SVN repo" than HTML.
However, see below.

> We were then wondering why the same isn't happening for LLVM and Clang (and
> LLDB?). Transitioning the docs is great, but it's very jarring to have half
> of each site in Sphinx and half in HTML…and all with different themes. I
> think we'd prefer that the analyzer site not transition to Sphinx at all
> unless we can do the entire site at once.

The stylistic consistency can be achieved in a fairly straightforward
way with appropriate CSS. The bottom line though is that we don't have
anybody who is responsible for making everything look and feel
consistent, so naturally it doesn't happen. What we really need is for
one of the many large companies invested in LLVM to step up and get
one of their web developers/designers to put some effort into the site
(my impression is that this is exactly how our awesome dragon logo
came to be). Since you work for one such large company, maybe you
could try to get the ball rolling on that? Feel free to put them in
contact with me if that would help in any way.

I fully agree though, the current pages are an incoherent mess from a
web design point of view: that's what happens when compiler and
toolchain writers try to do web design!

-- Sean Silva

> For LLVM, the website isn't even open source.

Is it not in www/ in the SVN repo? Or is that something different from
what you're referring to?

Sphinx is also nice for making it easy for developers to write
documentation in the first place and to improve existing
documentation.

-- Sean Silva

> > For LLVM, the website isn't even open source.
>
> Is it not in www/ in the SVN repo? Or is that something different from
> what you're referring to?

Oh, you are right. For some reason I thought it wasn't...

This isn’t exactly true in Sphinx, because the default mode becomes “include” instead of “exclude”.

It probably does make sense to make the “Development” part of the analyzer site become equivalent to LLVM and Clang’s “docs”. This is where the current Checker Developer Manual lives (and thanks for working on that, Sam; I think I agree with Anna that we’d rather have one merged doc for now), and where some of the current internal stuff could go if it were cleaned up a bit more, but I think we’d want to help moderate that transition. (If you think we should move our text-file notes out of docs/ for now, we could do that too.)

Jordan

> It makes total sense to me to have the internal documentation browsable on the websites somewhere, and this is what we do for LLVM and Clang and all the other projects, so I see no reason for the analyzer to be different. Note that it’s only “exposed” as much as there is a URL for it. You can decide if and when you want to link to it from the website.
>
> Makes sense.
>
> This isn’t exactly true in Sphinx, because the default mode becomes “include” instead of “exclude”.
>
> It probably does make sense to make the “Development” part of the analyzer site become equivalent to LLVM and Clang’s “docs”.

I am not sure I agree that everything/anything under the “Development” tab should be in a different format than the rest of the site.