scan-build man page

The attached file is a complete man page in mdoc format for
scan-build.

I used the text produced by running scan-build without options as a
basis. I cleaned up the wording and tried to arrange things
in a somewhat predictable order while adhering to the standard man page
format.

--jkl

scan-build.1 (9.14 KB)

Hi James,

The man page looks really great. My main concern, however, is supporting divergent documents. Ideally we want the documentation for scan-build, the man page, etc., to all be in sync. My main concern about having a separate man page file is that someone is now responsible for keeping it up-to-date. It's a bit of engineering, but I'd prefer we go in a direction where the man page and the scan-build documentation on the website (or at least part of it) were machine generated from some common description, which includes the common prose and the checker options.

It sounds like you had a fairly mechanical process for generating the man page (you took scan-build's output and manually post processed it). Do you think we could automate this with a script, so the man page could just be a product of the build? Alternatively, since scan-build generates most of this text, maybe it could generate the options part of the man page itself (as an option to its output format), and have that output concatenated with some common preamble. What do you think?

Cheers,
Ted

Hi Ted,

The man page looks really great.

Thanks. I'm glad to start a conversation with it.

My main concern, however, is
supporting divergent documents. Ideally we want the documentation
for scan-build, the man page, etc., to all be in sync.

Having maintained a 90-page user guide for 10 years, I understand that
concern. I want to suggest, though, that "keeping it in sync" is not as
big a deal as most people think. Documentation is not very redundant
because it's labor-intensive, and labor, as you know, is scarce.
Therefore overlap is inherently self-limiting.

Documentation is also not amenable to automation. We want it to be; we
want the code to be self-documenting. And good code (by definition)
is self-documenting. But what we want from documentation isn't *in* the
code (or shouldn't be). It has to be written. Asking for
self-generating documentation is a bit like asking for self-generating
code.

That said, the most tedious part of maintaining a reference manual is
documenting all the options and ensuring 1) all options are documented
and 2) all documented options are implemented. Keeping the man page
synchronized with the implementation can, in principle, be automated.
But it doesn't follow that the man page must therefore be generated,
even in part; it might be better simply to have a system that compared
the two and reported on the differences. For a small number of man
pages, that "system" might be just eyeballs, and some vigilance when
options are added/deleted.

My main concern about having a separate man page file is that someone
is now responsible for keeping it up-to-date.

True. Documentation even introduces bugs, because undocumented
functionality never contradicted observed behavior. :wink:

It's a bit of engineering, but I'd prefer we go in a direction where
the man page and the scan-build documentation on the website (or at
least part of it) were machine generated from some common description

I would like to convince you that's both unnecessary and infeasible.

First, it's an optimization of labor with labor, right? And the rule
for optimization is to measure first. How much do the man page and the
website have in common? I don't see much, nor need for more.

Even the obvious overlap -- command-line options -- doesn't warrant
wholesale duplication. A guide properly presents some of the
options in an order chosen for ease of learning. Rather than interrupt
the text with an exhaustive list of every option and its synonym, a
guide serves the user better by referring him to the reference manual
for complete details. As soon as you're selecting *some* options in a
pedagogical order, you might as well just do it by hand. The time
spent getting that information into a back-end database and building
the integration system will never be repaid.

Keep in mind Vint Cerf's dictum, too. If you reject documentation
because it's not in the "right" form, you restrict the number of
contributors. Not everyone willing and able to document will be
interested in learning a specialized technology to do so.

It sounds like you had a fairly mechanical process for generating the
man page (you took scan-build's output and manually post processed
it). Do you think we could automate this with a script, so the man
page could just be a product of the build?

Mechanical, yes, but it can't be automated. The text from scan-build
lacks the very markers I added: the headings, that "model" is an
argument, that [=title] is optional, and so on.

In principle the text could remain embedded in scan-build and extracted
to generate a man page. But you'd have to re-invent half of -mdoc in
the process, without any improvement in the outcome. Perldoc is a
good example.

Alternatively, since scan-build generates most of this text, maybe it
could generate the options part of the man page itself (as an option
to its output format), and have that output concatenated with some
common preamble. What do you think?

I would remove the help text from scan-build. You don't need it
anymore. "man scan-build" is easier to use, and everyone knows how.
Maintaining the man page is trivial next to the effort that goes into
the scanner.

Besides, I'm allergic to so-called "help" that scrolls off my screen
and destroys the 23 lines of context I had before it took over. Be
warned: for reasons science has been totally unable to explain, that
allergy has been observed to be contagious. I think I got it from
Subversion.

Someone will be tempted to suggest that if the documentation is amid
the code, the programmer will be more likely to keep it up to date.
That proposition is contradicted by vast collective experience. We both
know huge projects with enviable (never perfect) documentation
maintained as man pages. I don't know of any that owe their
documentation to how easy it is to maintain.

Again: I do think that reference documentation can and should be
synchronized with the code. Better tools could make that more
convenient than anything available today by reducing redundancy, by not
requiring, for instance, that function and argument names be restated.
Clang promises to make that possible for C++. It's one of the reasons
I'm interested.

Regards,

--jkl

Hi James,

You’ve made some great points. My point was more that I wanted the documentation of the basic command line options of scan-build, wherever we post them, to be in sync. Certainly the detailed prose information doesn’t need to be verbatim everywhere, but there should be some synergy there as well.

Mechanical, yes, but it can’t be automated. The text from scan-build
lacks the very markers I added: the headings, that “model” is an
argument, that [=title] is optional, and so on.

In principle the text could remain embedded in scan-build and extracted
to generate a man page. But you’d have to re-invent half of -mdoc in
the process, without any improvement in the outcome. Perldoc is a
good example.

What I want is very simple:

(1) You’ve written prose that could be used to generate a good man page.

(2) You had a mechanical process to generate the man page from the scan-build options.

Given (1), why can’t we generate a man page by automating (2)? I don’t understand why this is so hard. Given the right level of refactoring within scan-build, we should be able to print out the text in different flavors. One flavor could be amenable to a “scan-build help” and another to man page generation. Indeed, they could be made one and the same (see comment below).

Alternatively, since scan-build generates most of this text, maybe it
could generate the options part of the man page itself (as an option
to its output format), and have that output concatenated with some
common preamble. What do you think?

I would remove the help text from scan-build. You don’t need it
anymore. “man scan-build” is easier to use, and everyone knows how.
Maintaining the man page is trivial next to the effort that goes into
the scanner.

I disagree. One of the things I like about git is “git help <command>”. I find that interactive help wonderful, and a degree more powerful than a static man page. It’s a powerful idiom that would translate well to scan-build, e.g.:

scan-build help check

which could print out information specific to that checker. A single man page needs to make compromises about what it includes and doesn’t include. Yes, “git help” is based on “man”, but given that some of the information for scan-build can be dynamically queried from clang (e.g., the list of available checkers), it would be really powerful to have scan-build be able to dynamically generate man pages as necessary. The last thing I want to do is every time I roll a new checker build that I need to manually check if the list of checkers in the man page is up-to-date.

Cheers,
Ted

Hi James

Don't let yourself be discouraged by these people expecting some kind of "perfect" solution... it is far too hard to get even the smallest change in, but do persevere. :slight_smile:

--Dave.

What I want is very simple:

(1) You've written prose that could be used to generate a good man page.

(2) You had a mechanical process to generate the man page from the scan-build options.

Given (1), why can't we generate a man page by automating (2)?

Because a man page is more than just a list of command line options. There
are basically two kinds of tools: those that have a few trivial options,
where this automation would just be overkill, and those that have a lot
of options, where a proper man page requires a lot of good structuring,
making any automation a pain to deal with.

This is the same problem as the discussion for the clang documentation.
The same arguments given to that email apply here as well.

I disagree. One of the things I like about git is "git help <command>".
I find that interactive help wonderful, and a degree more powerful than
a static man page.

One of the things I hate about git is that the user interface is
inconsistent enough that they have to create a hundred man pages
documenting the different option sets.

I really, really hope scan-build won't end up with that mistake.

Take a look at flex(1) and yacc(1) for a moment. I think they are quite
good at illustrating completely different levels of verbosity.

http://netbsd.gw.com/cgi-bin/man-cgi?yacc++NetBSD-current
http://netbsd.gw.com/cgi-bin/man-cgi?lex++NetBSD-current

yacc(1) is nice because it starts by answering the most important
questions first: what is this and how do I run it. The disadvantage is
that it doesn't even cross-reference the input format. Would I like the
grammar specification in a man page? Not really, I know the
complexity of the beast.

lex(1) is a lot messier. Even if I just want to know how to run it, I
have to skip past all the introductory usage material. It's overloaded.
Look at the AUTHOR section -- all it's missing is a complete ChangeLog.

The last thing I want to do is every time I roll a new checker build
that I need to manually check if the list of checkers in the man page
is up-to-date.

*That* part can and should be automated easily.

Now the real question in all of this is how many of the options are
really scan-build specific and not shared e.g. with clang. For complex
build systems it can be a lot easier to integrate clang --analyze
directly, and the checker specification and many other things are
logically more a part of the clang CLI. I'm generally not sure manual
pages are the best format to describe the checkers, especially since it is
often very useful (if not necessary) to show test cases using HTML
output mode to clarify the behavior. For that reason alone I think a
well structured hypertext document is a more appropriate format. This
doesn't remove the need for a manual page, but restricts what it has to
do. Provide a description of the command line and reference the online
manual for further details.

Joerg

How in any way is this a productive comment? We keep high standards because we want to make high quality tools. I’m sorry if you submitted a patch in the past that wasn’t reviewed as quickly as you hoped, or didn’t go as smoothly as you wished, but we do try and make sure that what goes into the repository is good stuff.

In this case in particular, I honestly don’t see what is wrong about having a discussion about what is the best solution for the users. I very much appreciate James’s work on creating a man page, but I think there are issues here worth discussing. That doesn’t mean we won’t accept the patch, but this discussion is all part of the review process. Any time we throw something into the repository we are committing to maintaining it, and I’m very much interested in what is in the best interest for our users.

> I would remove the help text from scan-build. You don't need it
> anymore. "man scan-build" is easier to use, and everyone knows how.

I disagree. One of the things I like about git is "git help
<command>". I find that interactive help wonderful, and a degree
more powerful than a static man page.

Ted, no, really. How can you disagree? How is

  $ git checkout --help

"a degree more powerful" than

  $ man git-checkout

given that they produce exactly the same thing?

(Please, let the answer not include the verb "typing".)

I don't like the idea that everyone needs to learn each utility's
idiosyncratic way to view documentation. We have man(1). Why the
complication of N ways to do 1 thing?

Here's a rule of thumb for you: -h or --help should, for any command,
produce at *most* 2 lines of help. If you need more than that, you
should be reading the documentation, not asking for help.

$ ls --help
ls: unknown option -- -
usage: ls [-AaBbCcdFfghikLlmnopqRrSsTtuWwx1] [file ...]

A good man utility can extract just the synopsis. On my
system today, I can do this:

$ man -h scan-build
scan-build [-ohkvV] [-analyze-headers] [-enable-checker [checker_name]]
[-disable-checker [checker_name]] [--help]
[--html-title [=title]] [--keep-going] [--plist]
[--plist-html] [--status-bugs] [--use-c++ [=compiler_path]]
[--use-cc [=compiler_path]] [--view] [-constraints [model]]
[-maxloop N] [-no-failure-reports] [-stats] [-store [model]]
build_command [build_options]

Isn't that nice? Even better: it's *standard*.

I would change scan-build to behave thus:

  $ scan-build --help
  See "man scan-build" for documentation.

  $ scan-build
  scan-build: fatal error: no input files
  analysis terminated.

(You want this behavior for automated batch processing, to
detect invocations that fail to provide command-line arguments.)

We could add one more:

  $ scan-build --man

to call exec(3) with /usr/bin/man. That way we can claim to have
invented something.
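
As a rough illustration only, the argument handling proposed above might be sketched in shell like this. The function name scan_build_main and the final placeholder branch are hypothetical; this is not how scan-build is actually implemented.

```sh
#!/bin/sh
# Sketch of the proposed front-end behavior. scan_build_main is a
# hypothetical name; the last branch is a placeholder, not the
# real analysis.
scan_build_main() {
    if [ $# -eq 0 ]; then
        # Fail loudly so batch jobs notice a missing build command.
        echo 'scan-build: fatal error: no input files' >&2
        echo 'analysis terminated.' >&2
        return 1
    fi
    case $1 in
        --help)
            echo 'See "man scan-build" for documentation.'
            ;;
        --man)
            # Hand the terminal over to the standard viewer.
            exec man scan-build
            ;;
        *)
            echo "would analyze: $*"   # placeholder for the real work
            ;;
    esac
}
```

The non-zero return in the no-argument case is what lets a batch script detect the bad invocation.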

The last thing I want to do is every time I roll a new
checker build that I need to manually check if the list of checkers
in the man page is up-to-date.

OK, but that's orthogonal to how the documentation is
maintained. Better to check the documentation than to generate it.

First, let's not exaggerate. If there are N checkers and you add one,
you have to check the man page for 1, not N. At release time, the
primary focus should be not counting checkers, but ensuring that the
documentation is coherent and accurate. No machine can do that.

Second, you will find, in practice, that editing the man page
"manually" is in fact easier and produces better results. Why? Checker
descriptions require markup. Markup is a domain of knowledge. The
people who know that domain best are not necessarily the ones adding
the checkers. Requiring markup in some tiny doc-fragment in Options.td
-- which itself uses *other* markup -- by someone who doesn't know or
care about markup is a recipe for inconsistency and error. Requiring
the person taking care of the man page to grovel around in some other
file is a good way to encourage him to find something better to do with
his time.

Third and last, to verify that the checker list and the documentation
are in sync doesn't require the documentation be generated. It requires
only that they be verified. If TableGen (or whatever) produced a simple
ASCII list of checkers, a tiny shell script could scrape through the
man page to make sure they're all present and accounted for. Here,
let me help:

$ sed -ne '/Sh..*CHECKER/,$p' scan-build.1 \
  | awk '/^\.It/ {print $2}'
core.AdjustedReturnValue
core.AttributeNonNull
core.CallAndMessage
core.DivideZero
core.NullDereference
core.StackAddressEscape
core.UndefinedBinaryOperatorResult
core.VLASize
core.builtin.BuiltinFunctions
core.builtin.NoReturnFunctions
....

Let me know when TableGen is ready.
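
To make the idea concrete, the verification could be sketched as below. This is only a sketch under assumptions: the checker list is a hypothetical plain file with one name per line (nothing in this thread says what TableGen would emit), and the man page is assumed to put each checker name in the second field of an .It line, as the pipeline above does.

```sh
#!/bin/sh
# Sketch: report checkers missing from, or stale in, the man page.
# Assumptions (not taken from scan-build itself): $1 is a plain list
# of checker names, one per line; $2 is the mdoc source.

man_page_checkers() {
    # Print the second field of every .It line from the CHECKER
    # section heading to the end of the page.
    sed -ne '/Sh..*CHECKER/,$p' "$1" | awk '/^\.It/ {print $2}'
}

check_checker_docs() {
    list=$1 page=$2
    expected=$(mktemp) documented=$(mktemp)
    sort -u "$list" > "$expected"
    man_page_checkers "$page" | sort -u > "$documented"
    # comm -23: names only in the list; comm -13: only in the page.
    missing=$(comm -23 "$expected" "$documented")
    stale=$(comm -13 "$expected" "$documented")
    rm -f "$expected" "$documented"
    # Unquoted on purpose: one output line per checker name.
    [ -n "$missing" ] && printf 'undocumented: %s\n' $missing
    [ -n "$stale" ] && printf 'not in checker list: %s\n' $stale
    # Succeed only when both lists agree.
    [ -z "$missing$stale" ]
}
```

Run as "check_checker_docs checkers.txt scan-build.1" at release time; a non-zero exit status means the page and the list disagree.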

But let me make one more point about time and how it's spent, now that
we're friends. Creating the man page for scan-build from scratch took
an hour or so. The best and most appropriate response, not that I
expected it, would have been Thank You. The answer instead, on this
list and off, has been that a machine could have done it. The shortest
and best answer to that might be: Show me.

But I'm not that smart, so I engaged in a discussion, which discussion
has so far taken me -- because I have to articulate my arguments
carefully, in public, to people whose backgrounds I don't know -- about
six hours. And we're not done, and whatever system build fu we're
going to have is also not done. How much better those hours might have
been spent creating other man pages instead!

To you and the others who object on theoretical grounds, I say: Peace!
Don't let's create a new problem to solve. You didn't have a proper
man page, and some guy in a time machine parachutes one in for you.
Is it a puppy? No! Cool. "svn add"! Maybe he'll do another! What's
the worst case? A few hours over a few years dealing with -mdoc. If
that really turns into a hassle, you can go and build your
doc-generator and "svn rm". Meanwhile, work deferred is time bought for
something productive. What's not to like?

Regards,

--jkl

Great comments. Comments inline.

What I want is very simple:

(1) You've written prose that could be used to generate a good man page.

(2) You had a mechanical process to generate the man page from the scan-build options.

Given (1), why can't we generate a man page by automating (2)?

Because a man page is more than just a list of command line options. There
are basically two kinds of tools: those that have a few trivial options,
where this automation would just be overkill, and those that have a lot
of options, where a proper man page requires a lot of good structuring,
making any automation a pain to deal with.

This is the same problem as the discussion for the clang documentation.
The same arguments given to that email apply here as well.

I disagree. One of the things I like about git is "git help <command>".
I find that interactive help wonderful, and a degree more powerful than
a static man page.

One of the things I hate about git is that the user interface is
inconsistent enough that they have to create a hundred man pages
documenting the different option sets.

That's fair, but would you agree that the idea is good in principle, and that this is a problem with the execution, or do you think the fundamental design of that help system is flawed?

I really, really hope scan-build won't end up with that mistake.

Just to be clear, which mistake are you referring to? The inconsistency of the interface?

Take a look at flex(1) and yacc(1) for a moment. I think they are quite
good at illustrating completely different levels of verbosity.

http://netbsd.gw.com/cgi-bin/man-cgi?yacc++NetBSD-current
http://netbsd.gw.com/cgi-bin/man-cgi?lex++NetBSD-current

yacc(1) is nice because it starts by answering the most important
questions first: what is this and how do I run it. The disadvantage is
that it doesn't even cross-reference the input format. Would I like the
grammar specification in a man page? Not really, I know the
complexity of the beast.

lex(1) is a lot messier. Even if I just want to know how to run it, I
have to skip past all the introductory usage material. It's overloaded.
Look at the AUTHOR section -- all it's missing is a complete ChangeLog.

Excellent points.

scan-build's documentation, which can be found by running it with no options, consists of a short list of options, a list of checkers, and a short example (at the very end). The man page that James provided is essentially this output reformatted as a man page. Other documentation can be found on the clang-analyzer.llvm.org website, which has a bunch of information such as tips for using it with projects that use configure, etc. All of that is missing from the help emitted by the tool.

From my perspective, right now the man page provided by James adds only incremental value, as it doesn't solve any of the concerns you mention, and it creates a potential maintenance burden. If the goal of the man page is essentially to replicate the output of scan-build's auto-generated help text, then I think the majority of the man page could be auto-generated by formatting that output into something that 'man' likes. Having a man page is useful for users who expect man to be the source of help.

Given your feedback, do you think the right approach for scan-build would be to have a very focused man page that omitted the checker descriptions? That would essentially be NAME, SYNOPSIS, OPTIONS, and EXAMPLE. I could then see 'scan-build help' just loading up the man page, and providing other means to get the list of checkers.

The last thing I want to do is every time I roll a new checker build
that I need to manually check if the list of checkers in the man page
is up-to-date.

*That* part can and should be automated easily.

Now the real question in all of this is how many of the options are
really scan-build specific and not shared e.g. with clang. For complex
build systems it can be a lot easier to integrate clang --analyze
directly and the checker specification and many other things are
logically more a part of the clang CLI.

It was never my intention to make 'clang --analyze' a public part of the interface to the analyzer; I meant it as more of a low-level implementation detail. The problem with people adopting clang --analyze directly is that it encumbers us with how we (one day) will design global analysis. My hope was to have a simple command for "analyzing my project or source file", and that was meant to be scan-build, or to have proper integration within an IDE that understands that 'clang --analyze' is just one implementation detail, probably among others, that needs to be considered when integrating the analyzer into an IDE.

I'm generally not sure manual
pages are the best format to describe the checkers, especially since it is
often very useful (if not necessary) to show test cases using HTML
output mode to clarify the behavior. For that reason alone I think a
well structured hypertext document is a more appropriate format. This
doesn't remove the need for a manual page, but restricts what it has to
do. Provide a description of the command line and reference the online
manual for further details.

Makes sense.

I think you are taking all of my comments out of context. I’m sorry I didn’t properly thank you, but I did say I thought the man page “looked great”. That wasn’t just lip service. But I do want to do what is “right”, and that is the point of discussion. You characterize this as “theoretical grounds”, and I can understand that perspective even though I don’t agree with it. To you it sucked that there was no man page. I agree completely with you. I also completely acknowledge that my original reply was probably too much a brain dump rather than a proper “thanks, and here is some feedback”. Perhaps I should have just accepted the patch as is, and then revised the man page later, but I thought you wanted some input on what that meant since you took the time to create the man page in the first place.

Review is part of how we do things. I meant no offense to your efforts. If your main concern is lack of a man page, I’m fine with adding one. My questions were more focused on what went into the man page, how it got there, etc. These are important questions. Should the man page serve as the comprehensive documentation for all things related to scan-build? There have been different opinions on this thread.

(my apologies for breaking up your email with multiple responses, but I find it a bit cleaner so that specific issues can be isolated from one another)

Ted, no, really. How can you disagree? How is

$ git checkout --help

“a degree more powerful” than

$ man git-checkout

given that they produce exactly the same thing?

(Please, let the answer not include the verb “typing”.)

To clarify my position, the usage of git I was referring to was:

$ git help

When I type “git” on my machine, I see:

$ git
usage: git [--version] [--exec-path[=<path>]] [--html-path] [--man-path] [--info-path]
           [-p|--paginate|--no-pager] [--no-replace-objects] [--bare]
           [--git-dir=<path>] [--work-tree=<path>] [--namespace=<name>]
           [-c name=value] [--help]
           <command> [<args>]

The most commonly used git commands are:
   add        Add file contents to the index
   bisect     Find by binary search the change that introduced a bug
   branch     List, create, or delete branches
   checkout   Checkout a branch or paths to the working tree
   clone      Clone a repository into a new directory
   commit     Record changes to the repository
   diff       Show changes between commits, commit and working tree, etc
   fetch      Download objects and refs from another repository
   grep       Print lines matching a pattern
   init       Create an empty git repository or reinitialize an existing one
   log        Show commit logs
   merge      Join two or more development histories together
   mv         Move or rename a file, a directory, or a symlink
   pull       Fetch from and merge with another repository or a local branch
   push       Update remote refs along with associated objects
   rebase     Forward-port local commits to the updated upstream head
   reset      Reset current HEAD to the specified state
   rm         Remove files from the working tree and from the index
   show       Show various types of objects
   status     Show the working tree status
   tag        Create, list, delete or verify a tag object signed with GPG

See 'git help <command>' for more information on a specific command.

To me this is awesome. I see a short table of contents of things I can drill down into, with instructions on how to get there. I can then do (as suggested):

$ git help pull

and I get even more precise documentation. I don’t need to know the specific command this maps to, because it doesn’t matter. I really like the fact that the ‘git’ executable is a one stop shop for getting all my functionality out of git. I don’t have to memorize a hundred different commands, or figure out how they map to a man page.

I don’t like the idea that everyone needs to learn each utility’s
idiosyncratic way to view documentation. We have man(1). Why the
complication of N ways to do 1 thing?

That’s an excellent argument, and the only reason I would say do it a different way is if it provided significantly increased value to the users. For me the workflow I described above is awesome. That doesn’t mean I think a man page for git isn’t useful, but it inherently maps you into a different universe where you have to reason about each command separately (otherwise, how else would you be able to look up the man page information for that specific bit of functionality?).

Now scan-build isn’t git. The options are fairly simple, and most of them exist today because of stupid limitations in its current implementation. The only thing I see changing quickly is the list of checkers, which was my main point of contention in my original email. Should those be included in the man page, and if so, how much information? It’s not like the current documentation on the existing checkers is adequate. Where should that information go? In the man page? If so, do we enter a place where the man page gets ridiculously long and lacks focus? After the discussion on this thread, my current inclination is that the documentation on specific checkers should go elsewhere (e.g., the website). Perhaps these are bike shed questions, but they are important.

The last thing I want to do is every time I roll a new
checker build that I need to manually check if the list of checkers
in the man page is up-to-date.

OK, but that’s orthogonal to how the documentation is
maintained. Better to check the documentation than to generate it.

First, let’s not exaggerate. If there are N checkers and you add one,
you have to check the man page for 1, not N. At release time, the
primary focus should be not counting checkers, but ensuring that the
documentation is coherent and accurate. No machine can do that.

Completely agreed.

Second, you will find, in practice, that editing the man page
“manually” is in fact easier and produces better results. Why? Checker
descriptions require markup. Markup is a domain of knowledge. The
people who know that domain best are not necessarily the ones adding
the checkers. Requiring markup in some tiny doc-fragment in Options.td
-- which itself uses *other* markup -- by someone who doesn’t know or
care about markup is a recipe for inconsistency and error. Requiring
the person taking care of the man page to grovel around in some other
file is a good way to encourage him to find something better to do with
his time.

That’s an interesting argument that I’ll need more time to digest. I’d really like to have a more cohesive documentation story for the analyzer, but I’m not certain what it should be. This discussion speaks more about the command line usage of scan-build, which is limited in scope. Assuming we get beyond this current thread, I assume we will find some resolution here. My broader concern is about the documentation of the checkers themselves. Right now it’s hardly anything, but where should it eventually go? My interpretation of your points is that putting some kind of crazy markup in Options.td is a recipe for failure (e.g., creating a meta-documentation with markup that can then be lowered to something else). I can see that argument, as that is something else for someone to understand and maintain. The main conclusion I can draw from that then is that we should have one canonical way of providing documentation for the checkers, and if we had to pick one location for that it would probably be the web, not a man page.

Third and last, to verify that the checker list and the documentation
are in sync doesn’t require the documentation be generated. It requires
only that they be verified. If TableGen (or whatever) produced a simple
ASCII list of checkers, a tiny shell script could scrape through the
man page to make sure they’re all present and accounted for. Here,
let me help:

$ sed -ne '/Sh..*CHECKER/,$p' scan-build.1 \
  | awk '/^\.It/ {print $2}'
core.AdjustedReturnValue
core.AttributeNonNull
core.CallAndMessage
core.DivideZero
core.NullDereference
core.StackAddressEscape
core.UndefinedBinaryOperatorResult
core.VLASize
core.builtin.BuiltinFunctions
core.builtin.NoReturnFunctions

Let me know when TableGen is ready.

I both agree and disagree with you. If the list of checkers can be easily verified with sed or perl, why can’t this section of the man page (if we want to include it) be auto-generated? It’s just the reverse process. There’s not really any special markup here either. The names of the checkers are in bold, and everything else is not. This seems like a case of piping the output of scan-build through a perl script that generates that markup for man.
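
A minimal sketch of that reverse direction, using awk rather than perl. It assumes an input of one checker per line, name first and description after it; scan-build's real output layout may differ. The .Bl, .It, .Sy and .El macros are standard mdoc, but using .Sy for the bold name is an assumption about how the page would be marked up, and no escaping of roff-special characters is attempted.

```sh
#!/bin/sh
# Sketch: turn "name description..." lines into an mdoc tagged list.
# Assumed input: one checker per line, name first, free text after.
checkers_to_mdoc() {
    awk '
        BEGIN { print ".Bl -tag -width Ds" }
        NF == 0 { next }                 # skip blank lines
        {
            name = $1
            # Strip the name and following blanks to leave the description.
            desc = $0
            sub(/^[ \t]*[^ \t]+[ \t]*/, "", desc)
            print ".It Sy " name         # .Sy renders the tag in bold
            if (desc != "") print desc
        }
        END { print ".El" }
    '
}
```

The output would still need to be spliced under the right .Sh heading, and the descriptions come out as plain text with no further markup.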

I understand completely if this isn’t something you wanted to tackle. You just wanted a man page. As a first cut I think what you have is fine. I’m sorry I dragged you into a broader discussion which had to do with my own musings of where I think this should go. I will go ahead and commit the man page you added. Thanks so much for providing it.

> I disagree. One of the things I like about git is "git help
> <command>". I find that interactive help wonderful, and a degree
> more powerful than a static man page.

One of the things I hate about git is that the user interface is
inconsistent enough that they have to create a hundred man pages
documenting the different option sets.

I really, really hope scan-build won't end up with that mistake.

The complexity of the documentation is a function of the complexity of
the interface. git is hard to learn because of the size of its
interface, not because of the form of its documentation.

I'm generally not sure manual
pages are the best format to describe the checkers, especially since it is
often very useful (if not necessary) to show test cases using HTML
output mode to clarify the behavior. For that reason alone I think a
well-structured hypertext document is a more appropriate format. This
doesn't remove the need for a manual page, but restricts what it has
to do.

You want the man page to list all the checkers. Consider:

$ man scan-build | grep debug
     debug.DumpCFG
     debug.DumpCallGraph
     debug.DumpDominators
     debug.DumpLiveVars
     debug.Stats
     debug.TaintTest
     debug.ViewCFG
     debug.ViewCallGraph

That's a great way to grab a reminder or build a command line with all
the debug checkers.
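
A rough sketch of that last step, in Python for illustration: it collects the checker names with a given prefix from the rendered page text and turns them into a scan-build invocation. It assumes each checker name is the first word on its own line, as in the grep output above, and relies on scan-build's -enable-checker option. The function names are made up for the example.

```python
import shlex


def checkers_with_prefix(rendered_page, prefix):
    """Collect, in order and without duplicates, first words that start with prefix."""
    names = []
    for line in rendered_page.splitlines():
        words = line.split()
        if words and words[0].startswith(prefix) and words[0] not in names:
            names.append(words[0])
    return names


def scan_build_command(checkers, build_command):
    """Build an argv list that enables each checker and then runs the build."""
    argv = ["scan-build"]
    for name in checkers:
        argv += ["-enable-checker", name]
    return argv + list(build_command)


if __name__ == "__main__":
    page_text = "     debug.DumpCFG\n     debug.Stats\n     core.VLASize\n"
    argv = scan_build_command(checkers_with_prefix(page_text, "debug."), ["make"])
    # shlex.join quotes each argument so the line can be pasted into a shell.
    print(shlex.join(argv))
```

Note that this is stricter than the grep: it matches only lines whose first word starts with the prefix, not any line that merely mentions "debug".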

If test cases and examples are many, I would say you're describing
a user guide more than a reference manual. As long as you have two
documents, the man page can be terse because "how" is explicated in the
user guide.

I could make a case for a user guide in -mdoc format. It would
be easy to view in a terminal, and PDF is very pleasant for linear
narratives (not to mention printing).

But DocBook gets the job done, too. It has more verbose input and a
bigger toolchain, but you get very nice HTML. A single DocBook document
*can* be rendered as HTML and Postscript/PDF, but it takes some skill
and willingness to particularize parts of the source with escapes for
intended targets. A project the size of clang can probably cope with
those issues.

--jkl

I meant no offense to your efforts. If your main concern is lack of
a man page, I'm fine with adding one. My questions were more focused on
what went into the man page, how it got there, etc. These are
important questions.

No offense taken, and none intended. I think the discussion is
healthy. Sure, it might have helped had you said, "I'll commit this for
now pending a larger discussion about what to do"; maybe I'd have felt
a little more like I'd done something useful. But I don't feel
denigrated or rejected. Your response was polite and matter of fact.

I think you might have been on the receiving end of some
frustration for things you didn't say. I'm sorry for that.

To me, documentation *systems* have been done and done again. We have
DocBook, Texinfo and mdoc. And, heaven knows, doxygen. The world does
not need another, and clang is ill-suited to invent another. It might
seem like a good idea because we're leveraging something we've already
got. But the experience of thousands of projects over decades is that
documentation is a significant and *separate* effort. It's never
rooted in the help system (unless it *is* the help system, viz
Microsoft). It's not table-driven and it's not mechanically generated.
That fact alone should tell anyone that clang should be, in this case,
a technology adopter, not a technology inventor.

> Should the man page serve as the comprehensive
> documentation for all things related to scan-build? There have been
> different opinions on this thread.

All documentation is of two kinds: User Guide and Reference Manual.

A UG walks you through how to do things in some learnable order. It is
read linearly. It starts with simple concepts and dismissible claims,
and works through the process the creators of the software supposed a
user might take. It's indexed, and the user might have to backtrack.
But it's fundamentally an exposition of why and how things are done.
To some extent, it may explain the problem domain too.

The RM is never read linearly except by dweebs and authors (possibly
the same set). It is organized by feature and accessed by its
index. It says what's true in as few words as possible. It's a "what"
document. There's no place for why and how. The user is assumed to
have read the UG, or at least to have the sense to follow the "read the
UG" advice when the RM points him that way.

The linear/lookup dichotomy influences technology choices. No one (if
you ask me) wants a reference manual in HTML, at least not *only* in
HTML. Nothing is faster than 'man foo', especially when the immediate
next step is / to start searching. (My favorite is "man foo
<enter> /^FILE" to find the FILES section of the page.) But HTML is
soft and portable, not a bad format for linear access. PDF has many of
the same properties, but online viewing isn't as convenient.

So, no, I would never propose a man page as the be-all and end-all of
scan-build documentation. I don't yet know enough about it to write
the UG. I started with a man page because I knew how.

I have seen too much documentation that confuses guide and reference.
I'm planting this stake in the ground because, if I'm to be involved in
clang's documentation, I would aim to make the guide expansive and the
reference terse.

(None of the above is intended as criticism of the existing
documentation.)

--jkl

> One of the things I hate about git is that the user interface is
> inconsistent enough that they have to create a hundred man pages
> documenting the different option sets.

That's fair, but would you agree that the idea is good in principle, and that
this is a problem with the execution, or do you think the fundamental
design of that help system is flawed?

Personally, I find it slightly annoying since by default that will put
me in some pager.

>
> I really, really hope scan-build won't end up with that mistake.

Just to be clear, which mistake are you referring to?
The inconsistency of the interface?

Having hundreds of different switches that need to be explained. This is
different from having a hundred checkers available, since they all
follow the same style.

scan-build's documentation, which can be found by running it with
no options, consists of a short list of options, a list of checkers,
and a short example (at the very end).
The man page that James provided is essentially this output reformatted
as a man page. Other documentation can be found on the
clang-analyzer.llvm.org website, which has a bunch of information such
as tips for using it with projects that use configure, etc. All of that
is missing from the help emitted by the tool.

scan-build without arguments can default to the equivalent of --help.
I'm not sure about including an example in that, but that's a separate
point.

Given your feedback, do you think the right approach for scan-build would
be to have a very focused man page that omitted the checker descriptions? That would
essentially be NAME, SYNOPSIS, OPTIONS, and EXAMPLE. I could then
see 'scan-build help' just loading up the man page, and providing
other means to get the list of checkers.

I would include the "normal" checkers, e.g. those that are stable and of
interest for a wide audience.

> Now the real question in all of this is how much of the options are
> really scan-build specific and not shared e.g. with clang. For complex
> build systems it can be a lot easier to integrate clang --analyze
> directly and the checker specification and many other things are
> logically more a part of the clang CLI.

It was never my intention to make 'clang --analyze' a public part of
the interface to the analyzer, but more a low-level implementation
detail. The problem with people adopting clang --analyze directly is
that it encumbers us with how we (one day) will design global analysis.
My hope was to have a simple command for "analyzing my project or
source file", and that was meant to be scan-build, or have proper
integration within an IDE that understands that 'clang --analyze' is
just one implementation detail, probably among others, that needs to be
considered when integrating the analyzer into an IDE.

My problem with scan-build is that it requires at least Perl. NetBSD has
a mostly self-contained toolchain, so that's problematic.

Joerg

Because you don't want markup in Options.td.

It's easy to use regex to find the markup and extract the data. It's
impossible to devise an algorithm that can generate the markup
spontaneously except insofar as the markup is derivable from something
about the data (say, its structure).

Here's an example from the page:

.It osx.cocoa.ClassRelease
Check for sending 'retain', 'release', or 'autorelease' directly to a
Class
.Bq off

That's a little broken, a first-draft cheat. Those single quotes
shouldn't be there because groff can't style them as open-close
quotes. In fact, IIUC they're literals, which in PDF format won't have
quotes at all! (Question for the audience: would the programmer adding
this to Options.td know that?)

What you want is

.It osx.cocoa.ClassRelease
Check for sending
.Li retain ,
.Li release ,
or
.Li autorelease
directly to a Class
.Bq off

producing http://www.schemamania.org/clang/scan-build.pdf.

That's not the only such example.
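
To give a sense of what such a translator involves, here is a minimal Python sketch that handles only the one pattern shown above: a single-quoted word, optionally followed by punctuation, becomes a .Li macro line. The function names are hypothetical, and the approach is fragile by construction. An apostrophe in ordinary prose (as in "can't") would be mistaken for a quote, and a description line that happens to start with a dot would be read as a macro.

```python
import re

# A single-quoted run with optional trailing punctuation, e.g. 'retain',
QUOTED = re.compile(r"'([^']+)'([.,;:]?)")


def description_to_mdoc(text):
    """Split a description into mdoc lines, turning 'word' into '.Li word'."""
    lines = []
    pos = 0
    for match in QUOTED.finditer(text):
        before = text[pos:match.start()].strip()
        if before:
            lines.append(before)
        macro = ".Li " + match.group(1)
        if match.group(2):
            # mdoc expects trailing punctuation as a separate argument.
            macro += " " + match.group(2)
        lines.append(macro)
        pos = match.end()
    rest = text[pos:].strip()
    if rest:
        lines.append(rest)
    return lines


def checker_entry(name, description, state):
    """Render one list item: .It line, description lines, then the .Bq state."""
    return "\n".join([".It " + name] + description_to_mdoc(description) + [".Bq " + state])


if __name__ == "__main__":
    print(checker_entry(
        "osx.cocoa.ClassRelease",
        "Check for sending 'retain', 'release', or 'autorelease' directly to a Class",
        "off",
    ))
```

Even this one rule needs a regex, a punctuation special case, and a list of known failure modes, and it covers a single markup decision out of many.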

Do you want to keep text just like that -- verbatim, no whitespace
before the line ending -- in Options.td? Clearly not. Do you want to
invent/use some other markup that can stand in for it, and write a
translator to produce the appropriate mdoc markup? When the mdoc
output is wrong, how much debugging do you want to do? When the
invented markup is insufficient, how much more time do you want to pour
into it?

If you care about the documentation enough to write it, how much time
do you want to invest in learning the clang NIH markup toolchain?

Remember, these questions exist in the service of only one question:
how to generate documentation programmatically? If the question is
shortened to "how to generate documentation", the answer is easy:
Typing! And the problem of verification is reduced to a tiny shell
script.

Regards,

--jkl

Because you don’t want markup in Options.td.

Hmm, but that’s not what I was suggesting. The checkers don’t even appear in Options.td in any case.

What I was suggesting was piping the output of ‘clang -cc1 -analyze -analyzer-checker-help’ through a perl script that extracts the list of checkers and formats it with the markup for man.
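
A minimal sketch of that pipeline stage, written in Python rather than perl for illustration. The input layout is an assumption, not something confirmed in this thread: a "CHECKERS:" header, then one entry per line indented by two spaces with the name followed by its description, and more deeply indented lines continuing the previous description. Descriptions are copied verbatim, so any quoting inside them is passed through untouched. Function names are hypothetical.

```python
def parse_checker_help(help_text):
    """Return (name, description) pairs from the assumed checker-help layout."""
    entries = []
    in_list = False
    for line in help_text.splitlines():
        if line.strip() == "CHECKERS:":
            in_list = True
        elif in_list and line.startswith("  ") and not line.startswith("   "):
            # Exactly two spaces of indentation: a new checker entry.
            name, _, description = line.strip().partition(" ")
            entries.append([name, description.strip()])
        elif in_list and line.startswith("   ") and entries:
            # Deeper indentation: continuation of the previous description.
            entries[-1][1] = (entries[-1][1] + " " + line.strip()).strip()
    return [tuple(entry) for entry in entries]


def to_mdoc_list(entries):
    """Render the pairs as an mdoc tagged list, one .It per checker."""
    lines = [".Bl -tag -width indent"]
    for name, description in entries:
        lines.append(".It " + name)
        if description:
            lines.append(description)
    lines.append(".El")
    return "\n".join(lines)


if __name__ == "__main__":
    sample = "\n".join([
        "CHECKERS:",
        "  core.DivideZero                 Check for division by zero",
        "  osx.cocoa.ClassRelease",
        "                                  Check for sending 'retain' directly to a Class",
    ])
    print(to_mdoc_list(parse_checker_help(sample)))
```

The generated fragment would still need to be concatenated with a hand-written preamble, and it carries no information about which checkers are on by default unless the input provides it.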

It’s easy to use regex to find the markup and extract the data. It’s
impossible to devise an algorithm that can generate the markup
spontaneously except insofar as the markup is derivable from something
about the data (say, its structure).

Yes, I think we are in agreement.

Here’s an example from the page:

.It osx.cocoa.ClassRelease
Check for sending 'retain', 'release', or 'autorelease' directly to a
Class
.Bq off

That’s a little broken, a first-draft cheat. Those single quotes
shouldn’t be there because groff can’t style them as open-close
quotes. In fact, IIUC they’re literals, which in PDF format won’t have
quotes at all! (Question for the audience: would the programmer adding
this to Options.td know that?)

Ah, interesting.

Great points. I suppose what we have on the website is somewhere in between a UG and RM, but not entirely both.

James,

I’ve just noticed that there is a discrepancy in the man page: a lot of checkers that are turned on by default are marked as [off]. For example, all of the unix and osx checkers.

Anna.

Jacob Kaplan-Moss in "Writing great documentation" talks about four
types (Tutorials, Topic Guides, Reference, Troubleshooting) and levels
(Project, Document, Section, Element) of documentation. See page 42 of
[1] for a nice tabular overview of his "documentation is fractal"
philosophy. [2] is his video presentation, and [3] the webpages.

IMO worth at least glancing at for ideas of "What" your documentation
set should contain. I also like his use of Sphinx [4] (and therefore
ReST [5]) but that's a separate issue :)

[1] Writing great documentation - CodeConf 2011

[2] http://blip.tv/pycon-us-videos-2009-2010-2011/pycon-2011-writing-great-documentation-4899042

[3] Writing Great Documentation - Jacob Kaplan-Moss

[4] http://sphinx.pocoo.org/

[5] reStructuredText