cfe-dev Digest, Vol 45, Issue 19

Hi Chris,

I would be interested to get my feet wet and actually I was thinking of this no later than yesterday (following Sean’s mail).

If there’s one thing that I appreciate in Visual Studio’s compiler it’s their extensive documentation on errors and warnings, when fix-it are not available it’s not always obvious to the beginner what need to be done and I’d be glad to help CLang improve in this department.

I was thinking of adding a target to generate the .html webpages from the Diagnostic*.td files, though I haven’t yet thought extensively of how to organize the said pages (probably by groups ?). I had not thought about making this available directly from the command line, but it does sound great to have it available at the tips of my fingers.

I had thought about 3 sections for each error/warning actually:

  • brief
  • description, including code excerpts to illustrate the issue.
  • references to the standard (for those who wish to dig deeper)

I would be glad to actually contribute to these sections, though given the sheer number of errors and warnings and my total lack of knowledge about Objective-C, I suppose I would need help in this department.

I’d be glad to hear about everyone’s thoughts on the subject, since I guess that a number of people have already thought about it without having the time to actually launch themselves in.

Matthieu

Note: regarding the activation of “everything”, what of a “-Wextra” that activates everything “-Wall” did not, like gcc ?

Hi Chris,

I would be interested to get my feet wet and actually I was thinking of this no later than yesterday (following Sean's mail).

If there's one thing that I appreciate in Visual Studio's compiler it's their extensive documentation on errors and warnings, when fix-it are not available it's not always obvious to the beginner what need to be done and I'd be glad to help CLang improve in this department.

That would be great!

I was thinking of adding a target to generate the .html webpages from the Diagnostic*.td files, though I haven't yet thought extensively of how to organize the said pages (probably by groups ?). I had not thought about making this available directly from the command line, but it does sound great to have it available at the tips of my fingers.

Yeah. In addition to allowing us to support a --help <warning> flag, this would mean that we could actually vend documentation for the various warnings through the libclang interface.

I had thought about 3 sections for each error/warning actually:
- brief

We definitely want this for all errors/warnings.

- description, including code excerpts to illustrate the issue.

I expect we won't need this everywhere. Rather, we should focus on giving good descriptions for the diagnostics that tend to confuse people.

- references to the standard (for those who wish to dig deeper)

The language lawyer in me loves this idea, but I think it's more work than it's worth. Only a tiny fraction of users will ever use this feature.

I would be glad to actually contribute to these sections, though given the sheer number of errors and warnings and my total lack of knowledge about Objective-C, I suppose I would need help in this department.

I'd be glad to hear about everyone's thoughts on the subject, since I guess that a number of people have already thought about it without having the time to actually launch themselves in.

I think this would be great, and I think you'll get help with the content once there's a framework in place.

  - Doug

For the record, gcc does not do that. -Wextra is more warnings than -
Wall, but it is not everything.

Hello,

This is the first patch-set to better document clang diagnostics.

I have added two fields into the Diagnostic (.td) class: Brief and Explanation; and modified the subsequent generation (and exploitation) of the table
I have added two getters to DiagnosticIDs to get the “BriefExplanation” and “FullExplanation”

There is also a few corrections of comments that were (it seems) out of sync

Note: there is one (little) patch for LLVM, because the TableGen backend is in llvm repository… and the two patches are intrisically tied (obviously)

Note 2: is this normally tested or do we assume that is compilation a test in itself ?

Unless we want the same Brief / Explanation for Groups too, this normally implement the “libclang” part, so next are the HTML pages generation and the Help menu modification.

For those tasks, the distinction of where the diagnostic comes from probably does not makes sense for the user, so I would expect to have to group the warnings using the categories and subgroups rather than AST/Sema/Parse. I suppose that we’ll want to expose both warnings and extension warnings here.

There is one catch though: the individual warnings are not named. I was wondering if I should create a name (#ENUM in the macro ? The internal ID ?) or if the Brief would be enough and I’d just list all the briefs/explanations underneath the subgroup they belong to.

With regard to the HTML generation: I was thinking about using a Python script and invoke it as part of the build step. This would require, I guess, that I update the Python Bindings (if any ?) to extract the information from libclang. I’d appreciate pointers to the documentation (if any).

Comments / Advices are welcome,
Matthieu.

2011/3/8 Sean McBride <sean@rogue-research.com>

clang_diagnostic_explanation.diff (9.53 KB)

llvm_diagnostic_explanation.diff (565 Bytes)

Hello,

This is the first patch-set to better document clang diagnostics.

I have added two fields into the Diagnostic (.td) class: Brief and Explanation; and modified the subsequent generation (and exploitation) of the table
I have added two getters to DiagnosticIDs to get the “BriefExplanation” and “FullExplanation”

There is also a few corrections of comments that were (it seems) out of sync

Note: there is one (little) patch for LLVM, because the TableGen backend is in llvm repository… and the two patches are intrisically tied (obviously)

Note 2: is this normally tested or do we assume that is compilation a test in itself ?

Compilation, plus any tests for the feature that the TableGen changes support, are enough.

Unless we want the same Brief / Explanation for Groups too, this normally implement the “libclang” part, so next are the HTML pages generation and the Help menu modification.

I don’t think we need to worry about this just yet.

For those tasks, the distinction of where the diagnostic comes from probably does not makes sense for the user, so I would expect to have to group the warnings using the categories and subgroups rather than AST/Sema/Parse.

Yes. More categorization of the existing diagnostics would be useful.

I suppose that we’ll want to expose both warnings and extension warnings here.

Yes.

There is one catch though: the individual warnings are not named. I was wondering if I should create a name (#ENUM in the macro ? The internal ID ?) or if the Brief would be enough and I’d just list all the briefs/explanations underneath the subgroup they belong to.

Well, the warnings do have the names they’re given in the .td files, e.g.,

def err_array_init_not_init_list : Error<
"array initializer must be an initializer "
“list%select{| or string literal}0”>;

We could take the diagnostic name without the err_/ext_/warn_ prefix, so the name of this diagnostic would be “array_init_not_init_list”. TableGen could expose this as the diagnostic name.

With regard to the HTML generation: I was thinking about using a Python script and invoke it as part of the build step. This would require, I guess, that I update the Python Bindings (if any ?) to extract the information from libclang. I’d appreciate pointers to the documentation (if any).

Well, that’s certainly one approach. You’d need to extend libclang to provide diagnostic documentation information, and update the Python bindings for libclang. I guess the benefit is that it’s easier to emit HTML from a Python script than from C++.

Personally, I think you should start simpler: add a flag that teaches the diagnostic printer to emit the name of the diagnostic as part of the brackets, e.g.,

warning: semicolon before method body is ignored [-Wsemicolon-before-method-body, semicolon_before_method_body]

and then add some kind of --help option that looks up the documentation for a diagnostic based on that name, e.g.,

clang --help semicolon_before_method_body

With this working, you’ll be able to write regression tests that ensure that we’re getting the right documentation for the right names. And users could use this feature to get more information about a particular diagnostic. That entire system could be in place before you start worrying about the formatting of HTML itself, so other people could help fill in content.

  • Doug

Hello,

I have further modified the diagnostics system of clang.

  1. I have introduce two new options -fdiagnostics-show-name and -fno-diagnostics-show-name to activate/deactive printing the diagnostic name. It is activated by default (like -fdiagnostics-show-option), I don’t suppose there’s any opposition, but just in case you wish a different name, shout!

(note: I’ve copied it in several places, following -fdiagnostics-show-option example, I don’t know if I missed something… and I hope I didn’t make a typo in one site)

  1. I have added an Index to map the name of the diagnostic back to the ID. For efficient search this index is sorted… on first use. I could not find a way to get the TableGen backend to sort it by itself because the diagnostics are splitted over multiple files. Because it is sorted only on first use, I don’t expect any impact (apart from the memory it takes) during a regular run, since it should only be invoked through the help menu. Also, even if sorting is “relatively” expensive (< 2k elements), since it’s for the help, I guess it does not matter too much.

I just have one little issue: what if I don’t find it ? At the moment I hijacked diag::DIAG_UPPER_LIMIT for signaling an invalid name. Is this correct ?

I was also thinking that we could probably propose a fuzzy match for a best effort search, in case we didn’t locate the name, because users may make a typo when invoking the help. I would suggest adding a boolean parameter to indicate whether the call wishes for this extended search or not. Thoughts ?

  1. I have had some troubles extracting names from the IDs. Most IDs are well-behaved and begin by fatal_, err_, ext_, warn_ or note_ but there’s a couple that do not (beginning by error_ or embedding the warn in the middle of the string or just not indicating anything of the sort). For those, I simply print the whole ID as the name, however I’d like to provide a clean-up (separate) patch to “standardize” them.

  2. I have wired it all in the TextDiagnosticPrinter.cpp (as you’ll see). At the moment I print it in a separate bracket block

$ clang++ error.cpp
error.cpp:1:5: error: second parameter of ‘main’ (argument array) must be of type 'char *’ [main_arg_wrong]
int main(int, char
) {
^
1 error generated.

$ clang++ -Wunused-parameter warning.cpp
warning.cpp:1:21: warning: unused parameter ‘argv’ [unused_parameter] [-Wunused-parameter]
int main(int, char* argv) { // expected-warning{{unused parameter ‘argv’}}
^
1 warning generated.

Would you rather I put it in the already existing bracket block ?

I’ve bypassed it for notes, I don’t expect the name to be useful for notes, but I would not mind printing it if some felt strongly about it.

I could also bypass it if there is no BriefExplanation, but (at least for warning) we could always display even then the categories it belongs to, the different level of options etc… thoughts ?

  1. I have written a simple test in the FrontEnd section, but I have some trouble with using lit.py on MSYS (MinGW shell), I’ll send a separate mail to the list for support on this. I’ve included the test anyway, to see if the intent is right at least.

  2. I was planning on producing the patches in several drops, this being the first. If I get the test to play nice… Does this seem reasonable ? Do you think this patch misses something (apart from a test for the name index… since I’ll require to tweak the help for that).

And obviously, comments on the patches themselves would be welcome.

Regards :slight_smile:

2011/3/14 Douglas Gregor <dgregor@apple.com>

llvm_diagnostic_1.diff (1.72 KB)

clang_diagnostic_1.diff (19.1 KB)

Hello,

Little ping on this :slight_smile:

I’ve joined two “fresher” patches, the only real update is that thanks to Takumi’s help I was able to actually run the test I had wrote and thus ensure it passed and tested the functionality as expected.

I am still waiting for comments on:

  1. the option name, and the fact it is activated by default
  2. the quality of implementation itself (I hope it did get the naming convention right)
  3. the standardization of the names
  4. whether the display suits you or not
  5. whether the test suits you or not
  6. if this seems sufficient for a first patch :slight_smile:

I’ll forge ahead once this first step is settled.

Matthieu.

2011/3/19 Matthieu Monrocq <matthieu.monrocq@gmail.com>

clang_diagnostic_1.diff (19 KB)

llvm_diagnostic_1.diff (1.72 KB)

Hi Matthieu, I really like your suggestions for improving diagnostics!

2. I have added an Index to map the name of the diagnostic back to the ID. For efficient search this index is sorted... on first use. I could not find a way to get the TableGen backend to sort it by itself because the diagnostics are splitted over multiple files. Because it is sorted only on first use, I don't expect any impact (apart from the memory it takes) during a regular run, since it should only be invoked through the help menu. Also, even if sorting is "relatively" expensive (< 2k elements), since it's for the help, I guess it does not matter too much.

.td files can include other .td files. You could add a new .td file aggregating all the diagnostics so you can have TableGen produce the necessary sorted arrays.

I just have one little issue: what if I don't find it ? At the moment I hijacked diag::DIAG_UPPER_LIMIT for signaling an invalid name. Is this correct ?

I was also thinking that we could probably propose a fuzzy match for a best effort search, in case we didn't locate the name, because users may make a typo when invoking the help.

Proposing a match if one was not found is a good idea.

I would suggest adding a boolean parameter to indicate whether the call wishes for this extended search or not. Thoughts ?

Why not always propose a match ?

3. I have had some troubles extracting names from the IDs. Most IDs are well-behaved and begin by `fatal_`, `err_`, `ext_`, `warn_` or `note_` but there's a couple that do not (beginning by `error_` or embedding the `warn` in the middle of the string or just not indicating anything of the sort). For those, I simply print the whole ID as the name, however I'd like to provide a clean-up (separate) patch to "standardize" them.

Please "standardize" them.

-Argiris

Hello,

I have sent the two patches with the full precomputation of the index during compilation.

I’ll standardize the existing diagnostics in another patch to differentiate a functional patch from a stylistic one.

Also, I did not implement the fuzzy matching yet, mainly because the index is still unused, and so I would not be able to test it. I just wanted to create the index in this patch to avoid later modifications to tblgen (since it implies modifying llvm and providing two linked patches).

I would recommend using StringRef::edit_distance for the fuzzy match, since it’s what is used for auto completion. However the names are “bare” C-Strings… would it be okay to StringRef-ize the Diagnostic interface ? Or perhaps only StringRef-ize the internals and still return C-Strings ?

Matthieu

2011/4/13 Argyrios Kyrtzidis <kyrtzidis@apple.com>

Hello,

I have sent the two patches with the full precomputation of the index during compilation.

I'll standardize the existing diagnostics in another patch to differentiate a functional patch from a stylistic one.

Okay, thanks.

Also, I did not implement the fuzzy matching yet, mainly because the index is still unused, and so I would not be able to test it. I just wanted to create the index in this patch to avoid later modifications to tblgen (since it implies modifying llvm and providing two linked patches).

The fuzzy matching can come later.

I would recommend using StringRef::edit_distance for the fuzzy match, since it's what is used for auto completion. However the names are "bare" C-Strings... would it be okay to StringRef-ize the Diagnostic interface ? Or perhaps only StringRef-ize the internals and still return C-Strings ?

Sure. StringRef'ification is general goodness.

  - Doug